Performing Analysis of Meteorological Data
In this article, we analyze weather data to test a given hypothesis. The Null Hypothesis (Ho) is “The Apparent temperature and humidity compared monthly across 10 years of the data indicate an increase due to Global warming”.
To test Ho, we need to find out whether the average apparent temperature for a given month (say, April) and the average humidity for that same month have increased from 2006 to 2016. This monthly analysis has to be done for all 12 months over the 10-year period.
Link for the dataset: https://www.kaggle.com/muthuj7/weather-dataset
Link for Jupyter Notebook: https://github.com/riteshuppal1402/symmetrical-system
Data Preprocessing
1. Checking for NULL values
There are no NULL values under the desired columns ‘Apparent_Temperature(C)’, ‘Humidity’, and ‘Formatted Date’.
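A check like this can be sketched with pandas as follows. The small data frame is a made-up stand-in for the real dataset, and the column names are copied from the text above, so they may need adjusting to match the headers in the actual CSV file.

```python
import pandas as pd

# Toy stand-in for the weather data; the values are made up for illustration.
df = pd.DataFrame({
    "Formatted Date": ["2006-04-01 00:00:00.000 +0200",
                       "2006-04-01 01:00:00.000 +0200"],
    "Apparent_Temperature(C)": [7.39, 7.23],
    "Humidity": [0.89, 0.86],
})

# Columns needed for the analysis (names assumed from the article text).
cols = ["Formatted Date", "Apparent_Temperature(C)", "Humidity"]

# isnull() marks missing cells; sum() counts them per column.
null_counts = df[cols].isnull().sum()
print(null_counts)
```

A count of zero for every column means no rows have to be dropped or filled in before the analysis.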
2. Checking the data type
The data type of ‘Formatted Date’ is ‘object’, which has to be changed to ‘datetime’. Also, observe that the values carry different time zone offsets. To deal with this, we pass utc=True as one of the parameters of the conversion, so that all of the values are converted to UTC.
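As a minimal sketch of this conversion, assuming it is done with pandas' to_datetime: the two sample strings below are invented, and they deliberately use different offsets to show what the UTC conversion does.

```python
import pandas as pd

# Two made-up timestamps with different UTC offsets (+0100 and +0200).
df = pd.DataFrame({"Formatted Date": [
    "2006-03-31 23:00:00.000 +0100",
    "2006-04-01 00:00:00.000 +0200",
]})

# utc=True converts every value to UTC, giving one timezone-aware dtype.
df["Formatted Date"] = pd.to_datetime(df["Formatted Date"], utc=True)

print(df["Formatted Date"].dtype)
print(df["Formatted Date"])
```

Both sample strings describe the same instant, so after the conversion they hold the same UTC value. Without utc=True, mixed offsets cannot be stored in a single datetime column with one time zone.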
3. Adding ‘Month’ and ‘Year’ columns in the data frame.
4. Grouping rows by ‘Year’ and ‘Month’ and taking the average values.
The result shows the average value of every column in the data frame that has an int or float data type.
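Steps 3 and 4 can be sketched as below. The data frame is a toy example with invented values, and the column names are assumptions taken from the text, not necessarily the exact headers of the dataset.

```python
import pandas as pd

# Toy data: two readings in April 2006, one in April 2007, one in May 2007.
df = pd.DataFrame({
    "Formatted Date": pd.to_datetime(
        ["2006-04-01 10:00", "2006-04-02 10:00",
         "2007-04-01 10:00", "2007-05-01 10:00"],
        utc=True,
    ),
    "Apparent_Temperature(C)": [10.0, 12.0, 13.0, 18.0],
    "Humidity": [0.80, 0.70, 0.65, 0.60],
})

# Step 3: derive 'Year' and 'Month' from the datetime column.
df["Year"] = df["Formatted Date"].dt.year
df["Month"] = df["Formatted Date"].dt.month

# Step 4: average the two columns of interest per (Year, Month) pair.
monthly = (
    df.groupby(["Year", "Month"])[["Apparent_Temperature(C)", "Humidity"]]
    .mean()
    .reset_index()
)
print(monthly)
```

Selecting the two numeric columns before calling mean() keeps text columns out of the average. In recent pandas versions, mean(numeric_only=True) is another way to restrict the result to numeric columns.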
Data Analysis
Now we will plot line charts of ‘Apparent Temperature(C)’ and ‘Humidity’ for each month over the 10 years. To understand the trend mathematically, we will look at the equation of the trend (regression) line, with x taken to be the year, to get enough evidence to reject or not reject Ho.
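One way to obtain such a trend line is a degree-1 least-squares fit with NumPy, sketched below. This is an assumption about the method, not necessarily what the notebook does, and the yearly averages are made-up values for a single month.

```python
import numpy as np

# Hypothetical yearly averages of apparent temperature for one month.
years = np.arange(2006, 2017)  # 2006 through 2016
temps = np.array([11.2, 12.0, 11.5, 12.4, 11.9, 12.1,
                  12.8, 12.3, 12.6, 13.0, 12.7])

# Degree-1 least-squares fit: temps is approximated by slope * year + intercept.
slope, intercept = np.polyfit(years, temps, 1)
trend = np.poly1d([slope, intercept])

# Printing a poly1d object shows the line in "a x + b" form.
print(trend)
print("increasing" if slope > 0 else "not increasing")
```

The sign of the slope is what the month-by-month comparison below relies on: a positive slope means the fitted value rises from year to year, a negative slope means it falls.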
1. January
Equation of trend line for Temperature Plot is
y = 0.0962 x - 192.6
The slope is positive, indicating an increase in apparent temperature with time.
Equation of trend line for Humidity Plot is
y = 0.002387 x - 3.949
The slope is positive, indicating an increase in humidity with time.
2. February
Equation of trend line for Temperature Plot is
y = 0.1945 x - 388.9
The slope is positive, indicating an increase in apparent temperature with time.
Equation of trend line for Humidity Plot is
y = 0.002891 x - 4.999
The slope is positive, indicating an increase in humidity with time.
3. March
Equation of trend line for Temperature Plot is
y = 0.1264 x - 247.2
The slope is positive, indicating an increase in apparent temperature with time.
Equation of trend line for Humidity Plot is
y = -0.00218 x + 5.087
The slope is negative, indicating a decrease in humidity with time.
4. April
Equation of trend line for Temperature Plot is
y = 0.01305 x - 13.48
The slope is positive, indicating an increase in apparent temperature with time.
Equation of trend line for Humidity Plot is
y = -0.001273 x + 3.201
The slope is negative, indicating a decrease in humidity with time.
5. May
Equation of trend line for Temperature Plot is
y = -0.02658 x + 70.33
The slope is negative, indicating a decrease in apparent temperature with time.
Equation of trend line for Humidity Plot is
y = 0.003475 x - 6.297
The slope is positive, indicating an increase in humidity with time.
6. June
Equation of trend line for Temperature Plot is
y = 0.02011 x - 19.72
The slope is positive, indicating an increase in apparent temperature with time.
Equation of trend line for Humidity Plot is
y = -0.001182 x + 3.064
The slope is negative, indicating a decrease in humidity with time.
7. July
Equation of trend line for Temperature Plot is
y = -0.03355 x + 90.43
The slope is negative, indicating a decrease in apparent temperature with time.
Equation of trend line for Humidity Plot is
y = 0.006731 x - 12.9
The slope is positive, indicating an increase in humidity with time.
8. August
Equation of trend line for Temperature Plot is
y = 0.07678 x - 132.1
The slope is positive, indicating an increase in apparent temperature with time.
Equation of trend line for Humidity Plot is
y = 0.001612 x - 2.608
The slope is positive, indicating an increase in humidity with time.
9. September
Equation of trend line for Temperature Plot is
y = 0.1573 x - 298.8
The slope is positive, indicating an increase in apparent temperature with time.
Equation of trend line for Humidity Plot is
y = 0.003366 x - 6.08
The slope is positive, indicating an increase in humidity with time.
10. October
Equation of trend line for Temperature Plot is
y = -0.07068 x + 153.5
The slope is negative, indicating a decrease in apparent temperature with time.
Equation of trend line for Humidity Plot is
y = 0.008141 x - 15.6
The slope is positive, indicating an increase in humidity with time.
11. November
Equation of trend line for Temperature Plot is
y = 0.03669 x - 67.19
The slope is positive, indicating an increase in apparent temperature with time.
Equation of trend line for Humidity Plot is
y = 0.003554 x - 6.32
The slope is positive, indicating an increase in humidity with time.
12. December
Equation of trend line for Temperature Plot is
y = 0.09122 x - 181.9
The slope is positive, indicating an increase in apparent temperature with time.
Equation of trend line for Humidity Plot is
y = 0.0002569 x + 0.3556
The slope is positive, indicating an increase in humidity with time.
Summary
Only six of the twelve months (January, February, August, September, November, and December) show an increase in both apparent temperature and humidity over the 10 years. The other six show a decrease in one of the two quantities, so the data does not favor the Null Hypothesis, and hence we reject it.
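The tally behind this conclusion can be reproduced from the slopes listed in the Data Analysis section. The sketch below only restates those reported slopes; the dictionary layout and variable names are illustrative choices, not part of the original notebook.

```python
# (temperature slope, humidity slope) per month, as reported above.
slopes = {
    "January": (0.0962, 0.002387),
    "February": (0.1945, 0.002891),
    "March": (0.1264, -0.00218),
    "April": (0.01305, -0.001273),
    "May": (-0.02658, 0.003475),
    "June": (0.02011, -0.001182),
    "July": (-0.03355, 0.006731),
    "August": (0.07678, 0.001612),
    "September": (0.1573, 0.003366),
    "October": (-0.07068, 0.008141),
    "November": (0.03669, 0.003554),
    "December": (0.09122, 0.0002569),
}

# Months where both trend lines slope upward.
both_up = [m for m, (t, h) in slopes.items() if t > 0 and h > 0]
# Months where exactly one of the two quantities trends downward.
temp_down = [m for m, (t, h) in slopes.items() if t < 0]
humidity_down = [m for m, (t, h) in slopes.items() if h < 0]

print("Both increasing:", both_up)
print("Temperature decreasing:", temp_down)
print("Humidity decreasing:", humidity_down)
```

This comparison looks only at the sign of each slope. It does not say whether a slope differs significantly from zero, which would need a separate statistical test.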
I am thankful to the mentors at https://internship.suvenconsultants.com (www.suvenconsultants.com) for providing awesome problem statements and giving many of us a coding internship experience.