Performing Analysis of Meteorological Data

Ritesh Uppal
6 min read · Apr 6, 2021

In this article, we analyze meteorological data to test a given hypothesis. The Null Hypothesis (Ho) is: “The apparent temperature and humidity, compared monthly across 10 years of data, indicate an increase due to global warming.”

Ho means we need to determine whether the average apparent temperature for a given month (say, April) and the average humidity for that same month increased from 2006 to 2016. This monthly analysis has to be repeated for all 12 months over the 10-year period.

Data Preprocessing

First 8 rows of the dataset of shape (96453, 11)

1. Checking for NULL values

There are no NULL values under the desired columns ‘Apparent_Temperature(C)’, ‘Humidity’, and ‘Formatted Date’.

Column names and null values present in them
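
A minimal sketch of this check, assuming the data is loaded into a pandas data frame. The tiny frame below stands in for the real dataset, and both the column spellings and the 'Precip Type' column with a gap are assumptions made for illustration:

```python
import numpy as np
import pandas as pd

# Tiny stand-in for the real dataset (column names are assumed).
df = pd.DataFrame({
    "Formatted Date": ["2006-04-01 00:00:00.000 +0200",
                       "2006-04-01 01:00:00.000 +0200",
                       "2006-11-01 00:00:00.000 +0100"],
    "Apparent Temperature (C)": [7.39, 7.23, 3.10],
    "Humidity": [0.89, 0.86, 0.92],
    "Precip Type": ["rain", np.nan, "rain"],  # hypothetical column with one gap
})

# Count the missing values in each column.
null_counts = df.isnull().sum()
print(null_counts)

# The three columns the analysis relies on should have no gaps.
needed = ["Formatted Date", "Apparent Temperature (C)", "Humidity"]
missing_in_needed = int(null_counts[needed].sum())
```

If `missing_in_needed` is zero, no rows have to be dropped or imputed before the monthly averages are computed.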

2. Checking the data type

The data type of ‘Formatted Date’ is ‘object’, which has to be changed to ‘datetime’. Also, observe that the values carry different time zone offsets. To deal with this, we pass utc=True as one of the parameters of the conversion so that all values are converted to UTC.

The data type of ‘Formatted Date’ changed to ‘datetime64’
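
The conversion can be sketched with `pd.to_datetime`. The two sample strings below are made up, with an offset format assumed to resemble the dataset's:

```python
import pandas as pd

# Two made-up timestamps with different UTC offsets.
raw = pd.Series(["2006-04-01 00:00:00.000 +0200",
                 "2006-11-01 00:00:00.000 +0100"],
                name="Formatted Date")

# utc=True converts every value to UTC, giving one timezone-aware dtype.
converted = pd.to_datetime(raw, utc=True)
print(converted.dtype)
print(converted)
```

Note that the conversion shifts the clock time: midnight at +02:00 becomes 22:00 on the previous day in UTC, so a few records near a month boundary can land in the neighboring month.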

3. Adding ‘Month’ and ‘Year’ columns in the data frame.

Data frame after addition of two new columns
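
One way to derive the two columns is the `.dt` accessor of a datetime column. This is a sketch on made-up rows, not necessarily the exact code used for the figure:

```python
import pandas as pd

# Made-up rows; 'Formatted Date' is already a UTC datetime column here.
df = pd.DataFrame({
    "Formatted Date": pd.to_datetime(
        ["2006-04-15 12:00:00+00:00", "2016-12-31 23:00:00+00:00"], utc=True),
    "Humidity": [0.89, 0.75],
})

# Extract the calendar year and month number (1-12) from each timestamp.
df["Year"] = df["Formatted Date"].dt.year
df["Month"] = df["Formatted Date"].dt.month
print(df)
```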

4. Grouping rows by ‘Year’ and ‘Month’ and taking the average values.

The result contains the mean of every column with an int or float data type in the data frame.

Mean values of columns grouped by ‘Year’ and ‘Month’
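
The grouping step might look like the sketch below. The values are invented, and the text column 'Summary' is included only to show that non-numeric columns are left out of the mean:

```python
import pandas as pd

# Invented sample rows with the 'Year' and 'Month' columns already added.
df = pd.DataFrame({
    "Year": [2006, 2006, 2006, 2007],
    "Month": [4, 4, 5, 4],
    "Apparent Temperature (C)": [10.0, 12.0, 18.0, 13.0],
    "Humidity": [0.60, 0.70, 0.65, 0.62],
    "Summary": ["Clear", "Foggy", "Clear", "Clear"],  # text column, skipped
})

# One row per (Year, Month) pair holding the mean of each numeric column.
monthly = df.groupby(["Year", "Month"]).mean(numeric_only=True)
print(monthly)

# Mean apparent temperature for April 2006 in this sample.
april_2006_temp = monthly.loc[(2006, 4), "Apparent Temperature (C)"]
```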

Data Analysis

Now we will plot a line chart of ‘Apparent Temperature(C)’ and ‘Humidity’ for each month over the 10 years. To understand the trend mathematically, we will also look at the equation of the trend (regression) line for each plot, which gives us evidence to reject or not reject Ho. In each equation, x is the year and the coefficient of x is the slope, that is, the average change per year.
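
A trend line of this kind can be fitted with a degree-1 least-squares polynomial. The sketch below uses NumPy's `polyfit` and `poly1d` on invented values that lie exactly on a line. The article does not state which tool produced its equations, so the choice of NumPy here is an assumption:

```python
import numpy as np

# Invented yearly averages for a single month, lying exactly on a line.
years = np.arange(2006, 2017)        # 2006 through 2016
avg_temp = 0.1 * years - 190.0       # made-up data, not from the dataset

# Fit y = slope * x + intercept by least squares.
slope, intercept = np.polyfit(years, avg_temp, deg=1)

# poly1d wraps the coefficients so the line can be printed and evaluated.
trend = np.poly1d([slope, intercept])
print(trend)

# The sign of the slope tells whether the series rises or falls over time.
direction = "increase" if slope > 0 else "decrease"
```

The same fit would be repeated for humidity and for each of the 12 months, using that month's row from each year.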

1. January

Equation of trend line for Temperature Plot is  
0.0962 x - 192.6
The slope is positive, indicating an increase in apparent temperature with time.
Equation of trend line for Humidity Plot is  
0.002387 x - 3.949
The slope is positive, indicating an increase in humidity with time.

2. February

Equation of trend line for Temperature Plot is  
0.1945 x - 388.9
The slope is positive, indicating an increase in apparent temperature with time.
Equation of trend line for Humidity Plot is  
0.002891 x - 4.999
The slope is positive, indicating an increase in humidity with time.

3. March

Equation of trend line for Temperature Plot is  
0.1264 x - 247.2
The slope is positive, indicating an increase in apparent temperature with time.
Equation of trend line for Humidity Plot is  
-0.00218 x + 5.087
The slope is negative, indicating a decrease in humidity with time.

4. April

Equation of trend line for Temperature Plot is 
0.01305 x - 13.48
The slope is positive, indicating an increase in apparent temperature with time.
Equation of trend line for Humidity Plot is  
-0.001273 x + 3.201
The slope is negative, indicating a decrease in humidity with time.

5. May

Equation of trend line for Temperature Plot is  
-0.02658 x + 70.33
The slope is negative, indicating a decrease in apparent temperature with time.
Equation of trend line for Humidity Plot is  
0.003475 x - 6.297
The slope is positive, indicating an increase in humidity with time.

6. June

Equation of trend line for Temperature Plot is
0.02011 x - 19.72
The slope is positive, indicating an increase in apparent temperature with time.

Equation of trend line for Humidity Plot is
-0.001182 x + 3.064
The slope is negative, indicating a decrease in humidity with time.

7. July

Equation of trend line for Temperature Plot is  
-0.03355 x + 90.43
The slope is negative, indicating a decrease in apparent temperature with time.
Equation of trend line for Humidity Plot is  
0.006731 x - 12.9
The slope is positive, indicating an increase in humidity with time.

8. August

Equation of trend line for Temperature Plot is  
0.07678 x - 132.1
The slope is positive, indicating an increase in apparent temperature with time.
Equation of trend line for Humidity Plot is  
0.001612 x - 2.608
The slope is positive, indicating an increase in humidity with time.

9. September

Equation of trend line for Temperature Plot is  
0.1573 x - 298.8
The slope is positive, indicating an increase in apparent temperature with time.
Equation of trend line for Humidity Plot is  
0.003366 x - 6.08
The slope is positive, indicating an increase in humidity with time.

10. October

Equation of trend line for Temperature Plot is  
-0.07068 x + 153.5
The slope is negative, indicating a decrease in apparent temperature with time.
Equation of trend line for Humidity Plot is  
0.008141 x - 15.6
The slope is positive, indicating an increase in humidity with time.

11. November

Equation of trend line for Temperature Plot is  
0.03669 x - 67.19
The slope is positive, indicating an increase in apparent temperature with time.
Equation of trend line for Humidity Plot is  
0.003554 x - 6.32
The slope is positive, indicating an increase in humidity with time.

12. December

Equation of trend line for Temperature Plot is 
0.09122 x - 181.9
The slope is positive, indicating an increase in apparent temperature with time.
Equation of trend line for Humidity Plot is  
0.0002569 x + 0.3556
The slope is positive, indicating an increase in humidity with time.

Summary

Only six months (January, February, August, September, November, and December) show an increase in both apparent temperature and humidity over the 10 years. Each of the other six months shows a decrease in at least one of the two quantities, so the data doesn’t favor the Null Hypothesis and hence we reject it.
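
The tally can be checked by collecting the slopes reported above and filtering on their signs. The slope values are copied from the trend-line equations in this article. The dictionary layout and variable names are only illustrative:

```python
# (temperature slope, humidity slope) per month, from the equations above.
slopes = {
    "January": (0.0962, 0.002387),
    "February": (0.1945, 0.002891),
    "March": (0.1264, -0.00218),
    "April": (0.01305, -0.001273),
    "May": (-0.02658, 0.003475),
    "June": (0.02011, -0.001182),
    "July": (-0.03355, 0.006731),
    "August": (0.07678, 0.001612),
    "September": (0.1573, 0.003366),
    "October": (-0.07068, 0.008141),
    "November": (0.03669, 0.003554),
    "December": (0.09122, 0.0002569),
}

# Months where both trend lines slope upward.
both_up = [m for m, (t, h) in slopes.items() if t > 0 and h > 0]

# Months where each quantity rises on its own.
temp_up = [m for m, (t, _) in slopes.items() if t > 0]
humidity_up = [m for m, (_, h) in slopes.items() if h > 0]

print(both_up)
print(len(temp_up), len(humidity_up))
```

Taken separately, each quantity has a positive slope in nine of the twelve months, but the two agree in only six.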

I am thankful to mentors at https://internship.suvenconsultants.com for providing awesome problem statements and giving many of us a Coding Internship Experience. Thank you www.suvenconsultants.com
