[Part-1] Earthquake Data Analysis

Ritesh Uppal
May 2, 2022

For this analysis, we’ll use the ‘Significant Earthquakes, 1965–2016’ dataset, compiled by the National Earthquake Information Center (NEIC). It records the date, time, location, depth, magnitude, and source of every earthquake with a reported magnitude of 5.5 or higher since 1965.

Fig 1 : Destruction from an earthquake

Step-1: Preprocessing the data

a) The dataset was validated for null values, columns insignificant for analysis were dropped, and the ‘Type’ column values were filtered to only include ‘Earthquake’ entries. Check out the code for additional information.
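The preprocessing described above could look roughly like the following minimal sketch. The function name `preprocess`, the list of kept columns, and the choice to drop rows with missing values are assumptions for illustration, and the sample rows are made up rather than taken from the dataset.

```python
import pandas as pd

def preprocess(df, keep_columns):
    """Keep earthquake rows and the listed columns, then drop rows with nulls."""
    # Keep only rows whose 'Type' is exactly 'Earthquake'
    quakes = df[df['Type'] == 'Earthquake']
    # Keep only the columns needed for the analysis (this list is an assumption)
    quakes = quakes[keep_columns]
    # Drop rows that still have missing values in the kept columns
    return quakes.dropna().reset_index(drop=True)

# Tiny made-up sample with the same kind of columns as the dataset
sample = pd.DataFrame({
    'Date': ['01/02/1965', '01/04/1965', '01/05/1965'],
    'Type': ['Earthquake', 'Nuclear Explosion', 'Earthquake'],
    'Depth': [131.6, 0.0, None],
    'Magnitude': [6.0, 5.8, 6.2],
    'Depth Error': [None, None, None],
})
clean = preprocess(sample, ['Date', 'Type', 'Depth', 'Magnitude'])
print(clean)
```

Running `sample.isnull().sum()` before calling the function is a quick way to see which columns are mostly empty and are candidates for dropping.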

Fig 2: Dataset after initial preprocessing

b) Reverse geocoding was done to convert Latitude and Longitude information to a two-letter country code. To access complete names from country codes, we will use another dataset.

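# Python3 program for reverse geocoding
# 'rg' is assumed to be the reverse_geocoder package
import reverse_geocoder as rg

def convert_to_country(df):
    for index, row in df.iterrows():
        coordinates = (row['Latitude'], row['Longitude'])
        # Look up the nearest known place and keep its two-letter country code
        result = rg.search(coordinates)
        df.at[index, 'Country'] = result[0]['cc']
    return df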
Fig 3: Resultant dataset after reverse geocoding
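To turn the two-letter codes into complete country names, one option is a left merge against a lookup table. This is a sketch under assumptions: the `country_names` table, its `Code` and `Name` columns, and the sample rows are hypothetical stand-ins for the second dataset, which is not shown here.

```python
import pandas as pd

# Hypothetical lookup table; in practice this would be loaded from a separate dataset
country_names = pd.DataFrame({
    'Code': ['ID', 'JP', 'VU'],
    'Name': ['Indonesia', 'Japan', 'Vanuatu'],
})

# Made-up rows standing in for the reverse-geocoded earthquake data
quakes = pd.DataFrame({
    'Magnitude': [7.1, 6.9, 7.4, 6.0],
    'Country': ['JP', 'ID', 'VU', 'XX'],
})

# A left merge keeps every earthquake row, even when the code has no match
named = quakes.merge(country_names, how='left', left_on='Country', right_on='Code')
named = named.drop(columns='Code')
print(named)
```

Codes missing from the lookup table end up with a missing `Name`, which makes unmatched rows easy to spot.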

Step-2: Data Analysis

1. How is the magnitude of the earthquake distributed?
import plotly.figure_factory as ff
fig = ff.create_distplot([df['Magnitude']], ['Earthquake Magnitude'])
fig.show()
Fig 4: Earthquake magnitude distribution

We can deduce from this graph that significant earthquakes (magnitude ≥ 5.5) have mostly fallen in the 5.5–6 range, and that magnitudes larger than 7.5 are far less likely. If data for magnitude < 5.5 were also included, the distribution might look like a normal distribution with a mean of about 5.6, although this dataset alone cannot confirm that.

2. March is Earthquake Month, and Other Shaky ‘Facts.’

A glance at descriptive geological statistics would lead one to conclude that March is an earthquake month, because it has the most earthquakes compared to other months. However, a closer examination of the data reveals that the number of earthquakes occurring in each month is not significantly different. According to studies, there is no link between earthquake activity and months or seasons [1].

import datetime

df['DateTime'] = df['Date'] + " " + df['Time']

def month(df):
    # Parse each timestamp and store the full month name (e.g. "March")
    for index, row in df.iterrows():
        datem = datetime.datetime.strptime(row['DateTime'], "%m/%d/%Y %H:%M:%S")
        df.at[index, 'Month'] = datem.strftime("%B")
    return df
Fig 5: Resultant data frame after adding the ‘Month’ column
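As an alternative to looping over rows, the same ‘Month’ column can be derived in one vectorized step. This is a sketch rather than the code used for the figures. The sample dates are made up, and the format string assumes every row follows the month/day/year pattern.

```python
import pandas as pd

# Made-up rows in the same Date/Time layout as the dataset
sample = pd.DataFrame({
    'Date': ['01/02/1965', '03/28/1965', '12/26/2004'],
    'Time': ['13:44:18', '16:33:16', '00:58:53'],
})

# Parse all timestamps at once instead of row by row
stamps = pd.to_datetime(sample['Date'] + ' ' + sample['Time'],
                        format='%m/%d/%Y %H:%M:%S')
# dt.month_name() gives the full English month name, like strftime("%B")
sample['Month'] = stamps.dt.month_name()
print(sample)
```

On a data frame with tens of thousands of rows, this usually runs much faster than `iterrows()`.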
import plotly.express as px
fig = px.histogram(df, x="Month")
fig.show()
Fig 6: Month vs Earthquake count plot
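One way to check whether monthly counts differ significantly is a chi-square goodness-of-fit test against counts proportional to the number of days in each month. The sketch below is only an illustration. The function name is hypothetical, leap years are ignored, and the toy counts are made up, so the output says nothing about the real dataset.

```python
# Days per month, January..December (leap years ignored for simplicity)
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]

# Chi-square critical value for 11 degrees of freedom at the 5% level
CRITICAL_11_DOF_5PCT = 19.675

def chi_square_by_month(counts):
    """Chi-square statistic of 12 monthly counts vs. day-proportional counts."""
    total = sum(counts)
    n_days = sum(DAYS_IN_MONTH)
    stat = 0.0
    for observed, days in zip(counts, DAYS_IN_MONTH):
        # Expected count if earthquakes were spread evenly over the calendar
        expected = total * days / n_days
        stat += (observed - expected) ** 2 / expected
    return stat

# Toy counts: perfectly even, then with an artificial March excess
even = [days * 10 for days in DAYS_IN_MONTH]
skewed = list(even)
skewed[2] += 100

for name, counts in [('even', even), ('skewed', skewed)]:
    stat = chi_square_by_month(counts)
    print(name, round(stat, 2), 'significant' if stat > CRITICAL_11_DOF_5PCT else 'not significant')
```

A statistic below the critical value means the month-to-month differences are within what chance alone would produce at the 5% level.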

3. Why are we seeing more earthquakes?

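import pandas as pd

# pd.Grouper needs a datetime column, so convert 'DateTime' if it is still a string
df['DateTime'] = pd.to_datetime(df['DateTime'], format="%m/%d/%Y %H:%M:%S")
Year = list(range(1965, 2017))
Earthquake_count = df.groupby(pd.Grouper(key='DateTime', freq='Y')).count()['ID'].values
fig = px.line(x=Year, y=Earthquake_count, title='Earthquake vs Year Plot')
fig.show()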
Fig 7: Earthquake count vs Year plot

The number of recorded earthquakes is certainly growing over the years, as shown in this graph. But wait, before you imagine the world’s end is near, there is more to the story! The USGS has an answer for this:

The increased number of earthquakes in recent years is due to the availability of more seismic sensors that can record more earthquakes, not because there are more earthquakes. Every year, the National Earthquake Information Center records over 20,000 earthquakes around the world, or about 55 per day. The public now knows about earthquakes more swiftly than ever before as a result of improved communications and greater interest in natural disasters.

Before you delete your #EndIsComing tweet, let me throw some more information at you! You could argue: what about population growth, plate tectonics, explosions, mining, and a slew of other factors? Let’s take a look at one more plot to help clarify things. I’ve split earthquake magnitude into three categories here.

Fig 8: Earthquake count vs year plot with magnitude label
Fig 9: Number of earthquakes vs Year plot with two labels
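Counting earthquakes per year and magnitude category can be sketched with `pd.cut` and `pd.crosstab`. The bin edges and labels below are assumptions, since the exact three categories are not spelled out above, and the sample rows are made up.

```python
import pandas as pd

# Made-up rows; the real data frame would supply 'Year' and 'Magnitude'
sample = pd.DataFrame({
    'Year': [1965, 1965, 1965, 2016, 2016, 2016, 2016],
    'Magnitude': [5.6, 6.3, 7.2, 5.5, 5.9, 6.1, 7.0],
})

# Hypothetical category boundaries: [5.5, 6.0), [6.0, 7.0), [7.0, inf)
bins = [5.5, 6.0, 7.0, float('inf')]
labels = ['5.5-5.9', '6.0-6.9', '7.0+']

# right=False makes each bin include its lower edge and exclude its upper edge
sample['Category'] = pd.cut(sample['Magnitude'], bins=bins,
                            labels=labels, right=False).astype(str)

# One row per year, one column per magnitude category
counts = pd.crosstab(sample['Year'], sample['Category'])
print(counts)
```

Each column of `counts` can then be plotted as its own line against the year.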

While the number of recorded earthquakes is undoubtedly increasing, the number of large-magnitude earthquakes (>7) has remained relatively constant over time. We expect about 16 major earthquakes in any given year: 15 earthquakes in the magnitude 7 range and one earthquake of magnitude 8.0 or greater [2].

4. Where do most earthquakes of magnitude 6.9 or greater occur?

df_not_low = df.query('Magnitude >= 6.9')
df_not_low['Country'].value_counts()
Fig 10: Countries ordered (high to low) by number of earthquakes of magnitude 6.9 or greater

The top five countries with the most earthquakes of magnitude 6.9 or greater are Indonesia, Japan, Vanuatu, Tonga, and Papua New Guinea, in that order. Let’s use Google Maps to plot these countries.

Fig 11: Plotting the top five countries by number of earthquakes of magnitude 6.9 or greater

Have you noticed anything unusual about their location? They’re all around the Pacific Ocean, or, to be more precise, the ‘RING OF FIRE’ 🔥! Maintain your composure! In the following part, we’ll go through it in further depth.

5. Depth of an earthquake is more than just a number

Quakes can strike near the surface or deep within the earth. Shallow earthquakes have a depth of 0–70 km (0–43 miles), intermediate earthquakes have a depth of 70–300 km (43–186 miles), and deep earthquakes have a depth of 300–700 km (186–434 miles) [3].
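The three depth classes can be written as a small helper. This is a sketch with a hypothetical function name. Assigning a boundary value such as exactly 70 km to the deeper class, and treating negative reported depths as shallow, are assumptions.

```python
def classify_depth(depth_km):
    """Return 'shallow', 'intermediate', or 'deep' for a depth in kilometres."""
    # Below 70 km, including very shallow events reported with a negative depth
    if depth_km < 70:
        return 'shallow'
    # From 70 km up to, but not including, 300 km
    if depth_km < 300:
        return 'intermediate'
    # 300 km and deeper
    return 'deep'

for depth in [10, 70, 299.9, 300, 650]:
    print(depth, classify_depth(depth))
```

With pandas, the helper can be applied to a whole column with `df['Depth'].apply(classify_depth)`.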

Fig 12: Descriptive statistics for the ‘Depth’ column

Have you seen anything unusual? Negative depths also exist (the minimum value here is -1.1). Does this imply that two chunks of crust are slipping past one another at or above the earth’s surface? Huh!? The USGS has a response to this:

The error bars to calculate depth are generally larger than the variation due to different depth determination methods and difficulty to calculate depth. When the earthquake depth is very shallow, it can be reported as a negative depth.

import matplotlib.pyplot as plt
plt.hist(df['Depth'], bins=list(range(0, 701, 70)))
Fig 13: Depth of an earthquake bar plot

We can see that most earthquakes lie in the range of 0–70 km, hence being shallow. Shallow quakes generally tend to be more damaging than deeper quakes. Seismic waves from deep quakes have to travel farther to the surface, losing energy. While deep earthquakes may be less destructive, they’re usually more widely felt [4].

6. Plotting geospatial data

I’ll use the keplergl package to plot geographical data in Google Colab. To learn more about how it works and how you can use layers, filters, or other customized settings, please check this article.

from keplergl import KeplerGl
map_1 = KeplerGl(height=600)
map_1.add_data(data=df, name='Earthquakes Visualization')
map_1
Fig 14: Heatmap depicting the number of earthquakes from 1965 to 2016

The intensity of the heatmap here is based on the magnitude of the earthquake, where yellow represents high magnitude and red represents low magnitude. If you observe carefully, you will see a horseshoe-shaped belt, about 40,000 km long, around the Pacific Ocean. This is called the Ring of Fire!

Fig 15: Ring of fire

About 76% of the Earth’s seismic energy is released as earthquakes in the Ring of Fire. The Ring of Fire contains approximately 850–1,000 volcanoes that have been active during the last 11,700 years (about two-thirds of the world’s total) [5]. About 90% of the Earth’s earthquakes and about 81% of the world’s largest earthquakes occur along the Ring of Fire.

Keep an eye out for more articles like this! Check out Part-2 here.

References:

  1. https://www.livescience.com/6887-march-earthquake-month-shaky-facts.html
  2. https://www.usgs.gov/faqs/why-are-we-having-so-many-earthquakes-has-naturally-occurring-earthquake-activity-been
  3. https://www.vedantu.com/geography/shallow-intermediate-and-deep-foci-earthquakes
  4. https://phys.org/news/2016-08-difference-shallow-deep-earthquakes.html
  5. https://en.wikipedia.org/wiki/Ring_of_Fire
