Article
Small Earthquakes Can Help Predict Large Earthquakes:
A Machine Learning Perspective
Xi Wang 1, Zeyuan Zhong 1, Yuechen Yao 1, Zexu Li 1, Shiyong Zhou 2, Changsheng Jiang 3 and Ke Jia 1,4,*

1 School of Automation, Northwestern Polytechnical University, Xi’an 710129, China


2 School of Earth and Space Science, Peking University, Beijing 100871, China
3 Institute of Geophysics, China Earthquake Administration, No. 5 Minzu Daxue Nan Road, Haidian District,
Beijing 100086, China
4 Shanghai Sheshan National Geophysical Observatory, Shanghai 201602, China
* Correspondence: jk@nwpu.edu.cn

Abstract: Earthquake prediction is a long-standing problem in seismology that has garnered attention
from the scientific community and the public. Despite ongoing efforts to understand the physical
mechanisms of earthquake occurrence, there is no convincing physical or statistical model for pre-
dicting large earthquakes. Machine learning methods, such as random forest and long short-term
memory (LSTM) neural networks, excel at identifying patterns in large-scale databases and offer
a potential means to improve earthquake prediction performance. Differing from physical and
statistical approaches to earthquake prediction, we explore whether small earthquakes can be used
to predict large earthquakes within the framework of machine learning. Specifically, we attempt to
answer two questions for a given region: (1) Is there a likelihood of a large earthquake (e.g., M ≥ 6.0)
occurring within the next year? (2) What is the maximum magnitude of an earthquake expected
to occur within the next year? Our results show that the random forest method performs best in
classifying large earthquake occurrences, while the LSTM method provides a rough estimation of
earthquake magnitude. We conclude that small earthquakes contain information relevant to predict-
ing future large earthquakes and that machine learning provides a promising avenue for improving
the prediction of earthquake occurrences.

Keywords: earthquake prediction; machine learning; random forest; long short-term memory neural network

1. Introduction
The prediction of earthquakes has long been a formidable challenge [1–3], owing to several factors. Firstly, seismic events are the result of intricate interactions between tectonic plates, faults, and other geological factors [4], rendering the accurate forecasting of their timing and magnitudes exceedingly challenging. Secondly, the paucity of long-term and extensive data poses a significant hurdle in earthquake prediction. Large earthquakes often recur at lengthy intervals (hundreds to thousands of years) [5], making it arduous to identify trends and patterns over a prolonged time frame [6]. Owing to the intricate geological interplay, the variability of seismic activity, the inadequacy of comprehensive data, and the current technological limitations, the prediction of earthquakes remains a complex and formidable field.
Moreover, conventional prediction methods based on empirical (physical or statistical) models are often oversimplified and fallacious when applied to real-life scenarios [7]. With the rapid development of artificial intelligence (AI) in recent years, many research fields have benefited, including earthquake prediction. At the core of AI lies machine learning, which plays an essential role in driving this transformation. Machine learning's ability to identify the corresponding functional relationships between vast amounts of
data and corresponding labels is its primary advantage. These relationships can be high-
dimensional and nonlinear, making it challenging for humans to comprehend, as is the
case with earthquake prediction [8,9]. Traditionally, experts’ experiences formed the basis
of earthquake prediction, which led to random and uncertain outcomes. However, the
application of machine learning to earthquake prediction provides a promising approach
to achieving more accurate and reliable results.
As early as the 1990s, some scholars proposed the use of machine learning in seismological research [10]. Currently, various machine learning methods have
been applied to seismic classification and location [11–14], seismic event prediction [15–17],
seismic early warning [18], seismic exploration [19], slow slip event detection [20], and
tomography [21], achieving promising preliminary results in these research fields. Mean-
while, the rapid deployment of long-term seismic monitoring, providing a huge amount of
seismic dataset information, accelerates the application of machine learning methods in
seismology [8,9,22].
In the research of time series and magnitude prediction, neural networks, including
LSTM and convolution neural networks, have been widely used in recent years [23,24].
In [25], the authors proposed a novel approach to earthquake prediction using LSTM
networks to capture spatiotemporal correlations among earthquakes in different locations.
Their simulation results demonstrate that their method outperforms traditional approaches.
The use of seismicity indicators as inputs in machine learning classifiers has been
shown to improve accuracy in earthquake prediction [26]. Results from applying this
methodology to four cities in Chile demonstrate that robust predictions can be made by
exhaustively exploring how certain parameters should be set up. In [27], a proposed
methodology based on the computation of seismic indicators and GP-AdaBoost classifica-
tion has been trained and tested for three regions: the Hindu Kush, Chile, and Southern
California. The obtained prediction results for these regions exhibit improvement when
compared with already available studies.
The application of artificial neural networks (ANNs) to earthquake prediction is ex-
plored in [28]. The results from the application of ANNs to Chile and the Iberian peninsula
are presented, along with a comparative analysis with other well-known classifiers. The
conclusion is that the use of a new set of inputs improved all classifiers, but the ANN
obtained better results than any other classifier.
A methodology for discovering earthquake precursors by using clustering, grouping,
constructing a precursor tree, pattern extraction, and pattern selection has been applied
to seven different datasets from three different regions [29]. Results show a remarkable
improvement in terms of all evaluated quality measures compared to the former version.
The authors suggest that this approach could be further developed and applied to other
regions with different geophysical properties to improve earthquake prediction.
In [30], the authors use machine learning techniques to detect signals in a correlation
time series corresponding to future large earthquakes. The overall quality is measured by
decision thresholds and receiver operating characteristic (ROC) methods together with
Shannon information entropy. They hope that the deep learning approach will be more
general than previous methods and not require prior guesswork as to what patterns
are important.
The seismic activity parameters constructed based on seismic catalogs are typically
used as the input data set for earthquake prediction. However, it is still debatable whether
the information about small earthquakes (typically with a magnitude smaller than 4.0),
such as foreshocks and aftershocks, contained in these seismic features can predict large
earthquakes (typically with a magnitude larger than 6.0). In fact, the ability to successfully
predict earthquakes has been the subject of controversy [1,3] among researchers. Some
studies have shown that information about small earthquakes can indicate the occurrence
of large earthquakes, while others have drawn opposite conclusions [31,32].
To clarify this debate, a study was conducted by using seismic features based on seismic catalogs containing small earthquakes to test whether this information can help predict large earthquakes. Additionally, the possibility of using machine learning in earthquake prediction was explored. The study focused on two important questions:
1. Will there be a strong earthquake (M ≥ 6.0, 7.0, or 8.0) in the next year?
2. What will be the maximum magnitude of the earthquake in the next year?
The Sichuan–Yunnan region, as shown in Figure 1, was chosen as the research area, and seismic features were extracted from the seismic catalog to predict earthquakes. The traditional machine learning methods were used to classify whether there would be strong earthquakes in the next year, and the LSTM network was used to estimate the maximum magnitude of the earthquake in the next year. In this way, we explored the potential applications of machine learning methods in earthquake prediction and provided a possible solution for seismic hazard evaluation.

Figure 1. Spatial distribution of seismicity in the Chuandian region from 1970 to 2021. The black lines represent active faults [33]. The yellow circles represent earthquakes larger than 3.0, the purple circles show earthquakes larger than 6.0, and the red stars are earthquakes larger than 7.0.

2. Dataset and Feature Engineering
The seismic catalog used in this study was obtained from the China Earthquake Data Center (CEDC, http://data.earthquake.cn/index.html, last accessed on 23 May 2022) and includes earthquake events with a magnitude greater than 3.0 in the Sichuan–Yunnan region from 1970 to 2021. By means of feature engineering, seismic activity parameters were generated based on several statistical laws. These parameters were derived from the seismic catalog and were used as the input features for earthquake prediction, rather than the original catalog itself.
To test these methods, the Chuandian region of Southwestern China (98.0° E–106.0° E, 24.0° N–32.0° N) was selected due to its abundant earthquakes. The time range for analysis was from 1 January 1970 to 23 May 2021. While there were several sudden large increases in earthquake activity due to large earthquakes during this time period, the completeness magnitude was estimated to be approximately 3.0 using the maximum curvature technique [34,35] (Figure 2). Therefore, the cutoff magnitude was set at 3.0 for this study.

Figure 2. The temporal completeness magnitude from 1970 to 2021 in the Chuandian region based on the maximum curvature technique [34,35].

Prior research has demonstrated that many seismic features derived from earthquake catalogs can be used to predict earthquakes [36–38]. These features include an 'a' value and a 'b' value in the Gutenberg–Richter law, the number and maximum/mean magnitude of past earthquakes, seismic energy release, magnitude deficit, seismic rate changes, and elapsed time since the last large earthquake. In addition, the standard deviation of the estimated b value, the deviation from the Gutenberg–Richter law, and the probability of earthquake occurrence are also calculated as seismic features. The formulas used to calculate these seismic features are listed in Table 1.

Table 1. Details of seismic features and mathematical expressions.

N.O. (number of earthquakes in the observation window).
Mag_max (maximum magnitude in the observation window): max{M_i}, when t ∈ [t_j, t_j + t_obs].
Mag_mean (mean magnitude in the observation window): (∑_i M_i)/n.
b_lsq (b value using least square regression analysis): [n ∑(M_i log N_i) − ∑M_i ∑log N_i] / [(∑M_i)² − n ∑M_i²].
a_lsq (a value using least square regression analysis): ∑(log N_i + b_lsq · M_i)/n.
b_std_lsq (standard deviation of b_lsq): 2.3 (b_lsq)² √[∑_{i=1}^{n}(M_i − Mag_mean)² / (n(n − 1))].
std_gr_lsq (deviation from the Gutenberg–Richter law using b_lsq, a_lsq): ∑(log N_i − (a_lsq − b_lsq · M_i))² / (n − 1).
b_mlk (b value using the maximum likelihood method): log e / (Mag_mean − M_c).
a_mlk (a value using the maximum likelihood method): log N + b_mlk · M_c.
b_std_mlk (standard deviation of b_mlk): 2.3 (b_mlk)² √[∑_{i=1}^{n}(M_i − Mag_mean)² / (n(n − 1))].
std_gr_mlk (deviation from the Gutenberg–Richter law using b_mlk, a_mlk): ∑(log N_i − (a_mlk − b_mlk · M_i))² / (n − 1).
dM_lsq (magnitude deficit using b_lsq, a_lsq): Mag_max − a_lsq/b_lsq.
dM_mlk (magnitude deficit using b_mlk, a_mlk): Mag_max − a_mlk/b_mlk.
Energy (seismic energy release): ∑ √(10^(12+1.8 M_i)).
x7_lsq (probability of earthquake occurrence using b_lsq): e^(−3 b_lsq / log e).
x7_mlk (probability of earthquake occurrence using b_mlk): e^(−3 b_mlk / log e).
zvalue (seismic rate change): (R_1 − R_2) / √(S_1/n_1 + S_2/n_2), where R_1 and R_2 are the seismic rates for the first and second halves of the observation window, S_1 and S_2 represent the standard deviations of the seismic rates R_1 and R_2, and n_1 and n_2 are the numbers of earthquakes in those two intervals.
beta (seismic rate change): [M(t, δ) − nδ] / √(nδ(1 − δ)), where n is the number of earthquakes in the whole seismic catalog, t is the time duration, δ is the normalized duration of interest, and M(t, δ) represents the observed number of earthquakes for end time t and interval of interest δ.
T_elaps6 (days since the last M6.0 earthquake).
T_elaps65 (days since the last M6.5 earthquake).
T_elaps7 (days since the last M7.0 earthquake).
T_elaps75 (days since the last M7.5 earthquake).
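To illustrate how the window-level features in Table 1 are obtained in practice, the following is a minimal Python sketch (not the authors' code) of the maximum-likelihood b value, the corresponding a value, and the magnitude deficit, assuming the completeness magnitude Mc = 3.0 adopted in this study; the magnitude array passed in is a hypothetical example.

```python
import numpy as np

def gr_features(mags, mc=3.0):
    """Maximum-likelihood Gutenberg-Richter features for one observation window.

    mags : 1-D array of event magnitudes (M >= mc) falling in the window.
    Returns b_mlk, a_mlk and the magnitude deficit dM_mlk (see Table 1).
    """
    mags = np.asarray(mags, dtype=float)
    n = mags.size
    mag_mean = mags.mean()
    mag_max = mags.max()

    # b value by maximum likelihood: b = log10(e) / (mean(M) - Mc)
    b_mlk = np.log10(np.e) / (mag_mean - mc)
    # a value: log10(N) + b * Mc
    a_mlk = np.log10(n) + b_mlk * mc
    # magnitude deficit: observed maximum minus the GR-expected maximum a/b
    dM_mlk = mag_max - a_mlk / b_mlk
    return b_mlk, a_mlk, dM_mlk

# Example with synthetic magnitudes
b, a, dM = gr_features(np.array([3.1, 3.4, 3.2, 4.0, 3.6, 5.1, 3.3]))
print(f"b={b:.2f}, a={a:.2f}, dM={dM:.2f}")
```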

The temporal variations of all 22 seismic features listed in Table 1 were calculated for the Chuandian region from 1970 to 2021 using a sliding window process, similarly to previous studies [39]. To predict earthquakes on a mid-term basis, the observation window, the prediction window, and the sliding window were set to be 2 years, 1 year, and 30 days, respectively (Figure 3). This resulted in 591 time steps with 22 seismic features at each step. For earthquake classification, labels were marked with either 0 or 1, and observed magnitudes were used for earthquake magnitude prediction.

Figure 3. Schematic diagram of seismic feature generation and labels using the sliding window approach. The observation window, the prediction window, and the sliding window are set to be 2 years, 1 year, and 30 days, respectively.
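As an illustration of the windowing just described, the sketch below (an assumption-laden reconstruction, not the authors' code) pairs a 2-year observation window with the following 1-year prediction window and advances in 30-day steps; `catalog` is a hypothetical pandas DataFrame with `time` and `magnitude` columns, and `extract_features` stands in for the Table 1 calculations.

```python
import pandas as pd

def build_samples(catalog, extract_features, m_thresh=6.0,
                  obs_win="730D", pred_win="365D", step="30D"):
    """Slide a 2-year observation window over the catalog in 30-day steps.

    For each step, features are computed from events inside the observation
    window, and the label is 1 if an event with M >= m_thresh occurs in the
    following 1-year prediction window (0 otherwise). The maximum observed
    magnitude in the prediction window is kept for the regression task.
    """
    obs_win, pred_win, step = map(pd.Timedelta, (obs_win, pred_win, step))
    t = catalog["time"].min()
    rows = []
    while t + obs_win + pred_win <= catalog["time"].max():
        obs = catalog[(catalog["time"] >= t) & (catalog["time"] < t + obs_win)]
        pred = catalog[(catalog["time"] >= t + obs_win) &
                       (catalog["time"] < t + obs_win + pred_win)]
        feats = extract_features(obs)                  # dict of the 22 features
        label = int((pred["magnitude"] >= m_thresh).any())
        max_mag = pred["magnitude"].max() if len(pred) else 0.0
        rows.append({**feats, "label": label, "max_mag": max_mag})
        t += step
    return pd.DataFrame(rows)
```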

3. Methods
In this study, both traditional machine learning algorithms and LSTM neural networks are employed to investigate the occurrence of strong earthquakes and predict their magnitudes, respectively. In the following sections, a brief introduction to these two approaches will be provided. Additionally, the complete earthquake prediction process is illustrated in Figure 4.

Figure 4. Flow chart of earthquake prediction.

3.1. Traditional Machine Learning Algorithms
Support vector machine (SVM) is a widely used machine learning algorithm for classification and regression problems. It constructs a decision boundary defined by a hyperplane in the feature space, which maximizes the distance between the hyperplane and the nearest training samples, thereby improving the generalization performance of the model. In high-dimensional space, SVM can efficiently handle nonlinear problems, and it performs well for data with small sample size but high dimensionality.
Logistic regression (LR) is a commonly used binary classification algorithm, mainly used to predict the probability of an output variable given an input variable. It calculates the weighted sum of the input variables and passes it through a sigmoid function to map the result to the [0, 1] interval, representing the probability. Logistic regression can use optimization algorithms such as gradient descent for parameter estimation and supports extended forms such as polynomial regression.
Decision tree (DT) is a commonly used machine learning algorithm, which makes decisions by constructing a tree-shaped model. In a decision tree, each node represents a feature, each branch represents a feature value, and each leaf node represents a decision result. By partitioning the data and selecting features, decision trees can effectively perform classification and regression tasks.
Random forest is an ensemble learning algorithm that combines multiple decision trees for classification and regression tasks. In random forests, each decision tree is trained using randomly selected samples and features, and the final result is obtained by voting or averaging. Due to its ability to reduce overfitting and improve prediction performance, random forest is widely used in various machine learning problems.
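As a minimal sketch of how these four classifiers can be set up with scikit-learn (the toolkit used later in Section 4.2.1), the snippet below uses the RBF kernel and penalty parameter C = 1.0 reported there for the SVM and leaves the other estimators at their default settings; it is illustrative only, not the authors' exact configuration.

```python
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# The four traditional classifiers compared in this study
classifiers = {
    "SVM": SVC(kernel="rbf", C=1.0, probability=True),  # RBF kernel, C = 1.0
    "LR": LogisticRegression(max_iter=1000),
    "DT": DecisionTreeClassifier(),
    "RF": RandomForestClassifier(),
}

# Each model exposes the same interface:
# clf.fit(X_train, y_train); y_pred = clf.predict(X_test)
```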
3.2. Long Short-Term Memory Neural Network
LSTM is a special kind of recurrent neural network (RNN) proposed in 1997, mainly to solve the problem of gradient disappearance and gradient explosion when training on long time series. In comparison to RNNs, LSTM performs better for longer time series data. In recent years, LSTM has been widely used in various fields, such as traffic flow prediction [40], stock yield forecast [41], trajectory prediction [42,43], and earthquake forecasting [37].
The memory unit structure of LSTM, consisting of the forget gate, the input gate, and the output gate, is illustrated in Figure 5. At the current time step, the memory unit takes in the hidden variable h_{t−1}, the memory variable C_{t−1}, and the input x_t. Then, the calculation of the forget gate, the input gate, and the output gate yields the output variables h_t and C_t, which are then fed to the next unit.

Figure 5. The structure of one memory block in LSTM neural network.

The forget gate first adds h_{t−1} and x_t and passes the result through a sigmoid function to obtain the forget factor, which is then multiplied with C_{t−1}. The forget factor is calculated as follows:
f_t = σ(W_f · [h_{t−1}, x_t] + b_f)    (1)

Similarly, the input gate first adds h_{t−1} and x_t, and passes the result through a sigmoid function and a tanh function to obtain i_t and C̃_t, respectively. The input gate then multiplies the results of the two functions and adds the product to the output of the forget gate to obtain C_t. The calculation formulas of i_t, C̃_t, and C_t are as follows:

i_t = σ(W_i · [h_{t−1}, x_t] + b_i)    (2)
C̃_t = tanh(W_C · [h_{t−1}, x_t] + b_C)    (3)
C_t = f_t × C_{t−1} + i_t × C̃_t    (4)
Finally, the output gate adds h_{t−1} and x_t, and passes the result through a sigmoid function to obtain the output factor o_t. The cell state is then passed into the tanh function and multiplied by o_t to obtain the new hidden state, which is then passed into the next unit. The calculation formulas of o_t and h_t are as follows:

o_t = σ(W_o · [h_{t−1}, x_t] + b_o)    (5)
h_t = o_t × tanh(C_t)    (6)
Here, W_f, W_i, W_C, and W_o are weight parameters, and b_f, b_i, b_C, and b_o are bias parameters.
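To make Equations (1)–(6) concrete, the following NumPy sketch implements a single memory-block update; it is for illustration only (the experiments in this study use PyTorch's built-in LSTM), and the weight shapes are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
    """One LSTM memory-block update following Equations (1)-(6).

    x_t: input at time t; h_prev/c_prev: previous hidden and cell states.
    Each W_* has shape (hidden, hidden + input); each b_* has shape (hidden,).
    """
    z = np.concatenate([h_prev, x_t])          # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)               # forget gate, Eq. (1)
    i_t = sigmoid(W_i @ z + b_i)               # input gate, Eq. (2)
    c_tilde = np.tanh(W_C @ z + b_C)           # candidate cell state, Eq. (3)
    c_t = f_t * c_prev + i_t * c_tilde         # new cell state, Eq. (4)
    o_t = sigmoid(W_o @ z + b_o)               # output gate, Eq. (5)
    h_t = o_t * np.tanh(c_t)                   # new hidden state, Eq. (6)
    return h_t, c_t
```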

4. Results
4.1. Evaluation Metrics
The prediction of the occurrence of strong earthquakes involves a two-class classifi-
cation problem, and the evaluation of its prediction results typically employs a confusion
matrix (Table 2).

Table 2. Confusion matrix of binary classification problem.

                                 Predicted Condition Is Positive     Predicted Condition Is Negative
Actual condition is positive     True Positive (TP)                   False Negative (FN)
Actual condition is negative     False Positive (FP)                  True Negative (TN)

Specific judgments regarding the performance of earthquake occurrence prediction models are typically made by calculating four evaluation indicators based on the confusion matrix:

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (7)
Precision = TP / (TP + FP)    (8)
Recall = TP / (TP + FN)    (9)
F1 = 2 × Precision × Recall / (Precision + Recall)    (10)

Additionally, ROC curves and area under curve (AUC) are used to illustrate the relationship between the aforementioned indicators and present the results.
For magnitude prediction, mean square error (MSE), mean absolute error (MAE), and root mean square error (RMSE) are calculated to evaluate the prediction accuracy of the model. MSE represents the prediction error, MAE represents the average absolute error between the predicted value and the observed value, and RMSE reflects the degree of deviation between the predicted value and the true value. The formula to calculate each index is as follows:

MSE = (1/n) ∑_{i=1}^{n} (y_i − ŷ_i)²    (11)
MAE = (1/n) ∑_{i=1}^{n} |y_i − ŷ_i|    (12)
RMSE = √[(1/n) ∑_{i=1}^{n} (y_i − ŷ_i)²]    (13)

where n is the number of predicted values, yi is the true value, and ŷi is the predicted value.
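The indicators above follow directly from their definitions; the NumPy sketch below (illustrative, not the authors' code) implements Formulas (7)–(13).

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 from Formulas (7)-(10)."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

def regression_metrics(y_true, y_hat):
    """MSE, MAE and RMSE from Formulas (11)-(13)."""
    err = np.asarray(y_true) - np.asarray(y_hat)
    mse = np.mean(err ** 2)
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(mse)
    return mse, mae, rmse
```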
4.2. Forecast Results
4.2.1. Strong Earthquake Occurrence and Classification
Figure 6 illustrates the evaluation results of four traditional machine learning models (random forest (RF), decision tree (DT), support vector machine (SVM), and logistic regression (LR)) for the classification of large earthquake occurrence, namely, accuracy, precision, recall, and F1 score. The four models were implemented by calling the respective functions in the sklearn toolkit (SVM, LogisticRegression, tree, and RandomForestClassifier in the ensemble module). For SVM, the kernel function was set to rbf, and the penalty relaxation variable was set to 1.0, with other parameters adopting default values. Pandas and NumPy were called here to read the dataset file and generate the corresponding array, respectively. Finally, matplotlib.pyplot was used to visualize the predicted results. The four evaluation indicators were calculated using Formulas (7)–(10). All tasks were completed using Python 3.9.6.

Figure 6. Results of large earthquake occurrence classification: accuracy, precision, recall, and F1 score. A magnitude range of 5.0 to 7.0 is used to represent the magnitude threshold of prediction.
The vertical axis of each subgraph in Figure 6 represents the respective evaluation metric value, while the horizontal axis represents the magnitude threshold based on the magnitude range of the catalog. For the experiment's training and test sets, the entire dataset was divided using a 7:3 ratio by calling the function train_test_split in the model_selection module of the sklearn package.
During the experiment, it was observed that using the dataset directly as input data for support vector machine and logistic regression resulted in poor classification indicators at certain magnitude thresholds. This was mainly because the test data points, which were not of the same class at these thresholds, were all assigned to one class. To address this issue, the dataset was standardized and normalized before being used as input. After comparing the experiment's results before and after normalization, it was found that the classification results improved to some extent.
Figure 7 displays the ROC curves for the classification of large earthquake occurrences. Unlike the four evaluation indicators, the ROC curve can evaluate the model without requiring a threshold to be set, providing results that better reflect the true performance of the model. Moreover, the ROC curve remains unaffected even when the distribution proportion of positive and negative samples in the test set changes. This feature is particularly important when dealing with category imbalances in actual datasets, as the ROC curve is able to effectively eliminate the impact of such imbalances on the evaluation results.
An important measure derived from the ROC curve is the area under the curve (AUC). The closer the AUC value is to 1, the better the classification performance of the model. When the AUC is 0.5, the classification result is no better than random guessing. As depicted in Figure 7, the RF classifier achieved the highest AUC value of 0.98, indicating its superior performance in earthquake prediction. In contrast, the LR classifier had the lowest AUC value of 0.72, indicating the weakest performance among the four classifiers.

Figure 7. ROC curves of the large earthquake occurrence classification using SVM, DT, LR, RF.
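A minimal sketch of the classification experiment described above (7:3 split, standardization, and ROC/AUC evaluation with scikit-learn) might look as follows; the feature matrix and labels are synthetic placeholders for the 591 windowed samples, and the snippet is illustrative rather than the authors' code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_curve, auc

# Synthetic placeholders for the 591 feature vectors (22 features) and labels
rng = np.random.default_rng(0)
X = rng.normal(size=(591, 22))
y = (rng.random(591) < 0.4).astype(int)

# 7:3 split, as described in the text
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Standardize the features before fitting
scaler = StandardScaler().fit(X_train)
X_train_s, X_test_s = scaler.transform(X_train), scaler.transform(X_test)

# Fit one of the classifiers and evaluate it with the ROC curve and AUC
clf = RandomForestClassifier().fit(X_train_s, y_train)
scores = clf.predict_proba(X_test_s)[:, 1]
fpr, tpr, _ = roc_curve(y_test, scores)
print("AUC =", auc(fpr, tpr))
```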
4.2.2. Magnitude Prediction
In the magnitude prediction process, the same 22 feature parameters were utilized as input to the LSTM model, which then outputted the maximum magnitude of the next forecast window. The training set, validation set, and test set were divided in an 8:1:1 ratio. Given the small size of the data set, we needed to be cautious of overfitting. Therefore, we performed multiple optimizations using the validation set and determined the optimal configuration of the LSTM model. Specifically, the number of hidden layers was set as 1, the number of neurons as 16, the initial learning rate as 0.01, and the number of epochs as 200, to prevent overfitting. Pandas and NumPy were also called here to read the dataset file and generate the corresponding array, respectively. Torch was imported to build the LSTM model. All figures were obtained using matplotlib.
In the data preprocessing stage, the MinMaxScaler function was utilized to normalize the data, which was then fed into the model for training. The denormalized prediction results were obtained after the model had made its predictions. The MinMaxScaler function of sklearn was called in this section. The magnitude prediction results are presented in Figures 8 and 9.
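A PyTorch sketch consistent with the configuration reported above (one hidden LSTM layer with 16 units, learning rate 0.01, 200 epochs) is given below; it is a hedged reconstruction rather than the authors' code, and the input tensors, sequence length, and scaling are placeholders.

```python
import torch
import torch.nn as nn

class MagnitudeLSTM(nn.Module):
    def __init__(self, n_features=22, hidden_size=16):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden_size, num_layers=1, batch_first=True)
        self.fc = nn.Linear(hidden_size, 1)

    def forward(self, x):                    # x: (batch, seq_len, n_features)
        out, _ = self.lstm(x)
        return self.fc(out[:, -1, :])        # predict next-window maximum magnitude

# Hypothetical MinMax-normalized inputs: 591 samples, sequence length 12, 22 features
X = torch.rand(591, 12, 22)
y = torch.rand(591, 1)

model = MagnitudeLSTM()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

for epoch in range(200):                     # 200 epochs, as in the study
    optimizer.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    optimizer.step()
```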
Figure 8. (a) Retrospective test of prediction of the maximum magnitude earthquake. The blue, green, and purple curves represent training, validation, and test period of the observations, respectively. The yellow, red, and brown curves represent training, validation, and test period of the predictions, respectively. (b) Loss curves of prediction of the maximum magnitude earthquake. The blue and red curves represent the loss of training set and test set, respectively.

Figure 9. Comparison of the predicted and observed maximum magnitudes. The black dots, red triangles, and blue crosses represent test, validation, and training dataset, respectively.

Figures 8 and 9 present the outcomes of the maximum magnitude prediction. The LSTM model can effectively capture the temporal variations of the maximum magnitudes in the training and validation sets, with most of the predicted magnitudes oscillating within a range of ±0.5 of the observed magnitude. However, on the test set, the model can only grasp the trend of the magnitude and tends to overestimate the maximum magnitude for actual events with magnitude ≤ 5.0 and underestimate the maximum magnitude for actual events with magnitude ≥ 6.0. This suggests that although the LSTM model can detect the general pattern, it tends to produce oversimple predictions for the maximum magnitude.
Figures 10 and 11 illustrate the dispersion of errors using boxplot and histogram. Notably, the error distribution of the training set is centered around 0, whereas the error distribution of the test set is much wider, centered around 0.5. In the test set, the model produces a higher quantity of positive errors than negative ones, as is evident from the histograms in Figure 11.

Figure 10. Boxplot of the errors of training set, validation set, and test set.

Figure 11. Histograms of the errors of training set, validation set, and test set in earthquake magnitude prediction using LSTM.

Table 3 presents the evaluation indicators of our model and compares them with those of other studies. The results are better than those of [36], but worse than those of [37]. The reasons for these differences will be discussed in Section 5.

Table 3. Comparison of prediction results by different researchers.

Evaluation Index     This Study     [37]         [36] (DL / RF / GLM)
MSE                  0.7347         0.015192
MAE                  0.7252         0.097173     1.15 / 0.74 / 1.03
RMSE                 0.8571         0.123256

To evaluate the importance of different features, the permutation importance method was used. This method measures the importance of features by calculating the increase of model prediction error after shuffling the time series of each feature. The advantage of this method is that it can compare the importance of different features and save more time compared to other methods.
Figure 12 shows the feature importance of our model, where the length of the bar chart represents the error of the model after shuffling the order. Longer bars indicate more important features, while the orange line represents the MAE of the model as the reference line.
Figure 12 reveals that nearly all the features used have a positive effect on the model's performance. Among them, b_std_mlk, x7_mlk, and T_elaps7 are less important, while dM_mlk, x7_lsq, N.O., and T_elaps6 are the top four most important features. These results demonstrate that magnitude deficit, probability of earthquake occurrence, number of earthquakes, and the days since the last large earthquake are crucial factors for earthquake magnitude prediction, which is consistent with physical understanding.
Figure 12. Feature importance obtained by the permutation importance method.
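The permutation importance procedure described above can be sketched as follows (illustrative only): each feature column of the test inputs is shuffled in turn and the resulting increase in MAE is recorded, with larger increases indicating more important features.

```python
import numpy as np

def permutation_importance_mae(predict, X_test, y_test, n_repeats=10, seed=0):
    """Increase in MAE after shuffling each feature column of X_test.

    predict : callable returning predictions for an input matrix.
    Returns one mean MAE increase per feature; larger values mean more important.
    """
    rng = np.random.default_rng(seed)
    base_mae = np.mean(np.abs(y_test - predict(X_test)))
    increases = np.zeros(X_test.shape[1])
    for j in range(X_test.shape[1]):
        maes = []
        for _ in range(n_repeats):
            X_perm = X_test.copy()
            rng.shuffle(X_perm[:, j])        # destroy the j-th feature's information
            maes.append(np.mean(np.abs(y_test - predict(X_perm))))
        increases[j] = np.mean(maes) - base_mae
    return increases
```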

5. Discussion
5.1. Sample Number Issue and Feature Importance
Based on the classification results presented above, it is evident that RF outperforms LR and SVM in terms of several evaluation indicators and is also marginally better than DT. However, it should be noted that the classification results may be affected by the different proportions of positive and negative samples in the dataset, which are determined by the setting of the strong earthquake threshold and can impact the prediction outcomes. To investigate this further, we experimented with several magnitudes, shown in Table 4, as the classification threshold and analyzed the number of positive and negative samples, as well as the difference in classification metrics. It is important to exercise caution when interpreting the accuracy score in cases where there is an imbalance in the number of positive and negative samples. Newer models that are insensitive to class imbalance [44] may help overcome this problem, but this is out of the scope of this study.

Table 4. Evaluation results and sample numbers of the random forest method.

Magnitude   Accuracy   Precision   Recall   F1 Score   Positive Samples   Negative Samples
5.0         0.975      0.982       0.991    0.986      549                41
5.5         0.958      0.949       0.987    0.967      386                204
6.0         0.949      0.922       0.959    0.940      258                332
6.5         0.958      0.96        0.857    0.906      141                449
7.0         0.992      0.889       1.0      0.941      60                 530

At the same time, the impact of the small number of overall samples on the classification results should also be taken into consideration. Although the evaluation metric of RF is relatively the highest among the four classification methods, indicating that it is the best classifier in the prediction of strong earthquakes, further verification is necessary to determine its reliability on different datasets due to the small number of samples in the test set and the uneven distribution of positive and negative samples. To better interpret the classification results of RF, the feature quantity was ranked from high to low based on their weights in the classification, as illustrated in Figure 13. The top three most important features are T_elaps75, T_elaps7, and dM_lsq, which is consistent with the feature importance of earthquake magnitude prediction and confirms the important role of the number of days since the last large earthquake and the magnitude deficit.
Figure 13. Feature importance ranking by the random forest method.
The poor classification results of LR and SVM in this experiment also suggest the need for further investigation into the potential impact of dataset size and feature distribution on the classification performance. To address this issue, the feature values were standardized and normalized in the dataset, and the classifier parameters were adjusted accordingly. This approach led to an improvement in the overall classification performance.

5.2. Comparison with Previous Studies
From the prediction results, it can be seen that the LSTM model only trended in the
5.2. Comparison
prediction with
results, butPrevious Studies values were significantly larger than the observed
the predicted
values. Upon comparison with two other articles in Table 3, the results were inferior to
those reported in [37]. However, after conducting replication experiments, we discovered
that this disparity could be attributed to differences in the input features used in those two
studies. Using the same input approach as in [37], a similar performance was obtained, as
shown in Figures 14 and 15.
In the replication experiment, the MSE was found to be 0.2289, while MAE was 0.4193,
and RMSE was 0.4784. The prediction results obtained using the approach of [29] are shown
in Figures 14 and 15 and appear to be better than the results obtained previously. However,
we identified a potential data leakage issue in their approach due to the overlap between
input features and output labels. Specifically, the feature “Mag_max_obs” dominated
other features, as confirmed by the feature importance shown in Figure 16. Upon careful
examination of the approach of [37], we found that they input the maximum magnitude of
the first few windows for training and then obtain the maximum magnitude for the several
windows that follow. As these windows have overlapping parts, the maximum magnitudes
of the first several windows contain some characteristics of the maximum magnitudes of
the following several windows. This data leakage problem resulted in the information of
the test set being leaked to the training set, leading to a too-low error and a too-high feature
importance of Mag_max_obs.

Figure 14. Retrospective test of prediction of the maximum magnitude earthquake using the similar input with [37]. The blue, green, and purple curves represent training, validation, and test period of the observations, respectively. The yellow, red, and brown curves represent training, validation, and test period of the predictions, respectively.

Figure 15. Comparison of the predicted and observed maximum magnitudes using the same input with [37]. The black dots, red triangles, and blue crosses represent test, validation, and training data set, respectively.

Figure 16. Feature importance for the retrospective experiment with [37].
Meanwhile, in Table 3, a comparative analysis of the results with those presented in [36] is conducted. The authors of this paper employ generalized linear models (GLM), random forest (RF), and deep neural network (DNN) to predict the maximum magnitude of future earthquakes. Unlike this study, their work benefits from sufficient training data. Remarkably, their predicted magnitudes are smaller than the observed values. Notably, the LSTM model outperforms the predictions reported in [36], underscoring the effectiveness of the model construction.
6. Conclusions
In this study, we explored the possibility of using machine learning for earthquake prediction by applying four traditional machine learning methods and the LSTM neural network to predict the occurrences and maximum magnitudes of earthquakes in the Sichuan–Yunnan region. We calculated and extracted seismicity parameters related to earthquake occurrence from the earthquake catalog and used them as input features. The results showed that the random forest method was the most effective at classifying large earthquake occurrences, and the LSTM method provided a reasonable estimation of earthquake magnitude.
The findings support the idea that small earthquakes contain information relevant to predicting future large earthquakes and offer a promising way to predict the occurrence of large earthquakes. Additionally, the findings indicate which features, consistent with their physical interpretation, are important for earthquake prediction.
While the limitations of this study should be noted, they also represent the next steps for future work. First, under this framework, earthquake swarms, which are statistically very rare [45], are difficult to predict because of the small differences between their magnitudes. Second, longer-term seismic monitoring is needed before more complex models (e.g., transformers) can be applied to further improve prediction performance. Third, the spatial locations of earthquakes are not considered in this study, although they are important for earthquake prediction; new models that can incorporate spatial information (e.g., graph neural networks) may be useful for tackling this problem in the future. Although these limitations and difficulties exist, we attempt to explore the nonlinear relations underlying earthquake prediction, one of the most difficult problems in seismology, by applying machine learning methods. This study presents a potential avenue for improving the accuracy of earthquake prediction in the future.
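Purely as an illustrative companion to these conclusions, the following sketch shows one way a random forest classifier could be trained on windowed seismicity features to flag whether a large earthquake follows within a given horizon; the synthetic data, feature set, and hyperparameters are assumptions and are not the configuration used in this study.

```python
# Illustrative sketch (not this study's pipeline): a random forest classifier
# on windowed seismicity features for large-earthquake occurrence (e.g., M >= 6.0
# within the next year). All data and settings below are synthetic placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report

rng = np.random.default_rng(1)
n_windows = 300
# Placeholder features: event count, b-value, maximum magnitude per window
X = np.column_stack([
    rng.poisson(50, n_windows),
    rng.normal(1.0, 0.1, n_windows),
    rng.normal(4.5, 0.5, n_windows),
])
y = (rng.random(n_windows) < 0.2).astype(int)  # 1 = large event in next year

# Chronological split: earlier windows for training, later windows for testing
split = int(0.8 * n_windows)
clf = RandomForestClassifier(n_estimators=200, class_weight="balanced",
                             random_state=0)
clf.fit(X[:split], y[:split])
print(classification_report(y[split:], clf.predict(X[split:]), zero_division=0))
print("feature importances:", clf.feature_importances_)
```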

Author Contributions: Methodology, X.W., Z.Z. and K.J.; Software, X.W., Z.Z., Y.Y. and K.J.;
Validation, X.W., Z.Z. and Z.L.; Formal analysis, S.Z., C.J. and K.J.; Resources, K.J.; Data curation, C.J.;
Writing—original draft, X.W. and Z.Z.; Writing—review & editing, S.Z. and K.J.; Visualization, X.W.
and Z.Z.; Supervision, K.J.; Project administration, K.J.; Funding acquisition, S.Z. and K.J. All authors
have read and agreed to the published version of the manuscript.
Funding: This research was funded by the Special Fund of the Institute of Geophysics, China
Earthquake Administration (Grant No. DQJB 22Z01-09), the National Natural Science Foundation
of China (Grant Nos. 42274068, U2039204), the Key Research and Development Project in Shaanxi
Province (Grant No. 2023-YBSF-237), and the Shanghai Sheshan National Geophysical Observatory
(Grant No. SSOP202207).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The dataset of this research can be found from the China Earthquake
Data Center (CEDC, http://data.earthquake.cn/index.html, last accessed on 23 May 2022).
Acknowledgments: We benefited from helpful discussions with Naidan Yun, Han Yue and Xiaoyan Cai. Comments and suggestions from the Editor and two anonymous reviewers are appreciated.
Conflicts of Interest: The authors declare no conflict of interest.

References
1. Geller, R.J.; Jackson, D.D.; Kagan, Y.Y.; Mulargia, F. Earthquakes cannot be predicted. Science 1997, 275, 1616. [CrossRef]
2. Knopoff, L. Earthquake prediction: The scientific challenge. Proc. Natl. Acad. Sci. USA 1996, 93, 3719–3720. [CrossRef]
3. Wyss, M. Cannot earthquakes be predicted? Science 1997, 278, 487–490. [CrossRef]
4. Sibson, R.H. Crustal stress, faulting and fluid flow. Geol. Soc. Lond. Spec. Publ. 1994, 78, 69–84. [CrossRef]
5. Liu, M.; Stein, S. Mid-continental earthquakes: Spatiotemporal occurrences, causes, and hazards. Earth-Sci. Rev. 2016, 162,
364–386. [CrossRef]
6. Field, E.H. How Physics-Based Earthquake Simulators Might Help Improve Earthquake Forecasts. Seismol. Res. Lett. 2019, 90,
467–472. [CrossRef]
7. Shi, Y.; Sun, Y.; Luo, G.; Dong, P.; Zhang, H. Roadmap for earthquake numerical forecasting in China—Reflection on the tenth
anniversary of Wenchuan earthquake. Chin. Sci. Bull. 2018, 63, 1865–1881. [CrossRef]
8. Bergen, K.J.; Johnson, P.A.; de Hoop, M.V.; Beroza, G.C. Machine learning for data-driven discovery in solid Earth geoscience.
Science 2019, 363, eaau0323. [CrossRef]
9. Beroza, G.C.; Segou, M.; Mostafa Mousavi, S. Machine learning and earthquake forecasting-next steps. Nat. Commun. 2021, 12,
4761. [CrossRef]
10. Shimshoni, Y.; Intrator, N. Classification of seismic signals by integrating ensembles of neural networks. IEEE Trans. Signal Process.
1998, 46, 1194–1201. [CrossRef]
11. Li, Z.; Meier, M.-A.; Hauksson, E.; Zhan, Z.; Andrews, J. Machine Learning Seismic Wave Discrimination: Application to
Earthquake Early Warning. Geophys. Res. Lett. 2018, 45, 4773–4779. [CrossRef]
12. Malfante, M.; Dalla Mura, M.; Metaxian, J.-P.; Mars, J.I.; Macedo, O.; Inza, A. Machine Learning for Volcano-Seismic Signals
Challenges and perspectives. IEEE Signal Process. Mag. 2018, 35, 20–30. [CrossRef]
13. Tang, L.; Zhang, M.; Wen, L. Support Vector Machine Classification of Seismic Events in the Tianshan Orogenic Belt. J. Geophys.
Res. Solid Earth 2020, 125, e2019JB018132. [CrossRef]
14. Titos, M.; Bueno, A.; Garcia, L.; Benitez, C. A Deep Neural Networks Approach to Automatic Recognition Systems for Volcano-
Seismic Events. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 1533–1544. [CrossRef]
15. Asim, K.M.; Martinez-Alvarez, F.; Basit, A.; Iqbal, T. Earthquake magnitude prediction in Hindukush region using machine
learning techniques. Nat. Hazards 2017, 85, 471–486. [CrossRef]
16. Mousavi, S.M.; Beroza, G.C. A Machine-Learning Approach for Earthquake Magnitude Estimation. Geophys. Res. Lett. 2020, 47,
e2019GL085976. [CrossRef]
17. Rouet-Leduc, B.; Hulbert, C.; Lubbers, N.; Barros, K.; Humphreys, C.J.; Johnson, P.A. Machine Learning Predicts Laboratory
Earthquakes. Geophys. Res. Lett. 2017, 44, 9276–9282. [CrossRef]
18. Li, Z.; Tian, K.; Wang, F.; Zheng, X.; Wang, F. Home damage estimation after disasters using crowdsourcing ideas and Convolu-
tional Neural Networks. In Proceedings of the 5th International Conference on Measurement, Instrumentation and Automation
(ICMIA), Shenzhen, China, 17–18 September 2016; pp. 857–860.
19. Shahnas, M.H.; Yuen, D.A.; Pysklywec, R.N. Inverse Problems in Geodynamics Using Machine Learning Algorithms. J. Geophys.
Res. Solid Earth 2018, 123, 296–310. [CrossRef]
20. Provost, F.; Hibert, C.; Malet, J.P. Automatic classification of endogenous landslide seismicity using the Random Forest supervised
classifier. Geophys. Res. Lett. 2017, 44, 113–120. [CrossRef]
21. Araya-Polo, M.; Jennings, J.; Adler, A.; Dahlke, T. Deep-learning tomography. Lead. Edge 2018, 37, 58–66. [CrossRef]
22. DeVries, P.M.R.; Viegas, F.; Wattenberg, M.; Meade, B.J. Deep learning of aftershock patterns following large earthquakes. Nature
2018, 560, 632–634. [CrossRef] [PubMed]
23. Moustra, M.; Avraamides, M.; Christodoulou, C. Artificial neural networks for earthquake prediction using time series magnitude
data or Seismic Electric Signals. Expert Syst. Appl. 2011, 38, 15032–15039. [CrossRef]
24. Panakkat, A.; Adeli, H. Neural network models for earthquake magnitude prediction using multiple seismicity indicators. Int. J.
Neural Syst. 2007, 17, 13–33. [CrossRef] [PubMed]
25. Wang, Q.; Guo, Y.; Yu, L.; Li, P. Earthquake Prediction Based on Spatio-Temporal Data Mining: An LSTM Network Approach.
IEEE Trans. Emerg. Top. Comput. 2020, 8, 148–158. [CrossRef]
26. Asencio-Cortes, G.; Martinez-Alvarez, F.; Morales-Esteban, A.; Reyes, J. A sensitivity study of seismicity indicators in supervised
learning to improve earthquake prediction. Knowl.-Based Syst. 2016, 101, 15–30. [CrossRef]
27. Asim, K.M.; Idris, A.; Iqbal, T.; Martinez-Alvarez, F. Seismic indicators based earthquake predictor system using Genetic
Programming and AdaBoost classification. Soil Dyn. Earthq. Eng. 2018, 111, 1–7. [CrossRef]
28. Martinez-Alvarez, F.; Reyes, J.; Morales-Esteban, A.; Rubio-Escudero, C. Determining the best set of seismicity indicators to
predict earthquakes. Two case studies: Chile and the Iberian Peninsula. Knowl.-Based Syst. 2013, 50, 198–210. [CrossRef]
29. Florido, E.; Asencio Cortes, G.; Aznarte, J.L.; Rubio-Escudero, C.; Martinez-Alvarez, F. A novel tree-based algorithm to discover
seismic patterns in earthquake catalogs. Comput. Geosci. 2018, 115, 96–104. [CrossRef]
30. Rundle, J.B.; Donnellan, A.; Fox, G.; Crutchfield, J.P. Nowcasting Earthquakes by Visualizing the Earthquake Cycle with Machine
Learning: A Comparison of Two Methods. Surv. Geophys. 2022, 43, 483–501. [CrossRef]
31. Rundle, J.B.; Donnellan, A.; Fox, G.; Ludwig, L.G.; Crutchfield, J. Does the Catalog of California Earthquakes, With Aftershocks
Included, Contain Information About Future Large Earthquakes? Earth Space Sci. 2023, 10, e2022EA002521. [CrossRef]
32. Alexandridis, A.; Chondrodima, E.; Efthimiou, E.; Papadakis, G.; Vallianatos, F.; Triantis, D. Large Earthquake Occurrence
Estimation Based on Radial Basis Function Neural Networks. IEEE Trans. Geosci. Remote Sens. 2014, 52, 5443–5453. [CrossRef]
33. Deng, Q.D.; Zhang, P.Z.; Ran, Y.K.; Yang, X.P.; Min, W.; Chu, Q.Z. Basic characteristics of active tectonics of China. Sci. China Ser.
D-Earth Sci. 2003, 46, 356–372.
34. Wiemer, S. A Software Package to Analyze Seismicity: ZMAP. Seismol. Res. Lett. 2001, 72, 373–382. [CrossRef]
35. Wiemer, S.; Wyss, M. Minimum magnitude of completeness in earthquake catalogs: Examples from Alaska, the western United
States, and Japan. Bull. Seismol. Soc. Am. 2000, 90, 859–869. [CrossRef]
36. Asencio-Cortes, G.; Morales-Esteban, A.; Shang, X.; Martinez-Alvarez, F. Earthquake prediction in California using regression
algorithms and cloud-based big data infrastructure. Comput. Geosci. 2018, 115, 198–210. [CrossRef]
37. Li, L.; Shi, Y.; Cheng, S. Exploration of long short-term memory neural network in intermediate earthquake forecast: A case study
in Sichuan-Yunnan region. Chin. J. Geophys. Chin. Ed. 2022, 65, 12–25. [CrossRef]
38. Reyes, J.; Morales-Esteban, A.; Martinez-Alvarez, F. Neural networks to predict earthquakes in Chile. Appl. Soft Comput. 2013, 13,
1314–1328. [CrossRef]
39. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef]
40. Fu, R.; Zhang, Z.; Li, L. Using LSTM and GRU Neural Network Methods for Traffic Flow Prediction. In Proceedings of the 31st
Youth Academic Annual Conference of Chinese-Association-of-Automation (YAC), Wuhan, China, 11–13 November 2016; pp.
324–328.
41. Chen, K.; Zhou, Y.; Dai, F. A LSTM-based method for stock returns prediction: A case study of China stock market. In Proceedings
of the 2015 IEEE international conference on big data (big data), Santa Clara, CA, USA, 29 October–1 November 2015; pp.
2823–2824.
42. Altché, F.; de La Fortelle, A. An LSTM network for highway trajectory prediction. In Proceedings of the 2017 IEEE 20th
international conference on intelligent transportation systems (ITSC), Yokohama, Japan, 16–19 October 2017; pp. 353–359.
43. Xue, H.; Huynh, D.Q.; Reynolds, M. SS-LSTM: A hierarchical LSTM model for pedestrian trajectory prediction. In Proceedings of
the 2018 IEEE Winter Conference on Applications of Computer Vision (WACV), Lake Tahoe, NV, USA, 12–15 March 2018; pp.
1186–1194.
44. Peng, C.; Cheng, Q. Discriminative Ridge Machine: A Classifier for High-Dimensional Data or Imbalanced Data. IEEE Trans.
Neural Netw. Learn. Syst. 2021, 32, 2595–2609. [CrossRef]
45. Scislo, L. High Activity Earthquake Swarm Event Monitoring and Impact Analysis on Underground High Energy Physics
Research Facilities. Energies 2022, 15, 3705. [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.
