Structures
journal homepage: www.elsevier.com/locate/structures
ARTICLE INFO

Keywords: Kaikoura; Machine learning; PGV; Random forest; SVM; XGBoost

ABSTRACT

Estimating ground motion characteristics at various locations as a function of fault characteristics is useful for proper damage assessment and risk mitigation strategies. This paper explores the application of machine learning approaches to predict peak ground acceleration (PGA) and peak ground velocity (PGV) using New Zealand's strong motion data. Five machine learning algorithms, namely linear regression, kNN, SVM, random forest, and XGBoost, are used in this study. Using the New Zealand flat-file database, the geometric means of the peak ground motion parameters are used as the target variables in training the machine learning algorithms. The performance of the chosen algorithms on PGV and PGA is discussed. The best prediction for PGA is obtained using random forest, whereas XGBoost works best for PGV. The relative importance of the various features in the flat-file is also presented for the best-performing machine learning algorithm. While the magnitude of an earthquake is found to be the most influential feature for PGV, rupture distance shows the highest impact for PGA. Finally, the predictions are explained using SHapley Additive exPlanations (SHAP), both for the overall dataset and on a sample-by-sample basis for a few samples. The pairwise dependence of some of the features with the highest importance is also presented using SHAP.
1. Introduction

The Hikurangi and Fiordland subduction zones are the two main subduction zones of New Zealand, hosting both interface and in-slab earthquakes. Shallow crustal earthquakes also occur along the entire length of New Zealand. Although the historic 1968 Inangahua earthquake was one of the oldest documented earthquakes of magnitude above 7, the 2003 Fiordland earthquake is possibly the earliest earthquake exceeding magnitude 7 for which some strong-motion recordings are available. More recently, the 2009 Dusky Sound and 2010 Darfield earthquakes, both of magnitude higher than 7, added significantly to New Zealand's strong motion data [1]. The 2011 Christchurch earthquake caused notable damage to structures, and the Canterbury earthquake sequence is also known to have triggered several secondary effects. Damage is correlated with ground-motion intensity (e.g., peak ground motion), as is assumed in probabilistic seismic demand models, so it is important to have predictive models for peak ground motion. Typically, such predictive models are derived by assuming functional forms, which introduce bias. In this study, we instead use machine learning approaches to see how well peak ground acceleration (PGA) and peak ground velocity (PGV) can be predicted. Strong motion accelerograms are usually needed to determine the maximum ground motion. For New Zealand, however, a database (popularly known among engineering seismologists as a flat-file) exists that tabulates PGA and PGV along with several other informative parameters about the earthquakes from which they originated and the locations where they were recorded.

The New Zealand flat-file is the result of the efforts of GNS Science researchers. It has more than 4000 recordings from earthquakes of magnitude 3.5 and above [2,3]. This database has been used for site characterization [3] and computations of response spectra [4]. Five New Zealand earthquakes are also included in the worldwide database (74 earthquakes) of strong near-field motion [5], of which the 2016 Kaikoura earthquake contributes more than 30 time histories. A few hundred accelerograms (both horizontal and vertical components) are also available for time history analysis of structures. Metadata related to various distance metrics, depth to the top of the fault, site classification,
* Corresponding author.
E-mail address: surendra@ce.iith.ac.in (S.N. Somala).
https://doi.org/10.1016/j.istruc.2021.10.085
Received 25 July 2021; Received in revised form 27 September 2021; Accepted 25 October 2021
Available online 6 November 2021
2352-0124/© 2021 Institution of Structural Engineers. Published by Elsevier Ltd. All rights reserved.
S.N. Somala et al. Structures 34 (2021) 4977–4985
Fig. 2. Input variables and prediction variables used in this study. The total number of samples is in excess of 4000, as indicated below the bottommost horizontal bar.
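The abstract states that the geometric mean of the peak ground motion parameters is the quantity being predicted. A minimal sketch of that target computation follows, assuming each flat-file record supplies the peak values of two orthogonal horizontal components. The `Recording` layout and its field names are hypothetical and are not the column names of the New Zealand flat-file; the numbers in the example are made up.

```python
import math
from dataclasses import dataclass


@dataclass
class Recording:
    """Hypothetical flat-file row holding peaks of two horizontal components."""
    pga_h1: float  # PGA of horizontal component 1 (g)
    pga_h2: float  # PGA of horizontal component 2 (g)
    pgv_h1: float  # PGV of horizontal component 1 (m/s)
    pgv_h2: float  # PGV of horizontal component 2 (m/s)


def geometric_mean(a: float, b: float) -> float:
    """Geometric mean sqrt(a * b) of two strictly positive peak values."""
    if a <= 0.0 or b <= 0.0:
        raise ValueError("peak values must be strictly positive")
    return math.sqrt(a * b)


def targets(rec: Recording) -> tuple[float, float]:
    """Return the (PGA, PGV) targets of one recording as geometric means."""
    return (geometric_mean(rec.pga_h1, rec.pga_h2),
            geometric_mean(rec.pgv_h1, rec.pgv_h2))


if __name__ == "__main__":
    # Made-up component peaks, for illustration only.
    rec = Recording(pga_h1=0.08, pga_h2=0.02, pgv_h1=0.09, pgv_h2=0.04)
    pga, pgv = targets(rec)
    print(f"PGA = {pga:.3f} g, PGV = {pgv:.3f} m/s")
```

For these made-up peaks the sketch gives a PGA of 0.04 g and a PGV of 0.06 m/s. Because the geometric mean never exceeds the arithmetic mean of the same two values, one unusually strong component pulls the target up less than it would with a simple average.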
square error (RMSE) and the coefficient of determination (R²) are shown separately for the training and test datasets (Tables 1 and 2). For PGV, the smallest RMSE on the test set is 0.026 m/s, obtained with XGBoost. For PGA, however, random forest gives the smallest test RMSE (0.032 g) among the machine learning algorithms considered. R² on the test set is likewise highest for XGBoost in predicting PGV (0.864), while for PGA it is highest for random forest (0.778). The classical machine learning algorithms, namely linear regression, kNN, and SVM, performed considerably worse in predicting both PGA and PGV, with test R² values of at most 0.635 for PGV and 0.435 for PGA. Fig. 5 shows the predicted values against the actual values for PGA and PGV, separately for the training and test datasets.
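The two metrics reported in Tables 1 and 2 follow standard definitions. As a dependency-free sketch (the sample values are made up and are not taken from the flat-file), RMSE and R² for one set of predictions can be computed as follows:

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    """Reject empty or mismatched inputs."""
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean square error, in the units of the target (g or m/s)."""
    _check(y_true, y_pred)
    sq = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sq / len(y_true))


def r2(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0.0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Made-up observed and predicted PGV values (m/s), for illustration only.
    observed = [0.10, 0.20, 0.30, 0.40]
    predicted = [0.10, 0.20, 0.30, 0.50]
    print(f"R2 = {r2(observed, predicted):.3f}, "
          f"RMSE = {rmse(observed, predicted):.3f} m/s")
```

Applying both functions once to the training predictions and once to the test predictions yields the two column groups of Tables 1 and 2. RMSE carries the units of the target, whereas R² is unitless and can fall below zero on a test set when a model does worse than simply predicting the mean. Libraries such as scikit-learn provide equivalent `mean_squared_error` and `r2_score` functions; which implementation the authors used is not stated here.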
Fig. 4. Distribution of prediction variables along with the source (magnitude) and path (distance metric) parameters that are typically used in ground-motion
prediction equations.
Table 1
PGV predictions using various machine learning algorithms (RMSE in m/s).

Machine learning algorithm     Training dataset      Test dataset
                               R²       RMSE         R²       RMSE
Linear regression              0.619    0.046        0.614    0.042
Support vector regression      0.269    0.063        0.260    0.062
k-nearest neighbors            0.664    0.043        0.635    0.043
Random forest                  0.983    0.010        0.861    0.027
XGBoost                        0.993    0.006        0.864    0.026

Table 2
PGA predictions using various machine learning algorithms (RMSE in g).

Machine learning algorithm     Training dataset      Test dataset
                               R²       RMSE         R²       RMSE
Linear regression              0.453    0.056        0.401    0.052
Support vector regression      0.178    0.069        0.049    0.065
k-nearest neighbors            0.498    0.054        0.435    0.050
Random forest                  0.953    0.018        0.778    0.032
XGBoost                        0.985    0.009        0.705    0.036

metrics should be pushing the prediction towards lower values, but only if that distance metric's value for the sample of interest is higher than the median value of that distance metric in the test set. If not, such a distance metric would push the prediction towards the right (higher values).

The feature importance based on mean absolute SHAP values in estimating the peak ground motion parameters (PGA and PGV) is given in Fig. 8. Magnitude has the highest feature importance in PGV prediction, while rupture distance is the most influential parameter in PGA estimation. Of the distance metrics (epicentral, hypocentral, Joyner-Boore, and rupture distance), rupture distance has the highest feature importance. SHAP can also capture pairwise interactions. Fig. 9 shows the pairwise dependence for a few interesting parameters in predicting PGA and PGV. A lower rupture distance has a more positive impact at higher magnitudes, but below about Mw 5.3 the trend is reversed. This trend holds irrespective of the prediction parameter (PGA or PGV). Also, at higher rupture distances, larger magnitudes have a lower impact.

6. Conclusion

To the best of the authors' knowledge, this is the first time machine learning algorithms have been applied to a New Zealand strong motion database to understand the predictability of peak ground motion (PGA and PGV). Random forest gives the lowest RMSE on the test set for PGA, whereas XGBoost gives the lowest test RMSE for PGV. SHAP values reveal that the distance metrics have a negative impact on the estimated peak ground motion, unlike the magnitude, which has a positive impact. Moreover, SHAP reveals that while Mw has the highest feature importance in estimating PGV within New Zealand, Rrup is the most influential parameter in predicting PGA. Considering the top two important parameters (Mw, Rrup), there is a pivotal magnitude between Mw 5 and 6 at which the influence of lower Rrup on PGA and PGV prediction flips from negative to positive.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Fig. 5. Predictions of the best-performing algorithm (XGBoost) on the training and test datasets for each of the prediction variables.
Fig. 6. SHAP values in predicting the peak ground motion parameters using XGBoost.
Fig. 7. Local explainability of XGBoost predictions for the first two samples using SHAP.
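To make concrete what a SHAP value is, the sketch below computes exact Shapley values by enumerating every coalition of features, with features outside a coalition set to a single reference point: here the per-feature median of a small, made-up "test set", echoing the median-based reading of the distance metrics in the text. This is a swapped-in illustration, not the method used for the figures: the shap library explains tree ensembles such as XGBoost with the faster TreeSHAP algorithm (for example via `shap.TreeExplainer`), usually against a background dataset rather than one point. `toy_log_pga` is a hypothetical additive model with invented coefficients standing in for the trained regressor, and its features (Mw, Rrup, depth to top of rupture) are chosen for illustration only.

```python
import math
from itertools import combinations
from statistics import median
from typing import Callable, Sequence

Model = Callable[[Sequence[float]], float]


def shapley_values(f: Model, x: Sequence[float],
                   baseline: Sequence[float]) -> list[float]:
    """Exact Shapley values of f at x against one baseline (reference) point.

    Features outside a coalition take their baseline value. Every subset is
    visited, so the cost grows as 2^n and suits only a few features.
    """
    n = len(x)

    def value(coalition: set[int]) -> float:
        # Evaluate f with coalition features from x and the rest from baseline.
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = (math.factorial(size) * math.factorial(n - size - 1)
                      / math.factorial(n))
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def mean_abs_importance(f: Model, X: Sequence[Sequence[float]],
                        baseline: Sequence[float]) -> list[float]:
    """Global importance: mean absolute Shapley value of each feature over X."""
    rows = [shapley_values(f, x, baseline) for x in X]
    return [sum(abs(r[i]) for r in rows) / len(rows)
            for i in range(len(baseline))]


def toy_log_pga(z: Sequence[float]) -> float:
    """Hypothetical additive model: z = (Mw, Rrup in km, Ztor in km).

    Returns an invented log10 PGA; the coefficients are not fitted to data.
    """
    mw, rrup, ztor = z
    return -1.0 + 0.5 * mw - 1.3 * math.log10(rrup + 10.0) + 0.02 * ztor


if __name__ == "__main__":
    # Made-up "test set" rows: (Mw, Rrup, Ztor).
    X = [[5.0, 20.0, 8.0], [6.5, 10.0, 5.0],
         [7.2, 80.0, 12.0], [4.5, 50.0, 3.0]]
    base = [median(col) for col in zip(*X)]  # per-feature medians
    for x in X[:2]:  # local explanations for the first two samples
        print(x, [round(v, 3) for v in shapley_values(toy_log_pga, x, base)])
    importance = mean_abs_importance(toy_log_pga, X, base)
    print("mean |SHAP|:", [round(v, 3) for v in importance])
```

The values are additive: for each sample they sum to the model output minus the output at the reference point. A rupture distance above its median receives a negative value, which matches the reading of the distance metrics given in the text. Because this toy model has no interaction terms, each value reduces to that feature's own change from the reference; a tree ensemble can also contain interactions, which is what pairwise dependence plots examine.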
Acknowledgment

Funding from the Ministry of Earth Sciences (MoES), India, under grant number MoES/P.O.(Seismo)/1(304)/2016 is gratefully acknowledged.

References

[1] Van Houtte C. Performance of response spectral models against New Zealand data. Bull N Zeal Soc Earthq Eng 2017;50:21–38. https://doi.org/10.5459/bnzsee.50.1.21-38.
[2] Kaiser A, Van Houtte C, Perrin N, McVerry G, Cousin J, Dellow S, et al. Characterizing GeoNet strong motion sites. Site metadata update for the 2015 Strong Motion Database. 2016.
[3] Kaiser A, Van Houtte C, Perrin N, Wotherspoon L, McVerry G. Site characterisation of GeoNet stations for the New Zealand Strong Motion Database. Bull N Zeal Soc Earthq Eng 2017;50:39–49. https://doi.org/10.5459/bnzsee.50.1.39-49.
[4] Van Houtte C, Bannister S, Holden C, Bourguignon S, McVerry G. The New Zealand strong motion database. Bull N Zeal Soc Earthq Eng 2017;50:1–20. https://doi.org/10.5459/bnzsee.50.1.1-20.
[5] Pacor F, Felicetta C, Lanzano G, Sgobba S, Puglia R, D'Amico M, et al. NESS1: A worldwide collection of strong-motion data to investigate near-source effects. Seismol Res Lett 2018;89:2299–313. https://doi.org/10.1785/0220180149.
[6] Chanda S, Raghucharan MC, Karthik Reddy KSK, Chaudhari V, Somala SN. Duration prediction of Chilean strong motion data using machine learning. J S Am Earth Sci 2021;109:103253. https://doi.org/10.1016/j.jsames.2021.103253.
[7] Somala SN, Karthikeyan K, Mangalathu S. Time period estimation of masonry infilled RC frames using machine learning techniques. Structures 2021;34:1560–6. https://doi.org/10.1016/j.istruc.2021.08.088.
[8] Mangalathu S, Jang H, Hwang S-H, Jeon J-S. Data-driven machine-learning-based seismic failure mode identification of reinforced concrete shear walls. Eng Struct 2020;208:110331. https://doi.org/10.1016/j.engstruct.2020.110331.
[9] Mangalathu S, Sun H, Nweke CC, Yi Z, Burton HV. Classifying earthquake damage to buildings using machine learning. Earthq Spectra 2020;36(1):183–208. https://doi.org/10.1177/8755293019878137.
[10] Mangalathu S, Hwang S-H, Choi E, Jeon J-S. Rapid seismic damage evaluation of bridge portfolios using machine learning techniques. Eng Struct 2019;201:109785. https://doi.org/10.1016/j.engstruct.2019.109785.
[11] Mangalathu S, Jeon J-S. Stripe-based fragility analysis of multispan concrete bridge classes using machine learning techniques. Earthquake Eng Struct Dyn 2019;48(11):1238–55.
[12] Seo J, Linzell DG. Use of response surface metamodels to generate system level fragilities for existing curved steel bridges. Eng Struct 2013;52:642–53.
[13] Kiani J, Camp C, Pezeshk S. On the application of machine learning techniques to derive seismic fragility curves. Comput Struct 2019;218:108–22.
[14] El-Sefy M, Yosri A, El-Dakhakhni W, Nagasaki S, Wiebe L. Artificial neural network for predicting nuclear power plant dynamic behaviors. Nuclear Eng Technol 2021;53(10):3275–85.
[15] Siam A, Ezzeldin M, El-Dakhakhni W. Machine learning algorithms for structural performance classifications and predictions: Application to reinforced masonry shear walls. Structures 2019;22:252–65.
[16] Raghucharan MC, Somala SN, Rodina S. Seismic attenuation model using artificial neural networks. Soil Dyn Earthquake Eng 2019;126:105828. https://doi.org/10.1016/j.soildyn.2019.105828.
[17] Somala SN, Chanda S, Raghucharan MC, Rogozhin E. Spectral acceleration prediction for strike, dip, and rake: a multi-layered perceptron approach. J Seismol 2021;25(5):1339–46. https://doi.org/10.1007/s10950-021-10031-2.
[18] Mangalathu S, Jeon J-S. Machine learning–based failure mode recognition of circular reinforced concrete bridge columns: comparative study. J Struct Eng 2019;145(10):04019104. https://doi.org/10.1061/(ASCE)ST.1943-541X.0002402.
[19] Kowalsky MJ, Priestley MJN. Improved analytical model for shear strength of circular reinforced concrete columns in seismic regions. ACI Struct J 2000;97:388–96. https://doi.org/10.14359/4633.
[20] Hwang S-H, Mangalathu S, Shin J, Jeon J-S. Machine learning-based approaches for seismic demand and collapse of ductile reinforced concrete building frames. J Build Eng 2021;34:101905. https://doi.org/10.1016/j.jobe.2020.101905.
[21] Friedman JH. The elements of statistical learning: data mining, inference, and prediction. Springer open; 2017.
[22] Cover T, Hart P. Nearest neighbor pattern classification. IEEE Trans Inf Theory 1967;13(1):21–7. https://doi.org/10.1109/TIT.1967.1053964.
[23] Breiman L. Random forests. Mach Learn 2001;45:5–32. https://doi.org/10.1023/A:1010933404324.
[24] Chen T, Guestrin C. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining 2016:785–94. https://doi.org/10.1145/2939672.2939785.
[25] Mangalathu S, Hwang S-H, Jeon J-S. Failure mode and effects analysis of RC members based on machine-learning-based SHapley Additive exPlanations (SHAP) approach. Eng Struct 2020;219:110927. https://doi.org/10.1016/j.engstruct.2020.110927.
[26] Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. In: Proceedings of the 31st international conference on neural information processing systems; 2017. p. 4768–77.