
Determining the optimum set of seismicity indicators to predict earthquakes in Chile

F. Martínez-Álvarez a, J. Reyes b, A. Morales-Esteban c, C. Rubio-Escudero d


a Department of Computer Science, Pablo de Olavide University of Seville, Spain - fmaralv@upo.es
b The Geophysics Team, University of Santiago, Chile - daneel@geofisica.cl
c Department of Continuum Mechanics, University of Seville, Spain - ame@us.es
d Department of Computer Science, University of Seville, Spain - crubioescudero@us.es

Abstract

This work explores the use of different seismicity indicators as input for artificial neural networks. The combination of multiple indicators that have already been successfully used in several seismic zones is proposed, by means of the application of feature selection techniques. These techniques evaluate every input and propose the best combination of them in terms of information gain. Once such sets have been obtained, artificial neural networks are applied to four Chilean zones. To make the comparison to other models possible, the prediction problem has been turned into a classification one, thus allowing the application of other machine learning classifiers. Comparisons with the original sets of inputs and different classifiers are reported to support the degree of success achieved. Finally, statistical tests have been applied to confirm that the results are significantly different from those of other classifiers.
Key words: Earthquake prediction, seismicity indicators, feature selection, time series

It is expected, then, that the selection of the features with higher correlation will lead to more accurate predictions. Feature selection (or variable selection, or feature reduction) emerges as a crucial step to build robust models, especially when too many variables form the set of input features. Although many complex approaches have been proposed during the last decade [6, 16, 38], the analysis of the information gain that every seismicity indicator (or feature) presents is carried out to discover which ones show larger correlation with the output. The Chilean areas described in [35] and studied in [36] (Talca, Santiago, Valparaíso and Pichilemu) have been subjected to analysis in order to assess the performance of such a proposal. The remainder of the work is structured as follows. Section 2 explores the latest works related to the application of ANNs to earthquake prediction. Section 3 describes the methodology used, as well as the mathematical fundamentals underlying the approach. Section 4 presents the results derived from the application of the ANNs to Chile. In this section, a comparative analysis with other well-known classifiers is also provided. A statistical analysis is carried out in Section 5 to verify that the results obtained by means of the new methodology are statistically different from all others. Finally, the conclusions drawn are summarized in Section 6.
2. Related works

1. Introduction

The prediction of natural disasters has always been a challenging task for the human being. Actually, the prediction of tsunamis [34], volcanic eruptions [18], thunderstorms [5], hurricanes [43] or typhoons [40] has been addressed from many different points of view. Nevertheless, the prediction of earthquakes stands out due to the devastating effect they may cause on human activity, as thoroughly discussed by Panakkat and Adeli in 2008 [32] and, later in 2012, by Tiampo and Shcherbakov [41]. Despite the efforts made, there is no system apparently capable of simultaneously fulfilling all the requirements established by the Seismological Society of America [3] to make an accurate prediction: to predict when, where and how probable the occurrence of an earthquake with a particular magnitude is. This work is focused on the application of artificial neural networks (ANN) to improve earthquake prediction. In particular, based on two previous works [36, 31], it aims at obtaining an optimal set of seismicity indicators as ANN's input. These two works successfully applied completely different sets of inputs to Chile and southern California, respectively, two of the regions with the largest seismic activity in the world. However, neither of them provided an analysis of the correlation exhibited between the inputs and the output. It is reasonable to think that not all the features have the same predictive ability and, even, that some of them could have decreased the prediction quality. And this is precisely the main goal of this work: to apply feature selection techniques so as to use a reduced set of features as ANN's input.
Preprint submitted to Elsevier

This section provides the reader with a general overview of the latest published works related to earthquake prediction and, particularly, those that used ANNs.
December 18, 2012

The use of artificial intelligence techniques has recently emerged as a powerful tool for earthquake prediction. For instance, a method called Pattern Informatics, which identifies correlated regions of seismicity in recorded data that precede the main shock, was introduced in [28], as well as its extended version for 3D zones [42]. Also, quantitative association rules and decision trees were applied to predict shocks in the Iberian Peninsula [25]. The prediction of medium-large earthquakes by means of the K-means algorithm was presented in [27], where the authors discovered some patterns preceding medium-large earthquakes. Also, hidden Markov models were applied to predict earthquakes in California [13], and cellular automata simulations in Turkey and Western Canada [19]. However, the application of ANNs for earthquake prediction stands out among other techniques, since it was first proposed for evaluating the seismicity of the Azores in 1994 [4]. Panakkat and Adeli [31] proposed three different ANNs to predict earthquake magnitude in Southern California and the San Francisco bay. Especially remarkable is the novel set of seismicity indicators they used. Later, the same authors predicted earthquake time and location in Southern California using a recurrent neural network [33]. To achieve such a task, they computed several sets of earthquakes taking into consideration the latitude and the longitude of the epicentral location as well as the time of occurrence of the following earthquake. Another kind of ANN, a probabilistic neural network, was evaluated in [1] and also applied to Southern California. The main novelty was the use of this kind of neural network for classification purposes, in particular, using the earthquake magnitude as a target label to classify. The seismicity of four zones of Chile, one of the countries with the highest seismic activity, was explored by means of neural techniques in [36].
The authors proposed a particular architecture and used a novel set of input parameters, mainly based on b-value variations, Båth's law and Omori-Utsu's law. Especially remarkable is the small spatial and temporal uncertainty their ANNs presented (cells varying from 0.5° × 0.5° to 1° × 1°, and 5 days, respectively). ANNs have also been applied to predict earthquake magnitude in Greece [23]. In this work, the authors used only the magnitude of the previous earthquakes as input and obtained a high accuracy rate for medium earthquakes. However, the rate considerably decreased when considering major seismic events. The suitability of applying ANNs to the northern Red Sea area has also been analyzed in [2]. This time, the authors proposed a number of different architectures, varying the number of hidden layers, the transfer functions and the number of nodes. Then, they compared their performance to several Box-Jenkins models [8]. Another hazardous area, India, has been subjected to study by means of ANNs [9]. After evaluating several architectures, the authors concluded that the best one must include two hidden layers and the sigmoid transfer function. Also the tectonic regions of Northeast India have

been explored [39]. The authors retrieved earthquake data from the NOAA and USGS catalogues and proposed two nonlinear forecasting models. Both approaches are stable and suggest the existence of certain seasonality in earthquake occurrence in this area. The East Anatolian fault system is known for causing many earthquakes. A multi-layer Levenberg-Marquardt ANN was applied to predict earthquakes in that area in [22]. The main novelty of this work lay in the use of variations of radon as ANN's input. Also in Turkey, an early earthquake warning system was developed in [7]. To achieve this goal, the authors proposed an ANN making use of the information provided by a seismic sensor network that records ground motions. The unsupervised ANN variant, Kohonen's self-organizing maps [21], was applied to study the concentration and the trend of the aftershocks occurred after the earthquake in Sichuan (China) in 2008 [24]. In particular, the longitude, the latitude and the magnitude of the aftershocks occurring within the next two days after the main shock were predicted. Finally, a general-purpose methodology, based on ANN, was introduced in [20] to calculate the probability of earthquake inter-arrival time for a particular zone, given a magnitude interval or a magnitude greater than a preset threshold. To show its efficiency, the authors validated their methodology on a wide variety of datasets.
3. Methodology

This section describes the tasks accomplished to improve the prediction strategies followed in [36] and [31]. Figure 1 illustrates the full process. Every task is described below:
1. First of all, it is worth noting that the prediction problem has been turned into a binary classification one. To perform such a task, the labels assigned to every event or earthquake contain information about the future. That is, every sample has been labeled with a 1 if an earthquake with a magnitude larger than a preset threshold occurs within the next days, and with a 0 otherwise. The horizon of prediction has been set to five days (as in [36]) and the triggering thresholds are those that ensure balanced training sets (as discussed in [36]).
2. Analysis of the quality of the features used in [36] and [31]. The information gain has been measured for each set of features separately. Table 1 lists the features considered in this work, where b_i is the i-th Gutenberg-Richter b-value calculated as in Equation (9) from [36], OU stands for Omori-Utsu and GR for Gutenberg-Richter.
3. Selection of the best features, in terms of information gain (see Section 3.1), to be used as ANN's input. That is, given the sets F1 = {f_1^1, f_2^1, ..., f_7^1}, corresponding to the features proposed in [36], and F2 = {f_1^2, f_2^2, ..., f_9^2}, corresponding to the features

[Figure 1 here: two sets of inputs feed a feature selection stage that produces the new set; this set feeds the ANN model as well as other models (NB, KNN, SVM), whose predictions are compared by means of statistical tests.]

Figure 1: Steps involved in the methodology.

proposed in [31], every feature is evaluated and ranked according to its information gain. Then, the new set of features, F*, is formed with the seven features with the highest information gain. That is, F* = {f_1*, f_2*, ..., f_7*}, where every f_i* can belong to either F1 or F2, and all these seven features exhibit higher information gain than the nine discarded ones (#(F1 ∪ F2) = 16). Note that F* is composed of seven features because the ANN architecture applied in [36] is going to be used.
4. Evaluation of the new set of features by means of a wide variety of quality parameters (see Section 3.2), typically used in measuring classifiers performance.
5. Application of tests to show the statistical relevance of the results achieved (see Section 3.3). That is, to show that the results obtained with the new set of features outperform the former sets not only on average but also with statistical support. Additionally, an analysis of other classifiers' performance is also provided to confirm that the ANN solution is the one that best fits this particular problem.
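Step 3 above reduces to ranking the sixteen candidate features by information gain and keeping the seven best. A minimal sketch of that selection step follows; the function name and the gain values are placeholders for illustration, not the paper's measurements:

```python
# Illustrative sketch of the top-n selection by information gain.
def select_top(gains, n=7):
    """gains: dict mapping feature name -> measured information gain.
    Returns the n feature names with the highest gain."""
    return [f for f, _g in sorted(gains.items(), key=lambda kv: -kv[1])[:n]]

# Placeholder gains, loosely shaped like the Talca rankings discussed later.
gains = {"x1": 0.010, "x6": 0.017, "x7": 0.277, "T": 0.203,
         "Mmean": 0.177, "mu": 0.160, "c": 0.084, "dM": 0.0}
print(select_top(gains, n=3))  # ['x7', 'T', 'Mmean']
```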
where f_1^1, ..., f_7^1 are the features used in [36] and f_1^2, ..., f_9^2 those used in [31]. For a more detailed description of them, please refer to such papers. Finally, it is important to remark that a prediction is made every time an earthquake with magnitude larger than 3.0 occurs (the cutoff magnitude for the Chilean datasets is 3.0, as shown in [36]). The areas under study exhibit large seismic activity, so a prediction is made almost every day.
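The labeling scheme of step 1 (label 1 if a sufficiently large earthquake occurs within the five-day horizon) can be sketched as follows; the dates and the magnitude threshold below are illustrative, not taken from the Chilean catalogues:

```python
# Sketch of the binary labeling step: each event becomes a sample whose
# label says whether a quake above a (hypothetical) magnitude threshold
# occurs within the 5-day prediction horizon.
from datetime import datetime, timedelta

HORIZON = timedelta(days=5)
THRESHOLD = 4.5  # illustrative triggering magnitude

def label_events(events):
    """events: list of (datetime, magnitude) tuples, sorted by time.
    Returns one 0/1 label per event."""
    labels = []
    for i, (t, _m) in enumerate(events):
        upcoming = [m for (s, m) in events[i + 1:] if s - t <= HORIZON]
        labels.append(1 if any(m >= THRESHOLD for m in upcoming) else 0)
    return labels

catalogue = [
    (datetime(2010, 3, 1), 3.2),
    (datetime(2010, 3, 2), 4.8),   # within 5 days of the first event
    (datetime(2010, 3, 20), 3.5),
]
print(label_events(catalogue))  # [1, 0, 0]
```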

Table 1: Summary of the set of features evaluated, including formulas and description.

Feature   Notation   Description
f1^1      x1         b_i − b_{i−4}
f2^1      x2         b_{i−4} − b_{i−8}
f3^1      x3         b_{i−8} − b_{i−12}
f4^1      x4         b_{i−12} − b_{i−16}
f5^1      x5         b_{i−16} − b_{i−20}
f6^1      x6         OU's law
f7^1      x7         Dynamical GR's law
f1^2      T          Elapsed time
f2^2      Mmean      Mean magnitude
f3^2      dE^1/2     Square root of seismic energy
f4^2      b          Slope of the magnitude-log plot
f5^2      a          a-value from GR's law
f6^2      ΔM         Magnitude deficit
f7^2      μ          Mean time
f8^2      c          Coefficient of variation
f9^2      η          Mean square deviation

selection method that, on the contrary, does not take feature interaction into account. The information gain of a given feature f with respect to the class attribute C is the reduction in uncertainty about the value of C when the value of f is known, I(C; f). The uncertainty about the value of C is measured by means of its entropy, denoted by H(C). The uncertainty about the value of C when f is known is given by the conditional entropy of C given f, H(C|f). In other words:

I(C; f) = H(C) − H(C|f)    (1)

3.1. Feature selection based on information gain

where the entropy of C, for discrete variables, is defined as:

The open source Weka software [30] has been used to measure the information gain associated with each feature with respect to the class. This is a widely used standard feature

H(C) = − Σ_{i=1}^{k} P(c_i) log2(P(c_i))    (2)

assuming that C can take values in {c_1, ..., c_k}, and P(c_i) is the probability that C = c_i. Then, the conditional entropy of C given f, assuming that f can take values in {f_1, ..., f_m}, is defined as:

H(C|f) = Σ_{j=1}^{m} P(f_j) H(C|f_j)    (3)

where P(f_j) is the probability that f = f_j. If the input feature f is continuous then, in order to compute its information gain with respect to the class attribute C, all the binary attributes f' that arise from f when a cutoff threshold is chosen on f are considered. The threshold may take values from the whole range of f. In this case, the information gain formula reduces to:

I(C; f) = max_{f'} I(C; f')    (4)

Sensitivity, or rate of actual positives correctly identified (denoted by Sn), and specificity, or rate of actual negatives correctly identified (denoted by Sp), are defined as:

Sn = TP / (TP + FN)    (7)

Sp = TN / (TN + FP)    (8)

3.3. Statistical tests

In this particular context, Weka has to deal with continuous features, as all ANN inputs are real-valued. To calculate the information gain associated with every feature, Weka performs a prior discretization of all the variables, turning the continuous problem into a discrete one, which is typically easier to solve.
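The discretization-plus-information-gain computation just described can be sketched as follows. This is an illustrative re-implementation with our own function names, mirroring the idea behind Weka's evaluator rather than reproducing it:

```python
# Sketch of Equations (1)-(4): entropy, conditional entropy after a binary
# cutoff discretization, and the gain maximized over candidate cutoffs.
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum((cnt / n) * log2(cnt / n) for cnt in Counter(labels).values())

def info_gain(feature, labels, cutoff):
    """I(C; f') for the binary attribute f' = [f > cutoff], Eq. (1)."""
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x > cutoff, []).append(y)
    h_cond = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - h_cond

def best_gain(feature, labels):
    """Equation (4): maximize the gain over candidate cutoffs."""
    return max(info_gain(feature, labels, t) for t in set(feature))

y = [1, 1, 0, 0]
f = [0.9, 0.8, 0.1, 0.2]   # perfectly separable feature
print(best_gain(f, y))      # 1.0: the feature removes all class uncertainty
```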
3.2. Quality parameters

To assess the performance of the ANNs designed, several parameters have been used. In particular:
1. True positives (TP). The number of times that an upcoming earthquake was properly predicted.
2. True negatives (TN). The number of times that neither the ANN triggered an alarm nor an earthquake occurred.
3. False positives (FP). The number of times that the ANN erroneously predicted the occurrence of an earthquake.
4. False negatives (FN). The number of times that the ANN did not trigger an alarm but an earthquake did occur.
The combination of these parameters leads to the calculation of several measures:

A statistical analysis is proposed to evaluate the significance of the new approach, following the non-parametric procedures discussed in García et al. [15]. When several classifiers need to be compared, two different kinds of tests can be applied, depending on the previous statistical knowledge of the data. Thus, when data are normally distributed and variances among populations are equal (homogeneity), it is usual to apply the ANOVA (ANalysis Of VAriance) test [12, 37]. Also, when the number of hypotheses is sufficiently small (typically less than 50), it can be used regardless of the distribution that data may have. However, the non-parametric version of ANOVA, the Friedman test [14], is used whenever no assumption can be made about the data. This is precisely the method applied in this work, since no previous knowledge about the data is available a priori. Moreover, even if the data followed a normal distribution, this method would still be valid, as it is a non-parametric alternative to ANOVA suitable for any kind of data distribution. The steps are now detailed. Given an n × k matrix of data {x_ij}, where n is the number of rows (experiments) and k the number of columns (the algorithms to be tested), calculate the ranks within each row. In case of tied values, assign to each tied value the average of the ranks that would have been assigned without ties. Replace the data with a new matrix {r_ij} (also with n × k elements), where each element r_ij is the rank of x_ij within row i. Afterwards, the following values need to be computed:

r̄_j = (1/n) Σ_{i=1}^{n} r_ij    (9)

P0 = TN / (TN + FN)    (5)

P1 = TP / (TP + FP)    (6)
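The four quality measures can be checked numerically against the figures the paper itself reports. The following sketch (our own helper function, not from the paper) reproduces the Talca values given later in Table 4 for the new ANN (TP = 7, TN = 30, FP = 7, FN = 1):

```python
# Hypothetical helper computing the four quality measures of
# Equations (5)-(8) from a confusion matrix.
def quality(tp, tn, fp, fn):
    return {
        "P0": tn / (tn + fn),   # negative predictive value, Eq. (5)
        "P1": tp / (tp + fp),   # positive predictive value, Eq. (6)
        "Sn": tp / (tp + fn),   # sensitivity, Eq. (7)
        "Sp": tn / (tn + fp),   # specificity, Eq. (8)
    }

# Counts reported for the new ANN in Talca (Table 4):
q = quality(tp=7, tn=30, fp=7, fn=1)
print({k: f"{100 * v:.2f}%" for k, v in q.items()})
# {'P0': '96.77%', 'P1': '50.00%', 'Sn': '87.50%', 'Sp': '81.08%'}
```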

r̄ = (1/(nk)) Σ_{i=1}^{n} Σ_{j=1}^{k} r_ij    (10)

SSt = n Σ_{j=1}^{k} (r̄_j − r̄)²    (11)

where P0 denotes the well-known negative predictive value, and P1 the well-known positive predictive value. Additionally, two more parameters have been used to evaluate the performance of the ANNs, as they correspond to common statistical measures of supervised classifiers performance. These two parameters, sensitivity, or rate of actual positives correctly identified as such (denoted by Sn), and specificity, or rate of actual negatives correctly identified (denoted by Sp), are given in Equations (7) and (8).

SSe = (1/(n(k−1))) Σ_{i=1}^{n} Σ_{j=1}^{k} (r_ij − r̄)²    (12)
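The rank computation in Equations (9) through (12), the resulting statistic Q = SSt/SSe, and the Holm step-down rule of Equation (13) can be sketched as follows. This is an illustrative, stdlib-only Python sketch (function and variable names are ours, not the paper's); the final χ² p-value step is only noted in a comment, as it requires a statistics library:

```python
def friedman_q(x):
    """Friedman statistic Q = SSt/SSe, following Equations (9)-(12).
    x is an n-by-k matrix: n experiments (rows) by k algorithms (columns)."""
    n, k = len(x), len(x[0])
    r = []
    for row in x:
        order = sorted(range(k), key=lambda j: row[j])
        rank = [0.0] * k
        j = 0
        while j < k:
            tied = [order[j]]
            while j + len(tied) < k and row[order[j + len(tied)]] == row[order[j]]:
                tied.append(order[j + len(tied)])
            avg = sum(range(j + 1, j + len(tied) + 1)) / len(tied)  # average tied ranks
            for col in tied:
                rank[col] = avg
            j += len(tied)
        r.append(rank)
    rbar_j = [sum(r[i][j] for i in range(n)) / n for j in range(k)]   # Eq. (9)
    rbar = sum(map(sum, r)) / (n * k)                                 # Eq. (10)
    sst = n * sum((rj - rbar) ** 2 for rj in rbar_j)                  # Eq. (11)
    sse = sum((r[i][j] - rbar) ** 2
              for i in range(n) for j in range(k)) / (n * (k - 1))    # Eq. (12)
    # p-value would be P(chi2 with k-1 d.o.f. >= Q), e.g. via scipy.stats.chi2.sf
    return sst / sse

def holm(p_values, alpha=0.05):
    """Holm step-down rule, Equation (13): reject hypotheses in ascending
    p-value order until the first p_(k) exceeding alpha / (m + 1 - k)."""
    m = len(p_values)
    rejected = [False] * m
    for pos, idx in enumerate(sorted(range(m), key=lambda i: p_values[i]), 1):
        if p_values[idx] > alpha / (m + 1 - pos):
            break
        rejected[idx] = True
    return rejected

print(friedman_q([[1, 2, 3], [1, 2, 3], [1, 2, 3]]))  # 6.0 (perfect agreement)
print(holm([0.001, 0.04, 0.03]))                       # [True, False, False]
```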

Once (11) and (12) have been calculated, the result of the statistical test is given by Q = SSt/SSe. Last, when n or k is big enough (typically n > 10 or k > 5), which is the situation in this work, Q's probability distribution can be approximated by that of a χ² distribution with k − 1 degrees of freedom. In such a case, the p-value is given by P(χ²_{k−1} ≥ Q). The final step, once verified that p is significant, is to carry out a post-hoc analysis to find differences between pairs of algorithms. There are many strategies that can be followed; however, in this work, the Holm and Hochberg tests are applied, following the methodology described in [26]. Let H1, ..., Hm be a family of hypotheses and P1, ..., Pm their associated p-values. First, the p-values are sorted in ascending order, P(1), ..., P(m), denoting their associated hypotheses as H(1), ..., H(m). Given a level of significance α, let k be the minimal index such that:

P(k) > α / (m + 1 − k)    (13)

The procedure then rejects H(1), ..., H(k−1) and does not reject H(k), ..., H(m). Note that k could be equal to 1; in this case, no hypothesis is rejected. Also, there could be no k satisfying (13); in that case, all hypotheses are rejected. Finally, the method is satisfactory in that it ensures that the probability of rejecting at least one true hypothesis is kept below α.

Table 2: Average information gain for every feature in [36] for Talca.

Feature   Average merit
x7        0.277 ± 0.030
x6        0.017 ± 0.052
x1        0.010 ± 0.030
x3        0.009 ± 0.028
x2        0 ± 0
x4        0 ± 0
x5        0 ± 0

Table 3: Average information gain for every feature in [31] for Talca.

Feature   Average merit
T         0.203 ± 0.019
μ         0.177 ± 0.019
Mmean     0.160 ± 0.018
c         0.084 ± 0.055
dE^1/2    0.009 ± 0.027
b         0 ± 0
ΔM        0 ± 0
a         0 ± 0
η         0 ± 0

4. Results: application to four Chilean zones

This section introduces the results of applying the proposed methodology to four main Chilean regions. In particular, the regions with cells varying from 0.5° × 0.5° to 1° × 1° around the cities of Talca, Santiago, Pichilemu and Valparaíso have been considered. These sets can be downloaded from the University of Chile's National Service of Seismology [29] upon request. For all four areas, the training and test sets chosen are those analyzed in [36], to ease comparisons. Additionally, a comparison with Naive Bayes (NB) [17], K-nearest neighbors (KNN) [11] and support-vector machines (SVM) [10] has been carried out to show that ANNs have better performance than the other classifiers. As for the setup of NB, KNN, and SVM, the default configuration provided by Weka 3.6 [30] has been used in all cases.
4.1. Talca

The next step consists in selecting the best features or, in other words, those that presented the greatest information gain. The number of features is limited to seven, to fulfill the architectural constraints presented in [36], as this particular ANN architecture is to be used. Therefore, the new set of input parameters is:

Inputs_Talca = {x7, T, μ, Mmean, c, x6, x1}    (14)

Given this new set of seismicity indicators, the ANNs were applied again, yielding the results summarized in Table 4. It can be concluded that the new set of inputs led to significantly better results.
Table 4: ANN performance for Talca with the new set of seismicity indicators.

For this area, a training set comprising earthquakes occurred from June 19th 2003 to March 21st 2010 has been used. Analogously, the test set was composed of the earthquakes occurred from March 24th 2010 to January 4th 2011. Tables 2 and 3 represent the information gain that every seismicity indicator exhibited in the Reyes et al. [36] and Panakkat and Adeli [31] works, respectively. In particular, the column Average merit is the average information gain and its standard deviation, obtained from the calculation of the information gain in a 10-fold cross-validation process. As can be observed, there are three features with null contribution in Table 2 and four in Table 3. Therefore, the inclusion of such features in the original prediction may have negatively influenced the final results.

Parameter     ANN [36]   ANN [31]   new ANN
TP            3          2          7
TN            23         22         30
FP            14         15         7
FN            5          7          1
Sensitivity   37.50%     22.22%     87.50%
Specificity   62.16%     59.46%     81.08%
P0            82.14%     75.86%     96.77%
P1            17.65%     11.76%     50.00%
Mean          49.86%     42.33%     78.84%

Finally, to check the adequacy of using ANN, different classifiers have been applied using as input parameters those in [36], those in [31], and the new set selected above. This information can be found in Table 5. Note that NB stands for Naive Bayes, KNN for K-Nearest Neighbors and SVM for Support-Vector Machine.

Table 5: Several classifiers performance with the new set of inputs in Talca.

Parameter     NB [36]    NB [31]    new NB
TP            0          1          3
TN            35         23         23
FP            2          14         14
FN            8          7          5
Sensitivity   0.00%      12.50%     37.50%
Specificity   94.59%     62.16%     62.16%
P0            81.40%     76.67%     82.14%
P1            0.00%      6.67%      17.65%
Mean          44.00%     39.50%     49.86%

Parameter     KNN [36]   KNN [31]   new KNN
TP            0          3          3
TN            37         26         24
FP            0          11         13
FN            8          5          5
Sensitivity   0.00%      37.50%     37.50%
Specificity   100.00%    70.27%     64.86%
P0            82.22%     83.87%     82.76%
P1            0.00%      21.43%     18.75%
Mean          45.56%     53.27%     50.97%

Parameter     SVM [36]   SVM [31]   new SVM
TP            0          0          0
TN            37         37         37
FP            0          0          0
FN            8          8          8
Sensitivity   0.00%      0.00%      0.00%
Specificity   100.00%    100.00%    100.00%
P0            82.22%     82.22%     82.22%
P1            0.00%      0.00%      0.00%
Mean          45.56%     45.56%     45.56%

information gain and its standard deviation obtained by means of a 10-fold cross-validation process in the training set. Two features with null contribution are reported in Table 6 and seven in Table 7; therefore, it is also desirable to conduct an analysis to obtain a better set of features.

Table 6: Average information gain for every feature in [36] for Santiago.

Feature   Average merit
x1        0.012 ± 0.002
x3        0.009 ± 0.001
x5        0.009 ± 0.001
x4        0.009 ± 0.002
x2        0.003 ± 0.002
x7        0 ± 0
x6        0 ± 0

Table 7: Average information gain for every feature in [31] for Santiago.

Feature   Average merit
T         0.113 ± 0.016
μ         0.029 ± 0.061
Mmean     0 ± 0
dE^1/2    0 ± 0
b         0 ± 0
a         0 ± 0
ΔM        0 ± 0
c         0 ± 0
η         0 ± 0

Again, the next step is to select the seven features that presented greater information gain. In this case, the new set of seismicity indicators to be used as ANN inputs is:

Inputs_Santiago = {T, μ, x1, x3, x5, x4, x2}    (15)
From the analysis of Tables 4 and 5, two conclusions can be drawn. First, the use of the new set has led to a general improvement compared with the results obtained in [36] and [31]; that is, none of the eleven configurations outperformed on average the results obtained by the ANN when the new set of inputs was used. Second, the comparison between the four classifiers shows that the ANN clearly presents the best results. On the other hand, for the SVM, all input sets generated the same results, showing the inability of this classifier to deal with this kind of data.
4.2. Santiago


With these new inputs, the ANN was applied again, yielding the results summarized in Table 8, concluding that the new set of seismicity parameters obtained better results (71.89% versus 65.68% and 53.33% accuracy).
Table 8: ANN performance for Santiago with the new set of seismicity indicators.

For this area, the training set contains the earthquakes occurred from May 13th 2003 to June 2nd 2004. On the other hand, the test set was composed of the earthquakes occurred from June 23rd 2004 to January 16th 2006. Tables 6 and 7 represent the information gain that every seismicity indicator exhibited in [36] and [31], respectively. Again, the column Average merit stands for the average

Parameter     ANN [36]   ANN [31]   new ANN
TP            5          2          5
TN            101        99         105
FP            7          9          3
FN            9          12         9
Sensitivity   35.71%     14.29%     35.71%
Specificity   93.52%     91.67%     97.22%
P0            91.82%     89.19%     92.11%
P1            41.67%     18.18%     62.50%
Mean          65.68%     53.33%     71.89%

Finally, to check the adequacy of using ANN, different classifiers have been applied using as input parameters those in [36], those in [31], and the combined set selected above. This information can be found in Table 9. Note that NB stands for Naive Bayes, KNN for K-Nearest Neighbors and SVM for Support-Vector Machine.
Table 9: Several classiers performance with new set of inputs in Santiago.

Parameter     NB [36]    NB [31]    new NB
TP            4          6          6
TN            98         76         86
FP            10         32         22
FN            10         8          8
Sensitivity   28.57%     42.86%     42.86%
Specificity   90.74%     70.37%     79.63%
P0            90.74%     90.48%     91.49%
P1            28.57%     15.79%     21.43%
Mean          59.66%     54.87%     58.85%

Parameter     KNN [36]   KNN [31]   new KNN
TP            6          3          3
TN            93         82         101
FP            15         26         7
FN            8          11         11
Sensitivity   42.86%     21.43%     21.43%
Specificity   86.11%     75.93%     93.52%
P0            92.08%     88.17%     90.18%
P1            28.57%     10.34%     30.00%
Mean          62.40%     48.97%     58.78%

Parameter     SVM [36]   SVM [31]   new SVM
TP            0          0          0
TN            108        108        108
FP            14         14         14
FN            0          0          0
Sensitivity   0.00%      0.00%      0.00%
Specificity   88.52%     88.52%     88.52%
P0            100%       100%       100%
P1            0.00%      0.00%      0.00%
Mean          47.13%     47.13%     47.13%

earthquakes occurred from December 20th 2008 to February 10th 2011. Tables 10 and 11 represent the information gain and its standard deviation (column Average merit) that every seismicity indicator exhibited in [36] and [31], respectively, when a 10-fold cross-validation process was applied to the training sets. As for Talca and Santiago, several features had null contribution: five from the set in [36] and four from the set of [31]. Again, it becomes necessary to assess the quality of the features in order to obtain a better set of them.

Table 10: Average information gain for every feature in [36] for Valparaíso.

Feature   Average merit
x7        0.244 ± 0.202
x6        0.175 ± 0.016
x1        0 ± 0
x2        0 ± 0
x3        0 ± 0
x4        0 ± 0
x5        0 ± 0

Table 11: Average information gain for every feature in [31] for Valparaíso.

Feature   Average merit
T         0.270 ± 0.025
μ         0.193 ± 0.017
c         0.188 ± 0.022
dE^1/2    0.045 ± 0.055
Mmean     0.009 ± 0.028
b         0 ± 0
a         0 ± 0
ΔM        0 ± 0
η         0 ± 0

From the observation of Table 9, the following conclusions can be drawn. First, none of the eleven configurations outperformed on average the results obtained by the ANN when the new set of inputs was used. For the SVM, all input sets generated the same results, showing the inability of the SVM to deal with this kind of data. As for NB, the new results were similar to those obtained with the parameters in [36] and significantly better than those of [31]. Only in the application of KNN did the new set obtain worse results than those of [36] (62.40% versus 58.78%, respectively).
4.3. Valparaíso

Then, the features with greater information gain are selected. As commented before, the number of features is limited to seven to serve as input of the ANN presented in [36]. The new set of inputs is composed of:

Inputs_Valp. = {T, x7, μ, c, x6, dE^1/2, Mmean}    (16)

For this area, a training set comprising earthquakes occurred from January 31st 2006 to December 19th 2008 has been used. Analogously, the test set was composed of the

Given this new set of seismicity indicators, the ANN is again applied, generating the results reported in Table 12. The same conclusion is reached: the new set of inputs obtained significantly better results. Finally, to assess the ANN performance, different classifiers have been applied using as input parameters those in [36], those in [31], and the new set defined in Eq. (16). This information is summarized in Table 13. Again, there was no classifier with any set of seismicity indicators outperforming the new set applied to the ANN. The comparison between classifiers shows that the new set proposed also performs better, on average, than the

Table 12: ANN performance for Valparaíso with the new set of seismicity indicators.

Parameter     ANN [36]   ANN [31]   new ANN
TP            20         34         29
TN            59         47         60
FP            3          15         2
FN            24         10         15
Sensitivity   45.45%     77.27%     65.91%
Specificity   95.16%     75.81%     96.77%
P0            71.08%     82.46%     80.00%
P1            86.96%     69.39%     93.55%
Mean          74.66%     76.23%     84.06%

used. Analogously, the test set was composed of the earthquakes occurred from April 1st 2010 to October 8th 2011. Tables 14 and 15 report the average information gain and its associated standard deviation that every seismicity indicator exhibited in [36] and [31], respectively. This time, there were no features with null average information gain in Table 14, but four features had null contribution in Table 15. Again, the necessity to conduct an analysis to obtain a better set of features is highlighted.

Table 13: Several classifiers performance with the new set of inputs in Valparaíso.

Parameter     NB [36]    NB [31]    new NB
TP            18         21         19
TN            58         53         61
FP            4          9          1
FN            26         23         25
Sensitivity   40.91%     47.73%     43.18%
Specificity   93.55%     85.48%     98.39%
P0            69.05%     69.74%     70.93%
P1            81.82%     70.00%     95.00%
Mean          71.33%     68.24%     76.87%

Parameter     KNN [36]   KNN [31]   new KNN
TP            30         21         33
TN            52         46         53
FP            10         16         9
FN            14         23         11
Sensitivity   68.18%     47.73%     75.00%
Specificity   83.87%     74.19%     86.48%
P0            78.79%     66.67%     82.81%
P1            75.00%     56.76%     78.57%
Mean          76.46%     61.34%     80.47%

Parameter     SVM [36]   SVM [31]   new SVM
TP            35         12         38
TN            46         57         31
FP            16         5          31
FN            9          32         6
Sensitivity   79.55%     27.27%     86.36%
Specificity   74.19%     91.94%     50.00%
P0            83.64%     64.04%     83.78%
P1            68.63%     70.59%     55.07%
Mean          76.50%     63.46%     68.80%

Table 14: Average information gain for every feature in [36] for Pichilemu.

Feature   Average merit
x7        0.512 ± 0.104
x6        0.499 ± 0.086
x4        0.113 ± 0.012
x5        0.103 ± 0.010
x3        0.101 ± 0.009
x2        0.010 ± 0.030
x1        0.010 ± 0.031

Table 15: Average information gain for every feature in [31] for Pichilemu.

Feature   Average merit
μ         0.415 ± 0.026
Mmean     0.298 ± 0.027
dE^1/2    0.152 ± 0.037
T         0.102 ± 0.038
c         0.100 ± 0.076
b         0 ± 0
a         0 ± 0
ΔM        0 ± 0
η         0 ± 0

The next step is to select the best features or, in other words, those that presented greater information gain. The number of features is limited to seven, to use the ANN in [36]. Therefore, the new set of input parameters is:

Inputs_Pichilemu = {x7, x6, μ, Mmean, dE^1/2, x4, x5}    (17)


Given this new set of seismicity indicators, the ANNs are applied, yielding the results summarized in Table 16. As happened in the other three areas, the new set of inputs generated significantly better results. Finally, to evaluate the performance of using ANNs, different classifiers have been applied using as input parameters those in [36], those in [31], and the new set selected above. This information is reported in Table 17. From the observation of Tables 16 and 17, two conclusions can be drawn. First, in concordance with Talca, Santiago and Valparaíso, the best results have been obtained with the new set of seismicity indicators using the ANN as classifier. Second, the use of such a set led to a general improvement in all methods, except for NB, where the use

results obtained with the sets proposed in [36] and [31], except for SVM.
4.4. Pichilemu

For this area, a training set comprising the earthquakes that occurred from August 10th, 2005 to March 31st, 2010 has been used.

Table 16: ANN performance for Pichilemu with the new set of seismicity indicators.

Parameter  TP  TN  FP  FN  Sensitivity  Specificity  P0      P1      Mean
ANN [36]   13  91   2  16  44.83%       97.85%       85.05%  86.67%  78.60%
ANN [31]   14  76  17  15  48.28%       81.72%       83.52%  45.16%  64.67%
new ANN    21  88   5   8  72.41%       94.62%       91.67%  80.77%  84.87%
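The quality parameters reported in these tables follow directly from the confusion-matrix counts. A minimal sketch, in which P0 and P1 are interpreted as the negative and positive predictive values (an assumption consistent with the arithmetic of the reported rows):

```python
def quality_parameters(tp, tn, fp, fn):
    """Quality parameters (in %) from confusion-matrix counts.

    P0 and P1 are taken here as the negative and positive predictive
    values; 'Mean' averages the four parameters.
    """
    sensitivity = 100.0 * tp / (tp + fn)
    specificity = 100.0 * tn / (tn + fp)
    p0 = 100.0 * tn / (tn + fn)
    p1 = 100.0 * tp / (tp + fp)
    mean = (sensitivity + specificity + p0 + p1) / 4.0
    return sensitivity, specificity, p0, p1, mean

# the "new ANN" row of Table 16: TP=21, TN=88, FP=5, FN=8
print([round(v, 2) for v in quality_parameters(21, 88, 5, 8)])
# [72.41, 94.62, 91.67, 80.77, 84.87]
```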

can be easily drawn. Table 18 summarizes the use of the features in all four analyzed areas. As can be seen, the inputs chosen are x6, x7, T, Mmean and dE1/2, together with the mean and the standard deviation of the time elapsed between characteristic events, as they present the highest sum of information gain. The interpretation of the inclusion of these particular features in the set of seismicity indicators is now discussed.
1. The information provided by the dynamical Gutenberg-Richter law (encoded in x7), as well as that of the Omori-Utsu law (encoded in x6), are the most valuable indicators for short-term earthquake prediction, as shown in Table 18, where they occupy the two best positions.
2. The knowledge of the time elapsed between earthquakes, as well as its mean value and associated standard deviation, also seems to be crucial for an accurate prediction process.
3. The mean magnitude of the last earthquakes, Mmean, must also be considered to improve the accuracy of any prediction system.
4. The rate of release of the square root of energy, dE1/2, completes the seven most significant features.
Note that this information was intuitively used in [27], where the authors discovered precursory patterns in the Iberian Peninsula by using x7, T and Mmean. Equally remarkable is the fact that, of the seven best features, five were used in [31] and the remaining two, the most significant ones, were used in [36], thus confirming that both sets of inputs had, at least, a significant number of meaningful features.
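Three of these indicators can be computed directly from an event catalog. A schematic sketch, assuming the Gutenberg-Richter energy relation log10 E = 11.8 + 1.5M (E in ergs) commonly used with the indicators of [31]; function and variable names are illustrative:

```python
from math import sqrt

def catalog_indicators(times, magnitudes):
    """T, Mmean and dE^1/2 over a window of the last n catalogued events.

    times: event times in ascending order (e.g. days);
    magnitudes: the matching event magnitudes.
    """
    T = times[-1] - times[0]                    # elapsed time over the window
    m_mean = sum(magnitudes) / len(magnitudes)  # mean magnitude
    # rate of release of the square root of seismic energy
    dE_half = sum(sqrt(10 ** (11.8 + 1.5 * m)) for m in magnitudes) / T
    return T, m_mean, dE_half

T, m_mean, dE_half = catalog_indicators([0.0, 2.5, 4.0, 10.0], [4.4, 4.1, 4.8, 4.5])
print(T, round(m_mean, 2))  # 10.0 4.45
```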
Table 18: Information gain of every feature in the four analyzed areas, together with its sum and resulting rank (feature names marked "–" were illegible in the source).

Feature  Talca  Santiago  Valparaíso  Pichilemu  Sum    Rank
x1       0.010  0.012     0.000       0.010      0.032  11
x2       0.000  0.003     0.000       0.010      0.013  12
x3       0.009  0.009     0.000       0.101      0.119  8
x4       0.000  0.009     0.000       0.113      0.113  9
x5       0.000  0.009     0.000       0.103      0.112  10
x6       0.017  0.000     0.175       0.499      0.691  2
x7       0.277  0.000     0.244       0.512      1.033  1
T        0.203  0.113     0.270       0.100      0.686  3
–        0.177  0.000     0.193       0.102      0.472  5
Mmean    0.160  0.000     0.009       0.415      0.584  4
dE1/2    0.084  0.029     0.188       0.152      0.453  6
–        0.000  0.000     0.045       0.298      0.343  7
M        0.000  0.000     0.000       0.000      0.000  14
a        0.009  0.000     0.000       0.000      0.009  13
–        0.000  0.000     0.000       0.000      0.000  14
–        0.000  0.000     0.000       0.000      0.000  14

Table 17: Several classifiers performance with the new set of inputs in Pichilemu.

Parameter  TP  TN  FP  FN  Sensitivity  Specificity  P0      P1      Mean
NB [36]    12  91   2  17  41.38%       97.85%       84.26%  85.71%  77.30%
NB [31]     3  93   0  26  10.34%       100%         78.15%  100%    72.12%
new NB      3  90   3  26  10.34%       96.77%       77.59%  50.00%  58.68%
KNN [36]   20  38  55   9  68.97%       40.86%       80.85%  26.67%  54.34%
KNN [31]   13  60  30  16  44.83%       66.67%       78.95%  30.23%  55.17%
new KNN    21  62  28   8  72.41%       68.89%       88.57%  42.86%  68.18%
SVM [36]    0  93   0  29  0.00%        100%         76.23%  0.00%   44.06%
SVM [31]   14  82  11  15  48.28%       88.17%       84.54%  56.00%  69.25%
new SVM    13  87   6  16  44.83%       93.55%       84.47%  68.42%  72.82%

of only the indicators proposed in [36] and [31] reached better results on average.
4.5. Discussion on the use of features

5. Statistical tests

Previous sections have provided a large amount of data and tables. This section summarizes all the information presented so far, so that general conclusions

This section shows that the classifiers applied to the four Chilean zones studied yield statistically significant differences in their results. As described in Section 3.3, Friedman's test is used, with a significance level of p < 0.05. First, the matrix rij is constructed, with size (n × k) = (4, 12), where n = 4 is the number of analyzed zones and k = 12 the number of different classifiers applied. Each element of the matrix is the mean performance value retrieved from Tables 4, 5, 8, 9, 12, 13, 16 and 17. For legibility reasons, Table 19 shows the transpose matrix, r'ij.
Table 19: Input values for Friedman's test (transpose matrix).

ANN with the new set of inputs and the ANNs with the inputs proposed in [36] and [31], respectively. Table 21 shows the p-values obtained by the ANN in [36] and by the ANN in [31] for a significance level α = 0.05. The test concludes that the new set of inputs generates better results, since both hypotheses are rejected.
Table 21: Holm-Hochberg test using the ANN with the new set of inputs as control algorithm.

Classifier  Talca   Santiago  Valparaíso  Pichilemu
ANN [36]    0.4986  0.6568    0.7466      0.7860
ANN [31]    0.4233  0.5333    0.7623      0.6467
new ANN     0.7884  0.7189    0.8406      0.8487
NB [36]     0.4400  0.5966    0.7133      0.7730
NB [31]     0.3950  0.5487    0.6824      0.7212
new NB      0.4986  0.5885    0.7687      0.5868
KNN [36]    0.4556  0.6240    0.7646      0.5434
KNN [31]    0.5327  0.4897    0.6134      0.5517
new KNN     0.5097  0.5878    0.8047      0.6818
SVM [36]    0.4556  0.4713    0.7650      0.4406
SVM [31]    0.4556  0.4713    0.6346      0.6925
new SVM     0.4556  0.4713    0.6880      0.7282
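The Friedman statistic over such a blocks-by-treatments matrix can be sketched as follows. This is an illustrative stdlib-only implementation (not necessarily the exact routine used in the study); ties receive average ranks:

```python
def friedman_statistic(scores):
    """Friedman Q for an n-blocks x k-treatments score matrix (higher = better).

    Returns Q and the average rank of each treatment (1 = best). Q is then
    compared against a chi-squared distribution with k - 1 degrees of freedom.
    """
    n, k = len(scores), len(scores[0])
    avg_rank = [0.0] * k
    for row in scores:
        order = sorted(range(k), key=lambda j: -row[j])
        i = 0
        while i < k:  # assign average ranks to groups of tied scores
            j = i
            while j + 1 < k and row[order[j + 1]] == row[order[i]]:
                j += 1
            for m in range(i, j + 1):
                avg_rank[order[m]] += ((i + j) / 2 + 1) / n
            i = j + 1
    q = 12 * n / (k * (k + 1)) * sum((r - (k + 1) / 2) ** 2 for r in avg_rank)
    return q, avg_rank

# toy example: 4 zones (blocks), 3 classifiers (treatments)
scores = [[0.90, 0.60, 0.50], [0.80, 0.70, 0.60],
          [0.95, 0.50, 0.55], [0.85, 0.60, 0.40]]
q, ranks = friedman_statistic(scores)
print(q, ranks)  # 6.5 [1.0, 2.25, 2.75]
```

For the 4 × 12 matrix of Table 19, the resulting Q would be compared against the chi-squared distribution with 11 degrees of freedom (critical value 19.68 at α = 0.05).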

i  Classifier  z     p-value         α/i
2  ANN [31]    4.56  5.01 × 10^-6    0.025
1  ANN [36]    3.65  2.61 × 10^-4    0.050
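The mechanics of this post-hoc step can be sketched with the z values of Table 21. The snippet below is an illustrative Holm step-down procedure with two-sided normal p-values, not the exact code used in the study:

```python
from statistics import NormalDist

def holm_step_down(z_by_name, alpha=0.05):
    """Holm's step-down test against a control algorithm, from z statistics.

    Hypotheses are ordered by decreasing z; hypothesis i is tested at
    alpha/(m - i) and, once one is retained, all remaining are retained too.
    Returns (name, p_value, adjusted_alpha, rejected) tuples.
    """
    items = sorted(z_by_name.items(), key=lambda kv: -kv[1])
    results, still_rejecting = [], True
    for i, (name, z) in enumerate(items):
        p = 2.0 * (1.0 - NormalDist().cdf(z))  # two-sided p-value
        a = alpha / (len(items) - i)           # Holm's adjusted level
        still_rejecting = still_rejecting and p < a
        results.append((name, p, a, still_rejecting))
    return results

for name, p, a, rej in holm_step_down({"ANN [31]": 4.56, "ANN [36]": 3.65}):
    print(name, f"p={p:.2e}", f"alpha_i={a:.3f}", rej)
```

Recomputing the p-values from the rounded z statistics gives roughly 5.1 × 10^-6 and 2.6 × 10^-4, consistent with Table 21; both fall below their adjusted levels, so both hypotheses are rejected.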

6. Conclusions

Next, the ranking matrix is calculated; the resulting average rankings are shown in Table 20.
Table 20: Average rankings of the classiers applied to Chile.

Classifier  Ranking
ANN [36]    2.88
ANN [31]    7.25
new ANN     1.00
NB [36]     5.25
NB [31]     7.25
new NB      4.38
KNN [36]    4.38
KNN [31]    5.63
new KNN     3.50
SVM [36]    7.63
SVM [31]    7.88
new SVM     6.88

An optimized earthquake prediction methodology based on ANNs has been introduced in this work. To enhance the prediction, an analysis of how different seismicity indicators influence the model generation has been conducted. In particular, a feature selection based on the information gain provided by each indicator individually has been carried out, in order to ensure that the ANN inputs are those with maximum correlation with the output. This strategy has been evaluated on four Chilean zones with different geophysical properties, to show the generality of the proposed method. A comparison with other well-known techniques has been provided, together with a statistical analysis, showing that the results are not only better in terms of the quality parameters assessed but also follow statistically different distributions. It can thus be concluded that the new approach outperformed every algorithm to which it was compared; that is, the use of the new sets of seismicity parameters as ANN inputs generated the best results in all evaluated cases.
Acknowledgments

From the analysis of Table 20, two conclusions can easily be drawn. First, the use of the new set of inputs improved, on average, all classifiers (ANN, NB, KNN and SVM) and, second, the ANN obtained better results than any other classifier, as was to be proved. However, this fact does not reject the null hypothesis by itself. The application of Friedman's test led to p = P(χ²(k−1) ≥ Q) = 0.0432, which satisfies the condition p < 0.05; p thus reaches a significant value and the null hypothesis is rejected. Since the p-value is less than 0.05, a post-hoc analysis can be carried out to prove that the results obtained by the ANN with the new set of inputs are statistically different from (and therefore better than) those obtained by the ANNs with the initial sets of inputs. The Holm-Hochberg test has been applied to compare separately the

The authors want to thank TGT (www.geofisica.cl) for the support through grants number 2210 and 2211. The financial support given by the Spanish Ministry of Science and Technology, projects BIA2004-01302 and TIN2011-28956-C02-01, is equally acknowledged. This work has also been funded by project TIC-7528 of the Junta de Andalucía. Finally, this work has been partially funded by a JPI 2012 Banco Santander grant.
References

[1] H. Adeli and A. Panakkat. A probabilistic neural network for earthquake magnitude prediction. Neural Networks, 22:1018-1024, 2009.
[2] A. S. N. Alarifi, N. S. N. Alarifi, and S. Al-Humidan. Earthquakes magnitude predication using artificial neural network in northern Red Sea area. Journal of King Saud University - Science, 24:301-313, 2012.
[3] C. R. Allen. Responsibilities in earthquake prediction. Bulletin of the Seismological Society of America, 66:2069-2074, 1982.
[4] E. I. Alves. Earthquake forecasting using neural networks: Results and future work. Nonlinear Dynamics, 44(1-4):341-349, 2006.
[5] M. Anad, A. Dash, M. S. J. Kumar, and A. Kesarkar. Prediction and classification of thunderstorms using artificial neural network. International Journal of Engineering Science and Technology, 3(5):4031-4035, 2011.
[6] A. Arauzo-Azofra, J. L. Aznarte, and J. M. Benítez. Empirical study of feature selection methods based on individual feature evaluation for classification problems. Expert Systems with Applications, 38(7):8170-8177, 2007.
[7] M. Böse, F. Wenzel, and M. Erdik. PreSEIS: a neural network-based approach to earthquake early warning for finite faults. Bulletin of the Seismological Society of America, 98(1):366-382, 2008.
[8] G. Box and G. Jenkins. Time Series Analysis: Forecasting and Control. John Wiley and Sons, 2008.
[9] G. Chattopadhyay and S. Chattopadhyay. Dealing with the complexity of earthquake using neurocomputing techniques and estimating its magnitudes with some low correlated predictors. Arabian Journal of Geosciences, 2(3):247-255, 2009.
[10] C. Cortes and V. Vapnik. Support-vector networks. Machine Learning, 20(3):273-297, 1995.
[11] T. M. Cover and P. E. Hart. Nearest neighbor pattern classification. IEEE Transactions on Information Theory, 13(1):21-27, 1967.
[12] M. D'Arco, A. Liccardo, and N. Pasquino. ANOVA-based approach for DAC diagnostics. IEEE Transactions on Instrumentation and Measurement, 61(7):1874-1882, 2012.
[13] J. E. Ebel, D. W. Chambers, A. L. Kafka, and J. A. Baglivo. Non-Poissonian earthquake clustering and the hidden Markov model as bases for earthquake forecasting in California. Seismological Research Letters, 78(1):57-65, 2007.
[14] D. A. Freedman. Statistical Models: Theory and Practice. Cambridge University Press, 2005.
[15] S. García, A. Fernández, J. Luengo, and F. Herrera. A study of statistical techniques and performance measures for genetics-based machine learning: accuracy and interpretability. Soft Computing, 13(10):959-977, 2009.
[16] Y. Han and L. Yu. A variance reduction framework for stable feature selection. Statistical Analysis and Data Mining, 5(5):428-445, 2012.
[17] D. J. Hand and K. Yu. Idiot's Bayes - not so stupid after all? International Statistical Review, 69(3):385-399, 2001.
[18] N. Houlié, J. C. Komorowski, M. de Michele, M. Kasereka, and H. Ciraba. Early detection of eruptive dykes revealed by normalized difference vegetation index (NDVI) on Mt. Etna and Mt. Nyiragongo. Earth and Planetary Science Letters, 246(3-4):231-240, 2006.
[19] A. Jiménez, A. M. Posadas, and K. F. Tiampo. Describing seismic pattern dynamics by means of Ising cellular automata. Lecture Notes in Earth Sciences, 112:273-290, 2008.
[20] P. Kamatchi, K. B. Rao, N. R. Iyer, and S. Arunachalam. Neural network-based methodology for inter-arrival times of earthquakes. Natural Hazards, 64(2):1291-1303, 2012.
[21] T. Kohonen. Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43:59-69, 1986.
[22] F. Kulahci, M. Inceoz, M. Dogru, E. Aksoy, and O. Baykara. Artificial neural network model for earthquake prediction with radon monitoring. Applied Radiation and Isotopes, 67(1):212-220, 2009.
[23] M. Moustra, M. Avraamides, and C. Christodoulou. Artificial neural networks for earthquake prediction using time series magnitude data or seismic electric signals. Expert Systems with Applications, 38(12):15032-15039, 2011.
[24] R. Madahizadeh and M. Allamehzadeh. Prediction of aftershocks distribution using artificial neural networks and its application on the May 12, 2008 Sichuan earthquake. Journal of Seismology and Earthquake Engineering, 11(3):111-120, 2009.
[25] F. Martínez-Álvarez, A. Troncoso, A. Morales-Esteban, and J. C. Riquelme. Computational intelligence techniques for predicting earthquakes. Lecture Notes in Artificial Intelligence, 6679(2):287-294, 2011.
[26] M. Martínez-Ballesteros, F. Martínez-Álvarez, A. Troncoso, and J. C. Riquelme. An evolutionary algorithm to discover quantitative association rules in multidimensional time series. Soft Computing, 15(10):2065-2084, 2011.
[27] A. Morales-Esteban, F. Martínez-Álvarez, A. Troncoso, J. L. de Justo, and C. Rubio-Escudero. Pattern recognition to forecast seismic time series. Expert Systems with Applications, 37(12):8333-8342, 2010.
[28] K. Z. Nanjo, J. R. Holliday, C. C. Chen, J. B. Rundle, and D. L. Turcotte. Application of a modified pattern informatics method to forecasting the locations of future large earthquakes in central Japan. Tectonophysics, 424:351-366, 2006.
[29] University of Chile. National Service of Seismology. http://ssn.dgf.uchile.cl/seismo.html.
[30] The University of Waikato. WEKA: Data mining with open source machine learning software in Java. http://www.cs.waikato.ac.nz/ml/weka/.
[31] A. Panakkat and H. Adeli. Neural network models for earthquake magnitude prediction using multiple seismicity indicators. International Journal of Neural Systems, 17(1):13-33, 2007.
[32] A. Panakkat and H. Adeli. Recent efforts in earthquake prediction (1990-2007). Natural Hazards Review, 9(2):70-80, 2008.
[33] A. Panakkat and H. Adeli. Recurrent neural network for approximate earthquake time and location prediction using multiple seismicity indicators. Computer-Aided Civil and Infrastructure Engineering, 24:280-292, 2009.
[34] K. Ramar and T. T. Mirnalinee. An ontological representation for tsunami early warning system. In Proceedings of the IEEE International Conference on Advances in Engineering, Science and Management, pages 93-98, 2012.
[35] J. Reyes and V. Cárdenas. A Chilean seismic regionalization through a Kohonen neural network. Neural Computing and Applications, 19:1081-1087, 2010.
[36] J. Reyes, A. Morales-Esteban, and F. Martínez-Álvarez. Neural networks to predict earthquakes in Chile. Applied Soft Computing, doi:10.1016/j.asoc.2012.10.014, 2012.
[37] R. Romero-Záliz, C. Rubio-Escudero, I. Zwir, and C. del Val. Optimization of multi-classifiers for computational biology: application to gene finding and expression. Theoretical Chemistry Accounts: Theory, Computation, and Modeling, 125(3):599-611, 2010.
[38] R. Ruiz, J. C. Riquelme, and J. S. Aguilar-Ruiz. Incremental wrapper-based gene selection from microarray expression data for cancer classification. Pattern Recognition, 39(12):2383-2392, 2006.
[39] S. Srilakshmi and R. K. Tiwari. Model dissection from earthquake time series: A comparative analysis using nonlinear forecasting and artificial neural network approach. Computers and Geosciences, 35:191-204, 2009.
[40] W. Sun, S. Shan, C. Zhang, P. Ge, and L. Tao. Prediction of typhoon losses in the South-East of China based on B-P network. In Proceedings of the IEEE International Conference on Artificial Intelligence and Computational Intelligence, pages 252-256, 2010.
[41] K. F. Tiampo and R. Shcherbakov. Seismicity-based earthquake forecasting techniques: Ten years of progress. Tectonophysics, 522-523:89-121, 2012.
[42] Y. Toya, K. F. Tiampo, J. B. Rundle, C. C. Chen, and W. Klein. Pattern informatics approach to earthquake forecasting in 3D. Concurrency and Computation: Practice and Experience, 22:1569-1592, 2010.
[43] F. Zhang. The future of hurricane prediction. Computing in Science and Engineering, 13(1):9-12, 2011.
