To cite this article: Qiubing Ren, Mingchao Li & Shuai Han (2019) Tectonic discrimination of
olivine in basalt using data mining techniques based on major elements: a comparative study
from multiple perspectives, Big Earth Data, 3:1, 8-25, DOI: 10.1080/20964471.2019.1572452
RESEARCH ARTICLE
1. Introduction
Olivine is one of the most common minerals on Earth. It is the main mineral constituent of mantle rock and one of the earliest minerals to crystallize from magma (Zhou, Chen, & Chen, 1980). Olivine is also a major rock-forming mineral in basic and ultrabasic igneous rocks such as basalt, gabbro and peridotite. Olivine is composed mainly of Mg or Fe and contains other elements, such as Mn, Ni and Co; its crystals occur as grains dispersed in rocks or as granular aggregates (Enciso-Maldonado et al., 2015). Basalt and its minerals have long been the principal means of exploring the magmatic process, the thermal state of the mantle and the elemental composition of the mantle (Di et al., 2017). As the most important mineral constituent of mantle rock and the earliest crystalline mineral of magma, the basic elements of olivine in basalt can provide effective information about the partial melting of the mantle, the early crystallization of magma and mantle metasomatism (Howarth & Harris, 2017).
Basalt is usually divided into different types based on tectonic setting. Mid-ocean ridge basalt (MORB), ocean island basalt (OIB) and island arc basalt (IAB) are the three types of most concern to the academic community (Vermeesch, 2006). How to discriminate the tectonic settings of basalt has therefore become an important issue in geochemistry. In general, we can discriminate the tectonic settings in which magmatic rocks, mainly basalt and granite, formed, and characterize the chemical properties of their magma sources, in terms of the chemical composition of the magmatic rocks. This is a common and feasible discrimination approach, as many studies have demonstrated (Bhatia & Crook, 1986; Li, Arndt, Tang, & Ripley, 2015; Sánchez-Muñoz et al., 2017). The olivine in basalt records much information about the formation and evolution of basaltic magma, which is helpful for discriminating the three tectonic settings. At present, it is generally accepted that the chemical composition of olivine is one important piece of evidence for identifying whether a basalt represents primitive magma, but the view that olivine is closely related to the tectonic setting in which it formed remains controversial. This paper aims to demonstrate the rationality of the latter viewpoint.
In the 1970s, Pearce and Cann (1973) first proposed the basalt discrimination diagram based on chemical composition. The basalt discrimination diagram organically combines the tectonic setting with the geochemical characteristics of basalt, thus opening up a new way to study plate tectonics and continental orogenic belts and greatly enriching basalt research. Basalt discrimination diagrams have been widely used in academic circles because of their solid theoretical foundations and concise forms of expression (Pearce, 1996), and the study of basalt tectonic settings has flourished since then. Even now, the discrimination diagram is still the main method for basalt tectonic setting discrimination. However, with its increasing use, its inherent defects have gradually become apparent, such as empiricism and subjectivity, a lack of rigorous theory, contradictory discriminations and limited applicability (Luo et al., 2018; Zhang, 1990). These drawbacks greatly affect the classification accuracy and reliability of the discrimination diagram. To our knowledge, it is difficult to discriminate basalt tectonic settings using a discrimination diagram based on the geochemical characteristics of olivine in basalt, and the literature contains almost no reports of olivine being used for basalt tectonic setting discrimination. Therefore, we need another way to discriminate the tectonic settings of basalt on the basis of the chemical composition of olivine.
The data mining technique is a good choice. With the rapid development of new technologies such as big data and cloud computing, as well as the substantial improvement in the computing capacity of computer hardware, data mining has attracted increasing attention from scholars at home and abroad in recent years (Zhou et al., 2018), and fruitful results have been achieved in pattern recognition, function approximation and simulation modeling. In geochemistry, however, research on discriminating the tectonic settings of basalt using data mining is still at an initial stage. Petrelli and Perugini (2016) used Support Vector Machines (SVM) to discriminate the tectonic settings of volcanic rocks and obtained high classification scores. Ueki, Hino, and Kuwatani (2018) adopted SVM, Random Forest and Sparse Multinomial Regression approaches for a classification performance comparison. The results indicated that data mining is a highly effective tool in geochemical research. Building on these two successful cases, we employ data mining in a series of simulation experiments, all concerning the tectonic setting discrimination of basalt based on the chemical composition of olivine. As an early attempt in geochemical research, we proceed from the perspective of comparative research, which may lay the groundwork for future studies (Han, Li, Ren, & Liu, 2018). Accordingly, we conduct several fundamental comparative tests, mainly comparing different data mining algorithms, data preprocessing methods, combinations of geochemical characteristics and sample data volumes. Nevertheless, we acknowledge that the experimental results have some limitations and are open to comment and further research.
This paper is organized as follows. In Section 2, an overall research framework of this
paper is introduced. Section 3 presents a brief description of different data mining
algorithms, followed by a synopsis of the cross-validation method and measurement
criteria. The data description, preliminary experiment and benchmark classification
effects are provided in Section 4. Section 5 illustrates and discusses the experimental
results and analysis for four comparative tests. The concluding remarks and future work
are finally mentioned in Section 6.
2. Overall framework
The overall research framework shown in Figure 1 is outlined below.
(1) The geochemical data of olivine are collected from the GEOROC and PetDB databases, followed by data cleaning and management, thus laying a good data foundation for this study. It is worth mentioning that the olivine data used were measured via common methods of major element analysis, e.g. electron probe microanalysis (EPMA) or instrumental neutron activation analysis (INAA) (Arevalo, McDonough, & Luong, 2009).
(2) Different data mining algorithms are used to discriminate tectonic settings of the
collated olivine in basalt in terms of geochemical characteristics. The classification
performance of different data mining algorithms is compared.
(3) The effect of data preprocessing on the discrimination results of tectonic settings
of olivine has not been considered in the previous step. Hence, the impacts of
different data preprocessing techniques on the classification performance of four
classifiers are compared.
(4) The importance score of each geochemical characteristic of olivine is analyzed
through Random Forest. The effects of different combinations of geochemical
characteristics on the single and overall classification accuracy of four classifiers
are considered. The relationship between the cumulative feature importance and
the classification accuracy is also studied.
(5) It is necessary to study the effects of different sample data volumes on the single and overall classification accuracy of the four classifiers when big data mining is used to discriminate the tectonic settings of olivine. This will provide a reference for subsequent studies.
(6) Four comparative studies are conducted, including the classification performance of
four classifiers, the impacts of different data preprocessing techniques, combinations
of geochemical characteristics and sample data volumes. Some important conclu-
sions are drawn, and a few valuable suggestions for future research are proposed.
3. Methodology
3.1. Data mining algorithms
3.1.1. Logistic regression classifier (LRC)
Logistic regression is an approach to learning functions of the form $f : X \to Y$, or $P(Y \mid X)$ in the case where $Y$ is discrete-valued and $X = (X_1, X_2, \ldots, X_n)$ is any vector of discrete or continuous variables (Mitchell, 2005; Subasi & Ercelebi, 2005). In this section we primarily consider the case where $Y$ is a boolean variable, in order to simplify notation. More generally, $Y$ can take on any of the discrete values $\{y_1, y_2, \ldots, y_k\}$, which is the case in our experiments.

Logistic regression assumes a parametric form for the distribution $P(Y \mid X)$, then directly estimates its parameters from the training data. The parametric model assumed by logistic regression in the case where $Y$ is boolean is

$$P(Y = 1 \mid X) = \frac{1}{1 + \exp\left(w_0 + \sum_{i=1}^{n} w_i X_i\right)} \tag{1}$$

and

$$P(Y = 0 \mid X) = \frac{\exp\left(w_0 + \sum_{i=1}^{n} w_i X_i\right)}{1 + \exp\left(w_0 + \sum_{i=1}^{n} w_i X_i\right)} \tag{2}$$

Notice that Equation (2) follows directly from Equation (1), because the sum of these two probabilities must equal one.
One highly expedient property of this form for $P(Y \mid X)$ is that it leads to a simple linear expression for classification. To classify any given $X$, we generally want to assign the value $y_k$ that maximizes $P(Y = y_k \mid X)$. In other words, we assign the label $Y = 0$ if the condition $\frac{P(Y = 0 \mid X)}{P(Y = 1 \mid X)} > 1$ holds. Substituting from Equations (1) and (2), this becomes $\exp\left(w_0 + \sum_{i=1}^{n} w_i X_i\right) > 1$. Taking the natural log of both sides, we obtain a linear classification rule that assigns the label $Y = 0$ if $X$ satisfies $w_0 + \sum_{i=1}^{n} w_i X_i > 0$, and assigns $Y = 1$ otherwise.

All model parameters can be estimated from the training set; the common method is maximum likelihood estimation. To estimate the parameters for the distribution of a feature, we must assume a distribution or construct nonparametric models for the features from the training set.
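As a concrete illustration, the boolean decision rule above can be sketched in a few lines; the weights `w0` and `w` below are hypothetical values chosen only for illustration, not fitted parameters.

```python
import math

def p_y1(x, w0, w):
    # Equation (1): P(Y=1|X) = 1 / (1 + exp(w0 + sum_i w_i X_i))
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 1.0 / (1.0 + math.exp(z))

def classify(x, w0, w):
    # Linear rule: assign Y = 0 if w0 + sum_i w_i X_i > 0, else Y = 1
    z = w0 + sum(wi * xi for wi, xi in zip(w, x))
    return 0 if z > 0 else 1

w0, w = -1.0, [2.0, -0.5]      # hypothetical weights
x = [1.0, 1.0]                 # z = -1.0 + 2.0 - 0.5 = 0.5 > 0, so Y = 0
print(classify(x, w0, w))      # 0
print(p_y1(x, w0, w) < 0.5)    # True: P(Y=1|X) < 0.5 exactly when z > 0
```

Note how the probabilistic rule and the linear threshold rule agree: the sign of $w_0 + \sum_i w_i X_i$ alone decides the label.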
The output category of Random Forest is determined by the mode of the outputs of its decision trees, as shown in Figure 2.
Random Forest is an ensemble learning method composed of $K$ decision trees $\{h(X, L_k),\ k = 1, 2, \ldots, K\}$, where $\{L_k,\ k = 1, 2, \ldots, K\}$ is a set of independent and identically distributed random vectors used to control the growth of each tree. The algorithm first uses bootstrap sampling to draw $K$ samples from the training set, then builds a decision tree on each sample, finally obtaining the sequence $\{h_1(X, L_1), h_2(X, L_2), \ldots, h_K(X, L_K)\}$. For a given input, each decision tree produces a classification result, and the final classification is decided by a simple majority vote over the trees. The classification decision can be expressed as

$$H(x) = \arg\max_{Y} \sum_{i=1}^{K} I\left(h_i(x, L_i) = Y\right) \tag{6}$$

where $H(x)$ represents the combined classification model, $h_i$ is a single decision tree classification model, $I(\cdot)$ is an indicator function, and $Y$ represents the output variable.
[Figure 2: schematic of Random Forest, in which randomized decision trees each classify the input and vote for the best result.]
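The majority vote of Equation (6) can be sketched directly; the tree outputs below are hypothetical predictions for a single olivine sample.

```python
from collections import Counter

def forest_vote(tree_predictions):
    # Equation (6): H(x) = argmax_Y sum_i I(h_i(x, L_i) = Y), i.e. the
    # class receiving the most votes across the K decision trees.
    return Counter(tree_predictions).most_common(1)[0][0]

# Hypothetical outputs of K = 5 trees for one sample:
votes = ["MORB", "IAB", "IAB", "OIB", "IAB"]
print(forest_vote(votes))  # IAB
```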
$$\Delta w_{ji}(n) = -\eta \frac{\partial \varepsilon(n)}{\partial v_j(n)} y_i(n), \qquad -\frac{\partial \varepsilon(n)}{\partial v_j(n)} = e_j(n)\,\varphi'(v_j(n)) \tag{7}$$

where $y_i$ is the output of the previous neuron, $\eta$ is the learning rate, and $\varphi'$ is the derivative of the activation function. Although the change in weights to a hidden node is more complicated, the relevant derivative is

$$\frac{\partial \varepsilon(n)}{\partial v_j(n)} = \varphi'(v_j(n)) \sum_{k} \frac{\partial \varepsilon(n)}{\partial v_k(n)}\, w_{kj}(n) \tag{8}$$

This relies on the change in weights of the $k$th nodes, and the algorithm represents a backpropagation of the activation function.
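Under the common assumption of a sigmoid activation, for which $\varphi'(v) = \varphi(v)(1 - \varphi(v))$, a single delta-rule update for one output weight consistent with Equation (7) might look as follows; all numeric values here are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

# One delta-rule update for a single sigmoid output neuron, per Equation (7):
# delta_j = e_j * phi'(v_j)  and  Delta w_ji = eta * delta_j * y_i.
eta = 0.5        # learning rate (hypothetical)
y_i = 1.0        # output of the previous neuron (hypothetical)
w_ji = 0.2       # current weight from neuron i to output neuron j
target = 1.0     # desired output

v_j = w_ji * y_i                    # induced local field
out = sigmoid(v_j)                  # phi(v_j)
e_j = target - out                  # error signal e_j(n)
delta_j = e_j * out * (1.0 - out)   # phi'(v) = phi(v)(1 - phi(v)) for the sigmoid
w_ji = w_ji + eta * delta_j * y_i   # weight moves so as to reduce the error
print(w_ji > 0.2)  # True: a positive error pushes the weight upward
```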
On the other hand, the confusion matrix, as a visual tool, can intuitively exhibit the classification precision (Abdel-Zaher & Eldeib, 2016; Patil & Sherekar, 2013). Together they constitute a scientific and well-rounded evaluation system.
Table 2. Discrimination results of tectonic settings of olivine based on four data mining algorithms.

| Classes | Test set | LRC: Number / Accuracy (%) | Naïve Bayes: Number / Accuracy (%) | Random Forest: Number / Accuracy (%) | MLP: Number / Accuracy (%) |
|---|---|---|---|---|---|
| MORB | 115 | 86 / 74.78 | 102 / 88.70 | 99 / 86.09 | 102 / 88.70 |
| OIB | 89 | 49 / 55.06 | 36 / 40.45 | 60 / 67.42 | 72 / 80.90 |
| IAB | 112 | 96 / 85.71 | 89 / 79.46 | 104 / 92.86 | 104 / 92.86 |
| Overall | 316 | 231 / 73.10 | 227 / 71.84 | 263 / 83.23 | 278 / 87.98 |

* In the original, bold denotes the best performance measure among the classifiers for each tectonic setting.
[Figure: confusion matrices of true vs. predicted labels over MORB, OIB and IAB for the four classifiers, panels (a)–(d).]
It can be found that the classification accuracy of MORB, OIB and IAB based on MLP is 88.70%, 80.90% and 92.86%, respectively, and the overall classification accuracy is 87.98%. Compared with the other three classifiers, MLP has the best classification effect. Random Forest is second: its IAB classification accuracy equals that of MLP, while its MORB and OIB accuracies are lower. The classification performance of LRC and Naïve Bayes is roughly similar; on the whole, though, LRC is both better and simpler. However, the results of this constrained experiment have some limitations, and more advanced experiments are necessary.
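The overall accuracy in Table 2 is simply the pooled count of correct classifications across classes; a quick check using the MLP column (counts taken from Table 2, with small rounding differences possible against the printed percentages):

```python
# Correctly classified counts and test-set sizes for MLP, from Table 2.
correct = {"MORB": 102, "OIB": 72, "IAB": 104}
test_set = {"MORB": 115, "OIB": 89, "IAB": 112}

per_class = {c: 100.0 * correct[c] / test_set[c] for c in correct}
overall = 100.0 * sum(correct.values()) / sum(test_set.values())

print(round(per_class["MORB"], 2))  # 88.7
print(round(overall, 1))            # 88.0, matching Table 2 up to rounding
```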
$$X_{\text{normalization}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} \tag{9}$$

where $X$ represents each sample datum, $X_{\max}$ is the maximum value of the sample data, and $X_{\min}$ is the minimum value of the sample data.
Zero-mean standardization (Bo, Wang, & Jiao, 2006) transforms the data to have zero mean and unit variance. In most cases, standardization is recommended, for example using the equation below

$$X_{\text{standardization}} = \frac{X - \mu}{\sigma} \tag{10}$$

where $X$ represents each sample datum, $\mu$ is the mean of the sample data, and $\sigma$ is the standard deviation of the sample data.
Missing value handling (Frane, 1976) is another important step in data preprocessing. Missing data can distort the classification results and degrade classifier performance, while directly deleting records containing missing values wastes data and weakens the generalization ability of the classifier. The statistics-based imputation method (Donders, van der Heijden, Stijnen, & Moons, 2006) instead replaces each missing value with a representative value, such as the mean or mode, so that the data become relatively complete. Since the missing values in the data used here are numeric, each missing attribute value can be replaced with the mean of all other values of that attribute.
However, all of the techniques mentioned above have drawbacks (García, Ramírez-Gallego, Luengo, Benítez, & Herrera, 2016). If a dataset contains outliers, normalization will scale the normal data into a very small interval, and most datasets do contain outliers. Standardization, unlike normalization, leaves the transformed data unbounded. The mean imputation method is easy to implement and does not change the mean of the sample data, but it can reduce the variance of the data, which is undesirable. Therefore, it is necessary to study the impacts of the three data preprocessing methods on the classification performance of different classifiers.
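A minimal sketch of the three preprocessing steps on a hypothetical numeric column with one missing value (in practice library routines such as scikit-learn's scalers and imputers would be used):

```python
from statistics import mean, pstdev

# Hypothetical column with a missing value, for illustration only.
data = [2.0, 4.0, None, 8.0, 10.0]

# Mean imputation: replace each missing entry with the mean of observed values.
observed = [x for x in data if x is not None]
filled = [x if x is not None else mean(observed) for x in data]

# Min-max normalization, Equation (9): rescales to [0, 1]; outlier-sensitive.
lo, hi = min(filled), max(filled)
normalized = [(x - lo) / (hi - lo) for x in filled]

# Zero-mean standardization, Equation (10): zero mean, unit variance; unbounded.
mu, sigma = mean(filled), pstdev(filled)
standardized = [(x - mu) / sigma for x in filled]

print(filled[2])                         # 6.0, the mean of the observed values
print(min(normalized), max(normalized))  # 0.0 1.0
```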
[Figure 5 plots overall classification accuracy (%) against the number of folds K (4–16), with curves for raw data, standardized data, normalized data and data without missing values in each panel.]
Figure 5. Impacts of different data preprocessing techniques on the discrimination results of tectonic
settings of olivine based on four data mining algorithms: (a) LRC, (b) Naïve Bayes, (c) Random Forest,
and (d) MLP.
while the other two preprocessing methods have no effect. It can also be seen in Figure 5(c) that the mean imputation method greatly improves the classification accuracy of Random Forest, and the increase, approximately 10%, is much larger than that for Naïve Bayes. The standardization and normalization methods are again ineffective. As shown in Figure 5(d), the mean imputation technique unexpectedly reduces the classification accuracy of MLP, while the other techniques leave it unchanged. Therefore, the classification accuracy of the combination model of the mean imputation technique and the Random Forest algorithm is the highest, about 6% higher than that of the single MLP. Moreover, the maximum difference in classification accuracy of each data mining algorithm is very small within the range of 5 to 15 folds, which shows that the four classification models are all robust.
[Figure 6 shows a bar chart of feature importance (%) for the 12 basic elements: P2O5, Cr2O3, MnO, Al2O3, TiO2, FeOT, Na2O, NiO, MgO, SiO2, CaO and K2O.]
Figure 6. Feature importance evaluation results of 12 basic elements using Random Forest.
importance score. For example, P2O5 is the least important element, with a miss rate of up to 93%. Although Na2O is currently ranked sixth in importance, it might rank higher as the number of its missing values decreases. The only exception is K2O: although it has many missing values, it remains the most important element. In view of these points, relatively large amounts of complete data need to be collected to achieve more reliable feature importance evaluations. Therefore, the order of feature importance of the basic elements shown in Figure 6 holds only for the geochemical data used in this study.
To quantitatively measure the contribution of each basic element to the model classification performance, all of the elements are sorted by feature importance from largest to smallest, and the simulation experiment is carried out by adding them one by one. There are 12 experimental conditions in all, as illustrated in Table 3. The sum of the feature importance of the corresponding basic elements under each condition is listed in the table, for convenience in studying the relationship between the cumulative feature importance and the single and overall classification accuracy. In addition, the 10-fold cross-validation method and the combination model of the mean imputation technique and the Random Forest algorithm are adopted in this experiment.
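The element-by-element accumulation described above can be sketched as follows; the individual importance scores are recovered from the running sums in Table 3 (e.g. CaO = 40.78 − 20.69 = 20.09), and only the first five elements are shown.

```python
# Individual importance scores (%) of the top five elements, recovered from
# the running sums in Table 3.
importance = {"K2O": 20.69, "CaO": 20.09, "SiO2": 10.15, "MgO": 8.71, "NiO": 8.36}

# Sort by importance (largest first) and accumulate, reproducing the
# element-by-element experimental conditions of Table 3.
ranked = sorted(importance, key=importance.get, reverse=True)
cumulative, total = [], 0.0
for elem in ranked:
    total += importance[elem]
    cumulative.append((elem, round(total, 2)))

print(cumulative[0])   # ('K2O', 20.69)
print(cumulative[-1])  # ('NiO', 68.0), matching condition 5 in Table 3
```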
The experimental results are summarized in Figure 7. When the input variable is only K2O, the classification accuracy of MORB is 0, while the classification accuracies of OIB and IAB are both as high as 90%. The overall classification accuracy is just 60% because of the poor classification of MORB. The number of missing values for the 12 basic elements in the different tectonic settings is listed in Table 4. Both MORB and IAB have higher miss rates, and the missing values evidently have a greater effect on MORB. On the other hand, when the number of elements is between two and five, the single and overall classification accuracy increase as elements are added, and the classification effect remains stable once the number of elements exceeds five. In these two regimes the number of missing values has little effect on the classification accuracy, since no abrupt change occurs. On the whole, the trends of the cumulative feature importance and the classification accuracy are not entirely consistent: the cumulative feature importance increases throughout, while the classification accuracy improves at first and then holds steady.
Table 3. Combination results of basic elements on the basis of feature importance measurement.

| Number | Combination of basic elements (feature importance from large to small) | Sum of feature importance (%) |
|---|---|---|
| 1 | K2O | 20.69 |
| 2 | K2O, CaO | 40.78 |
| 3 | K2O, CaO, SiO2 | 50.93 |
| 4 | K2O, CaO, SiO2, MgO | 59.64 |
| 5 | K2O, CaO, SiO2, MgO, NiO | 68.00 |
| 6 | K2O, CaO, SiO2, MgO, NiO, Na2O | 76.19 |
| 7 | K2O, CaO, SiO2, MgO, NiO, Na2O, FeOT | 82.74 |
| 8 | K2O, CaO, SiO2, MgO, NiO, Na2O, FeOT, TiO2 | 87.60 |
| 9 | K2O, CaO, SiO2, MgO, NiO, Na2O, FeOT, TiO2, Al2O3 | 92.04 |
| 10 | K2O, CaO, SiO2, MgO, NiO, Na2O, FeOT, TiO2, Al2O3, MnO | 95.82 |
| 11 | K2O, CaO, SiO2, MgO, NiO, Na2O, FeOT, TiO2, Al2O3, MnO, Cr2O3 | 99.31 |
| 12 | K2O, CaO, SiO2, MgO, NiO, Na2O, FeOT, TiO2, Al2O3, MnO, Cr2O3, P2O5 | 100.00 |
[Figure 7 plots accuracy (%) against the number of basic elements (0–12), with curves for overall, MORB, OIB and IAB classification accuracy and for the cumulative feature importance.]
Figure 7. Impacts of different combinations of basic elements (note that feature importance goes from
large to small) on the discrimination results of tectonic settings of olivine based on Random Forest.
Table 4. Number of missing values for 12 basic elements in different tectonic settings.

| Basic elements | MORB (539) | OIB (463) | IAB (580) | Total (1582) |
|---|---|---|---|---|
| K2O | 441 | 0 | 480 | 921 |
| CaO | 0 | 1 | 0 | 1 |
| SiO2 | 0 | 0 | 0 | 0 |
| MgO | 0 | 0 | 0 | 0 |
| NiO | 266 | 88 | 0 | 354 |
| Na2O | 380 | 182 | 423 | 985 |
| FeOT | 0 | 0 | 0 | 0 |
| TiO2 | 362 | 144 | 2 | 508 |
| Al2O3 | 0 | 88 | 0 | 88 |
| MnO | 0 | 5 | 0 | 5 |
| Cr2O3 | 0 | 0 | 0 | 0 |
| P2O5 | 503 | 433 | 535 | 1471 |

*(·) denotes the total number of basalt samples for each tectonic setting.
Table 5. Single and overall classification accuracy for different sample data volumes.

| Sample data volume: Total | MORB | OIB | IAB | Accuracy: Overall (%) | MORB (%) | OIB (%) | IAB (%) |
|---|---|---|---|---|---|---|---|
| 532 | 189 | 113 | 230 | 95.30 | 95.77 | 92.92 | 96.09 |
| 682 | 239 | 163 | 280 | 94.13 | 96.23 | 87.73 | 96.07 |
| 832 | 289 | 213 | 330 | 92.91 | 93.08 | 86.85 | 96.67 |
| 982 | 339 | 263 | 380 | 92.57 | 92.92 | 89.73 | 94.21 |
| 1132 | 389 | 313 | 430 | 92.58 | 93.32 | 90.73 | 93.26 |
| 1282 | 439 | 363 | 480 | 92.59 | 94.08 | 89.81 | 93.33 |
| 1432 | 489 | 413 | 530 | 93.16 | 93.66 | 92.98 | 92.83 |
| 1582 | 539 | 463 | 580 | 93.74 | 94.06 | 92.66 | 94.31 |
[Figure 8 plots overall, MORB, OIB and IAB classification accuracy (%) against total data volume (400–1600).]
Figure 8. Impacts of different sample data volumes on the discrimination results of tectonic settings
of olivine based on Random Forest.
reduce the computing power and time consumption without affecting the discrimi-
nation accuracy.
(1) It has been preliminarily demonstrated that the composition of olivine in basalt can discriminate tectonic settings. However, this initial finding, obtained by a data-driven method, needs further examination, because some scholars still question whether olivine is closely related to the tectonic setting in which it formed.
(2) The combination model of the mean imputation technique and the Random Forest algorithm is recommended for discriminating the tectonic settings of olivine in basalt. Nevertheless, a single MLP is worth considering if the number of missing values is small.
(3) The seven basic elements K2O, CaO, SiO2, MgO, NiO, Na2O and FeOT account for over 80% of the feature importance in the tectonic setting discrimination of olivine in basalt. It is worth mentioning that the first five of these elements are enough to achieve good classification results.
(4) The classification accuracy of OIB is more strongly influenced by the data volume. The single and overall classification accuracy reach large, stable values when the data volume exceeds 1400 samples; the most appropriate data volume calls for further research.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
This research was supported by the Tianjin Science Foundation for Distinguished Young Scientists
of China [Grant No. 17JCJQJC44000] and the National Natural Science Foundation for Excellent
Young Scientists of China [Grant No. 51622904].
ORCID
Mingchao Li http://orcid.org/0000-0002-3010-0892
References
Abdel-Zaher, A. M., & Eldeib, A. M. (2016). Breast cancer classification using deep belief networks.
Expert Systems with Applications, 46, 139–144.
Arevalo, R., McDonough, W. F., & Luong, M. (2009). The K/U ratio of the silicate Earth: Insights into
mantle composition, structure and thermal evolution. Earth and Planetary Science Letters, 278(3),
361–369.
Arlot, S., & Celisse, A. (2010). A survey of cross-validation procedures for model selection. Statistics
Surveys, 4, 40–79.
Bhatia, M. R., & Crook, K. A. (1986). Trace element characteristics of graywackes and tectonic
setting discrimination of sedimentary basins. Contributions to Mineralogy and Petrology, 92(2),
181–193.
Bo, L., Wang, L., & Jiao, L. (2006). Feature scaling for kernel fisher discriminant analysis using
leave-one-out cross validation. Neural Computation, 18(4), 961–978.
Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5–32.
Di, P., Wang, J., Zhang, Q., Yang, J., Chen, W., Pan, Z., . . . Jiao, S. (2017). The evaluation of basalt
tectonic discrimination diagrams: Constraints on the research of global basalt data. Bulletin of
Mineralogy, Petrology and Geochemistry, 36(6), 891–896.
Donders, A. R. T., van der Heijden, G. J., Stijnen, T., & Moons, K. G. (2006). A gentle introduction to
imputation of missing values. Journal of Clinical Epidemiology, 59(10), 1087–1091.
Enciso-Maldonado, L., Dyer, M. S., Jones, M. D., Li, M., Payne, J. L., Pitcher, M. J., . . . Rosseinsky, M. J.
(2015). Computational identification and experimental realization of lithium vacancy introduc-
tion into the olivine LiMgPO4. Chemistry of Materials, 27(6), 2074–2091.
Frane, J. W. (1976). Some simple procedures for handling missing data in multivariate analysis.
Psychometrika, 41(3), 409–415.
García, S., Ramírez-Gallego, S., Luengo, J., Benítez, J. M., & Herrera, F. (2016). Big data preproces-
sing: Methods and prospects. Big Data Analytics, 1(1), 9.
Golub, G. H., Heath, M., & Wahba, G. (1979). Generalized cross-validation as a method for choosing
a good ridge parameter. Technometrics, 21(2), 215–223.
Granitto, P. M., Furlanello, C., Biasioli, F., & Gasperi, F. (2006). Recursive feature elimination with
random forest for PTR-MS analysis of agroindustrial products. Chemometrics and Intelligent
Laboratory Systems, 83(2), 83–90.
Han, S., Li, M., Ren, Q., & Liu, C. (2018). Intelligent determination and data mining for tectonic
settings of basalts based on big data methods. Acta Petrologica Sinica, 34(11), 3207–3216.
Hannan, M. A., Arebey, M., Begum, R. A., Mustafa, A., & Basri, H. (2013). An automated solid waste
bin level detection system using Gabor wavelet filters and multi-layer perception. Resources,
Conservation and Recycling, 72, 33–42.
Howarth, G. H., & Harris, C. (2017). Discriminating between pyroxenite and peridotite sources for
continental flood basalts (CFB) in southern Africa using olivine chemistry. Earth and Planetary
Science Letters, 475, 143–151.
Jain, Y. K., & Bhandare, S. K. (2011). Min max normalization based data perturbation method for
privacy protection. International Journal of Computer and Communication Technology, 2(8),
45–50.
Li, C., Arndt, N. T., Tang, Q., & Ripley, E. M. (2015). Trace element indiscrimination diagrams. Lithos,
232, 76–83.
Li, M., Miao, L., & Shi, J. (2014). Analyzing heating equipment’s operations based on measured data.
Energy and Buildings, 82, 47–56.
Longstaff, I. D., & Cross, J. F. (1987). A pattern recognition approach to understanding the
multi-layer perception. Pattern Recognition Letters, 5(5), 315–319.
Luo, J., Wang, X., Song, B., Yang, Z., Zhang, Q., Zhao, Y., & Liu, S. (2018). Discussion on the method
for quantitative classification of magmatic rocks: Taking it’s application in West Qinling of Gansu
Province for example. Acta Petrologica Sinica, 34(2), 326–332.
Menze, B. H., Kelm, B. M., Masuch, R., Himmelreich, U., Bachert, P., Petrich, W., & Hamprecht, F. A.
(2009). A comparison of random forest and its Gini importance with standard chemometric
methods for the feature selection and classification of spectral data. BMC Bioinformatics, 10(1), 213.
Mitchell, T. M. (2005). Generative and discriminative classifiers: Naive Bayes and logistic regression.
Machine Learning. New York: McGraw Hill.
Patil, T. R., & Sherekar, S. S. (2013). Performance analysis of Naive Bayes and J48 classification
algorithm for data classification. International Journal of Computer Science and Applications, 6(2),
256–261.
Pearce, J. A. (1996). A user’s guide to basalt discrimination diagrams. Trace element geochemistry
of volcanic rocks: Applications for massive sulphide exploration. Geological Association of
Canada, Short Course Notes, 12, 79–113.
Pearce, J. A., & Cann, J. R. (1973). Tectonic setting of basic volcanic rocks determined using trace
element analyses. Earth and Planetary Science Letters, 19(2), 290–300.
Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., . . . Duchesnay, É. (2011).
Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12(10),
2825–2830.
Petrelli, M., & Perugini, D. (2016). Solving petrological problems through machine learning: The
study case of tectonic discrimination using geochemical and isotopic data. Contributions to
Mineralogy and Petrology, 171(10), 81.
Ren, Q., Wang, G., Li, M., & Han, S. (2019). Prediction of rock compressive strength using machine
learning algorithms based on spectrum analysis of geological hammer. Geotechnical and
Geological Engineering, 37(1), 475–489.
Sánchez-Muñoz, L., Müller, A., Andrés, S. L., Martin, R. F., Modreski, P. J., & de Moura, O. J. (2017).
The P-Fe diagram for K-feldspars: A preliminary approach in the discrimination of pegmatites.
Lithos, 272, 116–127.
Subasi, A., & Ercelebi, E. (2005). Classification of EEG signals using neural network and logistic
regression. Computer Methods and Programs in Biomedicine, 78(2), 87–99.
Taalab, K., Cheng, T., & Zhang, Y. (2018). Mapping landslide susceptibility and types using Random
Forest. Big Earth Data, 2(2), 159–178.
Townsend, J. T. (1971). Theoretical analysis of an alphabetic confusion matrix. Perception &
Psychophysics, 9(1), 40–50.
Ueki, K., Hino, H., & Kuwatani, T. (2018). Geochemical discrimination and characteristics of mag-
matic tectonic settings: A machine-learning-based approach. Geochemistry, Geophysics,
Geosystems, 19(4), 1327–1347.
Vermeesch, P. (2006). Tectonic discrimination of basalts with classification trees. Geochimica et
Cosmochimica Acta, 70(7), 1839–1848.
Wang, J., Chen, W., Zhang, Q., Jiao, S., Yang, J., Pan, Z., & Wang, S. (2017). Preliminary research on
data mining of N-MORB and E-MORB: Discussion on method of the basalt discrimination
diagrams and the character of MORB’s mantle source. Acta Petrologica Sinica, 33(3), 993–1005.
Zhang, Q. (1990). The correct use of the basalt discrimination diagram. Acta Petrologica Sinica, 2,
87–94.
Zhou, X., Chen, Z., & Chen, T. (1980). Composition and evolution of olivine and pyroxene in alkali
basaltic rocks from Jiangsu Province. Geochimica, 33(3), 253–262.
Zhou, Y., Chen, S., Zhang, Q., Xiao, F., Wang, S., Liu, Y., & Jiao, S. (2018). Advances and prospects of
big data and mathematical geoscience. Acta Petrologica Sinica, 34(2), 255–263.