To cite this article: Zhigang Yan, Kan Yao & Yuanxuan Yang (2016): A novel adaptive differential
evolution SVM model for predicting coal and gas outbursts, Journal of Difference Equations
and Applications, DOI: 10.1080/10236198.2016.1214725
Download by: [La Trobe University] Date: 02 August 2016, At: 02:54
JOURNAL OF DIFFERENCE EQUATIONS AND APPLICATIONS, 2016
http://dx.doi.org/10.1080/10236198.2016.1214725
Received 6 June 2016; Accepted 5 July 2016

KEYWORDS: support vector machine; coal and gas outbursts; differential evolution; mutation factor; crossover factor

AMS SUBJECT CLASSIFICATION: 93C95

ABSTRACT: Parameter selection is a key factor affecting the performance of support vector machines (SVMs). To further improve the classification accuracy and generalization ability of SVMs, a parameter selection model for SVMs with RBF kernel is proposed based on an adaptive differential evolution (ADE) algorithm and applied to predict coal and gas outbursts. The function of each parameter of the differential evolution (DE) algorithm and its adjustment scheme are analyzed, and the algorithm is improved by using the decision error rate of the samples as the objective function. Adaptive calculation equations for the mutation factor and the crossover factor are designed. Both factors are adjusted automatically as the algorithm executes, so that population diversity is maintained in the early stages to enhance the global search ability, while the stability of the algorithm is guaranteed in the late stages by promoting the local search ability. A novel ADESVM model for predicting coal and gas outbursts is established by using the ADE algorithm to select the SVM parameters. Experimental results show that the designed ADE algorithm has high convergence speed and high computational accuracy. The proposed ADESVM model trains faster and is more robust than similar SVM models, and it achieves higher prediction accuracy with shorter training time than back-propagation neural networks, providing a new method for the intelligent prediction of coal and gas outbursts.
1. Introduction
For a long time, coal and gas outbursts have been a major disaster threatening coal mining safety. Timely and accurate prediction of coal and gas outbursts is key to increasing the economic benefits of coal mines and guaranteeing coal mine safety. Establishing fast and effective prediction models for coal and gas outbursts and evaluating outburst risk are important parts of preventing and controlling them, with both theoretical significance and practical value [17]. Commonly used prediction methods for coal and gas outbursts include index prediction methods, gas geology unit methods, geophysical methods, etc. [19]. Coal and gas outbursts are complex non-linear dynamic systems. Therefore, using non-linear artificial intelligence techniques to recognize outburst patterns and predict them effectively has become a research hotspot. Numerous researchers have studied the prediction of coal and gas outbursts using artificial intelligence methods and achieved good results.
For example, Wen and Zhang et al. proposed pattern recognition models [13,20]. Dong et al. proposed G-K evaluation and a rough set model [5]. Wang et al. suggested a distance discriminant analysis method [12]. Guo established a fuzzy synthetic mathematical evaluation and clustering method [8]. You et al. developed a neural network based prediction method [18]. Yang et al. proposed an IDEPB NN model [15]. Recently, the support vector machine (SVM) has been applied to coal and gas outburst prediction [11,14]. The SVM is a learning method for small sample sets based on statistical learning theory, with relatively strong non-linear modelling ability. It is well suited to coal and gas outburst prediction, and good application performance has been achieved. However, a large number of studies
have shown that parameters are the major factor affecting SVM performance, and at present there are no unified standards or theories for SVM parameter selection [4]. Optimal parameters are usually obtained empirically or by cross validation over large numbers of experiments, such as the grid search method [1], which is time consuming and does not guarantee optimal parameters. In recent years, many researchers have proposed other parameter optimization methods. For example, a gradient descent method was used for parameter optimization [6]. Although it reduces parameter searching time, it depends strongly on initial values and, as a linear search method, is easily trapped in local optima. Particle swarm optimization (PSO) [3] and genetic algorithm (GA) [7] methods have also been proposed for SVM parameter optimization. Although these intelligent methods reduce the dependence on initial values, their theory and implementation are relatively complicated. Moreover, different optimization problems require different crossover, mutation and selection methods, and these algorithms are also easily trapped in local optima.
Differential evolution (DE) is a heuristic, parallel, random-search optimization method using floating point vector coding, proposed by Storn and Price [9] in 1995. It extracts differential information from the current population to guide the next step of the search. Its principle is relatively simple, it has only a few control parameters, and it offers relatively strong global search ability, robustness and high optimization speed. We previously analyzed the 'reasonable region' of parameter optimization for SVMs with RBF kernel, and related new parameter optimization methods were studied [16]. On this basis, an adaptive differential evolution (ADE) method for SVM parameter optimization is designed in this study. Taking the global search ability of DE into account, and using the minimum sample decision error rate as the optimization criterion to construct the objective function, the DE algorithm is improved for SVM parameter selection by adaptively designing the mutation and crossover factors, so that the classification accuracy and generalization ability of the SVM are improved. The improved algorithm is applied to coal and gas outburst prediction. Comparative experiments show that the proposed method has better performance and higher accuracy than prevalent prediction methods, with relatively good application effects.
2. SVM algorithm
The idea of the SVM [10] is based on Mercer's theorem. The input space is transformed into a higher-dimensional feature space by an appropriate non-linear transformation. In the feature space, the optimal classification hyperplane is solved for, so that the hyperplane correctly classifies as many data points of the two classes as possible, while the distances between the classified data points of the two classes and the classification hyperplane are as large as possible.
Given k samples (x₁, y₁), (x₂, y₂), …, (x_k, y_k), x ∈ Rⁿ, y ∈ {−1, 1}, a hyperplane (decision hyperplane) Wx + b = 0, W ∈ Rⁿ, b ∈ R, is to be found so that the samples can be separated by it. The corresponding recognition function is:

y_i(W·x_i + b) ≥ 1 − ξ_i,  i = 1, 2, …, k   (2)
The optimal decision hyperplane satisfies the condition that the minimum distance between the samples of the two classes and the decision hyperplane is maximal. The classification problem is then converted into a minimization problem under the constraint of Equation (2) with ξ_i ≥ 0, i.e.:

min: τ(W) = (1/2)‖W‖² + C Σ_{i=1}^{k} ξ_i   (3)

where
‖W‖² is called the structural risk; it represents the complexity of the model and makes the function smoother to improve the generalization ability;
Σ_{i=1}^{k} ξ_i is called the empirical risk, representing the error of the model;
C is the penalty parameter balancing the above two terms.
Equation (3) is an optimization problem with constraints, which can be solved by the Lagrangian optimization method. The corresponding classification function can be converted into:

f(x) = sign(Σ_{i=1}^{k} α_i y_i (x_i · x) + b)   (4)
When the samples are non-linearly separable, the raw samples can be mapped into a high-dimensional feature space by a non-linear function φ(x), and the classification is carried out in the high-dimensional feature space. The corresponding classification function becomes:

f(x) = sign(Σ_{i=1}^{k} α_i y_i (φ(x) · φ(x_i)) + b)   (5)
The inner product operation in the high-dimensional feature space can be defined as a kernel function K(x, y) = φ(x) · φ(y), so that the kernel function can be applied to the variables in the low-dimensional space instead of directly using the function φ. Therefore, Equation (5) can be converted into Equation (6) for solution:

f(x) = sign(Σ_{i=1}^{k} α_i y_i K(x_i, x) + b)   (6)
Commonly used kernel functions include linear kernels, polynomial kernels, Sigmoid
kernels, and RBF kernels.
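As a concrete illustration, the RBF kernel and the decision function of Equation (6) can be sketched in a few lines of Python. This is a minimal sketch: the support vectors, multipliers α_i and bias b below are hypothetical values chosen for demonstration, not the result of actual SVM training.

```python
import math

def rbf_kernel(x, z, gamma):
    """RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-gamma * sq_dist)

def svm_decision(x, support_vectors, alphas, labels, b, gamma):
    """Equation (6): f(x) = sign(sum_i alpha_i * y_i * K(x_i, x) + b)."""
    s = b + sum(a * y * rbf_kernel(sv, x, gamma)
                for a, y, sv in zip(alphas, labels, support_vectors))
    return 1 if s >= 0 else -1

# Hypothetical two-support-vector example: points near (0, 0) belong to
# class -1, points near (2, 2) to class +1.
svs = [(0.0, 0.0), (2.0, 2.0)]
alphas, labels, b, gamma = [1.0, 1.0], [-1, 1], 0.0, 0.5
print(svm_decision((1.8, 1.9), svs, alphas, labels, b, gamma))  # → 1
```

Here C plays no role because the multipliers are fixed by hand; in training, C bounds the α_i, and γ controls the kernel width, which is why the pair (C, γ) is the search target of the algorithm described next.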
diversity of the population. Using the above mutation and crossover operations, a temporary population is generated. Then, selection operations based on a greedy criterion compare the two populations individual by individual, so that a new generation of the population is produced. In this way the population evolves continuously until the stopping criterion of the algorithm is satisfied.
where
v_i^{g+1} is the mutated individual generated by applying the mutation operation of Equation (7) to the individual x_i^g of generation g;
x_best^{g+1} is the optimal individual in generation g + 1;
g is the current generation;
s1, s2, s3, s4 ∈ {1, 2, …, N} are distinct random indices, all different from i;
F is the mutation factor, which enhances or reduces the differential quantities.
where
y_i^{g+1} is the trial individual generated by applying the crossover operation of Equation (8) to x_i^g and the mutated individual generated by Equation (7);
rand(j) is a random number uniformly distributed in [0, 1];
CR ∈ [0, 1] is the crossover factor. The larger CR is, the more v_i^{g+1} contributes to y_i^{g+1}. When CR = 1, y_i^{g+1} = v_i^{g+1}, which benefits local searching and fast convergence. The smaller CR is, the more x_i^g contributes to y_i^{g+1}. When CR = 0, y_i^{g+1} = x_i^g, which benefits the diversity of the population and the global searching ability.
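The CR behaviour described above can be sketched as a binomial crossover. This is an assumption: the paper's exact Equation (8) did not survive extraction, and standard DE additionally forces at least one component to come from the mutant, which this sketch omits.

```python
import random

def binomial_crossover(x, v, CR, rng=random):
    """Build the trial vector y: component j is taken from the mutant v
    with probability CR, otherwise kept from the parent x."""
    return [v[j] if rng.random() < CR else x[j] for j in range(len(x))]

x, v = [1.0, 2.0, 3.0], [9.0, 8.0, 7.0]
print(binomial_crossover(x, v, CR=1.0))  # → [9.0, 8.0, 7.0] (pure mutant)
print(binomial_crossover(x, v, CR=0.0))  # → [1.0, 2.0, 3.0] (pure parent)
```

The two extreme calls reproduce the CR = 1 and CR = 0 cases stated in the text.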
3.1.3. Selection operation

x_i^{g+1} = y_i^{g+1}, if f(y_i^{g+1}) < f(x_i^g);  x_i^{g+1} = x_i^g, if f(y_i^{g+1}) ≥ f(x_i^g)   (9)

where f is the objective function. A greedy search strategy is used: the trial individual y_i^{g+1} generated by the mutation and crossover operations competes with x_i^g. Only when the fitness of y_i^{g+1} is better than that of x_i^g is it selected as the offspring; otherwise, x_i^g is retained as the offspring.
Through the above operations, a new population is generated. Finally, the stopping criterion is checked; if it is not satisfied, the mutation, crossover and selection operations are executed iteratively until it is satisfied and the optimal solution is obtained.
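The greedy selection of Equation (9) amounts to a one-line rule, sketched here for a minimization objective f:

```python
def greedy_select(x, y, f):
    """Equation (9): keep the trial individual y only if it strictly
    improves the objective f; otherwise retain the parent x."""
    return y if f(y) < f(x) else x

# With f = abs, the individual closer to zero survives.
print(greedy_select(2.0, -1.0, abs))  # → -1.0
print(greedy_select(1.0, -3.0, abs))  # → 1.0
```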
where
m1 and m2 represent the numbers of misclassified samples of the two classes, respectively;
n1 and n2 represent the numbers of samples in the two classes, respectively;
m1/n1 and m2/n2 represent the misclassification rates of the samples in the two classes, respectively;
k is a scaling factor controlling how strongly changes in the objective function value are expressed;
λ1, λ2 ∈ [0, 1] are the controlling factors of the error rates for the two classes, respectively; they adjust the weight given to misclassification of each class.
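The objective equations themselves did not survive extraction, but from the quantities listed above a plausible form is a k-scaled, λ-weighted sum of the two per-class misclassification rates. The exact expression below is therefore an assumption, not the paper's formula:

```python
def decision_error_objective(m1, n1, m2, n2, lam1=0.5, lam2=0.5, k=1.0):
    """Hypothetical objective: k * (lam1 * m1/n1 + lam2 * m2/n2), combining
    the misclassification rates of the two classes with weights lam1, lam2."""
    return k * (lam1 * m1 / n1 + lam2 * m2 / n2)

# Perfect classification gives 0; misclassifying everything gives k.
print(decision_error_objective(0, 16, 0, 5))   # → 0.0
print(decision_error_objective(16, 16, 5, 5))  # → 1.0
```

Weighting the classes separately matters here because the outburst data are imbalanced (26 outburst vs. 10 non-outburst groups), so a plain overall error rate would under-penalize mistakes on the minority class.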
CR = CR_min + g(CR_max − CR_min)/g_max   (12)
where
CRmin is the defined minimum crossover probability;
CRmax is the defined maximum crossover probability;
g is the current iteration number;
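Equation (12) is a simple linear schedule and can be sketched directly; the CR_min and CR_max values used below are illustrative defaults, not the paper's settings:

```python
def adaptive_CR(g, gmax, CR_min=0.1, CR_max=0.9):
    """Equation (12): CR rises linearly from CR_min (generation 0) to
    CR_max (generation gmax), shifting the search from global exploration
    toward fast local convergence as the run proceeds."""
    return CR_min + g * (CR_max - CR_min) / gmax

print(adaptive_CR(0, 100))    # → 0.1
print(adaptive_CR(100, 100))  # → 0.9
```

A small CR early keeps trial vectors close to their parents, preserving diversity; a large CR late lets the mutant dominate, accelerating local refinement, exactly as described in Section 3.1.2.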
culate the objective function value f. Judge whether the predefined accuracy has been reached or g = gmax is satisfied (i.e. the maximum generation number is reached). If either holds, go to Step 9; otherwise, go to the next step.
Step 4: Let g = g + 1. Calculate the new mutation factor and crossover factor, and perform the evolution of the next generation.
Step 5: Select four different individuals x_i^g from the current generation g. Use Equation (7) for the mutation operation and generate the mutated individual v_i^{g+1} of generation g + 1.
Step 6: Perform the crossover operation on the mutated individual v_i^{g+1} of generation g + 1 according to Equation (8), generating the trial individual y_i^{g+1} of generation g + 1.
Step 7: Perform the selection operation on the trial individual y_i^{g+1} of generation g + 1 according to Equation (9), generating the individual x_i^{g+1} of generation g + 1.
Step 8: Calculate the new (C, γ) in the individuals of generation g + 1, and then go to Step 2.
Step 9: Obtain the optimal SVM parameters (C, γ).
The algorithm flowchart is shown in Figure 1.
4. SVM model with ADE optimization for predicting coal and gas outbursts
4.1. Analysis of factors of coal and gas outbursts
There are many factors affecting coal and gas outbursts [19], such as coal type, initial
velocity of gas diffusion, coal sturdy coefficient, coal seam gas pressure, soft coal layer
thickness and the wall rock permeability of coal seams. Although coal and gas outbursts are
related with these factors, the risk is difficult to linearly express by these factors. Therefore,
accurately determine the main affecting factors of coal and gas outbursts is the key of
predicting coal and gas outbursts. Referring [15], we collect multiple groups of coal and
gas outburst data in the experimental mining area, Huaibei Mining Group Company,
Luling mine site. According to expert analysis, 24 relatively independent factors are
selected to establish a tree model of coal and gas outburst accidents. Principal component
analysis is used to select final eight main factors controlling coal and gas outbursts,
namely, gas pressure p, coal mechanical strength f , coal crumbliness comprehensive
feature coefficient Kc , coal permeability coefficient λ, coal split and merge feature coefficient
Ks , coal thickness and coal thickness varying comprehensive feature coefficient Kt , fault
complexity coefficient Kf , interlayer sliding comprehensive feature coefficient Ki .
4.2. Experimental data sets for coal and gas outburst prediction
After the main controlling factors are determined, the experimental data are organized,
and 36 groups of coal and gas outburst sample data are obtained, which are shown in
Table 1. Among them, 26 groups are labelled as outburst, and 10 groups are labelled as
non-outburst. Arbitrarily choose 16 groups of outburst data and 5 groups of non-outburst
data as the training set, with number 1–21; the other 15 groups of data make up the test
set, with number 22–36.
Table 1. Sample data of coal and gas outburst collected in Luling coal mine.
No. p f Kc λ Ks Kt Kf Ki Outburst grade
1 2.16 0.34 1.05 0.22 18.7 6.25 0.014 7.74 Dangerous
2 1.75 0.3 1.26 0.51 19.8 6.03 0.039 6.75 Dangerous
3 1.35 0.45 1.48 0.41 5.1 4.02 0.022 2.53 Dangerous
4 0.97 0.41 1.55 0.72 5.1 4.15 0.022 2.53 Dangerous
5 1.02 0.35 1.28 0.55 20.4 5.79 0.035 2.53 Dangerous
6 1.12 0.29 1.36 0.47 6.8 4.99 0.041 10.22 Dangerous
7 0.8 0.2 1.18 0.7 5.1 6.04 0.025 8.86 Dangerous
8 1.4 0.42 1.65 0.39 5.1 7.01 0.076 2.53 Big
9 2.9 0.31 1.72 0.21 25.6 6.89 0.089 21.34 Big
10 3.65 0.22 1.36 0.09 5.1 5.87 0.044 2.53 Big
11 1.27 0.22 1.7 0.55 21.9 6.05 0.057 48.3 Big
12 3.61 0.24 1.81 0.12 15.7 7.77 0.037 2.53 Big
13 1.4 0.24 1.32 0.48 19.2 5.22 0.025 16.29 Ordinary
14 1.24 0.27 1.6 0.46 5.1 6.43 0.026 13.98 Ordinary
15 1.78 0.23 1.52 0.43 10.2 4.78 0.046 25.45 Ordinary
Table 2. Results and prediction accuracies of different parameter selection algorithms for the SVM model.

Algorithm     Algorithm parameters                                    C        γ        Time/s    Accuracy (%)
Grid search   C step: 0.5, γ step: 0.5, v-fold = 5                    1.3195   2.2974   14.936    93.33
PSO           c1 = 1.5, c2 = 1.7, maxgen = 100, N = 40;               1.2888   2.8635    6.784    93.33
              others: the same as DE or default values
GA            maxgen = 100, N = 40;                                   1.3396   3.3108    4.355    93.33
              others: the same as DE or default values
DE            C ∈ [2⁻¹⁰, 2¹⁰], γ ∈ [2⁻¹⁰, 2¹⁰],                       1.3212   2.8326    3.837    93.33
              maxgen = 100, N = 40, threshold: 0.0001
ADE           C ∈ [2⁻¹⁰, 2¹⁰], γ ∈ [2⁻¹⁰, 2¹⁰],                       1.3025   2.4605    2.426    93.33
              maxgen = 100, N = 40, threshold: 0.0001
The sample data sets are normalized, and the SVM parameters (C, γ ) are optimized
by the above algorithm. The SVM is trained and the prediction results are obtained. The
experimental results are compared with parameter optimization methods, including grid
search algorithm [1], PSO [3], GA [7], and DE [9]. The comparative results are shown in
Table 2.
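The normalization step mentioned above can be sketched as column-wise min-max scaling. This is an assumption: the paper states that the data are normalized but not which scheme is used.

```python
def min_max_normalize(rows):
    """Scale each feature column to [0, 1] using its min and max;
    constant columns map to 0.0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [[(v - l) / (h - l) if h > l else 0.0
             for v, l, h in zip(row, lo, hi)] for row in rows]

# First three samples of Table 1, features p and f only, for illustration.
print(min_max_normalize([[2.16, 0.34], [1.75, 0.30], [1.35, 0.45]]))
```

Scaling the features to a common range matters for the RBF kernel, since γ multiplies squared Euclidean distances and unscaled features with large magnitudes would otherwise dominate.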
Table 2 shows that the prediction accuracies of the parameter optimization methods are essentially the same. However, the proposed ADE algorithm requires significantly less time, improving the learning and generalization ability of the SVM. Compared with the grid search algorithm, the evolutionary algorithms (PSO, GA, DE) require much less time, indicating the advantage of evolutionary algorithms in parameter optimization. Among them, DE has better adaptability. The proposed ADE algorithm, as an improvement of DE, has the best performance.
For the same training and test data sets, a BPNN model is used for prediction [15], with a prediction accuracy of 86.67%, whereas the prediction accuracy of ADESVM is 93.33%. The prediction accuracy of ADESVM is therefore higher, and its training time is shorter.
Acknowledgements
The authors would also like to thank the reviewers for their constructive comments.
Disclosure statement
No potential conflict of interest was reported by the authors.
Funding
This work was supported by National Natural Science Foundation of China (NSFC) under Con-
tract [41271445], and partially supported by a Project Funded by the Priority Academic Program
Development of Jiangsu Higher Education Institutions.
References
[1] N.E. Ayat, M. Cheriet, and C.Y. Suen, Automatic model selection for the optimization of SVM
kernels, Pattern Recognit. 10(38) (2005), pp. 1733–1745.
[2] C.-C. Chang and C.-J. Lin, LIBSVM: A library for support vector machines, December 14, 2015. Software available at http://www.csie.ntu.edu.tw/~cjlin/libsv.
[3] L.-H. Chen and H.-D. Hsiao, Feature selection to diagnose a business crisis by using a real GA-
based support vector machine: An empirical study, Exp. Syst. Appl. 35(3) (2008), pp. 1145–1155.
[4] V. Cherkassky and Y. Ma, Practical selection of SVM parameters and noise estimation for SVM
regression, Neural Networks 17(1) (2004), pp. 113–126.
[5] C.-Y. Dong, Z.-G. Cao, Y.-H. Shang, and X. Liu, Coal and gas outburst classification analysis
based on G-K evaluation and rough set, J. Chin. Coal Soc. 36(7) (2011), pp. 1156–1160.
[6] T. Glasmachers and C. Igel, Gradient-based adaptation of general Gaussian kernels, Neural
Comput. 17(10) (2005), pp. 2099–2105.
[7] X.C. Guo, J.H. Yang, C.G. Wu, C.Y. Wang, and Y.C. Liang, A novel LS-SVMs hyper-
parameter selection based on particle swarm optimization, Neurocomputing 71(16–18) (2008),
pp. 3211–3215.
[8] D.-Y. Guo, M.-J. Zheng, C. Guo, D.-M. Hu, and X.-K. Zhang, Extension clustering method
for coal and gas outburst prediction and its application, J. Chin. Coal Soc. 34(6) (2009), pp.
783–787.
[9] R. Storn and K. Price, Differential evolution-a simple and efficient heuristic for global
optimization over continuous spaces, J. Global Optim. 11 (1997), pp. 341–359.
[10] V. Vapnik, An overview of statistical learning theory, IEEE Trans. Neural Networks 10(5)
(1999), pp. 988–999.
[11] Z.-H. Wang and N. Qiao, Prediction model of coal and gas outburst intensity based on IGA-
LSSVM, J. Liaoning Tech. Univ. 34(7) (2015), pp. 791–796.
[12] C. Wang, D.-Z. Song, X.-S. Du, Z.-G. Zhang, D. Zhu, and D.-W. Yang, Prediction of coal and
gas outburst based on distance discriminant analysis method and its application, J. Min. Saf.
Eng. 26(4) (2009), pp. 470–474.
[13] C.-P. Wen, Attribute recognition model and its application of fatalness assessment of gas burst
in tunnel, J. Chin. Coal Soc. 36(8) (2011), pp. 1322–1328.
[14] L. Yang, J.-C. Geng, and K.-L. Wang, Research on coal and gas outburst prediction using fuzzy
support vector machines, J. Saf. Sci. Technol. 10(4) (2014), pp. 103–108.
[15] M. Yang, Y.-J. Wang, and Y.-P. Cheng, Improved differential evolution neural network and its
application in prediction of coal and gas outburst, J. Chin. Univ. Min. Technol. 38(3) (2009),
pp. 399–444.
[16] Z. Yan, Y. Yang, and Y. Ding, An experimental study of the hyper-parameters distribution
region and its optimization method for support vector machine with Gaussian kernel, Int. J.
Signal Process. Image Process. Pattern Recognit. 6(5) (2013), pp. 437–446.
[17] J.-W. Yan, X.-B. Zhang, and Z.-M. Zhang, Research on geological control mechanism of coal-gas
outburst, J. Chin. Coal Soc. 38(7) (2013), pp. 1174–1178.
[18] W. You, Y.-X. Liu, Y. Li, C.-H. Liu, and T.-B. Zhou, Predicting the coal and gas outburst using
artificial neural network, J. Chin. Coal Soc. 32(3) (2007), pp. 285–287.
[19] Q.-X. Yu, K. Wang, and S.-Q. Yang, Study on pattern and control of gas emission at coal face in
China, J. Chin. Univ. Min. Technol. 29(1) (2000), pp. 9–14.
[20] Z.-X. Zhang, G.-F. Liu, R.-S. Lu, and J. Zhang, Regional forecast of coal and gas burst based on
fuzzy pattern recognition, J. Chin. Coal Soc. 32(6) (2007), pp. 592–595.