
Neural Computing and Applications (2020) 32:5461–5469

https://doi.org/10.1007/s00521-019-04644-5

ATCI 2019

Application and comparison of several machine learning algorithms and their integration models in regression problems
Jui-Chan Huang1 • Kuo-Min Ko1 • Ming-Hung Shu2,4 • Bi-Min Hsu3

Received: 18 August 2019 / Accepted: 22 November 2019 / Published online: 30 November 2019
© Springer-Verlag London Ltd., part of Springer Nature 2019

Abstract
With the rapid development of machine learning technology, regression problems, which help people discover patterns in massive data in order to make predictions, have attracted more and more attention. Data prediction has become an important part of daily life, and the technology is now widely used in many fields such as weather forecasting, medical diagnosis and financial forecasting. The research of machine learning algorithms for regression problems has therefore been a research hotspot in the field of machine learning in recent years. However, real-world regression problems often involve very complex internal and external factors, and various machine learning algorithms differ in scalability and predictive performance. In order to better study the application effect of machine learning algorithms in regression problems, this paper mainly adopts three common machine learning algorithms: the BP neural network, the extreme learning machine and the support vector machine. By comparing the effects of single models and integrated models of these machine learning algorithms on regression problems, the advantages and disadvantages of each algorithm are studied. Finally, the performance of each machine learning algorithm in regression prediction is verified by simulation experiments on four different data sets. The results show that the research on these machine learning algorithms and their integration models is feasible and reasonable.

Keywords Machine learning · Regression problem · BP neural network · Extreme learning machine · Support vector machine

1 Introduction

Forecasting is the judgment of people on the basis of learning and is a reflection of learning ability. With the advancement of society and the development of technology, predictive science is widely disseminated and applied in many fields, such as time series prediction, regression analysis and pattern recognition. Among them, classification and regression are two specific forms of prediction, and they are also core research branches in the fields of machine learning, statistics and data mining. The main task of the regression problem is to train a learner based on the existing data and map the input to the corresponding output in order to achieve the purpose of prediction. At present, research on regression problems has received a high degree of attention. With the rise of artificial intelligence and the development of machine learning technology, the application of machine learning algorithms in regression problems has become a research hotspot in the field of machine learning.

Corresponding author: Kuo-Min Ko (kuomin@nkust.edu.tw)
Jui-Chan Huang (wish0718@outlook.com)
Ming-Hung Shu (workman@nkust.edu.tw)
Bi-Min Hsu (bmhsu@csu.edu.tw)

1 Yango University, Fuzhou 350015, China
2 Department of Industrial Engineering and Management, National Kaohsiung University of Science and Technology, Kaohsiung City 80778, Taiwan
3 Department of Industrial Engineering and Management, Cheng Shiu University, Kaohsiung City, Taiwan
4 Department of Healthcare Administration and Medical Informatics, Kaohsiung Medical University, Kaohsiung City 80778, Taiwan


In recent years, research on integrated learning has been receiving attention from industry and academia and is widely used in various fields such as medical diagnosis, pattern recognition, scientific research and data mining. The main task of integrated learning is to train many different individual learners on the training data set and then combine their respective prediction results into a final result. Compared with a single machine learning model such as an artificial neural network or a support vector machine, integrated learning has better generalization ability and stability. Therefore, the study carried out in this paper of several common machine learning algorithms and their integration models in regression problems has important theoretical and practical value.

In the field of regression prediction, whether with traditional mathematical statistics methods or with machine learning algorithms in recent years, researchers at home and abroad have proposed many research methods for prediction. For example, Wu Shujuan introduced the gray model and the improved gray model; empirical studies show that the improved gray model is more accurate than the original gray model on the port throughput dataset [1]. Wei and Huang [2] implemented a partial least squares regression method on the precipitation dataset in eastern China and proposed that the number of principal components and the sample size have a great influence on the prediction results. Zhang and Zhang [3] used three exponential smoothing methods to predict the container throughput data of Qingdao Port and verified the validity of the model. Guo et al. [4] conducted an in-depth study of the Kalman filter model, using a random adaptive Kalman filter model for traffic flow prediction, and achieved good results. On the basis of fully measuring the distance coefficient between adjacent nodes and target nodes, Xue and Shi [5] optimized the chaotic time series model and obtained a traffic flow prediction model. Viana et al. used principal component analysis to extract atmospheric, oceanic and other climate variables and performed multivariate regression predictions of monthly and seasonal precipitation sequences in southern Brazil; the empirical results show that the model has good prediction performance [6]. Furkan Baser et al. used the fuzzy regression function-support vector machine (FRF-SVM) to predict global solar radiation on the horizontal plane, building the model on horizontal solar radiation data observed in Turkey. The results show that the FRF-SVM model with a Gaussian kernel function can effectively avoid the influence of abnormal observations and over-fitting, so it can be applied to the long-term prediction of horizontal solar radiation [7]. Hu Jingjing et al. observed that current composite services lack initiative and that, when the services provided exceed demand, problems such as service backlog and service failure are prone to occur; a least squares support vector machine (LS-SVM) was therefore proposed to predict whether a service is retained or delivered. The results indicate that the model has a high prediction accuracy in the discussed system and can improve the efficiency of the combined service [8].

With the advent of the Gradient Boosting Machine (GBM), the tendency of decision tree learning toward over-fitting or under-fitting was addressed [9], and gradient boosting decision trees (GBDT) came into being. Because GBDT fully considers the trade-off between variance and bias, it has good generalization ability, so it is widely used in search ranking, recommendation systems, fault detection, pattern recognition and other research fields. Research on regression prediction with GBDT includes the following. Ding Chuan et al. used the GBDT model to predict the traffic of three subway stations in Beijing. The results show that, compared with other machine learning models and other black box models, GBDT can accurately and automatically identify the effects of public transport and of short-range subway temporal and spatial characteristics, and rank these effects, revealing the advantages of GBDT for accurate prediction of short-haul subway passenger capacity in a multi-modal transportation system [10]. Ma Xiaolei et al., based on the GBDT algorithm, which combines statistical characteristics with artificial intelligence and can identify complex nonlinear relationships between variables, applied it to predict nonlinear and unbalanced accident processing times and used crash data from Washington, USA, to verify the validity of the model; the results show that, compared with other models, GBDT has significant advantages in predicting both long and short processing time accidents [11]. Makridakis and Winkler assumed that the weights of the models in a combined forecast are equal and do not change with time; that is, the combined predictive value is obtained by simple arithmetic averaging of the predictions of each single model [12]. Granger and Ramanathan added a constant term to the combined forecasting model, allowed the predictions of each single forecasting model to be biased, relaxed the constraint that the weights of the individual models sum to 1 and proposed a weight calculation method that minimizes the sum of squared prediction errors [13]. Diebold and Pauly proposed a weighted least squares regression combined prediction model and proved that combined prediction models based on the variance–covariance matrix are special cases of weighted least squares regression combination prediction that can be obtained by setting different weight matrices [14].

Although researchers at home and abroad have made some achievements in the study of single and integrated models of machine learning algorithms, a simple integrated model in regression problems can cope with the instability and uncertainty in the performance of single prediction models and tends to give better prediction results.
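The Granger–Ramanathan idea cited in [13], estimating combination weights by least squares with a constant term and without a sum-to-one constraint, can be illustrated with a short sketch (hypothetical code; the function name and toy data are illustrative, not taken from the cited work):

```python
import numpy as np

def combination_weights(forecasts, y):
    """Regress the target on the individual forecasts with an intercept.
    forecasts: (n_samples, n_models) single-model predictions;
    y: (n_samples,) observed values. Returns the intercept and the
    weights that minimize the sum of squared combined-forecast errors."""
    X = np.column_stack([np.ones(len(y)), forecasts])  # add constant term
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)       # least-squares fit
    return coef[0], coef[1:]

# Toy example: two individually biased forecasters of the same signal.
rng = np.random.default_rng(0)
y = rng.normal(size=200)
f1 = 0.8 * y + 0.5 + rng.normal(scale=0.1, size=200)   # biased model 1
f2 = 1.1 * y - 0.2 + rng.normal(scale=0.1, size=200)   # biased model 2
b0, w = combination_weights(np.column_stack([f1, f2]), y)
combined = b0 + np.column_stack([f1, f2]) @ w
```

Because the weights absorb each model's bias, the combined forecast has a smaller squared error than either single forecast on this toy data.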


Makridakis and Winkler proposed a practical combination forecasting method that improves predictive effectiveness by adding single forecasting models based on different information sources and then combining them with a simple average combined forecasting model. West proposed a more systematic combined forecasting method that selects simple average combined forecasting for the single forecasting models with the best average performance over a variety of metrics and different but related sequences; however, in most cases this method cannot be effectively implemented in practical applications due to practical conditions. Fu Geng et al. proposed the idea of superior combination forecasting, solved the unconstrained combined forecasting problem with minimum sum of squared errors and pointed out that the condition for optimal combined forecasting is the simple averaging method [15]. Chen Huayou proposed a combined forecasting model based on predictive validity and pointed out the use of linear programming to solve for the weight of each single forecasting model in the combined forecast [16]. Chen and Hou [17] proposed a combined prediction model based on the standard deviation of prediction validity. Jingrong and Xiusi [18] proposed a nonlinear combination forecasting model based on a Gaussian fuzzy logic system, which avoids the limitations of linear combination forecasting and gives corresponding algorithms to determine the model parameters and sub-models. Dong Jingrong also proposed a nonlinear combined prediction model based on a wavelet network and presented a learning algorithm for solving the delay parameters and scales of the wavelet functions and the weights of their linear combination in the neural network; the model overcomes the difficulties and shortcomings encountered by linear combination forecasting methods on non-stationary time series [19]. Based on existing intelligent prediction theory, Cao Yongqiang et al. combined the neural network algorithm, fuzzy set theory and the genetic algorithm to establish a nonlinear intelligent combination forecasting model and proved the effectiveness of the algorithm [20].

Machine learning algorithms have been widely used in various fields such as the Internet, media and medicine and have achieved certain results. Over the years, along with the vigorous development of research directions such as machine learning, artificial intelligence and data mining, researchers have been paying close attention to the development of machine learning algorithms. Machine learning is a multi-disciplinary subject, drawing on statistics, algebra, calculus and algorithmic complexity theory. Machine learning refers to the use of complex algorithms to allow computers to simulate or possess human learning behaviors, to constantly update and adjust their knowledge and to improve and optimize their own performance. At present, the research of machine learning has become comprehensive and systematic; in many practical research fields, machine learning algorithms such as k-nearest neighbor, decision tree, neural network, support vector machine and the naive Bayesian method have been studied in detail. In recent years, the research of machine learning in regression problems has continued to deepen. For example, Liu Changjun used a BP neural network to make regression predictions on container throughput data; empirical studies show that the model works well [21]. The annual and monthly precipitation series from 1951 to 2013 in Nanjing were predicted by the random forest model, and the model showed good prediction performance [22]. However, a single machine learning model often does not achieve the best predictive effect, so researchers at home and abroad have studied integrated models of machine learning in regression problems. For example, Abbot et al. combined the Elman dynamic network with a delayed neural network to predict precipitation in Queensland; the results show that the combined model is more accurate than a single model [23]. Wu Huijun et al. realized the application of a gray RBF neural network combination model in the regression prediction of the growth trend of container sea-rail combined transport and achieved higher precision [24]. Wen Pengfei et al. applied the gray prediction model and the cubic exponential smoothing model to the port throughput dataset of Nantong City with a weighted combination; the results indicate that the accuracy of the combined model is improved compared with the single models [25]. It can be drawn from these research results that machine learning algorithms and their integration models can be well applied to the regression problem. Therefore, this paper takes several common machine learning algorithms and their integration models and applies them to regression prediction.

With the rise of artificial intelligence, machine learning algorithms such as support vector machines and artificial neural networks have achieved good results in various fields such as economics, meteorology and medicine. At present, the research and application of machine learning algorithms in regression problems have also penetrated into many fields such as weather forecasting, intelligent transportation, medical diagnosis and financial forecasting. However, real-world regression problems often have very complex internal and external factors, and various machine learning algorithms differ in scalability and predictive performance. Therefore, in order to better study the application effect of machine learning algorithms in regression problems, several common machine learning algorithms, namely the BP neural network, the support vector machine and the extreme learning machine, are applied to the regression problem. By comparing the effects of the single models and integrated models of these machine learning algorithms on regression problems, the performance of each machine learning algorithm and its integration model is studied.
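The integrated models surveyed above all share one pattern: train several individual learners on the training data, then combine their predictions into a final result. A minimal, hypothetical sketch of that pattern, with simple averaging as the combiner (class and method names are illustrative, not from the paper):

```python
import numpy as np

class AveragingEnsemble:
    """Hypothetical minimal 'integrated model': fit several base
    regressors and combine their predictions by simple averaging."""
    def __init__(self, base_learners):
        self.base_learners = base_learners

    def fit(self, X, y):
        for learner in self.base_learners:
            learner.fit(X, y)
        return self

    def predict(self, X):
        preds = np.column_stack([m.predict(X) for m in self.base_learners])
        return preds.mean(axis=1)  # simple-average combined forecast

class LinReg:
    """Tiny stand-in base learner: least-squares linear regression."""
    def fit(self, X, y):
        Xb = np.column_stack([np.ones(len(X)), X])
        self.coef, *_ = np.linalg.lstsq(Xb, y, rcond=None)
        return self

    def predict(self, X):
        return np.column_stack([np.ones(len(X)), X]) @ self.coef
```

Swapping the averaging line for a weighted sum recovers the weighted combination schemes discussed above.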


Finally, in order to verify the prediction performance of the several machine learning algorithms, this paper selects two artificial data sets and two real data sets and verifies the prediction accuracy, error and execution time efficiency of each algorithm. The prediction results of the different algorithms are compared and analyzed on the different data sets. The experimental results show that, among the single models, the ELM algorithm has better performance than the other algorithms. When performing regression prediction on each IMF after modal decomposition, the optimal integration model is determined according to the prediction effect. The experimental results of the integrated model show that the integrated model of this paper is better than the single models. Through the performance comparison experiments, the several machine learning algorithms and their integration models studied in this paper show good feasibility in the application of regression problems.

2 Proposed method

Machine learning is an inevitable outcome of the development of artificial intelligence research to a certain stage. It is a discipline that studies how to use computer programs to simulate human learning activities. There are many problems in the real world, such as handwritten digit recognition and automatic driving, which cannot be solved by direct programming. People hope that computer programs can learn from existing experience to improve their performance. According to the artificial intelligence pioneer Herbert A. Simon, learning refers to the enhancement or improvement of a system's own capabilities in repeated work, so that the next time the system performs the same or a similar task, it will do better or more efficiently than before. In summary, if behavior changes under the influence of experience, we call this phenomenon learning: experience is an important factor in learning, and the result of learning is a change in behavior. Since the main body of machine learning is a computer program, its definition is slightly different, and so far there is no unified, recognized and accurate definition of "machine learning." One of the most widely used definitions is given by Tom Mitchell of Carnegie Mellon University in his book "Machine Learning": a computer program is said to learn from experience E with respect to a task T and performance measure P if its performance on T, as measured by P, improves with experience E. In a nutshell, machine learning refers to the use of computer programs to improve the performance of the system itself [26].

2.1 BP neural network

The artificial neural network, also known simply as the neural network, is inspired by the composition and working mode of the human brain's nervous system. Its goal is to explore the mystery of human intelligence through the study of the brain's composition mechanism and mode of thinking, and by simulation to make machines intelligent like humans. The basic building blocks of neural networks are neurons, and various intelligent behaviors are generated by the interconnection of a large number of neurons in a neural network. Each neuron can receive a set of input signals from other neurons, each input corresponding to a weight, and the weighted sum of all inputs determines the output of that neuron. In neural networks, there are many ways to connect neurons, and different connection methods constitute different connection models of the network. Common connection models are:

Feedforward network: the neurons are hierarchically arranged into an input layer, middle layers and an output layer, and each layer of neurons only accepts input from the previous layer of neurons.

Network with feedback from the output layer to the input layer: some of the output of the output layer is sent back as input information to the input layer neurons.

Network with interconnections within a layer: in addition to accepting input from the previous layer, neurons in the same layer can be connected to each other.

Fully interconnected network: any two neurons in the network can be connected.

As a feedforward neural network, the BP neural network has strong learning ability and is easy to implement; it is one of the most widely used neural networks. The back propagation (BP) model is a back propagation learning algorithm for multilayer feedforward neural networks, proposed by Rumelhart et al. [27] in their work on parallel distributed processing in 1986. A multilayer feedforward neural network trained with the BP algorithm is called a BP neural network. Because of its simple structure, it can effectively solve the approximation problem of nonlinear objective functions, and it is widely used in system simulation, function fitting, pattern recognition and other fields.

The idea of the BP neural network is that the learning process consists of two parts: forward propagation and back propagation. Forward propagation means that data are passed in from the input layer and processed through the hidden layers to the output layer, with the neurons in each layer only affecting the adjacent next layer of neurons. If the output obtained at the output layer does not match the actual data, forward propagation switches to back propagation.
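The forward-propagation/back-propagation cycle can be sketched in a few lines of numpy (a hypothetical one-hidden-layer network with sigmoid activation and batch gradient descent; the sizes, learning rate and toy data are illustrative only, not the configuration used in the experiments):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Hypothetical tiny network: 3 inputs, 5 hidden sigmoid units, 1 linear output.
W1, b1 = rng.normal(scale=0.5, size=(3, 5)), np.zeros(5)
W2, b2 = rng.normal(scale=0.5, size=(5, 1)), np.zeros(1)
lr = 0.1

X = rng.normal(size=(20, 3))
y = X.sum(axis=1, keepdims=True)          # toy regression target

def forward(A):
    H = sigmoid(A @ W1 + b1)              # forward pass: input -> hidden
    return H, H @ W2 + b2                 # hidden -> linear output

mse0 = float(np.mean((forward(X)[1] - y) ** 2))   # error before training

for _ in range(1000):
    H, y_hat = forward(X)
    err = y_hat - y                       # output-layer error
    # Back propagation: pass the error back layer by layer and
    # adjust each layer's connection weights along the -gradient.
    dW2 = H.T @ err / len(X)
    db2 = err.mean(axis=0)
    dH = (err @ W2.T) * H * (1 - H)       # chain rule through sigmoid
    dW1 = X.T @ dH / len(X)
    db1 = dH.mean(axis=0)
    W2 -= lr * dW2; b2 -= lr * db2
    W1 -= lr * dW1; b1 -= lr * db1

mse1 = float(np.mean((forward(X)[1] - y) ** 2))   # error after training
```

Iterating the two phases drives the training error down, which is exactly the behavior described in the text.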


Back propagation returns the output error layer by layer in a certain way and adjusts the connection weights of each layer of neurons according to the returned error. Through the continuous iteration of these two processes, the error is finally reduced to within the allowable range. In short, the connection weights of each layer of neurons are constantly adjusted according to the output error, so that the adjusted network can produce the desired output for any given input. The learning process of the BP neural network can be summarized as follows: (1) select a set of training samples, each sample consisting of an input and an expected output; (2) select a sample from the training sample set and feed its input to the network; (3) calculate the output of each layer of neurons in turn; (4) calculate the error between the actual output of the network and the expected output; (5) propagate the calculation backwards from the output layer to the first hidden layer, adjusting the connection weights between neurons in the direction of decreasing error; (6) repeat steps (2)–(5) for each data sample in the training set until the error over the entire training set reaches the required level.

The idea of using the BP neural network for regression prediction is to first collect historical data to train the neural network and then predict through the trained model. Compared with traditional prediction methods, neural network-based prediction does not need to determine a mathematical model of the sample data in advance; simply by learning from the sample data, a trained neural network model can be obtained, which can then be used to make fairly accurate predictions.

2.2 Support vector machine

The support vector machine (SVM) is the youngest and most practical part of statistical learning theory. Its core content was proposed between 1992 and 1995, and it is still under continuous development [28]. Its basic idea is to map a low-dimensional nonlinear problem into a high-dimensional space through a clever nonlinear mapping, without needing to know the specific form of that mapping. It is only necessary to select an appropriate kernel function, so that the dot product in the high-dimensional feature space is computed via the kernel function in the low-dimensional space; this cleverly avoids the "curse of dimensionality" in the high-dimensional feature space. The kernel function must satisfy the Mercer condition while being able to accurately reflect the distribution characteristics of the training sample data. Therefore, the choice of kernel function is a core problem in the research of support vector machine theory; unfortunately, so far there is no generally effective way to choose the kernel function. In practical applications, the three most commonly used kernel functions are polynomial kernel functions, radial basis kernel functions and multi-layer perceptron kernel functions.

The basic problem of regression is to find a function $f \in F$ that minimizes the expected risk functional

$$R[f] = \int l(y - f(x))\, \mathrm{d}P(x, y)$$

where $l(\cdot)$ is the loss function, representing the deviation between $y$ and $f(x)$; its common form is $l(\cdot) = |y - f(x)|^{p}$ with $p$ a positive integer. Since $P(x, y)$ cannot be known in advance, $R[f]$ cannot be calculated directly from the above formula. According to structural risk minimization,

$$R[f] \le R_{\mathrm{emp}} + R_{\mathrm{gen}}, \qquad R_{\mathrm{emp}} = \frac{1}{n}\sum_{i=1}^{n} l(y_i - f(x_i))$$

where $R_{\mathrm{emp}}$ is the empirical risk and $R_{\mathrm{gen}}$ is a measure of the complexity of $f(x)$. Therefore, $R_{\mathrm{emp}} + R_{\mathrm{gen}}$ can be used to bound $R[f]$ from above.

The basic idea of SVM for regression prediction is as follows. Given the set of observation samples $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n) \in R^{n} \times R$ drawn with probability $p(x, y)$, let the regression function be

$$F = \{ f \mid f(x) = w^{T} x + b,\ w \in R^{n} \} \qquad (1)$$

Introduce the following structural risk function:

$$R_{\mathrm{reg}} = \frac{1}{2}\,\|w\|^{2} + C \cdot R_{\mathrm{emp}}[f] \qquad (2)$$

where $\|w\|^{2}$ describes the complexity of the function $f(\cdot)$ and $C$ is a constant whose role is to strike a compromise between the empirical risk and the model complexity. Here $l$ is the loss function, which can be any convex loss function; common choices are:

1. Quadratic loss function

$$l(f(x) - y) = (f(x) - y)^{2} \qquad (3)$$

2. Huber loss function

$$l(f(x) - y) = \begin{cases} \varepsilon\,|f(x) - y| - \varepsilon^{2}/2 & \text{if } |f(x) - y| > \varepsilon \\[4pt] \dfrac{1}{2}\,|f(x) - y|^{2} & \text{otherwise} \end{cases} \qquad (4)$$

3. Minimum modulus (absolute value) loss function

$$l(f(x) - y) = |f(x) - y| \qquad (5)$$

2.3 Extreme learning machine

A multi-layer feedforward neural network easily falls into local optima and is prone to over-fitting. Huang et al. [29, 30] therefore proposed the extreme learning machine (ELM) learning algorithm.
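The defining trick of ELM, hidden-layer parameters fixed at random with only the output weights solved in closed form, can be sketched as follows (a hypothetical minimal version with RBF hidden nodes; all names, sizes and parameter ranges are illustrative, not the authors' implementation):

```python
import numpy as np

def elm_fit_predict(X, y, X_new, h=50, seed=0):
    """Fit a minimal RBF-ELM on (X, y) and predict at X_new. The random
    centers and bandwidths are never trained; only the output weights
    beta are solved, via the Moore-Penrose pseudoinverse."""
    rng = np.random.default_rng(seed)
    centers = X[rng.integers(0, len(X), size=h)]     # random RBF centers
    sigma = rng.uniform(0.5, 2.0, size=h)            # random bandwidths

    def hidden(A):                                   # RBF hidden layer
        d2 = ((A[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        return np.exp(-d2 / sigma ** 2)

    beta = np.linalg.pinv(hidden(X)) @ y             # closed-form solve
    return hidden(X_new) @ beta
```

Because training reduces to a single linear solve, the speed advantage over iterative BP training is immediate.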


A large number of experimental results indicate that the ELM learning algorithm has faster learning speed and smaller average generalization error than the traditional BP learning algorithm. Compared with the SVM algorithm, although neither shows an obvious advantage in generalization ability, ELM learns much faster than SVM.

Given a training data set $D = \{(x_i, y_i)\}_{i=1}^{N}$, where $x_i = [x_{i1}, x_{i2}, \ldots, x_{id}] \in R^{d}$ and $y_i \in R$, the ELM is a three-layer network with an input layer containing $d$ input neurons, a hidden layer containing $h$ hidden neurons and an output layer containing only a single neuron. The mapping between its input and output can be expressed as

$$\hat{y}_j = \sum_{i=1}^{h} \beta_i\, k(w_i, b_i, x_j) = \sum_{i=1}^{h} \beta_i\, g(w_i \cdot x_j + b_i) \qquad (j = 1, 2, \ldots, N) \qquad (6)$$

where $w_i = [w_{i1}, w_{i2}, \ldots, w_{id}]^{T}$ is the weight vector connecting the $i$th hidden node and the input nodes; $\beta_i$ is the weight connecting the $i$th hidden node and the output node; $b_i$ is the offset of the $i$th hidden layer node; and $k(\cdot)$ is the activation function of the hidden layer node, which can be a radial basis function or a sigmoid function. Without loss of generality, this paper chooses the RBF kernel as the activation function, defined as

$$k(u_i, \sigma_i, x) = \exp\!\left( -\frac{\|x - u_i\|^{2}}{\sigma_i^{2}} \right) \qquad (7)$$

where $u_i$ is the center of the $i$th kernel function and $\sigma_i^{2}$ is its bandwidth. Unlike standard RBF neural networks, in which the parameters $(u_i, \sigma_i^{2})$ need to be determined in advance by some learning algorithm, in ELM the parameters $(u_i, \sigma_i^{2})$ are randomly set before the start of the learning process and remain unchanged throughout it. Only the connection weights $\beta = [\beta_1, \beta_2, \ldots, \beta_h]^{T}$ between the hidden layer neurons and the output layer neuron need to be determined by learning. The goal of ELM learning is to minimize the error between the predicted and target values:

$$\min \sum_{i=1}^{N} \left| \hat{f}(x_i) - y_i \right| \qquad (8)$$

where $y_i$ is the target value corresponding to the predicted value $\hat{f}(x_i)$.

The hidden node weight vector $w_i$ and the offset $b_i$ are randomly determined, and different initial values have a significant influence on the learning effect; that is, the ELM algorithm is sensitive to the initial values of the parameters $w_i$ and $b_i$. Therefore, the ELM algorithm is an unstable learning algorithm and can be regarded as a weak learning algorithm. In this paper, the ELM is used as a weak learning algorithm to construct a boosting-based learner, which achieves satisfactory prediction results. Due to the advantages of ELM in generalization ability and running speed, this paper uses ELM as the base regression learning algorithm to build an integrated learner.

3 Experiments

3.1 Data source

The four data sets selected for the experiment comprise two artificial synthetic data sets and two real data sets from different application fields. The selection of experimental data sets takes into account different sample sizes and application areas, and these data sets are widely used for performance verification of non-stationary time series prediction algorithms. Table 1 shows the relevant descriptive information for the four data sets.

Table 1 Data set information description table

Data set       Sample size   Source               Abbreviation
Lorenz         6000          Artificial data set  Lorenz
Mackey–Glass   5000          Artificial data set  MG
Sunspot        2820          SIDC                 Sunspot
Canadian lynx  114           TSDL                 Lynx

As typical chaotic time series, the Lorenz and Mackey–Glass time series are often used to test the predictive power of algorithms; both can be generated by delay differential equations. The sunspot time series data set is one of the most used data sets in the field of nonlinear and non-stationary time series prediction studies; it comes from the Solar Influences Data Analysis Center (SIDC), and the experiment selected the monthly sunspot sequence from 1749 to 1983 as the experimental object. The Canadian lynx dataset records the number of Canadian lynx trapped each year between 1821 and 1934 in the Mackenzie River district of northern Canada; it is taken from the well-known Time Series Data Library (TSDL).

3.2 Evaluation criteria

After a set of models is proposed, evaluation criteria for the models must also be proposed, and different models may require different evaluation criteria. The main evaluation criteria used in this paper are the mean squared error (MSE), the mean absolute percentage error (MAPE) and the squared correlation coefficient (R2).
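Written out in code, the three criteria take the following standard forms (the paper does not spell out its exact formulas, so these are the usual definitions):

```python
import numpy as np

def mse(y, y_hat):
    """Mean squared error: average squared deviation from the target."""
    return float(np.mean((y - y_hat) ** 2))

def mape(y, y_hat):
    """Mean absolute percentage error; assumes no zero targets."""
    return float(np.mean(np.abs((y - y_hat) / y)) * 100)

def r_squared(y, y_hat):
    """Squared correlation coefficient R2: share of the target's
    variation explained by the predictions."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1 - ss_res / ss_tot)
```

A perfect predictor gives MSE = MAPE = 0 and R2 = 1, which matches the interpretation given below.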


square correlation coefficient (R2). Among them, the mean


squared error (MSE) is the degree of change in the eval-
uation data, which is the expectation between the predicted
value and the target value. The smaller the values of MSE
and MAPE, the higher the accuracy of the prediction
results. The square correlation coefficient R2 is the per-
centage that determines the variation of the dependent
variable in the regression model.

3.3 Experimental setup

The experiment first divides each of the four selected time-series data sets into a training set, a verification set and a test set; the detailed division is shown in Table 2. To prevent over-fitting during training, the average error over tenfold cross-validation (10-fold CV) is taken as the final prediction result of each comparison algorithm. Before the learner is trained, each data set is uniformly normalized.

4 Results and discussions

To better compare the application effects of machine learning algorithms in regression problems, this paper uses three common machine learning algorithms, namely the BP neural network, SVM and ELM. The three single models are first compared on the Lorenz data set; the resulting execution-time comparison is shown in Fig. 1. The evaluation results of the prediction models on the different data sets are shown in Table 3. The parameter-optimization methods adopted for the SVM are the grid search algorithm (GS), the artificial bee colony algorithm (ABC), particle swarm optimization (PSO) and the differential evolution algorithm (DE), corresponding to SVM-GS, SVM-ABC, SVM-PSO and SVM-DE in the table.

Fig. 1 Comparison of execution time of different algorithms on data sets of different scales

Table 3 Average squared error of a single model on different data sets

Method    Lorenz   MG       Sunspot   Lynx
BP        282.47   256.36   512.31    311.25
ELM       246.12   210.54   482.54    298.62
SVM-GS    257.78   236.87   475.69    302.13
SVM-ABC   210.66   183.25   426.40    274.67
SVM-PSO   198.56   156.13   402.29    214.82
SVM-DE    198.86   142.25   405.63    220.15

As Fig. 1 shows, for the same data set size the BP algorithm has the longest execution time, the SVM algorithm is second, and the ELM algorithm is the shortest. As the data set grows, the execution time of all three algorithms increases, but the growth for the ELM algorithm is comparatively flat: it is the least affected by data set size. Therefore, the ELM algorithm is the most suitable of the three for regression prediction on large-scale data sets.

Table 3 shows that the mean square error of each algorithm differs across data sets. The mean square error of the ELM algorithm is smaller than that of the BP algorithm. For the SVM, the PSO algorithm selects the parameters with the smallest mean square error, followed by the DE algorithm, while the GS algorithm yields the parameters with the largest mean square error. To verify the superiority of the integrated model in regression prediction, this paper first applies modal decomposition to the Lynx data set to obtain four intrinsic mode functions (IMFs). The BP neural network, SVM and ELM are then used, under the same experimental settings as before, to perform regression prediction on each of the four IMFs. The regression results obtained on the single IMFs are shown in Table 4.
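The speed advantage of the ELM observed in Fig. 1 comes from its training procedure: the hidden-layer parameters w_i and b_i are drawn at random and only the output weights are fitted, in closed form, by least squares. The following minimal single-hidden-layer sketch is our illustration, not the authors' implementation; the sigmoid activation, hidden-layer size and random seed are assumptions.

```python
import numpy as np

def elm_train(X, y, n_hidden=50, seed=0):
    """Train a basic extreme learning machine for regression.

    Hidden weights and offsets are random (the source of ELM's
    sensitivity to initialization); only the output weights beta
    are solved for, by least squares.
    """
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))   # random input weights w_i
    b = rng.normal(size=n_hidden)                 # random offsets b_i
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))        # sigmoid hidden-layer outputs
    beta, *_ = np.linalg.lstsq(H, y, rcond=None)  # closed-form output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return H @ beta

# toy usage: fit y = sin(x) on a small 1-D regression problem
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X).ravel()
W, b, beta = elm_train(X, y, n_hidden=50)
y_hat = elm_predict(X, W, b, beta)
print(np.mean((y_hat - y) ** 2))  # small training MSE
```

Because training reduces to one linear least-squares solve, no iterative back-propagation is needed, which is consistent with the flat execution-time growth reported for ELM.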

Table 2 Data set partition table

Data set        Sample size   Training set   Verification set   Test set
Lorenz          6000          2500           2500               1000
Mackey–Glass    5000          2000           2000               1000
Sunspot         2820          1128           1128               564
Canadian lynx   114           50             50                 14
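Using the Sunspot partition from Table 2 as an example, the preprocessing described in Sect. 3.3 (a chronological train/verification/test split followed by normalization to [-1, 1]) can be sketched as follows. This is an assumed reconstruction: the paper does not state that the scaling constants are fitted on the training portion only, and the sine series below is only a stand-in for the real data.

```python
import numpy as np

def split_series(series, n_train, n_valid, n_test):
    """Chronologically split a time series as in Table 2."""
    train = series[:n_train]
    valid = series[n_train:n_train + n_valid]
    test = series[n_train + n_valid:n_train + n_valid + n_test]
    return train, valid, test

def fit_minmax(train, lo=-1.0, hi=1.0):
    """Return a scaler mapping values into [lo, hi] via the training range."""
    mn, mx = train.min(), train.max()
    return lambda x: lo + (hi - lo) * (x - mn) / (mx - mn)

# toy series standing in for the 2820-point Sunspot data
series = np.sin(np.linspace(0, 60, 2820)) * 50 + 50
train, valid, test = split_series(series, 1128, 1128, 564)
scale = fit_minmax(train)
train_s, valid_s, test_s = scale(train), scale(valid), scale(test)
print(len(train), len(valid), len(test))  # 1128 1128 564
print(train_s.min(), train_s.max())       # -1.0 1.0
```

Fitting the scaler on the training portion and reusing it on the verification and test portions avoids leaking information from the held-out data into the preprocessing step.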
Table 4 Regression results of different algorithms on each IMF

       Method   MSE     MAPE   R²
IMF1   BP       13.35   2.39   0.27
       SVM      9.10    2.01   0.48
       ELM      13.26   2.85   0.27
IMF2   BP       0.05    0.31   0.97
       SVM      0.15    0.62   0.94
       ELM      0.04    0.28   0.98
IMF3   BP       0.24    8.51   0.77
       SVM      0.16    7.11   0.85
       ELM      0.32    3.59   0.71
IMF4   BP       0.18    0.19   0.99
       SVM      0.52    0.49   0.98
       ELM      0.22    0.21   0.98

The best-performing algorithm on each IMF is selected to form the final integrated model, which is then used to implement the regression prediction; the final prediction result is shown in Fig. 2.

Table 4 shows that SVM predicts IMF1 best, ELM predicts IMF2 best, and BP predicts IMF4 best. On IMF3, no single algorithm is best on every criterion: SVM performs best in terms of MSE and R², while ELM performs best in terms of MAPE. Therefore, according to this performance comparison, the integration model selected in this paper adopts the SVM algorithm on IMF1 and IMF3, the ELM algorithm on IMF2, and the BP algorithm on IMF4.

Fig. 2 shows that the prediction results of the integrated model are significantly better than those of the single models, and every performance measure is optimal. These experimental results show that the prediction results obtained by applying the different machine learning algorithms on each data set have good feasibility and rationality.

5 Conclusions

Forecasting is a common way for people to find laws in, and extract value from, massive data. Currently, forecasting technology is widely applied in fields such as meteorological forecasting, medical diagnosis and financial forecasting. With the rapid development of machine learning, research on prediction methods has gradually shifted from traditional mathematical statistics toward machine learning algorithms. This paper applies several common machine learning algorithms, namely the BP neural network, support vector machine and extreme learning machine, to the regression problem. Through performance comparisons between the single model and the integrated model of each machine learning algorithm, the application effect of machine learning in regression problems is studied. This paper selects two synthetic data sets and two real data sets from different application fields, divides each into a training set, verification set and test set, and normalizes the data to the interval [-1, 1]. The models use three evaluation indicators: mean square error (MSE), mean absolute percentage error (MAPE) and the square correlation coefficient (R²). First, a single model is simulated on each data set; then regression prediction is performed on the IMFs of the modal decomposition, and the optimal integration model is determined according to the prediction effect. The experimental results show that the integrated model of this paper is superior to the single models built with individual machine learning algorithms. Through the performance comparison experiments, the several machine learning algorithms and their integration models studied in this paper are shown to work well on regression problems.
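The integration scheme selected in Sect. 4 (SVM on IMF1 and IMF3, ELM on IMF2, BP on IMF4) amounts to predicting each IMF with its best learner and summing the component forecasts, since EMD-style decompositions reconstruct a series as the sum of its IMFs (plus any residue). The following schematic sketch is our illustration of that recombination step only; the function names and stand-in component forecasts are placeholders, not the authors' code.

```python
import numpy as np

def integrated_forecast(imf_predictions):
    """Sum the per-IMF forecasts: the series is the sum of its components."""
    return np.sum(np.stack(imf_predictions), axis=0)

# toy example: a signal split into two components
t = np.linspace(0, 1, 14)               # 14 test points, as for the Lynx set
imf1_pred = np.sin(2 * np.pi * 5 * t)   # stand-in for one component forecast
imf2_pred = 0.5 * t                     # stand-in for another component forecast
y_hat = integrated_forecast([imf1_pred, imf2_pred])
print(y_hat.shape)  # (14,)
```

Each component model only has to capture the comparatively simple dynamics of its own IMF, which is the usual rationale for this kind of decompose-then-predict ensemble.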

Compliance with ethical standards

Conflict of interest We declare that we have no financial and personal relationships with other people or organizations that could inappropriately influence our work, and there is no professional or other personal interest of any nature or kind in any product, service and/or company that could be construed as influencing the position presented in, or the review of, this manuscript.

Fig. 2 Forecast results of the integrated model on the dataset Lynx
