
2010 Sixth International Conference on Natural Computation (ICNC 2010)

Using Genetic Algorithms for Time Series Prediction

Cheng-Xiang Yang, Yi-Fei Zhu


School of Resources & Civil Engineering
Northeastern University
Shenyang 110004, P R China

This work was supported by the Program for New Century Excellent Talents in University, the Special Fund for Basic Scientific Research of Central Colleges under Grant Nos. N090401002 and N090101001, and the SRF for ROCS, SEM under Grant No. 20071108-4.
978-1-4244-5961-2/10/$26.00 ©2010 IEEE

Abstract: This paper proposes using genetic algorithms (GAs) for nonlinear time series prediction. A nesting evolution scheme is designed to evolve the forecasting models. In the outer evolution cycle, a binary-coded genetic algorithm is employed to evolve the structures of nonlinear polynomial-type models. The coefficients of the evolved models are then introduced and optimized by a real-coded genetic algorithm in the inner evolution cycle. The evolution process is repeated, using genetic operators and the principle of 'survival of the fittest', until satisfactory results are found. The proposed method is applied to deformation prediction of a dangerous rock mass in rock engineering. The results indicate that the proposed algorithm is applicable and sufficiently accurate.

Keywords: time series; genetic algorithms; nonlinear; modeling; forecasting

I. INTRODUCTION

Decision making and planning for a variety of complex systems involves prediction or forecasting, which is normally carried out by investigating patterns in historical data and assuming that the future trend will follow its past pattern. Much of the historical data is recorded at specific time intervals. Such time series data provide good insight into the behavior of the systems under study. Many efforts have been made over the past several decades to develop and improve time series forecasting models. Although several different types of time series models are available [1, 2], selecting a nonlinear model structure remains difficult, and the approximation of complex real-world problems by linear models is not always satisfactory. Researchers have therefore introduced modern information analysis techniques for nonlinear systems, such as artificial neural networks, grey system theory and support vector machines, to overcome these limitations, and the results are interesting and encouraging [3-7]. These techniques provide potentially powerful alternatives for nonlinear time series modeling.

Recently, evolutionary computation techniques [8] have proved themselves to be robust data-based modeling tools for complex system analysis. Based on the Darwinian theory of natural selection, they attempt to obtain the best solution by carrying out global optimization. They use suitable coding to represent possible solutions to a problem and guide the search by using genetic operators and the principle of 'survival of the fittest'. Of these algorithms, genetic algorithms (GAs) have established themselves as powerful search and optimization tools in problem solving and function optimization [9, 10]. Based on genetic algorithms, this paper develops a nesting evolution algorithm for time series modeling.

The motivation for the nesting evolution method comes from the following two perspectives. First, since complex real-world problems are highly nonlinear, it is difficult for forecasters to choose the exact model structure. Usually, a limited number of different models are tried and the one with the most accurate result is selected. However, the selected model is not necessarily the best for future use, owing to many potential influencing factors such as sampling variation, model uncertainty, and structure change. By using suitable coding, the problem of model structure selection can be eased with little extra effort through the function optimization of GAs. Second, the model structures generated during the model identification process may include a number of coefficients that can greatly affect the performance of a model. Because a potentially good model with a favorable structure may be removed during the evolution process as a result of inappropriate parameters, a parameter estimation procedure has to be employed to optimize those coefficients. The goodness of fit of each generated model structure can then be reasonably computed and assigned before natural selection is performed for further evolution. In this study, another GA evolution cycle is used for this parameter estimation. Thus, a nesting evolution scheme is designed, i.e., an outer GA evolution of model structures coupled with a series of inner GA evolutions of the model coefficients generated with the model structures. The proposed algorithm is then used to model the nonlinear dynamic deformation behavior of a dangerous rock mass to illustrate its efficiency and accuracy.

II. GENETIC ALGORITHMS

GAs are random search algorithms based on the concepts of natural selection, genetics and evolution. The major difference between GAs and classical optimization techniques is that GAs work with a population of possible solutions, while classical optimization techniques work with a single solution. Another difference is that GAs use probabilistic transition rules instead of deterministic rules. In a GA search process, a group of candidate solutions, represented as genes on a chromosome (e.g., binary strings or real numbers) in the search space, is evolved to find better solutions through natural selection and the genetic operators, i.e., crossover and mutation, borrowed from natural genetics. A standard GA consists of the following steps:

Step 1. Initialize a population of possible solutions;
Step 2. Calculate the fitness of each candidate solution in the population;
Step 3. Select the solutions with higher fitness to take part in evolution.
Step 4. Create a new population by using GA operators on the selected solutions.
Step 5. If the stopping criterion is satisfied, stop the computation and take the best solution as the final result; otherwise, go to Step 2.

GAs are mathematically simple yet powerful in their search for improvement after each generation [20]. Owing to their nature, GAs have many advantages: they require no gradient information about the objective function; high nonlinearities and discontinuities in the objective function have little effect on overall optimization performance; they are resistant to becoming trapped in local optima; they perform very well on large-scale optimization problems; and they can be employed for a wide variety of optimization problems.

III. GENETIC EVOLUTION SCHEME FOR NONLINEAR TIME SERIES MODELING

A. Problem Description

Time series forecasting is an important area of forecasting in which past observations of the same variable are collected and analyzed to develop a model describing the underlying relationship. It is reasonable to expect a predominant correlation between the current observation and past observations. Mathematically, for an observed time series $\{u_t\}$ ($t = 1, 2, \ldots$), the time series model can be represented as

$$u_t = f(u_{t-1}, u_{t-2}, \ldots, u_{t-p}) \tag{1}$$

where $p$ is the number of history observations and $f(\cdot)$ is the mapping relationship. By introducing new observations, the model can then be used to extrapolate the time series into the future. Namely,

$$\begin{aligned} u_{t+1} &= f(u_t, u_{t-1}, \ldots, u_{t-p+1}) \\ u_{t+2} &= f(u_{t+1}, u_t, \ldots, u_{t-p+2}) \end{aligned} \tag{2}$$

In (1), $f(\cdot)$ is usually highly nonlinear. Based on polynomial approximation theory, a polynomial-type expression can be used to express nonlinear relationships, as has been widely done in time series analysis. $f(\cdot)$ in (1) may be written as

$$f(u_{t-1}, u_{t-2}, \ldots, u_{t-p}) = \sum_{j=1}^{p} \sum_{k=1}^{q} c_{jk} u_{t-j}^{k} \tag{3}$$

where $p$, $q$ and $c_{jk}$ ($j = 1, 2, \ldots, p$; $k = 1, 2, \ldots, q$) are parameters to be determined (in some cases, some of these parameters may be set to zero). Of these parameters, $p$ and $q$ are integers deciding the number of input variables and the degree of the polynomial model, respectively. They construct the frame of the model expression and can be called structure parameters. The $c_{jk}$ are real values and are the model coefficients. For example, $f(\cdot)$ in (1) with $p = 2$ and $q = 3$ can be formulated as

$$\begin{aligned} u_t = f(u_{t-1}, u_{t-2}) &= \sum_{j=1}^{2} \sum_{k=1}^{3} c_{jk} u_{t-j}^{k} \\ &= c_{11} u_{t-1} + c_{12} u_{t-1}^{2} + c_{13} u_{t-1}^{3} + c_{21} u_{t-2} + c_{22} u_{t-2}^{2} + c_{23} u_{t-2}^{3} \end{aligned} \tag{4}$$

Therefore, once the structure parameters $p$ and $q$ and the model coefficients $c_{jk}$ are determined, the nonlinear time series model represented in (3) is fully identified.

The resulting search problem is highly multimodal, with a large parameter space for most real-world systems, which are affected by a large number of factors with complex intercorrelations. The question now becomes how to search this solution space efficiently for a reasonable combination of parameters that provides overall agreement with the observed time series. It is difficult to obtain a globally optimal solution using conventional regression methods. GAs are therefore used to find the set of unknown parameters that best matches the model predictions to the observed results.

A closer examination reveals a natural nesting correlation between the model structure parameters $p$ and $q$ and the model coefficients $c_{jk}$. That is, the model coefficients $c_{jk}$ have to be added and optimized under a given model structure, while the goodness of a model structure decided by $p$ and $q$ has to be estimated with known model coefficients. In this paper, a nesting evolution algorithm is proposed for searching these parameters with nesting correlation to find the globally optimal result, where an outer evolution cycle of model structures is nested with a series of inner evolution cycles of model coefficients. The details are described in the subsequent sections.

B. Outer Evolution Cycle of Model Structures

In the outer evolution cycle, model structure parameters are generated by a standard GA procedure, starting from an initial population of parameter sets $\{p, q\}_i$ ($i = 1, 2, \ldots, N$) (the initial generation), where $N$ is the population size. Each parameter set represents a model structure (i.e., the frame of the polynomial expression without coefficients). For example, the model structure with $p = 2$ and $q = 2$ can be written as

$$u_t = f(u_{t-1}, u_{t-2}) = \sum_{j=1}^{2} \sum_{k=1}^{2} u_{t-j}^{k} = u_{t-1} + u_{t-1}^{2} + u_{t-2} + u_{t-2}^{2} \tag{5}$$

Since a relatively small number of integer-valued parameters are to be evolved, the binary coding method is used here to obtain fast convergence. The chromosomes (binary strings) of some typical model structures are shown in Fig. 1. Subsequent generations are generated by using genetic operators including selection, reproduction, crossover and mutation. The algorithm can easily adjust the model structure through crossover and mutation on the binary strings. During the crossover operation, sub-strings (genes) from two parent chromosomes are randomly selected to produce a child chromosome. The mutation operation is conducted by randomly changing the values of some bits or the order of some sub-strings. Both operations are illustrated in Fig. 1.

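The binary coding of the structure parameters {p, q} in the outer evolution cycle can be illustrated with a minimal Python sketch. It assumes the 4-bit-per-parameter coding stated in the caption of Fig. 1, with each gene read as an unsigned integer. The function names are hypothetical, and the uniform crossover and bit-flip mutation shown here are simple stand-ins chosen for illustration, not necessarily the exact operators used by the authors.

```python
import random
from typing import Tuple

BITS = 4  # bits per structure parameter (assumed from the 4-bit coding of Fig. 1)


def decode(chromosome: str) -> Tuple[int, int]:
    """Split an 8-bit string into (p, q), each read as an unsigned integer."""
    return int(chromosome[:BITS], 2), int(chromosome[BITS:], 2)


def encode(p: int, q: int) -> str:
    """Inverse of decode: concatenate the 4-bit codes of p and q."""
    return format(p, f"0{BITS}b") + format(q, f"0{BITS}b")


def structure(p: int, q: int) -> str:
    """Coefficient-free polynomial frame, e.g. 'u[t-1] + u[t-1]^2 + ...'."""
    terms = []
    for j in range(1, p + 1):
        for k in range(1, q + 1):
            terms.append(f"u[t-{j}]" + (f"^{k}" if k > 1 else ""))
    return " + ".join(terms)


def crossover(a: str, b: str, rng: random.Random) -> str:
    """Uniform crossover (illustrative): each bit is copied from a random parent."""
    return "".join(rng.choice(pair) for pair in zip(a, b))


def mutate(chromosome: str, rate: float, rng: random.Random) -> str:
    """Flip each bit independently with probability `rate`."""
    flipped = {"0": "1", "1": "0"}
    return "".join(flipped[b] if rng.random() < rate else b for b in chromosome)
```

With this coding, `decode("01010001")` gives `(5, 1)` and `decode("00100010")` gives `(2, 2)`, which matches model structures 1 and 2 in Fig. 1. A gene of `0000` would decode to p = 0 or q = 0, which is not a usable structure; how such chromosomes are handled is not stated in the paper, so a practical implementation would need its own repair or rejection rule.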
Model structure 1 (p=5, q=1): $u_{t-1} + u_{t-2} + u_{t-3} + u_{t-4} + u_{t-5}$; chromosome: 01010001
Model structure 2 (p=2, q=2): $u_{t-1} + u_{t-1}^{2} + u_{t-2} + u_{t-2}^{2}$; chromosome: 00100010
Crossover
New model structure 3 (p=3, q=1): $u_{t-1} + u_{t-2} + u_{t-3}$; chromosome: 00110001
Mutation
New model structure 4 (p=2, q=4): $u_{t-1} + u_{t-1}^{2} + u_{t-1}^{3} + u_{t-1}^{4} + u_{t-2} + u_{t-2}^{2} + u_{t-2}^{3} + u_{t-2}^{4}$; chromosome: 00100100

Figure 1. Coding and genetic operators in the structure evolution cycle (a 4-bit coding is used for each parameter)

It should be noted that the fitness of each model structure cannot be estimated at this point because the model coefficients are not known. Therefore, the outer evolution cycle has to be suspended until the model coefficients have been optimized. Once the model structures have been generated, the necessary coefficients are automatically introduced. These coefficients are optimized in the inner evolution cycle.

C. Inner Evolution Cycle of Model Coefficients

For every model structure generated in the outer evolution cycle, the parameter set $\{p, q\}$ is decoded according to (3) to construct the mathematical expression by introducing the necessary model coefficients $c_{jk}$ ($j = 1, 2, \ldots, p$; $k = 1, 2, \ldots, q$). Then another standard GA procedure is used to evolve those coefficients in the context of the current model structure. Since the unknown parameters ($p \times q$ of them) are real values, it is more suitable and convenient to represent the genes directly as real values too. Therefore, a real-coded GA, in which all genes in a chromosome are real numbers, is used. In the present analysis, the real-coded GA uses a reset stochastic selection type of selection procedure, simulated binary crossover and polynomial mutation. For details of these genetic operators, we refer the reader to [11]. A typical crossover operator in the context of parameter estimation is shown in Fig. 2.

Parent chromosome 1: $P_1 = (c_{11}^{1}, \ldots, c_{1q}^{1}, c_{21}^{1}, \ldots, c_{2q}^{1}, \ldots, c_{jk}^{1}, \ldots, c_{pq}^{1})^{T}$
Parent chromosome 2: $P_2 = (c_{11}^{2}, \ldots, c_{1q}^{2}, c_{21}^{2}, \ldots, c_{2q}^{2}, \ldots, c_{jk}^{2}, \ldots, c_{pq}^{2})^{T}$
Crossover
Child chromosome 3: $P_3 = (c_{11}^{3} = \chi(c_{11}^{1}, c_{11}^{2}), \ldots, c_{jk}^{3} = \chi(c_{jk}^{1}, c_{jk}^{2}), \ldots)^{T}$

Figure 2. Crossover operation of two candidate parameter sets in the real-coded GA procedure ($\chi(\cdot)$ is some crossover rule)

D. Fitness Test

In a GA evolution process for a modeling problem, all generated models have to be tested against real-world results to give reference values (fitness values) that control the so-called natural selection. It is necessary to collect a set of input-output sample cases of the studied system. For time series modeling in the current study, these cases can be constructed according to (1). Each case consists of a sub-series of inputs ($u_{t-1}, u_{t-2}, \ldots, u_{t-p}$) selected by a moving window of size $p$ from the observed series and a corresponding output $u_t$. The data set is divided into two groups. One group is used as fitness cases to obtain the optimal model. The other group is used as testing cases to assess the out-of-sample forecasting ability of the obtained model. One should note that the value of $p$ changes frequently during the evolution process, and the data set must be reconstructed accordingly.

A fitness function, which returns a measurement of the fitness of a model, is generally based on the discrepancies between the model predictions and the observed results. It can be defined using some statistic of those discrepancies. In the current study, the root of the mean square error is used, namely

$$\text{Fitness} = \left\{ \frac{1}{n} \sum_{i=1}^{n} \left[ u_i - u_i^{*} \right]^{2} \right\}^{1/2} \tag{6}$$

where $u_i$ and $u_i^{*}$ are the model prediction and the observed result, respectively, and $n$ is the number of learning cases.

After the inner evolution cycle, the optimal set of model coefficients for each model structure is determined and the corresponding fitness value is assigned to the model structure for further evolution.

IV. ALGORITHM DETAILS

The nesting evolution algorithm for time series modeling consists of the following steps:

Step 1. Randomly generate a population of initial model structure parameter sets $\{p, q\}_i$, then enter the outer evolution cycle;
Step 2. Estimate the fitness value of each model structure parameter set through the following sub-steps.
Sub-step 1. Construct the learning cases and testing cases according to (1) with the parameter $p$ of the current model structure;
Sub-step 2. Decode the model coefficient information (e.g., the number of model coefficients) of the current model structure, then enter the inner evolution cycle;
Sub-step 3. Evolve the model coefficients of the current model structure, starting from an initial population of coefficient sets, until the termination conditions for the inner evolution cycle are satisfied; the generated coefficient sets are tested against the fitness cases to calculate the fitness values according to (6);
Sub-step 4. Test the current model with optimized coefficients over the testing cases to estimate the fitness value of the current model structure.
Step 3. If the termination conditions for the outer evolution cycle are satisfied, output the best models and terminate the algorithm. Otherwise, go to Step 4.
Step 4. Perform genetic operations on the current population of model structures to generate a new generation of model structures. Then go to Step 2.

The whole evolution process is shown in Fig. 3.

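The nesting evolution steps of Section IV can be sketched in plain Python as follows. This is a simplified illustration under stated assumptions, not the authors' implementation: truncation selection, arithmetic crossover and Gaussian mutation stand in for the selection, simulated binary crossover and polynomial mutation of the paper; structures are evolved as integer pairs (p, q) rather than binary strings; a fixed number of generations replaces the termination criteria; and each structure is scored by its learning-case RMSE from (6) without a separate testing group. All function names and default values are hypothetical.

```python
import math
import random


def make_cases(series, p):
    """Moving window of size p: inputs (u[t-1], ..., u[t-p]) and output u[t]."""
    return [([series[t - j] for j in range(1, p + 1)], series[t])
            for t in range(p, len(series))]


def predict(coeffs, inputs, q):
    """Eq. (3): sum over j, k of c_jk * u[t-j]**k, coefficients stored row by row."""
    return sum(coeffs[j * q + (k - 1)] * u ** k
               for j, u in enumerate(inputs) for k in range(1, q + 1))


def rmse(coeffs, cases, q):
    """Eq. (6): root of the mean squared prediction error (lower is better)."""
    return math.sqrt(sum((predict(coeffs, x, q) - y) ** 2 for x, y in cases) / len(cases))


def inner_cycle(cases, p, q, rng, pop_size=30, generations=40):
    """Real-coded GA over the p*q coefficients of one fixed structure."""
    pop = [[rng.uniform(-1, 1) for _ in range(p * q)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda c: rmse(c, cases, q))
        parents = pop[: pop_size // 2]            # truncation selection (stand-in)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            w = rng.random()                      # arithmetic crossover (stand-in)
            child = [w * x + (1 - w) * y for x, y in zip(a, b)]
            # Gaussian mutation (stand-in), applied gene by gene
            child = [g + rng.gauss(0, 0.05) if rng.random() < 0.1 else g for g in child]
            children.append(child)
        pop = parents + children
    best = min(pop, key=lambda c: rmse(c, cases, q))
    return best, rmse(best, cases, q)


def nested_evolution(series, rng, pop_size=6, generations=4, max_p=3, max_q=2):
    """Outer cycle over structures (p, q); each one is scored by an inner cycle."""
    structures = [(rng.randint(1, max_p), rng.randint(1, max_q)) for _ in range(pop_size)]
    best = None
    for _ in range(generations):
        scored = []
        for p, q in structures:
            coeffs, fit = inner_cycle(make_cases(series, p), p, q, rng)
            scored.append((fit, p, q, coeffs))
        scored.sort(key=lambda s: s[0])
        if best is None or scored[0][0] < best[0]:
            best = scored[0]
        survivors = [(p, q) for _, p, q, _ in scored[: pop_size // 2]]
        structures = list(survivors)
        while len(structures) < pop_size:
            (p1, _), (_, q2) = rng.sample(survivors, 2)  # take p and q from two parents
            if rng.random() < 0.2:                       # mutate p occasionally
                p1 = rng.randint(1, max_p)
            structures.append((p1, q2))
    return best  # (fitness, p, q, coefficients)
```

Calling `nested_evolution(series, random.Random(seed))` on a list of observations returns the best fitness found together with its structure and coefficients. Because the search is stochastic, different seeds may return different models, which is why several runs are advisable.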
Start
Randomly generate an initial population of model structure parameter sets {p, q}
Loop for each model structure, i1 = 1, N1
    Construct learn cases according to the current parameter value p
    Decode the coefficient information of current model structure
    Randomly generate an initial population of model coefficient sets {cjk}
    Loop for each coefficient set, i2 = 1, N2
        Calculate the fitness value of each coefficient set according to (6)
    Termination conditions? No: perform genetic operations to generate a new population of coefficient sets. Yes: continue.
    Select the fitness of best coefficient set as the fitness of current model structure
Termination conditions? No: perform genetic operations to generate a new population of structure parameter sets. Yes: output results.
End

Figure 3. Nesting evolution procedure for time series modeling

It should be noted that the algorithm is based on stochastic search. It is therefore probabilistic in nature and may arrive at different optimal solutions in different runs. To reduce the effect of this inherent variation, it is necessary to perform several runs to improve the reliability of the results.

V. APPLICATION TO DANGEROUS ROCK MASS

A. Problem Presentation

Stability analysis of dangerous rock mass (DAM) is a major task in rock engineering. The collapse of a DAM may lead to serious natural hazards, which each year account for enormous property damage in terms of both direct and indirect costs. Much effort has been made in recent decades to establish different models to predict the stability state. In summary, there are two main kinds of methods. The first kind develops numerical and analytical models to assess the duration and magnitude of the movement state of the DAM. However, affected by the complex time-dependent properties of geo-materials and many other engineering factors with considerable uncertainties, the movement of a DAM is highly time-dependent and commonly characterized by complex nonlinear dynamic behavior. Because the constitutive law of geo-materials is far from well known, it is very difficult to develop an accurate physically based model to calculate and predict the dynamic behavior. As an alternative, the second kind of method is based on data analysis of the deformation history of the DAM, which is regularly monitored during the movement. Of these methods, deformation time series analysis has attracted much attention, with encouraging results [3-7]. In this section, the proposed evolution modeling approach is applied to modeling and predicting the dynamic evolution process of a typical DAM.

B. Research Data and Implementation Settings

To demonstrate the robustness of the proposed evolution procedure for the nonlinear time series modeling problem under consideration, we apply it to a real DAM related to the Three Gorges Project in China. The monitored deformation history from 1978 to 1993 is used for the modeling and prediction analysis. The data are illustrated in Fig. 4.

Before the proposed evolution procedure is implemented, control parameters have to be chosen to control the search process. They mainly include the population size, the probabilities of the genetic operators (crossover and mutation), and the termination conditions. As there are two different GA cycles, i.e., the structure evolution cycle and the coefficient evolution cycle, two different sets of control parameters have to be selected so that the search can be carried out efficiently. The implementation settings in the current study are listed in Table I.

TABLE I. CONTROL PARAMETERS

Object                  Parameter              Value
Structure evolution     Population size        8
                        Crossover probability  0.7
                        Mutation probability   0.2
                        Termination criterion  The fitness value of the best individual has remained unchanged for 3 generations
Parameter optimization  Population size        200
                        Crossover probability  0.85
                        Mutation probability   0.05
                        Termination criterion  The fitness value of the best individual has remained unchanged for 3 generations

C. Application Results

For the training and testing cases, several runs of the implementation described in Section IV were performed with different randomly generated initial model structures, so that each run took a different genetic path to evolve models. After all the runs had finished, the optimal time series model was obtained as:

$$u_t = 0.4359 u_{t-1} - 0.0157 u_{t-1}^{2} + 0.9487 u_{t-2} + 0.0022 u_{t-2}^{2} + 0.1872 u_{t-3} + 0.0016 u_{t-3}^{2} \tag{7}$$

Fig. 4 shows the learned and predicted deformation time series obtained with the evolved time series model. One can note that the present method attains a satisfactory approximation. That is, the evolved model found the underlying relationship between the history and the future deformation series of the DAM. The evolved model can thus be used to predict the movement state of the DAM and to provide information for decision makers to plan a prevention schedule.

[Figure 4 plot: deformation (mm) versus observation step, showing the observed, learned and predicted series.]

Figure 4. Learned and predicted results of the evolved nonlinear deformation time series model (7)

VI. CONCLUSIONS

This paper presented a nesting evolution procedure for nonlinear time series modeling. The method automatically constructs models through an outer binary-coded GA evolution cycle for model structure selection, nested with a set of inner real-coded GA evolution cycles for model coefficient optimization. It can perform a coupled global optimal search of both the structure and the coefficients of a polynomial-type nonlinear model. Using this method, a typical case study on the prediction of the deformation series of a real DAM related to the Three Gorges Project was discussed. The nonlinear time series model was evolved and examined against the monitored deformation history. The results showed that there is good agreement between the observed and predicted deformations, and that the models obtained have interpretable forms that can easily be used for further analysis. The present method has been tested with encouraging success. It can be used as a useful alternative to other data-based modeling methodologies in the prediction analysis of complex dynamic systems.

REFERENCES

[1] G. E. P. Box and G. Jenkins, "Time Series Analysis, Forecasting and Control", San Francisco, CA: Holden-Day, 1970.
[2] P. J. Brockwell and R. A. Davis, "Introduction to Time Series and Forecasting", New York: Springer, 1996.
[3] X. T. Feng, Z. Q. Zhang and P. Xu, "Adaptive and intelligent prediction of deformation time series of high rock excavation slope", Transactions of Nonferrous Metals Society of China (English Edition), Vol. 9, pp. 842-846, April 1999.
[4] Z. Q. Huang, T. Jiang, Z. Q. Yue, et al., "Deformation of the central pier of the permanent shiplock, Three Gorges Project, China: an analysis case study", International Journal of Rock Mechanics & Mining Sciences, Vol. 40, pp. 877-892, September 2003.
[5] C.-X. Yang and Y.-F. Zhu, "Time series analysis using GA optimized neural networks", in the 3rd International Conference on Natural Computation, Vol. IV, Los Alamitos, CA: IEEE Computer Society Press, 2007, pp. 270-274.
[6] C. Yang, X. Feng and B. Chen, "Prediction of mining induced land subsidence using support vector machines", in Land Subsidence-Proceedings of the Seventh International Symposium on Land Subsidence, Vol. 2, A. G. Zhang, S. L. Gong, L. Carbognin et al., Eds. Shanghai: Shanghai Scientific & Technical Publishers, pp. 799-806, 2005.
[7] H. B. Zhao and X. T. Feng, "Study and application of genetic-support vector machine for nonlinear displacement time series forecasting", Chinese Journal of Geotechnical Engineering, Vol. 25, pp. 468-471, July 2003 (in Chinese).
[8] D. E. Goldberg, "Genetic and evolutionary algorithms come of age", Communications of the ACM, Vol. 37, pp. 113-119, March 1994.
[9] D. E. Goldberg, "Genetic Algorithms in Search, Optimization and Machine Learning", Reading, MA: Addison-Wesley, 1989.
[10] J. H. Holland, "Adaptation in Natural and Artificial Systems", Ann Arbor, MI: The University of Michigan Press, 1975.
[11] K. Deb, "Multi-objective optimization using evolutionary algorithms", Chichester, UK: Wiley, 2001.

