
Proceedings of the Fourth International Conference on Machine Learning and Cybernetics, Guangzhou, 18-21 August 2005

TIME SERIES PREDICTION BASED ON ENSEMBLE ANFIS
DE-WANG CHEN^1, JUN-PING ZHANG^2
^1 School of Electronics and Information Engineering, Beijing Jiaotong University, Beijing, 100044, China
^2 Department of Computer Science and Engineering, Fudan University, Shanghai, 200433, China
E-MAIL: cdw@telecom.njtu.edu.cn, jpzhang@fudan.edu.cn
Abstract:
In this paper, random and bootstrap sampling methods and ANFIS (Adaptive-Network-based Fuzzy Inference System) are integrated into En-ANFIS (an ensemble ANFIS) to predict chaotic and traffic flow time series. The prediction results of En-ANFIS are compared with those of an ANFIS trained on all the training data and of each ANFIS unit in En-ANFIS. Experimental results show that the prediction accuracy of En-ANFIS is higher than that of any single ANFIS unit, while the number of training samples and the training time of En-ANFIS are smaller than those of the ANFIS using all training data. En-ANFIS is therefore an effective method for achieving both high accuracy and low computational complexity in time series prediction.
Keywords:
Time series prediction; ANFIS; ensemble learning;
bootstrap; traffic flow
1. Introduction
Time series prediction is a branch of probability and statistics with many applications, such as economic forecasting, weather analysis, and traffic flow prediction [1]. There are many methods for time series prediction, such as linear regression, Kalman filtering [2], neural networks [3], and fuzzy systems [4]. Linear regression is simple but has limited adaptability. Kalman filtering is an adaptive method, but intrinsically linear. A neural network can approximate any nonlinear function, but it demands a great deal of training data and is hard to interpret. On the contrary, a fuzzy system is easy to interpret, but its adaptability is relatively low.
ANFIS [5], put forward by Jang in 1993, integrates the advantages of both neural networks and fuzzy systems: it not only has good learning capability, but can also be interpreted easily. ANFIS has been applied in many areas, such as function approximation, intelligent control, and time series prediction.
As to sample selection, many papers on time series prediction have not given good methods. On the one hand, they simply partition the data into training and testing sets at random, so the training data do not always reflect the real distribution underlying the prediction model, and the effectiveness of the prediction algorithm cannot be assured. On the other hand, when there are too many training data, the training time is long. How to choose a set of training data that reflects the real distribution of the prediction model while decreasing the training time on huge training sets is therefore a very important problem in time series prediction.
In this paper, ensemble learning and ANFIS are integrated into the En-ANFIS time series prediction algorithm; the focus is to study the influence of the sample selection and weighting methods of ensemble learning on the prediction algorithm. The principles of ANFIS and ensemble learning are introduced in Section 2. The En-ANFIS algorithm and its application to chaotic time series and traffic flow prediction are discussed in Section 3. The prediction results of En-ANFIS, the ANFIS using all training data (allANFIS), and the ANFIS units in En-ANFIS are compared with each other in Section 4. The main conclusions and future work are given in Section 5.
2. ANFIS and Ensemble learning
ANFIS is a neural network implementation of a T-S (Takagi-Sugeno) fuzzy inference system. ANFIS applies a hybrid algorithm that integrates BP (backpropagation) and LSE (least squares estimation), so it has a rapid learning speed. Ensemble learning is a paradigm in which multiple component learners are trained for the same task and their predictions are combined for dealing with future instances [6]. Since an ensemble is often more accurate than its component learners, this paradigm has become a hot topic in recent years and has already been successfully applied to optical character recognition, face recognition, scientific image analysis, medical diagnosis, etc. [7].
In this paper, the output of the ensemble is a weighted average of the outputs of the ensemble units, each with its own coefficient.
0-7803-9091-1/05/$20.00 ©2005 IEEE
When used for classification, the output of ensemble learning is often the voting result of all ensemble units. There are many methods to realize ensemble learning; in this paper, we use bootstrap sampling (sampling with replacement) and random sampling without replacement to construct the subsystems of the ensemble [8].
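The two sampling schemes can be sketched as follows; this is a minimal illustration (the function name and subset sizes are ours, not from the paper):

```python
import numpy as np

def sample_indices(n_train, n_sub, bootstrap, rng):
    """Pick a training subset for one ensemble unit.

    bootstrap=True  -> sampling with replacement (duplicates possible)
    bootstrap=False -> random sampling without replacement
    """
    return rng.choice(n_train, size=n_sub, replace=bootstrap)

rng = np.random.default_rng(0)
boot = sample_indices(500, 150, True, rng)   # may contain repeated indices
rand = sample_indices(500, 150, False, rng)  # all 150 indices are distinct
```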
3. En-ANFIS structure and time series data
description
In this paper, the structure of the proposed En-ANFIS
is illustrated in Fig.1.

Figure 1. The structure of En-ANFIS (input, sample, training, testing, and output layers)
In Fig. 1, En-ANFIS is divided into five layers: the input layer, sample layer, training layer, testing layer, and output layer. Each ANFISi' is trained using bootstrapped or randomly selected training data, and ANFISi is the trained ANFISi'. The testing data are input to every ANFISi at the same time. The output of En-ANFIS is the combined output of all ANFIS units in the testing layer. Two methods are adopted in this paper to calculate the output of En-ANFIS: uniform weighting, as in (1), and non-uniform weighting according to the reciprocal of the training error of ANFISi', as in (2) and (3).
Apparently, En-ANFIS is more complicated than a single ANFIS, but since its units can be computed in parallel, the computational complexity does not increase. Furthermore, because the number of random or bootstrap samples is smaller than the size of the full training set, the computational complexity decreases accordingly. Multiple bootstrap or random samples help to approximate the intrinsic distribution of the data. A chaotic time series and a traffic flow time series will be used to validate the effectiveness of the proposed En-ANFIS.
EnANFIS = \frac{1}{n} \sum_{i=1}^{n} ANFIS_i                         (1)

EnANFIS = \sum_{i=1}^{n} k_i \cdot ANFIS_i                           (2)

k_i = \frac{1/trnRMSE_i}{\sum_{j=1}^{n} (1/trnRMSE_j)}               (3)
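Equations (1)-(3) can be sketched in code; this is a minimal illustration assuming each unit's predictions are available as a row of an array (the names are ours):

```python
import numpy as np

def combine(unit_outputs, trn_rmse=None):
    """Combine ANFIS unit predictions as in Eqs. (1)-(3).

    unit_outputs: (n_units, n_samples) array of unit predictions.
    trn_rmse: training RMSE of each unit; None -> uniform weighting (Eq. 1),
    otherwise weights k_i are proportional to 1/trnRMSE_i (Eqs. 2-3).
    """
    outputs = np.asarray(unit_outputs, dtype=float)
    if trn_rmse is None:
        k = np.full(outputs.shape[0], 1.0 / outputs.shape[0])
    else:
        inv = 1.0 / np.asarray(trn_rmse, dtype=float)
        k = inv / inv.sum()          # Eq. (3): the k_i sum to 1
    return k @ outputs               # Eq. (2); Eq. (1) is the uniform case

units = np.array([[1.0, 2.0], [3.0, 4.0]])
print(combine(units))                       # uniform average: [2.0, 3.0]
print(combine(units, trn_rmse=[0.1, 0.3]))  # weights 0.75/0.25: [1.5, 2.5]
```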
The chaotic series used is the Mackey-Glass time series [9], whose delay differential equation is defined as:

\dot{x}(t) = \frac{0.2\, x(t-\tau)}{1 + x^{10}(t-\tau)} - 0.1\, x(t)          (4)

When x(0) = 1.2 and \tau = 17, it is a non-periodic and non-convergent time series, illustrated in Fig. 2, which is very sensitive to initial conditions. (We assume x(t) = 0 when t < 0.)
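For illustration, Eq. (4) can be integrated numerically. The sketch below uses a simple Euler scheme with the stated initial conditions; the step size and function name are our assumptions (a higher-order integrator such as RK4 would be more accurate):

```python
import numpy as np

def mackey_glass(n_steps, tau=17, dt=1.0, x0=1.2):
    """Euler integration of Eq. (4), with x(t) = 0 for t < 0."""
    hist = int(tau / dt)
    x = np.zeros(n_steps + hist)      # leading zeros hold the t < 0 history
    x[hist] = x0                      # x(0) = 1.2
    for t in range(hist, n_steps + hist - 1):
        x_tau = x[t - hist]           # delayed term x(t - tau)
        dx = 0.2 * x_tau / (1.0 + x_tau ** 10) - 0.1 * x[t]
        x[t + 1] = x[t] + dt * dx
    return x[hist:]

series = mackey_glass(1200)           # covers the range plotted in Fig. 2
```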

Figure 2. The Mackey-Glass chaotic time series (x(t) vs. time)

Another time series we used is traffic flow data collected on the Beijing urban expressway (3rd Ring Road). The details of the data collection and transmission can be found in [10]. One week of traffic flow data is illustrated in Fig. 3.

Figure 3. Traffic flow data on the Beijing 3rd Ring Road (flow in veh/h vs. time in 2-min intervals)

4. The comparisons and analysis of different
prediction algorithms
4.1. Training data and test data
Similar to [5], we predict x(t+6) from four past values of the chaotic time series, namely x(t-18), x(t-12), x(t-6), and x(t). Therefore the format of the training data is
[x(t-18), x(t-12), x(t-6), x(t); x(t+6)].
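The construction of such data pairs can be sketched as follows (a hypothetical helper, not the authors' code; the toy series is only there to show the shapes):

```python
import numpy as np

def make_pairs(x, lags=(18, 12, 6, 0), horizon=6):
    """Build rows [x(t-18), x(t-12), x(t-6), x(t); x(t+6)] from a series."""
    start, stop = max(lags), len(x) - horizon
    rows = [[x[t - d] for d in lags] + [x[t + horizon]]
            for t in range(start, stop)]
    return np.array(rows)

x = np.arange(100, dtype=float)        # toy series
pairs = make_pairs(x)
inputs, target = pairs[:, :4], pairs[:, 4]
print(pairs[0])                        # first row: [0. 6. 12. 18. 24.]
```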
From t = 118 to 1117, we collect 1000 data pairs of the above format. For allANFIS, the first 500 are used for training and the others for testing. For each ANFIS unit, we use only 30% of the training data, that is, 150 data pairs. The training data obtained by random sampling are all different, whereas those obtained by bootstrap contain some repeated data.
Fig. 4 shows the training data of ANFIS10' after bootstrapping, consisting of 131 distinct training data and 19 repeated training data.
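The 131 distinct samples in Fig. 4 are close to what bootstrap theory predicts: drawing 150 times with replacement from 500 items leaves on average 500·(1 − (1 − 1/500)^150) ≈ 130 distinct items. A quick numerical check of this expectation (our illustration, not the paper's experiment):

```python
import numpy as np

n_pool, n_draw = 500, 150
# Expected number of distinct items after n_draw draws with replacement
expected = n_pool * (1.0 - (1.0 - 1.0 / n_pool) ** n_draw)

rng = np.random.default_rng(0)
distinct = [len(np.unique(rng.choice(n_pool, n_draw, replace=True)))
            for _ in range(1000)]
print(round(expected, 1), round(float(np.mean(distinct)), 1))  # both near 130
```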

Figure 4. New training data for ANFIS10 after bootstrapping (training data number vs. bootstrap draw)
In traffic flow prediction, after input selection we use y(t), y(t-1), y(t-2), and y(t-4) to predict y(t+1). Therefore the format of the training data is [y(t-4), y(t-2), y(t-1), y(t); y(t+1)]. We collected 5030 sets of traffic data in total; 70% of them (3521) are used for training and the others (1509) for testing. We use the same setting for bootstrap and random sampling, i.e., each ANFIS unit uses only 30% of all training data.
To ensure a fair comparison, En-ANFIS has 10 ANFIS units, and each unit adopts the same parameter set: 55 nodes, 80 linear parameters, 24 nonlinear parameters, 16 fuzzy rules, and 10 training epochs.
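These counts are consistent with a four-input ANFIS using two membership functions per input under grid partitioning (our reading; the paper does not state the membership function setup explicitly): 2^4 = 16 rules, 16·(4+1) = 80 linear consequent parameters, and, assuming three-parameter generalized bell membership functions, 4·2·3 = 24 nonlinear parameters. As arithmetic:

```python
n_inputs, n_mfs = 4, 2            # assumed: 2 membership functions per input
rules = n_mfs ** n_inputs         # grid partition: 2^4 = 16 rules
linear = rules * (n_inputs + 1)   # first-order T-S consequents: 16 * 5 = 80
nonlinear = n_inputs * n_mfs * 3  # generalized bell MFs have 3 parameters each
print(rules, linear, nonlinear)   # 16 80 24
```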
The only difference between the ANFIS units is the training data. The comparative performance indices (PI) include RMSE (root mean squared error), APE(%) (average percentage of error), TT(s) (the training time in seconds), and NTD (the number of training data). APE is defined as:
APE = \frac{1}{n} \sum_{i=1}^{n} \frac{|y(i) - \hat{y}(i)|}{y(i)} \times 100\%          (5)
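RMSE and the APE of Eq. (5) can be computed as follows (a minimal sketch with made-up numbers):

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean squared error."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def ape(y, y_hat):
    """Average percentage of error as in Eq. (5)."""
    y, y_hat = np.asarray(y, dtype=float), np.asarray(y_hat, dtype=float)
    return float(np.mean(np.abs(y - y_hat) / y) * 100.0)

y, y_hat = [1.0, 2.0, 4.0], [1.1, 1.8, 4.0]
print(round(rmse(y, y_hat), 4), round(ape(y, y_hat), 2))  # 0.1291 6.67
```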
Different combinations of sampling and weighting methods yield different variants of En-ANFIS, as listed in Table 1.

Table 1. The different kinds of En-ANFIS

                     Uniform weighting   Non-uniform weighting
  Random sample      En-ANFIS1           En-ANFIS2
  Bootstrap sample   En-ANFIS3           En-ANFIS4
4.2. Chaotic time series prediction
Table 2. The PI comparison of different algorithms

                          RMSE    APE   TT    NTD
  Random     ANFISmin    0.0031  0.23  1.82  150
  sampling   ANFISmax    0.0040  0.30  2.32  150
             ANFISmean   0.0034  0.25  2.05  150
  Bootstrap  ANFISmin    0.0034  0.24  1.72  150 (124)
  sampling   ANFISmax    0.0048  0.30  2.64  150 (135)
             ANFISmean   0.0039  0.28  2.09  150 (129)
  En-ANFIS1              0.0027  0.21  2.32  150
  En-ANFIS2              0.0029  0.22  2.32  150
  En-ANFIS3              0.0027  0.21  2.64  150 (135)
  En-ANFIS4              0.0029  0.22  2.64  150 (135)
  allANFIS               0.0025  0.19  6.38  500

The main PI of En-ANFIS, allANFIS, and the ANFIS units are shown in Table 2. If the sampling time is omitted and the ANFIS units are assumed to work in parallel, the TT of En-ANFIS can be taken as the maximum TT of all ANFIS units, and the NTD of En-ANFIS as the maximum NTD of all ANFIS units. The numbers in brackets in Table 2 are the numbers of distinct training data, since bootstrap samples contain some repeated data.
From Table 2, we can see that En-ANFIS is always better than any single ANFIS unit, whatever sampling technique or weighting method is used. Fig. 5 shows the output errors of ANFIS1, ANFIS2, and En-ANFIS under bootstrap sampling, from which we can see that the error scatterplot of En-ANFIS is contained within those of the two ANFIS units. Compared with allANFIS, the PI of En-ANFIS is almost the same, only a little worse; however, the TT and NTD of En-ANFIS decrease considerably. The PI of En-ANFIS using different sampling techniques and weighting methods do not show apparent differences, although uniform weighting seems a little better. Fig. 6 shows the output error scatterplots of En-ANFIS and allANFIS under bootstrap sampling.

Figure 5. The comparison of output error scatterplots of ANFIS1, ANFIS2, and En-ANFIS3 under bootstrap sampling (error vs. test data index)

Figure 6. The output error scatterplots of En-ANFIS3, En-ANFIS4, and allANFIS under bootstrap sampling
4.3. Traffic flow prediction
Following the same analysis as for chaotic time series prediction, the main PI of En-ANFIS, allANFIS, and the ANFIS units in traffic flow prediction are shown in Table 3.

Table 3. The PI comparison

                          RMSE    APE    TT     NTD
  Random     ANFISmin    134.41  15.20  11.30  1056
  sampling   ANFISmax    152.45  16.43  13.07  1056
             ANFISmean   138.03  15.56  11.68  1056
  Bootstrap  ANFISmin    133.84  14.96  12.78  1056 (903)
  sampling   ANFISmax    150.15  16.47  17.99  1056 (922)
             ANFISmean   142.41  15.70  15.48  1056 (915)
  En-ANFIS1              128.83  14.93  13.07  1056
  En-ANFIS2              128.67  14.93  13.07  1056
  En-ANFIS3              127.56  14.73  17.99  1056 (922)
  En-ANFIS4              127.87  14.77  17.99  1056 (922)
  allANFIS               126.97  14.72  40.02  3521

From Table 3, we get results similar to those of chaotic time series prediction. En-ANFIS is always better than any single ANFIS unit in prediction accuracy. The prediction accuracy of En-ANFIS is similar to that of allANFIS; however, En-ANFIS uses much less time and training data. The sampling method and weighting manner have little influence on the prediction accuracy of En-ANFIS. In this case, bootstrap performs slightly better than random sampling, while the weighting manner has almost no influence at all.

Figure 7. The output errors of two ANFIS units (ANFIS1, ANFIS2) and En-ANFIS1 under random sampling

Figure 8. The output errors of En-ANFIS1, En-ANFIS2, and allANFIS under random sampling
Fig. 7 shows the error differences among two ANFIS units and En-ANFIS under random sampling; the error curve of En-ANFIS is again contained within those of the ANFIS units. Fig. 8 shows the error curves of two kinds of En-ANFIS and allANFIS under random sampling; the three curves overlap each other, so it is hard to determine which method is the best.
5. Conclusions
From the above analysis and comparison on chaotic time series and traffic flow prediction, we find that En-ANFIS improves performance over any single ANFIS unit. In other words, an ensemble of multiple weak ANFIS units can reach a high performance, so it is possible to use less training data and less training time to achieve good results. We also discussed how the sampling method and weighting manner influence the PI of En-ANFIS; according to the experimental results, there is no apparent difference between the different sampling techniques and weighting methods.
The method put forward in this paper can be used not only in ANFIS ensembles but also in other system ensembles, such as ensembles of neural networks or fuzzy systems. Furthermore, it is believed that this method is effective not only for prediction problems but also in other domains, such as classification and pattern recognition.
In this paper, we have only presented elementary work on integrating ensemble learning with ANFIS for time series prediction. Much future work remains, such as new sampling techniques, the convergence of the algorithm, the number of ANFIS units, the weighting method, the proportion of training data used in each ANFIS unit, and so on.
Acknowledgements
This work is supported by an open research foundation of the Shanghai Key Laboratory of Intelligent Information Processing, under grant IIPL-04-014.
References
[1] R. H. Shumway and D. S. Stoffer, Time Series Analysis and Its Applications. New York: Springer-Verlag, 2000.
[2] Jie Ma and Jian-fu Teng, "Predict chaotic time-series using unscented Kalman filter", Proceedings of the Third International Conference on Machine Learning and Cybernetics, Shanghai, China, pp. 867-890, 26-29 August 2004.
[3] R. S. Crowder, "Predicting the Mackey-Glass time series with cascade correlation learning", in Proc. 1990 Connectionist Models Summer School, D. Touretzky et al., Eds., Carnegie Mellon Univ., pp. 117-123, 1990.
[4] A. Kandel, Fuzzy Expert Systems. Boca Raton, FL: CRC Press, 1992.
[5] Jyh-Shing Roger Jang, "ANFIS: adaptive-network-based fuzzy inference system", IEEE Trans. on Systems, Man, and Cybernetics, Vol. 23, No. 3, pp. 665-685, 1993.
[6] Thomas G. Dietterich, "Machine learning research: four current directions", AI Magazine, 18(4), pp. 97-136, 1997.
[7] Z.-H. Zhou, J. Wu, and W. Tang, "Ensembling neural networks: many could be better than all", Artificial Intelligence, 137(1-2), pp. 239-263, 2002.
[8] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical Learning. Springer-Verlag, 2001.
[9] M. C. Mackey and L. Glass, "Oscillation and chaos in physiological control systems", Science, Vol. 197, pp. 287-289, July 1977.
[10] Dewang Chen, Junping Zhang, Shuming Tang, and Jue Wang, "Freeway traffic stream modeling based on principal curves and its analysis", IEEE Transactions on Intelligent Transportation Systems, Vol. 5, No. 4, pp. 246-258, Oct. 2004.
