
A COMPARATIVE ANALYSIS OF NEURO-FUZZY AND ARIMA MODELS FOR URBAN RAIL PASSENGER DEMAND FORECASTING

Milos Milenkovic1, Nebojsa Bojovic2, Resad Nuhodzic3


1 Division for Management in Railway, Rolling stock and Traction, The Faculty of Traffic and Transport Engineering, University of Belgrade, 11000 Belgrade, Serbia
2 Division for Management in Railway, Rolling stock and Traction, The Faculty of Traffic and Transport Engineering, University of Belgrade, 11000 Belgrade, Serbia
3 Railway Transport of Montenegro, Podgorica, Montenegro

Abstract: The success of strategic and detailed planning of urban rail transportation depends heavily on accurate demand data. Forecasting is key to the successful planning of rail passenger operations, such as timetabling and seat allocation. The adaptive neuro-fuzzy inference system (ANFIS) is a class of adaptive multi-layer feedforward networks applied to nonlinear forecasting, where past samples are used to forecast the next sample. ANFIS combines the self-learning ability of neural networks with the linguistic expressiveness of fuzzy inference. In this paper, we apply ANFIS to model rail passenger flow on the Belgrade urban railway network. The performance of the neuro-fuzzy network is compared with that of a traditional linear model (ARIMA), and both models are evaluated on their ability to produce accurate forecasts.

Keywords: railway, passenger demand, forecasting, ANFIS, ARIMA.

1. Introduction

Passenger flow forecasting is a vital component of transportation systems: it can be used to guide travel behavior, reduce passenger congestion and improve the service quality of transportation systems. Passenger flow forecasts can support transportation system management tasks such as operation planning and the regulation of passenger crowds at stations. In some cases, they are used to establish daily train timetables, which directly affect resource allocation and utilization. The success of strategic and detailed planning of public transportation depends heavily on accurate demand data. Passenger flow forecasts also provide the basis for investment decision analysis in urban rail transport projects, since they underpin the assessment of the economic costs of construction projects. Sound, well-founded passenger flow forecasts are therefore a fundamental prerequisite for investment decisions.
Transportation forecasting approaches can generally be divided into two categories: parametric and non-parametric techniques (Smith et al., 2002). The distinction concerns the functional dependency assumed between the independent variables and the dependent variable. Among the traditional parametric techniques, historical averages, smoothing techniques and the autoregressive integrated moving average (ARIMA) model have been applied to forecast transportation demand. Among the non-parametric techniques, methods used to forecast transportation demand include neural networks (Dougherty, 1995; Vlahogianni et al., 2004), non-parametric regression (Smith et al., 2002; Clark, 2003), Kalman filtering models (Wang and Papageorgiou, 2007) and Gaussian maximum likelihood (Tang et al., 2003).
In this paper, we study urban railway passenger demand in the city of Belgrade. On the basis of available historical data on rail passenger flows, two forecasting techniques are proposed and compared: a hybrid of neural networks and fuzzy logic known as the Adaptive Network-based Fuzzy Inference System (ANFIS), and the Autoregressive Integrated Moving Average (ARIMA) model. Both models are used to produce a one-year-ahead rail passenger demand forecast.
The remainder of this paper is organized as follows. Section 2 describes the underlying principle of neuro-fuzzy systems and the architecture of ANFIS. The main characteristics of ARIMA models are given in Section 3. Section 4 presents the Belgrade rail node, the data used for forecasting and the explanatory variables. Section 5 demonstrates the applicability of the proposed methods by modeling urban rail passenger demand for the Belgrade rail node. Concluding remarks are provided in Section 6.

2. Neuro-fuzzy models

Neuro-fuzzy modeling refers to applying the learning techniques developed in the neural network literature to fuzzy modeling or to a fuzzy inference system (FIS). The FIS is a popular computing framework based on the concepts of fuzzy set theory, fuzzy if-then rules and fuzzy reasoning. Because of its multidisciplinary nature, the FIS is known by numerous other names, such as fuzzy expert system, fuzzy model, fuzzy logic controller, or simply fuzzy system (Pribyl and Goulias, 2003).

1 Corresponding author: m.milenkovic@sf.bg.ac.rs

The basic structure of a FIS is composed of five functional blocks: a rule base that contains a number of fuzzy if-then rules; a database that defines the membership functions of the fuzzy sets used in the fuzzy rules; a decision-making unit that performs the inference operations on the rules; a fuzzification interface that transforms crisp measurements into degrees of membership in different fuzzy sets; and, finally, a defuzzification interface that transforms the fuzzy results into a crisp output (e.g., a control signal or a predicted value).
A FIS implements a nonlinear mapping from its input space to its output space. This mapping is accomplished by a number of fuzzy if-then rules, each of which describes the local behavior of the mapping. The parameters of the if part of the rules (referred to as antecedents or premises in fuzzy modeling) define a fuzzy region of the input space, and the output parameters (the consequents) specify the corresponding output. Hence, the efficiency of the FIS depends on the estimated parameters. However, the selection of the shape of
In the present study, the concept of the adaptive network, which is a generalization of the common backpropagation neural network, is employed to tackle the parameter identification problem in a FIS.

2.1. ANFIS architecture

ANFIS (Adaptive Neuro-Fuzzy Inference System) is a class of adaptive multi-layer feedforward networks applied to nonlinear forecasting, where past samples are used to forecast the next sample. ANFIS was introduced by Jyh-Shing R. Jang (1993) to combine the self-learning ability of neural networks with the linguistic expressiveness of fuzzy inference.
Consider, for example, a FIS with two inputs $x$ and $y$ and one output $f$. For a first-order Sugeno fuzzy model, a typical rule set with two fuzzy if-then rules can be expressed as:
Rule 1: If $x$ is $A_1$ and $y$ is $B_1$, then $f_1 = p_1 x + q_1 y + r_1$
Rule 2: If $x$ is $A_2$ and $y$ is $B_2$, then $f_2 = p_2 x + q_2 y + r_2$
where $A_1, A_2$ and $B_1, B_2$ are the membership functions (MFs) for inputs $x$ and $y$, respectively, and $\{p_1, q_1, r_1\}$ and $\{p_2, q_2, r_2\}$ are the parameters of the output functions.
The fuzzy reasoning mechanism and the ANFIS architecture are presented in Fig. 1. Fig. 1(a) illustrates the fuzzy reasoning mechanism used to derive an output $f$ from a given input $[x, y]$. The firing strengths $w_1$ and $w_2$ are usually obtained as the product of the membership grades in the premise part, and the output $f$ is the weighted average of the rules' outputs. As Fig. 1(b) shows, the equivalent ANFIS network is composed of five layers: a fuzzy layer, a product layer, a normalization layer, a defuzzification layer and a total output layer.
Each layer contains several nodes described by their node functions. Adaptive nodes, denoted by squares, have adjustable parameters, whereas fixed nodes, denoted by circles, have parameters that are fixed in the system. The outputs of the nodes in the previous layer serve as the inputs to the current layer. The corresponding ANFIS structure is shown in Fig. 1(b), where nodes within the same layer perform functions of the same type. The output of node $i$ in layer $j$ is denoted $O_i^j$. To illustrate the ANFIS procedure, we consider the above system with two inputs $[x, y]$ and one output $f$. The relationship between the input and output of each layer is discussed below.
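[Figure: (a) two rules with firing strengths $w_1$, $w_2$ and outputs $f_1 = p_1 x + q_1 y + r_1$, $f_2 = p_2 x + q_2 y + r_2$, combined as $f = \frac{w_1 f_1 + w_2 f_2}{w_1 + w_2} = \bar{w}_1 f_1 + \bar{w}_2 f_2$; (b) ANFIS network, Layers 1 to 5: MF nodes $A_1, A_2, B_1, B_2$, product nodes $\Pi$, normalization nodes N, rule nodes $\bar{w}_1 f_1$, $\bar{w}_2 f_2$, summation node $\Sigma$ giving $f$]
Fig. 1. (a) Fuzzy inference system. (b) Equivalent ANFIS structure (Jang, 1993)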
Layer 1: Each node in this layer generates the membership grade of a linguistic label. The node function of the $i$-th node is the generalized bell membership function:
$$O_i^1 = \mu_{A_i}(x) = \frac{1}{1 + \left|\dfrac{x - c_i}{a_i}\right|^{2b_i}} \tag{1}$$
where $x$ is the input to node $i$, $A_i$ is the linguistic label associated with this node and $\{a_i, b_i, c_i\}$ is the parameter set that changes the shape of the membership function. ANFIS uses gradient descent to fine-tune these parameters during training. Parameters in this layer are referred to as premise parameters.
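To make Eq. (1) concrete, here is a minimal Python sketch of the generalized bell membership function. The function name `gbell_mf` and the parameter values in the example are illustrative assumptions, not settings from this study.

```python
import numpy as np


def gbell_mf(x, a, b, c):
    """Generalized bell membership grade, Eq. (1): 1 / (1 + |(x - c) / a|^(2b)).

    a sets the width, b the steepness of the shoulders, c the centre.
    """
    x = np.asarray(x, dtype=float)
    return 1.0 / (1.0 + np.abs((x - c) / a) ** (2.0 * b))


# Illustrative parameters (not from the paper): width 0.2, slope 2, centre 0.5
grades = gbell_mf([0.3, 0.5, 0.7], a=0.2, b=2.0, c=0.5)
print(grades)  # grade is 1 at the centre and 0.5 at x = c - a and x = c + a
```

Shifting $c_i$ moves the MF along the input axis, while $a_i$ and $b_i$ change its width and slope; these are the premise parameters that training adjusts.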
Layer 2: Each node in this layer, labeled $\Pi$, multiplies the incoming signals, and its output represents the firing strength of a rule:
$$O_i^2 = w_i = \mu_{A_i}(x)\,\mu_{B_i}(y), \quad i = 1, 2 \tag{2}$$
Layer 3: Node $i$ in this layer calculates the ratio of the $i$-th rule's firing strength to the sum of all firing strengths:
$$O_i^3 = \bar{w}_i = \frac{w_i}{w_1 + w_2}, \quad i = 1, 2 \tag{3}$$

Layer 4: Node $i$ in this layer computes the contribution of the $i$-th rule to the overall output using the node function
$$O_i^4 = \bar{w}_i f_i = \bar{w}_i (p_i x + q_i y + r_i) \tag{4}$$
where $\bar{w}_i$ is the output of layer 3 and $\{p_i, q_i, r_i\}$ is the consequent parameter set to be determined during training.
Layer 5: The single node in this layer computes the overall output as the sum of the contributions of all rules:
$$O^5 = \sum_i \bar{w}_i f_i = \frac{\sum_i w_i f_i}{\sum_i w_i} \tag{5}$$
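The five layers can be traced end to end in a short Python sketch of the two-rule, first-order Sugeno system above. It is a forward pass only (no training), it assumes generalized bell MFs as in Eq. (1), and all parameter values and helper names (`anfis_forward`, `gbell_mf`) are hypothetical.

```python
import numpy as np


def gbell_mf(x, a, b, c):
    # Generalized bell membership function, Eq. (1)
    return 1.0 / (1.0 + abs((x - c) / a) ** (2.0 * b))


def anfis_forward(x, y, premise_x, premise_y, consequents):
    """Forward pass of a two-input, two-rule first-order Sugeno ANFIS.

    premise_x, premise_y: (a, b, c) tuples for A_i and B_i, one per rule.
    consequents: (p, q, r) tuples, one per rule.
    Returns the overall output f and the normalized firing strengths.
    """
    # Layer 1: membership grades mu_Ai(x) and mu_Bi(y)
    mu_a = np.array([gbell_mf(x, *abc) for abc in premise_x])
    mu_b = np.array([gbell_mf(y, *abc) for abc in premise_y])
    # Layer 2: firing strengths w_i = mu_Ai(x) * mu_Bi(y), Eq. (2)
    w = mu_a * mu_b
    # Layer 3: normalized firing strengths w_bar_i = w_i / sum(w), Eq. (3)
    w_bar = w / w.sum()
    # Layer 4: rule contributions w_bar_i * (p_i x + q_i y + r_i), Eq. (4)
    f_rules = np.array([p * x + q * y + r for p, q, r in consequents])
    contributions = w_bar * f_rules
    # Layer 5: overall output as the sum of the contributions, Eq. (5)
    return contributions.sum(), w_bar


# Hypothetical parameters: A1/B1 centred at 0, A2/B2 centred at 1
premise_x = [(0.5, 2.0, 0.0), (0.5, 2.0, 1.0)]
premise_y = [(0.5, 2.0, 0.0), (0.5, 2.0, 1.0)]
consequents = [(1.0, 1.0, 0.0), (2.0, -1.0, 0.5)]
f, w_bar = anfis_forward(0.3, 0.6, premise_x, premise_y, consequents)
print(f, w_bar)
```

Because Layer 3 normalizes the firing strengths, the output is always a convex combination of the rule outputs $f_1$ and $f_2$.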

3. A brief review of ARIMA models

In real-world applications, many processes can be represented as a time series $x(t-p), \ldots, x(t-2), x(t-1), x(t)$. A great variety of approaches are available for making predictions from a time series. Prediction of a scalar time series $\{x(n)\}$ is the task of finding an estimate $\hat{x}(n+1)$ of the next sample $x(n+1)$ from the history of the series (Rank, 2003).
Linear prediction, where the estimate is based on a linear combination of $N$ past samples, can be represented as
$$\hat{x}(n+1) = \sum_{i=0}^{N-1} a_i\, x(n-i) \tag{6}$$
with prediction coefficients $a_i$, $i = 0, 1, \ldots, N-1$.
Introducing a general nonlinear function $f(\cdot): \mathbb{R}^N \to \mathbb{R}$ applied to the vector of past samples $\mathbf{x}(n) = [x(n), x(n-M), \ldots, x(n-(N-1)M)]^T$ yields the nonlinear prediction $\hat{x}(n+1) = f(\mathbf{x}(n))$ (Rank, 2003).
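Eq. (6) does not specify how the coefficients $a_i$ are obtained. One common choice, assumed here purely for illustration, is ordinary least squares on lagged samples; the sketch below fits $a_0, \ldots, a_{N-1}$ that way and then applies Eq. (6). The function names are hypothetical.

```python
import numpy as np


def fit_linear_predictor(x, N):
    """Estimate a_0..a_{N-1} of Eq. (6) by ordinary least squares.

    Each design-matrix row holds [x(n), x(n-1), ..., x(n-N+1)];
    the corresponding target is the next sample x(n+1).
    """
    x = np.asarray(x, dtype=float)
    rows = [x[n - N + 1:n + 1][::-1] for n in range(N - 1, len(x) - 1)]
    X = np.array(rows)
    target = x[N:]
    coeffs, *_ = np.linalg.lstsq(X, target, rcond=None)
    return coeffs


def predict_next(x, coeffs):
    """One-step prediction x_hat(n+1) = sum_i a_i x(n - i), Eq. (6)."""
    N = len(coeffs)
    recent = np.asarray(x, dtype=float)[-N:][::-1]  # x(n), x(n-1), ...
    return float(np.dot(coeffs, recent))


# Noise-free series that obeys x(n+1) = 0.6 x(n) + 0.3 x(n-1) exactly
x = [1.0, 2.0]
for _ in range(15):
    x.append(0.6 * x[-1] + 0.3 * x[-2])

coeffs = fit_linear_predictor(x, N=2)
print(coeffs, predict_next(x, coeffs))
```

On this noise-free series, least squares recovers the generating coefficients; on real data the fit only minimizes the in-sample one-step error.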
Traditionally, time series forecasting problems are tackled using linear techniques such as autoregressive moving average (ARMA) and autoregressive integrated moving average (ARIMA) models. An ARIMA model of a time series is defined by three terms $(p, d, q)$. Identification of a time series is the process of finding integer values of $p$, $d$ and $q$, usually very small (e.g., 0, 1 or 2), that model the patterns in the data. When a value is 0, the corresponding element is not needed in the model. The middle element, $d$, is investigated before $p$ and $q$: the goal is to determine whether the process is stationary and, if not, to make it stationary before determining the values of $p$ and $q$.
The simplest form of the ARIMA model is the autoregressive (AR) model, which is similar to a linear regression model. Let $z_t$ denote the value of a stationary time series at time $t$. An autoregressive model assumes that the current value $z_t$ depends on past values of the same series. In symbols, at any $t$,
$$z_t = C + \Phi_1 z_{t-1} + \Phi_2 z_{t-2} + \cdots + \Phi_p z_{t-p} + a_t \tag{7}$$
where $C$ is the constant level, $z_{t-1}, z_{t-2}, \ldots, z_{t-p}$ are past series values (lags), the $\Phi$ are coefficients (similar to regression coefficients) to be estimated, and $a_t$ is a random variable with zero mean and constant variance. The $a_t$ are assumed to be independent and represent random errors or random shocks. Just as in regression, some of the $\Phi$ coefficients may be zero. If $z_{t-p}$ is the furthest lag with a nonzero coefficient, the AR model is of order $p$, denoted AR($p$).
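A short simulation can make Eq. (7) concrete. The sketch below generates an AR($p$) series with Gaussian shocks; the values $C = 2$ and $\Phi_1 = 0.5$ are arbitrary, and the function name `simulate_ar` is hypothetical. It also illustrates that, for a stationary AR process, the constant $C$ is not the series mean: the mean is $\mu = C / (1 - \Phi_1 - \cdots - \Phi_p)$.

```python
import numpy as np


def simulate_ar(C, phi, n, sigma=1.0, seed=0):
    """Simulate z_t = C + sum_k phi_k z_{t-k} + a_t, Eq. (7), with Gaussian shocks a_t."""
    rng = np.random.default_rng(seed)
    p = len(phi)
    mu = C / (1.0 - sum(phi))          # stationary mean (assumes stationary phi)
    z = np.full(n + p, mu)             # start at the mean to avoid a long burn-in
    a = rng.normal(0.0, sigma, size=n + p)
    for t in range(p, n + p):
        lags = z[t - p:t][::-1]        # z_{t-1}, ..., z_{t-p}
        z[t] = C + np.dot(phi, lags) + a[t]
    return z[p:]


z = simulate_ar(C=2.0, phi=[0.5], n=20000)
print(z.mean(), 2.0 / (1.0 - 0.5))     # sample mean vs. theoretical mean 4.0
```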
In an autoregressive model in which the current observation depends on all past observations, there would be too many parameters $\Phi$ to make estimation possible. However, if the $\Phi$ themselves are given by a few parameters, estimation becomes feasible. If, for example, $\Phi_i = -\theta^i$, then
$$z_t = -\theta z_{t-1} - \theta^2 z_{t-2} - \theta^3 z_{t-3} - \cdots + a_t \tag{8}$$
Solving for $a_t$ gives
$$a_t = z_t + \theta z_{t-1} + \theta^2 z_{t-2} + \theta^3 z_{t-3} + \cdots \tag{9}$$
Writing Eq. (9) for $a_{t-1}$ and multiplying by $\theta$, we obtain
$$\theta a_{t-1} = \theta z_{t-1} + \theta^2 z_{t-2} + \theta^3 z_{t-3} + \cdots \tag{10}$$
Subtracting Eq. (10) from Eq. (9) gives
$$a_t - \theta a_{t-1} = z_t \tag{11}$$
This is a moving average (MA) model, since $z_t$ is a weighted average of the uncorrelated $a_t$'s. In general, we can model $z_t$ as
$$z_t = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q} \tag{12}$$
Such a model is called a moving average model of order $q$, or MA($q$).
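The derivation in Eqs. (8) to (11) can be checked numerically. The sketch below, with an arbitrary $\theta = 0.6$ and simulated shocks, builds an MA(1) series and then recovers the shocks with the expansion in Eq. (9). It assumes the series starts at $t = 0$ with $a_{-1} = 0$, so the infinite sum becomes finite and the recovery is exact up to rounding.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 0.6
a = rng.normal(size=200)                 # uncorrelated shocks a_t

# MA(1), Eq. (11): z_t = a_t - theta * a_{t-1}, taking a_{-1} = 0
z = a.copy()
z[1:] -= theta * a[:-1]

# Invert with Eq. (9): a_t = z_t + theta z_{t-1} + theta^2 z_{t-2} + ...
# (finite here because the series starts at t = 0)
powers = theta ** np.arange(len(z))
a_rec = np.array([np.dot(powers[:t + 1], z[t::-1]) for t in range(len(z))])

print(np.max(np.abs(a_rec - a)))         # close to zero: only rounding error remains
```

The expansion converges only when $|\theta| < 1$; this is the usual invertibility condition for an MA(1) model.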
We can combine the AR and MA models for stationary series to account for both past values and past shocks. Such a model is called an ARMA($p, q$) model, with $p$ AR terms and $q$ MA terms:
$$z_t = \Phi_1 z_{t-1} + \Phi_2 z_{t-2} + \cdots + \Phi_p z_{t-p} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-2} - \cdots - \theta_q a_{t-q} \tag{13}$$
However, most real series show a trend, that is, an average increase or decrease over time, and many also show cyclic behavior. Trends and cycles can be removed from a series through differencing. For example, suppose $t$ is measured in months and $Y_t$ is a series with a linear trend, so that every month the series increases on average by some constant amount $C$. Since $Y_t = C + Y_{t-1} + N_t$, where $N_t$ is a random “noise” component with expectation zero, the differences $Y_t - Y_{t-1}$ equal $C + N_t$. Thus, $z_t = C + N_t$ is a stationary series with the linear trend removed. We can now apply an ARMA model to $z_t$, even though it would not be appropriate to apply one to $Y_t$ directly. Once an ARMA model for $z_t$ is known, the differencing can be reversed to recover the original $Y_t$ from $z_t$; this process is termed integration. The combined model for the original series $Y_t$, which involves autoregression, moving average and integration (I), is termed the ARIMA($p, d, q$) model, with $p$ AR terms, $d$ differences and $q$ MA terms.
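The differencing and integration steps can be sketched in a few lines of Python. The trend size $C$, the noise and the starting level are synthetic values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
C = 5.0                                   # average monthly increase (illustrative)
N = rng.normal(0.0, 1.0, size=120)        # zero-mean noise N_t
Y = 100.0 + np.cumsum(C + N)              # Y_t = C + Y_{t-1} + N_t

z = np.diff(Y)                            # z_t = Y_t - Y_{t-1} = C + N_t (trend removed)

# "Integration": undo the differencing, starting from the first observation
Y_rebuilt = np.concatenate(([Y[0]], Y[0] + np.cumsum(z)))

print(z.mean())                           # close to C
print(np.allclose(Y_rebuilt, Y))          # True
```

Note that reversing the differencing requires the first observation $Y_0$; the differenced series alone fixes $Y_t$ only up to that starting level.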
To simplify notation, the ARIMA literature introduces the “backshift” operator $B$, which shifts an observation $Y_t$ one point back in time. Thus,
$$B(Y_t) = Y_{t-1} \tag{14}$$
$B$ may be exponentiated in this manner:
$$B^2(Y_t) = B[B(Y_t)] = B(Y_{t-1}) = Y_{t-2} \tag{15}$$
In general,
$$B^k(Y_t) = Y_{t-k} \tag{16}$$
For the general case, involving differencing $d$ times,
$$z_t = (1 - B)^d Y_t = \nabla^d Y_t \tag{17}$$
where the operator $\nabla$ stands for $(1 - B)$ and serves as a differencing operator. Using backshift notation, the ARIMA model equation
$$z_t = C + (\Phi_1 z_{t-1} + \Phi_2 z_{t-2} + \cdots + \Phi_p z_{t-p}) - (\theta_1 a_{t-1} + \theta_2 a_{t-2} + \cdots + \theta_q a_{t-q}) + a_t \tag{18}$$
becomes
$$z_t = C + (\Phi_1 B + \cdots + \Phi_p B^p) z_t + (1 - \theta_1 B - \cdots - \theta_q B^q) a_t \tag{19}$$
To simplify notation further, the autoregressive polynomial $(1 - \Phi_1 B - \cdots - \Phi_p B^p)$ is often abbreviated as $\Phi(B)$, and the moving average polynomial $(1 - \theta_1 B - \cdots - \theta_q B^q)$ as $\theta(B)$. Substituting, the model is written compactly as
$$\Phi(B)\,\nabla^d Y_t = C + \theta(B)\, a_t \tag{20}$$
or
$$\nabla^d Y_t = \frac{C}{\Phi(B)} + \frac{\theta(B)}{\Phi(B)}\, a_t \tag{21}$$
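The backshift notation of Eq. (17) can be checked directly: expanding $(1 - B)^d$ with the binomial theorem gives $\sum_{k=0}^{d} (-1)^k \binom{d}{k} B^k$, so applying it to $Y_t$ is the same as differencing $d$ times. The sketch below compares this expansion with NumPy's `np.diff`; the function name `difference` and the test series are illustrative.

```python
from math import comb

import numpy as np


def difference(y, d):
    """Apply (1 - B)^d by expanding the binomial: sum_k (-1)^k C(d, k) B^k."""
    y = np.asarray(y, dtype=float)
    coeffs = [(-1) ** k * comb(d, k) for k in range(d + 1)]   # 1, -d, ..., (-1)^d
    # B^k y_t = y_{t-k}; the output starts at t = d so that every lag exists
    return np.array([sum(c * y[t - k] for k, c in enumerate(coeffs))
                     for t in range(d, len(y))])


y = np.array([3.0, 5.0, 9.0, 15.0, 23.0, 33.0])
print(difference(y, 2))    # second differences via the backshift polynomial
print(np.diff(y, n=2))     # NumPy's repeated differencing gives the same values
```

Each round of differencing shortens the series by one observation, which is why $d$ is kept as small as possible in practice.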

4. Case study characteristics and data analysis

The city of Belgrade has its own commuter rail system, called Beovoz (Беовоз), which is operated by Serbian Railways and provides mass-transit service within the Belgrade metropolitan area. The Belgrade suburban railway system connects the suburbs and nearby cities to the west, north and south of the city.
In this paper, we focus on the Pancevo Bridge – Batajnica line. This line belongs to Bg-train, the urban rail system that serves the city of Belgrade. Bg-train is operated by the public transit corporation GSP Belgrade, is part of the integrated BusPlus system and shares tracks and stations with the Beovoz commuter rail system.

4.1. The data

A passenger flow dataset was collected to investigate the viability of the proposed ANFIS approach for forecasting passenger flow. The available data cover nineteen consecutive years, from 1993 to 2011 (Fig. 2).
No dataset of average daily passenger flow on the Bg-train line is available, so this flow was estimated from the average daily number of passengers in the Beovoz system and the share of the Bg-train line in total passenger flow within the Beovoz system. This share was estimated from line-based passenger flow counts performed by the Transport Institute CIP in the periods 1993–1995 and 1997–2001 and in 2007 (Milanovic, 2012). Passenger flow counts for the Bg-train line were also performed for 2010 and 2011, this time by GSP Belgrade.
For the period from 2008 to 2010, we used a dedicated function in MATLAB to fill in missing values.
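The text does not name the MATLAB function used to fill the gaps, so the sketch below is only one plausible approach: linear interpolation between the nearest observed years, written with NumPy's `np.interp`. The series is synthetic and is not the Bg-train data.

```python
import numpy as np

# Synthetic yearly series with gaps (NaN); the values are NOT the Bg-train data
years = np.arange(2005, 2012)
flow = np.array([100.0, 110.0, 120.0, np.nan, np.nan, 150.0, 160.0])

missing = np.isnan(flow)
filled = flow.copy()
# Linear interpolation between the nearest observed years on either side of each gap
filled[missing] = np.interp(years[missing], years[~missing], flow[~missing])
print(filled)
```

Linear interpolation cannot reproduce any turning points that fall inside a gap, so filled years carry more uncertainty than observed ones.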

Fig. 2. Average daily passenger flow on Bg-Train line

4.2. Explanatory variables

To obtain accurate forecasts, the most relevant factors should be determined and included in the model (Owen and Phillips, 1987). We hypothesize that the demand for urban rail travel, as measured in this study, is a function of population, employment, economic activity and car ownership in the immediate catchment area of the Bg-train line.
It is widely recognized that mobility and mode choice are affected by urban population density. In general, dense cities are associated with high use of public transport, and they are expected to have large transport systems, since supply becomes profitable (or less expensive) by taking advantage of economies of scale and density.
Employment level is a common demographic variable used in causal analysis of rail passenger demand (Liu, 1993).
We also hypothesize that the demand for urban rail travel on the Bg-train line is a function of the level of economic activity, for which we use the GDP output series. The relationship between urban rail demand and this variable would be the outcome of two opposing tendencies. On the one hand, higher-income people tend to travel more, both in the course of their work and for leisure; on the other hand, they also tend to own more cars and are therefore less likely to make a given trip by rail. Cars undoubtedly represent an important source of competition for urban rail passenger service: rising car ownership increases competition against the railway and should therefore have a negative impact on rail demand.

5. Comparison of ANFIS and ARIMA models for urban rail passenger demand forecasting

In this section, the performance of ANFIS is compared with that of a traditional linear ARIMA model. The models were evaluated on their ability to produce accurate forecasts of urban rail passenger flow on the Bg-train line. The root mean square error (RMSE) is used as the main criterion to assess the derived models.
5.1. ANFIS

In this section, ANFIS is trained and then used to predict next year's average daily passenger flow on the Bg-train line. All input and output data were rescaled to lie in the range 0–1. The number of membership functions (MFs) assigned to each input variable was chosen empirically, that is, by plotting the data set and examining it visually, or simply by trial and error. The model uses a hybrid learning algorithm to identify the parameters of Sugeno-type fuzzy inference systems. The first 60% of the data was used for training the model and the remaining 40% for testing it. The training error goal was set to 0. The model was trained many times with different numbers of epochs, and the best results were obtained at 300 epochs. The root mean square error (RMSE) was found to be around 0.00027 for the training data and 0.26821 for the testing data. The output of the model, next year's average daily passenger demand, is 9206 passengers. Table 1 summarizes the parameters of the system. The ANFIS model was built and analyzed using MATLAB's ANFIS toolbox and GUI editor.
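The preprocessing and evaluation steps described above (0–1 rescaling, a chronological 60/40 split and RMSE) can be sketched as follows. This is an illustrative Python version, not the MATLAB code used in the study; the function names are hypothetical, and rounding the training share down with `int` is an assumption, though it reproduces the 11/8 split reported in Table 1.

```python
import numpy as np


def minmax_scale(v):
    """Rescale values to the range 0-1 using the series minimum and maximum."""
    v = np.asarray(v, dtype=float)
    return (v - v.min()) / (v.max() - v.min())


def chronological_split(data, train_fraction=0.6):
    """First 60% of observations for training, remaining 40% for testing."""
    n_train = int(train_fraction * len(data))
    return data[:n_train], data[n_train:]


def rmse(actual, predicted):
    """Root mean square error."""
    actual, predicted = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


years = np.arange(1993, 2012)              # 19 yearly observations, 1993-2011
train, test = chronological_split(years)
print(len(train), len(test))               # 11 8
```

Because min-max scaling is linear, an RMSE computed on the 0–1 scale can be converted back to passengers by multiplying it by the range (max - min) of the output series.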

Table 1: ANFIS parameter types and their values used for training

ANFIS parameter type               Value
MF type                            trimf (triangular)
Number of MFs (per input)          3
Output MF                          constant
Number of nodes                    193
Number of linear parameters        81
Number of nonlinear parameters     36
Total number of parameters         117
Number of training data pairs      11
Number of testing data pairs       8
Number of fuzzy rules              81
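The parameter counts in Table 1 follow from the model settings. The sketch below assumes grid partitioning of the input space (one rule for every combination of MFs), which is consistent with 81 = 3^4 rules; the node count of 193 depends on the toolbox's node-counting convention and is not reproduced here.

```python
# Parameter counts for a grid-partitioned ANFIS, using the settings in Table 1
n_inputs = 4          # four explanatory variables (Section 4.2)
mfs_per_input = 3     # Number of MFs per input
params_per_mf = 3     # a triangular MF (trimf) is defined by three points
params_per_rule = 1   # constant (zero-order Sugeno) output per rule

n_rules = mfs_per_input ** n_inputs                       # every MF combination
n_nonlinear = n_inputs * mfs_per_input * params_per_mf    # premise parameters
n_linear = n_rules * params_per_rule                      # consequent parameters

print(n_rules, n_nonlinear, n_linear, n_nonlinear + n_linear)  # 81 36 81 117
```

With 117 parameters and only 11 training pairs, the model has far more parameters than training observations, which is worth keeping in mind when reading the training error.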

5.2. ARIMA

The ARIMA model was estimated using a trial version of the SPSS software package. The best-fitting model obtained is ARIMA(1,0,0). Actual values and ARIMA outputs for urban rail passenger flow are given in Fig. 3.

Fig. 3. Real and ARIMA system outputs related to urban rail passenger flow forecasting

The output of the model, next year's average daily passenger demand, is 9230 passengers, which differs only slightly from the ANFIS forecast. The estimated equation of the ARIMA(1,0,0) model is
$$Z_t = 8229.209 + 0.576\,Z_{t-1}$$
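The estimated equation yields a one-step forecast directly from the previous year's value. Statistical packages differ in whether the reported constant is an intercept $C$, as in Eq. (7), or the process mean $\mu$; the two parameterizations describe the same AR(1) model when $C = \mu(1 - \Phi_1)$. The sketch below shows both forms. Which convention the SPSS output follows is not stated in the text, and `z_prev` is a hypothetical placeholder rather than an observed value.

```python
def ar1_forecast_intercept(z_prev, C, phi):
    """One-step forecast when the constant is an intercept: z_t = C + phi * z_{t-1}."""
    return C + phi * z_prev


def ar1_forecast_mean(z_prev, mu, phi):
    """One-step forecast when the constant is the mean: z_t - mu = phi * (z_{t-1} - mu)."""
    return mu + phi * (z_prev - mu)


C, phi = 8229.209, 0.576          # coefficients of the estimated equation
mu = C / (1.0 - phi)              # mean implied if 8229.209 is read as an intercept
z_prev = 9000.0                   # hypothetical placeholder, not a value from the paper

print(ar1_forecast_intercept(z_prev, C, phi))
print(ar1_forecast_mean(z_prev, mu, phi))   # identical, since C = mu * (1 - phi)
```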
The adequacy of the ARIMA(1,0,0) model was checked by examining the residual ACF and PACF correlograms, shown in Fig. 4.
Fig. 4. Residual ACF and PACF correlograms of the ARIMA(1,0,0) model
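A residual correlogram of the kind shown in Fig. 4 can be computed as sketched below. The residuals here are synthetic white noise standing in for the ARIMA residuals, the function names are hypothetical, and the ±1.96/√n band is the usual large-sample approximation for the ACF of white noise. With only 19 yearly observations, this band is wide (about ±0.45), so the check has limited power.

```python
import numpy as np


def sample_acf(residuals, max_lag):
    """Sample autocorrelations r_1..r_max_lag of a residual series."""
    e = np.asarray(residuals, dtype=float)
    e = e - e.mean()
    denom = np.dot(e, e)
    return np.array([np.dot(e[k:], e[:-k]) / denom for k in range(1, max_lag + 1)])


def white_noise_band(n):
    """Approximate 95% bounds +/- 1.96 / sqrt(n) for a white-noise ACF."""
    return 1.96 / np.sqrt(n)


rng = np.random.default_rng(3)
residuals = rng.normal(size=500)          # synthetic stand-in for model residuals
acf = sample_acf(residuals, max_lag=10)
band = white_noise_band(len(residuals))
print(np.round(acf, 3), round(band, 3))
```

Residual autocorrelations that stay inside the band are consistent with white-noise residuals, which is what an adequate ARIMA model should leave behind.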

5.3. Comparison results

In this study, the ANFIS model uses four inputs and one output to forecast urban rail passenger volume on the Bg-train line, while the ARIMA(1,0,0) model uses only past values of the passenger flow series. The performance of the ANFIS and ARIMA models in terms of the root mean square error (RMSE) is compared in Table 2.

Table 2. Performance comparison of ANFIS and ARIMA models

Model     RMSE
ANFIS     0.00027
ARIMA     0.02356

As Table 2 shows, ANFIS yields a much lower RMSE than ARIMA.

6. Conclusion

Short-term passenger flow forecasting can provide useful information for decision makers in urban rail passenger systems. An accurate and stable passenger flow forecasting model can support transportation system management tasks such as operation planning, revenue planning and facility improvement. ANFIS has gained great popularity in time-series prediction because of its simplicity and reliability. In this paper, the performance of ANFIS and ARIMA was compared for short-term forecasting of passenger flow volume on the Bg-train line. Both models give very similar one-year-ahead forecasts of the number of passengers on the Bg-train line, but ANFIS captures the dynamic behavior of urban rail passenger flow on this line more effectively than ARIMA.

Acknowledgement

This paper is a result of research on project MNTR036022, “Critical infrastructure management for sustainable development in postal, communication and railway sector of Republic of Serbia”, and on project MNTR036027, “Software development and national database for strategic management and development of transportation means and infrastructure in road, rail, air and inland waterways transport using the European transport network models”.

References

Box, G. E. P.; Jenkins, G. M. 1976. Time series analysis: forecasting and control. Holden-Day, San Francisco, USA.
Clark, S. 2003. Traffic prediction using multivariate nonparametric regression, Journal of Transportation Engineering 129(2): 161–168.
Dougherty, M. 1995. A review of neural networks applied to transport, Transportation Research Part C 3(4): 247–260.
Jang, J.-S. R. 1993. ANFIS: adaptive-network-based fuzzy inference system, IEEE Transactions on Systems, Man and Cybernetics 23(3): 665–685.
Liu, Z. 1993. Determinants of Public Transit Ridership: Analysis of Post World War II Trends and Evaluation of Alternative Networks. Harvard University, USA.
Milanovic, Z. 2012. Electronic business in transport planning based on modular application of micro-simulation models, Ph.D. thesis, Graduate School of Business Studies, Megatrend University, Belgrade.
Owen, A. D.; Phillips, G. A. 1987. The characteristics of railway passenger demand, Journal of Transport Economics and Policy 21(3): 231–253.
Pribyl, O.; Goulias, K. 2003. On the application of adaptive neuro-fuzzy inference system (ANFIS) to analyze travel behavior, In: Proceedings of the 82nd Annual Meeting of the Transportation Research Board, Washington, D.C., USA.
Rank, E. 2003. Application of Bayesian trained RBF networks to nonlinear time-series modeling, Signal Processing 83: 1393–1410.
Smith, B. L.; Williams, B. M.; Oswald, R. K. 2002. Comparison of parametric and nonparametric models for traffic flow forecasting, Transportation Research Part C 10(4): 303–321.
Tang, Y. F.; Lam, W. H. K.; Ng, P. L. P. 2003. Comparison of four modeling techniques for short-term AADT forecasting in Hong Kong, Journal of Transportation Engineering 129(3): 271–277.
Vlahogianni, E. I.; Golia, J. C.; Karlaftis, M. G. 2004. Short-term traffic forecasting: overview of objectives and methods, Transport Reviews 24(5): 533–557.
Wang, Y.; Papageorgiou, M. 2007. Real-time freeway traffic state estimation based on extended Kalman filter: a case study, Transportation Science 42(2): 167–181.
