

TIME SERIES MODELING USING MARKOV AND ARIMA MODELS

MOHD KHAIRUL IDLAN BIN MUHAMMAD

A report submitted in partial fulfillment of the
requirements for the award of the degree of
Master of Engineering (Civil Hydraulic & Hydrology)

Faculty of Civil Engineering
Universiti Teknologi Malaysia

JANUARY 2012


DEDICATION

Special dedication to my beloved father and mother
Mr. Muhammad bin Ismail
and
Madam Siti Maznah binti Abdullah
and
My inspiration

Jazakumullahu khairan for all the love and inspiration
throughout the entire creation of this thesis.


ACKNOWLEDGEMENT

Assalammualaikum w.b.t.

Alhamdulillah, all praise to Allah S.W.T for the gift of life and what I have achieved
today.

Appreciation goes to my family for their prayers and their moral and financial support. May
Allah reward you abundantly.

My sincere and deepest gratitude goes to my supervisor, Dr. Sobri Harun, for his
guidance, encouragement and support in completing this master's project.

My gratitude also goes to Dr. Muhammad Askari for his invaluable suggestions, guidance and
encouragement.

Last but not least, to all my lecturers, classmates and friends: your help and support are
truly appreciated and will be remembered forever, InsyaALLAH. Thank you all.

ABSTRACT

Streamflow forecasting plays an important role in flood mitigation and in water
resources allocation and management. Inaccurate forecasting will cause losses to water
resources managers and users. The suitability of a forecasting method depends on the type
and amount of available data. Thus, the objectives of this study are to propose streamflow
forecasting methods using Markov and ARIMA models and to assess the forecasting accuracy
of the Markov and ARIMA models. Streamflow data of Sungai Bernam, Selangor were used.
Minitab and Microsoft Excel were used to model ARIMA and Markov, respectively. The
performance evaluation criteria used in this study were the Mean Absolute Percentage Error
(MAPE), the Root Mean Squared Error (RMSE) and the Chi-square test of normality, applied
to assess the forecasting accuracy of the different models. The tentative model that best fits
the criteria and meets the requirements for an ARIMA model is ARIMA (1,1,1)(0,1,1)12.
Based on the performance evaluation criteria, the ARIMA model performed better in
forecasting than the Markov model in this study. Therefore, the ARIMA model is able to
accurately predict future monthly streamflow for Sungai Bernam.


ABSTRAK

Streamflow forecasting plays an important role in flood control and water management.
Inaccurate forecasting will cause losses to water resources management authorities and
also to users. The suitability of a forecasting method depends on the type and amount of
available data. Therefore, the objectives of this study are to propose streamflow forecasting
methods using the Markov and ARIMA models and to examine the accuracy of the Markov
and ARIMA models in forecasting. Streamflow data of Sungai Bernam were used. Minitab
was used to build the ARIMA model and Microsoft Excel was used to build the Markov model.
The performance evaluation criteria used in this study were the Mean Absolute Percentage
Error (MAPE), the Root Mean Squared Error (RMSE) and the Chi-Squared test, used to
examine the forecasting accuracy of the different models. The tentative model that best fits
the criteria and meets the requirements for an ARIMA model is ARIMA (1,1,1)(0,1,1)12.
According to the performance evaluation criteria, the ARIMA model performs better in
forecasting than the Markov model. Hence, the ARIMA model is able to accurately forecast
future streamflow for Sungai Bernam.


TABLE OF CONTENTS

DECLARATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF APPENDICES
LIST OF ABBREVIATIONS

1 INTRODUCTION
1.1 Background of Study
1.2 Problem Statement
1.3 Justification of the Study
1.4 Aim and Objectives
1.5 Scope of Study

2 LITERATURE REVIEW
2.1 Introduction
2.2 Time Series Model
2.3 Forecasting Time Series
2.4 Streamflow Forecasting Method
2.4.1 Markov Model
2.4.2 ARIMA Theory
2.4.3 ARIMA Algorithms
2.4.3.1 AR Model
2.4.3.2 MA Model
2.4.3.3 ARMA Model
2.4.3.4 ARIMA Model
2.5 Reviews on Markov Model
2.6 Reviews on ARIMA Model
2.7 Concluding Remarks

3 METHODOLOGY
3.1 Introduction
3.2 Markov Model
3.2.1 Statistical Parameters of Historical Data
3.2.2 Identification of Distribution
3.2.3 Generation of Random Numbers
3.2.4 Formulation of the Markov Model
3.3 ARIMA Model
3.3.1 Model Assumptions
3.3.1.1 Data Stationarity
3.3.1.2 Normal Distribution
3.3.1.3 Outlier
3.3.1.4 Missing Data
3.3.2 Model Procedure
3.3.2.1 Model Identification
3.3.2.2 Parameter Estimation
3.3.2.3 Diagnostic Checking
3.3.3 Minitab Procedure
3.4 Model Comparison and Forecast Evaluation Measures

4 RESULTS AND DISCUSSION
4.1 Introduction
4.2 Estimation of Missing Data Values
4.3 Markov Model
4.3.1 Statistical Parameters of Historical Data
4.3.2 Identification of Distribution
4.3.3 Generation of Random Numbers
4.3.4 Streamflow Generation of Markov Model
4.3.5 Validation of Markov Model
4.4 ARIMA Model
4.4.1 Model Identification
4.4.2 Parameter Estimation
4.4.3 Diagnostic Checking
4.4.4 Streamflow Generation of ARIMA Model
4.4.5 Validation of ARIMA Model
4.5 Model Comparison and Forecast Evaluation Measures

5 CONCLUSION AND RECOMMENDATIONS
5.1 Conclusion
5.2 Recommendations

REFERENCES
APPENDICES A-G

LIST OF TABLES

4.1 Parameters of Monthly Historical Data
4.2 Logarithmic Values of Observed Streamflow Data for 1960-1970
4.3 Generation of Random Number for Year 2006
4.4 Model Streamflow for Year 2006
4.5 Accuracy of the Markov Model
4.6 General Theoretical ACF and PACF of ARIMA Models
4.7 Final Estimates of Parameters for ARIMA (1,1,1)(1,1,1)12
4.8 Final Estimates of Parameters for ARIMA (1,1,1)(0,1,1)12
4.9 Modified Box-Pierce (Ljung-Box) Chi-Square Statistic for ARIMA (1,1,1)(1,1,1)12
4.10 Modified Box-Pierce (Ljung-Box) Chi-Square Statistic for ARIMA (1,1,1)(0,1,1)12
4.11 LSE and RMSE Test for ARIMA Tentative Model
4.12 Model Streamflow for Year 2006-2007
4.13 Accuracy of the ARIMA Model
4.14 Accuracy of the Model


LIST OF FIGURES

2.1 Value of time series with forecast function at 50% probability limits
3.1 Flowchart of ARIMA modeling
4.1 Linear Regression of Two Streamflow for 1962
4.2 Linear Regression of Rainfall and Streamflow
4.3 Linear Regression of Two Streamflow for 1993
4.4 Descriptive Statistics of Sungai Bernam Data
4.5 Probability Density Function
4.6 Cumulative Distribution Function
4.7 Cumulative Distribution Function of the Log-normal Distribution
4.8 Comparison of Observed and Markov Flow
4.9 Flow Diagram of Box-Jenkins Methodology
4.10 Non-stationary data of Sg. Bernam streamflow
4.11 Stationary data of Sg. Bernam streamflow
4.12 ACF after non-seasonal difference
4.13 PACF after non-seasonal difference
4.14 ACF after seasonal difference
4.15 PACF after seasonal difference
4.16 Comparison of Observed and ARIMA Model Flow
4.17 Model Comparison
4.18 Streamflow for actual and model


LIST OF APPENDICES

A Streamflow Data of Sungai Bernam 1960-2010
B Logarithmic Values of Observed Streamflow Data for 1960-2005
C Generation of Random Number for Year 2006-2010
D Markov Model Streamflow
E Performance Evaluation Procedure of Markov Model
F ARIMA Model Streamflow
G Performance Evaluation Procedure of ARIMA Model


LIST OF ABBREVIATIONS

ACF    Autocorrelation Function
AD     Anderson-Darling
AR     Autoregressive
ARIMA  Autoregressive Integrated Moving Average
DF     Degrees of Freedom
K-S    Kolmogorov-Smirnov
LSE    Least Squared Error
MA     Moving Average
MAPE   Mean Absolute Percentage Error
PACF   Partial Autocorrelation Function
RMSE   Root Mean Square Error
R²     Coefficient of Determination
σ      Standard Deviation
SE     Standard Error
Sg.    Sungai
χ²     Chi-square

CHAPTER 1

INTRODUCTION

1.1 Background of Study

According to Bowerman and O'Connell (1993), predictions of future events and
conditions are called forecasts, and the act of making such predictions is called
forecasting. In many types of organizations, forecasting is very important because predictions
of future events must be incorporated into the decision-making process. To forecast
events that will occur in the future, one must rely on information about events that have
occurred in the past.

In order to prepare forecasts, past data need to be analyzed to identify a pattern
that can be used to describe them. This pattern is then extrapolated, or extended, into the
future. This forecasting technique rests on the assumption that the identified pattern will
continue into the future and so give good predictions. If the identified data pattern does not
persist, the forecasting technique is likely to produce inaccurate predictions (Bowerman and
O'Connell, 1993).

Most forecasting problems involve the use of time series data, and in this study time
series are used to prepare forecasts. A time series is formed from measurements of a
variable taken at regular intervals over time. It is a stochastic process, that is, a
sequence of random variables. Hydrologic data such as streamflows fall under the
category of time series (Gupta, 1989). Time series models forecast future values of a
series from its current and past values, and can therefore be used to forecast streamflow
(Box and Jenkins, 1976). Time series plots can reveal patterns such as randomness, trends,
level shifts, periods or cycles, unusual observations, or a combination of these.

Streamflow forecasting plays an important role in flood mitigation and in water
resources allocation and management. In water management, high-quality streamflow
forecasts, used efficiently, can bring considerable economic and social benefits.
Short-term forecasting, such as hourly and daily forecasting, is crucial for flood warning
and defense, while long-term forecasting based on monthly, seasonal or annual time series
is very useful for reservoir operation, irrigation management decisions, drought mitigation
and the management of river treaties (Shalamu, 2009).

Recently, owing to the increased availability of data from metering stations, real-time
data retrieval, and growing computational capability together with more robust methods
and computer techniques, time series models have become quite popular in streamflow
forecasting (Wang, 2006). Because hydrologic forecasting is so important, a considerable
number of forecasting models and methodologies have been developed and applied to
streamflow forecasting. In this study, Markov and ARIMA models are used to model
monthly streamflow processes.

The Markov process assumes that the value of streamflow at one time is
correlated with the value of streamflow at an earlier time (i.e. serial correlation, or
autocorrelation, exists in the time series). In a first-order Markov process, this correlation
exists between two successive values of the series (Gupta, 1989).

The first-order Markov model states that the value of a variable x in one time
period depends on the value of x in the preceding time period plus a random
component. Thus, synthetic streamflow is a sequence of numbers, each of which
consists of two parts: a deterministic part and a random part (Gupta, 1989).

The Autoregressive Integrated Moving Average (ARIMA) model, often called the
Box-Jenkins time series method, has good accuracy for short-term forecasting but lower
accuracy for long-term forecasting. Over a sufficiently long horizon, its forecasts tend
to become flat. The ARIMA model ignores independent variables completely and uses past
and present values of the dependent variable to produce accurate short-term forecasts
(Hendranata, 2003).

ARIMA is suitable when the observations of a time series are statistically dependent
on one another. The purpose of the model is to establish a good statistical relationship
between the variable being predicted and its historical values, so that forecasts can be
made with the model (Hendranata, 2003).

1.2 Problem Statement

Many time series forecasting methods can be used to predict streamflow. However,
not all of these methods produce accurate forecasts, and inaccurate forecasting will cause
losses to water resources managers and users. The suitability of a forecasting method
depends on the type and amount of available data. The ARIMA and Markov models must
therefore be examined to determine their ability to provide accurate and reasonable
monthly streamflow forecasts. The accuracy of both models for forecasting monthly
streamflow will be tested and evaluated using statistical methods. The ARIMA and Markov
modeling approaches were applied to the data set to further investigate the behavioral
change in the streamflow. The results of the study can serve as a reference guideline for
flood control, since Markov and ARIMA models are best suited for short-term forecasting.

1.3 Justification of the Study

Monthly streamflow forecasting is an integral part of drought, irrigation and
reservoir operation management. Stochastic data generation aims to provide alternative
hydrologic data sequences that are likely to occur in the future. These sequences are used
to assess the reliability of alternative system designs and policies and to understand the
variability in future system performance. It is therefore very important to develop a
stochastic hydrologic model to generate monthly streamflows and thus estimate future
streamflows. It is hoped that such a model can help reduce problems of water shortage.
Forecasting can also be used to give warning of extreme events such as drought
(Joomizan, 2010).

1.4 Aim and Objectives

The aim of this study is to forecast streamflow using an appropriate time series
modeling approach. To achieve this aim, the following objectives have been identified:

1. To propose streamflow forecasting methods using the Markov and ARIMA models.

2. To assess the forecasting accuracy of the Markov and ARIMA models.

1.5 Scope of Study

In this study, two time series models, the Markov model and the ARIMA model,
are used to predict the behavior of streamflow. Streamflow data of Sungai Bernam,
Selangor, for the period 1960 to 2010 were used to apply the models. The study area,
located in southeast Perak and northeast Selangor, is semi-developed and covers 186 km².

Streamflow data were obtained from the Sg. Bernam at Tanjung Malim station
(Station No. 3615412). The monthly streamflow data were collected from the
Department of Irrigation and Drainage, Kuala Lumpur. Minitab 15 was used for the
ARIMA model and Microsoft Excel for the Markov model.

CHAPTER 2

LITERATURE REVIEW

2.1 Introduction

Generally, surface water hydrology is the basis for engineering design and for the
development of water sources. High streamflow may cause disasters such as floods and
erosion, and short-term forecasting is needed to control them. Meanwhile, low streamflow
can disrupt water supply for domestic and industrial users, hydroelectric power generation
and irrigation. Here, long-term forecasting is useful to prevent the problem. Therefore,
the ability to forecast streamflow accurately is valuable for water flow management and
flood control.

Modeling and forecasting of time series have long been practiced using different
statistical methods. Commonly used time series forecasting models include ARIMA,
moving averages, exponential smoothing, regression analysis and Fourier series analysis.
In this study, Markov and ARIMA models are used to predict monthly streamflow.

2.2 Time Series Model

A time series is a time-oriented or chronological sequence of observations on a
variable of interest (Montgomery et al., 2008). Time series models have become popular
since the publication of the book by Box and Jenkins (1970) and the subsequent
development of computer software for applying these models (Bell, 1984). The time can
be a discrete value, a time interval or a continuous function. Hydrologic data such as
streamflows, precipitation, groundwater or lake levels, water temperatures, or oxygen
concentrations fall under the category of time series. These data can be deterministic,
random, or a combination of the two (Gupta, 1989).

Many conventional statistical methods traditionally deal with models in which the
observations are assumed to be independent. However, a great deal of data in business,
economics, engineering and the natural sciences occur in the form of time series, where
observations are dependent. The systematic approach for answering the mathematical and
statistical questions posed by such series of dependent observations is called time series
analysis. The objective of time series analysis is generally to understand and identify the
stochastic process that produced the observed series and then to forecast future values of
the series from past values alone (Akgun, 2003).

In the time domain, a time series is analyzed through a parameter known as the
serial correlation coefficient, or autocorrelation coefficient. This parameter indicates the
dependence between successive values of a time series. The coefficient is determined for
successive values (elements) and also for elements separated by various time intervals,
known as the lag. A graph of the autocorrelation coefficient against the lag is known as
the correlogram. If a correlogram shows zero or nearly zero values for all lags, the process
is purely random, whereas values close to 1 suggest a dominant deterministic process
(Gupta, 1989).
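As an illustrative sketch (not the Minitab routine used in this study), the lag-k sample autocorrelation can be computed with the common estimator $r_k = \sum_{t=1}^{n-k}(x_t-\bar{x})(x_{t+k}-\bar{x}) / \sum_{t=1}^{n}(x_t-\bar{x})^2$. Plotting $r_k$ against k gives the correlogram. The function name and the flow values are hypothetical.

```python
def sample_acf(x, max_lag):
    """Return sample autocorrelations r_0, r_1, ..., r_max_lag (r_0 is always 1)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)  # sum of squared deviations (lag 0)
    acf = []
    for k in range(max_lag + 1):
        # products of deviations for elements that are k time steps apart
        num = sum(dev[t] * dev[t + k] for t in range(n - k))
        acf.append(num / denom)
    return acf


# Hypothetical monthly flows in which high flows tend to follow high flows
flows = [12.0, 14.0, 15.0, 13.0, 9.0, 7.0, 8.0, 11.0, 13.0, 14.0, 12.0, 10.0]
correlogram = sample_acf(flows, 3)
print([round(r, 2) for r in correlogram])
```

For these made-up flows, $r_1$ is clearly positive (about 0.62), which reflects persistence. A purely random series would give values near zero at every lag.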

In the frequency domain, a time series is analyzed through the spectral density,
which identifies the cyclic nature or periodicity of the series. The spectral density reveals
the cycles in deterministic data, whereas in a purely random process it oscillates randomly.
The purpose of streamflow synthesis, however, is not to analyze a time series but to
generate data based on it. This does not require decomposing the time series as described
above. It does require an understanding of the series' statistical properties, so that series
with similar statistical characteristics can be reproduced (Gupta, 1989).

2.3 Forecasting Time Series

Most forecasting problems involve the use of time series data. Montgomery et al.
(2008) stated that forecasting problems are often classified as short-term, medium-term
and long-term. Short-term forecasting problems involve predicting events only a few
time periods (days, weeks, months) into the future. Medium-term forecasts extend from
one to two years into the future, and long-term forecasting problems can extend beyond
that by many years. Short-term and medium-term forecasts are used for operations
management and project development, while long-term forecasts can be used for
strategic planning.

In this study, we try to use Markov and ARIMA models for long-term forecasting,
even though these models are known to be best suited for short-term forecasting.
Normally, short-term and medium-term forecasts are based on identifying, modeling and
extrapolating the patterns found in historical data. Because such data usually exhibit
inertia and do not change very drastically, statistical methods are very useful for
short-term and medium-term forecasting (Montgomery et al., 2008).

Using the observations available at time t to forecast the value of a time series at
some future time can provide a basis for (1) economic and business planning,
(2) production planning, (3) inventory and production control, and (4) control and
optimization of industrial processes (Box et al., 1994). As originally described by Brown
(1962), forecasts are usually needed over a period known as the lead time, which varies
with each problem. Usually, forecasts are made at time t by taking the current value $Y_t$
and the previous values $Y_1, Y_2, \ldots, Y_{t-1}$ to forecast the future values
$F_{t+1}, F_{t+2}, \ldots, F_{t+m}$.

In addition to calculating the best forecasts, it is necessary to specify their accuracy.
The accuracy of the forecasts may be expressed by calculating a convenient set of
probability limits on either side of each forecast, such as 50% and 95% limits. This means
that, when the future value of the time series eventually occurs, it will lie within these
limits with the stated probability. To illustrate, Figure 2.1 shows the values of a time series
with forecasts made from origin t for lead times l, together with the 50% probability limits.

Figure 2.1: Value of time series with forecast function at 50% probability limits
(Source: Box et al., 1994)
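The sketch below makes probability limits concrete. It assumes a zero-mean AR(1) process with parameter φ and normally distributed shocks with standard deviation $\sigma_a$. This model is chosen only because its lead-l forecast $\phi^l Y_t$ and its forecast-error variance $\sigma_a^2(1 + \phi^2 + \cdots + \phi^{2(l-1)})$ have simple forms. The 50% limits are the forecast ± 0.674 standard errors, and the 95% limits are the forecast ± 1.96 standard errors. The function name and numbers are hypothetical.

```python
from statistics import NormalDist


def ar1_forecast_limits(y_t, phi, sigma_a, lead, prob=0.50):
    """Lead-l forecast and symmetric probability limits for a zero-mean AR(1) model."""
    forecast = phi ** lead * y_t
    # Forecast-error variance: sigma_a^2 * (1 + phi^2 + ... + phi^(2(l-1)))
    variance = sigma_a ** 2 * sum(phi ** (2 * j) for j in range(lead))
    z = NormalDist().inv_cdf(0.5 + prob / 2)  # about 0.674 for 50%, 1.960 for 95%
    half_width = z * variance ** 0.5
    return forecast, forecast - half_width, forecast + half_width


# Hypothetical values: last observation 10, phi = 0.5, shock standard deviation 1
for lead in (1, 2, 3):
    print(lead, ar1_forecast_limits(10.0, 0.5, 1.0, lead))
```

As in Figure 2.1, the limits widen as the lead time l increases, because forecast errors accumulate.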

2.4 Streamflow Forecasting Method

Being a natural phenomenon, streamflow has a random component. However, it is
not fully random, because a low flow has been observed to tend to follow a low flow and
a high flow to follow a high flow. In statistics, the word stochastic denotes randomness,
but in hydrology it also refers to a partially random sequence. Streamflow data, which
form a time series, are therefore the outcome of a stochastic process. Various stochastic
processes are used for generating hydrologic data (Gupta, 1989).

Stochastic modeling of hydrologic time series has been widely used for planning
and managing water resources systems, for example in reservoir sizing and in forecasting
the occurrence of future hydrologic events. Stochastic models are used to generate
synthetic series of water supply that may occur in the future. These series are then used
to estimate the probability distribution of key decision parameters such as reservoir
storage size. Furthermore, stochastic models can be used to forecast water supplies and
water demands days, weeks, months and years in advance (Fortin et al., 2004).

Previous rainfall and streamflow records can be used as model inputs for
forecasting streamflow one time step ahead (Mohd Shafiek et al., 2005). This study uses
previous streamflow records to forecast the streamflow discharge of the following month.

Several stochastic models can be used for the synthetic generation and
forecasting of hydrological processes. Hydrologic processes such as monthly streamflow
may be well represented by stationary linear models such as the Markov process or
autoregressive (AR) and autoregressive integrated moving average (ARIMA) models.
These models are usually capable of preserving historical annual statistics such as the
mean, variance, skewness and covariance (Fortin et al., 2004). In this study, Markov and
ARIMA models are used to predict future monthly streamflow.

2.4.1 Markov Model

The Markov process assumes that the value of an event (i.e. streamflow) at one
time is correlated with the value of the event at an earlier time (i.e. serial correlation, or
autocorrelation, exists in the time series). In a first-order Markov process, this correlation
exists between two successive values of the event. The first-order Markov model is the
classic approach in synthetic hydrology. It states that the value of a variable x in one time
period depends on the value of x in the preceding time period plus a random component.
Thus, the synthetic flow of a stream is a sequence of numbers, each of which consists of
two parts:

$$x_i = d_i(t) + e_i \qquad (2.1)$$

where $x_i$ is the flow at the ith time (the ith number of the time series), $d_i(t)$ is the
deterministic part at the ith time, and $e_i$ is the random part at the ith time. The values of
$e_i$ are tied to the historical data by ensuring that they belong to the same frequency
distribution and possess similar statistical properties (mean, standard deviation, skewness)
as the historical series (Gupta, 1989).
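The following is a minimal sketch of Eq. (2.1). It assumes the classic single-season lag-1 form, in which the deterministic part is $d_i = \bar{x} + r_1(x_{i-1} - \bar{x})$ and the random part is $e_i = t_i\, s \sqrt{1 - r_1^2}$. Here $t_i$ is a standard normal deviate, and $\bar{x}$, $s$ and $r_1$ are the historical mean, standard deviation and lag-1 serial correlation. This parameterization and the function name are assumptions for illustration, not necessarily the formulation used in this study, which may, for example, work on log-transformed flows.

```python
import math
import random


def generate_markov_flows(mean, std, r1, n, x0=None, seed=0):
    """Generate n flows x_i = d_i + e_i with a lag-1 (first-order) Markov model.

    mean, std and r1 stand for the historical mean, standard deviation and
    lag-1 serial correlation coefficient of the (possibly log-transformed) flows.
    """
    rng = random.Random(seed)
    x_prev = mean if x0 is None else x0
    flows = []
    for _ in range(n):
        # Deterministic part d_i: regression on the preceding flow
        d_i = mean + r1 * (x_prev - mean)
        # Random part e_i: standard normal deviate scaled to keep the historical variance
        e_i = rng.gauss(0.0, 1.0) * std * math.sqrt(1.0 - r1 ** 2)
        x_prev = d_i + e_i
        flows.append(x_prev)
    return flows


# Hypothetical statistics, not the Sungai Bernam values
synthetic = generate_markov_flows(mean=100.0, std=20.0, r1=0.6, n=12, seed=1)
```

A normally distributed random part can occasionally produce negative flows. Generating log-transformed flows and back-transforming them is one common way around this.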

The various forms and combinations of the deterministic and random components
are recognized as different models. The single-season (annual) flow model of lag 1 is the
simplest model. It assumes that the magnitude of the current flow is significantly
correlated with the previous flow value only. Multiple-season models, on the other hand,
divide the yearly flow into seasons or months (Gupta, 1989).

The first-order Markov model has been successfully applied to many problems,
for example modeling sequential data using Markov chains and solving control problems
posed in the Markov decision process (MDP) framework. If the Markov model's
parameters are estimated from data, the standard maximum likelihood estimates consider
only first-order (single-step) transitions. For many problems, however, the first-order
conditional independence assumptions are not satisfied. As a result, the higher-order
transition probabilities can be poorly approximated by the learned model (Joomizan, 2010).

The literature has generally considered the assumption of a first-order Markovian
process adequate for representing the inflow process of a reservoir for most purposes.
Developing models that incorporate other approaches results in extremely complex
transition probability matrices (Wurbs, 2005).

2.4.2 ARIMA Theory

ARIMA is an abbreviation of AutoRegressive Integrated Moving Average, a model
introduced by Box and Jenkins (Box et al., 1994). For this reason, some authors refer to
this modeling approach as the Box-Jenkins model. The Box-Jenkins model is a stationary
time series model. A time series generated from zero-mean, finite-variance, uncorrelated
random variables is called a white noise series, and many useful models can be
constructed from it.
ARIMA modeling is essentially an exploratory, data-oriented approach that has
the flexibility to fit an appropriate model adapted to the structure of the data itself. The
stochastic nature of the time series can be approximately modeled with the aid of the
autocorrelation function (ACF) and the partial autocorrelation function (PACF). These
functions reveal information such as trend, random variation, periodic components, cyclic
patterns and serial correlation. As a result, forecasts of future values of the series can
readily be obtained with some degree of accuracy (Ho and Xie, 1998).
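The PACF is commonly obtained from the ACF with the Durbin-Levinson recursion. The sketch below is one implementation of that recursion. It was chosen for illustration and is not necessarily what Minitab or any other package uses internally. Given the theoretical ACF $\rho_k = \phi^k$ of an AR(1) process, it returns a PACF that cuts off after lag 1. This is the pattern used to recognize AR(1) behavior.

```python
def pacf_from_acf(rho, max_lag):
    """Partial autocorrelations phi_kk for k = 1..max_lag via the Durbin-Levinson recursion.

    rho[k] is the autocorrelation at lag k, with rho[0] == 1.
    """
    pacf = []
    phi_prev = []  # phi_{k-1,1}, ..., phi_{k-1,k-1}
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j - 1] * rho[k - j] for j in range(1, k))
        den = 1.0 - sum(phi_prev[j - 1] * rho[j] for j in range(1, k))
        phi_kk = num / den
        # Update the intermediate AR coefficients phi_{k,j} for j < k
        phi_new = [phi_prev[j - 1] - phi_kk * phi_prev[k - j - 1] for j in range(1, k)]
        phi_new.append(phi_kk)
        phi_prev = phi_new
        pacf.append(phi_kk)
    return pacf


# Theoretical ACF of an AR(1) process with phi = 0.7: rho_k = 0.7 ** k
rho = [0.7 ** k for k in range(6)]
print(pacf_from_acf(rho, 5))  # first value 0.7, the rest (numerically) zero
```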

Although ARIMA modeling is sophisticated in theory, today's computer technology
makes the iterative model-building process, and hence accurate forecasting, easier. Many
user-friendly statistical software packages, such as SAS, Statgraphics, Statistica and
Minitab, support this process. An iterative three-stage process of model identification,
parameter estimation and diagnostic checking is required to determine the adequacy of
the proposed model (Ho and Xie, 1998).

2.4.3 ARIMA Algorithms

ARIMA contains three components, namely the autoregressive (AR), integrated (I)
and moving average (MA) parts. The AR part describes the relationship between present
and past observations. The MA part represents the autocorrelation structure of the errors.
The I part represents the level of differencing applied to the series to eliminate
non-stationarity (Hasmida, 2009). The model is usually denoted by (p,d,q)(P,D,Q). Here,
p denotes the order of the autoregressive component, d the order of differencing, q the
order of the moving average component, and (P,D,Q) the corresponding seasonal
components.

2.4.3.1 AR Model

The AR(p) model expresses the current value of a time series as a linear combination
of p previous values and a white noise term (random shock). Bell (1984) expressed the
current value of the time series under the AR(p) model as:

$$Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + a_t \qquad (2.2)$$

where $\phi_1, \ldots, \phi_p$ are the AR(p) parameters, $a_t$ is the random shock at time t,
normally distributed with zero mean and variance $\sigma_a^2$, and p is the order of AR(p).

By introducing the backshift operator B, defined by $BY_t = Y_{t-1}$, equation (2.2) can
be written as:

$$(1 - \phi_1 B - \cdots - \phi_p B^p) Y_t = a_t \qquad (2.3)$$

or

$$\phi(B) Y_t = a_t, \quad \text{where } \phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p.$$
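To show that Eqs. (2.2) and (2.3) describe the same model, the sketch below does two things. First, it simulates an AR(p) series from white-noise shocks with the recursion (2.2). Second, it applies the operator φ(B) of (2.3) to the result and recovers the original shocks. The AR(2) parameter values, the zero start-up values and the function names are hypothetical.

```python
import random


def simulate_ar(phi, shocks):
    """Eq. (2.2): Y_t = phi_1*Y_{t-1} + ... + phi_p*Y_{t-p} + a_t, with zero start-up values."""
    y = []
    for t, a in enumerate(shocks):
        ar_part = sum(phi[i] * y[t - 1 - i] for i in range(len(phi)) if t - 1 - i >= 0)
        y.append(ar_part + a)
    return y


def apply_ar_operator(phi, y):
    """Eq. (2.3): phi(B)Y_t = Y_t - phi_1*Y_{t-1} - ... - phi_p*Y_{t-p}, for t >= p."""
    p = len(phi)
    return [y[t] - sum(phi[i] * y[t - 1 - i] for i in range(p)) for t in range(p, len(y))]


rng = random.Random(42)
shocks = [rng.gauss(0.0, 1.0) for _ in range(200)]  # white noise a_t
phi = [0.5, -0.3]                                    # hypothetical AR(2) parameters
y = simulate_ar(phi, shocks)
recovered = apply_ar_operator(phi, y)                # reproduces a_t for t >= 2
```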

2.4.3.2 MA Model

The MA(q) model expresses the current value of a time series as a linear combination
of the current and q previous values of a white noise process. The (purely) moving average
(MA) model is (Bell, 1984):

$$Y_t = a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q} \qquad (2.4)$$

or

$$Y_t = (1 - \theta_1 B - \cdots - \theta_q B^q)\, a_t \qquad (2.5)$$

or

$$Y_t = \theta(B)\, a_t,$$

where q is the order of MA(q) and the coefficients $\theta_1, \ldots, \theta_q$ are the MA(q) model
parameters.
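Under the sign convention of (2.4), the theoretical ACF of an MA(q) process is $\rho_k = \sum_{j=0}^{q-k}\psi_j\psi_{j+k} / \sum_{j=0}^{q}\psi_j^2$, with $\psi_0 = 1$ and $\psi_j = -\theta_j$. For k > q, $\rho_k = 0$. The sketch below computes this ACF. The cut-off after lag q is what makes the ACF useful for choosing q. The parameter values are hypothetical.

```python
def ma_theoretical_acf(theta, max_lag):
    """Theoretical ACF of Y_t = a_t - theta_1 a_{t-1} - ... - theta_q a_{t-q} (Eq. 2.4)."""
    psi = [1.0] + [-c for c in theta]  # weights on a_t, a_{t-1}, ..., a_{t-q}
    q = len(theta)
    gamma0 = sum(w * w for w in psi)   # variance of Y_t divided by sigma_a^2
    acf = [1.0]
    for k in range(1, max_lag + 1):
        if k > q:
            acf.append(0.0)  # the ACF cuts off after lag q
        else:
            gamma_k = sum(psi[j] * psi[j + k] for j in range(q - k + 1))
            acf.append(gamma_k / gamma0)
    return acf


print(ma_theoretical_acf([0.5], 3))  # hypothetical MA(1): [1.0, -0.4, 0.0, 0.0]
```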

2.4.3.3 ARMA Model

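To increase flexibility when fitting actual time series, the autoregressive and
moving average operators are combined to give the ARMA (p,q) model (Bell, 1984):

$$Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q} \qquad (2.6)$$

which we write as:

$$(1 - \phi_1 B - \cdots - \phi_p B^p) Y_t = (1 - \theta_1 B - \cdots - \theta_q B^q)\, a_t \qquad (2.7)$$

or

$$\phi(B) Y_t = \theta(B)\, a_t.$$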

A series that is explained both by its own lagged values and by lagged noise terms
is described by an autoregressive moving average model of order (p,q). This systematic
class of stationary time series models is of great importance and usefulness, especially in
real-life situations. If the process is stationary, a suitable ARMA model can be used to
represent the data. If it is nonstationary, differencing is applied to make the series
stationary, and this leads to the ARIMA model (Akgun, 2003).

2.4.3.4 ARIMA Model

The ARMA model in (2.6) assumes that the series $Y_t$ is stationary. In practice, $Y_t$
may well be nonstationary but have a stationary first difference,

$$Y_t - Y_{t-1} = (1 - B) Y_t.$$

If $(1-B)Y_t$ is nonstationary, we may need to take the second difference,

$$Y_t - 2Y_{t-1} + Y_{t-2} = (1-B)[(1-B)Y_t] = (1-B)^2 Y_t.$$

In general, we may need to take the dth difference $(1-B)^d Y_t$, although d is rarely
larger than 2. Substituting $(1-B)^d Y_t$ for $Y_t$ in (2.7) yields the ARIMA (p,d,q) model
(Bell, 1984):

$$(1 - \phi_1 B - \cdots - \phi_p B^p)(1 - B)^d Y_t = (1 - \theta_1 B - \cdots - \theta_q B^q)\, a_t \qquad (2.8)$$

or

$$\phi(B)(1 - B)^d Y_t = \theta(B)\, a_t,$$

where d is the order of differencing.
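The differencing operators are easy to apply directly, as the sketch below shows. Applying $(1-B)$ twice to a series with a quadratic trend leaves a constant. The seasonal operator $(1-B^s)$ with s = 12 (the monthly case) removes a pattern that repeats every 12 months. Each difference shortens the series by `lag` values. The function name and the example series are hypothetical.

```python
def difference(y, lag=1, times=1):
    """Apply (1 - B^lag)^times to y: lag=1 gives (1-B)^d, lag=s gives (1-B^s)^D."""
    out = list(y)
    for _ in range(times):
        out = [out[t] - out[t - lag] for t in range(lag, len(out))]
    return out


quadratic = [float(t * t) for t in range(10)]   # series with a quadratic trend
print(difference(quadratic, lag=1, times=2))     # second difference: constant 2.0

seasonal = [5.0 + (t % 12) for t in range(36)]  # repeating 12-month pattern
print(difference(seasonal, lag=12))              # seasonal difference: all zeros
```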

When a time series exhibits potential seasonality indexed by s, a multiplicative
seasonal ARIMA(p,d,q)(P,D,Q)s model is advantageous. The seasonal time series is
transformed into a stationary time series with non-periodic trend components. A
multiplicative seasonal ARIMA model can be expressed as (Lee and Ko, 2011):

$$(1 - \phi_1 B - \cdots - \phi_p B^p)(1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps})(1 - B)^d (1 - B^s)^D Y_t = (1 - \theta_1 B - \cdots - \theta_q B^q)(1 - \Theta_1 B^s - \cdots - \Theta_Q B^{Qs})\, a_t \qquad (2.9)$$

or

$$\phi(B)\,\Phi(B^s)(1 - B)^d (1 - B^s)^D Y_t = \theta(B)\,\Theta(B^s)\, a_t,$$

where D is the order of seasonal differencing. $\Phi(B^s)$ and $\Theta(B^s)$ are the seasonal AR(P)
and MA(Q) operators, respectively, defined as:

$$\Phi(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps}$$
$$\Theta(B^s) = 1 - \Theta_1 B^s - \cdots - \Theta_Q B^{Qs}$$

where $\Phi_1, \ldots, \Phi_P$ are the seasonal AR parameters and $\Theta_1, \ldots, \Theta_Q$ are the seasonal
MA parameters.
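The word "multiplicative" in (2.9) means that the non-seasonal and seasonal operators are multiplied as polynomials in B. This creates cross terms at lags such as s + 1. The sketch below expands both sides for an ARIMA(1,1,1)(0,1,1)12 structure, the form selected in this study. The parameter values are hypothetical, not the estimates obtained here.

```python
def poly_mul(p, q):
    """Multiply two polynomials in B given as coefficient lists (index = power of B)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out


def seasonal_op(coeffs, s):
    """Build 1 - c_1 B^s - ... - c_P B^(Ps) as a coefficient list."""
    out = [0.0] * (len(coeffs) * s + 1)
    out[0] = 1.0
    for k, c in enumerate(coeffs, start=1):
        out[k * s] = -c
    return out


# Hypothetical parameter values for ARIMA(1,1,1)(0,1,1)12
phi1, theta1, Theta1, s = 0.3, 0.6, 0.8, 12
# AR side: phi(B)(1 - B)(1 - B^12); seasonal_op([1.0], s) is exactly (1 - B^s)
ar_side = poly_mul(poly_mul([1.0, -phi1], [1.0, -1.0]), seasonal_op([1.0], s))
# MA side: theta(B)Theta(B^12)
ma_side = poly_mul([1.0, -theta1], seasonal_op([Theta1], s))
print({k: round(c, 4) for k, c in enumerate(ma_side) if c != 0.0})
```

The MA side, for instance, gains a lag-13 term equal to $\theta_1\Theta_1$.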

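To illustrate forecasting with ARIMA models, we write (2.8) in difference-equation
form:

$$Y_{n+l} = \phi_1 Y_{n+l-1} + \cdots + \phi_{p+d} Y_{n+l-p-d} + a_{n+l} - \theta_1 a_{n+l-1} - \cdots - \theta_q a_{n+l-q} \qquad (2.10)$$

for t = n + l. Here, $\phi_1, \ldots, \phi_{p+d}$ denote the coefficients of the expanded operator
$\phi(B)(1-B)^d = 1 - \phi_1 B - \cdots - \phi_{p+d} B^{p+d}$. We assume we want to forecast $Y_{n+l}$
for $l = 1, 2, \ldots$ using the data $Y_n, Y_{n-1}, \ldots$. For simplicity, we assume for now that
the data set is long enough to be treated as extending into the infinite past.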
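The sketch below applies (2.10) recursively. Unknown future values $Y_{n+j}$ are replaced by their own forecasts, and future shocks $a_{n+j}$ by their expected value of zero. Past shocks are assumed to be available; in practice they would be the residuals of the fitted model. As an example, take a hypothetical AR(1) on first differences with φ = 0.5. Then $(1 - 0.5B)(1 - B) = 1 - 1.5B + 0.5B^2$, so the expanded coefficients are [1.5, −0.5]. The function name and values are hypothetical.

```python
def arima_forecast(phi_star, theta, y, a, steps):
    """Forecast Y_{n+1}, ..., Y_{n+steps} from the difference-equation form (2.10).

    phi_star: coefficients of the expanded operator phi(B)(1-B)^d.
    theta: MA coefficients. y: observations up to Y_n. a: past shocks aligned with y.
    """
    hist_y = list(y)
    hist_a = list(a)
    forecasts = []
    for _ in range(steps):
        ar_part = sum(c * hist_y[-1 - j] for j, c in enumerate(phi_star))
        ma_part = sum(c * hist_a[-1 - j] for j, c in enumerate(theta))
        y_hat = ar_part - ma_part  # the unknown a_{n+l} is replaced by its mean, zero
        forecasts.append(y_hat)
        hist_y.append(y_hat)       # later steps use forecasts in place of data
        hist_a.append(0.0)         # future shocks have expectation zero
    return forecasts


print(arima_forecast([1.5, -0.5], [], [2.0, 4.0], [0.0, 0.0], 3))  # [5.0, 5.5, 5.75]
```

The forecasts level off toward 6. This illustrates why ARIMA forecasts tend to become flat over long lead times.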

2.5 Reviews on Markov Model

Naadimuthu and Lee (1982) proposed a first-order, or lag-one, serially correlated
inflow model. In this model, the inflow of each month depends only on the inflow of the
previous month, forming a Markov chain. The Markov chain method is a stochastic method
that can be used to produce new time series of inflow discharge based on available time
series data (Adib and Majd, 2009).

According to Heiko (2000), Markov chains are stochastic processes that can be parameterized by empirically estimating the transition probabilities between discrete states in the observed system. In a first-order Markov chain, each next state depends only on the immediately preceding one. In Markov chains of second or higher order, the next state depends on two or more preceding states.
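Estimating first-order transition probabilities can be sketched as follows. The procedure has three steps. Each flow is assigned to a discrete state, transitions between consecutive states are counted, and each row of counts is normalized to probabilities. The flow-class thresholds and function names are hypothetical choices for illustration.

```python
def discretize(flows, thresholds):
    """State k = number of (hypothetical) class thresholds the flow exceeds."""
    return [sum(q > th for th in thresholds) for q in flows]


def transition_matrix(states, n_states):
    """First-order transition probabilities P[i][j] = Pr(next = j | current = i)."""
    counts = [[0] * n_states for _ in range(n_states)]
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    matrix = []
    for row in counts:
        total = sum(row)
        matrix.append([c / total if total else 0.0 for c in row])
    return matrix


# Three flow classes: <= 10, (10, 18], > 18 (illustrative flows in m3/s)
states = discretize([5, 12, 20, 8, 15, 25], thresholds=[10, 18])
P = transition_matrix(states, n_states=3)
```

A second-order chain would count transitions from pairs of consecutive states instead of single states.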

Dalphin (1987) developed a lag-1 month-to-month Markov streamflow model in which families of three-parameter Weibull distributions describe monthly streamflow probabilistically, conditioned on the streamflow in the preceding month.

2.6 Reviews on ARIMA Model

Tang et al. (1991) stated that the ARIMA model is only good for short-term forecasting, since it builds its forecasts on previous observations. ARIMA models need long-memory series, that is, more inputs, to provide accurate forecasts; for long-memory series, more training patterns result in more accurate forecasts. The Box-Jenkins model works poorly, or not at all, for short input series.

Ho and Xie (1998) showed that the ARIMA model is a viable alternative that gives satisfactory results for repairable-system reliability forecasting. Ayob and Amat (2004) used ARIMA to represent water use behavior at Universiti Teknologi Malaysia. The ARIMA modeling method can also be applied to analyze water quality and rainfall-runoff data recorded over a long period for the Johor River (Hasmida, 2009).

Maia et al. (2008) demonstrated that ARIMA performed satisfactorily in forecasting interval-valued time series with either linear or non-linear behavior, and is a useful forecasting alternative for such series. However, a hybrid model combining ARIMA and an artificial neural network had better average performance.

A multiplicative seasonal autoregressive integrated moving average model was applied to monthly streamflow forecasting of the Zayandehrud River in western Isfahan province, Iran (Modarres, 2007). Nazuha (2010) used ARIMA to analyze monthly Malaysian crude oil production. Yurekli et al. (2004) used ARIMA to simulate monthly maximum data of the Cekerek Stream.

2.7 Concluding Remarks

Various techniques can be used for the synthetic generation and forecasting of hydrological processes. Stochastic models can provide alternative hydrologic data sequences that are likely to occur in the future. These sequences can be used to assess the reliability of alternative system designs and policies, and to understand the variability in future system performance.

Streamflow forecasting is an integral part of land management and water resources management. Hydrologic processes such as monthly streamflow may be well represented by linear models such as the Markov process, autoregressive (AR) models and autoregressive integrated moving average (ARIMA) models.

CHAPTER 3

METHODOLOGY

3.1 Introduction

Various stochastic processes are used to generate hydrologic streamflow data. The models developed or used in this study differ in their purposes, capabilities, interfaces, inputs and outputs. They mainly include the water balance model, reservoir simulation and stochastic models.

Brief descriptions of the model development and the considerations associated with each model are presented in the following sections. The computations used the available historical data obtained from the Department of Irrigation and Drainage, and the relevant data are used to derive the forecasting models. Markov and ARIMA modeling methods are proposed for streamflow forecasting of Sungai Bernam. The methods used to evaluate the forecasting accuracy of these models are also discussed.

3.2 Markov Model

Gupta (1989) stated that the general Markov procedure of data synthesis comprises:

1. Determination of statistical parameters from the analysis of the historical record
2. Identification of the frequency distribution of the historical data
3. Generation of random numbers with the same distribution and statistical characteristics
4. Construction of the deterministic part, considering persistence (the influence of previous flows), and its combination with the random part.

3.2.1 Statistical Parameters of Historical Data

Four parameters are important in a synthetic study: the mean flow, standard deviation, coefficient of skewness and correlation coefficient. The sample mean flow is (Gupta, 1989):

$$\bar{q} = \frac{1}{n}\sum_{i=1}^{n} q_i \qquad (3.1)$$

where,
$\bar{q}$ = mean observed (historical) flow
n = total number of flow values
$q_i$ = ith observed flow

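The sample estimate of the standard deviation, S (the square root of the variance), measures the variability of the data. It is given by (Gupta, 1989):

$$S = \left[\frac{1}{n-1}\sum_{i=1}^{n}(q_i - \bar{q})^2\right]^{1/2} \qquad (3.2)$$

The sample coefficient of skewness, g, measures the lack of symmetry. It is given by (Gupta, 1989):

$$g = \frac{n\sum_{i=1}^{n}(q_i - \bar{q})^3}{(n-1)(n-2)S^3} \qquad (3.3)$$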

The serial correlation coefficient measures the extent to which the flow at any time is affected by the flow at another time. The lag-K coefficient, in which the effect extends over K time units, is given by (Gupta, 1989):

(3.4)

The lag-one serial coefficient, in which the current flow is affected only by the previous flow, is obtained by substituting K = 1. Additional lags should be included only as long as they produce a model that explains more of the pattern of flows than a model with fewer lags (Fiering and Jackson, 1971).
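The mean, standard deviation and skewness of (3.1) to (3.3), together with lag-K serial correlations, can be computed as sketched below. This is a sketch under one assumption. Because the exact form of (3.4) is not reproduced here, the code uses the common estimator $r_K = \sum_{i=1}^{n-K}(q_i-\bar q)(q_{i+K}-\bar q) / \sum_{i=1}^{n}(q_i-\bar q)^2$, which may differ slightly from Gupta's (1989) expression. The function name is hypothetical.

```python
import math


def flow_statistics(q, max_lag=1):
    """Return mean (3.1), S (3.2), skewness g (3.3) and {K: r_K} for K = 1..max_lag."""
    n = len(q)
    mean = sum(q) / n
    dev = [x - mean for x in q]
    s = math.sqrt(sum(e * e for e in dev) / (n - 1))
    g = n * sum(e ** 3 for e in dev) / ((n - 1) * (n - 2) * s ** 3)
    denom = sum(e * e for e in dev)
    # Assumed lag-K estimator (see lead-in); r[1] is the lag-one coefficient
    r = {k: sum(dev[i] * dev[i + k] for i in range(n - k)) / denom
         for k in range(1, max_lag + 1)}
    return mean, s, g, r


mean_q, s_q, g_q, r_q = flow_statistics([1.0, 2.0, 3.0, 4.0, 5.0], max_lag=2)
```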

3.2.2 Identification of Distribution

Generally, the distributions used in streamflow generation belong to the normal, log-normal and gamma families. The bell-shaped, or normal, distribution is the most extensively used in statistical applications. This is because, according to the central limit theorem, the sum of many independent variables tends to be approximately normally distributed. To test normality, the historical flow values are plotted on normal probability paper against the percentage of values in the record that are equal to or greater than the plotted value. The flows are arranged in descending order. For each value $x_i$, the percentage is computed as $100(n - i + 1)/n$, where i is the rank of $x_i$ and n is the number of historical values. If the plot is a straight line, the distribution is normal. The coefficient of skewness should also be close to zero, since the normal distribution has no skewness (Gupta, 1989).

The second distribution widely used in hydrology is the log-normal distribution. The log-normal distribution is positively skewed, which matches the characteristics of many hydrologic variables. It is suitable for low-flow studies because small changes in low values produce large changes in their logarithms. A straight-line plot of the logarithms of the flows indicates a log-normal distribution, and the skewness calculated from the logarithms should be close to zero (Gupta, 1989).
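A numerical version of these straight-line checks is sketched below. Instead of reading a plot, it computes the correlation between the sorted data and standard normal quantiles (a probability-plot correlation). It does this once for the raw flows and once for their logarithms, and a value closer to 1 indicates a straighter plot. The code uses the Weibull plotting position i/(n + 1) in place of the percentage formula above. This substitution is an assumption: the normal quantile is undefined at 0 and 100 percent, and i/(n + 1) never reaches either. The sample data are synthetic.

```python
import math
from statistics import NormalDist


def _pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def probability_plot_r(values, log=False):
    """Correlation of sorted (log-)values with standard normal quantiles."""
    data = sorted(math.log(v) for v in values) if log else sorted(values)
    n = len(data)
    z = [NormalDist().inv_cdf(i / (n + 1)) for i in range(1, n + 1)]
    return _pearson(z, data)


# Synthetic positively skewed sample (illustrative, not Sungai Bernam data)
std = NormalDist()
skewed = [math.exp(0.5 * std.inv_cdf((k - 0.5) / 50)) for k in range(1, 51)]
r_normal = probability_plot_r(skewed)
r_lognormal = probability_plot_r(skewed, log=True)
```

For a log-normal sample, the logarithms plot much closer to a straight line than the raw values do.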

The gamma distribution is used when the historical flows or their logarithms show appreciable skewness. However, this distribution cannot be used when multiple lags exist, that is, when a flow is affected by several previous flows. In practice, historical data often do not clearly fit any of these distributions. The choice is then made based on the purpose, economics and other considerations (Gupta, 1989).

3.2.3 Generation of Random Numbers

Gupta (1989) stated that random numbers can be generated either by a computer-based pseudorandom-number generator or from random-number tables. For the generated flows to have similar characteristics to the historical record, the random numbers should belong to the same distribution as that record. Normal random numbers have zero mean and unit standard deviation, while log-normal random numbers have both mean and standard deviation equal to one.

3.2.4 Formulation of the Markov Model

The Markov model for annual flow is formulated as (Gupta, 1989):

$$q_i = \bar{q} + r_1(q_{i-1} - \bar{q}) + t_i S\sqrt{1 - r_1^2} \qquad (3.5)$$

where $q_i$ is the streamflow at the ith time; $\bar{q}$ is the mean of the recorded flows; $r_1$ is the lag-1 serial (auto)correlation coefficient; S is the standard deviation of the recorded flows; $t_i$ is a random variate from an appropriate distribution with zero mean and unit variance; and i is the position in the series, from 1 to N years.
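Equation (3.5) translates directly into a generation loop. The sketch below assumes standard normal variates for $t_i$, drawn with Python's `random.gauss`. It starts the series at the mean unless a starting flow is given. The function name and the parameter values in the example are hypothetical. For log-normal flows, the same loop would be applied to the logarithms and the results converted back.

```python
import math
import random


def generate_annual_flows(mean, s, r1, n_years, q0=None, rng=None):
    """Lag-one Markov generation, Equation (3.5), with standard normal t_i."""
    rng = rng or random.Random()
    q_prev = mean if q0 is None else q0
    flows = []
    for _ in range(n_years):
        t = rng.gauss(0.0, 1.0)
        q = mean + r1 * (q_prev - mean) + t * s * math.sqrt(1.0 - r1 ** 2)
        flows.append(q)
        q_prev = q
    return flows


# With S = 0 only the persistence term remains: the deviation from the mean halves each year
decay = generate_annual_flows(mean=100.0, s=0.0, r1=0.5, n_years=3, q0=120.0)
synthetic = generate_annual_flows(mean=100.0, s=15.0, r1=0.5, n_years=10, rng=random.Random(1))
```

The factor $\sqrt{1 - r_1^2}$ scales the random part so that, in the long run, the generated flows keep the historical standard deviation S.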

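A model along the same lines for monthly flows, developed by Thomas and Fiering, has the following form (Maass et al., 1962):

$$q_{i,j} = \bar{q}_j + b_j(q_{i-1,j-1} - \bar{q}_{j-1}) + t_{i,j} S_j \sqrt{1 - r_j^2} \qquad (3.6)$$

where,
i = month in the series, counted from the beginning
j = month of the year, j = 1, 2, ..., 12 for January to December
$q_{i,j}$ = flow in the ith month from the beginning, which is the jth month of the year
$q_{i-1,j-1}$ = flow in the immediately preceding month
$\bar{q}_j$ = mean of the flows of the jth month (12 values)
$b_j$ = regression coefficient of flows of the jth month on flows of the (j-1)th month = $r_j S_j / S_{j-1}$ (12 values)
$r_j$ = correlation coefficient between flows of the jth and (j-1)th months (12 values)
$S_j$ = standard deviation of the flows of the jth month (12 values)
$t_{i,j}$ = random normal deviate with zero mean and unit standard deviation

3.3 ARIMA Model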

ARIMA models have become common practice for specifying stationary time-dependent input processes since the work of Box and Jenkins (1970). ARIMA models are usually used as discrete-time processes (Leemis, 1998), and hence the data from a trace are interpreted as a count process for ARIMA fitting. Several assumptions are made in ARIMA modeling, and specific procedures must be followed when fitting ARIMA models to a time series.

3.3.1 Model Assumptions

Before performing the ARIMA modelling, the following assumptions were made (Hasmida, 2009):

1. The data are stationary
2. The data have a normal distribution
3. No outliers exist in the data
4. There are no missing data

3.3.1.1 Data Stationarity

Classical Box-Jenkins models describe stationary time series. Thus, to tentatively identify a Box-Jenkins model, we must first determine whether the time series we wish to forecast is stationary. The stationarity of the monthly streamflow data was examined graphically by plotting the original data against time, in months. A time series is stationary if its statistical properties (for example, the mean and the variance) are essentially constant through time (Bowerman and O'Connell, 1993). In other words, stationary models assume that the process remains in equilibrium about a constant mean level. In the plot, this appears as data fluctuating around a constant mean (Box et al., 1994). Another graphical method applied in the present study is examination of the ACF and PACF plots of the original data. For stationary data, the sample ACF and PACF die out quickly rather than remaining significant over many lags.

A nonstationary series may need to be transformed, which can be done by differencing (Box et al., 1994; Shumway, 1988). This process is included in the ARIMA modelling approach as the I (Integrated) component, denoted d in ARIMA notation. The level of differencing depends strongly on the degree of stationarity of the data and may be 0, 1, 2 or higher. Level 0 means that no differencing is applied to the data, level 1 means that first differencing is needed, and level 2 means that second differencing is needed. Higher levels of differencing might be applied to nonstationary and complex data (Hasmida, 2009).

3.3.1.2 Normal Distribution

Normally distributed data follow a bell-shaped curve. The bell-shaped curve is concentrated in the center and decreases on either side. This means that the data have less tendency to produce unusually extreme values than some other distributions. The bell-shaped curve is also symmetric, which means that deviations from the mean are equally likely in either direction (Hasmida, 2009).

Data that are not normally distributed must be transformed. Transformation methods that can be applied include the log transformation and the Box-Cox transformation. The Box-Cox method is applied if the log transformation cannot transform the data into a normal distribution (Hasmida, 2009).
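The Box-Cox transformation with parameter λ maps a positive value y to $(y^\lambda - 1)/\lambda$, and to $\ln y$ when λ = 0. The log transformation is therefore the special case λ = 0. The minimal sketch below assumes λ has already been chosen. Statistical packages usually select it by maximum likelihood, which is not shown here. The function name is illustrative.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive data with a given lambda."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in values]          # log transformation
    return [(v ** lam - 1.0) / lam for v in values]


logged = box_cox([1.0, math.e], 0)
square_root_like = box_cox([4.0], 0.5)
```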

3.3.1.3 Outlier

An outlier is an observation that lies outside the overall pattern of a distribution (Moore and McCabe, 1999). The presence of an outlier usually indicates some sort of problem: a case that does not fit the model under study, or an error in measurement. Outliers are often easy to spot in histograms as isolated values far from the rest of the data. Such data points should be removed because they can also be a sign of nonstationary data (Hasmida, 2009).
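One common way to make "outside the overall pattern" concrete is the 1.5 × IQR rule. Under this rule, values more than 1.5 interquartile ranges below the first quartile, or above the third quartile, are flagged as suspected outliers. The sketch below applies the rule with Python's `statistics.quantiles`. It is an illustration only, not necessarily the screening used in this study.

```python
from statistics import quantiles


def iqr_outliers(values):
    """Return values outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]


flagged = iqr_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 100])
```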

3.3.1.4 Missing Data

Yaffee and McGee (2000) suggested that, when values are missing from a data series, they should be replaced using a theoretically defensible algorithm. A crude replacement method is to plug in the mean of the overall series. A less crude algorithm is to use the mean of the period within the series in which the observation is missing. Another algorithm is to take the mean of the adjacent observations. In exponential smoothing, a missing value is often replaced by the one-step-ahead forecast from the previous observation. Other forms of interpolation employ linear splines, cubic splines, or step-function estimation of the missing data.

To handle missing data in this study, a linear regression between the flow at the study station and the flow at an adjacent station is used. If data still cannot be obtained, a regression between streamflow and rainfall for that station is used to estimate the missing data.
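The regression fill can be sketched with ordinary least squares. In this sketch, the line is fitted on months where both stations have data. A gap at the study station, marked `None`, is then filled from the neighboring station's value for the same month. The coefficient of determination R² is returned as a fit check. The helper names are hypothetical.

```python
def fit_line(x, y):
    """Least squares line y = slope * x + intercept, plus R^2."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


def fill_missing(target, neighbor, slope, intercept):
    """Replace None entries in `target` with slope * neighbor + intercept."""
    return [slope * nb + intercept if t is None else t
            for t, nb in zip(target, neighbor)]


slope, intercept, r2 = fit_line([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
filled = fill_missing([None, 5.0], [4.0, 2.0], slope, intercept)
```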

3.3.2 Model Procedure

The ARIMA modeling procedure for fitting ARIMA models to time series,
which was developed by Box and Jenkins (1976), consists of three iterative steps: model
identification; parameter estimation; and diagnostic checking. Figure 3.1 depicts the
process of ARIMA modeling. The procedure is itemized as follows:

[Figure: Original streamflow → Model identification → Parameters estimation → Diagnostic checking → Is the model adequate? No → repeat; Yes → Streamflow]
Figure 3.1: Flowchart of ARIMA modeling (Lee and Ko, 2011)

3.3.2.1 Model Identification

First, determine whether the time series is stationary or nonstationary by examining a time series plot or the ACF. If large autocorrelations in the ACF do not die out, differencing may be required to give a constant mean. A seasonal pattern that repeats every kth time interval suggests taking the kth difference to remove a portion of the pattern. Most series should not require more than two difference operations (orders). Be careful not to overdifference: if spikes in the ACF die out rapidly, no further differencing is needed.

Next, examine the ACF and PACF of the stationary data to identify which autoregressive or moving average terms are suggested. The following general graphical guidelines (SPSS, 1993) were applied in the identification process:

i. Nonstationary series have an ACF that remains significant for half a dozen or more lags, rather than quickly declining to 0. Such a series must be differenced until it is stationary before it can be identified.

ii. Autoregressive processes have an exponentially declining ACF and spikes in the first one or more lags of the PACF. The number of spikes indicates the order of the autoregression.

iii. Moving average processes have spikes in the first one or more lags of the ACF and an exponentially declining PACF. The number of spikes indicates the order of the moving average.

iv. Mixed (ARMA) processes typically show exponential declines in both the ACF and the PACF.

At the identification stage, the sign of the ACF or PACF, and the speed with which an exponentially declining ACF or PACF approaches 0, depend on the signs and actual values of the AR and MA coefficients (SPSS, 1993).
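The sample ACF and PACF used in these guidelines can be computed as sketched below. The PACF is obtained from the sample ACF with the Durbin-Levinson recursion. This is a standard method, but it may differ in detail from the algorithm inside Minitab or SPSS. Lags whose values fall outside roughly ±1.96/√n are commonly treated as significant spikes. The function names are illustrative.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations r_1..r_max_lag."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(e * e for e in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]


def pacf(x, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    rho = [1.0] + acf(x, max_lag)
    result, phi_prev = [], []
    for k in range(1, max_lag + 1):
        num = rho[k] - sum(phi_prev[j] * rho[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * rho[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update phi_{k,1..k-1} from phi_{k-1,.} (right side uses the old list)
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result


def significance_bound(n):
    """Approximate 95% limit for a sample autocorrelation of a white-noise series."""
    return 1.96 / math.sqrt(n)


series = [1.0, -1.0] * 5          # strongly alternating toy series
r = acf(series, 2)
p = pacf(series, 2)
```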

3.3.2.2 Parameter Estimation

Once the tentative model is formulated, the related model parameters are estimated using the least squares scheme. The parameters are chosen so that the gradient of the sum of squared forecasting errors on the historical data, taken with respect to the parameters, is zero. The primary objective of parameter estimation is to minimize the forecasting error and determine both the model and its parameters (Lee and Ko, 2011). Each parameter of a tentative ARIMA model can be tested using t-values and p-values. The t-value is calculated by dividing the coefficient by its standard error.
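Consider the simplest case: an AR(1) model without a constant, fitted to a zero-mean series. For this case, the least squares estimate and its t-value have closed forms, sketched below. This conditional least squares illustration is an assumption chosen for clarity. Minitab's ARIMA routine uses its own iterative estimation for general models. The series is simulated, not streamflow data.

```python
import math
import random


def fit_ar1(y):
    """Conditional least squares for Y_t = phi * Y_{t-1} + a_t (no constant).

    Returns (phi_hat, standard_error, t_value).
    """
    x_prev, x_curr = y[:-1], y[1:]
    sxx = sum(v * v for v in x_prev)
    phi = sum(a * b for a, b in zip(x_prev, x_curr)) / sxx
    resid = [b - phi * a for a, b in zip(x_prev, x_curr)]
    sigma2 = sum(e * e for e in resid) / (len(resid) - 1)   # one parameter estimated
    se = math.sqrt(sigma2 / sxx)
    return phi, se, phi / se


# Simulated AR(1) series with phi = 0.6 and unit-variance shocks
rng = random.Random(42)
series = [0.0]
for _ in range(2000):
    series.append(0.6 * series[-1] + rng.gauss(0.0, 1.0))
phi_hat, se_hat, t_hat = fit_ar1(series)
```

A large |t| (here far above 2) indicates that the coefficient differs significantly from zero.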

3.3.2.3 Diagnostic Checking

Next, diagnostic tests are conducted to ensure that the essential modeling assumptions are satisfied for a given model. Once the parameters have been estimated, the accuracy of the tentative model is validated by examining the ACF and PACF of the residuals, which should resemble a white noise process. Furthermore, the Q-statistic test is applied to confirm the tentative model (O'Donovan, 1983). If the calculated value of Q exceeds the critical value of χ² obtained from chi-square tables, the tentative model is inadequate (Lee and Ko, 2011).

At this stage, the Ljung-Box test is also used to test whether the residuals are white noise. The null hypothesis is that the residuals are white noise, that is, not autocorrelated. More generally, the residual series should be independent, homoscedastic (having constant variance) and normally distributed. The null hypothesis is rejected if the p-value of the chi-square statistic is less than α = 5%. A p-value greater than 0.05 indicates that the residuals can be treated as white noise.
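The Ljung-Box statistic is $Q = n(n+2)\sum_{k=1}^{h} r_k^2/(n-k)$, where $r_k$ is the lag-k autocorrelation of the n residuals. The sketch below computes Q and compares it with a chi-square critical value supplied by the caller. For residuals of an ARIMA model, the degrees of freedom are usually taken as h minus the number of estimated ARMA parameters. The critical value 3.841 in the example is the tabulated 5% value for 1 degree of freedom. The residuals are a toy series.

```python
def ljung_box_q(residuals, h):
    """Ljung-Box Q statistic over lags 1..h."""
    n = len(residuals)
    m = sum(residuals) / n
    dev = [e - m for e in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, h + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


residuals = [1.0, -1.0] * 5            # strongly alternating, clearly not white noise
q_stat = ljung_box_q(residuals, h=1)
reject_white_noise = q_stat > 3.841    # chi-square critical value, 1 df, alpha = 0.05
```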

These steps are repeated until an adequate model is identified. When the steps in
ARIMA modeling are completed, a specific ARIMA model is applied to predict the
future monthly streamflow for 1 year ahead.

3.3.3 Minitab Procedures

To fit the ARIMA models, the statistical software Minitab (version 15) was used. In Minitab, the ARIMA modeling steps can be summarized as follows:

1. Identify the stationarity of the data.
   If stationary, go to step No. 3.
   If non-stationary, go to step No. 2.

2. Apply the non-seasonal difference (d = 1, k = 1).

3. Identify the seasonal pattern of the data using the ACF.
   If the ACF indicates a non-seasonal pattern, go to step No. 5.
   If the ACF indicates a seasonal pattern, go to step No. 6.

4. Identify the general theoretical PACF of the ARIMA model.

5. Apply the seasonal difference (D = 1, k = 12; D = 2, k = 24).

6. Identify the general theoretical ACF and PACF of the ARIMA model.
   If a seasonal pattern of the ACF and PACF is still found in step No. 6, go to step No. 5.
   If a non-seasonal pattern of the ACF is found, go to step No. 7.

7. Apply the remaining procedures (estimation, diagnostic checking and forecasting) according to step No. 6 until the best forecasting pattern is obtained.

3.4 Model Comparison and Forecast Evaluation Measures

To compare the forecasting accuracy of the different models, a multicriterion performance evaluation procedure was used in this study. The following indices were used to evaluate the performance of the models (Shalamu, 2009):

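1. Mean Absolute Percentage Error (MAPE):

$$\mathrm{MAPE} = \frac{100}{n}\sum_{i=1}^{n}\left|\frac{Y_i - F_i}{Y_i}\right| \qquad (3.7)$$

2. Root Mean Squared Error (RMSE):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(Y_i - F_i)^2} \qquad (3.8)$$

3. Chi-Squared Test:

(3.9)

where,
$Y_i$ = the observed flow
$F_i$ = the forecasted flow
n = the number of observed-forecast pairs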
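The three measures can be computed as sketched below. MAPE and RMSE follow the usual definitions. The exact form of Equation 3.9 is not reproduced here, so the chi-square function assumes the common Pearson-type form $\chi^2 = \sum (Y_i - F_i)^2 / F_i$, with the forecasts treated as expected values. This may differ from the form used by Shalamu (2009). The two-value example is illustrative.

```python
import math


def mape(observed, forecast):
    """Mean absolute percentage error, in percent."""
    return 100.0 / len(observed) * sum(abs((y - f) / y) for y, f in zip(observed, forecast))


def rmse(observed, forecast):
    """Root mean squared error, in the units of the flows."""
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(observed, forecast)) / len(observed))


def chi_square(observed, forecast):
    """ASSUMED form: sum of (Y_i - F_i)^2 / F_i."""
    return sum((y - f) ** 2 / f for y, f in zip(observed, forecast))


obs = [10.0, 20.0]
fc = [12.0, 18.0]
scores = (mape(obs, fc), rmse(obs, fc), chi_square(obs, fc))
```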

CHAPTER 4

RESULT AND DISCUSSION

4.1 Introduction

This chapter describes in detail the analysis of the time series data using both the Markov and ARIMA modeling methods for streamflow forecasting. Most of the computation for the ARIMA and Markov models was carried out using Minitab and Microsoft Excel, respectively. Both methods are used to model the streamflow of Sungai Bernam at Tanjung Malim, Selangor (Station No. 3615412), and the models are checked to obtain an adequate model for streamflow forecasting.

Data from January 1960 to December 2010 were used to derive the stochastic and forecasting models. The 552 months of data from January 1960 to December 2005 are used as the calibration set for both models. The remaining 60 months, from January 2006 to December 2010, are used as the validation set.

4.2 Estimation of Missing Data Values

Some data values are missing from the streamflow series for Sungai Bernam at Tanjung Malim (Station No. 3615412). To handle the missing data, a linear regression between the flow at the study station and the flow at an adjacent station is used. The regression line gives the best linear (least squares) prediction of y from x. For the missing streamflow of Sungai Bernam at Tanjung Malim, data from the adjacent station at Jam. Skc (Station No. 3813411) are used. For example, data are missing for January, February and March 1962. Observations from the neighboring months (before and after the gap) at both stations are used to fit the regression line for estimating the missing data, as shown in Figure 4.1.

Figure 4.1: Linear Regression of Two Streamflow Station for 1962

The missing January, February and March 1962 data for Station Tanjung Malim can be completed using the linear regression equation y = 0.126x + 2.513, with a coefficient of determination R² of 0.845. Here y and x represent the flows at Station Tanjung Malim (m3/s) and Jam. Skc (m3/s), respectively.


The data may still not be obtainable, for example because the adjacent streamflow station also has missing data for that month. In that case, rainfall data from a nearby station can be used to derive a regression equation for estimating the missing streamflow. For example, data are missing from February 1993 to May 1993 at both the Tg. Malim and Jam. Skc stations. Rainfall observations from the neighboring months (before and after the gap) at Station Ldg. Katoyang at Tg. Malim (Station No. 3714152) are used to derive a regression equation with the flow data of Station Jam. Skc, as shown in Figure 4.2. The linear regression equation was found to be y = 0.146x + 10.43, with a coefficient of determination R² of 0.603. Here y represents the flow at Station Jam. Skc (m3/s) and x represents the rainfall at Station Ldg. Katayong (mm).

Figure 4.2: Linear Regression of Rainfall and Streamflow

Once the streamflow data for February to May 1993 at Station Jam. Skc are known, they can be used to estimate the missing data at Station Tg. Malim. This uses the regression equation between the two streamflow records, y = 0.112x + 3.673, with a coefficient of determination R² of 0.892. Here y and x represent the flows at Station Tanjung Malim (m3/s) and Jam. Skc (m3/s), respectively. Figure 4.3 shows the regression line for this equation.
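The two-step estimate for 1993 chains the reported regressions. Rainfall at the rain gauge gives the flow at Jam. Skc, which in turn gives the flow at Tg. Malim. The rainfall value below is hypothetical and only illustrates the arithmetic. For 200 mm of rainfall, the estimated Tg. Malim flow is about 8.11 m3/s.

```python
def estimate_tg_malim_flow(rainfall_mm):
    """Chain the two regressions reported for 1993 (Figures 4.2 and 4.3)."""
    jam_skc_flow = 0.146 * rainfall_mm + 10.43     # rainfall (mm) -> Jam. Skc flow (m3/s)
    return 0.112 * jam_skc_flow + 3.673            # Jam. Skc flow -> Tg. Malim flow (m3/s)


flow = estimate_tg_malim_flow(200.0)   # hypothetical monthly rainfall of 200 mm
```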


Figure 4.3: Linear Regression of Two Streamflow Station for 1993

After replacing all the missing data with appropriate estimation data from the
linear regression method, streamflow data of Sungai Bernam is shown in Appendix A.

4.3 Markov Model

The Markov model is formulated following the data synthesis procedure: (1) determination of statistical parameters from the analysis of the historical record, (2) identification of the frequency distribution of the historical data, (3) generation of random numbers with the same distribution and statistical characteristics, and (4) construction of the deterministic part and its combination with the random part.

4.3.1 Statistical Parameters of Historical Data

The sample mean flow for the 612 months of data is 9.75 m3/s. The sample standard deviation, S, is 4.66 m3/s, the skewness is 1.2, the standard error is 0.18863 and the coefficient of variation is 0.47828. These statistical parameters can be calculated in Microsoft Excel or obtained from the EasyFit software. The descriptive statistics from EasyFit are shown in Figure 4.4.
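Two of these figures follow directly from the others. The coefficient of variation is S divided by the mean, and the standard error of the mean is S/√n. A quick arithmetic check with the reported (rounded) values gives about 0.478 and 0.188. The small differences from EasyFit's 0.47828 and 0.18863 come from the rounding of S and the mean.

```python
import math

mean_flow = 9.75      # m3/s, reported sample mean
s = 4.66              # m3/s, reported sample standard deviation
n = 612               # months of data

cv = s / mean_flow            # coefficient of variation
se = s / math.sqrt(n)         # standard error of the mean (m3/s)
```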

Figure 4.4: Descriptive statistics of Sungai Bernam data

For model calibration, the parameters of the monthly historical data from January 1960 to December 2005 (552 values) are shown in Table 4.1.

Table 4.1: Parameters of Monthly Historical Data

Month | $\bar{q}_j$ | $S_j^2$ | $S_j$ | $r_j$ | $S_{j-1}$ | $b_j$ | $\bar{q}_{j-1}$
Jan | 0.049549 | 9.07979E-05 | 0.009529 | 0.4442686 | 4.189053605 | 0.001 | 0.06
Feb | 0.04537 | 8.89268E-05 | 0.00943 | 0.4901265 | 3.639813919 | 0.001 | 0.05
Mar | 0.046522 | 9.69723E-05 | 0.009847 | 0.5777814 | 3.363576896 | 0.002 | 0.05
Apr | 0.05187 | 9.10128E-05 | 0.00954 | 0.408 | 3.69337796 | 0.001 | 0.05
May | 0.054888 | 5.21161E-05 | 0.007219 | 0.303 | 3.822355866 | 0.001 | 0.05
Jun | 0.0488 | 6.94571E-05 | 0.008334 | 0.515 | 2.990121105 | 0.001 | 0.05
Jul | 0.046073 | 7.22414E-05 | 0.008499 | 0.541 | 3.349038581 | 0.001 | 0.05
Aug | 0.047227 | 7.71759E-05 | 0.008785 | 0.585 | 3.27283605 | 0.002 | 0.05
Sep | 0.053852 | 7.21758E-05 | 0.008496 | 0.406 | 3.447681936 | 0.001 | 0.05
Oct | 0.059644 | 7.62886E-05 | 0.008734 | 0.369 | 3.761513315 | 0.001 | 0.05
Nov | 0.065038 | 6.89806E-05 | 0.008305 | 0.294 | 4.175448792 | 0.001 | 0.06
Dec | 0.059643 | 0.000101211 | 0.01006 | 0.3699155 | 4.738293291 | 0.001 | 0.07

4.3.2 Identification of Distribution

In this study, goodness-of-fit tests are used to select the probability distribution. The Kolmogorov-Smirnov (K-S) test, the Anderson-Darling (AD) test and the chi-squared test can be used for this purpose. The K-S test was preferred as it is more powerful and robust. The best-fitting distribution can be found using the EasyFit application. The K-S statistic is 0.13466 for the normal distribution (ranked 42nd) and 0.05954 for the log-normal distribution (ranked 2nd). The AD statistic is 139.43 for the normal distribution (ranked 41st) and 34.169 for the log-normal distribution (ranked 6th). The log-normal distribution fits far better than the normal distribution on both tests, so it is adopted for the streamflow data of Sungai Bernam (Figure 4.5 and Figure 4.6).
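The K-S comparison between normal and log-normal fits can be sketched with Python's standard library. The statistic D is the largest gap between the empirical CDF and the fitted CDF. This sketch fits each distribution from sample moments (of the flows, or of their logarithms), which is an assumption. EasyFit's own estimation may differ, so the sketch is not expected to reproduce 0.13466 or 0.05954 exactly. The flows below are synthetic.

```python
import math
from statistics import NormalDist, fmean, stdev


def ks_statistic(sample, cdf):
    """Kolmogorov-Smirnov distance D between the empirical CDF and a fitted CDF."""
    data = sorted(sample)
    n = len(data)
    d = 0.0
    for i, x in enumerate(data, start=1):
        f = cdf(x)
        d = max(d, i / n - f, f - (i - 1) / n)
    return d


def ks_normal_vs_lognormal(flows):
    """Return (D_normal, D_lognormal) with parameters fitted from sample moments."""
    normal = NormalDist(fmean(flows), stdev(flows))
    logs = [math.log(q) for q in flows]
    log_normal = NormalDist(fmean(logs), stdev(logs))
    d_normal = ks_statistic(flows, normal.cdf)
    d_lognormal = ks_statistic(flows, lambda x: log_normal.cdf(math.log(x)))
    return d_normal, d_lognormal


# Synthetic, strongly skewed flows (illustrative only, not Sungai Bernam data)
std = NormalDist()
flows = [math.exp(2.0 + 0.8 * std.inv_cdf((k - 0.5) / 200)) for k in range(1, 201)]
d_norm, d_lognorm = ks_normal_vs_lognormal(flows)
```

A smaller D means a closer fit, which is how the rankings above are ordered.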

[Figure: histogram of flow, q (m3/s), with fitted Inv. Gaussian (3P) density]
Figure 4.5: Probability Density Function

The log-normal distribution is positively skewed, which matches the characteristics of many hydrologic variables. This distribution is suitable for low-flow studies because small changes in low values produce large changes in their logarithms.

[Figure: sample cumulative distribution of flow, q (m3/s), with fitted Inv. Gaussian (3P) CDF]
Figure 4.6: Cumulative Distribution Function

Since the distribution is log-normal, the logarithms of the values are used, and the modeled flows are finally converted back. As an example, the logarithms of the observed streamflow data for 1960 to 1970 are shown in Table 4.2. The data for 1971-2005 can be found in Appendix B. These data form the calibration set for obtaining the parameters of the historical data used to model future streamflow.

Table 4.2: Logarithmic Values of Observed Streamflow Data for 1960-1970

Month | 1960 | 1961 | 1962 | 1963 | 1964 | 1965 | 1966 | 1967 | 1968 | 1969 | 1970
Jan | 0.056 | 0.052 | 0.059 | 0.050 | 0.056 | 0.046 | 0.058 | 0.065 | 0.040 | 0.054 | 0.054
Feb | 0.051 | 0.044 | 0.046 | 0.045 | 0.047 | 0.044 | 0.048 | 0.054 | 0.037 | 0.047 | 0.034
Mar | 0.058 | 0.045 | 0.056 | 0.045 | 0.050 | 0.050 | 0.052 | 0.047 | 0.031 | 0.040 | 0.036
Apr | 0.064 | 0.051 | 0.057 | 0.044 | 0.052 | 0.066 | 0.058 | 0.060 | 0.041 | 0.046 | 0.045
May | 0.055 | 0.055 | 0.055 | 0.044 | 0.056 | 0.068 | 0.045 | 0.059 | 0.059 | 0.060 | 0.052
Jun | 0.046 | 0.051 | 0.046 | 0.046 | 0.045 | 0.050 | 0.052 | 0.043 | 0.050 | 0.050 | 0.038
Jul | 0.057 | 0.046 | 0.045 | 0.045 | 0.057 | 0.039 | 0.053 | 0.043 | 0.043 | 0.037 | 0.043
Aug | 0.049 | 0.051 | 0.049 | 0.053 | 0.048 | 0.043 | 0.054 | 0.044 | 0.043 | 0.044 | 0.045
Sep | 0.058 | 0.058 | 0.056 | 0.060 | 0.060 | 0.053 | 0.057 | 0.055 | 0.055 | 0.042 | 0.054
Oct | 0.057 | 0.056 | 0.069 | 0.070 | 0.058 | 0.069 | 0.072 | 0.060 | 0.057 | 0.059 | 0.055
Nov | 0.063 | 0.060 | 0.075 | 0.079 | 0.065 | 0.067 | 0.077 | 0.076 | 0.058 | 0.056 | 0.058
Dec | 0.065 | 0.066 | 0.058 | 0.066 | 0.064 | 0.072 | 0.072 | 0.055 | 0.060 | 0.053 | 0.061
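4.3.3 Generation of Random Numbers

In this study, uniform random numbers u are generated with the Microsoft Excel command RAND( ). To obtain the random normal deviate t, with mean equal to 1 and unit standard deviation, the inverse error function $\operatorname{erf}^{-1}(z)$ is used:

$$t = \mu + \sigma\sqrt{2}\,\operatorname{erf}^{-1}(z) \qquad (4.1)$$

The value of z is obtained from the cumulative distribution function (CDF) of the log-normal distribution:

$$F(x) = \frac{1}{2} + \frac{1}{2}\operatorname{erf}\left(\frac{\ln x - \mu}{\sigma\sqrt{2}}\right) \qquad (4.2)$$

Figure 4.7: Cumulative distribution function of the log-normal distribution

Setting the CDF equal to the uniform random number, F(x) = u, gives:

$$z = 2u - 1 = \operatorname{erf}\left(\frac{\ln x - \mu}{\sigma\sqrt{2}}\right) \qquad (4.3)$$

Log-normal random numbers are taken to have both mean and standard deviation equal to one. Therefore, Equation 4.3 becomes:

$$z = 2u - 1 = \operatorname{erf}\left(\frac{\ln x - 1}{\sqrt{2}}\right) \qquad (4.4)$$

If erf(x) = y, then $\operatorname{erf}^{-1}(y) = x$. Hence,

$$\frac{\ln x - 1}{\sqrt{2}} = \operatorname{erf}^{-1}(2u - 1)$$

The value of t = ln x. Therefore,

$$t = 1 + \sqrt{2}\,\operatorname{erf}^{-1}(2u - 1) \qquad (4.5)$$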

As an example, the calculation procedure for random number generation for 2006 is shown in Table 4.3. The random numbers for the other years (2007-2010) can be found in Appendix C.

Table 4.3: Generation of Random Numbers for Year 2006

Month | RAND( ), u | z = 2u - 1 | $\operatorname{erf}^{-1}(z)$ | $t_{i,j}$
January | 0.699645 | 0.399289 | 0.370085 | 1.523379
February | 0.45481 | -0.090379 | -0.08027 | 0.886483
March | 0.063732 | -0.872536 | -1.0558 | -0.49313
April | 0.224711 | -0.550577 | -0.53482 | 0.243657
May | 0.236038 | -0.527923 | -0.50847 | 0.280915
June | 0.471912 | -0.056176 | -0.04983 | 0.929536
July | 0.999341 | 0.998683 | 1.443813 | 3.041859
August | 0.533139 | 0.066278 | 0.058805 | 1.083163
September | 0.095672 | -0.808656 | -0.91763 | -0.29772
October | 0.044674 | -0.910651 | -1.15355 | -0.63136
November | 0.997494 | 0.994989 | 1.429319 | 3.021363
December | 0.407816 | -0.184368 | -0.16487 | 0.766834
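The inverse-error-function step in Table 4.3 can be checked with Python's standard library, which has `math.erf` but no inverse. The quantity $\sqrt{2}\,\operatorname{erf}^{-1}(2u-1)$ equals the standard normal quantile $\Phi^{-1}(u)$. Equation 4.5 is therefore equivalent to the inverse CDF of a normal distribution with mean 1 and standard deviation 1, available as `statistics.NormalDist(1, 1).inv_cdf`. The sketch reproduces the January and February values of Table 4.3 to about four decimal places. Seeding a generator in place of Excel's RAND( ) is an assumption for repeatability, and the helper names are hypothetical.

```python
import math
import random
from statistics import NormalDist

DEVIATE = NormalDist(mu=1.0, sigma=1.0)   # t = 1 + sqrt(2) * erfinv(2u - 1)


def deviate_from_uniform(u):
    """Map a uniform number u in (0, 1) to t via Equation 4.5."""
    return DEVIATE.inv_cdf(u)


def erfinv_from_t(t):
    """Recover erf^-1(z) = (t - 1) / sqrt(2), the fourth column of Table 4.3."""
    return (t - 1.0) / math.sqrt(2.0)


t_jan = deviate_from_uniform(0.699645)    # Table 4.3 lists 1.523379
t_feb = deviate_from_uniform(0.45481)     # Table 4.3 lists 0.886483

rng = random.Random(2006)                 # hypothetical seed standing in for RAND()
year_of_deviates = [deviate_from_uniform(rng.random()) for _ in range(12)]
```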

4.3.4 Streamflow Generation of Markov Model

As an example, Table 4.4 shows how the monthly streamflow for 2006 is generated. It shows the calculation of the deterministic part, which accounts for persistence (the influence of previous flows), and its combination with the random part. The generated streamflow for the other years (2007-2010) can be found in Appendix D.

The Markov model for monthly flows developed by Thomas and Fiering has the following form (Maass et al., 1962):

$$q_{i,j} = \bar{q}_j + b_j(q_{i-1,j-1} - \bar{q}_{j-1}) + t_{i,j} S_j \sqrt{1 - r_j^2} \qquad (4.6)$$

Equation 4.6 is used to develop the Markov model for monthly flows. The flow in the ith month from the beginning, which is the jth month of the year, is modeled as the sum of two components. The first is a deterministic component: the mean flow of the jth month (January to December), adjusted for persistence. The second is a random component.

Table 4.4: Model Streamflow for Year 2006

Month | $q_{i-1,j-1}$ | Deterministic component $\bar{q}_j + b_j(q_{i-1,j-1} - \bar{q}_{j-1})$ | $t_{i,j}$ | Random component $S_j t_{i,j}\sqrt{1 - r_j^2}$ | Model flow $q_{i,j}$ (log) | Model flow $q_{i,j}$ (m3/s)
Jan | 0.049549 | 0.049541669 | 1.523379 | 0.013 | 0.063 | 13.533
Feb | 0.063 | 0.045386033 | 0.886483 | 0.007 | 0.053 | 9.077
Mar | 0.053 | 0.04653475 | -0.49313 | -0.004 | 0.043 | 5.641
Apr | 0.043 | 0.051865643 | 0.243657 | 0.002 | 0.054 | 9.604
May | 0.054 | 0.054889433 | 0.280915 | 0.002 | 0.057 | 10.807
Jun | 0.057 | 0.048803168 | 0.929536 | 0.007 | 0.055 | 10.210
Jul | 0.055 | 0.046082272 | 3.041859 | 0.022 | 0.068 | 16.422
Aug | 0.068 | 0.04726108 | 1.083163 | 0.008 | 0.055 | 10.014
Sep | 0.055 | 0.053859993 | -0.29772 | -0.002 | 0.052 | 8.642
Oct | 0.052 | 0.059642058 | -0.63136 | -0.005 | 0.055 | 9.821
Nov | 0.055 | 0.065034911 | 3.021363 | 0.024 | 0.089 | 32.326
Dec | 0.089 | 0.059661808 | 0.766834 | 0.007 | 0.067 | 15.849
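One step of the Thomas-Fiering model (Equation 4.6) can be sketched as a small function and checked against the January row of Table 4.4. The check uses the rounded January parameters from Table 4.1 ($b_j$ = 0.001, $\bar{q}_{j-1}$ = 0.06) and t = 1.523379 from Table 4.3. Because these inputs are rounded, the deterministic part (about 0.04954) differs slightly from the tabulated 0.049541669. The random part (about 0.013) and the model value (0.063 after rounding) match. The function name is hypothetical. The conversion of the logarithmic value back to m3/s is not reproduced here.

```python
import math


def thomas_fiering_step(q_prev, mean_j, mean_prev, b_j, s_j, r_j, t):
    """One month of Equation 4.6; returns (q_ij, deterministic, random_part)."""
    deterministic = mean_j + b_j * (q_prev - mean_prev)
    random_part = s_j * t * math.sqrt(1.0 - r_j ** 2)
    return deterministic + random_part, deterministic, random_part


# January 2006 with the (rounded) January row of Table 4.1 and t from Table 4.3
q_jan, det_jan, rand_jan = thomas_fiering_step(
    q_prev=0.049549,            # previous-month value listed in Table 4.4
    mean_j=0.049549, mean_prev=0.06, b_j=0.001,
    s_j=0.009529, r_j=0.4442686, t=1.523379)
```

Feeding each month's result in as `q_prev` for the next month, with the next month's parameters, generates the full sequence shown in Table 4.4.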

4.3.5 Validation of Markov Model

The streamflow generated by the Markov model is compared with the observed streamflow in the validation set of 60 monthly values from January 2006 to December 2010. Graphically (Figure 4.8), the Markov model does not work well for streamflow forecasting of Sungai Bernam, because it does not match the actual streamflow well.


Figure 4.8: Comparison of Observed and Markov Model Flow

The ability of the Markov model in streamflow forecasting is inspected using three forecast evaluation measures: Root Mean Square Error (RMSE), the Chi-square Test and Mean Absolute Percentage Error (MAPE). The results are summarized in Table 4.5, and the details of the calculation can be found in Appendix E.

Table 4.5: Accuracy of the Markov Model

Performance evaluation procedure | Markov model
MAPE (%) | 53.66
RMSE (m3/s) | 7.29
Chi-square test | 250.99

4.4 ARIMA Model

In this study, an appropriate tentative ARIMA model for the Sg. Bernam streamflow is investigated. Examination of the autocorrelation function (ACF) and partial autocorrelation function (PACF) provides a thorough basis for analyzing the time dependence of the system, and suggests the appropriate parameters to include in the model.

The tentative models are checked, and the best one is selected for streamflow forecasting with the ARIMA model. As mentioned in the previous chapter, ARIMA modeling follows three important stages, illustrated in the flow diagram of the Box-Jenkins methodology (Figure 4.9).

[Figure: 1. Tentative identification (stationary and nonstationary time series; ACF and PACF) → 2. Parameter estimation (testing parameters) → 3. Diagnostic checking (white noise of residuals; normal distribution of residuals): Is the model adequate? No → back to step 1; Yes → 4. Forecasting (forecast calculation)]
Figure 4.9: Flow Diagram of Box-Jenkins Methodology

4.4.1 Model Identification

Identification involves examining plots of the sample autocorrelation function
(ACF) and the sample partial autocorrelation function (PACF) to determine whether the
series is stationary, and then deciding which functional form best fits the data. In
practice, the sample ACF and PACF are random variables and will not give exactly the
same picture as the theoretical functions. This makes model identification more difficult
and can involve much trial and error (Nazuha et al., 2010).

The most common way to check stationarity is to examine the time series plot of
the data. A stationary series fluctuates around a constant mean; if the plot shows that the
series is nonstationary, differencing needs to be applied. Figure 4.10 shows that the data
are nonstationary, so a nonseasonal difference (d = 1, lag k = 1) is applied. Graphical
examination of Figure 4.11 shows that the series is stationary after this nonseasonal
difference.
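Nonseasonal differencing with d = 1 and lag k = 1 replaces each monthly value by its change from the previous month, so the differenced series is one value shorter. A minimal Python sketch (the function name is illustrative), applied to January to May 1960 from Appendix A:

```python
def difference(series, lag=1):
    """Return y[t] - y[t - lag] for every t >= lag (length shrinks by lag)."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

# January to May 1960 flows (m3/s) from Appendix A
flows = [10.62, 8.38, 11.23, 14.08, 10.04]
d1 = difference(flows)  # nonseasonal difference: d = 1, lag k = 1
print([round(v, 2) for v in d1])
```

The differenced values are month-to-month changes in m3/s, which is why the axis of Figure 4.11 is centred on zero.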

[Plot: Streamflow, Yt (m3/s) by month and year, from January 1960; points labelled by month number]

Figure 4.10: Non stationary data of Sg. Bernam streamflow

[Plot: Streamflow, d1-Yt (m3/s) by month and year, from January 1960; points labelled by month number]

Figure 4.11: Stationary data of Sg. Bernam streamflow

The next step is to identify the orders p and q of the autoregressive (AR) and
moving average (MA) components for both the seasonal and the nonseasonal parts of the
series. For this purpose, the ACF and PACF coefficients are computed. Table 4.6 gives
the general theoretical behavior of the ACF and PACF used to identify the likely model:

Table 4.6: General Theoretical ACF and PACF of ARIMA models

Model                                        ACF                     PACF
MA(q): moving average of order q             Cuts off after lag q    Dies down
AR(p): autoregressive of order p             Dies down               Cuts off after lag p
ARMA(p,q): mixed autoregressive-moving       Dies down               Dies down
  average of order (p,q)
AR(p) or MA(q)                               Cuts off after lag q    Cuts off after lag p
No order AR or MA (white noise or            No spike                No spike
  random process)
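The sample ACF and PACF behind correlograms such as Figures 4.12 to 4.15 can be sketched in plain Python as below. This illustrates the standard estimators and is not Minitab's code: the ACF uses the usual biased estimator, the PACF is obtained from the ACF by the Durbin-Levinson recursion, and the ±1.96/√n band is a simple large-sample approximation (Minitab's ACF limits are typically based on Bartlett's formula and widen with lag). Function names are illustrative.

```python
import math

def sample_acf(x, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag (standard biased estimator)."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(1, max_lag + 1)]

def sample_pacf(r):
    """Partial autocorrelations from r_1 .. r_K via the Durbin-Levinson recursion."""
    pacf, phi = [], []
    for k in range(1, len(r) + 1):
        # phi holds phi_{k-1,1..k-1}; r[i] is the autocorrelation at lag i + 1
        num = r[k - 1] - sum(phi[j] * r[k - 2 - j] for j in range(k - 1))
        den = 1.0 - sum(phi[j] * r[j] for j in range(k - 1))
        phi_kk = num / den
        phi = [phi[j] - phi_kk * phi[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        pacf.append(phi_kk)
    return pacf

def significance_limit(n):
    """Approximate 95% limit +/-1.96/sqrt(n) for a white-noise correlogram."""
    return 1.96 / math.sqrt(n)

# A theoretical AR(1) pattern r_k = 0.5**k: the PACF cuts off after lag 1
print(sample_pacf([0.5, 0.25, 0.125]))  # prints: [0.5, 0.0, 0.0]
```

The usage line reproduces the AR(p) row of Table 4.6: an ACF that dies down geometrically paired with a PACF that cuts off after lag p = 1.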

[Plot: Autocorrelation Function for d1-Yt, lags 1 to 65, with 5% significance limits for the autocorrelations]

Figure 4.12: ACF after non-seasonal difference


[Plot: Partial Autocorrelation Function for d1-Yt, lags 1 to 65, with 5% significance limits for the partial autocorrelations]

Figure 4.13: PACF after non-seasonal difference

As Figures 4.12 and 4.13 show, both the ACF and the PACF die down gradually.
Based on this pattern, the nonseasonal orders are taken as p = 1, d = 1 and q = 1, i.e.
ARIMA (1, 1, 1). The ACF correlogram also reveals a seasonal pattern in the data, so a
seasonal difference (D = 1, lag k = 12) needs to be applied.
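Applying the seasonal difference (D = 1, lag k = 12) on top of the nonseasonal one corresponds to the operator (1 − B)(1 − B^12) and costs 13 observations at the start of the series. A minimal sketch with illustrative names and a hypothetical test series; the order of the two differences does not matter because the operators commute.

```python
def difference(series, lag=1):
    """Return y[t] - y[t - lag] for every t >= lag."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def regular_and_seasonal_difference(series, season=12):
    """Apply (1 - B)(1 - B**season): d = 1 at lag 1, then D = 1 at lag `season`."""
    return difference(difference(series, 1), season)

# Synthetic check: a fixed monthly pattern plus a linear trend is removed completely
pattern = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8]  # hypothetical values
y = [pattern[t % 12] + 2 * t for t in range(36)]
w = regular_and_seasonal_difference(y)
print(len(w), set(w))  # prints: 23 {0}
```

The check shows why both differences are needed: the lag-1 difference removes the trend, and the lag-12 difference removes the repeating monthly pattern that remained visible in the ACF of d1-Yt.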

[Plot: Autocorrelation Function for D1-d1-Yt, lags 1 to 65, with 5% significance limits for the autocorrelations]

Figure 4.14: ACF after seasonal difference


[Plot: Partial Autocorrelation Function for D1-d1-Yt, lags 1 to 65, with 5% significance limits for the partial autocorrelations]

Figure 4.15: PACF after seasonal difference

After the seasonal difference, the ACF in Figure 4.14 cuts off after lag 12, while
the PACF in Figure 4.15 dies down. The general notation for a seasonal ARIMA model is
ARIMA (p, d, q)(P, D, Q)S, where S is the season length. Based on this pattern, the
seasonal orders are taken as (P, D, Q) = (0, 1, 1) with S = 12, giving the tentative model
ARIMA (1,1,1)(0,1,1)12. To make sure that the right model has been identified, a second
tentative model with seasonal part (1, 1, 1)12, i.e. ARIMA (1,1,1)(1,1,1)12, is also
considered.

4.4.2 Parameter estimation

Each parameter of a tentative ARIMA model can be tested using its t-value and
p-value. The t-value is calculated by dividing the coefficient by its standard error. The
standard error (SE) of a coefficient is the standard deviation of the estimate of that
coefficient; it measures how precisely the data can estimate the coefficient's unknown
value. It is always positive, and smaller values indicate a more precise estimate. The
standard error helps determine whether the coefficient is significantly different from
zero: if the p-value associated with the t-statistic is less than the chosen significance
level α, we conclude that the coefficient is significantly different from zero.

In Table 4.7, the standard error of the SAR 12 coefficient is large relative to the
coefficient itself, so its t-value of 1.26 is too small to declare statistical significance.
The null hypothesis (coefficient equal to zero) is rejected if $|t| > t_{\alpha/2,\,df}$, where
$df = n - n_p$, n is the number of observations and $n_p$ the number of estimated
parameters. For the SAR 12 parameter, tcalc (= 1.26) < ttable (= 2.25), and the resulting
p-value (0.208) is much greater than common significance levels. The null hypothesis
therefore cannot be rejected, and we conclude that this coefficient does not differ
significantly from zero. In Table 4.8, which gives the parameter estimates for
ARIMA (1,1,1)(0,1,1)12, every parameter has |tcalc| > ttable (= 2.25) and a p-value below
the significance level. The null hypothesis is therefore rejected, and all three coefficients
are significantly different from zero.
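The t-test above can be sketched as follows. The two-sided p-value uses the standard normal distribution as a large-sample approximation to the t distribution (an assumption that is reasonable for a long monthly record); the critical value ttable = 2.25 is taken from the text, and the function names are illustrative.

```python
import math

def t_value(coef, se):
    """t-statistic: estimated coefficient divided by its standard error."""
    return coef / se

def two_sided_p_normal(t):
    """Two-sided p-value 2 * (1 - Phi(|t|)), using the normal approximation."""
    return math.erfc(abs(t) / math.sqrt(2.0))

def is_significant(t, t_crit):
    """Reject H0 (coefficient = 0) when |t| exceeds the critical value."""
    return abs(t) > t_crit

# SAR 12 row of Table 4.7
t = t_value(0.0589, 0.0467)
print(round(t, 2), round(two_sided_p_normal(t), 3), is_significant(t, 2.25))
```

For the SAR 12 row this gives t = 1.26 and a p-value of about 0.207, close to the 0.208 reported in Table 4.7; the small difference comes from using the normal rather than the t distribution.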

Table 4.7: Final Estimates of Parameters for ARIMA (1,1,1)(1,1,1)12

Type      Coefficient    SE Coefficient    t-value    p-value
AR 1      0.2782         0.0520             5.35      0.000
SAR 12    0.0589         0.0467             1.26      0.208
MA 1      0.8765         0.0256            34.24      0.000
SMA 12    0.9537         0.0206            46.25      0.000

Table 4.8: Final Estimates of Parameters for ARIMA (1,1,1)(0,1,1)12

Type      Coefficient    SE Coefficient    t-value    p-value
AR 1      0.2894         0.0516             5.61      0.000
MA 1      0.8788         0.0248            35.41      0.000
SMA 12    0.9553         0.0184            51.98      0.000

4.4.3 Diagnostic Checking

Diagnostic checking is the next step of the time series modeling approach. It
examines whether the chosen tentative model is adequate, that is, whether the modeling
assumptions are satisfied. Several procedures can be applied to check the adequacy of the
model, for example whether the model satisfies the stability or stationarity condition
required in stochastic modeling (Ayob and Amat, 2004).

At this stage, the Ljung-Box test is used to check whether the residuals are white
noise. The null hypothesis is that the residuals are white noise, i.e. uncorrelated; the
modeling assumptions further require the residuals to be independent, homoscedastic
(having constant variance) and normally distributed. The null hypothesis is rejected if the
p-value of the chi-square statistic is less than the significance level α = 5%.

In this study, both tentative ARIMA models have p-values below 5% at every lag
tested (Tables 4.9 and 4.10). Strictly, the null hypothesis is therefore rejected at the 5%
level: the Ljung-Box test indicates that some autocorrelation remains in the residuals of
both tentative models, so the residuals of neither model can be regarded as white noise,
and the choice between the two models rests on the parameter tests and error measures.
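The Ljung-Box statistic in Tables 4.9 and 4.10 is Q = n(n + 2) Σ r_k²/(n − k), summed over lags k = 1 to h, where r_k is the lag-k autocorrelation of the residuals; its p-value comes from a chi-square distribution whose degrees of freedom are h minus the number of estimated ARMA parameters (h − 4 and h − 3 for the two tentative models, following the DF values in the tables). The sketch below is illustrative: the chi-square tail probability is computed with a simple incomplete-gamma series rather than a statistics library, and the function names are assumptions.

```python
import math

def ljung_box_q(residuals, h):
    """Ljung-Box Q = n(n+2) * sum_{k=1..h} r_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, h + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total

def chi2_sf(x, df):
    """P(X > x) for X ~ chi-square(df), via the series for the regularized
    lower incomplete gamma function (adequate for moderate x)."""
    if x <= 0:
        return 1.0
    a, z = df / 2.0, x / 2.0
    term = total = 1.0 / a
    n = 1
    while term > total * 1e-15:
        term *= z / (a + n)
        total += term
        n += 1
    lower = total * math.exp(-z + a * math.log(z) - math.lgamma(a))
    return max(0.0, 1.0 - lower)

# Lag 12 of Table 4.9: Q = 21.2 with 12 - 4 = 8 degrees of freedom
print(round(chi2_sf(21.2, 8), 4))
```

The usage line gives a p-value of about 0.0066, which rounds to the 0.007 reported for lag 12 in Table 4.9; since it is below 0.05, the white-noise hypothesis is rejected at that lag.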

Table 4.9: Modified Box-Pierce (Ljung-Box) Chi-Square statistic
for ARIMA (1,1,1)(1,1,1)12

Lag           12       24       36       48
Chi-Square    21.2     61.8     82.7     98.1
DF            8        20       32       44
p-Value       0.007    0.000    0.000    0.000

Table 4.10: Modified Box-Pierce (Ljung-Box) Chi-Square statistic
for ARIMA (1,1,1)(0,1,1)12

Lag           12       24       36       48
Chi-Square    23.1     62.2     82.7     97.9
DF            9        21       33       45
p-Value       0.006    0.000    0.000    0.000

The best tentative model can also be determined using the least-squares error
(LSE) and the root mean square error (RMSE); the results for the tentative models are
summarized in Table 4.11. The best fit in the least-squares sense minimizes the sum of
squared residuals, a residual being the difference between an observed value and the
fitted value provided by a model. RMSE is also a good measure of accuracy. The smaller
the LSE and RMSE, the more accurate the tentative model.
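The LSE and RMSE comparison can be sketched as below. The sketch assumes RMSE = √(LSE/n) with n the number of residuals, and the residual lists are hypothetical. When both candidates are scored on the same number of residuals, the two criteria always rank the models in the same order, since the square root is an increasing function.

```python
import math

def sse(residuals):
    """Least-squares error: sum of squared residuals."""
    return sum(e * e for e in residuals)

def rmse_from_residuals(residuals):
    """ASSUMPTION: RMSE = sqrt(SSE / n), n = number of residuals."""
    return math.sqrt(sse(residuals) / len(residuals))

def best_model(candidates):
    """Name of the candidate model whose residuals give the smallest SSE."""
    return min(candidates, key=lambda name: sse(candidates[name]))

# Hypothetical residuals for two tentative models
candidates = {
    "ARIMA(1,1,1)(1,1,1)12": [1.0, -2.0, 1.5],
    "ARIMA(1,1,1)(0,1,1)12": [0.5, -1.0, 1.0],
}
for name, res in candidates.items():
    print(name, sse(res), round(rmse_from_residuals(res), 3))
print("best:", best_model(candidates))
```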

Table 4.11: LSE and RMSE Test for ARIMA Tentative Model

Test                             ARIMA (1,1,1)(1,1,1)12    ARIMA (1,1,1)(0,1,1)12
Least Square Error (LSE)         1798                      1760
Root Mean Square Error (RMSE)    5.5                       5.4

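Of the two tentative models, ARIMA (1,1,1)(0,1,1)12 performs better on the
parameter significance and error criteria, so forecasts are made with this model. The
best-fit model for Sg. Bernam streamflow is:

$$(1-\phi_1 B)(1-B)(1-B^{12})\,Y_t = (1-\theta_1 B)(1-\theta_2 B^{12})\,a_t \qquad (4.7)$$

where B is the backshift operator ($B^k Y_t = Y_{t-k}$), $a_t$ is the random shock
(residual) at time t, $\phi_1$ is the AR 1 coefficient, $\theta_1$ the MA 1 coefficient and
$\theta_2$ the SMA 12 coefficient.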

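Rewriting the model, we have the following:

$$(1-\phi_1 B)(1-B-B^{12}+B^{13})\,Y_t = (1-\theta_1 B-\theta_2 B^{12}+\theta_1\theta_2 B^{13})\,a_t$$

$$(1-B-B^{12}+B^{13}-\phi_1 B+\phi_1 B^{2}+\phi_1 B^{13}-\phi_1 B^{14})\,Y_t = (1-\theta_1 B-\theta_2 B^{12}+\theta_1\theta_2 B^{13})\,a_t$$

$$\left(1-(1+\phi_1)B+\phi_1 B^{2}-B^{12}+(1+\phi_1)B^{13}-\phi_1 B^{14}\right)Y_t = (1-\theta_1 B-\theta_2 B^{12}+\theta_1\theta_2 B^{13})\,a_t$$

$$Y_t-(1+\phi_1)Y_{t-1}+\phi_1 Y_{t-2}-Y_{t-12}+(1+\phi_1)Y_{t-13}-\phi_1 Y_{t-14} = a_t-\theta_1 a_{t-1}-\theta_2 a_{t-12}+\theta_1\theta_2 a_{t-13}$$

$$Y_t = (1+\phi_1)Y_{t-1}-\phi_1 Y_{t-2}+Y_{t-12}-(1+\phi_1)Y_{t-13}+\phi_1 Y_{t-14}+a_t-\theta_1 a_{t-1}-\theta_2 a_{t-12}+\theta_1\theta_2 a_{t-13}$$

Note that (Table 4.8):

AR 1: $\phi_1$ = 0.2894
MA 1: $\theta_1$ = 0.8788
SMA 12: $\theta_2$ = 0.9553

Substituting these values:

$$Y_t = (1+0.2894)Y_{t-1}-0.2894Y_{t-2}+Y_{t-12}-(1+0.2894)Y_{t-13}+0.2894Y_{t-14}+a_t-0.8788a_{t-1}-0.9553a_{t-12}+(0.8788\times0.9553)a_{t-13}$$

$$Y_t = 1.2894Y_{t-1}-0.2894Y_{t-2}+Y_{t-12}-1.2894Y_{t-13}+0.2894Y_{t-14}+a_t-0.8788a_{t-1}-0.9553a_{t-12}+0.8395a_{t-13}$$

$$Y_t = Y_{t-12}+\left[1.2894Y_{t-1}-1.2894Y_{t-13}-0.2894Y_{t-2}+0.2894Y_{t-14}\right]+\left[a_t-0.8788a_{t-1}-0.9553a_{t-12}+0.8395a_{t-13}\right] \qquad (4.8)$$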
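The expansion leading to Equation (4.8) can be checked numerically by multiplying the backshift polynomials, representing each polynomial in B as a list of coefficients indexed by the power of B. A minimal sketch using the AR 1, MA 1 and SMA 12 estimates of Table 4.8; the helper names are illustrative.

```python
def poly_mul(p, q):
    """Multiply polynomials in B given as coefficient lists (index = power of B)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def one_minus(c, power):
    """Coefficient list of the factor (1 - c * B**power)."""
    p = [0.0] * (power + 1)
    p[0] = 1.0
    p[power] -= c
    return p

phi1, theta1, theta2 = 0.2894, 0.8788, 0.9553  # AR 1, MA 1, SMA 12 (Table 4.8)

# Left side: (1 - phi1 B)(1 - B)(1 - B^12); right side: (1 - theta1 B)(1 - theta2 B^12)
ar_side = poly_mul(poly_mul(one_minus(phi1, 1), one_minus(1.0, 1)), one_minus(1.0, 12))
ma_side = poly_mul(one_minus(theta1, 1), one_minus(theta2, 12))

# Nonzero coefficients; moving the Y terms to the right-hand side flips their signs
print({k: round(c, 4) for k, c in enumerate(ar_side) if abs(c) > 1e-12})
print({k: round(c, 4) for k, c in enumerate(ma_side) if abs(c) > 1e-12})
```

The AR side has nonzero coefficients only at powers 0, 1, 2, 12, 13 and 14 (with magnitudes 1, 1.2894, 0.2894, 1, 1.2894 and 0.2894), and the MA side at powers 0, 1, 12 and 13 (1, 0.8788, 0.9553 and 0.8395), matching the terms of Equation (4.8).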

Equation (4.8) can be used for ARIMA streamflow forecasting. It shows that the
forecast for month t is the sum of (1) the value of the series in the same month of the
previous year, $Y_{t-12}$; (2) a trend component equal to 1.2894 times the difference
between last month's value and the value for that same month a year earlier
($Y_{t-1} - Y_{t-13}$), minus 0.2894 times the corresponding difference two months back
($Y_{t-2} - Y_{t-14}$); and (3) the effects of the random shocks (residuals) of periods t,
t-1, t-12 and t-13 on the forecast.
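Equation (4.8) can be turned into a one-step-ahead forecast function, as in the sketch below. It assumes the observed flows y and the residuals a are stored oldest first in equal-length lists, and it sets the unknown shock of the forecast month to its expected value of zero, which is the usual convention for ARIMA forecasts. The function name and argument layout are illustrative, and the example uses zero residuals only for demonstration.

```python
def forecast_one_step(y, a, phi1=0.2894, theta1=0.8788, theta2=0.9553):
    """One-step-ahead forecast of the next month from Eq. (4.8).

    y: observed monthly flows, oldest first (at least 14 values).
    a: residuals aligned with y (same length).
    The unknown shock of the forecast month is replaced by its mean, zero.
    """
    if len(y) < 14 or len(a) != len(y):
        raise ValueError("need >= 14 flows and residuals aligned with them")
    t = len(y)  # position of the month being forecast
    ar_part = (y[t - 12]
               + (1 + phi1) * (y[t - 1] - y[t - 13])
               - phi1 * (y[t - 2] - y[t - 14]))
    ma_part = -theta1 * a[t - 1] - theta2 * a[t - 12] + theta1 * theta2 * a[t - 13]
    return ar_part + ma_part

# Illustration only: Jan 1960 to Feb 1961 flows from Appendix A, with residuals
# set to zero (real residuals would come from the fitted model)
y = [10.62, 8.38, 11.23, 14.08, 10.04, 6.68, 11.01, 7.87, 11.11, 10.83, 13.83, 14.62,
     8.72, 5.95]
print(round(forecast_one_step(y, [0.0] * len(y)), 2))
```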


4.4.4 Streamflow Generation of ARIMA Model

In this study, Minitab is used to generate ARIMA model flows for the monthly
series. As an example, the model streamflow for 2006 to 2007 is shown in Table 4.12; the
model streamflow for the other years (2008-2010) is given in Appendix F.

Table 4.12: Model Streamflow for Year 2006-2007

Month       Actual Flow (m3/s)    Model Flow (m3/s)
Jan 2006    13.08                  9.6732
Feb 2006     8.12                  7.1884
Mar 2006     6.11                  7.2612
Apr 2006    29.72                  9.0165
May 2006    29.22                  9.9281
Jun 2006    17.82                  7.6110
Jul 2006     7.94                  6.7046
Aug 2006     9.95                  7.0851
Sep 2006    28.05                  9.5168
Oct 2006    17.63                 12.2889
Nov 2006    17.72                 15.2005
Dec 2006    11.23                 12.3581
Jan 2007     9.05                  7.9227
Feb 2007     6.80                  6.6970
Mar 2007     7.62                  7.1341
Apr 2007    13.46                  8.9949
May 2007    12.05                  9.9369
Jun 2007    11.38                  7.6286
Jul 2007    13.06                  6.7248
Aug 2007     8.95                  7.1060
Sep 2007     9.36                  9.5379
Oct 2007    14.33                 12.3101
Nov 2007    14.26                 15.2217
Dec 2007     8.24                 12.3794

Residual and Fit columns of the Minitab worksheet: the first 13 rows are * (observations
lost to the nonseasonal and seasonal differencing). For the remaining rows, Residual + Fit
reproduces the observed flows of February to December 1961 in Appendix A, so these two
columns refer to the start of the estimation series rather than to 2006-2007:

Month       Residual     Fit
Feb 1961    -1.57988     7.5299
Mar 1961    -1.39072     7.6507
Apr 1961    -1.05700     9.4570
May 1961    -0.14946    10.2195
Jun 1961     1.10867     7.3913
Jul 1961    -1.04180     7.8818
Aug 1961     1.04920     7.2208
Sep 1961     0.26505    11.0050
Oct 1961    -2.99026    13.4603
Nov 1961    -3.58500    15.6250
Dec 1961     4.03841    11.4816

Coefficient (Minitab): AR 1 = 0.289364, MA 1 = 0.878761, SMA 12 = 0.955283

4.4.5 Validation of ARIMA Model

The streamflow generated by the ARIMA model is compared with the observed
streamflow in the validation set, which comprises 60 monthly values from January 2006 to
December 2010. Graphically (Figure 4.16), the ARIMA model appears to perform
reasonably well for streamflow forecasting on Sungai Bernam, because many of the
modelled values are close to the observed streamflow. The forecasting ability of the
ARIMA model is then assessed using forecast evaluation measures.

Figure 4.16: Comparison of Observed and ARIMA Model Flow

As in the validation of the Markov model, the Root Mean Square Error (RMSE),
the chi-square test and the Mean Absolute Percentage Error (MAPE) are used to examine
the accuracy of the ARIMA model. The results are summarized in Table 4.13, and the
details of the calculation are given in Appendix G.

Table 4.13: Accuracy of the ARIMA Model

Performance evaluation procedure    ARIMA model
MAPE (%)                            27.50
RMSE (m3/s)                         5.41
Chi-square test                     191.11

4.5 Model Comparison and Forecast Evaluation Measures

The Markov model forecasts are compared with the ARIMA model forecasts to
assess the relative forecasting accuracy of the two models. The observed streamflow in the
validation set (60 monthly values from January 2006 to December 2010) is used as the
benchmark. From graphical examination of Figure 4.17, the ARIMA model is better suited
to streamflow forecasting for Sungai Bernam, because more of its values match the
observed streamflow.

Most of the streamflow forecasts from the Markov model are higher than the
observed flows. In terms of accuracy, the Markov model is inferior to the ARIMA model
because it does not reproduce the pattern of the observed flows. However, these high
values can serve as a conservative reference for preventing flood damage, and the Markov
model could be used for short-term forecasting, such as hourly or daily forecasts, to give
more accurate flood warnings.

Conversely, forecasts that are lower than the observed streamflow cannot be used
to anticipate flood occurrence. Lower streamflow forecasts are, however, useful in some
agricultural applications, to make sure that crops receive sufficient water to grow well.

Figure 4.17: Model Comparison

Over short horizons, the ARIMA model can reproduce the observed pattern closely
or approximately. It cannot forecast accurately over longer horizons, because it is best
suited to short-term forecasting: for sufficiently long horizons its forecasts lose detail and
settle into a fixed pattern. An ARIMA model that performs well in short-term forecasting
can nevertheless also be used for flood control.

To compare the forecasting accuracy of the two models, the performance evaluation
criteria (MAPE, RMSE and the chi-square test) of the Markov and ARIMA models are
compared. Table 4.14 shows the MAPE, RMSE and chi-square values for each model.

Table 4.14: Accuracy of the model

Performance evaluation procedure    Markov model    ARIMA model
MAPE (%)                            53.66           27.50
RMSE (m3/s)                         7.29            5.4156
Chi-square test                     250.99          191.11

Lower MAPE, RMSE and chi-square values indicate a better model for streamflow
forecasting. The ARIMA model has the lower value on all three criteria; therefore, of the
two models considered in this study, the ARIMA model performs best for streamflow
forecasting.

One factor that may explain why the ARIMA model outperforms the Markov model
in this study is that the historical Sg. Bernam series is nonstationary. For a stationary
series, the Markov model might have an advantage, because it propagates flows through
probabilities of transition from one state to another. The Markov model does not remove
nonstationarity, whereas the ARIMA model can transform a nonstationary series into a
stationary one by differencing.

The ARIMA model is selected as the best fit because it has the smaller mean
squared forecast error, a criterion widely used in statistical practice. For forecasting one
period ahead, $Y_{t+1}$, the equation is:

$$Y_{t+1} = Y_{t-11}+\left[1.2894Y_t-1.2894Y_{t-12}-0.2894Y_{t-1}+0.2894Y_{t-13}\right]+\left[a_{t+1}-0.8788a_t-0.9553a_{t-11}+0.8395a_{t-12}\right] \qquad (4.9)$$

Using Minitab, future values of the time series can be forecast from its current
and past values. Figure 4.18 compares the actual and model streamflow for Sungai
Bernam. The first 5 years, January 2006 to December 2010, are the validation period, in
which the model flows can be compared with the observed flows. The time series plot
reveals the cyclic pattern of the ARIMA model: the model flows follow the pattern of the
observed streamflow quite well, although the data are nonstationary in several years.

[Plot: Streamflow, Yt (m3/s) by month and year, January 2006 to 2015; variables Yt-actual and Yt-model; points labelled by month number]

Figure 4.18: Streamflow for actual and model

The next 5 years, 60 months from January 2011 to December 2015, are the
streamflow forecast by the ARIMA model. The figure shows that the forecasts follow the
seasonal cycle but repeat the same pattern over the longer period. This is because the
ARIMA model is best suited to short-term forecasting: its forecasts are based on previous
observations, and further ahead they depend increasingly on earlier forecasts. For
short-term forecasting, the Box-Jenkins model can reproduce the details of the original
series well, but ARIMA cannot forecast accurately over longer periods.
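Iterating Equation (4.9) shows why the five-year forecast in Figure 4.18 repeats itself. In the sketch below, which assumes flows and residuals are stored oldest first in equal-length lists, each forecast is appended to the history as if it were data and every future shock is set to zero (the usual ARIMA convention). Once the horizon exceeds 13 months, no observed residual remains in the recursion, so the forecasts only carry the seasonal shape forward. The function name and the example pattern are hypothetical.

```python
def forecast_horizon(y, a, steps, phi1=0.2894, theta1=0.8788, theta2=0.9553):
    """Multi-step forecasts by iterating Eq. (4.9).

    y, a: observed flows and aligned residuals, oldest first (>= 14 values each).
    Each forecast is appended to the history as if it were data, and the
    unknown future shock is set to zero.
    """
    y, a = list(y), list(a)
    out = []
    for _ in range(steps):
        t = len(y)
        f = (y[t - 12]
             + (1 + phi1) * (y[t - 1] - y[t - 13])
             - phi1 * (y[t - 2] - y[t - 14])
             - theta1 * a[t - 1] - theta2 * a[t - 12] + theta1 * theta2 * a[t - 13])
        y.append(f)
        a.append(0.0)
        out.append(f)
    return out

# Hypothetical history whose last two years share one monthly pattern, zero residuals
pattern = [9.0, 7.0, 7.0, 9.0, 10.0, 8.0, 7.0, 7.0, 9.5, 12.0, 15.0, 12.0]
print(forecast_horizon(pattern * 2, [0.0] * 24, 24) == pattern * 2)  # prints: True
```

In this idealized case the forecasts reproduce the same twelve monthly values year after year, which is the behavior visible in the 2011 to 2015 part of Figure 4.18.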

CHAPTER 5

CONCLUSION AND RECOMMENDATIONS

5.1 Conclusion

This study has fulfilled its objectives: to propose streamflow forecasting methods
using Markov and ARIMA models, and to inspect the forecasting accuracy of both models.
The Box-Jenkins (ARIMA) model is one of the most popular time series forecasting
methods, while the Markov model has its own advantages in forecasting.

In this study, the tentative model that best fits the criteria is ARIMA
(1,1,1)(0,1,1)12. Evaluation of the forecasts with the performance evaluation procedures
shows that the ARIMA model forecasts Sg. Bernam streamflow better than the Markov
model: ARIMA gives lower values on all the criteria used. The ARIMA model is
therefore able to predict future monthly streamflow for Sungai Bernam more accurately
than the Markov model.

The critical part of ARIMA modeling is identifying the best tentative model. The
identified tentative model must then be tested and checked to confirm that it is the best
fit.

The Markov model also has an advantage: it tends to forecast higher streamflow
than observed. Because high streamflow can lead to disasters such as floods, these
conservative forecasts make the Markov model useful for flood control.

Both the Markov and ARIMA models are suited to short-term forecasting. The
results show that both models forecast reasonably well over the earlier period but cannot
forecast accurately over longer periods.

Although both models are suited to short-term rather than long-term forecasting,
the comparison between them shows that ARIMA gives more accurate forecasts.

5.2 Recommendations

Based on the results, both Markov and ARIMA models can be used for streamflow
forecasting. However, both have weaknesses that could be overcome. The following
recommendations may increase the accuracy of streamflow forecasting:

1. The amount of data, or equivalently the number of training patterns, also affects
the forecast performance. For long-memory series, more training patterns result
in more accurate forecasts, so long input series should be used.

2. To control floods efficiently, use the Markov model for short-term forecasting,
since short-term forecasts are very useful for flood control.
3. Use the ARIMA model, including for streamflow forecasting, for short-term
forecasting only.
4. Compare the streamflow forecasts with other time series forecasting methods
such as exponential smoothing, regression analysis or Fourier series analysis.
5. Remove outliers from the time series before forecasting.
6. Use a hybrid ARIMA and artificial neural network model for streamflow
forecasting.

REFERENCES

Adib, A. and Majd, A. R. M. (2009). Optimization of Reservoir Volume by Yield Model
and Simulation of it by Dynamic Programming and Markov Chain Method.
American-Eurasian J. Agric. & Environ. Sci., 5(6), 796-803.

Akgun, B. (2003). Identification of Periodic Autoregressive Moving Average Models.
Middle East Technical University.

Ayob, K. and Amat, S. D. (2004). Water Use Trend at Universiti Teknologi Malaysia:
Application of ARIMA Model. Jurnal Teknologi, 41(B), 47-56.

Bell, W. R. (1984). An Introduction to Forecasting with Time Series Models. Insurance:
Mathematics and Economics, 3, pp. 241-255.

Bowerman, B. L. and O'Connell, R. T. (1993). Forecasting and Time Series: An
Applied Approach. Third Edition. Duxbury Press.

Box, G. E. P. and Jenkins, G. M. (1970). Time Series Analysis: Forecasting and Control.
Holden Day, San Francisco.

Box, G. E. P. and Jenkins, G. M. (1976). Time Series Analysis: Forecasting and Control.
Holden Day, San Francisco.

Box, G. E. P., Jenkins, G. M. and Reinsel, G. C. (1994). Time Series Analysis:
Forecasting and Control. Third Edition. Prentice Hall.

Brown, R. G. (1962). Smoothing, Forecasting and Prediction of Discrete Time Series.
Prentice Hall, Englewood Cliffs, N. J.

Dalphin, R. J. (1987). Markov-Weibull Model of Monthly Streamflow. Journal of Water
Resources Planning and Management, Vol. 113, No. 1.

Fiering, M. B. and Jackson, B. B. (1971). Synthetic Streamflows. Water Resources
Monograph 1. American Geophysical Union, Washington, D. C.

Fortin, V., Perreault, L. and Salas, J. D. (2004). Retrospective Analysis and Forecasting
of Streamflows Using a Shifting Model. Journal of Hydrology, Vol. 296,
pp. 135-163.

Gupta, R. S. (1989). Hydrology and Hydraulic Systems. Prentice Hall, pp. 343-350.

Hasmida, H. (2009). Water Quality Trend at the Upper Part of Johor River in Relation
to Rainfall and Runoff Pattern. Universiti Teknologi Malaysia.

Heiko, B. (2000). Markov Chain Model for Vegetation Dynamics. Ecological Modelling,
Vol. 126, pp. 139-154.

Hendranata, A. (2003). ARIMA (Autoregressive Moving Average). Manajemen
Keuangan Sektor Publik FEUI.

Ho, S. L. and Xie, M. (1998). The Use of ARIMA Models for Reliability Forecasting
and Analysis. Computers ind. Engng, Vol. 35, Nos. 1-2, pp. 213-216.

Joomizan, N. (2010). Reservoir Storage Simulation and Forecasting Models for Muda
Irrigation Scheme, Malaysia. Universiti Teknologi Malaysia.

Lee, C. and Ko, C. (2011). Short-term Load Forecasting Using Lifting Scheme and
ARIMA Models. Expert Systems with Applications, Vol. 38, pp. 5902-5911.

Leemis, L. (1998). Input Modeling. In Proceedings of the 1998 Winter Simulation
Conference, ed. D. J. Medeiros, E. F. Watson, J. S. Carson and M. S.
Manivannan, pp. 15-22. Piscataway, New Jersey: Institute of Electrical and
Electronics Engineers, Inc.

Maass, A., Hufschmidt, M. M., Dorfman, R., Thomas, H. A., Marglin, S. A. and Fair, G.
M. (1962). The Design of Water-Resource Systems. Harvard University Press,
Cambridge, Mass., pp. 467.

Maia, A. L. S., de Carvalho, F. de A. T. and Ludermir, T. B. (2008). Forecasting Models
for Interval-valued Time Series. Neurocomputing, Vol. 71, pp. 3344-3352.

Modarres, R. (2007). Streamflow Drought Time Series Forecasting. Stoch Environ Res
Risk Assess.

Mohd Shafiek, Y., Hishamuddin, J. and Sobri, H. (2005). Daily Streamflow Forecasting
Using Simplified Rule-Based Fuzzy Logic System. Journal - The Institution of
Engineers, Malaysia, Vol. 66, No. 4.

Montgomery, D. C., Jennings, C. L. and Kulahci, M. (2008). Introduction to Time Series
Analysis and Forecasting. John Wiley & Sons, Inc.

Moore, D. S. and McCabe, G. P. (1999). Introduction to the Practice of Statistics. Third
Edition. New York: W. H. Freeman.

Naadimuthu, G. and Lee, E. S. (1982). Stochastic Modelling and Optimization of Water
Resources Systems. Mathematical Modelling, Vol. 3, pp. 117-136.

Nazuha, M., Ruzaidah, S. and Zamzulani, M. (2010). Malaysia Crude Oil Production
Estimation: an Application of ARIMA Model. International Conference on
Science and Social Research (CSSR 2010).

O'Donovan, T. M. (1983). Short Term Forecasting: An Introduction to the Box-Jenkins
Approach. New York: Wiley.

Shalamu, A. (2009). Monthly and Seasonal Streamflow Forecasting in the Rio Grande
Basin. New Mexico State University.

Shumway, R. H. (1988). Applied Statistical Time Series Analysis. Prentice Hall,
Englewood Cliffs, New Jersey.

SPSS (1993). SPSS for Windows-Trend. Release 6.0.

Tang, Z., Almeida, C. and Fishwick, P. A. (1991). Time Series Forecasting Using
Neural Networks vs. Box-Jenkins Methodology. Simulation.

Wang, W. (2006). Stochasticity, Nonlinearity and Forecasting of Streamflow Processes.
IOS Press, Amsterdam.

Wurbs, R. A. (2005). Comparative Evaluation of Generalized River/Reservoir Systems
Models. Texas Water Resources Institute, TR-282, pp. 27-131.

Yaffee, R. and McGee, M. (2000). Introduction to Time Series Analysis and Forecasting
with Applications of SAS and SPSS. Academic Press, Inc., New York.

Yurekli, K., Kurunc, A. and Simyek, H. (2004). Prediction of Daily Maximum
Streamflow Based on Stochastic Approaches. Journal of Spatial Hydrology,
Vol. 4.

APPENDIX A

Streamflow Data of Sungai Bernam 1960-2010


Year
1960
1961
1962
1963
1964
1965
1966
1967
1968
1969
1970
1971
1972
1973
1974
1975
1976
1977
1978
1979
1980
1981
1982
1983
1984
1985
1986
1987
1988
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
2006
2007
2008
2009
2010

Jan
10.62
8.72
11.98
8.06
10.31
6.65
11.41
14.99
4.87
9.61
9.69
17.40
8.09
6.27
3.62
8.49
7.88
6.84
5.14
4.08
4.05
8.07
4.95
4.02
5.86
9.55
7.95
4.38
4.81
4.56
5.80
3.89
7.37
7.10
10.66
10.85
12.29
16.24
15.95
9.18
14.05
13.58
6.04
7.45
8.08
3.87
13.08
9.05
11.29
9.73
6.83

Feb
8.38
5.95
6.66
6.35
7.08
6.12
7.38
9.69
4.12
6.95
3.49
6.48
8.04
5.26
7.44
7.38
4.36
4.90
3.54
4.33
3.95
6.99
3.93
2.47
9.04
8.80
5.79
4.07
7.86
2.91
3.14
3.51
6.28
8.00
10.16
9.37
11.50
19.71
16.02
9.77
11.67
9.33
4.35
6.91
6.60
2.73
8.12
6.80
6.76
9.67
4.86

Mar
11.23
6.26
10.28
6.37
7.90
7.96
8.96
6.91
2.94
4.83
3.83
8.20
7.84
5.29
6.80
10.91
5.59
3.34
3.36
3.87
4.91
4.38
5.26
3.84
8.07
9.26
4.94
3.76
6.48
6.13
2.65
5.83
5.56
7.56
10.32
11.74
12.12
20.96
14.69
11.69
15.70
8.05
4.84
5.85
6.67
4.38
6.11
7.62
9.58
15.10
4.36

Apr
14.08
8.40
11.06
6.02
8.73
15.45
11.25
12.13
5.14
6.75
6.48
5.99
9.31
9.88
8.65
11.46
6.05
3.80
6.39
5.70
5.01
10.73
8.94
3.18
8.10
6.99
8.87
5.67
6.22
11.96
3.46
7.66
7.18
11.41
10.59
13.89
18.45
20.22
14.42
11.82
12.87
13.36
11.45
7.34
8.68
4.51
29.72
13.46
12.86
13.72
7.18

May
10.04
10.07
10.07
6.20
10.35
16.39
6.45
11.64
11.72
12.50
8.71
7.06
11.60
11.09
9.52
11.50
4.92
5.61
7.93
5.86
10.07
11.90
9.40
5.42
9.83
11.31
6.31
6.03
9.92
11.79
10.15
11.97
9.65
12.62
10.87
14.36
15.91
17.51
14.90
13.56
9.26
10.84
12.75
9.23
11.12
6.26
29.22
12.05
9.73
8.75
6.17

Jun
6.68
8.50
6.62
6.76
6.44
8.23
8.70
5.89
8.16
8.24
4.41
5.06
9.16
8.87
7.24
8.29
7.92
6.51
3.67
5.24
9.37
7.37
6.94
3.65
9.93
5.53
4.32
4.51
12.09
8.73
5.94
9.24
5.64
7.69
10.78
14.15
16.73
20.14
15.72
9.54
7.45
7.30
7.89
6.69
4.13
6.07
17.82
11.38
12.28
7.31
7.51

Jul
11.01
6.84
6.36
6.45
11.09
4.64
9.38
5.79
5.64
4.26
5.73
4.77
5.88
5.04
7.54
10.70
5.94
4.10
3.99
5.30
6.51
4.83
4.84
4.39
5.98
5.34
3.21
4.75
8.45
8.27
4.55
5.59
6.94
8.96
8.43
13.34
13.12
19.05
16.14
7.52
4.21
5.72
6.99
7.32
6.49
4.56
7.94
13.06
10.89
8.05
7.45

Aug
7.87
8.27
7.79
9.29
7.52
5.81
9.49
5.94
5.91
6.21
6.49
8.17
5.75
7.74
7.79
6.90
8.74
3.73
2.99
5.01
8.79
4.62
6.55
5.68
5.07
4.85
3.68
9.03
8.57
5.09
3.31
5.09
6.01
6.15
10.89
16.59
14.24
15.69
20.16
8.99
9.25
5.05
7.26
6.71
4.44
5.63
9.95
8.95
7.83
9.03
8.04

Sep
11.11
11.27
10.54
12.09
12.39
9.32
11.04
9.91
9.91
5.52
9.79
11.53
8.74
7.42
8.43
11.33
7.55
4.38
4.72
8.86
9.64
9.23
7.73
10.02
5.64
7.14
7.39
12.35
18.19
8.02
6.83
7.83
6.43
9.60
16.08
12.99
13.39
18.91
23.75
10.50
8.85
8.66
8.96
8.80
11.62
3.53
28.05
9.36
9.85
10.08
7.16

Oct
10.83
10.47
16.97
17.82
11.24
16.98
19.17
12.40
11.05
11.84
9.96
6.94
11.88
10.81
7.11
6.14
14.43
18.96
7.86
7.79
12.48
7.44
10.01
5.57
7.82
12.75
14.19
18.64
9.24
12.06
16.42
13.20
7.65
12.40
14.22
13.99
20.88
21.15
19.72
13.37
9.31
6.43
15.31
13.25
14.57
14.99
17.63
14.33
13.14
7.99
6.30

Nov
13.83
12.04
21.14
23.87
14.58
15.76
22.08
21.74
11.56
10.30
11.16
14.11
16.21
9.60
8.15
10.71
12.72
13.17
20.75
12.88
8.48
15.02
18.93
7.60
14.58
21.16
13.57
12.26
11.01
17.90
15.36
13.55
12.25
12.87
13.97
17.08
17.08
26.92
27.30
11.77
14.58
10.47
16.94
19.15
21.65
16.39
17.72
14.26
16.74
12.73
9.56

Dec
14.62
15.52
11.29
15.37
14.11
18.94
18.80
9.84
12.16
9.08
12.61
18.31
9.22
7.13
8.68
13.69
7.66
8.54
5.60
6.31
19.07
10.03
8.03
8.18
19.78
15.96
8.74
9.13
7.39
9.49
11.83
8.27
8.49
16.91
16.15
16.12
29.78
18.46
18.59
16.19
19.88
7.83
7.30
9.72
8.07
18.46
11.23
8.24
10.96
6.88
11.01

APPENDIX B

Logarithm of Observed Streamflow Data for 1960-2005


Year
1960
1961
1962
1963
1964
1965
1966
1967
1968
1969
1970
1971
1972
1973
1974
1975
1976
1977
1978
1979
1980
1981
1982
1983
1984
1985
1986
1987
1988
1989
1990
1991
1992
1993
1994
1995
1996
1997
1998
1999
2000
2001
2002
2003
2004
2005
Mean

Jan
0.056
0.052
0.059
0.050
0.056
0.046
0.058
0.065
0.040
0.054
0.054
0.069
0.050
0.045
0.035
0.051
0.049
0.046
0.041
0.037
0.037
0.050
0.040
0.036
0.043
0.054
0.050
0.038
0.040
0.039
0.043
0.036
0.048
0.047
0.056
0.057
0.060
0.068
0.067
0.053
0.064
0.063
0.044
0.048
0.050
0.036
0.050

Feb
0.051
0.044
0.046
0.045
0.047
0.044
0.048
0.054
0.037
0.047
0.034
0.045
0.050
0.041
0.048
0.048
0.038
0.040
0.034
0.038
0.036
0.047
0.036
0.029
0.053
0.052
0.043
0.037
0.049
0.031
0.032
0.034
0.045
0.050
0.055
0.053
0.058
0.073
0.067
0.054
0.059
0.053
0.038
0.047
0.046
0.030
0.045

Mac
0.058
0.045
0.056
0.045
0.050
0.050
0.052
0.047
0.031
0.040
0.036
0.050
0.049
0.041
0.046
0.057
0.042
0.033
0.033
0.036
0.040
0.038
0.041
0.036
0.050
0.053
0.040
0.035
0.045
0.044
0.030
0.043
0.042
0.049
0.056
0.059
0.060
0.075
0.065
0.059
0.067
0.050
0.040
0.043
0.046
0.038
0.047

Apr
0.064
0.051
0.057
0.044
0.052
0.066
0.058
0.060
0.041
0.046
0.045
0.044
0.053
0.055
0.052
0.058
0.044
0.035
0.045
0.043
0.040
0.057
0.052
0.033
0.050
0.047
0.052
0.043
0.045
0.059
0.034
0.049
0.047
0.058
0.056
0.063
0.071
0.074
0.064
0.059
0.061
0.062
0.058
0.048
0.052
0.038
0.052

May
0.055
0.055
0.055
0.044
0.056
0.068
0.045
0.059
0.059
0.060
0.052
0.047
0.059
0.057
0.054
0.058
0.040
0.042
0.050
0.043
0.055
0.059
0.053
0.042
0.055
0.058
0.045
0.044
0.055
0.059
0.055
0.059
0.054
0.061
0.057
0.064
0.067
0.070
0.065
0.063
0.053
0.057
0.061
0.053
0.058
0.045
0.055

Jun
0.046
0.051
0.046
0.046
0.045
0.050
0.052
0.043
0.050
0.050
0.038
0.041
0.053
0.052
0.048
0.051
0.050
0.045
0.035
0.041
0.053
0.048
0.047
0.035
0.055
0.042
0.038
0.038
0.060
0.052
0.044
0.053
0.043
0.049
0.057
0.064
0.068
0.074
0.067
0.054
0.048
0.048
0.050
0.046
0.037
0.044
0.049

Jul
0.057
0.046
0.045
0.045
0.057
0.039
0.053
0.043
0.043
0.037
0.043
0.039
0.043
0.040
0.049
0.057
0.044
0.037
0.036
0.041
0.045
0.040
0.040
0.038
0.044
0.042
0.033
0.039
0.051
0.051
0.039
0.042
0.047
0.052
0.051
0.062
0.062
0.072
0.067
0.048
0.037
0.043
0.047
0.048
0.045
0.039
0.046

Aug
0.049
0.051
0.049
0.053
0.048
0.043
0.054
0.044
0.043
0.044
0.045
0.050
0.043
0.049
0.049
0.047
0.052
0.035
0.032
0.040
0.052
0.039
0.046
0.043
0.041
0.040
0.035
0.053
0.051
0.041
0.033
0.041
0.044
0.044
0.057
0.068
0.064
0.067
0.074
0.052
0.053
0.040
0.048
0.046
0.038
0.043
0.047

Sep
0.058
0.058
0.056
0.060
0.060
0.053
0.057
0.055
0.055
0.042
0.054
0.058
0.052
0.048
0.051
0.058
0.049
0.038
0.039
0.052
0.054
0.053
0.049
0.055
0.043
0.047
0.048
0.060
0.071
0.050
0.046
0.049
0.045
0.054
0.067
0.061
0.062
0.072
0.079
0.056
0.052
0.052
0.052
0.052
0.059
0.034
0.054

Oct
0.057
0.056
0.069
0.070
0.058
0.069
0.072
0.060
0.057
0.059
0.055
0.047
0.059
0.057
0.047
0.044
0.064
0.072
0.049
0.049
0.060
0.048
0.055
0.042
0.049
0.061
0.064
0.071
0.053
0.060
0.068
0.062
0.049
0.060
0.064
0.063
0.075
0.075
0.073
0.062
0.053
0.045
0.066
0.062
0.065
0.065
0.060

Nov
0.063
0.060
0.075
0.079
0.065
0.067
0.077
0.076
0.058
0.056
0.058
0.064
0.067
0.054
0.050
0.057
0.061
0.062
0.075
0.061
0.051
0.065
0.072
0.049
0.065
0.075
0.063
0.060
0.057
0.070
0.066
0.063
0.060
0.061
0.063
0.069
0.069
0.083
0.083
0.059
0.065
0.056
0.069
0.072
0.076
0.068
0.065

Dec
0.065
0.066
0.058
0.066
0.064
0.072
0.072
0.055
0.060
0.053
0.061
0.071
0.053
0.047
0.052
0.063
0.049
0.051
0.042
0.045
0.072
0.055
0.050
0.050
0.073
0.067
0.052
0.053
0.048
0.054
0.059
0.051
0.051
0.069
0.067
0.067
0.086
0.071
0.071
0.067
0.073
0.049
0.048
0.054
0.050
0.071
0.060
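The Markov model in Appendix D draws on four statistics per calendar month j: the mean q̄_j, the standard deviation S_j, the lag-one correlation r_j between month j and the month before it, and the slope b_j. As a minimal Python sketch, assuming the usual Thomas-Fiering estimates (sample standard deviation, January correlated with the previous year's December, and b_j = r_j S_j / S_{j-1}), these could be computed from a year-by-month table such as the one above. The helper names `pearson` and `monthly_parameters` are hypothetical, and the spreadsheet used in this study may differ in detail.

```python
import math
from statistics import mean, stdev


def pearson(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def monthly_parameters(table):
    """Per-month statistics from a year-by-month table (one list of 12 values per year).

    Returns four lists indexed 0..11 (Jan..Dec):
      means[j], sds[j]  sample mean and sample standard deviation of month j
      r[j]              lag-one correlation between month j-1 and month j
      b[j]              slope r[j] * sds[j] / sds[j-1] (assumed Thomas-Fiering form)
    """
    columns = [[year[j] for year in table] for j in range(12)]
    means = [mean(col) for col in columns]
    sds = [stdev(col) for col in columns]
    r, b = [], []
    for j in range(12):
        if j == 0:
            # Assumption: January follows December of the previous year, so
            # the first January and the last December have no partner.
            prev = [year[11] for year in table[:-1]]
            curr = [year[0] for year in table[1:]]
        else:
            prev, curr = columns[j - 1], columns[j]
        r_j = pearson(prev, curr)
        r.append(r_j)
        # For j == 0, sds[j - 1] is sds[-1], i.e. December.
        b.append(r_j * sds[j] / sds[j - 1])
    return means, sds, r, b


# Synthetic 5-year example in which every month rises by 1 from year to year,
# so every correlation and every slope equals 1.
example = [[y + j for j in range(12)] for y in range(5)]
means, sds, r, b = monthly_parameters(example)
```

Applied to the 46 yearly rows above (1960-2005, without the Mean row), the returned means should match the Mean row up to rounding.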

APPENDIX C

Generation of Random Numbers for 2006-2010


Month, i  RAND()    2RAND()-1  erf⁻¹      t(i,j)
Jan-06    0.699645  0.399289   0.370085   1.523379
Feb-06    0.45481   -0.090379  -0.08027   0.886483
Mar-06    0.063732  -0.872536  -1.0558    -0.49313
Apr-06    0.224711  -0.550577  -0.53482   0.243657
May-06    0.236038  -0.527923  -0.50847   0.280915
Jun-06    0.471912  -0.056176  -0.04983   0.929536
Jul-06    0.999341  0.998683   1.443813   3.041859
Aug-06    0.533139  0.066278   0.058805   1.083163
Sep-06    0.095672  -0.808656  -0.91763   -0.29772
Oct-06    0.044674  -0.910651  -1.15355   -0.63136
Nov-06    0.997494  0.994989   1.429319   3.021363
Dec-06    0.407816  -0.184368  -0.16487   0.766834
Jan-07    0.656401  0.312802   0.284724   1.402661
Feb-07    0.32176   -0.35648   -0.32724   0.537217
Mar-07    0.733219  0.466438   0.440226   1.622573
Apr-07    0.724521  0.449041   0.421663   1.596322
May-07    0.401592  -0.196816  -0.17623   0.750771
Jun-07    0.010641  -0.978717  -1.36824   -0.93498
Jul-07    0.096817  -0.806366  -0.91316   -0.2914
Aug-07    0.516508  0.033016   0.029268   1.041391
Sep-07    0.053638  -0.892724  -1.1059    -0.56398
Oct-07    0.222905  -0.554191  -0.53909   0.237618
Nov-07    0.612597  0.225195   0.2023     1.286095
Dec-07    0.663435  0.32687    0.298297   1.421856
Jan-08    0.143889  -0.712222  -0.75074   -0.0617
Feb-08    0.070315  -0.85937   -1.02497   -0.44952
Mar-08    0.523247  0.046495   0.041228   1.058306
Apr-08    0.919276  0.838551   0.978848   2.384299
May-08    0.705168  0.410335   0.381358   1.539321
Jun-08    0.237308  -0.525384  -0.50556   0.28503
Jul-08    0.877403  0.754806   0.819547   2.159015
Aug-08    0.425101  -0.149797  -0.13354   0.81114
Sep-08    0.402188  -0.195624  -0.17514   0.752312
Oct-08    0.338947  -0.322107  -0.29369   0.584661
Nov-08    0.687608  0.375216   0.345832   1.489081
Dec-08    0.014286  -0.971427  -1.34224   -0.89822
Jan-09    0.684203  0.368406   0.339046   1.479484
Feb-09    0.305343  -0.389314  -0.35998   0.490905
Mar-09    0.627906  0.255813   0.230738   1.326313
Apr-09    0.641724  0.283447   0.256729   1.36307
May-09    0.751243  0.502486   0.479699   1.678397
Jun-09    0.729118  0.458237   0.431438   1.610146
Jul-09    0.289185  -0.421629  -0.39299   0.444235
Aug-09    0.954236  0.908473   1.147587   2.622933
Sep-09    0.428914  -0.142173  -0.12667   0.820859
Oct-09    0.264273  -0.471453  -0.44563   0.369778
Nov-09    0.687481  0.374963   0.34558    1.488724
Dec-09    0.765445  0.530889   0.511878   1.723905
Jan-10    0.846072  0.692144   0.720449   2.018868
Feb-10    0.27472   -0.45056   -0.42327   0.401403
Mar-10    0.555255  0.110509   0.098252   1.138949
Apr-10    0.800866  0.601733   0.597223   1.8446
May-10    0.779092  0.558183   0.543827   1.769087
Jun-10    0.847218  0.694435   0.723842   2.023667
Jul-10    0.420992  -0.158017  -0.14097   0.800643
Aug-10    0.996074  0.992148   1.418338   3.005833
Sep-10    0.600695  0.20139    0.180416   1.255146
Oct-10    0.32158   -0.35684   -0.32759   0.536714
Nov-10    0.630127  0.260254   0.234893   1.332189
Dec-10    0.323203  -0.353593  -0.32439   0.541241
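The columns above are consistent with the following reading, which the Python sketch below reproduces: the third column is 2·RAND() − 1, the erf⁻¹ column matches the first six terms of the Maclaurin series of the inverse error function, and t(i,j) = 1 + √2 · erf⁻¹ (for Jan-06, 1 + 1.414214 × 0.370085 ≈ 1.523379). This is inferred from the tabulated numbers rather than stated in the report, and the function names are hypothetical. Two properties of this construction are worth keeping in mind: the six-term series is accurate near zero but underestimates erf⁻¹ as its argument approaches ±1 (rows such as Jul-06 and Nov-06), and because √2 · erf⁻¹(2u − 1) would be a standard normal deviate for an exact erf⁻¹, the added 1 gives t(i,j) a mean of 1 rather than 0.

```python
import math

# Coefficients of the Maclaurin series of the inverse error function,
# without the common factor sqrt(pi)/2:
# erf^-1(x) = sqrt(pi)/2 * (x + pi/12 x^3 + 7 pi^2/480 x^5 + 127 pi^3/40320 x^7
#                           + 4369 pi^4/5806080 x^9 + 34807 pi^5/182476800 x^11 + ...)
_SERIES = [
    1.0,
    math.pi / 12,
    7 * math.pi ** 2 / 480,
    127 * math.pi ** 3 / 40320,
    4369 * math.pi ** 4 / 5806080,
    34807 * math.pi ** 5 / 182476800,
]


def erfinv_series(x):
    """Six-term Maclaurin approximation of erf^-1(x) (truncation inferred from the table)."""
    total = sum(c * x ** (2 * k + 1) for k, c in enumerate(_SERIES))
    return math.sqrt(math.pi) / 2 * total


def t_value(u):
    """Map a uniform RAND() value u in (0, 1) to t(i,j) as tabulated above."""
    x = 2 * u - 1                               # third column, 2RAND()-1
    return 1 + math.sqrt(2) * erfinv_series(x)  # the +1 offset matches the table
```

For example, `t_value(0.699645)` returns about 1.5234, the tabulated Jan-06 value.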

APPENDIX D

Markov Model Streamflow


Model flow (log units) = deterministic component + random component:

$$q_{i,j} = \bar{q}_j + b_j\,(q_{i-1,j-1} - \bar{q}_{j-1}) + S_j\,t_{i,j}\sqrt{1 - r_j^{2}}$$

Columns q(i-1,j-1) and Deterministic belong to the deterministic component, columns t(i,j) and Random to the random component, and q(i,j) is the model flow.

Month, i  q(i-1,j-1)  Deterministic   t(i,j)     Random   q(i,j) (log)
Jan-06    0.050       0.049541669     1.523379   0.013    0.063
Feb-06    0.063       0.045386033     0.886483   0.007    0.053
Mar-06    0.053       0.04653475      -0.49313   -0.004   0.043
Apr-06    0.043       0.051865643     0.243657   0.002    0.054
May-06    0.054       0.054889433     0.280915   0.002    0.057
Jun-06    0.057       0.048803168     0.929536   0.007    0.055
Jul-06    0.055       0.046082272     3.041859   0.022    0.068
Aug-06    0.068       0.04726108      1.083163   0.008    0.055
Sep-06    0.055       0.053859993     -0.29772   -0.002   0.052
Oct-06    0.052       0.059642058     -0.63136   -0.005   0.055
Nov-06    0.055       0.065034911     3.021363   0.024    0.089
Dec-06    0.089       0.059661808     0.766834   0.007    0.067
Jan-07    0.067       0.049559131     1.402661   0.012    0.062
Feb-07    0.062       0.045384746     0.537217   0.004    0.050
Mar-07    0.050       0.046529892     1.622573   0.013    0.060
Apr-07    0.060       0.051883571     1.596322   0.014    0.066
May-07    0.066       0.054896185     0.750771   0.005    0.060
Jun-07    0.060       0.04880782      -0.93498   -0.007   0.042
Jul-07    0.042       0.046063986     -0.2914    -0.002   0.044
Aug-07    0.044       0.047223647     1.041391   0.007    0.055
Sep-07    0.055       0.053859658     -0.56398   -0.004   0.049
Oct-07    0.049       0.059640286     0.237618   0.002    0.062
Nov-07    0.062       0.065039039     1.286095   0.010    0.075
Dec-07    0.075       0.059650993     1.421856   0.013    0.073
Jan-08    0.073       0.049565308     -0.0617    -0.001   0.049
Feb-08    0.049       0.04536888      -0.44952   -0.004   0.042
Mar-08    0.042       0.046516145     1.058306   0.009    0.055
Apr-08    0.055       0.051878774     2.384299   0.021    0.073
May-08    0.073       0.05490011      1.539321   0.011    0.065
Jun-08    0.065       0.048815617     0.28503    0.002    0.051
Jul-08    0.051       0.046075966     2.159015   0.015    0.062
Aug-08    0.062       0.047251163     0.81114    0.006    0.053
Sep-08    0.053       0.053858046     0.752312   0.006    0.060
Oct-08    0.060       0.059649041     0.584661   0.005    0.064
Nov-08    0.064       0.065040692     1.489081   0.012    0.077
Dec-08    0.077       0.059652259     -0.89822   -0.008   0.051
Jan-09    0.051       0.049543394     1.479484   0.013    0.062
Feb-09    0.062       0.045385559     0.490905   0.004    0.049
Mar-09    0.049       0.046529249     1.326313   0.011    0.057
Apr-09    0.057       0.051881059     1.36307    0.012    0.064
May-09    0.064       0.054895021     1.678397   0.012    0.066
Jun-09    0.066       0.048816984     1.610146   0.012    0.060
Jul-09    0.060       0.046088968     0.444235   0.003    0.049
Aug-09    0.049       0.047231941     2.622933   0.019    0.066
Sep-09    0.066       0.053870927     0.820859   0.006    0.060
Oct-09    0.060       0.059649508     0.369778   0.003    0.063
Nov-09    0.063       0.065039672     1.488724   0.012    0.077
Dec-09    0.077       0.059652256     1.723905   0.016    0.076
Jan-10    0.076       0.049568162     2.018868   0.017    0.067
Feb-10    0.067       0.045391438     0.401403   0.003    0.049
Mar-10    0.049       0.046528015     1.138949   0.009    0.056
Apr-10    0.056       0.05187947      1.8446     0.016    0.068
May-10    0.068       0.05489742      1.769087   0.012    0.067
Jun-10    0.067       0.048817884     2.023667   0.014    0.063
Jul-10    0.063       0.046093027     0.800643   0.006    0.052
Aug-10    0.052       0.047235947     3.005833   0.021    0.069
Sep-10    0.069       0.053873657     1.255146   0.010    0.064
Oct-10    0.064       0.0596524       0.536714   0.004    0.064
Nov-10    0.064       0.065040467     1.332189   0.011    0.076
Dec-10    0.076       0.059651281     0.541241   0.005    0.065
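One generation step of the table above, q(i,j) = deterministic + random, can be sketched in Python as follows. It assumes the monthly parameters (q̄_j, S_j, r_j, b_j) are already known, for example estimated from the logarithms in Appendix B; the parameter layout `(mean, sd, r, b)` and the function names are hypothetical. Each new value becomes q(i-1,j-1) for the following month, which is why every q(i-1,j-1) entry repeats the previous row's q(i,j).

```python
import math


def markov_step(q_prev, mean_j, mean_prev, b_j, s_j, r_j, t):
    """One month of the lag-one Markov model (log units).

    deterministic = mean_j + b_j * (q_prev - mean_prev)
    random        = s_j * t * sqrt(1 - r_j**2)
    """
    deterministic = mean_j + b_j * (q_prev - mean_prev)
    random_part = s_j * t * math.sqrt(1.0 - r_j ** 2)
    return deterministic + random_part


def generate(q_start, params, t_values):
    """Generate a monthly sequence starting in January.

    params: 12 tuples (mean, sd, r, b) for Jan..Dec (hypothetical layout).
    t_values: one random deviate t(i,j) per generated month.
    """
    series = []
    q_prev = q_start
    for n, t in enumerate(t_values):
        j = n % 12
        mean_j, s_j, r_j, b_j = params[j]
        mean_prev = params[j - 1][0]  # params[-1] is December when j == 0
        q_prev = markov_step(q_prev, mean_j, mean_prev, b_j, s_j, r_j, t)
        series.append(q_prev)
    return series
```

As a check against the Jan-06 row, the deterministic component 0.049541669 plus the random component 0.013 gives about 0.0625, tabulated as 0.063.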

APPENDIX E

Performance Evaluation Procedure of Markov Model


Month, i  Actual (m³/s)  Model (m³/s)  APE (%)   Sq. error   Chi-sq. term
Jan-06    13.08          13.533        3.462     0.205001    0.015148
Feb-06    8.12           9.077         11.786    0.915831    0.100896
Mar-06    6.11           5.641         7.681     0.220272    0.039051
Apr-06    29.72          9.604         67.685    404.6583    42.13488
May-06    29.22          10.807        63.015    339.0362    31.37171
Jun-06    17.82          10.210        42.706    57.91601    5.672621
Jul-06    7.94           16.422        106.822   71.93914    4.380738
Aug-06    9.95           10.014        0.644     0.00411     0.00041
Sep-06    28.05          8.642         69.192    376.688     43.59034
Oct-06    17.63          9.821         44.298    61.00476    6.211446
Nov-06    17.72          32.326        82.432    213.3443    6.599874
Dec-06    11.23          15.849        41.109    21.31845    1.345112
Jan-07    9.05           13.020        43.872    15.7641     1.210723
Feb-07    6.80           7.992         17.535    1.421735    0.177887
Mar-07    7.62           12.065        58.336    19.76006    1.637769
Apr-07    13.46          15.262        13.384    3.245474    0.212657
May-07    12.05          12.299        2.069     0.06214     0.005052
Jun-07    11.38          5.514         51.550    34.41474    6.241801
Jul-07    13.06          6.059         53.607    49.01556    8.089859
Aug-07    8.95           9.874         10.339    0.855987    0.086689
Sep-07    9.36           7.877         15.846    2.199906    0.27929
Oct-07    14.33          13.038        9.013     1.668293    0.127953
Nov-07    14.26          21.161        48.391    47.61862    2.250341
Dec-07    8.24           19.599        137.858   129.0379    6.58374
Jan-08    11.29          7.719         31.625    12.74858    1.651481
Feb-08    6.76           5.384         20.351    1.89258     0.3515
Mar-08    9.58           10.032        4.721     0.20453     0.020387
Apr-08    12.86          19.404        50.886    42.82308    2.206928
May-08    9.73           15.098        55.171    28.81687    1.908638
Jun-08    12.28          8.379         31.768    15.21902    1.816363
Jul-08    10.89          13.007        19.439    4.481376    0.344538
Aug-08    7.83           9.219         17.734    1.928084    0.209153
Sep-08    9.85           12.127        23.119    5.185165    0.427585
Oct-08    13.14          14.502        10.398    1.865887    0.12866
Nov-08    16.74          22.302        33.224    30.93221    1.386991
Dec-08    10.96          8.531         22.161    5.899487    0.691526
Jan-09    9.73           13.343        37.128    13.05078    0.97813
Feb-09    9.67           7.856         18.764    3.292187    0.41909
Mar-09    15.10          10.970        27.352    17.05788    1.554974
Apr-09    13.72          14.160        3.205     0.193326    0.013653
May-09    8.75           15.629        78.619    47.32323    3.027875
Jun-09    7.31           12.423        69.942    26.14056    2.104243
Jul-09    8.05           7.800         3.111     0.062732    0.008043
Aug-09    9.03           15.337        69.845    39.77884    2.593644
Sep-09    10.08          12.388        22.897    5.327024    0.430014
Oct-09    7.99           13.587        70.046    31.32252    2.305389
Nov-09    12.73          22.299        75.168    91.56381    4.106203
Dec-09    6.88           21.522        212.820   214.3895    9.961389
Jan-10    6.83           15.834        131.827   81.06853    5.119965
Feb-10    4.86           7.597         56.317    7.491085    0.98606
Mar-10    4.36           10.312        136.513   35.42605    3.435427
Apr-10    7.18           16.493        129.701   86.72375    5.258356
May-10    6.17           15.985        159.084   96.34386    6.026957
Jun-10    7.51           13.908        85.191    40.93266    2.943131
Jul-10    7.45           8.744         17.405    1.680274    0.192169
Aug-10    8.04           16.912        110.346   78.70912    4.65409
Sep-10    7.16           14.091        96.798    48.03522    3.408991
Oct-10    6.30           14.296        126.927   63.94217    4.472611
Nov-10    9.56           21.417        124.026   140.5846    6.564209
Dec-10    11.01          14.672        33.261    13.41047    0.914016

MAPE = 53.659 %   RMSE = 7.29 m³/s   Chi-square = 250.9884

APE is the absolute percentage error |actual − model| / actual × 100, the squared error is (actual − model)², and the chi-square term is (actual − model)² / model. Over the 60 months, MAPE is the mean APE, RMSE is the square root of the mean squared error, and the chi-square statistic is the sum of the chi-square terms.
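The summary values can be recomputed from the Actual and Model columns with the Python sketch below. It encodes the relationships the tabulated values satisfy (for Jan-06, |13.08 − 13.533| / 13.08 × 100 ≈ 3.46 %, (13.08 − 13.533)² ≈ 0.205 and 0.205 / 13.533 ≈ 0.0151), with the model flow used as the expected value in the chi-square term. The function name `evaluate` is hypothetical; the same function applies unchanged to the ARIMA results in Appendix G.

```python
import math


def evaluate(actual, model):
    """Return (MAPE in %, RMSE, chi-square) for paired actual and model flows."""
    if len(actual) != len(model) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual)
    # Per-month quantities, as in the columns of the table.
    ape = [abs(a - m) / a * 100.0 for a, m in zip(actual, model)]
    sq_err = [(a - m) ** 2 for a, m in zip(actual, model)]
    # Model flow treated as the expected value, as the tabulated terms imply.
    chi_terms = [(a - m) ** 2 / m for a, m in zip(actual, model)]
    mape = sum(ape) / n
    rmse = math.sqrt(sum(sq_err) / n)
    chi_square = sum(chi_terms)
    return mape, rmse, chi_square
```

Passing the 60 actual and model flows of this appendix should return values close to MAPE = 53.659 %, RMSE = 7.29 m³/s and chi-square = 250.99; small differences come from the rounding of the tabulated model flows.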

APPENDIX F

ARIMA Model Streamflow


Month, i  Actual (m³/s)  Model (m³/s)  Residual   Fit
Jan-06    13.08          9.6732        *          *
Feb-06    8.12           7.1884        *          *
Mar-06    6.11           7.2612        *          *
Apr-06    29.72          9.0165        *          *
May-06    29.22          9.9281        *          *
Jun-06    17.82          7.6110        *          *
Jul-06    7.94           6.7046        *          *
Aug-06    9.95           7.0851        *          *
Sep-06    28.05          9.5168        *          *
Oct-06    17.63          12.2889       *          *
Nov-06    17.72          15.2005       *          *
Dec-06    11.23          12.3581       *          *
Jan-07    9.05           7.9227        *          *
Feb-07    6.80           6.6970        -1.57988   7.5299
Mar-07    7.62           7.1341        -1.39072   7.6507
Apr-07    13.46          8.9949        -1.05700   9.4570
May-07    12.05          9.9369        -0.14946   10.2195
Jun-07    11.38          7.6286        1.10867    7.3913
Jul-07    13.06          6.7248        -1.04180   7.8818
Aug-07    8.95           7.1060        1.04920    7.2208
Sep-07    9.36           9.5379        0.26505    11.0050
Oct-07    14.33          12.3101       -2.99026   13.4603
Nov-07    14.26          15.2217       -3.58500   15.6250
Dec-07    8.24           12.3794       4.03841    11.4816
Jan-08    11.29          7.9439        1.99786    9.9828
Feb-08    6.76           6.7182        -1.67458   8.3305
Mar-08    9.58           7.1553        2.57792    7.7005
Apr-08    12.86          9.0161        0.10621    10.9538
May-08    9.73           9.9581        -1.42906   11.4991
Jun-08    12.28          7.6499        -1.18154   7.8015
Jul-08    10.89          6.7460        -1.02019   7.3802
Aug-08    7.83           7.1273        0.57523    7.2148
Sep-08    9.85           9.5592        -0.37209   10.9121
Oct-08    13.14          12.3314       3.89633    13.0737
Nov-08    16.74          15.2429       3.01737    18.1226
Dec-08    10.96          12.4006       -4.56349   15.8535
Jan-09    9.73           7.9651        -1.32522   9.3852
Feb-09    9.67           6.7394        -0.91615   7.2661
Mar-09    15.10          7.1765        -1.58516   7.9552
Apr-09    13.72          9.0373        -3.54478   9.5648
May-09    8.75           9.9794        -3.07188   9.2719
Jun-09    7.31           7.6711        1.04294    5.7171
Jul-09    8.05           6.7673        -0.27357   6.7266
Aug-09    9.03           7.1485        2.58702    6.7039
Sep-09    10.08          9.5804        1.07675    11.0133
Oct-09    7.99           12.3526       4.26644    13.5536
Nov-09    12.73          15.2642       5.43986    18.4266
Dec-09    6.88           12.4218       -1.30056   16.6716
Jan-10    6.83           7.9864        -0.79690   11.1109
Feb-10    4.86           6.7607        -1.45611   8.5383
Mar-10    4.36           7.1978        -0.78662   8.6866
Apr-10    7.18           9.0586        -1.79769   10.5277
May-10    6.17           10.0006       -0.43999   10.7900
Jun-10    7.51           7.6923        -1.69829   8.1383
Jul-10    7.45           6.7885        3.62122    7.4688
Aug-10    8.04           7.1698        -1.95910   9.4791
Sep-10    7.16           9.6017        1.06042    11.3296
Oct-10    6.30           12.3739       -3.37562   14.6156
Nov-10    9.56           15.2854       -2.06698   16.6470
Dec-10    11.01          12.4431       1.18330    12.9267

Coefficients (from the Coefficient column): 0.289364, 0.878761, 0.955283

Note: the Residual and Fit columns appear to be Minitab output for the fitting period rather than for the forecast month in the same row. Row i holds the fitted value and residual for the i-th month of the observed record beginning in January 1960, so Fit + Residual reproduces the observed flow; for example, row 22 gives 13.4603 − 2.99026 = 10.47 m³/s, the observed flow for October 1961. The first 13 values are not available (*).

APPENDIX G

Performance Evaluation Procedure of ARIMA Model


Month, i  Actual (m³/s)  Model (m³/s)  APE (%)   Sq. error   Chi-sq. term
Jan-06    13.08          9.6732        26.046    11.606      1.200
Feb-06    8.12           7.1884        11.473    0.868       0.121
Mar-06    6.11           7.2612        18.841    1.325       0.183
Apr-06    29.72          9.0165        69.662    428.633     47.538
May-06    29.22          9.9281        66.023    372.178     37.487
Jun-06    17.82          7.6110        57.290    104.224     13.694
Jul-06    7.94           6.7046        15.559    1.526       0.228
Aug-06    9.95           7.0851        28.793    8.208       1.158
Sep-06    28.05          9.5168        66.072    343.480     36.092
Oct-06    17.63          12.2889       30.303    28.547      2.323
Nov-06    17.72          15.2005       14.215    6.344       0.417
Dec-06    11.23          12.3581       10.029    1.269       0.103
Jan-07    9.05           7.9227        12.457    1.271       0.160
Feb-07    6.80           6.6970        1.515     0.011       0.002
Mar-07    7.62           7.1341        6.377     0.236       0.033
Apr-07    13.46          8.9949        33.173    19.937      2.217
May-07    12.05          9.9369        17.536    4.465       0.449
Jun-07    11.38          7.6286        32.965    14.073      1.845
Jul-07    13.06          6.7248        48.508    40.135      5.968
Aug-07    8.95           7.1060        20.594    3.397       0.478
Sep-07    9.36           9.5379        1.901     0.032       0.003
Oct-07    14.33          12.3101       14.095    4.080       0.331
Nov-07    14.26          15.2217       6.744     0.925       0.061
Dec-07    8.24           12.3794       50.235    17.134      1.384
Jan-08    11.29          7.9439        29.638    11.196      1.409
Feb-08    6.76           6.7182        0.618     0.002       0.000
Mar-08    9.58           7.1553        25.310    5.879       0.822
Apr-08    12.86          9.0161        29.890    14.776      1.639
May-08    9.73           9.9581        2.345     0.052       0.005
Jun-08    12.28          7.6499        37.705    21.438      2.802
Jul-08    10.89          6.7460        38.053    17.172      2.546
Aug-08    7.83           7.1273        8.975     0.494       0.069
Sep-08    9.85           9.5592        2.948     0.084       0.009
Oct-08    13.14          12.3314       6.129     0.648       0.053
Nov-08    16.74          15.2429       8.943     2.241       0.147
Dec-08    10.96          12.4006       13.144    2.075       0.167
Jan-09    9.73           7.9651        18.138    3.115       0.391
Feb-09    9.67           6.7394        30.306    8.588       1.274
Mar-09    15.10          7.1765        52.473    62.781      8.748
Apr-09    13.72          9.0373        34.130    21.927      2.426
May-09    8.75           9.9794        14.050    1.511       0.151
Jun-09    7.31           7.6711        4.940     0.130       0.017
Jul-09    8.05           6.7673        15.934    1.645       0.243
Aug-09    9.03           7.1485        20.836    3.540       0.495
Sep-09    10.08          9.5804        4.956     0.250       0.026
Oct-09    7.99           12.3526       54.601    19.032      1.541
Nov-09    12.73          15.2642       19.907    6.422       0.421
Dec-09    6.88           12.4218       80.550    30.712      2.472
Jan-10    6.83           7.9864        16.931    1.337       0.167
Feb-10    4.86           6.7607        39.109    3.613       0.534
Mar-10    4.36           7.1978        65.087    8.053       1.119
Apr-10    7.18           9.0586        26.164    3.529       0.390
May-10    6.17           10.0006       62.085    14.674      1.467
Jun-10    7.51           7.6923        2.428     0.033       0.004
Jul-10    7.45           6.7885        8.848     0.434       0.064
Aug-10    8.04           7.1698        10.824    0.757       0.106
Sep-10    7.16           9.6017        34.102    5.962       0.621
Oct-10    6.30           12.3739       96.411    36.892      2.981
Nov-10    9.56           15.2854       59.889    32.780      2.145
Dec-10    11.01          12.4431       13.016    2.054       0.165

MAPE = 27.497 %   RMSE = 5.416 m³/s   Chi-square = 191.114

The columns and summary statistics are defined as in Appendix E (APE = |actual − model| / actual × 100; squared error = (actual − model)²; chi-square term = (actual − model)² / model).