
TIME SERIES MODELING USING MARKOV AND ARIMA MODELS

MOHD KHAIRUL IDLAN BIN MUHAMMAD

A report submitted in partial fulfillment of the
requirements for the award of the degree of
Master of Engineering (Civil – Hydraulic & Hydrology)

Faculty of Civil Engineering
Universiti Teknologi Malaysia

JANUARY 2012


DEDICATION

Special dedication to my beloved father and mother
Mr. Muhammad bin Ismail
and
Madam Siti Maznah binti Abdullah
and
My inspiration…

Jazakumullahu khairan for all love and inspiration
throughout the entire creation of this thesis.


ACKNOWLEDGEMENT

Assalammualaikum w.b.t.

Alhamdulillah, all praise to Allah S.W.T for the gift of life and what I have achieved
today.

Appreciation goes to my family for their prayers, moral and financial support. May
Allah reward you abundantly.

My sincere and deepest gratitude goes to my supervisor, Dr. Sobri Harun for his
guidance, encouragement and support in completing this master project.

My gratitude to Dr. Muhammad Askari for his invaluable suggestions, guidance, and
encouragement.

Last but not least, to all my lecturers, classmates and friends: their help and support are
really appreciated and will be remembered forever, InsyaALLAH. Thank you all.

ABSTRACT

Streamflow forecasting plays an important role in flood mitigation and in water resources allocation and management. Inaccurate forecasting causes losses to water resources managers and users. The suitability of a forecasting method depends on the type and number of available data. Thus, the objectives of this study are to propose streamflow forecasting methods using Markov and ARIMA models and to inspect the forecasting accuracy of the Markov and ARIMA models. Streamflow data of Sungai Bernam, Selangor were used. Minitab and Microsoft Excel were used to build the ARIMA and Markov models respectively. The performance evaluation criteria used in this study were the Mean Absolute Percentage Error (MAPE), the Root Mean Squared Error (RMSE) and the Chi-square test of normality, which were applied to inspect the forecasting accuracy of the different models. The tentative model that best fits the criteria and meets the requirements for the ARIMA model is ARIMA(1,1,1)(0,1,1)₁₂. According to the performance evaluation criteria, the ARIMA model performs better in forecasting than the Markov model in this study. Therefore, the ARIMA model is able to predict accurately the future monthly streamflow of Sungai Bernam.

ABSTRAK

Streamflow forecasting plays an important role in flood control and water management. Inaccurate forecasting causes losses to water resources managers as well as to users. The suitability of a forecasting method depends on the type and amount of available data. Hence, the objectives of this study are to propose streamflow forecasting methods using Markov and ARIMA models and to examine the forecasting accuracy of the Markov and ARIMA models. Streamflow data of Sungai Bernam were used. Minitab was used to build the ARIMA model and Microsoft Excel was used to build the Markov model. The performance evaluation criteria used in this study were the Mean Absolute Percentage Error (MAPE), the Root Mean Squared Error (RMSE) and the Chi-squared test, which were applied to examine the forecasting accuracy of the different models. The tentative model that best fits the criteria and meets the requirements for the ARIMA model is ARIMA(1,1,1)(0,1,1)₁₂. Based on the performance evaluation criteria, the ARIMA model performs better in forecasting than the Markov model. Therefore, the ARIMA model is able to forecast accurately the future streamflow of Sungai Bernam.

TABLE OF CONTENTS

CHAPTER   TITLE

DECLARATION
DEDICATION
ACKNOWLEDGEMENT
ABSTRACT
ABSTRAK
TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
LIST OF APPENDICES
LIST OF ABBREVIATIONS

1   INTRODUCTION
    1.1   Background of Study
    1.2   Problem Statement
    1.3   Justification of the Study
    1.4   Aim and Objectives
    1.5   Scope of Study

2   LITERATURE REVIEW
    2.1   Introduction
    2.2   Time Series Model
    2.3   Forecasting Time Series
    2.4   Streamflow Forecasting Method
          2.4.1   Markov Model
          2.4.2   ARIMA Theory
          2.4.3   ARIMA Algorithms
                  2.4.3.1   AR Model
                  2.4.3.2   MA Model
                  2.4.3.3   ARMA Model
                  2.4.3.4   ARIMA Model
    2.5   Reviews on Markov Model
    2.6   Review on ARIMA Model
    2.7   Concluding Remarks

3   METHODOLOGY
    3.1   Introduction
    3.2   Markov Model
          3.2.1   Statistical Parameters of Historical Data
          3.2.2   Identification of Distribution
          3.2.3   Generation of Random Numbers
          3.2.4   Formulation of the Markov Model
    3.3   ARIMA Model
          3.3.1   Model Assumptions
                  3.3.1.1   Data Stationarity
                  3.3.1.2   Normal Distribution
                  3.3.1.3   Outlier
                  3.3.1.4   Missing Data
          3.3.2   Model Procedure
                  3.3.2.1   Model Identification
                  3.3.2.2   Parameter Estimation
                  3.3.2.3   Diagnostic Checking
          3.3.3   Minitab Procedure
    3.4   Model Comparison and Forecast Evaluation Measures

4   RESULTS AND DISCUSSION
    4.1   Introduction
    4.2   Estimation of Missing Data Values
    4.3   Markov Model
          4.3.1   Statistical Parameters of Historical Data
          4.3.2   Identification of Distribution
          4.3.3   Generation of Random Numbers
          4.3.4   Streamflow Generation of Markov Model
          4.3.5   Validation of Markov Model
    4.4   ARIMA Model
          4.4.1   Model Identification
          4.4.2   Parameter Estimation
          4.4.3   Diagnostic Checking
          4.4.4   Streamflow Generation of ARIMA Model
          4.4.5   Validation of ARIMA Model
    4.5   Model Comparison and Forecast Evaluation Measures

5   CONCLUSION AND RECOMMENDATIONS
    5.1   Conclusion
    5.2   Recommendations

REFERENCES
APPENDICES A-G

LIST OF TABLES

TABLE NO.   TITLE

4.1    Parameters of Monthly Historical Data
4.2    Logarithmic Values of Observed Streamflow Data for 1960-1970
4.3    Generation of Random Number for Year 2006
4.4    Model Streamflow for Year 2006
4.5    Accuracy of the Markov Model
4.6    General Theoretical ACF and PACF of ARIMA Models
4.7    Final Estimates of Parameter for ARIMA(1,1,1)(1,1,1)₁₂
4.8    Final Estimates of Parameter for ARIMA(1,1,1)(0,1,1)₁₂
4.9    Modified Box-Pierce (Ljung-Box) Chi-Square Statistic for ARIMA(1,1,1)(1,1,1)₁₂
4.10   Modified Box-Pierce (Ljung-Box) Chi-Square Statistic for ARIMA(1,1,1)(0,1,1)₁₂
4.11   LSE and RMSE Test for ARIMA Tentative Model
4.12   Model Streamflow for Year 2006-2007
4.13   Accuracy of the ARIMA Model
4.14   Accuracy of the Model

LIST OF FIGURES

FIGURE NO.   TITLE

2.1    Value of time series with forecast function at 50% probability limits
3.1    Flowchart of ARIMA modeling
4.1    Linear Regression of Two Streamflow for 1962
4.2    Linear Regression of Rainfall and Streamflow
4.3    Linear Regression of Two Streamflow for 1993
4.4    Descriptive Statistics of Sungai Bernam Data
4.5    Probability Density Function
4.6    Cumulative Distribution Function
4.7    Cumulative Distribution Function of the Log-normal Distribution
4.8    Comparison of Observed and Markov Flow
4.9    Flow Diagram of Box-Jenkins Methodology
4.10   Non-stationary data of Sg. Bernam streamflow
4.11   Stationary data of Sg. Bernam streamflow
4.12   ACF after non-seasonal difference
4.13   PACF after non-seasonal difference
4.14   ACF after seasonal difference
4.15   PACF after seasonal difference
4.16   Comparison of Observed and ARIMA Model Flow
4.17   Model Comparison
4.18   Streamflow for actual and model

LIST OF APPENDICES

APPENDIX   TITLE

A   Streamflow Data of Sungai Bernam 1960-2010
B   Logarithmic of Observed Streamflow Data for 1960-2005
C   Generation of Random Number for Year 2006-2010
D   Markov Model Streamflow
E   Performance Evaluation Procedure of Markov Model
F   ARIMA Model Streamflow
G   Performance Evaluation Procedure of ARIMA Model

LIST OF ABBREVIATIONS

ACF     -   Autocorrelation Function
AD      -   Anderson Darling
AR      -   Autoregressive
ARIMA   -   Autoregressive Integrated Moving Average
DF      -   Degree of Freedom
K-S     -   Kolmogorov-Smirnov
LSE     -   Least Squared Error
MA      -   Moving Average
MAPE    -   Mean Absolute Percentage Error
PACF    -   Partial Autocorrelation Function
RMSE    -   Root Mean Square Error
R²      -   Coefficient of Determination
S       -   Standard Deviation
SE      -   Standard Error
Sg.     -   Sungai
χ²      -   Chi-square

CHAPTER 1

INTRODUCTION

1.1 Background of Study

According to Bowerman and O'Connell (1993), predictions of future events and conditions are called forecasts, and the act of making such predictions is called forecasting. In many types of organizations, forecasting is very important because predictions of future events must be incorporated into the decision-making process. In forecasting events that will occur in the future, information concerning events that have occurred in the past must be relied upon. In order to prepare forecasts, past data need to be analyzed to identify a pattern that can be used to describe them. Then, this pattern is extrapolated or extended into the future. This forecasting technique rests on the assumption that the pattern that has been identified will continue in the future. If the identified data pattern does not persist in the future, the forecasting technique used is likely to produce inaccurate predictions (Bowerman and O'Connell, 1993).

Most forecasting problems involve the use of time series data. A time series is formed from measurements of a variable taken at regular intervals over time. It is a stochastic process which amounts to a sequence of random variables. Time series plots can reveal patterns such as randomness, trends, level shifts, periods or cycles, unusual observations, or a combination of patterns. A time series can be used to forecast future values of the series from its current and past values. In this study, time series are used to prepare forecasts.

The hydrologic data of streamflows fall under the category of time series (Gupta, 1989), and can be used to forecast streamflow (Box and Jenkins, 1976). Streamflow forecasting plays an important role in flood mitigation and in water resources allocation and management. Short-term forecasting, such as hourly and daily forecasting, is crucial for flood warning and defense, while long-term forecasting, which is based on monthly, seasonal or annual time series, is very useful for reservoir operation, irrigation management decisions, drought mitigation and managing river treaties (Shalamu, 2009). In water management, a high-quality streamflow forecast and efficient use of this forecast can give considerable economic and social benefits.

A considerable number of forecasting models and methodologies have been developed and applied in streamflow forecasting due to the importance of hydrologic forecasting. Recently, due to the increase in data availability from metering stations, real-time data retrieval and increasing computational capability with the development of more robust methods and computer techniques, time series models have become quite popular in streamflow forecasting (Wang, 2006). In this study, Markov and ARIMA models have been used in the modeling of monthly streamflow processes.

The Markov process considers that the value of streamflow at one time is correlated with the value of the streamflow at an earlier period (i.e. a serial correlation or autocorrelation exists in the time series). In a first-order Markov process, this correlation exists between two successive values of the events (Gupta, 1989). The first-order Markov model states that the value of a variable x in one time period is dependent on the value of x in the preceding time period plus a random component. Thus, the synthetic streamflow represents a sequence of numbers, each of which consists of two parts, a deterministic part and a random part (Gupta, 1989).

The Autoregressive Integrated Moving Average (ARIMA) model, often called the Box-Jenkins time series method, has good accuracy for short-term forecasting but lower accuracy for long-term forecasting. Usually, the forecast tends to become flat over a sufficiently long period. The ARIMA model ignores independent variables completely, and uses past and present values of the dependent variable to produce accurate short-term forecasts (Hendranata, 2003). ARIMA is suitable when the observations of the time series are statistically dependent on one another. The purpose of this model is to determine good statistical relationships between the variable being predicted and the historical values of that variable, so that forecasting can be performed with the model (Hendranata, 2003).

1.2 Problem Statement

Many time series forecasting methods can be used to predict streamflow. However, not all of these methods produce accurate forecasts. The suitability of a forecasting method depends on the type and number of available data. Inaccurate forecasting causes losses to water resources managers and users. Thus, the ARIMA and Markov models must be inspected to determine their ability to provide accurate and reasonable monthly streamflow forecasts. Through statistical methods, the accuracy of both models for forecasting monthly streamflow will be tested and evaluated.

1.3 Justification of the Study

Monthly streamflow forecasting is an integral part of drought, irrigation and reservoir operation management. It is also very important to develop a stochastic hydrologic model to generate monthly streamflows and thus to estimate future streamflows. Stochastic data generation aims to provide alternative hydrologic data sequences that are likely to occur in the future, in order to assess the reliability of alternative system designs and policies and to understand the variability in future system performance. The ARIMA modeling approach and the Markov model were applied to the data set to further investigate the behavioral change in the streamflow. The result of the study can be used as a reference guideline for flood control, as Markov and ARIMA models are best suited for short-term forecasting. Forecasting can also be used to give warning of extreme events such as drought (Joomizan, 2010). Through these models, it is hoped that the problem of water shortage can be reduced.

1.4 Aim and Objectives

The aim of this study is to forecast streamflow by using an appropriate time series modeling approach. To achieve this aim, the following objectives have been identified:

1. To propose streamflow forecasting methods using Markov and ARIMA models.
2. To inspect the forecasting accuracy of the Markov and ARIMA models.

1.5 Scope of Study

In this study, two time series models, the Markov model and the ARIMA model, are used to predict the behavior of streamflow. Streamflow data of Sungai Bernam, Selangor for the period 1960 to 2010 were used for the application of the models. The streamflow data were obtained from station Sg. Bernam at Tanjung Malim (Station No. 3615412). The study area, located in southeast Perak and northeast Selangor, is a semi-developed area of 186 km². The data, which are monthly streamflows, were collected from the Department of Irrigation and Drainage, Kuala Lumpur. The computer program used for the ARIMA model is Minitab 15, and Microsoft Excel is used for the Markov model.

CHAPTER 2

LITERATURE REVIEW

2.1 Introduction

Generally, surface water hydrology is the basis of engineering design and of sources of water. High streamflow may cause disasters such as floods and erosion. Short-term forecasting is needed to control this. Meanwhile, low streamflow can disrupt water supply to domestic and industrial users, the generation of hydroelectric power and irrigation. Therefore, long-term forecasting is useful to prevent this problem. Here, the ability to forecast streamflow accurately can be used in water flow management and flood control.

Modeling and forecasting of time series has long been practiced using different statistical methods. Commonly used time series forecasting models are ARIMA, exponential smoothing, moving average, regression analysis, and Fourier series analysis. In this study, Markov and ARIMA models are used to predict monthly streamflow.

2.2 Time Series Model

A time series is a time-oriented or chronological sequence of observations on a variable of interest (Montgomery et al., 2008). The time can be a discrete value, a time interval or a continuous function. The hydrologic data of streamflows, precipitation, groundwater or lake levels, water temperatures, or oxygen concentration fall under the category of time series. These data can be deterministic, random, or a combination of the two (Gupta, 1989).

Many conventional statistical methods deal with models in which the observations are assumed to be independent. However, a great deal of data in business, economics, engineering and the natural sciences occur in the form of time series in which observations are dependent. The systematic approach available for answering the mathematical and statistical questions posed by these series of dependent observations is called time series analysis. The objective of time series analysis is generally to understand and identify the stochastic process that produced the observed series and then to forecast future values of the series from past values alone (Akgun, 2003). Time series models have become popular since the publication of the book by Box and Jenkins (1970) and the subsequent development of computer software for applying these models (Bell, 1984).

The analysis of a time series in the time domain is performed by a parameter known as the serial correlation coefficient or the autocorrelation coefficient. This parameter indicates the dependence between successive values of a time series. The coefficient is determined for successive values (elements) and also for elements that are various time intervals apart; this interval is known as the lag period. A graph of the autocorrelation coefficient against the lag period is known as the correlogram. If a correlogram shows zero or nearly zero values for all lag periods, the process is purely random. A value close to 1 suggests a dominating deterministic process (Gupta, 1989).
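As a minimal sketch of the autocorrelation coefficient and correlogram described above, the following Python code computes the lag-k coefficient with one common sample estimator (lagged cross-products of deviations from the mean, divided by the total sum of squared deviations). The function names and the two synthetic test series are illustrative assumptions, not data or procedures from this study.

```python
import random


def autocorrelation(series, lag):
    """Lag-k serial (auto)correlation coefficient of a sequence of numbers."""
    n = len(series)
    if not 0 <= lag < n:
        raise ValueError("lag must be between 0 and len(series) - 1")
    mean = sum(series) / n
    dev = [x - mean for x in series]
    denom = sum(d * d for d in dev)
    if denom == 0:
        raise ValueError("series is constant; autocorrelation is undefined")
    # Sum of products of deviations that are `lag` steps apart.
    num = sum(dev[t] * dev[t + lag] for t in range(n - lag))
    return num / denom


def correlogram(series, max_lag):
    """Autocorrelation coefficients for lags 0..max_lag (the correlogram values)."""
    return [autocorrelation(series, k) for k in range(max_lag + 1)]


# Hypothetical demonstration data: a purely random series and a persistent one.
rng = random.Random(0)
noise = [rng.gauss(0.0, 1.0) for _ in range(600)]
persistent = [noise[0]]
for t in range(1, 600):
    # Each value carries over 80% of the previous value plus a new shock.
    persistent.append(0.8 * persistent[-1] + noise[t])

print("random series    :", [round(r, 2) for r in correlogram(noise, 3)])
print("persistent series:", [round(r, 2) for r in correlogram(persistent, 3)])
```

For the purely random series the coefficients at lags 1 and above stay near zero, while the persistent series shows large positive values that decay with the lag, which is the pattern a correlogram is meant to reveal.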

The analysis of a time series in the frequency domain is done through the spectral density, which identifies the cyclic nature or periodicity in the series. The density indicates the cycle in deterministic data. In a purely random process it oscillates randomly. The purpose of streamflow synthesis, however, is not to analyze a time series but to generate data based on the series. This does not require the decomposition of the time series by the analysis above, but an understanding of its statistical properties in order to reproduce series of similar statistical characteristics (Gupta, 1989).

2.3 Forecasting Time Series

Most forecasting problems involve the use of time series data. Montgomery et al. (2008) stated that forecasting problems are often classified as short-term, medium-term, and long-term. Short-term forecasting problems involve predicting events only a few time periods (days, weeks, months) into the future. Medium-term forecasts extend from one to two years into the future, and long-term forecasting problems can extend beyond that by many years. Short-term and medium-term forecasts are used for operations management and development of projects, while long-term forecasts can be used for strategic planning. Normally, short-term and medium-term forecasts are based on identifying, modeling, and extrapolating the patterns found in historical data. These historical data usually exhibit inertia and do not change very drastically. Therefore, statistical methods are very useful for short-term and medium-term forecasting (Montgomery et al., 2008). Markov and ARIMA models are known to be best for short-term forecasting. In this study, we attempt to use Markov and ARIMA models for long-term forecasting.

The use at time t of available observations from a time series to forecast its value at some future time can provide a basis for (1) economic and business planning, (2) production planning, (3) inventory and production control, and (4) control and optimization of industrial processes (Box et al., 1994). As originally described by Brown (1962), forecasts are usually needed over a period known as the lead time, which varies with each problem. To illustrate, forecasts are made at time t by taking the current month's value $Y_t$ and the previous months' values $Y_1, Y_2, \ldots, Y_{t-1}$ to forecast the values at future times, $F_{t+1}, F_{t+2}, \ldots, F_{t+m}$.

In order to calculate the best forecasts, it is necessary to specify their accuracy. The accuracy of the forecasts may be expressed by calculating a convenient set of probability limits on either side of each forecast, such as 50% and 95%. This means that the realized value of the time series, when it eventually occurs, will be included within these limits with the stated probability. Figure 2.1 shows the values of a time series with forecasts made from origin t for lead time l, together with the 50% probability limits.

Figure 2.1: Value of time series with forecast function at 50% probability limits (Source: Box et al., 1994)
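The idea of probability limits can be sketched in a few lines of Python. The example below assumes that forecast errors are normally distributed with a known standard error, which is a simplifying assumption made here for illustration; the function name and the numerical values (a forecast of 100 with a standard error of 10) are hypothetical and do not come from the Sungai Bernam data.

```python
from statistics import NormalDist


def probability_limits(forecast, std_error, probability):
    """Symmetric probability limits around a point forecast.

    Assumes normally distributed forecast errors with standard deviation
    `std_error` (an illustrative assumption, not a result of this study).
    """
    if not 0.0 < probability < 1.0:
        raise ValueError("probability must lie strictly between 0 and 1")
    if std_error < 0.0:
        raise ValueError("std_error must be non-negative")
    # Standard normal quantile that leaves (1 - probability) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + probability / 2.0)
    half_width = z * std_error
    return forecast - half_width, forecast + half_width


# Hypothetical forecast of 100 flow units with a standard error of 10.
for p in (0.50, 0.95):
    lower, upper = probability_limits(100.0, 10.0, p)
    print(f"{int(p * 100)}% limits: {lower:.1f} to {upper:.1f}")
```

The 95% limits are wider than the 50% limits because a larger share of the possible outcomes has to fall inside them; in practice the standard error also grows with the lead time, so the band widens further into the future.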

2.4 Streamflow Forecasting Method

Being a natural phenomenon, streamflow has a random component. However, it is not fully random, because it has been observed that a low flow tends to follow a low flow and a high flow tends to follow a high flow. The word "stochastic" is used to denote randomness in statistics, but in hydrology it refers to a partially random sequence as well. Therefore, the streamflow data that represent a time series actually involve a stochastic process. Various stochastic processes are used for generating hydrologic data (Gupta, 1989).

Stochastic modeling of hydrologic time series has been widely used for the planning and management of water resources systems, such as for reservoir sizing and forecasting the occurrence of future hydrologic events. For example, stochastic models are used to generate synthetic series of water supply that may occur in the future, which are then utilized for estimating the probability distribution of key decision parameters such as reservoir storage size. Furthermore, stochastic models can be used for forecasting water supplies and water demands days, weeks, months and years in advance (Fortin et al., 2004).

There are several stochastic models that can be utilized for synthetic generation and forecasting of hydrological processes. Previous rainfall and streamflow records can be utilized as model inputs for forecasting the streamflow one time step ahead (Mohd Shafiek et al., 2005). This study employs the previous streamflow records to forecast the streamflow discharge of the following month. Hydrologic processes such as monthly streamflow may be well represented by stationary linear models such as the Markov process or autoregressive (AR) and autoregressive integrated moving average (ARIMA) models. These models are usually capable of preserving the historical annual statistics, such as the mean, variance, skewness and covariance (Fortin et al., 2004). In this study, Markov and ARIMA models are used to predict future monthly streamflow.

2.4.1 Markov Model

The Markov process considers that the value of an event (i.e. streamflow) at one time is correlated with the value of the event at an earlier period (i.e. a serial correlation or autocorrelation exists in the time series). In a first-order Markov process, this correlation exists between two successive values of the events. The first-order Markov model, which constitutes the classic approach in synthetic hydrology, states that the value of a variable x in one time period is dependent on the value of x in the preceding time period plus a random component. Thus the synthetic flow for a stream represents a sequence of numbers, each of which consists of two parts:

$$q_i = d_i(t) + e_i \tag{2.1}$$

where $q_i$ is the flow at the ith time (the ith number of a time series), $d_i(t)$ is the deterministic part at the ith time, and $e_i$ is the random part at the ith time. The values of $e_i$ are tied to the historical data by ensuring that they belong to the same frequency distribution and possess similar statistical properties (mean, deviation, skewness) as the historical series (Gupta, 1989). The various forms and combinations of deterministic and random components are recognized as different models. The single-season (annual) flow model of lag 1 is the simplest model, which assumes that the magnitude of the current flow is significantly correlated with the previous flow value only. On the other hand, multiple-season models divide the yearly flow into seasons or months (Gupta, 1989).

The first-order Markov model has been successfully applied to many problems. Examples include modeling sequential data using Markov chains, and solving control problems posed in the Markov decision process (MDP) framework. If the Markov model's parameters are estimated from data, the standard maximum likelihood estimates consider the first-order (single-step) transitions only. For many problems, however, the first-order conditional independence assumptions are not satisfied, and as a result the higher-order transition probabilities can be poorly approximated by the learned model (Joomizan, 2010). The assumption of first-order Markovian processes for representing the inflow process of a reservoir has generally been considered in the literature as adequate for most purposes. The development of models incorporating other approaches results in extremely complex transition probability matrices (Wurbs, 2005).

2.4.2 ARIMA Theory

ARIMA is an abbreviation of AutoRegressive Integrated Moving Average, introduced by Box and Jenkins (Box et al., 1994). As such, some authors refer to this modeling approach as the Box-Jenkins model. The Box-Jenkins model is a stationary time series model. A time series generated from zero-mean, finite-variance, uncorrelated random variables is called a 'white noise' series, from which many useful models can be constructed.

ARIMA modeling is essentially an exploratory, data-oriented approach that has the flexibility of fitting an appropriate model adapted from the structure of the data itself. The stochastic nature of the time series can be approximately modeled with the aid of the autocorrelation function and the partial autocorrelation function, from which information such as trend, random variables, periodic components, cyclic patterns and serial correlation can be discovered. As a result, forecasts of the future values of the series can be readily obtained with some degree of accuracy (Ho and Xie, 1998). An iterative three-stage process of model identification, parameter estimation and diagnostic checking is required to determine the adequacy of the proposed model (Ho and Xie, 1998). Although ARIMA modeling is sophisticated in theory, with the advent of computer technology the iterative model building process, and hence accurate forecasting, is made simpler by many user-friendly statistical software packages such as SAS, Statgraphics, Statistica and Minitab.

2.4.3 ARIMA Algorithms

ARIMA contains three components, namely the autoregressive (AR), integrated (I) and moving average (MA) parts. The AR part describes the relationship between present and past observations. The MA part represents the autocorrelation structure of the error. The I part represents the level of differencing applied to the series to eliminate non-stationarity (Hasmida, 2009). The model is usually denoted by (p,d,q)(P,D,Q), where p denotes the order of the autoregressive component, d denotes the order of differencing, q denotes the order of the moving average component and (P,D,Q) denotes the corresponding seasonal components.
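As a worked illustration of this notation, the model ARIMA(1,1,1)(0,1,1)₁₂ named in the abstract can be written out under the standard multiplicative seasonal form. The sketch assumes the backshift operator $B$ with $B Y_t = Y_{t-1}$, a white noise shock $a_t$, and the sign convention in which moving average coefficients enter with a minus sign; the parameter symbols $\phi_1$, $\theta_1$ and $\Theta_1$ are generic and no fitted values are implied.

```latex
% ARIMA(1,1,1)(0,1,1)_{12}: p = 1, d = 1, q = 1, P = 0, D = 1, Q = 1, s = 12
\begin{align*}
(1 - \phi_1 B)\,(1 - B)\,(1 - B^{12})\, Y_t
  &= (1 - \theta_1 B)\,(1 - \Theta_1 B^{12})\, a_t \\[4pt]
% The two differencing operators combine into one differenced series W_t
W_t = (1 - B)(1 - B^{12})\, Y_t
  &= Y_t - Y_{t-1} - Y_{t-12} + Y_{t-13} \\[4pt]
% Expanding both sides gives the difference equation for W_t
W_t &= \phi_1 W_{t-1} + a_t - \theta_1 a_{t-1}
       - \Theta_1 a_{t-12} + \theta_1 \Theta_1 a_{t-13}
\end{align*}
```

Read this way, the monthly series is differenced once at lag 1 and once at lag 12, and the differenced series $W_t$ then follows a model with one non-seasonal AR term, one non-seasonal MA term and one seasonal MA term.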

2.4.3.1 AR Model

The AR(p) model expresses the current value of a time series as a linear combination of p previous values and a white noise term (random shock). Bell (1984) expressed the current value of the time series for the AR(p) model as:

$$Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + a_t \tag{2.2}$$

where $\phi_1, \ldots, \phi_p$ are the AR(p) parameters, $a_t$ is the random shock at time t, which is normally distributed with zero mean and constant variance, and p is the order of the AR(p) model. By introducing the backshift operator B, which is defined by $B Y_t = Y_{t-1}$, equation (2.2) can be written as:

$$(1 - \phi_1 B - \cdots - \phi_p B^p)\, Y_t = a_t \tag{2.3}$$

or

$$\phi(B)\, Y_t = a_t$$

where $\phi(B) = 1 - \phi_1 B - \cdots - \phi_p B^p$.

2.4.3.2 MA Model

The MA(q) model expresses the current value of a time series as a linear combination of the current and q previous values of a white noise process. The (purely) moving average (MA) model is (Bell, 1984):

$$Y_t = a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q} \tag{2.4}$$

or

$$Y_t = (1 - \theta_1 B - \cdots - \theta_q B^q)\, a_t \tag{2.5}$$

or

$$Y_t = \theta(B)\, a_t,$$

where $\theta(B) = 1 - \theta_1 B - \cdots - \theta_q B^q$, q is the order of the MA(q) model, and the θ coefficients are the MA(q) model parameters.

2.4.3.3 ARMA Model

To increase flexibility when fitting actual time series, the autoregressive and moving average operators are combined to give the ARMA(p,q) model (Bell, 1984):

$$Y_t = \phi_1 Y_{t-1} + \cdots + \phi_p Y_{t-p} + a_t - \theta_1 a_{t-1} - \cdots - \theta_q a_{t-q} \tag{2.6}$$

which we write as:

$$(1 - \phi_1 B - \cdots - \phi_p B^p)\, Y_t = (1 - \theta_1 B - \cdots - \theta_q B^q)\, a_t \tag{2.7}$$

or

$$\phi(B)\, Y_t = \theta(B)\, a_t.$$

A mixed series of this type, which is explained both by its own lagged values and by lagged noise terms, is called an autoregressive moving-average model of order (p,q). This systematic class of stationary time series models is of great importance and usefulness, especially in real-life situations. If the process is stationary, a suitable ARMA model can be used to represent the data. If it is nonstationary, differencing is applied to make the series stationary, and this leads to the ARIMA model (Akgun, 2003).
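The recursion in equation (2.6) and the role of differencing can be sketched as follows. This is an illustrative example rather than the Minitab procedure used in this study: the function names, the ARMA(1,1) parameter values and the random seed are all assumptions chosen for demonstration.

```python
import random


def simulate_arma11(n, phi, theta, sigma=1.0, seed=0):
    """Simulate Y_t = phi * Y_{t-1} + a_t - theta * a_{t-1} with Gaussian shocks.

    This is equation (2.6) with p = q = 1; pre-sample values are set to zero.
    """
    rng = random.Random(seed)
    series, prev_y, prev_a = [], 0.0, 0.0
    for _ in range(n):
        a = rng.gauss(0.0, sigma)          # white noise shock a_t
        y = phi * prev_y + a - theta * prev_a
        series.append(y)
        prev_y, prev_a = y, a
    return series


def integrate(series, start=0.0):
    """Cumulative sum, which turns a stationary series into a nonstationary one."""
    out, total = [start], start
    for value in series:
        total += value
        out.append(total)
    return out


def difference(series, lag=1):
    """Apply (1 - B^lag): subtract from each value the value `lag` steps earlier."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Hypothetical parameters chosen only for demonstration.
stationary = simulate_arma11(200, phi=0.6, theta=0.3)
nonstationary = integrate(stationary)    # wanders without a fixed mean level
recovered = difference(nonstationary)    # first difference (1 - B) Y_t

largest_gap = max(abs(r - s) for r, s in zip(recovered, stationary))
print("largest gap between differenced and original series:", largest_gap)
```

Differencing the integrated series returns the original ARMA(1,1) values up to floating-point rounding, which is the sense in which an ARIMA(1,1,1) series becomes an ARMA(1,1) series after one difference. Calling `difference(series, lag=12)` gives the seasonal difference for monthly data.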

2.4.3.4 ARIMA Model

The ARMA model assumes that the series $Y_t$ following (2.6) is stationary. In practice $Y_t$ may well be nonstationary, but with a stationary first difference, $Y_t - Y_{t-1} = (1-B)\,Y_t$. If $(1-B)\,Y_t$ is nonstationary, we may need to take the second difference, $Y_t - 2Y_{t-1} + Y_{t-2} = (1-B)[(1-B)\,Y_t] = (1-B)^2\, Y_t$. In general, we may need to take the dth difference $(1-B)^d\, Y_t$ (although d is rarely larger than 2). Substituting $(1-B)^d\, Y_t$ for $Y_t$ in (2.7) yields the ARIMA(p,d,q) model (Bell, 1984):

$$(1 - \phi_1 B - \cdots - \phi_p B^p)\,(1-B)^d\, Y_t = (1 - \theta_1 B - \cdots - \theta_q B^q)\, a_t \tag{2.8}$$

or

$$\phi(B)\,(1-B)^d\, Y_t = \theta(B)\, a_t,$$

where d is the order of differencing. When a time series exhibits potential seasonality indexed by s, using a multiplicative seasonal ARIMA(p,d,q)(P,D,Q)s model is advantageous. The seasonal time series is transformed into a stationary time series with non-periodic trend components. A multiplicative seasonal ARIMA model can be expressed as (Lee and Ko, 2011):

$$(1 - \phi_1 B - \cdots - \phi_p B^p)\,(1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps})\,(1-B)^d\,(1-B^s)^D\, Y_t = (1 - \theta_1 B - \cdots - \theta_q B^q)\,(1 - \Theta_1 B^s - \cdots - \Theta_Q B^{Qs})\, a_t \tag{2.9}$$

or

$$\phi(B)\,\Phi(B^s)\,(1-B)^d\,(1-B^s)^D\, Y_t = \theta(B)\,\Theta(B^s)\, a_t,$$

where D is the order of seasonal differencing, and $\Phi(B^s)$ and $\Theta(B^s)$ are the seasonal AR(P) and MA(Q) operators respectively, which are defined as:

$$\Phi(B^s) = 1 - \Phi_1 B^s - \cdots - \Phi_P B^{Ps}$$

$$\Theta(B^s) = 1 - \Theta_1 B^s - \cdots - \Theta_Q B^{Qs}$$

where $\Phi_1, \ldots, \Phi_P$ are the seasonal AR(P) parameters and $\Theta_1, \ldots, \Theta_Q$ are the seasonal MA(Q) parameters.

To illustrate forecasting with ARIMA models, we shall use (2.8) written as:

$$Y_{n+l} = \Phi_1 Y_{n+l-1} + \cdots + \Phi_{p+d} Y_{n+l-p-d} + a_{n+l} - \theta_1 a_{n+l-1} - \cdots - \theta_q a_{n+l-q} \tag{2.10}$$

for $t = n + l$, where, in this equation only, $\Phi_1, \ldots, \Phi_{p+d}$ denote the coefficients of the expanded operator $\phi(B)(1-B)^d$ rather than the seasonal AR parameters. We shall assume we want to forecast $Y_{n+l}$ for $l = 1, 2, \ldots$ using data $Y_n, Y_{n-1}, \ldots$. For simplicity, we assume for now that the data set is long enough that it may effectively be taken to extend into the infinite past.

2.5 Reviews on Markov Model

Naadimuthu and Lee (1982) proposed first-order or lag-one serially correlated inflow, forming a Markov chain. This means that the inflow of each month is dependent only on the inflow of the previous month. The Markov chain method is a stochastic method that can be used to produce new time series of inflow discharge based on available time series of data (Adib and Majd, 2009).
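A lag-one Markov generator of the kind reviewed above can be sketched as follows. The recursion used here, in which each flow equals the mean plus the lag-one correlation times the previous deviation plus a scaled normal deviate, is a commonly used single-season form and is an assumption of this sketch; it is not quoted from the text above, and the mean, standard deviation and correlation values are hypothetical rather than Sungai Bernam statistics.

```python
import math
import random


def generate_lag1_markov(n, mean, std, r1, first_flow=None, seed=0):
    """Generate n synthetic flows with a single-season lag-one Markov model.

    Each flow is a deterministic part, mean + r1 * (previous flow - mean),
    plus a random part, t * std * sqrt(1 - r1**2), with t a standard normal
    deviate. The scaling keeps the long-run variance close to std**2.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not -1.0 < r1 < 1.0:
        raise ValueError("lag-one correlation must lie strictly between -1 and 1")
    rng = random.Random(seed)
    flows = [mean if first_flow is None else first_flow]
    noise_scale = std * math.sqrt(1.0 - r1 ** 2)
    while len(flows) < n:
        deterministic = mean + r1 * (flows[-1] - mean)
        random_part = rng.gauss(0.0, 1.0) * noise_scale
        flows.append(deterministic + random_part)
    return flows


# Hypothetical statistics chosen only for demonstration.
synthetic = generate_lag1_markov(5000, mean=50.0, std=10.0, r1=0.6)
sample_mean = sum(synthetic) / len(synthetic)
print("sample mean of synthetic flows:", round(sample_mean, 2))
```

Because the deviates are normal, this form can produce negative flows when the standard deviation is large relative to the mean. One common remedy, assumed here rather than taken from the text, is to run the recursion on the logarithms of the flows and transform the result back.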

According to Heiko (2000), Markov chains are stochastic processes that can be parameterized by empirically estimating transition probabilities between discrete states in the observed systems. A Markov chain of the first order is one in which each next state depends only on the immediately preceding one. Markov chains of second or higher order are processes in which the next state depends on two or more preceding ones. Dalphin (1987) developed a lag-1 month-to-month Markov streamflow model in which families of three-parameter Weibull distributions describe monthly streamflow probabilistically, conditioned on streamflow in the preceding month.

2.6 Reviews on ARIMA Model

Tang et al. (1991) stated that the ARIMA model is only good for short-term forecasting since it builds its forecast on previous observations. For long memory series, more training patterns result in more accurate forecasts. The Box-Jenkins model does not work well, or does not work at all, for short input series. The ARIMA model needs long memory series, that is, more inputs, to provide more accurate forecasts. Ho and Xie (1998) showed that the ARIMA model is a viable alternative that gives satisfactory results for repairable system reliability forecasting. Ayob and Amat (2004) used ARIMA to represent water use behavior at Universiti Teknologi Malaysia. The ARIMA modeling method can also be applied to analyze water quality and rainfall-runoff data recorded over a long period, as was done for the Johor River (Hasmida, 2009).

Maia et al. (2008) demonstrated that ARIMA exhibited a satisfactory performance in forecasting interval series with either a linear or non-linear behavior and is a useful forecasting alternative for interval-valued time series. However, the hybrid model using ARIMA and artificial neural network had better average performance. Nazuha (2010) used ARIMA to analyze monthly Malaysia crude oil production. Besides that, Yurekli et al. (2004) used ARIMA to simulate monthly maximum data of Cekerek Stream. A multiplicative seasonal autoregressive integrated moving average model was applied to the monthly streamflow forecasting of the Zayandehrud River in western Isfahan province, Iran (Modarres, 2007).

2.7 Concluding Remarks

Various techniques can be utilized for synthetic generation and forecasting of hydrological processes. Hydrologic processes such as monthly streamflow may be well represented by stationary linear models such as the Markov process or autoregressive (AR) and autoregressive integrated moving average (ARIMA) models. Stochastic models can provide alternative hydrologic data sequences that are likely to occur in the future to assess the reliability of alternative system designs and policies, and to understand the variability in future system performance. Streamflow forecasting is an integral part of land management and water resources management.

CHAPTER 3

METHODOLOGY

3.1 Introduction

Various stochastic processes are used for generating the hydrologic data of streamflow. The models either developed or used in order to carry out this study are of different types in terms of their purposes, capabilities, interfaces, inputs, and outputs. These mainly include water balance model, reservoir simulation, and stochastic models. Markov and ARIMA modeling methods have been proposed for streamflow forecasting of Sungai Bernam. The computation work used the available historical data taken from the Department of Irrigation and Drainage. The relevant data is used in deriving the forecasting models. The method used to determine the forecasting accuracy of these models will also be discussed. Brief descriptions of the model development and the considerations associated with each of the models are presented in the following sections.

3.2 Markov Model

Gupta (1989) stated that the general Markov procedure of data synthesis comprises:

1. Determination of statistical parameters from the analysis of the historical record
2. Identifying the frequency distribution of the historical data
3. Generating random numbers of the same distribution and statistical characteristics
4. Constituting the deterministic part considering the persistence (influence of previous flows) and combining with the random part.

3.2.1 Statistical Parameters of Historical Data

Four parameters that are important in a synthetic study are mean flow, standard deviation, coefficient of skewness and correlation coefficient. The sample mean flow is (Gupta, 1989):

$$\bar{q} = \frac{1}{N}\sum_{i=1}^{N} q_i \qquad (3.1)$$

where
$\bar{q}$ = mean observed (historical) flow
$N$ = total numbers (values) of flow
$q_i$ = ith number of observed flow

The sample estimate of the variance or standard deviation, S, which is a measure of the variability of the data, is given by (Gupta, 1989):

(3.2)

The sample coefficient of skewness, g, which is a measure of the lack of symmetry, is given by (Gupta, 1989):

(3.3)

The serial correlation coefficient is a measure of the extent to which a flow at any time is affected by the flow at another time. The K-lag coefficient, in which the effect extends by K time units, is given by (Gupta, 1989):

(3.4)

The one-lag serial coefficient, in which the current flow is affected only by the previous flow, can be obtained by substituting K = 1. Additional lags should be included as long as they produce a model that explains more about the pattern of flows than one with fewer lags does (Fiering and Jackson, 1971).
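The four statistics described above can be sketched in a few lines of Python. This is a minimal illustration only: the divisors used for the standard deviation and skewness, and the form of the lag-K coefficient, are common textbook choices assumed here and may differ in detail from the expressions in Gupta (1989). The function name `sample_stats` and the flow values are hypothetical.

```python
import math


def sample_stats(flows, lag=1):
    """Return (mean, std, skewness, lag-K serial correlation) of a flow record."""
    n = len(flows)
    mean = sum(flows) / n
    dev = [q - mean for q in flows]
    sum_sq = sum(d * d for d in dev)
    # Sample standard deviation with the (n - 1) divisor (assumed form).
    std = math.sqrt(sum_sq / (n - 1))
    # Adjusted sample coefficient of skewness (assumed form).
    skew = n * sum(d ** 3 for d in dev) / ((n - 1) * (n - 2) * std ** 3)
    # Lag-K serial correlation: lagged cross-products over the total sum of squares.
    r_k = sum(dev[i] * dev[i + lag] for i in range(n - lag)) / sum_sq
    return mean, std, skew, r_k


flows = [4.0, 6.0, 5.0, 9.0, 7.0, 5.0]  # illustrative values, not thesis data
mean, std, skew, r1 = sample_stats(flows, lag=1)
print(mean, std, skew, r1)
```

Calling `sample_stats(flows, lag=K)` with larger K gives the higher-lag coefficients that would be compared when deciding whether additional lags are worth including.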

3.2.2 Identification of Distribution

Normally, the distributions used in streamflow generation are the normal, lognormal and gamma families. Generally, historical data do not clearly fit any of these distributions. The choice is made based on the purpose, economics and any other considerations (Gupta, 1989).

The bell-shaped, or normal, distribution is most extensively used in statistical applications because the sum of variables derived from any distribution tends to be distributed normally according to the central limit theorem. To test normality, the historical values of flow are plotted against the percentage of values in the record that are equal to or greater than the plotted value. The flows are arranged in descending order. For each value xi, the percent is computed by 100(n – i + 1) / n where i is the rank of value xi and n is the number of historic values. If the plot is a straight line, the distribution is normal. The coefficient of skewness also should be close to zero, since the normal distribution has no skewness (Gupta, 1989).

The second distribution that is widely used in hydrology is the log-normal distribution. The log-normal distribution is positively skewed, which matches the characteristics of many hydrologic variables. This distribution is suitable for low-flow studies because small changes in low values produce large changes in their logarithmic values. A straight-line plot indicates the log-normal distribution, while the skewness calculated from the logarithms of the values should be close to zero (Gupta, 1989).

The gamma distribution is used when the historical records of flows or logarithms of flows show appreciable skewness. However, this distribution cannot be used when multiple lags exist, that is, when a flow is affected by many previous flows.

3.2.3 Generation of Random Numbers

Gupta (1989) stated that the source of random numbers can be generated either by the computer-based pseudorandom-number generator or the random number tables. The random number should belong to the same distribution to which the historical record belongs for the generated flow to have similar characteristics. Normal random numbers have a zero mean and one standard deviation while Log-Normal random numbers have both mean and standard deviation equal to one.

3.2.4 Formulation of the Markov Model

Formulation of the Markov Model for annual flow (Gupta, 1989):

$$q_i = \bar{q} + r_1\,(q_{i-1} - \bar{q}) + S\,t_i\sqrt{1 - r_1^2} \qquad (3.5)$$

where $q_i$ is streamflow at ith time, $\bar{q}$ is mean of recorded flow, $r_1$ is lag 1 serial or autocorrelation coefficient, S is standard deviation of recorded flow, $t_i$ is random variate from an appropriate distribution with a mean of zero and variance of unity, and i is ith position in series from 1 to N years.

A model on the same lines for monthly flows, developed by Thomas and Fiering has the following form (Maass et al., 1962):

$$q_{i,j} = \bar{q}_j + b_j\,(q_{i-1,j-1} - \bar{q}_{j-1}) + t_{i,j}\,S_j\sqrt{1 - r_j^2} \qquad (3.6)$$

where
$q_{i,j}$ = flow in ith month from the beginning, for jth month of the year
$q_{i-1,j-1}$ = flow in the immediate previous month
$\bar{q}_j$ = mean of flows of jth month (12 values)
$b_j$ = regression coefficient of flows of jth month and flows of (j-1)th month = $r_j S_j / S_{j-1}$ (12 values)
$S_j$ = standard deviation for jth month (12 values)
$t_{i,j}$ = random normal deviate of zero mean and unit standard deviation
i = month in series, measured from the beginning
j = month in year, j = 1, 2, …, 12 for January to December

3.3 ARIMA Model

ARIMA models have become common practice for the specification of stationary time-dependent input processes since the work of Box and Jenkins (1970). ARIMA models are usually used as discrete-time processes (Leemis, 1998) and hence the data from a trace is interpreted as a count process for ARIMA fitting. Besides, this model has specific procedures to be followed for fitting ARIMA models to time series. Some assumptions were made before fitting the ARIMA model.

3.3.1 Model Assumptions

Before performing the ARIMA modelling, some assumptions were made such that (Hasmida, 2009):

1. The data is stationary
2. The data have normal distribution
3. No outlier exists in the data
4. No missing data

3.3.1.1 Data Stationarity

Classical Box-Jenkins models describe stationary time series. Thus, in order to tentatively identify a Box-Jenkins model, we must first determine whether the time series we wish to forecast is stationary. A time series is stationary if the statistical properties (for example, the mean and the variance) of the time series are essentially constant through time (Bowerman and O'Connell, 1993). In other words, stationary models assume that the process remains in equilibrium about a constant mean level, that is, the plot shows the data fluctuating around a constant mean (Box et al., 1994).

The stationarity of the monthly streamflow data was examined by graphical representation of the data. The original data were plotted against time, in months. The other graphical method applied in the present study is examination of the ACF and PACF plots of the original data. Stationary data have randomly distributed ACF and PACF plots.

A transformation might be required for a non-stationary series, and this can be done using the differencing method (Box et al., 1994; Shumway, 1988). This process is the I (Integrated) component of the ARIMA modelling approach and is represented as d in ARIMA notation. The level of differencing depends strongly on the level of stationarity of the data. The level of differencing might be 0, 1, 2 or higher than 2. Level 0 means that no differencing is performed on the data. Level 1 represents first differencing and level 2 represents second differencing. A higher level of differencing might be applied to nonstationary and complex data (Hasmida, 2009).

3.3.1.2 Normal Distribution

Data with a normal distribution follow a bell shaped curve. The bell shaped curve is concentrated in the center and decreases on either side. This means that the data have less of a tendency to produce unusually extreme values, compared to some other distributions. Besides, the bell shaped curve is symmetric. This means that the probability of deviations from the mean is comparable in either direction (Hasmida, 2009).

Data without normal distribution behavior must be transformed. The methods of data transformation that can be applied are the natural log transformation and the Box-Cox transformation. The Box-Cox method is applied if the log transformation is not capable of transforming the data into a normal distribution (Hasmida, 2009).

3.3.1.3 Outlier

An outlier is an observation that lies outside the overall pattern of a distribution (Moore and McCabe, 1999). The presence of an outlier always indicates some sort of problem. This can be a case which does not fit the model under study or an error in measurement. Outliers are often easy to spot in histograms. For example, the point on the far left in the above figure is an outlier. This data point should be removed because it is also a sign of nonstationary data (Hasmida, 2009).

3.3.1.4 Missing Data

Yafee and McGee (2000) suggested that, if some values are missing from the data series, they should be replaced using a theoretically defensible algorithm. A crude missing data replacement method is to plug in the mean for the overall series. A less crude algorithm is to use the mean of the period within the series in which the observation is missing. Another algorithm is to take the mean of the adjacent observations. In exponential smoothing, a missing value is often replaced by the one step ahead forecast from the previous observation. Other forms of interpolation employ linear splines, cubic splines, or step function estimation of the missing data.

In order to handle missing data for this study, linear regression between the flow of the study area station and the flow of an adjacent station is used. If data still cannot be obtained, regression between streamflow and rainfall for that station is used to get the missing data.
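One of the simpler replacement rules listed above, taking the mean of the adjacent observations, can be sketched as follows. This is only an illustration of that rule, not the regression procedure adopted in this study; the function name `fill_missing` and the sample values are hypothetical, and missing months are assumed to be marked with `None`.

```python
def fill_missing(series):
    """Replace None entries with the mean of the nearest observed neighbours."""
    filled = list(series)
    for i, value in enumerate(series):
        if value is not None:
            continue
        # Nearest observed value before and after the gap, if any.
        before = next((series[j] for j in range(i - 1, -1, -1)
                       if series[j] is not None), None)
        after = next((series[j] for j in range(i + 1, len(series))
                      if series[j] is not None), None)
        neighbours = [v for v in (before, after) if v is not None]
        if neighbours:  # leave the gap untouched if nothing was observed
            filled[i] = sum(neighbours) / len(neighbours)
    return filled


flows = [8.0, None, 12.0, 10.0, None]  # illustrative monthly flows
filled = fill_missing(flows)
print(filled)
```

A gap at the end of the record has only one neighbour, so the sketch simply carries that value forward.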

3.3.2 Model Procedure

The ARIMA modeling procedure for fitting ARIMA models to time series,
which was developed by Box and Jenkins (1976), consists of three iterative steps: model
identification; parameter estimation; and diagnostic checking. Figure 3.1 depicts the
process of ARIMA modeling. The procedure is itemized as follows:

[Flowchart: Original Streamflow… → Model Identification → Parameters Estimation →
Diagnostic Checking → Is adequate? (No: repeat the procedure; Yes: Streamflow…)]
Figure 3.1: Flowchart of ARIMA modeling (Lee and Ko, 2011)

3.3.2.1 Model Identification

First, determine whether the time series is stationary or nonstationary by examining a
time series plot or the ACF. If large autocorrelations in the ACF do not die out,
differencing may be required to give a constant mean. A seasonal pattern that repeats
every kth time interval suggests taking the kth difference to remove a portion of the
pattern. Most series should not require more than two difference operations or orders.
Be careful not to overdifference. If spikes in the ACF die out rapidly, there is no
need for further differencing.
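The two kinds of differencing mentioned above can be illustrated with a short Python sketch. The series is synthetic (a linear trend plus a repeating 12-month pattern) and the helper name `difference` is hypothetical; it only shows what a lag-1 difference and a lag-12 seasonal difference do to such a series.

```python
def difference(series, lag=1):
    """Return y[t] - y[t - lag]; lag=1 is the ordinary difference,
    lag=12 a seasonal difference for monthly data."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Synthetic monthly series: linear trend plus a 12-month cycle (not thesis data).
season = [3, 1, 0, 2, 4, 1, 0, 0, 2, 5, 6, 4]
flows = [0.1 * t + season[t % 12] for t in range(36)]

d1 = difference(flows, lag=1)     # non-seasonal difference removes the trend
d1_D1 = difference(d1, lag=12)    # seasonal difference removes the yearly cycle

print(len(d1), len(d1_D1))
print(max(abs(v) for v in d1_D1))  # close to zero for this idealised series
```

Each differencing step shortens the series by its lag, which is one practical reason not to difference more often than the ACF suggests is needed.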

Next, examine the ACF and PACF of the stationary data in order to identify which
autoregressive or moving average terms are suggested. Some general guidelines (SPSS,
1993) based on the graphical method were applied in the identification process:

i. Nonstationary series have an ACF that remains significant for half a dozen or
more lags, rather than quickly declining to 0. Such a series must be differenced
until it is stationary before it can be identified.

ii. Autoregressive processes have an exponentially declining ACF and spikes in the
first one or more lags of the PACF. The number of spikes indicates the order of
the autoregression.

iii. Moving average processes have spikes in the first one or more lags of the ACF
and an exponentially declining PACF. The number of spikes indicates the order
of the moving average.

iv. Mixed (ARMA) processes typically show exponential declines in both the ACF
and the PACF.

At the identification stage, the sign of the ACF or PACF and the speed with which
an exponentially declining ACF or PACF approaches 0 depend upon the sign and
actual value of the AR and MA coefficients (SPSS, 1993).
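The guidelines above rely on the sample ACF and PACF. The sketch below computes both for a simulated AR(1) series, so that the expected pattern (declining ACF, single PACF spike) can be seen. It is an illustration under assumptions: the PACF is obtained with the Durbin-Levinson recursion, the significance band uses the usual large-sample approximation of plus or minus 1.96 divided by the square root of n, and the function names and simulated data are hypothetical. Statistical packages may use slightly different estimators.

```python
import math
import random


def acf(series, max_lag):
    """Sample autocorrelations r_0 .. r_max_lag."""
    n = len(series)
    mean = sum(series) / n
    dev = [y - mean for y in series]
    c0 = sum(d * d for d in dev)
    return [sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
            for k in range(max_lag + 1)]


def pacf(series, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = acf(series, max_lag)
    phi_prev = []          # coefficients phi_{k-1,1} .. phi_{k-1,k-1}
    out = [1.0]
    for k in range(1, max_lag + 1):
        num = r[k] - sum(phi_prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(phi_prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        phi_prev = [phi_prev[j] - phi_kk * phi_prev[k - 2 - j]
                    for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


# Simulated AR(1) series with coefficient 0.7 (illustrative, not thesis data).
rng = random.Random(0)
y = [0.0]
for _ in range(1999):
    y.append(0.7 * y[-1] + rng.gauss(0.0, 1.0))

r = acf(y, 5)
p = pacf(y, 5)
limit = 1.96 / math.sqrt(len(y))   # approximate 5% significance limit
for k in range(1, 6):
    print(k, round(r[k], 3), round(p[k], 3), abs(p[k]) > limit)
```

For this series the ACF should decay gradually while only the first PACF value stands clearly outside the band, which is the pattern guideline ii associates with an autoregressive process of order one.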

3.3.2.2 Parameter Estimation

Once the tentative model is formulated, the related model parameters are
estimated using the least squares scheme. Parameters are estimated so that the gradient
of the forecasting errors with respect to the historical data is zero. The primary
objective of this parameter estimation is to minimize the forecasting error and determine
both the model and its parameters (Lee and Ko, 2011). Each parameter of the tentative
ARIMA model can be tested using t-values and p-values. The t-value is calculated by
dividing the coefficient by its standard error.
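As a small worked illustration of the t-value calculation, the sketch below divides each coefficient by its standard error and flags terms whose absolute t-value exceeds 1.96. The coefficient names and numbers are hypothetical, and the 1.96 threshold is the large-sample two-sided 5% value, used here as an assumption in place of the exact p-values a statistical package would report.

```python
def t_value(coefficient, standard_error):
    """t-value of an estimated parameter."""
    return coefficient / standard_error


# Hypothetical estimates (coefficient, standard error), for illustration only.
estimates = {"AR 1": (0.42, 0.05), "MA 1": (0.03, 0.06), "SMA 12": (0.91, 0.02)}

for name, (coef, se) in estimates.items():
    t = t_value(coef, se)
    significant = abs(t) > 1.96  # large-sample 5% threshold (assumption)
    print(f"{name}: t = {t:.2f}, significant = {significant}")
```

A term that is not significant would normally be dropped and the reduced model re-estimated.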

3.3.2.3 Diagnostic Checking

Then, a diagnostic test was conducted to ensure that the essential modeling
assumptions are satisfied for a given model. When the parameters have been well
estimated, the accuracy of the tentative model is validated by examining the ACF and
PACF of the residuals. The residuals should resemble a white noise process. Furthermore,
the Q-statistic test is applied to confirm the tentative model (O’Donovan, 1983). If the
calculated value of Q exceeds the critical value of χ² obtained from the chi-square tables,
the tentative model is inadequate (Lee and Ko, 2011).

Furthermore, at this stage, the Ljung-Box test is used to test whether the residuals are
white noise. The null hypothesis is that the residuals are white noise. In other words,
the residual series should be independent, homoscedastic (having constant variance), and
normally distributed. We reject the null hypothesis if the p-value of the chi-square
statistic is less than the significance level (alpha) of 5%.
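The Ljung-Box statistic can be sketched as follows, using its usual definition Q = n(n + 2) Σ r_k² / (n − k) summed over lags 1 to m. This is a minimal illustration: the two series are simulated, the function name is hypothetical, and the critical value 18.307 is the 5% chi-square table value for 10 degrees of freedom. When the test is applied to ARIMA residuals, the degrees of freedom are normally reduced by the number of estimated parameters, which this sketch does not do.

```python
import random


def ljung_box_q(residuals, max_lag):
    """Ljung-Box Q statistic over lags 1 .. max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        total += r_k ** 2 / (n - k)
    return n * (n + 2) * total


CHI2_CRIT_5PCT_DF10 = 18.307  # chi-square table value, 10 degrees of freedom

rng = random.Random(1)
white = [rng.gauss(0.0, 1.0) for _ in range(300)]      # uncorrelated series
trended = [0.05 * t + e for t, e in enumerate(white)]  # strongly autocorrelated

q_white = ljung_box_q(white, 10)
q_trended = ljung_box_q(trended, 10)
print(q_white, q_white > CHI2_CRIT_5PCT_DF10)
print(q_trended, q_trended > CHI2_CRIT_5PCT_DF10)
```

A Q value above the critical value (equivalently, a p-value below 5%) leads to rejection of the white noise hypothesis, as the trended series shows.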

These steps are repeated until an adequate model is identified. When the steps in ARIMA modeling are completed, a specific ARIMA model is applied to predict the future monthly streamflow for 1 year ahead.

3.3.3 Minitab Procedures

For modeling the ARIMA model, statistical software was used, namely Minitab version 15. By using Minitab, the ARIMA modeling steps can be summarized as follows:

1. Identify stationarity of data
• If stationary, then go to step No. 3
• If non-stationary, then go to step No. 2
2. Apply the non-seasonal difference (d=1, k=1)
3. Identify seasonal pattern of the data using ACF
• If ACF indicating non-seasonal pattern, then go to step No. 5
• If ACF indicating seasonal pattern, then go to step No. 6
4. Identify general theoretical PACF of ARIMA model
5. Apply seasonal difference (D=1, k=12; D=2, k=24)
6. Identify general theoretical ACF and PACF of ARIMA model
• If seasonal pattern of ACF and PACF is still found from step No. 6, then go to step No. 5

• If non-seasonal pattern of ACF is found then go to step No. 7
7. Apply the rest of procedures which are estimation, diagnostic check and forecasting according to step No. 6 until obtaining the best forecasting pattern.

3.4 Model Comparison and Forecast Evaluation Measures

In order to compare the forecasting accuracy of the different models, a multicriterion performance evaluation procedure was used in this study. The following indices were used to evaluate the performance of the models (Shalamu, 2009):

1. Mean Absolute Percentage Error (MAPE):

(3.7)

2. Root Mean Squared Error (RMSE):

(3.8)

3. Chi-Squared Test:

(3.9)

where
Yi = the observed flow
Fi = the forecasted flow
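The three evaluation measures can be sketched in Python as follows. The formulas used are the common definitions of each index and are an assumption here, since they may differ in detail from Equations 3.7 to 3.9: MAPE as the mean of |Yi − Fi| / Yi in percent, RMSE as the square root of the mean squared error, and the chi-square statistic as the sum of (Yi − Fi)² / Fi. The function names and sample values are hypothetical.

```python
import math


def mape(observed, forecast):
    """Mean absolute percentage error, in percent."""
    n = len(observed)
    return 100.0 / n * sum(abs(y - f) / y for y, f in zip(observed, forecast))


def rmse(observed, forecast):
    """Root mean squared error, in the units of the data."""
    n = len(observed)
    return math.sqrt(sum((y - f) ** 2 for y, f in zip(observed, forecast)) / n)


def chi_square(observed, forecast):
    """Chi-square statistic with the forecast as the expected value."""
    return sum((y - f) ** 2 / f for y, f in zip(observed, forecast))


Y = [10.0, 20.0, 40.0]   # illustrative observed flows (m3/s)
F = [8.0, 25.0, 40.0]    # illustrative forecasted flows (m3/s)

print(mape(Y, F), rmse(Y, F), chi_square(Y, F))
```

Smaller values of all three indices indicate a closer match between the forecasted and observed flows.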

CHAPTER 4

RESULT AND DISCUSSION

4.1 Introduction

This chapter gives a detailed description of the analysis of the time series data using both the Markov and ARIMA modeling methods for streamflow forecasting. Both methods are used to model the streamflow of Sungai Bernam at Tanjung Malim, Selangor (Station No. 3615412). The models are checked to obtain an adequate model for streamflow forecasting. Data from January 1960 to December 2010 were used in deriving the stochastic and forecasting models. Data of 552 months from January 1960 to December 2005 are used as the calibration set for both models. Another 60 months of data from January 2006 to December 2010 are used as the validation set. Most of the computation work for the ARIMA and Markov models is carried out using Minitab and Microsoft Excel, respectively.

4.2 Estimation of Missing Data Values

Some data values are missing in the data series for Sungai Bernam streamflow at Tanjung Malim (Station No. 3615412). In order to handle missing data for this study, linear regression between the flow of the study area station and the flow of an adjacent station is used. The regression line is taken as the best way to predict y from x. For example, data are missing for January 1962, February 1962 and March 1962. As streamflow data were missing for Sungai Bernam at Tanjung Malim, streamflow data of the adjacent station at Jam. Skc (Station No. 3813411) are used. Observations from adjacent months (previous and following months) at both stations are used to obtain the regression line for estimating the missing data. This is shown in Figure 4.1.

Figure 4.1: Linear Regression of Two Streamflow Station for 1962

The missing monthly data of Station Tanjung Malim for January, February and March 1962 can be completed by using the linear regression equation y = 0.126x + 2.513 with coefficient of determination, R2 of 0.845, in which y and x represent the flow of Station Tanjung Malim (m3/s) and Jam. Skc (m3/s), respectively.
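The regression step can be sketched as follows. The function `fit_line` is a generic ordinary least squares fit shown for illustration (this study used Microsoft Excel, and the name is hypothetical), while `estimate_tanjung_malim` simply applies the equation reported above, y = 0.126x + 2.513. The input flow of 50 m3/s is an illustrative value, not a record from the station.

```python
def fit_line(x, y):
    """Ordinary least squares slope, intercept and R^2 for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy)
    return slope, intercept, r2


def estimate_tanjung_malim(flow_jam_skc):
    """Apply the regression line reported in the text (flows in m3/s)."""
    return 0.126 * flow_jam_skc + 2.513


# Exactly linear toy data, used only to show the fit recovers the line.
slope, intercept, r2 = fit_line([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
print(slope, intercept, r2)

# Illustrative Jam. Skc flow of 50 m3/s.
print(estimate_tanjung_malim(50.0))
```

In practice the fit would be made on the months where both stations have records, and the fitted line then applied to the months missing at Tanjung Malim.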

If data still cannot be obtained, possibly because the adjacent streamflow station also had missing data for that month, rainfall data from an adjacent station can be used to obtain a regression equation for estimating the missing streamflow data. For example, data are missing from February 1993 to May 1993 for both the Tg. Malim and Jam. Skc stations. Rainfall observations from adjacent months (previous and following months) at Station Ldg. Katoyang at Tg. Malim (Station No. 3714152) are used to obtain the regression equation with the flow data of Station Jam. Skc, as shown in Figure 4.2. The equation of the linear regression was found to be y = 0.146x + 10.43 with coefficient of determination, R2 of 0.603, in which y represents the flow for Station Jam. Skc (m3/s) and x represents the rainfall for Station Ldg. Katoyang (mm).

Figure 4.2: Linear Regression of Rainfall and Streamflow

Once the streamflow data for February 1993 to May 1993 at Station Jam. Skc are known, they can be used to estimate the missing data of Station Tg. Malim from the regression between the two streamflow stations, using the linear regression equation y = 0.112x + 3.673 with coefficient of determination, R2 of 0.892, in which y and x represent the flow of Station Tanjung Malim (m3/s) and Jam. Skc (m3/s), respectively. Figure 4.3 shows the regression line for the equation.

Figure 4.3: Linear Regression of Two Streamflow Station for 1993

After replacing all the missing data with appropriate estimates from the linear regression method, the streamflow data of Sungai Bernam are shown in Appendix A.

4.3 Markov Model

Formulation of the Markov Model is based on the procedures of data synthesis, which are: (1) determination of statistical parameters from the analysis of the historical record, (2) identifying the frequency distribution of the historical data, (3) generating random numbers of the same distribution and statistical characteristics and (4) constituting the deterministic part and combining it with the random part.

4.3.1 Statistical Parameters of Historical Data

The sample mean flow for 612 months of data is 9.75 m3/s, the sample standard deviation, S is 4.66, skewness is 1.2, standard error is 0.18863 and coefficient of variance is 0.47828. These statistical parameters can be calculated using Microsoft Excel or can be obtained from EasyFit software. The result of the descriptive statistics using EasyFit is shown in Figure 4.4.

Figure 4.4: Descriptive statistics of Sungai Bernam data

Then, for data calibration to model the streamflow, the parameters of the monthly historical data from January 1960 to December 2005, which use 552 data values, are shown in Table 4.1.

5 and Figure 4. Kolmogorov-Smirnov (K-S) test.27283605 0.05 Oct 0.71759E-05 0.22414E-05 0.3699155 4.001 0.046522 9.585 3.002 0.053852 7.822355866 0.01006 0.06 Dec 0.990121105 0.369 3.05 Sep 0.43 at ranking 41 while for lognormal distribution is 34.000101211 0.046073 7.05 Aug 0.4901265 3.009847 0. K-S goodness of fit test for normal distribution is 0.001 0.07 Jun 4.169 at ranking 6.049549 9. statistical test is used for estimating the parameters of a probability distribution.008499 0. By using EasyFit application.008785 0.94571E-05 0.10128E-05 0.05 Mac 0.06 Feb 0.05 July 0.294 4.002 0.89268E-05 Identification of Distribution In this study.363576896 0.69723E-05 0.001 0.001 0.047227 7.515 2.059644 7.05954 at ranking 2. For AD goodness of fit test for normal distribution is 139.6). the best-fitting distribution can be found.303 3.001 0.008496 0.04537 0. Anderson Darling (AD) test and Chisquared test can be used as statistical test.008334 0.89806E-05 0.447681936 0.2 0.05 Apr 0.62886E-05 0.406 3.065038 6.00954 0.009529 0.738293291 0.00943 0.408 3.001 0.054888 5.13466 at ranking 42 while for Lognormal distribution is 0. Best-fitting distribution for the streamflow data of Sungai Bernam is Lognormal Distribution (Figure 4.05 May 0.001 0.05 6.008305 0.639813919 0.008734 0.189053605 0.349038581 0.69337796 0.5777814 3.21161E-05 0.4442686 4.059643 0.001 0.175448792 0.3.541 3.001 0.05 Nov 0.007219 0.1: Parameters of Monthly Historical Data i qj S2 Sj Rj Sj-1 bj qj-1 Jan 0.001 0.761513315 0. . K-S test has being used as preference as it is more powerful and robust.07979E-05 0.05187 9.0488 8.21758E-05 0.40 Table 4.

[Figure 4.5: Probability Density Function. Horizontal axis: Flow, q (m3/s). Legend: Histogram; Inv. Gaussian (3P).]

The log-normal distribution is positively skewed, which matches the characteristics of many hydrologic variables. This distribution is suitable for low-flow studies because small changes in low values produce large changes in their logarithmic values.

[Figure 4.6: Cumulative Distribution Function. Horizontal axis: Flow, q (m3/s). Legend: Sample; Inv. Gaussian (3P).]

As the distribution is log-normal, the logarithms of the values are used and the results are finally converted back to flows. These data act as the calibration set to obtain the parameters of the historical data in order to model the future streamflow. As an example, the observed streamflow data in logarithmic values for 1960 until 1970 are shown in Table 4.2, while the data for the other years (1971-2005) can be found in Appendix B.

Table 4.2: Logarithmic Values of Observed Streamflow Data for 1960-1970
Columns: i (month, Jan to Dec), 1960, 1961, 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970
[Table body: the monthly values are scrambled in this copy and cannot be reliably assigned to their cells.]

4.3.3 Generation of Random Numbers

In this study, we generate random numbers using the Microsoft Excel command RAND( ). To get the random normal deviate, t, of mean equal to 1 and unit standard deviation, we use the inverse error function, erf⁻¹(z):

(4.1)

The value of z can be obtained from the cumulative distribution function (CDF) of the log-normal distribution:

(4.2)

Figure 4.7: Cumulative distribution function of the log-normal distribution

44 (4.3. the calculation procedure of random numbers generation for year 2006 is shown in Table 4.3) As log-normal random numbers have both mean and standard deviation equal to one.5) As an example. Therefore. . then erf -1 (y) = x. (4. the Equation 4. Therefore. Let.4) If erf (x) = y. while the random numbers generation for other year (20072010) can be found in Appendix C. The value of t = ln x.3 becomes: (4.

Table 4.3: Generation of Random Number for Year 2006
Columns: i (month, January to December), RAND( ), z, erf⁻¹, ti,j
[Table body: the monthly values are scrambled in this copy and cannot be reliably assigned to their cells.]

4.3.4 Streamflow Generation of Markov Model

The Markov model for monthly flows, developed by Thomas and Fiering, uses the following form (Maass et al., 1962):

$$q_{i,j} = \bar{q}_j + b_j\,(q_{i-1,j-1} - \bar{q}_{j-1}) + t_{i,j}\,S_j\sqrt{1 - r_j^2} \qquad (4.6)$$

As an example, the calculation of the deterministic part considering the persistence (influence of previous flows) and its combination with the random part to develop the monthly streamflow model for year 2006 is shown in Table 4.4, while the streamflow model for other years (2007-2010) can be found in Appendix D.

We will use Equation 4.6 to develop the Markov model for monthly flows. The flow in the ith month from the beginning, for the jth month of the year, can be modeled by adding the mean flow of the jth month of the year (January to December) to the deterministic and random components.

Table 4.4: Model Streamflow for Year 2006
Columns: i (month, Jan to Dec); Deterministic Component: qi-1,j-1 and q̄j + bj(qi-1,j-1 − q̄j-1); Random Component: ti,j and Sj ti,j √(1 − rj²); Model flow: qi,j (Log) and qi,j (m3/s)
[Table body: the monthly values are scrambled in this copy and cannot be reliably assigned to their cells.]

4.3.5 Validation of Markov Model

The streamflow modeled using the Markov model is compared with the observed streamflow that has been set aside as the validation set of 60 monthly values from January 2006 to December 2010. Graphically, from Figure 4.8, we can say that the Markov model does not work well for streamflow forecasting for Sungai Bernam because it does not match well with the actual streamflow.

47 Figure 4.5: Accuracy of the Markov Model Performance Evaluation Procedure Markov model MAPE 53. Table 4.99 .66 RMSE 7. The result of inspection is summarized in Table 4.8: Comparison of Observed and Markov Model Flow The ability of Markov model in streamflow forecasting is inspected by using some forecast evaluation measures like Root Mean Square Error (RMSE). Chi-square Test and Mean Absolute Percentage Error (MAPE).5 and the details of the calculation can be found in Appendix E.29 Chi-square test 250.

4.4 ARIMA Model

In this study, an appropriate ARIMA tentative model for Sg. Bernam streamflow is investigated. As mentioned in the previous chapter, ARIMA modeling follows three important stages that are shown in the flow diagram of the Box-Jenkins methodology (Figure 4.9). Examination of the autocorrelation function (ACF) and partial autocorrelation function (PACF) provides a thorough basis for analyzing the system behavior under time independence, and will suggest the appropriate parameters to include in the model. These tentative models will be checked and the best tentative model will be selected for streamflow forecasting with the ARIMA model.

[Flow diagram:
1. Tentative Identification: stationary and nonstationary time series; ACF and PACF
2. Parameter Estimation: testing parameters
3. Diagnostic Checking (Is the model adequate?): white noise of residuals; normal distribution of residuals. No: return to an earlier stage. Yes: continue.
4. Forecasting: forecast calculation]

Figure 4.9: Flow Diagram of Box-Jenkins Methodology

4.4.1 Model Identification

Identification involves looking at the graphs of the sample autocorrelation function (ACF) and sample partial autocorrelation function (PACF) to determine whether the series is stationary or not, and then deciding which functional form and model best fit the data. In practice, the ACF and PACF are random variables and will not give the same picture as the theoretical functions. This makes model identification more difficult and can involve much trial and error (Nazuha et al., 2010). The most common method to check stationarity is to examine the time series plot of the data. Stationary means that the data fluctuate around a constant mean. If the time series plot is found to be non-stationary, differencing needs to be applied. Based on graphical examination, Figure 4.10 shows that the data are non-stationary. A non-seasonal difference (d = 1, lag k = 1) therefore needs to be applied to the data. Figure 4.11 shows that the data are stationary after applying the non-seasonal difference.

[Time series plot: Streamflow, Yt (m3/s) against Month and Year, from Jan 1960.]

Figure 4.10: Non stationary data of Sg. Bernam streamflow

[Time series plot: differenced streamflow, d1-Yt (m3/s) against Month and Year, from Jan 1960.]

Figure 4.11: Stationary data of Sg. Bernam streamflow

The next step is to identify the values of p and q which are the AR (p) and MA (q) components for both seasonal and non-seasonal series. For this purpose, the ACF and

The following Table 4.0 -0.2 0.6 -0.8 Autocorrelation 0.8 -1.6: General Theoretical ACF and PACF of ARIMA models Model MA(q): moving average of order q ACF Cut off after lag q PACF Dies down AR(p): autoregressive of order p Dies down Cuts off after lag p ARMA(p.6 gives general theoretical for identification of the likely model: Table 4.4 0.0 1 5 10 15 20 25 30 35 Lag 40 45 50 55 60 65 Figure 4.0 0.q): mixed autoregressive.4 -0.Dies down moving average of order (p.51 PACF coefficient are computed.q) Dies down AR(p) or MA(q) Cuts off after lag q Cuts off after lag p No order AR or MA (White Noise or Random process) No spike No spike Autocorrelation Function for d1-Yt (with 5% significance limits for the autocorrelations) 1.12: ACF after non-seasonal difference .2 -0.6 0.
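As a rough illustration of how the sample ACF behind a correlogram is obtained, the sketch below computes the lag-k autocorrelations of a series. The function names are hypothetical, and the limit 1.96/sqrt(n) is a common large-sample approximation for white noise that is assumed here; the significance limits drawn by Minitab may be computed differently.

```python
import math


def sample_acf(series, max_lag):
    """Sample autocorrelations r_1 .. r_max_lag of a series."""
    n = len(series)
    mean = sum(series) / n
    dev = [x - mean for x in series]
    c0 = sum(d * d for d in dev)  # lag-0 sum of squares
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def significance_limit(n):
    """Approximate 5% limit for a white-noise series of length n (assumption)."""
    return 1.96 / math.sqrt(n)


# Hypothetical alternating series, chosen so the result is easy to check by hand.
series = [1.0, -1.0] * 4
r = sample_acf(series, 2)
print(r)  # [-0.875, 0.75]
```

A spike that exceeds the limit at lag 12, 24 and so on would be read as a seasonal pattern, which is the kind of feature the identification step looks for.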

Figure 4.13: PACF after non-seasonal difference (partial autocorrelation function for d1-Yt against lag, with 5% significance limits for the partial autocorrelations)

As can be seen from Figures 4.12 and 4.13, the ACF and PACF die down gradually. Based on this pattern, the respective values of p, d and q determined for the non-seasonal part are ARIMA (1, 1, 1). The ACF correlogram also reveals a seasonal pattern in the data. Because the ACF indicates a seasonal pattern, a seasonal difference (D = 1, lag k = 12) needs to be applied.

Figure 4.14: ACF after seasonal difference (autocorrelation function for D1-d1-Yt against lag, with 5% significance limits for the autocorrelations)

Figure 4.15: PACF after seasonal difference (partial autocorrelation function for D1-d1-Yt against lag, with 5% significance limits for the partial autocorrelations)

After applying the seasonal difference, Figure 4.14 shows that the ACF cuts off after lag 12, while Figure 4.15 shows that the PACF dies down. Based on this pattern, the respective values of P, D and Q determined for the seasonal part are ARIMA (0, 1, 1)12. For seasonal ARIMA, the general notation is ARIMA (p, d, q)(P, D, Q)S, so the first tentative model is ARIMA (1, 1, 1)(0, 1, 1)12. However, to make sure that the right model has been identified, a second tentative model with seasonal part ARIMA (1, 1, 1)12 is also suggested, namely ARIMA (1, 1, 1)(1, 1, 1)12.

4.4.2 Parameter estimation

Each parameter of the tentative ARIMA models can be tested using t-values and p-values. The standard error (SE) of a coefficient is the standard deviation of the estimate of a regression coefficient. It measures how precisely the data can estimate the coefficient's unknown value. Its value is always positive, and smaller values indicate a more precise estimate. Dividing the coefficient by its standard error gives a t-value. The standard error of a coefficient therefore helps determine whether the value of the coefficient is significantly different from zero. If the p-value associated with this t-statistic is less than the alpha level, we can conclude that the coefficient is significantly different from zero. The null hypothesis is rejected if |t| > tα/2, df, where df = n - np.

Table 4.7: Final Estimates of Parameters for ARIMA (1,1,1)(1,1,1)12

Type      Coefficient    SE Coefficient    T        p
AR 1      0.2782         0.0520            5.35     0.000
SAR 12    0.0589         0.0467            1.26     0.208
MA 1      0.8765         0.0256            34.24    0.000
SMA 12    0.9537         0.0206            46.26    0.000

Table 4.8: Final Estimates of Parameters for ARIMA (1,1,1)(0,1,1)12

Type      Coefficient    SE Coefficient    T        p
AR 1      0.2894         0.0516            5.61     0.000
MA 1      0.8788         0.0248            35.41    0.000
SMA 12    0.9553         0.0184            51.98    0.000

In Table 4.7, the standard error of the SAR 12 coefficient is large relative to the value of the coefficient itself, so the t-value of 1.26 is too small to declare statistical significance. The resulting p-value is also much greater than the common alpha level. For the SAR 12 parameter, tcalc (= 1.26) < ttable (= 2.25). Hence, the null hypothesis cannot be rejected, and we conclude that this coefficient does not differ significantly from zero. In Table 4.8, which gives the parameter estimates for ARIMA (1,1,1)(0,1,1)12, all parameters have |tcalc| > ttable (= 2.25) and p-values less than the alpha level. Therefore, the null hypothesis can be rejected, and we conclude that each of these coefficients is significantly different from zero.
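The t-value check described in this section amounts to one division and one comparison. The sketch below reproduces it for the coefficient 0.0589 with standard error 0.0467 and the tabulated critical value 2.25 quoted in the text; the function names are hypothetical and are not Minitab output.

```python
def t_value(coefficient, standard_error):
    """t-value of an estimate: coefficient divided by its standard error."""
    return coefficient / standard_error


def is_significant(coefficient, standard_error, t_critical):
    """True when |t| exceeds the critical value, so the null hypothesis is rejected."""
    return abs(t_value(coefficient, standard_error)) > t_critical


# Values quoted in the text: coefficient 0.0589, SE 0.0467, t-table value 2.25.
t_calc = t_value(0.0589, 0.0467)
print(round(t_calc, 2))                       # 1.26
print(is_significant(0.0589, 0.0467, 2.25))   # False
```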

4.4.3 Diagnostic Checking

The next step in the model identification method of the time series modeling approach is diagnostic checking. It is aimed at examining the adequacy of the chosen tentative model by ensuring that the modeling assumptions are satisfied. Several procedures can be applied to check the adequacy of the model, including whether the model satisfies the stability or stationarity condition. In other words, at this stage the residual series should be independent, homoscedastic (having constant variance) and normally distributed, as required in stochastic modeling work (Ayob and Amat, 2004).

In this study, the Ljung-Box test is used to test whether the residuals are white noise. The null hypothesis is that the residuals are white noise, and it is rejected if the p-value of the Chi-Square statistic is less than the alpha level of 5%. Table 4.9 and Table 4.10 show the p-values for both tentative models. Both tentative ARIMA models have p-values less than the alpha level at every lag tested, so the null hypothesis is rejected for both: the test indicates that some autocorrelation remains in the residuals, and it does not separate the two tentative models.

Table 4.9: Modified Box-Pierce (Ljung-Box) Chi-Square statistic for ARIMA (1,1,1)(1,1,1)12

Lag           12       24       36       48
Chi-Square    21.2     61.8     82.7     98.1
DF            8        20       32       44
p-Value       0.007    0.000    0.000    0.000
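For readers who want to see what the Chi-Square statistic in such a table measures, the sketch below computes the Ljung-Box statistic Q = n(n+2) Σ r_k²/(n-k) from a residual series. It is a minimal illustration with hypothetical names and data, not the Minitab routine. The degrees of freedom are taken as the lag minus the number of estimated parameters, which matches the DF row above (12 - 4 = 8); the p-value would then come from a chi-square distribution with that DF.

```python
def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q for lags 1..max_lag of a residual series."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)
    total = 0.0
    for k in range(1, max_lag + 1):
        r_k = sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        total += r_k * r_k / (n - k)
    return n * (n + 2) * total


def degrees_of_freedom(max_lag, n_parameters):
    """DF used with Q: lag minus number of estimated ARIMA parameters."""
    return max_lag - n_parameters


# Hypothetical, strongly autocorrelated residuals (not the thesis residuals).
residuals = [1.0, -1.0] * 4
print(ljung_box_q(residuals, 1))    # 8.75
print(degrees_of_freedom(12, 4))    # 8
```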

Table 4.10: Modified Box-Pierce (Ljung-Box) Chi-Square statistic for ARIMA (1,1,1)(0,1,1)12

Lag           12       24       36       48
Chi-Square    23.1     62.2     82.7     97.9
DF            9        21       33       45
p-Value       0.006    0.000    0.000    0.000

Besides that, the best tentative model can be determined through the Least Square Error (LSE) and the Root Mean Square Error (RMSE). The best fit in the least-squares sense minimizes the sum of squared residuals, a residual being the difference between an observed value and the fitted value provided by a model. RMSE is also a good measure of accuracy. The smaller the values of LSE and RMSE, the more accurate the tentative model. The results of the test on the tentative models are summarized in Table 4.11.

Table 4.11: LSE and RMSE Test for ARIMA Tentative Model

Test                              ARIMA (1,1,1)(1,1,1)12    ARIMA (1,1,1)(0,1,1)12
Least Square Error (LSE)          1798                      1760
Root Mean Square Error (RMSE)     5.5                       5.4

So, of the two possible tentative models, the model that best fits the criteria and meets the requirements is ARIMA (1,1,1)(0,1,1)12. Forecasting is made based on the chosen model. The model identified as the best-fit model for Sg. Bernam streamflow is:

\[ (1 - \phi_1 B)(1 - B)(1 - B^{12})\,Y_t = (1 - \theta_1 B)(1 - \theta_2 B^{12})\,a_t \tag{4.7} \]

Rewriting the model, we have the following:

\[ (1 - \phi_1 B)(1 - B^{12} - B + B^{13})\,Y_t = (1 - \theta_1 B - \theta_2 B^{12} + \theta_1\theta_2 B^{13})\,a_t \]

\[ (1 - B^{12} - B + B^{13} - \phi_1 B + \phi_1 B^{13} + \phi_1 B^{2} - \phi_1 B^{14})\,Y_t = (1 - \theta_1 B - \theta_2 B^{12} + \theta_1\theta_2 B^{13})\,a_t \]

\[ (1 - B^{12} - (1+\phi_1)B + (1+\phi_1)B^{13} + \phi_1 B^{2} - \phi_1 B^{14})\,Y_t = (1 - \theta_1 B - \theta_2 B^{12} + \theta_1\theta_2 B^{13})\,a_t \]

\[ Y_t - (1+\phi_1)Y_{t-1} + \phi_1 Y_{t-2} - Y_{t-12} + (1+\phi_1)Y_{t-13} - \phi_1 Y_{t-14} = a_t - \theta_1 a_{t-1} - \theta_2 a_{t-12} + \theta_1\theta_2 a_{t-13} \]

\[ Y_t = (1+\phi_1)Y_{t-1} - \phi_1 Y_{t-2} + Y_{t-12} - (1+\phi_1)Y_{t-13} + \phi_1 Y_{t-14} + a_t - \theta_1 a_{t-1} - \theta_2 a_{t-12} + \theta_1\theta_2 a_{t-13} \]

Note that for AR 1, φ1 = 0.2894; for MA 1, θ1 = 0.8788; and for SMA 12, θ2 = 0.9553. Substituting these values gives:

\[ Y_t = (1 + 0.2894)Y_{t-1} - 0.2894Y_{t-2} + Y_{t-12} - (1 + 0.2894)Y_{t-13} + 0.2894Y_{t-14} + a_t - 0.8788a_{t-1} - 0.9553a_{t-12} + (0.8788 \times 0.9553)a_{t-13} \]

\[ Y_t = 1.2894Y_{t-1} - 0.2894Y_{t-2} + Y_{t-12} - 1.2894Y_{t-13} + 0.2894Y_{t-14} + a_t - 0.8788a_{t-1} - 0.9553a_{t-12} + 0.8395a_{t-13} \]

\[ Y_t = Y_{t-12} + [1.2894Y_{t-1} - 0.2894Y_{t-2} - 1.2894Y_{t-13} + 0.2894Y_{t-14}] + [a_t - 0.8788a_{t-1} - 0.9553a_{t-12} + 0.8395a_{t-13}] \tag{4.8} \]

Equation (4.8) can be used for streamflow forecasting with the ARIMA model. Equation (4.8) also shows that the forecast for time period t is the sum of (1) the value of the time series in the same month of the previous year; (2) a trend component determined by the difference between the previous month's value and last year's value for that month (Yt-1 and Yt-13), and the difference between last year's value and this year's value for the month two months back (Yt-14 and Yt-2); and (3) the effects of the random shocks (or residuals) of periods t, t-1, t-12 and t-13 on the forecast.
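The three parts of Equation (4.8) can be written out as a short function. This is a sketch only: the function name, the list-based inputs and the test series are assumptions made for illustration, and the unknown shock of the period being forecast is set to its expected value of zero.

```python
PHI1 = 0.2894    # AR 1 coefficient
THETA1 = 0.8788  # MA 1 coefficient
THETA2 = 0.9553  # SMA 12 coefficient


def one_step_forecast(y, a):
    """Forecast Y_t from Equation (4.8).

    y: observed flows up to month t-1 (at least 14 values, oldest first)
    a: residuals aligned with y (same length)
    The shock a_t of the forecast month is unknown and taken as zero.
    """
    if len(y) < 14 or len(a) != len(y):
        raise ValueError("need at least 14 flows and matching residuals")
    seasonal = y[-12]                                   # same month last year
    trend = ((1 + PHI1) * y[-1] - PHI1 * y[-2]
             - (1 + PHI1) * y[-13] + PHI1 * y[-14])     # month-to-month change
    shocks = (-THETA1 * a[-1] - THETA2 * a[-12]
              + THETA1 * THETA2 * a[-13])               # past random shocks
    return seasonal + trend + shocks


# Hypothetical, perfectly periodic flows with zero residuals.
history = [float(t % 12) for t in range(26)]
zeros = [0.0] * len(history)
forecast = one_step_forecast(history, zeros)
```

With a perfectly periodic history and zero residuals, the trend and shock terms cancel and the forecast is simply the value of the same month one year earlier.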

4.4.4 Streamflow Generation of ARIMA Model

In this study, Minitab is used to develop the ARIMA model for monthly flows. As an example, the monthly streamflow model developed using Minitab for the years 2006 to 2007 is shown in Table 4.12, while the model streamflow for the other years (2008-2010) can be found in Appendix F.

Table 4.12: Model Streamflow for Year 2006-2007 (columns: month i from Jan 2006 to Dec 2007, Actual Flow (m3/s), Model Flow (m3/s), Residual, Fit Coefficient; the fit coefficients listed are 0.289364, 0.878761 and 0.955283)

from Figure 4. The ability of ARIMA model in streamflow forecasting is inspected using some forecast evaluation measures.4.16: Comparison Observed and ARIMA Model Flow Like in Markov model’s validation. we can say that ARIMA model may works quite well for streamflow forecasting for Sungai Bernam because many data from model match well with the actual streamflow. Chi-square Test and Mean Absolute Percentage Error .5 Validation of ARIMA Model The model streamflow by using ARIMA model is compared with the observed streamflow that have been set as validation set for 60 monthly data from January 2006 to December 2010.59 4. the forecast evaluation measures like Root Mean Square Error (RMSE). Graphically. Figure 4.16.

The result of inspection is summarized in Table 4. Observed streamflow data that have been set as validation set for 60 monthly data from January 2006 to December 2010 is used as bench mark to make the comparison. Most of streamflow forecast by Markov model has higher streamflow value rather than the actual data. We can use Markov model for short- . we can say that ARIMA model is better for streamflow forecasting for Sungai Bernam because more data from ARIMA model match with the actual streamflow.50 RMSE 5.60 (MAPE) are used to examine the accuracy of ARIMA model.41 Chi-square test 191.13 and the details of the calculation can be found in Appendix G. However. Markov model is not good rather than ARIMA model because the model cannot obtain the exact or similar pattern with the actual ones. From From graphical examination on Figure 4.11 Model Comparison and Forecast Evaluation Measures Streamflow forecasting methods of Markov model is being compared with ARIMA model to inspect the accuracy between the models in forecasting ability.5 Performance Evaluation Procedure ARIMA model MAPE 27. these high values are a good forecasting as a reference guideline to prevent damage due to flood problem. In the accuracy aspects.17. Table 4.13: Accuracy of the ARIMA Model 4.

ARIMA model which is good at short-term forecasting can also be used to control flood. Lower streamflow forecasts is needed in some of agriculture field to make sure that plants have sufficient water and grow well. like hourly and daily forecasting in order to give more accurate flood warning. ARIMA cannot forecast accurately for longer period as it is best used for short-term forecasting. . it will tend to become flat for sufficiently long period.17: Model Comparison For short period. Meanwhile. Figure 4. Usually. if the forecasts streamflow has the lower value from the actual data. ARIMA model can obtain the exact or similar pattern with the actual ones.61 term forecasting. Actually. we cannot estimate the flood occurrence.
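The MAPE and RMSE measures used in this comparison can be sketched as follows. The standard textbook definitions are assumed here, since the exact formulas are given only in the appendix; the function names and the sample values are hypothetical and are not the validation data.

```python
import math


def mape(observed, forecast):
    """Mean Absolute Percentage Error in percent (observed values must be nonzero)."""
    n = len(observed)
    return 100.0 * sum(abs((o - f) / o) for o, f in zip(observed, forecast)) / n


def rmse(observed, forecast):
    """Root Mean Square Error of the forecasts."""
    n = len(observed)
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, forecast)) / n)


# Hypothetical observed and forecast flows (m3/s).
observed = [10.0, 20.0, 8.0, 16.0]
forecast = [9.0, 22.0, 8.0, 12.0]
print(round(mape(observed, forecast), 2))  # 11.25
print(round(rmse(observed, forecast), 4))  # 2.2913
```

Smaller values of both measures indicate forecasts that lie closer to the observed flows, which is how the two models are ranked.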

In order to inspect the forecasting accuracy of the different models, the performance evaluation criteria, which are MAPE, RMSE and the Chi-square test, are compared for both the Markov and ARIMA models. Table 4.14 shows the MAPE, RMSE and Chi-square test results for each model.

Table 4.14: Accuracy of the model

Performance Evaluation Procedure    Markov model    ARIMA model
MAPE                                53.66           27.50
RMSE                                7.29            5.4156
Chi-squared test                    250.99          191.11

The model with the minimum values of MAPE, RMSE and the Chi-squared statistic is the best for streamflow forecasting. The results of the performance evaluation procedure show that ARIMA has the lower value for all of the measures used to find the more accurate model. Therefore, of these two models, the ARIMA model gives the best performance for streamflow forecasting.

In this study, one reason the ARIMA model is better than the Markov model is that the historical data for Sg. Bernam are non-stationary. The Markov model cannot remove non-stationarity, whereas an advantage of the ARIMA model is that it can transform non-stationary data into stationary data. If the historical data were stationary, the Markov model might have an advantage because it is a probabilistic method in which the transition from one state to another depends on probability. The ARIMA model is selected as the best fit because it has the minimum mean squared forecast error, which is why it is often used in statistical practice. Therefore, in this study, for forecasting one period ahead, which is Yt+1, the equation is as follows:

\[ Y_{t+1} = Y_{t-11} + [1.2894Y_t - 0.2894Y_{t-1} - 1.2894Y_{t-12} + 0.2894Y_{t-13}] + [a_{t+1} - 0.8788a_t - 0.9553a_{t-11} + 0.8395a_{t-12}] \tag{4.9} \]

Using Minitab, future values of the time series can easily be forecast from current and past values. Figure 4.18 compares the pattern of the actual and model streamflow for Sungai Bernam. The first 5 years, from January 2006 to December 2010, are the calibration process. The figure shows that the model flows follow the pattern of the observed streamflow quite well, although the data are non-stationary for several years. The next 5 years, which are the 60 months from January 2011 to December 2015, are the streamflow forecast using the ARIMA model. This time series plot reveals the cyclic pattern of the ARIMA model.

Figure 4.18: Streamflow for actual and model (time series plot of Yt-actual and Yt-model, streamflow Yt (m3/s) by month, January 2006 to 2015)

The model can forecast well, but the streamflow pattern repeats itself over the longer period. ARIMA cannot forecast accurately for longer periods. This is because the ARIMA model is best suited to short-term forecasting, since its forecasts are based on previous observations. For short-term forecasting, the Box-Jenkins model can nicely reproduce the details of the original series.
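The repeating pattern in long-range forecasts can be illustrated by applying Equation (4.9) recursively. The sketch below is an assumption-laden illustration, not the Minitab procedure: each forecast is fed back as if it were an observation, future shocks are replaced by zero, and the history used is hypothetical.

```python
def forecast_path(y, a, horizon, phi1=0.2894, theta1=0.8788, theta2=0.9553):
    """Recursive multi-step forecasts from Equation (4.9).

    y: observed flows (at least 14 values, oldest first)
    a: residuals aligned with y
    Future shocks are unknown and are replaced by zero.
    """
    y, a = list(y), list(a)
    path = []
    for _ in range(horizon):
        nxt = (y[-12]
               + (1 + phi1) * y[-1] - phi1 * y[-2]
               - (1 + phi1) * y[-13] + phi1 * y[-14]
               - theta1 * a[-1] - theta2 * a[-12]
               + theta1 * theta2 * a[-13])
        y.append(nxt)   # forecast is treated as the next observation
        a.append(0.0)   # expected value of a future shock
        path.append(nxt)
    return path


# Hypothetical seasonal history with zero residuals.
history = [float(t % 12) for t in range(24)]
path = forecast_path(history, [0.0] * 24, horizon=36)
```

Once the horizon passes 13 months, no observed residuals remain in the recursion, so the forecasts are driven only by earlier forecasts; this is one way to see why the same seasonal shape keeps reappearing in the long-range part of the plot.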

CHAPTER 5

CONCLUSION AND RECOMMENDATIONS

5.1 Conclusion

This study has fulfilled its objectives, which were to propose streamflow forecasting methods using Markov and ARIMA models and then to inspect the forecasting accuracy of both models. The Box-Jenkins or ARIMA model is one of the most popular time series forecasting methods. In this study, the tentative model that best fits the criteria and meets the requirements is ARIMA (1,1,1)(0,1,1)12. By analyzing the forecast values using the performance evaluation procedure, it is found that the ARIMA model is better than the Markov model for forecasting Sg. Bernam streamflow. The results of the performance evaluation procedure show that ARIMA has the lower value for all of the measures used. Therefore, the ARIMA model has the ability to predict the future monthly streamflow for Sungai Bernam accurately. The Markov model nevertheless has its own advantage in forecasting ability.

The critical part of modeling using ARIMA is the identification of the best tentative model. The tentative model that has been identified is tested and checked to confirm that it is the best fit. From the results, both models can forecast well for the earlier period, but they cannot forecast accurately for longer periods. Both Markov and ARIMA models are good for short-term forecasting. However, comparison between the two models shows that ARIMA gives more accurate forecasts. The Markov model also has some advantage because it forecasts higher streamflow than the actual streamflow. Higher streamflow can cause disasters such as floods. Therefore, the Markov model can be used for flood control.

5.2 Recommendations

Based on the results, both Markov and ARIMA models can be used for streamflow forecasting. Although both models are good for short-term forecasting and not good for long-term forecasting, there are some weaknesses that can be overcome. Here are some recommendations that can be used to increase the accuracy of streamflow forecasting:

1. Use the ARIMA model for short-term forecasting only, including for streamflow forecasting.
2. To control floods efficiently, use the Markov model for short-term forecasting, because short-term forecasting is very useful for flood control.
3. For long-memory series, use long input series. The amount of data, or equivalently the number of training patterns, also affects forecast performance: more training patterns result in more accurate forecasts.
4. Forecast the time series after removing the outliers.
5. Use a hybrid model combining ARIMA and an artificial neural network in streamflow forecasting.
6. Compare the streamflow forecasts with other time series forecasting methods such as exponential smoothing, regression analysis or Fourier series analysis.

Jenkins. Forecasting and Time Series: An Applied Approach. R. and O’Connell. D. (2003). T. Ayob. Time Series Analysis: Forecasting and Control. G. Water Use Trend at Universiti Tekologi Malaysia: Application of Arima Model. M. and Majd. 241-255. K. M. A. S. . (1993). M. Time Series Analysis: Forecasting and Control. M. G. Holden Day. G. Englewood Cliffs. San Francisco. E. (1994). C. Prentice Hall. 41 (B): 47-56 Bell. P. Brown. Identification of Periodic Autoregressive Moving Average Models. Bowerman. Jurnal Teknology. Holden Day. Box. pp. (2009). 796-803. J.. E. B. Middle East Technical University. W. R. P. Third Edition. Forecasting and Prediction of Discrete Time Series. G. and Jenkins.. (1970). (1976). Prentice Hall. Agric. & Environ.68 REFERENCES Adib. A. Smoothing. Insurance: Mathematics and Economics 3. Duxbury Press. San Francisco. L. (1984). Akgun. Forecasting and Control. G. P. G. Sci. and Amat. 5(6). (1962). G. Optimization of Reservoir Volume by Yield Model And Simulation of it by Dynamic Programming and Markov Chain Method. and Reinsel. and Jenkins. R. Third Edition. Box. B. Box. G. E. N. (2004). American-Eurasion J. An Introduction to Forecasting with Time Series Models. Time Series Analysis. R.

(1998). R. pp. Vol. Vol. S. Markov Chain Model for Vegetation Dynamics. Nos 1-2. Universiti Teknologi Malaysia. D. American Geophysicists Union. (2000). 35. C. Vol. and Jackson. B. Perreault. 38. (1971). 126. and Xie. Manajemen Keuangan Sektor Publik FEUI Ho. Water Resources Monograph 1. Reservoir Storage Simulation and Forecasting Models for Muda Irrigation Scheme. L. In Proceedings of the 1998 Winter Simulation . The Use of ARIMA Models for Reliability Forecasting and Analysis. M. 296. pp 343-350. Markov-Weibull Model of Monthly Streamflow. 1. C. Vol. Fiering. N. pp. 139-154. Fortin. S. Water Quality Trend at The Upper Part of Johor River in Relation to Rainfall and Runoff Pattern. (2003). (2009). Computers ind. Synthetic Streamflows. and Salas. V. 5902-5911. Retrospective Analysis and Forecasting of Streamflows Using a Shifting Model. B. C. Input Modeling. (2010). (1989). Journal of Hydrology. J. Malaysia. Engng. Ecological Modeling. Gupta. pp. H. Lee. Hasmida. M. Heiko. L. and Ko.. 135-163. Short-term Load Forecasting Using Lifting Scheme and ARIMA Models.. R. Leemis. Vol. J. 113. Washington. (1998). Prentice Hall. (1987). Hendranata. Joomizan. (2004). ARIMA (Autoregressive Moving Average).69 Dalphin. Universiti Teknologi Malaysia. Expert Systems with Applications. B. B. Hydrology and Hydraulic Systems. (2011). L. Journal of Water Resources Planning and Management. No. D. A. 213-216.

. Malaysia. de Carvalho. Vol. J. 117-136. Daily Streamflow Forecasting Using Simplified Rule-Based Fuzzy Logic System. A. Fair and G. ed. Harvard University Press. Journal-The Institution of Engineers. Mass. S. 66. (1962). Cambridge. L. Ruzaidah. A. (2008). M.70 Conference.. Nazuha. D. Vol. No. R. and McCabe.. S. Dorfman. (1982). pp 467 Maia. T. Montgomery. Carson. G.. 15–22. G. S. T. (2005). Piscataway. D. Neurocomputing. New Jersey: Institute of Electrical and Electronics Engineers. H. Malaysia Crude Oil Production Estimation: an Application of ARIMA Model. Inc.. H.. Streamflow Drought Time Series Forecasting. S. Modarres. and Sobri. B. H. The Design of Water-Resource Systems. de A. Stoch Environ Res Risk Assess. Maass. Kulahci.. Third Edition.. Mathematical Modelling. pp. Mohd Shafiek. F. S. Inc. Introduction to the Practice of Statistics. Moore. Hishamuddin. C. and Lee. Introduction to Time Series Analysis and Forecasting. 4. Jennings. P. Thomas. Naadimuthu. D. M. A. J. Stochastic Modelling and Optimization of Water Resources Systems. Vol. M. A. M. S. F. Forecasting Models for Interval-valued Time Series. E. Freeman. 3. M. New York: W. Hufschmidt. Medeiros. S. (2007). and M. and Ludermir. 3344-3352. M. L. Watson. (1999). C. (2008). J. Manivannan. 71. Y. E. pp. Marglin. and Zamzulani. International Conference on Science and Social Research (CSSR 2010) . John Wiley & Sons.. (2010).. R.

Prediction of Daily Maximum Streamflow Based on Stochastic Approaches. Amsterdam. (2006). A.. Z. Short Term Forecasting: An Introduction to the Box-Jenkins Approach. (1988). pp. A. Almeida. Box-Jenkins Methodology. (2000). W. and Simyek. Nonlinearity and Forecasting of Streamflow Processes. Prentice Hall. (2004). New Mexico State University Shunway. T. Wang. Academic Press. P. Wurbs. Englewood Cliffs. (1983). 27-131.. M. K. H. R.0. and Fishwick. Kurunc. Inc. A. SPSS (1993). Release 6. SPSS for Windows-Trend. New Jersey. A. H. Monthly and Seasonal Streamflow Forecasting in the Rio Grande Basin. Simulation. Comparative Evaluation of Generalized River/Reservoir Systems Models. R. (2005). TR-282. .71 O’Donovan.4. New York: Wiley. Texas Water Resources Institute. Vol. (1991). and McGee. C. Yurekli. M. Time Series Forecasting Using Neural Networks vs. Stochasticity. Yafee. (2009). IOS Press. New York.. Introduction to Time Series Analysis and Forecasting with Application of SAS and SPSS. R. Journal of Spatial Hydrology. Applied Statistical Time Series Analysis. Tang. Shalamu.

73 10.80 6.90 9.75 10.95 6.61 18.36 9.39 6.97 17.45 7.75 7.88 11.54 5.94 8.72 8.87 13.06 11.69 11.49 7.90 8.12 6.85 6.94 18.45 11.84 5.87 5.92 6.01 17.54 12.91 23.37 11.28 8.18 May 10.59 6.84 9.26 4.95 3.40 11.83 6.68 13.75 6.94 3.66 8.05 11.73 8.94 3.31 9.39 5.96 3.02 9.98 8.84 6.30 9.48 15.36 Apr 14.31 6.94 4.39 9.52 4.88 8.84 5.10 3.10 4.96 6.65 12.65 9.64 11.54 4.50 6.45 11.16 8.10 10.69 20.01 8.60 6.33 13.54 7.47 16.84 4.38 5.25 12.48 7.52 9.98 5.07 4.24 4.83 7.41 14.96 8.85 10.04 8.13 5.18 19.70 5.91 6.31 6.05 9.37 7.04 9.83 4.60 8.39 17.78 14.15 11.66 10.88 21.77 11.73 8.60 11.33 3.52 11.27 3.93 5.93 7.02 5.13 8.70 5.64 9.95 8.86 10.94 3.40 11.14 6.99 9.30 11.72 13.42 9.73 4.78 15.08 7.38 4.55 12.27 7.96 7.79 10.45 Aug 7.86 9.86 2.34 3.86 Mac 11.58 15.14 7.14 7.73 2.40 5.59 3.44 8.80 6.02 8.87 14.43 13.49 6.95 4.79 9.28 6.00 10.67 9.83 8.65 16.37 6.83 10.45 7.11 18.08 3.90 15.87 4.65 11.47 16.08 12.97 9.09 6.74 3.02 5.65 11.34 8.45 8.15 10.96 8.58 15.56 10.68 9.85 12.32 6.57 12.75 9.26 29.91 6.24 12.79 11.67 4.83 12.57 5.79 5.45 11.87 14.09 9.06 16.68 4.59 14.91 2.36 13.79 6.39 12.44 7.08 8.12 6.29 7.99 9.75 8.90 7.92 27.25 14.58 21.65 12.83 11.88 7.40 14.89 16.04 5.72 13.64 9.73 5.69 15.21 6.41 5.24 9.74 11.07 18.23 8.64 4.73 20.59 13.62 8.12 20.32 4.94 5.65 5.68 8.89 8.29 6.67 4.23 11.89 7.64 7.38 4.92 5.24 8.55 7.23 7.93 5.99 8.84 5.13 2.30 7.38 4.37 9.19 8.02 6.86 9.14 4.93 2.17 20.38 5.45 8.08 21.33 4.26 3.03 9.74 12.95 9.96 6.24 10.62 6.79 4.24 16.82 12.10 6.84 12.16 9.98 19.05 4.04 Sep 11.62 6.30 6.40 8.38 12.29 7.23 6.08 12.19 19.31 6.12 6.91 5.53 28.16 9.42 13.07 11.91 17.94 19.96 8.72 7.76 6.58 6.73 8.05 7.07 7.50 8.43 15.11 11.29 9.91 16.81 4.60 16.46 7.77 5.62 10.35 16.03 8.14 3.05 16.56 10.62 9.16 14.85 3.13 6.69 10.72 9.03 8.50 4.89 8.67 6.43 9.56 5.06 10.75 6.83 7.64 9.88 5.75 12.62 15.43 11.16 13.09 4.26 5.44 5.09 6.30 Nov 13.07 10.88 6.51 3.72 APPENDIX A Streamflow Data of Sungai Bernam 1960-2010 i 1960 1961 1962 1963 1964 1965 1966 1967 
1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 Jan 10.89 6.74 9.85 8.76 6.36 6.41 10.80 5.42 8.36 4.50 19.56 7.76 9.35 18.14 14.99 13.69 4.15 21.05 7.66 7.39 5.51 14.16 8.94 13.07 9.38 9.35 6.80 10.52 5.99 6.87 9.08 26.61 9.37 7.79 12.83 8.09 3.69 17.74 12.56 7.72 8.60 14.31 6.32 11.26 7.51 6.49 4.31 13.08 6.63 14.83 9.20 7.05 11.77 14.76 22.26 11.56 Dec 14.12 7.19 18.67 5.20 7.01 10.15 16.73 15.51 4.69 4.55 5.07 6.01 6.87 13.36 15.22 11.36 3.72 6.29 15.87 8.43 18.14 23.15 10.26 16.92 11.04 21.79 4.05 13.80 11.49 8.11 6.60 2.71 7.34 13.95 6.23 8.20 10.22 12.82 11.39 9.74 7.52 11.17 12.35 7.18 8.38 5.31 5.84 12.66 6.75 14.31 9.86 13.83 5.27 4.28 7.99 5.01 6.07 10.74 7.01 .26 10.34 3.06 9.13 7.89 18.03 8.88 8.15 19.22 13.09 8.12 19.14 15.99 7.99 17.49 16.56 9.48 8.81 9.46 12.95 7.47 9.18 11.55 5.90 3.54 10.36 11.17 5.18 14.91 4.80 9.72 14.16 8.14 7.04 7.51 29.39 18.57 14.08 17.48 6.97 17.63 9.83 Feb 8.08 4.15 16.83 3.82 12.24 15.16 Oct 10.05 8.99 20.88 10.70 5.71 12.38 6.27 8.32 11.64 7.94 4.71 16.51 12.25 5.06 10.81 7.84 8.99 5.99 4.61 7.58 10.06 6.82 11.90 13.91 3.46 11.45 20.94 11.96 6.49 5.80 3.87 13.21 9.05 3.73 9.11 16.72 11.53 8.72 13.05 9.09 12.69 7.59 16.24 5.94 9.22 14.12 29.62 3.31 7.04 7.07 17.91 9.68 5.48 5.99 3.37 14.62 8.26 10.17 Jun 6.73 6.11 7.24 15.08 9.71 4.46 6.03 8.22 7.96 14.57 7.26 6.02 18.01 5.53 4.04 10.46 18.25 12.30 11.29 16.66 8.49 11.27 10.95 4.96 8.21 5.51 Jul 11.31 19.21 4.86 7.07 4.50 8.91 5.44 10.33 7.70 8.55 4.42 11.78 18.37 7.72 12.87 7.

054 0.059 0.047 0.042 0.052 0.048 0.055 0.048 0.068 0.038 0.052 0.062 0.064 0.072 0.054 0.034 0.069 0.041 0.068 0.046 0.035 0.060 0.065 0.043 0.064 0.058 0.056 0.051 0.043 0.050 0.047 Sep 0.053 0.047 0.049 0.067 0.049 0.034 0.060 0.045 0.059 0.071 0.045 0.062 0.048 0.041 0.079 0.048 0.053 0.043 0.048 0.037 0.062 0.049 0.048 0.049 0.055 0.060 0.049 0.035 0.061 0.050 0.044 0.046 0.049 0.063 0.056 0.067 0.046 0.064 0.064 0.037 0.070 0.057 0.059 0.048 0.062 0.050 Feb 0.054 0.075 0.042 0.058 0.040 0.038 0.064 0.044 0.057 0.040 0.054 0.054 0.052 0.049 0.057 0.047 0.055 Jun 0.045 0.055 0.056 0.065 0.034 0.062 0.056 0.055 0.059 0.060 0.037 0.048 0.047 0.052 0.067 0.060 0.044 0.057 0.052 0.059 0.040 0.072 0.053 0.059 0.056 0.050 0.038 0.039 0.050 0.049 0.055 0.061 0.072 0.041 0.050 0.060 Nov 0.046 0.053 0.037 0.052 0.052 0.051 0.065 0.073 0.067 0.050 0.083 0.050 0.034 0.057 0.031 0.069 0.052 0.071 0.045 0.051 0.075 0.046 Aug 0.043 0.072 0.045 0.059 0.048 0.062 0.73 APPENDIX B Logarithm of Observed Streamflow Data for 1960-2005 i 1960 1961 1962 1963 1964 1965 1966 1967 1968 1969 1970 1971 1972 1973 1974 1975 1976 1977 1978 1979 1980 1981 1982 1983 1984 1985 1986 1987 1988 1989 1990 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 Mean Jan 0.045 0.050 0.054 0.044 0.060 0.044 0.052 0.033 0.035 0.052 0.052 0.059 0.037 0.042 0.046 0.038 0.056 0.053 0.072 0.033 0.036 0.045 0.048 0.047 0.045 0.076 0.079 0.051 0.036 0.051 0.064 0.050 0.046 0.049 0.044 0.048 0.059 0.054 0.053 0.049 0.052 0.042 0.066 0.045 0.060 .047 0.032 0.043 0.058 0.061 0.047 0.057 0.047 0.055 0.046 0.053 0.067 0.050 0.050 0.036 0.058 0.047 0.051 0.065 0.043 0.048 0.038 0.057 0.044 0.047 0.055 0.032 0.055 0.045 0.067 0.064 0.068 0.058 0.061 0.065 0.055 0.059 0.040 0.040 0.037 0.058 0.043 0.057 0.047 0.063 0.062 0.053 0.049 0.049 0.044 0.045 0.062 0.057 0.056 0.061 0.050 0.073 0.033 0.045 0.059 0.049 0.030 0.045 0.066 0.037 0.058 0.042 0.065 0.043 0.033 0.046 0.065 0.075 0.072 0.052 0.077 
0.050 0.068 0.052 0.063 0.060 0.043 0.045 0.043 0.067 0.057 0.045 0.072 0.059 0.036 0.041 0.039 0.046 0.053 0.030 0.052 0.074 0.046 0.067 0.055 0.053 0.063 0.048 0.047 0.058 0.043 0.043 0.073 0.075 0.042 0.052 0.039 0.049 0.057 0.039 0.063 0.060 0.067 0.040 0.060 0.046 0.040 0.044 0.049 0.052 0.071 0.052 0.056 0.045 0.058 0.067 0.048 0.051 0.083 0.054 0.069 0.061 0.050 0.073 0.066 0.053 0.069 0.067 0.049 0.053 0.057 0.057 0.043 0.064 0.044 0.046 0.043 0.056 0.064 0.038 0.053 0.040 0.044 0.075 0.044 0.051 0.057 0.036 0.045 0.058 0.039 0.055 0.043 0.054 0.049 Jul 0.040 0.042 0.057 0.049 0.052 0.044 0.031 0.048 0.055 0.045 0.033 0.058 0.049 0.067 0.038 0.045 0.065 0.043 0.059 0.046 0.038 0.057 0.058 0.050 0.048 0.037 0.053 0.065 0.044 0.045 0.060 0.068 0.060 0.029 0.036 0.075 0.050 0.053 0.086 0.064 0.058 0.051 0.071 0.048 0.069 0.039 0.053 0.076 0.056 0.061 0.065 Dec 0.070 0.050 0.054 Oct 0.053 0.047 Apr 0.053 0.055 0.043 0.060 0.041 0.047 0.059 0.057 0.046 0.054 0.047 0.050 0.063 0.038 0.051 0.036 0.055 0.043 0.044 0.039 0.054 0.071 0.051 0.071 0.058 0.066 0.040 0.059 0.055 0.072 0.047 0.063 0.038 0.065 0.072 0.035 0.053 0.052 0.041 0.040 0.052 0.047 0.063 0.048 0.052 0.038 0.047 0.050 0.069 0.048 0.061 0.050 0.042 0.045 0.052 May 0.054 0.044 0.038 0.058 0.060 0.046 0.058 0.058 0.050 0.051 0.053 0.044 0.052 0.041 0.039 0.074 0.053 0.058 0.049 0.040 0.035 0.057 0.060 0.054 0.044 0.045 0.051 0.043 0.060 0.050 0.043 0.042 0.042 0.035 0.070 0.041 0.050 0.053 0.051 0.056 0.055 0.036 0.053 0.069 0.055 0.040 0.052 0.041 0.059 0.034 0.051 0.043 0.061 0.059 0.067 0.036 0.071 0.050 0.050 0.045 0.049 0.040 0.060 0.040 0.068 0.038 0.066 0.054 0.040 0.074 0.063 0.054 0.041 0.035 0.056 0.064 0.045 Mac 0.041 0.049 0.067 0.

APPENDIX C

Generation of Random Number for Year 2006-2010

[Table: one row per month i, Jan-06 to Dec-10. Column headings as printed: i; RAND( ); z; erf^-1; t_{i,j}.]
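The column headings of Appendix C (RAND( ), z, erf^-1, t_{i,j}) suggest that uniform random numbers were converted to standard normal deviates through the inverse error function. A minimal sketch of that conversion follows. It assumes the usual relation t = sqrt(2) * erf^-1(2u - 1); whether the spreadsheet used exactly this mapping is an assumption, and the function names `uniform_to_normal` and `generate_deviates` are hypothetical.

```python
import math
import random
from statistics import NormalDist


def uniform_to_normal(u):
    """Map a uniform number u in (0, 1) to (z, t).

    z = erf^-1(2u - 1) and t = sqrt(2) * z is a standard normal deviate.
    The mapping itself is an assumption about how Appendix C was built.
    """
    if not 0.0 < u < 1.0:
        raise ValueError("u must lie strictly between 0 and 1")
    t = NormalDist().inv_cdf(u)   # standard normal quantile of u
    z = t / math.sqrt(2.0)        # equivalent inverse error function value
    return z, t


def generate_deviates(n, seed=None):
    """Return n standard normal deviates, one per simulated month."""
    rng = random.Random(seed)
    deviates = []
    while len(deviates) < n:
        u = rng.random()          # plays the role of RAND( )
        if u > 0.0:               # random() may return exactly 0.0
            deviates.append(uniform_to_normal(u)[1])
    return deviates
```

Calling `generate_deviates(60)` yields one deviate for each month from Jan-06 to Dec-10.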

APPENDIX D

Markov Model Streamflow

[Table: one row per month i, Jan-06 to Dec-10. Columns:
Month, i
Deterministic Component: q_{i-1,j-1}; q_j + b_j (q_{i-1,j-1} - q_{j-1})
Random Component: t_{i,j}; S_j t_{i,j} √(1 - r_j^2)
Model Flow: q_{i,j} (Log)]
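The column headings of Appendix D describe one step of the Markov model: a deterministic component q_j + b_j (q_{i-1,j-1} - q_{j-1}) plus a random component S_j t_{i,j} √(1 - r_j^2), giving the model flow q_{i,j} in log space. The sketch below evaluates a single step under those headings. The function name `markov_flow` and its parameter names are hypothetical, and the values in the usage note are illustrative, not taken from the study.

```python
import math


def markov_flow(q_prev, mean_j, mean_prev, b_j, s_j, r_j, t_ij):
    """One step of the Markov streamflow model (all flows in log space).

    q_prev    : q_{i-1,j-1}, flow of the preceding month
    mean_j    : q_j, mean flow of month j
    mean_prev : q_{j-1}, mean flow of month j-1
    b_j       : regression coefficient between months j-1 and j
    s_j       : S_j, standard deviation of month j
    r_j       : correlation between months j-1 and j
    t_ij      : t_{i,j}, standard normal deviate
    Returns (deterministic, random_part, model_flow).
    """
    if abs(r_j) > 1.0:
        raise ValueError("correlation r_j must lie in [-1, 1]")
    deterministic = mean_j + b_j * (q_prev - mean_prev)
    random_part = s_j * t_ij * math.sqrt(1.0 - r_j ** 2)
    return deterministic, random_part, deterministic + random_part
```

With illustrative inputs, `markov_flow(1.0, 1.2, 1.1, 0.5, 0.2, 0.6, 1.0)` gives a deterministic part of about 1.15, a random part of about 0.16 and a model flow of about 1.31.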

APPENDIX E

Performance Evaluation Procedure of Markov Model

[Table: one row per month i, Jan-06 to Dec-10. Columns: i; Actual Flow (m³/s); Model Flow (m³/s); MAPE; RMSE; Chi-square Test.]
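Appendix E compares actual and model flows through MAPE and RMSE. A minimal sketch of the two measures, using their standard definitions, is given below. The function names `mape` and `rmse` are hypothetical, and the numbers in the usage note are illustrative only, not values from the study.

```python
import math


def mape(actual, model):
    """Mean Absolute Percentage Error, in percent."""
    if len(actual) != len(model) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    total = sum(abs(a - m) / abs(a) for a, m in zip(actual, model))
    return 100.0 * total / len(actual)


def rmse(actual, model):
    """Root Mean Squared Error, in the units of the flows."""
    if len(actual) != len(model) or not actual:
        raise ValueError("series must be non-empty and of equal length")
    total = sum((a - m) ** 2 for a, m in zip(actual, model))
    return math.sqrt(total / len(actual))
```

For the illustrative series `actual = [10.0, 20.0]` and `model = [12.0, 18.0]`, `mape` returns 15 (percent) and `rmse` returns 2, up to floating-point rounding. MAPE is undefined when an actual flow is zero, so such months would need separate handling.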

APPENDIX F

ARIMA Model Streamflow

[Table: one row per month i, Jan-06 to Dec-10. Column headings as printed: i; Actual Flow (m³/s); Model Flow (m³/s); Residual; Fit; Coefficient. Two columns each begin with thirteen entries printed as * (no value).]

APPENDIX G

Performance Evaluation Procedure of ARIMA Model

[Table: one row per month i, Jan-06 to Dec-10. Columns: i; Actual Flow (m³/s); Model Flow (m³/s); MAPE; RMSE; Chi-square Test.]