APPLIED MODELING OF HYDROLOGIC TIME SERIES

by J. D. Salas, J. W. Delleur, V. Yevjevich and W. L. Lane

WATER RESOURCES PUBLICATIONS

DEDICATION
To Jose and Olinda
To my family
To my family
To Marily, Mark and Tim

For information and correspondence:
WATER RESOURCES PUBLICATIONS
P.O. Box 2841
Littleton, Colorado 80161, U.S.A.

About the Authors: Jose D. Salas is Associate Professor of Civil Engineering at Colorado State University, Fort Collins, Colorado. Jacques W. Delleur is Professor of Civil Engineering at Purdue University, Lafayette, Indiana. Vujica M. Yevjevich is Professor of Civil Engineering at Colorado State University and Research Professor and Director of the International Water Resources Institute, School of Engineering and Applied Science, George Washington University, Washington, D.C. William L. Lane is Hydrologist, Division of Planning, at the Water and Power Resources Service, Pacific Northwest Regional Office, Boise, Idaho.

ISBN 0-918334-37-3
U.S. Library of Congress Catalog Card Number 80-53334

Copyright © 1980 by Water Resources Publications. All rights reserved. Printed in the United States of America. The text of this publication may not be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, without written permission from the Publisher. This publication is printed and bound by BookCrafters, Inc., Chelsea, Michigan, U.S.A.

TABLE OF CONTENTS

Chapter 1  INTRODUCTION
1.1 STOCHASTIC PROCESSES AND TIME SERIES
1.2 TIME SERIES MODELS
1.3 TIME SERIES MODELING
1.4 PHYSICAL BASIS OF TIME SERIES MODELING IN HYDROLOGY
1.5 REPRODUCTION OF HISTORICAL STATISTICAL CHARACTERISTICS
1.6 TIME SERIES MODELS IN HYDROLOGY
1.7 TIME SERIES MODELING IN HYDROLOGY
1.8 APPLICABILITY OF TIME SERIES MODELING IN HYDROLOGY
APPENDIX A1.1 DEFINITIONS, TERMS AND NOTATIONS
APPENDIX A1.2
ELEMENTARY STATISTICAL PRINCIPLES
APPENDIX A1.3 ELEMENTARY MATRIX DEFINITIONS AND COMPUTATIONS
REFERENCES

Chapter 2  CHARACTERISTICS OF HYDROLOGIC SERIES
2.1 TYPES OF HYDROLOGIC SERIES
2.1.1 TIME SERIES
2.1.2 LINE SERIES
2.1.3 COUNTING SERIES
2.2 GENERAL PROPERTIES OF HYDROLOGIC TIME SERIES
2.2.1 COMPONENTS OF HYDROLOGIC SERIES
2.2.2 BASIC STATISTICAL CHARACTERISTICS OF TIME SERIES
2.2.3 COMPLEX CHARACTERISTICS OF PERIODIC TIME SERIES
2.2.4 DROUGHT RELATED CHARACTERISTICS OF TIME SERIES
2.2.5 STORAGE RELATED CHARACTERISTICS OF TIME SERIES
2.2.6 NONHOMOGENEITY AND INCONSISTENCY IN HYDROLOGIC SERIES
2.3 CHARACTERISTICS OF ANNUAL TIME SERIES
2.4 CHARACTERISTICS OF PERIODIC TIME SERIES
2.5 CHARACTERISTICS OF MULTIVARIATE TIME SERIES
2.6 CHARACTERISTICS OF INTERMITTENT TIME SERIES
REFERENCES

Chapter 3  STATISTICAL PRINCIPLES AND TECHNIQUES FOR TIME SERIES MODELING
3.1 BASIC ESTIMATION TECHNIQUES
3.1.1 PROPERTIES OF ESTIMATORS
3.1.2 METHOD OF MOMENTS
3.1.3 METHOD OF LEAST SQUARES
3.1.4 METHOD OF MAXIMUM LIKELIHOOD
3.1.5 JOINT ESTIMATION OF PARAMETERS
3.1.6 PARAMETER ESTIMATION BY REGIONALIZATION
3.2 NORMALIZATION OF TIME SERIES VARIABLES
3.2.1 NORMALIZATION OF ANNUAL TIME SERIES
3.2.2 NORMALIZATION OF PERIODIC TIME SERIES
3.2.3 REMARKS
3.3 ESTIMATION OF PERIODIC PARAMETERS BY FOURIER SERIES
3.3.1 JUSTIFICATION OF USING FOURIER SERIES
3.3.2 ESTIMATION OF FOURIER SERIES COEFFICIENTS
3.3.3 SELECTION OF SIGNIFICANT HARMONICS AND FOURIER COEFFICIENTS
3.4 ESTIMATION OF PARAMETERS OF MULTIVARIATE MODELS
3.5 TESTS OF GOODNESS OF FIT
3.5.1 TESTS OF INDEPENDENCE
3.5.2 TESTS OF NORMALITY
3.6 PRESERVATION OF STATISTICS AND PARSIMONY OF PARAMETERS
3.7 GENERATION AND FORECASTING
3.7.1 GENERATION OF SYNTHETIC SAMPLES
3.7.2 USE OF MODELS FOR FORECASTING
REFERENCES

Chapter 4  AUTOREGRESSIVE MODELING
4.1 DESCRIPTION OF AR MODELS
4.1.1 MATHEMATICAL FORMULATION OF AR MODELS
4.1.2 PROPERTIES OF AR MODELS
4.2 AR MODELING OF ANNUAL TIME SERIES
4.2.1 ANNUAL AR MODELS
4.2.2 PARAMETER ESTIMATION FOR ANNUAL AR MODELS
4.2.3 GOODNESS OF FIT FOR ANNUAL AR MODELS
4.2.4 GENERATION AND FORECASTING USING ANNUAL AR MODELS
4.2.5 SUMMARIZED AR MODELING PROCEDURE FOR ANNUAL SERIES
4.2.6 EXAMPLE OF AR MODELING OF ANNUAL SERIES
4.2.7 LIMITATIONS OF ANNUAL AR MODELING
4.2.8 PRACTICAL APPLICATIONS OF ANNUAL AR MODELS
4.3 AR MODELING OF PERIODIC TIME SERIES
4.3.1 PERIODIC AR MODELS
4.3.2 PARAMETER ESTIMATION FOR PERIODIC AR MODELS
4.3.3 GENERATION USING PERIODIC AR MODELS
4.3.4 SUMMARIZED AR MODELING PROCEDURE FOR PERIODIC SERIES
4.3.5 EXAMPLE OF AR MODELING OF PERIODIC SERIES
4.3.6 LIMITATIONS OF PERIODIC AR MODELING
4.3.7 PRACTICAL APPLICATIONS OF PERIODIC AR MODELS
APPENDIX A4.1 AUTOCORRELATION FUNCTION OF AR(p) MODELS
APPENDIX A4.2 PARTIAL AUTOCORRELATION FUNCTION OF AR(p) MODELS
APPENDIX A4.3 ANNUAL FLOWS OF THE GOTA RIVER, SWEDEN
REFERENCES

Chapter 5  AUTOREGRESSIVE-MOVING AVERAGE MODELING
5.1 DESCRIPTION OF ARMA MODELS
5.1.1 MATHEMATICAL FORMULATION OF ARMA MODELS
5.1.2 PROPERTIES OF ARMA MODELS
5.2 ARMA MODELING OF ANNUAL TIME SERIES
5.2.1 ANNUAL ARMA MODELS
5.2.2 PARAMETER ESTIMATION FOR ANNUAL ARMA MODELS
5.2.3 GOODNESS OF FIT FOR ANNUAL ARMA MODELS
5.2.4 GENERATION USING ANNUAL ARMA MODELS
5.2.5 FORECASTING USING ANNUAL ARMA MODELS
5.2.6 SUMMARIZED ARMA MODELING PROCEDURE FOR ANNUAL SERIES
5.2.7 EXAMPLES OF ARMA MODELING FOR GENERATION AND FORECASTING ANNUAL TIME SERIES
5.2.8 LIMITATIONS TO BE CONSIDERED IN APPLICATIONS OF ARMA MODELING OF ANNUAL SERIES
5.3 ARMA MODELING OF PERIODIC TIME SERIES
5.3.1 PERIODIC ARMA MODELS
5.3.2 PARAMETER ESTIMATION FOR PERIODIC ARMA MODELS
5.3.3 GOODNESS OF FIT FOR PERIODIC ARMA MODELS
5.3.4 SUMMARIZED ARMA MODELING PROCEDURE FOR PERIODIC SERIES
5.3.5 EXAMPLES OF ARMA MODELING OF PERIODIC SERIES
5.3.6 LIMITATIONS TO BE CONSIDERED IN APPLICATIONS OF ARMA MODELING OF PERIODIC SERIES
APPENDIX A5.1 COMPUTER PROGRAMS
APPENDIX A5.2 COMPUTER PROGRAM USED IN THE ANNUAL SERIES EXAMPLE, SEC. 5.2.7
APPENDIX A5.3 CALCULATOR PROGRAM
APPENDIX A5.4 COMPUTER PROGRAM USED IN MONTHLY SERIES EXAMPLE, SEC. 5.3.5
REFERENCES

Chapter 6  AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODELING
6.1 DESCRIPTION OF ARIMA MODELS
6.1.1 THE DIFFERENCING OPERATION
6.1.2 THE ARIMA MODEL
6.2 SIMPLE ARIMA MODELING OF TIME SERIES
6.2.1 THE SIMPLE ARIMA MODEL
6.2.2 PARAMETER ESTIMATION FOR SIMPLE ARIMA MODELS
6.2.3 GOODNESS OF FIT FOR SIMPLE ARIMA MODELS
6.2.4 SUMMARIZED PROCEDURE FOR SIMPLE ARIMA MODELING
6.2.5 EXAMPLE OF SIMPLE ARIMA MODELING
6.2.6 LIMITATIONS TO BE CONSIDERED IN APPLICATIONS OF SIMPLE ARIMA MODELS
6.3 MULTIPLICATIVE ARIMA MODELING OF PERIODIC TIME SERIES
6.3.1 THE MULTIPLICATIVE ARIMA MODEL
6.3.2 PARAMETER ESTIMATION FOR MULTIPLICATIVE ARIMA MODELS
6.3.3 GOODNESS OF FIT FOR MULTIPLICATIVE ARIMA MODELS
6.3.4 SUMMARIZED PROCEDURE FOR MULTIPLICATIVE ARIMA MODELING
6.3.5 EXAMPLES OF MULTIPLICATIVE ARIMA MODELING
6.3.6 LIMITATIONS TO BE CONSIDERED IN APPLICATIONS OF MULTIPLICATIVE ARIMA MODELS
6.3.7 COMPARISON AND LIMITATIONS OF ARMA AND ARIMA MODELS
APPENDIX A6.1 COMPUTER PROGRAMS
APPENDIX A6.2 COMPUTER PROGRAM USED IN SIMPLE ARIMA MODELING, EXAMPLE 6.2.5
APPENDIX A6.3 PROGRAM UNESTM
APPENDIX A6.4 PROGRAM UNESTM AND DATA INPUT FOR MULTIPLICATIVE ARIMA MODELING, EXAMPLE 6.3.5
REFERENCES

Chapter 7  MULTIVARIATE MODELING OF HYDROLOGIC TIME SERIES
7.1 DESCRIPTION OF MULTIVARIATE TIME SERIES MODELS
7.1.1 GENERAL MATHEMATICAL MODELS
7.1.2 PROPERTIES OF MULTIVARIATE MODELS
7.2 MULTIVARIATE MODELING OF ANNUAL SERIES
7.2.1 MULTIVARIATE AUTOREGRESSIVE AR(1) AND AR(2) MODELS
7.2.2 APPROXIMATE MULTIVARIATE AUTOREGRESSIVE MOVING AVERAGE ARMA(p,q) MODEL
7.2.3 GOODNESS OF FIT FOR MULTIVARIATE ANNUAL MODELS
7.2.4 SUMMARIZED MODELING PROCEDURE FOR MULTIVARIATE ANNUAL SERIES
7.2.5 EXAMPLE OF MODELING MULTIVARIATE ANNUAL TIME SERIES
7.2.6 LIMITATIONS OF MULTIVARIATE ANNUAL MODELING
7.2.7 PRACTICAL APPLICATIONS OF ANNUAL MULTIVARIATE MODELS
7.3 MULTIVARIATE MODELING OF PERIODIC TIME SERIES
7.3.1 MULTIVARIATE AR MODELS
7.3.2 MULTIVARIATE ARMA MODELS
7.3.3 GOODNESS OF FIT FOR MULTIVARIATE PERIODIC MODELS
7.3.4 SUMMARIZED MULTIVARIATE MODELING PROCEDURE FOR PERIODIC SERIES
7.3.5 EXAMPLE OF MODELING MULTIVARIATE PERIODIC TIME SERIES
7.3.6 LIMITATIONS OF MULTIVARIATE PERIODIC MODELING
7.3.7 PRACTICAL APPLICATIONS OF PERIODIC MULTIVARIATE MODELS
APPENDIX A7.1 TABLES OF TIME SERIES DATA USED IN THE EXAMPLES OF CHAPTER 7
REFERENCES

Chapter 8  DISAGGREGATION MODELING
8.1 DESCRIPTION OF DISAGGREGATION MODELS
8.1.1 GENERAL DISAGGREGATION MODEL
8.1.2 SINGLE SITE TEMPORAL DISAGGREGATION MODELS
8.1.3 MULTISITE TEMPORAL DISAGGREGATION MODELS
8.1.4 SINGLE SITE HIGHER ORDER AR MODELS
8.1.5 MULTIVARIATE HIGHER ORDER AR MODELS
8.1.6 SPATIAL DISAGGREGATION MODEL
8.2 PROPERTIES OF DISAGGREGATION MODELS
8.2.1 PRESERVATION OF EXPECTED VALUES BY DISAGGREGATION MODELS
8.2.2 PRESERVATION OF ADDITIVITY BY DISAGGREGATION MODELS
8.2.3 PRESERVATION OF COVARIANCES AND VARIANCES BY DISAGGREGATION MODELS
8.3 PARAMETER ESTIMATION FOR DISAGGREGATION MODELS
8.3.1 PARAMETER ESTIMATION FOR THE LINEAR DEPENDENCE MODEL
8.3.2 PARAMETER ESTIMATION FOR THE BASIC TEMPORAL DISAGGREGATION MODEL
8.3.3 PARAMETER ESTIMATION FOR THE EXTENDED TEMPORAL DISAGGREGATION MODEL
8.3.4 PARAMETER ESTIMATION FOR THE CONDENSED TEMPORAL DISAGGREGATION MODEL
8.3.5 PARAMETER ESTIMATES FOR THE SPATIAL DISAGGREGATION MODEL
8.4 GOODNESS OF FIT FOR DISAGGREGATION MODELS
8.5 SUMMARIZED MODELING PROCEDURE FOR DISAGGREGATION MODELS
8.6 GENERATION AND FORECASTING USING DISAGGREGATION MODELS
8.7 EXAMPLE OF DISAGGREGATION MODELING
8.8 LIMITATIONS OF DISAGGREGATION MODELING
8.9 PRACTICAL APPLICATIONS OF DISAGGREGATION MODELS
APPENDIX A8.1 ESTIMATION OF COVARIANCES
REFERENCES

Chapter 9  CONSIDERATIONS IN MODEL APPLICATIONS
9.1 PRETREATMENT OF HISTORICAL DATA
9.1.1 DATA COMPILATION
9.1.2 DATA FILL-IN AND EXTENSION
9.1.3 REDUCTION OF DATA TO NATURAL CONDITIONS
9.2 MODEL SELECTION
9.2.1 IMPORTANCE OF HISTORICAL STATISTICS
9.2.2 PRESERVATION OF HISTORICAL STATISTICS
9.2.3 PURPOSE FOR GENERATION
9.2.4 SENSITIVITY OF RESULTS
9.2.5 REGIONAL ANALYSIS
9.3 MODEL APPLICATIONS
9.3.1 RESERVOIR SIZING STUDIES
9.3.2 RESERVOIR OPERATION STUDIES
9.3.3 BASIN-WIDE STUDIES
9.4 MODEL LIMITATIONS
9.4.1 SHORT- AND LONG-TERM PERSISTENCE
9.4.2 ANNUAL MODEL LIMITATIONS
9.4.3 PERIODIC MODEL LIMITATIONS
9.4.4 DISAGGREGATION MODEL LIMITATIONS
9.5 CLOSING REMARKS
INDEX BY SUBJECT
INDEX BY AUTHOR

PREFACE

The purpose of this book is to present the application of time series analysis for modeling hydrologic time series for the use of practicing engineers. The book attempts to bridge the gap between the complexities of the research literature and the oversimplifications that can be found in some elementary textbooks and handbooks.

The concepts of random variables and random phenomena have been used in the field of hydrology since the beginning of the 20th century. About the same time, statistics and probability theory were being applied to the analysis of river flow sequences. During the 1950's, such early concepts and applications were extended, introducing the idea of generating streamflow samples by using tables of normal random numbers. However, it was not until the beginning of the 1960's that the formal development of stochastic modeling began, with the introduction and application of autoregressive (Markov) models to seasonal and annual hydrologic time series. Since then, a great deal of work has been done and published. Research on hydrologic time series has been aimed at studying their main statistical characteristics, providing physical justification for some stochastic models, developing new and/or alternative models, improving the estimates of model parameters, developing new or improving existing modeling procedures, improving tests of goodness of fit, developing procedures for dealing with model and parameter uncertainties, and studying the sensitivity of models and model parameters in applied hydrology.

Although the development of time series modeling in hydrology has reached some degree of sophistication, unfortunately most time series modeling in practice is still generally based on simplified methods.
This usually involves the selection of the time series model (mostly an autoregressive model) in advance, and the estimation of its parameters by the method of moments, without verifying constraints and without testing the goodness of fit of the model or comparing it with competing models. It appears that one of the main reasons for this state of the art is that, although many statistical books are available with modeling techniques and many research papers have been published in several journals (Water Resources Research, Journal of Hydrology, Journal of the ASCE Hydraulics Division, etc.), they appear to be impractical for most practicing engineers due to mathematical complexities. Another reason simplified methods are used is that the available textbooks, monographs and handbooks in hydrology which include stochastic modeling are either too cumbersome to read or oversimplified. Therefore, most practicing engineers either find difficulties in following the research-oriented literature, which was not written for them, or tend to follow a very simplified and limited modeling approach.

This book covers the steps involved in the application of time series analysis for modeling hydrologic time series. The models considered have been carefully limited to those which are inferred by the authors to be of the most use and promise to practitioners. An ever-present goal has been to simplify the presentation and to avoid saturating the reader with every possible model and technique. It is readily admitted that this approach may mean that some of the very best, but as yet unused, unrecognized or unproven models and techniques have not been presented. Since this book is oriented to practitioners and not to researchers, this limitation is unavoidable. The choice of which models have been developed adequately for use by practitioners was a subjective choice on the part of the authors of this book.
This book is the outgrowth of lecture notes prepared for the "Computer Workshop in Statistical Hydrology" held at Colorado State University in July, 1978. Discussions with J. C. Schaake, T. E. Croley II, D. C. Boes, E. Benzeden, and R. A. Smith aided in the shaping of the contents of this book. A draft of this book was used as a text for the graduate level course "Stochastic Processes in Hydrology" during Spring, 1980, and for the summer course "Statistical Computer Techniques in Hydrology and Water Resources," July, 1980, both taught at Colorado State University. Part of the draft was also used for a graduate level course "Statistical Hydrology" at Purdue University during Spring, 1980. Gratitude is extended to participants of the two summer courses, as well as to the graduate students of the class of Spring, 1980, for their comments and suggestions which improved the text.

Gratitude is also extended to the Department of Civil Engineering at Colorado State University, the School of Civil Engineering at Purdue University, the School of Engineering and Applied Science at George Washington University, and the Engineering and Research Center, Water and Power Resources Service (previously Bureau of Reclamation), U.S. Department of the Interior, for giving us the opportunity to be involved in teaching, research and practice. Without such experience this book would have been difficult to write. Acknowledgment is also due to the National Science Foundation, the Office of Water Resources Research and Technology, and the Colorado State University Experiment Station for providing us with research support in the use of statistics, probability theory and stochastic processes in hydrology, research experience which gave us insight into the various aspects of modeling hydrologic time series.

J. D. Salas
J. W. Delleur
V. Yevjevich
W. L. Lane

Chapter 1

INTRODUCTION

Most technical literature does little to ease the experience a beginner has in applying stochastic techniques. The jargon,
mathematical intricacies, and errors common to the literature on stochastic hydrology, plus the constant contesting of rival approaches, make the chance of success of any initial attempt very small. This book is an attempt to simplify the presentation of stochastic approaches to facilitate their understanding and application by practitioners. A major aim of this book is to cover in an easy-to-understand manner the analysis and modeling of hydrologic time series. Only the most commonly used models are covered.

This chapter introduces some of the main concepts, definitions and notations which are used throughout the rest of the book. Chapter 2 deals with the analysis and computation of the main statistical characteristics of hydrologic time series, and Chapter 3 gives the most relevant techniques needed in the other chapters for modeling hydrologic series. Chapters 4 through 8 cover the step-by-step modeling procedures for univariate and multivariate series of annual and shorter time intervals. The last chapter, Chapter 9, gives some general comments on the state of the available hydrologic series and the analysis to be made prior to its stochastic modeling, comments on the relative applicability of the models to water resources problems, and other comments extending or relating to presentations in the previous chapters.

1.1 STOCHASTIC PROCESSES AND TIME SERIES

Consider a variable denoted by X. If the outcome of this variable can be predicted with certainty, the variable is said to be a deterministic variable. On the other hand, if the outcome of X cannot be predicted with certainty, then X is a random variable. In the latter case, we can also say that the outcome of X can be expressed in a probability sense, or that X is governed by laws of probability. Assume now that the outcome of X can be observed in a sequential manner, say X_1, X_2, ..., where the subscript may represent intervals of time, distance, etc.
Such a sequence is called a series, and when the interval is time, it is called a time series. Often, when describing the properties and attributes of time series, the discussion also applies to series in general, and we shall often use the term series to mean time series. If X is a deterministic variable, then the sequence X_1, X_2, ... is a deterministic series. Furthermore, the set of variables {X_t} associated with its deterministic governing mechanism or law is called a deterministic process. Similarly, if X is a random variable, then X_1, X_2, ... is a probabilistic series or, in general, a stochastic series. Moreover, the set of random variables X_1, X_2, ... associated with its underlying probability distribution is called a probabilistic process or, in general, a stochastic process.

Actually, a stochastic process requires the knowledge of the joint probability distribution f(X_1, X_2, X_3, ...) of the random variables X_1, X_2, X_3, ... . If the joint distribution can be factored into the product of the marginal distributions, as f(X_1) · f(X_2) · f(X_3) ..., the process becomes an independent stochastic process and the series is an independent series. Otherwise, there is a certain type of serial dependence structure among the variables; the process is then called a serially dependent stochastic process and, correspondingly, the series a serially dependent time series.

A sample series X_1, X_2, ..., X_N obtained from a given stochastic process is called a realization of the stochastic process. There is an infinite number of possible realizations if the distribution of the X's is continuous. Figure 1.1(a) shows two realizations of a stochastic process.

Up to now, our definitions of stochastic process and time series were restricted to outcomes that occur at discrete points in time, although they can be defined in continuous time too.
Since the hydrologic time series dealt with in this text are expressed at discrete times, such as days, weeks, months and years, we will continue our presentation of processes and series in discrete time. However, note that the graphical representation of discrete series is often made with continuous lines, just because it is easier to observe the graphical appearance and the overall configuration of the series. Figure 1.1(b) shows a continuous representation of the same discrete series shown in Fig. 1.1(a).

Figure 1.1. (a) Two realizations of a stochastic process with values plotted only at discrete points, and (b) the same two realizations of the stochastic process plotted as continuous lines.

Although all properties of a stochastic process are imbedded in the joint distribution f(X_1, X_2, ...), it is convenient to indicate some specific properties such as expected values, variances and covariances (see Appendix A1.2 for the formal definition of these properties). In general, the expected value of a stochastic process X_1, X_2, ... is composed of the set of expected values at each position in time, namely E(X_1), E(X_2), ... . Similarly, the set of variances are Var(X_1), Var(X_2), ... . We will also use the notations μ_t = E(X_t) and σ_t² = Var(X_t), t = 1, 2, ..., to represent the expected values and variances, respectively. Considering any two positions t and t-k, the covariance between the variables X_t and X_{t-k} is represented by Cov_t(k) = Cov(X_t, X_{t-k}). The covariance is the property describing the linear dependence of the stochastic process.

A stochastic process (time series) is stationary in the mean, or first-order stationary, if the expected values do not vary with time, that is, E(X_1) = E(X_2) = ... = E(X_t) = E(X) = μ. Similarly, when Var(X_t) = σ², t = 1, 2, ..., is a constant, the stochastic process is stationary in the variance. A stochastic process is stationary in the covariance when the covariance depends only on the time lag k but does not depend on the position t.
That is, Cov(X_t, X_{t-k}) = Cov(k) regardless of t. A stochastic process is second-order stationary when it is stationary in the mean and in the covariance. Note that stationarity in the covariance implies stationarity in the variance. A second-order stationary process is also called stationary in the wide sense, or weakly stationary. In the above definitions, instead of using the term "stationary stochastic process" we can also use the term "stationary time series" or simply "stationary series." If the other statistical properties besides the mean, variance and covariance do not depend on time, the stochastic process is stationary in the strict sense, or strongly stationary. Conversely, if any property depends on time, the process is a non-stationary process. However, as the various definitions of stationarity would imply, a process can be stationary in regard to one property, but non-stationary in regard to another property.

1.2 TIME SERIES MODELS

A mathematical model representing a stochastic process is called a "stochastic model" or "time series model." It has a certain mathematical form or structure and a set of parameters. A simple time series model could be represented by a single probability distribution function f(X; θ), with parameters θ = {θ_1, θ_2, ...}, valid for all positions t = 1, 2, ..., and without any dependence between X_1, X_2, ... . For instance, if X is normal with mean μ and variance σ², the time series model can be conveniently written as

    X_t = μ + σ ε_t ,   t = 1, 2, ...                    (1.1)

where ε_t is also normal, with mean zero and variance one, and ε_1, ε_2, ... are independent. In Eq. (1.1) the model has the parameters μ and σ, and since they are constants (they do not vary with time) the model is stationary. The structure of the model is simple, since the variable X_t is a function only of the independent variable ε_t, and so X_t is also independent.
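As a sketch of how a model like Eq. (1.1) generates a stationary series, the short simulation below (with illustrative values of μ and σ that are not from the text) draws independent standard normal ε_t and checks that the sample mean and standard deviation come out essentially the same over different sub-periods, as stationarity implies.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative parameter values (assumptions, not from the text):
mu, sigma = 100.0, 25.0   # e.g., mean and standard deviation of annual flows
N = 5000                  # length of the generated realization

# Eq. (1.1): X_t = mu + sigma * eps_t, with eps_t independent standard normal
eps = rng.standard_normal(N)
X = mu + sigma * eps

# Because mu and sigma are constants, the series is stationary: sample
# statistics over any sub-period estimate the same mu and sigma.
first_half, second_half = X[:N // 2], X[N // 2:]
print(round(first_half.mean(), 1), round(second_half.mean(), 1))
print(round(first_half.std(ddof=1), 1), round(second_half.std(ddof=1), 1))
```

Any other choice of μ and σ would serve equally well; the point is only that the sub-period statistics agree within sampling variability.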
A time series model with a dependence structure can be formed as

    ε_t = φ ε_{t-1} + ξ_t                                (1.2)

where ξ_t is an independent series with mean zero and variance (1 - φ²), ε_t is the dependent series, and φ is the parameter of the model. In Eq. (1.2), ε_t is a dependent series because, in addition to being a function of ξ_t, it is a function of the same variable ε at time t-1. If ε_t in Eq. (1.1) were represented by the dependent model of Eq. (1.2), then X_t would also become a dependent model. In this case the parameters of the model of X_t would be μ, σ and φ. Since the parameters of the above models are constants, the models are stationary, representing stationary time series or stationary stochastic processes. Non-stationary models would result if such parameters varied with time.

1.3 TIME SERIES MODELING

Assume we have a sample time series X_1, X_2, ..., X_N of size N, such as N years of annual streamflows. We would like to find a mathematical model that represents such a time series. The techniques and procedures for finding such a model are called "time series modeling."

Time series modeling is a process which can be simple or complex, depending on the characteristics of the available sample series, on the type of model to use, and on the selected techniques of modeling. For instance, series with statistical characteristics that do not vary with time usually lead to models and modeling techniques which are simpler than those of series with time-varying characteristics. There are several types of stochastic models which can be used to represent a time series; some are more complex than others. For a particular type of model, there are various techniques for estimating the parameters of the model and for testing how good the model is, and again some techniques are more complex than others. Much of the simplicity or complexity of the modeling process ultimately depends on the modeler, such as the modeler's theoretical knowledge and practical experience.
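The dependent model of Eq. (1.2) can likewise be sketched numerically. In this minimal simulation (φ = 0.4 is an assumed illustrative value), ξ_t is drawn with variance 1 - φ², so the generated ε_t keeps variance one, and its lag-1 sample autocorrelation comes out close to φ.

```python
import numpy as np

rng = np.random.default_rng(7)

phi = 0.4      # illustrative dependence parameter (an assumption, not from the text)
N = 20000      # long realization so sample statistics are close to theory

# Eq. (1.2): eps_t = phi * eps_{t-1} + xi_t, where xi_t is independent
# with mean zero and variance (1 - phi**2), so eps_t keeps variance one.
xi = rng.normal(0.0, np.sqrt(1.0 - phi**2), N)
eps = np.empty(N)
eps[0] = rng.standard_normal()       # start from the stationary distribution
for t in range(1, N):
    eps[t] = phi * eps[t - 1] + xi[t]

# Lag-1 sample autocorrelation should be near phi; variance near one.
r1 = np.corrcoef(eps[:-1], eps[1:])[0, 1]
print(round(r1, 2), round(eps.var(), 2))
```

Substituting this dependent ε_t into Eq. (1.1) would carry the same lag-1 dependence over to X_t, which is exactly the construction described in the text.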
In general, time series modeling can be organized in the following stages (Box and Jenkins, 1970): 1) the selection of the type of model, 2) the identification of the form of the model, 3) the estimation of the model parameters, and 4) the diagnostic check of the model. The first stage refers to selecting a type of model among the various types of models available to the analyst. For instance, two common types of models for representing the dependence of time series are the Markov chain model and the autoregressive model; for a particular case, the modeler may have to choose between these two types of models. Once a type of model is selected, the next stage is to identify the form or the order of the model. For instance, if autoregressive models were selected in the first stage, then we need to identify the order of the autoregressive model, say order one (one autoregressive coefficient or one parameter), order two, etc. The third stage is to estimate the parameters of the model identified in stage two, and some checks are made on the conditions to be met by the estimated parameters. The final stage of the modeling is to make some diagnostic checks to verify how good the model is. The overall time series modeling is actually an iterative process, with feedback and interaction between each of the above-referred stages.

1.4 PHYSICAL BASIS OF TIME SERIES MODELING IN HYDROLOGY

Differences always exist between the true and estimated models and between the true and estimated model parameters. These differences represent modeling uncertainties. One way of decreasing such uncertainties is by selecting the model which best represents the physical reality of the system. Sometimes it may be feasible to use physical laws to infer what should be the mathematical expressions of the corresponding stochastic models of hydrologic series. This inference is very much contingent on how well the application of physical laws fits the natural hydrologic phenomena, and how various errors and complexities further affect differences between the true and the inferred mathematical models.

The modeling of streamflow time processes has essentially followed two approaches: the deterministic or physical simulation of the hydrologic system, and the statistical or stochastic simulation of the system. In the first approach, the hydrologic system is described and represented by theoretical and/or empirical physical relationships; there is always a unique correspondence between the input, say precipitation, and the output, say streamflow. Within this approach, two representative models are the well-known Stanford Watershed Model (Crawford and Linsley, 1966) and the MIT model (Harley et al., 1970). On the other hand, in the stochastic approach, a type of model is assumed, aimed at representing the most relevant statistical characteristics of the historic series. Within this approach, the most widely used models have been the autoregressive models (Thomas and Fiering, 1962; Yevjevich, 1963). Subsequently, other deterministic and stochastic models appeared in the literature.

Several arguments have been given by the advocates of the deterministic and stochastic approaches to streamflow modeling and simulation, usually in favor of their own and/or against the other approach. In spite of that, Yevjevich (1963), Thomas (1965) and Fiering (1967) tried to set the physical basis of stochastic modeling, at least for the case of the autoregressive models. In the 1970's, a tendency was observed toward linking and reconciling the deterministic and stochastic approaches (Quimpo, 1971).
On the one hand, the deterministic approach treats the precipitation as a random variable and transfers such randomness to 6 streamflow while keeping its deterministic framework; on the other hand, physical justification of stochastic models is be- coming relevant not only for operational purposes, but also for explaining certain controversial aspects in stochastic hydrology, such as the Hurst phenomenon. Following Quimpo's lead, other papers appeared in the literature such as those by Moss and Bryson (1974), concerning the physical basis of seasonal stochastic models, Klemes (1973), concerning the modeling of watershed runoff based on concepts of semi- jafinite storage reservoirs, O'Connor (1976) extending Quimpo's work and relating the unit hydrograph and flood routing models to autoregressive and moving average models, and by Pegram (1977), Selvalingan (1977) and Dawdy et al (1978) in providing the physical justification of continuous stochastic streamflow models As an example of joint physical and statistical analysis in inferring on the model, it can be demonstrated that if the river flow recession is a simple exponential function Q, = Q, exp(-Kt) where @, is the flow at the beginning of a year and K is the recession constant, then the time dependent annual runoff series Y, should follow the first order autore- gressive model AR(1), namely Y; - Y = o(¥j_) Yt cn where Y is the mean of Y;, $ is the autoregression co- efficient and e; is the independent stochastic component (white noise), (Yevjevich, 1963). It is not difficult to find that the recession constant K and the autoregressive param- eter are related as 6 = exp (-K). However, how close the AR(1) model is to the true model of the annual runoff series, depends on how good the assumption of exponential river flow recessions are. As an example of the physical justification of autoregressive and moving average (ARMA) models for annual streamflow simulation, let us consider a watershed system as in Fig. 
1.2, where the variables are annual values. Then the annual streamflow z_t is composed of a groundwater contribution equal to c S_{t-1} and surface runoff equal to d x_t (Fiering, 1967). That is

z_t = c S_{t-1} + d x_t   (1.3)

The continuity equation for the groundwater storage S_t gives

S_t = S_{t-1} + a x_t - c S_{t-1}

[Figure 1.2 is a conceptual sketch showing annual precipitation x_t partitioned into evaporation, surface runoff and groundwater recharge a x_t, with the groundwater storage S_t releasing c S_{t-1} to the streamflow z_t; the coefficients satisfy 0 ≤ a, b, c, d ≤ 1.]

Figure 1.2. Conceptual representation of the precipitation-streamflow process (Salas and Smith, 1980a).

or

S_t = (1-c) S_{t-1} + a x_t   (1.4)

Combining Eqs. (1.3) and (1.4), Salas and Smith (1980a) showed that the model for the annual streamflow z_t can be written as

z_t = (1-c) z_{t-1} + d x_t - [d(1-c) - ac] x_{t-1}   (1.5)

which has the form of an ARMA(1,1) model when the annual precipitation is an independent series. They also extended the above formulation to the general case of ARMA(p,q) models.

1.5 REPRODUCTION OF HISTORICAL STATISTICAL CHARACTERISTICS

Models are built to "reproduce" or to "resemble" the main statistical characteristics of the historical hydrologic time series. Such reproduction or resemblance is understood to be in the statistical sense. It does not mean that a generated series based on the model has to give exactly the same statistical characteristics as shown by the historical record. This brings up the questions of what statistical characteristics are to be reproduced by the model and how these characteristics should be interpreted or understood.

Unfortunately, there are no unique and easy answers to the above questions. First of all, the true or population statistical characteristics of hydrologic series are never known, because what is observed or measured is only a finite (sample) number of years (N), and as a result the characteristics derived from such samples are only estimates of the true (unknown) characteristics.
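The algebra leading from Eqs. (1.3) and (1.4) to the ARMA(1,1) form of Eq. (1.5) can be checked numerically. The sketch below simulates the conceptual watershed and verifies that the simulated streamflow satisfies the recursion of Eq. (1.5) exactly; the parameter values a, c, d and the precipitation statistics are arbitrary assumptions for illustration, not values from the book.

```python
import random

# Hypothetical parameters: a = fraction of precipitation recharging groundwater,
# c = groundwater release fraction, d = direct-runoff fraction (0 < a, c, d < 1).
a, c, d = 0.3, 0.4, 0.5
rng = random.Random(1)
n = 200
x = [rng.gauss(10.0, 2.0) for _ in range(n)]   # annual precipitation (independent)

S = [5.0]      # groundwater storage, arbitrary initial value
z = [None]     # annual streamflow; z[0] is undefined without S[-1]
for t in range(1, n):
    z.append(c * S[t - 1] + d * x[t])            # Eq. (1.3)
    S.append((1 - c) * S[t - 1] + a * x[t])      # Eq. (1.4)

# Verify that the ARMA(1,1) recursion of Eq. (1.5) holds exactly for t >= 2:
for t in range(2, n):
    rhs = (1 - c) * z[t - 1] + d * x[t] - (d * (1 - c) - a * c) * x[t - 1]
    assert abs(z[t] - rhs) < 1e-9
print("Eq. (1.5) verified")
```

Because Eq. (1.5) is an algebraic consequence of Eqs. (1.3) and (1.4), the assertion holds to machine precision for any admissible a, c, d.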
Those estimates from a sample of N years are uncertain because if, instead of N years of observation, a different number of years N', either smaller or larger than N, were observed, then the estimates based on the sample of N' years would differ from those based on N years. The values observed in the historical series of any given number of years are only one realization of the infinite number of possible realizations that may have occurred during that time. Consequently, the statistical characteristics derived (estimated) from that sample are only one possible estimate out of many others. That is, the sample estimates are random variables and so they are uncertain. Whenever possible and necessary, such uncertainty must be incorporated in the modeling of hydrologic time series.

Apart from the problem of uncertain statistical characteristics for a given time series sample, there is the problem of the definition and interpretation of the statistical characteristics derived from the sample. The main characteristics are the mean, standard deviation, skewness and autocorrelation. Usually the mean and standard deviation are the least uncertain characteristics; therefore, no one questions their reproduction by a given model. The skewness is highly uncertain, so whether or not a model is able to reproduce precisely the estimated skewness depends on how long the sample was and on how important the skewness is for the model application. For instance, if a reservoir is designed for almost full river development, then the skewness is not very important. However, if it is designed for a low level of development, then the skewness is important (Klemes, 1972). The autocorrelation is also very uncertain, especially for small sample sizes. Its interpretation often decides the type of model to be used as well as its form.
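The sampling uncertainty described above is easy to demonstrate by a Monte Carlo experiment. In this illustrative sketch (the population mean and standard deviation are hypothetical values, not from the book), many synthetic "records" of N years are drawn from the same population, and the spread of the estimated means shrinks roughly like 1/sqrt(N):

```python
import random
import statistics

rng = random.Random(7)
true_mean, true_sd = 100.0, 30.0   # hypothetical population values

def sample_mean(n):
    """Estimate the mean from one synthetic record of n 'years'."""
    return statistics.fmean(rng.gauss(true_mean, true_sd) for _ in range(n))

for N in (10, 50, 200):
    means = [sample_mean(N) for _ in range(500)]   # 500 alternative realizations
    spread = statistics.stdev(means)               # spread of the estimates
    print(N, round(spread, 2))   # shrinks roughly like true_sd / sqrt(N)
```

The same experiment applied to the skewness or the lag-one autocorrelation would show a much larger relative spread, which is the point made in the text.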
Some other statistical characteristics of hydrologic series are often important to look at, even though they may depend on the main characteristics already mentioned. The range (related to storage capacities of reservoirs) and the run (related to droughts) are additional characteristics, important in water resources studies, that may be derived from a series. However, their interpretation and their ultimate use in hydrologic series modeling have led to controversies among hydrologists. The range of cumulative departures from the sample mean is related to the minimum storage capacity required for a reservoir to deliver the sample mean throughout a time period equal to the length of the sample (N). A related statistic, the rescaled range (the range divided by the sample standard deviation), is proportional to N^0.5 as N → ∞ for models most typically used in hydrology, such as the AR and ARMA models. Analysis made by Hurst (1951), using long records of geophysical time series, appears to show that their rescaled range is proportional to N^h with h > 0.5 (while for models such as those referred to above, h = 0.5). This apparent discrepancy has been called the "Hurst phenomenon." Extensive arguments on issues such as the interpretation of the Hurst phenomenon per se, the uncertainty of the estimates of the exponent h, the models to be used to reproduce the Hurst phenomenon, and its impact on design and operation over the typical 50-100 year planning horizons, have been raised among hydrologists during the past 30 years. However, studies such as those carried out by Yevjevich (1965), Fiering (1967), O'Connell (1971), Klemes (1974), Delleur et al. (1976), Hipel and McLeod (1978) and Salas et al.
(1979), lead us to conclude that simple models such as the AR and ARMA models used in this book are, for most cases of hydrologic series, capable of reproducing the range statistics relevant to water resources planning problems.

Just as the consideration of range statistics has led to controversy in stochastic hydrology, the interpretation and reproduction of drought related statistics have also been controversial. It has been a common procedure to use the critical drought (longest run length or largest run sum) of a historic record for making decisions related to designing and operating reservoirs. However, historic droughts, just like any other statistical characteristic, are random variables. For instance, the average drought duration or the largest (critical or most severe) drought derived from a historic sample of size N are random variables. Therefore, when stochastic models and corresponding simulations are to be used for the design and operation of reservoirs, the problem is how to incorporate such historic droughts into the model, or rather what the drought characteristics are and how they should be reproduced by the model. One criterion may be to reproduce the average drought duration and another may be to reproduce the critical drought. However, such reproduction should be made in a statistical sense; that is, the model should reproduce, say, critical droughts with a given probability of occurrence during a period equal to the historic sample size N. For instance, if such probability is 50 percent, it would mean that the return period of the critical drought is N years. Actually, there is not a unique answer to the questions raised above. Ultimately, it depends on judgment and the analysis of each particular case.

1.6 TIME SERIES MODELS IN HYDROLOGY

Early studies by Hazen (1914) and Sudler (1927) showed the feasibility of using statistics and probability theory in analyzing river flow sequences.
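The run-based drought statistics discussed above (longest run length and largest run sum below a truncation level) can be computed from a series with a few lines of code. This is an illustrative sketch with made-up flow values, using the sample mean as the truncation level:

```python
def run_statistics(flows, threshold):
    """Negative runs (droughts): consecutive values below the threshold.
    Returns (longest run length, largest run sum of deficits)."""
    longest, largest = 0, 0.0
    length, deficit = 0, 0.0
    for q in flows:
        if q < threshold:
            length += 1
            deficit += threshold - q
        else:
            longest = max(longest, length)
            largest = max(largest, deficit)
            length, deficit = 0, 0.0
    return max(longest, length), max(largest, deficit)

# Hypothetical annual flow record; threshold taken as the sample mean (95.0).
flows = [120, 95, 80, 70, 110, 130, 60, 55, 90, 140]
mean = sum(flows) / len(flows)
print(run_statistics(flows, mean))   # → (3, 80.0)
```

Here the critical drought lasts three years (60, 55, 90) with a cumulative deficit of 80 below the mean; applied to many generated series, the same function yields the sampling distribution of these drought statistics.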
Hurst (1951), in investigating the Nile River for the Aswan Dam project, reported studies of long records of river flows and other geophysical time series, which years later tremendously impacted the theoretical and practical aspects of time series analysis of hydrologic and geophysical phenomena. Barnes (1954) extended the early empirical studies of Hazen and Sudler and introduced the idea of synthetic generation of streamflow by using a table of normal random numbers. However, it was not until the beginning of the 1960's that the formal development of stochastic modeling started, with the introduction and application of autoregressive models for annual and seasonal streamflows (Thomas and Fiering, 1962; Yevjevich, 1963). Since then, several groups around the world have engaged in extensive research efforts toward improving those early concepts and models, providing physical justification of some models, introducing alternative models, and studying their impacts on water resources systems planning, design and operation. Literature related to these various aspects is extensive and has been reviewed by several hydrologists such as Chiu (1972), Rodriguez et al. (1972), Klemes (1974), Jackson (1975a), Lawrance and Kottegoda (1977), and McLeod and Hipel (1978).

Several stochastic models have been proposed in the past for modeling hydrologic time series. They are: autoregressive models (AR) (Thomas and Fiering, 1962; Yevjevich, 1963; Matalas, 1967), fractional Gaussian noise models (FGN) (Mandelbrot and Wallis, 1968; Matalas and Wallis, 1971), autoregressive moving-average models (ARMA) (Carlson et al., 1970; O'Connell, 1971), broken-line models (BL) (Mejia, 1971), shot-noise models (Weiss, 1973), models of intermittent processes (Yakowitz, 1973; Kelman, 1977), disaggregation models (Valencia and Schaake, 1973), Markov mixture models (Jackson, 1975b), ARMA-Markov models (Lettenmaier and Burges, 1977), and general mixture models (Boes and Salas, 1978).
Supposedly, all of these models have been developed and proposed with the objective of reproducing the main statistical characteristics which are observed or identified in hydrologic time series, and which have a bearing on the design and/or operation of the water system under study. Although each model has its own merit and some of them can be successfully applied in operational hydrology, they do have limitations. They all have been criticized for one or more of the following reasons: (i) not being able to reproduce short-term dependence, (ii) not being able to reproduce long-term dependence, (iii) difficulty in estimating parameters, (iv) limitations for generating large samples of synthetic data, (v) lack of physical basis, and (vi) too many parameters.

It would take an extensive discussion, beyond the intent of this book, to make an accounting of the advantages and limitations of each of the above-mentioned models. The interested reader may wish to go back to the original publications as well as to the reviews referred to above. Specific comments on the limitations of AR, ARMA, ARIMA and disaggregation models are given in Sections 4.2.7, 4.3.7, 5.2.8, 5.3.6, 6.2.6, 6.3.6, 6.3.7, 7.2.6, 7.3.6, and 8.8. The experience in using time series analysis in hydrology shows that the judicious use of AR, ARMA, ARIMA and disaggregation models will generally produce satisfactory results for most practical cases of operational hydrology. This book is limited to these models. Other models are not included, either because they are too complex for this book or because they are not well incorporated into current state-of-the-art practice.

1.7 TIME SERIES MODELING IN HYDROLOGY

The exact mathematical models of hydrologic time series are never known. The inferred population models are only approximations. The exact model parameters are also never known in hydrology since they must be estimated from limited data.
The estimation of models and their parameters from available data is often referred to in the literature as time series modeling or stochastic modeling of hydrologic series.

Although the development of time series modeling in hydrology has reached some degree of sophistication, unfortunately most time series modeling in practice is still generally based on simple methods. This usually involves the selection of a time series model (usually an AR model) in advance, and the estimation of its parameters by the method of moments, without verifying their constraints, without testing the goodness of fit of the model, and without verifying or checking it against competing models. It seems that one of the main reasons for this situation is that, although many statistical books are available with the above-mentioned techniques, and many research papers are published in several journals (Water Resources Research, Journal of Hydrology, Hyd. Div. ASCE, etc.), they appear to be out of the reach of most practicing engineers due to the language barrier and the mathematical complexity involved. Another reason is that the available books, monographs and handbooks in hydrology which include stochastic modeling are either too cumbersome to read or too oversimplified. Therefore, the practitioners have two choices: either they do not follow the published literature because it was not written at their level, or a very simplified and limited modeling approach is followed. This book intends to close the gap between research and practice by describing up-to-date advances in modeling in a systematic, step-by-step approach, including various detailed examples of modeling hydrologic time series.
A systematic approach to hydrologic time series modeling may be composed of six main phases (Salas and Smith, 1980b): (1) identification of model composition, (2) selection of model type, (3) identification of model form, (4) estimation of model parameters, (5) testing goodness of fit of the model, and (6) evaluation of uncertainties. Figure 1.3 illustrates graphically the interaction between these six modeling phases. In general, in any modeling of hydrologic time series, one has to decide whether the model will be a univariate or a multivariate model, a combination of a univariate and a disaggregation model, a combination of a multivariate and a disaggregation model, etc. This decision is referred to herein as the identification of the model composition. Such identification generally depends on the characteristics of the overall water resources system, the characteristics of the hydrologic time series, and the modeler's input.

[Figure 1.3 is a flow chart linking the six phases in sequence, from identification of model composition through evaluation of uncertainties, with inputs from the characteristics of the overall water resources system, the characteristics of the hydrologic time series and physical processes, and the modeler's knowledge, experience and bias.]

Figure 1.3. Systematic approach of hydrologic time series modeling (Salas and Smith, 1980b).

For instance, to analyze the operation of a reservoir by simulation, monthly inflows to such a reservoir must be generated. If there is no other upstream reservoir or structure that may affect the operation of the reservoir, the univariate modeling of monthly streamflows at or near the site of the dam should be selected. On the other hand, if other reservoirs exist or are planned upstream from the reservoir under study, the multivariate modeling of monthly streamflows at various sites should be the choice.
However, instead of the multivariate modeling of monthly streamflows, the modeler may select the multivariate modeling of annual series and then use a disaggregation model to obtain the corresponding monthly flows. The above decisions are contingent on the availability of adequate data in the system under study, as well as on their statistical characteristics. For instance, two time series which show significant cross correlation will require bivariate modeling, but if the cross correlation is not significant the two time series can be modeled independently by univariate models.

Once the model composition is identified, the type of the model(s) must be selected. Namely, the modeler has to decide among the various alternative models, say AR (autoregressive), ARMA (autoregressive moving average), ARIMA (autoregressive integrated moving average), FGN (fractional Gaussian noise), BL (broken line), SL (shifting level), or any other model that is available in stochastic hydrology. In this decision, three factors are important: the characteristics of the hydrologic physical processes, the characteristics of the hydrologic time series, and the modeler's input. In this book we have already restricted the several models available to the AR, ARMA, ARIMA and disaggregation models. Even though ARIMA models include both AR and ARMA models, we have purposely separated them because for practitioners it is easier to understand and to deal with AR models than with ARMA and, finally, with ARIMA models.

The physical characteristics of hydrologic processes help in the selection of the types or the alternative types of models. In fact, we selected the AR and ARMA models because of the physical reasons discussed in Sec. 1.4.

The statistical characteristics of the samples of hydrologic series are important deciding factors in the selection of the type of model. For instance, series with slowly decaying correlograms (long memory) generally require ARMA models rather than AR models.
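The decay of the sample correlogram, used above as a deciding factor between AR and ARMA models, can be estimated directly from the data. A minimal sketch follows; the synthetic AR(1) series and its coefficient 0.8 are assumptions for illustration, not data from the book:

```python
import random

def correlogram(x, max_lag):
    """Sample autocorrelation coefficients r_k for lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x) / n           # lag-zero autocovariance
    acf = []
    for k in range(1, max_lag + 1):
        ck = sum((x[t] - mean) * (x[t + k] - mean) for t in range(n - k)) / n
        acf.append(ck / c0)
    return acf

# Synthetic AR(1) series with coefficient 0.8; its correlogram should decay
# roughly like 0.8**k, a relatively slow (long-memory-like) decay.
rng = random.Random(11)
y = [0.0]
for _ in range(2000):
    y.append(0.8 * y[-1] + rng.gauss(0, 1))
print([round(r, 2) for r in correlogram(y, 5)])
```

In practice, plotting such a correlogram for the historic series and comparing its decay with that implied by candidate models is one of the simplest identification aids.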
Monthly series whose annual series have slowly decaying correlograms generally require two-level modeling: an ARMA model for the annual series and a disaggregation model to obtain the monthly series. The model type selection ultimately depends on the modeler. In reality, this is probably one of the most important factors. The modeler's knowledge of the advantages and limitations of the various types of models will enable him to make better decisions and to have a choice. On the other hand, if he only knows about AR models, then he is bound to choose that model whether or not it is appropriate for the particular problem at hand. The modeler's experience helps substantially. Experience will tell when a model can be used or should not be used.

A common problem with most modelers is personal bias. This is an unfortunate situation, although perhaps unavoidable. Personal biases are especially important with those (usually researchers) who were directly or indirectly involved with the development of a given type of model or are "aligned" with it. Some think that AR models are the answer for everything, while others think the same about ARMA models. Some others think that FGN or BL models are the last word in modeling, while others believe that SL models will handle all cases. There are, of course, some exceptions, such as "non-aligned or third world" modelers who are willing to look at more than one side. Finally, the modeler's limitations, such as the time available to solve a particular problem, funds for computer time, and the availability of ready-made programs of alternative models, also contribute to the decision of selecting the type of model (Salas and Smith, 1980b).

Once the type of model is selected, the third phase of the modeling is to identify the form of the model. This identification, as implied herein, goes beyond determining the orders p and q, say, of an ARMA model as in the Box-Jenkins approach.
For instance, in the time series analysis of weekly streamflows, it is necessary to identify whether the series is skewed and whether such skewness is constant or periodic, whether the week-to-week correlation coefficients are periodic, and whether the periodic characteristics should be described by Fourier series, in addition to identifying the order, say, of an ARMA model. The statistical characteristics of the historic time series are important for such model identification. In this case the knowledge and experience of the modeler also play a significant role.

Once the model is identified, the estimation of the parameters of the model is made. The proper method of estimation should be selected. The method of moments and the (approximate) method of maximum likelihood are the two methods usually available. Generally, the latter method gives the best estimators. In any case, the estimated parameters must comply with certain conditions of the model, which should be checked. If these conditions are not met, an alternative form of the model is required.

The model estimated in phase (4) needs to be checked in order to verify whether it complies with certain assumptions about the model and to verify how well it represents the historical hydrologic time series. The model assumptions to be checked are usually the independence and normality of the residuals of the model (for instance, the series ε_t of Eq. 1.2). In addition, comparisons based on correlograms can be made to see if the model correlogram resembles the historical correlogram. Further comparisons, based on data generation, can be made which help to verify whether the model statistically reproduces historical statistics such as the means, variances, skewness, correlations, storage related statistics, drought related statistics, etc.
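A simple version of the residual independence check mentioned above compares the sample autocorrelations of the residuals with approximate 95 percent probability limits; the ±1.96/sqrt(n) bound used here is a common large-sample approximation (not necessarily the test prescribed later in this book), and the synthetic residuals are assumed test data:

```python
import math
import random

def residual_acf_check(residuals, max_lag=10):
    """Flag residual autocorrelations falling outside the approximate
    95% probability limits ±1.96/sqrt(n); an empty list suggests no
    evidence against independence."""
    n = len(residuals)
    mean = sum(residuals) / n
    c0 = sum((e - mean) ** 2 for e in residuals) / n
    limit = 1.96 / math.sqrt(n)
    flagged = []
    for k in range(1, max_lag + 1):
        rk = sum((residuals[t] - mean) * (residuals[t + k] - mean)
                 for t in range(n - k)) / (n * c0)
        if abs(rk) > limit:
            flagged.append((k, round(rk, 3)))
    return flagged

rng = random.Random(5)
white = [rng.gauss(0, 1) for _ in range(1000)]   # ideal residuals: white noise
print(residual_acf_check(white))
```

Note that even for true white noise, roughly one lag in twenty is expected to fall outside the limits by chance, so isolated flags are not by themselves evidence of model inadequacy.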
If the above checks and comparisons are not satisfactory, the model form or even the model type should be changed and the procedure repeated until a satisfactory model is found.

Once the model is judged to be adequate, it remains to evaluate the corresponding uncertainties. Two kinds of uncertainties are usually encountered in hydrologic time series analysis: (a) model uncertainty and (b) parameter uncertainty. Model uncertainty results because the true models of hydrologic time series are not known, and at best the identified model composition and the selected type and form of the model are only close approximations. Parameter uncertainty results because the model parameters are estimated from a limited amount of data. Model uncertainty may be evaluated by testing whether significant differences exist between the statistics generated by alternative models. Parameter uncertainty may be determined by finding the distribution of the parameter estimates and by using the models with parameters sampled from such distributions. Other chapters of this book discuss in more detail the problem of parameter uncertainty for the AR, ARMA and ARIMA models.

1.8 APPLICABILITY OF TIME SERIES MODELING IN HYDROLOGY

Time series modeling has mainly two uses in hydrology and water resources: (1) the generation of synthetic hydrologic time series, and (2) the forecasting of future hydrologic series. Generation of synthetic series is generally needed for reservoir sizing, for determining the risk of failure (or reliability) of water supply for irrigation systems, for determining the risk of failure of dependable capacities of hydroelectric systems, for planning studies of future reservoir operation, for planning capacity expansion of water supply systems, and similar applications.
Forecasts of hydrologic series are generally needed for short term planning of reservoir operation, for real time and short term operations of river basins or systems, for planning operation during an ongoing drought, and similar applications. Additional discussion of modeling applications is given in Secs. 4.2.8, 4.3.8, 5.2.9, 5.3.7, 6.2.7, 6.3.7, 6.3.8, 7.2.7, 7.3.7, 8.9 and 9.3.

APPENDIX A1.1 DEFINITIONS, TERMS AND NOTATION

Some definitions and terms are given herein so that the reader has less difficulty in following the material presented in this book. Also, the notation has been carefully selected to make the presentation of equations and models clear, following as much as feasible the standard notation commonly encountered in the published literature.

DEFINITIONS AND TERMS

Normalization: The operation by which a time series is transformed into a normal series.

Standardization: The operation by which a time series with a given mean and standard deviation is converted into a series with mean zero and standard deviation one.

Independent series: Time series which does not have any dependence in time or in space.

Independent stochastic component: Time series which is independent in time and identically distributed.

White noise: Same as independent stochastic component but normally distributed.

Periodic series: Time series with periodic components or periodic parameters.

Seasonal series: Time series with time intervals that are a fraction of the year.

Historical series: Time series measured in the past.

Original series or data: Series or data available before any analysis is made.

AR model: Autoregressive model.

ARMA model: Autoregressive moving average model.

ARIMA model: Autoregressive integrated moving average model.

Empirical distribution: Frequency distribution of data (no reference to any certain probability distribution function).
Normal (0,1): Normal distribution with mean zero and variance one.

Population: Theoretical, true or known distribution, parameter or statistical property.

Sample: Observed, assumed or generated data of a limited size.

Estimate: Distribution, parameter or any statistical property estimated from a sample.

NOTATION

Population parameter: lower case Greek and Roman letters, e.g. φ, θ, b, A.

Estimated parameter: lower case Greek and Roman letters with a caret, e.g. φ̂, θ̂, b̂, Â.

Normal transformation: Y = g(X), e.g. Y = log(X).

Inverse normal transformation: X = g⁻¹(Y), e.g. X = log⁻¹(Y) = antilog(Y).

Logarithms: log_e(X), ln(X): base e; log_10(X): base 10; log(X): any base.

Original data: X, x.

Normalized data: Y, y.

Standardized data: Z, z.

Stochastic component: ε, ξ, or independent series.

Variance: Var(X), σ², s².

Standard deviation: σ, s.

Covariance: Cov(X,Y), s_XY.

Covariance matrix of parameters φ and θ: V(φ, θ).

Autocovariance function: c_k.

Autocorrelation function: ACF.

Population autocorrelation function: ρ_k, ρ_k(τ).

Sample autocorrelation function: r_k, r_k(τ).

Partial autocorrelation function: φ_k(k), PACF.

Spectral density function: g(f).

Univariate series: X_t, x_t.

Multivariate series: X_t (vector).

Generated series: x̂_t.

A converges to B: A → B.

R proportional to n: R ∝ n.

A equivalent to B: A ≈ B.

Inverse of matrix A: A⁻¹.

Transpose of matrix A: A^T.

Derivative of φ with respect to θ: ∂φ/∂θ.

APPENDIX A1.2 ELEMENTARY STATISTICAL PRINCIPLES

Elementary statistical principles used in various chapters of this book are briefly reviewed herein. The purpose is to provide handy access to some basic definitions and properties. For a formal presentation of this subject the reader is referred to standard textbooks on probability and statistics, such as those by Benjamin and Cornell (1970) or Mood et al. (1973).

Random Variable

It is a variable whose outcomes (values) are governed by chance. Its values cannot be predicted with certainty but only in probability terms. Random variables can be discrete or continuous.
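A discrete random variable can be illustrated by simulation. The sketch below (an assumed illustration, not from the book) draws values from a hypothetical two-valued distribution with P(X=0) = 0.3 and P(X=1) = 0.7, and checks that the observed relative frequency approaches the assigned probability:

```python
import random

def sample_discrete(pmf, rng):
    """Draw one value from a discrete distribution given as {value: probability};
    the probabilities must sum to 1."""
    u = rng.random()
    cum = 0.0
    for value, p in sorted(pmf.items()):
        cum += p
        if u < cum:
            return value
    return max(pmf)   # guard against floating-point round-off

rng = random.Random(0)
pmf = {0: 0.3, 1: 0.7}   # hypothetical two-valued random variable
draws = [sample_discrete(pmf, rng) for _ in range(10000)]
freq1 = draws.count(1) / len(draws)
print(round(freq1, 2))   # relative frequency of X=1 approaches 0.7
```

The outcome of any single draw cannot be predicted, but the long-run frequencies are governed by the probability law, which is the sense in which a random variable is "governed by chance."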
Discrete random variables take on values only at discrete (specified) points, while continuous random variables can take on any value on the real axis or any value between two boundary values.

Probability Distribution Function

It is a function that defines the probability associated with a random variable. It is also called the probability law of a random variable. For instance, if a random variable X can take on only the values 0 and 1 with probabilities 0.3 and 0.7, respectively, then the probability distribution function or probability law is P(X=0) = 0.3 and P(X=1) = 0.7.

The probability distribution function of discrete random variables may be represented by the probability mass function (PMF) and the cumulative distribution function (CDF). For instance, for the discrete random variable taking on the values X=1, X=2, X=3 and X=4, the PMF may be P(X=1) = 0.20, P(X=2) = 0.35, P(X=3) = 0.25 and P(X=4) = 0.20. The corresponding CDF is P(X≤1) = 0.20, P(X≤2) = 0.20 + 0.35 = 0.55, P(X≤3) = 0.20 + 0.35 + 0.25 = 0.80 and P(X≤4) = 0.20 + 0.35 + 0.25 + 0.20 = 1.00. These two functions are plotted in Fig. A1.1.

Figure A1.1. PMF and CDF of a discrete random variable.

Similarly, the probability distribution function of a continuous random variable is represented by the probability density function (PDF) and the cumulative distribution function (CDF). The PDF of a random variable X, usually denoted by f(x), serves to determine probabilities by integration. That is, the probability of X being between x_1 and x_2 is obtained by integrating f(x) from x_1 to x_2.

A is positive definite if

a_11 > 0,  det[a_11 a_12; a_21 a_22] > 0,  det[a_11 a_12 a_13; a_21 a_22 a_23; a_31 a_32 a_33] > 0,  etc.   (A1.15)

A is positive semidefinite if the inequalities in Eq. (A1.15) are replaced by ≥ signs.

REFERENCES

Barnes, F. B., 1954. Storage required for a city water supply. J. Inst. Eng., Australia, 26, pp. 198-203.

Benjamin, J. R. and Cornell, C. A., 1970. Probability, Statistics and Decision for Civil Engineers.
McGraw-Hill Book Co., New York.

Boes, D. C. and Salas, J. D., 1978. Nonstationarity in the mean and the Hurst phenomenon. Jour. Water Resour. Res., 14, 1, pp. 135-143.

Box, G. E. P. and Jenkins, G., 1970. Time Series Analysis, Forecasting and Control. Holden-Day, San Francisco.

Carlson, R. F., MacCormick, A. J. A. and Watts, D. G., 1970. Application of linear models to four annual streamflow series. Jour. Water Resour. Res., 6, 4, pp. 1070-1078.

Chiu, C. L., 1972. Stochastic methods in hydraulics and hydrology of streamflow. Geophysical Surveys, 1, pp. 61-84.

Crawford, N. and Linsley, R. K., 1966. Digital simulation in hydrology: Stanford Watershed Model IV. Technical Report 39, Stanford University, Stanford, California.

Dawdy, D. R., Gupta, V. and Singh, V., 1978. Stochastic simulation of droughts. Paper presented at the U.S.-Argentinian Workshop on Droughts, Mar del Plata, Argentina, December.

Delleur, J. W., Tao, P. C. and Kavvas, M. L., 1976. An evaluation of the practicality and complexity of some rainfall and runoff time series models. Jour. Water Resour. Res., 12, 5, pp. 953-970.

Fiering, M. B., 1967. Streamflow Synthesis. Harvard University Press, Cambridge, Massachusetts.

Harley, B. M., Perkins, F. E. and Eagleson, P. S., 1970. A modular distributed model of catchment dynamics. Ralph M. Parsons Laboratory Report 133, Dept. of Civil Eng., MIT, Cambridge, Mass.

Hazen, A., 1914. Storage to be provided in impounding reservoirs for municipal water supply. Trans. Amer. Soc. Civil Eng., 77, pp. 1539-1669.

Hipel, K. W. and McLeod, A. I., 1978. Preservation of the rescaled adjusted range, 2. Simulation studies using Box-Jenkins models. Jour. Water Resour. Res., 14, 3, pp. 509-516.

Hurst, H. E., 1951. Long term storage capacity of reservoirs. Trans. Amer. Soc. Civil Engrs., 116, pp. 770-799.

Jackson, B. B., 1975a. The use of streamflow models in planning. Jour. Water Resour. Res., 11, 1, pp. 54-63.

Jackson, B. B., 1975b. Markov mixture models for drought lengths.
Water Resour. Res., 11, 1, pp. 64-74.

Kelman, J., 1977. Stochastic modelling of hydrologic intermittent daily processes. Hydrology Paper 89, Colorado State University, Fort Collins, Colorado.

Klemes, V., 1972. Comments on "Adequacy of Markovian models with cyclic components for stochastic streamflow simulation" by I. Rodriguez-Iturbe, David R. Dawdy and Luis E. Garcia. Jour. Water Resour. Res., 8, 6, pp. 1613-1615.

Klemes, V., 1973. Watershed as semiinfinite storage reservoir. ASCE Jour. Irrig. and Drain. Div., 99, IR4, pp. 477-491.

Klemes, V., 1974. The Hurst phenomenon--a puzzle. Jour. Water Resour. Res., 10, 4, pp. 675-688.

Lawrance, A. J. and Kottegoda, N. T., 1977. Stochastic modeling of riverflow time series. Jour. Royal Stat. Soc., A, 140, pp. 1-47.

Lettenmaier, D. P. and Burges, S. J., 1977. Operational assessment of hydrologic models of long-term persistence. Jour. Water Resour. Res., 13, 1, pp. 113-124.

Mandelbrot, B. B. and Wallis, J. R., 1968. Noah, Joseph and operational hydrology. Jour. Water Resour. Res., 4, 5, pp. 909-918.

Matalas, N. C., 1967. Mathematical assessment of synthetic hydrology. Jour. Water Resour. Res., 3, 4, pp. 937-945.

Matalas, N. C. and Wallis, J. R., 1971. Statistical properties of multivariate fractional noise processes. Jour. Water Resour. Res., 7, 6, pp. 1460-1468.

McLeod, A. I. and Hipel, K. W., 1978. Preservation of the rescaled adjusted range, 1. A reassessment of the Hurst phenomenon. Jour. Water Resour. Res., 14, 3, pp. 491-508.

Mejia, J. M., 1971. On the generation of multivariate sequences exhibiting the Hurst phenomenon and some flood frequency analyses. Ph.D. Dissertation, Colorado State University, Fort Collins, Colorado.

Mood, A. M., Graybill, F. A. and Boes, D. C., 1974. Introduction to the Theory of Statistics. Third Edition, McGraw-Hill, New York.

Moss, M. E. and Bryson, M. C., 1974. Autocorrelation structure of monthly streamflows. Jour. Water Resour. Res., 10, 4, pp. 737-744.

O'Connell, P. E., 1971.
A simple stochastic modelling of Hurst's law. In Mathematical Models in Hydrology, Warsaw Symposium (IAHS Publ. 100, 1974), 1, pp. 169-187.
O'Connor, K. M., 1976. A discrete linear cascade model for hydrology. Jour. Hydrology, 29, pp. 203-242.
Pegram, G. G. S., 1978. Physical justification of continuous streamflow model. In Modeling Hydrologic Processes, Proceedings of the Fort Collins Third International Hydrology Symposium, edited by H. J. Morel-Seytoux, J. D. Salas, T. G. Sanders and R. E. Smith, pp. 270-280.
Quimpo, R. G., 1971. Structural relation between parametric and stochastic hydrology models. In Mathematical Models in Hydrology, Warsaw Symposium (IAHS Publ. 100, 1974), 1, pp. 151-157.
Rodriguez-Iturbe, I., Mejia, J. M. and Dawdy, D. R., 1972. Streamflow simulation, 1. A new look at Markovian models, fractional Gaussian noise and crossing theory. Jour. Water Resour. Res., 8, 4, pp. 921-930.
Salas, J. D., Boes, D. C., Yevjevich, V. and Pegram, G. G. S., 1979. Hurst phenomenon as a pre-asymptotic behavior. Jour. Hydrology, 44, pp. 1-15.
Salas, J. D. and Smith, R. A., 1980a. Physical basis of stochastic models of annual flows. Paper accepted for publication in the Jour. Water Resour. Res.
Salas, J. D. and Smith, R. A., 1980b. Uncertainties in hydrologic time series analysis. Paper presented at the ASCE Spring Meeting, Portland, Oregon, Preprint 80-158.
Selvalingam, S., 1978. ARMA and linear tank models. In Modeling Hydrologic Processes, Proceedings of the Fort Collins Third International Hydrology Symposium, edited by H. J. Morel-Seytoux, J. D. Salas, T. G. Sanders and R. E. Smith, pp. 297-313.
Sudler, C. E., 1927. Storage required for the regulation of streamflow. Trans. Amer. Soc. Civil Eng., 91, pp. 622-660.
Thomas, H. A. and Fiering, M. B., 1962. Mathematical synthesis of streamflow sequences for the analysis of river basins by simulation. In Design of Water Resource Systems (A. Maass et al., eds.), pp. 459-493. Harvard University Press, Cambridge, Massachusetts.
Thomas, H. A., 1965. Personal communication to M. B. Fiering. Examination questions for Engineering 250, Harvard University, Cambridge, Massachusetts.
Valencia, D. and Schaake, J. C., 1973. Disaggregation process in stochastic hydrology. Jour. Water Resour. Res., 9, 3, pp. 580-585.
Weiss, G., 1973. Filtered Poisson processes as models for daily streamflow data. Ph.D. Thesis, Mathematics Department, Imperial College, London; quoted in "Flood Studies Report," Vol. I, 1975, Natural Environmental Research Council, London.
Yakowitz, S. J., 1973. A stochastic model for daily riverflow in an arid region. Jour. Water Resour. Res., 9, 5, pp. 1271-1285.
Yevjevich, V. M., 1963. Fluctuations of wet and dry years, Part 1. Research data assembly and mathematical models. Hydrology Paper 1, Colorado State University, Fort Collins, Colorado.
Yevjevich, V., 1965. The application of surplus, deficit and range in hydrology. Hydrology Paper 10, Colorado State University, Fort Collins, Colorado.

Chapter 2

CHARACTERISTICS OF HYDROLOGIC SERIES

The main purpose of this chapter is to describe the basic and complex statistical characteristics of the hydrologic series most commonly encountered in practice. The several types of hydrologic series are first discussed in Sec. 2.1. Section 2.2 describes the general properties of hydrologic time series and gives the equations to compute such properties. Sections 2.3 and 2.4 discuss the characteristics of annual and periodic univariate time series, respectively, while Sec. 2.5 deals with multivariate time series. Finally, the characteristics of intermittent time series are briefly discussed in Sec. 2.6.

2.1 TYPES OF HYDROLOGIC SERIES

The types of series most commonly encountered in hydrology are basically unidimensional (univariate) time, line or counting series, and multidimensional (multipoint, multisite or multivariate) time, line or counting series.
Whenever the reference is time, the series is usually called a "time series."

2.1.1 TIME SERIES

Hydrologic time series can be divided into two basic groups: (1) single time series at a specified point, and (2) multiple time series at several points, or multiple series of different kinds at one point. Single time series are also called univariate series, while multiple time series are called multisite, multipoint or multivariate time series. In any case they constitute sets of mutually related time series of individual points along a line, over an area, across a space, or sets of time series of mutually related variables of various kinds.

Examples of single time series are annual precipitation or annual streamflow at a gaging station, monthly precipitation or monthly streamflow at a gaging station, average annual or monthly precipitation over an area, aggregated annual or monthly streamflow for a watershed system, etc. Examples of multiple time series are water quantity and related water quality variations in time, the series of annual or monthly precipitation at various gaging stations, the series of annual or monthly streamflow at various points of a river, variables that change over a river cross section for a given time, etc.

Single and multiple time series are further distinguished according to the time interval used, because the general characteristics, the methods of modeling and the estimation of parameters are related in various ways to the selected time interval. Basically, time intervals determine the following types of time series:

1. Continuous time series, for which variables are continuously recorded (time interval zero).

2. Time series of intervals that are a fraction of the day, such as the hourly, 2-hour, 6-hour, 12-hour, etc. (examples are series of short-interval rainfall), that exhibit the daily and the annual cycle in their basic statistical characteristics in addition to random variations.

3.
Time series of intervals that are fractions of the year, such as the day, week, month, season, or their multiples, with the annual cycle in their statistical characteristics in addition to random variations.

4. Annual time series, which by the summation or integration over the year do not exhibit cycles.

Experience shows that the estimation of models and parameters becomes simpler to perform as series go from the continuous or very short interval time series to large interval time series; the simplest analysis is for series with the time interval of the year.

2.1.2 LINE SERIES

The line series may also be divided into two basic groups:

1. single line series, such as those representing a random process of a property of river channels along the channel axis, or a porous media property along a well or a drill hole; and

2. multiple cross-sectional line series, obtained when several line series define the stochastic properties of an area or a space.

2.1.3 COUNTING SERIES

The counting series represent the result of sequential counting of the number of occurrences of a particular type, as random events in intervals of time, along a line, across an area or over a space. They can be either univariate or multivariate series. For instance, a counting series results from the number of rainy days during each month for a particular site or area.

2.2 GENERAL PROPERTIES OF HYDROLOGIC TIME SERIES

Univariate series are generally described by estimating their statistical characteristics such as the mean, standard deviation, skewness, probability distribution and time dependence structure. Multivariate series require, besides the characteristics of each individual series, the estimation of the interrelationships among the series. Both single and multiple series (time, line or counting) are basically studied as discrete series of various intervals.
The analysis of series in time is the main subject of this text, although some of the principles and techniques discussed may be applied to line and counting hydrologic series. Apart from a verbal description of the characteristics of common hydrologic series, the standard statistical calculations of some properties are given in this chapter.

2.2.1 COMPONENTS OF HYDROLOGIC SERIES

Hydrologic time series are often represented by components such as:

1. Overyear trends and other deterministic changes (such as slippages or jumps in parameters).

2. Cycles or periodic changes of the day and the year.

3. Almost periodic changes, such as tidal effects on hydrologic time series.

4. Components that represent the stochastic or random variations.

These four components can be explained and defined in various ways, as shown below.

Inconsistency (systematic errors) and nonhomogeneity (changes in nature by humans or by natural disruptive, evolutive or sudden processes) are mainly responsible for the overyear trends or for sudden changes (jumps, slippages). To properly determine the series characteristics, inconsistency and nonhomogeneity must first be identified and removed, because they are not expected to continue in the same form. They may either continue differently or may not continue at all in the future. The study of the operation of gaging stations, with the changes at and around them, and the study of various environmental changes in river basins, should always support the statistically detected trends and jumps.

Apparent trends and cycles are often results of chance, namely of sampling fluctuations in a given time series. They are called apparent here because they may seem real in short samples, but in reality they are just part of the sampling variations when considered for larger sample sizes.
The apparent overyear trends and cycles are not population properties of hydrologic series, provided the known or inferred nonhomogeneities and inconsistencies in the data are properly removed. The apparent trends and cycles must be tested to be statistically significant and physically justified before considering them as population characteristics of the time series. Reliable inferential statistical techniques should be applied for testing the significance of such characteristics.

Astronomic cycles are the basic causes of periodicity and almost periodicity in the characteristics of hydrologic time series. Periodicity means that statistical characteristics change periodically within the year. For examples of periodicity in the mean, standard deviation, skewness coefficient and daily correlation, see Figures 2.1 through 2.4, respectively, for the daily flows of the Boise River, Idaho, USA. Tidal processes have harmonics (frequencies) that are induced jointly by various astronomical cycles (day, lunar month, solar year). They are almost periodic processes because of the noncommensurability of the frequencies of all the harmonic variations involved (the ratios of these three frequencies are not rational numbers). The almost periodic processes never repeat themselves identically as the periodic processes do.

Astronomic cycles of the day and the year are present in all hydrologic time series of time intervals that are fractions of the day or the year. Hydrologic variables affected by or dependent on tidal processes along coastal areas have time series with characteristics that follow the almost periodicities of the tides.

Figure 2.1. Estimates x̄_τ of Eq. (2.7) of the mean of daily flows, each resulting from 40 values of the 40-year long sample of the Boise River near Twin Springs, Idaho, USA.

Figure 2.2. Estimates s_τ of Eq.
(2.8) of the standard deviation of daily flows of the Boise River, Idaho, USA.

Figure 2.3. Estimates g_τ of Eq. (2.9) of the skewness coefficient of the daily flows of the Boise River, Idaho, USA.

Turbulence, large-scale vorticity, heat conversion, atmospheric opacity for incoming and outgoing radiation waves, random thermodynamic processes, and many other processes in the earth's environments are responsible for randomness (stochasticity) in time series. These sources of randomness produce the variations in time series referred to as the stochastic components. Storage of water and heat in the earth's environments, and some resulting smoothing effects, are factors that attenuate the periodic and almost periodic processes and produce the time dependence in the stochastic variation. Inputs to hydrologic environments are mostly a combination of periodic and stochastic variations that mutually interact. The earth's environments react to these inputs in three ways: (1) by smoothing or magnifying these inputs, (2) by adding, attenuating, amplifying or dampening some harmonics that describe the periodic components, and (3) by adding or modifying randomness resulting from various environmental factors.

Figure 2.4. Estimates r_{k,τ} of Eq. (2.10) of the correlation coefficients of daily flows of the Boise River, Idaho, USA.

2.2.2 BASIC STATISTICAL CHARACTERISTICS OF TIME SERIES

The reader is referred to any classical statistics book for information on the various characteristics of a statistical nature. Here, only the most common characteristics are defined.

The basic statistical characteristic of a time series x_t, t = 1, ..., N, is the sample mean x̄ given by

\bar{x} = \frac{1}{N} \sum_{t=1}^{N} x_t    (2.1)

where N is the sample size (length of the hydrologic time series). The sample mean x̄ is the estimate of the population mean μ (the expected value, expectation).
It measures the central tendency of x_t, or determines where the series is located as a whole.

The second important statistical characteristic of a time series is the sample variance s², given by

s^2 = \frac{1}{N} \sum_{t=1}^{N} (x_t - \bar{x})^2    (2.2a)

It is a biased estimate of the population variance σ² (see Sec. 3.1.1 for the definition of bias and related properties). The unbiased estimate is obtained by

s^2 = \frac{1}{N-1} \sum_{t=1}^{N} (x_t - \bar{x})^2    (2.2b)

The estimate of Eq. (2.2b) is the one most generally used in statistical hydrology. The square root of s² is called the standard deviation. Related to the mean and standard deviation is the coefficient of variation, σ/μ and s/x̄ for the population and the sample, respectively. Just as the mean measures the location of a time series x_t, the standard deviation measures the dispersion or spread of the series around the mean x̄. A small s means that the values x_1, x_2, ..., x_N do not differ much from x̄, while a large s generally means that the x's have a large spread around x̄.

The sample skewness coefficient of a time series may be determined by

g = \frac{\frac{1}{N} \sum_{t=1}^{N} (x_t - \bar{x})^3}{s^3}    (2.3a)

where s is obtained from Eq. (2.2a). It is a biased estimate of the population skewness coefficient γ. An approximately unbiased estimate of γ is determined by

g = \frac{N \sum_{t=1}^{N} (x_t - \bar{x})^3}{(N-1)(N-2)\, s^3}    (2.3b)

with s obtained from Eq. (2.2b). The estimate of Eq. (2.3b) is the one most generally used in hydrology. The coefficient of skewness measures the asymmetry of a time series. If g = 0 the probability distribution of the series x_t would be symmetric around x̄. If g < 0 the distribution is skewed to the left, while g > 0 indicates a distribution skewed to the right.

The autocovariance function measures the degree of linear autodependence (self-dependence) of a time series. The autocovariance c_k between x_t and x_{t+k} may be determined by

c_k = \frac{1}{N} \sum_{t=1}^{N-k} (x_t - \bar{x})(x_{t+k} - \bar{x}), \qquad 0 \le k \le N-1    (2.4)

where k represents the time lag (or distance) between the correlated pairs (x_t, x_{t+k}), x̄ is the sample mean of Eq. (2.1) and N is the sample size; c_k is usually called the lag-k autocovariance. For the particular case k = 0, c_0 becomes the variance s² of Eq. (2.2a). The sample autocovariance c_k of Eq. (2.4) is a biased estimate of the population autocovariance function γ_k. An unbiased estimate can be obtained by using (N-k) instead of N in the denominator of Eq. (2.4). In either case such estimators are referred to as open series estimators. Notice that these estimators have only N-k terms in the cross-products of Eq. (2.4). An approach that considers N cross-product terms is referred to as the circular series approach. Appendix A8.1 gives covariance estimators based on the circular series approach.

A dimensionless measure of linear dependence is obtained by dividing c_k of Eq. (2.4) by c_0. This operation gives

r_k = \frac{c_k}{c_0}    (2.5a)

where r_k is called the (lag-k) autocorrelation coefficient, the serial correlation coefficient or the autocorrelation function (ACF). The plot of r_k versus k is generally called the correlogram. The sample autocorrelation coefficient r_k is an estimate of the population coefficient ρ_k. The most commonly used simple measure of the time dependence of a series is the first serial correlation coefficient, r_1 or ρ_1 for the sample or the population, respectively.

An alternative estimate of the autocorrelation function is

r_k = \frac{\sum_{t=1}^{N-k} (x_t - \bar{x}_1)(x_{t+k} - \bar{x}_2)}{\left[ \sum_{t=1}^{N-k} (x_t - \bar{x}_1)^2 \right]^{1/2} \left[ \sum_{t=1}^{N-k} (x_{t+k} - \bar{x}_2)^2 \right]^{1/2}}    (2.5b)

where x̄_1 is the mean of the first N-k values x_1, ..., x_{N-k} and x̄_2 is the mean of the last N-k values x_{k+1}, ..., x_N. Equations (2.5a) and (2.5b) give r_k = 1 for k = 0, so the correlogram always starts at unity at the origin. In general, -1 ≤ r_k ≤ +1.

The estimate r_k of Eq. (2.5b) is the maximum likelihood estimate of ρ_k when (x_t, x_{t+k}) is bivariate normal (Jenkins and Watts, 1969). Although it is a good estimate of ρ_k when considered individually, that is not the case when several estimates, say r_1, r_2, ..., are needed. Furthermore, the estimates of Eq. (2.5b) are not positive definite (a positive definite autocorrelation matrix is an important property of stationary time series). On the other hand, the estimates r_k of Eq. (2.5a) are positive definite. Both estimates r_k of Eqs. (2.5a) and (2.5b) are biased downward; namely, the average r̄_k of values r_k computed from many series of size N is not equal to the population value ρ_k. The mean r̄_k is smaller than ρ_k, and the difference increases as the value of ρ_k increases and the sample size N decreases.

Biased estimates of the population autocorrelation are a disadvantage in practice. Often they may induce incorrect inferences about the characteristics of a time series. The bias in the estimates r_k may be removed by using Quenouille's correction (Kendall and Stuart, 1968, p. 435)

r_k' = 2 r_k - 0.5 \left[ r_k(1) + r_k(2) \right]    (2.6)

where r_k' is the corrected "unbiased" serial correlation estimate, r_k is the original serial correlation estimate of either Eq. (2.5a) or (2.5b), and r_k(1) and r_k(2) are the serial correlation estimates of the first half and the second half of the time series. A disadvantage of Eq. (2.6) is that sometimes r_k' takes on values beyond the limits (-1, +1).

In addition to the basic statistical characteristics of a time series such as the mean, variance, skewness and autocorrelation, we may be interested in the overall probability distribution of x_1, ..., x_N. Such a distribution may be determined either by the sample frequency distribution or by fitting a probability distribution function to the frequency distribution. Continuous distribution functions such as the normal, lognormal, gamma and loggamma functions are generally used in practice.

2.2.3 COMPLEX CHARACTERISTICS OF PERIODIC TIME SERIES

The statistical characteristics such as the mean, variance, skewness coefficient and serial correlation of periodic hydrologic time series can be determined by Eqs. (2.1) through (2.5).
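As a computational illustration, the estimators of Eqs. (2.1) through (2.5a) can be sketched in Python (a minimal sketch; the function names are ours, not from the text):

```python
import numpy as np

def basic_statistics(x):
    """Sample mean, unbiased variance and skewness: Eqs. (2.1), (2.2b), (2.3b)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    mean = x.sum() / N                                    # Eq. (2.1)
    var = ((x - mean) ** 2).sum() / (N - 1)               # Eq. (2.2b)
    s = np.sqrt(var)
    skew = N * ((x - mean) ** 3).sum() / ((N - 1) * (N - 2) * s ** 3)  # Eq. (2.3b)
    return mean, var, skew

def autocorrelation(x, k):
    """Open-series lag-k autocorrelation r_k = c_k / c_0: Eqs. (2.4)-(2.5a)."""
    x = np.asarray(x, dtype=float)
    N = len(x)
    mean = x.sum() / N
    c0 = ((x - mean) ** 2).sum() / N                      # Eq. (2.4) with k = 0
    ck = ((x[:N - k] - mean) * (x[k:] - mean)).sum() / N  # Eq. (2.4)
    return ck / c0                                        # Eq. (2.5a)
```

For k = 0 the second function returns unity, consistent with the correlogram always starting at one.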
However, such equations would only give the statistical characteristics of the series as a whole; they do not show the effect of the annual cycle (except in the case of r_k). In order to take such an effect into account, the characteristics must be determined for each time interval within the year.

Consider the periodic time series x_{ν,τ}, where ν denotes the year and τ denotes the time interval within the year. The sample mean for the time interval τ is determined by

\bar{x}_\tau = \frac{1}{N} \sum_{\nu=1}^{N} x_{\nu,\tau}, \qquad \tau = 1, \ldots, \omega    (2.7)

where N is the number of years of record and ω is the number of time intervals in the year. The sample mean x̄_τ is an estimate of the population mean μ_τ. The sample variance for time τ is given by

s_\tau^2 = \frac{1}{N-1} \sum_{\nu=1}^{N} (x_{\nu,\tau} - \bar{x}_\tau)^2    (2.8)

It is an estimate of the population variance σ_τ². Similarly, the sample skewness coefficient for time τ is

g_\tau = \frac{N \sum_{\nu=1}^{N} (x_{\nu,\tau} - \bar{x}_\tau)^3}{(N-1)(N-2)\, s_\tau^3}    (2.9)

where x̄_τ is given by Eq. (2.7) and s_τ is obtained from Eq. (2.8). The skewness coefficient g_τ of Eq. (2.9) is an estimate of the population skewness coefficient γ_τ. Although Eqs. (2.8) and (2.9) are the ones most generally used in statistical hydrology for estimating the variance and the skewness coefficient, for certain cases we may wish to determine the biased estimates by applying Eqs. (2.2a) and (2.3a), respectively, for each time period τ.

The correlation structure of the time series x_{ν,τ} may be determined for each time interval τ by

r_{k,\tau} = \frac{\frac{1}{N} \sum_{\nu=1}^{N} (x_{\nu,\tau} - \bar{x}_\tau)(x_{\nu,\tau-k} - \bar{x}_{\tau-k})}{s_\tau \, s_{\tau-k}}    (2.10)

where r_{k,τ} is the sample lag-k correlation coefficient, which is an estimate of the population correlation coefficient ρ_{k,τ}. When τ-k < 1 in Eq. (2.10), N is replaced by N-1, x_{ν,τ-k} is replaced by x_{ν-1,ω+τ-k}, and x̄_{τ-k} is replaced by x̄_{ω+τ-k}.
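The seasonal estimates of Eqs. (2.7) and (2.8) can be sketched as follows, assuming the record is arranged as an N-year by ω-interval array (the array layout and the function name are our own illustration):

```python
import numpy as np

def seasonal_mean_std(x):
    """x[nu, tau]: N years by omega intervals within the year.
    Returns the per-interval means of Eq. (2.7) and the standard
    deviations from the variances of Eq. (2.8)."""
    x = np.asarray(x, dtype=float)
    N = x.shape[0]
    mean_tau = x.sum(axis=0) / N                            # Eq. (2.7)
    s2_tau = ((x - mean_tau) ** 2).sum(axis=0) / (N - 1)    # Eq. (2.8)
    return mean_tau, np.sqrt(s2_tau)
```

The same column-wise pattern extends to the seasonal skewness of Eq. (2.9).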
The estimates x̄_τ, s_τ², g_τ and r_{k,τ} are generally called the periodic or seasonal statistical characteristics of the series x_{ν,τ}. While most analyses of hydrologic series with intervals that are fractions of the year currently consider periodicities in the mean and in the standard deviation, and sometimes in the first serial correlation coefficient, the analysis of periodicities in the skewness coefficient is rarely undertaken. In essence, from a rigorous statistical viewpoint the periodic hydrologic time series represent a multivariate process (say, composed of a set of 365 variables for the case of daily flows) with a different probability distribution for each time interval of the year. The periodicity is induced by the annual cycle of revolution of the earth around the sun, with the corresponding seasonal changes likely to be present in all the characteristics of the hydrologic time series. Similarly, series of intervals of a fraction of the day exhibit the daily cycle along with the annual cycle.

2.2.4 DROUGHT RELATED CHARACTERISTICS OF TIME SERIES

In addition to the statistical characteristics discussed in previous sections, some other characteristics of time series are particularly related to droughts. Consider the time series x_1, ..., x_N and a constant demand level y (crossing level), as shown in Fig. 2.5. A negative run occurs when x_t is less than y consecutively during one or more time intervals. Similarly, a positive run occurs when x_t is consecutively greater than y. We concentrate only on negative runs, since they are related to drought characteristics.

Figure 2.5. Definitions of run-length and run-sum.

A run can be determined by its length, its sum or its intensity. For instance, a negative run of length ℓ and run-sum equal to d are shown in Fig. 2.5. In general, several runs result in a time series for a given demand level and sample size. Assume that M runs of run-lengths ℓ(1), ..., ℓ(M) and run-sums d(1), ..., d(M) occur.
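The negative runs just defined are easy to extract programmatically. A minimal sketch (the function name is ours) that returns the run-lengths ℓ(1), ..., ℓ(M) and run-sums d(1), ..., d(M) below a demand level y:

```python
def negative_runs(x, y):
    """Return the run-lengths and run-sums of the negative runs of
    the series x below the constant demand level y."""
    lengths, sums = [], []
    run_len, run_sum = 0, 0.0
    for value in x:
        if value < y:                 # inside a negative run
            run_len += 1
            run_sum += y - value      # deficit contributed by this interval
        elif run_len > 0:             # a run just ended
            lengths.append(run_len)
            sums.append(run_sum)
            run_len, run_sum = 0, 0.0
    if run_len > 0:                   # the series ends inside a run
        lengths.append(run_len)
        sums.append(run_sum)
    return lengths, sums
```

The mean, standard deviation and maximum of the returned run-lengths then follow directly.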
The means, the standard deviations and the maxima of run-length and run-sum are important characteristics describing the runs of a given time series. For instance, for the run-length such characteristics are obtained by

\bar{\ell}_N = \frac{1}{M} \sum_{j=1}^{M} \ell(j)    (2.11)

s_N(\ell) = \left[ \frac{1}{M-1} \sum_{j=1}^{M} \left( \ell(j) - \bar{\ell}_N \right)^2 \right]^{1/2}    (2.12)

\ell_N^* = \max\left( \ell(1), \ldots, \ell(M) \right)    (2.13)

The histogram of the ℓ's is also important for describing the overall distribution of runs. Notice that these characteristics are defined for a given sample size N and for a given demand level y. As N increases and/or y increases, ℓ̄_N, s_N(ℓ) and ℓ*_N also increase.

It must be noted that for given N and y, the above characteristics ℓ̄_N, s_N(ℓ) and ℓ*_N are random variables. For instance, if for a given sample x_1, ..., x_N of size N = 100 years and demand y = 0.8 x̄, with x̄ the mean of the x's, Eq. (2.13) gives ℓ*_100 = 5 years, it does not mean that ℓ*_100 will also be 5 years for another sample of the same size and the same demand. It may be more or less than 5 years.

The above run definitions and concepts can be used for both annual series and periodic series. In the case of periodic series, the demand level may also be periodic. The run-length characteristics ℓ̄_N, s_N(ℓ) and ℓ*_N may be used for comparison with the corresponding characteristics derived from mathematical models fitted to historical series.

2.2.5 STORAGE RELATED CHARACTERISTICS OF TIME SERIES

Some characteristics of time series are particularly related to reservoir storage problems. Since such characteristics are functions of the dependence structure of a series, they are also useful for identifying the degree of time dependence of a series. Consider a time series x_1, ..., x_N and form the sequence

S_i = \sum_{t=1}^{i} (x_t - \bar{x}_N), \qquad i = 1, \ldots, N    (2.14)

where the S_i are called the partial sums and x̄_N is the sample mean determined by

\bar{x}_N = \frac{1}{N} \sum_{i=1}^{N} x_i    (2.15)

Define the sample standard deviation by

s_N = \left[ \frac{1}{N} \sum_{i=1}^{N} (x_i - \bar{x}_N)^2 \right]^{1/2}    (2.16)

and the range (rescaled range) of partial sums by

R_N^* = \frac{\max(S_0, S_1, \ldots, S_N) - \min(S_0, S_1, \ldots, S_N)}{s_N}
(2.17)

with S_0 = 0. Recall that the plot of S_i versus i, i = 1, ..., N, is the typical mass curve or Rippl diagram, from which the minimum storage capacity of a reservoir to deliver x̄_N throughout the time interval N can be obtained. Since R*_N of Eq. (2.17) is related to the minimum storage capacity needed, it may be a useful statistic for testing whether a model represents the "storage characteristics" of the historical time series.

Assume that several samples of size N are available, from which the mean range R̄_N is obtained. The type of variation of R̄_N with N may indicate whether the series is of short memory, long memory or infinite memory. Suppose that R̄_N versus N is plotted on log-log paper and a straight line is fitted through the points. Then R̄_N ~ N^{h_N}, where h_N is the slope of the fitted line. The slope h_N varies with N, but as N → ∞ either h_N → 1/2 or h_N → h ≠ 1/2. We consider a time series to be of short memory if h_N → 1/2 fairly fast, and of long memory if h_N → 1/2 slowly. On the other hand, the time series is of infinite memory if h_N → h ≠ 1/2. These concepts apply to hydrologic series or to series derived from known stochastic models.

2.2.6 NONHOMOGENEITY AND INCONSISTENCY IN HYDROLOGIC SERIES

Hydrologic characteristics are generally subject to changes due to nonhomogeneity and inconsistency. Nonhomogeneity in data is common in hydrologic time series; it is induced by humans or produced by significant natural disruptive factors, evolutive or sudden (such as natural disasters). In addition, hydrologic data may have significant systematic errors producing inconsistent series. Several characteristics of time series such as the mean, standard deviation and serial correlations may be affected whenever a trend and/or a positive or negative jump (slippage) is produced in hydrologic series by nonhomogeneity and inconsistency. The identification or detection, description and removal of nonhomogeneity and inconsistency are important aspects of time series analysis. They are most reliable if changes are substantiated by both statistical tests and physical or historical evidence and justification.

The identification and description of the characteristics of changes in hydrologic time series (because of inconsistency and nonhomogeneity) are based on: (1) fitting a trend function and testing that its parameters are significantly different from zero; and (2) testing that the basic statistical characteristics of subseries of the sample series are statistically different among themselves. The trend analysis assumes a monotonic function expanded in power series form as

x_t = b_0 + b_1 t + b_2 t^2 + \cdots + b_m t^m    (2.18)

where b_0, b_1, ..., b_m are the parameters to be estimated. Only when any of the parameters b_1, ..., b_m are found to be significantly different from zero, with b_0 a constant or zero, does a linear or a nonlinear trend become a characteristic of the series. The verification process of the trend should ascertain whether the history of data collection (various sources of systematic errors), and the history of river basin developments or other natural factors (various sources of nonhomogeneity), support and justify the final acceptance of the inferred trend.

The second technique divides the historic time series into two or more subseries, and the main statistical characteristics are estimated for each subseries. The breaking points of the subseries should be the times of hypothesized change of the characteristics, say the times of change of observational technique, the times of the start of various projects that change the water regime, or the times at which a sudden change may exist in the mean, in the standard deviation and in the correlation of the series.
Various techniques are available for the comparison of statistical characteristics of subseries and for testing whether or not they are statistically different among themselves. Figure 2.6 shows the inferred trend in the annual flow series of the Colorado River at Lee Ferry, Arizona.

To illustrate the method of determining whether a change exists as a basic characteristic of a time series, the example of the time series of the net basin supply (NBS) into the Great Lakes is given in some detail. This example tests homogeneity by splitting the sample into two unequal subsamples and testing whether the differences between the means of the subsamples are significantly different from zero at the 95 percent significance level. Only if the probability is less than 5 percent for a difference to occur by chance is the sample considered to be nonhomogeneous. The five mean annual NBS series of Lakes Ontario, Erie, Michigan-Huron, Superior and St. Clair are used for the homogeneity test. The classical t-statistic is used for testing whether the difference of two means x̄_1 and x̄_2 is significant, that is,
Similar equa- tions as above, or equations based on the F-test, can be used for testing whether the variances s{ and s3 of sub- samples are significantly different 46 Table 2.1 gives results of homogeneity test for subsample means, with only the series of annual NBS of Lake St. Clair found to be nonhomogeneous. The means of the subsamples are xX, = 3.06 and X2 = 5.24. Then the NBS series for Lake St. Clair may be corrected with x = 5.24 of the last 26 years of record also used as the mean for the first 43 years. Table 2.1. Example of Testing for Changes in the Mean of the Annual NBS of the Great Lakes System. Subsample Sizes Statistic (95%) Change From in Lake N, Nz t-Tables Computed the Mean (95%) Ontario 36 33 2.0 0.299 No Erie 36-33 2.0 0.635 No Superior 36 33 2.0 1.525 No Michigan 36 33 2.0 0.866 No St. Clair 43 26 2.0 4.477 Yes The test of homogeneity in the standard deviation of the NBS series showed all five series to be homogeneous in the standard deviation. Under the natural conditions, when a hydrologic series has no significant trend or jump in the mean and the standard deviation, usually the other statistical char- acteristics such as the skewness and correlations do not show significant changes either. 2.3 CHARACTERISTICS OF ANNUAL TIME SERIES Annual time series are the simplest series in hydrology as it concerns their general statistical characteristics, For instance, the precipitation observed in 1141 stations in the United States showed that the annual precipitation time series are very close to being time independent, stationary stochas- tic processes for a period of the most reliable data (last 70- 100 years). The independence means that the outcome of precipitation in a year does not depend on precipitation values of previous years. The stationarity means that the basic properties of a process do not change with the absolute time. 
The time dependence measured by the first serial correlation coefficient r₁ of annual precipitation series, computed by Eq. (2.5b) and averaged over the total number of series (1141), is on the order of 0.028 for the period of simultaneous observations of 30 years (1931-1960). For all years of observation available at these stations, with an average series length of 54 years, the mean r₁ value is 0.055 (Yevjevich, 1964). These small values imply that, on the average, less than one percent of the total variation of annual precipitation in any year may be due to the annual precipitation which has occurred in the previous years. Therefore, annual precipitation can be considered in most cases as an independent time series over time spans of decades. Figure 2.7 shows the series of annual precipitation at Fort Collins, Colorado, USA, for 92 years (1887-1978). The first serial correlation coefficient for this series is r₁ = -0.20. Such a negative value of r₁ may be considered a large sampling deviation from the population value ρ₁ = 0 for independent series.

Figure 2.7. Annual precipitation series at Fort Collins, Colorado, for the period 1887-1978 (92 years), in modular coefficients P_t/P̄, with P_t = the annual values and P̄ = 14.57 inches = the annual mean.

Annual runoff series are either time independent or time dependent stochastic processes. When negligible changes in the total stored water of a river basin occur at the end of each water year, the series are independent. They are dependent when the storage at the end of the year has relatively large fluctuations in comparison with the average annual flow. Large variations in total water carryovers in river basins from year to year may be considered the principal physical factor that affects the time dependence of both the annual runoff and the annual evaporation.
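The serial correlation coefficients quoted above can be computed with a short routine. A minimal Python sketch follows (Eq. (2.5b) itself is not reproduced in this excerpt, so the common open-series estimator is used, and the function name is ours); evaluating it at k = 1, 2, ... gives the correlogram:

```python
def serial_correlation(x, k):
    """Lag-k serial correlation coefficient r_k of a series,
    with deviations taken from the overall sample mean."""
    n = len(x)
    m = sum(x) / n
    num = sum((x[t] - m) * (x[t + k] - m) for t in range(n - k))
    den = sum((v - m) ** 2 for v in x)
    return num / den
```

Values of r_k near zero at all lags suggest an independent series; a persistent positive r₁ of about 0.2, as found for annual runoff, indicates year-to-year dependence.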
The average of the first serial correlation coefficients for 140 worldwide selected annual runoff series, with a mean series length of 55 years, gave r̄₁ = 0.175. Similarly, the average for 446 annual runoff series in western North America, with a mean series length of 87 years, gave r̄₁ = 0.197 (Yevjevich, 1964). Both sets of data showed that the average first serial correlation coefficient of annual runoff series is about r̄₁ = 0.20, which is statistically different from zero. Figure 2.8 shows the annual runoff series of the Danube River at Orsova, Romania, for 120 years (1837-1957), as an example of a somewhat dependent annual series (approximately r₁ = 0.10).

Figure 2.8. Annual runoff series of the Danube River at Orsova, Romania, for 120 years (1837-1957), in modular coefficients Q_t/Q̄, with Q_t = the annual values and Q̄ = the annual mean.

The dependence characteristics of annual time series are basically investigated and presented by using two classical statistical computations and relations: (1) the correlogram, which is a representation in the time domain, and (2) the spectrum, which is a representation in the frequency domain. For an independent series the population correlogram is equal to zero for k ≠ 0. However, samples of independent time series, due to sampling variability, have r_k fluctuating around zero but not necessarily equal to zero. In such a case, it is useful to determine the probability limits for the correlogram of an independent series. Anderson (1941) gave the limits

r_k(95%) = (-1 ± 1.96 √(N-k-1)) / (N-k)   (2.21a)

and

r_k(99%) = (-1 ± 2.326 √(N-k-1)) / (N-k)   (2.21b)

for the 95 percent and 99 percent probability levels respectively, where N is the sample size. Figure 2.9 shows correlograms r_k of Eq. (2.5b) of annual runoff for four large European rivers, with probability limits.

Figure 2.9.
Correlograms of annual runoff series of four European rivers: the Göta River, Sweden (N = 150), the Nemunas River, Lithuania (N = 132), the Rhine River at Basle, Switzerland (N = 150), and the Danube River at Orsova, Romania (N = 120). Probability limits at the 95 percent level are given for normal independent variables for two lengths: N = 150 (max) and N = 120 (min) (Yevjevich, 1964).

The limits are computed by Eq. (2.21a) for two cases, N = 150 and N = 120. The r₁ values for the Rhine and Danube Rivers are greater than zero, but within the probability limits. For the Göta and Nemunas Rivers the r₁ coefficients are outside these limits. The other r_k coefficients have less than 5 percent of computed values outside the limits.

Figure 2.10 shows the correlogram of the annual runoff series of the St. Lawrence River, the correlogram of the annual effective precipitation series on the basin, the correlogram of the first order autoregressive model fitted to the runoff correlogram (see Eq. 4.14), and the limits for an independent series determined by Eq. (2.21a). The correlograms show a highly dependent annual runoff series and an independent annual effective precipitation series.

Figure 2.10. Correlograms of the St. Lawrence River at Ogdensburg, New York: (1) r_k of the annual runoff series; (2) r_k of the annual effective precipitation series; (3) correlogram of the first order autoregressive model; and (4) probability limits at the 95 percent level for normal independent variables with N = 97.

The spectrum of annual time series may be determined by transforming the correlogram r_k as (Yevjevich, 1972)

g(f) = 2 [1 + 2 Σ_{k=1}^{m} D_k r_k cos(2πfk)]   (2.22)

where g(f) is the smoothed (weighted) sample spectral density, f is the ordinary frequency (computed at the values f = j/2m, j = 0, 1, ..., m), k is the lag, m is the maximum number of lags used (often N/6 to N/4), and D_k is a smoothing function. For instance, Parzen (1967) gives

D_k = 1 - 6(k/m)² + 6(k/m)³ ,  for k ≤ m/2
D_k = 2(1 - k/m)³ ,  for m/2 < k ≤ m   (2.23)

For an independent annual series, g(f) of Eq.
(2.22) should not be statistically different from 2.00 over the range f = 0.0 - 0.50. Figure 2.11 gives the average spectrum of the annual precipitation series of 231 gaging stations in an area of the United States, for lengths of series varying between 35 and 150 years. It shows that the annual precipitation series are, for all practical purposes, independent and therefore temporarily stationary time processes.

Figure 2.11. Average variance spectrum of 231 annual (homogeneous) precipitation series of the northwest area of the United States (area between longitudes 94° and 85°, and latitude 36.5° and the Canada-USA border).

Figure 2.12 shows the estimated spectral densities g(f), line (1), and a fitted spectrum, line (2), of the second order autoregressive model for the annual flow series of the Göta River for N = 150 years. The resulting residuals of spectral densities, line (3), are added to the expected value 2 of an independent series.

Figure 2.12. Spectra of the annual flow series of the Göta River at Vänersborg, Sweden, for 150 years: (1) estimated spectrum, (2) fitted spectrum of the second order autoregressive model, (3) spectrum of residuals, and (4) expected spectrum of the independent series.

Summarizing, the analysis of historical records of annual hydrologic series leads to the following conclusions:

1. Processes of annual precipitation, annual evaporation, annual effective precipitation on river basins (precipitation minus evaporation), annual runoff from river basins, and similar hydrologic processes may be considered in most cases as approximately temporarily stationary stochastic processes, provided the systematic errors in observed data (inconsistency) and the human-induced changes and natural accidental disruptions (nonhomogeneity in data) are properly taken into account.

2.
The major time dependence in hydrologic annual series is produced by the complex geophysical processes of water storage in river basins, with their random fluctuations from year to year and periodic and stochastic fluctuations within the year.

3. The longer a series, the greater is the probability of some nonhomogeneity being present in the data, produced either by human activities or by accidental disruptions in nature, plus some systematic errors (inconsistency).

4. There exist some observed hydrologic series exhibiting changes in the statistical characteristics which do not appear to be produced by nonhomogeneities or inconsistencies. However, further investigations are necessary in order to substantiate claims that such changes are produced by some localized or regional climatic changes.

2.4 CHARACTERISTICS OF PERIODIC TIME SERIES

The statistical characteristics of periodic series are discussed assuming that any inconsistency and nonhomogeneity were first identified and removed from the original series. Periodic hydrologic time series such as seasonal, monthly, weekly and daily series share some statistical characteristics with annual series and differ in others. The basic difference is that periodic series, in most cases known in nature, have significant periodic behavior in the mean, standard deviation and skewness. In addition to these periodicities, they show a time correlation structure which may be either constant or periodic. Consequently, such time dependence may be represented by, say, autoregressive models with constant coefficients, as in the case of annual series, or by models with periodic coefficients. The periodic statistical characteristics can be determined by Eqs. (2.7) through (2.10).

The plot of a periodic series versus time gives a good indication of its main statistical characteristics. For example, the plot of the daily flows of the Boise River near Twin Springs, Idaho for the year 1921, as given in Fig.
2.13 shows a typical behavior in which during some days of the year the flows are low and during other days the flows are high. During the days of low flows the variability is small, while during high flows the variability is large. This characteristic behavior of daily flows indicates that the daily mean and the daily standard deviation vary periodically throughout the year (see Figs. 2.1 and 2.2). Figure 2.14 likewise shows the periodic behavior of the monthly flows of the Middle Fork of the American River near Auburn, California. The skewness coefficient is another characteristic which may vary periodically; see, for instance, Fig. 2.3 for the daily skewness coefficient of the Boise River daily flows. One way to remove the periodic skewness is by a logarithmic transformation of the original series (see Sec. 3.2).

Figure 2.13. Daily flows of the Boise River, Idaho, USA, for the year 1921.

Figure 2.14. Monthly river flows for Station 11B.402, Middle Fork of the American River near Auburn, California, for the period 1931-1960 (Roesner and Yevjevich, 1966).

The time dependence characteristics of periodic series may be studied by determining the correlogram r_k of Eqs. (2.5a) or (2.5b), or the correlogram r_{k,τ} of Eq. (2.10), of the original time series x_{ν,τ}. In most cases, though, such correlograms are determined after removing the periodic mean and the periodic standard deviation. In that case the new standardized series z_{ν,τ} is obtained from

z_{ν,τ} = (x_{ν,τ} - x̄_τ) / s_τ   (2.24)

where x̄_τ and s_τ are the periodic mean and the periodic standard deviation, respectively. Figure 2.15 shows an example of the correlograms r_k obtained from Eq.
(2.5b) for (a) the logarithms of the original series x_{ν,τ} and (b) the series z_{ν,τ} of Eq. (2.24) after using the log transformation, for the monthly flows of the Middle Fork of the American River near Auburn, California. The correlogram of case (a) shows a periodic variation because r_k was computed before removing the periodic mean and the periodic standard deviation from the original series. On the other hand, the correlogram of case (b) does not show a periodic variation, because the basic periodic characteristics were removed from the original series before computing r_k.

Figure 2.15. Correlograms of (a) the logarithms of the original monthly river flows and (b) the series in the log domain after removing the periodic mean and the periodic standard deviation, for Station 11B.402, Middle Fork of the American River near Auburn, California (Roesner and Yevjevich, 1966).

Figure 2.4 shows the correlation r_{k,τ} computed from Eq. (2.10) for the daily flows of the Boise River. It shows that during the days of flows of relatively high variability the correlations are lower and more variable than during the days of the recession. As another example, the correlation r_{k,τ} of Eq. (2.10) was computed for the series z_{ν,τ} of the Tioga River near Erwins, New York. Figure 2.16 shows the computed r_{k,τ} for the log-transformed daily, 3-day, 7-day and 13-day flow series. The existence of a periodic behavior of the computed r_{k,τ} can be observed for all the indicated time intervals, although for the daily series it shows high fluctuations.

Figure 2.16. Periodic correlograms of daily, 3-day, 7-day and 13-day series for the log-transformed flows of the Tioga River: (1) r_{k,τ} computed from Eq. (2.10), (2) smoothed r_{k,τ} using Fourier (trigonometric) series, and (3) the mean of r_{k,τ} (Tao et al., 1976).
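The standardization of Eq. (2.24), which removes the periodic mean and the periodic standard deviation before the correlogram is computed, can be sketched as follows (a minimal Python sketch under our own conventions: the series is stored year by year as a flat list of N·ω values with ω seasons per year):

```python
def periodic_standardize(x, w):
    """Eq. (2.24): z[v,tau] = (x[v,tau] - mean_tau) / s_tau, where mean_tau
    and s_tau are the periodic mean and standard deviation of season tau.
    x is a flat list of N*w values (N years of w seasons each)."""
    n = len(x) // w
    means, sds = [], []
    for tau in range(w):
        col = [x[v * w + tau] for v in range(n)]
        m = sum(col) / n
        s = (sum((c - m) ** 2 for c in col) / n) ** 0.5
        means.append(m)
        sds.append(s)
    z = [(x[v * w + tau] - means[tau]) / sds[tau]
         for v in range(n) for tau in range(w)]
    return z, means, sds
```

The correlogram of the standardized series z_{ν,τ}, as in case (b) of Fig. 2.15, then shows no periodic variation.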
2.5 CHARACTERISTICS OF MULTIVARIATE TIME SERIES

A variable observed at several points along a line, over an area or across a space represents a multiple time series or, generally speaking, a multivariate series. Each time series can be statistically analyzed separately. However, it is often the case that the stochastic components of these series are mutually dependent random variables (dependent along the line, over the area or across the space). In other words, the stochastic components represent a set of n time series dependent among themselves. When the objective is to generate new samples of time series at a set of points, the basic requirement is not only to preserve the statistical characteristics of each of the n series, but also to preserve the mutual dependence among these n time series.

The dependence structure among n time series can be determined by computing the lag-k cross-correlations between the series. For instance, considering the series x_t^{(i)} and x_t^{(j)}, the lag-k cross-correlation coefficient r_k^{ij} is given by

r_k^{ij} = Σ_{t=1}^{N-k} (x_t^{(i)} - x̄^{(i)}) (x_{t+k}^{(j)} - x̄^{(j)}) / [ Σ_{t=1}^{N-k} (x_t^{(i)} - x̄^{(i)})² Σ_{t=1}^{N-k} (x_{t+k}^{(j)} - x̄^{(j)})² ]^{1/2}   (2.25)

where x̄^{(i)} is the mean of the first N-k values of series i, and x̄^{(j)} is the mean of the last N-k values of series j. The sample cross-correlation coefficient r_k^{ij} of Eq. (2.25) is the open series estimate of the population cross-correlation coefficient ρ_k^{ij}. For n time series it is common to represent the correlation structure by the matrix

        | r_k^{11}  r_k^{12}  ...  r_k^{1n} |
        | r_k^{21}  r_k^{22}  ...  r_k^{2n} |
M_k =   |   ...       ...     ...     ...   |   (2.26)
        | r_k^{n1}  r_k^{n2}  ...  r_k^{nn} |

where the r's are computed by Eq. (2.25).

When dealing with periodic series, the periodic dependence structure between two series x_{ν,τ}^{(i)} and x_{ν,τ}^{(j)} is determined by

r_{k,τ}^{ij} = (1/N) Σ_{ν=1}^{N} (x_{ν,τ}^{(i)} - x̄_τ^{(i)}) (x_{ν,τ-k}^{(j)} - x̄_{τ-k}^{(j)}) / (s_τ^{(i)} s_{τ-k}^{(j)})   (2.27)

where x̄_τ^{(i)} and x̄_{τ-k}^{(j)} are the periodic means at time intervals τ and τ-k, respectively, and s_τ^{(i)} and s_{τ-k}^{(j)} are the periodic standard deviations at time intervals τ and τ-k, respectively. When τ-k < 1 in Eq.
(2.27), N is replaced by N-1, and x_{ν,τ-k}^{(j)} is replaced by x_{ν-1,ω+τ-k}^{(j)}, where ω is the number of time intervals within the year.

Figure 2.17 presents the location of 79 precipitation stations in the Upper Great Plains of the United States.

Figure 2.17. The study area and location of the 79 precipitation stations in the Upper Great Plains of the USA (after Tase, 1976).

Figure 2.18 shows, for Station No. 52 of Fig. 2.17, how the lag-zero cross-correlation coefficients (r₀ of Eq. 2.25) between the monthly precipitation series in the area vary with latitude and longitude. This coefficient decreases with an increase of the distance between the correlated stations, so a relationship of r₀ to the distance d alone may be sufficient to describe the regional dependence characteristic. The cross-correlation coefficient as a function of d is given in Fig. 2.19. The function fitted to r₀ as a function of the distance d is

r₀ = exp(-0.00418 d)   (2.28)

Figure 2.18. Isocorrelation patterns for the series of Station 52, as correlated with all the other station series, for the residual independent series of monthly precipitation in the Upper Great Plains of the USA (after Tase, 1976).

In general, the characteristics of multivariate time series are often presented in the form of regional information. For instance, statistical characteristics such as the mean and the standard deviation may be expressed as functions of the latitude, longitude and altitude of the gaging stations over the study area. The function describing the regional characteristics of a multivariate series may be written as

v = b₀ + b₁X + b₂Y + b₃X² + b₄Y² + b₅XY   (2.29)

Figure 2.19.
Lag-zero cross-correlation coefficient r₀ versus the interstation distance d, and the fitted function r₀ = exp(-0.00418 d), for the stochastic component of monthly precipitation in the Upper Great Plains of the USA (after Tase, 1976).

In Eq. (2.29), v is any regional statistical characteristic, X is the longitude and Y is the latitude of the gaging station position, and b₀, b₁, b₂, ... are the parameters of the regional equation. Thus, relations such as Eq. (2.29) enable the estimation of statistical characteristics at any point of a grid of points. For instance, for the Upper Great Plains of the United States referred to in Fig. 2.17, the isolines of the general monthly mean are shown in Fig. 2.20. Similar isolines can be obtained for the other statistical characteristics needed in studying the multivariate time series. For each characteristic, equations similar to Eq. (2.29) may be used.

2.6 CHARACTERISTICS OF INTERMITTENT TIME SERIES

Intermittent hydrologic time series are those series that have zero or constant values for some intervals or continuous times, and non-zero (usually positive) or non-constant values for the remaining intervals or continuous times. Examples of intermittent series are: (a) short-interval (hour, day, week) precipitation, (b) small rivers that have periods of no flow, (c) sediment transport only during flows greater than a critical discharge, and (d) reservoirs for which the full and empty states may be conceived as intermittencies.

Figure 2.20. Isolines of the 30-year general monthly mean for the precipitation of the Upper Great Plains of the USA (after Tase, 1976).

When the intermittency is added to the periodic and stochastic components of a hydrologic time series, the series becomes very complex and further characteristics are needed for its understanding and description. The new information is for the off-on process (or zero and non-zero values), with any patterns in their mutual successions.
Two basic approaches are used in describing the characteristics of these series: (1) intermittent series are conceived as an off-on process added to all the other process characteristics of periodic time series, and (2) intermittent series are conceived only as truncated series of otherwise non-intermittent general time series. Evidently, when periodicity, stochasticity and intermittency are considered for a multivariate series, the complexity is so compounded that only a most advanced analysis of the series would enable a reliable statistical description and modeling of such series.

REFERENCES

Anderson, R. L., 1941. Distribution of the serial correlation coefficients. Annals of Mathematical Statistics, Vol. 8, No. 1, pp. 1-13, March.

Jenkins, G. M. and Watts, D. G., 1969. Spectral Analysis and its Applications. Holden-Day Series in Time Series Analysis, San Francisco, California.

Kendall, M. G. and Stuart, A., 1968. The Advanced Theory of Statistics, Vol. 3, Design and Analysis, and Time Series. 2nd edition, Hafner, New York.

Parzen, Emanuel, 1967. Time Series Analysis Papers. Holden-Day, Inc., San Francisco, California.

Roesner, L. A. and Yevjevich, V., 1966. Mathematical models for time series of monthly precipitation and monthly runoff. Hydrology Paper 15, Colorado State University, Fort Collins, Colorado.

Tao, P. C., Yevjevich, V. and Kottegoda, N., 1976. Distribution of hydrologic independent stochastic components. Hydrology Paper 82, Colorado State University, Fort Collins, Colorado.

Tase, Norio, 1976. Area-deficit-intensity characteristics of droughts. Hydrology Paper 87, Colorado State University, Fort Collins, Colorado.

Yevjevich, Vujica, 1964. Fluctuations of wet and dry years, Part II, Analysis by serial correlation. Hydrology Paper 4, Colorado State University, Fort Collins, Colorado.

Yevjevich, Vujica, 1972. Stochastic Processes in Hydrology. Water Resources Publications, Fort Collins, Colorado.

Yevjevich, Vujica, 1975.
Generation of hydrologic samples, Case study of the Great Lakes. Hydrology Paper 72, Colorado State University, Fort Collins, Colorado.

Chapter 3

STATISTICAL PRINCIPLES AND TECHNIQUES FOR TIME SERIES MODELING

The main objective of this chapter is to present the basic principles and techniques necessary for the stochastic modeling of hydrologic time series. The first section discusses the most common statistical techniques for estimating model parameters, including estimation by regionalization. Section 3.2 gives various procedures for transforming skewed variables into normal variables. Section 3.3 deals with the estimation of periodic parameters such as the periodic mean, periodic standard deviation and periodic correlations by using Fourier series fitting techniques. Section 3.4 discusses the solution of a matrix equation necessary for estimating the parameters of multivariate models. The techniques for testing the goodness of fit of stochastic models are given in Sec. 3.5, and the topic of preservation of historical statistics and parsimony of parameters is discussed in Sec. 3.6. Finally, Sec. 3.7 deals with the synthetic generation of new series and the use of models for forecasting. The principles and techniques given in this chapter are necessary for the modeling of the univariate and multivariate series presented in subsequent chapters.

3.1 BASIC ESTIMATION TECHNIQUES

Methods derived from mathematical statistics for estimating the parameters of models representing random variables are called estimation techniques. Consider a sample time series x₁, ..., x_N and a model with parameters α and β representing such a series. The estimates of such parameters from the sample are denoted α̂ and β̂ and are called the estimates or estimators of α and β. The most common estimation techniques are the method of moments, the method of least squares and the method of maximum likelihood.
Depending upon the estimation technique, some estimators are better than others. The criteria for judging the goodness of estimators are discussed first below. Subsequently, the three estimation techniques mentioned above are discussed in some detail.

3.1.1 PROPERTIES OF ESTIMATORS

Two properties are commonly used in statistics for judging the goodness of estimators: bias and mean square error. If the expected value E(α̂) of the estimator α̂ is equal to the population parameter α, then α̂ is said to be an unbiased estimator of α. Otherwise, α̂ is a biased estimator of α, with a bias equal to E(α̂) - α. The interpretation of this property is as follows. Suppose that, based on the model with parameters α and β, we generate m synthetic sequences of the same length N, and from each sequence the estimators α̂ᵢ, i = 1, ..., m, are computed. If the mean of the estimators α̂ᵢ is statistically the same as the population parameter α, then α̂ is an unbiased estimator of α.

The difference (α - α̂) between the parameter α and its estimator α̂ is the error of the estimate. The expected value of the square of such error is called the mean square error (MSE) of the estimator α̂. It can be written as

MSE(α̂) = E[(α - α̂)²] = Var(α̂) + [α - E(α̂)]²   (3.1)

Equation (3.1) shows that the MSE is equal to the variance of the estimator plus the square of the bias. Whenever α̂ is unbiased, α = E(α̂), so in such a case the MSE is equal to the variance. Obviously, we would like to have an estimator as close to the parameter as possible. That is, we would like an estimator with a small MSE or, if possible, an estimator with the minimum MSE. If α̂ is unbiased and has minimum variance (i.e., unbiased and minimum MSE), then α̂ is an efficient estimator. When selecting an estimator or a method of estimation, we should consider both properties, bias and MSE.
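The interpretation of bias and MSE in terms of m generated sequences suggests a direct Monte Carlo check. The sketch below (our own illustration; the model, sample size and function names are hypothetical) estimates the bias and MSE of the divisor-N variance estimator, which is known to be biased downward:

```python
import random

def bias_and_mse(estimator, true_value, simulate, m=2000):
    """Generate m synthetic samples, compute the estimator on each, and
    return the empirical bias and mean square error as in Eq. (3.1)."""
    est = [estimator(simulate()) for _ in range(m)]
    bias = sum(est) / m - true_value
    mse = sum((e - true_value) ** 2 for e in est) / m
    return bias, mse

random.seed(1)

def sample():
    # N = 5 values from a normal population with variance 1
    return [random.gauss(0.0, 1.0) for _ in range(5)]

def var_n(x):
    # variance estimator with divisor N (biased: E = (N-1)/N * sigma^2)
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)

bias, mse = bias_and_mse(var_n, 1.0, sample)
# bias is close to (N-1)/N - 1 = -0.2, and the MSE exceeds the squared bias,
# consistent with MSE = Var + bias^2 in Eq. (3.1)
```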
We would like to have both desirable properties: an unbiased estimator and a minimum MSE estimator. In some cases an estimator may be unbiased but may not be a minimum MSE estimator; in other cases it may be the opposite. Furthermore, estimators often are biased and do not have a minimum MSE. Therefore, when selecting among alternative estimators, a criterion is to select the estimator with the smallest bias and the smallest MSE. When this is not possible, the analyst must judge which of the two properties is more desirable for the particular case and select the estimator accordingly.

In addition to the properties of bias and mean square error discussed above, the properties of consistency and sufficiency are important for describing estimators. Assume that α̂_N is an estimator of the parameter α determined from a sample of size N. If α̂_N → α as N increases, then α̂_N is a consistent estimator of α. Finally, if estimators make maximum use of the information contained in the data, they are said to be sufficient (Benjamin and Cornell, 1970).

3.1.2 METHOD OF MOMENTS

The expected value E[X] of a random variable X is called the first population moment of X. In general, the expected value E[X^r] is called the r-th population moment of X. Similarly, when dealing with a sample x₁, x₂, ..., x_N, the r-th sample moment is defined by m_r = (1/N) Σ_{t=1}^{N} x_t^r. If the random variable X represents a time series given by a model with parameters α₁, ..., α_p, the population moments are functions of those parameters. Therefore, the moment estimates of the parameters are obtained by equating population moments and sample moments, and solving for the parameters. If p is the number of parameters to estimate, then the first p population and sample moments must be equated and solved simultaneously.

For instance, consider the time series model X_t = a + bZ_t, where a and b are the parameters and Z_t is an independent random variable with mean zero and variance one.
We would like to estimate the parameters a and b based on a sample series X₁, ..., X_N. Since we have two parameters, the first and second population and sample moments must be equated. The first two population moments of the model are

E[X] = a   (3.2)

and

E[X²] = a² + b²   (3.3)

Similarly, the first two sample moments of the series are

m₁ = (1/N) Σ_{t=1}^{N} X_t   (3.4)

and

m₂ = (1/N) Σ_{t=1}^{N} X_t²   (3.5)

Equating the first population and sample moments of Eqs. (3.2) and (3.4) respectively, we obtain the estimate â of the parameter a as

â = (1/N) Σ_{t=1}^{N} X_t   (3.6)

Equating the second population and sample moments of Eqs. (3.3) and (3.5) respectively, we obtain the estimate b̂ of the parameter b as

b̂ = [ (1/N) Σ_{t=1}^{N} X_t² - â² ]^{1/2}   (3.7)

where the symbol ^ is used to give the meaning of estimate to the corresponding parameter. Note in the above example that the parameters a and b of the model are actually the mean and the standard deviation of X. Therefore, their estimates given by Eqs. (3.6) and (3.7) are the sample mean and the sample standard deviation, respectively.

Parameter estimates by the method of moments are usually not difficult to obtain, and the method is simpler than the other estimation methods. Often the moment estimates are used as first approximations for estimation by other methods. Except for the estimate of the mean, the moment estimates of other parameters are usually biased, although adjustments can be applied to make them unbiased. Moment estimates are asymptotically efficient when the underlying distribution is normal. For skewed variables, though, the moment estimators generally are not asymptotically efficient.

3.1.3 METHOD OF LEAST SQUARES

Consider that the model of a sample time series y₁, ..., y_N is

y_t = f(y_{t-1}, y_{t-2}, ...; α₁, ..., α_p) + ε_t

where α₁, ..., α_p are the parameters and ε_t is the residual or error series, which has zero mean.
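The two-moment estimation of Eqs. (3.4) through (3.7) can be sketched in a few lines of Python (the function name is ours; the data are illustrative):

```python
import math

def moment_estimates(x):
    """Method-of-moments estimates for X_t = a + b*Z_t, with Z_t of mean
    zero and variance one: equate m1 and m2 to E[X] = a and
    E[X^2] = a^2 + b^2, then solve for a and b."""
    n = len(x)
    m1 = sum(x) / n                     # Eq. (3.4)
    m2 = sum(v * v for v in x) / n      # Eq. (3.5)
    a_hat = m1                          # Eq. (3.6)
    b_hat = math.sqrt(m2 - a_hat ** 2)  # Eq. (3.7)
    return a_hat, b_hat
```

As noted above, â and b̂ are just the sample mean and the (divisor-N) sample standard deviation.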
The least squares estimation method is based on finding the estimates α̂₁, ..., α̂_p so that the sum of the squared differences between the observed values y₁, ..., y_N and the estimated expected values ŷ_t = f(y_{t-1}, ...; α̂₁, ..., α̂_p), t = 1, ..., N, is minimized. That is, α̂₁, ..., α̂_p should be chosen to minimize

Σ_{t=1}^{N} ε̂_t² = Σ_{t=1}^{N} [y_t - ŷ_t]² = Σ_{t=1}^{N} [y_t - f(y_{t-1}, ...; α̂₁, ..., α̂_p)]²   (3.8)

To find the minimum of the sum (3.8), all partial derivatives of the sum with respect to α̂₁, ..., α̂_p must be equal to zero. That is,

∂(Σ ε̂_t²)/∂α̂_j = 0 ,  j = 1, ..., p   (3.9)

These p equations with p unknowns must be solved simultaneously to obtain the parameter estimates α̂₁, ..., α̂_p. These estimates are efficient when certain conditions are satisfied (Yevjevich, 1972a).

Consider the time series model y_t = φ₁y_{t-1} + ε_t, where the mean of y_t is zero, the mean of ε_t is also zero, and the variance σ_ε² of ε_t is assumed known. We would like to determine the least squares estimate of the parameter φ₁ based on the sample series y₁, y₂, ..., y_N. The sum of the squares of the errors, as in Eq. (3.8), is

Σ_{t=2}^{N} ε̂_t² = Σ_{t=2}^{N} (y_t - φ̂₁y_{t-1})²

and the partial derivative of the sum with respect to φ̂₁ is

∂(Σ ε̂_t²)/∂φ̂₁ = -2 Σ_{t=2}^{N} (y_t - φ̂₁y_{t-1}) y_{t-1}

Equating it to zero and solving for φ̂₁, we have

φ̂₁ = Σ_{t=2}^{N} y_t y_{t-1} / Σ_{t=2}^{N} y_{t-1}²   (3.10)

Note that in the above expressions the summation varies from t = 2 through t = N.

3.1.4 METHOD OF MAXIMUM LIKELIHOOD

Consider that the time series model y_t = f(y_{t-1}, y_{t-2}, ...; α₁, ..., α_p) + ε_t is to be fitted to a sample series y₁, ..., y_N, where α = {α₁, ..., α_p} is the parameter set of the model and ε_t is the error term. The joint probability density of ε₁, ..., ε_N is called the likelihood function L(.), and it is expressed as

L(.) = f(ε₁; α) f(ε₂; α) ... f(ε_N; α) = Π_{t=1}^{N} f(ε_t; α)   (3.11)

The maximum likelihood estimate of the parameter set α is obtained when the function L(.) of Eq. (3.11) is a maximum. The same estimates are obtained if the log-likelihood function is maximized instead of the likelihood function. In such a case the log-likelihood function is written as

LL(.) = ln Π_{t=1}^{N} f(ε_t; α) = Σ_{t=1}^{N} ln f(ε_t; α)   (3.12)

The partial derivatives of LL(.) with respect to the parameters α₁, ..., α_p, equated to zero, are

∂LL(.)/∂α₁ = 0 , ... , ∂LL(.)/∂α_p = 0   (3.13)

Then the solution of Eqs. (3.13) yields the maximum likelihood estimates α̂₁, ..., α̂_p. The maximum likelihood estimators are asymptotically efficient. They are also consistent estimators (if consistent estimators exist) as well as sufficient estimators.

As in the previous method, consider the time series model y_t = φ₁y_{t-1} + ε_t, where ε_t is assumed normal with mean zero and variance σ_ε². We wish to determine the maximum likelihood estimate of the parameter φ₁, given that we have a sample series y₁, ..., y_N. The likelihood function L(.) of Eq. (3.11) can be written as

L(.) = (√(2π) σ_ε)^{-N} exp{ -(1/(2σ_ε²)) Σ_{t=1}^{N} ε_t² }   (3.14)

Note that with the model y_t = φ₁y_{t-1} + ε_t and the sample series y₁, ..., y_N, only ε₂, ..., ε_N can be determined, since ε₁ would require the knowledge of y₀. Therefore, if ε₁ is not included in Eq. (3.14), the true likelihood function is not obtained but only an approximation. There are ways in which ε₁ can be predicted, so that a better approximation of the likelihood function can be obtained. For more details on how to predict the starting values of a time series (as in the case of ε₁ of the above example), the interested reader is referred to specific books on time series analysis, for instance the book of Box and Jenkins (1970).

For our example, let us assume that ε₁ is predicted or that it is avoided in Eq. (3.14). Then the log-likelihood function of Eq. (3.12) becomes

LL(.) = -N ln(√(2π) σ_ε) - (1/(2σ_ε²)) Σ_{t=2}^{N} (y_t - φ₁y_{t-1})²   (3.15)

Taking the partial derivative of LL(.) with respect to φ₁, equating it to zero as in Eq. (3.13), and solving for φ₁, we have

φ̂₁ = Σ_t y_t y_{t-1} / Σ_t y_{t-1}²   (3.16)

Observe that if ε₁ is not predicted, the summations of Eq. (3.16) do not include y₁, in which case the approximate maximum likelihood estimate of φ₁ of Eq. (3.16) is the same as its least squares estimate of Eq. (3.10).
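The closed-form estimate of Eqs. (3.10) and (3.16) is a one-liner in practice; a minimal Python sketch (the function name is ours):

```python
def phi1_estimate(y):
    """Least squares (and approximate maximum likelihood) estimate of
    phi_1 in y_t = phi_1*y_{t-1} + eps_t, Eqs. (3.10)/(3.16):
    phi_hat = sum_{t=2}^{N} y_t*y_{t-1} / sum_{t=2}^{N} y_{t-1}^2."""
    num = sum(y[t] * y[t - 1] for t in range(1, len(y)))
    den = sum(y[t - 1] ** 2 for t in range(1, len(y)))
    return num / den
```

For a noise-free sequence generated with φ₁ = 0.5 the routine recovers 0.5 exactly; with noise it returns the least squares fit to the sample.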
Generally, if $\epsilon_1$ is predicted, then $\hat\phi_1$ of Eq. (3.16) would be a better estimate of $\phi_1$ than that of Eq. (3.10).

3.1.5 JOINT ESTIMATION OF PARAMETERS

Whether the method of moments, least squares or maximum likelihood is used for estimating the parameters of a model, the parameters should be jointly estimated. The reason is that the sample estimates are mutually dependent variables. When the model involves a small number of parameters, as in the case of models of annual time series, the joint estimation is simple to obtain. However, for models of periodic time series, which usually involve a large number of parameters, the joint estimation becomes more complex, generally requiring the solution of a large number of non-linear equations.

In order to simplify or avoid such complex estimations, a sequential procedure for estimating a group of parameters at a time is usually followed. For example, when dealing with non-normal periodic time series, the parameters of the normal transformation function are determined first. Once the transformation is made, the periodic parameters in the mean and in the standard deviation are estimated and removed from the time series. Then, the dependence structure of the residual series is analyzed and the corresponding model parameters are estimated. How different the parameters of this sequential estimation procedure are from the parameters estimated jointly is difficult to assess, and it seems that such a comparison has not been made up to the present.

3.1.6 PARAMETER ESTIMATION BY REGIONALIZATION

Regionalization is used in time series modeling mainly for (1) determining the model parameters at ungaged sites, and (2) improving the estimates of model parameters at sites with short records. In the first case, the model parameters are determined at sites with records and they are regionalized in mathematical form as in Eq. (2.21) or in graphical form by regional parameter isolines as in Fig.
2.20. In the second case, the improvement of parameter estimates can be made by transferring information from the stations with long records to the station with the short record. Under certain correlation conditions, the transfer of information can improve the accuracy of parameter estimates. Improvement in modeling at the station with the short record can also be made by adopting the type of model identified and estimated at other stations in the region with longer records.

3.2 NORMALIZATION OF TIME SERIES VARIABLES

Most probability theory and statistical techniques applied to hydrology in general, and to hydrologic time series analysis in particular, are developed assuming the variables are normally distributed (Gaussian). Because most frequency curves of hydrologic variables are asymmetrically distributed, or are bounded by zero (they are positively valued variables), it is often necessary to transform those variables to normal before carrying out the statistical analysis of interest. The first part of this section treats the normalization of annual series, and the second part the normalization of periodic time series. Some further discussion on this subject is also given in Chapters 4 and 7.

3.2.1 NORMALIZATION OF ANNUAL TIME SERIES

For a large number of annual precipitation and runoff series in the United States, several authors have found (see Markovic, 1965) that the two-parameter lognormal (lognormal-2) probability distribution function fits their frequency distributions well. Assuming that the annual time series $x_t$ is represented by a lognormal-2 distribution, its probability density function is

$$f(x) \;=\; \frac{1}{x\, \sigma_y \sqrt{2\pi}} \exp\Big\{-\frac{[\log(x) - \mu_y]^2}{2\sigma_y^2}\Big\} \qquad (3.17)$$

where $\mu_y$ and $\sigma_y$ are the parameters of the function. They have the subscript y because they represent the mean and the standard deviation of $y = \log(x)$, respectively. Therefore, the transformation of x by

$$y = \log(x) \qquad (3.18)$$

yields a normal series $y_t$ with mean $\mu_y$ and standard deviation $\sigma_y$.
This also means that

$$y' \;=\; \frac{\log(x) - \mu_y}{\sigma_y} \qquad (3.19)$$

is a standard normal series with mean zero and standard deviation one.

If, however, the two-parameter gamma (gamma-2) probability distribution function is used for the annual time series $x_t$, namely

$$f(x) \;=\; \frac{x^{\alpha - 1}\, e^{-x/\beta}}{\beta^{\alpha}\, \Gamma(\alpha)} \qquad (3.20)$$

where $\alpha$ and $\beta$ are the parameters and $\Gamma(.)$ represents the gamma function operator, then the transformation

$$y = x^{1/3} \qquad (3.21)$$

produces a relatively symmetrical (though not exactly normal) variable.

The above transformations of Eqs. (3.18) and (3.21) could include a lower bound c. For instance, if x is replaced by x - c in Eqs. (3.17) through (3.20), then x will be distributed as lognormal or gamma with three parameters. In some applications the lower bound is obtained by experience or by trial, although analytical estimation procedures are available (Yevjevich, 1972a, p. 140, 148).

An extension of the power transformation of Eq. (3.21) can be used as

$$y = a(x - c)^b \qquad (3.22)$$

where c is the lower boundary parameter and a and b are the other parameters. Values of b are usually on the order of 1/2, 1/3 and 1/4. More complex transformations are also available. The reader is referred to books on statistics, probability theory or time series analysis, such as the book by Box and Tiao (1973, p. 530).

3.2.2 NORMALIZATION OF PERIODIC TIME SERIES

The problem often posed in the normalization of periodic time series is at what point in the analysis the transformation should be made. For instance, the transformation could be made directly on the original series $x_{\nu,\tau}$ (before the periodic means $\mu_\tau$ and the periodic standard deviations $\sigma_\tau$ are estimated and removed from the series), or the transformation could be made after the periodic parameters $\mu_\tau$ and $\sigma_\tau$ are estimated and removed from the series $x_{\nu,\tau}$. This problem has not been properly resolved yet for practical modeling and parameter estimation of periodic hydrologic time series.
Also, reliable inference techniques are needed for testing whether the transformed variable satisfies the assumption of being close to a normal probability distribution. If the normalization is carried out directly on $x_{\nu,\tau}$, the periodicities in the mean and standard deviation of the transformed series $y_{\nu,\tau}$ will be highly distorted in comparison with those of the original series $x_{\nu,\tau}$. On the other hand, if the normalization is carried out in terms of the standardized series

$$z_{\nu,\tau} \;=\; \frac{x_{\nu,\tau} - \hat\mu_\tau}{\hat\sigma_\tau}$$

where $\hat\mu_\tau$ and $\hat\sigma_\tau$ are estimates of $\mu_\tau$ and $\sigma_\tau$, respectively, then the $z_{\nu,\tau}$ series will have both positive and negative values and generally would require more complex transformations.

Transformation of the original asymmetric periodic time series $x_{\nu,\tau}$ into a normally distributed time series $y_{\nu,\tau}$ may be accomplished by using various functions. A general transformation function has the form

$$y_{\nu,\tau} \;=\; a_\tau\, (x_{\nu,\tau} - c_\tau)^{b_\tau} \qquad (3.23)$$

where $c_\tau$ is the periodic lower boundary parameter, and $a_\tau$ and $b_\tau$ are the other periodic parameters. Often log transformations as in Eq. (3.18) with no parameters, or the one-parameter transformation

$$y_{\nu,\tau} \;=\; \log(x_{\nu,\tau} - c_\tau) \qquad (3.24)$$

are used. In the above transformations of Eqs. (3.23) and (3.24) it is indicated that the parameters vary with $\tau$, where $\tau = 1, \ldots, \omega$ and $\omega$ is the number of time intervals in the year. However, it may not be necessary to have all parameters varying with $\tau$. In some cases, especially when $\omega$ is large, say $\omega = 52$ for weekly series, $\omega = 365$ for daily series, or even $\omega = 12$ for monthly series, the parameters, although varying with time, can remain constant for several values of $\tau$. For instance, for $\omega = 12$ it may not be necessary to have twelve values of $c_\tau$ but only four values, one for each season.

In the normal transformations of periodic time series the same type of transformation is usually applied for all time intervals $\tau$ within the year.
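A minimal sketch of this workflow, assuming a one-parameter log transformation as in Eq. (3.24) followed by seasonal standardization (the function names and the synthetic data are ours, for illustration only):

```python
import numpy as np

def seasonal_standardize(x, omega):
    """Standardize a periodic series season by season.

    x is arranged year by year with omega intervals per year, so that
    x.reshape(n_years, omega)[:, tau] collects all values for season tau.
    Returns the standardized series plus the seasonal estimates themselves.
    """
    xm = np.asarray(x, dtype=float).reshape(-1, omega)
    mu_tau = xm.mean(axis=0)           # periodic mean, as in Eq. (2.7)
    sd_tau = xm.std(axis=0, ddof=1)    # periodic standard deviation, Eq. (2.8)
    z = (xm - mu_tau) / sd_tau
    return z.ravel(), mu_tau, sd_tau

# Synthetic skewed monthly series: lognormal values with a seasonal shift.
rng = np.random.default_rng(0)
omega, years = 12, 30
x = np.exp(rng.standard_normal(omega * years)
           + np.tile(np.linspace(0.0, 1.0, omega), years))

c_tau = 0.0                  # lower boundary parameter, taken as zero here
y = np.log(x - c_tau)        # one-parameter log transform, Eq. (3.24)
z, mu_tau, sd_tau = seasonal_standardize(y, omega)
```

After the transformation, each season of `z` has sample mean zero and sample standard deviation one by construction; whether `z` is close to normal must still be tested, as noted above.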
However, the type of transformation function may also vary with time, because the mixed distribution of, say, daily variables used to produce weekly, monthly or other seasonal variables does not need to be composed of the same type of probability distribution function. Atmospheric and river basin processes and responses during dry periods do not have to produce the same type of probability distribution function as during wet periods. Therefore, in some cases, the assumption of the same distribution as in Eqs. (3.17) or (3.20), or the same transformation function as in Eqs. (3.23) or (3.24), for all values of $\tau$ may not be realistic and may lead to some distortions in modeling.

3.2.3 REMARKS

Generally, three main approaches have been proposed for dealing with skewed hydrologic time series: (1) to transform the skewed series into normal before modeling the series; (2) to model the original skewed series and take care of the skewness by finding the probability distribution of the uncorrelated residuals; and (3) to find a relationship between the first two moments of the original skewed series and those of the normal series, so that the moments of the original skewed series are preserved.

Each of the above approaches has some pros and cons that must be considered for their practical application. The major advantage of the first approach lies in the fact that the best techniques in statistics and stochastic processes are developed for normal processes. It is therefore simpler to transform the skewed variables into normal (or at least close to normal) than to find similar procedures for non-normal variables, or to try to assess the errors of applying the methods developed for normal variables to skewed variables, especially to those variables of small time intervals, such as hourly, daily or even weekly series, which usually are highly skewed.
On the other hand, when transforming the original series into normal, biases in the statistical properties (such as the mean and standard deviation) of the generated series may occur. In other words, the mean of the transformed series may be reproduced in the generation, but not the mean of the original series (before transformation). If biases are small it would still be advisable to use the first approach of transformation to normal, but if biases are large then the second or third approach may be useful. When the modeling is made directly on the original skewed time series (second approach), the estimation and testing of the model is not as efficient as if the series were normal. Furthermore, since time series models are usually linear, such as AR and ARMA models, then to a certain extent, due to the Central Limit Theorem of probability theory, the skewness of the generated dependent variable will be smaller than the skewness of the original dependent series. However, this may not be that critical, since estimates of the skewness for the typical lengths of hydrologic records are very unreliable, so some small degree of bias in the skewness may be acceptable. An alternative for avoiding biases in the mean and standard deviation would be the third approach, i.e., using a relationship between the moments of the skewed and normal series. Matalas (1967) and Fiering and Jackson (1971) describe how to estimate the parameters of the log-transformed series so as to reproduce the parameters of the original series.
Furthermore, Mejia and Rodriguez-Iturbe (1974) presented another approach in order to avoid biases in the correlation structure of the generated series.

In conclusion, several methods of transformation of skewed time series into normal series have been presented which are applicable to annual and periodic time series. Some of the advantages and limitations of such transformations for the reproduction of the main statistical properties of the historical series were discussed, as well as two other alternative procedures for approaching the modeling of skewed time series. The modeling procedures described in Chapters 4 through 8 of this text are essentially based on transforming the series into normal when they have significant skewness. Alternatively, suggestions are made to use the second approach (modeling the original skewed series) when actually needed. Further discussion and comments about those two approaches are made especially in Chapters 4 and 7. We do not use the third approach (relation of moments of skewed and normal series) in the remaining text. We feel that in most practical cases of hydrologic time series analysis the first two approaches can solve the problem of skewness. For some special cases the interested reader may wish to consult the references given above for more details on the third approach.

3.3 ESTIMATION OF PERIODIC PARAMETERS BY FOURIER SERIES

Let us assume that $y_{\nu,\tau}$ represents a periodic hydrologic time series, where $\nu$ is the year and $\tau$ is the time interval within the year. The mathematical model for $y_{\nu,\tau}$ can generally be written as

$$y_{\nu,\tau} \;=\; \mu_\tau + \sigma_\tau\, z_{\nu,\tau} \qquad (3.25)$$

where $\mu_\tau$ and $\sigma_\tau$ are the periodic mean and periodic standard deviation, respectively, and $z_{\nu,\tau}$ is usually a time dependent series with either constant or periodic correlation structure. For example,

$$z_{\nu,\tau} \;=\; \phi_{1,\tau}\, z_{\nu,\tau-1} + \epsilon_{\nu,\tau} \qquad (3.26)$$

is an AR(1) model with periodic autoregression coefficient $\phi_{1,\tau}$. Higher order AR models as well as ARMA models may also have periodic parameters.
It may be shown that these parameters are functions of the lag-k periodic correlation coefficients $\rho_{k,\tau}$ of the series $z_{\nu,\tau}$. For instance, in the case of the AR(1) model of Eq. (3.26), $\phi_{1,\tau} = \rho_{1,\tau}$, where $\rho_{1,\tau}$ is the periodic first correlation coefficient of $z_{\nu,\tau}$ (see Section 4.3.2).

Since the periodic population parameters $\mu_\tau$, $\sigma_\tau$ and $\rho_{k,\tau}$ are not known, they must be estimated from historical data. Two procedures can be followed for obtaining those estimates: (1) the non-parametric approach, in which the mean, standard deviation and correlation coefficients are determined directly from Eqs. (2.7), (2.8) and (2.10), respectively; or (2) the Fourier series fit of the estimates referred to in (1). The purpose of this section is to describe Fourier series analysis for estimating periodic parameters such as $\mu_\tau$, $\sigma_\tau$ and $\rho_{k,\tau}$. The first part of this section gives some reasons why Fourier series analysis may be useful in practice. The second part describes the main equations used in Fourier series analysis, and the third part deals with the criteria for selecting the significant terms of the Fourier series equations.

3.3.1 JUSTIFICATION OF USING FOURIER SERIES

During the past years much controversy has been raised among hydrologists about whether or not to use Fourier series analysis for estimating the periodic parameters of periodic stochastic models. Therefore, it is fair to put forth some arguments and criteria about why and in what cases the use of Fourier series is justified.

Consider the case of estimating the periodic mean $\mu_\tau$. One estimate of $\mu_\tau$ is obtained from Eq. (2.7) as

$$\bar y_\tau \;=\; \frac{1}{N} \sum_{\nu=1}^{N} y_{\nu,\tau}, \quad \tau = 1, \ldots, \omega \qquad (3.27)$$

where $\omega$ is the total number of time intervals within the year. Then, the differences between the estimate $\bar y_\tau$ of Eq. (3.27) and the population parameter $\mu_\tau$ are

$$e_\tau \;=\; \bar y_\tau - \mu_\tau, \quad \tau = 1, \ldots, \omega \qquad (3.28)$$

Because the number of years N is usually small for most hydrologic time series, the individual sampling errors $e_\tau$ of Eq. (3.28) are often large.
Besides, if $\omega$ is large, say 52 for weekly series or 365 for daily series, all 52 or 365 values of $\mu_\tau$ cannot be estimated accurately, and the use of too many parameter estimates violates the principle of statistical parsimony in the number of parameters (see Sec. 3.6).

Experience and the physical analysis of the responses of hydrologic environments to periodic stochastic inputs show that the mean is a relatively smooth function. It is sufficient to estimate $\bar y_\tau$ of a series for samples of different sizes to find that the smoothness of the sequence $\bar y_\tau$ increases as the sample size increases. Therefore, it is expected that in general, by using the Fourier series fit of $\bar y_\tau$, the resulting estimated periodic function $\hat\mu_\tau$ will be smoother than $\bar y_\tau$ and will have smaller sampling errors

$$e'_\tau \;=\; \hat\mu_\tau - \mu_\tau, \quad \tau = 1, \ldots, \omega \qquad (3.29)$$

than the corresponding errors of Eq. (3.28). In the same manner it may be argued that $\hat\sigma_\tau$, the Fourier series fit of the standard deviation $s_\tau$ of Eq. (2.8), is a smooth function, so that the errors $\hat\sigma_\tau - \sigma_\tau$ are smaller on the average than the errors $s_\tau - \sigma_\tau$. Similar arguments can be given for using the Fourier series fit of any other periodic parameter of the series $y_{\nu,\tau}$.

Although the use of the Fourier series fit of periodic parameters may often be convenient as indicated above, in some cases it may lead to serious distortions of the dependence structure of the residual series $z_{\nu,\tau}$ of Eq. (3.25) (Rodriguez-Iturbe et al., 1971). This is especially true when a good fit of the periodic parameters during the dry season is difficult to find. For instance, in an extreme case, the fitted periodic mean or periodic standard deviation may even become negative. In such cases the use of Fourier series should be avoided. In general, Fourier series analysis for estimating periodic parameters of hydrologic series should be used with judgment.
Special care should be taken, especially by those with no experience with the technique. However, as familiarity and experience with it are gained, the user will realize its practical advantages as well as its limitations.

Just as Fourier series analysis can be used for smoothing out the sampling variations of parameters in time, it can be applied for smoothing out sampling variations in space. Karplus and Yevjevich (1973) used it for estimating monthly precipitation parameters in the Central Plains region of the United States. Salas (1975) applied Fourier analysis for estimating regional periodic parameters of weekly streamflow in the central part of the Peruvian Andes, and Woolhiser and Pegram (1979) used it for estimating daily precipitation parameters in various regions of the United States. In using Fourier series analysis for multiple time series, checks should be made for obtaining a consistent dependence structure in space, in addition to verifying the time dependence structure. As stated above, care and judgment should be used for the best use of Fourier series in hydrologic regional analysis.

3.3.2 ESTIMATION OF FOURIER SERIES COEFFICIENTS

Let us consider that $u_\tau$ represents a periodic statistical characteristic of the hydrologic series $y_{\nu,\tau}$, such as the periodic mean $\bar y_\tau$, the standard deviation $s_\tau$ and the correlation coefficients $r_{k,\tau}$ of Eqs. (2.7), (2.8) and (2.10), respectively. Assume also that $u_\tau$ is a (nonparametric) sample estimate of the unknown population periodic parameter denoted by $v_\tau$. The parametric or Fourier series representation of $u_\tau$, denoted in general as $\hat u_\tau$, can be obtained by (Yevjevich, 1972b)

$$\hat u_\tau \;=\; \bar u + \sum_{j=1}^{h} \big[A_j \cos(2\pi j \tau/\omega) + B_j \sin(2\pi j \tau/\omega)\big] \qquad (3.30)$$

where $\bar u$ is the mean of $u_\tau$, $A_j$ and $B_j$ are the Fourier series coefficients, j is the harmonic and h is the total number of harmonics, which is equal to $\omega/2$ or $(\omega - 1)/2$ depending on whether $\omega$ is even or odd, respectively.
For instance, for monthly series where $\omega = 12$, h = 6; for weekly series with $\omega = 52$, h = 26; and for daily series with $\omega = 365$, h = 182. The mean $\bar u$ and the Fourier coefficients $A_j$ and $B_j$ are determined by

$$\bar u \;=\; \frac{1}{\omega} \sum_{\tau=1}^{\omega} u_\tau \qquad (3.31)$$

$$A_j \;=\; \frac{2}{\omega} \sum_{\tau=1}^{\omega} u_\tau \cos(2\pi j \tau/\omega) \qquad (3.32)$$

and

$$B_j \;=\; \frac{2}{\omega} \sum_{\tau=1}^{\omega} u_\tau \sin(2\pi j \tau/\omega) \qquad (3.33)$$

When $\omega$ is even, the last coefficients $A_h$ and $B_h$ are given by

$$A_h \;=\; \frac{1}{\omega} \sum_{\tau=1}^{\omega} u_\tau \cos\Big(\frac{2\pi h \tau}{\omega}\Big) \qquad (3.34)$$

and

$$B_h \;=\; 0 \qquad (3.35)$$

When $\hat u_\tau$ of Eq. (3.30) is determined considering all the harmonics j = 1, ..., h (all the coefficients $A_j$ and $B_j$), $\hat u_\tau$ is exactly the same as $u_\tau$ for all the values of $\tau = 1, \ldots, \omega$. In practice, though, a smaller number of harmonics h* < h is used. That is, $\hat u_\tau$ of Eq. (3.30) should be computed with only those harmonics which are "significant" or which "significantly contribute" to the variability of $u_\tau$. For instance, for monthly series $\omega = 12$ and the total number of harmonics is 6, with j = 1, 2, 3, 4, 5 and 6. However, for a particular case, only h* = 3 harmonics may be needed, such as j = 1, 2, 3, or j = 1, 2, 5, or any other combination of 3 harmonics which are shown to be the most significant. The criteria and procedures for selecting those harmonics to be used in Eq. (3.30) are discussed in the next section.

3.3.3 SELECTION OF SIGNIFICANT HARMONICS AND FOURIER COEFFICIENTS

Experience in using Fourier analysis for estimating periodic parameters of hydrologic time series shows that for small time interval series, such as daily and weekly series, only the first 4-6 harmonics are necessary for a good Fourier series fit of the periodic estimate $u_\tau$. For instance, when dealing with daily series there are a total of 182 harmonics. However, out of this number rarely more than 6 harmonics are needed for obtaining a Fourier series estimate $\hat u_\tau$ which will closely fit the estimate $u_\tau$. A similar situation occurs for the case of weekly series, or any other series with a relatively small time interval. For monthly series, about 4 harmonics may give a good fit, although often in practice Fourier analysis is not used for such type of series.
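Eqs. (3.30) through (3.35) can be sketched as follows (the function name and the example data are ours; the data are illustrative monthly means, not from the text):

```python
import numpy as np

def fourier_fit(u, harmonics):
    """Fourier series fit of a periodic estimate u_tau, Eqs. (3.30)-(3.33).

    u: length-omega array of seasonal estimates (e.g. seasonal means).
    harmonics: iterable of harmonic indices j to retain.
    Returns the fitted u_hat_tau of Eq. (3.30) for tau = 1, ..., omega.
    """
    u = np.asarray(u, dtype=float)
    omega = len(u)
    tau = np.arange(1, omega + 1)
    ubar = u.mean()                                   # Eq. (3.31)
    u_hat = np.full(omega, ubar)
    for j in harmonics:
        A_j = (2.0 / omega) * np.sum(u * np.cos(2 * np.pi * j * tau / omega))  # Eq. (3.32)
        B_j = (2.0 / omega) * np.sum(u * np.sin(2 * np.pi * j * tau / omega))  # Eq. (3.33)
        if omega % 2 == 0 and j == omega // 2:        # last harmonic, Eqs. (3.34)-(3.35)
            A_j, B_j = A_j / 2.0, 0.0
        u_hat += (A_j * np.cos(2 * np.pi * j * tau / omega)
                  + B_j * np.sin(2 * np.pi * j * tau / omega))
    return u_hat

# Hypothetical monthly means (omega = 12, h = 6).  With all 6 harmonics the
# fit reproduces u_tau exactly; with h* = 2 harmonics it is a smoothed curve.
u = np.array([3.1, 4.0, 6.2, 9.5, 12.0, 13.4, 12.8, 10.9, 8.0, 5.7, 4.1, 3.3])
u_full = fourier_fit(u, harmonics=range(1, 7))
u_smooth = fourier_fit(u, harmonics=[1, 2])
```

The exact reproduction with all h harmonics is the statement following Eq. (3.35) above; the smoothed two-harmonic fit keeps the same mean $\bar u$ but suppresses the higher-frequency sampling variation.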
Similarly, for bimonthly, quarterly or similar series Fourier analysis is not applied. The above practical criteria can be used as a first guide for finding the number of harmonics and the corresponding Fourier coefficients which will enter into Eq. (3.30). However, such criteria should be supplemented by more precise analysis and tests. Some approximate tests based on theoretical and experimental results are given by Yevjevich (1972b).

A graphical and likely the most accurate test for selecting the harmonics in the Fourier series fit of a periodic estimate is the so-called "cumulative periodogram test." Consider the periodic estimate $u_\tau$ with mean $\bar u$ determined by Eq. (3.31). Such an estimate could be either the periodic mean $\bar y_\tau$, the standard deviation $s_\tau$ or the correlation coefficient $r_{k,\tau}$ as in Eqs. (2.7), (2.8) and (2.10), respectively, or any other similar periodic estimate. The mean squared deviation (MSD) of $u_\tau$ around $\bar u$ (equivalent to the definition of variance in statistical terms) may be determined by

$$MSD(u) \;=\; \frac{1}{\omega} \sum_{\tau=1}^{\omega} (u_\tau - \bar u)^2 \qquad (3.36)$$

On the other hand, consider the Fourier series estimate $\hat u_\tau$ of Eq. (3.30) with harmonics j = 1, 2, ..., h and corresponding Fourier coefficients $A_j$ and $B_j$. The mean squared deviation of $\hat u_\tau$ around $\bar u$ is composed of the MSD(j) of each harmonic j, which are determined by

$$MSD(j) \;=\; \frac{1}{2}\big(A_j^2 + B_j^2\big), \quad j = 1, \ldots, h \qquad (3.37)$$

with $A_j$ and $B_j$ obtained from Eqs. (3.32) through (3.35). It can be shown that the sum of all the values of MSD(j) is equal to MSD(u) of Eq. (3.36). Assume that the values of MSD(j) are ordered in decreasing order of magnitude, so that $MSD(h_1), \ldots, MSD(h_h)$ represents the ordered sequence, where $h_1$ is the harmonic with the highest MSD and $h_h$ is the harmonic with the lowest MSD. The cumulative periodogram $P_i$ is the ratio of the sum of the first i mean squared deviations $MSD(h_j)$, j = 1, ..., i, to the mean squared deviation MSD(u) of Eq. (3.36). That is,

$$P_i \;=\; \frac{\sum_{j=1}^{i} MSD(h_j)}{MSD(u)}, \quad i = 1, \ldots, h \qquad (3.38)$$

The plot of $P_i$ versus
i is called the cumulative periodogram. A graphical criterion using the cumulative periodogram for obtaining the significant harmonics is given below. The criterion is based on the concept that the variation of $P_i$ versus i is composed of two distinct parts: (1) a periodic part of fast increase of $P_i$ with i, and (2) a sampling part of slow increase of $P_i$ with i. Two approaches are feasible for determining those two parts. In the first, the two parts are approximated by smooth curves that intersect at a point, which corresponds to the critical harmonic h* that gives the number of significant harmonics. The second approach is to assume approximate mathematical models for these two parts, to estimate their parameters and to find the intersection of the two equations. The ordered harmonic nearest to the intersection point is then the critical harmonic. In the second approach, when $z_{\nu,\tau}$ of Eq. (3.25) is an independent series, the sampling part of the cumulative periodogram, as referred to above, is a straight line, whereas when $z_{\nu,\tau}$ is a linearly dependent series, the sampling part is a curve. Figures 3.1 and 3.2 show the intersection point A for a periodic series with either an independent or a dependent stochastic component, respectively. The value of $P_i$ at point A is determined by the sample size, while the value of i is much less affected by the sample size and sampling variation. Difficulties arise when the point A for a dependent stochastic component is in such a position that both curves (3) and (4) of Fig. 3.2 come out to be nearly one continuous curve, implying that the separation of the two parts of the cumulative periodogram becomes uncertain. Examples show that this case is less common in practice.
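Computing $P_i$ from Eqs. (3.36) through (3.38) can be sketched as follows (function name and test signal are ours; for even $\omega$ the Nyquist harmonic carries $A_h^2$ rather than $\tfrac{1}{2}(A_h^2 + B_h^2)$, so that the MSD(j) values sum exactly to MSD(u)):

```python
import numpy as np

def cumulative_periodogram(u):
    """Cumulative periodogram P_i of a periodic estimate u_tau, Eqs. (3.36)-(3.38).

    Returns (harmonics ordered by decreasing MSD(j), cumulative ratios P)."""
    u = np.asarray(u, dtype=float)
    omega = len(u)
    tau = np.arange(1, omega + 1)
    h = omega // 2 if omega % 2 == 0 else (omega - 1) // 2
    msd_u = np.mean((u - u.mean()) ** 2)              # Eq. (3.36)
    msd_j = []
    for j in range(1, h + 1):
        A_j = (2.0 / omega) * np.sum(u * np.cos(2 * np.pi * j * tau / omega))
        B_j = (2.0 / omega) * np.sum(u * np.sin(2 * np.pi * j * tau / omega))
        if omega % 2 == 0 and j == omega // 2:
            msd_j.append((A_j / 2.0) ** 2)            # last harmonic, Eqs. (3.34)-(3.35)
        else:
            msd_j.append((A_j ** 2 + B_j ** 2) / 2.0) # Eq. (3.37)
    order = np.argsort(msd_j)[::-1] + 1               # harmonics, decreasing MSD
    P = np.cumsum(np.sort(msd_j)[::-1]) / msd_u       # Eq. (3.38)
    return order, P

# A signal built from harmonics 2 and 5 only: MSD(2) = 2, MSD(5) = 0.5,
# so harmonic 2 is ranked first, P_1 = 2/2.5 = 0.8, and P reaches 1.0.
tau = np.arange(1, 13)
u = 5 + 2 * np.cos(2 * np.pi * 2 * tau / 12) + np.sin(2 * np.pi * 5 * tau / 12)
order, P = cumulative_periodogram(u)
```

With real data the ranked MSD(j) values decay gradually instead of dropping to zero, and the break between the "fast" and "slow" parts of $P_i$ is located graphically as described above.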
Figures 3.3 through 3.7 show the cumulative periodograms for five statistical characteristics: the mean $\bar y_\tau$, the standard deviation $s_\tau$, and the first, second and third serial correlation coefficients $r_{1,\tau}$, $r_{2,\tau}$ and $r_{3,\tau}$, for five discrete series: (1) 69 years of daily precipitation at Fort Collins, Colorado, from 1898 to 1966, Fig. 3.3; (2) 70 years of 3-day precipitation at Austin, Texas, from 1898 to 1967, Fig. 3.4; (3) 18 years of 7-day precipitation at Ames, Iowa, from 1949 to 1966, Fig. 3.5; (4) 40 years of daily discharge of the Tioga River near Erwins, New York, from 1921 to 1960, Fig. 3.6; and (5) 37 years of 3-day discharge of the McKenzie River at McKenzie Bridge, Oregon, from 1924 to 1960, Fig. 3.7. The harmonic i ranges from 1 to 182 for daily series, from 1 to 60 for three-day series, and from 1 to 26 for 7-day series.

Figure 3.1. Separation of the cumulative periodogram into the periodic part, for both the observed (1) and the fitted (3), and the sampling variation part, also for both the observed (2) and the fitted (4), in the case of a periodic series with an independent stochastic component.

Figure 3.2. Separation of the cumulative periodogram into the periodic part, observed (1) and fitted (3), and the sampling variation part, observed (2) and fitted (4), in the case of a periodic series with an autoregressive stochastic component.

Figure 3.3. Cumulative periodogram of five parameters of daily precipitation series, Fort Collins, Colorado.

Figure 3.4. Cumulative periodogram of five parameters of 3-day precipitation series, Austin, Texas.

Figure 3.5. Cumulative periodogram of five parameters of 7-day precipitation series, Ames, Iowa.

Figure 3.6. Cumulative periodogram of five parameters of daily flow series of the Tioga River.

Figure 3.7.
Cumulative periodogram of five parameters of 3-day flow series of the McKenzie River.

Because other precipitation and river gaging stations, for daily, 3-day and 7-day discrete series, show results that are similar to those of Figs. 3.3 through 3.7, the following conclusions are generally valid and can be applied in most cases:

(1) The estimated means $\bar y_\tau$ and the estimated standard deviations $s_\tau$ for the precipitation series are periodic, and their cumulative periodograms $P_i$ are composed of two parts: a steep rise over the first few harmonics, as a result of the periodic part, then a slow rise following approximately a straight line, due to the independent stochastic component. The curve $P_i$ vs. i for $\bar y_\tau$ is always above the corresponding curve for $s_\tau$; this difference comes from the larger sampling variation of the second moment $s_\tau^2$ than of the first moment $\bar y_\tau$.

(2) The cumulative periodograms for the serial correlation coefficients $r_{1,\tau}$, $r_{2,\tau}$ and $r_{3,\tau}$, computed after the periodicities in the mean ($\bar y_\tau$) and in the standard deviation ($s_\tau$) are removed from the original precipitation series, show approximately a straight-line relationship. This indicates that the residual series $z_{\nu,\tau}$ of Eq. (3.25) would be an independent series.

(3) The cumulative periodograms of the estimated means $\bar y_\tau$ and the estimated standard deviations $s_\tau$ of one-day, three-day and seven-day runoff series show a sharp rise from i = 1 to about i = 3-6, then a slow curvilinear rise. The first part of the cumulative periodogram indicates a significant periodicity, whereas the second part indicates that $z_{\nu,\tau}$ of Eq. (3.25) would be a dependent stochastic component.

(4) Rivers with runoff predominantly produced by rainfall show no periodicity in serial correlation coefficients, as shown by Fig. 3.6.

(5) Rivers greatly affected by snow accumulation and melt, or river regimes with combined runoff from rainfall and snowmelt, usually show periodicity in serial correlation coefficients, as shown by Fig. 3.7.
(6) As expected, the sample size affects the smoothness and reliability of the cumulative periodogram $P_i$ vs. i, as shown by comparing Fig. 3.5, with 18 years of data, with Figs. 3.3 and 3.4, with about 70 years of data.

The mean squared deviation MSD(j) of Eq. (3.37) is often interpreted as the part of the variance of $u_\tau$ which is contributed by the harmonic j. Hence, the cumulative periodogram $P_i$ would represent the explained variance yielded by the first i harmonics arranged in decreasing order of magnitude of the sequence MSD(j), j = 1, ..., h. Using this concept, a criterion often used in practice for determining the number of significant harmonics has been to set a fraction or percentage of explained variance, say 90% or 95%, and pick the number of harmonics which "explain" that specified amount of variance. However, due to the complexity shown by periodic hydrologic series as described in the above examples (see Figs. 3.3 through 3.7), a criterion based on a given percentage of explained variance would be rather arbitrary and we would prefer to avoid it, unless it is used only as a rough guide or first guess of the significant harmonics.

Summarizing, for selecting the number of significant harmonics to be used in the Fourier series estimate of periodic parameters, we advise the following steps: (1) Use the cumulative periodogram of the periodic estimate $u_\tau$ and select the number of harmonics h* by defining graphically the intersection point A as explained above. (2) Determine the fitted periodic estimate $\hat u_\tau$ based on the number of harmonics h* selected in (1) and compare it graphically with the estimate $u_\tau$. (3) (Optional) To further verify the Fourier series estimates $\hat\mu_\tau$ and $\hat\sigma_\tau$ of the periodic mean and standard deviation, respectively, compare the skewness and correlation coefficients of the residuals $z_{\nu,\tau}$ obtained by using (a) $\hat\mu_\tau$ and $\hat\sigma_\tau$, and (b) $\bar y_\tau$ and $s_\tau$, in Eq. (3.25). If such computed properties are statistically comparable, then the fitted $\hat\mu_\tau$ and $\hat\sigma_\tau$ would be appropriate.
Once the number of significant harmonics h* is determined, the Fourier series estimate $\hat u_\tau$ of Eq. (3.30) becomes

$$\hat u_\tau \;=\; \bar u + \sum_{j=1}^{h^*} \big[A_{h_j} \cos(2\pi h_j \tau/\omega) + B_{h_j} \sin(2\pi h_j \tau/\omega)\big] \qquad (3.39)$$

where $h_j$, j = 1, ..., h*, are the significant harmonics and $A_{h_j}$ and $B_{h_j}$ are the corresponding Fourier coefficients.

3.4 ESTIMATION OF PARAMETERS OF MULTIVARIATE MODELS

The estimation of parameters of multivariate models in general, presented in Chapter 7, and of disaggregation models in particular, presented in Chapter 8, requires the solution of the matrix equation $B B^T = D$. That is, given that the elements of the matrix D are known, it is necessary to find the elements of a matrix B such that the product of B times its transpose $B^T$ is equal to D. This section deals exclusively with the solution of this type of matrix equation.

Any solution for B which will produce $B B^T = D$ is a valid solution. In general, there exists an infinite number of solutions which will reproduce D. The solution of $B B^T = D$ may be obtained by principal component analysis, but B is not uniquely identified and it is a rather complex procedure. However, if B is assumed to be a lower triangular matrix (see Appendix A1.3), then a unique solution can be found by the square root method (Young and Pisano, 1968) when D is a positive definite matrix, or by a method proposed by Lane (1979) when D is at least a positive semidefinite matrix. Therefore, if B is a lower triangular matrix and D is a positive definite matrix, then the non-zero elements of B may be determined by (Graybill, 1969)

$$b_{i1} \;=\; \frac{d_{i1}}{\sqrt{d_{11}}}, \quad i = 1, \ldots, n \qquad (3.40)$$

$$b_{jj} \;=\; \Big(d_{jj} - \sum_{k=1}^{j-1} b_{jk}^2\Big)^{1/2}, \quad j = 2, \ldots, n \qquad (3.41)$$

and

$$b_{ij} \;=\; \frac{d_{ij} - \sum_{k=1}^{j-1} b_{ik}\, b_{jk}}{b_{jj}}, \quad j = 2, \ldots, n-1; \;\; i = j+1, \ldots, n \qquad (3.42)$$

where $b_{ij}$ are the elements of B, $d_{ij}$ are the elements of D, and n is the size of the matrices B and D.
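The column-by-column recursion of Eqs. (3.40) through (3.42) can be sketched as follows (function name and tolerance are ours; when a diagonal term vanishes, the whole column is set to zero, in the spirit of Lane's extension to positive semidefinite D discussed below):

```python
import numpy as np

def lower_triangular_solution(D):
    """Solve B @ B.T = D for lower triangular B, column by column.

    Square root method, Eqs. (3.40)-(3.42); a vanishing (or roundoff-
    negative) diagonal term zeroes out the entire column, which handles
    singular, positive semidefinite D as well."""
    D = np.asarray(D, dtype=float)
    n = D.shape[0]
    B = np.zeros((n, n))
    for j in range(n):
        s = D[j, j] - np.sum(B[j, :j] ** 2)
        if s <= 1e-12:          # singular column: leave b_ij = 0 for all i
            continue
        B[j, j] = np.sqrt(s)
        for i in range(j + 1, n):
            B[i, j] = (D[i, j] - np.sum(B[i, :j] * B[j, :j])) / B[j, j]
    return B

# A positive definite D: the result coincides with numpy's Cholesky factor.
D = np.array([[4.0, 2.0, 1.0],
              [2.0, 3.0, 0.5],
              [1.0, 0.5, 2.0]])
B = lower_triangular_solution(D)

# A singular positive semidefinite D, which plain Cholesky cannot factor.
Ds = np.array([[1.0, 1.0],
               [1.0, 1.0]])
Bs = lower_triangular_solution(Ds)
```

As the text recommends, one should always confirm `B @ B.T` reproduces D; both examples above do so.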
On the other hand, if B is a lower triangular matrix but D is either a positive definite or a positive semidefinite matrix, then the elements of B may be determined by (Lane, 1979)

b_{ij} computed as in Eqs. (3.40) through (3.42), when

d_{ii} - Σ_{k=1}^{i-1} b_{ik}² > 0 ,   (3.43)

b_{ki} = 0 for all k ≥ i , when   (3.44)

d_{ii} - Σ_{k=1}^{i-1} b_{ik}² ≤ 0 .   (3.45)

Equations (3.43), (3.44) and (3.45) are applied to calculate first the first matrix column, top to bottom, then the second column, the third, etc. It can easily be shown that B B^T must always be at least positive semidefinite. However, due to computer roundoff errors, it is common for a singular matrix D to appear not to be positive semidefinite. Equation (3.44) overcomes this predicament, in addition to handling the singular case. This solution will be referred to in later chapters in order to solve matrix products of the same form. One should always check that the solution for B, when postmultiplied by its own transpose, reproduces the original matrix D.

3.5 TESTS OF GOODNESS OF FIT

Various statistical tests are available for testing hypotheses in hydrologic time series modeling. These tests are either approximate, or they are exact provided the basic conditions in the derivation of these exact tests are satisfied. Mathematical statistics is full of various parametric and non-parametric tests. However, practice has shown that a small number of these procedures satisfies most needs in the analysis and modeling of hydrologic time series. Three types of tests are mainly found in current hydrologic practice: (1) Drawing probability limits around an assumed (hypothesized) population value or function and checking whether the sample estimates fall within or outside of these probability limits. An example of such a method is the test of independence of a hydrologic time series based on the correlogram, as described in Sec. 2.3. (2) Use of test parameters with known exact or approximate sampling distributions. One such test parameter is the test referred to in Sec.
2.2.4 for detecting changes in the mean. Another test parameter is the chi-square statistic for testing the goodness of fit of a given distribution. (3) Use of the Smirnov-Kolmogorov statistic, namely the maximum absolute difference between the cumulative frequency curve of the sample data and the fitted distribution function. This test statistic is approximate, and cannot compete in reliability with the other two types of tests. In each of the above three types of tests, a probability level is selected for determining the critical value of the test parameter. If the sample estimate of the test parameter exceeds the critical value (on either of the two sides, or on only one side of its distribution), the assumed hypothesis is rejected.

The modeling of a hydrologic time series usually assumes that the stochastic component, after removing periodic components and the time dependence structure, is an independent and normally distributed series. Similarly, when modeling multivariate series, the assumptions are that the stochastic component is time and space independent as well as normal. Tests for these assumptions are described below. In addition, the comparison of the historical and model correlograms and of the historical and generated statistics may be used for further testing the fitted time series model. Such comparisons are explained in Chapters 4 and 7.

3.5.1 TEST OF INDEPENDENCE

Test of Independence in Time

The Anderson test of the correlogram and the Porte Manteau lack of fit test are usually applied for testing the independence of a time series. The Anderson test was given in Sec. 2.3 when describing the characteristics of annual time series, so it will not be repeated in this section. The Porte Manteau lack of fit test was utilized by Box and Pierce (1970) as an approximate test of model adequacy. Hipel and McLeod (1977), Delleur (1978), as well as others, applied it for verifying linear models of hydrologic time series.
The cumulative periodogram can also be used for testing the independence of a series, especially when the series is derived from one which originally had periodic components. Both the Porte Manteau lack of fit test and the cumulative periodogram test are described below.

The Porte Manteau Lack of Fit Test. Consider that a time series x_t of size N is represented by an ARIMA(p,d,q) model (see Sec. 6.1), where p is the number of autoregressive terms, d is the number of differences and q is the number of moving average terms. Assume that after d differences the ARMA(p,q) series z_t, t = 1, ..., N-d, is obtained, and assume further that in such models ε_t is the residual series. We would like to apply the Porte Manteau lack of fit test to check whether ε_t is an independent series and, hence, whether the models are adequate. This test uses the statistic

Q = (N-d) Σ_{k=1}^{L} r_k²(ε)   (3.46a)

where r_k(ε) is the correlogram of the residuals ε_t and L is the maximum lag considered. The statistic Q is approximately chi-square distributed with L-p-q degrees of freedom. The adequacy of the ARIMA model for x_t, or of the ARMA model for z_t, may be checked by comparing the statistic Q with the chi-square value χ²(L-p-q) of a given significance level. If Q is smaller than that value, the hypothesis that ε_t is an independent series is accepted; otherwise it is rejected.

3.5.2 TESTS OF NORMALITY

Skewness Test of Normality. For a normal variable, the skewness coefficient γ̂ computed from the sample by Eq. (3.52) is approximately normally distributed with mean zero and variance 6/N for large N. The hypothesis of normality is therefore accepted at significance level α if γ̂ falls within the limits

|γ̂| ≤ u_{1-α/2} (6/N)^{1/2}   (3.53)

where u_{1-α/2} is the 1-α/2 quantile of the standard normal distribution. Therefore, if γ̂ of Eq. (3.52) falls within the limits of expression (3.53), the hypothesis of normality is accepted. Otherwise it is rejected. Actually, the above test is sufficiently accurate for N > 150. For smaller sample sizes, Snedecor and Cochran suggest instead comparing the computed coefficient of skewness γ̂ of Eq. (3.52) with a tabulated value γ_α(N), which depends on the selected probability level α and on the sample size N. Table 3.2 gives the values of γ_α(N) for α = 0.02 and 0.10 and for various values of N.
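A minimal sketch of the two tests above; the synthetic white-noise series, the seed and the 5% level (u_{0.975} = 1.96) are illustrative assumptions:

```python
import math
import numpy as np

def porte_manteau_q(eps, L):
    """Statistic of Eq. (3.46a): Q = N * sum_{k=1..L} r_k(eps)^2, where N
    here plays the role of N - d; compare with chi-square(L - p - q)."""
    e = np.asarray(eps, dtype=float)
    e = e - e.mean()
    n = len(e)
    c0 = np.sum(e * e) / n
    r = np.array([np.sum(e[:n - k] * e[k:]) / (n * c0)
                  for k in range(1, L + 1)])    # residual correlogram r_k
    return n * np.sum(r ** 2)

def skewness_limit(N, u=1.96):
    """Limit of expression (3.53): u_{1-alpha/2} * sqrt(6/N)."""
    return u * math.sqrt(6.0 / N)

rng = np.random.default_rng(7)
eps = rng.standard_normal(600)        # synthetic white-noise "residuals"
Q = porte_manteau_q(eps, L=20)        # with p = q = 0, 20 degrees of freedom
lim = skewness_limit(len(eps))
```

For this white-noise series, Q should typically fall below the 95% chi-square quantile with 20 degrees of freedom (about 31.4), and a sample skewness within plus or minus `lim` supports the hypothesis of normality.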
Thus, if |γ̂| < γ_α(N), the hypothesis of normality is accepted.

Table 3.2. Table of Skewness Test for Normality for Sample Size Less than 150 (after Snedecor and Cochran, 1967, p. 552)

              α                              α
   N     0.02    0.10        N        0.02    0.10
   25    1.061   0.711       70       0.673   0.459
   30    0.986   0.662       80       0.631   0.432
   35    0.923   0.621       90       0.596   0.409
   40    0.870   0.587       100      0.567   0.389
   45    0.825   0.558       125      0.508   0.350
   50    0.787   0.534       150      0.464   0.321
   60    0.723   0.492       175      0.430   0.298

3.6 PRESERVATION OF STATISTICS AND PARSIMONY OF PARAMETERS

A current practice in generating new samples of hydrologic time series is to preserve, exactly or very closely, the statistics determined from historic samples, even when these statistics may be subject to large sampling variations. This often leads to models with an excessive number of parameters. In an extreme case the number of estimates of the model may equal the sample size. In that case all the properties of the historic sample are exactly (identically) preserved. This of course represents a full negation of the principle of modeling of time series and makes the use of the data generation technique meaningless.

The problem of the degree of preservation of historical statistics of sample time series is one of the most unsettled subjects in statistical hydrology. An approach based on sampling theory and inferential statistics is to preserve those estimates of the historic sample which have the smallest sampling variances (such as means and standard deviations), and to preserve less those estimates that have the largest sampling variances (such as the extreme values). For instance, if the largest drought of a series of 60 years of annual precipitation or annual runoff has a return period of 200 years, one cannot expect that every generated sample of 60 years will have the same value as the historical largest drought.
Even the average largest drought of a set of m generated new samples, each 60 years in length, would not be identical to the largest drought of the historical sample. As a general rule, the modeler would like to preserve those historical statistics which are really necessary for representing the natural variability of the hydrologic series in question, and which are important for the problem for which the modeling is being made. Usually the mean, standard deviation and serial correlation structure are the main statistics to preserve. However, we should not forget the errors involved in estimating such statistics, especially the correlations, and whenever possible we should try to give some physical and statistical interpretation to the estimates obtained. The skewness is important in some practical cases, but preserving the sample estimate exactly may not be so important because of the great uncertainty of such an estimate for the typical sizes of historical records. Both the correlation structure and the skewness are important for extreme statistics such as drought properties and those related to storage capacities such as ranges. However, such extreme properties observed in sample time series are very uncertain and, as said above, preserving them exactly may not be appropriate. Therefore, a sound and careful interpretation of the historical statistics, as well as the intended application, will ultimately assist in deciding which historical statistics should be preserved by the stochastic model of the hydrologic time series.

Once the decision is made on which statistics to preserve, the modeling problem is to find a model with the minimum number of parameters which will adequately reproduce such statistics. This is known as the principle of parsimony, and such a model is said to be parsimonious.
Parsimony of Parameters

Considering that N is the size of a sample time series and K is the number of parameters of the fitted model, the ratio

δ = N/K   (3.54)

is a general index of parsimony in parameters. A low value of, say, δ = 4 to 6 will indicate that the basic principle of parsimony in the number of parameters is not respected. Roughly, δ should be at least 15. For instance, for 30 years of monthly time series, or N = 360, the highest number of parameters should be K = N/δ = 360/15 = 24. However, it is common to find stochastic models with at least 36 parameters, to account for 12 means, 12 standard deviations and 12 monthly correlations. If 36 parameters are used, then δ = 10. It is easy to realize that for series with smaller samples, the principle of parsimony is even more drastically violated.

For demonstrating the principle of parsimony in parameters, let us assume that for normalizing the periodic series x_{ν,τ}, s periodic parameters are needed, each with m_1 harmonics, requiring the estimation of s(2m_1 + 1) parameters. Then, for the normal series y_{ν,τ}, let μ_τ and σ_τ be estimated by Eq. (3.30), each with m_2 harmonics, or a total of 2(2m_2 + 1) parameters. The periodic series y_{ν,τ} would be approximately standardized by obtaining z_{ν,τ} of Eq. (3.25). Let us also assume that an ARMA (autoregressive-moving average) model is used for z_{ν,τ} of Eq. (3.25) in the form

z_{ν,τ} = Σ_{j=1}^{p} φ_{j,τ} z_{ν,τ-j} + ε_{ν,τ} - Σ_{j=1}^{q} θ_{j,τ} ε_{ν,τ-j}

where z_{ν,τ} is the normal dependent variable and ε_{ν,τ} is the normal independent variable (white noise) with standard deviation σ_ε. Furthermore, φ_{j,τ} and θ_{j,τ} are the autoregressive and moving average coefficients, and p and q are the numbers of such coefficients. That is a total of p+q parameters, each assumed to have m_3 harmonics if periodic. Then the total number of parameters becomes

K = s(2m_1 + 1) + 2(2m_2 + 1) + (p+q)(2m_3 + 1) .   (3.55)

If m_1 = m_2 = m_3 = m, a simplification of Eq.
(3.55) becomes

K = (s+p+q+2)(2m + 1) .   (3.56)

By keeping the number of periodic parameters s+p+q+2 and the number of harmonics m of the periodic parameters in Eq. (3.56) as small as feasible, both the modeling and the estimation of parameters and coefficients become greatly simplified, and the principle of parsimony in parameters is respected.

The number of harmonics (m) used in present practice for the periodic parameters usually is different for each parameter. We could use the same number of significant harmonics for all the periodic parameters, provided the given harmonics are not highly out of phase among these parameters. In that case, the significant harmonics are estimated for the parameter which has the lowest sampling variance, say the mean of the series, because the larger the sampling variance of the estimates of a parameter, the more difficult it is to infer the significance of harmonics with relatively small amplitudes. This approach is feasible provided the major harmonics (such as the 365-day, 182-day, 91-day, ..., for daily series, or the 52-week, 26-week, etc., for weekly series) are the dominating harmonics in the periodic functions and have the same or very close phases. In this case, the harmonics (frequencies) of the mean are used for the other parameters having similar patterns of harmonics. The phases of significant harmonics in a_τ and c_τ of Eq. (3.23) may not be the same as those of the parameters of other equations.

In the above example of 30 years of monthly series, or N = 360, let us assume s = 0, p = 1, q = 0, and m = 3. Then Eq. (3.56) gives K = 21 and Eq. (3.54) gives δ = 360/21 ≈ 17. Similarly, by assuming s = 1 with a constant parameter (m_1 = 0), μ_τ and σ_τ periodic with m_2 = 3 harmonics each, and p = 1 with a constant coefficient (m_3 = 0), Eq. (3.55) gives K = 17 and Eq.
(3.54) gives δ = 360/17 ≈ 21, which satisfies the principle of parsimony well. It is not difficult to realize that for a typical length of record of monthly time series it would generally be difficult to satisfy the principle of parsimony unless the periodic parameters, such as the mean and standard deviation, are estimated by a fitting technique such as Fourier series analysis.

Summarizing, in ongoing mathematical modeling of hydrologic time series that accounts for periodicities in the parameters of complex precipitation, runoff, water quality and similar processes, a compromise is needed between the work involved in deriving models in complex forms and the accuracy of modeling. The work must take into account the series properties with some level of accuracy. The complex analysis also requires significant computer time in identifying models, in estimating their parameters and in generating new samples that preserve the inferred characteristics. To decrease this work, models must be simplified. However, this simplification represents a further deviation from the sample series properties. A tendency in practice is towards simplification. However, the approximate models, with an easy generation of new samples, have led to persistent complaints that new samples do not preserve historical properties. On the other hand, a tendency in academia and research has been to derive complex models with so many parameters that they often reproduce the pure sampling variability of historical series rather than actual properties. Therefore, a realistic balance should be maintained in the historical statistics to be preserved or reproduced, in the complexity of the model to be fitted, and in the number of parameters of such a model, so that it adequately represents the characteristics of the historical time series. Further discussion of this subject is given in other chapters of the text, especially in Chapters 8 and 9.
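The parameter bookkeeping of Eqs. (3.54) through (3.56) is easy to script; the function names below are illustrative:

```python
def n_parameters(s, p, q, m):
    # Eq. (3.56): K = (s + p + q + 2)(2m + 1), valid when every periodic
    # parameter uses the same number of harmonics m
    return (s + p + q + 2) * (2 * m + 1)

def parsimony_index(N, K):
    # Eq. (3.54): delta = N / K; roughly, delta should be at least 15
    return N / K

# Text example: 30 years of monthly data, s = 0, p = 1, q = 0, m = 3
N = 30 * 12
K = n_parameters(0, 1, 0, 3)
print(K, round(parsimony_index(N, K), 1))   # 21 17.1
```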
Akaike Information Criterion

A mathematical formulation which considers the principle of parsimony in model building is the Akaike Information Criterion (AIC) proposed by Akaike (1974). For comparing competing ARMA(p,q) models he used

AIC(p,q) = N ln(σ̂_ε²) + 2(p+q)   (3.57)

where N is the sample size and σ̂_ε² is the maximum likelihood estimate of the residual variance. Akaike suggested this criterion to select the correct model among competing ARMA models. Under this criterion, the model which gives the minimum AIC is the one to be selected.

3.7 GENERATION AND FORECASTING

The two major uses and applications of modeling hydrologic time series are the generation of synthetic samples (as potential future inputs to water resource systems) and the forecasting of hydrologic events. These two main applications are specifically described below.

3.7.1 GENERATION OF SYNTHETIC SAMPLES

The modeling of hydrologic time series always reduces it to a noise or independent stochastic component, such as ε in Eq. (3.26). This approach is often called "the whitening process," because it always leads to what is known as white noise. It is ideal for ε to be white noise, i.e., an independent, stationary normal variable. In such a case, the generation of a synthetic time series starts with the generation of independent normal variables with mean zero and variance one, and then adds the time and spatial dependence structure as well as the periodic components, whichever are necessary. Various generating techniques for normal independent random numbers are presented in many standard books which treat Monte Carlo (sample generation) techniques, and they will not be reviewed here. Similarly, various computer programs are readily available for generating these numbers.
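A minimal sketch of such a generator, using the Box and Muller (1958) transform that is presented formally as Eq. (3.58) below; the seed and the sample size of 600 (matching Table 3.3) are illustrative choices:

```python
import math
import random

def box_muller(u1, u2):
    # Transforms two uniform(0,1) numbers into two independent standard
    # normal numbers; u1 must be strictly positive for log(u1) to exist
    r = math.sqrt(-2.0 * math.log(u1))
    return r * math.cos(2.0 * math.pi * u2), r * math.sin(2.0 * math.pi * u2)

random.seed(0)
sample = []
for _ in range(300):
    # 1 - random() lies in (0, 1], which keeps the logarithm well defined
    sample.extend(box_muller(1.0 - random.random(), random.random()))

mean = sum(sample) / len(sample)                          # near 0
var = sum((x - mean) ** 2 for x in sample) / len(sample)  # near 1
print(len(sample))   # 600
```

As the text below recommends, generated numbers of this kind should still be checked for normality and independence before use.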
As an illustration of generating normal random numbers, Box and Muller (1958) proposed the equations

ξ_1 = [-2 ln(u_1)]^{1/2} cos(2π u_2)   (3.58)
ξ_2 = [-2 ln(u_1)]^{1/2} sin(2π u_2)

where ξ_1 and ξ_2 are standard normal random numbers and u_1 and u_2 are random numbers from the uniform (0,1) distribution. Currently, digital computers and even small programmable calculators can generate, or have already available, uniform (0,1) random numbers. Table 3.3 below gives 600 standard normal random numbers generated with Eqs. (3.58). The mean, standard deviation, skewness coefficient and first serial correlation coefficient of the generated sample are 0.005, 1.008, 0.04 and 0.05, respectively. In generating normal random numbers, or in using already available numbers, the normality and independence tests for the generated random numbers are important. Often these numbers are called pseudo-normal or pseudo-independent random numbers, because biases that may occur in the generation procedure may cause difficulties in passing the above two tests. The normality and independence checks should be made before using any normal random number generation scheme that has not been previously verified.

For a single time series, once the independent normal random numbers are generated, the time series can be readily generated by adding the other components of the model with their corresponding estimated parameters. For instance, for

Table 3.3. Six Hundred Standard Normal Random Numbers. [The tabulated values are not reproduced here.]

are called the autoregression coefficients. The parameter set of the model of Eq. (4.1) is {μ, σ², φ_1, ..., φ_p, σ_ε}, and it must be specified or estimated from data. Strictly speaking, either σ² or σ_ε² should be considered within the parameter set, because they are related to each other.
However, both variances are included for practical convenience.

Various forms of AR models have been used in the field of stochastic processes in general and in stochastic hydrology in particular. Table 4.1 shows various forms commonly found in the literature. They are actually the same model and represent the same autoregressive process, but are written in slightly different forms. The use of one form or another depends mainly on personal preference and convenience. We will use either form 2 or 3 of Table 4.1, depending on the particular case.

Table 4.1. Various Forms of AR(p) Models Commonly Used in the Literature. [The table lists four equivalent forms of the AR(p) model with their parameter sets, after Fiering and Jackson (1971) and Beard (1967); Yevjevich (1972); and Box and Jenkins (1970); the tabulated expressions are not reproduced here.]

The AR(1) model results from considering only one autoregressive term in Eq. (4.1). That is, for p = 1, Eq. (4.1) becomes

y_t = μ + φ_1 (y_{t-1} - μ) + ε_t .   (4.2)

This model is called the first-order autoregressive or first-order Markov model, since the variable y at time t can be expressed as a function only of y at time t-1 plus a random part. It is also called the lag-one autoregressive or lag-one Markov model. The parameters of the model are μ, σ², φ_1 and σ_ε. Similarly, the AR(2) model is obtained by setting p = 2 in Eq. (4.1).

AR Models with Periodic Parameters

Autoregressive models with periodic parameters are those models in which part or all of the parameters vary within the year, i.e., they are periodic. These models are often referred to as periodic AR models. The periodicity may be in the mean, in the variance and (or) in the autoregressive coefficients.
For instance, an AR(p) model with periodic mean and periodic variance but with constant autoregressive coefficients may be represented by

y_{ν,τ} = μ_τ + σ_τ z_{ν,τ}   (4.3)

and

z_{ν,τ} = Σ_{j=1}^{p} φ_j z_{ν,τ-j} + ε_{ν,τ}   (4.4)

where y_{ν,τ} is the time dependent variable for year ν and time τ (say months, weeks, etc.), with τ = 1, ..., ω, and ω is the total number of time intervals within the year. The variable y_{ν,τ} is assumed normally distributed with mean μ_τ and variance σ_τ², and has an autoregressive correlation structure of order p with constant coefficients, as indicated by Eq. (4.4). The dependent variable z_{ν,τ} and the independent variable ε_{ν,τ} are also normal, with mean zero and variances one and σ_ε², respectively. The parameter set of the model of Eqs. (4.3) and (4.4) is {μ_τ, σ_τ², φ_1, ..., φ_p, σ_ε; τ = 1, ..., ω}.

Instead of Eq. (4.4), z_{ν,τ} may be represented by an AR model with periodic coefficients as

z_{ν,τ} = Σ_{j=1}^{p} φ_{j,τ} z_{ν,τ-j} + σ_{ε,τ} ξ_{ν,τ}   (4.5)

where φ_{j,τ} is the jth periodic autoregressive coefficient at time τ, σ_{ε,τ} is also a periodic coefficient and ξ_{ν,τ} is the independent standardized normal variable. Therefore, the parameter set of the model of Eq. (4.3), with z_{ν,τ} given by Eq. (4.5), is {μ_τ, σ_τ², φ_{1,τ}, ..., φ_{p,τ}, σ_{ε,τ}; τ = 1, ..., ω}.

4.1.2 PROPERTIES OF AR MODELS

The main properties of autoregressive models discussed in this section are related to the expected value, variance, autocorrelation, partial autocorrelation and the conditions to be met by the model parameters. These properties are important since they constitute the basis for the overall modeling procedure given in subsequent sections. The previous Sec. 4.1.1 described AR models with constant or with periodic autoregressive coefficients.
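To make the structure of the periodic model of Eqs. (4.3) and (4.5) concrete, here is a minimal simulation sketch for p = 1 and ω = 12; the periodic parameter values are invented purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
w, years = 12, 50                      # omega = 12 months, 50 years
t = np.arange(w)
mu = 100.0 + 40.0 * np.cos(2 * np.pi * t / w)    # periodic mean (assumed)
sigma = 10.0 + 4.0 * np.sin(2 * np.pi * t / w)   # periodic std. dev. (assumed)
phi = np.full(w, 0.4)                  # periodic coefficients phi_{1,tau}
# sigma_eps via Eq. (4.33) of Sec. 4.1.2; with a constant coefficient the
# lag-one correlation of the standardized series equals phi, so rho = 0.4
sigma_eps = np.sqrt(1.0 - phi ** 2)

y = np.zeros((years, w))
z_prev = 0.0
for nu in range(years):
    for tau in range(w):
        # Eq. (4.5) with p = 1: dependence carries across the year boundary
        z = phi[tau] * z_prev + sigma_eps[tau] * rng.standard_normal()
        y[nu, tau] = mu[tau] + sigma[tau] * z    # Eq. (4.3)
        z_prev = z

print(y.shape)   # (50, 12)
```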
The major emphasis in this section will be on those properties related to AR models with constant parameters, since these models are widely known and have been much studied in the literature. On the other hand, although periodic AR models are also widely applied in hydrology, some of their properties are not very well known and usually are more complex. Therefore, only the commonly known properties of periodic AR models are discussed herein.

Properties of AR Models with Constant Parameters

Expected Value and Variance. The expected values and variances of the dependent and independent variables of the AR model of Eq. (4.1) were defined in Sec. 4.1.1 when the model was first introduced. Summarizing the previous definitions, we have

E(y_t) = μ ,   (4.6)

E(ε_t) = 0 ,   (4.7)

Var(y_t) = σ² ,   (4.8)

Var(ε_t) = σ_ε² .   (4.9)

We will see subsequently, when discussing the actual modeling of hydrologic series, that the parameters μ, σ² and σ_ε² will be estimated from historical data. The variances σ² and σ_ε² are related by

σ_ε² = σ² (1 - Σ_{j=1}^{p} φ_j ρ_j)   (4.10)

where φ_j is the jth autoregression coefficient and ρ_j is the lag-j autocorrelation coefficient of the variable y_t. For the AR(1) model, Eq. (4.10) may be simplified to

σ_ε² = σ² (1 - φ_1²) .   (4.11)

Similarly, for the AR(2) model, Eq. (4.10) becomes

σ_ε² = σ² (1 + φ_2) [(1 - φ_2)² - φ_1²] / (1 - φ_2) .   (4.12)

Autocorrelation Function. The autocorrelation function ρ_k of the variable y_t of Eq. (4.1) is obtained by multiplying both sides of Eq. (4.1) by y_{t-k} and taking expectations term by term. It satisfies the difference equation (known as the Yule-Walker equation)

ρ_k = Σ_{j=1}^{p} φ_j ρ_{k-j} ,  k > 0 ,   (4.13)

which is due to Yule (1927) and Walker (1931). This equation is commonly used for estimating the parameters of the AR(p) model by the method of moments (see Sec. 4.2.2 below), as well as for determining the correlogram ρ_k for a given set of parameters φ_j, j = 1, ..., p.
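A compact sketch of determining ρ_k from a given parameter set via Eq. (4.13); this is an illustrative implementation, not the algorithm of Appendix A4.1 verbatim:

```python
import numpy as np

def ar_correlogram(phi, kmax):
    """Correlogram rho_0..rho_kmax of a stationary AR(p) model from the
    Yule-Walker equation (4.13): rho_1..rho_{p-1} are found by solving
    Eq. (4.13) for k = 1, ..., p-1 as a linear system (with rho_0 = 1),
    and the remaining values follow recursively."""
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    rho = np.zeros(kmax + 1)
    rho[0] = 1.0
    if p > 1:
        M = np.zeros((p - 1, p - 1))
        b = np.zeros(p - 1)
        for k in range(1, p):            # Eq. (4.13) for k = 1, ..., p-1
            M[k - 1, k - 1] += 1.0
            for j in range(1, p + 1):
                m = abs(k - j)           # lag |k - j|, since rho_{-m} = rho_m
                if m == 0:
                    b[k - 1] += phi[j - 1]
                else:
                    M[k - 1, m - 1] -= phi[j - 1]
        rho[1:p] = np.linalg.solve(M, b)
    for k in range(p, kmax + 1):         # recursion, Eq. (4.13) for k >= p
        rho[k] = sum(phi[j - 1] * rho[k - j] for j in range(1, p + 1))
    return rho

rho = ar_correlogram([0.5, 0.2], kmax=5)
print(round(rho[1], 3))   # 0.625 = phi_1/(1 - phi_2), cf. Eq. (4.15) below
```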
It is important to know the shape of ρ_k for a given AR(p) model because it serves for identifying the order of the model for a given time series, as well as for comparing the sample correlogram with the model correlogram. The relevance of using the function ρ_k should become clearer in the following sections.

For the AR(1) model, or p = 1, Eq. (4.13) gives the explicit autocorrelation function

ρ_k = φ_1^k ,  k ≥ 0 .   (4.14)

For φ_1 positive, ρ_k of Eq. (4.14) decays exponentially to zero, while for φ_1 negative it oscillates in sign. Figure 4.1 shows two examples of time series generated using the AR(1) model of Eq. (4.2) and their corresponding autocorrelation functions ρ_k of Eq. (4.14). The first series (a) was generated with μ = 0, σ_ε² = 1 and φ_1 = 0.60. Notice that in the first series, since φ_1 = 0.60 (positive time dependence), low values tend to follow low values, and high values tend to follow high values. On the other hand, in the second series (b), since φ_1 = -0.60 (negative time dependence), low values usually are followed by high values and vice versa. Observe also the different shapes of the correlograms in each case.

For the AR(2) model, or p = 2, Eq. (4.13) gives for k = 1

ρ_1 = φ_1 / (1 - φ_2) .   (4.15)

Also, Eq. (4.13) for p = 2 becomes

ρ_k = φ_1 ρ_{k-1} + φ_2 ρ_{k-2} ,  k > 1 ,   (4.16)

which must be solved recursively for k > 1, beginning with the initial values ρ_0 = 1 and ρ_1 given by Eq. (4.15).

Figure 4.1. The time series y_t of Eq. (4.2), autocorrelation functions ρ_k of Eq. (4.14), and partial autocorrelation functions φ_k(k) of Eq. (4.18) for AR(1) models with parameters (a) μ = 0, σ_ε² = 1 and φ_1 = 0.60, and (b) μ = 0, σ_ε² = 1 and φ_1 = -0.60.

The autocorrelation function ρ_k of the AR(2) model may have different forms, depending on the values that the parameters φ_1 and φ_2 may take. For instance, if φ_1² + 4φ_2 > 0, ρ_k of Eq. (4.16) decays exponentially to zero when φ_1 > 0, but it oscillates in sign when φ_1 < 0.
On the other hand, if φ_1² + 4φ_2 < 0, the autocorrelation function ρ_k is pseudo-periodic, or a damped wave. Figure 4.2 shows four examples of autocorrelation functions of the AR(2) model, depicting the various shapes of ρ_k described above. In general, for determining the correlogram ρ_k of an AR model of order p with either a known or an estimated set of parameters {φ_1, ..., φ_p}, Eq. (4.13) must first be solved simultaneously to obtain ρ_1, ρ_2, ..., ρ_{p-1}. Then, for k ≥ p, Eq. (4.13) is used recursively to find ρ_k. A general computational algorithm for determining ρ_k is given in Appendix A4.1 at the end of this chapter.

Partial Autocorrelation Function. The partial autocorrelation function, or partial correlogram, is another way of representing the time dependence structure of a series or of a given model. It is useful for helping to identify the type and order of the model when investigating a given sample time series. In this section we will only concentrate on how to determine the partial correlogram for low-order AR models. A more elaborate explanation and computational details are given in Appendix A4.2.

Let us denote by φ_j(k) the jth autoregressive coefficient of an AR(k) model, so that φ_k(k) is the last coefficient. Then the difference equation (4.13) gives

ρ_j = φ_1(k) ρ_{j-1} + φ_2(k) ρ_{j-2} + ... + φ_k(k) ρ_{j-k} ,  j = 1, ..., k .   (4.17)

The partial autocorrelation function is φ_k(k), and it is determined by successively solving Eq. (4.17) simultaneously for each k = 1, 2, ... . For example, for the AR(1) model, Eq. (4.17) gives

φ_1(1) = ρ_1  and  φ_k(k) = 0 ,  k > 1 .   (4.18)

Figure 4.1 shows the partial correlogram φ_k(k) for the AR(1) model with parameters (a) φ_1 = 0.6 and (b) φ_1 = -0.6. The corresponding time series and correlograms are also shown. Similarly, for an AR(2) model we have

φ_1(1) = ρ_1 ,  φ_2(2) = (ρ_2 - ρ_1²) / (1 - ρ_1²) ,  and  φ_k(k) = 0 ,  k > 2 .   (4.19)

Figure 4.2 shows the partial correlogram φ_k(k) for the AR(2) model for four sets of parameters. The corresponding correlograms are also shown.

Figure 4.2.
Autocorrelation and partial autocorrelation functions of AR(2) models with parameters (a) φ_1 = 0.5 and φ_2 = 0.2, (b) φ_1 = -0.5 and φ_2 = 0.2, (c) φ_1 = -0.5 and φ_2 = -0.3, and (d) φ_1 = 0.5 and φ_2 = -0.3.

In general, it may be shown that for an AR(p) model Eq. (4.17) gives

φ_k(k) ≠ 0 ,  k ≤ p ,
φ_k(k) = 0 ,  k > p .   (4.20)

The general computational scheme for determining the partial correlogram φ_k(k) for AR models is shown in Appendix A4.2.

Conditions to be Met by the Parameters - Stationarity Conditions. In order for the AR(p) model with constant parameters of Eq. (4.1) or (4.4) to be stationary, the set of parameters φ_1, ..., φ_p must satisfy the so-called stationarity conditions. These conditions are satisfied if the roots of the characteristic equation (Yevjevich, 1972)

u^p - φ_1 u^{p-1} - φ_2 u^{p-2} - ... - φ_p = 0   (4.21)

lie inside the unit circle. That is, we must have |u_i| < 1, i = 1, ..., p, where u_i are the roots of Eq. (4.21). For instance, for the AR(2) model with φ_1 = 0.5 and φ_2 = 0.2, Eq. (4.21) becomes u² - 0.5u - 0.2 = 0. The roots of this equation are u_1 = 0.762 and u_2 = -0.262; since |u_1| = 0.762 < 1 and |u_2| = 0.262 < 1, they fall inside the unit circle. For the AR(1) model, Eq. (4.21) becomes u - φ_1 = 0, which implies u = φ_1. That is, |φ_1| < 1, or equivalently -1 < φ_1 < 1, is the stationarity condition for the AR(1) model. Analytical solutions of Eq. (4.21) for p > 2 are not available. Therefore, in such cases Eq. (4.21) must be solved numerically. Several numerical methods and computer programs are available for solving polynomial equations such as Eq. (4.21); see for instance Carnahan et al. (1969, Chapter 3).

Properties of AR Models with Periodic Parameters

AR models with periodic parameters can be represented by Eq. (4.3) together with either Eq. (4.4) or Eq. (4.5). Since the properties of the model of Eq. (4.4) were already discussed, only the pair of Eqs. (4.3) and (4.5) is considered herein. They are

y_{ν,τ} = μ_τ + σ_τ z_{ν,τ}   (4.26)

and

z_{ν,τ} = Σ_{j=1}^{p} φ_{j,τ} z_{ν,τ-j} + σ_{ε,τ} ξ_{ν,τ} .   (4.27)

The model of Eq.
(4.26) is obviously nonstationary, because both the mean and the standard deviation are periodic. Similarly, Eq. (4.27) represents a nonstationary AR model in which the autoregression coefficients φ_{j,τ} and the standard deviation σ_{ε,τ} are also periodic. This fact makes the model more complicated, and it has not been studied as thoroughly as autoregressive models with constant parameters. Therefore, only a brief discussion dealing with the most commonly used properties is given in this section.

Expected Value and Variance. The expected values and variances of the periodic AR(p) models of Eqs. (4.26) and (4.27) are

E(y_{ν,τ}) = μ_τ ,  τ = 1, ..., ω ,   (4.28)

E(z_{ν,τ}) = E(ξ_{ν,τ}) = 0 ,  τ = 1, ..., ω ,   (4.29)

Var(y_{ν,τ}) = σ_τ² ,  τ = 1, ..., ω ,   (4.30)

and

Var(z_{ν,τ}) = Var(ξ_{ν,τ}) = 1 ,  τ = 1, ..., ω .   (4.31)

The periodic variance σ_{ε,τ}² of Eq. (4.27) can be written as a function of the periodic autoregression coefficients φ_{j,τ} and the periodic autocorrelation coefficients ρ_{j,τ} as (Salas, 1972, p. 11)

σ_{ε,τ}² = 1 - Σ_{j=1}^{p} φ_{j,τ} ρ_{j,τ} .   (4.32)

For the AR(1) model, Eq. (4.32) simplifies to

σ_{ε,τ}² = 1 - φ_{1,τ} ρ_{1,τ} .   (4.33)

Similarly, for the AR(2) model, Eq. (4.32) becomes

σ_{ε,τ}² = 1 - φ_{1,τ} ρ_{1,τ} - φ_{2,τ} ρ_{2,τ} .   (4.34)

Autocorrelation Function. The autocorrelation function of the model of z_{ν,τ} of Eq. (4.27) satisfies the difference equation (Salas, 1972, p. 11)

ρ_k(τ) = Σ_{j=1}^{p} φ_{j,τ} ρ_{|k-j|}(τ - m) ,  k > 0 ,   (4.35)

where m = min(k,j) and ρ_0(τ) = 1. This equation can be used for estimating the parameters φ_{j,τ} of the model, as well as for determining the correlogram ρ_k(τ) for a given set of parameters. The estimation of parameters from Eq. (4.35) will be dealt with in detail in Sec. 4.3.2. In general, if one wishes to determine the correlogram ρ_k(τ) for a periodic AR(p) model, Eq. (4.35) must be solved recursively. However, this will not be pursued further, since it has seldom been used in practice.

Conditions to be Met by the Parameters. In general, the conditions to be met by the periodic parameters φ_{j,τ} cannot be easily derived.
Therefore, for all practical purposes the conditions given for the case of AR models with constant parameters may be used as an approximation. One further check on the parameters could be made by testing to make sure that Eq. (4.32) yields a variance σ_{ε,τ}² greater than zero.

4.2 AR MODELING OF ANNUAL TIME SERIES

4.2.1 ANNUAL AR MODELS

Consider the annual hydrologic variable x_t, which in general may be normal or non-normal. Consider further that if x_t is not normal, an appropriate transformation is used to make it normal. That is,

y_t = g(x_t)    (4.36)

where g is the transformation function and y_t is the normal variable. Therefore, the models described herein will refer to the normal variable y_t. The AR models considered for fitting the series y_t are

y_t = μ + z_t    (4.37)

where y_t has mean μ and variance σ², and z_t is either the AR(0) model

z_t = ε_t    (4.38)

or the AR(1) model

z_t = φ1 z_{t-1} + ε_t    (4.39)

or the AR(2) model

z_t = φ1 z_{t-1} + φ2 z_{t-2} + ε_t    (4.40)

or in general the AR(p) model

z_t = φ1 z_{t-1} + ... + φp z_{t-p} + ε_t    (4.41)

whose properties have been discussed in detail in Sec. 4.1. The parameters of the models must be estimated from historical data, and appropriate criteria should be used for selecting the order of the AR model.

4.2.2 PARAMETER ESTIMATION FOR ANNUAL AR MODELS

Consider an available sample of annual hydrologic data denoted by x_1, x_2, ..., x_N, where N is the number of years of data. These data are either normal or they are transformed to normal as indicated above. In general, let the normal sequence be represented by y_1, y_2, ..., y_N. With this data sequence we can estimate the parameters {μ, σ², φ1, ..., φp, σ_ε²} of the AR(p) model. Two methods of estimation are discussed in this text, the method of moments and an approximate maximum likelihood method. Generally, the method of maximum likelihood gives better parameter estimates than the method of moments.
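The numerical stationarity check via the roots of the characteristic equation (4.21), recommended above for p > 2 and used again in the fitting procedure, can be sketched with numpy's polynomial root finder (a minimal sketch, not the programs cited in the text):

```python
import numpy as np

def is_stationary(phi):
    """Check the stationarity conditions of Eq. (4.21): all roots of
    u**p - phi1*u**(p-1) - ... - phip = 0 must lie inside the unit circle."""
    coeffs = np.concatenate(([1.0], -np.asarray(phi, dtype=float)))
    roots = np.roots(coeffs)
    return bool(np.all(np.abs(roots) < 1.0)), roots

# AR(2) example from the text: phi1 = 0.5, phi2 = 0.2
ok, roots = is_stationary([0.5, 0.2])
# roots are approximately 0.762 and -0.262, both inside the unit circle
assert ok
```

For p = 1 and p = 2 the same routine reproduces conditions (4.22) and (4.24) directly.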
A detailed discussion and comparison of both methods may be found in Box and Jenkins (1970) and Jenkins and Watts (1969).

Moment Estimators of Model Parameters

The moment estimate of the parameter μ is

μ̂ = (1/N) Σ_{t=1}^{N} y_t    (4.42)

Since μ̂ is equal to the average of the sequence y_t, we will also use the notation ȳ, that is, μ̂ = ȳ. The moment estimate of the variance σ² of y_t is

σ̂² = (1/N) Σ_{t=1}^{N} (y_t - ȳ)²    (4.43)

Equation (4.43) gives a biased estimate of σ², so it is common practice to adjust it by dividing by (N-1) instead of N. In this case Eq. (4.43) becomes

σ̂² = (1/(N-1)) Σ_{t=1}^{N} (y_t - ȳ)²    (4.44)

Once μ and σ² are estimated by Eqs. (4.42) and (4.43) or (4.44), the remaining parameters φ1, ..., φp, σ_ε² of the AR model are estimated by using the sequence

z_t = y_t - ȳ ,  t = 1, ..., N    (4.45)

The parameters φ1, ..., φp are estimated by solving the system of p linear equations (4.13), where the population correlation coefficients ρ_k are replaced by the sample correlation coefficients r_k and the parameters φ_j are replaced by the estimates φ̂_j. That is,

r_k = φ̂1 r_{k-1} + φ̂2 r_{k-2} + ... + φ̂p r_{k-p} ,  k = 1, ..., p    (4.46)

where r_1, r_2, ... are computed from Eq. (2.5b). In particular, for an AR(1) model or p = 1, Eq. (4.46) gives

φ̂1 = r_1    (4.47)

Similarly, for an AR(2) model or p = 2, Eq. (4.46) gives

φ̂1 = r_1(1 - r_2)/(1 - r_1²)  and  φ̂2 = (r_2 - r_1²)/(1 - r_1²)    (4.48)

Finally, the parameter σ_ε² is estimated from Eq. (4.10) by using the estimates σ̂², φ̂_j and r_j instead of σ², φ_j and ρ_j, respectively, and by multiplying it by the factor N/(N-p) in order to obtain an unbiased estimate of σ_ε². Thus,

σ̂_ε² = [N/(N-p)] σ̂² (1 - Σ_{j=1}^{p} φ̂_j r_j)    (4.49)

In particular, for the AR(1) model, Eqs. (4.47) and (4.49) give

σ̂_ε² = [N/(N-1)] σ̂² (1 - r_1²)    (4.50)

Similarly, for the AR(2) model, Eqs. (4.48) and (4.49) give

σ̂_ε² = [N/(N-2)] σ̂² (1 + φ̂2)[(1 - φ̂2)² - φ̂1²]/(1 - φ̂2)    (4.51)

Reliability of Estimated Parameters. Due to the fact that the parameters of the AR model are estimated from a limited sample size, such estimates are random variables.
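The moment (Yule-Walker) estimation of Eqs. (4.42)-(4.49) can be sketched as follows for p = 1 or 2; the sample correlogram estimator stands in for Eq. (2.5b), and the helper names are illustrative, not from the text:

```python
import numpy as np

def sample_acf(z, maxlag):
    """Sample correlation coefficients r_k (an Eq. (2.5b)-style estimator)."""
    z = np.asarray(z, dtype=float) - np.mean(z)
    n = len(z)
    c0 = np.sum(z * z) / n
    return np.array([np.sum(z[:n - k] * z[k:]) / n / c0
                     for k in range(1, maxlag + 1)])

def ar_moment_estimates(y, p):
    """Moment estimates of an AR(p) model, p = 1 or 2 (Eqs. 4.42-4.49)."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    mu = np.mean(y)                           # Eq. (4.42)
    var = np.sum((y - mu) ** 2) / (n - 1)     # Eq. (4.44), unbiased form
    r = sample_acf(y, p)
    if p == 1:
        phi = np.array([r[0]])                                  # Eq. (4.47)
        s2e = n / (n - 1) * var * (1 - r[0] ** 2)               # Eq. (4.50)
    elif p == 2:
        d = 1 - r[0] ** 2
        phi = np.array([r[0] * (1 - r[1]) / d,
                        (r[1] - r[0] ** 2) / d])                # Eq. (4.48)
        s2e = n / (n - 2) * var * (1 - phi[0] * r[0] - phi[1] * r[1])  # Eq. (4.49)
    else:
        raise ValueError("this sketch handles p = 1 or 2 only")
    return mu, var, phi, s2e
```

For higher p the general system (4.46) would be solved as a p x p linear system in the same way.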
Therefore, if we wish to determine how reliable those estimates are, we must find the distribution of such parameter estimates. The reliability of the estimated parameters is in general not easy to determine for correlated variables. Therefore, the methods and equations given in this text are only approximations and are based on the normal distribution of the estimated parameters.

The variance of the estimated mean ȳ of Eq. (4.42) is

Var(ȳ) = (σ̂²/N) [1 + (2/N) Σ_{k=1}^{N-1} (N-k) ρ̂_k]    (4.52)

where σ̂² is the estimated variance of Eq. (4.44) and ρ̂_k is the estimated autocorrelation function of the AR model representing y_t. In particular, for the AR(1) model, ρ̂_k = φ̂1^k, so that Eq. (4.52) simplifies to

Var(ȳ) = σ̂² [N(1 - φ̂1²) - 2φ̂1(1 - φ̂1^N)] / [N²(1 - φ̂1)²]    (4.53)

In general, Eq. (4.52) may be used to obtain the approximate confidence interval for the mean μ by using the t-distribution (Mood et al., 1974). Therefore, considering s(ȳ) = [Var(ȳ)]^{1/2}, the (1-α) confidence interval for the parameter μ would be

[ȳ - t(N-1)_{1-α/2} s(ȳ) ,  ȳ + t(N-1)_{1-α/2} s(ȳ)]    (4.54)

where t(N-1)_{1-α/2} is the 1-α/2 quantile of the t-distribution with N-1 degrees of freedom.

Estimates of the variances and covariances of the estimated parameters φ̂1, ..., φ̂p may be obtained from (Box and Jenkins, 1970, p. 244)

V(φ̂) = [1/(N-p)] (1 - Σ_{j=1}^{p} φ̂_j r_j) R_p^{-1}    (4.55)

where V(φ̂) represents the estimated covariance matrix and R_p is the p x p matrix of sample correlation coefficients with elements r_{|i-j|}, i, j = 1, ..., p. In particular, for the AR(1) model Eq. (4.55) gives

V(φ̂) = Var(φ̂1) = (1 - φ̂1²)/(N-1)    (4.56)

Similarly, for the AR(2) model, Eq. (4.55) gives

Var(φ̂1) = Var(φ̂2) = (1 - φ̂2²)/(N-2)    (4.57)

Cov(φ̂1, φ̂2) = -φ̂1(1 + φ̂2)/(N-2)    (4.58)

Expressions (4.55) through (4.58) give the approximate variances of the parameter estimates, which in turn provide the corresponding confidence intervals for the parameters, assuming the estimates of the parameters are normally distributed.
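The AR(1) reliability formulas above can be sketched numerically; the fixed quantile u = 1.96 (α = 5 percent) is an illustrative choice, and the function names are not from the text:

```python
import math

def var_mean_ar1(sigma2, phi1, n):
    """Eq. (4.53): variance of the sample mean under an AR(1) model."""
    return sigma2 * (n * (1 - phi1 ** 2) - 2 * phi1 * (1 - phi1 ** n)) \
        / (n ** 2 * (1 - phi1) ** 2)

def ci_phi1_ar1(phi1, n, u=1.96):
    """Normal-approximation (1-alpha) interval of Eq. (4.59) with
    Var(phi1) from Eq. (4.56); u = 1.96 corresponds to alpha = 5 percent."""
    s = math.sqrt((1 - phi1 ** 2) / (n - 1))
    return phi1 - u * s, phi1 + u * s

lo, hi = ci_phi1_ar1(0.4, 50)
assert lo < 0.4 < hi
```

Note that with φ1 = 0 Eq. (4.53) collapses to the familiar σ²/N of an independent sample, a useful sanity check.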
Therefore, considering s(φ̂) = [Var(φ̂)]^{1/2}, the (1-α) confidence interval for the parameter φ_j would be

[φ̂_j - u_{1-α/2} s(φ̂_j) ,  φ̂_j + u_{1-α/2} s(φ̂_j)]    (4.59)

where u_{1-α/2} is the 1-α/2 quantile of the standardized normal distribution.

A better approximation for the confidence intervals of the parameters φ1 and φ2 of the AR(2) model may be obtained by using the bivariate normal distribution of φ̂1 and φ̂2, since the variances and covariances are given by Eqs. (4.57) and (4.58), respectively. In such a case the (1-α) confidence region for φ1 and φ2 can be determined instead of the individual confidence intervals. In general, for the AR(p) model the p-variate normal distribution can be used to determine the confidence region for the parameters φ1, ..., φp. The corresponding approximate confidence interval for σ_ε² may be obtained by substituting the lower and upper confidence limits of (4.59) for the φ̂_j's in Eq. (4.49).

Maximum Likelihood Estimators of Model Parameters

The maximum likelihood estimation of the parameters of the AR(p) model presented in this text is an approximate method given by Box and Jenkins (1970, p. 274). The method is based on differentiating the log-likelihood function with respect to the parameters and equating the resulting expressions to zero. Since one of the terms of these expressions is a complicated function of the parameters, an approximation is used which leads to a system of linear equations.

As in the previous case, the parameter μ is estimated by Eq. (4.42), and the remaining parameters of the AR(p) model of Eq. (4.41) are estimated by using the z_t sequence of Eq. (4.45). Consider the sums of cross-products (Box and Jenkins, 1970, p. 276)

D_{ij} = D_{ji} = Σ_{t=p+1}^{N} z_{t+1-i} z_{t+1-j} ,  i, j = 1, ..., p+1    (4.60)

The maximum likelihood estimates of the parameters φ1, ..., φp are found by solving the system of equations

D_{1j} = φ̂1 D_{2j} + φ̂2 D_{3j} + ... + φ̂p D_{p+1,j} ,  j = 2, ..., p+1    (4.61)

for φ̂1, ..., φ̂p. In particular, for an AR(1) model, Eq. (4.61) gives
φ̂1 = D_{12}/D_{22}    (4.62)

Similarly, for an AR(2) model, Eq. (4.61) gives

φ̂1 = (D_{12} D_{33} - D_{13} D_{23}) / (D_{22} D_{33} - D_{23}²)

and

φ̂2 = (D_{13} D_{22} - D_{12} D_{23}) / (D_{22} D_{33} - D_{23}²)    (4.63)

The variance of the white noise, σ_ε², may be estimated by

σ̂_ε² = [1/(N-p)] (D_{11} - Σ_{j=1}^{p} φ̂_j D_{1,j+1})    (4.64)

For the AR(1) model Eq. (4.64) yields

σ̂_ε² = [1/(N-1)] (D_{11} - φ̂1 D_{12})    (4.65)

Similarly, for the AR(2) model σ̂_ε² is

σ̂_ε² = [1/(N-2)] (D_{11} - φ̂1 D_{12} - φ̂2 D_{13})    (4.66)

Reliability of Estimated Parameters. The confidence interval for the mean μ can be obtained from expression (4.54). Estimates of the variances and covariances of the estimated parameters φ̂1, ..., φ̂p may be obtained from

V(φ̂) = σ̂_ε² D_p^{-1}    (4.67)

where σ̂_ε² is given by Eq. (4.64) and D_p^{-1} is the inverse of the matrix

D_p = | D_{22}     D_{23}     ...  D_{2,p+1}   |
      | D_{32}     D_{33}     ...  D_{3,p+1}   |
      |  ...        ...       ...   ...        |
      | D_{p+1,2}  D_{p+1,3}  ...  D_{p+1,p+1} |    (4.68)

In particular, for the AR(1) model, Eqs. (4.67) and (4.68) give

V(φ̂) = Var(φ̂1) = σ̂_ε²/D_{22}    (4.69)

where σ̂_ε² is given by Eq. (4.65). Similarly, for the AR(2) model, Eqs. (4.67) and (4.68) give

Var(φ̂1) = σ̂_ε² D_{33} / (D_{22} D_{33} - D_{23}²)

Var(φ̂2) = σ̂_ε² D_{22} / (D_{22} D_{33} - D_{23}²)    (4.70)

and

Cov(φ̂1, φ̂2) = -σ̂_ε² D_{23} / (D_{22} D_{33} - D_{23}²)    (4.71)

with σ̂_ε² given by Eq. (4.66). The variances obtained in general from Eq. (4.67), or in particular from Eqs. (4.69) and (4.70), can be used as in Eq. (4.59) to provide the (1-α) confidence intervals for the autoregressive parameters φ̂_j's. As said in the previous section, the p-variate normal distribution of φ̂1, ..., φ̂p can be used in general to obtain the (1-α) confidence region of the parameter set. The corresponding approximate confidence interval for σ_ε² may be obtained by substituting the lower and upper confidence limits of the autoregressive parameters for the φ̂_j's in Eq. (4.64).

Remarks. Whether the method of moments or maximum likelihood is used to obtain the parameter estimates φ̂1, ..., φ̂p, there is no assurance that they will satisfy the stationarity conditions to be met by these parameters. Therefore, φ̂1, ..., φ̂p should be used in Eq.
(4.21) instead of φ1, ..., φp, and the roots of that equation must lie within the unit circle.

4.2.3 GOODNESS OF FIT FOR ANNUAL AR MODELS

The goodness of fit tests of AR models fitted to annual hydrologic series can be accomplished by (1) testing the assumptions made for the AR model and (2) comparing the historical (sample) correlograms with the model (fitted) correlograms. In addition, a test needs to be made to check whether the order of the fitted model is adequate so as to obtain a parsimonious model.

In general, the statistical tests suggested above should be followed as closely as possible when modeling annual hydrologic series. However, there are some situations in practice in which the analyst may depart from some of those tests and may decide on the order of the model and the corresponding parameters on non-statistical grounds. Some of these cases are discussed in Sec. 4.2.7.

Tests on the Assumptions of the Model

Two assumptions of the model need to be checked, i.e., the independence and the normality of the residuals of the fitted model. The residuals ε_t of the fitted AR model may be determined from Eq. (4.41) with the parameters φ_j replaced by their estimated values φ̂_j. Therefore, ε̂_t may be written as

ε̂_t = z_t - φ̂1 z_{t-1} - φ̂2 z_{t-2} - ... - φ̂p z_{t-p}    (4.72)

where the notation ε̂_t is used instead of ε_t to indicate the sample estimates of the residuals. In order to obtain the initial values ε̂_1, ε̂_2, ..., ε̂_p in Eq. (4.72) it is necessary to know the values z_0, z_{-1}, ..., z_{-p+1}, but they are not known. An approach to determine these z values is through backward forecasting as suggested by Box and Jenkins (1970), in which case Eq. (4.72) could be readily solved. But since for annual hydrologic series p is usually small, i.e., of the order of one or two, very little would be lost if the first p values of ε̂_t are neglected. For instance, for the AR(1) model Eq.
(4.72) would yield the residuals ε̂_2, ε̂_3, ..., ε̂_N, which are used for the tests of independence and normality.

Chapter 3 discusses some procedures for testing whether the residuals of a dependence model are uncorrelated. One procedure was based on the probability limits of the population residual correlogram. Another procedure was the Porte Manteau lack of fit test, in which the autocorrelations of the ε̂_t are taken as a whole. In this case, Eq. (3.46a) is applied to determine the statistic Q,

Q = N Σ_{k=1}^{L} r_k²(ε̂)    (4.73)

with r_k(ε̂) determined from Eq. (2.5b), and L may be of the order of 10-30 percent of the sample size N. The statistic Q is approximately χ²(L-p) distributed. If Q < χ²(L-p), then ε̂_t of Eq. (4.72) is independent, which in turn implies that the selected AR(p) model is adequate. Otherwise, the model is inadequate and another model (say of order p+1) should be selected for analysis.

A test of normality is required to check whether the residuals ε̂_t of Eq. (4.72) are in fact normal. Chapter 3 gives some tests of normality currently used in the statistical literature. One simple normality test is to plot the empirical distribution of the residuals ε̂_t on normal probability paper and verify whether the plotted points fall on a straight line. In general, one or more of the procedures suggested in Sec. 3.5.2 can be used.

Comparison of the Historical and Model Correlograms

When modeling hydrologic series, a common criterion is to preserve the main statistical characteristics of the historical series. For instance, the modeler may wish to select an AR(p) model whose correlogram resembles that of the historical sample. In this case a graphical comparison of the sample or historical correlogram with that of the selected model is usually made. The sample or historical correlogram of the series z_t of Eq. (4.45) is obtained from Eq. (2.5b) and the corresponding model correlogram from the solution of the difference equation (4.13) (see Sec.
4.1.2), where the φ's are replaced by the estimates φ̂'s. Alternatively, the generation of synthetic samples can be used to obtain the model correlograms. This would enable the modeler to compare the historical correlogram with the whole distribution of the model correlogram. If biases are observed and they are judged to be important, one may consider an alternative model.

Test for the Parsimony of Parameters

The Akaike Information Criterion (AIC) was presented in Chapter 3 for checking whether the order of the fitted model is adequate compared with other orders of the dependence model. From Eq. (3.57) the AIC for an AR(p) model is

AIC(p) = N ln(σ̂_ε²) + 2p    (4.74)

where σ̂_ε² is the maximum likelihood estimate of the residual variance. Therefore, a comparison can be made between the AIC(p) and the AIC(p-1) and AIC(p+1). If the AIC(p) is less than both the AIC(p-1) and the AIC(p+1), then the AR(p) model is best. Otherwise, the model with the smaller AIC becomes the new candidate model. In a way the AIC is a criterion for the selection of the order of the model; thus the plot of AIC(p) against p, as well as the plots of the sample and population partial correlograms, can be used for the final model selection.

4.2.4 GENERATION AND FORECASTING USING ANNUAL AR MODELS

Once the parameters of the AR model are determined and the goodness of fit tests indicate that the model is appropriate, the model can be used for generating synthetic annual series or for forecasting future annual values. This section concentrates only on the generation of new samples because Sec. 5.2.5 describes in general the procedure to follow for forecasting using AR and ARMA models.

The model for generation of synthetic annual time series may be determined by substituting the estimated parameters ȳ and φ̂1, ..., φ̂p into Eqs. (4.37) and (4.41) and by the inverse transformation of Eq. (4.36). Thus the generating model is

x̂_t = ĝ^{-1}(ŷ_t)    (4.75)

where ĝ^{-1}(.) denotes the inverse of the estimated transformation function ĝ(.).
For instance, if y_t = log(x_t), then x̂_t = antilog(ŷ_t). From Eqs. (4.37) and (4.41) the variable ŷ_t is

ŷ_t = ȳ + ẑ_t    (4.76)

and

ẑ_t = φ̂1 ẑ_{t-1} + ... + φ̂p ẑ_{t-p} + ε_t

Since ε_t is normal with zero mean and variance σ̂_ε², we can use the standardized normal variable ξ_t so that ε_t = σ̂_ε ξ_t. Then the last equation for ẑ_t becomes

ẑ_t = φ̂1 ẑ_{t-1} + ... + φ̂p ẑ_{t-p} + σ̂_ε ξ_t    (4.77)

Therefore, Eqs. (4.75), (4.76) and (4.77) can be used jointly to generate the synthetic annual series x̂_t. Let us assume that we want to generate N_g years of synthetic annual time series. The generating procedure is as follows:

(1) Generate a standard normal random number ξ and use Eq. (4.77) to determine ẑ_1, assuming that the previous values ẑ_0, ẑ_{-1}, ..., ẑ_{-p+1} are zeros. In the same manner, generate a new random number ξ and use Eq. (4.77) to obtain ẑ_2 based on the already computed ẑ_1 and the values ẑ_0, ẑ_{-1}, ..., which are assumed zeros. Repeat this procedure until the series ẑ_1, ẑ_2, ..., ẑ_{N'} is generated, where N' = N_w + N_g, in which N_w is the warm-up length and N_g is the desired number of years of generation. The length N_w is necessary in order to remove the effect of the starting conditions (ẑ_0, ẑ_{-1}, ..., ẑ_{-p+1} equal to zero), and it may be taken as about 50 (Fiering and Jackson, 1971, p. 58). Note that since for annual series p is usually of the order of 1 or 2, despite the starting assumption, the convergence to the true AR process is reached fairly fast.

(2) Delete the first N_w values ẑ_1, ..., ẑ_{N_w} and reindex the remaining ẑ's so that the last N_g values ẑ_{N_w+1}, ..., ẑ_{N'} become ẑ_1, ẑ_2, ..., ẑ_{N_g}.

(3) Based on the generated sequence ẑ_1, ..., ẑ_{N_g}, determine the sequence ŷ_1, ..., ŷ_{N_g} from Eq. (4.76).

(4) Based on the generated sequence ŷ_1, ..., ŷ_{N_g}, use Eq.
(4.75) to determine the synthetic annual time series x̂_1, ..., x̂_{N_g}.

4.2.5 SUMMARIZED AR MODELING PROCEDURE FOR ANNUAL SERIES

The modeling procedure described herein contains five main parts: (1) Preliminary analysis and model identification, (2) Estimation of parameters, (3) Tests of goodness of fit of the selected model, (4) Optional tests of the model, and (5) Reliability of estimated parameters. The first part outlines some fundamental analyses and criteria for selecting the order of the model to be fitted to the historical series. The second part estimates the parameters of the selected model. The third part describes some tests of goodness of fit based essentially on tests for independence and normality of the residuals, graphical comparison of the historical and model correlograms, and a model parsimony test. The fourth part, "Optional Tests of the Model," deals with further testing of the model by comparing the statistical characteristics of the historical series with the corresponding characteristics of synthetic time series generated with the model. The last part, "Reliability of Estimated Parameters," deals with determining the confidence limits of the estimated parameters of the model.

Evidently, it is advisable to follow the suggested five parts whenever it is feasible and the time and funds are available. However, often the analyst may wish to skip one or more parts of the suggested procedure. For instance, the analyst may assume from experience that the annual flows of a particular river are normal and possibly follow an AR(1) model. In such a case part (1) may be deleted from the analysis. In another case, the analyst may be satisfied with the results of the goodness of fit tests of part (3) and may consider it unnecessary to make further tests of the model. In this case part (4) may be deleted. Some other cases of simplifications and deletions of some steps in the overall procedure are suggested in the text.
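The four-step generation procedure of Sec. 4.2.4 (Eqs. 4.75-4.77) can be sketched as follows; the log transformation (so that ĝ^{-1} = exp) and the parameter values are illustrative assumptions, not results from the text:

```python
import numpy as np

def generate_annual_ar(phi, sigma_eps, ybar, n_gen, n_warm=50,
                       seed=None, g_inv=None):
    """Generate a synthetic annual series with an AR(p) model
    (Eqs. 4.75-4.77).  g_inv is the inverse transformation of Eq. (4.36);
    the identity is assumed when it is not supplied."""
    rng = np.random.default_rng(seed)
    phi = np.asarray(phi, dtype=float)
    p = len(phi)
    n_tot = n_warm + n_gen
    z = np.zeros(n_tot + p)              # first p entries: zero start values
    for t in range(p, n_tot + p):
        # Eq. (4.77): z_t = sum_j phi_j z_{t-j} + sigma_eps * xi_t
        z[t] = phi @ z[t - p:t][::-1] + sigma_eps * rng.standard_normal()
    z = z[p + n_warm:]                   # step (2): drop the warm-up values
    y = ybar + z                         # step (3), Eq. (4.76)
    return g_inv(y) if g_inv is not None else y   # step (4), Eq. (4.75)

# Hypothetical AR(1) example with a log transform, so g_inv = exp
x = generate_annual_ar([0.4], 0.3, ybar=2.0, n_gen=100, seed=1, g_inv=np.exp)
```

The warm-up default of 50 follows the Fiering and Jackson (1971) suggestion cited above.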
(1) Preliminary Analysis and Model Identification

The main purposes of this part are to check the normality of the original annual time series, to make the appropriate transformation to normality if necessary, and to identify a tentative AR model of order p for subsequent steps of estimation and verification. Depending on the experience of the analyst and on each particular case, some or all of the points of this part (1) may be skipped. For instance, previous studies of the distribution of annual flows in a particular region may have shown that they are normal. If this is so, steps (1a) and (1b) may be deleted. Another situation may arise when the flows in a particular region or river basin were shown to have very little or negligible groundwater and surface storage contributions, in which case, when analyzing flows of a river within that region or basin, one may expect either an AR(0) or an AR(1) model with very small first serial correlation and memory. Thus, also in this case, the analyst may decide to delete steps (1c) through (1f).

STEP (1a). Check whether the original annual time series is normal. Apply either the chi-square goodness of fit test or the test of the skewness coefficient presented in Sec. 3.1. At this stage a graphical test using normal probability paper is usually enough. If the series is normal, continue with step (1c); otherwise continue with step (1b).

STEP (1b). Transform the non-normal annual time series into normal. Use the procedure for transformation to normality outlined in Sec. 3.1. In most cases the simple logarithmic transformation would approximate the series to normal; therefore this should be tried first. To see whether the selected transformation makes the data normal, plot the empirical distribution of the transformed data on normal probability paper and check whether the points follow a straight line.

STEP (1c). Assuming that at this step the data are normal, let y_1, y_2,
..., y_N represent the time series, with N the number of years of the historical record. Plot the y_t time series versus time. The graphical display of the time series is useful information which could help to identify the basic characteristics of the series and the possible model to be fitted to the series. For instance, an AR(0) model or independent time series plots without an identifiable time pattern. That is, if we take several points of the series and observe for each the next successive points, we will note that there is the same chance for these points to either stay above or below the mean, with no bearing on the previous reference point. Therefore, if our sample time series resembles this type of display, it is possible that the series is an AR(0) process. On the other hand, an AR(1) time series with parameter φ1 > 0 shows a positive dependence between successive points; that is, a point above the mean is likely to be followed by another point also above the mean, and a point below the mean is likely to be followed by another point below the mean. Therefore, if our sample time series resembles a display with positive dependence as described above, then an AR(1) model with φ1 > 0 would be a candidate model for fitting the time series. Similar observations could be made for higher order models, although in these cases it is more difficult to identify the order of the model based on the graphical display of the time series alone.

STEP (1d). Determine the sample correlogram r_k(y) of y_t from Eq. (2.5b) and plot r_k(y) against the time lag k. The characteristics of the correlograms of AR(p) models were discussed in Sec. 4.1.2. Therefore, based on the shape of the sample correlogram r_k(y) and on the general shapes of the AR(0), AR(1), ..., AR(p) model correlograms, a selection of the order of the model can be made. For instance, if the computed r_1(y) is close to zero and the other values r_2(y), r_3(y), ... are also small, fluctuating without any pattern around zero, it would indicate that the annual time series may be simply represented by an AR(0) model. Similarly, if r_k(y) tails off in an exponential manner, it would indicate the AR(1) model as a potential model of the time series y_t.

STEP (1e). Determine the sample partial correlogram φ̂_k(k), k = 1, ..., L, from Eq. (A4.9), as well as the corresponding probability limits for φ_k(k) = 0 (Eq. A4.10), and plot them versus k. The value of L may be of the order of 0.1-0.3 of the sample size N. From the discussion of the properties of AR models given in Sec. 4.1.2 we know that for an AR(p) model φ_k(k) = 0 for k > p. Therefore, if the plot of the sample partial correlogram φ̂_k(k) has a cutoff after a certain lag k = p, it may be an indication that the time series corresponds to an AR(p) model. The cutoff point may be determined by observing beyond what lag the φ̂_k(k) values fall within the computed probability limits.
are also small, fluctuating without any pattern around zero, it would indicate that the annual time series may be simply represented by an AR(O) model. Simi- larly, if r,(y) tails off in a exponential manner it would indicate the AR(1) model as a potential model of the time series y,. STEP (le). Determine the sample partial correlogram $,(k), , L_ from Eq. (A4.9) as well as the corresponding ity limits for ,(k) = 0 (Eq. A4.10) and plot them versus k. The value of L may be of the order of 0.1 - 0.3 of the sample size N. From the discussion on the prop- erties of AR models given in Sec. 4.1.2 we know that for an AR(p) model 9,(k) = 0 for k > p. Therefore, if the plot of the sample partial correlogram 4,(k) has a cutoff after a certain lag k = p, it may be an‘ indication that the time series corresponds t0 an AR(p) model. The cutoff point may be determined by observing beyond what lag the $,(k) values fall within ‘the computed probability limits. probab 130 STEP (1f). Based on the analysis of the plot of the time series y, of step (1c), the analysis of the correlograms of step (1d), and the analysis of the partial correlograms of step (le), select a model of order p. In the event that the analyst does not have experience of if the graphs indicated above are complex and we are unable to decide on the order of the model, select the lowest order model or p = 0 and proceed with the second part of the analysis. (2) Estimation of Parameters The main purpose of this part is to estimate the parameters of the tentative model selected in part (1) and to check the stationarity conditions of such estimated param- eters. The estimation of the parameters of the model may be made either by the method of moments or by the method of maximum likelihood. The analyst should decide which method to use. Generally, the method of maximum likelihood would yield better results but in some cases one may need or wish to use the method of moments. 
The following three steps are common to both of the above-mentioned estimation methods:

STEP (2a). Determine the sample mean ȳ from Eq. (4.42) and the sample variance σ̂² from Eq. (4.44).

STEP (2b). Determine the series z_t from Eq. (4.45), i.e., by subtracting the sample mean ȳ from the original series y_t.

STEP (2c). Compute the sample correlogram r_k(z), k = 1, ..., L, with L = 0.1N to 0.3N, from Eq. (2.5b). This computation can be deleted if r_k(y) was previously computed in step (1d), since r_k(z) = r_k(y). If p = 0, continue with the steps of part (3); otherwise continue with the steps corresponding to either the moment parameter estimation or the maximum likelihood estimation.

Moment Parameter Estimates

STEP (2d). Determine the autoregressive parameters φ̂_j, j = 1, ..., p, by solving the system of equations (4.46). In particular, if p = 1 obtain φ̂1 from Eq. (4.47), and if p = 2 obtain φ̂1 and φ̂2 from Eq. (4.48).

STEP (2e). Compute the white noise variance σ̂_ε² from Eq. (4.49). In particular, if p = 1 obtain σ̂_ε² from Eq. (4.50), and if p = 2 use Eq. (4.51).

Maximum Likelihood Parameter Estimates

STEP (2d'). Compute D_{ij}, i, j = 1, ..., p+1, from Eq. (4.60) and determine the autoregressive parameters φ̂_j, j = 1, ..., p, by solving the system of equations (4.61). In particular, if p = 1 obtain φ̂1 from Eq. (4.62), and if p = 2 obtain φ̂1 and φ̂2 from Eq. (4.63).

STEP (2e'). Compute the residual variance σ̂_ε² from Eq. (4.64). In particular, if p = 1 obtain σ̂_ε² from Eq. (4.65), and if p = 2 use Eq. (4.66).

Stationarity Conditions of Estimated Parameters

STEP (2f). Test the stationarity conditions of the estimated autoregressive parameters φ̂1, ..., φ̂p by obtaining the p roots of Eq. (4.21) and checking whether they lie within the unit circle. In particular, for p = 1 expression (4.22) must be met, while for p = 2 expression (4.24) must be met. If the above conditions are fulfilled, continue with part (3); otherwise either change the estimation method or change the order of the model and go back to either step (2d) or (2d').
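Steps (2d') and (2e') can be sketched as follows, with the sums of cross-products D_{ij} taken over t = p+1, ..., N as in Eq. (4.60); the series is assumed already mean-removed per Eq. (4.45):

```python
import numpy as np

def ar_ml_estimates(z, p):
    """Approximate maximum likelihood estimates of an AR(p) model from the
    sums of cross-products D_ij (Eqs. 4.60, 4.61 and 4.64).
    z: mean-removed series z_1, ..., z_N (Eq. 4.45), stored 0-based."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    t = np.arange(p + 1, n + 1)                  # t = p+1, ..., N (1-based)
    # D[i-1, j-1] = D_ij = sum_t z_{t+1-i} z_{t+1-j}, i, j = 1, ..., p+1
    D = np.empty((p + 1, p + 1))
    for i in range(1, p + 2):
        for j in range(1, p + 2):
            D[i - 1, j - 1] = np.sum(z[t - i] * z[t - j])
    # Eq. (4.61): D_1j = sum_k phi_k D_{k+1,j}, j = 2, ..., p+1
    A = D[1:, 1:]
    b = D[0, 1:]
    phi = np.linalg.solve(A, b)
    s2e = (D[0, 0] - phi @ b) / (n - p)          # Eq. (4.64)
    return phi, s2e
```

For p = 1 this reduces exactly to φ̂1 = D_{12}/D_{22} of Eq. (4.62), and for p = 2 to the ratios of Eq. (4.63).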
(3) Tests of Goodness of Fit of Selected Model

STEP (3a). Determine the residuals of the AR(p) model from Eq. (4.72) starting at t = p+1. Note that when p = 0, Eq. (4.72) gives ε̂_t = z_t.

STEP (3b). Test the hypothesis that ε̂_t is an independent series by the procedures suggested in Sec. 4.2.3. For instance, apply the Porte Manteau lack of fit test by computing the statistic Q of Eq. (4.73) and comparing it with the χ² value for L-p degrees of freedom and a given confidence level α. If Q < χ²(L-p), ε̂_t is independent and the selected model is adequate. In this case, continue with step (3c); otherwise select a higher order model, say p+1, and go back to step (2d) or (2d'), depending on which method of estimation is being used. Alternatively, even if the selected model is not adequate in regard to this test, the analyst may wish to proceed with the subsequent test steps so that at the end an overall analysis and judgment of the model can be made. This same argument is valid for steps (3c), (3d), and (3e) below.

STEP (3c). Delete this step if p = 0 and step (1a) was previously made. Otherwise, test the hypothesis that ε̂_t is normal by any of the procedures suggested in Sec. 3.5.2. For instance, compute the skewness coefficient γ(ε̂) of ε̂_t from Eq. (3.52) and the (1-α) limits from Eq. (3.53). If γ(ε̂) falls inside these limits, ε̂_t is normal; proceed with step (3d). Otherwise go back to step (1b) and try a transformation to normality of the original series x_t. Alternatively, instead of the skewness test of normality, a graphical test can be made by plotting the empirical frequency distribution of the residuals ε̂_t on normal probability paper.

STEP (3d). Determine the Akaike Information Criteria AIC(p-1), AIC(p) and AIC(p+1) from Eq. (4.74).
In order to compute the AIC(p-1) and the AIC(p+1), the analyst generally needs to go back to either steps (2d) and (2e) or (2d') and (2e') (depending on which method of estimation is being used) to obtain the variances σ̂_ε² corresponding to AR models with orders (p-1) and (p+1). The AIC(p) must be less than both AIC(p-1) and AIC(p+1) in order to accept the model of order p. In this case, continue with step (3e). Otherwise, the analyst may wish to try the model with the order corresponding to the minimum AIC and go back to step (2f).

STEP (3e). Compute the correlogram ρ̂_k(z) of the selected AR(p) model by following the procedure described in Sec. 4.1.2. In particular, if p = 1, ρ̂_k(z) is obtained from Eq. (4.14) using φ̂1 instead of φ1. Similarly, if p = 2, ρ̂_k(z) is obtained from Eqs. (4.15) and (4.16) using φ̂1 and φ̂2 instead of φ1 and φ2. Compare the model correlogram ρ̂_k(z) and the sample correlogram r_k(z) determined at step (2c) by a graphical display. If both correlograms are similar, it would provide further verification that the selected AR model is adequate.

(4) Optional Tests of the Model

The modeler is usually interested in finding a model which can reproduce the historical statistical characteristics considered relevant. These statistical characteristics may be the historical mean, standard deviation, skewness coefficient, correlogram, mean ranges, and mean run length. Therefore, the main purpose of this part is to compare the statistical characteristics of the generated data with those of the historical data.

STEP (4a). Based on the selected AR(p) model and the corresponding parameters estimated in part (2), generate say 100 sequences of annual data of the same length N as the length of the historical sample. Follow the generation procedure mentioned in Sec. 4.2.4.

STEP (4b).
Determine the mean μ̂(i), standard deviation σ̂(i), skewness coefficient γ̂(i), correlogram r_k(i), and any other statistical characteristic the analyst wishes to consider, for each of the generated sequences i = 1, ..., 100.

STEP (4c). Compare the generated statistics determined in step (4b) with the corresponding historical statistics. For instance, the comparison of the generated statistics r_k(i), i = 1, ..., 100, with the historical statistic r_k can be made by determining the sample mean and standard deviation

r̄_k = (1/100) Σ_{i=1}^{100} r_k(i)    (4.78)

s(r_k) = [(1/99) Σ_{i=1}^{100} (r_k(i) - r̄_k)²]^{1/2}    (4.79)

Then, for each correlation the interval

[r̄_k - c s(r_k) ,  r̄_k + c s(r_k)]    (4.80)

is determined, where c can be taken as 1, 1.5 or 2 depending on how strict the test is made. (Alternatively, c could be taken as the standard normal variate corresponding to a specified significance level, such as c = 1.64 for the 10 percent level.) Finally, check whether the historical correlogram r_k falls inside the interval of expression (4.80). If that is the case, we can conclude that the model preserves the historical correlogram. Follow the same procedure as suggested above to determine whether any desired historical characteristic is preserved by the model. If one or more historical statistics are not preserved by the model, and the analyst judges that those statistics are of importance for the problem at hand, then he may decide either to change the parameters of the model or perhaps to change the model itself.

(5) Reliability of Model Parameters

Once the parameters of the model are determined and the model is accepted based on the tests referred to in parts (3) and (4), the reliability of the parameters of the model can be determined by their confidence intervals corresponding to a given probability level. This is done by using the exact and approximate equations given in Sec. 4.2.2.
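The preservation check of step (4c) via Eqs. (4.78)-(4.80) can be sketched as follows; the five-sequence example is hypothetical and only illustrates the mechanics:

```python
import numpy as np

def preserves_statistic(generated, historical, c=2.0):
    """Step (4c) check via Eqs. (4.78)-(4.80): does the historical value
    fall inside [mean - c*s, mean + c*s] of the generated values?"""
    g = np.asarray(generated, dtype=float)
    m = np.mean(g)                               # Eq. (4.78)
    s = np.std(g, ddof=1)                        # Eq. (4.79)
    return m - c * s <= historical <= m + c * s  # Eq. (4.80)

# Hypothetical example: lag-one correlations from 5 generated sequences
r1_gen = [0.32, 0.41, 0.37, 0.29, 0.36]
assert preserves_statistic(r1_gen, 0.35)
```

The same routine applies to any statistic of step (4b) (mean, standard deviation, skewness, run lengths), one call per characteristic and lag.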
Alternatively, the reliability of the estimated parameters can be obtained experimentally by data generation, following a procedure similar to that indicated in part (4).

STEP (5a). Compute the standard deviation s(ȳ) of the sample mean ȳ from Eq. (4.52), where σ̂² is the variance of y obtained from Eq. (4.44) and ρ̂_k is the correlogram of the AR(p) model obtained in step (3e). In particular, for the AR(1) model, compute s(ȳ) from Eq. (4.53).

STEP (5b). The (1-α) confidence interval of the mean μ is determined from expression (4.54). The value of (1-α) may be taken as 95 percent, or α = 5 percent.

STEP (5c). If p > 0, continue with step (5d). If p = 0, determine the (1-α) confidence interval for the variance σ² by

[ (N-1)σ̂² / χ²_{1-α/2}(N-1) ,  (N-1)σ̂² / χ²_{α/2}(N-1) ]    (4.81)

where σ̂² is the sample variance computed in step (2a), and χ²_{1-α/2}(N-1) and χ²_{α/2}(N-1) are the 1-α/2 and α/2 quantiles of the chi-square distribution with (N-1) degrees of freedom.

Method of Moments

STEP (5d). If the autoregressive parameters and residual variance were estimated by the method of moments, determine the elements of the covariance matrix V(φ̂) from Eq. (4.55), where the φ̂'s are the parameters estimated at step (2d) and the r_j's are the serial correlation coefficients computed at step (1d). In particular, if p = 1 the variance Var(φ̂₁) can be obtained from Eq. (4.56), and if p = 2 the variances Var(φ̂₁) and Var(φ̂₂) can be obtained from Eq. (4.57). From the variances determine the standard deviations s(φ̂_j), j = 1, ..., p.

STEP (5e). Determine the (1-α) confidence interval for the parameters φ₁, ..., φ_p from Eq. (4.59), where the φ̂_j's are the parameters estimated at step (2d) and the s(·)'s are the standard deviations computed at step (5d).

STEP (5f). Determine the (1-α) confidence interval for the residual variance σ²(ε) by replacing into Eq. (4.49) the lower and upper confidence limits of the autoregressive parameters computed at step (5e) instead of the estimated φ̂_j's.
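For p = 0, the interval of Eq. (4.81) can be computed directly. The sketch below uses the Wilson-Hilferty approximation for the chi-square quantiles, because the Python standard library has no chi-square inverse; in practice tabulated quantiles would be used. The function names are mine.

```python
import math
from statistics import NormalDist

def chi2_quantile(p, df):
    # Wilson-Hilferty approximation to the chi-square quantile function
    z = NormalDist().inv_cdf(p)
    return df * (1.0 - 2.0 / (9.0 * df) + z * math.sqrt(2.0 / (9.0 * df))) ** 3

def variance_ci(sample_var, n, alpha=0.05):
    # (1 - alpha) confidence interval for sigma^2, Eq. (4.81)
    df = n - 1
    lo = df * sample_var / chi2_quantile(1.0 - alpha / 2.0, df)
    hi = df * sample_var / chi2_quantile(alpha / 2.0, df)
    return lo, hi

# with the sample variance 0.0357 and N = 50 of the example in Sec. 4.2.6:
lo, hi = variance_ci(0.0357, 50)
```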
Method of Maximum Likelihood

STEP (5d'). If the autoregressive parameters and residual variance were estimated by the method of maximum likelihood, determine the elements of the covariance matrix V(φ̂) from Eq. (4.67), where σ̂²(ε) was computed at step (2e'), and determine the elements of the matrix D from Eq. (4.68). In particular, if p = 1 the variance Var(φ̂₁) can be obtained from Eq. (4.69), and if p = 2, Var(φ̂₁) and Var(φ̂₂) can be determined from Eq. (4.70). From the variances Var(φ̂_j) determine the standard deviations s(φ̂_j), j = 1, ..., p.

STEP (5e'). Determine the (1-α) confidence interval for the parameters φ₁, ..., φ_p from Eq. (4.59), where the φ̂_j's are the maximum likelihood parameters estimated at step (2d') and the s(·)'s are the standard deviations computed at step (5d') above.

STEP (5f'). Determine the (1-α) confidence interval for the residual variance σ²(ε) of Eq. (4.64) by replacing the estimated φ̂_j's by the lower and upper confidence limits of the autoregressive parameters computed at step (5e').

4.2.6 EXAMPLE OF AR MODELING OF ANNUAL SERIES

Modeling of Annual Flows of the Göta River

The Göta River is located in Sweden, has a basin area of 18,076 square miles (47,439 km²) at the gaging station Sjötorp-Vänersborg, has a mean discharge of about 19,000 cfs (538 m³/s), and records are available since 1807. Although there exist 171 years of historical data, only 50 years (1901-1950) are used for purposes of this example. The annual flows of the Göta River in the form of modular coefficients (flows divided by the mean) are taken from Yevjevich (1963) and are shown in Appendix A4.3. The modeling of annual flows in this example follows the step-by-step procedure outlined in Sec. 4.2.5.

(1) Preliminary Analysis and Model Identification

STEPS (1a) and (1b). Previous analysis made by Yevjevich (1963) has shown that the skewness coefficient for flows of this river was -0.058, which is sufficiently close to zero to assure the normality of the original annual flows.

STEP (1c).
The 50 years of annual flows are represented by y₁, ..., y₅₀ and are plotted in Fig. 4.4. This plot shows some dependence between successive flows; in particular, the last half of the time series shows that high flows tend to follow high flows and low flows tend to follow low flows. This time series display would indicate either an AR(1) or an AR(2) as tentative models. Whether we pick one or the other should become more definite after analyzing the correlogram and partial correlogram in subsequent steps.

Figure 4.4. Time series of annual flows for the Göta River (Sweden) for the period 1901-1950. Data are in modular coefficients.

STEP (1d). The sample correlogram r_k(y), k = 1, ..., 18, is determined from Eq. (2.5b) and is shown in Table 4.2 and Fig. 4.5. The correlogram shows a sharp decay from r₁ ≈ 0.4, and the rest has a wavy appearance. This may indicate the possibility of an AR(2) model. Compare r_k(y) of Fig. 4.5 with the ρ_k of the AR(2) model shown in Fig. 4.2(d). As a reference, the 95 percent probability limits for the correlogram of an independent series are also shown in Fig. 4.5 and Table 4.2.

Figure 4.5. Correlogram r_k(y) of Eq. (2.5b) of the annual flows of the Göta River (Sweden).

STEP (1e). The sample partial correlogram φ̂(k), k = 1, ..., 18, is determined from Eq. (A4.9) and is shown in Table 4.3 and Fig. 4.6. The partial correlogram shows that only the first value φ̂(1) is outside the 95 percent probability limits. This would suggest an AR(1) model.

Figure 4.6. Partial correlogram φ̂(k) of Eq. (A4.9) of the annual flows of the Göta River (Sweden).

STEP (1f). The time series plot of step (1c) indicates either an AR(1) or an AR(2) model, while the correlogram of step (1d) appears to indicate an AR(2) model. On the other hand, the partial correlogram suggests an AR(1) model.
Although there is no clear evidence for either of the two models, the AR(2) model is selected as a tentative model for further analysis.

Table 4.2. Correlogram r_k(y) of the Historical Annual Flows of the Göta River.

  k    r_k(y)   95 percent upper limit
  1     .397         .257
  2    -.013         .259
  3    -.004         .262
  4    -.007         .264
  5    -.108         .267
  6     .013         .269
  7     .103         .272
  8     .076         .275
  9    -.055         .278
 10    -.054         .281
 11    -.106         .284
 12      —           .287
 13    -.356         .291
 14    -.038         .294
 15     .060         .298
 16    -.016         .302
 17     .075         .306
 18    -.004         .310

Table 4.3. Partial Correlogram φ̂(k) of the Historical Annual Flows of the Göta River (95 percent limits: ±.277 for all k).

  k    φ̂(k)
  1     .397
  2    -.203
  3     .102
  4    -.061
  5    -.100
  6     .134
  7     .019
  8     .039
  9    -.107
 10     .011
 11    -.117
 12    -.202
 13    -.226
 14     .184
 15    -.083
 16      —
 17      —
 18      —

(2) Estimation of Parameters

STEP (2a). From Eqs. (4.42) and (4.44) the sample mean ȳ and sample variance σ̂² are: ȳ = 0.9528 and σ̂² = 0.0357.

STEP (2b). The time series z_t is obtained from Eq. (4.45). For instance:

z₁ = y₁ - 0.9528 = -0.0178
z₂ = y₂ - 0.9528 = -0.2908
...
z₅₀ = y₅₀ - 0.9528 = 0.0022

STEP (2c). The correlogram r_k(z) = r_k(y) was already computed in step (1d).

We are going to use the maximum likelihood method for estimating the parameters of the AR(2) tentative model.

STEP (2d'). The elements D_ij, i, j = 1, 2, 3, are computed from Eq. (4.60). For instance, D₁₁ = 2.079 and D₁₂ = 1.037. Then the autoregressive parameters φ̂₁ = 0.542 and φ̂₂ = -0.114 are obtained from Eq. (4.63).

STEP (2e'). Since p = 2, the white noise variance σ̂²(ε) = 0.0325 is determined from Eq. (4.66).

STEP (2f). For p = 2, expression (4.24) must be met for the stationarity conditions of the AR(2) model. Since φ̂₁ + φ̂₂ = 0.428 < 1, φ̂₂ - φ̂₁ = -0.656 < 1 and -1 < φ̂₂ = -0.114 < 1, the model is stationary.

(3) Tests of Goodness of Fit of Selected Model

STEP (3a). The residuals ê_t of the AR(2) model are determined from Eq. (4.72) starting at t = 3.
For instance, ê₃ = z₃ - φ̂₁z₂ - φ̂₂z₁, or ê₃ = 0.0018 - (0.542)(-0.2908) - (-0.114)(-0.0178) = 0.1538.

STEP (3b). The statistic Q = 6.103 is obtained from Eq. (4.73) with L = 12, and from the chi-square tables χ²₀.₉₅(12 - 2) = 18.31. Since Q = 6.103 < 18.31, the hypothesis of independence is accepted by the Porte Manteau lack of fit test.

STEP (3c). The coefficient of skewness of the residuals γ̂(ê) = 0.298 is computed from Eq. (3.50), and from Table 3.2 the tabulated value for the skewness test of normality is 0.544 for the 0.1 significance level. Since γ̂(ê) = 0.298 < 0.544, the hypothesis of normality of the residuals is accepted.

STEP (3d). The Akaike Information Criterion of the AR(2) model is compared with the corresponding values for the AR(1) and AR(3) models. In order to obtain AIC(1) and AIC(3), steps (2d') and (2e') are repeated for the AR(1) and AR(3) models. The results are: AIC(1) = -173.15, AIC(2) = -172.36 and AIC(3) = -169.89. Since AIC(2) > AIC(1), the AR(2) model is not better than the AR(1) model. However, the values AIC(1) and AIC(2) are not too different from each other. Hence both the AR(1) and AR(2) models would be about the same in the sense of parsimony of parameters. For the purpose of this example, we will try the AR(1) model and go back to step (2d') for another iteration.

Estimation of Parameters (Second Iteration)

STEP (2d'). Since p = 1, the maximum likelihood autoregressive parameter φ̂₁ = 0.483 is obtained from Eq. (4.62). Note that if the method of moments is used, φ̂₁ = r₁ = 0.397.

STEP (2e'). The residual variance σ̂²(ε) = 0.0322 is computed from Eq. (4.65) since p = 1.

STEP (2f). Since -1 < φ̂₁ = 0.483 < 1, the stationarity condition of the model is met.

Tests of Goodness of Fit of Selected Model (Second Iteration)

STEP (3a). The residuals ê_t of the AR(1) model are determined from Eq. (4.72) starting at t = 2. For instance, ê₂ = -0.282 and ê₃ = 0.139.

STEP (3b). The statistic Q = 5.243 is obtained from Eq.
(4.73) with L = 12, and from tables χ²₀.₉₅(11) = 19.68. Since Q = 5.243 < 19.68, the hypothesis of independence is accepted.

STEP (3c). The coefficient of skewness of the residuals is γ̂(ê) = 0.162 and the tabulated value is 0.539 for the 0.1 significance level. Since γ̂(ê) = 0.162 < 0.539, the hypothesis of normality is accepted.

STEP (3d). The AIC is determined for AR(0), AR(1) and AR(2). The results are: AIC(0) = -166.63, AIC(1) = -173.15, and AIC(2) = -172.36. Since AIC(1) is the minimum, the AR(1) model is accepted and selected for further analysis. Actually, in this step the AR(0) model is conclusively disregarded but, as said above, the AR(1) and AR(2) models are about the same.

STEP (3e). The correlogram ρ̂_k(z) of the fitted AR(1) model is obtained from Eq. (4.14) and shown in Fig. 4.7 together with the sample correlogram r_k(z) determined at step (1d). The correlogram ρ̂_k(z) of the AR(2) model is also shown for comparison. Actually, the correlogram of the AR(2) model is closer to the historical correlogram than that of the AR(1) model. In this sense one may wish to reconsider the AR(2) model, especially because the AIC(1) and AIC(2) of the previous step are close to each other. In this example, though, we will continue the analysis with the AR(1) model.

Figure 4.7. Sample correlogram r_k(z) and fitted correlogram ρ̂_k(z) for the AR(1) and AR(2) models for the series z_t of Eq. (4.45). Data series of the Göta River, Sweden.

(4) Optional Tests of the Model

The optional tests of the model based on data generation were not used in this example. However, we will show how synthetic annual flows can be generated based on the selected model. The fitted AR(1) model for the annual flows of the Göta River is

y_t = 0.9528 + z_t

z_t = 0.483 z_{t-1} + 0.179 ξ_t

where ξ_t is normal with mean zero and variance one. From tables of standardized normal variables we pick say 10 values of ξ.
That is, ξ₁ = 0.414, ξ₂ = -1.288, ξ₃ = 1.019, ξ₄ = 0.610, ξ₅ = -0.289, ξ₆ = 1.970, ξ₇ = -0.659, ξ₈ = 0.595, ξ₉ = -0.651 and ξ₁₀ = 0.906. Then, assuming z₀ = 0, we have

z₁ = 0.483 (0.) + 0.179 (0.414) = 0.074
z₂ = 0.483 (0.074) + 0.179 (-1.288) = -0.195
z₃ = 0.483 (-0.195) + 0.179 (1.019) = 0.088

Similarly, the rest of the values of z_t are obtained; they are shown below. Finally, we get the synthetic streamflows y_t = 0.9528 + z_t, as shown below.

  t      z_t       y_t
  1     0.074     1.027
  2    -0.195     0.758
  3     0.088     1.041
  4     0.153     1.106
  5     0.022     0.975
  6     0.363     1.316
  7     0.057     1.010
  8     0.134     1.087
  9    -0.052     0.901
 10     0.137     1.090

(5) Reliability of Model Parameters

STEP (5a). The standard deviation s(ȳ) of the sample mean ȳ is computed from Eq. (4.53). For σ̂² = 0.0357, N = 50, and φ̂₁ = 0.483, Eq. (4.53) gives Var(ȳ) = 0.001996 or s(ȳ) = 0.0447.

STEP (5b). From t-tables for α = 5 percent, t₀.₉₇₅(49) = 2.010; therefore expression (4.54) gives the 95 percent confidence interval of the mean μ as

[0.9528 - 2.01(0.0447); 0.9528 + 2.01(0.0447)] or [0.8629; 1.0426].

Since p = 1, skip step (5c) and proceed with step (5d') to determine the reliability of the remaining parameters based on the method of maximum likelihood.

STEP (5d'). For p = 1, Eq. (4.69) gives the variance of φ̂₁ as Var(φ̂₁) = 0.015; thus the standard deviation of φ̂₁ is s(φ̂₁) = 0.1225.

STEP (5e'). From normal tables for α = 5 percent, u₀.₉₇₅ = 1.96; therefore expression (4.59) gives the 95 percent confidence interval of the parameter φ₁ as [0.483 - 1.96(0.1225), 0.483 + 1.96(0.1225)] or [0.243, 0.723].

STEP (5f'). Using the confidence interval [0.243, 0.723] for φ₁ determined in step (5e'), and the values D₁₁ = 2.079 and D₁₂ = 1.037 computed in step (2d'), the approximate 95 percent confidence interval for σ²(ε) can be determined from Eq.
(4.65) as

[ (2.079 - 0.723 × 1.037)/49 ;  (2.079 - 0.243 × 1.037)/49 ]  or  [0.0271, 0.0373]

4.2.7 LIMITATIONS OF ANNUAL AR MODELING

Generally, every stochastic model has some limitations that the analyst should be aware of for better practical applications of that model. Some of the limitations are inherent to a particular model and others are general and applicable to other models as well (either univariate or multivariate). Therefore, some of the limitations discussed herein are also given in other sections and chapters of this book.

It is generally assumed that AR(p) models "preserve" in the statistical sense the historical mean, standard deviation and the first p serial correlation coefficients. For instance, the AR(1) model would preserve the mean ȳ, standard deviation σ̂ and the first serial correlation coefficient r₁. However, this is only true (especially for the serial correlations) when the method of moments is used for estimating the parameters of the model. In that case, we saw for instance that the estimate of φ₁ for the AR(1) model was exactly equal to r₁, or φ̂₁ = r₁. This implies that the fitted AR(1) model correlogram would have ρ̂₁ = r₁, or that the model correlogram would pass through the historical correlogram at k = 1. Similarly, the AR(p) model correlogram would pass through the historical correlogram for k = 1, ..., p. However, when the estimation method of maximum likelihood is used, the model correlogram ρ̂_k will not necessarily be the same as the historical correlogram r_k for the first k = 1, ..., p. For instance, for the AR(1) model φ̂₁ ≠ r₁ and generally φ̂₁ > r₁. This would imply a better fit of the correlogram in some cases. In conclusion, one cannot say in general that the AR(1) model "preserves" the first serial correlation coefficient. Perhaps a more appropriate statement would be that in some cases the AR(1) model can resemble the historical correlogram and in some other cases it cannot.
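The pass-through property under the method of moments is easy to verify numerically. In the sketch below (my own illustration, not the book's computation), an AR(1) is fitted by moments to the Göta River sample correlogram of Table 4.2, so φ̂₁ = r₁ = 0.397; the model correlogram ρ̂_k = φ̂₁^k then matches the sample exactly at k = 1 but decays geometrically and departs from the wavy sample values at larger lags.

```python
# sample correlogram r_k of the Göta River annual flows (Table 4.2, lags 1-5)
r_sample = [0.397, -0.013, -0.004, -0.007, -0.108]

# method-of-moments AR(1) fit: phi1-hat equals the sample r_1 exactly
phi1 = r_sample[0]

# model correlogram rho_k = phi1**k for the same lags, Eq. (4.14)
rho_model = [phi1 ** k for k in range(1, len(r_sample) + 1)]

# exact agreement at lag 1, growing disagreement afterwards
diffs = [abs(a - b) for a, b in zip(rho_model, r_sample)]
```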
By resemblance we mean that the model and historical correlograms should be similar in magnitude and shape. The above conclusion and statement is also applicable in general to AR(p) models. We will still use the word "preserve" in some cases, but when referring to the correlogram we will generally use the word "resemblance."

As said above, in some cases AR models may or may not be able to resemble the historical correlogram, but such resemblance could occur only in part of the correlogram. That is, the AR model correlogram ρ̂_k could be similar to r_k only for, say, the first few lags and depart for larger lags. (This may be so because in general the AR models are of "short memory," which means that their autocorrelation functions decay fairly fast as the time lag increases. It has been shown in practice that if the historical record has a long-term dependence, that is, if the historical correlogram r_k "persists" or does not decay to zero rapidly, then the fitted AR models generally are able to resemble r_k for small lags only, but not for larger lags. Consequently, this could result in generated droughts or generated storage capacities statistically smaller than the historical droughts or historical storage capacity.) Therefore, in some cases the analyst may need either to change the parameters of the AR model or to select other models beyond the AR model. For instance, if for the estimated parameter φ̂₁ of an AR(1) model the generated average drought length L̄g(N) (for a given demand level) is significantly smaller than the historical average drought length L̄h(N), and the analyst considers this unacceptable because the model does not "reproduce" the average historical drought, then it is possible to find a φ₁' > φ̂₁ which could produce L̄g(N) statistically equal to L̄h(N). This can only be obtained at the expense of distorting the short-term dependence structure.
In other words, the new or modified model correlogram ρ_k' (equal to (φ₁')^k) will not resemble r_k for small lags, although it may better resemble r_k for larger lags. If the analyst considers such distortion unimportant, then he would be satisfied with the performance of the AR(1) model with the new parameter φ₁'. Otherwise, he may wish to consider alternative models such as the ARMA or ARIMA models described in Chapters 5 and 6.

Another factor which is important to consider in practical applications of AR models (or any other model, for that matter) is the modeling of skewed variables. Three approaches have been suggested in practice for modeling skewed variables: (1) to transform the skewed data into normal by an appropriate transformation (see Sec. 3.2) and model the transformed data; (2) to apply the AR model to the original skewed data and find the distribution of the residuals; and (3) to use a relationship (if available) between the first two moments of the original skewed series and those of the normal series, so that the moments of the original skewed series are preserved. There are arguments for and against each of the above approaches, and of course the analyst should weigh those arguments in each case to decide on one or another approach. For instance, it is desirable to work with series transformed to normal because the estimation of parameters and the goodness of fit tests are more efficient than for skewed variables. On the other hand, when synthetic samples are generated with an AR model based on a transformed series, biases are produced in the mean, variance and serial correlations of the original series. Another argument against the second approach, that of modeling the skewed time series directly, is that AR models are based on normal time series and most of the goodness of fit tests are also based on the normality of the series analyzed.
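For the third approach, the moment relations for a lognormal variable can be sketched as follows. These are the standard lognormal moment relations, in the spirit of Matalas (1967); the function name is mine, and the exact equations used in the book should be checked against the original reference.

```python
import math

def lognormal_ar1_params(mean_x, var_x, r1_x):
    # parameters of Y = ln X chosen so that the mean, variance and lag-one
    # correlation of the (assumed lognormal) series X are preserved
    cv2 = var_x / (mean_x * mean_x)          # squared coefficient of variation
    var_y = math.log(1.0 + cv2)
    mean_y = math.log(mean_x) - 0.5 * var_y
    r1_y = math.log(1.0 + r1_x * cv2) / var_y
    return mean_y, var_y, r1_y

# hypothetical skewed series with mean 1.0, variance 0.25 and r1 = 0.4
mean_y, var_y, r1_y = lognormal_ar1_params(1.0, 0.25, 0.4)
```

An AR(1) model fitted in log space with these parameters, then exponentiated, reproduces the stated moments of the original skewed series rather than the moments of the logs.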
However, even knowing these limitations, in some cases the application of AR models directly to the original skewed time series may be useful and necessary. For instance, if for a particular case modeling with the transformed series yields large biases of the main statistical characteristics of the series, then one alternative to avoid those biases could be to use the AR model directly with the original series. Another alternative would be to use the third approach. In the latter case, Matalas (1967) showed how to estimate the parameters of the log-transformed series so that the historical parameters of the original series (assumed log-normal) are preserved.

In discussing the properties of AR models in Sec. 4.1.2 and the estimation of parameters in Sec. 4.2.2, we said that the autoregression coefficients φ_j, j = 1, ..., p, must fulfill the so-called stationarity conditions. For instance, for the AR(2) model, φ₁ and φ₂ must meet the conditions given in expression (4.24). However, there may be cases in practice in which such conditions are not met by the estimated parameters. In such cases, one may decide to change the parameter estimation procedure or try an alternative model, either within the class of AR models or among other models such as the ARMA or ARIMA models.

One final comment on using AR models for modeling annual time series is that the analyst should not forget the physical aspects behind the occurrence of such time series. For instance, the annual streamflow for a given year is the result of the effective precipitation occurring in that year and the contribution of previous years' precipitation carried through groundwater storage. This, added to the effect of surface storages in the river basin, usually yields a time series of streamflow with a positive (either small or large) time dependence structure. It is hard to conceive of an annual streamflow time series with negative time dependence.
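How likely a negative sample r₁ is, even when the true dependence is small and positive, can be gauged with standard large-sample approximations (E[r₁] ≈ ρ₁ - (1+3ρ₁)/N and Var(r₁) ≈ (1-ρ₁²)/N). These approximations are my addition for illustration, not the book's equations.

```python
import math
from statistics import NormalDist

def prob_negative_r1(rho1, n):
    # approximate probability that the sample lag-one correlation is negative
    mean_r1 = rho1 - (1.0 + 3.0 * rho1) / n      # small-sample bias of r_1
    sd_r1 = math.sqrt((1.0 - rho1 * rho1) / n)
    return NormalDist(mean_r1, sd_r1).cdf(0.0)

# for a true rho_1 = 0.2 and N = 50 years, a negative sample r_1 is not rare
p = prob_negative_r1(0.2, 50)
```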
However, because such positive dependence is usually small (say of the order of 0.1-0.3), and because the computed or sample correlation, say r₁, is a random variable, it is possible that for a given historical sample of an annual time series r₁ may be negative. If the analyst does not consider the above-mentioned physical and statistical realities, he may go ahead and use that negative r₁ value for estimating, say, φ̂₁ = r₁ (for the AR(1) model) and generate annual flows with a negative dependence structure. This could seriously underestimate such characteristics as droughts and storage capacities and lead to wrong decisions. Therefore, in situations as given above, when r₁ is negative, the analyst should use experience and judgment, and preferably estimates of r₁ on a regional basis, for estimating the parameters of the generating model.

4.2.8 PRACTICAL APPLICATIONS OF ANNUAL AR MODELS

AR models of annual time series are needed to generate synthetic annual series for the purposes of planning and operating water resources systems with over-year regulation schemes. For example, the high Aswan Dam in Egypt was designed for over-year regulation and the analysis of annual streamflows was required (Hurst, 1951). The design of a water resource system with over-year regulation may require operational studies at the seasonal level. In such cases, one needs to generate annual AR series first and subsequently disaggregate the annual series into seasonal values. The specific topic of disaggregation is discussed in Chapter 8.

Some practical design and/or operational problems of water resource systems consisting of several reservoirs require the aggregation of the various reservoirs into a large equivalent reservoir. The inflow to the large reservoir is the aggregation of the inflows to the small reservoirs. In such cases, univariate AR models can be used for generating synthetic time series of the aggregated streamflows.
Other operational problems may also suggest the use of annual AR models. For instance, to determine the reliability of the dependable capacity of a hydroelectric system we may need to generate synthetic time series at various sites. Such reliability can be obtained without difficulty by generating several traces using a multivariate model, which generally consists of a large number of parameters. However, due to uncertainty in those parameters, such reliability is not well known, and the uncertainty in the parameters must be taken into account. To take into account the uncertainty of the parameters of a multivariate model is not a simple task. Thus, a simple approach to the above problem may be to aggregate the annual flows at the various sites into a single aggregated annual time series and fit, say, an AR(1) model to such a series. In this case we need to consider the uncertainty of only the mean μ̂, the standard deviation σ̂ and the autoregression coefficient φ̂₁. By sampling from the distribution of this small number of parameters, and based on the AR(1) model, we can generate aggregated annual flows and disaggregate these flows (spatially, to obtain the flows at the various sites, and temporally, if needed, to obtain the flows for each season). By repeating such generations say 100 times for each sampled parameter set {μ̂, σ̂, φ̂₁}, the accuracy of the reliability of the dependable capacity of the above-mentioned hydroelectric system can be determined.

4.3 AR MODELING OF PERIODIC TIME SERIES

4.3.1 PERIODIC AR MODELS

Periodic hydrologic series such as seasonal, monthly, weekly and daily series generally have periodicity in the mean and in the standard deviation. They may be symmetric or skewed, with either constant or periodic skewness, and they may have an autoregressive time dependence structure with either constant or periodic autoregression coefficients.
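Before the formal definitions, the mechanics of a periodic AR model can be previewed with a small generator: each season τ has its own mean, standard deviation and (here) its own AR(1) coefficient. This is an illustrative sketch with invented parameter values, not data from the book.

```python
import random

def gen_periodic_ar1(mu, sd, phi1, sd_eps, n_years, rng):
    # mu, sd, phi1, sd_eps: length-w lists of periodic parameters
    w = len(mu)
    z, series = 0.0, []
    for _ in range(n_years):
        for tau in range(w):
            # standardized series with periodic AR(1) dependence
            z = phi1[tau] * z + sd_eps[tau] * rng.gauss(0.0, 1.0)
            # de-standardize with the periodic mean and standard deviation
            series.append(mu[tau] + sd[tau] * z)
    return series

# quarterly example (w = 4) with invented parameters
rng = random.Random(3)
flows = gen_periodic_ar1([10.0, 40.0, 25.0, 12.0], [2.0, 9.0, 5.0, 3.0],
                         [0.3, 0.5, 0.6, 0.4], [0.95, 0.87, 0.80, 0.92],
                         20, rng)
```

With the residual terms set to zero the generator reproduces the periodic means exactly, which is a convenient sanity check.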
We will use AR models so as to take into account the above-mentioned characteristics.

Let us consider the periodic series x_{ν,τ}, where ν denotes the year and τ denotes the time interval within the year (day, week, month, season, etc.), with τ = 1, ..., ω, where ω is the number of time intervals within the year (for instance, ω = 4 for quarterly seasonal series). Assume that the variable is skewed and the transformation function g_τ is used to transform it to normal. The normalized series y_{ν,τ} is

y_{ν,τ} = g_τ(x_{ν,τ})    (4.82)

Furthermore, the periodic normal series y_{ν,τ} is written as

y_{ν,τ} = μ_τ + σ_τ z_{ν,τ}    (4.83)

where μ_τ and σ_τ are the periodic mean and periodic standard deviation, and z_{ν,τ} represents the time dependent series with mean zero and variance one. The time series z_{ν,τ} may be represented by an AR model with either constant or periodic autoregressive coefficients.

AR Model with Constant Coefficients

Consider the series z_{ν,τ} of Eq. (4.83) represented by z_t with t = (ν-1)ω + τ. Such a series may be modeled by AR models with constant coefficients as in the case of annual series. For instance, z_t can be simply the AR(0) model of Eq. (4.38), or the AR(1) model of Eq. (4.39), or in general the AR(p) model of Eq. (4.41):

z_t = φ₁ z_{t-1} + φ₂ z_{t-2} + ... + φ_p z_{t-p} + ε_t    (4.84)

In general, the AR(p) model comprised by Eqs. (4.82), (4.83) and (4.84) has the parameter set {g_τ, μ_τ, σ_τ, τ = 1, ..., ω; φ₁, ..., φ_p; σ²(ε)}, which must be estimated from sample data.

AR Model with Periodic Coefficients

In this case, the model for the series x_{ν,τ} consists of Eqs. (4.82) and (4.83), but the series z_{ν,τ} would be either the AR(0) model,

z_{ν,τ} = ξ_{ν,τ}    (4.85)

(which is actually the same as Eq. (4.38)), or the AR(1) model,

z_{ν,τ} = φ_{1,τ} z_{ν,τ-1} + σ_τ(ε) ξ_{ν,τ}    (4.86)

or the AR(2) model,

z_{ν,τ} = φ_{1,τ} z_{ν,τ-1} + φ_{2,τ} z_{ν,τ-2} + σ_τ(ε) ξ_{ν,τ}    (4.87)

or in general the AR(p) model,

z_{ν,τ} = φ_{1,τ} z_{ν,τ-1} + ... + φ_{p,τ} z_{ν,τ-p} + σ_τ(ε) ξ_{ν,τ}    (4.88)

where the φ_{j,τ}'s are the periodic autoregression coefficients and σ²_τ(ε) is the periodic variance of the residuals. In general, the AR(p) model of Eqs.
(4.82), (4.83) and (4.88) has the parameter set {g_τ, μ_τ, σ_τ, φ_{1,τ}, ..., φ_{p,τ}, σ_τ(ε); τ = 1, ..., ω}, which must be estimated from data.

4.3.2 PARAMETER ESTIMATION FOR PERIODIC AR MODELS

Assume that there is an available sample of a periodic hydrologic time series denoted by x_{ν,τ}, ν = 1, ..., N and τ = 1, ..., ω, where N is the total number of years of data and ω is the number of time intervals within the year (for instance, ω = 12 for monthly data). With this sample the parameters of the periodic AR model will be estimated.

In the two categories of models described above, the transformation to normality of the original data x_{ν,τ} should be made (if needed) prior to the parameter estimation, unless by necessity the model is applied directly to the original skewed series x_{ν,τ}. Also, for the two model categories the estimation of the periodic mean μ_τ and of the periodic standard deviation σ_τ is a common step, and this is described in the first subsection below. Once the periodic mean and periodic standard deviation are estimated, it remains to estimate either the constant or the periodic parameters of the series z. If the AR model has constant parameters as in Eq. (4.84), the estimation of these parameters can be made either by the method of moments or by the method of maximum likelihood as described in Sec. 4.2.2; hence it will not be repeated herein. On the other hand, if the AR model has periodic parameters as in Eq. (4.88), their estimation will be made either by the method of moments or by using Fourier series, as described in the first two subsections below. Likewise, the reliability of the estimated parameters is given in the third subsection, only for the case of periodic parameters, since Sec. 4.2.2 described the case of constant parameters.

Estimation of the Periodic Mean and Periodic Standard Deviation

The estimates μ̂_τ and σ̂_τ of the periodic mean μ_τ and the periodic standard deviation σ_τ, respectively, of the model of Eq.
(4.83) may be determined in two ways: (i) by computing the sample mean ȳ_τ and sample standard deviation s_τ for each time interval τ = 1, ..., ω, or (ii) by fitting Fourier series to ȳ_τ and s_τ. These two procedures have been discussed at length in Chapter 3; therefore, only a summary is given in the following text.

In the first procedure, the estimate μ̂_τ of the periodic mean and the estimate σ̂_τ of the periodic standard deviation are obtained from the sample mean ȳ_τ and the sample standard deviation s_τ. From Eqs. (2.7) and (2.8) these estimates are

ȳ_τ = (1/N) Σ_{ν=1}^{N} y_{ν,τ} ,  τ = 1, ..., ω    (4.89)

and

s_τ = [ (1/(N-1)) Σ_{ν=1}^{N} (y_{ν,τ} - ȳ_τ)² ]^{1/2} ,  τ = 1, ..., ω    (4.90)

where N is the total number of years of available data. Generally, the above estimates are used when ω is small, say ω = 2, 3, 4, 6 or 12. However, when ω is equal to or greater than 12, say in the case of monthly, biweekly, weekly or daily data, it is often convenient to fit the estimates ȳ_τ and s_τ by Fourier series.

In the second estimation procedure of μ_τ and σ_τ, the sample mean ȳ_τ and sample standard deviation s_τ of Eqs. (4.89) and (4.90), respectively, are fitted by Fourier series following the procedure described in Sec. 3.3. In such case, the Fourier series estimate of the periodic mean μ_τ of Eq. (4.83) is determined by

μ̂_τ = ȳ + Σ_{j=1}^{h*(y)} [ A_{h_j}(y) cos(2π h_j(y) τ/ω) + B_{h_j}(y) sin(2π h_j(y) τ/ω) ] ,  τ = 1, ..., ω    (4.91)

where ȳ, determined by Eq. (3.31), is the mean of ȳ_τ; A_{h_j}(y) and B_{h_j}(y) are the h_j-harmonic Fourier coefficients for the mean, which are determined by Eqs. (3.32) and (3.33); h_j(y) is the jth significant harmonic for the mean; and h*(y) is the number of significant harmonics out of a maximum of ω/2 or (ω-1)/2, depending on whether ω is even or odd, respectively.

Similarly, the Fourier series estimate of the periodic standard deviation σ_τ of Eq. (4.83) is determined by

σ̂_τ = s̄ + Σ_{j=1}^{h*(s)} [ A_{h_j}(s) cos(2π h_j(s) τ/ω) + B_{h_j}(s) sin(2π h_j(s) τ/ω) ] ,  τ = 1, ..., ω    (4.92)

where s̄, determined by Eq.
(3.31), is the mean of s_τ; A_{h_j}(s) and B_{h_j}(s) are the h_j-harmonic Fourier coefficients for the standard deviation, which are determined by Eqs. (3.32) and (3.33); h_j(s) is the jth significant harmonic for the standard deviation; and h*(s) is the corresponding number of significant harmonics. As said before, the harmonics that enter into Eqs. (4.91) and (4.92) are only those which are statistically significant: h*(y) of them for μ̂_τ and h*(s) of them for σ̂_τ. The criteria and procedure for determining these total numbers of harmonics h*(y) and h*(s) are given in detail in Sec. 3.3.3.

Estimation of the Autoregression Coefficients and Residual Variance

The estimation of the autoregression coefficients and residual variance in the case of the model with constant parameters (Eq. 4.84) was given in Sec. 4.2.2. Therefore, this section concentrates only on the case of the model with periodic parameters (Eq. 4.88). By applying the method of moments, the periodic autoregression coefficients are found to be functions of the periodic correlation coefficients. These periodic correlation coefficients can be either the sample estimates or, as in the case of the periodic mean and periodic standard deviation, Fourier series fitted to the sample estimates.

Once the parameters μ_τ and σ_τ of the model (4.83) are estimated, either by Eqs. (4.89) and (4.90) or by Eqs. (4.91) and (4.92), respectively, the remaining parameters φ_{1,τ}, ..., φ_{p,τ} and σ²_τ(ε), τ = 1, ..., ω, of the model (4.88) are estimated by using the standardized sequence

z_{ν,τ} = (y_{ν,τ} - μ̂_τ)/σ̂_τ ,  ν = 1, ..., N;  τ = 1, ..., ω    (4.93)

With this sequence z_{ν,τ} the periodic correlation coefficients r_{k,τ} (or ρ̂_{k,τ}) are computed by Eq. (2.10). The periodic autoregression coefficients φ_{1,τ}, ..., φ_{p,τ} are estimated by solving the system of p linear equations resulting from Eq. (4.35), where the population periodic correlation coefficients ρ_{k,τ} are replaced by their estimates ρ̂_{k,τ}.

In particular, for the AR(1) model of Eq.
(4.86), the solution of Eq. (4.35) gives

  φ̂_{1,τ} = ρ̂_{1,τ} ,  τ = 1, ..., ω   (4.94)

Similarly, for the AR(2) model of Eq. (4.87), the solution of Eq. (4.35) gives

  φ̂_{1,τ} = (ρ̂_{1,τ} − ρ̂_{1,τ−1} ρ̂_{2,τ}) / (1 − ρ̂²_{1,τ−1})

and                                                             (4.95)

  φ̂_{2,τ} = (ρ̂_{2,τ} − ρ̂_{1,τ} ρ̂_{1,τ−1}) / (1 − ρ̂²_{1,τ−1}) ,  τ = 1, ..., ω

In Eqs. (4.94) and (4.95), or in general in solving Eq. (4.85), the estimates ρ̂_{1,τ}, ..., ρ̂_{p,τ} are either equal to the sample correlation coefficients r_{1,τ}, ..., r_{p,τ} computed by Eq. (2.10), or they may be obtained by fitting Fourier series to the r's. In this latter case the periodic correlation coefficients are determined by

  ρ̂_{k,τ} = r̄_k + Σ_{j=1}^{h*(r)} [ A_{h_j}(r) cos(2π h_j(r) τ/ω) + B_{h_j}(r) sin(2π h_j(r) τ/ω) ]   (4.96)

for k = 1, ..., p and τ = 1, ..., ω, where r̄_k, determined by Eq. (3.31), is the mean of r_{k,τ}; A_{h_j}(r) and B_{h_j}(r) are the h_j-harmonic Fourier coefficients for the correlation r_{k,τ}, which are determined by Eqs. (3.32) and (3.33); h_j(r) is the jth significant harmonic; and h*(r) is the corresponding total number of significant harmonics. These significant harmonics can be determined by the procedure given in Section 3.3.3. Note that the above estimated periodic autoregression coefficients should meet the same conditions as suggested in Section 4.1.2.

Finally, the residual variance σ̂²_τ(ε) may be estimated by Eq. (4.32) with the estimates φ̂_{j,τ} and ρ̂_{j,τ} replacing φ_j and ρ_j, respectively. Hence, in general,

  σ̂²_τ(ε) = 1 − Σ_{j=1}^{p} φ̂_{j,τ} ρ̂_{j,τ} ,  τ = 1, ..., ω   (4.97)

In particular, for the AR(1) model, Eqs. (4.94) and (4.97) give

  σ̂²_τ(ε) = 1 − ρ̂²_{1,τ} ,  τ = 1, ..., ω   (4.98)

Reliability of Estimated Parameters

The reliability of the estimates μ̂_τ, σ̂_τ and φ̂_{j,τ}, for j = 1, ..., p and τ = 1, ..., ω, is given in the form of confidence intervals of the corresponding population parameters. Since periodic hydrologic time series have periodic parameters in addition to time dependence, the exact confidence intervals of the parameters are not simple to obtain. Therefore, the equations given are only approximations. The variance of the estimated periodic mean μ̂_τ is computed by

  Var(μ̂_τ)
= σ̂²_τ / N ,  τ = 1, ..., ω   (4.99)

where σ̂²_τ is the estimate of the periodic variance. By using the t-distribution, the (1−α) confidence interval for the parameter μ_τ is given by

  [ μ̂_τ − t(N−1)_{1−α/2} · s(μ̂_τ) ;  μ̂_τ + t(N−1)_{1−α/2} · s(μ̂_τ) ]   (4.100)

where s(μ̂_τ) = [Var(μ̂_τ)]^{1/2} and t(N−1)_{1−α/2} is the 1−α/2 quantile of the t-distribution with N−1 degrees of freedom.

The residuals ε̂_{ν,τ} of the fitted AR model are determined in general by Eq. (4.88) with the parameters φ_{j,τ} and σ_τ(ε) replaced by the corresponding estimates φ̂_{j,τ} and σ̂_τ(ε), respectively. Therefore, ε_{ν,τ} of Eq. (4.88) may be written as

  ε̂_{ν,τ} = (1/σ̂_τ(ε)) ( z_{ν,τ} − φ̂_{1,τ} z_{ν,τ−1} − ... − φ̂_{p,τ} z_{ν,τ−p} )   (4.103)

where the notation ε̂_{ν,τ} is used instead of ε_{ν,τ} to indicate the sample estimates of the residuals. For the AR(1) model, ε̂ of Eq. (4.103) would be obtained beginning at ε̂_{1,2}; for the AR(2) model the start would be at ε̂_{1,3}; and in general for the AR(p) model Eq. (4.103) would start at ε̂_{1,p+1}.

For testing whether the residuals are independent, two approaches may be used. One approach is to compute the correlation coefficients r_{1,τ}(ε̂), τ = 1, ..., ω, by Eq. (2.10) and check whether they are statistically equal to zero. In this case the critical correlation coefficient can be obtained by (Yevjevich, 1972; p. 239)

  r_{1−α/2} = t(N−2)_{1−α/2} / [ N − 2 + t²(N−2)_{1−α/2} ]^{1/2}   (4.104)

where t(N−2)_{1−α/2} is the (1−α/2) quantile of the t-distribution with N−2 degrees of freedom and α is the significance level. Therefore, if |r_{1,τ}(ε̂)| < r_{1−α/2}, the hypothesis of independent residuals is accepted.

STEP (1d). Compute the sample estimates of the mean ȳ_τ by Eq. (4.89), the standard deviation s_τ by Eq. (4.90), and the periodic correlation coefficients r_{k,τ}(y) by Eq. (2.10) for k = 1, 2 and τ = 1, ..., ω.

STEP (1e). Plot the estimates ȳ_τ and s_τ versus τ. Based on their graphical display and on the number of time intervals within the year, decide whether to fit Fourier series to the estimates ȳ_τ and s_τ.

STEP (1f). Plot the estimates r_{k,τ} versus τ and by visual inspection decide whether the AR model to be used would have either constant or periodic autoregression coefficients.
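Steps (1d) and (1e) above, and the Fourier fit used in step (2a) below, can be sketched in a few lines. This is an illustrative sketch rather than code from the text: the function names are ours, and the Fourier coefficient formulas are the standard finite-Fourier ones, which we assume correspond to Eqs. (3.32) and (3.33).

```python
import math

def sample_periodic_stats(y):
    """Eqs. (4.89)-(4.90): y is a list of N yearly lists, each of length w.
    Returns the periodic means and sample standard deviations."""
    N, w = len(y), len(y[0])
    ybar = [sum(y[v][t] for v in range(N)) / N for t in range(w)]
    s = [math.sqrt(sum((y[v][t] - ybar[t]) ** 2 for v in range(N)) / (N - 1))
         for t in range(w)]
    return ybar, s

def fourier_fit(v, harmonics):
    """Fourier-series fit of a seasonal curve v[0..w-1] (tau = 1..w) using
    only the listed harmonics, as in Eqs. (4.91)-(4.92)."""
    w = len(v)
    fit = [sum(v) / w] * w                      # overall mean (Eq. 3.31)
    for h in harmonics:
        # standard finite-Fourier coefficients (assumed Eqs. 3.32-3.33)
        A = (2.0 / w) * sum(v[t] * math.cos(2 * math.pi * h * (t + 1) / w)
                            for t in range(w))
        B = (2.0 / w) * sum(v[t] * math.sin(2 * math.pi * h * (t + 1) / w)
                            for t in range(w))
        for t in range(w):
            ang = 2 * math.pi * h * (t + 1) / w
            fit[t] += A * math.cos(ang) + B * math.sin(ang)
    return fit
```

With all significant harmonics retained, the fitted curve reproduces the sample curve; dropping the insignificant harmonics is what reduces the number of parameters.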
If a Fourier series fit is used for ȳ_τ and s_τ, and if the AR model is chosen with periodic autoregression coefficients, then a Fourier series fit can also be made to the estimates r_{k,τ} of the periodic correlation coefficients.

(2) Estimation of Parameters

The main purposes of this part are to estimate the periodic mean and periodic standard deviation by Fourier series (if so decided in step 1e), to select the order of the AR model, to estimate its corresponding autoregression coefficients, and to check the conditions that must be met by the estimated parameters.

STEP (2a). If μ_τ and σ_τ of Eq. (4.83) are estimated by ȳ_τ and s_τ, respectively (step 1e), continue with step (2b). On the other hand, if the periodic parameters are to be fitted by Fourier series, follow the procedure described in Sec. 3.3.3.

STEP (2b). Determine the standardized series z_{ν,τ} from Eq. (4.93) and continue with step (2c) or (2c'), depending on whether the AR model to be fitted to z_{ν,τ} has constant or periodic coefficients, respectively (step 1f).

STEP (2c). Determine the parameters of an AR model with constant autoregression coefficients. In this case, follow the modeling procedure given in Sec. 4.2.2 for the case of annual series.

STEP (2c'). Determine the parameters of an AR model with periodic autoregression coefficients. In this case, follow the steps below.

(i) Compute and plot the sample correlation coefficients r_{k,τ}(z) of z_{ν,τ} from Eq. (2.10) for k = 1, ..., p' and τ = 1, ..., ω, where p' is the maximum number of lags and p' < ω. Usually p' ≈ 3 as long as ω > 3.

(ii) Select the order p of the AR model to be tested. In practice it is difficult to identify the order p unless a detailed analysis is made of the correlation coefficients r_{k,τ}(z) and the analyst has some experience. In most cases a low order model, say p = 1, is selected. A practical guide for selecting p is based on experience in modeling various types of hydrologic series.
For instance, the seasonal correlation coefficients of precipitation are often insignificant; therefore an order p = 0 may be selected in such a case. On the other hand, the seasonal correlation coefficients of streamflow are in most cases significant; therefore an order p = 1 may be selected for further analysis.

(iii) If from steps (1e) and (2a) the periodic parameters are μ̂_τ = ȳ_τ and σ̂_τ = s_τ, then set ρ̂_{k,τ} = r_{k,τ}(z) and continue with step (iv). On the other hand, if μ̂_τ and σ̂_τ are fitted by Fourier series, then ρ̂_{k,τ} may also be obtained by Fourier series. In such a case, follow the procedure described in Sec. 3.3.3.

(iv) With the ρ̂_{k,τ} obtained in step (iii), determine the periodic autoregression coefficients φ̂_{1,τ}, ..., φ̂_{p,τ} by solving Eq. (4.35). In particular, for p = 1 obtain φ̂_{1,τ} by Eq. (4.94), and if p = 2 obtain φ̂_{1,τ} and φ̂_{2,τ} by Eq. (4.95). Determine also the periodic residual variance σ̂²_τ(ε) by Eq. (4.97), or by Eq. (4.98) for p = 1.

(v) Test the conditions to be met by the estimated parameters φ̂_{1,τ}, ..., φ̂_{p,τ}. As an approximation, test the conditions given for the case of AR models with constant parameters. In addition, verify that the variance σ̂²_τ(ε) of Eq. (4.97) is positive.

(3) Tests of Goodness of Fit of Selected Model

The tests of goodness of fit of the model depend on whether the selected AR model has constant or periodic autoregressive coefficients. Therefore, depending on the case, follow the appropriate steps given below.

Model with Constant Autoregressive Coefficients

In this case follow steps (3a) through (3e) of Sec. 4.2.4. Continue with step (4).

Model with Periodic Autoregressive Coefficients

STEP (3a'). Determine the residuals ε̂_{ν,τ} of the model by Eq. (4.103), starting at ε̂_{1,p+1}.

STEP (3b'). Test the hypothesis that ε̂_{ν,τ} is independent by computing the correlations r_{1,τ}(ε̂), τ = 1, ..., ω, by Eq. (2.10) and comparing them with the critical correlation coefficient r_{1−α/2} of Eq. (4.104) for the significance level α = 5 percent. If |r_{1,τ}(ε̂)| < r_{1−α/2}, τ = 1, ...,
ω, the hypothesis of independent residuals is accepted. Alternatively, the overall correlogram r_k(ε̂), k = 1, ..., L, with L about 10-20 percent of the sample size, may be determined by Eq. (2.5b), and either the Anderson test or the Porte Manteau lack of fit test may be applied (see Sec. 3.5.1). If the hypothesis of independent residuals is accepted, continue with step (3c'); otherwise select a higher order model, say p+1, and go back to step (2c'). However, even if the hypothesis is rejected, the analyst may wish to proceed with subsequent test steps so that an overall analysis and judgment of the model can be made at the end.

STEP (3c'). Test the hypothesis that the residuals ε̂_{ν,τ} are normal by any of the tests given in Sec. 3.5.2. The normality test can be applied to the residuals of each time interval τ (a total of ω tests, one for each τ) or to the whole residual series ε̂_t, t = 1, ..., (Nω−p). The test of normality is often made by plotting the empirical frequency distribution of the residuals (either for each τ or as a whole) on normal probability paper. If the hypothesis of normality is accepted, proceed to step (4). Otherwise, go back to step (1b). However, the analyst may wish to relax the normality assumption and proceed with subsequent steps.

(4) Optional Tests of the Model

The main purpose of this part is to use the data generation technique for comparing the statistical characteristics of the historical series with the corresponding characteristics derived from the generated series. Statistical characteristics such as the periodic mean, periodic standard deviation, periodic skewness coefficient, constant or periodic correlation coefficients, low flow characteristics, and characteristics of the aggregated annual flow series are usually considered for comparison. The steps described below are applicable for comparing any of the above mentioned characteristics.
This part can be used also for determining experimentally the reliability of the parameters of the model (part (5) below).

STEP (4a). Determine the historical statistical characteristics u_i from the historical sample x_{ν,τ} and/or from the transformed series y_{ν,τ}, as the analyst judges convenient. Such a historical characteristic u_i may represent any of the above mentioned statistical characteristics, either in the form of periodic or annual values.

STEP (4b). Based on the selected AR model and the parameters estimated in part (2), generate say 100 sequences of periodic series of the same length as the historical sample. Follow the generation procedure outlined in Sec. 4.3.4.

STEP (4c). Determine the same statistical characteristics considered in step (4a), denoted as u_i(j), from each of the generated series j = 1, ..., 100.

STEP (4d). Compute the mean ū_i and the standard deviation s(u_i) of the u_i(j), and determine the interval

  [ ū_i − c · s(u_i) ;  ū_i + c · s(u_i) ]   (4.109)

where c can be equal to 1.0, 1.5 or 2 depending on how strict the test is. Alternatively, c could be taken as the standard normal variate of a given significance level, such as c = 1.64 for the 10 percent level.

STEP (4e). Check whether the historical characteristic u_i falls within the interval given by expression (4.109). If it does, then we can conclude that the AR model "preserves" the historical statistic u_i. Otherwise, the model does not preserve u_i, and in such a case the analyst may wish to
Alter- natively, the reliability of the estimated parameters can be obtained experimentally by data generation following a similar procedure as indicated in part (4) (in such case instead of computing the statistical characteristics for each generated series, the estimates of parameters of the fitted model are computed for each series) STEP (Sa). Compute the standard deviation s(fi,) of the sample periodic mean fi, by Eq. (4.99) STEP (Sb). Determine the (1-a) confidence interval for the periodic mean 1, from expression (4.100). The confidence level may be taken as a = 5 percent. STEP (5c). Determine the (1-a) confidence interval for the periodic variance of from expression (4.101). Depending on whether the selected AR model is of constant or periodic coefficients follow the appropriate steps shown below. Model with Constant Autoregressive Coefficients In this case, follow the steps (5d) through (5f) of Sec. 4.2.5 if the method of moments is used, or steps (5d') through (5f') if the method of maximum likelihood is used for estimating the parameters. Model with Periodic Autoregressive Coefficients STEP (5d). Determine the (1-a) confidence interval for the periodic correlation coefficients py, from expression (4.102) a STEP (Se). Determine the (1-a) confidence interval for the Periodic autoregression coefficients §, 7, k= 1, ..., p and t= 1, ..., w by substituting the (1-a) confidence interval for py, , (step 5d) into Eq. (4.35) in general, or in partic- ular into Eq. (4.94) if p= 1 or into Eqs. (4.95) if p= 2. 162 4.3.5 EXAMPLE OF AR MODELING OF PERIODIC SERIES Modeling of Monthly Net Basin Supplies for the Lakes Michigan-Huron The lakes Michigan-Huron is part of the Great Lakes system located on the border of the United States and Canada. They have a combined drainage area of 97,400 sq-mi and a water surface area of 45,300 sq-mi. 
The net basin supply is the water received by the lakes from precipitation on both their surface and drainage areas, less the net evaporation and condensation on the lakes' surface. Sixty-nine years of monthly net basin supply (NBS) are used for the analysis (Yevjevich, 1975); they are shown in Appendix A7.5 of Chapter 7.

(1) Preliminary Analysis and Model Identification

STEPS (1a) and (1b). The monthly skewness coefficients of the original data were determined, with an average skewness of about 0.6. Therefore, a transformation to reduce this skewness closer to zero may be desirable. A simple logarithmic or power transformation could be made, but difficulties arise due to some negative values present in the data. Therefore, it was decided to model the original data directly and to take care of the skewness by finding an appropriate distribution of the residuals.

STEP (1c). The monthly time series, denoted as y_{ν,τ}, is plotted in Fig. 4.8. It shows that during the spring and part of the summer the NBS are generally higher than during the rest of the year; this situation repeats itself every year in a periodic manner. It may also be observed that during the months of low NBS the variability is relatively large, and this may result in monthly standard deviations greater than the monthly means.

STEP (1d). The sample mean ȳ_τ, sample standard deviation s_τ and sample correlation coefficients r_{k,τ} were determined for k = 1, 2, 3 and τ = 1, ..., 12.

STEP (1e). The estimates ȳ_τ and s_τ, τ = 1, ..., ω, are plotted in Fig. 4.9. Since ω = 12, one would typically use ȳ_τ and s_τ as estimates of μ_τ and σ_τ, respectively, instead of a Fourier series fit. However, this is a matter of judgment of the analyst. In this case, we wish to keep the number of parameters of the model to a minimum; therefore the Fourier series fit of ȳ_τ and s_τ is made.

Figure 4.8. Time series of monthly net basin supply (NBS) of Lake Michigan-Huron for the period 1900-1919.
Figure 4.9. (1) Fitted periodic mean μ̂_τ, (2) periodic mean ȳ_τ, (3) fitted periodic standard deviation σ̂_τ, and (4) standard deviation s_τ, for the NBS of Lake Michigan-Huron.

STEP (1f). The estimates r_{k,τ} for k = 1, 2, 3 and τ = 1, ..., 12 are plotted in Fig. 4.10. It shows that r_{1,τ} does not significantly vary from month to month, although r_{2,τ} and r_{3,τ} show variability. However, since r_{1,τ} is more reliable than the other two, we will assume that in general the

Figure 4.10. Variation of the monthly first (1), second (2), and third (3) autocorrelation coefficients of the standardized series z_{ν,τ} for the NBS of Lake Michigan-Huron.

r_{k,τ} coefficients do not significantly vary from month to month. Therefore, the simple AR model with constant autoregressive coefficients is selected for further analysis.

(2) Estimation of Parameters

STEP (2a). Since μ_τ and σ_τ will be estimated by Fourier series fits of ȳ_τ and s_τ, respectively, the following calculations are made:

(i) The mean of ȳ_τ is calculated by Eq. (3.31) as ȳ = 109.787. Similarly, the mean of s_τ is s̄ = 65.608. The Fourier coefficients A_j(.) and B_j(.) and the corresponding explained variances V(h_j), j = 1, 2, ..., 6, for each of the periodic functions ȳ_τ and s_τ, were computed by the ratio of Eqs. (3.37) and (3.36), and are shown in Tables 4.4 and 4.5, respectively. Notice that since ω = 12 the total number of harmonics is 6.

Table 4.4. Fourier Coefficients and Explained Variances for the Monthly Mean ȳ_τ of NBS of Lake Michigan-Huron.

  HARMONIC j   COEFFICIENT A_j(y)   COEFFICIENT B_j(y)   EXPLAINED VARIANCE
  1            -93.848              84.036               90.916
  2            8.191                -34.842              7.339
  3            9.792                -0.353               0.550
  4            -0.911               6.393                0.239
  5            -8.567               -9.185               0.904
  6            -2.140               0.900                0.206

Table 4.5. Fourier Coefficients and Explained Variances for the Monthly Standard Deviation s_τ of NBS of Lake Michigan-Huron.
  HARMONIC j   COEFFICIENT A_j(s)   COEFFICIENT B_j(s)   EXPLAINED VARIANCE
  1            3.574                ...                  16.535
  2            14.685               ...                  73.870
  3            1.990                ...                  5.742
  4            1.580                ...                  1.396
  5            2.590                ...                  2.383
  6            0.000                ...                  0.037

(ii) The significance of the harmonics was tested using the graphical procedure suggested in Sec. 3.3.3. Figure 4.11 shows the cumulative periodogram P_j of Eq. (3.38) for both ȳ_τ and s_τ. Observing the periodograms, one would select two harmonics for ȳ_τ and three harmonics for s_τ. Instead, h* = 4 harmonics were

Figure 4.11. Cumulative periodogram for the mean ȳ_τ and standard deviation s_τ for the monthly NBS of Lake Michigan-Huron.

selected, because similar periodogram analyses made with the NBS of the other lakes of the Great Lakes system indicated altogether a better fit by using four harmonics. Therefore, the selected four (significant) harmonics for the mean and for the standard deviation were h₁ = 1, h₂ = 2, h₃ = 3 and h₄ = 5.

(iii) The A_j(.) and B_j(.) Fourier coefficients corresponding to the selected significant harmonics are obtained directly from Table 4.4 for the mean and from Table 4.5 for the standard deviation.

(iv) The fitted periodic mean μ̂_τ is computed by Eq. (4.91) and the fitted periodic standard deviation σ̂_τ is computed by Eq. (4.92), using the selected harmonics and the corresponding Fourier coefficients indicated above. For instance, the fitted μ̂_τ for τ = 1 is computed by Eq. (4.91) as

  μ̂₁ = 109.787 + [−93.848 cos(2π·1·1/12) + 84.036 sin(2π·1·1/12)]
      + [8.191 cos(2π·2·1/12) − 34.842 sin(2π·2·1/12)]
      + [9.792 cos(2π·3·1/12) − 0.353 sin(2π·3·1/12)]
      + [−8.567 cos(2π·5·1/12) − 9.185 sin(2π·5·1/12)] = 46.93.

Similarly, for τ = 2, Eq. (4.91) gives

  μ̂₂ = 109.787 + [−93.848 cos(2π·1·2/12) + 84.036 sin(2π·1·2/12)]
      + [8.191 cos(2π·2·2/12) − 34.842 sin(2π·2·2/12)]
      + [9.792 cos(2π·3·2/12) − 0.353 sin(2π·3·2/12)]
      + [−8.567 cos(2π·5·2/12) − 9.185 sin(2π·5·2/12)] = 95.25.

The fitted μ̂_τ
and σ̂_τ, τ = 1, ..., 12, are plotted together with ȳ_τ and s_τ, respectively, in Fig. 4.9.

STEP (2b). The series y_{ν,τ} is standardized by using Eq. (4.93) with the fitted μ̂_τ and σ̂_τ determined in step (2a), (iv). For instance, the standardized series z_{ν,τ} for ν = 1 and τ = 1 and τ = 2 are

  z_{1,1} = (y_{1,1} − μ̂₁)/σ̂₁ = (−1.0 − 46.93)/50.96 = −0.94

  z_{1,2} = (y_{1,2} − μ̂₂)/σ̂₂ = (162.0 − 95.25)/48.58 = 1.374

Similarly, z_{1,3} = −1.619, z_{1,4} = −0.364, ..., z_{69,12} = 1.686. This z_{ν,τ} series has approximately mean zero and variance one. Since the AR model with constant autoregressive coefficients was selected in step (1), we continue with step (2c) rather than step (2c').

STEP (2c).

(i) The time series z_{ν,τ} is now denoted as z_t, t = 1, ..., 828, since N' = Nω = 69 × 12 = 828. That is, z₁ = −0.94, z₂ = 1.374, ..., z₈₂₈ = 1.686.

(ii) The sample correlogram r_k(z), k = 1, ..., 40, of the series z_t is determined by Eq. (2.5b). For instance, r₁(z) = 0.229 and r₂(z) = 0.135. The correlogram is shown in Fig. 4.12(a).

Figure 4.12. (a) Correlogram of the standardized series z_t and expected correlograms for the first and second order Markov models; (b) correlograms of the series ε̂_t after fitting the first (1) and second (2) order Markov models (Lake Michigan-Huron).

(iii) Superimposed on Fig. 4.12(a) are the correlograms of an AR(1) model with parameter φ₁ = 0.229 and of an AR(2) model with parameters φ₁ = 0.209 and φ₂ = 0.087 (see step (iv) below and Sec. 4.1.2 for the computation procedure of the correlograms). Figure 4.12(a) shows that the correlogram of the AR(2) model is closer to r_k(z) of the historical record than that of the AR(1) model. Therefore, the AR(2) model is selected in preference to the AR(1) for further analysis. However, it may also be observed from Fig. 4.12(a) that the correlogram of an AR(3) model might approximate r_k(z) more closely than the correlograms of the other two models.
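The "expected" correlograms superimposed in Fig. 4.12(a) follow from the Yule-Walker recursion. A short sketch (the helper is ours; the recursion ρ_k = φ₁ρ_{k−1} + φ₂ρ_{k−2} with ρ₁ = φ₁/(1−φ₂) is the standard AR(2) result, which we assume matches Sec. 4.1.2):

```python
def ar2_correlogram(phi1, phi2, nlags):
    """Theoretical correlogram of an AR(2) model, lags 0..nlags, from the
    Yule-Walker recursion rho_k = phi1*rho_{k-1} + phi2*rho_{k-2},
    with rho_0 = 1 and rho_1 = phi1/(1 - phi2)."""
    rho = [1.0, phi1 / (1.0 - phi2)]
    for _ in range(2, nlags + 1):
        rho.append(phi1 * rho[-1] + phi2 * rho[-2])
    return rho[:nlags + 1]
```

With the moment estimates φ₁ = 0.209 and φ₂ = 0.087 this recursion returns ρ₁ ≈ 0.229 and ρ₂ ≈ 0.135, i.e. the fitted model reproduces the first two sample correlations by construction, and the comparison with r_k(z) is informative only at higher lags.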
The reader is encouraged to try the AR(3) model and also the ARMA(1,1) model for comparison.

(iv) Since the sample size is large (N' = 828), the method of moments is used to estimate the parameters of the selected AR model (for large samples the method of moments gives approximately the same estimates as the method of maximum likelihood). Therefore, the autoregressive coefficients φ̂₁ and φ̂₂ are computed by Eq. (4.48) as

  φ̂₁ = r₁(1 − r₂)/(1 − r₁²) = 0.229 (1 − 0.135)/(1 − 0.229²) = 0.209

and

  φ̂₂ = (r₂ − r₁²)/(1 − r₁²) = (0.135 − 0.229²)/(1 − 0.229²) = 0.087.

The residual variance σ̂²_ε is computed by Eq. (4.51) as

  σ̂²_ε = [ 828 (1 + 0.087) / (826 (1 − 0.087)) ] [ (1 − 0.087)² − 0.209² ] (1) = 0.943

since the variance of the series z_t is approximately σ̂²_z = 1.

(v) The estimated parameters φ̂₁ and φ̂₂ comply with the stationarity conditions of expressions (4.24), since φ̂₁ + φ̂₂ = 0.209 + 0.087 = 0.296 < 1, φ̂₂ − φ̂₁ = 0.087 − 0.209 = −0.122 < 1 and −1 < φ̂₂ = 0.087 < 1.

(3) Tests of Goodness of Fit of Selected Model

STEP (3a). From Eq. (4.72) the residuals ε̂_t of the AR(2) model are computed beginning at t = 3, such as

  ε̂₃ = z₃ − φ̂₁ z₂ − φ̂₂ z₁ = −1.619 − 0.209 (1.374) − 0.087 (−0.94) = −1.824

and
Therefore, it is concluded that the residuals for the AR(2) model are uncorrelated STEP (3c). The hypothesis of normality of the residuals é is rejected by the skewness test of normality. This is not 4 surprise since right from the beginning in steps (1a) and (ib), it was known that the skewness of the original monthly data was significantly different than zero, and it was decided not to make a transformation because of the negative values present in part of the data. Therefore, the normality as- sumption in this case is relaxed and for purposes or genera- tion of synthetic NBS, the probability distribution of the residuals is found. The residuals é, determined at step (3b) are first standardized dividing it by the standard devia- tion a, = 0.943 = 0.971, or & = &,/0.971. Then the em- pirical frequency distribution of {, is determined as well as the fitted lognormal probability function with parameters @ = 1,356, B = 0.246 and & = -4.0. Therefore, the probability distribution of the uncorrelated residuals £, of the AR(2) model is i L tuné-§,) - 4” #&) = b= exp {- —— 3 (E-£0) pyBi 2p? or é é 2 t@ = -ve[ ending. 0) = 1.886] | {2m (0.246) (£ - 4.0) exp 0.246 (4.110) 170 Figure 4.13 shows the empirical density function as well as the fitted lognormal-2 density function of Eq. (4.110) for the residuals £, é a a Figure 4.13. The fit of three-parameters lognormal probability density function (smooth solid line) to the fre- quency density curve (broken line) of &, variable of monthly mean NBS of Lake Michigan- Huron. STEP (3d). As observed in steps (2c) and (3b) the correlograms of the AR(1) model as well as the correlogram of the corresponding residuals practically rule out that model On the other hand, the AR(2) model gave satisfactory corre- lograms. As said in step (2c), (iii) the AR(3) model correlo- gram would likely be closer to the historical correlogram than the correlogram of the AR(2) model. 
Besides, the correlo- gram of the residuals of the AR(3) model would likely pass the Anderson test as it happens with the AR(2) model. Thus, the question would be, would the AR(3) model be a better model than the AR(2) considering the parsimony of parameters. Thus, the Akaike Information Criterion (AIC) can be used to test the AR(2) against the AR(3) model. For comparison we also tested the AR(1) model. From Eq. (4.74) we get: AIC(1) = -41.34, AIC (2) = -44.59 and AIC(3) = -41,72. Since the AIC(2) is less than the other two, then the AR(2) model is best. STEP (3e). Figure 4.12(a) gives the correlogram of the selected AR(2) model. (4) Optional Tests of the Model The optional test of the model based on data generated is not carried out for this example (see Sec. 7.3.5 for an 11 example of tests based on data generation). However, we will show how synthetic monthly NBS for the Lake Michigan-Huron can be generated based on the fitted AR(2) model. From Eqs. (4.106) and (4.107) we have a (4.111) where i, and 6, are given by t 1 2 3 4 5 6 a 46.93 95.25 176.80 277.60 254.98 210.49 6, 80.98 48.58 72.15 86.70 88.03 65.01 t 7 8 9 10 11 12 120.49 55.79 26.39 5.87 33.13 25.35 52.45 59.06 63.80 72.31 66.25 61.48 and = 0-209 2, + 0.087 2, + 0.971 &, (a.m) where £ is lognormal-3 with parameters @ = 1.356, B = 0.246 ard & = -4.0. In order to generate this type of variable we use the equation & = -4 + exp {1.356 + 0.246 uj (4.113) with u the standardized normal variable with mean zero and variance one. From tables of standardized normal random numbers let us take 18 values of u. Then using Eq. 
(4.113), we obtain the values of ξ. These two sets of variables are:

  Order   u       ξ       | Order   u       ξ       | Order   u       ξ
  1       0.414   0.297   | 7       -0.659  -0.700  | 13      1.195   1.207
  2       -1.288  -1.173  | 8       0.595   0.492   | 14      -1.160  -1.083
  3       1.019   0.986   | 9       -0.651  -0.694  | 15      -1.835  -1.529
  4       0.616   0.515   | 10      0.906   0.849   | 16      -0.468  -0.541
  5       -0.289  -0.386  | 11      0.678   0.585   | 17      0.068   -0.054
  6       1.970   2.300   | 12      -1.175  -1.093  | 18      -0.595  -0.648

Assuming z₋₁ = 0.0 and z₀ = 0.0, we generate the variable z_t by Eq. (4.112). For instance, for t = 1, ..., 5, and using the ξ values obtained above, we get:

  z₁ = 0.209 (0.0) + 0.087 (0.0) + 0.971 (0.297) = 0.288
  z₂ = 0.209 (0.288) + 0.087 (0.0) + 0.971 (−1.173) = −1.079
  z₃ = 0.209 (−1.079) + 0.087 (0.288) + 0.971 (0.986) = 0.757
  z₄ = 0.209 (0.757) + 0.087 (−1.079) + 0.971 (0.515) = 0.564
  z₅ = 0.209 (0.564) + 0.087 (0.757) + 0.971 (−0.386) = −0.191

Following the same procedure, the rest of the z_t values are generated. Actually, to begin the generation we assumed z₋₁ = 0 and z₀ = 0, so the first generated values of z_t are biased. Fiering and Jackson (1971) recommended dropping the first 50 generated values to avoid such a bias. This recommendation should be followed in practice. For the purpose of this example we will drop only the first 3 generated values. Therefore, re-initializing the generated values so that z₁ = z₄, we get

  t   z_t      t   z_t      t   z_t
  1   0.564    6   -0.607   11  -0.761
  2   -0.191   7   0.733    12  -1.492
  3   0.242    8   0.668    13  -0.903
  4   -0.228   9   -0.858   14  -0.371
  5   0.413    10  1.748    15  -0.785

Finally, using Eq. (4.111), the parameters μ̂_τ and σ̂_τ, and the generated z_t values, we can generate a synthetic NBS series y_{ν,τ}. For instance:

  y₁  = 46.93 + 50.96 (0.564) = 75.67
  y₂  = 95.25 + 48.58 (−0.191) = 85.97
  ...
  y₁₂ = 25.35 + 61.48 (−1.492) = −66.38
  y₁₃ = 46.93 + 50.96 (−0.903) = 0.913
  ...
  y₁₅ = 176.80 + 72.15 (−0.785) = 120.16.
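The generation scheme of Eqs. (4.111)-(4.113), including the warm-up deletion recommended by Fiering and Jackson (1971), can be sketched as follows. This is an illustrative sketch: the function name is ours, and a seeded pseudo-random generator replaces the printed table of standard normal numbers.

```python
import math
import random

# Fitted periodic parameters for Lakes Michigan-Huron NBS (tau = 1..12)
MU    = [46.93, 95.25, 176.80, 277.60, 254.98, 210.49,
         120.49, 55.79, 26.39, 5.87, 33.13, 25.35]
SIGMA = [50.96, 48.58, 72.15, 86.70, 88.03, 65.01,
         52.45, 59.06, 63.80, 72.31, 66.25, 61.48]

def generate_nbs(n_months, warmup=50, seed=1):
    """Synthetic monthly NBS by Eqs. (4.111)-(4.113).  The first `warmup`
    z-values are discarded to remove the bias from starting at z = 0."""
    rng = random.Random(seed)
    z1 = z2 = 0.0                                  # z_{t-1}, z_{t-2}
    series = []
    for t in range(n_months + warmup):
        u = rng.gauss(0.0, 1.0)                    # standard normal deviate
        xi = -4.0 + math.exp(1.356 + 0.246 * u)    # Eq. (4.113), lognormal-3
        z = 0.209 * z1 + 0.087 * z2 + 0.971 * xi   # Eq. (4.112)
        z2, z1 = z1, z
        if t >= warmup:
            tau = (t - warmup) % 12                # month index, 0-based
            series.append(MU[tau] + SIGMA[tau] * z)  # Eq. (4.111)
    return series
```

Because E[ξ] = ξ₀ + exp(α + β²/2) ≈ 0, the generated z_t fluctuate about zero and the synthetic NBS fluctuate about the fitted monthly means.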
The 95 percent confidence interval for 4 is obtained from expression (4.100). From t-tables, t(68) 97. = 2.0, so for t= 1 we have [46.93 - 2 (6.13), 46.93 + 2 (6.13)]_ or [34.67; 59.19]. Similar confidence intervals can be obtained for t 12 STEP (5c). The 95 percent confidence interval for the standard deviation o, is obtained from expression (4.101) From x2-tables, y2(68) = 47.1 and x?(68) = 92.66, so for t=1 we have 0% eee [sso B50 26) | or 92.66 ri} [43.65;. 61.23]. Similar confidence intervals can be obtained for t > 12 STEP (5d) Since the AR(2) model has constant autoregressive coefficients with parameters estimated by the method of moments, the variances Var($,) and Var($s) are obtained by Eq. (4.57) as Var(1) = Var(,) = (1 - 0.087)/826 = 0.001201 Then the standard deviation is ($1) = s($2) = 0.035. The covariance between §; and $2 is determined from Eq. (4.58) as Cov($;, $2) = -0.209 (4 + 0.087)/826 = -0.000275. Then, the correlation coefficient between $6, and $2 is C41, 2) = “2000278 0.001201 -0.229 STEP (Se). The 95 percent confidence interval for 6, and $2 are obtained from expression (4.59). From normal tables U g7s = 1.96, therefore, the confidence interval for ; is 174 [0.209 - 1.96 (0.035); 0.209 + 1.96 (0.035)] or [0.1404; 0.2776]. Similarly, the confidence interval for $2 is [0.087 ~ 1.96 (0.035); 0.087 + 1.96 (0.035)] or (0.0184; 0.1556] Alternatively, instead of the individual confidence intervals of 1 and 2 their joint confidence region can be obtained assuming that §, and $2 have a bivariate normal distribu- tion with parameters ($1) = 0.209, ($1) = 0.035, ($2) = 0.087, o($2) = 0.035 and (41, 62) = -0.223 STEP (5f). The 95 percent confidence interval for the residual variance o,? is obtained approximately by replacing the lower and upper limits of _$, and 62 in Eq. (4.51). That is, the upper limit for o,? is 828 (1 + 0.0184) 1 (1 _ 9 o1g4)? 
- 0.14042] 826 (1 - 0.0184 = 0.982 , and the lower limit is 828 (1 + 0.1886) (1 - 0.1556)” - 0.27767] = 0.872 8x6 (1 = 0.1556) 4.3.6 LIMITATIONS OF PERIODIC AR MODELING The modeling of periodic hydrologic series (seasonal, monthly, weekly, daily, etc.) is more complex than the model~ ing of annual series because the former have the influence of the annual cycle which produces periodic variations in some or all of the statistical characteristics of the series. Al- though the models proposed for periodic series are intended to incorporate the various features observed in real, periodic series, their actual application to such series have limitations which must be taken into account. Such limitations can be partially resolved by using engineering judgment and experi- ence; having a good knowledge of the series to analyze, a good knowledge of the overall problem to be solved and a good statistical background First of all, the identification of the periodic AR model requires: (1) to resolve the problem of non-normality of the Gata, (2) to decide whether to use Fourier series in estimat- ing the periodic parameters, and (3) to select the AR model either with constant or with periodic coefficients. The above three problems can not be resolved only on statistical grounds but also by using experience and judgment. In fact, a good part of the problem solution and decision is based on whether or not the size of the historical record is small. Concerning the problem of non-normality, refer to Sec. 4.2.7 for the three ways on how to approach such problems. If the analyst decides to transform the periodic series, a different transfor- mation may be needed for each time interval 1. However, if 175 the sample size is small, the transformations varying with 1 may not be desirable and one may have to settle for a single or overall transformation even though it may not be sufficient to reduce the skewness close to zero (especially considering the time intervals individually). 
As stated in Sec. 4.2.7, transforming the original data can bias the mean, standard deviation and serial correlations. If such biases are considered important, the analyst may wish to avoid the transformation. During the low-flow (or similar) season, periodic series often have standard deviations that are about the same as, or greater than, the corresponding means. In such a situation it is practically a necessity to transform the data to avoid generating negative flows. Alternatively, the AR modeling can be carried out on the original data, but a bounded distribution should then be fitted to the residuals to avoid generating unrealistic samples. For additional discussion of non-normality, the reader is referred to Sec. 4.2.7.

The problem of whether to use Fourier series for fitting the periodic statistical characteristics of the historical series is related to the size of the historical record and to the number of time intervals in the year of the data analyzed. Generally speaking, if the number of years of record is small, one would like a model with a small number of parameters as well. In such a case, Fourier series fitting of the periodic characteristics of the historical series is desirable in order to decrease the number of parameters describing the periodic components, such as the mean, standard deviation and correlation coefficients. Similarly, when the data are of small time intervals, such as weeks or days, a Fourier series fit is advisable. As said before, the use of Fourier series is a matter of preference, left to the analyst's criteria and experience.

The selection of AR models with either constant or periodic autoregression coefficients generally depends upon whether the historical data show periodic or seasonal correlation. An exact statistical test for detecting significant periodic correlations appears not to be available.
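In the absence of an exact test, the periodic correlation can still be examined empirically. The sketch below (illustrative synthetic data, numpy assumed) estimates the lag-one serial correlation separately for each season τ of an N-year by 12 array; a large spread among these estimates would argue for periodic coefficients:

```python
import numpy as np

def periodic_lag1_corr(x):
    """Lag-one serial correlation for each season tau of an N-year x w array.
    Season tau is paired with season tau-1; season 0 pairs with the last
    season of the preceding year."""
    n_years, w = x.shape
    r = np.empty(w)
    for tau in range(w):
        if tau == 0:
            a, b = x[1:, 0], x[:-1, w - 1]
        else:
            a, b = x[:, tau], x[:, tau - 1]
        r[tau] = np.corrcoef(a, b)[0, 1]
    return r

# Illustrative data with built-in month-to-month dependence
rng = np.random.default_rng(2)
z = rng.normal(size=(60, 12))
z[:, 1:] += 0.5 * z[:, :-1]
r_tau = periodic_lag1_corr(z)
print(r_tau.round(2))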
Thus, the graphical observation of the historical correlation for each time interval and judgment of the analyst in most cases are the basis for decisions. Here the sample size aiso plays an important role When the number of years of record is small it may not be worth it to model with periodic coefficients. Furthermore, when the uncertainty of the parameters of the model are taken into account, AR models with constant coefficients are preferable, because, the number of autoregression coefficients would be small say 1, 2 or 3. The method of moments has been widely used for estimating the parameters of periodic AR models. Although the method of maximum likelihood gives better estimates it was not included herein because it is more complex to solve due to 176 periodicities. One further limitation of periodic AR models is related to the conditions to be met by the periodic autore- gression coefficients. Actually, very limited theoretical work is, available on this subject; therefore, we suggest using as ah approximation the same conditions as in the case of AR models with constant parameters The selection of the type and form of the AR model is mostly made by the judgment and experience of the analyst. The tests of goodness of fit of the model generally assume the selected type and form of the AR model and are limited to testing the characteristics (normality and independence) of the residual series. When fitting AR models with constant coefficients, the comparison of historical and fitted model correlograms is often used for testing the goodness of the model. However, when fitting AR models with periodic coeffi- cients, similar comparison for testing the model is not used because of limited studies along these lines. Besides testing the goodness of fit of the proposed AR model by looking at the residuals and correlograms, the analyst may wish to check whether some of the characteristics of the historical time series are preserved. 
‘Typically, the periodic characteristics of the mean, standard’ deviation, skewness and_ periodic correlation are checked. A good fitted model should be able to reproduce these characteristics. However, the analyst may also wish to check the characteristics at the annual level, such as the annual serial correlation, mean run lengths, etc. Even though the periodic characteristics may be preserved by model, there is no assurance that the characteristics of an- nual series will be preserved. In cases in which the annual characteristics are important and the model does not preserve such characteristics, the analyst may wish to consider alter- native models. Usually the modeling by disaggregation is an alternative. Such type of modeling is discussed in Chapter 8. 4.3.7 PRACTICAL APPLICATIONS OF PERIODIC AR MODELS AR models of periodic hydrologic time series are needed for generating synthetic periodic series and for forecasting. These applications are for the purposes of planning and operating water resources systems with seasonal regulating schemes. Seasonal and monthly series are most often used, although series of shorter time intervals such as weeks and days may be needed. The periodic AR models are used mostly for modeling periodic series at single sites. In some cases, where the generation at several sites is needed, it may be useful to model the aggregated periodic series by an univariate AR model rather than modeling the series at the various sites by a multivariate AR model. 17 One example is, when testing the operation of a system of reservoirs which are aggregated into a single equivalent reservoir with inflows equal to the aggregated inflows of the individual reservoirs. Another case is when a single lake or reservoir has several inflows. In such case, instead of generating the various inflows by a multivariate AR model, the univariate AR modeling of the aggregated periodic series is used. 
Another reason for modeling the aggregated series by a univariate model instead of modeling the various series by a multivariate model is to determine the reliability of parameters of the model in rather simple terms, and to study for instance, the effect of parameter uncertainty in a pro- posed operating plan or design. An example of the above was the use of a AR(1) model with periodic mean, periodic standard deviation and constant first autoregression cceffi- cient for modeling the aggregated historical flows (in units of energy) of several rivers in Colombia (Millan and Mejia, 1980). 178 APPENDIX A4.1_ AUTOCORRELATION FUNCTION OF AR(p) MODELS Section 4.1.2 indicated that the autocorrelation function py, of AR models satisfies the difference equation Pi = Pear t MPa tos + OpPyepr K> 0 (A4.1) where 6), ..., 6, are the autoregression coefficients of the AR(p) model of Eq. (4.1). For given or estimated set of 6 coefficients, Eq. (A4.1) can be solved simultaneously to ob- tain py, ++) Pyy- Then for Kk > p, Eq. (A4.1) may be used recursively to find p,. Equations (4.14), (4.15) and (4.16) give the autocorrelation coefficients p, for the AR(1) and AR(2) models. A general computational procedure for determining p, follows. In general, for an autoregressive model of order p, the matrix solution for the p-1 autocorrelation coefficients 1, -.) Py, of Eq. (A4.1) has the form (Salas and Smith, 1978) P ° et Av! [-0] (A4.2) where p= (6), 6. Pol» (8) = bo, + 1, Par cee Ppl > a sy “ty -al and T denotes the transpose of the matrix. The elements of the matrix A may be obtained from ay = Oj > 1 3 ie »» (p-1) (44.3) ag = Oj t Oy 5 TP 1l,jayjri (A4.5) in which 6; = 0 for i>p. Once the 6), ..., p,.) auto- correlation coefficients are obtained from Eq. (A4.2), the co- efficients p, for k > p are obtained recursively from Eq. (A4.1). APPENDIX Ad.2. 
PARTIAL AUTOCORRELATION FUNCTION OF AR(p) MODELS The partial autocorrelation coefficient 4,(k) in an AR process of order k is a measure of the linear association 179 autoregressive coefficient and 6,(k) for k=1,2,... is the partial autocorrelation function. between 2; and Pink for j < k. It is the k-th The difference equation for an AR(k) model is (see Sec. 4.1.2) p= 9,0) P5_ 4 + 990K); tot 9G); ; es (A4.6) where @,(k) is the j-th autoregressive coefficient of the AR(k) model. The partial autocorrelation is given by the last coefficient $,(k), kK = 1, 2, ... Equation (A4.6) constitutes the set of linear equations $C) pq + 90k), + + + OCR) Py = Py 91GOP) + Oy Ce)Pg +--+. + OCK)Py» = Py 9, C)Po + b90K)p, + +. + OCK)Py_g = Pg OOD 1 Fg) yg + ++ + OCK)Py = Py which may be written as 1 P,P see Py 940k) Py 1 ey + Pea} | 20) Cala Pp Py 1 wes Peg) J Og0K)] = | 0g Pret PK-2 Pag crs 1 4,00) oe or =1,2,... (A4.8) Thus the partial autocorrelation function, (k) is determined by successively applying Eq. (A4.8). 180 The partial autocorrelation function $,(kk) may be also obtained recursively by means of Durbin's (1960) relations 2 Py (1 = Po) Bo > Py 4,0) = ys 6962) = ~—— ,, 6,(2) = =, pee I= 02) OP = 059 k-1 PF C1) Oy; 0k) =| —42,———__ (A4.9) 1- 2 6(e1) p a 4 ) ej 600) = 9;Q-1D) > 44s) OGD To determine the partial autocorrelation function ,(k) from a sample series 21, ..., 2y, first compute the sample autocorrelations rj, from Eq. (2.5b), then replace the p's by the r's in either Eq. (A4.8) or Eq. (A4.9) On the hypothesis that the process is AR(p), the estimated 6,(k) for k>p is asympotically normal’ with mean zero and variance 1/N. Hence, the 1 - a probability limits for “zero partial autocorrelation may be determined by (Box and Jenkins, 1970, p. 65 and 178). fe apg yoWN s+ Uy a o/IN} (A4.10) where Uj.,/2 is the 1-a/2 quantile of the standard normal distribution, N is the sample size and @ is the probability level. 
The limits of expression (A4.10) may be used to give some guide as to whether theoretical partial autocorrelations are practically zero beyond a particular lag. APPENDIX A4.3. ANNUAL FLOWS OF THE GOTA RIVER, SWEDEN The following table gives the annual flows of the Géta River near Sjétop-Vanersburg for the period 1901-1950. The data is in the form of modular values (actual annual flows di- vided by the mean) as given by Yevjevich (1963) Years 01-10 0.985 950 1.121 0.880 0.802 0.856 1.080 0.959 1.345 11-20 11153 0.929 1'158 0.95% 0.705 0.905 1:00 0.918 0.907 0.991 21-30 01994 0.701 0.682 1.086 1.305 0.898 1.149 1.297 1.168 1.218 31-40 11209 0.974 0.894 0.638 0.991 1.198 1.091 0.892 1.020 0.869 41-50 0.772 0.606 0.739 0.813 1.173 0.916 0.880 0.601 0.720 9.985 181 REFERENCES Beard, L. R., 1967. Simulation of daily streamflow Proceedings of the International Hydrology Sympsoium, Fort Collins, Colorado, 1, Paper 78 Box, G. E. P. and Jenkins, G., 1970. Time Series Analysis, Forecasting and Control. First Ed. Holden-Day Inc., San Francisco Carnahan, B., Luther, H. A. and Wilkes, J. 0., 1969. ‘Applied Numerical Methods. John Wiley and Sons, Inc., New York. Durbin, J., 1960. The fitting of time series model. Rev Int. Inst. Stat., Vol. 28, 233 Fiering, M. B. and Jackson, B. B., 1971. Synthetic Hydrol- ogy. Monograph No. 1, American Geophysical Union, Washington, D.C. Hurst, H. E., 1951. Long term storage capacity of reser- voirs. ‘Trans. Amer. Soc. Civil Engrs., 116, pp. 770- 799. Jenkins, G. M. and Watts, D. G., 1969. Spectral Analysis and its Applications. Holden-Day, San Francisco Matalas, N. C., 1967. Mathematical assessment of synthetic hydrology. Jour. Water Resour. Res., 3, 4, pp. 937- 945 Millan, J. and Mejia, J. M., 1980. Stochastic dynamic pro- gramming models for operation of hydrothermal systems Paper submitted for publication io the Jour. Water Resour. Res. Mood, A. M., Graybill, F. A. and Boes, D. C., 1974. Introduction to the Theory of Statistics. 
Third Edition, McGraw-Hill, New York Salas, J. D., 1972. Range Analysis of Periodic Stochastic Processes. Hydrology Paper 57, Colorado State Univer- sity, Fort Collins, Colorado. Salas, J. D. and Smith, R. A., 1978. Correlation functions of time series. In Lecture Notes for the Computer Workshop in Statistical Hydrology, Colorado State Uni- versity, Fort Collins, Colorado 182 Tao, P. C. and Delleur, J. W., 1976. Seasonal and non- seasonal ARMA models in hydrology. ASCE Jour. Hydr Div., 102, HY10, pp. 1541-1599. Thomas, H. A. and Fiering, M. B., 1962. Mathematical synthesis of streamflow sequences for the analysis of river basins by simulation. In Design of Water Re- sources Systems, A. Mass et al., pp. 459-493, Harvard University Press, Cambridge, Massachusetts. walker, G. T., 1931. On periodicity in series of related terms. Proc. Roy. Soc., A, 131, pp. 518 Yevjevich, V., 1963. Fluctuations of wet and dry years Part 1. Research data assembly and mathematical models. Hydrology Paper 1, Colorado State University, Fort Collins, Colorado Yevjevich, V., 1972a. _ Probability and Statistics in Hydrol ogy. Water Resources Publications, Fort Collins, Colorado. Yevjevich, V., 1972b. Stochastic Processes in Hydrology. Water Resources Publications, Fort Collins, Colorado Yevjevich, V., 1975. Generation of hydrologic samples: case study of the Great Lakes. Hydrology Paper 172, Colorado State University, Fort Collins, Colorado. Yule, G. U., 1927, On a method of investigating periodicities in disturbed series with special reference to Wolfer's sunspot numbers. Phil. Trans., A, 226, pp. 267 183 184 Chapter 5 AUTOREGRESSIVE-MOVING AVERAGE MODELING 5.1 DESCRIPTION OF ARMA MODELS It has been shown in the previous chapter that the autoregressive models have been successfully applied for modeling hydrologic time series. The low flows of the dry season mainly result from groundwater effluence. They have relatively small variation. 
During the flow recession, the flow at a particular time is a fraction of the flow at a previous time, which may be represented by an autoregressive scheme The high flows are formed mainly by large rainfall or snow- melts or both. This mixed behavior could be modeled by adding a moving average (MA) component to the autoregres- sive (AR) component. More specifically, by considering the surface runoff and the groundwater contribution at the annual time scale, and by using the mass balance equation for the groundwater storage, it is shown (Sec. 1.4) that the annual runoff can be represented by the mixed autoregressive and moving average (ARMA) process. It can also be shown that ARMA models may come out with a smaller number of param- eters to estimate than the autoregressive model of higher order. The added flexibility obtained by adding a moving average component makes it possible to build a model with the smallest number of parameters. As the parameters are esti- mated from data, the idea of parsimony in the number of parameters is particularly attractive. In this chapter the modeling of hydrologic time series based on ARMA models is presented 5.1.1 MATHEMATICAL FORMULATION OF ARMA MODELS, It is assumed that the hydrologic series to be modeled by an ARMA process is stationary and approximately normal. Otherwise, the appropriate transformation of the original variable is first performed. Let us consider the values of a hydrologic time series yy, Vest Vesa» at equally spaced times t, t+, t+2, ... . Let the deviations from the mean be Yeoh (5.1) The series 2, sum of independent random variables may be represented as an infinite weighted t Sta + Urey t Hoey t (5.2) 1 185 If we make 2, dependent only on a finite number q of previous random ‘variables ¢,, then the resulting process is a "moving average process" (MA) of order q. it is written as Ot” Prep ~ Oey = 7 Cyetng (5.3) and it is usually called the MA(q) model. The MA(q) model of Eq. 
(5.3) can also be written as 5 tej 6.9) or ? 8 (5.5) 227 e 3 jo J td with the convention 09 = -1. The parameters of the model are the mean w, the variance of of the independent vari- t +2 parameters must be estimated from data. able, and the coefficients 6, 62, ... @y, or a total of The "autoregressive process" (AR) of order p was seen in the previous chapter to be represented by Fe tM tt Op te (5.6) or P 2 4 te (5.7) The parameters of the AR(p) model are the mean 4, the variance oz of the random variable e, and the coefficients 1) 2) c++) Gp, OF a total of pt2 parameters must be estimated from data An autoregressive model of order p and a moving average model of order q may be combined to obtain the "mixed autoregressive-moving average” (ARMA) model of order (p,q). It is defined by 2p = bap tne t OED Fe Ore) > ~ 8get-q 8) which can also be represented as 186 gt ee7 2 Oey (5.9) or 2 3 8, 5.9 Atte Bes (5.9a) with the convention 69 = -1. The parameters of the model are WH, os o> + Op ®1, .. + Og. A total of p+qt2 parameters must be evaluated from data. The autoregressive and moving average model of order p and q is usually called the ARMA (p,q) model. 5.1.2. PROPERTIES OF ARMA MODELS This section lists the principal mathematical properties of the ARMA model. For their detailed derivations, the reader is referred to the text by Box and Jenkins (1976). The properties of the MA modei are given first, followed by those of the ARMA model. In each case, the general properties of a model of an arbitrary order are discussed first, followed by the application to the first order process. Specific properties of the AR model were given in Chapter 4 and they will not be repeated in this chapter. Properties of the MA(q) Model Let us assume that the mean of the 2, process given by Eqs. (5.4) or (5.5) is E(z,] = 0 (8.10) By forming the product 242, and taking the expectation term by term, the autocovariance is found to be (with the convention 6, = -1): a-k j= = 66. 
k t7t-k =0 k>q For k=0, the variance is: (5.12) The autocorrelation function (ACF) is obtained by taking the ratio of the Eqs. (5.11) and (5.12) as (with the convention orate) » Kq It is seen from Eq. (5.13) that the autocorrelation is trun- cated or cut off at lag q, with a "memory" limited to q lags. Elementary Properties of the MA(1) Model The MA(1) process is defined by 81 ey: (8.14) Its autocovariance function at lag 1 is obtained from Eq. (5.11) as Y= - 61 of : (5.15) Its variance is obtained from Eq. (5.12) as Yo = (1 + 0,7) of (5.16) and its lag-1 autocorrelation coefficient is given in terms of the model parameters by Ya 8) ery Tre ° (8.17) The memory is seen to be only one step long as y, = 0 for K>1. The MA(1) model may also be written as +6 ee) Successive application of this equation yields &_) = 2,1 + 1- - 2 i fr.g thus, £, = 2 + 61 2%) + 0:7 & 9, and in general 2 2 ep = Mt 01 yt O12 ot A gt ... which converges for |@;| <1 and =~ ~ 6,2 - 2 = 01 2p > O12 ig ee +e 188 Conditions to be Met by the Parameters (Invertibility Conditions). The previous equations show that the first order moving average process is inverted into an autoregres- sive process of infinite order, provided that |@,| < I or equivalently the root of r- 8, = 0 be inside the unit circle. This is the invertibility condition. There is no stationary condition on the MA process, as the weighted sum of indepen- dent variables in Eq. (5.3) is finite. The invertibility condi- tion shown for the MA(1) process may be generalized to a moving average process of any order q: the roots of the polynomial rd - aysT! - grt? - 9 = 0 (5.18) must lie inside the unit circle. For example, for the MA(2) model with ©, = 0.6 and & = 0.2, Eq. (5.18) becomes r? - 0.6r- 0.2 = 0. The roots of this equation are rj = 0.839 and rp -263 and they fall inside the unit circle. Partial Autocorrelation. 
As an MA model is equivalent to an AR process of infinite order, these autoregressive coefficients will decrease rather than cut off, so the partial autocorrela- tions are of infinite extent and tail off. It is important to note the correspondence between the autoregressive and moving average processes. The autocorrelation and partial autocorrelation of the former, behaves like the partial auto- correlation and autocorrelation of the latter, and vice versa. Figures 5.1 and 5.2 show the partial autocorrelation function (PACF) for several MA(1) and MA(2) models. (Note that the values of $,(k) exist only for integer values of k. The lines plotted in the figures are to enhance the appearance only.) No plots of the autocorrelations are given since only 63, given by Eq. (5.17), exists for the MA(1) model; for the MA(2) model only p, and 2 given by Eq. (5.13) exist. General Properties for the ARMA (p, q) Model The general properties of the ARMA (p,q) process are given in terms of the cross covariance between 2 and & Yyg(k) = Covl ay yey] which is nonzero for k < 0 and zero otherwise, as 2 de- pends only on previous values fe With the convention 6 -1, forming the product 2,2, and taking the expectation term by term, the autocovariances of the ARMA(p,q) process are found to be 189 PHI KCK) ‘.b00 ne ‘SYMBOL THETA ® 6 a 78 70 ae on le 1.000 2.000 3.000 SMD gg 8-000 8.000 7.000 8.000 Figure 5.1. Theoretical partial autocorrelation for MA(1) 26 model. ‘SYNBOL —TRETAT THETAG © Pears) A oe 2 8-8 a Se a es Figure 5.2. Theoretical partial autocorrelation for MA(2) model. 190 4 2 Oy, (ki) ,} kq+l For k = 0, the variance is 2.2 q . Var[z,] = of + z - 2 ey. (-i) (5.20) iL il and the autocorrelation function is id Pegs Kat (5.21) The autocorrelations for the first q lags, 1, ...) Pq are seen to depend on the autoregressive and moving average co- efficients, whereas for higher lags, Pq41+ Pqsg» Pept depend only on the autoregressive parameters. 
Thus, the p values Pgs Pays Pqept1 Provide the initial values to construct the p, in Eq. (5.21), and when q-p 20, the initial q-p+1 values of the autocorrelation function are ir- regular, but the following ones form damped waves or expo- nential decaying functions. When q-p <0 it forms damped waves or decays exponentially. The partial autocorrelation ,(k) is obtained by fitting to the given series AR processeX of orders k = 1, 2, = Org, + O22 g Fo + OCD H+ 002 a. where ,(k) denotes the jt coefficient of an AR process of order k) The plot of $C) vs k is the sample partial autocorrelation function. The partial autocorrelation arises from the Yule-Walker equation (see Appendix A4.2) Thus, for an autoregressive process of order p, the partial autocorrelation $,(k) cuts off at lag p. A moving average process being euivalent to an autoregressive model of infinite order, has a partial autocorrelation which is in- finite in extent and attenuates with a mixture of damped waves and/or exponential decay, For a mixed process both the ACF and the PACF attenuate as damped waves or expo- nential decay 191 Conditions to be Met by the Parameters (Stationarity and Invertibility Conditions). ' For stationarity the parameters 1, $2, s-+1 %p must satisfy the stationarity condition. The roots of the characteristic equation (4.21) must lie inside the unit circle, Likewise for invertibility the parameters 6;, 62, “+1 0g must satisfy the invertibility condition: the roots of the characteristic equation (5.18) must lie inside the unit circle. Elementary Properties of the ARMA(I,1) Model. Because of its simplicity and of its importance in hydrology, the applica~ tion of the ARMA(1,1) model is discussed more specifically. The process is 2, = 124.1 + ey > Ore, It may be written as = O22, + Bp + COr-O1 0844 - 182% = 18243 te, + COr-Ordepy + O1CO1-Or Deg ~ 170184. + @r-O De, + Or10Or-O epg + O17COr-BEy_g + which is an infinite order moving average process. 
The coef- ficients of the ¢,'s are the weights of Eq. (5.2). The convergence of thé process requires the stationarity condition [ox] <1. For example, let us use the values of $; and 6; of an ARMA(1,1) model fitted to the annual flows of the Niger River (see Sec. 5.2.7). The values of the parameters are $1 = 0.79 and 8, = 0.35, from which the 4 = ot (1-01) be- comes, = 0.44, Wp = 0.35, Js = 0.27, 4 = 0.22, Us = 0.17, Ye = 0.14, ..., thus showing the parsimony of using an ARMA process as compared to an MA process. In a similar fashion the ARMA(1,1) process may be rewritten as an autoregressive process of infinite order as ~ 012401 F Ore 4.4 and fey 17 M22 + 81642 from which = 1-81)2p_1 + O12, 9 - O17e4_9 + ey 192 and 2 = (G1-81)24_4 + O1CG1-81 2-9 + O17(G1-81)4,_g +... He, = M241 + Maz yt M2 yg + - t (5.22) Again with the values of ¢ and 6 for the Niger River the i 0.02, ms = 0.01, ete. of the process requires fod t. =1< $1 <1 and -1 <6, <1. 0 6, that this n, = 6 1(9,-0;) becomes my = 0.44, my = 0.15, mg = 0.05, m4 = It is seen that the invertibility sum converge or that The admissible region of the parameters is thus In typical hydrologic models The variance of the process is obtained by evaluating first y,,(-D Ete = OZ gery teeta > Ore p 1” and El2.e,_)] = O:E[2,_,64.4] + Elee, 4] - ®iElef 1] thus Yog!) = $10% ~ 8108 thus Eq. (5.20) yields Yo = ¢1¥1 * 02 - 8;(6:-8)o2 (5.23) Similarly, ¥,,) from Eq. (5.19) as (0) = o2 and the autocovariance is obtained ¥1 = O1¥0 ~ 0102 (5.24) For higher lags Ye Foye 22 (5.25) Solving Eqs. (5.24) and (5.25) for yo and y: gives yo and y, in terms of the model parameters: _ L+ 0% - 2630) -2 vo = 14 HE phat 2 (6.26) ~a 2 ne o (6.27) 193 and the autocorrelation coefficients become = a = CL = 0301); = 64) os Se eee Pye = Oey = KD. (8.29) Thus decays exponentially from p, at a rate controlled Py y by ;. If 9; is negative it will have alternating positive and negative values. 
Figures 5.3 and 5.4 show plots of the ACF for several values of 9, and 6; including those of the AR(1) model (0; = 0) and of the MA(1) model (o, = 0) Note that for $, 7 0, @; >0 and @, > @, the autocorrela- tion decays exponentially from po =1 and for $, <0, it decays exponentially from p,. Figures 5.5 and 5.6 show plots of the partial autocorrelation for several values of 61 and 6;. (Note that the values of 9, and 9,(k) exist only for integer values of k. The lines plotted in the figures are used to enhance the visual presentation only). SYROL PAT Ore py The parameters of the ARMA(p,q) model for y, (n, 63, 62, vin Op G1» 82s sey Og, OF) are estimated front anfual’ hy- drologic data. The principal methods of estimating such pa- rameters are given in the next section, and a detailed ex- ample is given in Sec. 5.2.7. 5.2.2 PARAMETER ESTIMATION FOR ANNUAL ARMA MODELS Box and Jenkins (1976) proposed an iterative approach to model building which consists of three stages: (1) identification or use of the data and some of their properties to suggest a tentative type of model, i.e, autore- gressive, moving average or mixed autoregressive-moving average and eventually, the order of the model, e.g ARMA(1,1). (2) estimation (evaluation) or inference about the parameters using the data, conditional on the adequacy of model being considered. (3) diagnostic checking or verification of the validity of the chosen model. The diagnostic checking involves an examination of the residuals from the fitted model. The lack of independence in residuals indicates the need for modifica- tion of the model. The modified model should then be subject to the diagnostic checking procedure. Model Identification The principal tools for model identification are the visual display of the original series, the behavior of the autocorrela- tion function coupled with that of the partial autocorrelation function. 
Hipel et al., (1977) have recently demonstrated that the inverse autocorrelation function and the partial inverse autocorrelation function also are useful identification tools A visual inspection of a display of the original series may reveal the presence of a trend, persistence, long term cycles or outliers. The ability of the ARMA model to simulate persistence and very low frequencies will be discussed in a subsequent section. 197 The measure of linear dependence between observations separated by a lag k is given by the estimate ry of the autocorrelation function p,. To utilize the autocorrelation function (ACF) for model identification, plot ry, vs k up to N/4 approximately, where N_ is the length of the series. It is useful to show the probability limits on the plot. As ry ;. approximately normal with zero mean and variance _1/N, the approximate 95 percent probability limits are +2//N. Thus, if all the r, values beyond lag q lie inside the limits it may be concluded that the process may be a moving average of order q. If the ACF attenuates, then it may be conclud- ed that the process is autoregressive. When it is not clear whether the ACF truncates or attenuates then the partial autocorrelation is useful. The characteristic behavior of the autocorrelation and of the partial autocorrelation is summa- rized for the AR, MA and ARMA processes in Table 5.1. Table 5.1. Identification Properties for AR, MA and ARMA Processes Process Autocorrelation Partial Autocorrelations AR(p) Infinite in extent, con- Finite in extent, peaks sists of damped expo- at lags 1 through p nentials and/or damped __ then cuts off. waves. Attenuates as P Me 2 Oy MACa) Finite in extent, peaks Infinite in extent, con- at lags 1 through q sists of damped expo- then cuts off. nentials and/or damped waves. ARMA(p,q) Infinite in extent, first Infinite in extent, q-p lags: irregular first p-q lags irregu- then damped exponen- _lar, then damped ex- tials and/or damped ponentials and/or waves. 
Attenuates as damped waves _P Citar eet akc 2 + (k > 198 Parameter Estimation The parameters are, in general, estimated at three levels of increasing accuracy: a preliminary estimate, a likelihood method and optionally a nonlinear estimation Preliminary Estimates. It was shown in the previous chapter that the Yule-Walker equations provided simple preliminary estimates for the parameters of the autoregressive processes. For the MA process the equivalent of the Yule-Walker equa- tions is Eq. (5.11). Except for the case of q=1, these equa- tions are nonlinear. Equations (5.11) and (5.12) may be solved iteratively for the 6 parameters; however, their statistical efficiency is less than that of the Yule-Walker equations for the autoregressive models. Equations (5.12) and (5.11) respectively, are rewritten in the form © 62 a SEE aI (5.30) 1+ 6R +... + 62 y and n © an nn - * a, =- (a = By 7 BByg oo Bigg) 31) where c, and ¢; are the estimators of the variance and autocovariance respectively, and the ~ indicates estimate: The unknown 6's are assumed to be zero in the first iteration, and improved values of 62 and 8 are obtained successively. For the MA(1) process these equations become aS a= & 1 + e a 1 Os Similarly, for the MA(2) process Eqs. (5.30) and (5.31) give 199 Ae 1 —— a we For the ARMA process the p autoregressive parameters $1, Oa) ++) Gp are estimated first. As seen from Eq. (5.19) for k > q + 1 the autocovariances are independent of the MA parameters. Rewritten in terms of the autocovariance estimates, cj, these equations become wir K2atl (5.32) which can be solved for the estimates 61, 62, ..., a: A new series is now constructed which is the difference between the original series and the one formed by the AR model con- structed with the parameters $1, 2, ..., 4,» namely 27 baz oe Ape + (5.33) This series presumably contains only the MA portion of the Process. Its autocovariance estimates ch, cl, ... 
cy are calculated and the parameters 01, @2, ..., 84 are estimated by means of Eqs. (5.30) and (5.31) applied iteratively. Parameter Estimates from the Sum of Squares. Having obtained preliminary estimates of parameters of the tentative model, efficient estimates of the parameters are needed that take into account all the information contained in the data. The maximum likelihood estimates satisfy this requirement It can be shown that (Box and Jenkins, 1976, Chapter 7) the maximum likelihood estimates are essentially the same as the least squares estimates if the e's given by Oreo Oey Ore tos t Ogg (5.84) are normally distributed. We are therefore concerned with the evaluation of the sum of the squares of the residuals Nea? S(o.a) = 2 Ce)". (5.35) t=1 200 The sum of the squares of the residuals is understood to depend on the parameters ¢ and @, the 2, series and the starting values of the e's. We therefore seek the set of pa- rameters $ and 8 which minimizes the sum-of-squares function. The variance of the residuals is then estimated by ap - 1 A a = y 86.8) (5.36) For a small number of parameters it is usually convenient to calculate S($,8) for a range of values of the parameters, and to plot the function or contour lines of equal values of S This makes it possible to examine the behavior of the sum~of- squares surface and to observe the values of the parameters for which the surface reaches a minimum value. Alternative- ly, a steepest descent algorithm may be used to obtain the maximum likelihood estimate of the parameters and the residuals. For example, for the MA(1) model t 7 OE the residuals are given by 2 + 816, Coat tL and the sum of the squares of the residuals is thus, N N S(@) = % (ea)? = & (2, + 84641)? a t t-1 1 To calculate the sum we need a starting value of 9 which may be taken as zero, its mean value. Thus 321 &2 = 22 + 04£1 = 22 + O42, fg = 23 + O1e2 = 23 + 8129 + 01221 etc... The residuals are seen to be nonlinear. functions of the parameters. 
Since the MA model was constrained to be invertible, namely |θ_1| < 1, the above series converges and depends on the past observations and on θ_1. In general, the choice of the starting value only affects the first few residuals and does not significantly affect the parameter estimates. As S(θ) theoretically depends on the initial value chosen (ε_0 = 0), it is called the conditional sum of squares.

McLeod (1976) has proposed a modified sum of squares method which provides refined estimates of the parameters. The modified sum of squares is an exact maximum likelihood estimation procedure for the AR processes. The modified sum-of-squares function is minimized in order to obtain the improved parameter estimates.

Reliability of Estimated Parameters. To simplify the writing, let β denote the vector of the p+q parameters φ_1, ..., φ_p, θ_1, ..., θ_q. In statistical estimation theory it is shown that in large samples the maximum likelihood estimates (MLE) of the parameters are normally distributed with mean equal to the true value of the parameter. Furthermore, the variance-covariance matrix of the parameter estimates is related to the derivatives of the sum of squares of the residuals.

First, the variance-covariance matrix of the parameters is defined by

             | Var(β̂_1)           Cov(β̂_1,β̂_2)   ...   Cov(β̂_1,β̂_{p+q}) |
    V(β̂) =  | Cov(β̂_2,β̂_1)      Var(β̂_2)       ...   Cov(β̂_2,β̂_{p+q}) |
             |    ...                ...                    ...            |
             | Cov(β̂_{p+q},β̂_1)     ...               Var(β̂_{p+q})      |

so that the standard errors of estimate of the parameters are obtained by taking the square roots of the diagonal terms of the matrix V(β̂). It can be shown that the variance-covariance matrix of the MLE of the parameters is given by (see Box and Jenkins, 1976, p. 227)

                    | ∂²S(β)/∂β_1²          ...   ∂²S(β)/∂β_1∂β_{p+q} | -1
    V(β̂) ≈ 2σ_ε²  |    ...                       ...                 |       (5.38)
                    | ∂²S(β)/∂β_{p+q}∂β_1   ...   ∂²S(β)/∂β_{p+q}²    |

For the ARMA(1,1) model the above relationship reduces to

              | Var(φ̂_1)         Cov(φ̂_1,θ̂_1) |            | ∂²S(φ_1,θ_1)/∂φ_1²       ∂²S(φ_1,θ_1)/∂φ_1∂θ_1 | -1
    V(β̂) =   |                                  |  ≈ 2σ_ε² |                                                  |     (5.39)
              | Cov(φ̂_1,θ̂_1)    Var(θ̂_1)      |            | ∂²S(φ_1,θ_1)/∂θ_1∂φ_1    ∂²S(φ_1,θ_1)/∂θ_1²     |

The second derivatives of the sum of squares are usually approximated numerically from the sums of squares of the residuals evaluated for different values of the parameters. However, for the ARMA(1,1) model a simple expression may be obtained (see Box and Jenkins, 1976, p. 246):

    V(β̂) ≈ (1/N) [(1 - φ_1θ_1)/(φ_1 - θ_1)²] | (1-φ_1²)(1-φ_1θ_1)   (1-φ_1²)(1-θ_1²)   |
                                               | (1-φ_1²)(1-θ_1²)     (1-θ_1²)(1-φ_1θ_1) |   (5.40)

and the standard errors are

    σ(φ̂_1) = [ (1 - φ_1θ_1)² (1 - φ_1²) / (N (φ_1 - θ_1)²) ]^{1/2}   (5.41)

and

    σ(θ̂_1) = [ (1 - φ_1θ_1)² (1 - θ_1²) / (N (φ_1 - θ_1)²) ]^{1/2} .   (5.42)

Thus for any parameter β_i the (1-α) confidence interval is given by

    [ β̂_i - u_{α/2} σ(β̂_i) , β̂_i + u_{α/2} σ(β̂_i) ]   (5.43)

where u_{α/2} is the deviate which excludes a fraction α/2 in the upper tail of the normal distribution. The approximate (1-α) confidence region of the sum of squares is bounded by

    S(β) = S(β̂) [ 1 + χ_α²(p+q)/N ]   (5.44)

where χ_α²(p+q) is the deviate which excludes a fraction α in the upper tail of the chi-square distribution with p+q degrees of freedom.

Nonlinear Estimation of Parameters. It may occur that the sum-of-squares surface does not exhibit a sharp minimum, or that more accurate estimates of the model parameters are required. It is therefore desired to evaluate the values of the φ's and θ's for which

    ∂S(φ_i,θ_j)/∂φ_i = 0   and   ∂S(φ_i,θ_j)/∂θ_j = 0 .   (5.45)

However, as previously noted, the moving average process is responsible for the nonlinearity in S(φ,θ), and ∂S(φ_i,θ_j)/∂θ_j = 0 is nonlinear in the θ_j's. To obtain the parameters, a linear approximation of the Taylor series expansion of ε_t is therefore used. For a detailed discussion of the nonlinear estimation of parameters the reader is referred to the text of Box and Jenkins (1976, Sec.
7.2).

5.2.3 GOODNESS-OF-FIT FOR ANNUAL ARMA MODELS

Once the parameters of the identified model have been estimated, the next phase is to verify the validity of the model. One approach consists in overfitting, that is, adding parameters and showing that the added parameters are not significantly different from zero. A second approach consists in considering the modeling of time series as a procedure to transform the observed data into a series of purely independent residuals. Finally, the parsimony of the model parameters is tested.

Overfitting

The procedure of overfitting consists of adding a parameter and testing the hypothesis that the added parameter is equal to zero, provided that the new parameter is not redundant. For example, if the proposed model is an ARMA(1,1)

    z_t = φ_1 z_{t-1} + ε_t - θ_1 ε_{t-1}   (5.46)

one could test the ARMA(2,2)

    z_t = φ_1 z_{t-1} + φ_2 z_{t-2} + ε_t - θ_1 ε_{t-1} - θ_2 ε_{t-2} .   (5.47)

However, from Eq. (5.46) one can write

    θ_1 z_{t-1} = θ_1 φ_1 z_{t-2} + θ_1 ε_{t-1} - θ_1² ε_{t-2}

which, upon subtraction from Eq. (5.46), gives

    z_t = (φ_1 + θ_1) z_{t-1} - φ_1 θ_1 z_{t-2} + ε_t - 2θ_1 ε_{t-1} + θ_1² ε_{t-2}   (5.48)

which is an ARMA(2,2) model, however, not independent of the model of Eq. (5.46). To test that the model of Eq. (5.47) is not the same as the model of Eq. (5.46), one may calculate the residual variance given by Eq. (5.36) corrected for the additional degrees of freedom due to the additional parameters. The corrected residual variance is

    σ̂_ε² = S(φ̂,θ̂) / (N - n)   (5.49)

where N is the number of observations and n the number of parameters (φ's and θ's). A decrease in the corrected residual variance provides a measure of the improvement due to the added parameters.

Testing of the Residuals for Independence

Several tests of independence are discussed in Chapter 3.
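One of them, the Porte Manteau statistic Q = N Σ_{k=1}^{L} r_k²(ε) applied below, can be sketched in Python as follows (an illustrative script, not the book's IMSL implementation):

```python
def porte_manteau_q(residuals, L):
    """Box-Pierce (Porte Manteau) statistic: N times the sum of the first L
    squared autocorrelations of the residual series.  Compare the result with
    the chi-square quantile for L - p - q degrees of freedom; a large Q means
    the residuals are not independent and the model should be rejected."""
    N = len(residuals)
    mean = sum(residuals) / N
    c0 = sum((e - mean) ** 2 for e in residuals) / N
    q_stat = 0.0
    for k in range(1, L + 1):
        ck = sum((residuals[t] - mean) * (residuals[t + k] - mean)
                 for t in range(N - k)) / N
        q_stat += (ck / c0) ** 2
    return N * q_stat
```

An alternating +1/-1 "residual" series of length 100 gives Q = 470.55 for L = 5, far above the chi-square value of about 11.07 for 5 degrees of freedom at the 5 percent level, so the hypothesis of independent residuals would be rejected for such a series.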
Two tests commonly applied for the diagnostic checking of the fitted ARMA models are the Porte Manteau lack of fit test and the Cumulative Periodogram test applied to the residuals.

The Porte Manteau lack of fit test considers the autocorrelation function r_k(ε) of the residuals of the fitted ARMA model, such that the statistic (see Eq. 3.46a)

    Q = N Σ_{k=1}^{L} r_k²(ε)   (5.50)

is approximately distributed as χ²(L-p-q). The adequacy of the model may be checked by comparing Q with the theoretical chi-square value for (L-p-q) degrees of freedom. The Cumulative Periodogram test can be used by determining from Eq. (3.47) the periodogram of the residual series ε_t and the normalized cumulative periodogram P_i of Eq. (3.48). Then the Kolmogorov-Smirnov statistic would be used for testing whether the plot of P_i vs. h_i is a straight line joining the points (0,0) and (0.5,1). Refer to Sec. 3.5.1 for more details about this test.

Tests for the Parsimony of Parameters

For most hydrologic time series, for example flow series, the underlying physics involves many phenomena and their interactions, such as rainfall, interception, detention, infiltration, snowmelt, groundwater flow, evapotranspiration, etc. Most of these phenomena vary in time and space, and as a result the physics of the phenomena to be represented by stochastic processes is either too complex to be expressed in simple lumped models or not well understood. Some physical explanations have been given for some models. For example, the MA process may be considered as a model relating mean annual runoff to mean annual rainfall (see Sec. 5.2.7), and a rainfall-runoff transformation has been suggested for the simulation of annual flows by an ARMA(1,1) model (see Sec. 5.2.7). In general, however, these explanations are given a posteriori, and seldom can the structure of the stochastic model be justified by an appeal to the physics of the process (see Sec. 1.4).
Instead, we try to gain an understanding of the process through these models. As a result there is a need for nonsubjective criteria for the selection between competing models of the same phenomena. A common rule for choosing between models is the principle of parsimony of parameters, which requires a model with the smallest number of parameters. One criterion for selecting among competing ARMA(p,q) models is the information criterion proposed by Akaike (1974). He used the equation (see Sec. 3.6.2)

    AIC(p,q) = N ln(MLE of residual variance) + 2(p+q)   (5.51)

where N is the sample size. The preferred ARMA(p,q) model is that which yields the minimum value of AIC of Eq. (5.51).

5.2.4 GENERATION USING ANNUAL ARMA MODELS

Once the ARMA model is fitted it may be used either for generation of synthetic series or for forecasting future events. For generation purposes the ARMA(p,q) model

    z_t = Σ_{i=1}^{p} φ_i z_{t-i} + ε_t - Σ_{j=1}^{q} θ_j ε_{t-j}   (5.52)

may be used recursively to generate synthetic values. It is necessary to give p initial z values. By generating a sufficiently long series and neglecting the first 50 or 100 terms, the transient effect of the initial values becomes negligible. An example of synthetic generation of annual flows is given in Sec. 5.2.7.

The synthetically generated series is expected to preserve some of the statistical properties of the historical data: the mean, the variance and the autocorrelation structure. The ability of the ARMA processes to model the long-term dependence is discussed in Sec. 5.2.8.
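The recursive generation of Eq. (5.52) can be sketched as follows (Python; the function name, the 100-term warm-up and the standard normal generator are illustrative choices, not the IMSL FTGEN routine):

```python
import random

def generate_arma(phi, theta, n, sigma=1.0, warmup=100, seed=12345):
    """Generate n values of a zero-mean ARMA(p,q) series by Eq. (5.52):
    z_t = sum_i phi_i z_{t-i} + eps_t - sum_j theta_j eps_{t-j}.
    The p initial z's and q initial eps's are set to zero, and the first
    `warmup` generated terms are discarded to remove their transient effect."""
    rng = random.Random(seed)
    p, q = len(phi), len(theta)
    z = [0.0] * max(p, 1)
    eps = [0.0] * max(q, 1)
    out = []
    for t in range(warmup + n):
        e = rng.gauss(0.0, sigma)
        zt = (sum(phi[i] * z[-1 - i] for i in range(p))
              + e
              - sum(theta[j] * eps[-1 - j] for j in range(q)))
        z.append(zt)
        eps.append(e)
        if t >= warmup:
            out.append(zt)
    return out

# ARMA(1,1) with phi1 = 0.8656 and theta1 = 0.5037 (the Niger River fit of Sec. 5.2.7):
series = generate_arma([0.8656], [0.5037], 500)
```

The generated series should show the positive low-lag autocorrelation implied by the parameters; its sample lag-one autocorrelation is close to the population value of about 0.53 for this parameter pair.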
5.2.5 FORECASTING USING ANNUAL ARMA MODELS

One of the important applications of ARMA processes in hydrology is to forecast hydrologic variables one or several time steps ahead. Referring to an origin at time t, it is desired to use the ARMA models to make a minimum mean square error forecast of z_{t+L} for L ≥ 1, where L is described as the "lead time." The forecasted value of z_{t+L} for an origin at t with lead time L will be written ẑ_t(L). Box and Jenkins (1976) show that the minimum mean square error forecast ẑ_t(L) is the conditional expectation of z_{t+L} at time t; when regarded as a function of L for fixed t, ẑ_t(L) may be termed the forecast function, or

    ẑ_t(L) = E[z_{t+L} | z_t, z_{t-1}, ...] .

An observation z_{t+L} at time t+L can be expressed as an infinite weighted sum of previous random values as in Eq. (5.2). Separating the future values up to lead L from the past values up to the present time t, z_{t+L} may be written as

    z_{t+L} = Σ_{j=0}^{L-1} ψ_j ε_{t+L-j} + Σ_{j=L}^{∞} ψ_j ε_{t+L-j} .   (5.53)

The second summation on the right-hand side is seen to depend on present and previous values ε_{t+L-j}, j = L, L+1, ..., and is the minimum mean square error forecast of z_{t+L} at the time origin t for lead time L. Thus,

    ẑ_t(L) = Σ_{j=L}^{∞} ψ_j ε_{t+L-j} .   (5.54)

The first term on the right-hand side of Eq. (5.53) depends on the future random values ε_{t+1}, ε_{t+2}, ..., ε_{t+L}, which are unknown at time t, and is the forecast error for lead time L:

    e_t(L) = Σ_{j=0}^{L-1} ψ_j ε_{t+L-j} .   (5.55)

Since E[ε_{t+j}] = 0 for j ≤ L, E[e_t(L)] = 0 and the forecast is unbiased.

The forecast may also be written in terms of the difference equation

    z_{t+L} = φ_1 z_{t+L-1} + ... + φ_p z_{t+L-p} + ε_{t+L} - θ_1 ε_{t+L-1} - ... - θ_q ε_{t+L-q}   (5.56)

by taking the conditional expectations, designated here by square brackets:

    ẑ_t(L) = [z_{t+L}] = φ_1 [z_{t+L-1}] + ... + φ_p [z_{t+L-p}] + [ε_{t+L}] - θ_1 [ε_{t+L-1}] - ... - θ_q [ε_{t+L-q}] .   (5.57)

The forecasting function is then obtained by noting that the present and past values (with a subscript equal to or less than t) have occurred and are no longer random. Thus, for past values the conditional expectation is the value itself, i.e., [z_{t-j}] = z_{t-j}, j = 0, 1, 2, .... For future values the conditional expectation is the forecast, i.e., [z_{t+j}] = ẑ_t(j), j = 1, 2, ..., and likewise for the past values of the random terms [ε_{t-j}] = ε_{t-j}, while the conditional expectation of future random values is zero, i.e., [ε_{t+j}] = 0, j = 1, 2, ....

The one-step-ahead forecast at time t is (with L = 1)

    ẑ_t(1) = φ_1 z_t + ... + φ_p z_{t-p+1} - θ_1 ε_t - ... - θ_q ε_{t-q+1}   (5.58)

which is given in terms of the present and past random terms.
At time t+1 the observation z_{t+1} is available and

    z_{t+1} = φ_1 z_t + ... + φ_p z_{t-p+1} + ε_{t+1} - θ_1 ε_t - ... - θ_q ε_{t-q+1} .   (5.59)

Subtracting Eq. (5.58) from Eq. (5.59) we have

    z_{t+1} - ẑ_t(1) = ε_{t+1}

or in general

    ε_{t+1} = z_{t+1} - ẑ_t(1) .   (5.60)

Thus Eq. (5.60) provides a way of calculating the past ε's. For example, for the ARMA(1,1) model the one-step-ahead forecasts can be written as

    ẑ_0(1) = φ_1 z_0 - θ_1 ε_0
    ẑ_1(1) = φ_1 z_1 - θ_1 ε_1 = φ_1 z_1 - θ_1 [z_1 - ẑ_0(1)]
           = φ_1 z_1 - θ_1 z_1 + θ_1 φ_1 z_0 - θ_1² ε_0
    ẑ_2(1) = φ_1 z_2 - θ_1 ε_2 = φ_1 z_2 - θ_1 [z_2 - ẑ_1(1)]
           = φ_1 z_2 - θ_1 z_2 + θ_1 φ_1 z_1 - θ_1² z_1 + θ_1² φ_1 z_0 - θ_1³ ε_0
    etc.

It is seen that the effect of the unobserved values z_0 and ε_0 gradually diminishes, as |φ_1| < 1 and |θ_1| < 1. It is desirable to take z_0 = z̄ and ε_0 = 0, their expected values. Therefore, by building up the one-step-ahead forecasts from the beginning of the series it is possible to obtain the values of the q past ε's necessary to calculate the forecasts ẑ_t(1), ẑ_t(2), ..., ẑ_t(L). For L ≤ q the forecasts are

    ẑ_t(L) = φ_1 ẑ_t(L-1) + ... + φ_p ẑ_t(L-p) - θ_L ε_t - ... - θ_q ε_{t+L-q} ,   L ≤ q   (5.61)

and for L > q the forecasts depend on the ε's only through the previous forecasts:

    ẑ_t(L) = φ_1 ẑ_t(L-1) + ... + φ_p ẑ_t(L-p) ,   L > q .   (5.62)

Obviously, if there is no moving average component, the past ε's do not intervene in the forecast calculations. Thus for 0 < φ_1 < 1 an AR(1) model will give ẑ_t(1) = φ_1 z_t < z_t for positive z_t, but an ARMA(1,1) model may give ẑ_t(1) either smaller or larger than φ_1 z_t, depending on the last residual.

To estimate the forecast errors it is necessary to evaluate the ψ coefficients in Eq. (5.55). For this purpose, z_t is written as in Eq. (5.2) as an infinite series of weighted random values

    z_t = Σ_{j=0}^{∞} ψ_j ε_{t-j} ,   ψ_0 = 1 .   (5.63)

Inserting Eq. (5.63) into the general ARMA model

    z_t - Σ_{i=1}^{p} φ_i z_{t-i} = ε_t - Σ_{j=1}^{q} θ_j ε_{t-j}   (5.64)

yields

    Σ_{j=0}^{∞} ψ_j ε_{t-j} - Σ_{i=1}^{p} φ_i Σ_{j=0}^{∞} ψ_j ε_{t-i-j} = ε_t - Σ_{j=1}^{q} θ_j ε_{t-j} .   (5.65)

The ψ coefficients are obtained by equating the coefficients of equal ε terms in Eq. (5.65). For example, for the ARMA(1,1) process Eq. (5.65) becomes

    ψ_j - φ_1 ψ_{j-1} = -θ_j ,   with ψ_0 = 1 and θ_j = 0 for j > 1 .   (5.66)

Applying Eq. (5.66) recursively we obtain

    for ε_{t-1}:   ψ_1 - φ_1 ψ_0 + θ_1 = 0   so that   ψ_1 = φ_1 - θ_1
    for ε_{t-2}:   ψ_2 - φ_1 ψ_1 = 0         so that   ψ_2 = (φ_1 - θ_1) φ_1
    for ε_{t-3}:   ψ_3 - φ_1 ψ_2 = 0         so that   ψ_3 = (φ_1 - θ_1) φ_1²
Thus in general ψ_j = (φ_1 - θ_1) φ_1^{j-1}, j ≥ 1.

The variance of the forecast error is (with ψ_0 = 1):

    Var[e_t(L)] = E[e_t²(L)] = σ_ε² Σ_{j=0}^{L-1} ψ_j² .   (5.67)

The (1-α) confidence limits for the minimum mean square error forecast ẑ_t(L) of the actual value z_{t+L} are

    z_{t+L}(±) = ẑ_t(L) ± u_{α/2} [ 1 + Σ_{j=1}^{L-1} ψ_j² ]^{1/2} σ_ε   (5.68)

where u_{α/2} is the deviate exceeded by α/2 of the standard normal distribution, namely u_{α/2} = 1.96 for 1-α = 0.95.

5.2.6 SUMMARIZED ARMA MODELING PROCEDURE FOR ANNUAL SERIES

The fitting of ARMA(p,q) models to an annual hydrologic series z_t of length N comprises the following steps:

STEP (1). Calculate the sample mean z̄ and variance s² of the series.

STEP (2). Calculate and plot the autocovariance function c_k, the autocorrelation coefficients r_k = c_k/s², and the partial autocorrelation coefficients φ_k(k) for lags k going from 1 to at least N/4 but less than N. The methods of estimation of c_k and r_k are given in Sec. 2.2.2, and the estimation of φ_k(k) is given in Appendix A4.2 of Chapter 4. As indicated in that chapter, the partial autocorrelation function may be obtained recursively from

    φ_1(1) = r_1 ,   φ_2(2) = (r_2 - r_1²)/(1 - r_1²) ,   φ_2(1) = φ_1(1) - φ_2(2) φ_1(1)

    φ_{k+1}(k+1) = [ r_{k+1} - Σ_{j=1}^{k} φ_k(j) r_{k+1-j} ] / [ 1 - Σ_{j=1}^{k} φ_k(j) r_j ]   (5.69)

    φ_{k+1}(j) = φ_k(j) - φ_{k+1}(k+1) φ_k(k+1-j) ,   j = 1, 2, ..., k .

Steps (1) and (2) can be performed by the IMSL computer program FTAUTO*.

*See Appendix of Chapter V on computer programs.

STEP (3). Identification. From the behavior of the autocorrelation and partial autocorrelation functions, and making use of Table 5.1, infer the order of the model, namely the values of p and q which are likely to fit the series.

STEP (4). Obtain the initial estimates of the p autoregressive parameters φ_1, φ_2, ..., φ_p by solving the p Yule-Walker equations

    c_{q+1} = φ_1 c_q       + φ_2 c_{q-1}     + ... + φ_p c_{q+1-p}
    c_{q+2} = φ_1 c_{q+1}   + φ_2 c_q         + ... + φ_p c_{q+2-p}   (5.70)
      ...
    c_{q+p} = φ_1 c_{q+p-1} + φ_2 c_{q+p-2}   + ... + φ_p c_q .

If the series z_t does not have zero mean there is an overall constant θ_0 on the right-hand side of Eq. (5.8):

    θ_0 = z̄ (1 - Σ_{i=1}^{p} φ_i) .
(5.71)

This step may be performed by means of the IMSL computer program FTARPS (see Appendix A5.1).

STEP (5). Obtain the initial estimates of the q moving average parameters. Form the series

    z′_t = z_t - φ̂_1 z_{t-1} - ... - φ̂_p z_{t-p}   (5.72)

and calculate the autocovariance function c′_k of the z′_t series. It can be calculated as usual. Alternatively, Box and Jenkins (1976, p. 202) give the following formula for the c′_k in terms of the c_k of the z_t series and the φ̂'s already obtained:

    c′_k = Σ_{i=0}^{p} Σ_{j=0}^{p} φ̂_i φ̂_j c_{k-i+j}   where φ̂_0 = -1 .   (5.73)

With the c′_k, the θ parameters and the residual variance σ_ε² are obtained by solving Eqs. (5.30) and (5.31) iteratively, in which the initial values of the unknown parameters are assumed zero. These equations are

    σ̂_ε² = c′_0 / (1 + θ̂_1² + ... + θ̂_q²)
                                                                          (5.74)
    θ̂_k = - (c′_k/σ̂_ε² - θ̂_1 θ̂_{k+1} - ... - θ̂_{q-k} θ̂_q) ,   k = 1, ..., q .

This completes the initial estimation of the parameters φ_1, ..., φ_p, θ_1, ..., θ_q, σ_ε² and θ_0. The first estimate of the model is thus

    z_t = θ̂_0 + Σ_{i=1}^{p} φ̂_i z_{t-i} + ε_t - Σ_{j=1}^{q} θ̂_j ε_{t-j} .   (5.75)

The IMSL subroutine FTMPS evaluates the c′'s by Eq. (5.73) and calculates the roots of the nonlinear system (5.74), which with the notation x_0² = σ_ε², x_j = -θ_j σ_ε becomes

    c′_j = Σ_{i=0}^{q-j} x_i x_{i+j} ,   j = 0, 1, ..., q .   (5.76)

The program outputs are the moving average coefficients and the residual variance.

STEP (6). Obtain the maximum likelihood estimates of the parameters. Calculate the residuals

    ε_t = z_t - θ̂_0 - Σ_{i=1}^{p} φ̂_i z_{t-i} + Σ_{j=1}^{q} θ̂_j ε_{t-j} ,   t = p+1, ..., N   (5.77)

with the starting ε's taken as zero, together with the sum of squares S = Σ_{t=p+1}^{N} ε_t², for several values of φ and θ around the initial estimates, and obtain the values of the φ's and θ's for which S is minimum. This may be done graphically. For the ARMA(1,1) model one may plot the values of S on a φ-θ plane. Contours of equal values of S may be traced, the minimum sum of squares is located, and the corresponding values of φ and θ are obtained. The variance of the residuals is σ̂_ε² = (1/N)S. This graphical procedure has the advantage that it exhibits any peculiarity that the sum-of-squares surface may have. After verification
that the surface is free of anomalies, this procedure may be extended by means of the IMSL program FTMXL, which minimizes S by a modified steepest descent algorithm and gives the maximum likelihood estimates of the parameters and the residuals. The standard errors of estimate of the parameters are obtained by taking the square roots of the diagonal terms of the variance-covariance matrix (Eq. 5.38), and the (1-α) confidence intervals for the parameters are obtained from Eq. (5.43).

STEP (7). Perform the Porte Manteau test to check that ε_t is a normal independent variable. Calculate the autocorrelation function r_k(ε) of the residual series for lags k going from 1 to L = N/10 + p + q. (The ε_t may be obtained from IMSL program FTMXL.) Calculate the statistic

    Q = N Σ_{k=1}^{L} r_k²(ε)

and determine whether Q is less than the theoretical chi-square value with L-p-q degrees of freedom. If this test is not passed the model is rejected. This step may be performed using the IMSL program FTAUTO applied to the residuals obtained from program FTMXL. All the procedures outlined in steps (1) through (7) may be performed by means of the IMSL computer program FTCMP.

STEP (8). Perform the Akaike test to select the final model among competing models. The previous procedure, steps (1) through (7), may be performed for several models, for example AR(2), ARMA(1,1), ARMA(2,1), etc., for which the maximum likelihood estimation of the parameters is found to converge. The best model is found by calculating the Akaike information criterion

    AIC(p,q) = N ln(σ̂_ε²) + 2(p+q)

where σ̂_ε² is the maximum likelihood estimate of the residual variance obtained in step (6). The model having the minimum value of AIC(p,q) is selected.

STEP (9). Generation of synthetic series. Once the ARMA model has been selected it may be used for generation of synthetic data. The series is generated by the formula
    z_t = θ̂_0 + Σ_{i=1}^{p} φ̂_i z_{t-i} + ε_t - Σ_{j=1}^{q} θ̂_j ε_{t-j} .

It is necessary to give p initial values to start the algorithm. Generation of normal random numbers usually yields variates with zero mean and unit variance. These are multiplied by σ_ε to obtain random numbers with zero mean and variance σ_ε². If the series has zero mean, the term θ̂_0 is zero. This step may be implemented by means of the IMSL program FTGEN. The program FTCMP estimates the parameters (steps (1) through (7)) and generates synthetic series (step (9)).

STEP (10). Inverse transformation to obtain the synthetically generated hydrologic variable. If the z_t values generated in the previous step are centered so that

    z_t = y_t - ȳ

where y_t is the original hydrologic variable, then after generating the z_t series the y_t is obtained from

    y_t = ȳ + z_t .

If the z_t is standardized such that

    z_t = (y_t - ȳ)/s_y

where y_t is the original hydrologic variable, then after generating the z_t series, y_t is obtained from

    y_t = ȳ + s_y z_t .

STEP (11). Forecasting. Obtain the forecasting function for the lead time L from Eq. (5.57). Specifically, for L ≤ q

    ẑ_t(L) = φ_1 ẑ_t(L-1) + ... + φ_p ẑ_t(L-p) - θ_L ε_t - ... - θ_q ε_{t+L-q}

and for L > q

    ẑ_t(L) = φ_1 ẑ_t(L-1) + ... + φ_p ẑ_t(L-p) .

To apply the above equations it is necessary to obtain the q previous random values. This is done by calculating successively ẑ_1(1), ε_2, ẑ_2(1), ε_3, ẑ_3(1), ε_4, ..., ε_t, ẑ_t(1) by means of Eqs. (5.59) and (5.60). The initial values of ε_1, ..., ε_q may be taken as zero. If the series is sufficiently long, it is not necessary to reconstruct the ε's from the beginning of the observations; one may start at an arbitrary point in the series at least q+10 terms back. Next the ψ-weights are obtained from Eq. (5.66), and the sum of the first L-1 squares of the ψ-weights is calculated to obtain the variance of the forecast error given by Eq. (5.67). Finally the (1-α)100% confidence interval is obtained from Eq. (5.68).
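For the ARMA(1,1) case this forecasting step can be sketched as follows (Python; an illustrative script following Eqs. (5.58), (5.60), (5.62) and (5.67), not the FTCAST routine):

```python
def arma11_forecast(z, phi1, theta1, L, sigma2):
    """Minimum mean square error forecasts z_t(1), ..., z_t(L) for a zero-mean
    ARMA(1,1), plus the forecast-error variances of Eq. (5.67).
    The past eps are rebuilt from the start of the series with eps_1 = 0,
    using eps_{t+1} = z_{t+1} - z_t(1), Eq. (5.60)."""
    eps = 0.0
    for t in range(1, len(z)):
        one_step = phi1 * z[t - 1] - theta1 * eps   # Eq. (5.58)
        eps = z[t] - one_step                        # Eq. (5.60)
    # Forecast function: Eq. (5.58) for L = 1, then Eq. (5.62) for L > q = 1.
    forecasts = [phi1 * z[-1] - theta1 * eps]
    for _ in range(L - 1):
        forecasts.append(phi1 * forecasts[-1])
    # psi weights psi_j = (phi1 - theta1) * phi1**(j-1); error variances, Eq. (5.67).
    variances = []
    cum = 1.0                                        # psi_0**2
    for lead in range(1, L + 1):
        variances.append(sigma2 * cum)
        cum += ((phi1 - theta1) * phi1 ** (lead - 1)) ** 2
    return forecasts, variances
```

With θ_1 = 0 this reduces to the AR(1) case: for φ_1 = 0.5 and a last observation of 2.0, the forecasts decay as 1.0, 0.5, 0.25 and the error variances grow as σ_ε², 1.25σ_ε², 1.3125σ_ε². The (1-α) limits of Eq. (5.68) are then forecasts[L-1] ± 1.96·variances[L-1]**0.5 for α = 0.05.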
The IMSL program FTCAST computes the time series forecasts and associated limits as described in this step.

5.2.7 EXAMPLES OF ARMA MODELING FOR GENERATION AND FORECASTING OF ANNUAL TIME SERIES

Annual hydrologic series are, in general, stationary, and the ARMA(p,q) model given by Eq. (5.9) may be applied. Two extreme cases are the autoregressive model or ARMA(p,0) and the moving average model ARMA(0,q). The autoregressive models AR(p) or ARMA(p,0) are discussed in Chapter 4.

The moving average model MA(q) or ARMA(0,q) has been applied to annual series. Matalas (1963) used the MA model to relate effective annual precipitation to annual runoff. In particular, Yevjevich (1963, a,b) used the moving average model to relate the mean annual runoff z_t and the annual effective precipitation ε_{t-i}, assumed to be an independent white noise. The moving average coefficients θ_i were shown to be the fractions of the annual precipitation flowing out in the (t-i)th year, thus obtaining a physical interpretation of the MA model. Operating recursively on this model with the ε_t's considered as mutually independent random variables, Yevjevich (1964) obtained an autoregressive model with fewer terms. In fact, a moving average model, under suitable conditions of invertibility (see Sec. 5.1.2), may be transformed into an autoregressive model of infinite order and vice versa. It is intuitively reasonable then to expect that there exists a mixed ARMA model with the least number of parameters to model time series parsimoniously (see Sec. 1.4).

The ARMA model has been applied to annual streamflow series, among others, by Carlson et al. (1970) to the St. Lawrence, the Missouri, the Neva and the Niger rivers. Hipel (1977) also fitted ARMA models to the St. Lawrence River yearly flow series.
O'Connell (1977) studied the ARMA(1,1) model to generate synthetic annual flow series, although the main thrust of his work was the investigation of the ability of the ARMA(1,1) model to preserve the persistence characteristics of streamflow series.

In this section the calculations for the Niger River are done in detail. The annual modular coefficients (annual flows divided by the mean flow) and the mean flow for the Niger River at Koulicoro, Africa, for the years 1906 to 1957 are taken from Yevjevich (1963). The standardized flows are obtained from

    z_t = (q_t - q̄)/s

where q_t are the annual flows, q̄ is the mean flow and s is the standard deviation of the flows. The data are given in Table 5.2. The analyses of the modular coefficient series u_t and of the standardized series z_t are both done to exhibit the difference between series with and without zero mean. The former series has a mean of unity and the latter a mean of zero.

STEP (1). The mean and variance of the series are

    ū = 1.0 ,   s_u² = 0.0575   and   z̄ = -0.0001 ≈ 0 ,   s_z² = 0.9804 ≈ 1 .

STEP (2). The autocovariance, autocorrelation coefficients and partial autocorrelation coefficients are calculated for lags 0 through 29. The autocorrelation and partial autocorrelation are the same for both series. The results for the z_t series up to lag 29 are shown in Table 5.3. The autocorrelation and partial autocorrelation are plotted in Figs. 5.7 and 5.8.

Table 5.2. Niger River at Koulicoro, Africa. Modular Coefficients (Annual flow/mean), Annual Flows (CFS) and Standardized Flows, 1906-1957 (after Yevjevich, 1963). [Tabulated values not reproduced.]

Table 5.3. Autocovariance, Autocorrelation and Partial Autocorrelation of Standardized Annual Flows, Niger River, lags 0 through 29. [Only the leading values are legible: r_1 = 0.53478, r_2 = 0.46292, r_3 = 0.39980.]

Figure 5.7. Autocorrelation function of standardized annual flows, Niger River.

Figure 5.8. Partial autocorrelation function of standardized annual flows, Niger River.

Note that these functions are defined for integer values of the lags, but the values are connected by lines to give a better pictorial visualization of their behavior.
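The calculations of Steps (1) and (2) are simple to script. A sketch (Python; the biased covariance estimator of Sec. 2.2.2 and the Durbin recursion of Eq. (5.69), not the FTAUTO routine):

```python
def acf(z, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag (biased covariance estimator)."""
    n = len(z)
    m = sum(z) / n
    c0 = sum((x - m) ** 2 for x in z) / n
    return [sum((z[t] - m) * (z[t + k] - m) for t in range(n - k)) / n / c0
            for k in range(1, max_lag + 1)]

def pacf(r):
    """Partial autocorrelations phi_1(1), ..., phi_K(K) from the
    autocorrelations r_1, ..., r_K by Durbin's recursion, Eq. (5.69)."""
    phi = {1: {1: r[0]}}            # phi[k][j], 1-based indices as in the text
    out = [r[0]]
    for k in range(1, len(r)):
        # Note r_m is stored as r[m-1], so r_{k+1} is r[k] and r_{k+1-j} is r[k-j].
        num = r[k] - sum(phi[k][j] * r[k - j] for j in range(1, k + 1))
        den = 1.0 - sum(phi[k][j] * r[j - 1] for j in range(1, k + 1))
        pk = num / den
        phi[k + 1] = {j: phi[k][j] - pk * phi[k][k + 1 - j]
                      for j in range(1, k + 1)}
        phi[k + 1][k + 1] = pk
        out.append(pk)
    return out
```

With r_1 = 0.53478, r_2 = 0.46292 and r_3 = 0.39980 the recursion returns 0.53478, 0.24780 and 0.12096, matching the hand calculation for the Niger River series.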
The partial autocorrelations were calculated by Durbin's formula (5.69) for r_1 = 0.53478, r_2 = 0.46292, r_3 = 0.39980:

    φ_1(1) = r_1 = 0.53478

    φ_2(2) = (r_2 - r_1²)/(1 - r_1²) = (0.46292 - 0.53478²)/(1 - 0.53478²) = 0.24780

    φ_2(1) = r_1(1 - r_2)/(1 - r_1²) = 0.53478(1 - 0.46292)/(1 - 0.53478²) = 0.40226

    φ_3(3) = [r_3 - φ_2(1) r_2 - φ_2(2) r_1] / [1 - φ_2(1) r_1 - φ_2(2) r_2]
           = (0.39980 - 0.40226 × 0.46292 - 0.24780 × 0.53478)/(1 - 0.40226 × 0.53478 - 0.24780 × 0.46292) = 0.12096

    etc.

STEP (3). Identification. Take the 95% confidence interval given approximately by ±2/√N = ±2/√51 ≈ ±0.28. The autocorrelation coefficient is seen to be positively significant up to lag 3; it continues to decrease and becomes negatively significant from lag 10 through lag 17, then oscillates within the confidence band and again reaches a significant value at lag 27. The partial autocorrelation coefficient is significant at lag 1, then oscillates within the confidence band. This result indicates the presence of a first-order autoregressive component. The autocorrelation coefficients seem to decay more slowly than for an autoregressive model, indicating the possibility of a moving average component. It appears reasonable to try the AR(1), AR(2) and ARMA(1,1) models. The detailed procedure is shown for the ARMA(1,1) model, and the results are compared to those of the AR(1) and AR(2) models.

STEP (4). Initial estimate of the autoregressive parameter. For the ARMA(1,1) model the Yule-Walker equations (5.70) reduce to

    φ̂_1 = c_2/c_1 = r_2/r_1 = 0.46292/0.53478 = 0.8656 .

STEP (5). Initial estimate of the moving average parameter. The autocovariances c′_k of the series of Eq. (5.72) are calculated by Eq. (5.73), which reduces to (since p = 1)

    c′_0 = c_0 + φ̂_1² c_0 - 2 φ̂_1 c_1 = (1 + 0.8656²) × 0.9804 - 2 × 0.8656 × 0.52429 = 0.8074

and

    c′_1 = c_1 + φ̂_1² c_1 - φ̂_1 (c_2 + c_0) = (1 + 0.8656²) × 0.52429 - 0.8656 × (0.45385 + 0.9804) = -0.3244 .

Equations (5.74) reduce to

    σ̂_ε² = c′_0 / (1 + θ̂_1²) = 0.8074 / (1 + θ̂_1²)

and

    θ̂_1 = - c′_1 / σ̂_ε² = 0.3244 / σ̂_ε² .

Thus

    0.3244 θ̂_1² - 0.8074 θ̂_1 + 0.3244 = 0 .

Only the negative root of the quadratic formula gives a value with |θ̂_1| < 1, thus

    θ̂_1 = 0.5037   and   σ̂_ε² = 0.6440 .

The constant θ̂_0 given by Eq. (5.71) is

    θ̂_0 = z̄ (1 - φ̂_1) = -0.0001 × (1 - 0.8656) = -0.000013 ≈ 0 .

The preliminary model for the z_t series is

    z_t = 0.8656 z_{t-1} + ε_t - 0.5037 ε_{t-1}

and for the u_t series, from Eq. (5.71), θ̂_0 = 1 × (1 - 0.8656) = 0.1344 and

    u_t = 0.1344 + 0.8656 u_{t-1} + ε_t - 0.5037 ε_{t-1} .

The values of the parameters obtained by the computer programs FTARPS and FTMPS for the z_t standardized series are φ̂_1 = 0.865634, θ̂_1 = 0.503631, θ̂_0 = -0.000011, and σ̂_ε² = 0.643987.

STEP (6). Maximum likelihood estimates. For judiciously selected values of φ and θ in the neighborhood of the initial estimates, calculate the residuals and the sum of the squares of the residuals. The maximum likelihood estimate corresponds to the minimum of the sum-of-squares surface. The residuals for φ_1 = 0.8 and θ_1 = 0.4 are calculated as follows:

    ε_1 = 0

    ε_2 = z_2 - φ_1 z_1 = -0.871109 + 0.8 × 1.106482 = 0.0140

    ε_3 = z_3 - φ_1 z_2 + θ_1 ε_2 = 1.114639 + 0.8 × 0.871109 + 0.4 × 0.0140 = 1.8172
We choose 0.750 < $; < 0.850 and 0.305 < 8; < 0.400. This is exhibited in Table 5.6 where the quantities Shown are (S-32) x 10°. The minimum value is 1119 which corresponds to 4, = 0.790, 6; = 0.345 and S = 32.01119 Further detail may be obtained by further reducing the ranges of 4, and 6,. We choose 0.7855 < $ < 0.7950 and 0.3405 < 6, < 0.3500, as shown in Table 5.7 where the quantities tabulated are (S - 32.011) x 107, The minimum value is -812 which corresponds to 4, = 0.7905, 8, = 0.348 and § = 32.011 - 0.000812 = 32.010992 ~ 32.011. The standard errors of the parameters are obtained from Eqs. (5.41) and (5.42) which give _ {1 = 0.7905 x 0.348)2¢1 - 0.79052) | 1/7 _ 061) = lb ~~. 7055 = 0.348)" = 0-140 7 2 _ | G@= 0.7905 x 0.348)2(1 - 0.3482) | 1/2 _ (8) = {& “(7805 = 0.348) | 0-215 - Alternatively, the first and second differences of the sum of squares way be calculated from the values shown in Table 5.7. The first and second differences of the entries of Table 5.7, AS and A*S respectively, are shown below. 223 Table 5.4. Residuals (first line of each entry), Squares of residuals (second line of each entry) and sum of squares of residual for $; = 0.8 and 6; = 0.4. These results were obtained with the TI-59 cal- culator program shown in Appendix A5.3. RESIDUALS: 1. 8861 SQUARES 3. 557: Lgeat ns 2351 0.0450 o:0020 0. 4665, S 0.6854 1ses 012177 ess 3443 itge 70, 3448 ois? 6, 0496 8.0025 35 0.9943 0, 1217 ~0. 4362 ao eras 012462 9. 2279 -0. 9466 051s Or i2ot Petes 0.8192 1826 alerio -0. 3681 ise 0958 ~0. 6378 0, 8546 01 4061 01 Sore =1. 
Table 5.5. Sum of Squares Surface - First Run, values of S (scaled by a power of 10); the grid listing is not legible in this copy. The minimum sum of squares printed with the table is 32.0323.

Table 5.6. Sum of Squares Surface - Second Run, values of (S - 32) (scaled by a power of 10); the grid listing is not legible in this copy.
Table 5.7. Sum of Squares Surface, values of (S - 32.011) × 10⁷; the grid listing is not legible in this copy.

For the second derivative with respect to φ₁, at θ₁ = 0.3480:

    φ₁         S'       ∇S'      ∇²S'
    0.7895     172
                        -702
    0.7900    -530                420
                        -282
    0.7905    -812                420
                         138
    0.7910    -674                421
                         559
    0.7915    -115

    S' = (S - 32.011) × 10⁷

The value of the second difference at φ₁ = 0.7905 and θ₁ = 0.3480 is 420 and the second derivative is ∂²S/∂φ₁² ≈ 420/[10⁷ × (0.0005)²] = 168.

For the second derivative with respect to θ₁, at φ₁ = 0.7905:
    θ₁         S'       ∇S'      ∇²S'
    0.3470    -401
                        -311
    0.3475    -712                211
                        -100
    0.3480    -812                212
                         112
    0.3485    -700                212
                         324
    0.3490    -376

    S' = (S - 32.011) × 10⁷

The value of the second difference at φ₁ = 0.7905 and θ₁ = 0.3480 is 212 and ∂²S/∂θ₁² ≈ 212/[10⁷ × (0.0005)²] = 84.8.

For the mixed derivative the first differences of the values shown in Table 5.7 are first taken with respect to φ₁ and the second differences are taken with respect to θ₁ and are shown in Table 5.8. The average value of the second difference at φ₁ = 0.7905 and θ₁ = 0.348 is -219.25 and ∂²S/∂φ₁∂θ₁ ≈ -219.25/[10⁷ × 0.0005 × 0.0005] = -87.7.

Table 5.8. Calculation of Second Differences of Sums of Squares with Respect to φ₁ and θ₁ (the body of the table is not legible in this copy). Note: the values in parentheses are averaged to obtain the second derivative at (0.7905, 0.3480). Also note that S' = (S - 32.011) × 10⁷.

From Eq. (5.36), σ̂² = 32.0109/51 = 0.6277. The variance-covariance matrix of Eq. (5.39) is thus

    V(β̂) ≈ 2 × 0.6277 × [  168    -87.7 ]⁻¹
                         [ -87.7    84.8 ]

The matrix above is inverted using the TI-59 calculator master library program ML-02 with the following result:

    V(β̂) ≈ 2 × 0.6277 × [ 0.01294   0.0134 ]
                         [ 0.0134    0.0256 ]

Thus

    σ(φ̂₁) = √(2 × 0.6277 × 0.01294) = 0.127
    σ(θ̂₁) = √(2 × 0.6277 × 0.0256)  = 0.179

These values are close to those found above by the theoretical formula. The 95% confidence intervals of the parameters are obtained from Eq. (5.43) using the theoretical values σ(φ̂₁) = 0.14 and σ(θ̂₁) = 0.215:

    0.7905 - 1.96 × 0.14 < φ₁ < 0.7905 + 1.96 × 0.14 , or 0.5161 < φ₁ < 1.0649

Observe that the upper limit must be restricted to φ₁ < 1.0 due to the stationarity condition, and

    0.348 - 1.96 × 0.215 < θ₁ < 0.348 + 1.96 × 0.215 , or -0.0734 < θ₁ < 0.7694

The parameter values obtained by the IMSL program FTMXL* after 9 iterations are φ₁ = 0.791457; θ₁ = 0.348672; θ₀ = -0.000017; σ̂² = 0.627668.
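The finite-difference calculation above is easy to reproduce. The sketch below is illustrative: the S' grid values are those read from the difference tables printed in the text (one θ₁ entry, -712, is partly illegible in this copy and is inferred from the printed second differences), and a direct 2×2 inversion stands in for the TI-59 ML-02 matrix program.

```python
import math

# S' = (S - 32.011) * 1e7 on a 0.0005 grid around the minimum.
h = 0.0005
S_phi = [172, -530, -812, -674, -115]     # theta_1 fixed at 0.3480
S_theta = [-401, -712, -812, -700, -376]  # phi_1 fixed at 0.7905; -712 inferred

def second_diff(s):
    """Central second difference at the middle grid point."""
    return s[3] - 2 * s[2] + s[1]

# Second derivatives of S (the division undoes the 1e7 scaling of S').
d2_phi = second_diff(S_phi) / (1e7 * h * h)      # 420 / 2.5 = 168
d2_theta = second_diff(S_theta) / (1e7 * h * h)  # 212 / 2.5 = 84.8
d2_mixed = -219.25 / (1e7 * h * h)               # averaged mixed difference, -87.7

# V(beta-hat) ~ 2 * sigma2 * H^-1, H being the matrix of second derivatives.
sigma2 = 0.6277
det = d2_phi * d2_theta - d2_mixed ** 2
inv = [[d2_theta / det, -d2_mixed / det],
       [-d2_mixed / det, d2_phi / det]]
sd_phi = math.sqrt(2 * sigma2 * inv[0][0])
sd_theta = math.sqrt(2 * sigma2 * inv[1][1])
print(round(sd_phi, 3), round(sd_theta, 3))   # 0.127 0.179
```

The standard errors 0.127 and 0.179 agree with the values obtained above with the TI-59 inversion.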
The parameter values obtained by the IMSL program FTCMP* after 50 iterations are φ₁ = 0.790691; θ₁ = 0.348173; θ₀ = -0.000017; σ̂² = 0.627668. The residuals and their sums of squares obtained from the IMSL program FTMXL* are shown in Table 5.9.

*See Appendix of Chapter 5 on computer programs.

STEP (7). Goodness of fit tests. The autocorrelation of the residuals r_k(ε) is calculated and plotted at least for lags up to L = N/10 + p + q = 51/10 + 1 + 1 = 7.1 ≈ 8. The residual statistics are listed in Table 5.10. Figure 5.9 shows the autocorrelation of the residuals up to lag 29. It is seen to remain well within the 95% confidence interval, indicating that there is no significant correlation structure in the residuals.

The Q-statistic, calculated from the sum of the squares of the autocorrelations, is:

    Q = N Σ_{k=1}^{L} [r_k(ε)]² = 51 × 0.33966 = 17.3227

Comparing this Q-statistic with the chi-square value with L - p - q = 29 - 1 - 1 = 27 degrees of freedom, which at the 0.05 significance level is 40.113, shows that the ARMA(1,1) model passes the Porte Manteau test. Both tests indicate that the residuals appear to be independent.

STEP (8). Akaike test. The Akaike test is used to select the best among competing models. The following models were fitted to the standardized z_t series with the following results:

    Model       θ₀          φ₁       φ₂       θ₁       σ̂²
    AR(1)       -0.000036   0.5588   -        -        0.663424
    AR(2)       -0.000028   0.4036   0.2534   -        0.622512
    ARMA(1,1)   -0.000017   0.7907   -        0.3482   0.627668
    ARMA(2,1)   -0.000017   0.7351   0.0586   0.3171   0.625062

Table 5.9. Residuals, Squares of Residuals and Sums of Squares Obtained from IMSL Program FTMXL.
(Table 5.9 lists, rank by rank, the residuals, their squares, and the cumulative sum of squares; the last residual is 1.362273 and the final sum of squares is 32.011078. The body of the table is not legible in this copy.)

Table 5.10. Statistics of the Residuals: MEAN = .0533, VAR = .6248, followed by the autocovariance, autocorrelation, and partial autocorrelation of the residuals for lags 1 through 29 (the listing is not legible in this copy).
Figure 5.9. Autocorrelation function of residuals (figure not reproduced in this copy).

The Akaike information criterion

    AIC = N ln(maximum likelihood estimate of residual variance) + 2(p+q)

applied to the above models gives

    AIC(1,0) = 51 ln(0.663424) + 2(1) = -18.9274
    AIC(2,0) = 51 ln(0.622512) + 2(2) = -20.1736
    AIC(1,1) = 51 ln(0.627668) + 2(2) = -19.7529
    AIC(2,1) = 51 ln(0.625062) + 2(3) = -17.9651

The AR(2) model has the minimum AIC value and is the theoretical choice. However, the AIC and the variance of the residuals of the ARMA(1,1) model are so close to those of the AR(2) model that both models are essentially equivalent.

STEP (9). Generation. The z_t series may be generated by the formula

    z_t = φ₁ z_{t-1} - θ₁ ε_{t-1} + ε_t + θ₀

The last value of the observed series is taken as initial value. The initial value of the residual ε_{t-1} is taken as the last entry in Table 5.9. A random number generator is used to produce the ε_t values, which are normally distributed with zero mean and variance σ̂²_ε. The results are shown in Table 5.11 and in Fig. 5.10. The synthetic flows are obtained by the transformation

    x_t = z_t s + x̄ = z_t × 13167.569 + 54362.00

The results are shown in Fig. 5.11.

STEP (10). Forecasting. From the model

    z_t = φ₁ z_{t-1} - θ₁ ε_{t-1} + ε_t + θ₀

the following forecasting function is obtained (including θ₀, if nonzero):

    ẑ_t(1) = φ₁ z_t - θ₁ ε_t + θ₀
    ẑ_t(2) = φ₁ ẑ_t(1) + θ₀

Table 5.11. Generation, Forecasting Weights, Forecasts and 95% Forecast Confidence Intervals Calculated for φ₁ = 0.790691, θ₁ = 0.348173, θ₀ = -0.000017, and σ̂² = 0.627668.
(Table 5.11 lists, for leads 1 through 50, the simulated values, the ψ-weights, the forecasts, and the 95% forecast deviations; the listing is not legible in this copy.)

In general,

    ẑ_t(L) = φ₁ ẑ_t(L-1) + θ₀ ,   L ≥ 2.

Figure 5.10. Generation of z_t series, Niger River (figure not reproduced in this copy).

With the parameters φ₁ = 0.790691, θ₁ = 0.348173, θ₀ = -0.000017, z₅₁ = 1.465609, and ε₅₁ = 1.362273, we have
Figure 5.11. Actual and synthetic annual flows, Niger River (figure not reproduced in this copy).

    ẑ₅₁(1) = 0.790691 × 1.465609 - 0.348173 × 1.362273 - 0.000017 = 0.6845
    ẑ₅₁(2) = 0.790691 × 0.6845 - 0.000017 = 0.5412
    ẑ₅₁(3) = 0.790691 × 0.5412 - 0.000017 = 0.4279, etc.

The small but nonzero value of θ₀ produces a small drift in the forecasts. The forecasts are listed in Table 5.11 as calculated by the program FTCMP* and are plotted in Fig. 5.12. The flow forecasts are obtained by the transformation

    x̂_t(L) = ẑ_t(L) s + x̄ = ẑ_t(L) × 13167.569 + 54362.00

and they are plotted in Fig. 5.13.

Figure 5.12. Forecast of z_t series, Niger River (figure not reproduced in this copy).

The ψ-weights are calculated from Eq. (5.66), namely

    ψ_j = (φ₁ - θ₁) φ₁^{j-1} ,   j = 1, 2, ...

and the 95% half confidence interval is calculated from Eq. (5.68) with u_{α/2} = 1.96 and σ̂² = 0.627668. We have thus

*See Appendix of Chapter 5 on computer programs.

Figure 5.13. Forecasts of annual flows, Niger River (figure not reproduced in this copy).

    L    ψ_L      ψ_L²     1 + Σ_{j=1}^{L-1} ψ_j²    1.96 σ̂_ε [1 + Σ_{j=1}^{L-1} ψ_j²]^{1/2}
    1    0.4425   0.1958   1                          1.5528
    2    0.3499   0.1224   1.1958                     1.6981
    3    0.2767   0.0765   1.3183                     1.7829
    etc.

The forecast weights and the 95% confidence intervals (deviations) calculated by the IMSL program FTCMP* are listed in Table 5.11 and plotted in Fig. 5.12 for the z_t's and in Fig. 5.13 for the flows.

It may be more interesting to do "real-time forecasting." For this purpose the first 30 terms of the series are used to evaluate the initial model parameters. With this model the lead-1 term is forecast. When the 31st term becomes available, the 31-term series is used to re-evaluate the parameters and the lead-1 forecast is estimated. When the 32nd term becomes available, the 32-term series is used to re-evaluate the parameters and the lead-1 forecast is estimated. This procedure is repeated up to the 51st term. The calculations are performed by repetitive use of the IMSL program FTCMP*.
Figure 5.14 shows the "real-time" forecasts and the historical data. ‘Table 5.12 shows the evolution of the parameters and the forecasts. The last values of $1, 61, 8 and 6 correspond to those previously found ‘00 LINE -- EXPLANATIGN [ACTUAL DATA @ FORECAST oATa dn su Me yg (wee Figure 5.14. Real time forecas Niger River. Table 5.12. Real Time Forecasting, Standardized Flows. Niger River starting at time t = 30 yr YenR) ARPS RCPS FReCaSi(H) ACTUAL 1 som .gissa : iNa003leesay 5 "igen Telos 2 ‘sg “Sess 3 ngs Ses é igs eres > ; Nese Sases = 8 Ness lsiy = 3 "ges ‘grog 8 ‘ges (Stes it igves leeae fa iiss lass = 5 ‘Boss Sete i res lsaara iB ‘How ‘ss00 6 ‘Hels Seas ¢ Bases [Sr7eg Fa 2rs20 Soar is ‘zrigs “Sat00 2 gross erase 2 eos legos? reeset 2 ‘hole leave? 238 It is also interesting to generate several series. One hundred synthetic series of 51 yearly flows were generated. A sample autocorrelation function, the average autocorrelation function and the theoretical autocorrelation function are shown in Fig. 5.15 AUTOCORRELATION ¢ & \ a wn wt Figure 5.15. Sample ACF of series, average ACF of 100 series, and theoretical ACF. 5.2.8 LIMITATIONS TO BE CONSIDERED IN APPLICATIONS OF ARMA MODELING OF ANNUAL SERIES. A controversy exists regarding the adequacy of the ARMA models for reproducing the persistence encountered in hydrologic time series. The long-range dependence in hydro- logic time series is manifested by the fact that extreme events may persist for a long time. In the design of water re- sources systems through simulation methods, the hydrologic sequence is generated for a very long time, usually extending 500 to 1000 years. In such long records, it is reasonable to expect extreme precipitation and extraordinarily high river Jevels. It is evident that the occurrence of a very long drought period or a very long period of wet years would necessitate enormous reservoir capacities. 
Therefore, the generation of the long-range dependence effect is of vital importance to the water resources system planner.

Hurst (1951, 1956) was the first to do a detailed analysis of the long-term capacities of reservoirs making use of the departures from the mean draft. He analyzed a large number of time series of annual flows, precipitation, temperature, tree rings, and mud varves ranging from 40 to 2000 years, and found that the rescaled range (see Sec. 2.2.5) obeys the relationship R*_N ~ N^h, with h, now called the Hurst coefficient or Hurst slope, having an average value of 0.73 and a standard deviation of 0.08. On the other hand, Hurst (1951) and Feller (1951) showed that for an independent normal process h = 0.5.

The Hurst coefficient h is a measure of the persistence or the dependence structure of a time series and so it is related to the correlation function of the series. Studies have demonstrated that h = 0.5 is valid not only for independent, normal variables but also for some dependent variables such as AR variables. AR models have been labeled short-memory models, or models of short-term dependence, since their correlation functions decay rapidly as the number of lags increases. The opposite occurs for long-memory models, or models of long-term dependence. The consideration of short-term and/or long-term dependence has led hydrologists and statisticians to propose various alternative models for the stochastic modeling of hydrologic time series. The fractional Gaussian noise model (Mandelbrot and Wallis, 1969) was designed to model long-term dependence. Two basic characteristics of this model are that its autocorrelation function is not summable and that h > 0.5, in contrast to AR models for which h = 0.5. The broken line model (Mejia et al., 1972) exhibits characteristics similar to the FGN model.
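The rescaled range behind the Hurst coefficient can be sketched as follows. This is a generic illustration only: it is not the exact rescaled adjusted range of Sec. 2.2.5, and the single-series slope estimate log(R*/s)/log(N) is a crude stand-in for Hurst's fitting procedure.

```python
import math, random

def rescaled_range(x):
    """Range of the cumulative departures from the mean, divided by
    the standard deviation of the sequence."""
    n = len(x)
    mean = sum(x) / n
    s = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    cum = lo = hi = 0.0
    for v in x:
        cum += v - mean
        lo = min(lo, cum)
        hi = max(hi, cum)
    return (hi - lo) / s

# Crude single-series check: for independent normal values the slope
# estimate log(R*/s) / log(N) should hover near h = 0.5.
random.seed(1)
series = [random.gauss(0.0, 1.0) for _ in range(2000)]
for n in (100, 500, 2000):
    h_est = math.log(rescaled_range(series[:n])) / math.log(n)
    print(n, round(h_est, 2))
```

For persistent series (FGN, broken line, or suitably parameterized ARMA(1,1)) the same estimate tends to sit noticeably above 0.5.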
Another alternative is provided by certain ARMA models which possess long-term dependence characteristics even though h = 0.5 asymptotically. O'Connell (1971, 1977) has studied in particular the persistence characteristics of the ARMA(1,1) model and found that the parameter region 0 < φ₁ < 1 and 0 < θ₁ < 1, with φ₁ > θ₁, is of particular interest in modeling long-term persistence. Simulations carried out by Hipel and McLeod (1978) and Salas et al. (1979) show that with accurately fitted parameters, the ARMA models possess an autocorrelation which decays exponentially, but sufficiently slowly to approximately preserve the rescaled adjusted range and the Hurst coefficient.

In summary, the ARMA model could be used for modeling time series with long-term dependence. However, one must consider the fact that in trying to preserve the long-term dependence structure, the short-term dependence may often be distorted or may not be preserved. Thus, depending on each particular case, the analyst must decide which properties of the time series are required to be preserved and face the possible consequences of not preserving some others. In
The correlation structure of the periodic series may be the result of an ARMA process with either constant or periodic coefficients. . In either case the model representing the original time series will be called periodic ARMA model. 5.3.1 PERIODIC ARMA MODELS. Let us consider the original periodic series x, y Where v_ denotes the year, t=1, ..., w and w is the number of time intervals in the year. Assuming that the distribution of the series is skewed, an appropriate transformation can be used (Chapter 3) to transform xy to the normal series Vy,z- Then the periodic ARMA’ hodel for Vy, can be written as Hy + Oy Ay. (5.78) where , and a, are the periodic mean and periodic standard deviation and 2, may be represented by an ARMA model with either constant or time varying (periodic) coefficients. The ARMA(p,q) model with constant coefficients is ? 5.79 - 2 Oe. 5 +, (5.79) P =k oz itt a t t jel Pt-j where t= (v-l)wtt, @ and @ are the coefficients of the model and &, is the independent normal variable. This model is the same as the one used in Sec. .5.2.1 241 Tao and Delleur (1976) used the ARMA(p,q) model with time-varying coefficients as PB q Zt a 9 r By te” a 8 or Put t eye 80) where 6; , and 6, , are time varying autoregressive and moving average coefficients, respectively, and &, | is an vit independent and identically distributed normal | random variable. 5.3.2. PARAMETER ESTIMATION FOR PERIODIC ARMA (ODELS Assume as in Sec. 4.3.2 that there is an available sample of periodic hydrologic series denoted by x, , V=1,..., N and t= 1, ..., w where N is the total number of years of data and w is the number of time intervals within the year. As said before, prior to estimating the parameters of the models given by Eqs. (5.78) and (5.79) or (5.80), the transformation to normality should be made following the pro- cedure indicated in Chapter 3. The parameter estimation for periodic ARMA models is similar to that given for periodic AR models in Sec. 
4.3.2 Removal of Within-the-year Periodicity For monthly series the monthly average jy, and standard deviations s_ are obtained month by mofth, or quarter by quarter for’ quarterly series, and are introduced in Eq. (5.78) as estimates of H, and oO respectively. For daily or weekly series the Fourier coefficients for the periodic mean and periodic standard deviation are calculated and the Fourier series fit of vy, and 8, are determined foliowing the procedure indicated in Secs. 3.3 and 4.3.2. The series Yy,r i8 then standardized by (5.81) where fi, and @, are either , and s, or their Fourier series fit. By removing the periodicity in the mean and in the standard deviation the series 2, of Eq. (5.81) be~ comes second-order stationary provided the autocorrelation function is approximately stationary. 242 Fitting Constant Coefficient ARMA Models The technique of fitting the ARMA(p,q) model to the 2, _ Series is the same as that explained in Sec. 5.2.2 with t + (v-l)wt. It must be remembered, however, that the synthetic series generated are z, | series, and that an in- verse standardization is needed to produce the y,, | series of Eq. (5.78). Likewise, if a transformation such’ ds a log- arithmic transformation has been used to obtain the y., series, an inverse transformation, i.e. exponentiation, is necessary to obtain the desired series x, ,. These models are, in general, satisfactory when the correlation function is approximately stationary. If the correlation function is periodic it may be desirable to consider an ARMA imodel with periodic coefficients. A Models Fitting Periodic Coefficients ARI The periodic correlation ry , of the series 2, may be computed from Eq. (2.10). ‘Thus, approximate estimates of ; and 8, , may be obtained from generalizations of the Yule-Walker Equations (Salas, 1972). For example: ARMA(L,O) 8). = Ty 5 (5.82) rot r. ltl 2,1 ARMA(2,0) 9,» = Eh et 1, Itt (5.83) by = et Mag Tet 2,1 ar ARMA(1,1) Eta res bt (5.84) a-6 hye Lt. . 
with θ_{1,τ} then obtained by solving

    r_{1,τ} = (1 - φ_{1,τ} θ_{1,τ})(φ_{1,τ} - θ_{1,τ}) / (1 + θ²_{1,τ} - 2 φ_{1,τ} θ_{1,τ})     (5.85)

Equation (5.84) is seen to be analogous to Eq. (5.29) and Eq. (5.85) is analogous to Eq. (5.28). Exact moment estimates are given by Salas et al. (1980).

The final estimates of the φ's and θ's can be obtained (if desired) by fitting a harmonic function to their initial estimates.

5.3.3 GOODNESS OF FIT FOR PERIODIC ARMA MODELS

The tests for the goodness of fit for periodic ARMA models with constant coefficients are the same as those cited for the annual series in Sec. 5.2.3.

Regarding the time varying ARMA models, there is no theoretical or empirical test available for testing the periodic correlation in the residuals of stochastic models. For the ARMA(1,1) model the approximate variance of the residuals is

    σ²_{ε,τ} ≈ (1 - φ²_{1,τ}) / (1 + θ²_{1,τ} - 2 φ_{1,τ} θ_{1,τ})     (5.86)

Equation (5.86) is similar to (5.26) with γ₀ = 1. Tao and Delleur (1976) have proposed to replace the nonperiodic Porte Manteau statistic Q by (see Eq. 3.46b)

    Q₁ = N Σ_{τ=1}^{ω} Σ_{k=1}^{L} r²_{k,τ}     (5.87)

in which N is the number of years of observations, ω is the number of observations per year, L is the number of lags taken into account and r_{k,τ} is the periodic correlation coefficient of the residuals at time τ with lag k. The Q₁ statistic is similar to the Q statistic of Eq. (5.50); therefore, it is called the modified Porte Manteau statistic. Smaller Q₁ statistics indicate a lesser periodic correlation in the series.

5.3.4 SUMMARIZED ARMA MODELING PROCEDURE FOR PERIODIC SERIES

STEP (1). Do the appropriate transformation so that the series is approximately normal. This may be a logarithmic transformation, a power transformation, a Box-Cox transformation, etc. (see Sec. 3.2).

STEP (2). Make the series stationary by Eq. (5.81). Typically μ_τ and σ_τ are estimated by ȳ_τ and s_τ, respectively, for quarterly or monthly series. For series with shorter time intervals the parametric standardization using a Fourier series fit is more parsimonious, requiring fewer parameters.
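The parametric standardization of STEP (2) can be sketched with a least-squares Fourier fit of the periodic means. The harmonic-coefficient formulas below are the standard discrete forms and are only a paraphrase of Eqs. (3.30) through (3.33); the 12-point "monthly means" are invented for illustration.

```python
import math

# Invented 12-point monthly means: one annual harmonic plus a smaller
# second harmonic that a 1-harmonic fit cannot represent.
omega = 12
monthly_mean = [10 + 5 * math.sin(2 * math.pi * t / omega)
                + 0.5 * math.cos(4 * math.pi * t / omega)
                for t in range(omega)]

def fourier_fit(values, n_harmonics):
    """Least-squares Fourier fit of a periodic sequence (paraphrase of
    the A_j, B_j coefficient formulas)."""
    w = len(values)
    fit = [sum(values) / w] * w
    for j in range(1, n_harmonics + 1):
        A = 2.0 / w * sum(v * math.cos(2 * math.pi * j * t / w)
                          for t, v in enumerate(values))
        B = 2.0 / w * sum(v * math.sin(2 * math.pi * j * t / w)
                          for t, v in enumerate(values))
        fit = [f + A * math.cos(2 * math.pi * j * t / w)
                 + B * math.sin(2 * math.pi * j * t / w)
               for t, f in enumerate(fit)]
    return fit

fit1 = fourier_fit(monthly_mean, 1)
err1 = max(abs(f - v) for f, v in zip(fit1, monthly_mean))
fit2 = fourier_fit(monthly_mean, 2)
err2 = max(abs(f - v) for f, v in zip(fit2, monthly_mean))
print(round(err1, 3), round(err2, 12))   # 0.5 0.0
```

One harmonic leaves the 0.5-amplitude second harmonic unexplained; two harmonics reproduce this particular pattern exactly, which is why the significance criteria of Sec. 3.2.3 are used to stop adding harmonics.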
The A_j and B_j coefficients of the Fourier series are obtained by Eqs. (3.32) and (3.33) for the mean and standard deviations. The number of harmonics is determined by the criteria suggested in Sec. 3.2.3. The μ̂_τ and σ̂_τ values are calculated by applying Eqs. (3.30) and (3.31) in each case, and the standardized series is formed by Eq. (5.81).

A. Constant Coefficient Model

The fitting of the ARMA(p,q) model to the z_t = z_{ν,τ} series (t = (ν-1)ω + τ) is the same as in Sec. 5.2.6, including the steps (2) through (11). Such steps are renumbered and summarized as:

STEP (3). Calculation of autocorrelation and partial autocorrelation functions.

STEP (4). Identification.

STEP (5). Initial estimate of the autoregressive parameters.

STEP (6). Initial estimate of the moving average parameters.

STEP (7). Maximum likelihood estimate of the parameters.

STEP (8). Test of goodness of fit - Porte Manteau test.

STEP (9). Akaike information criterion test.

STEP (10). Synthetic generation (if desired).

STEP (11). Inverse transformation. Once the generated or forecasted z_{ν,τ} values are obtained, apply the inverse of the standardization transformation. If an initial transformation was used, such as a logarithmic transformation, an inverse transformation such as exponentiation is needed to obtain the series in the original form.

STEP (12). Forecasting (if desired), followed by the above inverse transformations as needed.

B. Periodic Coefficient Model

If there is reason to believe that the standardized series z_{ν,τ} is not stationary, the ARMA model with periodic coefficients may be considered. After the standardization the following steps are used:

STEP (3'). Compute the periodic correlation coefficients by Eq. (2.10).

STEP (4'). Obtain the periodic coefficients of the ARMA model from Eqs. (5.82) through (5.86), depending on the case.
STEP (5'). Optional fit of the periodic function to the estimates of the coefficients (six harmonics are usually sufficient).

STEP (6'). Test the goodness of fit using the modified Q-statistic, Eq. (5.87).

STEP (7'). Use Eq. (5.80) for generation of synthetic data (if desired).

STEP (8'). Take the conditional expectation of Eq. (5.80) to obtain the forecasting function (if desired).

5.3.5 EXAMPLES OF ARMA MODELING OF PERIODIC SERIES

Example of Modeling with Constant Coefficients Model

Delleur and Kavvas (1978) applied ARMA models to the average monthly rainfall series over 15 basins located in Indiana, Illinois and Kentucky. The basin areas varied between 240 and 4000 square miles, approximately. The average rainfall was obtained by the Thiessen polygon method. The record lengths varied between 468 and 684 months. The monthly rainfall data at Salamonia, Indiana, are used in this example. The data, x_{ν,τ}, are given in Table 5.13. For example, x_{1,1} = 3.920, x_{2,1} = 3.900, x_{N,1} = 1.340.

STEP (1). Transformation. The square root transformation of the monthly rainfall was used in order to obtain a series which is approximately normally distributed. The last two lines of Table 5.14 show the monthly means and the monthly standard deviations of the monthly rainfall square roots. For example,

    ȳ₁ = (1/N) [√3.920 + √3.900 + ... + √1.340] = 1.5145

and

    s₁ = [ Σ_{ν=1}^{N} (y_{ν,1} - ȳ₁)² / (N-1) ]^{1/2} = 0.5660

where y_{1,1} = √3.920, y_{2,1} = √3.900, etc.

Table 5.13. Precipitation (inches) at Salamonia, Indiana (table body not legible in this copy).

Table 5.14. Standard Scores of Square Roots of Precipitation (inches) at Salamonia, Indiana (table body not legible in this copy).

STEP (2). Stationarization. The square root transformed series was stationarized by means of Eq. (5.81), where μ̂_τ and σ̂_τ were set equal to ȳ_τ and s_τ, respectively, as shown in Table 5.14. For example,

    z_{1,1} = (√3.920 - 1.5145)/0.5660 = 0.8222.

STEP (3).
Autocorrelation and partial autocorrelation. The autocorrelation and partial autocorrelation functions were calculated and are given in Table 5.15.

STEP (4). Identification. The autocorrelation and the partial autocorrelation functions were used in the identification of the time series, which indicated the possibility of an ARMA(1,1) model.

STEP (5). Initial estimate of the autoregressive parameter. The initial estimate of the parameter φ₁ was obtained from Eq. (5.29): φ̂₁ = r₂/r₁ = 0.0179/0.0288 = 0.62.

STEP (6). Initial estimate of the moving average coefficient θ₁. It was obtained as in the example of Sec. 5.2.6, step (5), which also yields the value of σ̂²_ε. The values of the preliminary estimates obtained by the IMSL subroutines FTARPS and FTMPS are given in Table 5.15.

STEP (7). Maximum likelihood estimates. Refined estimates of the ARMA(1,1) parameters which maximize the log likelihood function were obtained by minimizing the sum of squares of the residuals. The values of S(φ,θ) = Σ ε_t²(φ,θ), obtained as in step (6) of the example of Sec. 5.2.7, are listed in Table 5.16. It is seen that the sum of squares surface exhibits a diagonal valley going from (φ₁,θ₁) = (0,0) to (1,1). In this particular case the lower portion of the valley is very flat and in this region there is a large number of combinations of possible φ₁ and θ₁ values. The approximate parameter values which minimize the sum of squares of residuals are φ₁ = 0.9 and θ₁ = 0.9. The estimated parameters obtained from the sum of squares of residuals were used as initial values in the IMSL program FTMXL, which seeks the minimum of the sum of squares of the residuals. The final estimates of the parameters are shown in Table 5.15.

STEP (8). Test of goodness of fit. The Porte Manteau lack of fit test was applied. The Q-statistic is given in Table 5.15. The complete program listing for this example is given in Appendix A5.4.
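Steps (5) and (6) amount to moment estimation: φ̂₁ = r₂/r₁, after which θ̂₁ solves the ARMA(1,1) lag-one relation. A sketch, using the standard Box-Jenkins form of that relation (assumed here to be the content of Eq. (5.28)) and the r₁, r₂ values quoted above:

```python
r1, r2 = 0.0288, 0.0179      # lag-1 and lag-2 autocorrelations (Salamonia data)
phi1 = r2 / r1               # Eq. (5.29)

def rho1(phi, theta):
    """Lag-one autocorrelation of an ARMA(1,1) process
    (standard Box-Jenkins relation)."""
    return (1 - phi * theta) * (phi - theta) / (1 + theta ** 2 - 2 * phi * theta)

# Solve rho1(phi1, theta) = r1 for theta by bisection on (0, phi1);
# rho1 decreases from phi1 to 0 as theta increases over this interval.
lo, hi = 0.0, phi1
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if rho1(phi1, mid) > r1:
        lo = mid
    else:
        hi = mid
theta1 = 0.5 * (lo + hi)
print(round(phi1, 2), round(theta1, 2))   # 0.62 0.59
```

The result matches the preliminary estimates (0.62, 0.59) listed in Table 5.15; the near equality of φ̂₁ and θ̂₁ reflects the very weak correlation of the standardized monthly rainfall.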
The ARMA(1,1) model passed the first test in all 15 cases studied by Delleur and Kavvas, and the second test in 14 cases.

[Table 5.15. Autocorrelation and Partial Autocorrelation Functions, and Preliminary and Final Parameter Estimates for the Standard Scores of Square Roots of Precipitation (in) at Salamonia, Indiana. The autocorrelation values are not legibly reproduced. The ARMA(1,1) parameter estimates are: preliminary, phi_1 = 0.62, theta_1 = 0.59; sum of squares surface, phi_1 = 0.9, theta_1 = 0.9; FTMXL, phi_1 = 0.914, theta_1 = 0.961, with Q = 28.60, smaller than the critical value of 30.8.]

STEP (9). The forecasting function takes the form

    [z_{t+1}] = z-hat_t(1) = phi_1 [z_t] + [e_{t+1}] - theta_1 [e_t]

where the square brackets indicate conditional expectations as in Eq. (5.56). Thus,

    z-hat_t(1) = phi_1 z_t - theta_1 e_t
    z-hat_t(2) = phi_1 z-hat_t(1)
    z-hat_t(3) = phi_1 z-hat_t(2)

and in general

    z-hat_t(L) = phi_1 z-hat_t(L-1),  L >= 2.

The psi-weights obtained from Eq. (5.66) are psi_0 = 1 and psi_j = (phi_1 - theta_1) phi_1^{j-1}, j >= 1, and the standard error of the forecasts is

    sigma(L) = [1 + (phi_1 - theta_1)^2 SUM_{j=1}^{L-1} phi_1^{2(j-1)}]^{1/2} sigma_e.

[Table 5.16. Sum of Squares of Residuals of ARMA(1,1) for Standardized Monthly Rainfall Series at Salamonia, Indiana. Tabulated values not legibly reproduced.]

Examples of Modeling with the Periodic Coefficients Model

Tao and Delleur (1976) have fitted conventional AR and ARMA models to daily and weekly flows of 20 watersheds located in Indiana, Illinois and Kentucky. The watersheds have drainage areas varying from 234 square miles to 11,800 square miles, with lengths of record varying from 26 years to 63 years. The daily flow series of the Salamonie River at Dora, Indiana (USGS station 3-3245) is examined in the following example.

STEP (1). Transformation. A logarithmic transformation of the daily flows is performed.

STEP (2). Fitting a harmonic function to the daily means and standard deviations.
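The harmonic fitting of STEP (2) is an ordinary finite Fourier series fitted to the periodic sequence of daily statistics. A Python sketch of such a fit follows; the function name and interface are ours, not those of the original Fortran program.

```python
import math

def harmonic_fit(stat, n_harmonics=6):
    """Fit a Fourier series with n_harmonics harmonics to a periodic
    statistic stat[t], t = 0..w-1 (e.g. the 365 daily means), returning
    the fitted function
      stat_hat(t) = a0 + SUM_h [A_h cos(2 pi h t/w) + B_h sin(2 pi h t/w)]."""
    w = len(stat)
    a0 = sum(stat) / w
    coef = []
    for h in range(1, n_harmonics + 1):
        A = 2.0 / w * sum(s * math.cos(2.0 * math.pi * h * t / w)
                          for t, s in enumerate(stat))
        B = 2.0 / w * sum(s * math.sin(2.0 * math.pi * h * t / w)
                          for t, s in enumerate(stat))
        coef.append((A, B))

    def fitted(t):
        return a0 + sum(A * math.cos(2.0 * math.pi * h * t / w) +
                        B * math.sin(2.0 * math.pi * h * t / w)
                        for h, (A, B) in enumerate(coef, start=1))

    return fitted
```

Fitting the daily means, standard deviations and serial correlation coefficients this way replaces 365 raw daily estimates of each statistic by 13 Fourier coefficients.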
Figure 5.16 shows the fits obtained by using six harmonics to fit the means and standard deviations of the daily flow logarithms.

[Figure 5.16. Means and standard deviations of logarithms of daily flows (USGS Station 3-3245).]

STEP (3). Estimate the seasonal serial correlation coefficients. These are estimated by Eq. (2.10), and a 6-harmonic fit is shown for the first three lags in Fig. 5.17.

[Figure 5.17. Seasonal serial correlation coefficients of lags 1, 2, and 3 for daily flows (USGS Station 3-3245) (logarithms).]

STEPS (4) and (5). Estimation of the periodic autoregressive and moving average coefficients. These are obtained from Eqs. (5.84) and (5.85). These values and a 6-harmonic fit are shown in Fig. 5.18.

[Figure 5.18. Seasonal AR and MA coefficients for the time varying ARMA model applied to the cyclicly standardized daily runoff logarithms.]

STEP (6). Goodness of fit test. The seasonal serial correlation coefficients of the residuals, the Q1-statistic and the cumulative periodograms of the residuals are examined. Figure 5.19 shows the cumulative periodogram of the residuals of the ARMA(1,1) model. It may be seen that the residuals do not exhibit any dominant frequency and may therefore be considered as white noise. It is interesting to notice, however, that the residuals were not normally distributed, and their distribution was closely approximated by a bi-gamma distribution, as shown in Fig. 5.20.

[Figure 5.19. Cumulative periodogram of residuals in the time varying ARMA(1,1) model.]

[Figure 5.20. Probability distribution of residuals.]

5.3.6 LIMITATIONS TO BE CONSIDERED IN APPLICATIONS OF ARMA MODELING OF PERIODIC SERIES

For time steps less than a year, but not smaller than a month, the nonparametric standardization may give adequate results.
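The white-noise check of STEP (6) above relies on the normalized cumulative periodogram of the residuals. A Python sketch of the computation follows; it is illustrative only, using the raw O(n^2) Fourier sums rather than an FFT.

```python
import math

def cumulative_periodogram(eps):
    """Normalized cumulative periodogram of a residual series:
      g_k = SUM_{i<=k} I(f_i) / SUM_i I(f_i),
    with periodogram ordinates I(f_i) at the Fourier frequencies
    f_i = i/n, i = 1..(n-1)//2.  For white noise, g_k increases
    roughly linearly from 0 to 1; a dominant frequency shows up
    as a large jump."""
    n = len(eps)
    m = (n - 1) // 2
    ordinates = []
    for i in range(1, m + 1):
        a = sum(e * math.cos(2.0 * math.pi * i * t / n) for t, e in enumerate(eps))
        b = sum(e * math.sin(2.0 * math.pi * i * t / n) for t, e in enumerate(eps))
        ordinates.append(a * a + b * b)
    total = sum(ordinates)
    out, running = [], 0.0
    for p in ordinates:
        running += p
        out.append(running / total)
    return out
```

A plot of g_k against frequency, as in Fig. 5.19, makes the departure from the white-noise diagonal visible at a glance.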
For monthly series there are 12 monthly means and 12 monthly standard deviations to evaluate; there are therefore 24+p+q+1 parameters to be estimated from the data. For time steps smaller than a month, the parametric periodic standardization usually requires fewer than 12 parameters for the mean and fewer than 12 for the standard deviation, so that the total number of parameters is less than 24+p+q+1. If a substantial cyclicity remains in the autocorrelation function, the time varying ARMA model may be called for. It should be remembered, however, that for the Porte Manteau test no distribution has as yet been found for the Q-statistic of this model, and the suggested Q1-statistic is heuristic. The probability distribution of the residuals may not be normal, which may create some difficulty in the generation of synthetic series using this model.

APPENDIX A5.1. COMPUTER PROGRAMS

The International Mathematical and Statistical Libraries, Inc. (IMSL), GNB Building, 7500 Bellaire Boulevard, Houston, Texas 77036, offers a collection of computer programs which includes subroutines on time series analysis. The following subroutines are pertinent to the material of Chapter 5.

FTAUTO: Estimates the mean, variance, autocovariances, autocorrelations and partial autocorrelations for a stationary time series.

FTARPS: Preliminary estimation of autoregressive parameters in an ARIMA stochastic model.

FTMPS: Preliminary estimation of moving average parameters in an ARIMA stochastic model.

FTMXL: Maximum likelihood estimation of autoregressive and moving average parameters in an ARIMA stochastic model and calculation of residuals.

FTGEN: Generation of a time series from a given ARIMA model.

FTCAST: Time series forecasts and probability limits using an ARIMA model.
FTCMP: ARIMA stochastic model analysis with full parameter iteration and maximum likelihood estimation, generation of a time series, and time series forecasts.

[The remainder of Appendix A5.1 reproduces the IMSL documentation headers (purpose, usage, arguments, precision/hardware, required routines, notation, remarks, copyright and warranty notices) for the subroutines FTAUTO, FTARPS, FTMPS, FTMXL, FTGEN, FTCAST and FTCMP. The scanned listings are not legibly reproduced here.]

APPENDIX A5.2. COMPUTER PROGRAM USED IN THE ANNUAL SERIES EXAMPLE, SEC. 5.2.7.

[Fortran listing of PROGRAM PROG1, which reads the annual flow data and computes the standardized variables, the autocorrelation and partial autocorrelation functions (FTAUTO), the preliminary parameter estimates (FTARPS, FTMPS), the maximum likelihood estimates and residuals (FTMXL), the goodness-of-fit statistics, and the final estimation, generation and forecasting (FTCMP, FTGEN, FTCAST), together with subroutine TAUTO for the theoretical ARMA(1,1) autocorrelation function. The scanned listing is not legibly reproduced here.]

APPENDIX A5.3. CALCULATOR PROGRAM

TI 59 calculator program to calculate residuals, squares of residuals and sum of squares of residuals for an ARMA(1,1) model. (Statements 14 through 68, 77, 80, 94 through 133, 136, 152, 154 and 155 are printer controls and may be deleted if the printer is not used.) The program assumes that the data are listed in reverse order, from last to first. Enter theta_1 in A. Enter phi_1 in B. Enter N < 55 in C.
The data used for the Niger River example of Sec. 5.2.7 are given in the table following the program. The program output is shown in Table 5.4 of the text 271 888 03 be Bo 3 & 0 a a2 0 0 a 0 a 0 5 oC OG ; oo 3 ooo é oo 8 2 63 OF 4 04 o4 1 62 OF é f 05 oF 43: REL 02 5656 4 39 PRT 1 Bi R’S t o90 95 = 78 LBL 3 O91 97 Bz 14D 2 052 00 0 BS + 5 093 14 3D 72 Roe 3 034 69 OF oo 80 é 895 00 80 a5 + 6 036 03 3 43 RCL, a 037 06 & 57 57 oF 0g8 04 4 65 a 033 01 1 42 ROL oF 100 03 3 58 58 05 101 60 0 35 = 2 102 80 0 42 8TO 6 102 00 0 7 57 3 104 03° 3 a5 PRT 3 195 06 6 33 xe 4 108 69 OF 35 PRT 1 107 02 a2 38 AY 1 108 03 3 44 Sun 3 109 o4 4 56 56 3 110 OF 4 73 Ree 5 AM oi 4 60 090 oF M2 61 1 65 x a2 113 03 3 43 ROL : 11403 3 53 59 ; 1158 05 5 4 47 3 116 01 3 a5 = é 117 0? 7 é1 670 6 118 63 OP oo 00 a 118 02 3 31 St we 8 ib asin ue re CPA A OG oe I ws Sn ae 273 APPENDIX A5.4._ COMPUTER PROGRAM USED IN MONTHLY SERIES EXAMPLE, SEC. 5.3.5 PROGRAN MAIN CINPUT. OUTPUT, Te 1PEa) iL. MNEANC19) -MSTREUCAa) BINENSTON xitATCSS, 12), ¥TOTAL 18), PACUTSD). ACUCSO). ACCSO) ab CREATE DATA FOR LATER GENERAT: PES: (39) TEHPC8009+ XTEMP(1000)+_ HEADER FAC10009» XUCSO0)» KCONECA).. YC BONE), GRCL0O0)» C1000), INDCB), PALI). THETACLLD» + EKARE TON DATA PHI/0.0+0+110-2+0.3+ 044s 0+510+8+0-7+0-8+ 0.9» 1.07 Bath THETA/0.0s0,150-2,0-350-9+0.5/0.8,0.7,0-850.95 1.07 FIRST CARD OF DATA 1S @ HEADE READ (5.118) HEADER BRITE: (Gy 119) HEADER 2 CARD READ DATA FROM CARDS=~38 YRS-—12 MONTHS PER CARD NveaRs=38. READ (5,120) CCXMATCT De HRITE (6,120) COMHAT( Poy COMPUTE SQUARE ROOTS FOR THE DO 102 I=L-NYEARS DO 101 el, 12 SHAT D=xMAT CL Ie. 101 conTINUE: 102 coNTINUE COMPUTE YEARLY TOTALS DO 104 T=1.NvERRS yroraccl BO 103 Jetsiz TOTAL (ID =vroraL cL ext 103 CONTINUE 104 re} SERIES TLD 1.12) T=1 NVEARS) LA NVERRS) HSTDEUCI)=HSTDEUCJ)+xnIATCT, J) #28, 40s CONTINUE 106 CONTINUE Bo 10? T isroeuc A “1.3338.8 107 CONTINUE HMEAN(13)=0. HeTHeUCl3)=0. 
      DO 108 I=1,NYEARS
      MMEAN(13)=MMEAN(13)+YTOTAL(I)/FLOAT(NYEARS)
      MSTDEV(13)=MSTDEV(13)+YTOTAL(I)**2
  108 CONTINUE
      MSTDEV(13)=SQRT((MSTDEV(13)-MMEAN(13)**2*FLOAT(NYEARS))/
     1  FLOAT(NYEARS))
C     NOW STANDARDIZE THE DATA
      DO 110 IJ=1,NYEARS
      DO 109 JI=1,12
      XMAT(IJ,JI)=(XMAT(IJ,JI)-MMEAN(JI))/MSTDEV(JI)
  109 CONTINUE
  110 CONTINUE
C     WRITE OUT THE STANDARDIZED VALUES ONTO TAPE 8 FOR TABLES
      WRITE (8,121) (I,(XMAT(I,J),J=1,12),YTOTAL(I),I=1,NYEARS)
      WRITE (8,122) (MMEAN(I),I=1,13)
      WRITE (8,123) (MSTDEV(I),I=1,13)
C     NOW CREATE A VECTOR FOR THE WHOLE SERIES FROM THE MATRIX
      NMONTHS=12*NYEARS
      IT=0
      DO 112 I=1,NYEARS
      DO 111 J=1,12
      IT=IT+1
      XTEMP(IT)=XMAT(I,J)
  111 CONTINUE
  112 CONTINUE
C     SUBROUTINE FTAUTO COMPUTES AUTOCORRELATIONS, PARTIAL
C     AUTOCORRELATIONS AND AUTOCOVARIANCES FOR ANY TIME SERIES.
      CALL FTAUTO (XTEMP,NMONTHS,30,30,7,AMEAN,ACV(1),ACV(2),AC(2),
     1  PACV(2),WKAREA)
C     SET AUTOCORRELATION AND PARTIAL CORRELATION OF LAG ZERO TO ONE.
      AC(1)=1.
      PACV(1)=1.
      WRITE (6,124) AMEAN,ACV(1)
      WRITE (6,125) (I5,AC(I5),PACV(I5),I5=1,31)
C     PRELIMINARY ESTIMATION OF AUTOREGRESSIVE PARAMETER
      CALL FTARPS (ACV,AMEAN,1,1,ARPS,PMAC,WKAREA)
      WRITE (6,126) ARPS,PMAC
C     NOW ESTIMATE MOVING AVERAGE PARAMETER AND FIND WHITE NOISE
C     VARIANCE ESTIMATE USING FTMPS.
      CALL FTMPS (ACV,ARPS,1,1,PMAS,WNV,WKAREA,IER)
      WRITE (6,
127) PMAS,WNV
C     STORE THE VECTOR XTEMP BECAUSE FTMXL WILL DESTROY IT UPON OUTPUT
      DO 113 IP=1,NMONTHS
      TEMP(IP)=XTEMP(IP)
  113 CONTINUE
C     GENERATE SUM OF SQUARES SURFACE OVER THE PHI AND THETA VECTORS
C     SET THE ROW AND COLUMN COUNTERS TO ONE FOR LATER USE
      ITHETA=1
      IPHI=1
      DO 116 I5=1,11
      DO 115 I6=1,11
      ETA=TEMP(2)-PHI(I6)*TEMP(1)
      SUMSQ1(I5,I6)=ETA**2
      DO 114 I7=3,NMONTHS
      ETA1=TEMP(I7)-PHI(I6)*TEMP(I7-1)+THETA(I5)*ETA
      ETA=ETA1
      SUMSQ1(I5,I6)=SUMSQ1(I5,I6)+ETA1**2
  114 CONTINUE
C     LET ITHETA AND IPHI BE THE ROW AND COLUMN NUMBER CORRESPONDING
C     TO THE ROW AND COLUMN OF ARPS AND PMAS THAT GENERATE THE
C     MINIMUM SUMS OF SQUARES OF RESIDUALS
      IF (SUMSQ1(I5,I6).GT.SUMSQ1(ITHETA,IPHI)) GO TO 115
      ITHETA=I5
      IPHI=I6
  115 CONTINUE
  116 CONTINUE
C     WRITE THE SUM OF SQUARES SURFACE OUT TO TAPE 9
      WRITE (9,128)
      WRITE (9,129) (PHI(I7),I7=1,11)
      WRITE (9,130) (THETA(I7),(SUMSQ1(I7,I6),I6=1,11),I7=1,11)
C     USE SUBROUTINE FTMXL TO ESTIMATE ARPS, PMAC, PMAS, AND COMPUTE
C     WNV.  LEAVE RESIDUALS IN THE FIRST IND(1) POSITIONS OF THE DATA
C     VECTOR XTEMP.
      DATA IND/0,1,1,0,5,1/
      IND(1)=NMONTHS
C     LET THE FINAL ESTIMATES OF ARPS AND PMAS BE THOSE THAT LED TO
C     THE MINIMUM SUMS OF SQUARES OF RESIDUALS GENERATED ABOVE.
      PMAS=THETA(ITHETA)
      ARPS=PHI(IPHI)
      WRITE (6,131) ARPS,PMAS
      CALL FTMXL (XTEMP,IND,ARPS,PMAS,PMAC,WNV,WKAREA)
      WRITE (6,132) ARPS,PMAS,PMAC,WNV
C     FIND AUTOCORRELATION AND PARTIAL AC OF THE RESIDUALS THAT ARE
C     IN THE VECTOR XTEMP AS OUTPUT OF FTMXL.
      CALL FTAUTO (XTEMP,NMONTHS,24,24,7,AMEAN,ACV(1),ACV(2),AC(2),
     1  PACV(2),WKAREA)
      AC(1)=1.
C     WRITE OUT AUTOCORRELATION AND PARTIAL AC.  COMPUTE
C     SUMMATION((AUTOCORRELATION)**2).
      SSQ=0.
      DO 117 IH=1,25
      WRITE (6,133) IH,AC(IH),PACV(IH)
      SSQ=SSQ+AC(IH)**2
  117 CONTINUE
      SSQ=SSQ-1.
      WRITE (6,134) SSQ
      STOP
  118 FORMAT (8A10)
  119 FORMAT (2X,8A10//10X,'ORIGINAL DATA FROM SALAMONIA SERIES')
  120 FORMAT (12(1X,F5.3))
  121 FORMAT (3X,I2,1X,12(F6.3,3X),F7.3)
  122 FORMAT (10X,12(F6.3,3X),F7.3)
  123 FORMAT (10X,12(F6.3,3X),F7.3,///)
  124 FORMAT (2X,'MEAN AND VARIANCE OF SQRT SERIES',2F15.7)
  125 FORMAT (3X,I2,10X,F10.4,10X,F10.4)
  126 FORMAT (//2X,'FTARPS ESTIMATE OF ARPS AND PMAC',2F15.6)
  127 FORMAT (/,2X,'FTMPS ESTIMATE OF PMAS AND WNV',2F15.6)
  128 FORMAT (1H1,39X,'TABLE 5.16',//,25X,'SUM OF SQUARES OF RESIDUALS OF ARMA(1,1) FOR STANDARDIZED',/,32X,'MONTHLY RAINFALL SERIES AT SALAMONIA, IND.',///,52X,'PHI')
  129 FORMAT (5X,'THETA',2X,11(F9.2,1X))
  130 FORMAT (5X,F5.2,3X,11(F9.2,1X))
  131 FORMAT (/,2X,'VALUES OF ARPS AND PMAS LEADING TO MINIMUM SUMS OF SQUARES OF RESIDUALS',5X,2F15.5)
  132 FORMAT (2X,'FTMXL OUTPUT OF ARPS, PMAS, PMAC, WNV',4F15.6)
  133 FORMAT (2X,'LAG, AC, PACV',I5,2F15.6)
  134 FORMAT (2X,'SUMMATION(AC**2) NOT INCLUDING LAG 0',F15.5)
      END

REFERENCES

Akaike, H., 1974. A new look at the statistical model identification. IEEE Transactions on Automatic Control, AC-19, 6, pp. 716-723.

Box, G. E. P. and Jenkins, G. M., 1976. Time series analysis - forecasting and control. (Revised edition), Holden-Day, 575 pp.

Carlson, R. F., MacCormick, A. J. A., and Watts, D. G., 1970. Application of linear models to four annual streamflow series. Jour. Water Resour. Res., 6, 4, pp. 1070-1078.

Delleur, J. W. and Kavvas, M. L., 1978. Stochastic models for monthly rainfall forecasting and synthetic generation. Jour. Appl. Meteor., 17, 10, pp. 1528-1536.

Feller, W., 1951. The asymptotic distribution of the range of sums of independent random variables. Ann. Math. Stat., 22, pp. 427-432.

Hipel, K. W., McLeod, A. I., and Lennox, W. C., 1977. Advances in Box-Jenkins modeling: 1 - Model construction. Jour. Water Resour. Res., 13, 3, pp. 567-575.

Hipel, K. W. and McLeod, A. I., 1978.
Preservation of the rescaled adjusted range - Part 2: Simulation studies using Box-Jenkins models. Jour. Water Resour. Res., 14, 3, pp. 509-516.

Hurst, H. E., 1951. Long-term storage capacity of reservoirs. Trans. Am. Soc. Civil Engrs., 116, pp. 770-808.

Hurst, H. E., Black, R. P., and Simaika, Y. M., 1965. Long-term storage: an experimental study. Constable, London.

Mandelbrot, B. B. and Wallis, J. R., 1969. Computer experiments with fractional Gaussian noises - Part 1: Averages and variances. Jour. Water Resour. Res., 5, 1, pp. 228-241.

McLeod, A. I., 1976. Improved Box-Jenkins estimators. Rept., Dept. of Statistics, Univ. of Waterloo, Ontario, Canada.

Mejia, J. M., Rodriguez-Iturbe, I., and Dawdy, D. R., 1972. Streamflow simulation - 2: The broken line process as a potential model for hydrologic simulation. Jour. Water Resour. Res., 8, 4, pp. 931-941.

O'Connell, P. E., 1971. A simple stochastic modelling of Hurst's law. In Mathematical Models in Hydrology, Warsaw Symposium (IAHS Pub. 100, 1974), 1, pp. 169-187.

O'Connell, P. E., 1977. ARIMA models in synthetic hydrology. In Mathematical Models for Surface Water Hydrology, T. A. Ciriani, U. Maione and J. R. Wallis, Editors, Wiley, New York.

Salas, J. D., 1972. Range analysis of periodic-stochastic processes. Hydrology Paper 57, Colorado State University, Fort Collins, Colorado.

Salas, J. D., Boes, D. C., Yevjevich, V., and Pegram, G. G. S., 1979. Hurst phenomenon as a pre-asymptotic behavior. Jour. of Hydrology, 44, pp. 1-15.

Salas, J. D., Boes, D. C., and Smith, R. A., 1980. ARMA modeling of seasonal hydrologic series. Paper submitted for publication to the Jour. Hyd. Div., ASCE.

Tao, P. C. and Delleur, J. W., 1976. Seasonal and nonseasonal ARIMA models in hydrology. Proc. Am. Soc. Civil Engrs., Jour. of Hydr. Div., 102, HY10, pp.
1541-1559.

Chapter 6

AUTOREGRESSIVE INTEGRATED MOVING AVERAGE MODELING

6.1 DESCRIPTION OF ARIMA MODELS

It was shown in Chapter 5 that ARMA models can be fitted to stationary hydrologic series, such as the annual series. For nonstationary series such as monthly, weekly and daily series, the nonstationarity was removed by periodic standardization. This procedure leads to useful models for the synthetic generation and forecasting of hydrologic series. However, the number of parameters required is generally large. For example, an ARMA(1,1) model applied to monthly series requires 27 parameters (12 monthly means, 12 monthly standard deviations, φ₁, θ₁, and σ²). This chapter shows that there are alternate ways of transforming a time series into a stationary series which lead to models requiring fewer parameters.

If the series does not have a fixed mean but its successive changes or differences are stationary, then it is possible to extend the ARMA models to nonstationary series by working with their differences. It is possible to take the first, second, or in general the d-th difference, which leads to simple nonperiodic ARIMA(p,d,q) models (also known as nonseasonal ARIMA models). It is also possible to take periodic or seasonal differences at lag w, such as the 12th difference of monthly series, which leads to periodic ARIMA(P,D,Q) models (also known as seasonal ARIMA models). The combination of nonperiodic and periodic ARIMA models leads to the multiplicative ARIMA model.

The acronym ARIMA stands for autoregressive integrated moving average process. The original discrete nonstationary series is differenced to obtain a stationary series. The continuous time equivalent is to differentiate the process. To retrieve the original process, the differentiated process should be integrated, or equivalently for a discrete process, an infinite summation should be performed.
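As a small numerical illustration of this duality between differencing and summation (an illustrative sketch, not part of the original text), the following Python fragment differences a hypothetical nonstationary series and then rebuilds it exactly by cumulative summation, the discrete analogue of integration:

```python
# Illustrative sketch: differencing u_t = x_t - x_{t-lag} and its inverse,
# the cumulative ("integrated") sum that rebuilds x_t from u_t and x_0.
def difference(x, lag=1):
    """u_t = x_t - x_{t-lag}; lag=1 is nonseasonal, lag=w is seasonal."""
    return [x[t] - x[t - lag] for t in range(lag, len(x))]

def integrate(u, x0):
    """Invert first differencing: x_t = x_0 + u_1 + u_2 + ... + u_t."""
    x = [x0]
    for v in u:
        x.append(x[-1] + v)
    return x

x = [float(t * t) for t in range(24)]   # hypothetical series with a trend
u = difference(x)                        # first difference removes most of it
```

Differencing the quadratic series twice leaves a constant, and summing the first differences from the starting value recovers the original series exactly, which is why a "constant of integration" (the starting level) is needed to undo differencing.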
Using the integration analogy, a constant of integration is needed, but it has been "lost" in the process of differentiation or differencing. Such an integrated series would lack a mean value, and as a result the ARIMA models are nonstationary and cannot be used for the synthetic generation of stationary hydrologic time series. However, the ARIMA models are useful in forecasting the deviations of a process. In the case that the series level changes, the forecasts with an ARIMA model would continue to track the process, whereas a stationary ARMA model would be tied to a mean level that may have become obsolete. Another application of ARIMA processes is in the evaluation of transfer functions of dynamic linear systems, which relate the deviations of an input to the deviations of an output.

6.1.1 THE DIFFERENCING OPERATION

Differencing of the time series may be used to remove its nonstationarity. The first order differencing is defined by

u_t = x_t - x_{t-1} .     (6.1)

Taking the first difference of a series removes or attenuates the low frequency components of the series. Thus, differencing once may be useful in removing the trend of a series, or in cases where the series is nonstationary in the mean or in level. If the series is nonstationary both in level and in slope, then two consecutive differencing operations are needed. In this case, the second difference of the series is represented by

w_t = u_t - u_{t-1} = (x_t - x_{t-1}) - (x_{t-1} - x_{t-2}) = x_t - 2x_{t-1} + x_{t-2} .     (6.2)

In general, the differencing operation may be done several times, but in practice only one or two differencing operations are used. If the series exhibits periodicity, then seasonal differencing may be used,

u_t = x_t - x_{t-w}     (6.3)

where w is the period. Typically w is equal to 12 for monthly series. If necessary to achieve stationarity, the seasonal differencing could be repeated D times. For example, for D = 2 and w = 12,

u_t = (x_t - x_{t-12}) - (x_{t-12} - x_{t-24})
    = x_t - 2x_{t-12} + x_{t-24} .     (6.4)

Differencing alone may not be sufficient to achieve stationarity. The logarithmic transformation of the data may also be useful, as changes in logarithms are essentially percentage changes. Other transformations may also be used.

6.1.2 THE ARIMA MODEL

The behavior of the differenced series w_t or u_t may be represented by a stationary ARMA(p,q) process

u_t = Σ_{i=1}^{p} φ_i u_{t-i} + ε_t - Σ_{j=1}^{q} θ_j ε_{t-j} .     (6.5)

The model for x_t is referred to as an ARIMA(p,d,q), where the parameters p and q indicate the order of the autoregressive and moving average components and d is the number of differencing operations necessary to obtain the stationary u_t series. For d = 1, and replacing u_t by x_t - x_{t-1}, we obtain

x_t = x_{t-1} + Σ_{i=1}^{p} φ_i (x_{t-i} - x_{t-i-1}) + ε_t - Σ_{j=1}^{q} θ_j ε_{t-j} .     (6.6)

Although the u_t series generated by Eq. (6.5) is stationary, the x_t series generated by Eq. (6.6) is nonstationary. In fact,

x_t = (x_t - x_{t-1}) + (x_{t-1} - x_{t-2}) + (x_{t-2} - x_{t-3}) + ... = u_t + u_{t-1} + u_{t-2} + ...     (6.7)

The x_t process is thus an infinite sum of u_t's, referred to as the "integration" of the u_t series. For this reason it is called an integrated autoregressive - moving average process. The expectation of the right hand side of Eq. (6.7) does not converge and the process x_t is nonstationary. Likewise, it may also be shown that the variance of the x_t process tends to infinity. Thus, the differencing operation yields a stationary u_t process, but the "integrated" process x_t is nonstationary with infinite variance. For this reason the ARIMA process cannot be used to generate synthetic stationary hydrologic sequences. However, it can be useful for forecasting hydrologic events. For example, the forecasting function of the ARIMA(p,1,q) is obtained by taking the conditional expectation of Eq. (6.6):

x̂_t(L) = E[x_{t+L}] = E[x_{t+L-1}] + Σ_{i=1}^{p} φ_i (E[x_{t+L-i}] - E[x_{t+L-i-1}]) + E[ε_{t+L}] - Σ_{j=1}^{q} θ_j E[ε_{t+L-j}] ,     (6.8)

where E[·] indicates conditional expectations as in Eq.
(5.57). If i ≥ L, x_{t+L-i} is an observed value; if i < L, E[x_{t+L-i}] = x̂_t(L-i) is a forecast; likewise, if i ≥ L, ε_{t+L-i} is a past or current random value, and if i < L, ε_{t+L-i} is a future random value whose expectation is E[ε_{t+L-i}] = 0.

6.2 SIMPLE ARIMA MODELING OF TIME SERIES

The ARIMA models are called nonperiodic or nonseasonal if the differencing is as in Eqs. (6.1) or (6.2), and they are called periodic or seasonal if the differencing is as in Eqs. (6.3) or (6.4). Kavvas and Delleur (1975) showed that simple first differencing significantly reduces the periodic component of monthly series, and have fitted ARIMA(1,1,1) models to monthly rainfall. Rao (1980) used the ARIMA(0,1,1) and (0,2,2), or integrated moving average IMA(1,1) and IMA(2,2), models to evaluate the changes in annual precipitation which are attributable to effects of urbanization. The simple ARIMA models (nonseasonal differences) are discussed in this section.

6.2.1 THE SIMPLE ARIMA MODEL

The general ARIMA(p,d,q) model can be written as

u_t = Σ_{i=1}^{p} φ_i u_{t-i} + ε_t - Σ_{j=1}^{q} θ_j ε_{t-j}     (6.9)

where u_t is the d-th difference of the x_t process. For example, the ARIMA(1,1,1) is

u_t = φ₁ u_{t-1} + ε_t - θ₁ ε_{t-1}     (6.10a)

where

u_t = x_t - x_{t-1} .     (6.10b)

The stationarity of the ARMA(1,1) model fitted to u_t requires |φ₁| < 1. It is therefore seen that if φ₁ > 0, as is usually the case in hydrology, then the model

x_t = (1 + φ₁) x_{t-1} - φ₁ x_{t-2} + ε_t - θ₁ ε_{t-1}     (6.11)

does not satisfy the stationarity conditions (4.24) for the AR(2) component of this model, as (1 + φ₁) + (-φ₁) = 1.

Figure 6.1 shows a plot of a nonstationary series generated by Eq. (6.11) with φ₁ = 0.5, θ₁ = 0.3 and σ² = 0.6. The slow rate of decay of the autocorrelation function is exhibited in Figs. 6.2a and 6.2b, which show the autocorrelation function of the series of Fig. 6.1 for 24 and 500 lags, respectively. This effect is translated into a concentration of the spectral density (see Eq. 2.22) at the origin as

Figure 6.1. Plot of generated ARIMA(1,1,1) series of Eq.
(6.11) with parameters φ₁ = 0.5, θ₁ = 0.3 and σ² = 0.6.

Figure 6.2a. Autocorrelation function of generated ARIMA(1,1,1) series of Eq. (6.11) with parameters φ₁ = 0.5, θ₁ = 0.3 and σ² = 0.6.

Figure 6.2b. Autocorrelation function of generated ARIMA(1,1,1) series of Eq. (6.11) with parameters φ₁ = 0.5, θ₁ = 0.3 and σ² = 0.6.

Figure 6.3. Power spectrum/variance plot of generated ARIMA(1,1,1) series of Eq. (6.11) with parameters φ₁ = 0.5, θ₁ = 0.3 and σ² = 0.6.

shown in Fig. 6.3. It can be shown that the theoretical value of the spectrum at the origin tends to infinity.

It is seen that this type of model is not applicable for the generation of hydrologic variables which do not exhibit this nonstationary behavior. However, this type of model may be useful for hydrologic forecasting. This is because the first difference lets the mean or level of the series go free, but upon forecasting the forecasted values are reattached to the previous observations. Unlike ARMA models, the ARIMA model is thus capable of forecasting future values of a series even if its mean changes along time.

6.2.2 PARAMETER ESTIMATION FOR SIMPLE ARIMA MODELS

The series x_t is differenced as many times as necessary to obtain a stationary series. In practice the number of differencings d is 0, 1, or 2. For example, the first difference of the nonstationary series of Fig. 6.1 is shown in Fig. 6.4. The nonstationarity of the x_t series produces an autocorrelation function that decays very slowly, as shown in Fig. 6.2, whereas the autocorrelation function (ACF) of the differenced series is shown in Fig. 6.5. Thus a slow decay of the ACF may be interpreted as an indicator of a nonstationarity in the x_t series, suggesting the need for differencing.

Figure 6.4.
First difference of generated ARIMA(1,1,1) x_t series of Eq. (6.11) with parameters φ₁ = 0.5, θ₁ = 0.3 and σ² = 0.6.

If the raw series is suspected to be nonstationary, the ACF of the differenced series is examined. If such ACF fails to damp out the sample autocorrelation, the second difference is examined. The lowest level of differencing needed to achieve stationarity is used.

Figure 6.5. Autocorrelation function of first difference of generated ARIMA(1,1,1) x_t series with parameters φ₁ = 0.5, θ₁ = 0.3 and σ² = 0.6.

Once the series has been stationarized by differencing, the estimation of the parameters of the stationary ARMA(p,q) model fitted to the u_t series proceeds as in Sec. 5.2.2.

6.2.3 GOODNESS OF FIT FOR SIMPLE ARIMA MODELS

The procedures for testing the ARMA(p,q) model fitted to the differenced series are the same as those discussed in Sec. 5.2.3. Thus, the procedures for overfitting and the tests of the residuals, such as the autocorrelation check, the Porte Manteau test and the cumulative periodogram test, apply. The Akaike test is also useful to choose among competing models and to check the parsimony of the parameters.

6.2.4 SUMMARIZED PROCEDURE FOR SIMPLE ARIMA MODELING

The modeling procedure for the ARIMA(p,d,q) models is summarized in the following steps.

STEP (1). Transformation. Check the normality of the series and make the appropriate transformation to normality if necessary.

STEP (2). Differencing. (2a). From a plot of the normalized series observe whether there is any nonstationarity in the level, or both in the level and slope. The first case may indicate the need for first differencing, the second for differencing twice. (2b). Check the autocorrelation function of the normalized series. An unusually slow decay may indicate the need for differencing. Use the lowest level of differencing necessary to achieve stationarity.

STEP (3).
Fitting the ARMA(p,q) Model to the Differenced Series. This procedure follows exactly the steps of Sec. 5.2.6, with the exception of the generation procedure, which does not apply.

6.2.5 EXAMPLE OF SIMPLE ARIMA MODELING

Delleur and Kavvas (1978) fitted ARIMA(1,1,1) models to the rainfall series described in Sec. 5.3.5. The results are given below for the monthly precipitation at Salamonia, Indiana (Station 12-7747). The data are listed in Table 5.13. The step by step procedure of the previous section is followed.

STEP (1). The square root of the monthly precipitation is used to obtain a series which is approximately normal.

STEP (2). The nonstationarity in the series is removed by taking the first difference. Table 6.1 shows the first difference of the precipitation square roots; each entry u_t is the difference of two consecutive square-rooted values. Steps (1) and (2) may be performed by the IMSL Program FTRDIF*.

STEP (3). An ARMA(p,q) model is fitted to the differenced series following the steps of Sec. 5.2.6.

(3a). The mean and variance of the differenced series are ū = -0.0011 and s² = 0.6065.

*See Appendices A6.1 and A6.2.

Table 6.1. First Difference of Square Roots of Monthly Precipitation at Salamonia, Indiana. [table values not reproduced]

(3b). The autocorrelation function of the differenced square roots of the rainfall is shown in Fig. 6.6. The 95% confidence interval is given approximately by ±2/√N = ±2/√455 = ±0.094. The autocorrelation function is seen to have a dominant value at lag 1. The partial autocorrelation function is shown in Fig. 6.7 and exhibits significant values at lags 1 through 10.

Figure 6.6. Autocorrelation function of the first differenced square roots of monthly precipitation at Salamonia, Indiana.

(3c). Identification. The behavior of the ACF and PACF suggests an ARMA(1,1) model for the differenced series.

(3d). The initial estimate of the autoregressive parameter is obtained as indicated in Sec. 5.2.6, Eqs.
(5.70) and (5.71):

φ̂₁ = r₂/r₁ = -0.0754  and  θ̂₀ = ū(1 - φ̂₁) = -0.0011(1 + 0.0754) = -0.0012 .

(3e). The initial estimate of the moving average parameter is obtained as in Sec. 5.2.6, Eqs. (5.74), with the following result from the IMSL subroutine FTMPS: θ̂₁ = 0.5873, σ̂² = 0.4207.

Figure 6.7. Partial autocorrelation function of the first differenced square roots of monthly precipitation at Salamonia, Indiana.

(3f). Maximum likelihood estimates. Table 6.2 shows the sum of squares surface of the residuals of the ARIMA(1,1,1) for the monthly rainfall square roots. The estimate of the parameters based on the sum of squares surface is φ̂₁ = 0.0, θ̂₁ = 0.9. Using these estimates as initial values, the final estimates obtained by the IMSL program FTMXL are

φ̂₁ = 0.0451 ;  θ̂₀ = -0.0010 ;  θ̂₁ = 0.9593 ;  σ̂² = 0.3382 .

(3g). Goodness of fit test. The Q-statistic is found to be Q = 0.0448 × 455 = 20.4, and with 22 degrees of freedom it is to be compared to χ²(95%) = 33.9. The model is seen to pass the Porte Manteau test. The ACF of the residuals is shown in Fig. 6.8. It does not exhibit any significant value.

Of the 15 series studied, 7 passed the test. In several cases, we found θ₁ = 0.999, which is essentially equal to 1.
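The two hand computations of steps (3d) and (3g) are simple enough to sketch in a few lines of Python. This is an illustrative helper, not the book's FORTRAN, and the sample values in the test below are hypothetical, chosen only to be of the right order for a first-differenced series:

```python
# Illustrative sketch of steps (3d) and (3g): moment estimates of the
# ARMA(1,1) parameters from the ACF of the differenced series, and the
# Porte Manteau statistic Q = N * sum of squared residual autocorrelations.
def arma11_moment_estimates(r1, r2, u_mean):
    """phi1 from the ACF ratio r2/r1; theta0 from the series mean."""
    phi1 = r2 / r1
    theta0 = u_mean * (1.0 - phi1)
    return phi1, theta0

def porte_manteau(residual_acf, n):
    """Q statistic over the supplied residual autocorrelations (lag >= 1)."""
    return n * sum(r * r for r in residual_acf)
```

A computed Q smaller than the tabulated chi-square quantile for the appropriate degrees of freedom, as in the example above, indicates that the residuals are acceptably uncorrelated.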
When θ₁ is essentially 1, the model may be written as

x_t - x_{t-1} = φ₁ (x_{t-1} - x_{t-2}) + ε_t - ε_{t-1} ,

Table 6.2. Sum of Squares of Residuals of 1st Difference of Square Root of Monthly Rainfall Data at Salamonia, Indiana. [values of the sum-of-squares surface over the (φ₁, θ₁) grid not reproduced]

Figure 6.8. Autocorrelation function of residuals of 1st difference of square root of monthly rain, Salamonia, Indiana.
This spectral representation was com- posed of two parts: the continuous part, representing the stationary component of the series and the second part rep- resenting the periodic or seasonal component assumed to be circular stationary. Circular stationary means that the multi- variate probability distribution of Xo = (x1, X2, ..- X12) is the same as that of oetaK = Cia? <-> ¥p24¢12K? Analysis of the spectrum showed that while the first differ- encing significantly reduces the periodic component, the 292, continuous spectral part of the differenced series g,(f) is related to the continuous spectral part of the original series g(f) by gat) = gif) (2 sin anf)? = F< <5 Thus, differencing wipes out the value of the original spectral density g(f) at f=0 and dampens out the spectrum for 0 Xtey (6.14) Or Uy tH Ug to Hop Up te re ee or Pp Q Me Mein fe 28 Sti For example, the seasonal integrated moving average model ARIMA(0,1,Q) or IMA(1,Q),, is 1 e482 Spay oo 7 (6.15) 1 Ft-Qu ° Referring to Eq. (5.13), the autocorrelation structure of this model is written as 7% * 1% 41 t+ + 9.4% ke TF Oe F Ont »k< Qu Q =0 k> Qu The seasonal _ integrated autoregressive —_ model ARIMA(P,1,0),, is written as (6.16) Uy, = Oty + Potent + UL py tee - (6.17) So from Eq. (4.13), the autocorrelation structure is given by Plow = Pci )w * P2PCc-ayw tot MpPG-Pw? K? 0. (6-18) Consider for example, an AR(1) model fitted to the 12-th difference of monthly observations FOr Uy te t or a 7 Xtene = Oey 7 Xpag) t ep or 294 X= Oy * Mee 7 Xs tet It is seen to be a nonstationary AR process of the 13th order with 62 = 3 =... = 611 = 0, O12 = 1, O13 = ~Or If instead the MA(1) model is fitted to the 12-th difference of the monthly data, then - x, on (6.19) a tate12 or tle tb 7 . - (6.20) Suppose that x, is an observation in May of a particular year for a monthly flow time series. 
Equation (6.20) shows how this value might be expected to be related to the obser- vation made in May of the previous year, in the sense that both observations would be expected to diverge in the same direction from the long term trend. The same sort of rela- tionship might reasonably be expected to link the observations for other months. April data might be linked by t= = Ste ~ O13 or Xe = X13 * ter 7 O83 However, the o, residual for May might not be independent t of a.) for April. The ,'s might be represented, for ex- ample, ‘by an autoregressive process which would depict the underlying month to month carry-over. Thus Foro te (6.21) Combining Eqs. (6.19) and (6.21) gives = On apy # ey - Or Oa Mag > Or Sere = Or Cy 7 Or Mpg g) + Py Or Ebene Or Me te > Or See or Reo Meng = 01 Gey Xeeag) * St 7 Ot Sede 295. or = Or ey t Meg - 1 Xeeag t SET OL Spy (6-22) This sort of model is called a multiplicative model and is designated as an ARIMA(1,0,0) x (0,1,1)12 model, indicating that a seasonal MA(1) model is fitted to the 12th difference of the data and a AR(1) model is fitted to the residuals of the former model. More generally, for a monthly hydrologic series for example, the 12-lag differenced series could be fitted by an ARIMA (P,1,Q)12 model Q z : 6. isl i 't-12i (6.23) where v, is the first seasonal difference ~ ¥te12 This model explains the dependence of the observation x, of a particular month to the observations taken the same month during the previous years. Since there are 12 months, there would be 12 such models (one for each month) which are assumed to be similar Although the observations for a certain month, May for example, is related to the previous May observations, it is also related to the other monthly observations of the same year. It is therefore expected that the "error" components a, of Eq. (6.23) are serially correlated. 
To take care of this serial correlation between the months an ARMA (p,q) model is fitted to the a, series: tej (6.24) Fl Combining Eqs. (6.23) and (6.24) we obtain the ARIMA (p,q) x (P,1,Q)i2 multiplicative model P P po oP PB Meat EG Mea” 2, By 88 Veinai q Q + ee 8, 8 feagj-i 3 (6.25) 296 or p P eat 2 MOLT Mea) * pain > Xe-12c14)) p P ~ EE 6% Gyan - Xara) + iso jao bj Mtri-12] ~ “t-i-12(14)) oe Aa Es 6; 8; ee yas (6.26) Box and Jenkins (1976) generalized this model and obtained the multiplicative ARIMA (p,d,q) x (P,D,Q), model which consists of a seasonal ARMA (P,Q) fitted to the D-th seasonal difference of the data coupled with an ARMA (p,q) model fitted to the d-th difference of the residuals of the former model. The following notation is used: BY ; B = backward operator (6.27) = Xten (6.28) Q-B)x, axe = Ist difference (6.29) (1-B)? x, =X > 2x +X, = And difference (6.30) (-B)? x, = d-th difference (6.31) (1-BY?) x, =X, > Xy-19 = Ist seasonal difference of period 12 (6.32) (1-B??)? x, =X, ~ 2X4 19 + Xp_9q = 2nd seasonal difference of period 12 (6.33) a-BY)Px, = Dth seasonal difference of period w (B) x, = Cri B-g2B* - ec (6.34) = autoregressive operator 297 OB) x, = moving average operator (BY) = (1-0,B" - @B™ ~ ... - O,BP) xp == 2% xu #0 Gy =-1 = seasonal autoregressive operator Wy. GecpY 2w qu, .. 8 O(BY) = (4-0,B" - 7B ~ ... - EgB%) c= - 28 tui” % = seasonal moving average operator. With this notation the seasonal ARIMA (P,D,Q),, model, in which an ARMA (P,Q) model is fitted in the D*th seasonal difference of period w is written as (1-6,BY-6, BY — .. + % BPM) (1-BYP x (1-0, BY-6,B™ =... - & Be ya, (6.36) The residuals a, are in turn represented by an ARIMA (p,d,q) model (i.e. - an ARMA (p,q) model fitted to the d-th difference of the a, series) (1 ~ 6, B-o2B? - ... - BP ya-By4 (1-8, B-6,B? - ... - OB, : (6.37) where ¢, is an independent variable. The general multipli- cative ARIMA (p,d,q) x (P,D,Q),* model is obtained by solv- ing Eq. 
(6.37) for a, and repldting in Eq. (6.36) as "Note that upper case letters are used to designate the order of the autoregressive, differencing and moving average of the seasonal component and lower case letters are used to designate the respective operators of the nonseasonal com- ponent. The subscript w designates the period of the seasonal differencing 298 (i=; BY-o, B™ - - BPY) (1-6) B-$B? - ... - * BP) c-BY)Pa-B)4 x, = C-0,B" - eB - og3™) (1-8; B-82 B? - ... - 6,B4) &, (6.38a) or in a more condensed form (BY) 9(B)-BY)? (1-B)4x, = (BY) 0B) &, - (6.38b) Box and Jenkins (1976, p. 305).used an ARIMA(0,1,1) x (0,1,1)12 model to represent the logarithms of the monthly passenger totals in international air travel. This series ex- hibits both periodicity and a trend which justifies the season- al and nonseasonal differencings. First order moving average models were fitted to the seasonal and nonseasonal components. In hydrologic applications, however, the seasonal effect is dominant, and seasonal differencing may suffice to obtain a stationary series in the mean. River flows, however, are known to possess strong autoregressive components. There- fore, multiplicative models applied to monthly river flows are likely to inchide a nonseasonal autoregressive component. It should be noted that this multiplicative model is not reducible to an additive model in which the series is the sum of a sea~ sonal component and a nonseasonal component, as was postu- lated in the parametric periodic standardization method studied in Sec. 5.3. 6.3.2 PARAMETER ESTIMATION FOR MULTIPLICATIVE ARIMA MODELS The form of the autocovariance structure is useful in the model identification. Box and Jenkins (1976, p. 329-333) give a table of several simple multiplicative models and their autocovariances. The knowledge of the physical source of nonstationarity is useful in identifying the required differenc- ing. 
Also, the knowledge of the physical process underlying the nonseasonal portion of the model is useful in the identifi- cation of the portion of the model representing the nonsea- sonal component 94. One approach to the estimation of the parameters would follow the heuristic justification of the multiplicative models An ARMA(P,Q) modei would be fitted to each season after D seasonal differencings. These models would be averaged across the seasons to obtain the ARIMA(P,D,Q), model. Finally, the residuals of this model would be fittéd by an ARIMA(p,d,q) model, The two combined would yield the multiplicative ARIMA(p,d,q) x (P,D,Q), model. This ap- proach is not practical, however, as thé optimization of the 299 parameters would be done in the estimation of each model component, and this procedure would not guarantee a global optimization of the parameter choice. Instead, the sum of the squares of the residuals is calculated for a range of values of the parameters. A first estimate of the parameters corresponds to those values which minimize the sum-of-squares surface. The final estimate is obtained by the nonlinear estimation procedure. The principal hydrologic application of ARIMA models is in forecasting. The calculation of the variance of the fore- casts requires the calculation of the J-weights when the series is expressed as an infinite moving average. z Y &paj = MB) ep jzo Introducing this expression for x, into the general seasonal multiplicative ARIMA (p,d,q) x (P$D,Q),, model of Eq. (6.38) we have (8) 9(B")(1-B)* (1-BY)P CB) e, = 0(B) OCB) ey , (6.39) we obtain after simplifying by 9(B) 9¢8")(1-B)4 G1-BY) ycB) = 6(B) @(BY) . (6.40) The y-weights are obtained from the several equations that result from Eq. (6.40) by equating equal powers of B. A sample calculation is given in Sec. 6.3.5 6.3.3 GOODNESS OF FIT FOR MULTIPLICATIVE ARIMA MODELS As before, the residuals are examined. The correlogram is checked and the Porte Manteau lack of fit test is perform- ed. 
The periodogram check may likewise be performed, and the Akaike test may be used to compare competing models.

6.3.4 SUMMARIZED PROCEDURE FOR MULTIPLICATIVE ARIMA MODELING

The modeling procedure for the multiplicative ARIMA(p,d,q) × (P,D,Q)_w models may be summarized in the following steps.

STEP (1). Transformation. Check the normality of the data and make the appropriate transformation (such as the Box-Cox transformation) to normality as necessary.

STEP (2). Differencing and Identification.
(2a) From a plot of the normalized series observe whether there is any periodicity or other nonstationarity in the series.
(2b) Check the autocorrelation of the normalized series. Typically, the ACF of monthly series shows a 12-month periodicity superposed on a decaying autocorrelation. The periodicity indicates the need for seasonal differencing by Eq. (6.13); a slow decay superposed on the periodicity may indicate the need for nonseasonal differencing by Eq. (6.1). It is useful to observe the behavior of the ACF for several combinations of seasonal and nonseasonal differencings. Use the lowest level of differencing necessary to achieve stationarity. The combination of a Box-Cox transformation with seasonal and nonseasonal differencing may be accomplished by IMSL Program FTRDIF*.

STEP (3). Fitting the Model Parameters.
(3a) Preliminary estimates. The expressions for the autocovariance structure may be developed. Estimates of the parameters may then be obtained by replacing the expected values of the autocovariances by their observed values and solving these equations simultaneously. Box and Jenkins (1976, p. 329 ff.) give a list of such functions for a few models; however, these are not necessarily appropriate models for hydrologic situations. This step may become laborious and may be skipped, starting instead with step (3b).
(3b) Maximum likelihood estimates.
Find the sum-of-squares surface Σε_t²(φ, θ, Φ, Θ) for a range of parameter values, and locate its minimum and the corresponding parameter values.
(3c) Nonlinear estimation. Using the maximum likelihood estimates as initial values, the final estimates of the parameters are obtained by the nonlinear estimation procedure.
(3d) Diagnostic check. Perform the appropriate checks on the residuals; these include the inspection of their ACF, the Porte Manteau test and the cumulative periodogram test. The Akaike test may be performed to compare competing models.

*See Appendices A6.1 and A6.2.

(3e) Forecasting. Develop the forecasting function by taking the conditional expectations of the expanded finite difference form of the model. Compute the ψ-weights as indicated in Eq. (6.40). The forecasts and their confidence intervals are then evaluated.
(3f) Inverse transformation. The inverse of the transformation performed in step (1) must be applied to obtain the forecasts in the original physical units.

Steps (3b) through (3f) (except the Akaike test) may be performed by the Box-Jenkins program UNESTM (see Appendices A6.3 and A6.4).

6.3.5 EXAMPLES OF MULTIPLICATIVE ARIMA MODELING

Monthly flows of 16 watersheds located in Indiana, Illinois and Kentucky were studied by McKerchar and Delleur (1974). Many of these watersheds are the same as those referred to in Secs. 5.3.5 and 6.2.5. The watershed areas vary from approximately 240 to 4000 square miles and the lengths of the records vary from 444 to 672 months. The watershed corresponding to the rainfall data used in previous examples (the Blue River near White Cloud, Indiana) is used for illustration. It has a drainage area of 461 square miles and a length of record of 456 months. The data are given in the Appendix as part of the computer program input.

Initially, AR(1) and AR(2) models were fitted to the periodic standardized monthly flow logarithms.
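The grid evaluation of the sum-of-squares surface called for in step (3b) can be sketched directly: for each candidate parameter pair, generate the residuals of the differenced series recursively (with presample residuals set to zero, a common simplification) and accumulate Σε_t². Below is a sketch for the (0,1,1) × (0,1,1)₁₂ moving-average part, with an illustrative synthetic series standing in for the differenced flow logarithms:

```python
import numpy as np

def airline_sse(w, theta1, Theta1):
    """Sum of squared residuals of the differenced series w_t under
    w_t = (1 - theta1*B)(1 - Theta1*B^12) e_t, i.e.
    e_t = w_t + theta1*e_{t-1} + Theta1*e_{t-12} - theta1*Theta1*e_{t-13},
    with presample residuals taken as zero."""
    e = np.zeros(len(w))
    for t in range(len(w)):
        e[t] = w[t]
        if t >= 1:
            e[t] += theta1 * e[t - 1]
        if t >= 12:
            e[t] += Theta1 * e[t - 12]
        if t >= 13:
            e[t] -= theta1 * Theta1 * e[t - 13]
    return float(np.sum(e * e))

# Coarse grid over the same parameter ranges as Fig. 6.14 in the text
rng = np.random.default_rng(1)
w = rng.normal(size=200)     # stand-in for the differenced series
grid = [(th, Th, airline_sse(w, th, Th))
        for th in np.arange(0.2, 0.81, 0.05)
        for Th in np.arange(0.4, 1.01, 0.05)]
best = min(grid, key=lambda g: g[2])   # first estimate of (theta1, Theta1)
```

The minimizing pair then serves as the starting point of the nonlinear estimation of step (3c).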
The AR(1) model passed the Porte Manteau test in 10 out of 16 cases and the AR(2) model passed the test in all cases. The AR(2) model was selected and referred to as model A. The number of parameters required was 27, namely, 12 monthly means, 12 monthly standard deviations, 2 AR coefficients and σ_ε². The multiplicative ARIMA models were investigated as they offered the promise of fewer parameters in forecasting applications.

STEP (1). Transformation. The seasonality of the monthly means and standard deviations of the flows is exhibited in Fig. 6.9.

Figure 6.9a. Mean monthly flows of the Blue River near White Cloud, Indiana.

Figure 6.9b. Monthly standard deviations of flows of the Blue River near White Cloud, Indiana.

Often the standard deviations are larger than the means, resulting in a coefficient of variation greater than unity and indicating highly variable data. In addition, since the data have lower bounds of zero, a coefficient of variation greater than unity implies that the data must be positively skewed; this is confirmed by the positive values of the monthly coefficients of skewness shown in Fig. 6.10. The skewness coefficients for the logarithms of the monthly flows are close to zero, suggesting that monthly flows may follow lognormal distributions. Plots of flows of individual months on lognormal probability paper, as in Fig. 6.11, support this contention. Figure 6.12 gives little evidence for assuming a periodic pattern in the monthly correlations. It is thus assumed that the data are covariance stationary: the same dependence exists between all months regardless of the position within the 12-month seasonal cycle.

STEP (2). Differencing and Identification.
Autocorrelations were calculated for all possible differencing schemes of nonseasonal differencing (d = 0,1,2) and seasonal differencing with a lag of 12 (D = 0,1,2).

Figure 6.10. Monthly coefficients of skewness of flows and log-transformed flows of the Blue River near White Cloud, Indiana.

The autocorrelation functions of the differenced series w_t = ∇^d ∇₁₂^D y_t, where y_t = log_e x_t, are shown in Fig. 6.13. The first seasonal difference is seen to considerably decrease the seasonality in the autocorrelation function, and the first seasonal combined with the first nonseasonal difference (D = 1 and d = 1) essentially eliminates the cyclicity and shows significant values at lags 1, 2, 11, 12, and 13, with lags 1 and 12 being the dominant ones. This type of behavior tends to indicate the possibility of a seasonal and a nonseasonal moving average component. An ARIMA model of the type (0,1,1) × (0,1,1)₁₂ appears to be a candidate. This is the model used by Box and Jenkins to represent the logarithms of monthly passenger totals in international air travel. That series exhibited both seasonality and a positive trend. For the hydrologic data, physical intuition would indicate that the nonseasonal differencing is unnecessary because of the absence of a trend, and that seasonal differencing would result in a series with stationary mean and variance. The month-to-month correlation could be taken into account through an AR(1) or AR(2) nonseasonal component. Thus, the ARIMA models (1,0,0) × (0,1,1)₁₂ and (2,0,0) × (0,1,1)₁₂ also deserve consideration. The previous experience with the models applied to the cyclically standardized flows mentioned in the introduction of this section indicates that an AR(2) component is likely to exist.
Therefore, the following models are retained for further investigation: the ARIMA (2,0,0) × (0,1,1)₁₂ and the ARIMA (0,1,1) × (0,1,1)₁₂ models.

STEP (3b). Maximum likelihood estimates. The (0,1,1) × (0,1,1)₁₂ model is written as

(1 − B)(1 − B¹²) y_t = (1 − θ₁B)(1 − Θ₁B¹²) ε_t

or

y_t = y_{t-1} + y_{t-12} − y_{t-13} + ε_t − θ₁ε_{t-1} − Θ₁ε_{t-12} + θ₁Θ₁ε_{t-13} .

The sum-of-squares surface Σε_t²(θ₁, Θ₁) is calculated and plotted in Fig. 6.14 for the range of values 0.2 ≤ θ₁ ≤ 0.8 and 0.4 ≤ Θ₁ ≤ 1.0. The surface is well behaved and exhibits a single minimum around θ₁ = 0.55 and Θ₁ = 0.9.

Figure 6.11. Lognormal probability plot of logarithms of April flows of the Blue River near White Cloud, Indiana.

Figure 6.12. Monthly serial correlations of the logarithms of flows of the Blue River near White Cloud, Indiana.

Figure 6.13. Estimated autocorrelation functions for differenced series for the Blue River near White Cloud, Indiana, with w_t = ∇^d ∇₁₂^D y_t, where y_t = log_e x_t, N = 456.

Figure 6.14. Sum-of-squares surface Σε_t²(θ₁, Θ₁) for multiplicative model (0,1,1) × (0,1,1)₁₂ applied to the logarithms of monthly flows of the Blue River near White Cloud, Indiana.

The (2,0,0) × (0,1,1)₁₂ model is written as

(1 − φ₁B − φ₂B²)(1 − B¹²) y_t = (1 − Θ₁B¹²) ε_t

or

y_t = φ₁y_{t-1} + φ₂y_{t-2} + y_{t-12} − φ₁y_{t-13} − φ₂y_{t-14} + ε_t − Θ₁ε_{t-12} .

The sum-of-squares surface Σε_t²(φ₁, φ₂, Θ₁) is shown in Fig. 6.15 for 0.2 ≤ φ₁ ≤ 0.8, φ₂ = 0.1 and 0.4 ≤ Θ₁ ≤ 1.0. Again, the surface is well behaved, with a minimum in the neighborhood of φ₁ ≈ 0.5 and Θ₁ ≈ 0.9. For φ₂ = 0 and 0.2 the shape of the surface remained almost unaltered and only small changes in the sum of squares occurred.

Figure 6.15.
Sum-of-squares surface Σε_t²(φ₁, φ₂, Θ₁) for multiplicative model (2,0,0) × (0,1,1)₁₂, for φ₂ = 0.1, applied to logarithms of monthly flows of the Blue River near White Cloud, Indiana.

STEP (3c). Nonlinear iterative estimation. Using the nonlinear iterative estimation procedure the following values were obtained:

Model (0,1,1) × (0,1,1)₁₂
Iteration            θ₁                Θ₁
Initial estimate     0.55              0.9
Final estimate       0.549 ± 0.040     0.942 ± 0.009

The results for the model (2,0,0) × (0,1,1)₁₂ obtained from the program UNESTM are given in Table 6.3.

STEP (3d). Diagnostic check. The autocorrelation function of the residuals of model (0,1,1) × (0,1,1)₁₂ is shown in Fig. 6.16. Lags two to six are seen to be all negative and significantly different from zero, as the standard error is ±1/√N = ±1/√456 = ±0.047. This is not typical of random series. The Q-statistic is Q = N Σ_{k=1}^{24} r_k²(ε) = 71.3, which is highly significant as χ²(90%) = 30.8, substantiating the conclusion that this autocorrelation function is not representative of a random series. The (0,1,1) × (0,1,1)₁₂ model must therefore be discarded. Further inspection of the ACF of the residuals shows that the seasonal effects have been nearly eliminated. It would therefore appear that the seasonal part of the model is acceptable but the nonseasonal part is not.

Figure 6.16. Residual autocorrelation function for model (0,1,1) × (0,1,1)₁₂ (the standard error is ±1/√N = 0.047), applied to logarithms of monthly flows of the Blue River near White Cloud, Indiana.

The residuals of the model (2,0,0) × (0,1,1)₁₂ are calculated from

ε_t = y_t − φ₁y_{t-1} − φ₂y_{t-2} − y_{t-12} + φ₁y_{t-13} + φ₂y_{t-14} + Θ₁ε_{t-12} .

The last 44 residuals are listed in Table 6.4. The autocorrelation and partial autocorrelation functions of the residuals of the model (2,0,0) × (0,1,1)₁₂ are shown in Table 6.5. They give no reason to suspect that the residual series is not independent. The Q-statistic calculated for the first 30 lags is Q = 17.30, which is not significant compared to χ²(90%) = 29.6.
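The Porte Manteau (Box-Pierce) statistic used above, Q = N Σ r_k²(ε) over the first K lags, is straightforward to compute; the sketch below contrasts an independent series with a strongly dependent one (both series are illustrative, not the Blue River residuals):

```python
import numpy as np

def portmanteau_q(res, nlags):
    """Box-Pierce Q = N * sum over k = 1..nlags of r_k^2, where r_k are
    the autocorrelations of the residual series."""
    e = np.asarray(res, float) - np.mean(res)
    n = len(e)
    denom = float(np.sum(e * e))
    r = np.array([np.sum(e[k:] * e[:-k]) / denom for k in range(1, nlags + 1)])
    return n * float(np.sum(r * r))

rng = np.random.default_rng(0)
white = rng.normal(size=456)                 # behaves like random residuals
q_white = portmanteau_q(white, 24)
q_dependent = portmanteau_q(np.cumsum(white), 24)
print(q_white < q_dependent)                 # True: dependence inflates Q
```

For an adequate model, Q is compared against the χ² point for the number of lags less the number of fitted parameters, as done in the text.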
This suggests that the residuals may be considered as random. The ARIMA (2,0,0) x (0,1,1),2 model passed the Porte Manteau test in 15 out of 16 cases studied. In the one case not passing, the statistic was Q@ at the 90% level and was just passing at the 95% level. It may therefore be concluded that the ARIMA (2,0,0) x 309 T0-310293"8, oud TavENYLS TwhITSIA 2bp STWNOIS3A 40 83gHNN To-322sap"2 SUNOS NuSH “wngISza ‘a'U Gev — 20+3ee9e2"e sauunos 40 KNS TwhaIs3y SITNS34 ONY NOTLUYOSNT A3HLO To-az00Tb"6 To-a26995°8 10-320866°8 a 1 39yaan¥ SNINOH e To-31stso"T To-38e98"—» €0-30sS8E°6 2 1 anTss3uo3u01ny z To-absT1S"9 To-322968"¥ To-3eve9s"S 1 T 3nTSS3uoRN0LNY 1 LTHIT 3aasn AIHIT w3H07 ann azo dAL a36hON 4N3D Bd SE aalyutisa al 3Hvavd aa ahead aalanvava (oA = 4199907 = WLbo CaHYOSSNAL HIIM G3d07|NgC 300K 21 a3gyo 40 T - x NO ONTONREBAAIC SHOILUnaasao ssh gno7 ATH aeaN aznTa ante = x - wiv 1 7300 40 aaeMANS ‘euRIpU] ‘pnofg aityM Ivey seATY entg ‘smo[y ATYIUOW Jo wYIIeZ0] ayy 0} paddy @T(T‘T‘o) x (0'0'Z) POW ‘s"9 AITEL e S a Table 6.4. Residuals of Model (2,0,0) x (0,1,1)12 Applied to the Logarithms of Monthly Flows, Blue River Near White Cloud, Indiana. T FIVTED VALUE RESIDUAL DATA UALUE 5.8713E+00 2-36516+00 2.2363E+00 71 14qaE+00 aloszie-o1 2713805€+00 7.8022e+00 -Ilo161e+00 5.58526+00 5:3625E+00 8180g2e-01 7. 34346-00 5.4865¢+00 4.8702e-01 35+ 00 81 12886+03 -1120536+00 7 32008+00 4:9398e-00 -4151236-01 4!aB86e+00 3. 3980E+00 6.45326-91 4.5151e+00 4/51546+00 03186+00 3!5e3se+00 3:55336+00 3:3856E-01 3.88186+00 3151396+00 SI51586-01 310814E+00 515856E+00 312314€-01 615088E+00 7:07s2e+00, One peae- Taking the conditional expectation, indicated by a square bracket, we have for a lead time of L months 311 20" 20-0 Bar= 00" 00" HOYOS baT= SO" 0" Br be £1 ToT zo" so" $0" Sor Tor= 90" 20" HOT OTTO EO 2bb = SuOTLYna3sa0. 
49 a3ANN Yo-3abe09"8= $3TSI5 JO “Nad “1s To-30g02*—e= $3183 341 40 Wa ‘Sa1aaS WHIDTIO suoruenaasao 2b) 7 70OH - STeNGTS3 GBLYNTIS3 3HL - viva SNOTLYTRUODOINY WILaeE 1 3 rato = stwnaszy aa.eWtis3 HL t MoUS3as 30 S333030 12 HIN 3 Wgulaen SaunOs-IHD ¥ ALIN GaledNOO 8 anOHS Vovgorae2"T “Sn 3Hi 3610 SLIMN SI Sa]a36 SIM SU SHT 1531 OL oosass2e2"T = aoa “1S 4€ a3diAIa WwSH so: sor sor go: so: so: so: so-so: gor “3s go-- s0-- 20" S0°~ So* for bor bora v2 “er $0: $0: go" 0° 0: sor sor sor gar gor gor gor ss To So" So" STO o> Bas BO BOY TO" EO TO a 2>b = swoTiunesan 49 amu soeSsre09*e= sa1aas Jo *n3d "1S T0=30802"-s= Saas Su. oo Wa 831838 WATIEO sHOLisnaasao 2bp TTaUOW = sTwnaISze Caue8T1S3 IHL ~ YIN NoToNny MOTLETEAROI0IN 1 seuerpul “prog aYM ceeN saary antg ayy jo smory ATyiuoj Jo suyytueSoy ayy 0) parddy 2'(T‘T‘9) Yened pue voreestoooy °s°g oTGeL X (0'0'Z) [POW JO S[eNpIsey Jo suooUNY uoreTeZs090: 312 yl) = (Yeap) = abv gpea) + O2l¥ar-o) + Yen-ael - OsYeen-a3) ~ O2lVeez-agl + [pap] > Ole pt-r2l Remembering that the expectations of past observations and past random variables are the values themselves, and that the expectation of the future random variables is zero, the fore- casting functions become: Ll VED) = Oa eFOoVe AV p11 Oe 22 e131 L=2 £ ypC2) = bry +a ety 797% ¥ 47772 4-101 10 L=3...12: y,(L) = dry, (L-1)+o2y,(L-2)+¥ 4473071 ta-137 eV teL-14 1 tL 12 L=13 ¥,C13) = dry, @2)+oey, C11) ty, (Lory -o2y ey L=14-y(14) = O1y,(13)+2y,(12)+y4 (201 y4 1) boyy, LL... 
+ y,(L) = Ory, (L-1)+b2y,(L-2)+y, (L-12)-61y,(L-13)- bey, (L-14) For example, making use of the y, and e values listed in Table 6.4 we obtain the lead one forecast at base time t = 456 months: Ya5q(1) = *Y¥a56 * 2¥a55 * Yaas ~ %1¥qaq ~ $2Vqag ~ 1%4a5 = 0.55534 x 4.3567 + 0.0093865 x 5.2417 + 3.4012 - 0.55534 x 4.3175 - 0.0093856 x 5.3753 + 0.898802 x 0.62689 = 3.9852 Likewise, making use of Tables 6.4 and 6.5, the 14 month forecast is: Vagg614) = O1¥ g5g(13) + b2¥q56(12) + Yg5g(2) - o1¥g5661) - $29 456 = 0.55534 x 3.9814 + 0.009386 x 4.3596 + 4.6945 - 0.55534 x 3.9852 - 0.009386 x 4.3567 = 4.6945 313 The forecasts at base time t = 456 months are listed in Table 6.6. It may be observed that the forecasts tend to the monthly means: L Vaso) in, % Difference 1 3.9852 4.1443 3.84 2 4.6945 4.8189 -2.58 3 5.7638 5.5837 43.23 4 6.4920 6.5030 -0.17 5 6.6869 6.8632 +0.36 6 7.0225 7.0079 +0.21 ete To estimate the forecasting errors we calculate the y-weights from Eq. (6.40) equating coefficients of equal powers of B: (1 ~ O1B ~ $2B?)(1 - Bl?)(Uo + YiB + ¥gB? + ...) = 1-6,B1? or (1. = OB ~ 628% - BI? + gsBI® + 6eBt4) (Wo + YB + ¥aB?+...) = 1 - @yB1? Thus $o=1 Ui = o1 = 0.555343 Yo = Ovbi + 2 = 0.555343x0.555343+0.0093856 = 0.31779 Us = Oib2tda = 0,555343x0.31779+0.0093865x0.555343 = 0.1817 Yi = Orhi0 + bao Vie = orbit dotio + 1 - Wig = Orbi2 + Gabi + U1 - On Ostia + Ooi Y= Ory * O22 + ay - OY ag batjeyge 52M The first 24 y weights are listed in Table 6.7. The standard deviation of the forecasts is 314 wr +09 “a “eueTpU] ‘pNo[D aiYM JON Jeary ont ‘smolq AqyIUOW Jo sunpEZoT oy 07 paddy (1 “T°0) X (0'0'%) T9POW 20°01 usw eors2a “9'9 aIGeL 315 Table 6.7. Model (2,0,0) x (0,1,1)12 Applied to the Logarithms of Monthly Flows, Blue River Near White Cloud, Indiana. 
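The lead-time recursions above can be collected into one small routine that substitutes known observations and residuals where available, and earlier forecasts (with future shocks set to zero) otherwise. A sketch (illustrative function name; the toy series below is not the Blue River data):

```python
def forecast_200_011_12(y, e, phi1, phi2, Theta1, nleads):
    """Conditional-expectation forecasts yhat(1..nleads) of the
    (2,0,0)x(0,1,1)_12 model: y_{t+L} = phi1*y_{t+L-1} + phi2*y_{t+L-2}
    + y_{t+L-12} - phi1*y_{t+L-13} - phi2*y_{t+L-14}
    + e_{t+L} - Theta1*e_{t+L-12}."""
    n = len(y)           # observations y[0..n-1] up to base time t
    yhat = []

    def val(i):          # observed y, or its own forecast beyond base time
        return y[i] if i < n else yhat[i - n]

    for lead in range(1, nleads + 1):
        i = n + lead - 1               # absolute index of y_{t+lead}
        f = (phi1 * val(i - 1) + phi2 * val(i - 2) + val(i - 12)
             - phi1 * val(i - 13) - phi2 * val(i - 14))
        if lead <= 12:                 # e_{t+lead-12} is a known residual;
            f -= Theta1 * e[i - 12]    # future shocks have expectation zero
        yhat.append(f)
    return yhat

# With null parameters the forecast simply repeats last year's values:
print(forecast_200_011_12(list(range(24)), [0.0] * 24, 0.0, 0.0, 0.0, 3))
# -> [12.0, 13.0, 14.0]
```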
1 WEIGHTS USED IN CALCULATING CONFIDENCE LIMITS AND UPDATING FORECASTS AFTER NEW OBSERVATION J PSC) 0 1.0000000E+00 1 519534304E-01 2 17792396-01 3 g169652E-01 4 1,038868SE-01 5 519398334E-02 8 3.3961585€-02 2 1.3417872E-02 8 1.1192360E-02 3 613478843E-03 10 3.6294656E-03 a 2L0751829E-03 42 1,0238477E-01 13 14 15 16 17 18 19 5:6878146E-02 3. 2547917E-02 1:8609146£-02 1.063997 06-02 6. 08350826 -03 3..4783060E-03 19887558E-03 20 1,1370908E-03 21 &is014292€-04 22 172564E-04 23 253781E-04 '31978E-01 = 1/2 sy) = (142 +0. + ayy?) For example for L = 3, sy(3) = (1 + 0.555347 + 0.317792)! x 0.8652 = 1.0271. Note that the standard deviations of the forecasts increase slowly to 1.048 for L = 12 and to 1.054 for L = 24 remaining essentially constant, and do not reflect the variation in the monthly standard deviations of the flows. The 95% confidence limits of yy5g(3) are 5.7638 + 1.0271 x 1.96 UI 5.7638 - 1.0271 x 1.96 = 3.75 The forecasts confidence limits are listed in Table 6.6. STEP (Sf). Inverse Transformation. The y,(L) are exponentiated to obtain the flow forecasts listed in the lower half of Table 6.6, and plotted in Fig. 6.17. 316 Figure 6.17. Graph of forecasts at origin, graph interval is 2.1360E + 01, T = 456 with (2,0,0) x (0,1,1)12 model applied to monthly flow log- arithms of the Blue River near White Cloud, Indiana. STEP (3g). Forecasts Updates and Real Time Forecasts. The forecasts may be updated as a new information becomes avail- able. In example 5.2.7, the series was short and the param- eters were re-estimated each time a new flow became avail- able. In this case, the subroutine FCAST of the program UNESTM updates the forecasts keeping the model parameter constant. Subtracting ya) from Yee) evaluated from Eq. (5.54) the following expression for the forecast update is obtained: Yer) = yl) + bee) where fee = Veer 7 YD) which follows from Eqs. (5.55) and (5.54) written for L = 1. 
Thus, when the new observation y₄₅₇ is available, the forecasts may be updated:

ŷ₄₅₇(1) = ŷ₄₅₆(2) + ψ₁(y₄₅₇ − ŷ₄₅₆(1)) = 4.6945 + 0.5553 (ln 36 − 3.9852) = 4.4714

where the flow at time 457 is 36 cfs (see data input in the Appendix). Likewise,

ŷ₄₅₇(2) = ŷ₄₅₆(3) + ψ₂(y₄₅₇ − ŷ₄₅₆(1)) = 5.7638 + 0.31779 [ln 36 − 3.9852] = 5.6361

Similarly, as soon as y₄₅₈ = ln 138 becomes available, the updated forecasts with two new observations may be calculated:

ŷ₄₅₈(1) = ŷ₄₅₇(2) + ψ₁(y₄₅₈ − ŷ₄₅₇(1)) = 5.6361 + 0.5553 [ln 138 − 4.4715] = 5.8892

and

ŷ₄₅₈(2) = ŷ₄₅₇(3) + ψ₂(y₄₅₈ − ŷ₄₅₇(1)) = 6.4190 + 0.31779 [ln 138 − 4.4715] = 6.5638

The updated forecasts are listed in Table 6.8. The sequence of updated lead-one forecasts, or real time forecasts, is plotted in Fig. 6.18, and the exponentiated values are shown in Fig. 6.19.

Figure 6.18. Real time forecasts of logarithms of monthly flows, Blue River near White Cloud, Indiana.

Table 6.8. Updated Forecasts of Model (2,0,0) × (0,1,1)₁₂ Applied to the Logarithms of Monthly Flows, Blue River near White Cloud, Indiana.

6.3.6 LIMITATIONS TO BE CONSIDERED IN APPLICATIONS OF MULTIPLICATIVE ARIMA MODELS

A comparison of Figs. 6.18 and 6.19 reveals that the multiplicative ARIMA model tracks the logarithms of the flows better than it tracks the flows obtained by exponentiation of those logarithms.

Figure 6.19. Real time forecasts of monthly flows, Blue River near White Cloud, Indiana.

The forecasts ŷ_t(L) are conditional means, which are expected to be normally distributed about the actual flow logarithm with a standard deviation s_y(L). It appears more realistic to use the method of moments instead of exponentiation to obtain the flow forecasts.
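The conversion by moments rests on the standard lognormal identities relating the moments of a variable to those of its logarithm; a sketch (illustrative function name):

```python
import math

def lognormal_flow_forecast(y_hat, s_y):
    """Flow forecast and its standard deviation from the forecast y_hat of
    the flow logarithm and its standard error s_y, via the lognormal
    relations x = exp(m + s^2/2) and s_x = x * sqrt(exp(s^2) - 1)."""
    x_hat = math.exp(y_hat + 0.5 * s_y ** 2)
    s_x = x_hat * math.sqrt(math.exp(s_y ** 2) - 1.0)
    return x_hat, s_x

x_hat, s_x = lognormal_flow_forecast(5.7638, 1.0271)
# x_hat exceeds plain exponentiation exp(5.7638) because of the s_y^2/2 term
```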
The method of moments gives directly the mean m_y and the standard deviation s_y of the logarithms from the mean x̄ and the standard deviation s_x of the x_t series through the relationships

x̄ = exp(m_y + s_y²/2)

and

s_x = x̄ [exp(s_y²) − 1]^{1/2} .

Thus, the flow forecasts x̂_t(L) and their standard deviations ŝ_x(L) are given by

x̂_t(L) = exp[ŷ_t(L) + (1/2) s_y²(L)]

and

ŝ_x(L) = x̂_t(L) [exp{s_y²(L)} − 1]^{1/2} .

However, it is important to recognize that by either method the ARIMA models yield standard errors of the forecasts which are independent of the season. This inability to take into account the seasonal variability of the standard deviations results in errors in the flow forecasts.

It is interesting to note that this defect can be eliminated by using a different model. The same series of monthly flow logarithms can be made stationary by means of a nonparametric periodic standardization. AR(2) models were fitted to the z_t series with t = (ν − 1)12 + τ, where the z's are the periodically standardized flow logarithms. All 16 flow series tested passed the goodness of fit test. For the Wabash River at Logansport, Indiana (Station 3-3290), the model parameters were

σ̂_ε² = 0.738 ;  φ̂₁ = 0.468 ± 0.043 ;  φ̂₂ = 0.082 ± 0.043 .

The forecasts of the flow logarithms obtained by the AR(2) model are shown in Fig. 6.20 and are very similar to those obtained with the ARIMA (2,0,0) × (0,1,1)₁₂ model. For the AR(2) model, the standard error of the forecasts of the standardized flow logarithms z_t is

s_z(L) = [1 + ψ₁² + ψ₂² + ... + ψ²_{L-1}]^{1/2} σ_ε .

The corresponding standard error of the ŷ_t(L) is

s_y(τ, L) = s_z(L) s_τ

where the s_τ are the standard deviations of the corresponding monthly flow logarithms. The s_τ thus introduce the seasonality effect in the s_y(τ, L). It is therefore possible to obtain correct flow forecasts, as shown in Fig. 6.21. Other limitations of, and comparisons between, the ARMA and ARIMA models are discussed in the next section.

Figure 6.20.
Forecasting function for logarithms of monthly flows of the Wabash River at Logansport, Indiana.

Figure 6.21. Forecasting function for monthly flows of the Wabash River at Logansport, Indiana.

6.3.7 COMPARISON AND LIMITATIONS OF ARMA AND ARIMA MODELS

A number of linear stochastic models are used for hydrologic time series. Some of these models are used for generating synthetic sequences of data, while others are used to forecast the data one or several time steps ahead. Both applications are of considerable importance to the design and operation of water resources systems. The historical streamflow record is just one realization of the stochastic process of streamflow time series, and it may not include all the critical periods of floods and droughts. It is therefore necessary to generate synthetic streamflow sequences which are sufficiently long that they can be expected to produce these critical conditions. These flow realizations, with the same statistical characteristics as those of the historical streamflow record, make it possible to analyze water resources systems under a great variety of conditions and to approach the optimal design more closely than would be possible with the historical flows only.

To accomplish these objectives, hydrologists construct stochastic models which preserve the mean, the variance, and the autocorrelation structure of the historical flow record. The justification for preserving the mean and the variance is that the range of the cumulative departures from the mean, which in turn specifies the design of reservoir capacities, can be estimated in terms of these two statistics. The justification for preserving the autocorrelation structure is to maintain the characteristics of low-flow and high-flow sequences (besides, the needed storage capacities are functions of the autocorrelation structure).
Traditionally, the time series models are fitted to the autocorrelation function or, equivalently, to its Fourier transform, the spectrum, of the stationary transformed historical hydrologic record. Obviously, several models with increasing degrees of complexity can satisfy these statistical requirements with increased accuracy. The cost of this increased accuracy is usually an increase in the number of model parameters; but the historical series are often fairly short, leading to uncertainties in the estimation of the parameters, as well as to doubts about the appropriateness of models requiring the estimation of a large number of parameters. The following question may then be raised: What are the trade-offs between the practicality and the complexity of competing time series models used in hydrologic analyses? The model choice may not be the same for different time resolutions, such as year, month, week, or day. It is apparent that in the analysis of annual series, the long-range dependence would be of major concern; for monthly series, the seasonal effects may be the dominant characteristic; whereas for daily series, the absence of flow or of rain in a
