Hamilton, J. (1994) - Time Series Analysis
0 in the solutions [1.2.14] and [1.2.15] for the second-order system. If, furthermore, all of the eigenvalues are less than 1 in absolute value, then the system is stable, and its dynamics are represented as a weighted average of decaying exponentials or decaying exponentials oscillating in sign. For example, consider the following second-order difference equation:

y_t = 0.6y_{t−1} + 0.2y_{t−2} + w_t.

From equations [1.2.14] and [1.2.15], the eigenvalues of this system are given by

λ₁ = [0.6 + √((0.6)² + 4(0.2))]/2 = 0.84

λ₂ = [0.6 − √((0.6)² + 4(0.2))]/2 = −0.24.

From [1.2.25], we have

c₁ = λ₁/(λ₁ − λ₂) = 0.778

c₂ = −λ₂/(λ₁ − λ₂) = 0.222.

The dynamic multiplier for this system,

∂y_{t+j}/∂w_t = c₁λ₁^j + c₂λ₂^j,
is plotted as a function of j in panel (a) of Figure 1.4. Note that as j becomes larger, the pattern is dominated by the larger eigenvalue (λ₁), approximating a simple geometric decay at rate λ₁.
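As a concrete check of this calculation, the following sketch (Python with NumPy; illustrative code, not from the text) verifies that the closed-form multiplier c₁λ₁^j + c₂λ₂^j agrees with the (1, 1) element of F^j for the companion matrix F of this system.

    import numpy as np

    # AR(2) example from the text: y_t = 0.6 y_{t-1} + 0.2 y_{t-2} + w_t.
    phi1, phi2 = 0.6, 0.2

    # Eigenvalues from [1.2.14]-[1.2.15]: roots of lambda^2 - phi1*lambda - phi2 = 0.
    disc = np.sqrt(phi1**2 + 4 * phi2)
    lam1, lam2 = (phi1 + disc) / 2, (phi1 - disc) / 2       # 0.84 and -0.24
    # Weights from [1.2.25].
    c1, c2 = lam1 / (lam1 - lam2), -lam2 / (lam1 - lam2)    # 0.778 and 0.222

    # The dynamic multiplier dy_{t+j}/dw_t should equal the (1,1) element of F^j.
    F = np.array([[phi1, phi2],
                  [1.0,  0.0]])
    for j in range(6):
        closed_form = c1 * lam1**j + c2 * lam2**j
        matrix_form = np.linalg.matrix_power(F, j)[0, 0]
        print(j, round(closed_form, 6), round(matrix_form, 6))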
If the eigenvalues (the solutions to [1.2.16]) are real but at least one is greater than unity in absolute value, the system is explosive. If λ₁ denotes the eigenvalue that is largest in absolute value, the dynamic multiplier is eventually dominated by an exponential function of that eigenvalue.
Other interesting possibilities arise if some of the eigenvalues are complex. Whenever this is the case, they appear as complex conjugates. For example, if p = 2 and φ₁² + 4φ₂ < 0, then the solutions λ₁ and λ₂ in [1.2.14] and [1.2.15] are complex conjugates. Suppose that λ₁ and λ₂ are complex conjugates, written as

λ₁ = a + bi  [1.2.32]

λ₂ = a − bi.  [1.2.33]

For the p = 2 case of [1.2.14] and [1.2.15], we would have

a = φ₁/2  [1.2.34]

b = (1/2)·√(−φ₁² − 4φ₂).  [1.2.35]
Our goal is to characterize the contribution to the dynamic multiplier c₁λ₁^j + c₂λ₂^j when λ₁ is a complex number as in [1.2.32]. Recall that to raise a complex number to a power, we rewrite [1.2.32] in polar coordinate form:

λ₁ = R·[cos(θ) + i·sin(θ)],  [1.2.36]

where θ and R are defined in terms of a and b by the following equations:

R = √(a² + b²)

cos(θ) = a/R

sin(θ) = b/R.

Note that R is equal to the modulus of the complex number λ₁.
The eigenvalue λ₁ in [1.2.36] can be written as

λ₁ = R·e^{iθ},

and so

λ₁^j = R^j·[e^{iθ}]^j = R^j·[cos(θj) + i·sin(θj)].  [1.2.37]

Analogously, if λ₂ is the complex conjugate of λ₁, then

λ₂ = R·[cos(θ) − i·sin(θ)],

which can be written

λ₂ = R·e^{−iθ}.

Thus

λ₂^j = R^j·[e^{−iθ}]^j = R^j·[cos(θj) − i·sin(θj)].  [1.2.38]
See equation [A.3.25] in the Mathematical Review (Appendix A) at the end of the book.
See equation [A.3.26].
[Figure 1.4. Dynamic multiplier for second-order difference equation for different values of φ₁ and φ₂ (plot of ∂y_{t+j}/∂w_t as a function of the lag j). Panel (a): φ₁ = 0.6, φ₂ = 0.2; panel (b): φ₁ = 0.5, φ₂ = −0.8.]
Substituting [1.2.37] and [1.2.38] into the expression for the dynamic multiplier gives the contribution of the complex conjugates:

c₁λ₁^j + c₂λ₂^j = c₁R^j·[cos(θj) + i·sin(θj)] + c₂R^j·[cos(θj) − i·sin(θj)]
= [c₁ + c₂]·R^j·cos(θj) + i·[c₁ − c₂]·R^j·sin(θj).  [1.2.39]

The appearance of the imaginary number i in [1.2.39] may seem a little troubling. After all, this calculation was intended to give the effect of a change in the real-valued variable w_t on the real-valued variable y_{t+j} as predicted by the real-valued system [1.2.1], and it would be odd indeed if the correct answer involved the imaginary number i! Fortunately, it turns out from [1.2.25] that if λ₁ and λ₂ are complex conjugates, then c₁ and c₂ are complex conjugates; that is, they can
be written as

c₁ = α + βi

c₂ = α − βi

for some real numbers α and β. Substituting these expressions into [1.2.39] yields

c₁λ₁^j + c₂λ₂^j = [(α + βi) + (α − βi)]·R^j·cos(θj) + i·[(α + βi) − (α − βi)]·R^j·sin(θj)
= [2α]·R^j·cos(θj) + i·[2βi]·R^j·sin(θj)
= 2α·R^j·cos(θj) − 2β·R^j·sin(θj),

which is strictly real.
Thus, when some of the eigenvalues are complex, they contribute terms proportional to R^j·cos(θj) and R^j·sin(θj) to the dynamic multiplier ∂y_{t+j}/∂w_t. Note that if R = 1, that is, if the complex eigenvalues have unit modulus, the multipliers are periodic sine and cosine functions of j. A given increase in w increases y_{t+j} for some ranges of j and decreases y_{t+j} over other ranges, with the impulse never dying out as j → ∞. If the complex eigenvalues are less than 1 in modulus (R < 1), the impulse again follows a sinusoidal pattern though its amplitude decays at the rate R^j. If the complex eigenvalues are greater than 1 in modulus (R > 1), the amplitude of the sinusoids explodes at the rate R^j.
For an example of dynamic behavior characterized by decaying sinusoids, consider the second-order system

y_t = 0.5y_{t−1} − 0.8y_{t−2} + w_t.

The eigenvalues for this system are given from [1.2.14] and [1.2.15] by

λ = [0.5 ± √((0.5)² + 4(−0.8))]/2 = 0.25 ± 0.86i,

with modulus

R = √((0.25)² + (0.86)²) = 0.9.
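The polar decomposition can be checked numerically. The sketch below (Python with NumPy; illustrative rather than from the text) computes R and θ for this system and confirms that the damped sinusoid 2αR^j·cos(θj) − 2βR^j·sin(θj) reproduces the (1, 1) element of F^j.

    import numpy as np

    # Second-order system with complex eigenvalues: y_t = 0.5 y_{t-1} - 0.8 y_{t-2} + w_t.
    phi1, phi2 = 0.5, -0.8
    lam1 = (phi1 + np.sqrt(complex(phi1**2 + 4 * phi2))) / 2   # 0.25 + 0.86i
    R, theta = abs(lam1), np.angle(lam1)                       # modulus ~0.894, frequency

    # c1 = alpha + beta*i from [1.2.25]; c2 is its conjugate.
    lam2 = lam1.conjugate()
    c1 = lam1 / (lam1 - lam2)
    alpha, beta = c1.real, c1.imag

    F = np.array([[phi1, phi2], [1.0, 0.0]])
    for j in range(6):
        sinusoid = 2 * alpha * R**j * np.cos(theta * j) - 2 * beta * R**j * np.sin(theta * j)
        print(j, round(sinusoid, 6), round(np.linalg.matrix_power(F, j)[0, 0], 6))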
Since R < 1, the impulse follows a pattern of damped oscillation, as in panel (b) of Figure 1.4. With real eigenvalues, the larger eigenvalue λ₁ will be greater than unity whenever

√(φ₁² + 4φ₂) > 2 − φ₁.

Recalling that λ₁ is real, the left side of this expression is a positive number and the inequality would be satisfied for any value of φ₁ > 2. If, on the other hand, φ₁ < 2, we can square both sides to conclude that λ₁ will exceed unity whenever

φ₁² + 4φ₂ > 4 − 4φ₁ + φ₁²

or

φ₂ > 1 − φ₁.

Thus, in the real region, λ₁ will be greater than unity either if φ₁ > 2 or if (φ₁, φ₂) lies northeast of the line φ₂ = 1 − φ₁ in Figure 1.5. Similarly, with real eigenvalues, the arithmetically smaller eigenvalue (λ₂) will be less than −1 whenever

[φ₁ − √(φ₁² + 4φ₂)]/2 < −1

or

−√(φ₁² + 4φ₂) < −2 − φ₁

or

√(φ₁² + 4φ₂) > 2 + φ₁.

Again, if φ₁ < −2, this must be satisfied, and in the case when φ₁ > −2, we can square both sides:

φ₁² + 4φ₂ > 4 + 4φ₁ + φ₁²

or

φ₂ > 1 + φ₁.

Thus, in the real region, λ₂ will be less than −1 if either φ₁ < −2 or (φ₁, φ₂) lies to the northwest of the line φ₂ = 1 + φ₁ in Figure 1.5.

The system is thus stable whenever (φ₁, φ₂) lies within the triangular region of Figure 1.5.
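A compact numerical illustration (Python with NumPy; the helper names are my own, not the text's): the two boundary lines just derived, together with the lower edge φ₂ > −1 that rules out explosive complex roots, agree with checking the eigenvalues of F directly.

    import numpy as np

    def ar2_is_stable(phi1: float, phi2: float) -> bool:
        # Stability via the companion matrix: both eigenvalues inside the unit circle.
        F = np.array([[phi1, phi2], [1.0, 0.0]])
        return bool(np.all(np.abs(np.linalg.eigvals(F)) < 1))

    def in_triangle(phi1: float, phi2: float) -> bool:
        # The triangular region of Figure 1.5: below the lines phi2 = 1 - phi1 and
        # phi2 = 1 + phi1, and above phi2 = -1 (the complex-root boundary).
        return (phi2 < 1 - phi1) and (phi2 < 1 + phi1) and (phi2 > -1)

    for pair in [(0.6, 0.2), (0.5, -0.8), (1.5, 0.3), (-1.2, -0.5), (0.9, 0.2)]:
        print(pair, ar2_is_stable(*pair), in_triangle(*pair))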
General Solution of a pth-Order Difference Equation with Repeated Eigenvalues

In the more general case of a difference equation for which F has repeated eigenvalues and s < p linearly independent eigenvectors, result [1.2.17] is generalized by using the Jordan decomposition,

F = MJM⁻¹,  [1.2.40]

where M is a (p × p) matrix and J takes the form

J = [ J₁  0  ⋯  0
      0  J₂  ⋯  0
      ⋮   ⋮      ⋮
      0   0  ⋯  J_s ]
with

J_i = [ λ_i  1   0  ⋯  0
        0   λ_i  1  ⋯  0
        ⋮    ⋮   ⋮      ⋮
        0    0   0  ⋯  1
        0    0   0  ⋯  λ_i ]

for λ_i an eigenvalue of F. If [1.2.17] is replaced by [1.2.40], then equation [1.2.19] generalizes to

F^j = MJ^jM⁻¹,  [1.2.41]
where

J^j = [ J₁^j   0    ⋯   0
        0    J₂^j   ⋯   0
        ⋮     ⋮          ⋮
        0     0    ⋯  J_s^j ].  [1.2.42]

Moreover, if J_i is of dimension (n_i × n_i), then

J_i^j = [ λ_i^j  C(j,1)λ_i^{j−1}  C(j,2)λ_i^{j−2}  ⋯  C(j, n_i−1)λ_i^{j−n_i+1}
          0      λ_i^j            C(j,1)λ_i^{j−1}  ⋯  C(j, n_i−2)λ_i^{j−n_i+2}
          ⋮      ⋮                ⋮                    ⋮
          0      0                0                ⋯  λ_i^j ],  [1.2.43]

where

C(j, n) = j!/[n!(j − n)!] for j ≥ n, and 0 otherwise.

Equation [1.2.43] may be verified by induction by multiplying [1.2.41] by [1.2.43] and noticing that C(j, n) + C(j, n − 1) = C(j + 1, n). (Expression [1.2.43] is taken from Chiang, 1980, p. 44.)
For example, consider again the second-order difference equation, this time with repeated roots. Then

J = [ λ  1        J^j = [ λ^j  jλ^{j−1}
      0  λ ],             0    λ^j ],

so that the dynamic multiplier takes the form

∂y_{t+j}/∂w_t = f₁₁^{(j)} = k₁λ^j + k₂jλ^{j−1}.
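For a quick check of the repeated-root multiplier (an illustrative Python sketch with parameter values of my own choosing): picking φ₁ = 2λ and φ₂ = −λ² makes λ a double eigenvalue, and matching the j = 0 and j = 1 multipliers pins down k₁ = 1 and k₂ = λ here, so that f₁₁^{(j)} = (j + 1)λ^j.

    import numpy as np

    # Repeated-eigenvalue AR(2): lambda^2 - phi1*lambda - phi2 = (lambda - lam)^2.
    lam = 0.7
    phi1, phi2 = 2 * lam, -lam**2
    F = np.array([[phi1, phi2], [1.0, 0.0]])

    # Jordan-form multiplier k1*lam^j + k2*j*lam^(j-1) with k1 = 1, k2 = lam here.
    for j in range(6):
        jordan_form = (j + 1) * lam**j
        print(j, round(jordan_form, 6),
              round(np.linalg.matrix_power(F, j)[0, 0], 6))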
Long-Run and Present-Value Calculations

If the eigenvalues are all less than 1 in modulus, then F^j in [1.2.9] goes to zero as j becomes large. If all values of w and y are taken to be bounded, we can think of a "solution" of y_t in terms of the infinite history of w,

y_t = w_t + ψ₁w_{t−1} + ψ₂w_{t−2} + ψ₃w_{t−3} + ⋯,  [1.2.44]

where ψ_j is given by the (1, 1) element of F^j and takes the particular form of [1.2.29] in the case of distinct eigenvalues.
It is also straightforward to calculate the effect on the present value of y of a transitory increase in w. This is simplest to find if we first consider the slightly more general problem of the hypothetical consequences of a change in any element of the vector v_t on any element of ξ_{t+j} in a general system of the form of [1.2.5]. The answer to this more general problem can be inferred immediately from [1.2.9]:

∂ξ_{t+j}/∂v_t′ = F^j.  [1.2.45]

The true dynamic multiplier of interest, ∂y_{t+j}/∂w_t, is just the (1, 1) element of the (p × p) matrix in [1.2.45]. The effect on the present value of ξ of a change in v is given by

∂(Σ_{j=0}^∞ β^jξ_{t+j})/∂v_t′ = Σ_{j=0}^∞ β^jF^j = (I_p − βF)⁻¹,  [1.2.46]

provided that the eigenvalues of F are all less than β⁻¹ in modulus. The effect of w on the present value of y is thus the (1, 1) element of the (p × p) matrix in [1.2.46]. This value is given by the following proposition.

Proposition 1.3: If the eigenvalues of the (p × p) matrix F defined in [1.2.3] are all less than β⁻¹ in modulus, then the matrix (I_p − βF)⁻¹ exists and the effect of w on the present value of y is given by its (1, 1) element:

∂(Σ_{j=0}^∞ β^jy_{t+j})/∂w_t = 1/(1 − φ₁β − φ₂β² − ⋯ − φ_{p−1}β^{p−1} − φ_pβ^p).

Note that Proposition 1.3 includes the earlier result for a first-order system (equation [1.1.14]) as a special case.
The cumulative effect of a one-time change in w_t on y_t, y_{t+1}, ... can be considered a special case of Proposition 1.3 with no discounting. Setting β = 1 in Proposition 1.3 shows that, provided the eigenvalues of F are all less than 1 in modulus, the cumulative effect of a one-time change in w on y is given by

Σ_{j=0}^∞ ∂y_{t+j}/∂w_t = 1/(1 − φ₁ − φ₂ − ⋯ − φ_p).  [1.2.47]

Notice again that [1.2.47] can alternatively be interpreted as giving the eventual long-run effect on y of a permanent change in w:

lim_{j→∞} [∂y_{t+j}/∂w_t + ∂y_{t+j}/∂w_{t+1} + ⋯ + ∂y_{t+j}/∂w_{t+j}] = 1/(1 − φ₁ − φ₂ − ⋯ − φ_p).
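Proposition 1.3 is easy to confirm numerically; the sketch below (Python with NumPy, my own illustrative parameter values) compares the (1, 1) element of (I_p − βF)⁻¹ with the closed-form expression.

    import numpy as np

    # Proposition 1.3: the (1,1) element of (I - beta*F)^{-1} equals
    # 1/(1 - phi1*beta - phi2*beta^2 - ... - phip*beta^p).
    phi = [0.6, 0.2, 0.1]      # an AR(3) whose eigenvalues lie inside the unit circle
    beta = 0.95
    p = len(phi)

    F = np.zeros((p, p))       # companion matrix as in [1.2.3]
    F[0, :] = phi
    F[1:, :-1] = np.eye(p - 1)

    lhs = np.linalg.inv(np.eye(p) - beta * F)[0, 0]
    rhs = 1 / (1 - sum(phi[j] * beta**(j + 1) for j in range(p)))
    print(lhs, rhs)            # both about 6.107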
APPENDIX 1.A. Proofs of Chapter 1 Propositions

Proof of Proposition 1.1. The eigenvalues of F satisfy

|F − λI_p| = 0.  [1.A.1]

For the matrix F defined in equation [1.2.3], this determinant would be

| φ₁−λ   φ₂    φ₃   ⋯  φ_{p−1}  φ_p
|  1     −λ    0    ⋯    0       0
|  0      1   −λ    ⋯    0       0
|  ⋮      ⋮    ⋮          ⋮       ⋮
|  0      0    0    ⋯    1      −λ | = 0.
To verify the particular values for the c_i claimed in equation [1.2.25], recall that

TT⁻¹ = I_p,  [1.A.9]

where T is given by [1.A.4] and [1.A.8]. Writing out the first column of the matrix system of equations [1.A.9] explicitly, we have

[ λ₁^{p−1}  λ₂^{p−1}  ⋯  λ_p^{p−1}     [ t¹¹        [ 1
  λ₁^{p−2}  λ₂^{p−2}  ⋯  λ_p^{p−2}       t²¹          0
  ⋮          ⋮             ⋮         ×    ⋮      =     ⋮
  λ₁         λ₂        ⋯  λ_p            ⋮            0
  1          1         ⋯  1 ]            t^{p1} ]     0 ].
This gives a system of p linear equations in the p unknowns (t¹¹, t²¹, ..., t^{p1}). Provided that the λ's are all distinct, the solution can be shown to be (see Lemma 2 of Chiang, 1980, p. 14):

t¹¹ = 1/[(λ₁ − λ₂)(λ₁ − λ₃)⋯(λ₁ − λ_p)]

t²¹ = 1/[(λ₂ − λ₁)(λ₂ − λ₃)⋯(λ₂ − λ_p)]

⋮

t^{p1} = 1/[(λ_p − λ₁)(λ_p − λ₂)⋯(λ_p − λ_{p−1})].

Substituting these values into [1.2.21] gives equation [1.2.25].
Proof of Proposition 1.3. The essential claim in this proposition is that if the eigenvalues of F are all less than β⁻¹ in modulus, then the inverse of (I_p − βF) exists. Suppose the inverse of (I_p − βF) did not exist. Then the determinant |I_p − βF| would have to be zero. But

|I_p − βF| = |−β·(F − β⁻¹I_p)| = (−β)^p·|F − β⁻¹I_p|,

so that |F − β⁻¹I_p| would have to be zero whenever the inverse of (I_p − βF) fails to exist. But this would mean that β⁻¹ is an eigenvalue of F, which is ruled out by the assumption that all eigenvalues of F are strictly less than β⁻¹ in modulus. Thus, the matrix (I_p − βF) must be nonsingular.
Since [I_p − βF]⁻¹ exists, it satisfies the equation

[I_p − βF]⁻¹·[I_p − βF] = I_p.  [1.A.10]

Let x_{ij} denote the row i, column j element of [I_p − βF]⁻¹, and write [1.A.10] as

[ x₁₁    x₁₂    ⋯  x₁p       [ 1−βφ₁  −βφ₂  ⋯  −βφ_{p−1}  −βφ_p       [ 1  0  ⋯  0
  x₂₁    x₂₂    ⋯  x₂p         −β      1    ⋯     0         0            0  1  ⋯  0
  ⋮       ⋮         ⋮      ×     ⋮      ⋮          ⋮         ⋮       =    ⋮  ⋮      ⋮
  x_{p1}  x_{p2} ⋯ x_{pp} ]     0      0    ⋯    −β         1 ]          0  0  ⋯  1 ].  [1.A.11]
The task is then to find the (1, 1) element of [I_p − βF]⁻¹, that is, to find the value of x₁₁. To do this we need only consider the first row of equations in [1.A.11]:

[ x₁₁  x₁₂  ⋯  x₁p ] × [ 1−βφ₁  −βφ₂  ⋯  −βφ_{p−1}  −βφ_p
                          −β      1    ⋯     0         0
                          ⋮       ⋮          ⋮         ⋮
                          0       0    ⋯    −β         1 ]  =  [ 1  0  ⋯  0 ].  [1.A.12]
Consider postmultiplying this system of equations by a matrix with 1s along the principal diagonal, β in the row p, column p − 1 position, and 0s elsewhere:

[ 1  0  ⋯  0  0
  0  1  ⋯  0  0
  ⋮  ⋮      ⋮  ⋮
  0  0  ⋯  1  0
  0  0  ⋯  β  1 ].

The effect of this operation is to multiply the pth column of a matrix by β and add the result to the (p − 1)th column:

[ x₁₁  x₁₂  ⋯  x₁p ] × [ 1−βφ₁  −βφ₂  ⋯  −β(φ_{p−1} + βφ_p)  −βφ_p
                          −β      1    ⋯      0                 0
                          ⋮       ⋮           ⋮                 ⋮
                          0       0    ⋯      0                 1 ]  =  [ 1  0  ⋯  0 ].
Next, multiply the (p − 1)th column by β and add the result to the (p − 2)th column. Proceeding in this fashion, each of the subdiagonal −β entries is eliminated in turn, and we arrive eventually at a system whose matrix has (1 − βφ₁ − β²φ₂ − ⋯ − β^pφ_p, 0, ..., 0)′ as its first column:

[ x₁₁  x₁₂  ⋯  x₁p ] × [ 1−βφ₁−β²φ₂−⋯−β^pφ_p  ⋯
                          0                     ⋯
                          ⋮
                          0                     ⋯ ]  =  [ 1  0  ⋯  0 ].  [1.A.13]

The first equation in [1.A.13] states that

x₁₁·(1 − βφ₁ − β²φ₂ − ⋯ − β^pφ_p) = 1

or

x₁₁ = 1/(1 − βφ₁ − β²φ₂ − ⋯ − β^pφ_p),

as claimed in Proposition 1.3.
Chapter 1 References

Chiang, Chin Long. 1980. An Introduction to Stochastic Processes and Their Applications. Huntington, N.Y.: Krieger.
Goldfeld, Stephen M. 1973. "The Demand for Money Revisited." Brookings Papers on Economic Activity 3:577–638.
Sargent, Thomas J. 1987. Macroeconomic Theory, 2d ed. Boston: Academic Press.
Lag Operators

2.1. Introduction

The previous chapter analyzed the dynamics of linear difference equations using matrix algebra. This chapter develops some of the same results using time series operators. We begin with some introductory remarks on some useful time series operators.
A time series is a collection of observations indexed by the date of each observation. Usually we have collected data beginning at some particular date (say, t = 1) and ending at another (say, t = T):

(y₁, y₂, ..., y_T).

We often imagine that we could have obtained earlier observations (y₀, y₋₁, y₋₂, ...) or later observations (y_{T+1}, y_{T+2}, ...) had the process been observed for more time. The observed sample (y₁, y₂, ..., y_T) could then be viewed as a finite segment of a doubly infinite sequence, denoted {y_t}_{t=−∞}^∞:

{y_t}_{t=−∞}^∞ = {..., y₋₁, y₀, y₁, y₂, ..., y_T, y_{T+1}, y_{T+2}, ...},

where (y₁, ..., y_T) is the observed sample.

Typically, a time series {y_t}_{t=−∞}^∞ is identified by describing the tth element. For example, a time trend is a series whose value at date t is simply the date of the observation:

y_t = t.

We could also consider a time series in which each element is equal to a constant c, regardless of the date of the observation t:

y_t = c.

Another important time series is a Gaussian white noise process, denoted

y_t = ε_t,

where {ε_t}_{t=−∞}^∞ is a sequence of independent random variables each of which has a N(0, σ²) distribution.
We are used to thinking of a function such as y = f(x) or y = g(x, w) as an operation that accepts as input a number (x) or group of numbers (x, w) and produces the output (y). A time series operator transforms one time series or group of time series into a new time series. It accepts as input a sequence such as {x_t}_{t=−∞}^∞ or a group of sequences such as ({x_t}_{t=−∞}^∞, {w_t}_{t=−∞}^∞) and has as output a new sequence {y_t}_{t=−∞}^∞. Again, the operator is summarized by describing the value of a typical element of {y_t}_{t=−∞}^∞ in terms of the corresponding elements of {x_t}_{t=−∞}^∞.

An example of a time series operator is the multiplication operator, represented as

y_t = βx_t.  [2.1.1]

Although it is written exactly the same way as simple scalar multiplication, equation [2.1.1] is actually shorthand for an infinite sequence of multiplications, one for each date t. The operator multiplies the value x takes on at any date t by some constant β to generate the value of y for that date.
Another example of a time series operator is the addition operator:

y_t = x_t + w_t.

Here the value of y at any date t is the sum of the values that x and w take on for that date.

Since the multiplication or addition operators amount to element-by-element multiplication or addition, they obey all the standard rules of algebra. For example, if we multiply each observation of {x_t}_{t=−∞}^∞ by β and each observation of {w_t}_{t=−∞}^∞ by β and add the results,

βx_t + βw_t,

the outcome is the same as if we had first added {x_t}_{t=−∞}^∞ to {w_t}_{t=−∞}^∞ and then multiplied each element of the resulting series by β:

β(x_t + w_t).
A highly useful operator is the lag operator. Suppose that we start with a sequence {x_t}_{t=−∞}^∞ and generate a new sequence {y_t}_{t=−∞}^∞, where the value of y for date t is equal to the value x took on at date t − 1:

y_t = x_{t−1}.  [2.1.2]

This is described as applying the lag operator to {x_t}_{t=−∞}^∞. The operation is represented by the symbol L:

Lx_t ≡ x_{t−1}.  [2.1.3]

Consider the result of applying the lag operator twice to a series:

L(Lx_t) = L(x_{t−1}) = x_{t−2}.

Such a double application of the lag operator is indicated by "L²":

L²x_t = x_{t−2}.

In general, for any integer k,

L^kx_t = x_{t−k}.  [2.1.4]
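A minimal implementation of the lag operator on a finite sample (an illustrative Python sketch; marking pre-sample dates as NaN is my own convention, since the text works with doubly infinite sequences):

    import numpy as np

    def lag(x: np.ndarray, k: int = 1) -> np.ndarray:
        # Apply L^k to a finite stretch of a series: (L^k x)_t = x_{t-k}.
        # Dates t-k that fall before the start of the sample are marked NaN.
        assert k >= 0
        if k == 0:
            return x.astype(float)   # L^0 is the identity operator
        y = np.full(x.shape, np.nan)
        y[k:] = x[:-k]
        return y

    x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
    print(lag(x, 1))        # [nan  1.  2.  3.  4.]
    print(lag(lag(x, 1)))   # L(Lx) = L^2 x: [nan nan  1.  2.  3.]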
Notice that if we first apply the multiplication operator and then the lag operator, as in

x_t → βx_t → βx_{t−1},

the result will be exactly the same as if we had applied the lag operator first and then the multiplication operator:

x_t → x_{t−1} → βx_{t−1}.

Thus the lag operator and multiplication operator are commutative:

L(βx_t) = βLx_t.

Similarly, if we first add two series and then apply the lag operator to the result,

(x_t, w_t) → x_t + w_t → x_{t−1} + w_{t−1},

the result is the same as if we had applied the lag operator before adding:

(x_t, w_t) → (x_{t−1}, w_{t−1}) → x_{t−1} + w_{t−1}.

Thus, the lag operator is distributive over the addition operator:

L(x_t + w_t) = Lx_t + Lw_t.

We thus see that the lag operator follows exactly the same algebraic rules as the multiplication operator. For this reason, it is tempting to use the expression "multiply y_t by L" rather than "operate on {y_t}_{t=−∞}^∞ by L." Although the latter expression is technically more correct, this text will often use the former shorthand expression to facilitate the exposition.
Faced with a time series defined in terms of compound operators, we are free to use the standard commutative, associative, and distributive algebraic laws for multiplication and addition to express the compound operator in an alternative form. For example, the process defined by

y_t = (a + bL)Lx_t

is exactly the same as

y_t = (aL + bL²)x_t = ax_{t−1} + bx_{t−2}.

To take another example,

(1 − λ₁L)(1 − λ₂L)x_t = (1 − λ₁L − λ₂L + λ₁λ₂L²)x_t
= (1 − [λ₁ + λ₂]L + λ₁λ₂L²)x_t  [2.1.5]
= x_t − (λ₁ + λ₂)x_{t−1} + (λ₁λ₂)x_{t−2}.

An expression such as (aL + bL²) is referred to as a polynomial in the lag operator. It is algebraically similar to a simple polynomial (az + bz²) where z is a scalar. The difference is that the simple polynomial (az + bz²) refers to a particular number, whereas a polynomial in the lag operator (aL + bL²) refers to an operator that would be applied to one time series {x_t}_{t=−∞}^∞ to produce a new time series {y_t}_{t=−∞}^∞.

Notice that if {x_t}_{t=−∞}^∞ is just a series of constants,

x_t = c for all t,

then the lag operator applied to x_t produces the same series of constants:

Lx_t = x_{t−1} = c.

Thus, for example,

(αL + βL² + γL³)c = (α + β + γ)c.  [2.1.6]
2.2. First-Order Difference Equations

Let us now return to the first-order difference equation analyzed in Section 1.1:

y_t = φy_{t−1} + w_t.  [2.2.1]

Equation [2.2.1] can be rewritten using the lag operator [2.1.3] as

y_t = φLy_t + w_t.

This equation, in turn, can be rearranged using standard algebra,

y_t − φLy_t = w_t,

or

(1 − φL)y_t = w_t.  [2.2.2]
Next consider "multiplying" both sides of [2.2.2] by the following operator:

(1 + φL + φ²L² + φ³L³ + ⋯ + φ^tL^t).  [2.2.3]

The result would be

(1 + φL + φ²L² + φ³L³ + ⋯ + φ^tL^t)(1 − φL)y_t
= (1 + φL + φ²L² + φ³L³ + ⋯ + φ^tL^t)w_t.  [2.2.4]

Expanding out the compound operator on the left side of [2.2.4] results in

(1 + φL + φ²L² + φ³L³ + ⋯ + φ^tL^t)(1 − φL)
= (1 + φL + φ²L² + φ³L³ + ⋯ + φ^tL^t)
− (φL + φ²L² + φ³L³ + ⋯ + φ^tL^t + φ^{t+1}L^{t+1})  [2.2.5]
= 1 − φ^{t+1}L^{t+1}.

Substituting [2.2.5] into [2.2.4] yields

(1 − φ^{t+1}L^{t+1})y_t = (1 + φL + φ²L² + φ³L³ + ⋯ + φ^tL^t)w_t.  [2.2.6]

Writing [2.2.6] out explicitly using [2.1.4] produces

y_t − φ^{t+1}y_{−1} = w_t + φw_{t−1} + φ²w_{t−2} + φ³w_{t−3} + ⋯ + φ^tw₀

or

y_t = φ^{t+1}y_{−1} + w_t + φw_{t−1} + φ²w_{t−2} + φ³w_{t−3} + ⋯ + φ^tw₀.  [2.2.7]

Notice that equation [2.2.7] is identical to equation [1.1.7]. Applying the operator [2.2.3] is performing exactly the same set of recursive substitutions that were employed in the previous chapter to arrive at [1.1.7].
It is interesting to reflect on the nature of the operator [2.2.3] as t becomes large. We saw in [2.2.5] that

(1 + φL + φ²L² + φ³L³ + ⋯ + φ^tL^t)(1 − φL)y_t = y_t − φ^{t+1}y_{−1}.

That is, (1 + φL + φ²L² + φ³L³ + ⋯ + φ^tL^t)(1 − φL)y_t differs from y_t by the term φ^{t+1}y_{−1}. If |φ| < 1 and y_{−1} is a finite number, this residual φ^{t+1}y_{−1} will become negligible as t becomes large:

(1 + φL + φ²L² + φ³L³ + ⋯ + φ^tL^t)(1 − φL)y_t ≅ y_t for large t.

A sequence {y_t}_{t=−∞}^∞ is said to be bounded if there exists a finite number ȳ such that

|y_t| < ȳ for all t.

Thus, when |φ| < 1 and when we are considering applying an operator to a bounded sequence, we can think of

(1 + φL + φ²L² + φ³L³ + ⋯ + φ^jL^j)

as approximating the inverse of the operator (1 − φL), with this approximation made arbitrarily accurate by choosing j sufficiently large:

(1 − φL)⁻¹ = lim_{j→∞} (1 + φL + φ²L² + φ³L³ + ⋯ + φ^jL^j).  [2.2.8]
This operator (1 − φL)⁻¹ has the property

(1 − φL)⁻¹(1 − φL) = 1,

where "1" denotes the identity operator:

1·y_t = y_t.

The following chapter discusses stochastic sequences rather than the deterministic sequences studied here. There we will speak of mean square convergence and stationary stochastic processes in place of limits of bounded deterministic sequences, though the practical meaning of [2.2.8] will be little changed.
Provided that |φ| < 1 and we restrict ourselves to bounded sequences or stationary stochastic processes, both sides of [2.2.2] can be "divided" by (1 − φL) to obtain

y_t = (1 − φL)⁻¹w_t
= w_t + φw_{t−1} + φ²w_{t−2} + φ³w_{t−3} + ⋯.  [2.2.9]
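The approximation in [2.2.8] is easy to see numerically. The sketch below (Python with NumPy; illustrative, with truncation handling of my own) builds y_t by the truncated expansion and verifies that (1 − φL)y_t returns w_t up to an error of order φ^{j+1}.

    import numpy as np

    rng = np.random.default_rng(0)
    phi, n = 0.7, 500
    w = rng.standard_normal(n)

    # Apply the truncated operator 1 + phi*L + ... + phi^j*L^j to w
    # (treating pre-sample w's as zero).
    j_max = 60                       # phi^61 is about 3e-10, negligible
    y = np.zeros(n)
    for k in range(j_max + 1):
        y[k:] += phi**k * w[:n - k]

    # Then (1 - phi*L)y should recover w up to the truncation error.
    resid = y[1:] - phi * y[:-1] - w[1:]
    print(np.max(np.abs(resid)))     # tiny, on the order of phi**(j_max + 1)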
It should be emphasized that if we were not restricted to considering bounded sequences or stationary stochastic processes {w_t}_{t=−∞}^∞ and {y_t}_{t=−∞}^∞, then expression [2.2.9] would not be a necessary implication of [2.2.1]. Equation [2.2.9] is consistent with [2.2.1], but adding a term a₀φ^t,

y_t = a₀φ^t + w_t + φw_{t−1} + φ²w_{t−2} + φ³w_{t−3} + ⋯,  [2.2.10]

produces another series consistent with [2.2.1] for any constant a₀. To verify that [2.2.10] is consistent with [2.2.1], multiply [2.2.10] by (1 − φL):

(1 − φL)y_t = (1 − φL)a₀φ^t + (1 − φL)(1 − φL)⁻¹w_t
= a₀φ^t − φa₀φ^{t−1} + w_t = w_t,

so that [2.2.10] is consistent with [2.2.1] for any constant a₀.

Although any process of the form of [2.2.10] is consistent with the difference equation [2.2.1], notice that since |φ| < 1,

|a₀φ^t| → ∞ as t → −∞.

Thus, even if {w_t}_{t=−∞}^∞ is a bounded sequence, the solution {y_t}_{t=−∞}^∞ given by [2.2.10] is unbounded unless a₀ = 0 in [2.2.10]. Thus, there was a particular reason for defining the operator [2.2.8] to be the inverse of (1 − φL): namely, (1 − φL)⁻¹ defined in [2.2.8] is the unique operator satisfying

(1 − φL)⁻¹(1 − φL) = 1

that maps a bounded sequence {w_t}_{t=−∞}^∞ into a bounded sequence {y_t}_{t=−∞}^∞.

The nature of (1 − φL)⁻¹ when |φ| ≥ 1 will be discussed in Section 2.5.
2.3. Second-Order Difference Equations

Consider next a second-order difference equation:

y_t = φ₁y_{t−1} + φ₂y_{t−2} + w_t.  [2.3.1]

Rewriting this in lag operator form produces

(1 − φ₁L − φ₂L²)y_t = w_t.  [2.3.2]

The left side of [2.3.2] contains a second-order polynomial in the lag operator L. Suppose we factor this polynomial, that is, find numbers λ₁ and λ₂ such that

(1 − φ₁L − φ₂L²) = (1 − λ₁L)(1 − λ₂L) = (1 − [λ₁ + λ₂]L + λ₁λ₂L²).  [2.3.3]

This is just the operation in [2.1.5] in reverse. Given values for φ₁ and φ₂, we seek numbers λ₁ and λ₂ with the properties that

λ₁ + λ₂ = φ₁

and

λ₁λ₂ = −φ₂.

For example, if φ₁ = 0.6 and φ₂ = −0.08, then we should choose λ₁ = 0.4 and λ₂ = 0.2:

(1 − 0.6L + 0.08L²) = (1 − 0.4L)(1 − 0.2L).  [2.3.4]
It is easy enough to see that these values of λ₁ and λ₂ work for this numerical example, but how are λ₁ and λ₂ found in general? The task is to choose λ₁ and λ₂ so as to make sure that the operator on the right side of [2.3.3] is identical to that on the left side. This will be true whenever the following represent the identical functions of z:

(1 − φ₁z − φ₂z²) = (1 − λ₁z)(1 − λ₂z).  [2.3.5]

This equation simply replaces the lag operator L in [2.3.3] with a scalar z. What is the point of doing so? With [2.3.5], we can now ask, For what values of z is the right side of [2.3.5] equal to zero? The answer is, if either z = λ₁⁻¹ or z = λ₂⁻¹, then the right side of [2.3.5] would be zero. It would not have made sense to ask an analogous question of [2.3.3]: L denotes a particular operator, not a number, and L = λ₁⁻¹ is not a sensible statement.

Why should we care that the right side of [2.3.5] is zero if z = λ₁⁻¹ or if z = λ₂⁻¹? Recall that the goal was to choose λ₁ and λ₂ so that the two sides of [2.3.5] represented the identical polynomial in z. This means that for any particular value z the two functions must produce the same number. If we find a value of z that sets the right side to zero, that same value of z must set the left side to zero as well. But the values of z that set the left side to zero,

(1 − φ₁z − φ₂z²) = 0,  [2.3.6]

are given by the quadratic formula:

z₁ = [φ₁ − √(φ₁² + 4φ₂)]/(−2φ₂)  [2.3.7]

z₂ = [φ₁ + √(φ₁² + 4φ₂)]/(−2φ₂).  [2.3.8]

Setting z equal to z₁ or z₂ makes the left side of [2.3.5] zero, while z = λ₁⁻¹ or z = λ₂⁻¹ sets the right side of [2.3.5] to zero. Thus

λ₁ = z₁⁻¹  [2.3.9]

λ₂ = z₂⁻¹.  [2.3.10]
Returning to the numerical example [2.3.4] in which φ₁ = 0.6 and φ₂ = −0.08, we would calculate

z₁ = [0.6 − √((0.6)² + 4(−0.08))]/(2(0.08)) = 2.5

z₂ = [0.6 + √((0.6)² + 4(−0.08))]/(2(0.08)) = 5,

and so

λ₁ = 1/(2.5) = 0.4

λ₂ = 1/(5.0) = 0.2,

as was found in [2.3.4].
When φ₁² + 4φ₂ < 0, the values z₁ and z₂ are complex conjugates, and their reciprocals λ₁ and λ₂ can be found by first writing the complex number in polar coordinate form. Specifically, write

z₁ = a + bi = R·[cos(θ) + i·sin(θ)] = R·e^{iθ};

then

z₁⁻¹ = R⁻¹·e^{−iθ} = R⁻¹·[cos(θ) − i·sin(θ)].
Actually, there is a more direct method for calculating the values of λ₁ and λ₂ from φ₁ and φ₂. Divide both sides of [2.3.5] by z²:

(z⁻² − φ₁z⁻¹ − φ₂) = (z⁻¹ − λ₁)(z⁻¹ − λ₂)  [2.3.11]

and define λ to be the variable z⁻¹:

λ ≡ z⁻¹.  [2.3.12]

Substituting [2.3.12] into [2.3.11] produces

(λ² − φ₁λ − φ₂) = (λ − λ₁)(λ − λ₂).  [2.3.13]

Again, [2.3.13] must hold for all values of λ in order for the two sides of [2.3.5] to represent the same polynomial. The values of λ that set the right side to zero are λ = λ₁ and λ = λ₂. These same values must set the left side of [2.3.13] to zero as well:

(λ² − φ₁λ − φ₂) = 0.  [2.3.14]

Thus, to calculate the values of λ₁ and λ₂ that factor the polynomial in [2.3.3], we can find the roots of [2.3.14] directly from the quadratic formula:

λ₁ = [φ₁ + √(φ₁² + 4φ₂)]/2  [2.3.15]

λ₂ = [φ₁ − √(φ₁² + 4φ₂)]/2.  [2.3.16]

For the example of [2.3.4], we would thus calculate

λ₁ = [0.6 + √((0.6)² + 4(−0.08))]/2 = 0.4

λ₂ = [0.6 − √((0.6)² + 4(−0.08))]/2 = 0.2.
It is instructive to compare these results with those in Chapter 1. There the dynamics of the second-order difference equation [2.3.1] were summarized by calculating the eigenvalues of the matrix F given by

F = [ φ₁  φ₂
      1   0 ].  [2.3.17]

The eigenvalues of F were seen to be the two values of λ that satisfy equation [1.2.13]:

(λ² − φ₁λ − φ₂) = 0.

But this is the same calculation as in [2.3.14]. This finding is summarized in the following proposition.

Proposition 2.1: Factoring the polynomial (1 − φ₁L − φ₂L²) as

(1 − φ₁L − φ₂L²) = (1 − λ₁L)(1 − λ₂L)  [2.3.18]

is the same calculation as finding the eigenvalues of the matrix F in [2.3.17]. The eigenvalues λ₁ and λ₂ of F are the same as the parameters λ₁ and λ₂ in [2.3.18] and are given by equations [2.3.15] and [2.3.16].
The correspondence between calculating the eigenvalues of a matrix and factoring a polynomial in the lag operator is very instructive. However, it introduces one minor source of possible semantic confusion about which we have to be careful. Recall from Chapter 1 that the system [2.3.1] is stable if both λ₁ and λ₂ are less than 1 in modulus and explosive if either λ₁ or λ₂ is greater than 1 in modulus. Sometimes this is described as the requirement that the roots of

(λ² − φ₁λ − φ₂) = 0  [2.3.19]

lie inside the unit circle. The possible confusion is that it is often convenient to work directly with the polynomial in the form in which it appears in [2.3.2],

(1 − φ₁z − φ₂z²) = 0,  [2.3.20]

whose roots, we have seen, are the reciprocals of those of [2.3.19]. Thus, we could say with equal accuracy that "the difference equation [2.3.1] is stable whenever the roots of [2.3.19] lie inside the unit circle" or that "the difference equation [2.3.1] is stable whenever the roots of [2.3.20] lie outside the unit circle." The two statements mean exactly the same thing. Some scholars refer simply to the "roots of the difference equation [2.3.1]," though this raises the possibility of confusion between [2.3.19] and [2.3.20]. This book will follow the convention of using the term "eigenvalues" to refer to the roots of [2.3.19]. Wherever the term "roots" is used, we will indicate explicitly the equation whose roots are being described.
From here on in this section, it is assumed that the second-order difference equation is stable, with the eigenvalues λ₁ and λ₂ distinct and both inside the unit circle. Where this is the case, the inverses

(1 − λ₁L)⁻¹ = 1 + λ₁L + λ₁²L² + λ₁³L³ + ⋯

(1 − λ₂L)⁻¹ = 1 + λ₂L + λ₂²L² + λ₂³L³ + ⋯

are well defined for bounded sequences. Write [2.3.2] in factored form:

(1 − λ₁L)(1 − λ₂L)y_t = w_t

and operate on both sides by (1 − λ₁L)⁻¹(1 − λ₂L)⁻¹:

y_t = (1 − λ₁L)⁻¹(1 − λ₂L)⁻¹w_t.  [2.3.21]

Following Sargent (1987, p. 184), when λ₁ ≠ λ₂, we can use the following operator:

(1 − λ₁L)⁻¹(1 − λ₂L)⁻¹ = [1/(λ₁ − λ₂)]·{λ₁/(1 − λ₁L) − λ₂/(1 − λ₂L)}.  [2.3.22]

Notice that this is simply another way of writing the operator in [2.3.21]:

[1/(λ₁ − λ₂)]·{λ₁/(1 − λ₁L) − λ₂/(1 − λ₂L)}
= [1/(λ₁ − λ₂)]·{[λ₁(1 − λ₂L) − λ₂(1 − λ₁L)]/[(1 − λ₁L)(1 − λ₂L)]}
= 1/[(1 − λ₁L)(1 − λ₂L)].

Thus, [2.3.21] can be written as

y_t = [1/(λ₁ − λ₂)]·{λ₁/(1 − λ₁L) − λ₂/(1 − λ₂L)}w_t
= {c₁(1 + λ₁L + λ₁²L² + ⋯) + c₂(1 + λ₂L + λ₂²L² + ⋯)}w_t
= (c₁ + c₂)w_t + (c₁λ₁ + c₂λ₂)w_{t−1} + (c₁λ₁² + c₂λ₂²)w_{t−2}
+ (c₁λ₁³ + c₂λ₂³)w_{t−3} + ⋯,  [2.3.23]

where

c₁ = λ₁/(λ₁ − λ₂)  [2.3.24]

c₂ = −λ₂/(λ₁ − λ₂).  [2.3.25]

From [2.3.23], the dynamic multiplier can be read off directly as

∂y_{t+j}/∂w_t = c₁λ₁^j + c₂λ₂^j,

the same result arrived at in equations [1.2.24] and [1.2.25].
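Proposition 2.1 can be verified numerically; the sketch below (Python with NumPy, illustrative) confirms that the reciprocals of the roots of 1 − φ₁z − φ₂z² coincide with the eigenvalues of the companion matrix F.

    import numpy as np

    phi1, phi2 = 0.6, -0.08

    # Roots of 1 - phi1*z - phi2*z^2 = 0 (coefficients listed highest degree first).
    z_roots = np.roots([-phi2, -phi1, 1.0])     # 5.0 and 2.5
    lambdas = np.sort(1 / z_roots)              # reciprocals: 0.2 and 0.4

    eigvals = np.sort(np.linalg.eigvals(np.array([[phi1, phi2], [1.0, 0.0]])))
    print(lambdas, eigvals)                     # identical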
2.4. pth-Order Difference Equations

These techniques generalize in a straightforward way to a pth-order difference equation of the form

y_t = φ₁y_{t−1} + φ₂y_{t−2} + ⋯ + φ_py_{t−p} + w_t.  [2.4.1]

Write [2.4.1] in terms of lag operators as

(1 − φ₁L − φ₂L² − ⋯ − φ_pL^p)y_t = w_t.  [2.4.2]

Factor the operator on the left side of [2.4.2] as

(1 − φ₁L − φ₂L² − ⋯ − φ_pL^p) = (1 − λ₁L)(1 − λ₂L)⋯(1 − λ_pL).  [2.4.3]

This is the same as finding the values of (λ₁, λ₂, ..., λ_p) such that the following polynomials are the same for all z:

(1 − φ₁z − φ₂z² − ⋯ − φ_pz^p) = (1 − λ₁z)(1 − λ₂z)⋯(1 − λ_pz).  [2.4.4]

As in the second-order system, we multiply both sides of this equation by z^{−p} and define λ ≡ z⁻¹:

(λ^p − φ₁λ^{p−1} − φ₂λ^{p−2} − ⋯ − φ_{p−1}λ − φ_p) = (λ − λ₁)(λ − λ₂)⋯(λ − λ_p).

Clearly, setting λ = λ_i for i = 1, 2, ..., or p causes the right side of [2.4.4] to equal zero. Thus the values (λ₁, λ₂, ..., λ_p) must be the numbers that set the left side of expression [2.4.4] to zero as well:

λ^p − φ₁λ^{p−1} − φ₂λ^{p−2} − ⋯ − φ_{p−1}λ − φ_p = 0.  [2.4.5]

This expression again is identical to that given in Proposition 1.1, which characterized the eigenvalues (λ₁, λ₂, ..., λ_p) of the matrix F defined in equation [1.2.3]. Thus, Proposition 2.1 readily generalizes.
Proposition 2.2: Factoring a pth-order polynomial in the lag operator,

(1 − φ₁L − φ₂L² − ⋯ − φ_pL^p) = (1 − λ₁L)(1 − λ₂L)⋯(1 − λ_pL),

is the same calculation as finding the eigenvalues of the matrix F defined in [1.2.3]. The eigenvalues (λ₁, λ₂, ..., λ_p) of F are the same as the parameters (λ₁, λ₂, ..., λ_p) in [2.4.3] and are given by the solutions to equation [2.4.5].

The difference equation [2.4.1] is stable if the eigenvalues (the roots of [2.4.5]) lie inside the unit circle, or equivalently if the roots of

1 − φ₁z − φ₂z² − ⋯ − φ_pz^p = 0

lie outside the unit circle.
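This gives a practical stability check for any AR(p): compute the roots of the z-polynomial and ask whether they all lie outside the unit circle. A short sketch (Python with NumPy; the helper name is my own):

    import numpy as np

    def ar_is_stable(phi):
        # phi = [phi1, ..., phip] for y_t = phi1*y_{t-1} + ... + phip*y_{t-p} + w_t.
        # Roots of 1 - phi1*z - ... - phip*z^p must lie OUTSIDE the unit circle.
        coeffs = [-c for c in reversed(phi)] + [1.0]   # highest degree first
        return bool(np.all(np.abs(np.roots(coeffs)) > 1))

    print(ar_is_stable([0.6, 0.2]))    # True: eigenvalues 0.84 and -0.24
    print(ar_is_stable([0.5, -0.8]))   # True: complex eigenvalues, modulus 0.9
    print(ar_is_stable([1.1]))         # False: explosive AR(1)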
Assuming that the eigenvalues are inside the unit circle and that we are restricting ourselves to considering bounded sequences, the inverses (1 − λ₁L)⁻¹, (1 − λ₂L)⁻¹, ..., (1 − λ_pL)⁻¹ all exist, permitting the difference equation

(1 − λ₁L)(1 − λ₂L)⋯(1 − λ_pL)y_t = w_t  [2.4.6]

to be written as

y_t = (1 − λ₁L)⁻¹(1 − λ₂L)⁻¹⋯(1 − λ_pL)⁻¹w_t.  [2.4.7]

Provided further that the eigenvalues (λ₁, λ₂, ..., λ_p) are all distinct, the polynomial associated with the operator on the right side of [2.4.7] can again be expanded with partial fractions:

1/[(1 − λ₁z)(1 − λ₂z)⋯(1 − λ_pz)]
= c₁/(1 − λ₁z) + c₂/(1 − λ₂z) + ⋯ + c_p/(1 − λ_pz).  [2.4.8]

Following Sargent (1987, pp. 192–93), the values of (c₁, c₂, ..., c_p) that make [2.4.8] true can be found by multiplying both sides by (1 − λ₁z)(1 − λ₂z)⋯(1 − λ_pz):

1 = c₁(1 − λ₂z)(1 − λ₃z)⋯(1 − λ_pz)
+ c₂(1 − λ₁z)(1 − λ₃z)⋯(1 − λ_pz) + ⋯  [2.4.9]
+ c_p(1 − λ₁z)(1 − λ₂z)⋯(1 − λ_{p−1}z).

Equation [2.4.9] has to hold for all values of z. Since the right side is a (p − 1)th-order polynomial, if (c₁, c₂, ..., c_p) are chosen so that [2.4.9] holds for p particular
distinct values of z, then [2.4.9] must hold for all z. To ensure that [2.4.9] holds at z = λ₁⁻¹ requires that

1 = c₁(1 − λ₂λ₁⁻¹)(1 − λ₃λ₁⁻¹)⋯(1 − λ_pλ₁⁻¹)

or

c₁ = λ₁^{p−1}/[(λ₁ − λ₂)(λ₁ − λ₃)⋯(λ₁ − λ_p)].  [2.4.10]

For [2.4.9] to hold for z = λ₂⁻¹, ..., λ_p⁻¹ requires

c₂ = λ₂^{p−1}/[(λ₂ − λ₁)(λ₂ − λ₃)⋯(λ₂ − λ_p)]  [2.4.11]

⋮

c_p = λ_p^{p−1}/[(λ_p − λ₁)(λ_p − λ₂)⋯(λ_p − λ_{p−1})].  [2.4.12]

Note again that these are identical to expression [1.2.25] in Chapter 1. Recall from the discussion there that c₁ + c₂ + ⋯ + c_p = 1.
To conclude, [2.4.7] can be written as

y_t = [c₁/(1 − λ₁L) + c₂/(1 − λ₂L) + ⋯ + c_p/(1 − λ_pL)]w_t
= c₁(1 + λ₁L + λ₁²L² + λ₁³L³ + ⋯)w_t + c₂(1 + λ₂L + λ₂²L² + λ₂³L³ + ⋯)w_t
+ ⋯ + c_p(1 + λ_pL + λ_p²L² + λ_p³L³ + ⋯)w_t

or

y_t = (c₁ + c₂ + ⋯ + c_p)w_t + (c₁λ₁ + c₂λ₂ + ⋯ + c_pλ_p)w_{t−1}
+ (c₁λ₁² + c₂λ₂² + ⋯ + c_pλ_p²)w_{t−2}  [2.4.13]
+ (c₁λ₁³ + c₂λ₂³ + ⋯ + c_pλ_p³)w_{t−3} + ⋯,

where (c₁, c₂, ..., c_p) are given by equations [2.4.10] through [2.4.12]. Again, the dynamic multiplier can be read directly off [2.4.13]:

∂y_{t+j}/∂w_t = c₁λ₁^j + c₂λ₂^j + ⋯ + c_pλ_p^j,  [2.4.14]

reproducing the result from Chapter 1.
There is a very convenient way to calculate the effect of w on the present value of y using the lag operator representation. Write [2.4.13] as

y_t = ψ₀w_t + ψ₁w_{t−1} + ψ₂w_{t−2} + ψ₃w_{t−3} + ⋯,  [2.4.15]

where

ψ_j = c₁λ₁^j + c₂λ₂^j + ⋯ + c_pλ_p^j.  [2.4.16]

Next rewrite [2.4.15] in lag operator notation as

y_t = ψ(L)w_t,  [2.4.17]

where ψ(L) denotes an infinite-order polynomial in the lag operator:

ψ(L) = ψ₀ + ψ₁L + ψ₂L² + ψ₃L³ + ⋯.

Notice that ψ_j is the dynamic multiplier [2.4.14]. The effect of w_t on the present value of y is given by

∂(Σ_{j=0}^∞ β^jy_{t+j})/∂w_t = Σ_{j=0}^∞ β^jψ_j.  [2.4.18]
Thinking of ψ(z) as a polynomial in a real number z,

ψ(z) = ψ₀ + ψ₁z + ψ₂z² + ψ₃z³ + ⋯,

it appears that the multiplier [2.4.18] is simply this polynomial evaluated at z = β:

∂(Σ_{j=0}^∞ β^jy_{t+j})/∂w_t = ψ(β) = ψ₀ + ψ₁β + ψ₂β² + ψ₃β³ + ⋯.  [2.4.19]

But comparing [2.4.17] with [2.4.7], it is apparent that

ψ(L) = [(1 − λ₁L)(1 − λ₂L)⋯(1 − λ_pL)]⁻¹,

so, from [2.4.3],

ψ(L) = (1 − φ₁L − φ₂L² − ⋯ − φ_pL^p)⁻¹.

We conclude that

ψ(z) = (1 − φ₁z − φ₂z² − ⋯ − φ_pz^p)⁻¹

for any value of z, and in particular,

ψ(β) = (1 − φ₁β − φ₂β² − ⋯ − φ_pβ^p)⁻¹.  [2.4.20]

Substituting [2.4.20] into [2.4.19] reveals that

∂(Σ_{j=0}^∞ β^jy_{t+j})/∂w_t = 1/(1 − φ₁β − φ₂β² − ⋯ − φ_pβ^p),  [2.4.21]

reproducing the claim in Proposition 1.3. Again, the long-run multiplier obtains as the special case of [2.4.21] with β = 1:

lim_{j→∞} [∂y_{t+j}/∂w_t + ∂y_{t+j}/∂w_{t+1} + ⋯ + ∂y_{t+j}/∂w_{t+j}]
= 1/(1 − φ₁ − φ₂ − ⋯ − φ_p).
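The identity ψ(β) = 1/(1 − φ₁β − ⋯ − φ_pβ^p) can be confirmed by brute force; the sketch below (Python, illustrative parameter values of my own) sums β^jψ_j using the recursion ψ_j = φ₁ψ_{j−1} + φ₂ψ_{j−2}.

    # Present-value multiplier [2.4.21] two ways for an AR(2).
    phi1, phi2, beta = 0.6, 0.2, 0.95

    # Closed form: psi(beta) = 1/(1 - phi1*beta - phi2*beta^2).
    direct = 1 / (1 - phi1 * beta - phi2 * beta**2)

    # Brute force: psi_0 = 1, psi_1 = phi1, psi_j = phi1*psi_{j-1} + phi2*psi_{j-2}.
    psi = [1.0, phi1]
    for _ in range(500):
        psi.append(phi1 * psi[-1] + phi2 * psi[-2])
    brute = sum(beta**j * p for j, p in enumerate(psi))

    print(direct, brute)    # both about 4.008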
2.5. Initial Conditions and Unbounded Sequences

Section 1.2 analyzed the following problem. Given a pth-order difference equation

y_t = φ₁y_{t−1} + φ₂y_{t−2} + ⋯ + φ_py_{t−p} + w_t,  [2.5.1]

p initial values of y,

y₋₁, y₋₂, ..., y₋_p,  [2.5.2]

and a sequence of values for the input variable w,

{w₀, w₁, ..., w_t},  [2.5.3]

we sought to calculate the sequence of values for the output variable y:

{y₀, y₁, ..., y_t}.

Certainly there are systems where the question is posed in precisely this form. We may know the equation of motion for the system [2.5.1] and its current state [2.5.2] and wish to characterize the values that {y₀, y₁, ..., y_t} might take on for different specifications of {w₀, w₁, ..., w_t}.
However, there are many examples in economics and finance in which a theory specifies just the equation of motion [2.5.1] and a sequence of driving variables [2.5.3]. Clearly, these two pieces of information alone are insufficient to determine the sequence {y₀, y₁, ..., y_t}, and some additional theory beyond that contained in the difference equation [2.5.1] is needed to describe fully the dependence of y on w. These additional restrictions can be of interest in their own right and also help give some insight into some of the technical details of manipulating difference equations. For these reasons, this section discusses in some depth an example of the role of initial conditions and their implications for solving difference equations.
Let P_t denote the price of a stock and D_t its dividend payment. If an investor buys the stock at date t and sells it at t + 1, the investor will earn a yield of D_t/P_t from the dividend and a yield of (P_{t+1} − P_t)/P_t in capital gains. The investor's total return (r_{t+1}) is thus

r_{t+1} = (P_{t+1} − P_t)/P_t + D_t/P_t.

A very simple model of the stock market posits that the return investors earn on stocks is constant across time periods:

r = (P_{t+1} − P_t)/P_t + D_t/P_t,  r > 0.  [2.5.4]

Equation [2.5.4] may seem too simplistic to be of much practical interest; it assumes among other things that investors have perfect foresight about future stock prices and dividends. However, a slightly more realistic model in which expected stock returns are constant involves a very similar set of technical issues. The advantage of the perfect-foresight model [2.5.4] is that it can be discussed using the tools already in hand to gain some further insight into using lag operators to solve difference equations.
Multiply [2.5.4] by P_t to arrive at

rP_t = P_{t+1} − P_t + D_t

or

P_{t+1} = (1 + r)P_t − D_t.  [2.5.5]

Equation [2.5.5] will be recognized as a first-order difference equation of the form of [1.1.1] with y_t = P_{t+1}, φ = (1 + r), and w_t = −D_t. From [1.1.7], we know that [2.5.5] implies that

P_{t+1} = (1 + r)^{t+1}P₀ − (1 + r)^tD₀ − (1 + r)^{t−1}D₁ − (1 + r)^{t−2}D₂ − ⋯ − (1 + r)D_{t−1} − D_t.  [2.5.6]

If the sequence {D₀, D₁, ..., D_t} and the value of P₀ were given, then [2.5.6] could determine the values of {P₁, P₂, ..., P_{t+1}}. But if only the values {D₀, D₁, ..., D_t} are given, then equation [2.5.6] would not be enough to pin down {P₁, P₂, ..., P_{t+1}}. There are an infinite number of possible sequences {P₁, P₂, ..., P_{t+1}} consistent with [2.5.5] and with a given {D₀, D₁, ..., D_t}. This infinite number of possibilities is indexed by the initial value P₀.
A further simplifying assumption helps clarify the nature of these different paths for {P₁, P₂, ..., P_{t+1}}. Suppose that dividends are constant over time:

D_t = D for all t.
Then [2.5.6] becomes

P_{t+1} = (1 + r)^{t+1}P₀ − [(1 + r)^t + (1 + r)^{t−1} + ⋯ + (1 + r) + 1]·D
= (1 + r)^{t+1}P₀ − {[(1 + r)^{t+1} − 1]/r}·D  [2.5.7]
= (1 + r)^{t+1}·[P₀ − (D/r)] + D/r.
Consider first the solution in which P₀ = D/r. If the initial stock price should happen to take this value, then [2.5.7] implies that

P_t = D/r  [2.5.8]

for all t. In this solution, dividends are constant at D and the stock price is constant at D/r. With no change in stock prices, investors never have any capital gains or losses, and their return is solely the dividend yield D/P = r. In a world with no changes in dividends this seems to be a sensible expression of the theory represented by [2.5.4]. Equation [2.5.8] is sometimes described as the "market fundamentals" solution to [2.5.4] for the case of constant dividends.
However, even with constant dividends, equation [2.5.8] is not the only result consistent with [2.5.4]. Suppose that the initial price exceeded D/r:

P₀ > D/r.

Investors seem to be valuing the stock beyond the potential of its constant dividend stream. From [2.5.7] this could be consistent with the asset pricing theory [2.5.4] provided that P₁ exceeds D/r by an even larger amount. As long as investors all believe that prices will continue to rise over time, each will earn the required return from the realized capital gain and [2.5.4] will be satisfied. This scenario has reminded many economists of a speculative bubble in stock prices.
If such bubbles are to be ruled out, additional knowledge about the process for {P_t}_{t=−∞}^∞ is required beyond that contained in the theory of [2.5.4]. For example, we might argue that finite world resources put an upper limit on feasible stock prices, as in

|P_t| < P̄ for all t.

Because φ = (1 + r) exceeds unity, the expansion [2.2.8] diverges when r > 0 and this operator is not defined. In this case, a lag operator representation can be sought for the recursive substitution forward that led from [2.5.5] to [2.5.13]. This is accomplished using the inverse of the lag operator,

L⁻¹x_t ≡ x_{t+1},

which extends result [2.1.4] to negative values of k. Note that L⁻¹ is indeed the inverse of the operator L:

L⁻¹(Lx_t) = L⁻¹x_{t−1} = x_t.

In general,

L⁻¹L^k = L^{k−1},

with L⁰ defined as the identity operator:

L⁰x_t = x_t.
Now consider multiplying [2.5.15] by

[−(1 + r)⁻¹L⁻¹]·[1 + (1 + r)⁻¹L⁻¹ + (1 + r)⁻²L⁻² + ⋯ + (1 + r)^{−T}L^{−T}]  [2.5.17]

to obtain

[1 − (1 + r)^{−(T+1)}L^{−(T+1)}]P_{t+1}
= (1 + r)⁻¹D_{t+1} + (1 + r)⁻²D_{t+2} + ⋯ + (1 + r)^{−(T+1)}D_{t+T+1}

or

P_{t+1} = (1 + r)^{−(T+1)}P_{t+T+2} + (1 + r)⁻¹D_{t+1} + (1 + r)⁻²D_{t+2} + ⋯ + (1 + r)^{−(T+1)}D_{t+T+1},

which is identical to [2.5.13] with t in [2.5.13] replaced with t + 1.
When r > 0 and {P_t}_{t=−∞}^∞ is a bounded sequence, the left side of the preceding equation will approach P_{t+1} as T becomes large. Thus, when r > 0 and {P_t}_{t=−∞}^∞ and {D_t}_{t=−∞}^∞ are bounded sequences, the limit of the operator in [2.5.17] exists and could be viewed as the inverse of the operator on the left side of [2.5.15]:

[1 − (1 + r)L]⁻¹ ≡ lim_{T→∞} [−(1 + r)⁻¹L⁻¹]·[1 + (1 + r)⁻¹L⁻¹ + (1 + r)⁻²L⁻² + ⋯ + (1 + r)^{−T}L^{−T}].

Applying this limiting operator to [2.5.15] amounts to solving the difference equation forward as in [2.5.14] and selecting the market fundamentals solution among the set of possible time paths for {P_t}_{t=−∞}^∞ given a particular time path for dividends {D_t}_{t=−∞}^∞.
Thus, given a first-order difference equation of the form

(1 − φL)y_t = w_t,  [2.5.18]

Sargent's (1987) advice was to solve the equation "backward" when |φ| < 1 by multiplying by

(1 − φL)⁻¹ ≡ lim_{j→∞} (1 + φL + φ²L² + ⋯ + φ^jL^j)  [2.5.19]

and to solve the equation "forward" when |φ| > 1 by multiplying by

(1 − φL)⁻¹ ≡ lim_{j→∞} [−φ⁻¹L⁻¹]·[1 + φ⁻¹L⁻¹ + φ⁻²L⁻² + ⋯ + φ^{−j}L^{−j}].  [2.5.20]

Defining the inverse of [1 − φL] in this way amounts to selecting an operator [1 − φL]⁻¹ with the properties that

[1 − φL]⁻¹ × [1 − φL] = 1 (the identity operator)

and that, when it is applied to a bounded sequence {w_t}_{t=−∞}^∞,

[1 − φL]⁻¹w_t,

the result is another bounded sequence.
The conclusion from this discussion is that in applying an operator such as (1 − φL)⁻¹, we are implicitly imposing a boundedness assumption that rules out phenomena such as the speculative bubbles of equation [2.5.7] a priori. Where that is our intention, so much the better, though we should not apply the rules [2.5.19] or [2.5.20] without some reflection on their economic content.
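A tiny simulation makes the role of the boundedness assumption vivid (Python with NumPy; the numbers are my own illustrative choices): any path starting at P₀ = D/r + a₀ satisfies the recursion [2.5.5] with constant dividends, but only a₀ = 0 stays bounded.

    import numpy as np

    r, D, T = 0.05, 1.0, 200
    fundamental = D / r                 # the market fundamentals price, 20.0
    P = np.empty(T)
    P[0] = fundamental + 0.1            # start slightly above D/r

    for t in range(T - 1):
        P[t + 1] = (1 + r) * P[t] - D   # the difference equation [2.5.5]

    print(P[:3])   # 20.1, 20.105, 20.11025: the recursion holds exactly
    print(P[-1])   # far above D/r: the bubble term 0.1*(1+r)^t explodes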
Chapter 2 References

Sargent, Thomas J. 1987. Macroeconomic Theory, 2d ed. Boston: Academic Press.
Whiteman, Charles H. 1983. Linear Rational Expectations Models: A User's Guide. Minneapolis: University of Minnesota Press.
Stationary ARMA Processes

This chapter introduces univariate ARMA processes, which provide a very useful class of models for describing the dynamics of an individual time series. The chapter begins with definitions of some of the key concepts used in time series analysis. Sections 3.2 through 3.5 then investigate the properties of various ARMA processes. Section 3.6 introduces the autocovariance-generating function, which is useful for analyzing the consequences of combining different time series and for an understanding of the population spectrum. The chapter concludes with a discussion of invertibility (Section 3.7), which can be important for selecting the ARMA representation of an observed time series that is appropriate given the uses to be made of the model.
3.1. Expectations, Stationarity, and Ergodicity

Expectations and Stochastic Processes

Suppose we have observed a sample of size T of some random variable Y_t:

{y₁, y₂, ..., y_T}.  [3.1.1]

For example, consider a collection of T independent and identically distributed (i.i.d.) variables ε_t,

{ε₁, ε₂, ..., ε_T},  [3.1.2]

with

ε_t ~ N(0, σ²).

This is referred to as a sample of size T from a Gaussian white noise process.

The observed sample [3.1.1] represents T particular numbers, but this set of T numbers is only one possible outcome of the underlying stochastic process that generated the data. Indeed, even if we were to imagine having observed the process for an infinite period of time, arriving at the sequence

{y_t}_{t=−∞}^∞ = {..., y₋₂, y₋₁, y₀, y₁, y₂, ..., y_T, y_{T+1}, y_{T+2}, ...},

the infinite sequence {y_t}_{t=−∞}^∞ would still be viewed as a single realization from a time series process. For example, we might set one computer to work generating an infinite sequence of i.i.d. N(0, σ²) variates, {ε_t^{(1)}}_{t=−∞}^∞, and a second computer generating a separate sequence, {ε_t^{(2)}}_{t=−∞}^∞. We would then view these as two independent realizations of a Gaussian white noise process.
Imagine a battery of I such computers generating sequences {y_t^{(1)}}_{t=−∞}^∞, {y_t^{(2)}}_{t=−∞}^∞, ..., {y_t^{(I)}}_{t=−∞}^∞, and consider selecting the observation associated with date t from each sequence:

{y_t^{(1)}, y_t^{(2)}, ..., y_t^{(I)}}.

This would be described as a sample of I realizations of the random variable Y_t. This random variable has some density, denoted f_{Y_t}(y_t), which is called the unconditional density of Y_t. For example, for the Gaussian white noise process, this density is

f_{Y_t}(y_t) = [1/√(2πσ²)]·exp(−y_t²/(2σ²)).

The expectation of the tth observation of a time series refers to the mean of this probability distribution, provided it exists:

E(Y_t) = ∫_{−∞}^∞ y_t·f_{Y_t}(y_t) dy_t.  [3.1.3]

We might view this as the probability limit of the ensemble average:

E(Y_t) = plim_{I→∞} (1/I) Σ_{i=1}^I Y_t^{(i)}.  [3.1.4]
For example, if {Y_t}_{t=−∞}^∞ represents the sum of a constant μ plus a Gaussian white noise process {ε_t}_{t=−∞}^∞,

Y_t = μ + ε_t,  [3.1.5]

then its mean is

E(Y_t) = μ + E(ε_t) = μ.  [3.1.6]

If Y_t is a time trend plus Gaussian white noise,

Y_t = βt + ε_t,  [3.1.7]

then its mean is

E(Y_t) = βt.  [3.1.8]

Sometimes for emphasis the expectation E(Y_t) is called the unconditional mean of Y_t. The unconditional mean is denoted μ_t:

E(Y_t) = μ_t.

Note that this notation allows the general possibility that the mean can be a function of the date of the observation t. For the process [3.1.7] involving the time trend, the mean [3.1.8] is a function of time, whereas for the constant plus Gaussian white noise, the mean [3.1.6] is not a function of time.

The variance of the random variable Y_t (denoted γ_{0t}) is similarly defined as

γ_{0t} ≡ E(Y_t − μ_t)² = ∫_{−∞}^∞ (y_t − μ_t)²·f_{Y_t}(y_t) dy_t.  [3.1.9]

For example, for the process [3.1.7], the variance is

γ_{0t} = E(Y_t − βt)² = E(ε_t²) = σ².
Autocovariance

Given a particular realization such as {y_t^{(1)}}_{t=−∞}^∞ on a time series process, consider constructing a vector x_t^{(1)} associated with date t. This vector consists of the j + 1 most recent observations on y as of date t for that realization:

x_t^{(1)} ≡ [ y_t^{(1)}, y_{t−1}^{(1)}, ..., y_{t−j}^{(1)} ]′.

We think of each realization i as generating one particular value of the vector x_t and want to calculate the probability distribution of this vector x_t^{(i)} across realizations i. This distribution is called the joint distribution of (Y_t, Y_{t−1}, ..., Y_{t−j}). From this distribution we can calculate the jth autocovariance of Y_t (denoted γ_{jt}):

γ_{jt} = ∫_{−∞}^∞ ⋯ ∫_{−∞}^∞ (y_t − μ_t)(y_{t−j} − μ_{t−j})
× f_{Y_t,Y_{t−1},...,Y_{t−j}}(y_t, y_{t−1}, ..., y_{t−j}) dy_t dy_{t−1} ⋯ dy_{t−j}  [3.1.10]
= E(Y_t − μ_t)(Y_{t−j} − μ_{t−j}).

Note that [3.1.10] has the form of a covariance between two variables X and Y:

Cov(X, Y) = E(X − μ_X)(Y − μ_Y).

Thus [3.1.10] could be described as the covariance of Y_t with its own lagged value; hence, the term "autocovariance." Notice further from [3.1.10] that the 0th autocovariance γ_{0t} is just the variance of Y_t, as anticipated by the notation in [3.1.9].

The autocovariance γ_{jt} can be viewed as the (1, j + 1) element of the variance-covariance matrix of the vector x_t. For this reason, the autocovariances are described as the second moments of the process for Y_t.

Again, it may be helpful to think of the jth autocovariance as the probability limit of an ensemble average:

γ_{jt} = plim_{I→∞} (1/I) Σ_{i=1}^I [Y_t^{(i)} − μ_t][Y_{t−j}^{(i)} − μ_{t−j}].  [3.1.11]

As an example of calculating autocovariances, note that for the process in [3.1.5] the autocovariances are all zero for j ≠ 0:

γ_{jt} = E(Y_t − μ)(Y_{t−j} − μ) = E(ε_tε_{t−j}) = 0 for j ≠ 0.
Stationarity

If neither the mean μ_t nor the autocovariances γ_{jt} depend on the date t, then the process for Y_t is said to be covariance-stationary or weakly stationary:

E(Y_t) = μ for all t

E(Y_t − μ)(Y_{t−j} − μ) = γ_j for all t and any j.

For example, the process in [3.1.5] is covariance-stationary:

E(Y_t) = μ

E(Y_t − μ)(Y_{t−j} − μ) = σ² for j = 0, and 0 for j ≠ 0.

By contrast, the process of [3.1.7] is not covariance-stationary, because its mean, βt, is a function of time.

Notice that if a process is covariance-stationary, the covariance between Y_t and Y_{t−j} depends only on j, the length of time separating the observations, and not on t, the date of the observation. It follows that for a covariance-stationary process, γ_j and γ_{−j} would represent the same magnitude. To see this, recall the definition

γ_j = E(Y_t − μ)(Y_{t−j} − μ).  [3.1.12]

If the process is covariance-stationary, then this magnitude is the same for any value of t we might have chosen; for example, we can replace t with t + j:

γ_j = E(Y_{t+j} − μ)(Y_{t+j−j} − μ) = E(Y_{t+j} − μ)(Y_t − μ) = E(Y_t − μ)(Y_{t+j} − μ).

But referring again to the definition [3.1.12], this last expression is just the definition of γ_{−j}. Thus, for any covariance-stationary process,

γ_j = γ_{−j} for all integers j.  [3.1.13]
A different concept is that of strict stationarity. A process is said to be strictly stationary if, for any values of j₁, j₂, ..., j_n, the joint distribution of (Y_t, Y_{t+j₁}, Y_{t+j₂}, ..., Y_{t+j_n}) depends only on the intervals separating the dates (j₁, j₂, ..., j_n) and not on the date itself (t). Notice that if a process is strictly stationary with finite second moments, then it must be covariance-stationary: if the densities over which we are integrating in [3.1.3] and [3.1.10] do not depend on time, then the moments μ_t and γ_{jt} will not depend on time. However, it is possible to imagine a process that is covariance-stationary but not strictly stationary; the mean and autocovariances could not be functions of time, but perhaps higher moments such as E(Y_t³) are.

In this text the term "stationary" by itself is taken to mean "covariance-stationary."

A process {Y_t} is said to be Gaussian if the joint density

f_{Y_t,Y_{t+j₁},...,Y_{t+j_n}}(y_t, y_{t+j₁}, ..., y_{t+j_n})

is Gaussian for any j₁, j₂, ..., j_n. Since the mean and variance are all that are needed to parameterize a multivariate Gaussian distribution completely, a covariance-stationary Gaussian process is strictly stationary.
Ergodicity

We have viewed expectations of a time series in terms of ensemble averages such as [3.1.4] and [3.1.11]. These definitions may seem a bit contrived, since usually all one has available is a single realization of size T from the process, which we earlier denoted {y₁^{(1)}, y₂^{(1)}, ..., y_T^{(1)}}. From these observations we would calculate the sample mean ȳ. This, of course, is not an ensemble average but rather a time average:

ȳ = (1/T) Σ_{t=1}^T y_t^{(1)}.  [3.1.14]

Whether time averages such as [3.1.14] eventually converge to the ensemble concept E(Y_t) for a stationary process has to do with ergodicity. A covariance-stationary process is said to be ergodic for the mean if [3.1.14] converges in probability to E(Y_t) as T → ∞.¹ A process will be ergodic for the mean provided that the autocovariance γ_j goes to zero sufficiently quickly as j becomes large. In Chapter 7 we will see that if the autocovariances for a covariance-stationary process satisfy

Σ_{j=0}^∞ |γ_j| < ∞,  [3.1.15]

then {Y_t} is ergodic for the mean.

Similarly, a covariance-stationary process is said to be ergodic for second moments if

[1/(T − j)] Σ_{t=j+1}^T (Y_t − μ)(Y_{t−j} − μ) → γ_j in probability

for all j. Sufficient conditions for second-moment ergodicity will be presented in Chapter 7. In the special case where {Y_t} is a stationary Gaussian process, condition [3.1.15] is sufficient to ensure ergodicity for all moments.
For many applications, stationarity and ergodicity turn out to amount to the same requirements. For purposes of clarifying the concepts of stationarity and ergodicity, however, it may be helpful to consider an example of a process that is stationary but not ergodic. Suppose the mean μ^{(i)} for the ith realization {y_t^{(i)}}_{t=−∞}^∞ is generated from a N(0, λ²) distribution, say

Y_t^{(i)} = μ^{(i)} + ε_t.  [3.1.16]

Here {ε_t} is a Gaussian white noise process with mean zero and variance σ² that is independent of μ^{(i)}. Notice that

μ_t = E(μ^{(i)}) + E(ε_t) = 0.

Also,

γ_{0t} = E(μ^{(i)} + ε_t)² = λ² + σ²

and

γ_{jt} = E[(μ^{(i)} + ε_t)(μ^{(i)} + ε_{t−j})] = λ² for j ≠ 0.

Thus the process of [3.1.16] is covariance-stationary. It does not satisfy the sufficient condition [3.1.15] for ergodicity for the mean, however, and indeed, the time average

(1/T) Σ_{t=1}^T Y_t^{(i)} = (1/T) Σ_{t=1}^T (μ^{(i)} + ε_t) = μ^{(i)} + (1/T) Σ_{t=1}^T ε_t

converges to μ^{(i)} rather than to zero, the mean of Y_t.
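The non-ergodicity is easy to see in simulation (an illustrative Python sketch with NumPy; λ, σ, and the seed are my own choices): each realization's time average settles at its own μ^{(i)}, not at the ensemble mean of zero.

    import numpy as np

    rng = np.random.default_rng(0)
    lam, sigma, T = 2.0, 1.0, 10_000

    for i in range(3):
        mu_i = rng.normal(0, lam)             # realization-specific mean, N(0, lam^2)
        y = mu_i + rng.normal(0, sigma, T)    # one realization of [3.1.16]
        print(f"mu_i = {mu_i:+.3f}, time average = {y.mean():+.3f}")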
¹Often "ergodicity" is used in a more general sense; see Anderson and Moore (1979, p. 319) or Hannan (1970, pp. 201–2).

3.2. White Noise

The basic building block for all the processes considered in this chapter is a sequence {ε_t}_{t=−∞}^∞ whose elements have mean zero and variance σ²,

E(ε_t) = 0  [3.2.1]

E(ε_t²) = σ²,  [3.2.2]

and for which the ε's are uncorrelated across time:

E(ε_tε_τ) = 0 for t ≠ τ.  [3.2.3]

A process satisfying [3.2.1] through [3.2.3] is described as a white noise process.

We shall on occasion wish to replace [3.2.3] with the slightly stronger condition that the ε's are independent across time:

ε_t, ε_τ independent for t ≠ τ.  [3.2.4]

Notice that [3.2.4] implies [3.2.3] but [3.2.3] does not imply [3.2.4]. A process satisfying [3.2.1] through [3.2.4] is called an independent white noise process. Finally, if [3.2.1] through [3.2.4] hold along with

ε_t ~ N(0, σ²),  [3.2.5]

then we have the Gaussian white noise process.
3.3. Moving Average Processes

The First-Order Moving Average Process

Let {ε_t} be white noise as in [3.2.1] through [3.2.3], and consider the process

Y_t = μ + ε_t + θε_{t−1},  [3.3.1]

where μ and θ could be any constants. This time series is called a first-order moving average process, denoted MA(1). The term "moving average" comes from the fact that Y_t is constructed from a weighted sum, akin to an average, of the two most recent values of ε.

The expectation of Y_t is given by

E(Y_t) = E(μ + ε_t + θε_{t−1}) = μ + E(ε_t) + θE(ε_{t−1}) = μ.  [3.3.2]

We used the symbol μ for the constant term in [3.3.1] in anticipation of the result that this constant term turns out to be the mean of the process.

The variance of Y_t is

E(Y_t − μ)² = E(ε_t + θε_{t−1})²
= E(ε_t² + 2θε_tε_{t−1} + θ²ε_{t−1}²)  [3.3.3]
= σ² + 0 + θ²σ²
= (1 + θ²)σ².

The first autocovariance is

E(Y_t − μ)(Y_{t−1} − μ) = E(ε_t + θε_{t−1})(ε_{t−1} + θε_{t−2})
= E(ε_tε_{t−1} + θε_{t−1}² + θε_tε_{t−2} + θ²ε_{t−1}ε_{t−2})  [3.3.4]
= 0 + θσ² + 0 + 0.
Higher autocovariances are all zero:

E(Y_t − μ)(Y_{t−j} − μ) = E(ε_t + θε_{t−1})(ε_{t−j} + θε_{t−j−1}) = 0 for j > 1.  [3.3.5]

Since the mean and autocovariances are not functions of time, an MA(1) process is covariance-stationary regardless of the value of θ. Furthermore, [3.1.15] is clearly satisfied:

Σ_{j=0}^∞ |γ_j| = (1 + θ²)σ² + |θσ²| < ∞.

Thus, if {ε_t} is Gaussian white noise, then the MA(1) process [3.3.1] is ergodic for all moments.
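A quick Monte Carlo check of these moments (Python with NumPy; the sample size and seed are my own choices):

    import numpy as np

    # Check the MA(1) moments [3.3.2]-[3.3.4]: mean mu, variance (1 + theta^2)*sigma^2,
    # and first autocovariance theta*sigma^2.
    rng = np.random.default_rng(1)
    mu, theta, sigma, T = 0.0, 0.8, 1.0, 200_000

    eps = rng.normal(0, sigma, T + 1)
    y = mu + eps[1:] + theta * eps[:-1]

    print(y.mean())                              # about 0
    print(y.var(), (1 + theta**2) * sigma**2)    # both about 1.64
    print(np.mean((y[1:] - mu) * (y[:-1] - mu)), theta * sigma**2)  # both about 0.8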
The jth autocorrelation of a covariance-stationary process (denoted ρ_j) is defined as its jth autocovariance divided by the variance:

ρ_j ≡ γ_j/γ₀.  [3.3.6]

Again the terminology arises from the fact that ρ_j is the correlation between Y_t and Y_{t−j}:

Corr(Y_t, Y_{t−j}) = Cov(Y_t, Y_{t−j})/[√Var(Y_t)·√Var(Y_{t−j})] = γ_j/(√γ₀·√γ₀) = ρ_j.

Since ρ_j is a correlation, |ρ_j| ≤ 1 for all j, by the Cauchy–Schwarz inequality. Notice also that the 0th autocorrelation ρ₀ is equal to unity for any covariance-stationary process by definition.

From [3.3.3] and [3.3.4], the first autocorrelation for an MA(1) process is given by

ρ₁ = θσ²/[(1 + θ²)σ²] = θ/(1 + θ²).  [3.3.7]

Higher autocorrelations are all zero.
The autocorrelation ρ_j can be plotted as a function of j as in Figure 3.1. Panel (a) shows the autocorrelation function for white noise, while panel (b) gives the autocorrelation function for the MA(1) process

Y_t = ε_t + 0.8ε_{t−1}.

For different specifications of θ we would obtain different values for the first autocorrelation ρ₁ in [3.3.7]. Positive values of θ induce positive autocorrelation in the series. In this case, an unusually large value of Y_t is likely to be followed by a larger-than-average value for Y_{t+1}, just as a smaller-than-average Y_t may well be followed by a smaller-than-average Y_{t+1}. By contrast, negative values of θ imply negative autocorrelation: a large Y_t might be expected to be followed by a small value for Y_{t+1}.

The values for ρ₁ implied by different specifications of θ are plotted in Figure 3.2. Notice that the largest possible value for ρ₁ is 0.5; this occurs if θ = 1. The smallest value for ρ₁ is −0.5, which occurs if θ = −1. For any value of ρ₁ between −0.5 and 0.5, there are two different values of θ that could produce that autocorrelation. This is because the value of θ/(1 + θ²) is unchanged if θ is replaced by 1/θ:

(1/θ)/[1 + (1/θ)²] = (1/θ)·θ²/{[1 + (1/θ)²]·θ²} = θ/(θ² + 1).

For example, the processes

Y_t = ε_t + 0.5ε_{t−1}

and

Y_t = ε_t + 2ε_{t−1}

would have the same autocorrelation function:

ρ₁ = 2/(1 + 2²) = 0.5/(1 + 0.5²) = 0.4.

We will have more to say about the relation between two MA(1) processes that share the same autocorrelation function in Section 3.7.
[Figure 3.1. Autocorrelation functions for assorted ARMA processes: (a) white noise, Y_t = ε_t; (b) MA(1), Y_t = ε_t + 0.8ε_{t−1}; (c) an MA(4) process; (d) AR(1), Y_t = 0.8Y_{t−1} + ε_t; (e) AR(1), Y_t = −0.8Y_{t−1} + ε_t.]
The qth-Order Moving Average Process

A qth-order moving average process, denoted MA(q), is characterized by

Y_t = μ + ε_t + θ₁ε_{t−1} + θ₂ε_{t−2} + ⋯ + θ_qε_{t−q},  [3.3.8]

where {ε_t} satisfies [3.2.1] through [3.2.3] and (θ₁, θ₂, ..., θ_q) could be any real numbers. The mean of [3.3.8] is again given by μ:

E(Y_t) = μ + E(ε_t) + θ₁E(ε_{t−1}) + θ₂E(ε_{t−2}) + ⋯ + θ_qE(ε_{t−q}) = μ.

The variance of an MA(q) process is

γ₀ = E(Y_t − μ)² = E(ε_t + θ₁ε_{t−1} + θ₂ε_{t−2} + ⋯ + θ_qε_{t−q})².  [3.3.9]
[Figure 3.2. The first autocorrelation (ρ₁) possible for an MA(1) process, plotted for different values of θ.]
Since the ε's are uncorrelated, the variance [3.3.9] is

γ₀ = σ² + θ₁²σ² + θ₂²σ² + ⋯ + θ_q²σ² = (1 + θ₁² + θ₂² + ⋯ + θ_q²)σ².  [3.3.10]

For j = 1, 2, ..., q,

γ_j = E[(ε_t + θ₁ε_{t−1} + θ₂ε_{t−2} + ⋯ + θ_qε_{t−q})
× (ε_{t−j} + θ₁ε_{t−j−1} + θ₂ε_{t−j−2} + ⋯ + θ_qε_{t−j−q})]  [3.3.11]
= E[θ_jε_{t−j}² + θ_{j+1}θ₁ε_{t−j−1}² + θ_{j+2}θ₂ε_{t−j−2}² + ⋯ + θ_qθ_{q−j}ε_{t−q}²].

Terms involving ε's at different dates have been dropped because their product has expectation zero, and θ₀ is defined to be unity. For j > q, there are no ε's with common dates in the definition of γ_j, and so the expectation is zero. Thus,

γ_j = [θ_j + θ_{j+1}θ₁ + θ_{j+2}θ₂ + ⋯ + θ_qθ_{q−j}]·σ² for j = 1, 2, ..., q
γ_j = 0 for j > q.  [3.3.12]

For example, for an MA(2) process,

γ₀ = [1 + θ₁² + θ₂²]·σ²
γ₁ = [θ₁ + θ₂θ₁]·σ²
γ₂ = [θ₂]·σ²
γ₃ = γ₄ = ⋯ = 0.

For any values of (θ₁, θ₂, ..., θ_q), the MA(q) process is thus covariance-stationary. Condition [3.1.15] is satisfied, so for Gaussian ε_t the MA(q) process is also ergodic for all moments. The autocorrelation function is zero after q lags, as in panel (c) of Figure 3.1.
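Formula [3.3.12] translates directly into code; this sketch (Python with NumPy; the helper name is my own) computes the autocovariances of an MA(q) and reproduces the MA(2) example above.

    import numpy as np

    def ma_autocovariances(theta, sigma2, max_lag):
        # Autocovariances of an MA(q) from [3.3.12], with theta_0 = 1.
        th = np.array([1.0] + list(theta))
        gamma = np.zeros(max_lag + 1)
        for j in range(min(max_lag, len(th) - 1) + 1):
            gamma[j] = sigma2 * np.sum(th[j:] * th[:len(th) - j])
        return gamma                    # gamma[j] stays 0 for j > q

    # MA(2) with theta1 = 0.4, theta2 = 0.3: gamma = [1.25, 0.52, 0.3, 0, 0].
    print(ma_autocovariances([0.4, 0.3], sigma2=1.0, max_lag=4))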
The Infinite-Order Moving Average Process

The MA(q) process can be written

Y_t = μ + Σ_{j=0}^q θ_jε_{t−j},

with θ₀ = 1. Consider the process that results as q → ∞:

Y_t = μ + Σ_{j=0}^∞ ψ_jε_{t−j} = μ + ψ₀ε_t + ψ₁ε_{t−1} + ψ₂ε_{t−2} + ⋯.  [3.3.13]
This could be described as an MA(∞) process. To preserve notational flexibility later, we will use ψ's for the coefficients of an infinite-order moving average process and θ's for the coefficients of a finite-order moving average process.

Appendix 3.A to this chapter shows that the infinite sequence in [3.3.13] generates a well-defined covariance-stationary process provided that

Σ_{j=0}^∞ ψ_j² < ∞.  [3.3.14]

It is often convenient to work with a slightly stronger condition than [3.3.14]:

Σ_{j=0}^∞ |ψ_j| < ∞.  [3.3.15]

A sequence of numbers {ψ_j}_{j=0}^∞ satisfying [3.3.14] is said to be square summable, whereas a sequence satisfying [3.3.15] is said to be absolutely summable. Absolute summability implies square summability, but the converse does not hold; there are examples of square-summable sequences that are not absolutely summable (again, see Appendix 3.A).
The mean and autocovariances of an MA(∞) process with absolutely summable coefficients can be calculated from a simple extrapolation of the results for an MA(q) process:²

E(Y_t) = lim_{T→∞} E(μ + ψ₀ε_t + ψ₁ε_{t−1} + ψ₂ε_{t−2} + ⋯ + ψ_Tε_{t−T}) = μ  [3.3.16]

γ₀ = E(Y_t − μ)² = lim_{T→∞} E(ψ₀ε_t + ψ₁ε_{t−1} + ψ₂ε_{t−2} + ⋯ + ψ_Tε_{t−T})²  [3.3.17]
= lim_{T→∞} (ψ₀² + ψ₁² + ψ₂² + ⋯ + ψ_T²)σ²

γ_j = E(Y_t − μ)(Y_{t−j} − μ)  [3.3.18]
= σ²(ψ_jψ₀ + ψ_{j+1}ψ₁ + ψ_{j+2}ψ₂ + ψ_{j+3}ψ₃ + ⋯).

Moreover, an MA(∞) process with absolutely summable coefficients has absolutely summable autocovariances:

Σ_{j=0}^∞ |γ_j| < ∞.  [3.3.19]

Hence, an MA(∞) process satisfying [3.3.15] is ergodic for the mean (see Appendix 3.A).
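For a concrete instance of [3.3.17]–[3.3.18], take geometrically declining weights ψ_j = φ^j with |φ| < 1 (an illustrative Python sketch): the autocovariances then have the closed form γ_j = σ²φ^j/(1 − φ²).

    # MA(infinity) with psi_j = phi^j, which is absolutely summable for |phi| < 1.
    phi, sigma2 = 0.7, 1.0

    # Partial sums of [3.3.17] and [3.3.18] converge to the closed forms.
    gamma0 = sigma2 * sum(phi**(2 * j) for j in range(1000))
    gamma1 = sigma2 * sum(phi**(j + 1) * phi**j for j in range(1000))

    print(gamma0, sigma2 / (1 - phi**2))          # both about 1.9608
    print(gamma1, sigma2 * phi / (1 - phi**2))    # both about 1.3725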
²Absolute summability of the moving average coefficients and existence of the second moment of ε permit interchanging the orders of expectation and infinite summation used in these calculations.