
by Fabio Canova


The last twenty years have witnessed tremendous advances in the mathematical, statistical, and computational tools available to applied macroeconomists. This rapidly evolving field has redefined how researchers test models and validate theories. Yet until now there has been no textbook that unites the latest methods and bridges the divide between theoretical and applied work.

Fabio Canova brings together dynamic equilibrium theory, data analysis, and advanced econometric and computational methods to provide the first comprehensive set of techniques for use by academic economists as well as professional macroeconomists in banking and finance, industry, and government. This graduate-level textbook is for readers knowledgeable in modern macroeconomic theory, econometrics, and computational programming using RATS, MATLAB, or Gauss. Inevitably a modern treatment of such a complex topic requires a quantitative perspective, a solid dynamic theory background, and the development of empirical and numerical methods--which is where Canova's book differs from typical graduate textbooks in macroeconomics and econometrics. Rather than list a series of estimators and their properties, Canova starts from a class of DSGE models, finds an approximate linear representation for the decision rules, and describes methods needed to estimate their parameters, examining their fit to the data. The book is complete with numerous examples and exercises.

Today's economic analysts need a strong foundation in both theory and application. *Methods for Applied Macroeconomic Research* offers the essential tools for the next generation of macroeconomists.

Publisher: Princeton University Press. Released: Sep 19, 2011. ISBN: 9781400841028. Format: book.

This chapter is introductory and intended for readers who are unfamiliar with time series concepts, with the properties of stochastic processes, with basic asymptotic theory results, and with the principles of spectral analysis. Those who feel comfortable with these topics can skip directly to **chapter 2**.

Since the material is vast and complex, an effort is made to present it at the simplest possible level, emphasizing a selected number of topics and only those aspects which are useful for the central topic of this book: comparing the properties of dynamic stochastic general equilibrium (DSGE) models to the data. This means that intuition rather than mathematical rigor is stressed. More specialized books, such as those by Brockwell and Davis (1991), Davidson (1994), Priestley (1981), or White (1984), provide a comprehensive and in-depth treatment of these topics.

When trying to provide background material, there is always the risk of going too far back to the basics, of trying to reinvent the wheel. To avoid this, we assume that the reader is familiar with simple concepts of calculus such as limits, continuity, and uniform continuity of functions of real numbers, and that she is familiar with distribution functions, measures, and probability spaces.

The chapter is divided into six sections. The first defines what a stochastic process is. The second examines the limiting behavior of stochastic processes, introducing four concepts of convergence and characterizing their relationships. Section 1.3 deals with time series concepts. Section 1.4 deals with laws of large numbers. These laws are useful to ensure that functions of stochastic processes converge to appropriate limits. We examine three situations: a case where the elements of a stochastic process are dependent and identically distributed; one where they are dependent and heterogeneously distributed; and one where they are martingale differences. Section 1.5 describes three central limit theorems corresponding to the three situations analyzed in section 1.4. Central limit theorems are useful for deriving the limiting distribution of functions of stochastic processes and are the basis for (classical) tests of hypotheses and for some model evaluation criteria.

Section 1.6 presents elements of spectral analysis. Spectral analysis is useful for breaking down economic time series into components (trends, cycles, etc.), for building measures of persistence in response to shocks, for computing the asymptotic covariance matrix of certain estimators, and for defining measures of distance between a model and the data. It may be challenging at first. However, once it is realized that most of the functions typically performed in everyday life employ spectral methods (frequency modulation in a stereo, frequency band reception in a cellular phone, etc.), the reader should feel more comfortable with it. Spectral analysis offers an alternative way to look at time series, translating serially dependent time observations into contemporaneously independent frequency observations. This change of coordinates allows us to analyze the primitive cycles which compose time series and to discuss their length, amplitude, and persistence.
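As a concrete illustration of this change of coordinates, the short sketch below (ours, not from the book, whose examples use RATS, MATLAB, or Gauss; seed and series length are arbitrary) builds a series with a known cycle and recovers the cycle's frequency from the periodogram, the squared modulus of the series' discrete Fourier transform:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 512
t = np.arange(T)
# a series with one low-frequency cycle (period 32) plus noise
y = np.cos(2 * np.pi * t / 32) + 0.5 * rng.standard_normal(T)
y = y - y.mean()

# periodogram: squared modulus of the discrete Fourier transform
fy = np.fft.rfft(y)
freq = np.fft.rfftfreq(T)                # cycles per period, in [0, 0.5]
pgram = (np.abs(fy) ** 2) / T

# the largest periodogram ordinate recovers the frequency of the cycle
peak = freq[np.argmax(pgram[1:]) + 1]    # skip the zero frequency
```

The ordinate at the cycle's frequency (1/32) dwarfs all the others, which is exactly the sense in which spectral methods isolate the primitive cycles composing a time series.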

Before starting, we fix notation. (Ω, F, *P*) denotes a probability space where, for each *t*, *yt*(ω) is a measurable function of ω ∈ Ω taking values in R*m*, where R is the real line. We assume that at each *t* the history of the process up to *t* is known, so that any function *h*(*yτ*), *τ* ≤ *t*, is also known at *t*. A normal random variable with zero mean and variance Σ*y* is denoted by N(0, Σ*y*), and a random variable uniformly distributed over the interval [*a*1, *a*2] by U[*a*1, *a*2]. Finally, i.i.d. indicates independently and identically distributed random variables, and a white noise is an i.i.d. process with zero mean and constant variance.

**Definition 1.1 (stochastic process). **A stochastic process {*yt*(ω)} is a probability measure defined on sets of sequences of real vectors (the paths of the process).

By fixing the values of a path at a given *t*, and performing countable unions, finite intersections, and complementations of the resulting sets of paths, we generate a set of events with proper probabilities. Note that a restriction placed on the path only at *t* leaves *yτ* unrestricted for all *τ* ≤ *t*. Two simple stochastic processes are the following.

**Example 1.1. **(i) Let *yt*(ω) = *e*1(ω) cos(*e*2(ω)*t*), where *e*1 and *e*2 are random variables. Here *yt* is periodic: *e*1 controls the amplitude and *e*2 the periodicity of *yt*.

(ii) Let *yt* be an i.i.d. process such that *P*[*yt* = ±1] = 0.5 for all *t*. Such a process has no memory and flips between −1 and 1 as *t* changes.

**Example 1.2. **, and *e*1*t *and *e*2*t *, *et *~ i.i.d. (0, 1) is a stochastic process.

In a classical framework the properties of estimators are obtained by using sequences of estimators indexed by the sample size, and by showing that these sequences approach the true (unknown) parameter value as the sample size grows to infinity. Since estimators are continuous functions of the data, we need to ensure that the data possess a proper limit and that continuous functions of the data inherit these properties. To show that the former is the case, one can rely on a variety of convergence concepts. The first two deal with convergence of the sequence, the next with its moments, and the last with its distribution.

*1.2.1 Almost Sure Convergence *

The concept of almost sure (a.s.) convergence extends the idea of convergence to a limit employed in the case of a sequence of real numbers.

We state the definition for scalar processes; for vector processes, convergence can be similarly defined, element by element.

**Definition 1.2 (a.s. convergence). **{*yt*(ω)} converges almost surely (a.s.) to *y*(ω) if *P*[ω : lim*t*→∞ *yt*(ω) = *y*(ω)] = 1; that is, for almost every ω there is a *T*(ω, *ε*) such that |*yt*(ω) − *y*(ω)| < *ε* for all *t* > *T* and every *ε* > 0.

In other words, {*yt*(ω)} converges a.s. if the probability of obtaining a path for *yt* which does not settle near *y* after some *T* is zero. When the probability space is infinite dimensional, a.s. convergence is called convergence almost everywhere; sometimes a.s. convergence is termed convergence with probability 1 or strong consistency.

Next, we describe the limiting behavior of functions of a.s. convergent sequences.

**Result 1.1. **Suppose *yt* converges a.s. to *y*. Let *h* be an *n* × 1 vector of functions, continuous at *y*. Then *h*(*yt*) converges a.s. to *h*(*y*).

Result 1.1 is a simple extension of the standard fact that continuous functions of convergent sequences are convergent.

**Example 1.3. **.

**Exercise 1.1. **with probability 1 − 1/*t *with probability 1/*t*converge a.s. to 1? Suppose

. What is its a.s. limit?

A more delicate situation arises when the distance between *yt* and *y*2*t* becomes arbitrarily small as *t* → ∞, where *y*2*t* is another sequence of random variables. To obtain convergence in this situation we need to strengthen the conditions by requiring uniform continuity of *h* (for example, assuming continuity on a compact set).

**Result 1.2. **Let *h *such that, for all *t *> *T*, *y*2*t *, uniformly in *t*.

Suppose that a simulated path and an actual path for *yt* are given, and let *h* be some continuous statistic, e.g., the mean or the variance. Then result 1.2 tells us that if simulated and actual paths are close enough as *t* → ∞, statistics generated from these paths will also be close.

*1.2.2 Convergence in Probability *

Convergence in probability is a weaker concept than a.s. convergence.

**Definition 1.3 (convergence in probability). **{*yt*(ω)} converges in probability to *y*(ω) if, for every *ε* > 0, *P*[ω : |*yt*(ω) − *y*(ω)| > *ε*] → 0 for *t* → ∞.

Convergence in probability only requires that the probability of a large deviation of *yt* from *y* becomes small as *t* → ∞; it does not require that deviations cease to occur after some *T*.

**Example 1.4. **Let *y*t and *yτ *be independent ∀*t*, *τ*, let *yt *be either 0 or 1 and let

Then *P*[*yt*, = 0] = 1 − 1/(*j *+ 1) for *t *= 2*j*−1 + 1,…, 2*j*, *j *. This is because the probability that *yt *is in one of these classes is 1/*j *and, as *t *→ ∞, the number of classes goes to infinity. However, *yt *does not converge a.s. to 0 since the probability that a convergent path is drawn is 0; i.e., the probability of getting a 1 for any *t *= 2*j*−1 + 1,…, 2*j*, *j *> 1, is small but, since the streak 2*j*−1 + 1,…, 2*j *is large, the probability of getting a 1 is 1 − [1 − 1/(*j *+ 1)]²(*j *− ¹), which converges to 1 as *j *.
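Example 1.4 is easy to check by simulation. The sketch below (our coding of the block probabilities; seed, number of blocks, and number of paths are arbitrary) shows that deep into the sample the chance that *yt* equals 1 is small, yet essentially every path still contains a 1 in the last block, so no path settles at 0:

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_path(J):
    """One path of the block process: in block j (of length 2**(j-1)),
    y_t = 1 with probability 1/(j+1), else 0."""
    blocks = []
    for j in range(1, J + 1):
        n = 2 ** (j - 1)                        # block length
        blocks.append(rng.random(n) < 1.0 / (j + 1))
    return np.concatenate(blocks)

J = 14
paths = [draw_path(J) for _ in range(200)]

# convergence in probability: the frequency of 1s in the last block is small
last_block = np.array([p[-2 ** (J - 1):].mean() for p in paths])
# no a.s. convergence: almost every path still hits a 1 in the last block
hit_rate = np.mean([p[-2 ** (J - 1):].any() for p in paths])
```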

Although convergence in probability does not imply a.s. convergence, the following result shows how the latter can be obtained from the former.

**Result 1.3. **If *yt* converges in probability to *y*, there exists a subsequence {*ytj*} which converges a.s. to *y* (see, for example, Lukacs 1975, p. 48).

Intuitively, since convergence in probability allows a more erratic behavior in the converging sequence than a.s. convergence, one can obtain the latter by disregarding the erratic elements. The concept of convergence in probability is useful to show weak

consistency of certain estimators.

**Example 1.5. **(i) Let *yt* be a sequence of i.i.d. random variables with *E*(*yt*) = μ < ∞. Then (1/*T*) Σ*t yt* converges a.s. to μ (Kolmogorov strong law of large numbers).

(ii) Let *yt* be a sequence of uncorrelated random variables with *E*(*yt*) = μ and var(*yt*) = σ² < ∞. Then (1/*T*) Σ*t yt* converges in probability to μ (Chebyshev weak law of large numbers).

In example 1.5 strong consistency requires i.i.d. random variables, while for weak consistency we just need a set of uncorrelated random variables with identical means and variances. Note also that weak consistency requires restrictions on the second moments of the sequence which are not needed in the former case.
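A quick numerical check of the Kolmogorov law in example 1.5 (distribution, parameter values, and seed are our own choices) compares the running sample mean with the population mean:

```python
import numpy as np

rng = np.random.default_rng(2)

# Kolmogorov LLN: sample means of i.i.d. draws approach the population mean
y = rng.normal(loc=1.0, scale=2.0, size=100_000)
running_mean = np.cumsum(y) / np.arange(1, y.size + 1)

err_small_T = abs(running_mean[99] - 1.0)    # after 100 observations
err_large_T = abs(running_mean[-1] - 1.0)    # after 100,000 observations
```

With 100,000 draws the sample mean sits within a few hundredths of the population mean of 1, while the error after only 100 draws is typically an order of magnitude larger.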

The analogs of results 1.1 and 1.2 for convergence in probability can be easily obtained.

**Result 1.4. **If *yt* converges in probability to *y*, then *h*(*yt*) converges in probability to *h*(*y*) for any continuous function *h* (see White 1984, p. 23).

**Result 1.5. **Let *h *and, for large *t*, *y*2*t *, uniformly in *t*(see White 1984, p. 25).

Sometimes *yt*, built from components *ej* which are i.i.d., has a limit which is not in the space of i.i.d. variables. In other cases, the limit point may be unknown. For all these cases, we can redefine a.s. convergence and convergence in probability by using the concept of Cauchy sequences.

**Definition 1.4 (convergence a.s. and in probability). **{*yt*(ω)} converges a.s. (in probability) if and only if, for every *ε* > 0, the probability that |*yτ*(ω) − *yt*(ω)| > *ε* for some *τ* > *t* > *T* goes to zero as *T* → ∞.

*1.2.3 Convergence in Lq-Norm *

While a.s. convergence and convergence in probability concern the path of *yt*, *Lq*-convergence refers to the *q*th moment of *yt*. *Lq*-convergence is typically analyzed when *q *= 2 (convergence in mean square), when *q *= 1 (absolute convergence), and when *q *= ∞ (minmax convergence).

**Definition 1.5 (convergence in the Lq-norm). **{*yt*(ω)} converges in the *Lq*-norm (or in the *q*th mean) to *y*(ω) if there exists a *y*(ω) with *E*|*y*|*q* < ∞ such that *E*|*yt*(ω) − *y*(ω)|*q* → 0 as *t* → ∞, for some *q* > 0.

Obviously, if the *q*th moment does not exist, convergence in *Lq *does not apply (i.e., if *yt *is a Cauchy random variable, *Lq*-convergence is meaningless for all *q*), while convergence in probability applies even when moments do not exist. Intuitively, the difference between the two types of convergence lies in the fact that the latter allows the distance between *yt *and *y *to get large faster than the probability gets smaller, while this is not possible with *Lq*-convergence. Consequently, *Lq*-convergence is stronger than convergence in probability.

**Exercise 1.2. **Let *yt *converge to 0 in *Lq*. Show that *yt *converges to 0 in probability. (Hint: use Chebyshev’s inequality.)

The following result provides conditions ensuring that convergence in probability implies *Lq*-convergence.

**Result 1.6. **If *yt* converges to *y* in probability and |*yt*|*q* is uniformly integrable, then *yt* converges to *y* in the *Lq*-norm (Davidson 1994, p. 287).

Hence, convergence in probability, plus the restriction that |*yt*|*q* is uniformly integrable, ensures convergence in the *Lq*-norm. In general, there is no relationship between *Lq* and a.s. convergence. The following example shows that the two concepts are distinct.

**Example 1.6. **Let ω be uniformly distributed on [0, 1] and let *yt*(ω) = *t* for ω ∈ [0, 1/*t*) and *yt*(ω) = 0 for ω ∈ [1/*t*, 1]. Then lim*t*→∞ *yt*(ω) = 0 a.s. Since *yt* is not uniformly integrable, it fails to converge in the *q*th mean for any *q* > 1 (for *q* = 1, *E*|*yt*| = 1, ∀*t*). Hence, the limiting expectation of *yt* differs from its a.s. limit.
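The moments in example 1.6 can be evaluated directly. The snippet below (a sketch; the helper `moment` is ours) computes *E*|*yt*|^*q* = *t*^*q* · (1/*t*) = *t*^(*q*−1): the first moment stays at 1 for every *t* while the second explodes, so the a.s. limit 0 and the limiting moments disagree:

```python
import numpy as np

# Example 1.6: omega ~ U[0,1], y_t(omega) = t if omega < 1/t, else 0.
# For any fixed omega > 0, y_t(omega) = 0 once t > 1/omega: the a.s. limit is 0.
# Yet E|y_t|^q = t**q * (1/t) = t**(q-1), which explodes for q > 1.

def moment(t, q):
    # |y_t|^q equals t**q on a set of probability 1/t, and 0 elsewhere
    return t ** q * (1.0 / t)

ts = np.array([10, 100, 1000])
l1 = [moment(t, 1) for t in ts]     # stays at 1 for every t
l2 = [moment(t, 2) for t in ts]     # grows like t
```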

**Exercise 1.3. **Let

Show that the first and second moments of *yt* converge, but that *yt* does not converge in quadratic mean to 1.

The next result shows that convergence in the *Lq′*-norm obtains when we know that convergence in the *Lq*-norm occurs, *q* > *q′*. The result makes use of Jensen's inequality, which we state next. Let *h* be a concave function and let *y* be a random variable with *E*|*y*| < ∞. Then *E*[*h*(*y*)] ≤ *h*(*E*(*y*)); if *h* is convex, the inequality is reversed.

**Example 1.7. **For

.

**Result 1.7. **Let *q* > *q′* > 0. If *yt* converges to *y* in the *Lq*-norm, then *yt* converges to *y* in the *Lq′*-norm.

**Example 1.8. **and *P*1) = *P*2) = 0.5. Let *yt*1) = (−1)*t*, *yt*2) = (−1)*t*+1 and let *y*1) = *y*2) = 0. Clearly, *yt *converges in the *Lq*. Since *yt *.

*1.2.4 Convergence in Distribution *

**Definition 1.6 (convergence in distribution). **Let {*yt*(ω)} be an *m* × 1 vector of random variables with distribution function *Dt*. {*yt*(ω)} converges in distribution to *y*(ω) if *Dt*(*z*) → *D*(*z*) as *t* → ∞, for every point of continuity *z* of *D*, where *D* is the distribution function of a random variable *y*.

Convergence in distribution is the weakest convergence concept and does not imply, in general, anything about the convergence of a sequence of random variables. Moreover, while the previous three convergence concepts require {*yt*)} and the limit *y*) to be defined on the same probability space, convergence in distribution is meaningful even when this is not the case.

It is useful to characterize the relationship between convergence in distribution and convergence in probability.

**Result 1.8. **If *yt* converges in probability to *y*, then *yt* converges in distribution to *Dy*, where *Dy* is the distribution of a random variable *z* such that *P*[*z* = *y*] = 1 (see Rao 1973, p. 120).

*y *is a continuous function of *y*.

The next two results are handy when demonstrating the limiting properties of a class of estimators in dynamic models. Note that *y*1*t* is *Op*(*t^j*) if there exists an *O*(1) nonstochastic sequence *y*2*t* such that (1/*t^j*)*y*1*t* − *y*2*t* converges in probability to 0, and that *y*2*t* is *O*(1) if, for some 0 < Δ < ∞, there exists a *T* such that |*y*2*t*| < Δ for all *t* ≥ *T*.

**Result 1.9. **If *y*1*t* − *y*2*t* converges in probability to 0 and *y*2*t* converges in distribution to *y*, then *y*1*t* converges in distribution to *y* (Davidson 1994, p. 355). If *y*1*t* converges in distribution to *y* and *y*2*t* converges in probability to a constant *c*, then *y*1*t* + *y*2*t* converges in distribution to *y* + *c* and *y*1*t y*2*t* converges in distribution to *cy* (Rao 1973, p. 123).

Result 1.9 is useful when the distribution of *y*1*t *cannot be determined directly. In fact, if we can find a *y*2*t *with known asymptotic distribution which converges in probability to *y*1*t*, then the distribution of *y*1*t *can automatically be obtained. We will use this result in **chapter 5 when discussing two-step estimators. **

The limiting behavior of continuous functions of sequences which converge in distribution is easy to characterize. In fact, we have the following result.

**Result 1.10. **Suppose *yt* converges in distribution to *y*. If *h* is continuous, then *h*(*yt*) converges in distribution to *h*(*y*) (Davidson 1994, p. 355).

Most of the analysis conducted in this book assumes that observable time series are stationary and have memory which dies out sufficiently fast over time. In some cases we will use alternative and weaker hypotheses which allow for selected forms of nonstationarity and/or for more general memory requirements. This section provides definitions of these concepts and compares various alternatives.

We need two preliminary definitions.

**Definition 1.7 (lag operator). **The lag operator *ℓ* is defined by *ℓyt* = *yt*−1 and *ℓ*^−1 *yt* = *yt*+1. The matrix lag operator *A*(*ℓ*) is defined by *A*(*ℓ*) = *A*0 + *A*1*ℓ* + *A*2*ℓ*² + . . . , where *Aj*, *j* = 0, 1, 2, . . . , are *m* × *m* matrices.

**Definition 1.8 (autocovariance function). **The autocovariance function of {*yt*(ω)} is ACF*t*(*τ*) = *E*[(*yt* − *E*(*yt*))(*yt*−*τ* − *E*(*yt*−*τ*))′], and its autocorrelation function is the autocovariance function scaled by the standard deviations of *yt* and *yt*−*τ*.

In general, both the autocovariance and the autocorrelation functions depend on time and on the gap between *yt *and *yt*−*τ*.

**Definition 1.9 (stationarity 1). **{*yt*(ω)} is stationary if, for any set of paths, the joint distribution of (*yt*1, . . . , *ytj*) is the same as that of (*yt*1+*τ*, . . . , *ytj*+*τ*) for any finite *j* and any *τ*.

A process is stationary if shifting a path over time does not change the probability distribution of that path. In this case, the joint distribution of {*yt*1, . . . , *ytj*} depends only on the distance between the indices and not on calendar time. A weaker concept is the following.

**Definition 1.10 (stationarity 2). **{*yt*(ω)} is covariance (weakly) stationary if *E*(*yt*) is constant, *E*|*yt*|² < ∞, and ACF*t*(*τ*) is independent of *t*.

Definition 1.10 is weaker than 1.9 since it concerns only the first two moments of *yt* rather than its joint distribution. Clearly, a stationary process is weakly stationary, while the converse is true only when the *yt* are normal random variables. In fact, when *yt* is normal, the first two moments completely characterize the distribution of the path.

**Example 1.9. **Let *yt* = *e*1 cos(*ωt*) + *e*2 sin(*ωt*), where *e*1, *e*2 are uncorrelated with mean zero, unit variance, and *ω* ∈ [0, 2π]. Clearly, the mean of *yt* is constant and *E*|*yt*|² < ∞. Also, cov(*yt*, *yt*+*τ*) = cos(*ωt*) cos(*ω*(*t* + *τ*)) + sin(*ωt*) sin(*ω*(*t* + *τ*)) = cos(*ωτ*). Hence, *yt* is covariance stationary.
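A Monte Carlo check of example 1.9 (the value of ω, the lag, and the seed are ours): the covariance between *yt* and *yt*+*τ*, estimated across many draws of (*e*1, *e*2), matches cos(*ωτ*) and does not depend on the date *t*:

```python
import numpy as np

rng = np.random.default_rng(3)

# Example 1.9: y_t = e1*cos(w*t) + e2*sin(w*t), e1 and e2 uncorrelated (0, 1)
w = 2 * np.pi / 20
N = 200_000                           # number of Monte Carlo draws
e1 = rng.standard_normal(N)
e2 = rng.standard_normal(N)

def y(t):
    return e1 * np.cos(w * t) + e2 * np.sin(w * t)

tau = 5
# the covariance depends only on the gap tau, not on the date t
cov_a = np.mean(y(3) * y(3 + tau))
cov_b = np.mean(y(40) * y(40 + tau))
theory = np.cos(w * tau)
```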

**Exercise 1.4. **Suppose *yt* = *et* if *t* is odd and *yt* = *et* + 1 if *t* is even, where *et* ~ i.i.d. (0, 1). Show that *yt* is not stationary. Show also that *yt* = *a* + *yt*−1 + *et*, where *a* is a constant, is not a stationary process, but that Δ*yt* = *yt* − *yt*−1 is stationary.

When {*yt*(ω)} is covariance stationary, its autocovariance function satisfies (i) ACF(0) ≥ 0, (ii) |ACF(*τ*)| ≤ ACF(0), and (iii) ACF(−*τ*) = ACF(*τ*) for all *τ*. Furthermore, if *y*1*t* and *y*2*t* are two stationary uncorrelated processes, *y*1*t* + *y*2*t* is stationary and the autocovariance function of *y*1*t* + *y*2*t* is ACF*y*1(*τ*) + ACF*y*2(*τ*).

**Example 1.10. **Let *yt* = *at* + *Det*, where |*D*| < 1 and *et* ~ i.i.d. (0, σ²). Clearly, *yt* is not covariance stationary since *E*(*yt*) includes the term *at*, which depends on time. Taking first differences we have Δ*yt* = *a* + *D*Δ*et*. Here *E*(Δ*yt*) = *a*, *E*(Δ*yt* − *a*)² = 2*D*²σ² > 0, *E*[(Δ*yt* − *a*)(Δ*yt*−1 − *a*)] = −*D*²σ² < *E*(Δ*yt* − *a*)², and *E*[(Δ*yt* − *a*)(Δ*yt*+1 − *a*)] = −*D*²σ².

**Exercise 1.5. **, where *et *, *a *. Compute the mean and the autocovariance function of *y*2*t*. Is *y*2*t *stationary? Is it covariance stationary?

**Definition 1.11 (autocovariance generating function). **The autocovariance generating function of a covariance stationary {*yt*(ω)} is CGF(*z*) = Σ*τ* ACF(*τ*)*z^τ*, where the sum runs over *τ* = 0, ±1, ±2, . . . , provided that the sum converges for all *z* in an annulus around the unit circle, *D* < |*z*| < *D*^−1, some *D* < 1.

**Example 1.11. **Consider the process *yt* = *et* − *Det*−1 = (1 − *Dℓ*)*et*, |*D*| < 1, *et* ~ i.i.d. (0, σ²). Here ACF(0) = (1 + *D*²)σ², ACF(±1) = −*D*σ², and ACF(*τ*) = 0 for |*τ*| > 1. Hence, CGF(*z*) = σ²[−*Dz* + (1 + *D*²) − *Dz*^−1] = σ²(1 − *Dz*)(1 − *Dz*^−1).

Example 1.11 can be generalized to more complex processes. In fact, if *yt* = *D*(*ℓ*)*et*, then CGF(*z*) = *D*(*z*)Σ*e D*(*z*^−1)′, and this holds for both univariate and multivariate *yt*. One interesting special case occurs when *z* = e^−i*ω* = cos(*ω*) − i sin(*ω*), in which case S(*ω*) = (1/2π) CGF(e^−i*ω*) is the spectral density of *yt*.
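The mapping from the autocovariance generating function to the spectral density can be verified numerically for the MA(1) of example 1.11. The sketch below (helper names are ours; we adopt the 1/2π normalization) evaluates CGF(*z*) at *z* = e^−i*ω* and compares the result with the closed form σ²(1 + *D*² − 2*D* cos *ω*)/2π:

```python
import numpy as np

# Spectral density of the MA(1) y_t = (1 - D*l)e_t via the ACGF evaluated
# at z = exp(-i*omega); D and sigma2 are illustrative choices
D, sigma2 = 0.5, 1.0

def cgf(z):
    # CGF(z) = sigma^2 (1 - D z)(1 - D z^{-1})
    return sigma2 * (1 - D * z) * (1 - D / z)

omega = np.linspace(0.01, np.pi, 200)
S = cgf(np.exp(-1j * omega)).real / (2 * np.pi)

# closed form: sigma^2 (1 + D^2 - 2 D cos(omega)) / (2 pi)
S_closed = sigma2 * (1 + D**2 - 2 * D * np.cos(omega)) / (2 * np.pi)
```

The two agree to machine precision; because *D* > 0, the density is largest at high frequencies, as expected for a negatively autocorrelated MA(1).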

**Exercise 1.6. **Consider *yt* = (1 + 0.5*ℓ* + 0.8*ℓ*²)*et* and (1 − 0.25*ℓ*)*yt* = *et*, where *et* ~ i.i.d. (0, σ²). Are these processes covariance stationary? If so, compute their autocovariance and autocovariance generating functions.

**Exercise 1.7. **Let {*y*1*t*)} be a stationary process and let *h *be an *n *× 1 vector of continuous functions. Show that *y*2*t *= *h*(*y*1*t*) is also stationary.

Stationarity is a weaker requirement than i.i.d., where no dependence between elements of a sequence is allowed, but it is stronger than the identically (not necessarily independently) distributed assumption.

**Example 1.12. **Let *yt* ~ i.i.d. (0, 1), ∀*t*. Since *yt*−*τ* ~ i.i.d. (0, 1), ∀*τ*, any finite subsequence *yt*1+*τ*, . . . , *ytj*+*τ* has the same distribution and therefore *yt* is stationary. It is easy to see that a stationary series is not necessarily i.i.d. For instance, let *yt* = *et* − *Det*−1. If |*D*| < 1, *yt* is stationary but not i.i.d.

**Exercise 1.8. **Give an example of a process which is identically (but not necessarily independently) distributed but nonstationary.

In this book, processes which are stationary will sometimes be indicated with the notation *I*(0), while processes which are stationary after *d *differences will be denoted by *I*(*d*).

A property of stationary sequences which ensures that the sample average converges to the population average is ergodicity (see section 1.4). Ergodicity is typically defined in terms of invariant events.

**Definition 1.12 (ergodicity 1). **Suppose *yt*(ω) = *y*1(*ℓ*^(*t*−1)ω), ∀*t*, where *ℓ* is a shift operator on Ω. Then {*yt*(ω)} is ergodic if every event which is invariant under *ℓ* has probability either 0 or 1.

Under ergodicity, time averages computed along (almost) any path of *yt*(ω) will converge to the same limit. Hence, one path is sufficient to infer the moments of its distribution.

**Example 1.13. **be the length of the interval [*y*0, *yt*, *yt *or 0 so *yt *is not ergodic.

**Example 1.14. **Consider the process *yt* = *et* − 2*et*−1, where *et* ~ i.i.d. (0, σ²). The mean of *yt* is constant and cov(*yt*, *yt*−*τ*) does not depend on *t*. Therefore, the process is covariance stationary. To verify that it is ergodic, consider the sample mean (1/*T*) Σ*t yt*. Its variance converges to zero as *T* → ∞, so the sample mean converges to the population mean of *yt*.

**Example 1.15. **Let *yt* = *e*1 + *e*2*t*, where *e*2*t* ~ i.i.d. (0, 1) and *e*1 ~ i.i.d. (1, 1). Clearly, *yt* is stationary and *E*(*yt*) = 1. However, (1/*T*) Σ*t yt* = *e*1 + (1/*T*) Σ*t e*2*t*, which converges to *e*1 as *T* → ∞. Since the time average of *yt* (equal to *e*1) is different from the population average of *yt* (equal to 1), *yt* is not ergodic.
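Example 1.15 is easy to verify by simulation (the seed, sample length, and number of paths are our choices): each path's time average locks onto that path's draw of *e*1 rather than onto the population mean of 1:

```python
import numpy as np

rng = np.random.default_rng(4)

# Example 1.15: y_t = e1 + e2_t, with e1 drawn once per path.
# Time averages converge to e1, not to the population mean E(y_t) = 1.
T, n_paths = 20_000, 50
e1 = rng.normal(loc=1.0, scale=1.0, size=n_paths)     # one draw per path
e2 = rng.standard_normal((n_paths, T))
time_avgs = (e1[:, None] + e2).mean(axis=1)           # one average per path

# each path's time average sticks to its own e1 ...
max_gap_to_e1 = np.max(np.abs(time_avgs - e1))
# ... so across paths the time averages stay dispersed around 1
dispersion = time_avgs.std()
```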

What is wrong with example 1.15? Intuitively, *yt* is not ergodic because it has too much memory (*e*1 appears in *yt* for every *t*). In fact, for ergodicity to hold, the process must forget its past reasonably fast. The laws of large numbers of section 1.4 give conditions ensuring that the memory of the process is not too strong.

**Exercise 1.9. **Suppose *yt *= 0.6*yt*−1 + 0.2*yt*−2 + *et*, where *et *~ i.i.d. (0, 1). Is *yt *stationary? Is it ergodic? Find the effect of a unitary change in *et *on *yt*+3. Repeat the exercise for *yt *= 0.4*yt*−1 + 0.8*yt*−2 + *et*.

**Exercise 1.10. **Consider the bivariate process:

where *E*(*e*1*t e*1*τ*) = 1 for *τ *= *t *and 0 otherwise, *E*(*e*2*t e*2*τ*) = 2 for *τ *= *t *and 0 otherwise, and *E*(*e*1*t e*2*τ*) = 0 for all *τ*, *t*. What is the limit of this derivative as *τ *→ ∞?

**Exercise 1.11. **Suppose that at *t *is given by

Show that *yt *is stationary but not ergodic. Show that a single path (i.e., a path composed of only 1s and 0s) is ergodic.

**Exercise 1.12. **, where *et *. Show that *yt *is neither stationary nor ergodic. Show that the sequence {*yt*, *yt*+4, *yt*+8, . . . } is stationary and ergodic.

Exercise 1.12 shows an important result: if a process is nonergodic, it may be possible to find a subsequence which is ergodic.

**Exercise 1.13. **Show that if {*y*1*t*)} is ergodic, *y*2*t *= *h*(*y*1*t*) is ergodic if *h *is continuous.

A concept which bears some resemblance to ergodicity is that of mixing.

**Definition 1.13 (mixing 1). **Let F1 and F2 be two Borel algebras and let *F*1 ∈ F1, *F*2 ∈ F2. Then *φ*-mixing and *α*-mixing are defined as follows: *φ*(F1, F2) = sup{|*P*(*F*2|*F*1) − *P*(*F*2)| : *P*(*F*1) > 0} and *α*(F1, F2) = sup |*P*(*F*2 ∩ *F*1) − *P*(*F*2)*P*(*F*1)|.

Here *φ* provides a measure of relative dependence, while *α* measures absolute dependence.

For a stochastic process {*yt*(ω)}, let F*t*−∞ be the Borel algebra generated by values of *yt* from the infinite past up to *t*, and let F∞*t*+*τ* be the Borel algebra generated by values of *yt* from *t* + *τ* on. The former contains information up to *t*; the latter, information from *t* + *τ* on.

**Definition 1.14 (mixing 2). **For a stochastic process {*yt*(ω)}, *φ*(*τ*) = sup*t* *φ*(F*t*−∞, F∞*t*+*τ*) and *α*(*τ*) = sup*t* *α*(F*t*−∞, F∞*t*+*τ*).

*φ*(*τ*) and *α*(*τ*), called respectively uniform and strong mixing, measure how much dependence there is between elements of {*yt*} separated by *τ* periods. If *φ*(*τ*) = *α*(*τ*) = 0, *yt* and *yt*+*τ* are independent. If *φ*(*τ*) → 0 (*α*(*τ*) → 0) as *τ* → ∞, the process is *φ*-mixing (*α*-mixing). Since *φ*(*τ*) ≥ *α*(*τ*), *φ*-mixing implies *α*-mixing.

**Example 1.16. **Let *yt* be a normal process such that cov(*yt*, *yt*−*τ*) = 0 for all *τ* ≥ some *τ*1. Since, for normal random variables, zero covariance implies independence, *yt* and *yt*−*τ* are independent for *τ* ≥ *τ*1. Then *α*(*τ*) → 0 as *τ* → ∞.

**Exercise 1.14. **Give an example of a process for which *φ*(*τ*) does not go to zero as *τ* → ∞.

Mixing is a somewhat stronger memory requirement than ergodicity. Rosenblatt (1978) shows the following result.

**Result 1.11. **Let *yt *be stationary. If *α*(*τ*) → 0 as *τ *→ ∞, *yt *is ergodic.

**Exercise 1.15. **Show that *φ*(*τ*) ≥ *α*(*τ*), so that *φ*(*τ*) → 0 implies *α*(*τ*) → 0 as *τ* → ∞. Conclude that a stationary *φ*-mixing process is ergodic.

Both ergodicity and mixing are hard to verify in practice. A concept which bears some relationship to both and is easier to check is the following.

**Definition 1.15 (asymptotic uncorrelatedness). ***yt*(ω) has asymptotically uncorrelated elements if there exist constants *ρτ* ≥ 0, with Σ*τ*≥0 *ρτ* < ∞, such that corr(*yt*, *yt*−*τ*) ≤ *ρτ* for all *τ* ≥ 0, where var(*yt*) < ∞, ∀*t*.

Intuitively, if we can find an upper bound to the correlation of *yt *and *yt*−*τ*, ∀*τ*, and if the accumulation over *τ *of this bound is finite, the process has a memory that asymptotically dies out.

**Example 1.17. **Let *yt* = *Ayt*−1 + *et*, |*A*| < 1, *et* ~ i.i.d. (0, σ²). Here corr(*yt*, *yt*−*τ*) = *A^τ* and Σ*τ* |*A*|*^τ* = 1/(1 − |*A*|) < ∞, so that *yt* has asymptotically uncorrelated elements.
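A simulation check of example 1.17 (the value of *A*, the sample size, and the seed are ours): sample autocorrelations of a simulated AR(1) at several lags line up with the geometric bounds *A^τ*:

```python
import numpy as np

rng = np.random.default_rng(5)

# Example 1.17: for the AR(1) y_t = A y_{t-1} + e_t with |A| < 1,
# corr(y_t, y_{t-tau}) = A**tau, so the correlation bounds are summable
A, T = 0.8, 400_000
e = rng.standard_normal(T)
y = np.empty(T)
y[0] = e[0]
for t in range(1, T):
    y[t] = A * y[t - 1] + e[t]

def sample_corr(y, tau):
    # sample correlation between y_t and y_{t-tau}
    return np.corrcoef(y[:-tau], y[tau:])[0, 1]

corrs = np.array([sample_corr(y, tau) for tau in (1, 2, 5)])
theory = np.array([A ** tau for tau in (1, 2, 5)])
```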

Note that in definition 1.15 only *τ *> 0 matters. From example 1.17 it is clear that when var(*yt*) is constant and the covariance of *yt *with *yt*−*τ *only depends on *τ*, asymptotic uncorrelatedness is the same as covariance stationarity.

**Exercise 1.16. **as *τ *for some *b *> 0, *τ *sufficiently large.

**Exercise 1.17. **Suppose that *yt *is such that the correlation between *yt *and *yt*−*τ *goes to zero as *τ *→ ∞. Is this sufficient to ensure that *yt *is ergodic?

Instead of assuming stationarity and ergodicity or mixing, one can assume that *yt *satisfies an alternative set of conditions. These conditions considerably broaden the set of time series a researcher can work with.

**Definition 1.16 (martingale). **{*yt*(ω)} is a martingale with respect to the information set F*t* if *yt* is F*t*-measurable, *E*|*yt*| < ∞, and *E*[*yt*+*τ* | F*t*] = *yt* for all *t* and *τ* > 0.

**Definition 1.17 (martingale difference). **{*yt*(ω)} is a martingale difference with respect to the information set F*t* if *E*[*yt*+*τ* | F*t*] = 0 for all *t* and *τ* > 0.

**Example 1.18. **Let *yt* be i.i.d. with *E*(*yt*) = 0. Then *yt* is a martingale difference sequence.

Martingale difference is a much weaker requirement than stationarity and ergodicity since it only involves restrictions on the first conditional moment. It is therefore easy to build examples of processes which are martingale differences but are not stationary.

**Example 1.19. **Suppose that *yt *is i.i.d. with mean zero and variance *t*². Then *yt *is a martingale difference, nonstationary process.
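Example 1.19 can be illustrated numerically (our parameterization: *yt* = *t* · *zt* with *zt* standard normal, so that var(*yt*) = *t*²): cross-sectional means stay near zero at every date, as a martingale difference requires, while the variance grows like *t*²:

```python
import numpy as np

rng = np.random.default_rng(6)

# Example 1.19: y_t = t * z_t with z_t ~ i.i.d. N(0,1) is a martingale
# difference (E[y_t | past] = 0) but nonstationary (var(y_t) = t^2)
n_paths, T = 100_000, 20
z = rng.standard_normal((n_paths, T))
t = np.arange(1, T + 1)
y = t * z                          # broadcasts the scale t across paths

means = y.mean(axis=0)             # close to 0 at every date
variances = y.var(axis=0)          # close to t^2 at every date
```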

**Exercise 1.18. **Let *y*1 be a random variable with *E*|*y*1| < ∞ and let *y*2*t* = *E*[*y*1 | F*t*] be its conditional expectation given the information available at *t*. Show that *y*2*t* is a martingale.

Using the identity *yt* = *E*[*yt* | F*t*−*j*] + (*yt* − *E*[*yt* | F*t*−*j*]), one can write

*yt* = *E*[*yt* | F*t*−*j*] + Σ*i*=0,...,*j*−1 Rev*t*−*i*(*t*),

where Rev*t*−*j*(*t*) = *E*[*yt* | F*t*−*j*] − *E*[*yt* | F*t*−*j*−1] is the one-step-ahead revision in *yt*, made with new information accrued from *t* − *j* − 1 to *t* − *j*. Rev*t*−*j*(*t*) plays an important role in deriving the properties of functions of stationary processes, and will be extensively used in **chapters 4 and 10.**

**Exercise 1.19. **Show that Rev*t*−*j*(*t*) is a martingale difference.

Laws of large numbers provide conditions ensuring that sample averages such as (1/*T*) Σ*t xt*′*xt* or (1/*T*) Σ*t xt*′*et*, which appear in the formulas of OLS or IV estimators, stochastically converge to well-defined limits. Since different conditions apply to different kinds of economic data, we consider here situations which are typically encountered in macro-time series contexts. Given the results of section 1.2, we will describe only strong laws of large numbers, since weak laws of large numbers hold as a consequence.

Laws of large numbers typically come in the following form: given restrictions on the dependence and the heterogeneity of the observations and/or some restrictions on moments, (1/*T*) Σ*t* [*yt* − *E*(*yt*)] converges a.s. to 0.

We will consider three cases: (i) *yt *has dependent and identically distributed elements; (ii) *yt *has dependent and heterogeneously distributed elements; (iii) *yt *has martingale difference elements. To better understand the applicability of each case note that in all cases observations are serially correlated. In the first case we restrict the distribution of the observations to be the same for every *t*; in the second we allow some carefully selected form of heterogeneity (for example, structural breaks in the mean or in the variance or conditional heteroskedasticity); in the third we do not restrict the distribution of the process, but impose conditions on its moments.

*1.4.1 Dependent and Identically Distributed Observations *

To state a law of large numbers (LLN) for stationary processes, we need conditions on the memory of the sequence. Typically, one assumes ergodicity since this implies average asymptotic independence of the elements of the {*yt*)} sequence.

The LLN is then as follows. Let {*yt*(ω)} be stationary and ergodic with *E*|*yt*| < ∞, ∀*t*. Then (1/*T*) Σ*t yt* converges a.s. to *E*(*yt*) (see Stout 1974, p. 181).

To use this law when dealing with econometric estimators, recall that, for any measurable function *h *such that *y*2*t *= *h*(*y*1*t*), *y*2*t *is stationary and ergodic if *y*1*t *is stationary and ergodic.

**Exercise 1.20 (strong consistency of OLS and IV estimators). **Let

, and assume

(i)

(ii)

, where Σ*zx, T *is an *O*(1) random matrix which depends on *T *and has uniformly continuous column rank.

Show that *α*OLS = (*x*′ *x*)−1 (*x*′ *y*) and *α*IV = (*z*′ *x*)−1 (*z*′ *y*) exist a.s. for *T *under (ii). Show that under (ii′) *α*IV exists a.s. for *T *. (Hint: if *An *is a sequence of *k*1× *k *matrices, then *An *has uniformly full column rank if there exists a sequence of *k *× *k *which is uniformly nonsingular.)

*1.4.2 Dependent and Heterogeneously Distributed Observations *

To derive an LLN for dependent and heterogeneously distributed processes, we drop the ergodicity assumption and we substitute it with a mixing requirement. In addition, we need to define the *size *of the mixing conditions.

**Definition 1.18. **Let *a* ≥ 1. If *φ*(*τ*) = *O*(*τ*^−*b*) for *b* > *a*/(2*a* − 1), *φ*(*τ*) is of size *a*/(2*a* − 1). If *a* > 1 and *α*(*τ*) = *O*(*τ*^−*b*) for *b* > *a*/(*a* − 1), *α*(*τ*) is of size *a*/(*a* − 1).

With definition 1.18 one can make precise statements on the memory of the process. In fact, *a *regulates the memory of a process. As *a *→ ∞, the dependence increases while as *a *→ 1, the sequence exhibits less and less serial dependence.

The LLN is the following. Let {*yt*(ω)} be a sequence with *φ*(*τ*) of size *a*/(2*a* − 1) or *α*(*τ*) of size *a*/(*a* − 1), *a* > 1, and *E*(*yt*) < ∞, ∀*t*. If, for some 0 < *b* ≤ *a*, the moments *E*|*yt* − *E*(*yt*)|^(*a*+*b*), appropriately scaled, are summable over *t*, then (1/*T*) Σ*t* [*yt* − *E*(*yt*)] converges a.s. to 0 (see McLeish 1974, theorem 2.10).

In this law, the elements of *yt* are allowed to have time-varying distributions (e.g., *E*(*yt*) may depend on *t*), but the summability condition restricts the moments of the process. Note that, for *a* = 1 and *b* = 1, the above collapses to the Kolmogorov law of large numbers.

The moment condition can be weakened somewhat if we are willing to impose a bound on the (*a *+ *b*)th moment.

**Result 1.12. **Let {*yt*(ω)} be a sequence with *φ*(*τ*) of size *a*/(2*a* − 1) or with *α*(*τ*) of size *a*/(*a* − 1), *a* > 1, such that *E*|*yt*|^(*a*+*b*) < Δ < ∞ for some 0 < *b* ≤ *a* and all *t*. Then (1/*T*) Σ*t* [*yt* − *E*(*yt*)] converges a.s. to 0.

The next result mirrors the one obtained for stationary ergodic processes.

**Result 1.13. **Let *h* be a measurable function of a finite number of elements {*y*1*t*, . . . , *y*1*t*−*τ*}, *τ* finite. If the *φ*(*τ*) (*α*(*τ*)) coefficients of *y*1*t* are *O*(*τ*^−*b*) for some *b* > 0, then the *φ*(*τ*) (*α*(*τ*)) coefficients of *y*2*t* = *h*(*y*1*t*, . . . , *y*1*t*−*τ*) are also *O*(*τ*^−*b*).

From the above result it immediately follows that, if {*zt*, *xt*, *et*} are mixing processes, quantities such as *zt*′*et* or *xt*′*xt* are also mixing processes of the same size.

The following result is useful when observations are heterogeneous.

**Result 1.14. **Let {*yt*(see White 1984, p. 48).

The LLN for processes with asymptotically uncorrelated elements is the following. Let {*yt*(ω)} be a process with asymptotically uncorrelated elements, mean *E*(*yt*), and variance var(*yt*) < Δ < ∞, ∀*t*. Then (1/*T*) Σ*t* [*yt* − *E*(*yt*)] converges to 0.

Compared with result 1.12, we have relaxed the dependence restriction from mixing to asymptotic uncorrelation at the cost of restricting the variances of the process directly, rather than moments of order *a* + *b*, *a* ≥ 1, *b* ≤ *a*.

*1.4.3 Martingale Difference Process *

The LLN for this type of process is the following. Let {*yt*(ω)} be a martingale difference. If, for some *a* ≥ 1, Σ*t* *E*|*yt*|^(2*a*)/*t*^(1+*a*) < ∞, then (1/*T*) Σ*t yt* converges a.s. to 0.

The martingale LLN requires restrictions on the moments of the process which are slightly stronger than those assumed in the case of independent *yt*. The analogue of result 1.12 for martingale differences is the following.

**Result 1.15. **Let {*yt*(ω)} be a martingale difference such that *E*|*yt*|^(2*a*) < Δ < ∞, for some *a* ≥ 1 and all *t*. Then (1/*T*) Σ*t yt* converges a.s. to 0.

**Exercise 1.21. **Suppose {*y*1*t*(ω)} is a martingale difference and *zt* is measurable with respect to F*t*−1. Show that *y*2*t* = *y*1*t zt* is a martingale difference.

**Exercise 1.22. **Let *yt *= *xtα*0 + *et *and assume that *et *is positive and finite. Show that *α*.

There are also several central limit theorems (CLTs) available in the literature. Clearly, their applicability depends on the type of data a researcher has available. In this section we list CLTs for the three cases we have described in section 1.4. Loeve (1977) or White (1984) provide theorems for other relevant cases.

*1.5.1 Dependent and Identically Distributed Observations *

Two conditions are typically employed: *E*(*yt *| ℱ*t*−*τ*) → 0 in quadratic mean as *τ *→ ∞ (referred to as linear regularity) and summability of the variances of the forecast revisions Rev*t*−*τ*(*t*) as *τ *→ ∞. The second condition is obviously stronger than the first one. Restrictions on the variance of the process are needed since, when *yt *is a dependent and identically distributed process, its variance is the sum of the variances of the forecast revisions made at each *t*, and this sum may not converge to a finite limit. We ask the reader to show this in the next two exercises.

**Exercise 1.23. **Show that *yt *can be written as *yt *= Σ*j*≥0 Rev*t*−*j *(*t*), where Rev*t*−*j *(*t*) was defined just before exercise 1.19. Note that this implies that the variance of *yt *is the sum of the variances of the forecast revisions made at each *t *− *j*.

**Exercise 1.24. **Give conditions on *yt *that make *ρτ *independent of *t*, and show that, without further restrictions, the variance of (1/√*T*) Σ*t yt *goes to ∞ as *T *→ ∞.

A sufficient condition ensuring that the variance of (1/√*T*) Σ*t yt *converges is that the autocovariances of *yt *are absolutely summable.

A CLT is then as follows. Let (i) {*yt*} be a stationary and ergodic process with *E*(*yt*) = 0, (ii) *E*(*yt *| ℱ*t*−*τ*) → 0 in quadratic mean as *τ *→ ∞, and (iii) the variances of the forecast revisions be summable. Then (1/√*T*) Σ*t yt *converges in distribution to a normal random variable with mean zero and variance equal to the long-run variance of *yt *(see Gordin 1969).

**Example 1.20. **The conditions of the above CLT may fail. Consider, for example, *yt *= *et *− *et*−1, where *et *is an i.i.d. process. Then (1/√*T*) Σ*t yt *= (*eT *− *e*0)/√*T *→ 0 in probability, so the limiting distribution is degenerate: the long-run variance of *yt *is zero.
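The degeneracy in example 1.20 is easy to see by simulation. The sketch below (Python; sample sizes, replication count, and the function name are illustrative choices) estimates the variance of √*T* times the sample mean of *yt *= *et *− *et*−1 and shows it vanishing as *T* grows.

```python
import numpy as np

rng = np.random.default_rng(0)

def var_of_scaled_mean(T, reps=2000):
    """Monte Carlo variance of sqrt(T)*ybar for y_t = e_t - e_{t-1}."""
    e = rng.standard_normal((reps, T + 1))
    y = e[:, 1:] - e[:, :-1]        # y_t = e_t - e_{t-1}
    ybar = y.mean(axis=1)           # T*ybar = e_T - e_0 (the sum telescopes)
    return np.var(np.sqrt(T) * ybar)

for T in (50, 500, 5000):
    print(T, var_of_scaled_mean(T))  # shrinks roughly like 2/T
```

Since *T* ȳ*T *= *eT *− *e*0, the variance of √*T* ȳ*T *is 2σ²/*T*, which explains the rate at which the printed values decline.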

**Exercise 1.25. **Assume that {*xt*, *et*} satisfy the conditions of the above CLT. Derive the asymptotic distribution of √*T*(*α*OLS − *α*0), where *α*OLS is the OLS estimator of *α*0 in the model *yt *= *xtα*0 + *et *and *T *is the number of observations.

*1.5.2 Dependent Heterogeneously Distributed Observations *

The CLT in this case is the following. Let {*yt*} be a mixing process with *φ*(*τ*) of size *a*/(2(*a *− 1)) or *α*(*τ*) of size *a*/(*a *− 1), *a *> 1, with *E*(*yt*) = 0 and *E*|*yt*|²a < Δ < ∞, ∀*t*, and suppose that the variance of (1/√*T*) Σ*t yt*, computed over any stretch of *T *observations starting at *b *+ 1, converges to σ̄² > 0 uniformly in *b*. Then (1/√*T*) Σ*t yt*/σ̄ converges in distribution to a standard normal random variable as *T *→ ∞ (see White and Domowitz 1984).

As in the previous CLT, we need the condition that the variance of (1/√*T*) Σ*t yt *converges, uniformly in *b*. This is equivalent to imposing that *yt *is asymptotically covariance stationary, that is, that the heterogeneity in *yt *dies out as *T *increases (see White 1984, p. 128).

*1.5.3 Martingale Difference Observations *

The CLT in this case is as follows. Let {*yt*} be a martingale difference with variances σ*t*² = *E*(*yt*²), let F*t *be the distribution function of *yt*, and let σ̄*T*² = (1/*T*) Σ*t *σ*t*² > 0. If, for every ε > 0, (1/*T*) Σ*t *∫|y|>ε√*T*σ̄*T* *y*² dF*t *→ 0 and (1/*T*) Σ*t yt*²/σ̄*T*² → 1 in probability, then (1/√*T*) Σ*t yt*/σ̄*T *converges in distribution to a standard normal random variable (see McLeish 1974).

The last condition is somewhat mysterious: it requires that the average contribution of the extreme tails of the distribution to the variance of *yt *is zero in the limit. If this condition holds, then *yt *satisfies a uniform asymptotic negligibility condition. In other words, none of the elements of {*yt*} can have a variance which dominates the variance of (1/√*T*) Σ*t yt*. We illustrate this condition in the next example.

**Example 1.21. **Let {*yt*} be a martingale difference with var(*yt*) = *t*^*ρ*, 0 < *ρ *< 1, so that Σ*t*≤*T *var(*yt*) grows without bound as *T *→ ∞. Then max*t*≤*T *var(*yt*)/Σ*t*≤*T *var(*yt*) ≈ (1 + *ρ*)/*T *→ 0 as *T *→ ∞ and the asymptotic negligibility condition holds.

The martingale difference assumption allows us to weaken several of the conditions needed to prove a central limit theorem relative to the case of stationary processes, and it will be the assumption used in several parts of this book.

A result, which will become useful in later chapters, concerns the asymptotic distribution of functions of converging stochastic processes.

**Result 1.16. **Suppose the *m *× 1 vector {*yt*} satisfies (*yt *− ȳ)/*at *→ N(0, *Σy*) in distribution, where *Σy *is a symmetric, nonnegative definite matrix and *at *→ 0 as *t *→ ∞. Let *h *be such that each *hj *(*y*) is continuously differentiable at ȳ, and let ∂*h*/∂*y*′, an *n *× *m *matrix, collect the first derivatives. Then (*h*(*yt*) − *h*(ȳ))/*at *→ N(0, (∂*h*/∂*y*′)*Σy*(∂*h*/∂*y*′)′) in distribution.

**Example 1.22. **Suppose *yt *is a scalar satisfying (*yt *− ȳ)/*at *→ N(0, σ²y) in distribution, with ȳ ≠ 0, and let *h*(*y*) = *y*². Then (*yt*² − ȳ²)/*at *→ N(0, 4ȳ²σ²y) in distribution.

A central object in the analysis of time series is the spectral density (or spectrum).

**Definition 1.19 (spectral density). **The spectral density of a stationary {*yt*} process at frequency *ω *is *S*y(*ω*) = (1/2π) Σ*τ *ACF*y*(*τ*)e−i*ωτ*.

We have already mentioned that the spectral density is a reparametrization of the covariance generating function and is obtained by setting *z *= e−i*ω *= cos(*ω*) − i sin(*ω*) in CGF*y*(*z*) and dividing by 2π. Definition 1.19 also shows that the spectral density is the Fourier transform of the autocovariance of *yt*. Hence, the spectral density simply repackages the autocovariances of {*yt*} by using sine and cosine functions as weights, but it can be more useful than the autocovariance function since, for *ω *appropriately chosen, its elements are uncorrelated.

In fact, if we evaluate the spectral density at the Fourier frequencies, i.e., at *ωj *= 2π*j*/*T*, *j *= 1, . . . , *T *− 1, then for any two *ω*1 ≠ *ω*2 the elements of *S*y(*ω*1) are uncorrelated with those of *S*y(*ω*2). Note that Fourier frequencies change with *T*, making recursive evaluation of the spectral density cumbersome.

**Example 1.23. **Evaluate *S*y(*ωj*) at *ωj *= 0. It is easily verified that *S*y(0) = (1/2π) Σ*τ *ACF*y*(*τ*); that is, the spectral density at frequency zero is the (unweighted) sum of all the elements of the autocovariance function. When *ωj *= 2π*j*/*T*, *j *≠ 0, the autocovariances are instead weighted by cos(*ωjτ*) and sin(*ωjτ*). Conversely, ∫ *S*y(*ω*) d*ω *= ACF*y*(0); that is, the variance of the process is the area below the spectral density.
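Both facts — the value at frequency zero and the variance-as-area property — can be checked numerically for an MA(1) process, whose only nonzero autocovariances are at lags 0 and ±1 (the coefficient value below is illustrative).

```python
import numpy as np

# MA(1): y_t = e_t + D*e_{t-1}, var(e_t) = 1 (illustrative parameter)
D = 0.5
acf0, acf1 = 1 + D**2, D                  # autocovariances at lags 0 and +-1

def S(omega):
    """Spectral density (1/2pi) * sum_tau ACF(tau) * exp(-i*omega*tau)."""
    return (acf0 + 2 * acf1 * np.cos(omega)) / (2 * np.pi)

# Frequency zero: (1/2pi) times the unweighted sum of the autocovariances
print(S(0.0), (acf0 + 2 * acf1) / (2 * np.pi))

# Variance = area below the spectral density over [-pi, pi]
w = np.linspace(-np.pi, np.pi, 200001)
area = np.sum(S(w)) * (w[1] - w[0])       # simple Riemann sum
print(area, acf0)                          # both close to 1.25
```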

To understand how the spectral density transforms the autocovariance function, select, for example, *ω *= π/2. Note that cos(0) = 1, cos(π/2) = 0, cos(π) = −1, cos(3π/2) = 0, and that sin(0) = 0, sin(π/2) = 1, sin(π) = 0, sin(3π/2) = −1, and that these values repeat themselves since the sine and cosine functions are periodic. Hence, at *ω *= π/2 only the autocovariances at even lags enter the spectral density, and they do so with alternating signs.

**Exercise 1.26. **Evaluate *S*y(*ω*) at *ω *= π. Which autocovariances enter at frequency π?

For a Fourier frequency, the corresponding period of oscillation is 2π/*ωj *= *T*/*j*.

**Example 1.24. **Suppose you have quarterly data. Then, at the Fourier frequency π/2, the period is equal to 4. That is, at frequency π/2 you have fluctuations with an annual periodicity. Similarly, at the frequency π, the period is 2 so that semiannual cycles are present at π.
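The frequency-to-period mapping in example 1.24 is just 2π/*ω*; a minimal check (Python, function name illustrative):

```python
import numpy as np

def period(omega):
    """Period of oscillation (in sampling units) at frequency omega."""
    return 2 * np.pi / omega

# Quarterly data: pi/2 corresponds to annual cycles, pi to semiannual ones
print(period(np.pi / 2))   # 4.0 quarters
print(period(np.pi))       # 2.0 quarters
```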

**Exercise 1.27. **Business cycles are typically thought to occur with a periodicity between two and eight years. Assuming that you have quarterly data, find the Fourier frequencies characterizing business cycle fluctuations. Repeat the exercise for annual and monthly data.

**Figure 1.1. **(a) Short and (b) long cycles.

Given the formula to calculate the period of oscillation, we can immediately see that low frequencies are associated with cycles of long periods of oscillation, that is, with infrequent shifts from a peak to a trough, and high frequencies with cycles of short periods of oscillation, that is, with frequent shifts from a peak to a trough (see **figure 1.1). Hence, trends (i.e., cycles with an infinite periodicity) are located in the lowest frequencies of the spectrum and irregular fluctuations in the highest frequencies. Since the spectral density is periodic mod(2π) and symmetric around *ω *= 0, it is sufficient to examine *S*y(*ω*) over the interval [0, π]. **

**Exercise 1.28. **Show that the spectral density is symmetric around zero, i.e., *S*y(*ωj*) = *S*y(−*ωj*).

**Example 1.25. **Suppose {*yt*} is an i.i.d. process with ACF*y*(*τ*) = σ² for *τ *= 0 and ACF*y*(*τ*) = 0 otherwise. Then *S*y(*ωj*) = σ²/2π, ∀*ωj*. That is, the spectral density of an i.i.d. process is constant for all *ωj *∈ [0, π].

**Exercise 1.29. **Consider a stationary AR(1) process {*yt*} with autoregressive coefficient equal to 0 ≤ *A *< 1. Calculate the autocovariance function of *yt*. Show that the spectral density is monotonically increasing as *ωj *→ 0.

**Exercise 1.30. **Consider a stationary MA(1) process {*yt*} with MA coefficient equal to *D*. Calculate the autocovariance function and the spectral density of *yt*. Show their shape when *D *> 0 and *D *< 0.

Economic time series have a typical bell-shaped spectral density (see **figure 1.2) with a large portion of the variance concentrated in the lower part of the spectrum. Given the result of exercise 1.29, it is therefore reasonable to posit that most economic time series can be represented with relatively simple AR processes. **
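One way to see why a simple AR process captures this shape: using the standard AR(1) spectral density formula (the coefficient value below is illustrative), the mass of the spectrum is concentrated at low frequencies and decays monotonically toward π.

```python
import numpy as np

def ar1_spectrum(omega, A=0.9, sigma2=1.0):
    """Spectral density of y_t = A*y_{t-1} + e_t, var(e_t) = sigma2."""
    return sigma2 / (2 * np.pi * (1 - 2 * A * np.cos(omega) + A**2))

w = np.linspace(0.01, np.pi, 500)
S = ar1_spectrum(w)
print(S[0] / S[-1])                   # most of the power at low frequencies
print(bool(np.all(np.diff(S) < 0)))   # decreasing over (0, pi) when A > 0
```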

**Figure 1.2. **Spectral density.

The definitions we have given are valid for univariate processes but can be easily extended to vectors of stochastic processes.

**Definition 1.20 (spectral density matrix). **The spectral density of an *m *× 1 vector of stationary processes {*yt*} is the *m *× *m *matrix *S*y(*ω*) = (1/2π) Σ*τ *ACF*y*(*τ*)e−i*ωτ*, where ACF*y*(*τ*) = *E*[(*yt *− *E*(*yt*))(*yt*−*τ *− *E*(*yt*−*τ*))′].

The elements on the diagonal of the spectral density matrix are real while the off-diagonal elements are typically complex. A measure of the strength of the relationship between two series at frequency *ω *is given by the coherence.

**Definition 1.21. **Consider a bivariate stationary process {*y*1*t*, *y*2*t*}. The coherence between {*y*1*t*} and {*y*2*t*} at frequency *ω *is Co(*ω*) = |*S*y1y2(*ω*)|/√(*S*y1(*ω*)*S*y2(*ω*)).

The coherence is the frequency domain version of the correlation coefficient. Note that Co(*ω*) is a real-valued function, where |*y*| indicates the modulus of the complex number *y*.

**Example 1.26. **Suppose *yt *= *D*(*ℓ*)*et*, where *et *is a white noise. It can be immediately verified that the coherence between *et *and *yt *is 1 at all frequencies. Suppose, on the other hand, that Co(*ω*) monotonically declines to 0 as *ω *moves from 0 to π. Then *yt *and *et *have similar low-frequency but different high-frequency components.

**Exercise 1.31. **Suppose that *et *is a white noise and let *yt *= *Ayt*−1 + *et*, 0 ≤ *A *< 1. Calculate the coherence between *yt *and *et*.

Interesting transformations of *yt *can be obtained with the use of filters.

**Definition 1.22. **A filter is a linear transformation of a stochastic process, i.e., if *yt *= *D*(*ℓ*)*et*, where *et *is a white noise, then *D*(*ℓ*) is a filter.

A moving average (MA) process is therefore a filter since a white noise is linearly transformed into another process. In general, stochastic processes can be thought of as filtered versions of some white noise process. To study the spectral properties of filtered processes, let CGF*e*(*z*) be the covariance generating function of *et*. Then the covariance generating function of *yt *is CGF*y*(*z*) = *D*(*z*)*D*(*z*−1) CGF*e*(*z*), so that *S*y(*ω*) = |*D*(e−i*ω*)|² *S*e(*ω*).

**Example 1.27. **Suppose that *et *is a white noise with *S*e(*ω*) = σ²/2π, ∀*ω*. Consider now the process *yt *= *D*(*ℓ*)*et*, where *D*(*ℓ*) = *D*0 + *D*1*ℓ *+ *D*2*ℓ*² + ···. It is usual to interpret *D*(*ℓ*) as the response function of *yt *to a unitary change in *et*. Then *S*y(*ω*) = |*D*(e−i*ω*)|²*S*e(*ω*), where |*D*(e−i*ω*)|² = *D*(e−i*ω*)*D*(ei*ω*) and *D*(e−i*ω*) = Σ*τ Dτ*e−i*ωτ *measures how a unitary change in *et *affects *yt *at frequency *ω*.
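The relation in example 1.27 can be verified numerically: for a one-lag MA filter (coefficients below are illustrative), the squared gain times the flat white noise spectrum coincides with the spectral density computed directly from the autocovariances of *yt*.

```python
import numpy as np

# Filter D(L) = D0 + D1*L applied to unit-variance white noise e_t
D0, D1 = 1.0, 0.5
w = np.linspace(0, np.pi, 400)

# Transfer function D(e^{-iw}) and squared gain |D(e^{-iw})|^2
transfer = D0 + D1 * np.exp(-1j * w)
Sy_filter = np.abs(transfer) ** 2 / (2 * np.pi)   # S_e(w) = 1/(2*pi)

# Same spectrum from the autocovariances of y_t = D0*e_t + D1*e_{t-1}
acf0, acf1 = D0**2 + D1**2, D0 * D1
Sy_acf = (acf0 + 2 * acf1 * np.cos(w)) / (2 * np.pi)

print(np.max(np.abs(Sy_filter - Sy_acf)))   # ~0: the two coincide
```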

**Example 1.28. **Suppose that *yt *= *y*0 + *at *+ *D*(*ℓ*)*et*, where *et *is a white noise. Since *yt *is not stationary, *S*y(*ω*) does not exist. Differencing the process we have *yt *− *yt*−1 = *a *+ *D*(*ℓ*)(*et *− *et*−1) so that *yt *− *yt*−1 is stationary if *et *− *et*−1 is stationary and all the roots of *D*(*ℓ*) are greater than one in absolute value. If these conditions are met, the spectrum of Δ*yt *is *S*Δ*y*(*ω*) = |*D*(e−i*ω*)|²*S*Δ*e*(*ω*).

The function *D*(e−i*ω*) is called the transfer function of the filter. |*D*(e−i*ω*)|², the square modulus of the transfer function, measures the change in the variance of *et *induced by the filter at frequency *ω*; the phase, Ph(*ω*), measures how much the lead–lag relationships in *et *are altered by the filter. Writing *D*(e−i*ω*) = Ga(*ω*)e−iPh(*ω*), where Ga(*ω*) is the gain, we have that Ga(*ω*) = |*D*(e−i*ω*)| measures the change in the amplitude of cycles induced by the filter.

Filtering is an operation frequently performed in everyday life (e.g., tuning a radio on a station filters out all other signals (waves)). Several types of filter are used in modern macroeconomics. **Figure 1.3 presents three general types of filter: a low pass, a high pass, and a band pass. A low pass filter leaves the low frequencies of the spectrum unchanged but wipes out high frequencies. A high pass filter does exactly the opposite. A band pass filter can be thought of as a combination of a low pass and a high pass filter: it wipes out very high and very low frequencies and leaves unchanged frequencies in the middle range. **

**Figure 1.3. **Filters: (a) low pass; (b) high pass; (c) band pass.

Low pass, high pass, and band pass filters are nonrealizable, in the sense that, with samples of finite length, it is impossible to construct objects that look like those of **figure 1.3. The ideal low pass (*D*(*ℓ*)lp), high pass (*D*(*ℓ*)hp), and band pass (*D*(*ℓ*)bp) filters have the following time representations. **

*D*0lp = *ω*0/π, *Dj*lp = sin(*jω*0)/(*j*π), ∀*j *> 0, for some *ω*0 ∈ (0, π).

*D*0hp = 1 − *D*0lp, *Dj*hp = −*Dj*lp, ∀*j *> 0.

*D*0bp = (*ω*2 − *ω*1)/π, *Dj*bp = [sin(*jω*2) − sin(*jω*1)]/(*j*π), ∀*j *> 0, *ω*2 > *ω*1.

When *j *is finite the box-like spectral shape of these filters can only be approximated with a bell-shaped function. This means that relative to the ideal, realizable filters generate a loss of power at the edges of the band (a phenomenon called leakage) and an increase in the importance of the frequencies in the middle of the band (a phenomenon called compression). Approximations to these ideal filters are discussed in **chapter 3. **

**Definition 1.23. **The periodogram of a stationary *yt *process is Pe*y*(*ω*) = (1/2π) Σ*τ *ÂCF*y*(*τ*)e−i*ωτ*, where ÂCF*y*(*τ*) is the estimated autocovariance function and the sum runs over |*τ*| ≤ *T *− 1.

Perhaps surprisingly, the periodogram is an inconsistent estimator of the spectrum (see, for example, Priestley 1981, p. 433). Intuitively, this occurs because it consistently captures the power of *yt *over a band of frequencies but not in each single one of them. To obtain consistent estimates it is necessary to smooth periodogram estimates with a filter. Such a smoothing filter is typically called a kernel.

**Definition 1.24. **A kernel K*T*(*ω*) is a symmetric function of *ω*, integrating to one, such that K*T*(*ω*) → 0 uniformly as *T *→ ∞, for |*ω*| ≥ ε > 0.

Kernels can be applied to both autocovariance and periodogram estimates. When applied to the periodogram, a kernel produces an estimate of the spectrum at frequency *ω *by using a weighted average of the values of the periodogram in a neighborhood of *ω*. Note that this neighborhood shrinks as *T *→ ∞, since in the limit K*T*(*ω*) looks like a δ-function, i.e., it puts all its mass at one point.

There are several types of kernel. Those used in this book are the following.

(1) Box-car (truncated): K*T*(*τ*) = 1 for |*τ*| ≤ *J*(*T*), and 0 otherwise.

(2) Bartlett: K*T*(*τ*) = 1 − |*τ*|/*J*(*T*) for |*τ*| ≤ *J*(*T*), and 0 otherwise.

(3) Parzen: K*T*(*τ*) = 1 − 6(*τ*/*J*(*T*))² + 6(|*τ*|/*J*(*T*))³ for |*τ*| ≤ *J*(*T*)/2, K*T*(*τ*) = 2(1 − |*τ*|/*J*(*T*))³ for *J*(*T*)/2 < |*τ*| ≤ *J*(*T*), and 0 otherwise.

(4) Quadratic spectral: K*T*(*τ*) = [25/(12π²*x*²)][sin(6π*x*/5)/(6π*x*/5) − cos(6π*x*/5)], where *x *= *τ*/*J*(*T*).

Here *J*(*T*) is a truncation point, typically chosen to be a function of the sample size *T*. The quadratic spectral kernel has no truncation point; however, there is a first point at which K*QS *crosses zero (call it *J**(*T*)) and this point plays the same role as *J*(*T*) in the other three kernels.

The Bartlett kernel and the quadratic spectral kernel are the most popular ones. The Bartlett kernel has the shape of a tent with width 2*J*(*T*). To ensure consistency of the spectral estimates, it is standard to select *J*(*T*) so that *J*(*T*)/*T *→ 0 as *T *→ ∞. In **figure 1.4 we have set J(T) = 20. The quadratic spectral kernel has the form of a wave with infinite loops, but after the first crossing, the side loops are small. **
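A lag-window estimator with Bartlett weights can be sketched as follows (Python; the sample size, the choice *J*(*T*) = 20, and the white noise test input are illustrative — for an i.i.d. process the estimate should be roughly flat at σ²/2π):

```python
import numpy as np

def bartlett_spectrum(y, J, omegas):
    """Lag-window spectral estimator with Bartlett weights 1 - |j|/J."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    T = len(y)
    acf = np.array([y[j:] @ y[:T - j] / T for j in range(J)])  # lags 0..J-1
    weights = 1 - np.arange(J) / J
    S = np.empty(len(omegas))
    for k, w in enumerate(omegas):
        S[k] = (acf[0] + 2 * np.sum(weights[1:] * acf[1:] *
                                    np.cos(np.arange(1, J) * w))) / (2 * np.pi)
    return S

# White noise should give a roughly flat estimate near 1/(2*pi) = 0.159
rng = np.random.default_rng(1)
y = rng.standard_normal(2000)
w = np.linspace(0, np.pi, 9)
print(bartlett_spectrum(y, J=20, omegas=w))
```

The Bartlett window also guarantees a nonnegative estimate, which the truncated (box-car) window does not.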

**Exercise 1.32. **Show that the smoothed periodogram estimator of *S*yiyi′(*ω*) is consistent, where the smoothing function is a kernel and *i*, *i*′ = 1, 2.

**Figure 1.4. **(a) Bartlett and (b) quadratic spectral kernels.

While for most of this book we will consider stationary processes, we will deal at times with processes which are only locally stationary (e.g., processes with time-varying coefficients). For these processes, the spectral density is not defined. However, it is possible to define a local spectral density, and practically all the properties we have described also apply to this alternative construction. For details, see Priestley (1981, **chapter 11). **

**Exercise 1.33. **Compute the spectral density of consumption, investment, output, hours, real wage, consumer prices, M1, and the nominal interest rate by using quarterly U.S. data and compute their pairwise coherence with output. Are there any interesting features at business cycle frequencies you would like to emphasize? Repeat the exercise using euro area data. Are there important differences with the United States? (Hint: be careful with potential nonstationarities in the data.)

**¹ A function of h . **

**² A stochastic process could also be defined as a sequence of random variables which are jointly measurable (see, for example, Davidson 1994, p. 177). **

**³ A Borel algebra is the smallest collection of sets containing all the open sets and closed under complementation and countable union. **
