
Stochastic Processes and Wold's Decomposition

Definition 1. A collection of random variables {yt: t ∈ ℜ} is called a stochastic process.


In general, {y(t) : 0 ≤ t < ∞} and {yt : t = ±1, ±2, ...} are used to define a continuous-time and a discrete-time stochastic process, respectively.

Stochastic process: {yt : t = ±1, ±2, ...}

Realization: y1, y2, ..., yT

I. Stationary Stochastic Process

Definition 2. {yt} is said to be a strictly stationary process if for all n, for all (t1, t2, ..., tn), and for all τ,

(y_{t_1}, \ldots, y_{t_n}) \overset{d}{=} (y_{t_1+\tau}, \ldots, y_{t_n+\tau})

where \overset{d}{=} denotes equality in distribution.

Intuitively, stationarity means that the process attains a certain type of statistical equilibrium, so that the distribution of the process does not change over time. The problem is that strict stationarity is rather restrictive and very difficult to verify in practice.

Definition 3. Let {yt} be a stochastic process such that Var (yt) < ∞ for all t. Then the
autocovariance function γ(r,s) of {yt} is defined as,

γ(r, s) = Cov(yr, ys) = E[(yr − E(yr))(ys − E(ys))]

Definition 4. {yt} is said to be weakly stationary (sometimes called covariance stationary or second-order stationary) if,

(i) E(yt) = µ for all t (constant mean)

(ii) Cov(yt, yt−k) = γk for all t and k (the autocovariance is not a function of t but of k, the lag or time difference)

If k = 0, then Cov(yt, yt) = Var(yt) = γ0 for all t, so the mean and variance of a weakly stationary process remain constant over time. Strict stationarity (together with finite second moments) implies weak stationarity, but the converse is not true.

Definition 5. Let {yt} be a stationary process, then

(i) γk = Cov(yt, yt−k) is called the autocovariance function



(ii) ρk = γk / γ0 is called the autocorrelation function

For stationary processes, we expect that both γ(.) and ρ(.) taper off to zero fairly rapidly.
This is an indication of what is known as the short-memory behavior of the series.
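For instance (an illustration, not part of the original notes), a stationary AR(1) process yt = 0.7 yt−1 + εt has ρk = 0.7^k, which dies out geometrically. The short sketch below, assuming numpy is available, simply evaluates this theoretical acf.

```python
import numpy as np

# Theoretical acf of a stationary AR(1) with coefficient 0.7:
# rho_k = 0.7**k tapers to zero quickly (short memory).
phi = 0.7
rho = phi ** np.arange(11)   # lags k = 0, 1, ..., 10
print(np.round(rho, 3))      # 1.0, 0.7, 0.49, 0.343, ...
```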

II. The white noise process

A stochastic process {εt} is called a white noise process if,

E[εt] = 0 ∀t
Var[εt] = σ² ∀t
Cov(εt, εt+k) = γk = 0 ∀k ≠ 0

Thus, a white noise process has autocovariance and autocorrelation functions that are equal to zero at every nonzero lag:

\gamma_k = \begin{cases} \sigma^2 & k = 0 \\ 0 & k \neq 0 \end{cases} \qquad \text{and} \qquad \rho_k = \begin{cases} 1 & k = 0 \\ 0 & k \neq 0 \end{cases}

Remarks on the white noise process

• the process hardly ever occurs in applied time series but plays an important role in constructing time series models
• a white noise process is Gaussian if its joint distribution is multivariate normal
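As an illustration of the definition above (a minimal sketch, not part of the original notes, assuming numpy is available), the following code simulates a Gaussian white noise series and checks that its sample mean, variance, and low-order sample autocorrelations are close to the theoretical values 0, σ², and 0.

```python
import numpy as np

rng = np.random.default_rng(0)
T, sigma = 1000, 2.0

# Gaussian white noise: independent N(0, sigma^2) draws
eps = rng.normal(loc=0.0, scale=sigma, size=T)

print(eps.mean())   # close to 0
print(eps.var())    # close to sigma^2 = 4

# sample autocorrelations at lags 1..5 should be near 0 (within about 2/sqrt(T))
ebar = eps.mean()
denom = np.sum((eps - ebar) ** 2)
for k in range(1, 6):
    rho_k = np.sum((eps[:T - k] - ebar) * (eps[k:] - ebar)) / denom
    print(k, round(rho_k, 3))
```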

III. The sample autocorrelation function (sacf)

In practice, γk and ρk are unknown and they have to be estimated from sample data
(realization).

Let {yt}, t = 1, 2, ..., T, be a given time series and let,

\bar{y} = \frac{\sum_{t=1}^{T} y_t}{T} \qquad \text{and} \qquad \hat{\gamma}_0 = \frac{\sum_{t=1}^{T} (y_t - \bar{y})^2}{T}
The sample autocovariance and sample autocorrelation functions are defined as,

a. \hat{\gamma}_k = \frac{\sum_{t=1}^{T-k} (y_t - \bar{y})(y_{t+k} - \bar{y})}{T - k}

b. \hat{\rho}_k = \frac{\hat{\gamma}_k}{\hat{\gamma}_0} = \frac{\sum_{t=1}^{T-k} (y_t - \bar{y})(y_{t+k} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}

The plot of the sample acf versus k is known as a correlogram.

The sample autocovariance function is biased, especially when k (the lag) is large relative to T (the sample size). For this reason, it is suggested that only about T/4 sample estimates be calculated from the data.
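A minimal sketch of these formulas in Python (assuming numpy; the helper name sample_acf is mine, not from the notes): it computes γ̂k as in (a), ρ̂k as in (b), and returns only the first T/4 lags, as suggested above.

```python
import numpy as np

def sample_acf(y, max_lag=None):
    """Sample autocovariances (gamma_hat) and autocorrelations (rho_hat)
    for lags k = 0, 1, ..., max_lag (default: T // 4)."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    if max_lag is None:
        max_lag = T // 4                      # only about T/4 lags are reliable
    ybar = y.mean()
    denom = np.sum((y - ybar) ** 2)           # sum over t = 1..T of (y_t - ybar)^2
    gamma_hat, rho_hat = [], []
    for k in range(max_lag + 1):
        num = np.sum((y[:T - k] - ybar) * (y[k:] - ybar))
        gamma_hat.append(num / (T - k))       # formula (a)
        rho_hat.append(num / denom)           # formula (b)
    return np.array(gamma_hat), np.array(rho_hat)

# example: correlogram values for a purely random series
rng = np.random.default_rng(1)
gam, rho = sample_acf(rng.normal(size=200))
print(np.round(rho[:6], 3))
```

Plotting the returned rho against the lag k gives the correlogram described above.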

For large T, the sample acf is approximately normally distributed with,

E[\hat{\rho}_k] \cong \rho_k \qquad \text{and} \qquad \mathrm{Var}[\hat{\rho}_k] \cong \frac{1}{T} \sum_{i=-\infty}^{\infty} \left( \rho_i^2 + \rho_{i+k}\rho_{i-k} - 4\rho_k\rho_i\rho_{i-k} + 2\rho_i^2\rho_k^2 \right)
For processes in which ρk = 0 for k > m, Bartlett approximated the variance as,

\mathrm{Var}(\hat{\rho}_k) \cong \frac{1 + 2\rho_1^2 + 2\rho_2^2 + 2\rho_3^2 + \cdots + 2\rho_m^2}{T}, \qquad k > m
Since the ρ’s are unknown, we use the following large-lag variance,

\widehat{\mathrm{Var}}(\hat{\rho}_k) = S^2_{\hat{\rho}_k} = \frac{1 + 2\hat{\rho}_1^2 + 2\hat{\rho}_2^2 + 2\hat{\rho}_3^2 + \cdots + 2\hat{\rho}_m^2}{T}
To test for a white noise process, H0: ρ1 = ρ2 = …= ρk = 0, we use the following standard
error (under the null hypothesis),

S_{\hat{\rho}_k} = \frac{1}{\sqrt{T}}
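As an illustrative sketch (not from the notes; assuming numpy, with hypothetical helper names), the functions below compute the large-lag standard error and compare each sample autocorrelation with the ±2/√T band implied by the null hypothesis of white noise.

```python
import numpy as np

def large_lag_se(rho_hat, k, T):
    """Large-lag standard error S for rho_hat_k, using the estimated
    autocorrelations at lags 1, ..., m with m = k - 1 (an assumption
    about how m is chosen in practice)."""
    return np.sqrt((1.0 + 2.0 * np.sum(np.asarray(rho_hat[1:k]) ** 2)) / T)

def white_noise_check(rho_hat, T):
    """Compare each rho_hat_k with +/- 2 / sqrt(T), the approximate 95%
    band under H0: rho_1 = ... = rho_k = 0."""
    se = 1.0 / np.sqrt(T)
    for k, r in enumerate(rho_hat[1:], start=1):
        verdict = "outside" if abs(r) > 2 * se else "inside"
        print(f"lag {k}: rho_hat = {r: .3f}  ({verdict} the +/-{2 * se:.3f} band)")
```

For example, the rho array returned by the sample_acf sketch above can be passed directly: white_noise_check(rho, T=200).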

IV. The partial autocorrelation function, φkk (pacf)

φkk = Corr(yt, yt+k | yt+1, yt+2, ..., yt+k−1)

The pacf is the correlation between yt and yt+k after their mutual linear dependence on the intervening variables yt+1, yt+2, ..., yt+k−1 has been removed.

Consider the regression model where yt+k (the dependent variable), from a zero mean
stationary process, is regressed on k lagged variables yt+k-1, yt+k-2,…, yt.

yt+k = φk1 yt+k−1 + φk2 yt+k−2 + ... + φkk yt + εt+k        (1)

where φkj denotes the jth regression parameter and εt+k is a normal error term uncorrelated with yt+k−j for j ≥ 1. The last coefficient, φkk, is the partial autocorrelation at lag k.

V. The sample partial autocorrelation function

To compute the sample pacf, a recursive method is used:

\hat{\phi}_{11} = \hat{\rho}_1, \qquad \hat{\phi}_{k+1,k+1} = \frac{\hat{\rho}_{k+1} - \sum_{j=1}^{k} \hat{\phi}_{kj}\,\hat{\rho}_{k+1-j}}{1 - \sum_{j=1}^{k} \hat{\phi}_{kj}\,\hat{\rho}_j}

where \hat{\phi}_{k+1,j} = \hat{\phi}_{kj} - \hat{\phi}_{k+1,k+1}\,\hat{\phi}_{k,k+1-j}, \quad j = 1, 2, \ldots, k
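This recursion (the Durbin-Levinson recursion) is easy to implement directly. The sketch below is illustrative, not from the notes; the helper name is hypothetical, and the input is the vector of sample autocorrelations ρ̂0 = 1, ρ̂1, ..., ρ̂K.

```python
import numpy as np

def sample_pacf(rho_hat):
    """Sample partial autocorrelations phi_hat_{kk}, k = 1, ..., K, computed
    from sample autocorrelations rho_hat = [1, rho_1, ..., rho_K] (K >= 1)
    via the recursion above."""
    rho = np.asarray(rho_hat, dtype=float)
    K = len(rho) - 1
    phi = np.zeros((K + 1, K + 1))            # phi[k, j] stores phi_hat_{kj}
    phi[1, 1] = rho[1]                        # phi_hat_{11} = rho_hat_1
    for k in range(1, K):
        num = rho[k + 1] - np.sum(phi[k, 1:k + 1] * rho[k:0:-1])
        den = 1.0 - np.sum(phi[k, 1:k + 1] * rho[1:k + 1])
        phi[k + 1, k + 1] = num / den
        # update phi_hat_{k+1, j} for j = 1, ..., k
        for j in range(1, k + 1):
            phi[k + 1, j] = phi[k, j] - phi[k + 1, k + 1] * phi[k, k + 1 - j]
    return np.array([phi[k, k] for k in range(1, K + 1)])

# example: the pacf of an AR(1) with rho_k = 0.7**k cuts off after lag 1
rho = 0.7 ** np.arange(6)
print(np.round(sample_pacf(rho), 3))          # approx. 0.7, 0, 0, 0, 0
```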

Under the hypothesis that the underlying process is a white noise process, the variance of
the sample pacf can be approximated by (due to Bartlett),

\widehat{\mathrm{Var}}(\hat{\phi}_{kk}) \cong \frac{1}{T}

VI. Wold’s Decomposition

Proposition (Wold's decomposition). Any zero-mean covariance-stationary process {yt}, t ∈ (−∞, ∞), can be represented in the form


y_t = \sum_{j=0}^{\infty} \varphi_j \varepsilon_{t-j} + \kappa_t = \Psi(L)\,\varepsilon_t + \kappa_t

where \varphi_0 = 1, \quad \sum_{j=0}^{\infty} \varphi_j^2 < \infty, \quad \text{and} \quad \Psi(L) = \sum_{j=0}^{\infty} \varphi_j L^j

The term εt is white noise and represents the error made in forecasting yt on the basis of a linear function of lagged values of y,

εt = yt − Ê(yt | yt−1, yt−2, ...)

The term κt is called the linearly deterministic component of yt (for example, a function of time t), while Σj ϕj εt−j is called the linearly non-deterministic component. If κt = 0, then the process is called purely linearly non-deterministic.

Remarks:

• the proposition was first proved by Wold (1938)


• the proposition relies on stable second moments of yt but makes no use of
other higher moments
• finding the Wold representation requires fitting an infinite number of parameters (ϕ1, ϕ2, ...) to the data, which is impossible with a finite number of observations (y1, y2, ..., yT)
• an alternative is to represent Ψ(L) as the ratio of two finite-order polynomials:

\Psi(L) = \sum_{j=0}^{\infty} \varphi_j L^j = \frac{\theta(L)}{\phi(L)} = \frac{1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q}{1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p}
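To make this concrete, the illustrative sketch below (not from the notes; assuming numpy, with a hypothetical helper name) recovers the first ψ weights implied by θ(L)/φ(L) from the identity φ(L)Ψ(L) = θ(L), matching coefficients of L^j.

```python
import numpy as np

def psi_weights(phi, theta, n):
    """First n + 1 weights psi_0, ..., psi_n of Psi(L) = theta(L) / phi(L), where
    phi(L)   = 1 - phi_1 L - ... - phi_p L^p and
    theta(L) = 1 + theta_1 L + ... + theta_q L^q.
    Matching coefficients of L^j in phi(L) * Psi(L) = theta(L) gives
    psi_j = theta_j + sum over i = 1..min(j, p) of phi_i * psi_{j-i}."""
    phi, theta = np.asarray(phi, float), np.asarray(theta, float)
    psi = np.zeros(n + 1)
    psi[0] = 1.0                              # psi_0 = 1
    for j in range(1, n + 1):
        th = theta[j - 1] if j <= len(theta) else 0.0
        ar = sum(phi[i - 1] * psi[j - i] for i in range(1, min(j, len(phi)) + 1))
        psi[j] = th + ar
    return psi

# ARMA(1,1) with phi_1 = 0.5 and theta_1 = 0.4:
# psi_j = (phi_1 + theta_1) * phi_1**(j - 1) = 0.9, 0.45, 0.225, ... for j >= 1
print(psi_weights([0.5], [0.4], 5))
```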

VII. Box-Jenkins Modeling Philosophy

The intuition behind the philosophy is the benefit of parsimony, that is, using as few parameters as possible.

• Box and Jenkins (1976) have been influential advocates of this philosophy

• since the parameters in θ(L) and φ(L) must be estimated from the sample data, the more parameters there are, the more room there is for estimation error

Box-Jenkins Model Building Approach

1. Transform the data, if necessary, so that the assumption of covariance stationarity is a reasonable one.

a. to make the mean stationary, transformation using differencing is the usual approach:

yt* = (1 − L)^d (1 − L^s)^D yt, where

(1 − L)^d is regular differencing to account for trend; d is the order of differencing

(1 − L^s)^D is seasonal differencing to account for seasonality; s is the length of the season and D is the order of seasonal differencing

usually, d = 1 and D = 1 (if there is seasonality)

b. to make the variance stationary, transformation using the natural logarithm is the usual approach:

yt* = log(yt)

We normally combine (a) and (b), giving yt* = (1 − L)^d (1 − L^s)^D log(yt). (The sketch after this list illustrates steps 1 through 4.)

2. Make an initial guess of small values for p and q for an ARMA(p,q) model
that might describe the transformed series.
3. Estimate the parameters in θ(L) and φ(L).
4. Perform diagnostic analysis to confirm that the model is indeed consistent
with the observed features of the data.

a. {εt} is a white noise process with mean 0 and constant variance σε²
b. εt is normally distributed
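A rough end-to-end sketch of these steps (an illustration, not from the notes, assuming pandas and statsmodels are available; the placeholder series y, the monthly seasonality s = 12, and the ARMA(1,1) guess are my own assumptions):

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.stats.diagnostic import acorr_ljungbox

# placeholder data: a positive, trending monthly series
rng = np.random.default_rng(2)
y = pd.Series(np.exp(np.cumsum(rng.normal(0.01, 0.05, size=240))))

# Step 1: transform, using the log to stabilize the variance, then regular
# (d = 1) and seasonal (D = 1, s = 12) differencing to stabilize the mean
y_star = np.log(y).diff(1).diff(12).dropna()

# Steps 2-3: guess small p and q, then estimate the parameters
result = ARIMA(y_star, order=(1, 0, 1)).fit()    # ARMA(1,1) on the transformed series
print(result.summary())

# Step 4: diagnostics, since the residuals should behave like white noise
resid = result.resid
print(acorr_ljungbox(resid, lags=[12]))          # test H0: residual autocorrelations = 0
print(resid.mean(), resid.var())                 # roughly 0 and constant
```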
