Definition 2. $\{y_t\}$ is said to be a strictly stationary process if for all $n$, for all $(t_1, t_2, \ldots, t_n)$, and for all $\tau$,
$$(y_{t_1}, \ldots, y_{t_n}) \overset{d}{=} (y_{t_1+\tau}, \ldots, y_{t_n+\tau})$$
Intuitively, stationarity means that the process attains a certain type of statistical equilibrium: the joint distribution of the process does not change over time. The problem is that strict stationarity is rather restrictive and very difficult to verify in practice.
Definition 3. Let $\{y_t\}$ be a stochastic process such that $\mathrm{Var}(y_t) < \infty$ for all $t$. Then the autocovariance function $\gamma(r,s)$ of $\{y_t\}$ is defined as
$$\gamma(r,s) = \mathrm{Cov}(y_r, y_s) = E\big[(y_r - E(y_r))(y_s - E(y_s))\big]$$
For a weakly stationary process, $\gamma(r,s)$ depends only on the lag $k = r - s$, so we write $\gamma_k = \mathrm{Cov}(y_t, y_{t+k})$. If $k = 0$, then $\gamma_0 = \mathrm{Cov}(y_t, y_t) = \mathrm{Var}(y_t)$ for all $t$. The means and variances of a stationary process therefore remain constant over time. Strict stationarity (with finite second moments) implies weak stationarity, but the converse is not true.
(ii) $\rho_k = \dfrac{\gamma_k}{\gamma_0}$ is called the autocorrelation function.
For stationary processes, we expect that both γ(.) and ρ(.) taper off to zero fairly rapidly.
This is an indication of what is known as the short-memory behavior of the series.
A white noise process $\{\varepsilon_t\}$ satisfies
$$E[\varepsilon_t] = 0 \quad \forall t, \qquad \mathrm{Var}[\varepsilon_t] = \sigma^2 \quad \forall t, \qquad \mathrm{Cov}(\varepsilon_t, \varepsilon_{t+k}) = \gamma_k = 0 \quad \forall k \neq 0$$
Thus, a white noise process has autocovariance and autocorrelation functions that are zero at every nonzero lag:
$$\gamma_k = \begin{cases} \sigma^2 & k = 0 \\ 0 & k \neq 0 \end{cases} \qquad \text{and} \qquad \rho_k = \begin{cases} 1 & k = 0 \\ 0 & k \neq 0 \end{cases}$$
• the process rarely occurs in applied time series but plays an important role in constructing time series models
• a white noise process is Gaussian if its joint distribution is normal
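As a quick check of these properties, here is a minimal simulation sketch in NumPy (the sample size, seed, and $\sigma$ below are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)        # arbitrary seed for reproducibility
T, sigma = 1000, 1.5                  # illustrative sample size and std. dev.
eps = rng.normal(0.0, sigma, size=T)  # Gaussian white noise

print(eps.mean())                     # close to 0
print(eps.var())                      # close to sigma**2 = 2.25
# lag-1 sample autocovariance: close to 0 for white noise
dev = eps - eps.mean()
print(np.mean(dev[:-1] * dev[1:]))
```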
In practice, γk and ρk are unknown and they have to be estimated from sample data
(realization).
The sample mean and the sample variance are computed as
$$\bar{y} = \frac{\sum_{t=1}^{T} y_t}{T} \qquad \text{and} \qquad \hat{\gamma}_0 = \frac{\sum_{t=1}^{T} (y_t - \bar{y})^2}{T}$$
The sample auto-covariance and sample autocorrelation functions are defined as,
$$\text{a. } \hat{\gamma}_k = \frac{\sum_{t=1}^{T-k} (y_t - \bar{y})(y_{t+k} - \bar{y})}{T - k}$$
$$\text{b. } \hat{\rho}_k = \frac{\hat{\gamma}_k}{\hat{\gamma}_0} = \frac{\sum_{t=1}^{T-k} (y_t - \bar{y})(y_{t+k} - \bar{y})}{\sum_{t=1}^{T} (y_t - \bar{y})^2}$$
The sample autocovariance function is biased, especially when $k$ (the lag) is large relative to $T$ (the sample size). For this reason, it is suggested that only the first $T/4$ sample estimates be calculated from the data.
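As a minimal sketch of these estimators (plain NumPy; the function name `sample_acf` is an illustrative choice, and $\hat{\gamma}_k$ uses the $T-k$ divisor of formula a.):

```python
import numpy as np

def sample_acf(y, max_lag):
    """Sample autocovariances (divisor T - k, as in formula a.) and
    sample autocorrelations rho_k (ratio-of-sums form b.) for k = 0..max_lag."""
    y = np.asarray(y, dtype=float)
    T = len(y)
    dev = y - y.mean()
    ss = np.sum(dev**2)                      # sum of squared deviations
    gammas, rhos = [], []
    for k in range(max_lag + 1):
        num = np.sum(dev[:T - k] * dev[k:])  # sum over t = 1..T-k
        gammas.append(num / (T - k))
        rhos.append(num / ss)
    return np.array(gammas), np.array(rhos)
```

For example, `gam, rho = sample_acf(eps, 10)` applied to the simulated white noise above should give sample autocorrelations close to zero at all lags $k \geq 1$.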
The moments of $\hat{\rho}_k$ satisfy, approximately,
$$E[\hat{\rho}_k] \cong \rho_k \qquad \text{and} \qquad \mathrm{Var}[\hat{\rho}_k] \cong \frac{1}{T} \sum_{i=-\infty}^{\infty} \left( \rho_i^2 + \rho_{i+k}\,\rho_{i-k} - 4\rho_k \rho_i \rho_{i-k} + 2\rho_k^2 \rho_i^2 \right)$$
For processes in which $\rho_k = 0$ for $k > m$, Bartlett approximated the variance as
$$\mathrm{Var}(\hat{\rho}_k) \cong \frac{1 + 2\rho_1^2 + 2\rho_2^2 + \cdots + 2\rho_m^2}{T}, \qquad k > m$$
Since the ρ’s are unknown, we use the following large-lag variance,
$$\mathrm{Var}(\hat{\rho}_k) = S_{\hat{\rho}_k}^2 = \frac{1 + 2\hat{\rho}_1^2 + 2\hat{\rho}_2^2 + \cdots + 2\hat{\rho}_m^2}{T}$$
To test for a white noise process, H0: ρ1 = ρ2 = …= ρk = 0, we use the following standard
error (under the null hypothesis),
$$S_{\hat{\rho}_k} = \frac{1}{\sqrt{T}}$$
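A hedged sketch of this test (NumPy; the $\pm 2$ standard-error band, roughly a 5% level, is a common convention, and the function name is illustrative):

```python
import numpy as np

def white_noise_flags(rho_hat, T):
    """Flag sample autocorrelations outside +/- 2 standard errors,
    where SE = 1/sqrt(T) under H0: rho_1 = ... = rho_k = 0."""
    se = 1.0 / np.sqrt(T)
    return np.abs(np.asarray(rho_hat)) > 2.0 * se

# e.g., flags = white_noise_flags(rho[1:], T=len(eps))  # exclude lag 0
```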
The pacf is the correlation between $y_t$ and $y_{t+k}$ after the mutual linear dependency on the intervening variables $y_{t+1}, y_{t+2}, \ldots, y_{t+k-1}$ has been removed.
Consider the regression model where $y_{t+k}$ (the dependent variable), from a zero-mean stationary process, is regressed on the $k$ lagged variables $y_{t+k-1}, y_{t+k-2}, \ldots, y_t$:
$$y_{t+k} = \phi_{k1} y_{t+k-1} + \phi_{k2} y_{t+k-2} + \cdots + \phi_{kk} y_t + \varepsilon_{t+k} \qquad (1)$$
where $\phi_{kj}$ denotes the $j$th regression parameter and $\varepsilon_{t+k}$ is the normal error term that is uncorrelated with $y_{t+k-j}$ for $j \geq 1$. The last coefficient, $\phi_{kk}$, is the partial autocorrelation at lag $k$.
To compute the sample pacf, a recursive method is used. Knowing that
$$\hat{\phi}_{11} = \hat{\rho}_1$$
we obtain
$$\hat{\phi}_{k+1,k+1} = \frac{\hat{\rho}_{k+1} - \sum_{j=1}^{k} \hat{\phi}_{kj}\, \hat{\rho}_{k+1-j}}{1 - \sum_{j=1}^{k} \hat{\phi}_{kj}\, \hat{\rho}_j}$$
together with the updating equation $\hat{\phi}_{k+1,j} = \hat{\phi}_{kj} - \hat{\phi}_{k+1,k+1}\, \hat{\phi}_{k,k+1-j}$ for $j = 1, \ldots, k$.
Under the hypothesis that the underlying process is white noise, the variance of the sample pacf can be approximated (a result due to Bartlett) by
$$\widehat{\mathrm{Var}}(\hat{\phi}_{kk}) \cong \frac{1}{T}$$
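The recursion translates into the following sketch (a Durbin-Levinson-style implementation; it assumes `rho_hat` comes from something like the `sample_acf` helper above, with `rho_hat[0] = 1`):

```python
import numpy as np

def sample_pacf(rho_hat, max_lag):
    """Sample pacf phi_kk for k = 1..max_lag from sample autocorrelations."""
    phi = np.zeros((max_lag + 1, max_lag + 1))
    pacf = np.zeros(max_lag + 1)
    phi[1, 1] = pacf[1] = rho_hat[1]            # phi_11 = rho_1
    for k in range(1, max_lag):
        num = rho_hat[k + 1] - sum(phi[k, j] * rho_hat[k + 1 - j]
                                   for j in range(1, k + 1))
        den = 1.0 - sum(phi[k, j] * rho_hat[j] for j in range(1, k + 1))
        phi[k + 1, k + 1] = pacf[k + 1] = num / den
        for j in range(1, k + 1):               # updating equation
            phi[k + 1, j] = phi[k, j] - phi[k + 1, k + 1] * phi[k, k + 1 - j]
    return pacf[1:]
```

Each $|\hat{\phi}_{kk}|$ can then be compared against $2/\sqrt{T}$ using the variance approximation above.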
By Wold's decomposition, a covariance-stationary process can be written as
$$y_t = \sum_{j=0}^{\infty} \psi_j \varepsilon_{t-j} + \kappa_t = \Psi(L)\varepsilon_t + \kappa_t$$
where $\psi_0 = 1$, $\sum_{j=0}^{\infty} \psi_j^2 < \infty$, and $\Psi(L) = \sum_{j=0}^{\infty} \psi_j L^j$.
The term $\varepsilon_t$ is the white noise and represents the error made in forecasting $y_t$ on the basis of a linear function of lagged $y_t$:
$$\varepsilon_t = y_t - \hat{E}(y_t \mid y_{t-1}, y_{t-2}, \ldots)$$
Remarks: For an ARMA($p$, $q$) model, the $\psi$-weights are generated by the ratio of the moving-average and autoregressive polynomials,
$$\Psi(L) = \sum_{j=0}^{\infty} \psi_j L^j = \frac{\theta(L)}{\phi(L)} = \frac{1 + \theta_1 L + \theta_2 L^2 + \cdots + \theta_q L^q}{1 - \phi_1 L - \phi_2 L^2 - \cdots - \phi_p L^p}$$
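As an illustrative sketch (NumPy; the ARMA(1,1) coefficients at the end are made-up values), the $\psi$-weights can be generated recursively from $\psi_j = \theta_j + \sum_{i=1}^{\min(j,p)} \phi_i \psi_{j-i}$, which follows from matching coefficients of $L^j$ in $\phi(L)\Psi(L) = \theta(L)$:

```python
import numpy as np

def psi_weights(phi, theta, n):
    """First n psi-weights of Psi(L) = theta(L) / phi(L), with
    phi(L) = 1 - phi_1 L - ... - phi_p L^p and
    theta(L) = 1 + theta_1 L + ... + theta_q L^q."""
    psi = np.zeros(n)
    psi[0] = 1.0                                # psi_0 = 1
    for j in range(1, n):
        th = theta[j - 1] if j <= len(theta) else 0.0
        ar = sum(phi[i - 1] * psi[j - i] for i in range(1, min(j, len(phi)) + 1))
        psi[j] = th + ar
    return psi

# hypothetical ARMA(1,1) with phi_1 = 0.5, theta_1 = 0.3:
print(psi_weights([0.5], [0.3], 6))             # [1.0, 0.8, 0.4, 0.2, 0.1, 0.05]
```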
The intuition behind this philosophy is the benefit of parsimony, i.e., using as few parameters as possible.
• Box and Jenkins (1976) have been influential advocates of this philosophy
• since the parameters in θ(L) and φ(L) are only estimated from the sample data, the more parameters there are, the more room there is for estimation error
$$y_t^* = \log(y_t)$$
We normally combine (a) and (b); thus we have
$$y_t^* = (1 - L)^d (1 - L^s)^D \log(y_t)$$
(a code sketch of this transformation appears after the list below).
2. Make an initial guess of small values for p and q for an ARMA(p,q) model
that might describe the transformed series.
3. Estimate the parameters in θ(L) and φ(L).
4. Perform diagnostic analysis to confirm that the model is indeed consistent
with the observed features of the data.
In particular, the residuals from the fitted model should satisfy:
a. $\{\varepsilon_t\}$ is a white noise process with mean 0 and constant variance $\sigma_\varepsilon^2$
b. $\varepsilon_t$ is normally distributed
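And a brief sketch of the step-1 transformation (pandas; the default orders $d = 1$, $D = 1$ and seasonal period $s = 12$ are illustrative assumptions, not prescriptions):

```python
import numpy as np
import pandas as pd

def transform(y, d=1, D=1, s=12):
    """y* = (1 - L)^d (1 - L^s)^D log(y_t): take logs, then apply
    D seasonal differences and d regular differences."""
    y_star = np.log(pd.Series(y, dtype=float))
    for _ in range(D):
        y_star = y_star.diff(s)     # seasonal difference (1 - L^s)
    for _ in range(d):
        y_star = y_star.diff()      # regular difference (1 - L)
    return y_star.dropna()
```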