
Problem Set 4

Sergi Quintana, Joseph Emmens, Ignasi Merediz Solà - Econometrics II


May 16, 2020

Question 1
Estimate ARCH(1), ARCH(4) and GARCH(1,1) models using the nysewk data.

We have done so in the accompanying MatLab script. First we compute the weekly
percentage growth so as to measure volatility. The data are shown in Figure 1,

Figure 1: Weekly % growth

A quick visual assessment shows signs that the data is covariance stationary.

A GARCH model takes two parameters, p and q, where p is the number of lagged variances
and q the number of lagged squared residuals. For p = 0, a GARCH model reduces to an
ARCH(q) process, e.g. GARCH(0,2) = ARCH(2).

a)
Check that stationarity restrictions hold. Compare likelihood values. Which of the three
models do you prefer? But do the models have the same number of parameters?
Let us first review what an ARCH(q) and a GARCH(p,q) model consist of:
ARCH(q):

$y_t = \mu + \rho y_{t-1} + \epsilon_t$

$\epsilon_t = \sigma_t u_t$

$\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}^2$

GARCH(p,q):

$y_t = \mu + \rho y_{t-1} + \epsilon_t$

$\epsilon_t = \sigma_t u_t$

$\sigma_t^2 = \omega + \sum_{i=1}^{q} \alpha_i \epsilon_{t-i}^2 + \sum_{i=1}^{p} \beta_i \sigma_{t-i}^2$
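To make the variance recursion concrete, the following Matlab sketch evaluates the conditional variance of a GARCH(p,q) process for given parameters; the function name and arguments are hypothetical and do not appear in our submitted script. Passing beta = [] gives the pure ARCH(q) case.

```matlab
function sigma2 = garch_variance(e, omega, alpha, beta, sigma2_1)
% Conditional variance recursion:
%   sigma2_t = omega + sum_i alpha_i*e_{t-i}^2 + sum_j beta_j*sigma2_{t-j}
% Pass beta = [] for the pure ARCH(q) case, i.e. GARCH(0,q) = ARCH(q).
e = e(:); alpha = alpha(:); beta = beta(:);
q = numel(alpha); p = numel(beta); n = numel(e);
sigma2 = sigma2_1 * ones(n, 1);                    % start the recursion at sigma2_1
for t = max(p, q) + 1 : n
    s = omega + sum(alpha .* e(t-1:-1:t-q).^2);    % ARCH terms
    if p > 0
        s = s + sum(beta .* sigma2(t-1:-1:t-p));   % GARCH terms
    end
    sigma2(t) = s;
end
end
```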
We have then estimated each model separately. Before presenting the estimates, however,
it is important to define what we use as $y_t$. Following the "Garch11Example.jl" script,

$y_t = 100 \times \log\left(\frac{p_t}{p_{t-1}}\right)$

where $p_t$ is the price at time t. That is, $y_t$ is the return of this market, defined as the
logarithmic difference of the price between two consecutive periods and multiplied by 100
so that it is expressed as a percentage growth.
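In Matlab this amounts to a single line (a sketch; p and y are illustrative names for the price and return series):

```matlab
% Weekly percentage growth: y_t = 100 * log(p_t / p_{t-1})
y = 100 * diff(log(p));   % one observation shorter than the price series
```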
ARCH(1):
Following the suggestion in the "Garch11Example.jl" script, we have defined $\sigma_1^2$ as the
sample variance of the first ten observations.
Parameter Value
µ 0.1663
ρ -0.0017
ω 3.1711
α1 0.2537
The log-likelihood value for this model is,
$\log L = -4.4632 \times 10^3$
GARCH(1,1):
Following the suggestion in the "Garch11Example.jl" script, we have defined $\sigma_1^2$ as the
sample variance of the first ten observations.

Parameter Value
µ 0.1764
ρ 0.0017
ω 0.1597
α1 0.1123
β1 0.8536

The log-likelihood value for this model is,

$\log L = -4.3962 \times 10^3$

ARCH(4):
Following the suggestion in the "Garch11Example.jl" script, we have defined $\sigma_i^2$ ($i = 1, 2, 3, 4$)
as the sample variance of the first ten observations.

Parameter Value
µ 0.1945
ρ -0.0074
ω 2.2135
α1 0.2183
α2 0.1161
α3 0.0623
α4 0.0977

The log-likelihood value for this model is,

$\log L = -4.4274 \times 10^3$

Conclusions
1. Stationarity. A series is weakly stationary if,

$\mu_t = \mu, \quad \forall t$

$\gamma_{jt} = \gamma_j, \quad \forall t$

We are modelling different versions of autoregressive conditional heteroskedasticity
models. Therefore we have the following conditions for stationarity,

$\alpha_i \geq 0, \quad i = 1, \ldots, q$

$\beta_i \geq 0, \quad i = 1, \ldots, p$

$\sum_{i=1}^{q} \alpha_i + \sum_{i=1}^{p} \beta_i < 1$

We can see from the tables above that these conditions are met for each of the ARCH
and GARCH models we have estimated.

b)
2. Likelihood

The GARCH(1,1) model reports the highest log-likelihood, indicating that this
specification assigns the highest likelihood to the observed data. The log-likelihoods are ordered as,

$\log L(\mathrm{ARCH}(1)) < \log L(\mathrm{ARCH}(4)) < \log L(\mathrm{GARCH}(1,1))$

3. Which model is best?

Both the Akaike information criterion (AIC) and the Bayesian information criterion
(BIC) measure the quality of each model specification relative to the other candidates.

The AIC is calculated as,

$AIC = 2k - 2\log L$

and the BIC is calculated as,

$BIC = k \ln(n) - 2\log L$

where in both cases k is the number of estimated parameters and n is the sample size.
Among a set of candidate models, the one with the lowest AIC/BIC score is preferred.
The formulas embody a trade-off: adding parameters can raise the maximised
log-likelihood, but each extra parameter is penalised, so an over-fitted model
eventually produces a higher AIC/BIC score.

Information criterion    ARCH(1)          GARCH(1,1)       ARCH(4)
AIC                      8.9344 × 10^3    8.8024 × 10^3    8.8689 × 10^3
BIC                      8.9571 × 10^3    8.8307 × 10^3    8.9085 × 10^3

We can see from the table above that, according to both the AIC and the BIC, the
GARCH(1,1) model is the one to select, since among a set of candidate models the
preferred one is that with the minimum AIC/BIC value. As we know from the class notes,
this reflects the idea that a GARCH model with low values of p and q may fit the data as
well as, or better than, an ARCH model with a large q, while using fewer parameters.
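For reference, the information criteria in the table can be reproduced directly from the maximised log-likelihoods (a sketch; the logL values and parameter counts are those reported above, and y is the returns series):

```matlab
% Information criteria computed from the maximised log-likelihoods reported above
logL = [-4.4632e3, -4.3962e3, -4.4274e3];   % ARCH(1), GARCH(1,1), ARCH(4)
k    = [4, 5, 7];                           % parameters, incl. mu and rho in the mean equation
n    = numel(y);                            % number of return observations
AIC  = 2*k - 2*logL;
BIC  = k*log(n) - 2*logL;                   % log() is the natural logarithm in Matlab
```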

Question 2
Estimate (by ML) the same Garch(1,1) model as in the previous problem using the ny-
sewk.gdt data set. Do you get the same parameter estimates? Why or why not, explain.
As we know, when we do Maximum Likelihood Estimation we have to make distributional
assumptions about the error terms of the model.

We first recall what a GARCH(1,1) model consists of (following the notation from the
previous exercise):

$y_t = \mu + \rho y_{t-1} + \epsilon_t$

$\epsilon_t = \sigma_t u_t$

$\sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2$

As we know, MLE estimates the parameters by maximizing the conditional log-likelihood,
so we first form the log-likelihood function. Since we assume that $\epsilon_t$ is normally
distributed, note that

$u_t = \frac{y_t - \mu - \rho y_{t-1}}{\sigma_t} = \frac{\epsilon_t}{\sigma_t}$

is iid Gaussian, so that the likelihood is simply the product of standard normal densities:

$u \sim N(0, I)$

$f(u) = \prod_{t=1}^{n} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u_t^2}{2}\right)$
The joint density of y can then be constructed by a change of variables, keeping in mind
that

$u_t = \frac{y_t - \mu - \rho y_{t-1}}{\sigma_t}, \qquad \frac{\partial u_t}{\partial y_t} = \frac{1}{\sigma_t}$

so that the Jacobian $\partial u / \partial y'$ is triangular with determinant

$\left| \frac{\partial u}{\partial y'} \right| = \prod_{t=1}^{n} \frac{1}{\sigma_t}$

Applying the change of variables,

$f(y; \theta) = \prod_{t=1}^{n} \frac{1}{\sqrt{2\pi}} \frac{1}{\sigma_t} \exp\left( -\frac{1}{2} \left( \frac{y_t - \mu - \rho y_{t-1}}{\sigma_t} \right)^2 \right)$

And, keeping in mind that $\epsilon_t = y_t - \mu - \rho y_{t-1}$, we have:

$L(\theta) = \sum_{t=1}^{n} \ln f(\epsilon_t \mid \sigma_t; \theta)$

Then,

$f(y_t \mid \sigma_t; \theta) = \frac{1}{\sqrt{2\pi\sigma_t^2}} \exp\left( -\frac{\epsilon_t^2}{2\sigma_t^2} \right)$

so that the log-likelihood function for the GARCH(1,1) is:

$L(\theta) = \sum_{t=1}^{n} \left[ -\frac{1}{2}\ln(2\pi) - \frac{1}{2}\ln(\sigma_t^2) - \frac{\epsilon_t^2}{2\sigma_t^2} \right]$

$\sigma_t^2 = \omega + \alpha \epsilon_{t-1}^2 + \beta \sigma_{t-1}^2$

Now, we need $\sigma_1^2$ to complete the definition of $L(\theta)$:

• The exact value of $\sigma_1^2$ does not matter in large samples, since $\sigma_t^2$ converges to its
stationary distribution for large t.

• A reasonable guess for $\sigma_1^2$ improves accuracy in finite samples.

• We therefore use the unconditional sample variance: $\sigma_1^2 = \hat{E}[\epsilon_t^2]$

Before writing the optimization problem, it is worth recalling that we need constraints
on the parameters to guarantee stationarity.
Then, the MLE for the GARCH(1,1) is:

$\hat{\theta}_{ML} = \arg\max_{\omega, \alpha, \beta} L(\theta)$

subject to,

$\alpha \geq 0, \quad \beta \geq 0, \quad \alpha + \beta < 1$
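A minimal Matlab sketch of this estimation (the function name is hypothetical; our script uses fminunc, and fmincon is shown here only as one convenient way to impose the constraints):

```matlab
function nll = garch11_negloglik(theta, y)
% Negative Gaussian log-likelihood of the GARCH(1,1) model
%   y_t = mu + rho*y_{t-1} + e_t,  e_t = sigma_t*u_t,
%   sigma_t^2 = omega + alpha*e_{t-1}^2 + beta*sigma_{t-1}^2
% theta = [mu; rho; omega; alpha; beta]
mu = theta(1); rho = theta(2); omega = theta(3); alpha = theta(4); beta = theta(5);
e  = y(2:end) - mu - rho * y(1:end-1);    % residuals epsilon_t
n  = numel(e);
s2 = var(y(1:10)) * ones(n, 1);           % sigma_1^2: sample variance of the first 10 obs.
for t = 2:n
    s2(t) = omega + alpha * e(t-1)^2 + beta * s2(t-1);
end
nll = sum(0.5*log(2*pi) + 0.5*log(s2) + e.^2 ./ (2*s2));   % minus the log-likelihood
end
```

The constrained maximisation can then be carried out, for example, as:

```matlab
theta0   = [mean(y); 0; 0.1*var(y); 0.1; 0.8];            % rough starting values
thetaHat = fmincon(@(th) garch11_negloglik(th, y), theta0, ...
                   [0 0 0 1 1], 0.999, [], [], [-Inf; -Inf; 0; 0; 0], []);
```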
We can find the results of this optimization problem by using the built-in Matlab
function "garch":

Description: "GARCH(1,1) Conditional Variance Model (Gaussian Distribution)"
Distribution: "Gaussian"

Parameter    Value      Standard Error    T-Stat    P-Value
Constant     0.14266    0.03573           3.9926    6.5339 × 10^-5
GARCH(1)     0.8655     0.018483          46.828    0
ARCH(1)      0.10398    0.012698          8.1887    2.6415 × 10^-16

The log-likelihood value for this model is,


$\log L = -4.4089 \times 10^3$
The results are quite similar to those found in question 1 (using the Garch11Example.jl
script). The small differences seem to be due to the numerical optimization routine used
there, Matlab's "fminunc".
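For reference, the call to the built-in function looks roughly as follows (a sketch assuming the Econometrics Toolbox; the default specification has a zero conditional mean, so the $\mu$ and $\rho$ of our earlier model are not estimated, and the reported "Constant" is the variance-equation constant $\omega$):

```matlab
Mdl    = garch(1, 1);         % GARCH(1,1) with Gaussian innovations
EstMdl = estimate(Mdl, y);    % prints the constant (omega), GARCH{1} and ARCH{1} estimates
```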

Question 3
Write a Matlab script that generates two independent random walks, $x_t = x_{t-1} + \epsilon_t$ and
$y_t = y_{t-1} + u_t$, where the initial conditions are $x_0 = 1$ and $y_0 = 1$, and the two errors are
both iid $N(0, 1)$. Use a sample size of 1000: $t = 1, 2, \ldots, 1000$.
In the accompanying MatLab script we have specified the following random walk
processes,

$x_t = x_{t-1} + \epsilon_t$

$y_t = y_{t-1} + u_t$
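A minimal sketch of this data generating process (variable names are illustrative):

```matlab
rng(0);              % fix the seed so the draws are reproducible
T = 1000;
e = randn(T, 1);     % epsilon_t ~ iid N(0,1), drives x
u = randn(T, 1);     % u_t ~ iid N(0,1), drives y, independent of epsilon
x = 1 + cumsum(e);   % x_t = x_{t-1} + epsilon_t, with x_0 = 1
y = 1 + cumsum(u);   % y_t = y_{t-1} + u_t,       with y_0 = 1
```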

1. Regress y upon x and a constant.
The model we have to estimate is:

$y_t = \beta_1 + \beta_2 x_t + e_t$

Parameter    Estimate    SE         t-stat    p-value
β1           -12.442     0.32561    -38.21    1.5122e-197
β2            0.28615    0.0188      15.221   3.3468e-47
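The estimates in the table can be reproduced with a standard OLS fit, for example (a sketch; fitlm requires the Statistics and Machine Learning Toolbox):

```matlab
mdl = fitlm(x, y);        % OLS of y on x with an intercept
disp(mdl.Coefficients)    % estimates, standard errors, t-stats and p-values
```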

2. Discuss your findings, especially the slope coefficient, the t statistic of the slope, and
R2 . Are the findings sensible, given that we know that x has nothing to do with y?
We organize the answer to this question by commenting on each of the findings
separately. First, it is worth mentioning that the estimates of the slope coefficient
and $R^2$ are very sensitive: when the random number generation is run multiple times in
MatLab the values change significantly, although they consistently display the general
conclusions we present below:

• In the example we presented, we find that the $R^2$ is 0.188, indicating that roughly 18% of
the variation in y is apparently explained by variation in x. Even though there
is no reason for the variation in y to be explained by movements in x, it is worth
noticing that, since both series are generated by the same type of nonstationary
process, a part of the variance of y appears to be explained by the variance of x.
• The slope coefficient indicates a positive correlation between x and y. However,
we have to be aware that some times we run this regression the estimated slope is
positive and other times it is negative, so we should not read anything conclusive
about the relationship between x and y into its sign. The crucial part of the
estimated slope is its t-stat and p-value. In all the regressions we have run we find
a very high t-statistic and a very low p-value, suggesting (contrary to what we
expected) that there is a relationship between x and y. This is what is known as
"spurious correlation": because y and x are constructed in the same way, as
independent random walks, the regression makes it seem that there is indeed a
relationship between these two variables.

3. Compute the variance of yt and xt conditional on the initial conditions y0 = 0 and


x0 = 0. Does the variance depend on t?
To obtain the variance, let us calculate (analytically) the variance of $y_t$ (the proof is
the same for $x_t$) and check (i) whether the initial conditions matter and (ii) whether
the variance depends on t. To do so, let us look at the processes generating the data:

$y_t = \phi y_{t-1} + u_t$

$x_t = \rho x_{t-1} + \epsilon_t$

where $\phi = \rho = 1$.

To show that the variances of $y_t$ and $x_t$ depend on t, we do the derivation for $y_t$ (it
would be the same for $x_t$):

$y_t = y_{t-1} + u_t$

and,

$y_t = y_{t-2} + u_{t-1} + u_t$

Iterating back to the first period, we find:

$y_t = y_0 + \sum_{s=1}^{t} u_s$

Since $y_0$ is a constant and the $u_s$ are iid with $E(u_s) = 0$, we have:

$Var(y_t) = Var\left( \sum_{s=1}^{t} u_s \right) = \sum_{s=1}^{t} E(u_s^2) = \sum_{s=1}^{t} \sigma_u^2 = t\sigma_u^2$

From here, we can easily get that,

$Var(x_t) = t\sigma_\epsilon^2$

We can therefore clearly see that the variance of both $y_t$ and $x_t$ depends on t. This
happens because $\phi = \rho = 1$; a necessary condition for the variance not to depend on t
is that $|\phi| < 1$ and $|\rho| < 1$, which is not satisfied in this case. The variances of
both $y_t$ and $x_t$ therefore "explode" as we increase the number of observations.
Moreover, we can also see that the initial conditions $y_0$ and $x_0$ do not affect the
variance.
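This can also be checked numerically by simulating many independent random walks (with $y_0 = 0$, as in the question) and looking at the cross-sectional variance at several dates (a sketch):

```matlab
R = 5000; T = 1000;
U = randn(T, R);                          % R independent sets of shocks
Y = cumsum(U, 1);                         % each column is a random walk with y_0 = 0
disp(var(Y([10 100 1000], :), 0, 2)')     % roughly 10, 100 and 1000, i.e. t*sigma_u^2
```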
4. Which of the assumptions of the classical linear regression model are not satisfied
by this data generating process?
This data generating process (for both $y_t$ and $x_t$) breaks the assumption of no
autocorrelation in the error terms, so the classical assumption of spherical errors is not
satisfied. To show it formally, let us look at the covariance between $y_t$ and $y_{t-1}$
(with $y_0 = 0$, so that $E(y_t) = 0$):

$\gamma_1 = Cov(y_t, y_{t-1}) = E[(y_{t-1} + u_t) y_{t-1}] = E(y_{t-1}^2) = Var(y_{t-1}) = (t-1)\sigma_u^2 \neq 0$

Since the regression error $e_t = y_t - \beta_1 - \beta_2 x_t$ inherits this behaviour, we can conclude that:

$E(e_i e_j \mid X) \neq 0 \quad \text{for } i \neq j$

so the spherical error assumption is not satisfied for $y_t$. The proof is the same for the
case of $x_t$.

5. Present estimation results using transformation(s) of y and/or x so that the regression
using the transformed variables confirms that there is no relationship between the
variables. Explain why the transformation(s) you use are successful in eliminating the
problem of a spurious relationship.

We propose the following transformation (first differences) to show that there is no
relation between x and y:

$y_t - y_{t-1} = \beta_1 + \beta_2 (x_t - x_{t-1}) + w_t$

which is equivalent to,

$u_t = \beta_1 + \beta_2 \epsilon_t + w_t$

This transformation is useful because the differenced series are stationary, so it allows
us to check whether there is a genuine relation between x and y. If they were truly
related, this would appear in the estimation and the estimated slope coefficient would be
statistically different from 0. In our context, we know that there is no relationship at
all between x and y, since the two innovation series are independent.
Parameter    Estimate     SE          t-stat      p-value
β1           -0.029095    0.031884    -0.91253    0.36171
β2           -0.023444    0.033185    -0.70645    0.48007

And the $R^2 = 0.0005$. Clearly, these results confirm that there is no relationship
between x and y: the slope estimate is close to zero and statistically insignificant (the
p-value is high).
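The first-difference regression is obtained in the same way as before (a sketch, reusing the simulated series):

```matlab
dy  = diff(y);  dx = diff(x);   % first differences: dy_t = u_t, dx_t = epsilon_t
mdl = fitlm(dx, dy);            % the slope should now be statistically insignificant
disp(mdl.Coefficients)
```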

Question 4
Suppose that the data follow an MA(1) process with a constant: $y_t = \alpha_0 + \epsilon_t + \phi_0 \epsilon_{t-1}$
($t = 1, 2, \ldots, T$), where $V(\epsilon_t) = \sigma_0^2$ for all t, and the $\epsilon_t$ are white noise shocks. Assume
that the parameters satisfy restrictions so that the process is invertible (this is a
technical detail, don't let it confuse you).

4.a)
We have solved this exercise both analytically and computationally to see how well the
simulated results approximate the analytical ones. In Matlab, we have generated the data
with the following parameters: $\alpha_0 = 1$, $\phi_0 = 0.5$ and $\epsilon_t \sim$ iid $N(0, 1)$.
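A sketch of the simulation behind the numbers reported below (variable names are illustrative):

```matlab
alpha0 = 1; phi0 = 0.5;
for n = [10 100 1000 5000]
    e  = randn(n + 1, 1);                             % epsilon_0, ..., epsilon_n
    y  = alpha0 + e(2:end) + phi0 * e(1:end-1);       % y_t = alpha0 + e_t + phi0*e_{t-1}
    m  = mean(y);
    v  = var(y, 1);                                   % variance (normalised by n)
    g1 = mean((y(2:end) - m) .* (y(1:end-1) - m));    % first-order autocovariance
    g2 = mean((y(3:end) - m) .* (y(1:end-2) - m));    % second-order autocovariance
    fprintf('n = %4d: mean %.4f  var %.4f  gamma1 %.4f  gamma2 %.4f\n', n, m, v, g1, g2);
end
```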
i) Mean

Analytical result:

$E(y_t) = E(\alpha_0 + \epsilon_t + \phi_0 \epsilon_{t-1}) = E(\alpha_0) = \alpha_0$
Computational results:
Sample size 10 100 1000 5000
Mean 0.7747 0.8352 1.0527 0.9899

Since $\alpha_0 = 1$, as we increase the sample size the sample mean gets closer to the true
value.
ii) Variance

Analytical results:

$Var(y_t) = E(y_t^2) - E[y_t]^2 = E[(\alpha_0 + \epsilon_t + \phi_0 \epsilon_{t-1})^2] - \alpha_0^2$

$= E[\alpha_0^2 + \epsilon_t^2 + \phi_0^2 \epsilon_{t-1}^2 + 2\alpha_0 \epsilon_t + 2\alpha_0 \phi_0 \epsilon_{t-1} + 2\phi_0 \epsilon_t \epsilon_{t-1}] - \alpha_0^2$

$= \alpha_0^2 + \sigma_0^2 + \phi_0^2 \sigma_0^2 - \alpha_0^2 = \sigma_0^2 (1 + \phi_0^2)$
Computational results:

Sample size 10 100 1000 5000


Variance 1.8996 1.1805 1.2002 1.2241

Since $\sigma_0^2 = 1$ and $\phi_0 = 0.5$, so that $\sigma_0^2(1 + \phi_0^2) = 1.25$, we can see that as we increase the
sample size the sample variance gets closer to the true value.
iii) First-order autocovariance

Analytical results:

$\gamma_1 = Cov(y_t, y_{t-1}) = E(y_t y_{t-1}) - E(y_t)E(y_{t-1}) = E[(\alpha_0 + \epsilon_t + \phi_0 \epsilon_{t-1})(\alpha_0 + \epsilon_{t-1} + \phi_0 \epsilon_{t-2})] - \alpha_0^2$

$= \alpha_0^2 + \phi_0 \sigma_0^2 - \alpha_0^2 = \phi_0 \sigma_0^2$


Computational results:

Sample size 10 100 1000 5000


First-order autocovariance 0.3732 0.5728 0.4698 0.5018

Since $\sigma_0^2 = 1$ and $\phi_0 = 0.5$, so that $\phi_0 \sigma_0^2 = 0.5$, we can see that as we increase the sample
size the first-order sample autocovariance gets closer to the true value.
iv) Second-order autocovariance

Analytical results:

$\gamma_2 = Cov(y_t, y_{t-2}) = E(y_t y_{t-2}) - E(y_t)E(y_{t-2}) = E[(\alpha_0 + \epsilon_t + \phi_0 \epsilon_{t-1})(\alpha_0 + \epsilon_{t-2} + \phi_0 \epsilon_{t-3})] - \alpha_0^2$

$= \alpha_0^2 - \alpha_0^2 = 0$
Computational results:

Sample size 10 100 1000 5000


Second-order autocovariance -0.7294 0.1174 -0.0664 -0.0052

We can see that as we increase the sample size, the second-order sample autocovariance of
the data gets closer to the true value of zero.

4.b)
Is the process covariance stationary? Plot and explain.
Yes, this MA(1) process is covariance stationary since:

• The mean is constant, $E(y_t) = \alpha_0$.

• The variance is constant, $Var(y_t) = \sigma_0^2(1 + \phi_0^2)$.

• The autocovariance $\gamma_s$ does not depend on t, only on the lag s. We have shown that
$\gamma_1 = \phi_0 \sigma_0^2$ and that $\gamma_s = 0$ for any $s \geq 2$.

We have plotted one graph for each sample size.

As can be seen in the graphs, the mean of each sample is plotted in red. As the number of
observations increases the series appears stationary, with the points evenly dispersed
around the mean. Furthermore, the graphs display no apparent increase or decrease in the
autocovariance across the observation period.

We can see in the graphs that the value of $y_t$ is always centred on $\alpha_0 = 1$. Moreover, the
variance seems constant across time. This pattern of the mean and the variance becomes
clearer as we increase the sample size.

In addition, in the graphs with 10 and 100 observations (in the graphs with 1000 and 5000
observations it is more difficult to appreciate) we can also see some one-period
persistence of the shocks, as suggested by $\gamma_1 > 0$ and $\gamma_k = 0$ for all $k > 1$.

4.c)
Is the process ergodic (for the mean)? Plot and explain.
A stationary stochastic process is ergodic (for the mean) if the time average converges
to the mean:

$\frac{1}{n} \sum_{t=1}^{n} y_t \rightarrow \mu$
We have shown numerically in question 4.a), and it can be seen graphically in the figures
of question 4.b), that this stochastic process is ergodic for the mean: the time average
converges to the true value of the mean, around which the series is centred (this becomes
clearer as the sample size increases). Recalling that $\alpha_0 = 1$, the sample mean for each
sample size clearly converges to 1 as the sample size increases, thus demonstrating that
the process is ergodic.
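The convergence of the time average can also be visualised directly (a sketch, reusing a simulated series y from 4.a):

```matlab
n = numel(y);
running_mean = cumsum(y) ./ (1:n)';   % (1/t) * sum of y_1, ..., y_t
plot(running_mean); hold on
plot([1 n], [1 1], 'r--')             % the true mean alpha0 = 1
xlabel('t'); ylabel('time average of y_t')
```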

