
Chapter 6

Univariate time-series modelling and forecasting

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 1


Univariate Time-series Models

• Where we attempt to predict returns using only information contained in their


past values.
Some Notation and Concepts
• A Strictly Stationary Process
A strictly stationary process is one where
P{yt1 ≤ b1, ..., ytn ≤ bn} = P{yt1+m ≤ b1, ..., ytn+m ≤ bn}
i.e. the probability measure for the sequence {yt} is the same as that for {yt+m} ∀ m.
• A Weakly Stationary Process
If a series satisfies the next three equations, it is said to be weakly or covariance
stationary
1. E(yt) = μ, t = 1, 2, ...
2. E[(yt − μ)(yt − μ)] = σ² < ∞
3. E[(yt1 − μ)(yt2 − μ)] = γt2−t1, ∀ t1, t2
Univariate Time-series Models (cont’d)

• So if the process is covariance stationary, all the variances are the same and all
the covariances depend on the difference between t1 and t2. The moments
E[(yt − E(yt))(yt-s − E(yt-s))] = γs, s = 0, 1, 2, ...
are known as the covariance function.
• The covariances, γs, are known as autocovariances.

• However, the value of the autocovariances depends on the units of measurement
of yt.
• It is thus more convenient to use the autocorrelations, which are the
autocovariances normalised by dividing by the variance:
τs = γs / γ0, s = 0, 1, 2, ...
• If we plot τs against s = 0, 1, 2, ... then we obtain the autocorrelation function or
correlogram.
A White Noise Process

• A white noise process is one with (virtually) no discernible structure. A


definition of a white noise process is
E(yt) = μ
Var(yt) = σ²
γt−r = σ² if t = r, and 0 otherwise
• Thus the autocorrelation function will be zero apart from a single peak of 1
at s = 0. τ̂s is approximately N(0, 1/T), where T = sample size

• We can use this to do significance tests for the autocorrelation coefficients


by constructing a confidence interval.
• For example, a 95% confidence interval would be given by ±1.96 × 1/√T. If
the sample autocorrelation coefficient, τ̂s, falls outside this region for any
value of s, then we reject the null hypothesis that the true value of the
coefficient at lag s is zero.
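These bands are straightforward to compute; a minimal sketch (numpy only — the simulated white-noise series, seed and lag count are illustrative):

```python
import numpy as np

def acf(y, max_lag):
    """Sample autocorrelation coefficients tau_1 ... tau_max_lag."""
    y = np.asarray(y, dtype=float) - np.mean(y)
    denom = np.sum(y**2)
    return np.array([np.sum(y[k:] * y[:-k]) / denom for k in range(1, max_lag + 1)])

rng = np.random.default_rng(0)
y = rng.standard_normal(500)     # white noise, so every true autocorrelation is zero
T = len(y)
band = 1.96 / np.sqrt(T)         # 95% confidence bound under H0: tau_s = 0
taus = acf(y, 10)
reject = np.abs(taus) > band     # True wherever H0 would be rejected
```

For genuine white noise roughly 5% of lags will breach the band by chance alone, so a single marginal rejection is weak evidence of structure.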
Joint Hypothesis Tests

• We can also test the joint hypothesis that all m of the τk coefficients
are simultaneously equal to zero using the Q-statistic developed by Box and
Pierce:
Q = T Σ_{k=1}^{m} τ̂k²
where T = sample size, m = maximum lag length
• The Q-statistic is asymptotically distributed as a χ²m.

• However, the Box-Pierce test has poor small sample properties, so a variant
has been developed, called the Ljung-Box statistic:
Q* = T(T + 2) Σ_{k=1}^{m} τ̂k² / (T − k) ~ χ²m
• This statistic is very useful as a portmanteau (general) test of linear dependence
in time-series.
An ACF Example

• Question:
Suppose that a researcher had estimated the first 5 autocorrelation coefficients
using a series of length 100 observations, and found them to be (from 1 to 5):
0.207, -0.013, 0.086, 0.005, -0.022.
Test each of the individual coefficients for significance, and use both the Box-
Pierce and Ljung-Box tests to establish whether they are jointly significant.

• Solution:
A coefficient would be significant if it lies outside (-0.196,+0.196) at the 5%
level, so only the first autocorrelation coefficient is significant.
Q=5.09 and Q*=5.26
Compared with a tabulated χ²(5) = 11.1 at the 5% level, so the 5 coefficients
are jointly insignificant.
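The arithmetic of the example can be reproduced directly; a short sketch using the autocorrelations and T given in the question:

```python
import numpy as np

taus = np.array([0.207, -0.013, 0.086, 0.005, -0.022])  # first five sample acf values
T, m = 100, len(taus)
k = np.arange(1, m + 1)

Q = T * np.sum(taus**2)                            # Box-Pierce statistic
Q_star = T * (T + 2) * np.sum(taus**2 / (T - k))   # Ljung-Box statistic

# Both fall well short of the chi-squared(5) 5% critical value of 11.1,
# so the five coefficients are jointly insignificant
print(round(Q, 2), round(Q_star, 2))   # 5.09 5.26
```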



Moving Average Processes

• Let ut (t = 1, 2, 3, ...) be a sequence of independently and identically
distributed (iid) random variables with E(ut) = 0 and Var(ut) = σ², then
yt = μ + ut + θ1ut-1 + θ2ut-2 + ... + θqut-q
is a qth order moving average model, MA(q).

• Its properties are
E(yt) = μ; Var(yt) = γ0 = (1 + θ1² + θ2² + ... + θq²)σ²
Covariances:
γs = (θs + θs+1θ1 + θs+2θ2 + ... + θqθq-s)σ² for s = 1, 2, ..., q
γs = 0 for s > q


Example of an MA Problem

1. Consider the following MA(2) process:
Xt = ut + θ1ut-1 + θ2ut-2
where ut is a zero mean white noise process with variance σ².
(i) Calculate the mean and variance of Xt
(ii) Derive the autocorrelation function for this process (i.e. express the
autocorrelations, τ1, τ2, ... as functions of the parameters θ1 and θ2).
(iii) If θ1 = −0.5 and θ2 = 0.25, sketch the acf of Xt.



Solution

(i) If E(ut) = 0, then E(ut-i) = 0 ∀ i.
So
E(Xt) = E(ut + θ1ut-1 + θ2ut-2) = E(ut) + θ1E(ut-1) + θ2E(ut-2) = 0

Var(Xt) = E[Xt − E(Xt)][Xt − E(Xt)]
but E(Xt) = 0, so
Var(Xt) = E[(Xt)(Xt)]
= E[(ut + θ1ut-1 + θ2ut-2)(ut + θ1ut-1 + θ2ut-2)]
= E[ut² + θ1²ut-1² + θ2²ut-2² + cross-products]

But E[cross-products] = 0 since Cov(ut, ut-s) = 0 for s ≠ 0.



Solution (cont’d)

So Var(Xt) = γ0 = E[(ut + θ1ut-1 + θ2ut-2)²]
= σ² + θ1²σ² + θ2²σ²
= (1 + θ1² + θ2²)σ²

(ii) The acf of Xt.

γ1 = E[Xt − E(Xt)][Xt-1 − E(Xt-1)]
= E[Xt][Xt-1]
= E[(ut + θ1ut-1 + θ2ut-2)(ut-1 + θ1ut-2 + θ2ut-3)]
= E[θ1ut-1² + θ1θ2ut-2²]
= θ1σ² + θ1θ2σ²
= (θ1 + θ1θ2)σ²



Solution (cont’d)

γ2 = E[Xt − E(Xt)][Xt-2 − E(Xt-2)]
= E[Xt][Xt-2]
= E[(ut + θ1ut-1 + θ2ut-2)(ut-2 + θ1ut-3 + θ2ut-4)]
= E[θ2ut-2²]
= θ2σ²

γ3 = E[Xt − E(Xt)][Xt-3 − E(Xt-3)]
= E[Xt][Xt-3]
= E[(ut + θ1ut-1 + θ2ut-2)(ut-3 + θ1ut-4 + θ2ut-5)]
= 0

So γs = 0 for s > 2.



Solution (cont’d)

We have the autocovariances, now calculate the autocorrelations:

τ0 = γ0/γ0 = 1
τ1 = γ1/γ0 = (θ1 + θ1θ2)σ² / [(1 + θ1² + θ2²)σ²] = (θ1 + θ1θ2) / (1 + θ1² + θ2²)
τ2 = γ2/γ0 = θ2σ² / [(1 + θ1² + θ2²)σ²] = θ2 / (1 + θ1² + θ2²)
τ3 = γ3/γ0 = 0
τs = γs/γ0 = 0 ∀ s > 2

(iii) For θ1 = −0.5 and θ2 = 0.25, substituting these into the formulae above
gives τ1 = −0.476, τ2 = 0.190.
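The substitution in part (iii) can be verified numerically; a small sketch:

```python
def ma2_acf(theta1, theta2):
    """Theoretical tau_1, tau_2 for X_t = u_t + theta1*u_{t-1} + theta2*u_{t-2}."""
    denom = 1 + theta1**2 + theta2**2
    return (theta1 + theta1 * theta2) / denom, theta2 / denom

tau1, tau2 = ma2_acf(-0.5, 0.25)
print(round(tau1, 3), round(tau2, 3))   # -0.476 0.19
```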
ACF Plot

Thus the acf plot will appear as follows:

[Figure: acf of the MA(2) example — a bar of 1 at lag 0, approximately −0.476 at lag 1, approximately 0.190 at lag 2, and zero at all higher lags]


Autoregressive Processes

• An autoregressive model of order p, an AR(p), can be expressed as
yt = μ + φ1yt-1 + φ2yt-2 + ... + φpyt-p + ut
• Or using the lag operator notation:
Lyt = yt-1, so L^i yt = yt-i
yt = μ + Σ_{i=1}^{p} φi yt-i + ut
or yt = μ + Σ_{i=1}^{p} φi L^i yt + ut
or φ(L)yt = μ + ut, where φ(L) = 1 − (φ1L + φ2L² + ... + φpL^p).



The Stationary Condition for an AR Model

• The condition for stationarity of a general AR(p) model is that the
roots of 1 − φ1z − φ2z² − ... − φpz^p = 0 all lie outside the unit circle.
• A stationary AR(p) model is required for it to have an MA(∞)
representation.
• Example 1: Is yt = yt-1 + ut stationary?
The characteristic root is 1, so it is a unit root process (so non-
stationary)
• Example 2: Is yt = 3yt-1 – 2.75yt-2 + 0.75yt-3 +ut stationary?
The characteristic roots are 1, 2/3, and 2. Since only one of these lies
outside the unit circle, the process is non-stationary.
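Both examples can be checked mechanically with numpy's polynomial root finder; a sketch (the helper names are mine):

```python
import numpy as np

def ar_roots(phis):
    """Roots of the characteristic equation 1 - phi1*z - ... - phip*z^p = 0."""
    # np.roots expects coefficients ordered from the highest power down
    return np.roots([-phi for phi in reversed(phis)] + [1.0])

def is_stationary(phis):
    return bool(np.all(np.abs(ar_roots(phis)) > 1.0))

print(is_stationary([1.0]))                # Example 1: root at z = 1 -> False
print(is_stationary([3.0, -2.75, 0.75]))   # Example 2: roots 2/3, 1 and 2 -> False
print(is_stationary([0.5]))                # AR(1) with phi = 0.5: root z = 2 -> True
```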



Wold’s Decomposition Theorem

• This states that any stationary series can be decomposed into the sum of
two unrelated processes, a purely deterministic part and a purely stochastic
part, which will be an MA(∞).

• For the AR(p) model φ(L)yt = ut, ignoring the intercept, the Wold
decomposition is
yt = ψ(L)ut
where
ψ(L) = (1 − φ1L − φ2L² − ... − φpL^p)⁻¹



The Moments of an Autoregressive Process

• The moments of an autoregressive process are as follows. The mean is
given by
E(yt) = μ / (1 − φ1 − φ2 − ... − φp)

• The autocovariances and autocorrelation functions can be obtained by
solving what are known as the Yule-Walker equations:
τ1 = φ1 + τ1φ2 + ... + τp-1φp
τ2 = τ1φ1 + φ2 + ... + τp-2φp
...
τp = τp-1φ1 + τp-2φ2 + ... + φp

• If the AR model is stationary, the autocorrelation function will decay


exponentially to zero.
Sample AR Problem

• Consider the following simple AR(1) model
yt = μ + φ1yt-1 + ut
(i) Calculate the (unconditional) mean of yt.

For the remainder of the question, set μ = 0 for simplicity.

(ii) Calculate the (unconditional) variance of yt.

(iii) Derive the autocorrelation function for yt.



Solution

(i) Unconditional mean:

E(yt) = E(μ + φ1yt-1)
= μ + φ1E(yt-1)
But also E(yt-1) = μ + φ1E(yt-2), so

E(yt) = μ + φ1(μ + φ1E(yt-2))
= μ + φ1μ + φ1²E(yt-2)
= μ + φ1μ + φ1²(μ + φ1E(yt-3))
= μ + φ1μ + φ1²μ + φ1³E(yt-3)



Solution (cont’d)

An infinite number of such substitutions would give

E(yt) = μ(1 + φ1 + φ1² + ...) + φ1^∞ y0
So long as the model is stationary, i.e. |φ1| < 1, then φ1^∞ = 0.

So E(yt) = μ(1 + φ1 + φ1² + ...) = μ / (1 − φ1)

(ii) Calculating the variance of yt: yt = φ1yt-1 + ut

From Wold's decomposition theorem:
yt(1 − φ1L) = ut
yt = (1 − φ1L)⁻¹ut
yt = (1 + φ1L + φ1²L² + ...)ut



Solution (cont’d)

So long as |φ1| < 1, this will converge.
yt = ut + φ1ut-1 + φ1²ut-2 + ...

Var(yt) = E[yt − E(yt)][yt − E(yt)]
but E(yt) = 0, since we are setting μ = 0.
Var(yt) = E[(yt)(yt)]
= E[(ut + φ1ut-1 + φ1²ut-2 + ...)(ut + φ1ut-1 + φ1²ut-2 + ...)]
= E[ut² + φ1²ut-1² + φ1⁴ut-2² + ... + cross-products]
= σu² + φ1²σu² + φ1⁴σu² + ...
= σu²(1 + φ1² + φ1⁴ + ...)
= σu² / (1 − φ1²)
Solution (cont’d)

(iii) Turning now to calculating the acf, first calculate the autocovariances:
γ1 = Cov(yt, yt-1) = E[yt − E(yt)][yt-1 − E(yt-1)]
Since μ has been set to zero, E(yt) = 0 and E(yt-1) = 0, so
γ1 = E[yt yt-1]
= E[(ut + φ1ut-1 + φ1²ut-2 + ...)(ut-1 + φ1ut-2 + φ1²ut-3 + ...)]
= E[φ1ut-1² + φ1³ut-2² + ... + cross-products]
= φ1σ² + φ1³σ² + φ1⁵σ² + ...
= φ1σ² / (1 − φ1²)



Solution (cont’d)

For the second autocorrelation coefficient,

γ2 = Cov(yt, yt-2) = E[yt − E(yt)][yt-2 − E(yt-2)]
Using the same rules as applied above for the lag 1 covariance
γ2 = E[yt yt-2]
= E[(ut + φ1ut-1 + φ1²ut-2 + ...)(ut-2 + φ1ut-3 + φ1²ut-4 + ...)]
= E[φ1²ut-2² + φ1⁴ut-3² + ... + cross-products]
= φ1²σ² + φ1⁴σ² + ...
= φ1²σ²(1 + φ1² + φ1⁴ + ...)
= φ1²σ² / (1 − φ1²)



Solution (cont’d)

• If these steps were repeated for γ3, the following expression would be
obtained
γ3 = φ1³σ² / (1 − φ1²)
and for any lag s, the autocovariance would be given by
γs = φ1^s σ² / (1 − φ1²)
The acf can now be obtained by dividing the covariances by the
variance:



Solution (cont’d)

0
0 = 1
0
   2 2 
 1 2   1  
 2   2 
 (1   1 )  (1   1 )
1   2  
1 =    1 2 =   12
  0  
0
 2   2 
 2   2 
 (1   1 )  (1   1 )
   

3 = 1
3


s = 1s
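The result τs = φ1^s is easy to confirm by simulation; a sketch (the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(42)
phi, T = 0.5, 200_000
u = rng.standard_normal(T)
y = np.empty(T)
y[0] = u[0]
for t in range(1, T):              # y_t = phi*y_{t-1} + u_t with mu = 0
    y[t] = phi * y[t - 1] + u[t]

yc = y - y.mean()
denom = np.sum(yc**2)
sample_acf = [np.sum(yc[s:] * yc[:-s]) / denom for s in (1, 2, 3)]
theory = [phi**s for s in (1, 2, 3)]   # 0.5, 0.25, 0.125
# np.var(y) should also be close to 1/(1 - phi**2)
```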
The Partial Autocorrelation Function (denoted τkk)

• Measures the correlation between an observation k periods ago and the


current observation, after controlling for observations at intermediate lags
(i.e. all lags < k).

• So τkk measures the correlation between yt and yt-k after removing the effects
of yt-k+1, yt-k+2, ..., yt-1.

• At lag 1, the acf = pacf always

• At lag 2, τ22 = (τ2 − τ1²) / (1 − τ1²)

• For lags 3+, the formulae are more complex.
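The lag-2 formula can be applied directly; a sketch contrasting an AR(1), where the pacf cuts off after lag 1, with an MA(1), where it does not:

```python
def pacf_lag2(tau1, tau2):
    """tau_22 = (tau_2 - tau_1**2) / (1 - tau_1**2)."""
    return (tau2 - tau1**2) / (1 - tau1**2)

# AR(1) with phi = 0.5: tau_1 = 0.5 and tau_2 = 0.25, so tau_22 is exactly zero
print(pacf_lag2(0.5, 0.25))              # 0.0

# MA(1) with theta = -0.5: tau_1 = theta/(1 + theta**2) = -0.4 and tau_2 = 0,
# so tau_22 is non-zero
print(round(pacf_lag2(-0.4, 0.0), 3))    # -0.19
```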



The Partial Autocorrelation Function (denoted τkk)
(cont’d)

• The pacf is useful for telling the difference between an AR process and an
ARMA process.

• In the case of an AR(p), there are direct connections between yt and yt-s only
for s ≤ p.

• So for an AR(p), the theoretical pacf will be zero after lag p.

• In the case of an MA(q), this can be written as an AR(∞), so there are direct
connections between yt and all its previous values.

• For an MA(q), the theoretical pacf will be geometrically declining.



ARMA Processes

• By combining the AR(p) and MA(q) models, we can obtain an ARMA(p,q)
model:
φ(L)yt = μ + θ(L)ut

where φ(L) = 1 − φ1L − φ2L² − ... − φpL^p

and θ(L) = 1 + θ1L + θ2L² + ... + θqL^q

or yt = μ + φ1yt-1 + φ2yt-2 + ... + φpyt-p + θ1ut-1 + θ2ut-2 + ... + θqut-q + ut

with E(ut) = 0; E(ut²) = σ²; E(ut us) = 0, t ≠ s


The Invertibility Condition

• Similar to the stationarity condition, we typically require the MA(q) part of
the model to have roots of θ(z) = 0 greater than one in absolute value.

• The mean of an ARMA series is given by
E(yt) = μ / (1 − φ1 − φ2 − ... − φp)

• The autocorrelation function for an ARMA process will display


combinations of behaviour derived from the AR and MA parts, but for lags
beyond q, the acf will simply be identical to the individual AR(p) model.



Summary of the Behaviour of the acf for
AR and MA Processes

An autoregressive process has


• a geometrically decaying acf
• a number of spikes of pacf = AR order

A moving average process has


• a number of spikes of acf = MA order
• a geometrically decaying pacf



Some sample acf and pacf plots
for standard processes
The acf and pacf are not produced analytically from the relevant formulae for a model of that
type, but rather are estimated using 100,000 simulated observations with disturbances drawn
from a normal distribution.
ACF and PACF for an MA(1) Model: yt = – 0.5ut-1 + ut
[Figure: sample acf and pacf for lags 1–10 — both ≈ −0.4 at lag 1; the acf is essentially zero at all higher lags, while the pacf declines gradually towards zero]


ACF and PACF for an MA(2) Model:
yt = 0.5ut-1 - 0.25ut-2 + ut

[Figure: sample acf and pacf for lags 1–10 — the acf has a positive spike at lag 1 and a negative spike at lag 2 and is essentially zero thereafter; the pacf tails off gradually]


ACF and PACF for a slowly decaying AR(1) Model:
yt = 0.9yt-1 + ut

[Figure: sample acf and pacf for lags 1–10 — the acf declines geometrically from 0.9 (approximately 0.9^s), while the pacf has a single large spike of 0.9 at lag 1 and is essentially zero thereafter]



ACF and PACF for a more rapidly decaying AR(1)
Model: yt = 0.5yt-1 + ut

[Figure: sample acf and pacf for lags 1–10 — the acf decays much more rapidly (approximately 0.5^s), and the pacf again shows a single spike of 0.5 at lag 1]



ACF and PACF for a more rapidly decaying AR(1)
Model with Negative Coefficient: yt = -0.5yt-1 + ut

[Figure: sample acf and pacf for lags 1–10 — the acf oscillates in sign (approximately (−0.5)^s), and the pacf has a single negative spike at lag 1]



ACF and PACF for a Non-stationary Model
(i.e. a unit coefficient): yt = yt-1 + ut

[Figure: sample acf and pacf for lags 1–10 — the acf remains very close to 1 and declines only extremely slowly, the signature of non-stationarity; the pacf has a single spike near 1 at lag 1]



ACF and PACF for an ARMA(1,1):
yt = 0.5yt-1 + 0.5ut-1 + ut

[Figure: sample acf and pacf for lags 1–10 — both decline geometrically from around 0.7 at lag 1; neither function cuts off sharply, which is characteristic of a mixed ARMA process]



Building ARMA Models
- The Box Jenkins Approach

• Box and Jenkins (1970) were the first to approach the task of estimating an
ARMA model in a systematic manner. There are 3 steps to their approach:
1. Identification
2. Estimation
3. Model diagnostic checking

Step 1:
- Involves determining the order of the model.
- Use of graphical procedures
- A better procedure is now available



Building ARMA Models
- The Box Jenkins Approach (cont’d)

Step 2:
- Estimation of the parameters
- Can be done using least squares or maximum likelihood depending
on the model.

Step 3:
- Model checking

Box and Jenkins suggest 2 methods:


- deliberate overfitting
- residual diagnostics



Some More Recent Developments in
ARMA Modelling

• Identification would typically not be done using acf’s.


• We want to form a parsimonious model.

• Reasons:
- variance of estimators is inversely proportional to the number of degrees of
freedom.
- models which are profligate might be inclined to fit to data-specific features

• This gives motivation for using information criteria, which embody 2 factors
- a term which is a function of the RSS
- some penalty for adding extra parameters
• The object is to choose the number of parameters which minimises the
information criterion.
Information Criteria for Model Selection

• The information criteria vary according to how stiff the penalty term is.
• The three most popular criteria are Akaike’s (1974) information criterion
(AIC), Schwarz’s (1978) Bayesian information criterion (SBIC), and the
Hannan-Quinn criterion (HQIC).
AIC = ln(σ̂²) + 2k/T
SBIC = ln(σ̂²) + (k/T) ln T
HQIC = ln(σ̂²) + (2k/T) ln(ln(T))
where k = p + q + 1 and T = sample size. So we minimise the IC subject to p ≤ p̄, q ≤ q̄.
SBIC embodies a stiffer penalty term than AIC.
• Which IC should be preferred if they suggest different model orders?
– SBIC is strongly consistent, but inefficient.
– AIC is not consistent, and will typically pick “bigger” models.
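The three criteria are one-liners; a sketch (the RSS values are hypothetical) showing that SBIC's per-parameter penalty is the stiffest of the three for moderate T:

```python
import numpy as np

def info_criteria(rss, T, p, q):
    """AIC, SBIC and HQIC for an ARMA(p,q) estimated on T observations."""
    sigma2 = rss / T     # residual variance estimate
    k = p + q + 1        # number of estimated parameters
    aic = np.log(sigma2) + 2 * k / T
    sbic = np.log(sigma2) + k * np.log(T) / T
    hqic = np.log(sigma2) + 2 * k * np.log(np.log(T)) / T
    return aic, sbic, hqic

# Same (hypothetical) fit quality, one extra parameter
aic1, sbic1, hqic1 = info_criteria(rss=50.0, T=100, p=1, q=0)
aic2, sbic2, hqic2 = info_criteria(rss=50.0, T=100, p=2, q=0)
# the increase in SBIC from the extra parameter exceeds that in HQIC and AIC
```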
ARIMA Models

• As distinct from ARMA models. The I stands for integrated.

• An integrated autoregressive process is one with a characteristic root


on the unit circle.

• Typically researchers difference the variable as necessary and then


build an ARMA model on those differenced variables.

• An ARMA(p,q) model in the variable differenced d times is equivalent


to an ARIMA(p,d,q) model on the original data.
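The relationship can be illustrated with the simplest case: a random walk is an ARIMA(0,1,0), and differencing it once recovers a stationary white-noise series. A sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
u = rng.standard_normal(1_000)   # white noise shocks
y = np.cumsum(u)                 # random walk: y_t = y_{t-1} + u_t (one unit root)
dy = np.diff(y)                  # first difference, i.e. d = 1

# dy recovers the shocks (up to floating-point rounding), so an ARMA model
# built on dy is an ARIMA(p,1,q) model for y itself
```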



Exponential Smoothing

• Another modelling and forecasting technique

• How much weight do we attach to previous observations?

• Expect recent observations to have the most power in helping to forecast


future values of a series.

• The equation for the model
St = αyt + (1 − α)St-1 (1)
where
α is the smoothing constant, with 0 ≤ α ≤ 1
yt is the current realised value
St is the current smoothed value
Exponential Smoothing (cont’d)

• Lagging (1) by one period we can write
St-1 = αyt-1 + (1 − α)St-2 (2)
• and lagging again
St-2 = αyt-2 + (1 − α)St-3 (3)

• Substituting into (1) for St-1 from (2)
St = αyt + (1 − α)(αyt-1 + (1 − α)St-2)
= αyt + α(1 − α)yt-1 + (1 − α)²St-2 (4)

• Substituting into (4) for St-2 from (3)
St = αyt + α(1 − α)yt-1 + (1 − α)²St-2
= αyt + α(1 − α)yt-1 + (1 − α)²(αyt-2 + (1 − α)St-3)
= αyt + α(1 − α)yt-1 + α(1 − α)²yt-2 + (1 − α)³St-3
Exponential Smoothing (cont’d)

• T successive substitutions of this kind would lead to
St = α Σ_{i=0}^{T-1} (1 − α)^i yt-i + (1 − α)^T S0
Since 0 < α ≤ 1, the effect of each observation declines exponentially as we
move another observation forward in time.

• Forecasts are generated by


ft+s = St
for all steps into the future s = 1, 2, ...

• This technique is called single (or simple) exponential smoothing.
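Equation (1) translates into a one-line recursion; a sketch (the series, α and S0 are illustrative):

```python
def exp_smooth(y, alpha, s0):
    """Single exponential smoothing: S_t = alpha*y_t + (1 - alpha)*S_{t-1}."""
    s, out = s0, []
    for obs in y:
        s = alpha * obs + (1 - alpha) * s
        out.append(s)
    return out

smoothed = exp_smooth([10.0, 12.0, 11.0, 13.0], alpha=0.5, s0=10.0)
print(smoothed)           # [10.0, 11.0, 11.0, 12.0]
forecast = smoothed[-1]   # f_{t+s} = S_t for every horizon s
```

Updating when a new observation arrives needs only the previous smoothed value, which is why the technique is so easy to maintain.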



Exponential Smoothing (cont’d)

• It doesn’t work well for financial data because


– there is little structure to smooth
– it cannot allow for seasonality
– it is an ARIMA(0,1,1) with MA coefficient (1−α) - (see Granger & Newbold, p. 174)
– forecasts do not converge on the long-term mean as s → ∞

• Can modify single exponential smoothing


– to allow for trends (Holt’s method)
– or to allow for seasonality (Winter’s method).

• Advantages of Exponential Smoothing


– Very simple to use
– Easy to update the model if a new realisation becomes available.
Forecasting in Econometrics

• Forecasting = prediction.
• An important test of the adequacy of a model.
e.g.
- Forecasting tomorrow’s return on a particular share
- Forecasting the price of a house given its characteristics
- Forecasting the riskiness of a portfolio over the next year
- Forecasting the volatility of bond returns

• We can distinguish two approaches:


- Econometric (structural) forecasting
- Time-series forecasting

• The distinction between the two types is somewhat blurred (e.g., VARs).
In-Sample Versus Out-of-Sample

• Expect the “forecast” of the model to be good in-sample.

• Say we have some data - e.g. monthly FTSE returns for 120 months:
1990M1 – 1999M12. We could use all of it to build the model, or keep some
observations back - e.g. estimate over 1990M1 – 1998M12 and retain
1999M1 – 1999M12 for out-of-sample forecast evaluation:

• A good test of the model since we have not used the information from
1999M1 onwards when we estimated the model parameters.
How to produce forecasts

• Multi-step ahead versus single-step ahead forecasts


• Recursive versus rolling windows

• To understand how to construct forecasts, we need the idea of conditional
expectations:
E(yt+1 | Ωt)

• We cannot forecast a white noise process: E(ut+s | Ωt) = 0 ∀ s > 0.

• The two simplest forecasting “methods”
1. Assume no change: f(yt+s) = yt
2. Forecasts are the long-term average: f(yt+s) = ȳ
Models for Forecasting

• Structural models
e.g. y = Xβ + u
yt = β1 + β2x2t + ... + βkxkt + ut
To forecast yt, we require the conditional expectation of its future
value:
E(yt | Ωt-1) = E(β1 + β2x2t + ... + βkxkt + ut)
= β1 + β2E(x2t) + ... + βkE(xkt)
But what are E(x2t) etc.? We could use x̄2, so
E(yt) = β1 + β2x̄2 + ... + βkx̄k
= ȳ !!



Models for Forecasting (cont’d)

• Time-series Models
The current value of a series, yt, is modelled as a function only of its previous
values and the current value of an error term (and possibly previous values of
the error term).

• Models include:
• simple unweighted averages
• exponentially weighted averages
• ARIMA models
• Non-linear models – e.g. threshold models, GARCH, bilinear models, etc.



Forecasting with ARMA Models

The forecasting model typically used is of the form:

ft,s = μ + Σ_{i=1}^{p} φi ft,s-i + Σ_{j=1}^{q} θj ut+s-j

where ft,s = yt+s, s ≤ 0;
ut+s = 0, s > 0
= ut+s, s ≤ 0



Forecasting with MA Models

• An MA(q) only has memory of q.

e.g. say we have estimated an MA(3) model:

yt = μ + θ1ut-1 + θ2ut-2 + θ3ut-3 + ut

yt+1 = μ + θ1ut + θ2ut-1 + θ3ut-2 + ut+1
yt+2 = μ + θ1ut+1 + θ2ut + θ3ut-1 + ut+2
yt+3 = μ + θ1ut+2 + θ2ut+1 + θ3ut + ut+3

• We are at time t and we want to forecast 1, 2, ..., s steps ahead.

• We know yt, yt-1, ..., and ut, ut-1, ...



Forecasting with MA Models (cont’d)

ft,1 = E(yt+1 | Ωt) = E(μ + θ1ut + θ2ut-1 + θ3ut-2 + ut+1)
= μ + θ1ut + θ2ut-1 + θ3ut-2

ft,2 = E(yt+2 | Ωt) = E(μ + θ1ut+1 + θ2ut + θ3ut-1 + ut+2)
= μ + θ2ut + θ3ut-1

ft,3 = E(yt+3 | Ωt) = E(μ + θ1ut+2 + θ2ut+1 + θ3ut + ut+3)
= μ + θ3ut

ft,4 = E(yt+4 | Ωt) = μ

ft,s = E(yt+s | Ωt) = μ ∀ s ≥ 4
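The collapsing pattern above can be coded directly; a sketch (all numerical values are hypothetical):

```python
def ma3_forecasts(mu, th1, th2, th3, u_t, u_tm1, u_tm2, steps=5):
    """Forecasts f_{t,1}..f_{t,steps} for an MA(3) with known current and past shocks."""
    f = [mu + th1 * u_t + th2 * u_tm1 + th3 * u_tm2,   # f_{t,1}
         mu + th2 * u_t + th3 * u_tm1,                 # f_{t,2}
         mu + th3 * u_t]                               # f_{t,3}
    return f + [mu] * (steps - 3)                      # f_{t,s} = mu for s >= 4

f = ma3_forecasts(mu=1.0, th1=0.4, th2=0.3, th3=0.2,
                  u_t=0.5, u_tm1=-0.2, u_tm2=0.1)
# beyond three steps the forecast is simply the mean: the MA(3) has no
# memory of shocks more than three periods old
```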



Forecasting with AR Models

• Say we have estimated an AR(2)

yt = μ + φ1yt-1 + φ2yt-2 + ut
yt+1 = μ + φ1yt + φ2yt-1 + ut+1
yt+2 = μ + φ1yt+1 + φ2yt + ut+2
yt+3 = μ + φ1yt+2 + φ2yt+1 + ut+3

ft,1 = E(yt+1 | Ωt) = E(μ + φ1yt + φ2yt-1 + ut+1)
= μ + φ1E(yt) + φ2E(yt-1)
= μ + φ1yt + φ2yt-1

ft,2 = E(yt+2 | Ωt) = E(μ + φ1yt+1 + φ2yt + ut+2)
= μ + φ1E(yt+1) + φ2E(yt)
= μ + φ1 ft,1 + φ2yt
Forecasting with AR Models (cont’d)

ft,3 = E(yt+3 | Ωt) = E(μ + φ1yt+2 + φ2yt+1 + ut+3)
= μ + φ1E(yt+2) + φ2E(yt+1)
= μ + φ1 ft,2 + φ2 ft,1

• We can see immediately that

ft,4 = μ + φ1 ft,3 + φ2 ft,2 etc., so

ft,s = μ + φ1 ft,s-1 + φ2 ft,s-2

• We can easily generate ARMA(p,q) forecasts in the same way.
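The recursion is only a few lines of code; a sketch with hypothetical parameter values:

```python
def ar2_forecasts(mu, phi1, phi2, y_t, y_tm1, steps):
    """f_{t,s} = mu + phi1*f_{t,s-1} + phi2*f_{t,s-2}, seeded with y_t and y_{t-1}."""
    prev2, prev1 = y_tm1, y_t
    out = []
    for _ in range(steps):
        f = mu + phi1 * prev1 + phi2 * prev2
        out.append(f)
        prev2, prev1 = prev1, f      # forecasts replace realised values
    return out

f = ar2_forecasts(mu=0.1, phi1=0.5, phi2=0.2, y_t=1.0, y_tm1=0.5, steps=4)
# f_{t,1} = 0.1 + 0.5*1.0 + 0.2*0.5 = 0.70
# f_{t,2} = 0.1 + 0.5*0.70 + 0.2*1.0 = 0.65
```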



How can we test whether a forecast is accurate or not?

• For example, say we predict that tomorrow’s return on the FTSE will be 0.2, but
the outcome is actually -0.4. Is this accurate? Define ft,s as the forecast made at
time t for s steps ahead (i.e. the forecast made for time t+s), and yt+s as the
realised value of y at time t+s.
• Some of the most popular criteria for assessing the accuracy of time-series
forecasting techniques are:

MSE = (1/N) Σ_{t=1}^{N} (yt+s − ft,s)²

MAE is given by
MAE = (1/N) Σ_{t=1}^{N} |yt+s − ft,s|

Mean absolute percentage error:
MAPE = (100/N) Σ_{t=1}^{N} |(yt+s − ft,s) / yt+s|
How can we test whether a forecast is accurate or not?
(cont’d)

• It has, however, also been shown (Gerlow et al., 1993) that the accuracy of
forecasts according to traditional statistical criteria is not related to trading
profitability.

• A measure more closely correlated with profitability:

% correct sign predictions = (1/N) Σ_{t=1}^{N} zt+s

where zt+s = 1 if (yt+s . ft,s) > 0
zt+s = 0 otherwise



Forecast Evaluation Example

• Given the following forecast and actual values, calculate the MSE, MAE and
percentage of correct sign predictions:

Steps Ahead Forecast Actual


1 0.20 -0.40
2 0.15 0.20
3 0.10 0.10
4 0.06 -0.10
5 0.04 -0.05

• MSE = 0.079, MAE = 0.180, % of correct sign predictions = 40
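The figures in the answer can be reproduced directly:

```python
forecasts = [0.20, 0.15, 0.10, 0.06, 0.04]
actuals = [-0.40, 0.20, 0.10, -0.10, -0.05]
N = len(forecasts)

mse = sum((y - f) ** 2 for y, f in zip(actuals, forecasts)) / N
mae = sum(abs(y - f) for y, f in zip(actuals, forecasts)) / N
pct_sign = 100.0 * sum(y * f > 0 for y, f in zip(actuals, forecasts)) / N

print(round(mse, 3), round(mae, 3), pct_sign)   # 0.079 0.18 40.0
```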



What factors are likely to lead to a
good forecasting model?

• “signal” versus “noise”

• “data mining” issues

• simple versus complex models

• financial or economic theory



Statistical Versus Economic or
Financial loss functions

• Statistical evaluation metrics may not be appropriate.


• How well does the forecast perform in doing the job we wanted it for?

Limits of forecasting: What can and cannot be forecast?


• All statistical forecasting models are essentially extrapolative

• Forecasting models are prone to break down around turning points

• Series subject to structural changes or regime shifts cannot be forecast

• Predictive accuracy usually declines with forecasting horizon

• Forecasting is not a substitute for judgement


Back to the original question: why forecast?

• Why not use “experts” to make judgemental forecasts?


• Judgemental forecasts bring a different set of problems:
e.g., psychologists have found that expert judgements are prone to the
following biases:
– over-confidence
– inconsistency
– recency
– anchoring
– illusory patterns
– “group-think”.

• The Usually Optimal Approach


To use a statistical forecasting model built on solid theoretical
foundations supplemented by expert judgements and interpretation.
Chapter 7

Multivariate models



Simultaneous Equations Models

• All the models we have looked at thus far have been single equations models of
the form y = Xβ + u
• All of the variables contained in the X matrix are assumed to be EXOGENOUS.
• y is an ENDOGENOUS variable.

An example from economics to illustrate - the demand and supply of a good:

Qdt = α + βPt + γSt + ut (1)
Qst = λ + μPt + κTt + vt (2)
Qdt = Qst (3)
where Qdt = quantity of the good demanded
Qst = quantity of the good supplied
Pt = price of the good
St = price of a substitute good
Tt = some variable embodying the state of technology
Simultaneous Equations Models:
The Structural Form

• Assuming that the market always clears, and dropping the time subscripts
for simplicity
Q = α + βP + γS + u (4)
Q = λ + μP + κT + v (5)

This is a simultaneous STRUCTURAL FORM of the model.

• The point is that price and quantity are determined simultaneously (price
affects quantity and quantity affects price).

• P and Q are endogenous variables, while S and T are exogenous.

• We can obtain REDUCED FORM equations corresponding to (4) and (5)


by solving equations (4) and (5) for P and for Q (separately).
Obtaining the Reduced Form

• Solving for Q,
α + βP + γS + u = λ + μP + κT + v (6)

• Solving for P,
(Q − α − γS − u)/β = (Q − λ − κT − v)/μ (7)

• Rearranging (6),
βP − μP = λ − α + κT − γS + v − u
(β − μ)P = (λ − α) + κT − γS + (v − u)
P = (λ − α)/(β − μ) + κT/(β − μ) − γS/(β − μ) + (v − u)/(β − μ) (8)



Obtaining the Reduced Form (cont’d)

• Multiplying (7) through by βμ,

μQ − μα − μγS − μu = βQ − βλ − βκT − βv
μQ − βQ = μα − βλ − βκT + μγS + μu − βv
(μ − β)Q = (μα − βλ) − βκT + μγS + (μu − βv)

Q = (μα − βλ)/(μ − β) − βκT/(μ − β) + μγS/(μ − β) + (μu − βv)/(μ − β) (9)

• (8) and (9) are the reduced form equations for P and Q.
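Equations (8) and (9) can be checked numerically against a direct solution of (4) and (5); a sketch with hypothetical parameter and disturbance values (Latin letters stand in for the Greek ones):

```python
import numpy as np

a, b, g = 10.0, -1.0, 0.5     # demand: alpha, beta, gamma
l, m, k = 2.0, 1.5, 0.8       # supply: lambda, mu, kappa
S, T, u, v = 3.0, 4.0, 0.2, -0.1

# Structural form as a linear system in (Q, P):
#   Q - b*P = a + g*S + u     (demand, eq. 4)
#   Q - m*P = l + k*T + v     (supply, eq. 5)
A = np.array([[1.0, -b], [1.0, -m]])
Q, P = np.linalg.solve(A, [a + g * S + u, l + k * T + v])

# Reduced-form expressions (8) and (9):
P_rf = ((l - a) + k * T - g * S + (v - u)) / (b - m)
Q_rf = ((m * a - b * l) - b * k * T + m * g * S + (m * u - b * v)) / (m - b)
print(np.isclose(P, P_rf), np.isclose(Q, Q_rf))   # True True
```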



Simultaneous Equations Bias

• But what would happen if we had estimated equations (4) and (5), i.e. the
structural form equations, separately using OLS?

• Both equations depend on P. One of the CLRM assumptions was that
E(X′u) = 0, where X is a matrix containing all the variables on the RHS of
the equation.

• It is clear from (8) that P is related to the errors in (4) and (5) - i.e. it is
stochastic.

• What would be the consequences for the OLS estimator, β̂, if we ignore
the simultaneity?



Simultaneous Equations Bias (cont’d)

• Recall that β̂ = (X′X)⁻¹X′y and y = Xβ + u

• So that β̂ = (X′X)⁻¹X′(Xβ + u)
= (X′X)⁻¹X′Xβ + (X′X)⁻¹X′u
= β + (X′X)⁻¹X′u

• Taking expectations, E(β̂) = E(β) + E((X′X)⁻¹X′u)
= β + (X′X)⁻¹E(X′u)

• If the X’s are non-stochastic, E(X′u) = 0, which would be the case in a
single equation system, so that E(β̂) = β, which is the condition for
unbiasedness.

• But .... if the equation is part of a system, then E(X′u) ≠ 0, in general.


Simultaneous Equations Bias (cont’d)

• Conclusion: Application of OLS to structural equations which are part


of a simultaneous system will lead to biased coefficient estimates.

• Is the OLS estimator still consistent, even though it is biased?

• No - In fact the estimator is inconsistent as well.

• Hence it would not be possible to estimate equations (4) and (5)


validly using OLS.
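The inconsistency is easy to see in a small Monte Carlo sketch (all parameter values hypothetical). With demand Q = a + bP + u, supply Q = l + mP + v and no exogenous shifters, the OLS slope of Q on P converges to a mixture of the two slopes rather than to b:

```python
import numpy as np

rng = np.random.default_rng(7)
a, b = 5.0, -1.0       # demand intercept and slope
l, m = 1.0, 1.0        # supply intercept and slope
n = 100_000
u = rng.standard_normal(n)
v = rng.standard_normal(n)

P = (a - l + u - v) / (m - b)   # equilibrium price from equating demand and supply
Q = l + m * P + v               # equilibrium quantity

# OLS slope from regressing Q on P, ignoring the simultaneity
b_ols = np.mean((P - P.mean()) * (Q - Q.mean())) / np.var(P)
# with these symmetric disturbances the estimate is near 0, far from b = -1:
# P is correlated with the structural errors, so OLS is biased and inconsistent
```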



Avoiding Simultaneous Equations Bias

So What Can We Do?


• Taking equations (8) and (9), we can rewrite them as
P  10  11T  12 S  1 (10)

Q   20   21T   22 S  2 (11)

• We CAN estimate equations (10) & (11) using OLS since all the RHS
variables are exogenous.

• But ... we probably don’t care what the values of the π coefficients are;
what we wanted were the original parameters in the structural
equations - α, β, γ, λ, μ, κ.



Identification of Simultaneous Equations

Can We Retrieve the Original Coefficients from the π’s?


Short answer: sometimes.

• As well as simultaneity, we sometimes encounter another problem:


identification.
• Consider the following demand and supply equations
Supply equation    Q = α + βP          (12)
Demand equation    Q = λ + μP          (13)
We cannot tell which is which!
• Both equations are UNIDENTIFIED or NOT IDENTIFIED, or
UNDERIDENTIFIED.
• The problem is that we do not have enough information from the equations
to estimate 4 parameters. Notice that we would not have had this problem
with equations (4) and (5) since they have different exogenous variables.
What Determines whether an Equation is Identified
or not?
• We could have three possible situations:

1. An equation is unidentified
· like (12) or (13)
· we cannot get the structural coefficients from the reduced form estimates

2. An equation is exactly identified


· e.g. (4) or (5)
· can get unique structural form coefficient estimates

3. An equation is over-identified
· Example given later
· More than one set of structural coefficients could be obtained from the
reduced form.
What Determines whether an Equation is Identified
or not? (cont’d)

• How do we tell if an equation is identified or not?


• There are two conditions we could look at:

- The order condition - is a necessary but not sufficient condition for an


equation to be identified.

- The rank condition - is a necessary and sufficient condition for


identification. We specify the structural equations in a matrix form and
consider the rank of a coefficient matrix.



Simultaneous Equations Bias (cont’d)

Statement of the Order Condition (from Ramanathan 1995, p. 666)


• Let G denote the number of structural equations. An equation is just
identified if the number of variables excluded from an equation is G-1.

• If more than G-1 are absent, it is over-identified. If less than G-1 are
absent, it is not identified.

Example
• In the following system of equations, the Y’s are endogenous, while the X’s
are exogenous. Determine whether each equation is over-, under-, or
just-identified.

Y1 = α0 + α1Y2 + α3Y3 + α4X1 + α5X2 + u1
Y2 = β0 + β1Y3 + β2X1 + u2                        (14)-(16)
Y3 = γ0 + γ1Y2 + u3
Simultaneous Equations Bias (cont’d)

Solution

G = 3;
If # excluded variables = 2, the eqn is just identified
If # excluded variables > 2, the eqn is over-identified
If # excluded variables < 2, the eqn is not identified

Equation 14: Not identified


Equation 15: Just identified
Equation 16: Over-identified
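The counting rule in the solution above can be mechanised. The sketch below is mine (the function and variable names are not from the text); each equation is represented simply by the set of variables appearing in it:

```python
# Sketch: applying the order condition to the system in equations (14)-(16).
# An equation is just identified if exactly G - 1 variables are excluded from it.

def order_condition(num_equations, variables_in_system, equations):
    """Classify each equation as under-, just- or over-identified."""
    g_minus_1 = num_equations - 1
    status = {}
    for name, included in equations.items():
        excluded = len(variables_in_system) - len(included)
        if excluded < g_minus_1:
            status[name] = "not identified"
        elif excluded == g_minus_1:
            status[name] = "just identified"
        else:
            status[name] = "over-identified"
    return status

system_vars = {"Y1", "Y2", "Y3", "X1", "X2"}
eqs = {
    "eq14": {"Y1", "Y2", "Y3", "X1", "X2"},  # excludes 0 variables
    "eq15": {"Y2", "Y3", "X1"},              # excludes 2 variables
    "eq16": {"Y3", "Y2"},                    # excludes 3 variables
}
print(order_condition(3, system_vars, eqs))
```

Running this reproduces the classification given above: (14) not identified, (15) just identified, (16) over-identified.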



Tests for Exogeneity

• How do we tell whether variables really need to be treated as endogenous


or not?
• Consider again equations (14)-(16). Equation (14) contains Y2 and Y3 - but
do we really need equations for them?
• We can formally test this using a Hausman test, which is calculated as
follows:
1. Obtain the reduced form equations corresponding to (14)-(16). The
reduced forms turn out to be:
Y1  10  11 X1  12 X 2  v1
Y2  20  21 X1  v2 (17)-(19)
Y3  30  31 X1  v3
Estimate the reduced form equations (17)-(19) using OLS, and obtain the
fitted values, Y1 ,Y2 ,Y3



Tests for Exogeneity (cont’d)

2. Run the regression corresponding to equation (14).

3. Run the regression (14) again, but now also including the fitted values
Ŷ2, Ŷ3 as additional regressors:

Y1 = α0 + α1Y2 + α3Y3 + α4X1 + α5X2 + λ2Ŷ2 + λ3Ŷ3 + ε1       (20)

4. Use an F-test to test the joint restriction that λ2 = 0 and λ3 = 0. If the
null hypothesis is rejected, Y2 and Y3 should be treated as endogenous.
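The logic of the test can be sketched on simulated data. To keep the regression well-behaved I use a simplified two-equation analogue (one endogenous regressor Y2 with its own reduced form) rather than the three-equation system above; all names and parameter values are my own illustrative choices, and numpy is assumed:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 2000

# Exogenous variables and disturbances; u1 is correlated with the reduced-form
# error v, which makes y2 endogenous in the structural equation for y1
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
v = rng.standard_normal(n)
u1 = 0.8 * v + 0.5 * rng.standard_normal(n)

y2 = 1.0 + 0.5 * x1 + 0.7 * x2 + v          # reduced form for y2
y1 = 2.0 + 1.5 * y2 + 0.3 * x1 + u1         # structural equation of interest

def ols(y, X):
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta, y - X @ beta

# Step 1: reduced-form regression of y2 on all the exogenous variables
Z = np.column_stack([np.ones(n), x1, x2])
pi, _ = ols(y2, Z)
y2_hat = Z @ pi

# Steps 2-4: augmented regression of y1 including the fitted value y2_hat;
# a significant coefficient on y2_hat signals endogeneity of y2
X_aug = np.column_stack([np.ones(n), y2, x1, y2_hat])
beta, resid = ols(y1, X_aug)
sigma2 = resid @ resid / (n - X_aug.shape[1])
se = np.sqrt(sigma2 * np.linalg.inv(X_aug.T @ X_aug).diagonal())
t_y2hat = beta[3] / se[3]
print(f"t-ratio on fitted y2: {t_y2hat:.2f}")  # large => treat y2 as endogenous
```

With a single fitted value the F-statistic is just the square of this t-ratio.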



Recursive Systems

• Consider the following system of equations:


Y1  10   11 X1   12 X 2  u1
Y2  20  21Y1   21 X 1 22 X 2  u2 (21-23)
Y3  30  31Y1  32Y2   31 X1   32 X 2  u3

• Assume that the error terms are not correlated with each other. Can we estimate
the equations individually using OLS?

• Equation 21: Contains no endogenous variables, so X1 and X2 are not correlated


with u1. So we can use OLS on (21).
• Equation 22: Contains endogenous Y1 together with exogenous X1 and X2. We
can use OLS on (22) if all the RHS variables in (22) are uncorrelated with that
equation’s error term. In fact, Y1 is not correlated with u2 because there is no Y2
term in equation (21). So we can use OLS on (22).
Recursive Systems (cont’d)

• Equation 23: Contains both Y1 and Y2; we require these to be


uncorrelated with u3. By similar arguments to the above, equations
(21) and (22) do not contain Y3, so we can use OLS on (23).

• This is known as a RECURSIVE or TRIANGULAR system. We do


not have a simultaneity problem here.

• But in practice not many systems of equations will be recursive...



Indirect Least Squares (ILS)

• We cannot use OLS on structural equations, but we can validly apply it


to the reduced form equations.

• If the system is just identified, ILS involves estimating the reduced


form equations using OLS, and then using them to substitute back to
obtain the structural parameters.

• However, ILS is not used much because


1. Solving back to get the structural parameters can be tedious.
2. Most simultaneous equations systems are over-identified.



Estimation of Systems
Using Two-Stage Least Squares

• In fact, we can use this technique for just-identified and over-identified


systems.

• Two stage least squares (2SLS or TSLS) is done in two stages:

Stage 1:
• Obtain and estimate the reduced form equations using OLS. Save the
fitted values for the dependent variables.

Stage 2:
• Estimate the structural equations, but replace any RHS endogenous
variables with their stage 1 fitted values.
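The two stages can be carried out by hand on simulated supply and demand data. This is a sketch under assumptions of my own: the structural parameter values and variable names below are illustrative, not from the text, and numpy is assumed. The system mirrors the demand/supply equations (4) and (5), and 2SLS is applied to the demand equation:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical structural parameters (my choices, not from the text):
# demand: Q = alpha + beta*P + gamma*S + u,  supply: Q = lam + mu*P + kappa*T + v
alpha, beta, gamma = 10.0, -1.0, 0.5
lam, mu, kappa = 2.0, 1.0, 0.8

S, T = rng.standard_normal(n), rng.standard_normal(n)
u, v = rng.standard_normal(n), rng.standard_normal(n)

# Market-clearing price and quantity implied by the two structural equations
P = (alpha - lam + gamma * S - kappa * T + u - v) / (mu - beta)
Q = lam + mu * P + kappa * T + v

def ols(y, X):
    return np.linalg.lstsq(X, y, rcond=None)[0]

# Stage 1: regress the endogenous regressor P on all the exogenous variables
Z = np.column_stack([np.ones(n), S, T])
P_hat = Z @ ols(P, Z)

# Stage 2: estimate the demand equation with P replaced by its fitted value
X2 = np.column_stack([np.ones(n), P_hat, S])
b_2sls = ols(Q, X2)

b_ols = ols(Q, np.column_stack([np.ones(n), P, S]))  # biased OLS comparison
print(f"true beta = {beta}, 2SLS = {b_2sls[1]:.3f}, OLS = {b_ols[1]:.3f}")
```

The 2SLS slope lands close to the true β while direct OLS is badly biased towards zero. Note that the standard errors printed by a naive second-stage OLS would be wrong and need the 2SLS correction.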
Estimation of Systems
Using Two-Stage Least Squares (cont’d)

Example: Say equations (14)-(16) are required.

Stage 1:
• Estimate the reduced form equations (17)-(19) individually by OLS and
obtain the fitted values, Ŷ1, Ŷ2, Ŷ3.

Stage 2:
• Replace the RHS endogenous variables with their stage 1 estimated values:
Y1   0  1Y2   3Y3   4 X 1  5 X 2  u1
Y2  0  1Y3  2 X 1  u2 (24)-(26)
Y3   0   1Y2  u3

• Now Y2 and Y3 will not be correlated with u1, Y2 will not be correlated with u2 ,
and Y3 will not be correlated with u3 .
Estimation of Systems
Using Two-Stage Least Squares (cont’d)

• It is still of concern in the context of simultaneous systems whether the


CLRM assumptions are supported by the data.

• If the disturbances in the structural equations are autocorrelated, the


2SLS estimator is not even consistent.

• The standard error estimates also need to be modified compared with


their OLS counterparts, but once this has been done, we can use the
usual t- and F-tests to test hypotheses about the structural form
coefficients.



Instrumental Variables

• Recall that the reason we cannot use OLS directly on the structural
equations is that the endogenous variables are correlated with the errors.

• One solution to this would be not to use Y2 or Y3 , but rather to use some
other variables instead.

• We want these other variables to be (highly) correlated with Y2 and Y3, but
not correlated with the errors - they are called INSTRUMENTS.

• Say we found suitable instruments for Y2 and Y3, z2 and z3 respectively. We


do not use the instruments directly, but run regressions of the form
Y2  1  2 z2  1
Y3  3  4 z3  2 (27) & (28)
Instrumental Variables (cont’d)

• Obtain the fitted values from (27) & (28), Ŷ2 and Ŷ3, and replace Y2 and
Y3 with these in the structural equation.

• We do not use the instruments directly in the structural equation.

• It is typical to use more than one instrument per endogenous variable.

• If the instruments are the variables in the reduced form equations, then
IV is equivalent to 2SLS.
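The equivalence noted in the last bullet can be checked numerically: with one instrument per endogenous regressor, the generalised IV estimator (Z′X)⁻¹Z′y and the two-stage procedure give identical coefficients. The simulated setup below is my own illustration (names and values are assumptions), with numpy assumed:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4000
# One endogenous regressor y2 with a single instrument z2 (simulated)
z2 = rng.standard_normal(n)
v = rng.standard_normal(n)
y2 = 0.8 * z2 + v
u = 0.6 * v + 0.4 * rng.standard_normal(n)   # u correlated with y2
y1 = 1.0 + 2.0 * y2 + u

X = np.column_stack([np.ones(n), y2])
Z = np.column_stack([np.ones(n), z2])

# Generalised IV estimator: b_IV = (Z'X)^(-1) Z'y
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y1)

# 2SLS: replace y2 by its fitted value from a regression on the instrument
y2_hat = Z @ np.linalg.lstsq(Z, y2, rcond=None)[0]
X_hat = np.column_stack([np.ones(n), y2_hat])
b_2sls = np.linalg.lstsq(X_hat, y1, rcond=None)[0]

print(b_iv, b_2sls)  # numerically identical in the just-identified case
```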



Instrumental Variables (cont’d)

What Happens if We Use IV / 2SLS Unnecessarily?


• The coefficient estimates will still be consistent, but will be inefficient
compared to those that just used OLS directly.

The Problem With IV


• What are the instruments?
Solution: 2SLS is easier.

Other Estimation Techniques


1. 3SLS - allows for non-zero covariances between the error terms.
2. LIML - estimating reduced form equations by maximum likelihood
3. FIML - estimating all the equations simultaneously using maximum
likelihood
An Example of the Use of 2SLS: Modelling
the Bid-Ask Spread and Volume for Options
• George and Longstaff (1993)
• Introduction
- Is trading activity related to the size of the bid / ask spread?
- How do spreads vary across options?

• How Might the Option Price / Trading Volume and the Bid / Ask Spread
be Related?

Consider 3 possibilities:
1. Market makers equalise spreads across options.
2. The spread might be a constant proportion of the option value.
3. Market makers might equalise marginal costs across options irrespective
of trading volume.
Market Making Costs

• The S&P 100 Index has been traded on the CBOE since 1983 on a
continuous open-outcry auction basis.

• Transactions take place at the highest bid or the lowest ask.

• Market making is highly competitive.



What Are the Costs Associated with Market Making?

• For every contract (100 options) traded, a CBOE fee of 9c and an


Options Clearing Corporation (OCC) fee of 10c is levied on the firm
that clears the trade.

• Trading is not continuous.

• Average time between trades in 1989 was approximately 5 minutes.



The Influence of Tick-Size Rules on Spreads

• The CBOE limits the tick size:

$1/8 for options worth $3 or more


$1/16 for options worth less than $3

• The spread is likely to depend on trading volume


... but also trading volume is likely to depend on the spread.

• So there will be a simultaneous relationship.



The Data

• All trading days during 1989 are used for observations.

• The average bid & ask prices are calculated for each option during the
time 2:00pm – 2:15pm Central Standard time.

• The following are then dropped from the sample for that day:
1. Any options that do not have bid / ask quotes reported during the ¼ hour.
2. Any options with fewer than 10 trades during the day.

• The option price is defined as the average of the bid & the ask.

• We get a total of 2456 observations. This is a pooled regression.



The Models

• For the calls:


CBAi  0  1CDUMi  2Ci  3CLi  4Ti  5CRi  ei (1)

CLi   0   1CBAi   2Ti   3Ti 2   4 Mi2  vi (2)

• And symmetrically for the puts:

PBAi  0  1 PDUMi  2 Pi  3 PLi  4Ti  5 PRi  ui (3)

PLi  0  1 PBAi  2Ti  3Ti 2  4 Mi2  wi (4)

where PRi & CRi are the squared deltas of the options



The Models (cont’d)

• CDUMi and PDUMi are dummy variables


=0 if Ci or Pi < $3
=1 if Ci or Pi ≥ $3

• T2 allows for a nonlinear relationship between time to maturity and the


spread.

• M2 is used since ATM options have a higher trading volume.

• Aside: are the equations identified?

• Equations (1) & (2) and then separately (3) & (4) are estimated using
2SLS.
Results 1

Call Bid-Ask Spread and Trading Volume Regression


CBAi = α0 + α1CDUMi + α2Ci + α3CLi + α4Ti + α5CRi + ei      (6.55)
CLi = γ0 + γ1CBAi + γ2Ti + γ3Ti² + γ4Mi² + vi               (6.56)

α0        α1        α2        α3        α4        α5        Adj. R²
0.08362   0.06114   0.01679   0.00902   -0.00228  -0.15378  0.688
(16.80)   (8.63)    (15.49)   (14.01)   (-12.31)  (-12.52)
γ0        γ1        γ2        γ3        γ4                  Adj. R²
-3.8542   46.592    -0.12412  0.00406   0.00866             0.618
(-10.50)  (30.49)   (-6.01)   (14.43)   (4.76)
Note: t-ratios in parentheses. Source: George and Longstaff (1993). Reprinted with the permission of
the School of Business Administration, University of Washington.



Results 2

Put Bid-Ask Spread and Trading Volume Regression


PBAi = β0 + β1PDUMi + β2Pi + β3PLi + β4Ti + β5PRi + ui      (6.57)
PLi = δ0 + δ1PBAi + δ2Ti + δ3Ti² + δ4Mi² + wi               (6.58)

β0        β1        β2        β3        β4        β5        Adj. R²
0.05707   0.03258   0.01726   0.00839   -0.00120  -0.08662  0.675
(15.19)   (5.35)    (15.90)   (12.56)   (-7.13)   (-7.15)
δ0        δ1        δ2        δ3        δ4                  Adj. R²
-2.8932   46.460    -0.15151  0.00339   0.01347             0.517
(-8.42)   (34.06)   (-7.74)   (12.90)   (10.86)
Note: t-ratios in parentheses. Source: George and Longstaff (1993). Reprinted with the permission of
the School of Business Administration, University of Washington.



Comments:

Adjusted R² ≈ 60%

α1 and β1 measure the tick size constraint on the spread

α2 and β2 measure the effect of the option price on the spread

α3 and β3 measure the effect of trading activity on the spread

α4 and β4 measure the effect of time to maturity on the spread

α5 and β5 measure the effect of risk on the spread

γ1 and δ1 measure the effect of the spread size on trading activity etc.
Calls and Puts as Substitutes

• The paper argues that calls and puts might be viewed as substitutes
since they are all written on the same underlying.

• So call trading activity might depend on the put spread and put trading
activity might depend on the call spread.

• The results for the other variables are little changed.



Conclusions

• Bid - Ask spread variations between options can be explained by reference


to the level of trading activity, deltas, time to maturity etc. There is a 2 way
relationship between volume and the spread.

• The authors argue that in the second part of the paper, they did indeed find
evidence of substitutability between calls & puts.

Comments
- No diagnostics.
- Why do the CL and PL equations not contain the CR and PR variables?
- The authors could have tested for endogeneity of CBA and CL.
- Why are the squared terms in maturity and moneyness only in the
liquidity regressions?
- Wrong sign on the squared deltas.
Vector Autoregressive Models

• A natural generalisation of autoregressive models popularised by Sims


(1980)

• A VAR is in a sense a systems regression model i.e. there is more than one
dependent variable.

• Simplest case is a bivariate VAR


y1t  10  11 y1t 1 ... 1k y1t  k  11 y2 t 1 ...1k y2 t  k  u1t
y2 t  20  21 y2 t 1 ... 2 k y2 t  k   21 y1t 1 ... 2 k y1t  k  u2 t
where uit is an iid disturbance term with E(uit)=0, i=1,2; E(u1t u2t)=0.

• The analysis could be extended to a VAR(g) model, i.e. a system
containing g variables and g equations.
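A bivariate VAR(1) of this kind can be simulated and then estimated equation by equation with OLS. The coefficient values below are my own illustrative choices, and numpy is assumed:

```python
import numpy as np

rng = np.random.default_rng(7)
T = 1000

# Simulate a stationary bivariate VAR(1) with made-up coefficient matrices
beta0 = np.array([0.1, 0.2])
beta1 = np.array([[0.5, 0.2],
                  [0.1, 0.4]])
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = beta0 + beta1 @ y[t - 1] + rng.standard_normal(2) * 0.5

# Estimate each equation separately by OLS on a constant and one lag of
# both variables -- valid because the RHS contains no contemporaneous terms
X = np.column_stack([np.ones(T - 1), y[:-1]])
coefs = np.linalg.lstsq(X, y[1:], rcond=None)[0]   # shape (3, 2)
print(coefs.T)   # row i: intercept and lag coefficients for equation i
```

The estimated lag coefficients recover the simulated matrix up to sampling error.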



Vector Autoregressive Models:
Notation and Concepts

• One important feature of VARs is the compactness with which we can
write the notation. For example, consider the case from above where k=1:

y1t = β10 + β11y1t−1 + α11y2t−1 + u1t
y2t = β20 + β21y2t−1 + α21y1t−1 + u2t

• We can write this as

[ y1t ]   [ β10 ]   [ β11  α11 ] [ y1t−1 ]   [ u1t ]
[ y2t ] = [ β20 ] + [ α21  β21 ] [ y2t−1 ] + [ u2t ]

or even more compactly as

yt = β0 + β1 yt−1 + ut
g×1  g×1  g×g g×1   g×1
Vector Autoregressive Models:
Notation and Concepts (cont’d)

• This model can be extended to the case where there are k lags of each
variable in each equation:

yt = 0 + 1 yt-1 + 2 yt-2 +...+ k yt-k + ut


g1 g1 gg g1 gg g1 gg g1 g1

• We can also extend this to the case where the model includes first
difference terms and cointegrating relationships (a VECM).



Vector Autoregressive Models Compared with
Structural Equations Models

• Advantages of VAR Modelling


- Do not need to specify which variables are endogenous or exogenous - all are
endogenous
- Allows the value of a variable to depend on more than just its own lags or
combinations of white noise terms, so more general than ARMA modelling
- Provided that there are no contemporaneous terms on the right hand side of the
equations, can simply use OLS separately on each equation
- Forecasts are often better than “traditional structural” models.
• Problems with VAR’s
- VAR’s are a-theoretical (as are ARMA models)
- How do you decide the appropriate lag length?
- So many parameters! If we have g equations for g variables and we have k lags of each
of the variables in each equation, we have to estimate (g+kg2) parameters. e.g. g=3, k=3,
parameters = 30
- Do we need to ensure all components of the VAR are stationary?
- How do we interpret the coefficients?
Choosing the Optimal Lag Length for a VAR

 2 possible approaches: cross-equation restrictions and information criteria

Cross-Equation Restrictions
 In the spirit of (unrestricted) VAR modelling, each equation should have
the same lag length
 Suppose that a bivariate VAR(8) estimated using quarterly data has 8 lags
of the two variables in each equation, and we want to examine a restriction
that the coefficients on lags 5 through 8 are jointly zero. This can be done
using a likelihood ratio test
Denote the variance-covariance matrix of the residuals (given by û′û/T) as Σ̂.
The likelihood ratio test for this joint hypothesis is given by

LR = T [ log|Σ̂r| − log|Σ̂u| ]
Choosing the Optimal Lag Length for a VAR
(cont’d)

where Σ̂r is the variance-covariance matrix of the residuals for the restricted
model (with 4 lags), Σ̂u is the variance-covariance matrix of the residuals for
the unrestricted VAR (with 8 lags), and T is the sample size.
• The test statistic is asymptotically distributed as a χ² with degrees of freedom
equal to the total number of restrictions. In the VAR case above, we are
restricting 4 lags of two variables in each of the two equations = a total of
4 * 2 * 2 = 16 restrictions.
• In the general case where we have a VAR with p equations, and we want to
impose the restriction that the last q lags have zero coefficients, there would
be p2q restrictions altogether
• Disadvantages: Conducting the LR test is cumbersome and requires a
normality assumption for the disturbances.
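The LR statistic is straightforward to compute. The sketch below (my own simulation and names, numpy assumed) fits the restricted VAR(4) and unrestricted VAR(8) on the same effective sample, as required for the two determinants to be comparable:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 500
# Simulate bivariate data from a VAR(1), then compare a VAR(8) with a VAR(4)
y = np.zeros((T, 2))
A = np.array([[0.4, 0.1], [0.2, 0.3]])
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.standard_normal(2)

def var_resid_cov(y, lags, burn):
    """OLS residual covariance u'u/T for a VAR(lags), dropping `burn` obs."""
    T_eff = len(y) - burn
    X = np.column_stack([np.ones(T_eff)] +
                        [y[burn - L: len(y) - L] for L in range(1, lags + 1)])
    Y = y[burn:]
    resid = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    return resid.T @ resid / T_eff

burn = 8                            # same estimation sample for both models
sig_u = var_resid_cov(y, 8, burn)   # unrestricted: 8 lags
sig_r = var_resid_cov(y, 4, burn)   # restricted: lags 5-8 set to zero
T_eff = T - burn
LR = T_eff * (np.log(np.linalg.det(sig_r)) - np.log(np.linalg.det(sig_u)))
df = 2 * 2 * 4                      # 4 lags of 2 variables in 2 equations
print(f"LR = {LR:.2f}, chi-squared degrees of freedom = {df}")
```

The statistic would then be compared with a χ²(16) critical value.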



Information Criteria for VAR Lag Length Selection

• Multivariate versions of the information criteria are required. These can
be defined as:

MAIC  = log|Σ̂| + 2k′/T
MSBIC = log|Σ̂| + (k′/T) log(T)
MHQIC = log|Σ̂| + (2k′/T) log(log(T))

where all notation is as above and k′ is the total number of regressors in all
equations, which will be equal to g²k + g for g equations, each with k lags
of the g variables, plus a constant term in each equation. The values of the
information criteria are constructed for 0, 1, … lags (up to some pre-specified
maximum k̄).
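The three criteria can be computed directly from the fitted residual covariance at each lag length. The sketch below is mine (simulated data, illustrative coefficients, numpy assumed); every candidate VAR(k) is estimated on the same sample so the criteria are comparable:

```python
import numpy as np

rng = np.random.default_rng(5)
T = 1000
# Simulated bivariate VAR(2), so lag selection has a true answer of 2
y = np.zeros((T, 2))
A1 = np.array([[0.3, 0.1], [0.0, 0.2]])
A2 = np.array([[0.4, 0.0], [0.1, 0.4]])
for t in range(2, T):
    y[t] = A1 @ y[t - 1] + A2 @ y[t - 2] + rng.standard_normal(2)

g, max_lag = 2, 4
results = {}
for k in range(1, max_lag + 1):
    # common estimation sample: drop the first max_lag observations
    X = np.column_stack([np.ones(T - max_lag)] +
                        [y[max_lag - L: T - L] for L in range(1, k + 1)])
    Y = y[max_lag:]
    resid = Y - X @ np.linalg.lstsq(X, Y, rcond=None)[0]
    T_eff = len(Y)
    logdet = np.log(np.linalg.det(resid.T @ resid / T_eff))
    kp = g * g * k + g                 # k' = total regressors, all equations
    results[k] = (logdet + 2 * kp / T_eff,                        # MAIC
                  logdet + kp / T_eff * np.log(T_eff),            # MSBIC
                  logdet + 2 * kp / T_eff * np.log(np.log(T_eff)))  # MHQIC
for k, (aic, sbic, hqic) in results.items():
    print(f"lags={k}: MAIC={aic:.3f}  MSBIC={sbic:.3f}  MHQIC={hqic:.3f}")
```

The chosen lag length is the one minimising each criterion; here moving from one lag to the true two lags lowers all three.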



Does the VAR Include Contemporaneous Terms?

• So far, we have assumed the VAR is of the form

y1t = β10 + β11y1t−1 + α11y2t−1 + u1t
y2t = β20 + β21y2t−1 + α21y1t−1 + u2t

• But what if the equations had a contemporaneous feedback term?

y1t = β10 + β11y1t−1 + α11y2t−1 + α12y2t + u1t
y2t = β20 + β21y2t−1 + α21y1t−1 + α22y1t + u2t

• We can write this as

[ y1t ]   [ β10 ]   [ β11  α11 ] [ y1t−1 ]   [ α12   0  ] [ y2t ]   [ u1t ]
[ y2t ] = [ β20 ] + [ α21  β21 ] [ y2t−1 ] + [  0   α22 ] [ y1t ] + [ u2t ]

• This VAR is in primitive form.


Primitive versus Standard Form VARs

• We can take the contemporaneous terms over to the LHS and write

[  1   −α12 ] [ y1t ]   [ β10 ]   [ β11  α11 ] [ y1t−1 ]   [ u1t ]
[ −α22   1  ] [ y2t ] = [ β20 ] + [ α21  β21 ] [ y2t−1 ] + [ u2t ]

or
A yt = β0 + β1 yt−1 + ut

• We can then pre-multiply both sides by A⁻¹ to give
yt = A⁻¹β0 + A⁻¹β1 yt−1 + A⁻¹ut
or
yt = A0 + A1 yt−1 + et

• This is known as a standard form VAR, which we can estimate using OLS.
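The mapping from primitive to standard form is just a matrix inversion. The numerical values below are illustrative assumptions of mine, with numpy assumed:

```python
import numpy as np

# Hypothetical primitive-form matrices (illustrative values only)
A = np.array([[1.0, -0.3],      # contemporaneous coefficient matrix
              [-0.2, 1.0]])
beta0 = np.array([0.5, 1.0])
beta1 = np.array([[0.6, 0.1],
                  [0.2, 0.5]])

A_inv = np.linalg.inv(A)
A0 = A_inv @ beta0              # standard-form intercepts
A1 = A_inv @ beta1              # standard-form lag coefficient matrix
# Note: the standard-form errors e_t = A^{-1} u_t are contemporaneously
# correlated even if the primitive-form errors u_t are not
print(A0, A1, sep="\n")
```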
Block Significance and Causality Tests

• It is likely that, when a VAR includes many lags of variables, it will be


difficult to see which sets of variables have significant effects on each
dependent variable and which do not. For illustration, consider the following
bivariate VAR(3):

[ y1t ]   [ α10 ]   [ β11  β12 ] [ y1t−1 ]   [ γ11  γ12 ] [ y1t−2 ]   [ δ11  δ12 ] [ y1t−3 ]   [ u1t ]
[ y2t ] = [ α20 ] + [ β21  β22 ] [ y2t−1 ] + [ γ21  γ22 ] [ y2t−2 ] + [ δ21  δ22 ] [ y2t−3 ] + [ u2t ]

• This VAR could be written out to express the individual equations as
y1t = α10 + β11y1t−1 + β12y2t−1 + γ11y1t−2 + γ12y2t−2 + δ11y1t−3 + δ12y2t−3 + u1t
y2t = α20 + β21y1t−1 + β22y2t−1 + γ21y1t−2 + γ22y2t−2 + δ21y1t−3 + δ22y2t−3 + u2t
• We might be interested in testing the following hypotheses, and their
implied restrictions on the parameter matrices:



Block Significance and Causality Tests (cont’d)

Hypothesis                                      Implied Restriction
1. Lags of y1t do not explain current y2t       β21 = 0 and γ21 = 0 and δ21 = 0
2. Lags of y1t do not explain current y1t       β11 = 0 and γ11 = 0 and δ11 = 0
3. Lags of y2t do not explain current y1t       β12 = 0 and γ12 = 0 and δ12 = 0
4. Lags of y2t do not explain current y2t       β22 = 0 and γ22 = 0 and δ22 = 0
• Each of these four joint hypotheses can be tested within the F-test
framework, since each set of restrictions contains only parameters drawn
from one equation.
• These tests could also be referred to as Granger causality tests.
• Granger causality tests seek to answer questions such as “Do changes in y1
cause changes in y2?” If y1 causes y2, lags of y1 should be significant in the
equation for y2. If this is the case, we say that y1 “Granger-causes” y2.
• If y2 causes y1, lags of y2 should be significant in the equation for y1.
• If both sets of lags are significant, there is “bi-directional causality”
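A Granger causality F-test of this kind compares the restricted model (own lags only) with the unrestricted model (lags of both variables). The sketch below is my own simulation (names and values assumed, numpy assumed), in which y2 genuinely leads y1:

```python
import numpy as np

rng = np.random.default_rng(11)
T = 800
# y2 is an AR(1); y1 loads on lagged y2, so y2 should Granger-cause y1
y1, y2 = np.zeros(T), np.zeros(T)
for t in range(1, T):
    y2[t] = 0.5 * y2[t - 1] + rng.standard_normal()
    y1[t] = 0.3 * y1[t - 1] + 0.6 * y2[t - 1] + rng.standard_normal()

p = 3  # number of lags in the VAR

def granger_F(target, other, p):
    """F-test that all p lags of `other` are zero in the equation for `target`."""
    Y = target[p:]
    own = np.column_stack([target[p - L: len(target) - L] for L in range(1, p + 1)])
    oth = np.column_stack([other[p - L: len(other) - L] for L in range(1, p + 1)])
    ones = np.ones((len(Y), 1))
    Xr = np.hstack([ones, own])          # restricted: own lags only
    Xu = np.hstack([ones, own, oth])     # unrestricted: lags of both variables
    ssr_r = np.sum((Y - Xr @ np.linalg.lstsq(Xr, Y, rcond=None)[0]) ** 2)
    ssr_u = np.sum((Y - Xu @ np.linalg.lstsq(Xu, Y, rcond=None)[0]) ** 2)
    dof = len(Y) - Xu.shape[1]
    return (ssr_r - ssr_u) / p / (ssr_u / dof)

print(f"H0: y2 does not Granger-cause y1, F = {granger_F(y1, y2, p):.2f}")
print(f"H0: y1 does not Granger-cause y2, F = {granger_F(y2, y1, p):.2f}")
```

The first F-statistic is large (rejecting the null), the second should be of typical size under the null.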
Impulse Responses

• VAR models are often difficult to interpret: one solution is to construct


the impulse responses and variance decompositions.
• Impulse responses trace out the responsiveness of the dependent variables
in the VAR to shocks to the error term. A unit shock is applied to each
variable and its effects are noted.
• Consider for example a simple bivariate VAR(1):
y1t = β10 + β11y1t−1 + α11y2t−1 + u1t
y2t = β20 + β21y2t−1 + α21y1t−1 + u2t
• A change in u1t will immediately change y1. It will also change y2 and y1
during the next period.
• We can examine how long and to what degree a shock to a given equation
has on all of the variables in the system.
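For a VAR(1) with coefficient matrix A1, the response to a unit shock s periods later is simply A1 raised to the power s. The matrix below is an illustrative assumption of mine, with numpy assumed:

```python
import numpy as np

# Non-orthogonalised impulse responses for a bivariate VAR(1)
A1 = np.array([[0.5, 0.3],
               [0.2, 0.4]])

horizon = 10
irf = np.zeros((horizon + 1, 2, 2))
irf[0] = np.eye(2)               # a unit shock hits each variable at s = 0
for s in range(1, horizon + 1):
    irf[s] = A1 @ irf[s - 1]     # Phi_s = A1**s for a VAR(1)

# irf[s, i, j]: response of variable i, s periods after a unit shock to variable j
print(irf[1])                    # one-step responses equal A1 itself
```

Because the simulated system is stationary, the responses die away as the horizon grows.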



Variance Decompositions

• Variance decompositions offer a slightly different method of examining


VAR dynamics. They give the proportion of the movements in the
dependent variables that are due to their “own” shocks, versus shocks to the
other variables.

• This is done by determining how much of the s-step-ahead forecast error
variance for each variable is explained by innovations in each explanatory
variable (s = 1, 2, …).

• The variance decomposition gives information about the relative importance


of each shock to the variables in the VAR.
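A variance decomposition can be sketched for a known VAR(1) by accumulating squared orthogonalised impulse responses, with the shocks orthogonalised by a Cholesky factorisation (so the ordering matters, as discussed on the next slide). All numbers below are my own illustrative assumptions, with numpy assumed:

```python
import numpy as np

# Orthogonalised forecast error variance decomposition for a VAR(1)
A1 = np.array([[0.5, 0.3],
               [0.2, 0.4]])
Sigma = np.array([[1.0, 0.3],    # error covariance: the shocks are correlated
                  [0.3, 0.5]])
P = np.linalg.cholesky(Sigma)    # Cholesky factor: variable 1 ordered first

h = 12
contrib = np.zeros((2, 2))       # contrib[i, j]: shock j's part of var(y_i)
Phi = np.eye(2)
for s in range(h):
    PhiP = Phi @ P
    contrib += PhiP ** 2         # accumulate squared orthogonalised responses
    Phi = A1 @ Phi
shares = contrib / contrib.sum(axis=1, keepdims=True)
print(shares)                    # each row sums to one
```

Each row gives the fraction of a variable's h-step forecast error variance attributed to each orthogonalised shock.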



Impulse Responses and Variance Decompositions:
The Ordering of the Variables
• But for calculating impulse responses and variance decompositions, the
ordering of the variables is important.
• The main reason for this is that above, we assumed that the VAR error terms
were statistically independent of one another.
• This is generally not true, however. The error terms will typically be correlated
to some degree.
• Therefore, the notion of examining the effect of the innovations separately has
little meaning, since they have a common component.
• What is done is to “orthogonalise” the innovations.
• In the bivariate VAR, this problem would be approached by attributing all of
the effect of the common component to the first of the two variables in the
VAR.
• In the general case where there are more variables, the situation is more
complex but the interpretation is the same.
An Example of the use of VAR Models:
The Interaction between Property Returns and the
Macroeconomy
• Brooks and Tsolacos (1999) employ a VAR methodology for investigating the
interaction between the UK property market and various macroeconomic variables.
• Monthly data are used for the period December 1985 to January 1998.
• It is assumed that stock returns are related to macroeconomic and business
conditions.
• The variables included in the VAR are
– FTSE Property Total Return Index (with general stock market effects removed)
– The rate of unemployment
– Nominal interest rates
– The spread between long and short term interest rates
– Unanticipated inflation
– The dividend yield.
The property index and unemployment are I(1) and hence are differenced.
Marginal Significance Levels associated with Joint
F-tests that all 14 Lags have no Explanatory Power
for that particular Equation in the VAR

• Multivariate AIC selected 14 lags of each variable in the VAR

Lags of Variable
Dependent variable SIR DIVY SPREAD UNEM UNINFL PROPRES
SIR 0.0000 0.0091 0.0242 0.0327 0.2126 0.0000
DIVY 0.5025 0.0000 0.6212 0.4217 0.5654 0.4033
SPREAD 0.2779 0.1328 0.0000 0.4372 0.6563 0.0007
UNEM 0.3410 0.3026 0.1151 0.0000 0.0758 0.2765
UNINFL 0.3057 0.5146 0.3420 0.4793 0.0004 0.3885
PROPRES 0.5537 0.1614 0.5537 0.8922 0.7222 0.0000



Variance Decompositions for the
Property Sector Index Residuals

• Ordering for Variance Decompositions and Impulse Responses:


– Order I: PROPRES, DIVY, UNINFL, UNEM, SPREAD, SIR
– Order II: SIR, SPREAD, UNEM, UNINFL, DIVY, PROPRES.
Explained by innovations in

SIR DIVY SPREAD UNEM UNINFL PROPRES

Months ahead I II I II I II I II I II I II

1 0.0 0.8 0.0 38.2 0.0 9.1 0.0 0.7 0.0 0.2 100.0 51.0

2 0.2 0.8 0.2 35.1 0.2 12.3 0.4 1.4 1.6 2.9 97.5 47.5

3 3.8 2.5 0.4 29.4 0.2 17.8 1.0 1.5 2.3 3.0 92.3 45.8

4 3.7 2.1 5.3 22.3 1.4 18.5 1.6 1.1 4.8 4.4 83.3 51.5

12 2.8 3.1 15.5 8.7 15.3 19.5 3.3 5.1 17.0 13.5 46.1 50.0

24 8.2 6.3 6.8 3.9 38.0 36.2 5.5 14.7 18.1 16.9 23.4 22.0



Impulse Responses and Standard Error Bands for
Innovations in Dividend Yields and
the Treasury Bill Yield
[Figure: impulse responses and standard error bands for innovations in
dividend yields, steps ahead 1-24]

[Figure: impulse responses and standard error bands for innovations in the
T-bill yield, steps ahead 1-24]
Chapter 8

Modelling long-run relationships in finance



Stationarity and Unit Root Testing
Why do we need to test for Non-Stationarity?

• The stationarity or otherwise of a series can strongly influence its


behaviour and properties - e.g. persistence of shocks will be infinite for
nonstationary series

• Spurious regressions. If two variables are trending over time, a


regression of one on the other could have a high R2 even if the two are
totally unrelated

• If the variables in the regression model are not stationary, then it can
be proved that the standard assumptions for asymptotic analysis will
not be valid. In other words, the usual “t-ratios” will not follow a t-
distribution, so we cannot validly undertake hypothesis tests about the
regression parameters.
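The spurious regression problem is easy to reproduce by Monte Carlo, in the spirit of the two figures that follow. The sketch below (my own simulation design, numpy assumed) regresses one independent random walk on another 1000 times and records how often the conventional t-ratio "rejects":

```python
import numpy as np

rng = np.random.default_rng(2024)
T, n_sims = 200, 1000
t_ratios = np.empty(n_sims)

# Regress one independent random walk on another, 1000 times
for i in range(n_sims):
    y = np.cumsum(rng.standard_normal(T))
    x = np.cumsum(rng.standard_normal(T))
    X = np.column_stack([np.ones(T), x])
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ b
    s2 = resid @ resid / (T - 2)
    se = np.sqrt(s2 * np.linalg.inv(X.T @ X)[1, 1])
    t_ratios[i] = b[1] / se

# Under valid asymptotics |t| > 1.96 should occur about 5% of the time;
# with independent random walks it happens far more often
reject_rate = np.mean(np.abs(t_ratios) > 1.96)
print(f"spurious rejection rate: {reject_rate:.2f}")
```

The rejection rate is far above the nominal 5%, which is exactly the point of the plots on the next two slides.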



Value of R2 for 1000 Sets of Regressions of a
Non-stationary Variable on another Independent
Non-stationary Variable



Value of t-ratio on Slope Coefficient for 1000 Sets of
Regressions of a Non-stationary Variable on another
Independent Non-stationary Variable



Two Types of Non-Stationarity

• Various definitions of non-stationarity exist


• In this chapter, we are really referring to the weak form or covariance
stationarity
• There are two models which have been frequently used to characterise
non-stationarity: the random walk model with drift:
yt = μ + yt−1 + ut                 (1)

and the deterministic trend process:

yt = α + βt + ut                   (2)

where ut is iid in both cases.



Stochastic Non-Stationarity

• Note that the model (1) could be generalised to the case where yt is an
explosive process:
yt = μ + φyt−1 + ut
where φ > 1.

• Typically, the explosive case is ignored and we use φ = 1 to
characterise the non-stationarity because
– φ > 1 does not describe many data series in economics and finance.
– φ > 1 has an intuitively unappealing property: shocks to the system
are not only persistent through time, they are propagated so that a
given shock will have an increasingly large influence.



Stochastic Non-stationarity: The Impact of Shocks

• To see this, consider the general case of an AR(1) with no drift:
yt = φyt−1 + ut                    (3)
Let φ take any value for now.
• We can write:  yt−1 = φyt−2 + ut−1
                 yt−2 = φyt−3 + ut−2
• Substituting into (3) yields:  yt = φ(φyt−2 + ut−1) + ut
                                    = φ²yt−2 + φut−1 + ut
• Substituting again for yt−2:   yt = φ²(φyt−3 + ut−2) + φut−1 + ut
                                    = φ³yt−3 + φ²ut−2 + φut−1 + ut
• T successive substitutions of this type lead to:
yt = φ^T y0 + φut−1 + φ²ut−2 + φ³ut−3 + … + φ^T u0 + ut



The Impact of Shocks for
Stationary and Non-stationary Series

• We have 3 cases:
1. φ < 1 ⇒ φ^T → 0 as T → ∞
So the shocks to the system gradually die away.

2. φ = 1 ⇒ φ^T = 1 ∀ T
So shocks persist in the system and never die away. We obtain:

yt = y0 + Σut as T → ∞

So just an infinite sum of past shocks plus some starting value of y0.

3. φ > 1. Now given shocks become more influential as time goes on,
since if φ > 1, φ³ > φ² > φ etc.
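The three cases can be checked numerically by computing the weight φ^T that a shock from T periods ago carries in yt (the φ values below are illustrative):

```python
import numpy as np

# Weight of a shock after 0..50 periods for each case of phi.
for phi in (0.5, 1.0, 1.05):
    weights = phi ** np.arange(0, 51)   # phi**T for T = 0, 1, ..., 50
    print(phi, weights[50])

# phi < 1: the weight decays towards zero (shock dies away);
# phi = 1: the weight stays at exactly 1 forever (shock persists);
# phi > 1: the weight grows without bound (shock is propagated).
```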
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 8
Detrending a Stochastically Non-stationary Series

• Going back to our 2 characterisations of non-stationarity, the r.w. with drift:


yt = μ + yt-1 + ut (1)
and the trend-stationary process
yt = α + βt + ut (2)
• The two will require different treatments to induce stationarity. The second case
is known as deterministic non-stationarity and what is required is detrending.
• The first case is known as stochastic non-stationarity. If we let
Δyt = yt - yt-1
and L yt = yt-1
so (1 - L) yt = yt - Lyt = yt - yt-1
If we take (1) and subtract yt-1 from both sides:
yt - yt-1 = μ + ut
Δyt = μ + ut
We say that we have induced stationarity by “differencing once”.
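A small sketch of "differencing once": applying Δ to a simulated random walk with drift leaves μ + ut, which is stationary white noise around μ:

```python
import numpy as np

# Random walk with drift, y_0 = 0, mu = 0.5 (illustrative value)
rng = np.random.default_rng(0)
mu, T = 0.5, 1000
u = rng.standard_normal(T)
y = np.cumsum(mu + u)      # y_t = mu + y_{t-1} + u_t

dy = np.diff(y)            # Delta y_t = y_t - y_{t-1} = mu + u_t
print(dy.mean())           # close to mu: the differenced series is stationary
```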
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 9
Detrending a Series: Using the Right Method

• Although trend-stationary and difference-stationary series are both


“trending” over time, the correct approach needs to be used in each case.

• If we first difference the trend-stationary series, it would “remove” the


non-stationarity, but at the expense of introducing an MA(1) structure into
the errors.

• Conversely if we try to detrend a series which has a stochastic trend, then


we will not remove the non-stationarity.

• We will now concentrate on the stochastic non-stationarity model since


deterministic non-stationarity does not adequately describe most series in
economics or finance.
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 10
Sample Plots for Various Stochastic Processes:
A White Noise Process

[Figure: a simulated white noise process over approximately 500 observations, fluctuating randomly between about -4 and +4 around a mean of zero]

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 11


Sample Plots for various Stochastic Processes:
A Random Walk and a Random Walk with Drift

[Figure: simulated paths of a random walk and a random walk with drift over approximately 500 observations]

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 12


Sample Plots for Various Stochastic Processes:
A Deterministic Trend Process

[Figure: a simulated deterministic trend process over approximately 500 observations]

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 13


Autoregressive Processes with
Differing Values of φ (0, 0.8, 1)

[Figure: simulated autoregressive processes with φ = 0, φ = 0.8 and φ = 1 over approximately 1,000 observations]

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 14


Definition of Non-Stationarity

• Consider again the simplest stochastic trend model:


yt = yt-1 + ut
or Δyt = ut
• We can generalise this concept to consider the case where the series
contains more than one “unit root”. That is, we would need to apply the
first difference operator, , more than once to induce stationarity.

Definition
If a non-stationary series, yt, must be differenced d times before it becomes
stationary, then it is said to be integrated of order d. We write yt ~ I(d).
So if yt ~ I(d) then Δ^d yt ~ I(0).
An I(0) series is a stationary series.
An I(1) series contains one unit root,
e.g. yt = yt-1 + ut
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 15
Characteristics of I(0), I(1) and I(2) Series

• An I(2) series contains two unit roots and so would require differencing
twice to induce stationarity.

• I(1) and I(2) series can wander a long way from their mean value and
cross this mean value rarely.

• I(0) series should cross the mean frequently.

• The majority of economic and financial series contain a single unit root,
although some are stationary and consumer prices have been argued to
have 2 unit roots.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 16


How do we test for a unit root?

• The early and pioneering work on testing for a unit root in time series
was done by Dickey and Fuller (Dickey and Fuller 1979, Fuller 1976).
The basic objective of the test is to test the null hypothesis that φ = 1 in:
yt = φyt-1 + ut
against the one-sided alternative φ < 1. So we have
H0: series contains a unit root
vs. H1: series is stationary.

• We usually use the regression:


Δyt = ψyt-1 + ut
so that a test of φ = 1 is equivalent to a test of ψ = 0 (since φ - 1 = ψ).

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 17


Different Forms for the DF Test Regressions

• Dickey-Fuller tests are also known as τ tests: τ, τμ, ττ.


• The null (H0) and alternative (H1) models in each case are
i) H0: yt = yt-1 + ut
H1: yt = φyt-1 + ut, φ < 1
This is a test for a random walk against a stationary autoregressive process of
order one (AR(1))
ii) H0: yt = yt-1 + ut
H1: yt = φyt-1 + μ + ut, φ < 1
This is a test for a random walk against a stationary AR(1) with drift.
iii) H0: yt = yt-1 + ut
H1: yt = φyt-1 + μ + λt + ut, φ < 1
This is a test for a random walk against a stationary AR(1) with drift and a
time trend.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 18


Computing the DF Test Statistic

• We can write
Δyt = ut
where Δyt = yt - yt-1, and the alternatives may be expressed as
Δyt = ψyt-1 + μ + λt + ut
with μ = λ = 0 in case i), λ = 0 in case ii), and ψ = φ - 1. In each case, the
tests are based on the t-ratio on the yt-1 term in the estimated regression of
Δyt on yt-1, plus a constant in case ii) and a constant and trend in case iii).
The test statistics are defined as

test statistic = ψ̂ / SE(ψ̂)
• The test statistic does not follow the usual t-distribution under the null,
since the null is one of non-stationarity, but rather follows a non-standard
distribution. Critical values are derived from Monte Carlo experiments in,
for example, Fuller (1976). Relevant examples of the distribution are
shown in table 4.1 below
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 19
Critical Values for the DF Test

Significance level 10% 5% 1%


C.V. for constant -2.57 -2.86 -3.43
but no trend
C.V. for constant -3.12 -3.41 -3.96
and trend
Table 4.1: Critical Values for DF and ADF Tests (Fuller,
1976, p373).

The null hypothesis of a unit root is rejected in favour of the stationary alternative
in each case if the test statistic is more negative than the critical value.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 20


The Augmented Dickey Fuller (ADF) Test

• The tests above are only valid if ut is white noise. In particular, ut will be
autocorrelated if there was autocorrelation in the dependent variable of the
regression (yt) which we have not modelled. The solution is to “augment”
the test using p lags of the dependent variable. The alternative model in
case (i) is now written:
Δyt = ψyt-1 + Σ_{i=1}^{p} αi Δyt-i + ut

• The same critical values from the DF tables are used as before. A problem
now arises in determining the optimal number of lags of the dependent
variable.
There are 2 ways
- use the frequency of the data to decide
- use information criteria
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 21
Testing for Higher Orders of Integration

• Consider the simple regression:


Δyt = ψyt-1 + ut
We test H0: ψ = 0 vs. H1: ψ < 0.
• If H0 is rejected we simply conclude that yt does not contain a unit root.
• But what do we conclude if H0 is not rejected? The series contains a unit
root, but is that it? No! What if yt ~ I(2)? We would still not have rejected. So
we now need to test
H0: yt ~ I(2) vs. H1: yt ~ I(1)
We would continue to test for a further unit root until we rejected H0.
• We now regress Δ²yt on Δyt-1 (plus lags of Δ²yt if necessary).
• Now we test H0: Δyt ~ I(1) which is equivalent to H0: yt ~ I(2).
• So in this case, if we do not reject (unlikely), we conclude that yt is at least
I(2).
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 22
The Phillips-Perron Test

• Phillips and Perron have developed a more comprehensive theory of


unit root non-stationarity. The tests are similar to ADF tests, but they
incorporate an automatic correction to the DF procedure to allow for
autocorrelated residuals.

• The tests usually give the same conclusions as the ADF tests, and the
calculation of the test statistics is complex.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 23


Criticism of Dickey-Fuller and
Phillips-Perron-type tests

• The main criticism is that the power of the tests is low if the process is
stationary but with a root close to the non-stationary boundary.
e.g. the tests are poor at deciding if
φ = 1 or φ = 0.95,
especially with small sample sizes.

• If the true data generating process (dgp) is


yt = 0.95yt-1 + ut
then the null hypothesis of a unit root should be rejected.

• One way to get around this is to use a stationarity test as well as the
unit root tests we have looked at.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 24


Stationarity Tests

• Stationarity tests have


H0: yt is stationary
versus H1: yt is non-stationary

So that by default under the null the data will appear stationary.

• One such stationarity test is the KPSS test (Kwiatkowski, Phillips,


Schmidt and Shin, 1992).

• Thus we can compare the results of these tests with the ADF/PP
procedure to see if we obtain the same conclusion.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 25


Stationarity Tests (cont’d)

• A Comparison

ADF / PP KPSS
H0: yt ~ I(1)    H0: yt ~ I(0)
H1: yt ~ I(0)    H1: yt ~ I(1)

• 4 possible outcomes

Reject H0 and Do not reject H0


Do not reject H0 and Reject H0
Reject H0 and Reject H0
Do not reject H0 and Do not reject H0
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 26
Unit Root Tests with Structural Breaks

• The standard Dickey-Fuller-type unit root tests presented above do not


perform well if there are structural breaks in the series
• The tests have low power in such circumstances and they fail to reject
the unit root null hypothesis when it is incorrect as the slope parameter
in the regression of yt on yt−1 is biased towards unity
• The larger the break and the smaller the sample, the lower the power of
the test
• Unit root tests are also oversized in the presence of structural breaks
• Perron (1989) demonstrates that after allowing for structural breaks in
the tests, a whole raft of macroeconomic series may be stationary
• He argues that most economic time series are best characterised by
broken trend stationary processes, i.e. a deterministic trend but with a
structural break.
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 27
The Perron (1989) Procedure - Background

• Perron (1989) proposes three test equations differing dependent on the


type of break that is thought to be present:
1. A ‘crash’ model that allows a break in the level (i.e. the intercept)
2. A ‘changing growth’ model that allows for a break in the growth rate
(i.e. the slope)
3. A model that allows for both types of break to occur at the same time,
changing both the intercept and the slope of the trend.

• Define the break point in the data as Tb and Dt is a dummy variable


defined as

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 28


The Perron (1989) Procedure - Details

• The equation for the third (most general) version of the test is

• For the crash only model, set α2 = 0


• For the changing growth only model, set α1 = 0
• In all three cases, there is a unit root with a structural break at Tb under
the null hypothesis and a series that is a stationary process with a break
under the alternative
• A limitation of this approach is that it assumes that the break date is
known in advance
• It is possible, however, that the date will not be known and must be
determined from the data.
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 29
The Banerjee et al. (1992) and Zivot and Andrews
(1992) Procedures - Background

• More seriously, Christiano (1992) has argued that the critical values
employed with the test will presume the break date to be chosen
exogenously
• But most researchers will select a break point based on an examination
of the data
• Thus the asymptotic theory assumed will no longer hold
• Banerjee et al. (1992) and Zivot and Andrews (1992) introduce an
approach to testing for unit roots in the presence of structural change
that allows the break date to be selected endogenously.
• Their methods are based on recursive, rolling and sequential tests.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 30


The Banerjee et al. (1992) and Zivot and Andrews
(1992) Procedures - Details

• For the recursive and rolling tests, Banerjee et al. propose four
specifications.
1. The standard Dickey-Fuller test on the whole sample
2. The ADF test conducted repeatedly on the sub-samples and the minimal
DF statistic is obtained
3. The maximal DF statistic obtained from the sub-samples
4. The difference between the maximal and minimal statistics
• For the sequential test, the whole sample is used each time with the
following regression being run

where tused = Tb/T .

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 31


The Banerjee et al. (1992) and Zivot and Andrews
(1992) Procedures – Details 2

• The test is run repeatedly for different values of Tb over as much of the
data as possible (a ‘trimmed sample’)

• This excludes the first few and the last few observations
• Clearly it is τt(tused) that allows for the break, which can either be in the
level (where τt(tused) = 1 if t > tused and 0 otherwise); or the break can be
in the deterministic trend (where τt(tused) = t - tused if t > tused and 0
otherwise)

• For each specification, a different set of critical values is required, and


these can be found in Banerjee et al. (1992).

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 32


Further Extensions

• Perron (1997) proposes an extension of the Perron (1989) technique


but using a sequential procedure that estimates the test statistic
allowing for a break at any point during the sample to be determined
by the data

• This technique is very similar to that of Zivot and Andrews, except that
Perron’s is more flexible since it allows for a break under both the null
and alternative hypotheses

• A further extension would be to allow for more than one structural


break in the series – for example, Lumsdaine and Papell (1997)
enhance the Zivot and Andrews (1992) approach to allow for two
structural breaks.
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 33
Testing for Unit Roots with Structural Breaks
Example: EuroSterling Interest Rates

• Brooks and Rew (2002) examine whether EuroSterling interest rates


are best viewed as unit root processes or not, allowing for the possibility
of structural breaks in the series
• Failure to account for structural breaks (caused, for example, by
changes in monetary policy or the removal of exchange rate controls)
may lead to incorrect inferences regarding the validity or otherwise of
the expectations hypothesis.
• Their sample covers the period 1 January 1981 to 1 September 1997
• They use the standard Dickey-Fuller test, the recursive and sequential
tests of Banerjee et al. They also employ the rolling test, the Perron
(1997) approach and several other techniques

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 34


Testing for Unit Roots with Structural Breaks in
EuroSterling Interest Rates – Results

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 35


Testing for Unit Roots with Structural Breaks in
EuroSterling Interest Rates – Conclusions

• The findings for the recursive tests are that the unit root null should
not be rejected at the 10% level for any of the maturities examined
• For the sequential tests, the results are slightly more mixed with the
break in trend model not rejecting the null hypothesis, while it is
rejected for the short, 7-day and the 1-month rates when a structural
break is allowed for in the mean
• The weight of evidence indicates that short term interest rates are best
viewed as unit root processes that have a structural break in their level
around the time of ‘Black Wednesday’ (16 September 1992) when the
UK dropped out of the European Exchange Rate Mechanism
• The longer term rates, on the other hand, are I(1) processes with no
breaks

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 36


Seasonal Unit Roots

• It is possible that a series may contain seasonal unit roots, so that it


requires seasonal differencing to induce stationarity
• We would use the notation I(d,D) to denote a series that is integrated
of order (d, D) and requires differencing d times and seasonal
differencing D times to obtain a stationary process
• Osborn (1990) develops a test for seasonal unit roots based on a
natural extension of the Dickey-Fuller approach.
• However, Osborn also shows that only a small proportion of
macroeconomic series exhibit seasonal unit roots; the majority have
seasonal patterns that can better be characterised using dummy
variables, which may explain why the concept of seasonal unit roots
has not been widely adopted

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 37


Cointegration: An Introduction

• In most cases, if we combine two variables which are I(1), then the
combination will also be I(1).

• More generally, if we combine variables with differing orders of


integration, the combination will have an order of integration equal to the
largest. i.e.,
if Xi,t ~ I(di) for i = 1, 2, 3, ..., k
so we have k variables each integrated of order di.

Let
zt = Σ_{i=1}^{k} αi Xi,t    (1)

Then zt ~ I(max di)

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 38


Linear Combinations of Non-stationary Variables

• Rearranging (1), we can write


X1,t = Σ_{i=2}^{k} βi Xi,t + z′t

where βi = -αi/α1 and z′t = zt/α1, i = 2, ..., k

• This is just a regression equation.

• But the disturbances would have some very undesirable properties: zt´ is
not stationary and is autocorrelated if all of the Xi are I(1).

• We want to ensure that the disturbances are I(0). Under what circumstances
will this be the case?
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 39
Definition of Cointegration (Engle & Granger, 1987)

• Let zt be a k×1 vector of variables, then the components of zt are cointegrated


of order (d, b) if
i) All components of zt are I(d)
ii) There is at least one vector of coefficients α such that α′zt ~ I(d-b)

• Many time series are non-stationary but “move together” over time.
• If variables are cointegrated, it means that a linear combination of them will
be stationary.
• There may be up to r linearly independent cointegrating relationships (where
r ≤ k-1), also known as cointegrating vectors. r is also known as the
cointegrating rank of zt.
• A cointegrating relationship may also be seen as a long term relationship.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 40


Cointegration and Equilibrium

• Examples of possible Cointegrating Relationships in finance:


– spot and futures prices
– ratio of relative prices and an exchange rate
– equity prices and dividends

• Market forces arising from no arbitrage conditions should ensure an


equilibrium relationship.

• No cointegration implies that series could wander apart without bound


in the long run.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 41


Equilibrium Correction or Error Correction Models

• When the concept of non-stationarity was first considered, a usual


response was to independently take the first differences of a series of I(1)
variables.

• The problem with this approach is that pure first difference models have no
long run solution.
e.g. Consider yt and xt both I(1).
The model we may want to estimate is
Δyt = βΔxt + ut
But this collapses to nothing in the long run.

• The definition of the long run that we use is where


yt = yt-1 = y; xt = xt-1 = x.
• Hence all the difference terms will be zero, i.e. Δyt = 0; Δxt = 0.
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 42
Specifying an ECM

• One way to get around this problem is to use both first difference and
levels terms, e.g.
 yt = 1xt + 2(yt-1-xt-1) + ut (1)
• yt-1-xt-1 is known as the error correction term.

• Providing that yt and xt are cointegrated with cointegrating coefficient


, then (yt-1-xt-1) will be I(0) even though the constituents are I(1).

• We can thus validly use OLS on (1).

• The Granger representation theorem shows that any cointegrating


relationship can be expressed as an equilibrium correction model.
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 43
Testing for Cointegration in Regression

• The model for the equilibrium correction term can be generalised to


include more than two variables:
yt = β1 + β2x2t + β3x3t + … + βkxkt + ut    (3)

• ut should be I(0) if the variables yt, x2t, ... xkt are cointegrated.

• So what we want to test is the residuals of equation (3) to see if they


are non-stationary or stationary. We can use the DF / ADF test on ût.
So we have the regression
Δût = ψût-1 + vt,  with vt ~ iid
• However, since this is a test on the residuals of an actual model, ût,
then the critical values are changed.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 44


Testing for Cointegration in Regression:
Conclusions

• Engle and Granger (1987) have tabulated a new set of critical values
and hence the test is known as the Engle Granger (E.G.) test.

• We can also use the Durbin-Watson test statistic or the Phillips-Perron
approach to test for non-stationarity of ût.

• What are the null and alternative hypotheses for a test on the residuals
of a potentially cointegrating regression?
H0 : unit root in cointegrating regression’s residuals
H1 : residuals from cointegrating regression are stationary

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 45


Methods of Parameter Estimation in
Cointegrated Systems:
The Engle-Granger Approach

• There are (at least) 3 methods we could use: Engle Granger, Engle and Yoo,
and Johansen.
• The Engle Granger 2 Step Method
This is a single equation technique which is conducted as follows:
Step 1:
- Make sure that all the individual variables are I(1).
- Then estimate the cointegrating regression using OLS.
- Save the residuals of the cointegrating regression, ût.
- Test these residuals to ensure that they are I(0).
Step 2:
- Use the step 1 residuals as one variable in the error correction model e.g.
Δyt = β1Δxt + β2ût-1 + ut
where ût-1 = yt-1 - τ̂xt-1
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 46
An Example of a Model for Non-stationary
Variables: Lead-Lag Relationships between Spot
and Futures Prices
Background
• We expect changes in the spot price of a financial asset and its corresponding
futures price to be perfectly contemporaneously correlated and not to be
cross-autocorrelated.

i.e. expect Corr(Δln(Ft), Δln(St)) ≈ 1

Corr(Δln(Ft), Δln(St-k)) ≈ 0 ∀ k
Corr(Δln(Ft-j), Δln(St)) ≈ 0 ∀ j

• We can test this idea by modelling the lead-lag relationship between the two.

• We will consider two papers Tse(1995) and Brooks et al (2001).

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 47


Futures & Spot Data

• Tse (1995): 1055 daily observations on NSA stock index and stock
index futures values from December 1988 - April 1993.

• Brooks et al (2001): 13,035 10-minutely observations on the FTSE


100 stock index and stock index futures prices for all trading days in
the period June 1996 – 1997.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 48


Methodology

• The fair futures price is given by

Ft* = St e^{(r-d)(T-t)}
where Ft* is the fair futures price, St is the spot price, r is a
continuously compounded risk-free rate of interest, d is the
continuously compounded yield in terms of dividends derived from the
stock index until the futures contract matures, and (T-t) is the time to
maturity of the futures contract. Taking logarithms of both sides of
equation above gives
ft* = st + (r - d)(T - t)

• First, test ft and st for nonstationarity.


‘Introductory Econometrics for Finance’ © Chris Brooks 2019 49
Dickey-Fuller Tests on Log-Prices and Returns for
High Frequency FTSE Data

                              Futures      Spot
Dickey-Fuller statistics
  for log-price data         -0.1329      -0.7335
Dickey-Fuller statistics
  for returns data           -84.9968     -114.1803

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 50


Cointegration Test Regression and Test on Residuals

• Conclusion: log Ft and log St are not stationary, but Δlog Ft and Δlog St are
stationary.
• But a model containing only first differences has no long run relationship.
• The solution is to see if there exists a cointegrating relationship between ft
and st which would mean that we can validly include levels terms in this
framework.

• Potential cointegrating regression:


st = γ0 + γ1 ft + zt

where zt is a disturbance term.


• Estimate the regression, collect the residuals, ẑt, and test whether they are
stationary.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 51


Estimated Equation and Test for Cointegration for
High Frequency FTSE Data

Cointegrating Regression
Coefficient            Estimated Value
γ0                     0.1345
γ1                     0.9834

DF test on residuals   Test statistic
ẑt                     -14.7303

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 52


Conclusions from Unit Root and Cointegration Tests

• Conclusion: the ẑt are stationary and therefore we have a cointegrating


relationship between log Ft and log St.

• Final stage in Engle-Granger 2-step method is to use the first stage


residuals, zt as the equilibrium correction term in the general equation.

• The overall model is

Δln St = β0 + δẑt-1 + β1Δln St-1 + α1Δln Ft-1 + vt

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 53


Estimated Error Correction Model for
High Frequency FTSE Data

Coefficient    Estimated Value    t-ratio
β0             9.6713E-06         1.6083
δ              -8.3388E-01        -5.1298
β1             0.1799             19.2886
α1             0.1312             20.4946

Look at the signs and significances of the coefficients:

• β̂1 is positive and highly significant
• α̂1 is positive and highly significant
• δ̂ is negative and highly significant

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 54


Forecasting High Frequency FTSE Returns

• Is it possible to use the error correction model to produce superior


forecasts to other models?

Comparison of Out of Sample Forecasting Accuracy


                      ECM         ECM-COC     ARIMA       VAR
RMSE                  0.0004382   0.0004350   0.0004531   0.0004510
MAE                   0.4259      0.4255      0.4382      0.4378
% Correct Direction   67.69%      68.75%      64.36%      66.80%

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 55


Can Profitable Trading Rules be Derived from the
ECM-COC Forecasts?
• The trading strategy involves analysing the forecast for the spot return, and
incorporating the decision dictated by the trading rules described below. It is assumed
that the original investment is £1000, and if the holding in the stock index is zero, the
investment earns the risk free rate.
– Liquid Trading Strategy - making a round trip trade (i.e. a purchase and sale of
the FTSE100 stocks) every ten minutes that the return is predicted to be positive
by the model.
– Buy-&-Hold while Forecast Positive Strategy - allows the trader to continue
holding the index if the return at the next predicted investment period is positive.
– Filter Strategy: Better Predicted Return Than Average - involves purchasing the
index only if the predicted returns are greater than the average positive return.
– Filter Strategy: Better Predicted Return Than First Decile - only the returns
predicted to be in the top 10% of all returns are traded on
– Filter Strategy: High Arbitrary Cut Off - an arbitrary filter of 0.0075% is
imposed.
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 56
Spot Trading Strategy Results for Error Correction
Model Incorporating the Cost of Carry

Trading Strategy        Terminal     Return (%)       Terminal Wealth (£)   Return (%) {Annualised}   Number
                        Wealth (£)   {Annualised}     with slippage         with slippage             of trades
Passive Investment       1040.92      4.09 {49.08}     1040.92               4.09 {49.08}              1
Liquid Trading           1156.21      15.62 {187.44}   1056.38               5.64 {67.68}              583
Buy-&-Hold while
  Forecast Positive      1156.21      15.62 {187.44}   1055.77               5.58 {66.96}              383
Filter I                 1144.51      14.45 {173.40}   1123.57               12.36 {148.32}            135
Filter II                1100.01      10.00 {120.00}   1046.17               4.62 {55.44}              65
Filter III               1019.82      1.98 {23.76}     1003.23               0.32 {3.84}               8
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 57


Strategy Trading Results – Conclusions

• The futures market “leads” the spot market because:


• the stock index is not a single entity, so
• some components of the index are infrequently traded
• it is more expensive to transact in the spot market
• stock market indices are only recalculated every minute

• Spot & futures markets do indeed have a long run relationship.

• Since it appears impossible to profit from lead–lag relationships, their


existence is entirely consistent with the absence of arbitrage
opportunities and in accordance with modern definitions of the
efficient markets hypothesis.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 58


The Engle-Granger Approach: Some Drawbacks

This method suffers from a number of problems:


1. Unit root and cointegration tests have low power in finite samples
2. We are forced to treat the variables asymmetrically and to specify one as
the dependent and the others as independent variables.
3. Cannot perform any hypothesis tests about the actual cointegrating
relationship estimated at stage 1.

 Problem 1 is a small sample problem that should disappear asymptotically.


 Problem 2 is addressed by the Johansen approach.
 Problem 3 is addressed by the Engle and Yoo approach or the Johansen approach.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 59


The Engle & Yoo 3-Step Method

• One of the problems with the EG 2-step method is that we cannot make
any inferences about the actual cointegrating regression.

• The Engle & Yoo (EY) 3-step procedure takes its first two steps from EG.

• EY add a third step giving updated estimates of the cointegrating vector


and its standard errors.

• The most important problem with both these techniques is that in the
general case above, where we have more than two variables which may be
cointegrated, there could be more than one cointegrating relationship.

• In fact there can be up to r linearly independent cointegrating vectors


(where r ≤ g-1), where g is the number of variables in total.
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 60
The Engle & Yoo 3-Step Method (cont’d)

• So, in the case where we just had y and x, then r can only be one or
zero.

• But in the general case there could be more cointegrating relationships.

• And if there are others, how do we know how many there are or
whether we have found the “best”?

• The answer to this is to use a systems approach to cointegration which


will allow determination of all r cointegrating relationships -
Johansen’s method.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 61


Testing for and Estimating Cointegrating Systems
Using the Johansen Technique Based on VARs

• To use Johansen’s method, we need to turn the VAR of the form

yt = β1 yt-1 + β2 yt-2 + ... + βk yt-k + ut
g×1   g×g g×1   g×g g×1         g×g g×1   g×1

into a vector error correction model (VECM), which can be written as

Δyt = Πyt-k + Γ1Δyt-1 + Γ2Δyt-2 + ... + Γk-1Δyt-(k-1) + ut

where Π = (Σ_{i=1}^{k} βi) - Ig and Γi = (Σ_{j=1}^{i} βj) - Ig

Π is a long-run coefficient matrix since, in equilibrium, all the Δyt-i = 0.


‘Introductory Econometrics for Finance’ © Chris Brooks 2019 62
Review of Matrix Algebra
necessary for the Johansen Test

• Let Π denote a g×g square matrix, let c denote a g×1 non-zero vector, and
  let λ denote a set of scalars.

• λ is called a characteristic root or set of roots of Π if we can write

   Π   c  =  λ  c
  g×g g×1      g×1

• We can also write

  Π c = λ Ig c

  and hence
  (Π − λ Ig) c = 0
  where Ig is an identity matrix.
Review of Matrix Algebra (cont’d)

• Since c ≠ 0 by definition, then for this system to have a non-zero solution,
  we require the matrix (Π − λ Ig) to be singular (i.e. to have zero
  determinant):

  |Π − λ Ig| = 0

                                         ( 5  1 )
• For example, let Π be the 2×2 matrix   ( 2  4 )

• Then the characteristic equation is

  |Π − λ Ig| = | ( 5  1 ) − λ ( 1  0 ) | = 0
               | ( 2  4 )     ( 0  1 ) |

  | 5−λ   1  |
  |  2   4−λ |  =  (5 − λ)(4 − λ) − 2  =  λ² − 9λ + 18  =  0
Review of Matrix Algebra (cont’d)

• This gives the solutions  = 6 and  = 3.

• The characteristic roots are also known as Eigenvalues.

• The rank of a matrix is equal to the number of linearly independent rows or


columns in the matrix.

• We write Rank(Π) = r

• The rank of a matrix is equal to the order of the largest square matrix we
  can obtain from Π which has a non-zero determinant.

• For example, the determinant of Π above (= 18) ≠ 0, therefore it has rank 2.


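The eigenvalue example above can be checked numerically (numpy assumed):

```python
import numpy as np

Pi = np.array([[5.0, 1.0],
               [2.0, 4.0]])

# Roots of the characteristic equation lambda^2 - 9*lambda + 18 = 0
eigvals = sorted(np.linalg.eigvals(Pi).real)

# Properties: sum of eigenvalues = trace, product = determinant,
# number of non-zero eigenvalues = rank
assert abs(sum(eigvals) - np.trace(Pi)) < 1e-9
assert abs(eigvals[0] * eigvals[1] - np.linalg.det(Pi)) < 1e-9
rank = int(np.linalg.matrix_rank(Pi))
```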
The Johansen Test and Eigenvalues

• Some properties of the eigenvalues of any square matrix A:


1. the sum of the eigenvalues is the trace
2. the product of the eigenvalues is the determinant
3. the number of non-zero eigenvalues is the rank

• Returning to Johansen’s test, the VECM representation of the VAR was

  Δyt = Π yt-1 + Γ1 Δyt-1 + Γ2 Δyt-2 + ... + Γk-1 Δyt-(k-1) + ut

• The test for cointegration between the y’s is calculated by looking at the
  rank of the Π matrix via its eigenvalues. (To prove this requires some
  technical intermediate steps).

• The rank of a matrix is equal to the number of its characteristic roots


(eigenvalues) that are different from zero.
The Johansen Test and Eigenvalues (cont’d)

• The eigenvalues, denoted λi, are put in order:

  λ1 ≥ λ2 ≥ ... ≥ λg

• If the variables are not cointegrated, the rank of Π will not be
  significantly different from zero, so λi ≈ 0 ∀ i.
  Then if λi = 0, ln(1 − λi) = 0.
  If the λ’s are roots, they must be less than 1 in absolute value.

• Say rank(Π) = 1; then ln(1 − λ1) will be negative and ln(1 − λi) = 0 ∀ i > 1.

• If the eigenvalue λi is non-zero, then ln(1 − λi) < 0.



The Johansen Test Statistics

• The test statistics for cointegration are formulated as

                    g
  λtrace(r) = −T    Σ    ln(1 − λ̂i)
                  i=r+1

  and

  λmax(r, r+1) = −T ln(1 − λ̂r+1)

  where λ̂i is the estimated value for the ith ordered eigenvalue from the
  Π matrix.
  λtrace tests the null that the number of cointegrating vectors is less than
  or equal to r against an unspecified alternative.
  λtrace = 0 when all the λi = 0, so it is a joint test.
  λmax tests the null that the number of cointegrating vectors is r against
  an alternative of r + 1.

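With the estimated eigenvalues in hand, both statistics are straightforward to compute. A sketch with hypothetical values (λ̂ = 0.30, 0.10, 0.02 from T = 100 observations):

```python
import math

T = 100
lam_hat = [0.30, 0.10, 0.02]      # hypothetical ordered eigenvalues, largest first
g = len(lam_hat)

def lambda_trace(r):
    # lambda_trace(r) = -T * sum_{i=r+1}^{g} ln(1 - lam_i)
    return -T * sum(math.log(1 - l) for l in lam_hat[r:])

def lambda_max(r):
    # lambda_max(r, r+1) = -T * ln(1 - lam_{r+1})
    return -T * math.log(1 - lam_hat[r])
```

Note that for the last null hypothesis (r = g − 1) the two statistics coincide, since only one eigenvalue remains in the trace sum.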


Decomposition of the  Matrix

• For any 1 < r < g, Π is defined as the product of two matrices:

   Π  =  α   β′
  g×g   g×r  r×g

• β contains the cointegrating vectors while α gives the “loadings” of
  each cointegrating vector in each equation.

• For example, if g = 4 and r = 1, α and β will be 4×1, and Π yt-k will be
  given by:

  ( α11 )                    ( y1 )          ( α11 )
  ( α12 ) (β11 β12 β13 β14)  ( y2 )    or    ( α12 ) (β11 y1 + β12 y2 + β13 y3 + β14 y4)t-k
  ( α13 )                    ( y3 )          ( α13 )
  ( α14 )                    ( y4 )t-k       ( α14 )
Johansen Critical Values

• Johansen & Juselius (1990) provide critical values for the two statistics.
The distribution of the test statistics is non-standard. The critical values
depend on:
1. the value of g-r, the number of non-stationary components
2. whether a constant and/or trend are included in the regressions.

• If the test statistic is greater than the critical value from Johansen’s
tables, reject the null hypothesis that there are r cointegrating vectors in
favour of the alternative that there are more than r.



The Johansen Testing Sequence

• The testing sequence under the null is r = 0, 1, ..., g-1


so that the hypotheses for trace are

H0: r = 0      vs   H1: 0 < r ≤ g
H0: r = 1      vs   H1: 1 < r ≤ g
H0: r = 2      vs   H1: 2 < r ≤ g
...                 ...
H0: r = g−1    vs   H1: r = g

• We keep increasing the value of r until we no longer reject the null.



Interpretation of Johansen Test Results

• But how does this correspond to a test of the rank of the Π matrix?

• r is the rank of Π.

• Π cannot be of full rank (g) since this would correspond to the original
  yt being stationary.

• If Π has zero rank, then by analogy to the univariate case, Δyt depends
  only on Δyt-j and not on yt-1, so that there is no long-run relationship
  between the elements of yt-1. Hence there is no cointegration.

• For 1 < rank(Π) < g, there are multiple cointegrating vectors.



Hypothesis Testing Using Johansen

• EG did not allow us to do hypothesis tests on the cointegrating relationship


itself, but the Johansen approach does.
• If there exist r cointegrating vectors, only these linear combinations will be
stationary.
• You can test a hypothesis about one or more coefficients in the
  cointegrating relationship by viewing the hypothesis as a restriction on the
  Π matrix.
• All linear combinations of the cointegrating vectors are also cointegrating
vectors.
• If the number of cointegrating vectors is large, and the hypothesis under
consideration is simple, it may be possible to recombine the cointegrating
vectors to satisfy the restrictions exactly.



Hypothesis Testing Using Johansen (cont’d)

• As the restrictions become more complex or more numerous, it will


eventually become impossible to satisfy them by renormalisation.

• After this point, if the restriction is not severe, then the cointegrating
vectors will not change much upon imposing the restriction.

• A test statistic to test this hypothesis is given by

         r
  −T     Σ   [ln(1 − λi) − ln(1 − λi*)]  ~  χ²(m)
        i=1

  where
  λi* are the characteristic roots of the restricted model,
  λi are the characteristic roots of the unrestricted model,
  r is the number of non-zero characteristic roots in the unrestricted model,
  and m is the number of restrictions.
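The restriction test is easy to compute once both sets of characteristic roots are available. A sketch with hypothetical roots (the statistic would then be compared with a χ²(m) critical value):

```python
import math

T = 100
lam_u = [0.30]      # hypothetical characteristic roots, unrestricted model
lam_r = [0.25]      # hypothetical characteristic roots, restricted model
m = 1               # number of restrictions imposed

# LR statistic: -T * sum_i [ ln(1 - lam_i) - ln(1 - lam_i*) ], asympt. chi-squared(m)
stat = -T * sum(math.log(1 - lu) - math.log(1 - lr)
                for lu, lr in zip(lam_u, lam_r))
# Restricted roots can be no larger than the unrestricted ones, so stat >= 0
```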
Cointegration Tests using Johansen:
Three Examples

Example 1: Hamilton (1994, p. 647)

• Does the PPP relationship hold for the US/Italian exchange rate–price
system?

• A VAR was estimated with 12 lags on 189 observations. The Johansen


test statistics were
     r        λmax       critical value
     0        22.12          20.8
     1        10.19          14.0

• Conclusion: there is one cointegrating relationship.


Example 2: Purchasing Power Parity (PPP)

• PPP states that the equilibrium exchange rate between 2 countries is


equal to the ratio of relative prices
• A necessary and sufficient condition for PPP is that the log of the
exchange rate between countries A and B, and the logs of the price
levels in countries A and B be cointegrated with cointegrating vector
[ 1 –1 1] .
• Chen (1995) uses monthly data for April 1973–December 1990 to test
the PPP hypothesis using the Johansen approach.



Cointegration Tests of PPP with European Data

Tests for cointegration between     r = 0     r ≤ 1    r ≤ 2     α1       α2

FRF – DEM                          34.63*     17.10     6.26    1.33    −2.50
FRF – ITL                          52.69*     15.81     5.43    2.65    −2.52
FRF – NLG                          68.10*     16.37     6.42    0.58    −0.80
FRF – BEF                          52.54*     26.09*    3.63    0.78    −1.15
DEM – ITL                          42.59*     20.76*    4.79    5.80    −2.25
DEM – NLG                          50.25*     17.79     3.28    0.12    −0.25
DEM – BEF                          69.13*     27.13*    4.52    0.87    −0.52
ITL – NLG                          37.51*     14.22     5.05    0.55    −0.71
ITL – BEF                          69.24*     32.16*    7.15    0.73    −1.28
NLG – BEF                          64.52*     21.97*    3.88    1.69    −2.17
Critical values                    31.52      17.95     8.18      –       –
Notes: FRF – French franc; DEM – German mark; NLG – Dutch guilder; ITL – Italian lira; BEF –
Belgian franc. Source: Chen (1995). Reprinted with the permission of Taylor and Francis Ltd.
(www.tandf.co.uk).



Example 3: Are International
Bond Markets Cointegrated?

• Mills & Mills (1991)

• If financial markets are cointegrated, this implies that they have a


“common stochastic trend”.

Data:
• Daily closing observations on redemption yields on government bonds for
4 bond markets: US, UK, West Germany, Japan.

• For cointegration, a necessary but not sufficient condition is that the yields
are nonstationary. All 4 yields series are I(1).



Testing for Cointegration Between the Yields

• The Johansen procedure is used. There can be at most 3 linearly independent


cointegrating vectors.
• Mills & Mills use the trace test statistic:

                    g
  λtrace(r) = −T    Σ    ln(1 − λ̂i)
                  i=r+1

  where λ̂i are the ordered eigenvalues.

Johansen Tests for Cointegration between International Bond Yields


r (number of cointegrating            Test statistic     Critical values
vectors under the null hypothesis)                        10%       5%
0                                         22.06           35.6     38.6
1                                         10.58           21.2     23.8
2                                          2.52           10.3     12.0
3                                          0.12            2.9      4.2
Source: Mills and Mills (1991). Reprinted with the permission of Blackwell Publishers.

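The testing sequence applied to the trace statistics in the table above (5% critical values) can be sketched as:

```python
# Trace statistics and 5% critical values from the Mills and Mills (1991) table
trace_stats = [22.06, 10.58, 2.52, 0.12]     # nulls r = 0, 1, 2, 3
crit_5pct = [38.6, 23.8, 12.0, 4.2]

# Keep increasing r until we no longer reject the null
r = 0
while r < len(trace_stats) and trace_stats[r] > crit_5pct[r]:
    r += 1
# Here the very first null (r = 0) is not rejected: no cointegrating vectors
```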


Testing for Cointegration Between the Yields
(cont’d)

• Conclusion: No cointegrating vectors.

• The paper then goes on to estimate a VAR for the first differences of the
  yields, which is of the form
           k
  ΔXt =    Σ   Γi ΔXt-i + εt
          i=1
  where
          ( ΔX(US)t  )         ( γ11i  γ12i  γ13i  γ14i )         ( ε1t )
  ΔXt =   ( ΔX(UK)t  ) ,  Γi = ( γ21i  γ22i  γ23i  γ24i ) ,  εt = ( ε2t )
          ( ΔX(WG)t  )         ( γ31i  γ32i  γ33i  γ34i )         ( ε3t )
          ( ΔX(JAP)t )         ( γ41i  γ42i  γ43i  γ44i )         ( ε4t )

  They set k = 8.



Variance Decompositions for VAR
of International Bond Yields

Variance Decompositions for VAR of International Bond Yields


Explaining Days Explained by movements in
movements in ahead US UK Germany Japan
US 1 95.6 2.4 1.7 0.3
5 94.2 2.8 2.3 0.7
10 92.9 3.1 2.9 1.1
20 92.8 3.2 2.9 1.1

UK 1 0.0 98.3 0.0 1.7


5 1.7 96.2 0.2 1.9
10 2.2 94.6 0.9 2.3
20 2.2 94.6 0.9 2.3

Germany 1 0.0 3.4 94.6 2.0


5 6.6 6.6 84.8 3.0
10 8.3 6.5 82.9 3.6
20 8.4 6.5 82.7 3.7

Japan 1 0.0 0.0 1.4 100.0


5 1.3 1.4 1.1 96.2
10 1.5 2.1 1.8 94.6
20 1.6 2.2 1.9 94.2

Source: Mills and Mills (1991). Reprinted with the permission of Blackwell Publishers.
Impulse Responses for VAR of
International Bond Yields

Impulse Responses for VAR of International Bond Yields


Response of US to innovations in
Days after shock US UK Germany Japan
0 0.98 0.00 0.00 0.00
1 0.06 0.01 -0.10 0.05
2 -0.02 0.02 -0.14 0.07
3 0.09 -0.04 0.09 0.08
4 -0.02 -0.03 0.02 0.09
10 -0.03 -0.01 -0.02 -0.01
20 0.00 0.00 -0.10 -0.01

Response of UK to innovations in
Days after shock US UK Germany Japan
0 0.19 0.97 0.00 0.00
1 0.16 0.07 0.01 -0.06
2 -0.01 -0.01 -0.05 0.09
3 0.06 0.04 0.06 0.05
4 0.05 -0.01 0.02 0.07
10 0.01 0.01 -0.04 -0.01
20 0.00 0.00 -0.01 0.00

Response of Germany to innovations in


Days after shock US UK Germany Japan
0 0.07 0.06 0.95 0.00
1 0.13 0.05 0.11 0.02
2 0.04 0.03 0.00 0.00
3 0.02 0.00 0.00 0.01
4 0.01 0.00 0.00 0.09
10 0.01 0.01 -0.01 0.02
20 0.00 0.00 0.00 0.00

Response of Japan to innovations in


Days after shock US UK Germany Japan
0 0.03 0.05 0.12 0.97
1 0.06 0.02 0.07 0.04
2 0.02 0.02 0.00 0.21
3 0.01 0.02 0.06 0.07
4 0.02 0.03 0.07 0.06
10 0.01 0.01 0.01 0.04
20 0.00 0.00 0.00 0.01

Source: Mills and Mills (1991). Reprinted with the permission of Blackwell Publishers.



Chapter 9

Modelling volatility and correlation



An Excursion into Non-linearity Land

• Motivation: the linear structural (and time series) models cannot


explain a number of important features common to much financial data
- leptokurtosis
- volatility clustering or volatility pooling
- leverage effects

• Our “traditional” structural model could be something like:


yt = 1 +  2x2t + ... + kxkt + ut, or more compactly y = X + u
• We also assumed that ut  N(0,2).



A Sample Financial Asset Returns Time Series

Daily S&P 500 Returns for August 2003 – August 2019



Non-linear Models: A Definition

• Campbell, Lo and MacKinlay (1997) define a non-linear data


generating process as one that can be written
yt = f(ut, ut-1, ut-2, …)
where ut is an iid error term and f is a non-linear function.

• They also give a slightly more specific definition as


  yt = g(ut-1, ut-2, …) + ut σ²(ut-1, ut-2, …)
  where g is a function of past error terms only and σ² is a variance
  term.

• Models with non-linear g(•) are “non-linear in mean”, while those with
  non-linear σ²(•) are “non-linear in variance”.



Types of non-linear models

• The linear paradigm is a useful one.


• Many apparently non-linear relationships can be made linear by a
suitable transformation.
• On the other hand, it is likely that many relationships in finance are
intrinsically non-linear.

• There are many types of non-linear models, e.g.


- ARCH / GARCH
- switching models
- bilinear models



Testing for Non-linearity – The RESET Test

• The “traditional” tools of time series analysis (acf’s, spectral analysis)


may find no evidence that we could use a linear model, but the data
may still not be independent.
• Portmanteau tests for non-linear dependence have been developed.
• The simplest is Ramsey’s RESET test, which took the form:

ut  0  1 yt2  2 yt3 ...  p 1 ytp  vt

• Here the dependent variable is the residual series and the independent
variables are the squares, cubes, …, of the fitted values.



Testing for Non-linearity – The BDS Test

• Many other non-linearity tests are available - e.g., the BDS and
bispectrum test
• BDS is a pure hypothesis test. That is, it has as its null hypothesis that
the data are pure noise (completely random)
• It has been argued to have power to detect a variety of departures from
  randomness (linear or non-linear stochastic processes, deterministic
  chaos, etc.)
• The BDS test follows a standard normal distribution under the null
• The test can also be used as a model diagnostic on the residuals to ‘see
what is left’
• If the proposed model is adequate, the standardised residuals should be
white noise.
Chaos Theory

• Chaos theory is a notion taken from the physical sciences


• It suggests that there could be a deterministic, non-linear set of
equations underlying the behaviour of financial series or markets
• Such behaviour will appear completely random to the standard
statistical tests
• A positive sighting of chaos implies that while, by definition, long-
term forecasting would be futile, short-term forecastability and
controllability are possible, at least in theory, since there is some
deterministic structure underlying the data
• Varying definitions of what actually constitutes chaos can be found in
the literature.



Detecting Chaos

• A system is chaotic if it exhibits sensitive dependence on initial


conditions (SDIC)
• So if an infinitesimal change is made to the initial conditions (the initial
  state of the system), then the corresponding change iterated through the
  system for some arbitrary length of time will grow exponentially
• The largest Lyapunov exponent is a test for chaos
• It measures the rate at which information is lost from a system
• A positive largest Lyapunov exponent implies sensitive dependence,
and therefore that evidence of chaos has been obtained
• Almost without exception, applications of chaos theory to financial
markets have been unsuccessful
• This is probably because financial and economic data are usually far
noisier and ‘more random’ than data from other disciplines.
Neural Networks

• Artificial neural networks (ANNs) are a class of models whose


structure is broadly motivated by the way that the brain performs
computation
• ANNs have been widely employed in finance for tackling time series
and classification problems
• Applications have included forecasting financial asset returns,
volatility, bankruptcy and takeover prediction
• Neural networks have virtually no theoretical motivation in finance
(they are often termed a ‘black box’)
• They can fit any functional relationship in the data to an arbitrary
degree of accuracy.



Feedforward Neural Networks

• The most common class of ANN models in finance are known as


feedforward network models
• These have a set of inputs (akin to regressors) linked to one or more
outputs (akin to the regressand) via one or more ‘hidden’ or
intermediate layers
• The size and number of hidden layers can be modified to give a closer
or less close fit to the data sample
• A feedforward network with no hidden layers is simply a standard
linear regression model
• Neural network models work best where financial theory has virtually
nothing to say about the likely functional form for the relationship
between a set of variables.



Neural Networks – Some Disadvantages

• Neural networks are not very popular in finance and suffer from
several problems:
– The coefficient estimates from neural networks do not have any real
theoretical interpretation
– Virtually no diagnostic or specification tests are available for estimated
models
– They can provide excellent fits in-sample to a given set of ‘training’ data,
but typically provide poor out-of-sample forecast accuracy
– This usually arises from the tendency of neural networks to fit closely to
sample-specific data features and ‘noise’, and so they cannot ‘generalise’
– The non-linear estimation of neural network models can be cumbersome
and computationally time-intensive, particularly, for example, if the model
must be estimated repeatedly when rolling through a sample.



Models for Volatility

• Modelling and forecasting stock market volatility has been the subject
of vast empirical and theoretical investigation
• There are a number of motivations for this line of inquiry:
– Volatility is one of the most important concepts in finance
– Volatility, as measured by the standard deviation or variance of returns, is
often used as a crude measure of the total risk of financial assets
– Many value-at-risk (VaR) models for measuring market risk require the
estimation or forecast of a volatility parameter
– The volatility of stock market prices also enters directly into the Black–
Scholes formula for deriving the prices of traded options
• We will now examine several volatility models.



Historical Volatility

• The simplest model for volatility is the historical estimate


• Historical volatility simply involves calculating the variance (or
standard deviation) of returns in the usual way over some historical
period
• This then becomes the volatility forecast for all future periods
• Evidence suggests that the use of volatility predicted from more
sophisticated time series models will lead to more accurate forecasts
and option valuations
• Historical volatility is still useful as a benchmark for comparing the
forecasting ability of more complex time models

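The historical estimate above can be sketched in a few lines (pure Python; the annualisation with 252 trading days and the example returns are illustrative):

```python
import math

def historical_vol(returns, periods_per_year=252):
    """Sample standard deviation of returns, annualised."""
    n = len(returns)
    mean = sum(returns) / n
    var = sum((r - mean) ** 2 for r in returns) / (n - 1)   # sample variance
    return math.sqrt(var) * math.sqrt(periods_per_year)

rets = [0.010, -0.020, 0.015, 0.005]      # illustrative daily returns
vol = historical_vol(rets)
```

Under this model, `vol` is then used as the volatility forecast for every future period.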


Heteroscedasticity Revisited

• An example of a structural model is


yt = 1 + 2x2t + 3x3t + 4x4t + u t
with ut  N(0, u2 ).

• The assumption that the variance of the errors is constant is known as


homoscedasticity, i.e. Var (ut) =  u2 .

• What if the variance of the errors is not constant?


- heteroscedasticity
- would imply that standard error estimates could be wrong.

• Is the variance of the errors likely to be constant over time? Not for
financial data.
Autoregressive Conditionally Heteroscedastic
(ARCH) Models
• So use a model which does not assume that the variance is constant.
• Recall the definition of the variance of ut:

  σt² = Var(ut | ut-1, ut-2, ...) = E[(ut − E(ut))² | ut-1, ut-2, ...]

  We usually assume that E(ut) = 0

  so σt² = Var(ut | ut-1, ut-2, ...) = E[ut² | ut-1, ut-2, ...].

  What could the current value of the variance of the errors plausibly
  depend upon?
  – Previous squared error terms.
• This leads to the autoregressive conditionally heteroscedastic model
  for the variance of the errors:

  σt² = α0 + α1 ut-1²
• This is known as an ARCH(1) model
• The ARCH model due to Engle (1982) has proved very useful in
finance.
Autoregressive Conditionally Heteroscedastic
(ARCH) Models (cont’d)
• The full model would be
  yt = β1 + β2x2t + ... + βkxkt + ut ,  ut ∼ N(0, σt²)
  where σt² = α0 + α1 ut-1²
• We can easily extend this to the general case where the error variance
  depends on q lags of squared errors:

  σt² = α0 + α1 ut-1² + α2 ut-2² + ... + αq ut-q²

• This is an ARCH(q) model.

• Instead of calling the variance σt², in the literature it is usually called ht,
  so the model is
  yt = β1 + β2x2t + ... + βkxkt + ut ,  ut ∼ N(0, ht)
  where ht = α0 + α1 ut-1² + α2 ut-2² + ... + αq ut-q²



Another Way of Writing ARCH Models

• For illustration, consider an ARCH(1). Instead of the above, we can


write

yt = 1 +  2x2t + ... + kxkt + ut , ut = vtt

t  0  1ut21 , vt  N(0,1)

• The two are different ways of expressing exactly the same model. The
first form is easier to understand while the second form is required for
simulating from an ARCH model, for example.

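The second form lends itself directly to simulation. A minimal sketch with hypothetical ARCH(1) parameter values (numpy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
T, a0, a1 = 10_000, 0.2, 0.5           # hypothetical ARCH(1) parameters
v = rng.standard_normal(T)             # v_t ~ N(0, 1)

u = np.zeros(T)
sigma2 = np.zeros(T)
sigma2[0] = a0 / (1 - a1)              # start at the unconditional variance
u[0] = v[0] * np.sqrt(sigma2[0])
for t in range(1, T):
    sigma2[t] = a0 + a1 * u[t - 1] ** 2    # conditional variance recursion
    u[t] = v[t] * np.sqrt(sigma2[t])       # u_t = v_t * sigma_t
```

The sample variance of the simulated series should be close to the unconditional value a0/(1 − a1) = 0.4, and the series shows the volatility clustering discussed above.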


Testing for “ARCH Effects”

1. First, run any postulated linear regression of the form given in the equation
above, e.g. yt = β1 + β2x2t + ... + βkxkt + ut
saving the residuals, ût .

2. Then square the residuals, and regress them on q own lags to test for ARCH
of order q, i.e. run the regression
   ût² = γ0 + γ1 ût-1² + γ2 ût-2² + ... + γq ût-q² + vt
where vt is an iid error term.
Obtain R² from this regression.

3. The test statistic is defined as TR² (the number of observations multiplied
   by the coefficient of multiple correlation) from the last regression, and is
   distributed as a χ²(q).



Testing for “ARCH Effects” (cont’d)

4. The null and alternative hypotheses are


H0 : γ1 = 0 and γ2 = 0 and γ3 = 0 and ... and γq = 0
H1 : γ1 ≠ 0 or γ2 ≠ 0 or γ3 ≠ 0 or ... or γq ≠ 0

If the value of the test statistic is greater than the critical value from the
χ² distribution, then reject the null hypothesis.

• Note that the ARCH test is also sometimes applied directly to returns
instead of the residuals from Stage 1 above.

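The three-step TR² procedure above can be sketched with an ordinary least-squares auxiliary regression (numpy assumed; the function name is ours):

```python
import numpy as np

def arch_lm_stat(resid, q):
    """Engle's LM test: regress squared residuals on q own lags, return T*R^2."""
    u2 = np.asarray(resid) ** 2
    y = u2[q:]                                  # dependent variable
    X = np.column_stack([np.ones(len(y))] +
                        [u2[q - i:-i] for i in range(1, q + 1)])  # const + q lags
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    fitted = X @ beta
    r2 = 1 - np.sum((y - fitted) ** 2) / np.sum((y - y.mean()) ** 2)
    return len(y) * r2      # compare with a chi-squared(q) critical value
```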


Problems with ARCH(q) Models

• How do we decide on q?
• The required value of q might be very large
• Non-negativity constraints might be violated.
– When we estimate an ARCH model, we require αi ≥ 0 ∀ i = 1, 2, ..., q
  (since a variance cannot be negative)

• A natural extension of an ARCH(q) model which gets around some of


these problems is a GARCH model.



Generalised ARCH (GARCH) Models

• The model is due to Bollerslev (1986). Allow the conditional variance to


be dependent upon previous own lags
• The variance equation is now

  σt² = α0 + α1 ut-1² + β σt-1²                                (1)

• This is a GARCH(1,1) model, which is like an ARMA(1,1) model for the
  variance equation.
• We could also write
  σt-1² = α0 + α1 ut-2² + β σt-2²
  σt-2² = α0 + α1 ut-3² + β σt-3²
• Substituting into (1) for σt-1²:
  σt² = α0 + α1 ut-1² + β(α0 + α1 ut-2² + β σt-2²)
      = α0 + α1 ut-1² + α0β + α1β ut-2² + β² σt-2²
Generalised ARCH (GARCH) Models (cont’d)

• Now substituting into (2) for σt-2²:

  σt² = α0 + α1 ut-1² + α0β + α1β ut-2² + β²(α0 + α1 ut-3² + β σt-3²)
  σt² = α0 + α1 ut-1² + α0β + α1β ut-2² + α0β² + α1β² ut-3² + β³ σt-3²
  σt² = α0 (1 + β + β²) + α1 ut-1² (1 + βL + β²L²) + β³ σt-3²
• An infinite number of successive substitutions would yield
  σt² = α0 (1 + β + β² + ...) + α1 ut-1² (1 + βL + β²L² + ...) + β^∞ σ0²
• So the GARCH(1,1) model can be written as an infinite order ARCH model.

• We can again extend the GARCH(1,1) model to a GARCH(p,q):

  σt² = α0 + α1 ut-1² + α2 ut-2² + ... + αq ut-q² + β1 σt-1² + β2 σt-2² + ... + βp σt-p²

               q               p
  σt² = α0 +   Σ  αi ut-i²  +  Σ  βj σt-j²
              i=1             j=1



Generalised ARCH (GARCH) Models (cont’d)

• But in general a GARCH(1,1) model will be sufficient to capture the


volatility clustering in the data.

• Why is GARCH Better than ARCH?


- more parsimonious - avoids overfitting
- less likely to breach non-negativity constraints



The Unconditional Variance under the GARCH
Specification

• The unconditional variance of ut is given by

  Var(ut) = α0 / (1 − (α1 + β))

  when α1 + β < 1

• α1 + β ≥ 1 is termed “non-stationarity” in variance

• α1 + β = 1 is termed integrated GARCH

• For non-stationarity in variance, the conditional variance forecasts will
  not converge on their unconditional value as the horizon increases.

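A quick numerical check of the unconditional variance formula (hypothetical GARCH(1,1) parameter values):

```python
def unconditional_variance(a0, a1, beta):
    """Var(u_t) = a0 / (1 - (a1 + beta)), valid only when a1 + beta < 1."""
    persistence = a1 + beta
    if persistence >= 1:
        # Non-stationary in variance (IGARCH when a1 + beta == 1)
        raise ValueError("non-stationary in variance: a1 + beta >= 1")
    return a0 / (1 - persistence)
```

For example, with a0 = 0.1, a1 = 0.1, beta = 0.85, the unconditional variance is 0.1/0.05 = 2.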


Estimation of ARCH / GARCH Models

• Since the model is no longer of the usual linear form, we cannot use
OLS.

• We use another technique known as maximum likelihood.

• The method works by finding the most likely values of the parameters
given the actual data.

• More specifically, we form a log-likelihood function and maximise it.



Estimation of ARCH / GARCH Models (cont’d)

• The steps involved in actually estimating an ARCH or GARCH model


are as follows

1. Specify the appropriate equations for the mean and the variance - e.g. an
AR(1)- GARCH(1,1) model:
   yt = μ + φ yt-1 + ut ,  ut ∼ N(0, σt²)
   σt² = α0 + α1 ut-1² + β σt-1²
2. Specify the log-likelihood function to maximise:

   L = −(T/2) log(2π) − (1/2) Σt=1..T log(σt²) − (1/2) Σt=1..T (yt − μ − φ yt-1)² / σt²
3. The computer will maximise the function and give parameter values and
their standard errors



Parameter Estimation using Maximum Likelihood

• Consider the bivariate regression case with homoscedastic errors for


  simplicity: yt = β1 + β2 xt + ut

• Assuming that ut ∼ N(0, σ²), then yt ∼ N(β1 + β2 xt, σ²), so that the
  probability density function for a normally distributed random variable
  with this mean and variance is given by

  f(yt | β1 + β2 xt, σ²) = (1 / (σ√(2π))) exp{ −(yt − β1 − β2 xt)² / (2σ²) }   (1)
• Successive values of yt would trace out the familiar bell-shaped curve.

• Assuming that ut are iid, then yt will also be iid.



Parameter Estimation using Maximum Likelihood
(cont’d)

• Then the joint pdf for all the y’s can be expressed as a product of the
  individual density functions

  f(y1, y2, ..., yT | β1 + β2Xt, σ²) = f(y1 | β1 + β2X1, σ²) f(y2 | β1 + β2X2, σ²) ...
                                       f(yT | β1 + β2XT, σ²)                      (2)

                                        T
                                     =  Π  f(yt | β1 + β2Xt, σ²)
                                       t=1

• Substituting into equation (2) for every yt from equation (1),

  f(y1, y2, ..., yT | β1 + β2xt, σ²)
      = (1 / (σ^T (√(2π))^T)) exp{ −(1/2) Σt=1..T (yt − β1 − β2xt)² / σ² }       (3)

Parameter Estimation using Maximum Likelihood
(cont’d)
• The typical situation we have is that the xt and yt are given and we want to
  estimate β1, β2, σ². If this is the case, then f(•) is known as the likelihood
  function, denoted LF(β1, β2, σ²), so we write

  LF(β1, β2, σ²) = (1 / (σ^T (√(2π))^T)) exp{ −(1/2) Σt=1..T (yt − β1 − β2xt)² / σ² }   (4)

• Maximum likelihood estimation involves choosing parameter values (β1,
  β2, σ²) that maximise this function.

• We want to differentiate (4) w.r.t. β1, β2, σ², but (4) is a product containing
  T terms.



Parameter Estimation using Maximum Likelihood
(cont’d)
• Since maxₓ f(x) = maxₓ log(f(x)), we can take logs of (4).

• Then, using the various laws for transforming functions containing
  logarithms, we obtain the log-likelihood function, LLF:

  LLF = −T log σ − (T/2) log(2π) − (1/2) Σt=1..T (yt − β1 − β2xt)² / σ²

• which is equivalent to

  LLF = −(T/2) log σ² − (T/2) log(2π) − (1/2) Σt=1..T (yt − β1 − β2xt)² / σ²   (5)

• Differentiating (5) w.r.t. β1, β2, σ², we obtain

  ∂LLF/∂β1 = −(1/2) Σt (yt − β1 − β2xt) · 2 · (−1) / σ²                        (6)
Parameter Estimation using Maximum Likelihood
(cont’d)
  ∂LLF/∂β2 = −(1/2) Σt (yt − β1 − β2xt) · 2 · (−xt) / σ²                       (7)

  ∂LLF/∂σ² = −T/(2σ²) + (1/2) Σt (yt − β1 − β2xt)² / σ⁴                        (8)
• Setting (6)–(8) to zero to find the maximum of the function, and putting hats
  above the parameters to denote the maximum likelihood estimators,

• From (6),   Σ(yt − β̂1 − β̂2 xt) = 0

              Σyt − Σβ̂1 − β̂2 Σxt = 0

              Σyt − Tβ̂1 − β̂2 Σxt = 0

              (1/T) Σyt − β̂1 − β̂2 (1/T) Σxt = 0

              β̂1 = ȳ − β̂2 x̄                                   (9)
Parameter Estimation using Maximum Likelihood
(cont’d)
• From (7),   Σ(yt − β̂1 − β̂2 xt) xt = 0

              Σyt xt − β̂1 Σxt − β̂2 Σxt² = 0

              β̂2 Σxt² = Σyt xt − β̂1 Σxt

              β̂2 Σxt² = Σyt xt − (ȳ − β̂2 x̄) Σxt

              β̂2 Σxt² = Σyt xt − T x̄ȳ + β̂2 T x̄²

              β̂2 (Σxt² − T x̄²) = Σyt xt − T x̄ȳ

              β̂2 = (Σyt xt − T x̄ȳ) / (Σxt² − T x̄²)            (10)

• From (8),   T/σ̂² = (1/σ̂⁴) Σ(yt − β̂1 − β̂2 xt)²
Parameter Estimation using Maximum Likelihood
(cont’d)
• Rearranging,   σ̂² = (1/T) Σ(yt − β̂1 − β̂2 xt)²

                 σ̂² = (1/T) Σ ût²                              (11)

• How do these formulae compare with the OLS estimators?


  (9) & (10) are identical to OLS
  (11) is different. The OLS estimator was

  σ̂² = (1/(T − k)) Σ ût²

• Therefore the ML estimator of the variance of the disturbances is biased,


although it is consistent.
• But how does this help us in estimating heteroscedastic models?

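Formulae (9)–(11) can be verified numerically against a least-squares fit (numpy assumed; the data are illustrative):

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 + 3.0 * x + np.array([0.1, -0.2, 0.0, 0.2, -0.1])   # illustrative data
T = len(y)

# ML estimators from (9) and (10) -- identical to OLS
b2 = (np.sum(y * x) - T * x.mean() * y.mean()) / (np.sum(x ** 2) - T * x.mean() ** 2)
b1 = y.mean() - b2 * x.mean()

# (11): the ML variance estimator divides by T (biased); OLS divides by T - k
u = y - b1 - b2 * x
sigma2_ml = np.sum(u ** 2) / T
sigma2_ols = np.sum(u ** 2) / (T - 2)

# Cross-check slope and intercept against numpy's least-squares fit
slope, intercept = np.polyfit(x, y, 1)
```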


Estimation of GARCH Models Using
Maximum Likelihood

• Now we have   yt = μ + φ yt-1 + ut ,  ut ∼ N(0, σt²)

                σt² = α0 + α1 ut-1² + β σt-1²

  L = −(T/2) log(2π) − (1/2) Σt=1..T log(σt²) − (1/2) Σt=1..T (yt − μ − φ yt-1)² / σt²
• Unfortunately, the LLF for a model with time-varying variances cannot be
maximised analytically, except in the simplest of cases. So a numerical
procedure is used to maximise the log-likelihood function. A potential
problem: local optima or multimodalities in the likelihood surface.
• The way we do the optimisation is:
1. Set up LLF.
2. Use regression to get initial guesses for the mean parameters.
3. Choose some initial guesses for the conditional variance parameters.
4. Specify a convergence criterion for the optimiser - e.g. a tolerance on the change in the parameter values or in the value of the LLF between iterations.
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 35
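Step 1 of the procedure above, setting up the LLF, can be coded directly and then handed to any numerical optimiser; a minimal sketch (the simulated data, the constant-mean simplification of the mean equation, and seeding σ_0² at the sample variance are illustrative assumptions, not the book's exact setup):

```python
import numpy as np

def garch11_nll(params, y):
    """Negative log-likelihood of a constant-mean GARCH(1,1):
    y_t = mu + u_t, u_t ~ N(0, sigma2_t),
    sigma2_t = alpha0 + alpha1*u_{t-1}^2 + beta*sigma2_{t-1}."""
    mu, alpha0, alpha1, beta = params
    u = y - mu
    T = len(y)
    sigma2 = np.empty(T)
    sigma2[0] = np.var(y)              # common (illustrative) initialisation choice
    for t in range(1, T):
        sigma2[t] = alpha0 + alpha1 * u[t - 1] ** 2 + beta * sigma2[t - 1]
    # negative of the LLF on the previous slide (constant-mean case)
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(sigma2) + u ** 2 / sigma2)

rng = np.random.default_rng(1)
y = rng.normal(size=500)               # placeholder return series
nll = garch11_nll([0.0, 0.1, 0.1, 0.8], y)   # evaluate at some trial values
# In practice this function would be minimised numerically,
# e.g. with scipy.optimize.minimize, subject to non-negativity constraints.
```

Because the variance recursion feeds each σ_t² into the next, the LLF cannot be maximised analytically, which is exactly why the numerical steps on this slide are needed.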
Non-Normality and Maximum Likelihood (ML)

• Recall that the conditional normality assumption for ut is essential.

• We can test for normality using the following representation


u_t = v_t σ_t,  v_t ∼ N(0, 1)

σ_t² = α_0 + α_1 u_{t−1}² + β σ_{t−1}²

v_t = u_t / σ_t

• The sample counterpart is v̂_t = û_t / σ̂_t
• Are the v̂t normal? Typically v̂t are still leptokurtic, although less so than
the ût . Is this a problem? Not really, as we can use the ML with a robust
variance/covariance estimator. ML with robust standard errors is called Quasi-
Maximum Likelihood or QML.
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 36
Extensions to the Basic GARCH Model

• Since the GARCH model was developed, a huge number of extensions


and variants have been proposed. Three of the most important
examples are EGARCH, GJR, and GARCH-M models.

• Problems with GARCH(p,q) Models:


- Non-negativity constraints may still be violated
- GARCH models cannot account for leverage effects

• Possible solutions: the exponential GARCH (EGARCH) model or the


GJR model, which are asymmetric GARCH models.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 37


The EGARCH Model

• Suggested by Nelson (1991). The variance equation is given by

u t 1  u 2
 
t 1
log(  t )     log(  t 1 )   
2 2

 t 1 2   t 1 2 
 

• Advantages of the model


- Since we model log(σ_t²), then even if the parameters are negative, σ_t²
will be positive.
- We can account for the leverage effect: if the relationship between
volatility and returns is negative, γ will be negative.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 38


The GJR Model

• Due to Glosten, Jaganathan and Runkle (1993)


t2 = 0 + 1 ut21 +t-12+ut-12It-1

where It-1 = 1 if ut-1 < 0


= 0 otherwise

• For a leverage effect, we would see  > 0.

• We require 1 +   0 and 1  0 for non-negativity.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 39


An Example of the use of a GJR Model

• Using monthly S&P 500 returns, December 1979 - June 1998

• Estimating a GJR model, we obtain the following results.


y t  0.172
(3.198)

 t 2  1.243  0.015ut21  0.498 t 1 2  0.604ut21 I t 1


(16.372) (0.437) (14.999) (5.772)

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 40


News Impact Curves

The news impact curve plots the next period volatility (ht) that would arise from various
positive and negative values of ut-1, given an estimated model.

News Impact Curves for S&P 500 Returns using Coefficients from GARCH and GJR
Model Estimates:

[Figure: news impact curves for the GARCH and GJR models; x-axis: value of lagged shock, from −1 to 1; y-axis: value of conditional variance, from 0 to 0.14.]
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 41
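Given the estimated GJR coefficients above, the news impact curve can be traced out directly; a sketch (holding the lagged conditional variance fixed at an arbitrary value, as is conventional when plotting news impact curves):

```python
# Coefficients from the estimated GJR model for S&P 500 returns (previous slide)
alpha0, alpha1, beta, gamma = 1.243, 0.015, 0.498, 0.604
sigma2_lag = 0.5          # lagged conditional variance held fixed (illustrative value)

def nic_gjr(u):
    """Next-period variance as a function of the lagged shock u (GJR)."""
    return alpha0 + alpha1 * u**2 + beta * sigma2_lag + gamma * u**2 * (u < 0)

def nic_garch(u):
    """Symmetric GARCH counterpart (gamma = 0)."""
    return alpha0 + alpha1 * u**2 + beta * sigma2_lag

# Negative shocks raise next-period variance by more than positive ones of
# the same size under GJR; plain GARCH responds symmetrically
assert nic_gjr(-0.5) > nic_gjr(0.5)
assert nic_garch(-0.5) == nic_garch(0.5)
```

Evaluating the two functions over a grid of shocks from −1 to 1 reproduces the shape of the figure above: the GJR curve is steeper on the negative side.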
GARCH-in Mean

• We expect a risk to be compensated by a higher return. So why not let


the return of a security be partly determined by its risk?

• Engle, Lilien and Robins (1987) suggested the ARCH-M specification.


A GARCH-M model would be
yt =  + t-1+ ut , ut  N(0,t2)
t2 = 0 + 1 ut21 +t-12

•  can be interpreted as a sort of risk premium.

• It is possible to combine all or some of these models together to get


more complex “hybrid” models - e.g. an ARMA-EGARCH(1,1)-M
model.
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 42
What Use Are GARCH-type Models?

• GARCH can model the volatility clustering effect since the conditional
variance is autoregressive. Such models can be used to forecast volatility.

• We could show that


Var(y_t | y_{t−1}, y_{t−2}, ...) = Var(u_t | u_{t−1}, u_{t−2}, ...)

• So modelling σ_t² will give us models and forecasts for the variance of y_t as well.

• Variance forecasts are additive over time.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 43


Forecasting Variances using GARCH Models

• Producing conditional variance forecasts from GARCH models uses a


very similar approach to producing forecasts from ARMA models.
• It is again an exercise in iterating with the conditional expectations
operator.
• Consider the following GARCH(1,1) model:
yt    ut , ut  N(0,t2),  t 2   0  1ut21   t 1 2
• What is needed is to generate forecasts of σ_{T+1}² | Ω_T, σ_{T+2}² | Ω_T, ...,
σ_{T+s}² | Ω_T where Ω_T denotes all information available up to and
including observation T.
• Adding one to each of the time subscripts of the above conditional
variance equation, and then two, and then three would yield the
following equations:

  σ_{T+1}² = α_0 + α_1 u_T² + β σ_T²
  σ_{T+2}² = α_0 + α_1 u_{T+1}² + β σ_{T+1}²
  σ_{T+3}² = α_0 + α_1 u_{T+2}² + β σ_{T+2}²
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 44
Forecasting Variances
using GARCH Models (Cont’d)

Let 1f,T be the one step ahead forecast for 2 made at time T. This is
2

easy to calculate since, at time T, the values of all the terms on the
RHS are known.
•  1f,T would be obtained by taking the conditional expectation of the
2

first equation at the bottom of slide 44:


 1f,T = 0 + 1 uT2 +T2
2

Given,  1f,T how is 2f,T , the 2-step ahead forecast for 2 made at time T,
2

2

calculated? Taking the conditional expectation of the second equation


at the bottom of slide 36:
 2f,T = 0 + 1E( uT 1 T) +  1f,T
2 2 2

• where E( uT 1 T) is the expectation, made at time T, of uT 1, which is


2 2

the squared disturbance term.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 45


Forecasting Variances
using GARCH Models (Cont’d)
• We can write
  E(u_{T+1}² | Ω_T) = σ_{T+1}²
• But σ_{T+1}² is not known at time T, so it is replaced with the forecast for
it, σ^{2f}_{1,T}, so that the 2-step ahead forecast is given by

  σ^{2f}_{2,T} = α_0 + α_1 σ^{2f}_{1,T} + β σ^{2f}_{1,T}
  σ^{2f}_{2,T} = α_0 + (α_1 + β) σ^{2f}_{1,T}

• By similar arguments, the 3-step ahead forecast will be given by

  σ^{2f}_{3,T} = E_T(α_0 + α_1 u_{T+2}² + β σ_{T+2}²)
             = α_0 + (α_1 + β) σ^{2f}_{2,T}
             = α_0 + (α_1 + β)[α_0 + (α_1 + β) σ^{2f}_{1,T}]
             = α_0 + α_0(α_1 + β) + (α_1 + β)² σ^{2f}_{1,T}

• Any s-step ahead forecast (s ≥ 2) would be produced by

  σ^{2f}_{s,T} = α_0 Σ_{i=1}^{s−1} (α_1 + β)^{i−1} + (α_1 + β)^{s−1} σ^{2f}_{1,T}
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 46
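The recursion above can be checked numerically against the closed-form s-step expression; the parameter values and time-T quantities below are purely illustrative:

```python
# Illustrative GARCH(1,1) parameters and time-T values
alpha0, alpha1, beta = 0.1, 0.08, 0.9
u_T2, sigma_T2 = 1.5, 1.2            # squared shock and variance at time T

# One-step ahead forecast uses known time-T quantities
f = [alpha0 + alpha1 * u_T2 + beta * sigma_T2]
# s-step ahead forecasts (s >= 2): E(u^2) is replaced by the variance forecast
for s in range(2, 21):
    f.append(alpha0 + (alpha1 + beta) * f[-1])

# Closed form: f_s = alpha0 * sum_{i=1}^{s-1} phi^{i-1} + phi^{s-1} * f_1
s = 20
phi = alpha1 + beta
closed = alpha0 * sum(phi ** (i - 1) for i in range(1, s)) + phi ** (s - 1) * f[0]
assert abs(f[-1] - closed) < 1e-9

# Since alpha1 + beta < 1, the forecasts converge towards the
# unconditional variance alpha0 / (1 - alpha1 - beta) as s grows
```

Iterating and the closed form agree term by term; the convergence of the forecasts to the unconditional variance is the mean reversion property of GARCH.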
What Use Are Volatility Forecasts?

1. Option pricing

C = f(S, X, σ², T, r_f)

2. Conditional betas
β_{i,t} = σ_{im,t} / σ_{m,t}²

3. Dynamic hedge ratios


The Hedge Ratio - the size of the futures position to the size of the
underlying exposure, i.e. the number of futures contracts to buy or sell per
unit of the spot good.
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 47
What Use Are Volatility Forecasts? (Cont’d)

• What is the optimal value of the hedge ratio?


• Assuming that the objective of hedging is to minimise the variance of the
hedged portfolio, the optimal hedge ratio will be given by
h = p (σ_S / σ_F)
where h = hedge ratio
      p = correlation coefficient between change in spot price (ΔS) and
          change in futures price (ΔF)
      σ_S = standard deviation of ΔS
      σ_F = standard deviation of ΔF

• What if the standard deviations and correlation are changing over time?
Use h_t = p_t (σ_{S,t} / σ_{F,t})
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 48
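Since p σ_S/σ_F equals Cov(ΔS, ΔF)/Var(ΔF), the time-varying ratio can be illustrated with rolling sample moments standing in for model-based conditional moments; the simulated series, true hedge ratio of 0.9, and window length are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
n, window = 500, 60
dF = rng.normal(size=n)                      # futures price changes
dS = 0.9 * dF + 0.3 * rng.normal(size=n)     # correlated spot price changes

hedge = []
for t in range(window, n):
    s, f = dS[t - window:t], dF[t - window:t]
    # equivalent formulations: rho * sigma_S / sigma_F == cov / var_F
    h1 = np.corrcoef(s, f)[0, 1] * s.std(ddof=1) / f.std(ddof=1)
    h2 = np.cov(s, f)[0, 1] / f.var(ddof=1)
    assert abs(h1 - h2) < 1e-12
    hedge.append(h2)

# hedge now holds a rolling estimate of the optimal hedge ratio near 0.9
```

A multivariate GARCH model would replace the rolling moments with conditional ones, as in the Brooks, Henry and Persand application later in the chapter.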
Testing Non-linear Restrictions or
Testing Hypotheses about Non-linear Models

• Usual t- and F-tests are still valid in non-linear models, but they are
not flexible enough.

• There are three hypothesis testing procedures based on maximum


likelihood principles: Wald, Likelihood Ratio, Lagrange Multiplier.

• Consider a single parameter, θ, to be estimated. Denote the MLE as θ̂
and a restricted estimate as θ̃.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 49


Likelihood Ratio Tests

• Estimate under the null hypothesis and under the alternative.


• Then compare the maximised values of the LLF.
• So we estimate the unconstrained model and achieve a given maximised
value of the LLF, denoted Lu
• Then estimate the model imposing the constraint(s) and get a new value of
the LLF denoted Lr.
• Which will be bigger?
• Lr  Lu comparable to RRSS  URSS

• The LR test statistic is given by


LR = −2(L_r − L_u) ∼ χ²(m)
where m = number of restrictions.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 50


Likelihood Ratio Tests (cont’d)

• Example: We estimate a GARCH model and obtain a maximised LLF of


66.85. We are interested in testing whether β = 0 in the following equation:

  y_t = μ + φ y_{t−1} + u_t,  u_t ∼ N(0, σ_t²)
  σ_t² = α_0 + α_1 u_{t−1}² + β σ_{t−1}²
• We estimate the model imposing the restriction and observe the maximised
LLF falls to 64.54. Can we accept the restriction?
• LR = −2(64.54 − 66.85) = 4.62.
• The test statistic follows a χ²(1) under the null; the 5% critical value is 3.84, so we reject the null.
• Denoting the maximised value of the LLF by unconstrained ML as L(θ̂)
and the constrained optimum as L(θ̃), we can illustrate the three testing
procedures in the following diagram:
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 51
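The LR statistic in the example can be computed directly (the 5% χ²(1) critical value of 3.84 is taken as given rather than computed):

```python
L_u = 66.85   # maximised LLF, unrestricted GARCH model
L_r = 64.54   # maximised LLF with the restriction beta = 0 imposed
m = 1         # number of restrictions

LR = -2 * (L_r - L_u)
critical_5pct = 3.84          # chi-squared(1) critical value at the 5% level

assert round(LR, 2) == 4.62
assert LR > critical_5pct     # reject the null that beta = 0
```

Since 4.62 exceeds 3.84, the restriction is rejected at the 5% level, matching the conclusion on the slide.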
Comparison of Testing Procedures under Maximum
Likelihood: Diagrammatic Representation

[Figure: the log-likelihood function L(θ) plotted against θ. Point A marks the unrestricted maximum L(θ̂) at θ̂; point B marks the restricted value L(θ̃) at θ̃.]
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 52


Hypothesis Testing under Maximum Likelihood

• The vertical distance forms the basis of the LR test.

• The Wald test is based on a comparison of the horizontal distance.

• The LM test compares the slopes of the curve at A and B.

• We know that at the unrestricted MLE, L(θ̂), the slope of the curve is zero.

• But is it “significantly steep” at L(θ̃)?

• This formulation of the test is usually easiest to estimate.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 53


An Example of the Application of GARCH Models
- Day & Lewis (1992)

• Purpose
• To consider the out of sample forecasting performance of GARCH and
EGARCH Models for predicting stock index volatility.

• Implied volatility is the market's expectation of the “average” level of
volatility of the underlying asset over the life of an option.

• Which is better, GARCH or implied volatility?

• Data
• Weekly closing prices (Wednesday to Wednesday, and Friday to Friday)
for the S&P 100 Index option and the underlying index, 11 March 83 – 31 Dec. 89

• Implied volatility is calculated using a non-linear iterative procedure.


‘Introductory Econometrics for Finance’ © Chris Brooks 2019 54
The Models

• The “Base” Models


For the conditional mean
RMt  RFt  0  1 ht  ut (1)

And for the variance ht   0   1u t21   1 ht 1 (2)

ut 1  u  2 
1/ 2 
or ln( ht )   0  1 ln( ht 1 )  1 (    t 1    ) (3)
ht 1  ht 1    
where
RMt denotes the return on the market portfolio
RFt denotes the risk-free rate
ht denotes the conditional variance from the GARCH-type models while
σ_t² denotes the implied variance from option prices.
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 55
The Models (cont’d)

• Add in a lagged value of the implied volatility parameter to equations (2)


and (3).
(2) becomes
ht   0   1u t21   1 ht 1   t21 (4)

and (3) becomes


ut 1  u 2 
1/ 2

ln( ht )   0  1 ln( ht 1 )  1 (   t 1
   )   ln(  t21 ) (5)
ht 1  ht 1    

• We are interested in testing H0: δ = 0 in (4) or (5).

• Also, we want to test H0: α_1 = 0 and β_1 = 0 in (4),
• and H0: α_1 = 0 and β_1 = 0 and θ = 0 and γ = 0 in (5).
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 56
The Models (cont’d)

• If this second set of restrictions holds, then (4) & (5) collapse to

  h_t = α_0 + δ σ_{t−1}²   (4′)

• and (5) becomes

  ln(h_t) = α_0 + δ ln(σ_{t−1}²)   (5′)

• We can test all of these restrictions using a likelihood ratio test.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 57


In-sample Likelihood Ratio Test Results:
GARCH Versus Implied Volatility

R Mt  R Ft   0  1 ht  u t (8.78)
ht   0   1u t21   1 ht 1 (8.79)
ht   0   1u t21   1 ht 1   t21 (8.81)
ht2   0   t21 (8.81)
Equation for 0 1 010-4 1 1  Log-L 2
Variance
specification
(8.79) 0.0072 0.071 5.428 0.093 0.854 - 767.321 17.77
(0.005) (0.01) (1.65) (0.84) (8.17)
(8.81) 0.0015 0.043 2.065 0.266 -0.068 0.318 776.204 -
(0.028) (0.02) (2.98) (1.17) (-0.59) (3.00)
(8.81) 0.0056 -0.184 0.993 - - 0.581 764.394 23.62
(0.001) (-0.001) (1.50) (2.94)
Notes: t-ratios in parentheses, Log-L denotes the maximised value of the log-likelihood function in
each case. 2 denotes the value of the test statistic, which follows a 2(1) in the case of (8.81) restricted
to (8.79), and a 2 (2) in the case of (8.81) restricted to (8.81). Source: Day and Lewis (1992).
Reprinted with the permission of Elsevier Science.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 58


In-sample Likelihood Ratio Test Results:
EGARCH Versus Implied Volatility
R Mt  R Ft   0  1 ht  u t (8.78)
u t 1  u t 1 2
1/ 2
ln( ht )   0   1 ln( ht 1 )   1 (     ) (8.80)
ht 1  ht 1   

u t 1  u t 1 2 
1/ 2

ln( ht )   0   1 ln( ht 1 )   1 (       )   ln(  t21 ) (8.82)


ht 1  ht 1    
ln( ht2 )   0   ln(  t21 ) (8.82)
Equation for 0 1 010 -4
1    Log-L 2
Variance
specification
(c) -0.0026 0.094 -3.62 0.529 -0.273 0.357 - 776.436 8.09
(-0.03) (0.25) (-2.90) (3.26) (-4.13) (3.17)
(e) 0.0035 -0.076 -2.28 0.373 -0.282 0.210 0.351 780.480 -
(0.56) (-0.24) (-1.82) (1.48) (-4.34) (1.89) (1.82)
(e) 0.0047 -0.139 -2.76 - - - 0.667 765.034 30.89
(0.71) (-0.43) (-2.30) (4.01)
Notes: t-ratios in parentheses, Log-L denotes the maximised value of the log-likelihood function in
each case. 2 denotes the value of the test statistic, which follows a 2(1) in the case of (8.82) restricted
to (8.80), and a 2 (2) in the case of (8.82) restricted to (8.82). Source: Day and Lewis (1992).
Reprinted with the permission of Elsevier Science.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 59


Conclusions for In-sample Model Comparisons &
Out-of-Sample Procedure

• IV has extra incremental power for modelling stock volatility beyond


GARCH.

• But the models do not represent a true test of the predictive ability of
IV.

• So the authors conduct an out of sample forecasting test.

• There are 729 data points. They use the first 410 to estimate the
models, and then make a 1-step ahead forecast of the following week’s
volatility.

• Then they roll the sample forward one observation at a time,


constructing a new one step ahead forecast at each step.
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 60
Out-of-Sample Forecast Evaluation

• They evaluate the forecasts in two ways:


• The first is by regressing the realised volatility series on the forecasts plus
a constant:
t21  b0  b1 2ft  t 1 (7)

where t 1 is the “actual” value of volatility, and  2ft is the value forecasted
2

for it during period t.


• Perfectly accurate forecasts imply b0 = 0 and b1 = 1.
• But what is the “true” value of volatility at time t ?
Day & Lewis use 2 measures
1. The square of the weekly return on the index, which they call SR.
2. The variance of the week’s daily returns multiplied by the number
of trading days in that week.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 61
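The evaluation regression in (7) is a simple OLS of realised volatility on the forecasts; a sketch where the forecasts are constructed to be perfectly accurate, so that the ideal values b0 = 0 and b1 = 1 are recovered exactly (the data are simulated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(3)
forecast = rng.uniform(0.01, 0.05, size=100)   # sigma^2_ft, forecast volatilities
actual = forecast.copy()                        # perfectly accurate forecasts

# OLS regression of actual on a constant and the forecast
X = np.column_stack([np.ones(len(forecast)), forecast])
b0, b1 = np.linalg.lstsq(X, actual, rcond=None)[0]

assert abs(b0) < 1e-8 and abs(b1 - 1) < 1e-8
```

With real data the estimated b0 and b1 drift away from (0, 1), and the R² of this regression measures how much of the variation in realised volatility the forecasts explain, which is how the table on the next slide is read.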


Out-of Sample Model Comparisons

 t21  b0  b1 2ft   t 1 (8.83)


Forecasting Model | Proxy for ex post volatility | b_0 | b_1 | R²
Historic | SR | 0.0004 (5.60) | 0.129 (21.18) | 0.094
Historic | WV | 0.0005 (2.90) | 0.154 (7.58) | 0.024
GARCH | SR | 0.0002 (1.02) | 0.671 (2.10) | 0.039
GARCH | WV | 0.0002 (1.07) | 1.074 (3.34) | 0.018
EGARCH | SR | 0.0000 (0.05) | 1.075 (2.06) | 0.022
EGARCH | WV | −0.0001 (−0.48) | 1.529 (2.58) | 0.008
Implied Volatility | SR | 0.0022 (2.22) | 0.357 (1.82) | 0.037
Implied Volatility | WV | 0.0005 (0.389) | 0.718 (1.95) | 0.026

Notes: Historic refers to the use of a simple historical average of the squared returns to forecast
volatility; t-ratios in parentheses; SR and WV refer to the square of the weekly return on the S&P 100,
and the variance of the week’s daily returns multiplied by the number of trading days in that week,
respectively. Source: Day and Lewis (1992). Reprinted with the permission of Elsevier Science.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 62


Encompassing Test Results: Do the IV Forecasts
Encompass those of the GARCH Models?
 t21  b0  b1 It2  b2 Gt2  b3 Et2  b4 Ht
2
  t 1 (8.86)
Forecast comparison | b_0 | b_1 | b_2 | b_3 | b_4 | R²
Implied vs. GARCH | −0.00010 (−0.09) | 0.601 (1.03) | 0.298 (0.42) | – | – | 0.027
Implied vs. GARCH vs. Historical | 0.00018 (1.15) | 0.632 (1.02) | −0.243 (−0.28) | – | 0.123 (7.01) | 0.038
Implied vs. EGARCH | −0.00001 (−0.07) | 0.695 (1.62) | – | 0.176 (0.27) | – | 0.026
Implied vs. EGARCH vs. Historical | 0.00026 (1.37) | 0.590 (1.45) | −0.374 (−0.57) | – | 0.118 (7.74) | 0.038
GARCH vs. EGARCH | 0.00005 (0.37) | – | 1.070 (2.78) | −0.001 (−0.00) | – | 0.018
Notes: t-ratios in parentheses; the ex post measure used in this table is the variance of the week’s daily
returns multiplied by the number of trading days in that week. Source: Day and Lewis (1992).
Reprinted with the permission of Elsevier Science.
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 63
Conclusions of Paper

• Within sample results suggest that IV contains extra information not


contained in the GARCH / EGARCH specifications.

• Out of sample results suggest that nothing can accurately predict


volatility!

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 64


Stochastic Volatility Models

• It is a common misconception that GARCH-type specifications are


stochastic volatility models
• However, as the name suggests, stochastic volatility models differ
from GARCH principally in that the conditional variance equation of a
GARCH specification is completely deterministic given all
information available up to and including the previous period
• There is no error term in the variance equation of a GARCH model,
only in the mean equation
• Stochastic volatility models contain a second error term, which enters
into the conditional variance equation.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 65


Autoregressive Volatility Models

• A simple example of a stochastic volatility model is the autoregressive


volatility specification
• This model is simple to understand and simple to estimate, because it
requires that we have an observable measure of volatility which is then
simply used as any other variable in an autoregressive model
• The standard Box-Jenkins-type procedures for estimating
autoregressive (or ARMA) models can then be applied to this series
• For example, if the quantity of interest is a daily volatility estimate, we
could use squared daily returns, which trivially involves taking a
column of observed returns and squaring each observation
• The model estimated for volatility, σ_t², is then

  σ_t² = β_0 + Σ_{j=1}^p β_j σ_{t−j}² + ε_t
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 66


A Stochastic Volatility Model Specification

• The term ‘stochastic volatility’ is usually associated with a different


formulation to the autoregressive volatility model, a possible example
of which would be

  y_t = μ + σ_t u_t,  log(σ_t²) = α_0 + β_1 log(σ_{t−1}²) + σ_η η_t

where η_t is another N(0,1) random variable that is independent of u_t


• The volatility is latent rather than observed, and so is modelled
indirectly
• Stochastic volatility models are superior in theory compared with
GARCH-type models, but the former are much more complex to
estimate.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 67


Covariance Modelling: Motivation

• A limitation of univariate volatility models is that the fitted conditional


variance of each series is entirely independent of all others
• This is potentially an important limitation for two reasons:
– If there are ‘volatility spillovers’ between markets or assets, the univariate
model will be mis-specified
– It is often the case that the covariances between series are of interest too
– The calculation of hedge ratios, portfolio value at risk estimates, CAPM
betas, and so on, all require covariances as inputs
• Multivariate GARCH models can be used for estimation of:
– Conditional CAPM betas
– Dynamic hedge ratios
– Portfolio variances

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 68


Simple Covariance Models:
Historical and Implied

• In exactly the same fashion as for volatility, the historical covariance


or correlation between two series can be calculated from a set of
historical data

• Implied covariances can be calculated using options whose payoffs are


dependent on more than one underlying asset

• The relatively small number of such options that exist limits the
circumstances in which implied covariances can be calculated

• Examples include rainbow options, ‘crack spread’ options for different


grades of oil, and currency options.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 69


Implied Covariance Models

• To give an illustration for currency options, the implied variance of the


cross-currency returns is given by

  σ²(x − y) = σ²(x) + σ²(y) − 2σ(x, y)

where σ²(x) and σ²(y) are the implied variances of the x and y
returns, respectively, and σ(x, y) is the implied covariance between
x and y

• So if the implied covariance between USD/DEM and USD/JPY is of


interest, then the implied variances of the returns of USD/DEM and
USD/JPY and the returns of the cross-currency DEM/JPY are required.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 70


EWMA Covariance Models

• A EWMA specification gives more weight in the covariance to recent


observations than an estimate based on the simple average
• The EWMA model estimates for variances and covariances at time t in
the bivariate setup with two returns series x and y may be written as
h_{ij,t} = λ h_{ij,t−1} + (1 − λ) x_{t−1} y_{t−1}
where i ≠ j for the covariances, and i = j, x = y for the variances
• The fitted values for h also become the forecasts for subsequent
periods
• λ (0 < λ < 1) denotes the decay factor determining the relative weights
attached to recent versus less recent observations
• This parameter could be estimated but is often set arbitrarily (e.g.,
RiskMetrics uses a decay factor of 0.97 for monthly data but 0.94 for
daily data).
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 71
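The EWMA recursion above takes only a few lines of code; a sketch using the RiskMetrics daily decay factor (seeding the recursion with the sample average of the cross-products is an illustrative choice, not part of the definition):

```python
import numpy as np

def ewma_cov(x, y, lam=0.94):
    """h_t = lam * h_{t-1} + (1 - lam) * x_{t-1} * y_{t-1}."""
    h = np.empty(len(x))
    h[0] = np.mean(x * y)                 # illustrative initialisation
    for t in range(1, len(x)):
        h[t] = lam * h[t - 1] + (1 - lam) * x[t - 1] * y[t - 1]
    return h

rng = np.random.default_rng(4)
x = rng.normal(size=250)                  # one year of daily returns (illustrative)
y = 0.5 * x + rng.normal(size=250)

h_xy = ewma_cov(x, y)                     # covariance series (i != j)
h_xx = ewma_cov(x, x)                     # variance series (x = y)
assert np.all(h_xx > 0)                   # squared terms keep the variance positive
```

The same function produces both the variances and the covariance, mirroring the single equation on this slide with i = j or i ≠ j.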
EWMA Covariance Models - Limitations

• This equation can be rewritten as an infinite-order function of only the
returns by successively substituting out the covariances:

  h_{ij,t} = (1 − λ) Σ_{s=1}^∞ λ^{s−1} x_{t−s} y_{t−s}
• The EWMA model is a restricted version of an integrated GARCH


(IGARCH) specification, and it does not guarantee the fitted variance-
covariance matrix to be positive definite

• EWMA models also cannot allow for the observed mean reversion in
the volatilities or covariances of asset returns that is particularly
prevalent at lower frequencies.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 72


Multivariate GARCH Models

• Multivariate GARCH models are used to estimate and to forecast


covariances and correlations.
• The basic formulation is similar to that of the GARCH model, but where the
covariances as well as the variances are permitted to be time-varying.
• There are 3 main classes of multivariate GARCH formulation that are widely
used: VECH, diagonal VECH and BEKK.

VECH and Diagonal VECH


• e.g. suppose that there are two variables used in the model. The conditional
covariance matrix is denoted H_t, and would be 2 × 2. H_t and VECH(H_t) are

  H_t = ( h_{11t}  h_{12t} ; h_{21t}  h_{22t} ),  VECH(H_t) = ( h_{11t}, h_{22t}, h_{12t} )′
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 73
VECH and Diagonal VECH

• In the case of the VECH, the conditional variances and covariances would
each depend upon lagged values of all of the variances and covariances
and on lags of the squares of both error terms and their cross products.
• In matrix form, it would be written
VECH H t   C  A VECH  t 1t 1   B VECH H t 1   t  t 1 ~ N 0, H t 
• Writing out all of the elements gives the 3 equations as
h11t  c11  a11u12t  a12 u 22t  a13 u1t u 2 t  b11 h11t 1  b12 h22 t 1  b13 h12 t 1
h22 t  c21  a 21u12t  a 22 u 22t  a 23 u1t u 2 t  b21 h11t 1  b22 h22 t 1  b23 h12 t 1
h12 t  c31  a 31u12t  a 32 u 22t  a 33 u1t u 2 t  b31 h11t 1  b32 h22 t 1  b33 h12 t 1
• Such a model would be hard to estimate. The diagonal VECH is much
simpler and is specified, in the 2 variable case, as follows:
h11t   0   1u12t 1   2 h11t 1
h22t   0   1u 22t 1   2 h22t 1
h12t   0   1u1t 1u 2t 1   2 h12t 1
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 74
BEKK and Model Estimation for M-GARCH

• Neither the VECH nor the diagonal VECH ensure a positive definite
variance-covariance matrix.
• An alternative approach is the BEKK model (Engle & Kroner, 1995).
• The BEKK Model uses a Quadratic form for the parameter matrices to
ensure a positive definite variance / covariance matrix Ht.
• In matrix form, the BEKK model is H_t = W′W + A′H_{t−1}A + B′ξ_{t−1}ξ_{t−1}′B
• Model estimation for all classes of multivariate GARCH model is
again performed using maximum likelihood with the following LLF:
   
ℓ(θ) = −(TN/2) log(2π) − (1/2) Σ_{t=1}^T [ log|H_t| + ξ_t′ H_t^{−1} ξ_t ]

where N is the number of variables in the system (assumed 2 above), θ
is a vector containing all of the parameters, and T is the number of observations.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 75


Correlation Models and the CCC Model

• The correlations between a pair of series at each point in time can be


constructed by dividing the conditional covariances by the product of
the conditional standard deviations from a VECH or BEKK model
• A subtly different approach would be to model the dynamics for the
correlations directly
• In the constant conditional correlation (CCC) model, the correlations
between the disturbances are assumed to be fixed through time
• Thus, although the conditional covariances are not fixed, they are tied
to the variances
• The conditional variances in the fixed correlation model are identical
to those of a set of univariate GARCH specifications (although they
are estimated jointly)

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 76


More on the CCC Model

• The off-diagonal elements of H_t, h_{ij,t} (i ≠ j), are defined indirectly via
the correlations, denoted ρ_{ij}:

  h_{ij,t} = ρ_{ij} (h_{ii,t} h_{jj,t})^{1/2}
• Is it empirically plausible to assume that the correlations are constant


through time?

• Several tests of this assumption have been developed, including a test


based on the information matrix due and a Lagrange Multiplier test

• There is evidence against constant correlations, particularly in the


context of stock returns.
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 77
The Dynamic Conditional Correlation Model

• Several different formulations of the dynamic conditional correlation


(DCC) model are available, but a popular specification is due to Engle
(2002)
• The model is related to the CCC formulation but where the correlations
are allowed to vary over time.
• Define the variance-covariance matrix, Ht, as Ht = DtRtDt
• Dt is a diagonal matrix containing the conditional standard deviations
(i.e. the square roots of the conditional variances from univariate
GARCH model estimations on each of the N individual series) on the
leading diagonal
• Rt is the conditional correlation matrix
• Numerous parameterisations of Rt are possible, including an
exponential smoothing approach
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 78
The DCC Model – A Possible Specification

• A possible specification is of the MGARCH form:

  Q_t = S ∘ (ιι′ − A − B) + A ∘ u_{t−1} u_{t−1}′ + B ∘ Q_{t−1}
where:
• S is the unconditional correlation matrix of the vector of standardised
residuals (from the first stage estimation), ut = Dt−1ϵt
• ι is a vector of ones
• Qt is an N × N symmetric positive definite variance-covariance matrix
• ◦ denotes the Hadamard or element-by-element matrix multiplication
procedure
• This specification for the intercept term simplifies estimation and
reduces the number of parameters.
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 79
The DCC Model – A Possible Specification

• Engle (2002) proposes a GARCH-esque formulation for dynamically
modelling Q_t, with the conditional correlation matrix, R_t, then constructed
as

  R_t = (Q_t∗)^{−1} Q_t (Q_t∗)^{−1},  Q_t∗ = diag{√q_{11,t}, …, √q_{NN,t}}

where diag(·) denotes a matrix comprising the main diagonal elements
of (·) and Q_t∗ is a matrix that takes the square roots of each element in Q_t

• This operation is effectively taking the covariances in Qt and dividing


them by the product of the appropriate standard deviations in Qt∗ to
create a matrix of correlations.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 80
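The construction of R_t from Q_t amounts to pre- and post-multiplying by the inverse of the diagonal matrix of square roots; a sketch with an arbitrary illustrative 3×3 Q_t:

```python
import numpy as np

# An illustrative symmetric positive definite Q_t
Q = np.array([[1.0, 0.3, 0.2],
              [0.3, 2.0, 0.5],
              [0.2, 0.5, 1.5]])

Q_star_inv = np.diag(1.0 / np.sqrt(np.diag(Q)))   # (Q_t*)^{-1}
R = Q_star_inv @ Q @ Q_star_inv                    # R_t = Q*^{-1} Q Q*^{-1}

# R is a proper correlation matrix: unit diagonal, off-diagonals in [-1, 1]
assert np.allclose(np.diag(R), 1.0)
assert np.all(np.abs(R) <= 1.0 + 1e-12)
```

Each covariance q_ij is divided by √(q_ii q_jj), which is exactly the "covariance divided by the product of the appropriate standard deviations" operation described above.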


DCC Model Estimation

• The model may be estimated in a single stage using ML although this will
be difficult. So Engle advocates a two-stage procedure where each variable
in the system is first modelled separately as a univariate GARCH
• A joint log-likelihood function for this stage could be constructed, which
would simply be the sum (over N) of all of the log-likelihoods for the
individual GARCH models
• In the second stage, the conditional likelihood is maximised with respect
to any unknown parameters in the correlation matrix
• The log-likelihood function for the second stage estimation will be of the
form

• where θ1 and θ2 denote the parameters to be estimated in the 1st and 2nd
stages, respectively.
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 81
Asymmetric Multivariate GARCH

• Asymmetric models have become very popular in empirical applications,


where the conditional variances and / or covariances are permitted to react
differently to positive and negative innovations of the same magnitude
• In the multivariate context, this is usually achieved in the Glosten et al.
(1993) framework
• Kroner and Ng (1998), for example, suggest the following extension to the
BEKK formulation (with obvious related modifications for the VECH or
diagonal VECH models):

  H_t = W′W + A′H_{t−1}A + B′ξ_{t−1}ξ_{t−1}′B + D′z_{t−1}z_{t−1}′D

where z_{t−1} is an N-dimensional column vector with elements taking the
value −ϵ_{t−1} if the corresponding element of ϵ_{t−1} is negative and zero
otherwise.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 82


An Example: Estimating a Time-Varying Hedge Ratio
for FTSE Stock Index Returns
(Brooks, Henry and Persand, 2002).
• Data comprises 3580 daily observations on the FTSE 100 stock index and
stock index futures contract spanning the period 1 January 1985 - 9 April 1999.
• Several competing models for determining the optimal hedge ratio are
constructed. Define the hedge ratio as β.
– No hedge (β = 0)
– Naïve hedge (β = 1)
– Multivariate GARCH hedges:
• Symmetric BEKK
• Asymmetric BEKK
In both cases, estimating the OHR involves forming a 1-step ahead
forecast and computing

  OHR_{t+1} | Ω_t = h_{CF,t+1} / h_{F,t+1}
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 83
OHR Results

In Sample
 | Unhedged (β = 0) | Naïve Hedge (β = 1) | Symmetric Time-Varying Hedge (β_t = h_{FC,t}/h_{F,t}) | Asymmetric Time-Varying Hedge (β_t = h_{FC,t}/h_{F,t})
Return | 0.0389 {2.3713} | −0.0003 {−0.0351} | 0.0061 {0.9562} | 0.0060 {0.9580}
Variance | 0.8286 | 0.1718 | 0.1240 | 0.1211

Out of Sample
 | Unhedged (β = 0) | Naïve Hedge (β = 1) | Symmetric Time-Varying Hedge (β_t = h_{FC,t}/h_{F,t}) | Asymmetric Time-Varying Hedge (β_t = h_{FC,t}/h_{F,t})
Return | 0.0819 {1.4958} | −0.0004 {0.0216} | 0.0120 {0.7761} | 0.0140 {0.9083}
Variance | 1.4972 | 0.1696 | 0.1186 | 0.1188
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 84
Plot of the OHR from Multivariate GARCH

Time-Varying Hedge Ratios

[Figure: time-varying hedge ratios from the symmetric and asymmetric BEKK models over the sample, fluctuating between roughly 0.65 and 1.00.]

Conclusions
- OHR is time-varying and less than 1
- M-GARCH OHR provides a better hedge, both in-sample and out-of-sample
- No role in calculating OHR for asymmetries
‘Introductory Econometrics for Finance’ © Chris Brooks 2019 85


Chapter 11

Panel Data

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 1


The Nature of Panel Data

• Panel data, also known as longitudinal data, have both time series and cross-
sectional dimensions.
• They arise when we measure the same collection of people or objects over a
period of time.
• Econometrically, the setup is
yit = α + βxit + uit
where yit is the dependent variable, α is the intercept term, β is a k × 1 vector
of parameters to be estimated on the explanatory variables, xit; t = 1, …, T;
i = 1, …, N.
• The simplest way to deal with this data would be to estimate a single, pooled
regression on all the observations together.
• But pooling the data assumes that there is no heterogeneity – i.e. the same
relationship holds for all the data.
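The pooled estimator named in the last bullet can be sketched in a few lines. A minimal illustration on hypothetical simulated data (the entity/period sizes and coefficient values are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 5, 20                         # 5 entities observed over 20 periods

# Hypothetical data-generating process: y_it = 1.0 + 2.0 * x_it + u_it
x = rng.normal(size=(N, T))
u = rng.normal(scale=0.5, size=(N, T))
y = 1.0 + 2.0 * x + u

# Pooled OLS: stack all N*T observations and run a single regression
X = np.column_stack([np.ones(N * T), x.ravel()])
beta_hat, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
print(beta_hat)                      # (alpha, beta) estimates, close to (1, 2)
```

Pooling is only valid under the homogeneity assumption in the bullet above; the fixed and random effects estimators discussed below relax it.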
The Advantages of using Panel Data

There are a number of advantages from using a full panel technique when a panel
of data is available.

• We can address a broader range of issues and tackle more complex problems
with panel data than would be possible with pure time series or pure cross-
sectional data alone.

• It is often of interest to examine how variables, or the relationships between


them, change dynamically (over time).

• By structuring the model in an appropriate way, we can remove the impact of


certain forms of omitted variables bias in regression results.



Seemingly Unrelated Regression (SUR)

• One approach to making more full use of the structure of the data would be
to use the SUR framework initially proposed by Zellner (1962). This has
been used widely in finance where the requirement is to model several
closely related variables over time.
• A SUR is so-called because the dependent variables may seem unrelated
across the equations at first sight, but a more careful consideration would
allow us to conclude that they are in fact related after all.
• Under the SUR approach, one would allow for the contemporaneous
relationships between the error terms in the equations by using a generalised
least squares (GLS) technique.
• The idea behind SUR is essentially to transform the model so that the error
terms become uncorrelated. If the correlations between the error terms in the
individual equations had been zero in the first place, then SUR on the system
of equations would have been equivalent to running separate OLS
regressions on each equation.
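As a sketch of the mechanics (not tied to any particular application), a two-equation feasible SUR can be computed with numpy: run OLS equation by equation, estimate the contemporaneous residual covariance, then apply GLS to the stacked system. All numbers are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 200

# Two equations with correlated errors (hypothetical simulated data)
x1 = rng.normal(size=T)
x2 = rng.normal(size=T)
errs = rng.multivariate_normal([0, 0], [[1.0, 0.8], [0.8, 1.0]], size=T)
y1 = 0.5 + 1.5 * x1 + errs[:, 0]
y2 = -0.2 + 0.7 * x2 + errs[:, 1]

X1 = np.column_stack([np.ones(T), x1])
X2 = np.column_stack([np.ones(T), x2])

# Step 1: equation-by-equation OLS to obtain residuals
b1, *_ = np.linalg.lstsq(X1, y1, rcond=None)
b2, *_ = np.linalg.lstsq(X2, y2, rcond=None)
resid = np.column_stack([y1 - X1 @ b1, y2 - X2 @ b2])

# Step 2: estimate the contemporaneous error covariance matrix
sigma = resid.T @ resid / T

# Step 3: feasible GLS on the stacked system (error covariance = sigma kron I_T)
X = np.block([[X1, np.zeros_like(X2)], [np.zeros_like(X1), X2]])
y = np.concatenate([y1, y2])
omega_inv = np.kron(np.linalg.inv(sigma), np.eye(T))
beta_sur = np.linalg.solve(X.T @ omega_inv @ X, X.T @ omega_inv @ y)
print(beta_sur)   # (alpha1, beta1, alpha2, beta2)
```

Because the regressors differ across the two equations, GLS exploits the 0.8 error correlation and is more efficient than equation-by-equation OLS; with identical regressors the two would coincide.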
Fixed and Random Effects Panel Estimators

• The applicability of the SUR technique is limited because it can only be


employed when the number of time series observations per cross-sectional
unit is at least as large as the total number of such units, N.

• A second problem with SUR is that the number of parameters to be estimated


in total is very large, and the variance-covariance matrix of the errors also
has to be estimated. For these reasons, the more flexible full panel data
approach is much more commonly used.

• There are two main classes of panel techniques: the fixed effects estimator
and the random effects estimator.



Fixed Effects Models

• The fixed effects model for some variable yit may be written
yit = α + βxit + μi + vit
• We can think of μi as encapsulating all of the variables that affect yit cross-
sectionally but do not vary over time – for example, the sector that a firm
operates in, a person's gender, or the country where a bank has its
headquarters, etc. Thus we would capture the heterogeneity that is
encapsulated in μi by a method that allows for different intercepts for each
cross-sectional unit.
• This model could be estimated using dummy variables, which would be
termed the least squares dummy variable (LSDV) approach.



Fixed Effects Models (Cont’d)

• The LSDV model may be written

yit = βxit + μ1D1i + μ2D2i + μ3D3i + … + μNDNi + vit

where D1i is a dummy variable that takes the value 1 for all observations on
the first entity (e.g., the first firm) in the sample and zero otherwise, D2i is a
dummy variable that takes the value 1 for all observations on the second
entity (e.g., the second firm) and zero otherwise, and so on.
• The LSDV can be seen as just a standard regression model and therefore it
can be estimated using OLS.
• Now the model given by the equation above has N+k parameters to estimate.
In order to avoid the necessity to estimate so many dummy variable
parameters, a transformation, known as the within transformation, is used to
simplify matters.
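A minimal LSDV sketch on hypothetical data – the design matrix simply carries one dummy column per entity alongside the slope variable (no separate constant, since all N dummies are included):

```python
import numpy as np

rng = np.random.default_rng(2)
N, T = 4, 30

# Hypothetical panel with entity-specific intercepts mu_i and common slope 2.0
mu = np.array([1.0, -0.5, 0.3, 2.0])
x = rng.normal(size=(N, T))
y = mu[:, None] + 2.0 * x + rng.normal(scale=0.3, size=(N, T))

# Build the LSDV design: the slope column plus one dummy column per entity
entity = np.repeat(np.arange(N), T)                         # entity index per stacked obs
dummies = (entity[:, None] == np.arange(N)).astype(float)   # N dummy columns
X = np.column_stack([x.ravel(), dummies])

coef, *_ = np.linalg.lstsq(X, y.ravel(), rcond=None)
print(coef[0])    # common slope estimate, close to 2.0
print(coef[1:])   # the N estimated fixed effects, close to mu
```

With N + k = 5 parameters this is harmless, but for large N the within transformation on the next slide avoids estimating the dummies explicitly.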
The Within Transformation

• The within transformation involves subtracting the time-mean of each entity


away from the values of the variable.
• So define ȳi = (1/T) Σt yit as the time-mean of the observations for cross-sectional
unit i, and similarly calculate the means of all of the explanatory variables.
• Then we can subtract the time-means from each variable to obtain a
regression containing demeaned variables only.
• Note that such a regression does not require an intercept term since now the
dependent variable will have zero mean by construction.
• The model containing the demeaned variables is
yit − ȳi = β(xit − x̄i) + (uit − ūi)
• We could write this as
ÿit = βẍit + üit
where the double dots above the variables denote the demeaned values.
• This model can be estimated using OLS, but we need to make a degrees of
freedom correction.
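A numerical sketch (hypothetical data) of the within transformation, including the degrees-of-freedom correction for the N estimated time-means:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T = 6, 25
mu = rng.normal(size=N)                       # entity fixed effects
x = rng.normal(size=(N, T))
y = mu[:, None] + 1.8 * x + rng.normal(scale=0.4, size=(N, T))

# Within transformation: subtract each entity's time-mean from y and x
y_dm = y - y.mean(axis=1, keepdims=True)
x_dm = x - x.mean(axis=1, keepdims=True)

# No intercept needed: the demeaned variables have zero mean by construction
beta_within = (x_dm.ravel() @ y_dm.ravel()) / (x_dm.ravel() @ x_dm.ravel())
print(beta_within)                            # close to the true slope 1.8

# Degrees of freedom: NT observations minus k slopes minus N estimated means
resid = y_dm.ravel() - beta_within * x_dm.ravel()
k = 1
s2 = (resid @ resid) / (N * T - N - k)        # corrected error variance estimate
print(s2)                                     # close to 0.4**2 = 0.16
```

The slope here coincides with what LSDV would deliver; the correction to the divisor is the adjustment flagged in the final bullet above.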
The Between Estimator

• An alternative to this demeaning would be to simply run a cross-sectional


regression on the time-averaged values of the variables, which is known as
the between estimator.
• An advantage of running the regression on average values (the between
estimator) over running it on the demeaned values (the within estimator) is
that the process of averaging is likely to reduce the effect of measurement
error in the variables on the estimation process.
• A further possibility is that instead, the first difference operator could be
applied so that the model becomes one for explaining the change in yit rather
than its level. When differences are taken, any variables that do not change
over time will again cancel out.
• Differencing and the within transformation will produce identical estimates
in situations where there are only two time periods.
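The between estimator described above, sketched on hypothetical data – average each entity over time, then run one cross-sectional regression on the N averages:

```python
import numpy as np

rng = np.random.default_rng(4)
N, T = 8, 40

# Hypothetical panel whose x has entity-varying means
x = rng.normal(loc=rng.normal(size=(N, 1)), size=(N, T))
y = 0.5 + 1.2 * x + rng.normal(scale=0.6, size=(N, T))

# Between estimator: time-average, then cross-sectional OLS on N data points
x_bar = x.mean(axis=1)
y_bar = y.mean(axis=1)
X = np.column_stack([np.ones(N), x_bar])
beta_between, *_ = np.linalg.lstsq(X, y_bar, rcond=None)
print(beta_between)    # (intercept, slope); slope close to 1.2
```

Averaging over T = 40 periods shrinks any measurement noise in the regressors, which is the advantage noted in the second bullet; the cost is that only N observations remain.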



Time Fixed Effects Models

• It is also possible to have a time-fixed effects model rather than an entity-


fixed effects model.
• We would use such a model where we think that the average value of yit
changes over time but not cross-sectionally.
• Hence with time-fixed effects, the intercepts would be allowed to vary over
time but would be assumed to be the same across entities at each given point
in time. We could write a time-fixed effects model as
yit = α + βxit + λt + vit
where λt is a time-varying intercept that captures all of the variables that
affect y and that vary over time but are constant cross-sectionally.
• An example would be where the regulatory environment or tax rate changes
part-way through a sample period. In such circumstances, this change of
environment may well influence y, but in the same way for all firms.
Time Fixed Effects Models (Cont’d)

• Time-variation in the intercept terms can be allowed for in exactly the same
way as with entity fixed effects. That is, a least squares dummy variable
model could be estimated
yit = βxit + λ1D1t + λ2D2t + … + λTDTt + vit
where D1t, for example, denotes a dummy variable that takes the value 1 for
the first time period and zero elsewhere, and so on.
• The only difference is that now, the dummy variables capture time variation
rather than cross-sectional variation. Similarly, in order to avoid estimating a
model containing all T dummies, a within transformation can be conducted to
subtract away the cross-sectional averages from each observation
• Finally, it is possible to allow for both entity fixed effects and time fixed
effects within the same model. Such a model would be termed a two-way
error component model, and the LSDV equivalent model would contain both
cross-sectional and time dummies
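The two-way within transformation can be sketched on hypothetical data by subtracting both entity and time means and adding back the grand mean:

```python
import numpy as np

rng = np.random.default_rng(5)
N, T = 10, 15
mu = rng.normal(size=(N, 1))        # entity effects
lam = rng.normal(size=(1, T))       # time effects
x = rng.normal(size=(N, T))
y = mu + lam + 0.9 * x + rng.normal(scale=0.3, size=(N, T))

# Two-way within transformation: y_it - ybar_i - ybar_t + ybar
def two_way_demean(a):
    return (a - a.mean(axis=1, keepdims=True)
              - a.mean(axis=0, keepdims=True) + a.mean())

y_dd = two_way_demean(y)
x_dd = two_way_demean(x)
beta = (x_dd.ravel() @ y_dd.ravel()) / (x_dd.ravel() @ x_dd.ravel())
print(beta)    # close to 0.9, with both effect types swept out
```

This is the transformation-based counterpart of the two-way LSDV model with both cross-sectional and time dummies.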
Investigating Banking Competition with a Fixed Effects
Model
• The UK banking sector is relatively concentrated and apparently extremely
profitable.
• It has been argued that competitive forces are not sufficiently strong and that
there are barriers to entry into the market.
• A study by Matthews, Murinde and Zhao (2007) investigates competitive
conditions in UK banking between 1980 and 2004 using the Panzar-Rosse
approach.
• The model posits that if the market is contestable, entry to and exit from the
market will be easy (even if the concentration of market share among firms is
high), so that prices will be set equal to marginal costs.
• The technique used to examine this conjecture is to derive testable
restrictions upon the firm's reduced form revenue equation.



Methodology

• The empirical investigation consists of deriving an index (the Panzar-Rosse


H-statistic) of the sum of the elasticities of revenues to factor costs (input
prices).
• If this lies between 0 and 1, we have monopolistic competition or a partially
contestable equilibrium, whereas H < 0 would imply a monopoly and H = 1
would imply perfect competition or perfect contestability.
• The key point is that if the market is characterised by perfect competition, an
increase in input prices will not affect the output of firms, while it will under
monopolistic competition. The model Matthews et al. investigate is given by
ln REVit = α0 + β1 ln PLit + β2 ln PKit + β3 ln PFit + γ1 ln RISKASSit +
γ2 ln ASSETit + γ3 ln BRit + δ1 GROWTHt + μi + vit
where REVit is the ratio of bank revenue to total assets for firm i at time t, PL
is personnel expenses to employees (the unit price of labour); PK is the ratio
of capital assets to fixed assets (the unit price of capital); and PF is the ratio
of annual interest expenses to total loanable funds (the unit price of funds).
Methodology (Cont’d)

• The model also includes several variables that capture time-varying bank-
specific effects on revenues and costs, and these are: RISKASS, the ratio of
provisions to total assets; ASSET is bank size, as measured by total assets; BR
is the ratio of the bank's number of branches to the total number of branches
for all banks.
• Finally, GROWTHt is the rate of growth of GDP, which obviously varies
over time but is constant across banks at a given point in time; μi is a bank-
specific fixed effect and vit is an idiosyncratic disturbance term. The
contestability parameter, H, is given by H = β1 + β2 + β3
• Unfortunately, the Panzar-Rosse approach is only valid when applied to a
banking market in long-run equilibrium. Hence the authors also conduct a
test for this, which centres on the regression
ln ROAit = α0′ + β1′ ln PLit + β2′ ln PKit + β3′ ln PFit + γ1′ ln RISKASSit +
γ2′ ln ASSETit + γ3′ ln BRit + δ1′ GROWTHt + μi + wit
Methodology (Cont’d)

• The explanatory variables for the equilibrium test regression are identical to
those of the contestability regression but the dependent variable is now the
log of the return on assets (lnROA).
• Equilibrium is argued to exist in the market if E = β1′ + β2′ + β3′ = 0
• Matthews et al. employ a fixed effects panel data model which allows for
differing intercepts across the banks, but assumes that these effects are fixed
over time.
• The fixed effects approach is a sensible one given the data analysed here
since there is an unusually large number of years (25) compared with the
number of banks (12), resulting in a total of 219 bank-years (observations).
• The data employed in the study are obtained from banks' annual reports and
the Annual Abstract of Banking Statistics from the British Bankers
Association. The analysis is conducted for the whole sample period, 1980-
2004, and for two sub-samples, 1980-1991 and 1992-2004.



Results from Test of Banking Market Equilibrium
by Matthews et al.

Source: Matthews, Murinde and Zhao (2007)


Analysis of Equilibrium Test Results

• The null hypothesis that the bank fixed effects are jointly zero (H0: μi = 0) is
rejected at the 1% significance level for the full sample and for the second
sub-sample but not at all for the first sub-sample.
• Overall, however, this indicates the usefulness of the fixed effects panel
model that allows for bank heterogeneity.
• The main focus of interest in the table on the previous slide is the equilibrium
test, and this shows slight evidence of disequilibrium (E is significantly
different from zero at the 10% level) for the whole sample, but not for either
of the individual sub-samples.
• Thus the conclusion is that the market appears to be sufficiently in a state of
equilibrium that it is valid to continue to investigate the extent of competition
using the Panzar-Rosse methodology. The results of this are presented on the
following slide.

‘Introductory Econometrics for Finance’ © Chris Brooks 2019 17


Results from Test of Banking Market Competition
by Matthews et al.

Source: Matthews, Murinde and Zhao (2007)


Analysis of Competition Test Results

• The value of the contestability parameter, H, which is the sum of the input
elasticities, falls in value from 0.78 in the first sub-sample to 0.46 in the
second, suggesting that the degree of competition in UK retail banking
weakened over the period.
• However, the results in the two rows above show that the null
hypotheses that H = 0 and H = 1 can both be rejected at the 1% significance
level for both sub-samples, showing that the market is best characterised by
monopolistic competition.
• As for the equilibrium regressions, the null hypothesis that the fixed effects
dummies (i) are jointly zero is strongly rejected, vindicating the use of the
fixed effects panel approach and suggesting that the base levels of the
dependent variables differ.
• Finally, the additional bank control variables all appear to have intuitively
appealing signs. For example, the risk assets variable has a positive sign, so
that higher risks lead to higher revenue per unit of total assets; the asset
variable has a negative sign, and is statistically significant at the 5% level or
below in all three periods, suggesting that smaller banks are more profitable.
The Random Effects Model

• An alternative to the fixed effects model described above is the random


effects model, which is sometimes also known as the error components
model.
• As with fixed effects, the random effects approach proposes different
intercept terms for each entity and again these intercepts are constant over
time, with the relationships between the explanatory and explained variables
assumed to be the same both cross-sectionally and temporally.
• However, the difference is that under the random effects model, the
intercepts for each cross-sectional unit are assumed to arise from a common
intercept  (which is the same for all cross-sectional units and over time),
plus a random variable i that varies cross-sectionally but is constant over
time.
• εi measures the random deviation of each entity’s intercept term from the
“global” intercept term α. We can write the random effects panel model as
yit = α + βxit + ωit,  ωit = εi + vit



How the Random Effects Model Works

• Unlike the fixed effects model, there are no dummy variables to capture the
heterogeneity (variation) in the cross-sectional dimension.
• Instead, this occurs via the εi terms.
• Note that this framework requires the assumptions that the new cross-
sectional error term, εi, has zero mean, is independent of the individual
observation error term vit, has constant variance, and is independent of the
explanatory variables.
• The parameters (α and the β vector) are estimated consistently but
inefficiently by OLS, and the conventional formulae would have to be
modified as a result of the cross-correlations between error terms for a given
cross-sectional unit at different points in time.
• Instead, a generalised least squares (GLS) procedure is usually used. The
transformation involved in this GLS procedure is to subtract a weighted mean
of the yit over time (i.e. part of the mean rather than the whole mean, as was
the case for fixed effects estimation).



Quasi-Demeaning the Data

• Define the ‘quasi-demeaned’ data as yit* = yit − θȳi, and similarly for xit
• θ will be a function of the variance of the observation error term, σv², and of
the variance of the entity-specific error term, σε²:
θ = 1 − σv / √(Tσε² + σv²)
• This transformation will be precisely that required to ensure that there are no
cross-correlations in the error terms, but fortunately it should automatically
be implemented by standard software packages.
• Just as for the fixed effects model, with random effects, it is also
conceptually no more difficult to allow for time variation than it is to allow
for cross-sectional variation.
• In the case of time-variation, a time period-specific error term is included and
again, a two-way model could be envisaged to allow the intercepts to vary
both cross-sectionally and over time.
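A sketch of the quasi-demeaning GLS step on hypothetical data. For simplicity the true variance components are plugged into θ, whereas a feasible version would first estimate σε² and σv²:

```python
import numpy as np

rng = np.random.default_rng(6)
N, T = 50, 10
sigma_eps, sigma_v = 1.0, 0.5

eps = rng.normal(scale=sigma_eps, size=(N, 1))   # entity random effects
m = rng.normal(size=(N, 1))                      # entity x means, independent of eps
x = m + rng.normal(size=(N, T))
y = 0.3 + 1.1 * x + eps + rng.normal(scale=sigma_v, size=(N, T))

# Quasi-demeaning weight: theta = 1 - sigma_v / sqrt(T*sigma_eps^2 + sigma_v^2)
theta = 1 - sigma_v / np.sqrt(T * sigma_eps**2 + sigma_v**2)

# Subtract theta times the entity time-means (part of the mean, not all of it)
y_q = y - theta * y.mean(axis=1, keepdims=True)
x_q = x - theta * x.mean(axis=1, keepdims=True)

# GLS = OLS on the quasi-demeaned data; the constant is scaled by (1 - theta)
X = np.column_stack([np.full(N * T, 1 - theta), x_q.ravel()])
beta_re, *_ = np.linalg.lstsq(X, y_q.ravel(), rcond=None)
print(beta_re)     # (alpha, beta) estimates, close to (0.3, 1.1)
```

Here x is generated independently of the random effect εi, which is exactly the validity condition for random effects discussed on the next slides.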
Fixed or Random Effects?

• It is often said that the random effects model is more appropriate when the
entities in the sample can be thought of as having been randomly selected
from the population, but a fixed effect model is more plausible when the
entities in the sample effectively constitute the entire population.

• More technically, the transformation involved in the GLS procedure under


the random effects approach will not remove the explanatory variables that
do not vary over time, and hence their impact can be enumerated.

• Also, since there are fewer parameters to be estimated with the random
effects model (no dummy variables or within transform to perform), and
therefore degrees of freedom are saved, the random effects model should
produce more efficient estimation than the fixed effects approach.

• However, the random effects approach has a major drawback which arises
from the fact that it is valid only when the composite error term ωit is
uncorrelated with all of the explanatory variables.
Fixed or Random Effects? (Cont’d)

• This assumption is more stringent than the corresponding one in the fixed
effects case, because with random effects we thus require both εi and vit to be
independent of all of the xit.
• This can also be viewed as a consideration of whether any unobserved
omitted variables (that were allowed for by having different intercepts for
each entity) are uncorrelated with the included explanatory variables. If they
are uncorrelated, a random effects approach can be used; otherwise the fixed
effects model is preferable.
• A test for whether this assumption is valid for the random effects estimator is
based on a slightly more complex version of the Hausman test.
• If the assumption does not hold, the parameter estimates will be biased and
inconsistent.
• To see how this arises, suppose that we have only one explanatory variable,
x2it, that varies positively with yit and also with the error term, ωit. The
estimator will ascribe all of any increase in y to x when in reality some of it
arises from the error term, resulting in biased coefficients.
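The Hausman comparison can be sketched directly from the two sets of estimates. The coefficient vectors and covariance matrices below are hypothetical stand-ins for fixed effects and random effects output:

```python
import math
import numpy as np

# Hypothetical estimates for k = 2 slope coefficients
b_fe = np.array([1.10, -0.42])          # fixed effects (consistent either way)
b_re = np.array([1.25, -0.30])          # random effects (efficient if valid)
V_fe = np.array([[0.020, 0.002], [0.002, 0.015]])
V_re = np.array([[0.012, 0.001], [0.001, 0.009]])

# H = (b_FE - b_RE)' [V_FE - V_RE]^{-1} (b_FE - b_RE) ~ chi2(k) under H0
d = b_fe - b_re
H = d @ np.linalg.solve(V_fe - V_re, d)

# For k = 2 the chi-square survival function is simply exp(-H/2)
p_value = math.exp(-H / 2)
print(H, p_value)   # a large H (small p) rejects random effects in favour of FE
```

With these illustrative numbers the statistic is about 4.56 (p ≈ 0.10), so the random effects restriction would not be rejected at the 5% level.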



Credit Stability of Banks in Central and Eastern
Europe: A Random Effects Analysis
• Foreign participants in the banking sector may improve competition and
efficiency to the benefit of the economy that they enter.
• They may have a stabilising effect on credit provision since they will
probably be better diversified than domestic banks and will therefore be more
able to continue to lend when the host economy is performing poorly.
• But on the other hand, it is also argued that foreign banks may alter the credit
supply to suit their own aims rather than that of the host economy.
• They may act more pro-cyclically than local banks, since they have
alternative markets to withdraw their credit supply to when host market
activity falls.
• Moreover, worsening conditions in the home country may force the
repatriation of funds to support a weakened parent bank.



The Data

• There may also be differences in policies for credit provision dependent upon
the nature of the formation of the subsidiary abroad – i.e. whether the
subsidiary's existence results from a take-over of a domestic bank or from the
formation of an entirely new startup operation (a ‘greenfield investment’).

• A study by de Haas and van Lelyveld (2006) employs a panel regression


using a sample of around 250 banks from ten Central and East European
countries.

• They examine whether domestic and foreign banks react differently to


changes in home or host economic activity and banking crises.

• The data cover the period 1993-2000 and are obtained from BankScope.



The Model

• The core model is a random effects panel regression of the form:


grit = α + β1Takeoverit + β2Greenfieldi + β3Crisisit + β4Macroit +
β5Contrit + (εi + υit)
where the dependent variable, grit, is the percentage growth in the credit of
bank i in year t; Takeover is a dummy variable taking the value one for
foreign banks resulting from a takeover and zero otherwise; Greenfield is a
dummy taking the value one if bank i is the result of a foreign firm making a
new banking investment rather than taking over an existing one; Crisis is a
dummy variable taking the value one if the host country for bank i was
subject to a banking disaster in year t.
• Macro is a vector of variables capturing the macroeconomic conditions in the
home country (the lending rate and the change in GDP for the home and host
countries, the host country inflation rate, and the differences in the home and
host country GDP growth rates and the differences in the home and host
country lending rates).



The Model (Cont’d)

• Contr is a vector of bank-specific control variables that may affect the


dependent variable irrespective of whether it is a foreign or domestic bank.

• These are: weakness parent bank, defined as loan loss provisions made by
the parent bank; solvency is the ratio of equity to total assets; liquidity is the
ratio of liquid assets / total assets; size is the ratio of total bank assets to total
banking assets in the given country; profitability is return on assets and
efficiency is net interest margin.

• α and the β’s are parameters (or vectors of parameters in the cases of β4 and
β5), εi is the unobserved random effect that varies across banks but not over
time, and υit is an idiosyncratic error term.



Estimation Options

• de Haas and van Lelyveld discuss the various techniques that could be
employed to estimate such a model.
• OLS is considered to be inappropriate since it does not allow for differences
in average credit market growth rates at the bank level.
• A model allowing for entity-specific effects (i.e. a fixed effects model that
effectively allowed for a different intercept for each bank) is ruled out on the
grounds that there are many more banks than time periods and thus too many
parameters would be required to be estimated.
• They also argue that these bank-specific effects are not of interest to the
problem at hand, which leads them to select the random effects panel model.
• This essentially allows for a different error structure for each bank. A
Hausman test is conducted, and shows that the random effects model is valid
since the bank-specific effects εi are found “in most cases not to be
significantly correlated with the explanatory variables.”



Results

Source: de Haas and van Lelyveld (2006)



Analysis of Results

• The main result is that during times of banking disasters, domestic banks
significantly reduce their credit growth rates (i.e. the parameter estimate on
the crisis variable is negative for domestic banks), while the parameter is
close to zero and not significant for foreign banks.

• There is a significant negative relationship between home country GDP
growth and credit change in the host country, but a positive relationship
between host country GDP growth and credit change in the host country.

• This indicates that, as the authors expected, when foreign banks have fewer
viable lending opportunities in their own countries and hence a lower
opportunity cost for the loanable funds, they may switch their resources to
the host country.

• Lending rates, both at home and in the host country, have little impact on
credit market share growth.
Analysis of Results (Cont’d)

• Interestingly, the greenfield and takeover variables are not statistically


significant (although the parameters are quite large in absolute value),
indicating that the method of investment of a foreign bank in the host country
is unimportant in determining its credit growth rate or that the importance of
the method of investment varies widely across the sample leading to large
standard errors.

• A weaker parent bank (with higher loss provisions) leads to a statistically


significant contraction of credit in the host country as a result of the reduction
in the supply of available funds.

• Overall, both home-related (‘push’) and host-related (‘pull’) factors are found
to be important in explaining foreign bank credit growth.



Panel Unit Root and Cointegration Tests - Background

• Dickey-Fuller and Phillips-Perron unit root tests have low power, especially
for modest sample sizes
• It is believed that more powerful versions of the tests can be employed when
time-series and cross-sectional information is combined – as a result of the
increase in sample size
• We could increase the number of observations by increasing the sample
period, but this data may not be available, or may have structural breaks
• A valid application of the test statistics is much more complex for panels than
single series
• Two important issues to consider:
– The design and interpretation of the null and alternative hypotheses needs careful
thought
– There may be a problem of cross-sectional dependence in the errors across the
unit root testing regressions
• Early studies that assumed cross-sectional independence are sometimes
known as ‘first generation’ panel unit root tests
The MADF Test

• Rather than running a separate regression for each series, we could use
Zellner’s seemingly unrelated regression (SUR) approach to estimate them
jointly – what we might term the multivariate ADF (MADF) test
• This method can only be employed if T >> N, and Taylor and Sarno (1998)
provide an early application to tests for purchasing power parity
• However, that technique is now rarely used, researchers preferring instead to
make use of the full panel structure
• A key consideration is the dimensions of the panel – is the situation that T is
large or that N is large or both? If T is large and N small, the MADF
approach can be used
• But as Breitung and Pesaran (2008) note, in such a situation one may
question whether it is worthwhile to adopt a panel approach at all, since for
sufficiently large T, separate ADF tests ought to be reliable enough to render
the panel approach hardly worth the additional complexity.



The LLC Test

• Levin, Lin and Chu (2002) – LLC – develop a test based on the equation
Δyit = αi + θt + δi t + ρyi,t−1 + Σj φij Δyi,t−j + uit
• The model is very general since it allows for both entity-specific and time-
specific effects through αi and θt respectively, as well as separate
deterministic trends in each series through δit, and the lag structure mops
up autocorrelation in Δy

• Of course, as for the Dickey-Fuller tests, any or all of these deterministic


terms can be omitted from the regression

• The null hypothesis is H0: ρi ≡ ρ = 0 ∀ i and the alternative is H1: ρ < 0 ∀ i.


So the autoregressive dynamics are the same for all series under the
alternative.
The LLC Test and Nuisance Parameters

• One of the reasons that unit root testing is more complex in the panel
framework in practice is due to the plethora of ‘nuisance parameters’ in the
equation which are necessary to allow for the fixed effects (i.e. αi, θt, δit)

• These nuisance parameters will affect the asymptotic distribution of the test
statistics and hence LLC propose that two auxiliary regressions are run to
remove their impacts

• The resulting test statistic is asymptotically distributed as a standard normal


variate (as both T and N tend to infinity)

• Breitung (2000) develops a modified version of the LLC test which does
not include the deterministic terms and which standardises the residuals
from the auxiliary regression in a more sophisticated fashion.



The LLC Test – How to Interpret the Results

• Under the LLC and Breitung approaches, only evidence against the non-
stationary null in one series is required before the joint null will be rejected

• Breitung and Pesaran (2008) suggest that the appropriate conclusion when
the null is rejected is that ‘a significant proportion of the cross-sectional
units are stationary’

• Especially in the context of large N, this might not be very helpful since no
information is provided on how many of the N series are stationary

• Often, the homogeneity assumption is not economically meaningful either,


since there is no theory suggesting that all of the series have the same
autoregressive dynamics and thus the same value of ρ



Panel Unit Root Tests with Heterogeneous Processes

• This difficulty led Im, Pesaran and Shin (2003) – IPS – to propose an
alternative approach where the null and alternative hypotheses are now H0:
ρi = 0 ∀ i and H1: ρi < 0, i = 1, 2, . . . , N1; ρi = 0, i = N1 + 1, N1 + 2, . . . , N
• So the null hypothesis still specifies all series in the panel as nonstationary,
but under the alternative, a proportion of the series (N1/N) are stationary,
and the remaining proportion ((N − N1)/N) are nonstationary
• No restriction where all of the ρ are identical is imposed
• The statistic in this case is constructed by conducting separate unit root
tests for each series in the panel, calculating the ADF t-statistic for each one
in the standard fashion, and then taking their cross-sectional average
• This average is then transformed into a standard normal variate under the
null hypothesis of a unit root in all the series
• While IPS’s heterogeneous panel unit root tests are superior to the
homogeneous case when N is modest relative to T, they may not be
sufficiently powerful when N is large and T is small, in which case the LLC
approach may be preferable.
The Maddala and Wu (1999) and Choi (2001) Tests

• Maddala and Wu (1999) and Choi (2001) developed a slight variant on the
IPS approach based on an idea dating back to Fisher (1932)
• Unit root tests are conducted separately on each series in the panel and the
p-values associated with the test statistics are then combined
• If we call these p-values pvi, i = 1, 2, . . . , N, then under the null
hypothesis of a unit root in each series, pvi will be distributed uniformly
over the [0,1] interval and hence the following will hold for given N as
T → ∞

λ = −2 Σi ln(pvi) ∼ χ²(2N)

• Note that the cross-sectional independence assumption is crucial


• The p-values for use in the test must be obtained from Monte Carlo
simulations.



Allowing for Cross-Sectional Heterogeneity

• The assumption of cross-sectional independence of the error terms in the
panel regression is highly unrealistic
• For example, in the context of testing for whether purchasing power parity
holds, there are likely to be important unspecified factors that affect all
exchange rates or groups of exchange rates in the sample, and will result in
correlated residuals.
• O’Connell (1998) demonstrates the considerable size distortions that can
arise when such cross-sectional dependencies are present
• We can adjust the critical values employed but the power of the tests will
fall such that in extreme cases the benefit of using a panel structure could
disappear completely
• O’Connell proposes a feasible GLS estimator for ρ where an assumed form
for the correlations between the disturbances is employed
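The flavour of O'Connell's GLS correction can be sketched with an assumed form for the cross-sectional correlations. Here a single common correlation omega between all pairs of units is imposed purely for illustration (it is an assumption, not an estimate), and the data are pre-multiplied by the inverse Cholesky factor of the implied covariance matrix so that the transformed errors are cross-sectionally (near) uncorrelated.

```python
# GLS-style whitening of a cross-sectionally correlated panel, assuming
# an equicorrelation structure for the disturbances.
import numpy as np

rng = np.random.default_rng(1)
N, T = 5, 100
omega = 0.6     # assumed common cross-sectional correlation (illustrative)

# Implied error covariance across units: 1 on the diagonal, omega elsewhere
Omega = np.full((N, N), omega) + (1.0 - omega) * np.eye(N)

# Simulate cross-sectionally correlated innovations and build random walks
L = np.linalg.cholesky(Omega)
innovations = L @ rng.standard_normal((N, T))   # correlated across units
panel = np.cumsum(innovations, axis=1)

# GLS transform: pre-multiplying by inv(L) whitens the cross-section
P = np.linalg.inv(L)
panel_gls = P @ panel

# The innovations of the transformed panel are (near) uncorrelated
corr_before = np.corrcoef(np.diff(panel, axis=1))
corr_after = np.corrcoef(np.diff(panel_gls, axis=1))
print("max off-diagonal |corr| before:",
      np.abs(corr_before - np.eye(N)).max().round(3))
print("max off-diagonal |corr| after:",
      np.abs(corr_after - np.eye(N)).max().round(3))
```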



Allowing for Cross-Sectional Heterogeneity 2

• To overcome the limitation that the correlation matrix must be specified
(and this may be troublesome because it is not clear what form it should
take), Bai and Ng (2004) propose to separate the data into a common factor
component that is highly correlated across the series and a specific part
that is idiosyncratic
• A further approach is to proceed with OLS but to employ modified standard
errors – so-called ‘panel corrected standard errors’ (PCSEs) – see, for
example Breitung and Das (2005)
• Overall, however, it is clear that satisfactorily dealing with
cross-sectional dependence makes an already complex issue considerably
harder still
• In the presence of such dependencies, the test statistics are affected in a
non-trivial way by the nuisance parameters.
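The factor decomposition proposed by Bai and Ng can be sketched with a one-factor principal components example. This is only the first step of their procedure (the full PANIC approach then tests the factor and idiosyncratic parts separately); the data, the single-factor structure and the use of plain SVD-based principal components are all simplifying assumptions made here.

```python
# One-factor sketch of the Bai-Ng decomposition: extract a common factor
# from the differenced panel by principal components and keep the
# idiosyncratic remainder for unit root testing.
import numpy as np

rng = np.random.default_rng(7)
N, T = 20, 200

# Common nonstationary factor driving all series, plus idiosyncratic noise
factor = np.cumsum(rng.standard_normal(T))
loadings = rng.uniform(0.5, 1.5, size=N)
idiosyncratic = np.cumsum(rng.standard_normal((N, T)), axis=1)
panel = loadings[:, None] * factor[None, :] + idiosyncratic

# Principal components on the first-differenced panel ((T-1) x N matrix)
dX = np.diff(panel, axis=1).T
dX_c = dX - dX.mean(axis=0)
U, s, Vt = np.linalg.svd(dX_c, full_matrices=False)

# Estimated differenced factor = first PC; cumulate to recover its level
pc1 = U[:, 0] * s[0]
f_hat = np.cumsum(pc1)

# Regress each differenced series on the estimated factor; the residual is
# the estimated idiosyncratic component
lam_hat = dX_c.T @ pc1 / (s[0] ** 2)
idio_hat = dX_c - np.outer(pc1, lam_hat)
print("estimated loadings correlate with truth:",
      round(abs(np.corrcoef(lam_hat, loadings)[0, 1]), 2))
```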



Panel Cointegration Tests

• Testing for cointegration in panels is complex since one must consider the
possibility of cointegration across groups of variables (what we might term
‘cross-sectional cointegration’) as well as within the groups
• Most of the work so far has relied upon a generalisation of the single
equation methods of the Engle-Granger type following the pioneering work
by Pedroni (1999, 2004)
• For a dependent variable yit and M regressors xm,i,t, m = 1, 2, . . . , M,
that are individually integrated of order one and thought to be
cointegrated, the model is

yit = αi + δit + β1ix1,i,t + β2ix2,i,t + · · · + βMixM,i,t + uit

• The residuals from this regression are then subjected to separate Dickey-
Fuller or augmented Dickey-Fuller type regressions for each group
• The null hypothesis is that the residuals from all of the test regressions are
unit root processes and therefore that there is no cointegration.



The Pedroni Approach to Panel Cointegration

• Pedroni proposes two possible alternative hypotheses:
– All of the autoregressive dynamics are the same stationary process
– The dynamics from each test equation follow a different stationary process
• In the first case no heterogeneity is permitted, while in the second it is –
analogous to the difference between LLC and IPS as described above
• Pedroni then constructs a raft of different test statistics
• These standardised test statistics are asymptotically standard normally
distributed
• It is also possible to use a generalisation of the Johansen technique
• We could employ the Johansen approach on each group of series
separately, collect the p-values for the trace test and then take −2 times the
sum of their logs following Maddala and Wu (1999) above
• A full systems approach based on a ‘global VAR’ is possible but with
considerable additional complexity – see Breitung and Pesaran (2008).
Panel Unit Root Example: The Link between Financial
Development and GDP Growth

• To what extent are economic growth and the sophistication of the country’s
financial markets linked?
• Excessive government regulations may impede the development of the
financial markets and consequently economic growth will be slower
• On the other hand, if economic agents are able to borrow at reasonable rates
of interest or raise funding easily on the capital markets, this can increase
the viability of real investment opportunities
• Given that long time-series are typically unavailable for developing
economies, traditional unit root and cointegration tests that examine the link
between these two variables suffer from low power
• This provides a strong motivation for the use of panel techniques, as in the
study by Christopoulos and Tsionas (2004)



Panel Unit Root Example: The Link between Financial
Development and GDP Growth 2

• Defining real output for country i as yit, financial ‘depth’ as F, the
proportion of total output that is investment as S, and the rate of
inflation as π, the core model is

yit = β0i + β1iFit + β2iSit + β3iπit + uit

• Financial depth, F, is proxied by the ratio of total bank liabilities to GDP


• Data are from the IMF’s International Financial Statistics for ten countries
(Colombia, Paraguay, Peru, Mexico, Ecuador, Honduras, Kenya, Thailand,
the Dominican Republic and Jamaica) over the period 1970-2000
• The panel unit root tests of Im, Pesaran and Shin, and the Maddala-Wu chi-
squared test are employed separately for each variable, but using a panel
comprising all ten countries
• The number of lags of Δyit is determined using AIC
• The null hypothesis in all cases is that the process is a unit root.



Panel Unit Root Example: Results

• The results are much stronger than for the single-series unit root tests
and show conclusively that all four series are non-stationary in levels but
stationary in differences.



Panel Cointegration Test: Example

• The LLC approach is used along with the Harris-Tzavalis technique, which
is broadly the same as LLC but has slightly different correction factors in
the limiting distribution
• These techniques are based on a unit root test on the residuals from the
potentially cointegrating regression
• Christopoulos and Tsionas investigate the use of panel cointegration tests
with fixed effects, and with both fixed effects and a deterministic trend in
the test regressions
• These are applied to the regressions both with y, and separately F, as the
dependent variables
• The results quite strongly demonstrate that when the dependent variable is
output, the LLC approach rejects the null hypothesis of a unit root in the
potentially cointegrating regression residuals when fixed effects only are
included in the test regression, but not when a trend is also included.



Panel Cointegration Test: Findings

• In the context of the Harris-Tzavalis variant of the residuals-based test,
for both the fixed effects and the fixed effects + trend regressions, the
null is rejected
• When financial depth is instead used as the dependent variable, none of
these tests reject the null hypothesis
• Thus, the weight of evidence from the residuals-based tests is that
cointegration exists when output is the dependent variable, but it does not
when financial depth is
• In the final row of the table, a systems approach to testing for cointegration
based on the sum of the logs of the p-values from the Johansen test shows
that the null hypothesis of no cointegrating vectors is rejected
• The conclusion is that one cointegrating relationship exists between the
four variables across the panel.



Panel Cointegration Test: Table of Results

