3. $E[(y_{t_1} - \mu)(y_{t_2} - \mu)] = \gamma_{t_2 - t_1}\quad \forall\ t_1, t_2$
Univariate Time-series Models (cont’d)
• So if the process is covariance stationary, all the variances are the same and all
the covariances depend on the difference between t1 and t2. The moments
$E[(y_t - E(y_t))(y_{t-s} - E(y_{t-s}))] = \gamma_s,\ s = 0, 1, 2, \ldots$
are known as the covariance function.
• The covariances, $\gamma_s$, are known as autocovariances.
• We can also test the joint hypothesis that all m of the $\tau_k$ correlation coefficients are simultaneously equal to zero using the Q-statistic developed by Box and Pierce:
$Q = T\sum_{k=1}^{m}\hat{\tau}_k^2$
where T = sample size, m = maximum lag length
• The Q-statistic is asymptotically distributed as a $\chi_m^2$.
• However, the Box–Pierce test has poor small sample properties, so a variant has been developed, called the Ljung–Box statistic:
$Q^* = T(T+2)\sum_{k=1}^{m}\frac{\hat{\tau}_k^2}{T-k} \sim \chi_m^2$
• This statistic is very useful as a portmanteau (general) test of linear dependence
in time-series.
An ACF Example
• Question:
Suppose that a researcher had estimated the first 5 autocorrelation coefficients
using a series of length 100 observations, and found them to be (from 1 to 5):
0.207, -0.013, 0.086, 0.005, -0.022.
Test each of the individual coefficients for significance, and use both the Box-Pierce and Ljung-Box tests to establish whether they are jointly significant.
• Solution:
A coefficient would be significant if it lies outside (-0.196,+0.196) at the 5%
level, so only the first autocorrelation coefficient is significant.
Q=5.09 and Q*=5.26
Compared with a tabulated $\chi^2(5) = 11.1$ at the 5% level, so the 5 coefficients are jointly insignificant.
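As a quick check of these numbers, the Q and Q* statistics can be reproduced directly from the five autocorrelation coefficients. The sketch below (Python, assuming numpy and scipy are available) is illustrative only:

```python
import numpy as np
from scipy.stats import chi2

T = 100
tau = np.array([0.207, -0.013, 0.086, 0.005, -0.022])
k = np.arange(1, len(tau) + 1)

bound = 1.96 / np.sqrt(T)                        # +/- 0.196 significance bound
q_bp = T * np.sum(tau ** 2)                      # Box-Pierce: ~5.09
q_lb = T * (T + 2) * np.sum(tau ** 2 / (T - k))  # Ljung-Box: ~5.26
print(bound, q_bp, q_lb, chi2.ppf(0.95, len(tau)))  # 5% critical value ~11.07
```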
Var(Xt) = E[Xt − E(Xt)][Xt − E(Xt)]
but E(Xt) = 0, so
Var(Xt) = E[(Xt)(Xt)]
$= E[(u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2})(u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2})]$
$= E[u_t^2 + \theta_1^2 u_{t-1}^2 + \theta_2^2 u_{t-2}^2 + \text{cross-products}]$
So Var(Xt) $= \gamma_0 = E[u_t^2 + \theta_1^2 u_{t-1}^2 + \theta_2^2 u_{t-2}^2]$
$= \sigma^2 + \theta_1^2\sigma^2 + \theta_2^2\sigma^2$
$= (1 + \theta_1^2 + \theta_2^2)\sigma^2$
(The first autocovariance, obtained in the same way, is $\gamma_1 = (\theta_1 + \theta_1\theta_2)\sigma^2$.)
$\gamma_2 = E[X_t - E(X_t)][X_{t-2} - E(X_{t-2})]$
$= E[X_t X_{t-2}]$
$= E[(u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2})(u_{t-2} + \theta_1 u_{t-3} + \theta_2 u_{t-4})]$
$= E[\theta_2 u_{t-2}^2]$
$= \theta_2\sigma^2$
$\gamma_3 = E[X_t - E(X_t)][X_{t-3} - E(X_{t-3})]$
$= E[X_t X_{t-3}]$
$= E[(u_t + \theta_1 u_{t-1} + \theta_2 u_{t-2})(u_{t-3} + \theta_1 u_{t-4} + \theta_2 u_{t-5})]$
$= 0$
So $\gamma_s = 0$ for s > 2.
(iii) For $\theta_1 = -0.5$ and $\theta_2 = 0.25$, substituting these into the formulae above gives $\tau_1 = -0.476$, $\tau_2 = 0.190$.
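The values quoted in part (iii) can be verified numerically; the following sketch (Python/numpy, illustrative only) evaluates the formulae above and cross-checks them against sample autocorrelations from a long simulated MA(2) series:

```python
import numpy as np

theta1, theta2 = -0.5, 0.25
gamma0 = 1 + theta1**2 + theta2**2          # in units of sigma^2
gamma1 = theta1 + theta1 * theta2
gamma2 = theta2
print(gamma1 / gamma0, gamma2 / gamma0)     # approx -0.476 and 0.190

rng = np.random.default_rng(0)
u = rng.standard_normal(200_000)
x = u[2:] + theta1 * u[1:-1] + theta2 * u[:-2]
for s in (1, 2, 3):
    print(s, np.corrcoef(x[s:], x[:-s])[0, 1])  # tau_3 should be near zero
```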
ACF Plot
[Figure: acf plot for the MA(2) example, showing the autocorrelation coefficients at lags 0–6.]
• or $y_t = \mu + \sum_{i=1}^{p}\phi_i L^i y_t + u_t$
• This states that any stationary series can be decomposed into the sum of two unrelated processes, a purely deterministic part and a purely stochastic part, which will be an MA($\infty$),
where
$\psi(L) = (1 - \phi_1 L - \phi_2 L^2 - \ldots - \phi_p L^p)^{-1}$
Var(yt) = E[yt − E(yt)][yt − E(yt)]
but E(yt) = 0, since we are setting the constant term to zero.
Var(yt) = E[(yt)(yt)]
$= E[(u_t + \phi_1 u_{t-1} + \phi_1^2 u_{t-2} + \ldots)(u_t + \phi_1 u_{t-1} + \phi_1^2 u_{t-2} + \ldots)]$
$= E[u_t^2 + \phi_1^2 u_{t-1}^2 + \phi_1^4 u_{t-2}^2 + \ldots + \text{cross-products}]$
$= \sigma_u^2(1 + \phi_1^2 + \phi_1^4 + \ldots)$
$= \frac{\sigma_u^2}{(1 - \phi_1^2)}$
Solution (cont’d)
(iii) Turning now to calculating the acf, first calculate the autocovariances:
$\gamma_1 = \text{Cov}(y_t, y_{t-1}) = E[y_t - E(y_t)][y_{t-1} - E(y_{t-1})]$
Since the constant term has been set to zero, E(yt) = 0 and E(yt−1) = 0, so
$\gamma_1 = E[y_t y_{t-1}]$
$\gamma_1 = E[(u_t + \phi_1 u_{t-1} + \phi_1^2 u_{t-2} + \ldots)(u_{t-1} + \phi_1 u_{t-2} + \phi_1^2 u_{t-3} + \ldots)]$
$= E[\phi_1 u_{t-1}^2 + \phi_1^3 u_{t-2}^2 + \ldots + \text{cross-products}]$
$= \phi_1\sigma^2 + \phi_1^3\sigma^2 + \phi_1^5\sigma^2 + \ldots$
$= \frac{\phi_1\sigma^2}{(1 - \phi_1^2)}$
Similarly,
$\gamma_2 = \frac{\phi_1^2\sigma^2}{(1 - \phi_1^2)}$
• If these steps were repeated for $\gamma_3$, the following expression would be obtained
$\gamma_3 = \frac{\phi_1^3\sigma^2}{(1 - \phi_1^2)}$
and for any lag s, the autocovariance would be given by
$\gamma_s = \frac{\phi_1^s\sigma^2}{(1 - \phi_1^2)}$
The acf can now be obtained by dividing the covariances by the variance:
$\tau_0 = \frac{\gamma_0}{\gamma_0} = 1$
$\tau_1 = \frac{\gamma_1}{\gamma_0} = \frac{\phi_1\sigma^2/(1 - \phi_1^2)}{\sigma^2/(1 - \phi_1^2)} = \phi_1$
$\tau_2 = \frac{\gamma_2}{\gamma_0} = \frac{\phi_1^2\sigma^2/(1 - \phi_1^2)}{\sigma^2/(1 - \phi_1^2)} = \phi_1^2$
$\tau_3 = \phi_1^3$
…
$\tau_s = \phi_1^s$
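A short simulation makes the geometric decay of the AR(1) acf concrete. This is a minimal sketch (Python/numpy, with an assumed value of phi1 = 0.5), not part of the original example:

```python
import numpy as np

phi1 = 0.5
rng = np.random.default_rng(1)
T = 100_000
y = np.zeros(T)
for t in range(1, T):                        # y_t = phi1 * y_{t-1} + u_t
    y[t] = phi1 * y[t - 1] + rng.standard_normal()

print(y.var(), 1 / (1 - phi1**2))            # variance ~ sigma^2 / (1 - phi1^2)
for s in (1, 2, 3):
    print(s, np.corrcoef(y[s:], y[:-s])[0, 1], phi1**s)   # sample acf ~ phi1^s
```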
The Partial Autocorrelation Function (denoted $\tau_{kk}$)
• So $\tau_{kk}$ measures the correlation between yt and yt−k after removing the effects of yt−k+1, yt−k+2, …, yt−1.
• The pacf is useful for telling the difference between an AR process and an
ARMA process.
• In the case of an AR(p), there are direct connections between yt and yt−s only for $s \le p$.
• In the case of an MA(q), this can be written as an AR($\infty$), so there are direct connections between yt and all its previous values.
where $\phi(L) = 1 - \phi_1 L - \phi_2 L^2 - \ldots - \phi_p L^p$
or $y_t = \phi_1 y_{t-1} + \phi_2 y_{t-2} + \ldots + \phi_p y_{t-p} + \theta_1 u_{t-1} + \theta_2 u_{t-2} + \ldots + \theta_q u_{t-q} + u_t$
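A sketch of how the acf and pacf are inspected in practice, assuming Python with statsmodels; for an AR(p) the pacf should cut off after lag p while the acf decays geometrically (the reverse pattern holds for an MA(q)):

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

rng = np.random.default_rng(2)
y = np.zeros(5000)
for t in range(1, 5000):
    y[t] = 0.6 * y[t - 1] + rng.standard_normal()   # AR(1) with phi = 0.6

print(acf(y, nlags=5))     # geometric decay: roughly 0.6, 0.36, 0.22, ...
print(pacf(y, nlags=5))    # spike at lag 1, approximately zero thereafter
```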
[Figures: sample acf and pacf plots (lags 1–10) for several simulated ARMA processes, illustrating the patterns used to distinguish AR, MA and ARMA models.]
• Box and Jenkins (1970) were the first to approach the task of estimating an
ARMA model in a systematic manner. There are 3 steps to their approach:
1. Identification
2. Estimation
3. Model diagnostic checking
Step 1:
- Involves determining the order of the model.
- Use of graphical procedures
- A better procedure is now available
Step 2:
- Estimation of the parameters
- Can be done using least squares or maximum likelihood depending
on the model.
Step 3:
- Model checking
• Reasons:
- variance of estimators is inversely proportional to the number of degrees of
freedom.
- models which are profligate (over-parameterised) might be inclined to fit to sample-specific features of the data
• This gives motivation for using information criteria, which embody 2 factors
- a term which is a function of the RSS
- some penalty for adding extra parameters
• The object is to choose the number of parameters which minimises the
information criterion.
Information Criteria for Model Selection
• The information criteria vary according to how stiff the penalty term is.
• The three most popular criteria are Akaike’s (1974) information criterion
(AIC), Schwarz’s (1978) Bayesian information criterion (SBIC), and the
Hannan-Quinn criterion (HQIC).
$\text{AIC} = \ln(\hat{\sigma}^2) + \frac{2k}{T}$
$\text{SBIC} = \ln(\hat{\sigma}^2) + \frac{k}{T}\ln T$
$\text{HQIC} = \ln(\hat{\sigma}^2) + \frac{2k}{T}\ln(\ln(T))$
where k = p + q + 1, T = sample size. So we min. IC s.t. $p \le \bar{p}$, $q \le \bar{q}$
SBIC embodies a stiffer penalty term than AIC.
• Which IC should be preferred if they suggest different model orders?
– SBIC is strongly consistent (but inefficient).
– AIC is not consistent, and will typically pick “bigger” models.
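A minimal sketch of how the criteria defined above could be computed over a small grid of ARMA orders, assuming Python with statsmodels and a return series y already loaded (the function name and the grid are illustrative, not from the original):

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

def info_criteria(y, p, q):
    """AIC/SBIC/HQIC as defined above: ln(sigma_hat^2) + penalty, with k = p + q + 1."""
    res = ARIMA(y, order=(p, 0, q)).fit()
    T, k = len(y), p + q + 1
    sigma2_hat = np.mean(res.resid ** 2)
    aic = np.log(sigma2_hat) + 2 * k / T
    sbic = np.log(sigma2_hat) + k * np.log(T) / T
    hqic = np.log(sigma2_hat) + 2 * k * np.log(np.log(T)) / T
    return aic, sbic, hqic

# Choose the (p, q) combination that minimises the preferred criterion, e.g. SBIC:
# best = min(((p, q) for p in range(4) for q in range(4)),
#            key=lambda pq: info_criteria(y, *pq)[1])
```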
ARIMA Models
• Since the weights lie between zero and one, the effect of each observation declines exponentially as we move another observation forward in time.
• Forecasting = prediction.
• An important test of the adequacy of a model.
e.g.
- Forecasting tomorrow’s return on a particular share
- Forecasting the price of a house given its characteristics
- Forecasting the riskiness of a portfolio over the next year
- Forecasting the volatility of bond returns
• The distinction between the two types is somewhat blurred (e.g., VARs).
In-Sample Versus Out-of-Sample
• Say we have some data - e.g. monthly FTSE returns for 120 months:
1990M1 – 1999M12. We could use all of it to build the model, or keep
some observations back:
• A good test of the model since we have not used the information from
1999M1 onwards when we estimated the model parameters.
How to produce forecasts
• Structural models
e.g. $y = X\beta + u$
$y_t = \beta_1 + \beta_2 x_{2t} + \cdots + \beta_k x_{kt} + u_t$
To forecast y, we require the conditional expectation of its future value:
$E(y_t) = \beta_1 + \beta_2 E(x_{2t}) + \cdots + \beta_k E(x_{kt})$
But what are $E(x_{2t})$ etc.? We could use $\bar{x}_2$, so
$E(y_t) = \beta_1 + \beta_2\bar{x}_2 + \cdots + \beta_k\bar{x}_k$
$= \bar{y}$ !!
• Time-series Models
The current value of a series, yt, is modelled as a function only of its previous
values and the current value of an error term (and possibly previous values of
the error term).
• Models include:
• simple unweighted averages
• exponentially weighted averages
• ARIMA models
• Non-linear models – e.g. threshold models, GARCH, bilinear models, etc.
$f_{t,4} = E(y_{t+4}\mid\Omega_t) = \ldots$
•For example, say we predict that tomorrow’s return on the FTSE will be 0.2, but
the outcome is actually -0.4. Is this accurate? Define ft,s as the forecast made at
time t for s steps ahead (i.e. the forecast made for time t+s), and yt+s as the
realised value of y at time t+s.
• Some of the most popular criteria for assessing the accuracy of time-series
forecasting techniques are:
$\text{MSE} = \frac{1}{N}\sum_{t=1}^{N}(y_{t+s} - f_{t,s})^2$
MAE is given by $\text{MAE} = \frac{1}{N}\sum_{t=1}^{N}\left|y_{t+s} - f_{t,s}\right|$
Mean absolute percentage error: $\text{MAPE} = \frac{100}{N}\sum_{t=1}^{N}\left|\frac{y_{t+s} - f_{t,s}}{y_{t+s}}\right|$
How can we test whether a forecast is accurate or not?
(cont’d)
• It has, however, also been shown (Gerlow et al., 1993) that the accuracy of forecasts according to traditional statistical criteria is not related to trading profitability.
% correct sign predictions $= \frac{1}{N}\sum_{t=1}^{N} z_{t+s}$
where $z_{t+s} = 1$ if $(y_{t+s}\cdot f_{t,s}) > 0$ and $z_{t+s} = 0$ otherwise.
• Given the following forecast and actual values, calculate the MSE, MAE and
percentage of correct sign predictions:
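The exercise's table of values is not reproduced here, but the four accuracy measures can be computed with a small helper like the sketch below (Python/numpy; the example numbers at the bottom are made up purely for illustration):

```python
import numpy as np

def forecast_accuracy(actual, forecast):
    """MSE, MAE, MAPE and % of correct sign predictions, as defined above."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    mse = np.mean((actual - forecast) ** 2)
    mae = np.mean(np.abs(actual - forecast))
    mape = 100 * np.mean(np.abs((actual - forecast) / actual))
    sign_pct = 100 * np.mean((actual * forecast) > 0)
    return mse, mae, mape, sign_pct

print(forecast_accuracy(actual=[0.5, -0.2, 0.3], forecast=[0.4, 0.1, 0.2]))
```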
Multivariate models
• All the models we have looked at thus far have been single equations models of
the form y = X + u
• All of the variables contained in the X matrix are assumed to be EXOGENOUS.
• y is an ENDOGENOUS variable.
• Assuming that the market always clears, and dropping the time subscripts
for simplicity
$Q = \alpha + \beta P + \gamma S + u$   (4)
$Q = \lambda + \mu P + \kappa T + v$   (5)
• The point is that price and quantity are determined simultaneously (price affects quantity and quantity affects price).
• Solving for Q,
$\alpha + \beta P + \gamma S + u = \lambda + \mu P + \kappa T + v$   (6)
• Solving for P,
$\frac{Q - \alpha - \gamma S - u}{\beta} = \frac{Q - \lambda - \kappa T - v}{\mu}$   (7)
• Rearranging (6),
$\beta P - \mu P = (\lambda - \alpha) + \kappa T - \gamma S + (v - u)$
$(\beta - \mu)P = (\lambda - \alpha) + \kappa T - \gamma S + (v - u)$
$P = \frac{\lambda - \alpha}{\beta - \mu} + \frac{\kappa}{\beta - \mu}T - \frac{\gamma}{\beta - \mu}S + \frac{v - u}{\beta - \mu}$   (8)
• Similarly, rearranging (7),
$Q = \frac{\beta\lambda - \alpha\mu}{\beta - \mu} + \frac{\beta\kappa}{\beta - \mu}T - \frac{\gamma\mu}{\beta - \mu}S + \frac{\beta v - \mu u}{\beta - \mu}$   (9)
• (8) and (9) are the reduced form equations for P and Q.
• But what would happen if we had estimated equations (4) and (5), i.e. the
structural form equations, separately using OLS?
• It is clear from (8) that P is related to the errors in (4) and (5) - i.e. it is
stochastic.
$P = \pi_{10} + \pi_{11}T + \pi_{12}S + \varepsilon_1$   (10)
$Q = \pi_{20} + \pi_{21}T + \pi_{22}S + \varepsilon_2$   (11)
• We CAN estimate equations (10) & (11) using OLS since all the RHS
variables are exogenous.
• But ... we probably don’t care what the values of the $\pi$ coefficients are; what we wanted were the original parameters in the structural equations – $\alpha, \beta, \gamma, \lambda, \mu, \kappa$.
1. An equation is unidentified
· like (12) or (13)
· we cannot get the structural coefficients from the reduced form estimates
2. An equation is exactly identified
· a unique set of structural coefficients can be obtained from the reduced form estimates
3. An equation is over-identified
· Example given later
· More than one set of structural coefficients could be obtained from the reduced form.
What Determines whether an Equation is Identified
or not? (cont’d)
• If more than G − 1 are absent, it is over-identified. If fewer than G − 1 are absent, it is not identified.
Example
• In the following system of equations, the Y’s are endogenous, while the X’s
are exogenous. Determine whether each equation is over-, under-, or just-
identified.
$Y_1 = \alpha_0 + \alpha_1 Y_2 + \alpha_3 Y_3 + \alpha_4 X_1 + \alpha_5 X_2 + u_1$
$Y_2 = \beta_0 + \beta_1 Y_3 + \beta_2 X_1 + u_2$        (14)-(16)
$Y_3 = \gamma_0 + \gamma_1 Y_2 + u_3$
Simultaneous Equations Bias (cont’d)
Solution
G = 3;
If # excluded variables = 2, the eqn is just identified
If # excluded variables > 2, the eqn is over-identified
If # excluded variables < 2, the eqn is not identified
3. Run the regression (14) again, but now also including the fitted values $\hat{Y}_2$, $\hat{Y}_3$ as additional regressors:
• Assume that the error terms are not correlated with each other. Can we estimate
the equations individually using OLS?
Stage 1:
• Obtain and estimate the reduced form equations using OLS. Save the
fitted values for the dependent variables.
Stage 2:
• Estimate the structural equations, but replace any RHS endogenous
variables with their stage 1 fitted values.
Estimation of Systems
Using Two-Stage Least Squares (cont’d)
Stage 1:
• Estimate the reduced form equations (17)-(19) individually by OLS and obtain the fitted values, $\hat{Y}_1$, $\hat{Y}_2$, $\hat{Y}_3$.
Stage 2:
• Replace the RHS endogenous variables with their stage 1 estimated values:
$Y_1 = \alpha_0 + \alpha_1\hat{Y}_2 + \alpha_3\hat{Y}_3 + \alpha_4 X_1 + \alpha_5 X_2 + u_1$
$Y_2 = \beta_0 + \beta_1\hat{Y}_3 + \beta_2 X_1 + u_2$        (24)-(26)
$Y_3 = \gamma_0 + \gamma_1\hat{Y}_2 + u_3$
• Now $\hat{Y}_2$ and $\hat{Y}_3$ will not be correlated with u1, $\hat{Y}_3$ will not be correlated with u2, and $\hat{Y}_2$ will not be correlated with u3.
Estimation of Systems
Using Two-Stage Least Squares (cont’d)
• Recall that the reason we cannot use OLS directly on the structural
equations is that the endogenous variables are correlated with the errors.
• One solution to this would be not to use Y2 or Y3 , but rather to use some
other variables instead.
• We want these other variables to be (highly) correlated with Y2 and Y3, but
not correlated with the errors - they are called INSTRUMENTS.
• Obtain the fitted values from (27) & (28), $\hat{Y}_2$ and $\hat{Y}_3$, and replace Y2 and Y3 with these in the structural equation.
• If the instruments are the variables in the reduced form equations, then
IV is equivalent to 2SLS.
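As a sketch of the mechanics described above (Python with statsmodels; the data-generating process is hypothetical and only serves to create an endogenous regressor):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 500
x1, x2 = rng.standard_normal(n), rng.standard_normal(n)
u1, u2 = rng.standard_normal(n), rng.standard_normal(n)
y2 = 1 + 0.5 * x1 + 0.5 * x2 + u2          # RHS endogenous variable
y1 = 2 + 0.8 * y2 + x1 + u1 + 0.5 * u2     # y2 is correlated with the composite error

# Stage 1: regress the RHS endogenous variable on all the exogenous variables
stage1 = sm.OLS(y2, sm.add_constant(np.column_stack([x1, x2]))).fit()

# Stage 2: replace y2 with its fitted values in the structural equation
stage2 = sm.OLS(y1, sm.add_constant(np.column_stack([stage1.fittedvalues, x1]))).fit()
print(stage2.params)   # note: proper 2SLS standard errors need a correction vs plain OLS
```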
• How Might the Option Price / Trading Volume and the Bid / Ask Spread
be Related?
Consider 3 possibilities:
1. Market makers equalise spreads across options.
2. The spread might be a constant proportion of the option value.
3. Market makers might equalise marginal costs across options irrespective
of trading volume.
Market Making Costs
• The S&P 100 Index has been traded on the CBOE since 1983 on a
continuous open-outcry auction basis.
• The average bid & ask prices are calculated for each option during the
time 2:00pm – 2:15pm Central Standard time.
• The following are then dropped from the sample for that day:
1. Any options that do not have bid / ask quotes reported during the ¼ hour.
2. Any options with fewer than 10 trades during the day.
• The option price is defined as the average of the bid & the ask.
where PRi & CRi are the squared deltas of the options
• Equations (1) & (2) and then separately (3) & (4) are estimated using
2SLS.
Results 1
             Coeff. 0    Coeff. 1    Coeff. 2    Coeff. 3    Coeff. 4    Coeff. 5    Adj. R2
Equation 1   0.08362     0.06114     0.01679     0.00902     -0.00228    -0.15378    0.688
             (16.80)     (8.63)      (15.49)     (14.01)     (-12.31)    (-12.52)
Equation 2   -3.8542     46.592      -0.12412    0.00406     0.00866                 0.618
             (-10.50)    (30.49)     (-6.01)     (14.43)     (4.76)
Note: t-ratios in parentheses. Source: George and Longstaff (1993). Reprinted with the permission of
the School of Business Administration, University of Washington.
• Adjusted R2 ≈ 60%
• The coefficients with subscript 1 in each equation measure the effect of the spread size on trading activity, etc.
Calls and Puts as Substitutes
• The paper argues that calls and puts might be viewed as substitutes
since they are all written on the same underlying.
• So call trading activity might depend on the put spread and put trading
activity might depend on the call spread.
• The authors argue that in the second part of the paper, they did indeed find
evidence of substitutability between calls & puts.
Comments
- No diagnostics.
- Why do the CL and PL equations not contain the CR and PR variables?
- The authors could have tested for endogeneity of CBA and CL.
- Why are the squared terms in maturity and moneyness only in the
liquidity regressions?
- Wrong sign on the squared deltas.
Vector Autoregressive Models
• A VAR is in a sense a systems regression model, i.e. there is more than one dependent variable.
$y_t = \beta_0 + \beta_1 y_{t-1} + u_t$
where $y_t$, $\beta_0$ and $u_t$ are each $g\times1$ and $\beta_1$ is $g\times g$.
Vector Autoregressive Models:
Notation and Concepts (cont’d)
• This model can be extended to the case where there are k lags of each
variable in each equation:
• We can also extend this to the case where the model includes first
difference terms and cointegrating relationships (a VECM).
Cross-Equation Restrictions
In the spirit of (unrestricted) VAR modelling, each equation should have
the same lag length
Suppose that a bivariate VAR(8) estimated using quarterly data has 8 lags
of the two variables in each equation, and we want to examine a restriction
that the coefficients on lags 5 through 8 are jointly zero. This can be done
using a likelihood ratio test
Denote the variance–covariance matrix of residuals (given by $\hat{u}\hat{u}'/T$) as $\hat{\Sigma}$.
The likelihood ratio test for this joint hypothesis is given by
$LR = T\left[\log\left|\hat{\Sigma}_r\right| - \log\left|\hat{\Sigma}_u\right|\right]$
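A rough sketch of this LR test, assuming Python with statsmodels and a (T × 2) array y of the two quarterly series already loaded; for simplicity the fact that the restricted and unrestricted VARs use slightly different effective samples is ignored here:

```python
import numpy as np
from statsmodels.tsa.api import VAR

def var_lag_lr_test(y, p_unrestricted=8, p_restricted=4):
    """LR = T[log|Sigma_r| - log|Sigma_u|]; chi-squared with g^2*(p_u - p_r) d.o.f."""
    T = len(y)
    sigma_u = np.cov(VAR(y).fit(p_unrestricted).resid, rowvar=False, bias=True)
    sigma_r = np.cov(VAR(y).fit(p_restricted).resid, rowvar=False, bias=True)
    lr = T * (np.log(np.linalg.det(sigma_r)) - np.log(np.linalg.det(sigma_u)))
    dof = y.shape[1] ** 2 * (p_unrestricted - p_restricted)
    return lr, dof
```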
Choosing the Optimal Lag Length for a VAR
(cont’d)
• We can take the contemporaneous terms over to the LHS and write
$\begin{pmatrix}1 & \alpha_{12}\\ \alpha_{21} & 1\end{pmatrix}\begin{pmatrix}y_{1t}\\ y_{2t}\end{pmatrix} = \begin{pmatrix}\beta_{10}\\ \beta_{20}\end{pmatrix} + \begin{pmatrix}\beta_{11} & \beta_{12}\\ \beta_{21} & \beta_{22}\end{pmatrix}\begin{pmatrix}y_{1,t-1}\\ y_{2,t-1}\end{pmatrix} + \begin{pmatrix}u_{1t}\\ u_{2t}\end{pmatrix}$
or
$A y_t = \beta_0 + \beta_1 y_{t-1} + u_t$
• This is the VAR written in primitive form; pre-multiplying through by $A^{-1}$ gives the standard form VAR, which we can estimate using OLS.
Block Significance and Causality Tests
• This is done by determining how much of the s-step-ahead forecast error variance for each variable is explained by innovations to each explanatory variable (s = 1, 2, …).
Lags of Variable
Dependent variable SIR DIVY SPREAD UNEM UNINFL PROPRES
SIR 0.0000 0.0091 0.0242 0.0327 0.2126 0.0000
DIVY 0.5025 0.0000 0.6212 0.4217 0.5654 0.4033
SPREAD 0.2779 0.1328 0.0000 0.4372 0.6563 0.0007
UNEM 0.3410 0.3026 0.1151 0.0000 0.0758 0.2765
UNINFL 0.3057 0.5146 0.3420 0.4793 0.0004 0.3885
PROPRES 0.5537 0.1614 0.5537 0.8922 0.7222 0.0000
(Variance decomposition: each pair of columns gives the percentage of the forecast error variance explained by innovations in SIR, DIVY, SPREAD, UNEM, UNINFL and PROPRES respectively, under two orderings, I and II.)
Months ahead I II I II I II I II I II I II
1 0.0 0.8 0.0 38.2 0.0 9.1 0.0 0.7 0.0 0.2 100.0 51.0
2 0.2 0.8 0.2 35.1 0.2 12.3 0.4 1.4 1.6 2.9 97.5 47.5
3 3.8 2.5 0.4 29.4 0.2 17.8 1.0 1.5 2.3 3.0 92.3 45.8
4 3.7 2.1 5.3 22.3 1.4 18.5 1.6 1.1 4.8 4.4 83.3 51.5
12 2.8 3.1 15.5 8.7 15.3 19.5 3.3 5.1 17.0 13.5 46.1 50.0
24 8.2 6.3 6.8 3.9 38.0 36.2 5.5 14.7 18.1 16.9 23.4 22.0
[Figure: plot of impulse responses over 1–24 months ahead.]
• If the variables in the regression model are not stationary, then it can
be proved that the standard assumptions for asymptotic analysis will
not be valid. In other words, the usual “t-ratios” will not follow a t-
distribution, so we cannot validly undertake hypothesis tests about the
regression parameters.
$y_t = \alpha + \beta t + u_t$   (2)
• Note that the model (1) could be generalised to the case where yt is an explosive process:
$y_t = \mu + \phi y_{t-1} + u_t$
where $\phi > 1$.
• We have 3 cases:
1. $\phi < 1 \Rightarrow \phi^T \to 0$ as $T \to \infty$
So the shocks to the system gradually die away.
2. $\phi = 1 \Rightarrow \phi^T = 1\ \forall\ T$
So shocks persist in the system and never die away. We obtain:
$y_t = y_0 + \sum u_t$ as $T \to \infty$
So just an infinite sum of past shocks plus some starting value of y0.
3. $\phi > 1$. Now given shocks become more influential as time goes on, since if $\phi > 1$, $\phi^3 > \phi^2 > \phi$ etc.
Detrending a Stochastically Non-stationary Series
[Figures: simulated series illustrating stochastic non-stationarity – a random walk, a random walk with drift, and AR(1) processes generated with phi = 1, phi = 0.8 and phi = 0.]
Definition
If a non-stationary series, yt, must be differenced d times before it becomes stationary, then it is said to be integrated of order d. We write yt ~ I(d).
So if yt ~ I(d) then $\Delta^d y_t$ ~ I(0).
An I(0) series is a stationary series.
An I(1) series contains one unit root,
e.g. $y_t = y_{t-1} + u_t$
Characteristics of I(0), I(1) and I(2) Series
• An I(2) series contains two unit roots and so would require differencing
twice to induce stationarity.
• I(1) and I(2) series can wander a long way from their mean value and
cross this mean value rarely.
• The majority of economic and financial series contain a single unit root,
although some are stationary and consumer prices have been argued to
have 2 unit roots.
• The early and pioneering work on testing for a unit root in time series
was done by Dickey and Fuller (Dickey and Fuller 1979, Fuller 1976).
The basic objective of the test is to test the null hypothesis that $\phi = 1$ in:
$y_t = \phi y_{t-1} + u_t$
against the one-sided alternative $\phi < 1$. So we have
H0: series contains a unit root
vs. H1: series is stationary.
• We can write
$\Delta y_t = u_t$
where $\Delta y_t = y_t - y_{t-1}$, and the alternatives may be expressed as
$\Delta y_t = \psi y_{t-1} + \mu + \lambda t + u_t$
with $\mu = \lambda = 0$ in case i), and $\lambda = 0$ in case ii), and $\psi = \phi - 1$. In each case, the tests are based on the t-ratio on the yt−1 term in the estimated regression of $\Delta y_t$ on yt−1, plus a constant in case ii) and a constant and trend in case iii). The test statistics are defined as
test statistic $= \frac{\hat{\psi}}{\widehat{SE}(\hat{\psi})}$
• The test statistic does not follow the usual t-distribution under the null,
since the null is one of non-stationarity, but rather follows a non-standard
distribution. Critical values are derived from Monte Carlo experiments in,
for example, Fuller (1976). Relevant examples of the distribution are
shown in table 4.1 below
Critical Values for the DF Test
The null hypothesis of a unit root is rejected in favour of the stationary alternative
in each case if the test statistic is more negative than the critical value.
• The tests above are only valid if ut is white noise. In particular, ut will be
autocorrelated if there was autocorrelation in the dependent variable of the
regression (yt) which we have not modelled. The solution is to “augment”
the test using p lags of the dependent variable. The alternative model in case (i) is now written:
$\Delta y_t = \psi y_{t-1} + \sum_{i=1}^{p}\alpha_i \Delta y_{t-i} + u_t$
• The same critical values from the DF tables are used as before. A problem
now arises in determining the optimal number of lags of the dependent
variable.
There are 2 ways
- use the frequency of the data to decide
- use information criteria
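A sketch of running the (augmented) DF test with the lag length chosen by an information criterion, assuming Python with statsmodels (the simulated random walk is purely illustrative):

```python
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(4)
y = np.cumsum(rng.standard_normal(500))     # a random walk, so H0 should not be rejected

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression="c", autolag="AIC")
print(stat, crit)   # fail to reject the unit root null unless stat is more negative than the CV
```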
Testing for Higher Orders of Integration
• The tests usually give the same conclusions as the ADF tests, and the
calculation of the test statistics is complex.
• The main criticism is that the power of the tests is low if the process is
stationary but with a root close to the non-stationary boundary.
e.g. the tests are poor at deciding if
$\phi = 1$ or $\phi = 0.95$,
especially with small sample sizes.
• One way to get around this is to use a stationarity test as well as the
unit root tests we have looked at.
So that by default under the null the data will appear stationary.
• Thus we can compare the results of these tests with the ADF/PP
procedure to see if we obtain the same conclusion.
• A Comparison
ADF / PP KPSS
H0: yt I(1) H0: yt I(0)
H1: yt I(0) H1: yt I(1)
• 4 possible outcomes
• The equation for the third (most general) version of the test is
• More seriously, Christiano (1992) has argued that the critical values
employed with the test will presume the break date to be chosen
exogenously
• But most researchers will select a break point based on an examination
of the data
• Thus the asymptotic theory assumed will no longer hold
• Banerjee et al. (1992) and Zivot and Andrews (1992) introduce an
approach to testing for unit roots in the presence of structural change
that allows the break date to be selected endogenously.
• Their methods are based on recursive, rolling and sequential tests.
• For the recursive and rolling tests, Banerjee et al. propose four
specifications.
1. The standard Dickey-Fuller test on the whole sample
2. The ADF test conducted repeatedly on the sub-samples and the minimal
DF statistic is obtained
3. The maximal DF statistic obtained from the sub-samples
4. The difference between the maximal and minimal statistics
• For the sequential test, the whole sample is used each time with the
following regression being run
• The test is run repeatedly for different values of Tb over as much of the
data as possible (a ‘trimmed sample’)
• This excludes the first few and the last few observations
• Clearly it is τt(tused) that allows for the break, which can either be in the level (where τt(tused) = 1 if t > tused and 0 otherwise), or in the deterministic trend (where τt(tused) = t − tused if t > tused and 0 otherwise).
• This technique is very similar to that of Zivot and Andrews, except that
Perron’s is more flexible since it allows for a break under both the null
and alternative hypotheses
• The findings for the recursive tests are that the unit root null should
not be rejected at the 10% level for any of the maturities examined
• For the sequential tests, the results are slightly more mixed with the
break in trend model not rejecting the null hypothesis, while it is
rejected for the short, 7-day and the 1-month rates when a structural
break is allowed for in the mean
• The weight of evidence indicates that short term interest rates are best
viewed as unit root processes that have a structural break in their level
around the time of ‘Black Wednesday’ (16 September 1992) when the
UK dropped out of the European Exchange Rate Mechanism
• The longer term rates, on the other hand, are I(1) processes with no
breaks
• In most cases, if we combine two variables which are I(1), then the
combination will also be I(1).
Let $z_t = \sum_{i=1}^{k}\alpha_i X_{i,t}$   (1)
Then $z_t \sim I(\max d_i)$
• Rearranging (1), we can write
$X_{1,t} = \sum_{i=2}^{k}\beta_i X_{i,t} + z_t'$
where $\beta_i = -\frac{\alpha_i}{\alpha_1}$, $z_t' = \frac{z_t}{\alpha_1}$, $i = 2, \ldots, k$
• But the disturbances would have some very undesirable properties: zt´ is
not stationary and is autocorrelated if all of the Xi are I(1).
• We want to ensure that the disturbances are I(0). Under what circumstances
will this be the case?
Definition of Cointegration (Engle & Granger, 1987)
• Many time series are non-stationary but “move together” over time.
• If variables are cointegrated, it means that a linear combination of them will
be stationary.
• There may be up to r linearly independent cointegrating relationships (where $r \le k - 1$), also known as cointegrating vectors. r is also known as the cointegrating rank of zt.
• A cointegrating relationship may also be seen as a long term relationship.
• The problem with this approach is that pure first difference models have no
long run solution.
e.g. Consider yt and xt both I(1).
The model we may want to estimate is
$\Delta y_t = \beta\Delta x_t + u_t$
But this collapses to nothing in the long run.
• One way to get around this problem is to use both first difference and levels terms, e.g.
$\Delta y_t = \beta_1\Delta x_t + \beta_2(y_{t-1} - \gamma x_{t-1}) + u_t$   (1)
• $y_{t-1} - \gamma x_{t-1}$ is known as the error correction term.
• ut should be I(0) if the variables yt, x2t, ... xkt are cointegrated.
• Engle and Granger (1987) have tabulated a new set of critical values
and hence the test is known as the Engle Granger (E.G.) test.
• We can also use the Durbin Watson test statistic or the Phillips Perron
approach to test for non-stationarity of ut .
• What are the null and alternative hypotheses for a test on the residuals
of a potentially cointegrating regression?
H0 : unit root in cointegrating regression’s residuals
H1 : residuals from cointegrating regression are stationary
• There are (at least) 3 methods we could use: Engle Granger, Engle and Yoo,
and Johansen.
• The Engle Granger 2 Step Method
This is a single equation technique which is conducted as follows:
Step 1:
- Make sure that all the individual variables are I(1).
- Then estimate the cointegrating regression using OLS.
- Save the residuals of the cointegrating regression, $\hat{u}_t$.
- Test these residuals to ensure that they are I(0).
Step 2:
- Use the step 1 residuals as one variable in the error correction model, e.g.
$\Delta y_t = \beta_1\Delta x_t + \beta_2\hat{u}_{t-1} + u_t$
where $\hat{u}_{t-1} = y_{t-1} - \hat{\tau}x_{t-1}$
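The two steps can be sketched as follows (Python with statsmodels; the cointegrated x and y series are simulated purely for illustration, and remember that Engle–Granger rather than standard DF critical values apply to the residual-based test):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(5)
x = np.cumsum(rng.standard_normal(1000))          # I(1)
y = 0.5 + 0.8 * x + rng.standard_normal(1000)     # cointegrated with x

# Step 1: cointegrating regression and residual-based unit root test
step1 = sm.OLS(y, sm.add_constant(x)).fit()
u_hat = step1.resid
print(adfuller(u_hat)[0])                          # compare with EG critical values

# Step 2: error correction model using the lagged step-1 residuals
dy, dx = np.diff(y), np.diff(x)
step2 = sm.OLS(dy, sm.add_constant(np.column_stack([dx, u_hat[:-1]]))).fit()
print(step2.params)
```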
An Example of a Model for Non-stationary
Variables: Lead-Lag Relationships between Spot
and Futures Prices
Background
• We expect changes in the spot price of a financial asset and its corresponding
futures price to be perfectly contemporaneously correlated and not to be
cross-autocorrelated.
• We can test this idea by modelling the lead-lag relationship between the two.
• Tse (1995): 1055 daily observations on NSA stock index and stock
index futures values from December 1988 - April 1993.
$F_t^* = S_t e^{(r-d)(T-t)}$
where Ft* is the fair futures price, St is the spot price, r is a
continuously compounded risk-free rate of interest, d is the
continuously compounded yield in terms of dividends derived from the
stock index until the futures contract matures, and (T-t) is the time to
maturity of the futures contract. Taking logarithms of both sides of
equation above gives
$f_t^* = s_t + (r - d)(T - t)$
Dickey–Fuller statistics for log-price data: Futures −0.1329; Spot −0.7335
• Conclusion: log Ft and log St are not stationary, but $\Delta$log Ft and $\Delta$log St are stationary.
• But a model containing only first differences has no long run relationship.
• The solution is to see if there exists a cointegrating relationship between ft
and st which would mean that we can validly include levels terms in this
framework.
Cointegrating Regression
Coefficient estimates: intercept = 0.1345; slope = 0.9834
DF test on residuals, $\hat{z}_t$: test statistic = −14.7303
• One of the problems with the EG 2-step method is that we cannot make
any inferences about the actual cointegrating regression.
• The Engle & Yoo (EY) 3-step procedure takes its first two steps from EG.
• The most important problem with both these techniques is that in the
general case above, where we have more than two variables which may be
cointegrated, there could be more than one cointegrating relationship.
• So, in the case where we just had y and x, then r can only be one or
zero.
• And if there are others, how do we know how many there are or
whether we have found the “best”?
• Let $\Pi$ denote a g×g square matrix, c a g×1 non-zero vector, and $\lambda$ a set of scalars such that $\Pi c = \lambda c$,
and hence
$(\Pi - \lambda I_g)c = 0$
where Ig is an identity matrix.
Review of Matrix Algebra (cont’d)
• For example, let $\Pi$ be the 2×2 matrix $\begin{pmatrix}5 & 1\\ 2 & 4\end{pmatrix}$. Then
$\left|\Pi - \lambda I_g\right| = \left|\begin{pmatrix}5 & 1\\ 2 & 4\end{pmatrix} - \lambda\begin{pmatrix}1 & 0\\ 0 & 1\end{pmatrix}\right| = 0$
$\left|\begin{matrix}5-\lambda & 1\\ 2 & 4-\lambda\end{matrix}\right| = (5-\lambda)(4-\lambda) - 2 = \lambda^2 - 9\lambda + 18$
Setting this to zero gives the eigenvalues (characteristic roots) $\lambda = 6$ and $\lambda = 3$.
Review of Matrix Algebra (cont’d)
• The rank of a matrix is equal to the order of the largest square matrix we
can obtain from which has a non-zero determinant.
• The test for cointegration between the y’s is calculated by looking at the
rank of the matrix via its eigenvalues. (To prove this requires some
technical intermediate steps).
$\lambda_{trace}(r) = -T\sum_{i=r+1}^{g}\ln(1-\hat{\lambda}_i)$
and
$\lambda_{max}(r, r+1) = -T\ln(1-\hat{\lambda}_{r+1})$
where $\hat{\lambda}_i$ is the estimated value for the ith ordered eigenvalue from the $\Pi$ matrix.
• $\lambda_{trace}$ tests the null that the number of cointegrating vectors is less than or equal to r against an unspecified alternative.
$\lambda_{trace} = 0$ when all the $\lambda_i = 0$, so it is a joint test.
• $\lambda_{max}$ tests the null that the number of cointegrating vectors is r against an alternative of r + 1.
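In practice the eigenvalues and the two test statistics are produced by standard routines; a minimal sketch, assuming Python with statsmodels and a (T × g) array of the levels of the candidate variables already loaded:

```python
from statsmodels.tsa.vector_ar.vecm import coint_johansen

def johansen_summary(levels, det_order=0, k_ar_diff=1):
    """Trace and max-eigenvalue statistics with their critical values."""
    res = coint_johansen(levels, det_order, k_ar_diff)
    return res.lr1, res.cvt, res.lr2, res.cvm   # trace stats/CVs, max-eig stats/CVs
```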
$\begin{pmatrix}\beta_{11}\\ \beta_{12}\\ \beta_{13}\\ \beta_{14}\end{pmatrix}'\begin{pmatrix}y_1\\ y_2\\ y_3\\ y_4\end{pmatrix}_{t-k}$ or $\beta_{11}y_1 + \beta_{12}y_2 + \beta_{13}y_3 + \beta_{14}y_4$
Johansen Critical Values
• Johansen & Juselius (1990) provide critical values for the 2 statistics.
The distribution of the test statistics is non-standard. The critical values
depend on:
1. the value of g-r, the number of non-stationary components
2. whether a constant and/or trend are included in the regressions.
• If the test statistic is greater than the critical value from Johansen’s
tables, reject the null hypothesis that there are r cointegrating vectors in
favour of the alternative that there are more than r.
H0: r = 0 vs H1: 0 < r ≤ g
H0: r = 1 vs H1: 1 < r ≤ g
H0: r = 2 vs H1: 2 < r ≤ g
... ... ...
H0: r = g-1 vs H1: r = g
• But how does this correspond to a test of the rank of the matrix?
• r is the rank of .
• cannot be of full rank (g) since this would correspond to the original
yt being stationary.
• If $\Pi$ has zero rank, then by analogy to the univariate case, $\Delta y_t$ depends only on $\Delta y_{t-j}$ and not on yt−1, so that there is no long run relationship between the elements of yt−1. Hence there is no cointegration.
• For 1 < rank () < g , there are multiple cointegrating vectors.
• After this point, if the restriction is not severe, then the cointegrating
vectors will not change much upon imposing the restriction.
• Does the PPP relationship hold for the US/Italian exchange rate–price
system?
Data:
• Daily closing observations on redemption yields on government bonds for
4 bond markets: US, UK, West Germany, Japan.
• For cointegration, a necessary but not sufficient condition is that the yields
are nonstationary. All 4 yields series are I(1).
• The paper then goes on to estimate a VAR for the first differences of the
yields, which is of the form
$\Delta X_t = \sum_{i=1}^{k}\Gamma_i\Delta X_{t-i} + \epsilon_t$
where
$\Delta X_t = \begin{pmatrix}\Delta X(US)_t\\ \Delta X(UK)_t\\ \Delta X(WG)_t\\ \Delta X(JAP)_t\end{pmatrix}$, $\Gamma_i = \begin{pmatrix}\gamma_{11i} & \gamma_{12i} & \gamma_{13i} & \gamma_{14i}\\ \gamma_{21i} & \gamma_{22i} & \gamma_{23i} & \gamma_{24i}\\ \gamma_{31i} & \gamma_{32i} & \gamma_{33i} & \gamma_{34i}\\ \gamma_{41i} & \gamma_{42i} & \gamma_{43i} & \gamma_{44i}\end{pmatrix}$ and $\epsilon_t = \begin{pmatrix}\epsilon_{1t}\\ \epsilon_{2t}\\ \epsilon_{3t}\\ \epsilon_{4t}\end{pmatrix}$
• They set k = 8.
Source: Mills and Mills (1991). Reprinted with the permission of Blackwell Publishers.
Source: Mills and Mills (1991). Reprinted with the permission of Blackwell Publishers.
Impulse Responses for VAR of
International Bond Yields
Response of UK to innovations in
Days after shock US UK Germany Japan
0 0.19 0.97 0.00 0.00
1 0.16 0.07 0.01 -0.06
2 -0.01 -0.01 -0.05 0.09
3 0.06 0.04 0.06 0.05
4 0.05 -0.01 0.02 0.07
10 0.01 0.01 -0.04 -0.01
20 0.00 0.00 -0.01 0.00
Source: Mills and Mills (1991). Reprinted with the permission of Blackwell Publishers.
• Models with nonlinear g(•) are “non-linear in mean”, while those with
nonlinear 2(•) are “non-linear in variance”.
• Here the dependent variable is the residual series and the independent
variables are the squares, cubes, …, of the fitted values.
• Many other non-linearity tests are available - e.g., the BDS and
bispectrum test
• BDS is a pure hypothesis test. That is, it has as its null hypothesis that
the data are pure noise (completely random)
• It has been argued to have power to detect a variety of departures from randomness – linear or non-linear stochastic processes, deterministic chaos, etc.
• The BDS test follows a standard normal distribution under the null
• The test can also be used as a model diagnostic on the residuals to ‘see
what is left’
• If the proposed model is adequate, the standardised residuals should be
white noise.
Chaos Theory
• Neural networks are not very popular in finance and suffer from
several problems:
– The coefficient estimates from neural networks do not have any real
theoretical interpretation
– Virtually no diagnostic or specification tests are available for estimated
models
– They can provide excellent fits in-sample to a given set of ‘training’ data,
but typically provide poor out-of-sample forecast accuracy
– This usually arises from the tendency of neural networks to fit closely to
sample-specific data features and ‘noise’, and so they cannot ‘generalise’
– The non-linear estimation of neural network models can be cumbersome
and computationally time-intensive, particularly, for example, if the model
must be estimated repeatedly when rolling through a sample.
• Modelling and forecasting stock market volatility has been the subject
of vast empirical and theoretical investigation
• There are a number of motivations for this line of inquiry:
– Volatility is one of the most important concepts in finance
– Volatility, as measured by the standard deviation or variance of returns, is
often used as a crude measure of the total risk of financial assets
– Many value-at-risk (VaR) models for measuring market risk require the
estimation or forecast of a volatility parameter
– The volatility of stock market prices also enters directly into the Black–
Scholes formula for deriving the prices of traded options
• We will now examine several volatility models.
• Is the variance of the errors likely to be constant over time? Not for
financial data.
Autoregressive Conditionally Heteroscedastic
(ARCH) Models
• So use a model which does not assume that the variance is constant.
• Recall the definition of the variance of ut:
$\sigma_t^2 = \text{Var}(u_t\mid u_{t-1}, u_{t-2}, \ldots) = E[(u_t - E(u_t))^2\mid u_{t-1}, u_{t-2}, \ldots]$
• Instead of calling the variance $\sigma_t^2$, in the literature it is usually called ht, so the model is
$y_t = \beta_1 + \beta_2 x_{2t} + \ldots + \beta_k x_{kt} + u_t$,   $u_t \sim N(0, h_t)$
where $h_t = \alpha_0 + \alpha_1 u_{t-1}^2 + \alpha_2 u_{t-2}^2 + \ldots + \alpha_q u_{t-q}^2$
• Another way of writing the model is $u_t = v_t\sigma_t$, $\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2$, $v_t \sim N(0, 1)$
• The two are different ways of expressing exactly the same model. The
first form is easier to understand while the second form is required for
simulating from an ARCH model, for example.
1. First, run any postulated linear regression of the form given in the equation
above, e.g. yt = 1 + 2x2t + ... + kxkt + ut
saving the residuals, ût .
2. Then square the residuals, and regress them on q own lags to test for ARCH
of order q, i.e. run the regression
$\hat{u}_t^2 = \gamma_0 + \gamma_1\hat{u}_{t-1}^2 + \gamma_2\hat{u}_{t-2}^2 + \ldots + \gamma_q\hat{u}_{t-q}^2 + v_t$
where vt is iid.
3. Obtain R2 from this regression. The test statistic is defined as TR2 (the number of observations multiplied by the coefficient of determination) and is distributed as a $\chi^2(q)$.
4. If the value of the test statistic is greater than the critical value from the $\chi^2(q)$ distribution, then reject the null hypothesis of no ARCH effects.
• Note that the ARCH test is also sometimes applied directly to returns
instead of the residuals from Stage 1 above.
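The regression-based test can be sketched directly from its definition (Python with numpy/statsmodels; the residual series is assumed to come from a previously estimated mean equation):

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

def arch_lm_test(resid, q=5):
    """Engle's ARCH test: regress squared residuals on q own lags; TR^2 ~ chi2(q) under H0."""
    u2 = np.asarray(resid, float) ** 2
    T = len(u2)
    y = u2[q:]
    X = np.column_stack([u2[q - i:T - i] for i in range(1, q + 1)])
    r2 = sm.OLS(y, sm.add_constant(X)).fit().rsquared
    return len(y) * r2, chi2.ppf(0.95, q)   # reject the no-ARCH null if TR^2 > critical value
```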
• How do we decide on q?
• The required value of q might be very large
• Non-negativity constraints might be violated.
– When we estimate an ARCH model, we require i >0 i=1,2,...,q
(since variance cannot be negative)
• Since the model is no longer of the usual linear form, we cannot use
OLS.
• The method works by finding the most likely values of the parameters
given the actual data.
1. Specify the appropriate equations for the mean and the variance - e.g. an
AR(1)- GARCH(1,1) model:
$y_t = \mu + \phi y_{t-1} + u_t$,   $u_t \sim N(0, \sigma_t^2)$
$\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta\sigma_{t-1}^2$
2. Specify the log-likelihood function to maximise:
$L = -\frac{T}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\log(\sigma_t^2) - \frac{1}{2}\sum_{t=1}^{T}\frac{(y_t - \mu - \phi y_{t-1})^2}{\sigma_t^2}$
3. The computer will maximise the function and give parameter values and
their standard errors
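The three steps can be put together in a small numerical routine. The sketch below (Python with numpy/scipy; the starting values, bounds and variance initialisation are illustrative choices) maximises the AR(1)–GARCH(1,1) log-likelihood above by minimising its negative:

```python
import numpy as np
from scipy.optimize import minimize

def neg_llf(params, y):
    """Negative log-likelihood for an AR(1) mean with GARCH(1,1) errors (normal)."""
    mu, phi, a0, a1, beta = params
    u = y[1:] - mu - phi * y[:-1]
    sig2 = np.empty(len(u))
    sig2[0] = np.var(u)                                  # initialise the variance recursion
    for t in range(1, len(u)):
        sig2[t] = a0 + a1 * u[t - 1] ** 2 + beta * sig2[t - 1]
    return 0.5 * np.sum(np.log(2 * np.pi) + np.log(sig2) + u ** 2 / sig2)

# y: vector of returns (assumed already loaded)
# start = [0.0, 0.0, 0.05, 0.05, 0.90]
# res = minimize(neg_llf, start, args=(y,), method="L-BFGS-B",
#                bounds=[(None, None), (-0.99, 0.99), (1e-8, None), (0, 1), (0, 1)])
```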
• Then the joint pdf for all the y’s can be expressed as a product of the individual density functions
$f(y_1, y_2, \ldots, y_T\mid \beta_1 + \beta_2 X_t, \sigma^2) = f(y_1\mid \beta_1 + \beta_2 X_1, \sigma^2)\,f(y_2\mid \beta_1 + \beta_2 X_2, \sigma^2)\cdots f(y_T\mid \beta_1 + \beta_2 X_T, \sigma^2)$   (2)
$= \prod_{t=1}^{T} f(y_t\mid \beta_1 + \beta_2 X_t, \sigma^2)$
$LF(\beta_1, \beta_2, \sigma^2) = \frac{1}{\sigma^T(\sqrt{2\pi})^T}\exp\left\{-\frac{1}{2}\sum_{t=1}^{T}\frac{(y_t - \beta_1 - \beta_2 x_t)^2}{\sigma^2}\right\}$   (4)
• We want to differentiate (4) w.r.t. $\beta_1$, $\beta_2$ and $\sigma^2$, but (4) is a product containing T terms.
• From (6), $\sum_t(y_t - \hat{\beta}_1 - \hat{\beta}_2 x_t) = 0$
$\sum_t y_t - T\hat{\beta}_1 - \hat{\beta}_2\sum_t x_t = 0$
$\frac{1}{T}\sum_t y_t - \hat{\beta}_1 - \hat{\beta}_2\frac{1}{T}\sum_t x_t = 0$
$\hat{\beta}_1 = \bar{y} - \hat{\beta}_2\bar{x}$   (9)
Parameter Estimation using Maximum Likelihood
(cont’d)
• From (7), $\sum_t(y_t - \hat{\beta}_1 - \hat{\beta}_2 x_t)x_t = 0$
$\sum_t y_t x_t - \hat{\beta}_1\sum_t x_t - \hat{\beta}_2\sum_t x_t^2 = 0$
$\sum_t y_t x_t - (\bar{y} - \hat{\beta}_2\bar{x})\sum_t x_t - \hat{\beta}_2\sum_t x_t^2 = 0$
$\hat{\beta}_2\sum_t x_t^2 = \sum_t y_t x_t - T\bar{x}\bar{y} + \hat{\beta}_2 T\bar{x}^2$
$\hat{\beta}_2 = \frac{\sum_t y_t x_t - T\bar{x}\bar{y}}{\sum_t x_t^2 - T\bar{x}^2}$   (10)
• From (8), $-\frac{T}{2\hat{\sigma}^2} + \frac{1}{2\hat{\sigma}^4}\sum_t(y_t - \hat{\beta}_1 - \hat{\beta}_2 x_t)^2 = 0$
Parameter Estimation using Maximum Likelihood
(cont’d)
• Rearranging, $\hat{\sigma}^2 = \frac{1}{T}\sum_t(y_t - \hat{\beta}_1 - \hat{\beta}_2 x_t)^2 = \frac{1}{T}\sum_t\hat{u}_t^2$   (11)
• Unfortunately, the LLF for a model with time-varying variances cannot be
maximised analytically, except in the simplest of cases. So a numerical
procedure is used to maximise the log-likelihood function. A potential
problem: local optima or multimodalities in the likelihood surface.
• The way we do the optimisation is:
1. Set up LLF.
2. Use regression to get initial guesses for the mean parameters.
3. Choose some initial guesses for the conditional variance parameters.
4. Specify a convergence criterion - either by criterion or by value.
Non-Normality and Maximum Likelihood (ML)
• The sample counterpart is $\hat{v}_t = \hat{u}_t / \hat{\sigma}_t$
• Are the v̂t normal? Typically v̂t are still leptokurtic, although less so than
the ût . Is this a problem? Not really, as we can use the ML with a robust
variance/covariance estimator. ML with robust standard errors is called Quasi-
Maximum Likelihood or QML.
Extensions to the Basic GARCH Model
$\log(\sigma_t^2) = \omega + \beta\log(\sigma_{t-1}^2) + \gamma\frac{u_{t-1}}{\sqrt{\sigma_{t-1}^2}} + \alpha\left[\frac{|u_{t-1}|}{\sqrt{\sigma_{t-1}^2}} - \sqrt{\frac{2}{\pi}}\right]$
The news impact curve plots the next period volatility (ht) that would arise from various
positive and negative values of ut-1, given an estimated model.
News Impact Curves for S&P 500 Returns using Coefficients from GARCH and GJR
Model Estimates:
[Figure: news impact curves for the GARCH and GJR models, plotting the value of the conditional variance against the value of the lagged shock (from −1 to +1).]
GARCH-in-Mean
• GARCH can model the volatility clustering effect since the conditional
variance is autoregressive. Such models can be used to forecast volatility.
• Let $\sigma_{1,T}^{2f}$ be the one-step-ahead forecast for $\sigma^2$ made at time T. This is easy to calculate since, at time T, the values of all the terms on the RHS are known.
• $\sigma_{1,T}^{2f}$ would be obtained by taking the conditional expectation of the conditional variance equation.
• Given $\sigma_{1,T}^{2f}$, how is $\sigma_{2,T}^{2f}$, the 2-step-ahead forecast for $\sigma^2$ made at time T, calculated?
$\sigma_{2,T}^{2f} = \alpha_0 + (\alpha_1 + \beta)\sigma_{1,T}^{2f}$
• By similar arguments, the 3-step-ahead forecast will be given by
$\sigma_{3,T}^{2f} = E_T(\alpha_0 + \alpha_1 u_{T+2}^2 + \beta\sigma_{T+2}^2) = \alpha_0 + (\alpha_1 + \beta)\sigma_{2,T}^{2f}$
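The recursion above translates directly into code; a minimal sketch with made-up parameter values:

```python
def garch_variance_forecasts(alpha0, alpha1, beta, u_T, sigma2_T, horizon=10):
    """s-step-ahead conditional variance forecasts from a GARCH(1,1):
    f[1] = alpha0 + alpha1*u_T**2 + beta*sigma2_T, then f[s] = alpha0 + (alpha1+beta)*f[s-1]."""
    f = [alpha0 + alpha1 * u_T ** 2 + beta * sigma2_T]
    for _ in range(horizon - 1):
        f.append(alpha0 + (alpha1 + beta) * f[-1])
    return f

print(garch_variance_forecasts(0.05, 0.08, 0.90, u_T=0.01, sigma2_T=0.02, horizon=5))
```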
1. Option pricing
2. Conditional betas
$\beta_{i,t} = \frac{\sigma_{im,t}}{\sigma_{m,t}^2}$
3. Dynamic hedge ratios
• What if the standard deviations and correlation are changing over time?
Use $h_t = p_t\,\frac{\sigma_{s,t}}{\sigma_{F,t}}$
Testing Non-linear Restrictions or
Testing Hypotheses about Non-linear Models
• Usual t- and F-tests are still valid in non-linear models, but they are
not flexible enough.
$y_t = \mu + \phi y_{t-1} + u_t$,   $u_t \sim N(0, \sigma_t^2)$
$\sigma_t^2 = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta\sigma_{t-1}^2$
• We estimate the model imposing the restriction and observe the maximised
LLF falls to 64.54. Can we accept the restriction?
• LR = -2(64.54-66.85) = 4.62.
• The test follows a $\chi^2(1)$, for which the 5% critical value is 3.84, so we reject the null (the restriction is rejected).
• Denoting the maximised value of the LLF by unconstrained ML as $L(\hat{\theta})$ and the constrained optimum as $L(\tilde{\theta})$, we can illustrate the three testing procedures in the following diagram:
Comparison of Testing Procedures under Maximum Likelihood: Diagrammatic Representation
[Figure: the log-likelihood function plotted against θ, with the unrestricted maximum $L(\hat{\theta})$ marked at point A and the restricted value $L(\tilde{\theta})$ at point B.]
• We know that at the unrestricted MLE, $L(\hat{\theta})$, the slope of the curve is zero.
• But is it “significantly steep” at $L(\tilde{\theta})$?
• Purpose
• To consider the out of sample forecasting performance of GARCH and
EGARCH Models for predicting stock index volatility.
• Data
• Weekly closing prices (Wednesday to Wednesday, and Friday to Friday)
for the S&P100 Index option and the underlying 11 March 83 - 31 Dec. 89
or $\ln(h_t) = \alpha_0 + \beta_1\ln(h_{t-1}) + \theta_1\frac{u_{t-1}}{\sqrt{h_{t-1}}} + \gamma_1\left[\frac{|u_{t-1}|}{\sqrt{h_{t-1}}} - \left(\frac{2}{\pi}\right)^{1/2}\right]$   (3)
where
RMt denotes the return on the market portfolio
RFt denotes the risk-free rate
ht denotes the conditional variance from the GARCH-type models while
t2 denotes the implied variance from option prices.
The Models (cont’d)
$\ln(h_t) = \alpha_0 + \beta_1\ln(h_{t-1}) + \theta_1\frac{u_{t-1}}{\sqrt{h_{t-1}}} + \gamma_1\left[\frac{|u_{t-1}|}{\sqrt{h_{t-1}}} - \left(\frac{2}{\pi}\right)^{1/2}\right] + \delta\ln(\sigma_{t-1}^2)$   (5)
• If this second set of restrictions holds, then (4) & (5) collapse to
$h_t = \alpha_0 + \delta\sigma_{t-1}^2$   (4′)
$R_{Mt} - R_{Ft} = \lambda_0 + \lambda_1 h_t + u_t$   (8.78)
$h_t = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta_1 h_{t-1}$   (8.79)
$h_t = \alpha_0 + \alpha_1 u_{t-1}^2 + \beta_1 h_{t-1} + \delta\sigma_{t-1}^2$   (8.80)
$h_t = \alpha_0 + \delta\sigma_{t-1}^2$   (8.81)

Variance        λ0        λ1        α0×10-4   α1        β1        δ         Log-L      χ2
specification
(8.79)          0.0072    0.071     5.428     0.093     0.854     -         767.321    17.77
                (0.005)   (0.01)    (1.65)    (0.84)    (8.17)
(8.80)          0.0015    0.043     2.065     0.266     -0.068    0.318     776.204    -
                (0.028)   (0.02)    (2.98)    (1.17)    (-0.59)   (3.00)
(8.81)          0.0056    -0.184    0.993     -         -         0.581     764.394    23.62
                (0.001)   (-0.001)  (1.50)                        (2.94)

Notes: t-ratios in parentheses; Log-L denotes the maximised value of the log-likelihood function in each case. χ2 denotes the value of the test statistic, which follows a χ2(1) in the case of (8.80) restricted to (8.79), and a χ2(2) in the case of (8.80) restricted to (8.81). Source: Day and Lewis (1992). Reprinted with the permission of Elsevier Science.
• But the models do not represent a true test of the predictive ability of
IV.
• There are 729 data points. They use the first 410 to estimate the
models, and then make a 1-step ahead forecast of the following week’s
volatility.
where $\sigma_{t+1}^2$ is the “actual” value of volatility, and $\sigma_{ft}^2$ is the value forecasted for it.
• The relatively small number of such options that exist limits the
circumstances in which implied covariances can be calculated
• EWMA models also cannot allow for the observed mean reversion in
the volatilities or covariances of asset returns that is particularly
prevalent at lower frequencies.
• In the case of the VECH, the conditional variances and covariances would
each depend upon lagged values of all of the variances and covariances
and on lags of the squares of both error terms and their cross products.
• In matrix form, it would be written
$\text{VECH}(H_t) = C + A\,\text{VECH}(\Xi_{t-1}\Xi_{t-1}') + B\,\text{VECH}(H_{t-1})$ with $\Xi_t\mid\psi_{t-1} \sim N(0, H_t)$
• Writing out all of the elements gives the 3 equations as
$h_{11t} = c_{11} + a_{11}u_{1,t-1}^2 + a_{12}u_{2,t-1}^2 + a_{13}u_{1,t-1}u_{2,t-1} + b_{11}h_{11,t-1} + b_{12}h_{22,t-1} + b_{13}h_{12,t-1}$
$h_{22t} = c_{21} + a_{21}u_{1,t-1}^2 + a_{22}u_{2,t-1}^2 + a_{23}u_{1,t-1}u_{2,t-1} + b_{21}h_{11,t-1} + b_{22}h_{22,t-1} + b_{23}h_{12,t-1}$
$h_{12t} = c_{31} + a_{31}u_{1,t-1}^2 + a_{32}u_{2,t-1}^2 + a_{33}u_{1,t-1}u_{2,t-1} + b_{31}h_{11,t-1} + b_{32}h_{22,t-1} + b_{33}h_{12,t-1}$
• Such a model would be hard to estimate. The diagonal VECH is much
simpler and is specified, in the 2 variable case, as follows:
$h_{11t} = \alpha_0 + \alpha_1 u_{1,t-1}^2 + \alpha_2 h_{11,t-1}$
$h_{22t} = \beta_0 + \beta_1 u_{2,t-1}^2 + \beta_2 h_{22,t-1}$
$h_{12t} = \gamma_0 + \gamma_1 u_{1,t-1}u_{2,t-1} + \gamma_2 h_{12,t-1}$
BEKK and Model Estimation for M-GARCH
• Neither the VECH nor the diagonal VECH ensure a positive definite
variance-covariance matrix.
• An alternative approach is the BEKK model (Engle & Kroner, 1995).
• The BEKK Model uses a Quadratic form for the parameter matrices to
ensure a positive definite variance / covariance matrix Ht.
• In matrix form, the BEKK model is $H_t = W'W + A'H_{t-1}A + B'\Xi_{t-1}\Xi_{t-1}'B$
• Model estimation for all classes of multivariate GARCH model is
again performed using maximum likelihood with the following LLF:
$\ell(\theta) = -\frac{TN}{2}\log(2\pi) - \frac{1}{2}\sum_{t=1}^{T}\left(\log|H_t| + \Xi_t' H_t^{-1}\Xi_t\right)$
where N is the number of variables in the system (assumed 2 above),
is a vector containing all of the parameters, and T is the number of obs.
• The off-diagonal elements of Ht, hij,t (i ≠ j), are defined indirectly via the correlations, denoted ρij:
where:
• S is the unconditional correlation matrix of the vector of standardised
residuals (from the first stage estimation), ut = Dt−1ϵt
• ι is a vector of ones
• Qt is an N × N symmetric positive definite variance-covariance matrix
• ◦ denotes the Hadamard or element-by-element matrix multiplication
procedure
• This specification for the intercept term simplifies estimation and
reduces the number of parameters.
The DCC Model – A Possible Specification
• The model may be estimated in a single stage using ML although this will
be difficult. So Engle advocates a two-stage procedure where each variable
in the system is first modelled separately as a univariate GARCH
• A joint log-likelihood function for this stage could be constructed, which
would simply be the sum (over N) of all of the log-likelihoods for the
individual GARCH models
• In the second stage, the conditional likelihood is maximised with respect
to any unknown parameters in the correlation matrix
• The log-likelihood function for the second stage estimation will be of the
form
• where θ1 and θ2 denote the parameters to be estimated in the 1st and 2nd
stages, respectively.
Asymmetric Multivariate GARCH
In Sample
            Unhedged    Naïve Hedge   Symmetric Time-     Asymmetric Time-
            (β = 0)     (β = 1)       Varying Hedge       Varying Hedge
                                      βt = hFC,t/hF,t     βt = hFC,t/hF,t
Return      0.0389      -0.0003       0.0061              0.0060
            {2.3713}    {-0.0351}     {0.9562}            {0.9580}
Variance    0.8286      0.1718        0.1240              0.1211

Out of Sample
            Unhedged    Naïve Hedge   Symmetric Time-     Asymmetric Time-
            (β = 0)     (β = 1)       Varying Hedge       Varying Hedge
                                      βt = hFC,t/hF,t     βt = hFC,t/hF,t
Return      0.0819      -0.0004       0.0120              0.0140
            {1.4958}    {0.0216}      {0.7761}            {0.9083}
Variance    1.4972      0.1696        0.1186              0.1188
Plot of the OHR from Multivariate GARCH
[Figure: time series plot of the optimal hedge ratio (OHR) from the symmetric and asymmetric BEKK models over the sample, ranging roughly between 0.65 and 0.95.]
Conclusions
- The OHR is time-varying and less than 1.
- The M-GARCH OHR provides a better hedge, both in-sample and out-of-sample.
- There is no role for asymmetries in calculating the OHR.
Panel Data
• Panel data, also known as longitudinal data, have both time series and cross-
sectional dimensions.
• They arise when we measure the same collection of people or objects over a
period of time.
• Econometrically, the setup is
$y_{it} = \alpha + \beta x_{it} + u_{it}$
where yit is the dependent variable, $\alpha$ is the intercept term, $\beta$ is a k×1 vector of parameters to be estimated on the explanatory variables, xit; t = 1, …, T; i = 1, …, N.
• The simplest way to deal with this data would be to estimate a single, pooled
regression on all the observations together.
• But pooling the data assumes that there is no heterogeneity – i.e. the same
relationship holds for all the data.
The Advantages of using Panel Data
There are a number of advantages from using a full panel technique when a panel
of data is available.
• We can address a broader range of issues and tackle more complex problems
with panel data than would be possible with pure time series or pure cross-
sectional data alone.
• One approach to making more full use of the structure of the data would be
to use the SUR framework initially proposed by Zellner (1962). This has
been used widely in finance where the requirement is to model several
closely related variables over time.
• A SUR is so-called because the dependent variables may seem unrelated
across the equations at first sight, but a more careful consideration would
allow us to conclude that they are in fact related after all.
• Under the SUR approach, one would allow for the contemporaneous
relationships between the error terms in the equations by using a generalised
least squares (GLS) technique.
• The idea behind SUR is essentially to transform the model so that the error
terms become uncorrelated. If the correlations between the error terms in the
individual equations had been zero in the first place, then SUR on the system
of equations would have been equivalent to running separate OLS
regressions on each equation.
Fixed and Random Effects Panel Estimators
• There are two main classes of panel techniques: the fixed effects estimator
and the random effects estimator.
• The fixed effects model for some variable yit may be written
$y_{it} = \alpha + \beta x_{it} + \mu_i + v_{it}$
• We can think of $\mu_i$ as encapsulating all of the variables that affect yit cross-sectionally but do not vary over time – for example, the sector that a firm operates in, a person's gender, or the country where a bank has its headquarters, etc. Thus we would capture the heterogeneity that is encapsulated in $\mu_i$ by a method that allows for different intercepts for each cross-sectional unit.
• This model could be estimated using dummy variables, which would be termed the least squares dummy variable (LSDV) approach:
$y_{it} = \beta x_{it} + \mu_1 D1_i + \mu_2 D2_i + \cdots + \mu_N DN_i + v_{it}$
where D1i is a dummy variable that takes the value 1 for all observations on the first entity (e.g., the first firm) in the sample and zero otherwise, D2i is a dummy variable that takes the value 1 for all observations on the second entity (e.g., the second firm) and zero otherwise, and so on.
• The LSDV can be seen as just a standard regression model and therefore it
can be estimated using OLS.
• Now the model given by the equation above has N+k parameters to estimate.
In order to avoid the necessity to estimate so many dummy variable
parameters, a transformation, known as the within transformation, is used to
simplify matters.
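A sketch of the within transformation, assuming Python with pandas/statsmodels and a panel DataFrame df with columns 'entity', 'y' and 'x' (all names hypothetical); the slope estimate matches the LSDV slope, although the reported standard errors would need a degrees-of-freedom correction:

```python
import pandas as pd
import statsmodels.api as sm

def within_estimator(df):
    """Entity-demean y and x, then run OLS on the demeaned data (fixed effects slopes)."""
    demeaned = df[["y", "x"]] - df.groupby("entity")[["y", "x"]].transform("mean")
    return sm.OLS(demeaned["y"], demeaned[["x"]]).fit().params

# Equivalent LSDV regression with one dummy per entity:
# sm.OLS(df["y"], pd.get_dummies(df["entity"], dtype=float).join(df[["x"]])).fit()
```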
The Within Transformation
• Time-variation in the intercept terms can be allowed for in exactly the same
way as with entity fixed effects. That is, a least squares dummy variable
model could be estimated
$y_{it} = \beta x_{it} + \lambda_1 D1_t + \lambda_2 D2_t + \cdots + \lambda_T DT_t + v_{it}$
where D1t, for example, denotes a dummy variable that takes the value 1 for
the first time period and zero elsewhere, and so on.
• The only difference is that now, the dummy variables capture time variation
rather than cross-sectional variation. Similarly, in order to avoid estimating a
model containing all T dummies, a within transformation can be conducted to
subtract away the cross-sectional averages from each observation
• Finally, it is possible to allow for both entity fixed effects and time fixed
effects within the same model. Such a model would be termed a two-way
error component model, and the LSDV equivalent model would contain both
cross-sectional and time dummies
Investigating Banking Competition with a Fixed Effects
Model
• The UK banking sector is relatively concentrated and apparently extremely
profitable.
• It has been argued that competitive forces are not sufficiently strong and that
there are barriers to entry into the market.
• A study by Matthews, Murinde and Zhao (2007) investigates competitive
conditions in UK banking between 1980 and 2004 using the Panzar-Rosse
approach.
• The model posits that if the market is contestable, entry to and exit from the
market will be easy (even if the concentration of market share among firms is
high), so that prices will be set equal to marginal costs.
• The technique used to examine this conjecture is to derive testable
restrictions upon the firm's reduced form revenue equation.
• The model also includes several variables that capture time-varying bank-
specific effects on revenues and costs, and these are: RISKASS, the ratio of
provisions to total assets; ASSET is bank size, as measured by total assets; BR
is the ratio of the bank's number of branches to the total number of branches
for all banks.
• Finally, GROWTHt is the rate of growth of GDP, which obviously varies
over time but is constant across banks at a given point in time; $\mu_i$ is a bank-specific fixed effect and vit is an idiosyncratic disturbance term. The contestability parameter, H, is given as $H = \beta_1 + \beta_2 + \beta_3$.
• Unfortunately, the Panzar-Rosse approach is only valid when applied to a
banking market in long-run equilibrium. Hence the authors also conduct a
test for this, which centres on the regression
$\ln ROA_{it} = \alpha_0' + \beta_1'\ln PL_{it} + \beta_2'\ln PK_{it} + \beta_3'\ln PF_{it} + \gamma_1'\ln RISKASS_{it} + \gamma_2'\ln ASSET_{it} + \gamma_3'\ln BR_{it} + \delta_1' GROWTH_t + \mu_i + w_{it}$
Methodology (Cont’d)
• The explanatory variables for the equilibrium test regression are identical to
those of the contestability regression but the dependent variable is now the
log of the return on assets (lnROA).
• Equilibrium is argued to exist in the market if $\beta_1' + \beta_2' + \beta_3' = 0$.
• Matthews et al. employ a fixed effects panel data model which allows for
differing intercepts across the banks, but assumes that these effects are fixed
over time.
• The fixed effects approach is a sensible one given the data analysed here
since there is an unusually large number of years (25) compared with the
number of banks (12), resulting in a total of 219 bank-years (observations).
• The data employed in the study are obtained from banks' annual reports and
the Annual Abstract of Banking Statistics from the British Bankers
Association. The analysis is conducted for the whole sample period, 1980-
2004, and for two sub-samples, 1980-1991 and 1992-2004.
• The null hypothesis that the bank fixed effects are jointly zero (H0: $\mu_i$ = 0) is
rejected at the 1% significance level for the full sample and for the second
sub-sample but not at all for the first sub-sample.
• Overall, however, this indicates the usefulness of the fixed effects panel
model that allows for bank heterogeneity.
• The main focus of interest in the table on the previous slide is the equilibrium
test, and this shows slight evidence of disequilibrium (E is significantly
different from zero at the 10% level) for the whole sample, but not for either
of the individual sub-samples.
• Thus the conclusion is that the market appears to be sufficiently in a state of
equilibrium that it is valid to continue to investigate the extent of competition
using the Panzar-Rosse methodology. The results of this are presented on the
following slide.
• The value of the contestability parameter, H, which is the sum of the input
elasticities, falls in value from 0.78 in the first sub-sample to 0.46 in the
second, suggesting that the degree of competition in UK retail banking
weakened over the period.
• However, the results in the two rows above show that the null hypotheses that H = 0 and H = 1 can both be rejected at the 1% significance level for both sub-samples, indicating that the market is best characterised by monopolistic competition.
• As for the equilibrium regressions, the null hypothesis that the fixed effects dummies (ηi) are jointly zero is strongly rejected, vindicating the use of the fixed effects panel approach and suggesting that the base levels of the dependent variables differ across banks.
• Finally, the additional bank control variables all appear to have intuitively
appealing signs. For example, the risk assets variable has a positive sign, so
that higher risks lead to higher revenue per unit of total assets; the asset
variable has a negative sign, and is statistically significant at the 5% level or
below in all three periods, suggesting that smaller banks are more profitable.
The Random Effects Model
• Unlike the fixed effects model, there are no dummy variables to capture the
heterogeneity (variation) in the cross-sectional dimension.
• Instead, this occurs via the εi terms.
• Note that this framework requires the assumptions that the new cross-sectional error term, εi, has zero mean, is independent of the individual observation error term vit, has constant variance, and is independent of the explanatory variables.
• The parameters (α and the β vector) are estimated consistently but inefficiently by OLS, and the conventional formulae would have to be modified as a result of the cross-correlations between error terms for a given cross-sectional unit at different points in time.
• Instead, a generalised least squares (GLS) procedure is usually used. The
transformation involved in this GLS procedure is to subtract a weighted mean
of the yit over time (i.e. part of the mean rather than the whole mean, as was
the case for fixed effects estimation).
• Define the ‘quasi-demeaned’ data as yit* = yit − θȳi, and similarly for xit.
• θ will be a function of the variance of the observation error term, σv², and of the variance of the entity-specific error term, σε²:

    θ = 1 − σv / √(Tσε² + σv²)
• This transformation will be precisely that required to ensure that there are no
cross-correlations in the error terms, but fortunately it should automatically
be implemented by standard software packages.
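• As an illustration, the following minimal Python sketch (simulated data, illustrative variable names) applies the quasi-demeaning transformation by hand and then estimates the transformed equation by OLS; in practice the variance components are estimated rather than known, and the whole procedure is automated by standard packages:

```python
# Minimal sketch of the random effects (quasi-demeaning) GLS transformation,
# assuming a balanced panel. Data and variance components are simulated
# purely for illustration.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(1)
N, T = 50, 10
entity = np.repeat(np.arange(N), T)
eps_i = rng.normal(scale=1.0, size=N)[entity]        # entity-specific random effect
x = rng.normal(size=N * T)
y = 2.0 + 0.5 * x + eps_i + rng.normal(scale=0.5, size=N * T)
df = pd.DataFrame({"entity": entity, "y": y, "x": x})

# Variance components (taken from the simulation here; normally estimated)
sigma2_v, sigma2_eps = 0.25, 1.0
theta = 1 - np.sqrt(sigma2_v / (T * sigma2_eps + sigma2_v))

# Quasi-demean: subtract theta times the entity mean from each variable
means = df.groupby("entity")[["y", "x"]].transform("mean")
y_star = df["y"] - theta * means["y"]
x_star = df["x"] - theta * means["x"]
const_star = 1 - theta                               # the intercept is transformed too

X_star = np.column_stack([np.full(len(df), const_star), x_star])
res = sm.OLS(y_star, X_star).fit()
print(res.params)                                    # roughly (2.0, 0.5)
```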
• Just as for the fixed effects model, with random effects, it is also
conceptually no more difficult to allow for time variation than it is to allow
for cross-sectional variation.
• In the case of time-variation, a time period-specific error term is included and
again, a two-way model could be envisaged to allow the intercepts to vary
both cross-sectionally and over time.
Fixed or Random Effects?
• It is often said that the random effects model is more appropriate when the
entities in the sample can be thought of as having been randomly selected
from the population, but a fixed effect model is more plausible when the
entities in the sample effectively constitute the entire population.
• Also, since there are fewer parameters to be estimated with the random
effects model (no dummy variables or within transform to perform), and
therefore degrees of freedom are saved, the random effects model should
produce more efficient estimation than the fixed effects approach.
• However, the random effects approach has a major drawback which arises from the fact that it is valid only when the composite error term ωit is uncorrelated with all of the explanatory variables.
Fixed or Random Effects? (Cont’d)
• This assumption is more stringent than the corresponding one in the fixed effects case, because with random effects we require both εi and vit to be independent of all of the xit.
• This can also be viewed as a consideration of whether any unobserved
omitted variables (that were allowed for by having different intercepts for
each entity) are uncorrelated with the included explanatory variables. If they
are uncorrelated, a random effects approach can be used; otherwise the fixed
effects model is preferable.
• A test for whether this assumption is valid for the random effects estimator is
based on a slightly more complex version of the Hausman test.
• If the assumption does not hold, the parameter estimates will be biased and
inconsistent.
• To see how this arises, suppose that we have only one explanatory variable, x2it, that varies positively with yit, and also with the error term, ωit. The estimator will ascribe all of any increase in y to x when in reality some of it arises from the error term, resulting in biased coefficients.
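• A minimal sketch of how the Hausman statistic itself can be computed is given below; it assumes (purely for illustration) that the fixed and random effects slope estimates and their covariance matrices have already been obtained from some panel estimation routine, and the numbers shown are invented:

```python
# Hypothetical sketch of the Hausman test statistic, assuming the fixed and
# random effects slope estimates (b_fe, b_re) and their covariance matrices
# (V_fe, V_re) are already available from a panel estimation routine.
import numpy as np
from scipy import stats

def hausman(b_fe, b_re, V_fe, V_re):
    """H = (b_fe - b_re)' [V_fe - V_re]^{-1} (b_fe - b_re), asymptotically chi-squared(k)."""
    d = b_fe - b_re
    H = float(d @ np.linalg.inv(V_fe - V_re) @ d)
    p_value = 1 - stats.chi2.cdf(H, df=len(d))
    return H, p_value

# Illustrative numbers only
b_fe = np.array([0.52, -0.11])
b_re = np.array([0.48, -0.09])
V_fe = np.array([[0.0040, 0.0005], [0.0005, 0.0030]])
V_re = np.array([[0.0030, 0.0004], [0.0004, 0.0025]])
H, p = hausman(b_fe, b_re, V_fe, V_re)
print(f"Hausman statistic = {H:.2f}, p-value = {p:.3f}")
# A small p-value suggests the random effects orthogonality assumption fails,
# favouring the fixed effects specification.
```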
• There may also be differences in policies for credit provision dependent upon
the nature of the formation of the subsidiary abroad – i.e. whether the
subsidiary's existence results from a take-over of a domestic bank or from the
formation of an entirely new startup operation (a ‘greenfield investment’).
• The data cover the period 1993-2000 and are obtained from BankScope.
• These are: weakness of the parent bank, defined as loan loss provisions made by the parent bank; solvency, the ratio of equity to total assets; liquidity, the ratio of liquid assets to total assets; size, the ratio of total bank assets to total banking assets in the given country; profitability, the return on assets; and efficiency, the net interest margin.
• α and the β's are parameters (or vectors of parameters in the cases of β4 and β5), εi is the unobserved random effect that varies across banks but not over time, and vit is an idiosyncratic error term.
• de Haas and van Lelyveld discuss the various techniques that could be
employed to estimate such a model.
• OLS is considered to be inappropriate since it does not allow for differences
in average credit market growth rates at the bank level.
• A model allowing for entity-specific effects (i.e. a fixed effects model that
effectively allowed for a different intercept for each bank) is ruled out on the
grounds that there are many more banks than time periods and thus too many
parameters would be required to be estimated.
• They also argue that these bank-specific effects are not of interest to the
problem at hand, which leads them to select the random effects panel model.
• This essentially allows for a different error structure for each bank. A Hausman test is conducted, and shows that the random effects model is valid since the bank-specific effects εi are found “in most cases not to be significantly correlated with the explanatory variables.”
• The main result is that during times of banking disasters, domestic banks
significantly reduce their credit growth rates (i.e. the parameter estimate on
the crisis variable is negative for domestic banks), while the parameter is
close to zero and not significant for foreign banks.
• This indicates that, as the authors expected, when foreign banks have fewer
viable lending opportunities in their own countries and hence a lower
opportunity cost for the loanable funds, they may switch their resources to
the host country.
• Lending rates, both at home and in the host country, have little impact on
credit market share growth.
Analysis of Results (Cont’d)
• Overall, both home-related (‘push’) and host-related (‘pull’) factors are found
to be important in explaining foreign bank credit growth.
• Dickey-Fuller and Phillips-Perron unit root tests have low power, especially
for modest sample sizes
• It is believed that more powerful versions of the tests can be employed when
time-series and cross-sectional information is combined – as a result of the
increase in sample size
• We could increase the number of observations by increasing the sample period, but these data may not be available, or may have structural breaks
• A valid application of the test statistics is much more complex for panels than for single series
• Two important issues to consider:
– The design and interpretation of the null and alternative hypotheses needs careful
thought
– There may be a problem of cross-sectional dependence in the errors across the
unit root testing regressions
• Early studies that assumed cross-sectional independence are sometimes
known as ‘first generation’ panel unit root tests
The MADF Test
• We could run separate unit root test regressions for each series but estimate them as a system using Zellner's seemingly unrelated regression (SUR) approach, which we might term the multivariate ADF (MADF) test
• This method can only be employed if T >> N, and Taylor and Sarno (1998)
provide an early application to tests for purchasing power parity
• However, that technique is now rarely used, researchers preferring instead to
make use of the full panel structure
• A key consideration is the dimensions of the panel – is the situation that T is
large or that N is large or both? If T is large and N small, the MADF
approach can be used
• But as Breitung and Pesaran (2008) note, in such a situation one may
question whether it is worthwhile to adopt a panel approach at all, since for
sufficiently large T, separate ADF tests ought to be reliable enough to render
the panel approach hardly worth the additional complexity.
• Levin, Lin and Chu (2002) – LLC – develop a test based on an equation of the form

    Δyit = αi + θt + δit + ρyi,t−1 + Σk φik Δyi,t−k + uit
• The model is very general since it allows for both entity-specific and time-
specific effects through αi and θt respectively as well as separate
deterministic trends in each series through δit, and the lag structure to mop
up autocorrelation in Δy
• One of the reasons that unit root testing is more complex in the panel
framework in practice is due to the plethora of ‘nuisance parameters’ in the
equation which are necessary to allow for the fixed effects (i.e. αi, θt, δit)
• These nuisance parameters will affect the asymptotic distribution of the test
statistics and hence LLC propose that two auxiliary regressions are run to
remove their impacts
• Breitung (2000) develops a modified version of the LLC test which does
not include the deterministic terms and which standardises the residuals
from the auxiliary regression in a more sophisticated fashion.
• Under the LLC and Breitung approaches, only evidence against the non-
stationary null in one series is required before the joint null will be rejected
• Breitung and Pesaran (2008) suggest that the appropriate conclusion when
the null is rejected is that ‘a significant proportion of the cross-sectional
units are stationary’
• Especially in the context of large N, this might not be very helpful since no
information is provided on how many of the N series are stationary
• This difficulty led Im, Pesaran and Shin (2003) – IPS – to propose an
alternative approach where the null and alternative hypotheses are now H0:
ρi = 0 ∀ i and H1: ρi < 0, i = 1, 2, . . . , N1; ρi = 0, i = N1 + 1, N1 + 2, . . . , N
• So the null hypothesis still specifies all series in the panel as nonstationary,
but under the alternative, a proportion of the series (N1/N) are stationary,
and the remaining proportion ((N − N1)/N) are nonstationary
• No restriction that all of the ρi are identical is imposed
• The statistic in this case is constructed by conducting separate unit root
tests for each series in the panel, calculating the ADF t-statistic for each one
in the standard fashion, and then taking their cross-sectional average
• This average is then transformed into a standard normal variate under the
null hypothesis of a unit root in all the series
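• The following minimal Python sketch (simulated data) illustrates the core of the IPS construction: an ADF t-statistic for each series and their cross-sectional average. The final standardisation into a normal variate uses expected values and variances of the ADF t-statistic tabulated by IPS, which are omitted here:

```python
# Minimal sketch of the core of the IPS statistic: run an ADF test on each
# series in the panel and average the resulting t-statistics (t-bar).
# Data are simulated for illustration (N random walks, i.e. unit roots).
import numpy as np
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(2)
N, T = 10, 100
panel = np.cumsum(rng.normal(size=(N, T)), axis=1)

t_stats = [adfuller(series, regression="c", autolag="AIC")[0] for series in panel]
t_bar = np.mean(t_stats)
print(f"Cross-sectional average ADF t-statistic (t-bar) = {t_bar:.3f}")
```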
• While IPS’s heterogeneous panel unit root tests are superior to the
homogeneous case when N is modest relative to T, they may not be
sufficiently powerful when N is large and T is small, in which case the LLC
approach may be preferable.
The Maddala and Wu (1999) and Choi (2001) Tests
• Maddala and Wu (1999) and Choi (2001) developed a slight variant on the
IPS approach based on an idea dating back to Fisher (1932)
• Unit root tests are conducted separately on each series in the panel and the
p-values associated with the test statistics are then combined
• If we call these p-values pvi, i = 1, 2, . . . , N, then under the null hypothesis of a unit root in each series, pvi will be distributed uniformly over the [0,1] interval and hence, for given N as T → ∞, the statistic λ = −2 Σi ln(pvi) will be asymptotically distributed as a χ2 with 2N degrees of freedom
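• A minimal Python sketch of this Fisher-type combination of p-values is given below (simulated data); the combined statistic is compared with a χ2 distribution with 2N degrees of freedom:

```python
# Minimal sketch of the Fisher-type (Maddala-Wu) panel unit root test:
# combine the p-values from individual ADF tests. Under the joint unit root
# null the statistic is asymptotically chi-squared with 2N degrees of freedom.
# Data are simulated for illustration (N independent random walks).
import numpy as np
from scipy import stats
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(3)
N, T = 8, 150
panel = np.cumsum(rng.normal(size=(N, T)), axis=1)

p_values = [adfuller(series, regression="c", autolag="AIC")[1] for series in panel]
fisher_stat = -2.0 * np.sum(np.log(p_values))
p_value = 1 - stats.chi2.cdf(fisher_stat, df=2 * N)
print(f"Fisher statistic = {fisher_stat:.2f}, p-value = {p_value:.3f}")
```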
• Testing for cointegration in panels is complex since one must consider the
possibility of cointegration across groups of variables (what we might term
‘cross-sectional cointegration’) as well as within the groups
• Most of the work so far has relied upon a generalisation of the single
equation methods of the Engle-Granger type following the pioneering work
by Pedroni (1999, 2004)
• For a set of M variables yit and xm,i,t that are individually integrated of order one and thought to be cointegrated, the model is

    yit = αi + δit + β1i x1,i,t + β2i x2,i,t + . . . + βMi xM,i,t + uit
• The residuals from this regression are then subjected to separate Dickey-
Fuller or augmented Dickey-Fuller type regressions for each group
• The null hypothesis is that the residuals from all of the test regressions are
unit root processes and therefore that there is no cointegration.
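• The following minimal Python sketch (simulated data) illustrates the spirit of this residual-based approach: a group-by-group cointegrating regression followed by an ADF-type test on its residuals. The actual Pedroni statistics pool these group results using tabulated adjustment terms that are omitted here:

```python
# Minimal sketch of a residual-based (Engle-Granger style) panel cointegration
# check in the spirit of Pedroni: estimate the potentially cointegrating
# regression for each group, then apply an ADF-type test to the residuals.
# Data are simulated for illustration (each y is cointegrated with its x).
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

rng = np.random.default_rng(4)
N, T = 6, 200
residual_t_stats = []
for i in range(N):
    x = np.cumsum(rng.normal(size=T))            # I(1) regressor
    y = 1.0 + 0.8 * x + rng.normal(size=T)       # cointegrated with x
    res = sm.OLS(y, sm.add_constant(x)).fit()    # group-specific regression
    residual_t_stats.append(adfuller(res.resid)[0])

print("ADF t-statistics on the group residuals:", np.round(residual_t_stats, 2))
```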
• To what extent are economic growth and the sophistication of the country’s
financial markets linked?
• Excessive government regulations may impede the development of the
financial markets and consequently economic growth will be slower
• On the other hand, if economic agents are able to borrow at reasonable rates
of interest or raise funding easily on the capital markets, this can increase
the viability of real investment opportunities
• Given that long time-series are typically unavailable for developing
economies, traditional unit root and cointegration tests that examine the link
between these two variables suffer from low power
• This provides a strong motivation for the use of panel techniques, as in the
study by Christopoulos and Tsionas (2004)
• The results are much stronger than for the single series unit root tests and show conclusively that all four series are non-stationary in levels but stationary in differences
• The LLC approach is used along with the Harris-Tzavalis technique, which
is broadly the same as LLC but has slightly different correction factors in
the limiting distribution
• These techniques are based on a unit root test on the residuals from the
potentially cointegrating regression
• Christopoulos and Tsionas investigate the use of panel cointegration tests with fixed effects, and with both fixed effects and a deterministic trend in the test regressions
• These are applied to the regressions with y and, separately, with F as the dependent variable
• The results quite strongly demonstrate that when the dependent variable is
output, the LLC approach rejects the null hypothesis of a unit root in the
potentially cointegrating regression residuals when fixed effects only are
included in the test regression, but not when a trend is also included.