TS - Lectures AAU August 7 - 9 2018

sourafel.girma@nottingham.ac.
uk Applied Time Series Analysis Addis Ababa University, 2018
This is a primer on applied time series analysis where technical details are kept to a minimum. The
preparation of these lecture notes has benefited from the following books:
1. Becketti, S., 2013. Introduction to time series using Stata. College Station, TX: Stata Press.
2. Boffelli, S. And Urga, G., 2016. Financial Econometrics using Stata. Stata Press Publication.
3. Enders, W., 2015. Applied Econometric Time Series. (Wiley Series in Probability and Statistics)
4. Brooks, C., 2014. Introductory Econometrics for Finance. Cambridge university press.
5. Hill, R.C., Griffiths, W.E., Lim, G.C. and Lim, M.A., 2008. Principles of Econometrics. Hoboken, NJ:
Wiley.
6. Tsay, R.S., 2005. Analysis of Financial Time Series. John Wiley & Sons.
7. Time Series Reference Manual, 2017, RELEASE 15. ®. A Stata Press Publication, College Station, Texas.
Page 1 of 190
sourafel.girma@nottingham.ac.uk Applied Time Series Analysis Addis Ababa University, 2018
Day 1: August 7, 2018 Univariate Time Series Analysis:

Basic concepts; forecasting
Day 2: August 8, 2018 Mutivariate time series analysis:
VAR;SVAR; VECM
Day 3: August 9, 2018 Volatility Modelling:
ARCH, GARCH; MGARCH
Daily Program:
Lecture 1: 9:00-10:30
Lecture 2 :11:00-12:30
Lecture 3:14:00-15:30
Page 2 of 190
PART I:
Basic concepts
Learning objectives:
Based on the material discussed in the lectures, you should be able to:
1. Describe the nature of time series data.
2. Explain what the components of a time series data are.
3. Explain what is meant by lags, leads and differences.
4. Explain what is meant by stationary time series, and test for stationarity.
5. Understand the use of autocorrelation and partial autocorrelation functions.
6. Use correlograms and cross-correlograms to study the features of time series variables.
7. Explain the nature of regressions involving lagged variables including finite distributed lags, autoregressive
and autoregressive distributed lags models.
8. Compute impact, interim and long run multipliers for both finite distributed lags and autoregressive
distributed lags models.
9. Understand the use of ARIMA modcls
10. Test for serial correlation in ARIMA models using Portmanteau (Q) test.
Page 3 of 190
Basic notions
 Several applications in economics involve the analysis of time series data; e.g. £/$ exchange rate, stock prices
and unemployment rates.
 Time series data consist of observations on a given economic unit (e.g. a country, a stock market, a currency, and
an industry) at several points in time with a natural ordering according to time.
 Knowing about the frequency of data collection is important. Here frequency refers to the length between
two successive observations. Commonly used frequencies are annual, quarterly, monthly, weekly and daily.
 Another important issue is whether the time series variable is stationary or nonstationary- these concepts
will be explained later.
Page 4 of 190
 When one presents time series data, or writes a time series regression model, the relevant time series variable
should be explicitly indexed with a time indicator.
Some examples of notations of times series data:
Example Notation Meaning
Example 1 𝑋𝑡 : t=1,2,….T The values of variable X from the first period to period T. Note
here the data frequency is not given explicitly. This is a generic
notation. 𝑋2 would correspond to the observation at period 2.
Here we don’t know the data frequency, but data ordering still
matter- e.g. we know that 𝑋2 is observed right after 𝑋1 .
Example 2 𝑋𝑡 : t=1971,…,2015 The values of X from 1971 to 2015. In this example, it should
be clear that the data frequency is yearly.
Example 3 𝑋𝑡 : t=2001q1,…,2015q4 The values of X from the first quarter of 2001 to the last
quarter of 2015. Here the data frequency is quarterly.
Example 4 𝑋𝑡 : t=2000m3,…,2010m12 The values of X from the third month (i.e. March) of 2000 to
December 2012. Here it is apparent that the data frequency is
monthly.
Example 5 𝑋𝑡 : t=2005w10,…,2014w51 The values of X from week 10 of 2005 to the penultimate week
of 2014. Here the data frequency is weekly.
Page 5 of 190
Some examples of time series data:

In order to emphasise the importance of data frequency, consider the same underlying variable, that is the
seasonally adjusted number (given thousands) of overseas visitors to the UK over the time period 1986-2016.
Example 1: This example pertains to our variable given at yearly frequency
Data snapshot: Figure 1:
Time series line of yearly data
35000
Overseas Vistors to the UK: Thousands
30000
25000
20000
15000
1986 1990 1995 2000 2005 2010 2015

year
Page 6 of 190
Example 2:- Same underlying data, but this time data disaggregated at quarterly frequency. Compared to the
yearly data, this is a high frequency data- because data “arrive” more frequently
Time series line of quarterly data
10000
8000
6000
4000
2000
1985q1 1990q1 1995q1 2000q1 2005q1 2010q1 2015q1

Year and quarter
Page 7 of 190
Example 3: At monthly frequency- a still higher frequency data.

Time series line of monthly data
3500
3000
2500
2000
1500
1000
1985m1 1990m1 1995m1 2000m1 2005m1 2010m1 2015m1

Year amd month
Page 8 of 190
 Apart from the frequency of the data, there are other considerations when working with time series data.
1. One important consideration is whether the data record a snapshot at a point time- e.g. closing value of
FT100 stock index, or the average value over the day?
2. Were the data recorded at regular interval or intermittently?
3. Are there any structural breaks in the data?
4. Were the data seasonally adjusted or not?
5. How are business holidays treated?
6. Are the data revised over time? If so how would this affect the analysis?
7. Are variables in the dataset given in current or real prices?
Page 9 of 190
Components of time series variables:

 The systematically predictable component of the time series is called the signal, whereas the component
which makes the time series difficult to predict (or “muddies the water”) is called the noise.
 The signal consists of three major components- trend, seasonal and cycle. It is important to note that
a given time series may incorporate some or all of these components.
 Trend is simply the persistent tendency (if any) of the time series to increase or decrease over time.
Trends may change over time both in magnitude and direction.
Figure 4: Trends in UK GDP Figure 5: Trends in log UK GDP
500000
13
400000
Log UK GDP market prices- £m
12
UK GDP- market prices:£m
300000
11
200000
10
100000
9
0
8
1955q3 1970q3 1985q3 2000q3 2015q3 1955q3 1970q3 1985q3 2000q3 2015q3
Year and quarter Year and quarter
Page 10 of 190
 Seasonality is a type of cyclicality where the time series has a tendency to increase or decrease in predictable
or regular ways, for example at the same quarter of year or the day of the week (e.g. see Fig. 6). This gives the
time series a smoothly oscillating character. When a time series oscillates around a trend, we say that it exhibits
cycles. In contrast to seasonality, the timing and duration of oscillations of cycles tend to be irregular or
aperiodic (e.g. se Fig 7).
Figure 6: Change in fixed capital formation (UK) Figure 7: UK male unemployment rate
20000
12
10000
10
unrate_male
8
IN
0
6
-10000
4
-20000
2
1990q1 1995q1 2000q1 2005q1 2010q1 2015q1 1970m1 1980m1 1990m11 2000m1 2010m1 2016m1
Year and quarter Year and month
Page 11 of 190
2. Lags, leads and differences

 The concept of a lagged variable is fundamental to time series data. Suppose 𝑋𝑡 : t=1, 2….T is the value
of the variable in period t, with T observations in total. Then 𝑋𝑡−1 : t=2….T is the value of the variable in
period t-1 or lagged one period. This is called lagged X, and it will have T-1 observations Lagged X is a
variable in its own right, and we can give it any name when working with a computer software, e.g. LX where
𝐿𝑋𝑡 =𝑋𝑡−1 . Similarly, we can create the value of X lagged by two periods say 𝐿2𝑋𝑡 =𝑋𝑡−2 , and so on.
 For illustration purpose we will use log of weekly £/$ exchange rate (logxr)
Data snapshot: Figure 8: Time series weekly £/S xrate
£/$ exchange rate logxr: log £/$ exchange rate
2.2
.8
2
.6
1.8
GBPUSD
logxr
1.6
.4
1.4
1.2
.2
2000w1 2004w1 2008w1 2012w1 2016w30 2000w1 2004w1 2008w1 2012w1 2016w30
Year and week Year and week
Page 12 of 190
A generic example with T=8 and q=3 Lags of weekly £/S xrate
t 𝑿𝒕 𝑳𝑿𝒕 = 𝑿𝒕−𝟏 𝑳𝟐𝑿𝒕 𝑳𝟑𝑿𝒕
= 𝑿𝒕−𝟐 = 𝑿𝒕−𝟑
1 𝑋1 . . .
2 𝑋2 𝑋1 . .
3 𝑋3 𝑋2 𝑋1 .
4 𝑋4 𝑋3 𝑋2 𝑋1
5 𝑋5 𝑋4 𝑋3 𝑋2
6 𝑋6 𝑋5 𝑋4 𝑋3
7 𝑋7 𝑋6 𝑋5 𝑋4
8 𝑋8 𝑋7 𝑋6 𝑋5
Page 13 of 190
 A related concept of is one of lead or forward variables. Again suppose 𝑋𝑡 : t=1, 2….T is the value of the
variable in period t, with T observations in total. Then 𝑋𝑡+1 : t=1….T-1 is the value of the variable in period
t+1 or forwarded by one period. This is called lead X, and it will have T-1 observations in total since when
t=T, 𝑋𝑡+1 =𝑋𝑇+1 is not observed (assuming data ends at t=T). We can give lead variables any name, e.g. FX
where 𝐹𝑋𝑡 =𝑋𝑡+1 . By the same token we can create the value of X forwarded by two periods say 𝐹2𝑋𝑡 =𝑋𝑡+2
, in which case we have T-2 observations; and so on.
A generic example with T=8 and q=3)
t 𝑿𝒕 𝑭𝑿𝒕 = 𝑿𝒕+𝟏 𝑭𝟐𝑿𝒕 = 𝑿𝒕∓𝟐 𝑭𝟑𝑿𝒕 = 𝑿𝒕+𝟑
1 𝑋1 𝑋2 𝑋3 𝑋4
2 𝑋2 𝑋3 𝑋4 𝑋5
3 𝑋3 𝑋4 𝑋5 𝑋6
4 𝑋4 𝑋5 𝑋6 𝑋7
5 𝑋5 𝑋6 𝑋7 𝑋8
6 𝑋6 𝑋7 𝑋8 .
7 𝑋7 𝑋8 . .
8 𝑋8 . . .
Page 14 of 190
Examples of leads creation with log £/$ exchange rate)
Page 15 of 190
 The concept of a differenced variable is fundamental to time series data. For 𝑋𝑡 : t=1,2….T, 𝑋𝑡 − 𝑋𝑡−1 :
t=2….T is the difference between the current value of the variable and its value in period t-1. That is it is the
(first) difference between X and lagged X, and this will have T-1 observations. We can give the first
differenced any name, e.g. DX where 𝐷𝑋𝑡 =𝑋𝑡 − 𝑋𝑡−1 .
 If X measured in logs, the log first difference would give the proportional (%) change in X between period
t-1 and period t. Thus, for example, if X = log of £/S exchange rate, and t denotes week, 𝐷𝑋𝑡 gives weekly
appreciation (𝐷𝑋𝑡 > 0) or depreciation (𝐷𝑋𝑡 < 0) of £ against $. In finance, the log first difference of a
variable is also known as the return, reflecting the return an investor purchasing the share would earn.
Generic example with T=8
t 𝑿𝒕 𝑳𝑿𝒕 = 𝑿𝒕−𝟏 𝑫𝑿𝒕 = 𝑫𝟐𝑿𝒕 =
𝑿𝒕 − 𝑿𝒕−𝟏 𝑿𝒕 − 𝑿𝒕−𝟐
1 𝑋1 . .
2 𝑋2 𝑋1 𝑋2 − 𝑋1 .
3 𝑋3 𝑋2 𝑋3 − 𝑋2 𝑋3 − 𝑋1
4 𝑋4 𝑋3 𝑋4 − 𝑋3 𝑋4 − 𝑋2
5 𝑋5 𝑋4 𝑋5 − 𝑋4 𝑋5 − 𝑋3
6 𝑋6 𝑋5 𝑋6 − 𝑋5 𝑋6 − 𝑋4
7 𝑋7 𝑋6 𝑋7 − 𝑋6 𝑋7 − 𝑋5
8 𝑋8 𝑋7 𝑋8 − 𝑋7 𝑋8 − 𝑋6
Page 16 of 190
Examples of returns (log difference) construction Figure 9: Weekly % changes in £/$ exchange
£/$ xrate % changes in £/$ exchange rate (log first difference)
rate (log first difference
.05
0
-.05
-.1
2000w1 2002w1 2004w1 2006w1 2008w1 2010w1 2012w1 2014w1 2016w1
Year and week
Page 17 of 190
3. Stationarity
An assumption that we maintain throughout this module is that the variables in our analysis are stationary.
The formal testing for stationarity, and techniques to deal with nonstationary variables, are beyond the scope
of the module. Hence our discussion of stationarity will be informal (and imprecise), but insightful nonetheless.
In general a stationary variable is “one that is not explosive, nor trending, and nor wandering aimlessly without
returning to its mean”. We illustrate this concept with some graphs:
Time series of a stationary variable:
Although the variable shows changes from
time to time, it has the tendency to revert to
some stationary point- a mean value of
around 10 to be more precise.
Page 18 of 190
An example of a time series of a nonstationary An example of a time series of a nonstationary

variable that is ‘‘slow-turning’’ or ‘‘wandering’’ variable that trends:
Page 19 of 190
Most macro and financial variable are nonstationary, but their first differences or returns tend to be stationary.
For example consumer price indices are nonstationary, but their log first difference (i.e. the rate of inflation) tend
to be stationary. Looking at Figure 10 below, the FT100 index series looks nonstationary (it wanders), but return
to FT100 series appears stationary.
Figure 10: FT100 index and its return
7000
.2
.1
6000
FT100 index return

FT100
0
5000
-.1
4000
-.2
3000
2000w1 2004w1 2008w1 2012w1 2016w30

Year and week
FT100 FT100 index return
Page 20 of 190
Important remark:
When the variables of interest are stationary, standard regression analysis can be carried out on them.
By contrast, with nonstationary variables (also known as variables with unit roots or random walk
variables), standard regression analysis can yield wrong and misleading results. This is known as the
spurious regression problem.
4. Testing for stationarity
The autoregressive process

𝑦𝑡 = ρ𝑦𝑡−1 + 𝜀𝑡
is stationary when | ρ | < 1. When  = 1, it becomes the nonstationary random walk or unit root process. It is
also known as the stochastic trend model:
𝑦𝑡 = 𝑦𝑡−1 + 𝜀𝑡
If you add a constant term to a unit root process, you get the random walk model with drift:
𝑦𝑡 = 𝜇 + 𝑦𝑡−1 + 𝜀𝑡
Page 21 of 190
Finally a ransom walk process with trend is one that includes a deterministic trend (a function of the time
variable t itself):
𝑦𝑡 = 𝜇 + 𝑦𝑡−1 + 𝛿𝑡 + 𝜀𝑡
Some examples from simulated data:
Random walk with trend
25
20
15
y2
10
5
0
1960q1 1970q1 1980q1 1990q1 2000q1 2010q1

t
Page 22 of 190
Random walk
3
2
1
y1
0
-1
-2
1960q1 1970q1 1980q1 1990q1 2000q1 2010q1

t
Page 23 of 190
Random walk with drift
30
20
y3
10
0
1960q1 1970q1 1980q1 1990q1 2000q1 2010q1

t
Page 24 of 190
Stationary process with trend
10
5
y4
0
-5
1960q1 1970q1 1980q1 1990q1 2000q1 2010q1

t
A stationary process is also known as an integrated process of order 0 – I(0). Time series (especially macro
time series) variables tend to be nonstationary. Nonstationary variables that achieve stationarity after being
differenced one [twice] are known as integrated process of order 1- I(1) [of order 2 – I(2)]. For example looking
at Figure 10, FT100 index is I(1) non-stationary because its first difference is stationary.
Page 25 of 190
So far we relied on visual inspections to “determine” whether a series is stationary or not. Visual inspection can
be misleading however. Beside it is often difficult to just eyeball series like y4 above and tell whether they are
random walk processes with drift or they are stationary variables with deterministic trend. In the latter case de-
trending the variable would be more appropriate rather than first-differenced it. It is therefore crucial to carry out
formal stationarity tests as a matter of routine.
Augmented Dickey-Fuller tests

There are many tests for determining whether a time series is stationary or nonstationary: the most popular one
is the Dickey–Fuller unit-root test
The Dickey-Fuller test comes in different flavours: without a constant term (like above model), including a
constant and or a trend, and allowing for serial correlation in the errors which gives rise to the augmented
Dickey–Fuller test (ADF)
The null hypothesis of ADF test is that the process is nonstationary, i.e. it contains a unit root. The alternative
hypothesis posits that the time series is stationary. For example when carrying out ADF tests with trend, the null
hypothesis is that the series is a random walk process with a deterministic trend, whereas the alternative hypothesis
Page 26 of 190
states that the process is a stationary process with a deterministic trend. So rejection of the null is evidence in
favour of stationarity.
As a practical guide for conducting the Dickey-Fuller testing procedure, first plot the time series of the variable
and select a suitable Dickey-Fuller test based on a visual inspection of the plot:
a. If the series appears to be wandering or fluctuating around a sample average of zero, use a Dickey-Fuller
test without a constant term (or “drift”).
b. If the series appears to be wandering or fluctuating around a sample average which is nonzero, use a
Dickey-Fuller test with a drift.
c. If the series appears to be wandering or fluctuating around a linear trend, use a test with a trend.
Some examples:
Time series plot based on US quarterly rates of inflation (inf) and unemployment data (du) spanning the period
1970Q1-2011Q4
Page 27 of 190
.08
.2
.06
.1
change in unemployment rate
Inflation rate
.04
0
.02
-.1
0
-.2
1970q1 1980q1 1990q1 2000q1 2010q1 1970q1 1980q1 1990q1 2000q1 2010q1
year and quarter year and quarter
None of the variables exhibit trend. Inflation (unemployment rate) appear to fluctuate around a non-zero (zero)
mean. So let’s try ADF tests with and with drift, allowing for some residual serial correlation (e.g. order 4).
Page 28 of 190
ADF test based on FT100 confirm non-stationarity of series:
Page 29 of 190
…whereas the return to FT100 index is confirmed as stationary:
An alternative unit-root test is the Phillips and Perron test which is a modification of ADF test statistic to
account for both serial correlation and heteroskedasticity in the residuals. The null and alternative hypotheses are
Page 30 of 190
as in ADF. By way of illustration we carried out the Phillips and Perron test on the FT100 index and its return,
and in this case we reached the same conclusions.
Page 31 of 190
5. The autocorrelation function

 In time series analysis, autocorrelation refers to the correlations involving a variable and a lag of itself. So the
first autocorrelation is simply the correlation between a variable and its first lag. Similarly, the second
autocorrelation is the correlation between a variable and the variable lagged twice, and so on. These can be
denoted as 𝜌0 , 𝜌1′ 𝜌2 ….
 For a variable X, the k-th order autocorrelation is defined as
𝐶𝑜𝑣𝑎𝑟𝑖𝑎𝑛𝑐𝑒(𝑋𝑡 ,𝑋𝑡−𝑘 )
𝜌𝑘 = , k=0, 1, 2…
𝑉𝑎𝑟𝑖𝑎𝑛𝑐𝑒(𝑋𝑡 )
The autocorrelation function (ACF) gives a sequence of autocorrelations across different lag lengths or
orders. Few points are worth noting:
a. The autocorrelation of order zero 𝜌0 (pronounced rho zero) is nothing but correlation between a variable
and itself, and it is always equal to 1.
b. As the order of the ACF increases, we are looking at correlations between a variable and its more distant
past.
c. Like all correlations, autocorrelations measure the strength of linear association between two variables,
and they lie between -1 (perfectly negatively correlated) and 1 (perfectly positively correlated).
Page 32 of 190
d. The absence of significant correlations indicate only absence of linear association, and it is possible that
non-linear associations may exist.
e. In practice we work with the sample ACF which is the ACF estimated from the sample in hand. The sample
ACF is sometimes referred to as correlogram.
f. A related concept is the notion of partial autocorrelation functions (PACF). To give an example, the partial
correlation between 𝑋𝑡 𝑎𝑛𝑑 𝑋𝑡−4 is the correlation between 𝑋𝑡 𝑎𝑛𝑑 𝑋𝑡−4 conditional on (or taking into
account) all other autocorrelations that come in between – in this case autocorrelation of order 1, 2, and 3.
In some sense PACF measures the “pure” or “net” correlation between two variables.
g. The ACF and PACF are some of the most commonly used tools in applied time series analysis because they
reveals useful information regarding the extent to which current values can be predicted from past values,
and shed light on the most profitable ways of modelling the time series.
Page 33 of 190
Empirical example of ACF and PACF:

The first 20 lags of ACF and PACF of FT100 return (variable DFT100) measured at weekly frequencies:
AC shows that correlation between FT100 return and its past value keeps changing sign across the lags. For
example the autocorrelation of lag 4 ( 4 weeks apart) is -0.0526, whereas that of order 5 is 0.0549
PAC shows that the correlation between current value and its value 8 weeks ago is 0.0458 taking into account
the effect of all lags in between.
Box-Pierce Q statistic which tests the null hypothesis that the correlations up to lag k are jointly equal to 0-
i.e. all ACs up to lag k are not significant
Prob>Q are p-value from the Box-Pierce Q test. At 5% level, any value <0.05 is evidence in favour of
significant autocorrelation, therefore rejecting the null hypothesis. Here we reject the null hypothesis in all
cases, and thus find evidence on favour of significant autocorrelations.
Page 34 of 190
Figure 11: ACF and PACF of FT100 return in graphical form
The grey area gives the 95% confidence interval of the corresponding correlation under the null hypothesis of
zero correlation. Any correlation confined within this area can be taken as being statistically insignificant.
Page 35 of 190
6. The cross-correlogram
 . The correlations between two time series variables Y and X calculated at varying lags gives rise to the cross-
correlogram.
Cross- We discuss the meaning of cross-correlation by focusing on the first three Lag
correlation quantities in this table
𝑪𝒐𝒓𝒓(𝒀𝒕 , 𝑿𝒕 ) Correlation between current value of Y and current value of X. 0

𝑪𝒐𝒓𝒓(𝒀𝒕 , 𝑿𝒕−𝟏 ) Correlation between current value of Y and last period value of X. -1
𝑪𝒐𝒓𝒓(𝒀𝒕 , 𝑿𝒕+𝟏 ) Correlation between current period’s value of Y and lead value of X. 1
𝑪𝒐𝒓𝒓(𝒀𝒕 , 𝑿𝒕−𝟐 ) Correlation between current value of Y and the value of X two periods earlier. -2
𝑪𝒐𝒓𝒓(𝒀𝒕 , 𝑿𝒕+𝟐 ) Correlation between current value of Y and the two-period ahead lead value of X. 2
If 𝐶𝑜𝑟𝑟(𝑌𝑡 , 𝑋𝑡+𝑘 )= 0 for all values of k, then X and Y are said to be not cross-correlated
Page 36 of 190
Empirical example of cross-correlogram:

The first 10 cross-correlations between FT100 weekly return (DFT100) and £/$ exchange rate (Dgbpusd) are
displayed in a graph and a table:
. xcorr DFT100 Dgbpusd, lag(10) xlabel(-10(1)10, grid) xcorr DFT100 Dgbpusd, lag(10) table
Cross-correlogram
1.00
1.00
0.50
0.50
0.00
0.00
-0.50
-0.50
-1.00
-1.00
-10 -9 -8 -7 -6 -5 -4 -3 -2 -1 0 1 2 3 4 5 6 7 8 9 10
Lag
𝟏. 𝑪𝒐𝒓𝒓(𝑫𝑭𝑻𝟏𝟎𝟎𝒕 , 𝑫𝒈𝒑𝒃𝒖𝒔𝒅𝒕+𝒌 ); k=-10.-9,….0, 1, 2, ….,10 𝐶𝑜𝑟𝑟(𝐷𝐹𝑇100𝑡 , 𝐷𝑔𝑝𝑏𝑢𝑠𝑑𝑡 ) =0.077, implying a

small contemporaneous correlation between the two
2. The function is not symmetric about lag 0: variables. 𝐶𝑜𝑟𝑟(𝐷𝐹𝑇100𝑡 , 𝐷𝑔𝑝𝑏𝑢𝑠𝑑𝑡−2 ) =--0.099,
𝑪𝒐𝒓𝒓(𝑫𝑭𝑻𝟏𝟎𝟎𝒕 , 𝑫𝒈𝒑𝒃𝒖𝒔𝒅𝒕+𝒌 ) ≠ 𝑪𝒐𝒓𝒓(𝑫𝑭𝑻𝟏𝟎𝟎𝒕 , 𝑫𝒈𝒑𝒃𝒖𝒔𝒅𝒕−𝒌 ) implying a negative correlation between currency
movement two weeks ago, and current DFT100.
Page 37 of 190
7. Regression models with time series variables:

 A distinguishing feature of time series data is the likely correlation between current and lagged observations.
This gives rise to a dynamic relationship between variables. A dynamic relationship is one where “a
change in a given variable today has impact on that same variable, or other variables in one or more
future time periods”.
 The total effect of such a change does not occur instantaneously but is spread or distributed over future
time periods. These are also known as lagged effects: Here is a graphical depiction of distributed lag effects
Page 38 of 190
 Some economic examples whee dynamic models might arise:

1. The effects of income tax increase at one point in time is likely to lead to current and future reductions in
consumer expenditure on final goods; this in turn would reduce demand for productive inputs, leading to
future profits loss of the input suppliers.
2. The impact of currency depreciation on domestic inflation is likely to have lagged effects depending, for
example, on how fast importers and exporters adjust to this shock.
3. The effects of an increase/decrease on sales revenue on firms’ labour demand would depend on the cost
of labour demand adjustment such as hiring and firing cost. The higher such adjustment costs are, the longer
the effects are likely to take.
4. The impact of an increase in cigarette prices today will take time to be fully realised because of consumption
habits or habit formation, leading to dynamic consumption function. Habits are formed when consumers
discover they enjoy using a commodity today, and this will have a positive effects on their consumption
decisions tomorrow.
Page 39 of 190
 In econometrics there are several ways of capturing dynamic relationships, some of which are listed below:
1. Specify that a dependent variable y is a function of current and past values of an explanatory variable x
𝑦𝑡 = 𝑓(𝑥𝑡 , 𝑥𝑡−1 , 𝑥𝑡−2 , … ) + 𝑢𝑡 [1]
Because of the existence of these lagged effects, model [1] is called a finite distributed lag (DL) model.
2. Capture the dynamic characteristics of time-series by specifying a model with lagged dependent variables
as explanatory variables:
𝑦𝑡 = 𝑓(𝑦𝑡−1 , 𝑦𝑡−2 , … ) + 𝑢𝑡 [2]
Model [2] is called an autoregressive (AR) model, with ‘‘autoregressive’’ meaning a regression of y on its own
lags.
3. A combination of models [1] and [2] is called autoregressive distributed lags (ARDL) model.
𝑦𝑡 = 𝑓(𝑦𝑡−1 , 𝑦𝑡−2 , … , 𝑥𝑡 , 𝑥𝑡−1 , 𝑥𝑡−2 , … ) + 𝑢𝑡 [3]
Page 40 of 190
I. Distributed lag (DL) model

When regressing y on x, we include one or more lagged values of x. The number lagged values of x we include
determine the lag order of DL models. For example DL model of order 3 would look like this:
𝑦𝑡 = 𝛼 + 𝛽0 𝑥𝑡 + 𝛽1 𝑥𝑡−1 + 𝛽2 𝑥𝑡−2 +𝛽3 𝑥𝑡−3 + u𝑡 [4]
where 𝛼 the intercept, and u is a serially uncorrelated error term.
Interpretation of regression coefficients of DL model [4] following a unit change in x at time t:
Coefficient Interpretation
𝜷𝟎 A unit increase in x at time t will lead to an immediate change in y by
𝛽0 units, ceteris paribus (as always). This is also known as the impact effect. Note that
if y and x are in logs, the change will be proportional change as per usual.
𝜷𝟏 A unit increase in x at time t will lead to change in y at time t+1 by 𝛽1
𝜷𝟐 A unit increase in x at time t will lead to change in y at time t+2 by 𝛽2
𝜷𝟑 A unit increase in x at time t will lead to change in y at time t+3 by 𝛽3
Multiplier analysis refers to the analysis of the effect, and the timing of the effect, of
a change in one variable on the outcome of another variable
Long run or total multiplier: 𝛽0 + 𝛽1 +𝛽2 + 𝛽3
One-period interim multiplier : 𝛽0 + 𝛽1
Two-period interim multiplier : 𝛽0 + 𝛽1 +𝛽2
Page 41 of 190
Empirical example:
Regression of weekly FT100 index return (DFT100) on four lags of £/$ xrate return (Dgbpusd):
L1.Dgbpsuds is used to denote the

first lag of Dgbpsusd 𝑫𝒈𝒃𝒑𝒖𝒔𝒅𝒕−𝟏 ;
L2.Dgbpusd is used to denote the
second lag of Dgbpsusd
𝑫𝒈𝒃𝒑𝒖𝒔𝒅𝒕−𝟐 , and so on. The current
value of Dgbpusd − 𝑫𝒈𝒃𝒑𝒖𝒔𝒅𝒕 – is
given in the row starting with --.
Examples of interpretaion:
1. An immediate increase of
Dgbpusd by 0.146 units- i.e.
positive impact effect of 0.146.
2. A reduction in DFT100 by 0.178
two weeks later.
Calculating long run/total multiplier

The total multiplier is -0.1361, but it
is statistically insignificant as the
95% C.I contains 0.
Page 42 of 190
II. Autoregressive (AR) model

 In AR models one regresses y one or more lagged values of y itself. The number lagged values of y we include
determine the lag order of AR models. For example an AR model of order 1 - AR (1) - would look like this:
𝑦𝑡 = 𝛼0 + 𝛼1 𝑦𝑡−1 + u𝑡 [5]
Where 𝛼0 the intercept, and u is a serially uncorrelated error term. For the system to be the coefficient of the
lagged y term should satisfy|𝛼1 | < 1.
Interpretation based on AR (1) model following a unit change in y at time t:
Time Interpretation
t+1 A unit increase in y at time t will lead to change in y at time t+1 by 𝑎1
t+2 A unit increase in y at time t will lead to change in y at time t+2 by 𝑎1 × 𝑎1 = 𝛼12
t+3 A unit increase in y at time t will lead to change in y at time t+3 by 𝛼13 , and so on
𝑎1
Long run multiplier = 𝑎1 + 𝛼12 + 𝛼13 + ⋯ = 𝑎1 (1 + 𝑎1 + 𝛼12 + 𝛼13 + ⋯ ) = for|𝛼1 | < 1
1− 𝑎1
In general for an AR(p) model
𝑦𝑡 = 𝛼0 + 𝛼1 𝑦𝑡−1 + 𝛼2 𝑦𝑡−2 + ⋯ + 𝛼𝑝 𝑦𝑡−𝑝 + u𝑡

𝛼1 +𝛼2 +⋯+𝑎𝑝
The long run mutiplier is for |𝛼1 + 𝛼2 + ⋯ + 𝑎𝑝 | < 1
1− 𝑎1 −𝑎2 −⋯−𝑎𝑝
Page 43 of 190
Empirical example:
Regression of weekly FT100 index return (DFT100) on three of its lags – AR (3) model
The effect of a one unit increase in

DFT100 in a particular week is to decrease
DF100 by 0.0665 units the following week.
The long run effect can be obtained as

−𝟎.𝟎𝟔𝟔𝟓+𝟎.𝟎𝟏𝟕𝟕−𝟎.𝟏𝟏𝟐𝟔
= -0.1457
𝟏−(−𝟎.𝟎𝟔𝟔𝟓 +𝟎.𝟎𝟏𝟕𝟕−𝟎.𝟏𝟏𝟐𝟔)
Page 44 of 190
III. Autoregressive Distributed Lag Model (ARDL)

 When one regresses y one or more lagged values of y itself, as well as x and lags of x, we have the ARDL
model.
 If you include 1 lag of y and 2 lags of x you have the ARDL (1, 2) model; if you have 2 lags of y and 4 lags of
x, you have ARDL (2, 4) model, and so on.
 In general an ARDL (p, q) model can be written as
𝑦𝑡 = 𝛼 + 𝛼1 𝑦𝑡−1 + ⋯ + 𝛼𝑝 𝑦𝑡−𝑝 + 𝛽0 𝑥𝑡 + 𝛽1 𝑥𝑡−1 + ⋯ + 𝛽𝑞 𝑥𝑡−𝑞 + 𝑢𝑡 [6]
Following a unit change increase in x at time t, the impact multiplier is 𝛽0 , and the long run multiplier is
𝛽0 +𝛽1 +⋯+𝛽𝑞
.
1− 𝑎1 −𝑎2 −⋯−𝑎𝑝
Page 45 of 190
Empirical example:
Regression of DFT100 index on one lag of DFT100, and two lags of Dgbpusa- ARDL (1, 2).
Interpretaion:
1. An immediate increase of Dgbpusd by
0.135 units- i.e. impact multiplier of 0.135.
2. The long run impact is
0.1347−0.0365−0.1857
= -0.07498 .
1−(−0.0725)
Page 46 of 190
8. ARIMA models- a brief introduction

𝑡=𝑇
In an AR model, the time series , say {𝑦𝑡 }𝑡=1 , depends on its previous values and the current values of the innovations
(shocks). For example the AR(2) process is expressed as
𝑦𝑡 = 𝛼0 + 𝛼1 𝑦𝑡−1 + 𝛼2 𝑦𝑡−2 + u𝑡
Another useful class of models for modelling financial time series is the moving average (MA) process.
In MA models, the current value of the variable is a combination (weighted average) of current and past values of the white
noise innovation term 𝜀𝑡 .
The moving average model of order 1 or MA(1) is expressed as:
𝑦𝑡 = 𝜃0 + 𝜃1 𝑢𝑡−1 + u𝑡
Similarly the MA(2) process is described as
𝑦𝑡 = 𝜃0 + 𝜃1 𝑢𝑡−1 + 𝜃2 𝑢𝑡−2 + u𝑡
Page 47 of 190
The autoregressive moving average (ARMA) model combines the AR and MA model: the current value of the variable
depends on past values of the variable and the current and past values of the innovation.
The ARMA process is written as ARMA(p,q), where p and q indicate the orders of the AR and MA processes respectively.
Examples of ARMA models
𝒚𝒕 = 𝜽𝟎 + 𝜶𝟏 𝒚𝒕−𝟏 + 𝜽𝟏 𝒖𝒕−𝟏 + 𝐮𝒕 ARMA(1,1)
𝒚𝒕 = 𝜽𝟎 + 𝜶𝟏 𝒚𝒕−𝟏 + 𝜶𝟐 𝒚𝒕−𝟐 + 𝜽𝟏 𝒖𝒕−𝟏 + 𝐮𝒕 ARMA(2,1)
𝒚𝒕 = 𝜽𝟎 + 𝜶𝟏 𝒚𝒕−𝟏 + 𝜽𝟏 𝒖𝒕−𝟏 + 𝜽𝟐 𝒖𝒕−𝟐 + 𝐮𝒕 ARMA(1,2)
𝒚𝒕 = 𝜽𝟎 + 𝜶𝟏 𝒚𝒕−𝟏 + 𝜶𝟐 𝒚𝒕−𝟐 + 𝜽𝟏 𝒖𝒕−𝟏 + 𝜽𝟐 𝒖𝒕−𝟐 + 𝐮𝒕 ARMA(2,2)
𝒚𝒕 = 𝜽𝟎 + 𝜶𝟏 𝒚𝒕−𝟏 + 𝐮𝒕 ARMA(1,0)
The autoregressive integrated moving average model ARIMA(p, 1, q) fits an ARMA(p,q) to the first
differenced series ARIMA(p, 2, q) is equivalent to applying ARMA(p,q) to the twice-differenced series
∆𝟐 𝒚𝒕 = ∆𝒚𝒕 − ∆𝒚𝒕−𝟏
Page 48 of 190
Examples of ARIMA models

∆𝒚𝒕 = 𝜽𝟎 + 𝜶𝟏 ∆𝒚𝒕−𝟏 + 𝜽𝟏 𝒖𝒕−𝟏 + 𝐮𝒕 ARIMA(1,1,1)
∆𝟐 𝒚𝒕 = 𝜽𝟎 + 𝜶𝟏 ∆𝟐 𝒚𝒕−𝟏 + ∆𝟐 𝜶𝟐 𝒚𝒕−𝟐 + 𝜽𝟏 𝒖𝒕−𝟏 + 𝐮𝒕 ARIMA(2,2,1)
∆𝟐 𝒚𝒕 = 𝜽𝟎 + 𝜶𝟏 ∆𝟐 𝒚𝒕−𝟏 + 𝜽𝟏 𝒖𝒕−𝟏 + 𝜽𝟐 𝒖𝒕−𝟐 + 𝐮𝒕 ARIMA(1,2,2)
𝒚𝒕 = 𝜽𝟎 + 𝜶𝟏 𝒚𝒕−𝟏 + 𝜽𝟏 𝒖𝒕−𝟏 + 𝐮𝒕 ARIMA(1,0,1)
Empirical example:
Consider an ARIMA(1,0,2) model of the weekly return (first-differenced) of British Pound- US Dollar
exchange rate
∆𝒚𝒕 = 𝒚𝒕 − 𝒚𝒕−𝟏 is given as Dgbpusd in the dataset, where y denotes the un-differenced series
lnDgbpusd ( log of the exchange rate).
Page 49 of 190
Mo
Page 50 of 190
This is of course equivalent to fitting ARIMA(1,1,2) on the un-differenced series:
Page 51 of 190
9. Testing for serial correlation in ARIMA models
The nature of serial correlation

Consider the following time series regression model
𝑦𝑡 = 𝛼+ 𝛽0 𝑥𝑡 + 𝑢𝑡 , t=1, 2,…,T
where T is the number of time periods, and 𝑢𝑡 represents the random error or disturbance term.
When 𝑐𝑜𝑟𝑟(𝑢𝑡 , 𝑢𝑡−𝑘 )= 0 for k ≠ 0, we say that error term associated with two different time
periods are not correlated. In this case we have serially uncorrelated errors.
Page 52 of 190
Serially uncorrelated errors
4
2
u
0
-2
-4 0 100 200
t
300 400 500
Serial correlation, also called autocorrelation, refers to the violation of this key assumption.
Page 53 of 190
Positively serially correlated errors Negatively serially correlated errors
5
5
u
0
u
0
-5
-5
0 50 100 150 200
t 0 50 100 150 200
t
Some sources of serial correlation
Page 54 of 190
(i) Variables omitted from the time-series regression that are correlated across periods.
(ii) The use of incorrect functional form (e.g., a linear form when a nonlinear one should be used).
(iii) Systematic errors in measurement.
First-order autocorrelation: 𝑢𝑡 = ρ𝑢𝑡−1 + 𝜀𝑡 , t=1, 2... T, where |ρ| < 1 is an unknown parameter to be
estimated and 𝜀 𝑡 is a serially uncorrelated random error term:
Value of 𝛒 Nature of serial correlation
<0 Negative serial correlation
=0 No serial correlation
>0 Positive serial correlation
 One can also have a model of higher-order serial correlations. For example with quarterly data, it often makes
sense to investigate the presence of a fourth order serial correlation:
𝑢𝑡 = ρ1 𝑢𝑡−1 + ρ2 𝑢𝑡−2 + ρ3 𝑢𝑡−3 + ρ4 𝑢𝑡−4 + 𝜀𝑡
Page 55 of 190
Consequences of serial correlation

a. If the model does not contain lagged dependent variables then OLS is still unbiased in the presence of serial
correlation. For example if you estimate
𝑦𝑡 = 𝛼+ 𝛽0 𝑥𝑡 + 𝛽1 𝑥𝑡−1 + 𝛽2 𝑥𝑡−2 + 𝛽3 𝑥𝑡−3 + 𝛽4 𝑥𝑡−4 + 𝑢𝑡
OLS will be unbiased even with serially correlated errors. However the OLS estimator will be inefficient in the
sense that we will be able to obtain other estimators with lower variance.
By contrast if the model contains lagged dependent variables, OLS will be biased in the presence of serial
correlation. For example if you estimate
𝑦𝑡 = 𝛼+ 𝛼1 𝑦𝑡−1 + 𝛽0 𝑥𝑡 + 𝛽1 𝑥𝑡−1 + 𝛽2 𝑥𝑡−2 + 𝑢𝑡
The OLS estimators of all regression coefficients will be biased.
b. More seriously the conventional OLS formula for calculating standard errors - 𝑠. 𝑒(𝛽̂1 ) = √𝑣𝑎𝑟(𝛽̂1 ) - would
̂1 −𝛽1
𝛽
be incorrect. This will lead to invalid hypothesis testing procedures e.g. calculating t values - 𝑡 = ̂1 ) -
𝑠.𝑒(𝛽
or 95% CI=𝛽̂ 1 ± 1.96𝑠. 𝑒(𝛽̂1 ). This is true whether you estimate the model with or without lagged dependent
variables.
Page 56 of 190
Testing for serial correlation in ARIMA models

A. Checking for serial correlation using residual correlogram
Recall from that correlation outside the 95% area can be taken as a statistically significant evidence of correlation
at that particular lag. As an example, consider the ARIMA(1,1,2) model fitted on the £-US$ exchange rate:
Evidence of serial
correlations of
higher order.
Page 57 of 190
B. Portmanteau (Q) test for white noise
The Portmanteau test uses the fact under the null hypothesis that the error terms are serially uncorrelated (white
noise), the following quantity based on the estimated S autocorrelations 𝜌̂𝑘 , k=1,…, S from a sample of T
observation.
𝑆
1
𝑄 = 𝑇(𝑇 + 2) ∑ 𝜌̂𝑘2
𝑇−𝑗
𝑗=1
Converges in distribution to a 𝜒 2 distribution with S degrees of freedom.
A p-value of 0.0192 can be taken as evidence in favour of rejecting

the null hypothesis. It seems that the estimated ARIMA(1,1.2)
model is not adequate.
Page 58 of 190
Summary
1. Time series data have observations on a given economic unit at different points in time.
2. For time series data an important question has to do with the frequency of data collection.
3. The systematically predictable component of the time series - the signal - consists of three major
components- trend, cycle and seasonal.
4. One can create lags, leads and differences of time series variables.
5. A stationary variable is one that is not explosive, nor trending, and nor wandering aimlessly without returning
to its mean.
6. One should never run a regression of Y on X if the variables are nonstationary.
Page 59 of 190
7. The autocorrelation function (ACF) gives a sequence of autocorrelations across different lag lengths.
8. The correlations between Y and X calculated at varying lags gives rise to the so-called cross-correlogram.
9. In dynamic models a change in one variable may have repercussions beyond the time period in which it
occurred.
10. Dynamic relationships can be modelled using finite distributed lag (DL), autoregressive (AR) or
autoregressive distributed lags (ARDL) models.
11. The immediate effect of a change in a variable is called the impact multiplier.
12. The long run multiplier is the total/ultimate effect of changing one variable by one unit.
13. It is important to ensure that estimated dynamic time series models are free of serial correlation.
Page 60 of 190
Part II
Forecasting from time series models
Based on the material in this study pack, you should be able to:
1. Distinguish between in-sample and out-of-sample period.
2. Explain what is meant by point and interval forecasts.
3. Use information criteria to choose between different models.
4. Generate one-step ahead forecasts based on ARDL models.
5. Obtain interval forecasts.
6. Use root mean squared errors and mean absolute errors to assess the relative forecasting ability of
different models.
7. Explain what is meant by exponential smoothing and generate forecasts using exponential
smoothing.
Page 61 of 190
1. Introduction
Accurate forecasts of future values of economic and financial variables are crucial for informed decision-making
on the part of governments and businesses alike. Here we focus on econometric forecasting and forecast
evaluation.
Some important terminologies:
i. In-sample period: This is the sample period over which the models are estimated.
ii. Out-of-sample period: This corresponds to the data segment we hold out in order to evaluate the
estimated models for predictive power. It is also known as the hold-out sample.
iii. Forecast origin: The exact time period at which the forecast is being made. This is simply the forecasting
or projecting date. Any information up to the forecast origin will be considered as known.
iv. Forecast horizon: The amount of time between the forecast origin and the event being predicted. When
the forecast horizon is one period, we have one-step-ahead forecasts.
v. Point forecasts: These estimate a particular value of the variable being forecast.
vi. Interval forecasts: These give intervals within which the forecasted value should be found for a particular
percentage of the time- e.g. 95% confidence intervals.
Page 62 of 190
vii. Forecast error: the difference between actual value of the variable and its predicted value.
viii. Standard error of the prediction: this is the square root of the forecast error variance. This can be used
to form interval forecasts. For example based on the normal distribution,
95% interval forecast = point forecast ±1.96 times standard error of the prediction
Page 63 of 190
2. Choosing the best model based on information criteria:

Consider the following ARDL model:
𝑦𝑡 = 𝛼 + 𝛼1 𝑦𝑡−1 + ⋯ + 𝛼𝑝 𝑦𝑡−𝑝 + 𝛽1 𝑥𝑡−1 + ⋯ + 𝛽𝑞 𝑥𝑡−𝑞 + 𝑢𝑡 [1]
All-important question: How to choose the best number of lags p and q?
Step 1: Estimate several ARDL (p,q) models by varying p and q based on the in-sample period, and for each
model check for serial correlation.
Step 2: Eliminate models with serial correlation. You might also want to discard models with coefficients that
have signs and magnitudes widely inconsistent with expectations from economic theory or past experience.
Step 3: Out of those models that survived step 2, select the best one(s) in terms of how well they fit the in-
sample period data using appropriate model selection criteria. Two popular model selection criteria are
- Akaike’s information criterion (AIC)
- Schwarz’s/Bayesian information criterion (BIC).
Models with the lowest value of AIC or BIC should be preferred, all else equal.
Page 64 of 190
Steps to calculate AIC and BIC:

i. Run the desired regression;
ii. From the regression output collect the number of observations (N); number of regressors (k), and the
sum of squared residuals (SSR)
2𝑘 𝑆𝑆𝑅 𝑘 𝑆𝑆𝑅
iii. Then compute 𝐴𝐼𝐶 = ( ) + ln ( ), and 𝐵𝐼𝐶 = ( ) 𝑙𝑛𝑁 + ln ( )
𝑁 𝑁 𝑁 𝑁
Example of obtaining IC after estimating an ARIMA(2,1,2) model:
Page 65 of 190
Empirical example
Predict change in log unemployment rates (GU hereafter) based on past values of GU, and past values of GDP
growth (GG hereafter) with in-sample period of 1975Q2 to 2012q4; and out-of-sample period 2013q1 to 2014q4
Figure 1: Time plot of GU: 1975Q2-2014q4 Figure 2:In-sample correlogram of GU
Reasonable to start estimating models with up to 3
lags of GU.
1.00
.2
In-sample period
Autocorrelations of Dunrate
0.50
.1
Out-of-sample period
0.00
0
-.1
-0.50
0 10 20 30 40
1975q1 1985q1 1995q1 2005q1 2013q1 Lag
Quarter Bartlett's formula for MA(q) 95% confidence bands
Step 1: For in-sample period, estimate ARDL (p, q) models for all combinations of p=1, 2 and 3, and q=1, 2 and
3.
Page 66 of 190
Step 2: For each model (a) perform the Portmanteau test for serial correlation up to 4 lags (because we have a
quarterly data it seems to make sense to check for serial correlation of order 4).
and keep a record of the respective p-values; (b) obtain AIC- we can also use BIC, but let’s stick with AIC! Discard
all models that failed the serial correlation test at 5% level.
Step 3: Sort the surviving models by AIC, and select model(s) with the smallest AIC value(s) as your forecasting
model(s).
Table 1 reports p-values from the serial correlation tests, and AIC from the estimated ARDL(p,q) models:
(𝑝, 𝑞 ) ∈ {(1,1) … (3,3)} for 1975Q2 to 2012q4
Table 1
p q Serial correlation AIC Decision
test p-value
1 1 0.000 -671.241 Discard as p-value <0.05
2 1 0.214 -690.538 Select ARDL(2,1) model as the best one since its AIC is the smallest
and it has no serial correlation.
For reference estimated ARDL(2,1) is : 𝑮𝑼 ̂ 𝒕 =. 𝟎𝟎𝟎𝟗 + 𝟏. 𝟏𝟗𝟓𝟏𝑮𝑼𝒕−𝟏 −. 𝟑𝟖𝟒𝟑𝑮𝑼𝒕−𝟐 −. 𝟏𝟏𝟑𝟔𝑮𝑮𝒕−𝟏
2 2 0.317 -689.821 ARDL(2,2) can be used as the second best model
2 3 0.228 -682.683 Adequate; can be used
Page 67 of 190

3. Forecasting with ARDL models

 We are now ready to calculate one-step ahead forecasts of GU for out-of-sample period 2013q1-2014q4 using
the chosen ARDL(2,1) model: For convenience Table 2 lists the observed values of GU and GG for the period
2012q1-2014q1:
Table 2
Page 68 of 190
Some one-step ahead forecasts based on the estimated ARDL(2,1) model:

̂ 𝑡 = .0009 + 1.1951𝐺𝑈𝑡−1 − .3843𝐺𝑈𝑡−2 − .1136𝐺𝐺𝑡−1
𝐺𝑈
One-step ahead point forecast for t=2013q1 at forecast origin t=2012q4
̂ 2013q1 = .0009 + 1.1951GU2012q4 − .3843GU2012q3 − .1136GG2012q4
GU
̂ 2013q1 = .0009 + (1.1951 × 0) − (. 3843 × −.0141) − (. 1136 × .0285)
GU
≅ 0.0031
Page 69 of 190
One-step ahead point forecast for t=2013q2 at forecast origin t=2013q1

̂ 2013q2 = .0009 + 1.1951GU2013q1 − .3843GU2012q4 − .1136GG2013q1
GU
̂ 2013q2 = .0009 + (1.1951 × −.0215) − (. 3843 × 0) − (. 1136 × −.0279) ≅ −0.0216
GU
The point forecast of GU for 2013q1 = 0.0031 whereas the actual value is
-0.0215 (see Table 2). The forecast error (û) at 2013q1 is the difference between actual value of the variable and
its predicted value, that is û2013𝑞1 =-0.0246.
In order to obtain the interval forecast for 2013q1 we need to obtain the standard error of the forecast at 2013q1,
𝑠𝑒(û2013𝑞1 ). Then an approximate 95% forecast interval would be
̂ 2013q1 ± 1.96𝑠𝑒(û2013𝑞1 )
GU
We can obtain standard error of the forecast from Stata using the command predict se, stdp. We find that at
2013q1 𝑠𝑒(û2013𝑞1 ) = 0.0026 giving a 95% percent of forecast interval of
0.0031 ± 1.96 × 0.026 = [−0.0021, 0.0083]
Page 70 of 190
̂ ) and associated forecast intervals, forecast

Table 3 below gives actual values of GU, one-step ahead forecasts (GU
errors, and for future references forecast errors squared ( û2 ) and absolute value of the forecast errors (|û|).
Table 3:
ARDL(2,1) forecasts
Quarter GU ̂
𝐆𝐔 𝐮
̂ 95% forecast interval ̂𝟐
𝐮 |𝐮
̂|
Lower Upper
limit limit
2013q1 -0.0215 0.0031 -0.0246 -0.0021 0.0083 0.0006 0.0246
2013q2 -0.0369 -0.0216 -0.0153 -0.0281 -0.0152 0.0002 0.0153
2013q3 -0.0701 -0.0353 -0.0348 -0.0403 -0.0303 0.0012 0.0348
2013q4 -0.0841 -0.0705 -0.0136 -0.0779 -0.0631 0.0002 0.0136
2014q1 -0.0822 -0.0754 -0.0068 -0.0829 -0.0679 0.0000 0.0068
2014q2 -0.0896 -0.0619 -0.0277 -0.0705 -0.0534 0.0008 0.0277
2014q3 -0.0984 -0.0767 -0.0217 -0.0843 -0.0691 0.0005 0.0217
2014q4 -0.0715 -0.0850 0.0135 -0.0932 -0.0767 0.0002 0.0135
Page 71 of 190
4. Comparing the forecasting ability of different models:
In most forecasting exercises, there could be several competing forecasting models available to the economist
⟹ need for some forecast performance evaluation criteria to apply to the out-of-sample forecasts to select the
best possible one.
 One such commonly used forecast accuracy evaluation criterion is the root mean squared error (MSE).
Assuming we have H out-of-sample forecasts starting at forecast origin t, this is defined as
ℎ=𝐻
1 2
𝑅𝑀𝑆𝐸 = √ ∑ 𝑢̂𝑡+ℎ
𝐻
ℎ=1
Using quantities given in Table 3, and noting that H=8 in that example, we can calculate the RMSE for the
ARDL(2,1) model as
1
𝑡=2014𝑞4
𝑅𝑀𝑆𝐸 = √ ∑𝑡=2013𝑞1 𝑢̂𝑡2 = 0.02313.
8
 Another popular measure of models’ relative predictive ability is the mean absolute error (MAE) which is
the average of the absolute forecast errors |û|:
Page 72 of 190
ℎ=𝐻
1
𝑀𝐴𝐸 = ∑ |û𝑡+ℎ |
𝐻
ℎ=1
From Table 3, MAE of ARDL(2,1) model can be calculated as

𝑡=2014𝑞4
1
𝑀𝐴𝐸 = ∑ |𝑢
̂|
𝑡 = 0.01812
8
𝑡=2013𝑞1
When comparing the relative predictive ability of different models, the one with the smallest value of RMSE or
MAE is deemed to be the most accurate one.
We repeat the same analysis based on the second best model which is ARDL(2,2) – see Table 1. The results are
summarised in Table 4 below.
Table 4: ARDL(2,2) forecasts
Quarter GU ̂
GU û95% forecast interval û2 |û|
Lower limit Upper limit
2013q1 -0.0215 0.0014 -0.0229 -0.0045 0.0074 0.0005 0.0229
2013q2 -0.0369 -0.0227 -0.0142 -0.0295 -0.0160 0.0002 0.0142
2013q3 -0.0701 -0.0326 -0.0374 -0.0395 -0.0258 0.0014 0.0374
Page 73 of 190
2013q4 -0.0841 -0.0697 -0.0144 -0.0772 -0.0622 0.0002 0.0144

2014q1 -0.0822 -0.0756 -0.0066 -0.0831 -0.0682 0.0000 0.0066
2014q2 -0.0896 -0.0626 -0.0270 -0.0712 -0.0540 0.0007 0.0270
2014q3 -0.0984 -0.0739 -0.0245 -0.0830 -0.0648 0.0006 0.0245
2014q4 -0.0715 -0.0853 0.0139 -0.0936 -0.0770 0.0002 0.0139
Based on Table 4, RMSE =0.023062 and MAE=0.01809 for ARDL (2,2)- pretty similar to the ones from
ARDl(2,1). As can be seen from Figure 3 both models are doing reasonably well in capturing the trends in
unemployment rate changes, but both tend to overestimate the extent of such changes.
Figure 3:
Page 74 of 190
0
-.02
-.04
-.06
-.08
-.1
2013q1 2013q4 2014q2 2014q4

Quarter
Actual change in unemployment rate Forecasts from ARDL(2,1)

Forecasts from ARDL(2,2)
Page 75 of 190
5. Exponential smoothing
The exponential smoothing method which is a versatile forecasting tool without the need to estimate ARDL
models. Suppose we want to forecast the UK’s quarterly inflation rate (INR) for the period 2013q1-2014q4 using
information on past values of INR only For convenience we list some values of INR
One possibility would be to compute one-step ahead forecasts as the average value of the past four quarters. For
instance a forecast of INR for 2013q1 at forecast origin 2012q4 would be obtained as
𝐼𝑁𝑅2012𝑞1 +𝐼𝑁𝑅2012𝑞2 +𝐼𝑁𝑅2012𝑞3 +𝐼𝑁𝑅2012𝑞4
̂ 2013𝑞1 =
𝐼𝑁𝑅 =.006397
4
Similarly the forecast for 2013q2 at forecast origin 2013q1 would be computed as
Page 76 of 190
𝐼𝑁𝑅2012𝑞2 +𝐼𝑁𝑅2012𝑞3 +𝐼𝑁𝑅2012𝑞4 +𝐼𝑁𝑅2013𝑞1

̂ 2013𝑞2 =
𝐼𝑁𝑅 = .005697
4
The above are examples of a simple equally-weighted moving average model which uses the last 4
observations – MA (4) model for short.
Using a more general notation, the equally weighted MA (4) forecast of y for time T+1 at forecast origin T is
1 1 1 1
given as𝑦̂𝑇+1 = 𝑦𝑇 + 𝑦𝑇−1 + 𝑦𝑇−2 + 𝑦𝑇−3 .
4 4 4 4
You can take this idea to expand the list of past observations used and vary the weight attached to each
observation in the computation of the forecast, naturally subject to the weights summing to 1. For example one
1 1 1 1 3 1
can think of using 𝑦̂𝑇+1 = 𝑦𝑇 + 𝑦𝑇−1 + 𝑦𝑇−2 + 𝑦𝑇−3 + 𝑦𝑇−4 + 𝑦𝑇−5.
2 4 8 16 64 64
Or quite simply
2 1
𝑦̂𝑇+1 = 𝑦𝑇 + 𝑦𝑇−1
3 3
The exponential smoothing approach generalises this idea by using all past information with the weights
declining as the observations get older:
For 0 < 𝛼 < 1, this approach attaches weight of 𝛼(1 − 𝛼)𝑠 to 𝑦𝑇−𝑠 , and it can be shown that∑∞ 𝑠
𝑠=0 𝛼(1 − 𝛼) =
1. The forecasting equation is then

Page 77 of 190
𝑦̂𝑇+1 = 𝛼𝑦𝑇 + 𝛼(1 − 𝛼)1 𝑦𝑇−1 + 𝛼(1 − 𝛼)2 𝑦𝑇−2 + 𝛼(1 − 𝛼)3 𝑦𝑇−3 + ⋯ [5]
In equation [5] 𝛼 (also known as the smoothing parameter) is the weight given to the most recent observation
(s=0), and it should be clear that the higher 𝛼 is the more quickly past information gets obsolete (see Study pack II
for a graphical demonstration of this fact).
Computing forecast of 𝑦̂𝑇+1 using information from the infinite past as per Equation [5] is not convenient. To
simplify matter notice that using [5] the forecast of 𝑦̂𝑇 can be written as
𝑦̂𝑇 = 𝛼𝑦𝑇−1 + 𝛼(1 − 𝛼)1 𝑦𝑇−2 + 𝛼(1 − 𝛼)2 𝑦𝑇−3 + ⋯ [6]
Next multiplying [6] by (1 − 𝛼) we have
(1 − 𝛼) 𝑦̂𝑇 = 𝛼(1 − 𝛼)1 𝑦𝑇−1 + 𝛼(1 − 𝛼)2 𝑦𝑇−2 + 𝛼(1 − 𝛼)3 𝑦𝑇−3 + ⋯ [7]
Thus we can replace the infinite sum
𝛼(1 − 𝛼)1 𝑦𝑇−1 + 𝛼(1 − 𝛼)2 𝑦𝑇−2 + 𝛼(1 − 𝛼)3 𝑦𝑇−3 + ⋯
found in [5] by the single term (1 − 𝛼) 𝑦̂𝑇 , resulting in much simpler forecasting algorithm:
𝑦̂𝑇+1 = 𝛼𝑦𝑇 + (1 − 𝛼) 𝑦̂𝑇 [8]
Page 78 of 190
Thus the forecast for next period is a weighted average of the most recent observation 𝒚𝑻 and its forecast
̂𝑻 .
𝒚
To operationalise the exponential smoothing method one can choose the optimal value of 𝛼 in such a way that
the sum of squared in-sample forecast errors is minimised.
For a given value of , the one-step ahead in-sample forecast error is

𝑢̂𝑡,𝛼 = 𝑦𝑡 − 𝑦̂𝑡 = 𝑦𝑡 − 𝛼𝑦𝑡−1 − (1 − 𝛼) 𝑦̂𝑡−1 [9]
Thus the computer software would solve the following problem:
𝑚𝑖𝑛 ∑𝑇𝑡=1 𝑢̂𝑡,𝛼
2
with respect to 𝛼.
Empirical example:
Apply the exponential smoothing method using UK inflation rate data for 1975q2-2014q4:
- in-sample period 1975q2-2012q4; out-of-sample period 2013q1-2014q4.
Step 1: Determine the optimal smoothing parameter  by that minimising the sum of squared in-sample forecast
errors: (we call the thus forecasted series INR_sm0).
Page 79 of 190
In this case, the optimal value of

𝜶 =0.198
Step 2: Using the smoothing parameter 𝛼 =0.198, we generate the out-of-sample forecast values (we call the thus
forecasted series INR_sm1), and graph actual (in blue) and forecast (in red) values:
Page 80 of 190
.008
.006
.004
.002
2013q1 2013q3 2014q2 2014q4

yq
Inflation rate exp parms(0.1980) = INR
Page 81 of 190
Summary
1. Obtaining accurate forecasts is important for informed economic and financial decision making.
2. It is crucial that the models being used for forecasting are free from serial correlation.
3. Information criteria such as AIC and BIC can be used to choose the best model in terms of fitting the
data in in-sample period.
4. One should not only obtain point forecasts but also interval forecasts.
5. Because more than one econometric model can fit the data it is useful to carry out some out-of-sample
predictive ability evaluation based on RMSE and MAE.
6. Exponential smoothing is a useful alternative forecasting strategy that does not involve estimating
ARDL
Page 82 of 190
1. Discuss basic concepts behind the modelling of time series data with VAR
2. Build VAR models and carry out impulse response and forecast error decomposition analyses.
Page 83 of 190
Page 84 of 190
Page 85 of 190
Page 86 of 190
Page 87 of 190
Page 88 of 190
Page 89 of 190
Example taken form Hill, et al (2008) Principles of Econometrics.
Page 90 of 190
Page 91 of 190
Page 92 of 190
Page 93 of 190
Page 94 of 190
Page 95 of 190
Page 96 of 190
Page 97 of 190
Page 98 of 190
Page 99 of 190
Page 100 of 190

Page 101 of 190

Page 102 of 190

Page 103 of 190

Page 104 of 190

Page 105 of 190

Page 106 of 190

Page 107 of 190

Page 108 of 190

Page 109 of 190

Page 110 of 190

Page 111 of 190

Page 112 of 190

Page 113 of 190

Page 114 of 190

Page 115 of 190

 VAR, as advocated by Sims (1980), has the advantage of eschewing

“incredible identification restrictions” imposed by simultaneous equation
models. Essentially VAR treats all variables in the system symmetrically.
 Nonetheless the VAR approach has come in for some criticism due to its
lack of economic content (i.e. no role for well-founded restrictions based
on economic theory and institutional knowledge).
 Moreover VAR does not allow for contemporaneous relationships
between economic variables. For example, a change in income is only
allowed to affect consumption with a lag.
 Structural VAR (SVAR) arises out of imposing economic-theory related
identifying assumptions on reduced form VAR.
Page 116 of 190

To fix ideas, let y be a three-element vector, e.g. y = (investment, income, consumption):

Reduced form VAR (1): 𝑦𝑡 = 𝐴1 𝑦𝑡−1 + 𝜀𝑡 𝜀𝑡 ~𝑁(0, Σ𝜀 ) 𝜎11 𝜎12 𝜎13
Σ𝜀 = [𝜎12 𝜎22 𝜎23 ]
𝜎13 𝜎23 𝜎33
− Σ𝜀 has 6 distinct value
- Off-diagonal elements of Σ𝜀 capture captures contemporaneous relationship between the
errors
- No problem with estimating VAR with OLS as long as 𝜀𝑡 is serially correlated.
- Although VAR is useful for forecasting, limited use for structural analysis of the economy.
Structural VAR: 𝐴𝑦𝑡 = 𝐴1 𝑦𝑡−1 + 𝑢𝑡 𝑢𝑡 ~𝑁(0, Σ𝑢 ) 𝜎𝑢11 0 0
Σ𝑢 = [ 0 𝜎𝑢12 0 ]
0 0 𝜎𝑢33
−𝑢𝑡 are serially uncorrelated
AND independent of each
other.
- The A matrix capture contemporaneous relationship between the endogenous variables.
- Problematic estimating SVAR with OLS because of regressors-error correlation
Page 117 of 190

Pre-multiply SVAR by 𝐴−1 to “obtain” reduced-form VAR

𝑨−𝟏 𝑨𝒚𝒕 = 𝑨−𝟏 𝑨𝟏 𝒚𝒕−𝟏 + 𝑨−𝟏 𝒖𝒕
⟹
𝒚𝒕 = 𝑨−𝟏 𝑨𝟏 𝒚𝒕−𝟏 + 𝐴−1 𝑢𝑡
It should then be easy to see that Σ𝜀 ≡ 𝐴−1 Σ𝑢 𝐴−1 ′
𝜎11 𝜎12 𝜎13 𝑎11 𝑎12 𝑎13 −1 𝑎11 𝑎12 𝑎13 −1′
[𝜎12 𝜎22 𝜎23 ] = [𝑎12 𝑎22 𝑎23 ] Σ𝑢 [𝑎12 𝑎22 𝑎23 ] ≡Ω
𝜎13 𝜎23 𝜎33 𝑎31 𝑎32 𝑎33 𝑎31 𝑎32 𝑎33
-LHS -Σ𝜀 − has 6 distinct values which can be consistently estimated from reduced form VAR
-RHS - Ω − has 9 unknown parameters ( recall that Σ𝑢 is diagonal)
- 9 to be estimated from 6 equations: thus the system is not identified
-Identification requires putting some credible restrictions on the elements of A.
- On possibility is to put zero short-run restriction (also known as Cholesky restrictions or
recursive identification).
e.g. (i) investment is not contemporaneously affected by shocks to income and consumption; (ii)
income is contemporaneously affected by shock to investment but not consumption, and (iii)
consumption is contemporaneously affected by shocks to both income and investment.
-Here investment is put at the top of the recursive system- in this sense it is the most exogenous
variable (it is only contemporaneously affected by its own shocks). So variables ordering matters!
Page 118 of 190

The above restrictions imply

𝟏 𝟎 𝟎
𝑨 = [𝒂12 𝟏 0] ⟹ 𝛀 has 6 unknown parameters (3 𝒂.. 𝑠 and 3 𝜎𝑢. 𝑠)
𝒂31 𝒂32 𝟏
With 6 known (to be precise consistently estimated) parameters from the reduced form VAR and
6 unknown SVAR parameters, the model is just-identified or exactly identified (this corresponds
to empirical example 1).
Page 119 of 190

Over-identified models:
Over-identification occurs when there are more equations than parameters to estimate.
For example if we further impose the restriction that income is not contemporaneously affected by shock
to investment, we would have:
The above restrictions imply
𝟏 𝟎 𝟎
𝑨=[ 0 𝟏 0] ⟹ 𝛀 has 5 unknown parameters
𝒂31 𝒂31 𝟏
Model is over-identified (this corresponds to empirical example 2).

Empirical examples 1 and 2 below are taken from 7. Time Series Reference Manual, 2017, RELEASE 15. ®. A
Stata Press Publication, College Station, Texas.
Page 120 of 190

𝟏 𝟎 𝟎
Empirical example 1: 𝐴 = [𝒂12 𝟏 0]
𝒂31 𝒂32 𝟏
Page 121 of 190

Page 122 of 190

𝟏 𝟎 𝟎
Empirical example 2: 𝐴 = [ 0 𝟏 0]
𝒂31 𝒂32 𝟏
Page 123 of 190

Page 124 of 190

Page 125 of 190

Page 126 of 190

Page 127 of 190

Page 128 of 190

Page 129 of 190

Page 130 of 190

Page 131 of 190

Page 132 of 190

Page 133 of 190

Page 134 of 190

Page 135 of 190

Page 136 of 190

Page 137 of 190

Page 138 of 190

Page 139 of 190

Page 140 of 190

Page 141 of 190

Page 142 of 190

Page 143 of 190

Page 144 of 190

Page 145 of 190

Page 146 of 190

Page 147 of 190

Page 148 of 190

Page 149 of 190

Page 150 of 190

Page 151 of 190

Page 152 of 190

1. To understand the importance of modelling the volatility of financial and economic time series.
2. To estimate ARCH and GARCH models.
3. To forecast volatility
4. To understand the estimation and interpretation of multivariate GARCH models.
Page 153 of 190

1. Introduction
The values of financial and economic time series tend to change rapidly across time. This characteristic of financial
returns is known as time-varying volatility. Often large (small) changes are followed by further large (small)
changes. This is described as clustering of changes. Volatility clustering arises because pieces of market
information tend to arrive together (e.g. macroeconomic data) or due to psychological reasons (e.g. arrival of
important bad news shaping the mood of the market).
Volatility of returns represents the risk associated with holding a particular asset. Modelling volatility is therefore
an important component of modern financial analysis. The innovations or error terms 𝜀𝑡 in financing models
are also known as “shocks” or “news” because they represent the unexpected, and modelling volatility involves
modelling functions of 𝜀𝑡2 .
We will use Arabica coffee future market price’s weekly returns (x100) data for the purpose of empirical
illustration:
Page 154 of 190

Page 155 of 190

For a visual inspection of time varying volatility, we plot the squared returns of the coffee price returns:
Page 156 of 190

Page 157 of 190

Page 158 of 190

Page 159 of 190

Page 160 of 190

Empirical demonstration:
Testing for ARCH(6) effects based on an AR(4) model based on coffee shows evidence of time varying volatility
or heteroscedasticity.
. regress coffee l(1/4).coffee
Source SS df MS Number of obs = 611

F(4, 606) = 4.18
Model 721.653425 4 180.413356 Prob > F = 0.0024
Residual 26147.5545 606 43.1477796 R-squared = 0.0269
Adj R-squared = 0.0204
Total 26869.2079 610 44.0478818 Root MSE = 6.5687
coffee Coef. Std. Err. t P>|t| [95% Conf. Interval]
coffee
L1. .067121 .0404425 1.66 0.097 -.0123035 .1465456
L2. -.0183359 .0403742 -0.45 0.650 -.0976262 .0609544
L3. -.0902531 .0404127 -2.23 0.026 -.1696191 -.0108871
L4. -.104141 .0404774 -2.57 0.010 -.183634 -.0246479
_cons .1025018 .2658247 0.39 0.700 -.4195477 .6245512
. estat archlm , lag(6)

LM test for autoregressive conditional heteroskedasticity (ARCH)
lags(p) chi2 df Prob > chi2
6 59.963 6 0.0000
H0: no ARCH effects vs. H1: ARCH(p) disturbance
Page 161 of 190

Q-test on residual and squared residuals shows no evidence of serial correlation, but confirms the presence of
heteroscedasticity.
. ac resid_sqr
. predict resid, residual
(5 missing values generated)
. gen resid_sqr = resid^2

(5 missing values generated)
. wntestq resid
Portmanteau test for white noise
Portmanteau (Q) statistic = 43.4144

Prob > chi2(40) = 0.3280
. wntestq resid_sqr
Portmanteau test for white noise
Portmanteau (Q) statistic = 224.7080

Prob > chi2(40) = 0.0000
Page 162 of 190

Estimating models:
Like ARMA models when estimating ARCH models one can use information criteria (and serial correlation tests)
to determine the model that fits the data best.
Page 163 of 190

Page 164 of 190

Page 165 of 190

By way of illustration, we fit an ARCH(6) model with a constant mean and normally distributed errors:
. arch coffee, arch(1/6) nolog
ARCH family regression
Sample: 2005w2 - 2016w44 Number of obs = 615

Distribution: Gaussian Wald chi2(.) = .
Log likelihood = -1998.572 Prob > chi2 = .
OPG
coffee Coef. Std. Err. z P>|z| [95% Conf. Interval]
coffee
_cons .109839 .2305391 0.48 0.634 -.3420093 .5616873
ARCH
arch
L1. .0980987 .0506596 1.94 0.053 -.0011922 .1973897
L2. .0973032 .0461274 2.11 0.035 .0068951 .1877113
L3. .1052266 .0434417 2.42 0.015 .0200824 .1903708
L4. .1661873 .0506885 3.28 0.001 .0668397 .2655349
L5. .0769565 .0457464 1.68 0.093 -.0127048 .1666177
L6. -.0017966 .0324321 -0.06 0.956 -.0653624 .0617692
_cons 20.23847 2.788978 7.26 0.000 14.77217 25.70477
Page 166 of 190

Page 167 of 190

Model adequacy checks:
. * testing for volatility clusters

. regress sr_arch6
Source SS df MS Number of obs = 615

F(0, 614) = 0.00
Model 0 0 . Prob > F = .
Residual 614.900931 614 1.00146731 R-squared = 0.0000
Adj R-squared = 0.0000
Total 614.900931 614 1.00146731 Root MSE = 1.0007
sr_arch6 Coef. Std. Err. t P>|t| [95% Conf. Interval]
_cons .0119971 .0403535 0.30 0.766 -.0672505 .0912446
. estat archlm, lags(1,6,12)

LM test for autoregressive conditional heteroskedasticity (ARCH)
lags(p) chi2 df Prob > chi2
1 0.011 1 0.9155
6 0.514 6 0.9977
12 12.751 12 0.3874
H0: no ARCH effects vs. H1: ARCH(p) disturbance
Page 168 of 190

Model appears adequate in this respect.
Some evidence of autocorrelation
Page 169 of 190

Some weak evidence of autocorrelation (clustering).
Page 170 of 190

It appears that we have approximately normality distributed standardised errors (except for some
deviation at the upper tail)
The model adequacy checks suggest that there is certainly room for improvement. By way of illustration, we
consider two possible modifications to our baseline model: (i) estimating of the ARCH(6) model with an AR(4)
model for the mean equation; (ii) estimating the model in (i) with a Student’s t distribution
Page 171 of 190
. arch coffee, arch(1/6) ar(1/4) nolog
ARCH family regression -- AR disturbances

Distribution: Gaussian Wald chi2(4) = 10.22
Log likelihood = -1993.18 Prob > chi2 = 0.0368
OPG
coffee
_cons .0568814 .1971257 0.29 0.773 -.3294778 .4432407
ARMA
ar
L1. .0070821 .043594 0.16 0.871 -.0783607 .0925248
L2. -.0423097 .0458626 -0.92 0.356 -.1321988 .0475794
L3. -.011741 .0448792 -0.26 0.794 -.0997026 .0762206
L4. -.1420698 .0462513 -3.07 0.002 -.2327207 -.0514188
ARCH
arch
L1. .096822 .0502367 1.93 0.054 -.0016401 .195284
L2. .1074129 .0494122 2.17 0.030 .0105669 .204259
L3. .1153206 .0489735 2.35 0.019 .0193344 .2113069
L4. .132513 .0554539 2.39 0.017 .0238253 .2412007
L5. .1088043 .0506682 2.15 0.032 .0094965 .2081122
L6. -.0165251 .035382 -0.47 0.640 -.0858725 .0528224
_cons 19.63054 2.812442 6.98 0.000 14.11826 25.14283
Page 172 of 190

. arch coffee, arch(1/6) ar(1/4) distribution(t) nolog
ARCH family regression -- AR disturbances

Distribution: t Wald chi2(4) = 6.87
OPG
coffee
_cons .0037801 .1974719 0.02 0.985 -.3832577 .3908179
ARMA
ar
L1. .0171089 .0425513 0.40 0.688 -.0662902 .1005079
L2. -.0583185 .0436518 -1.34 0.182 -.1438746 .0272375
L3. -.0180302 .0426016 -0.42 0.672 -.1015277 .0654674
L4. -.0969459 .0438818 -2.21 0.027 -.1829527 -.0109391
ARCH
arch
L1. .112266 .0657108 1.71 0.088 -.0165247 .2410567
L2. .1252151 .0666399 1.88 0.060 -.0053967 .2558268
L3. .0977132 .0595747 1.64 0.101 -.019051 .2144774
L4. .1089875 .0604104 1.80 0.071 -.0094146 .2273896
L5. .0964241 .059543 1.62 0.105 -.0202781 .2131262
L6. -.0035813 .043177 -0.08 0.934 -.0882067 .0810441
_cons 20.05422 3.535028 5.67 0.000 13.1257 26.98275
/lndfm2 1.944785 .5022488 3.87 0.000 .9603956 2.929175
df 8.992129 3.511788 4.61273 20.71218
Page 173 of 190

Page 174 of 190

Page 175 of 190

. * GARCH(1,1) model estimation

. arch coffee, arch(1) garch(1) nolog

OPG
coffee
_cons .0601514 .2347744 0.26 0.798 -.399998 .5203008
ARCH
arch
L1. .1017103 .0192276 5.29 0.000 .0640249 .1393957
garch
L1. .8720564 .0218953 39.83 0.000 .8291424 .9149704
_cons 1.234846 .4547916 2.72 0.007 .3434708 2.126221
Page 176 of 190

Page 177 of 190

Page 178 of 190

Page 179 of 190

. * TGARCH(1,1) model estimation

. arch coffee, arch(1) tarch(1) garch(1) nolog
initial values not feasible
(note: default initial values infeasible; starting ARCH/ARMA estimates from 0)

OPG
coffee
_cons .3337825 .2276203 1.47 0.143 -.1123451 .7799101
ARCH
arch
L1. -.0077141 .0156794 -0.49 0.623 -.0384452 .0230169
tarch
L1. .1281206 .0190562 6.72 0.000 .0907712 .16547
garch
L1. .9346354 .0169129 55.26 0.000 .9014867 .9677841
_cons .5254656 .2274085 2.31 0.021 .0797531 .9711782
Positive news appear to increase volatility by a factor of 0.128.
Page 180 of 190

Page 181 of 190

Page 182 of 190

. * EGARCH(1,1) model estimation

. arch coffee, earch(1) egarch(1) nolog

OPG
coffee
_cons .3328754 .2328488 1.43 0.153 -.1235 .7892507
ARCH
earch
L1. .1021815 .0153625 6.65 0.000 .0720717 .1322914
earch_a
L1. .1164465 .0298289 3.90 0.000 .0579829 .17491
egarch
L1. .9880938 .0053174 185.82 0.000 .977672 .9985157
_cons .0470269 .0201367 2.34 0.020 .0075597 .0864941
Page 183 of 190

Page 184 of 190

. * GARCH(1,1) in-mean model estimation

. arch coffee, arch(1) garch(1) archm nolog

OPG
coffee
_cons .7679904 .4473149 1.72 0.086 -.1087306 1.644711
ARCHM
sigma2 -.0212181 .0114738 -1.85 0.064 -.0437064 .0012701
ARCH
arch
L1. .0949795 .0179889 5.28 0.000 .0597219 .130237
garch
L1. .8811698 .0199615 44.14 0.000 .8420459 .9202936
_cons 1.113584 .4303026 2.59 0.010 .2702067 1.956962
Page 185 of 190

Page 186 of 190

Page 187 of 190

. ***** MGARCH(1,1) model + VAR(1) for the mean part

. mgarch dcc (coffee oil= L.coffee L.oil), arch(1) garch(1) nolog
Dynamic conditional correlation MGARCH model

Coef. Std. Err. z P>|z| [95% Conf. Interval]
coffee
coffee
L1. -.0000511 .0433982 -0.00 0.999 -.08511 .0850078
oil
L1. -.0349724 .0707193 -0.49 0.621 -.1735797 .1036349
_cons .0857928 .2203575 0.39 0.697 -.3461 .5176855
ARCH_coffee
arch
L1. .0991133 .0253231 3.91 0.000 .0494808 .1487457
garch
L1. .8762679 .0290288 30.19 0.000 .8193725 .9331633
_cons 1.163623 .5322252 2.19 0.029 .1204812 2.206766
oil
coffee
L1. -.0207559 .0184968 -1.12 0.262 -.0570089 .0154972
oil
L1. .0431011 .0436383 0.99 0.323 -.0424284 .1286307
_cons .0235871 .1167081 0.20 0.840 -.2051565 .2523307
ARCH_oil
arch
L1. .0753599 .0190557 3.95 0.000 .0380115 .1127083
garch
L1. .896939 .0251274 35.70 0.000 .8476903 .9461878
_cons .3034487 .1433701 2.12 0.034 .0224484 .584449
corr(coffee,oil) .2298757 .1104118 2.08 0.037 .0134726 .4462789
Adjustment
lambda1 .0514456 .0180289 2.85 0.004 .0161095 .0867817
lambda2 .9216074 .025248 36.50 0.000 .8721222 Page
.9710926 188 of 190
. predict corr*, correlation
. des corr*
In practice you need to generate the standardised residuals for each equation (2 conditional variance
equations in this example) and perform the usual model adequacy tests based on each of them. Here
we skip this part and just show the dynamic correlation graph.
Page 189 of 190

Page 190 of 190

TS - Lectures AAU August 7 - 9 2018

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

TS - Lectures AAU August 7 - 9 2018

Uploaded by

Copyright:

Available Formats

sourafel.girma@nottingham.ac.

uk Applied Time Series Analysis Addis Ababa University, 2018

Day 1: August 7, 2018 Univariate Time Series Analysis:

Some examples of time series data:

1986 1990 1995 2000 2005 2010 2015

1985q1 1990q1 1995q1 2000q1 2005q1 2010q1 2015q1

Example 3: At monthly frequency- a still higher frequency data.

1985m1 1990m1 1995m1 2000m1 2005m1 2010m1 2015m1

Components of time series variables:

Log UK GDP market prices- £m

2. Lags, leads and differences

£/$ exchange rate logxr: log £/$ exchange rate

Examples of leads creation with log £/$ exchange rate)

An example of a time series of a nonstationary An example of a time series of a nonstationary

FT100 index return

2000w1 2004w1 2008w1 2012w1 2016w30

FT100 FT100 index return

4. Testing for stationarity

The autoregressive process

Random walk with trend

1960q1 1970q1 1980q1 1990q1 2000q1 2010q1

1960q1 1970q1 1980q1 1990q1 2000q1 2010q1

Random walk with drift

1960q1 1970q1 1980q1 1990q1 2000q1 2010q1

Stationary process with trend

1960q1 1970q1 1980q1 1990q1 2000q1 2010q1

Augmented Dickey-Fuller tests

ADF test based on FT100 confirm non-stationarity of series:

…whereas the return to FT100 index is confirmed as stationary:

5. The autocorrelation function

Empirical example of ACF and PACF:

Figure 11: ACF and PACF of FT100 return in graphical form

𝑪𝒐𝒓𝒓(𝒀𝒕 , 𝑿𝒕 ) Correlation between current value of Y and current value of X. 0

Empirical example of cross-correlogram:

𝟏. 𝑪𝒐𝒓𝒓(𝑫𝑭𝑻𝟏𝟎𝟎𝒕 , 𝑫𝒈𝒑𝒃𝒖𝒔𝒅𝒕+𝒌 ); k=-10.-9,….0, 1, 2, ….,10 𝐶𝑜𝑟𝑟(𝐷𝐹𝑇100𝑡 , 𝐷𝑔𝑝𝑏𝑢𝑠𝑑𝑡 ) =0.077, implying a

7. Regression models with time series variables:

 Some economic examples whee dynamic models might arise:

I. Distributed lag (DL) model

L1.Dgbpsuds is used to denote the

Calculating long run/total multiplier

II. Autoregressive (AR) model

In general for an AR(p) model

𝑦𝑡 = 𝛼0 + 𝛼1 𝑦𝑡−1 + 𝛼2 𝑦𝑡−2 + ⋯ + 𝛼𝑝 𝑦𝑡−𝑝 + u𝑡

The effect of a one unit increase in

The long run effect can be obtained as

III. Autoregressive Distributed Lag Model (ARDL)

8. ARIMA models- a brief introduction

(shocks). For example the AR(2) process is expressed as

𝒚𝒕 = 𝜽𝟎 + 𝜶𝟏 𝒚𝒕−𝟏 + 𝜶𝟐 𝒚𝒕−𝟐 + 𝜽𝟏 𝒖𝒕−𝟏 + 𝐮𝒕 ARMA(2,1)

𝒚𝒕 = 𝜽𝟎 + 𝜶𝟏 𝒚𝒕−𝟏 + 𝜽𝟏 𝒖𝒕−𝟏 + 𝜽𝟐 𝒖𝒕−𝟐 + 𝐮𝒕 ARMA(1,2)

𝒚𝒕 = 𝜽𝟎 + 𝜶𝟏 𝒚𝒕−𝟏 + 𝜶𝟐 𝒚𝒕−𝟐 + 𝜽𝟏 𝒖𝒕−𝟏 + 𝜽𝟐 𝒖𝒕−𝟐 + 𝐮𝒕 ARMA(2,2)

Examples of ARIMA models

∆𝟐 𝒚𝒕 = 𝜽𝟎 + 𝜶𝟏 ∆𝟐 𝒚𝒕−𝟏 + ∆𝟐 𝜶𝟐 𝒚𝒕−𝟐 + 𝜽𝟏 𝒖𝒕−𝟏 + 𝐮𝒕 ARIMA(2,2,1)

∆𝟐 𝒚𝒕 = 𝜽𝟎 + 𝜶𝟏 ∆𝟐 𝒚𝒕−𝟏 + 𝜽𝟏 𝒖𝒕−𝟏 + 𝜽𝟐 𝒖𝒕−𝟐 + 𝐮𝒕 ARIMA(1,2,2)

𝒚𝒕 = 𝜽𝟎 + 𝜶𝟏 𝒚𝒕−𝟏 + 𝜽𝟏 𝒖𝒕−𝟏 + 𝐮𝒕 ARIMA(1,0,1)

This is of course equivalent to fitting ARIMA(1,1,2) on the un-differenced series:

9. Testing for serial correlation in ARIMA models

The nature of serial correlation

Serially uncorrelated errors

Positively serially correlated errors Negatively serially correlated errors

Some sources of serial correlation

Consequences of serial correlation

Testing for serial correlation in ARIMA models