
5. Time Series Data

A time series data set consists of observations
on a variable or several variables over time.
In economics, examples of time series data include stock prices, the money supply, the consumer price index, gross domestic product, exchange rates, and exports.
In such a data set, time is an important dimension because past as well as current events influence future events (that is, lags matter in time series analysis).
Unlike the arrangement of cross-sectional data, the chronological ordering of observations in a time series conveys potentially important information.
Thus, a key feature of time series data that makes it more difficult to analyze is the fact that economic observations can rarely be assumed to be independent across time.
Therefore, in general, a time series is a sequence of numerical data in which each observation is associated with a particular instant in time.
Univariate time-series analysis is the analysis of a single sequence of data, describing the behavior of one variable in terms of its own past values.
Example: autoregressive models:
ut = ρut−1 + εt (first-order autoregressive, AR(1)), or
yt = ρ1yt−1 + ρ2yt−2 + εt (second-order autoregressive, AR(2))
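To make the autoregressive idea concrete, the following Stata sketch simulates an AR(1) series with ρ = 0.5; the seed, sample size and variable names are illustrative assumptions, not part of the notes.
* simulate y_t = 0.5*y_(t-1) + e_t for 200 periods
clear
set obs 200
set seed 12345
gen t = _n
tsset t                          // declare the artificial time index
gen e = rnormal()                // white-noise innovations
gen y = e in 1                   // initialize the series
replace y = 0.5*L.y + e in 2/200 // recursive AR(1) updating (replace works row by row)
tsline y                         // plot the simulated stationary series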
Analysis of several sets of data (variables) for the same sequence of time periods is called multivariate time-series analysis.
For example, the analysis of the relationships among, say, the price level, the money supply and GDP on the basis of annually collected data.
The main purpose of time-series analysis is to study the dynamics or temporal structure of the data.
Stationary and Non-stationary stochastic
processes
From a theoretical point of view, the collection of random variables Yt ordered in time is called a stochastic process or random process.
There are two classes of stochastic processes:
A stationary stochastic process gives rise to a stationary time series.
A nonstationary stochastic process gives rise to a nonstationary time series.
Stationary Stochastic Processes
A stochastic process is said to be stationary if:
its mean and variance are constant over time (they do not depend on time or change as time changes); and
 the value of the covariance between two time periods depends only on the lag between the two periods and not on the actual time at which the covariance is computed.
In the time series literature, a stochastic process that satisfies these conditions is known as weakly stationary, or covariance stationary.
Mathematically the condition is expressed as:
Mean: E(Yt) = μ
Variance: var(Yt) = E(Yt − μ)² = σ²
Covariance: γk = E[(Yt − μ)(Yt−k − μ)]
where γk is the covariance (at lag k) between the values of Yt and Yt−k. If k = 0, we obtain γ0, which is simply the variance of Y (= σ²); if k = 1, γ1 is the covariance between two adjacent values of Y.
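As an informal check of these conditions in Stata, one can inspect the sample mean, variance and autocorrelations of a series; a minimal sketch for a tsset variable named y (the name is an assumption, as in the simulation above):
* sample mean and variance of y
summarize y
* autocorrelations at lags 1 to 10; for a covariance-stationary series they depend only on the lag
corrgram y, lags(10)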
If a time series is not stationary as defined
above, it is called a non-stationary time series.
In other words, a non-stationary time series will
have a time varying mean or a time-varying
variance or both.
Stationarity is important:
to generalize the analysis to other time periods and thus to conduct reliable forecasts.
If a time series is non-stationary, we can study its behavior only for the time period under consideration (we cannot generalize the analysis to other time periods).
Unit Root Process
Consider the model
Yt = ρYt−1 + ut,   −1 ≤ ρ ≤ 1
where ut is a white-noise error term.
If ρ = 1, we face what is known as the unit root problem, that is, a situation of non-stationarity, because in this case the variance of Yt is not constant but increases with time.
The name unit root is due to the fact that ρ = 1.
If, however, |ρ| < 1, that is if the absolute
value of ρ is less than one, then the time
series Yt is stationary in the sense we have
defined it.
In practical research, it is important to find out whether a time series has a unit root (i.e., whether it is non-stationary).
Note that the term unit root process is used synonymously with non-stationary process.
Integrated Stochastic Processes
If a time series becomes stationary after first differencing, it is said to be integrated of order 1, denoted I(1).
Similarly, if a time series has to be differenced
twice (i.e. difference of the first differences) to
make it stationary, we call such a time series
integrated of order 2 denoted as I(2).
In general, if a (nonstationary) time series has
to be differenced d times to make it stationary,
that time series is said to be integrated of order
d and is denoted as I(d). Example Yt ∼ I(d).
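In Stata the difference operators make the order of integration easy to explore; a hedged sketch using the gdp variable from the illustrative (tsset) data set later in these notes:
* first difference of gdp; if D.gdp is stationary while gdp is not, then gdp ~ I(1)
gen dgdp = D.gdp
* second difference (difference of the first differences); relevant only if D.gdp is still nonstationary
gen d2gdp = D2.gdp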
If a time series Yt is stationary to begin with, that is, stationary in levels (it does not require any differencing), it is said to be integrated of order zero, I(0).
Most economic time series data are
generally I(1); that is, they generally become
stationary only after taking their first
differences.

Spurious regression
This is the situation in which two variables that have no theoretical reason to be related (true correlation r = 0) nevertheless yield a statistically significant coefficient when one is regressed on the other.
Suppose we regress Yt on Xt. Since Yt and Xt are uncorrelated I(1) processes, the R2 from the regression of Y on X should tend to zero; that is, there should not be any relationship between the two variables.
But if you actually run the regression, you may find that the coefficient of X is highly statistically significant, even though the R2 value is low.
This is, in a nutshell, the phenomenon of spurious or nonsense regression. According to Granger and Newbold, R2 > d (where d is the Durbin–Watson statistic) is a good rule of thumb to suspect that the estimated regression is spurious.
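A quick way to see this in practice is to simulate two independent random walks and regress one on the other; the following Stata sketch is purely illustrative (the seed, sample size and variable names are arbitrary assumptions):
* generate two independent random walks of length 200
clear
set obs 200
set seed 2022
gen t = _n
tsset t
gen x = sum(rnormal())   // random walk: running sum of white noise
gen y = sum(rnormal())   // a second, independent random walk
* the slope is typically "significant" even though there is no true relationship
regress y x
* Durbin-Watson d; by the Granger-Newbold rule, suspect spuriousness when R-squared > d
estat dwatson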
Tests of stationarity
By now you should understand the nature of stationary and non-stationary stochastic processes and their related importance.
In practice we face two important questions:
1. How do we know that a given time series
is stationary?
2. If it is not stationary, how can we make
it stationary?
Although there are several tests of stationarity (non-stationarity), the following are most commonly used in the literature:
1. Graphical analysis
2. The unit root (ADF) test
1. Graphical Analysis
Before conducting formal tests, it is always advisable to plot the time series under study, because such a plot gives an initial clue about the likely nature of the series.
Take, for instance, the GDP time series shown below. You will see that over the period of study GDP has been increasing, that is, showing an upward trend, suggesting perhaps that the mean of GDP has been changing.
This may suggest that the GDP series is not stationary. Such an intuitive feeling is the starting point of more formal tests of stationarity.
The unit root test
(Augmented Dickey–Fuller (ADF) Test)
In conducting the usual DF test, it is assumed that the error term ut is serially uncorrelated.
But in case the ut are correlated, Dickey and Fuller have developed a test known as the augmented Dickey–Fuller (ADF) test.
This test is conducted by "augmenting" the DF test equations by adding the lagged values of the dependent variable ∆Yt.
To be specific, suppose we have
∆Yt = β1 + β2t + δYt−1 + ut
The ADF test here consists of estimating the following regression:
∆Yt = β1 + β2t + δYt−1 + α1∆Yt−1 + α2∆Yt−2 + … + αm∆Yt−m + εt
where εt is a pure white-noise error term and where ∆Yt−1 = (Yt−1 − Yt−2), ∆Yt−2 = (Yt−2 − Yt−3), etc.
The number of lagged difference terms to
include is often determined empirically, the
idea being to include enough terms so that
the error term in this model is serially
uncorrelated.

In the ADF test we still test whether δ = 0, and the ADF test follows the same asymptotic distribution as the DF statistic, so the same critical values can be used.
Stata commands for the ADF test
We use the following command to conduct the unit root test for GDP:
dfuller gdp, trend lags(1)
Or, if you want the underlying regression to be displayed:
dfuller gdp, trend regress lags(1)
We use a similar test for consumption:
dfuller consumption, trend lags(1)
Or, if you want the regression to be displayed:
dfuller consumption, trend regress lags(1)
The command regresses the first difference of the variable on its own one-period lag, the lagged first difference, and a trend.
If the series is not stationary in levels, check it at first difference with the command:
dfuller D.gdp, trend regress lags(1)
For lag selection:
varsoc GDP MONEY, maxlag(4)
 Note: the asterisk (*) in the results table indicates the appropriate lag to be selected.
Illustrative example
Use the Stata time-series data on GDP and electricity consumption for a country during 1958–1994. The file name is 'TIME SERIES'.
First declare that your data are a time series using the command:
tsset year
Next, to visually detect whether the data are trended (non-stationary), plot both variables with the command:
twoway (tsline gdp) (tsline cosumption)
We test the null hypothesis that the variable under consideration is a unit root process.
The graph is given below.
[Figure: time-series plot of gdp and cosumption against year, 1960–2000]
ADF test result for GDP (in levels)
. dfuller gdp, trend regress lags(1)

Augmented Dickey-Fuller test for unit root Number of obs = 35

Interpolated Dickey-Fuller
Test 1% Critical 5% Critical 10% Critical
Statistic Value Value Value

Z(t) -1.289 -4.288 -3.560 -3.216

MacKinnon approximate p-value for Z(t) = 0.8907

D.gdp Coef. Std. Err. t P>|t| [95% Conf. Interval]

gdp
L1. -.1655593 .1284583 -1.29 0.207 -.4275517 .096433
LD. -.5189601 .1501446 -3.46 0.002 -.825182 -.2127382
_trend 5.70084 3.568377 1.60 0.120 -1.576914 12.97859
_cons 62.43344 35.29034 1.77 0.087 -9.541691 134.4086
Decision and conclusion: we can see from the table that the ADF test statistic in absolute value is 1.289, which is less than the absolute critical values at the 1%, 5% and 10% levels.
This tells us that GDP is a unit root process (i.e., it is a nonstationary process).
Since GDP is nonstationary (a unit root process) in levels, we take the first difference and check whether the unit root problem is resolved.
ADF test at first difference
. dfuller Dgdp, trend regress lags(1)

Augmented Dickey-Fuller test for unit root Number of obs = 34

Interpolated Dickey-Fuller
Test 1% Critical 5% Critical 10% Critical
Statistic Value Value Value

Z(t) -6.397 -4.297 -3.564 -3.218

MacKinnon approximate p-value for Z(t) = 0.0000

D.Dgdp Coef. Std. Err. t P>|t| [95% Conf. Interval]

Dgdp
L1. -2.008074 .3139025 -6.40 0.000 -2.649149 -1.367
LD. .2564326 .1744521 1.47 0.152 -.0998461 .6127114
_trend 1.503518 .7839002 1.92 0.065 -.0974196 3.104456
_cons 28.13491 16.20987 1.74 0.093 -4.970053 61.23987
We can see from the table that, at first difference, the ADF test statistic is −6.397, which is larger in absolute value than the critical values at the 1%, 5% and 10% levels.
Thus we reject the null hypothesis and conclude that GDP at first difference is stationary; that is, first differencing removed the unit root problem. Thus GDP is said to be integrated of order 1, i.e. GDP ~ I(1).
We can carry out a similar test for consumption.
6. Long-run time series analysis
6.1. Definition and concept of co-integration
6.2. Test for co-integration
6.3. Error correction models
6.4. Autoregressive distributive lag model
6.5. Application Using Stata
Cointegration and VEC Model
Cointegration: a concept used to indicate the existence of a long-run relationship between two variables.
 A VEC model is estimated to examine the long-run relationship between the variables and to see the short-run dynamics of error adjustment whenever disequilibrium occurs.
• To clarify the concept, let us consider the following very simple form of the import function:
Mt = βYt + ut
where Mt is imports and Yt is income (GDP).
First, check for the existence of cointegration between the variables using the following Engle–Granger two-step procedure.
 For the variables to be cointegrated, the two variables M and Y must each be integrated of order 1, I(1), and the error term derived from their linear relationship (with no constant) must be integrated of order 0, that is, stationary in levels, I(0).
 These conditions are checked against the null hypothesis of no cointegration.
Looking first at the time plots of both variables, shown below, we observe that both series are clearly nonstationary.
[Figure: time plots of M and Y]
First, regress Mt on Yt with no constant: reg Mt Yt, noconst
Then, predict and save the residual et from the regression.
Finally, use the saved residual in the auxiliary regression shown below:
regress D.residual L.residual, noconstant
and then look at whether the coefficient of et−1 is significant or not.
If it is statistically significant, we reject the null hypothesis and conclude that imports (M) and GDP (Y) are cointegrated, and thus we are safe to go on to estimate a VEC model.
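Putting the two steps together, a minimal Stata sketch might look as follows (the variable names m and y stand for imports and GDP and are assumptions; note that the t-statistic on the lagged residual should be compared with Engle–Granger critical values, not the usual t table):
* Step 1: estimate the long-run relationship without a constant and save the residual
regress m y, noconstant
predict e, residuals
* Step 2: Dickey-Fuller-type regression of the differenced residual on its own lag
regress D.e L.e, noconstant
* a significantly negative coefficient on L.e suggests that m and y are cointegrated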
Summarized:
 A time series variable is integrated of order 1, written I(1), if it becomes stationary after first differencing.
 Strictly speaking, estimation and hypothesis testing based on OLS are justified only if the variables involved are I(0), that is, stationary in levels.
 If the variables are cointegrated, a possible VECM is given in differenced form as shown below.
Hypothesis:
H0: The variables are not cointegrated
H1: The variables are cointegrated
A possible ECM formulation of this model
might be:

When normalized, it may look like:

This last formulation has a simple yet interesting economic interpretation: it presupposes that some variable M has an equilibrium path defined by the long-run relationship and that, at any point in time t, there are deviations from that long-run path.
Adjustments are made from one period to the next, with the speed of adjustment equal to α*1 for Mt and α*2 for Yt.
In other words, for simplicity we may write the model as:
∆Mt = γ11∆Mt−1 + γ12∆Yt−1 + α1et−1 + ε1t
∆Yt = γ21∆Mt−1 + γ22∆Yt−1 + α2et−1 + ε2t
where et−1 is the lagged residual (the error-correction term) from the long-run relationship.
The coefficients γ11, γ12, γ21, γ22, α1 and α2 are then estimated by OLS.
Interpretation
This is the short-run model for imports and, as judged by the F-test (p-value 0.05), the model is adequate.
In the short-run equation for imports, income has no significant impact, but the error-correction term, i.e. the coefficient of RES_(−1), is significant. The coefficient of the error-correction term is −0.27, which tells us that about 27 percent of the short-run disequilibrium is adjusted back towards equilibrium within a year.
Johansen cointegration and the VEC model
 Steps summarized
1. Specify the model correctly: the VAR is specified in differences (all variables carry difference operators) and includes the error-correction term (ECT) with its adjustment parameter.
2. Prepare the data as time series.
3. Conduct stationarity tests: the variables must be I(1), not I(2).
4. Determine the optimal lag length.
5. Perform the Johansen cointegration test with p lags (menu: Statistics > Multivariate time series > Cointegrating rank of a VECM, then list the variables). You can use both the trace statistic and the maximum-eigenvalue criterion.
6. If there is no cointegration, estimate an unrestricted VAR.
7. If there is cointegration, estimate the VECM with p lags (the model is actually estimated with p−1 lagged differences).
8. Perform diagnostics (see the sketch below).
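A consolidated command-line sketch of these steps, using the variable names from the exercise below; the lag and rank choices are placeholders to be replaced by whatever the tests indicate, and the yearly time variable year is an assumption:
* 2. declare the data as yearly time series
tsset year
* 3. unit root tests in levels and in first differences (repeat for each variable)
dfuller gdppc, trend lags(1)
dfuller D.gdppc, trend lags(1)
* 4. determine the optimal lag length
varsoc gdppc grossdosaving gcf import remittance, maxlag(4)
* 5. Johansen cointegration test with p lags
vecrank gdppc grossdosaving gcf import remittance, trend(constant) lags(4)
* 7. estimate the VECM with the rank and lags suggested by the tests (p-1 lagged differences are used)
vec gdppc grossdosaving gcf import remittance, trend(constant) lags(4)
* 8. diagnostics: residual autocorrelation, normality, stability
veclmar
vecnorm
vecstable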
Data for exercise
Open the data set on New South Africa and declare it as time series.
Conduct stationarity tests for the variables:
 gdppc, gds, gcf, import and remittance; they must be I(1).
• Determine the optimal lag length using the command:
varsoc gdppc grossdosaving import gcf remittance, maxlag(4)
This produces an overall lag length of 4 (see below).
. varsoc gdppc grossdosaving import gcf remittance , maxlag(4)

Selection-order criteria
Sample: 1975 - 2019 Number of obs = 45

lag LL LR df p FPE AIC HQIC SBIC

0 -597.139 288487 26.7618 26.8366 26.9625


1 -400.271 393.74 25 0.000 140.017 19.1232 19.5722 20.3276*
2 -363.469 73.605 25 0.000 86.4645 18.5986 19.4218 20.8068
3 -331.467 64.002 25 0.000 70.8377 18.2874 19.4848 21.4993
4 -295.206 72.522* 25 0.000 53.9955* 17.7869* 19.3585* 22.0025
Perform the Johansen cointegration test with p lags using the following command:
vecrank gdppc grossdosaving gcf import remittance, trend(constant) lags(4)
. vecrank gdppc grossdosaving gcf import remittance, trend(constant) lags(4)

Johansen tests for cointegration


Trend: constant Number of obs = 45
Sample: 1975 - 2019 Lags = 4

5%
maximum trace critical
rank parms LL eigenvalue statistic value
0 80 -358.60404 . 126.7954 68.52
1 89 -334.74897 0.65362 79.0853 47.21
2 96 -313.76001 0.60657 37.1074 29.68
3 101 -300.89772 0.43541 11.3828* 15.41
4 104 -295.67501 0.20715 0.9374 3.76
5 105 -295.20633 0.02061
Interpreting the cointegration test result
 The maximum rank values state the null hypotheses. For example, rank = 0 says that there are zero cointegrating equations in this model, and we reject this null hypothesis if the trace statistic is greater than the critical value.
 Similarly, rank = 1 states the null hypothesis that there is at most 1 cointegrating equation in this model, and the null hypothesis is rejected if the trace statistic is greater than the critical value.
 In our case the trace statistic is less than the critical value at rank 3, and thus we cannot reject that null hypothesis; we conclude that there are 3 cointegrating equations, as indicated by the rank.
Estimating VECM
Menu: Statistics > Multivariate time series > Vector error-correction model (VECM); enter all the variables in the dialog box, with 1 cointegrating equation and 2 lags (Stata estimates the model with p−1 lagged differences).
Or
 use the following command:
vec gdppc grossdosaving gcf import remittance, trend(constant)
The second table reports the short-run coefficients: for each dependent variable, the error-correction coefficient and the coefficients of the other variables.
ec1 is the speed-of-adjustment coefficient (since there are 5 variables, there is an adjustment coefficient in each of the 5 equations).
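If you prefer the command line to the dialog box, the same choices can be passed through the rank() and lags() options of vec; a sketch mirroring the dialog settings above (1 cointegrating equation, 2 lags):
* VECM with 1 cointegrating equation and 2 lags (Stata estimates 2-1 = 1 lagged difference)
vec gdppc grossdosaving gcf import remittance, trend(constant) rank(1) lags(2)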
The last table from the VEC model (the Johansen normalization restriction equation) is the long-run equation.
Johansen normalization restriction imposed

beta Coef. Std. Err. z P>|z| [95% Conf. Interval]

_ce1
gdppc 1 . . . . .
grossdosaving -271.2405 95.96211 -2.83 0.005 -459.3228 -83.15821
gcf 511.9089 115.1419 4.45 0.000 286.2349 737.5829
import -526.7451 81.35322 -6.47 0.000 -686.1945 -367.2958
remittance 17469.23 4510.614 3.87 0.000 8628.593 26309.87
_cons -2093.906 . . . . .
In the model, a restriction of 1 is imposed on the target variable (the dependent variable). The error-correction term is generated from this long-run equation.
For interpretation of the long-run equation you must reverse the signs: a coefficient with a negative sign is interpreted as a positive effect, and vice versa.
If coefficients have opposite signs, we say they have asymmetric effects on the dependent variable, other things kept constant.
For example, other things constant, gcf and import have significant asymmetric effects on gdppc.
Interpret the speed of adjustment (the coefficient of ec1) for the target variable only.
The adjustment term (the coefficient of ec1 in the second table of the model) is −0.0150375. Even though it is not statistically significant, it implies that the previous year's error (the disequilibrium) is corrected within the current year at an average speed of 1.5 percent, which is very small in magnitude and insignificant.
Postestimation tests
For residual autocorrelation:
 Statistics > Multivariate time series > VEC diagnostics and tests > LM test for residual autocorrelation; choose the lag, use the active vec results, and click OK.
But first run:
vec gdppc grossdosaving gcf import remittance, trend(constant) lags(1)
In this case the null hypothesis is 'no autocorrelation at the given lag order'.
You can also conduct normality and stability tests by the same procedure.
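The command-line equivalent of this menu path is the veclmar postestimation command, run after vec; a minimal sketch (the number of lags tested is an assumption):
* LM test for residual autocorrelation at lag orders 1 and 2; H0: no autocorrelation at that lag order
veclmar, mlag(2)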
Normality test of the residual
Statistics > Multivariate time series > VEC diagnostics and tests > Test for normally distributed disturbances; tick Jarque–Bera.
 Null hypothesis: the disturbances are normally distributed.
 Alternative: the disturbances are not normally distributed.
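The corresponding command is vecnorm, also run after vec; a brief sketch requesting the Jarque–Bera statistic:
* Jarque-Bera test for normally distributed disturbances; H0: the disturbances are normal
vecnorm, jbera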
For the stability test use:
vecstable
and you will get the eigenvalue stability condition; the VECM specification imposes 4 unit moduli (the number of variables minus the number of cointegrating equations).
Autoregressive Distributed Lag (ARDL)
The Autoregressive Distributed Lag (ARDL) approach to cointegration, or bounds cointegration technique, is one of the most commonly used approaches for analysing long-run relationships.
 Unlike other techniques, its application does not require pre-testing for unit roots.
Consequently, the ARDL cointegration technique is preferable when dealing with variables that are integrated of different orders, I(0), I(1), or a combination of the two.
The long-run relationship among the underlying variables is detected through the F-statistic (Wald test).
The existence of a long-run/cointegrating
relationship can be tested based on the EC
representation.
A bounds testing procedure is available to
draw conclusive inference without knowing
whether the variables are integrated of order
zero or one, I(0) or I(1), respectively.
As the name indicates, ARDL is a combination of the distributed lag (DL) model and the autoregressive model.
The error-correction form says that the change in Yt is due to the current change in Xt plus an error-correction term:
 if the 'disequilibrium error' in the square brackets is positive, then a 'go to equilibrium' mechanism generates an additional negative adjustment in Yt.
The speed of adjustment is determined by 1−φ, which is the adjustment parameter.
Note that the stability assumption ensures that 0 < (1−φ) < 1; therefore only a part of any disequilibrium is made up for in the current period.
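For concreteness, a common way of writing such an error-correction form, with generic coefficients that are not taken from the slides, is:
∆Yt = β0 + β1∆Xt − (1 − φ)[Yt−1 − α0 − α1Xt−1] + εt
Here the term in square brackets is the disequilibrium error and (1 − φ) is the speed of adjustment: a positive disequilibrium generates a negative adjustment in Yt in the next period.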
ARDL Model
Commands to run ARDL (using import as the dependent variable):
ardl import gdppc grossdosaving fdi remittance, lags(2 0 0 0 0)
Check the lag length for each variable one by one, e.g.:
varsoc gdppc
Applying the selected lag lengths, re-estimate:
ardl gdppc saving import remittance, lags( )
 For the ECM form we add ec after the command:
ardl gdppc saving import remittance, lags( ) ec
After ardl you may run a normality test on the residuals:
predict myresiduals, r
sktest myresiduals
D.import Coef. Std. Err. t P>|t| [95% Conf. Interval]

ADJ
import
L1. -.9409866 .1504486 -6.25 0.000 -1.245055 -.6369187

LR
gdppc .004162 .0008492 4.90 0.000 .0024457 .0058783
grossdosaving .2190234 .090265 2.43 0.020 .0365911 .4014558
fdi .5942854 .3411137 1.74 0.089 -.095131 1.283702
remittance 18.62005 8.063735 2.31 0.026 2.322633 34.91747

SR
import
LD. .336468 .1324769 2.54 0.015 .0687223 .6042137

_cons -9.622943 3.847053 -2.50 0.017 -17.39813 -1.847759
H0: no level relationship F= 8.695
Case 3 t= -6.255

Finite sample (4 variables, 47 observations, 1 short-run coefficients)

Kripfganz and Schneider (2018) critical values and approximate p-values

10% 5% 1% p-value
I(0) I(1) I(0) I(1) I(0) I(1) I(0) I(1)

F 2.621 3.790 3.151 4.451 4.381 5.967 0.000 0.001


t -2.560 -3.668 -2.897 -4.055 -3.577 -4.824 0.000 0.000

do not reject H0 if
both F and t are closer to zero than critical values for I(0) variables
(if p-values > desired level for I(0) variables)
reject H0 if
both F and t are more extreme than critical values for I(1) variables
(if p-values < desired level for I(1) variables)
. ardl import gdppc grossdosaving fdi remittance, lags(1 2 1 4 3)ec

ARDL(1,2,1,4,3) regression

Sample: 1975 - 2019 Number of obs = 45


R-squared = 0.7765
Adj R-squared = 0.6609
Log likelihood = -72.482972 Root MSE = 1.5090

D.import Coef. Std. Err. t P>|t| [95% Conf. Interval]

ADJ
import
L1. -.5392366 .1601595 -3.37 0.002 -.8667997 -.2116736

LR
gdppc .0061927 .0019726 3.14 0.004 .0021583 .0102272
grossdosaving -.0382086 .189775 -0.20 0.842 -.4263421 .3499248
fdi 1.592295 1.457257 1.09 0.284 -1.38813 4.572719
remittance -18.82432 25.36777 -0.74 0.464 -70.70722 33.05859

SR
gdppc
H0: no level relationship F= 3.163
Case 3 t= -3.367

Finite sample (4 variables, 45 observations, 10 short-run coefficients)

Kripfganz and Schneider (2018) critical values and approximate p-values

10% 5% 1% p-value
I(0) I(1) I(0) I(1) I(0) I(1) I(0) I(1)

F 2.550 3.899 3.096 4.634 4.399 6.370 0.046 0.198


t -2.470 -3.580 -2.833 -4.001 -3.567 -4.852 0.016 0.137

do not reject H0 if
both F and t are closer to zero than critical values for I(0) variables
(if p-values > desired level for I(0) variables)
reject H0 if
both F and t are more extreme than critical values for I(1) variables
(if p-values < desired level for I(1) variables)
Bounds test for a long-run relationship after the ECM
After running ardl with the error-correction option, to confirm the existence of a long-run relationship we run the bounds test as follows:
estat btest or estat ectest
where 'estat btest' and 'estat ectest' stand for the bounds test. Long-run cointegration is indicated when the F-statistic is above the critical value.
Note: if the F-statistic is greater than the (upper, I(1)) critical value at the 5% level of significance, we reject the null hypothesis and conclude that there is a long-run relationship; if it is less than the (lower, I(0)) critical value at the 5% level, there is no long-run relationship; if it falls in between, the test is inconclusive.
