A time series is a sequential set of data points, typically measured at successive, equally spaced points in time
• Reserve banks record interest rates and exchange rates each day
• The government statistics department computes the country’s gross domestic product on a yearly basis
• Newspapers publish yesterday’s noon temperatures for capital cities from around the world
• Time series analysis comprises methods for analyzing time series data in order to extract
meaningful statistics and other characteristics of the data
• It is used when we have only one variable, e.g. a stock price, measured at equal intervals of time
• It is also applicable when the factors that determine a particular variable are not identifiable; in such cases we can still model the variable using time series analysis
2.1 COMPONENTS OF TIME SERIES
A time series has four components: Trend, Seasonal, Cyclical, and Irregular.
• Trend is the general tendency of a time series to increase, decrease, or stagnate over a long period of time
• Seasonal variation explains fluctuations within a year during the season, usually caused by climate and weather conditions, customs, traditional habits, etc.
• Cyclical variation describes medium-term changes caused by circumstances which repeat in cycles; the duration of a cycle extends over a longer period of time
• Irregular variation covers the unpredictable, random fluctuations that remain after the other three components are accounted for
These components can be combined in two ways:
• Multiplicative models: Yt = Tt × St × Ct × It
• Additive models: Yt = Tt + St + Ct + It
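As an illustration, here is a minimal sketch of estimating these components with statsmodels’ seasonal_decompose; the series `y` (a monthly pandas Series) and the period of 12 are assumptions for the example.

```python
# A minimal sketch, assuming `y` is a monthly pandas Series with a DatetimeIndex.
from statsmodels.tsa.seasonal import seasonal_decompose

# Additive model: Yt = Tt + St + It (classical decomposition folds the cycle into the trend)
add = seasonal_decompose(y, model="additive", period=12)

# Multiplicative model: Yt = Tt * St * It
mul = seasonal_decompose(y, model="multiplicative", period=12)

# Each result exposes the estimated components
print(add.trend.dropna().head())
print(add.seasonal.head())
print(add.resid.dropna().head())
```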
• There are two types of stationarity, i.e. strict stationarity and weak stationarity:
• The time series {Xt, t ∈ Z} is said to be strictly stationary if the joint distribution of (Xt1, Xt2, . . . , Xtk) is the same as that of (Xt1+h, Xt2+h, . . . , Xtk+h)
• In other words, strict stationarity means that the joint distribution depends only on the “difference” h, not on the times (t1, t2, . . . , tk)
• However, in most applications this stationarity condition is too strong
• The time series {Xt, t ∈ Z} is said to be weakly stationary if:
  1. E(Xt) = μ
  2. Var(Xt) = E(Xt − μ)² = σ²
  3. Cov(Xt, Xt+k) is a function of k alone, not of t
3.2 STATIONARITY IN MEAN
E(Xt) = μ
[Figure: example plots of Xt against t, and of pairs Xt, Yt against t, illustrating series that are and are not stationary in mean]
• Forecasting is difficult because a time series is non-deterministic in nature, i.e. we cannot predict with certainty what will occur in the future
• The problem becomes easier if the time series is stationary: we simply predict that its statistical properties will be the same in the future as they have been in the past
• Most statistical forecasting methods are based on the assumption that the time series can be rendered approximately stationary after mathematical transformations
The autocorrelation coefficient at lag k is defined as:
ρk = γk / γ0 = (Covariance at lag k) / (Variance)
• However, in reality we only have access to a sample; hence a plot of ρ̂k against k is known as the sample correlogram
Statistical Significance
• The statistical significance of any ρˆk can be judged by its standard error
• If a time series is purely random, that is, it exhibits white noise, the sample autocorrelation coefficients
ρˆk are approximately
ρˆk ∼ N(0, 1/n)
• In large samples the sample autocorrelation coefficients are normally distributed with zero mean and
variance equal to one over the sample size
ρ̂k ± SE ; SE = (1/n)^0.5 = 1/√n
• If the preceding interval includes the value of zero, we do not reject the hypothesis that the true ρk is zero,
but if this interval does not include 0, we reject the hypothesis that the true ρk is zero
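A short sketch of the correlogram and the white-noise check above, using statsmodels and matplotlib; the simulated white-noise series is an assumed stand-in for real data.

```python
# A minimal sketch: sample correlogram with significance bands for a
# simulated white-noise series (stand-in for real data).
import numpy as np
import matplotlib.pyplot as plt
from statsmodels.graphics.tsaplots import plot_acf

rng = np.random.default_rng(0)
y = rng.normal(size=200)              # purely random series, so rho_k ~ N(0, 1/n)

plot_acf(y, lags=20)                  # shaded band marks the significance interval
plt.show()

# The same check by hand for lag 1, using a 95% interval of +/- 1.96*SE:
n = len(y)
rho1 = np.corrcoef(y[:-1], y[1:])[0, 1]
se = np.sqrt(1.0 / n)
print(abs(rho1) <= 1.96 * se)         # True -> do not reject rho_1 = 0
```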
ΔYt = β1 + δYt−1 + wt
• Null hypothesis: δ = 0; alternative hypothesis: δ < 0
• However, we cannot estimate this by OLS and test the hypothesis that ρ = 1 by the usual t-test, because that test is severely biased in the case of a unit root
The test equation above is obtained as follows. Start from
Yt = β1 + ρYt−1 + wt
Subtracting Yt−1 from both sides:
Yt − Yt−1 = β1 + (ρ − 1)Yt−1 + wt
ΔYt = β1 + (ρ − 1)Yt−1 + wt
so that δ = ρ − 1.
Adding h lagged difference terms to absorb serial correlation, we have the augmented form:
ΔYt = β1 + δYt−1 + Σ(i=1 to h) βi ΔYt−i + wt
Calculate the t-statistic and compare it with the critical value from the Dickey-Fuller (DF) distribution.
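For instance, the (augmented) Dickey-Fuller procedure above is available in statsmodels; `y` here is a hypothetical series to be tested for a unit root.

```python
# A minimal sketch of the augmented Dickey-Fuller test, assuming `y` is an
# array-like time series.
from statsmodels.tsa.stattools import adfuller

stat, pvalue, usedlag, nobs, crit, icbest = adfuller(y, regression="c")
print("tau statistic:", stat)
print("DF critical values:", crit)    # at the 1%, 5% and 10% levels
# Reject H0 (delta = 0, i.e. a unit root) when the statistic is more
# negative than the critical value.
```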
xt = xt−1 + wt
• In practice, the series above will not be infinite but will start at some time t = 1; hence
xt = x0 + w1 + w2 + ... + wt
Important properties (taking x0 = 0):
1. E(xt) = 0
2. Var(xt) = t·σ²
• An interesting feature of random walks is the persistence of random shocks
• Hence the random walk (without drift) is non-stationary
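A small simulation (with assumed σ = 1) illustrating the two properties above: the mean stays near zero while the variance grows linearly with t.

```python
# A minimal sketch: simulate many driftless random walks and check the moments.
import numpy as np

rng = np.random.default_rng(42)
paths = rng.normal(size=(10_000, 500)).cumsum(axis=1)   # xt = w1 + ... + wt

print(paths[:, 99].mean())    # close to 0     (E(xt) = 0)
print(paths[:, 99].var())     # close to 100   (Var(xt) = t * sigma^2 at t = 100)
print(paths[:, 399].var())    # close to 400   (grows with t -> non-stationary)
```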
Random walk with drift: xt = α + xt−1 + wt, where α is the drift parameter
Yt = ρYt-1 + wt ; −1 ≤ ρ ≤ 1
• If ρ = 1, the above equation becomes a random walk model (RWM) without drift. If ρ is in fact 1, we face what is known as the unit root problem, that is, a situation of non-stationarity
• If, however, |ρ| < 1, then it can be shown that the time series Yt is stationary in the sense we have
defined it
• The distinction between stationary and nonstationary stochastic processes (or time series) has a
crucial bearing on whether the trend (the slow long-run evolution of the time series under
consideration) is deterministic or stochastic
• If the trend in a time series is a deterministic function of time, such as t, t², etc., we call it a deterministic trend, whereas if it is not predictable, we call it a stochastic trend
Consider the general model, from which the special cases below are obtained: Yt = β1 + β2t + β3Yt−1 + wt
Pure random walk (β1 = 0, β2 = 0, β3 = 1): Yt = Yt−1 + wt
Subtracting Yt−1 from both sides:
Yt − Yt−1 = wt
ΔYt = wt
Random walk with drift (β2 = 0, β3 = 1): Yt = β1 + Yt−1 + wt
Subtracting Yt−1 from both sides:
Yt − Yt−1 = β1 + wt
ΔYt = β1 + wt
Hence Yt is a DSP (difference-stationary process): after differencing, what remains is β1 + wt with wt ~ iid(0, σ²), which is stationary
Deterministic trend (β3 = 0): Yt = β1 + β2t + wt
Now, Yt−1 = β1 + β2(t−1) + wt−1
Subtracting Yt−1 from both sides:
Yt − Yt−1 = β2(t − (t−1)) + wt − wt−1 = β2 + wt − wt−1
Although Yt is nonstationary in mean, once the deterministic trend β1 + β2t is removed, what remains is stationary; such a series is called a TSP (trend-stationary process)
Random walk with drift and deterministic trend (β3 = 1): Yt = β1 + β2t + Yt−1 + wt
Subtracting Yt−1 from both sides:
Yt − Yt−1 = β1 + β2t + wt
ΔYt = β1 + β2t + wt
• Recall that the RWM without drift is nonstationary, but its first difference is stationary
• Therefore, we call the RWM without drift integrated of order 1, denoted as I(1)
• Similarly, if a time series has to be differenced twice (i.e., we take the first difference of the first differences) to make it stationary, we call such a time series integrated of order 2, denoted as I(2)
• Most economic time series are generally I(1); that is, they generally become stationary only after
taking their first differences
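One plausible way to find d in practice is to difference repeatedly and re-run the ADF test; the helper below and its 5% cutoff are illustrative assumptions, not a fixed recipe.

```python
# A sketch of estimating the order of integration d by repeated differencing,
# assuming `y` is a pandas Series (e.g. a log price level).
from statsmodels.tsa.stattools import adfuller

def order_of_integration(y, alpha=0.05, max_d=3):
    series = y.dropna()
    for d in range(max_d + 1):
        if adfuller(series)[1] < alpha:   # small p-value -> reject unit root
            return d                      # series is I(d)
        series = series.diff().dropna()   # difference once more and retest
    raise ValueError("still non-stationary after max_d differences")
```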
8.2 INTEGRATED STOCHASTIC PROCESSES (PROPERTIES)
3. A linear combination of an I(d1) series and an I(d2) series will be integrated of order equal to the larger of (d1, d2)
• DSP is stationary after differencing. Stock prices are usually DSP. To make such a time series stationary, we subtract the one-period lagged value from the original value.
• Transformations are used to stabilize the non-constant variance of a series. Common transformation methods include power, square-root, and log transformations.
• TSP is stationary around the trend line. To make such a time series stationary, regress it on time; the residuals from this regression will then be stationary.
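The DSP and TSP recipes above contrast as follows in code; `y` is a hypothetical series and the trend regression uses plain OLS.

```python
# A minimal sketch of the DSP and TSP recipes, assuming `y` is a 1-D numpy array.
import numpy as np
import statsmodels.api as sm

# DSP recipe: subtract the one-period lagged value (first difference)
dsp_stationary = np.diff(y)

# TSP recipe: regress on time and keep the residuals
t = np.arange(len(y))
trend_fit = sm.OLS(y, sm.add_constant(t)).fit()
tsp_stationary = trend_fit.resid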
• It should be pointed out that if a time series is DSP but we treat it as TSP, this is called
under-differencing
• On the other hand, if a time series is TSP but we treat it as DSP, this is called
over-differencing
Exponential smoothing: ST = α Σ(k=0 to T−1) (1−α)^k XT−k + (1−α)^T S0
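The sum above is the closed form of a simple recursion, St = αXt + (1 − α)St−1; a minimal sketch (with an assumed α) follows.

```python
# A minimal implementation of simple exponential smoothing via its recursion;
# repeated substitution of the recursion yields the expanded sum above.
import numpy as np

def exp_smooth(x, alpha=0.3, s0=None):
    s = x[0] if s0 is None else s0            # S0: initial smoothed value (assumed)
    out = []
    for value in x:
        s = alpha * value + (1 - alpha) * s   # St = alpha*Xt + (1-alpha)*S(t-1)
        out.append(s)
    return np.array(out)
```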
ARIMA
12. Autoregressive Model (AR)
Let Yt represent WPI at time t. If we model Yt as
(Yt − δ) = β1(Yt−1 − δ) + ut
where δ is the mean of Y, then Yt follows a first-order autoregressive, AR(1), stochastic process. In other words, this model says that the forecast value of Y at time t is simply some proportion (= β1) of its value at time (t − 1) plus a random shock or disturbance at time t; again, the Y values are expressed around their mean values.
In the AR(2) process, the value of Y at time t depends on its values in the previous two time periods, the Y values being expressed around their mean value δ. In general, for the AR(p) process:
(Yt − δ) = β1(Yt−1 − δ) + β2(Yt−2 − δ) + ··· + βp(Yt−p − δ) + ut
• “Data speak for themselves”: only the current and previous Y values are involved; there are
no other regressors
• It is a reduced-form model of the kind we encountered in our discussion of simultaneous-equation models
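An AR(p) of this kind can be estimated directly; below is a sketch with statsmodels’ AutoReg, where `y` is a hypothetical stationary series. Note the fitted intercept corresponds to δ(1 − β1) rather than to δ itself.

```python
# A minimal sketch of fitting an AR(1), assuming `y` is a stationary series.
from statsmodels.tsa.ar_model import AutoReg

ar1 = AutoReg(y, lags=1).fit()
print(ar1.params)                                  # intercept and lag-1 coefficient
print(ar1.predict(start=len(y), end=len(y) + 4))   # 5-step-ahead forecasts
```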
13. Moving Average Model
First-order moving average, MA(1), process:
Yt = µ + β0ut + β1ut−1
where µ is a constant and ut is the white-noise stochastic error term.
MA(1) process: each forecast made here is adjusted for the errors made in the previous period.
Second-order moving average, MA(2), process:
Yt = µ + β0ut + β1ut−1 + β2ut−2
A moving average process is simply a linear combination of white noise error terms.
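That claim is easy to verify by construction; the parameter values below are arbitrary assumptions for the sketch.

```python
# A minimal sketch: build an MA(1) series directly from white noise.
import numpy as np

rng = np.random.default_rng(7)
u = rng.normal(size=501)              # white-noise error terms u_t
mu, b0, b1 = 10.0, 1.0, 0.6           # arbitrary example parameters
y = mu + b0 * u[1:] + b1 * u[:-1]     # Yt = mu + b0*ut + b1*u(t-1)
```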
14. Autoregressive Moving Average (ARMA)
ARMA(1,1): Yt = θ + α1Yt−1 + β1ut−1 + ut
ARMA(p,q): Yt = θ + α1Yt−1 + α2Yt−2 + ··· + αpYt−p + β0ut + β1ut−1 + β2ut−2 + ··· + βqut−q
In general, in an ARMA( p, q) process, there will be p autoregressive and q moving average
terms.
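In statsmodels, an ARMA(p, q) is specified as an ARIMA model with d = 0; a sketch follows, with `y` a hypothetical stationary series.

```python
# A minimal sketch of estimating an ARMA(1,1) as ARIMA(1, 0, 1).
from statsmodels.tsa.arima.model import ARIMA

arma11 = ARIMA(y, order=(1, 0, 1)).fit()
print(arma11.summary())    # const ~ theta, ar.L1 ~ alpha_1, ma.L1 ~ beta_1
```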
AR, MA, and ARMA models all assume weak stationarity. However, in cases of nonstationary time series, the series must first be differenced until it becomes stationary; this leads to the ARIMA model.
15. ARIMA
A time series that is stationary to begin with is integrated of order zero: Yt ~ I(0)
ARIMA(p, d, q) process: a series that is integrated of order d, once differenced d times, results in a stationary series that can be modeled as a normal ARMA(p, q) process.
The Box-Jenkins methodology, which assumes stationarity, proceeds through:
1. Identification (choosing p, d, q)
2. Estimation of parameters
3. Diagnostic checking
4. Forecasting
In practice, this is attempted using standard statistical software: an ARIMA regression model is run with the lag orders identified for each component.
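For example, with statsmodels the model is run in one call once (p, d, q) have been chosen; the order (1, 1, 1) here is an assumption for illustration.

```python
# A minimal sketch of running an ARIMA(1,1,1) on a hypothetical I(1) series `y`.
from statsmodels.tsa.arima.model import ARIMA

res = ARIMA(y, order=(1, 1, 1)).fit()   # difference once, then fit ARMA(1,1)
print(res.summary())
print(res.forecast(steps=12))           # forecasts on the original scale
```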
Consider a first-order autoregressive moving-average process. Then arima estimates all the parameters in the model:
yt = xtβ + µt (structural equation)
µt = ρµt−1 + θεt−1 + εt (disturbance, ARMA(1, 1))
Using the model obtained in the ARMA step, we get forecast values of the change in the dependent variable.
• ARCH fits regression models in which the volatility of a series varies through time.
• ARCH models estimate future volatility as a function of prior volatility.
• ARCH fits models of autoregressive conditional heteroskedasticity (ARCH) by using
conditional maximum likelihood.
We assume that, conditional on the information available at time (t − 1), the disturbance term is distributed as ut ~ N(0, α0 + α1u²t−1); this is the ARCH(1) specification.
The error variance may depend not only on one lagged term of the squared error term but also on several lagged squared terms, as in the ARCH(p) model:
var(ut) = σ²t = α0 + α1u²t−1 + α2u²t−2 + ··· + αpu²t−p
The null hypothesis of no ARCH effects is H0: α1 = α2 = ··· = αp = 0
GARCH(1, 1) model:
σ²t = α0 + α1u²t−1 + α2σ²t−1
This implies that the conditional variance of u at time t depends not only on the squared error term in the previous time period but also on its conditional variance in the previous time period.
GARCH(p, q) Model: p lagged terms of the squared error term and q terms of the lagged
conditional variances.
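A GARCH(1, 1) of this form can be fitted with the third-party arch package; `returns` is a hypothetical series of returns (the package is typically used with percent returns).

```python
# A minimal sketch of fitting a GARCH(1,1), assuming `returns` holds a series
# of (percent) returns.
from arch import arch_model

am = arch_model(returns, vol="GARCH", p=1, q=1)   # sigma2_t = a0 + a1*u2(t-1) + a2*sigma2(t-1)
res = am.fit(disp="off")
print(res.params)                        # omega, alpha[1], beta[1]
print(res.conditional_volatility[-5:])   # fitted sigma_t for the last periods
```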