
Time Series Analysis

Lasse Engbo Christiansen

Department of Applied Mathematics and Computer Science


Technical University of Denmark

September 26, 2017

Outline of the lecture

- Regression-based methods, 3rd part:
  - Regression and exponential smoothing (Sec. 3.4)
  - Global and local trend models - an example (Sec. 3.6)
- Operators; the backward shift operator (Sec. 4.5)

Predictions In Time Series

- Predictions so far have been model-based: one model for all data.
- What if no model fits our needs (and data)?
- Then methods that are not model-based in the usual sense apply.

Exponential smoothing

Given a forgetting factor λ ∈ ]0; 1[


$$\hat{\mu}_N = c \sum_{j=0}^{N-1} \lambda^j Y_{N-j} = c\left[Y_N + \lambda Y_{N-1} + \cdots + \lambda^{N-1} Y_1\right]$$

The constant $c$ is chosen so that the weights sum to one, which implies that $c = (1-\lambda)/(1-\lambda^N)$. When $N$ is large, $c \approx 1-\lambda$:

$$\begin{aligned} \hat{\mu}_N &= (1-\lambda)\left[Y_N + \lambda Y_{N-1} + \cdots + \lambda^{N-1} Y_1\right] \\ &= (1-\lambda)Y_N + (1-\lambda)\left[\lambda Y_{N-1} + \cdots + \lambda^{N-1} Y_1\right] \\ &= (1-\lambda)Y_N + \lambda(1-\lambda)\left[Y_{N-1} + \cdots + \lambda^{N-2} Y_1\right] \\ &= (1-\lambda)Y_N + \lambda\hat{\mu}_{N-1} \end{aligned}$$
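As a minimal sketch of this recursion in code (numpy; the initialization $S_1 = Y_1$ is a common choice, not specified on the slide; the data are the six observations from the trend example later in the lecture):

```python
import numpy as np

def simple_exp_smoothing(y, lam):
    """First-order exponential smoothing: S_t = (1 - lam) * y_t + lam * S_{t-1}."""
    s = np.empty(len(y))
    s[0] = y[0]  # initialization choice; the slides do not specify one
    for t in range(1, len(y)):
        s[t] = (1.0 - lam) * y[t] + lam * s[t - 1]
    return s

# The six observations from the trend example later in the lecture
y = np.array([2.0, 2.5, 3.5, 3.0, 4.0, 3.5])
print(simple_exp_smoothing(y, lam=0.9))  # strong smoothing: reacts slowly
print(simple_exp_smoothing(y, lam=0.2))  # weak smoothing: tracks the data closely
```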

Exponential Smoothing and prediction

Used as a prediction model:

$$\hat{Y}_{N+\ell|N} = \hat{\mu}_N$$

Updating predictions with new observations:

$$\hat{Y}_{N+\ell+1|N+1} = (1-\lambda)Y_{N+1} + \lambda \hat{Y}_{N+\ell|N}$$

Simple Exponential Smoothing

For large $N$:
$$\hat{\mu}_{N+1} = (1-\lambda)Y_{N+1} + \lambda\hat{\mu}_N$$

Definition (Simple Exponential Smoothing):

The sequence $S_N$ defined by
$$S_N = (1-\lambda)Y_N + \lambda S_{N-1}$$
is called the simple (or first-order) exponential smoothing of the time series $Y$.

- The smoothing constant $\alpha = 1 - \lambda$ (or the forgetting factor $\lambda$) determines how much the latest observation influences the prediction.

Choice of smoothing constant α = 1 − λ

- Given a data set $t = 1, \ldots, N$ we construct

$$S(\alpha) = \sum_{t=1}^{N} \left(Y_t - \hat{Y}_{t|t-\ell}(\alpha)\right)^2 = \sum_{t=1}^{N} \left(Y_t - \hat{\mu}_{t-\ell}(\alpha)\right)^2$$

The value minimizing $S(\alpha)$ is chosen.

- If the data set is large, we eliminate the influence of the initial estimate by dropping the first part of the errors when evaluating $S(\alpha)$.
- Keep in mind, however, what the smoothing is used for, and modify the criterion accordingly. The next slides show an example of this.
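A sketch of this selection in code (the data here are synthetic, not the Risø measurements below; `horizon` is the prediction horizon in steps and `burn_in` drops the first errors, as described above):

```python
import numpy as np

def sse(y, alpha, horizon=1, burn_in=0):
    """S(alpha): sum of squared k-step prediction errors Y_t - mu_hat_{t-k}."""
    lam = 1.0 - alpha
    mu = np.empty(len(y))
    mu[0] = y[0]  # initialization choice
    for t in range(1, len(y)):
        mu[t] = alpha * y[t] + lam * mu[t - 1]
    err = y[horizon:] - mu[:-horizon]  # k-step prediction errors
    return np.sum(err[burn_in:] ** 2)

rng = np.random.default_rng(1)
y = 10 + np.cumsum(rng.normal(0, 0.3, 2000))  # synthetic persistent series
alphas = np.linspace(0.05, 0.95, 19)
best = min(alphas, key=lambda a: sse(y, a, horizon=7, burn_in=100))
print(f"alpha minimizing S(alpha) for the 7-step horizon: {best:.2f}")
```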

Example – wind speed 76 m a.g.l. at DTU Risø
- Measurements of wind speed every 10 minutes
- Task: forecast up to approximately 3 hours ahead using exponential smoothing

[Figure: 10 min. averages of wind speed (m/s), Feb 2002 - Sep 2003]

S (α) for horizons 10 and 70 minutes

[Figure: $S(\alpha)$ (SSE) vs. weight on the most recent observation, for horizons of 10 minutes (left) and 70 minutes (right)]

- 10 minutes (1-step): use $\alpha = 0.95$ or higher
- 70 minutes (7-step): use $\alpha \approx 0.7$

S (α) for horizons 130 and 190 minutes

[Figure: $S(\alpha)$ (SSE) vs. weight on the most recent observation, for horizons of 130 minutes (left) and 190 minutes (right)]

- 130 minutes (13-step): use $\alpha \approx 0.6$
- 190 minutes (19-step): use $\alpha \approx 0.5$

Example of forecasts with optimal α
[Figure: measured wind speed with 10-minute and 190-minute forecasts, Jan 11-12, 2003]
From global to local trend models

Last week we worked with the global trend model

$$Y_{N+j} = f^T(j)\,\theta + \varepsilon_{N+j}$$

which was solved iteratively by

$$\begin{aligned} F_{N+1} &= F_N + f(-N) f^T(-N) \\ h_{N+1} &= L^{-1} h_N + f(0) Y_{N+1} \\ \hat{\theta}_{N+1} &= F_{N+1}^{-1} h_{N+1} \end{aligned}$$

Could we do that locally?

Local trend models

We forget old observations in an exponential manner:

$$\hat{\theta}_N = \arg\min_{\theta} S(\theta; N)$$

where for $0 < \lambda < 1$

$$S(\theta; N) = \sum_{j=0}^{N-1} \lambda^j \left[Y_{N-j} - f^T(-j)\,\theta\right]^2$$

[Figure: weights $\lambda^j$ vs. $j$ (age of observation)]

WLS formulation

The criterion
$$S(\theta; N) = \sum_{j=0}^{N-1} \lambda^j \left[Y_{N-j} - f^T(-j)\,\theta\right]^2$$
can be written as
$$\begin{bmatrix} Y_1 - f^T(N-1)\theta \\ Y_2 - f^T(N-2)\theta \\ \vdots \\ Y_N - f^T(0)\theta \end{bmatrix}^T \begin{bmatrix} \lambda^{N-1} & 0 & \cdots & 0 \\ 0 & \lambda^{N-2} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix} \begin{bmatrix} Y_1 - f^T(N-1)\theta \\ Y_2 - f^T(N-2)\theta \\ \vdots \\ Y_N - f^T(0)\theta \end{bmatrix}$$
which is a WLS criterion with $\Sigma = \operatorname{diag}[1/\lambda^{N-1}, \ldots, 1/\lambda, 1]$.

WLS solution

$$\hat{\theta}_N = \left(x_N^T \Sigma^{-1} x_N\right)^{-1} x_N^T \Sigma^{-1} Y$$
or
$$\hat{\theta}_N = F_N^{-1} h_N, \qquad F_N = \sum_{j=0}^{N-1} \lambda^j f(-j) f^T(-j), \qquad h_N = \sum_{j=0}^{N-1} \lambda^j f(-j) Y_{N-j}$$

Updating the estimates when YN +1 is available

$$\begin{aligned} F_{N+1} &= F_N + \lambda^N f(-N) f^T(-N) \\ h_{N+1} &= \lambda L^{-1} h_N + f(0) Y_{N+1} \\ \hat{\theta}_{N+1} &= F_{N+1}^{-1} h_{N+1} \end{aligned}$$

As initial values we can use $h_0 = 0$ and $F_0 = 0$.

For many functions $\lambda^N f(-N) f^T(-N) \to 0$ as $N \to \infty$, and we get the stationary result $F_{N+1} = F_N = F$. Hence:

$$\hat{\theta}_{N+1} = L^T \hat{\theta}_N + F^{-1} f(0)\left[Y_{N+1} - \hat{Y}_{N+1|N}\right]$$
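A minimal numpy sketch of the update recursion above for the local linear trend $f(j) = (1, j)^T$, where $L = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ maps $f(j)$ to $f(j+1)$ (the data are the six observations from the example later in this lecture):

```python
import numpy as np

Linv = np.linalg.inv(np.array([[1.0, 0.0], [1.0, 1.0]]))  # L maps f(j) to f(j+1)
f0 = np.array([1.0, 0.0])                                 # f(0)

def update(F, h, y_new, N, lam):
    """Update F_N, h_N to F_{N+1}, h_{N+1} when Y_{N+1} arrives."""
    f_minus_N = np.array([1.0, -float(N)])                # f(-N)
    F = F + lam**N * np.outer(f_minus_N, f_minus_N)
    h = lam * (Linv @ h) + f0 * y_new
    return F, h

y = np.array([2.0, 2.5, 3.5, 3.0, 4.0, 3.5])              # the six example observations
F, h = np.zeros((2, 2)), np.zeros(2)                      # F_0 = 0, h_0 = 0
for N, y_new in enumerate(y):
    F, h = update(F, h, y_new, N, lam=0.9)
print(np.linalg.solve(F, h))           # theta_hat_6, approx [3.85, 0.308]
```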

Variance estimation in local trend models
Define the total memory
$$T = \sum_{j=0}^{N-1} \lambda^j = \frac{1-\lambda^N}{1-\lambda}$$

$T$ is a measure of how many observations the estimation is essentially based upon; for example, with $\lambda = 0.9$, $T \to 1/(1-\lambda) = 10$ as $N \to \infty$.

A variance estimator is therefore
$$\hat{\sigma}^2 = (Y - x_N\hat{\theta}_N)^T \Sigma^{-1} (Y - x_N\hat{\theta}_N)/(T - p), \qquad T > p$$

Notice that the restriction on $T$ is a restriction on $\lambda$. How do you interpret this?
(Note: this estimator is not in the book.)

Global and Local Trend Models - an Example

6 observations (N = 6):
[Figure: the six observations $y_t$ plotted against $t = 1, \ldots, 6$]

Global Linear Trend:

$$Y_{N+j} = \theta_0 + \theta_1 j + \varepsilon_{N+j} \quad\Rightarrow\quad f(j) = \begin{bmatrix} 1 \\ j \end{bmatrix}$$

Linear model form:

$$\begin{bmatrix} 2.0 \\ 2.5 \\ 3.5 \\ 3.0 \\ 4.0 \\ 3.5 \end{bmatrix} = \begin{bmatrix} 1 & -5 \\ 1 & -4 \\ 1 & -3 \\ 1 & -2 \\ 1 & -1 \\ 1 & 0 \end{bmatrix} \begin{bmatrix} \theta_0 \\ \theta_1 \end{bmatrix} + \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \varepsilon_3 \\ \varepsilon_4 \\ \varepsilon_5 \\ \varepsilon_6 \end{bmatrix} \quad\Leftrightarrow\quad y = x_6\,\theta + \varepsilon$$

Global linear trend: Estimation

 
$$F_6 = x_6^T x_6 = \begin{bmatrix} 6 & -15 \\ -15 & 55 \end{bmatrix}$$
$$h_6 = x_6^T y = \begin{bmatrix} 18.5 \\ -40.5 \end{bmatrix}$$
$$\hat{\theta}_6 = F_6^{-1} h_6 = \begin{bmatrix} 0.5238 & 0.1429 \\ 0.1429 & 0.0571 \end{bmatrix} \begin{bmatrix} 18.5 \\ -40.5 \end{bmatrix} = \begin{bmatrix} 3.905 \\ 0.329 \end{bmatrix}$$
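These numbers are easily verified numerically, e.g. with numpy:

```python
import numpy as np

y = np.array([2.0, 2.5, 3.5, 3.0, 4.0, 3.5])
x6 = np.column_stack([np.ones(6), np.arange(-5, 1)])  # rows f^T(j), j = -5, ..., 0

F6 = x6.T @ x6                  # [[6, -15], [-15, 55]]
h6 = x6.T @ y                   # [18.5, -40.5]
print(np.linalg.solve(F6, h6))  # approx [3.905, 0.329]
```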

Global linear trend: Prediction
Linear predictor:
$$\hat{Y}_{6+\ell|6} = f^T(\ell)\,\hat{\theta}_6 = 3.905 + 0.329\,\ell$$

LS estimate of $\sigma^2$:
$$\hat{\sigma}^2 = (y - x_6\hat{\theta}_6)^T (y - x_6\hat{\theta}_6)/(6-2) = 0.453^2$$

Prediction error:
$$\varepsilon_6(\ell) = Y_{6+\ell} - \hat{Y}_{6+\ell|6}, \qquad \widehat{\operatorname{Var}}(\varepsilon_6(\ell)) = \hat{\sigma}^2\left[1 + f^T(\ell)\, F_6^{-1} f(\ell)\right]$$

For example, $\hat{Y}_{7|6} = 4.234$ with $\widehat{\operatorname{Var}}(\varepsilon_6(1)) = 0.619^2$.

90% prediction interval:
$$\hat{Y}_{7|6} \pm t_{0.05}(6-2)\sqrt{\widehat{\operatorname{Var}}(\varepsilon_6(1))} = 4.234 \pm 1.320$$
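A sketch of the same computation (scipy's `t.ppf(0.95, df=4)` is the upper 5% quantile $t_{0.05}(4)$; the printed values differ from the slide's in the third decimal because the slide rounds intermediate results):

```python
import numpy as np
from scipy import stats

y = np.array([2.0, 2.5, 3.5, 3.0, 4.0, 3.5])
x6 = np.column_stack([np.ones(6), np.arange(-5, 1)])
F6 = x6.T @ x6
theta6 = np.linalg.solve(F6, x6.T @ y)

resid = y - x6 @ theta6
sigma2 = resid @ resid / (6 - 2)                       # approx 0.453**2

f1 = np.array([1.0, 1.0])                              # f(1)
y_pred = f1 @ theta6                                   # approx 4.23
var_eps = sigma2 * (1 + f1 @ np.linalg.solve(F6, f1))  # approx 0.619**2
half_width = stats.t.ppf(0.95, df=4) * np.sqrt(var_eps)
print(f"{y_pred:.3f} +/- {half_width:.3f}")            # approx 4.23 +/- 1.32
```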

Global linear trend: Estimation

[Figure: the six observations with the fitted global linear trend]

Global linear trend: Updating the parameters

- New observation: $y_7 = 3.5$.

$$\begin{aligned} F_7 &= F_6 + f(-6) f^T(-6) = \begin{bmatrix} 6 & -15 \\ -15 & 55 \end{bmatrix} + \begin{bmatrix} 1 \\ -6 \end{bmatrix} \begin{bmatrix} 1 & -6 \end{bmatrix} = \begin{bmatrix} 7 & -21 \\ -21 & 91 \end{bmatrix}, \\ h_7 &= L^{-1} h_6 + f(0)\, y_7 = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}^{-1} \begin{bmatrix} 18.5 \\ -40.5 \end{bmatrix} + \begin{bmatrix} 1 \\ 0 \end{bmatrix} 3.5 = \begin{bmatrix} 22 \\ -59 \end{bmatrix}, \\ \hat{\theta}_7 &= F_7^{-1} h_7 = \begin{bmatrix} 0.4643 & 0.1071 \\ 0.1071 & 0.0357 \end{bmatrix} \begin{bmatrix} 22 \\ -59 \end{bmatrix} = \begin{bmatrix} 3.893 \\ 0.250 \end{bmatrix}. \end{aligned}$$
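The update step, checked numerically (exact arithmetic gives $\hat{\theta}_7 = (763/196,\ 49/196) \approx (3.893,\ 0.250)$):

```python
import numpy as np

F6 = np.array([[6.0, -15.0], [-15.0, 55.0]])
h6 = np.array([18.5, -40.5])
Linv = np.array([[1.0, 0.0], [-1.0, 1.0]])    # inverse of L = [[1, 0], [1, 1]]

f_m6 = np.array([1.0, -6.0])                  # f(-6)
F7 = F6 + np.outer(f_m6, f_m6)                # [[7, -21], [-21, 91]]
h7 = Linv @ h6 + np.array([1.0, 0.0]) * 3.5   # [22, -59]
print(np.linalg.solve(F7, h7))                # approx [3.893, 0.250]
```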

Global linear trend: Updating

[Figure: global trend fits given t = 6 and given t = 7, each shown ± one standard deviation]

Local linear trend: Estimation

- Forgetting factor $\lambda = 0.9$. Linear model unchanged.

$$F_6 = \sum_{j=0}^{5} \lambda^j f(-j) f^T(-j) = \begin{bmatrix} 4.6856 & -10.284 \\ -10.284 & 35.961 \end{bmatrix}$$
$$h_6 = \sum_{j=0}^{5} \lambda^j f(-j) Y_{6-j} = \begin{bmatrix} 14.902 \\ -28.580 \end{bmatrix}$$
$$\hat{\theta}_6 = F_6^{-1} h_6 = \begin{bmatrix} 0.573 & 0.164 \\ 0.164 & 0.075 \end{bmatrix} \begin{bmatrix} 14.902 \\ -28.580 \end{bmatrix} = \begin{bmatrix} 3.85 \\ 0.308 \end{bmatrix}$$
$$\hat{\sigma}^2 = (Y - x_6\hat{\theta}_6)^T \Sigma^{-1} (Y - x_6\hat{\theta}_6)/(T-2) = 0.496^2$$
$$\widehat{\operatorname{Var}}(\varepsilon_6(1)) = \hat{\sigma}^2\left[1 + f^T(1)\, F_6^{-1} f(1)\right] = 0.697^2$$
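A numpy sketch reproducing these numbers; $\Sigma^{-1}$ enters as the weights $\lambda^{N-1}, \ldots, \lambda, 1$ on $Y_1, \ldots, Y_6$:

```python
import numpy as np

y = np.array([2.0, 2.5, 3.5, 3.0, 4.0, 3.5])
lam, N, p = 0.9, 6, 2
x6 = np.column_stack([np.ones(N), np.arange(-(N - 1), 1)])  # rows f^T(j)
w = lam ** np.arange(N - 1, -1, -1)     # lam^{N-1}, ..., lam, 1 for Y_1, ..., Y_N

F6 = x6.T @ (w[:, None] * x6)           # [[4.6856, -10.284], [-10.284, 35.961]]
h6 = x6.T @ (w * y)                     # [14.902, -28.580]
theta6 = np.linalg.solve(F6, h6)        # approx [3.85, 0.308]

T = w.sum()                             # total memory, approx 4.69
resid = y - x6 @ theta6
sigma2 = (w * resid**2).sum() / (T - p) # approx 0.496**2
f1 = np.array([1.0, 1.0])
var_eps = sigma2 * (1 + f1 @ np.linalg.solve(F6, f1))  # approx 0.697**2
print(theta6, np.sqrt(sigma2), np.sqrt(var_eps))
```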

Local linear trend: Predicting for t = 7
[Figure: the six observations with global and local trend predictions for t = 7]

Local linear trend: Estimating σ̂ 2
- We can use the WLS estimator for $\hat{\theta}_N$,
- but not for $\hat{\sigma}^2$!?!
- Reason: local trend models assume that the $\varepsilon_t$ are i.i.d.
- The proposed estimator
$$\hat{\sigma}_N^2 = (Y - x_N\hat{\theta}_N)^T \Sigma^{-1} (Y - x_N\hat{\theta}_N)/(T - p), \qquad T > p$$
provides a local estimate.
- A global estimator is given by
$$\hat{\sigma}_N^2 = \frac{1}{N - n} \sum_{j=n+1}^{N} \frac{(Y_j - \hat{Y}_{j|j-1})^2}{1 + f^T(1)\, F_{j-1}^{-1} f(1)}$$
- Note that the first $n$ predictions are ignored to stabilize the estimator.
- Note that the prediction errors are normalized; see the sketch below.
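A sketch of the global estimator for the local linear trend, reusing the recursive update from earlier (the helper name `global_sigma2` and the choice `n_burn=2` are mine; the normalization uses $F_{j-1}$, i.e. the information matrix before $Y_j$ arrives):

```python
import numpy as np

def global_sigma2(y, lam, n_burn=2):
    """Average of normalized squared one-step prediction errors, j > n_burn."""
    Linv = np.array([[1.0, 0.0], [-1.0, 1.0]])
    f0, f1 = np.array([1.0, 0.0]), np.array([1.0, 1.0])
    F, h = np.zeros((2, 2)), np.zeros(2)
    sse, count = 0.0, 0
    for N, y_new in enumerate(y):
        if N > n_burn:  # F is invertible once at least two observations are in
            theta = np.linalg.solve(F, h)
            err = y_new - f1 @ theta                   # Y_j - Yhat_{j|j-1}
            sse += err**2 / (1 + f1 @ np.linalg.solve(F, f1))
            count += 1
        fN = np.array([1.0, -float(N)])                # f(-N)
        F = F + lam**N * np.outer(fN, fN)
        h = lam * (Linv @ h) + f0 * y_new
    return sse / count

y = np.array([2.0, 2.5, 3.5, 3.0, 4.0, 3.5])
print(np.sqrt(global_sigma2(y, lam=0.9)))
```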

Local linear trend: Estimation
[Figure: the data with global linear fits given t = 6 and t = 7, and local linear fits given t = 6]

Which of the (green) local trend models has the highest $\lambda$?


Operators; The backwards shift operator B

- An operator $A$ is (here) a function of a time series $\{x_t\}$ (or a stochastic process $\{X_t\}$).
- Applying an operator to a time series $\{x_t\}$ yields a new time series $\{Ax_t\}$; likewise for a stochastic process, $\{AX_t\}$.
- The most important operator for us is the backward shift operator $B$: $Bx_t = x_{t-1}$. Obviously, $B^j x_t = x_{t-j}$.
- All other operators we consider in this lecture may be expressed in terms of $B$.

The forward shift F and difference ∇

The forward shift operator

- $F x_t = x_{t+1}$; $F^j x_t = x_{t+j}$.
- Combining a forward and a backward shift yields the identity operator $1$, i.e. $F$ and $B$ are each other's inverse: $B^{-1} = F$ and $F^{-1} = B$.

The difference operator

- $\nabla x_t = x_t - x_{t-1} = 1x_t - Bx_t = (1 - B)x_t$.
- Thus: $\nabla = 1 - B$.

The summation S

$$S x_t = x_t + x_{t-1} + x_{t-2} + \cdots = x_t + Bx_t + B^2 x_t + \cdots = (1 + B + B^2 + \cdots)\,x_t$$

- Summation, then difference (remember $S x_t = x_t + S x_{t-1}$):
$$\nabla S x_t = S x_t - S x_{t-1} = x_t + S x_{t-1} - S x_{t-1} = x_t$$
- Difference, then summation:
$$S \nabla x_t = (1 + B + B^2 + \cdots)x_t - (1 + B + B^2 + \cdots)x_{t-1} = (1 + B + B^2 + \cdots)x_t - (B + B^2 + \cdots)x_t = x_t$$
- So $\nabla$ and $S$ are each other's inverse:
$$\nabla^{-1} = \frac{1}{1 - B} = 1 + B + B^2 + \cdots = S$$
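On a finite sample the infinite sum behind $S$ is truncated, so a starting level must be supplied, but the inverse relationship is easy to check numerically (`np.diff` plays $\nabla$, `np.cumsum` plays $S$):

```python
import numpy as np

x = np.array([3.0, 1.0, 4.0, 1.0, 5.0, 9.0])

dx = np.diff(x)              # nabla x_t = x_t - x_{t-1} (drops the first point)
recovered = x[0] + np.concatenate([[0.0], np.cumsum(dx)])  # S nabla x + level
print(np.allclose(recovered, x))  # True: summation undoes differencing
```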
Properties of B , F , ∇ and S

- The operators are all linear, i.e.
$$H[\lambda x_t + (1 - \lambda) y_t] = \lambda H[x_t] + (1 - \lambda) H[y_t]$$
- The operators may be combined into new operators: the power series
$$a(z) = \sum_{i=0}^{\infty} a_i z^i$$
defines a new operator from an operator $H$ by linear combinations:
$$a(H) = \sum_{i=0}^{\infty} a_i H^i$$

Examples of combined operators

- $\nabla^{-1}$:
$$\frac{1}{1-z} = \sum_{i=0}^{\infty} z^i \quad\text{so}\quad \nabla^{-1} = \frac{1}{1-B} = \sum_{i=0}^{\infty} B^i = S$$
- Operator polynomial of order $q$:
$$\theta(z) = \sum_{i=0}^{q} \theta_i z^i,$$
i.e. $\theta_i = 0$ for $i > q$:
$$\theta(B) = 1 + \theta_1 B + \cdots + \theta_q B^q$$
where $\theta_0$ is chosen to be 1.

The Cauchy product (discrete convolution)

The equation
$$\{\lambda_i\} * \{\psi_i\} = \{\pi_i\}$$
means that
$$\begin{aligned} \pi_0 &= \lambda_0 \psi_0 \\ \pi_1 &= \lambda_1 \psi_0 + \lambda_0 \psi_1 \\ &\;\;\vdots \\ \pi_i &= \lambda_i \psi_0 + \lambda_{i-1} \psi_1 + \cdots + \lambda_0 \psi_i \\ &\;\;\vdots \end{aligned}$$

Multiplying combined operators
Theorem 4.13

- For the operator $H$ the following operators are given:
$$\lambda(H) = \sum_{i=0}^{\infty} \lambda_i H^i, \qquad \psi(H) = \sum_{i=0}^{\infty} \psi_i H^i, \qquad \pi(H) = \sum_{i=0}^{\infty} \pi_i H^i$$
such that $\lambda(H)\psi(H) = \pi(H)$.
- Then $\lambda$, $\psi$, $\pi$ satisfy the equation $\{\lambda_i\} * \{\psi_i\} = \{\pi_i\}$, as illustrated below.
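Since multiplying operator polynomials convolves their coefficient sequences, `np.convolve` computes the Cauchy product directly (the coefficients below are illustrative):

```python
import numpy as np

lam_c = np.array([1.0, -0.5])           # lambda(z) = 1 - 0.5 z
psi_c = np.array([1.0, 0.5, 0.25])      # psi(z)    = 1 + 0.5 z + 0.25 z^2
pi_c = np.convolve(lam_c, psi_c)        # {lambda_i} * {psi_i} = {pi_i}
print(pi_c)                             # [1, 0, 0, -0.125]: pi(z) = 1 - 0.125 z^3
```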

Highlights

- Local trend model: $S(\theta; N) = \sum_{j=0}^{N-1} \lambda^j \left[Y_{N-j} - f^T(-j)\,\theta\right]^2$
- Iterative updates:
$$F_N = \sum_{j=0}^{N-1} \lambda^j f(-j) f^T(-j), \qquad h_N = \sum_{j=0}^{N-1} \lambda^j f(-j) Y_{N-j}, \qquad \hat{\theta}_N = F_N^{-1} h_N$$
- Backward shift operator: $B X_t = X_{t-1}$
- Difference operator: $\nabla X_t = X_t - X_{t-1}$

