
Seasonal Adjustment

Eurostat

Topics

Motivation and theoretical background (Øyvind Langsrud)

Seasonal adjustment step-by-step (László Sajtos)

(A few) issues on seasonal adjustment (László Sajtos)

Presented by
Øyvind Langsrud
Statistics Norway

Time series with seasonal and non-seasonal variation

[Figure: Index of production: Durable consumer goods (x-axis: Time, 2004-2012)]

Removing the seasonal variation

[Figure: Original (black) and seasonally adjusted (blue) (x-axis: Time, 2004-2012)]

Removing also the non-seasonal variation

[Figure: Original (black), seasonally adjusted (blue) and trend (red) (x-axis: Time, 2004-2012)]

Monthly time series example

[Figure: Original series: Retail sales volume index, 2000-2014]

Trend and seasonality can be seen


How to find it by computation?

Quick and dirty calculation of trend by ordinary linear regression:

y = a + b*time + e

a = -6619.731
b = 3.351223

time = 2000.000, 2000.083, 2000.167, 2000.250, 2000.333, 2000.417, 2000.500, 2000.583, 2000.667, 2000.750, 2000.833, 2000.917, 2001.000, 2001.083, ...

[Figure: series with fitted linear trend, 2000-2014]
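A minimal sketch of this regression, assuming NumPy and a synthetic series (the retail sales data behind the slide are not reproduced here). The helper names `decimal_time` and `fit_linear_trend` are hypothetical.

```python
import numpy as np

def decimal_time(start_year, n_months):
    """Monthly time axis in decimal years: 2000.000, 2000.083, ..."""
    return start_year + np.arange(n_months) / 12.0

def fit_linear_trend(y, time):
    """Ordinary least squares fit of y = a + b*time + e; returns (a, b)."""
    X = np.column_stack([np.ones_like(time), time])
    (a, b), *_ = np.linalg.lstsq(X, y, rcond=None)
    return a, b

# Synthetic illustration: level 100 in January 2000, rising 3 units per year.
time = decimal_time(2000, 24)
y = 100.0 + 3.0 * (time - 2000.0)

a, b = fit_linear_trend(y, time)
trend = a + b * time
print(a, b)
```

Because time is measured in calendar years, the intercept a refers to year 0, which is why it is a large negative number in the slide as well.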

Including seasonality in "the dirty model"

y = a + b*time + c_month + e

[Figure: Original (blue) and model fit (red), 2000-2014]

a = -6468.505
b = 3.275956
c =
        mnd0         mnd2         mnd3         mnd4         mnd5         mnd6
 -9.19620250 -16.59062737  -6.79790939  -8.51090569  -1.18890200   6.33881598
        mnd7         mnd8         mnd9        mnd10        mnd11        mnd12
  1.84439111   4.62139480  -2.56494236  -0.04409251   1.53598811  30.55299181
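The twelve month coefficients printed on the slide add up to zero to the printed precision, which suggests a sum-to-zero constraint on the month effects. The sketch below assumes that coding (effect coding with December as minus the sum of the others) and uses synthetic data; `fit_trend_and_months` and the numbers in `true_c` are hypothetical.

```python
import numpy as np

def fit_trend_and_months(y, time, month):
    """OLS fit of y = a + b*time + c_month + e with sum(c) = 0.

    month holds integers 1..12. Returns (a, b, c), where c has 12 entries.
    """
    n = len(y)
    effects = np.zeros((n, 11))
    for k in range(11):
        effects[month == k + 1, k] = 1.0
    effects[month == 12, :] = -1.0   # December = -(sum of the other months)
    X = np.column_stack([np.ones(n), time, effects])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    a, b, c11 = coef[0], coef[1], coef[2:]
    return a, b, np.append(c11, -c11.sum())

# Synthetic monthly data (hypothetical numbers, not the retail index).
n = 48
time = 2000.0 + np.arange(n) / 12.0
month = np.arange(n) % 12 + 1
true_c = np.array([-9, -17, -7, -9, -1, 6, 2, 5, -3, 0, 2, 31], dtype=float)
y = 100.0 + 3.0 * (time - 2000.0) + true_c[month - 1]

a, b, c = fit_trend_and_months(y, time, month)
trend = a + b * time               # a + b*time
seasonal = c[month - 1]            # c_month
irregular = y - trend - seasonal   # e
adjusted = y - seasonal            # seasonally adjusted series
```

Subtracting the fitted month effect from the original series gives the seasonally adjusted series of this simple model.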

Transforming to seasonal adjustment language

a + b*time → T_t (trend)
c_month → S_t (seasonal component)
e → I_t (irregular component)

y_t = T_t + S_t + I_t

Trend from "the dirty model"

y_t = T_t + S_t + I_t

[Figure: Original (blue) and trend (red), 2000-2014]

Seasonality from "the dirty model"

y_t = T_t + S_t + I_t

[Figure: Seasonality, 2000-2014]

Seasonal adjustment by "the dirty model"

y_t = T_t + S_t + I_t

[Figure: Original (blue) and seasonally adjusted (red), 2000-2014]

Irregular component by "the dirty model"

y_t = T_t + S_t + I_t

[Figure: Irregular component, 2000-2014]

Question to the audience: What is wrong with this ordinary regression approach?

In practice a multiplicative model is used:

y_t = T_t × S_t × I_t

Here y_t is not the original series but a series that has been corrected for holiday and trading day effects (calendar adjusted).
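As a worked step (standard algebra, not shown on the slide): taking logarithms turns the multiplicative model into an additive one, and the seasonally adjusted series is obtained by dividing out the seasonal factor.

```latex
\log y_t = \log T_t + \log S_t + \log I_t,
\qquad
\frac{y_t}{S_t} = T_t \, I_t
```

A seasonal factor of 1.3 in the multiplicative model therefore means that the month lies 30 % above the level implied by trend and irregular component.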

[Figure: Original (blue) and trend (red), 2000-2014]

[Figure: Seasonal factors, 2000-2015]

Note that the seasonal factors vary slightly over time.

[Figure: Irregular component, 2000-2014]

This time the irregular component looks more like true noise.

Note that correlation between neighbouring values (autocorrelation) is allowed.

[Figure: Original (blue) and seasonally adjusted (red), 2000-2014]

This is seasonally adjusted data as published by Statistics Norway.

Multiplicative model: y_t = T_t × S_t × I_t
Additive model: y_t = T_t + S_t + I_t

How to calculate T_t, S_t, and I_t from y_t?

This is done by filtering techniques.

One element of this methodology is how to calculate the trend from seasonally adjusted data. This is a question of smoothing a noisy series.

[Figure: Seasonally adjusted (blue) and trend (red), 2000-2012]
[Figure: 2000-2014 Seasonally adjusted (blue) and trend (red)]

[Figure: 2007-2012 Seasonally adjusted (blue) and trend (red)]

Smoothing by averaging

P_t = (Y_{t-1} + Y_t + Y_{t+1})/3

[Figure: 3-term simple moving average: [1,1,1]/3, 2007-2012]

Also called filtering

P_t = (Y_{t-2} + Y_{t-1} + Y_t + Y_{t+1} + Y_{t+2})/5
The filter is [1,1,1,1,1]/5

[Figure: 5-term simple moving average: [1,1,1,1,1]/5, 2007-2012]
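A minimal sketch of a centred moving average as a filter, assuming NumPy. The function name `moving_average` and the short input series are hypothetical; positions without a full window are left as NaN.

```python
import numpy as np

def moving_average(y, weights):
    """Apply a centred filter of odd length to y.

    Positions where the window would run past either end are set to NaN.
    """
    w = np.asarray(weights, dtype=float)
    m = len(w) // 2
    out = np.full(len(y), np.nan)
    # np.convolve flips its second argument, so pass the reversed weights.
    out[m:len(y) - m] = np.convolve(y, w[::-1], mode="valid")
    return out

# Hypothetical values, chosen only to make the arithmetic easy to follow.
y = np.array([110.0, 113.0, 116.0, 112.0, 115.0, 118.0, 114.0])
p3 = moving_average(y, np.ones(3) / 3)   # filter [1,1,1]/3
p5 = moving_average(y, np.ones(5) / 5)   # filter [1,1,1,1,1]/5
print(p3)
print(p5)
```

The longer the filter, the smoother the result and the more values are lost at each end of the series.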

Here the filter length is 9

[Figure: 9-term simple moving average: [1,1,1,1,1,1,1,1,1]/9, 2007-2012]

Filtering can be performed twice

3x3 filter
3-term moving average of a 3-term moving average
The final filter is [1,2,3,2,1]/9
P_t = (Y_{t-2} + 2Y_{t-1} + 3Y_t + 2Y_{t+1} + Y_{t+2})/9

2x12 filter
[1/2,1,1,1,1,1,1,1,1,1,1,1,1/2]/12
Also called a centred 12-term moving average

Question to the audience: Why is this filter of special interest?
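Applying one moving average after another is the same as applying the convolution of the two filters once. The sketch below, assuming NumPy, reproduces the 3x3 and 2x12 weights that way. The last lines check what the 2x12 filter does to a stable monthly pattern that sums to zero over the year; the pattern values are hypothetical.

```python
import numpy as np

ma3 = np.ones(3) / 3
filter_3x3 = np.convolve(ma3, ma3)        # 3-term MA of a 3-term MA

ma2 = np.ones(2) / 2
ma12 = np.ones(12) / 12
filter_2x12 = np.convolve(ma2, ma12)      # 2-term MA of a 12-term MA, 13 weights

print(filter_3x3 * 9)
print(filter_2x12 * 12)

# A hypothetical seasonal pattern with period 12 that sums to zero,
# repeated unchanged for three years.
pattern = np.array([-9, -17, -7, -9, -1, 6, 2, 5, -3, 0, 2, 31], dtype=float)
seasonal = np.tile(pattern, 3)
filtered = np.convolve(seasonal, filter_2x12, mode="valid")
print(np.abs(filtered).max())
```

Every window of the 2x12 filter covers each calendar month with the same total weight, so a stable seasonal pattern is averaged away.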

Henderson filters
Finding filters with good properties is an interesting topic
Henderson (1916) introduced the so-called Henderson filters
X-12-ARIMA uses this type of filter to calculate the trend
The filter length determines the degree of smoothing

[Figure: 5-term Henderson: [-21,84,160,84,-21]/286, 2007-2012]

[Figure: 7-term Henderson: [-42,42,210,295,210,42,-42]/715, 2007-2012]

[Figure: 13-term Henderson: [-325,-468,0,1100,2475,3600,4032,3600,2475,1100,0,-468,-325]/16796, 2007-2012]
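The weights shown above can be reproduced from the usual closed-form expression for symmetric Henderson weights. The formula is not given on the slides, so treat the sketch as an assumption that happens to agree with the 5-, 7- and 13-term weights listed there; `henderson_weights` is a hypothetical helper name. Exact fractions are used to avoid rounding.

```python
from fractions import Fraction

def henderson_weights(length):
    """Symmetric Henderson weights for an odd filter length, as exact fractions."""
    if length % 2 == 0 or length < 3:
        raise ValueError("length must be odd and at least 3")
    m = (length - 1) // 2
    n = m + 2
    denom = (8 * n * (n**2 - 1) * (4 * n**2 - 1)
             * (4 * n**2 - 9) * (4 * n**2 - 25))
    weights = []
    for j in range(-m, m + 1):
        num = (315 * ((n - 1)**2 - j**2) * (n**2 - j**2)
               * ((n + 1)**2 - j**2) * (3 * n**2 - 16 - 11 * j**2))
        weights.append(Fraction(num, denom))
    return weights

for length in (5, 7, 13):
    w = henderson_weights(length)
    print(length, [str(x) for x in w], "sum =", sum(w))
```

Fractions are printed in lowest terms, so the 5-term weights appear with denominator 143 or 286 rather than with the common denominator 286 used on the slide.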

[Figure: 23-term Henderson filter, 2007-2012]

Question to the audience: Why does the filtered series stop in 2009?

[Figure: 99-term Henderson filter, 2007-2012]

Non-available observations at the end: Two solutions

Asymmetric filters
Asymmetric variant of Henderson
[-0.034,0.116,0.383,0.534,0,0,0]
Can be used at the last observation

Forecasts in place of the unobserved values
The starting series for the X12-ARIMA decomposition is a calendar adjusted series which is based on reg-ARIMA modelling
The reg-ARIMA modelling can also be used to produce forecasts
X12-ARIMA uses these forecasts in trend calculations
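A sketch of the first solution, assuming NumPy: the symmetric 7-term Henderson filter is used where a full window exists, and the asymmetric variant quoted above is used at the last observation. Weights for the second- and third-last observations are not given on the slides, so those positions are left as NaN. The function name `henderson7_with_endpoint` and the test series are hypothetical.

```python
import numpy as np

# Symmetric 7-term Henderson weights, as listed on the slides.
H7 = np.array([-42, 42, 210, 295, 210, 42, -42]) / 715.0
# Nonzero part of the asymmetric variant for the last observation:
# weights on Y_{t-3}, Y_{t-2}, Y_{t-1}, Y_t (future values get weight 0).
H7_LAST = np.array([-0.034, 0.116, 0.383, 0.534])

def henderson7_with_endpoint(y):
    """Smooth y with the 7-term Henderson filter plus an end-point filter."""
    y = np.asarray(y, dtype=float)
    n, m = len(y), 3
    if n < 7:
        raise ValueError("need at least 7 observations")
    out = np.full(n, np.nan)
    out[m:n - m] = np.convolve(y, H7[::-1], mode="valid")   # full windows only
    out[-1] = H7_LAST @ y[-4:]                              # last observation
    return out

y = np.full(10, 100.0)   # hypothetical constant series
print(henderson7_with_endpoint(y))
```

The rounded asymmetric weights add up to 0.999 rather than 1, so a constant series of 100 comes out as 99.9 at the last point.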

Finding the seasonal component by filtering

[Figure: Series with trend removed, 2000-2014]

From a series with the trend removed we make 12 series: January-values, February-values, ...

Each of these series is smoothed by filtering

Altogether these smoothed series are the seasonal component
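The three steps above can be sketched as follows, assuming NumPy. The choice of the 3x3 filter for the monthly subseries is an assumption made for illustration, and `seasonal_by_month` and the factor values are hypothetical. End years without a full window are left as NaN.

```python
import numpy as np

F3X3 = np.array([1, 2, 3, 2, 1]) / 9.0   # 3x3 filter

def seasonal_by_month(detrended, period=12):
    """Smooth each calendar-month subseries of a detrended series."""
    x = np.asarray(detrended, dtype=float)
    seasonal = np.full(len(x), np.nan)
    for k in range(period):
        sub = x[k::period]                      # e.g. all January values
        smoothed = np.full(len(sub), np.nan)
        if len(sub) >= 5:
            smoothed[2:-2] = np.convolve(sub, F3X3[::-1], mode="valid")
        seasonal[k::period] = smoothed          # put the months back in order
    return seasonal

# Hypothetical stable seasonal factors, repeated for seven years.
factors = np.array([0.92, 0.88, 0.95, 0.94, 1.00, 1.05,
                    1.02, 1.03, 0.98, 1.00, 1.01, 1.22])
detrended = np.tile(factors, 7)
seasonal = seasonal_by_month(detrended)
print(seasonal[24:36])
```

With stable factors the smoothed monthly series simply return the factors; with real data the filter lets the seasonal pattern change slowly from year to year.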

The X12-ARIMA algorithm

The decomposition is made by several iterative steps
Seasonal component from series with trend removed
Trend from series with seasonal component removed

Initial estimate of trend using the 2x12 moving average
One element is downweighting of observations with an extreme irregular component

X12-ARIMA or SEATS
Both methods can be viewed as filtering techniques
X12-ARIMA
A non-parametric method
No model assumed

SEATS
The components are assumed to follow ARIMA models
The filters are derived from modelling
Possible to do inference and to make forecasts with
confidence intervals
So why the name X12-ARIMA when this method is the one
that is not based on ARIMA?
Answer on the next slide

Calendar adjustment by reg-ARIMA modelling

"The dirty model" mentioned earlier: y = a + b*time + c_month + e

Seasonal ARIMA model
Correlated errors (autocorrelation)
Differencing the series makes the model quite good without explicit parameters for trend and seasonality
Need to decide the type of ARIMA model: ARIMA(p,d,q)(P,D,Q)

Regression parameters in the model
Calendar effects: Trading day, Moving holiday, ...
Outliers and level shifts

Here y can be a log-transformed and leap-year adjusted variant of the original data

This slide is stolen from https://www.scss.tcd.ie/Rozenn.Dahyot/ST7005/15SeasonalARIMA.pdf

Here B is the backshift operator: B Y_t = Y_{t-1}

ARIMA(0,1,1)(0,1,1)
Most common model
Airline model
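Written out with the backshift operator, and assuming monthly data (seasonal period 12), the airline model with regression variables takes the following form. The symbols for the regressors, the coefficients and the innovation are notation introduced here, not taken from the slides.

```latex
(1 - B)(1 - B^{12}) \Bigl( y_t - \sum_i \beta_i x_{it} \Bigr)
  = (1 - \theta B)(1 - \Theta B^{12}) \, \varepsilon_t
```

Here the x_{it} are regression variables such as trading day, Easter and outlier variables, \theta and \Theta are the non-seasonal and seasonal moving average parameters, and \varepsilon_t is white noise. The two differencing operators on the left are what make explicit trend and seasonal parameters unnecessary.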

Example of regression variables in reg-ARIMA modelling

Easter
2000 and 2001: Easter in April
2008: Easter in March
2002: 4 of 5 Norwegian Easter days in March

Trading day
Six parameters needed to model seven days
Mon: Number of Mondays minus Number of Sundays

Month      Easter      Mon  Tue  Wed  Thu  Fri  Sat
Jan 2000   0.0000000     0   -1   -1   -1   -1    0
Feb 2000   0.0000000     0    1    0    0    0    0
Mar 2000  -0.2571429     0    0    1    1    1    0
Apr 2000   0.2571429    -1   -1   -1   -1   -1    0
May 2000   0.0000000     1    1    1    0    0    0
Jun 2000   0.0000000     0    0    0    1    1    0
Jul 2000   0.0000000     0   -1   -1   -1   -1    0
Aug 2000   0.0000000     0    1    1    1    0    0
Sep 2000   0.0000000     0    0    0    0    1    1
Oct 2000   0.0000000     0    0   -1   -1   -1   -1
Nov 2000   0.0000000     0    0    1    1    0    0
Dec 2000   0.0000000    -1   -1   -1   -1    0    0
Jan 2001   0.0000000     1    1    1    0    0    0
Feb 2001   0.0000000     0    0    0    0    0    0
Mar 2001  -0.2571429     0    0    0    1    1    1
Apr 2001   0.2571429     0   -1   -1   -1   -1   -1
May 2001   0.0000000     0    1    1    1    0    0
Jun 2001   0.0000000     0    0    0    0    1    1
Jul 2001   0.0000000     0    0   -1   -1   -1   -1
Aug 2001   0.0000000     0    0    1    1    1    0
Sep 2001   0.0000000    -1   -1   -1   -1   -1    0
Oct 2001   0.0000000     1    1    1    0    0    0
Nov 2001   0.0000000     0    0    0    1    1    0
Dec 2001   0.0000000     0   -1   -1   -1   -1    0
Jan 2002   0.0000000     0    1    1    1    0    0
Feb 2002   0.0000000     0    0    0    0    0    0
Mar 2002   0.5428571    -1   -1   -1   -1    0    0
Apr 2002  -0.5428571     1    1    0    0    0    0
May 2002   0.0000000     0    0    1    1    1    0
:
Mar 2008   0.7428571     0   -1   -1   -1   -1    0
Apr 2008  -0.7428571     0    1    1    0    0    0
May 2008   0.0000000     0    0    0    1    1    1
Jun 2008   0.0000000     0   -1   -1   -1   -1   -1
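The six trading day columns can be reproduced from the calendar alone. The sketch below uses Python's standard `calendar` module; the function name `trading_day_regressors` is hypothetical, and the Easter column is not covered because its construction is not described on the slides.

```python
import calendar

def trading_day_regressors(year, month):
    """Return [Mon, Tue, Wed, Thu, Fri, Sat] for one month.

    Each value is the number of that weekday in the month
    minus the number of Sundays in the month.
    """
    first_weekday, n_days = calendar.monthrange(year, month)  # Monday == 0
    counts = [4] * 7                      # every weekday occurs at least 4 times
    for extra in range(n_days - 28):      # days 29, 30, 31 add a fifth occurrence
        counts[(first_weekday + extra) % 7] += 1
    sundays = counts[6]
    return [counts[day] - sundays for day in range(6)]

for year, month in [(2000, 1), (2000, 2), (2002, 3), (2008, 6)]:
    print(calendar.month_abbr[month], year, trading_day_regressors(year, month))
```

A February with 28 days has exactly four of every weekday, which is why all six values are zero for February 2001 and February 2002.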

Trading day: Separate effect of each day or common effect of all weekdays?

 Regression Model
 --------------------------------------------------------------
                             Parameter     Standard
 Variable                     Estimate        Error     t-value
 --------------------------------------------------------------
 Trading Day
  Mon                          -0.0019      0.00193       -1.00
  Tue                           0.0064      0.00194        3.31
  Wed                           0.0018      0.00190        0.94
  Thu                          -0.0016      0.00195       -0.81
  Fri                           0.0138      0.00188        7.37
  Sat                           0.0034      0.00193        1.73
  *Sun (derived)               -0.0219      0.00196      -11.16

 Regression Model
 --------------------------------------------------------------
                             Parameter     Standard
 Variable                     Estimate        Error     t-value
 --------------------------------------------------------------
 Trading Day
  Weekday                       0.0036      0.00053        6.87
  **Sat/Sun (derived)          -0.0090      0.00131       -6.87

Question to the audience: Why exactly equal t-values?
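The derived rows can be checked by hand. The sketch assumes the usual conventions: in the six-parameter model the seven day effects sum to zero, and in the one-parameter model the weekend effect is -5/2 times the weekday effect. Both assumptions agree with the printed estimates.

```python
# Estimates copied from the six-parameter trading day output above.
daily = {"Mon": -0.0019, "Tue": 0.0064, "Wed": 0.0018,
         "Thu": -0.0016, "Fri": 0.0138, "Sat": 0.0034}

# Sunday is not estimated: it is minus the sum of the other six.
sunday = -sum(daily.values())

# One-parameter model: the weekend effect is a fixed multiple of the weekday effect.
weekday = 0.0036
weekend = -(5 / 2) * weekday

print(round(sunday, 4), round(weekend, 4))
```

Since the Sat/Sun value is the Weekday estimate multiplied by a constant, its standard error is scaled by the same constant in absolute value, which bears on the question about the t-values.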

Outliers
An extreme observation caused by a special event can be problematic
Can influence the modelling in a negative way
Parameter estimates
Forecasts
Decomposition

Solution
Include the outlier as a dummy variable in the reg-ARIMA modelling
...0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0...

The outlier is included in the irregular component after modelling
The observation is still included in seasonally adjusted data
But has no effect on the trend

Question to the audience: Examples of special events?

[Figure: Data with outlier: Seasonally adjusted (blue) and trend (red), 2000-2014]

[Figure: Data with level shift: Seasonally adjusted (blue) and trend (red), 2000-2014]

Level shift is handled similarly to outliers
Regression variable: ...0,0,0,0,0,0,0,0,1,1,1,1,1,1,1,1...
Level shift is included in the trend
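The two regression variables can be built as follows, assuming NumPy. The function names are hypothetical, and the 0/1 coding follows the slides; software packages may code a level shift differently (for example as -1 before the shift and 0 after), which only changes the level of the fitted regression effect.

```python
import numpy as np

def additive_outlier(n, t0):
    """Dummy variable that is 1 at position t0 and 0 elsewhere."""
    x = np.zeros(n)
    x[t0] = 1.0
    return x

def level_shift(n, t0):
    """Step variable that is 0 before position t0 and 1 from t0 onwards."""
    x = np.zeros(n)
    x[t0:] = 1.0
    return x

print(additive_outlier(18, 9))
print(level_shift(16, 8))
```

These columns are added to the regression part of the reg-ARIMA model alongside the trading day and Easter variables.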

Presented by
László Sajtos
Hungarian Central Statistical Office

Topics
Seasonal adjustment step-by-step
(A few) issues on seasonal adjustment

Seasonal adjustment step-by-step

Seasonal adjustment step-by-step: structure


Input data
STEPS with check points
Not acceptable results

Preliminary results
If results are acceptable

Output data

Time series analysis (STEP 0)

Basic conditions

Length of time series (long enough to be seasonally adjusted?)
Monthly datasets: at least 3 years long
Quarterly datasets: at least 4 years long
A time series at least 5-7 years long is optimal!

Expert information

Collecting expert information about the datasets from the sections (potential outliers, methodological changes, changes in external factors (e.g. law), connections to other time series and sectors)

Graphical analysis, test for seasonality (STEP 1)

Graphical analysis via basic and sophisticated graphs
Plotted raw dataset
Spectral analysis: autocorrelogram and auto-regressive spectrum

Identifying and explaining missing observations and outliers

Correction of data errors

Test for seasonality

Graphical analysis, an example (2000-2013)

[Figure: monthly series plotted against date, Jan 2000 to Jan 2014; annotations: "Seasonality", "Seems additive", "Probably outliers"]

Data: Hungarian monthly retail volume index, food

Type of transformation (STEP 2)


Software tools

Automatic test
Verification
Graphical analysis

Calendar adjustment (STEP 3)


Determining factors which may affect (regressors) + national holidays

Consideration based on professional reasons

Elimination

Little significance

Significance

Non-significance or absence

Keep

Consideration based on professional reasons

Elimination

Outlier treatment (Step 4)


Software tools

Available expert information

Verifying the results

Automatic outlier testing

STEP 1

Less significant, but professionally reasonable

Significant

Keep it

Monitoring

Stability

Not significant

Consideration based on professional reasons

Eliminate it

ARIMA model (Step 5)


Software tools

Automatic choice recommended

Good results

Not satisfying results

Keep model
Airline model

Manual settings

Reducing the order of the model

Other low ordered models

Decomposition (Step 6)

Software tools

Eliminating deterministic effects

Decomposition

Additive

Multiplicative

Log-additive

Quality diagnostics (Step 7)

1. Model adequacy on residuals: Ljung-Box test, Box-Pierce test
2. Seasonality: based on spectral graphics
3. Stability analysis: sliding spans

Documentation required!
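The two residual tests in point 1 are based on the sample autocorrelations of the model residuals. The sketch below, assuming NumPy, computes the two test statistics in their textbook form; it does not compute p-values, and the function names and the toy residual series are hypothetical.

```python
import numpy as np

def autocorrelations(resid, max_lag):
    """Sample autocorrelations of the residuals at lags 1..max_lag."""
    r = np.asarray(resid, dtype=float)
    r = r - r.mean()
    denom = np.sum(r ** 2)
    return np.array([np.sum(r[k:] * r[:-k]) / denom
                     for k in range(1, max_lag + 1)])

def box_pierce(resid, max_lag):
    """Box-Pierce statistic: n * sum of squared autocorrelations."""
    n = len(resid)
    rho = autocorrelations(resid, max_lag)
    return n * np.sum(rho ** 2)

def ljung_box(resid, max_lag):
    """Ljung-Box statistic: n(n+2) * sum of rho_k^2 / (n - k)."""
    n = len(resid)
    rho = autocorrelations(resid, max_lag)
    lags = np.arange(1, max_lag + 1)
    return n * (n + 2) * np.sum(rho ** 2 / (n - lags))

# Toy residuals with strong negative lag-1 autocorrelation.
resid = [1, -1, 1, -1, 1, -1, 1, -1]
print(box_pierce(resid, 1), ljung_box(resid, 1))
```

Large values of either statistic, judged against a chi-square distribution, indicate autocorrelation left in the residuals and therefore an inadequate model.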

Manual settings (Step 8)


In case of:

Detailed analysis

Quality diagnostics are not satisfactory

Further outlier correction

Other advanced settings (e.g. confidence intervals)

Manual settings

satisfying
Quality diagnostics

not

Manual settings

Dissemination (STEP 9)

EXAMPLE (IN DEMETRA 2.04 SOFTWARE)


HUNGARIAN INDUSTRIAL TIME SERIES

Automated module

Open the input database

The list of time series

Selection of time series output

Saving the output

Diagnostic, outlier %

Adjustment without fixed models

Setting the method and trading day regressor

Setting the country specific holidays

The results

Manual settings required

Quality diagnostics

(A few) issues on seasonal adjustment

Issues in Memobust book

Consistency issues

Data presentation

Revision

Treatment of the crisis

Communication with users

Issues on chained indices


Documentation

Revision

Unadjusted data
Reasons: Data arrival after deadline; erroneous data etc.
What to do: Data review

SA data
Reasons: New information is available; better estimation required.
What to do: Estimating new model, new seasonal factors

Revision strategies
Goal: preserving accuracy, taking new information into consideration while avoiding large changes (reliability and stability)

Strategies:

Extreme types
Current
Concurrent

Alternative types
Partial concurrent
Controlled current

Horizon of revision
Question: How many months of data should be revised?
Practices:

ESS Guideline: 3-4 years before the beginning of the revision period

Statistics Denmark: at least 13 months back in time

Consistency issues
Linkages in the economy and among time series; expectations of users; errors; etc.

Issues
Time consistency issue: Temporal constraints
E.g. Annual and infra-annual series

Aggregation consistency issue: Cross-sectional constraints
E.g. Total industrial and segmental series

Time consistency issues

Problem: consistency of, for instance, sub-annual and annual series (e.g. GDP)

Sources of inconsistency:
Less and more accurate data are compared;
Sampling errors;
Errors in evaluation

Benchmarking
Benchmark: typically annual data
Aim: Providing time consistency; the techniques work with the sum of the modified sub-annual series, which must match the benchmark

Benchmarking methods: Pro-rating method, Denton method

Pro-rating method
How it works: multiplies the sub-annual values by the
corresponding annual proportional discrepancies
Example: Three observations (), requirement:
Corrected values: ;
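The numbers of the original example did not survive, so the sketch below uses hypothetical values: three observations summing to 60 and a required total of 66. The function name `pro_rate` is also hypothetical.

```python
def pro_rate(sub_annual, annual_total):
    """Scale the sub-annual values so that they add up to the annual benchmark."""
    factor = annual_total / sum(sub_annual)   # proportional discrepancy
    return [value * factor for value in sub_annual]

observations = [10.0, 20.0, 30.0]   # hypothetical sub-annual values, sum 60
requirement = 66.0                  # hypothetical annual benchmark

corrected = pro_rate(observations, requirement)
print(corrected, sum(corrected))
```

Every value within a year is multiplied by the same factor, so the factor can jump from one year to the next and create a step between December and January.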

Denton method
How it works: Based on quadratic optimization
Advantages:

The method can be extended and further specified

More reliable results (smaller discontinuities compared with pro-rating)
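One variant of the idea can be sketched as a small constrained least-squares problem, assuming NumPy: keep the period-to-period changes of the adjustments as small as possible while forcing each year to add up to its benchmark. This is an additive first-difference variant without an initial-value term, chosen for brevity; other variants (for example proportional ones) exist. The function name `denton_additive` and the data are hypothetical.

```python
import numpy as np

def denton_additive(p, benchmarks, periods_per_year):
    """Benchmark the preliminary series p to annual totals.

    Minimises sum_t ((x_t - p_t) - (x_{t-1} - p_{t-1}))^2
    subject to the annual sums of x being equal to the benchmarks.
    """
    p = np.asarray(p, dtype=float)
    b = np.asarray(benchmarks, dtype=float)
    n, k = len(p), len(b)
    if n != k * periods_per_year:
        raise ValueError("series length must equal years * periods_per_year")
    D = np.diff(np.eye(n), axis=0)                      # first-difference matrix
    A = np.kron(np.eye(k), np.ones(periods_per_year))   # annual-sum matrix
    Q = 2.0 * D.T @ D
    # Solve the optimality conditions: Q x + A' lambda = Q p and A x = b.
    kkt = np.block([[Q, A.T], [A, np.zeros((k, k))]])
    rhs = np.concatenate([Q @ p, b])
    return np.linalg.solve(kkt, rhs)[:n]

# Hypothetical quarterly series (annual sums 100 and 108) and annual benchmarks.
p = [20.0, 30.0, 25.0, 25.0, 22.0, 32.0, 27.0, 27.0]
x = denton_additive(p, [110.0, 113.0], 4)
print(np.round(x, 3), x[:4].sum(), x[4:].sum())
```

Because the objective penalises changes in the adjustment from one period to the next, the correction moves gradually across the year boundary instead of jumping.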

Aggregation consistency
Aggregate series: a time series consisting of several components (e.g. industrial series)
Goal: The aggregate series should equal the sum of its components
Problem: Non-linear seasonal adjustment process

Indirect SA

Direct SA

Consequences: Hard to preserve accounting relationships and meet users' expectations

Methods to achieve aggregation consistency

Only direct or indirect seasonal adjustment


Pro-rating
Denton method
Regression based models

Thank you for your attention!


Questions?
