DASC6510/DASC4990
Unit 2b: Time series decomposition
Erfanul Hoque, PhD
Thompson Rivers University
These notes are strongly inspired by the book:
Hyndman, R. J. & Athanasopoulos, G. (2021) Forecasting:
principles and practice, 3rd edition.
Time series decomposition
• We can think of a time series as comprising three
components: a trend-cycle component, a seasonal component,
and a remainder component (containing anything else in the
time series).
• Here, we consider the most common methods for extracting
these components from a time series. Often this is done to
help improve understanding of the time series, but it can also
be used to improve forecast accuracy.
• When decomposing a time series, it is sometimes helpful to
first transform or adjust the series in order to make the
decomposition (and later analysis) as simple as possible.
Transformations and adjustments
Population adjustments / Per capita adjustments
library(fpp3)  # attaches tsibble, feasts, fable, dplyr, ggplot2 and the example data
global_economy %>%
  filter(Country == "Australia") %>%
  autoplot(GDP)
[Figure: time plot of Australian GDP; x-axis: Year, y-axis: GDP]
Per capita adjustments
global_economy %>%
filter(Country == "Australia") %>%
autoplot(GDP / Population)
[Figure: time plot of Australian GDP per capita; x-axis: Year, y-axis: GDP/Population]
Exercise
Consider the GDP information in global_economy. Plot the GDP
per capita for each country over time. Which country has the
highest GDP per capita? How has this changed over time?
Inflation adjustments
• Data which are affected by the value of money need to be adjusted
before modelling.
• For example, the average cost of a new house will have
increased over the last few decades due to inflation. A
$200,000 house this year is not the same as a $200,000 house
twenty years ago. For this reason, financial time series are
usually adjusted so that all values are stated in dollar values
from a particular year.
• To make these adjustments, a price index is used. If $z_t$ denotes the
price index and $y_t$ denotes the original house price in year $t$, then
$x_t = y_t / z_t \times z_{2000}$ gives the adjusted house price in the dollar
values of a particular year (here, year 2000).
• For consumer goods, a common price index is the Consumer Price
Index (or CPI).
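As a small illustration of the adjustment formula above, the following base R sketch converts nominal prices into year-2000 dollars. The prices and index values are made up for illustration (they are not real CPI figures), and the data frame and column names are assumptions.

```r
# Hypothetical nominal house prices y_t and price index values z_t.
prices <- data.frame(
  year  = c(2000, 2010, 2020),
  price = c(200000, 300000, 450000),  # nominal price y_t
  index = c(100, 125, 150)            # price index z_t
)

# Index value in the base year (z_2000).
z_base <- prices$index[prices$year == 2000]

# x_t = y_t / z_t * z_2000: every price expressed in year-2000 dollars.
prices$adjusted <- prices$price / prices$index * z_base

print(prices)
```

In the base year the adjusted and nominal prices coincide, because $z_t = z_{2000}$ there.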
# Annual turnover for newspaper and book retailing
print_retail <- aus_retail %>%
  filter(Industry == "Newspaper and book retailing") %>%
  group_by(Industry) %>%
  index_by(Year = year(Month)) %>%
  summarise(Turnover = sum(Turnover))
# Australian economic indicators, including the CPI
aus_economy <- global_economy %>%
  filter(Code == "AUS")
# Adjust turnover by the CPI, then plot the original and adjusted series
print_retail %>%
  left_join(aus_economy, by = "Year") %>%
  mutate(Adjusted_turnover = Turnover / CPI * 100) %>%
  pivot_longer(c(Turnover, Adjusted_turnover),
               values_to = "Turnover") %>%
  mutate(name = factor(name,
                       levels = c("Turnover", "Adjusted_turnover"))) %>%
  ggplot(aes(x = Year, y = Turnover)) +
  geom_line() +
  facet_grid(name ~ ., scales = "free_y") +
  labs(title = "Turnover: Australian print media industry",
       y = "$AU")
[Figure: Turnover: Australian print media industry. Two panels (Turnover, Adjusted_turnover); x-axis: Year, y-axis: $AU]
Mathematical transformations
If the data show different variation at different levels of
the series, then a transformation can be useful.
Denote the original observations as $y_1, \dots, y_n$ and the
transformed observations as $w_1, \dots, w_n$.
Mathematical transformations for stabilizing variation (in order of increasing strength):
Square root: $w_t = \sqrt{y_t}$
Cube root: $w_t = \sqrt[3]{y_t}$
Logarithm: $w_t = \log(y_t)$
Logarithms, in particular, are useful because they are
more interpretable: changes in a log value are relative
(percent) changes on the original scale.
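The link between log changes and relative changes can be checked numerically. This is a minimal base R sketch using a hypothetical series that grows by 10% at each step; the variable names are assumptions.

```r
# Hypothetical series growing by 10% per step.
y <- c(100, 110, 121)

# Relative change on the original scale: (y_t - y_{t-1}) / y_{t-1}.
relative_change <- diff(y) / head(y, -1)

# Change on the log scale: log(y_t) - log(y_{t-1}) = log(y_t / y_{t-1}).
log_change <- diff(log(y))

print(relative_change)
print(log_change)

# Exponentiating a log change recovers the relative change exactly.
print(exp(log_change) - 1)
```

Equal relative changes give equal log changes regardless of the level of the series, which is why the log scale stabilizes variation that grows with the level.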
food <- aus_retail %>%
filter(Industry == "Food retailing") %>%
summarise(Turnover = sum(Turnover))
[Figure: time plot of monthly food retailing turnover; x-axis: Month, y-axis: Turnover ($AUD)]
food %>% autoplot(sqrt(Turnover)) +
labs(y = "Square root turnover")
[Figure: time plot of square root turnover; x-axis: Month, y-axis: Square root turnover]
food %>% autoplot(Turnover^(1/3)) +
labs(y = "Cube root turnover")
[Figure: time plot of cube root turnover; x-axis: Month, y-axis: Cube root turnover]
food %>% autoplot(log(Turnover)) +
labs(y = "Log turnover")
[Figure: time plot of log turnover; x-axis: Month, y-axis: Log turnover]
food %>% autoplot(-1/Turnover) +
labs(y = "Inverse turnover")
[Figure: time plot of inverse turnover; x-axis: Month, y-axis: Inverse turnover]
Box-Cox transformations
Each of these transformations is close to a member of the family of
Box-Cox transformations:
$$w_t = \begin{cases} \log(y_t), & \lambda = 0; \\ (y_t^\lambda - 1)/\lambda, & \lambda \ne 0. \end{cases}$$
• $\lambda = 1$: (No substantive transformation)
• $\lambda = 1/2$: (Square root plus linear transformation)
• $\lambda = 0$: (Natural logarithm)
• $\lambda = -1$: (Inverse plus 1)
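The special cases listed above can be verified with a direct implementation of the formula. This is a minimal base R sketch; `box_cox_simple` is a hypothetical helper written for illustration, assumes positive data, and is not the `box_cox()` function used elsewhere in these notes.

```r
# Hypothetical helper implementing the Box-Cox formula as written above.
box_cox_simple <- function(y, lambda) {
  if (lambda == 0) {
    log(y)
  } else {
    (y^lambda - 1) / lambda
  }
}

y <- c(1, 10, 100)  # illustrative positive values

print(box_cox_simple(y, 1))    # equals y - 1: a shift only
print(box_cox_simple(y, 0.5))  # equals 2 * (sqrt(y) - 1)
print(box_cox_simple(y, 0))    # equals log(y)
print(box_cox_simple(y, -1))   # equals 1 - 1 / y
```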
[Figure: Box-Cox transformed food retailing turnover (lambda = 1); x-axis: Month, y-axis: Turnover]
food %>%
features(Turnover, features = guerrero)
## # A tibble: 1 x 1
## lambda_guerrero
## <dbl>
## 1 0.0895
• The guerrero feature chooses a value of λ that attempts to balance
the seasonal fluctuations and random variation across the series.
• Always check the results.
• A low value of λ can give extremely large prediction
intervals.
food %>% autoplot(box_cox(Turnover, 0.0524)) +
labs(y = "Box-Cox transformed turnover")
[Figure: time plot of Box-Cox transformed turnover; x-axis: Month, y-axis: Box-Cox transformed turnover]
Transformations
• Often no transformation is needed.
• Simple transformations are easier to explain and work
well enough.
• Transformations can have a very large effect on prediction intervals (PI).
• If some data are zero or negative, then use λ > 0.
• Choosing logs is a simple way to force forecasts to be
positive.
• Transformations must be reversed to obtain forecasts
on the original scale. (Handled automatically by
fable.)
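To make the idea of reversing a transformation concrete, here is a minimal base R sketch of a Box-Cox transformation and its inverse. Both helpers are hypothetical, written for illustration only, and assume positive data; they are not the functions fable uses internally.

```r
# Hypothetical forward transformation (same formula as the Box-Cox family).
box_cox_simple <- function(y, lambda) {
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Hypothetical inverse: solves w = (y^lambda - 1) / lambda for y.
inv_box_cox_simple <- function(w, lambda) {
  if (lambda == 0) exp(w) else (lambda * w + 1)^(1 / lambda)
}

y <- c(5, 50, 500)                      # illustrative values
w <- box_cox_simple(y, 0.5)             # transformed scale
y_back <- inv_box_cox_simple(w, 0.5)    # back on the original scale
print(y_back)

# With logs (lambda = 0), any value on the transformed scale maps back
# to a positive number, which is why logs force forecasts to be positive.
print(inv_box_cox_simple(c(-3, 0, 3), 0))
```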
Time series components
Time series patterns
Recall
Trend pattern exists when there is a long-term increase or
decrease in the data.
Cyclic pattern exists when data exhibit rises and falls that
are not of fixed period (duration usually of at least 2
years).
Seasonal pattern exists when a series is influenced by seasonal
factors (e.g., the quarter of the year, the month, or
day of the week).
Time series decomposition
$y_t = f(S_t, T_t, R_t)$
where $y_t$ = data at period $t$
$T_t$ = trend-cycle component at period $t$
$S_t$ = seasonal component at period $t$
$R_t$ = remainder component at period $t$
Additive decomposition: $y_t = S_t + T_t + R_t$.
Multiplicative decomposition: $y_t = S_t \times T_t \times R_t$.
• An additive model is appropriate if the magnitude of the seasonal
fluctuations does not vary with the level of the series.
• If the seasonal fluctuations are proportional to the level of the series, then
a multiplicative model is appropriate.
• Multiplicative decomposition is more prevalent with
economic series.
• Alternative: use a Box-Cox transformation, and then
use additive decomposition.
• Logs turn a multiplicative relationship into an additive
relationship:
$y_t = S_t \times T_t \times R_t \Rightarrow \log y_t = \log S_t + \log T_t + \log R_t$.
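The log identity above can be checked numerically. This is a minimal base R sketch with made-up component values; all of the numbers and variable names are assumptions for illustration.

```r
# Hypothetical multiplicative components (all positive).
trend     <- c(100, 110, 120, 130)
seasonal  <- c(0.9, 1.1, 0.9, 1.1)
remainder <- c(1.01, 0.99, 1.02, 0.98)

# Multiplicative decomposition: y_t = S_t * T_t * R_t.
y <- seasonal * trend * remainder

# On the log scale the same components combine additively.
log_sum <- log(seasonal) + log(trend) + log(remainder)

print(all.equal(log(y), log_sum))
```

This is why an additive decomposition of the logged series corresponds to a multiplicative decomposition of the original series.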
US Retail Employment
us_retail_employment <- us_employment %>%
filter(year(Month) >= 1990, Title == "Retail Trade") %>%
select(-Series_ID)
us_retail_employment
## # A tsibble: 357 x 3 [1M]
## Month Title Employed
## <mth> <chr> <dbl>
## 1 1990 Jan Retail Trade 13256.
## 2 1990 Feb Retail Trade 12966.
## 3 1990 Mar Retail Trade 12938.
## 4 1990 Apr Retail Trade 13012.
## 5 1990 May Retail Trade 13108.
## 6 1990 Jun Retail Trade 13183.
## 7 1990 Jul Retail Trade 13170.
## 8 1990 Aug Retail Trade 13160.
## 9 1990 Sep Retail Trade 13113.
## 10 1990 Oct Retail Trade 13185.
us_retail_employment %>%
autoplot(Employed) +
labs(y = "Persons (thousands)",
title = "Total employment in US retail")
[Figure: Total employment in US retail; x-axis: Month, y-axis: Persons (thousands)]
us_retail_employment %>%
model(stl = STL(Employed))
## # A mable: 1 x 1
## stl
## <model>
## 1 <STL>
dcmp <- us_retail_employment %>%
model(stl = STL(Employed))
components(dcmp)
## # A dable: 357 x 7 [1M]
## # Key: .model [1]
## # : Employed = trend + season_year + remainder
## .model Month Employed trend season_~1 remai~2 seaso~3
## <chr> <mth> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 stl 1990 Jan 13256. 13288. -33.0 0.836 13289.
## 2 stl 1990 Feb 12966. 13269. -258. -44.6 13224.
## 3 stl 1990 Mar 12938. 13250. -290. -22.1 13228.
## 4 stl 1990 Apr 13012. 13231. -220. 1.05 13232.
## 5 stl 1990 May 13108. 13211. -114. 11.3 13223.
## 6 stl 1990 Jun 13183. 13192. -24.3 15.5 13207.
## 7 stl 1990 Jul 13170. 13172. -23.2 21.6 13193.
## 8 stl 1990 Aug 13160. 13151. -9.52 17.8 13169.
## 9 stl 1990 Sep 13113. 13131. -39.5 22.0 13153.
us_retail_employment %>%
autoplot(Employed, color='gray') +
autolayer(components(dcmp), trend, color='#D55E00') +
labs(y = "Persons (thousands)",
title = "Total employment in US retail")
[Figure: Total employment in US retail, with the STL trend component overlaid on the original series; x-axis: Month, y-axis: Persons (thousands)]
components(dcmp) %>% autoplot()
[Figure: STL decomposition, Employed = trend + season_year + remainder. Panels: Employed, trend, season_year, remainder; x-axis: Month]
components(dcmp) %>% gg_subseries(season_year)
[Figure: sub-series plot of the season_year component, one panel per month (Jan to Dec); x-axis: Month, y-axis: season_year]
Seasonal adjustment
• Useful by-product of decomposition: an easy way to calculate
seasonally adjusted data.
• Additive decomposition: seasonally adjusted data given by
$y_t - S_t = T_t + R_t$
• Multiplicative decomposition: seasonally adjusted data given
by
$y_t / S_t = T_t \times R_t$
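The additive identity can be illustrated with any additive decomposition. The sketch below is an assumption-laden stand-in: it uses base R's `stl()` on the built-in `co2` series with `s.window = "periodic"` (chosen only for simplicity), rather than the `STL()` model and employment data used in these notes.

```r
# Additive STL decomposition of a built-in monthly series (base R).
fit <- stl(co2, s.window = "periodic")
parts <- fit$time.series  # columns: seasonal, trend, remainder

# Seasonally adjusted data: y_t - S_t.
seasonally_adjusted <- co2 - parts[, "seasonal"]

# By the additive model this equals T_t + R_t.
trend_plus_remainder <- parts[, "trend"] + parts[, "remainder"]

print(all.equal(as.numeric(seasonally_adjusted),
                as.numeric(trend_plus_remainder)))
```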
us_retail_employment %>%
autoplot(Employed, color='gray') +
autolayer(components(dcmp), season_adjust, color='#0072B2') +
labs(y = "Persons (thousands)",
title = "Total employment in US retail")
[Figure: Total employment in US retail, with the seasonally adjusted series overlaid on the original series; x-axis: Month, y-axis: Persons (thousands)]
Seasonal adjustment
• We use estimates of S based on past values to seasonally
adjust a current value.
• Seasonally adjusted series reflect remainders as well as
trend. Therefore they are not “smooth” and “downturns” or
“upturns” can be misleading.
• It is better to use the trend-cycle component to look for
turning points.
Classical Decomposition
• The traditional way to do time series decomposition is called
Classical decomposition.
• The first step in a classical decomposition is to use a moving
average method.
• The simplest estimate of the trend-cycle uses moving
averages.
• A moving average of order $m$ can be written as
$\hat{T}_t = \frac{1}{m} \sum_{j=-k}^{k} y_{t+j}$, where $m = 2k + 1$.
Moving Average Smoothing
So a moving average is an average of nearby points
• observations nearby in time are also likely to be close in
value.
• average eliminates some randomness in the data, leaving a
smooth trend-cycle component.
3-MA: $\hat{T}_t = (y_{t-1} + y_t + y_{t+1})/3$
5-MA: $\hat{T}_t = (y_{t-2} + y_{t-1} + y_t + y_{t+1} + y_{t+2})/5$
• each average computed by dropping oldest observation
and including next observation.
• averaging moves through time series until trend-cycle
computed at each observation possible.
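A centred moving average of odd order can be sketched in base R with `stats::filter()`. The helper `moving_average` is hypothetical, and the input is only the last six export values (2012 to 2017) from the Australian exports table in these notes, so the first two results are `NA` here purely because the snippet is truncated.

```r
# Last six values (2012 to 2017) of Australian exports, % of GDP.
exports <- c(21.52, 19.99, 21.08, 20.01, 19.25, 21.27)

# Hypothetical helper: centred moving average of odd order m = 2k + 1.
# stats::filter is namespaced because dplyr::filter masks it once
# dplyr (attached by fpp3) is loaded.
moving_average <- function(y, m) {
  stopifnot(m %% 2 == 1)
  as.numeric(stats::filter(y, rep(1 / m, m), sides = 2))
}

ma5 <- moving_average(exports, 5)

# Positions 3 and 4 correspond to 2014 and 2015 and reproduce the
# tabulated 5-MA values 20.37 and 20.32; the ends have no estimate.
print(round(ma5, 2))
```

The trend-cycle estimate is missing for the first and last k observations, since a full window of m values is not available there.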
Moving averages: example
global_economy |> filter(Country == "Australia") |>
autoplot(Exports) +
labs(y="% of GDP", title= "Total Australian exports")
[Figure: Total Australian exports; x-axis: Year, y-axis: % of GDP]
Moving average smoothing
Year   Exports (% of GDP)   5-MA
1960   12.99
1961   12.40
1962   13.94                13.46
1963   13.01                13.50
1964   14.94                13.61
...    ...                  ...
2012   21.52                20.78
2013   19.99                20.81
2014   21.08                20.37
2015   20.01                20.32
2016   19.25
2017   21.27
[Figure: Total Australian exports: 3-MA; x-axis: Year, y-axis: % of GDP]
[Figure: Total Australian exports: 5-MA; x-axis: Year, y-axis: % of GDP]
Exercise:
1. For the following series, find an appropriate
transformation in order to stabilise the variance.
• United States GDP from global_economy
• Slaughter of Victorian “Bulls, bullocks and steers” in
aus_livestock
• Victorian Electricity Demand from vic_elec.
• Gas production from aus_production
2. Why is a Box-Cox transformation unhelpful for the
canadian_gas data?
Next Lecture!
• In the next lecture, we cover the forecaster's toolbox.
Please read Chapter 5 of the textbook (Hyndman, R. J. &
Athanasopoulos, G. (2021) Forecasting: principles and practice,
3rd edition) beforehand.