
Arnond Sakworawich, Ph.D.
Graduate School of Applied Statistics
National Institute of Development Administration

3.1 Methods or Models?
3.2 Extrapolation Methods
3.3 Simple Exponential Smoothing
3.4 Linear Exponential Smoothing
3.5 Exponential Smoothing with a Damped Trend
3.6 Other Approaches To Trend Forecasting
3.7 Prediction Intervals
3.8 The Use of Transformations
3.9 Model Selection
3.10 Principles for Extrapolative Models

• A forecast function is a mathematical expression for deriving the forecasts over the forecast horizon, for example $F = a + b\,t$.
• A forecasting method is a (numerical) procedure for generating a forecast. That is, it involves the direct use of a forecast function. When such methods are not based upon an underlying statistical model, they are termed heuristic; an example is the naïve method, $F_{t+1} = Y_t$.
• A statistical (forecasting) model is a statistical description of the data generating process from which a forecasting method may be derived. Forecasts are made by using a forecast function that is derived from the model. A statistical model is a necessary foundation for the construction of prediction intervals. Example: $F = a + b\,t + e$ (OLS).

• Forecasting method: $F_t = b_0 + b_1 t$
• Statistical (forecasting) model: $Y_t = \beta_0 + \beta_1 t + \varepsilon_t$
o Plus assumptions about the distribution of the random error term.
o The estimated model provides the forecast function, along with the framework to make statements about model uncertainty.
[Figure: Forecast of universal health coverage expenditure (million baht), years 2545 to 2568 (Buddhist Era); vertical axis 0 to 250,000.]
• A series is locally constant if the mean level changes gradually over time but there is no reason to expect a systematic increase or decrease. Series may change level / behavior over time.
o Global averages apply weight (1/n) to each observation.
• We may apply greater weight to the recent past to capture such changes. Trade-off:
o catching new trends [quick adjustment], against
o avoiding spurious movements in response to random fluctuations [slow adjustment].

• A moving average of order K calculates the average for the last K periods:

$$MA(t \mid K) = \frac{Y_t + Y_{t-1} + \cdots + Y_{t-K+1}}{K}$$

• At each point in time, we drop the oldest observation and add a new one.
• Large values of K produce smoother plots but are slower to adapt to changes.
• The forecast for period (t+1) is $F_{t+1} = MA(t \mid K)$.

Week   Sales (26)   MA(3)   MA(7)
   1        23056
   2        24817
   3        24300
   4        23242   24058
   5        22862   24120
   6        22863   23468
   7        23391   22989
   8        22469   23039   23504
   9        22241   22908   23421
  10        24367   22701   23053
Data shown is from file WFJ_sales_MA.xlsx. © Cengage Learning 2013.
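The moving-average forecast can be sketched in a few lines of Python. This is only an illustration: the function name `moving_average_forecasts` is made up here, and the data are the first ten weeks of WFJ sales as printed in the table. As in the table, the MA(K) value shown against a week is the average of the K weeks before it.

```python
def moving_average_forecasts(y, k):
    """Return one-step-ahead MA(k) forecasts.

    Element i of the result is the forecast for period i + 1 (1-based),
    or None while fewer than k past observations are available.
    """
    forecasts = []
    for i in range(len(y)):
        if i < k:
            forecasts.append(None)       # not enough history yet
        else:
            window = y[i - k:i]          # the k observations before period i + 1
            forecasts.append(sum(window) / k)
    return forecasts


# First ten weeks of the WFJ sales series, as printed in the table.
sales = [23056, 24817, 24300, 23242, 22862, 22863, 23391, 22469, 22241, 24367]

ma3 = moving_average_forecasts(sales, 3)
ma7 = moving_average_forecasts(sales, 7)
print(round(ma3[3]))   # forecast for week 4: 24058
print(round(ma7[7]))   # forecast for week 8: 23504
```

A larger K averages over more weeks, so `ma7` moves less from week to week than `ma3`, which is the smoothness versus adaptivity trade-off described above.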

[Figure: WFJ Sales with moving averages; series shown: WFJ Sales, MA(3), MA(7), MA(3,3), MA(3,7), MA(7,7). Vertical axis: Sales, 20,000 to 55,000; horizontal axis: weeks 1 to 63. © Cengage Learning 2013.]

How is the mean updated when one new observation is added?

$$\bar{Y}(n+1) = \frac{Y_1 + Y_2 + \cdots + Y_n + Y_{n+1}}{n+1} = \bar{Y}(n) + \frac{Y_{n+1} - \bar{Y}(n)}{n+1}$$

• A verbal description of this expression is:

New mean = old mean + (Difference between new observation and old mean) / (n + 1)
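The updating rule for the mean can be checked numerically. The sketch below is illustrative only; `update_mean` is a hypothetical helper name, and the four values are the first four weeks of WFJ sales.

```python
def update_mean(old_mean, n, new_obs):
    """Mean of n + 1 observations, given the mean of the first n and the new one."""
    return old_mean + (new_obs - old_mean) / (n + 1)


values = [23056, 24817, 24300, 23242]   # first four weeks of WFJ sales
mean = values[0]                        # mean of a single observation
for n, obs in enumerate(values[1:], start=1):
    mean = update_mean(mean, n, obs)    # n = number of observations already averaged

# The recursive update agrees with the ordinary (global) average.
print(mean, sum(values) / len(values))
```

Each new observation moves the mean by only 1/(n + 1) of the gap, so a global average reacts more and more slowly as n grows. Exponential smoothing replaces 1/(n + 1) with a fixed constant.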


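• Simple (or Single) Exponential Smoothing follows the same general idea but makes a smooth transition from one period to the next.
• We select a smoothing constant $\alpha$, such that $0 < \alpha < 1$, making for a partial adjustment.
• We denote the mean level at time t by $L_t$; the one-step-ahead forecast is $F_{t+1} = L_t$.

New mean = Old mean + $\alpha$ * (Difference between new observation and old mean)

$$F_{t+1} = F_t + \alpha (Y_t - F_t) = F_t + \alpha E_t$$

$$F_{t+1} = \alpha Y_t + (1 - \alpha) F_t$$

where $E_t = Y_t - F_t$ is the forecast error, and $F_2 = Y_1$ for initialization.

• By repeated substitution, we obtain the extended form showing the declining weights that give rise to the name exponential smoothing:

$$F_{t+1} = \alpha Y_t + \alpha(1-\alpha) Y_{t-1} + \alpha(1-\alpha)^2 Y_{t-2} + \alpha(1-\alpha)^3 Y_{t-3} + \cdots$$

With a starting value $F_1$, the complete expression is

$$F_{t+1} = (1-\alpha)^{t} F_1 + \alpha \left[ Y_t + (1-\alpha) Y_{t-1} + (1-\alpha)^2 Y_{t-2} + \cdots + (1-\alpha)^{t-1} Y_1 \right]$$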

• A smaller value of α gives a smoother trend line:
o α = 0.1 gives slow adjustment
o α = 0.9 gives rapid adjustment

 #  Term    α=0.1   α=0.2   α=0.3   α=0.4   α=0.5   α=0.6   α=0.7   α=0.8   α=0.9
 1  Xt      0.1000  0.2000  0.3000  0.4000  0.5000  0.6000  0.7000  0.8000  0.9000
 2  Xt-1    0.0900  0.1600  0.2100  0.2400  0.2500  0.2400  0.2100  0.1600  0.0900
 3  Xt-2    0.0810  0.1280  0.1470  0.1440  0.1250  0.0960  0.0630  0.0320  0.0090
 4  Xt-3    0.0729  0.1024  0.1029  0.0864  0.0625  0.0384  0.0189  0.0064  0.0009
 5  Xt-4    0.0656  0.0819  0.0720  0.0518  0.0313  0.0154  0.0057  0.0013  0.0001
 6  Xt-5    0.0590  0.0655  0.0504  0.0311  0.0156  0.0061  0.0017  0.0003  0.0000
 7  Xt-6    0.0531  0.0524  0.0353  0.0187  0.0078  0.0025  0.0005  0.0001  0.0000
 8  Xt-7    0.0478  0.0419  0.0247  0.0112  0.0039  0.0010  0.0002  0.0000  0.0000
 9  Xt-8    0.0430  0.0336  0.0173  0.0067  0.0020  0.0004  0.0000  0.0000  0.0000
10  Xt-9    0.0387  0.0268  0.0121  0.0040  0.0010  0.0002  0.0000  0.0000  0.0000
11  Xt-10   0.0349  0.0215  0.0085  0.0024  0.0005  0.0001  0.0000  0.0000  0.0000
12  Xt-11   0.0314  0.0172  0.0059  0.0015  0.0002  0.0000  0.0000  0.0000  0.0000
13  Xt-12   0.0282  0.0137  0.0042  0.0009  0.0001  0.0000  0.0000  0.0000  0.0000
14  Xt-13   0.0254  0.0110  0.0029  0.0005  0.0001  0.0000  0.0000  0.0000  0.0000
15  Xt-14   0.0229  0.0088  0.0020  0.0003  0.0000  0.0000  0.0000  0.0000  0.0000
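The entries in the weight table follow directly from the extended form of the forecast: the observation k periods back receives weight α(1 − α)^k. A minimal sketch (the function name `ses_weights` is an illustrative choice):

```python
def ses_weights(alpha, n_terms):
    """Weights alpha * (1 - alpha)**k on the observation k periods back, k = 0..n_terms-1."""
    return [alpha * (1 - alpha) ** k for k in range(n_terms)]


for alpha in (0.1, 0.5, 0.9):
    w = ses_weights(alpha, 15)
    # Show the first four weights and how much of the total weight 15 terms cover.
    print(alpha, [round(x, 4) for x in w[:4]], round(sum(w), 4))
```

With α = 0.9 nearly all of the weight sits on the latest one or two observations, whereas with α = 0.1 the first 15 terms together still fall well short of a total weight of 1, which is why small α reacts slowly.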
[Figure: Weights by lag (1 to 21) in exponential smoothing for α=0.2 and α=0.5; vertical axis 0 to 0.6. © Cengage Learning 2013.]

[Figure: The effect of smoothing constant on data weight in Exponential smoothing; weights by lag (1 to 23) for α=0.1, α=0.2, α=0.5, α=0.8 and α=0.9; vertical axis 0 to 1.]

Table 3.3: SES calculations using Y1 as starting value and α=0.3

Time   Sales  Forecast   Error    E^2   Abs E     APE
   1       5
   2       6      5.00    1.00   1.00    1.00   16.67
   3       7      5.30    1.70   2.89    1.70   24.29
   4       8      5.81    2.19   4.80    2.19   27.38
   5       7      6.47    0.53   0.28    0.53    7.61
   6       6      6.63   -0.63   0.39    0.63   10.45
   7       5      6.44   -1.44   2.07    1.44   28.78
   8       6      6.01   -0.01   0.00    0.01    0.12
   9       7      6.01    0.99   0.99    0.99   14.21
  10       8      6.30    1.70   2.88    1.70   21.21
  11       7      6.81    0.19   0.04    0.19    2.68
  12       6      6.87   -0.87   0.75    0.87   14.48
  13              6.61
Means                     0.44   1.51    1.02   15.12
© Cengage Learning 2013.

If you had to make a subjective choice for the value of the smoothing constant, what value would you choose for (a) a product with long-term steady sales and (b) a stock price index?

A. 0.8 for (a) and 0.3 for (b)
B. 0.3 for (a) and 0.8 for (b)
C. 0.5 for (a) and 0.5 for (b)
D. 0.2 for (a) and 1.0 for (b)
E. 1.0 for (a) and 0.2 for (b)
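The calculations in Table 3.3 can be reproduced with a short script. This is a sketch rather than the textbook's own code; the function name `ses_forecasts` is illustrative. One inference from the printed numbers is built in and should be treated as an assumption: the "Means" row appears to average periods 3 to 12 only, leaving out period 2, whose forecast is just the starting value.

```python
def ses_forecasts(y, alpha):
    """One-step-ahead SES forecasts F_2, ..., F_{n+1}, starting from F_2 = Y_1."""
    forecasts = [y[0]]                                  # F_2 = Y_1
    for obs in y[1:]:
        prev = forecasts[-1]
        forecasts.append(prev + alpha * (obs - prev))   # F_{t+1} = F_t + alpha * E_t
    return forecasts


sales = [5, 6, 7, 8, 7, 6, 5, 6, 7, 8, 7, 6]    # Sales column of Table 3.3
f = ses_forecasts(sales, 0.3)                   # f[i] is the forecast for period i + 2

# Errors for periods 3..12 (0-based index t = 2..11), the ten periods that the
# "Means" row seems to cover.
errors = [sales[t] - f[t - 1] for t in range(2, len(sales))]
actuals = sales[2:]

me = sum(errors) / len(errors)
mse = sum(e * e for e in errors) / len(errors)
mae = sum(abs(e) for e in errors) / len(errors)
mape = sum(100 * abs(e) / a for e, a in zip(errors, actuals)) / len(errors)

print(round(f[-1], 2))                                           # forecast for period 13
print(round(me, 2), round(mse, 2), round(mae, 2), round(mape, 2))
```

Only the last forecast and the latest observation are needed at each step, which is why the method scales easily to a large number of series.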
• Developed by R.G. Brown in the 1940s and 1950s.
• For each series, only need to record last forecast, latest observation and smoothing constant.
• Useful for short-term forecasting [especially for a large number of series].
• Local level model, so h-step ahead point forecast is same as 1-step ahead:

$$F_{t+h}(h) = F_{t+1}(1) = F_{t+1}$$

• Uncertainty increases as the forecasting horizon increases.
• Choose α to minimize RMSE, MAE or MAPE for the estimation sample; results tend to be similar.
o Text uses RMSE unless stated otherwise.
• (R)MSE corresponds to the use of Least Squares.
• Choose the starting value.
o Choose first observation
o Choose the average of time series

• For model development purposes, a time series is typically divided into two parts:
o The first part, the estimation or fitting sample, is used to estimate the parameters of the forecast function.
o The second part, the hold-out or test sample, is used to check the performance of the forecasting method.
• Measures of performance based upon the estimation sample are known as in-sample measures.
• Measures of performance based upon the hold-out sample are known as out-of-sample measures.
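The estimation and hold-out workflow described above can be sketched as a simple grid search. Everything specific here is an assumption made for illustration: the series is invented, the 12/4 split is arbitrary, the grid step of 0.01 is a convenience, and the starting value is the first observation (one of the two options listed).

```python
import math


def ses_one_step(y, alpha, start):
    """One-step-ahead SES forecasts for every period of y; the first one is `start`."""
    forecasts = [start]
    for obs in y[:-1]:
        forecasts.append(forecasts[-1] + alpha * (obs - forecasts[-1]))
    return forecasts


def rmse(actual, forecast):
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


# Hypothetical series, made up for illustration only.
series = [52, 55, 53, 58, 60, 57, 61, 63, 60, 64, 66, 65, 68, 67, 70, 69]
n_fit = 12                                   # estimation sample: first 12 points
fit, holdout = series[:n_fit], series[n_fit:]


def in_sample_rmse(alpha):
    f = ses_one_step(fit, alpha, start=fit[0])
    # Skip the first period: its forecast is the starting value itself.
    return rmse(fit[1:], f[1:])


# Estimation step: pick the alpha with the smallest in-sample RMSE.
grid = [i / 100 for i in range(1, 100)]
best_alpha = min(grid, key=in_sample_rmse)

# Hold-out step: keep alpha fixed and score one-step-ahead forecasts out of sample.
all_forecasts = ses_one_step(series, best_alpha, start=series[0])
out_rmse = rmse(holdout, all_forecasts[n_fit:])

print(best_alpha, round(in_sample_rmse(best_alpha), 3), round(out_rmse, 3))
```

Swapping `rmse` for an MAE or MAPE function changes the estimation criterion without touching the rest of the procedure.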

Method       MPE    MAE   RMSE   MAPE
MA(3)       -178   3067   4320    8.9
MA(8)       -475   3749   4865   11.0
SES(0.2)    -309   3389   4342    9.9
SES(0.5)    -164   2832   3980    8.2
SES(opt)     -76   2561   3915    7.3
© Cengage Learning 2013.
Table 3.6: Effects of fitting using different methods

Estimation criterion          RMSE                    MAE                     MAPE
Sample                 Estimation  Hold-out   Estimation  Hold-out   Estimation  Hold-out
Error measure
  RMSE                       3200      3915         3217      3921         3217      3921
  MAE                        2288      2561         2247      2624         2247      2624
  MAPE                        7.0       7.3          6.9       7.5          6.9       7.5
Value of α                        0.729                   0.660                   0.660
Data: WFJ_sales.xlsx. © Cengage Learning 2013.

• Pros
– Does not require a very long time series.
– Weights observations at different time points differently.
• Cons
– Does not account for trend or seasonality.

• When a time series has a long-term trend (e.g. increases in GDP or sales) the forecasting method must accommodate such features. There are two main approaches:
o Convert the series to rates of change (growth rates, either absolute or percentage), then predict the rate of change, OR
o Develop forecasting methods that account for trends.

Figure 3.8A: Linear trend fitted to Quarterly Sales
[Fitted line plot: Quarterly Sales = - 6.157 + 4.567 Period; S = 5.47757, R-Sq = 93.7%, R-Sq(adj) = 93.3%. Vertical axis: Netflix Sales; horizontal axis: Period, 0 to 16.]
Data: Netflix_1.xlsx. ©Cengage Learning 2013.
Figure 3.8B: Quadratic trend fitted to Quarterly Sales
[Fitted line plot: Quarterly Sales = 6.914 - 0.0466 Period + 0.2883 Period**2; S = 1.98024, R-Sq = 99.2%, R-Sq(adj) = 99.1%. Vertical axis: Netflix Sales; horizontal axis: Period, 0 to 16.]
Data: Netflix_1.xlsx. ©Cengage Learning 2013.

First moving average:

$$F'_t = (Y_t + Y_{t-1} + Y_{t-2} + \cdots + Y_{t-n+1}) / n$$

Second moving average:

$$F''_t = (F'_t + F'_{t-1} + F'_{t-2} + \cdots + F'_{t-n+1}) / n$$

$$a_t = 2F'_t - F''_t$$

$$b_t = \frac{2}{n-1} (F'_t - F''_t)$$

$$\hat{F}_{t+m} = a_t + b_t \cdot m$$
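The double (linear) moving average formulas can be sketched as follows. The function names are illustrative, and the test series is a made-up, exactly linear one, chosen because the method should then recover the line: a_t equals the latest observation and b_t equals the slope.

```python
def moving_average(x, n):
    """Trailing n-term averages; element j averages x[j], ..., x[j + n - 1]."""
    return [sum(x[j:j + n]) / n for j in range(len(x) - n + 1)]


def linear_moving_average_forecast(y, n, m):
    """Forecast m steps past the last observation (needs at least 2n - 1 points)."""
    first = moving_average(y, n)                  # F'_t, first moving average
    second = moving_average(first, n)             # F''_t, moving average of F'_t
    a = 2 * first[-1] - second[-1]                # level estimate a_t
    b = 2 / (n - 1) * (first[-1] - second[-1])    # trend estimate b_t
    return a + b * m


# Hypothetical, exactly linear series Y_t = 10 + 3t for t = 1..10.
y = [10 + 3 * t for t in range(1, 11)]
forecast = linear_moving_average_forecast(y, n=4, m=2)
print(forecast)   # close to Y_12 = 46, up to floating-point rounding
```

A single moving average lags behind a trending series by (n − 1)/2 periods' worth of trend; the second average lags by twice that, and the difference between the two is what a_t and b_t use to correct the lag.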

 m  year  Expenditure  Ft (regression)      MA(4)   2nd MA(4)        at        bt   Ft(LMA)
 1  2545    27,612.00        20,588.71
 2  2546    30,538.00        29,366.71
 3  2547    33,573.00        38,144.70
 4  2548    40,890.00        46,922.69
 5  2549    54,429.00        55,700.68   33153.25
 6  2550    67,366.00        64,478.68    39857.5
 7  2551    76,599.00        73,256.67    49064.5
 8  2552    80,598.00        82,034.66      59821
 9  2553   101,058.00        90,812.66      69748    45474.06  94021.94  16182.63
10  2554   107,814.00        99,590.65   81405.25    54622.75  108187.8     17855  110204.6
11  2555   108,744.00       108,368.64   91517.25    65009.69  118024.8  17671.71  126042.8
12  2556   114,963.00       117,146.64    99553.5    75622.88  123484.1  15953.75  135696.5
13  2557       115176       125,924.63   108144.8       85556  130733.5  15059.17  139437.9
14                         134702.6212   111674.3    95155.19  128193.3  11012.71  145792.7
14                                                                                   139206
14                                                                                 150218.7
14                                                                                 161231.4

The last three rows are forecasts from the final row's estimates, a14 + b14 * m for m = 1, 2, 3.

Regression coefficients: b0 = 20588.71212, b1 = 8777.993007

[Figure: Universal health coverage scheme expenditure, 2545 to 2557; series shown: expenditure, Ft (regression), Ft(LMA). Vertical axis 0 to 180,000.]
• Brown's method is a one-parameter model in which the level smoothing parameter is set equal to the trend smoothing parameter.
• It is called double exponential smoothing and is analogous to the linear moving average, but applies exponential smoothing twice.

$$F'_t = \alpha Y_t + (1-\alpha) F'_{t-1}$$

$$F''_t = \alpha F'_t + (1-\alpha) F''_{t-1}$$

$$a_t = 2F'_t - F''_t$$

$$b_t = \frac{\alpha}{1-\alpha} (F'_t - F''_t)$$

$$\hat{F}_{t+m} = a_t + b_t \cdot m$$
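Brown's double exponential smoothing can be sketched directly from the five formulas. The function name is illustrative, and the starting values are an assumption made here (both smoothed series start at the first observation); the slides do not state an initialization for this method. The two-point series is invented so that the arithmetic can be followed by hand.

```python
def brown_forecast(y, alpha, m):
    """m-step-ahead forecast from Brown's double exponential smoothing."""
    s1 = s2 = y[0]          # assumed start: F'_1 = F''_1 = Y_1
    for obs in y[1:]:
        s1 = alpha * obs + (1 - alpha) * s1     # F'_t: smooth the data
        s2 = alpha * s1 + (1 - alpha) * s2      # F''_t: smooth the smoothed series
    a = 2 * s1 - s2                             # level estimate a_t
    b = alpha / (1 - alpha) * (s1 - s2)         # trend estimate b_t
    return a + b * m


# Hand check with a made-up series and alpha = 0.5:
# F' = 11, F'' = 10.5, so a = 11.5, b = 0.5 and the 1-step forecast is 12.0.
print(brown_forecast([10, 12], alpha=0.5, m=1))   # 12.0
print(brown_forecast([10, 12], alpha=0.5, m=2))   # 12.5
```

The structure mirrors the linear moving average: the gap between the singly and doubly smoothed series measures how far smoothing lags behind the trend.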

year  Expenditure  SES(α=.712)  error(SES)  DES(α=.712)  error(DES)       at      bt       Ft
2545       27,612
2546       30,538       27,612       2,926
2547       33,573       29,695       3,878       27,612       2,083   31,779   5,150
2548       40,890       32,456       8,434       29,095       3,361   35,817   8,309   36,929
2549       54,429       38,461      15,968       31,488       6,973   45,434  17,238   44,126
2550       67,366       49,830      17,536       36,453      13,377   63,208  33,072   62,672
2551       76,599       62,316      14,283       45,978      16,338   78,654  40,391   96,279
2552       80,598       72,485       8,113       57,610      14,875   87,361  36,775  119,045
2553      101,058       78,262      22,796       68,201      10,060   88,322  24,871  124,135
2554      107,814       94,493      13,321       75,364      19,128  113,621  47,290  113,193
2555      108,744      103,977       4,767       88,984      14,994  118,971  37,068  160,911
2556      114,963      107,371       7,592       99,659       7,712  115,083  19,066  156,039
2557      115,176      112,777       2,399      105,150       7,626  120,403  18,854  134,149
2558                   114,485                  110,580       3,905  118,390   9,654  139,257
2559                                                                                  128,043
2560                                                                                  137,697
2561                                                                                  147,350
α=.712

[Figure: Universal health coverage scheme expenditure, 2545 to 2561; series shown: Actual, Ft(Brown). Vertical axis 0 to 180,000.]

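• Holt's method is a two-parameter model with level (α) and trend (γ) smoothing parameters.
• Brown's method is a special case of Holt's method in which the two smoothing parameters are tied to a single constant.

$$a_t = \alpha Y_t + (1-\alpha)(a_{t-1} + b_{t-1})$$

$$b_t = \gamma (a_t - a_{t-1}) + (1-\gamma) b_{t-1}$$

$$\hat{F}_{t+m} = a_t + b_t \cdot m$$

Initialization:

$$a_1 = Y_1$$

$$b_1 = \frac{(Y_2 - Y_1) + (Y_4 - Y_3)}{2} \quad \text{or} \quad b_1 = Y_2 - Y_1$$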
 #  year  Actual      at    bt  Ft (Holt; γ=0.05 & α=0.95)
 1  2545   27612   27612  2926
 2  2546   30538   30538  2926   30538
 3  2547   33573   33568  2931   33464
 4  2548   40890   40670  3140   36499
 5  2549   54429   53898  3644   43810
 6  2550   67366   66875  4111   57542
 7  2551   76599   76318  4377   70986
 8  2552   80598   80603  4373   80696
 9  2553  101058  100254  5137   84976
10  2554  107814  107693  5252  105391
11  2555  108744  108954  5052  112945
12  2556  114963  114915  5098  114006
13  2557  115176  115418  4868  120013
14  2558                        120286
15  2559                        125154
16  2560                        130022
17  2561                        134890

alpha 0.95
gamma 0.05

[Figure: Universal health coverage scheme expenditure, 2545 to 2561; series shown: Actual, Ft (Holt; γ=0.05 & α=0.95). Vertical axis 0 to 160,000.]
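The Holt forecasts in the table can be reproduced with the recursions for a_t and b_t, using the initialization a_1 = Y_1 and b_1 = Y_2 − Y_1 (the simpler of the two options). This is a sketch; the function name `holt_forecasts` is illustrative.

```python
def holt_forecasts(y, alpha, gamma, horizon):
    """Forecasts for 1..horizon steps beyond the last observation (Holt's method)."""
    level = y[0]             # a_1 = Y_1
    trend = y[1] - y[0]      # b_1 = Y_2 - Y_1
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)     # a_t
        trend = gamma * (level - prev_level) + (1 - gamma) * trend   # b_t
    return [level + trend * m for m in range(1, horizon + 1)]


# Actual expenditure for 2545 to 2557, from the table.
expenditure = [27612, 30538, 33573, 40890, 54429, 67366, 76599,
               80598, 101058, 107814, 108744, 114963, 115176]

forecasts = holt_forecasts(expenditure, alpha=0.95, gamma=0.05, horizon=4)
print([round(f) for f in forecasts])   # [120286, 125154, 130022, 134890]
```

With α = 0.95 the level follows the latest observation almost exactly, while γ = 0.05 lets the trend change only slowly, so the forecasts for 2558 to 2561 rise by a nearly constant amount each year.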
[Figure: Netflix Sales Forecast; vertical axis 0 to 90, horizontal axis 0 to 16. Data: Netflix_1.xlsx. ©Cengage Learning 2013.]

• Use same structure as before:

$$\text{Forecast} \pm z_{\alpha/2} \times (RMSE)$$

• From Table 3.11, the point forecast for the next period (2003.4) was 80.45 with RMSE = 2.85. The resulting (approximate) 95 percent prediction interval is:

$$80.45 \pm 1.96 \times 2.85 = 80.45 \pm 5.59, \quad \text{i.e. } (74.86,\ 86.04)$$
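The interval calculation can be checked with the standard library. This is a sketch under the same approximation as the slide (normally distributed one-step errors with standard deviation taken as the RMSE); the function name is illustrative.

```python
from statistics import NormalDist


def prediction_interval(forecast, rmse, coverage=0.95):
    """Approximate prediction interval: forecast +/- z * RMSE."""
    z = NormalDist().inv_cdf(0.5 + coverage / 2)   # about 1.96 for 95% coverage
    half_width = z * rmse
    return forecast - half_width, forecast + half_width


lower, upper = prediction_interval(80.45, 2.85)
print(round(lower, 2), round(upper, 2))   # 74.86 86.04
```

Changing `coverage` to 0.90 or 0.99 gives narrower or wider intervals from the same point forecast and RMSE.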
• Compute growth rates (Figure 3.11)
• Transform to logarithms (Figure 3.12)
• Use a power transform [Box-Cox transform; see Figure 3.12]
o Growth rates and differences are ways to remove trends
o Logarithmic and power transforms are ways to stabilize the variance

• Define the growth rate as:

$$G_t = 100 \times \frac{Y_t - Y_{t-1}}{Y_{t-1}}$$

• After forecasting the growth rate as $g_{t+1}$, convert back to the original series:

$$F_{t+1} = Y_t \left( 1 + \frac{g_{t+1}}{100} \right)$$
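The round trip from levels to growth rates and back can be sketched as below. The series is invented, and forecasting the next growth rate by the mean of past growth rates is only a placeholder assumption; any forecasting method (SES, for instance) could supply `g_next` instead.

```python
def growth_rates(y):
    """Percentage growth rates G_t = 100 * (Y_t - Y_{t-1}) / Y_{t-1} for t = 2..n."""
    return [100 * (y[t] - y[t - 1]) / y[t - 1] for t in range(1, len(y))]


def forecast_from_growth(last_obs, g_next):
    """Convert a forecast growth rate back to the original scale."""
    return last_obs * (1 + g_next / 100)


y = [100.0, 110.0, 121.0]        # hypothetical series growing 10% per period
g = growth_rates(y)              # growth rates in percent
g_next = sum(g) / len(g)         # placeholder forecast: mean past growth rate
f_next = forecast_from_growth(y[-1], g_next)
print(g, f_next)
```

Working in growth rates removes the trend before forecasting; the final multiplication puts the forecast back in the units of the original series.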

• Does the series display a trend? Do we expect the trend to continue into the future? If so, then
o Use a method that includes a trend.
• Do the observations tend to be more (or less) variable over time? If so, then
o Transform the data to obtain roughly constant variability.

1. Plot the Series.
2. Clean the Data.
3. Use Transformations as Required by Expectations About the Process.
4. Select Simple Methods, Unless Convincing Empirical Evidence Calls for Greater Complexity.
5. Evaluate Alternative Methods, Preferably Using Out-of-Sample Data.
6. Update the Estimates Frequently.

• Always start by plotting (at least a subset of) the data series to identify past and likely future behavior as well as unusual observations.
• The adaptive features of local methods are typically preferable for short-term forecasting.
• Keep it simple, unless the data indicate the need for a more complex approach.
• Always evaluate forecasting performance, ideally using a hold-out sample.
