
King Abdulaziz University Faculty of Engineering

Industrial Engineering Dept.

IE 436 Dynamic Forecasting

CHAPTER 3 Exploring Data Patterns and an Introduction to Forecasting Techniques
• Cross-sectional data: collected at a single point in time.
• A time series: data collected and recorded over successive increments of time. (Page 62)

Exploring Time Series Data Patterns
• Horizontal (stationary).
• Trend.
• Cyclical.
• Seasonal.

A Stationary Series
Its mean and variance remain constant over time.

The Trend
The long-term component that represents the growth or decline in the time series.

The Cyclical Component
The wavelike fluctuation around the trend.
[Figure: cost vs. year, showing the trend line with a cyclical peak and a cyclical valley] Page (63)
FIGURE 3-2 Trend and Cyclical Components of an Annual Time Series Such as Housing Costs

The Seasonal Component
A pattern of change that repeats itself year after year.
[Figure: seasonal time series plot of monthly electrical usage, the same pattern repeating each year] Page (64)
FIGURE 3-3 Electrical Usage for Washington Water Power Company, 1980-1991

Exploring Data Patterns with Autocorrelation Analysis
• Autocorrelation: the correlation between a variable lagged one or more periods and itself.

r_k = \frac{\sum_{t=k+1}^{n} (Y_t - \bar{Y})(Y_{t-k} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2}, \qquad k = 0, 1, 2, \ldots \qquad (3.1)

where:
r_k = autocorrelation coefficient for a lag of k periods
Y_t = observation in time period t
Y_{t-k} = observation in time period t - k
\bar{Y} = mean of the values of the series
(Pages 64-65)
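Equation 3.1 can be sketched in a few lines of Python (an illustration, not the slides' Minitab workflow; the function name `autocorr` is ours):

```python
import numpy as np

def autocorr(y, k):
    """Lag-k autocorrelation coefficient r_k per Equation 3.1."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()                        # deviations from the series mean
    # Numerator pairs Y_t with Y_{t-k}; denominator is the total sum of squares.
    return np.sum(dev[k:] * dev[:len(y) - k]) / np.sum(dev**2)
```

For k = 0 the numerator equals the denominator, so r_0 = 1 by definition.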

Autocorrelation Function (Correlogram)
A graph of the autocorrelations for various lags.

Computation of the lag 1 autocorrelation coefficient: Table 3-1 (Page 65).

Example 3.1
Data are presented in Table 3-1 (Page 65). Table 3-2 shows the computations that lead to the calculation of the lag 1 autocorrelation coefficient. Figure 3-4 contains a scatter diagram of the pairs of observations (Y_t, Y_{t-1}). Using the totals from Table 3-2 and Equation 3.1:

r_1 = \frac{\sum_{t=2}^{n} (Y_t - \bar{Y})(Y_{t-1} - \bar{Y})}{\sum_{t=1}^{n} (Y_t - \bar{Y})^2} = \frac{843}{1474} = 0.572

Autocorrelation Function (Correlogram) (Cont.)
Minitab instructions: Stat > Time Series > Autocorrelation

[Figure: correlogram showing the autocorrelations at lags 1-3]
FIGURE 3-5 Correlogram or Autocorrelation Function for the Data Used in Example 3.1

Questions to be Answered Using Autocorrelation Analysis
• Are the data random?
• Do the data have a trend?
• Are the data stationary?
• Are the data seasonal?
(Page 68)

Are the Data Random?
If a series is random:
• The successive values are not related to each other.
• Almost all the autocorrelation coefficients are not significantly different from zero.

Is an autocorrelation coefficient significantly different from zero?
• The autocorrelation coefficients of random data have an approximate normal sampling distribution.
• At a specified confidence level, a series can be considered random if the autocorrelation coefficients are within the interval [0 ± t SE(r_k)] (z instead of t for large samples).
• The following t statistic can be used:

t = \frac{r_k}{SE(r_k)}

• Standard error of the autocorrelation at lag k:

SE(r_k) = \sqrt{\frac{1 + 2 \sum_{i=1}^{k-1} r_i^2}{n}} \qquad (3.2)

where:
r_i = the autocorrelation at time lag i
k = the time lag
n = the number of observations in the time series
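Equation 3.2 and the t statistic above can be written directly in Python (a sketch; `se_rk` and `t_statistic` are our names):

```python
import math

def se_rk(autocorrs, k, n):
    """Standard error of r_k (Equation 3.2).
    autocorrs holds r_1, ..., r_{k-1}; n is the number of observations."""
    return math.sqrt((1 + 2 * sum(r * r for r in autocorrs[:k - 1])) / n)

def t_statistic(rk, autocorrs, k, n):
    """t = r_k / SE(r_k); compare with the critical t values."""
    return rk / se_rk(autocorrs, k, n)
```

For lag 1 the sum is empty, so SE(r_1) = sqrt(1/n); e.g. with n = 25, SE(r_1) = 0.2.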

Example 3.2 (Page 69)
A hypothesis test: is a particular autocorrelation coefficient significantly different from zero?
At significance level α = 0.05: the critical values ±2.2 are the upper and lower t points for n - 1 = 11 degrees of freedom.

Decision Rule:
If t < -2.2 or t > 2.2, reject H₀: ρ_k = 0.
Note: t is given directly in the Minitab output under the heading T.

Is an autocorrelation coefficient different from zero? (Cont.)
The Modified Box-Pierce Q statistic (developed by Ljung and Box), "LBQ"
A portmanteau test: it tests whether a whole set of autocorrelation coefficients is zero, all at once.

Q = n(n+2) \sum_{k=1}^{m} \frac{r_k^2}{n-k} \qquad (3.3)

where:
n = number of observations
k = the time lag
m = number of time lags to be considered
r_k = kth autocorrelation coefficient (lagged k time periods)

The value of Q can be compared with the chi-square distribution with m degrees of freedom.
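Equation 3.3 is a one-liner in Python (a sketch; the function name is ours):

```python
def ljung_box_q(autocorrs, n):
    """Ljung-Box Q over m = len(autocorrs) lags (Equation 3.3).
    autocorrs holds r_1, ..., r_m; n is the number of observations."""
    return n * (n + 2) * sum(
        rk * rk / (n - k) for k, rk in enumerate(autocorrs, start=1)
    )
```

With n = 10 and a single lag r_1 = 0.5, Q = 10 × 12 × 0.25 / 9 ≈ 3.33.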

Example 3.3 (Page 70)

t    Yt     t    Yt     t    Yt     t    Yt
1    343    11   946    21   704    31   555
2    574    12   142    22   291    32   476
3    879    13   477    23   43     33   612
4    728    14   452    24   118    34   574
5    37     15   727    25   682    35   518
6    227    16   147    26   577    36   296
7    613    17   199    27   834    37   970
8    157    18   744    28   981    38   204
9    571    19   627    29   263    39   616
10   72     20   122    30   424    40   97

[Figure: correlogram of the Example 3.3 data, lags 1-10, all autocorrelations small]

Lag   Corr    T      LBQ
1     -0.19   -1.21   1.57
2     -0.01   -0.04   1.58
3     -0.15   -0.89   2.53
4      0.10    0.63   3.04
5     -0.25   -1.50   6.13
6      0.03    0.16   6.17
7      0.17    0.95   7.63
8     -0.03   -0.15   7.67
9     -0.03   -0.18   7.73
10     0.02    0.12   7.75

FIGURE 3-7 Autocorrelation Function for the Data Used in Example 3.3

• The Q statistic for m = 10 time lags, calculated using Minitab, is 7.75.
• The chi-square value χ²(0.05) = 18.307 (tested at the 0.05 significance level, degrees of freedom df = m = 10). Table B-4 (Page 527)
• Since Q < χ²(0.05), conclusion: the series is random.
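As a cross-check on the Minitab output, Q can be recomputed from the 40 observations as transcribed from the table above (an illustration under that transcription; the helper name `autocorr` is ours):

```python
import numpy as np

def autocorr(y, k):
    """Lag-k autocorrelation coefficient r_k per Equation 3.1."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    return np.sum(dev[k:] * dev[:len(y) - k]) / np.sum(dev**2)

# The 40 observations of Example 3.3 (Page 70), transcribed from the slide.
y = [343, 574, 879, 728, 37, 227, 613, 157, 571, 72,
     946, 142, 477, 452, 727, 147, 199, 744, 627, 122,
     704, 291, 43, 118, 682, 577, 834, 981, 263, 424,
     555, 476, 612, 574, 518, 296, 970, 204, 616, 97]
n, m = len(y), 10

r = [autocorr(y, k) for k in range(1, m + 1)]
q = n * (n + 2) * sum(rk**2 / (n - k) for k, rk in enumerate(r, start=1))
# q stays below the chi-square critical value 18.307 (df = 10),
# consistent with the conclusion that the series is random.
```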

Do the Data Have a Trend?
• A significant relationship exists between successive time series values.
• The autocorrelation coefficients are large for the first several time lags and then gradually drop toward zero as the number of periods increases.
• The autocorrelation for time lag 1 is close to 1; for time lag 2 it is large but smaller than for time lag 1.
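This slow decay is easy to see on a synthetic series (made-up data, not from the book; `autocorr` implements Equation 3.1):

```python
import numpy as np

def autocorr(y, k):
    """Lag-k autocorrelation coefficient r_k per Equation 3.1."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    return np.sum(dev[k:] * dev[:len(y) - k]) / np.sum(dev**2)

trend = np.arange(1.0, 41.0)            # a pure linear trend, 40 points
r1, r2, r3 = (autocorr(trend, k) for k in (1, 2, 3))
# r1 is close to 1, and the coefficients shrink only gradually: r1 > r2 > r3.
```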

Example 3.4 (Page 72)
Data in Table 3-4 (Page 74)

Year  Yt     Year  Yt     Year  Yt      Year  Yt
1955  3307   1966  6769   1977  17224   1988  50251
1956  3556   1967  7296   1978  17946   1989  53794
1957  3601   1968  8178   1979  17514   1990  55972
1958  3721   1969  8844   1980  25195   1991  57242
1959  4036   1970  9251   1981  27357   1992  52345
1960  4134   1971  10006  1982  30020   1993  50838
1961  4268   1972  10991  1983  35883   1994  54559
1962  4578   1973  12306  1984  38828   1995  34925
1963  5093   1974  13101  1985  40715   1996  38236
1964  5716   1975  13639  1986  44282   1997  41296
1965  6357   1976  14950  1987  48440   1998  …….

Data Differencing
• A time series can be differenced to remove the trend and to create a stationary series.
• See Figure 3-8 (Page 73) for differencing the data of Example 3.1.
• See Figures 3-12 and 3-13 (Page 75).
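A minimal sketch of first differencing, using the opening values of Table 3-4:

```python
import numpy as np

y = np.array([3307.0, 3556.0, 3601.0, 3721.0, 4036.0])  # first values of Table 3-4
dy = np.diff(y)    # first differences Y_t - Y_{t-1}
# The differenced series fluctuates around a level instead of growing with t:
# differencing removes a linear trend, leaving a (more) stationary series.
```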

Are the Data Seasonal?
• For quarterly data: a significant autocorrelation coefficient will appear at time lag 4.
• For monthly data: a significant autocorrelation coefficient will appear at time lag 12.
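The lag-4 spike can be demonstrated on made-up quarterly data with a repeating within-year pattern (synthetic, not from the book; `autocorr` implements Equation 3.1):

```python
import numpy as np

def autocorr(y, k):
    """Lag-k autocorrelation coefficient r_k per Equation 3.1."""
    y = np.asarray(y, dtype=float)
    dev = y - y.mean()
    return np.sum(dev[k:] * dev[:len(y) - k]) / np.sum(dev**2)

# Six years of made-up quarterly data with the same pattern every year.
quarterly = np.array([10.0, 20.0, 30.0, 15.0] * 6)
r4 = autocorr(quarterly, 4)   # large: the series repeats at the seasonal lag
r1 = autocorr(quarterly, 1)   # small by comparison
```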

Example 3.5 (Page 76)
Table 3-5 (see Figures 3-14, 3-15, Page 77):

Year  December 31  March 31  June 30  September 30
1994  147.6        251.8     273.1    249.1
1995  139.3        221.2     260.2    259.5
1996  140.5        245.5     298.8    287.0
1997  168.8        322.6     393.5    404.3
1998  259.7        401.1     464.6    497.7
1999  264.4        402.6     411.3    385.9
2000  232.7        309.2     310.7    293.0
2001  205.1        234.4     285.4    258.7
2002  193.2        263.7     292.5    315.2
2003  178.3        274.5     295.4    286.4
2004  190.8        263.5     318.8    305.5
2005  242.6        318.8     329.6    338.2
2006  232.1        285.6     291.0    281.4

Time Series Graph
[Figure: quarterly sales vs. years, 1995-2007]
FIGURE 3-14 Time Series Plot of Quarterly Sales for Coastal Marine for Example 3.5

[Figure: correlogram of the quarterly sales, lags 1-13, with a large spike at lag 4]

Lag   Corr    T      LBQ
1      0.39    2.83   8.49
2      0.16    1.03  10.00
3      0.29    1.81  14.91
4      0.74    4.30  46.79
5      0.15    0.67  48.14
6     -0.15   -0.64  49.44
7     -0.05   -0.23  49.60
8      0.34    1.48  56.92
9     -0.18   -0.77  59.10
10    -0.43   -1.79  71.46
11    -0.32   -1.24  78.32
12     0.09    0.32  78.83
13    -0.35   -1.34  87.77

FIGURE 3-15 Autocorrelation Function for Quarterly Sales for Coastal Marine for Example 3.5

The autocorrelation coefficients at time lags 1 and 4 are significantly different from zero: sales are seasonal on a quarterly basis.

Choosing a Forecasting Technique
Questions to be Considered:
• Why is a forecast needed?
• Who will use the forecast?
• What are the characteristics of the data?
• What time period is to be forecast?
• What are the minimum data requirements?
• How much accuracy is required?
• What will the forecast cost?

Choosing a Forecasting Technique (Cont.)
The Forecaster Should Accomplish the Following:
• Define the nature of the forecasting problem.
• Explain the nature of the data.
• Describe the properties of the techniques.
• Develop criteria for selection.

Choosing a Forecasting Technique (Cont.)
Factors Considered:
• Level of detail.
• Time horizon.
• Based on judgment or data manipulation.
• Management acceptance.
• Cost.

General Considerations for Choosing the Appropriate Method

Method       Uses                                           Considerations
Judgment     Can be used in the absence of historical       Subjective estimates are subject to the
             data (e.g. a new product). Most helpful in     biases and motives of estimators.
             medium- and long-term forecasts.
Causal       Sophisticated methods. Very good for           Must have historical data. Relationships
             medium- and long-term forecasts.               can be difficult to specify.
Time series  Easy to implement. Work well when the          Rely exclusively on past data. Most
             series is relatively stable.                   useful for short-term estimates.

Method                                       Pattern of   Time     Type of  Minimal Data Requirements
                                             Data         Horizon  Model    Nonseasonal  Seasonal
Naïve                                        ST, T, S     S        TS       1
Simple averages                              ST           S        TS       30
Moving averages                              ST           S        TS       4-20
Single exponential smoothing                 ST           S        TS       2
Linear (double) exponential
  smoothing (Holt's)                         T            S        TS       3
Quadratic exponential smoothing              T            S        TS       4
Seasonal exponential smoothing (Winter's)    S            S        TS                    2 × s
Adaptive filtering                           S            S        TS                    5 × s
Simple regression                            T            I        C        10
Multiple regression                          C, S         I        C        10 × V
Classical decomposition                      S            S        TS                    5 × s
Exponential trend models                     T            I, L     TS       10
S-curve fitting                              T            I, L     TS       10
Gompertz models                              T            I, L     TS       10
Growth curves                                T            I, L     TS       10
Census X-12                                  S            S        TS                    6 × s
ARIMA (Box-Jenkins)                          ST, T, C, S  S        TS       24           3 × s
Leading indicators                           C            S        C        24
Econometric models                           C            S        C        30
Time series multiple regression              T, S         I, L     C                     6 × s

Pattern of data: ST, stationary; T, trended; S, seasonal; C, cyclical.
Time horizon: S, short term (less than three months); I, intermediate; L, long term.
Type of model: TS, time series; C, causal.
Seasonal: s, length of seasonality.
Variable: V, number of variables.

Measuring Forecast Error
Basic Forecasting Notation
Y_t = actual value of a time series in time period t
\hat{Y}_t = forecast value for time period t
e_t = Y_t - \hat{Y}_t = forecast error in time period t (residual)

Measuring Forecast Error (Cont.)

The Mean Absolute Deviation:
MAD = \frac{1}{n} \sum_{t=1}^{n} |Y_t - \hat{Y}_t|

The Mean Squared Error:
MSE = \frac{1}{n} \sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2

The Root Mean Square Error:
RMSE = \sqrt{\frac{1}{n} \sum_{t=1}^{n} (Y_t - \hat{Y}_t)^2}

The Mean Absolute Percentage Error:
MAPE = \frac{1}{n} \sum_{t=1}^{n} \frac{|Y_t - \hat{Y}_t|}{|Y_t|}

The Mean Percentage Error:
MPE = \frac{1}{n} \sum_{t=1}^{n} \frac{Y_t - \hat{Y}_t}{Y_t}

Equations (3.7 - 3.11)

Used for:
• Measurement of a technique's usefulness or reliability.
• Comparison of the accuracy of two different techniques.
• The search for an optimal technique.

Example 3.6 (Page 83)
• Evaluate the model using MAD, MSE, RMSE, MAPE, and MPE.
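Equations 3.7-3.11 can be computed together in one small function (a sketch; `forecast_errors` is our name, not the textbook's):

```python
import numpy as np

def forecast_errors(actual, forecast):
    """MAD, MSE, RMSE, MAPE, and MPE (Equations 3.7-3.11)."""
    y = np.asarray(actual, dtype=float)
    yhat = np.asarray(forecast, dtype=float)
    e = y - yhat                          # residuals e_t = Y_t - Yhat_t
    return {
        "MAD": np.mean(np.abs(e)),
        "MSE": np.mean(e**2),
        "RMSE": np.sqrt(np.mean(e**2)),
        "MAPE": np.mean(np.abs(e) / np.abs(y)),
        "MPE": np.mean(e / y),
    }
```

Note that MAPE and MPE divide by the actual values, so they are undefined whenever some Y_t = 0.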

Empirical Evaluation of Forecasting Methods
Results of the forecast accuracy for a sample of 3003 time series (1997):
• Complex methods do not necessarily produce more accurate forecasts than simpler ones.
• Various accuracy measures (MAD, MSE, MAPE) produce consistent results.
• The performance of methods depends on the forecasting horizon and the kind of data analyzed (yearly, quarterly, monthly).

Determining the Adequacy of a Forecasting Technique
• Do the residuals indicate a random series? (Examine the autocorrelation coefficients of the residuals; there should be no significant ones.)
• Are they approximately normally distributed?
• Is the technique simple and understood by decision makers?