1. Descriptive Statistics
a. $\bar{X} = 21.32$
b. $S = 13.37$
c. $S^2 = 178.76$
d. If the policy is successful, smaller orders will be eliminated and the mean will
increase.
e. If the change causes all customers to consolidate a number of small orders into
large orders, the standard deviation will probably decrease. Otherwise, it is very
difficult to tell how the standard deviation will be affected.
2. Descriptive Statistics
4. a. Point estimate: $\bar{X} = \dfrac{23.41 + 102.59}{2} = 63$
95% error margin: $(102.59 - 23.41)/2 = 39.59$
$Z = \dfrac{13.5 - 12.1}{1.7/\sqrt{100}} = 8.235$
Reject H0 since the computed Z (8.235) is greater than the critical Z (1.645). The mean has
increased.
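The arithmetic behind this test statistic can be checked with a short Python sketch. The function name `z_statistic` and its argument names are illustrative only; the numbers are the ones used in the solution.

```python
import math

def z_statistic(sample_mean, mu0, sigma, n):
    """One-sample Z statistic: (sample mean - hypothesized mean) / (sigma / sqrt(n))."""
    return (sample_mean - mu0) / (sigma / math.sqrt(n))

z = z_statistic(13.5, 12.1, 1.7, 100)
critical_z = 1.645  # upper-tail critical value quoted in the solution
print(round(z, 3), z > critical_z)
```

Because the alternative is one-sided (the mean has increased), only the upper critical value is compared with the computed Z.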
Since |−2.67| = 2.67 > 1.96, reject H0 at the 5% level. The mean satisfaction rating is
different from 5.9.
p-value: P(Z < −2.67 or Z > 2.67) = 2P(Z > 2.67) = 2(.0038) = .0076, very strong
evidence against H0.
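The two-sided p-value can be reproduced without a normal table. This is a minimal sketch that uses the complementary error function from Python's standard library; the helper name `upper_tail` is an assumption made for illustration.

```python
import math

def upper_tail(z):
    """P(Z > z) for a standard normal variable, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))

# Two-sided p-value for the observed |Z| = 2.67
p_two_sided = 2 * upper_tail(2.67)
print(round(p_two_sided, 4))
```

The result agrees with the table-based value of .0076 to the precision shown in the solution.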
Test statistic: $t = \dfrac{\bar{X} - 4}{S/\sqrt{n}} = \dfrac{4.31 - 4}{.52/\sqrt{14}} = 2.23$
Since 2.23 > 1.771, reject H0 at the 5% level. The medium-size serving contains an
average of more than 4 ounces of yogurt.
$Z = \dfrac{715 - 700}{50/\sqrt{50}} = 2.12$
Since the calculated Z is greater than the critical Z (2.12 > 1.96), reject the null hypothesis.
The forecast does not appear to be reasonable.
p-value: P(Z < −2.12 or Z > 2.12) = 2P(Z > 2.12) = 2(.017) = .034, strong evidence
against H0.
10. This problem can be used to illustrate how a random sample is selected with Minitab. In
order to generate 30 random numbers from a population of 200 click the following menus:
Calc>Random Data>Integer
The Integer Distribution dialog box shown in the figure below appears. The number of
random digits desired, 30, is entered in the Number of rows of data to generate space. C1
is entered for Store in column(s) and 1 and 200 are entered as the Minimum and Maximum
values. OK is clicked and the 30 random numbers appear in Column 1 of the worksheet.
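For readers without Minitab, the same kind of draw can be sketched in Python. This is only an illustration of the idea, not a reproduction of Minitab's generator; the seed value and the variable name `sample_ids` are arbitrary choices.

```python
import random

random.seed(10)  # arbitrary seed so the draw can be repeated

# 30 random integers between 1 and 200 inclusive, each drawn independently
sample_ids = [random.randint(1, 200) for _ in range(30)]
print(sample_ids)
```

Since each integer is drawn independently, repeats are possible. If every selected item must be distinct, `random.sample(range(1, 201), 30)` draws without replacement instead.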
The null hypothesis that the mean is still 2.9 is true since the actual mean of the
population of data is 2.91 with a standard deviation of 1.608; however, a few students may
reject the null hypothesis, committing a Type I error.
11. a.
12. a.
Ŷ = 32.5 + 36.4X
Ŷ = 32.5 + 36.4(5.2) = 222
13. This is a good population for showing how random samples are taken. If three-digit
random numbers are generated from Minitab as demonstrated in Problem 10, the selected
items for the sample can be easily found. In this population, ρ = 0.06 so most
students will get a sample correlation coefficient r close to 0. The least squares line will, in
most cases, have a slope coefficient close to 0, and students will not be able to reject the
null hypothesis H0: β1 = 0 (or, equivalently, ρ = 0) if they carry out the hypothesis test.
14. a.
b. Rent = 275.5 + .518 Size
Hypothesis test:
H0: µ = 44
H1: µ ≠ 44
Two-sided test, α = .02, critical value: |Z| = 2.33
Test statistic: $Z = \dfrac{\bar{X} - 44}{S/\sqrt{n}} = \dfrac{45.2 - 44}{10.3/\sqrt{175}} = 1.54$
As expected, the results of the hypothesis test are consistent with the confidence
interval for µ; µ = 44 is not ruled out by either procedure.
16. a. H0: µ = 63,700
H1: µ > 63,700
b. H0: µ = 4.3
H1: µ ≠ 4.3
c. H0: µ = 1300
H1: µ < 1300
17. Large sample 95% confidence interval for mean monthly return μ:
$-1.10 \pm 1.96\,\dfrac{5.99}{\sqrt{39}} = -1.10 \pm 1.88 \Rightarrow (-2.98,\ .78)$
μ = .94 (%) is not a realistic value for the mean monthly return of the client’s
account since it falls outside the 95% confidence interval. The client may have a
case.
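The interval and the comparison with the claimed return can be verified with a few lines of Python. The function name `large_sample_ci` and the variable names are illustrative; the inputs are the sample mean, standard deviation and sample size used in the solution.

```python
import math

def large_sample_ci(mean, sd, n, z=1.96):
    """Large-sample confidence interval: mean +/- z * sd / sqrt(n)."""
    margin = z * sd / math.sqrt(n)
    return mean - margin, mean + margin

lower, upper = large_sample_ci(-1.10, 5.99, 39)
claimed_mean = 0.94  # the value of mu being checked, in percent
inside = lower <= claimed_mean <= upper
print(round(lower, 2), round(upper, 2), inside)
```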
18. a.
In our consulting work, business people sometimes tell us that business schools teach a risk-
taking attitude that is too conservative. This is often reflected, we are told, in students choosing too
low a significance level: such a choice requires extreme evidence to move one from the status quo.
This case can be used to generate a discussion on this point as David chooses α = .01 and ends up
"accepting" the null hypothesis that the mean lifetime is 5000 hours.
Alice's point is valid: the company may be put in a bad position if it insists on very dramatic
evidence before abandoning the notion that its components last 5000 hours. In fact, the indifference
α (p-value) is about .0375; at any higher level the null hypothesis of 5000 hours is rejected.
In this case, John Mosby tries some primitive ways of forecasting his monthly sales. The
things he tries make some sort of sense, at least for a first cut, given that he has had no formal
training in forecasting methods. Students should have no trouble finding flaws in his efforts, such
as:
1. The mean value for each year, if projected into the future, is of little value since
month-to-month variability is missing.
2. His free-hand method of fitting a regression line through his data can be improved
upon using the least squares method, a technique now found on inexpensive hand
calculators. The large standard deviation for his monthly data suggests considerable
month-to-month variability and, perhaps, a strong
seasonal effect, a factor not accounted for when the values for a year are averaged.
Both the hand-fit regression line and John's interest in dealing with the monthly seasonal
factor suggest techniques to be studied in later chapters. His efforts also point out the value of
learning about well-established formal forecasting methods rather than relying on intuition and very
simple methods in the absence of knowledge about forecasting. We hope students will begin to
appreciate the value of formal forecasting methods after learning about John's initial efforts.
Julie’s initial look at her data using regression analysis is a good start. She found that the
r-squared value of 36% is not very high. Using more predictor variables, along with examining
their significance in the equation, seems like a good next step. The case suggests that other
techniques may prove even more valuable, techniques to be discussed in the chapters that follow.
Examining the residuals of her equation might prove useful. About how large are these
errors? Are forecast errors in this range acceptable to her? Do the residuals seem to remain in
the same range over time, or do they increase over time? Is a string of negative residuals
followed by a string of positive residuals, or vice versa? These questions involve a deeper
understanding of forecasting using historical values and these matters will be discussed more fully
in later chapters.
CHAPTER 3
2. A time series consists of data that are collected, recorded, or observed over successive
increments of time.
3. The secular trend of a time series is the long-term component that represents the growth or
decline in the series over an extended period of time. The cyclical component is the wave-
like fluctuation around the trend. The seasonal component is a pattern of change that
repeats itself year after year. The irregular component is that part of the time
series remaining after the other components have been removed.
4. Autocorrelation is the correlation between a variable, lagged one or more periods, and itself.
5. The autocorrelation coefficient measures the correlation between a variable, lagged one or
more periods, and itself.
6. The correlogram is a useful graphical tool for displaying the autocorrelations for various
lags of a time series. Typically, the time lags are shown on a horizontal scale and the
autocorrelation coefficients, the correlations between Yt and Yt-k, are displayed as vertical
bars at the appropriate time lags. The lengths and directions (from 0) of the bars indicate
the magnitude and sign of the autocorrelation coefficients. The lags at which
significant autocorrelations occur provide information about the nature of the time series.
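The definitions in answers 4 through 6 can be made concrete with a small calculation. The sketch below uses the usual sample formula, $r_k = \sum_{t=k+1}^{n}(Y_t - \bar{Y})(Y_{t-k} - \bar{Y}) \big/ \sum_{t=1}^{n}(Y_t - \bar{Y})^2$. The function name and the `demo` series are hypothetical and are included only to show the mechanics.

```python
def autocorrelation(series, k):
    """Lag-k sample autocorrelation r_k of a list of numbers."""
    n = len(series)
    mean = sum(series) / n
    denom = sum((y - mean) ** 2 for y in series)
    numer = sum((series[t] - mean) * (series[t - k] - mean) for t in range(k, n))
    return numer / denom

# Hypothetical trending series, used only for illustration
demo = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
acf = [autocorrelation(demo, k) for k in (1, 2, 3)]
print([round(r, 3) for r in acf])
```

Plotting these coefficients as vertical bars against the lag k gives the correlogram. For a trending series such as `demo`, the bars shrink only gradually as the lag increases.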
7. a. nonstationary series
b. stationary series
c. nonstationary series
d. stationary series
8. a. stationary series
b. random series
c. trending or nonstationary series
d. seasonal series
e. stationary series
f. trending or nonstationary series
9. Naive methods, simple averaging methods, moving averages, and Box-Jenkins methods.
Examples are: the number of breakdowns per week on an assembly line having a uniform
production rate; the unit sales of a product or service in the maturation stage of its life
cycle; and the number of sales resulting from a constant level of effort.
10. Moving averages, simple exponential smoothing, Holt's linear exponential smoothing,
simple regression, growth curves, and Box-Jenkins methods. Examples are: sales
revenues of consumer goods, demand for energy consumption, and use of raw materials.
Other examples include salaries, production costs, prices, and the growth period of the
life cycle of a new product.
11. Classical decomposition, Census II, Winters’ exponential smoothing, time series multiple
regression, and Box-Jenkins methods. Examples are: electrical consumption,
summer/winter activities (sports like skiing), clothing, agricultural growing seasons, and
retail sales influenced by holidays, three-day weekends, and school calendars.
14. $0 \pm 1.96\sqrt{\dfrac{1}{80}} = 0 \pm 1.96(.1118) = 0 \pm .219$
15. a. MPE
b. MAPE
c. MSE or RMSE
17. a. r1 = .895
H0: ρ1 = 0
H1: ρ1 ≠ 0
$SE(r_1) = \sqrt{\dfrac{1 + 2\sum_{i=1}^{k-1} r_i^2}{n}} = \sqrt{\dfrac{1 + 2\sum_{i=1}^{1-1} r_i^2}{24}} = \sqrt{\dfrac{1}{24}} = .204$
$t = \dfrac{r_1 - \rho_1}{SE(r_1)} = \dfrac{.895 - 0}{.204} = 4.39$
Since the computed t (4.39) is greater than the critical t (2.069), reject the null.
r2 = .788
H0: ρ2 = 0
H1: ρ2 ≠ 0
$SE(r_2) = \sqrt{\dfrac{1 + 2\sum_{i=1}^{k-1} r_i^2}{n}} = \sqrt{\dfrac{1 + 2(.895)^2}{24}} = \sqrt{\dfrac{2.6}{24}} = .33$
$t = \dfrac{r_2 - \rho_2}{SE(r_2)} = \dfrac{.788 - 0}{.33} = 2.39$
Since the computed t (2.39) is greater than the critical t (2.069), reject the null.
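The standard errors and t statistics for lags 1 and 2 can be recomputed as follows. This is a sketch under the formula used in the solution; the function name `se_autocorr` and the `results` dictionary are illustrative.

```python
import math

def se_autocorr(r, k, n):
    """Standard error of r_k; r[0] holds r_1, r[1] holds r_2, and so on."""
    return math.sqrt((1 + 2 * sum(ri ** 2 for ri in r[:k - 1])) / n)

r = [0.895, 0.788]   # r_1 and r_2 from the solution
n = 24               # number of observations

results = {}
for k in (1, 2):
    se = se_autocorr(r, k, n)
    results[k] = (se, r[k - 1] / se)   # (standard error, t statistic)
    print(k, round(se, 3), round(results[k][1], 2))
```

Carrying the unrounded standard errors through the division can change the last digit of t relative to the hand calculation, which rounds the standard error first. The conclusion at the critical value 2.069 is unaffected.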
18. a. r1 = .376
19. Figure 3-18 - The data are nonstationary. (Trending data)
Figure 3-19 - The data are random.
Figure 3-20 - The data are seasonal. (Monthly data)
Figure 3-21 - The data are stationary and have a pattern that could be modeled.
20.
The data have a quarterly seasonal pattern as shown by the significant autocorrelation
at time lag 4. First quarter earnings tend to be high, third quarter earnings tend to be low.
a. Time Data Forecast Error
t Yt Ŷt et |et| et² |et|/Yt et/Yt
1 .40 - - - - - -
2 .29 .40 -.11 .11 .0121 .3793 -.3793
3 .24 .29 -.05 .05 .0025 .2083 -.2083
4 .32 .24 .08 .08 .0064 .2500 .2500
5 .47 .32 .15 .15 .0225 .3191 .3191
6 .34 .47 -.13 .13 .0169 .3824 -.3824
7 .30 .34 -.04 .04 .0016 .1333 -.1333
8 .39 .30 .09 .09 .0081 .2308 .2308
9 .63 .39 .24 .24 .0576 .3810 .3810
10 .43 .63 -.20 .20 .0400 .4651 -.4651
11 .38 .43 -.05 .05 .0025 .1316 -.1316
12 .49 .38 .11 .11 .0121 .2245 .2245
13 .76 .49 .27 .27 .0729 .3553 .3553
14 .51 .76 -.25 .25 .0625 .4902 -.4902
15 .42 .51 -.09 .09 .0081 .2143 -.2143
16 .61 .42 .19 .19 .0361 .3115 .3115
17 .86 .61 .25 .25 .0625 .2907 .2907
18 .51 .86 -.35 .35 .1225 .6863 -.6863
19 .47 .51 -.04 .04 .0016 .0851 -.0851
20 .63 .47 .16 .16 .0256 .2540 .2540
21 .94 .63 .31 .31 .0961 .3298 .3298
22 .56 .94 -.38 .38 .1444 .6786 -.6786
23 .50 .56 -.06 .06 .0036 .1200 -.1200
24 .65 .50 .15 .15 .0225 .2308 .2308
25 .95 .65 .30 .30 .0900 .3158 .3158
26 .42 .95 -.53 .53 .2809 1.2619 -1.2619
27 .57 .42 .15 .15 .0225 .2632 .2632
28 .60 .57 .03 .03 .0009 .0500 .0500
29 .93 .60 .33 .33 .1089 .3548 .3548
30 .38 .93 -.55 .55 .3025 1.4474 -1.4474
31 .37 .38 -.01 .01 .0001 .0270 -.0270
32 .57 .37 .20 .20 .0400 .3509 .3509
b. $\text{MAD} = \dfrac{5.85}{31} = .189$
c. $\text{MSE} = \dfrac{1.6865}{31} = .0544$, $\text{RMSE} = \sqrt{.0544} = .2332$
d. $\text{MAPE} = \dfrac{11.2227}{31} = .3620$ or 36.2%
e. $\text{MPE} = \dfrac{-2.1988}{31} = -.0709$
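The five accuracy measures can be recomputed directly from the Yt column of the table in part a. The sketch below assumes, as the Forecast column shows, that each forecast is simply the previous observation (a naive forecast). The variable names are illustrative.

```python
import math

# Yt column from the table in part a (t = 1, ..., 32)
y = [.40, .29, .24, .32, .47, .34, .30, .39, .63, .43, .38, .49, .76, .51, .42, .61,
     .86, .51, .47, .63, .94, .56, .50, .65, .95, .42, .57, .60, .93, .38, .37, .57]

# Naive forecast: the forecast for period t is the observation from period t-1
actual = y[1:]
forecast = y[:-1]
errors = [a - f for a, f in zip(actual, forecast)]
n = len(errors)  # 31 forecast errors

mad = sum(abs(e) for e in errors) / n
mse = sum(e ** 2 for e in errors) / n
rmse = math.sqrt(mse)
mape = sum(abs(e) / a for e, a in zip(errors, actual)) / n
mpe = sum(e / a for e, a in zip(errors, actual)) / n
print(round(mad, 3), round(mse, 4), round(rmse, 4), round(mape, 3), round(mpe, 3))
```

The negative MPE indicates that, on average, the naive forecasts are slightly too high relative to the actual values.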
b. The sales time series appears to vary about a fixed level so it is stationary.
The sample autocorrelations die out rapidly. This behavior is consistent with a
stationary series. Note that the sales data are not random. Sales in adjacent
weeks tend to be positively correlated.
b. The residual autocorrelations follow
Since, in this case, the residuals differ from the original observations by the
constant Ȳ = 2460.05, the residual autocorrelations will be the same as the
autocorrelations for the sales numbers. There is significant residual
autocorrelation at lag 1 and the autocorrelations die out in an exponential fashion.
The random model is not adequate for these data.
Since this series is trending upward, it is nonstationary. There is also a seasonal
pattern since 2nd and 3rd quarter earnings tend to be relatively large and 1st and 4th
quarter earnings tend to be relatively small.
The autocorrelations are consistent with the choice in part b. The autocorrelations fail
to die out rapidly, consistent with nonstationary behavior. In addition, there are
relatively large autocorrelations at lags 4 and 8, indicating a quarterly seasonal
pattern.
1. The retail sales series has a trend and a monthly seasonal pattern.
2. Yes! Julie has determined that her data have a trend and should be first differenced. She has
also found out that the first differenced data are seasonal.
4. She will know which technique works best by comparing error measurements such as MAD,
MSE or RMSE, MAPE, and MPE.
1. The retail sales series has a trend and a monthly seasonal pattern.
2. The patterns appear to be somewhat similar. More actual data is needed in order to reach a
definitive conclusion.
3. This question should create a lively discussion. There are good reasons to use either set of
data. The retail sales series should probably be used until more actual sales data is available.
1. This case affords students an opportunity to learn about the use of autocorrelation functions,
and to continue following John Mosby's quest to find a good forecasting method for his data.
With the use of Minitab, the concept of first differencing data is also illustrated. The
summary should conclude that the sales data have both a trend and a seasonal component.
2. The trend is upward. Since there are significant autocorrelation coefficients at time lags 12
and 24, the data have a monthly seasonal pattern.
3. There is a 49% random component. That is, about half the variability in John’s monthly
sales is not accounted for by trend and seasonal factors. John, and the students analyzing
these results, should realize that finding an accurate method of forecasting these data could
be very difficult.
4. Yes, the first differences have a seasonal component. Given the autocorrelations at lags 12
and 24, the monthly changes are related 12, 24, … months apart. This information should be
used in developing a forecasting model for changes in monthly sales.
1. First, Dorothy used Minitab to compute the autocorrelation function for the number of new
clients. The results are shown below.
[Plot: autocorrelation function for the number of new clients]
Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ
1 0.49 4.83 24.08 8 0.23 1.40 93.71 15 0.12 0.64 136.27 22 0.09 0.46 153.39
2 0.43 3.50 42.86 9 0.17 1.01 96.90 16 0.14 0.75 138.70 23 0.16 0.83 156.84
3 0.35 2.56 55.51 10 0.18 1.09 100.72 17 0.22 1.14 144.37 24 0.25 1.26 165.14
Since the autocorrelations failed to die out rapidly, Dorothy concluded her series was
trending or nonstationary. She then decided to difference her time series.
The autocorrelations for the first differenced series are:
[Plot: autocorrelation function for the first differenced series]
Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ
1 -0.42 -4.11 17.43 8 0.03 0.26 18.49 15 -0.12 -0.93 29.32 22 -0.12 -0.92 41.93
2 0.05 0.41 17.66 9 -0.06 -0.52 18.91 16 -0.04 -0.32 29.52 23 -0.03 -0.26 42.09
3 -0.04 -0.33 17.82 10 0.00 0.02 18.91 17 0.21 1.67 34.85 24 0.19 1.44 47.00
2. The differences appear to be stationary and are correlated in consecutive time periods. Given
the somewhat large autocorrelations at lags 12 and 24, a monthly seasonal pattern should be
considered.
3. Dorothy would recommend that various seasonal techniques such as Winters’ method of
exponential smoothing (Chapter 4), classical decomposition (Chapter 5), time series
multiple regression (Chapter 8) and Box-Jenkins methods (ARIMA models in Chapter 9) be
considered.
CASE 3-4: ALOMEGA FOOD STORES
The sales data from Chapter 1 for the Alomega Food Stores case are reprinted in Case
3-4. The case suggests that Julie look at the data pattern for her sales data.
Autocorrelations suggest an up and down pattern that is very regular. If one month is
relatively high, next month tends to be relatively low and so forth. A very regular
pattern is suggested by the persistence of autocorrelations at relatively large lags.
The changing of the sign of the autocorrelations from one lag to the next is consistent with
an up and down pattern in the time series. If high sales tend to be followed by low sales or
low sales by high sales, autocorrelations at odd lags will be negative and autocorrelations at
even lags positive.
The relatively large autocorrelation at lag 12, 0.53, suggests there may also be a seasonal
pattern. This issue is explored in Case 5-6.
1. A time series plot and the autocorrelation function for Surtido Cookies sales follow.
The graphical evidence above suggests Surtido Cookies sales vary about a fixed level with
a strong monthly seasonal component. Sales are typically high near the end of the year and
low during the beginning of the year.
MAD appears large because of the big numbers for sales. MAPE is fairly large but
perhaps tolerable. In any event, Jame is convinced he can do better.
CHAPTER 4
1. Exponential smoothing
2. Naive
3. Moving average
6. a.
t Yt Ŷt et |et| et² |et|/Yt et/Yt
b. $\text{MAD} = \dfrac{8.92}{12} = .74$
c. $\text{MSE} = \dfrac{8.99}{12} = .75$
d. $\text{MAPE} = \dfrac{.445}{12} = .0371$
e. $\text{MPE} = \dfrac{.141}{12} = .0118$
f. 22.14
7.
Price AVER1 FITS1 RESI1
19.39 * * *
18.96 * * *
18.20 18.8500 * *
17.89 18.3500 18.8500 -0.96000
18.43 18.1733 18.3500 0.08000
19.98 18.7667 18.1733 1.80667
19.51 19.3067 18.7667 0.74333
20.63 20.0400 19.3067 1.32333
19.78 19.9733 20.0400 -0.26000
21.25 20.5533 19.9733 1.27667
21.18 20.7367 20.5533 0.62667
22.14 21.5233 20.7367 1.40333
Accuracy Measures
MAPE: 4.6319 MAD: 0.9422 MSE: 1.1728
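The three-period moving average output for Problem 7 can be reproduced from the Price column. This is a minimal sketch; the variable names are illustrative, and MAPE is expressed in percent to match the output above.

```python
prices = [19.39, 18.96, 18.20, 17.89, 18.43, 19.98,
          19.51, 20.63, 19.78, 21.25, 21.18, 22.14]
k = 3  # order of the moving average

# averages[i] is the mean of the k prices ending at index i + k - 1 (AVER1 column)
averages = [sum(prices[i:i + k]) / k for i in range(len(prices) - k + 1)]

# Each average is used as the fit for the following period (FITS1 column)
fits = averages[:-1]
actual = prices[k:]
residuals = [a - f for a, f in zip(actual, fits)]   # RESI1 column

n = len(residuals)
mad = sum(abs(e) for e in residuals) / n
mse = sum(e ** 2 for e in residuals) / n
mape = 100 * sum(abs(e) / a for e, a in zip(residuals, actual)) / n
next_forecast = averages[-1]   # forecast for the period after the last price
print(round(mape, 4), round(mad, 4), round(mse, 4), round(next_forecast, 4))
```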
Accuracy Measures
MAPE: 3.5779 MAD: 8.0000 MSE: 64.6667
b. & c. See plot below.
Yt Smoothed Forecast
200 200.000 200.000
210 204.000 200.000
215 208.400 204.000
216 211.440 208.400
219 214.464 211.440
220 216.678 214.464
225 220.007 216.678
226 222.404 220.007
222.404
Accuracy Measures
MAPE: 3.2144 MAD: 7.0013 MSE: 58.9657
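The smoothed values and accuracy measures in this table can be checked with a short simple exponential smoothing sketch. The smoothing constant α = .4 is an assumption inferred from the Smoothed column (for example, .4(210) + .6(200) = 204); it is not stated in this part of the solution. The variable names are illustrative.

```python
y = [200, 210, 215, 216, 219, 220, 225, 226]
alpha = 0.4  # assumed: inferred from the Smoothed column of the table

smoothed = [float(y[0])]   # initial smoothed value set to the first observation
for obs in y[1:]:
    smoothed.append(alpha * obs + (1 - alpha) * smoothed[-1])

# The forecast for each period is the previous smoothed value; the first
# forecast equals the first observation, as in the table
forecasts = [float(y[0])] + smoothed[:-1]
errors = [a - f for a, f in zip(y, forecasts)]

n = len(errors)
mad = sum(abs(e) for e in errors) / n
mse = sum(e ** 2 for e in errors) / n
mape = 100 * sum(abs(e) / a for e, a in zip(errors, y)) / n
next_forecast = smoothed[-1]   # forecast for the period after the last observation
print(round(mape, 4), round(mad, 4), round(mse, 4), round(next_forecast, 3))
```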
9. a. & c, d, e, f 3-month moving-average (See plot below.)
Accuracy Measures
MAPE: 4.5875 MAD: 0.4911 MSE: 0.3193 MPE: .6904
b. & c, d, e, f 5-month moving-average (See plot below.)
Accuracy Measures
MAPE: 5.5830 MAD: 0.6040 MSE: 0.5202 MPE: .7100
g. Use 3-month moving average forecast: 10.4433
No! The accuracy measures favor the three-month moving average procedure, but the
values of the forecasts are not much different.
11. See plot below.
Month Demand Smooth Forecast Error
Accuracy Measures
MAPE: 14.67 MAD: 43.44 MSE: 2943.24
12. Naïve method - Forecast for 1996 Q2: 25.68 (Actual: 26.47)
MAPE = 8.622 MAD = 1.916 MSE = 5.852
5-month moving average - Forecast for 1996 Q2: 24.244 (Actual: 26.47)
MAPE: 9.791 MAD: 2.249 MSE: 7.402
MAPE: 8.425 MAD: 1.894 MSE: 5.462
Based on the error measures and the forecast for Q2 of 1996, the naïve method
and simple exponential smoothing are comparable. Either method could be used.
13. a. α = .4
Accuracy Measures
MAPE: 14.05 MAD: 24.02 MSE: 1174.50
b. α = .6
Accuracy Measures
MAPE: 14.68 MAD: 24.56 MSE: 1080.21
c. Looking at the error measures, there is not much difference between the two
choices of smoothing constant. The error measures for α = .4 are slightly better.
The forecasts for the two choices of smoothing constant are also not much
different.
d. The residual autocorrelations for α = .4 are shown below. The residual
autocorrelations for α = .6 are similar. There are significant residual
autocorrelations at lags 1, 4 and (very nearly) 8. A forecasting method that yields
no significant residual autocorrelations would be desirable.
14. None of the techniques does much better than the naïve method. Simple exponential
smoothing with α close to 1, say .95, is essentially the naïve method.
Using the naïve method, the forecast for 2000 would be 6.85.
15. A time series plot of quarterly Revenues and the autocorrelation function show
that the data are seasonal with a trend. After some experimentation, Winters’
multiplicative smoothing with smoothing constants α (level) = 0.8, β (trend) = 0.1
and γ (seasonal) = 0.1 is used to forecast future Revenues. See plot below.
Accuracy Measures
MAPE 3.8
MAD 69.1
MSE 11146.4
Forecasts
The data appear to be seasonal with relatively large sales in August, September,
October and November, and relatively small sales in July and December.
b. & c. The Excel spreadsheet for calculating MAPE for the naïve forecasts and
the simple exponential smoothing forecasts is shown below.
is calculated with a divisor of 23 (since the first smoothed value is set equal
to the first observation). Using a divisor of 24 gives MAPE2 = 7.69%, the
value reported by Minitab.
17. a. The four-week moving average seems to represent the data a little better.
Compare the error measures for the four-week moving average in the figure below
with the five-week moving average results in Figure 4-4.
18. a. As the order of the moving average increases, the smoothed data become more
wavelike. Looking at the results for orders k =10 and k = 15, and counting the
number of years from one peak to the next, it appears as if the number of severe
earthquakes is on about a 30-year cycle.
The residual autocorrelation function is shown below. There are no
significant residual autocorrelations. Simple exponential smoothing seems
to provide a good fit to the earthquake data.
19. a. The results of Holt’s smoothing with α (level) = .9 and β (trend) = .1 for
Southwest Airlines’ quarterly income are shown below. A plot of the residual
autocorrelation function follows. It appears as if Holt’s procedure represents the
data well but the residual autocorrelations have significant spikes at the seasonal
lags of 4 and 8 suggesting a seasonal component is not captured by Holt’s
method.
b. Winters’ multiplicative smoothing with α = β = γ =.2 was applied to the quarterly
income data and the results are shown in the plot below. The forecasts for the
four quarters of 2000 are:
Quarter Forecast
49 88.960
50 184.811
51 181.464
52 117.985
The forecasts seem reasonable but the residual autocorrelation function below has
a significant spike at lag 1. So although Winters’ procedure captures the trend and
seasonality, there is still some association in consecutive observations not
accounted for by Winters’ method.
20. A time series plot of The Gap quarterly sales is shown below.
This time series is trending upward and has a seasonal pattern with third and fourth
quarter Gap sales relatively large. Moreover the variability in this series is increasing
with the level suggesting a multiplicative Winters’ smoothing procedure or a
transformation of the data (say logarithms of sales) to stabilize the variability.
Forecasts
Quarter Forecast Lower Upper
101 3644.18 3423.79 3864.57
102 3775.78 3551.94 3999.62
103 4269.27 4041.58 4496.96
104 5267.82 5035.90 5499.74
This case provides the student with an opportunity to deal with a frequent real world
problem: small data sets. A plot of the two years of data shows both an upward trend and seasonal
pattern. The forecasting model that is selected must do an accurate job for at least three months into
the future.
Averaging methods are not appropriate for this data set because they do not work when data
has a trend, seasonality, or some other systematic pattern. Moving average models tend to smooth
out the seasonal pattern of the data instead of making use of it to forecast.
A naive model that takes into account both the trend and the seasonality of the data might
work. Since the seasonal pattern appears to be strong, a good forecast might take the same value it
did in the corresponding month one year ago, or Ŷt+1 = Yt-11.
However, as it stands, this forecast ignores the trend. One approach to estimate trend is to calculate
the increase from each month in 2005 to the same month in 2006. As an example, the increase from
January, 2005 to January, 2006 is equal to (Y13 - Y1) = (17 - 5) = 12.
After the increases for all 12 months are calculated, they can be summed and then divided by
12. The forecast for each month of 2007 could then be calculated as the value for the same month in
2006 plus the average increase for each of the 12 months from 2005 to the same month in 2006.
Consequently, the forecast for January, 2007 is
Ŷ25 = 17 + [(17 - 5) + (14 - 6) + (20 - 10) + (23 - 13) + (30 - 18) + (38 - 15) + (44 - 23) +
(41 - 26) + (33 - 21) + (23 - 15) + (26 - 12) + (17 - 14)]/12
Ŷ25 = 17 + 148/12 = 17 + 12.3 ≈ 29
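The naive forecast with trend and seasonal adjustments described above can be sketched in Python. The 2005 and 2006 monthly values are read off the differences in the January 2007 calculation; the variable names are illustrative.

```python
# Monthly values (January through December), taken from the differences above
sales_2005 = [5, 6, 10, 13, 18, 15, 23, 26, 21, 15, 12, 14]
sales_2006 = [17, 14, 20, 23, 30, 38, 44, 41, 33, 23, 26, 17]

# Year-over-year increase for each month, then the average increase
increases = [b - a for a, b in zip(sales_2005, sales_2006)]
avg_increase = sum(increases) / 12

# Forecast for each month of 2007: same month in 2006 plus the average increase
forecast_2007 = [round(v + avg_increase, 1) for v in sales_2006]
print(sum(increases), round(avg_increase, 2), forecast_2007)
```

The first entry of `forecast_2007` is the January 2007 forecast, about 29.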
Month Forecast
Jan/2007 19.8
Feb/2007 18.0
Mar/2007 26.8
Apr/2007 32.0
May/2007 42.4
Jun/2007 45.8
Jul/2007 58.4
Aug/2007 58.9
Sep/2007 47.6
Oct/2007 33.7
Nov/2007 33.5
Dec/2007 28.0
The naïve forecasts are not unreasonable but the Winters’ forecasts seem to have captured
the seasonal pattern a little better, particularly for the first 3 months of the year. Notice that if the
trend and seasonal pattern are strong, Winters’ smoothing procedure can work well even with only
two years of monthly data.
This case shows how several exponential smoothing methods can be applied to the Mr. Tux
data. John Mosby tries simple exponential smoothing and exponential smoothing with adjustments
for trend and seasonal factors, along with a three-month moving average.
Students can begin to see that several forecasting methods are typically tried when an
important variable must be forecast. Some method of comparing them must be used, such as the
three accuracy methods discussed in this case. Students should be asked their opinions of John's
progress in his forecasting efforts given these accuracy values. It should be apparent to most that
the degree of accuracy achieved is not sufficient and that further study is needed. Students should
be reminded that they are looking at actual data, and that the problems faced by John Mosby really
occurred.
1. Of the methods attempted, Winters’ multiplicative smoothing was the best method John
found. Each forecast was typically off by about 25,825. The error in each forecast was
about 22% of the value of the variable being forecast.
2. There are other choices for the smoothing constants that lead to smaller error measures.
For example, with α = β = γ = .1, MAD = 22,634 and MAPE = 20.
3. John should examine plots of the residuals and the residual autocorrelations. If Winters’
procedure is adequate, the residuals should appear to be random. In addition, John can
examine the forecasts for the next 12 months to see if they appear to be reasonable.
4. The ideal value for MPE is 0. If MPE is negative, then, on average, the predicted values
are too high (larger than the actual values).
1. Students should realize immediately that simply using the basic naive approach of
using last period to predict this period will not allow for forecasts for the rest of
1993. Since the autocorrelation coefficients presented in Case 3-3 indicate
some seasonality, a naive model using April 1992 to predict April 1993, May 1992 to
predict May 1993 and so forth might be tried. This approach produces the error
measures
over the data region, and are not particularly attractive given the magnitudes of the new
client numbers.
2. A moving average model of any order cannot be defended since any moving average
will produce flat line forecasts for the rest of 1993. That is, the forecasts will lie along a
horizontal line whose level is the last value for the moving average. The seasonal pattern
will be ignored.
5. Using Winters’ procedure in 4, the forecasts for the remainder of 1993 are:
Month Forecast
Apr/1993 148
May/1993 141
Jun/1993 148
Jul/1993 141
Aug/1993 143
Sep/1993 136
Oct/1993 159
Nov/1993 146
Dec/1993 126
6. There are no significant residual autocorrelations (see plot below). Winters’
multiplicative smoothing seems adequate.
adequate. Also, a naïve model that combined seasonal and trend estimates (similar to
Equation 4.5) was found to be adequate. The trend and seasonal pattern in actual
Murphy Brothers’ sales are consistent and pronounced so a naïve model is likely to
work well.
3. Based on the forecasting methods tested, actual Murphy Brothers’ sales data should be
used. A plot of the results for the best Winters’ procedure follows.
An examination of the autocorrelation coefficients for the residuals from this Winters’
model shown below indicates that none of them are significantly different from zero.
However, Julie decided to use the naïve model because it was very simple and she could
explain it to her father.
CASE 4-5: FIVE-YEAR REVENUE PROJECTION FOR DOWNTOWN RADIOLOGY
1. The time series plot for Orders shows a slight upward trend and a seasonal pattern
with peaks in December. Because of the relatively small data set, the autocorrelations
are only computed for a limited number of lags, 6 in this case. Consequently with
monthly data, the seasonality does not show up in the autocorrelation function. There
is significant positive autocorrelation at lag 1, so Orders in consecutive months are
correlated.
The time series plot for CPO shows a downward trend but a seasonal component is
not readily apparent. There is significant positive autocorrelation at lag 1 and the
autocorrelations die out relatively slowly. The CPO series is nonstationary and
observations in consecutive time periods are correlated.
3. Simple exponential smoothing with α = .77 (the optimal α in Minitab) represents the
CPO data well but, like any “averaging” procedure, produces flat-line forecasts.
Forecasts of CPO for the next 4 months are:
4. Multiplying the Orders forecasts in 2 by the CPO forecasts in 3 gives the
Contacts forecasts:
Month Forecast
Jul/2003 368333
Aug/2003 406064
Sep/2003 382113
Oct/2003 432763
6. It may or may not be better to focus on the number of units and contacts per unit
to get a forecast of contacts. It depends on the nature of the data (ease of modeling)
and the amount of relevant data available.
3. If another forecasting method can adequately account for the autocorrelation
in the Total Visits data, it is likely to produce “better” forecasts. This issue
is explored in subsequent cases.
4. The forecasts from Winters’ smoothing show an upward trend. If they are
to be believed, perhaps additional medical staff are required to handle the
expected increased demand. At this point however, further study is required.
1. Jame learned that Surtido Cookie sales have a strong seasonal pattern
(sales are relatively high during the last two months of the year, low during
the spring) with very little, if any, trend (see Case 3-5).
2. The autocorrelation function for sales (see Case 3-5) is consistent with
the time series plot. The autocorrelations die out (consistent with no
trend) and have a spike at the seasonal lag 12 (consistent with a seasonal
component).
4. Karin’s forecasts follow.
Month Forecast
Jun/2003 618914
Jul/2003 685615
Aug/2003 622795
Sep/2003 1447864
Oct/2003 1630271
Nov/2003 2038257
Dec/2003 1817989
These forecasts have the same pattern as the forecasts generated by Winters’
method but are uniformly lower. Winters’ forecasts seem more consistent
with recent history.
CHAPTER 5
TIME SERIES AND THEIR COMPONENTS
1. The purpose of decomposing a time series variable is to observe its various elements
in isolation. By doing so, insights into the causes of the variability of the series are
frequently gained. A second important reason for isolating time series components
is to facilitate the forecasting process.
2. The multiplicative components model works best when the variability of the time
series increases with the level. That is, the values of the series spread out as the
trend increases, and the set of observations have the appearance of a megaphone
or funnel.
3. The basic forces that affect and help explain the trend-cycle of a series are
population growth, price inflation, technological change, and productivity increases.
4. a. Exponential
c. Linear
5. Weather and calendar events such as holidays affect the seasonal component.
6. a. & b.
c. 23.89 billion
d. 648.5 billion
e. The trend estimate is below Value Line’s estimate of 680 billion.
f. Inflation, population growth, and new technology affect the trend of capital
spending.
7. a. & b.
9. Ŷ = TS = 850(1.12) = $952
Ŷ = TS = 900(.827) = $744.3
Apr 201 93 216
May 206 95 217
Jun 241 99 243
Jul 230 96 240
Aug 245 89 275
Sep 271 103 263
Oct 291 120 243
Nov 320 131 244
Dec 419 189 222
The statement is not true. The deseasonalized data show that business
is about the same.
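The deseasonalizing step can be sketched as follows. The column meanings are an assumption: the table above carries no header, and the code reads the three numeric columns as sales, seasonal index (in percent), and deseasonalized sales, which is consistent with the arithmetic.

```python
# (month, sales, seasonal index in percent, deseasonalized value as printed)
# Column interpretation is assumed; the table above has no header row.
rows = [
    ("Apr", 201, 93, 216), ("May", 206, 95, 217), ("Jun", 241, 99, 243),
    ("Jul", 230, 96, 240), ("Aug", 245, 89, 275), ("Sep", 271, 103, 263),
    ("Oct", 291, 120, 243), ("Nov", 320, 131, 244), ("Dec", 419, 189, 222),
]

def deseasonalize(value, index_percent):
    """Divide the observed value by its seasonal index (expressed as a proportion)."""
    return value / (index_percent / 100)

adjusted = {month: deseasonalize(sales, index) for month, sales, index, _ in rows}
for month, sales, index, printed in rows:
    print(f"{month}: {sales} / {index / 100:.2f} = {adjusted[month]:.1f} (table: {printed})")
```

Raw sales roughly double from April to December, while the adjusted values stay in a much narrower band.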
13. a. & b. Would use both the trend and seasonal indices to forecast although seasonal
component is not strong in this example (see plot and seasonal indices below).
Yt = 2268.0 + 22.1*t
Seasonal Indices
Period Index
1 0.969
2 1.026
3 1.000
4 1.005
Forecasts
Period Forecast
Q3/1996 3305.39
Q4/1996 3343.02
14. a. Multiplicative Model
Yt = 72.6 + 6.01*t
Seasonal Indices
Period Index
1 1.278
2 0.907
3 0.616
4 0.482
5 0.426
6 0.467
7 0.653
8 0.863
9 1.365
10 1.790
11 1.865
12 1.288
b. Pronounced trend and seasonal components. Would use both for
forecasting.
Month Forecast
Jun/2006 253
Jul/2006 358
Aug/2006 478
Sep/2006 764
Oct/2006 1012
Nov/2006 1066
Dec/2006 744
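The forecasts above can be approximately reproduced by multiplying the fitted trend by the seasonal index for each month. This sketch assumes the first forecast month (June 2006) is period t = 78, which is inferred from the series length of 77 reported in the printout for the log series; small differences from the table come from the rounded coefficients.

```python
# Seasonal indices for periods 1-12 (January-December) from the printout above.
seasonal = [1.278, 0.907, 0.616, 0.482, 0.426, 0.467,
            0.653, 0.863, 1.365, 1.790, 1.865, 1.288]

def trend(t):
    """Fitted linear trend Yt = 72.6 + 6.01*t (rounded coefficients)."""
    return 72.6 + 6.01 * t

# Forecasts as printed in the table (month number -> forecast).
published = {6: 253, 7: 358, 8: 478, 9: 764, 10: 1012, 11: 1066, 12: 744}

first_t = 78  # assumption: June 2006 is period 78
forecasts = {}
for offset, month in enumerate(range(6, 13)):
    forecasts[month] = trend(first_t + offset) * seasonal[month - 1]

for month, value in forecasts.items():
    print(f"month {month}: {value:.1f} (table: {published[month]})")
```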
Data LnSales
Length 77
NMissing 0
Yt = 4.6462 + 0.0215*t
Seasonal Indices
Period Index
1 0.335
2 -0.018
3 -0.402
4 -0.637
5 -0.714
6 -0.571
7 -0.273
8 -0.001
9 0.470
10 0.723
11 0.747
12 0.342
c. & d. Forecasts
Yt = -302.9 + 44.9*t
Seasonal Indices
Period Index
1 0.957
2 1.022
3 1.046
4 0.975
b. There is a significant trend but it is not a linear trend. First quarter sales
tend to be relatively low and third quarter sales tend to be relatively high.
However, the plot in part a indicates a multiplicative decomposition with a
linear trend is not an adequate representation of Disney sales. Perhaps
better to do a multiplicative decomposition with a quadratic trend. Even
better, in this case, is to do an additive decomposition with the logarithms
of Disney sales.
c. With the right decomposition, would use both the trend and seasonal
components to generate forecasts.
d. Forecasts
Quarter Forecast
Q4/1995 2506
Q1/1996 2502
Q2/1996 2719
Q3/1996 2830
Q4/1996 2681
17. a.
Month Forecast
Oct/1996 171.2
Nov/1996 174.9
Dec/1996 180.5
Yt = 128.814 + 0.677*t
Seasonal Indices
Period Index
1 0.880
2 0.859
3 0.991
4 0.986
5 1.031
6 1.021
7 1.007
8 1.035
9 0.973
10 0.991
11 1.015
12 1.210
Forecasts maintain the seasonal pattern but are uniformly below the actual
retail sales for 1995. However, MPE = MAPE = 2.49% is relatively small.
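MPE equals MAPE exactly when every forecast falls below its actual value, because every percentage error is then positive. A minimal sketch, using made-up numbers rather than the retail sales data:

```python
def mpe(actual, forecast):
    """Mean percentage error: keeps the sign of each error."""
    return 100 * sum((a - f) / a for a, f in zip(actual, forecast)) / len(actual)

def mape(actual, forecast):
    """Mean absolute percentage error: drops the sign of each error."""
    return 100 * sum(abs(a - f) / a for a, f in zip(actual, forecast)) / len(actual)

# Hypothetical values for illustration only (actuals assumed positive).
actual = [180.0, 185.0, 190.0]
low_forecasts = [175.0, 181.0, 185.0]    # every forecast below the actual
mixed_forecasts = [175.0, 189.0, 185.0]  # one forecast above the actual

print(mpe(actual, low_forecasts), mape(actual, low_forecasts))      # equal
print(mpe(actual, mixed_forecasts), mape(actual, mixed_forecasts))  # MPE is smaller
```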
19. a. Jan = 600/1.2 = 500
500(1.37) = 685 people estimated for Feb
c. 5
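The two-step calculation in Problem 19 (remove the January seasonal effect, then apply the February index) can be checked with a few lines. Variable names are illustrative.

```python
jan_actual = 600   # observed January value
jan_index = 1.2    # January seasonal index
feb_index = 1.37   # February seasonal index

# Step 1: seasonally adjust January to get the underlying level.
base_level = jan_actual / jan_index

# Step 2: apply the February index to that level.
feb_estimate = base_level * feb_index

print(round(base_level), round(feb_estimate))
```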
22. Deflating a time series removes the effects of dollar inflation and permits the analyst
to examine the series in constant dollars.
Feb 251,254
Mar 303,556
Apr 317,872
May 329,551
Jun 261,362
Jul 336,417
Yt = 65355 + 72.7*t
Seasonal Indices
Forecasts
Month Forecast
Nov/2003 74791.4
Dec/2003 74581.7
Jan/2004 73607.8
Feb/2004 73954.0
Mar/2004 74393.4
Apr/2004 74887.2
May/2004 75454.0
Jun/2004 76419.5
Jul/2004 76894.1
Aug/2004 76564.4
Sep/2004 75757.2
Oct/2004 76005.6
A multiplicative decomposition with a default linear trend is not quite right for these
data. There is some curvature in the time series as the plot of the seasonally adjusted
data indicates. Not surprisingly, there is a strong seasonal component with
employment relatively high in the summer and relatively low in the winter. In spite
of the not quite linear trend, the forecasts seem reasonable.
26. A linear trend is not appropriate for the employed men data. The plot below shows
a quadratic trend fit to the data of Table P-25.
Although better than a linear trend, the quadratic trend is not quite right. Employment
for the years 2000-2003 seems to have leveled off. No simple trend curve is
likely to provide an excellent fit to these data. The residual autocorrelation
function below indicates a prominent seasonal component since there are large
autocorrelations at the seasonal lag S = 12 and its multiples.
27. Multiplicative Model
Yt = 1157 + 1088*t
Seasonal Indices
Quarter Index
Q1 0.923
Q2 0.986
Q3 0.958
Q4 1.133
Forecasts and Actuals
Slight upward curvature in the Wal-Mart sales data so a linear trend is not quite
right. Not surprisingly, there is a strong seasonal component with 4th quarter
sales relatively high and 1st quarter sales relatively low. The forecasts for 2004
are uniformly below the actuals (primarily the result of the linear trend assumption)
although the seasonal pattern is maintained. Here MPE = MAPE = 9.92%.
Multiplicative decomposition better than additive decomposition but any
decomposition that assumes a linear trend will not forecast sales for 2004 well.
28. A linear trend fit to the Wal-Mart sales data of Table P-27 is shown below. A
linear trend misses the upward curvature in the data.
A quadratic trend provides a better fit to the Wal-Mart sales data (see plot
below). The autocorrelation function for the residuals from a quadratic
trend fit suggests a prominent seasonal component since there are large
autocorrelations at the seasonal lag S = 4 and its multiples.
CASE 5-1: THE SMALL ENGINE DOCTOR
1.
2.
4.
5. Trend*Seasonality (T*S): MAD = 1.52
Linear Trend Model: MAD = 9.87
6. If you had to limit your choices to the models in 2 and 4, the linear trend model is
better (judged by MAD and MSE) than any of the Holt smoothing procedures.
However, the Trend*Seasonality (T*S) model is best. This procedure is the only
one that takes account of the trend and seasonality in Small Engine Doctor sales.
At last, John is able to deal directly with the strong seasonal effect in his monthly data.
Students find it interesting that in addition to using these to forecast, John's banker wants them to
justify variable loan payments.
To forecast using decomposition, students see that both the C and I components must be
estimated. We like to emphasize that studying the C column in the computer printout is helpful,
but that other study is needed to estimate the course of the economy over the next several months.
The computer is not able to make such forecasts with accuracy, as anyone who follows economic
news well knows.
Thinking about John’s efforts to balance his seasonal business to achieve a more uniform
sales picture can generate a good class discussion. This is usually the goal of any business;
examples such as boats/skis or bikes/skis illustrate this effort in many seasonal businesses. In fact,
John Mosby put a great deal of effort into expanding his Seattle business in order to balance his
seasonal effect. Along with his shirt making business, he has achieved a rather uniform monthly
sales volume.
1. The two sentences might look something like this: A computer analysis of John
Mosby's monthly sales data clearly shows the strong variation by month. I think we
are justified in letting him make variable monthly loan payments based on the seasonal
indices shown in the computer printout.
2. Since John expects to do twice as much business in Seattle as Spokane, the Seattle
indices he should try to achieve will be only half as far from 100 as the Spokane
indices, and on the opposite side of 100:
Spokane Seattle
Jan 31.4 134.3
Feb 47.2 126.4
Mar 88.8 105.6
Apr 177.9 61.1
May 191.8 54.1
Jun 118.6 90.7
Jul 102.9 98.6
Aug 128.7 85.7
Sep 93.8 103.1
Oct 81.5 109.3
Nov 60.4 119.8
Dec 77.1 111.5
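The Seattle targets in the table follow from one rule: each Seattle index sits half as far from 100 as the Spokane index, on the opposite side. The sketch below applies that rule and also checks that weighting Seattle twice as heavily as Spokane gives a combined index of 100 in every month. The 1:2 weighting is our reading of "twice as much business in Seattle"; the function name is illustrative.

```python
spokane = {"Jan": 31.4, "Feb": 47.2, "Mar": 88.8, "Apr": 177.9,
           "May": 191.8, "Jun": 118.6, "Jul": 102.9, "Aug": 128.7,
           "Sep": 93.8, "Oct": 81.5, "Nov": 60.4, "Dec": 77.1}
published_seattle = {"Jan": 134.3, "Feb": 126.4, "Mar": 105.6, "Apr": 61.1,
                     "May": 54.1, "Jun": 90.7, "Jul": 98.6, "Aug": 85.7,
                     "Sep": 103.1, "Oct": 109.3, "Nov": 119.8, "Dec": 111.5}

def offsetting_index(index, volume_ratio=2):
    """Index needed at the larger site to offset the smaller site's seasonality."""
    return 100 - (index - 100) / volume_ratio

seattle = {month: offsetting_index(value) for month, value in spokane.items()}

for month in spokane:
    combined = (spokane[month] + 2 * seattle[month]) / 3  # Seattle weighted twice
    print(f"{month}: Seattle {seattle[month]:.2f}, combined {combined:.1f}")
```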
3. Using the sales figures for January and February of 2005, to get “average” (100%) sales
dollars, divide the actual sales by the corresponding seasonal index:
Jan: 71,043/.314 = 226,252
Feb: 152,930/.472 = 324,004
Now subtract the actual sales from these target values to get the sales necessary
from the shirt making machine:
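The arithmetic for this step can be sketched as below. The target values match the two divisions shown above; the gap figures are computed here from those numbers and are not taken from the original solution.

```python
actual_sales = {"Jan": 71043, "Feb": 152930}
seasonal_index = {"Jan": 0.314, "Feb": 0.472}

targets = {}  # "average" (100%) sales implied by each month's actual sales
gaps = {}     # additional sales needed to reach the target
for month, sales in actual_sales.items():
    targets[month] = sales / seasonal_index[month]
    gaps[month] = targets[month] - sales
    print(f"{month}: target {targets[month]:,.0f}, needed {gaps[month]:,.0f}")
```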
Both the trend and seasonal components are important. The trend explains about 34%
of the total variance.
Multiplicative Model
Data Clients
Length 99
Yt = 89.88 + 0.638*t
Seasonal Indices
Month Index
1 1.177
2 1.168
3 1.246
4 0.997
5 0.940
6 1.020
7 0.916
8 0.951
9 0.878
10 1.055
11 0.868
12 0.783
The number of new clients tends to be relatively large during the first three months
of the year.
Forecasts
Month Forecast
Apr/2003 153.207
May/2003 145.121
Jun/2003 158.062
Jul/2003 142.440
Aug/2003 148.560
Sep/2003 137.749
Oct/2003 166.161
Nov/2003 137.261
Dec/2003 124.277
There are one or possibly two large positive residuals (irregularities) at the beginning of the
series but there are no significant residual autocorrelations.
Smoothing Constants
Accuracy Measures
MAPE 1.1
MAD 76.2
MSD 11857.8
Forecasts
Month Forecast
Jan/2002 8127.8
Feb/2002 8165.1
Mar/2002 8202.4
Apr/2002 8239.7
May/2002 8277.0
Jun/2002 8314.2
Jul/2002 8351.5
Aug/2002 8388.8
Sep/2002 8426.1
Holt’s linear smoothing was adequate for the seasonally adjusted data, but the
forecasts above are uniformly above the actual values for the first nine months of
2002.
3. Using the same procedure as in 2, the forecast for October, 2002 is 8609.2.
4. The pattern for the three sets of data shows a trend and monthly seasonality.
Multiplicative Model
Data Calls
Length 60
NMissing 0
Yt = 21851 - 17.0437*t
Seasonal Indices
Month Index
1 0.937
2 0.922
3 0.972
4 0.963
5 0.925
6 1.016
7 1.063
8 1.094
9 1.094
11 1.025
12 0.936
CASE 5-6: ALOMEGA FOOD STORES
The sales data for the Alomega Food Stores case is subjected to a multiplicative
decomposition procedure in this case. A trend line is first calculated with the actual data plotted
around it (using MINITAB). Students can project this line into future months for sales forecasts,
although, as the case suggests, accurate forecasts will not result: The MAPE using only the trend
line is 28%.
A plot of the seasonal indices from the MINITAB output is shown below. Students can
summarize the managerial benefits to Julie from studying these values. As noted in the case, the
MAPE drops to 12% when the seasonal indices along with the trend are used.
Finally, a 12-month forecast is generated using both the trend line and the seasonal
indices. The forecasts seem reasonable.
Month Forecast
Jan/2007 785348
Feb/2007 326276
Mar/2007 585307
Apr/2007 391827
May/2007 558299
Jun/2007 453257
Jul/2007 520615
Aug/2007 319029
Sep/2007 614997
Oct/2007 394599
Nov/2007 377580
Dec/2007 235312
Although more a management concern than a forecasting one, the attitude of Jackson
Tilson in the case might generate a discussion that ties the computer assisted forecasting process
into the real-life personalities of business associates. Although increasingly unlikely in the
business setting, there are still those whose backgrounds do not include familiarity with computer
based data analysis. Students whose careers will be spent in business might benefit from a
discussion of the human element in the management process.
CASE 5-7: SURTIDO COOKIES
1. Multiplicative Model
Data SurtidoSales
Length 41
NMissing 0
Yt = 907625 + 4736*t
Seasonal Indices
Month Index
1 0.696
2 0.546
3 0.517
4 0.678
5 0.658
6 0.615
7 0.716
8 0.567
9 1.527
10 1.664
11 1.988
12 1.829
Month Forecast
Jun/2003 680763
Jul/2003 795362
Aug/2003 633209
Sep/2003 1710846
Oct/2003 1872289
Nov/2003 2246745
Dec/2003 2076183
2. The linear trend in sales has a slight upward slope. The seasonal indices show that
cookie sales are relatively high the last four months of the year with a peak in
November and relatively low the rest of the year.
The multiplicative decomposition adequately accounts for the trend and seasonality
in the data. The forecasts are very reasonable. Jame should change his thinking
about the value of decomposition analysis.
Multiplicative Model
Yt = 955.6 + 4.02*t
Seasonal Indices
Month Index
1 0.972
2 1.039
3 0.943
4 0.884
5 1.039
6 0.935
7 1.043
8 1.033
9 0.995
10 1.007
11 1.091
12 1.019
Forecasts
There are significant residual autocorrelations. The residuals are far from random.
The forecasts may be reasonable given the last three fiscal years of data. However,
looking at the time series decomposition plot in 2, it is clear a decomposition analysis
is not able to describe the middle two or three fiscal years of data. For some
reason, visits for these fiscal years, in general, appear to be unusually high. A
decomposition analysis does not adequately describe Mary’s data and leaves her
perplexed.
CHAPTER 6
REGRESSION ANALYSIS
ANSWERS TO PROBLEMS AND CASES
1. Option b is inconsistent because the regression coefficient and the correlation coefficient
must have the same sign.
Analysis of Variance
Source DF SS MS F P
Regression 1 92432 92432 20.47 0.002
Residual Error 8 36121 4515
Total 9 128552
a. Yes, the regression is significant. Reject H0: β1 = 0 using either the t value 4.52
and its p value .002, or the F ratio 20.47 and its p value .002.
b. Y = 828 + 10.8X
Analysis of Variance
Source DF SS MS F P
Regression 1 25.622 25.622 115.52 0.000
Residual Error 8 1.774 0.222
Total 9 27.396
a. Yes, the regression is significant. Reject H0: β1 = 0 using either the t value 10.75
and its p value .000, or the F ratio 115.52 and its p value .000.
b. Y = .620 + .109X
\[ s_f = .471\sqrt{1 + \frac{1}{10} + \frac{(3 - 19.78)^2}{2148.9}} = .471\sqrt{1 + .1 + .131} = .471\sqrt{1.231} \]
sf = .471(1.110) = .523
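The standard error of the forecast can be verified numerically. The function and argument names below are illustrative; the inputs are the values used in the calculation above.

```python
import math

def forecast_std_error(s_yx, n, x_new, x_bar, sum_sq_x):
    """Standard error of the forecast for a simple linear regression."""
    return s_yx * math.sqrt(1 + 1 / n + (x_new - x_bar) ** 2 / sum_sq_x)

sf = forecast_std_error(s_yx=0.471, n=10, x_new=3, x_bar=19.78, sum_sq_x=2148.9)
print(round(sf, 3))
```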
Analysis of Variance
Source DF SS MS F P
Regression 1 634820 634820 50.96 0.000
Error 7 87197 12457
Total 8 722017
e. Reject H0: β1 = 0 at the 5% level since F = 50.96 and its p value = .000 < .05.
Could also use t = 7.14, the t value associated with the slope coefficient, and its
p value = .000. The correlation coefficient is significantly different from 0 since
the slope coefficient is significantly different from 0.
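In simple linear regression the F ratio equals the square of the slope's t value, which is why the two tests always agree. A short check using the ANOVA table above (variable names are illustrative):

```python
# Values from the Analysis of Variance table above.
ss_regression, df_regression = 634820, 1
ss_error, df_error = 87197, 7
ss_total = 722017
t_slope = 7.14  # t value reported for the slope coefficient

f_ratio = (ss_regression / df_regression) / (ss_error / df_error)
r_squared = ss_regression / ss_total

print(f"F = {f_ratio:.2f}, t squared = {t_slope ** 2:.2f}, r squared = {r_squared:.3f}")
```

The small gap between F and t squared reflects rounding of the printed t value.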
6. a, b and d.
The regression equation is
Books = 32.46 + 36.41 Feet (Positive linear relationship)
Analysis of Variance
Source DF SS MS F P
Regression 1 27032.3 27032.3 83.74 0.000
Error 9 2905.4 322.8
Total 10 29937.6
e. Reject H0: β1 = 0 at the 10% level since F = 83.74 and its p value = .000 < .10.
Could also use t = 9.15, the t value associated with the slope coefficient, and its
p value = .000. The correlation coefficient is significantly different from 0 since
the slope coefficient is significantly different from 0.
f. Based on the residuals versus the fitted values plot, there is no reason to
doubt the adequacy of the simple linear regression model.
g. Y = 32.46 + 36.41(4) = 178 books
7. a, b, c & d.
The regression equation is
Orders = 15.8 + 1.11 Catalogs (Fitted regression line)
Source DF SS MS F P
Regression 1 317.53 317.53 9.58 0.011
Residual Error 10 331.38 33.14
Total 11 648.92
New
Obs Fit SE Fit 90% CI 90% PI
1 26.98 1.93 (23.47, 30.48) (15.97, 37.98)
e. Do not reject H0: β1 = 0 at the 1% level since t = 3.10 and its p value = .011 > .01.
However, would reject H0: β1 = 0 at the, say, 5% level.
f. Do not reject H0: β1 = 0 at the 1% level since F = 9.58 and its p value = .011 > .01.
Result is consistent with the result in e as it should be.
g. See Fit and 90% PI at end of computer printout above. A 90% prediction interval
for mail orders when 10(000) catalogs are distributed is (16, 38)---16,000 to 38,000.
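The 90% confidence and prediction intervals in the printout can be rebuilt from the fit, its standard error, and the mean square error. This sketch takes the critical value 1.812 from a t table (10 degrees of freedom, 5% in each tail); variable names are illustrative.

```python
import math

fit = 26.98      # fitted value at Catalogs = 10
se_fit = 1.93    # standard error of the fit
mse = 33.14      # residual mean square from the ANOVA table
t_crit = 1.812   # t table value, 10 df, 90% two-sided

# The prediction interval adds the error variance to the variance of the fit.
se_pred = math.sqrt(mse + se_fit ** 2)

ci = (fit - t_crit * se_fit, fit + t_crit * se_fit)
pi = (fit - t_crit * se_pred, fit + t_crit * se_pred)
print(f"90% CI: ({ci[0]:.2f}, {ci[1]:.2f})")
print(f"90% PI: ({pi[0]:.2f}, {pi[1]:.2f})")
```

The results agree with the printout to within rounding of the inputs.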
Analysis of Variance
Source DF SS MS F P
Regression 1 978986 978986 7.69 0.024
Residual Error 8 1017824 127228
Total 9 1996810
a. There is a significant (at the 5% level) negative relationship between these variables.
Reject H0: β1 = 0 at the 5% level since t = -2.77 and its p value = .024 < .05.
b. The data set is small. Moreover, r2 = .49 so only 49% of the variation in investment
dollars is explained by interest rate. Finally, the last observation (6.2, 1420) has a
large influence on the location of the fitted straight line. If this observation is deleted,
there is a considerable change in the slope (and intercept) of the fitted line. Using the
original straight line equation for prediction is suspect.
c. A forecast can be calculated. It is 1865. However, the 95% prediction interval is wide.
Forecast unlikely to be useful without additional information. See comments in b.
d. See answer to b.
e. It seems reasonable to say movements in interest rate cause changes in the level
of investment.
9. a. The firms seem to be using very similar rationale since r = .959. Also, from the fitted
line plot below, notice the fitted line is not far from the 45° line through the origin (with
intercept 0 and slope 1).
b. If ABC bids 101, the predicted competitor’s bid is 101.212. A 95% prediction
interval (PI) is given below.
New
Obs Fit SE Fit 95% CI 95% PI
101 101.212 0.164 (100.872, 101.552) (99.637, 102.786)
c. Assume normally distributed errors about the population regression line and
treat the least squares line as if it were the population regression line (n is reasonably
large in this case). Then at ABC bid 101, possible competitor bids are normally
distributed about the fitted value 101.212 with a standard deviation estimated by
sy.x = .743. Consequently, the probability that ABC will have the low bid is
P(Z ≥ (101 − 101.212)/.743) = P(Z ≥ −.285) = .61.
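The normal probability in part c can be evaluated directly with the standard library. The sketch assumes, as the answer does, that competitor bids are normally distributed about 101.212 with standard deviation .743, and that ABC has the low bid whenever the competitor bids at least 101.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

abc_bid = 101
fitted_competitor_bid = 101.212
s_yx = 0.743

z = (abc_bid - fitted_competitor_bid) / s_yx
prob_competitor_at_least_abc = 1 - normal_cdf(z)  # P(Z >= z)
print(f"z = {z:.3f}, probability = {prob_competitor_at_least_abc:.2f}")
```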
10. a. Only if the sample size is large enough. The t statistic associated with the
slope coefficient or the F ratio should be consulted to determine if the population
regression line slope is significantly different from a horizontal line with zero
slope.
b. The regression equation is
Permits = 2217 - 145 Rate
Analysis of Variance
Source DF SS MS F P
Regression 1 559607 559607 26.88 0.001
Residual Error 7 145753 20822
Total 8 705360
c. Reject H0: β1 = 0 at the 5% level since t = -5.18 and its p value = .001 < .05.
d. If interest rate increases by 1%, on average the number of building permits will
decrease by 145.
f. Interest rate explains about 79% of the variation in number of building permits issued.
= .72, and so on. The population regression equation is Y = 0.948 + 0.00469X.
Any student who fails to find a meaningful relationship between X and Y will be the
victim of a Type II error.
Analysis of Variance
Source DF SS MS F P
Regression 1 14331 14331 231.77 0.000
Residual Error 11 680 62
Total 12 15011
c. Reject H0: β1 = 0 at the 5% level since t = 15.22 and its p value = .000 < .05.
d.
Residual Versus Fits plot shows curvature in scatter not captured by straight line fit.
e. Model with quadratic term in Batch Size fits well. Results with Size**2 as
predictor variable follow.
Analysis of Variance
Source DF SS MS F P
Regression 1 14951 14951 2727.00 0.000
Residual Error 11 60 5
Total 12 15011
f. Reject H0: β1 = 0 at the 5% level since t = 52.22 and its p value = .000 < .05.
h. Predicted Values for New Observations
New
Obs Fit SE Fit 95% CI 95% PI
1 95.411 1.173 (92.829, 97.993) (89.647, 101.175)
14. a.
c. r2 = .376. About 38% of the variation in market prices is explained by
assessed values (as predictor variable). There is a considerable amount of
unexplained variation.
Unusual Observations
c. F = 72.6, p value = .000 < .10. The regression is clearly significant at the
α = .10 level.
\[ t = \frac{1.30 - 2.0}{.153} = -4.58 \]
(p value = .000) suggests β1 = 2 is not supported by
the data. Appears that operating expenses have a fixed cost component
represented by the intercept b0 = 18.88, and are then about 1.3 times player costs.
f. Unusual Observations
Obs PlayCosts OpExpens Fit SE Fit Residual St Resid
7 18.0 60.00 42.31 1.64 17.69 3.45R
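The slope test against the hypothesized value β1 = 2 uses the usual t ratio with the hypothesized value subtracted from the estimate. A minimal sketch with the numbers from this problem (the function name is illustrative):

```python
def t_statistic(estimate, hypothesized, std_error):
    """t ratio for testing a coefficient against a hypothesized value."""
    return (estimate - hypothesized) / std_error

# Slope estimate 1.30, hypothesized slope 2.0, standard error .153.
t = t_statistic(1.30, 2.0, 0.153)
print(round(t, 2))
```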
Analysis of Variance
Source DF SS MS F P
Regression 1 10855642 10855642 16.15 0.001
Residual Error 21 14113925 672092
Total 22 24969567
Although the regression is significant, the residual versus fit plot indicates the
magnitudes of the residuals increase with the level. This behavior and the
scatter diagram in part a suggest that consumption is not evenly distributed about
the regression line. That is, the data have a megaphone-like appearance. A
straight line regression model for these data is not adequate.
c & d. The response variable is converted to the natural log of newsprint consumption
(LnConsum).
Analysis of Variance
Source DF SS MS F P
Regression 1 3.8252 3.8252 16.00 0.001
Residual Error 21 5.0209 0.2391
Total 22 8.8461
The regression is significant (F = 16, p value = .001) although only 43% of the
variation in ln(consumption) is explained by families. The residual plots
above suggest the straight line regression of ln(consumption) on families is
adequate. This simple linear regression model with ln(consumption) is better
than the same model with consumption as the response.
17. a. Can see from fitted line plot below that growth in number of steakhouses is
exponential, not linear.
b. The slope of a regression of ln(location) versus year is related to the annual
growth rate.
Analysis of Variance
Source DF SS MS F P
Regression 1 11.764 11.764 82.91 0.001
Residual Error 4 0.568 0.142
Total 5 12.332
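To show how the slope of the log regression relates to the annual growth rate, the sketch below fits a least squares line to the logs of a made-up series that grows exactly 25% per year and recovers the rate as exp(slope) − 1. The data are hypothetical, not the steakhouse counts, and the helper name is illustrative.

```python
import math

def least_squares_slope(x, y):
    """Slope of the simple least squares line of y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx

# Hypothetical series: 10 units growing 25% per year for six years.
years = list(range(1, 7))
locations = [10 * 1.25 ** t for t in years]

slope = least_squares_slope(years, [math.log(v) for v in locations])
annual_growth = math.exp(slope) - 1
print(f"slope = {slope:.4f}, implied annual growth = {annual_growth:.1%}")
```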
18. a. Can see from fitted line plot below that growth in number of copy centers is
exponential, not linear.
b. The slope of a regression of ln(centers) on time (year) is related to the annual
growth rate.
Analysis of Variance
Source DF SS MS F P
Regression 1 53.078 53.078 1476.38 0.000
Residual Error 12 0.431 0.036
Total 13 53.509
b. Cannot reject H0 at the 10% level since the t value associated with the slope
coefficient, –1.57, has a p value of .138 > .10. The regression is not significant.
There does not appear to be a relationship between profits per employee and
number of employees.
c. r2 = .15. Only 15% of the variation in profits per employee is explained by the
number of employees.
d. The regression is not significant. There is no point in using the fitted function to
generate forecasts for profits per employee for a given number of employees.
Analysis of Variance
Source DF SS MS F P
Regression 1 579.40 579.40 5.99 0.029
Residual Error 13 1258.40 96.80
Total 14 1837.80
The regression is now significant at the 5% level (t value = -2.45, p value = .029 < .05).
r2 has increased from 15% to 31.5%. These results suggest there is a linear
relationship between profits per employee and number of employees. A single
observation can have a large influence on the regression analysis, particularly when
the number of observations is relatively small. However, the relatively small r2 of 31.5%
indicates there will be a fair amount of uncertainty associated with any forecast of
profits per employee. Dun and Bradstreet should not be thrown out unless there is some
good (non-numerical) reason not to include this firm with the others.
Source DF SS MS F P
Regression 1 3833.4 3833.4 118.09 0.000
Residual Error 24 779.1 32.5
Total 25 4612.5
e. The plot of the residuals versus the fitted values has a megaphone-like appearance.
The residuals are numerically smaller for smaller projects than for larger projects.
Estimated costs are more accurate predictors of actual costs for inexpensive (smaller)
projects than for expensive (larger) projects.
22. a. The regression is significant (t value = 14.71, p value = .000).
in the rejection region for a two-sided test at any reasonable significance level.
The estimated slope coefficient, .968, is consistent with β1 = 1.
d. ln(24) = 3.178, so forecast of ln(actual cost) = .0026 + .968(3.178) = 3.079. Forecast
of actual cost is e^3.079 = 21.737.
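The back-transformation in part d can be checked as follows. The coefficients are the ones quoted in the answer; variable names are illustrative.

```python
import math

b0, b1 = 0.0026, 0.968  # fitted intercept and slope of the log-log regression
estimated_cost = 24

ln_forecast = b0 + b1 * math.log(estimated_cost)  # forecast on the log scale
cost_forecast = math.exp(ln_forecast)             # back to the original scale
print(f"ln forecast = {ln_forecast:.3f}, cost forecast = {cost_forecast:.2f}")
```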
This case asks students to summarize the analysis in a report to management. We find this a useful
exercise since it requires students to put the application and results of a statistical procedure into their
own words. If they are able to do this, they understand the technique.
This case illustrates the use of regression analysis in a situation where determining a good
regression equation is only the first step. The results must then be priced out in order to
arrive at a rational decision regarding a pricing policy. This situation can generate a discussion regarding
the general nature of quantitative techniques: they aid in the decision-making
process rather than replace it. Possible policies regarding the small-load charge can be
discussed after the cost of such loads is determined. One approach would be to take small loads
at company cost, which is low. The resultant goodwill might pay off in increased regular
business. Another would be to charge a low cost for small loads but only if the customer agrees to
book a certain number of large loads.
The low out-of-pocket cost involved in adding small loads can focus management attention
in other directions. Since no significant costs need to be recovered by the small load charge,
a policy based on other considerations is appropriate.
1. The 89 degree temperature is 24 degrees off ideal (89 - 65 = 24). This value is placed into
the regression equation yielding a forecast number of units per day of 338.
2. Once again, the temperature is 24 degrees from ideal (65 - 41 = 24). For X = 24, a forecast
of 338 units is calculated from the regression equation.
3. Since there is a fairly strong relationship between output and deviation from ideal
temperature (r = -.80), higher output may well result from efforts to control the
temperature in the work area so that it is close to 65 degrees. Gene should consider ways
to do this.
4. Gene has made a decent start towards finding an effective forecasting tool. However,
since about 36% of the variation in output is unexplained, he should look for additional
important predictor variables.
1. The correlation coefficient is: r = .927. The corresponding t = 8.9 for testing
H0: ρ = 0 has a p value of .000. We reject H0 and conclude the correlation between
days absent and employee age holds for the population.
2. Y = –4.28 + .254X
3. r2 = .859. About 86% of Y's (absent days) variability can be explained through
knowledge of X (employee age).
4. The null hypothesis H0: β1 = 0 is rejected using either t = 8.9, p value = .000 or the
F = 79.3 with p value = .000. There is a significant relation between absent days and
employee age.
5. Placing X = 24 into the prediction equation yields a Y forecast of 1.8 absent days per year.
6. If time and cost are not factors, it might be helpful to take a larger sample to see if these
small sample results hold. If results hold, a larger sample will very likely produce
more precise interval forecasts.
7. The fitted function is likely to produce useful forecasts, although 95% prediction
intervals can be fairly wide because of the small sample size.
1. After John uses simple regression analysis to forecast his monthly sales volume, he is
not satisfied with the results. The low r-squared value (56.3%) disappoints him.
The high seasonal variation should be discussed as a cause of his poor fit
when using only the month number to forecast sales. Using
dummy variables to account for the monthly effect is a possibility. After this topic
is covered in Chapter 7, you can have the students return to this case.
2. Not adequate.
3. The idea of serial correlation can be mentioned at this point. The possibility of
autocorrelated residuals can be introduced based on John's Durbin-Watson statistic.
In fact, the DW is low, indicating definite autocorrelation. A class discussion about
this problem and what might be done about it is useful. After this topic is covered
in Chapter 8, you can have the students return to this case. We hope that by this
time students appreciate the difficulties involved in real-life forecasting.
Compromises and multiple attempts are the norm, not the exception.
Clients = 32.7 + 0.00349 Stamps
Analysis of Variance
Source DF SS MS F P
Regression 1 5891.9 5891.9 10.51 0.002
Residual Error 46 25791.4 560.7
Total 47 31683.2
The correlation of Clients and Index = 0.752. The relation is significant (see below).
Analysis of Variance
Source DF SS MS F P
Regression 1 49993 49993 126.04 0.000
Residual Error 97 38475 397
Total 98 88468
2. The regression equation is Clients = - 199 + 2.94 BI
Note: Students might develop a new equation that leaves out the first three months of
data for 1993. This is a better way to determine whether the model works and the
results are:
Analysis of Variance
Source DF SS MS F P
Regression 1 43028 43028 107.52 0.000
Residual Error 94 37617 400
Total 95 80645
Regressing Clients on the reciprocal of Index produces a little better straight line fit.
The results for this transformed predictor variable follow.
Analysis of Variance
Source DF SS MS F P
Regression 1 45015 45015 118.76 0.000
Residual Error 94 35630 379
Total 95 80645
3. Actual Forecast Forecast Forecast(RecipIndex predictor)
4. Only if the business activity index could itself be forecasted accurately. Otherwise, it is
not a viable predictor because the values for the business activity index are not
available in a timely fashion.
6. If a good regression equation can be developed in which the changes in the predictor
variable lead the response, it might be possible to accurately forecast the rest of 1993.
However, if the regression equation is based on coincident changes in the predictor
variable and response, forecasts for the rest of 1993 could not be developed since values
for the predictor variable are not known in advance.
1. The four linear regression models are shown below. Both temperature and rainfall are
potential predictor variables.
2. & 3. Sixty-five degrees was subtracted from the temperature variable. The variable used
was the absolute value of the temperature with relative zero at 65 degrees Fahrenheit
labeled NewTemp.
The correlation coefficient between Calls and NewTemp is .724, indicating a fairly
strong positive linear relationship. However, examination of the fitted line plot below
suggests there is a curvilinear relation between Calls and NewTemp.
Analysis of Variance
Source DF SS MS F P
Regression 1 119870408 119870408 97.08 0.000
Residual Error 55 67910916 1234744
Total 56 187781324
CHAPTER 7
MULTIPLE REGRESSION
1. A good predictor variable is highly related to the dependent variable but not too
highly related to other predictor variables.
2. The population of Y values is normally distributed about E(Y), the plane formed by the
regression equation. The variance of the Y values around the regression plane is
constant. The residuals are independent of each other, implying a random sample. A linear
relationship exists between Y and each predictor variable.
3. The net regression coefficient measures the average change in the dependent variable per
unit change in the relevant independent variable, holding the other independent variables
constant.
5.
Y = 7.52 + 3(20) - 12.2(7) = -17.88
6. a. A correlation matrix displays the correlation coefficients between every
possible pair of variables in the analysis.
b. The entries in a correlation matrix reflected about the main diagonal are the
same. For example, r32 = r23.
f. Models that include variables 4 and 6 or variables 2 and 5 are possibilities. The
predictor variables in these models are related to the dependent variable and not
too highly related to each other.
g. Variable 5.
8. a. Correlations:
Time Amount
Amount 0.959
Items 0.876 0.923
Predictor Coef SE Coef T P VIF
Constant 0.4217 0.5864 0.72 0.483
Amount 0.08715 0.01611 5.41 0.000 6.756
Items -0.0386 0.1131 -0.34 0.737 6.756
Analysis of Variance
Source DF SS MS F P
Regression 2 128.988 64.494 87.71 0.000
Residual Error 15 11.030 0.735
Total 17 140.018
Amount and Items are highly collinear (correlation = .923, VIF = 6.756). Both
variables are not needed in the regression function. Deleting Items with the
non-significant t value gives the best regression below.
Analysis of Variance
Source DF SS MS F P
Regression 1 128.90 128.90 185.54 0.000
Residual Error 16 11.12 0.69
Total 17 140.02
b. From the Full Model, checkout time decreases by .039 for each additional item
(holding Amount constant), which does not make sense.
g. Using the best model, the 95% prediction interval (interval forecast) for
Amount = $70 is given below.
New
Obs Fit SE Fit 95% CI 95% PI
1 6.008 0.238 (5.504, 6.512) (4.171, 7.845)
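With only two predictors, the VIF reported in the full model printout follows directly from the correlation between them: VIF = 1/(1 − r²). A quick check using the Amount and Items correlation of .923 (the function name is illustrative):

```python
def vif_two_predictors(r):
    """Variance inflation factor when the model has exactly two predictors."""
    return 1 / (1 - r ** 2)

vif = vif_two_predictors(0.923)
print(round(vif, 2))
```

The result differs from the printed 6.756 only in the third decimal, because the correlation is rounded to three places.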
Food Income
Income 0.884
Size 0.737 0.867
When income is increased by one thousand dollars holding family size constant, the
average increase in annual food expenditures is 228 dollars. When family size is
increased by one person holding income constant, the average decrease in annual
food expenditures is 41 dollars. Since family size is positively related to food
expenditures, r = .737, it doesn’t make sense that a decrease in expenditures
would occur.
dropped from the regression function and the analysis redone with only Income
as the predictor variable.
10. a. Both high temperature and traffic count are positively related to number of six-
packs sold and have potential as good predictor variables. There is some collinearity
(r = .68) between the predictor variables but perhaps not enough to limit their
value.
$t = \frac{b_1}{s_{b_1}} = \frac{.78207}{.22694} = 3.45$
Reject H0 because 3.45 > 2.898 and conclude that the regression coefficient for
the high temperature variable is unequal to zero in the population.
$t = \frac{b_2}{s_{b_2}} = \frac{.06795}{.02026} = 3.35$
Reject H0 because 3.35 > 2.898 and conclude that the regression coefficient for
the traffic count variable is unequal to zero in the population.
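The two tests above follow the same pattern: divide the estimated coefficient by its standard error and compare the result with the critical value. A minimal Python sketch of that calculation, using the coefficients, standard errors, and critical value of 2.898 quoted in the solution (the function name `t_ratio` is illustrative):

```python
def t_ratio(coefficient, standard_error):
    """t statistic for testing H0: the population coefficient equals zero."""
    return coefficient / standard_error

CRITICAL_T = 2.898  # critical value quoted in the solution

t_temp = t_ratio(0.78207, 0.22694)
t_traffic = t_ratio(0.06795, 0.02026)

for name, t in [("high temperature", t_temp), ("traffic count", t_traffic)]:
    decision = "reject H0" if abs(t) > CRITICAL_T else "do not reject H0"
    print(f"{name}: t = {t:.2f}, {decision}")
```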
d. $R^2 = 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2} = 1 - \frac{2727.9}{14316.9} = .81$
We are able to explain 81% of the number of six-packs sold variation using
knowledge of daily high temperature and daily traffic count.
e. $s_{y \cdot x's} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - k - 1}} = \sqrt{\frac{2727.9}{20 - 3}} = \sqrt{160.46} = 12.67$
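Parts d and e use the same two sums of squares, so both summary measures can be computed together. The Python sketch below assumes n = 20 observations and k = 2 predictor variables, which is inferred from the divisor 20 - 3 = 17 in part e; the function name `fit_summary` is illustrative.

```python
import math

def fit_summary(sse, sst, n, k):
    """Return (R-squared, standard error of the estimate) for k predictors."""
    r_squared = 1.0 - sse / sst
    std_error = math.sqrt(sse / (n - k - 1))
    return r_squared, std_error

# n = 20 observations and k = 2 predictors, so n - k - 1 = 17.
r2, s = fit_summary(sse=2727.9, sst=14316.9, n=20, k=2)
print(f"R-sq = {r2:.2f}, s = {s:.2f}")
```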
f. If there is an increase of one degree in high temperature while the traffic count
is held constant, beer sales increase by an average of .78 six-packs.
g. The predictor variables explain 81% of the variation in six-packs sold. Both
predictor variables are significant. It would be prudent to examine the residuals (not
available in the problem) before deciding to use the fitted regression function for
forecasting however.
11. a. Scatter diagram follows. Female drivers are indicated by solid circles, male drivers by
diamonds.
d.
The line falls “between” the points representing female drivers and the points
representing male drivers. The straight line equation over-predicts mileage for
male drivers and under-predicts mileage for female drivers. It is important to include
a gender variable in this regression function.
12. a. Correlations: Sales, Outlets, Auto
         Sales   Outlets
Outlets  0.739
Auto     0.548   0.670
Number of retail outlets is positively related to annual sales, r12 = .74, and is
potentially a good predictor variable. Number of automobiles registered is
moderately related to annual sales, r13 = .55, and is positively correlated with
number of retail outlets, r23 = .67. Given number of retail outlets in the
regression function, number of automobiles registered may not be required.
Analysis of Variance
Source DF SS MS F P
Regression 2 1043.7 521.8 4.91 0.041
Residual Error 8 849.6 106.2
Total 10 1893.2
Predicted Values for New Observations
New
Obs Fit SE Fit 95% CI 95% PI
1 37.00 7.15 (20.50, 53.49) (8.07, 65.93)
As can be seen from the regression output, it appears as if each predictor variable is
not significant (at the 5% level); however, the regression is significant at the 5%
level. This is one of the things that can happen when the predictor variables are collinear.
The forecast for region 1 is 37 with a prediction error of 52.3 – 37 = 15.3. However,
it is not a good idea to use this fitted function for forecasting. If the regression is rerun
after deleting Auto, Outlets (and the regression) is significant at the 1% level and
d. The standard error of estimate is 10.3 which is quite large. As explained in part b,
the fitted function with both predictor variables should not be used to forecast.
Even if the regression is rerun with the single predictor Outlets, R2 = 55% and
the relatively large standard error of the estimate suggest there will be a lot of
uncertainty associated with any forecast.
e. $s_{y \cdot x's} = \sqrt{\frac{\sum (Y - \hat{Y})^2}{n - k - 1}} = \sqrt{\frac{849.6}{11 - 3}} = \sqrt{106.2} = 10.3$
f. If one retail outlet is added while the number of automobiles registered remains
constant, sales will increase by an average of .011 million or $11,000. If
one million more automobiles are registered while the number of retail outlets
remains constant, sales will increase by an average of .195 million or $195,000.
However, these regression coefficients are suspect due to collinearity
between the predictor variables.
g. New predictor variables should be tried.
Analysis of Variance
Source DF SS MS F P
Regression 3 1843.40 614.47 86.32 0.000
Residual Error 7 49.83 7.12
Total 10 1893.23
New
Obs Fit SE Fit 95% CI 95% PI
1 27.306 1.878 (22.865, 31.746) (19.591, 35.020)
New
Obs Outlets Auto Income
1 2500 20.2 40.0
c. The standard error of estimate has been reduced to 2.67 from 10.3 and R2 has increased
to 97%. The 95% PI in part b is fairly narrow. The forecast for region 12 sales in
part b should be accurate.
d. The best choice is to drop Outlets from the regression function. If this is done,
the regression equation is
Sales = - 4.03 + 0.621 Auto + 0.430 Income
b. If the effort index increases one point while aptitude test score remains constant,
sales performance increases by an average of $20,600.
c. Y = 16.57 + .65(75) + 20.6(.5) = 75.62
f. $R^2 = 1 - \frac{\sum (Y - \hat{Y})^2}{\sum (Y - \bar{Y})^2} = 1 - \frac{139.4}{3569.3} = 1 - .039 = .961$
15. a. Scatter plot for cash purchases versus number of items (rectangles) and credit card
purchases versus number of items (solid circles) follows.
Notice that for a given number of items, sales from cash purchases are estimated to
be about $18.60 less than gross sales from credit card purchases.
c. The regression in part b is significant. The number of items sold and whether
the purchases were cash or credit card explains approximately 83% of the
variation in gross sales. The predictor variable Items is clearly significant. The
coefficient of the dummy variable X2 is significantly different from 0 at the
10% level but not at the 5% level. From the residual plots below we see that
there are a few large residuals (see, in particular, cash sales for day 25 and credit
card sales for day 1); but overall, plots do not indicate any serious departures
from the usual regression assumptions.
d. Y = 13.61 + 5.99(25) – 18.6(1) = $145
f. Fitted function in part b is effectively two parallel straight lines given by the
equations:
Cash purchases: Y = 13.61 + 5.99Items – 18.6(1) = -4.98 + 5.99Items
Credit card purchases: Y = 13.61 + 5.99Items
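The two parallel lines can be evaluated with one function that takes the dummy variable as an argument. The Python sketch below uses the rounded coefficients printed above, with the dummy coded 1 for cash purchases and 0 for credit card purchases as the equations imply; the function name `predicted_sales` is illustrative.

```python
def predicted_sales(items, cash):
    """Fitted gross sales; cash = 1 for cash purchases, 0 for credit card."""
    return 13.61 + 5.99 * items - 18.6 * cash

cash_fit = predicted_sales(25, cash=1)
credit_fit = predicted_sales(25, cash=0)
print(f"cash: {cash_fit:.2f}, credit card: {credit_fit:.2f}, "
      f"gap: {credit_fit - cash_fit:.2f}")
```

For any number of items the gap between the two fitted values equals the dummy variable coefficient, which is what makes the lines parallel.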
SO 0.049 -0.393
BA 0.446 0.015 -0.007
RUNS 0.627 0.279 -0.209 0.645
HR 0.209 0.490 -0.215 0.154 0.664
SB 0.190 -0.404 -0.062 -0.207 -0.162 -0.305
b. The stepwise results are the same for an alpha to enter = alpha to remove = .05 or
.15 (the Minitab default) or F to remove = F to enter =4.
Step 1 2
Constant 20.40 71.23
ERA -18.0
T-Value -9.52
P-Value 0.000
S 7.72 3.55
R-Sq 39.28 87.72
The fitted function from the stepwise program is:
17. a. View will enter the stepwise regression function first since it has the largest
correlation with Price. After that the order of entry is difficult to determine from
the correlation matrix alone. Several of the predictor variable pairs are fairly highly
correlated so multicollinearity could be a problem. For example, once View is in the
model, Elevation may not enter (be significant). Slope and Area are correlated so
it may be that only one of these predictors is required.
18. a., b., & c. The regression results follow.
Analysis of Variance
Source DF SS MS F P
Regression 3 3071.1 1023.7 5.29 0.010
Residual Error 16 3096.7 193.5
Total 19 6167.8
Unusual Observations
New
Obs Fit SE Fit 95% CI 95% PI
1 80.88 3.36 (73.77, 88.00) (50.55, 111.22)
F = 5.29 with a p value = .010, so the regression is significant at the 1% level.
The predicted final exam score for within term exam scores of 86 and 77 and a
GPA of 3.4 is Yˆ = 81
The variance inflation factors (VIF’s) are all small (near 1); however, the t ratios and
corresponding p values suggest that each of the predictor variables could be dropped
from the regression equation. Since the F ratio was significant, we conclude that
multicollinearity is a problem.
d. Mean leverage = (3+1)/20= .20. None of the observations are high leverage points.
e. From the regression output above, observation 20 has a large standardized residual.
The fitted model over-predicts the response (final exam score) for this student.
19. Stepwise regression results, with significance level .05 to enter and leave the
regression function, follow.
Alpha-to-Enter: 0.05 Alpha-to-Remove: 0.05
Step 1
Constant -26.24
X3 31.4
T-Value 3.30
P-Value 0.004
S 14.6
R-Sq 37.71
R-Sq(adj) 34.25
The “best” regression model relates final exam score to the single predictor
variable grade point average.
All possible regression results are summarized in the following table.
Predictor R2
Variables
X1 .295
X2 .301
X3 .377
X1, X2 .404
X1, X3 .452
X2, X3 .460
X1, X2, X3 .498
The R2 criterion would suggest using all three predictor variables. However, the
20. Best three predictor variable model selected by stepwise regression follows.
Coefficient on education is negative. Everything else equal, as education level
increases, compensation decreases. Positive coefficient on lnsales implies as sales
increase, compensation increases, everything else equal. Finally, for fixed
education and sales, as percent ownership increases, compensation decreases.
Unusual Observations
Obs Educate LnComp Fit SE Fit Residual St Resid
31 2.00 6.5338 5.9055 0.4386 0.6283 2.73RX
33 0.00 6.3969 7.0645 0.2624 -0.6676 -1.59 X
All in all, this k = 3 predictor model appears to be better than the k = 2 predictor
model of Example 7.12.
Predictor Coef SE Coef T P VIF
Constant 7.608 8.503 0.89 0.401
Accounts -0.00457 0.02378 -0.19 0.853 25.965
Accounts**2 0.00003361 0.00000893 3.76 0.007 25.965
Analysis of Variance
Source DF SS MS F P
Regression 2 51130 25565 165.95 0.000
Residual Error 7 1078 154
Total 9 52208
The coefficient of Accounts changes from the quadratic model to the straight
line model because, not surprisingly, Accounts and Accounts**2 are highly
collinear (VIF = 25.965 in the quadratic model).
Source DF SS MS F P
Regression 2 2777.0 1388.5 32.57 0.000
Residual Error 12 511.6 42.6
Total 14 3288.7
23. Using the final model from problem 22 with H2S = 7.3 and Lactic = 1.85
New
Obs Fit SE Fit 95% CI 95% PI
1 32.36 3.02 (25.78, 38.95) (16.69, 48.04)
Since $s_{y \cdot x's} = 6.53$ and $t_{.025} = 2.179$, a large sample 95% prediction interval is:
Notice the large sample 95% prediction interval is not too much different than the
actual 95% prediction interval (PI) above.
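The large sample interval itself is not reproduced in this text. As a rough sketch, assuming the usual large sample form (fit plus or minus t times the standard error of the estimate) and the values quoted above, it can be computed as follows; the function name `large_sample_pi` is hypothetical.

```python
def large_sample_pi(fit, t_value, std_error):
    """Approximate prediction interval: fit +/- t * s (ignores SE of the fit)."""
    half_width = t_value * std_error
    return fit - half_width, fit + half_width

# Fit, t value, and standard error of the estimate as quoted in the solution.
lower, upper = large_sample_pi(fit=32.36, t_value=2.179, std_error=6.53)
print(f"({lower:.2f}, {upper:.2f})")
```

Because this form ignores the uncertainty in the fitted value, it is slightly narrower than the Minitab prediction interval.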
Although the fit in this case is relatively good, the standard error of the estimate is
somewhat large, so there is a fair amount of uncertainty associated with any forecast.
It may be a good idea to collect more data and, perhaps, investigate additional
predictor variables.
Step 1
Constant 2.928
TotRev 1.96
T-Value 11.94
P-Value 0.000
S 13.7
R-Sq 85.59
R-Sq(adj) 84.99
Results from stepwise program are not surprising given the definitions of the
variables and the strong (and in some cases perfect) multicollinearity.
c. The coefficient of TotRev from the stepwise program is 1.96 and the constant
is relatively small and, in fact, insignificant. Consequently, Franchise Value is,
on average, about twice Total Revenue.
d. The regression equation is
OpExpens = 18.9 + 1.30 PlayerCt
Source DF SS MS F P
Regression 1 2101.7 2101.7 72.56 0.000
Residual Error 24 695.2 29.0
Total 25 2796.9
Unusual Observations
The linear relation between Operating expenses and Player costs is fairly strong.
About 75% of the variation in Operating expenses is explained by Player costs.
The actual data for this case are supplied in Appendix A. Students can either be asked to
respond to the question at the end of the case or be assigned to run and analyze the data.
One approach that I have used successfully is to assign one group of students the role of asking
Judy Johnson's questions and another group the responsibility for Ron's answers.
1. What questions do you think Judy will have for Ron? The students always seem
to come up with questions that Ms. Johnson will ask. The key is that Ron should be able
to answer them. Possible issues include:
Are all the predictor variables in the final model required? Is a simpler model
with fewer predictor variables feasible?
Do the estimated regression coefficients in the final model make sense and are
they reliable?
Four observations have large standardized residuals. Is this a cause for concern?
Is the final model a good one and can it be confidently used to forecast the
utility’s bond interest rate at the time of issuance?
Is multiple regression the appropriate statistical method to use for this situation?
1. The multiple regression model that includes both unemployment rate and average
monthly temperature is shown below. Temperature is the only good predictor variable.
2. Yes.
Analysis of Variance
Source DF SS MS F P
Regression 2 120430208 60215104 48.28 0.000
Residual Error 54 67351116 1247243
Total 56 187781324
The regression is significant. The signs on the coefficients of the independent variables
make sense. The coefficient of each independent variable is significantly different
from 0 (t = –4.6, p value = .000 and t = 4.4, p value = .000, respectively).
Analysis of Variance
Source DF SS MS F P
Regression 3 140771801 46923934 52.90 0.000
Residual Error 53 47009523 886972
Total 56 187781324
Unusual Observations
The residual plots follow. There is no significant residual autocorrelation at
any lag.
1. The regression is significant. The R2 of 78.1% looks good. The t statistic for each
of the predictor variables is large with a very small p-value. The VIF’s are relatively
small for the three predictors indicating that multicollinearity is not a problem. The
residual plots shown in Figure 7-4 indicate that this model is valid. Dr. Hanke has
developed a good model to forecast ERA.
2. The matrix plot below of ERA versus each of five potential predictor variables does
not show any obvious nonlinear relationships. There does not appear to be any
reason to develop a new model.
3. The regression results with WHIP replacing OBA as a predictor variable follow.
The residual plots are very similar to those in Figure 7-4.
Analysis of Variance
Source DF SS MS F P
Regression 3 91.167 30.389 157.48 0.000
Residual Error 134 25.859 0.193
Total 137 117.026
The fit and the adequacy of this model are virtually indistinguishable from the
corresponding model with OBA instead of WHIP as a predictor. The estimated
coefficients of CMD and HR/9 are nearly the same in both models. Both models are
good. The original model with OBA as a predictor has a slightly higher R2 and a
slightly smaller standard error of the estimate. Using these criteria, it is the preferred
model.
CASE 7-4: FANTASY BASEBALL (B)
The project may not be doomed to failure. A lot can be learned from investigating the
influence of the various independent variables on WINS. However, the best regression model
does not explain a large percentage of the variation in WINS, R2 = 34%, so the experts have
a point. There will be a lot of uncertainty associated with any forecast of WINS. The stepwise
selection of the best predictor variables and the subsequent full regression output follow.
Step 1 2
Constant 20.531 5.543
RUNS 0.0182
T-Value 3.86
P-Value 0.000
S 3.33 3.17
R-Sq 26.51 33.83
R-Sq(adj) 25.97 32.85
Analysis of Variance
Source DF SS MS F P
Regression 2 695.31 347.66 34.51 0.000
Residual Error 135 1360.17 10.08
Total 137 2055.48
CHAPTER 8
ANSWERS TO PROBLEMS AND CASES
1. If not properly accounted for, serial correlation can lead to false inferences under the
usual regression assumptions. Regressions can be judged significant when, in fact,
they are not, coefficient standard errors can be under (or over) estimated so individual
terms in the regression function may be judged significant (or insignificant) when they
are not (or are) and so forth.
2. Serial correlation often arises naturally in time series data. Series, like employment,
whose magnitudes are naturally related to the seasons of the year will be autocorrelated.
Series, like sales, that arise because of a consistently applied mechanism, like advertising
or effort, will be related from one period to the next (serially correlated). In the analysis
of time series data, autocorrelated residuals arise because of a model specification error
or incorrect functional form—the autocorrelation in the series is not properly accounted
for.
4. Durbin-Watson statistic
5. Reject H0 if DW < 1.10. Since 1.0 < 1.10, reject and conclude that the errors are
positively autocorrelated.
6. Reject H0 if DW < 1.55. Do not reject H0 if DW > 1.62. Since 1.6 falls between 1.55
and 1.62, the test is inconclusive.
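The decision rule used in problems 5 and 6 can be written out explicitly. The Python sketch below computes the Durbin-Watson statistic from a list of residuals and applies the one-sided test for positive autocorrelation with lower and upper bounds taken from a table. The function names are illustrative, and the residuals in the last line are made up purely to exercise the formula.

```python
def durbin_watson(residuals):
    """DW = sum of squared successive differences / sum of squared residuals."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

def dw_decision(dw, d_lower, d_upper):
    """One-sided test of H0: no autocorrelation vs. positive autocorrelation."""
    if dw < d_lower:
        return "reject H0"
    if dw > d_upper:
        return "do not reject H0"
    return "inconclusive"

print(dw_decision(1.6, d_lower=1.55, d_upper=1.62))   # bounds from problem 6
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))           # made-up residuals
```

Residuals that alternate in sign push the statistic above 2, while runs of same-signed residuals (positive autocorrelation) pull it toward 0.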
8. A predictor variable is generated by using the Y variable lagged one or more periods.
S = 2.29032 R-Sq = 76.6% R-Sq(adj) = 73.0%
Analysis of Variance
Source DF SS MS F P
Regression 2 223.39 111.69 21.29 0.000
Residual Error 13 68.19 5.25
Total 15 291.58
Using the .05 significance level for a sample size of 16 with 2 predictor variables,
dL = .98. Since DW = .61 < .98, reject H0 and conclude the observations are positively
serially correlated.
Analysis of Variance
Source DF SS MS F P
Regression 3 2.20854E+11 73617995859 15.02 0.000
Residual Error 10 49008480079 4900848008
Total 13 2.69862E+11
With n = 14, k =3 and α = .05, DW = 1.14 gives an indeterminate test for serial
correlation.
11. Serial correlation is not a problem. However, it is interesting to see whether the students
realize that collinearity is a likely problem since Customer and Charge are highly correlated.
Correlation matrix:
The regression equation is
Revenue = - 65.6 + 0.00173 Use + 29.5 Charge + 0.000197 Customer
Analysis of Variance
Source DF SS MS F P
Regression 3 77037 25679 539.30 0.000
Residual Error 24 1143 48
Total 27 78180
Analysis of Variance
Source DF SS MS F P
Regression 2 76938 38469 774.66 0.000
Residual Error 25 1241 50
Total 27 78180
Share Earnings Dividend
Earnings 0.565
Dividend 0.719 0.712
Payout 0.435 -0.049 0.662
The best model, after taking account of the initial multicollinearity, uses the predictor
variables Earnings and Payout (ratio).
Analysis of Variance
Source DF SS MS F P
Regression 2 440912859 220456429 14.33 0.000
Residual Error 25 384584454 15383378
Total 27 825497313
b. With n = 28, k = 2 and α = .01, DW = .29 < dL = 1.04 so there is strong evidence of
positive serial correlation.
13. a.
b. No. The residual autocorrelation function for the residuals from the straight line fit
indicates significant positive autocorrelation. The independent errors assumption
is not viable.
c. The fitted line plot with the natural logarithms of Passengers as the dependent variable
and the residual autocorrelation function follow.
The residual autocorrelation function looks a little better than that in part b,
but there is still significant positive autocorrelation at lag 1.
d. Exponential trend plot for Passengers follows along with residual autocorrelation
function.
Still some residual autocorrelation. Errors are not independent.
e. Models in parts c and d are equivalent. If you take the natural logarithms of
fitted exponential growth model you get the fitted model in part c.
f. As we have pointed out, the errors for either of the models in parts c and d are
not independent. Using a model that assumes the errors are independent can
lead to inaccurate forecasts and, in this case, unwarranted precision.
g. Using the exponential growth model with t = 26 gives $\hat{Y}_{2007} = 195$.
14. a. The best model lags permits by 2 quarters (Lg2Permits):
Sales = 20.2 + 9.23 Lg2Permits
Forecasts for the 3rd and 4th quarters can be done using several different
approaches. This is best left to the student with a discussion of why they
used a particular method. One method is to average the past values
of Permits for the 1st and 2nd quarters and use these averages in the model.
This will result in forecasts: 3rd quarter 514; 4th quarter 235.
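Building the lagged predictor is the key data step in this problem. The Python sketch below shows one way to construct a two-period lag and plug it into the fitted equation Sales = 20.2 + 9.23 Lg2Permits. The permit values are made up for illustration and are not the problem's data, and the function names are hypothetical.

```python
def lag(series, periods):
    """Shift a series back by `periods`; early values have no lagged partner."""
    return [None] * periods + list(series[:len(series) - periods])

def forecast_sales(lagged_permits):
    """Fitted equation from part a: Sales = 20.2 + 9.23 Lg2Permits."""
    return 20.2 + 9.23 * lagged_permits

permits = [10, 12, 15, 11, 9]          # made-up values, for illustration only
lg2_permits = lag(permits, 2)
print(lg2_permits)
print(forecast_sales(lg2_permits[-1]))
```

Because the predictor is lagged two quarters, forecasts one and two quarters ahead need no future values of Permits; longer horizons require estimates of Permits, as the solution notes.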
15.
Quarter Sales S2 S3 S4
1 16.3 0 0 0
2 17.7 1 0 0
3 28.1 0 1 0
4 34.3 0 0 1
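The coding in the table above uses the first quarter as the base period, so each dummy variable is 1 only in its own quarter. A minimal Python sketch of that coding (the function name `seasonal_dummies` is illustrative):

```python
def seasonal_dummies(quarter):
    """Return (S2, S3, S4); quarter 1 is the base period with all zeros."""
    if quarter not in (1, 2, 3, 4):
        raise ValueError("quarter must be 1, 2, 3, or 4")
    return tuple(1 if quarter == q else 0 for q in (2, 3, 4))

for quarter in (1, 2, 3, 4):
    print(quarter, seasonal_dummies(quarter))
```

Only three dummy variables are used for four quarters; adding a fourth alongside the constant would make the predictors perfectly collinear.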
Constant 19.292 2.074 9.30 0.000
S2 -1.425 2.933 -0.49 0.630
S3 11.163 2.999 3.72 0.001
S4 33.254 2.999 11.09 0.000
Analysis of Variance
Source DF SS MS F P
Regression 3 8726.5 2908.8 56.36 0.000
Residual Error 42 2167.6 51.6
Total 45 10894.1
The regression is significant. The model explains 80.1% of the variation in Sales.
There is no lag 1 autocorrelation but a significant residual autocorrelation at lag 4.
c. $\hat{\rho} = .585$. Calculate the generalized differences $Y_t' = Y_t - .585Y_{t-1}$ and
$X_t' = X_t - .585X_{t-1}$, and fit the model given in equation (8.5). The result
is $\hat{Y}_t' = -2.31 + 2.81X_t'$ with Durbin-Watson statistic = 1.74. In this case, the
estimate of $\beta_1$, $\hat{\beta}_1 = 2.81$, is nearly the same as the estimate of $\beta_1$ in part a.
Here the autocorrelation in the data is not strong enough to have much effect
on the least squares estimate of the slope coefficient.
d. The standard error of β̂1 is smaller in the initial regression than it is in the
regression involving generalized differences. The standard error in the initial
regression is underestimated because of the positive serial correlation. The
standard error in the regression with generalized differences, although larger,
is the one to be trusted.
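The generalized differencing step in part c is easy to carry out directly. The Python sketch below applies the transformation with the estimate .585 and then fits a least squares line to the transformed series. The data are made up (they satisfy Y = 1 + 2X exactly) and are not the problem's data; the function names are illustrative.

```python
def generalized_differences(series, rho):
    """Return series[t] - rho * series[t-1] for t = 1, ..., n-1."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]

def least_squares_line(x, y):
    """Intercept and slope of the simple least squares line of y on x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

RHO_HAT = 0.585                      # estimate quoted in part c
x = [1.0, 2.0, 4.0, 7.0, 11.0]       # made-up data, for illustration only
y = [3.0, 5.0, 9.0, 15.0, 23.0]      # exactly y = 1 + 2x

x_prime = generalized_differences(x, RHO_HAT)
y_prime = generalized_differences(y, RHO_HAT)
intercept, slope = least_squares_line(x_prime, y_prime)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

In this noise-free example the slope is unchanged by the transformation while the intercept is scaled by (1 - .585), which is why the slope estimates in parts a and c can be compared directly but the intercepts cannot.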
Analysis of Variance
Source DF SS MS F P
Regression 1 1164598 1164598 20.27 0.000
Residual Error 18 1034389 57466
Total 19 2198987
Analysis of Variance
Source DF SS MS F P
Regression 1 430.0 430.0 4.23 0.054 ← (1) Regression is not significant at .01 level
Residual Error 18 1829.0 101.6
Total 19 2259.0
Analysis of Variance
Source DF SS MS F P
Regression 2 1909.94 954.97 46.51 0.000
Residual Error 17 349.06 20.53
Total 19 2259.00
Using all the usual criteria for judging the adequacy of a regression model, this model
is much better than the simple linear regression model in part a.
19. a.
The data are clearly seasonal with fourth quarter sales large and sales for the
remaining quarters relatively small. Seasonality is confirmed by the
autocorrelation function with significant autocorrelation at the seasonal
lag 4.
Analysis of Variance
Source DF SS MS F P
Regression 1 4767638 4767638 84.32 0.000
Residual Error 22 1243890 56540
Total 23 6011528
Forecasts are not bad but they are below the Value Line estimates for the
last 3 quarters and the difference becomes increasingly larger.
e. Value Line estimates for the last 3 quarters of 2003-04 seem increasingly optimistic.
Step 1 2
Constant 28.86 37.72
ChickPrice -0.29
T-Value -2.34
P-Value 0.030
S 2.58 2.34
R-Sq 84.98 88.21
R-Sq(adj) 84.27 87.03
c. There is high multicollinearity among the predictor variables so the final model
depends on which non-significant predictor variable is deleted first. If BeefPrice is
deleted, the final model is the one selected by stepwise regression (using a .05 level
for determining significance of individual terms) with significant lag 1 residual
autocorrelation. If Income is deleted first, then the final model involves the three
Price predictor variables as shown below. There is no significant residual
autocorrelation but large VIFs, although the coefficients of the predictor variables
have the right signs. In this data set, Income is essentially a proxy for the three
price variables.
Analysis of Variance
Source DF SS MS F P
Regression 3 844.44 281.48 63.08 0.000
Residual Error 19 84.78 4.46
Total 22 929.22
Step 1 2
Constant 1.729 2.375
LnChickP -0.445
T-Value -6.06
P-Value 0.000
S 0.0528 0.0321
R-Sq 90.71 96.72
R-Sq(adj) 90.27 96.40
Analysis of Variance
Source DF SS MS F P
Regression 2 0.61001 0.30500 295.30 0.000
Residual Error 20 0.02066 0.00103
Total 22 0.63067
Analysis of Variance
Source DF SS MS F P
Regression 2 8.039 4.020 2.72 0.091
Residual Error 19 28.033 1.475
Total 21 36.073
Very little explanatory power in the predictor variables. If the non-significant DiffIncome
is dropped from the model, the resulting regression is significant at the .05 level, R2 is
virtually unchanged and the standard error of the estimate decreases slightly. The residual
plots look good and there is no evidence of autocorrelation. With the very low R2, the fitted
function is not useful for forecasting the change (difference) in chicken consumption.
Analysis of Variance
Source DF SS MS F P
Regression 1 769.45 769.45 432.71 0.000
Residual Error 20 35.56 1.78
Total 21 805.01
Fitted regression function implies this year’s chicken consumption is likely to be
a very good predictor of next year’s chicken consumption. The coefficient on
lagged chicken consumption (LagChickC) is almost 1. The intercept is not significant.
Chicken consumption is essentially a “random walk”—next year’s chicken consumption is
this year’s chicken consumption plus a random amount with mean 0. The residual
plots look good and there is no residual autocorrelation.
We cannot infer the effect of a change in chicken price on chicken consumption with
this model since chicken price does not appear as a predictor variable.
24.
$Y_t - Y_{t-1} = X_t - X_{t-1} + \varepsilon_t - \varepsilon_{t-1} = \nu_t + \varepsilon_t - \varepsilon_{t-1} = \eta_t$, say
$X_t - X_{t-1} = \nu_t$
Here the independent error $\eta_t$ has mean 0 and variance $3\sigma^2$. So the first differences for
both $Y_t$ and $X_t$ are stationary and X and Y are cointegrated of order 1. The cointegrating
linear combination is: $Y_t - X_t = \varepsilon_t$.
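A small simulation can make the cointegration argument concrete. The Python sketch below assumes the model implied by the equations above (X is a random walk driven by the error ν, and Y equals X plus the error ε) and, purely as an illustrative assumption, draws both errors from a standard normal distribution. It then confirms that the combination Y minus X recovers ε, so the combination is stationary even though X and Y individually wander.

```python
import random

random.seed(0)
n = 200
x, y, eps = [0.0], [], []

# X is a random walk: X_t = X_{t-1} + nu_t
for _ in range(n - 1):
    x.append(x[-1] + random.gauss(0.0, 1.0))

# Y_t = X_t + eps_t, so Y inherits the random-walk behavior of X
for x_t in x:
    e_t = random.gauss(0.0, 1.0)
    eps.append(e_t)
    y.append(x_t + e_t)

# The cointegrating combination Y_t - X_t should reproduce eps_t.
combination = [y_t - x_t for y_t, x_t in zip(y, x)]
max_gap = max(abs(c - e) for c, e in zip(combination, eps))
print(f"largest |(Y_t - X_t) - eps_t| = {max_gap:.2e}")
```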
2. Would it have been better to eliminate multicollinearity first and then tackle
autocorrelation?
Answer: No. In order to solve the autocorrelation problem, the nature of the data was
changed (first differenced). If multicollinearity were solved first, one or more important
variables may have been eliminated. Autocorrelation must be accounted for first so the
usual regression assumptions apply; then multicollinearity can be tackled.
3. How does the small sample size affect the analysis?
Answer: A sample size of 15 is small for a model that uses three independent
variables (ideally, n should be in the neighborhood of 30 or more). A larger sample
size would almost certainly be helpful.
4. Should the regression done on the first differences have been through the origin?
Answer: Perhaps. An intercept can be included in the regression model and then
checked for significance. Ordinarily, regressions with first differenced data do
not require an intercept term.
6. What conclusions can be drawn from a comparison of the Spokane County business
activity index and the GNP?
Answer: The Spokane business activity seems to be extremely stable. It was not
affected by the national recessions of 1970 and 1974. The large peak in 1974 was
caused by Expo 74 (a world fair). It would be inappropriate in this case to expect
the Spokane economy to follow national patterns.
4. Would another type of forecasting model be more effective for forecasting weekly sales?
Answer: Possibly! Jim will investigate Box-Jenkins ARIMA models in Chapter 9.
John is correct to be disappointed with the model run with seasonal dummy variables since
the residual autocorrelations have a spike at lag 12. From a forecasting perspective, the
autoregressive model is better. The intercept term allows for a time trend, seasonality is accounted
for by sales lagged 12 months as the predictor variable, R2 is large (91%) and there is no residual
autocorrelation. However, this model does not include predictor variables directly under John’s
control, like price, so he would not be able to determine how a change in price (or changes in other
operational variables) might affect future sales.
Nonseasonal model:
Analysis of Variance
Source DF SS MS F P
Regression 3 34630 11543 41.62 0.000
Residual Error 80 22187 277
Total 83 56816
The best nonseasonal regression model used the business activity index, number of
bankruptcies filed, and number of building permits to forecast number of clients seen. The
Durbin-Watson test for serial correlation is inconclusive at the .05 level. The residual
autocorrelation function shows some significant autocorrelation around lag 4.
Analysis of Variance
Source DF SS MS F P
Regression 12 39111.7 3259.3 13.07 0.000
Residual Error 71 17704.7 249.4
Total 83 56816.3
The best seasonal model uses Index and 11 seasonal dummy variables to represent
the months Feb through Dec. We retain all the seasonal dummy variables for forecasting
purposes even though some are non-significant. The Durbin-Watson test is inconclusive at the
.05 level. The residual autocorrelations have a just significant spike at lag 6 but are otherwise
non-significant. Forecasts for the first three months of 1993 follow.
Forecast Actual
Jan 1993 179 151
Feb 1993 175 152
Mar 1993 197 199
Forecasts for Jan and Feb 1993 are high compared to actual numbers of clients but
forecast for Mar 1993 is very close to the actual number of new clients.
Autoregressive model:
Autoregressive models with number of new clients lagged 1, 4 and 12 months were
tried. None of these models proved to be useful for forecasting. The best model had number of
new clients lagged 1 month. The results are displayed below.
Analysis of Variance
Source DF SS MS F P
Regression 1 19035 19035 30.62 0.000
Residual Error 93 57805 622
Total 94 76840
1. The results for the best model are shown below (see also solution to Case 7-2). Each of
the independent variables is significantly different from 0 at the .05 level. The signs of
the coefficients are what we would expect them to be.
Analysis of Variance
Source DF SS MS F P
Regression 3 140771801 46923934 52.90 0.000
Residual Error 53 47009523 886972
Total 56 187781324
2. Serial correlation is not a problem. The value of the Durbin-Watson statistic (1.62)
would not reject the null hypothesis of no serial correlation. There are no
significant residual autocorrelations. Restricting attention to integer powers, 2 is the
best choice for the exponential transformation. Allowing other choices for powers,
e.g. 2.4, may improve the fit a bit but is not as “nice” as an integer power.
3. The memo to Mr. DeCoria should use all the usual inferential and descriptive summaries
to defend the model in part 1. A residual analysis should also be included.
CASE 8-7 ALOMEGA FOOD STORES
2. “Selling” the final regression model to management, including the irascible Jackson
Tilson, ties the statistical exercise in the Alomega case to the real world of business
management. The idea of selling the statistical results to management can be
the focus of team presentations to the class with the instructor playing the role of
Tilson. Working through the presentation of results to the class adds an important
“real world” element to the statistical analysis.
3. As noted in the case, the advertising predictor variables are under the control of
Alomega management. Students can demonstrate the usefulness of this result by
choosing reasonable future values for these advertising variables and generating forecasts.
However, students must recognize the regression equation does not necessarily
imply a cause and effect relationship between advertising expenditures and sales. In
addition, conditions under which the model was developed may change in the future.
4. All forecasts, including the ones using Julie’s regression equation, assume a future
that is identical to the past except for the identified predictor variables. If her
model is used to generate forecasts for Alomega, she should check the model
accuracy on a regular basis. The errors encountered as the future unfolds should
be compared to those in the data used to generate the model. If significant
changes or trends are observed, the model should be updated to include the most
recent data, along with possibly discarding some of the oldest data. Alternatively,
a different approach to the forecasting problem can be sought if the forecasting errors
suggest that the current regression model is inadequate.
1. The positive coefficient on November makes sense because cookie sales are seasonal,
with sales relatively high each year in November, the month before the Christmas holidays.
2. Jame’s model looks good. Almost 94% of the variation in cookie sales is explained
by the model. The residual analysis indicates the usual regression assumptions are
tenable, including the independence assumption.
3. Forecasts:
June 2003 733,122
July 2003 799,823
August 2003 737,002
September 2003 1,562,070
October 2003 1,744,477
November 2003 2,152,463
December 2003 1,932,194
Analysis of Variance
Source DF SS MS F P
Regression 1 7.03141E+12 7.03141E+12 118.35 0.000
Residual Error 27 1.60415E+12 59412957997
Total 28 8.63556E+12
This regression model is very reasonable. About 81% of the variation in cookie
sales is explained with the single predictor variable, sales lagged 12 months
(Lg12Sales). The usual residual plots look good and there is no significant residual
autocorrelation.
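The 81% figure and the F statistic can be recovered from the Analysis of Variance table. The following is a minimal sketch that uses only the sums of squares and degrees of freedom printed above; the variable names are illustrative.

```python
# Sums of squares and degrees of freedom copied from the ANOVA table above.
ss_regression = 7.03141e12
ss_error = 1.60415e12
ss_total = 8.63556e12
df_regression = 1
df_error = 27

# R-squared: share of the total variation explained by the regression.
r_squared = ss_regression / ss_total

# F statistic: mean square for regression divided by mean square error.
ms_regression = ss_regression / df_regression
ms_error = ss_error / df_error
f_statistic = ms_regression / ms_error

print(f"R-squared = {r_squared:.3f}")
print(f"F = {f_statistic:.2f}")
```

Both values agree with the output to the precision shown there.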
Forecasts:
5. Both models fit the data well. Apart from July 2003, the forecasts generated by the
models are very close to one another. Dummy variable regression explains more of
the variation in cookie sales but the autoregression is simpler. A case could be made for
either model.
CASE 8-9 SOUTHWEST MEDICAL CENTER
1. The regression results along with residual plots and the residual autocorrelation
function follow.
Analysis of Variance
Source DF SS MS F P
Regression 12 2353707 196142 8.07 0.000
Residual Error 101 2456198 24319
Total 113 4809905
Mary has a right to be disappointed. This regression model does not fit well. Even
allowing for seasonality, only the Dec seasonal dummy variable is significant at the
.05 level. The residual plots clearly show a poor fit in the middle of the series and
there is a considerable amount of significant residual autocorrelation.
2. Mary might try an autoregression with different choices of lags of total visits
as predictor variable(s). She might try to fit a Box-Jenkins ARIMA model to
be discussed in Chapter 9. Regardless, finding an adequate model for this
time series will be challenging.
CHAPTER 9
BOX-JENKINS (ARIMA) METHODOLOGY
ANSWERS TO PROBLEMS AND CASES
1. a. 0 ± .196
b. Series is random
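The limits in part a are the usual approximate 95% limits for sample autocorrelations of a random series, 0 ± 1.96/√n. The sketch below assumes n = 100, a value inferred from .196 = 1.96/√100 rather than quoted in the answer; the function names are illustrative.

```python
import math

def acf_limits(n, z=1.96):
    """Approximate 95% limits for sample autocorrelations of a random series."""
    half_width = z / math.sqrt(n)
    return -half_width, half_width

def looks_random(autocorrelations, n):
    """True when every sample autocorrelation lies inside the limits."""
    lower, upper = acf_limits(n)
    return all(lower <= r <= upper for r in autocorrelations)

n = 100  # assumed sample size; inferred from .196 = 1.96 / sqrt(100)
lower, upper = acf_limits(n)
print(f"limits: {lower:.3f} to {upper:.3f}")
```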
2.  t    Yt     Ŷt       et
    1    32.5   35.000   -2.500
    2    36.6   34.375    2.225
    3    33.3   36.306   -3.006
    4    31.9   33.581   -1.681
Ŷ 7 = 35
c. 75.65 ± 2√3.2
b. AR(1)
c. ARIMA(1,0,1)
b. Q = 44.3 df = 11 α = .05
Reject H0 if χ2 > 19.675
Since Q = 44.3 > 19.675, reject H0 and conclude model is not adequate. Also,
there is a significant residual autocorrelation at lag 2. Add an MA term to the
model at lag 2 and fit an ARIMA(1,1,2) model.
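The adequacy check compares the Ljung-Box Q statistic with a chi-square critical value. The sketch below shows how Q is built from residual autocorrelations and how the decision rule is applied; the function names are illustrative, and the critical value is taken from the answer rather than computed.

```python
def ljung_box_q(residual_acf, n):
    """Ljung-Box Q for residual autocorrelations r_1..r_m from n observations."""
    return n * (n + 2) * sum(
        r * r / (n - k) for k, r in enumerate(residual_acf, start=1)
    )

def model_is_adequate(q, critical_value):
    """Do not reject H0 (random residuals) when Q does not exceed the critical value."""
    return q <= critical_value

# Values quoted in the answer: Q = 44.3 and a 5% critical value of 19.675 for 11 df.
print(model_is_adequate(44.3, 19.675))
```

The same rule applies to the other adequacy checks in this chapter, with the critical value chosen for the appropriate degrees of freedom.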
The least squares estimate of the constant term, .7127, is virtually the same as
the least squares slope coefficient in the straight line fit shown in part a. Also,
the first order moving average coefficient is essentially 1. These two results
are consistent with a straight line time trend regression model for the original data.
Suppose $Y_t$ is demand in time period $t$. The straight line time trend regression
model is $Y_t = \beta_0 + \beta_1 t + \varepsilon_t$. Thus $Y_{t-1} = \beta_0 + \beta_1 (t-1) + \varepsilon_{t-1}$ and
$Y_t - Y_{t-1} = \beta_1 + \varepsilon_t - \varepsilon_{t-1}$. The latter is an ARIMA(0,1,1) model with a constant
term (the slope coefficient in the straight line model) and a first order moving
average coefficient of 1.
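A small simulation can illustrate this equivalence. The sketch below generates a straight line trend plus random errors, differences it, and checks that the differences average close to the slope and have a lag 1 autocorrelation near -0.5, the theoretical value for a first order moving average with coefficient 1. The intercept, sample size, error variance and seed are arbitrary assumptions; only the slope echoes the .7127 quoted above.

```python
import random

def first_differences(series):
    return [b - a for a, b in zip(series, series[1:])]

def lag1_autocorrelation(series):
    mean = sum(series) / len(series)
    deviations = [x - mean for x in series]
    numerator = sum(a * b for a, b in zip(deviations, deviations[1:]))
    denominator = sum(d * d for d in deviations)
    return numerator / denominator

random.seed(0)
beta0, beta1, n = 10.0, 0.7127, 5000  # beta0 and n are arbitrary illustrative choices
y = [beta0 + beta1 * t + random.gauss(0, 1) for t in range(1, n + 1)]

w = first_differences(y)
mean_w = sum(w) / len(w)
r1 = lag1_autocorrelation(w)
print(f"mean of differences = {mean_w:.4f}")
print(f"lag-1 autocorrelation = {r1:.3f}")
```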
There is some residual autocorrelation (particularly at lag 2) for both the straight
line fit and the ARIMA(0,1,1) fit, but the usual residual plots indicate no other
problems.
d. The forecasts for the next four periods from forecast origin t = 52 for the
ARIMA model follow.
8. Since the autocorrelation coefficients drop off after one time lag and the partial
autocorrelation coefficients trail off, an MA(1) model should be adequate. The best
model is
$\hat{Y}_t = 56.1853 - (-0.7064)\,\varepsilon_{t-1}$
The critical 5% chi-square value for 10 df is 18.31. Since the calculated chi-square
Q for the residual autocorrelations equals 7.4, the model is deemed adequate.
The autocorrelation and partial autocorrelation plots for the original series follow.
[Figure: Autocorrelation Function for Yt]
Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ
1 0.39 4.33 19.20 10 -0.12 -1.13 25.16 19 -0.06 -0.51 36.49 28 0.19 1.60 62.90
2 -0.08 -0.80 20.06 11 -0.08 -0.75 26.02 20 -0.13 -1.22 39.26 29 0.03 0.21 63.01
3 0.06 0.62 20.59 12 0.10 0.95 27.43 21 -0.04 -0.36 39.51 30 -0.05 -0.45 63.51
4 0.02 0.22 20.65 13 0.14 1.36 30.41 22 -0.11 -1.02 41.53 31 0.08 0.63 64.52
[Figure: Partial Autocorrelation Function for Yt]
95 Percent Limits
Period Forecast Lower Upper
127 52.3696 44.6754 60.0637
128 56.1853 46.7651 65.6054
129 56.1853 46.7651 65.6054
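The forecast table reflects the one-period memory of an MA(1) model: only the first forecast depends on the last residual, and later forecasts equal the constant. The sketch below encodes that rule and compares the theoretical ratio of two-step to one-step interval half-widths, √(1 + θ²), with the ratio read from the table. Function names are illustrative, and the last residual is not quoted in the answer, so none is assumed here.

```python
import math

MU = 56.1853      # fitted constant (mean) from the model in problem 8
THETA = -0.7064   # fitted MA(1) coefficient, in the sign convention of the text

def ma1_forecast(last_residual, steps_ahead):
    """Point forecast from Y_t = MU + e_t - THETA * e_(t-1)."""
    if steps_ahead == 1:
        return MU - THETA * last_residual
    return MU  # the MA(1) memory lasts one period only

def width_ratio():
    """Ratio of the 2-step to the 1-step prediction interval half-width."""
    return math.sqrt(1 + THETA ** 2)

# Half-widths read from the forecast table for periods 128 and 127.
table_ratio = (65.6054 - 56.1853) / (60.0637 - 52.3696)
print(f"model ratio = {width_ratio():.4f}, table ratio = {table_ratio:.4f}")
```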
9. Since the autocorrelation coefficients trail off and the partial autocorrelation
coefficients cut off after one time lag, an AR(1) model should be adequate.
The best model is
$\hat{Y}_t = 109.628 - 0.9377\,Y_{t-1}$
The forecast for period 81 is
$\hat{Y}_{81} = 109.628 - 0.9377\,Y_{80}$
[Figure: Partial Autocorrelation Function for Yt]
Number of observations: 80
Residuals: SS = 2325.19 (backforecasts excluded)
MS = 29.81 DF = 78
Modified Box-Pierce (Ljung-Box) Chi-Square statistic
Lag 12 24 36 48
Chi-Square 24.8(DF=10) 39.4(DF=22) 74.0(DF=34) 83.9(DF=46)
95 Percent Limits
Period Forecast Lower Upper
81 29.9234 19.2199 40.6269
82 81.5688 66.8957 96.2419
83 33.1408 15.7088 50.5728
The critical 5% chi-square value for 10 df's is 18.31. Since the calculated chi-square
Q for the residual autocorrelations equals 24.8, the model is deemed inadequate. An
examination of the individual residual autocorrelations suggests it might be possible to
improve the model by adding an MA term at lag 2.
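The forecasts for periods 81 to 83 follow from applying the AR(1) equation recursively, with each forecast feeding the next. The sketch below assumes Y80 = 85.0, a value backed out of the period 81 forecast rather than quoted in the answer; the function name is illustrative.

```python
CONSTANT = 109.628
PHI = -0.9377  # AR(1) coefficient; the text writes the model as 109.628 - 0.9377 * Y(t-1)

def ar1_forecasts(last_value, steps):
    """Recursive forecasts: each forecast becomes the input for the next one."""
    forecasts = []
    previous = last_value
    for _ in range(steps):
        previous = CONSTANT + PHI * previous
        forecasts.append(previous)
    return forecasts

y80 = 85.0  # assumed; inferred from the period-81 forecast, not quoted in the text
for period, value in zip(range(81, 84), ar1_forecasts(y80, 3)):
    print(period, round(value, 2))
```

The alternating high and low forecasts are what a negative autoregressive coefficient produces.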
10. As can be seen below, the autocorrelations for the original series are slow to die out. This
behavior indicates the series may be non-stationary. The autocorrelations for the
differenced data cut off after lag 1 and the partial autocorrelations die out. This suggests
an ARIMA(0,1,1) model. When this model is fit (see the computer output below), there
are no significant residual autocorrelations and the residual plots look good. The
forecasting equation from the fitted model is
$\hat{Y}_t = Y_{t-1} - (-0.3714)\,\varepsilon_{t-1}$
$\hat{Y}_{81} = Y_{80} - (-0.3714)\,\varepsilon_{80}$
$\hat{Y}_{81} = 266.9 - (-0.3714)(3.4647) = 268.19$
95 Percent Limits
Period Forecast Lower Upper
81 268.741 245.848 291.635
82 268.741 229.885 307.597
83 268.741 218.787 318.695
The critical 5% chi-square value for 11 df's is 19.68. Since the calculated
chi-square Q for the residual autocorrelations equals 9.2, the model is deemed adequate.
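For an ARIMA(0,1,1) model without a constant, the point forecasts are flat while the prediction limits widen with lead time. The sketch below reproduces the hand calculation for period 81 and compares the theoretical growth of the interval half-width, √(1 + (h − 1)(1 − θ)²), with the half-widths in the table. Function names are illustrative.

```python
import math

THETA = -0.3714  # fitted MA coefficient, in the sign convention of the text

def one_step_forecast(last_value, last_residual):
    """ARIMA(0,1,1) without constant: Y-hat(t) = Y(t-1) - THETA * e(t-1)."""
    return last_value - THETA * last_residual

def half_width_ratio(steps_ahead):
    """Half-width at a given lead time relative to the one-step half-width."""
    return math.sqrt(1 + (steps_ahead - 1) * (1 - THETA) ** 2)

print(round(one_step_forecast(266.9, 3.4647), 2))

# Half-widths from the table (upper limit minus forecast) for comparison.
table_half_widths = [291.635 - 268.741, 307.597 - 268.741, 318.695 - 268.741]
for h, width in enumerate(table_half_widths, start=1):
    print(h, round(width / table_half_widths[0], 3), round(half_width_ratio(h), 3))
```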
11. The slow decline in the early, non-seasonal lags indicates the need for regular
differencing.
Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ
1 0.71 6.92 49.43 8 0.54 2.07 309.89 15 0.40 1.22 506.38 22 0.23 0.64 602.29
2 0.63 4.34 88.66 9 0.50 1.85 337.18 16 0.40 1.20 525.09 23 0.26 0.73 610.88
3 0.63 3.69 128.66 10 0.45 1.61 359.38 17 0.42 1.26 546.38 24 0.42 1.18 634.13
[Figure: Autocorrelation Function for Regular]
Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ
1 -0.35 -3.44 12.19 8 -0.01 -0.11 25.78 15 -0.03 -0.19 90.84 22 -0.13 -0.72 109.87
2 -0.17 -1.49 15.08 9 0.05 0.40 26.06 16 -0.05 -0.28 91.10 23 -0.25 -1.43 118.13
3 0.01 0.07 15.09 10 -0.17 -1.35 29.20 17 0.25 1.47 98.33 24 0.54 2.98 156.06
4 -0.03 -0.23 15.16 11 -0.29 -2.22 38.14 18 -0.24 -1.38 105.13 25 -0.14 -0.71 158.67
[Figure: Autocorrelation Function for Seasonal]
Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ Lag Corr T LBQ
1 -0.49 -4.44 20.42 8 0.07 0.48 23.42 15 -0.03 -0.15 61.85 22 0.02 0.12 70.43
2 -0.03 -0.19 20.47 9 0.01 0.08 23.43 16 -0.11 -0.66 63.13 23 -0.02 -0.12 70.48
3 0.04 0.30 20.61 10 -0.07 -0.50 23.89 17 0.21 1.22 67.63 24 0.02 0.10 70.51
4 0.03 0.23 20.70 11 0.27 2.00 31.19 18 -0.13 -0.78 69.58 25 0.03 0.20 70.65
Concentrating on the non-seasonal lags, the autocorrelation coefficients drop off
after one time lag and the partial autocorrelation coefficients trail off, so a regular
moving average term of order 1 is indicated. Concentrating on the seasonal lags
(12 and 24), the autocorrelation coefficients cut off after lag 12 and the partial
autocorrelation coefficients trail off, so a seasonal moving average term of order 12
is suggested. An ARIMA(0,1,1)(0,1,1)12 model for Yt is identified.
95 Percent Limits
Period Forecast Lower Upper
97 163500 146991 180009
98 158300 141277 175322
99 177084 159562 194606
100 178792 160785 196798
101 188706 170227 207185
102 184846 165907 203785
103 191921 172532 211310
104 188746 168918 208574
105 185194 164936 205451
106 187669 166991 208348
107 188084 166993 209175
108 221521 200025 243016
The critical 5% chi-square value for 10 df's is 18.31. Since the calculated
chi-square Q for the residual autocorrelations equals 3, the model is deemed adequate.
b. The autocorrelation coefficient plot below indicates that the data are
non-stationary. Therefore, the data should be first differenced. The
autocorrelation coefficient and partial autocorrelation coefficient plots for
the first differenced data are also shown.
[Figure: Autocorrelation Function for IBM]
[Figure: Partial Autocorrelation Function for Diff.]
6 -0.11 -0.77
7 -0.00 -0.00
95 Percent Limits
Period Forecast Lower Upper
53 311.560 300.094 323.026
54 314.418 294.895 333.941
d. The residual plots look good and there are no significant residual autocorrelations.
There is no reason to doubt the adequacy of the model.
13. One question that might arise is should the student use the first 145 observations or
all 150 observations. With this many observations, it will not make much difference.
The autocorrelation function using all the data below is slow to die out and suggests
the DEF time series is non-stationary. Therefore, the differenced data should be investigated.
The autocorrelation coefficient and partial autocorrelation coefficient plots for the first
differenced data follow.
It appears that the autocorrelations for the differenced data cut off after lag one
and that the partial autocorrelations die out. This suggests a regular MA term in a model
for the differenced data so an ARIMA(0,1,1) model is identified. If 145 observations
are used, the forecasting equation from the fitted model is
$\hat{Y}_t = Y_{t-1} - 0.7179\,\varepsilon_{t-1}$
Final Estimates of Parameters
Lag 12 24 36 48
Chi-Square 12.3 29.5 57.2 66.1
DF 10 22 34 46
P-Value 0.266 0.131 0.008 0.028
95% Limits
Period Forecast Lower Upper Actual
146 133.815 128.832 138.797 135.2
147 133.814 128.637 138.991 139.2
148 133.814 128.450 139.178 136.8
149 133.813 128.268 139.358 136.0
150 133.813 128.092 139.533 134.4
This model fits well. The usual residual analysis indicates no model inadequacies.
Comparing the forecasts with the actuals for the five days from forecast origin t = 145
using MAPE gives MAPE = 1.82%.
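The MAPE can be verified directly from the forecast and actual columns of the table. The following is a minimal sketch; the function name is illustrative.

```python
forecasts = [133.815, 133.814, 133.814, 133.813, 133.813]
actuals = [135.2, 139.2, 136.8, 136.0, 134.4]

def mape(actual_values, forecast_values):
    """Mean absolute percentage error, in percent."""
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual_values, forecast_values)]
    return 100 * sum(ratios) / len(ratios)

print(f"MAPE = {mape(actuals, forecasts):.2f}%")  # MAPE = 1.82%
```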
The sample autocorrelation and partial autocorrelation functions below suggest an
AR(2) or, equivalently, an ARIMA(2,0,0) model. The computer output follows along
with the residual autocorrelation function.
Number of observations: 90
Residuals: SS = 14914.5 (backforecasts excluded)
MS = 171.4 DF = 87
Lag 12 24 36 48
Chi-Square 19.9 25.9 41.7 55.9
DF 9 21 33 45
P-Value 0.018 0.209 0.142 0.128
95% Limits
Period Forecast Lower Upper Actual
91 110.333 84.665 136.001
The forecast of 110 accidents for the 91st week seems reasonable given the history
of the series near that point.
There is no evidence of annual seasonality in these data but since there is less
than two years of weekly observations, seasonality, if it exists, would be virtually
impossible to detect.
15. The time series plot that follows suggests the Price series is non-stationary. This
is corroborated by the autocorrelations which are slow to die out. The differenced
series should be investigated.
The autocorrelation function for the differenced data below suggests the
differenced series is random. The partial autocorrelation function for the
differenced data has a similar appearance.
An ARIMA(0,1,0) model is identified for the price of corn. For this model
a forecast of the next observation at forecast origin $t$ is given by $\hat{Y}_{t+1} = Y_t$. Forecasts
two steps ahead are the same, similarly for three steps ahead and so forth. In other
words, this model produces “flat line” forecasts whose intercept is given by $Y_t$.
So, forecasts of the price of corn for the next 12 months are all given by the last
observation or 251 cents per bushel.
16. The variation in the Cavanaugh sales series increases with the level, so a log
transformation seems appropriate. Let Yt be the natural log of sales and
$W_t = Y_t - Y_{t-12}$ be the seasonally differenced series. Two ARIMA models that
represent the data reasonably well are given by the expressions
ARIMA(0,0,2)(0,1,0)12 and ARIMA(1,0,0)(0,1,1)12. Both models contain a
constant term. Another possibility is the ARIMA(0,1,1)(0,1,1)12 model (without
a constant), but the latter doesn’t fit quite as well as the former models. The results
for the ARIMA(1,0,0)(0,1,1)12 process are displayed below.
The residual autocorrelation at lag 2 can be ignored or, alternatively, one can fit the
ARIMA(0,0,2)(0,1,1)12 model.
17. The variation in Disney sales increases with the level, so a log transformation
seems appropriate. Let $Y_t$ be the natural log of sales and $W_t = Y_t - Y_{t-4}$ be the
seasonally differenced series. Two ARIMA models that represent the data
reasonably well are given by the representations ARIMA(1,0,0)(0,1,1)4 and
ARIMA(0,1,1)(0,1,1)4. The former model contains a constant. The results for
the ARIMA(1,0,0)(0,1,1)4 process are displayed below.
18. The data were transformed by taking natural logs; however, an ARIMA model
may be fit to the original observations. Let Yt be the natural log of demand
and let $W_t = \Delta\Delta_{12} Y_t = Y_t - Y_{t-1} - Y_{t-12} + Y_{t-13}$ be the series after taking one seasonal
difference followed by a regular difference. An ARIMA(0,1,1)(0,1,1)12 model
represents the log demand series well. The results follow.
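The expression for Wt can be checked by applying the two differences in sequence and comparing with the expanded four-term form. The sketch below uses an arbitrary illustrative series, not the demand data; the function names are assumptions.

```python
def difference(series, lag=1):
    """Return series[t] - series[t - lag] for every t where both terms exist."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]

def regular_and_seasonal_difference(series, season=12):
    """Apply one seasonal difference, then one regular difference."""
    return difference(difference(series, season), 1)

def expanded_form(series, season=12):
    """Y(t) - Y(t-1) - Y(t-season) + Y(t-season-1), computed directly."""
    return [
        series[t] - series[t - 1] - series[t - season] + series[t - season - 1]
        for t in range(season + 1, len(series))
    ]

y = [float((3 * t * t + 7 * t) % 41) for t in range(40)]  # arbitrary illustrative values
assert regular_and_seasonal_difference(y) == expanded_form(y)
```

One regular and one seasonal difference of order 12 cost 13 observations, which is why the differenced series is shorter than the original.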
19. Let $W_t = \Delta\Delta_{12} Y_t = Y_t - Y_{t-1} - Y_{t-12} + Y_{t-13}$ be the series after taking one seasonal
difference followed by a regular difference. Examination of the autocorrelation
function for $W_t$ leads to the identification of an ARIMA(0,1,0)(0,1,1)12 model.
The results follow.
Lag 12 24 36 48
Chi-Square 13.2 19.3 26.2 52.6
DF 11 23 35 47
P-Value 0.280 0.681 0.858 0.266
95% Limits
Period Forecast Lower Upper
131 73653.4 73166.1 74140.6
132 73448.7 72759.7 74137.8
133 72571.8 71727.9 73415.7
134 72904.3 71929.8 73878.7
135 73200.8 72111.4 74290.3
136 73711.5 72518.1 74905.0
137 74218.7 72929.6 75507.7
138 75021.6 73643.5 76399.7
139 75459.7 73998.0 76921.4
140 75114.5 73573.8 76655.3
141 74519.0 72903.0 76134.9
142 74681.4 72993.6 76369.2
20. The variation in Wal-Mart sales increases with the level, so a log transformation
seems appropriate. Let $Y_t$ be the natural log of sales and $W_t = Y_t - Y_{t-4}$ be the
seasonally differenced series. Examination of the autocorrelation function for $W_t$
leads to the identification of an ARIMA(0,1,0)(0,1,1)4 model.
The results follow.
Type Coef SE Coef T P
SMA 4 0.5249 0.1185 4.43 0.000
Differencing: 1 regular, 1 seasonal of order 4
Number of observations: Original series 60, after differencing 55
Lag 12 24 36 48
Chi-Square 12.5 20.3 30.3 47.7
DF 11 23 35 47
P-Value 0.327 0.626 0.697 0.445
Forecasts
Period LnSales Sales
Q1/05 11.1671 70,764
Q2/05 11.2514 76,988
Q3/05 11.2408 76,176
Q4/05 11.4233 91,427
Q1/06 11.2660 78,120
Q2/06 11.3503 84,991
Q3/06 11.3397 84,095
Q4/06 11.5223 100,942
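The Sales column is the LnSales column converted back to the original units by exponentiation. The sketch below repeats that conversion; because the log forecasts are rounded to four decimals, the results match the table only to within a few units. No bias adjustment is applied, which appears consistent with the table.

```python
import math

ln_sales_forecasts = {
    "Q1/05": 11.1671, "Q2/05": 11.2514, "Q3/05": 11.2408, "Q4/05": 11.4233,
    "Q1/06": 11.2660, "Q2/06": 11.3503, "Q3/06": 11.3397, "Q4/06": 11.5223,
}

# Exponentiate each log forecast to return to the original sales units.
sales_forecasts = {q: math.exp(v) for q, v in ln_sales_forecasts.items()}

for quarter, sales in sales_forecasts.items():
    print(quarter, f"{sales:,.0f}")
```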
21. Autocorrelations and partial autocorrelations for number of severe earthquakes suggest
an AR(1) model.
Summary of model fit and forecasts for the next 5 years follow.
Lag 12 24 36 48
Chi-Square 11.0 23.2 30.7 45.9
DF 10 22 34 46
P-Value 0.358 0.388 0.631 0.477
95% Limits
Period Forecast Lower Upper
101 21.6463 9.7236 33.5691
102 20.9037 7.3049 34.5026
103 20.4964 6.4323 34.5605
104 20.2729 6.0718 34.4741
105 20.1504 5.9082 34.3925
106 20.0831 5.8287 34.3376
22. Since the variation in the series increases with the level, a log transformation is indicated.
An examination of the autocorrelations and partial autocorrelations for LnGapSales leads
to the identification of an ARIMA(0,1,0)(0,1,1)4 model. Summary of model fit and
forecasts for the next 8 quarters follow.
Lag 12 24 36 48
Chi-Square 14.8 19.7 24.2 27.3
DF 11 23 35 47
P-Value 0.194 0.659 0.916 0.990
23. The long strings of 0’s (no Influenza A positive cases) of uneven lengths might create
identification and fitting problems for ARIMA modeling. On the other hand, a simple
AR(1) model with an AR coefficient of about .8 and no constant term might provide
reasonable one week ahead forecasts for the number of positive cases. These forecasts
can be generated with the understanding that any non-integer forecast less than 1 is set
to 0 and any non-integer forecast greater than 1 is rounded to the closest integer.
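The forecasting rule described in problem 23 can be sketched as follows. The AR coefficient of 0.8 is the approximate value mentioned in the answer, and rounding exact halves upward is an assumption, since the answer does not say how ties are handled.

```python
import math

AR_COEFFICIENT = 0.8  # "about .8" in the text; no constant term

def count_forecast(last_count):
    """One-week-ahead forecast of positive cases, converted to a whole number."""
    raw = AR_COEFFICIENT * last_count
    if raw < 1:
        return 0  # forecasts below 1 are set to 0
    return math.floor(raw + 0.5)  # closest integer; halves rounded upward (assumed)

for last in [0, 1, 3, 10]:
    print(last, count_forecast(last))
```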
1. & 2. & 3. AR(1) model is appropriate. See summary, forecasts and actuals below.
AR 1 0.5997 0.0817 7.34 0.000
Constant 1921.7 100.2 19.18 0.000
Mean 4800.8 250.3
Lag 12 24 36 48
Chi-Square 8.9 24.5 36.8 48.5
DF 10 22 34 46
P-Value 0.545 0.322 0.342 0.372
95% Limits
Period Forecast Lower Upper Actual
105 3249.49 1251.36 5247.62 2431
106 3870.48 1540.58 6200.38 2796
107 4242.89 1804.68 6681.10 4432
108 4466.23 1990.23 6942.23 5714
Forecasts are too high for the first two weeks of January 1983 and too low for the next
two weeks. Note, however, that actual sales fall within the 95% prediction
interval limits for each of the four weeks.
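Both statements can be checked from the table: the sign of each forecast error shows whether the forecast was too high or too low, and a simple comparison shows whether the actual falls inside the 95% limits. The following is a minimal sketch with illustrative names.

```python
# (period, forecast, lower, upper, actual) copied from the table above.
rows = [
    (105, 3249.49, 1251.36, 5247.62, 2431),
    (106, 3870.48, 1540.58, 6200.38, 2796),
    (107, 4242.89, 1804.68, 6681.10, 4432),
    (108, 4466.23, 1990.23, 6942.23, 5714),
]

def summarize(row):
    period, forecast, lower, upper, actual = row
    error = actual - forecast           # negative means the forecast was too high
    covered = lower <= actual <= upper  # True when the actual is inside the limits
    return period, round(error, 2), covered

summaries = [summarize(r) for r in rows]
for s in summaries:
    print(s)
```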
4. The best model in Chapter 8 for the original Restaurant Sales data is an autoregressive
model with an added dummy variable to represent the period during the year when
Marquette University is in session. So, because of the additional dummy variable, this
model fits the data better than the AR(1) model in part 1. If the dummy variable were not
present, the two models would be the same. Consequently, we would expect better
forecasts with the AR + dummy variable model than with the simple AR model.
Regardless, however, if forecasts are compared to actuals from forecast origin 104 (last
week in 1982), the usual measures of forecast accuracy (RMSE, MAPE, etc.) are likely
to be relatively large since a large portion of the variation in sales is not accounted for
by the AR + dummy variable model.
5. At the very least the parameters in the AR(1) model should be re-estimated if the
new data are combined with the old data. A better approach is to combine the data
and then go through the usual ARIMA model building process again. It may be the
combined data suggest the form of the ARIMA has changed. In this case, an AR(1)
is still appropriate when the new data are combined with the old data.
1. Box-Jenkins ARIMA models account for the autocorrelation in the observed series using
possibly differenced data, lagged dependent variables and current and previous errors.
There are no potential causal (exogenous) independent variables in these models so they
are often difficult to explain to management. Best to demonstrate the results.
2. Autocorrelation and partial autocorrelation plots for the regular and seasonally
differenced data suggest a non-seasonal AR(2) term (the partial autocorrelations cut
off after lag 2 and the autocorrelations die out). No seasonal MA or AR terms should
be included. However, here is a case where, say, the ARIMA(2,1,0)(0,1,0)12
model is more complex than necessary and a much simpler model works well. A time
series plot of the seasonally differenced Mr. Tux data is shown below along with the
sample autocorrelation function for these differences.
Setting t = 97 through t = 108, we have the forecasts for the 12 months of 2006:
$\hat{Y}_{100}$ = 441,741
$\hat{Y}_{101}$ = 426,921
$\hat{Y}_{102}$ = 305,048
$\hat{Y}_{103}$ = 262,477
$\hat{Y}_{104}$ = 407,576
$\hat{Y}_{105}$ = 227,583
$\hat{Y}_{106}$ = 205,692
$\hat{Y}_{107}$ = 213,876
$\hat{Y}_{108}$ = 290,887
The sales forecasts for 2006 are obtained by adding 32,174 to the sales for
each of the 12 months of 2005.
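In code, this forecasting rule is a one-line shift of last year's monthly values. The sketch below uses hypothetical 2005 figures for illustration only; they are not the Mr. Tux data.

```python
DRIFT = 32_174  # yearly increase quoted in the text

def next_year_forecasts(last_year_sales):
    """Forecast each month as the same month last year plus the drift."""
    return [sales + DRIFT for sales in last_year_sales]

# Hypothetical 2005 monthly sales, for illustration only (not the Mr. Tux data).
sales_2005 = [100_000, 120_000, 150_000]
print(next_year_forecasts(sales_2005))
```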
2. The autocorrelation function plot below indicates that the data are non-stationary.
The autocorrelations are slow to die out. In addition, there is a spike at lag 12 and a
smaller spike at lag 24 indicating some seasonality.
The autocorrelation functions for the differenced series (DiffClients), the seasonally
differenced series (Diff12Clients) and the series with one regular and one seasonal
difference (DiffDiff12Clients) follow.
Relative to the autocorrelations for DiffClients and Diff12Clients, the autocorrelations
for DiffDiff12Clients are much more pronounced, indicating one regular difference and
one seasonal difference is too much. The autocorrelations for Diff12Clients are the
cleanest with a significant spike at lag 12 and a slightly smaller spike at lag 24. This
autocorrelation pattern suggests an ARIMA(0,0,0)(0,1,1)12 or an ARIMA(0,0,0)(1,1,0)12
model. The former model is the better choice. Summary results and forecasts follow.
Lag 12 24 36 48
Chi-Square 10.9 20.3 30.8 37.4
DF 11 23 35 47
P-Value 0.452 0.623 0.669 0.842
95% Limits
Period Forecast Lower Upper
Apr 1993 123.181 70.931 175.431
May 1993 122.960 70.710 175.210
Jun 1993 140.803 88.553 193.053
Jul 1993 150.944 98.694 203.194
Aug 1993 140.056 87.806 192.306
Sep 1993 134.285 82.035 186.535
Oct 1993 146.517 94.267 198.767
Nov 1993 146.953 94.703 199.203
Dec 1993 126.243 73.993 178.493
1. The forecast for 1961 using the AR(2) model is 1290. The revised error
measures are:
2. The results from fitting an ARIMA(1,1,0) model, one step ahead forecasts and
actuals follow.
Lag 12 24 36 48
Chi-Square 5.5 22.1 25.3 *
DF 11 23 35 *
P-Value 0.905 0.514 0.885 *
the same as those for the AR(2) model. The choice of one model over the other depends
upon whether one believes the sales series is non-stationary or “nearly” non-stationary.
1. & 2.
Fitted model: $\hat{Y}_t = Y_{t-12} + 50.479 + \varepsilon_t - 0.792\,\varepsilon_{t-12}$
Model fits well and forecasts seem very reasonable.
Fitted model: $\hat{Y}_t = Y_{t-1} + Y_{t-12} - Y_{t-13} + \varepsilon_t - 0.874\,\varepsilon_{t-12}$
A constant term is not required with a regular and a seasonal difference.
3. The forecasts follow.
CASE 9-7: AAA WASHINGTON
Fitted model: $\hat{Y}_t = Y_{t-1} + Y_{t-12} - Y_{t-13} + \varepsilon_t + 0.56\,\varepsilon_{t-1} - 0.8515\,\varepsilon_{t-12}$
CASE 9-8: WEB RETAILER
Lag 12 24 36 48
Chi-Square 10.2 * * *
DF 11 * * *
P-Value 0.513 * * *
This model was suggested by an examination of the plots of the autocorrelation and
partial autocorrelation functions for the original series and the first differenced series.
Another potential model is an ARIMA(1,0,0)(0,0,1)12 model. But if this model is fit to
the data, the estimate of the autoregressive parameter turns out to be very nearly 1,
confirming the choice of the initial ARIMA(0,1,0)(0,0,1)12.
2. The model in part 1 is adequate. There is no residual autocorrelation and the residual plots
that follow look good.
3. Forecasts from period 25
95% Limits
Period Forecast Lower Upper
26 426280 242397 610163
27 492809 232759 752859
28 527275 208780 845770
29 535656 167890 903422
30 545614 134439 956789
31 692161 241741 1142580
32 554640 68131 1041149
33 494570 -25530 1014669
34 484265 -67384 1035914
35 471355 -110135 1052844
36 462995 -146876 1072867
37 491232 -145757 1128222
The pattern of the forecasts is reasonable but the forecast of the seasonal peak in
December (recall this series starts in June) is very likely to be much too low. The
actual December peak may be captured by the 95% prediction limits but, because of
the small sample size, these limits are wide. The lower prediction limit is even
negative for some lead times.
4. The sample size in this case is small. With only two years of monthly data, it is
difficult to estimate the seasonality precisely. Although an ARIMA model
does provide some insights into the nature of this series, another modeling approach
may produce more readily acceptable forecasts.
Lag 12 24 36 48
Chi-Square 7.4 14.2 * *
DF 11 23 * *
P-Value 0.770 0.921 * *
Cookie sales have a strong and quite consistent seasonal component but with
little or no growth. Following the usual pattern of looking at autocorrelations
and partial autocorrelations for the original series and its various differences, the
best patterns for model identification appear to be those for the original series and
the seasonally differenced series. In either case, a seasonal moving average term of
order 12 is included in the model to accommodate seasonality and can be deleted if
non-significant. Fitting an ARIMA(1,0,0)(0,0,1)12 model gives an estimated
autoregressive coefficient of about .9, suggesting perhaps a model with a regular
difference, residual autocorrelations and unattractive forecasts. This line of
inquiry is not useful. The ARIMA model above involving the seasonally
differenced data fits well and, as we shall see, produces reasonable forecasts.
3. The forecasts for the next 12 months follow. Judging from the time series plot,
they seem very reasonable.
95% Limits
Period Forecast Lower Upper
42 627865 328983 926748
43 721336 422453 1020219
44 658579 359696 957461
45 1533503 1234620 1832386
46 1628889 1330007 1927772
47 2070440 1771557 2369323
48 1805503 1506620 2104385
49 778148 479265 1077031
50 534265 235382 833148
51 525169 226286 824052
52 697168 398285 996051
53 624876 325994 923759
CASE 9-10: SOUTHWEST MEDICAL CENTER
1. Various plots follow. Given these plots, Mary’s initial model seems reasonable.
2. Results from fitting an ARIMA(0,1,1)(0,1,1)12 model follow along with a residual
analysis and forecasts for the next 12 months.
Lag 12 24 36 48
Chi-Square 21.2 53.0 72.9 88.1
DF 10 22 34 46
P-Value 0.020 0.000 0.000 0.000
95% Limits
Period Forecast Lower Upper
115 1419.59 1223.70 1615.49
116 1438.07 1205.16 1670.99
117 1386.09 1121.28 1650.90
118 1376.53 1083.27 1669.79
119 1459.48 1140.30 1778.66
120 1431.27 1088.12 1774.41
121 1365.43 999.88 1730.98
122 1456.48 1069.83 1843.14
123 1324.46 917.79 1731.12
124 1303.44 877.70 1729.17
125 1442.69 998.71 1886.68
126 1350.23 888.71 1811.74
Collectively, the residual autocorrelations are larger than they would be for random
errors; however, they suggest no obvious additional terms to add to the ARIMA model.
Apart from the large residual at month 68, the residual plots look good. The forecasts
seem reasonable but the 95% prediction limits are fairly wide.
3. Total visits for fiscal years 4, 5 and 6 seem somewhat removed from the rest of the data.
Total visits for these fiscal years are, as a group, somewhat larger than the remaining
observations. Did something unusual happen during these years? Were total visits
defined differently? This particular feature makes modeling difficult.
CHAPTER 10
1. The Delphi method can be used in any forecasting situation where there is little or no
historical data and there is expert opinion (experience) available. Two examples might
be:
forecasts and Regression forecasts are preferred.
1. & 2. Sue and Bill have tackled a very tough business project: designing a restaurant
that will succeed. Restaurants seem to come and go on a regular basis so their
planning efforts prior to opening are important.
They have already tried focus groups and have some ideas to add to their own.
Since they have a number of "expert" friends, some way must be found to use this
expertise. The Delphi method suggests itself as a way to utilize their friends'
knowledge. A written description of the project along with the question of proper
motif could be supplied to each of their friends, along with a request to design the
restaurant. These descriptions would then be mailed back to each participant with
a request to re-design the business based on all the written replies. This process
could be continued until changes are no longer generated.
An optional step would then be to bring the participants together for a discussion.
This expert focus group could argue their cases and respond to Sue and Bill's
objections or insights. At the end of this process Sue and Bill would probably have
a better idea of how a successful restaurant would look and could begin their project
with more confidence. Also, financial backers would probably be more enthusiastic
after reviewing the extensive planning that Sue and Bill have undertaken prior to
opening their business.
1. The naïve forecasting model is not very accurate. The MSE equals
8,648,047,253.
2. The MSE for the multiple regression model (from the regression output)
equals 2,097,765,646 which is quite a bit less than the naïve model.
3. If the naïve approach had been more accurate, combining methods would have
been worth a try.
4. If Julie did combine forecasts, she should use a weighted average that definitely
favored the multiple regression model.
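One common heuristic for such a weighted average, offered here as an assumption rather than the method prescribed in the case, is to weight each forecast in inverse proportion to its MSE. The sketch below applies that heuristic to the two MSE values quoted above; the function names are illustrative.

```python
MSE_NAIVE = 8_648_047_253
MSE_REGRESSION = 2_097_765_646

def inverse_mse_weights(mse_a, mse_b):
    """Weights proportional to 1/MSE, normalized to sum to 1."""
    inv_a, inv_b = 1 / mse_a, 1 / mse_b
    total = inv_a + inv_b
    return inv_a / total, inv_b / total

def combine(forecast_a, forecast_b, weight_a, weight_b):
    """Weighted average of two forecasts for the same period."""
    return weight_a * forecast_a + weight_b * forecast_b

w_naive, w_regression = inverse_mse_weights(MSE_NAIVE, MSE_REGRESSION)
print(f"naive weight = {w_naive:.3f}, regression weight = {w_regression:.3f}")
```

Under this heuristic the regression forecast receives roughly four fifths of the weight, which agrees with the advice to favor the multiple regression model.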
1. These articles are more abundant than many realize. More "popular" journals,
particularly financial markets titles such as Technical Analysis of Stocks &
Commodities, Financial Analysts Journal, and Futures present several articles.
In addition, the proceedings from the neural network conferences (published by
IEEE) will usually have some business applications. Finally, this approach is
beginning to appear in more scholarly journals such as Management Science and
Decision Sciences.
1. The interested student with access to a neural network simulator should enjoy
this assignment. In addition to the "backpropagation" approach, students might
try radial basis functions and least mean squares if they are available.
CHAPTER 11
1. a. One response: Forecasts may not be right, but they improve the odds of being
close to right. More importantly, if there is no agreed-upon set of forecasts to
drive planning, then different groups may develop their own procedures to guide
planning, with potential chaos as the result.
c. One response: Good forecasts require not only good quantitative skills, they also
require an in-depth understanding of the business or, more generally, the
forecasting environment and, ultimately, good communication skills to sell
forecasts to management.
1. This case invites students to think about how to use some of the forecasting techniques
discussed in Chapter 11. Guy Preston is trying to get his managers to think about the
long-range position of the company, as opposed to the short range thinking that most
managers are involved in on a daily basis. The case might generate a class discussion
about the tendency of managers to shorten their planning horizons too much in the
daily press of business.
Guy has asked his managers to write scenarios for the future: a worst case, a status quo,
and a most likely scenario. His next task might be to discuss each of these three
possibilities, and to discuss any differences of opinion that might emerge. A second
round of written scenarios by each participant could then follow this.
2. The instructor should point out that the purpose of Guy's retreat is to expand the
planning horizon of his managers. He should be prepared to continue this effort after
the first round of written scenarios: it is quite possible that his team is still caught
up in the affairs of the day and is not really engaged in long range thinking. He should
encourage expanded thinking after the discussion phase and try during the day to continue
such thinking.
3. There are two possible benefits from Guy's retreat. First, he may gain valuable insights
into the company's future to use in his own long range thinking. Second, and
probably more important, his managers may come away with an increased
awareness of the importance of expanding their planning horizons. If this is true,
the company will probably be in a better position to face the future.
case (β = 0), would expect Holt’s procedure to fit and forecast better here.
Therefore, there is no reason to consider a combination of forecasts. Combining
forecasts is best considered when the sets of forecasts are produced by different
procedures.
2. Jill should definitely update her historical data as new data points arrive. Since she
is using a computer program to do the forecasting, there would be very little effort
involved in this process. Why not update and re-run every quarter for a while?
3. After the results for a few additional quarters (say 4) become available, the analysis
can be re-done to see if the current model is still viable. Model parameters can be
re-estimated after each new observation if appropriate computer software is available.
4. Box-Jenkins ARIMA methodology is not well suited for small sample sizes and
can be difficult to explain to a non-statistician.
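The updating suggested in answers 2 and 3 can be sketched in a few lines of Python. This is only an illustration, not the program Jill used: the data values are made up, the grid of candidate smoothing constants is arbitrary, and the initialization (level set to the first observation, trend set to the first difference) is one common choice assumed here. Each time a new quarter arrives, the series is extended and the Holt smoothing constants are re-estimated by minimizing the sum of squared one-step-ahead errors.

```python
def holt_fit(y, alpha, beta):
    """Run Holt's linear exponential smoothing over the series y.

    Returns the final level, the final trend, and the sum of squared
    one-step-ahead forecast errors.
    """
    # Assumed initialization: level = first value, trend = first difference.
    level, trend = y[0], y[1] - y[0]
    sse = 0.0
    for obs in y[1:]:
        forecast = level + trend                  # one-step-ahead forecast
        sse += (obs - forecast) ** 2
        new_level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (new_level - level) + (1 - beta) * trend
        level = new_level
    return level, trend, sse


def best_holt(y, grid=None):
    """Grid-search alpha and beta; return the best fit and next forecast."""
    grid = grid or [i / 10 for i in range(1, 10)]   # 0.1, 0.2, ..., 0.9
    _, alpha, beta = min(
        (holt_fit(y, a, b)[2], a, b) for a in grid for b in grid
    )
    level, trend, sse = holt_fit(y, alpha, beta)
    return {"alpha": alpha, "beta": beta, "sse": sse, "next": level + trend}


# Hypothetical quarterly values, NOT the data from the case.
history = [52.0, 55.1, 57.9, 61.2, 63.8, 67.1, 70.3, 72.9]
new_quarters = [76.2, 79.0, 82.4, 85.1]

for obs in new_quarters:
    history.append(obs)            # update the data set with the new quarter
    fit = best_holt(history)       # re-estimate the smoothing constants
    print(len(history), fit["alpha"], fit["beta"], round(fit["next"], 2))
```

If the re-estimated constants or the forecast errors shift noticeably after a few quarters, that is a signal to re-examine whether the current model is still viable.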
This case illustrates the practical problems that are typically encountered when
attempting to forecast a time series in a business setting. Among the problems Jill
encounters are:
• She chooses to forecast a national variable for which data values are available
in the Survey of Current Business. Will this variable correlate well with the
actual Y value of interest (her firm's export sales)?
• Her initial sample size is only 13.
• When she attempts to gather more data, she finds that the series underwent a
definition change during the recent past, resulting in inconsistent data. She must
shift her focus to another surrogate variable.
• Her data plot indicates a bump in the data and she decides a more
consistent series would result if she dropped the first few data points.
A real-life forecasting project could very likely involve difficulties such as those
Jill encountered in this case, or perhaps even more. For this reason, this case is a "good
read" for forecasting students as they finish their studies since it shows that judgment
and skill must be involved in the forecasting effort: forecasting problems are not usually
as clean and straightforward as textbook problems.
Students should summarize the results of the analyses of these data in the cases at the
ends of Chapters 4 (smoothing), 5 (decomposition), 6 (simple linear regression), 8 (regression
with time series data) and 9 (Box-Jenkins methods). Fits, residual analyses, and forecasts can be
compared. Regardless of the method, there is a fair amount of unexplained variation in the
number of new clients. This may be a situation where combining forecasts makes sense.
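As a minimal sketch of what combining forecasts means in practice, the Python fragment below averages the forecasts from two methods and compares mean squared errors. All numbers are invented for illustration (they are not the new-client data), and the helper names `mse` and `combine` are hypothetical. The example is deliberately constructed so that the two methods err in opposite directions, which is the situation in which a simple average helps most.

```python
def mse(actual, forecast):
    """Mean squared error between actual values and forecasts."""
    return sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual)


def combine(forecasts, weights=None):
    """Weighted average of several forecast series (equal weights by default)."""
    n = len(forecasts)
    weights = weights or [1 / n] * n
    horizon = len(forecasts[0])
    return [
        sum(w * f[t] for w, f in zip(weights, forecasts))
        for t in range(horizon)
    ]


# Hypothetical values chosen so the two methods' errors partly offset.
actual = [100, 110, 120, 130]
method_a = [90, 115, 110, 140]     # e.g., forecasts from a smoothing method
method_b = [108, 103, 128, 122]    # e.g., forecasts from a regression model

combined = combine([method_a, method_b])

print("MSE method A:", mse(actual, method_a))
print("MSE method B:", mse(actual, method_b))
print("MSE combined:", mse(actual, combined))
```

With real data the gain is rarely this dramatic. Combining tends to help when the individual forecasts come from different procedures and their errors are not strongly positively correlated.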
We collected the data from the Mr. Tux rental shop so that real data could be used at the
end of each chapter instead of contrived data. We didn't know what would happen when we tried
to forecast this variable, but we think it turned out well because no one method was superior.
The case in Chapter 11 summarizes the different methods John used to forecast his monthly
sales, and asks students to comment on his efforts. We think a key point is that a lot of real data
sets do not lend themselves to accurate forecasting, and that continually trying different methods is
required. For the Mr. Tux data, there are fairly simple seasonal models (see the cases in Chapters
8 and 9) that represent the data well and provide reasonable forecasts.
What advice should we give to John Mosby for the future? Some suggestions to offer
might include:
1. Update the data set as future monthly values become available and re-run the
most promising analyses to see if the current forecasting model is still viable.
1. Julie has to choose between two different methods of forecasting her company’s
monthly sales. Students should review the results of these two efforts and decide
which offers the better choice. We find that class presentations by student teams
are valuable as they move the analysis beyond the computer results to simulate
implementing these results in a “real” situation.
4. Only if she finds two good methods.
Students should summarize the results of Mary’s forecasting efforts describing the fits,
residual analyses and forecasts. Moreover, they should point out the apparent difficulty in
finding an adequate model for Mary’s total visits series. If Mary’s data is accurate—there is no
reason for the apparent inconsistency in her time series—then it would probably be wise to
collect another year or so of data and attempt to model the entire data set or, perhaps, just the
data following fiscal year 6. In the interim, she may have to settle for the forecasts from the
best ARIMA model developed in Case 9-10.