1.1 A Deterministic Model
Suppose you are a vendor who sells a product that is in high demand (e.g. cold beer on the beach, cable television in Gainesville, or life jackets on the Titanic, to name a few). If you begin your day with 100 items, have a profit of $10 per item, and an overhead of $30 per day, you know exactly how much profit you will make that day, namely 100(10) − 30 = $970. Similarly, if you begin the day with 50 items, you can also state your profits with certainty. In fact, for any number of items you begin the day with (x), you can state what the day's profits (y) will be. That is,

y = 10x − 30.

This is called a deterministic model. In general, we can write the equation for a straight line as

y = β₀ + β₁x,

where β₀ is called the y-intercept and β₁ is called the slope. β₀ is the value of y when x = 0, and β₁ is the change in y when x increases by 1 unit. In many real-world situations, the response of interest (in this example it is profit) cannot be explained perfectly by a deterministic model. In this case, we make an adjustment for random variation in the process.
1.2 A Probabilistic Model
The adjustment people make is to write the mean response as a linear function of the predictor variable. This way, we allow for variation in individual responses (y), while associating the mean linearly with the predictor x. The model we fit is as follows:

E(y|x) = β₀ + β₁x,

and we write the individual responses as

y = β₀ + β₁x + ε.

We can think of y as being broken into a systematic and a random component:

y = β₀ + β₁x (systematic) + ε (random),

where x is the level of the predictor variable corresponding to the response, β₀ and β₁ are unknown parameters, and ε is the random error component corresponding to the response, whose distribution we assume is N(0, σ), as before. Further, we assume the error terms are independent from one another; we discuss this in more detail in a later chapter. Note that β₀ can be interpreted as the mean response when x = 0, and β₁ can be interpreted as the change in the mean response when x is increased by 1 unit. Under this model, we are saying that y|x ~ N(β₀ + β₁x, σ). Consider the following example.
Example 1.1 Coffee Sales and Shelf Space
A marketer is interested in the relation between the width of the shelf space for her brand of
coffee (x) and weekly sales (y) of the product in a suburban supermarket (assume the height is
always at eye level). Marketers are well aware of the concept of compulsive purchases, and know
that the more shelf space their product takes up, the higher the frequency of such purchases. She
believes that in the range of 3 to 9 feet, the mean weekly sales will be linearly related to the
width of the shelf space. Further, among weeks with the same shelf space, she believes that sales
will be normally distributed with unknown standard deviation (that is, measures how variable
weekly sales are at a given amount of shelf space). Thus, she would like to fit a model relating
weekly sales y to the amount of shelf space x her product receives that week. That is, she is fitting
the model:
y = 0 + 1 x + ,
so that y|x N (0 + 1 x, ).
One limitation of linear regression is that we must restrict our interpretation of the model to
the range of values of the predictor variables that we observe in our data. We cannot assume this
linear relation continues outside the range of our sample data.
We often refer to β₀ + β₁x as the systematic component of y and ε as the random component.
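To make the decomposition concrete, here is a small simulation sketch (not from the notes; the parameter values echo the vendor example, and σ = 25 is assumed purely for illustration) showing individual responses scattering around the systematic line:

import numpy as np

rng = np.random.default_rng(0)
beta0, beta1, sigma = -30.0, 10.0, 25.0    # hypothetical parameter values
x = rng.integers(50, 101, size=8)          # items stocked each day
y = beta0 + beta1 * x + rng.normal(0.0, sigma, size=x.size)
print(np.column_stack([x, y.round(1)]))    # responses vary around beta0 + beta1*x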
1.3 Least Squares Estimation of β₀ and β₁
We now have the problem of using sample data to compute estimates of the parameters β₀ and β₁. First, we take a sample of n subjects, observing values y of the response variable and x of the predictor variable. We would like to choose as estimates for β₀ and β₁ the values b₀ and b₁ that best fit the sample data. Consider the coffee example mentioned earlier. Suppose the marketer conducted the experiment over a twelve-week period (4 weeks with 3′ of shelf space, 4 weeks with 6′, and 4 weeks with 9′), and observed the sample data in Table 1.
Shelf Space (x)   Weekly Sales (y)  |  Shelf Space (x)   Weekly Sales (y)
      6                 526         |        6                 434
      3                 421         |        3                 443
      6                 581         |        9                 590
      9                 630         |        6                 570
      3                 412         |        3                 346
      9                 560         |        9                 672

Table 1: Coffee sales data for the n = 12 weeks
[Figure 1: Plot of weekly sales (SALES, 300–700) versus shelf space (SPACE, 0–12)]
The best fit is defined through the least squares criterion: we choose the values b₀ and b₁ that minimize the error sum of squares (all sums below run over i = 1, …, n):

SSE = Σ(yi − ŷi)² = Σ(yi − (b₀ + b₁xi))².

Calculus gives the least squares estimates:

b₁ = SSxy / SSxx

and

b₀ = ȳ − b₁x̄,   where ȳ = (Σyi)/n and x̄ = (Σxi)/n.

An alternative formula, but exactly the same mathematically, is to compute the sample covariance of x and y, as well as the sample variance of x, then take their ratio. This is the approach your book uses, but it is extra work compared to the formula above:

cov(x, y) = Σ(xi − x̄)(yi − ȳ)/(n − 1) = SSxy/(n − 1),

s²x = Σ(xi − x̄)²/(n − 1) = SSxx/(n − 1),

b₁ = cov(x, y)/s²x = SSxy/SSxx.
Some shortcut equations, known as the corrected sums of squares and cross-products, that while not very intuitive are very useful in computing these and other estimates, are:

SSxx = Σ(xi − x̄)² = Σxi² − (Σxi)²/n

SSxy = Σ(xi − x̄)(yi − ȳ) = Σxiyi − (Σxi)(Σyi)/n

SSyy = Σ(yi − ȳ)² = Σyi² − (Σyi)²/n
Space (x)   Sales (y)     x²       xy         y²
    6          526        36      3156      276676
    3          421         9      1263      177241
    6          581        36      3486      337561
    9          630        81      5670      396900
    3          412         9      1236      169744
    9          560        81      5040      313600
    6          434        36      2604      188356
    3          443         9      1329      196249
    9          590        81      5310      348100
    6          570        36      3420      324900
    3          346         9      1038      119716
    9          672        81      6048      451584
Σx = 72     Σy = 6185   Σx² = 504  Σxy = 39600  Σy² = 3300627

Table 2: Preliminary calculations for the coffee sales data
SSxx = Σx² − (Σx)²/n = 504 − (72)²/12 = 504 − 432 = 72

SSxy = Σxy − (Σx)(Σy)/n = 39600 − (72)(6185)/12 = 39600 − 37110 = 2490

SSyy = Σy² − (Σy)²/n = 3300627 − (6185)²/12 = 112772.9
From these, we obtain the least squares estimates of the true linear regression relation (β₀ + β₁x):

b₁ = SSxy/SSxx = 2490/72 = 34.5833

b₀ = ȳ − b₁x̄ = 6185/12 − 34.5833(72/12) = 515.4167 − 207.5000 = 307.9167

ŷ = b₀ + b₁x = 307.9167 + 34.5833x

So the fitted equation, estimating the mean weekly sales when the product has x feet of shelf space, is ŷ = b₀ + b₁x = 307.9167 + 34.5833x. Our interpretation for b₁ is: the estimated increase in mean weekly sales due to increasing shelf space by 1 foot is 34.5833 bags of coffee. Note that this should only be interpreted within the range of x values that we have observed in the experiment, namely x = 3 to 9 feet.
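These estimates are easy to reproduce; here is a short Python sketch using the corrected-sums-of-squares formulas above, with the coffee data keyed in:

# Least squares estimates for the coffee data (should reproduce
# b1 = 34.5833 and b0 = 307.9167).
x = [6, 3, 6, 9, 3, 9, 6, 3, 9, 6, 3, 9]
y = [526, 421, 581, 630, 412, 560, 434, 443, 590, 570, 346, 672]
n = len(x)

Sx, Sy = sum(x), sum(y)
SSxx = sum(xi * xi for xi in x) - Sx**2 / n                 # corrected SS for x
SSxy = sum(xi * yi for xi, yi in zip(x, y)) - Sx * Sy / n   # corrected cross-product

b1 = SSxy / SSxx
b0 = Sy / n - b1 * Sx / n
print(b1, b0)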
Example 1.2 Computation of a Stock Beta
A widely used measure of a company's performance is its beta. This is a measure of the firm's stock price volatility relative to the overall market's volatility. One common use of beta is in the capital asset pricing model (CAPM) in finance, but you will hear betas quoted on many business news shows as well. It is computed as (Value Line):
The beta factor is derived from a least squares regression analysis between weekly
percent changes in the price of a stock and weekly percent changes in the price of all
stocks in the survey over a period of five years. In the case of shorter price histories, a
smaller period is used, but never less than two years.
In this example, we will compute the stock beta over a 28-week period for Coca-Cola and
Anheuser-Busch, using the S&P500 as the market for comparison. Note that this period is only
about 10% of the period used by Value Line. Note: While there are 28 weeks of data, there are
only n=27 weekly changes.
Table 3 provides the dates, weekly closing prices, and weekly percent changes of: the S&P500,
Coca-Cola, and Anheuser-Busch. The following summary calculations are also provided, with x
representing the S&P500, yC representing Coca-Cola, and yA representing Anheuser-Busch. All
calculations should be based on 4 decimal places. Figure 2 gives the plot and least squares regression
line for Anheuser-Busch, and Figure 3 gives the plot and least squares regression line for Coca-Cola.
Σx = 15.5200    Σx² = 124.6354     ΣxyC = 161.4408
ΣyC = 2.4882    Σy²C = 461.7296    ΣxyA = 84.7527
ΣyA = 2.4281    Σy²A = 195.4900
  Closing     S&P      A-B     C-C      S&P      A-B      C-C
   Date      Price    Price   Price   % Chng   % Chng   % Chng
 05/20/97   829.75    43.00   66.88      --       --       --
 05/27/97   847.03    42.88   68.13against    2.08    -0.28     1.87
 06/02/97   848.28    42.88   68.50     0.15     0.00     0.54
 06/09/97   858.01    41.50   67.75     1.15    -3.22    -1.09
 06/16/97   893.27    43.00   71.88     4.11     3.61     6.10
 06/23/97   898.70    43.38   71.38     0.61     0.88    -0.70
 06/30/97   887.30    42.44   71.00    -1.27    -2.17    -0.53
 07/07/97   916.92    43.69   70.75     3.34     2.95    -0.35
 07/14/97   916.68    43.75   69.81    -0.03     0.14    -1.33
 07/21/97   915.30    45.50   69.25    -0.15     4.00    -0.80
 07/28/97   938.79    43.56   70.13     2.57    -4.26     1.27
 08/04/97   947.14    43.19   68.63     0.89    -0.85    -2.14
 08/11/97   933.54    43.50   62.69    -1.44     0.72    -8.66
 08/18/97   900.81    42.06   58.75    -3.51    -3.31    -6.28
 08/25/97   923.55    43.38   60.69     2.52     3.14     3.30
 09/01/97   899.47    42.63   57.31    -2.61    -1.73    -5.57
 09/08/97   929.05    44.31   59.88     3.29     3.94     4.48
 09/15/97   923.91    44.00   57.06    -0.55    -0.70    -4.71
 09/22/97   950.51    45.81   59.19     2.88     4.11     3.73
 09/29/97   945.22    45.13   61.94    -0.56    -1.48     4.65
 10/06/97   965.03    44.75   62.38     2.10    -0.84     0.71
 10/13/97   966.98    43.63   61.69     0.20    -2.50    -1.11
 10/20/97   944.16    42.25   58.50    -2.36    -3.16    -5.17
 10/27/97   941.64    40.69   55.50    -0.27    -3.69    -5.13
 11/03/97   914.62    39.94   56.63    -2.87    -1.84     2.04
 11/10/97   927.51    40.81   57.00     1.41     2.18     0.65
 11/17/97   928.35    42.56   57.56     0.09     4.29     0.98
 11/24/97   963.09    43.63   63.75     3.74     2.51    10.75

Table 3: Weekly closing prices and percent changes for the S&P500, Anheuser-Busch (A-B), and Coca-Cola (C-C)
Figure 2: Plot of weekly percent stock price changes for Anheuser-Busch versus S&P 500 and least
squares regression line
Figure 3: Plot of weekly percent stock price changes for Coca-Cola versus S&P 500 and least
squares regression line
Example 1.3 Hosiery Mill Cost Function

The data below (Table 4) are approximated monthly production outputs (xi, in thousands of items) and total monthly production costs (yi, in $1000s) for a hosiery mill over n = 48 months (Dean, 1941).

 i    xi     yi    |   i    xi     yi    |   i    xi     yi
 1  46.75  92.64   |  17  36.54  91.56   |  33  32.26  66.71
 2  42.18  88.81   |  18  37.03  84.12   |  34  30.97  64.37
 3  41.86  86.44   |  19  36.60  81.22   |  35  28.20  56.09
 4  43.29  88.80   |  20  37.58  83.35   |  36  24.58  50.25
 5  42.12  86.38   |  21  36.48  82.29   |  37  20.25  43.65
 6  41.78  89.87   |  22  38.25  80.92   |  38  17.09  38.01
 7  41.47  88.53   |  23  37.26  76.92   |  39  14.35  31.40
 8  42.21  91.11   |  24  38.59  78.35   |  40  13.11  29.45
 9  41.03  81.22   |  25  40.89  74.57   |  41   9.50  29.02
10  39.84  83.72   |  26  37.66  71.60   |  42   9.74  19.05
11  39.15  84.54   |  27  38.79  65.64   |  43   9.34  20.36
12  39.20  85.66   |  28  38.78  62.09   |  44   7.51  17.68
13  39.52  85.87   |  29  36.70  61.66   |  45   8.35  19.23
14  38.05  85.23   |  30  35.10  77.14   |  46   6.25  14.92
15  39.16  87.75   |  31  33.75  75.47   |  47   5.45  11.44
16  38.59  92.62   |  32  34.29  70.37   |  48   3.79  12.69

Table 4: Approximated monthly outputs and total costs, Dean (1941)

The summary calculations are:

Σxi = 1491.23    Σyi = 3140.78    Σxi² = 54067.42    Σyi² = 238424.46    Σxiyi = 113095.80
SSxx = Σxi² − (Σxi)²/n = 54067.42 − (1491.23)²/48 = 7738.94

SSxy = Σxiyi − (Σxi)(Σyi)/n = 113095.80 − (1491.23)(3140.78)/48 = 113095.80 − 97575.53 = 15520.27

SSyy = Σyi² − (Σyi)²/n = 238424.46 − (3140.78)²/48 = 32914.06

b₁ = SSxy/SSxx = 15520.27/7738.94 = 2.0055

b₀ = ȳ − b₁x̄ = 65.4329 − (2.0055)(31.0673) = 3.1274

ŷi = b₀ + b₁xi = 3.1274 + 2.0055xi,   i = 1, …, 48

ei = yi − ŷi = yi − (3.1274 + 2.0055xi),   i = 1, …, 48
Table 5 gives the raw data, their fitted values, and residuals.
A plot of the data and regression line is given in Figure 4.
We have now seen how to estimate β₀ and β₁. Next we can obtain an estimate of the variance of the responses at a given value of x. Recall from your previous statistics course that you estimated the variance by taking the "average" squared deviation of each measurement from the sample (estimated) mean. Here, we take the "average" squared deviation of each response from its fitted (estimated mean) value:

s²e = MSE = SSE/(n − 2) = Σ(yi − ŷi)²/(n − 2) = [SSyy − (SSxy)²/SSxx]/(n − 2).

In terms of the sample variances and covariance, an equivalent formula is:

s²e = [(n − 1)/(n − 2)] · [s²y − (cov(x, y))²/s²x].
 i    xi     yi     ŷi      ei
 1  46.75  92.64  96.88   -4.24
 2  42.18  88.81  87.72    1.09
 3  41.86  86.44  87.08   -0.64
 4  43.29  88.80  89.95   -1.15
 5  42.12  86.38  87.60   -1.22
 6  41.78  89.87  86.92    2.95
 7  41.47  88.53  86.30    2.23
 8  42.21  91.11  87.78    3.33
 9  41.03  81.22  85.41   -4.19
10  39.84  83.72  83.03    0.69
11  39.15  84.54  81.64    2.90
12  39.20  85.66  81.74    3.92
13  39.52  85.87  82.38    3.49
14  38.05  85.23  79.44    5.79
15  39.16  87.75  81.66    6.09
16  38.59  92.62  80.52   12.10
17  36.54  91.56  76.41   15.15
18  37.03  84.12  77.39    6.73
19  36.60  81.22  76.53    4.69
20  37.58  83.35  78.49    4.86
21  36.48  82.29  76.29    6.00
22  38.25  80.92  79.84    1.08
23  37.26  76.92  77.85   -0.93
24  38.59  78.35  80.52   -2.17
25  40.89  74.57  85.13  -10.56
26  37.66  71.60  78.65   -7.05
27  38.79  65.64  80.92  -15.28
28  38.78  62.09  80.90  -18.81
29  36.70  61.66  76.73  -15.07
30  35.10  77.14  73.52    3.62
31  33.75  75.47  70.81    4.66
32  34.29  70.37  71.90   -1.53
33  32.26  66.71  67.82   -1.11
34  30.97  64.37  65.24   -0.87
35  28.20  56.09  59.68   -3.59
36  24.58  50.25  52.42   -2.17
37  20.25  43.65  43.74   -0.09
38  17.09  38.01  37.40    0.61
39  14.35  31.40  31.91   -0.51
40  13.11  29.45  29.42    0.03
41   9.50  29.02  22.18    6.84
42   9.74  19.05  22.66   -3.61
43   9.34  20.36  21.86   -1.50
44   7.51  17.68  18.19   -0.51
45   8.35  19.23  19.87   -0.64
46   6.25  14.92  15.66   -0.74
47   5.45  11.44  14.06   -2.62
48   3.79  12.69  10.73    1.96

Table 5: Approximated monthly outputs, total costs, fitted values and residuals, Dean (1941)
This estimated error variance s²e can be thought of as the "average" squared distance from each observed response to the fitted line. The word "average" is in quotes since we divide by n − 2 and not n. The closer the observed responses fall to the line, the smaller s²e is and the better our predicted values will be.
Example 1.1 (Continued) Coffee Sales and Shelf Space

For the coffee data,

s²e = [112772.9 − (2490)²/72]/(12 − 2) = (112772.9 − 86112.5)/10 = 2666.04,

and the estimated residual standard error (deviation) is se = √2666.04 = 51.63. We now have estimates for all of the parameters of the regression equation relating the mean weekly sales to the amount of shelf space the coffee gets in the store. Figure 5 shows the 12 observed responses and the estimated (fitted) regression equation.
[Figure 5: Plot of coffee data and fitted regression equation]
Example 1.3 (Continued) Hosiery Mill Cost Function

For the hosiery mill cost data,

s²e = MSE = SSE/(n − 2) = Σ(yi − ŷi)²/(n − 2) = [SSyy − (SSxy)²/SSxx]/(n − 2)
    = [32914.06 − (15520.27)²/7738.94]/(48 − 2) = (32914.06 − 31125.55)/46 = 1788.51/46 = 38.88,

and se = √38.88 = 6.24.
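A quick sketch of this calculation in Python, using the hosiery mill summary values quoted above:

# Residual variance via the shortcut formula s_e^2 = (SSyy - SSxy^2/SSxx)/(n-2).
SSyy, SSxy, SSxx, n = 32914.06, 15520.27, 7738.94, 48
mse = (SSyy - SSxy**2 / SSxx) / (n - 2)
print(mse, mse**0.5)   # about 38.88 and 6.24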
2 Inference in Simple Linear Regression

It can be shown that the sampling distribution of b₁ is:

b₁ ~ N(β₁, σb₁),   where σb₁ = σ/√SSxx = σ/√(Σ(xi − x̄)²).
2.1 A Confidence Interval for β₁
First, we obtain the estimated standard error of b₁ (this is the standard deviation of its sampling distribution):

sb₁ = se/√SSxx = se/√((n − 1)s²x).

The interval can be written:

b₁ ± tα/2,n−2 · sb₁ = b₁ ± tα/2,n−2 · se/√SSxx.

Note that se/√SSxx is the estimated standard error of b₁, since we use se = √MSE to estimate σ. Also, we have n − 2 degrees of freedom instead of n − 1, since the estimate s²e has 2 estimated parameters used in it (refer back to how we calculated it above).
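A sketch of the interval computation in Python (scipy supplies the t quantile; the numbers are the coffee values used in the example below):

from math import sqrt
from scipy.stats import t

b1, se, SSxx, n, alpha = 34.5833, 51.63, 72.0, 12, 0.05
half = t.ppf(1 - alpha / 2, n - 2) * se / sqrt(SSxx)   # margin of error for b1
print(b1 - half, b1 + half)                            # roughly (21.0, 48.1)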
Example 2.1 Coffee Sales and Shelf Space

For the coffee sales example, we have the following results:

b₁ = 34.5833,   SSxx = 72,   se = 51.63,   n = 12.

A 95% confidence interval for β₁ is then:

34.5833 ± t.025,10 · 51.63/√72 = 34.5833 ± 2.228(6.085) = 34.5833 ± 13.557 = (21.03, 48.14).

We are 95% confident that each additional foot of shelf space increases mean weekly sales by between 21.03 and 48.14 bags of coffee (within the range of 3 to 9 feet). Note that the entire interval is positive (above 0), so we are confident that in fact β₁ > 0, so the marketer is justified in pursuing extra shelf space.
Example 2.2 Hosiery Mill Cost Function

b₁ = 2.0055,   SSxx = 7738.94,   se = 6.24,   n = 48.

For the hosiery mill cost function analysis, we obtain a 95% confidence interval for average unit variable costs (β₁). Note that t.025,48−2 = t.025,46 ≈ 2.015, since t.025,40 = 2.021 and t.025,60 = 2.000 (we could approximate this with z.025 = 1.96 as well).

2.0055 ± t.025,46 · 6.24/√7738.94 = 2.0055 ± 2.015(.0709) = 2.0055 ± 0.1429 = (1.8626, 2.1484)

We are 95% confident that the true average unit variable costs are between $1.86 and $2.15 (this is the incremental cost of increasing production by one item, assuming that the production process is in place).
2.2 Hypothesis Tests Concerning β₁
Similar to the idea of the confidence interval, we can set up a test of hypothesis concerning β₁. Since the confidence interval gives us the range of believable values for β₁, it is more useful than a test of hypothesis. However, here is the procedure to test if β₁ is equal to some value, say β₁⁰:

H₀: β₁ = β₁⁰  (β₁⁰ specified, usually 0)

(1) HA: β₁ ≠ β₁⁰
(2) HA: β₁ > β₁⁰
(3) HA: β₁ < β₁⁰

T.S.: tobs = (b₁ − β₁⁰)/(se/√SSxx) = (b₁ − β₁⁰)/sb₁

R.R.: (1) |tobs| ≥ tα/2,n−2   (2) tobs ≥ tα,n−2   (3) tobs ≤ −tα,n−2

Example 2.1 (Continued) Coffee Sales and Shelf Space

Suppose that for extra shelf space to pay for itself, the mean weekly sales must increase by more than 20 bags per additional foot; otherwise the expense outweighs the increase in sales. She wants to test to see if it is worth it to buy more space. She works under the assumption that it is not worth it, and will only purchase more if she can show that it is worth it. She sets α = .05.

1. H₀: β₁ = 20    HA: β₁ > 20

2. T.S.: tobs = (34.5833 − 20)/(51.63/√72) = 14.5833/6.085 = 2.397

3. R.R.: tobs ≥ t.05,10 = 1.812

Since 2.397 ≥ 1.812, she rejects H₀ and concludes that mean weekly sales increase by more than 20 bags per extra foot of shelf space, so the additional space is worth purchasing.

Example 2.2 (Continued) Hosiery Mill Cost Function

Testing whether costs are related to production output at all (H₀: β₁ = 0 vs HA: β₁ ≠ 0):

T.S.: tobs = (2.0055 − 0)/(6.24/√7738.94) = 2.0055/0.0709 = 28.29

This is far beyond any tabled critical value, so production costs are clearly associated with output.
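The same test, with an exact p-value, can be sketched in Python (the one-sided p-value is the upper tail of the t distribution):

from math import sqrt
from scipy.stats import t

# One-sided test of H0: beta1 = 20 vs HA: beta1 > 20 for the coffee data.
b1, b1_0, se, SSxx, n = 34.5833, 20.0, 51.63, 72.0, 12
t_obs = (b1 - b1_0) / (se / sqrt(SSxx))
print(t_obs, t.sf(t_obs, n - 2))   # about 2.40 and a p-value near 0.02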
2.3 The Analysis of Variance Approach to Regression
Consider the deviations of the individual responses, yi, from their overall mean ȳ. We would like to break these deviations into two parts: the deviation of the observed value from its fitted value, ŷi = b₀ + b₁xi, and the deviation of the fitted value from the overall mean. See Figure 6, corresponding to the coffee sales example. That is, we'd like to write:

yi − ȳ = (yi − ŷi) + (ŷi − ȳ).

Note that all we are doing is adding and subtracting the fitted value. It so happens that algebraically we can show the same equality holds once we've squared each side of the equation and summed it over the n observed and fitted values. That is,

Σ(yi − ȳ)² = Σ(yi − ŷi)² + Σ(ŷi − ȳ)².
Figure 6: Plot of coffee data, fitted equation, and the line ȳ = 515.4167
These three pieces are called the total, error, and model sums of squares, respectively. We
denote them as SSyy , SSE, and SSR, respectively. We have already seen that SSyy represents the
total variation in the observed responses, and that SSE represents the variation in the observed
responses around the fitted regression equation. That leaves SSR as the amount of the total
variation that is accounted for by taking into account the predictor variable x. We can use this decomposition to test the hypothesis H₀: β₁ = 0 vs HA: β₁ ≠ 0. We will also find this
decomposition useful in subsequent sections when we have more than one predictor variable. We
first set up the Analysis of Variance (ANOVA) Table in Table 6. Note that we will have to
make minimal calculations to set this up since we have already computed SSyy and SSE in the
regression analysis.
                              ANOVA
Source of    Sum of               Degrees of    Mean
Variation    Squares              Freedom       Square               F
MODEL        SSR = Σ(ŷi − ȳ)²     1             MSR = SSR/1          F = MSR/MSE
ERROR        SSE = Σ(yi − ŷi)²    n − 2         MSE = SSE/(n − 2)
TOTAL        SSyy = Σ(yi − ȳ)²    n − 1

Table 6: The Analysis of Variance Table for simple regression
The procedure of testing for a linear association between the response and predictor variables using the analysis of variance involves using the F-distribution, which is given in Table 6 (pp. B-11–B-16) of your textbook. This is the same distribution we used in the chapter on the 1-Way ANOVA.
The testing procedure is as follows:

1. H₀: β₁ = 0    HA: β₁ ≠ 0

2. T.S.: Fobs = MSR/MSE

3. R.R.: Fobs > Fα,1,n−2

4. p-value: P(F > Fobs) (You can only get bounds on this from the table, but computer outputs report it exactly)
Note that we already have a procedure for testing this hypothesis (see the section on Inferences Concerning β₁), but this is an important lead-in to multiple regression.
Example 2.1 (Continued) Coffee Sales and Shelf Space
Referring back to the coffee sales data, we have already made the following calculations:

SSyy = 112772.9,   SSE = 26660.4,   n = 12.

We then also have that SSR = SSyy − SSE = 86112.5. The Analysis of Variance is given in Table 7.
                         ANOVA
Source of    Sum of             Degrees of     Mean
Variation    Squares            Freedom        Square                       F
MODEL        SSR = 86112.5      1              MSR = 86112.5/1 = 86112.5    F = 86112.5/2666.04 = 32.30
ERROR        SSE = 26660.4      12 − 2 = 10    MSE = 26660.4/10 = 2666.04
TOTAL        SSyy = 112772.9    12 − 1 = 11

Table 7: The Analysis of Variance Table for the coffee data example
To test the hypothesis of no linear association between amount of shelf space and mean weekly coffee sales, we can use the F-test described above. Note that the null hypothesis is that there is no effect on mean sales from increasing the amount of shelf space. We will use α = .01.

1. H₀: β₁ = 0    HA: β₁ ≠ 0

2. T.S.: Fobs = MSR/MSE = 86112.5/2666.04 = 32.30

3. R.R.: Fobs > F.01,1,10 = 10.04

4. p-value: P(F > 32.30) = .0002 (from the computer output below)

Since 32.30 > 10.04, we reject H₀ and conclude that mean weekly sales are linearly associated with shelf space.
Example 2.2 (Continued) Hosiery Mill Cost Function

For the hosiery mill data, the sums of squares and degrees of freedom are:

Total SS:  SSyy = Σ(yi − ȳ)² = 32914.06     dfTotal = n − 1 = 47
Error SS:  SSE = Σ(yi − ŷi)² = 1788.51      dfE = n − 2 = 46
Model SS:  SSR = Σ(ŷi − ȳ)² = 31125.55      dfR = 1
                         ANOVA
Source of    Sum of              Degrees of     Mean
Variation    Squares             Freedom        Square                         F
MODEL        SSR = 31125.55      1              MSR = 31125.55/1 = 31125.55    F = 31125.55/38.88 = 800.55
ERROR        SSE = 1788.51       48 − 2 = 46    MSE = 1788.51/46 = 38.88
TOTAL        SSyy = 32914.06     48 − 1 = 47

Table 8: The Analysis of Variance Table for the hosiery mill cost example
To test H₀: β₁ = 0 vs HA: β₁ ≠ 0:

T.S.: Fobs = MSR/MSE = 31125.55/38.88 = 800.55

This enormous F statistic leaves no doubt that monthly production costs are linearly associated with output.
Coefficient of Determination

A measure of association that has a clear physical interpretation is R², the coefficient of determination. This measure is always between 0 and 1, so it does not reflect whether y and x are positively or negatively associated, and it represents the proportion of the total variation in the response variable that is accounted for by fitting the regression on x. The formula for R² is:

R² = (r)² = 1 − SSE/SSyy = SSR/SSyy = [cov(x, y)]²/(s²x s²y).

Note that SSyy = Σ(yi − ȳ)² represents the total variation in the response variable, while SSE = Σ(yi − ŷi)² represents the variation in the observed responses about the fitted equation (after taking into account x). This is why we sometimes say that R² is the proportion of the variation in y that is explained by x.
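A one-line sketch of the computation:

def r_squared(sse, ssyy):
    # proportion of the total variation in y explained by the regression
    return 1 - sse / ssyy

print(r_squared(26660.4, 112772.9))   # about .7636 for the coffee data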
Example 2.1 (Continued) Coffee Sales and Shelf Space

For the coffee data, we can calculate R² using the values of SSxy, SSxx, SSyy, and SSE we have previously obtained:

R² = 1 − 26660.4/112772.9 = 86112.5/112772.9 = .7636

Thus, over 3/4 of the variation in sales is explained by the model using shelf space to predict sales.
Example 2.2 (Continued) Hosiery Mill Cost Function

For the hosiery mill data, the model (regression) sum of squares is SSR = 31125.55 and the total sum of squares is SSyy = 32914.06. To get the coefficient of determination:

R² = 31125.55/32914.06 = 0.9457

Almost 95% of the variation in monthly production costs is explained by the monthly production output.
3 Estimating Mean Responses and Predicting Future Responses

The fitted value ŷ = b₀ + b₁xg estimates the mean response at a given level xg of the predictor variable. Its sampling distribution is:

ŷ ~ N( β₀ + β₁xg ,  σ √(1/n + (xg − x̄)²/Σ(xi − x̄)²) ).

Note that the standard error of the estimate is smallest at xg = x̄, that is, at the mean of the sampled levels of the predictor variable. The standard error increases as the value xg moves away from this mean.

For instance, our marketer may wish to estimate the mean sales when she has 6′ of shelf space, or 7′, or 4′. She may also wish to obtain a confidence interval for the mean at these levels of x.
3.1 A Confidence Interval for a Mean Response
Using the ideas described in the previous section, we can write out the general form for a (1 − α)100% confidence interval for the mean response when x = xg:

(b₀ + b₁xg) ± tα/2,n−2 · se √(1/n + (xg − x̄)²/SSxx).

Example 3.1 Coffee Sales and Shelf Space

For the coffee data (t.025,10 · se = 2.228(51.63) = 115.032), the 95% confidence intervals for the mean weekly sales at 4, 6, and 7 feet of shelf space are:

xg = 4:  (307.9167 + 34.5833(4)) ± 115.032 √(1/12 + (4 − 6)²/72) = 446.25 ± 115.032 √.1389

xg = 6:  (307.9167 + 34.5833(6)) ± 115.032 √(1/12 + (6 − 6)²/72) = 515.42 ± 115.032 √.0833

xg = 7:  (307.9167 + 34.5833(7)) ± 115.032 √(1/12 + (7 − 6)²/72) = 550.00 ± 115.032 √.0972
Figure 7: Plot of coffee data, fitted equation, and 95% confidence limits for the mean
Example 3.2 Hosiery Mill Cost Function

Suppose we want a 95% confidence interval for the mean production cost among months where 30,000 items are produced (xg = 30):

(3.1274 + 2.0055(30)) ± t.025,46 (6.24) √(1/48 + (30 − 31.0673)²/7738.94)

63.29 ± 1.82 ⇒ (61.47, 65.11)
We can be 95% confident that the mean production costs among months where 30,000 items are
produced is between $61,470 and $65,110 (recall units were thousands for x and thousands for y).
A plot of the data, regression line, and 95% confidence bands for mean costs is given in Figure 8.
3.2 Predicting a Future Response
In many situations, a researcher would like to predict the outcome of the response variable at a specific level of the predictor variable. In the previous section we estimated the mean response; in this section we are interested in predicting a single outcome. In the context of the coffee sales example, this would be like trying to predict next week's sales given we know that we will have 6′ of shelf space.
Figure 8: Plot of hosiery mill cost data, fitted equation, and 95% confidence limits for the mean
First, suppose you know the parameters β₀ and β₁. Then you know that the response variable, for a fixed level of the predictor variable (x = xg), is normally distributed with mean E(y|xg) = β₀ + β₁xg and standard deviation σ. We know from previous work with the normal distribution that approximately 95% of the measurements lie within 2 standard deviations of the mean. So if we knew β₀, β₁, and σ, we would be very confident that our response would lie between (β₀ + β₁xg) − 2σ and (β₀ + β₁xg) + 2σ. Figure 9 represents this idea.
[Figure 9: Distribution of responses when the parameters are known]

In practice, we do not know the parameters β₀, β₁, and σ, so we cannot use this interval directly. We can, however, obtain a confidence interval for the mean response at xg, and consider the two normal distributions centered at the lower and upper confidence limits for the mean response (Figure 10). If we use the method from the last paragraph on each of these two distributions, we can obtain the prediction interval by choosing the left-hand point of the lower distribution and the right-hand point of the upper distribution. This is displayed in Figure 11.
Figure 11: Upper and lower prediction limits when we have estimated the mean
The general formula for a (1 − α)100% prediction interval for a future response is similar to the confidence interval for the mean at xg, except that it is wider to reflect the variation in individual responses. The formula is:

(b₀ + b₁xg) ± tα/2,n−2 · se √(1 + 1/n + (xg − x̄)²/SSxx).

For instance, a 90% prediction interval for next week's sales when the marketer has 5′ of shelf space is (here t.05,10 · se = 1.812(51.63) = 93.554):

(307.9167 + 34.5833(5)) ± 93.554 √(1 + 1/12 + (5 − 6)²/72) = 480.83 ± 93.554 √1.0972 = 480.83 ± 98.00 ⇒ (382.8, 578.8)
This interval is relatively wide, reflecting the large variation in weekly sales at each level of x. Note
that just as the width of the confidence interval for the mean response depends on the distance
between xg and x, so does the width of the prediction interval. This should be of no surprise,
considering the way we set up the prediction interval (see Figure 10 and Figure 11). Figure 12
shows the fitted equation and 95% prediction limits for this example.
It must be noted that a prediction interval for a future response is only valid if conditions are
similar when the response occurs as when the data was collected. For instance, if the store is being
boycotted by a bunch of animal rights activists for selling meat next week, our prediction interval
will not be valid.
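Both intervals can be wrapped in one small Python sketch; the mean flag switches between the confidence interval for the mean response and the (wider) prediction interval. The example calls use the hosiery mill numbers from the examples on either side of this point:

from math import sqrt
from scipy.stats import t

def interval(xg, b0, b1, se, n, xbar, SSxx, alpha=0.05, mean=True):
    extra = 0.0 if mean else 1.0   # the PI adds the individual-response variance
    half = t.ppf(1 - alpha / 2, n - 2) * se * sqrt(extra + 1 / n + (xg - xbar) ** 2 / SSxx)
    yhat = b0 + b1 * xg
    return yhat - half, yhat + half

# CI for the mean ~ (61.47, 65.11); PI ~ (50.59, 75.99) at xg = 30
print(interval(30, 3.1274, 2.0055, 6.24, 48, 31.0673, 7738.94))
print(interval(30, 3.1274, 2.0055, 6.24, 48, 31.0673, 7738.94, mean=False))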
Figure 12: Plot of coffee data, fitted equation, and 95% prediction limits for a single response
Example 3.2 (Continued) Hosiery Mill Cost Function

Suppose the plant manager knows based on purchase orders that this month, her plant will produce 30,000 items (xg = 30.0). She would like to predict what the plant's production costs will be. She obtains a 95% prediction interval for this month's costs:

(3.1274 + 2.0055(30)) ± t.025,46 (6.24) √(1 + 1/48 + (30 − 31.0673)²/7738.94)

63.29 ± 12.70 ⇒ (50.59, 75.99)
She predicts that the costs for this month will be between $50,590 and $75,990. This interval is
much wider than the interval for the mean, since it includes random variation in monthly costs
around the mean. A plot of the 95% prediction bands is given in Figure 13.
3.3 Coefficient of Correlation
In many situations, we would like to obtain a measure of the strength of the linear association between the variables y and x. One measure of this association that is reported in research journals from many fields is the Pearson product moment coefficient of correlation. This measure, denoted by r, is a number that can range from −1 to +1. A value of r close to 0 implies that there is very little association between the two variables (y tends to neither increase nor decrease as x increases). A positive value of r means there is a positive association between y and x (y tends to increase as x increases). Similarly, a negative value means there is a negative association (y tends to decrease as x increases). If r is either +1 or −1, it means the data fall on a straight line (SSE = 0) that has either a positive or negative slope, depending on the sign of r. The formula for calculating r is:

r = SSxy/√(SSxx · SSyy) = cov(x, y)/(sx sy).

Note that the sign of r is always the same as the sign of b₁. We can test whether a population coefficient of correlation is 0, but since the test is mathematically equivalent to testing whether β₁ = 0, we won't cover this test.
Figure 13: Plot of hosiery mill cost data, fitted equation, and 95% prediction limits for an individual
outcome
Computer output (SAS) for the coffee sales example:

                        Analysis of Variance

                                Sum of         Mean
   Source        DF            Squares       Square     F Value    Prob>F
   Model          1        86112.50000  86112.50000      32.297    0.0002
   Error         10        26662.41667   2666.24167
   C Total       11       112774.91667

   Root MSE     51.63566      R-square    0.7636
   Dep Mean    515.41667      Adj R-sq    0.7399

                        Parameter Estimates

                 Parameter       Standard     T for H0:
   Variable  DF   Estimate          Error   Parameter=0
   INTERCEP   1  307.916667    39.43738884         7.808
   SPACE      1   34.583333     6.08532121         5.683

   Obs  Dep Var  Predict  Std Err  Lower95%  Upper95%  Upper95%  Residual
         SALES     Value  Predict      Mean      Mean   Predict
     1   421.0     411.7   23.568     359.2     464.2     538.1    9.3333
     2   412.0     411.7   23.568     359.2     464.2     538.1    0.3333
     3   443.0     411.7   23.568     359.2     464.2     538.1   31.3333
     4   346.0     411.7   23.568     359.2     464.2     538.1  -65.6667
     5   526.0     515.4   14.906     482.2     548.6     635.2   10.5833
     6   581.0     515.4   14.906     482.2     548.6     635.2   65.5833
     7   434.0     515.4   14.906     482.2     548.6     635.2  -81.4167
     8   570.0     515.4   14.906     482.2     548.6     635.2   54.5833
     9   630.0     619.2   23.568     566.7     671.7     745.6   10.8333
    10   560.0     619.2   23.568     566.7     671.7     745.6  -59.1667
    11   590.0     619.2   23.568     566.7     671.7     745.6  -29.1667
    12   672.0     619.2   23.568     566.7     671.7     745.6   52.8333
4 Logistic Regression

Often, the outcome is nominal (or binary), and we wish to relate the probability that an outcome has the characteristic of interest to an interval scale predictor variable. For instance, a local service provider may be interested in the probability that a customer will redeem a coupon that is mailed to him/her as a function of the amount of the coupon. We would expect that as the value of the coupon increases, so does the proportion of coupons redeemed. An experiment could be conducted as follows:

- Choose a range of reasonable coupon values (say x = $0 (flyer only), $1, $2, $5, $10)
- Identify a sample of customers (say 200 households)
- Randomly assign customers to coupon values (say 40 per coupon value level)
- Send out coupons, and determine whether each coupon was redeemed by the expiration date (y = 1 if yes, 0 if no)
- Tabulate results and fit the estimated regression equation.

Note that probabilities are bounded by 0 and 1, so we cannot fit a linear regression, since it will provide fitted values outside this range (unless b₀ is between 0 and 1 and b₁ is 0). We consider the following model, which does force fitted probabilities to lie between 0 and 1:

p(x) = e^(β₀+β₁x) / (1 + e^(β₀+β₁x)),   e = 2.71828…

Unfortunately, unlike the case of simple linear regression, where there are closed form equations for least squares estimates of β₀ and β₁, computer software must be used to obtain maximum likelihood estimates of β₀ and β₁, as well as their standard errors. Fortunately, many software packages (e.g. SAS, SPSS, Statview) offer procedures to obtain the estimates, standard errors, and tests. We will give estimates and standard errors in this section, obtained from one of these packages. Once the estimates of β₀ and β₁ are obtained, which we will label as b₀ and b₁ respectively, we obtain the fitted equation:

p̂(x) = e^(b₀+b₁x) / (1 + e^(b₀+b₁x)),   e = 2.71828…

Consider a dose-response study (referred to below as the Viagra example), where for each of several doses x it was recorded whether each patient showed improvement (y = 1) or not (y = 0); the results are tabulated below. The fitted slope and the standard errors were:

sb₀ = .1354,   b₁ = 0.0313,   sb₁ = .0034
Dose (x)   # Responding (y=1)   # Not Responding (y=0)     n
   0               50                    149              199
  25               54                     42               96
  50               81                     24              105
 100               85                     16              101
A plot of the fitted equation (line) and the sample proportions at each dose (dots) are given in
Figure 14.
4.1 Testing for Association Between the Outcome and the Predictor

Recall the logistic regression model:

p(x) = e^(β₀+β₁x) / (1 + e^(β₀+β₁x)),   e = 2.71828…

Note that if β₁ = 0, then the equation becomes p(x) = e^β₀/(1 + e^β₀). That is, the probability that the outcome is the characteristic of interest is not related to x, the predictor variable. In terms of the Viagra example, this would mean that the probability a patient shows improvement is independent of dose. This is what we would expect if the drug were not effective (still allowing for a placebo effect).

Further, note that if β₁ > 0, the probability of the characteristic of interest occurring increases in x, and if β₁ < 0, the probability decreases in x. We can test whether β₁ = 0 as follows:

H₀: β₁ = 0 (Probability of outcome is independent of x)
HA: β₁ ≠ 0 (Probability of outcome is associated with x)

Test Statistic: X²obs = [b₁/sb₁]²

Rejection Region: X²obs ≥ χ²α,1 (= 3.841 for α = 0.05)

P-value: Area in the χ²₁ distribution above X²obs

For the dose-response data, X²obs = [.0313/.0034]² = (9.21)² = 84.8, far above 3.841. Thus, we have strong evidence of a positive association (since b₁ > 0 and we reject H₀) between probability of improvement and dose.
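For readers who want to reproduce such fits, here is a hedged sketch using the statsmodels package (the data layout is assumed from the dose-response table above; for a binomial GLM, a two-column response holds the counts of y = 1 and y = 0 at each dose):

import numpy as np
import statsmodels.api as sm

dose = np.array([0.0, 25.0, 50.0, 100.0])
resp = np.array([[50, 149], [54, 42], [81, 24], [85, 16]])   # (y=1, y=0) counts

fit = sm.GLM(resp, sm.add_constant(dose), family=sm.families.Binomial()).fit()
print(fit.params, fit.bse)   # slope near 0.0313 with standard error near 0.0034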
5 Multiple Regression

5.1 The Model
In general, if we have k predictor variables, we can write our response variable as:

y = β₀ + β₁x₁ + ··· + βkxk + ε.

Again, y is broken into a systematic and a random component:

y = β₀ + β₁x₁ + ··· + βkxk (systematic) + ε (random).

We make the same assumptions as before in terms of ε, specifically that they are independent and normally distributed with mean 0 and standard deviation σ. That is, we are assuming that y, at a given set of levels of the k independent variables (x₁, …, xk), is normal with mean E[y|x₁, …, xk] = β₀ + β₁x₁ + ··· + βkxk and standard deviation σ. Just as before, β₀, β₁, …, βk, and σ are unknown parameters that must be estimated from the sample data. The parameters βi represent the change in the mean response when the ith predictor variable changes by 1 unit and all other predictor variables are held constant.

In this model:

- y — Random outcome of the dependent variable
- β₀ — Regression constant (E(y|x₁ = ··· = xk = 0), if appropriate)
- βi — Partial regression coefficient for variable xi (change in E(y) when xi increases by 1 unit and all other x's are held constant)
- ε — Random error term, assumed (as before) to be N(0, σ)
- k — The number of independent variables

By the method of least squares (choosing the bi values that minimize SSE = Σ(yi − ŷi)²), we obtain the fitted equation:

ŷ = b₀ + b₁x₁ + b₂x₂ + ··· + bkxk,

and our estimate of σ:

se = √(Σ(y − ŷ)²/(n − k − 1)) = √(SSE/(n − k − 1)).
The Analysis of Variance table will be very similar to what we used previously, with the only adjustments being in the degrees of freedom. Table 10 shows the values for the general case when there are k predictor variables. We will rely on computer outputs to obtain the Analysis of Variance and the estimates b₀, b₁, …, bk.
                              ANOVA
Source of    Sum of               Degrees of     Mean
Variation    Squares              Freedom        Square                   F
MODEL        SSR = Σ(ŷi − ȳ)²     k              MSR = SSR/k              F = MSR/MSE
ERROR        SSE = Σ(yi − ŷi)²    n − k − 1      MSE = SSE/(n − k − 1)
TOTAL        SSyy = Σ(yi − ȳ)²    n − 1

Table 10: The Analysis of Variance Table for multiple regression
5.2 Testing for Association Between the Response and the Full Set of Predictor Variables

To see if the set of predictor variables is useful in predicting the response variable, we will test H₀: β₁ = β₂ = … = βk = 0. Note that if H₀ is true, then the mean response does not depend on the levels of the predictor variables. We interpret this to mean that there is no association between the response variable and the set of predictor variables. To test this hypothesis, we use the following method:

1. H₀: β₁ = β₂ = ··· = βk = 0

2. HA: Not every βi = 0

3. T.S.: Fobs = MSR/MSE

4. R.R.: Fobs > Fα,k,n−k−1

5. p-value: P(F > Fobs)
5.3 Testing Individual Regression Coefficients
If we reject the previous null hypothesis and conclude that not all of the βi are zero, we may wish to test whether individual βi are zero. Note that if we fail to reject the null hypothesis that βi is zero, we can drop the predictor xi from our model, thus simplifying the model. Note that this test is testing whether xi is useful given that we are already fitting a model containing the remaining k − 1 predictor variables. That is, does this variable contribute anything once we've taken into account the other predictor variables? These tests are t-tests, where we compute t = bi/sbi, just as we did in the section on making inferences concerning β₁ in simple regression. The procedure for testing whether βi = 0 (the ith predictor variable does not contribute to predicting the response given the other k − 1 predictor variables are in the model) is as follows:

H₀: βi = 0 (y is not associated with xi after controlling for all other independent variables)

(1) HA: βi ≠ 0
(2) HA: βi > 0
(3) HA: βi < 0

T.S.: tobs = bi/sbi

R.R.: (1) |tobs| ≥ tα/2,n−k−1   (2) tobs ≥ tα,n−k−1   (3) tobs ≤ −tα,n−k−1
5.4 Testing a Subset of the Regression Coefficients (Complete and Reduced Models)
We have seen the two extreme cases of testing whether all regression coefficients are simultaneously 0 (the F-test), and the case of testing whether a single regression coefficient is 0, controlling for all other predictors (the t-test). We can also test whether a subset of the k regression coefficients are 0, controlling for all other predictors. Note that the two extreme cases can be tested using this very general procedure.

To make the notation as simple as possible, suppose our model consists of k predictor variables, of which we'd like to test whether q (q ≤ k) are simultaneously not associated with y, after controlling for the remaining k − q predictor variables. Further assume that the k − q remaining predictors are labelled x₁, x₂, …, xk−q and that the q predictors of interest are labelled xk−q+1, xk−q+2, …, xk. This test is of the form:

H₀: βk−q+1 = βk−q+2 = ··· = βk = 0

The procedure for obtaining the numeric elements of the test is as follows:

1. Fit the model under the null hypothesis (βk−q+1 = βk−q+2 = ··· = βk = 0). It will include only the first k − q predictor variables. This is referred to as the Reduced model. Obtain the error sum of squares (SSE(R)) and the error degrees of freedom dfE(R) = n − (k − q) − 1.

2. Fit the model with all k predictors. This is referred to as the Complete or Full model (and was used for the F-test for all regression coefficients). Obtain the error sum of squares (SSE(F)) and the error degrees of freedom (dfE(F) = n − k − 1).

By definition of the least squares criterion, we know that SSE(R) ≥ SSE(F). We now obtain the test statistic:

T.S.: Fobs = { [SSE(R) − SSE(F)] / [(n − (k − q) − 1) − (n − k − 1)] } / { SSE(F)/(n − k − 1) } = { [SSE(R) − SSE(F)]/q } / MSE(F)

R.R.: Fobs ≥ Fα,q,n−k−1
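A small helper makes the bookkeeping explicit (a sketch; the SSE values come from fitting the two models separately, and the example call uses the Texas weather numbers that appear below):

def partial_f(sse_r, sse_f, n, k, q):
    # F statistic for H0: the q extra coefficients in the full model are all 0
    return ((sse_r - sse_f) / q) / (sse_f / (n - k - 1))

# SSE(R) = 60.935 with LATITUDE only, SSE(F) = 7.609 with all three
# predictors, n = 16, k = 3, q = 2  ->  about 42.06
print(partial_f(60.935, 7.609, 16, 3, 2))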
Example 5.1 Texas Weather Data

As an illustration, we model the average temperature (TEMP) at n = 16 Texas locations as a function of LATITUDE, LONGITUDE, and ELEVATION (an INCOME figure for each location is also listed). The data are given in Table 11.

LATITUDE   LONGITUDE    ELEV   TEMP   INCOME
 29.767      95.367       41    56     24322
 32.850      96.850      440    48     21870
 26.933      97.800       25    60     11384
 31.950     102.183     2851    46     24322
 34.800     102.467     3840    38     16375
 33.450      99.633     1461    46     14595
 28.700     100.483      815    53     10623
 32.450     100.533     2380    46     16486
 31.800     106.40      3918    44     15366
 34.850     100.217     2040    41     13765
 30.867     102.900     3000    47     17717
 36.350     102.083     3693    36     19036
 30.300      97.700      597    52     20514
 26.900      99.283      315    60     11523
 28.450      99.217      459    56     10563
 25.900      97.433       19    62     12931

Table 11: Data for the Texas weather example
                         ANOVA
Source of   Sum of            Degrees of        Mean
Variation   Squares           Freedom           Square                       F                             p-value
MODEL       SSR = 934.328     k = 3             MSR = 934.328/3 = 311.443    F = 311.443/0.634 = 491.235   .0001
ERROR       SSE = 7.609       16 − 3 − 1 = 12   MSE = 7.609/12 = 0.634
TOTAL       SSyy = 941.938    n − 1 = 15

Table 12: The Analysis of Variance Table for the Texas weather data
PARAMETER         ESTIMATE           t FOR H0: βi=0   P-VALUE   STANDARD ERROR OF ESTIMATE
INTERCEPT (β₀)    b₀ = 109.25887          36.68        .0001         2.97857
LATITUDE (β₁)     b₁ = −1.99323          −14.61        .0001         0.13639
LONGITUDE (β₂)    b₂ = −0.38471           −1.68        .1182         0.22858
ELEVATION (β₃)    b₃ = −0.00096           −1.68        .1181         0.00057

Table 13: Parameter estimates and tests of hypotheses for individual parameters
To test whether temperature is associated with the set of predictors (α = 0.05):

1. H₀: β₁ = β₂ = β₃ = 0

2. HA: Not all βi = 0

3. T.S.: Fobs = MSR/MSE = 311.443/0.634 = 491.235

4. R.R.: Fobs > F.05,3,12 = 3.49 (This is not provided on the output; the p-value takes its place.)

5. p-value: P(F > 491.235) = .0001 (Actually it is less than .0001, but this is the smallest p-value the computer will print.)
We conclude that at least one of these three variables is related to the response variable temperature. We also see from the individual t-tests that latitude is useful in predicting temperature, even after taking into account the other predictor variables.

The formal test (based on the α = 0.05 significance level) for determining whether temperature is associated with latitude after controlling for longitude and elevation is given here:

H₀: β₁ = 0 (TEMP (y) is not associated with LAT (x₁) after controlling for LONG (x₂) and ELEV (x₃))

HA: β₁ ≠ 0 (TEMP is associated with LAT after controlling for LONG and ELEV)

T.S.: tobs = b₁/sb₁ = −1.99323/0.13639 = −14.614

R.R.: |tobs| > t.025,12 = 2.179
Note from Table 13 that neither the coefficient for LONGITUDE (x₂) nor that for ELEVATION (x₃) is significant at the α = 0.05 significance level (p-values are .1182 and .1181, respectively). Recall these are testing whether each term is 0 controlling for LATITUDE and the other term.

Before concluding that neither LONGITUDE (x₂) nor ELEVATION (x₃) is a useful predictor, controlling for LATITUDE, we will test whether they are both simultaneously 0, that is:

H₀: β₂ = β₃ = 0   vs   HA: β₂ ≠ 0 and/or β₃ ≠ 0

Next, we fit the model with only LATITUDE (x₁) and obtain the error sum of squares SSE(R) = 60.935, and get the following test statistic:

T.S.: Fobs = [(60.935 − 7.609)/2] / [7.609/12] = 26.663/0.634 = 42.055

Since 42.055 >> F.05,2,12 = 3.89, we reject H₀, and conclude that LONGITUDE (x₂) and/or ELEVATION (x₃) are associated with TEMPERATURE (y), after controlling for LATITUDE (x₁).

The reason we failed to reject H₀: β₂ = 0 and H₀: β₃ = 0 individually based on the t-tests is that ELEVATION and LONGITUDE are highly correlated (elevations rise as you go further west in the state). So, once you control for LONGITUDE, we observe little ELEVATION effect, and vice versa. We will discuss why this is the case later. In theory, we have little reason to believe that temperatures naturally increase or decrease with LONGITUDE, but we may reasonably expect that as ELEVATION increases, TEMPERATURE decreases.
We refit the more parsimonious (simpler) model that uses ELEVATION (x₁) and LATITUDE (x₂) to predict TEMPERATURE (y). Note the new symbols for ELEVATION and LATITUDE; that is to show you that they are merely symbols. The results are given in Table 14 and Table 15.
                         ANOVA
Source of   Sum of            Degrees of        Mean
Variation   Squares           Freedom           Square                       F                             p-value
MODEL       SSR = 932.532     k = 2             MSR = 932.532/2 = 466.266    F = 466.266/0.724 = 644.014   .0001
ERROR       SSE = 9.406       16 − 2 − 1 = 13   MSE = 9.406/13 = 0.724
TOTAL       SSyy = 941.938    n − 1 = 15

Table 14: The Analysis of Variance Table for Texas data without LONGITUDE
We see that both remaining predictors are significant by observing that the t-statistic for testing H₀: β₂ = 0 (no latitude effect on temperature) is −17.65, corresponding to a p-value of .0001, and the t-statistic for testing H₀: β₁ = 0 (no elevation effect) is −8.41, also corresponding to a p-value of .0001. Further note that both estimates are negative, reflecting that as elevation and latitude increase, temperature decreases. That should not come as any big surprise.
PARAMETER         ESTIMATE          t FOR H0: βi=0   P-VALUE   STANDARD ERROR OF ESTIMATE
INTERCEPT (β₀)    b₀ = 63.45485          36.68        .0001         0.48750
ELEVATION (β₁)    b₁ = −0.00185          −8.41        .0001         0.00022
LATITUDE (β₂)     b₂ = −1.83216         −17.65        .0001         0.10380

Table 15: Parameter estimates and tests of hypotheses for individual parameters without LONGITUDE
The magnitudes of the estimated coefficients are quite different, which may make you believe that one predictor variable is more important than the other. This is not necessarily true, because the ranges of their levels are quite different (a 1 unit change in latitude represents a change of approximately 69 miles, while a unit change in elevation is 1 foot), and recall that βi represents the change in the mean response when variable xi is increased by 1 unit.
The data corresponding to the 16 locations in the sample are plotted in Figure 15 and the fitted
equation for the model that does not include LONGITUDE is plotted in Figure 16. The fitted
equation is a plane in three dimensions.
[Figure 15: Plot of TEMP versus LATITUDE and ELEVATION for the 16 locations]

[Figure 16: Fitted equation (a plane in three dimensions) for the model without LONGITUDE]
Example 5.2 Mortgage Financing Costs

A study of mortgage financing costs related the average mortgage rate (y) in each of n = 18 SMSAs to six characteristics of the SMSA (x₁, …, x₆); the data, fitted values, and residuals are given in Table 16. To test whether the subset x₂, x₅, x₆ contributes to the model, we will test:

H₀: β₂ = β₅ = β₆ = 0   vs   HA: β₂ ≠ 0 and/or β₅ ≠ 0 and/or β₆ ≠ 0

k = 6,   q = 3,   dfE(F) = 18 − 6 − 1 = 11,   dfE(R) = 18 − (6 − 3) − 1 = 14,   MSE(F) = 0.00998,   F.05,3,11 = 3.59
SMSA                       y     x1     x2     x3      x4     x5    x6     ŷ     e = y − ŷ
Los Angeles-Long Beach   6.17   78.1   3042   91.3   1738.1  45.5  33.1   6.19     -0.02
Denver                   6.06   77.0   1997   84.1   1110.4  51.8  21.9   6.04      0.02
San Francisco-Oakland    6.04   75.7   3162  129.3   1738.1  24.0  46.0   6.05     -0.01
Dallas-Fort Worth        6.04   77.4   1821   41.2    778.4  45.7  51.3   6.05     -0.01
Miami                    6.02   77.4   1542  119.1   1136.7  88.9  18.7   6.04     -0.02
Atlanta                  6.02   73.6   1074   32.3    582.9  39.9  26.6   5.92      0.10
Houston                  5.99   76.3   1856   45.2    778.4  54.1  35.7   6.02     -0.03
Seattle                  5.91   72.5   3024  109.7   1186.0  31.1  17.0   5.91      0.00
New York                 5.89   77.3    216  364.3   2582.4  11.9   7.3   5.82      0.07
Memphis                  5.87   77.4   1350  111.0    613.6  27.4  11.3   5.86      0.01
New Orleans              5.85   72.4   1544   81.0    636.1  27.3   8.1   5.81      0.04
Cleveland                5.75   67.0    631  202.7   1346.0  24.6  10.0   5.64      0.11
Chicago                  5.73   68.9    972  290.1   1626.8  20.1   9.4   5.60      0.13
Detroit                  5.66   70.7    699  223.4   1049.6  24.7  31.7   5.63      0.03
Minneapolis-St Paul      5.66   69.8   1377  138.4   1289.3  28.8  19.7   5.81     -0.15
Baltimore                5.63   72.9    399  125.4    836.3  22.9   8.6   5.77     -0.14
Philadelphia             5.57   68.7    304  259.5   1315.3  18.3  18.7   5.57      0.00
Boston                   5.28   67.8      0  428.2   2081.0   7.5   2.0   5.41     -0.13

Table 16: Data and fitted values for mortgage rate multiple regression example.
                         ANOVA
Source of   Sum of            Degrees of        Mean
Variation   Squares           Freedom           Square                       F                            p-value
MODEL       SSR = 0.73877     k = 6             MSR = 0.73877/6 = 0.12313    F = 0.12313/0.00998 = 12.33   .0003
ERROR       SSE = 0.10980     18 − 6 − 1 = 11   MSE = 0.10980/11 = 0.00998
TOTAL       SSyy = 0.84858    n − 1 = 17

Table 17: The Analysis of Variance Table for the mortgage rate regression analysis
PARAMETER        ESTIMATE          STANDARD ERROR   t-statistic   P-value
INTERCEPT (β₀)   b₀ = 4.28524         0.66825           6.41       .0001
x₁ (β₁)          b₁ = 0.02033         0.00931           2.18       .0515
x₂ (β₂)          b₂ = 0.000014        0.000047          0.29       .7775
x₃ (β₃)          b₃ = −0.00158        0.000753         −2.10       .0593
x₄ (β₄)          b₄ = 0.000202        0.000112          1.79       .1002
x₅ (β₅)          b₅ = 0.00128         0.00177           0.73       .4826
x₆ (β₆)          b₆ = 0.000236        0.00230           0.10       .9203

Table 18: Parameter estimates and tests of hypotheses for individual parameters, mortgage rate regression analysis
                         ANOVA
Source of   Sum of            Degrees of              Mean
Variation   Squares           Freedom                 Square                       F                             p-value
MODEL       SSR = 0.73265     k − q = 3               MSR = 0.73265/3 = 0.24422    F = 0.24422/0.00828 = 29.49   .0001
ERROR       SSE = 0.11593     18 − 3 − 1 = 14         MSE = 0.11593/14 = 0.00828
TOTAL       SSyy = 0.84858    n − 1 = 17

Table 19: The Analysis of Variance Table for the mortgage rate regression analysis (Reduced Model)
PARAMETER        ESTIMATE          STANDARD ERROR   t-statistic   P-value
INTERCEPT (β₀)   b₀ = 4.22260         0.58139           7.26       .0001
x₁ (β₁)          b₁ = 0.02229         0.00792           2.81       .0138
x₃ (β₃)          b₃ = −0.00186        0.00041778       −4.46       .0005
x₄ (β₄)          b₄ = 0.000225        0.000074          3.03       .0091

Table 20: Parameter estimates and tests of hypotheses for individual parameters, mortgage rate regression analysis (Reduced Model)
Next, we fit the reduced model, with β₂ = β₅ = β₆ = 0. We get the Analysis of Variance in Table 19 and parameter estimates in Table 20. Note first that all three remaining regression coefficients are significant now at the α = 0.05 significance level. Also, our residual standard error, se = √MSE, has also decreased (0.09991 to 0.09100). This implies we have lost very little predictive ability by dropping x₂, x₅, and x₆ from the model. Now to formally test whether these three predictor variables' regression coefficients are simultaneously 0 (with α = 0.05):

H₀: β₂ = β₅ = β₆ = 0    HA: β₂ ≠ 0 and/or β₅ ≠ 0 and/or β₆ ≠ 0

T.S.: Fobs = [(0.11593 − 0.10980)/3] / 0.00998 = 0.00204/0.00998 = 0.205

R.R.: Fobs ≥ F.05,3,11 = 3.59

Since 0.205 < 3.59, we fail to reject H₀, and conclude that x₂, x₅, and x₆ contribute nothing once x₁, x₃, and x₄ have been taken into account.
A study of liquor store sales related a store's sales to five characteristics of its location, including EMP (the amount of employment within 1.5 miles of the store). The regression coefficients and standard errors are given in Table 21.

Source: Lord, J.D. and C.D. Lynds (1981), "The Use of Regression Models in Store Location Research: A Review and Case Study," Akron Business and Economic Review, Summer, 13-19.

Variable    Estimate    Std Error
POP          0.09460      0.01819
MHI          0.06129      0.02057
DIS          4.88524      1.72623
TFL         −2.59040      1.22768
EMP         −0.00245      0.00454

Table 21: Regression coefficients and standard errors for liquor store sales study
a) Do any of these variables fail to be associated with store sales after controlling for the others?
b) Consider the signs of the significant regression coefficients. What do they imply?
5.5 R² and Adjusted R²
As was discussed in the previous chapter, the coefficient of multiple determination represents the proportion of the variation in the dependent variable (y) that is explained by the regression on the collection of independent variables (x₁, …, xk). R² is computed exactly as before:

R² = SSR/SSyy = 1 − SSE/SSyy.

One problem with R² is that when we continually add independent variables to a regression model, it continually increases (or at least, never decreases), even when the new variable(s) add little or no predictive power. Since we are trying to fit the simplest (most parsimonious) model that explains the relationship between the set of independent variables and the dependent variable, we need a measure that penalizes models that contain useless or redundant independent variables. This penalization takes into account that by including useless or redundant predictors, we are decreasing error degrees of freedom (dfE = n − k − 1). A second measure, which does not carry the proportion-of-variation-explained interpretation but is useful for comparing models of varying degrees of complexity, is Adjusted R²:

Adjusted R² = 1 − [SSE/(n − k − 1)] / [SSyy/(n − 1)] = 1 − [(n − 1)/(n − k − 1)] · (SSE/SSyy).
Example 5.1 (Continued) Texas Weather Data

For the full model with k = 3 predictors, SSE = 7.609 and SSyy = 941.938, so:

R²F = 1 − 7.609/941.938 = 1 − .008 = 0.992

Adj R²F = 1 − (15/12)(7.609/941.938) = 1 − 1.25(.008) = 0.9900

For the reduced model with k = 2 predictors (LONGITUDE dropped), SSE = 9.406, and we obtain R²R and Adj R²R:

R²R = 1 − 9.406/941.938 = 1 − .010 = 0.990

Adj R²R = 1 − (15/13)(9.406/941.938) = 1 − 1.15(.010) = 0.9885
Thus, by both measures the Full Model wins, but it should be added that both appear to fit
the data very well!
Example 5.2 (Continued) Mortgage Financing Costs

For the mortgage data (with total sum of squares SSyy = 0.84858 and n = 18), when we include all 6 independent variables in the full model, we obtain the following results:

SSR = 0.73877,   SSE = 0.10980,   k = 6.

From these, we get:

R²F = SSRF/SSyy = 0.73877/0.84858 = 0.8706

Adj R²F = 1 − (17/11)(SSEF/SSyy) = 1 − (17/11)(0.10980/0.84858) = 0.8000
Note that the F statistic for testing H₀: β₁ = ··· = βk = 0 can be written in terms of R²:

F = MSR/MSE = [SSR/k] / [SSE/(n − k − 1)] = [(SSR/SSyy)/k] / [(SSE/SSyy)/(n − k − 1)] = [R²/k] / [(1 − R²)/(n − k − 1)].

For the liquor store example, there were n = 16 stores and k = 5 variables in the full model, with a reported R² = 0.69. To test:

H₀: β₁ = β₂ = β₃ = β₄ = β₅ = 0   vs   HA: Not all βi = 0

we get the following test statistic and rejection region (α = 0.05):

T.S.: Fobs = (0.69/5) / [(1 − 0.69)/(16 − 5 − 1)] = 0.138/0.031 = 4.45

R.R.: Fobs ≥ F.05,5,10 = 3.33

Since 4.45 ≥ 3.33, we reject H₀ and conclude that the set of predictors is associated with sales.
5.6 Multicollinearity

Multicollinearity refers to the situation where the predictor variables are themselves highly correlated. One measure of the extent of multicollinearity is the variance inflation factor of each predictor:

VIFi = 1/(1 − R²i),

where R²i is the coefficient of multiple determination when xi is regressed on the k − 1 other independent variables. One rule of thumb suggests that severe multicollinearity is present if VIFi > 10 (R²i > .90).
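A sketch of the VIF computation with numpy (each column of the predictor matrix is regressed on the others via ordinary least squares):

import numpy as np

def vifs(X):
    # X: n-by-k matrix of predictor values, one column per predictor
    n, k = X.shape
    out = []
    for i in range(k):
        others = np.column_stack([np.ones(n), np.delete(X, i, axis=1)])
        beta, *_ = np.linalg.lstsq(others, X[:, i], rcond=None)
        resid = X[:, i] - others @ beta
        r2 = 1 - resid @ resid / np.sum((X[:, i] - X[:, i].mean()) ** 2)
        out.append(1 / (1 - r2))   # VIF_i = 1/(1 - R_i^2)
    return out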
Example 5.1 (Continued) Texas Weather Data

First, we run a regression with ELEVATION as the dependent variable and LATITUDE and LONGITUDE as the independent variables. We then repeat the process with LATITUDE as the dependent variable, and finally with LONGITUDE as the dependent variable. Table 22 gives R² and VIF for each model.

Variable      R²      VIF
ELEVATION   .9393    16.47
LATITUDE    .7635     4.23
LONGITUDE   .8940     9.43

Table 22: R² and VIF values for the three predictors in the Texas weather example
PARAMETER         ESTIMATE           STANDARD ERROR OF ESTIMATE
INTERCEPT (β₀)    b₀ = 109.25887          2.97857
LATITUDE (β₁)     b₁ = −1.99323           0.13639
LONGITUDE (β₂)    b₂ = −0.38471           0.22858
ELEVATION (β₃)    b₃ = −0.00096           0.00057

Table 23: Parameter estimates and standard errors for the full model

PARAMETER         ESTIMATE           STANDARD ERROR OF ESTIMATE
INTERCEPT (β₀)    b₀ = 63.45485           0.48750
ELEVATION (β₁)    b₁ = −0.00185           0.00022
LATITUDE (β₂)     b₂ = −1.83216           0.10380

Table 24: Parameter estimates and standard errors for the reduced model

Note how the standard errors of the LATITUDE and ELEVATION estimates shrink when the highly collinear LONGITUDE is dropped from the model.
5.7 Autocorrelation

When data are collected sequentially over time, the error terms are often not independent; rather, consecutive errors tend to be correlated. A common model for this is the first-order autoregressive error structure:

εt = ρεt−1 + νt,

where ρ is the correlation between consecutive error terms, and νt is a normally distributed independent error term. When errors display a positive correlation, ρ > 0 (consecutive error terms are associated). We can test this relation as follows; note that when ρ = 0, error terms are independent (which is the assumption in the derivation of the tests in the chapters on linear regression). To test H₀: ρ = 0 against HA: ρ > 0, we use the Durbin-Watson statistic:

D = Σₜ₌₂ⁿ (et − et−1)² / Σₜ₌₁ⁿ et²

D ≥ dU  ⇒  Don't Reject H₀
D ≤ dL  ⇒  Reject H₀
dL ≤ D ≤ dU  ⇒  Withhold judgement

Values of dL and dU (indexed by n and k, the number of predictor variables) are given in Table 11(a), p. B-22.
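The statistic itself is a one-liner once the residuals are in hand; a sketch (the result is then compared to the tabled dL and dU bounds, e.g. the D = 0.247 reported in the output below):

import numpy as np

def durbin_watson(e):
    # e: residuals in time order
    e = np.asarray(e)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)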
Cures for autocorrelation:

- Additional independent variable(s) — a variable may be missing from the model that will eliminate the autocorrelation.
- Transform the variables — take first differences (yt+1 − yt) and (xt+1 − xt) and run the regression with the transformed y and x.
Example: British Spirits Data

The data below are annual observations on the consumption of spirits (consume), income, and price in Britain over n = 69 consecutive years (the residual plot in Figure 17 spans 1870–1938). SAS output for the regression of consume on income and price, including the Durbin-Watson statistic, follows:

                                 Sum of        Mean
Source               DF         Squares       Square    F Value    Pr > F
Model                 2         4.80557      2.40278     712.27    <.0001
Error                66         0.22264      0.00337
Corrected Total      68         5.02821

Root MSE           0.05808      R-Square     0.9557
Dependent Mean     1.76999      Adj R-Sq     0.9544
Coeff Var          3.28143

                       Parameter Estimates

              Parameter     Standard
Variable  DF   Estimate        Error    t Value    Pr > |t|
Intercept  1    4.61171      0.15262      30.22      <.0001
income     1   -0.11846      0.10885      -1.09      0.2804
price      1   -1.23174      0.05024     -24.52      <.0001

Durbin-Watson D                0.247
Number of Observations            69
1st Order Autocorrelation      0.852
Obs   consume   income    price      yhat          e
  1    1.9565   1.7669   1.9176   2.04042   -0.08392
  2    1.9794   1.7766   1.9059   2.05368   -0.07428
  3    2.0120   1.7764   1.8798   2.08586   -0.07386
  4    2.0449   1.7942   1.8727   2.09249   -0.04759
  5    2.0561   1.8156   1.8984   2.05830   -0.00220
  6    2.0678   1.8083   1.9137   2.04032    0.02748
  7    2.0561   1.8083   1.9176   2.03552    0.02058
  8    2.0428   1.8067   1.9176   2.03571    0.00709
  9    2.0290   1.8166   1.9420   2.00448    0.02452
 10    1.9980   1.8041   1.9547   1.99032    0.00768
 11    1.9884   1.8053   1.9379   2.01087   -0.02247
 12    1.9835   1.8242   1.9462   1.99841   -0.01491
 13    1.9773   1.8395   1.9504   1.99142   -0.01412
 14    1.9748   1.8464   1.9504   1.99060   -0.01580
 15    1.9629   1.8492   1.9723   1.96330   -0.00040
 16    1.9396   1.8668   2.0000   1.92709    0.01251
 17    1.9309   1.8783   2.0097   1.91378    0.01712
 18    1.9271   1.8914   2.0146   1.90619    0.02091
 19    1.9239   1.9166   2.0146   1.90321    0.02069
 20    1.9414   1.9363   2.0097   1.90691    0.03449
 21    1.9685   1.9548   2.0097   1.90472    0.06378
 22    1.9727   1.9453   2.0097   1.90585    0.06685
 23    1.9736   1.9292   2.0048   1.91379    0.05981
 24    1.9499   1.9209   2.0097   1.90874    0.04116
 25    1.9432   1.9510   2.0296   1.88066    0.06254
 26    1.9569   1.9776   2.0399   1.86482    0.09208
 27    1.9647   1.9814   2.0399   1.86437    0.10033
 28    1.9710   1.9819   2.0296   1.87700    0.09400
 29    1.9719   1.9828   2.0146   1.89537    0.07653
 30    1.9956   2.0076   2.0245   1.88024    0.11536
 31    2.0000   2.0000   2.0000   1.91131    0.08869
 32    1.9904   1.9939   2.0048   1.90612    0.08428
 33    1.9752   1.9933   2.0048   1.90619    0.06901
 34    1.9494   1.9797   2.0000   1.91372    0.03568
 35    1.9332   1.9772   1.9952   1.91993    0.01327
 36    1.9139   1.9924   1.9952   1.91813   -0.00423
 37    1.9091   2.0117   1.9905   1.92163   -0.01253
 38    1.9139   2.0204   1.9813   1.93193   -0.01803
 39    1.8886   2.0018   1.9905   1.92280   -0.03420
 40    1.7945   2.0038   1.9859   1.92823   -0.13373
 41    1.7644   2.0099   2.0518   1.84634   -0.08194
 42    1.7817   2.0174   2.0474   1.85087   -0.06917
 43    1.7784   2.0279   2.0341   1.86601   -0.08761
 44    1.7945   2.0359   2.0255   1.87565   -0.08115
 45    1.7888   2.0216   2.0341   1.86675   -0.07795
 46    1.8751   1.9896   1.9445   1.98091   -0.10581
 47    1.7853   1.9843   1.9939   1.92069   -0.13539
 48    1.6075   1.9764   2.2082   1.65766   -0.05016
 49    1.5185   1.9965   2.2700   1.57916   -0.06066
 50    1.6513   2.0652   2.2430   1.60428    0.04702
 51    1.6247   2.0369   2.2567   1.59075    0.033946
 52    1.5391   1.9723   2.2988   1.54655   -0.007450
 53    1.4922   1.9797   2.3723   1.45514    0.037059
 54    1.4606   2.0136   2.4105   1.40407    0.056527
 55    1.4551   2.0165   2.4081   1.40669    0.048415
 56    1.4425   2.0213   2.4081   1.40612    0.036383
 57    1.4023   2.0206   2.4367   1.37097    0.031328
 58    1.3991   2.0563   2.4284   1.37697    0.022134
 59    1.3798   2.0579   2.4310   1.37357    0.006226
 60    1.3782   2.0649   2.4363   1.36622    0.011983
 61    1.3366   2.0582   2.4552   1.34373   -0.007131
 62    1.3026   2.0517   2.4838   1.30927   -0.006673
 63    1.2592   2.0491   2.4958   1.29480   -0.035600
 64    1.2365   2.0766   2.5048   1.28046   -0.043957
 65    1.2549   2.0890   2.5017   1.28281   -0.027906
 66    1.2527   2.1059   2.4958   1.28807   -0.035372
 67    1.2763   2.1205   2.4838   1.30112   -0.024823
 68    1.2906   2.1205   2.4636   1.32600   -0.035404
 69    1.2721   2.1182   2.4580   1.33317   -0.061074
Figure 17: Plot of the residuals versus year for British spirits data
6.1 Polynomial Regression
While certainly not restricted to this case, it is best to describe polynomial regression in the case of a model with only one predictor variable. In many real-world settings relationships will not be linear, but will demonstrate nonlinear associations. In economics, a widely described phenomenon is "diminishing marginal returns." In this case, y may increase with x, but the rate of increase decreases over the range of x. By adding quadratic terms, we can test if this is the case. Other situations may show that the rate of increase in y is increasing in x.

Example 6.1 Health Club Demand

Suppose a health club owner believes that daily attendance is related to the number of exercise machines she makes available, but that beyond some point each additional machine yields a smaller gain in attendance. She considers the model:

y = β₀ + β₁x + β₂x² + ε.

Again, we assume that ε ~ N(0, σ). In this model, the number of people attending in a day when there are x machines is normally distributed with mean β₀ + β₁x + β₂x² and standard deviation σ. Note that we are no longer saying that the mean is linearly related to x, but rather that it is approximately quadratically related to x (curved). Suppose she leases varying numbers of machines over a period of n = 12 Wednesdays (always advertising how many machines will be there on the following Wednesday), and observes the number of people attending the club each day, obtaining the data in Table 25.
In this case, we would like to fit the multiple regression model:

y = β₀ + β₁x + β₂x² + ε,

which is just like our previous model except that instead of a second predictor variable x₂, we are using the variable x²; the effect is that the fitted equation ŷ will be a curve in 2 dimensions, not a plane in 3 dimensions as we saw in the weather example. First we will run the regression on the computer, obtaining the Analysis of Variance and the parameter estimates, then plot the data and fitted equation. Table 26 gives the Analysis of Variance for this example and Table 27 gives the parameter estimates and their standard errors. Note that even though we have only one predictor variable, it is being used twice and could in effect be treated as two different predictor variables, so k = 2.
Week   # Machines (x)   Attendance (y)
  1          3               555
  2          6               776
  3          1               267
  4          2               431
  5          5               722
  6          4               635
  7          1               218
  8          5               692
  9          3               534
 10          2               459
 11          6               810
 12          4               671

Table 25: Data for health club example
ANOVA
Source of    Sum of             Degrees of         Mean                       F                      p-value
Variation    Squares            Freedom            Square
MODEL        SSR = 393933.12    k = 2              MSR = 393933.12/2          F = 196966.56/776.06   .0001
                                                   = 196966.56                = 253.80
ERROR        SSE = 6984.55      n-k-1 = 12-2-1     MSE = 6984.55/9
                                = 9                = 776.06
TOTAL        SSyy = 400917.67   n-1 = 11
Table 26: The Analysis of Variance Table for health club data
PARAMETER          ESTIMATE        t FOR H0: βi = 0   P-VALUE   STANDARD ERROR OF ESTIMATE
INTERCEPT (β0)     b0 = 72.0500    2.04               .0712     35.2377
MACHINES (β1)      b1 = 199.7625   8.67               .0001     23.0535
MACHINES SQ (β2)   b2 = -13.6518   -4.23              .0022     3.2239
Table 27: Parameter estimates and tests of hypotheses for individual parameters
If the null hypothesis is true, mean daily attendance is unrelated to the number of machines, and the club owner would purchase very few (if any) of the machines. As before, this test is the F-test from the Analysis of Variance table, which we conduct here at α = .05.
1. H0: β1 = β2 = 0
2. HA: Not both βi = 0
3. T.S.: Fobs = MSR/MSE = 196966.56/776.06 = 253.80
4. R.R.: Fobs > F2,9,.05 = 4.26 (This is not provided on the output; the p-value takes its place).
5. p-value: P(F > 253.80) = .0001 (Actually it is less than .0001, but this is the smallest p-value the computer will print).
Another test with an interesting interpretation is H0: β2 = 0. This tests the hypothesis that the mean increases linearly with x (since if β2 = 0 this becomes the simple regression model; refer back to the coffee data example). The t-test in Table 27 for this hypothesis has test statistic tobs = -4.23, which corresponds to a p-value of .0022; since this is below .05, we reject H0 and conclude β2 ≠ 0. Since b2 is negative, we conclude that β2 is negative, which is in agreement with her theory that once you get to a certain number of machines, it does not help to keep adding new machines. This is the idea of diminishing returns. Figure 18 shows the actual data and the fitted equation ŷ = 72.0500 + 199.7625x - 13.6518x².
Figure 18: Plot of the data and fitted equation for health club example
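To make the computation concrete, here is a minimal Python sketch (not part of the original notes; all variable names are our own) that reproduces the quadratic fit and the F-statistic from the Table 25 data using numpy's least squares:

```python
import numpy as np

# Health club data from Table 25: machines (x) and attendance (y).
x = np.array([3, 6, 1, 2, 5, 4, 1, 5, 3, 2, 6, 4], dtype=float)
y = np.array([555, 776, 267, 431, 722, 635, 218, 692, 534, 459, 810, 671], dtype=float)

# Design matrix with columns 1, x, x^2; least squares gives b0, b1, b2.
X = np.column_stack([np.ones_like(x), x, x**2])
b, _, _, _ = np.linalg.lstsq(X, y, rcond=None)

n, k = len(y), 2
yhat = X @ b
sse = np.sum((y - yhat) ** 2)          # error sum of squares
ssyy = np.sum((y - y.mean()) ** 2)     # total sum of squares
ssr = ssyy - sse                       # model sum of squares
F = (ssr / k) / (sse / (n - k - 1))

print(b)   # roughly (72.05, 199.76, -13.65), matching Table 27
print(F)   # roughly 253.8, matching Table 26
```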
6.2 Models with Dummy Variables
All of the predictor variables we have used so far were numeric, or what are often called quantitative variables. Variables called qualitative variables can also be used. Qualitative variables measure characteristics that cannot be described numerically, such as a person's sex, race, religion, or blood type; a city's region or mayor's political affiliation; the list of possibilities is endless. In this case, we frequently have some numeric predictor variable(s) that we believe is (are) related to the response variable, but we believe this relationship may be different for different levels of some qualitative variable of interest.
If a qualitative variable has m levels, we create m - 1 indicator or dummy variables. Consider an example where we are interested in health care expenditures as related to age for men and women, separately. In this case, the response variable is health care expenditures, one predictor variable is age, and we need to create a variable representing sex. This can be done by creating a variable x2 that takes on the value 1 if a person is female and 0 if the person is male. In this case we can write the mean response as before:
E[y|x1, x2] = β0 + β1 x1 + β2 x2.
Note that for women of age x1, the mean expenditure is E[y|x1, 1] = β0 + β1 x1 + β2(1) = (β0 + β2) + β1 x1, while for men of age x1, the mean expenditure is E[y|x1, 0] = β0 + β1 x1 + β2(0) = β0 + β1 x1.
This model allows for different means for men and women, but requires them to have the same slope (we will see a more general case in the next section). In this case the interpretation of β2 = 0 is that the means are the same for both sexes; this is a hypothesis a health care professional may wish to test in a study. In this example the variable sex had two levels, so we had to create 2 - 1 = 1 dummy variable. Now consider a second example.
Example 6.2
We would like to see if annual per capita clothing expenditures are related to annual per capita income in cities across the U.S. Further, we would like to see if there are any differences in the means across the 4 regions (Northeast, South, Midwest, and West). Since the variable region has 4 levels, we will create 3 dummy variables x2, x3, and x4 as follows (we leave x1 to represent the predictor variable per capita income):
x2 = 1 if region = South, 0 otherwise
x3 = 1 if region = Midwest, 0 otherwise
x4 = 1 if region = West, 0 otherwise
Note that cities in the Northeast have x2 = x3 = x4 = 0, while cities in other regions will have exactly one of x2, x3, or x4 equal to 1. Northeast cities act like the males did in the previous example. The data are given in Table 28.
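As an illustration of the dummy coding, a minimal Python sketch follows (ours, not from the notes; the income, region, and expenditure numbers are made up for the sketch, since Table 28 is not reproduced here):

```python
import numpy as np

# Hypothetical data: per capita income, region labels, clothing expenditures.
income = np.array([18000, 21000, 23000, 26000, 19500, 24000, 20500, 22500], dtype=float)
region = np.array(["Northeast", "South", "Midwest", "West",
                   "South", "West", "Midwest", "Northeast"])
y = np.array([1300, 1500, 1400, 1800, 1450, 1750, 1380, 1420], dtype=float)

# m = 4 regions -> m - 1 = 3 indicators, with Northeast as the baseline level.
x2 = (region == "South").astype(float)
x3 = (region == "Midwest").astype(float)
x4 = (region == "West").astype(float)

# Fit E[y | x1, ..., x4] = b0 + b1*income + b2*x2 + b3*x3 + b4*x4.
X = np.column_stack([np.ones_like(income), income, x2, x3, x4])
b, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
print(b)
```

With the actual Table 28 data in place of the made-up arrays, the same code would reproduce the estimates in Table 30.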
The Analysis of Variance is given in Table 29, and the parameter estimates and standard errors are given in Table 30.
Note that we would fail to reject H0: β1 = β2 = β3 = β4 = 0 at the α = .05 significance level if we looked only at the F-statistic and its p-value (Fobs = 2.93, p-value = .0562). This would lead us to conclude that there is no association between the predictor variables income and region and the response variable clothing expenditures. This is where you need to be careful when using multiple regression with many predictor variables. Look at the test of H0: β1 = 0, based on the t-test in Table 30. Here we observe tobs = 3.11, with a p-value of .0071. We thus conclude β1 ≠ 0, and that clothing expenditures are related to income, as we would expect. However, we do fail to reject H0: β2 = 0, H0: β3 = 0, and H0: β4 = 0, so we fail to observe any differences among the regions in terms of clothing expenditures after adjusting for the variable income. Figure 19 and Figure 20 show the original data using region as the plotting symbol and the 4 fitted equations corresponding to the 4 regions.
ANOVA
Source of    Sum of      Degrees of   Mean       F      p-value
Variation    Squares     Freedom      Square
MODEL        1116419.0   4            279104.7   2.93   .0562
ERROR        1426640.2   15           95109.3
TOTAL        2543059.2   19
Table 29: The Analysis of Variance Table for clothes expenditure data
PARAMETER        ESTIMATE   t FOR H0: βi = 0   P-VALUE   STANDARD ERROR OF ESTIMATE
INTERCEPT (β0)   657.428    0.82               .4229     797.948
x1 (β1)          0.113      3.11               .0071     0.036
x2 (β2)          237.494    1.17               .2609     203.264
x3 (β3)          21.691     0.11               .9140     197.536
x4 (β4)          254.992    1.30               .2130     196.036
Table 30: Parameter estimates and tests of hypotheses for individual parameters
Recall that the fitted equation is ŷ = 657.428 + 0.113x1 + 237.494x2 + 21.691x3 + 254.992x4, and each of the regions has a different combination of levels of the variables x2, x3, and x4.
[Figures 19 and 20: scatterplots of clothing expenditures (Y) versus per capita income (X1), with region as the plotting symbol (N, S, M, W), together with the four fitted regression lines.]
6.3 Models with Interaction Terms
In some situations, two or more predictor variables may interact in terms of their effects on the mean response. That is, the effect on the mean response of changing the level of one predictor variable depends on the level of another predictor variable. This idea is easiest to understand in the case where one of the variables is qualitative.
Example 6.3 Truck and SUV Safety Ratings
Several years ago, The Wall Street Journal reported safety scores on 33 models of SUVs and trucks. Safety scores (y) were reported, as well as each vehicle's weight (x1) and an indicator of whether the vehicle has side air bags (x2 = 1 if it does, 0 if not). We fit a model relating safety scores to weight, presence of side airbags, and an interaction term that allows the effect of weight to depend on whether side airbags are present:
y = β0 + β1 x1 + β2 x2 + β3 x1 x2 + ε.
We can write the equations for the two side airbag types as follows:
Side airbags: y = β0 + β1 x1 + β2(1) + β3 x1(1) + ε = (β0 + β2) + (β1 + β3) x1 + ε,
and
No side airbags: y = β0 + β1 x1 + β2(0) + β3 x1(0) + ε = β0 + β1 x1 + ε.
The data for the 33 models are given in Table 31.
The Analysis of Variance table for this example is given in Table 32. Note that R² = .5518. Table 33 provides the parameter estimates, standard errors, and individual t-tests. Note that the F-test for testing H0: β1 = β2 = β3 = 0 rejects the null hypothesis (F = 11.90, P-value = .0001), but none of the individual t-tests are significant (all P-values exceed 0.05). This can happen due to the nature of the partial regression coefficients. It can be seen that weight is a very good predictor, and that the presence of side airbags and the interaction term do not contribute much to the model (SSE for a model containing only Weight (x1) is 3493.7; use this to test H0: β2 = β3 = 0).
For vehicles with side airbags the fitted equation is:
ŷ_airbags = (b0 + b2) + (b1 + b3) x1 = 44.18 + 0.02162 x1,
while for vehicles without airbags, the fitted equation is:
ŷ_noairbags = b0 + b1 x1 = 76.09 + 0.01262 x1.
Figure 21 shows the two fitted equations for the safety data.
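A small Python sketch (ours, using only the estimates reported in Table 33) shows how the interaction term makes the weight effect depend on the airbag indicator:

```python
# Fitted coefficients from Table 33 for y = b0 + b1*x1 + b2*x2 + b3*x1*x2.
b0, b1, b2, b3 = 76.09, 0.01262, -31.91, 0.0090

def predicted_safety(weight, airbag):
    """Fitted safety score for a vehicle of given weight (x1) and airbag indicator (x2)."""
    return b0 + b1 * weight + b2 * airbag + b3 * weight * airbag

# The two cases collapse to the fitted lines given in the text:
print(predicted_safety(4000, 1))  # (b0 + b2) + (b1 + b3)*4000 = 44.18 + 0.02162*4000
print(predicted_safety(4000, 0))  # b0 + b1*4000 = 76.09 + 0.01262*4000
```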
[Figure 21: fitted safety score versus weight for the side airbag and no side airbag groups.]
Make            Weight (x1)   Airbag (x2)
TOYOTA          3437          1
CHEVROLET       3454          1
FORD            3543          0
BUICK           3610          1
MAZDA           3660          1
PLYMOUTH        3665          0
VOLVO           3698          1
AUDI            3751          1
DODGE           3765          0
ACURA           3824          1
PONTIAC         3857          1
CHRYSLER        3918          0
FORD            3926          0
TOYOTA          3945          0
MERCURY         3951          0
ISUZU           3966          0
TOYOTA          3973          0
MERCURY         4041          0
LINCOLN         4087          1
FORD            4125          0
FORD            4126          1
NISSAN          4147          1
OLDSMOBILE      4164          0
HONDA           4244          0
MERCURY         4258          1
TOYOTA          4356          0
MERCEDES-BENZ   4396          1
FORD            4760          0
DODGE           4884          0
LINCOLN         4890          1
DODGE           4896          0
CADILLAC        5372          1
CHEVROLET       5759          1
Table 31: Weights and side airbag indicators for the 33 truck and SUV models
ANOVA
Source of    Sum of    Degrees of   Mean      F       p-value
Variation    Squares   Freedom      Square
MODEL        3838.03   3            1279.34   11.90   .0001
ERROR        3116.88   29           107.48
TOTAL        6954.91   32
Table 32: The Analysis of Variance Table for truck/SUV safety data
PARAMETER        ESTIMATE   STANDARD ERROR OF ESTIMATE   t FOR H0: βi = 0   P-VALUE
INTERCEPT (β0)   76.09      27.04                        2.81               .0087
x1 (β1)          0.01262    0.0065                       1.93               .0629
x2 (β2)          -31.91     31.78                        -1.00              .3236
x1 x2 (β3)       0.0090     .0076                        1.18               .2487
Table 33: Parameter estimates and tests of hypotheses for individual parameters - safety data
7.1
[Plot of yearly bales (yrkbales) versus year, 1978-2001.]
Figure 23: Plot of quarterly Texas in-state FIRE gross sales 1989-2002
7.2 Smoothing Techniques
t    year   quarter   fitted sales (ŷt)   ratio (yt/ŷt)
1    1989   1         1.725               0.908
2    1989   2         1.813               1.102
3    1989   3         1.900               1.015
4    1989   4         1.988               1.586
5    1990   1         2.075               1.016
6    1990   2         2.163               0.926
7    1990   3         2.250               0.873
8    1990   4         2.338               1.345
9    1991   1         2.425               0.763
10   1991   2         2.513               0.916
11   1991   3         2.600               0.850
12   1991   4         2.688               1.499
13   1992   1         2.776               0.884
14   1992   2         2.863               0.886
15   1992   3         2.951               0.949
16   1992   4         3.038               1.558
17   1993   1         3.126               0.853
18   1993   2         3.213               1.013
19   1993   3         3.301               0.924
20   1993   4         3.388               1.566
21   1994   1         3.476               0.849
22   1994   2         3.563               0.895
23   1994   3         3.651               0.829
24   1994   4         3.738               1.297
25   1995   1         3.826               0.785
26   1995   2         3.913               0.843
27   1995   3         4.001               0.825
28   1995   4         4.089               1.127
29   1996   1         4.176               0.798
30   1996   2         4.264               0.786
31   1996   3         4.351               0.788
32   1996   4         4.439               1.251
33   1997   1         4.526               0.728
34   1997   2         4.614               0.788
35   1997   3         4.701               0.832
36   1997   4         4.789               1.357
37   1998   1         4.876               0.830
38   1998   2         4.964               0.931
39   1998   3         5.051               0.893
40   1998   4         5.139               1.264
41   1999   1         5.226               0.829
42   1999   2         5.314               0.858
43   1999   3         5.401               0.851
44   1999   4         5.489               1.393
45   2000   1         5.577               0.824
46   2000   2         5.664               0.933
47   2000   3         5.752               0.897
48   2000   4         5.839               1.342
49   2001   1         5.927               0.870
50   2001   2         6.014               0.883
51   2001   3         6.102               0.874
52   2001   4         6.189               1.684
53   2002   1         6.277               0.860
54   2002   2         6.364               0.916
55   2002   3         6.452               0.847
56   2002   4         6.539               1.303
Table 34: Quarterly in-state gross sales for Texas FIRE firms
One simple way to smooth a series is with moving averages; odd-numbered moving averages are the easiest to implement (the textbook also covers even numbered MAs as well). A 3-period moving average involves averaging the value directly prior to the current time point, the current value, and the value directly after the current time point. There will not be values for either the first or last periods of the series. Similarly, a 5-period moving average will include the current time point, the two prior time points, and the two subsequent time points.
Example 7.3 - U.S. Internet Retail Sales - 1999q4-2003q1
The data in Table 35 give U.S. e-commerce sales for n = 14 quarters (quarter 1 is the 4th quarter of 1999 and quarter 14 is preliminary reported sales for the 1st quarter of 2003) in millions of dollars (Source: U.S. Census Bureau).
Quarter   Sales (yt)   MA(3)   ES(0.1)   ES(0.5)
1         5393         .       5393      5393
2         5722         5788    5426      5558
3         6250         6350    5508      5904
4         7079         7526    5665      6491
5         9248         8112    6024      7870
6         8009         8387    6222      7939
7         7904         7936    6390      7922
8         7894         8862    6541      7908
9         10788        9384    6965      9348
10        9470         10006   7216      9409
11        9761         9899    7470      9585
12        10465        11332   7770      10025
13        13770        12052   8370      11897
14        11921        .       8725      11909
Table 35: Quarterly e-commerce sales and smoothed values for U.S. 1999q4-2003q1
To obtain the three-period moving average (MA(3)) for the second quarter, we average the first, second, and third period sales:
(5393 + 5722 + 6250)/3 = 17365/3 = 5788.3 ≈ 5788.
We can similarly obtain the three-period moving averages for quarters 3-13. The data and three-period moving averages are given in Figure 24. The moving average is the dashed line, while the original series is the solid line.
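A minimal Python sketch (ours, not the textbook's software) of the centered 3-period moving average applied to the Table 35 sales:

```python
import numpy as np

# Quarterly e-commerce sales from Table 35, in millions of dollars.
sales = np.array([5393, 5722, 6250, 7079, 9248, 8009, 7904,
                  7894, 10788, 9470, 9761, 10465, 13770, 11921], dtype=float)

# Centered 3-period moving average: undefined for the first and last quarters.
ma3 = np.full(len(sales), np.nan)
for t in range(1, len(sales) - 1):
    ma3[t] = sales[t - 1 : t + 2].mean()

print(np.round(ma3))  # ma3[1] = (5393 + 5722 + 6250)/3 = 5788, as computed above
```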
Exponential smoothing is an alternative means of smoothing the series. It makes use of all prior time points, with higher weights on more recent time points and exponentially decaying weights on more distant time points. One advantage is that we have smoothed values for all time points. One drawback is that we must select a tuning parameter (although we would also have to choose the length of a moving average for that method). One widely used convention is to set the first period's smoothed value to the first observation, then compute each subsequent smoothed value as a weighted average of the current observation and the previous value of the smoothed series. We use the notation St for the smoothed value at time t.
Figure 24: Plot of quarterly U.S. internet retail sales and 3-Period moving average
S1 = y1
St = w yt + (1 - w) St-1,   t ≥ 2
We illustrate with smoothing constants w = 0.1 and w = 0.5.
The smoothed values are given in Table 35, as well as in Figure 25. The solid line is the original
series, the smoothest line is w = 0.1, and the intermediate line is w = 0.5.
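The recursion is easy to code; this sketch (ours) reproduces the ES(0.1) and ES(0.5) columns of Table 35, the two smoothed series plotted in Figure 25:

```python
import numpy as np

sales = np.array([5393, 5722, 6250, 7079, 9248, 8009, 7904,
                  7894, 10788, 9470, 9761, 10465, 13770, 11921], dtype=float)

def exp_smooth(y, w):
    """Exponentially smoothed series: S1 = y1; St = w*yt + (1-w)*S(t-1) for t >= 2."""
    s = np.empty(len(y))
    s[0] = y[0]
    for t in range(1, len(y)):
        s[t] = w * y[t] + (1 - w) * s[t - 1]
    return s

print(np.round(exp_smooth(sales, 0.1)))  # ES(0.1) column: 5393, 5426, 5508, ...
print(np.round(exp_smooth(sales, 0.5)))  # ES(0.5) column: 5393, 5558, 5904, ...
```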
Figure 25: Plot of quarterly U.S. internet retail sales and the exponentially smoothed series (w = 0.1 and w = 0.5)
7.3 Seasonal Adjustment
Consider the Texas gross (in-state) sales for the FIRE industry. The seasons are the four quarters. Fitting a simple linear regression relating sales to time period, we get:
ŷt = b0 + b1 t = 1.6376 + 0.08753 t
The fitted values (as well as the observed values) have been shown previously in Table 34. Also, for each outcome we obtain the ratio of the observed to fitted value, also given in the table. Consider the first and last cases:
t = 1:  y1 = 1.567   y1/ŷ1 = 1.567/1.725 = 0.908
t = 56: y56 = 8.522  y56/ŷ56 = 8.522/6.539 = 1.303
Next, we take the mean of the observed-to-fitted ratio for each quarter. There are 14 years of data:
Q1: (0.908 + 1.016 + 0.763 + 0.884 + 0.853 + 0.849 + 0.785 + 0.798 + 0.728 + 0.830 + 0.829 + 0.824 + 0.870 + 0.860)/14 = 0.843
Q2: 0.906
Q3: 0.875
Q4: 1.398
The means sum to 4.022, and have a mean of 4.022/4 = 1.0055. If we divide each mean by 1.0055, the indexes will average to 1:
Q1: 0.838
Q2: 0.901
Q3: 0.870
Q4: 1.390
The seasonally adjusted time series is given by dividing each observed value by its seasonal index.
This way, we can determine when there are real changes in the series, beyond seasonal fluctuations.
Table 36 contains all components as well as the seasonally adjusted values.
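The index computation can be verified with a short Python sketch (ours), using the 56 observed-to-fitted ratios from Table 34:

```python
import numpy as np

# Ratios y_t / yhat_t from Table 34, in time order (t = 1, ..., 56).
ratio = np.array([
    0.908, 1.102, 1.015, 1.586, 1.016, 0.926, 0.873, 1.345,
    0.763, 0.916, 0.850, 1.499, 0.884, 0.886, 0.949, 1.558,
    0.853, 1.013, 0.924, 1.566, 0.849, 0.895, 0.829, 1.297,
    0.785, 0.843, 0.825, 1.127, 0.798, 0.786, 0.788, 1.251,
    0.728, 0.788, 0.832, 1.357, 0.830, 0.931, 0.893, 1.264,
    0.829, 0.858, 0.851, 1.393, 0.824, 0.933, 0.897, 1.342,
    0.870, 0.883, 0.874, 1.684, 0.860, 0.916, 0.847, 1.303])
quarter = np.tile([1, 2, 3, 4], 14)

raw = np.array([ratio[quarter == q].mean() for q in (1, 2, 3, 4)])
index = raw / raw.mean()           # rescale so the four indexes average to 1

print(np.round(raw, 3))            # about [0.843, 0.906, 0.875, 1.398]
print(np.round(index, 3))          # about [0.838, 0.901, 0.870, 1.390]
# Seasonally adjusted series: each observed y_t divided by its quarter's index.
```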
t    year   quarter   fitted (ŷt)   ratio (yt/ŷt)   season adjusted
1    1989   1         1.725         0.908           1.870
2    1989   2         1.813         1.102           2.218
3    1989   3         1.900         1.015           2.218
4    1989   4         1.988         1.586           2.268
5    1990   1         2.075         1.016           2.515
6    1990   2         2.163         0.926           2.224
7    1990   3         2.250         0.873           2.259
8    1990   4         2.338         1.345           2.263
9    1991   1         2.425         0.763           2.207
10   1991   2         2.513         0.916           2.556
11   1991   3         2.600         0.850           2.540
12   1991   4         2.688         1.499           2.899
13   1992   1         2.776         0.884           2.929
14   1992   2         2.863         0.886           2.815
15   1992   3         2.951         0.949           3.218
16   1992   4         3.038         1.558           3.405
17   1993   1         3.126         0.853           3.182
18   1993   2         3.213         1.013           3.614
19   1993   3         3.301         0.924           3.506
20   1993   4         3.388         1.566           3.818
21   1994   1         3.476         0.849           3.521
22   1994   2         3.563         0.895           3.540
23   1994   3         3.651         0.829           3.477
24   1994   4         3.738         1.297           3.487
25   1995   1         3.826         0.785           3.585
26   1995   2         3.913         0.843           3.660
27   1995   3         4.001         0.825           3.794
28   1995   4         4.089         1.127           3.314
29   1996   1         4.176         0.798           3.977
30   1996   2         4.264         0.786           3.720
31   1996   3         4.351         0.788           3.942
32   1996   4         4.439         1.251           3.994
33   1997   1         4.526         0.728           3.934
34   1997   2         4.614         0.788           4.037
35   1997   3         4.701         0.832           4.493
36   1997   4         4.789         1.357           4.675
37   1998   1         4.876         0.830           4.829
38   1998   2         4.964         0.931           5.129
39   1998   3         5.051         0.893           5.183
40   1998   4         5.139         1.264           4.672
41   1999   1         5.226         0.829           5.171
42   1999   2         5.314         0.858           5.058
43   1999   3         5.401         0.851           5.282
44   1999   4         5.489         1.393           5.501
45   2000   1         5.577         0.824           5.485
46   2000   2         5.664         0.933           5.862
47   2000   3         5.752         0.897           5.928
48   2000   4         5.839         1.342           5.636
49   2001   1         5.927         0.870           6.152
50   2001   2         6.014         0.883           5.896
51   2001   3         6.102         0.874           6.128
52   2001   4         6.189         1.684           7.498
53   2002   1         6.277         0.860           6.440
54   2002   2         6.364         0.916           6.473
55   2002   3         6.452         0.847           6.283
56   2002   4         6.539         1.303           6.131
Table 36: Quarterly in-state gross sales for Texas FIRE firms and seasonally adjusted series
7.4 Introduction to Forecasting
Mean Absolute Deviation (MAD): MAD = Σ|et| / (number of forecasts) = Σ_{t=1}^{n} |yt - Ft| / n
Mean Square Error (MSE): MSE = Σ et² / (number of forecasts) = Σ_{t=1}^{n} (yt - Ft)² / n
When comparing forecasting methods, we wish to minimize one or both of these quantities.
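Both criteria are one-liners in code; here is a minimal sketch (ours, with a tiny made-up series purely for illustration):

```python
import numpy as np

def mad(y, f):
    """Mean absolute deviation of forecasts f from outcomes y."""
    return np.mean(np.abs(y - f))

def mse(y, f):
    """Mean squared forecast error."""
    return np.mean((y - f) ** 2)

y = np.array([2.0, 1.9, 2.3, 3.1])   # made-up outcomes
f = np.array([1.9, 2.0, 1.9, 2.3])   # made-up forecasts
print(mad(y, f), mse(y, f))
```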
7.5 Forecasting Methods
7.5.1 Moving Averages
Moving Averages
This method, which is not included in the text, is a slight adjustment to the centered moving averages in the smoothing section. At time point t, we use the previous k observations to forecast yt; the forecast is the mean of the last k observed values:
Ft = (y_{t-1} + y_{t-2} + ... + y_{t-k}) / k
Problem: How do we choose k?
Example 7.4 - Anheuser-Busch Annual Dividend Yields 1952-1995
Table 37 gives average dividend yields for Anheuser-Busch for the years 1952-1995 (Source: Value Line), along with forecasts and errors based on moving averages with lags of 1, 2, and 3. Note that we don't have forecasts for the earliest years, and the longer the lag, the longer we must wait until we get our first forecast.
Here we compute the moving average forecasts for year 1963:
1-Year: F1963 = y1962 = 3.2
2-Year: F1963 = (y1962 + y1961)/2 = (3.2 + 2.8)/2 = 3.0
3-Year: F1963 = (y1962 + y1961 + y1960)/3 = (3.2 + 2.8 + 4.4)/3 = 3.47
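A Python sketch (ours) of the lagged moving-average forecast; the indexing check uses the first twelve dividend yields of Table 37:

```python
import numpy as np

def ma_forecast(y, k):
    """F_t = mean of the k observations before t; undefined for the first k periods."""
    f = np.full(len(y), np.nan)
    for t in range(k, len(y)):
        f[t] = y[t - k : t].mean()
    return f

# Dividend yields 1952-1963 from Table 37.
y = np.array([5.3, 4.2, 3.9, 5.2, 5.8, 6.3, 5.6, 4.8, 4.4, 2.8, 3.2, 3.1])
f2 = ma_forecast(y, 2)
print(f2[11])   # 1963 forecast: (3.2 + 2.8)/2 = 3.0, as above
```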
t    Year   yt     F1,t   e1,t    F2,t   e2,t    F3,t   e3,t
1    1952   5.30   .      .       .      .       .      .
2    1953   4.20   5.30   -1.10   .      .       .      .
3    1954   3.90   4.20   -0.30   4.75   -0.85   .      .
4    1955   5.20   3.90   1.30    4.05   1.15    4.47   0.73
5    1956   5.80   5.20   0.60    4.55   1.25    4.43   1.37
6    1957   6.30   5.80   0.50    5.50   0.80    4.97   1.33
7    1958   5.60   6.30   -0.70   6.05   -0.45   5.77   -0.17
8    1959   4.80   5.60   -0.80   5.95   -1.15   5.90   -1.10
9    1960   4.40   4.80   -0.40   5.20   -0.80   5.57   -1.17
10   1961   2.80   4.40   -1.60   4.60   -1.80   4.93   -2.13
11   1962   3.20   2.80   0.40    3.60   -0.40   4.00   -0.80
12   1963   3.10   3.20   -0.10   3.00   0.10    3.47   -0.37
13   1964   3.10   3.10   0.00    3.15   -0.05   3.03   0.07
14   1965   2.60   3.10   -0.50   3.10   -0.50   3.13   -0.53
15   1966   2.00   2.60   -0.60   2.85   -0.85   2.93   -0.93
16   1967   1.60   2.00   -0.40   2.30   -0.70   2.57   -0.97
17   1968   1.30   1.60   -0.30   1.80   -0.50   2.07   -0.77
18   1969   1.20   1.30   -0.10   1.45   -0.25   1.63   -0.43
19   1970   1.20   1.20   0.00    1.25   -0.05   1.37   -0.17
20   1971   1.10   1.20   -0.10   1.20   -0.10   1.23   -0.13
21   1972   0.90   1.10   -0.20   1.15   -0.25   1.17   -0.27
22   1973   1.40   0.90   0.50    1.00   0.40    1.07   0.33
23   1974   2.00   1.40   0.60    1.15   0.85    1.13   0.87
24   1975   1.90   2.00   -0.10   1.70   0.20    1.43   0.47
25   1976   2.30   1.90   0.40    1.95   0.35    1.77   0.53
26   1977   3.10   2.30   0.80    2.10   1.00    2.07   1.03
27   1978   3.50   3.10   0.40    2.70   0.80    2.43   1.07
28   1979   3.80   3.50   0.30    3.30   0.50    2.97   0.83
29   1980   3.70   3.80   -0.10   3.65   0.05    3.47   0.23
30   1981   3.10   3.70   -0.60   3.75   -0.65   3.67   -0.57
31   1982   2.60   3.10   -0.50   3.40   -0.80   3.53   -0.93
32   1983   2.40   2.60   -0.20   2.85   -0.45   3.13   -0.73
33   1984   3.00   2.40   0.60    2.50   0.50    2.70   0.30
34   1985   2.40   3.00   -0.60   2.70   -0.30   2.67   -0.27
35   1986   1.80   2.40   -0.60   2.70   -0.90   2.60   -0.80
36   1987   1.70   1.80   -0.10   2.10   -0.40   2.40   -0.70
37   1988   2.20   1.70   0.50    1.75   0.45    1.97   0.23
38   1989   2.10   2.20   -0.10   1.95   0.15    1.90   0.20
39   1990   2.40   2.10   0.30    2.15   0.25    2.00   0.40
40   1991   2.10   2.40   -0.30   2.25   -0.15   2.23   -0.13
41   1992   2.20   2.10   0.10    2.25   -0.05   2.20   0.00
42   1993   2.70   2.20   0.50    2.15   0.55    2.23   0.47
43   1994   3.00   2.70   0.30    2.45   0.55    2.33   0.67
44   1995   2.80   3.00   -0.20   2.85   -0.05   2.63   0.17
Table 37: Dividend yields, forecasts, and errors for 1-, 2-, and 3-year moving averages
Figure 26: Plot of the data and moving average forecasts for Anheuser-Busch dividend data
7.5.2 Exponential Smoothing
Exponential smoothing is a method of forecasting that weights data from previous time periods with exponentially decreasing magnitudes. Forecasts can be written as follows, where the forecast for period 2 is traditionally (but not always) simply the outcome from period 1:
Ft+1 = St = w yt + (1 - w) St-1 = w yt + (1 - w) Ft
where:
Ft+1 is the forecast for period t + 1
yt is the outcome at time t
St is the smoothed value for period t (St-1 = Ft)
w is the smoothing constant (0 ≤ w ≤ 1)
Forecasts are smoother than the raw data, and the weights of previous observations decline exponentially with time.
Example 7.4 (Continued)
We use three smoothing constants (allowing decreasing amounts of smoothness) for illustration:
w = 0.2: Ft+1 = 0.2 yt + 0.8 St-1 = 0.2 yt + 0.8 Ft
w = 0.5: Ft+1 = 0.5 yt + 0.5 St-1 = 0.5 yt + 0.5 Ft
w = 0.8: Ft+1 = 0.8 yt + 0.2 St-1 = 0.8 yt + 0.2 Ft
For year 2 (1953) we set F1953 = y1952, then cycle from there.
t    Year   yt     Fw=.2,t   ew=.2,t   Fw=.5,t   ew=.5,t   Fw=.8,t   ew=.8,t
1    1952   5.30   .         .         .         .         .         .
2    1953   4.20   5.30      -1.10     5.30      -1.10     5.30      -1.10
3    1954   3.90   5.08      -1.18     4.75      -0.85     4.42      -0.52
4    1955   5.20   4.84      0.36      4.33      0.88      4.00      1.20
5    1956   5.80   4.92      0.88      4.76      1.04      4.96      0.84
6    1957   6.30   5.09      1.21      5.28      1.02      5.63      0.67
7    1958   5.60   5.33      0.27      5.79      -0.19     6.17      -0.57
8    1959   4.80   5.39      -0.59     5.70      -0.90     5.71      -0.91
9    1960   4.40   5.27      -0.87     5.25      -0.85     4.98      -0.58
10   1961   2.80   5.10      -2.30     4.82      -2.02     4.52      -1.72
11   1962   3.20   4.64      -1.44     3.81      -0.61     3.14      0.06
12   1963   3.10   4.35      -1.25     3.51      -0.41     3.19      -0.09
13   1964   3.10   4.10      -1.00     3.30      -0.20     3.12      -0.02
14   1965   2.60   3.90      -1.30     3.20      -0.60     3.10      -0.50
15   1966   2.00   3.64      -1.64     2.90      -0.90     2.70      -0.70
16   1967   1.60   3.31      -1.71     2.45      -0.85     2.14      -0.54
17   1968   1.30   2.97      -1.67     2.03      -0.73     1.71      -0.41
18   1969   1.20   2.64      -1.44     1.66      -0.46     1.38      -0.18
19   1970   1.20   2.35      -1.15     1.43      -0.23     1.24      -0.04
20   1971   1.10   2.12      -1.02     1.32      -0.22     1.21      -0.11
21   1972   0.90   1.91      -1.01     1.21      -0.31     1.12      -0.22
22   1973   1.40   1.71      -0.31     1.05      0.35      0.94      0.46
23   1974   2.00   1.65      0.35      1.23      0.77      1.31      0.69
24   1975   1.90   1.72      0.18      1.61      0.29      1.86      0.04
25   1976   2.30   1.76      0.54      1.76      0.54      1.89      0.41
26   1977   3.10   1.86      1.24      2.03      1.07      2.22      0.88
27   1978   3.50   2.11      1.39      2.56      0.94      2.92      0.58
28   1979   3.80   2.39      1.41      3.03      0.77      3.38      0.42
29   1980   3.70   2.67      1.03      3.42      0.28      3.72      -0.02
30   1981   3.10   2.88      0.22      3.56      -0.46     3.70      -0.60
31   1982   2.60   2.92      -0.32     3.33      -0.73     3.22      -0.62
32   1983   2.40   2.86      -0.46     2.96      -0.56     2.72      -0.32
33   1984   3.00   2.77      0.23      2.68      0.32      2.46      0.54
34   1985   2.40   2.81      -0.41     2.84      -0.44     2.89      -0.49
35   1986   1.80   2.73      -0.93     2.62      -0.82     2.50      -0.70
36   1987   1.70   2.54      -0.84     2.21      -0.51     1.94      -0.24
37   1988   2.20   2.38      -0.18     1.96      0.24      1.75      0.45
38   1989   2.10   2.34      -0.24     2.08      0.02      2.11      -0.01
39   1990   2.40   2.29      0.11      2.09      0.31      2.10      0.30
40   1991   2.10   2.31      -0.21     2.24      -0.14     2.34      -0.24
41   1992   2.20   2.27      -0.07     2.17      0.03      2.15      0.05
42   1993   2.70   2.26      0.44      2.19      0.51      2.19      0.51
43   1994   3.00   2.35      0.65      2.44      0.56      2.60      0.40
44   1995   2.80   2.48      0.32      2.72      0.08      2.92      -0.12
Table 38: Dividend yields, forecasts, and errors based on exponential smoothing with w = 0.2, 0.5, 0.8
Table 38 gives average dividend yields for Anheuser-Busch for the years 1952-1995 (Source: Value Line), along with forecasts and errors based on exponential smoothing with w = 0.2, 0.5, and 0.8.
Here we obtain forecasts based on exponential smoothing, beginning with year 2 (1953):
1953: Fw=.2,1953 = y1952 = 5.30,  Fw=.5,1953 = y1952 = 5.30,  Fw=.8,1953 = y1952 = 5.30
1954 (w = 0.2): Fw=.2,1954 = .2 y1953 + .8 Fw=.2,1953 = .2(4.20) + .8(5.30) = 5.08
1954 (w = 0.5): Fw=.5,1954 = .5 y1953 + .5 Fw=.5,1953 = .5(4.20) + .5(5.30) = 4.75
1954 (w = 0.8): Fw=.8,1954 = .8 y1953 + .2 Fw=.8,1953 = .8(4.20) + .2(5.30) = 4.42
Which level of w appears to discount more distant observations at a quicker rate? What would happen if w = 1? If w = 0? Figure 27 gives the raw data and the exponential smoothing forecasts.
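A sketch (ours) of the forecast recursion; the closing comments answer the w = 1 and w = 0 questions:

```python
import numpy as np

def es_forecast(y, w):
    """Exponential smoothing forecasts: F2 = y1; F(t+1) = w*y(t) + (1-w)*F(t)."""
    f = np.full(len(y), np.nan)
    if len(y) > 1:
        f[1] = y[0]
    for t in range(2, len(y)):
        f[t] = w * y[t - 1] + (1 - w) * f[t - 1]
    return f

y = np.array([5.3, 4.2, 3.9, 5.2, 5.8])   # first few dividend yields from Table 38
print(es_forecast(y, 0.2))  # 1954 entry: .2(4.2) + .8(5.3) = 5.08, as above
# w = 1 reproduces the 1-period moving average (each forecast is the last outcome);
# w = 0 never updates, so every forecast stays at y1.
```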
Figure 27: Plot of the data and exponential smoothing forecasts for Anheuser-Busch dividend data
Table 39 gives measures of forecast error for the three moving average and three exponential smoothing methods.
Measure   Moving Average                   Exponential Smoothing
          1-Period   2-Period   3-Period   w = 0.2   w = 0.5   w = 0.8
MAE       0.43       0.53       0.62       0.82      0.58      0.47
MSE       0.30       0.43       0.57       0.97      0.48      0.34
Table 39: Forecast error measures for the moving average and exponential smoothing methods
7.6 Forecasting with Trend and Seasonal Components
After the trend and seasonal indexes have been estimated, future outcomes can be forecast by the equation:
Ft = [b0 + b1 t] SIt
where b0 + b1 t is the linear trend and SIt is the seasonal index for period t.
Example 7.2 (Continued)
For the Texas FIRE gross sales data, we have:
b0 = 1.6376,  b1 = 0.08753
SIQ1 = 0.838,  SIQ2 = 0.901,  SIQ3 = 0.870,  SIQ4 = 1.390
Thus for the 4 quarters of 2003 (t = 57, 58, 59, 60), we have:
Q1: F57 = [1.6376 + 0.08753(57)](0.838) = 5.553
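The remaining 2003 quarters follow by the same arithmetic; a tiny sketch (ours, using only the estimates above):

```python
# Trend and seasonal index estimates for the Texas FIRE series.
b0, b1 = 1.6376, 0.08753
si = {1: 0.838, 2: 0.901, 3: 0.870, 4: 1.390}

for q, t in zip((1, 2, 3, 4), (57, 58, 59, 60)):
    f = (b0 + b1 * t) * si[q]      # F_t = [b0 + b1*t] * SI_t
    print(q, round(f, 3))          # Q1 gives 5.553, as computed above
```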
7.6.1 Autoregression
Sometimes regression is run on past or lagged values of the dependent variable (and possibly other variables). An autoregressive model with independent variables corresponding to k periods can be written as follows:
ŷt = b0 + b1 yt-1 + b2 yt-2 + ... + bk yt-k
Note that the regression cannot be run for the first k responses in the series. Technically, forecasts can be given only for periods after the regression has been fit, since the regression model depends on all periods used to fit it.
Example 7.4 (Continued)
From computer software, autoregressions based on lags of 1, 2, and 3 periods are fit:
1-Period: ŷt = 0.29 + 0.88 yt-1
2-Period: ŷt = 0.29 + 1.18 yt-1 - 0.29 yt-2
3-Period: ŷt = 0.28 + 1.21 yt-1 - 0.37 yt-2 + 0.05 yt-3
Table 40 gives the raw data and forecasts based on the three autoregression models. Figure 28 displays the actual outcomes and predictions.
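An autoregression is just least squares on lagged copies of the series; a minimal Python sketch (ours, not the software used in the notes):

```python
import numpy as np

def fit_ar(y, k):
    """Least-squares autoregression on k lagged values (loses the first k points)."""
    rows = [y[t - k : t][::-1] for t in range(k, len(y))]   # y(t-1), ..., y(t-k)
    X = np.column_stack([np.ones(len(rows)), np.array(rows)])
    b, _, _, _ = np.linalg.lstsq(X, y[k:], rcond=None)
    return b   # b0, b1 (lag 1), ..., bk (lag k)

# With the full 1952-1995 dividend series from Table 40 as y,
# fit_ar(y, 1) should come out near (0.29, 0.88), matching the 1-period fit above.
```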
t    Year   yt    FAR(1),t   eAR(1),t   FAR(2),t   eAR(2),t   FAR(3),t   eAR(3),t
1    1952   5.3   .          .          .          .          .          .
2    1953   4.2   4.96       -0.76      .          .          .          .
3    1954   3.9   3.99       -0.09      3.72       0.18       .          .
4    1955   5.2   3.72       1.48       3.68       1.52       3.72       1.48
5    1956   5.8   4.87       0.93       5.30       0.50       5.35       0.45
6    1957   6.3   5.40       0.90       5.64       0.66       5.58       0.72
7    1958   5.6   5.84       -0.24      6.06       -0.46      6.03       -0.43
8    1959   4.8   5.22       -0.42      5.09       -0.29      5.03       -0.23
9    1960   4.4   4.52       -0.12      4.34       0.06       4.35       0.05
10   1961   2.8   4.16       -1.36      4.10       -1.30      4.12       -1.32
11   1962   3.2   2.75       0.45       2.33       0.87       2.29       0.91
12   1963   3.1   3.11       -0.01      3.26       -0.16      3.35       -0.25
13   1964   3.1   3.02       0.08       3.03       0.07       3.00       0.10
14   1965   2.6   3.02       -0.42      3.06       -0.46      3.05       -0.45
15   1966   2.0   2.58       -0.58      2.47       -0.47      2.44       -0.44
16   1967   1.6   2.05       -0.45      1.90       -0.30      1.90       -0.30
17   1968   1.3   1.70       -0.40      1.60       -0.30      1.61       -0.31
18   1969   1.2   1.43       -0.23      1.36       -0.16      1.37       -0.17
19   1970   1.2   1.35       -0.15      1.33       -0.13      1.34       -0.14
20   1971   1.1   1.35       -0.25      1.36       -0.26      1.36       -0.26
21   1972   0.9   1.26       -0.36      1.24       -0.34      1.23       -0.33
22   1973   1.4   1.08       0.32       1.03       0.37       1.03       0.37
23   1974   2.0   1.52       0.48       1.68       0.32       1.70       0.30
24   1975   1.9   2.05       -0.15      2.25       -0.35      2.23       -0.33
25   1976   2.3   1.96       0.34       1.96       0.34       1.92       0.38
26   1977   3.1   2.31       0.79       2.46       0.64       2.47       0.63
27   1978   3.5   3.02       0.48       3.29       0.21       3.28       0.22
28   1979   3.8   3.37       0.43       3.53       0.27       3.49       0.31
29   1980   3.7   3.64       0.06       3.77       -0.07      3.75       -0.05
30   1981   3.1   3.55       -0.45      3.56       -0.46      3.54       -0.44
31   1982   2.6   3.02       -0.42      2.88       -0.28      2.86       -0.26
32   1983   2.4   2.58       -0.18      2.47       -0.07      2.47       -0.07
33   1984   3.0   2.40       0.60       2.37       0.63       2.39       0.61
34   1985   2.4   2.93       -0.53      3.14       -0.74      3.16       -0.76
35   1986   1.8   2.40       -0.60      2.26       -0.46      2.20       -0.40
36   1987   1.7   1.87       -0.17      1.72       -0.02      1.73       -0.03
37   1988   2.2   1.79       0.41       1.78       0.42       1.80       0.40
38   1989   2.1   2.23       -0.13      2.40       -0.30      2.41       -0.31
39   1990   2.4   2.14       0.26       2.13       0.27       2.10       0.30
40   1991   2.1   2.40       -0.30      2.52       -0.42      2.53       -0.43
41   1992   2.2   2.14       0.06       2.08       0.12       2.05       0.15
42   1993   2.7   2.23       0.47       2.28       0.42       2.29       0.41
43   1994   3.0   2.67       0.33       2.84       0.16       2.85       0.15
44   1995   2.8   2.93       -0.13      3.05       -0.25      3.03       -0.23
Table 40: Average dividend yields and forecasts/errors based on autoregression with lags of 1, 2, and 3 periods
Figure 28: Plot of the data and autoregressive forecasts for Anheuser-Busch dividend data