Multiple Regression
Introduction
In this chapter we extend the simple linear regression model and allow for any number of independent variables.
Introduction
We shall use computer printout to:
- Assess the model
  - How well it fits the data
  - Is it useful?
  - Are any required conditions violated?
- Employ the model
  - Interpreting the coefficients
  - Making predictions using the prediction equation
  - Estimating the expected value of the dependent variable
Model and Required Conditions

We allow for k independent variables to potentially be related to the dependent variable:

y = β0 + β1x1 + β2x2 + … + βkxk + ε

where β0, β1, …, βk are the coefficients and ε is the random error variable.
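As a concrete illustration of the model above, the following minimal NumPy sketch (not from the slides; the coefficient values are invented for illustration) simulates a dependent variable from k = 3 independent variables plus a random error:

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3
X = rng.normal(size=(n, k))              # k independent variables
beta = np.array([5.0, 1.5, -2.0, 0.5])   # b0..b3, chosen for illustration
eps = rng.normal(scale=1.0, size=n)      # random error variable
y = beta[0] + X @ beta[1:] + eps         # the multiple regression model
```

Each observation of y is a linear combination of the x's plus an error draw, which is exactly the structure the required conditions below constrain.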
Required conditions for the error variable ε:
- The probability distribution of ε is normal.
- The mean of ε is zero: E(ε) = 0.
- The standard deviation of ε is σε, a constant.
- The errors are independent.
Estimating the Coefficients and Assessing the Model

The procedure used to perform regression analysis:
- Obtain the model coefficients and statistics using statistical software.
- Diagnose violations of required conditions. Try to remedy problems when identified.
- Assess the model fit using statistics obtained from the sample.
- If the model assessment indicates a good fit to the data, use the model to interpret the coefficients and generate predictions.
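The first step of this procedure, obtaining the coefficients, can be sketched with a general-purpose tool (a NumPy least-squares sketch on simulated data, not the statistical software the slides refer to):

```python
import numpy as np

# Simulated data with known coefficients 3.0, 2.0, -1.0 for checking.
rng = np.random.default_rng(1)
n = 100
X = rng.normal(size=(n, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=n)

# Design matrix with a column of ones for the intercept b0.
A = np.column_stack([np.ones(n), X])
b = np.linalg.lstsq(A, y, rcond=None)[0]   # b[0]=intercept, b[1], b[2]=slopes
```

Statistical packages add the diagnostic statistics (standard errors, t and F tests) on top of this same least-squares fit.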
Estimating the Coefficients and Assessing the Model

Example 1: Where to locate a new motor inn?
La Quinta Motor Inns is planning an expansion. Management wishes to predict which sites are likely to be profitable.
Several areas where predictors of profitability can be identified are:
- Competition
- Market awareness
- Demand generators
[Diagram: profitability (operating margin) is influenced by competition, market awareness, customers, community, and physical factors.]
Standard Error of Estimate

The standard error of estimate is
  se = √[ SSE / (n − k − 1) ]
The magnitude of se is judged by comparing it to the sample mean ȳ.

Coefficient of Determination

The definition is
  R² = 1 − SSE / Σ(yi − ȳ)²
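Both statistics follow directly from the residuals of a fitted model; here is a short sketch computing them from the definitions above (simulated data, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 2
X = rng.normal(size=(n, k))
y = 1.0 + X @ np.array([2.0, -1.0]) + rng.normal(size=n)

A = np.column_stack([np.ones(n), X])
b = np.linalg.lstsq(A, y, rcond=None)[0]
resid = y - A @ b
sse = np.sum(resid**2)                       # SSE
se = np.sqrt(sse / (n - k - 1))              # standard error of estimate
r2 = 1 - sse / np.sum((y - y.mean())**2)     # coefficient of determination
```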
We test the overall validity of the model with
H0: β1 = β2 = … = βk = 0
H1: At least one βi is not equal to zero.

The ANOVA table provides SSR, with MSR = SSR/k, and SSE, with MSE = SSE/(n − k − 1).
Testing the Validity of the La Quinta Inns Regression Model

[Total variation in y] = SSR + SSE.
A large F results from a large SSR; then much of the variation in y is explained by the regression model, the model is useful, and thus the null hypothesis should be rejected. Therefore, the rejection region is

F = (SSR / k) / (SSE / (n − k − 1))
Rejection region: F > Fα,k,n−k−1
Testing the Validity of the La Quinta Inns Regression Model

Fα,k,n−k−1 = F0.05,6,100−6−1 = 2.17
F = 17.14 > 2.17
Also, the p-value (Significance F) = 0.0000.
We reject the null hypothesis.
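The rejection rule above can be checked numerically with SciPy, using the F value and degrees of freedom quoted for the La Quinta model (note the slides quote a table value of 2.17; the exact critical value differs slightly):

```python
from scipy.stats import f

k, n = 6, 100
F = 17.14                            # F statistic from the regression output
crit = f.ppf(0.95, k, n - k - 1)     # F_{0.05, 6, 93}, roughly 2.2
p_value = f.sf(F, k, n - k - 1)      # upper-tail p-value (Significance F)
reject = F > crit                    # True: the model is valid
```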
Interpreting the Coefficients
b0 = 38.14. This is the intercept, the value of y when all the variables take the value zero. Since the data range of all the independent variables does not cover the value zero, do not interpret the intercept.

b1 = −0.0076. In this model, for each additional room within 3 miles of the La Quinta inn, the operating margin decreases on average by 0.0076% when the other variables are held constant.
Interpreting the Coefficients
b2 = 1.65. In this model, for each additional mile that the nearest competitor is from a La Quinta inn, the operating margin increases on average by 1.65% when the other variables are held constant.

b3 = 0.020. For each additional 1000 sq ft of office space, the operating margin increases on average by 0.02% when the other variables are held constant.

b4 = 0.21. For each additional thousand students, the operating margin increases on average by 0.21% when the other variables are held constant.
Interpreting the Coefficients
b5 = 0.41. For each additional $1000 increase in median household income, the operating margin increases on average by 0.41% when the other variables remain constant.

b6 = −0.23. For each additional mile to the downtown center, the operating margin decreases on average by 0.23% when the other variables are held constant.
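As a worked example, the estimated coefficients above can be plugged into the prediction equation for a hypothetical site (the input values below are invented for illustration; only the coefficients come from the regression output):

```python
# Hypothetical site characteristics (units follow the slides).
rooms = 3000        # rooms within 3 miles
nearest = 2.0       # miles to nearest competitor
office = 500        # office space, in 1000 sq ft
enrollment = 20     # college enrollment, in 1000s of students
income = 40         # median household income, in $1000s
distance = 5        # miles to downtown center

# Estimated operating margin, in %.
margin = (38.14 - 0.0076 * rooms + 1.65 * nearest + 0.020 * office
          + 0.21 * enrollment + 0.41 * income - 0.23 * distance)
```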
Testing the Coefficients

The hypothesis test for each βi is
H0: βi = 0
H1: βi ≠ 0

Example: MBA Program Admission Policy
y = MBA GPA
x1 = undergraduate GPA [UnderGPA]
x2 = GMAT score [GMAT]
x3 = years of work experience [Work]
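Each coefficient test uses t = bi / (standard error of bi) with n − k − 1 degrees of freedom. A short SciPy sketch, using the Work line of the output below (b = 0.093, standard error = 0.031, df = 85):

```python
from scipy.stats import t

b, se_b, df = 0.093, 0.031, 85
t_stat = b / se_b                      # matches the printed t Stat of 3.00
p_value = 2 * t.sf(abs(t_stat), df)   # two-tail p-value, ~0.0035
```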
MBA Program Admission Policy – Model Diagnostics

We estimate the regression model, then we check the normality of the errors using a histogram of the standardized residuals.

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.6808
R Square           0.4635
Adjusted R Square  0.4446
Standard Error     0.788
Observations       89

ANOVA
            df  SS     MS     F      Significance F
Regression   3  45.60  15.20  24.48  0.0000
Residual    85  52.77   0.62
Total       88  98.37

           Coefficients  Standard Error  t Stat  P-value
Intercept  0.466         1.506           0.31    0.7576
UnderGPA   0.063         0.120           0.52    0.6017
GMAT       0.011         0.001           8.16    0.0000
Work       0.093         0.031           3.00    0.0036

[Histogram of the standardized residuals, used to check normality of the errors.]
MBA Program Admission Policy – Model Diagnostics

We also check the variance of the error variable, using a plot of the residuals against the predicted values.

[Plot: residuals vs. predicted values; the regression output is the same as on the previous slide.]
MBA Program Admission Policy – Model Assessment

From the regression output shown on the earlier diagnostics slide:
- 46.35% of the variation in MBA GPA is explained by the model.
- The model is valid (p-value = 0.0000…).
- GMAT score and years of work experience are linearly related to MBA GPA.
- There is insufficient evidence of a linear relationship between undergraduate GPA and MBA GPA.
Regression Diagnostics - II
Diagnostics: Multicollinearity

The proposed model is
PRICE = β0 + β1BEDROOMS + β2H-SIZE + β3LOTSIZE + ε

The model is valid, but no variable is significantly related to the selling price?!

SUMMARY OUTPUT

Regression Statistics
Multiple R         0.7483
R Square           0.5600
Adjusted R Square  0.5462
Standard Error     25023
Observations       100

ANOVA
            df  SS            MS           F      Significance F
Regression   3  76501718347   25500572782  40.73  0.0000
Residual    96  60109046053   626135896
Total       99  136610764400
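A common numeric check for this symptom (a NumPy sketch of the standard technique, not code from the slides) is the variance inflation factor, VIFj = 1 / (1 − Rj²), where Rj² comes from regressing xj on the other independent variables; large VIFs flag multicollinearity:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (no intercept column)."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])
        b = np.linalg.lstsq(A, X[:, j], rcond=None)[0]
        resid = X[:, j] - A @ b
        r2 = 1 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

# Simulated predictors: x2 is nearly collinear with x1, x3 is independent.
rng = np.random.default_rng(3)
x1 = rng.normal(size=200)
x2 = x1 + rng.normal(scale=0.1, size=200)
x3 = rng.normal(size=200)
v = vif(np.column_stack([x1, x2, x3]))   # v[0], v[1] large; v[2] near 1
```

In the house-price example, bedrooms, house size, and lot size are highly correlated with one another, which inflates the coefficient standard errors and produces exactly the "valid model, no significant variables" pattern above.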
Reducing Nonnormality by Transformations

A brief list of transformations:
- y′ = log y (for y > 0): use when σε increases with y, or when the error distribution is positively skewed.
- y′ = y²: use when σε² is proportional to E(y), or when the error distribution is negatively skewed.
- y′ = y½ (for y > 0): use when σε² is proportional to E(y).
- y′ = 1/y: use when σε² increases significantly when y increases beyond some critical value.
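The first transformation can be sketched as follows (simulated data, not from the slides): a response with multiplicative, positively skewed errors becomes linear with roughly constant variance after taking logs:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.uniform(1, 10, size=300)
# Multiplicative error model: y = exp(0.5 + 0.3 x + eps), errors skewed on y scale.
y = np.exp(0.5 + 0.3 * x + rng.normal(scale=0.2, size=300))

y_t = np.log(y)                              # y' = log y
A = np.column_stack([np.ones_like(x), x])
b = np.linalg.lstsq(A, y_t, rcond=None)[0]   # recovers roughly (0.5, 0.3)
```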
Durbin–Watson Test: Are the Errors Autocorrelated?

This test detects first-order autocorrelation between consecutive residuals in a time series. If autocorrelation exists, the error variables are not independent.

The statistic is
  d = [Σ from i=2 to n of (ei − ei−1)²] / [Σ from i=1 to n of ei²]
where ei is the residual at time i. The range of d is 0 ≤ d ≤ 4.
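The statistic is a one-liner to compute; for independent errors d is close to 2 (a NumPy sketch on simulated residuals):

```python
import numpy as np

def durbin_watson(e):
    """Durbin-Watson d: sum of squared successive differences over sum of squares."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(5)
e_indep = rng.normal(size=500)   # independent residuals
d = durbin_watson(e_indep)       # close to 2
```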
Positive First-Order Autocorrelation

[Plots: residuals over time showing runs of same-sign residuals, the pattern produced by positive first-order autocorrelation.]

One-Tail Test for Positive First-Order Autocorrelation
- If d < dL, positive first-order correlation exists.
- If d > dU, positive first-order correlation does not exist.
- If d falls between dL and dU, the test is inconclusive.
One-Tail Test for Negative First-Order Autocorrelation
- If d > 4 − dL, negative first-order correlation exists.
- If d < 4 − dU, negative first-order correlation does not exist.
- If d falls between 4 − dU and 4 − dL, the test is inconclusive.

[Number line: negative first-order correlation does not exist below 4 − dU; inconclusive between 4 − dU and 4 − dL; exists above 4 − dL.]
Two-Tail Test for First-Order Autocorrelation
- If d < dL or d > 4 − dL, first-order autocorrelation exists.
- If d falls between dL and dU, or between 4 − dU and 4 − dL, the test is inconclusive.
- If d falls between dU and 4 − dU, there is no evidence of first-order autocorrelation.

[Number line: 0, dL, dU, 2, 4 − dU, 4 − dL, 4]
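The two-tail decision rule above can be written directly as a small function (the function name is ours; dL and dU come from Durbin–Watson tables for the given n, k, and α):

```python
def dw_two_tail(d, dL, dU):
    """Two-tail Durbin-Watson decision given table bounds dL < dU < 2."""
    if d < dL or d > 4 - dL:
        return "first-order autocorrelation exists"
    if dL <= d <= dU or 4 - dU <= d <= 4 - dL:
        return "test inconclusive"
    return "no evidence of first-order autocorrelation"
```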
Testing the Existence of Autocorrelation, Example

Example: How does the weather affect the sales of lift tickets in a ski resort?
Data from the past 20 years — sales of tickets, along with two weather-related independent variables — were collected.

ANOVA
            df  SS        MS       F     Signif. F
Regression   2  6793798   3396899  1.16  0.3373
Residual    17  49807214  2929836
Total       19  56601012
Diagnostics: Heteroscedasticity

To check for heteroscedasticity (non-constant error variance), plot the residuals against the predicted values of y and look for a spread that changes with the predicted value.

[Plot: residuals vs. predicted y.]