
Basic Business Statistics

(9th Edition)

Chapter 14
Introduction to Multiple
Regression

© 2004 Prentice-Hall, Inc. Chap 14-1


Chapter Topics
◼ The Multiple Regression Model
◼ Residual Analysis
◼ Testing for the Significance of the Regression
Model
◼ Inferences on the Population Regression
Coefficients
◼ Testing Portions of the Multiple Regression
Model
◼ Dummy-Variables and Interaction Terms
◼ Logistic Regression Model
© 2004 Prentice-Hall, Inc. Chap 14-2
The Multiple Regression Model
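The relationship between 1 dependent variable and 2 or more independent variables is a linear function:

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$

◼ $Y_i$: dependent (response) variable
◼ $\beta_0$: population Y-intercept
◼ $\beta_1, \ldots, \beta_k$: population slopes
◼ $X_{1i}, \ldots, X_{ki}$: independent (explanatory) variables
◼ $\varepsilon_i$: random error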



Multiple Regression Model
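$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$

[Figure: response plane $\mu_{Y|X} = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}$ over the $(X_1, X_2)$ plane, showing the intercept $\beta_0$, an observed Y at the point $(X_{1i}, X_{2i})$, and its error $\varepsilon_i$ measured from the plane]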



Multiple Regression Equation
$Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$

[Figure: fitted response plane $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$ over the $(X_1, X_2)$ plane, showing the intercept $b_0$, an observed Y at the point $(X_{1i}, X_{2i})$, and its residual $e_i$ measured from the plane]


Multiple Regression Equation
Too complicated to compute by hand!



Interpretation of Estimated
Coefficients

◼ Slope (bj )
◼ The average value of Y is estimated to change by
bj for each 1 unit increase in Xj , holding all other
variables constant (ceteris paribus)
◼ Example: If b1 = -2, then fuel oil usage (Y) is
expected to decrease by an estimated 2 gallons for
each 1 degree increase in temperature (X1), given
the inches of insulation (X2)
◼ Y-Intercept (b0)
◼ The estimated average value of Y when all Xj = 0



Multiple Regression Model:
Example
Develop a model for estimating heating oil used for a single
family home in the month of January, based on average
temperature and amount of insulation in inches.

Oil (Gal)   Temp (°F)   Insulation (in.)
275.30      40          3
363.80      27          3
164.30      40          10
40.80       73          6
94.30       64          6
230.90      34          6
366.70      9           6
300.60      8           10
237.80      23          10
121.40      63          3
31.40       65          10
203.50      41          6
441.10      21          3
323.00      38          3
52.50       58          10
© 2004 Prentice-Hall, Inc. Chap 14-8
Multiple Regression Equation:
Example

$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$

Excel Output
               Coefficients
Intercept      562.1510092
X Variable 1   -5.436580588
X Variable 2   -20.01232067

$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$

◼ For each degree increase in temperature, the estimated average
amount of heating oil used is decreased by 5.437 gallons,
holding insulation constant.
◼ For each increase of one inch of insulation, the estimated
average use of heating oil is decreased by 20.012 gallons,
holding temperature constant.
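
The coefficients in the Excel output can be reproduced with ordinary least squares. The following is a minimal sketch, assuming NumPy is available; the variable names are illustrative and not part of the original slides, and the data are the 15 observations of the heating oil example.

```python
import numpy as np

# Heating oil example: (oil in gallons, temperature in °F, insulation in inches)
data = np.array([
    [275.30, 40, 3], [363.80, 27, 3], [164.30, 40, 10], [40.80, 73, 6],
    [94.30, 64, 6], [230.90, 34, 6], [366.70, 9, 6], [300.60, 8, 10],
    [237.80, 23, 10], [121.40, 63, 3], [31.40, 65, 10], [203.50, 41, 6],
    [441.10, 21, 3], [323.00, 38, 3], [52.50, 58, 10],
])
y = data[:, 0]

# Design matrix: a column of ones for the intercept, then Temp and Insulation
X = np.column_stack([np.ones(len(y)), data[:, 1], data[:, 2]])

# Least-squares estimates b0, b1, b2 of the regression coefficients
coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = coefficients

print(f"Oil-hat = {b0:.3f} + ({b1:.3f}) Temp + ({b2:.3f}) Insulation")
```

The printed coefficients should agree with the Excel output up to rounding.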
© 2004 Prentice-Hall, Inc. Chap 14-9
Multiple Regression in PHStat

◼ PHStat | Regression | Multiple Regression …

◼ Excel spreadsheet for the heating oil example



Venn Diagrams and
Explanatory Power of Regression
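[Venn diagram: two overlapping circles, Oil and Temp]
◼ Part of Oil outside the overlap: variations in Oil explained
by the error term ( SSE )
◼ Part of Temp outside the overlap: variations in Temp not used
in explaining variation in Oil
◼ Overlap: variations in Oil explained by Temp, or variations in
Temp used in explaining variation in Oil ( SSR )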
© 2004 Prentice-Hall, Inc. Chap 14-11
Venn Diagrams and
Explanatory Power of Regression
(continued)

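[Venn diagram: two overlapping circles, Oil and Temp; the overlap is SSR and the whole Oil circle is the variation of oil]

$r^2 = \frac{SSR}{SSR + SSE}$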
© 2004 Prentice-Hall, Inc. Chap 14-12
Venn Diagrams and
Explanatory Power of Regression
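[Venn diagram: three overlapping circles, Oil, Temp and Insulation]
◼ Part of Oil outside both other circles: variation NOT
explained by Temp nor Insulation ( SSE )
◼ Part of Oil where Temp and Insulation overlap: overlapping
variation in both Temp and Insulation is used in explaining the
variation in Oil but NOT in the estimation of $\beta_1$ nor $\beta_2$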
© 2004 Prentice-Hall, Inc. Chap 14-13
Coefficient of
Multiple Determination
◼ Proportion of Total Variation in Y Explained by
All X Variables Taken Together
◼ $r^2_{Y \cdot 12 \cdots k} = \frac{SSR}{SST} = \frac{\text{Explained Variation}}{\text{Total Variation}}$
◼ Never Decreases When a New X Variable is
Added to Model
◼ Disadvantage when comparing among models
© 2004 Prentice-Hall, Inc. Chap 14-14
Venn Diagrams and
Explanatory Power of Regression
$r^2$ cannot decrease when more independent variables are added.

[Venn diagram: three overlapping circles, Oil, Temp and Insulation]

$r^2_{Y \cdot 12} = \frac{SSR}{SSR + SSE}$
© 2004 Prentice-Hall, Inc. Chap 14-15
Adjusted Coefficient of Multiple
Determination
◼ Proportion of Variation in Y Explained by All the X
Variables Adjusted for the Sample Size and the
Number of X Variables Used
◼ $r^2_{adj} = 1 - \left[ (1 - r^2_{Y \cdot 12 \cdots k}) \frac{n - 1}{n - k - 1} \right]$, where k is the number of X variables
◼ Penalizes excessive use of independent variables
◼ Smaller than $r^2_{Y \cdot 12 \cdots k}$
◼ Useful in comparing among models
◼ Can decrease if an insignificant new X variable is
added to the model
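
As a rough illustration of the adjustment, the sketch below applies the formula to the r² values reported in the chapter's Excel outputs for the heating oil example (n = 15; k = 2 for Temp and Insulation, k = 3 once Rainfall is added). The function name is an assumption made for this example.

```python
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """Adjusted r^2 for a model with k independent variables and n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# r^2 values taken from the Excel outputs of the heating oil example
two_variables = adjusted_r2(0.965610371, n=15, k=2)    # Temp, Insulation
three_variables = adjusted_r2(0.967238528, n=15, k=3)  # Temp, Insulation, Rainfall

print(round(two_variables, 6), round(three_variables, 6))
# r^2 rises when Rainfall is added, yet the adjusted value falls
print(three_variables < two_variables)
```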
© 2004 Prentice-Hall, Inc. Chap 14-16
Coefficient of Multiple
Determination
Excel Output
Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15

$r^2_{Y \cdot 12} = \frac{SSR}{SST}$

Adjusted r2
❑ reflects the number of explanatory variables and sample size
❑ is smaller than r2



Interpretation of Coefficient of
Multiple Determination
◼ $r^2_{Y \cdot 12} = \frac{SSR}{SST} = .9656$
◼ 96.56% of the total variation in heating oil can be
explained by temperature and amount of insulation
◼ $r^2_{adj} = .9599$
◼ 95.99% of the total fluctuation in heating oil can
be explained by temperature and amount of
insulation after adjusting for the number of
explanatory variables and sample size
© 2004 Prentice-Hall, Inc. Chap 14-18
Simple and Multiple Regression
Compared
◼ The slope coefficient in a simple regression picks up
the impact of the independent variable plus the
impacts of other variables that are excluded from the
model, but are correlated with the included
independent variable and the dependent variable
◼ Coefficients in a multiple regression net out the
impacts of other variables in the equation
◼ Hence, they are called the net regression coefficients

◼ They still pick up the effects of other variables that are
excluded from the model, but are correlated with the
included independent variables and the dependent
variable
© 2004 Prentice-Hall, Inc. Chap 14-19
Simple and Multiple Regression
Compared: Example

◼ Two Simple Regressions:
◼ Oil = $\beta_0 + \beta_1$ Temp + $\varepsilon$
◼ Oil = $\beta_0 + \beta_2$ Insulation + $\varepsilon$
◼ Multiple Regression:
◼ Oil = $\beta_0 + \beta_1$ Temp + $\beta_2$ Insulation + $\varepsilon$
◼ The three $\varepsilon$'s are different
◼ The three $\beta_0$'s do not have the same value
◼ The two $\beta_1$'s do not have the same value
◼ The two $\beta_2$'s do not have the same value
Simple and Multiple Regression
Compared: Slope Coefficients
Oil = b0 + b1 Temp + b2 Insulation + e
             Coefficients
Intercept    562.1510092
Temp         -5.436580588
Insulation   -20.01232067

Oil = b0 + b1 Temp + e
             Coefficients
Intercept    436.4382299
Temp         -5.462207697

Oil = b0 + b2 Insulation + e
             Coefficients
Intercept    345.3783784
Insulation   -20.35027027

◼ The three e's are different
◼ Temp slope: -5.4366 ≠ -5.4622
◼ Insulation slope: -20.0123 ≠ -20.3503
© 2004 Prentice-Hall, Inc. Chap 14-21
Simple and Multiple Regression
Compared: r2
Oil = b0 + b1 Temp + b2 Insulation + e
Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15

Oil = b0 + b1 Temp + e
Regression Statistics
Multiple R          0.86974117
R Square            0.756449704
Adjusted R Square   0.737715065
Standard Error      66.51246564
Observations        15

Oil = b0 + b2 Insulation + e
Regression Statistics
Multiple R          0.465082527
R Square            0.216301757
Adjusted R Square   0.156017277
Standard Error      119.3117327
Observations        15

The r2 of the multiple regression is not the sum of the two simple-regression r2 values:
$0.96561 \ne (0.75645 + 0.21630) = 0.97275$
© 2004 Prentice-Hall, Inc. Chap 14-22
Example: Adjusted r2
Can Decrease
Oil = $\beta_0 + \beta_1$ Temp + $\beta_2$ Insulation + $\varepsilon$
Regression Statistics
Multiple R          0.982654757
R Square            0.965610371
Adjusted R Square   0.959878766
Standard Error      26.01378323
Observations        15

Try a 3rd explanatory variable:
Oil = $\beta_0 + \beta_1$ Temp + $\beta_2$ Insulation + $\beta_3$ Rainfall + $\varepsilon$
Regression Statistics
Multiple R          0.983482856
R Square            0.967238528
Adjusted R Square   0.958303581
Standard Error      25.72417272
Observations        15

◼ Adjusted r2 decreases when k increases from 2 to 3
◼ Rainfall is not useful in explaining the variation in oil consumption.
© 2004 Prentice-Hall, Inc. Chap 14-23
Using the Regression Equation
to Make Predictions
Predict the amount of heating oil used for a
home if the average temperature is 30°F and
the insulation is 6 inches.

$\hat{Y}_i = 562.151 - 5.437 X_{1i} - 20.012 X_{2i}$
$= 562.151 - 5.437(30) - 20.012(6)$
$= 278.969$

The predicted heating oil used is 278.97 gallons.
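
The prediction is a direct substitution into the fitted equation. A small sketch, using the rounded coefficients from the slide; the function name is illustrative:

```python
def predict_oil(temp_f: float, insulation_in: float) -> float:
    """Predicted January heating oil use (gallons) from the fitted equation."""
    return 562.151 - 5.437 * temp_f - 20.012 * insulation_in

# Home with an average temperature of 30 °F and 6 inches of insulation
prediction = predict_oil(30, 6)
print(round(prediction, 2))
```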
© 2004 Prentice-Hall, Inc. Chap 14-24
Predictions in PHStat

◼ PHStat | Regression | Multiple Regression …


◼ Check the “Confidence and Prediction Interval
Estimate” box
◼ Excel spreadsheet for the heating oil example



Residual Plots
Residuals show the error between the actual and the predicted values.

◼ Residuals vs. $\hat{Y}$
◼ May need to transform Y variable
◼ Residuals vs. X1
◼ May need to transform X1 variable
◼ Residuals vs. X2
◼ May need to transform X2 variable
◼ Residuals vs. Time (when the data are a time series)
◼ May have autocorrelation
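
The residuals that feed these plots are the observed values minus the fitted values. The sketch below computes them for the heating oil example, assuming NumPy is available; plotting is left out, and the variable names are illustrative.

```python
import numpy as np

# Heating oil example: (oil in gallons, temperature in °F, insulation in inches)
data = np.array([
    [275.30, 40, 3], [363.80, 27, 3], [164.30, 40, 10], [40.80, 73, 6],
    [94.30, 64, 6], [230.90, 34, 6], [366.70, 9, 6], [300.60, 8, 10],
    [237.80, 23, 10], [121.40, 63, 3], [31.40, 65, 10], [203.50, 41, 6],
    [441.10, 21, 3], [323.00, 38, 3], [52.50, 58, 10],
])
y, temp, insulation = data[:, 0], data[:, 1], data[:, 2]
X = np.column_stack([np.ones(len(y)), temp, insulation])

# Fit by least squares, then form residuals e_i = Y_i - Y-hat_i
b, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ b
residuals = y - fitted

# Sum of squared residuals (SSE); compare with the Residual SS in the ANOVA output
sse = float(np.sum(residuals ** 2))
print(round(sse, 3))

# Pairs that would be plotted: (fitted, e), (temp, e), (insulation, e)
for x1, e in zip(temp, residuals):
    print(f"Temp={x1:4.0f}  residual={e:8.2f}")
```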



Residual Plots: Example
[Figure: Temperature Residual Plot, residuals (-60 to 60) vs. temperature (0 to 80). Maybe some non-linear relationship.]
[Figure: Insulation Residual Plot, residuals vs. insulation (0 to 12). No discernible pattern.]



Testing for Overall Significance
◼ Shows if Y Depends Linearly on All of the X
Variables Together as a Group
◼ Use F Test Statistic
◼ In multiple regression, overall significance is tested with the F test only;
in simple regression, either the F test or the t test can be used
◼ Hypotheses:
◼ H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$ (No linear relationship)
◼ H1: At least one $\beta_j \ne 0$ (At least one independent
variable affects Y)
◼ The Null Hypothesis is a Very Strong Statement
◼ The Null Hypothesis is Almost Always Rejected
© 2004 Prentice-Hall, Inc. Chap 14-28
Testing for Overall Significance
(continued)

◼ Test Statistic:

◼ $F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}$
◼ Where F has k numerator and (n-k-1)
denominator degrees of freedom
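
To make the statistic concrete, the sketch below computes F from the sums of squares in the chapter's ANOVA output for the heating oil example (SSR = 228014.6263, SSE = 8120.603016, n = 15, k = 2) and compares it with the critical value 3.89 used in the example solution. The function name is an assumption for illustration.

```python
def overall_f(ssr: float, sse: float, n: int, k: int) -> float:
    """F = MSR / MSE with k and (n - k - 1) degrees of freedom."""
    msr = ssr / k
    mse = sse / (n - k - 1)
    return msr / mse

f_stat = overall_f(ssr=228014.6263, sse=8120.603016, n=15, k=2)
critical_value = 3.89  # upper-tail F critical value at alpha = 0.05, df = 2 and 12

print(round(f_stat, 2))
print("Reject H0" if f_stat > critical_value else "Do not reject H0")
```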



Test for Overall Significance
Excel Output: Example
ANOVA
             df   SS         MS         F          Significance F
Regression   2    228014.6   114007.3   168.4712   1.65411E-09
Residual     12   8120.603   676.7169
Total        14   236135.2

◼ Regression df = k = 2, the number of explanatory variables
◼ Total df = n-1
◼ F = MSR / MSE is the F Test Statistic
◼ Significance F is the p-value
© 2004 Prentice-Hall, Inc. Chap 14-30
Test for Overall Significance:
Example Solution
H0: $\beta_1 = \beta_2 = \cdots = \beta_k = 0$
H1: At least one $\beta_j \ne 0$
$\alpha$ = .05
df = 2 and 12
Critical Value: 3.89 (upper-tail area $\alpha$ = 0.05 of the F distribution)
Test Statistic: F = 168.47 (Excel Output)
Decision: Reject H0 at $\alpha$ = 0.05.
Conclusion: There is evidence that at least one independent
variable affects Y.
© 2004 Prentice-Hall, Inc. Chap 14-31
Test for Significance:
Individual Variables

◼ Shows If Y Depends Linearly on a Single Xj
Individually While Holding the Effects of Other
X's Fixed (isolate one variable and test it)
◼ Use t Test Statistic
◼ Hypotheses:
◼ H0: $\beta_j = 0$ (No linear relationship)
◼ H1: $\beta_j \ne 0$ (Linear relationship between Xj and Y)



t Test Statistic
Excel Output: Example

             Coefficients   Standard Error   t Stat      P-value
Intercept    562.1510092    21.09310433      26.65094    4.77868E-12
Temp         -5.436580588   0.336216167      -16.1699    1.64178E-09
Insulation   -20.01232067   2.342505227      -8.543127   1.90731E-06

$t = \frac{b_j}{S_{b_j}}$

◼ t Stat in the Temp row: t Test Statistic for X1 (Temperature)
◼ t Stat in the Insulation row: t Test Statistic for X2 (Insulation)
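
Each t statistic is simply the coefficient divided by its standard error. A short sketch using the values from the Excel output; the dictionary layout is an illustrative choice, and 2.1788 is the two-tail critical value for df = 12 quoted in the example solution.

```python
# (coefficient, standard error) pairs from the Excel output
estimates = {
    "Temp": (-5.436580588, 0.336216167),
    "Insulation": (-20.01232067, 2.342505227),
}
t_critical = 2.1788  # two-tail critical value at alpha = 0.05, df = 12

# t = b_j / S_bj for each slope
t_stats = {name: b / se for name, (b, se) in estimates.items()}

for name, t in t_stats.items():
    decision = "Reject H0" if abs(t) > t_critical else "Do not reject H0"
    print(f"{name}: t = {t:.4f} -> {decision}")
```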



t Test : Example Solution
Does temperature have a significant effect on monthly
consumption of heating oil? Test at $\alpha$ = 0.05.
H0: $\beta_1 = 0$
H1: $\beta_1 \ne 0$
df = 12
Critical Values: -2.1788 and 2.1788 (.025 in each tail; reject H0 outside this range)
Test Statistic: t = -16.1699
Decision: Reject H0 at $\alpha$ = 0.05.
Conclusion: There is evidence of a significant effect of
temperature on oil consumption, holding constant
the effect of insulation.
Venn Diagrams and
Estimation of Regression Model
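[Venn diagram: three overlapping circles, Oil, Temp and Insulation]
◼ Overlap of Oil with Temp only: only this information is
used in the estimation of $\beta_1$
◼ Overlap of Oil with Insulation only: only this information is
used in the estimation of $\beta_2$
◼ Overlap of all three circles: this information is NOT used
in the estimation of $\beta_1$ nor $\beta_2$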
© 2004 Prentice-Hall, Inc. Chap 14-35
Confidence Interval Estimate for
the Slope
Provide the 95% confidence interval for the population
slope $\beta_1$ (the effect of temperature on oil consumption).
             Coefficients   Lower 95%      Upper 95%
Intercept    562.151009     516.1930837    608.108935
Temp         -5.4365806     -6.169132673   -4.7040285
Insulation   -20.012321     -25.11620102   -14.90844

$b_1 \pm t_{n-k-1} S_{b_1}$: $-6.169 \le \beta_1 \le -4.704$

We are 95% confident that the average consumption of oil is
reduced by between 4.70 and 6.17 gallons for each
increase of 1°F, holding insulation constant.
We can also perform the test for the significance of individual
variables, H0: $\beta_1 = 0$ vs. H1: $\beta_1 \ne 0$, using this confidence interval.
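
The interval can be checked by hand from the coefficient, its standard error and the critical t value. A minimal sketch using the numbers reported for Temp (b1 = -5.436580588, standard error 0.336216167, t critical value 2.1788 for df = 12); the function name is illustrative.

```python
def slope_confidence_interval(b: float, se: float, t_crit: float) -> tuple[float, float]:
    """Interval b +/- t * S_b for a population slope."""
    margin = t_crit * se
    return b - margin, b + margin

lower, upper = slope_confidence_interval(-5.436580588, 0.336216167, 2.1788)
print(round(lower, 3), round(upper, 3))

# The interval excludes 0, which matches rejecting H0: beta_1 = 0 at alpha = 0.05
excludes_zero = not (lower <= 0 <= upper)
print(excludes_zero)
```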
© 2004 Prentice-Hall, Inc. Chap 14-36
Contribution of a Single
Independent Variable X j

◼ Let Xj Be the Independent Variable of
Interest (j = 1, ..., k)
◼ SSR(Xj | all others except Xj)
= SSR(all) − SSR(all others except Xj)
◼ Measures the additional contribution of Xj in
explaining the total variation in Y with the inclusion
of all the remaining independent variables



Contribution of a Single
Independent Variable X k
Measures the additional contribution of X1 in
explaining Y with the inclusion of X2 and X3.
SSR(X1 | X2 and X3)
= SSR(X1, X2 and X3) − SSR(X2 and X3)
◼ SSR(X1, X2 and X3): from ANOVA section of regression for
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$
◼ SSR(X2 and X3): from ANOVA section of regression for
$\hat{Y}_i = b_0 + b_2 X_{2i} + b_3 X_{3i}$
Note: the values of the coefficients b0, b2, and b3 change
in the two regression equations.
© 2004 Prentice-Hall, Inc. Chap 14-38
Coefficient of Partial
Determination of X j

◼ $r^2_{Yj \cdot \text{all others}} = \frac{SSR(X_j \mid \text{all others})}{SST - SSR(\text{all}) + SSR(X_j \mid \text{all others})}$
◼ Measures the proportion of variation in the
dependent variable that is explained by Xj
while controlling for (holding constant) the
other independent variables



Coefficient of Partial
Determination for X j
(continued)

Example: Model with two independent variables

$r^2_{Y1 \cdot 2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$
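
As an illustration, the sketch below evaluates this coefficient for temperature (X1) given insulation (X2), using the sums of squares that appear in the chapter's ANOVA tables for the heating oil example: SST = 236135.2293, SSR(X1, X2) = 228014.6263 and SSR(X2) = 51076.47. The function and argument names are assumptions for this example.

```python
def partial_r2(sst: float, ssr_all: float, ssr_without_j: float) -> float:
    """Coefficient of partial determination for X_j given all other variables."""
    # SSR(X_j | all others) = SSR(all) - SSR(all others except X_j)
    ssr_j_given_others = ssr_all - ssr_without_j
    return ssr_j_given_others / (sst - ssr_all + ssr_j_given_others)

# Temperature (X1) given insulation (X2), heating oil example
r2_temp_given_insulation = partial_r2(
    sst=236135.2293, ssr_all=228014.6263, ssr_without_j=51076.47
)
print(round(r2_temp_given_insulation, 4))
```

Under these figures, roughly 95.6% of the variation in oil use that insulation leaves unexplained is accounted for by temperature.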



Venn Diagrams and Coefficient of
Partial Determination for X j
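$r^2_{Y1 \cdot 2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$

[Venn diagram: three overlapping circles, Oil, Temp and Insulation. SSR(X1 | X2) is the part of Oil covered by X1 (Temp) only, not X2 (Insulation); the part of Oil covered by both X1 and X2 is left out.]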
© 2004 Prentice-Hall, Inc. Chap 14-41
Coefficient of Partial
Determination in PHStat

◼ PHStat | Regression | Multiple Regression …


◼ Check the “Coefficient of Partial Determination”
box
◼ Excel spreadsheet for the heating oil example



Contribution of a Subset of
Independent Variables

◼ Let Xs Be the Subset of Independent Variables
of Interest
◼ SSR(Xs | all others except Xs)
= SSR(all) − SSR(all others except Xs)
◼ Measures the contribution of the subset Xs in
explaining SST with the inclusion of the remaining
independent variables



Contribution of a Subset of
Independent Variables: Example
Let Xs be X1 and X3

SSR(X1 and X3 | X2)
= SSR(X1, X2 and X3) − SSR(X2)
◼ SSR(X1, X2 and X3): from ANOVA section of regression for
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$
◼ SSR(X2): from ANOVA section of regression for
$\hat{Y}_i = b_0 + b_2 X_{2i}$
© 2004 Prentice-Hall, Inc. Chap 14-44
Testing Portions of Model
◼ Examines the Contribution of a Subset Xs of
Explanatory Variables to the Relationship with Y
◼ Null Hypothesis:
◼ Variables in the subset do not improve the model
significantly when all other variables are included
◼ Alternative Hypothesis:
◼ At least one variable in the subset is significant
when all other variables are included



Testing Portions of Model
(continued)

◼ One-Tailed Rejection Region


◼ Requires Comparison of Two Regressions
◼ One regression includes everything
◼ Another regression includes everything except the
portion to be tested



Partial F Test for the Contribution of
a Subset of X Variables
◼ Hypotheses:
◼ H0 : Variables Xs do not significantly improve the
model given all other variables included
◼ H1 : Variables Xs significantly improve the model
given all others included
◼ Test Statistic:
◼ $F = \frac{SSR(X_s \mid \text{all others}) / m}{MSE(\text{all})}$
◼ with df = m and (n-k-1)
◼ m = # of variables in the subset Xs
© 2004 Prentice-Hall, Inc. Chap 14-47
Partial F Test for the
Contribution of a Single X j

◼ Hypotheses:
◼ H0 : Variable Xj does not significantly improve
the model given all others included
◼ H1 : Variable Xj significantly improves the
model given all others included
◼ Test Statistic:
◼ $F = \frac{SSR(X_j \mid \text{all others})}{MSE(\text{all})}$
◼ with df = 1 and (n-k-1)
◼ m = 1 here, because a single variable is tested
© 2004 Prentice-Hall, Inc. Chap 14-48
Testing Portions of Model:
Example
Test at the $\alpha$ = .05 level to determine if the variable of
average temperature significantly improves the model, given that
insulation is included.



Testing Portions of Model:
Example (single variable)

H0: X1 (temperature) does not improve model with X2
(insulation) included
H1: X1 does improve model
$\alpha$ = .05, df = 1 and 12
Critical Value = 4.75

ANOVA (For X1 and X2)
             SS            MS
Regression   228014.6263   114007.313
Residual     8120.603016   676.716918
Total        236135.2293

ANOVA (For X2)
             SS
Regression   51076.47
Residual     185058.8
Total        236135.2

$F = \frac{SSR(X_1 \mid X_2)}{MSE(X_1, X_2)} = \frac{228,015 - 51,076}{676.717} = 261.47$

Conclusion: Reject H0; X1 does improve model.
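
The partial F statistic in this example can be recomputed from the two ANOVA tables. The sketch below is one way to write it; the function name and the default m = 1 are illustrative assumptions.

```python
def partial_f(ssr_all: float, ssr_reduced: float, mse_all: float, m: int = 1) -> float:
    """Partial F for the m variables dropped from the full model."""
    return (ssr_all - ssr_reduced) / m / mse_all

# Full model (X1 and X2) versus reduced model (X2 only), heating oil example
f_temp = partial_f(ssr_all=228014.6263, ssr_reduced=51076.47, mse_all=676.716918)
critical_value = 4.75  # alpha = .05, df = 1 and 12

print(round(f_temp, 2))
print("Reject H0" if f_temp > critical_value else "Do not reject H0")
```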
Testing Portions of Model
in PHStat

◼ PHStat | Regression | Multiple Regression …


◼ Check the “Coefficient of Partial Determination”
box
◼ Excel spreadsheet for the heating oil example



Do We Need to Do This
for One Variable?

◼ The F Test for the Contribution of a Single
Variable After All Other Variables are Included
in the Model is IDENTICAL to the t Test of
the Slope for that Variable
◼ The Only Reason to Perform an F Test is to
Test Several Variables Together
◼ Simple linear regression: either the F test or the t test can be used
◼ Multiple linear regression: use the t test for a single variable and the F test for several variables together
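
The equivalence shows up numerically as t² = F. A quick check with the heating oil figures for temperature, assuming the coefficient, standard error and sums of squares reported in the chapter's Excel outputs:

```python
import math

# t statistic for Temp: coefficient divided by its standard error
t_temp = -5.436580588 / 0.336216167

# Partial F for Temp given Insulation: SSR(X1 | X2) / MSE(X1, X2)
f_temp = (228014.6263 - 51076.47) / 676.716918

# The squared t statistic matches the partial F up to rounding of the inputs
same_test = math.isclose(t_temp ** 2, f_temp, abs_tol=1e-3)
print(round(t_temp ** 2, 2), round(f_temp, 2), same_test)
```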



Dummy-Variable Models
◼ Categorical Explanatory Variable with 2 or More
Levels
◼ Only Intercepts are Different
◼ Assumes Equal Slopes Across Categories
◼ The Number of Dummy-Variables Needed is (# of
Levels - 1)
◼ Regression Model Has Same Form:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
◼ Two Level Examples
◼ Yes or No, On or Off
◼ Use Dummy-Variable (Coded as 0 or 1)
© 2004 Prentice-Hall, Inc. Chap 14-53
Dummy-Variable Models
(with 2 Levels)
Given: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
Y = Assessed Value of House
X1 = Square Footage of House
X2 = Desirability of Neighborhood = 0 if undesirable, 1 if desirable
Desirable (X2 = 1):
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 (1) = (b_0 + b_2) + b_1 X_{1i}$
Undesirable (X2 = 0):
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 (0) = b_0 + b_1 X_{1i}$
Same slopes
© 2004 Prentice-Hall, Inc. Chap 14-54
Dummy-Variable Models
(with 2 Levels)
(continued)
[Figure: Y (Assessed Value) vs. X1 (Square footage). Two parallel lines with the same slope b1 and different intercepts: b0 + b2 (desirable) and b0 (undesirable).]
© 2004 Prentice-Hall, Inc. Chap 14-55
Interpretation of the Dummy-
Variable Coefficient (with 2 Levels)
Example:
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} = 20 + 5 X_{1i} + 6 X_{2i}$
Y : Annual salary of college graduate in thousand $
X1 : GPA
X2 : 0 if non-business degree, 1 if business degree

With the same GPA, college graduates with a business
degree are making an estimated 6 thousand dollars more
than graduates with a non-business degree, on average.
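
The dummy coefficient can be read off by evaluating the equation for both degree types at the same GPA. In the sketch below, the GPA of 3.0 is a made-up input chosen for illustration; the coefficients are those of the example.

```python
def predicted_salary(gpa: float, business_degree: bool) -> float:
    """Predicted annual salary (thousand $) from Y-hat = 20 + 5*GPA + 6*X2."""
    x2 = 1 if business_degree else 0  # dummy variable coded as 0 or 1
    return 20 + 5 * gpa + 6 * x2

gpa = 3.0  # hypothetical GPA, the same for both graduates
with_business = predicted_salary(gpa, business_degree=True)
without_business = predicted_salary(gpa, business_degree=False)

# The gap equals b2 = 6 regardless of the GPA chosen
print(with_business - without_business)
```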
© 2004 Prentice-Hall, Inc. Chap 14-56
Dummy-Variable Models
(with 3 Levels)
Given:
Y = Assessed Value of the House (1000 $)
X 1 = Square Footage of the House
Style of the House = Split-level, Ranch, Tudor
(3 Levels; Need 2 Dummy Variables)
X2 = 1 if Split-level, 0 if not
X3 = 1 if Ranch, 0 if not
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + b_3 X_{3i}$
© 2004 Prentice-Hall, Inc. Chap 14-57
Interpretation of the Dummy-
Variable Coefficients (with 3 Levels)
Given the Estimated Model:
$\hat{Y}_i = 20.43 + 0.045 X_{1i} + 18.84 X_{2i} + 23.53 X_{3i}$
For Split-level (X2 = 1): $\hat{Y}_i = 20.43 + 0.045 X_{1i} + 18.84$
For Ranch (X3 = 1): $\hat{Y}_i = 20.43 + 0.045 X_{1i} + 23.53$
For Tudor: $\hat{Y}_i = 20.43 + 0.045 X_{1i}$
◼ With the same footage, a Split-level will have an estimated
average assessed value of 18.84 thousand dollars more than a
Tudor.
◼ With the same footage, a Ranch will have an estimated average
assessed value of 23.53 thousand dollars more than a
Tudor.
© 2004 Prentice-Hall, Inc. Chap 14-58
Regression Model Containing
an Interaction Term

◼ Hypothesizes Interaction between a Pair of X
Variables
◼ Response to one X variable varies at different
levels of another X variable
◼ Contains a Cross-Product Term
◼ $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
◼ Can Be Combined with Other Models
◼ E.g., Dummy-Variable Model



Effect of Interaction

◼ Given:
◼ $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
◼ Without Interaction Term, Effect of X1 on Y is
Measured by $\beta_1$
◼ With Interaction Term, Effect of X1 on Y is
Measured by $\beta_1 + \beta_3 X_2$
◼ Effect Changes as X2 Changes



Interaction Example
Y = 1 + 2X1 + 3X2 + 4X1X2
◼ X2 = 1: Y = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1
◼ X2 = 0: Y = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1
[Figure: Y (0 to 12) vs. X1 (0 to 1.5), showing the two lines above]
Effect (slope) of X1 on Y depends on X2 value
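
A short sketch of the same example: the slope with respect to X1 is 2 + 4·X2, so it changes with the level of X2. The function names are illustrative.

```python
def y(x1: float, x2: float) -> float:
    """Example model Y = 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

def slope_of_x1(x2: float) -> float:
    """Effect of a one-unit increase in X1 at a given X2: beta_1 + beta_3 * X2."""
    return 2 + 4 * x2

for x2 in (0, 1):
    # Change in Y when X1 goes from 0 to 1, holding X2 fixed
    change = y(1, x2) - y(0, x2)
    print(f"X2={x2}: intercept={y(0, x2)}, slope={slope_of_x1(x2)}, change={change}")
```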
© 2004 Prentice-Hall, Inc. Chap 14-61
Interaction Regression Model
Worksheet
Case, i   Yi   X1i   X2i   X1i X2i
1         1    1     3     3
2         4    8     5     40
3         1    3     2     6
4         3    5     6     30
:         :    :     :     :

◼ Multiply X1 by X2 to get X1X2
◼ Run regression with Y, X1, X2, X1X2
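
Building the cross-product column is a single multiplication per case. The sketch below uses only the four cases shown in the worksheet; the remaining cases are elided in the slide and are not reproduced here.

```python
# (Y, X1, X2) for the four cases shown in the worksheet
cases = [(1, 1, 3), (4, 8, 5), (1, 3, 2), (3, 5, 6)]

# Append the interaction column X1*X2 to every case
worksheet = [(y, x1, x2, x1 * x2) for y, x1, x2 in cases]

for i, row in enumerate(worksheet, start=1):
    print(i, *row)

# The regression would then use X1, X2 and X1*X2 as three explanatory columns
interaction_column = [row[3] for row in worksheet]
```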
© 2004 Prentice-Hall, Inc. Chap 14-62
Interpretation When There Are
3+ Levels
Consider the effects of gender (male or female) and
working status (working part-time, working full-time
or not working) on income (Y ).
$Y = \beta_0 + \beta_1 \text{Male} + \beta_2 \text{Part-time} + \beta_3 \text{Full-time} + \beta_4 \text{Male} \cdot \text{Part-time} + \beta_5 \text{Male} \cdot \text{Full-time} + \varepsilon$
◼ Male = 0 if female; 1 if male
◼ Part-time = 1 if working part-time; 0 if working full-time or not working
◼ Full-time = 1 if working full-time; 0 if working part-time or not working
◼ Male•Part-time = (Male times Part-time) = 1 if male and working part-time; 0 otherwise
◼ Male•Full-time = (Male times Full-time) = 1 if male and working full-time; 0 otherwise
© 2004 Prentice-Hall, Inc. Chap 14-63
Interpretation When There Are
3+ Levels
(continued)

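$Y = \beta_0 + \beta_1 \text{Male} + \beta_2 \text{Part-time} + \beta_3 \text{Full-time} + \beta_4 \text{Male} \cdot \text{Part-time} + \beta_5 \text{Male} \cdot \text{Full-time} + \varepsilon$

         Not-working           Part-time                                  Full-time
Female   $\beta_0$             $\beta_0 + \beta_2$                        $\beta_0 + \beta_3$
Male     $\beta_0 + \beta_1$   $\beta_0 + \beta_1 + \beta_2 + \beta_4$    $\beta_0 + \beta_1 + \beta_3 + \beta_5$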



Interpreting Results
              Female                Male                                       Difference
Not-working   $\beta_0$             $\beta_0 + \beta_1$                        $\beta_1$
Part-time     $\beta_0 + \beta_2$   $\beta_0 + \beta_1 + \beta_2 + \beta_4$    $\beta_1 + \beta_4$
Full-time     $\beta_0 + \beta_3$   $\beta_0 + \beta_1 + \beta_3 + \beta_5$    $\beta_1 + \beta_5$

Main Effects : Male, Part-time and Full-time
Interaction Effects : Male•Part-time and Male•Full-time
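
The table of differences can be verified by plugging the dummy codes into the model. In the sketch below, the six coefficient values are hypothetical numbers chosen only to make the arithmetic visible; they do not come from the slides.

```python
def expected_income(beta, male: int, part_time: int, full_time: int) -> float:
    """Model value without the error term, including both interaction terms."""
    b0, b1, b2, b3, b4, b5 = beta
    return (b0 + b1 * male + b2 * part_time + b3 * full_time
            + b4 * male * part_time + b5 * male * full_time)

beta = (10, 2, 8, 20, 1, 3)  # hypothetical beta_0 ... beta_5

# Male minus female at each working status: (part_time, full_time) dummy codes
statuses = {"Not-working": (0, 0), "Part-time": (1, 0), "Full-time": (0, 1)}
differences = {
    name: expected_income(beta, 1, pt, ft) - expected_income(beta, 0, pt, ft)
    for name, (pt, ft) in statuses.items()
}
print(differences)  # beta_1, beta_1 + beta_4, beta_1 + beta_5
```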



Evaluating the Presence of
Interaction with Dummy-Variable
◼ Suppose X1 and X2 are Numerical Variables and X3 is a
Dummy-Variable
◼ To Test if the Slopes of Y with X1 and/or X2 are the
Same for the Two Levels of X3
◼ Model:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{3i} + \beta_5 X_{2i} X_{3i} + \varepsilon_i$
◼ Hypotheses:
◼ H0: $\beta_4 = \beta_5 = 0$ (No Interaction between X1 and X3 or X2 and
X3)
◼ H1: $\beta_4$ and/or $\beta_5 \ne 0$ (X1 and/or X2 Interacts with X3)
◼ Perform a Partial F Test
$F = \frac{\left( SSR(X_1, X_2, X_3, X_4, X_5) - SSR(X_1, X_2, X_3) \right) / 2}{MSE(X_1, X_2, X_3, X_4, X_5)}$
where X4 and X5 denote the interaction terms X1X3 and X2X3
Evaluating the Presence of
Interaction with Numerical Variables
◼ Suppose X1, X2 and X3 are Numerical Variables
◼ To Test If the Independent Variables Interact with
Each Other
◼ Model:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{2i} + \beta_5 X_{1i} X_{3i} + \beta_6 X_{2i} X_{3i} + \varepsilon_i$
◼ Hypotheses:
◼ H0: $\beta_4 = \beta_5 = \beta_6 = 0$ (no interaction among X1, X2 and X3)
◼ H1: at least one of $\beta_4, \beta_5, \beta_6 \ne 0$ (at least one pair of X1, X2,
X3 interact with each other)
◼ Perform a Partial F Test
$F = \frac{\left( SSR(X_1, X_2, X_3, X_4, X_5, X_6) - SSR(X_1, X_2, X_3) \right) / 3}{MSE(X_1, X_2, X_3, X_4, X_5, X_6)}$
where X4, X5 and X6 denote the interaction terms X1X2, X1X3 and X2X3
© 2004 Prentice-Hall, Inc. Chap 14-67
Logistic Regression Model
◼ Enables the Use of Regression Model to
Predict the Probability of a Particular
Categorical Response for a Given Set of
Explanatory Variables
◼ Based on the Odds Ratio
◼ Represents the probability of a success compared
with the probability of failure
◼ $\text{Odds ratio} = \frac{\text{probability of success}}{1 - \text{probability of success}}$



Logistic Regression Model
(continued)

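◼ Logistic Regression Model
◼ $\ln(\text{odds ratio}) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
◼ Logistic Regression Equation
◼ $\ln(\text{estimated odds ratio}) = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$
◼ Estimated Odds Ratio
◼ $\text{estimated odds ratio} = e^{\ln(\text{estimated odds ratio})}$
◼ Estimated Probability of Success
◼ $\text{estimated probability of success} = \frac{\text{estimated odds ratio}}{1 + \text{estimated odds ratio}}$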
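
The three steps (log odds, odds ratio, probability) chain together as in the sketch below. The coefficients and the X value are hypothetical, chosen so that the log odds come out to zero; the function name is also an assumption.

```python
import math

def estimated_probability(b: list[float], x: list[float]) -> float:
    """Estimated probability of success from a fitted logistic regression equation."""
    # ln(estimated odds ratio) = b0 + b1*X1 + ... + bk*Xk
    log_odds = b[0] + sum(bj * xj for bj, xj in zip(b[1:], x))
    # estimated odds ratio = e^(ln(estimated odds ratio))
    odds = math.exp(log_odds)
    # estimated probability = odds / (1 + odds)
    return odds / (1 + odds)

# Hypothetical fit: b0 = -1.0, b1 = 0.5, evaluated at X1 = 2.0 (log odds = 0)
p = estimated_probability([-1.0, 0.5], [2.0])
print(p)
```

Log odds of zero correspond to an odds ratio of 1, that is, success and failure equally likely.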
© 2004 Prentice-Hall, Inc. Chap 14-69
Interpretation of Estimated
Slope Coefficients

◼ Logistic Regression Equation Has to be
Estimated Using Computer Statistical
Software, e.g. Minitab®
◼ The Estimated Slope Coefficient bj Measures
the Estimated Change in the Natural
Logarithm of the Odds Ratio as a Result of a
One Unit Change in the Independent Variable
Xj Holding Constant the Effects of all the
Other Independent Variables



The Deviance Statistic

◼ Used to Test whether the Logistic
Regression is a Good-Fitting Model
◼ Hypotheses
◼ H0 : The model is a good-fitting model
◼ H1 : The model is not a good-fitting model
◼ Test Statistic
◼ The deviance statistic has a $\chi^2$ distribution with
(n – k – 1) degrees of freedom
◼ The rejection region is always in the upper tail



Testing Significance of an
Independent Variable

◼ Hypotheses
◼ H0 : $\beta_j = 0$ (Xj is not significant)
◼ H1 : $\beta_j \ne 0$ (Xj is significant)
◼ Test Statistic
◼ The Wald statistic is normally distributed
◼ A two-tail test with left and right-tail rejection
regions



Chapter Summary
◼ Developed the Multiple Regression Model
◼ Discussed Residual Plots
◼ Addressed Testing the Significance of the
Multiple Regression Model
◼ Discussed Inferences on Population
Regression Coefficients
◼ Addressed Testing Portions of the Multiple
Regression Model
◼ Discussed Dummy-Variables and Interaction
Terms
◼ Addressed Logistic Regression Model