
Sessions 11 & 12: Multiple Regression with Categorical Predictors & Variable Selection Method: Backward Elimination

Dr. Mahesh K C 1
Regression with Categorical Predictors Using Indicator Variables
• Categorical variables can be included in a regression model through the use of indicator variables.
• Example: Consider the Cars data set. The variables mpg, cylinders, cubicinches, hp, weightlbs, and time.to.60 are continuous, and brand is a categorical variable with three levels: US, Japan, and Europe. The variable "year" is not considered.
• For regression, a categorical variable with k categories is transformed into k − 1 indicator (dummy) variables. An indicator variable is binary: it equals 1 when the observation belongs to the category and 0 otherwise.
• The brand variable is transformed into two indicator (dummy) variables:

C1 = 1 if brand is Japan, 0 otherwise
C2 = 1 if brand is US, 0 otherwise

• Note that brand = Europe is implied when C1 = 0 and C2 = 0; Europe is known as the reference category.
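The k − 1 indicator coding above can be sketched in plain Python. The helper below is illustrative (not from the lecture); the reference category is passed explicitly and is encoded as all zeros:

```python
def dummy_encode(values, reference):
    """Convert a categorical variable with k levels to k - 1 indicator columns.

    The `reference` level is implied by all indicators equal to 0.
    Returns (column_names, rows), where each row is a list of 0/1 indicators.
    """
    levels = sorted(set(values))
    keep = [lv for lv in levels if lv != reference]  # the k - 1 non-reference levels
    names = ["C_" + lv for lv in keep]
    rows = [[1 if v == lv else 0 for lv in keep] for v in values]
    return names, rows

# Example: brand with levels US, Japan, Europe; Europe is the reference category
names, rows = dummy_encode(["US", "Japan", "Europe", "US"], reference="Europe")
# names -> ['C_Japan', 'C_US']; the Europe observation encodes as [0, 0]
```

In practice a library encoder (e.g. one-hot encoding with the first level dropped) does the same job; the point is that k = 3 brand levels become 2 indicator columns.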

Estimated Regression Equation with Categorical Predictors
• Including the indicator variables in the model produces the estimated regression equation:

mpg = b0 + b1(cylinders) + b2(cubicinches) + b3(hp) + b4(weightlbs) + b5(time.to.60) + b6(C1) + b7(C2)

• Estimated regression equation when brand is Japan (C1 = 1, C2 = 0):

mpg = (b0 + b6) + b1(cylinders) + b2(cubicinches) + b3(hp) + b4(weightlbs) + b5(time.to.60)

• Estimated regression equation when brand is US (C1 = 0, C2 = 1):

mpg = (b0 + b7) + b1(cylinders) + b2(cubicinches) + b3(hp) + b4(weightlbs) + b5(time.to.60)

• Estimated regression equation when brand is Europe (C1 = 0, C2 = 0):

mpg = b0 + b1(cylinders) + b2(cubicinches) + b3(hp) + b4(weightlbs) + b5(time.to.60)
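A quick numeric sketch of how the indicators shift only the intercept. The coefficient values below are made up purely for illustration; they are not fitted from the Cars data:

```python
# Hypothetical coefficients b0..b7 (illustration only, not fitted values)
b = {"b0": 40.0, "b1": -0.5, "b2": -0.01, "b3": -0.05, "b4": -0.004,
     "b5": 0.2, "b6": 2.0, "b7": -1.0}

def predict_mpg(cylinders, cubicinches, hp, weightlbs, time_to_60, c1, c2):
    """Estimated regression equation with brand indicators C1 (Japan) and C2 (US)."""
    return (b["b0"] + b["b1"] * cylinders + b["b2"] * cubicinches
            + b["b3"] * hp + b["b4"] * weightlbs + b["b5"] * time_to_60
            + b["b6"] * c1 + b["b7"] * c2)

# Same continuous inputs, varying only the brand indicators
x = dict(cylinders=4, cubicinches=120, hp=90, weightlbs=2500, time_to_60=15)
japan = predict_mpg(**x, c1=1, c2=0)    # intercept becomes b0 + b6
us = predict_mpg(**x, c1=0, c2=1)       # intercept becomes b0 + b7
europe = predict_mpg(**x, c1=0, c2=0)   # reference category keeps b0
# The three predictions differ only in the intercept: japan - europe == b6
```

This makes the slope interpretation concrete: all brands share the same slopes, and b6 and b7 measure each brand's mpg shift relative to the reference category, Europe.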

Variable Selection Methods
• Several variable selection methods are available.
• They assist the analyst in determining which variables to include in the model.
• These algorithms help select the predictors that lead to an optimal model.
• Four variable selection methods:
(1) Forward Selection
(2) Backward Elimination
(3) Stepwise Selection
(4) Best Subsets

The Partial F-Test (Theory optional)
• Suppose the model has predictors x1, …, xp and we consider adding an additional predictor x*.
• Calculate the sequential sum of squares from adding x*, given that x1, …, xp are already in the model.
• Full sum of squares: SS_Full, with x1, …, xp, x* in the model.
• Reduced sum of squares: SS_Reduced, with x1, …, xp in the model.
• Therefore, the extra sum of squares is

SS_Extra = SS(x* | x1, x2, …, xp) = SS_Full − SS_Reduced

• Hypotheses for the Partial F-test:
– H0: SS_Extra associated with x* does not contribute significantly to the model.
– Ha: SS_Extra associated with x* contributes significantly to the model.
• Test statistic for the Partial F-test:

F(x* | x1, x2, …, xp) = SS_Extra / MSE_Full

which follows an F(1, n − p − 2) distribution when H0 is true.
• Therefore, H0 is rejected for a small p-value.
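The statistic above can be sketched with numpy least squares. This is a minimal illustration, assuming an intercept is appended internally and x_star is the candidate column:

```python
import numpy as np

def partial_f(y, X_reduced, x_star):
    """Partial F-statistic for adding x_star to a model that already
    contains the columns of X_reduced. Returns (F, SS_extra)."""
    n = len(y)

    def fit_sse(X):
        X1 = np.column_stack([np.ones(n), X])        # prepend intercept column
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        resid = y - X1 @ beta
        return float(resid @ resid), X1.shape[1]

    sse_reduced, _ = fit_sse(X_reduced)
    sse_full, k_full = fit_sse(np.column_stack([X_reduced, x_star]))
    # In regression sums of squares, SS_Extra = SS_Full - SS_Reduced,
    # which equals SSE_Reduced - SSE_Full in residual terms
    ss_extra = sse_reduced - sse_full
    mse_full = sse_full / (n - k_full)               # residual df = n - p - 2
    return ss_extra / mse_full, ss_extra
```

The resulting F would then be compared against the F(1, n − p − 2) distribution to get the p-value.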

Backward Elimination Procedure
• The procedure begins with all variables in the model.
• Step 1:
Perform the regression on the full model, with all variables included.
For example, assume the model has x1, …, x4.
• Step 2:
For each variable in the current model, perform the partial F-test.
Select the variable with the smallest partial F-statistic, denoted Fmin.
• Step 3:
If Fmin is not significant, remove the associated variable from the model and return to Step 2.
Otherwise, if Fmin is significant, stop the algorithm and report the current model.
If this is the first pass, the current model is the full model.
If not, the full set of predictors has been reduced by one or more variables.
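Steps 1–3 can be sketched with numpy as follows. The fixed cutoff f_crit = 4.0 is a rough stand-in for the 5% F critical value (a full implementation would look it up from the F(1, n − p − 2) distribution at each pass):

```python
import numpy as np

def sse(y, X):
    """Residual sum of squares for OLS of y on [1, X]."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    r = y - X1 @ beta
    return float(r @ r)

def backward_eliminate(y, X, names, f_crit=4.0):
    """Backward elimination driven by drop-one partial F-statistics.

    f_crit approximates the 5% critical value of F(1, df) for moderate
    residual df; replace it with a proper lookup in real use.
    """
    cols = list(range(X.shape[1]))
    while cols:
        full_sse = sse(y, X[:, cols])
        mse_full = full_sse / (len(y) - len(cols) - 1)
        # Partial F for each predictor: (SSE without it - SSE with it) / MSE_full
        fstats = [(sse(y, X[:, [c for c in cols if c != j]]) - full_sse) / mse_full
                  for j in cols]
        f_min = min(fstats)
        if f_min >= f_crit:                     # Fmin significant: stop, report model
            break
        cols.remove(cols[fstats.index(f_min)])  # drop weakest predictor, repeat
    return [names[c] for c in cols]
```

Each pass refits the model without one predictor at a time, removes the one with the smallest partial F if it is not significant, and stops once every remaining predictor is significant.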

Backward Elimination Applied to the Cars Data Set
• We begin with all predictors (excluding "year") included in the model:

Model A: mpg = b0 + b1(cylinders) + b2(cubicinches) + b3(hp) + b4(weightlbs) + b5(time.to.60) + b6(brand)

• A partial F-statistic is calculated for each predictor. The smallest, Fmin (= 0.5132), is associated with cubicinches. Since Fmin is not significant at the 5% level, cubicinches is dropped.
• On the second pass, cylinders is eliminated, as its Fmin (= 0.4425) is not significant at 5%.
• On the third pass, time.to.60 is dropped, as its Fmin (= 1.7229) is not significant at 5%.
• Finally, all remaining predictors are significant at the 5% level.
• The procedure terminates with Model B:

Model B: mpg = b0 + b1(hp) + b2(weightlbs) + b6(brand)

Backward Elimination Applied to the Cars Data Set
• Variable selection methods often take care of multicollinearity, but one may still check for it in the final model.
• Based on Model B, check for influential observations, outliers, and high-leverage values.
• Check the regression assumptions. If they are violated, one may try a transformation of the response variable, the predictors, or both.
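One common multicollinearity check is the variance inflation factor (VIF). This numpy sketch (not from the lecture) computes VIF_j = 1 / (1 − R²_j), where R²_j comes from regressing predictor j on the remaining predictors; values above a rule-of-thumb cutoff of 5 or 10 signal trouble:

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of the predictor matrix X."""
    n, k = X.shape
    out = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        X1 = np.column_stack([np.ones(n), others])   # regress col j on the rest
        beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
        resid = y - X1 @ beta
        r2 = 1.0 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
        out.append(1.0 / (1.0 - r2))                 # VIF_j = 1 / (1 - R^2_j)
    return out
```

Running this on the Model B predictors would show whether hp, weightlbs, and the brand indicators carry overlapping information.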
