
MACHINE LEARNING


LECTURE 5: Variable/Model Selection


LEARNING OBJECTIVES

• Variable Selection
• Cross-validation
• Cross-validation Methods
MODEL SELECTION / VARIABLE SELECTION

• Given a data set with many potential predictors, we need to decide
which ones to include in our model and which ones to leave out.
• Statistical algorithms may be used to find the best set of predictors.
Common selection methods are:
• Forward Selection (automatic procedure)
• Backward Elimination (automatic procedure)
• Stepwise Selection (automatic procedure)
• Best Subset Selection (automatic procedure)
FORWARD SELECTION

In Step 1, the predictor with the most significant relationship with the response is entered into the model.

In subsequent steps, the remaining predictors are considered; the predictor whose addition produces the greatest increase in R² is added.

The algorithm stops when adding predictors no longer has a significant effect on R².

The most significant variable can be chosen as the one that, when added to the model:
• Has the smallest p-value, or
• Provides the highest increase in R², or
• Provides the highest drop in model RSS (residual sum of squares) compared to the other predictors under consideration.
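The forward-selection loop above can be sketched in code. The course's examples are in R; purely as an illustration, here is a minimal Python/NumPy sketch that greedily adds the predictor giving the largest R² increase. The `min_gain` stopping threshold and the use of R² gain (rather than p-values) as the criterion are assumptions of this sketch, not taken from the slides.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept included)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = ((y - X1 @ beta) ** 2).sum()
    tss = ((y - y.mean()) ** 2).sum()
    return 1.0 - rss / tss

def forward_selection(X, y, min_gain=1e-3):
    """Step 1: enter the single best predictor; then keep adding the
    predictor with the largest R^2 gain until the gain is negligible."""
    selected, remaining, best_r2 = [], list(range(X.shape[1])), 0.0
    while remaining:
        scores = {j: r_squared(X[:, selected + [j]], y) for j in remaining}
        j_best = max(scores, key=scores.get)
        if scores[j_best] - best_r2 < min_gain:  # no significant effect on R^2
            break
        selected.append(j_best)
        remaining.remove(j_best)
        best_r2 = scores[j_best]
    return selected, best_r2
```

On a toy data set where the response depends on only two of five columns, the loop typically stops after selecting exactly those two.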
BACKWARD ELIMINATION

In Step 1, all predictors are entered into the model.

In subsequent steps, the predictor whose removal results in the smallest decrease in R² is removed.

The algorithm stops when removing any remaining predictor would result in a significant drop in R².

The least significant variable is one that:
• Has the highest p-value in the model, or
• Causes the lowest drop in R² when eliminated from the model, or
• Causes the lowest increase in RSS (residual sum of squares) when eliminated, compared to the other predictors.
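Backward elimination can be sketched the same way, running the loop in the opposite direction. As before this is an illustrative Python/NumPy sketch, not the course's R code; the `max_loss` threshold and the R²-drop criterion are assumptions.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept included)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = ((y - X1 @ beta) ** 2).sum()
    tss = ((y - y.mean()) ** 2).sum()
    return 1.0 - rss / tss

def backward_elimination(X, y, max_loss=1e-3):
    """Step 1: enter all predictors; then repeatedly remove the predictor
    whose removal costs the least R^2, until any removal would cost too much."""
    selected = list(range(X.shape[1]))
    current = r_squared(X[:, selected], y)
    while len(selected) > 1:
        # R^2 of the model after dropping each candidate predictor
        trials = {j: r_squared(X[:, [k for k in selected if k != j]], y)
                  for j in selected}
        j_drop = max(trials, key=trials.get)      # the least significant predictor
        if current - trials[j_drop] > max_loss:   # removal would hurt: stop
            break
        selected.remove(j_drop)
        current = trials[j_drop]
    return selected, current
```

Note the symmetry with forward selection: here the stopping rule guards against a significant *drop* in R² rather than requiring a significant gain.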
STEPWISE SELECTION (SEQUENTIAL REPLACEMENT)

Stepwise selection is a combination of forward selection and backward elimination:
• Start with no predictors, then sequentially add the most contributive predictor (as in forward selection).
• After adding each new variable, remove any variables that no longer provide an improvement in the model fit (as in backward elimination).
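Combining the two directions gives the following illustrative Python/NumPy sketch: each forward addition is followed by a backward pass that evicts predictors whose contribution has become negligible. The shared tolerance `tol` and the R² criterion are assumptions of this sketch.

```python
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept included)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = ((y - X1 @ beta) ** 2).sum()
    tss = ((y - y.mean()) ** 2).sum()
    return 1.0 - rss / tss

def stepwise_selection(X, y, tol=1e-3):
    """Forward step adds the most contributive predictor; after each
    addition, a backward pass drops any predictor whose removal costs
    less than `tol` in R^2 (i.e. it no longer improves the fit)."""
    selected, remaining, best = [], list(range(X.shape[1])), 0.0
    while remaining:
        # forward step (like forward selection)
        gains = {j: r_squared(X[:, selected + [j]], y) for j in remaining}
        j_add = max(gains, key=gains.get)
        if gains[j_add] - best < tol:
            break
        selected.append(j_add)
        remaining.remove(j_add)
        best = gains[j_add]
        # backward pass (like backward elimination)
        while len(selected) > 1:
            trials = {j: r_squared(X[:, [k for k in selected if k != j]], y)
                      for j in selected}
            j_drop = max(trials, key=trials.get)
            if best - trials[j_drop] >= tol:  # every predictor still earns its keep
                break
            selected.remove(j_drop)
            remaining.append(j_drop)
            best = trials[j_drop]
    return selected, best
```

Using the same tolerance for adding and dropping keeps a just-added predictor from being evicted immediately in the backward pass.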
BEST SUBSETS (EXHAUSTIVE SEARCH)

Best subset selection starts by considering all possible models with 1 variable, 2 variables, …, k variables.

It then chooses the best model of size 1, the best model of size 2, …, the best model of size k.

Lastly, from these finalists, it chooses the best overall model.

Best subset selection therefore ends up selecting one model out of 2^k possible models.
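The exhaustive search can be sketched directly with `itertools.combinations`. One detail the outline leaves open is the criterion for comparing the finalists: plain R² always favours the largest model, so a size-penalised criterion (adjusted R², Cp, or BIC are typical choices) is needed. This illustrative Python/NumPy sketch assumes adjusted R² for that final comparison.

```python
from itertools import combinations
import numpy as np

def r_squared(X, y):
    """R^2 of an ordinary least-squares fit of y on X (intercept included)."""
    X1 = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    rss = ((y - X1 @ beta) ** 2).sum()
    tss = ((y - y.mean()) ** 2).sum()
    return 1.0 - rss / tss

def adjusted_r2(r2, n, p):
    """Adjusted R^2 penalises the extra predictors plain R^2 always rewards."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

def best_subset(X, y):
    """Score every non-empty subset, keep the best model of each size,
    then pick the overall winner among those k finalists."""
    n, k = X.shape
    finalists = {}
    for size in range(1, k + 1):
        finalists[size] = max(combinations(range(k), size),
                              key=lambda s: r_squared(X[:, list(s)], y))
    winner = max(finalists.values(),
                 key=lambda s: adjusted_r2(r_squared(X[:, list(s)], y), n, len(s)))
    return winner, finalists
```

For k = 5 this scores 2^5 - 1 = 31 candidate models; the exponential growth in 2^k is why exhaustive search is only practical for modest k.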
R-CODE MODEL SELECTION

EXERCISE
1. Use the Carseats dataset from the ISLR package and find the best model.
2. Use the swiss dataset and find the best model.
