
Advanced Regression

and Model Selection


UpGrad Live Session - Ankit Jain
Model Selection Techniques
● If you are looking for a good place to start when choosing a
machine learning algorithm for your dataset, here are some
general guidelines.
● How large is your training set?
○ Small: prefer high-bias/low-variance classifiers (e.g.
Naive Bayes) over low-bias/high-variance classifiers (e.g.
KNN) to avoid overfitting.
○ Large: low-bias/high-variance classifiers tend to produce
more accurate models (see the sketch below).
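A minimal sketch of this trade-off in R (the data set, packages, and
parameter values here are illustrative assumptions, not from the
session): a high-bias naive Bayes model versus a low-bias 5-NN model
at two training-set sizes.

library(e1071)   # provides naiveBayes
library(class)   # provides knn

set.seed(42)
data(iris)
test_idx <- sample(nrow(iris), 50)
test     <- iris[test_idx, ]
pool     <- iris[-test_idx, ]

for (n in c(15, 100)) {   # small vs. larger training set
  train   <- pool[sample(nrow(pool), n), ]
  nb      <- naiveBayes(Species ~ ., data = train)
  nb_acc  <- mean(predict(nb, test) == test$Species)
  knn_acc <- mean(knn(train[, 1:4], test[, 1:4],
                      cl = train$Species, k = 5) == test$Species)
  cat(sprintf("n = %3d | naive Bayes: %.2f | 5-NN: %.2f\n",
              n, nb_acc, knn_acc))
}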
Adv/Disadv of Various Algorithms
● Naive Bayes:
○ Very simple to implement as it’s just a bunch of counts.
○ If conditional independence exists, it converges faster
than say Logistic Regression and thus requires less
training data.
○ If you want something fast and easy that performs well, NB is
a good choice.
○ Biggest disadvantage is that it can’t learn interactions in
the dataset
Adv/Disadv of Various Algorithms
● Logistic Regression:
○ Lots of ways to regularize the model (see the glmnet sketch
below), and unlike Naive Bayes there is no need to worry about
correlated features.
○ Nice probabilistic interpretation. Helpful in problems like
churn prediction.
○ Online algorithm: Easy to update the model with the new
data (using an online gradient descent method)
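A minimal sketch of regularized logistic regression in R with glmnet
(the synthetic data and parameter choices are assumptions for
illustration): alpha = 0 applies an L2 penalty, alpha = 1 an L1
penalty.

library(glmnet)

set.seed(1)
X <- matrix(rnorm(200 * 5), 200, 5)                 # 5 synthetic features
y <- rbinom(200, 1, plogis(X %*% c(2, -1, 0, 0, 0)))

fit <- cv.glmnet(X, y, family = "binomial", alpha = 0)  # ridge-style penalty
# Probabilistic output: P(y = 1 | x), handy for churn-style problems
head(predict(fit, X, s = "lambda.min", type = "response"))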
Adv/Disadv of Various Algorithms
● Decision Trees:
○ Easy to explain and interpret (at least for some people)
○ Easily handles feature interactions.
○ No need to worry about outliers or whether data is linearly
separable or not.
○ Doesn’t support online learning. Rebuilding the model with
new data every time can be painful.
○ Tends to overfit easily. Solution: ensemble methods such as
random forests (RF); see the sketch below.
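A minimal sketch in R (assuming the rpart and randomForest packages):
a single tree gives readable split rules, while bagging many trees in
a random forest curbs the overfitting.

library(rpart)
library(randomForest)

data(iris)
tree <- rpart(Species ~ ., data = iris)  # interpretable single tree
print(tree)                              # readable if/else split rules

set.seed(7)
rf <- randomForest(Species ~ ., data = iris, ntree = 500)
rf$err.rate[500, "OOB"]                  # out-of-bag error estimate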
Adv/Disadv of Various Algorithms
● SVM:
○ High accuracy for many datasets
○ With appropriate kernel, can work well even if your data
isn’t linearly separable in the base feature space.
○ Popular in text-processing applications, where high
dimensionality is the norm
○ Memory intensive, hard to interpret, and somewhat fiddly to
run and tune (see the sketch below)
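A minimal kernelized-SVM sketch in R with e1071 (the cost and gamma
values are arbitrary assumptions): the radial kernel handles data
that is not linearly separable in the base feature space.

library(e1071)

data(iris)
fit <- svm(Species ~ ., data = iris, kernel = "radial",
           cost = 1, gamma = 0.25)
mean(predict(fit, iris) == iris$Species)  # training accuracy
# cost and gamma usually need tuning, e.g. with tune.svm()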
ADVANCED REGRESSION
Linear Regression Issues

● Sensitivity to outliers
● Multicollinearity leads to high variance of the estimator.
● Prone to overfitting when there are many variables
● Hard to interpret when the number of predictors is large; we
need a smaller subset that exhibits the strongest effects.
Regularization Techniques

● Regularization techniques typically work by penalizing the
magnitude of the feature coefficients while also minimizing the
error between predicted and actual observations
● Different types of penalization (both objectives are written
out below)
○ Ridge Regression: penalizes the squared coefficients (L2)
○ Lasso Regression: penalizes the absolute values of the
coefficients (L1)
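Written out, the two objectives differ only in the penalty term,
with lambda >= 0 controlling its strength:

Ridge: minimize sum(y - yhat)^2 + lambda * (beta1^2 + … + betap^2)
Lasso: minimize sum(y - yhat)^2 + lambda * (|beta1| + … + |betap|)

The larger the lambda, the more aggressively the coefficients are
shrunk toward zero.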
Why penalize on model coefficients?

Model 1: y = beta0 + beta1*x                    → beta1 = -0.58
Model 2: y = beta0 + beta1*x + … + beta10*x^10  → beta1 = -1.4e05

The overfit degree-10 model needs huge coefficients to chase the
training data; penalizing coefficient magnitude discourages exactly
this behavior.

Ridge Regression

● L2 penalty
● Pros
○ Works even when variables far outnumber rows (p >> n)
○ Handles multicollinearity well
○ Trades slightly higher bias for much lower variance than
plain Linear Regression
● Cons
○ Doesn’t produce a parsimonious model (no coefficient is
shrunk exactly to zero)

Let’s see a collinearity example in R
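A sketch of the kind of demo meant here (simulated data; the actual
session example is not reproduced): two near-identical predictors make
the OLS coefficients unstable, while an L2 penalty stabilizes them.

library(glmnet)

set.seed(10)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.01)  # almost an exact copy of x1
y  <- 3 * x1 + rnorm(100)

coef(lm(y ~ x1 + x2))             # wild, mutually offsetting estimates
coef(glmnet(cbind(x1, x2), y, alpha = 0, lambda = 0.1))
# ridge splits the effect roughly evenly across the correlated pair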


Example: Leukemia Prediction

● Leukemia Data, Golub et al. Science 1999


● There are 38 training samples and 34 test samples with total
genes ~ 7000 (p >> n)
● Xij is the gene expression value for sample i and gene j
● Sample i either has tumor type AML or ALL
● We want to select genes relevant to tumor type
○ eliminate the trivial genes
○ grouped selection as many genes are highly correlated
● Ridge Regression can help with this modeling task (a p >> n
sketch follows below)
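A minimal p >> n sketch in R (simulated data, since the Golub set is
not bundled with base R): ridge-penalized logistic regression still
fits when "genes" far outnumber samples.

library(glmnet)

set.seed(99)
n <- 38; p <- 7000                          # samples and "genes"
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(X[, 1] - X[, 2]))  # stand-in for AML vs. ALL

fit <- glmnet(X, y, family = "binomial", alpha = 0)  # L2 penalty only
dim(coef(fit, s = 0.1))   # 7001 coefficients estimated from 38 samples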
Grouped Selection

● If two predictors are highly correlated with each other, their
estimated coefficients will be similar
● If some variables are exactly identical, they will have the
same coefficients

Ridge is good for grouped selection but not good for eliminating
trivial genes
LASSO
● Pros
○ Allows p >> n
○ Enforces sparsity in the parameters
● Cons
○ If a group of predictors are highly correlated among
themselves, LASSO tends to pick only one of them and shrink
the others to zero
○ As a result, it cannot do grouped selection (see the sketch
below)

LASSO is good for eliminating trivial genes but not good for
grouped selection
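A minimal sketch of both points in R with glmnet (simulated data):
of two highly correlated predictors, the lasso typically keeps one
and shrinks the other to exactly zero, while a trivial predictor is
also zeroed out.

library(glmnet)

set.seed(5)
x1 <- rnorm(200)
x2 <- x1 + rnorm(200, sd = 0.05)  # highly correlated pair
x3 <- rnorm(200)                  # trivial predictor
y  <- 2 * x1 + 2 * x2 + rnorm(200)

fit <- glmnet(cbind(x1, x2, x3), y, alpha = 1)  # alpha = 1 => lasso
coef(fit, s = 0.5)  # at a strong penalty: one of x1/x2 survives, x3 -> 0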
Elastic Net

● Weighted combination of L1 and L2 penalty


● Helps in enforcing sparsity
● Encourages a grouping effect among highly correlated predictors

In the gene selection problem, it can achieve both goals: removing
trivial genes and doing grouped selection (see the sketch below).
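A minimal elastic net sketch in R, reusing the correlated pair from
the lasso sketch above (0 < alpha < 1 mixes the L1 and L2 penalties;
the 0.5 mix is an arbitrary choice):

library(glmnet)

set.seed(5)
x1 <- rnorm(200)
x2 <- x1 + rnorm(200, sd = 0.05)
x3 <- rnorm(200)
y  <- 2 * x1 + 2 * x2 + rnorm(200)

fit <- glmnet(cbind(x1, x2, x3), y, alpha = 0.5)  # 50/50 L1-L2 mix
coef(fit, s = 0.5)  # grouped: similar nonzero weights on x1 and x2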
Other Advanced Regression Methods

Poisson Regression

○ Used when the Y variable follows a Poisson distribution
(typically counts of events within a time window t)
○ Example: the number of times a customer will visit an
e-commerce website next month (see the sketch below)
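A minimal Poisson regression sketch in base R (the simulated
visit-count setup is an illustrative assumption):

set.seed(8)
past_visits <- rpois(500, 3)                             # last month
visits      <- rpois(500, exp(0.2 + 0.3 * past_visits))  # next month

fit <- glm(visits ~ past_visits, family = poisson)
exp(coef(fit))  # multiplicative effects on the expected count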
Piecewise Linear Regression

● Polynomial regression won't work well here, as it has a strong
tendency to overfit or underfit
● Instead, splitting the curve into separate linear pieces and
building a linear model for each piece leads to better results
(see the sketch below)
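A minimal piecewise-linear sketch in base R: a hinge term
pmax(x - knot, 0) lets the slope change at a breakpoint (the knot
location is assumed known here; packages such as segmented can
estimate it from the data).

set.seed(2)
x <- seq(0, 10, length.out = 200)
y <- ifelse(x < 5, 2 * x, 10 + 0.2 * (x - 5)) + rnorm(200, sd = 0.5)

knot <- 5
fit  <- lm(y ~ x + pmax(x - knot, 0))  # slope: 2 before the knot,
coef(fit)                              # 2 + hinge coefficient after it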
QUESTIONS
