
Advanced Regression

and Model Selection


UpGrad Live Session - Ankit Jain
Model Selection Techniques
● If you are looking for a good place to start when choosing a
machine learning algorithm for your dataset, here are some
general guidelines.
● How large is your training set?
○ Small: prefer high-bias/low-variance classifiers (e.g.
Naive Bayes) over low-bias/high-variance classifiers (e.g.
KNN) to avoid overfitting.
○ Large: low-bias/high-variance classifiers tend to produce
more accurate models (see the sketch below).
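A minimal sketch of this trade-off in R (the data set, packages, and
parameter values here are illustrative assumptions, not from the
session): a high-bias naive Bayes model versus a low-bias 5-NN model
at two training-set sizes.

library(e1071)   # provides naiveBayes
library(class)   # provides knn

set.seed(42)
data(iris)
test_idx <- sample(nrow(iris), 50)
test     <- iris[test_idx, ]
pool     <- iris[-test_idx, ]

for (n in c(15, 100)) {   # small vs. larger training set
  train   <- pool[sample(nrow(pool), n), ]
  nb      <- naiveBayes(Species ~ ., data = train)
  nb_acc  <- mean(predict(nb, test) == test$Species)
  knn_acc <- mean(knn(train[, 1:4], test[, 1:4],
                      cl = train$Species, k = 5) == test$Species)
  cat(sprintf("n = %3d | naive Bayes: %.2f | 5-NN: %.2f\n",
              n, nb_acc, knn_acc))
}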
Adv/Disadv of Various Algorithms
● Naive Bayes:
○ Very simple to implement as it’s just a bunch of counts.
○ If conditional independence exists, it converges faster
than say Logistic Regression and thus requires less
training data.
○ If you want something fast and easy that performs well, NB is
a good choice.
○ Biggest disadvantage is that it can’t learn interactions in
the dataset
Adv/Disadv of Various Algorithms
● Logistic Regression:
○ Lots of ways to regularize the model (see the glmnet sketch
below), and unlike Naive Bayes there is no need to worry about
correlated features.
○ Nice probabilistic interpretation. Helpful in problems like
churn prediction.
○ Online algorithm: Easy to update the model with the new
data (using an online gradient descent method)
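A minimal sketch of regularized logistic regression in R with glmnet
(the synthetic data and parameter choices are assumptions for
illustration): alpha = 0 applies an L2 penalty, alpha = 1 an L1
penalty.

library(glmnet)

set.seed(1)
X <- matrix(rnorm(200 * 5), 200, 5)                 # 5 synthetic features
y <- rbinom(200, 1, plogis(X %*% c(2, -1, 0, 0, 0)))

fit <- cv.glmnet(X, y, family = "binomial", alpha = 0)  # ridge-style penalty
# Probabilistic output: P(y = 1 | x), handy for churn-style problems
head(predict(fit, X, s = "lambda.min", type = "response"))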
Adv/Disadv of Various Algorithms
● Decision Trees:
○ Easy to explain and interpret (at least for some people)
○ Easily handles feature interactions.
○ No need to worry about outliers or whether data is linearly
separable or not.
○ Doesn’t support online learning. Rebuilding the model with
new data every time can be painful.
○ Tends to overfit easily. Solution: ensemble methods such as
random forests (RF); see the sketch below.
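A minimal sketch in R (assuming the rpart and randomForest packages):
a single tree gives readable split rules, while bagging many trees in
a random forest curbs the overfitting.

library(rpart)
library(randomForest)

data(iris)
tree <- rpart(Species ~ ., data = iris)  # interpretable single tree
print(tree)                              # readable if/else split rules

set.seed(7)
rf <- randomForest(Species ~ ., data = iris, ntree = 500)
rf$err.rate[500, "OOB"]                  # out-of-bag error estimate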
Adv/Disadv of Various Algorithms
● SVM:
○ High accuracy for many datasets
○ With appropriate kernel, can work well even if your data
isn’t linearly separable in the base feature space.
○ Popular in text-processing applications, where high
dimensionality is the norm
○ Memory intensive, hard to interpret, and somewhat fiddly to
run and tune (see the sketch below)
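A minimal kernelized-SVM sketch in R with e1071 (the cost and gamma
values are arbitrary assumptions): the radial kernel handles data
that is not linearly separable in the base feature space.

library(e1071)

data(iris)
fit <- svm(Species ~ ., data = iris, kernel = "radial",
           cost = 1, gamma = 0.25)
mean(predict(fit, iris) == iris$Species)  # training accuracy
# cost and gamma usually need tuning, e.g. with tune.svm()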
ADVANCED REGRESSION
Linear Regression Issues

● Sensitivity to outliers
● Multicollinearity leads to high variance of the estimator.
● Prone to overfitting when there are many variables
● Hard to interpret when the number of predictors is large; we
need a smaller subset that exhibits the strongest effects.
Regularization Techniques

● Regularization techniques typically work by penalizing the
magnitude of the feature coefficients while also minimizing the
error between predicted and actual observations
● Different types of penalization (both objectives are written
out below)
○ Ridge Regression: penalizes the squared coefficients (L2)
○ Lasso Regression: penalizes the absolute values of the
coefficients (L1)
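Written out, the two objectives differ only in the penalty term,
with lambda >= 0 controlling its strength:

Ridge: minimize sum(y - yhat)^2 + lambda * (beta1^2 + … + betap^2)
Lasso: minimize sum(y - yhat)^2 + lambda * (|beta1| + … + |betap|)

The larger the lambda, the more aggressively the coefficients are
shrunk toward zero.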
Why penalize on model coefficients?

Model 1: y = beta0 + beta1*x                    → beta1 = -0.58
Model 2: y = beta0 + beta1*x + … + beta10*x^10  → beta1 = -1.4e05

The overfit degree-10 model needs huge coefficients to chase the
training data; penalizing coefficient magnitude discourages exactly
this behavior.

Ridge Regression

● L2 penalty
● Pros
○ Works even when variables far outnumber rows (p >> n)
○ Handles multicollinearity well
○ Trades slightly higher bias for much lower variance than
plain Linear Regression
● Cons
○ Doesn’t produce a parsimonious model (no coefficient is
shrunk exactly to zero)

Let’s see a collinearity example in R
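A sketch of the kind of demo meant here (simulated data; the actual
session example is not reproduced): two near-identical predictors make
the OLS coefficients unstable, while an L2 penalty stabilizes them.

library(glmnet)

set.seed(10)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.01)  # almost an exact copy of x1
y  <- 3 * x1 + rnorm(100)

coef(lm(y ~ x1 + x2))             # wild, mutually offsetting estimates
coef(glmnet(cbind(x1, x2), y, alpha = 0, lambda = 0.1))
# ridge splits the effect roughly evenly across the correlated pair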


Example: Leukemia Prediction

● Leukemia Data, Golub et al. Science 1999


● There are 38 training samples and 34 test samples with total
genes ~ 7000 (p >> n)
● Xij is the gene expression value for sample i and gene j
● Sample i either has tumor type AML or ALL
● We want to select genes relevant to tumor type
○ eliminate the trivial genes
○ grouped selection as many genes are highly correlated
● Ridge Regression can help with this modeling task (a p >> n
sketch follows below)
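A minimal p >> n sketch in R (simulated data, since the Golub set is
not bundled with base R): ridge-penalized logistic regression still
fits when "genes" far outnumber samples.

library(glmnet)

set.seed(99)
n <- 38; p <- 7000                          # samples and "genes"
X <- matrix(rnorm(n * p), n, p)
y <- rbinom(n, 1, plogis(X[, 1] - X[, 2]))  # stand-in for AML vs. ALL

fit <- glmnet(X, y, family = "binomial", alpha = 0)  # L2 penalty only
dim(coef(fit, s = 0.1))   # 7001 coefficients estimated from 38 samples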
Grouped Selection

● If two predictors are highly correlated with each other, their
estimated coefficients will be similar
● If some variables are exactly identical, they will have the
same coefficients

Ridge is good for grouped selection but not good for eliminating
trivial genes
LASSO
● Pros
○ Allows p >> n
○ Enforces sparsity in the parameters
● Cons
○ If a group of predictors are highly correlated among
themselves, LASSO tends to pick only one of them and shrink
the others to zero
○ As a result, it cannot do grouped selection (see the sketch
below)

LASSO is good for eliminating trivial genes but not good for
grouped selection
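A minimal sketch of both points in R with glmnet (simulated data):
of two highly correlated predictors, the lasso typically keeps one
and shrinks the other to exactly zero, while a trivial predictor is
also zeroed out.

library(glmnet)

set.seed(5)
x1 <- rnorm(200)
x2 <- x1 + rnorm(200, sd = 0.05)  # highly correlated pair
x3 <- rnorm(200)                  # trivial predictor
y  <- 2 * x1 + 2 * x2 + rnorm(200)

fit <- glmnet(cbind(x1, x2, x3), y, alpha = 1)  # alpha = 1 => lasso
coef(fit, s = 0.5)  # at a strong penalty: one of x1/x2 survives, x3 -> 0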
Elastic Net

● Weighted combination of L1 and L2 penalty


● Helps in enforcing sparsity
● Encourages a grouping effect among highly correlated predictors

In the gene selection problem, it can achieve both goals: removing
trivial genes and doing grouped selection (see the sketch below).
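A minimal elastic net sketch in R, reusing the correlated pair from
the lasso sketch above (0 < alpha < 1 mixes the L1 and L2 penalties;
the 0.5 mix is an arbitrary choice):

library(glmnet)

set.seed(5)
x1 <- rnorm(200)
x2 <- x1 + rnorm(200, sd = 0.05)
x3 <- rnorm(200)
y  <- 2 * x1 + 2 * x2 + rnorm(200)

fit <- glmnet(cbind(x1, x2, x3), y, alpha = 0.5)  # 50/50 L1-L2 mix
coef(fit, s = 0.5)  # grouped: similar nonzero weights on x1 and x2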
Other Advanced Regression Methods

Poisson Regression

○ Used when the Y variable follows a Poisson distribution
(typically counts of events within a time window t)
○ Example: the number of times a customer will visit an
e-commerce website next month (see the sketch below)
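A minimal Poisson regression sketch in base R (the simulated
visit-count setup is an illustrative assumption):

set.seed(8)
past_visits <- rpois(500, 3)                             # last month
visits      <- rpois(500, exp(0.2 + 0.3 * past_visits))  # next month

fit <- glm(visits ~ past_visits, family = poisson)
exp(coef(fit))  # multiplicative effects on the expected count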
Piecewise Linear Regression

● Polynomial regression won't work well here, as it has a strong
tendency to overfit or underfit
● Instead, splitting the curve into separate linear pieces and
building a linear model for each piece leads to better results
(see the sketch below)
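A minimal piecewise-linear sketch in base R: a hinge term
pmax(x - knot, 0) lets the slope change at a breakpoint (the knot
location is assumed known here; packages such as segmented can
estimate it from the data).

set.seed(2)
x <- seq(0, 10, length.out = 200)
y <- ifelse(x < 5, 2 * x, 10 + 0.2 * (x - 5)) + rnorm(200, sd = 0.5)

knot <- 5
fit  <- lm(y ~ x + pmax(x - knot, 0))  # slope: 2 before the knot,
coef(fit)                              # 2 + hinge coefficient after it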
QUESTIONS
