Chapter 8
Supervised Learning - Regression

Introduction

The most common regression algorithms are:

Simple linear regression
Multiple linear regression
Polynomial regression
Multivariate adaptive regression splines
Logistic regression
Maximum likelihood estimation (least squares)


Consider the two points (X1, Y1) = (−3, −2) and (X2, Y2) = (2, 2).

Slope = ??
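Working this out from the slope formula:

Slope = (Y2 − Y1) / (X2 − X1) = (2 − (−2)) / (2 − (−3)) = 4/5 = 0.8, a positive slope.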


Types of slopes:

Linear positive slope
Curvilinear positive slope
Linear negative slope
Curvilinear negative slope


A residual is the distance between the predicted point (on the regression line) and the actual observed point.
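As a quick sketch in Python (the data values and fitted line below are made up for illustration):

import numpy as np

# Hypothetical observations and a hypothetical fitted line y_hat = a + b*x
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.1, 3.9, 6.2, 7.8])   # actual points
a, b = 0.1, 1.95                     # assumed intercept and slope
y_hat = a + b * x                    # predicted points on the regression line
residuals = y - y_hat                # actual minus predicted
print(residuals)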


Maximum and minimum points of curves

The minimum point is the point on the curve with the lowest y-coordinate and a slope of zero.
The maximum point is the point on the curve with the highest y-coordinate and a slope of zero.
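For example, for the curve y = x², the slope dy/dx = 2x equals zero at x = 0, and y is larger on either side of that point, so (0, 0) is the minimum point of the curve.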

Multiple Linear Regression

In a multiple regression model, two or more independent variables (predictors) are involved in the model.
Both the simple linear regression model and the multiple regression model assume that the dependent variable is continuous.
The following expression describes the relationship with two predictor variables, namely X1 and X2:

Ŷ = a + b1·X1 + b2·X2

The model describes a plane in the three-dimensional space of Ŷ, X1, and X2. Parameter 'a' is the intercept of this plane. Parameters 'b1' and 'b2' are referred to as partial regression coefficients. Parameter b1 represents the change in the mean response corresponding to a unit change in X1 when X2 is held constant. Parameter b2 represents the change in the mean response corresponding to a unit change in X2 when X1 is held constant.
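A minimal sketch of fitting such a model by ordinary least squares in Python (the data is synthetic, generated from a known plane purely for illustration):

import numpy as np

# Synthetic data from a known plane Y = 1 + 2*X1 + 1.5*X2 plus noise
rng = np.random.default_rng(0)
X1 = rng.uniform(0, 5, size=50)
X2 = rng.uniform(0, 5, size=50)
Y = 1.0 + 2.0 * X1 + 1.5 * X2 + rng.normal(scale=0.1, size=50)

# Design matrix [1, X1, X2]; OLS solves for [a, b1, b2]
A = np.column_stack([np.ones_like(X1), X1, X2])
coef, *_ = np.linalg.lstsq(A, Y, rcond=None)
a, b1, b2 = coef
print(f"Y_hat = {a:.2f} + {b1:.2f}*X1 + {b2:.2f}*X2")   # near 1.00 + 2.00*X1 + 1.50*X2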


Assumptions in Regression Analysis


1. The dependent variable (Y) can be calculated/predicted as a linear function of a specific set of independent variables (X's) plus an error term (ε).
2. The number of observations (n) is greater than the number of parameters (k) to be estimated, i.e. n > k.
3. Relationships determined by regression are only relationships of association based on the data set, and not necessarily of cause and effect of the defined class.
4. The regression line can be valid only over a limited range of data. If the line is extrapolated outside that range, it may only lead to wrong predictions.
5. If the business conditions change and the business assumptions underlying the regression model are no longer valid, then the past data set will no longer be able to predict future trends.
6. Variance is the same for all values of X (homoskedasticity).
7. The error term (ε) is normally distributed. This also means that the mean of the error (ε) has an expected value of 0.
8. The values of the error (ε) are independent and are not related to any values of X, i.e. there are no relationships between a particular X, Y pair and another specific value of X, Y.

Given the above assumptions, the OLS estimator is the Best Linear Unbiased Estimator (BLUE); this result is known as the Gauss-Markov theorem.


Main Problems in Regression Analysis

In multiple regression, there are two primary problems: multicollinearity and heteroskedasticity.

Multicollinearity
Two variables are perfectly collinear if there is an exact linear relationship between them. Multicollinearity is the situation in which there is not only correlation between the dependent variable and the independent variables, but also strong correlation among the independent variables themselves.
The model as a whole may still fit the data, but it is difficult to determine how the dependent variable will change if each independent variable is changed one at a time. When multicollinearity is present, it inflates the standard errors of the coefficients. By overinflating the standard errors, multicollinearity can make some variables appear statistically insignificant when they actually should be significant (with lower standard errors). One way to gauge multicollinearity is to calculate the Variance Inflation Factor (VIF), which assesses how much the variance of an estimated regression coefficient increases when the predictors are correlated. If no factors are correlated, the VIFs will all be equal to 1.
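As an illustrative sketch of computing VIFs with statsmodels (the data below is synthetic, constructed so that X3 is nearly a linear combination of X1 and X2):

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Synthetic predictors: X3 is almost X1 + X2, so it is nearly collinear with them
rng = np.random.default_rng(0)
X1 = rng.normal(size=100)
X2 = rng.normal(size=100)
X3 = X1 + X2 + rng.normal(scale=0.01, size=100)

X = sm.add_constant(np.column_stack([X1, X2, X3]))   # add intercept column
for i in range(1, X.shape[1]):                       # skip the constant itself
    print(f"VIF(X{i}) = {variance_inflation_factor(X, i):.1f}")
# X1, X2, X3 all get very large VIFs here; uncorrelated predictors would give VIFs near 1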


Heteroskedasticity
Heteroskedasticity refers to a changing (non-constant) variance of the error term.
If the variance of the error term is not constant across observations, predictions will be erroneous.
In general, for a regression equation to make accurate predictions, the error terms should be independent and identically (normally) distributed (iid).
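One common way to check this in Python is the Breusch-Pagan test from statsmodels; here is a sketch on synthetic data whose error variance grows with x by construction:

import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

# Synthetic data: noise scale depends on x, so errors are heteroskedastic
rng = np.random.default_rng(1)
x = np.linspace(1, 10, 200)
y = 2 + 3 * x + rng.normal(scale=0.5 * x)

X = sm.add_constant(x)
resid = sm.OLS(y, X).fit().resid

lm_stat, lm_pvalue, _, _ = het_breuschpagan(resid, X)
print(f"Breusch-Pagan LM p-value: {lm_pvalue:.4f}")   # small p-value flags heteroskedasticity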


The accuracy of linear regression can be improved using the following three methods:

1. Shrinkage approach
2. Subset selection
3. Dimensionality (variable) reduction

Regularization is a technique to penalize large regression coefficients. In other words, it reduces the values of the coefficients, thereby simplifying the model.
In regularization, we keep the same number of features but reduce the magnitude of the coefficients (W0, W1, ..., Wm).
Ridge regression by default adds the L2 regularization penalty, i.e. it adds the square of the magnitude of the coefficients to the loss function, which therefore becomes:
Minimization objective of ridge = LS Obj + α × (sum of squares of the coefficients)
Lasso regression by default adds the L1 regularization penalty, i.e. it adds the absolute value of the magnitude of the coefficients to the loss function, which therefore becomes:
Minimization objective of lasso = LS Obj + α × (sum of absolute values of the coefficients)


A disadvantage of ridge regression is that the L2 penalty shrinks all coefficients toward zero but never sets any of them exactly to zero, so every predictor stays in the model. The lasso overcomes this disadvantage by forcing some of the coefficients to zero. We can say that the lasso yields sparse models (involving only a subset of the predictors) that are simpler as well as more interpretable. The lasso can be expected to perform better in a setting where a relatively small number of predictors have substantial coefficients, and the remaining predictors have coefficients that are very small or equal to zero.
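A minimal scikit-learn sketch of both penalties, on synthetic data where only the first two of ten predictors matter (the alpha values are arbitrary illustration choices):

import numpy as np
from sklearn.linear_model import Ridge, Lasso

# Synthetic data: only X[:, 0] and X[:, 1] actually influence y
rng = np.random.default_rng(2)
X = rng.normal(size=(100, 10))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2: shrinks coefficients, keeps all predictors
lasso = Lasso(alpha=0.1).fit(X, y)   # L1: can set coefficients exactly to zero

print("ridge:", np.round(ridge.coef_, 2))   # all ten coefficients are non-zero
print("lasso:", np.round(lasso.coef_, 2))   # most coefficients are exactly 0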

Polynomial Regression Model

The polynomial regression model is an extension of the simple linear model: extra predictors are obtained by raising each of the original predictors to a power. For example, a cubic regression uses three variables, X, X², and X³, as predictors. This approach provides a simple way to yield a non-linear fit to the data.
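A small numpy sketch of a quadratic fit (synthetic data, generated from a known quadratic plus noise):

import numpy as np

# Synthetic data following a quadratic trend plus noise
rng = np.random.default_rng(3)
x = np.linspace(-3, 3, 50)
y = 1 + 2 * x + 0.5 * x**2 + rng.normal(scale=0.2, size=50)

# Degree-2 fit: ordinary linear regression with x and x^2 as predictors
coeffs = np.polyfit(x, y, deg=2)   # returns [c2, c1, c0], highest power first
y_hat = np.polyval(coeffs, x)      # fitted curve evaluated at x
print(np.round(coeffs, 2))         # close to [0.5, 2.0, 1.0]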

Logistic Regression

Logistic regression (logit regression) is a type of regression analysis used for predicting the outcome of a categorical dependent variable, similar to OLS regression. In logistic regression, the dependent variable (Y) is binary (0/1) and the independent variables (X) are continuous in nature.

The goal of logistic regression is to predict the likelihood that Y is equal to 1 (the probability that Y = 1 rather than 0) given certain values of X. That is, we are predicting probabilities rather than the scores of the dependent variable.

The logit function is the natural log of the odds that Y equals one of the categories.
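In symbols, for a single predictor X with coefficients a and b:

logit(P) = ln(P / (1 − P)) = a + bX,   which inverts to   P = 1 / (1 + e^−(a + bX)).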


Let us say we have a model that can predict whether a person is male or female on the basis of their height. Given a height of 150 cm, we need to predict whether the person is male or female. Suppose the coefficients are a = −100 and b = 0.6. Using the equation above, we can calculate the probability of male given a height of 150 cm, or more formally P(male | height = 150):

P(male | height = 150) = 1 / (1 + e^−(−100 + 0.6 × 150)) = 1 / (1 + e^10) ≈ 0.00005,

i.e. a probability of near zero that the person is male.
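The same arithmetic as a one-off Python check:

import math

a, b = -100.0, 0.6                               # coefficients from the example
p_male = 1.0 / (1.0 + math.exp(-(a + b * 150)))  # logistic probability at height 150
print(p_male)                                    # ~4.5e-05, essentially zero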


Assumptions in logistic regression


The following assumptions must hold when building a
logistic regression model:
There exists a linear relationship between the logit function and the independent variables.
The dependent variable Y must be categorical (1/0) and take a binary value, e.g. if pass then Y = 1; else Y = 0.
The data meets the 'iid' criterion, i.e. the error terms, ε, are independent from one another and identically distributed.
The error term follows a binomial distribution [n, p], where
n = the number of records in the data
p = the probability of success (pass, responder)

Maximum Likelihood Estimation

The coefficients in a logistic regression are estimated using a process called Maximum Likelihood Estimation (MLE).
What is a likelihood function? The likelihood function measures the goodness of fit of a statistical model to a sample of data for given values of the unknown parameters.
MLE finds the values of the parameters that maximize the likelihood function.
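A minimal sketch of MLE for the logistic model using scipy (the data is synthetic, generated from known coefficients so the recovered estimates can be checked):

import numpy as np
from scipy.optimize import minimize

# Synthetic binary outcomes generated from a known logistic model
rng = np.random.default_rng(4)
x = rng.normal(size=200)
true_a, true_b = -0.5, 2.0
y = rng.binomial(1, 1.0 / (1.0 + np.exp(-(true_a + true_b * x))))

def neg_log_likelihood(params):
    a, b = params
    z = a + b * x
    # Bernoulli log-likelihood under the logistic model: sum of y*z - log(1 + e^z)
    return -np.sum(y * z - np.log1p(np.exp(z)))

# Maximizing the likelihood = minimizing its negative
result = minimize(neg_log_likelihood, x0=[0.0, 0.0])
print(result.x)   # estimates should land near (-0.5, 2.0)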
