Regression and Regularization
▪ The linear regression model takes the form y = a1x1 + a2x2 + … + anxn + b,
where {x1, x2, …, xn} are the independent variables of the dataset
and {a1, a2, …, an} are the coefficients of these variables, representing the weight that each
independent variable contributes to the resultant target variable y.
The b value is the constant or bias term.
▪ In nonlinear regression, the relationship is a nonlinear equation (e.g., polynomial or
exponential)
▪ Objective?
▪ find the optimum values for the coefficients {a1, a2, …, an}
▪ Steps?
▪ model selection, model fitting, model prediction, and model evaluation
▪ Given: one independent variable and one dependent variable
▪ Required: simple linear regression
▪ Model selection: y = ax + b
▪ Residue: a small amount that remains after the main part has been taken away
▪ Model fitting: find a and b
For datasets with more than one independent variable, the same procedure needs to be
followed for each of the independent variables to find all the involved parameters.
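The fitting step above can be sketched with the closed-form least-squares solution for the one-variable case. This is a minimal illustration; the function and variable names (`fit_simple_linear`, `xs`, `ys`) are illustrative, not from the slides.

```python
# Minimal sketch: fitting y = a*x + b by ordinary least squares.
def fit_simple_linear(xs, ys):
    """Return (a, b) minimizing the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance(x, y) divided by variance(x)
    a = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
        / sum((x - mean_x) ** 2 for x in xs)
    # Intercept: forces the fitted line through the point of means
    b = mean_y - a * mean_x
    return a, b

# Model prediction on a perfectly linear dataset y = 2x + 1
a, b = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
print(a, b)  # -> 2.0 1.0
```

With more than one independent variable, the same least-squares idea is solved for all coefficients at once (e.g., via the normal equations), as the slide notes.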
R2 Metric for Correlation Analysis
▪ Coefficient of determination is employed to gauge the quality of fit of the derived
regression relationship.
▪ When a linear regression fails to fit a linear model with strong correlation from the
independent variable to the dependent variable, a nonlinear or polynomial
regression is advised to correlate the data variables in a curvilinear model.
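The coefficient of determination described above is R² = 1 − SSres / SStot, where SSres is the residual sum of squares and SStot the total sum of squares. A minimal sketch (names are illustrative):

```python
# Coefficient of determination: R^2 = 1 - SS_res / SS_tot
def r_squared(ys, preds):
    mean_y = sum(ys) / len(ys)
    # Residual sum of squares: what the model fails to explain
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    # Total sum of squares: variability around the mean
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot

print(r_squared([3, 5, 7, 9], [3, 5, 7, 9]))  # perfect fit -> 1.0
print(r_squared([3, 5, 7, 9], [4, 5, 7, 8]))  # imperfect fit, below 1.0
```

A value near 1 indicates a strong fit; a low value suggests trying a nonlinear or polynomial model, as the slide advises.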
▪ The rationale for logistic regression is based on the odds ratio, where the odds of a
specific event occurring are defined as the probability of the event occurring divided
by the probability of that event not occurring
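The definition above translates directly into a one-line computation (a minimal sketch; the probability 0.8 is an illustrative value):

```python
# Odds of an event: P(event) / P(not event)
def odds(p):
    return p / (1 - p)

# An event with probability 0.8 has odds of 4 to 1
print(round(odds(0.8), 6))  # -> 4.0
```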
▪ Logistic regression is considered an extension of the linear regression analysis,
and its corresponding model can be used to classify input data records into a set
of given categories or discrete values that form the dependent variable.
▪ For example, the dependent variable (y) may represent a loan prediction
(approved/rejected) based on a credit score (i.e., the independent variable).
▪ Example: assume two values for y: “0” to denote a rejected loan status
and “1” to denote an approved loan status.
▪ In this example, a linear regression model could possibly present a continuous
range of values including “0” and “1,” but it will not exactly produce a {“0”, “1”}
output.
▪ Therefore, a logit function with an “S” shape is introduced to the linear regression
equation, and the underlying linear model will map the predictions with the logistic
function to produce the binary values {“0”, “1”}.
▪ The maximum likelihood method generates the logit function that predicts the
natural logarithm of the odds ratio: ln(p / (1 − p)) = ax + b
▪ The predicted odds ratio and the predicted probability of success are found next.
Hence, the logistic regression equation is p = 1 / (1 + e^−(ax + b))
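The mapping from the linear model to a {0, 1} output can be sketched as follows. The coefficients a = 1, b = −5 and the threshold 0.5 are illustrative assumptions, not fitted values from the loan example.

```python
import math

# The logistic (sigmoid) function squashes the linear output into (0, 1)
def sigmoid(z):
    return 1 / (1 + math.exp(-z))

def predict(x, a, b, threshold=0.5):
    p = sigmoid(a * x + b)             # predicted probability of success
    return 1 if p >= threshold else 0  # map to the binary {0, 1} output

# With a = 1, b = -5: low inputs map to 0, high inputs to 1
print(predict(3, a=1, b=-5))  # -> 0
print(predict(8, a=1, b=-5))  # -> 1
```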
▪ The plot of the logistic equation on the given dataset is shown in the source:
https://towardsdatascience.com/logit-of-logistic-regression-understanding-the-fundamentals-f384152a33d1
▪ Regression analysis tries to develop the best fit between the independent
variables and the dependent variable such that the loss function is minimized
and/or the value of R2 is maximized.
▪ However, the developed model can overfit the training dataset and lose the value of
generalization when applied to a different testing dataset.
▪ Furthermore, while tuning the model for its best fit, the existence of any wrong
data points (i.e., outliers) affects the development, leading to an incorrect
regression model.
▪ To avoid the issues of overfitting and outliers and to have a more robust model, we
penalize the loss function by adding a penalty term to the regression model. Such a
penalty is known as regularization.
▪ For the case of regression, it comes in two common forms: ridge and lasso
regularization.
▪ The ridge regression (or L2 regularization) proceeds by adding a term to the loss
function that penalizes the sum of squares of the model coefficients:
before: L = Σ (yi − ŷi)²
after: L = Σ (yi − ŷi)² + λ Σ aj²
where λ is a constant that controls the level of the penalty, and aj stands for the model
coefficients.
The higher λ is, the greater the emphasis on the reduction of the coefficient magnitudes, at
the expense of tolerating higher residuals.
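The penalized loss above can be sketched in a few lines (a minimal illustration; the names `ridge_loss`, `lam`, and the sample values are assumptions, not from the slides):

```python
# L2-penalized loss: residual sum of squares plus lambda * sum of squared coefficients
def ridge_loss(ys, preds, coeffs, lam):
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    penalty = lam * sum(a ** 2 for a in coeffs)  # ridge penalty term
    return ss_res + penalty

# A higher lambda puts more weight on shrinking the coefficients
print(ridge_loss([3, 5], [3.1, 4.8], [2.0, 1.0], lam=0.0))  # residuals only
print(ridge_loss([3, 5], [3.1, 4.8], [2.0, 1.0], lam=0.5))  # residuals + penalty
```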
▪ In L1 (lasso) regularization, the objective is to minimize the sum of the absolute
values of the coefficients instead of their squares:
before: L = Σ (yi − ŷi)²
after: L = Σ (yi − ŷi)² + λ Σ |aj|
▪ As a result, both large and small coefficient values are addressed and driven
down.
▪ In summary, the ridge (L2) penalty is λ Σ aj², and the lasso (L1) penalty is λ Σ |aj|.
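The contrast between the two penalty terms can be sketched on the same set of coefficients (a minimal illustration; the sample coefficients are assumed values):

```python
# Ridge penalty: sum of squared coefficients
def l2_penalty(coeffs, lam):
    return lam * sum(a ** 2 for a in coeffs)

# Lasso penalty: sum of absolute coefficient values
def l1_penalty(coeffs, lam):
    return lam * sum(abs(a) for a in coeffs)

coeffs = [4.0, 0.5, -0.1]
# Squaring emphasizes the large coefficient (16.0 of the total),
# while absolute values weight small and large ones proportionally
print(l2_penalty(coeffs, lam=1.0))
print(l1_penalty(coeffs, lam=1.0))
```

This is why lasso tends to drive small coefficients all the way to zero, while ridge mainly shrinks the largest ones.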