
Linear Regression (LR) Model and Assumptions

Linear Regression is a machine learning algorithm based on supervised learning.

Linear Regression is a statistical method for modelling the relationship between variables. The aim is to find the best-fit line that establishes a linear relationship between the independent variable and the dependent variable.

It performs a regression task: it computes the regression coefficients that define the best-fit line. The coefficient is nothing but the slope of the line. As we know, the best-fit line is

y = mX + c, where m is the slope and c is the intercept (error term).

The sign of a regression coefficient tells you whether there is a positive or negative correlation between each independent variable and the dependent variable.

A positive sign indicates that as the predictor variable increases, the response variable also increases.

A negative sign indicates that as the predictor variable increases, the response variable decreases.

Linear regression calculates the best-fit line for the observed data by minimizing the sum of the squared residuals (a residual is the vertical distance between a data point and the regression line).

To minimize the error, we calculate the partial derivatives of the loss function with respect to m and c; this is the basis of Gradient Descent, where the gradient gives the steepness of the slope.

Gradient descent is an optimization algorithm for finding a local minimum; here it is used to obtain the most accurate values of m and c.

Gradient Descent initially uses random values for the coefficients and iteratively updates them by taking the derivative of the loss function and stepping in the direction that minimizes the error.

The process is repeated until a minimum sum of squared errors is achieved.
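As a concrete sketch, assuming the usual mean squared error loss J(m, c) = (1/n) Σ (yᵢ − (m·xᵢ + c))², the partial derivatives and update rules (with learning rate α) are:

∂J/∂m = −(2/n) Σ xᵢ · (yᵢ − (m·xᵢ + c))

∂J/∂c = −(2/n) Σ (yᵢ − (m·xᵢ + c))

m ← m − α · ∂J/∂m and c ← c − α · ∂J/∂c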

Residuals
The difference between the observed value of the dependent variable (y) and the predicted value (ŷ) is called
the residual (e).

Residual = Observed value − Predicted value, i.e. e = y − ŷ

Note: Linear regression is used when we want to predict the value of one variable based on the value of another variable. The variable we want to predict is called the dependent variable.

Assumptions

1. Linear relationship between the features and the target.

2. Observations are independent of each other.

3. Little or no multicollinearity between the features. Multicollinearity will not affect your model's output or prediction strength; it will only affect the coefficient values for the predictor variables and their apparent importance.

4. Homoscedasticity: the variance of the residuals is the same for any value of X (equal variance around the line).

5. Normality: for any fixed value of X, Y is normally distributed.

6. No autocorrelation in the residuals. Autocorrelation can be tested with the help of the Durbin-Watson test (see the sketch after this list).

7. The mean of the residuals should be zero.
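A minimal sketch of how some of these checks might run in Python, using statsmodels on illustrative toy data (the variable names and thresholds here are assumptions for the example, not part of the original notes):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative toy data: 100 observations, 2 features
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(size=100)

X_design = sm.add_constant(X)          # add the intercept column
model = sm.OLS(y, X_design).fit()

# Assumption 6: a Durbin-Watson statistic near 2 suggests no autocorrelation
print("Durbin-Watson:", durbin_watson(model.resid))

# Assumption 3: variance inflation factor (VIF) per feature;
# values above roughly 5-10 are a common red flag for multicollinearity
for i in range(1, X_design.shape[1]):
    print(f"VIF for feature {i}:", variance_inflation_factor(X_design, i))

# Assumption 7: the mean of the residuals should be close to zero
print("Mean residual:", model.resid.mean())
```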

What is a residual? How is it computed?


A residual is also called the error. It is the difference between the actual y value and the predicted y value.

Residual = Actual y value − Predicted y value.

It can be positive or negative.

If the residuals are always 0, then your model has a perfect R-squared, i.e. 1.

When the p-value for the F-test is lower than alpha, what do you conclude?
When the p-value for the F-test is lower than alpha (usually 0.05 if nothing else is specified), we reject H0. We conclude that:

We have evidence that at least one of the [x variables] has a significant relationship with the [y variable].

When the p-value for the t-test is lower than alpha, what do you conclude?
When the p-value for the t-test is lower than alpha (i.e. 0.05), we conclude that we have evidence that the [particular x variable] has a significant relationship with the [y variable] when the other [x variables] are present in the model.

H0 and Ha for F tests


H0: None of the x variables are related to the y variable.

Ha: At least one of the x variables is related to the y variable.

What are H0 and Ha for T test


T test:

H0: [x variable] has no relationship with [y variable] when [other x variables] are present in the model.

Ha: [x variable] has a significant relationship with [y variable] when [other x variables] are present in the model.

For example, the H0 and Ha for square feet would be stated as follows:

H0: Square feet has no relationship with Rent when number of bedrooms is present in the model.

Ha: Square feet has a significant relationship with Rent when number of bedrooms is present in the model.

Similarly, you can have H0 and Ha for number of bedrooms:

H0: Number of bedrooms has no relationship with Rent when square feet is present in the model.

Ha: Number of bedrooms has a significant relationship with Rent when square feet is present in the model.
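A small sketch of how these p-values might be read off in practice with statsmodels, on a hypothetical rent dataset (the column names, coefficients, and noise level are all illustrative assumptions):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical data: rent modelled from square feet and number of bedrooms
rng = np.random.default_rng(1)
df = pd.DataFrame({
    "SquareFeet": rng.uniform(400, 2000, size=200),
    "Bedrooms": rng.integers(1, 5, size=200),
})
df["Rent"] = 300 + 1.2 * df["SquareFeet"] + 150 * df["Bedrooms"] + rng.normal(0, 200, size=200)

X = sm.add_constant(df[["SquareFeet", "Bedrooms"]])
model = sm.OLS(df["Rent"], X).fit()

print(model.f_pvalue)   # p-value of the overall F-test
print(model.pvalues)    # per-coefficient t-test p-values
```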

Why is normalization (feature scaling) important in linear regression?

A: If the features have very different ranges and units and we do not scale them to the same scale, the importance of each input is not distributed equally; features with larger values become dominant over those with smaller values. With scaling, every feature is treated fairly during training, and as a result we get the best out of the dataset.
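A minimal sketch of such scaling with scikit-learn's StandardScaler (the feature values below are made up for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Illustrative features with very different ranges:
# column 0 is in the thousands, column 1 is a small count
X = np.array([[1200.0, 2], [850.0, 1], [2000.0, 4], [1500.0, 3]])

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)  # each column now has mean 0 and unit variance

print(X_scaled.mean(axis=0))  # ~[0, 0]
print(X_scaled.std(axis=0))   # ~[1, 1]
```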

How to estimate parameters in a linear regression model?

A. We use two methods to estimate the parameters:

Ordinary Least Squares

Gradient Descent

Ordinary Least Squares

When we have more than one input, we can use Ordinary Least Squares to estimate the values of the coefficients.

This approach treats the data as a matrix and uses linear algebra operations to estimate the optimal values for the coefficients.

The Ordinary Least Squares procedure minimizes the sum of the squared residuals. That is, take the distance from each data point to the regression line, square it, and sum all of the squared errors together. This is the quantity that ordinary least squares minimizes.
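A minimal sketch of this matrix approach in NumPy, on illustrative toy data (np.linalg.lstsq is the numerically stable way to solve the least-squares problem):

```python
import numpy as np

# Illustrative toy data: 100 samples, 2 features, known true coefficients
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = 1.5 + 4.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.5, size=100)

# Add an intercept column, then solve least squares:
# beta minimizes ||X_design @ beta - y||^2, the sum of squared residuals
X_design = np.column_stack([np.ones(len(X)), X])
beta, *_ = np.linalg.lstsq(X_design, y, rcond=None)

print(beta)  # approximately [1.5, 4.0, -2.0]
```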

Gradient Descent

Gradient Descent is the process of optimizing the values of the coefficients by iteratively minimizing the error of the model.

Gradient Descent works by starting with random values for each coefficient. The sum of the squared errors is then calculated over the pairs of input and output values.

A learning rate is used as a scale factor, and the coefficients are updated in the direction that minimizes the error. The process is repeated until a minimum sum of squared errors is achieved.

When using this method, you must select a learning rate (alpha) parameter that determines the size of the
improvement step to take on each iteration of the procedure.
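A minimal sketch of this procedure for simple linear regression (y = mX + c), assuming the mean squared error loss and made-up toy data:

```python
import numpy as np

# Illustrative toy data generated from y = 4x + 3 plus noise
rng = np.random.default_rng(7)
x = rng.uniform(0, 10, size=200)
y = 4.0 * x + 3.0 + rng.normal(scale=1.0, size=200)

m, c = 0.0, 0.0   # start from arbitrary coefficient values
alpha = 0.01      # learning rate: the scale factor for each step
n = len(x)

for _ in range(2000):
    error = y - (m * x + c)
    # Partial derivatives of the MSE loss with respect to m and c
    dm = -(2.0 / n) * np.sum(x * error)
    dc = -(2.0 / n) * np.sum(error)
    # Update the coefficients in the direction that reduces the error
    m -= alpha * dm
    c -= alpha * dc

print(m, c)  # approximately 4.0 and 3.0
```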

What is the mean of residuals in Linear Regression?

The residual at a point is the difference between the actual y value at that point and the estimated y value from the regression line. In an OLS fit that includes an intercept, the residuals sum to zero, so their mean is zero.

Learning rate
The learning rate determines the step size at each iteration while moving toward a minimum of the loss function; it represents the speed at which a machine learning model "learns".

What is R²
R-squared is a statistical measure of how close the data are to the fitted regression line.

R Square measures how much of the variability in the dependent variable can be explained by the model. It is the square of the correlation coefficient (R), which is why it is called R Square.

R Square is calculated as one minus the sum of squared prediction errors divided by the total sum of squares, where the total sum of squares replaces the predictions with the mean of the observed values.

The R Square value is between 0 and 1, and a bigger value indicates a better fit between the predictions and the actual values.

R Square is a good measure of how well the model fits the dependent variable. However, it does not take the overfitting problem into consideration.

If your regression model has many independent variables, the model may be too complicated: it may fit the training data very well but perform badly on testing data. That is why Adjusted R Square was introduced; it penalizes additional independent variables in the model and adjusts the metric to guard against overfitting.

R-squared = Explained variation / Total variation = 1 − RSS/TSS

R² = coefficient of determination

RSS = sum of squares of residuals

TSS = total sum of squares (the squared deviations of y from its mean)
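A small sketch of this computation in NumPy, on made-up actual and predicted values:

```python
import numpy as np

# Illustrative actual and predicted values
y_true = np.array([3.0, 5.0, 7.5, 9.0, 11.0])
y_pred = np.array([2.8, 5.3, 7.0, 9.4, 10.6])

rss = np.sum((y_true - y_pred) ** 2)          # residual sum of squares
tss = np.sum((y_true - y_true.mean()) ** 2)   # total sum of squares
r_squared = 1.0 - rss / tss

print(r_squared)
```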

There are two major problems with R²:


Problem 1: R² increases with every predictor added to a model. Because R² always increases and never decreases, the model can appear to fit better the more terms we add, which can be completely misleading.

Problem 2: Similarly, if our model has too many terms and too many high-order polynomials we can run into the
problem of over-fitting the data. When we over-fit data, a misleadingly high R² value can lead to misleading
predictions.

What is adjusted R²?
Adjusted R-squared is used to determine how reliable the correlation is between the independent variables and the
dependent variable.

On addition of variables that are highly correlated with the dependent variable, the adjusted R-squared will increase, whereas for variables with no correlation with the dependent variable, the adjusted R-squared will decrease.

What is the difference between R square and adjusted R square?


R square and adjusted R square values are used for model validation in case of linear regression.

Every time you add an independent variable, R squared increases, even if the variable is not significant; it never declines.

Adjusted R squared increases only when the added independent variable is significant and affects the dependent variable.

The adjusted R squared value is always less than or equal to the R squared value.

R squared indicates the variation in the dependent variable explained by all the independent variables, i.e. it considers all the independent variables when explaining the variation.

In the case of adjusted R squared, it considers only the significant variables (p-values less than 0.05) to indicate the percentage of variation explained by the model.
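For reference, the standard formula, where n is the number of observations and p the number of predictors, is:

Adjusted R² = 1 − (1 − R²) · (n − 1) / (n − p − 1)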

How to check accuracy in LR?

R-Squared (Coefficient of Determination)

Adjusted R-Squared
What is the difference between mean square and least square error?

There are five metrics used to evaluate regression models:
Mean Absolute Error (MAE)

Mean Squared Error (MSE)

Root Mean Squared Error (RMSE)

R-Squared (Coefficient of Determination)

Adjusted R-Squared

How to find RMSE and MSE


RMSE and MSE are two of the most common measures of accuracy for a linear regression.

MSE is calculated by summing the squares of the prediction errors (real output minus predicted output) and then dividing by the number of data points. It gives you an absolute number for how much your predicted results deviate from the actual ones.

RMSE is the root mean squared error, given by the formula RMSE = √MSE = √((1/n) Σ (yᵢ − ŷᵢ)²).

Root Mean Squared Error (RMSE)


Root Mean Square Error (RMSE) is the standard deviation of the residuals (prediction errors). Residuals are a measure of how far the data points are from the regression line.
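A short sketch computing these metrics with scikit-learn, on made-up actual and predicted values:

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score

# Illustrative actual and predicted values
y_true = np.array([3.0, 5.0, 7.5, 9.0, 11.0])
y_pred = np.array([2.8, 5.3, 7.0, 9.4, 10.6])

mae = mean_absolute_error(y_true, y_pred)
mse = mean_squared_error(y_true, y_pred)
rmse = np.sqrt(mse)   # RMSE is simply the square root of MSE
r2 = r2_score(y_true, y_pred)

print(mae, mse, rmse, r2)
```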

What is the difference between collinearity and correlation?
