linkedin.com/in/vikrantkumar95
Linear Regression
Clearly Explained
What Is Linear Regression? (1/2)
Technical definition: Linear Regression is a statistical method
used to model the linear relationship between one dependent variable
and one or more independent variables.
What Is Linear Regression? (2/2)
Simply put, Linear Regression is a way to predict one thing (such as
Crop Yield or Sales) based on its linear relationship with other things
(such as Rainfall or Time). It's like drawing a straight line through a
scatter of points to show the general trend.
[Figure: straight line drawn through a scatter of points, with its slope marked as tan θ = B]
The line above has an equation with Crop Yield (the dependent
variable we are trying to predict) on the Y-axis and Rainfall (the
independent variable we use as the predictor) on the X-axis. The
equation is:
Crop Yield = A + B × Rainfall
A is the Y-intercept and B is the slope of the line. The larger the
magnitude of B, the steeper the line, i.e. the more sensitive Crop Yield
is to changes in Rainfall.
So How Do We Fit A Line? (1/2)
So How Do We Fit A Line? (2/2)
As we saw on the previous slide, Linear Regression fits a line by
minimizing the Sum of Squared Residuals (SSR), where a residual is
the vertical distance between an observed point and the line. This
is also why it's sometimes known as Least Squares Regression.
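With a single predictor, the line that minimizes the Sum of Squared Residuals has a well-known closed-form solution: B = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and A = ȳ - B·x̄. The pure-Python sketch below applies those formulas. The Rainfall and Crop Yield numbers and the function names are made up for illustration. The loop at the end checks that nudging either A or B away from the fitted values increases the SSR, which is what "least squares" means.

```python
def fit_least_squares(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Return (A, B) for the line y = A + B*x that minimizes the sum of squared residuals."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    # Cross-deviation and squared-deviation sums (the 1/n factors would cancel anyway).
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    b = sxy / sxx
    a = y_mean - b * x_mean
    return a, b


def sum_squared_residuals(xs: list[float], ys: list[float], a: float, b: float) -> float:
    """SSR: sum of squared vertical distances between each point and the line."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


# Made-up Rainfall (x) and Crop Yield (y) values, for illustration only.
rainfall = [10.0, 20.0, 30.0, 40.0, 50.0]
crop_yield = [12.0, 19.0, 29.0, 37.0, 45.0]

a, b = fit_least_squares(rainfall, crop_yield)
best = sum_squared_residuals(rainfall, crop_yield, a, b)

# Moving the intercept or the slope away from the fitted values always raises the SSR.
for da, db in ((0.1, 0.0), (-0.1, 0.0), (0.0, 0.01), (0.0, -0.01)):
    assert sum_squared_residuals(rainfall, crop_yield, a + da, b + db) > best

print(f"A = {a:.2f}, B = {b:.2f}, SSR = {best:.2f}")
```

With these numbers, the fit comes out at A = 3.2 and B = 0.84, with an SSR of 1.6.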
How Do We Tell How Good The Fit Is?
We fit a line by minimizing the sum of squared residuals. However,
how do we know whether the best-fit line actually captures the
underlying relationship? It could just be the least poor fit
among a bunch of poor fits. This is where R² comes in:
R² = 1 - SSR / SST
Here:
SST is the Sum of Squares Total
SSR is the Sum of Squared Residuals
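In code, R² = 1 - SSR / SST takes only a few lines. This minimal Python sketch assumes the observed values and the model's predictions are already available as lists. The function name `r_squared` and the sample numbers are illustrative.

```python
def r_squared(observed: list[float], predicted: list[float]) -> float:
    """R² = 1 - SSR/SST, with SST measured around the mean of the observed values."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_y = sum(observed) / len(observed)
    sst = sum((y - mean_y) ** 2 for y in observed)                # Sum of Squares Total
    ssr = sum((y - p) ** 2 for y, p in zip(observed, predicted))  # Sum of Squared Residuals
    if sst == 0:
        raise ValueError("SST is 0 (all observed values are equal), so R² is undefined")
    return 1.0 - ssr / sst


# Illustrative observed Crop Yields and the predictions of some fitted line.
observed = [12.0, 19.0, 29.0, 37.0, 45.0]
predicted = [11.6, 20.0, 28.4, 36.8, 45.2]
print(round(r_squared(observed, predicted), 4))
```

An R² close to 1 means the line leaves very little of the variation around the mean unexplained. An R² close to 0 means it does little better than the horizontal mean line.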
R² Explained (1/3)
We saw two terms in the R² formula: SST and SSR. Let's look at
each in turn.
SST stands for Sum of Squares Total: the sum of the squared
residuals around the mean of the dependent variable (Crop Yield).
In other words, it is the Sum of Squares (SS) around a horizontal
line that passes through the mean Crop Yield.
[Figure: SST]
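SST needs only the Crop Yield values, because it is the SS around a horizontal line at their mean. Rainfall plays no part in it. Here is a short Python sketch using made-up yields; the function name `sum_of_squares_total` is illustrative.

```python
from statistics import mean


def sum_of_squares_total(ys: list[float]) -> float:
    """SST: sum of squared residuals around the horizontal line y = mean(ys)."""
    y_bar = mean(ys)
    return sum((y - y_bar) ** 2 for y in ys)


crop_yield = [4.0, 6.0, 8.0, 10.0]  # hypothetical values; their mean is 7.0
print(sum_of_squares_total(crop_yield))  # (-3)² + (-1)² + 1² + 3² = 20.0
```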
R² Explained (2/3)
As the graphs below show, calculating the Sum of Squares (SS)
around the mean gives the same result for either graph: the SST.
[Figure: two graphs, each labeled SST]
R² Explained (3/3)
The second term in the formula is SSR, the Sum of Squared
Residuals. This is the quantity we minimized to get our best fit.
It is calculated by summing the squared differences between each
observed value and the corresponding predicted value from the
regression model.
[Figure: SSR]
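SSR compares each observed Crop Yield with the value a line predicts for that point's Rainfall. In this Python sketch, the data and the candidate line (A = 1.0, B = 2.0) are hypothetical. The least-squares line is the choice of A and B that makes this number as small as possible.

```python
def sum_squared_residuals(xs: list[float], ys: list[float], a: float, b: float) -> float:
    """SSR: sum of (observed - predicted)², where predicted = a + b*x."""
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    return sum(r * r for r in residuals)


rainfall = [1.0, 2.0, 3.0]
crop_yield = [3.5, 4.5, 7.0]
# Predictions are 3.0, 5.0, 7.0, so the residuals are 0.5, -0.5, 0.0.
print(sum_squared_residuals(rainfall, crop_yield, a=1.0, b=2.0))  # 0.25 + 0.25 + 0.0 = 0.5
```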
Suppose there are 10 data points (n = 10) and SST comes out to
400. Dividing by n gives the variation around the mean:
Var(mean) = SST / n = 400 / 10 = 40
Now let's assume SSR comes out to 120. The variation around the
fitted line is then:
Var(fit) = SSR / n = 120 / 10 = 12
The variation around the least-squares line in the second graph
(Var(fit) = 12) is smaller than the variation around the mean in
the first graph (Var(mean) = 40). Therefore, taking Rainfall into
consideration explains some of the variation in Crop Yield.
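The arithmetic in this example can be checked directly. This Python sketch uses only the numbers given above (n = 10, SST = 400, SSR = 120). It also computes the fraction of the variation around the mean that the fit removes, (Var(mean) - Var(fit)) / Var(mean), which equals R² = 1 - SSR / SST because the n cancels.

```python
n = 10        # number of data points
sst = 400.0   # Sum of Squares Total (around the mean)
ssr = 120.0   # Sum of Squared Residuals (around the least-squares line)

var_mean = sst / n  # variation around the mean
var_fit = ssr / n   # variation around the fitted line

# Fraction of the variation around the mean that is explained by Rainfall.
explained = (var_mean - var_fit) / var_mean
print(var_mean, var_fit, explained)  # 40.0 12.0 0.7
```

So in this example, taking Rainfall into account explains 70% of the variation in Crop Yield.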
Calculating & Interpreting R² (2/2)
Let us look at the formula for R² again:
R² = (Var(mean) - Var(fit)) / Var(mean) = 1 - SSR / SST
A perfect model that passes through every data point would have
Var(fit) = 0 and hence R² = 1. That rarely happens with real-world
data, and when it does, it is usually a sign of overfitting.
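The sketch below shows why R² = 1 on noisy data is suspicious. It fits five made-up, roughly linear points twice with NumPy's `np.polyfit`: once with a straight line and once with a degree-4 polynomial. The polynomial is a swapped-in illustration of an overly flexible model, not part of simple linear regression. Because it has five coefficients for five points, it passes through every point, which drives SSR to (numerically) zero and R² to 1, even though it is only chasing the noise.

```python
import numpy as np


def r_squared(y: np.ndarray, y_hat: np.ndarray) -> float:
    """R² = 1 - SSR/SST."""
    ssr = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ssr / sst)


# Made-up data: roughly linear with a little noise.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

line = np.polyfit(x, y, deg=1)   # straight line: 2 coefficients
curve = np.polyfit(x, y, deg=4)  # degree-4 polynomial: 5 coefficients for 5 points

r2_line = r_squared(y, np.polyval(line, x))
r2_curve = r_squared(y, np.polyval(curve, x))
print(f"straight line R² = {r2_line:.4f}, degree-4 polynomial R² = {r2_curve:.6f}")
```

The straight line already scores an R² of about 0.997. The polynomial's "perfect" score reflects memorized noise, and it would likely predict new points worse.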
Future Learnings
What we saw here was just an example of Simple Linear
Regression (one independent and one dependent variable).
There are a few more topics that we'll cover later which will
broaden your understanding of Linear Regression models:
Sample Code to Train and Visualize a Simple Linear Regression Model
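Below is a minimal sketch of such code, not necessarily the approach used on the original slide. It trains a simple linear regression with NumPy's `np.polyfit` (degree 1), reports A, B and R², and, if Matplotlib is installed, saves a scatter plot with the fitted line. The synthetic data (a true line of 5 + 0.8 × Rainfall plus noise), the helper function names, and the output filename `fit.png` are all assumptions for illustration.

```python
import numpy as np


def train_simple_linear_regression(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Fit y = A + B*x by least squares and return (A, B)."""
    b, a = np.polyfit(x, y, deg=1)  # polyfit returns the highest-degree coefficient first
    return float(a), float(b)


def r_squared(x: np.ndarray, y: np.ndarray, a: float, b: float) -> float:
    """R² = 1 - SSR/SST for the line y = a + b*x."""
    y_hat = a + b * x
    ssr = np.sum((y - y_hat) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ssr / sst)


def plot_fit(x: np.ndarray, y: np.ndarray, a: float, b: float, path: str = "fit.png") -> None:
    """Scatter the data, draw the fitted line, and save to `path` (requires Matplotlib)."""
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display needed
    import matplotlib.pyplot as plt

    order = np.argsort(x)  # sort so the line is drawn left to right
    fig, ax = plt.subplots()
    ax.scatter(x, y, label="observed")
    ax.plot(x[order], a + b * x[order], color="red",
            label=f"Crop Yield = {a:.2f} + {b:.2f} × Rainfall")
    ax.set_xlabel("Rainfall")
    ax.set_ylabel("Crop Yield")
    ax.legend()
    fig.savefig(path)
    plt.close(fig)


if __name__ == "__main__":
    # Synthetic data (assumption): Crop Yield = 5 + 0.8 * Rainfall + Gaussian noise.
    rng = np.random.default_rng(seed=0)
    rainfall = rng.uniform(20, 100, size=50)
    crop_yield = 5 + 0.8 * rainfall + rng.normal(0, 4, size=50)

    a, b = train_simple_linear_regression(rainfall, crop_yield)
    print(f"A = {a:.2f}, B = {b:.2f}, R² = {r_squared(rainfall, crop_yield, a, b):.3f}")
    try:
        plot_fit(rainfall, crop_yield, a, b)
    except ImportError:
        print("Matplotlib is not installed; skipping the plot.")
```

Matplotlib is imported inside `plot_fit` so that the training and R² parts still run on a machine that only has NumPy.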