
R-SQUARED AND ADJUSTED R-SQUARED

INTRODUCTION
R-Squared and Adjusted R-Squared are the key metrics for checking the accuracy of a
regression model. We will understand each in detail in the subsequent sections.

There are various techniques to check the accuracy of different kinds of problems. In the case of
classification problems, we use the confusion matrix, F1-score, precision, recall, etc.

R-SQUARED
The formula for R-Squared is:

R² = 1 − SSres / SStot

Where,

SSres = Residual sum of squares (the sum of the squared differences between the actual and predicted values)

SStot = Total sum of squares (the sum of the squared differences between the actual values and their mean)

BEST FIT LINE

To understand what SSres is, let’s look at a graph.

The blue dots in the graph are the actual points. The double ended arrow between each blue
dot and the diagonal line (best fit line) shows the difference between
the predicted and actual point. This is the error/residual. The summation of the squares of all these
differences between the actual and the predicted points is what we call SSres.

SSres = ∑(yᵢ − ŷᵢ)²

AVERAGE FIT LINE

In the above figure, you can see that instead of finding the best fit line, the average output
line is taken. The blue dots in the graph are the actual points.

The double ended arrow between each blue dot and the average output line gives the
difference between the actual point and the average output. The summation of the squares of all these
differences between the actual points and the average is what we call SStot.

SStot = ∑(yᵢ − ȳ)²


So, substituting the value of SSres and SStot in the R2 equation, we will get a value
somewhere between 0 and 1.

R² = 1 − SSres / SStot, and since SSres < SStot, we get 0 < R² < 1

The logic behind this is that the error in SStot will always be higher, as it is measured from the average
fit, whereas the error in SSres will be comparatively lower than SStot.

Therefore, SSres / SStot will be a value smaller than 1. Subtracting this from 1 will give us a value
somewhere between 0 and 1.

If the R2 value is nearer to 1, then our best fit line fits the data quite well.
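To make this concrete, here is a minimal Python sketch (the actual and predicted values below are made-up numbers, purely for illustration) that computes SSres, SStot and R2 exactly as defined above:

```python
import numpy as np

# Hypothetical actual values and predictions from a best fit line (made-up numbers)
y_actual = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_pred = np.array([2.8, 5.3, 6.9, 9.4, 10.6])

ss_res = np.sum((y_actual - y_pred) ** 2)            # sum of squared residuals
ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)   # squared deviations from the mean

r_squared = 1 - ss_res / ss_tot
print(r_squared)  # close to 1, because the predictions track the actual values well
```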

But wait!! Can we encounter a scenario where the R2 value is less than 0?

Yes, the value for R2 can be less than 0 in cases where the predictions of the best fit line are worse
than simply predicting the average output. That means SSres > SStot.

Substituting the values to the R2 equation below:

R² = 1 − SSres / SStot = 1 − (large value / smaller value) = negative value
This means that the model that we have created is not at all a good model. Therefore R2 is
used to check the goodness of fit.
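As a quick sketch of this situation (again with made-up numbers), deliberately bad predictions give SSres > SStot and hence a negative R2:

```python
import numpy as np

# Hypothetical example of a model that predicts worse than the average line
y_actual = np.array([3.0, 5.0, 7.0, 9.0, 11.0])
y_bad = np.array([11.0, 2.0, 13.0, 1.0, 20.0])   # deliberately poor predictions

ss_res = np.sum((y_actual - y_bad) ** 2)
ss_tot = np.sum((y_actual - y_actual.mean()) ** 2)

print(1 - ss_res / ss_tot)  # negative, because SSres > SStot
```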

DRAWBACK OF R2
There is a drawback to R2 which often makes it difficult to judge the accuracy of the model.

Let’s say we have a simple linear regression model which has one independent feature and
has the equation y = ax + b. Now we add a few more independent features to the model. Our
new equation would be a multiple linear regression model with an equation somewhat
like y = ax1 + bx2 + cx3 + d.

So, as the number of independent features increases, our R2 also increases.

HOW DOES THE VALUE OF R2 INCREASE?

Every time we add an independent feature, the linear regression algorithm assigns a
coefficient to that feature. For example, the coefficients in the above equation are a, b and c, which
were added when the new features x1, x2 and x3 were introduced to the model.

The linear regression algorithm assigns the coefficients in such a way that the value of SSres
never increases, and usually decreases, whenever we add a new independent feature.

If we substitute this logic into the R2 equation:

R² = 1 − SSres / SStot = 1 − (decreasing with increasing independent features / constant, greater value) = close to 1

This sounds perfect, right? Not really!!

As we increase the number of independent features in the model, the R2 value will also keep
on increasing even though the independent features are not correlated with the dependent
variable.

Chances are the feature that we include can be a complete one-off. It might not have any
relation with the target dependent variable, but it still gets some coefficient value contributing to
the output. The linear regression algorithm works in such a way that it assigns a coefficient value to
every feature that is present in the model.

For example, suppose we are predicting the age of students and one of the features in our model
is the students’ contact number. This feature seems to have no correlation
with age, but it might still have some coefficient value contributing to the output, thereby
increasing the overall R2 of the model.

This clearly means that R2 doesn’t have anything to do with the correlation between the
independent features and the dependent variable. It simply increases whenever we add a new
feature to the model.
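As a rough illustration (the data below is randomly generated, and the helper fits ordinary least squares with NumPy’s lstsq), adding a feature that has nothing to do with the target still does not lower R2, and typically nudges it slightly upward:

```python
import numpy as np

rng = np.random.default_rng(0)

def r_squared(X, y):
    """Fit ordinary least squares and return R-squared."""
    X = np.column_stack([np.ones(len(y)), X])      # add an intercept column
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    ss_res = np.sum((y - X @ coef) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1 - ss_res / ss_tot

n = 50
x1 = rng.normal(size=n)
y = 2 * x1 + rng.normal(scale=0.5, size=n)         # y really depends only on x1
junk = rng.normal(size=n)                          # feature unrelated to y

print(r_squared(x1.reshape(-1, 1), y))             # R-squared with the one real feature
print(r_squared(np.column_stack([x1, junk]), y))   # slightly higher, never lower
```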

To prevent such scenarios, we use Adjusted R2.

ADJUSTED R – SQUARED
The formula for Adjusted R-Squared is as follows

Adjusted R² = 1 − (1 − R²) × (N − 1) / (N − P − 1)

Where,

R2 = R-Squared value

P = number of independent features

N = sample size of the dataset
The Adjusted R2 has a penalizing factor. It penalizes the model for adding independent variables that
don’t contribute to the model in any way or are not correlated to the dependent variable.
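Here is a minimal Python sketch of the formula (the R-squared values, sample size and feature counts below are hypothetical, just to show the penalty at work):

```python
def adjusted_r_squared(r2, n, p):
    """Adjusted R-squared from R-squared r2, sample size n and number of features p."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Same data size, more features: the adjustment pulls the value down
print(adjusted_r_squared(0.90, n=50, p=2))    # ~0.896, slightly below R-squared
print(adjusted_r_squared(0.91, n=50, p=10))   # ~0.887, penalized despite a higher R-squared
```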

To understand this penalizing factor, let’s divide it into 2 cases:

CASE – I

Let’s say we increase the number of independent features (P) for the model. These features
are not really contributing to the model much or are not correlated to the dependent
variable.

Let’s substitute this logic into the Adjusted R-Squared equation. The value of N − P − 1 will
decrease as the value of P has increased. Thus, the value of (N − 1) / (N − P − 1) will increase.

Now there is one thing that we need to understand here. As we add new features, it is obvious
that the R-Squared value will increase. But this increase will be insignificant in comparison
to the (N − 1) / (N − P − 1) value because the newly added features are not correlated to the dependent
variable. So, (1 − R2) will not decrease much.

Now the value of (N − 1) / (N − P − 1) multiplied by (1 − R2) will be an increased value. Finally,
subtracting it from 1 will give us a smaller value.

Adjusted R² = 1 − (1 − R²) × (N − 1) / (N − P − 1) = 1 − (increasing value less than 1) = smaller value

This is how Adjusted R-Squared penalizes when the features are not correlated to the
dependent variable.

CASE – II

Now let’s say we are adding features which are highly correlated to the dependent
variable. In this case the R2 will be higher and will overwhelm the (N − 1) / (N − P − 1) value.

So, (1 − R2) will be a smaller value which, when multiplied by the overwhelmed (N − 1) / (N − P − 1) value,
will still give us a small value. Now subtracting this from 1 will give us an Adjusted R-Squared
which is an increased value compared to the previous case.

Substituting this logic into the Adjusted R-Squared equation:

Adjusted R² = 1 − (1 − R²) × (N − 1) / (N − P − 1) = 1 − (smaller value)(overwhelmed value) = 1 − smaller value = increased Adjusted R² value

So, this signifies that, when the independent features are correlated to the dependent variable,
the Adjusted R-Squared value goes up.
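A quick numerical sketch of the two cases (the R-squared figures, sample size and feature counts below are made up, purely for illustration):

```python
def adjusted_r2(r2, n, p):
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

n = 100
print(adjusted_r2(0.800, n, p=5))    # baseline model: ~0.789

# Case I: five junk features, R-squared creeps up only slightly -> adjusted value falls
print(adjusted_r2(0.805, n, p=10))   # ~0.783

# Case II: five correlated features, R-squared jumps -> adjusted value rises
print(adjusted_r2(0.920, n, p=10))   # ~0.911
```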

CONCLUSION

1. Whenever we add an independent feature to the model, the R-Squared value will always increase, even if the independent feature is not correlated to the dependent variable. It will never decrease. On the other hand, Adjusted R-Squared increases only when the independent feature is correlated to the dependent variable.

2. The value for Adjusted R-Squared will always be less than or equal to the R-Squared value.

THANK YOU
