How to Interpret Adjusted R-Squared and Predicted R-Squared in Regression Analysis
By Jim Frost | 133 Comments
R-squared tends to reward you for including too many independent variables in a regression model, and
it doesn’t provide any incentive to stop adding more. Adjusted R-squared and predicted R-squared use
different approaches to help you fight that impulse to add too many. The protection that adjusted R-
squared and predicted R-squared provide is critical because too many terms in a model can produce
results that you can’t trust. These statistics help you include the correct number of independent
variables in your regression model.

Does this graph display an actual relationship or is it an overfit model? This blog post shows you how to
make this determination.
Multiple linear regression can seduce you! Yep, you read it here first. It’s an incredibly tempting
statistical analysis that practically begs you to include additional independent variables in your model.
Every time you add a variable, the R-squared increases, which tempts you to add more. Some of the
independent variables will be statistically significant. Perhaps there is an actual relationship? Or is it just
a chance correlation?
You just pop the variables into the model as they occur to you or just because the data are readily
available. Higher-order polynomials curve your regression line any which way you want. But are you
fitting real relationships or just playing connect the dots? Meanwhile, the R-squared increases,
mischievously convincing you to include yet more variables!
In my post about interpreting R-squared, I show how evaluating how well a linear regression model fits
the data is not as intuitive as you may think. Now, I’ll explore reasons why you need to use adjusted R-
squared and predicted R-squared to help you specify a good regression model!
Some Problems with R-squared
Previously, I demonstrated that you cannot use R-squared to conclude whether your model is biased. To
check for this bias, you need to check your residual plots. Unfortunately, there are yet more problems
with R-squared that we need to address.
Problem 1: R-squared increases every time you add an independent variable to the model. The R-
squared never decreases, not even when it’s just a chance correlation between variables. A regression
model that contains more independent variables than another model can look like it provides a better fit
merely because it contains more variables.
Problem 2: When a model contains an excessive number of independent variables and polynomial
terms, it becomes overly customized to fit the peculiarities and random noise in your sample rather than
reflecting the entire population. Statisticians call this overfitting the model, and it produces deceptively
high R-squared values and a decreased capability for precise predictions.
Fortunately for us, adjusted R-squared and predicted R-squared address both of these problems.
What Is the Adjusted R-squared?
Use adjusted R-squared to compare the goodness-of-fit for regression models that contain differing
numbers of independent variables.
Let’s say you are comparing a model with five independent variables to a model with one variable, and the five-variable model has a higher R-squared. Is the model with five variables actually a better model,
or does it just have more variables? To determine this, just compare the adjusted R-squared values!
The adjusted R-squared adjusts for the number of terms in the model. Importantly, its value increases
only when the new term improves the model fit more than expected by chance alone. The adjusted R-
squared value actually decreases when the term doesn’t improve the model fit by a sufficient amount.
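For reference, the standard formula is below, where n is the number of observations and k is the number of independent variables:

$$R^2_{adj} = 1 - (1 - R^2)\frac{n - 1}{n - k - 1}$$

For example, a model with R-squared = 0.80, n = 30, and k = 5 has an adjusted R-squared of 1 - 0.20 × (29 / 24) ≈ 0.758.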
The example below shows how the adjusted R-squared increases up to a point and then decreases. On
the other hand, R-squared blithely increases with each and every additional independent variable.

In this example, the researchers might want to include only three independent variables in their
regression model. My R-squared blog post shows how an under-specified model (too few terms) can
produce biased estimates. However, an overspecified model (too many terms) can reduce the model’s
precision. In other words, both the coefficient estimates and predicted values can have larger margins of
error around them. That’s why you don’t want to include too many terms in the regression model!
What Is the Predicted R-squared?
Use predicted R-squared to determine how well a regression model makes predictions. This statistic
helps you identify cases where the model provides a good fit for the existing data but isn’t as good at
making predictions. However, even if you aren’t using your model to make predictions, predicted R-
squared still offers valuable insights about your model.
Statistical software calculates predicted R-squared using the following procedure (a minimal sketch appears after the list):
• It removes a data point from the dataset.
• It calculates the regression equation using the remaining data.
• It evaluates how well the model predicts the missing observation.
• It repeats this process for all data points in the dataset.
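Here is a minimal sketch of that procedure in Python, using only numpy. The function name and the simulated data are illustrative rather than any particular package's implementation; it uses the common 1 - PRESS/SST definition of predicted R-squared.

import numpy as np

def predicted_r_squared(X, y):
    """Leave-one-out predicted R-squared: 1 - PRESS / SST."""
    n = len(y)
    X1 = np.column_stack([np.ones(n), X])    # add an intercept column
    press = 0.0
    for i in range(n):
        mask = np.arange(n) != i             # drop observation i
        beta, *_ = np.linalg.lstsq(X1[mask], y[mask], rcond=None)
        press += (y[i] - X1[i] @ beta) ** 2  # squared prediction error for i
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - press / sst

# Illustrative data: one real predictor plus noise
rng = np.random.default_rng(1)
x = rng.normal(size=30)
y = 2 * x + rng.normal(size=30)
print(predicted_r_squared(x.reshape(-1, 1), y))

In practice, statistical software computes the same quantity with an algebraic shortcut based on leverage values rather than refitting the model n times, but the result is identical for linear regression.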
Predicted R-squared helps you determine whether you are overfitting a regression model. Again, an
overfit model includes an excessive number of terms, and it begins to fit the random noise in your
sample.
By its very definition, it is not possible to predict random noise. Consequently, if your model fits a lot of
random noise, the predicted R-squared value must fall. A predicted R-squared that is distinctly smaller
than R-squared is a warning sign that you are overfitting the model. Try reducing the number of terms.
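To see this warning sign numerically, here is a small illustration. The setup is assumed: pure-noise data and a cubic fit, using the leverage-based PRESS shortcut, which matches the leave-one-out refits sketched earlier.

import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=12)
y = rng.normal(size=12)                       # pure noise: nothing real to find
X1 = np.column_stack([np.ones(12), x, x**2, x**3])

H = X1 @ np.linalg.solve(X1.T @ X1, X1.T)     # hat (projection) matrix
e = y - H @ y                                 # ordinary residuals
press = np.sum((e / (1 - np.diag(H))) ** 2)   # PRESS via leverage values
sst = np.sum((y - y.mean()) ** 2)

print(1 - np.sum(e**2) / sst)                 # regular R-squared
print(1 - press / sst)                        # predicted R-squared, much lower

With pure noise and flexible terms, the regular R-squared can look respectable while the predicted R-squared drops toward zero or below.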
If I had to name my favorite flavor of R-squared, it would be predicted R-squared!
Related post: Overfitting Regression Models: Problems, Detection, and Avoidance
Example of an Overfit Model and Predicted R-squared
You can try this example using this CSV data file: PresidentRanking.
These data come from an analysis I performed that assessed the relationship between the highest
approval rating that a U.S. President achieved and their rank by historians. I found no correlation
between these variables, as shown in the fitted line plot. It’s nearly a perfect example of no relationship
because it is a flat line with an R-squared of 0.7%!



Now, imagine that we are chasing a high R-squared and we fit the model using a cubic term that
provides an S-shape.



Amazing! R-squared and adjusted R-squared look great! The coefficients are statistically significant
because their p-values are all less than 0.05. I didn’t show the residual plots, but they look good as well.
Hold on a moment! We’re just twisting the regression line to force it to connect the dots rather than
finding an actual relationship. We overfit the model, and the predicted R-squared of 0% gives this away.
If the predicted R-squared is small compared to R-squared, you might be overfitting the model even if
the independent variables are statistically significant.
To read about the analysis above where I had to be extremely careful to avoid an overfit model, read
Understanding Historians’ Rankings of U.S. Presidents using Regression Models.
A Caution about the Problems of Chasing a High R-squared
All study areas involve a certain amount of variability that you can’t explain. If you chase a high R-
squared by including an excessive number of variables, you force the model to explain the
unexplainable. This is not good. While this approach can produce higher R-squared values, it comes at the cost of misleading regression coefficients and p-values, a deceptive R-squared, and imprecise predictions.
Adjusted R-squared and predicted R-squared help you resist the urge to add too many independent variables to your model.
• Adjusted R-squared compares models with different numbers of variables.
• Predicted R-squared can guard against models that are too complicated.
Remember, the great power that comes with multiple regression analysis requires your restraint to use
it wisely!
If you’re learning regression, check out my Regression Tutorial!
Note: I wrote a different version of this post that appeared elsewhere. I’ve completely rewritten and
updated it for my blog site.
Related



Regression Tutorial with Analysis Examples
In "Regression"

How to Interpret Regression Models that have Significant Variables but a Low R-squared
In "Regression"

How To Interpret R-squared in Regression Analysis


In "Regression"
Filed Under: Regression | Tagged With: analysis example, conceptual, interpreting results
Reader Interactions
Comments

Jaysoon says
July 19, 2022 at 11:09 am
Hi Jim,
Thank you for your excellent post and for continuing your informational blogs.
My question is a little bit off-topic.
What are your thoughts on a predicted R-squared with an NA value
due to a leverage of 1.000, in which case the predicted R-squared and PRESS statistic are not defined?
I am doing some modeling, but I have encountered this several times.
I also searched the net, but I can’t find any discussion or published article about this.
I hope you could give some comment
Thank you.
Jaysoon
Reply

Jim Frost says
July 20, 2022 at 12:45 am
Hi Jaysoon,
I’m not really sure what is causing the NA value for predicted R-squared in your software–although I do
have some guesses.
Unlike the regular R-squared, predicted R-squared can be negative. I’m guessing NA refers to negative
values, but I’m not sure. At the statistical software company I used to work at, they changed how they
reported negative values. At first, the software reported the negative R-squared value. However, later they changed it so it reported 0% for negative values. Perhaps NA refers to negative values? I really don’t know.
What causes a negative value, and how do you interpret it? Well, if 0% is really bad for prediction, then negative values are even worse! They’re really, really bad! You probably have a very small total sample size AND a large number of predictors given the sample size.
As for leverage points, removing them will change the estimated model greatly. And the predicted R-
squared process removes each point systematically, which should cause a large change and a low
predicted R-squared. That’s consistent with my theory that NA equates to a negative R-squared, which
wouldn’t be surprising with the presence of very influential leverage points.
But you’d really need to check with your software’s documentation about it to be sure.
Reply

Hannah says
May 26, 2022 at 11:04 am
Hi Jim,
I have run two multiple regressions, and the second one, with an additional variable, is significant, but the first one is not. What would be the explanation for this? Would this be due to an overfit model?
Thank you!
Reply

Jim Frost says
June 2, 2022 at 11:16 pm
Hi Hannah,
I’m not exactly sure what you mean. You’ve run two regressions, and what becomes significant? The
entire regression model? The second variable?
Reply

Adele Manulat says
April 29, 2022 at 1:30 am
I have performed multiple regression analysis, but out of 5 independent variables, only one has explanatory power, so I performed the backward deletion technique. As I was removing insignificant IVs one at a time, I noticed that the adjusted r-squared was increasing, which means that the IVs I removed do not contribute to the model. However, when I was removing the 3rd insignificant IV, I noticed that the adjusted r-squared decreased. I searched online for the reason and found it here, where someone who had the same experience as mine commented. You explained that the cause is that if an IV with a t-statistic greater than 1 is removed, then the adjusted r-squared will decrease; alternatively, if it is less than 1, then it will increase. And true enough, the t-statistic of the nonsignificant IV that I removed was greater than one, while the others were not. However, I don’t know its implication and what to do next. Should I continue to do backward deletion until all of the IVs have significant explanatory power, regardless of whether the adjusted r-squared is less than before? What can we infer from this information, sir, and what would you advise me to do to arrive at an acceptable model? Thank you sir!
Reply

Jim Frost says
April 29, 2022 at 4:34 pm
Hi Adele,
Yes, that can be a confusing issue. When you add a variable that has a t-value that is ≥ 1 but < ~1.96,
that variable causes the adjusted R-squared to increase even though the p-value won't be statistically
significant. The statistical measures disagree. You're talking about doing the reverse by removing
variables, but the same principles apply. So, what to do? For starters, I suggest you read my post about
specifying the best model. In that article, I talk about how choosing your model is a mix of using
statistical measures and theory. Statistical measures don’t always agree (as you’re seeing) and you really
need to let theory be your guide.
With this in mind, if the IV, its coefficient sign, and magnitude all make theoretical sense, I’d lean
towards leaving it in and explaining why in the writeup. On the other hand, if it doesn’t make theoretical
sense, there’s more reason to remove it. Also, consider the fact that generally it is better to leave in an
unnecessary variable than it is to remove a necessary one. So, when you’re not sure, err on the side of
leaving it in. If removing the variable makes the residual plots go from looking good to bad, you almost
definitely want to leave that variable in! Again, you’d need to explain the rationale in the writeup.
Unfortunately, it’s not possible for me to tell you the correct answer because that depends on very
subject-area specific knowledge, theory, and those other details I mention. But those are points I’d
consider.
Reply

Adele Manulat says
April 28, 2022 at 10:04 pm
Hello sir, what does having a nonsignificant independent variable with a t-statistic greater than one signify or imply?
I learned that removing an independent variable that has a t-statistic greater than 1 decreases the adjusted r-squared. With this, is it okay to remove that variable and continue doing the backward deletion technique in order to arrive at the best model?
Reply

Adele Manulat says
April 25, 2022 at 6:00 pm
What does this imply, sir? Is it okay to proceed with the backward deletion technique in order to arrive at a significant coefficient?
Reply

Chandrakant Bhogayata says
April 17, 2022 at 1:44 am
You are right, Jim! t-value is 1.483 and it falls within that range.
Thanks a lot!
Reply

Jim Frost says
April 21, 2022 at 2:33 am
You’re very welcome!
Reply

Chandrakant Bhogayata says
February 22, 2022 at 11:34 pm
In my case, I performed a backward stepwise linear regression. In the final step, r square adjusted
decreased even after a non-significant predictor was removed. Why?
Thank you very much, Jim!
Reply

Jim Frost says
February 23, 2022 at 10:18 pm
Hi,
For understanding when the adjusted R-squared will increase or decrease, the key value to know is the
t-value of 1. If the t-value for a variable is greater than 1, then adding it to the model causes adjusted R-
squared to increase. Alternatively, and for your case, if you remove a variable with a t-value > 1, the adjusted R-squared will decrease.
The t-value for statistical significance varies depending on the degrees of freedom, but it will always be at least 1.96. Consequently, there is a range from 1.00 to 1.96 where the variable is not significant, but removing it will still cause the adjusted R-squared to decrease.
I think that is what happened in your case. If you check the t-value when the variable is in the model, I
bet it falls within that range.
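If you want to verify this rule yourself, here is a quick sketch in Python with statsmodels; the simulated variables are purely illustrative:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 40
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                 # candidate variable to add or remove
y = 3 * x1 + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
reduced = sm.OLS(y, sm.add_constant(x1)).fit()

print(abs(full.tvalues[2]))                      # t-value for x2
print(full.rsquared_adj - reduced.rsquared_adj)  # same sign as |t| - 1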
Reply

Angel Lago says
July 15, 2021 at 10:27 pm
Hi Jim,
I have a question. I am trying to run a regression analysis test on my dataset (6 IVs and 1 DV). However,
my variables/predictors have five items each (sets of questions) under them. Now I can’t run the test
because in SPSS and Excel, the input box will only accept one column/row for the DV. In other words,
how will I be able to run a test with several IVs and one DV with test results under them?
PS. I tried to copy and paste the five items (under DV) into one straight column so as to enter it into the
‘input’ prompt, but I’m not really sure if that is a valid way of testing the significance.
Thank you so much.
AAL
Reply

M Tanveer Hossain Parash says


June 28, 2021 at 12:07 am
How to cite this in a journal article?
Reply

VICTOR MELLO says
May 20, 2021 at 6:16 pm
Hello Jim. Great website, very clear and easy to follow. I have a question interpreting R2 when
comparing Multiple Linear Regressions with Linear Regressions. It would be great to have your thoughts
on it.
To illustrate, I am trying to find the correlation between a product Sales (Y) and its Prices (X). But not
only the company Prices, also the competition. When I run the Multiple Regression, I tend to get good
Adjusted R2s and Ps.
However, when I isolate those X’s one by one, the R2 tends to decrease. I’ve run a lot of samples and find myself going back to this trend. It confuses me since other websites suggest that the multiple regression could be better in this case.
Do you have a recommendation on which alternative I should go for?
Thank you,
Victor Mello
Reply

Jim Frost says
May 20, 2021 at 6:39 pm
Hi Victor,
So, let’s start with simple regression and then move up to multiple regression. Let’s say you have sales
as the DV and in the first model you have Prices as the lone IV. You’ll get an r-squared. Suppose you use
that same dataset but add another IV, Competitor’s Prices, so you now have both IVs. R-squared will
never decrease when you add an IV. Theoretically, it could remain the same. However, in practice, it’ll
always go up by at least a trivial amount, but it can go up substantially.
The thing to remember is that R-squared NEVER goes down when you add an IV (assuming you’re using
the same dataset). Consequently, based on what you write, there is some sort of problem going on
because you’re describing a situation where a simple regression has a higher R-squared than a multiple
regression that has the original IV plus another one. That’s just not possible! I don’t know what the error
is, but there is some sort of serious error.
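This is easy to demonstrate numerically. Here is a minimal sketch with statsmodels; the Sales/Prices names echo your example, but the numbers are made up:

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 50
prices = rng.normal(size=n)
competitor_prices = rng.normal(size=n)   # even pure noise works here
sales = 5 * prices + rng.normal(size=n)

m1 = sm.OLS(sales, sm.add_constant(prices)).fit()
X2 = sm.add_constant(np.column_stack([prices, competitor_prices]))
m2 = sm.OLS(sales, X2).fit()

print(m1.rsquared, m2.rsquared)          # m2.rsquared is never lower than m1's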
It IS possible for the adjusted R-squared to decline as you add an IV. I discuss the reasons for that in this
article. But, that doesn’t seem to be the situation to which you’re referring.
Beyond the R-squared issue, in a more general sense, if you have multiple significant IVs, it’s better to
include them together in one model than fitting a simple regression model to “isolate” them one by one.
That might sound like a good idea but you’re actually introducing the possibility of omitted variable bias.
When you include them together, each coefficient estimate is that IV’s effect while holding the other
IV’s constant. It’s counter-intuitive but by include all the IVs, you’re actually able to isolate the role of
each because the model can control for the other IVs. By fitting the IVs separately, it allows that bias slip
in potentially because the model can’t control for those other variables.
I hope that helps!
Reply

David says
April 11, 2021 at 8:06 am
Hi Jim, thanks for the answer!
Reply

Gemechu Asfaw says
April 10, 2021 at 1:53 am
Can the correlation coefficient be positive and the regression coefficient negative at the same time? How can we interpret the result?
Reply

David says
April 9, 2021 at 4:11 am
Hi Jim,
I was wondering about the interpretability of adjusted R-square when not used for model selection: in
particular, I was presented with a model and its adjusted R-square value, but not with its (non-adjusted)
R-square value.
In my understanding, adjusted R-square is a tool used to prevent over-fitting; so it is used when
comparing different model versions to each other: we test whether we should add or leave predictors
out.
Once a model is chosen, the adjusted R-square does not add any information anymore, instead one
should mention the R-square when presenting it.
Is my understanding correct?
Reply

Jim Frost says
April 10, 2021 at 12:38 am
Hi David,
Frequently, or almost exclusively, you’ll see adjusted R-squared advertised as the way to compare
regression models with different numbers of predictors. However, there is another purpose for adjusted
R-squared. Regular R-squared is a biased estimator for how much variance the model explains in the
population. It tends to be too high. Sample R-squared values tend to overestimate the population R-
squared. Adjusted R-squared counteracts that by shrinking down R-squared to a point where it’s not a
biased estimator. Consequently, that’s another reason to report adjusted R-squared even when model
selection is done. I almost never see that in practice but I actually think it’s a good idea.
To read more on about that application of adjusted R-squared, read my post about Five Reasons Why
Your R-squared can be too High. It’s the first reason.
Reply

hyudatan says
April 6, 2021 at 9:42 am
Could you please explain why the difference between the adjusted R-squared and predicted R-squared
is preferably less than 0.2?
Reply

Stan Alekman says
April 5, 2021 at 5:14 pm
Jim,
You describe R-squareds as explanatory statistics. I think of it as accounting terms: Proportion of
variation in response captured by the fitted model, which is therefore a relative measure of the
goodness of fit. I don’t know if we are saying anything different.
However RMSE is an absolute measure of the accuracy of a fitted model in the units of “Y”. The RMSE
gives the SD of the residuals. The RMSE estimates the concentration of the data around the fitted
equation. It can be used to compare models when the response variable does not change. I always
compare RMSE when terms are added or subtracted from a model.
Regards,
Stan Alekman
Reply

Jim Frost says
April 5, 2021 at 5:39 pm
Hi Stan,
Statisticians frequently use both the “explain” and “account for” wording interchangeably. The model
explains X% of the variability. The model accounts for X% of the variability.
R-squared is a relative measure. As you say, RMSE is an absolute measure. I often like using the similar
but subtly different, Standard Error of the Regression (S), which is also an absolute measure but it
adjusts for the number of terms, making it more akin to adjusted R-squared whereas RMSE is more
closely related to R-squared. I’ve written a post, Standard Error of the Regression vs. R-squared that
looks at the differences. In that post, I explain why I like S quite a bit!
One point to be aware of with the RMSE is that it always decreases as you add terms, similar to how R-
squared always increases. S uses the same adjustment as adjusted R-squared and can actually increase.
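For concreteness, with SSE as the sum of squared residuals, n observations, and p estimated parameters, the usual definitions (conventions vary slightly between packages) are:

$$RMSE = \sqrt{SSE / n}, \qquad S = \sqrt{SSE / (n - p)}$$

Because S divides by the error degrees of freedom n - p rather than n, adding an unhelpful term can make S larger, just as it can make adjusted R-squared smaller.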
Best wishes,
Jim
Reply

John Hogenbirk says
March 28, 2021 at 4:25 pm
Hi Jim,
I appreciate your easily understood explanations of all things statistical.

I have a question about when the difference between the adjusted R-squared value and the predicted R-
squared value become a concern. You’ve replied several times that you “start to worry when the
difference is more than 0.1 (10%).” Am I correct in understanding this to mean an absolute difference of
0.1 or 10 percentage points? Or did you really mean a relative difference of 10%?
Regards,
John
Reply

Jim Frost says
March 29, 2021 at 3:02 pm
Hi John,
Thanks for the question. I should make this more clear in the blog post itself!
I mean a difference of 10 percentage points. And that’s just a rule of thumb. But I don’t typically worry
much when it’s lower than that.
Reply

Aoun ALi says
January 25, 2021 at 9:48 am
Hi sir Jim,
I need your help in understanding the following:
The R-squared value is 0.098,
the adjusted R-squared value is 0.079,
and R is .312,
where my significance level is .027.
What does it mean?
Reply

Jim Frost says
March 26, 2021 at 2:44 am
Hi Aoun,
You have a low R-squared of 0.098 or 9.8%. That indicates that your model doesn’t explain much of the
variance in the dependent variable. However, if the p-value for your overall significance is 0.027 (I think
that’s what you’re saying but please correct me if that’s wrong), it is a significant model. It means that
you likely have at least one significant independent variable in your model. So, the model has a low R-squared
but with one or more significant IVs. That sounds confusing but I’ve written a post on exactly that topic!
How to Interpret a Model that have Significant Variables but a Low R-Squared
That post should answer your questions!
Reply

anniiez says
January 21, 2021 at 3:07 am
Can I ask a question? In my experiment, i.e., RSM CCD, I have an R-squared of 0.6304, the difference between the predicted (0.5280) and adjusted (0.5853) R-squared is less than 0.2, the lack of fit is not significant, and the model is significant. Can I use this model to predict the response?
Reply

Jim Frost says
January 22, 2021 at 12:38 am
Hi Anniiez,
There are several issues to consider. First, check those residual plots! If those look good, it’s a good sign
that your predictions won’t be biased. Even when you have a high R-squared, it’s possible to have biased
predictions, and checking the residual plots helps you avoid that.
The R-squared values relate to the precision of your predictions. There’s not a huge drop between your
R-square and predicted R-squared. So, that’s a good sign. However, the overall values for both are not
particularly high. High R-squared values aren’t important for all models, but they become important
when you want to make predictions. Lower R-squared values indicate that your predictions will be less precise. It’s possible that your predictions will be so imprecise that they won’t be useful.
Here are a couple of posts I wrote that you should read to help you understand the issue of prediction precision:
Prediction Precision
Low R-squared Values and Prediction Precision
I hope this information helps!
Reply

Ana Ferreira says
December 16, 2020 at 12:22 pm
What does it mean when the adjusted r-squared evidently increases between models but the regression coefficient stays the same? Does this mean our explanatory variable is still a suppressor, or can we not say this because of the unchanged coefficient?
Reply

Jim Frost says
December 17, 2020 at 10:56 pm
Hi Ana,
It means that whatever variables you added, they are likely to be worthwhile additions to the model.
The model is explaining more of the variance in the dependent variable even when accounting for the fact that you’re using more variables. So, that’s a good sign!
The fact that the coefficient of the explanatory variable in question didn’t change is neither a good nor a bad thing, really. What it means is that the new variables in the model are probably not correlated with
your variable of interest (VOI). If they had been correlated with both the VOI and the dependent
variable, when they weren’t in the model their absence would’ve been causing omitted variable bias in
the VOI. Adding in those variables reduces that bias and causes the coefficients to change. However,
that doesn’t appear to be the case because the VOI’s coefficient didn’t change. Consequently, the
interpretation of the VOI doesn’t change. In other words, you’d continue to interpret it the same way in
the new model.
I hope that helps!
Reply



Joseph Lombardi says
December 2, 2020 at 12:29 pm
Boy, do I wish Excel’s Data Analysis Regression add-in had Predicted R-Square!!!
If my model has only one (continuous) IV to begin with, why does Excel’s Regression tool return an
Adjusted R-Square? Obviously, there’s something I don’t understand.
Regards from Toms River, NJ, USA
Reply

Jim Frost says
December 3, 2020 at 1:53 am
Hi Joseph,
There’s actually a good reason to know adjusted R-squared when you have only one variable. Typically,
you think of adjusted R-squared for helping you compare models with differing numbers of predictors.
However, there’s another use/interpretation of adjusted R-squared. It turns out regular R-squared is a
biased estimator. The R-squared in your statistical output tends to be higher than the correct population
value for R-squared. Adjusted R-squared shrinks down the regular R-squared to an unbiased value.
Adjusted R-squared doesn’t tend to be too high or too low on average. You can read more about that in
my post about Five Reasons Why Your R-squared Can Be Too High.
Reply

Bobby says
November 21, 2020 at 10:44 pm
Hi Jim! Thank you for the insightful article.
I just wanted to ask, there are times when an independent variable in a multiple regression model is not
95% statistically significant (e.g. a p-value of 0.07). However, upon removing this variable from the
model, the adjusted R squared value also decreases. Hence, my question is which provides a better
measure of what model to use? Should I be removing predictors based on their p-value? Or rather, should I be adding and removing variables based on the adjusted R-squared? Which one takes priority?
Thanks for your help!
Bobby
Reply

Aaron says
November 21, 2020 at 2:32 am
Hi Jim!
What is wrong with this kind of thinking: “I understand that R-squared is not a perfect measure of the
quality of a regression equation because it always increases when a variable is added to the equation.
Once we adjust for degrees of freedom by using Adjusted R-squared, though, it seems to me that the
higher the Adjusted R-squared, the better the equation.”
Reply



Amogh Bharadwaj D N says
October 10, 2020 at 6:42 pm
Hello Jim, based on the interpretation of the predicted R-squared and adjusted R-squared above, why is it that in the last example the adjusted R-squared is 50% but the predicted R-squared is 0%? Doesn’t a 50% adjusted R-squared provide a good estimate for the population?
Reply

Jim Frost says
October 13, 2020 at 1:52 pm
Hi Amogh,
These two statistics are telling you different things. Adjusted R-squared includes a shrinkage factor to
counteract the fact that regular R-squared is a biased estimator. Sample R-squared values tend to be
higher than the true population value and adjusted R-squared corrects for that bias.
Predicted R-squared indicates how well a model without each observation would predict that
observation.
Because what they measure is so different, it’s not surprising that the results can be different. I find that
predicted R-squared tends to be more sensitive to models that are overly complicated. Overfitting is
when the model starts to fit the random noise in the data. Because random noise is, by definition, not
predictable, this problem shows up in the predicted R-squared. Adjusted R-squared is not designed to
detect that problem–hence it doesn’t show up there.
I hope that helps clarify it!
Reply

Kyle Seibenick says
September 29, 2020 at 9:09 pm
Hi Jim. Thanks for writing the regression ebook, this is a great refresher and enhancement of my skills.
I already saw this question and your response was “you don’t know of any easy way”. So I’ll ask – do you
know of a hard way or manual way to calc predicted r2 (or “PRESS” as I’m seeing in other places) in
Excel?
Reply

Tomingan says
August 16, 2020 at 6:05 am
Hi Jim
How can we conclusively tell that the number of IVs is optimal for a given DV? As you mentioned, the more IVs we add, the more the r-squared will continue to increase. So when or where is the stopping point? Is there any simple test that can be done? Please help, Jim. Thank you
Reply

Jim Frost says
August 17, 2020 at 9:11 pm
Hi Tomingan,
Read the section about adjusted R-squared more carefully. I talk about that there! Adjusted R-squared
gives you an idea.
You can also look at the number of observations for each term in the model, as I discuss in my post
about overfitting regression models. Ultimately, the number of IVs you can add is limited by the number
of observations.
Also, you should let theory guide your model building. There’s no simple test that’ll tell you when you
get your model just right. However, by letting theory be your guide, you can get a better sense. Read my
post about choosing the correct model for more information.
Also, given all your questions about regression analysis, you should get my regression analysis ebook! It
covers all of this and more!
Reply

Tomingan says
August 16, 2020 at 4:31 am
Hi Jim
I have this reading: r = .344, r squared = .118, and adj. r squared = .084, from 1 DV and 5 IVs. My initial analysis is that there is a low positive correlation. About 11.8% of the DV is explained or supported by the IVs. There is no telling that 5 IVs is the sufficient number. I believe 5 IVs is not the optimum number. What other testing can we do to identify the optimum number of IVs? Thank you Jim.
Reply

Tomingan says
August 16, 2020 at 3:38 am
Hi Jim
I greatly value your comments on r and r-squared as well as the adjusted r-squared. Can you please indicate the best reference for this? Thank you
Reply

Maxim says
August 12, 2020 at 6:42 am
Hello Jim,
Thank you for your blog! It helps a lot in doing my research. Could you please provide any reference for
the predicted R squared? I have found a method for its calculation, but all I can reference so far is various posts on the internet. Thank you!
Reply

Derek says
August 3, 2020 at 2:57 pm
Hey Jim,
I appreciate you sharing this article. I know that if an adjusted r-squared is 0.58, then the independent
variables in my model collectively account for 58% of the variability in the dependent variable around its
mean. I know that this is a basic question, but how would the interpretation differ if the predicted r-squared is 0.58 (instead of the adjusted r-squared)?
Thank you!
Derek
Reply

Jim Frost says
August 5, 2020 at 12:23 am
Hi Derek,
Typically, analysts use adjusted R-squared to compare models with different numbers of predictors, as I
show in the post. But, interestingly, it has its own unique interpretation. While regular R-squared is the
amount of variation the model accounts for in your sample, adjusted R-squared is an estimate for how
much your model accounts for in the population. I write about this interpretation in this post.
But, on to your question! For predicted R-squared, the interpretation is the amount of variability that
your model accounts for in new observations that were not used during the parameter estimation
process.
Reply

Simon McGree says
July 13, 2020 at 2:23 am
Hi Jim
Thank you for your earlier reply to my comment. Below is a summary of my analysis
I have 43 years of annual crop yield data and 360 climate indices (rain, maximum and minimum
temperature individual month and seasonal combinations). I use PCA to reduce the number of climate
variables and deal with multicollinearity. The scree plot shows no obvious elbow so I retain 32 PCs or
99.9% of the variance. Some of the variables have a weak relationship with sugarcane so it is possible
the first PCs have a weak relationship with sugarcane, another reason to perhaps retain more PCs. I then
examine the absolute value of the PC coefficients, I select the climate variable with the highest
coefficients to represent that PC.
I then use stepwise regression backward elimination. I stop at the highest R-sq predicted.
In the process of my paper undergoing review. I received the following “all data are used to screen for
hindcast skill, and hence there is potential for “artificial skill”. The authors indicate that they used
“leave‐one‐out‐cross validation”. However, they are using PCAs, which utilise all data in the calculation of principal
components. When this is done, statistical models have artificial skill in cross‐validation mode. Statistical
models so derived will be useless in actual prediction; their apparent skill results from the fact that the
crossvalidation is not truly on independent data because the entire sample was used to screen the
predictors from PCAs.
Appreciate your thoughts.
Reply

Rao says
June 27, 2020 at 2:00 am
Hi Jim,
Thanks for a great blog.
I’m curious why you say that adjusted R-square has no associated p-value.
The difference between R-square and adjusted R-square is in their degrees of freedom. I assume the
sampling distribution of both is F; however, while these F distributions – defined by numerator and
denominator degrees of freedom – should be different, it should be just as easy to show p-values for
adjusted R-square as for R-square.
And yet, all regression output show just one ANOVA table – for population R-square estimated by
sample R-square. Why is this?
Rao
Reply

Jim Frost says
June 27, 2020 at 4:09 pm
Hi Rao,
I’ve never seen one developed or used. Usually, adjusted R-squared is used to compare models with
differing numbers of predictors. Even with regular R-squared, you don’t usually see it discussed in
relation to its p-value.
I don’t really know why. I’ve never seen a discussion of this issue. R-squared is a biased estimator whereas adjusted R-squared is not. I agree with your reasoning, but I don’t have an answer for why it’s not done.
Reply

Daisy says
June 18, 2020 at 6:28 am
Hi Jim, this is a hugely helpful website.
I am trying to calculate predicted R2 in Stata following a mixed effects ML regression. Do you have any syntax for how to create it?
Reply

Jim Frost says
June 18, 2020 at 12:14 pm
Hi Daisy, sorry, I’m not a Stata user so I don’t know what command you’d use.
Reply

Mukhtar says
June 9, 2020 at 8:46 am
Thanks for the comment and suggestions. I really appreciate your effort in educating masses through
your blog.
Reply

Jim Frost says
June 10, 2020 at 12:09 pm
You’re very welcome, Mukhtar!
Reply


Mukhtar says
June 5, 2020 at 11:54 pm
Hello Jim,
Is there any benchmark for the difference in r-squared, r-squared (adj), and r-squared (pred) values? I have the following case and suspect that the model is overfit.
Source DF Adj SS Adj MS F-Value P-Value
Regression 5 0.174696 0.034939 14.81 0.000
Vc 1 0.033814 0.033814 14.33 0.001
ap*2 1 0.162968 0.162968 69.06 0.000
fr*2 1 0.143943 0.143943 61.00 0.000
A*2 1 0.015032 0.015032 6.37 0.020
ap*fr 1 0.151329 0.151329 64.13 0.000
Error 21 0.049556 0.002360
Lack-of-Fit 3 0.008540 0.002847 1.25 0.321
Pure Error 18 0.041017 0.002279
Total 26 0.224253
Model Summary
S R-sq R-sq(adj) R-sq(pred)
0.0485781 77.90% 72.64% 62.38%
other models have the following results, with all p-values significant
S R-sq R-sq(adj) R-sq(pred)
0.0587089 81.32% 76.87% 69.63%
S R-sq R-sq(adj) R-sq(pred)
0.0058268 73.94% 69.21% 60.91%
Thanks……
Reply

Jim Frost says
June 8, 2020 at 3:50 pm
Hi Mukhtar,
There’s no standard guideline that I’m familiar with. But, I always start to worry when the difference is
greater than 10%. For overfitting, you also need to consider the number of observations per model
term. Given your output, I’d say you have some reason for concern about overfitting. I think reading the
other post will help you out.
Reply

Parikshit says
May 28, 2020 at 12:22 pm
Thanks Jim.
Reply

Parikshit says
May 27, 2020 at 2:10 pm
Hi Jim,
I have two questions
1. What is the minimum r-squared value above which the relationship between variable and response can be considered significant? Why?
2. In a model, if r-squared is 0.80, what should be the minimum level of adjusted r-squared to use the model for prediction?
Thanks
Parikshit
Reply

Jim Frost says
May 27, 2020 at 4:59 pm
Hi,
For an R-squared to be statistically significant, the overall F-test for the model must be significant. To be
practically significant, that depends on the field of study.
Use predicted R-squared to assess prediction, not adjusted R-squared. There’s no exact guideline for
how close it must be. I start to worry when the difference is more than 0.1 (10%). However, you
probably should be assessing the precision of the prediction as I describe in this post about S vs. R-
squared.
Reply

Anne Wambui says
May 27, 2020 at 11:35 am
Hello, thank you for the explanations. I have a question. I have used multiple regression to compare three groups. When I removed one variable, the model was not significant for one group, other independent variables became significant (they were not before), and R-squared decreased significantly for the second group, while one group had a slight decrease in R-squared. How can I interpret this, or the meaning of this factor?
Reply

Jim Frost says
May 27, 2020 at 5:08 pm
Hi Anne,
If you find that the significance of a predictor changes depending on specifically which variables are included in the model, you might well have multicollinearity (correlated IVs). Read my post about
multicollinearity for more information.
Reply

Heidi says
May 26, 2020 at 12:27 pm
I completed a multiple regression analysis in Excel with three independent variables, and the results show an R-squared value of 0.11 but an adjusted R-squared of 0.98. How could these values be so different? Also, Excel doesn’t give a predicted R-squared value. Is there another [easy] way to get it? The residuals show values for the predicted [dependent variable], but that can’t be it.
BTW, I really appreciate your blog – it is the only statistics info I’ve found that makes any sense at all. My textbook is all but useless. I still can’t claim to understand any of it, really, but reading your pages helps a lot – if only to get through the assignments with a passing grade. Thanks.
Reply

Jim Frost says
May 26, 2020 at 9:38 pm
Hi Heidi,
I’m so glad my blog has been helpful!
For the first thing, it’s impossible for the R-squared value to be lower than the adjusted R-squared for
the same model. There’s something off there. I don’t think there’s any easy way to get predicted R-
squared with Excel.
It is possible to have a large difference between R-squared and adjusted R-squared. However, adjusted
R-squared will always be smaller than R-squared. If there is a large difference, it might indicate you have
too many predictors (IV) in your model. It comes down to the number of observations per term in your
model. To see how this works, look at my post about Five Reasons Why Your R-squared can be Too High.
In the first reason, you’ll read about adjusted R-squared and see a graph that shows how adjusted R-
squared decreases by the sample size per term.
Reply

Julie Nielsen says
May 8, 2020 at 7:07 am
Hi Jim,
I have two models where I add time fixed-effects and robust and clustered standard errors. When I add
FE and robust and clustered standard errors to my models, model 1’s R-squared increases while model 2’s R-squared becomes negative (from 0.301 to -0.385). If I look at my coefficients in the two models, none of them seems to be significant, but I don’t understand how one of the R-squared values can become negative?
Reply

ankita says
May 7, 2020 at 3:30 pm
Sir, I am getting a predicted R-squared value of zero. Is it normal? Please help me out.
Reply

Jim Frost says
May 7, 2020 at 3:38 pm
Hi Ankita,
Yes, it’s possible. Unlike regular R-squared, both adjusted and predicted R-squared can fall below 0%. In
terms of interpretation, just interpret it as if it were 0%. It’s not good. Usually when you get a negative
value, it means you have a very small sample size along with an overly complex model.
Reply



Elizabeth Causley says
April 23, 2020 at 4:13 pm
I am new to this whole process and I am still learning. If I have an adjusted R-squared of 0.05448 for data that includes 4 IVs and 1 DV, what would I interpret that as? Also, I’m not sure if you can answer this here, but this also gives me an F-statistic of 79.2 on 4; how would that be interpreted?
Any help would be greatly appreciated!
Reply

Jim Frost says
April 25, 2020 at 1:42 am
Hi Elizabeth,
It sounds like your four IVs explain a very low proportion of the variance in the DV. Are any of the p-
values for the coefficients statistically significant?
What’s the p-value associated with the F-statistic? You’ll usually only interpret the p-value for the F-statistic rather than the F-value itself. You can read my post about the overall F-test for more
information.
Reply

Swapnil says
April 7, 2020 at 9:02 am
Hi Jim,
Can you please help me out with this data? Is it statistically significant or not?
Model Summary
S R-sq R-sq(adj) R-sq(pred)
52.0410 97.63% 93.49% 78.69%
Analysis of Variance
Source DF Adj SS Adj MS F-Value P-Value
Model 7 446694 63813 23.56 0.004
Linear 7 446694 63813 23.56 0.004
A 1 150522 150522 55.58 0.002
B 1 885 885 0.33 0.598
C 1 138967 138967 51.31 0.002
D 1 118108 118108 43.61 0.003
E 1 19212 19212 7.09 0.056
F 1 9624 9624 3.55 0.133
G 1 9377 9377 3.46 0.136
Error 4 10833 2708
Total 11 457527
Reply

Jim Frost says
April 7, 2020 at 11:53 pm
Hi Swapnil,
Please read my post about regression coefficients and p-values. That post will show you how to
determine significance and what it means. You have some insignificant terms that you should consider
dropping from the model.
In a nutshell, it looks like overall your model is significant. Some of the predictors are significant while
others are not. However, it looks like you might be overfitting your model. You might be including too
many terms given your sample size, which can distort the results. Click the link to read about that.
If after reading those posts you have more specific questions, please post them in the comments for the
relevant article. Thanks!
Reply

Suku says
April 4, 2020 at 7:01 am
Hi Jim,
I need your help in understanding the following:
The R-squared value is 0.018,
the adjusted R-squared value is -0.024,
and R is 0.133.
What does a negative adjusted R-squared value indicate about the relationship between 1 DV and 1 IV?
Thanks
Suku
Reply

Jim Frost says
April 5, 2020 at 6:53 pm
Hi Suku,
Just interpret the negative value as if it were zero. Your model does not explain variability in the DV.
Reply

Alfer Jann D. Tantog says


March 29, 2020 at 7:06 am
Hello! How do you split R-squared among the predictor variables? I have read a journal wherein the R-squared is .400 = 40%, and they split the value between 3 predictors: 18 for predictor 1, 21.6 for predictor 2, and 0.4 for predictor 3. May I ask how I can calculate it?
Reply

Jim Frost says
March 29, 2020 at 7:49 pm
Hi Alfer,
I suspect that you’re referring to the practice of the increase in R-squared that occurs when you include
each predictor in the model last. That’s not exactly “splitting” the R-squared but I think it is what you’re
referring to. I’ve written a post that talks about this method as a way of determining the importance of
each predictor. I’d read that post to see if it answers your questions!
Reply



Ida says
February 18, 2020 at 9:28 pm
Hi Jim
I know I am a little slow here, but:
How can you tell if the adjusted R^2 is significant? Is it always significant if the p-value is higher than 0.5, or is there a number I can navigate from when it comes to interpreting the adjusted R^2?
Thank you!
Reply

Jim Frost says
February 20, 2020 at 11:07 am
Hi Ida,
There’s no p-value for adjusted R-squared. Typically, you use it to compare models with different
numbers of predictors/IVs. It’s more for comparing models rather than determining statistical
significance. However, there is a p-value for the regular r-squared, although you might need to hunt for
it in the statistical output. The F-test of overall significance produces a p-value. When that p-value is less
than your significance level, you can reject the null hypothesis that R-squared equals zero.
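For instance, if your software happens to be Python’s statsmodels, both pieces are exposed directly on a fitted model (a minimal sketch with made-up data):

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
X = sm.add_constant(rng.normal(size=(30, 2)))
y = X @ np.array([1.0, 2.0, 0.5]) + rng.normal(size=30)

fit = sm.OLS(y, X).fit()
print(fit.rsquared_adj)          # adjusted R-squared: no p-value of its own
print(fit.fvalue, fit.f_pvalue)  # overall F-test and its p-value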
I hope this helps!
Reply

Simon McGree says
February 16, 2020 at 7:16 pm
Hi Jim
I’m concerned I have over fitted my models but first let me give you a bit of background.
I have 43 years of annual sugarcane and sugar data. I have 852 climate indices (rain, maximum and
minimum temperature individual month and seasonal combinations). I use PCA to reduce the number of
climate variables and deal with multicollinearity. The scree plot shows no obvious elbow so I retain 25
PCs or 99.9% of the variance. Some of the variables have a weak relationship with sugarcane so it is
possible the first PCs have a weak relationship with sugarcane, another reason to perhaps retain more
PCs. I then examine the absolute value of the PC coefficients, I focus on the four climate variables with
the four highest coefficients. The representative variable for each coefficient that I take to the next
stage is the one that has the strongest correlation coefficient with sugarcane and sugar yield
respectively.
I then use stepwise regression backward elimination. I stop at the highest R-sq predicted. For the
sugarcane model I have an adjusted R squared of 79% and predicted R squared of 73% (DF = 8). For the
sugar model I have an adjusted R squared of 81% and predicted R squared of 73% (DF=11). How am I
doing? Appreciate your thoughts.
I repeated the above with 70% of the variance retained. The R-sq adjusted and predicted values are much lower. It appears some key climate variables are lost by only retaining 70%.
Regards
Simon
Reply

Jim Frost says
February 20, 2020 at 3:59 pm
Hi Simon,
Based on what you write, I’d say you’re doing very well! I’d agree that the model with the higher
predicted R-squared is likely to be better. As always, use your subject area knowledge to apply statistics
correctly. But, I don’t see any obvious errors in your approach.
Reply

Rei says
January 28, 2020 at 8:52 am
So I’ve got to do a paper using regression analysis. I use 3 models, linear, quadratic, and exponential, as a comparison. Each of them got:
Linear R2: 0.197, R2 adj: 0.875
Quadratic R2: 0.931, R2 adj: 0.794
Exponential R2: 0.919, R2 adj: 0.879
Which model do I choose?
Reply

Jim Frost says
January 28, 2020 at 11:15 pm
Hi Rei,
Choosing a model is more than just going by several R-squared statistics! Check graphs and theory. For
more information, read my post about choosing the correct model. It’s not even possible to say that any
of those three are the correct model with the information provided. And, if your model has curvature,
which seems likely, read my post about curve fitting, which describes different methods and how to
compare the resulting models.
Best of luck with your analysis!
Reply

MD VASEEM CHAVHAN says
January 25, 2020 at 4:35 am
Thanks for the explanation.
Please comment on the following model summary:
S R-sq R-sq(adj) R-sq(pred)
14.7955 99.33% 88.60% 0.00%
Reply

Jim Frost says
January 25, 2020 at 4:58 pm
Hi,
Your example closely matches the example that I use in the section of this post titled, “Example of an
Overfit Model and Predicted R-squared.” Read that section more closely. You have an overfit model.
I’ve also written a post about overfitting models that will help you understand.
Reply



mahesh says
January 13, 2020 at 8:24 am
Hi Jim
I ran the regression analysis and am getting the following results for R-squared, adjusted R2, and predicted R2.
Model Summary
S R-sq R-sq(adj) R-sq(pred)
2047.24 99.11% 99.03% *
My questions:
Why does the predicted R2 have a * value?
Is the model still good based on the adjusted R2 when the predicted R2 has a * value?
Thanks,
Mahesh
Reply

Jim Frost says
January 14, 2020 at 11:05 am
Hi Mahesh,
I’m not really sure. I’m drawing a blank as to why the procedure would be able to calculate R-squared
and adjusted R-squared but not predicted R-squared. Is there a chance that you have only three
observations? I’m thinking of a scenario where you have enough degrees of freedom to fit the model
when you use all the observations but not enough for predicted R-squared where you’re systematically
fitting the model multiple times where each time one observation is removed. That would suggest you
have just barely enough degrees of freedom to begin with and you’re probably overfitting the model
anyway. But, when one observation is removed you no longer have a sufficient number of DF.
I’m not sure that’s what is happening but it’s one possible scenario.
Reply

rezvan says
October 25, 2019 at 3:54 pm
Hi Jim,
I am writing my paper about optimizing the leaching process of Cd by RSM using DX7. I obtained R2 = 0.79, adjusted R2 = 0.74, and predicted R2 = 0.59. The software’s Box-Cox plot proposed that I normalize the data by transforming λ from 1 to 3; then the results would change as follows: R2 = 0.85, adjusted R2 = 0.80, and predicted R2 = 0.71. The other statistical tools like the F-value, p-value, and others would be approximately constant in terms of being significant or not significant. I am confused about whether to do the transformation or not.
Thanks
Reply

Jim Frost says
October 25, 2019 at 4:11 pm
Hi,
Check the residual plots for the model that does not transform the data. If the residual plots look good,
you don’t need to transform the data. On the other hand, if you see a problem in the residual plots, such
as severe nonnormality or heteroscedasticity, consider transforming the data. However, I always
recommend that transformation should be the last resort. There are other methods that can fix these problems in some cases. These other methods involve fitting a better model. For example, a
misspecified model can produce nonnormal residuals and heteroscedasticity. You’d want to be sure that
you are specifying the correct model before considering a data transformation.
My article about heteroscedasticity (see link) discusses some of those other options for non-constant
variance. My ebook about regression analysis goes into much more detail about when and why you
might want to transform your data, when you wouldn’t, how to transform data, and how it all works.
Those details would apply to your analysis as well.
Again, if your residual plots for the model that uses the untransformed data look good, don’t transform
your data! Transformations can fix particular types of problems as a last resort.
I hope this helps! Best of luck with your analysis.
Reply

Mike says
September 28, 2019 at 1:11 pm
Hi Jim.
Great Article. I would like some advice. I’m trying to build a linear regression model. I’ve determined
what the control variables are going to be based on prior knowledge and previous literature. I now need
to work out which of my 7 predictors to include in my final model with those control variables. In the
past I have decided on which predictors to include in the final model based on significance, adding those
with a p value <0.10. However, I've been speaking to a statistician, and instead they recommend
choosing the model with the best adjusted r2 value. I've seen lots of studies using my usual method for
variable selection, but I haven't come across any that selected variables based on adjusted r-squared
values. So, I'm just wondering whether you would recommend choosing the model with the highest
adjusted r-squared value, and whether you know of any papers that have selected variables for the final
model using this method? Looking forward to hearing from you.
Mike
Reply
Jim Frost says
September 29, 2019 at 2:08 pm
Hi Mike,
Choosing the correct model is almost as much of an art as it is a science. One thing I always highlight is
the need to incorporate your subject-area knowledge about the underlying process/research question.
Never go solely by statistical measures. I’d also add to that by saying, there’s no single statistical
measure that is best. In fact, the various measures can disagree. Adjusted R-squared is a good one to
keep an eye on, but it can lead you astray. For example, if you start to overfit your model, the adjusted R-
squared can look great, but your coefficients and their p-values are all messed up (technical term
there!). Chasing a high R-squared or adjusted R-squared can lead to problems.
Also, it’s important at least to pay attention to the p-values of the coefficients. If you include too many
variables that are not significant it reduces the precision of your model. Taken further, it can lead to the
overfitting I referred to before. However, if you have to choose between the possibility of leaving out an
important variable even though it’s not significant versus leaving it in even though you’re not sure, yes,
it’s generally better to include it. And, perhaps that’s the thinking behind the recommendation.
However, you shouldn’t take that too far!
I’d suggest reading my post about specifying the correct model. And, then for an illustration of how R-
squared and adjusted R-squared can lead you astray, read my posts about overfitting and data mining
which shows the dangers of only going by statistical measures. And, finally, automated variable selection
procedures can point you in the correct direction, but research has found that they don’t identify the
correct model in the majority of cases. Read my post about automated variable selection procedures for
more information.
If after those posts you have more questions, don't hesitate to post them. Also consider my ebook,
which covers regression in more detail!
I hope this helps!
Reply
Merpati says
May 27, 2019 at 12:05 am
Hello! I want to ask. All my R2, adj.R2 and predicted R2 got the value of 1.0000.
Is it acceptable? And if possible, could you help me interpret this result, because I myself am not so
good at statistical analysis. Btw, the results were from three independent variables (pressure, time and
temperature) with one dependent variable (antioxidant activity).
I hope that I can improve my understanding on this matter.
Thank you in advance ^^
Reply
Jim Frost says
May 27, 2019 at 9:44 pm
Hi,
Unfortunately, no, that’s not normal. Usually you only obtain an R-squared of 1 under several related
problematic circumstances.
If you fit a model that estimates as many parameters as you have observations (counting the intercept),
you'll always get an R-squared of 1 (or 100%).
If you overfit a model, which means too many terms for your number of observations, you can get the
same thing.
This can also happen with an automated procedure such as stepwise regression with a relatively small
dataset and lots of candidate predictors.
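As a quick way to see the first circumstance, here is a tiny Python sketch with made-up numbers: three observations and a quadratic (three parameters), so the fit is exact by construction.

```python
import numpy as np

# Three observations; a quadratic has three parameters
# (intercept, x, x^2), so it passes through every point exactly.
x = np.array([1.0, 2.0, 3.0])
y = np.array([4.1, 2.7, 9.3])    # arbitrary made-up values

coef = np.polyfit(x, y, 2)       # fits all three points perfectly
fitted = np.polyval(coef, x)
ss_res = np.sum((y - fitted) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
print(1 - ss_res / ss_tot)       # prints 1.0 (up to rounding error)
```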
I’m not sure what is going on with your data. If it’s a physical process where the measurements are very
precise/accurate and there’s extremely low noise in the system, you can get R-squared values in the
90-99% range. Unless your software is rounding up, I’d be very skeptical. I’ve never seen a legitimate
100% in practice. 100% would indicate no random error in the model at all AND no measurement error at
all. That just doesn’t happen in the real world. I’m assuming this is real world data rather than generated
data.
Reply
Jeff says
May 24, 2019 at 7:07 pm
It’s incredible how clearly and simply you can explain difficult concepts. Thank you, really.
Reply
Ronnie says
April 15, 2019 at 2:59 pm
Hello Jim,
Thanks so much for these posts! Just recently came across them and they’re incredibly useful!
I’m using partial least squares regression to model a response variable against spectral data. If I select a
number of latent variables that produces regular R2=0.49 and predicted R2=0.27, using 56 observations
of the response variables, what are your thoughts? Certainly, I’m fitting the calibration better than when
making new predictions, but I also know that we should always expect the predicted R2 to be somewhat
higher than the regular R2; and this is probably the case with a small number of observations.
Do you believe this type of fit would be justifiable given the relatively small number of observations
used to calibrate?
Thanks very much for your help!
Ronnie
Reply
Jim Frost says
April 16, 2019 at 11:51 am
Hi Ronnie,
I’m really happy to hear that you found my site to be helpful!
Regular R-squared should be greater than Predicted R-squared. The model can’t predict new
observations better than the data used to fit the model. You might be thinking of the test R-squared.
The test R-squared is generally lower than the Predicted R-squared. A test R-squared is based on
validation data. The software uses an existing model and a new dataset to see how well the model
predicts values that were not used to estimate the model.
To make good predictions, you want Predicted R-squared to be close to the regular R-squared. And, you
want the test R-squared to be close to the Predicted R-squared.
For your dataset, it appears like the regular R-squared and predicted R-squared are not that close. This
condition indicates that your model doesn’t predict new observations as well as it fits the data used to
fit the model. Chances are that your test R-squared would be even lower than the predicted R-squared.
I’m not knowledgeable in modeling spectral data, so I’m not sure how this fit compares to similar models
and industry standards. I’d recommend doing some research to see what sort of fit is typical for this type
of data and see how your model compares. Some study areas are inherently more or less predictable
than other areas. So, I can’t really say whether the fit you’ve obtained is “justifiable.” The basic question
you need to answer is whether the fit you obtain is representative of the study area and really the best
you can do given the nature of the data. Or do you need to improve the model to obtain a better fit?
Those answers depend on subject-area knowledge.
Reply
Allan Paolo says
April 1, 2019 at 11:00 am
Hi again Jim!
I just want to take time to thank you. Thanks to this article (and to you of course) I was able to get my
master’s degree. Thanks a lot!
Reply
Jim Frost says
April 1, 2019 at 4:46 pm
Hi Allan!
You’re very welcome! I’m so happy for you, and your comment absolutely makes my day!
Reply
Patrik Silva says
November 19, 2018 at 4:25 am
Dear Jim!
I would like to know if you can clarify some of these points for me.
In the text section titled “What is the Predicted R-Squared”, I read this:
“Statistical software calculates predicted R-squared using the following procedure:
1. It removes a data point from the dataset.
2. Calculates the regression equation.
3. Evaluates how well the model predicts the missing observation.
4. And repeats this for all data points in the dataset.”
a) Is this procedure the same as what is called LOOCV (Leave one out Cross Validation)?
b) Which values do we compare to R-squared? Do we need to record the R-squared each time we
leave one out, until the last observation?
I want to understand this procedure to see which statistic it corresponds to in SPSS software.
Thank you in advance!
PS
Reply
Al says
October 6, 2018 at 4:22 pm
Hi Jim,
Very helpful post.
Regarding the issue of “how much of the variation in the y values does the regression model explain”
1. The adjusted-R-squared is the answer to this for multiple regression, yes?
2. Why don’t we also use adjusted-R-squared when answering the question for simple regression? (most
stats textbooks use R-squared for this)
In general, when comparing a regression model with one independent variable to a model with multiple
independent variables — do we compare them on adjusted-R-squared, or do we compare the adjusted-
R-square of the second model with the R-squared of the first?
Thanks,
Al
Reply
Jim Frost says
October 6, 2018 at 9:46 pm
Hi Al,
These are great questions! And, there is confusion in this area because many people don’t know exactly
what R-squared measures.
Let’s start with the easy part. When you’re comparing models with different numbers of independent
variables, use adjusted R-squared. Specifically, compare the adjusted R-squared from one model to the
adjusted R-squared values of the other models. Don’t use the regular R-squared for any of the models.
Now, onto which R-squared to report for what models. Typically, analysts will report the regular R-
squared for the final model that a study settles on. That’s the norm. However, I disagree with that
practice a bit. I think that analysts should normally report the adjusted R-squared for all final models,
even when it has only one independent variable. The reason why is because regular R-squared is a
biased estimate. It tends to be too high. How much too high depends on the number of observations per
term in the model. Adjusted R-squared corrects for this upwards bias. In other words, adjusted R-
squared is an unbiased estimate of the amount of variance the model accounts for in the population,
which is why I think it should be the value that is reported. I write more about this in my post Five
Reasons Why Your R-squared can be Too High. It’s reason number 1.
Thanks for the great questions!
Reply
Jonathan says
August 21, 2018 at 11:42 am
Hello,
I have a challenge here: I have an R-squared of 0.6596 and an adjusted R-squared of -0.3617! How can
this be interpreted? What can you say about this?
Thanks
Reply
Jim Frost says
August 23, 2018 at 2:19 am
Hi Jonathan,
Chances are that you are severely overfitting your model. You probably have very few observations per
model term. To learn more about this problem, read my post about overfitting!
Reply
Kripa says
August 7, 2018 at 9:35 pm
Hi Sir,
Can you help me interpret an R-squared value of 0.166 and an adjusted R-squared value of 0.158?
Reply
Jim Frost says
August 8, 2018 at 4:55 pm
Hi Kripa,
These blog posts should provide you with enough information so you know how to interpret these
values.
The R-squared value indicates that your model accounts for 16.6% of the variation in the dependent
variable around its mean. That’s usually considered a low amount. You typically interpret adjusted R-
squared in conjunction with the adjusted R-squared values from other models. Use adjusted R-squared
to compare the fit of models with a different number of independent variables.
Additionally, regular R-squared from a sample is biased. It tends to over-estimate the true R-squared for
the population. Adjusted R-squared is an unbiased estimate of the population value.
I hope this helps!
Reply
Tejaswi Dalavi says
July 8, 2018 at 2:53 am
What is the exact difference between R-squared and adjusted R-squared? Which is better?
Reply
Jim Frost says
July 8, 2018 at 3:02 am
Hi Tejaswi, you’re in the right place to learn about the differences. This blog post describes adjusted R-
squared. In it, there’s a link to my blog post about the regular R-squared. Between the two posts, you’ll
know all about both types. Adjusted R-squared is the better of the two. Although, my favorite is actually
predicted R-squared.
Reply
Juan says
July 6, 2018 at 4:16 am
Dear Jim,
Thanks a lot for your response; it answered some questions that I had had for quite some time without
finding a clear, understandable explanation. I will certainly continue to follow the blog; it is a very
valuable source of information, especially for us non-statisticians. I have already recommended it to my
colleagues and I’m sure they will agree with me.
Best regards,
Juan F.
Reply
Jim Frost says
July 6, 2018 at 10:58 am
Thanks so much, Juan. I appreciate that!
Reply
Juan says
July 3, 2018 at 4:06 am
Dear Jim,
Thanks for your explanation and fast response. Congratulations on such a good blog; it is very valuable
to be able to discuss and understand these topics in a friendlier manner.
With respect to my question, I still have a couple of doubts.
– I can understand that one could obtain a high R2 and R2 (adj) in a model with significant curvature, but
shouldn’t the R2 (pred) generally be low?
– Isn’t the predictive power of the regression covered by including the center point in the regression
equation?
In other words (correct me if I’m wrong), when curvature is significant in the regression model, is the
R2 (pred) no longer relevant, and should the model not be used for predictive purposes?
Considering your comment on the residual plots, my residuals-versus-fits plot seems to show a possible
pattern (though it’s not clear): the scatter decreases as the fitted value increases. Thus I ran a regression
after a Box-Cox transformation (lambda = 0.25), eliminated variables with P > 0.1, and obtained a
regression where curvature is not significant (P = 0.2!!) and again great R2 values (R2: 99%; R2 adj:
98.7%; R2 pred: 96.8%). How should I interpret this? Is this resulting regression trustworthy, and could it
be used for predictive purposes?
Thanks in advance for your time,
Regards,
Juan S.
Reply
Jim Frost says
July 3, 2018 at 2:39 pm
Hi Juan,
Yes, it’s definitely possible that Predicted R-squared would be affected by inadequately modeling the
curvature. However, the degree to which the lack-of-fit affects it depends on how inadequate the fit is
and the number of observations. So, I couldn’t tell you specifically for your case whether it would be low
or not. But, definitely the lack of fit would impact it to some degree.
Center points allow you to detect curvature but are not sufficient to model the curvature.
I would agree, as I mention in my previous response, that I would not use the model to make predictions
when you know that it inadequately fits curvature that is present in the data. In that sense, yes, it
doesn’t matter what Predicted R-squared is because you know the predictions are biased. As I
mentioned, high R-squared values of any type do not indicate that your model provides an unbiased fit.
That pattern that you describe is heteroscedasticity. In my post about it, I discuss other options for
resolving it. A Box-Cox transformation is a recognized way to fix this problem, but I usually save it for the
last solution I try. I prefer solutions that involve less data manipulation. I’m also a bit leery of how it
transformed away the curvature issue. However, I don’t have any specific reason to say that you
shouldn’t trust the model based on the limited information that I have. Just be sure to closely examine
the coefficients and be really certain that the signs and magnitudes fit with theory.
Also, be aware that the model fit statistics (the various R-squared values and S) apply to the transformed
response variable and not the response using natural units. That can make the model appear better than
it is. Although, they were high before the transformation, so no reason for concern.
Reply
Juan says
July 2, 2018 at 3:33 am
Dear Jim,
I recently started using Minitab for DoE. I work with an extraction process to evaluate the recovery
(yield) of proteins. Evaluation of a half-factorial DoE with 5 variables gave me a very good regression
model with R2 (98.29%), adj-R2 (97.35%), and pred-R2 (95.57%). However, I noticed that my model
indicates that the curvature is significant (P = 0.022). What is the effect of this curvature on the
predictive power of the model? In other words, is this model still good for making predictions, or is a
CCD required?
Reply
Jim Frost says
July 2, 2018 at 2:25 pm
Hi Juan,
Yes, if the software detects curvature, it is usually a good idea to model that curvature. While R-squared
is high, you are trying to model a curve using a straight line, and that will lead to biased predictions. For
example, certain ranges of predictions might be systematically too high while other ranges could be
systematically too low. In my post about R-squared, in the section “Are High R-squared Values Always
Great?”, I show an example where the R-squared value is at 98.5% but the predictions are biased. Your
case is probably something like that–although obviously not necessarily mirroring the specific
relationship that I show. A high R-squared, and adjusted R-squared, don’t necessarily indicate that the
model provides an unbiased fit. Check those residual plots!
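If you’d like to see this effect for yourself, here is a small Python simulation (illustrative made-up data, not your experiment): it generates data with real curvature, fits a straight line anyway, and still gets a high R-squared even though the fit is biased.

```python
import numpy as np

# Simulate data with genuine curvature.
rng = np.random.default_rng(2)
x = np.linspace(0, 10, 40)
y = 2 + 1.5 * x + 0.3 * x**2 + rng.normal(0, 1, size=40)

# Fit a straight line anyway.
slope, intercept = np.polyfit(x, y, 1)
resid = y - (intercept + slope * x)

r2 = 1 - np.sum(resid**2) / np.sum((y - y.mean())**2)
print(f"R-squared: {r2:.3f}")  # high, despite the biased fit

# The residuals betray the problem: positive at both ends of the
# x range and negative in the middle, a classic curvature pattern.
print(resid[:5].round(1), resid[17:22].round(1), resid[-5:].round(1))
```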
Thanks for writing. I hope this helps!
Reply
Sundar says
June 24, 2018 at 12:38 pm
Dear Jim,
As usual, brilliant post. However, I would like to ask about “The adjusted R-squared value actually
decreases when the term doesn’t improve the model fit by a sufficient amount.” How does adjusted
R-squared determine whether the addition of a variable has a positive or negative effect on the model?
Thanks
Reply

Jim Frostsays
June 25, 2018 at 10:50 am
Hi Sundar, the adjusted R-squared value decreases when the t-value for the coefficient is less than 1.
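For reference, the standard formula is

$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k - 1}$$

where n is the number of observations and k is the number of predictors. Adding a term always nudges R-squared up a little, but it also increases k, which strengthens the penalty. The gain and the penalty balance exactly when the new term’s t-statistic is 1 in absolute value, which is where the rule above comes from.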
Reply
Franklin Moormann says
April 10, 2018 at 1:25 am
When calculating predicted R-squared for a full dataset of 4000 data points, would you use all 4000 or a
random sample of those 4000 data points?
Reply
Jim Frost says
April 12, 2018 at 3:05 pm
The procedure always cycles through the complete dataset and systematically removes one data point
at a time to calculate predicted R-squared.
Reply
Emanuel Lindström says
February 27, 2018 at 8:32 am
Hi Jim!
Awesome blog, and awesome posts! I’m learning a lot!!
I have 2 questions;
1. How is the predicted R-squared actually calculated? The step-by-step process you describe is iterated
for each data point in the population, but does that mean you get as many predicted R-squared values as
there are data points, or do you do an additional step after iterating over all the data points?
2. Does predicted R-squared work even for large samples? I mean, it’s easy to see how the polynomial
line in the image changes if you remove a data point, but if there are more data points (100 more, or
even 1000 more), wouldn’t the over-fitted polynomial line stay the same and predict the one omitted
data point?
Again, thanks for an amazing resource!
Reply
Jim Frost says
February 27, 2018 at 11:15 pm
Hi Emanuel,
Thanks so much! I’m glad you have found it to be helpful!
About predicted R-squared, which is really my favorite type of R-squared. Think about the error sum of
squares (SSE). This is where you take the squared differences between each observation and the fitted
value and sum them up across all observations. It’s also known as the residual sum of squares because
it’s the sum of the squared residuals. A small value produces a high R-squared.
For predicted R-squared, you use the predicted error sum of squares (PRESS), which is similar to the SSE.
To calculate PRESS, you remove a point, refit the model, and then use the model to predict the removed
observation. Then, you take the removed value and subtract the predicted value and then square this
difference. You repeat for all of the removed values. You end up with a squared difference for each
value when it is removed. You then sum those squared differences and you have PRESS. A low PRESS
value produces a high predicted R-squared. So, it’s fairly analogous to the SSE but the squared
differences are based on predicting the missing values versus values that were used to fit the model.
Regarding point 2, yes, you’re correct, when you have more data points, it’s harder to overfit your
model and, hence, you wouldn’t expect a much lower predicted R-squared. Imagine you have 1,000
data points that follow the same U-shaped pattern. In that case, you’d be really sure about that curved
relationship because such a large number of data points aren’t going to follow that curve by chance.
That’s why you wouldn’t expect the predicted R-squared to drop when you have many data points.
However, fewer data points can produce that pattern by chance. If you remove one, it changes that
relationship noticeably. You’re not really certain that the relationship really is that U-shape. Predicted R-
squared detects this uncertainty and that’s why it drops.
Overfitting depends on the number of observations per term in the model, as you can read about in my
post about overfitting. You’d need a very, very complex model to overfit a dataset with 1000
observations!
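If you’d like to see the mechanics, here is a minimal brute-force sketch in Python that computes PRESS and predicted R-squared by literally removing each observation, refitting, and predicting it, exactly as described above. (Statistical packages use an equivalent hat-matrix shortcut instead of refitting n times.)

```python
import numpy as np

def predicted_r_squared(X, y):
    """Brute-force predicted R-squared via leave-one-out refits.

    X: (n, k) array of predictors (intercept added internally).
    y: (n,) response. Returns (predicted R-squared, PRESS).
    """
    n = len(y)
    Xd = np.column_stack([np.ones(n), X])   # add the intercept column
    press = 0.0
    for i in range(n):
        keep = np.arange(n) != i            # drop observation i
        beta, *_ = np.linalg.lstsq(Xd[keep], y[keep], rcond=None)
        y_hat_i = Xd[i] @ beta              # predict the removed point
        press += (y[i] - y_hat_i) ** 2      # squared prediction error
    ss_total = np.sum((y - y.mean()) ** 2)
    return 1 - press / ss_total, press

# Quick demonstration with simulated data:
rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = 1 + 2 * X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=30)
r2_pred, press = predicted_r_squared(X, y)
print(f"Predicted R-squared: {r2_pred:.3f}")
```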
I hope this helps!
Reply
MUHAMMAD K. N. says
January 10, 2018 at 7:07 pm
Hi Jim! I am working on research monitoring insect pest population fluctuations in the entomological
field, but I obtained mostly weak R-squared regression results and felt disturbed. What advice can you
give me in this regard?
Thanks
Muhammad.
Reply
Jim Frost says
January 10, 2018 at 7:50 pm
Hi Muhammad! Unfortunately, that situation isn’t too uncommon and I’ve written a blog post that is
specifically about it:
Interpreting a Regression Model with a Low R-squared
A low R-squared might or might not be a problem. If you have significant independent variables and your
main goal is to understand the relationships between the variables, a low R-squared is not necessarily a
problem.
However, if your main goal is to produce precise predictions, it can be a problem.
The blog post I recommend covers these scenarios and shows how it works. I think it’ll make your
situation more clear!
Reply
ALMAS KHURSHEED says
December 28, 2017 at 10:33 am
Hi sir,
I am very confused about how to write an interpretation statement for R2 if the value is 0.68.
Can you please help me out?
Thank you
Reply
Jim Frost says
December 28, 2017 at 10:58 am
Hi Almas, it means that the independent variables in your model collectively account for 68% of the
variability in the dependent variable around its mean. Click the link in the post to go to my post where I
talk about R-squared in more detail. I hope this helps!
Reply
Allan Paolo Labartinos Almajose says
November 5, 2017 at 3:13 pm
Hi Jim! I’d like to ask for help regarding the calculation of predicted R-squared values. To be honest, my
nose bled (lol) after seeing the formula for the PRESS you provided in one of the comments above. Is
there a ‘layman’s way’ of computing this?
Actually, I had this idea:
– I remove one data point
– I regress the remaining points using the same model
– I try to predict the missing data point using the same model previously recalculated (the one with the
reduced data point)
– The difference between the prediction of the model with complete data points and the prediction of
the new model with one data point removed is the PRESS of that point?
– I do this again for all of the remaining points
– I add all of the PRESS for each point, then sum-square everything, then compute R^2 normally, then
this R^2 is now the predicted R^2?
Is this even correct? I don’t know, this is just a wild guess. Please help me, I am totally at a loss here.
Thanks!
Reply
Jim Frost says
November 7, 2017 at 11:46 am
Hi Allan, you’re very close! Think about how you usually calculate sums of squares. It’s the sum of the
squared deviations between the fitted values and the observations. PRESS is similar except it is the
sum of the squared deviations between the fitted value of each removed observation and the removed
observation. So, the procedure basically removes each observation and uses the model to predict that
observation and squares the difference between the two. It does that systematically for all observations
and sums those squared differences. For your 4th point, you never fit the model with all observations
when calculating predicted R-squared. Instead, there is always one removed observation and you’re
essentially seeing how well the model predicts each removed observation. I hope this makes it more
clear!
Reply
Franklin Moormann says
October 28, 2017 at 10:50 am
I don’t think I’m explaining it well enough. These are my formula results using junk data (with an
R-squared value of 0.2):
Predicted R-squared = 1 - (PRESS / TSS) = 1 - (-1.04 / 67408.86) = 1.00
So as you can see something is definitely wrong.
Reply
Franklin Moormann says
October 26, 2017 at 5:43 pm
I have no clue how to compute diagonal elements in C#, so I guess I’m going to have to go through and
eliminate one observation at a time and then calculate the PRESS and RSS after each elimination. Since
I’m doing that, how would I calculate the PRESS statistic instead of doing the diagonal matrix stuff?
Reply
Franklin Moormann says
October 26, 2017 at 7:15 pm
I found a workaround, but I’m now getting a negative value for the PRESS statistic, so when I divide by
the total sum of squares it returns 1, which I know isn’t correct.
Reply
Jim Frost says
October 27, 2017 at 10:34 am
Hi Franklin, actually for predicted R-squared (and adjusted R-squared) it is possible to get negative
values!
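To clarify where a negative value can come from: predicted R-squared is

$$R^2_{pred} = 1 - \frac{PRESS}{SS_{Total}}$$

so it goes negative whenever PRESS exceeds the total sum of squares, meaning the model predicts removed observations worse than simply using the mean would. Note, though, that PRESS itself is a sum of squared prediction errors and can never be negative; a negative PRESS points to a bug in the calculation rather than a property of the statistic.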
Reply
Jun Li says
July 4, 2018 at 10:37 am
Hello Jim,
I developed a nonlinear regression model in RStudio with R2 (0.904), R2 (adj) 0.864, and R2 (predicted)
0.919. I wonder if it is possible for the predicted R2 to be higher than the normal R2?
Hope for your reply.
Jun Li
Reply
Jim Frost says
July 5, 2018 at 3:13 pm
Hi Jun Li,
First we need to make sure we’re clear on some terminology. Did you develop a true nonlinear model or
is it a linear model that uses polynomials to model curvature? You can read about the differences in my
post: The Difference between Linear and Nonlinear Models.
It’s an important distinction because R-squared and its variants are not valid for nonlinear models. If you
are truly using a nonlinear model, I suppose it might be possible to obtain a Predicted R-squared that is
higher than R-squared. Maybe. But, you shouldn’t be using any of those R-squared values because they
are invalid. You can use another goodness-of-fit statistic, such as the standard error of the regression.
For linear models, you can’t obtain a predicted R-squared that is higher than R-squared. That scenario
would indicate that the model predicts new observations better than it predicts the values used during
the model fitting process. That makes no sense.
I hope this helps!
Tim says
October 26, 2017 at 2:25 am
Hi Jim,
I know the way R-squared is calculated in logistic regression is different. I wonder what you would do if
a reviewer asks you to provide a similar indicator.
Thanks!
Tim
Reply
Jim Frost says
October 26, 2017 at 1:37 pm
Hi Tim,
There are two measures I’m most familiar with for logistic regression. One is deviance R-squared for
binary logistic regression. This statistic measures the proportion of the deviance in the dependent
variable that the model explains. Unlike R-squared, the format of the data affects the deviance
R-squared.
The other is Akaike Information Criterion (AIC), which measures the quality of a model based on fit and
the number of terms in the model.
Jim
Reply
Franklin Moormann says
October 25, 2017 at 8:40 pm
I’m only supposed to remove one observation at a time to recalculate the prediction model, but after
that, am I supposed to use all the original observations to run the calculations for PRESS and TSS?
Reply
Franklin Moormann says
October 25, 2017 at 6:31 pm
I’m trying to create my own formula to calculate predicted R-squared, and this was the only information
I found on how to do it. I believe the formula is predicted R2 = 1 - (PRESS / TSS). So would you
systematically leave out one data point at a time, calculate the PRESS and TSS statistics, add those
values to a running total, and calculate predicted R2 at the end?
Reply
Jim Frost says
October 25, 2017 at 10:57 pm
Hi Franklin, here’s the predicted R-squared and PRESS formulas. The formulas don’t actually go through
and remove each observation one-at-a-time, but it is equivalent to that process.
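For reference, the standard formulas (with e_i the ordinary residual for observation i and h_ii the i-th diagonal element of the hat matrix H = X(X'X)^-1 X') are

$$PRESS = \sum_{i=1}^{n}\left(\frac{e_i}{1 - h_{ii}}\right)^2, \qquad R^2_{pred} = 1 - \frac{PRESS}{SS_{Total}}$$

Dividing each residual by 1 - h_ii yields exactly the prediction error you would get by removing that observation and refitting, which is why no literal refitting is required.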
Reply
Duc-Anh Luong says
May 10, 2017 at 6:28 am
Hi Jim,
I have a question about the calculation of predicted R-squared in linear regression.
(1) Is it true that each time we remove one data point, we have to fit the model again and use this
model to predict the value of the removed data point?
(2) Is it possible to get a negative predicted R-squared?
Many thanks
Duc Anh
Reply
Jim Frost says
May 13, 2017 at 4:55 pm
Hi Duc-Anh,
When the statistical software calculates predicted R-squared, it systematically removes each observation
and determines how well the model based on all of the other observations predicts that value. The
software does this for all observations in the dataset and calculates the predicted error sum of squares
(PRESS). It then uses the PRESS to calculate the predicted R-squared. Usually, it uses the error sum of
squares (ESS) to calculate R-squared. All of these calculations occur behind the scenes. You don’t need
to worry about refitting the model for each observation. All you need to do is assess the predicted R-
squared with that process in mind so you know what it really means.
Yes, it is possible to obtain a negative predicted R-squared. However, some statistical software, such as
Minitab, rounds these negative values up to zero.
Thank you for writing with your excellent questions,
Jim
Reply