
Regression Analysis: How to Interpret the Constant (Y Intercept) | Minitab

http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis...


Jim Frost (http://blog.minitab.com/blog/adventures-in-statistics), 11 July, 2013



The constant term in linear regression analysis seems to be such a simple thing. Also known as the y intercept, it is simply the value at which the fitted line crosses the y-axis.


While the concept is simple, I've seen a lot of confusion about interpreting the constant. That's not surprising because the value of the constant term is almost always meaningless! Paradoxically, while the value is generally meaningless, it is crucial to include the constant term in most regression models!

In this post, I'll show you everything you need to know about the constant in linear regression analysis.

I'll use fitted line plots to illustrate the concepts because they really bring the math to life. However, a 2D fitted line plot can only display the results from simple regression, which has one predictor variable and the response. The concepts hold true for multiple linear regression, but I can't graph the higher dimensions that it requires.

Zero Settings for All of the Predictor Variables Is Often Impossible


I've often seen the constant described as the mean response value when all predictor variables are set to zero. Mathematically, that's correct. However, a zero setting for all predictors in a model is often an impossible or nonsensical combination, as it is in the following example.
In my last post about the interpretation of regression p-values and coefficients (http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-regression-analysis-results-p-values-and-coefficients), I used a fitted line plot to illustrate a weight-by-height regression analysis. Below, I've changed the scale of the y-axis on that fitted line plot, but the regression results are the same as before.

If you follow the blue fitted line down to where it intercepts the y-axis, it reaches a fairly large negative value. From the regression equation, we see that the intercept value is -114.3. If height is zero, the regression equation predicts that weight is -114.3 kilograms!
Clearly this constant is meaningless, and you shouldn't even try to give it meaning. No human can have zero height or a negative weight!
Now imagine a multiple regression analysis with many predictors. It becomes even more unlikely that ALL of the predictors can realistically be set to zero.
If all of the predictors can't be zero, it is impossible to interpret the value of the constant. Don't even try!

Zero Settings for All of the Predictor Variables Can Be Outside the Data Range

Even if it's possible for all of the predictor variables to equal zero, that data point might be outside the range of the observed data.
You should never use a regression model to make a prediction for a point that is outside the range of your data because the relationship between the variables might change. The value of the constant is a prediction for the response value when all predictors equal zero. If you didn't collect data in this all-zero range, you can't trust the value of the constant.
The weight-by-height example illustrates this concept. These data are from middle school girls, and we can't estimate the relationship between the variables outside of the observed weight and height range. However, we can get a sense that the relationship changes by marking the average weight and height for a newborn baby on the graph. That's not quite zero height, but it's as close as we can get.

I drew the red circle near the origin to approximate the newborn's average height and weight. You can clearly see that the relationship must change as you extend the data range!
So the relationship we see for the observed data is locally linear, but it changes beyond that. That's why you shouldn't predict outside the range of your data... and another reason why the regression constant can be meaningless.
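One practical habit that follows from this section is to check whether the all-zero point lies inside the observed data before reading anything into the constant. Below is a minimal Python sketch of such a check; `predictors_excluding_zero` is a hypothetical helper (not a Minitab feature), and the example values are made up. It only checks each predictor's range separately, so passing it does not prove that the all-zero combination itself was observed.

```python
def predictors_excluding_zero(predictors):
    """Return the names of predictors whose observed [min, max] range excludes 0.

    predictors: mapping of predictor name -> list of observed values.
    This is a per-predictor (marginal) check only; an empty result does not
    prove that the all-zero combination itself appears in the data.
    """
    return [
        name
        for name, values in predictors.items()
        if not (min(values) <= 0 <= max(values))
    ]


# Hypothetical data: heights of middle school girls never come close to 0 m,
# while weekly hours of sports can genuinely be 0.
observed = {
    "height_m": [1.40, 1.45, 1.50, 1.55, 1.60, 1.65, 1.70],
    "sports_hours_per_week": [0, 2, 3, 1, 4, 0, 5],
}

outside = predictors_excluding_zero(observed)
if outside:
    print("The constant is an extrapolation; 0 is outside the range of:", outside)
```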

The Constant Is the Garbage Collector for the Regression Model
Even if a zero setting for all predictors is a plausible scenario, and even if you collect data within that all-zero range, the constant might still be meaningless!
The value of the constant depends in part on which predictors are omitted from a regression analysis. In essence, it serves as a garbage bin for any bias that is not accounted for by the terms in the model. You can picture this by imagining that the regression line floats up and down (by adjusting the constant) to a point where the mean of the residuals is zero, which is a key assumption for residual analysis (http://blog.minitab.com/blog/adventures-in-statistics/why-you-need-to-check-your-residual-plots-for-regression-analysis). This floating is not based on what makes sense for the constant, but rather on what works mathematically to produce that zero mean.
The constant guarantees that the residuals don't have an overall positive or negative bias, but it also makes the value of the constant harder to interpret because it absorbs the bias.
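The "floating" described above can be seen in a small numerical sketch. Assuming made-up, noise-free data generated from y = 2 + 3x + 4z, the Python code below fits y on x alone, leaving z out of the model. Because z is constructed to be uncorrelated with x here, the slope is unaffected and the constant simply absorbs the omitted term's average contribution (4 times the mean of z); if the omitted predictor were correlated with x, the slope would be biased as well. `fit_line` is a hypothetical helper for ordinary least squares with one predictor.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x; returns (b0, b1)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    # b0 = y_bar - b1 * x_bar puts the line through (x_bar, y_bar),
    # which is exactly what forces the residuals to average zero.
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical, noise-free data generated from y = 2 + 3x + 4z.
x = [0.0, 1.0, 2.0, 3.0]
z = [1.0, 2.0, 2.0, 1.0]  # mean 1.5, and uncorrelated with x by construction
y = [2 + 3 * xi + 4 * zi for xi, zi in zip(x, z)]

# Fit y on x only, leaving z out of the model.
b0, b1 = fit_line(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]

print("slope:", b1)     # 3.0, the true coefficient for x
print("constant:", b0)  # 8.0 = 2 + 4 * mean(z): the omitted term is absorbed
print("mean residual:", sum(residuals) / len(residuals))  # 0.0
```

The true intercept in this made-up model is 2, yet the fitted constant is 8, so reading the constant as "the response when x is zero" would be wrong here even though x = 0 is inside the data.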

Why Is It Crucial to Include the Constant in a Regression Model?
Immediately above, we saw a key reason why you should include the constant in your regression model: it guarantees that your residuals have a mean of zero.
Additionally, if you don't include the constant, the regression line is forced to go through the origin. This means the model must predict a response of zero when all of the predictors equal zero. If your fitted line doesn't naturally go through the origin, your regression coefficients and predictions will be biased if you don't include the constant.
I'll use the height and weight regression example to illustrate this concept. First, I'll use General Regression in Minitab statistical software (http://www.minitab.com/en-us/products/minitab/) to fit the model without the constant. In the output below, you can see that there is no constant, just a coefficient for height.

Next, I'll overlay the line for this equation on the previous fitted line plot so we can compare the model with and without the constant.


The blue line is the fitted line for the regression model with the constant, while the green line is for the model without the constant. Clearly, the green line just doesn't fit. The slope is way off and the predicted values are biased. For the model without the constant, the weight predictions tend to be too high for shorter subjects and too low for taller subjects.
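The bias pattern described above can be reproduced with a short Python sketch. It uses made-up height and weight values as a stand-in for the post's data (which are not reproduced here) and compares ordinary least squares with a constant against least squares forced through the origin, whose slope is sum(x*y) / sum(x*x). The values are sorted by height so the first and last three rows can stand for shorter and taller subjects.

```python
# Hypothetical height (m) and weight (kg) values, sorted by height.
heights = [1.40, 1.45, 1.50, 1.55, 1.60, 1.65, 1.70]
weights = [36.0, 40.5, 45.0, 51.0, 55.5, 60.0, 65.5]


def mean(values):
    return sum(values) / len(values)


x_bar, y_bar = mean(heights), mean(weights)

# Model WITH a constant: ordinary least squares slope and intercept.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(heights, weights))
sxx = sum((x - x_bar) ** 2 for x in heights)
slope_c = sxy / sxx
const = y_bar - slope_c * x_bar

# Model WITHOUT a constant: least squares line forced through the origin.
slope_0 = sum(x * y for x, y in zip(heights, weights)) / sum(x * x for x in heights)

# Residual = observed - fitted.
res_c = [y - (const + slope_c * x) for x, y in zip(heights, weights)]
res_0 = [y - slope_0 * x for x, y in zip(heights, weights)]

shorter, taller = res_0[:3], res_0[-3:]  # first and last three subjects by height
print(f"with constant:    slope = {slope_c:6.1f}, mean residual = {mean(res_c):+.3f}")
print(f"without constant: slope = {slope_0:6.1f}, mean residual = {mean(res_0):+.3f}")
# Negative mean residual: predictions too high; positive: predictions too low.
print(f"  shorter subjects' mean residual (no constant): {mean(shorter):+.2f}")
print(f"  taller subjects' mean residual (no constant):  {mean(taller):+.2f}")
```

In this made-up example, the no-constant slope comes out much flatter than the slope with a constant, its residuals no longer average zero, and they are negative for the shorter subjects and positive for the taller ones, the same pattern as the green line in the plot.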
In closing, the regression constant is generally not worth interpreting. Despite this,
it is almost always a good idea to include the constant in your regression analysis.
In the end, the real value of a regression model is the ability to understand how the
response variable changes when you change the values of the predictor variables.
Don't worry too much about the constant!
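A quick way to see why the constant matters so little for that purpose: the difference between two predictions from a fitted line depends only on the slope, so the constant cancels out. In this minimal Python sketch, the slope of 100 kg per meter is hypothetical; -114.3 is the constant from the weight-by-height example, and the other constants are arbitrary.

```python
def predict(constant, slope, x):
    """Prediction from a simple linear regression line."""
    return constant + slope * x


slope = 100.0  # hypothetical slope, kg per meter of height
changes = []
for constant in (-114.3, 0.0, 25.0):  # very different constants...
    change = predict(constant, slope, 1.60) - predict(constant, slope, 1.50)
    changes.append(change)
    print(f"constant={constant:7.1f}  predicted change for +0.10 m: {change:.2f} kg")
# ...but the predicted change is the same each time (slope * 0.10 m, up to rounding).
```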
If you're learning about regression, read my regression tutorial (http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples)!


Comments
Name: Tim McDaniel Sunday, September 15, 2013
Very nice. It is amazing, and I think understandable, how desperately new-to-regression students want to
attach a substantively meaningful interpretation to the intercept term. I tell students that one could interpret
the intercept as a "correction factor" when using particular values of the x's to predict y.

Name: Alex Sunday, January 12, 2014


I'm studying empirical economic research in Germany and the lecture notes did not explain this parameter, it
was just there. Thank you very much for explaining this with graphics!

Name: Hermanto Wednesday, April 30, 2014


This is an excellent explanation, particularly for a negative constant in regression analysis. Thanks.

Name: Stanley Saturday, May 17, 2014


Great! Very helpful material to me.



Egwuchukwu Frank Nweke

Regression line drawn as Y=c+1075x, when x was 2, Y was 239, given that Y intercept was 11. Calculate the residual.

Jim Frost At Minitab


Hi, this sounds like a homework assignment to me. ;)


The residual equals the observed value minus the fitted value. So, let's figure out both of those.
You state that the observed value for Y is 239.
We'll plug your values into the equation to figure out the fitted value.
Y=11+1075*2. So, the fitted value equals 2161.
So, the residual is 239 - 2161 = -1922.
Jim

John K.

Jim, can you elaborate on the purpose and meaning of assessing the significance of a constant? The significance measure is included in regression results and occasionally is way above .05 (in my example: .559). Thank you!

Jim Frost At Minitab


Hi John,
The strict technical meaning of the p-value for the constant is that it measures how
compatible your data are with the null hypothesis that the constant equals zero. If you
have a sufficiently low p-value for the constant, you can reject the null hypothesis and
conclude that the constant does not equal zero. In other words, the regression line
does not go through the origin.
Your higher p-value indicates that you cannot reject the null that the constant equals
zero. Your constant could be zero.
However, because the value of the constant is generally meaningless, determining