analysis seems to be such a simple

thing. Also known as the y-intercept, it

is simply the value at which the fitted

line crosses the y-axis.

regression, which has one predictor variable and the response. The concepts hold

true for multiple linear regression, but I can't graph the higher dimensions that are

required.

Impossible

I've often seen the constant described as the mean response value when all

predictor variables are set to zero. Mathematically, that's correct. However, a zero

setting for all predictors in a model is often an impossible/nonsensical

combination, as it is in the following example.

In my last post about the interpretation of regression p-values and coefficients

(http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-regression-analysis-results-p-values-and-coefficients), I used a fitted line plot to

illustrate a weight-by-height regression analysis. Below, I've changed the scale of

the y-axis on that fitted line plot, but the regression results are the same as before.

If you follow the blue fitted line down to where it intercepts the y-axis, it is a fairly

negative value. From the regression equation, we see that the intercept value is

-114.3. If height is zero, the regression equation predicts that weight is -114.3

kilograms!

Clearly, this constant is meaningless and you shouldn't even try to give it meaning.

No human can have zero height or a negative weight!

Now imagine a multiple regression analysis with many predictors. It becomes even

more unlikely that ALL of the predictors can realistically be set to zero.

If the predictors can't all be zero, it is impossible to interpret the value of the

constant. Don't even try!
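
To make the point concrete, here is a minimal sketch in plain Python. The heights and weights are made-up values chosen only for illustration (they are not the data behind the fitted line plot), and `fit_line` is a hypothetical helper that applies the standard least-squares formulas for one predictor. It shows that the constant is nothing more than the fitted line evaluated at a height of zero, far from any observed height.

```python
# Hypothetical heights (m) and weights (kg); made-up values, not the post's data.
heights = [1.30, 1.40, 1.50, 1.60, 1.70]
weights = [25.0, 36.0, 45.0, 57.0, 66.0]


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope


intercept, slope = fit_line(heights, weights)

# The constant is just the fitted line evaluated at height = 0.
predicted_at_zero = intercept + slope * 0.0

print(f"weight = {intercept:.1f} + {slope:.1f} * height")
print(f"predicted weight at zero height: {predicted_at_zero:.1f} kg")
print(f"observed heights span {min(heights):.2f} to {max(heights):.2f} m")
```

With these invented numbers the constant comes out strongly negative, even though every observed weight is positive.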

Outside the Data Range

Even if it's possible for all of the predictor variables to equal zero, that data point

might be outside the range of the observed data.

You should never use a regression model to make a prediction for a point that is

outside the range of your data because the relationship between the variables

might change. The value of the constant is a prediction for the response value

when all predictors equal zero. If you didn't collect data in this all-zero range, you

can't trust the value of the constant.

The height-by-weight example illustrates this concept. These data are from middle

school girls, and we can't estimate the relationship between the variables outside of

the observed weight and height range. However, we can get a sense that the

relationship changes by marking the average weight and height for a newborn

baby on the graph. That's not quite zero height, but it's as close as we can get.

I drew the red circle near the origin to approximate the newborn's average height

and weight. You can clearly see that the relationship must change as you extend

the data range!

So the relationship we see for the observed data is locally linear, but it changes

beyond that. That's why you shouldn't predict outside the range of your data... and

another reason why the regression constant can be meaningless.
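
The following sketch illustrates the "locally linear" idea under a loudly stated assumption: the function `true_weight` is an invented curved relationship (weight proportional to height cubed), not a real growth model and not something taken from the data in this post. A straight line fitted only to heights between 1.3 m and 1.7 m tracks the curve closely inside that range and misses badly at a newborn-sized height of about 0.5 m.

```python
def true_weight(height_m):
    """Hypothetical curved relationship, assumed purely for illustration."""
    return 13.0 * height_m ** 3


def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# "Observed" data cover only a narrow height range.
observed_heights = [1.30, 1.40, 1.50, 1.60, 1.70]
observed_weights = [true_weight(h) for h in observed_heights]

intercept, slope = fit_line(observed_heights, observed_weights)


def line(height_m):
    """Prediction from the straight line fitted to the observed range."""
    return intercept + slope * height_m


# Inside the observed range the line is a good local approximation.
in_range_errors = [abs(line(h) - true_weight(h)) for h in observed_heights]

# Far outside the range (an illustrative newborn-sized height) it is not.
newborn_height = 0.5
extrapolation_error = abs(line(newborn_height) - true_weight(newborn_height))

print(f"largest error inside the observed range: {max(in_range_errors):.2f} kg")
print(f"line's prediction at {newborn_height} m: {line(newborn_height):.1f} kg")
print(f"assumed true value at {newborn_height} m: {true_weight(newborn_height):.1f} kg")
print(f"extrapolation error: {extrapolation_error:.1f} kg")
```

The exact numbers depend entirely on the assumed curve. The pattern is what matters: small errors where data were collected, large errors where they were not.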

Regression Model

Even if a zero setting for all predictors is a plausible scenario, and even if you

collect data within that all-zero range, the constant might still be meaningless!

The constant term is in part estimated by the omission of predictors from a

regression analysis. In essence, it serves as a garbage bin for any bias that is not

accounted for by the terms in the model. You can picture this by imagining that the

regression line floats up and down (by adjusting the constant) to a point where the

mean of the residuals is zero, which is a key assumption for residual analysis

(http://blog.minitab.com/blog/adventures-in-statistics/why-you-need-to-check-your-residual-plots-for-regression-analysis). This floating is not based on what

makes sense for the constant, but rather what works mathematically to produce

that zero mean.

The constant guarantees that the residuals don't have an overall positive or

negative bias, but also makes it harder to interpret the value of the constant

because it absorbs the bias.
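
A small sketch of this "absorbing" behavior, using invented numbers: the response is built from a baseline of 10, a slope of 2 on the included predictor `x`, and the effect of a predictor that is left out of the model. That omitted effect (`omitted_effect`, a hypothetical name) averages 4 and is constructed to be uncorrelated with `x`. The fitted constant lands near 14 rather than 10, and the residuals average zero.

```python
def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


# Included predictor.
x = [0, 1, 2, 3, 4, 5]

# Made-up data-generating process: baseline + 2 * x + effect of an omitted predictor.
baseline = 10.0
omitted_effect = [5.0, 2.0, 5.0, 5.0, 2.0, 5.0]  # mean 4, uncorrelated with x
y = [baseline + 2.0 * xi + zi for xi, zi in zip(x, omitted_effect)]

# Fit using only x; the omitted predictor is not in the model.
intercept, slope = fit_line(x, y)

residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
mean_residual = sum(residuals) / len(residuals)
mean_omitted = sum(omitted_effect) / len(omitted_effect)

print(f"baseline used to build the data: {baseline:.1f}")
print(f"average effect of the omitted predictor: {mean_omitted:.1f}")
print(f"fitted constant: {intercept:.1f}")
print(f"fitted slope: {slope:.1f}")
print(f"mean of residuals: {mean_residual:.6f}")
```

In this constructed case the slope is recovered, but the constant has shifted by the average of what the model left out, which is why its value is hard to interpret on its own.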

Regression Model?

Immediately above, we saw a key reason why you should include the constant in

your regression model. It guarantees that your residuals have a mean of zero.

Additionally, if you don't include the constant, the regression line is forced to go

through the origin. This means that all of the predictors and the response variable

must equal zero at that point. If your fitted line doesn't naturally go through the

origin, your regression coefficients and predictions will be biased if you don't include

the constant.

I'll use the height and weight regression example to illustrate this concept. First, I'll

use General Regression in Minitab statistical software (http://www.minitab.com/en-us/products/minitab/)

to fit the model without the constant. In the output

below, you can see that there is no constant, just a coefficient for height.

Next, I'll overlay the line for this equation on the previous fitted line plot so we can

compare the model with and without the constant.

The blue line is the fitted line for the regression model with the constant while the

green line is for the model without the constant. Clearly, the green line just doesn't

fit. The slope is way off and the predicted values are biased. For the model without

the constant, the weight predictions tend to be too high for shorter subjects and

too low for taller subjects.
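
The same comparison can be sketched in plain Python. The heights and weights below are made-up values for illustration, not the data behind the plots, and the two helper functions are hypothetical names. The model with a constant uses the usual least-squares formulas; the model without a constant uses the through-the-origin slope, sum(x*y) / sum(x*x).

```python
# Hypothetical heights (m) and weights (kg); made-up values, not the post's data.
heights = [1.30, 1.40, 1.50, 1.60, 1.70]
weights = [25.0, 36.0, 45.0, 57.0, 66.0]


def fit_with_constant(x, y):
    """Least-squares fit of y = intercept + slope * x."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_mean - slope * x_mean, slope


def fit_through_origin(x, y):
    """Least-squares fit of y = slope * x, with the line forced through (0, 0)."""
    return sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)


intercept, slope_full = fit_with_constant(heights, weights)
slope_origin = fit_through_origin(heights, weights)

# Residual = observed value minus fitted value, for each model.
resid_full = [w - (intercept + slope_full * h) for h, w in zip(heights, weights)]
resid_origin = [w - slope_origin * h for h, w in zip(heights, weights)]

mean_resid_full = sum(resid_full) / len(resid_full)
mean_resid_origin = sum(resid_origin) / len(resid_origin)

print(f"with constant:    weight = {intercept:.1f} + {slope_full:.1f} * height")
print(f"without constant: weight = {slope_origin:.1f} * height")
print(f"mean residual with constant:    {mean_resid_full:.3f}")
print(f"mean residual without constant: {mean_resid_origin:.3f}")
for h, w, r in zip(heights, weights, resid_origin):
    print(f"height {h:.2f} m: observed {w:.1f} kg, no-constant residual {r:+.1f} kg")
```

For these invented data the no-constant slope is much flatter, its residuals are negative for the shortest subject and positive for the tallest (predictions too high, then too low), and their mean is no longer zero.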

In closing, the regression constant is generally not worth interpreting. Despite this,

it is almost always a good idea to include the constant in your regression analysis.

In the end, the real value of a regression model is the ability to understand how the

response variable changes when you change the values of the predictor variables.

Don't worry too much about the constant!

If you're learning about regression, read my regression tutorial

(http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples)!

How to Interpret Regression Analysis Results: P-values and Coefficients (http://blog.minitab.com/blog/adventures-in-statistics/how-to-interpret-regression-analysis-results-p-values-and-coefficients)

Regression Analysis Tutorial and Examples (http://blog.minitab.com/blog/adventures-in-statistics/regression-analysis-tutorial-and-examples)

How to Predict with Minitab: Using BMI to Predict the Body Fat Percentage, Part 1 (http://blog.minitab.com/blog/adventures-in-statistics/how-to-predict-with-minitab-using-bmi-to-predict-the-body-fat-percentage-part-1)

Regression Analysis: How Do I Interpret R-squared and Assess the Goodness-of-Fit? (http://blog.minitab.com

Comments

Name: Tim McDaniel Sunday, September 15, 2013

Very nice. It is amazing, and I think understandable, how desperately new-to-regression students want to

attach a substantively meaningful interpretation to the intercept term. I tell students that one could interpret

the intercept as a "correction factor" when using particular values of the x's to predict y.

I'm studying empirical economic research in Germany and the lecture notes did not explain this parameter, it

was just there. Thank you very much for explaining this with graphics!

This is an excellent explanation, particularly for a negative constant in regression analysis. Thanks.

Great! Very helpful material to me.

Regression line drawn as Y=c+1075x, when x was 2, Y was 239, given that Y intercept was

11. Calculate the residual.

Mod

The residual equals the observed value minus the fitted value. So, let's figure out both

of those.

You state that the observed value for Y is 239.

We'll plug in your values in the equation to figure out the fitted value.

Y=11+1075*2. So, the fitted value equals 2161.

So, the residual is 239 - 2161 = -1922

Jim

John K.

Jim, can you elaborate on the purpose and meaning of assessing the significance of a

constant. The significance measure is included in regression results and occasionally is way

above .05 (in my example: .559). Thank you!

Mod

Hi John,

The strict technical meaning of the p-value for the constant is that it measures how

compatible your data are with the null hypothesis that the constant equals zero. If you

have a sufficiently low p-value for the constant, you can reject the null hypothesis and

conclude that the constant does not equal zero. In other words, the regression line

does not go through the origin.

Your higher p-value indicates that you cannot reject the null that the constant equals

zero. Your constant could be zero.

However, because the value of the constant is generally meaningless, determining

