Correlation Coefficient: Cautions

r measures linear association only

Correlation does not show causation


There may be a lurking variable that links your two variables

There are three things we need to do to figure out whether a linear model is appropriate:


1. Look at the scatter plot and see if it looks more or less linear
2. As long as it isn't curvilinear, we can use the correlation coefficient r to determine the
strength of the linear association
3. If r is close enough to 1 or -1, we can use the linear model
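
The check in steps 2 and 3 can be sketched in a few lines of Python. This is only an illustration: the two small data sets are made up for the sketch (they are not from these notes), and pearson_r is a hypothetical helper name. It shows why step 1 matters: a perfectly curved pattern can give r = 0 even though x and y are strongly related.

```python
import math

def pearson_r(xs, ys):
    """Correlation coefficient r for paired lists xs and ys."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: roughly linear pattern
linear_x = [1, 2, 3, 4, 5]
linear_y = [2.1, 3.9, 6.2, 8.0, 9.8]

# Hypothetical data: curvilinear pattern (y = x squared)
curved_x = [-2, -1, 0, 1, 2]
curved_y = [4, 1, 0, 1, 4]

r_linear = pearson_r(linear_x, linear_y)  # close to 1: a linear model is reasonable
r_curved = pearson_r(curved_x, curved_y)  # 0: r misses the curved relationship

print(round(r_linear, 3), round(r_curved, 3))
```

How close to 1 or -1 counts as "close enough" is a judgment call that these notes leave open, so the sketch does not hard-code a cutoff.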

Using a linear model to make predictions


The least squares regression line gives predictions that we label ŷ:
ŷ = a + bx, where a is the y-intercept and b is the slope

Example:
What is the height prediction for a shoe length of 29 cm?
Plug x = 29 into the least squares regression line
ŷ = 90.8 + 3.1x
ŷ = 90.8 + 3.1(29)
ŷ = 90.8 + 89.9 = 180.7
Interpret: we expect that a person with a shoe length of 29 cm will be about 181 cm tall
- For every increase of 1 cm in shoe length, the predicted height increases by 3.1 cm

● a is the y-intercept
● It is the predicted value of y when x = 0
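
The worked example can be checked with a short Python sketch. The intercept 90.8 and slope 3.1 come from the example above; the names A, B and predict_height are chosen just for this illustration.

```python
A = 90.8  # y-intercept a from the example (cm)
B = 3.1   # slope b from the example (cm of height per cm of shoe length)

def predict_height(shoe_length_cm):
    """Predicted height y-hat = a + b*x for a given shoe length x."""
    return A + B * shoe_length_cm

height_29 = predict_height(29)
print(round(height_29, 1))  # 180.7

# The slope is the change in predicted height per 1 cm of shoe length
slope_check = predict_height(30) - predict_height(29)

# The intercept is the predicted value when x = 0
intercept_check = predict_height(0)
```

Note that x = 0 (a shoe length of 0 cm) is not a realistic value here, so the intercept is best read as part of the equation rather than as a meaningful prediction.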

What can go wrong with predictions?


● When one or two observations drastically change the correlation coefficient or the linear
regression equation, they are called influential observations
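
A small sketch of what an influential observation does, assuming made-up data (not from these notes) and a hypothetical helper fit_line that computes the least squares intercept, slope and r by hand. Five points lie exactly on a line; adding one far-away point changes both the slope and r drastically.

```python
import math

def fit_line(xs, ys):
    """Return (a, b, r): least squares intercept, slope, and correlation."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    b = sxy / sxx
    a = mean_y - b * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r

# Hypothetical data: five points exactly on y = 2x
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 6, 8, 10]

a_without, b_without, r_without = fit_line(xs, ys)

# Same data plus one hypothetical observation far from the rest
a_with, b_with, r_with = fit_line(xs + [20], ys + [5])

print("without:", round(b_without, 3), round(r_without, 3))
print("with:   ", round(b_with, 3), round(r_with, 3))
```

Fitting the line with and without a suspicious point, as done here, is a simple way to see whether that point is influential.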

What else can go wrong?


Extrapolation
● Interpolation uses a linear model to predict values within the range of data from which
the linear model was derived
● Extrapolation uses a linear model to predict values outside the range of data from which the
linear model was derived
● Extrapolation should be avoided
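
One way to guard against extrapolation is to flag any prediction whose x value lies outside the observed range. The sketch below reuses the shoe length equation from the example; the observed range of 22 cm to 32 cm is an assumption made for illustration, since these notes do not give the range of the original data, and predict_with_range_check is a hypothetical helper name.

```python
def predict_with_range_check(x, a, b, x_min, x_max):
    """Return (y_hat, is_extrapolation) for the line y_hat = a + b*x.

    x_min and x_max are the smallest and largest x values in the data
    the line was fitted to.
    """
    y_hat = a + b * x
    is_extrapolation = x < x_min or x > x_max
    return y_hat, is_extrapolation

# Equation from the example; the data range below is assumed, not given
A, B = 90.8, 3.1
X_MIN, X_MAX = 22, 32  # hypothetical range of observed shoe lengths (cm)

inside = predict_with_range_check(29, A, B, X_MIN, X_MAX)   # interpolation
outside = predict_with_range_check(50, A, B, X_MIN, X_MAX)  # extrapolation

print(inside[1], outside[1])  # False True
```

The function still returns a number for x = 50, which is the danger: the equation will always produce a prediction, but nothing in the data supports it outside the observed range.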
