
Basic Practice of Statistics - 3rd Edition

Chapter 5
Regression

Linear Regression
• Objective: To quantify the linear relationship between an explanatory variable (x) and a response variable (y).
• We can then predict the average response for all subjects with a given value of the explanatory variable.

BPS - 5th Ed. Chapter 5 1 BPS - 5th Ed. Chapter 5 2

Prediction via Regression Line
Number of new birds and Percent returning
Example: predicting the number (y) of new adult birds that join the colony based on the percent (x) of adult birds that return to the colony from the previous year.

Least Squares
• Used to determine the “best” line
• We want the line to be as close as possible to the data points in the vertical (y) direction (since that is what we are trying to predict)
• Least Squares: use the line that minimizes the sum of the squares of the vertical distances of the data points from the line
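
A minimal Python sketch of the least-squares idea above. The five data points are made up for illustration (they are not the bird data), and the function names are hypothetical. It fits the line with the usual closed-form formulas and then checks that nudging the intercept or slope makes the sum of squared vertical distances larger.

```python
def sum_squared_vertical_distances(a, b, xs, ys):
    """Sum of squared vertical distances from the points to the line y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (a, b), the intercept and slope of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


# Made-up illustrative points (an assumption, not data from the slides).
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.1, 5.9, 8.2, 9.8]

a, b = least_squares_line(xs, ys)
best = sum_squared_vertical_distances(a, b, xs, ys)

# Shifting the intercept or slope away from the fitted values increases the sum.
shifts = [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.2), (0.0, -0.2)]
for da, db in shifts:
    other = sum_squared_vertical_distances(a + da, b + db, xs, ys)
    print(f"shift a by {da:+.1f}, b by {db:+.1f}: {other:.4f} (fitted line: {best:.4f})")
```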


Least Squares Regression Line
• Regression equation: y-hat = a + bx
– x is the value of the explanatory variable
– “y-hat” is the average value of the response variable (predicted response for a value of x)
– note that a and b are just the intercept and slope of a straight line
– note that r and b are not the same thing, but their signs will agree

Prediction via Regression Line
Number of new birds and Percent returning
• The regression equation is y-hat = 31.9343 - 0.3040x
– y-hat is the average number of new birds for all colonies with percent x returning
• For all colonies with 60% returning, we predict the average number of new birds to be 13.69:
31.9343 - (0.3040)(60) = 13.69 birds
• Suppose we know that an individual colony has 60% returning. What would we predict the number of new birds to be for just that colony?
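
The arithmetic for the bird example can be checked with a few lines of Python. The coefficients are copied from the slide; the function name is a hypothetical helper.

```python
# Coefficients from the slide: y-hat = 31.9343 - 0.3040x
INTERCEPT = 31.9343
SLOPE = -0.3040


def predict_new_birds(percent_returning):
    """Predicted average number of new birds for a given percent returning."""
    return INTERCEPT + SLOPE * percent_returning


print(round(predict_new_birds(60), 2))  # 13.69
```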


Regression Line Calculation
• Regression equation: y-hat = a + bx
slope: b = r (sy / sx)
intercept: a = y-bar - b (x-bar)
where sx and sy are the standard deviations of the two variables, r is their correlation, and x-bar and y-bar are their means

Regression Calculation
Case Study
Per Capita Gross Domestic Product and Average Life Expectancy for Countries in Western Europe


Regression Calculation
Case Study

Country          Per Capita GDP (x)   Life Expectancy (y)
Austria          21.4                 77.48
Belgium          23.2                 77.53
Finland          20.0                 77.32
France           22.7                 78.63
Germany          20.8                 77.17
Ireland          18.6                 76.39
Italy            21.5                 78.51
Netherlands      22.0                 78.15
Switzerland      23.8                 78.99
United Kingdom   21.2                 77.37

Regression Calculation
Case Study
Linear regression equation:
y-hat = 68.716 + 0.420x
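
As a rough check of the case-study equation, the sketch below computes the slope and intercept from the table using the standard formulas b = r (sy / sx) and a = y-bar - b (x-bar). The helper names are hypothetical. The result should agree with the slide's equation up to rounding in the last digit.

```python
from math import sqrt

# Data from the case-study table.
gdp = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
life = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]


def mean(values):
    return sum(values) / len(values)


def sample_sd(values):
    """Sample standard deviation (divides by n - 1)."""
    m = mean(values)
    return sqrt(sum((v - m) ** 2 for v in values) / (len(values) - 1))


def correlation(xs, ys):
    """Correlation r as the average product of standardized values."""
    x_bar, y_bar = mean(xs), mean(ys)
    sx, sy = sample_sd(xs), sample_sd(ys)
    total = sum(((x - x_bar) / sx) * ((y - y_bar) / sy) for x, y in zip(xs, ys))
    return total / (len(xs) - 1)


r = correlation(gdp, life)
b = r * sample_sd(life) / sample_sd(gdp)  # slope
a = mean(life) - b * mean(gdp)            # intercept

print(f"r = {r:.3f}")
print(f"y-hat = {a:.3f} + {b:.3f}x")
```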


Exercise: The heights and weights of 4 men are as follows:
(6, 170), (5.5, 150), (5.8, 170) and (6.2, 180).
a) Draw a scatterplot of weight versus height.
b) Find the regression line.
c) Mark has a height of 5.7. Could you give a prediction of his weight?
d) Make a residual plot. (We will come back to this later.)

Coefficient of Determination (R2)
• Measures usefulness of regression prediction
• R2 (or r2, the square of the correlation): measures what fraction of the variation in the values of the response variable (y) is explained by the regression line
• r = 1: R2 = 1: regression line explains all (100%) of the variation in y
• r = .7: R2 = .49: regression line explains almost half (50%) of the variation in y
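
To make "fraction of the variation explained" concrete, here is a small sketch using the GDP and life expectancy data from the case study. It computes R2 as 1 - SSE/SST (SSE and SST are names introduced here for the residual and total sums of squares) and compares it with the square of the correlation. The two should match up to floating-point error.

```python
from math import sqrt

# Case-study data: per capita GDP (x) and life expectancy (y).
xs = [21.4, 23.2, 20.0, 22.7, 20.8, 18.6, 21.5, 22.0, 23.8, 21.2]
ys = [77.48, 77.53, 77.32, 78.63, 77.17, 76.39, 78.51, 78.15, 78.99, 77.37]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

sxx = sum((x - x_bar) ** 2 for x in xs)
syy = sum((y - y_bar) ** 2 for y in ys)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least-squares slope and intercept.
b = sxy / sxx
a = y_bar - b * x_bar

# Total variation in y, and the variation left over after fitting the line.
sst = syy
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r_squared = 1 - sse / sst
r = sxy / sqrt(sxx * syy)

print(f"R2 from sums of squares: {r_squared:.4f}")
print(f"r squared:               {r ** 2:.4f}")
```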


Residuals
• A residual is the difference between an observed value of the response variable and the value predicted by the regression line:
residual = y - y-hat

Residuals
• A residual plot is a scatterplot of the regression residuals against the explanatory variable
– used to assess the fit of a regression line
– look for a “random” scatter around zero
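
A short sketch of the residual calculation, using the case-study data and the rounded equation y-hat = 68.716 + 0.420x from the earlier slide. The variable and function names are hypothetical. Printing the residuals next to x gives the numbers a residual plot would display.

```python
# (country, per capita GDP x, life expectancy y) from the case-study table.
countries = [
    ("Austria", 21.4, 77.48),
    ("Belgium", 23.2, 77.53),
    ("Finland", 20.0, 77.32),
    ("France", 22.7, 78.63),
    ("Germany", 20.8, 77.17),
    ("Ireland", 18.6, 76.39),
    ("Italy", 21.5, 78.51),
    ("Netherlands", 22.0, 78.15),
    ("Switzerland", 23.8, 78.99),
    ("United Kingdom", 21.2, 77.37),
]


def predicted(x):
    """Rounded regression equation from the slide."""
    return 68.716 + 0.420 * x


# residual = observed y minus predicted y-hat
residuals = {name: y - predicted(x) for name, x, y in countries}

for name, x, _ in countries:
    print(f"{name:15s} x = {x:5.1f}  residual = {residuals[name]:+.3f}")

# For a least-squares line the residuals sum to zero; with rounded
# coefficients the sum is only approximately zero.
print(f"sum of residuals = {sum(residuals.values()):+.3f}")
```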


Case Study
Gesell Adaptive Score and Age at First Word
Draper, N. R. and John, J. A. “Influential observations and outliers in regression,” Technometrics, Vol. 23 (1981), pp. 21-26.

Residual Plot: Case Study
Gesell Adaptive Score and Age at First Word


Outliers and Influential Points
• An outlier is an observation that lies far away from the other observations
– outliers in the y direction have large residuals
– outliers in the x direction are often influential for the least-squares regression line, meaning that the removal of such points would markedly change the equation of the line
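
The effect of an influential point can be illustrated with a toy example. The numbers below are invented for illustration only (they are not the Gesell data), and the helper name is hypothetical. Five points lie on a nearly flat line, and a sixth point far out in the x direction pulls the fitted slope sharply upward.

```python
def least_squares_line(xs, ys):
    """Return (a, b), the intercept and slope of the least-squares line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    return y_bar - b * x_bar, b


# Invented data: a nearly flat cloud of points.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 1.9, 2.0, 2.1, 1.9]

# One extra invented point that is an outlier in the x direction.
outlier_x, outlier_y = 15.0, 8.0

a_without, b_without = least_squares_line(xs, ys)
a_with, b_with = least_squares_line(xs + [outlier_x], ys + [outlier_y])

print(f"without the point: y-hat = {a_without:.2f} + {b_without:.2f}x")
print(f"with the point:    y-hat = {a_with:.2f} + {b_with:.2f}x")
```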

Outliers: Case Study
Gesell Adaptive Score and Age at First Word
• From all the data: r2 = 41%
• After removing child 18: r2 = 11%


Cautions about Correlation and Regression
• only describe linear relationships
• are both affected by outliers
• always plot the data before interpreting
• beware of extrapolation
– predicting outside of the range of x
• beware of lurking variables
– have important effect on the relationship among the variables in a study, but are not included in the study
• association does not imply causation

Caution: Beware of Extrapolation
• Sarah’s height was plotted against her age
• Can you predict her height at age 42 months?
• Can you predict her height at age 30 years (360 months)?


Caution: Beware of Extrapolation
• Regression line: y-hat = 71.95 + .383 x
• height at age 42 months? y-hat = 88
• height at age 30 years? y-hat = 209.8
– She is predicted to be 6’ 10.5” at age 30.

Caution: Beware of Lurking Variables
Meditation and Aging
(Noetic Sciences Review, Summer 1993, p. 28)
• Explanatory variable: observed meditation practice (yes/no)
• Response: level of age-related enzyme
• general concern for one’s well being may also be affecting the response (and the decision to try meditation)
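
Returning to the extrapolation example, the two predictions for Sarah's height can be reproduced with the regression line from the slide. The sketch assumes the heights are in centimeters, which the slide does not state but which is consistent with the 6’ 10.5” figure. The function name is hypothetical.

```python
CM_PER_INCH = 2.54


def predicted_height(age_months):
    """Regression line from the slide: y-hat = 71.95 + 0.383x (x in months)."""
    return 71.95 + 0.383 * age_months


for age in (42, 360):
    height_cm = predicted_height(age)
    # Conversion assumes the regression output is in centimeters.
    total_inches = height_cm / CM_PER_INCH
    feet = int(total_inches // 12)
    inches = total_inches - 12 * feet
    print(f"age {age:3d} months: y-hat = {height_cm:.1f} (about {feet} ft {inches:.1f} in)")
```

The line itself has no way of knowing that growth levels off, so it keeps adding 0.383 per month well beyond any age where that pattern could hold. That is why the prediction at 360 months is not believable.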


Caution: Correlation Does Not Imply Causation
Even very strong correlations may not correspond to a real causal relationship (changes in x actually causing changes in y).
(correlation may be explained by a lurking variable)

Caution: Correlation Does Not Imply Causation
Social Relationships and Health
House, J., Landis, K., and Umberson, D. “Social Relationships and Health,” Science, Vol. 241 (1988), pp. 540-545.
• Does lack of social relationships cause people to become ill? (there was a strong correlation)
• Or, are unhealthy people less likely to establish and maintain social relationships? (reversed relationship)
• Or, is there some other factor that predisposes people both to have lower social activity and become ill?


Evidence of Causation
• A properly conducted experiment establishes the connection (chapter 9)
• Other considerations:
– The association is strong
– The association is consistent
  • The connection happens in repeated trials
  • The connection happens under varying conditions
– Higher doses are associated with stronger responses
– Alleged cause precedes the effect in time
– Alleged cause is plausible (reasonable explanation)

Exercise 5.34. Data on the heights in inches of 11 pairs of brothers and sisters
a) Draw the scatterplot. Find the least-squares line. Make a residual plot.
b) Damien is 70 inches tall. Predict the height of his sister Tonya. Do you expect your prediction to be very accurate?
