Chapter 05
Linear Regression
Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y).
Basic Practice of Statistics - 3rd Edition
Exercise: The heights and weights of 4 men are as follows:
(6,170), (5.5,150), (5.8,170) and (6.2,180).
a) Draw a scatterplot of weight versus height.
b) Find the regression line.
c) Mark has a height of 5.7. Could you give a prediction of his weight?
d) Plot a residual plot. (We will come back to this later.)

Coefficient of Determination (R²)
Measures usefulness of regression prediction
R² (or r², the square of the correlation): measures what fraction of the variation in the values of the response variable (y) is explained by the regression line
– r=1: R²=1: regression line explains all (100%) of the variation in y
– r=.7: R²=.49: regression line explains almost half (49%) of the variation in y
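
The exercise and the definition of R² can be worked through numerically. The following is a minimal sketch in plain Python, assuming the usual least-squares formulas (slope = Sxy / Sxx, intercept = y-bar minus slope times x-bar); the function name `least_squares` and the variable names are illustrative and do not come from the slides.

```python
# Heights (x) and weights (y) of the four men from the exercise.
heights = [6.0, 5.5, 5.8, 6.2]
weights = [170.0, 150.0, 170.0, 180.0]


def least_squares(xs, ys):
    """Return (intercept, slope, r_squared) for the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # For a straight-line fit, R^2 equals the squared correlation r^2.
    r_squared = sxy ** 2 / (sxx * syy)
    return intercept, slope, r_squared


intercept, slope, r_squared = least_squares(heights, weights)

# (b) the regression line
print(f"y-hat = {intercept:.2f} + {slope:.2f} x")

# (c) predicted weight for Mark, whose height is 5.7
mark_weight = intercept + slope * 5.7
print(f"predicted weight at 5.7: {mark_weight:.1f}")

# (d) residuals = observed y minus predicted y, the values a residual plot shows
residuals = [y - (intercept + slope * x) for x, y in zip(heights, weights)]
print("residuals:", [round(e, 2) for e in residuals])

print(f"R^2 = {r_squared:.3f}")
```

With these four points the slope comes out at roughly 40.2, the intercept at roughly -68.6, the prediction for Mark at about 160.5, and R² at about 0.91, so the line explains about 91% of the variation in weight.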
Residuals
An outlier is an observation that lies far away from the other observations
– outliers in the y direction have large residuals
– outliers in the x direction are often influential for the least-squares regression line, meaning that the removal of such points would markedly change the equation of the line
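
What "influential" means can be seen by fitting a line with and without a single point that is far out in the x direction. The sketch below uses made-up numbers chosen only for illustration (they are not the Gesell data or any data from the slides), and the helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Return (intercept, slope) of the least-squares line of ys on xs."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Illustrative (invented) data: five points on a rising trend,
# plus one point far away in the x direction.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 15.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 3.0]

intercept_all, slope_all = fit_line(xs, ys)
intercept_trim, slope_trim = fit_line(xs[:-1], ys[:-1])  # drop the last point

print(f"with the x-outlier:    y-hat = {intercept_all:.2f} + {slope_all:.3f} x")
print(f"without the x-outlier: y-hat = {intercept_trim:.2f} + {slope_trim:.3f} x")
```

In this invented example the slope is close to 1 without the far-out point and close to 0 with it, which is the sense in which one observation can markedly change the equation of the line.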
Outliers: Case Study
Gesell Adaptive Score and Age at First Word
After removing child 18: r² = 11%
Cautions about Correlation and Regression
only describe linear relationships
are both affected by outliers
always plot the data before interpreting
beware of extrapolation
– predicting outside of the range of x
beware of lurking variables
– have important effect on the relationship among the variables in a study, but are not included in the study
association does not imply causation

Caution: Beware of Extrapolation
Sarah’s height was plotted against her age
Can you predict her height at age 42 months?
Can you predict her height at age 30 years (360 months)?
Caution: Beware of Extrapolation
Regression line: y-hat = 71.95 + .383 x
height at age 42 months? y-hat = 88
height at age 30 years? y-hat = 209.8
– She is predicted to be 6’ 10.5” at age 30.

Caution: Beware of Lurking Variables
Meditation and Aging (Noetic Sciences Review, Summer 1993, p. 28)
Explanatory variable: observed meditation practice (yes/no)
Response: level of age-related enzyme
general concern for one’s well being may also be affecting the response (and the decision to try meditation)
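
The extrapolation arithmetic for Sarah's regression line, y-hat = 71.95 + .383 x, can be checked directly. This is a small sketch that assumes x is age in months and y-hat is height in centimeters; the slides do not state the units, but centimeters agree with the quoted 6’ 10.5”. The function names are illustrative.

```python
def predicted_height_cm(age_months):
    """Sarah's regression line from the slide: y-hat = 71.95 + .383 x."""
    return 71.95 + 0.383 * age_months


def cm_to_feet_inches(cm):
    """Convert centimeters to (whole feet, remaining inches)."""
    total_inches = cm / 2.54
    feet = int(total_inches // 12)
    return feet, total_inches - 12 * feet


for age in (42, 360):  # 42 months, and 30 years = 360 months
    height = predicted_height_cm(age)
    feet, inches = cm_to_feet_inches(height)
    print(f"age {age} months: y-hat = {height:.1f} cm (about {feet} ft {inches:.1f} in)")
```

The same formula that gives a sensible 88 cm at 42 months gives nearly 210 cm at 360 months. Nothing in the arithmetic signals that the second age is outside the range of the data, so that check has to come from the analyst.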
Caution: Correlation Does Not Imply Causation
Even very strong correlations may not correspond to a real causal relationship (changes in x actually causing changes in y).
(correlation may be explained by a lurking variable)

Caution: Correlation Does Not Imply Causation
Social Relationships and Health
House, J., Landis, K., and Umberson, D. “Social Relationships and Health,” Science, Vol. 241 (1988), pp 540-545.
Does lack of social relationships cause people to become ill? (there was a strong correlation)
Or, are unhealthy people less likely to establish and maintain social relationships? (reversed relationship)
Or, is there some other factor that predisposes people both to have lower social activity and become ill?
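
How a lurking variable can produce a strong correlation without any causal link between x and y can be illustrated with a small simulation. The sketch below is entirely synthetic and is not based on the studies cited above: a hypothetical variable z drives both x and y, while x and y never influence each other. The noise levels and sample size are arbitrary choices.

```python
import random

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 500

# z is the lurking variable; x and y each depend on z plus independent noise.
z = [rng.uniform(0, 10) for _ in range(n)]
x = [zi + rng.gauss(0, 1) for zi in z]
y = [2 * zi + rng.gauss(0, 2) for zi in z]


def correlation(a, b):
    """Pearson correlation coefficient r between two equal-length lists."""
    a_bar = sum(a) / len(a)
    b_bar = sum(b) / len(b)
    sab = sum((ai - a_bar) * (bi - b_bar) for ai, bi in zip(a, b))
    saa = sum((ai - a_bar) ** 2 for ai in a)
    sbb = sum((bi - b_bar) ** 2 for bi in b)
    return sab / (saa * sbb) ** 0.5


r_xy = correlation(x, y)
print(f"r between x and y: {r_xy:.2f}")
```

The correlation between x and y is strong even though, by construction, changing x would do nothing to y. The association comes entirely from z.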