Sociology 8811
Copyright © 2007 by Evan Schofer
Do not copy or distribute without permission
Announcements
• None
Multiple Regression Assumptions
• 3. d. Predictors (Xi's) are uncorrelated with the error term
– This assumption is most often violated when we leave out an important variable that is correlated with another Xi
– Example: Predicting job prestige with family wealth, but not including education
– Omitting education affects the error term: those with lots of education will have large positive errors.
• Since wealth is correlated with education, it will be correlated with that error!
– Result: the coefficient for family wealth will be biased.
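The bias can be seen in a small simulation. This is a minimal sketch under made-up assumptions: the data-generating values (education = 0.6 × wealth + noise; true effects of 0.5 for wealth and 2.0 for education) are hypothetical and chosen only for illustration. Regressing prestige on wealth alone should give a slope near 0.5 + 2.0 × 0.6 = 1.7, because wealth picks up part of education's effect. Adding education to the model should bring the wealth coefficient back near 0.5.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((x - ma) * (y - mb) for x, y in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, seed=1):
    """Hypothetical data: wealth -> education, and both -> prestige."""
    rng = random.Random(seed)
    wealth = [rng.gauss(0, 1) for _ in range(n)]
    # Education is correlated with wealth (assumed coefficient 0.6).
    educ = [0.6 * w + rng.gauss(0, 1) for w in wealth]
    # Assumed true model: prestige = 0.5*wealth + 2.0*education + error.
    prestige = [0.5 * w + 2.0 * e + rng.gauss(0, 1) for w, e in zip(wealth, educ)]
    return wealth, educ, prestige


def slope_one(x, y):
    """OLS slope of y on a single predictor x."""
    return cov(x, y) / cov(x, x)


def slopes_two(x1, x2, y):
    """OLS slopes of y on x1 and x2, solved from the 2x2 normal equations."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return b1, b2


wealth, educ, prestige = simulate()
print("wealth only:       ", round(slope_one(wealth, prestige), 2))
b_wealth, b_educ = slopes_two(wealth, educ, prestige)
print("wealth + education:", round(b_wealth, 2), round(b_educ, 2))
```

Because wealth and education are correlated, the short regression credits wealth with part of education's effect; the true wealth effect is recovered only when education is in the model.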
Multiple Regression Assumptions
• 4. In systems of equations, error terms of
equations are uncorrelated
• Knoke, p. 256
– This is not a concern for us in this class
• Worry about that later!
Multiple Regression Assumptions
• 5. Sample is independent, errors are random
– Not only should errors not increase with X
(heteroskedasticity), there should be no pattern at all!
• Cases that are non-independent often have
correlated error
• Things that cause patterns in error
(autocorrelation):
– Measuring data over long periods of time (e.g., every
year). Error from nearby years may be correlated.
• Called “serial correlation”
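One common check for serial correlation is the Durbin-Watson statistic, d = sum of (e_t - e_(t-1))^2 divided by sum of e_t^2, computed on residuals in time order. Values near 2 suggest no first-order autocorrelation; values well below 2 suggest positive serial correlation (roughly d ≈ 2(1 - ρ) for AR(1) errors). The sketch below is illustrative only: it fits a straight line to a hypothetical yearly series whose errors follow an AR(1) process with an assumed ρ = 0.8, and compares d with a series that has independent errors. The trend values and function names are assumptions for the example.

```python
import random


def ols_line(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def durbin_watson(resid):
    """Durbin-Watson d for residuals listed in time order."""
    num = sum((resid[t] - resid[t - 1]) ** 2 for t in range(1, len(resid)))
    den = sum(e * e for e in resid)
    return num / den


def ar1_errors(n, rho, seed=0):
    """Errors where each year's error carries over a share rho of last year's."""
    rng = random.Random(seed)
    errs = [rng.gauss(0, 1)]
    for _ in range(n - 1):
        errs.append(rho * errs[-1] + rng.gauss(0, 1))
    return errs


def residual_dw(rho, n=500):
    """Fit y = 3 + 0.1*year + error (hypothetical) and return d of the residuals."""
    years = list(range(n))
    y = [3 + 0.1 * t + e for t, e in zip(years, ar1_errors(n, rho))]
    a, b = ols_line(years, y)
    resid = [yi - (a + b * t) for t, yi in zip(years, y)]
    return durbin_watson(resid)


print("independent errors:", round(residual_dw(0.0), 2))  # expected to be near 2
print("AR(1), rho = 0.8:  ", round(residual_dw(0.8), 2))  # expected to be well below 2
```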
Multiple Regression Assumptions
• More things that cause patterns in error
(autocorrelation):
– Measuring data in families: members are similar, so they will have correlated error
– Measuring data in geographic space.
• Example: data on 50 US states. States in a similar region
have correlated error
• Called “spatial autocorrelation”
• There are variations of regression models to
address each kind of correlated error.
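Why family members have correlated error can be illustrated by splitting each person's error into a part shared by the whole family and a part unique to the individual. In this hypothetical sketch (two siblings per family, shared and individual components each with variance 1, all values assumed), the correlation between siblings' errors should come out near 1 / (1 + 1) = 0.5, the share of error variance that is shared. With no shared component it should be near 0.

```python
import random


def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5


def sibling_errors(n_families=5000, sd_family=1.0, sd_person=1.0, seed=2):
    """Errors for two siblings per family: shared family part + individual part."""
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n_families):
        shared = rng.gauss(0, sd_family)  # identical for both siblings
        first.append(shared + rng.gauss(0, sd_person))
        second.append(shared + rng.gauss(0, sd_person))
    return first, second


e1, e2 = sibling_errors()
print("correlation of sibling errors:", round(pearson(e1, e2), 2))
```

Treating these siblings as independent cases would overstate how much information the sample contains, which is why the independence assumption matters.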
Regression: Outliers
• Note: Even if regression assumptions are met,
slope estimates can have problems
• Example: Outliers -- cases with extreme values
that differ greatly from the rest of your sample
• More formally: “influential cases”
• Outliers can result from:
• Errors in coding or data entry
• Highly unusual cases
• Or, sometimes they reflect important “real” variation
• Even a few outliers can dramatically change
estimates of the slope, especially if N is small.
Regression: Outliers
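• Outlier Example: Extreme case that pulls regression line up
[Figure: scatterplot (both axes from -4 to 4) showing the regression line pulled up by an extreme case, alongside the regression line with the extreme case removed from the sample]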
Regression: Outliers
• Strategy for identifying outliers:
• 1. Look at regression partial plots (avplots) for
extreme values
• 2. Compute outlier diagnostic statistics
– High values indicate potential outliers
• “Leverage”
• Cook’s D
• DFFITS
• DFBETA
• Residuals, standardized residuals, studentized residuals
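For a regression with a single predictor, the diagnostics listed above have simple closed forms. The sketch below computes leverage, standardized and studentized residuals, Cook's D, and DFFITS by hand, plus DFBETA for the slope as the raw change when a case is dropped (many packages report a scaled version instead). It is applied to the seven-case study-time data from the scatterplot example on the next slide. The function name `diagnostics` is hypothetical; in practice, statistical software computes these for you.

```python
def fit(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def diagnostics(x, y):
    """Per-case outlier diagnostics for y = a + b*x (p = 2 parameters)."""
    n, p = len(x), 2
    a, b = fit(x, y)
    mx = sum(x) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    resid = [yi - (a + b * xi) for xi, yi in zip(x, y)]
    s2 = sum(e * e for e in resid) / (n - p)  # residual variance
    rows = []
    for i, (xi, e) in enumerate(zip(x, resid)):
        h = 1 / n + (xi - mx) ** 2 / sxx  # leverage
        r = e / (s2 * (1 - h)) ** 0.5  # standardized residual
        t = r * ((n - p - 1) / (n - p - r * r)) ** 0.5  # studentized residual
        cooks_d = r * r * h / (p * (1 - h))
        dffits = t * (h / (1 - h)) ** 0.5
        # Raw DFBETA for the slope: refit without case i.
        _, b_drop = fit(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        rows.append({"case": i + 1, "leverage": h, "std_resid": r,
                     "stud_resid": t, "cooks_d": cooks_d,
                     "dffits": dffits, "dfbeta_slope": b - b_drop})
    return rows


# Study-time example: X = hours studying per day, Y = reading score.
hours = [2.6, 1.4, 0.65, 4.1, 0.25, 1.9, 3.5]
score = [28, 13, 17, 31, 8, 16, 6]

keys = ("leverage", "std_resid", "stud_resid", "cooks_d", "dffits", "dfbeta_slope")
print("case", *keys)
for row in diagnostics(hours, score):
    print(row["case"], *(f"{row[k]:7.3f}" for k in keys))
```

With these data, case 7 (3.5 hours, score 6) has the largest Cook's D and the largest studentized residual, while case 4 (4.1 hours) has the highest leverage. High leverage alone does not make a case influential; it matters when combined with a large residual.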
Scatterplots
• Example: Study time and student achievement.
– X variable: Average # hours spent studying per day
– Y variable: Score on reading test
Case   X      Y
1      2.6    28
2      1.4    13
3      0.65   17
4      4.1    31
5      0.25   8
6      1.9    16
7      3.5    6
[Scatterplot of the seven cases: Y axis = reading score (0 to 30), X axis = hours spent studying per day (0 to 4)]
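The sketch below refits the study-time regression with and without case 7 to show how much a single case can move the slope when N is small. Case 7 (3.5 hours of study but a score of only 6) appears to be the extreme case in this sample. This is a hand-rolled OLS slope for illustration. Worked by hand, the slope is roughly 3.1 points per hour with all seven cases and roughly 5.7 without case 7.

```python
def ols_slope(x, y):
    """Slope of a simple OLS regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


# Study-time data from the table above (cases 1-7, in order).
hours = [2.6, 1.4, 0.65, 4.1, 0.25, 1.9, 3.5]
score = [28, 13, 17, 31, 8, 16, 6]

slope_all = ols_slope(hours, score)
slope_without_7 = ols_slope(hours[:6], score[:6])  # drop case 7 (3.5 hours, score 6)
print(f"all 7 cases:    {slope_all:.2f}")
print(f"without case 7: {slope_without_7:.2f}")
```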
Outliers
• Results with outlier:
Model Summary