Regression

15: Linear Regression
Expected change in Y per unit X

Introduction
(p. 15.1)
• X = independent (explanatory) variable

• Y = dependent (response) variable
• Use instead of correlation
when distribution of X is fixed by researcher (i.e.,
set number at each level of X)
studying functional dependency between X and Y
Illustrative data (bicycle.sav)
(p. 15.1)
• Same as prior chapter

• X = percent receiving 60
reduce or free meal 50
(RFM) 40
• Y = percent using 30
helmets (HELM) 20
• n = 12 (outlier 10
removed to study
0
0 20 40 60 80 100
linear relation)
X - % Receiving Reduced Fee School Lunch
Regression Model (Equation)
(p. 15.2)
yˆ  a  bX “y hat”
where
yˆ represents predicted average of Y at a given X
a represents the line's intercept
b represents the line's slope
How formulas determine best line
(p. 15.2)
• Distance of points
from line = 60
residuals (dotted)
50
40
• Minimizes sum of 30
square residuals 20
• Least squares
10
regression line
0 20 40 60 80 100

Formulas for Least Squares Coefficients
with Illustrative Data (p. 15.2 – 15.3)
SS XY  4231.1333
b   0.539
SS XX 7855.67
a  y  bx  30.8833  (0.539)(30.8333)  47.49
SPSS output:
Alternative formula for slope
sY r
b
sX
Interpretation of Slope (b)
(p. 15.3)
• b = expected change in Y
per unit X 60
• Keep track of units! 50

Intercept
– Y = helmet users per 100 40
1 Unit X
– X = % receiving free lunch 30 Slope
• e.g., b of –0.54 predicts 20
decrease of 0.54 units of Y 10
for each unit X 0

0 20 40 60 80 100

Predicting Average Y
• ŷ = a + bx
Predicted Y = intercept + (slope)(x)
HELM = 47.49 + (–0.54)(RFM)
• What is predicted HELM when RFM = 50?
ŷ = 47.49 + (–0.54)(50) = 20.5
Average HELM predicted to be 20.5 in neighborhood
where 50% of children receive reduced or free meal
• What is average Y when x = 20?
ŷ = 47.49 +(–0.54)(20) = 36.7
Confidence Interval for Slope Parameter
(p. 15.4)
95% confidence Interval for ß = b  (t n  2 ,.975 )( seb )
where
b = point estimate for slope
tn-2,.975 = 97.5th percentile (from t table or StaTable)
seb = standard error of slope estimate (formula 5)
seY | x
seb  SSYY  b( SS XY )
SS XX seY | x 
n2
standard error of regression
Illustrative Example (bicycle.sav)
• 95% confidence interval for 
= –0.54 ± (t10,.975)(0.1058)
= –0.54 ± (2.23)(0.1058)
= –0.54 ± 0.24
= (–0.78, –0.30)
Interpret 95% confidence interval
Model:
Point estimate for slope (b) = –0.54
Standard error of slope (seb) = 0.24
95% confidence interval for  = (–0.78, –0.30)
Interpretation:
slope estimate = –0.54 ± 0.24
We are 95% confident the slope parameter falls
between –0.78 and –0.30
Significance Test
(p. 15.5)
• H0: ß = 0
• tstat (formula 7) with df = n – 2
• Convert tstat to p value
b  0.539
t stat    5.09
seb 0.1058
df =12 – 2 = 10
p = 2×area beyond tstat on t10
Use t table and StaTable
Regression ANOVA
(not in Reader & NR)
• SPSS also does an analysis of variance on regression model

• Sum of squares of fitted values around grand mean = (ŷi – ÿ)²
• Sum of squares of residuals around line = (yi– ŷi )²
• Fstat provides same p value as tstat
• Want to learn more about relation between ANOVA and regression?
(Take regression course)
Distributional Assumptions
(p. 15.5)
• Linearity
• Independence
• Normality
• Equal variance X
Regression Line
Distribution of Y
Y
Validity Assumptions
(p. 15.6)
• Data = farr1852.sav 200
X = mean elevation above sea level

Y = cholera mortality per 10,000
Mean mortality from Cholera per 10,000

100
80
• Scatterplot (right) shows negative 60
correlation 40
• Correlation and regression

computations reveal: 20
r = -0.88
10
ŷ = 129.9 + (-1.33)x 8
p = .009
6
0 100 200 300 400
• Farr used these results to support Mean Elevation of the Ground above the Higwater Mark
miasma theory and refute contagion
theory
– But data not valid (confounded by
“polluted water source”)

Regression

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Regression

Uploaded by

Copyright:

Available Formats

15: Linear Regression

Expected change in Y per unit X

• X = independent (explanatory) variable

• Same as prior chapter

reduce or free meal 50

X - % Receiving Reduced Fee School Lunch

• Keep track of units! 50

• e.g., b of –0.54 predicts 20

decrease of 0.54 units of Y 10

for each unit X 0

X - % Receiving Reduced Fee School Lunch

• SPSS also does an analysis of variance on regression model

X = mean elevation above sea level

Mean mortality from Cholera per 10,000

• Correlation and regression

You might also like