
Simple Linear Regression, Part Deux

Agenda
• Hypothesis test for regression model
– Assumptions
• Confidence Interval
• Correlation vs. Regression
Hypothesis Test!
• Simple Linear Regression (i.e., only one x)
– Test r for significance
or
– Test b1 for significance
• Answers the questions:
– Is the slope significant?
– Is the simple linear regression equation significant?
– Is there a linear relationship between x and y?
– Does knowing x help predict y?
– Does x explain a significant amount of variance in y?
Regression Hypothesis Test!
• Steps
– 1. Assumptions
– 2. Hypotheses
– 3. Calculate t
• Will need b1 and SEb1
– 4. Find t*
– 5. Conclusion
Example
• You survey 9 recent college graduates and ask
their high school and college GPA. When you
analyze this data, you find the following
regression equation.
cGPA = 1.048 + 0.675 (hsGPA)

#     hsGPA  cGPA
1     3.09   2.98
2     3.62   3.60
3     2.63   2.87
4     3.20   3.00
5     2.36   2.11
6     2.98   3.25
7     2.06   2.80
8     2.13   2.50
9     3.13   3.33
mean  2.80   2.94
sd    0.53   0.45
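The regression equation above can be reproduced from the nine data points. A minimal Python sketch, assuming the usual least-squares formulas b1 = Sxy / Sxx and b0 = mean(y) - b1 * mean(x); the slides do not show these formulas, and the variable names are illustrative only:

```python
import math

hs_gpa = [3.09, 3.62, 2.63, 3.20, 2.36, 2.98, 2.06, 2.13, 3.13]
c_gpa = [2.98, 3.60, 2.87, 3.00, 2.11, 3.25, 2.80, 2.50, 3.33]

n = len(hs_gpa)
x_bar = sum(hs_gpa) / n
y_bar = sum(c_gpa) / n

# Sums of squares and cross-products around the means
sxx = sum((x - x_bar) ** 2 for x in hs_gpa)
syy = sum((y - y_bar) ** 2 for y in c_gpa)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hs_gpa, c_gpa))

b1 = sxy / sxx                   # slope
b0 = y_bar - b1 * x_bar          # intercept
r = sxy / math.sqrt(sxx * syy)   # Pearson correlation

print(f"cGPA = {b0:.3f} + {b1:.3f} (hsGPA), r = {r:.3f}")
```

The fitted values should agree with the slide's equation and r to three decimals.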
Assumptions
• 1. Random sample of all possible values of yi
for each xi
– For each hsGPA, need a random sample of all
possible cGPAs
• 2. Independent observations
• 3. x and y are linearly related
• 4. Residuals are normally distributed at each x
• 5. Residuals have equal variance
Linear Regression

How can we check assumptions 3, 4, 5?


1. Random Sample
2. Independent observations
3. Linearity
4. Residuals are normal
5. Residuals have equal variance

→ Look at the Residual Plot



Residual Plot

• X values are on the horizontal axis


• Residuals are on the vertical axis
• residual = y – ŷ
• You want to see a random scatter of
residuals around zero
• Any strong patterns indicate a failure of
assumptions
• E.g. funnels, curves, etc.
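The residuals behind a residual plot can be computed directly. A sketch in Python, assuming the GPA data from the example and a plain least-squares fit; it prints x and residual pairs rather than drawing a plot:

```python
hs_gpa = [3.09, 3.62, 2.63, 3.20, 2.36, 2.98, 2.06, 2.13, 3.13]
c_gpa = [2.98, 3.60, 2.87, 3.00, 2.11, 3.25, 2.80, 2.50, 3.33]

n = len(hs_gpa)
x_bar = sum(hs_gpa) / n
y_bar = sum(c_gpa) / n
sxx = sum((x - x_bar) ** 2 for x in hs_gpa)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hs_gpa, c_gpa))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# residual = y - y_hat for every observation
residuals = [y - (b0 + b1 * x) for x, y in zip(hs_gpa, c_gpa)]

# List the points in x order, as they would appear left to right on the plot
for x, e in sorted(zip(hs_gpa, residuals)):
    print(f"hsGPA {x:.2f}  residual {e:+.3f}")
```

Least-squares residuals always sum to zero, so the scatter is centered on the horizontal zero line by construction; what matters is whether any pattern remains.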
Original Data → Residual Plot
[Figure: left panel, "Correlation of hs and college GPAs", scatterplot of cGPA vs. hsGPA, r = .806; right panel, "Residual Plot", residuals vs. hsGPA]
What’s wrong with these residuals?
[Figure: residual plot with its original data] Violates linearity assumption!
[Figure: residual plot with its original data] Violates equal variance!
[Figure: three residual plots side by side] Nothing wrong! / Not linear! / No equal variance!
Check for Normal Residuals
Can:
1. Look at residual plot (want a random scatter)
2. Make a QQ-plot of the residuals
3. Make a histogram of the residuals
[Figure: probability density of the residuals at each X]
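The numbers behind a QQ-plot of the residuals can be sketched as follows. This assumes the GPA example data, a least-squares fit, and plotting positions (i - 0.5) / n, which is one common convention and not something the slides specify:

```python
import math
from statistics import NormalDist

hs_gpa = [3.09, 3.62, 2.63, 3.20, 2.36, 2.98, 2.06, 2.13, 3.13]
c_gpa = [2.98, 3.60, 2.87, 3.00, 2.11, 3.25, 2.80, 2.50, 3.33]

n = len(hs_gpa)
x_bar = sum(hs_gpa) / n
y_bar = sum(c_gpa) / n
sxx = sum((x - x_bar) ** 2 for x in hs_gpa)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hs_gpa, c_gpa))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
residuals = [y - (b0 + b1 * x) for x, y in zip(hs_gpa, c_gpa)]

# Sample quantiles: the residuals in increasing order
ordered = sorted(residuals)
# Theoretical quantiles of a standard normal at the assumed plotting positions
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    saa = sum((u - ma) ** 2 for u in a)
    sbb = sum((v - mb) ** 2 for v in b)
    return sab / math.sqrt(saa * sbb)

# A value near 1 means the QQ points fall close to a straight line
qq_corr = corr(theoretical, ordered)
for q, e in zip(theoretical, ordered):
    print(f"theoretical {q:+.3f}  residual {e:+.3f}")
print(f"QQ correlation: {qq_corr:.3f}")
```

With only nine points this is a rough check at best; the plot itself is still the thing to look at.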
Hypotheses
• Example: You survey 9 recent college graduates and
ask their high school and college GPA. When you
analyze this data, you find the following regression
equation.
CGPA = 1.048 + 0.675 (HSGPA)
• Hypotheses
H0: β1 = 0
HA: β1 ≠ 0
Hypothesis Test
• 1. Assumptions
• 2. Hypotheses
• 3. t = ?
Given: CGPA = 1.048 + 0.675 (HSGPA), SEb1 = ?, n = 9
To b, or not to β
• Step 3: Calculate t test statistic
• find b1 and SEb1
• y = b0 + b1x
– b0 and b1 are estimates of the true regression
parameters (β0 and β1)
– therefore, there is some error with b0 and b1
• Standard error!
• We hope it is small
Standard Error of b1
• Standard error of b1 (will be provided)
SEb1 = 0.187 in our example
• Indicates the confidence we have in our
estimate of the slope
• Use b1 and SEb1 for
– HYPOTHESIS TEST
• Does the slope significantly differ from zero?
• CONFIDENCE INTERVAL for β1
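The slides provide SEb1 rather than deriving it. For reference, a sketch of where 0.187 comes from, assuming the standard formula SEb1 = sqrt(MSE / Sxx) with MSE = SSE / (n - 2); this formula is not shown in the slides:

```python
import math

hs_gpa = [3.09, 3.62, 2.63, 3.20, 2.36, 2.98, 2.06, 2.13, 3.13]
c_gpa = [2.98, 3.60, 2.87, 3.00, 2.11, 3.25, 2.80, 2.50, 3.33]

n = len(hs_gpa)
x_bar = sum(hs_gpa) / n
y_bar = sum(c_gpa) / n
sxx = sum((x - x_bar) ** 2 for x in hs_gpa)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hs_gpa, c_gpa))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Sum of squared residuals, then mean squared error on n - 2 degrees of freedom
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hs_gpa, c_gpa))
mse = sse / (n - 2)

# Standard error of the slope
se_b1 = math.sqrt(mse / sxx)
print(f"SEb1 = {se_b1:.3f}")
```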
Hypothesis Test
• 1. Assumptions
• 2. Hypotheses
• 3. t = b1 / SEb1 = 0.675 / 0.187 = 3.610
Given: CGPA = 1.048 + 0.675 (HSGPA), SEb1 = 0.187, n = 9
Hypothesis Test
• 1. Assumptions
• 2. Hypotheses
• 3. t = 3.61
• 4. t* = 2.365 (df = 7)
• NOTE: df = n - 2
Given: CGPA = 1.048 + 0.675 (HSGPA), SEb1 = 0.187, n = 9
Conclusion
• |t| > t*
– REJECT THE NULL
– The slope is significant (t=3.61, df=7, p<.05).
– The regression equation is significant.
– The equation does significantly predict y from x.
– There is a significant linear relationship between x
and y.
– X explains a significant amount of variance in y.
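Steps 3 to 5 can be sketched in a few lines of Python. The function name is hypothetical; the inputs are the slide's b1, SEb1, and the tabled t* for df = n - 2 = 7:

```python
def slope_t_test(b1, se_b1, t_crit):
    """Return the t statistic and whether |t| exceeds the critical value."""
    t = b1 / se_b1
    return t, abs(t) > t_crit

n = 9
df = n - 2  # degrees of freedom for simple linear regression
t_stat, reject_null = slope_t_test(b1=0.675, se_b1=0.187, t_crit=2.365)

print(f"t = {t_stat:.2f}, df = {df}, reject H0: {reject_null}")
```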
Linear Regression in R
[Figure: R regression output with annotations marking y, x, b0, b1, SEb1, t, and p]
cGPA = 1.0478 + 0.675 hsGPA
Note 1: t = b1 / SEb1
Note 2: t* not listed!
Linear Regression in R
Chandler had a hsGPA of 2.75 and a cGPA of 2.25. What is his residual?
cGPA = 1.0478 + 0.675(hsGPA)
Residual = y - ŷ
x = 2.75
y = 2.25
ŷ = ????
Plug in the x to the equation to find ŷ!
ŷ = 1.0478 + 0.675(2.75) = 2.904
Residual = y - ŷ = 2.25 - 2.904 = -0.654
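The same residual calculation as a short Python sketch. The function names are illustrative; the coefficients are the ones from the R output quoted in the slides:

```python
def predict_cgpa(hs_gpa, b0=1.0478, b1=0.675):
    """Predicted college GPA from the fitted line."""
    return b0 + b1 * hs_gpa

def residual(observed, predicted):
    """Residual = observed y minus predicted y."""
    return observed - predicted

y_hat = predict_cgpa(2.75)
chandler_residual = residual(2.25, y_hat)

print(f"y_hat = {y_hat:.3f}, residual = {chandler_residual:.3f}")
```

A negative residual means Chandler's college GPA is lower than the line predicts for his high school GPA.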
Confidence Interval
• Use our sample slope (b1 = 0.675) to estimate the
population slope (β1)
• CGPA = 1.048 + 0.675 (HSGPA)
• Confidence Interval for the slope
• b1 – (t*)(SE) < β1 < b1 + (t*)(SE)
SE = 0.187
t* = 2.365
• We are 95% confident that the true
population slope lies between 0.233 and 1.117
• 0.233 < β1 < 1.117
– Unlike r, b1 can be any number
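The interval arithmetic, sketched in Python with the slide's values (the function name is hypothetical):

```python
def slope_confidence_interval(b1, se_b1, t_crit):
    """Return (lower, upper) bounds: b1 -/+ t* x SE."""
    margin = t_crit * se_b1
    return b1 - margin, b1 + margin

lower, upper = slope_confidence_interval(b1=0.675, se_b1=0.187, t_crit=2.365)
print(f"{lower:.3f} < beta1 < {upper:.3f}")

# The interval excludes zero exactly when |t| > t* in the hypothesis test
excludes_zero = lower > 0 or upper < 0
```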
Squarecap: Try it!

ID#   PlantMass  Seeds
1     62         45
2     135        180
3     218        390
4     210        310
5     80         120
6     55         70
7     70         100
8     130        160
9     150        210
10    222        340
Mean  133.20     192.50
SD    66.06      118.63

r = .9787
SEb1 = .1305; t* = 2.306
• 1. Find the regression equation
• 2. Is the equation significant?
• 3. Confidence Interval?
• 5. A plant has a mass of 125 grams and 140
seeds. What is its residual?
Squarecap: Try it!
• 1. Find the regression equation
Seeds = -39.668 + 1.743(mass)
• 2. Is the equation significant?
Yes, p<.05
• 3. Confidence Interval? (1.44, 2.04)
• 5. A plant has a mass of 125 grams and 140
seeds. What is its residual? 140 - 178.21 = -38.21
Does it match R?
Seeds = -41.586 + 1.757(mass)
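To see which equation the raw data support, here is a sketch that fits least squares directly to the ten plants. It should land close to R's coefficients; the gap to the hand-calculated answer presumably comes from rounding intermediate values, which is an assumption since the slides do not say:

```python
plant_mass = [62, 135, 218, 210, 80, 55, 70, 130, 150, 222]
seeds = [45, 180, 390, 310, 120, 70, 100, 160, 210, 340]

n = len(plant_mass)
x_bar = sum(plant_mass) / n
y_bar = sum(seeds) / n

# Least-squares slope and intercept from sums around the means
sxx = sum((x - x_bar) ** 2 for x in plant_mass)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(plant_mass, seeds))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

print(f"Seeds = {b0:.3f} + {b1:.3f}(mass)")
```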
Point(s) of Order!
• For simple linear regression (one x)
• Are x and y significantly, linearly related?
– can use t = r / SEr or t = b1 / SEb1
• Use what you are given in the problem
• Fun fact! If your data (both x and y) are
standardized, then your r = b1 and b0 = 0.
y = rx
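The fun fact can be checked numerically. A sketch assuming the GPA example data and z-scores computed with the sample standard deviation; the helper names are illustrative:

```python
import math
from statistics import mean, stdev

hs_gpa = [3.09, 3.62, 2.63, 3.20, 2.36, 2.98, 2.06, 2.13, 3.13]
c_gpa = [2.98, 3.60, 2.87, 3.00, 2.11, 3.25, 2.80, 2.50, 3.33]

def standardize(values):
    """Convert raw scores to z-scores."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]

def fit_line(x, y):
    """Least-squares intercept and slope."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope

def pearson_r(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(hs_gpa, c_gpa)
b0_z, b1_z = fit_line(standardize(hs_gpa), standardize(c_gpa))

print(f"r = {r:.4f}, standardized slope = {b1_z:.4f}, intercept = {b0_z:.4f}")
```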
Summary
• Under H0 (β1 = 0), the estimate of the slope divided by its
standard error follows a Student’s t
distribution with n-2 degrees of freedom
• We can perform hypothesis tests or calculate
confidence intervals for β1 using b1 and SEb1
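The "p<.05" conclusions come from the tails of that t distribution. A sketch that gets a two-sided p-value by numerically integrating the Student's t density with Simpson's rule; this uses only the standard library and is an illustration, not how R computes it:

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p(t, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and |t|."""
    b = abs(t)
    h = b / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t

p_example = two_sided_p(3.61, df=7)     # GPA example: t = 3.61, df = 7
p_at_critical = two_sided_p(2.365, df=7)  # at t*, p should be about .05

print(f"p at t = 3.61: {p_example:.4f}")
print(f"p at t* = 2.365: {p_at_critical:.4f}")
```

This shows why comparing |t| to t* and comparing p to .05 give the same decision.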
Correlation vs. Regression
[Figure: diagram with regions labeled Correlation, BOTH, and Regression]
Correlation & Regression: Similarities
• Two continuous variables
• Linear relationship
– can determine direction (positive, negative)
• Strongly influenced by outliers
• Use t-distribution for hypothesis test + CI
• Hyp test: Is one variable linearly related to another?
– signif correlation?
– signif slope?

• If your x and y scores are standardized, then b1 = r


Correlation & Regression: Differences

Correlation
• ρ, r
• order of x, y doesn’t matter
• no units

Regression
• ŷi = b0 + b1xi
• Use a sample to create the model (regression line)
– Use model to make predictions of y
• units matter
• order of x, y matters
• R2 = how good is model
Correlation & Regression: Assumptions

Correlation
• Assumptions
– 1. Random sample
– 2. Independent obs
– 3. x,y come from a bivariate normal distribution

Regression
• Assumptions
– 1. Random sample of all possible values of yi for each xi
– 2. Independent observations
– 3. Data are linearly related
– 4. Residuals are normally distributed
– 5. Residuals have equal variance
Correlation & Regression: Hypotheses

Correlation
• Hypotheses
– H0: ρ = 0
– HA: ρ ≠ 0
• Is the correlation significant?
• Is there a linear relationship between x and y?

Regression
• Hypotheses
– H0: β1 = 0
– HA: β1 ≠ 0
• Is the slope significant?
• Can I use x to predict y?
• Is there a linear relationship between x and y?
Correlation & Regression Calculations

Correlation
• Finding r
• Hypothesis test

Regression
• Finding regression line
• Hypothesis test
– SE will be given
Squarecap!
• Lizards’ biteforce vs. territory
• Given: n=11, t*= 2.262

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -31.539   23.513
BiteForce     11.677    4.848

• 6. What is your t test statistic?
• 7. What is your conclusion?
• 8. Will the 95%CI have zero in it?
• 9. If a lizard has a bite force of 5N and a territory
of 25 m², what is its residual?
Squarecap!
• Territory = -31.539 + 11.677(BiteForce)
• Given: n=11, t*= 2.262

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -31.539   23.513
BiteForce     11.677    4.848

• 6. What is your t test statistic? t = 11.677 / 4.848 = 2.409
• 7. What is your conclusion? Reject null; p<.05
• 8. Will the 95%CI have zero in it? no
• 9. If a lizard has a bite force of 5N and a territory of 25 m²,
what is its residual? predicted territory = -31.539 + 11.677(5) = 26.846;
resid = 25 - 26.846 = -1.846 m²
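All four answers follow from the two rows of the coefficient table. A sketch in Python using only the numbers given in the problem (variable names are illustrative):

```python
# Values read from the regression output in the problem
b0, b1 = -31.539, 11.677
se_b1 = 4.848
t_crit = 2.262  # given t* for n = 11, df = 9

# 6. Test statistic for the slope
t_stat = b1 / se_b1

# 7. Reject H0 when |t| exceeds t*
reject_null = abs(t_stat) > t_crit

# 8. 95% confidence interval for the slope
ci_lower = b1 - t_crit * se_b1
ci_upper = b1 + t_crit * se_b1
ci_contains_zero = ci_lower <= 0 <= ci_upper

# 9. Residual for a lizard with bite force 5 N and territory 25 m^2
predicted_territory = b0 + b1 * 5
lizard_residual = 25 - predicted_territory

print(f"t = {t_stat:.3f}, reject H0: {reject_null}")
print(f"95% CI: ({ci_lower:.3f}, {ci_upper:.3f}), contains zero: {ci_contains_zero}")
print(f"predicted = {predicted_territory:.3f}, residual = {lizard_residual:.3f}")
```

Questions 7 and 8 always agree: rejecting the null at the .05 level is equivalent to the 95% CI excluding zero.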
Coming Up (TTh11)…
• Friday
– Preliminary analysis due by 11:59pm!
• Word doc + final data set
• Tuesday
– Class: ANOVA
• Wednesday
– Prelab + lab 9
Coming Up (TTh2)…
• Friday
– Preliminary analysis due by 11:59pm!
• Word doc + final data set
• Tuesday
– Class: ANOVA
– Prelab + lab 9
Coming Up (MW)…
• Tuesday
– Prelab + lab 9
• Wednesday
– Class: ANOVA
• Friday
– HW 6 due to canvas by 11:59pm!
