19 SL Regression 2 320E F21

Simple Linear Regression, Part Deux
Agenda
• Hypothesis test for regression model
– Assumptions
• Confidence Interval
• Correlation vs. Regression
Hypothesis Test!
• Simple Linear Regression (i.e., only one x)
– Test r for significance
or
– Test b1 for significance
• Answers the questions:
– Is the slope significant?
– Is the simple linear regression equation significant?
– Is there a linear relationship between x and y?
– Does knowing x help predict y?
– Does x explain a significant amount of variance in y?
Regression Hypothesis Test!
• Steps
– 1. Assumptions
– 2. Hypotheses
– 3. Calculate t
• Will need b1 and SEb1
– 4. Find t*
– 5. Conclusion
Example
• You survey 9 recent college graduates and ask
their high school and college GPA. When you
analyze this data, you find the following
regression equation.
cGPA = 1.048 + 0.675 (hsGPA)

#     hsGPA  cGPA
1     3.09   2.98
2     3.62   3.60
3     2.63   2.87
4     3.20   3.00
5     2.36   2.11
6     2.98   3.25
7     2.06   2.80
8     2.13   2.50
9     3.13   3.33
mean  2.80   2.94
sd    0.53   0.45
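As a cross-check, the regression equation can be recovered from the nine data points. This is a minimal sketch using the usual least-squares formulas (slope = Sxy / Sxx, intercept = ȳ – b1·x̄); the function name `fit_line` is made up for illustration and is not from the slides.

```python
# GPA data from the example table (9 recent graduates).
hs_gpa = [3.09, 3.62, 2.63, 3.20, 2.36, 2.98, 2.06, 2.13, 3.13]
c_gpa = [2.98, 3.60, 2.87, 3.00, 2.11, 3.25, 2.80, 2.50, 3.33]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Sum of cross-products and sum of squares about the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx            # slope
    b0 = y_bar - b1 * x_bar   # intercept
    return b0, b1


b0, b1 = fit_line(hs_gpa, c_gpa)
print(f"cGPA = {b0:.3f} + {b1:.3f} (hsGPA)")
```

The fitted coefficients should agree with the equation on the slide up to rounding.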
Assumptions
• 1. Random sample of all possible values of yi
for each xi
– For each hsGPA, need a random sample of all
possible cGPAs
• 2. Independent observations
• 3. x and y are linearly related
• 4. Residuals are normally distributed at each x
• 5. Residuals have equal variance
Linear Regression
How can we check assumptions 3, 4, 5?

1. Random Sample
2. Independent observations
3. Linearity
4. Residuals are normal
5. Residuals have equal variance
→ Look at the Residual Plot

Linear Regression
Residual Plot
• X values are on the horizontal axis

• Residuals are on the vertical axis
• residual = y – ŷ
• You want to see a random scatter of
residuals around zero
• Any strong patterns indicate a failure of
assumptions
• E.g. funnels, curves, etc.
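To make the definition concrete, here is a small sketch that computes residual = y – ŷ for each graduate in the GPA example, using the rounded coefficients from the slides. Plotting these values against hsGPA would give the residual plot; the plotting itself is left out to keep the sketch dependency-free.

```python
# GPA data from the example table.
hs_gpa = [3.09, 3.62, 2.63, 3.20, 2.36, 2.98, 2.06, 2.13, 3.13]
c_gpa = [2.98, 3.60, 2.87, 3.00, 2.11, 3.25, 2.80, 2.50, 3.33]

# Rounded coefficients of the fitted line, as given on the slides.
b0, b1 = 1.048, 0.675

# residual = observed y minus predicted y-hat, one per observation.
residuals = [y - (b0 + b1 * x) for x, y in zip(hs_gpa, c_gpa)]

for x, e in zip(hs_gpa, residuals):
    print(f"hsGPA = {x:.2f}   residual = {e:+.3f}")

# For a least-squares line the residuals sum to (about) zero; here the
# sum is only near zero because the coefficients are rounded.
print(f"sum of residuals = {sum(residuals):+.3f}")
```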
Original Data → Residual Plot
[Figure: left panel, "Correlation of hs and college GPAs": scatterplot of cGPA vs. hsGPA, r = .806. Right panel, "Residual Plot": residuals vs. hsGPA.]
What’s wrong with these residuals?
[Figure: three residual plots, shown with their original data]
• Plot 1: Nothing wrong!
• Plot 2: Not linear! Violates linearity assumption!
• Plot 3: No equal variance! Violates equal variance!
Check for Normal Residuals
Can: 1. Look at residual plot (want a random scatter)
2. Make a QQ-plot of the residuals
3. Make a histogram of the residuals.
[Figure: probability density of the residuals around the linear regression line, plotted against X]
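A QQ-plot pairs the sorted residuals with the quantiles a normal distribution would produce. The sketch below builds those pairs for the GPA example without plotting them. The plotting position (i – 0.5) / n is one common convention and is an assumption here, not something the slides specify.

```python
from statistics import NormalDist

# GPA data and rounded fitted line from the example.
hs_gpa = [3.09, 3.62, 2.63, 3.20, 2.36, 2.98, 2.06, 2.13, 3.13]
c_gpa = [2.98, 3.60, 2.87, 3.00, 2.11, 3.25, 2.80, 2.50, 3.33]
b0, b1 = 1.048, 0.675

# Sorted residuals (y minus y-hat).
residuals = sorted(y - (b0 + b1 * x) for x, y in zip(hs_gpa, c_gpa))
n = len(residuals)

# Standard-normal quantiles at plotting positions (i - 0.5) / n
# (assumed convention; other choices exist).
theoretical = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]

# If the residuals are roughly normal, these pairs fall near a straight line.
for q, e in zip(theoretical, residuals):
    print(f"normal quantile = {q:+.2f}   residual = {e:+.3f}")
```

With only 9 points, a QQ-plot can look ragged even when the normality assumption holds, so read it loosely.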
Hypotheses
• Example: You survey 9 recent college graduates and
ask their high school and college GPA. When you
analyze this data, you find the following regression
equation.
CGPA = 1.048 + 0.675 (HSGPA)
• Hypotheses
H0: β1 = 0
HA: β1≠ 0
Hypothesis Test
• 1. Assumptions
• 2. Hypotheses
• 3. t = ?
Given: CGPA = 1.048 + 0.675 (HSGPA); SEb1 = ?; n = 9
To b, or not to β
• Step 3: Calculate t test statistic
• find b1 and SEb1
• y = b0 + b1x
– b0 and b1 are estimates of the true regression
parameters (β0 and β1)
– therefore, there is some error with b0 and b1
• Standard error!
• We hope it is small
Standard Error of b1
• Standard error of b1 (will be provided)
SEb1 = 0.187 in our example
• Indicates the confidence we have in our
estimate of the slope
• Use b1 and SEb1 for
– HYPOTHESIS TEST
• Does the slope significantly differ from zero?
– CONFIDENCE INTERVAL for β1
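The slides provide SEb1, but it may help to see where the number comes from. This sketch uses the standard formula SEb1 = sqrt( [SSE / (n – 2)] / Sxx ), which is not stated in the slides and is included here as background.

```python
import math

# GPA data from the example table.
hs_gpa = [3.09, 3.62, 2.63, 3.20, 2.36, 2.98, 2.06, 2.13, 3.13]
c_gpa = [2.98, 3.60, 2.87, 3.00, 2.11, 3.25, 2.80, 2.50, 3.33]

n = len(hs_gpa)
x_bar = sum(hs_gpa) / n
y_bar = sum(c_gpa) / n

# Least-squares slope and intercept.
sxx = sum((x - x_bar) ** 2 for x in hs_gpa)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hs_gpa, c_gpa))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Sum of squared residuals, then the residual variance on n - 2 df.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hs_gpa, c_gpa))
mse = sse / (n - 2)

# Standard error of the slope.
se_b1 = math.sqrt(mse / sxx)
print(f"SEb1 = {se_b1:.3f}")
```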
Hypothesis Test
• 1. Assumptions
• 3. t = ?
Given: SEb1 = 0.187; n = 9
Hypothesis Test
• 1. Assumptions
• 3. t = 0.675 / 0.187 = 3.610
Given: SEb1 = 0.187; n = 9
Hypothesis Test
• 1. Assumptions
• 3. t = 3.61
Given: SEb1 = 0.187; n = 9
• 4. t* = 2.365 (df = 7)
• NOTE: df = n – 2
Conclusion
• |t| > t*
– REJECT THE NULL
– The slope is significant (t=3.61, df=7, p<.05).
– The regression equation is significant.
– The equation does significantly predict y from x.
– There is a significant linear relationship between x
and y.
– X explains a significant amount of variance in y.
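Steps 3 to 5 of the test amount to a few lines of arithmetic. The sketch below uses the values from the slides; t* is taken from the slide (a t table lookup for df = 7) rather than computed, and the variable names are illustrative.

```python
# Values given on the slides.
b1 = 0.675       # sample slope
se_b1 = 0.187    # standard error of the slope
n = 9            # sample size
t_star = 2.365   # critical value from the slides for df = 7

df = n - 2                       # degrees of freedom for the slope test
t = b1 / se_b1                   # step 3: test statistic
reject_null = abs(t) > t_star    # step 5: decision rule |t| > t*

print(f"t = {t:.3f}, df = {df}, t* = {t_star}")
print("Reject H0: the slope is significant" if reject_null
      else "Fail to reject H0")
```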
Linear Regression in R
[Figure: R regression output with annotations marking y, x, b0, b1, SEb1, t, and p]
cGPA = 1.0478 + 0.675hsGPA
Note1: t = b1 / SEb1
Note2: t* not listed!
Chandler had a hsGPA of 2.75 and a cGPA of 2.25. What is his residual?
cGPA = 1.0478 + 0.675(hsGPA)
Residual = y - ŷ
x = 2.75
y = 2.25
ŷ = ????
Plug in the x to the equation to find ŷ!
ŷ = 1.0478 + 0.675(2.75) = 2.904
Residual = y - ŷ = 2.25 - 2.904 = -0.654
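The same calculation as a small sketch. The function names `predict_cgpa` and `residual` are hypothetical helpers, and the coefficients are the ones from the R output on the slides.

```python
def predict_cgpa(hs_gpa):
    """Predicted college GPA (y-hat) from the fitted line on the slides."""
    return 1.0478 + 0.675 * hs_gpa


def residual(hs_gpa, c_gpa):
    """Residual = observed y minus predicted y-hat."""
    return c_gpa - predict_cgpa(hs_gpa)


chandler_residual = residual(2.75, 2.25)
print(f"y-hat = {predict_cgpa(2.75):.3f}, residual = {chandler_residual:.3f}")
```

A negative residual means Chandler's college GPA is lower than the line predicts for his high school GPA.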
Confidence Interval
• Use our sample slope (b1 = 0.675) to estimate the
population slope (β1)
• CGPA = 1.048 + 0.675 (HSGPA)
• Confidence Interval for the slope
• b1 – (t*)(SE) < β1 < b1 + (t*)(SE)
SE = 0.187
t* = 2.365
• We are 95% confident that the true
population slope lies between 0.233 and 1.117
• 0.233 < β1 < 1.117
– Unlike r, b1 can be any number
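The interval is b1 plus or minus the margin (t*)(SE). A minimal sketch with the slide values:

```python
# Values given on the slides.
b1 = 0.675       # sample slope
se_b1 = 0.187    # standard error of the slope
t_star = 2.365   # critical value for 95% confidence, df = 7

margin = t_star * se_b1
lower = b1 - margin
upper = b1 + margin
print(f"{lower:.3f} < beta1 < {upper:.3f}")

# The interval excludes zero exactly when |t| > t*, so it agrees
# with the two-sided hypothesis test at the same alpha.
contains_zero = lower <= 0 <= upper
print("CI contains zero:", contains_zero)
```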
Squarecap: Try it!

#     PlantMass  Seeds
1     62         45
2     135        180
3     218        390
4     210        310
5     80         120
6     55         70
7     70         100
8     130        160
9     150        210
10    222        340
Mean  133.20     192.50
SD    66.06      118.63

Given: r = .9787; SEb1 = .1305; t* = 2.306
• 1. Find the regression equation
• 2. Is the equation significant?
• 3. Confidence Interval?
• 5. A plant has a mass of 125 grams and 140
seeds. What is its residual?
Squarecap: Try it!
Data: same PlantMass and Seeds table (n = 10; Mean 133.20, 192.50; SD 66.06, 118.63); r = .9787
• 1. Find the regression equation
Seeds = -39.668 + 1.743(mass)
• 2. Is the equation significant?
Yes, p<.05
• 3. Confidence Interval? (1.46, 2.04)
• 5. A plant has a mass of 125 grams and 140
seeds. What is its residual? 140 - 178.21 = -38.21
Does it match R?
Seeds = -41.586 + 1.757(mass)
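The line reported by R can be reproduced directly from the ten data points. This is a sketch using the least-squares formulas on the unrounded data; small differences from a hand calculation are expected when the hand calculation starts from rounded summary values.

```python
# Plant data from the Squarecap table.
plant_mass = [62, 135, 218, 210, 80, 55, 70, 130, 150, 222]
seeds = [45, 180, 390, 310, 120, 70, 100, 160, 210, 340]

n = len(plant_mass)
x_bar = sum(plant_mass) / n
y_bar = sum(seeds) / n

# Least-squares slope and intercept from the raw data.
sxx = sum((x - x_bar) ** 2 for x in plant_mass)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(plant_mass, seeds))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

print(f"Seeds = {b0:.3f} + {b1:.3f}(mass)")
```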
Point(s) of Order!
• For simple linear regression (one x)
• Are x and y significantly, linearly related?
– can use t = r / SEr or t = b1 / SEb1
• Use what you are given in the problem
• Fun fact! If your data (both x and y) are
standardized, then your r = b1 and b0 = 0.
y = rx
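The fun fact can be checked numerically. The sketch below standardizes the GPA data from the earlier example (z-scores using the sample standard deviation) and refits the line; `standardize` and `fit_line` are illustrative helper names.

```python
from statistics import mean, stdev

# GPA data from the earlier example (r = .806).
hs_gpa = [3.09, 3.62, 2.63, 3.20, 2.36, 2.98, 2.06, 2.13, 3.13]
c_gpa = [2.98, 3.60, 2.87, 3.00, 2.11, 3.25, 2.80, 2.50, 3.33]


def standardize(values):
    """Convert values to z-scores using the sample mean and sample SD."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def fit_line(x, y):
    """Return (b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


b0_z, b1_z = fit_line(standardize(hs_gpa), standardize(c_gpa))
print(f"standardized slope = {b1_z:.3f}, intercept = {b0_z:.3f}")
```

The standardized slope should match r for this data set, and the intercept should be zero up to floating-point error.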
Summary
• Under H0: β1 = 0, the estimate of the slope divided by its
standard error follows a Student’s t
distribution with n-2 degrees of freedom
• We can perform hypothesis tests or calculate
confidence intervals for β1 using b1 and SEb1
Correlation vs. Regression
[Figure: Venn diagram with regions labeled Correlation, BOTH, and Regression]

Correlation & Regression: Similarities
• Two continuous variables
• Linear relationship
– can determine direction (positive, negative)
• Strongly influenced by outliers
• Use t-distribution for hypothesis test + CI
• Hyp test: Is one variable linearly related to another?
– signif correlation?
– signif slope?
• If your x and y scores are standardized, then b1 = r

Correlation & Regression: Differences
Correlation
• ρ, r
• order of x, y doesn’t matter
• no units
Regression
• ŷi = b0 + b1xi
• Use a sample to create the model (regression line)
– Use model to make predictions of y
• units matter
• order of x, y matters
• R2 = how good is model
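One way to read R2 is as the share of the variance in y that the line accounts for: R2 = 1 – SSE / SST. The sketch below computes it for the GPA example and compares it with r squared. The formula and the equality R2 = r² for a single predictor are standard results, not taken from the slides.

```python
# GPA data from the earlier example.
hs_gpa = [3.09, 3.62, 2.63, 3.20, 2.36, 2.98, 2.06, 2.13, 3.13]
c_gpa = [2.98, 3.60, 2.87, 3.00, 2.11, 3.25, 2.80, 2.50, 3.33]

n = len(hs_gpa)
x_bar = sum(hs_gpa) / n
y_bar = sum(c_gpa) / n

sxx = sum((x - x_bar) ** 2 for x in hs_gpa)
syy = sum((y - y_bar) ** 2 for y in c_gpa)   # total sum of squares (SST)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(hs_gpa, c_gpa))

# Fitted line and its sum of squared residuals (SSE).
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(hs_gpa, c_gpa))

r = sxy / (sxx * syy) ** 0.5    # correlation coefficient
r_squared = 1 - sse / syy       # proportion of variance explained

print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}, R2 = {r_squared:.3f}")
```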
Correlation & Regression: Assumptions
Correlation
• Assumptions
– 1. Random sample
– 2. Independent obs
– 3. x,y come from a bivariate normal distribution
Regression
• Assumptions
– 1. Random sample of all possible values of yi for each xi
– 2. Independent observations
– 3. Data are linearly related
– 4. Residuals are normally distributed
– 5. Residuals have equal variance
Correlation & Regression: Hypotheses
Correlation
• Hypotheses
– H0: ρ = 0
– HA: ρ ≠ 0
• Is the correlation significant?
• Is there a linear relationship between x and y?
Regression
• Hypotheses
– H0: β1 = 0
– HA: β1 ≠ 0
• Is the slope significant?
• Can I use x to predict y?
• Is there a linear relationship between x and y?
Correlation & Regression Calculations
Correlation
• Finding r
• Hypothesis test
Regression
• Finding regression line
• Hypothesis test
– SE will be given
Squarecap!
• Lizards’ biteforce vs. territory
• Given: n=11, t*= 2.262

             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)  -31.539   23.513
BiteForce     11.677    4.848

• 6. What is your t test statistic?
• 7. What is your conclusion?
• 8. Will the 95%CI have zero in it?
• 9. If a lizard has a bite force of 5N and a territory
of 25 m², what is its residual?
Squarecap!
• Territory = -31.539 + 11.677(BiteForce)
• Given: n=11, t*= 2.262

             Estimate  Std. Error
(Intercept)  -31.539   23.513
BiteForce     11.677    4.848

• 6. What is your t test statistic? t = 11.677 / 4.848 = 2.409
• 7. What is your conclusion? Reject null; p<.05
• 8. Will the 95%CI have zero in it? no
• 9. If a lizard has a bite force of 5N and a territory of 25 m²,
what is its residual? predicted territory = -31.539 + 11.677(5) = 26.846;
resid = 25 - 26.846 = -1.846 m²
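All four lizard answers follow from the numbers in the coefficient table. A sketch, with illustrative variable names:

```python
# Values from the R coefficient table and the "Given" line.
b0 = -31.539     # intercept estimate
b1 = 11.677      # slope estimate for BiteForce
se_b1 = 4.848    # standard error of the slope
n = 11
t_star = 2.262   # critical value given for df = n - 2 = 9

# 6. t test statistic
t = b1 / se_b1

# 7. Conclusion: reject H0 when |t| > t*
reject_null = abs(t) > t_star

# 8. 95% CI for the slope, and whether it contains zero
ci = (b1 - t_star * se_b1, b1 + t_star * se_b1)
ci_contains_zero = ci[0] <= 0 <= ci[1]

# 9. Residual for a lizard with bite force 5 N and territory 25 m^2
predicted_territory = b0 + b1 * 5
resid = 25 - predicted_territory

print(f"t = {t:.3f}; reject H0: {reject_null}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f}); contains zero: {ci_contains_zero}")
print(f"predicted = {predicted_territory:.3f}; residual = {resid:.3f}")
```

Answers 7 and 8 always agree: a two-sided test at alpha = .05 rejects the null exactly when the 95% CI excludes zero.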
Coming Up (TTh11)…
• Friday
– Preliminary analysis due by 11:59pm!
• Word doc + final data set
• Tuesday
– Class: ANOVA
• Wednesday
– Prelab + lab 9
Coming Up (TTh2)…
• Friday
– Preliminary analysis due by 11:59pm!
• Word doc + final data set
• Tuesday
– Class: ANOVA
– Prelab + lab 9
Coming Up (MW)…
• Tuesday
– Prelab + lab 9
• Wednesday
– Class: ANOVA
• Friday
– HW 6 due to canvas by 11:59pm!
