Goals
Find the equation for simple linear regression (SLR) and interpret its parts
[Figure: scatterplot of price (y-axis) against carat (x-axis)]
[Figure: scatterplot of diamond length by width, axes labeled x and y]
[Figure: scatterplot of hwy (y-axis) against displ (x-axis)]
[Figure: scatterplot of hwy (y-axis) against cty (x-axis)]
Calculating r
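• $r = \frac{\sum z_x z_y}{n - 1} = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\sum (x - \bar{x})^2 \sum (y - \bar{y})^2}}$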
Try Calculating the Correlation Coefficient
How much should a healthy pony weigh? Let x be the age of the pony
(in months), and let y be the average weight of the pony (in kilograms).
x 3 6 12 18 24
y 60 95 140 170 185
Paste into first cell
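A hand calculation of r for the pony data can be checked in R. The sketch below assumes the usual Pearson formula (sum of cross-products of the deviations divided by the square root of the product of the sums of squared deviations); the names dx, dy, r_by_hand, and r_builtin are illustrative and do not come from the course code.

```r
# Pony data from the slide: age in months (x), average weight in kg (y)
x <- c(3, 6, 12, 18, 24)
y <- c(60, 95, 140, 170, 185)

# Deviations of each value from its mean
dx <- x - mean(x)
dy <- y - mean(y)

# r from the formula: cross-products over the root of the
# product of the two sums of squares
r_by_hand <- sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2))

# Built-in Pearson correlation, for comparison
r_builtin <- cor(x, y)

print(r_by_hand)
print(r_builtin)
```

Both values should agree and come out close to 0.97, which points to a strong positive linear relationship between age and weight.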
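Least Squares Equation of a Line
• The least squares line is $\hat{y} = a + bx$
• The slope is $b = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2}$
• The y-intercept is $a = \bar{y} - b\bar{x}$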
Return to Ponies!
x 3 6 12 18 24
y 60 95 140 170 185
[Figure: scatterplot of Pony Weight (y-axis) against Months (x-axis)]
Residuals:
      1       2       3       4       5
-13.415   3.902  13.537   8.171 -12.195

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  55.7317    12.0856   4.611  0.01918 *
x             5.8943     0.8189   7.198  0.00553 **
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
$\widehat{\text{Weight}} = 5.8943 \cdot \text{Time} + 55.7317$
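The slope and intercept reported for the pony data can be reproduced in R. This is a sketch, assuming the usual least squares formulas (slope = sum of cross-products over the sum of squared x deviations, intercept = mean of y minus slope times mean of x); the names a, b, and fit are illustrative.

```r
# Pony data from the slide: age in months (x), average weight in kg (y)
x <- c(3, 6, 12, 18, 24)
y <- c(60, 95, 140, 170, 185)

# Slope: cross-products of deviations over squared x deviations
b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)

# Intercept: the fitted line passes through (mean(x), mean(y))
a <- mean(y) - b * mean(x)
print(c(intercept = a, slope = b))

# The same coefficients from R's linear model function
fit <- lm(y ~ x)
print(coef(fit))

# Full summary, including residuals and the coefficient table
print(summary(fit))
```

The hand-computed values should match the Estimate column: the slope says the average weight rises by about 5.89 kg for each additional month of age.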
Assessing a Line of Best Fit
Residuals
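• The residuals are the n quantities:
$y_1 - \hat{y}_1$ (1st residual)
$y_2 - \hat{y}_2$ (2nd residual)
...
$y_n - \hat{y}_n$ (nth residual)
where $\hat{y}_i$ is the value the fitted line predicts at $x_i$.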
• Fit each of the points from the pony example:
• 73.41463 91.09756 126.46341 161.82927 197.19512
• Find the difference from the observed:
y - fitted(lm(y ~ x))
• -13.414634 3.902439 13.536585 8.170732 -12.195122
• Is this a good fit? Why?
[Figure: residual plot, residuals (y-axis, -10 to 10) against Index (x-axis, 1 to 5)]
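The fitted values and residuals for the pony example can be reproduced as follows. This is a small sketch; the names fitted_values and residuals_by_hand are illustrative.

```r
# Pony data from the slide: age in months (x), average weight in kg (y)
x <- c(3, 6, 12, 18, 24)
y <- c(60, 95, 140, 170, 185)
fit <- lm(y ~ x)

# Value the fitted line predicts at each observed x
fitted_values <- fitted(fit)
print(fitted_values)

# Residual = observed y minus fitted y
residuals_by_hand <- y - fitted_values
print(round(residuals_by_hand, 3))

# resid() returns the same quantities directly
print(all.equal(unname(residuals_by_hand), unname(resid(fit))))

# Least squares residuals sum to zero, up to rounding error
print(sum(residuals_by_hand))
```

Calling plot(resid(fit)) draws the residuals against their index. With only five points any pattern is hard to judge, but a curved shape in that plot would suggest that a straight line is missing part of the relationship.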
Coefficient of Determination:
• The coefficient of determination, denoted by r^2, is the proportion of
variability in y that can be attributed to an approximate linear
relationship between x and y. The value of r^2 is often converted to a
percentage (by multiplying by 100).
• It is essentially a measure of the strength of the linear relationship
between the two variables.
• A large value of r^2 indicates that a large proportion of the variability in
y can be explained by the approximate linear relationship between x
and y. This tells you that knowing the value of x is helpful for
predicting y.
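Calculating r^2
• Residual sum of squares = SSResid = $\sum (y - \hat{y})^2$
• Total sum of squares = SSTo = $\sum (y - \bar{y})^2$
The coefficient of determination is calculated as
$r^2 = 1 - \frac{\text{SSResid}}{\text{SSTo}}$
*Note that we can also calculate r^2 by squaring r, which is what we will do to
find it.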
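For the pony data, both routes to the coefficient of determination can be compared in R. The sketch below assumes the standard definitions (SSResid as the sum of squared residuals, SSTo as the sum of squared deviations of y from its mean); the names ss_resid, ss_total, and r_squared are illustrative.

```r
# Pony data from the slide: age in months (x), average weight in kg (y)
x <- c(3, 6, 12, 18, 24)
y <- c(60, 95, 140, 170, 185)
fit <- lm(y ~ x)

# Residual sum of squares: variation left over after fitting the line
ss_resid <- sum((y - fitted(fit))^2)

# Total sum of squares: variation of y around its own mean
ss_total <- sum((y - mean(y))^2)

# Coefficient of determination from the sums of squares
r_squared <- 1 - ss_resid / ss_total
print(r_squared)

# Same value obtained by squaring the correlation coefficient
print(cor(x, y)^2)

# Same value as reported by the model summary
print(summary(fit)$r.squared)
```

All three printed values should agree at roughly 0.945, meaning about 94.5% of the variability in pony weight can be attributed to the approximate linear relationship with age.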
Bivariate Summary
• Visualize with Scatterplot
• Make preliminary claims based on plot
• Fit Simple Linear Regression
• Look at residual plot for patterns
• Calculate and Interpret the Coefficient of Determination
• Decide if SLR is appropriate for the relationship
• Use SLR line to make predictions
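The last step of the summary, using the line to predict, might look like this for the pony data. This is a sketch only: the age of 15 months is a hypothetical value chosen for illustration, and the names pony, new_age, and predicted_weight are not from the course code.

```r
# Pony data from the slides, stored in a data frame
pony <- data.frame(
  x = c(3, 6, 12, 18, 24),      # age in months
  y = c(60, 95, 140, 170, 185)  # average weight in kg
)

# Fit the simple linear regression line
fit <- lm(y ~ x, data = pony)

# Hypothetical new age, inside the observed range of 3 to 24 months
new_age <- data.frame(x = 15)

# Predicted average weight at that age
predicted_weight <- predict(fit, newdata = new_age)
print(predicted_weight)
```

The column name in new_age has to match the predictor name used in the model formula (x here). Predictions for ages far outside the observed range rest on the line continuing to hold there, which the data cannot confirm.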
Regression Activity
• Using either Excel or the provided R code on the D2L page, create a scatterplot of
weight against height and calculate the correlation coefficient (r).
• Share your scatterplot and r value by sketching them on your whiteboard.
• Is your plot different from those of the other students? Is your correlation coefficient different? Why?
• Using Excel or R, and the R code on our D2L page, generate a simple linear regression line for
your data. Write out the model.
• Interpret the model and the coefficient of determination (r^2).
• Add the fitted line from your linear model to your plot.
• Using your model, predict the weight of a person who is 68
inches tall.
• If you used someone else’s model, would you get the same result? Would it be close?
• If I gave you a larger subset of the data, would your model differ more or less from
every other group’s model? Explain.