Scatter Plot Examples
(continued)
[Figure: scatter plots of y against x contrasting strong relationships with weak relationships]
Scatter Plot Examples
(continued)
[Figure: scatter plot of y against x showing no relationship]
Correlation Coefficient
(continued)
[Figure: five scatter plots of y against x, with r = -1, r = -.6, r = 0, r = +.3, and r = +1]
Calculating the
Correlation Coefficient
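Sample correlation coefficient:

$$ r = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sqrt{\left[\sum (x - \bar{x})^2\right]\left[\sum (y - \bar{y})^2\right]}} $$

or the algebraic equivalent:

$$ r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\left(\sum x^2\right) - \left(\sum x\right)^2\right]\left[n\left(\sum y^2\right) - \left(\sum y\right)^2\right]}} $$

For the tree data (n = 8):

$$ r = \frac{8(3142) - (73)(321)}{\sqrt{\left[8(713) - (73)^2\right]\left[8(14111) - (321)^2\right]}} = 0.886 $$

[Figure: scatter plot of Tree Height, y (axis 0 to 70) against Trunk Diameter, x (axis 0 to 14)]

r = 0.886 → relatively strong positive linear association between x and y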
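The calculation above can be reproduced with a short Python sketch. The function name `correlation_from_sums` is an illustrative choice, not something from the slides; the totals passed to it are the ones that appear in the worked example.

```python
import math

def correlation_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Sample correlation coefficient r from summary totals (algebraic form)."""
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Totals taken from the tree height / trunk diameter example (n = 8).
r = correlation_from_sums(n=8, sum_x=73, sum_y=321, sum_xy=3142,
                          sum_x2=713, sum_y2=14111)
print(round(r, 3))  # 0.886
```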
Excel Output
Excel Correlation Output
Tools / data analysis / correlation…
Correlation between
Tree Height and Trunk Diameter
Significance Test for Correlation
• Hypotheses
H0: ρ = 0 (no correlation)
HA: ρ ≠ 0 (correlation exists)
• Test statistic

$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} $$

(with n – 2 degrees of freedom)
Example: Produce Stores
Is there evidence of a linear relationship
between tree height and trunk diameter at
the .05 level of significance?
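$$ t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}} = \frac{.886}{\sqrt{\dfrac{1 - .886^2}{8 - 2}}} = 4.68 $$

Decision: Reject H0, since t = 4.68 exceeds the two-tailed critical value of 2.447 for 6 degrees of freedom at the .05 level.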
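As a rough check of this test, the statistic can be computed in a few lines of Python. This is a sketch only: the function name is hypothetical, and the critical value 2.447 is an assumption read from a standard t table (two-tailed, .05 level, 6 degrees of freedom) rather than computed.

```python
import math

def correlation_t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r / math.sqrt((1 - r ** 2) / (n - 2))

# Values from the tree example: r = .886 with n = 8 observations.
r, n = 0.886, 8
t = correlation_t_statistic(r, n)

# Assumption: 2.447 is the two-tailed critical value from a t table
# for alpha = .05 and n - 2 = 6 degrees of freedom.
T_CRITICAL = 2.447
reject_h0 = abs(t) > T_CRITICAL

print(round(t, 2), reject_h0)  # 4.68 True
```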
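Population regression model:

$$ y = \beta_0 + \beta_1 x + \varepsilon $$

[Figure: the line y = β0 + β1x + ε plotted against x, with Intercept = β0 and Slope = β1. At a given xi, the random error εi for this x value is the gap between the observed value of y for xi and the predicted value of y for xi.]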
Estimated Regression Model
The sample regression line provides an estimate of
the population regression line
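$$ \hat{y}_i = b_0 + b_1 x $$

where ŷi is the estimated y value, b0 is the estimate of the regression intercept, b1 is the estimate of the regression slope, and x is the independent variable.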
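b0 and b1 are the values that minimize the sum of the squared residuals:

$$ \sum e^2 = \sum (y - \hat{y})^2 = \sum \left(y - (b_0 + b_1 x)\right)^2 $$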
The Least Squares Equation
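• The formulas for b1 and b0 are:

$$ b_1 = \frac{\sum (x - \bar{x})(y - \bar{y})}{\sum (x - \bar{x})^2} $$

algebraic equivalent:

$$ b_1 = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{\left(\sum x\right)^2}{n}} $$

and

$$ b_0 = \bar{y} - b_1 \bar{x} $$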
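The formulas above translate directly into code. The sketch below uses the algebraic form for b1; the function name `least_squares` and the five data points are made up purely for illustration and do not come from the slides.

```python
def least_squares(x, y):
    """Return (b0, b1) for the least squares line y-hat = b0 + b1 * x."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi ** 2 for xi in x)

    # Slope: algebraic equivalent of sum((x - xbar)(y - ybar)) / sum((x - xbar)^2).
    b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
    # Intercept: b0 = ybar - b1 * xbar.
    b0 = sum_y / n - b1 * sum_x / n
    return b0, b1

# Hypothetical toy data, chosen only so the arithmetic is easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
b0, b1 = least_squares(x, y)
print(round(b0, 4), round(b1, 4))  # 2.2 0.6
```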
Interpretation of the
Slope and the Intercept
ANOVA
              df    SS            MS            F          Significance F
Regression    1     18934.9348    18934.9348    11.0848    0.01039
Residual      8     13665.5652    1708.1957
Total         9     32600.5000

              Coefficients    Standard Error    t Stat     P-value    Lower 95%    Upper 95%
Intercept     98.24833        58.03348          1.69296    0.12892    -35.57720    232.07386
Square Feet   0.10977         0.03297           3.32938    0.01039    0.03374      0.18580
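Several entries in this output are ratios of other entries, which makes it easy to sanity-check. A minimal sketch, with variable names chosen here for illustration and all numbers copied from the output above:

```python
# Mean squares from the ANOVA section of the output.
ms_regression = 18934.9348
ms_residual = 1708.1957
f_stat = ms_regression / ms_residual  # F = MS regression / MS residual

# Coefficients and their standard errors from the coefficient section.
slope, slope_se = 0.10977, 0.03297
intercept, intercept_se = 98.24833, 58.03348
t_slope = slope / slope_se            # t Stat = coefficient / standard error
t_intercept = intercept / intercept_se

print(round(f_stat, 2), round(t_slope, 2), round(t_intercept, 2))  # 11.08 3.33 1.69
```

With a single independent variable, F equals the square of the slope's t statistic (up to rounding), which is why Significance F and the slope's P-value are both 0.01039.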
Graphical Presentation
• House price model: scatter plot and
regression line
[Figure: scatter plot of House Price ($1000s) (axis 0 to 450) against Square Feet (axis 0 to 3000) with the fitted regression line; Intercept = 98.248, Slope = 0.10977]
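The fitted line can also be used to compute a predicted price. The sketch below plugs the intercept and slope from the regression output into ŷ = b0 + b1x; the 2000 square foot input is a hypothetical value chosen for illustration, and `predict_price` is an illustrative name.

```python
INTERCEPT = 98.24833  # b0, in $1000s
SLOPE = 0.10977       # b1, in $1000s per square foot

def predict_price(square_feet):
    """Predicted house price in $1000s from the fitted regression line."""
    return INTERCEPT + SLOPE * square_feet

# Hypothetical input: a 2000 square foot house.
print(round(predict_price(2000), 2))  # 317.79
```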
Coefficient of Determination, R²
• The coefficient of determination is the
portion of the total variation in the
dependent variable that is explained by
variation in the independent variable
Coefficient of determination:

$$ R^2 = \frac{SSR}{SST} = \frac{\text{sum of squares explained by regression}}{\text{total sum of squares}} $$
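As a worked instance of this ratio, the sums of squares from the house price ANOVA output earlier in the deck can be plugged in. A minimal sketch, with variable names chosen here for illustration:

```python
# Sums of squares from the house price ANOVA output.
ssr = 18934.9348  # regression (explained) sum of squares
sst = 32600.5000  # total sum of squares

r_squared = ssr / sst
print(round(r_squared, 4))  # 0.5808
```

Read this way, about 58% of the variation in house price is explained by variation in square feet.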
Examples of Approximate R² Values

[Figure: scatter plots of y against x for three cases]

R² = 1
0 < R² < 1
R² = 0: no linear relationship between x and y
$$ s = \sqrt{\frac{SSE}{n - k - 1}} $$
Where
SSE = Sum of squares error
n = Sample size
k = number of independent variables in the
model
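Applied to the house price output, this formula gives the standard error of the estimate in the units of y ($1000s). In the sketch below, SSE is the residual sum of squares from the ANOVA output, n = 10 is inferred from the total degrees of freedom (9), and k = 1 because square feet is the only independent variable; the function name is illustrative.

```python
import math

def standard_error_of_estimate(sse, n, k):
    """s = sqrt(SSE / (n - k - 1))."""
    return math.sqrt(sse / (n - k - 1))

# SSE from the ANOVA output; n = 10 inferred from total df = 9; k = 1.
s = standard_error_of_estimate(sse=13665.5652, n=10, k=1)
print(round(s, 2))  # 41.33
```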
Comparing Standard Errors
[Figure, left panels: variation of observed y values from the regression line]
[Figure, right panels: variation in the slope of regression lines from different possible samples]