
Examining Relationships
Regression Facts
YMS3e Chapter 3
3.3: Correlation and Regression Extras
Mr. Molesky

Regression Basics
[Figure: scatter plot, "The Endangered Manatee"]

When describing a Bivariate Relationship:
Make a Scatterplot
Strength, Direction, Form
Model: y-hat=a+bx
Interpret slope in context
Make Predictions
Residual = Observed - Predicted
Assess the Model
Interpret r
Residual Plot
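
The checklist above can be sketched in a few lines of Python. This is only a minimal illustration: the data values are made up for the example, and the helper name `lsrl` is hypothetical. It fits y-hat = a + bx by least squares, computes r, makes predictions, and takes each residual as observed minus predicted.

```python
from math import sqrt

def lsrl(xs, ys):
    """Return (a, b, r) for the least-squares line y-hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx              # slope
    a = y_bar - b * x_bar      # intercept
    r = sxy / sqrt(sxx * syy)  # correlation
    return a, b, r

# Made-up data for illustration only (not from the slides).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b, r = lsrl(xs, ys)
predicted = [a + b * x for x in xs]
residuals = [y - y_hat for y, y_hat in zip(ys, predicted)]

print(f"y-hat = {a:.2f} + {b:.2f}x, r = {r:.3f}")
print("residuals:", [round(e, 2) for e in residuals])
```

Plotting the residuals against x and looking for a pattern is the usual next step when assessing the model.
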

Minitab Output

Regression Analysis: Fat gain versus NEA

The regression equation is
FatGain = ****** + ******(NEA)

Predictor   Coef         SE Coef      T       P
Constant    3.5051       0.3036       11.54   0.000
NEA         -0.0034415   0.00074141   -4.64   0.000

S = 0.739853   R-Sq = 60.6%   R-Sq(adj) = 57.8%

Regression equations aren't always as easy to spot as they are on your TI-84. Can you find the slope and intercept above?
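
As a hedged illustration of how the output is read: the "Coef" column holds the coefficients of the fitted line, with the "Constant" row giving the intercept and the "NEA" row giving the slope. The sketch below plugs those two numbers into y-hat = a + bx and recovers r from R-Sq. The function name `predict_fat_gain` and the NEA value of 400 are hypothetical, chosen only to show the arithmetic.

```python
from math import sqrt

# Values read from the "Coef" column and the summary line of the output.
intercept = 3.5051      # "Constant" row
slope = -0.0034415      # "NEA" row
r_squared = 0.606       # "R-Sq = 60.6%"

def predict_fat_gain(nea):
    """Predicted fat gain from y-hat = intercept + slope * NEA."""
    return intercept + slope * nea

# r always has the same sign as the slope, so choose the root to match.
r = -sqrt(r_squared) if slope < 0 else sqrt(r_squared)

# 400 is a hypothetical NEA value, in whatever units the data set uses.
print(round(predict_fat_gain(400), 4))
print(round(r, 3))
```
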

Outliers/Influential Points
Does the age of a child's first word predict his/her mental ability? Consider the following data on (age of first word, Gesell Adaptive Score) for 21 children.

Age at First Word and Gesell Score

Child   Age (months)   Score
1       15             95
2       26             71
3       10             83
4       9              91
5       15             102
6       20             87
7       18             93
8       11             100
9       8              104
10      20             94
11      7              113
12      9              96
13      10             83
14      11             84
15      11             102
16      10             100
17      12             105
18      42             57
19      17             121
20      11             86
21      10             100

[Figure: scatter plot, "Age at First Word and Gesell Score"]
Influential?

Does the highlighted point markedly affect the equation of the LSRL? If so, it is influential.
Test by removing the point and finding the new LSRL.
Explanatory vs. Response
The distinction between explanatory and response variables is essential in regression.
Switching the distinction results in a different least-squares regression line.
[Figure: two scatter plots, "Hubble 1929 data"]

Note: The correlation value, r, does NOT depend on the distinction between Explanatory and Response.
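
A small sketch can make both statements concrete. The data below are made up (the Hubble values are not reproduced in these notes) and the helper names are hypothetical. Regressing y on x and x on y gives two different lines, while r is the same either way.

```python
from math import sqrt

def slope(xs, ys):
    """Slope of the least-squares line that predicts ys from xs."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx

def correlation(xs, ys):
    """Correlation r; symmetric in its two arguments."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b_y_on_x = slope(xs, ys)  # x explanatory, y response
b_x_on_y = slope(ys, xs)  # roles switched

print("slope of y on x:", b_y_on_x)
print("slope of x on y:", b_x_on_y)
print("r either way:", correlation(xs, ys), correlation(ys, xs))
```

The two slopes are not reciprocals of each other unless the points fall exactly on a line; their product equals r squared.
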

Correlation
[Figure: scatter plot, "Beer and Blood Alcohol"]

The correlation, r, describes the strength of the straight-line relationship between x and y.
Ex: There is a strong, positive, LINEAR relationship between # of beers and BAC.

There is a weak, positive, linear relationship between x and y. However, there is a strong nonlinear relationship.
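
The point that r measures only straight-line strength can be illustrated with a short sketch. The data are made up: y is an exact function of x (y = x squared), so the relationship is as strong as it can be, yet r comes out as a fairly small positive number. The helper name `correlation` is hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r between two equal-length lists of numbers."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Made-up data: a perfect curve, not a line.
xs = list(range(-5, 7))     # -5, -4, ..., 6
ys = [x ** 2 for x in xs]   # y is completely determined by x

r = correlation(xs, ys)
print(round(r, 3))
```

A scatter plot of these points would show the curve immediately, which is why the plot comes before the calculation of r.
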

[Figure: scatter plot, "Collection 1"]

Coefficient of Determination
The coefficient of determination, r², describes the percent of variability in y that is explained by the linear regression on x.
[Figure: scatter plot, "Wine Consumption and Heart Disease"]

71% of the variability in death rates due to heart disease can be explained by the LSRL on alcohol consumption.
That is, alcohol consumption provides us with a fairly good prediction of death rate due to heart disease, but other
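
The phrase "variability explained" can be checked numerically. The sketch below uses made-up data (the wine data are not reproduced in these notes) and hypothetical variable names. It compares r squared with 1 - SSE/SST, where SST is the total variation of y about its mean and SSE is the variation left over in the residuals of the LSRL.

```python
from math import sqrt

# Made-up data for illustration only.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n
sxx = sum((x - x_bar) ** 2 for x in xs)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))

# Least-squares line y-hat = a + b*x.
b = sxy / sxx
a = y_bar - b * x_bar

# Total variation in y, and variation remaining after the fit.
sst = sum((y - y_bar) ** 2 for y in ys)
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

r = sxy / sqrt(sxx * sst)
explained = 1 - sse / sst

print("r squared:   ", round(r ** 2, 4))
print("1 - SSE/SST: ", round(explained, 4))
```

The two printed values agree, which is the sense in which r squared is the fraction of the variation in y accounted for by the line.
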

Cautions
Correlation and Regression are NOT RESISTANT to outliers and Influential Points!
Correlations based on averaged data tend to be higher than correlations based on all raw data.
Extrapolating beyond the observed data can result in predictions that are unreliable.
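
The caution about averaged data can be seen in a small sketch. The data are made up: three y observations at each of four x values. Averaging within each x value removes the scatter among individuals, so the correlation of the averages is higher than the correlation of the raw observations. The names `groups` and `correlation` are hypothetical.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r between two equal-length lists of numbers."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# Made-up data: x value -> individual y observations at that x.
groups = {1: [0, 2, 4], 2: [1, 3, 5], 3: [2, 4, 6], 4: [3, 5, 7]}

# Raw data: one (x, y) pair per individual observation.
raw_x = [x for x, ys in groups.items() for _ in ys]
raw_y = [y for ys in groups.values() for y in ys]

# Averaged data: one (x, mean y) pair per x value.
avg_x = list(groups)
avg_y = [sum(ys) / len(ys) for ys in groups.values()]

r_raw = correlation(raw_x, raw_y)
r_avg = correlation(avg_x, avg_y)
print("r from raw data:     ", round(r_raw, 3))
print("r from averaged data:", round(r_avg, 3))
```
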

Correlation vs. Causation
Consider the following historical data:

Row   Year   Ministers   Rum
1     1860   63          8376
2     1865   48          6406
3     1870   53          7005
4     1875   64          8486
5     1880   72          9595
6     1885   80          10643
7     1890   85          11265
8     1895   76          10071
9     1900   80          10547
10    1905   83          11008
11    1910   105         13885
12    1915   140         18559

[Figure: scatter plot, "Collection 1"]

There is an almost perfect linear relationship between x and y. (r=0.999997)
x = # Methodist Ministers in New England
y = # of Barrels of Rum Imported to Boston
CORRELATION DOES NOT IMPLY CAUSATION
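
As a rough check on the example, the sketch below computes r from the twelve rows of historical data listed above, and also the correlation of the minister counts with the year. The second calculation is not on the slide. It is added here as one plausible reading: both counts rose over time, so a common trend could produce the strong correlation without either variable causing the other. The value of r computed from only these rows may differ slightly from the value quoted on the slide.

```python
from math import sqrt

def correlation(xs, ys):
    """Correlation r between two equal-length lists of numbers."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sqrt(sxx * syy)

# The twelve rows of historical data from the section above.
years = [1860, 1865, 1870, 1875, 1880, 1885, 1890, 1895, 1900, 1905, 1910, 1915]
ministers = [63, 48, 53, 64, 72, 80, 85, 76, 80, 83, 105, 140]
rum = [8376, 6406, 7005, 8486, 9595, 10643, 11265, 10071, 10547, 11008, 13885, 18559]

r_ministers_rum = correlation(ministers, rum)
r_year_ministers = correlation(years, ministers)

print("r(ministers, rum): ", round(r_ministers_rum, 5))
print("r(year, ministers):", round(r_year_ministers, 3))
```
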


Summary

[Figure: scatter plots, "The Endangered Manatee"]
