
STATISTICS

Chapter 5 Correlation/Regression

MVS 250: V. Katch 1


Overview

Paired Data
 Is there a relationship?
 If so, what is the equation?
 Use the equation for prediction.

2
Correlation

3
Definition
Correlation
exists between two variables
when one of them is related to
the other in some way

4
Assumptions
1. The sample of paired data (x,y) is a
random sample.
2. The pairs of (x,y) data have a
bivariate normal distribution.

5
Definition
Scatterplot (or scatter diagram)
is a graph in which the paired
(x,y) sample data are plotted with
a horizontal x axis and a vertical
y axis. Each individual (x,y) pair
is plotted as a single point.
6
Scatter Diagram of Paired Data

7
Positive Linear Correlation

[Scatter plots: (a) Positive, (b) Strong positive, (c) Perfect positive]
9
Negative Linear Correlation

[Scatter plots: (d) Negative, (e) Strong negative, (f) Perfect negative]
10
No Linear Correlation
[Scatter plots: (g) No correlation, (h) Nonlinear correlation]
11
Definition
Linear Correlation Coefficient r
measures strength of the linear relationship
between paired x and y values in a sample

$$r = \frac{\Sigma xy/n - (\Sigma x/n)(\Sigma y/n)}{(SD_x)(SD_y)}$$

where Σxy/n is the mean of the cross products, Σx/n is the mean of
the x variable, Σy/n is the mean of the y variable, SDx is the
standard deviation of the x variable, and SDy is the standard
deviation of the y variable (both computed with n in the denominator).
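The formula can be sketched in a few lines of Python. This is a minimal illustration, not the course's own code: the function name `pearson_r` and the five data pairs are made up for the example, and the standard deviations are assumed to be the "divide by n" kind so that they match the mean of the cross products in the numerator.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r via the mean-of-cross-products formula."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    mean_xy = sum(a * b for a, b in zip(x, y)) / n
    # Assumption: SDs divide by n, matching the "mean" terms above.
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return (mean_xy - mean_x * mean_y) / (sd_x * sd_y)

# Made-up illustrative pairs, not course data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3))
```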
12
Notation for the
Linear Correlation Coefficient
n       number of pairs of data present
Σ       denotes the addition of the items indicated
Σx/n    denotes the mean of all x values
Σy/n    denotes the mean of all y values
Σxy/n   denotes the mean of the cross products (x times y, summed, then divided by n)
r       linear correlation coefficient for a sample
ρ       linear correlation coefficient for a population
13
Rounding the
Linear Correlation Coefficient r

 Round to three decimal places


Use calculator or computer if possible

14
Properties of the
Linear Correlation Coefficient r

1. -1 ≤ r ≤ 1
2. Value of r does not change if all values of
either variable are converted to a different
scale.
3. The value of r is not affected by the choice of x and y.
Interchange x and y and the value of r will not
change.
4. r measures strength of a linear relationship.
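Properties 1 to 3 can be checked numerically. The sketch below uses made-up data and a hypothetical helper `pearson_r`; the rescaling (multiply by 2.54, add 10) is just one example of converting a variable to a different scale.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r from sums of deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

# Made-up illustrative pairs, not course data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

r_xy = pearson_r(x, y)                 # property 1: lies in [-1, 1]
r_yx = pearson_r(y, x)                 # property 3: swap x and y
x_rescaled = [2.54 * a + 10 for a in x]
r_rescaled = pearson_r(x_rescaled, y)  # property 2: change of scale

print(r_xy, r_yx, r_rescaled)
```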
15
Interpreting the Linear
Correlation Coefficient
If the absolute value of r exceeds the
value in Sig. Table, conclude that there is
a significant linear correlation.

Otherwise, there is not sufficient evidence to support the
conclusion of a significant linear correlation.

Remember to use n-2


16
Common Errors Involving Correlation

1. Causation: It is wrong to conclude that correlation implies
causality.

2. Averages: Averages suppress individual variation and may
inflate the correlation coefficient.

3. Linearity: There may be some relationship between x and y
even when there is no significant linear correlation.
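The linearity error can be illustrated with a made-up example in which y is completely determined by x (y = x²) and yet r comes out as zero. The data and the helper name `pearson_r` are assumptions for illustration only.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson r via the mean-of-cross-products formula (SDs divide by n)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    mean_xy = sum(a * b for a, b in zip(x, y)) / n
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x) / n)
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y) / n)
    return (mean_xy - mean_x * mean_y) / (sd_x * sd_y)

# Made-up data: a perfect but nonlinear (quadratic) relationship.
x = [-3, -2, -1, 0, 1, 2, 3]
y = [a ** 2 for a in x]

r = pearson_r(x, y)
print(r)  # no linear correlation, although y depends entirely on x
```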
17
Common Errors Involving Correlation
[Figure: scatterplot of Distance (feet), 0 to 250, against Time (seconds), 0 to 8]
18
Correlation is Not Causation

[Diagram: variables A and B]

19
Correlation Calculations

Rank Order Correlation - Rho


Pearson’s - r

20
Rank Order Correlation
Hits   Rank   HR   Rank    D    D²
   1     10    3      8    2     4
   2      9    4      7    2     4
   3      8    5      6    2     4
   4      7    1     10   -3     9
   5      6    7      4    2     4
   6      5    6      5    0     0
   7      4    2      9   -5    25
   8      3   10      1    2     4
   9      2    9      2    0     0
  10      1    8      3   -2     4

21
Rank Order Correlation, cont

$$\text{Rho} = 1 - \frac{6 \sum D^2}{N(N^2 - 1)}$$

N = 10, ∑D² = 58

Rho = 1 - [6(58) / 10(10² - 1)]
Rho = 1 - [348 / 10(100 - 1)]
Rho = 1 - [348 / 990]
Rho = 1 - 0.352
Rho = 0.648
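The rank-order calculation can be reproduced with a short script. This is a sketch using the Hits and HR values from the table; the helper `ranks_desc` is a hypothetical name and assumes there are no tied values (rank 1 goes to the largest value, as in the table).

```python
hits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
hr = [3, 4, 5, 1, 7, 6, 2, 10, 9, 8]

def ranks_desc(values):
    """Rank 1 = largest value. Assumes no ties."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

hits_rank = ranks_desc(hits)
hr_rank = ranks_desc(hr)

# D is the difference between the two ranks for each pair.
d = [a - b for a, b in zip(hits_rank, hr_rank)]
sum_d2 = sum(v * v for v in d)

n = len(hits)
rho = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))
print(sum_d2, round(rho, 3))
```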
22
Pearson’s r
Hits (x)   HR (y)    xy
    1         3       3
    2         4       8
    3         5      15
    4         1       4
    5         7      35
    6         6      36
    7         2      14
    8        10      80
    9         9      81
   10         8      80

Σx/n = 5.5    Σy/n = 5.5    Σxy/n = 356/10 = 35.6
SDx = SDy = 2.872 (computed with n in the denominator)

$$r = \frac{\Sigma xy/n - (\Sigma x/n)(\Sigma y/n)}{(SD_x)(SD_y)}$$

r = [35.6 - (5.5)(5.5)] / [(2.872)(2.872)]
r = (35.6 - 30.25) / 8.25
r = 5.35 / 8.25
r = 0.648
23
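For checking the hand calculation on the Hits and HR data, a minimal Python sketch follows. It assumes the standard deviations are computed with n in the denominator, which is what the mean-of-cross-products form of the formula requires; the variable names are chosen for illustration.

```python
from math import sqrt

hits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # x
hr = [3, 4, 5, 1, 7, 6, 2, 10, 9, 8]     # y
n = len(hits)

sum_xy = sum(a * b for a, b in zip(hits, hr))
mean_x = sum(hits) / n
mean_y = sum(hr) / n
mean_xy = sum_xy / n

# Assumption: SDs divide by n (not n - 1).
sd_x = sqrt(sum((a - mean_x) ** 2 for a in hits) / n)
sd_y = sqrt(sum((b - mean_y) ** 2 for b in hr) / n)

r = (mean_xy - mean_x * mean_y) / (sd_x * sd_y)
print(sum_xy, mean_xy, round(sd_x, 3), round(r, 3))
```

Because both variables here are the untied values 1 to 10, Pearson's r on these data coincides with the rank-order Rho.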
Pearson’s r
Excel Demonstration

24
Is there a significant linear correlation?
Data from the Garbage Project
x  Plastic (lb)      0.27  1.41  2.19  2.83  2.19  1.81  0.85  3.05
y  Household size       2     3     3     6     4     2     1     5
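As a rough check, r for these eight pairs can be computed directly. The sketch below uses the raw-sums form of the Pearson formula, which is algebraically equivalent to the mean-of-cross-products form; the variable names are illustrative.

```python
from math import sqrt

plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]  # x, lb
household = [2, 3, 3, 6, 4, 2, 1, 5]                        # y, size
n = len(plastic)

sum_x = sum(plastic)
sum_y = sum(household)
sum_xy = sum(a * b for a, b in zip(plastic, household))
sum_x2 = sum(a * a for a in plastic)
sum_y2 = sum(b * b for b in household)

# Raw-sums form of Pearson's r.
numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt(n * sum_x2 - sum_x ** 2) * sqrt(n * sum_y2 - sum_y ** 2)
r = numerator / denominator
print(round(r, 3))
```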

25
[Scatterplot: Plastic Garbage v Household size. x axis: Plastic (lbs); y axis: Household size. r = 0.842, R² = 0.7096 ≈ 0.71]
27
Is there a significant linear correlation?
n = 8    α = 0.05
H0: ρ = 0
H1: ρ ≠ 0

Test statistic is r = 0.842

Critical values are r = -0.707 and 0.707
(Table R with n = 8 and α = 0.05)

TABLE R Critical Values of the Pearson Correlation Coefficient r

  n    α = .05   α = .01
  4     .950      .999
  5     .878      .959
  6     .811      .917
  7     .754      .875
  8     .707      .834
  9     .666      .798
 10     .632      .765
 11     .602      .735
 12     .576      .708
 13     .553      .684
 14     .532      .661
 15     .514      .641
 16     .497      .623
 17     .482      .606
 18     .468      .590
 19     .456      .575
 20     .444      .561
 25     .396      .505
 30     .361      .463
 35     .335      .430
 40     .312      .402
 45     .294      .378
 50     .279      .361
 60     .254      .330
 70     .236      .305
 80     .220      .286
 90     .207      .269
100     .196      .256
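The table lookup and decision can be sketched as follows. Only a few rows of the α = .05 column are copied in, and the names `CRITICAL_R_05` and `is_significant` are hypothetical, chosen for this illustration.

```python
# A few rows of Table R, alpha = .05 column, keyed by n.
CRITICAL_R_05 = {
    4: 0.950, 5: 0.878, 6: 0.811, 7: 0.754,
    8: 0.707, 9: 0.666, 10: 0.632,
}

def is_significant(r, n, table=CRITICAL_R_05):
    """True when |r| exceeds the tabled critical value for n pairs."""
    return abs(r) > table[n]

# Garbage Project example: r = 0.842 with n = 8 pairs.
decision = is_significant(0.842, 8)
print("reject H0" if decision else "fail to reject H0")
```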

28
Is there a significant linear correlation?
0.842 > 0.707; that is, the test statistic falls within the
critical region.

Therefore, we REJECT H0: ρ = 0 (no correlation) and conclude that
there is a significant linear correlation between the weights of
discarded plastic and household size.

[Figure: r axis from -1 to 1. Reject ρ = 0 below r = -0.707, fail to reject ρ = 0 between r = -0.707 and r = 0.707, reject ρ = 0 above r = 0.707. Sample data: r = 0.842]

29
Formal Hypothesis Test
 To determine whether there is a
significant linear correlation
between two variables
 Two methods
 Both methods let H0: ρ = 0
(no significant linear correlation)
H1 : ρ ≠0
(significant linear correlation)
31
Method 2: Test Statistic is r
(uses fewer calculations)

Test statistic: r
Critical values: Refer to Table R
(no degrees of freedom)

32
Method 2: Test Statistic is r
(uses fewer calculations)

Test statistic: r
Critical values: Refer to Table A-6
(no degrees of freedom)

[Figure: r axis from -1 to 1. Reject ρ = 0 below r = -0.811, fail to reject ρ = 0 between r = -0.811 and r = 0.811, reject ρ = 0 above r = 0.811. Sample data: r = 0.828]

33
Method 1: Test Statistic is t
(follows format of earlier chapters)

Test statistic:

$$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

Critical values: use Table T with degrees of freedom = n - 2
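A short sketch of Method 1, applied to the Garbage Project result (r = 0.842, n = 8). The critical value 2.447 is the standard two-tailed t value for α = 0.05 with 6 degrees of freedom, typed in by hand here as an assumption in place of a table lookup; the function name `t_statistic` is illustrative.

```python
from math import sqrt

def t_statistic(r, n):
    """t = r / sqrt((1 - r^2) / (n - 2)), with n - 2 degrees of freedom."""
    return r / sqrt((1 - r ** 2) / (n - 2))

r, n = 0.842, 8
t = t_statistic(r, n)

# Assumption: two-tailed critical t for alpha = 0.05, df = 6,
# taken from a standard t table.
t_critical = 2.447
reject_h0 = abs(t) > t_critical
print(round(t, 2), reject_h0)
```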
34
Testing for a Linear Correlation

Start
1. Let H0: ρ = 0 and H1: ρ ≠ 0.
2. Select a significance level α.
3. Calculate r using Formula 9-1.
4. Choose a method.

   METHOD 1: The test statistic is

   $$t = \frac{r}{\sqrt{\dfrac{1 - r^2}{n - 2}}}$$

   Critical values of t are from Table A-3 with n - 2 degrees of freedom.

   METHOD 2: The test statistic is r.
   Critical values of r are from Table A-6.

5. If the absolute value of the test statistic exceeds the
critical value, reject H0: ρ = 0. Otherwise, fail to reject H0.
6. If H0 is rejected, conclude that there is a significant linear
correlation. If you fail to reject H0, then there is not sufficient
evidence to conclude that there is a linear correlation.
35
Why does the critical value of r
increase as sample size decreases?

With fewer pairs, a large correlation is more likely to occur by
chance, so a larger r is needed to reach significance.
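A small simulation can make this concrete. The sketch below draws unrelated x and y values many times and records the |r| that chance alone exceeds only 5% of the time. The trial count, the seed, and the function names are arbitrary choices for illustration; the cutoffs should land roughly near the α = .05 column of Table R but will vary with the random draws.

```python
import random
from math import sqrt

def pearson_r(x, y):
    """Pearson r from sums of deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def chance_cutoff(n, trials=2000, seed=0):
    """Approximate 95th percentile of |r| when x and y are unrelated."""
    rng = random.Random(seed)
    rs = []
    for _ in range(trials):
        x = [rng.gauss(0, 1) for _ in range(n)]
        y = [rng.gauss(0, 1) for _ in range(n)]
        rs.append(abs(pearson_r(x, y)))
    rs.sort()
    return rs[int(0.95 * trials)]

cutoffs = {n: chance_cutoff(n) for n in (5, 10, 30)}
for n, c in cutoffs.items():
    print(n, round(c, 3))
```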

36
Coefficient of Determination
(Effect Size)

r²
The proportion of the variance in one variable that can be
explained by its linear relationship with the other variable.
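To illustrate the "explained variance" reading, the sketch below fits a least-squares line to the Garbage Project data and compares r² with the fraction of the variation in y that the line accounts for. The least-squares fit is an assumption brought in for the illustration, and the variable names are made up.

```python
plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]  # x
household = [2, 3, 3, 6, 4, 2, 1, 5]                        # y
n = len(plastic)

mx = sum(plastic) / n
my = sum(household) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(plastic, household))
sxx = sum((a - mx) ** 2 for a in plastic)
syy = sum((b - my) ** 2 for b in household)  # total variation in y

r = sxy / (sxx * syy) ** 0.5

# Least-squares line y = intercept + slope * x.
slope = sxy / sxx
intercept = my - slope * mx
residual_ss = sum(
    (b - (intercept + slope * a)) ** 2 for a, b in zip(plastic, household)
)

explained_fraction = 1 - residual_ss / syy
print(round(r ** 2, 4), round(explained_fraction, 4))
```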

37
Justification for r Formula

$$r = \frac{\Sigma (x - \bar{x})(y - \bar{y})}{(n - 1)\, s_x s_y}$$

(x̄, ȳ) is the centroid of the sample points.

[Figure: scatterplot divided into Quadrants 1 to 4 by the lines x̄ = 3 and ȳ = 11 through the centroid (x̄, ȳ). For the point (7, 23) in Quadrant 1: x - x̄ = 7 - 3 = 4 and y - ȳ = 23 - 11 = 12]
38
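The deviation form of the formula can be compared numerically with the mean-of-cross-products form. This sketch reuses the Hits and HR data from the earlier slides; it assumes sample standard deviations (n - 1) in the deviation form and "divide by n" standard deviations in the cross-products form, which is what makes the two agree.

```python
from statistics import mean, pstdev, stdev

hits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]   # x
hr = [3, 4, 5, 1, 7, 6, 2, 10, 9, 8]     # y
n = len(hits)
x_bar, y_bar = mean(hits), mean(hr)

# Deviation form: products of deviations from the centroid,
# scaled by (n - 1) and the sample SDs.
dev_products = [(a - x_bar) * (b - y_bar) for a, b in zip(hits, hr)]
r_deviation = sum(dev_products) / ((n - 1) * stdev(hits) * stdev(hr))

# Cross-products form with SDs that divide by n.
mean_xy = mean(a * b for a, b in zip(hits, hr))
r_cross = (mean_xy - x_bar * y_bar) / (pstdev(hits) * pstdev(hr))

# Slide example: the point (7, 23) with centroid (3, 11) lies in
# Quadrant 1, so its product of deviations is positive.
example_product = (7 - 3) * (23 - 11)

print(round(r_deviation, 3), round(r_cross, 3), example_product)
```

Points in Quadrants 1 and 3 contribute positive products and points in Quadrants 2 and 4 contribute negative ones, so the sign of the sum follows the direction of the trend.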