Chapter 5 Correlation/Regression

Overview

Paired Data

Is there a relationship?

If so, what is the equation?

Use the equation for prediction.

Correlation


Definition

Correlation exists between two variables when one of them is related to the other in some way.

Assumptions

1. The sample of paired data (x, y) is a random sample.

2. The pairs of (x, y) data have a bivariate normal distribution.

Definition

Scatterplot (or scatter diagram): a graph in which the paired (x, y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x, y) pair is plotted as a single point.

Scatter Diagram of Paired Data


Positive Linear Correlation

Scatter plots (x on the horizontal axis, y on the vertical axis): (a) positive, (b) strong positive, (c) perfect positive.

Negative Linear Correlation

Scatter plots (x on the horizontal axis, y on the vertical axis): (d) negative, (e) strong negative, (f) perfect negative.

No Linear Correlation

Scatter plots (x on the horizontal axis, y on the vertical axis): (g) no correlation, (h) nonlinear correlation.

Definition

Linear Correlation Coefficient r

measures the strength of the linear relationship between the paired x and y values in a sample

r = \frac{\sum xy/n - (\sum x/n)(\sum y/n)}{SD_x \, SD_y}

where Σxy/n is the mean of the cross products; Σx/n is the mean of the x variable; Σy/n is the mean of the y variable; SD_x is the standard deviation of the x variable and SD_y is the standard deviation of the y variable, both computed with n (not n - 1) in the denominator.

Notation for the

Linear Correlation Coefficient

n: number of pairs of data presented

Σxy/n: each x multiplied by its paired y, summed, then divided by n

ρ: linear correlation coefficient for a population

Rounding the

Linear Correlation Coefficient r

Use a calculator or computer if possible.

Properties of the

Linear Correlation Coefficient r

1. -1 ≤ r ≤ 1

2. The value of r does not change if all values of either variable are converted to a different scale.

3. r is not affected by the choice of x and y: interchange x and y and the value of r will not change.

4. r measures the strength of a linear relationship.

Interpreting the Linear

Correlation Coefficient

If the absolute value of r exceeds the critical value in Table R, conclude that there is a significant linear correlation. Otherwise, there is not sufficient evidence to support the conclusion of a significant linear correlation.

Common Errors Involving Correlation

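1. Concluding that correlation implies causality.

2. Using data based on averages: averages suppress individual variation and may inflate the correlation coefficient.

3. Ignoring the possibility of a nonlinear relationship: there may be some relationship between x and y even when there is no significant linear correlation.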

Common Errors Involving Correlation

Figure: scatterplot of Distance (feet), 0 to 250, against Time (seconds), 0 to 8.

Correlation is Not Causation


Correlation Calculations

Pearson’s r

Rank Order Correlation

Ranks run from 1 (highest value) to 10 (lowest value); D = Hits rank - HR rank.

Hits   Hits rank   HR   HR rank    D   D²
 1        10        3      8       2    4
 2         9        4      7       2    4
 3         8        5      6       2    4
 4         7        1     10      -3    9
 5         6        7      4       2    4
 6         5        6      5       0    0
 7         4        2      9      -5   25
 8         3       10      1       2    4
 9         2        9      2       0    0
10         1        8      3      -2    4

Rank Order Correlation, cont

N = 10, ΣD² = 58

\text{Rho} = 1 - \frac{6\sum D^2}{N(N^2 - 1)}

\text{Rho} = 1 - \frac{6(58)}{10(10^2 - 1)}

\text{Rho} = 1 - \frac{348}{10(100 - 1)}

\text{Rho} = 1 - \frac{348}{990}

\text{Rho} = 1 - 0.352

\text{Rho} = 0.648

Pearson’s r

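Hits (x)   HR (y)   xy
 1          3        3
 2          4        8
 3          5       15
 4          1        4
 5          7       35
 6          6       36
 7          2       14
 8         10       80
 9          9       81
10          8       80

Σx/n = 5.5   Σy/n = 5.5   Σxy/n = 356/10 = 35.6

r = \frac{\sum xy/n - (\sum x/n)(\sum y/n)}{SD_x \, SD_y}

SD_x = SD_y = \sqrt{8.25} ≈ 2.872 (computed with n in the denominator)

r = \frac{35.6 - (5.5)(5.5)}{(2.872)(2.872)} = \frac{35.6 - 30.25}{8.25} = \frac{5.35}{8.25} = 0.648

Because each column is a rearrangement of the integers 1 to 10, Pearson's r here equals the rank-order Rho computed from the same data.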
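A quick Python check on the Hits/HR columns shows why SD_x and SD_y must use divisor n in this formula. Pairing the Σxy/n numerator with sample standard deviations (divisor n - 1, about 3.03 for these data) shrinks the result by the factor (n - 1)/n. The sketch uses only the standard `statistics` module.

```python
from statistics import mean, pstdev, stdev

hits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
hr = [3, 4, 5, 1, 7, 6, 2, 10, 9, 8]
n = len(hits)

# Numerator: mean of cross products minus product of means (35.6 - 30.25)
mean_xy = sum(x * y for x, y in zip(hits, hr)) / n
numerator = mean_xy - mean(hits) * mean(hr)

# Divisor n (population SD) matches the Σxy/n numerator -> correct r
r_pop = numerator / (pstdev(hits) * pstdev(hr))

# Divisor n - 1 (sample SD) does NOT match -> result shrunk by (n - 1)/n
r_mixed = numerator / (stdev(hits) * stdev(hr))

print(f"Σxy/n = {mean_xy:.2f}, numerator = {numerator:.2f}")
print(f"r (population SDs) = {r_pop:.4f}")
print(f"mixed divisors     = {r_mixed:.4f}")
```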

Pearson’s r

Excel Demonstration


Is there a significant linear correlation?

Data from the Garbage Project

x   Plastic (lb)     0.27   1.41   2.19   2.83   2.19   1.81   0.85   3.05

y   Household size   2      3      3      6      4      2      1      5


Figure: scatterplot of Household size (y) against Plastic (lbs) (x), with r = 0.842 and r² = 0.7096 ≈ 0.71.

Is there a significant linear correlation?

n = 8, α = 0.05, H0: ρ = 0, H1: ρ ≠ 0 (Table R with n = 8 and α = 0.05)

TABLE R Critical Values of the Pearson Correlation Coefficient r

n     α = .05   α = .01
4     .950      .999
5     .878      .959
6     .811      .917
7     .754      .875
8     .707      .834
9     .666      .798
10    .632      .765
11    .602      .735
12    .576      .708
13    .553      .684
14    .532      .661
15    .514      .641
16    .497      .623
17    .482      .606
18    .468      .590
19    .456      .575
20    .444      .561
25    .396      .505
30    .361      .463
35    .335      .430
40    .312      .402
45    .294      .378
50    .279      .361
60    .254      .330
70    .236      .305
80    .220      .286
90    .207      .269
100   .196      .256

Is there a significant linear correlation?

Because 0.842 > 0.707, the test statistic falls within the critical region, so we reject H0: ρ = 0.

We conclude that there is a significant linear correlation between the weights of discarded plastic and household size.

Number line of r from -1 to 1: reject ρ = 0 for r < -0.707; fail to reject ρ = 0 for -0.707 ≤ r ≤ 0.707; reject ρ = 0 for r > 0.707.

Sample data: r = 0.842

Method 1: Test Statistic is t

(follows format of earlier chapters)


Formal Hypothesis Test

To determine whether there is a significant linear correlation between two variables

Two methods. Both methods let

H0: ρ = 0 (no significant linear correlation)

H1: ρ ≠ 0 (significant linear correlation)

Method 2: Test Statistic is r

(uses fewer calculations)

Test statistic: r

Critical values: refer to Table R (no degrees of freedom)

Method 2: Test Statistic is r

(uses fewer calculations)

Test statistic: r

Critical values: refer to Table R (no degrees of freedom)

Number line of r from -1 to 1: reject ρ = 0 for r < -0.811; fail to reject ρ = 0 for -0.811 ≤ r ≤ 0.811; reject ρ = 0 for r > 0.811.

Sample data: r = 0.828

Method 1: Test Statistic is t

(follows format of earlier chapters)

Test statistic:

t = \frac{r}{\sqrt{\frac{1 - r^2}{n - 2}}}

Critical values: use degrees of freedom = n - 2

Start

Let H0: ρ = 0 and H1: ρ ≠ 0. Select a significance level α.

Calculate r using Formula 9-1.

METHOD 1: test statistic t = \frac{r}{\sqrt{(1 - r^2)/(n - 2)}}. Critical values of t are from Table A-3 with n - 2 degrees of freedom.

METHOD 2: test statistic is r. Critical values of r are from Table A-6.

If the absolute value of the test statistic exceeds the critical values, reject H0: ρ = 0. Otherwise fail to reject H0.

If you reject H0, conclude that there is a significant linear correlation. If you fail to reject H0, then there is not sufficient evidence to conclude that there is linear correlation.

Why does the critical value of r increase as sample size decreases?
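One way to see the answer: solving t = r / sqrt((1 - r²)/(n - 2)) for r gives the critical r as t / sqrt(t² + n - 2). The sketch below hard-codes a few two-tailed α = 0.05 critical t values from a standard t table (an assumption of this example, not taken from the slides) and reproduces the matching Table R entries.

```python
import math

# Two-tailed critical t values for alpha = 0.05 from a standard t table, keyed by df = n - 2
T_CRIT_05 = {3: 3.182, 6: 2.447, 8: 2.306, 18: 2.101, 28: 2.048}


def critical_r(n):
    """Critical |r| for H0: rho = 0, from solving t = r / sqrt((1 - r^2)/(n - 2)) for r."""
    df = n - 2
    t = T_CRIT_05[df]
    return t / math.sqrt(t * t + df)


for n in (5, 8, 10, 20, 30):
    print(n, round(critical_r(n), 3))
```

As n shrinks, the degrees of freedom fall and the critical t grows, so a larger |r| is required: in a small sample, a fairly large r can arise by chance alone.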

Coefficient of Determination

(Effect Size)

r²

The proportion of the variation in one variable (y) that can be explained by its linear relationship with the other variable (x).

Justification for r Formula

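r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1) s_x s_y}

where s_x and s_y are the sample standard deviations (divisor n - 1).

Figure: sample points plotted with the centroid of the sample points (x̄, ȳ) = (3, 11) dividing the graph into Quadrant 1 (upper right), Quadrant 2 (upper left), Quadrant 3 (lower left) and Quadrant 4 (lower right). For the point (7, 23): x - x̄ = 7 - 3 = 4 and y - ȳ = 23 - 11 = 12.

Points in Quadrants 1 and 3 make positive contributions (x - x̄)(y - ȳ) to the numerator; points in Quadrants 2 and 4 make negative contributions.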

