
Introduction to Correlation.xls.

This workbook explains and uses the correlation coefficient with several examples.

This workbook begins with an example based on SAT data for applicants admitted to Wabash College one year in the mid 1990s.
The SATHist sheet gives summary statistics on the Verbal and Math SAT scores and asks you to draw rough pictures of the histograms for these variables.
The SATScatter sheet presents the same data in a bivariate scatter plot and allows you to plot the SD line and point of averages.

The workbook goes on to explore various aspects of the correlation coefficient.

The Extreme sheet contains some extreme cases of positive and negative correlation.
The Patterns sheet shows that the same summary statistics will fit many different bivariate patterns. Summary stats, including r, don't tell the whole story!
The Corr sheet dynamically shows how the correlation coefficient changes as more data is added to a bivariate scatter.
The CRExample sheet contains the answer to an example question in the book.
The Velocity sheet gives an example based on actual economic data.
The Q&A sheet contains self-study questions.
The answers can be found in the Answers folder.
The workbook contains other hidden sheets that will be revealed as you progress through the material.

This sheet contains data on 527 applicants to Wabash College in a recent year.
Verbal

Math

510

690

470

Average

470

SD

512

630

Average

89

560

SD

440

710

660

What does the histogram look like?

730


600

Use the average and SD to create a mind's eye picture.

650


450

Scroll down to check how you did.

510


500

420

580

610

400

500

490

650

490

570

340

460

320

380

560

550

400

470

450

610

510

710

730

680

510

560

570

690

690

770

510

620

480

560

590

660

500

650

660

660

360

460

470

640

540

560

560

640

440

480

620

610

560
590
450
400
470
400
500
490


700
740
500
540
550
560
650
530


560
540
650
520
450
590
650
500
370
520
480
640
560
450
540
400
480
700
490
420
580
500
610
420
460
500
470
580
380
690
440
600
660
490
560
460
620
590
550
490
520
450
620
620
490
400
590
420
540
300
600
360


650
610
520
680
500
640
760
600
610
670
640
720
460
460
640
550
560
710
540
440
730
530
700
580
620
570
620
740
560
540
420
650
640
600
590
750
690
640
560
640
650
490
730
690
610
500
640
460
500
610
710
490


440
370
440
480
490
400
540
440
500
500
480
470
460
580
510
540
540
440
540
500
480
530
500
360
560
490
510
430
650
680
480
480
510
390
360
660
650
580
660
630
490
440
430
480
580
530
630
520
470
570
410
400

460
370
680
490
460
550
640
570
600
740
580
690
500
570
560
630
500
540
560
710
660
550
620
740
570
560
730
510
680
720
570
620
530
450
520
780
700
610
750
600
630
560
600
610
720
700
650
570
500
690
570
530

480
430
440
600
390
570
460
500
420
640
440
520
460
510
490
350
630
400
470
630
440
540
570
510
590
520
400
410
440
580
440
580
440
530
450
440
720
510
380
540
500
490
460
670
620
480
420
480
540
500
560
420

560
500
590
690
470
450
520
580
560
660
620
720
480
560
550
510
650
560
500
760
610
650
600
580
630
440
640
510
480
580
480
680
540
680
570
510
580
590
480
630
630
610
560
600
570
570
560
580
700
560
800
660

490
420
420
580
580
470
540
450
430
520
590
540
540
530
580
570
600
700
440
490
530
500
470
380
360
480
600
430
500
540
470
470
400
620
470
390
530
460
600
450
560
500
660
440
520
450
490
490
510
620
460
490

490
560
420
740
620
610
500
500
470
690
700
600
450
690
710
670
650
730
670
470
480
600
560
420
500
500
680
510
560
560
560
470
630
580
480
520
530
670
700
350
580
420
730
650
580
660
610
540
480
690
570
430

660
550
410
470
680
560
360
620
500
440
580
360
480
650
590
430
400
400
550
540
600
560
350
440
430
450
540
600
450
670
400
540
520
520
560
440
470
450
560
660
580
500
550
560
780
670
510
470
420
520
480
670

590
780
590
460
760
570
440
520
560
660
580
550
480
500
730
600
520
570
690
620
670
660
600
600
380
580
530
590
550
700
430
650
520
480
600
530
620
580
620
730
730
560
620
560
800
670
600
530
550
660
650
670

530
690
540
580
470
600
580
560
620
380
430
320
480
480
460
570
580
550
610
450
520
550
660
570
630
450
480
400
480
500
420
580
720
400
420
610
470
390
340
400
500
340
430
380
570
600
420
550
600
420
560
500

540
670
740
590
610
650
660
620
740
560
390
450
500
660
570
660
590
600
740
600
590
690
690
570
690
540
650
600
560
560
560
710
700
490
540
730
550
530
650
530
570
580
550
530
680
500
540
720
700
460
670
600

720
490
600
430
540
400
500
370
560
600
640
640
530
530
500
500
680
560
560
500
470
600
740
360
660
560
580
510
600
360
510
720
460
500
460
470
400
440
600
510
500
610
680
740
550
450
400
620
540
570
450
430

760
530
580
430
730
690
650
580
650
620
550
640
640
610
720
620
680
640
500
570
520
760
780
660
800
690
650
660
670
480
560
730
530
600
620
560
460
460
670
740
650
570
740
530
580
430
540
650
600
600
650
480

460
500
540
540
540
580
570
370
520
590
540
640
490
580
460
600
380
660
670
580
560
580
440
450
430
580
660
490
480
470
580
490
380
640
510
430
590
490
430
450
660
480
650
540
600
460
570
530
380
420
520
500

650
480
650
350
640
710
680
490
560
680
670
630
650
560
500
660
590
540
680
650
560
730
600
530
380
560
600
500
470
500
570
480
520
730
520
480
620
660
550
590
720
660
540
690
690
570
690
660
560
560
610
610

470
700
540
500
510
630
440
490
510
590
620
500
730
560
580
420
340
350
370
620
460
720
400
440
530
510
380
440
590
480
400
400
470
510
440
440
400
590
520
480
330
540
480
640
510
650
550
350
400
350
390
520

680
660
600
610
640
750
600
610
510
620
700
560
610
580
570
500
500
590
460
700
530
630
580
560
760
540
560
540
630
670
620
500
570
640
510
560
470
680
680
540
560
530
430
650
660
630
590
700
470
530
590
600

500
410
410
490
430
450
430
590
430
560
550
400
660
520
530
680
350
560

570
680
440
610
580
620
490
670
530
640
450
550
730
560
610
710
480
670

595
87

What does the histogram look like?

Use the average and SD to create a mind's eye picture.

Scroll down to check how you did.
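The prompts above ask for a mind's eye picture built from nothing but the average and the SD. As a rough sketch (not the workbook's own method), the Python below computes an average and SD and then the ranges one and two SDs around the average, using the summary values shown on this sheet (Verbal: average 512, SD 89; Math: average 595, SD 87). It assumes the SD divides by n; Excel's STDEV divides by n - 1, which makes a negligible difference with 527 applicants. If a histogram is roughly bell-shaped, about 68% of the scores fall within one SD of the average and about 95% within two SDs.

```python
import math


def average_and_sd(values):
    """Return the average and the SD (root-mean-square deviation, dividing by n)."""
    n = len(values)
    avg = sum(values) / n
    sd = math.sqrt(sum((v - avg) ** 2 for v in values) / n)
    return avg, sd


def sd_range(avg, sd, k=1):
    """Interval running from k SDs below the average to k SDs above it."""
    return avg - k * sd, avg + k * sd


# Summary statistics reported on the SATHist sheet
summaries = [("Verbal", 512, 89), ("Math", 595, 87)]

for label, avg, sd in summaries:
    lo1, hi1 = sd_range(avg, sd, 1)
    lo2, hi2 = sd_range(avg, sd, 2)
    print(f"{label}: about 68% in {lo1}-{hi1}, about 95% in {lo2}-{hi2} (if bell-shaped)")
# Verbal: about 68% in 423-601, about 95% in 334-690 (if bell-shaped)
# Math: about 68% in 508-682, about 95% in 421-769 (if bell-shaped)
```

For raw data, `average_and_sd([400, 500, 600])` gives an average of 500 and an SD of about 81.6.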

Chart: Univariate Scatter Analogue (Verbal SAT)

The Verbal SAT histogram starts in cell B37 and the 3D histogram is below. Click the Make 3DHist button to learn how to make one.

Chart: 3D Histogram of Verbal SAT and Math SAT (score bins: 399 and below, 400-499, 500-599, 600-699, 700 and above; vertical axis: Frequency)

Chart: Scatter Analogue (Bivariate Scatter Plot of Math SAT vs. Verbal SAT)


This sheet contains data on 527 applicants to Wabash College in a recent year.


Verbal
Math
510
690
470
630
Point of Averages
470
560
Average x
512
440
710
Average y
595
730
650
510
420
610
500
650
570
460
380
550
470
610
710
680
560
690
770
620
560
660
650
660
460
640
560
640
480
610
700
740
500
540
550
560
650
530
650
610
520
680
500
640
760
600

89
87

0.55
Chart: A Scatter Diagram (Math vs. Verbal, with the SD Line)

660
600
450
500
580
400
490
490
340
320
560
400
450
510
730
510
570
690
510
480
590
500
660
360
470
540
560
440
620
560
590
450
400
470
400
500
490
560
540
650
520
450
590
650
500

SD Line
SDx
SDy


370
520
480
640
560
450
540
400
480
700
490
420
580
500
610
420
460
500
470
580
380
690
440
600
660
490
560
460
620
590
550
490
520
450
620
620
490
400
590
420
540
300
600
360
440
370
440
480
490
400
540
440

610
670
640
720
460
460
640
550
560
710
540
440
730
530
700
580
620
570
620
740
560
540
420
650
640
600
590
750
690
640
560
640
650
490
730
690
610
500
640
460
500
610
710
490
460
370
680
490
460
550
640
570


600
740
580
690
500
570
560
630
500
540
560
710
660
550
620
740
570
560
730
510
680
720
570
620
530
450
520
780
700
610
750
600
630
560
600
610
720
700
650
570
500
690
570
530
560
500
590
690
470
450
520
580


500
500
480
470
460
580
510
540
540
440
540
500
480
530
500
360
560
490
510
430
650
680
480
480
510
390
360
660
650
580
660
630
490
440
430
480
580
530
630
520
470
570
410
400
480
430
440
600
390
570
460
500


420
640
440
520
460
510
490
350
630
400
470
630
440
540
570
510
590
520
400
410
440
580
440
580
440
530
450
440
720
510
380
540
500
490
460
670
620
480
420
480
540
500
560
420
490
420
420
580
580
470
540
450

560
660
620
720
480
560
550
510
650
560
500
760
610
650
600
580
630
440
640
510
480
580
480
680
540
680
570
510
580
590
480
630
630
610
560
600
570
570
560
580
700
560
800
660
490
560
420
740
620
610
500
500


470
690
700
600
450
690
710
670
650
730
670
470
480
600
560
420
500
500
680
510
560
560
560
470
630
580
480
520
530
670
700
350
580
420
730
650
580
660
610
540
480
690
570
430
590
780
590
460
760
570
440
520


430
520
590
540
540
530
580
570
600
700
440
490
530
500
470
380
360
480
600
430
500
540
470
470
400
620
470
390
530
460
600
450
560
500
660
440
520
450
490
490
510
620
460
490
660
550
410
470
680
560
360
620


500
440
580
360
480
650
590
430
400
400
550
540
600
560
350
440
430
450
540
600
450
670
400
540
520
520
560
440
470
450
560
660
580
500
550
560
780
670
510
470
420
520
480
670
530
690
540
580
470
600
580
560

560
660
580
550
480
500
730
600
520
570
690
620
670
660
600
600
380
580
530
590
550
700
430
650
520
480
600
530
620
580
620
730
730
560
620
560
800
670
600
530
550
660
650
670
540
670
740
590
610
650
660
620

620
380
430
320
480
480
460
570
580
550
610
450
520
550
660
570
630
450
480
400
480
500
420
580
720
400
420
610
470
390
340
400
500
340
430
380
570
600
420
550
600
420
560
500
720
490
600
430
540
400
500
370

740
560
390
450
500
660
570
660
590
600
740
600
590
690
690
570
690
540
650
600
560
560
560
710
700
490
540
730
550
530
650
530
570
580
550
530
680
500
540
720
700
460
670
600
760
530
580
430
730
690
650
580

560
600
640
640
530
530
500
500
680
560
560
500
470
600
740
360
660
560
580
510
600
360
510
720
460
500
460
470
400
440
600
510
500
610
680
740
550
450
400
620
540
570
450
430
460
500
540
540
540
580
570
370

650
620
550
640
640
610
720
620
680
640
500
570
520
760
780
660
800
690
650
660
670
480
560
730
530
600
620
560
460
460
670
740
650
570
740
530
580
430
540
650
600
600
650
480
650
480
650
350
640
710
680
490

520
590
540
640
490
580
460
600
380
660
670
580
560
580
440
450
430
580
660
490
480
470
580
490
380
640
510
430
590
490
430
450
660
480
650
540
600
460
570
530
380
420
520
500
470
700
540
500
510
630
440
490

560
680
670
630
650
560
500
660
590
540
680
650
560
730
600
530
380
560
600
500
470
500
570
480
520
730
520
480
620
660
550
590
720
660
540
690
690
570
690
660
560
560
610
610
680
660
600
610
640
750
600
610

510
590
620
500
730
560
580
420
340
350
370
620
460
720
400
440
530
510
380
440
590
480
400
400
470
510
440
440
400
590
520
480
330
540
480
640
510
650
550
350
400
350
390
520
500
410
410
490
430
450
430
590

510
620
700
560
610
580
570
500
500
590
460
700
530
630
580
560
760
540
560
540
630
670
620
500
570
640
510
560
470
680
680
540
560
530
430
650
660
630
590
700
470
530
590
600
570
680
440
610
580
620
490
670

430
560
550
400
660
520
530
680
350
560

530
640
450
550
730
560
610
710
480
670

SD line chart series: from (333.66, 421) through the point of averages (512, 595) to (689.83, 769).
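The point of averages and the SD line on the SATScatter sheet can be computed directly. This Python sketch is an illustration, not the workbook's formulas: it computes r as the average product of standard units, and the SD line as the line through the point of averages with slope SDy/SDx, signed like r. Drawing the line from two SDs below to two SDs above the average in x is an assumption that matches the chart values on this sheet (421 = 595 - 2 × 87 and 769 = 595 + 2 × 87). The small data set in the last lines is made up.

```python
import math


def summarize(xs, ys):
    """Point of averages, SDs (dividing by n), and r for paired data."""
    n = len(xs)
    ax, ay = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - ax) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - ay) ** 2 for y in ys) / n)
    # r is the average of the products of x and y in standard units
    r = sum(((x - ax) / sx) * ((y - ay) / sy) for x, y in zip(xs, ys)) / n
    return ax, ay, sx, sy, r


def sd_line_endpoints(ax, ay, sx, sy, r, k=2):
    """Endpoints of the SD line, k SDs of x either side of the point of averages."""
    # Slope is SDy/SDx with the sign of r (for r = 0 the SD line's direction is arbitrary)
    slope = math.copysign(sy / sx, r)
    return (ax - k * sx, ay - slope * k * sx), (ax + k * sx, ay + slope * k * sx)


# Using the summary statistics shown on the SATScatter sheet
start, end = sd_line_endpoints(512, 595, 89, 87, 0.55)
print(start, end)  # approximately (334, 421) and (690, 769)

# A made-up sample, for illustration only
ax, ay, sx, sy, r = summarize([450, 520, 600, 680], [500, 610, 590, 720])
print(f"point of averages ({ax:.0f}, {ay:.0f}), r = {r:.2f}")
```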

Perfect Positive Correlation = 1

Measured Shoe Size

Person  Left Shoes  Right Shoes
1       6           6
2       10          10
3       12          12
4       3           3
5       8           8
6       9           9
7       5           5
8       4           4
9       16          16
10      11          11

Correlation Coefficient: 1

Chart: Right Shoes vs. Left Shoes

Scroll Down for more
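A quick check of this first extreme case, as a Python sketch: when every right-shoe size equals the left-shoe size, each person's two standard units are identical, so r (the average of their products) is the average of the squared standard units, which is exactly 1.

```python
import math


def standard_units(values):
    """Convert values to standard units: (value - average) / SD, with the SD dividing by n."""
    n = len(values)
    avg = sum(values) / n
    sd = math.sqrt(sum((v - avg) ** 2 for v in values) / n)
    return [(v - avg) / sd for v in values]


def correlation(xs, ys):
    """r = average of the products of the paired standard units."""
    zx, zy = standard_units(xs), standard_units(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


left_shoes = [6, 10, 12, 3, 8, 9, 5, 4, 16, 11]
right_shoes = [6, 10, 12, 3, 8, 9, 5, 4, 16, 11]
print(round(correlation(left_shoes, right_shoes), 6))  # 1.0
```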

Perfect Negative Correlation = -1

Person  Hours Sleeping  Hours Awake
1       8               16
2       6               18
3       7.25            16.75
4       8.5             15.5
5       5               19
6       7.5             16.5
7       6.75            17.25
8       7.75            16.25
9       7.1             16.9
10      6.5             17.5

Correlation Coefficient: -1

Chart: Hours Sleeping and Hours Awake by Person

Scroll Down for more
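In the sleep example, hours awake are always 24 minus hours sleeping, an exact straight line with a negative slope. The Python sketch below rebuilds the Hours Awake column from that rule and confirms that r = -1; the 24-hour rule is read off the table rather than stated in the workbook.

```python
import math


def correlation(xs, ys):
    """Pearson r computed from deviations, with SDs dividing by n."""
    n = len(xs)
    ax, ay = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - ax) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - ay) ** 2 for y in ys) / n)
    cov = sum((x - ax) * (y - ay) for x, y in zip(xs, ys)) / n
    return cov / (sx * sy)


sleeping = [8, 6, 7.25, 8.5, 5, 7.5, 6.75, 7.75, 7.1, 6.5]
awake = [24 - h for h in sleeping]  # each person's day adds up to 24 hours
print([round(a, 2) for a in awake])
print(round(correlation(sleeping, awake), 6))  # -1.0
```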

Correlation = 0

SAT    Height (inches)
1100   70
1070   71
860    67
1370   75
1030   73
1230   71
1070   73
1420   77
980    79
1500   66

Correlation Coefficient: 0.0

Chart: Height (inches) vs. SAT

Done.
Return to SATScatter sheet
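For the zero-correlation case, computing r from the ten pairs above gives a value very close to zero (about 0.006), which displays as 0.0. A minimal Python check, using the same average-product-of-standard-units definition:

```python
import math


def correlation(xs, ys):
    """Pearson r, with SDs dividing by n."""
    n = len(xs)
    ax, ay = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - ax) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - ay) ** 2 for y in ys) / n)
    return sum((x - ax) * (y - ay) for x, y in zip(xs, ys)) / (n * sx * sy)


sat = [1100, 1070, 860, 1370, 1030, 1230, 1070, 1420, 980, 1500]
height = [70, 71, 67, 75, 73, 71, 73, 77, 79, 66]

r = correlation(sat, height)
print(f"r = {r:.4f}, displayed to one decimal: {round(r, 1)}")
```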

Demonstrating Correlation

Parameters
r      Avg Y   SD Y
0      20      20

free   nonlinear
0.5    2

Remember to click the Generate Y button after changing a parameter.

Descriptive Statistics
          X       Y
Average   24.5
SD        14.4
r         0.000

Chart: Y vs. X

X    Y
0    39.41
1    23.83
2    8.14
3    0.06
4    4.01
5    47.98
6    14.98
7    24.21
8    7.83
9    24.33
10   -5.99
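The sheet generates Y from the parameters r, Avg Y, and SD Y. Its Generate Y macro is not shown here, so the Python below is an assumed, illustrative way to do it: mix x in standard units with random noise that has first been made exactly uncorrelated with x. That choice makes the sample r, average, and SD match the parameters exactly, which is consistent with the r of 0.000 displayed above for r = 0. The function name `generate_y` is hypothetical, and the X values 0 to 49 are an assumption that reproduces the displayed Average 24.5 and SD 14.4.

```python
import math
import random


def standardize(values):
    """Standard units, with the SD dividing by n."""
    n = len(values)
    avg = sum(values) / n
    sd = math.sqrt(sum((v - avg) ** 2 for v in values) / n)
    return [(v - avg) / sd for v in values]


def correlation(xs, ys):
    zx, zy = standardize(xs), standardize(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


def generate_y(xs, r, avg_y, sd_y, seed=0):
    """Hypothetical generator: Y with exactly the requested r, average, and SD."""
    rng = random.Random(seed)
    zx = standardize(xs)
    zn = standardize([rng.gauss(0, 1) for _ in xs])
    # Remove the part of the noise that lines up with x, so the noise is exactly uncorrelated with x
    overlap = sum(a * b for a, b in zip(zx, zn)) / len(xs)
    ze = standardize([b - overlap * a for a, b in zip(zx, zn)])
    return [avg_y + sd_y * (r * a + math.sqrt(1 - r * r) * e) for a, e in zip(zx, ze)]


xs = list(range(50))  # assumed X values 0, 1, ..., 49
ys = generate_y(xs, r=0.0, avg_y=20, sd_y=20)
avg_x = sum(xs) / len(xs)
sd_x = math.sqrt(sum((x - avg_x) ** 2 for x in xs) / len(xs))
print(f"X: average {avg_x:.1f}, SD {sd_x:.1f}")  # X: average 24.5, SD 14.4
print(f"r = {correlation(xs, ys):.3f}")
```

With a nonzero target such as `generate_y(xs, r=0.6, avg_y=20, sd_y=20)`, the sample r comes out at 0.6 up to rounding error. Generating Y from the noisy model alone, without removing the overlap, would give a sample r that only scatters around the target.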

x    y
1    2
2    0.317457
3    1.397798

<--Change cell B2 and click Understanding Correlation-->

Point of Averages and SD Line
Average x   2       SDx   0.82
Average y   1.238   SDy   0.70
r           -0.35319

Chart: y vs. x, with the point of averages and the SD line
Watch where the new point is placed.

You should see that if the new point arrives in an empty quadrant (the quadrants are formed by the horizontal and vertical lines through the point of averages), it tends to destroy the correlation.
Also, points far away have a stronger influence than points near the line.
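The two observations about where a new point lands can be checked numerically. In this Python sketch (made-up points, not the workbook's data), five points lie exactly on a rising line, so r = 1 and the point of averages is (3, 3). Adding one point in the empty upper-left quadrant lowers r; putting that point farther from the point of averages lowers r much more.

```python
import math


def correlation(xs, ys):
    """Pearson r, with SDs dividing by n."""
    n = len(xs)
    ax, ay = sum(xs) / n, sum(ys) / n
    sx = math.sqrt(sum((x - ax) ** 2 for x in xs) / n)
    sy = math.sqrt(sum((y - ay) ** 2 for y in ys) / n)
    return sum((x - ax) * (y - ay) for x, y in zip(xs, ys)) / (n * sx * sy)


# Made-up points lying exactly on a rising line; point of averages is (3, 3)
xs = [1, 2, 3, 4, 5]
ys = [1, 2, 3, 4, 5]

r_before = correlation(xs, ys)
# New point in the empty upper-left quadrant (x below average, y above), close to the center
r_near = correlation(xs + [2.5], ys + [3.5])
# New point in the same quadrant, but far from the center
r_far = correlation(xs + [0], ys + [6])

print(f"before: {r_before:.3f}, near point added: {r_near:.3f}, far point added: {r_far:.3f}")
# before: 1.000, near point added: 0.959, far point added: 0.143
```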

Costa Rican Consumption and Investment Shares of GDP (percent)

Year  C Share  I Share  SD Line
1950  70.7     11.8     13.594
1951  71.3     12.6     13.211
1952  71.9     14.4     12.829
1953  76.3     13.5     10.025
1954  76.5     12.2     9.897
1955  75.2     12.6     10.726
1956  73.8     13.9     11.618
1957  73.3     14.6     11.937
1958  74.2     11.5     11.363
1959  71.3     14.2     13.211
1960  71.7     12.6     12.956
1961  70.4     12.8     13.785
1962  69.2     14.3     14.550
1963  68.1     15.7     15.251
1964  71.4     11.6     13.147
1965  67.9     17.6     15.378
1966  70.1     13.7     13.976
1967  70.4     14       13.785
1968  70.8     13       13.530
1969  69.9     14.2     14.103
1970  69.6     14.8     14.295
1971  66.8     17.4     16.079
1972  66.8     15.1     16.079
1973  65.1     16.8     17.163
1974  64.5     17.3     17.545
1975  64.6     15.4     17.481
1976  62.7     18.5     18.692
1977  63.5     20.5     18.182
1978  65.1     19.2     17.163
1979  63.5     20       18.182
1980  61.8     21.4     19.266
1981  63       14       18.501
1982  62.6     11.3     18.756
1983  62.9     14.6     18.565
1984  63       15.1     18.501
1985  63.8     15.9     17.991
1986  62.6     19.7     18.756
1987  63.2     19.5     18.373
1988  63.6     17.7     18.118
1989  63.8     18.4     17.991
1990  64.3     18.6     17.672
1991  64.9     15.9     17.290
1992  64.9     18.9     17.290

Chart: I Share vs. C Share, with the SD Line

          Average  SD    r
C Share   67.7     4.3   -0.7
I Share   15.5     2.7

SD Line: slope -0.637, intercept 58.651

The data for this example came from version 5.6 of the Penn World Tables:
http://datacentre2.chass.utoronto.ca/pwt56/docs/country.html
If you use a different version of PWT, the data will differ.
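The summary numbers and the SD Line column above can be recomputed from the two share columns. This Python sketch assumes, as the values suggest, that the SD Line column is intercept + slope × C Share, where the line passes through the point of averages and slope = -(SD of I Share)/(SD of C Share), negative because r is negative. Because the slope is a ratio of SDs, it does not matter whether the SDs divide by n or n - 1; the sketch divides by n.

```python
import math

# C Share and I Share (percent of GDP), Costa Rica, 1950-1992, from the table above
c_share = [70.7, 71.3, 71.9, 76.3, 76.5, 75.2, 73.8, 73.3, 74.2, 71.3,
           71.7, 70.4, 69.2, 68.1, 71.4, 67.9, 70.1, 70.4, 70.8, 69.9,
           69.6, 66.8, 66.8, 65.1, 64.5, 64.6, 62.7, 63.5, 65.1, 63.5,
           61.8, 63.0, 62.6, 62.9, 63.0, 63.8, 62.6, 63.2, 63.6, 63.8,
           64.3, 64.9, 64.9]
i_share = [11.8, 12.6, 14.4, 13.5, 12.2, 12.6, 13.9, 14.6, 11.5, 14.2,
           12.6, 12.8, 14.3, 15.7, 11.6, 17.6, 13.7, 14.0, 13.0, 14.2,
           14.8, 17.4, 15.1, 16.8, 17.3, 15.4, 18.5, 20.5, 19.2, 20.0,
           21.4, 14.0, 11.3, 14.6, 15.1, 15.9, 19.7, 19.5, 17.7, 18.4,
           18.6, 15.9, 18.9]


def average_and_sd(values):
    """Average and SD (dividing by n)."""
    n = len(values)
    avg = sum(values) / n
    return avg, math.sqrt(sum((v - avg) ** 2 for v in values) / n)


def correlation(xs, ys):
    (ax, sx), (ay, sy) = average_and_sd(xs), average_and_sd(ys)
    return sum((x - ax) * (y - ay) for x, y in zip(xs, ys)) / (len(xs) * sx * sy)


avg_c, sd_c = average_and_sd(c_share)
avg_i, sd_i = average_and_sd(i_share)
r = correlation(c_share, i_share)

# SD line: passes through the point of averages with slope SD_I / SD_C, signed like r
slope = math.copysign(sd_i / sd_c, r)
intercept = avg_i - slope * avg_c

print(f"C Share: average {avg_c:.1f}, SD {sd_c:.1f}")
print(f"I Share: average {avg_i:.1f}, SD {sd_i:.1f}")
print(f"r = {r:.2f}, SD line slope = {slope:.3f}, intercept = {intercept:.3f}")
print(f"SD line at C Share = 70.7 (1950): {intercept + slope * 70.7:.3f}")
```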

This sheet contains data on Nominal GDP and Money Stock for the United States from 1959 to 1996.
The Velocity of Money is calculated by dividing NomGDP by the Money Stock.

Create scatter diagrams of Velocity over time and LN Velocity over time from 1959 to 1981 AND 1959 to 1996. Report the correlation coefficient in each of the four cases.
For which cases is the correlation coefficient a good descriptor of the data?
For which cases is the correlation coefficient a poor descriptor of the data? Why?

Year  Nom M1 (bill $)  Nom Output (bill $)  Velocity  ln Velocity
1959  140              507.2                3.62      1.287
1960  140.7            526.6                3.74      1.320
1961  145.2            544.8                3.75      1.322
1962  147.8            585.2                3.96      1.376
1963  153.3            617.4                4.03      1.393
1964  160.3            663                  4.14      1.420
1965  167.8            719.1                4.29      1.455
1966  172              787.8                4.58      1.522
1967  183.3            833.6                4.55      1.515
1968  197.4            910.6                4.61      1.529
1969  203.9            982.2                4.82      1.572
1970  214.4            1035.6               4.83      1.575
1971  228.3            1125.4               4.93      1.595
1972  249.2            1237.3               4.97      1.602
1973  262.8            1382.6               5.26      1.660
1974  274.2            1496.9               5.46      1.697
1975  287.4            1630.6               5.67      1.736
1976  306.3            1819                 5.94      1.781
1977  331.2            2026.9               6.12      1.812
1978  358.4            2291.4               6.39      1.855
1979  382.9            2557.5               6.68      1.899
1980  408.9            2784.2               6.81      1.918
1981  436.8            3115.9               7.13      1.965
1982  474.6            3242.1               6.83      1.922
1983  521.2            3514.5               6.74      1.909
1984  552.2            3902.4               7.07      1.955
1985  619.9            4180.7               6.74      1.909
1986  724.4            4422.2               6.10      1.809
1987  749.7            4692.3               6.26      1.834
1988  787              5049.6               6.42      1.859
1989  794.2            5438.7               6.85      1.924
1990  825.8            5743.8               6.96      1.940
1991  897.3            5916.7               6.59      1.886
1992  1025             6244.4               6.09      1.807
1993  1129.8           6558.1               5.80      1.759
1994  1150.7           6947                 6.04      1.798
1995  1129             7265.4               6.44      1.862
1996  1081.1           7636                 7.06      1.955

Chart: Velocity from 1959 to 1996 (Velocity vs. Year), r = 0.87
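The Velocity and ln Velocity columns, and the four correlations the exercise asks for, can be computed as follows. This Python sketch is one way to do it, not the workbook's formulas; it correlates each series with the year over each period. The coefficient alone will not show whether the pattern is a straight line, so it is still worth drawing the scatter diagrams.

```python
import math

years = list(range(1959, 1997))
# Nominal M1 and nominal output, billions of dollars, from the table above
nominal_m1 = [140, 140.7, 145.2, 147.8, 153.3, 160.3, 167.8, 172, 183.3, 197.4,
              203.9, 214.4, 228.3, 249.2, 262.8, 274.2, 287.4, 306.3, 331.2, 358.4,
              382.9, 408.9, 436.8, 474.6, 521.2, 552.2, 619.9, 724.4, 749.7, 787,
              794.2, 825.8, 897.3, 1025, 1129.8, 1150.7, 1129, 1081.1]
nominal_output = [507.2, 526.6, 544.8, 585.2, 617.4, 663, 719.1, 787.8, 833.6, 910.6,
                  982.2, 1035.6, 1125.4, 1237.3, 1382.6, 1496.9, 1630.6, 1819, 2026.9, 2291.4,
                  2557.5, 2784.2, 3115.9, 3242.1, 3514.5, 3902.4, 4180.7, 4422.2, 4692.3, 5049.6,
                  5438.7, 5743.8, 5916.7, 6244.4, 6558.1, 6947, 7265.4, 7636]


def correlation(xs, ys):
    """Pearson r from sums of squared and cross-product deviations."""
    n = len(xs)
    ax, ay = sum(xs) / n, sum(ys) / n
    sxy = sum((x - ax) * (y - ay) for x, y in zip(xs, ys))
    sxx = sum((x - ax) ** 2 for x in xs)
    syy = sum((y - ay) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Velocity = nominal output / money stock; ln Velocity is its natural log
velocity = [gdp / m1 for gdp, m1 in zip(nominal_output, nominal_m1)]
ln_velocity = [math.log(v) for v in velocity]


def r_over_time(series, first, last):
    """Correlation between the year and a series, restricted to first..last inclusive."""
    pairs = [(yr, v) for yr, v in zip(years, series) if first <= yr <= last]
    return correlation([yr for yr, _ in pairs], [v for _, v in pairs])


for first, last in [(1959, 1981), (1959, 1996)]:
    print(f"{first}-{last}: r(Velocity, Year) = {r_over_time(velocity, first, last):.2f}, "
          f"r(ln Velocity, Year) = {r_over_time(ln_velocity, first, last):.2f}")
```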

Q&A for Correlation.xls

Ave GDP      5218.0
SD GDP       5231.5
Ave IMR      46.6
SD IMR       39.7
r(GDP,IMR)   -0.735

Problems
1. Open the workbook IMRGDP.xls. Find out what's being graphed in the scatter diagram, reproduced on the right. Explain why the summary statistics (average IMR, average GDP, and the correlation coefficient) taken together don't tell the whole story.
2. Change cell B16 in the Computing r sheet (you need to click the Computing r button in the Corr sheet first) in this workbook to some very large value (1000) and look at how the table changes. If necessary, hit F9 or Ctrl-= to make the sheet recompute. What intuition does this give you as to why r can never be less than -1 or more than 1?
3. Give an example of two variables which have some correlation but in which one variable does not cause the other variable. You don't need actual data, just a plausible case.

Notes on Patterns Sheet
You can alter the randomness of the "Nonlinear, Random" option by changing the "nonlinear" parameter. Make it bigger in absolute value and you get more noise.
You can change the shape of both the nonlinear, deterministic and nonlinear, random options by changing the free parameter.
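Problem 2 concerns the table on the Computing r sheet. As a companion sketch, assuming that table computes r as the average product of standard units (the sheet itself is hidden here), the Python below replaces one y value in a made-up data set with larger and larger numbers. Making one value huge also inflates the SD, so its standard unit stays bounded (never more than the square root of n - 1 in absolute value when the SD divides by n), and r stays between -1 and 1.

```python
import math


def standard_units(values):
    """(value - average) / SD, with the SD dividing by n."""
    n = len(values)
    avg = sum(values) / n
    sd = math.sqrt(sum((v - avg) ** 2 for v in values) / n)
    return [(v - avg) / sd for v in values]


def correlation(xs, ys):
    """r = average of the products of the paired standard units."""
    zx, zy = standard_units(xs), standard_units(ys)
    return sum(a * b for a, b in zip(zx, zy)) / len(xs)


x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]  # made-up data

for big in [5, 50, 1000]:
    y_changed = y[:-1] + [big]  # replace the last y value
    z_last = standard_units(y_changed)[-1]
    r = correlation(x, y_changed)
    print(f"last y = {big}: its standard unit = {z_last:.4f} (bound {math.sqrt(len(y) - 1):.1f}), r = {r:.4f}")
```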