# Correlation and Covariance

James H. Steiger

Goals for Today
Introduce the statistical concepts of
 Covariance
 Correlation
Investigate invariance properties
Develop computational formulas
Covariance
So far, we have been analyzing summary statistics that describe aspects of a single list of numbers.
Frequently, however, we are interested in how variables behave together.

Smoking and Lung Capacity
Suppose, for example, we wanted to investigate the relationship between cigarette smoking and lung capacity.
We might ask a group of people about their smoking habits, and measure their lung capacities.
Smoking and Lung Capacity
| Cigarettes (X) | Lung Capacity (Y) |
|---|---|
| 0 | 45 |
| 5 | 42 |
| 10 | 33 |
| 15 | 31 |
| 20 | 29 |
Smoking and Lung Capacity
With SPSS, we can easily enter these data and produce a scatterplot.

[Figure: scatterplot of Lung Capacity (vertical axis, ticks 20 to 50) against Smoking (horizontal axis, ticks -10 to 30).]
Smoking and Lung Capacity
We can see easily from the graph that as
smoking goes up, lung capacity tends to go
down.
The two variables covary in opposite
directions.
We now examine two statistics, covariance
and correlation, for quantifying how
variables covary.

Covariance
When two variables covary in opposite
directions, as smoking and lung capacity do,
values tend to be on opposite sides of the
group mean. That is, when smoking is
above its group mean, lung capacity tends
to be below its group mean.
Consequently, by averaging the product of
deviation scores, we can obtain a measure
of how the variables vary together.
The Sample Covariance
Instead of averaging by dividing by N, we divide by N - 1. The resulting formula is

$$S_{xy} = \frac{1}{N-1}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right)$$
Calculating Covariance
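| Cigarettes (X) | dX | dXdY | dY | Lung Capacity (Y) |
|---|---|---|---|---|
| 0 | -10 | -90 | +9 | 45 |
| 5 | -5 | -30 | +6 | 42 |
| 10 | 0 | 0 | -3 | 33 |
| 15 | +5 | -25 | -5 | 31 |
| 20 | +10 | -70 | -7 | 29 |
| | | -215 | | |

The last row gives the sum of the dXdY column.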
Calculating Covariance
So we obtain

$$S_{xy} = \frac{1}{4}(-215) = -53.75$$
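
The covariance of the smoking data can be checked with a few lines of code. The following is a minimal sketch in plain Python rather than SPSS; the function name `sample_covariance` and the variable names are illustrative choices, not part of the original slides.

```python
def sample_covariance(xs, ys):
    """Sum the products of deviation scores and divide by N - 1."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of equal length with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # dX * dY for each person
    cross_products = [(x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)]
    return sum(cross_products) / (n - 1)


cigarettes = [0, 5, 10, 15, 20]
lung_capacity = [45, 42, 33, 31, 29]

s_xy = sample_covariance(cigarettes, lung_capacity)
print(s_xy)  # -53.75
```

The negative sign reflects that the two variables covary in opposite directions.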
Invariance Properties of Covariance
The covariance is invariant under listwise addition, but not under listwise multiplication. Hence, it is vulnerable to changes in the standard deviation of the variables, and is not scale-invariant.
Invariance Properties of Covariance
If $L_i = aX_i + b$, then $dl_i = a\,dx_i$.

Let $L_i = aX_i + b$ and $M_i = cY_i + d$. Then

$$S_{LM} = \frac{1}{N-1}\sum_{i=1}^{N} dl_i\,dm_i = \frac{1}{N-1}\sum_{i=1}^{N} (a\,dx_i)(c\,dy_i) = ac\,\frac{1}{N-1}\sum_{i=1}^{N} dx_i\,dy_i = ac\,S_{xy}$$
Invariance Properties of Covariance
Multiplicative constants come straight
through in the covariance, so covariance is
difficult to interpret – it incorporates
information about the scale of the variables.
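
A quick numerical check makes the point concrete. This is a sketch under assumed values: the constants a, b, c, d below are arbitrary illustrative choices, and `sample_covariance` is a hypothetical helper name, not something from the slides.

```python
import math


def sample_covariance(xs, ys):
    """Sum the products of deviation scores and divide by N - 1."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)


X = [0, 5, 10, 15, 20]
Y = [45, 42, 33, 31, 29]

# Arbitrary constants chosen only for illustration
a, b = 2.0, 7.0
c, d = -3.0, 1.0

L = [a * x + b for x in X]  # L_i = a X_i + b
M = [c * y + d for y in Y]  # M_i = c Y_i + d

s_xy = sample_covariance(X, Y)
s_lm = sample_covariance(L, M)

# Additive constants b and d drop out; multiplicative constants come through as ac
print(s_xy, s_lm, a * c * s_xy)
print(math.isclose(s_lm, a * c * s_xy))  # True
```

Changing only `b` and `d` leaves the covariance untouched, while changing `a` or `c` rescales it.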

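The (Pearson) Correlation Coefficient
Like covariance, but uses Z-scores instead of deviation scores. Hence, it is invariant under linear transformation of the raw scores (apart from a change of sign when exactly one of the multiplicative constants is negative).

$$r_{xy} = \frac{1}{N-1}\sum_{i=1}^{N} zx_i\,zy_i$$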
Alternative Formula for the Correlation Coefficient

$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$
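
The Z-score definition and the covariance-over-standard-deviations form should give the same number. The sketch below checks this on the smoking data using Python's standard `statistics` module; the variable names are illustrative and not taken from the slides.

```python
import statistics

cigarettes = [0, 5, 10, 15, 20]
lung_capacity = [45, 42, 33, 31, 29]
n = len(cigarettes)

mean_x = statistics.mean(cigarettes)
mean_y = statistics.mean(lung_capacity)
s_x = statistics.stdev(cigarettes)     # sample SD, divides by N - 1
s_y = statistics.stdev(lung_capacity)

# Route 1: sum of products of Z-scores, divided by N - 1
zx = [(x - mean_x) / s_x for x in cigarettes]
zy = [(y - mean_y) / s_y for y in lung_capacity]
r_from_z = sum(u * v for u, v in zip(zx, zy)) / (n - 1)

# Route 2: covariance divided by the product of the standard deviations
s_xy = sum((x - mean_x) * (y - mean_y)
           for x, y in zip(cigarettes, lung_capacity)) / (n - 1)
r_from_cov = s_xy / (s_x * s_y)

print(round(r_from_z, 4), round(r_from_cov, 4))  # -0.9615 -0.9615
```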
Computational Formulas -- Covariance
There is a computational formula for covariance similar to the one for variance. Indeed, the latter is a special case of the former, since the variance of a variable is “its covariance with itself.”

$$s_{xy} = \frac{1}{N-1}\left(\sum_{i=1}^{N} X_i Y_i - \frac{\sum_{i=1}^{N} X_i \sum_{i=1}^{N} Y_i}{N}\right)$$
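
As a rough illustration, the computational formula needs only three running sums. The sketch below applies it to the smoking data and also passes the same list twice to show the "covariance with itself" case; `covariance_from_sums` is a hypothetical helper name, not from the slides.

```python
import statistics


def covariance_from_sums(xs, ys):
    """Computational formula: (sum XY - (sum X)(sum Y) / N) / (N - 1)."""
    n = len(xs)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    return (sum_xy - sum(xs) * sum(ys) / n) / (n - 1)


cigarettes = [0, 5, 10, 15, 20]
lung_capacity = [45, 42, 33, 31, 29]

s_xy = covariance_from_sums(cigarettes, lung_capacity)
s_xx = covariance_from_sums(cigarettes, cigarettes)  # covariance of X with itself

print(s_xy)  # -53.75
print(s_xx)  # 62.5
print(statistics.variance(cigarettes))  # 62.5, the sample variance of X
```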
Computational Formula for Correlation
By substituting and rearranging, you obtain a substantial (and not very transparent) formula for $r_{xy}$:

$$r_{xy} = \frac{N\sum XY - \sum X \sum Y}{\sqrt{\left[N\sum X^2 - \left(\sum X\right)^2\right]\left[N\sum Y^2 - \left(\sum Y\right)^2\right]}}$$
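Computing a Correlation

| Cigarettes (X) | $X^2$ | XY | $Y^2$ | Lung Capacity (Y) |
|---|---|---|---|---|
| 0 | 0 | 0 | 2025 | 45 |
| 5 | 25 | 210 | 1764 | 42 |
| 10 | 100 | 330 | 1089 | 33 |
| 15 | 225 | 465 | 961 | 31 |
| 20 | 400 | 580 | 841 | 29 |
| 50 | 750 | 1585 | 6680 | 180 |

The last row gives the column sums.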
Computing a Correlation
$$\begin{aligned}
r_{xy} &= \frac{(5)(1585) - (50)(180)}{\sqrt{\left[(5)(750) - 50^2\right]\left[(5)(6680) - 180^2\right]}} \\
&= \frac{7925 - 9000}{\sqrt{(3750 - 2500)(33400 - 32400)}} \\
&= \frac{-1075}{\sqrt{1250\,(1000)}} = -.9615
\end{aligned}$$
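
The same arithmetic can be reproduced from the five column sums. This is a minimal sketch, assuming plain Python lists for the data; `correlation_from_sums` is an illustrative name, not a function from SPSS or from the slides.

```python
import math


def correlation_from_sums(xs, ys):
    """Computational formula for r built from N and five sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    # The denominator is zero (and r is undefined) if either variable is constant
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator


cigarettes = [0, 5, 10, 15, 20]
lung_capacity = [45, 42, 33, 31, 29]

r_xy = correlation_from_sums(cigarettes, lung_capacity)
print(round(r_xy, 4))  # -0.9615
```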
