
AB1202

Statistics and Analysis


Lecture 3
Covariance and Correlation
Chin Chee Kai
cheekai@ntu.edu.sg
Nanyang Business School
Nanyang Technological University
NBS 2016S1 AB1202 CCK-STAT-018

Covariance and Correlation


• Covariance & Correlation Definitions
• Covariance & Correlation of Grouped Data
• Covariance & Correlation as Properties of Two
Joint Random Variables
• Covariance & Correlation of Distributions
• Mean of Sum of Random Variables
• Variance of Sum of Random Variables

Covariance & Correlation Definition


• $Cov(X,Y) = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu_X)(y_i-\mu_Y)$, for discrete random variables $X$ & $Y$ over a population of $N$ pairs
• $Cov_{sample}(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$
• $Correl(X,Y) = \frac{Cov(X,Y)}{\sqrt{Var(X)\,Var(Y)}}$
• $Correl_{sample}(X,Y) = \frac{Cov_{sample}(X,Y)}{s_X s_Y}$
• More common correlation notations:
▫ $r = \frac{SS_{XY}}{\sqrt{SS_{XX}\,SS_{YY}}}$ for calculating correlation from samples, where $SS_{XY} = \sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$
▫ $\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$ for calculating population correlation, where $\sigma_{XY} = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu_X)(y_i-\mu_Y)$ is just the population covariance of $X$ and $Y$, $Cov(X,Y)$.
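These definitions translate directly into code. A minimal sketch (not part of the original slides; the function names `sample_cov` and `sample_corr` are my own):

```python
import math

def sample_cov(xs, ys):
    """Sample covariance: sum of (x - x_bar)(y - y_bar) divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_corr(xs, ys):
    """Sample correlation r = SS_XY / sqrt(SS_XX * SS_YY)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)
```

Note that the $r$ formula is algebraically the same as $Cov_{sample}/(s_X s_Y)$: the $\frac{1}{n-1}$ factors in the covariance and in $s_X s_Y$ cancel, leaving only the sums of squares.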

Geometrical Interpretation of $Cov_{sample}(X,Y)$

$Cov_{sample}(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y}) = \frac{1}{n-1}SS_{XY}$, where $SS_{XY} = \sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$.

[Figure: scatter plot of observations $(x_i, y_i)$ with axes crossing at $(\bar{x}, \bar{y})$. Each point contributes a rectangle with sides $x_i-\bar{x}$ and $y_i-\bar{y}$; rectangles in the upper-right and lower-left quadrants contribute positive area (+area), those in the upper-left and lower-right contribute negative area (−area), so $SS_{XY}$ is the sum of these signed areas.]

Covariance For Grouped Data


• $Cov(X,Y) = \sum_{k=1}^{M}(x_k-\mu_X)(y_k-\mu_Y)\,P(X=x_k, Y=y_k)$
where $M$ is the number of unique pairs of values of $X$ & $Y$.
• As we typically group data in tables, it is common to place $X$ & $Y$ on different axes. The covariance formula then looks like:
$Cov(X,Y) = \sum_{i=1}^{m}\sum_{j=1}^{n}(x_j-\mu_X)(y_i-\mu_Y)\,P(X=x_j, Y=y_i)$
where $m$ is the number of rows (unique values of $Y$), $n$ is the number of columns (unique values of $X$), and $m \times n = M$.

         | X1=8 | X2=9
    Y1=6 | 0.1  | 0.4
    Y2=7 | 0.3  | 0.2

• When we use probabilities, the covariance calculated is implicitly the population covariance.

Covariance as a Property of Two Joint Random Variables
• Whereas a single $X$ can have descriptives like mean, standard deviation, etc., ...
• Covariance can be thought of as a descriptive involving 2 random variables.
▫ Higher absolute value implies stronger linear relationship
▫ Negative sign implies inverted relationship (large X and small Y, or small X and large Y, are observed)

[Figure: 3D bar chart of the joint probabilities $P(X=x, Y=y)$ for $X \in \{8, 9\}$ and $Y \in \{6, 7\}$, with bar heights from 0.1 to 0.4.]

         | X1=8 | X2=9 | P(Y)
    Y1=6 | 0.1  | 0.4  | 0.5
    Y2=7 | 0.3  | 0.2  | 0.5
    P(X)=| 0.4  | 0.6  |

Calculating Covariance(X, Y)
• Given observation data on the right, calculate covariance of $X$ & $Y$.

    X | Y
    8 | 6
    9 | 7
    8 | 7
    9 | 6
    8 | 7
    9 | 7
    9 | 6
    8 | 7
    9 | 6
    9 | 6

• $Cov_{sample}(X,Y) = \frac{1}{n-1}\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})$
$= \frac{1}{10-1}\big[(8-8.6)(6-6.5) + (9-8.6)(7-6.5) + (8-8.6)(7-6.5) + (9-8.6)(6-6.5) + (8-8.6)(7-6.5) + (9-8.6)(7-6.5) + (9-8.6)(6-6.5) + (8-8.6)(7-6.5) + (9-8.6)(6-6.5) + (9-8.6)(6-6.5)\big]$
$= \frac{1}{9}(-1) = -0.1111$
• If data is the entire population, then:
$Cov(X,Y) = \frac{1}{N}\sum_{i=1}^{N}(x_i-\mu_X)(y_i-\mu_Y) = \frac{1}{10}(-1) = -0.1$
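The arithmetic above is easy to check programmatically. A sketch using the ten observations from this slide:

```python
# The ten (X, Y) observations from the slide.
xs = [8, 9, 8, 9, 8, 9, 9, 8, 9, 9]
ys = [6, 7, 7, 6, 7, 7, 6, 7, 6, 6]
n = len(xs)
x_bar = sum(xs) / n   # 8.6
y_bar = sum(ys) / n   # 6.5
# SS_XY: sum of cross-deviation products, equals -1 for this data.
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
cov_sample = ss_xy / (n - 1)   # divide by n - 1 for the sample covariance
cov_pop = ss_xy / n            # divide by N for the population covariance
print(round(cov_sample, 4), round(cov_pop, 4))  # -0.1111 -0.1
```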

Covariance of Distributions
• More commonly, we get tabulated data tables of joint probabilities $P(X=x, Y=y)$.

         | X1=8 | X2=9
    Y1=6 | 0.1  | 0.4
    Y2=7 | 0.3  | 0.2

• Table on the right is tabulated from the same data set as the previous slide, giving $\mu_X = 8 \cdot 0.4 + 9 \cdot 0.6 = 8.6$ and $\mu_Y = 6 \cdot 0.5 + 7 \cdot 0.5 = 6.5$.
• Calculating covariance will be:
$Cov(X,Y) = \sum_{i=1}^{m}\sum_{j=1}^{n}(x_j-\mu_X)(y_i-\mu_Y)\,P(X=x_j, Y=y_i)$
$= (8-8.6)(6-6.5)\cdot 0.1 + (9-8.6)(6-6.5)\cdot 0.4 + (8-8.6)(7-6.5)\cdot 0.3 + (9-8.6)(7-6.5)\cdot 0.2$
$= -0.1$
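The same weighted sum can be checked in a few lines; the dictionary layout of the joint table below is my own choice:

```python
# Joint probability table from the slide: (x, y) -> P(X=x, Y=y).
joint = {(8, 6): 0.1, (9, 6): 0.4, (8, 7): 0.3, (9, 7): 0.2}
mu_x = sum(x * p for (x, _), p in joint.items())   # 8.6
mu_y = sum(y * p for (_, y), p in joint.items())   # 6.5
# Population covariance: probability-weighted cross-deviation products.
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
print(round(cov, 4))  # -0.1
```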

Correlation of Random Variables


• Given observation data on the right (the same ten pairs as before), calculate correlation of $X$ & $Y$.

    X | Y
    8 | 6
    9 | 7
    8 | 7
    9 | 6
    8 | 7
    9 | 7
    9 | 6
    8 | 7
    9 | 6
    9 | 6

• $r = \frac{Cov_{sample}(X,Y)}{s_X s_Y} = \frac{-0.1111}{0.5164 \times 0.5270} = -0.4082$
• If data is the entire population, then correlation becomes:
$\rho = \frac{Cov(X,Y)}{\sigma_X \sigma_Y} = \frac{-0.1}{0.4899 \times 0.5} = -0.4083$
(would be $-0.4082$, due to rounding)
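Computing $r$ without intermediate rounding avoids the small discrepancy noted above; a sketch using the $SS$ notation from the definitions slide (exactly, $r = -1/\sqrt{6} \approx -0.4082$ for this data):

```python
import math

xs = [8, 9, 8, 9, 8, 9, 9, 8, 9, 9]
ys = [6, 7, 7, 6, 7, 7, 6, 7, 6, 6]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n
# Sums of squares; the 1/(n-1) factors cancel in the ratio.
ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
ss_xx = sum((x - x_bar) ** 2 for x in xs)
ss_yy = sum((y - y_bar) ** 2 for y in ys)
r = ss_xy / math.sqrt(ss_xx * ss_yy)   # sample correlation
print(round(r, 4))  # -0.4082
```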

Correlation of Distributions
• More commonly, we get tabulated data tables of joint probabilities $P(X=x, Y=y)$.

         | X1=8 | X2=9
    Y1=6 | 0.1  | 0.4
    Y2=7 | 0.3  | 0.2

with $\mu_X = 8 \cdot 0.4 + 9 \cdot 0.6 = 8.6$ and $\mu_Y = 6 \cdot 0.5 + 7 \cdot 0.5 = 6.5$.
• Calculating correlation will be:
$\sigma_{XY} = (8-8.6)(6-6.5)\cdot 0.1 + (9-8.6)(6-6.5)\cdot 0.4 + (8-8.6)(7-6.5)\cdot 0.3 + (9-8.6)(7-6.5)\cdot 0.2 = -0.1$
$\sigma_X = \sqrt{(8-8.6)^2 \cdot 0.4 + (9-8.6)^2 \cdot 0.6} = 0.4899$
$\sigma_Y = \sqrt{(6-6.5)^2 \cdot 0.5 + (7-6.5)^2 \cdot 0.5} = 0.5$
$\rho = \frac{\sigma_{XY}}{\sigma_X \sigma_Y} = \frac{-0.1}{0.4899 \times 0.5} = -0.4083$
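The whole population calculation can be done from the joint table in one pass; computed this way, without rounding $\sigma_X$ to 0.4899 first, the result is $-0.4082$:

```python
import math

# Joint probability table from the slide: (x, y) -> P(X=x, Y=y).
joint = {(8, 6): 0.1, (9, 6): 0.4, (8, 7): 0.3, (9, 7): 0.2}
mu_x = sum(x * p for (x, _), p in joint.items())
mu_y = sum(y * p for (_, y), p in joint.items())
# Population covariance and standard deviations from the distribution.
sigma_xy = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())
sigma_x = math.sqrt(sum((x - mu_x) ** 2 * p for (x, _), p in joint.items()))
sigma_y = math.sqrt(sum((y - mu_y) ** 2 * p for (_, y), p in joint.items()))
rho = sigma_xy / (sigma_x * sigma_y)
print(round(rho, 4))  # -0.4082
```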

Covariance, Variance & Correlation


• Since covariance is defined for any random variables $X$ and $Y$, we might just let $Y = X$ and get:
$Var(X) = \sigma_X^2 = Cov(X,X)$ and $Var_{sample}(X) = s^2 = Cov_{sample}(X,X)$
• The correlation of $X$ and $Y (= X)$ will be:
$\rho = \frac{Cov(X,X)}{\sigma_X \cdot \sigma_X} = 1$ and $r = \frac{Cov_{sample}(X,X)}{s \cdot s} = 1$
• This means a random variable $X$ is always linearly and completely correlated with itself.
• Further, if $c$ is a constant, then:
$Var(c) = 0$, $Cov(X,c) = 0$, $Cov_{sample}(X,c) = 0$

• It is easy to also check that: $Cov(X,Y) = Cov(Y,X)$
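A quick numerical check of the $Cov(X,X) = s^2$ identity, using Python's standard `statistics` module and the ten $X$ observations from the earlier slides:

```python
import statistics

xs = [8, 9, 8, 9, 8, 9, 9, 8, 9, 9]
n = len(xs)
x_bar = sum(xs) / n
# Cov_sample(X, X): substitute the same series for both arguments.
cov_xx = sum((x - x_bar) * (x - x_bar) for x in xs) / (n - 1)
s2 = statistics.variance(xs)          # sample variance s^2, same value
r = cov_xx / (statistics.stdev(xs) ** 2)  # self-correlation, equals 1
print(round(cov_xx, 4), round(s2, 4), round(r, 4))  # 0.2667 0.2667 1.0
```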

Correlation as a Property of Two Joint Random Variables

• Just like covariance, correlation can also be thought of as a descriptive involving 2 random variables.
▫ Range is always from -1 to 1.
▫ Value closer to 0 implies little to no linear correlation.
▫ Value closer to 1 implies stronger positive linear
correlation (large X with large Y).
▫ Value closer to -1 implies stronger negative linear
correlation (large X with small Y).

Mean of Sum of Random Variables


• Suppose $X$ and $Y$ are random variables with means $E(X)$, $E(Y)$ and variances $Var(X)$ and $Var(Y)$.
• If $W = X + Y$ is a random variable, what is the mean (expected value) of $W$?
$E(W) = E(X + Y) = E(X) + E(Y)$
• Thus, in layman's terms, the mean of the sum of two random variables is the sum of the means of the two random variables.
• In general, if $a$ and $b$ are real constants:
$E(aX + bY) = aE(X) + bE(Y)$
• This property is known as linearity of the expected
value of random variable sums.
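Linearity can be verified directly on the joint distribution used in the earlier slides (reused here for illustration), both for $W = X + Y$ and for arbitrary constants, e.g. $a = 2$, $b = 3$:

```python
# Joint distribution from the earlier slides: (x, y) -> P(X=x, Y=y).
joint = {(8, 6): 0.1, (9, 6): 0.4, (8, 7): 0.3, (9, 7): 0.2}
e_x = sum(x * p for (x, _), p in joint.items())   # E(X) = 8.6
e_y = sum(y * p for (_, y), p in joint.items())   # E(Y) = 6.5
# E(W) for W = X + Y, computed directly from the joint distribution:
e_w = sum((x + y) * p for (x, y), p in joint.items())
# Linearity with constants a = 2, b = 3:
e_lin = sum((2 * x + 3 * y) * p for (x, y), p in joint.items())
print(round(e_w, 4), round(e_x + e_y, 4))            # 15.1 15.1
print(round(e_lin, 4), round(2 * e_x + 3 * e_y, 4))  # 36.7 36.7
```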

Variance of Sum of Random Variables


• What about variance $Var(W)$?
• $Var(W) = Var(X + Y) = Var(X) + Var(Y) + 2\,Cov(X,Y)$
• More generally, if $W = aX + bY$ where $a$ and $b$ are real constants, then:
$Var(aX + bY) = a^2 Var(X) + b^2 Var(Y) + 2ab\,Cov(X,Y)$
$= a^2 Var(X) + b^2 Var(Y) + 2ab \cdot \rho \sigma_X \sigma_Y$
• So linearity does not apply to variance calculations when the random variables are not independent.
• When $X$ and $Y$ are independent, it means $Cov(X,Y) = 0$, so:
$Var(aX + bY) = a^2 Var(X) + b^2 Var(Y)$
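The identity for $Var(X + Y)$ can be checked against the running joint-distribution example: computing $Var(W)$ directly from the distribution of $W = X + Y$ should match $Var(X) + Var(Y) + 2\,Cov(X,Y)$:

```python
joint = {(8, 6): 0.1, (9, 6): 0.4, (8, 7): 0.3, (9, 7): 0.2}
mu_x = sum(x * p for (x, _), p in joint.items())
mu_y = sum(y * p for (_, y), p in joint.items())
var_x = sum((x - mu_x) ** 2 * p for (x, _), p in joint.items())  # 0.24
var_y = sum((y - mu_y) ** 2 * p for (_, y), p in joint.items())  # 0.25
cov = sum((x - mu_x) * (y - mu_y) * p for (x, y), p in joint.items())  # -0.1
# Var(X + Y) computed directly from the distribution of W = X + Y:
mu_w = mu_x + mu_y
var_w = sum((x + y - mu_w) ** 2 * p for (x, y), p in joint.items())
print(round(var_w, 4), round(var_x + var_y + 2 * cov, 4))  # 0.29 0.29
```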

Wonders of Variance
• If $a, b \geq 0$, is it possible to make $Var(aX + bY) < Var(X) + Var(Y)$?
$a^2 Var(X) + b^2 Var(Y) + 2ab\,Cov(X,Y) < Var(X) + Var(Y)$
• For simplicity, suppose $a = b = 1$:
$Var(X) + Var(Y) + 2\,Cov(X,Y) < Var(X) + Var(Y)$ [*]
• For this to hold, we require $Cov(X,Y) < 0$.
• This means that [*] does not hold for any $X$ and $Y$ in general.
• But if we find $X, Y$ such that $Cov(X,Y) < 0$, then [*] holds, and we're in luck!
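The running example is exactly such a lucky case: with $Var(X) = 0.24$, $Var(Y) = 0.25$ and $Cov(X,Y) = -0.1$ (values computed on the earlier slides), the sum's variance is smaller than the sum of variances:

```python
# Values from the running joint-distribution example.
var_x, var_y, cov_xy = 0.24, 0.25, -0.1
var_sum = var_x + var_y + 2 * cov_xy   # Var(X + Y) = 0.29
# Negative covariance shrinks the variance of the sum below 0.49.
print(var_sum < var_x + var_y)  # True
```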

Wonders of Variance
• Variance is related to real-life properties, and affects our decisions.
• Variance (and the associated notion of standard deviation) is related to energy, uncertainty of outcomes, financial risk, etc.
• We typically seek to reduce variance, since most businesses prefer predictability to uncertainty.
• The theory and understanding of variance give us a solid mathematical foundation for spending time and resources on finding matching pairs of $X, Y$ whose covariance is negative.
