
PHP 2510

Expectation, variance, covariance, correlation
Expectation
• Discrete RV - weighted average
• Continuous RV - use integral to take the weighted average
Variance
• Variance is the average of $(X - \mu)^2$
• Standard deviation
Covariance and correlation
• Covariance is the average of $(X - \mu_X)(Y - \mu_Y)$
• Correlation is a scaled version of covariance
Lots of examples
PHP 2510 – Oct 8, 2008 1
Expected value
Synonyms for expected value: ‘average’, ‘mean’
The expectation or expected value of a random variable X is a
weighted average of its possible outcomes.
For a discrete random variable, each outcome is weighted by its
probability of occurrence, using the mass function:
$$E(X) = \sum_i x_i \cdot P(X = x_i) = \sum_i x_i \, p(x_i)$$
For a continuous random variable, each outcome is weighted by the
relative frequency of its occurrence, using the density function:
$$E(X) = \int x \, f(x) \, dx$$
Examples: Discrete random variables
Example 1. Let X denote the number of boys in a family with
three children. Assume the probability of having a boy is .5.
Step 1: Compute the mass function
k p(k)
0 .125
1 .375
2 .375
3 .125
Step 2: Compute weighted average
$$E(X) = \sum_{k=0}^{3} k \, p(k) = (0)(.125) + (1)(.375) + (2)(.375) + (3)(.125) = 1.5$$
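The two steps can be mirrored in a few lines of Python. This is a minimal sketch: the mass function is built from the binomial formula with n = 3 and π = .5, and the names `pmf` and `expected` are just illustrative.

```python
from math import comb

# Step 1: mass function for the number of boys in 3 children, P(boy) = 0.5
n, pi = 3, 0.5
pmf = {k: comb(n, k) * pi**k * (1 - pi) ** (n - k) for k in range(n + 1)}

# Step 2: weighted average, each outcome k weighted by its probability p(k)
expected = sum(k * p for k, p in pmf.items())
print(pmf)       # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
print(expected)  # 1.5
```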
Example 2: Roulette. In roulette, a ball is tossed on a spinning
wheel, and it lands on one of 38 numbers (each of 1 to 36, plus 0
and 00). If you bet $1 on a particular number, the payoff for
winning is $36.
Suppose you bet $1 on the number 12. Define the random variable
X to be your winnings on one play of the roulette wheel. Then
$$X = \begin{cases} 36 & \text{if the number is 12} \\ -1 & \text{if the number is not 12} \end{cases}$$
Find E(X), or your expected winnings.
Step 1: Compute mass function
k p(k)
36 1/38
−1 37/38
Step 2: Compute E(X) as weighted average of outcomes
$$E(X) = \sum_{k \in \{-1,\, 36\}} k \, p(k) = (-1)\left(\frac{37}{38}\right) + (36)\left(\frac{1}{38}\right) = -\frac{1}{38} \approx -0.026$$
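A quick check of the roulette calculation, using exact fractions so that rounding does not hide the result. The sketch assumes the payoff model stated above (win 36, lose 1).

```python
from fractions import Fraction

# Mass function for the winnings X on a $1 bet on number 12
pmf = {36: Fraction(1, 38), -1: Fraction(37, 38)}

# Weighted average of outcomes, kept as an exact fraction
expected = sum(k * p for k, p in pmf.items())
print(expected)         # -1/38
print(float(expected))  # about -0.0263
```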
Question: What is the expected return in 100 plays of roulette?
Expected value for common discrete RV’s
Binomial. If X has the binomial distribution with parameters n
and π, then E(X) = nπ.
Example: Toss a coin 50 times, and let X denote the number of
heads. Then
E(X) = nπ = 50 ×.5 = 25
Example: The proportion of individuals with coronary artery
disease is .3. In a sample of 45 individuals, what is the
expected number of cases of CAD?
E(X) = nπ = 45 ×.3 = 13.5
Suppose one person is selected from the population. Define a
random variable Y such that Y = 1 if the person has CAD and
Y = 0 if not. Then
E(Y ) = nπ = 1 ×.3 = .3
Poisson. If X has the Poisson distribution with rate parameter λ,
then E(X) = λ. This is because
$$E(X) = \sum_{k=0}^{\infty} k \left( \frac{e^{-\lambda} \lambda^k}{k!} \right) = \lambda$$
The mean of a Poisson RV is the number of events you expect to
observe.
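A numerical sanity check of E(X) = λ. The sketch truncates the infinite sum at a hypothetical cutoff `kmax`; this is harmless because, for moderate λ, the Poisson probabilities are negligible long before k = 100.

```python
from math import exp, factorial

def poisson_mean(lam, kmax=100):
    """Truncated sum of k * e^(-lam) * lam^k / k! for k = 0, ..., kmax."""
    return sum(k * exp(-lam) * lam**k / factorial(k) for k in range(kmax + 1))

print(poisson_mean(2.0))  # very close to 2
```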
Geometric. If X has the Geometric distribution with success
probability π, then E(X) = 1/π. This is because
$$E(X) = \sum_{k=1}^{\infty} k \left[ (1 - \pi)^{k-1} \pi \right] = \frac{1}{\pi}$$
The mean of a geometric RV is the number of trials you expect to
need to obtain the first success (counting the successful trial itself).
Hence if the success probability π is low, E(X) will be high, and vice versa.
Example. If you roll two dice, the probability of rolling a total of 3 is 2/36,
or about 0.056. Let X denote the number of rolls until a total of 3 comes
up. What is E(X)? (Ans: 18)
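The same kind of truncated-sum check works for the geometric mean. Here `kmax` is an assumed cutoff beyond which the terms are negligible for this π.

```python
def geometric_mean(pi, kmax=10_000):
    """Truncated sum of k * (1 - pi)^(k - 1) * pi for k = 1, ..., kmax."""
    return sum(k * (1 - pi) ** (k - 1) * pi for k in range(1, kmax + 1))

p_three = 2 / 36  # total of 3 with two fair dice: (1, 2) or (2, 1)
print(geometric_mean(p_three))  # approximately 18 = 1 / p_three
```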
Expected value for continuous RV
Let X be a continuous random variable defined on an interval A.
Then the expected value is a weighted average of outcomes,
weighted by the relative frequency of each outcome. The weighted
average is computed using an integral,
$$E(X) = \int_A x \, f(x) \, dx$$
Example. Suppose X is a uniform random variable on the interval
[1, 4]. Find E(X).
Step 1: Recall that $f(x) = \frac{1}{4-1} = \frac{1}{3}$, and that the interval A is
[1, 4]. So the appropriate integral is
$$\int_1^4 x \, f(x) \, dx = \int_1^4 x \cdot \frac{1}{3} \, dx$$
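Step 2: Evaluate the integral
$$\int_1^4 x \cdot \frac{1}{3} \, dx = \frac{1}{3} \cdot \frac{x^2}{2} \Big|_1^4 = \frac{1}{3} \left( \frac{16}{2} - \frac{1}{2} \right) = 2.5$$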
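The integral can also be approximated numerically, which is a useful check when no closed form is handy. This sketch uses the midpoint rule (a choice made here for illustration, not something the notes prescribe); for an integrand that is linear in x it is exact up to rounding.

```python
def expected_value(density, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of x * density(x) over [a, b]."""
    h = (b - a) / steps
    total = 0.0
    for i in range(steps):
        x = a + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += x * density(x)
    return total * h

def uniform_density(x):
    return 1 / (4 - 1)  # f(x) = 1/3 on [1, 4]

print(expected_value(uniform_density, 1, 4))  # approximately 2.5
```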
Expected values for common continuous RV’s
Normal. If X has a normal distribution with parameters µ and σ,
then E(X) = µ.
Exponential. If X has the exponential distribution with
parameter θ, then E(X) = θ. In this case, θ is the expected waiting
time until an event occurs, and 1/θ is called the event rate.
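A simulation sketch of the exponential mean. One detail worth flagging: Python's `random.expovariate` is parameterized by the rate 1/θ rather than the mean θ, so the code passes `1 / theta`. The value θ = 3 and the seed are arbitrary choices.

```python
import random

theta = 3.0                # hypothetical mean waiting time
rate = 1 / theta           # the event rate
rng = random.Random(2510)  # fixed seed so the run is reproducible

# expovariate takes the rate (1 / mean), not the mean itself
waits = [rng.expovariate(rate) for _ in range(200_000)]
sample_mean = sum(waits) / len(waits)
print(sample_mean)  # should be close to theta = 3
```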
Some properties of expected values.
1. Linear combinations. If a and b are constants, then
E(aX +b) = aE(X) +b
2. Sums of random variables. The expected value of a sum of
random variables is the sum of expected values.
$$E(X_1 + X_2 + \cdots + X_n) = E(X_1) + E(X_2) + \cdots + E(X_n)$$
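Example. Suppose X is a Poisson random variable denoting the
number of lottery winners per week, with expected value
E(X) = 2. What is the expected number of winners over 4 weeks?
Writing $X_1, X_2, X_3, X_4$ for the four weekly counts, the rule for sums gives
$$E(X_1 + X_2 + X_3 + X_4) = E(X_1) + E(X_2) + E(X_3) + E(X_4) = 4 \times 2 = 8$$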
Example. Let X denote the daily low temperature for each day in
September, and let E(X) denote its average. Suppose E(X) = 65,
measured in degrees Fahrenheit. What is the mean temperature in
degrees Celsius?
To convert X from F to C, define a new random variable
$$Y = \frac{5}{9} X - \frac{160}{9}$$
Then using the rule about linear combinations,
$$E(Y) = \frac{5}{9} E(X) - \frac{160}{9} = \frac{5}{9}(65) - \frac{160}{9} \approx 18.3$$
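To see the linear-combination rule at work, the sketch uses a hypothetical two-point distribution for X with E(X) = 65 and computes E(Y) two ways: by weighting each converted outcome directly, and by applying aE(X) + b. Exact fractions keep the comparison free of rounding.

```python
from fractions import Fraction

def to_celsius(f):
    """Y = (5/9) X - 160/9, i.e. (X - 32) * 5/9."""
    return Fraction(5, 9) * f - Fraction(160, 9)

# Hypothetical mass function for X (degrees F) with E(X) = 65
pmf = {60: Fraction(1, 2), 70: Fraction(1, 2)}

mean_f = sum(x * p for x, p in pmf.items())
mean_c_direct = sum(to_celsius(x) * p for x, p in pmf.items())  # weight each converted outcome
mean_c_linear = to_celsius(mean_f)                              # a E(X) + b
print(mean_f, mean_c_direct, mean_c_linear)  # 65 55/3 55/3
```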
Computing means from a sample of data
Loosely speaking, for a sample of observed data $x_1, x_2, \ldots, x_n$, each
of the individual $x_i$ can be thought of as having associated
probability mass $p(x_i) = 1/n$.
So the sample mean is
$$\bar{x} = \sum_{i=1}^{n} x_i \, p(x_i) = \sum_{i=1}^{n} x_i (1/n) = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Simply put, take the sum of the observations and divide by n.
Sample means are not expected values! They are random variables.
We will discuss sample means later on ....
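A small sketch of the sample mean as a weighted average. It uses the first six map24 values from the data excerpt later in these notes, purely as example numbers.

```python
# First six map24 values from the data excerpt, used only as example numbers
data = [72.7, 69.3, 81, 63.7, 74, 73.3]
n = len(data)

weighted = sum(x * (1 / n) for x in data)  # each x_i carries mass p(x_i) = 1/n
plain = sum(data) / n                      # sum of observations divided by n
print(weighted, plain)  # equal up to floating-point rounding
```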
Variance of a random variable
Variance measures dispersion of a random variable’s distribution.
It is just an average. It is the average squared deviation of a
random variable from its mean.
To make notation simple, let µ = E(X). Then
$$\mathrm{var}(X) = E\{(X - \mu)^2\}$$
In other words, it is the average value of $(X - \mu)^2$.
For a discrete random variable,
$$\mathrm{var}(X) = \sum_i (x_i - \mu)^2 \, p(x_i)$$
For a continuous random variable,
$$\mathrm{var}(X) = \int (x - \mu)^2 f(x) \, dx$$
Example 1 (consumers of alcohol). In a certain population,
the proportion of those consuming alcohol is .65. Select a person at
random, with X = 1 if consumer of alcohol and X = 0 if not.
In this example, E(X) = µ = 0.65.
$$\begin{aligned} \mathrm{var}(X) &= E\{(X - 0.65)^2\} \\ &= \sum_i (x_i - 0.65)^2 \, p(x_i) \\ &= (1 - 0.65)^2 (0.65) + (0 - 0.65)^2 (0.35) = .228 \end{aligned}$$
Example 2. Suppose instead the probability was 0.1. What then
is var(X)? Ans = 0.09.
Pattern: For a Binomial random variable X with n = 1 and
success probability π,
var(X) = π(1 −π)
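The variance formula for a discrete RV translates directly into code. This sketch reproduces both alcohol examples and compares them with π(1 − π); the function name `discrete_variance` is illustrative.

```python
def discrete_variance(pmf):
    """var(X) = sum over outcomes of (x - mu)^2 p(x), where mu = E(X)."""
    mu = sum(x * p for x, p in pmf.items())
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

for pi in (0.65, 0.1):
    bernoulli = {1: pi, 0: 1 - pi}  # X = 1 with probability pi, else 0
    print(pi, discrete_variance(bernoulli), pi * (1 - pi))
```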
Properties of variance
• If a and b are constants, then
$$\mathrm{var}(aX + b) = a^2 \, \mathrm{var}(X)$$
(Why is b not included?)
• If $X_1, X_2, \ldots, X_n$ are independent random variables, then
$$\mathrm{var}(X_1 + X_2 + \cdots + X_n) = \mathrm{var}(X_1) + \mathrm{var}(X_2) + \cdots + \mathrm{var}(X_n)$$
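Both properties can be checked exactly on the mass function from the three-children example. The constants a = 2 and b = 5 are arbitrary, and the sum uses two independent copies of X, so the joint probability of (x1, x2) is p(x1)p(x2).

```python
from itertools import product

def mean(pmf):
    return sum(x * p for x, p in pmf.items())

def variance(pmf):
    mu = mean(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

# Number of boys among three children (mass function from Example 1)
pmf_x = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}

# var(aX + b): every outcome shifts by b, and so does the mean
a, b = 2, 5
pmf_y = {a * x + b: p for x, p in pmf_x.items()}
print(variance(pmf_y), a**2 * variance(pmf_x))  # 3.0 3.0

# X1 + X2 for independent copies: P(X1 = x1, X2 = x2) = p(x1) p(x2)
pmf_sum = {}
for (x1, p1), (x2, p2) in product(pmf_x.items(), repeat=2):
    pmf_sum[x1 + x2] = pmf_sum.get(x1 + x2, 0) + p1 * p2
print(variance(pmf_sum), 2 * variance(pmf_x))  # 1.5 1.5
```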
Computing variances from a sample of data
As with the sample mean, for a sample of observed data
$x_1, x_2, \ldots, x_n$, each of the individual $x_i$ can be thought of as having
associated probability mass $p(x_i) = 1/n$.
To calculate the sample variance, we take an average of $(x_i - \bar{x})^2$.
The sample variance is
$$S^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 \, p(x_i) = \sum_{i=1}^{n} (x_i - \bar{x})^2 (1/n) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$
It is more common to use $\frac{1}{n-1}$ instead of $\frac{1}{n}$. We will discuss reasons
for this later.
For now, you should think of variance as an average.
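A sketch comparing the two divisors, again on the first six map24 values from the data excerpt. Python's standard library provides both versions: `statistics.pvariance` divides by n and `statistics.variance` divides by n − 1.

```python
import statistics

# First six map24 values from the data excerpt, used only as example numbers
data = [72.7, 69.3, 81, 63.7, 74, 73.3]
n = len(data)
xbar = sum(data) / n
squared_devs = [(x - xbar) ** 2 for x in data]

s2_n = sum(squared_devs) / n                # each x_i has mass 1/n
s2_n_minus_1 = sum(squared_devs) / (n - 1)  # the more common divisor

# pvariance divides by n; variance divides by n - 1
print(s2_n, statistics.pvariance(data))
print(s2_n_minus_1, statistics.variance(data))
```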
Standard deviation
The standard deviation measures the average distance of a random
variable X from its mean. By definition, $SD(X) = \sqrt{\mathrm{var}(X)}$.
The logic goes like this:
1. because var(X) measures average squared deviation between X
and its mean; and
2. because $SD(X) = \sqrt{\mathrm{var}(X)}$; then
3. SD(X) is approximately equal to the average absolute deviation
between X and its mean (in fact, SD(X) is never smaller than that average)
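For the three-children example, the sketch computes both SD(X) and the average absolute deviation E|X − µ|. They are close but not equal, and SD is the larger of the two.

```python
from math import sqrt

pmf = {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}  # boys among three children
mu = sum(x * p for x, p in pmf.items())          # 1.5

sd = sqrt(sum((x - mu) ** 2 * p for x, p in pmf.items()))
mean_abs_dev = sum(abs(x - mu) * p for x, p in pmf.items())
print(sd, mean_abs_dev)  # about 0.866 and 0.75
```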
Example.
In September in Providence, noon time temperature has mean 65
and variance 100.
• What is the SD of the temperatures?
• Select a day at random. What does SD tell us about the
temperature on that day, relative to the average temperature?
• Suppose noon time temps are normally distributed. Should a
noon time temperature of 85 be considered unusual? Why or
why not?
Mean and variance for some common RV’s
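Random variable | Mass or density function | E(X) | var(X)
Binomial(n, π) | $\binom{n}{x} \pi^x (1-\pi)^{n-x}$ | $n\pi$ | $n\pi(1-\pi)$
Poisson(λ) | $e^{-\lambda} \lambda^x / x!$ | $\lambda$ | $\lambda$
Geometric(π) | $(1-\pi)^{x-1} \pi$ | $1/\pi$ | $(1-\pi)/\pi^2$
Normal(µ, σ²) | $\frac{1}{\sigma\sqrt{2\pi}} e^{-(x-\mu)^2/(2\sigma^2)}$ | $\mu$ | $\sigma^2$
Exponential(θ) | $(1/\theta) e^{-x/\theta}$ | $\theta$ | $\theta^2$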
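A truncated-sum check of two variance entries: for the Poisson the variance is λ, and for the geometric it is (1 − π)/π². The parameter values λ = 4 and π = .2 are hypothetical, and the cutoffs (150 and 2000 terms) are assumptions chosen so that the omitted tail is negligible.

```python
from math import exp, factorial

def mean_var(terms):
    """Mean and variance from a list of (k, p(k)) pairs."""
    mu = sum(k * p for k, p in terms)
    return mu, sum((k - mu) ** 2 * p for k, p in terms)

lam, pi = 4.0, 0.2  # hypothetical parameter values
poisson = [(k, exp(-lam) * lam**k / factorial(k)) for k in range(150)]
geometric = [(k, (1 - pi) ** (k - 1) * pi) for k in range(1, 2000)]

print(mean_var(poisson))    # near (4, 4): lambda and lambda
print(mean_var(geometric))  # near (5, 20): 1/pi and (1 - pi)/pi^2
```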
Correlation and Covariance
Correlation and covariance are one way to measure association
between two random variables that are observed at the same time
on the same unit.
Example: Height and weight measured on the same person
Example: years of education and income
Example: two successive measures of weight, taken on the same
person but one year apart.
Covariance
Covariance measures the degree to which two variables vary together
relative to their means. It is an average:
$$\mathrm{cov}(X, Y) = E\{(X - \mu_X)(Y - \mu_Y)\}$$
cov(X, Y ) > 0 means that X and Y tend to vary in the same
direction relative to their means (both higher or both lower).
They have a positive association.
• Example: height and weight
cov(X, Y ) < 0 means that X and Y tend to vary in opposite
directions relative to their means (when one is higher, the other
is lower). They have a negative association.
• Example: weight and minutes of exercise per day
cov(X, Y) = 0 means that X and Y have no linear association; on its own it
does not imply that they are independent.
Example: mean arterial pressure and body mass index
during pregnancy
SUMMARY STATISTICS
Variable | Obs Mean Std. Dev.
----------+---------------------------------
map24 | 326 76.55951 7.351673
bmi | 326 25.10736 6.217994
Give an interpretation for SD here.
[Figure: scatterplot of map24 (vertical axis) against bmi (horizontal axis)]
Computing covariance
For individual i, let $m_i$ denote MAP and let $b_i$ denote BMI.
In this table, prod represents
$$(m_i - \bar{m}) \times (b_i - \bar{b})$$
Recall $\bar{m} = 76.6$ and $\bar{b} = 25.1$.
To compute covariance, we take the average (sample mean) of the
products (following pages)
DATA EXCERPT
map24 (m_i) bmi (b_i) prod
-------------------------------------
1. 72.7 15.9 35.53593
2. 69.3 16.3 63.9371
3. 81 16.3 -39.10899
4. 63.7 16.3 113.2583
5. 74 16.6 21.77467
6. 73.3 16.6 27.7298
7. 69.3 16.9 59.58139
8. 74.7 16.9 15.26169
9. 82.7 17 -49.78313
10. 73 17.2 28.14632
11. 66.3 17.2 81.12561
12. 74 17.8 18.70326
13. 73 17.8 26.01062
14. 84.3 17.9 -55.78852
15. 68.3 17.9 59.52924
16. 70.3 18 44.48857
SUMMARY STATISTICS
Variable | Obs Mean
---------+-----------------------
prod | 326 13.2753
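The prod column can be reproduced from the map24 and bmi columns using the means from the summary statistics. The sketch does this for the 16 rows shown; the covariance itself (13.2753) is the mean of prod over all 326 observations, which are not reproduced in these notes.

```python
# Means reported in the summary statistics
m_bar, b_bar = 76.55951, 25.10736

# The 16 excerpt rows: (map24, bmi, prod as printed)
rows = [
    (72.7, 15.9, 35.53593), (69.3, 16.3, 63.9371), (81, 16.3, -39.10899),
    (63.7, 16.3, 113.2583), (74, 16.6, 21.77467), (73.3, 16.6, 27.7298),
    (69.3, 16.9, 59.58139), (74.7, 16.9, 15.26169), (82.7, 17, -49.78313),
    (73, 17.2, 28.14632), (66.3, 17.2, 81.12561), (74, 17.8, 18.70326),
    (73, 17.8, 26.01062), (84.3, 17.9, -55.78852), (68.3, 17.9, 59.52924),
    (70.3, 18, 44.48857),
]

# prod = (m_i - m_bar) * (b_i - b_bar)
prods = [(m - m_bar) * (b - b_bar) for m, b, _ in rows]
for (m, b, printed), p in zip(rows, prods):
    print(f"{m:>6} {b:>6} {p:>10.4f}   printed: {printed}")
```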
Computing covariance from a sample
Like mean and variance, covariance is an average.
In a sample of pairs $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$, we can assume
each pair is observed with probability $p(x_i, y_i) = 1/n$.
Then the sample covariance is a weighted average of
$(x_i - \bar{x})(y_i - \bar{y})$:
$$\widehat{\mathrm{cov}}(X, Y) = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}) \, p(x_i, y_i) = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
Correlation is a standardized covariance
$$\mathrm{corr}(X, Y) = \frac{\mathrm{cov}(X, Y)}{SD(X) \times SD(Y)}$$
Always between –1 and 1
Measures degree of linear relationship
(If relationship not linear, correlation not an appropriate
measure of association)
Pearson’s sample correlation plugs in sample estimates for the
quantities in the formula above
$$\widehat{\mathrm{corr}}(X, Y) = \frac{(1/n) \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{S_x \times S_y}$$
SUMMARY STATISTICS
Variable | Obs Mean Std. Dev. Min Max
---------+-----------------------------------------------------
prod | 326 13.2753 53.69735 -131.3067 391.1627
map24 | 326 76.55951 7.351673 55 101.3
bmi | 326 25.10736 6.217994 15.9 57.2
CORRELATION COEFFICIENT
(obs=326)
| bmi
---------+------------------
map24 | 0.2913
Using the numbers on the table above, how would you obtain the
correlation coefficient?
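One way to answer, sketched in code. It assumes that the reported Std. Dev. values use divisor n − 1 (as summary-statistics output such as Stata's `summarize` does) while the mean of prod uses divisor n. Rescaling the mean of prod by n/(n − 1) puts the numerator and denominator on the same footing.

```python
n = 326
mean_prod = 13.2753                  # mean of prod, divisor n
sd_map, sd_bmi = 7.351673, 6.217994  # assumed to use divisor n - 1

r_unadjusted = mean_prod / (sd_map * sd_bmi)
r_adjusted = mean_prod * n / (n - 1) / (sd_map * sd_bmi)
print(round(r_unadjusted, 4), round(r_adjusted, 4))
```

Without the rescaling the ratio is about 0.2904; with it, the result agrees with the reported 0.2913.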