
Subsection 1

Introduction



Summarizing data
How do we provide summary information about a large set of data?
- Minimum value, maximum value, average value, median value

The average value of a set of data is very useful in practice.
- Average marks in a class exam
- Run rate and batting average in cricket (and similar numbers in many other sports)

What does the average represent?
- One number to represent a large set of numbers
- Useful in comparisons and in many other scenarios

In probability theory, the expected value of a random variable is an important theoretical construct that represents the average value.
- It is used to express various types of summary information.


Example: Casino math
Place bets on the sum seen when two dice are rolled.
Bets and returns
- Under 7: get the money back
- Over 7: get the money back
- Equal to 7: get 4 times the money back

Suppose you bet 1 unit of money.

Bet        Gain   Prob
Under 7     0     5/12
           −1     7/12
Over 7      0     5/12
           −1     7/12
Equal 7    +3     1/6
           −1     5/6

How did the casino decide on the above returns? How should you bet?
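These gains and probabilities are easy to sanity-check by simulation. Below is a minimal Python sketch (the bet names and the 1-unit stake are just the setup above); the printed averages should approach −7/12, −7/12 and −1/3.

```python
import random

def gain(bet: str) -> int:
    """Net gain from betting 1 unit on the sum of two fair dice."""
    s = random.randint(1, 6) + random.randint(1, 6)
    if bet == "under":
        return 0 if s < 7 else -1   # money back on a win, stake lost otherwise
    if bet == "over":
        return 0 if s > 7 else -1
    return 3 if s == 7 else -1      # 4x the money back = net gain of +3

n = 100_000
for bet in ("under", "over", "equal"):
    avg = sum(gain(bet) for _ in range(n)) / n
    print(bet, round(avg, 3))       # ≈ -0.583, -0.583, -0.333
```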
Expected value of a discrete random variable

Definition (Expected value)
Suppose X is a discrete random variable with range T_X and PMF f_X. The expected value of X, denoted E[X], is defined as

E[X] = Σ_{t∈T_X} t f_X(t),

assuming the above sum exists.

Other names: mean of X, average value of X
- E[X] may or may not belong to the range of X
- E[X] has the same units as X


Examples: Easy summation given PMF
1. X ∼ Bernoulli(p)

   E[X] = 0 · (1 − p) + 1 · p = p

2. X ∼ Uniform{1, 2, 3, 4, 5, 6}

   E[X] = 1 · 1/6 + 2 · 1/6 + 3 · 1/6 + 4 · 1/6 + 5 · 1/6 + 6 · 1/6 = 3.5

3. Value of a lottery ticket (in Rs.): 200 w.p. 1/1000, 20 w.p. 27/1000, 0 w.p. 972/1000

   E[X] = 200 · (1/1000) + 20 · (27/1000) + 0 · (972/1000) = Rs. 0.74

4. Change in temperature (in °C): −2 w.p. 1/5, −1 w.p. 1/5, 0 w.p. 1/2, 1 w.p. 1/10

   E[X] = −2 · (1/5) − 1 · (1/5) + 0 · (1/2) + 1 · (1/10) = −0.5 °C
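Each of these is a finite sum over the range, so they are easy to verify with a few lines of code. A minimal sketch, with the PMFs written as dictionaries (the helper name expected_value is our convention, not from the slides):

```python
def expected_value(pmf: dict) -> float:
    """E[X] = sum of t * f_X(t) over the range of X."""
    return sum(t * p for t, p in pmf.items())

die = {t: 1/6 for t in range(1, 7)}
lottery = {200: 1/1000, 20: 27/1000, 0: 972/1000}
temperature = {-2: 1/5, -1: 1/5, 0: 1/2, 1: 1/10}

print(expected_value(die))          # 3.5
print(expected_value(lottery))      # 0.74
print(expected_value(temperature))  # -0.5
```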
Example: Uniform{a, a + 1, . . . , b}

X ∼ Uniform{a, a + 1, . . . , b}

E[X] = a · 1/(b − a + 1) + (a + 1) · 1/(b − a + 1) + · · · + b · 1/(b − a + 1)

How to simplify the sum?

Identity: a + (a + 1) + · · · + b = (b − a + 1) · (a + b)/2

Therefore,

E[X] = (a + b)/2
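A quick numeric check of the identity and the resulting mean (a and b below are arbitrary choices):

```python
a, b = 3, 11
values = list(range(a, b + 1))    # the range of Uniform{a, ..., b}
print(sum(values) / len(values))  # 7.0
print((a + b) / 2)                # 7.0, matching the identity
```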


Examples: More involved summation

1. X ∼ Geometric(p)

   E[X] = Σ_{t=1}^∞ t (1 − p)^{t−1} p

2. X ∼ Poisson(λ)

   E[X] = Σ_{t=0}^∞ t e^{−λ} λ^t / t!

3. X ∼ Binomial(n, p)

   E[X] = Σ_{t=0}^n t C(n, t) p^t (1 − p)^{n−t}


How to simplify the sum?
1. Difference Equation (DE): a_{t+1} − r a_t = b_t (r ≠ 1)

   Σ_{t=1}^n a_t = (a_1 − r a_n)/(1 − r) + (1/(1 − r)) Σ_{t=1}^{n−1} b_t

2. Geometric Progression (GP): a_{t+1} − r a_t = 0 (r ≠ 1)

   Σ_{t=1}^n a_t = (a_1 − r a_n)/(1 − r), which tends to a_1/(1 − r) as n → ∞ when |r| < 1

3. Exponential function:

   Σ_{t=0}^∞ e^{−λ} λ^t / t! = 1

4. Binomial formula:

   Σ_{k=0}^n C(n, k) a^k b^{n−k} = (a + b)^n


Examples: More involved summation
1. X ∼ Geometric(p) (DE with a_1 = p, r = 1 − p, b_t = r^t p)

   E[X] = Σ_{t=1}^∞ t (1 − p)^{t−1} p = 1/p

2. X ∼ Poisson(λ): E[X] = Σ_{t=0}^∞ t e^{−λ} λ^t / t!

   E[X] = e^{−λ} (0 · 1 + 1 · λ/1! + 2 · λ²/2! + 3 · λ³/3! + · · ·)
        = λ e^{−λ} (1 + λ/1! + λ²/2! + · · ·) = λ

3. X ∼ Binomial(n, p): E[X] = Σ_{t=0}^n t C(n, t) p^t (1 − p)^{n−t}

   E[X] = Σ_{t=1}^n (t · n!)/(t! (n − t)!) p^t (1 − p)^{n−t}
        = np Σ_{t=1}^n ((n − 1)!)/((t − 1)! (n − t)!) p^{t−1} (1 − p)^{n−t}
        = np (p + (1 − p))^{n−1} = np
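All three closed forms can be checked numerically by truncating the infinite sums (the parameter values and truncation points below are arbitrary but sufficient for good accuracy):

```python
from math import comb, exp, factorial

p, lam, n = 0.3, 2.5, 10

geometric = sum(t * (1 - p)**(t - 1) * p for t in range(1, 500))
poisson = sum(t * exp(-lam) * lam**t / factorial(t) for t in range(101))
binomial = sum(t * comb(n, t) * p**t * (1 - p)**(n - t) for t in range(n + 1))

print(geometric, 1 / p)   # both ≈ 3.333
print(poisson, lam)       # both ≈ 2.5
print(binomial, n * p)    # both 3.0
```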
PMF and expected value
[Figure-only slide.]


Subsection 2

Properties of expected value



Example: Casino math

Bet        Gain   Prob
Under 7     0     5/12
           −1     7/12
Over 7      0     5/12
           −1     7/12
Equal 7    +3     1/6
           −1     5/6

Suppose you bet 1 unit of money with probability p1 each on "Under 7" and "Over 7", and with probability 1 − 2p1 on "Equal to 7". What is the expected gain?
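A sketch of the computation using linearity: condition on which bet is placed and average the three per-bet expected gains (all values come from the table above):

```python
# Expected gain of each individual 1-unit bet, from the table
e_under = 0 * (5/12) + (-1) * (7/12)   # -7/12
e_over  = 0 * (5/12) + (-1) * (7/12)   # -7/12
e_equal = 3 * (1/6)  + (-1) * (5/6)    # -1/3

def expected_gain(p1: float) -> float:
    """Bet Under/Over w.p. p1 each and Equal w.p. 1 - 2*p1."""
    return p1 * e_under + p1 * e_over + (1 - 2 * p1) * e_equal

for p1 in (0.0, 0.25, 0.5):
    print(p1, expected_gain(p1))   # -1/3 - p1/2: every strategy loses on average
```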


Constant random variable and positive random variable

E[X] = Σ_{t∈T_X} t P(X = t)

1. Consider a constant c as a random variable X with P(X = c) = 1. Then

   E[c] = c · 1 = c

2. Suppose X takes only non-negative values, i.e. P(X ≥ 0) = 1. Then

   E[X] ≥ 0


Expected value of a function of random variables

Theorem (Expected value of a function)
Suppose X_1, . . . , X_n have joint PMF f_{X_1···X_n}, with the range of X_i denoted T_{X_i}. Let g : T_{X_1} × · · · × T_{X_n} → R be a function, and let Y = g(X_1, . . . , X_n) have range T_Y and PMF f_Y. Then,

E[g(X_1, . . . , X_n)] = Σ_{t∈T_Y} t f_Y(t) = Σ_{t_i∈T_{X_i}} g(t_1, . . . , t_n) f_{X_1···X_n}(t_1, . . . , t_n).

We have seen how to find f_Y, the PMF of a function of multiple random variables.
The theorem says: to find E[Y], you do not always need f_Y. The joint PMF of X_1, . . . , X_n can be used directly.
A simple-sounding property with far-reaching consequences!


Examples for illustration
1. X ∼ {−2, −1, 0, 1, 2}, each w.p. 1/5; g(X) = X² ∼ {0 w.p. 1/5, 1 w.p. 2/5, 4 w.p. 2/5}

   E[g(X)] = 0 · 1/5 + 1 · 2/5 + 4 · 2/5 = 2
   E[g(X)] = (−2)² · 1/5 + (−1)² · 1/5 + 0² · 1/5 + 1² · 1/5 + 2² · 1/5 = 2

2. (X, Y) ∼ Uniform{(0, 0), (1, 0), (0, 1), (1, 1), (−1, 1), (1, −1)};
   g(X, Y) = X² + XY + Y² ∼ {0 w.p. 1/6, 1 w.p. 4/6, 3 w.p. 1/6}

   E[g(X, Y)] = 0 · 1/6 + 1 · 4/6 + 3 · 1/6 = 7/6
   E[g(X, Y)] = 0 · 1/6 + 1 · 1/6 + 1 · 1/6 + 3 · 1/6 + 1 · 1/6 + 1 · 1/6 = 7/6
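A sketch computing E[g(X, Y)] of example 2 both ways, via the PMF of g(X, Y) and via the joint PMF directly:

```python
from collections import Counter

points = [(0, 0), (1, 0), (0, 1), (1, 1), (-1, 1), (1, -1)]  # uniform, w.p. 1/6 each
g = lambda x, y: x * x + x * y + y * y

# Way 1: first build the PMF of Y = g(X, Y), then sum t * f_Y(t)
f_Y = Counter(g(x, y) for x, y in points)
e1 = sum(t * count / len(points) for t, count in f_Y.items())

# Way 2: sum g over the joint PMF directly, without finding f_Y
e2 = sum(g(x, y) / len(points) for x, y in points)

print(e1, e2)   # both 7/6 ≈ 1.1667
```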


Linearity of expected value
1. E[cX] = cE[X] for a random variable X and a constant c
   Proof:

   E[cX] = Σ_{t∈T_X} ct f_X(t) = c Σ_{t∈T_X} t f_X(t) = cE[X]

2. E[X + Y] = E[X] + E[Y] for any two random variables X, Y
   Proof:

   E[X + Y] = Σ_{t_1∈T_X, t_2∈T_Y} (t_1 + t_2) f_{XY}(t_1, t_2)
            = Σ_{t_1, t_2} t_1 f_{XY}(t_1, t_2) + Σ_{t_1, t_2} t_2 f_{XY}(t_1, t_2)
            = E[X] + E[Y]

Combining the two: E[aX + bY] = aE[X] + bE[Y]
One of the most useful properties of expected value!
Illustrative examples

1. (X, Y) ∼ Uniform{(0, 0), (1, 0), (0, 1), (1, 1), (−1, 1), (1, −1)};
   g(X, Y) = X² + XY + Y² ∼ {0 w.p. 1/6, 1 w.p. 4/6, 3 w.p. 1/6}
   E[g(X, Y)] = 7/6
   E[g(X, Y)] = E[X² + XY + Y²] = E[X²] + E[XY] + E[Y²] = 7/6
   (E[X²] = 4/6, E[XY] = −1/6, E[Y²] = 4/6)
2. X ∼ f_X, Y ∼ f_Y, and the joint PMF f_{XY} is not given. Can you compute E[X + Y]? Yes!
   E[X + Y] = E[X] + E[Y] = Σ_{t∈T_X} t f_X(t) + Σ_{t∈T_Y} t f_Y(t)

Note: Expected values of the form E[g(X) + h(Y)] can be computed from the marginal PMFs alone; they do not depend on the joint PMF.


Problem: Throw a fair die twice
What is the expected value of the sum of the two numbers seen?

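By linearity, the answer is E[X_1] + E[X_2] = 3.5 + 3.5 = 7, where X_1, X_2 are the two throws. A simulation sketch to confirm:

```python
import random

n = 100_000
total = sum(random.randint(1, 6) + random.randint(1, 6) for _ in range(n))
print(total / n)   # ≈ 7.0 = 3.5 + 3.5, by linearity
```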


Problem: Expected value of Binomial(n, p)
Suppose Y ∼ Binomial(n, p).

E[Y] = Σ_{k=0}^n k C(n, k) p^k (1 − p)^{n−k} = np

Such summations may be difficult to simplify.

Alternative method using linearity of expected value
Let X_1, . . . , X_n be i.i.d. Bernoulli(p). Then, E[X_i] = p and

Y = X_1 + · · · + X_n ∼ Binomial(n, p)

So, E[Y] = E[X_1] + · · · + E[X_n] = np
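A simulation sketch of this decomposition (n, p and the trial count are arbitrary choices):

```python
import random

n, p, trials = 20, 0.3, 50_000

def binomial_sample() -> int:
    """Sum of n i.i.d. Bernoulli(p) random variables."""
    return sum(random.random() < p for _ in range(n))

average = sum(binomial_sample() for _ in range(trials)) / trials
print(average, n * p)   # both ≈ 6.0
```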


Zero-mean random variable
A random variable X with E[X] = 0 is said to be a zero-mean random variable.

Translation of a random variable
X + c, where c is a constant, is a "translated" version of X.
- The range of X + c is {t + c : t ∈ T_X}, a translated version of T_X, the range of X
- P(X + c = t + c) = P(X = t), so the PMF is "translated" as well

Translation by expected value
- Important: E[X] is a constant.
- Y = X − E[X] is a translated version of X, and E[Y] = 0.
- So, X − E[X] is a zero-mean random variable.


Problem: Balls in bins
Suppose 10 balls are thrown uniformly at random into 3 bins. What is the expected number of empty bins?
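One standard approach (our sketch; the slide leaves the solution open) is linearity with indicators: writing I_j = 1 when bin j is empty, E[number of empty bins] = 3 · P(a given bin is empty) = 3 · (2/3)^10 ≈ 0.052. A simulation to check:

```python
import random

def empty_bins(balls: int = 10, bins: int = 3) -> int:
    """Throw each ball into a uniformly random bin; count empty bins."""
    occupied = {random.randrange(bins) for _ in range(balls)}
    return bins - len(occupied)

n = 100_000
print(sum(empty_bins() for _ in range(n)) / n)   # ≈ 3 * (2/3)**10 ≈ 0.052
```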


Subsection 3

Variance



Motivation

How good is the expected value at describing a random variable?

Let X ∼ {10 w.p. 1}, Y ∼ {9 w.p. 1/2, 11 w.p. 1/2}, Z ∼ {0 w.p. 1/2, 20 w.p. 1/2}.

E[X] = E[Y] = E[Z] = 10

However, the three random variables are quite different!

Center and spread
- Expected value represents the "center" of a random variable
- We need some indicator of the "spread" about the expected value


Variance and standard deviation
Definition (Variance and standard deviation)
The variance of a random variable X, denoted Var(X), is defined as

Var(X) = E[(X − E[X])²].

The standard deviation of X, denoted SD(X), is defined as

SD(X) = +√Var(X).

Variance is the expected value of the random variable (X − E[X])².
- Var(X) = Σ_{t∈T_X} (t − E[X])² P(X = t)
- Variance is non-negative, so the standard deviation is well-defined.
- The units of SD(X) are the same as the units of X.
Intuitively, the more "spread" there is in T_X, the larger Var(X) will be.


Example
X ∼ {10 w.p. 1}, Y ∼ {9 w.p. 1/2, 11 w.p. 1/2}, Z ∼ {0 w.p. 1/2, 20 w.p. 1/2}

E[X] = E[Y] = E[Z] = 10

Variances and standard deviations are different!

Var(X) = (10 − 10)² · 1 = 0, SD(X) = 0

Var(Y) = (9 − 10)² · 1/2 + (11 − 10)² · 1/2 = 1, SD(Y) = 1

Var(Z) = (0 − 10)² · 1/2 + (20 − 10)² · 1/2 = 100, SD(Z) = 10


Example: Throw a die
X ∼ Uniform{1, 2, 3, 4, 5, 6}

E[X] = 3.5

Var(X) = (1 − 3.5)² · 1/6 + (2 − 3.5)² · 1/6 + (3 − 3.5)² · 1/6
       + (4 − 3.5)² · 1/6 + (5 − 3.5)² · 1/6 + (6 − 3.5)² · 1/6
       = 35/12 = 2.916 . . .

SD(X) = 1.7078 . . .

Given the PMF of X with a small range, the variance and standard deviation can be readily computed.
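A sketch extending the earlier expected_value helper to variance (the pmf-as-dictionary representation is our convention, not the slides'):

```python
def expected_value(pmf: dict) -> float:
    return sum(t * p for t, p in pmf.items())

def variance(pmf: dict) -> float:
    """Var(X) = E[(X - E[X])^2], a finite sum over the range of X."""
    mu = expected_value(pmf)
    return sum((t - mu) ** 2 * p for t, p in pmf.items())

die = {t: 1/6 for t in range(1, 7)}
print(variance(die))          # 2.9166... = 35/12
print(variance(die) ** 0.5)   # 1.7078... = SD(X)
```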
Properties: Scaling and translation

Let X be a random variable and let a be a constant real number.
1. Var(aX) = a² Var(X)
2. SD(aX) = |a| SD(X)
3. Var(X + a) = Var(X)
4. SD(X + a) = SD(X)
Contrast these with the corresponding properties of E[X].


Alternative formula for variance
Theorem
The variance of a random variable X is given by

Var(X) = E[X²] − E[X]².

Proof:
Var(X) = E[(X − E[X])²] = E[X² − 2E[X]X + E[X]²]
By linearity of expected value,

Var(X) = E[X²] − 2E[E[X]X] + E[E[X]²]

Since E[X] is a constant,

Var(X) = E[X²] − 2E[X]E[X] + E[X]² = E[X²] − E[X]²

E[X], E[X²]: first and second moments; Var(X): second central moment
Sum and product of independent random variables

For any two random variables X and Y (independent or dependent),

E[X + Y] = E[X] + E[Y].

Suppose X and Y are independent random variables. Then, more is true.
1. E[XY] = E[X]E[Y]
2. Var(X + Y) = Var(X) + Var(Y)

Proof sketch: independence gives f_{XY}(t_1, t_2) = f_X(t_1) f_Y(t_2), so

E[XY] = Σ_{t_1, t_2} t_1 t_2 f_X(t_1) f_Y(t_2) = (Σ_{t_1} t_1 f_X(t_1)) (Σ_{t_2} t_2 f_Y(t_2)) = E[X]E[Y].

Then, expanding Var(X + Y) = E[(X + Y)²] − (E[X] + E[Y])² gives Var(X) + Var(Y) + 2(E[XY] − E[X]E[Y]) = Var(X) + Var(Y).


Example: Sum of two dice
What is the variance of the sum of two dice?

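Since the two throws are independent, Var(sum) = 35/12 + 35/12 = 35/6 ≈ 5.83, using the single-die variance computed earlier. A simulation sketch:

```python
import random

n = 200_000
samples = [random.randint(1, 6) + random.randint(1, 6) for _ in range(n)]
mean = sum(samples) / n
var = sum((s - mean) ** 2 for s in samples) / n
print(var)   # ≈ 35/6 ≈ 5.833 = Var(die) + Var(die)
```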


Variance of common distributions

Distribution           Expected value   Variance
Bernoulli(p)           p                p(1 − p)
Binomial(n, p)         np               np(1 − p)
Geometric(p)           1/p              (1 − p)/p²
Poisson(λ)             λ                λ
Uniform{1, . . . , n}  (n + 1)/2        (n² − 1)/12




Standardised random variables

Definition
A random variable X is said to be standardised if

E[X] = 0, Var(X) = 1.

Theorem
Let X be a random variable. Then,

(X − E[X]) / SD(X)

is a standardised random variable.
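A sketch checking the theorem on the fair-die PMF from earlier:

```python
die = {t: 1/6 for t in range(1, 7)}
mu = sum(t * p for t, p in die.items())
sd = sum((t - mu) ** 2 * p for t, p in die.items()) ** 0.5

std = {(t - mu) / sd: p for t, p in die.items()}  # PMF of (X - E[X]) / SD(X)
print(sum(t * p for t, p in std.items()))         # ≈ 0 (zero mean)
print(sum(t * t * p for t, p in std.items()))     # ≈ 1 (unit variance)
```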


Existence of expected value and variance

There are random variables such that E[X] goes to ∞ (or −∞)
- X ∼ {1 w.p. 1/2, 2 w.p. 1/4, 4 w.p. 1/8, . . . , 2^{n−1} w.p. 1/2^n, . . .}
- E[X²] will also go to ∞ in this case, and Var(X) is ill-defined
There are random variables such that E[X] is not well-defined
- X ∼ {1 w.p. 1/2, −2 w.p. 1/4, 4 w.p. 1/8, . . . , (−2)^{n−1} w.p. 1/2^n, . . .}
- E[X²] will go to ∞ in this case, and Var(X) is ill-defined
There are random variables such that E[X] is finite, but E[X²] goes to ∞
- Var(X) goes to ∞ too
In this course, we will generally consider well-behaved random variables with finite mean and variance.


Subsection 4

Covariance and correlation



Motivation
Consider the following two joint PMFs for two random variables X and Y.

f_{XY}   Y = 0   Y = 1        f_{XY}   Y = 0   Y = 1
X = 0    1/4     1/4          X = 0    0       1/2
X = 1    1/4     1/4          X = 1    1/2     0

Same marginal PMFs in both cases. So, the mean and variance are the same for X and Y.
However, the two cases are very different:
- X and Y are independent in one case
- The value of X determines the value of Y in the other
How do we summarize the relationship between random variables?
Covariance
Definition (Covariance)
Suppose X and Y are random variables on the same probability space. The covariance of X and Y, denoted Cov(X, Y), is defined as

Cov(X, Y) = E[(X − E[X])(Y − E[Y])].

Cov(X, Y) is positive:
- (X − E[X])(Y − E[Y]) tends to be positive
- When X is above/below its average, Y tends to be correspondingly above/below its average
Cov(X, Y) is negative:
- (X − E[X])(Y − E[Y]) tends to be negative
- When X is above/below its average, Y tends to be correspondingly below/above its average
Cov(X, Y) = 0:
- X and Y are said to be "uncorrelated"
Examples: positive and negative covariance

1. X: height of a person, Y: weight of a person
   - A higher value of X tends to come with a higher value of Y
   - We expect Cov(X, Y) to be positive
2. X: rainfall during the monsoon, Y: farmer debt
   - A higher value of X tends to come with a lower value of Y
   - We expect Cov(X, Y) to be negative
3. X: runs in an IPL over, Y: wickets in the same over
   - We expect Cov(X, Y) to be negative


Example: Computing covariance

          X = −1   X = 0   X = 1
Y = −1    1/15     2/15    2/15
Y = 0     2/15     1/15    1/15
Y = 1     2/15     2/15    1/15
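A sketch computing Cov(X, Y) = E[XY] − E[X]E[Y] directly from this joint PMF:

```python
# Joint PMF as {(x, y): probability}, from the table above
f = {(-1, -1): 1/15, (0, -1): 2/15, (1, -1): 2/15,
     (-1,  0): 2/15, (0,  0): 1/15, (1,  0): 1/15,
     (-1,  1): 2/15, (0,  1): 2/15, (1,  1): 1/15}

ex  = sum(x * p for (x, y), p in f.items())       # E[X] = -1/15
ey  = sum(y * p for (x, y), p in f.items())       # E[Y] = 0
exy = sum(x * y * p for (x, y), p in f.items())   # E[XY] = -2/15
print(exy - ex * ey)                              # Cov(X, Y) = -2/15 ≈ -0.133
```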


Properties

1. Cov(X, X) = Var(X)
   - Proof: Cov(X, X) = E[(X − E[X])(X − E[X])]
2. Cov(X, Y) = E[XY] − E[X]E[Y]
   - Proof: Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E[XY − XE[Y] − YE[X] + E[X]E[Y]]
3. Covariance is symmetric: Cov(X, Y) = Cov(Y, X)
4. Covariance is a "linear" quantity:
   - Cov(X, aY + bZ) = a Cov(X, Y) + b Cov(X, Z)
   - Cov(aX + bY, Z) = a Cov(X, Z) + b Cov(Y, Z)


Covariance and independence

1. If X and Y are independent, then X and Y are uncorrelated, i.e. Cov(X, Y) = 0
   - Proof: Cov(X, Y) = E[XY] − E[X]E[Y] = 0
2. If X and Y are uncorrelated, they may still be dependent. Example:

           X = −1   X = 0   X = 1
   Y = 0   1/8      1/4     1/8
   Y = 1   1/4      0       1/4


Correlation coefficient

Definition (Correlation coefficient)
The correlation coefficient, or simply correlation, of two random variables X and Y, denoted ρ(X, Y), is defined as

ρ(X, Y) = Cov(X, Y) / (SD(X) SD(Y)).

Result: −SD(X) SD(Y) ≤ Cov(X, Y) ≤ SD(X) SD(Y)
- Simplify E[((X − E[X])/SD(X) + (Y − E[Y])/SD(Y))²] ≥ 0 to get the lower bound
- Simplify E[((X − E[X])/SD(X) − (Y − E[Y])/SD(Y))²] ≥ 0 to get the upper bound
Using the above, −1 ≤ ρ(X, Y) ≤ 1


Correlation and random variables
1. ρ(X, Y)
   - One number that summarizes the "trend" between two random variables
   - Dimensionless quantity
2. ρ(X, Y) is close to zero
   - X and Y are close to being uncorrelated
   - No clear trend between X and Y
3. ρ(X, Y) = 1 or ρ(X, Y) = −1
   - There exist a ≠ 0 and b so that Y = aX + b with probability 1
   - Y is a linear function of X
4. |ρ(X, Y)| is close to one
   - X and Y are strongly correlated
   - An increase in X is likely to match up with an increase in Y (for ρ near 1) or a decrease in Y (for ρ near −1)


Problem
Consider the following joint PMF, where −1/4 ≤ x ≤ 1/4.

           X = 0      X = 1
Y = 0      1/4 − x    1/4 + x
Y = 1      1/4 + x    1/4 − x

The same problem with the values translated by constants c and d:

           X = c      X = c + 1
Y = d      1/4 − x    1/4 + x
Y = d + 1  1/4 + x    1/4 − x
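A sketch evaluating Cov(X, Y) and ρ(X, Y) for the first table as a function of x (by the scaling/translation properties, the shifted version with c and d gives the same answers):

```python
def cov_and_rho(x: float):
    f = {(0, 0): 1/4 - x, (1, 0): 1/4 + x,
         (0, 1): 1/4 + x, (1, 1): 1/4 - x}
    ex  = sum(a * p for (a, b), p in f.items())      # 1/2
    ey  = sum(b * p for (a, b), p in f.items())      # 1/2
    exy = sum(a * b * p for (a, b), p in f.items())  # 1/4 - x
    cov = exy - ex * ey                              # -x
    sd = 0.5                                         # SD of a Bernoulli(1/2)
    return cov, cov / (sd * sd)

for x in (-0.25, 0.0, 0.25):
    print(x, cov_and_rho(x))   # rho = -4x, sweeping from 1 to -1
```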


Scatter plots and correlation
[Figure-only slide.]


Subsection 5

Bounds on probabilities using mean and variance



Notation for mean and variance

If there is no confusion about the random variable X:
- µ will denote the mean E[X]
- σ² will denote the variance Var(X)
- σ will denote the standard deviation SD(X)

If there are multiple random variables under discussion:
- µ_X will denote the mean E[X]
- σ²_X will denote the variance Var(X)
- σ_X will denote the standard deviation SD(X)


What does the mean say about the distribution?

Suppose the average marks in a course is 50/100.

What fraction of students will have marks ≥ 50?
- It cannot be 0. Why?
What fraction of students will have marks ≥ 80?
- It cannot be 1. Why?
- Can it be 0.9?

The average says something about the distribution of marks!


Standard units in statistics

Consider a random variable X with mean µ and variance σ².
In an experiment, X may take a value that is close to µ or far from µ.
- X − µ measures the distance of X from the mean µ
- It could be positive or negative

Standard units: the number of standard deviations that a realization of a random variable is away from the mean.
- We expect X − µ to fall between −cσ and cσ for a small value of c
- In other words, we expect X to fall between µ − cσ and µ + cσ


Examples

Throw a pair of dice, X = sum of the two numbers
- µ = 7, σ ≈ 2.42
- P(|X − µ| ≤ σ) = P(4.58 ≤ X ≤ 9.42) = P(X ∈ {5, 6, 7, 8, 9}) = 2/3
  - So, P(|X − µ| > σ) = 1/3
- P(|X − µ| > 2σ) = P(X ∈ {2, 12}) = 2/36 ≈ 0.056

X ∼ Uniform{1, 2, . . . , 100}
- µ = 50.5, σ ≈ 28.9
- P(|X − µ| > σ) = 42/100
- P(|X − µ| > 2σ) = 0
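The two-dice numbers can be checked exactly by enumerating all 36 outcomes:

```python
from itertools import product
from math import sqrt

sums = [a + b for a, b in product(range(1, 7), repeat=2)]  # all 36 outcomes
mu = sum(sums) / 36
sigma = sqrt(sum((s - mu) ** 2 for s in sums) / 36)
print(mu, sigma)                                        # 7.0, ≈ 2.415

within = sum(abs(s - mu) <= sigma for s in sums) / 36
print(within, 1 - within)                               # 2/3 and 1/3
print(sum(abs(s - mu) > 2 * sigma for s in sums) / 36)  # 2/36 ≈ 0.056
```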


Markov’s inequality

Theorem (Markov’s inequality)
Let X be a discrete random variable taking non-negative values with a finite mean µ. Then, for any c > 0,

P(X ≥ c) ≤ µ/c.

Proof:
µ = Σ_{t∈T_X} t P(X = t) = Σ_{t<c} t P(X = t) + Σ_{t≥c} t P(X = t)
Since the first sum is non-negative,

µ ≥ Σ_{t≥c} t P(X = t) ≥ Σ_{t≥c} c P(X = t) = c Σ_{t≥c} P(X = t) = c P(X ≥ c)


Chebyshev’s inequality
Theorem (Chebyshev’s inequality)
Let X be a discrete random variable with a finite mean µ and a finite variance σ². Then, for any k > 0,

P(|X − µ| ≥ kσ) ≤ 1/k².

Proof: Apply Markov’s inequality to (X − µ)².

Other forms:
- P(|X − µ| ≥ c) ≤ σ²/c², P((X − µ)² ≥ k²σ²) ≤ 1/k²
- P(µ − kσ < X < µ + kσ) ≥ 1 − 1/k²
- P(X ≥ µ + kσ) + P(X ≤ µ − kσ) ≤ 1/k²
  - In particular, P(X ≥ µ + kσ) ≤ 1/k² and P(X ≤ µ − kσ) ≤ 1/k²


Compare actual and Chebyshev

Let X be a random variable with finite mean µ and variance σ².

P(|X − µ| ≥ 2σ) ≤ 1/4

1. X ∼ Binomial(10, 1/2), µ = 5, σ = √2.5 ≈ 1.58

   P(|X − 5| ≥ 2σ) = P(X ∈ {0, 1, 9, 10}) ≈ 0.021

2. X ∼ Geometric(1/4), µ = 4, σ ≈ 3.46

   P(|X − 4| ≥ 2σ) = P(X ∈ {11, 12, . . .}) ≈ 0.056
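A sketch computing the two actual tail probabilities and comparing them with the 1/4 bound:

```python
from math import comb, sqrt

# Binomial(10, 1/2): exact two-sided tail beyond 2 standard deviations
n, p = 10, 0.5
mu, sigma = n * p, sqrt(n * p * (1 - p))
tail = sum(comb(n, t) * p**t * (1 - p)**(n - t)
           for t in range(n + 1) if abs(t - mu) >= 2 * sigma)
print(tail)        # ≈ 0.021, well under the Chebyshev bound of 0.25

# Geometric(1/4): P(|X - 4| >= 2σ) = P(X >= 11) = (3/4)**10
print(0.75 ** 10)  # ≈ 0.056
```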


What do mean and variance say about the distribution?

- The mean µ, through Markov’s inequality, bounds the probability that a non-negative random variable takes values much larger than the mean.
- The mean µ and standard deviation σ, through Chebyshev’s inequality, bound the probability that X is away from µ by kσ.
- These are some of the most useful measures of expected centre and spread in practice.
  - Suppose the average number of accidents per day across the country decreases by 10000. Is that a "significant" decrease?
  - If the standard deviation of the number of accidents is known, we can express the decrease of 10000 in terms of standard deviations to answer the above question.
- There are also a lot of important theoretical implications, which we will see later on.
