MB+SCP
mbhushan,sachinp@iitb.ac.in
Spring 2020
Expectations
with f_X(x) being the probability density function. For the discrete case, it is a generalization of the probability mass function, written using Dirac delta functions. Also denoted as µ.
It is the average value of X in a large number of repetitions of the
experiment and not necessarily the value that we expect X to have.
and,

E[X] = ∫_{−∞}^{∞} x (Σ_i α_i δ(x − x_i)) dx = Σ_i α_i ∫_{−∞}^{∞} x δ(x − x_i) dx = Σ_i α_i x_i
For the case of two coin tosses with a fair coin, if X = number of heads, then

E[X] = 0(1/4) + 1(1/2) + 2(1/4) = 1
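As a quick numerical sketch of this computation (the PMF values are the ones above):

```python
# Expected number of heads in two fair-coin tosses, from the PMF above.
pmf = {0: 1/4, 1: 1/2, 2: 1/4}   # P(X = k), k = number of heads

mean = sum(k * p for k, p in pmf.items())
print(mean)  # 1.0
```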
If the probability of a head on a coin toss were 3/4,

P(X = k) = (1/4)² = 1/16,             k = 0
           2 × (1/4) × (3/4) = 6/16,  k = 1
           (3/4)² = 9/16,             k = 2
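The same PMF can be generated from the binomial formula; a small sketch (using `math.comb` is our choice, not from the slides):

```python
from math import comb

# PMF of the number of heads in n = 2 tosses with P(head) = 3/4,
# and its expectation, via the binomial formula.
n, p = 2, 3/4
pmf = {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

print(pmf)                              # {0: 0.0625, 1: 0.375, 2: 0.5625}
mean = sum(k * q for k, q in pmf.items())
print(mean)                             # 1.5, i.e. n*p
```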
Q. You are waiting for a phone call after dinner (9 PM). The number of hours (X) after 9 PM until the call arrives is a random variable with probability density function:

f(x) = 1/1.5,  0 < x < 1.5
       0,      otherwise
Answer: E[X] = ∫_0^{1.5} (x/1.5) dx = 0.75.
Note: The above density function is a uniform density function over (0, 1.5) and
the random variable is said to have a uniform distribution over (0, 1.5).
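A numerical check of E[X] = 0.75 via a midpoint-rule approximation of the integral (the step count is an arbitrary choice):

```python
# Midpoint-rule approximation of E[X] = integral of x * (1/1.5)
# over (0, 1.5) for the uniform density above.
N = 100_000
h = 1.5 / N
approx = sum((i + 0.5) * h * (1 / 1.5) * h for i in range(N))
print(approx)  # ≈ 0.75
```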
Given RV X and its PMF or PDF, compute the expected value of some function of X, say g(X).
Option 1:
- g(X) is also a RV: for a given ω ∈ S, x = X(ω) is a number and g(X(ω)) = g(x) is also a number.
- Find the probability distribution of g(X) and then compute its expectation.
For example, let X be uniform on (0, 1) and Y = g(X) = X³. Then, for 0 < y < 1,

F_Y(y) = P{Y ≤ y} = P{X³ ≤ y} = P{X ≤ y^{1/3}} = ∫_0^{y^{1/3}} 1 dx = y^{1/3}

Thus,

F_Y(y) = 0,        y ≤ 0
         y^{1/3},  0 < y < 1
         1,        1 ≤ y
E[Y] = ∫_{−∞}^{∞} y f_Y(y) dy = ∫_0^1 y · (1/3) y^{−2/3} dy = ∫_0^1 (1/3) y^{1/3} dy = 1/4
Option 2: compute directly from the distribution of X:

E[g(X)] = Σ_i g(x_i) α_i,           X discrete
          ∫_{−∞}^{∞} g(x) f(x) dx,  X continuous

For the earlier example, E[g(X)] = ∫_0^1 x³ dx = 1/4, as before.
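Both routes to E[Y] can be checked numerically; a sketch using midpoint-rule integration (the grid size is an arbitrary choice):

```python
# Two routes to E[Y] for Y = X^3 with X uniform on (0, 1):
# (1) via the derived density f_Y(y) = (1/3) y^(-2/3) on (0, 1),
# (2) directly, E[g(X)] = integral of x^3 * f_X(x) with f_X = 1 on (0, 1).
N = 100_000
h = 1.0 / N
mid = [(i + 0.5) * h for i in range(N)]          # midpoints of the grid

e_via_fY = sum(y * (1/3) * y ** (-2/3) * h for y in mid)
e_via_direct = sum(x ** 3 * h for x in mid)

print(e_via_fY, e_via_direct)  # both ≈ 0.25
```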
For constants a, b,

E[aX + b] = a E[X] + b

for X continuous or discrete. Show this.
What happens for a = 0? The expectation of a constant is the constant itself: E[b] = b.
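A quick check of linearity on the two-coin-toss PMF, with arbitrary choices of a and b:

```python
# Verify E[aX + b] = a E[X] + b on a discrete PMF (two fair-coin tosses).
pmf = {0: 1/4, 1: 1/2, 2: 1/4}
a, b = 3.0, 5.0                   # arbitrary constants

lhs = sum((a * x + b) * p for x, p in pmf.items())
rhs = a * sum(x * p for x, p in pmf.items()) + b
print(lhs, rhs)  # 8.0 8.0
```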
E[X] is called the mean (µ) or the first moment of X.
E[X^n] is called the nth moment of X.
E[X^n] = Σ_i x_i^n α_i,           X discrete
         ∫_{−∞}^{∞} x^n f(x) dx,  X continuous
Var(X) = Σ_i (x_i − E[X])² α_i
       = Σ_i (x_i² − 2 x_i E[X] + (E[X])²) α_i
       = Σ_i x_i² α_i − 2 E[X] Σ_i x_i α_i + (E[X])² Σ_i α_i
       = E[X²] − 2(E[X])² + (E[X])²
       = E[X²] − (E[X])²
       = E[X²] − µ²
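The shortcut Var(X) = E[X²] − (E[X])² can be checked against the definition on a discrete PMF (here the biased-coin PMF with p = 3/4 from earlier):

```python
# Variance two ways: from the definition, and from E[X^2] - (E[X])^2.
pmf = {0: 1/16, 1: 6/16, 2: 9/16}   # biased-coin example, p = 3/4

mu = sum(x * p for x, p in pmf.items())
var_def = sum((x - mu) ** 2 * p for x, p in pmf.items())
var_id = sum(x * x * p for x, p in pmf.items()) - mu ** 2

print(var_def, var_id)  # both 0.375, i.e. np(1 - p)
```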
For Y = aX + b:
E[Y] = aE[X] + b = aµ + b
Var(aX + b) = a² Var(X)
a = 0: Var(b) = 0.
b = 0: Var(aX) = a² Var(X); b does not affect variance.
skewness = E[(X − µ)³] / (E[(X − µ)²])^{3/2} = E[(X − µ)³] / σ³

is the (normalized) third moment about the mean.
Indicates degree of asymmetry
[Figure: three density shapes: (a) negative or left skew, (b) symmetric, (c) positive or right skew.]
The normalized fourth moment about the mean is the kurtosis:

E[(X − µ)⁴] / σ⁴
Can keep defining higher moments.
Moments can be used to summarize a probability density function or
probability mass function.
We tend to use only the first two moments usually.
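A sketch computing the normalized third and fourth moments (skewness and kurtosis) of a discrete PMF, using the biased-coin PMF with p = 3/4 as an example:

```python
# Skewness and kurtosis of a discrete PMF from their definitions.
pmf = {0: 1/16, 1: 6/16, 2: 9/16}   # biased-coin example, p = 3/4

mu = sum(x * p for x, p in pmf.items())
var = sum((x - mu) ** 2 * p for x, p in pmf.items())
sigma = var ** 0.5

skew = sum((x - mu) ** 3 * p for x, p in pmf.items()) / sigma ** 3
kurt = sum((x - mu) ** 4 * p for x, p in pmf.items()) / sigma ** 4
print(skew, kurt)  # ≈ -0.8165 and ≈ 2.6667: left-skewed, since p > 1/2
```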
How does one obtain E[X^n] from the given density function of X? From the basic definition,

E[X^n] = Σ_i x_i^n α_i,           X discrete
         ∫_{−∞}^{∞} x^n f(x) dx,  X continuous
Another route is the moment generating function:

φ_X(t) = φ(t) = E[e^{tX}] = Σ_i e^{t x_i} α_i,           X discrete
                            ∫_{−∞}^{∞} e^{tx} f(x) dx,   X continuous
Differentiating n times and evaluating at t = 0 gives the moments:

φ^{(n)}(0) = E[X^n]
For example, for an exponential RV with mean τ,

E[X] = τ, E[X²] = 2τ², ..., E[X^k] = k! τ^k
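These exponential moments can be verified by numerical integration of x^k f(x); a sketch with τ = 2 (an arbitrary choice) and a truncated midpoint rule:

```python
from math import exp, factorial

# Check E[X^k] = k! * tau^k for an exponential RV with mean tau,
# density f(x) = (1/tau) exp(-x/tau), by midpoint-rule integration.
tau = 2.0
N, upper = 200_000, 80.0          # truncate the integral at 40*tau
h = upper / N

moments = {}
for k in range(1, 4):
    moments[k] = sum(((i + 0.5) * h) ** k
                     * exp(-(i + 0.5) * h / tau) / tau * h
                     for i in range(N))

print(moments)  # ≈ {1: 2.0, 2: 8.0, 3: 48.0}, i.e. k! * tau^k
```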
Markov's inequality (for X ≥ 0 and any a > 0):

P{X ≥ a} ≤ E[X]/a
Chebyshev's inequality (for any k > 0):

P{|X − µ| ≥ k} ≤ σ²/k²
Chebyshev’s inequality is a special case of Markov’s inequality.
These inequalities allow derivation of probability bounds when only the mean, or the mean and variance, are known.
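A sketch comparing the Markov bound with an exact tail, for an exponential RV with mean 1 (so P{X ≥ a} = e^{−a}):

```python
from math import exp

# Markov bound P{X >= a} <= E[X]/a versus the exact tail e^{-a}
# for an exponential RV with mean 1.
results = {a: (exp(-a), 1.0 / a) for a in (1.0, 2.0, 4.0)}
for a, (exact, bound) in results.items():
    print(a, exact, bound)   # the exact tail never exceeds the bound
```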
[Figure: a density curve with µ − kσ, µ, and µ + kσ marked on the x-axis.]
P{|X − E[X]| ≥ kσ} ≤ 1/k²

where

σ² = E[(X − µ)²]
For a Gaussian,
P{|X − µ| ≥ 2σ} ≤ 0.05
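The exact Gaussian tail can be computed from the error function and compared with the Chebyshev bound 1/k²; a sketch using `math.erf`:

```python
from math import erf, sqrt

# Exact two-sided Gaussian tail P{|X - mu| >= k*sigma} = 2(1 - Phi(k)),
# with Phi(k) = (1 + erf(k/sqrt(2)))/2, versus the Chebyshev bound 1/k^2.
tails = {k: 2 * (1 - (1 + erf(k / sqrt(2))) / 2) for k in (1, 2, 3)}
for k, tail in tails.items():
    print(k, round(tail, 4), 1 / k**2)
# k = 2 gives ≈ 0.0455, well under both 0.05 and the Chebyshev bound 0.25
```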
[Figure: Gaussian density f(x) with ticks at µ ± σ, µ ± 2σ, µ ± 3σ; about 68% of the mass lies within µ ± σ, 95% within µ ± 2σ, and 99.7% within µ ± 3σ.]
Q. Show that for 40000 tosses of a fair coin, there is at least a 0.99 probability that the proportion of heads will be between 0.475 and 0.525.
RV: X= number of heads in n = 40000 tosses.
For a binomial RV, E [X ] = np and Var(X ) = np(1 − p).
Soln.
- E[X] = np = 40000 × (1/2) = 20000.
- Also σ = √(np(1 − p)) = √(40000 × 1/2 × 1/2) = 100.
- 1 − (1/k²) = 0.99 ⇒ k = 10.
- The probability is at least 0.99 that we get between 20000 − (10 × 100) = 19000 and 20000 + (10 × 100) = 21000 heads.
- 19000/40000 = 0.475 and 21000/40000 = 0.525.
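The arithmetic of the solution can be traced in a few lines:

```python
from math import sqrt

# Chebyshev with k = 10 on X = number of heads in n = 40000 fair tosses.
n, p = 40000, 0.5
mu = n * p                       # 20000
sigma = sqrt(n * p * (1 - p))    # 100
k = 10                           # so 1 - 1/k^2 = 0.99

lo, hi = mu - k * sigma, mu + k * sigma
print(lo / n, hi / n)            # 0.475 0.525
```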