Ex. How many heads would you expect if you flipped a coin twice?
Draw the probability mass function of the number of heads.
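For a fair coin, the number of heads X takes the values 0, 1, 2 with probabilities 1/4, 1/2, 1/4, so
E(X) = 0·(1/4) + 1·(1/2) + 2·(1/4) = 1.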
Definition: Let X be a random variable assuming the values x_1, x_2, x_3, ... with corresponding probabilities p(x_1), p(x_2), p(x_3), .... The mean or expected value of X is defined by E(X) = Σ_k x_k p(x_k).
Interpretations:
(i) The expected value measures the center of the probability distribution - center of mass.
(ii) Long-term frequency (law of large numbers; we'll get to this soon)
Expectations can be used to describe the potential gains and losses from games.
Ex. Roll a die. If the side that comes up is odd, you win the $ equivalent of that side. If it
is even, you lose $4.
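For a fair die, the expected winnings are
E(winnings) = (1/6)(1 + 3 + 5) + (1/2)(-4) = 3/2 - 2 = -1/2,
so the game loses fifty cents per play on average.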
Ex. Lottery You pick 3 different numbers between 1 and 12. If you pick all the numbers
correctly you win $100. What are your expected earnings if it costs $1 to play?
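A worked version, assuming three winning numbers are drawn from 1 to 12 and order does not matter: the probability of matching all three is 1/C(12,3) = 1/220, so the expected earnings are
E(earnings) = 100·(1/220) - 1 = 100/220 - 1 ≈ -$0.55.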
Let X be a random variable assuming the values x_1, x_2, x_3, ... with corresponding probabilities p(x_1), p(x_2), p(x_3), .... For any function g, the mean or expected value of g(X) is defined by E(g(X)) = Σ_k g(x_k) p(x_k).
Ex. Roll a fair die. Let X = number of dots on the side that comes up.
Calculate E(X^2).
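Using E(g(X)) = Σ_k g(x_k) p(x_k) with g(x) = x^2:
E(X^2) = (1^2 + 2^2 + 3^2 + 4^2 + 5^2 + 6^2)·(1/6) = 91/6.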
Ex. An indicator variable for the event A is defined as the random variable that takes on
the value 1 when event A happens and 0 otherwise.
I_A = 1 if A occurs, and I_A = 0 if A^c occurs.
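Taking the expectation of an indicator gives back a probability:
E(I_A) = 1·P(A) + 0·P(A^c) = P(A).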
Both X and Y have the same expected value, but are quite different in other respects. One
such respect is in their spread. We would like a measure of spread.
Definition: If X is a random variable with mean E(X), then the variance of X, denoted by Var(X), is defined by Var(X) = E((X - E(X))^2).
Var(X) = E((X - E(X))^2)
= Σ_x (x - E(X))^2 p(x)
= Σ_x (x^2 - 2x E(X) + E(X)^2) p(x)
= Σ_x x^2 p(x) - 2E(X) Σ_x x p(x) + E(X)^2 Σ_x p(x)
= E(X^2) - 2E(X)^2 + E(X)^2 = E(X^2) - E(X)^2
Ex. Roll a fair die. Let X = number of dots on the side that comes up.
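With E(X) = (1 + 2 + ... + 6)/6 = 7/2 and E(X^2) = 91/6 from the earlier example:
Var(X) = E(X^2) - E(X)^2 = 91/6 - (7/2)^2 = 91/6 - 49/4 = 35/12.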
E(aX+b) = a E(X) + b
Var(aX + b) = E[(aX + b - (aE(X) + b))^2] = E(a^2 (X - E(X))^2) = a^2 E((X - E(X))^2) = a^2 Var(X)
i.e., the number m that minimizes E[(X-m)^2] is m=E(X). Proof: expand and
differentiate with respect to m.
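Carrying out that sketch: E[(X - m)^2] = E(X^2) - 2mE(X) + m^2, and setting the derivative with respect to m to zero gives -2E(X) + 2m = 0, i.e., m = E(X); the second derivative is 2 > 0, so this is a minimum.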
Best estimate under 0-1 loss, i.e., loss 1 - 1(X = x): the mode. That is, choosing the mode maximizes the probability of being exactly right. The proof is easy for discrete r.v.'s; a limiting argument is required for continuous r.v.'s, since P(X = x) = 0 for any x.
Moment Generating Functions
The moment generating function of the random variable X, denoted M_X(t), is defined for all real values of t by
M_X(t) = E(e^{tX}).
The reason M_X(t) is called a moment generating function is that all the moments of X can be obtained by successively differentiating M_X(t) and evaluating the result at t = 0.
First moment:
M_X'(t) = d/dt E(e^{tX}) = E(d/dt e^{tX}) = E(X e^{tX})
M_X'(0) = E(X)
(For any of the distributions we will use we can move the derivative inside the
expectation).
Second moment:
M_X''(t) = d/dt M_X'(t) = d/dt E(X e^{tX}) = E(d/dt (X e^{tX})) = E(X^2 e^{tX})
kth moment:
M_X^{(k)}(t) = E(X^k e^{tX})
M_X^{(k)}(0) = E(X^k)
Ex. Binomial random variable with parameters n and p.
Calculate M_X(t):

M_X(t) = E(e^{tX}) = Σ_{k=0}^{n} e^{tk} C(n,k) p^k (1 - p)^{n-k} = Σ_{k=0}^{n} C(n,k) (p e^t)^k (1 - p)^{n-k} = (p e^t + 1 - p)^n

M_X'(t) = n (p e^t + 1 - p)^{n-1} p e^t

M_X''(t) = n(n - 1) (p e^t + 1 - p)^{n-2} (p e^t)^2 + n (p e^t + 1 - p)^{n-1} p e^t

E(X) = M_X'(0) = n (p e^0 + 1 - p)^{n-1} p e^0 = np

E(X^2) = M_X''(0) = n(n - 1) (p e^0 + 1 - p)^{n-2} (p e^0)^2 + n (p e^0 + 1 - p)^{n-1} p e^0
= n(n - 1) p^2 + np
Later we'll see an even easier way to calculate these moments, using the fact that a binomial X is the sum of n i.i.d. simpler (Bernoulli) r.v.'s.
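As a quick check, a short sympy sketch that differentiates this MGF symbolically and recovers the same two moments:

import sympy as sp

# t is the MGF argument; n and p are the binomial parameters
t, n, p = sp.symbols('t n p', positive=True)

# M_X(t) = (p*e^t + 1 - p)^n, the binomial MGF derived above
M = (p * sp.exp(t) + 1 - p) ** n

EX = sp.simplify(sp.diff(M, t).subs(t, 0))    # E(X)   = M'(0)  -> n*p
EX2 = sp.expand(sp.diff(M, t, 2).subs(t, 0))  # E(X^2) = M''(0) -> n**2*p**2 - n*p**2 + n*p

print(EX, EX2)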
Fact: Suppose that for two random variables X and Y the moment generating functions exist and are given by M_X(t) and M_Y(t), respectively. If M_X(t) = M_Y(t) for all values of t, then X and Y have the same probability distribution.
If the moment generating function of X exists and is finite in some region about t=0, then
the distribution is uniquely determined.
Properties of Expectation
Proposition: If X and Y are jointly distributed random variables and g is a function of two variables, then E(g(X, Y)) = Σ_x Σ_y g(x, y) p_XY(x, y) in the discrete case, and E(g(X, Y)) = ∫∫ g(x, y) f_XY(x, y) dx dy in the continuous case.
It is important to note that if the function g(x, y) depends only on x or only on y, the formula above reverts to the 1-dimensional case.
Ex. Let X and Y be independent, each uniformly distributed on [0, L], so their joint pdf is 1/L^2 on the square. Compute E(|X - Y|).
E(|X - Y|) = ∫_0^L ∫_0^L |x - y| (1/L^2) dy dx = (1/L^2) ∫_0^L ∫_0^L |x - y| dy dx

∫_0^L |x - y| dy = ∫_0^x (x - y) dy + ∫_x^L (y - x) dy = [xy - y^2/2]_{y=0}^{x} + [y^2/2 - yx]_{y=x}^{L}
= (x^2 - x^2/2) + (L^2/2 - xL) - (x^2/2 - x^2) = L^2/2 + x^2 - xL

E(|X - Y|) = (1/L^2) ∫_0^L (L^2/2 + x^2 - xL) dx = (1/L^2) [xL^2/2 + x^3/3 - x^2 L/2]_0^L = (1/L^2)(L^3/2 + L^3/3 - L^3/2) = L/3
Expectation of sums of random variables
Ex. Let X and Y be continuous random variables with joint pdf fXY(x,y). Assume that
E(X) and E(Y) are finite. Calculate E(X+Y).
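Applying the proposition above with g(x, y) = x + y:
E(X + Y) = ∫∫ (x + y) f_XY(x, y) dx dy = ∫∫ x f_XY(x, y) dy dx + ∫∫ y f_XY(x, y) dx dy
= ∫ x f_X(x) dx + ∫ y f_Y(y) dy = E(X) + E(Y).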
Ex. Let X_1, ..., X_n be random variables with common mean μ, and let X̄ = Σ_{i=1}^n X_i / n be their sample mean. Calculate E(X̄).
We know that E(X_i) = μ for each i.
E(X̄) = E(Σ_{i=1}^n X_i / n) = (1/n) E(Σ_{i=1}^n X_i) = (1/n) Σ_{i=1}^n E(X_i) = μ
When the mean of a distribution is unknown, the sample mean is often used in statistics to
estimate it. (Unbiased estimate)
Ex. Let X be a binomial random variable with parameters n and p. X represents the
number of successes in n trials. We can write X as follows:
X = X_1 + X_2 + ... + X_n
where
X_i = 1 if trial i is a success, and X_i = 0 if trial i is a failure.
E(X_i) = 1·p + 0·(1 - p) = p
E(X) = E(X_1) + E(X_2) + ... + E(X_n) = np
Ex. A group of N people throw their hats into the center of a room. The hats are mixed,
and each person randomly selects one. Find the expected number of people that select
their own hat.
Let X_i = 1 if person i selects his or her own hat and 0 otherwise, and let X be the number of people who do so; then X = X_1 + X_2 + ... + X_N.
Each person is equally likely to select any of the N hats, so P(X_i = 1) = 1/N.
E(X_i) = 1·(1/N) + 0·(1 - 1/N) = 1/N.
Hence, E(X) = E(X_1) + E(X_2) + ... + E(X_N) = N·(1/N) = 1.
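A quick Monte Carlo sketch (with N = 10 chosen arbitrarily) agrees with this answer:

import random

def avg_matches(N=10, trials=100_000):
    """Estimate the expected number of people who draw their own hat."""
    total = 0
    for _ in range(trials):
        hats = list(range(N))
        random.shuffle(hats)  # hats[i] is the hat person i picks up
        total += sum(1 for i in range(N) if hats[i] == i)  # count own-hat matches
    return total / trials

print(avg_matches())  # close to 1, regardless of N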
Ex. Twenty people, consisting of 10 married couples, are to be seated at five different
tables, with four people at each table. If the seating is done at random, what is the
expected number of married couples that are seated at the same table?
Let X_i = 1 if the members of couple i are seated at the same table and 0 otherwise, and let X be the number of such couples. Then X = X_1 + X_2 + ... + X_10.
Consider the table where husband i is sitting. There is room for three other people at his table, and there are 19 possible people who could fill those seats, so
P(X_i = 1) = C(1,1) C(18,2) / C(19,3) = 3/19.
E(X_i) = 1·(3/19) + 0·(16/19) = 3/19.
Hence, E(X) = E(X_1) + E(X_2) + ... + E(X_10) = 10·(3/19) = 30/19
Proposition: If X and Y are independent, then for any functions h and g,
E(g(X)h(Y)) = E(g(X)) E(h(Y)).
Proof (discrete case): E(g(X)h(Y)) = Σ_x Σ_y g(x)h(y) p_XY(x, y) = Σ_x Σ_y g(x)h(y) p_X(x) p_Y(y) = (Σ_x g(x) p_X(x))(Σ_y h(y) p_Y(y)) = E(g(X)) E(h(Y)). The continuous case is the same with integrals in place of sums.
Fact: The moment generating function of the sum of independent random variables
equals the product of the individual moment generating functions.
Proof: M_{X+Y}(t) = E(e^{t(X+Y)}) = E(e^{tX} e^{tY}) = E(e^{tX}) E(e^{tY}) = M_X(t) M_Y(t)
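For example, a Bernoulli(p) random variable X_i has M_{X_i}(t) = E(e^{tX_i}) = p e^t + 1 - p, so if X_1, ..., X_n are independent Bernoulli(p) r.v.'s, their sum X = X_1 + ... + X_n has
M_X(t) = (p e^t + 1 - p)^n,
the binomial MGF found earlier.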
Covariance and correlation
We would like a measure that can quantify differences in the strength of the relationship between two random variables.
Definition: The covariance of X and Y is Cov(X, Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y).
Suppose E(X_i) = μ_i for i = 1, ..., n and E(Y_j) = ν_j for j = 1, ..., m. Then E(Σ_{i=1}^n X_i) = Σ_{i=1}^n μ_i and E(Σ_{j=1}^m Y_j) = Σ_{j=1}^m ν_j, and

Cov(Σ_{i=1}^n X_i, Σ_{j=1}^m Y_j) = E[(Σ_{i=1}^n X_i - Σ_{i=1}^n μ_i)(Σ_{j=1}^m Y_j - Σ_{j=1}^m ν_j)]
= E[Σ_{i=1}^n (X_i - μ_i) Σ_{j=1}^m (Y_j - ν_j)]
= E[Σ_{i=1}^n Σ_{j=1}^m (X_i - μ_i)(Y_j - ν_j)]
= Σ_{i=1}^n Σ_{j=1}^m E[(X_i - μ_i)(Y_j - ν_j)] = Σ_{i=1}^n Σ_{j=1}^m Cov(X_i, Y_j)
Proposition: Var(Σ_{i=1}^n X_i) = Σ_{i=1}^n Var(X_i) + 2 Σ_{i<j} Cov(X_i, X_j). In particular,
Var(X + Y) = Var(X) + Var(Y) + 2 Cov(X, Y).
Proof:
Var(Σ_{i=1}^n X_i) = Cov(Σ_{i=1}^n X_i, Σ_{j=1}^n X_j) = Σ_{i=1}^n Σ_{j=1}^n Cov(X_i, X_j)
= Σ_{i=1}^n Var(X_i) + Σ_{i≠j} Cov(X_i, X_j) = Σ_{i=1}^n Var(X_i) + 2 Σ_{i<j} Cov(X_i, X_j)

If X_1, ..., X_n are pairwise independent, then Cov(X_i, X_j) = 0 for i ≠ j, so Var(Σ_{i=1}^n X_i) = Σ_{i=1}^n Var(X_i).
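In particular, if X_1, ..., X_n are independent with common variance σ^2, the sample mean X̄ = Σ_{i=1}^n X_i / n satisfies
Var(X̄) = (1/n^2) Var(Σ_{i=1}^n X_i) = (1/n^2) Σ_{i=1}^n Var(X_i) = σ^2/n,
a fact used in the next example.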
Ex. Let X_1, ..., X_n be independent and identically distributed random variables having expected value μ and variance σ^2. Let X̄ = Σ_{i=1}^n X_i / n be the sample mean. The random variable

S^2 = Σ_{i=1}^n (X_i - X̄)^2 / (n - 1)

is called the sample variance. Using the identity Σ_{i=1}^n (X_i - X̄)^2 = Σ_{i=1}^n (X_i - μ)^2 - n(X̄ - μ)^2,

E(S^2) = (1/(n-1)) [Σ_{i=1}^n E((X_i - μ)^2) - n E((X̄ - μ)^2)] = (1/(n-1)) [nσ^2 - n Var(X̄)] = (1/(n-1)) (n - 1)σ^2 = σ^2
Ex. (The hat problem, continued.) With X_i and X = X_1 + X_2 + ... + X_N as above, calculate Var(X).

Var(X) = Var(Σ_{i=1}^N X_i) = Σ_{i=1}^N Var(X_i) + 2 Σ_{i<j} Cov(X_i, X_j)

Recall that since each person is equally likely to select any of the N hats, P(X_i = 1) = 1/N.
Hence,
E(X_i) = 1·(1/N) + 0·(1 - 1/N) = 1/N
and
E(X_i^2) = 1^2·(1/N) + 0·(1 - 1/N) = 1/N.
Var(X_i) = E(X_i^2) - (E(X_i))^2 = 1/N - (1/N)^2 = (N - 1)/N^2.
Cov(X_i, X_j) = E(X_i X_j) - E(X_i) E(X_j)
P(X_i = 1, X_j = 1) = P(X_i = 1 | X_j = 1) P(X_j = 1) = (1/(N - 1))·(1/N)
E(X_i X_j) = 1·(1/(N - 1))·(1/N) + 0·(1 - 1/(N(N - 1))) = 1/(N(N - 1))
Cov(X_i, X_j) = 1/(N(N - 1)) - (1/N)^2 = 1/(N^2(N - 1))
Hence, Var(X) = N·(N - 1)/N^2 + 2·C(N,2)·(1/(N^2(N - 1))) = (N - 1)/N + 1/N = 1
Definition: The correlation between X and Y, denoted by ρ(X, Y), is defined, as long as Var(X) and Var(Y) are positive, by

ρ(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)).
It can be shown that -1 ≤ ρ(X, Y) ≤ 1, with equality only if Y = aX + b (assuming E(X^2) and E(Y^2) are both finite). This is called the Cauchy-Schwarz inequality. To see it, note that for any constants a and b,
0 ≤ E[(aX + bY)^2] = a^2 E(X^2) + 2ab E(XY) + b^2 E(Y^2)
0 ≤ E[(aX - bY)^2] = a^2 E(X^2) - 2ab E(XY) + b^2 E(Y^2).
Now let a^2 = E(Y^2) and b^2 = E(X^2). Then the above two inequalities read
0 ≤ 2a^2 b^2 + 2ab E(XY)
0 ≤ 2a^2 b^2 - 2ab E(XY);
that is,
E(XY) ≥ -√(E(X^2) E(Y^2))
E(XY) ≤ √(E(X^2) E(Y^2)),
and, applied to the centered variables X - E(X) and Y - E(Y), this is equivalent to the inequality -1 ≤ ρ(X, Y) ≤ 1. For equality to hold, either E[(aX + bY)^2] = 0 or E[(aX - bY)^2] = 0, i.e., X and Y are linearly related with a negative or positive slope, respectively.
The correlation coefficient is therefore a measure of the degree of linearity between X and Y. If ρ(X, Y) = 0, this indicates no linearity, and X and Y are said to be uncorrelated.
Conditional Expectation
Recall that if X and Y are discrete random variables, the conditional mass function of X,
given Y=y, is defined for all y such that P(Y=y)>0, by
p_{X|Y}(x|y) = P(X = x | Y = y) = p_XY(x, y) / p_Y(y).
The conditional expectation of X given Y = y is then
E(X | Y = y) = Σ_x x P(X = x | Y = y) = Σ_x x p_{X|Y}(x|y).
Similarly, if X and Y are continuous random variables, the conditional pdf of X given Y = y is defined, for all y such that f_Y(y) > 0, by
f_{X|Y}(x|y) = f_XY(x, y) / f_Y(y).
The conditional expectation of X given Y = y is then
E(X | Y = y) = ∫_{-∞}^{∞} x f_{X|Y}(x|y) dx.
E(X|Y=y) is a function of y.
It is important to note that conditional expectations satisfy all the properties of regular
expectations:
1. E[g(X) | Y = y] = Σ_x g(x) p_{X|Y}(x|y) if X and Y are discrete.
2. E[g(X) | Y = y] = ∫_{-∞}^{∞} g(x) f_{X|Y}(x|y) dx if X and Y are continuous.
3. E[Σ_{i=1}^n X_i | Y = y] = Σ_{i=1}^n E[X_i | Y = y]
Proposition: E(X) = E(E(X | Y)).
If Y is discrete, E(X) = E(E(X | Y)) = Σ_y E(X | Y = y) p_Y(y).
If Y is continuous, E(X) = E(E(X | Y)) = ∫ E(X | Y = y) f_Y(y) dy.
Proof in the discrete case:
E(E(X | Y)) = Σ_y E(X | Y = y) p_Y(y) = Σ_y (Σ_x x p_{X|Y}(x|y)) p_Y(y)
= Σ_y Σ_x x (p_XY(x, y)/p_Y(y)) p_Y(y) = Σ_x x Σ_y p_XY(x, y) = Σ_x x p_X(x) = E(X)