
Statistics for Data Science - 2

Notes - Week 1 and 2


Multiple Random Variables

1. Joint probability mass function: Suppose X and Y are discrete random variables
defined in the same probability space. Let the range of X and Y be TX and TY ,
respectively. The joint PMF of X and Y , denoted fXY , is a function from TX × TY
to [0, 1] defined as

fXY (t1 , t2 ) = P (X = t1 and Y = t2 ), t1 ∈ TX , t2 ∈ TY

• The joint PMF is usually written as a table or a matrix.


• P (X = t1 and Y = t2 ) is denoted P (X = t1 , Y = t2 )

2. Marginal PMF: Suppose X and Y are jointly distributed discrete random variables
with joint PMF fXY. The PMFs of the individual random variables X and Y are
called marginal PMFs. It can be shown that

fX(t1) = P(X = t1) = Σ_{t2 ∈ TY} fXY(t1, t2)

fY(t2) = P(Y = t2) = Σ_{t1 ∈ TX} fXY(t1, t2)

Note: Given the joint PMF, the marginal is unique.
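Marginalisation is mechanical when the joint PMF is stored as a table. A minimal Python sketch (the joint PMF values below are made up for illustration):

```python
import numpy as np

# Hypothetical joint PMF of X (rows) and Y (columns); all entries sum to 1.
f_XY = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])

f_X = f_XY.sum(axis=1)   # marginal of X: sum over t2 in T_Y, gives ≈ [0.4, 0.6]
f_Y = f_XY.sum(axis=0)   # marginal of Y: sum over t1 in T_X, gives ≈ [0.35, 0.35, 0.30]

print(f_X, f_Y, f_X.sum())   # each marginal is a valid PMF summing to 1
```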

3. Conditional distribution given an event: Suppose X is a discrete random variable


with range TX , and A is an event in the same probability space. The conditional PMF
of X given A is defined as the PMF

fX|A(t) = P(X = t | A), t ∈ TX
We will denote the conditional random variable by X|A. (Note that X|A is a valid
random variable with PMF fX|A ).

• fX|A(t) = P((X = t) ∩ A) / P(A)
• Range of (X|A) can be different from TX and will depend on A.
4. Conditional distribution of one random variable given another:
Suppose X and Y are jointly distributed discrete random variables with joint PMF
fXY. The conditional PMF of Y given X = x is defined as the PMF

fY|X=x(y) = P(X = x, Y = y) / P(X = x) = fXY(x, y) / fX(x)

We will denote the conditional random variable by Y |(X = x). (Note that Y |(X = x)
is a valid random variable with PMF fY|X=x.)

• Range of (Y |X = x) can be different from TY and will depend on x.


• fXY(x, y) = fY|X=x(y) · fX(x) = fX|Y=y(x) · fY(y)

• Σ_{y ∈ TY} fY|X=x(y) = 1
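Numerically, conditioning on X = x picks out a row of the joint PMF table and renormalises it by the marginal. A sketch with an assumed joint PMF:

```python
import numpy as np

# Hypothetical joint PMF of X (rows) and Y (columns).
f_XY = np.array([[0.10, 0.20, 0.10],
                 [0.25, 0.15, 0.20]])

x = 0                              # condition on X = 0 (row index)
f_X = f_XY.sum(axis=1)             # marginal PMF of X
f_Y_given_x = f_XY[x] / f_X[x]     # f_{Y|X=x}(y) = f_XY(x, y) / f_X(x)

print(f_Y_given_x)        # ≈ [0.25, 0.50, 0.25]
print(f_Y_given_x.sum())  # 1.0 — the conditional PMF sums to 1
```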

5. Joint PMF of more than two discrete random variables:


Suppose X1, X2, . . . , Xn are discrete random variables defined in the same probability
space. Let the range of Xi be TXi. The joint PMF of X1, X2, . . . , Xn, denoted by
fX1X2...Xn, is a function from TX1 × TX2 × . . . × TXn to [0, 1] defined as

fX1 X2 ...Xn (t1 , t2 , . . . , tn ) = P (X1 = t1 , X2 = t2 , . . . , Xn = tn ); ti ∈ TXi

6. Marginal PMF in case of more than two discrete random variables:


Suppose X1, X2, . . . , Xn are jointly distributed discrete random variables with joint
PMF fX1X2...Xn. The PMFs of the individual random variables X1, X2, . . . , Xn are
called marginal PMFs. It can be shown that

fX1(t1) = P(X1 = t1) = Σ_{t2 ∈ TX2, t3 ∈ TX3, . . . , tn ∈ TXn} fX1X2...Xn(t1, t2, . . . , tn)

fX2(t2) = P(X2 = t2) = Σ_{t1 ∈ TX1, t3 ∈ TX3, . . . , tn ∈ TXn} fX1X2...Xn(t1, t2, . . . , tn)

⋮

fXn(tn) = P(Xn = tn) = Σ_{t1 ∈ TX1, t2 ∈ TX2, . . . , tn−1 ∈ TXn−1} fX1X2...Xn(t1, t2, . . . , tn)

7. Marginalisation: Suppose X1 , X2 , . . . , Xn are jointly distributed discrete random


variables with joint PMF fX1 X2 ...Xn . The joint PMF of the random variables Xi1 , Xi2 , . . . Xik ,
denoted by fXi1 Xi2 ...Xik is given by
fXi1Xi2...Xik(ti1, ti2, . . . , tik) = Σ_{tj ∈ TXj for all j ∉ {i1, i2, . . . , ik}} fX1X2...Xn(t1, t2, . . . , tn)

• Sum over everything you don’t want.
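In code, "sum over everything you don't want" is a sum over the unwanted axes of a multi-dimensional array. A sketch with a randomly generated three-variable joint PMF:

```python
import numpy as np

rng = np.random.default_rng(0)
f = rng.random((2, 3, 4))
f /= f.sum()                  # a valid joint PMF of (X1, X2, X3)

f_X1X3 = f.sum(axis=1)        # marginalise out X2: sum over t2
f_X2 = f.sum(axis=(0, 2))     # marginalise out X1 and X3

print(f_X1X3.shape)           # (2, 4) — joint PMF of (X1, X3)
print(f_X2.sum())             # 1.0 (up to floating-point error)
```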

8. Conditioning with multiple discrete random variables:

• A wide variety of conditioning is possible when there are many random variables.
Some examples are:
• Suppose X1, X2, X3, X4 ∼ fX1X2X3X4 and xi ∈ TXi, then
– fX1|X2=x2(x1) = fX1X2(x1, x2) / fX2(x2)
– fX1,X2|X3=x3(x1, x2) = fX1X2X3(x1, x2, x3) / fX3(x3)
– fX1|X2=x2,X3=x3(x1) = fX1X2X3(x1, x2, x3) / fX2X3(x2, x3)
– fX1X4|X2=x2,X3=x3(x1, x4) = fX1X2X3X4(x1, x2, x3, x4) / fX2X3(x2, x3)
9. Conditioning and factors of the joint PMF:
Let X1, X2, X3, X4 ∼ fX1X2X3X4, ti ∈ TXi.

fX1X2X3X4(t1, t2, t3, t4) = P(X1 = t1 and (X2 = t2, X3 = t3, X4 = t4))
                          = fX1|X2=t2,X3=t3,X4=t4(t1) P(X2 = t2 and (X3 = t3, X4 = t4))
                          = fX1|X2=t2,X3=t3,X4=t4(t1) fX2|X3=t3,X4=t4(t2) P(X3 = t3 and X4 = t4)
                          = fX1|X2=t2,X3=t3,X4=t4(t1) fX2|X3=t3,X4=t4(t2) fX3|X4=t4(t3) fX4(t4).
• Factoring can be done in any sequence.
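The factorisation can be verified numerically. A sketch for three variables (a random joint PMF stands in for a real one); the four-variable case adds one more factor:

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.random((2, 2, 2))
f /= f.sum()               # joint PMF of (X1, X2, X3)

f_23 = f.sum(axis=0)       # f_{X2 X3}
f_3 = f.sum(axis=(0, 1))   # f_{X3}

for t1 in range(2):
    for t2 in range(2):
        for t3 in range(2):
            # f(t1,t2,t3) = f_{X1|X2=t2,X3=t3}(t1) * f_{X2|X3=t3}(t2) * f_{X3}(t3)
            prod = (f[t1, t2, t3] / f_23[t2, t3]) * (f_23[t2, t3] / f_3[t3]) * f_3[t3]
            assert np.isclose(f[t1, t2, t3], prod)
print("chain-rule factorisation holds at every point")
```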
10. Independence of two random variables:
Let X and Y be two random variables defined in a probability space with ranges TX
and TY , respectively. X and Y are said to be independent if any event defined using
X alone is independent of any event defined using Y alone. Equivalently, if the joint
PMF of X and Y is fXY, then X and Y are independent if and only if

fXY(x, y) = fX(x) fY(y)

for all x ∈ TX and y ∈ TY.

• X and Y are independent if and only if

fX|Y=y(x) = fX(x)
fY|X=x(y) = fY(y)

for all x ∈ TX and y ∈ TY (wherever the conditional PMFs are defined).

• To show that X and Y are independent, verify that

fXY(x, y) = fX(x) fY(y)

for all x ∈ TX and y ∈ TY.

• To show that X and Y are dependent, find some x ∈ TX and y ∈ TY for which

fXY(x, y) ≠ fX(x) fY(y)

– Special case: if fXY(t1, t2) = 0 at a point where fX(t1) ≠ 0 and fY(t2) ≠ 0,
then X and Y are dependent (see the sketch below).
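Since TX × TY is finite here, independence can be checked exhaustively by comparing every entry of the table against the product of the marginals. A minimal sketch (both example tables are made up):

```python
import numpy as np

def are_independent(f_XY, tol=1e-12):
    """True iff f_XY(x, y) = f_X(x) f_Y(y) for every cell of the table."""
    f_X = f_XY.sum(axis=1)
    f_Y = f_XY.sum(axis=0)
    return np.allclose(f_XY, np.outer(f_X, f_Y), atol=tol)

# Independent: the table is exactly the outer product of its marginals.
print(are_independent(np.outer([0.4, 0.6], [0.5, 0.5])))    # True

# Dependent: zeros where both marginals are nonzero (the special case above).
print(are_independent(np.array([[0.5, 0.0],
                                [0.0, 0.5]])))              # False
```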
11. Independence of multiple random variables:
Let X1 , X2 , . . . , Xn be random variables defined in a probability space with range of
Xi denoted TXi . X1 , X2 , . . . , Xn are said to be independent if events defined using
different Xi are mutually independent. Equivalently, X1, X2, . . . , Xn are independent
if and only if

fX1X2...Xn(t1, t2, . . . , tn) = fX1(t1) fX2(t2) . . . fXn(tn)

for all ti ∈ TXi.
• Every subset of a collection of independent random variables is itself independent.
12. Independent and Identically Distributed (i.i.d.) random variables:
Random variables X1 , X2 , . . . , Xn are said to be independent and identically distributed
(i.i.d.), if
(i) they are independent.
(ii) the marginal PMFs fXi are identical.
Examples:
• Repeated independent trials of an experiment create an i.i.d. sequence of random variables:
– Toss a coin multiple times.
– Throw a die multiple times.
• Let X1, X2, . . . , Xn be i.i.d., each distributed as X ∼ Geometric(p).
X takes values in {1, 2, . . .} with
P(X = k) = (1 − p)^{k−1} p

Since the Xi's are independent and identically distributed, we can write

P(X1 > j, X2 > j, . . . , Xn > j) = P(X1 > j) P(X2 > j) . . . P(Xn > j)
                                 = [P(X > j)]^n

P(X > j) = Σ_{k=j+1}^{∞} (1 − p)^{k−1} p
         = (1 − p)^j p + (1 − p)^{j+1} p + (1 − p)^{j+2} p + . . .
         = (1 − p)^j p [1 + (1 − p) + (1 − p)^2 + . . .]
         = (1 − p)^j p · 1 / (1 − (1 − p))
         = (1 − p)^j

⇒ P(X1 > j, X2 > j, . . . , Xn > j) = [P(X > j)]^n = (1 − p)^{jn}
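A simulation sketch supporting this calculation (p, n and j are arbitrary choices; numpy's geometric generator uses the same support {1, 2, . . .} as above):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n, j, trials = 0.3, 4, 2, 200_000

# Each row is one realisation of (X1, ..., Xn) with Xi ~ Geometric(p).
X = rng.geometric(p, size=(trials, n))
estimate = (X > j).all(axis=1).mean()   # fraction of trials with every Xi > j

print(estimate)                  # close to the exact value below
print((1 - p) ** (j * n))        # (1 - p)^{jn} = 0.7^8 ≈ 0.0576
```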

13. Functions of a random variable: Let X be a random variable with PMF fX.
Then f(X) is a random variable whose PMF is given as follows:

ff(X)(a) = P(f(X) = a) = P(X ∈ {t : f(t) = a}) = Σ_{t : f(t) = a} fX(t)

• PMF of f (X) can be found using PMF of X.
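A minimal sketch of this computation (the PMF of X is made up): probabilities of all t with the same image f(t) are accumulated into one value of the new PMF.

```python
from collections import defaultdict

# Hypothetical PMF of X on {-2, -1, 0, 1, 2}.
f_X = {-2: 0.1, -1: 0.2, 0: 0.4, 1: 0.2, 2: 0.1}

def pmf_of_function(f_X, f):
    """PMF of f(X): add up f_X(t) over all t with the same value f(t)."""
    pmf = defaultdict(float)
    for t, prob in f_X.items():
        pmf[f(t)] += prob
    return dict(pmf)

print(pmf_of_function(f_X, lambda t: t * t))
# {4: 0.2, 1: 0.4, 0: 0.4} — e.g. P(X^2 = 4) = P(X = -2) + P(X = 2)
```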


14. Functions of multiple random variables (g(X1 , X2 , . . . , Xn )):
Suppose X1 , X2 , . . . , Xn have joint PMF fX1 X2 ...Xn with TXi denoting the range of Xi .
Let g : TX1 × TX2 × . . . × TXn → R be a function with range Tg. The PMF of
X = g(X1, X2, . . . , Xn) is given by

fX(t) = P(g(X1, X2, . . . , Xn) = t) = Σ_{(t1, . . . , tn) : g(t1, . . . , tn) = t} fX1X2...Xn(t1, t2, . . . , tn)

• Sum of two random variables taking integer values:


X, Y ∼ fXY, Z = X + Y. Let z be some integer. Then

P(Z = z) = P(X + Y = z)
         = Σ_{x=−∞}^{∞} P(X = x, Y = z − x)
         = Σ_{x=−∞}^{∞} fXY(x, z − x)
         = Σ_{y=−∞}^{∞} fXY(z − y, y)


• Convolution: If X and Y are independent, fX+Y(z) = Σ_{x=−∞}^{∞} fX(x) fY(z − x)
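For PMFs stored as arrays over consecutive integers, this convolution is exactly what numpy.convolve computes. A sketch with two fair dice (assumed uniform PMFs):

```python
import numpy as np

# PMFs of two independent fair dice, supported on {1, ..., 6}.
f_X = np.full(6, 1 / 6)
f_Y = np.full(6, 1 / 6)

# f_{X+Y}(z) = sum_x f_X(x) f_Y(z - x); the result is supported on {2, ..., 12}.
f_Z = np.convolve(f_X, f_Y)

print(f_Z[7 - 2])   # P(X + Y = 7) = 6/36 ≈ 0.1667 (index 0 corresponds to z = 2)
```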

• Let X ∼ Poisson(λ1) and Y ∼ Poisson(λ2) be independent, and let Z = X + Y,
z ∈ {0, 1, 2, . . .}. Then

Z ∼ Poisson(λ1 + λ2)

(X | Z = n) ∼ Binomial(n, λ1/(λ1 + λ2)),  (Y | Z = n) ∼ Binomial(n, λ2/(λ1 + λ2))
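Both facts can be sanity-checked by simulation (the λ values are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
lam1, lam2, trials = 2.0, 3.0, 200_000

X = rng.poisson(lam1, trials)
Y = rng.poisson(lam2, trials)
Z = X + Y

print(Z.mean(), Z.var())   # both close to lam1 + lam2 = 5, as for Poisson(5)

# Conditional on Z = n, X should look Binomial(n, lam1 / (lam1 + lam2)).
print(X[Z == 5].mean())    # close to 5 * (2 / 5) = 2.0
```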
15. CDF of a random variable:
Cumulative distribution function of a random variable X is a function FX : R → [0, 1]
defined as
FX (x) = P (X ≤ x)

16. Minimum of two random variables:
Let X, Y ∼ fXY and let Z = min{X, Y }, then

fZ(z) = P(Z = z) = P(min{X, Y} = z)
      = P(X = z, Y = z) + P(X = z, Y > z) + P(X > z, Y = z)
      = fXY(z, z) + Σ_{t2 > z} fXY(z, t2) + Σ_{t1 > z} fXY(t1, z)

FZ(z) = P(Z ≤ z) = P(min{X, Y} ≤ z)
      = 1 − P(min{X, Y} > z)
      = 1 − P(X > z, Y > z)
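The PMF formula translates line by line into code. A sketch with a random joint PMF on {0, 1, 2, 3} × {0, 1, 2, 3} standing in for a real one:

```python
import numpy as np

rng = np.random.default_rng(4)
f_XY = rng.random((4, 4))
f_XY /= f_XY.sum()          # a valid joint PMF; rows index X, columns index Y

def pmf_min(f_XY, z):
    # P(X = z, Y = z) + P(X = z, Y > z) + P(X > z, Y = z)
    return f_XY[z, z] + f_XY[z, z + 1:].sum() + f_XY[z + 1:, z].sum()

print(sum(pmf_min(f_XY, z) for z in range(4)))   # 1.0 — a valid PMF for min(X, Y)
```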

17. Maximum of two random variables:


Let X, Y ∼ fXY and let Z = max{X, Y }, then

fZ(z) = P(Z = z) = P(max{X, Y} = z)
      = P(X = z, Y = z) + P(X = z, Y < z) + P(X < z, Y = z)
      = fXY(z, z) + Σ_{t2 < z} fXY(z, t2) + Σ_{t1 < z} fXY(t1, z)

FZ(z) = P(Z ≤ z) = P(max{X, Y} ≤ z)
      = P(X ≤ z, Y ≤ z)

18. Maximum and Minimum of n i.i.d. random variables

• Let X ∼ Geometric(p) and Y ∼ Geometric(q) be independent, and let
Z = min(X, Y). Then

Z ∼ Geometric(1 − (1 − p)(1 − q))
• Maximum of 2 independent geometric random variables is not geometric.
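A simulation sketch supporting the claim about the minimum (p and q are arbitrary): Z > k requires both X > k and Y > k, which has probability [(1 − p)(1 − q)]^k, exactly the tail of a Geometric(1 − (1 − p)(1 − q)).

```python
import numpy as np

rng = np.random.default_rng(5)
p, q, trials = 0.3, 0.5, 200_000

Z = np.minimum(rng.geometric(p, trials), rng.geometric(q, trials))
r = 1 - (1 - p) * (1 - q)                # = 0.65

print((Z == 1).mean(), r)                # P(Z = 1) ≈ r
print((Z > 2).mean(), (1 - r) ** 2)      # tail of Z matches the Geometric(r) tail
```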

Important Points:

1. Let N ∼ Poisson(λ) and X|N = n ∼ Binomial(n, p), then X ∼ Poisson(λp)
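This is sometimes called Poisson "thinning". A simulation sketch (λ and p are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
lam, p, trials = 4.0, 0.25, 200_000

N = rng.poisson(lam, trials)
X = rng.binomial(N, p)      # X | N = n ~ Binomial(n, p), vectorised over n

print(X.mean(), X.var())    # both close to lam * p = 1.0, as for Poisson(lam * p)
```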

2. Memoryless property of Geometric(p):
If X ∼ Geometric(p), then

P (X > m + n|X > m) = P (X > n)
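A quick numerical check of this identity using P(X > k) = (1 − p)^k (values of p, m, n assumed):

```python
p, m, n = 0.3, 2, 3

# For X ~ Geometric(p) on {1, 2, ...}: P(X > k) = (1 - p)^k.
lhs = (1 - p) ** (m + n) / (1 - p) ** m   # P(X > m + n | X > m)
rhs = (1 - p) ** n                        # P(X > n)

print(abs(lhs - rhs) < 1e-12)             # True
```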

3. Sum of n independent Bernoulli(p) trials is Binomial(n, p).

4. Sum of 2 independent Uniform random variables is not Uniform.

5. Sum of independent Binomial(n, p) and Binomial(m, p) is Binomial(n + m, p).

6. Sum of r i.i.d. Geometric(p) is Negative-Binomial(r, p).

7. Sum of independent Negative-Binomial(r, p) and Negative-Binomial(s, p) is
Negative-Binomial(r + s, p).

8. If X and Y are independent, then g(X) and h(Y ) are also independent.

