
Discrete joint Continuous joint Extra notes

10.022 Modelling Uncertainty


Week 9 Class 1: Joint Distributions

Term 3, 2023

1 / 41

Outline

1 Discrete joint distribution


Joint and marginal pmf
Independence
Expected value

2 Continuous joint distribution


Joint pdf
Basic example
Marginal pdf
Independence; expected value

3 Extra notes
Proofs

2 / 41

Introduction
Many situations involve 2 (or more) random variables, and it is
useful to study both random variables together (that is, jointly).
Doing so leads to important concepts such as the independence of
random variables and sums of random variables.

Example 1
As early as Weeks 1 and 3, we encountered the following example
involving 2 discrete random variables:
Data from the last 100 job applicants at a company resulted in the
following table. One of the applicants is selected at random.

            bachelor   master   PhD   row sum
hired             12       14    10        36
not hired         38       21     5        64
column sum        50       35    15       100
3 / 41

Example 1 (continued)

Define X to be a (Bernoulli) random variable that equals 1 if the
selected applicant is hired, and 0 otherwise. Define Y to be a
random variable that equals 0 if the applicant has a bachelor’s, 1 if
the applicant has a master’s, and 2 if the applicant has a PhD.

Then, after converting all the numbers in the table on Slide 3 into
probabilities, we get:

x⧹y           0      1      2   row sum
1          0.12   0.14   0.10      0.36
0          0.38   0.21   0.05      0.64
column sum 0.50   0.35   0.15         1

This is an example of a joint distribution of X and Y .

4 / 41

Joint pmf
For discrete RV’s, their joint distribution is specified using a joint
probability mass function.

Definition
Let X and Y be discrete random variables. The joint probability
mass function (joint pmf) f (x, y) is defined as

f (x, y) := P((X = x) ∩ (Y = y)).

Due to the Axioms, we must have ∑_{all x} ∑_{all y} f (x, y) = 1, where
the sum is taken over all x and y values that X and Y can take
with positive probabilities.

In Example 1, we have: f (1, 0) = 0.12, f (1, 1) = 0.14,


f (1, 2) = 0.1, f (0, 0) = 0.38, f (0, 1) = 0.21, f (0, 2) = 0.05.
Their sum is indeed 1.
5 / 41

Marginal pmf
Definition (continued)
The marginal probability mass function (marginal pmf) of X,
denoted by fX , is just the pmf of X, and can be computed using
fX (x) := P(X = x) = ∑_{all y} f (x, y).

Similarly, the marginal pmf of Y , denoted by fY , is just the pmf
of Y , and can be computed using

fY (y) := P(Y = y) = ∑_{all x} f (x, y).

Since (e.g.) fX is a function of x only, the sum must be over y.


In Example 1, we have: fX (1) = 0.36, fX (0) = 0.64,
fY (0) = 0.5, fY (1) = 0.35, and fY (2) = 0.15.
Marginal pmf’s are the row and column sums of the joint pmf
table; these sums are written in the margins, hence the name.
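The joint and marginal pmf’s of Example 1 can be checked with a short Python sketch (not part of the course materials; the dict layout is an illustrative choice):

```python
# Joint pmf of X (hired = 1/0) and Y (degree: 0, 1, 2) from Example 1,
# stored as a dict keyed by (x, y) pairs.
joint = {
    (1, 0): 0.12, (1, 1): 0.14, (1, 2): 0.10,
    (0, 0): 0.38, (0, 1): 0.21, (0, 2): 0.05,
}

def marginal_x(joint, x):
    """Marginal pmf of X: sum the joint pmf over all y (a row sum)."""
    return sum(p for (xx, y), p in joint.items() if xx == x)

def marginal_y(joint, y):
    """Marginal pmf of Y: sum the joint pmf over all x (a column sum)."""
    return sum(p for (x, yy), p in joint.items() if yy == y)

print(round(sum(joint.values()), 2))   # 1.0 (total probability)
print(round(marginal_x(joint, 1), 2))  # 0.36
print(round(marginal_y(joint, 0), 2))  # 0.5
```

The row/column sums reproduce fX and fY from the table above.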
6 / 41

Activity 1 (5 minutes)
A fair (unloaded) die is rolled twice. Let X be the maximum number
obtained from the two rolls, and Y be the minimum number from
the two rolls. (E.g. if the two rolls result in 3 and 5, then X = 5
and Y = 3.) Complete the joint pmf table below.
x⧹y      1      2     3     4     5     6    fX
1     1/36      0
2     1/18
3
4
5
6
fY                                            1
7 / 41

Activity 1 (solution)
For instance, we have
f (3, 3) = P(max = 3 and min = 3) = P({(3, 3)}) = 1/36;
f (3, 2) = P(max = 3 and min = 2) = P({(3, 2), (2, 3)}) = 2/36.
x⧹y      1      2      3      4      5      6     fX
1     1/36      0      0      0      0      0   1/36
2     1/18   1/36      0      0      0      0   3/36
3     1/18   1/18   1/36      0      0      0   5/36
4     1/18   1/18   1/18   1/36      0      0   7/36
5     1/18   1/18   1/18   1/18   1/36      0   9/36
6     1/18   1/18   1/18   1/18   1/18   1/36  11/36
fY   11/36   9/36   7/36   5/36   3/36   1/36      1

Note that both of the marginal pmf’s sum up to 1, which is a
useful way to check your answers.
8 / 41
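The whole table can also be produced by brute-force enumeration of the 36 equally likely outcomes; a Python sketch (illustrative, using exact fractions):

```python
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes of two fair dice rolls and
# tabulate the joint pmf of X = max and Y = min with exact fractions.
joint = {}
for a, b in product(range(1, 7), repeat=2):
    key = (max(a, b), min(a, b))
    joint[key] = joint.get(key, Fraction(0)) + Fraction(1, 36)

print(joint[(3, 3)])        # 1/36
print(joint[(3, 2)])        # 1/18
print(sum(joint.values()))  # 1
```

The marginal fX (6) = 11/36 from the table falls out of the same dict by summing over y.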

Independence of random variables

Definition
Let X and Y be discrete random variables. X and Y are said to
be independent if and only if the events ‘X ∈ A’ and ‘Y ∈ B’ are
independent, for any sets A and B ⊆ R.

It can be shown that the above definition is equivalent to the
following criterion: X and Y are independent if and only if

P((X = x) ∩ (Y = y)) = P(X = x) P(Y = y), for all x and y.

In terms of the joint and marginal pmf’s, this criterion can be
written as:

X and Y are independent if and only if

f (x, y) = fX (x) fY (y), for all x, y ∈ R.

9 / 41

Independence, continued

Intuitively, two random variables are independent if they do not


affect each other in any way (knowing the value of one RV does
not change the distribution of the other RV). A simple example
would be the outcomes from two dice rolls.

The independence of random variables is a much stricter
condition than the independence of events. (Recall from Week 4:
sometimes, events may be independent due to a coincidence.)
Therefore, do not confuse these two notions of independence.

In general, we cannot recover the joint pmf of X and Y if we are


only given their marginal pmf’s. But if X and Y are independent,
then the joint pmf is just the product of the marginals.
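The product criterion gives a mechanical independence test; a Python sketch (illustrative), applied here to the Example 1 pmf, where the test fails:

```python
# Independence test via f(x, y) = fX(x) fY(y); applied to the Example 1 pmf,
# where it fails (e.g. f(1, 0) = 0.12 but fX(1) fY(0) = 0.36 * 0.5 = 0.18).
joint = {
    (1, 0): 0.12, (1, 1): 0.14, (1, 2): 0.10,
    (0, 0): 0.38, (0, 1): 0.21, (0, 2): 0.05,
}
xs = {x for x, _ in joint}
ys = {y for _, y in joint}
fX = {x: sum(joint.get((x, y), 0) for y in ys) for x in xs}
fY = {y: sum(joint.get((x, y), 0) for x in xs) for y in ys}

def independent(joint, fX, fY, tol=1e-9):
    """True iff f(x, y) = fX(x) fY(y) for every (x, y) pair."""
    return all(abs(joint.get((x, y), 0) - fX[x] * fY[y]) <= tol
               for x in xs for y in ys)

print(independent(joint, fX, fY))  # False
```

Rebuilding a joint pmf as the product of these marginals would, by construction, pass the test.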

10 / 41

Expected value

Let X and Y be discrete random variables, with joint probability


mass function f (x, y). Let g be any function of two variables.
Then, g(X, Y ) is yet another random variable, and the expected
value (mean) of g(X, Y ) can be calculated using the following
result:

Theorem

E[g(X, Y )] = ∑_{all x} ∑_{all y} g(x, y) f (x, y).

Think of this theorem as a generalization of the corresponding
result for a single RV, E(g(X)) = ∑_{all x} g(x) f (x).

A connection with a theorem from Week 5 can be found in the


extra notes.
11 / 41

Activity 2 (10 minutes)

In a suburb, a household is randomly selected. Let X be the


number of cars owned by that household, and Y be the number of
TV’s owned by the same household. The joint pmf of X and Y is
given below.
x⧹y     1     2     3     4
0    0.15  0.25  0.03     0
1    0.12  0.20  0.05     0
2    0.08  0.10     0  0.02

(a) Are X and Y independent random variables?

(b) Compute E(X + Y ).

(c) Find P(X < Y ).

12 / 41

Activity 2 (solution)

(a) We first compute the marginal pmf’s:

x⧹y     1     2     3     4    fX
0    0.15  0.25  0.03     0  0.43
1    0.12  0.20  0.05     0  0.37
2    0.08  0.10     0  0.02  0.20
fY   0.35  0.55  0.08  0.02     1

We observe (for instance) that

f (2, 3) = 0, while fX (2) fY (3) = 0.2 × 0.08 = 0.016 ≠ 0.

Therefore, the criterion for independence on the bottom of Slide 9 is
not satisfied for all x and y. Hence, X and Y are not independent
random variables.

13 / 41

Activity 2 (solution, continued)


(b) E(X + Y ) means the average number of (cars + TV’s). We let
the function g(X, Y ) = X + Y , then use the theorem on Slide 11:
E(X + Y ) = ∑_{x=0}^{2} ∑_{y=1}^{4} (x + y) f (x, y)

= (0 + 1) × 0.15 + (0 + 2) × 0.25 + (0 + 3) × 0.03 + (0 + 4) × 0


+ (1 + 1) × 0.12 + (1 + 2) × 0.2 + (1 + 3) × 0.05 + (1 + 4) × 0
+ (2 + 1) × 0.08 + (2 + 2) × 0.1 + (2 + 3) × 0 + (2 + 4) × 0.02
= 2.54.

Alternatively, we can use the linearity of expectation to write

E(X + Y ) = E(X) + E(Y ) = 0.77 + 1.77 = 2.54,

where E(X) and E(Y ) are computed using the marginal pmf’s.
(Computing the same expectation in two ways allows you to check
your answer.)
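Both computations of E(X + Y ) can be checked numerically; a Python sketch (illustrative) using the Activity 2 table:

```python
# Joint pmf from Activity 2 (x = cars, y = TVs).
joint = {
    (0, 1): 0.15, (0, 2): 0.25, (0, 3): 0.03, (0, 4): 0.0,
    (1, 1): 0.12, (1, 2): 0.20, (1, 3): 0.05, (1, 4): 0.0,
    (2, 1): 0.08, (2, 2): 0.10, (2, 3): 0.0,  (2, 4): 0.02,
}

# Direct: E[g(X, Y)] with g(x, y) = x + y, summed against the joint pmf.
e_direct = sum((x + y) * p for (x, y), p in joint.items())

# Via linearity: E(X + Y) = E(X) + E(Y), computed from the same table.
e_x = sum(x * p for (x, y), p in joint.items())
e_y = sum(y * p for (x, y), p in joint.items())

print(round(e_direct, 2))   # 2.54
print(round(e_x + e_y, 2))  # 2.54
```

As the slide notes, agreement between the two methods is a useful check.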
14 / 41

Activity 2 (solution, continued)

(c) The event ‘X < Y ’ means ‘fewer cars than TV’s’, and consists
of all (x, y) pairs satisfying x < y. The probabilities corresponding
to these pairs are shown in red on Slide 13. Summing up the
probabilities, we get

P(X < Y ) = 0.15 + 0.25 + 0.03 + 0.2 + 0.05 + 0.02 = 0.7.

In general, when we have a pair of discrete random variables X


and Y , and a region C, then P((X, Y ) ∈ C) can be computed by
summing the joint pmf over all (x, y) pairs satisfying (x, y) ∈ C.
As we will see next, for a pair of continuous random variables, the
same rule applies, except we replace ‘summing’ by ‘integrating’,
and ‘pmf’ by ‘pdf’.
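The ‘sum the pmf over the region’ rule is easy to mechanize; a Python sketch (illustrative; `in_region` is an assumed helper name for the indicator of C):

```python
# Joint pmf from Activity 2 (x = cars, y = TVs).
joint = {
    (0, 1): 0.15, (0, 2): 0.25, (0, 3): 0.03, (0, 4): 0.0,
    (1, 1): 0.12, (1, 2): 0.20, (1, 3): 0.05, (1, 4): 0.0,
    (2, 1): 0.08, (2, 2): 0.10, (2, 3): 0.0,  (2, 4): 0.02,
}

def prob(joint, in_region):
    """P((X, Y) in C): sum the joint pmf over all (x, y) pairs in C."""
    return sum(p for (x, y), p in joint.items() if in_region(x, y))

print(round(prob(joint, lambda x, y: x < y), 2))  # 0.7, matching part (c)
```

Any region C can be plugged in as a predicate on (x, y).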

15 / 41

Outline

1 Discrete joint distribution


Joint and marginal pmf
Independence
Expected value

2 Continuous joint distribution


Joint pdf
Basic example
Marginal pdf
Independence; expected value

3 Extra notes
Proofs

16 / 41

Joint pdf

Definition
Let X and Y be continuous random variables. A two-variable
function f (x, y) ≥ 0 is called the joint probability density
function (joint pdf) of X and Y , if for any region C in the plane,
P((X, Y ) ∈ C) = ∬_{(x, y)∈C} f (x, y) dx dy.

Important example: if C is the rectangle [a, b] × [c, d], then

P((a ≤ X ≤ b) ∩ (c ≤ Y ≤ d)) = ∫_c^d ∫_a^b f (x, y) dx dy.

If C is a more complicated region, then we would often need to


sketch the region, then set up a double integral accordingly.
17 / 41

Joint pdf, properties



Notation: P((X ∈ A) ∩ (Y ∈ B)) is also denoted by
P(X ∈ A, Y ∈ B). For example, P((a ≤ X ≤ b) ∩ (c ≤ Y ≤ d))
can be written as P(a ≤ X ≤ b, c ≤ Y ≤ d).

Since the total probability needs to be 1, we must have


∫_{−∞}^{∞} ∫_{−∞}^{∞} f (x, y) dx dy = 1.

f (x, y) does not represent a probability by itself; rather, for


small δx and δy,

f (x, y) δx δy ≈ P(x ≤ X ≤ x + δx, y ≤ Y ≤ y + δy).

f (x, y) can be thought of as probability per unit area (hence


‘density’). Volumes under f (x, y) represent probabilities.
18 / 41

Joint pdf, picture

An example of a joint pdf is shown as a surface z = f (x, y). The


total volume under the surface and above the xy-plane must be 1.
The probability that the pair (X, Y ) is found in the rectangle C, is
equal to the volume of the solid with base C and bounded above
by f (x, y).
19 / 41

Basic example
Example 2
Let k be a constant, and let T be the triangle in R2 bounded by
x ≥ 0, y ≥ 0, and x + y ≤ 1. Suppose that the joint pdf of X and
Y is given by

f (x, y) = k on the triangle T , and f (x, y) = 0 otherwise.

Above: the triangle T ;


right: the joint pdf in 3D.
20 / 41

Basic example, continued


Intuitively, the joint pdf being a constant k over the triangle T
means that no part of T is favoured over any other part.
The solid 3D figure on Slide 20 must have volume 1 (as the total
probability is 1). Since the triangle T has area 1/2, the ‘height’
k = 2.
Suppose we wish to find P(0 ≤ X ≤ 1/2, 0 ≤ Y ≤ 1/3). The
(rectangular) region here is entirely within T , so the probability is
given by the volume of a rectangular prism with length 1/2, width
1/3, and height 2, i. e. the answer is 1/3.
Alternatively, we can use the formula on the bottom of Slide 17:

P(0 ≤ X ≤ 1/2, 0 ≤ Y ≤ 1/3) = ∫_0^{1/3} ∫_0^{1/2} 2 dx dy = 1/3.
As with any double integration, it is not necessary to visualize the
surface f , but we should sketch the region of integration.
21 / 41

Activity 3 (5 minutes)

Consider the joint pdf in Example 2.

(a) Find P(X ≤ 1/2).

(b) Find P(X + Y ≤ 1/2).

Hint: sketch T and the region for each part.

22 / 41

Activity 3 (solution)

(a) The intersection between the region ‘x ≤ 1/2’ and the triangle
T is shaded in green above. This region is a trapezium with area
3/8, so the probability, as the volume of the corresponding prism,
is given by 3/8 × 2 = 3/4.
Alternatively, using double integration (and the picture above),

P(X ≤ 1/2) = ∫_{x=0}^{1/2} ∫_{y=0}^{1−x} 2 dy dx
           = ∫_0^{1/2} (2 − 2x) dx = [2x − x²]_0^{1/2} = 3/4.
23 / 41

Activity 3 (solution, continued)

(b) The intersection between the region ‘x + y ≤ 1/2’ and the


triangle T is shaded in green above. This region is a triangle with
area 1/8, so the probability, as the volume of the corresponding
prism, is given by 1/8 × 2 = 1/4.
Alternatively, using double integration (and the picture above),

P(X + Y ≤ 1/2) = ∫_{y=0}^{1/2} ∫_{x=0}^{1/2−y} 2 dx dy
               = ∫_0^{1/2} (1 − 2y) dy = [y − y²]_0^{1/2} = 1/4.
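The probabilities in Example 2 and Activity 3 can be double-checked numerically; a Python sketch (illustrative) using a simple midpoint-rule double integral:

```python
def double_integral(f, x0, x1, y0, y1, n=400):
    """Midpoint-rule approximation of the double integral of f
    over the rectangle [x0, x1] x [y0, y1]."""
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    total = 0.0
    for i in range(n):
        x = x0 + (i + 0.5) * hx
        for j in range(n):
            total += f(x, y0 + (j + 0.5) * hy)
    return total * hx * hy

def pdf(x, y):
    """Joint pdf of Example 2: constant 2 on the triangle x, y >= 0, x + y <= 1."""
    return 2.0 if x >= 0 and y >= 0 and x + y <= 1 else 0.0

p_rect = double_integral(pdf, 0, 0.5, 0, 1/3)  # P(0<=X<=1/2, 0<=Y<=1/3) = 1/3
p_half = double_integral(pdf, 0, 0.5, 0, 1)    # P(X <= 1/2) = 3/4
p_tri = double_integral(lambda x, y: pdf(x, y) if x + y <= 0.5 else 0.0,
                        0, 1, 0, 1)            # P(X + Y <= 1/2) = 1/4
print(p_rect, p_half, p_tri)  # approximately 1/3, 3/4, 1/4
```

The same helper handles any region C via an indicator inside the integrand, mirroring the geometric arguments above.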
24 / 41

Marginal pdf

Definitions
Let X and Y be continuous random variables. The marginal
probability density function (marginal pdf) of X, denoted by
fX , is just the pdf of X, and can be computed using
fX (x) = ∫_{−∞}^{∞} f (x, y) dy.

Similarly, the marginal pdf of Y , denoted by fY , is just the pdf of
Y , and can be computed using

fY (y) = ∫_{−∞}^{∞} f (x, y) dx.

The only difference from the discrete case is that summation is


replaced by integration.
25 / 41

Basic example, continued


We compute the marginal pdf of X for Example 2.
When 0 ≤ x ≤ 1, we have

fX (x) = ∫_{−∞}^{∞} f (x, y) dy
       = ∫_{−∞}^{0} 0 dy + ∫_{0}^{1−x} 2 dy + ∫_{1−x}^{∞} 0 dy
       = 0 + 2(1 − x) + 0 = 2(1 − x).

Recall that we need to carefully specify the domain of any pdf (or
cdf), so the full answer is

fX (x) = 2(1 − x) if 0 ≤ x ≤ 1, and fX (x) = 0 otherwise.

Sanity checks: fX (x) is a function of x only (so it should not
contain y); fX (x) must be a valid pdf, that is, ∫_{−∞}^{∞} fX (x) dx = 1.
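These sanity checks can be automated; a Python sketch (illustrative):

```python
def f_X(x):
    """Marginal pdf of X in Example 2, from integrating the joint pdf over y."""
    return 2 * (1 - x) if 0 <= x <= 1 else 0.0

# A valid pdf must integrate to 1; midpoint rule on [0, 1] (exact for a
# linear integrand, up to floating-point rounding).
n = 1000
integral = sum(f_X((i + 0.5) / n) for i in range(n)) / n
print(round(integral, 6))  # 1.0
print(f_X(0.25))           # 1.5
```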
26 / 41

Independence of RV’s; expected value


Let X and Y be continuous random variables. The notion of
independence is the same as before: X and Y are said to be
independent if and only if

P((X ∈ A) ∩ (Y ∈ B)) = P(X ∈ A) P(Y ∈ B), for all A, B ⊆ R.

Equivalently, it can be shown that X and Y are independent if and
only if

f (x, y) = fX (x) fY (y), for all x, y ∈ R.

In words: X and Y are independent if and only if their joint pdf is


the product of the marginal pdf’s, for all real x and y.
Expected values can be computed using

E[g(X, Y )] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x, y) f (x, y) dx dy.
Again, the only difference from before is that summation is
replaced by integration. 27 / 41

Activity 4 (optional/take home, 10 minutes)


Consider two water pipes in a drainage system, whose discharge (in
m3 /s) during rainfall is given by the continuous random variables
X and Y respectively. Suppose that their joint pdf is
f (x, y) = k if 5 ≤ x ≤ 10 and 4 ≤ y ≤ 9, and f (x, y) = 0 otherwise.

(a) Find the constant k.

(b) Compute the marginal pdf’s of X and Y , and hence show that
X and Y are independent.

(c) What is the probability that the discharge of the first pipe
(with discharge X) is higher than that of the second pipe?

(Hint: a sketch would be helpful for each part.)


28 / 41

Activity 4 (solution)
(a) 1 = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f (x, y) dx dy = ∫_4^9 ∫_5^{10} k dx dy = 25k,

therefore k = 1/25.

(b) fX (x) = ∫_{−∞}^{∞} f (x, y) dy = ∫_4^9 (1/25) dy = 1/5 if 5 ≤ x ≤ 10,

and 0 otherwise.

fY (y) = ∫_{−∞}^{∞} f (x, y) dx = ∫_5^{10} (1/25) dx = 1/5 if 4 ≤ y ≤ 9,

and 0 otherwise.
On the square (5 ≤ x ≤ 10) ∩ (4 ≤ y ≤ 9), f (x, y) = fX (x) fY (y)
= 1/25, while outside the square, f (x, y) = fX (x) fY (y) = 0.
Therefore, f (x, y) = fX (x) fY (y) for all real x and y, hence X
and Y are independent (uniform) random variables.
29 / 41

Activity 4 (solution, continued)


(c) Two regions of interest, X ≤ Y and X > Y , are shaded below.

30 / 41

Activity 4 (solution, continued)


Using the picture on Slide 30, we may deduce geometrically that
P(X ≤ Y ) = (4²/2) · k = 8/25, therefore
P(X > Y ) = 1 − P(X ≤ Y ) = 17/25.
(Indeed, parts (a) and (b) can also be done using the picture and
geometry, without integration.)
Alternatively, using integration (and referring to the picture to help
us determine the limits of integration), we have

P(X > Y ) = 1 − P(X ≤ Y )
          = 1 − ∫_5^9 ∫_5^y (1/25) dx dy
          = 1 − (1/25) ∫_5^9 (y − 5) dy
          = 1 − (1/25) [y²/2 − 5y]_5^9
          = 17/25.
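The answer 17/25 can be verified numerically as well; a Python sketch (illustrative), using a midpoint-rule double integral over the square:

```python
# Midpoint-rule check of Activity 4(c); the uniform joint pdf equals 1/25
# on the square [5, 10] x [4, 9] and 0 elsewhere.
def double_integral(f, x0, x1, y0, y1, n=500):
    hx, hy = (x1 - x0) / n, (y1 - y0) / n
    return sum(f(x0 + (i + 0.5) * hx, y0 + (j + 0.5) * hy)
               for i in range(n) for j in range(n)) * hx * hy

p_le = double_integral(lambda x, y: 1/25 if x <= y else 0.0, 5, 10, 4, 9)
print(p_le)      # close to 8/25 = 0.32
print(1 - p_le)  # close to 17/25 = 0.68
```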
31 / 41

Harder example
Example 3
Let X and Y denote the lifetime (in years) of an iPhone and an
Android phone respectively. Suppose that X ∼ exponential(a),
Y ∼ exponential(b), and that X and Y are independent. Find the
probability that the iPhone fails before the Android phone.

Solution: Due to independence, the joint pdf is given by

f (x, y) = fX (x) fY (y) = a e^{−ax} b e^{−by} if x ≥ 0 and y ≥ 0,
and 0 otherwise. (Recall the exponential pdf from Week 6.)
We are interested in P(X < Y ), which, by the definition on Slide
17, can be computed using the double integral of f over the region
‘x < y’ in the plane:

P(X < Y ) = ∬_{x<y} f (x, y) dx dy.
32 / 41

Example 3, picture

Above: part of the region of integration; right: the joint pdf as a
surface over the region.

Note that the region of integration (x < y) and the surface
(a e^{−ax} b e^{−by}) are separate objects; it is not necessary to visualize
the surface, but it is important to sketch the region to help you set
up the integration limits.

33 / 41

Example 3, continued
The integral on Slide 32 simplifies as

P(X < Y ) = ∬_{0<x<y<∞} a e^{−ax} b e^{−by} dy dx
          = ∫_{x=0}^{∞} a e^{−ax} ( ∫_{y=x}^{∞} b e^{−by} dy ) dx
          = ∫_{x=0}^{∞} a e^{−ax} [−e^{−by}]_{y=x}^{y=∞} dx
          = ∫_0^{∞} a e^{−ax} e^{−bx} dx
          = ∫_0^{∞} a e^{−(a+b)x} dx
          = [−(a/(a + b)) e^{−(a+b)x}]_0^{∞}
          = a/(a + b).
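The formula a/(a + b) is easy to confirm by simulation; a Python sketch (illustrative; the rates a = 1, b = 2 are arbitrary choices, and the seed fixes the run):

```python
import random

# Monte Carlo check of P(X < Y) = a/(a + b) for independent exponential RVs.
# The rates a = 1, b = 2 are illustrative choices, not from the slides.
random.seed(0)
a, b = 1.0, 2.0
n = 200_000
hits = sum(random.expovariate(a) < random.expovariate(b) for _ in range(n))
print(hits / n)  # close to a/(a + b) = 1/3
```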
34 / 41

Activity 5 (10 minutes)

In Example 3, suppose that a = b = 1. Find the probability that


the Android phone lasts at least 1 year longer than the iPhone,
that is, find P(Y ≥ X + 1).

Hint: first, sketch the region of integration.

35 / 41

Activity 5 (solution)
For P(Y ≥ X + 1), the required region of integration is
‘y ≥ x + 1’. Shaded below is part of this infinite region (intersected
with the 1st quadrant, where the joint pdf is non-zero).

36 / 41

Activity 5 (solution, continued)


Following the steps in the example, we get:

P(X + 1 ≤ Y ) = ∬_{0<x, x+1≤y<∞} e^{−x} e^{−y} dy dx
              = ∫_0^{∞} ∫_{x+1}^{∞} e^{−x} e^{−y} dy dx
              = ∫_0^{∞} e^{−x} [−e^{−y}]_{x+1}^{∞} dx
              = ∫_0^{∞} e^{−x} e^{−(x+1)} dx
              = ∫_0^{∞} e^{−(2x+1)} dx
              = [−(1/2) e^{−(2x+1)}]_0^{∞}
              = 1/(2e) ≈ 0.184.

(It is also possible to do the dx integral first, then the dy integral; note
that the integration limits would change.)
37 / 41
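The value 1/(2e) can be checked numerically; a Python sketch (illustrative) that evaluates the final single integral by the midpoint rule, truncated at x = 20 (the truncation error is negligible):

```python
import math

# Numeric check of P(Y >= X + 1) = 1/(2e) for independent exponential(1) RVs.
# For fixed x, P(Y >= x + 1) = e^{-(x+1)}, so the answer is the integral of
# e^{-x} e^{-(x+1)} = e^{-(2x+1)} over x in [0, inf), truncated at x = 20.
n, upper = 20_000, 20.0
h = upper / n
integral = sum(math.exp(-(2 * (i + 0.5) * h + 1)) for i in range(n)) * h
print(round(integral, 4))          # 0.1839
print(round(1 / (2 * math.e), 4))  # 0.1839
```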

Outline

1 Discrete joint distribution


Joint and marginal pmf
Independence
Expected value

2 Continuous joint distribution


Joint pdf
Basic example
Marginal pdf
Independence; expected value

3 Extra notes
Proofs

38 / 41

Joint distribution
The ‘full’ notation for f (x, y) is fX,Y (x, y), but the subscripts
are often dropped when the context is clear.
For any random variables X and Y , the joint cumulative
distribution function F (x, y) is defined as

F (x, y) := P((X ≤ x) ∩ (Y ≤ y)).

The joint cdf is very useful, as it can be used to compute all
kinds of probabilities regarding X and Y . For instance, we have

P(X ≤ x) = FX (x) = lim_{y→∞} F (x, y),

P(Y ≤ y) = FY (y) = lim_{x→∞} F (x, y),

P((X > x) ∩ (Y > y)) = 1 − FX (x) − FY (y) + F (x, y).
Joint distributions are useful for summing random variables (next
class), and for important applications such as the multivariate
normal distribution (extensively used in e. g. machine learning).
39 / 41

Independence
We prove that the two characterizations of independence on Slide 9 are
equivalent. Such a proof must proceed in both directions:

In one direction: If ‘X ∈ A’ and ‘Y ∈ B’ are independent for any sets A
and B, then they must be independent for the particular choices of
A = {x} and B = {y}, for any x and y. It then follows that
P((X = x) ∩ (Y = y)) = P(X = x) P(Y = y).

In the other direction: If P((X = x) ∩ (Y = y)) = P(X = x) P(Y = y)
for any x and y, then by Axiom 3, we have

P((X ∈ A) ∩ (Y ∈ B)) = ∑_{x∈A} ∑_{y∈B} P((X = x) ∩ (Y = y))
                     = ∑_{x∈A} ∑_{y∈B} P(X = x) P(Y = y)
                     = ( ∑_{x∈A} P(X = x) ) ( ∑_{y∈B} P(Y = y) )
                     = P(X ∈ A) P(Y ∈ B),

establishing the independence of ‘X ∈ A’ and ‘Y ∈ B’ for any A and B.
40 / 41

Expectation of a sum
Let X and Y be any (not necessarily independent) discrete RV’s,
and let g(X, Y ) = X + Y . Then the theorem on Slide 11 gives
E(X + Y ) = ∑_{all x} ∑_{all y} (x + y) f (x, y)
          = ∑_{all x} x ∑_{all y} f (x, y) + ∑_{all y} y ∑_{all x} f (x, y)
          = ∑_{all x} x fX (x) + ∑_{all y} y fY (y)
          = E(X) + E(Y ).

This process can be iterated to prove that

E(X1 + X2 + · · · + Xn ) = E(X1 ) + E(X2 ) + · · · + E(Xn ).

Moreover, the proof for continuous random variables is similar.


Therefore, we have shown Theorem 2 from Week 5 Class 2.
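The identity holds even for strongly dependent random variables; a Python sketch (illustrative) using X = max and Y = min from Activity 1, where X + Y is simply the total of the two rolls:

```python
from fractions import Fraction
from itertools import product

# E(X + Y) = E(X) + E(Y) even for dependent RVs: take X = max and Y = min
# of two fair dice rolls (Activity 1), so X + Y is the total of the rolls.
p = Fraction(1, 36)
outcomes = list(product(range(1, 7), repeat=2))
e_sum = sum((max(a, b) + min(a, b)) * p for a, b in outcomes)
e_max = sum(max(a, b) * p for a, b in outcomes)
e_min = sum(min(a, b) * p for a, b in outcomes)
print(e_sum)          # 7
print(e_max + e_min)  # 7
```

Here E(max) = 161/36 and E(min) = 91/36, which are individually unobvious but sum to the familiar 7.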
41 / 41
