
Discrete Random Variables

He Shuangchi

IE5004 Discrete Random Variables 1 / 47


An example of gambling

Suppose that two gamblers, Tom and Jerry, are rolling a fair die.
If the outcome is 1, 2, or 3, Tom wins one dollar.
If the outcome is 4, 5, or 6, Jerry wins one dollar.
Let X be Tom’s gain in a particular run of the game. It is either 1 dollar
or −1 dollar, depending on the outcome of that run.
X (1) = X (2) = X (3) = 1
X (4) = X (5) = X (6) = −1

Question
Let Y be Jerry’s gain in that run. What is Y (1), Y (2), . . . , Y (6)?

IE5004 Discrete Random Variables 2 / 47


Random variables

When focusing on certain numerical aspects of outcomes, it is convenient to associate each outcome with a specific number.
Definition
A random variable is a function that assigns a real number to each
outcome in the sample space.

Let R be the set of real numbers. In other words, a random variable X is a function that satisfies X(ω) ∈ R for each ω ∈ Ω.
Random variables are usually denoted by capital letters, e.g., X , Y , Z .

IE5004 Discrete Random Variables 3 / 47


Notation

For a random variable X , when we write {X ∈ B} for a set of numbers B, it means
{X ∈ B} = {ω ∈ Ω : X (ω) ∈ B}
Therefore,
P[X ∈ B] = P[ω ∈ Ω : X (ω) ∈ B]

IE5004 Discrete Random Variables 4 / 47


Questions

In the previous example, if the die is fair,


What is the set {X = 1}? {X = 1} = {1, 2, 3} and P[X = 1] = 0.5.
What is the set {X = −1}? {X = −1} = {4, 5, 6} and
P[X = −1] = 0.5.
What is the set {X = 0}? {X = 0} = ∅ and P[X = 0] = 0.
What is the set {X ≤ 4}? {X ≤ 4} = Ω and P[X ≤ 4] = 1.

IE5004 Discrete Random Variables 5 / 47


Example

Let X be the random variable that is defined as the sum of two fair dice.
What is the set {X = 2}? {X = 2} = {(1, 1)} and

1
P[X = 2] = P[{(1, 1)}] =
36
What is the set {X = 5}? {X = 5} = {(1, 4), (2, 3), (3, 2), (4, 1)} and

4
P[X = 5] = P[{(1, 4), (2, 3), (3, 2), (4, 1)}] =
36

IE5004 Discrete Random Variables 6 / 47


An example of driving test

Example
Suppose that I am taking a driving test to obtain my driver’s license. If I
fail in the test, I will take the test again and again until I pass it. Once I
pass it, I can have the license. Let us consider the results of my tests until
I get the license.

The result of each test can be pass or fail, so the sample space is

Ω = {P, FP, FFP, FFFP, . . .}

Let X be the number of tests I take until I get the license. Is X a random variable? Yes.
What is X ?

X (P) = 1, X (FP) = 2, X (FFP) = 3, . . .

IE5004 Discrete Random Variables 7 / 47


An example of theory test

Example
To obtain my driver’s license, I also need to pass a theory test. The test
has 25 questions. I need to answer at least 20 questions correctly to pass
the exam. Suppose that for each question, I can give the right answer with
probability p and that the correctness of different answers is independent. Let
Y be the number of correct answers.

Is Y a random variable? Yes.


What is the probability that I answer n questions correctly?
 
P[Y = n] = C(25, n) p^n (1 − p)^(25−n)
What is the probability that I pass the theory test?
P[Y ≥ 20] = Σ_{n=20}^{25} C(25, n) p^n (1 − p)^(25−n)

IE5004 Discrete Random Variables 8 / 47


An example of a cow farm

Example
Suppose that I am a farmer and have a cow farm. Each day the cows
produce some milk and I sell milk at two dollars per liter. On each day, the
amount of milk produced is uniformly distributed between 4500 and 5000 liters. Let us
consider the amount of milk produced tomorrow.

What is the sample space?

Ω = [4500, 5000]

Let X be tomorrow’s revenue of the farm. What is it?

X (ω) = 2ω dollars for any ω ∈ Ω

IE5004 Discrete Random Variables 9 / 47


Discrete random variables

Definition
If a random variable can only take countably many values, it is a discrete
random variable.

A set contains “countably many” elements if either the number of elements is finite, or the number of elements is infinite but all these elements can be listed as the first element, the second element, and so on.
Examples of countable sets: finite sets, the set of positive integers,
the set of integers, the set of rational numbers.

IE5004 Discrete Random Variables 10 / 47


Continuous random variables

Definition
If a random variable X can take uncountably many values and
P[X = a] = 0 for each a ∈ R, it is a continuous random variable.

Examples of uncountable sets: R, the set of positive real numbers, any interval of the real line.

IE5004 Discrete Random Variables 11 / 47


Probability distributions for discrete random variables

Probabilities assigned to the outcomes determine the probabilities associated with the values of a random variable.
In the gambling example,

P[X = 1] = 0.5 and P[X = −1] = 0.5

In the driving test example, if the probability I pass a test is p,

P[X = k] = P[The first k − 1 tests all fail and the kth test passes]
         = (1 − p)^(k−1) p

In the theory test example,


 
P[Y = n] = C(25, n) p^n (1 − p)^(25−n)

IE5004 Discrete Random Variables 12 / 47


Probability mass function

Definition
The probability mass function of a discrete random variable X is defined
for x ∈ R by
p(x) = P[X = x]

p(x) ≥ 0 for each x ∈ R


Σ_{all possible x} p(x) = 1

IE5004 Discrete Random Variables 13 / 47


A probability mass function
In the driving test example,
Figure: the probability mass function plotted for k = 1, . . . , 15 with p = 0.2.

p(k) = P[X = k] = (1 − p)^(k−1) p


IE5004 Discrete Random Variables 14 / 47
A probability mass function
In the theory test example,
Figure: the probability mass function plotted for n = 0, . . . , 25 with p = 0.8.

p(n) = P[Y = n] = C(25, n) p^n (1 − p)^(25−n)
IE5004 Discrete Random Variables 15 / 47
Cumulative distribution function

The distribution of a random variable can be described by its cumulative distribution function.
Definition
The cumulative distribution function of a random variable X is defined for
each x ∈ R by
F (x) = P[X ≤ x]

If X is a discrete random variable, then


F(x) = P[X ≤ x] = Σ_{y ≤ x} p(y)

The probability mass function and the cumulative distribution function can be determined from each other.

IE5004 Discrete Random Variables 16 / 47


probability mass function & cumulative distribution function

Exercise
Suppose the probability mass function of X is given by

x 1 2 4 8 16
p(x) 0.05 0.10 0.35 0.40 0.10

What is the cumulative distribution function F (x) = P[X ≤ x]?

F (1) = P[X ≤ 1] = p(1) = 0.05


F (2) = P[X ≤ 2] = p(1) + p(2) = 0.15
F (4) = P[X ≤ 4] = p(1) + p(2) + p(4) = 0.5
F (8) = P[X ≤ 8] = p(1) + p(2) + p(4) + p(8) = 0.9
F (16) = P[X ≤ 16] = p(1) + p(2) + p(4) + p(8) + p(16) = 1
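
As a quick check, the same computation can be sketched in Python (the pmf values come from the table above; the function name cdf is just for illustration):

# pmf of X taken from the table above
pmf = {1: 0.05, 2: 0.10, 4: 0.35, 8: 0.40, 16: 0.10}

def cdf(x):
    # F(x) = P[X <= x] is the sum of p(y) over all possible values y <= x
    return sum(p for y, p in pmf.items() if y <= x)

for x in [1, 2, 4, 8, 16]:
    print(x, cdf(x))   # 0.05, 0.15, 0.5, 0.9, 1 (up to floating-point rounding)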

IE5004 Discrete Random Variables 17 / 47


probability mass function & cumulative distribution function

x 1 2 4 8 16
p(x) 0.05 0.10 0.35 0.40 0.10



F(x) = 0      for x < 1
       0.05   for 1 ≤ x < 2
       0.15   for 2 ≤ x < 4
       0.5    for 4 ≤ x < 8
       0.9    for 8 ≤ x < 16
       1      for x ≥ 16

IE5004 Discrete Random Variables 18 / 47


Expected value

Given the probability mass function of a discrete random variable, we can compute its average value. This average value is called the expected value.
Definition
Let X be a discrete random variable with set of possible values D and
probability mass function p(x). The expected value of X is
E[X] = Σ_{x∈D} x · p(x)

Expected value is also called expectation or mean or the first moment.


Example: In the gambling example, if the die is fair,

E[X ] = 1 · p(1) + (−1) · p(−1) = 1 · 0.5 + (−1) · 0.5 = 0.
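
A minimal Python sketch of this definition, using the gambling example’s pmf:

# E[X] = sum over all possible x of x * p(x)
pmf = {1: 0.5, -1: 0.5}
expected_value = sum(x * p for x, p in pmf.items())
print(expected_value)   # 0.0, matching E[X] above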

IE5004 Discrete Random Variables 19 / 47


An intuitive explanation

Suppose we repeat independent trials of an experiment n times. Among the n trials, we get X = x for m(x) times. Then, the sample mean is

X̄ = (Σ_{x∈D} x · m(x)) / n = Σ_{x∈D} x · (m(x)/n) ≈ Σ_{x∈D} x · p(x) = E[X]
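
This intuition can be illustrated with a small simulation, sketched here in Python for the gambling example (the trial counts are arbitrary):

import random

# One trial: roll a fair die; Tom gains 1 dollar on 1-3 and loses 1 dollar on 4-6
def one_gain():
    return 1 if random.randint(1, 6) <= 3 else -1

# The sample mean of n independent trials approaches E[X] = 0 as n grows
for n in [100, 10_000, 1_000_000]:
    print(n, sum(one_gain() for _ in range(n)) / n)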

IE5004 Discrete Random Variables 20 / 47


Indicator variables

We say that I is an indicator variable for an event A if

I = 1 if A occurs, and I = 0 if A^c occurs

Then, I is a discrete random variable with

p(1) = P[A] and p(0) = 1 − P[A]

Hence,
E[I ] = 1 · P[A] + 0 · (1 − P[A]) = P[A]

IE5004 Discrete Random Variables 21 / 47


Properties of expected values

The following properties apply to any random variables. Let X, Y be two random variables and a, b ∈ R:
E[a] = a, where a is viewed as a random variable with p(a) = 1
E[X + b] = E[X ] + b
E[X + Y ] = E[X ] + E[Y ]
E[a · X ] = a · E[X ]

Exercise
Prove the fourth property.

IE5004 Discrete Random Variables 22 / 47


Sample mean and expected value

A sample mean is the arithmetic mean of a sample set. An expected value is the mean weighted by the probability distribution.
A sample mean is random. Different sample sets may yield different
sample means. However, the expected value is a deterministic number
(not random).
The sample mean can be used to estimate the expected value.

IE5004 Discrete Random Variables 23 / 47


Strong Law of large numbers (SLLN)

Theorem
Let X1 , X2 , . . . be an infinite sequence of independent, identically
distributed (iid) random variables with expected value

E[X1 ] = E[X2 ] = · · · = µ

Let
X̄n = (X1 + X2 + · · · + Xn) / n
be the sample mean of the first n random variables. Then,

X̄n → µ as n → ∞.

The SLLN applies to any random variable that has a finite expected
value (i.e., µ ≠ ±∞).

IE5004 Discrete Random Variables 24 / 47


SLLN & the long-run relative frequency

Let I1 , I2 , . . . be a sequence of iid indicator variables for an event A, i.e.,


In = 1 if A occurs, and In = 0 if A^c occurs

Then,
E[I1 ] = E[I2 ] = · · · = P[A]
Note that
Īn = (I1 + I2 + · · · + In) / n
is the relative frequency that A occurs for n independent trials. The SLLN
implies that
I¯n → P[A] as n → ∞
In other words, P[A] is the long-run relative frequency that A occurs.
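
As an illustration, the following Python sketch estimates P[A] by the relative frequency of the event A = {the sum of two fair dice equals 5}:

import random

# Each trial produces an indicator: 1 if A occurs, 0 otherwise.
# The average of the indicators is the relative frequency of A.
def relative_frequency(n):
    hits = sum(1 for _ in range(n)
               if random.randint(1, 6) + random.randint(1, 6) == 5)
    return hits / n

print(relative_frequency(1_000_000))   # close to P[A] = 4/36 ≈ 0.111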

IE5004 Discrete Random Variables 25 / 47


Expected value of a function

Proposition
Let X be a discrete random variable with set of possible values D and probability mass function p(x). Then the expected value of any function h(X) is given by

E[h(X)] = Σ_{x∈D} h(x) p(x)

IE5004 Discrete Random Variables 26 / 47


Variance and standard deviation

Definition
Let µ = E[X] be the expected value of X and h(x) = (x − µ)^2. Then

V(X) = E[h(X)] = E[(X − µ)^2] = Σ_{x∈D} (x − µ)^2 p(x)

is called the variance of X and

σ = √V(X)

is called the standard deviation of X.

IE5004 Discrete Random Variables 27 / 47


Variance and variability

Variance is used to measure the variability in the distribution of a random variable.

Figure: two probability mass functions over the same range of x; V (X) = 0.6 in the first plot while V (X) = 11.85 in the second one.

IE5004 Discrete Random Variables 28 / 47


Properties of variance

The following properties apply to any random variables X and Y and any a, b ∈ R.

V(X + b) = V(X)
V(aX) = a^2 V(X)
A shortcut formula:

V(X) = E[(X − µ)^2] = E[X^2] − µ^2 = E[X^2] − (E[X])^2

If X and Y are two independent random variables, then

V(X + Y) = V(X) + V(Y)

Exercise
Prove the shortcut formula when X is a discrete random variable.
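
Before proving it, the shortcut formula can be checked numerically; here is a sketch in Python using the pmf table from the earlier exercise:

# pmf table from the cumulative distribution exercise
pmf = {1: 0.05, 2: 0.10, 4: 0.35, 8: 0.40, 16: 0.10}
mean = sum(x * p for x, p in pmf.items())
var_by_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())
var_by_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2
print(var_by_definition, var_by_shortcut)   # the two values agree (up to rounding)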

IE5004 Discrete Random Variables 29 / 47


Bernoulli distribution

A Bernoulli trial is an experiment whose outcome is random but binary,


either “success” or “failure”. With probability p ∈ (0, 1), the outcome is a
success and with probability 1 − p, the outcome is a failure. Let X be the
random variable such that

X (S) = 1 and X (F ) = 0

Then, X is a Bernoulli random variable with probability mass function

p(1) = P[X = 1] = p and p(0) = P[X = 0] = 1 − p

Definition
Any random variable whose only possible values are 0 or 1 is called a
Bernoulli random variable.

IE5004 Discrete Random Variables 30 / 47


Properties of the Bernoulli random variable

Expected value:

E[X ] = 1 · p + 0 · (1 − p) = p

Variance:

V(X) = E[(X − p)^2]
     = (1 − p)^2 · p + (0 − p)^2 · (1 − p)
     = p(1 − p)

IE5004 Discrete Random Variables 31 / 47


Geometric distribution

Suppose that we repeat the Bernoulli trial many times and these trials are
independent. Let X be the number of trials until we get the first success.
Then, X is a geometric random variable.
probability mass function:

p(k) = P[X = k] = (1 − p)^(k−1) p, k = 1, 2, . . .

expected value:

E[X] = Σ_{k=1}^{∞} k · (1 − p)^(k−1) p = 1/p

variance:

V(X) = (1 − p) / p^2
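
A short Python sketch of these quantities for an illustrative value p = 0.2 (the infinite sums are truncated, which is harmless here because the tail is negligible):

p = 0.2

def pmf(k):
    # geometric pmf: p(k) = (1 - p)^(k - 1) * p
    return (1 - p) ** (k - 1) * p

mean = sum(k * pmf(k) for k in range(1, 10_000))
second_moment = sum(k ** 2 * pmf(k) for k in range(1, 10_000))
print(mean)                        # ≈ 1/p = 5
print(second_moment - mean ** 2)   # ≈ (1 - p)/p^2 = 20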

IE5004 Discrete Random Variables 32 / 47


An example of driving test

Example
I am taking a driving test to obtain my driver’s license. If I fail, I will take
the test again and again until I pass it. Suppose that the results of different
tests are independent and I pass each test with probability 0 < p < 1.

Let X be the number of tests I take until I pass.


X follows a geometric distribution.
p(k) = P[X = k] = (1 − p)^(k−1) p, k = 1, 2, . . .

IE5004 Discrete Random Variables 33 / 47


Memorylessness
Consider the probability that you need more than n trials until a success

P[X > n] = Σ_{k=n+1}^{∞} (1 − p)^(k−1) p = (1 − p)^n p / (1 − (1 − p)) = (1 − p)^n

Question
Suppose you have repeated m independent trials, all of which failed. What is the
probability that you need more than n extra trials before getting a success?

P[X > m + n | X > m] = P[X > m + n, X > m] / P[X > m] = P[X > m + n] / P[X > m]
                     = (1 − p)^(m+n) / (1 − p)^m = (1 − p)^n
                     = P[X > n]
The above property is called memorylessness.
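
A numerical check of memorylessness (a sketch; the values of p, m, and n are arbitrary):

p, m, n = 0.2, 3, 5

def tail(j):
    # P[X > j] = (1 - p)^j for a geometric random variable
    return (1 - p) ** j

print(tail(m + n) / tail(m))   # P[X > m + n | X > m]
print(tail(n))                 # P[X > n]; both equal (1 - p)^5 = 0.32768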
IE5004 Discrete Random Variables 34 / 47
Binomial distribution
Let X1 , . . . , Xn be iid Bernoulli random variables. Then,

Y = X1 + · · · + Xn

is called a binomial random variable.


Y is the number of successes among n independent Bernoulli trials.
probability mass function:

p(k) = P[Y = k] = C(n, k) p^k (1 − p)^(n−k)

expected value: Recall that E[X1 ] = p. Then,

E[Y ] = E[X1 + · · · + Xn ] = E[X1 ] + · · · + E[Xn ] = np

variance: Since X1 , . . . , Xn are iid,

V (Y ) = V (X1 + · · · + Xn ) = V (X1 ) + · · · + V (Xn ) = np(1 − p)
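
A minimal Python sketch of the binomial pmf and its mean, for illustrative values n = 10 and p = 0.3:

from math import comb

n, p = 10, 0.3

def pmf(k):
    # binomial pmf: C(n, k) * p^k * (1 - p)^(n - k)
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

print(sum(pmf(k) for k in range(n + 1)))       # 1.0
print(sum(k * pmf(k) for k in range(n + 1)))   # E[Y] = n*p = 3.0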

IE5004 Discrete Random Variables 35 / 47


An example of theory test

Example
To obtain my driver’s license, I need to pass a theory test. The test has 25
questions. Suppose that for each question, I can give the right answer with
probability p and that the correctness of different answers is independent of one
another. Let Y be the number of correct answers.

Y follows a binomial distribution.


The probability that I answer n questions correctly is
 
P[Y = n] = C(25, n) p^n (1 − p)^(25−n)
If I need to be correct in at least 20 questions, then the probability
that I pass the theory test is
P[Y ≥ 20] = Σ_{n=20}^{25} C(25, n) p^n (1 − p)^(25−n)
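
For a concrete number, the passing probability can be evaluated directly; a Python sketch with an assumed value p = 0.8 (chosen only for illustration):

from math import comb

p = 0.8   # assumed success probability per question (illustrative)
prob_pass = sum(comb(25, n) * p ** n * (1 - p) ** (25 - n)
                for n in range(20, 26))
print(prob_pass)   # ≈ 0.617 when p = 0.8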

IE5004 Discrete Random Variables 36 / 47


An example

Exercise
A production process yields a defective rate of 10%. For a sampling plan
of 10 units, determine the distribution of the number of defective units.

We have p = 0.1 and n = 10. Then,


 
P[Y = k] = C(10, k) × 0.1^k × 0.9^(10−k)

IE5004 Discrete Random Variables 37 / 47


Example

Exercise
The color of one’s eyes is determined by a single pair of genes, with the
gene for brown eyes being dominant over the one for blue eyes. This
means that an individual having two blue-eyed genes will have blue eyes,
while one having either two brown-eyed genes or one brown-eyed and one
blue-eyed gene will have brown eyes. When two people mate, the resulting
offspring receives one randomly chosen gene from each of its parents’ gene
pair. If the eldest child of a pair of brown-eyed parents has blue eyes, what
is the probability that exactly two of the four other children (none of
whom is a twin) of this couple also have blue eyes?

IE5004 Discrete Random Variables 38 / 47


Solution

Because the eldest child has blue eyes, both parents must have one
blue-eyed and one brown-eyed gene.
The probability that an offspring has blue eyes is 0.5 × 0.5 = 0.25.
Because each of the other four children will have blue eyes with
probability 0.25, the probability that exactly two of them have blue
eyes is

C(4, 2) × 0.25^2 × 0.75^2 = 27/128
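
The same number can be obtained exactly with Python’s fractions module (a sketch):

from fractions import Fraction
from math import comb

prob = comb(4, 2) * Fraction(1, 4) ** 2 * Fraction(3, 4) ** 2
print(prob)   # 27/128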

IE5004 Discrete Random Variables 39 / 47


Binomial probability mass function

Figure: binomial probability mass functions for p = 0.1, p = 0.5, and p = 0.9, each with n = 10 and n = 20.

The location and shape of a binomial distribution are affected by both the sample size n and the probability of success p.
IE5004 Discrete Random Variables 40 / 47
An approximation

Suppose that in the Bernoulli trial, p is very small, i.e., 0 < p ≪ 1. We take a large number of trials n so that λ = np is a “moderate” number. In this case, for k = 0, 1, . . .

P[Y = k] = C(n, k) p^k (1 − p)^(n−k) = [n! / (k!(n − k)!)] · (λ^k / n^k) · (1 − λ/n)^(n−k)
         = [n(n − 1) · · · (n − k + 1) / n^k] · (λ^k / k!) · (1 − λ/n)^n · (1 − λ/n)^(−k)
         → (λ^k / k!) e^(−λ) as n → ∞,

because the first factor → 1, (1 − λ/n)^n → e^(−λ), and (1 − λ/n)^(−k) → 1.
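
The quality of this approximation can be seen numerically; a Python sketch with illustrative values n = 1000 and p = 0.003 (so λ = 3):

from math import comb, exp, factorial

n, p = 1000, 0.003
lam = n * p   # λ = np = 3

for k in range(6):
    binomial = comb(n, k) * p ** k * (1 - p) ** (n - k)
    poisson = lam ** k / factorial(k) * exp(-lam)
    print(k, round(binomial, 5), round(poisson, 5))   # the two columns are close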

IE5004 Discrete Random Variables 41 / 47


Example

Exercise
The probability that a traffic light fails in a day is 1/3650. Suppose that
there are 10,000 lights installed in Singapore. What is the probability of no
failure within a day?

We have p = 1/3650 and n = 10^4. Then, λ = np = 10^4/3650 and

P[Y = k] = C(10000, k) p^k (1 − p)^(10000−k) ≈ (λ^k / k!) e^(−λ)

Hence, P[Y = 0] ≈ e^(−10^4/3650) = 0.0646
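
The final number is a single exponential evaluation; a one-line Python check (sketch):

from math import exp

lam = 10_000 / 3650
print(exp(-lam))   # ≈ 0.0646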

IE5004 Discrete Random Variables 42 / 47


Example

Exercise
A process yields a defect rate of four defects per million opportunities. For
one million opportunities inspected, determine the probability distribution
of the number of defects.
We have n = 10^6 and p = 4 × 10^(−6). Then, λ = 4 and

P[Y = k] = C(n, k) p^k (1 − p)^(n−k) ≈ (4^k / k!) e^(−4)

IE5004 Discrete Random Variables 43 / 47


Poisson distribution

Definition
A discrete random variable X is called a Poisson random variable with
parameter λ > 0 if its probability mass function is given by

p(k) = P[X = k] = (λ^k / k!) e^(−λ) for k = 0, 1, . . .

Exercise
Verify that Σ_{k=0}^{∞} p(k) = 1.

expected value (hint: E[Y ] = np):

E[X ] = λ

variance (hint: V (Y ) = np(1 − p)): As n → ∞, p = λ/n → 0. Then,

V (X ) = λ
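
A Python sketch verifying the exercise and the two hints numerically for an illustrative λ = 2 (the infinite sums are truncated at k = 100, where the tail is negligible):

from math import exp, factorial

lam = 2.0

def pmf(k):
    # Poisson pmf: p(k) = λ^k / k! * e^(-λ)
    return lam ** k / factorial(k) * exp(-lam)

total = sum(pmf(k) for k in range(100))
mean = sum(k * pmf(k) for k in range(100))
second_moment = sum(k ** 2 * pmf(k) for k in range(100))
print(total)                       # ≈ 1
print(mean)                        # E[X] = λ = 2
print(second_moment - mean ** 2)   # V(X) = λ = 2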
IE5004 Discrete Random Variables 44 / 47
Poisson probability mass function

Figure: Poisson probability mass functions for λ = 0.4, λ = 2, and λ = 10.

The location and shape of a Poisson distribution are affected by its mean λ.


IE5004 Discrete Random Variables 45 / 47
Binomial vs Poisson

Figure: binomial probability mass functions compared with Poisson probability mass functions: n = 10, p = 0.2 and n = 100, p = 0.02 against λ = 2; n = 50, p = 0.2 and n = 500, p = 0.02 against λ = 10.

IE5004 Discrete Random Variables 46 / 47


Homework

Reading assignment
Read Chapters 4.1–4.7, 4.8.1
Study Chapter 4.9 on your own
Exercise problems:
4.21, 4.35, 4.38, 4.40, 4.49, 4.57, 4.63

IE5004 Discrete Random Variables 47 / 47
