
STAT6201 Lecture Note 3 Fall 2021

3 Common Families of Distributions


3.1 Introduction
A family of distributions: a class of pmfs/pdfs indexed by one or more “parameters”.
• distributions have a common functional form but different parameter values, quantifying certain characteristics of the distribution
• the functional form is tractable, giving a parametric distribution family.

3.2 Discrete Distributions


3.2.1 Discrete Uniform Distribution
• X: equal chance of taking 1, 2, . . . , N. Here N is the parameter.
• pmf:

P(X = x | N) = 1/N, x = 1, 2, . . . , N.

Note: Here P(X = x | N) is NOT a conditional probability; the notation indicates that the probability depends on the parameter N.
• Moments: E(X) = (N + 1)/2 and Var(X) = (N + 1)(N − 1)/12.
• Generalization: X takes all integer values N_0, N_0 + 1, . . . , N_1. With two parameters (N_0, N_1), the pmf is

P(X = x | N_0, N_1) = 1/(N_1 − N_0 + 1), x = N_0, N_0 + 1, . . . , N_1.
Think: What are E(X) and Var(X)? (A numerical check appears below.)
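As a quick numerical check, here is a minimal sketch in Python with numpy (the code and the function name are illustrative, not part of the original notes) that computes the exact moments of the generalized discrete uniform directly from its pmf:

import numpy as np

def discrete_uniform_moments(n0, n1):
    """Exact mean and variance of the uniform distribution on {n0, ..., n1}."""
    x = np.arange(n0, n1 + 1)            # the support N0, N0+1, ..., N1
    p = np.full(x.size, 1.0 / x.size)    # pmf: 1/(N1 - N0 + 1) at every point
    mean = np.sum(x * p)
    var = np.sum((x - mean) ** 2 * p)
    return mean, var

# Standard case N0 = 1, N1 = N: should match (N + 1)/2 and (N + 1)(N - 1)/12.
N = 10
print(discrete_uniform_moments(1, N))       # (5.5, 8.25)
print((N + 1) / 2, (N + 1) * (N - 1) / 12)  # 5.5 8.25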

3.2.2 Hypergeometric Distribution


• N balls, M red, N − M green.
• Draw K balls at random without replacement.
• X = # red balls ∼ Hypergeometric(N, M, K).
• pmf:

P(X = x | M, N, K) = \frac{\binom{M}{x} \binom{N − M}{K − x}}{\binom{N}{K}}, x = 0, 1, . . . , min(M, K).
In many cases, K is small compared to both M and N − M, so the range is simply 0 ≤ x ≤ K.
• Moments:

E(X) = \frac{KM}{N}, Var(X) = \frac{KM}{N} · \frac{(N − M)(N − K)}{N(N − 1)}.
Exercise: Prove them using the following facts:

x \binom{M}{x} = M \binom{M − 1}{x − 1}, \binom{N}{K} = \frac{N}{K} \binom{N − 1}{K − 1}.
• Applications:
– Capture-Recapture Method: There are N (unknown) fish in a pond. To estimate N, we catch M fish, mark them, and return them to the pond. After a while, we catch K fish and find X = # marked fish. How can we estimate N?
– Acceptance Sampling: There are N = 25 machine parts in a lot. We sample K = 10 parts and let X = # of defective parts found in this sample. If the total number of defective parts is M = 6, what is the probability that none of the sampled parts are defective? Calculate P(X = 0) (a numerical check follows below).
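The acceptance-sampling question can be answered numerically. A minimal sketch, assuming Python with scipy is available (the notes themselves use no code); note that scipy's hypergeom takes its arguments in a different order than our (N, M, K) notation:

from scipy.stats import hypergeom

# Acceptance sampling: N = 25 parts in the lot, M = 6 defective, sample K = 10.
# scipy's parameterization is hypergeom(M=pop_size, n=#special, N=#draws).
N, M, K = 25, 6, 10
rv = hypergeom(N, M, K)

print(rv.pmf(0))   # P(X = 0) = C(19,10)/C(25,10), roughly 0.028
print(rv.mean())   # KM/N = 10*6/25 = 2.4
print(rv.var())    # (KM/N)(N-M)(N-K)/(N(N-1)) = 2.4*19*15/600 = 1.14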

3.2.3 Binomial Distribution


• Bernoulli trial: two possible outcomes, Success (S) or Failure (F).
• p = P(Success) in each trial; q = 1 − p.
• Experiment: n independent Bernoulli trials, each with the same p.
• X = # of Successes ∼ Binomial(n, p).
• pmf:

P(X = k | n, p) = \binom{n}{k} p^k (1 − p)^{n−k} = \binom{n}{k} p^k q^{n−k}, k = 0, 1, . . . , n.
Note that (x + y)^n = \sum_{i=0}^{n} \binom{n}{i} x^i y^{n−i}, so P(X = k | n, p) is the (k + 1)th term in the expansion of (p + q)^n.


• Moments: E(X) = np, Var(X) = np(1 − p). (Can you derive them?)

• MGF: M(t) = (1 − p + pe^t)^n. (Can you derive it?)
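A short sketch (scipy assumed, as above) that checks the moment formulas and evaluates the MGF at one point against the definition E(e^{tX}):

import numpy as np
from scipy.stats import binom

n, p = 12, 0.3
rv = binom(n, p)

print(rv.mean(), n * p)            # both 3.6
print(rv.var(), n * p * (1 - p))   # both 2.52

# Check M(t) = (1 - p + p e^t)^n against E(e^{tX}) computed from the pmf.
t = 0.5
k = np.arange(n + 1)
print(np.sum(np.exp(t * k) * rv.pmf(k)))   # MGF from the pmf
print((1 - p + p * np.exp(t)) ** n)        # closed form; equal up to rounding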

3.2.4 Poisson Distribution


• The Poisson distribution is most commonly used to model the number of random occurrences of some phenomenon in a specified unit of time or space, e.g., the number of customers arriving at a bank between 10am and 11am.
• X ∼ Poisson(λ) if
P(X = x | λ) = \frac{e^{−λ} λ^x}{x!}, x = 0, 1, 2, . . . .

Here λ is called the intensity parameter, which specifies the mean rate of occurrence per unit (of time, area, volume, etc.), i.e., λ = E(X).
Verify that \sum_{x=0}^{+∞} P(X = x | λ) = 1.

Example 3.2.1 (Textbook Example 3.2.4, pp.93)

• Moments: E(X) = λ, Var(X) =?


• MGF: M_X(t) = e^{λ(e^t − 1)}.
• Poisson approximation to the binomial distribution (see the sketch below):
Let X ∼ Binomial(n, p). If n → ∞, p → 0, and np → λ (i.e., n large, p small, but np moderate), then Bin(n, p) → Poisson(λ).
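A minimal numerical illustration of this approximation (scipy assumed; the particular n and p are arbitrary choices):

import numpy as np
from scipy.stats import binom, poisson

n, p = 1000, 0.003          # n large, p small, np = 3 moderate
k = np.arange(10)

# The largest pointwise gap between the two pmfs is below 1e-3 here.
print(np.abs(binom.pmf(k, n, p) - poisson.pmf(k, n * p)).max())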

Example 3.2.2 (Textbook Example 2.3.13, pp.66)

Example 3.2.3 (Textbook Example 3.2.5, pp.94)

3.2.5 Geometric Distribution


• i.i.d. Bernoulli trials: P (Success) = p. Denote q = 1 − p.
• pmf: Let X = number of trials needed to get the first Success.

P(X = x | p) = p q^{x−1}, x = 1, 2, . . . .

• Moments: E(X) = 1/p, Var(X) = (1 − p)/p^2.


pet
• MGF: MX (t) = 1−(1−p)et .


• Memoryless Property:

P(X > a + b | X > a) = P(X > b) = (1 − p)^b.




Fun Fact: The geometric distribution is the only discrete distribution supported on {1, 2, . . .} with the memoryless property.
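A small sketch checking the memoryless property numerically (scipy assumed; scipy's geom uses the same "number of trials" convention as X above):

from scipy.stats import geom

p, a, b = 0.2, 5, 3
rv = geom(p)                    # support 1, 2, ..., i.e., trials to first Success

# P(X > a + b | X > a) = P(X > a + b)/P(X > a) should equal P(X > b) = q^b.
lhs = rv.sf(a + b) / rv.sf(a)   # sf(x) = P(X > x)
print(lhs, rv.sf(b), (1 - p) ** b)   # all three are approximately 0.512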

Example 3.2.4 (Textbook Example 3.2.7, pp.98)

3.2.6 Negative Binomial Distribution


• i.i.d. trials, with Success or Failure in each trial. P (Success) = p. Denote q = 1 − p.

• Let X denote the trial # on which the rth success occurs.


• pmf: X ∼ nb(r, p) if

P(X = x | r, p) = \binom{x − 1}{r − 1} p^r q^{x−r}, x = r, r + 1, . . . .

The geometric distribution is a special case of the negative binomial with r = 1.

• Moments:

E(X) = \frac{r}{p}, Var(X) = \frac{r(1 − p)}{p^2}.
• Alternative definition: Let Y = # failures before the rth Success, i.e., Y = X − r.

P(Y = y | r, p) = \binom{r + y − 1}{r − 1} p^r q^y = \binom{r + y − 1}{y} p^r q^y, y = 0, 1, . . . .

We can obtain
E(Y) = \frac{r(1 − p)}{p}, Var(Y) = \frac{r(1 − p)}{p^2}.
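As an aside not in the original notes: scipy's nbinom follows exactly this "number of failures" convention for Y, so the moment formulas can be checked directly; a minimal sketch:

from scipy.stats import nbinom

r, p = 4, 0.3
rv = nbinom(r, p)   # scipy's nbinom models Y = # failures before the rth Success

print(rv.mean(), r * (1 - p) / p)       # both approximately 9.3333
print(rv.var(), r * (1 - p) / p ** 2)   # both approximately 31.1111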
• Why is it called “negative binomial”? Newton’s Generalized Binomial Theorem: for any a ∈ R,

(x + z)^a = \sum_{y=0}^{∞} \binom{a}{y} x^y z^{a−y},

where the generalized binomial coefficient is defined as

\binom{a}{y} = \frac{a(a − 1) · · · (a − y + 1)}{y!}.

Letting a = −r, we have

\binom{−r}{y} = \frac{(−r)(−r − 1) · · · (−r − y + 1)}{y!} = (−1)^y \frac{(r + y − 1) · · · r}{y!} = (−1)^y \binom{r + y − 1}{y}.

Thus

P(Y = y | r, p) = \binom{r + y − 1}{y} p^r q^y = \binom{−r}{y} p^r (−q)^y.

Also

\sum_{y=0}^{∞} P(Y = y | r, p) = \sum_{y=0}^{∞} \binom{−r}{y} p^r (−q)^y = p^r \sum_{y=0}^{∞} \binom{−r}{y} (−q)^y 1^{−r−y} = p^r (1 − q)^{−r} = 1.


• MGF:

M(t) = E(e^{tY}) = \sum_{y=0}^{∞} e^{ty} \binom{−r}{y} p^r (−q)^y = p^r (1 − qe^t)^{−r}.

• Connection with geometric distributions: If X_1, . . . , X_r are i.i.d. Geometric(p), then

X = X_1 + · · · + X_r ∼ nb(r, p).

How to understand this relationship intuitively?
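Intuitively, the numbers of trials between consecutive Successes (the waiting time for the first Success, then from the first to the second, and so on) are i.i.d. Geometric(p), and the trial of the rth Success is their sum. A quick Monte Carlo sketch under that interpretation (numpy/scipy assumed; the seed and sample size are arbitrary choices):

import numpy as np
from scipy.stats import geom

rng = np.random.default_rng(0)
r, p, n_sim = 4, 0.3, 100_000

# Simulate X = X_1 + ... + X_r with X_i i.i.d. Geometric(p) (trial counts).
x = geom.rvs(p, size=(n_sim, r), random_state=rng).sum(axis=1)

print(x.mean(), r / p)                   # both approximately 13.33
print(x.var(), r * (1 - p) / p ** 2)     # both approximately 31.11
print(x.min())                           # at least r, matching nb(r, p) support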

Acknowledgement
The lecture notes of this course are based on the textbook and Prof. Huixia Judy Wang’s lecture slides. The
instructor thanks Prof. Wang for kindly sharing them.

The End.
