
Discrete Distributions

Chapter 6

Negative Binomial Distribution


section 6.3

Consider independent Bernoulli trials with probability of success in one trial being p. Let the random variable X be the trial number k (k = r, r + 1, ...) needed to have the r-th success. Equivalently, in the first k − 1 trials there are r − 1 successes (no matter when they have occurred) and the k-th trial must be a success. Since the trials are independent, the required probability is found by multiplying the two sub-probabilities:

$$\binom{k-1}{r-1}\,p^{r-1}(1-p)^{k-r}\times p \;=\; \binom{k-1}{r-1}\,p^{r}(1-p)^{k-r}$$

where here we are assuming that k = r, r + 1, ... . Putting n = k − r, or equivalently k = n + r, we can write the above equality in terms of n:

$$P(Y = n) = \binom{n+r-1}{r-1}\,p^{r}(1-p)^{n}, \qquad n = 0, 1, 2, \ldots \quad\text{where } Y = X - r$$

But we know from "Combinatorics" that $\binom{n+r-1}{r-1} = \binom{n+r-1}{n}$, therefore we have

$$P(Y = n) = \binom{n+r-1}{n}\,p^{r}(1-p)^{n}, \qquad n = 0, 1, 2, \ldots$$
Finally, by a change of variable $p = \frac{1}{1+\beta}$, we can write:

$$P(Y = n) = \binom{n+r-1}{n}\left(\frac{1}{1+\beta}\right)^{r}\left(\frac{\beta}{1+\beta}\right)^{n}, \qquad n = 0, 1, 2, \ldots$$

Recall that
$$\binom{n+r-1}{n} = \frac{(n+r-1)(n+r-2)\cdots(r)}{n!}$$
therefore:
$$P(Y = n) = \frac{(n+r-1)(n+r-2)\cdots(r)}{n!}\left(\frac{1}{1+\beta}\right)^{r}\left(\frac{\beta}{1+\beta}\right)^{n}, \qquad n = 0, 1, 2, \ldots$$

In this new form, r can be taken to be any positive number (and not just a positive integer). So the negative binomial distribution has two positive parameters, β > 0 and r > 0. This distribution has an advantage over the Poisson distribution in modeling because it has one more parameter.
To be able to use the table of the textbook directly, let us change n to k throughout:

$$P(Y = k) = \frac{(k+r-1)(k+r-2)\cdots(r)}{k!}\left(\frac{1}{1+\beta}\right)^{r}\left(\frac{\beta}{1+\beta}\right)^{k}, \qquad k = 0, 1, 2, \ldots$$

$$P(Y = k) = \frac{\overbrace{(r)(r+1)\cdots(k+r-1)}^{k\ \text{terms}}}{k!}\;\frac{\beta^{k}}{(1+\beta)^{k+r}}, \qquad k = 0, 1, 2, \ldots$$
We need the values $\frac{(k+r-1)(k+r-2)\cdots(r)}{k!}$ to be able to calculate these probabilities. But note that

$$\Gamma(k+r) = (k+r-1)(k+r-2)\cdots(r)\,\Gamma(r) \quad\Longrightarrow\quad \frac{(k+r-1)(k+r-2)\cdots(r)}{k!} = \frac{\Gamma(k+r)}{\Gamma(k+1)\,\Gamma(r)}$$

and the values of the Gamma function (or the log-Gamma function) are provided in programming languages, so you can calculate $\frac{(k+r-1)(k+r-2)\cdots(r)}{k!}$ through the formula $\frac{\Gamma(k+r)}{\Gamma(k+1)\,\Gamma(r)}$.
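As a sketch of this computation (using Python's standard `math.lgamma`; the function name `nb_pmf` is just for illustration), the probabilities can be evaluated stably on the log scale:

```python
import math

def nb_pmf(k, r, beta):
    """P(Y = k) for the negative binomial with r > 0, beta > 0,
    using (k+r-1)...(r)/k! = Gamma(k+r) / (Gamma(k+1) * Gamma(r))."""
    log_coef = math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
    return math.exp(log_coef + r * math.log(1 / (1 + beta))
                    + k * math.log(beta / (1 + beta)))

# r need not be an integer; for integer r it matches the counting derivation:
print(nb_pmf(0, 2, 1))  # p = 1/(1+beta) = 1/2, so P(Y=0) = (1/2)^2 = 0.25
```

Working on the log scale avoids overflow of Γ(k + r) for large k.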

Here is the reason for choosing the name "Negative Binomial".

For a moment let us recall the binomial series we have learned in Calculus:

$$(1+z)^{\alpha} = 1 + \sum_{k=1}^{\infty}\frac{(\alpha)(\alpha-1)\cdots(\alpha-k+1)}{k!}\,z^{k}$$

which is valid for all real numbers α and all −1 < z < 1.
As an example:

$$\frac{1}{\sqrt[4]{1+z}} = (1+z)^{-\frac14} = 1 + \sum_{k=1}^{\infty}\frac{(-\frac14)(-\frac54)\cdots(-\frac14-k+1)}{k!}\,z^{k}, \qquad -1 < z < 1$$

Now change z to −z, and substitute −r for α to get:

$$\begin{aligned}
(1-z)^{-r} &= 1 + \sum_{k=1}^{\infty}\frac{(-r)(-r-1)\cdots(-r-k+1)}{k!}\,(-z)^{k} \\
&= 1 + \sum_{k=1}^{\infty}\frac{(r)(r+1)\cdots(r+k-1)}{k!}\,z^{k} && \text{dropping the } (-1)^{k}\text{'s} \\
&= 1 + \sum_{k=1}^{\infty}\frac{(r+k-1)(r+k-2)\cdots(r)}{k!}\,z^{k} && \text{rearranging} \\
&= \sum_{k=0}^{\infty}\binom{k+r-1}{r-1}\,z^{k}
\end{aligned}$$

The series expansion

$$(1-z)^{-r} = \frac{1}{(1-z)^{r}} = \sum_{k=0}^{\infty}\binom{k+r-1}{r-1}\,z^{k}$$

is called the negative binomial expansion.

Now we calculate the PGF of the negative binomial distribution.

$$P_N(z) = E(z^{N}) = \sum_{k=0}^{\infty} z^{k}\,P(N = k) = \sum_{k=0}^{\infty} z^{k}\binom{k+r-1}{r-1}\left(\frac{1}{1+\beta}\right)^{r}\left(\frac{\beta}{1+\beta}\right)^{k}$$

$$= \left(\frac{1}{1+\beta}\right)^{r}\sum_{k=0}^{\infty}\binom{k+r-1}{r-1}\left(\frac{z\beta}{1+\beta}\right)^{k} = \left(\frac{1}{1+\beta}\right)^{r}\left(1-\frac{z\beta}{1+\beta}\right)^{-r} = \left(\frac{1}{1+\beta}\right)^{r}\left(\frac{1+\beta(1-z)}{1+\beta}\right)^{-r}$$

$$= \left(\frac{1}{1+\beta}\right)^{r}\left(\frac{1+\beta}{1+\beta(1-z)}\right)^{r} = \frac{1}{\bigl(1+\beta(1-z)\bigr)^{r}} = \frac{1}{\bigl(1-\beta(z-1)\bigr)^{r}}$$
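As a quick numerical sanity check (a sketch with arbitrarily chosen parameter values, not part of the derivation), the series definition of the PGF can be compared against the closed form $(1-\beta(z-1))^{-r}$:

```python
r, beta, z = 2.5, 0.7, 0.4

# build p_k recursively via p_k / p_{k-1} = (beta/(1+beta)) * (k+r-1)/k
p = (1 + beta) ** (-r)          # p_0
series = p                      # partial sums of E(z^N) = sum z^k p_k
for k in range(1, 300):
    p *= beta / (1 + beta) * (k + r - 1) / k
    series += z ** k * p

closed = (1 - beta * (z - 1)) ** (-r)
print(abs(series - closed) < 1e-12)  # True
```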

Expected value and Variance

$$p(z) = \bigl(1+\beta(1-z)\bigr)^{-r}$$

$$p'(z) = r\beta\,\bigl(1+\beta(1-z)\bigr)^{-r-1}, \qquad p''(z) = r(r+1)\beta^{2}\,\bigl(1+\beta(1-z)\bigr)^{-r-2}$$

$$p'(1) = r\beta, \qquad p''(1) = r(r+1)\beta^{2}$$

$$E(N) = p'(1) = r\beta$$
$$\mathrm{Var}(N) = p''(1) + p'(1)\bigl(1-p'(1)\bigr) = \cdots = r\beta + r\beta^{2} = r\beta(1+\beta)$$
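These moment formulas can be verified numerically (a sketch; the parameter values r = 3, β = 0.8 are arbitrary):

```python
r, beta = 3.0, 0.8

# generate probabilities by the ratio p_k / p_{k-1} = (beta/(1+beta)) * (k+r-1)/k
pk = []
p = (1 + beta) ** (-r)
for k in range(600):
    pk.append(p)
    p *= beta / (1 + beta) * (k + r) / (k + 1)

mean = sum(k * q for k, q in enumerate(pk))
var = sum(k * k * q for k, q in enumerate(pk)) - mean ** 2
print(round(mean, 6), round(var, 6))  # r*beta = 2.4 and r*beta*(1+beta) = 4.32
```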

Note. As we see, in the negative binomial distribution the variance is larger than the expected value, while in the Poisson distribution they are equal. Therefore, in modeling data in which the sample variance appears to be larger than the sample mean, the negative binomial distribution is preferred over the Poisson distribution.

If $P_X(z)$ is the PGF of X, then

$$E(X) = P_X'(1)$$
$$\mathrm{Var}(X) = P_X''(1) + P_X'(1)\bigl(1 - P_X'(1)\bigr)$$

Theorem (Poisson as a limit of NBinomials). Let $X_n \sim \text{NBinomial}(r_n, \beta_n)$ be such that $r_n \to \infty$, $\beta_n \to 0$, and $r_n\beta_n \to \lambda > 0$. Then $X_n \xrightarrow{d} \text{Poisson}(\lambda)$.

Note. Before proving this theorem, we recall from Calculus that when x → 0, the functions ln(1 + x) and x are equivalent in the sense that $\lim_{x\to 0}\frac{\ln(1+x)}{x} = 1$. To see this equality, apply L'Hôpital's rule. Once we have this, then in quotients we can substitute x for ln(1 + x) whenever x → 0. In fact, if g(x) is any function of x, then

$$\lim_{x\to 0}\frac{\ln(1+x)}{g(x)} = \lim_{x\to 0}\left(\frac{\ln(1+x)}{x}\cdot\frac{x}{g(x)}\right) = \left(\lim_{x\to 0}\frac{\ln(1+x)}{x}\right)\left(\lim_{x\to 0}\frac{x}{g(x)}\right) = \lim_{x\to 0}\frac{x}{g(x)}$$

Proof of the Theorem. Set $\lambda_n = r_n\beta_n$. Then from the assumption we have $\lambda_n \to \lambda$. Further:

$$\begin{aligned}
\lim_{n\to\infty} P_{X_n}(z) &= \lim\,\bigl(1+\beta_n(1-z)\bigr)^{-r_n} \\
&= \lim\,\exp\bigl[-r_n \ln\bigl(1+\beta_n(1-z)\bigr)\bigr] \\
&= \exp\Bigl[\lim\Bigl\{-r_n \ln\bigl(1+\beta_n(1-z)\bigr)\Bigr\}\Bigr] \\
&= \exp\Bigl[\lim\Bigl\{-\frac{\ln\bigl(1+\beta_n(1-z)\bigr)}{\tfrac{1}{r_n}}\Bigr\}\Bigr] \\
&= \exp\Bigl[\lim\Bigl\{-\frac{\ln\bigl(1+\tfrac{\lambda_n(1-z)}{r_n}\bigr)}{\tfrac{1}{r_n}}\Bigr\}\Bigr] \\
&= \exp\Bigl[\lim\Bigl\{-\frac{\tfrac{\lambda_n(1-z)}{r_n}}{\tfrac{1}{r_n}}\Bigr\}\Bigr] \\
&= \exp\Bigl[\lim\bigl\{-\lambda_n(1-z)\bigr\}\Bigr] = \exp\bigl(\lambda(z-1)\bigr)
\end{aligned}$$

So we have proved that the PGF of the sequence of $X_n$'s tends to the PGF of the Poisson(λ) distribution. Whence the claim.
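The theorem can also be illustrated numerically: holding $r\beta = \lambda$ fixed while r grows, the negative binomial probabilities approach the Poisson(λ) ones (a sketch; the choice λ = 2 is arbitrary):

```python
import math

lam = 2.0

def nb_pmf(k, r, beta):
    log_p = (math.lgamma(k + r) - math.lgamma(k + 1) - math.lgamma(r)
             - r * math.log(1 + beta) + k * math.log(beta / (1 + beta)))
    return math.exp(log_p)

poisson = [math.exp(-lam) * lam ** k / math.factorial(k) for k in range(10)]
errors = []
for r in (10, 100, 1000):
    beta = lam / r   # so that r * beta = lam for every member of the sequence
    errors.append(max(abs(nb_pmf(k, r, beta) - poisson[k]) for k in range(10)))
print(errors)  # the maximal discrepancy shrinks as r grows
```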

Geometric Distribution

The Geometric random variable with parameter 0 < q < 1 is a variable with support {1, 2, ...} such that X = k is the event that, in a series of Bernoulli trials, the first success occurs at trial k. Since the first k − 1 trials must result in failures,

$$P(X = k) = q\,(1-q)^{k-1}, \qquad k = 1, 2, \ldots$$

Note that

$$P(X \ge m) = q(1-q)^{m-1} + q(1-q)^{m} + q(1-q)^{m+1} + \cdots$$
$$= q(1-q)^{m-1}\bigl\{1 + (1-q) + (1-q)^{2} + \cdots\bigr\} = q(1-q)^{m-1}\,\frac{1}{1-(1-q)} = (1-q)^{m-1}$$

Then for k = 1, 2, ... we have:

$$P(X \ge n+k \mid X \ge n) = \frac{P(X \ge n+k,\; X \ge n)}{P(X \ge n)} = \frac{P(X \ge n+k)}{P(X \ge n)} = \frac{(1-q)^{n+k-1}}{(1-q)^{n-1}} = (1-q)^{k}$$

This property is called the memoryless property. Given that there are at least n claims, the probability distribution of the number of claims in excess of n does not depend on n:

$$P(X = n+k \mid X \ge n) = P(X \ge n+k \mid X \ge n) - P(X \ge n+k+1 \mid X \ge n) = (1-q)^{k} - (1-q)^{k+1}$$

which does not depend on n. In general, a random variable X has a memoryless distribution if the conditional distribution of the excess X − x given X ≥ x is the same for all x.
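The memoryless property is easy to check directly (a sketch; q = 0.3 is an arbitrary choice):

```python
q = 0.3
surv = lambda m: (1 - q) ** (m - 1)   # P(X >= m) for the geometric

# P(X >= n+k | X >= n) should equal (1-q)**k, independent of n
for n in (1, 5, 20):
    for k in (1, 2, 7):
        cond = surv(n + k) / surv(n)
        assert abs(cond - (1 - q) ** k) < 1e-12
print("memoryless property holds")
```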
We may consider the Geometric distribution as a special case of the Negative Binomial. In fact, by changing k to k + 1 and substituting $\frac{1}{1+\beta}$ for q, we can write the probabilities in the new form:

$$\text{Geometric}(\beta): \qquad P(N = k) = \left(\frac{1}{1+\beta}\right)\left(\frac{\beta}{1+\beta}\right)^{k} = \frac{\beta^{k}}{(1+\beta)^{k+1}}, \qquad k = 0, 1, \ldots$$

So a Geometric distribution is a Negative Binomial distribution with r = 1. Note that in this new form we have

$$P(N = 0) = \frac{1}{1+\beta}$$

i.e. the value of the probability function at k = 0 equals the probability of success.
(a,b,0) class
section 6.5

Definition. Let $p_k = P(X = k)$, k = 0, 1, 2, ... be the probabilities of a discrete random variable. If there are two numbers a and b satisfying

$$\frac{p_k}{p_{k-1}} = a + \frac{b}{k} \qquad\Longleftrightarrow\qquad k\,\frac{p_k}{p_{k-1}} = ak + b, \qquad k = 1, 2, \ldots$$

then we say that X belongs to the class (a, b, 0). Note that the following four distributions are in the class (a, b, 0):

Distribution                  a             b           p0
Binomial(m, q)                −q/(1−q)      −(m+1)a     (1−q)^m
Negative Binomial(r, β)       β/(1+β)       (r−1)a      (1+β)^{−r}
Geometric(β)                  β/(1+β)       0           (1+β)^{−1}
Poisson(λ)                    0             λ           e^{−λ}

As has been shown in the literature, these four distributions are the only non-trivial distributions of the class (a, b, 0).

Example ∗. Consider a discrete random variable X of the class (a, b, 0). Assume that

P(X = 0) = 0.1, P(X = 1) = 0.3, and P(X = 2) = 0.3

Calculate P(X = 3).

Solution.
$$p_k = \left(a + \frac{b}{k}\right)p_{k-1} \qquad (\ast)$$
Putting k = 1 in (∗) we get:
$$p_1 = (a+b)\,p_0 \;\Rightarrow\; 0.3 = (a+b)(0.1) \;\Rightarrow\; a+b = 3$$
Putting k = 2 in (∗) we get:
$$p_2 = \left(a + \tfrac{b}{2}\right)p_1 \;\Rightarrow\; 0.3 = \left(a + \tfrac{b}{2}\right)(0.3) \;\Rightarrow\; a + \tfrac{b}{2} = 1$$
By solving the two equations thus found, we will have a = −1 and b = 4. Then the equality (∗) reduces to
$$p_k = \left(-1 + \frac{4}{k}\right)p_{k-1}$$
Put k = 3 to have:
$$p_3 = \left(-1 + \tfrac{4}{3}\right)p_2 \;\Rightarrow\; p_3 = \tfrac{1}{3}\,p_2 = \tfrac{1}{3}(0.3) = 0.1$$
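The arithmetic of this example can be scripted (a sketch reproducing the steps above):

```python
p0, p1, p2 = 0.1, 0.3, 0.3

# p1 = (a + b) p0 and p2 = (a + b/2) p1 give two linear equations in a, b
s1 = p1 / p0            # a + b     (= 3)
s2 = p2 / p1            # a + b/2   (= 1)
b = 2 * (s1 - s2)       # b = 4
a = s1 - b              # a = -1

p3 = (a + b / 3) * p2
print(round(a, 6), round(b, 6), round(p3, 6))  # -1.0 4.0 0.1
```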

Example ∗. The Independent Insurance Company insures 25 risks, each with a 4% probability of loss. The probabilities of loss are independent. On average, how often would 4 or more risks have losses in the same year?

A. Once in 13 years
B. Once in 17 years
C. Once in 39 years
D. Once in 60 years
E. Once in 72 years

Solution. The distribution of the number of losses is Binomial(m = 25, q = 0.04). Either find the probabilities through the elementary formula $p_k = \frac{m(m-1)\cdots(m-k+1)}{k!}\,q^{k}(1-q)^{m-k}$ or use the recursive formula:

a = −q/(1−q) = −0.0417
b = −(m+1)a = 1.0833
p0 = (1−q)^m = (0.96)^25 = 0.3604
p1 = (a + b)p0 = 0.3754
p2 = (a + b/2)p1 = 0.1877
p3 = (a + b/3)p2 = 0.0600

Desired probability = 1 − (p0 + p1 + p2 + p3) = 1 − 0.9835 = 0.0165

Four or more risks have losses in the same year on average once in 1/0.0165 = 60.6 years; answer D.
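The recursive computation in this solution can be sketched as follows (variable names are illustrative):

```python
m, q = 25, 0.04
a = -q / (1 - q)             # -0.0417
b = -(m + 1) * a             #  1.0833

p = (1 - q) ** m             # p0 = 0.96**25
total = p
for k in range(1, 4):        # accumulate p1, p2, p3 by the (a,b,0) recursion
    p *= a + b / k
    total += p

prob = 1 - total
print(round(prob, 4))        # 0.0165, i.e. about once in 60 years (answer D)
```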

Example ∗. The distribution of accidents for 84 randomly selected policies is as follows:

Number of accidents    Number of Policies
0                      32
1                      26
2                      12
3                      7
4                      4
5                      2
6                      1
Total                  84

Which of the following models best represents these data?

(A) Negative binomial
(B) Discrete uniform
(C) Poisson
(D) Binomial
(E) Either Poisson or Binomial

Solution. Compute $k\,\frac{p_k}{p_{k-1}}$ for each k:

Number of accidents    Number of Policies    k·p_k/p_{k−1}
0                      32                    —
1                      26                    0.81
2                      12                    0.92
3                      7                     1.75
4                      4                     2.29
5                      2                     2.50
6                      1                     3.00
Total                  84

Since the values $k\,\frac{p_k}{p_{k-1}} = ak + b$ increase (positive slope a), a negative binomial distribution is the best fit; answer (A).
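The ratio column can be reproduced directly from the counts (a sketch using the empirical frequencies as the $p_k$; the common denominator 84 cancels in the ratios):

```python
counts = [32, 26, 12, 7, 4, 2, 1]   # policies with k = 0..6 accidents

# k * p_k / p_{k-1} = a*k + b for an (a,b,0) member
ratios = [k * counts[k] / counts[k - 1] for k in range(1, 7)]
print([round(x, 2) for x in ratios])  # [0.81, 0.92, 1.75, 2.29, 2.5, 3.0]

# strictly increasing ratios point to a positive slope a, i.e. negative binomial
assert all(x < y for x, y in zip(ratios, ratios[1:]))
```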

Example ∗. A discrete probability distribution has the following properties:
(i) $p_k = c\,(1 + 1/k)\,p_{k-1}$ for k = 1, 2, ...
(ii) $p_0 = 0.5$.
Calculate c.

Solution. This distribution is of class (a, b, 0) with a = b = c. Since the probabilities are non-negative, we must have c ≥ 0. A value of zero for c would result in p1 = p2 = ··· = 0, which together with p0 = 0.5 would not make the probabilities sum to 1. So we have c > 0 and hence a > 0, which means the distribution is negative binomial. For the negative binomial we have $a = \frac{\beta}{1+\beta}$ and $b = \frac{(r-1)\beta}{1+\beta}$. Then

$$a = b \;\Rightarrow\; 1 = r - 1 \;\Rightarrow\; r = 2$$
$$p_0 = 0.5 \;\Rightarrow\; 0.5 = \frac{1}{(1+\beta)^{r}} = \frac{1}{(1+\beta)^{2}} \;\Rightarrow\; \beta = \sqrt{2} - 1 = 0.4142$$
$$c = a = \frac{\beta}{1+\beta} = \frac{0.4142}{1.4142} = 0.293$$

Example ∗. You are given the following:

• A portfolio consists of 10 independent risks.

• The distribution of the annual number of claims for each risk in the portfolio is given by a Poisson distribution with mean λ = 0.1.

Determine the probability of the portfolio having more than 1 claim per year.

A. 5% B. 10% C. 26% D. 37% E. 63%

Solution. Since the sum of independent Poisson r.v.'s is distributed as Poisson, this portfolio has a Poisson distribution with parameter λ = (10)(0.1) = 1. So the probability of more than one claim in one year is

$$1 - (p_0 + p_1) = 1 - \left(e^{-1} + e^{-1}\right) = 1 - 2e^{-1} = 0.264$$

Answer C.

Truncation and modification at zero
section 6.6

The discrete distributions in the class (a, b, 0) all have positive probability at zero: p(0) > 0. But if in a study the number of claims is recorded only for those losses that have resulted in a claim, then the minimum observed value is 1 and therefore p(0) = 0. We can create such a counting distribution from a class (a, b, 0) distribution by assigning the value zero to p(0) and dividing the remaining probabilities by 1 − p(0). Such a new distribution is called a zero-truncated distribution.

In some insurance situations the chance of no claim is high, so we may have counting data that needs a large value of p(0) to be properly modeled. Therefore we may modify the counting model by assigning a large value to p(0) and then rescaling the other values p(k) so as to have a probability distribution. This new distribution is called zero-modified.

Note that in both the zero-truncated and the zero-modified distribution, the old probabilities {p(1), p(2), ...} are all multiplied by a constant, therefore the ratio $\frac{p(k)}{p(k-1)}$ remains the same as before for k = 2, 3, ...:

$$\frac{p(k)}{p(k-1)} = a + \frac{b}{k}, \qquad k = 2, 3, \ldots$$

A distribution with this property is said to be of class (a, b, 1). So by modifying an (a, b, 0) class member we get an (a, b, 1) class member.

Definition. A discrete random variable X ∈ {0, 1, 2, ...} is said to belong to the class (a, b, 1) if its probability function satisfies the equality

$$p(k) = \left(a + \frac{b}{k}\right)p(k-1), \qquad k = 2, 3, \ldots$$

Note. The class (a, b, 1) satisfies the same recursive form as the class (a, b, 0) does, but in the case of (a, b, 1) the value p(0) is not part of the recursion.

Let us denote the probability function of the (a, b, 0) distribution by p(k) and that of the (a, b, 1) distribution by $p^M(k)$. By assigning any number γ ∈ [0, 1) to $p^M(0)$, we then define $p^M$ by

$$\text{(zero-modified distribution)} \qquad p^M(k) = \begin{cases} \gamma & k = 0 \\[4pt] \dfrac{1-\gamma}{1-p(0)}\,p(k) & k = 1, 2, \ldots \end{cases}$$

We here verify that this is indeed a probability function:

$$p^M(0) + \sum_{k=1}^{\infty} p^M(k) = \gamma + \sum_{k=1}^{\infty}\frac{1-\gamma}{1-p(0)}\,p(k) = \gamma + \frac{1-\gamma}{1-p(0)}\sum_{k=1}^{\infty}p(k)$$
$$= \gamma + \frac{1-\gamma}{1-p(0)}\,\bigl(1-p(0)\bigr) = \gamma + (1-\gamma) = 1 \;\checkmark$$

The special case of γ = 0:

$$\text{(zero-truncated distribution)} \qquad p^T(k) = \begin{cases} 0 & k = 0 \\[4pt] \dfrac{1}{1-p(0)}\,p(k) & k = 1, 2, \ldots \end{cases}$$

Example. Let X be distributed as Geometric(β = 9). Calculate

(i) the zero-truncated distribution;
(ii) the zero-modified distribution with $p^M(0) = 0.3$.

Solution.

Part (i):
$$p(0) = \frac{1}{1+\beta} = \frac{1}{10} \;\Rightarrow\; \frac{1}{1-p(0)} = \frac{1}{0.9} = \frac{10}{9}$$
$$p(k) = \frac{\beta^{k}}{(1+\beta)^{k+1}} = \frac{9^{k}}{10^{k+1}} \;\Rightarrow\; \frac{1}{1-p(0)}\,p(k) = \frac{10}{9}\cdot\frac{9^{k}}{10^{k+1}} = \frac{9^{k-1}}{10^{k}}$$
$$p^T(k) = \begin{cases} 0 & k = 0 \\[4pt] \dfrac{9^{k-1}}{10^{k}} & k = 1, 2, \ldots \end{cases} \qquad \text{(zero-truncated distribution)}$$

Part (ii):
$$\frac{1-\gamma}{1-p(0)} = \frac{0.7}{0.9} = \frac{7}{9}$$
$$\frac{1-\gamma}{1-p(0)}\,p(k) = \frac{7}{9}\cdot\frac{9^{k}}{10^{k+1}} = \frac{7\,(9^{k-1})}{10^{k+1}}$$
$$p^M(k) = \begin{cases} 0.3 & k = 0 \\[4pt] \dfrac{7\,(9^{k-1})}{10^{k+1}} & k = 1, 2, \ldots \end{cases} \qquad \text{(zero-modified distribution)}$$
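Both parts of this example can be verified against the closed forms found above (a sketch):

```python
beta = 9
p = lambda k: beta ** k / (1 + beta) ** (k + 1)   # Geometric(beta = 9)
p0 = p(0)                                         # 0.1

pT = lambda k: 0 if k == 0 else p(k) / (1 - p0)   # zero-truncated
gamma = 0.3
pM = lambda k: gamma if k == 0 else (1 - gamma) / (1 - p0) * p(k)

for k in range(1, 8):
    assert abs(pT(k) - 9 ** (k - 1) / 10 ** k) < 1e-12
    assert abs(pM(k) - 7 * 9 ** (k - 1) / 10 ** (k + 1)) < 1e-12
print("closed forms confirmed")
```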

In the following discussion, for the sake of simplicity, we denote the base probabilities by {p0, p1, ...} and the probabilities resulting from truncation or modification by {q0, q1, ...}.

Theorem. Let N be in the class (a, b, 0) with MGF M(t) and PGF Q(z). Then the MGF and PGF of the modified (or truncated) random variable are equal to

$$M^{*}(t) = \frac{\gamma - p_0}{1-p_0} + \frac{1-\gamma}{1-p_0}\,M(t) \qquad \text{(a weighted average)}$$

$$Q^{*}(z) = \frac{\gamma - p_0}{1-p_0} + \frac{1-\gamma}{1-p_0}\,Q(z) \qquad \text{(a weighted average)}$$
Proof. We only prove the first equality, as the proof of the second one is similar.

$$\begin{aligned}
M^{*}(t) &= \sum_{k=0}^{\infty} e^{kt}\,p^M(k) \\
&= \gamma + \frac{1-\gamma}{1-p_0}\sum_{k=1}^{\infty} e^{kt}\,p_k \\
&= \gamma + \frac{1-\gamma}{1-p_0}\Bigl\{\sum_{k=0}^{\infty} e^{kt}\,p_k - p_0\Bigr\} \\
&= \gamma + \frac{1-\gamma}{1-p_0}\bigl\{M(t) - p_0\bigr\} \\
&= \gamma - p_0\,\frac{1-\gamma}{1-p_0} + \frac{1-\gamma}{1-p_0}\,M(t) \\
&= \frac{\gamma-p_0}{1-p_0} + \frac{1-\gamma}{1-p_0}\,M(t)
\end{aligned}$$

Note. The constant function 1 is the MGF of the degenerate random variable which gives probability 1 to k = 0. So the above equalities show that the modified distribution is a mixture of this degenerate distribution and the base distribution.

Extended Truncated Negative Binomial and Logarithmic Distributions

Definition. By the extended truncated negative binomial distribution we mean the discrete distribution whose probability function is of the form:

$$p_k = \left(a + \frac{b}{k}\right)p_{k-1}, \qquad k = 2, 3, \ldots, \qquad p_0 = 0$$

where

$$a = \frac{\beta}{1+\beta}, \qquad b = \frac{(r-1)\beta}{1+\beta}, \qquad r \neq 0,\; r > -1,\; \beta > 0$$

Note. The requirements r > −1 and β > 0 are needed to ensure that this indeed defines a probability function, as the following lemmas show.

Lemma.
$$p_k = \left(\frac{\beta}{1+\beta}\right)^{k-1}\frac{(r+1)(r+2)\cdots(r+k-1)}{k!}\,p_1, \qquad k = 2, 3, \ldots$$

Proof. From the defining recursive formula, we have

$$p_k = \left(\frac{\beta}{1+\beta} + \frac{(r-1)\beta}{k(1+\beta)}\right)p_{k-1} = \frac{k\beta + (r-1)\beta}{k(1+\beta)}\,p_{k-1} = \frac{(r+k-1)\beta}{k(1+\beta)}\,p_{k-1} = \frac{\beta}{1+\beta}\cdot\frac{r+k-1}{k}\,p_{k-1}$$

Writing this for k = 2, 3, ... gives us:

$$p_2 = \frac{\beta}{1+\beta}\cdot\frac{r+1}{2}\,p_1$$
$$p_3 = \frac{\beta}{1+\beta}\cdot\frac{r+2}{3}\,p_2$$
$$\vdots$$
$$p_k = \frac{\beta}{1+\beta}\cdot\frac{r+k-1}{k}\,p_{k-1}$$

Multiplying these gives the result.

Lemma. For any choice of $0 < p_1 \le 1$ we have $\sum_{k=1}^{\infty} p_k < \infty$.

Proof. Equivalently, we must show that the series with positive terms

$$\sum_{k=1}^{\infty} p_k = p_1 + \sum_{k=2}^{\infty}\left(\frac{\beta}{1+\beta}\right)^{k-1}\frac{(r+1)(r+2)\cdots(r+k-1)}{k!}\,p_1$$

is convergent. We use the so-called ratio test to achieve this. Set

$$a_k = \left(\frac{\beta}{1+\beta}\right)^{k-1}\frac{(r+1)(r+2)\cdots(r+k-1)}{k!}$$

Then

$$\lim_{k\to\infty}\frac{a_{k+1}}{a_k} = \lim_{k\to\infty}\frac{\beta}{1+\beta}\cdot\frac{r+k}{k+1} = \frac{\beta}{1+\beta} < 1$$

so, according to the ratio test, the series is convergent.

Note. Here is a closed form for $p_1$:

$$\begin{aligned}
1 = \sum_{k=1}^{\infty} p_k &= p_1 + p_1\sum_{k=2}^{\infty}\left(\frac{\beta}{1+\beta}\right)^{k-1}\frac{(r+1)(r+2)\cdots(r+k-1)}{k!} \\
&= p_1 + \frac{p_1}{r}\left(\frac{1+\beta}{\beta}\right)\sum_{k=2}^{\infty}\left(\frac{-\beta}{1+\beta}\right)^{k}\frac{(-r)(-r-1)(-r-2)\cdots(-r-k+1)}{k!} \\
&= p_1 + \frac{p_1}{r}\left(\frac{1+\beta}{\beta}\right)\left\{\left[1 + \sum_{k=1}^{\infty}\left(\frac{-\beta}{1+\beta}\right)^{k}\frac{(-r)(-r-1)\cdots(-r-k+1)}{k!}\right] - 1 - \frac{r\beta}{1+\beta}\right\} \\
&= p_1 + \frac{p_1}{r}\left(\frac{1+\beta}{\beta}\right)\left\{\left(1 - \frac{\beta}{1+\beta}\right)^{-r} - 1 - \frac{r\beta}{1+\beta}\right\} \\
&= p_1 + \frac{p_1}{r}\left(\frac{1+\beta}{\beta}\right)\left\{(1+\beta)^{r} - 1 - \frac{r\beta}{1+\beta}\right\} \\
&= p_1 + \frac{p_1}{r}\left(\frac{1+\beta}{\beta}\right)\left\{\frac{(1+\beta)^{r+1} - 1 - \beta(1+r)}{1+\beta}\right\} \\
&= p_1 + \frac{p_1}{r\beta}\left\{(1+\beta)^{r+1} - 1 - \beta(1+r)\right\} \\
&= p_1\left\{1 + \frac{(1+\beta)^{r+1} - 1 - \beta(1+r)}{r\beta}\right\} \\
&= p_1\left\{\frac{(1+\beta)^{r+1} - 1 - \beta}{r\beta}\right\}
\end{aligned}$$

Therefore

$$p_1 = \frac{r\beta}{(1+\beta)^{r+1} - 1 - \beta}$$

and for this one needs the assumption r ≠ 0; otherwise the denominator would be zero.
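The closed form for $p_1$ can be checked by summing the recursion numerically; note that it is valid for −1 < r < 0 as well, where the ordinary negative binomial does not exist (a sketch; the parameter values are arbitrary):

```python
beta, r = 1.5, -0.4     # ETNB allows -1 < r < 0, as long as r != 0

p1 = r * beta / ((1 + beta) ** (r + 1) - 1 - beta)
total, p = p1, p1
for k in range(2, 2000):
    p *= beta / (1 + beta) * (r + k - 1) / k   # the defining recursion
    total += p
print(round(total, 8))  # 1.0
```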

Note. We recall from Calculus that

$$\frac{d}{dr}(1+\beta)^{r+1} = (1+\beta)^{r+1}\ln(1+\beta)$$

Then by applying L'Hôpital's rule we will have:

$$\lim_{r\to 0} p_1 = \lim_{r\to 0}\frac{r\beta}{(1+\beta)^{r+1} - 1 - \beta} = \lim_{r\to 0}\frac{\beta}{(1+\beta)^{r+1}\ln(1+\beta)} = \frac{\beta}{(1+\beta)\ln(1+\beta)}$$

So then for k ≥ 2 we have

$$\begin{aligned}
\lim_{r\to 0} p_k &= \lim_{r\to 0}\, p_1\left(\frac{\beta}{1+\beta}\right)^{k-1}\frac{(r+1)(r+2)\cdots(r+k-1)}{k!} \\
&= \frac{\beta}{(1+\beta)\ln(1+\beta)}\left(\frac{\beta}{1+\beta}\right)^{k-1}\frac{(1)(2)\cdots(k-1)}{k!} \\
&= \left(\frac{\beta}{1+\beta}\right)^{k}\frac{1}{\ln(1+\beta)}\,\frac{1}{k}
\end{aligned}$$

This and $\lim_{r\to 0} p_1 = \frac{\beta}{(1+\beta)\ln(1+\beta)}$ can be put together:

$$\lim_{r\to 0} p_k = \left(\frac{\beta}{1+\beta}\right)^{k}\frac{1}{\ln(1+\beta)}\,\frac{1}{k}, \qquad k = 1, 2, \ldots$$
But the values on the right-hand side are positive numbers that sum to 1:

$$\sum_{k=1}^{\infty}\left(\frac{\beta}{1+\beta}\right)^{k}\frac{1}{\ln(1+\beta)}\,\frac{1}{k} = \frac{1}{\ln(1+\beta)}\sum_{k=1}^{\infty}\left(\frac{\beta}{1+\beta}\right)^{k}\frac{1}{k} = \frac{1}{\ln(1+\beta)}\left(-\ln\Bigl(1-\frac{\beta}{1+\beta}\Bigr)\right)$$
$$= \frac{1}{\ln(1+\beta)}\left(-\ln\frac{1}{1+\beta}\right) = \frac{1}{\ln(1+\beta)}\,\ln(1+\beta) = 1$$
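This sum-to-1 computation can be checked numerically (a sketch; β = 2 is an arbitrary choice):

```python
import math

beta = 2.0
log_pmf = lambda k: (beta / (1 + beta)) ** k / (math.log(1 + beta) * k)

total = sum(log_pmf(k) for k in range(1, 500))
print(round(total, 8))  # 1.0
```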

Definition. The distribution with probability function

$$p(k) = \left(\frac{\beta}{1+\beta}\right)^{k}\frac{1}{\ln(1+\beta)}\,\frac{1}{k}, \qquad k = 1, 2, \ldots$$

is called the Logarithmic distribution.

Note. The class (a, b, 1) has 14 members. The list of members is given on page 94 of the textbook, and the distributional information about these distributions is given at the end of the textbook.
