
Lecture 5

Some useful discrete probability distributions

CH2010 Engineering Statistics PL2023v6 1


The Bernoulli Process
We take a sample of size one (a trial), which has only two possible outcomes, e.g.
Head or tail
Boy or girl
Pass or fail
• We repeat this many times.
• The probability of an outcome (head or tail) in each trial is constant, and
independent of the outcomes of the other trials.



Example 5.1
A product has a defective rate of 25%. If we sample three products, what is the
probability that exactly two of them are defective?
Answer: tabulate the sample space. Let X be the number of defectives (D = defective,
N = non-defective); we have

Outcome  NNN  NDN  NND  DNN  NDD  DND  DDN  DDD
X        0    1    1    1    2    2    2    3

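For example, if X = 2, three outcomes qualify (NDD, DND, DDN), each with probability (1/4)(1/4)(3/4) = 3/64:

\[ P(X = 2) = 3 \times \left(\tfrac{1}{4}\right)^{2} \left(\tfrac{3}{4}\right) = \tfrac{9}{64} \]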



Example 5.1 (cont.)

Applying this to all possible values of x (viz. 0, 1, 2, 3), we obtain the probability
distribution function of X.
x      0      1      2      3
f(x)   27/64  27/64  9/64   1/64

This is a binomial distribution. The individual trials are Bernoulli experiments/trials.
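
The tabulation in Example 5.1 can also be done by brute force. The following is a minimal sketch, assuming independent trials with a defective probability of 1/4; the variable names are illustrative only. It enumerates all eight outcomes and accumulates the probability of each value of X using exact fractions.

```python
from fractions import Fraction
from itertools import product

p_defective = Fraction(1, 4)  # defective rate from Example 5.1

# Accumulate P(X = x) over all 2^3 outcomes, where X counts the D's.
distribution = {x: Fraction(0) for x in range(4)}
for outcome in product("ND", repeat=3):
    prob = Fraction(1)
    for item in outcome:
        prob *= p_defective if item == "D" else 1 - p_defective
    distribution[outcome.count("D")] += prob

for x, fx in distribution.items():
    print(x, fx)
```

The printed fractions should match the f(x) row of the table above.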
5.1 Binomial distribution
• Discrete probability distribution
• We take n Bernoulli trials (independent, binary results)
• Probability of a positive outcome in each trial is p.
• What is the probability of x positive outcomes?

• Written as: b(x; n, p)

• Take Example 5.1: n = 3 and p = 1/4, so the probability of two defectives is b(2; 3, 1/4) = 9/64.



5.1 Binomial distribution
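To generalise,

\[ b(x; n, p) = \binom{n}{x} p^{x} q^{n-x}, \qquad x = 0, 1, 2, \ldots, n, \qquad q = 1 - p \]

In the case of Example 5.1,

\[ b\left(2; 3, \tfrac{1}{4}\right) = \binom{3}{2} \left(\tfrac{1}{4}\right)^{2} \left(\tfrac{3}{4}\right)^{1} = \tfrac{9}{64} \]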

This is much faster than “counting sample points”.
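
As a minimal sketch, the binomial formula can be written as a short Python function. The function name `binomial_pmf` is an illustrative choice, not part of the lecture notation.

```python
from math import comb


def binomial_pmf(x: int, n: int, p: float) -> float:
    """b(x; n, p): probability of exactly x successes in n Bernoulli trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Example 5.1: two defectives in a sample of three, p = 0.25.
print(binomial_pmf(2, 3, 0.25))  # 9/64 = 0.140625
```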



5.1 Binomial distribution
The name comes from the fact that the b(x;n,p) values correspond to the terms in the
binomial expansion of (q + p)^n, where q = 1 − p.

The cumulative distribution can be computed as:

\[ B(r; n, p) = \sum_{x=0}^{r} b(x; n, p) \]

Computing B(r;n,p) manually may be tedious. The values of B(r;n,p) are given in
Table A.1.
The mean and variance of the binomial distribution b(x;n,p) are:

\[ \mu = np \]

and

\[ \sigma^{2} = npq \]
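
The mean and variance of a binomial distribution can be checked numerically by summing over its probability function. This is a sketch only; the values n = 15 and p = 0.4 are chosen for illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 15, 0.4  # illustrative values

# Mean and variance computed directly from their definitions.
mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
variance = sum((x - mean) ** 2 * binomial_pmf(x, n, p) for x in range(n + 1))

print(mean, n * p)                # both close to 6
print(variance, n * p * (1 - p))  # both close to 3.6
```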
Applications of binomial distribution
• Engineering
• Quality control of product lines – pass or fail
• Mechanical strength test – yield or not yield
• Pharmaceutical
• Effectiveness of treatment – cure or no cure.
• Military
• Hit or miss (of guided missile)

• What other applications can you think of?



Example 5.2
The probability that a patient recovers from a rare blood disease is 0.4. If 15 people
are known to have contracted this disease, what is the probability that (a) at least 10
recover, (b) from 3 to 8 recover, and (c) exactly 5 recover?
Answer
X = the random variable representing the number of people who recover.
n = 15, p = 0.4.
(a) Probability that at least 10 recover = P(X ≥ 10).



Example 5.2 (cont.)
(a) Probability that at least 10 recover = P(X ≥ 10).

Look up Table A.1:

\[ P(X \ge 10) = 1 - P(X < 10) = 1 - B(9; 15, 0.4) = 1 - 0.9662 = 0.0338 \]



Example 5.2 (cont.)
(b) Probability that from 3 to 8 recover = P(3 ≤ X ≤ 8).

Look up Table A.1:

\[ P(3 \le X \le 8) = B(8; 15, 0.4) - B(2; 15, 0.4) = 0.9050 - 0.0271 = 0.8779 \]



Example 5.2 (cont.)
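(c) Probability that exactly 5 recover = P(X = 5).

Look up Table A.1:

\[ P(X = 5) = B(5; 15, 0.4) - B(4; 15, 0.4) = 0.4032 - 0.2173 = 0.1859 \]

Alternatively,

\[ P(X = 5) = b(5; 15, 0.4) = \binom{15}{5} (0.4)^{5} (0.6)^{10} = 0.1859 \]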
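
The three parts of Example 5.2 can also be evaluated directly instead of using the table. This is a minimal sketch; the helper names are illustrative. Because table entries are rounded to four decimal places, a direct evaluation may differ from table arithmetic in the last digit.

```python
from math import comb


def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def binomial_cdf(r, n, p):
    """B(r; n, p): sum of b(x; n, p) for x = 0..r."""
    return sum(binomial_pmf(x, n, p) for x in range(r + 1))


n, p = 15, 0.4
p_at_least_10 = 1 - binomial_cdf(9, n, p)                 # part (a)
p_3_to_8 = binomial_cdf(8, n, p) - binomial_cdf(2, n, p)  # part (b)
p_exactly_5 = binomial_pmf(5, n, p)                       # part (c)

print(p_at_least_10, p_3_to_8, p_exactly_5)
```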



5.2 Hypergeometric distribution
Now, we are sampling from a finite pool of items.
Each item only has two possible outcomes: pass and fail.
We do not put the sampled item back into the pool, nor do we replenish the pool with
new replacements.
As we sample, the number of items left in the pool reduces.
Accordingly, the probability of “pass” in the next draw depends on all previous
outcomes.
This is a hypergeometric experiment.
An example hypergeometric experiment is drawing cards consecutively from a deck
of cards without putting the drawn cards back. Here the outcome is either a red card
or a black card.
Example 5.3
If we draw 5 cards from a deck of 52 playing cards (26 reds and 26 blacks), what is
the probability of observing 3 red cards?
Recall what we’ve learnt in week 2.
Probability of getting 3 reds from a draw of 5:

\[ P(\text{3 reds}) = \frac{\binom{26}{3} \binom{26}{2}}{\binom{52}{5}} \approx 0.3251 \]

where
• \binom{26}{3}: there are 26 reds in total; pick 3 out of 26 reds
• \binom{26}{2}: there are 26 blacks in total; pick 2 out of 26 blacks
• \binom{52}{5}: pick 5 out of a deck of 52
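
The counting argument of Example 5.3 can be reproduced with Python's built-in combination function. This is a sketch with illustrative variable names.

```python
from math import comb

ways_reds = comb(26, 3)    # pick 3 of the 26 red cards
ways_blacks = comb(26, 2)  # pick 2 of the 26 black cards
ways_total = comb(52, 5)   # pick any 5 of the 52 cards

p_three_reds = ways_reds * ways_blacks / ways_total
print(ways_reds, ways_blacks, ways_total)
print(round(p_three_reds, 4))
```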


5.2 Hypergeometric distribution
• Following Example 5.3, we can also compute the probability of getting 0, 1, 2, 4
or 5 reds.
• The probability distribution of getting x reds is a hypergeometric distribution.
• The probability distribution of getting x reds in a draw of 5 cards from a deck of
52 cards containing 26 reds can be denoted h(x; 52, 5, 26).
• To generalise, the probability distribution of getting x successes in a sample of n
units from a pool of N units containing k successes is h(x; N, n, k).
• Hypergeometric distribution is relevant to the quality control of a batch-wise
production process.



Example 5.4
• An injection device is sold in packs of 10.
• It is not acceptable to have more than one defective in a pack.
• The quality control manager wants to confirm the acceptability of the pack by
testing 3 devices in every pack of 10.
• If all three are good, then the pack is deemed accepted.
• Is this a good quality control plan?
Answer
Let’s consider: if the lot is unacceptable, e.g. 2 out of 10 are defective, what’s the
probability that the sampling plan does not identify this?



Example 5.4 (cont.)
In other words, what is the probability of picking 0 defectives and 3 non-defectives
from a lot containing 2 defectives and 8 non-defectives? Let X be the random
variable describing the number of defectives sampled.

\[ P(X = 0) = h(0; 10, 3, 2) = \frac{\binom{2}{0} \binom{8}{3}}{\binom{10}{3}} = \frac{56}{120} \approx 0.467 \]

Thus, there is a 47% chance that the sampling plan detects no defective
item even if the pack has 2 defectives.
Therefore, this sampling plan is inadequate.
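
To see how the sampling plan of Example 5.4 behaves for other lots, the acceptance probability can be tabulated against the number of defectives in the pack. This is a minimal sketch; the function name and the range of defectives tried are illustrative assumptions.

```python
from math import comb


def hypergeom_pmf(x, N, n, k):
    """h(x; N, n, k): x successes in a sample of n from N items with k successes."""
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


# Probability that a sample of 3 from a pack of 10 contains no defectives,
# i.e. that the pack is accepted, for 0 to 4 defectives in the pack.
for defectives in range(5):
    print(defectives, round(hypergeom_pmf(0, 10, 3, defectives), 3))
```

With 2 defectives the acceptance probability is 56/120, the 47% found above.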



5.2 Hypergeometric distribution
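Generalising the problem in Examples 5.3 and 5.4 above, we have:

\[ h(x; N, n, k) = \frac{\binom{k}{x} \binom{N-k}{n-x}}{\binom{N}{n}}, \qquad \max\{0,\, n - (N - k)\} \le x \le \min\{n,\, k\} \]

The mean and variance of h(x; N, n, k) are:

\[ \mu = \frac{nk}{N}, \qquad \sigma^{2} = \frac{N-n}{N-1} \cdot n \cdot \frac{k}{N} \left(1 - \frac{k}{N}\right) \]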



Hypergeometric vs binomial
• In a binomial (Bernoulli) experiment, the probability of a success is constant and
independent of previous results.
• In a hypergeometric experiment, the probability of a success depends on the no.
of successes in previous draws.
• In a hypergeometric experiment, if the pool (N) is sufficiently large compared with
the sample size (n), then the probability of a success may be approximated as constant and
independent of previous results.
• In the above case the hypergeometric distribution can be approximated by a
binomial distribution.
• This approximation is good for n/N ≤ 0.05 (if the population size is at least 20
times larger than the sample size).
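
The quality of the binomial approximation can be explored numerically. The sketch below compares h(x; N, n, k) with b(x; n, k/N) for a fixed sample size and increasingly large pools; the pool sizes and the 25% success fraction are hypothetical values chosen for illustration.

```python
from math import comb


def hypergeom_pmf(x, N, n, k):
    return comb(k, x) * comb(N - k, n - x) / comb(N, n)


def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def max_abs_difference(N, n, k):
    """Largest gap between h(x; N, n, k) and b(x; n, k/N) over x = 0..n."""
    p = k / N
    return max(
        abs(hypergeom_pmf(x, N, n, k) - binomial_pmf(x, n, p))
        for x in range(n + 1)
    )


# Hypothetical pools with 25% successes; the sample size is fixed at n = 5.
for N in (20, 100, 1000):
    print(N, 5 / N, max_abs_difference(N, 5, N // 4))
```

The gap shrinks as n/N falls, which is the behaviour the n/N ≤ 0.05 rule of thumb relies on.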



Hypergeometric vs binomial
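If the approximation holds, k/N → p, which is independent of n.

\[ \mu = \frac{nk}{N} \to np \]

Same as the binomial case.

\[ \sigma^{2} = \frac{N-n}{N-1} \cdot n \cdot \frac{k}{N} \left(1 - \frac{k}{N}\right) \to npq \]

Same as the binomial case, since (N − n)/(N − 1) → 1 when N is much larger than n.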



5.3 Negative binomial distribution
When we perform Bernoulli trials, we are interested in the no. of successes in
n trials.
In a negative binomial experiment, we are interested in the no. of Bernoulli trials
required to reach k successes.
The negative binomial distribution, b*(x; k, p), describes the probability of
requiring a total of x trials to reach k successes, when the probability of
success in each trial is p.
This can be computed by first considering the probability of having (k − 1)
successes in (x − 1) trials, then multiplying by p.



Example 5.5
• A drug is considered 60% effective in bringing some degree of relief to a patient.
• During a clinical trial, what is the probability that the 7th patient tested is the 5th
who experiences relief?
Answer
To rephrase the situation above, it takes exactly 7 trials to have 5 successes.
To rephrase again, it takes 6 trials to have 4 successes (regardless of order), plus the
7th trial exactly being a success.



Example 5.5 (cont.)
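It takes 6 trials to have 4 successes (and two failures, regardless of order), and the
7th is a success.

\[ b^{*}(7; 5, 0.6) = \binom{6}{4} \times (0.6)^{4} (0.4)^{2} \times 0.6 = 0.1866 \]

where
• \binom{6}{4}: number of ways to have 4 successes and 2 failures in 6 trials
• (0.6)^4 (0.4)^2: probability of 4 successes and 2 failures in a specific sequence
• 0.6: probability that the 7th trial is a success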
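
The same reasoning can be written as a short function. This is a minimal sketch, with `negative_binomial_pmf` as an illustrative name; it counts the arrangements of the first x − 1 trials and multiplies by the probabilities of k successes and x − k failures.

```python
from math import comb


def negative_binomial_pmf(x, k, p):
    """b*(x; k, p): probability that the k-th success occurs on trial x."""
    return comb(x - 1, k - 1) * p**k * (1 - p) ** (x - k)


# Example 5.5: the 7th patient is the 5th to experience relief, p = 0.6.
print(round(negative_binomial_pmf(7, 5, 0.6), 4))
```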



5.3 Negative binomial distribution
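To generalise, the probability that the kth success is reached in the xth trial, where each
trial has a success rate of p, can be described by the negative binomial distribution:

\[ b^{*}(x; k, p) = \binom{x-1}{k-1} p^{k} q^{x-k}, \qquad x = k, k+1, k+2, \ldots \]

where
• \binom{x-1}{k-1}: no. of different ways that the first x − 1 trials had k − 1 successes
• p^k: probability of k successes
• q^{x−k}: probability of x − k failures
The mean and variance of the negative binomial distribution can be calculated
by:

\[ \mu = \frac{k}{p}, \qquad \sigma^{2} = \frac{k(1-p)}{p^{2}} \]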



5.4 Geometric distribution
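A special case of the negative binomial distribution is when k = 1,
i.e. what is the probability that the first success occurs in the xth trial?
In this case, the negative binomial distribution

\[ b^{*}(x; 1, p) = \binom{x-1}{0} p\, q^{x-1} = p\, q^{x-1} \]

can be described by the geometric distribution.

\[ g(x; p) = p\, q^{x-1}, \qquad x = 1, 2, 3, \ldots \]

With mean and variance:

\[ \mu = \frac{1}{p}, \qquad \sigma^{2} = \frac{1-p}{p^{2}} \]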



Application of geometric distribution
• The question “how long does it take to reach a success?” is very important when
costing exploratory work.
• Drilling in a new oil field
• Ignition test of a new engine design
• Discovery of the first effective drug formulation
• Finding a job
• Understanding the underlying distribution function helps us to evaluate and
optimise the effectiveness of the work plan.
• Otherwise, one might invest too much time and resources!
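
As a rough planning aid, the geometric distribution gives both the expected number of attempts and the chance of succeeding within a fixed budget of attempts. The sketch below assumes a hypothetical success probability of 0.1 per attempt and a budget of 10 attempts; both numbers are illustrative and not taken from the lecture.

```python
def geometric_pmf(x, p):
    """g(x; p): probability that the first success occurs on trial x."""
    return p * (1 - p) ** (x - 1)


def prob_success_within(budget, p):
    """P(first success on or before trial `budget`) = 1 - q^budget."""
    return 1 - (1 - p) ** budget


p = 0.1      # hypothetical success probability per exploratory attempt
budget = 10  # hypothetical number of attempts we can afford

expected_trials = 1 / p
print(expected_trials)
print(prob_success_within(budget, p))
```

Under these assumed numbers, budgeting for the expected number of attempts still leaves a sizeable chance of no success at all, which is why the full distribution matters and not only its mean.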



5.5 Poisson distribution
If the random variable X is the number of outcomes over a given time interval or a
specific region, then the process yielding X is a Poisson process.
Examples:
• No. of phone calls received in 60 minutes.
• No. of bacteria found in a culture.
• No. of typos on a page of an essay.
• No. of questions answered in one exam (is this really a random process?)



5.5 Poisson distribution
• The probability that positive outcomes will occur in a Poisson process is
proportional to the size of the test.
• This is reflected in the mean number (i.e. the mathematical expectation) of
outcomes in a Poisson process:

\[ \mu = \lambda t \]

where t represents the “time”, “distance”, “area” or “volume” of interest and λ is the
coefficient of proportionality.
• The variance of the Poisson random variable is also λt.



5.5 Poisson distribution
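• The probability distribution of the Poisson random variable X, taking the value x, is:

\[ p(x; \lambda t) = \frac{e^{-\lambda t} (\lambda t)^{x}}{x!}, \qquad x = 0, 1, 2, \ldots \]

where λ represents the average number of outcomes per unit time, distance, area, or
volume. This is also known as the Poisson distribution.
Table A.2 tabulates the cumulative Poisson distributions:

\[ P(r; \lambda t) = \sum_{x=0}^{r} p(x; \lambda t) \]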



Example 5.6
During a laboratory experiment, the average number of radioactive particles passing
through a detector in 1 ms is 4. What is the probability that 6 particles enter the
counter in a given millisecond?
Solution
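Average = 4: λt = 4.

\[ p(6; 4) = \frac{e^{-4}\, 4^{6}}{6!} = 0.1042 \]

Alternatively, use Table A.2 to estimate the probability that 6 particles enter:

\[ p(6; 4) = \sum_{x=0}^{6} p(x; 4) - \sum_{x=0}^{5} p(x; 4) = 0.8893 - 0.7851 = 0.1042 \]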
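
Both routes in Example 5.6 can be checked with a few lines of Python. This is a sketch with illustrative function names: one evaluates the Poisson probability function directly, the other takes the difference of two cumulative sums, as one would with the table.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """p(x; mu): probability of x outcomes when the mean is mu = lambda * t."""
    return exp(-mu) * mu**x / factorial(x)


def poisson_cdf(r, mu):
    """Cumulative sum of p(x; mu) for x = 0..r."""
    return sum(poisson_pmf(x, mu) for x in range(r + 1))


direct = poisson_pmf(6, 4)
from_cumulative = poisson_cdf(6, 4) - poisson_cdf(5, 4)
print(round(direct, 4), round(from_cumulative, 4))
```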



Nature of the Poisson probability function

The distribution is almost symmetric when μ is as large as 5


Binomial distribution vs Poisson distribution
• In a binomial/Bernoulli experiment, if there are many failures and only a few
successes, i.e. large n and small p, then the sampling sequence can be
approximated as a continuum (of almost all failures), analogous to the concept of
continuous space or time in the Poisson distribution.
• Therefore, the Poisson distribution can be considered a limiting case of the binomial
distribution: as n → ∞ and p → 0 with μ = np held constant, b(x; n, p) → p(x; μ).
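
The limiting behaviour can be illustrated numerically by holding np fixed while n grows. The sketch below uses a hypothetical mean of 2 and looks at the probability of x = 3; these values are chosen only for illustration.

```python
from math import comb, exp, factorial


def binomial_pmf(x, n, p):
    return comb(n, x) * p**x * (1 - p) ** (n - x)


def poisson_pmf(x, mu):
    return exp(-mu) * mu**x / factorial(x)


mu, x = 2.0, 3  # hypothetical mean and outcome count
sizes = (10, 100, 1000, 10000)

# Gap between b(x; n, mu/n) and p(x; mu) as n grows with np = mu fixed.
gaps = [abs(binomial_pmf(x, n, mu / n) - poisson_pmf(x, mu)) for n in sizes]
for n, gap in zip(sizes, gaps):
    print(n, mu / n, gap)
```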



Example 5.8
In a glass blowing factory, defects (bubbles) occur occasionally, making
the piece undesirable.
On average, 1 in every 1000 items produced has one or more bubbles.
What is the probability that a random sample of 8000 will yield fewer than 7
items possessing bubbles?



Example 5.8 (cont.)
Solution
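This is essentially a binomial experiment with n = 8000 and p = 0.001. Since p is
very close to 0 and n is quite large, we can approximate with the Poisson
distribution using:

\[ \mu = np = 8000 \times 0.001 = 8 \]

Hence, if X represents the number of items with one or more bubbles, we have:

\[ P(X < 7) = \sum_{x=0}^{6} b(x; 8000, 0.001) \approx \sum_{x=0}^{6} p(x; 8) = 0.3134 \]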
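
The Poisson approximation in Example 5.8 can be compared with the exact binomial sum, which is tedious by hand but easy to compute. This is a minimal sketch with illustrative variable names.

```python
from math import comb, exp, factorial

n, p = 8000, 0.001
mu = n * p  # Poisson mean used for the approximation

# P(X < 7) = P(X <= 6) under each model.
poisson_approx = sum(exp(-mu) * mu**x / factorial(x) for x in range(7))
binomial_exact = sum(comb(n, x) * p**x * (1 - p) ** (n - x) for x in range(7))

print(round(poisson_approx, 4))
print(binomial_exact)
```

The two values should agree closely, consistent with n being large and p small.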



Summary
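Binomial distribution (Table A.1)
No. of successes in n independent Bernoulli experiments
\[ b(x; n, p) = \binom{n}{x} p^{x} q^{n-x} \]
Mean: \mu = np        Variance: \sigma^{2} = npq

Hypergeometric distribution
No. of successes in n successive trials without sample replacement
\[ h(x; N, n, k) = \frac{\binom{k}{x} \binom{N-k}{n-x}}{\binom{N}{n}} \]
Mean: \mu = \frac{nk}{N}        Variance: \sigma^{2} = \frac{N-n}{N-1} \cdot n \cdot \frac{k}{N} \left(1 - \frac{k}{N}\right)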



Summary
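Negative binomial distribution
No. of Bernoulli trials required to reach k successes
\[ b^{*}(x; k, p) = \binom{x-1}{k-1} p^{k} q^{x-k} \]
Mean: \mu = \frac{k}{p}        Variance: \sigma^{2} = \frac{k(1-p)}{p^{2}}

Geometric distribution
No. of Bernoulli trials required to reach the first success
\[ g(x; p) = p\, q^{x-1} \]
Mean: \mu = \frac{1}{p}        Variance: \sigma^{2} = \frac{1-p}{p^{2}}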



Summary

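Poisson distribution (Table A.2)
Probability of success is proportional to the sample size
\[ p(x; \lambda t) = \frac{e^{-\lambda t} (\lambda t)^{x}}{x!} \]
Mean: \mu = \lambda t        Variance: \sigma^{2} = \lambda t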
