Probability Random Process
Sample Space
# Suppose a box contains 100 items of a particular sort, say 100 capacitors, and each capacitor has
a unique production number running from 1101 to 1200. If an experiment consists of randomly
selecting a single capacitor from the box, then an appropriate sample space would be
S1 = {1101, 1102, ..., 1200}
It would also be appropriate to employ the sample space
S2 = {1100, 1101, 1102, ..., 1200}
One might argue that S2 is less suitable from a modeling perspective, since no physical observation
will correspond to the element 1100; nevertheless, S2 still satisfies the two necessary modeling
requirements. Insofar as probability is concerned, the probability of choosing 1100 will eventually
be set to zero. A set that cannot serve as a sample space is
S3 = {1101, 1102, ..., 1199}
since no element in S3 corresponds to the selection of the capacitor with production number 1200.
The elements of a sample space are called outcomes. An outcome is a logical entity and refers only to
the manner in which the phenomena are viewed by the experimenter. For instance, in the foregoing
example, if
S4 = {0.003 μF, 0.004 μF}
then only two outcomes are realized. While there might be all sorts of information available regarding
the chosen capacitor, once S4 has been chosen as the sample space, only the measured capacitance is
relevant, since only its observation will result in an outcome (relative to S4).
Events
In most probability problems, the investigator is interested not merely in the collection of outcomes
but in some subset of the sample space. A subset of a sample space is known as an event. Two
events that do not intersect are said to be mutually exclusive (disjoint). More generally, the events
E1, E2, ..., En are said to be mutually exclusive if
Ei ∩ Ej = ∅
for any i ≠ j, where ∅ denotes the empty set.
Probability (Modeling Random Processes for Engineers and Managers, James J. Solberg, John Wiley & Sons Inc., 2009)
When the "probability of an event" is spoken of in everyday language, almost everyone has a rough
idea of what is meant. It is fortunate that this is so, because it would be quite difficult to introduce the
concept to someone who had never considered it before. There are at least three distinct ways to
approach the subject, none of which is wholly satisfying.
The first to appear, historically, was the frequency concept. If an experiment were to be repeated many
times, then the number of times that the event was observed to occur, divided by the number of times that
the experiment was conducted, would approach a number that was defined to be the probability of the
event. The ratio of the number of chances for success to the total number of possibilities is the
concept with which most elementary treatments of probability start. This definition proved to be somewhat
limiting, however, because circumstances frequently prohibit repetition of an experiment under
precisely the same conditions, even conceptually. Imagine trying to determine the probability of global
annihilation from a meteor collision.
To extend the notion of probability to a wider class of applications, a second approach involving the idea
of "subjective" probabilities emerged. According to this idea, the probability of an event need not
relate to the frequency with which it would occur in an infinite number of trials; it is just a measure of
the degree of likelihood we believe the event to possess. This definition covers even the hypothetical
events, but seems a bit too loose for engineering applications. Different people could attach different
probabilities to the same event.
Most modern texts use the third concept, which relies upon axiomatic definition. According to this
notion, probabilities are just elements of an abstract mathematical system obeying certain axioms. This
notion is at once the most powerful and the most devoid of real world meaning. Of course, the axioms
are not purely arbitrary; they were selected to be consistent with the earlier concepts of probabilities
and to provide them with all of the properties everyone would agree they should have.
We will go with the formal axiomatic system, so that we can be rigorous in the mathematics. We want
to be able to calculate probabilities to assist in making good decisions. At the same time, we want to
bear in mind the real world interpretation of probabilities as measures of the likelihood of events in the
world. The whole point of learning the mathematics is to be able to use it in everyday life.
3. If E1, E2, ..., Ei, ... is any finite or infinite collection of mutually exclusive events, then
P(E1 ∪ E2 ∪ ...) = P(E1) + P(E2) + ...
Once S has been endowed with a probability measure, S is called a probability space.
Some of the additional basic laws of probability (which could be proved from the foregoing ) are:
4. P(∅) = 0, where ∅ is the empty set or impossible event.
5. P(Ā) = 1 − P(A). In other words, the probability that an event does not occur is 1 minus the
probability that it does occur.
6. P(A ∪ B) = P(A) + P(B) − P(A ∩ B), for any two events A and B. When the events are not
mutually exclusive (when there is some possibility for both A and B to occur), one has to subtract
off the probability that they both occur.
7. P(A|B) = P(A ∩ B)/P(B), provided P(B) ≠ 0. This "basic law" is, in reality, a definition of the
conditional probability of an event A, given that another event B has occurred.
8. P(A|B) = P(A) if and only if A and B are independent. This rule can be taken as the formal
definition of independence.
9. P(A ∩ B) = P(A)P(B) if and only if A and B are independent.
A set of events B1, B2, ..., Bn constitutes a partition of the sample space S if they are mutually
exclusive and collectively exhaustive, that is,
Bi ∩ Bj = ∅ for every pair i ≠ j
and
S = B1 ∪ B2 ∪ ... ∪ Bn
In simple terms, a partition is just any way of grouping and listing all possible outcomes such that no
outcome appears in more than one group. When the experiment is performed , one and only one of
the 𝑩𝒊 will occur.
10. P(A) = Σi P(A|Bi)P(Bi) for any partition Bi, i = 1, 2, ..., n. This is one of the most useful
relationships in modeling applications. It is one expression of the so-called law of total probability.
Counting
Given a finite sample space
S = {w1, w2, ..., wn}
of cardinality n, the hypothesis of equal probability is the assumption that
the physical conditions are such that each outcome in S possesses equal
probability:
P(w1) = P(w2) = ... = P(wn) = 1/n
In such a case, the probability space is said to be equiprobable.
# In deciding the format for a memory word in a new computer, the designer decides on a length of 16
bits. Since each bit can be 0 or 1, the problem of deciding on the number of possible words can be
modeled as making 16 selections from an urn containing 2 balls. Thus there are 2^16 = 65,536 possible
words.
Now suppose the 4 symbols are chosen uniformly at random with replacement. What is the probability
that a string will be formed in which no symbol is used more than once? Let E denote the event
consisting of all words with no symbol appearing more than once; then the desired probability is
P(E) = P(9, 4)/9^4 ≈ 0.461
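This computation can be checked directly; a minimal sketch (assuming, as in the example, a 9-symbol alphabet and strings of length 4) using Python's standard library:

```python
from math import perm

# Probability that a string of the given length, drawn uniformly with
# replacement from an alphabet of n_symbols, uses no symbol more than
# once: P(n_symbols, length) / n_symbols^length.
def prob_all_distinct(n_symbols: int, length: int) -> float:
    return perm(n_symbols, length) / n_symbols ** length

print(round(prob_all_distinct(9, 4), 3))  # 0.461
```

Here `math.perm(9, 4)` evaluates the falling factorial 9 · 8 · 7 · 6 = 3024, matching P(9, 4) above.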
Fig.1
Here 4 possible branches can be chosen for the first selection, 2 for the second,
and 2 for the third. As a result, the tree contains 4 × 2 × 2 = 16 final nodes.
It is crucial to note that at each of the three stages (selections) of the tree, the
number of branches emanating from the nodes is the same; otherwise, as
illustrated in Fig. 2, the multiplication technique of the fundamental principle
does not apply. The requirement that there be a constant number of emanating
branches at each stage corresponds to the condition in the selection protocol
that, at each component, the number of possible choices for the component is
fixed and does not depend on the particular objects chosen to fill the preceding
components.
Fig.2
Again consider the set A. It can readily be seen that there are two 2-tuple
permutations for each 2-element combination. Thus, each 2-element subset of
A yields 2! = 2 permutations. This reasoning results in
P(n, k) = k! C(n, k)
or
C(n, k) = P(n, k)/k! = n!/(k!(n − k)!)
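The identity P(n, k) = k! · C(n, k) can be verified numerically; a sketch using Python's built-in `math.perm` and `math.comb`:

```python
from math import comb, factorial, perm

# Verify P(n, k) = k! * C(n, k) for all small n and k.
for n in range(8):
    for k in range(n + 1):
        assert perm(n, k) == factorial(k) * comb(n, k)

# For a 4-element set: 12 ordered 2-tuples versus 6 two-element subsets,
# a factor of 2! = 2 as in the reasoning above.
print(perm(4, 2), comb(4, 2))  # 12 6
```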
DISCRETE RANDOM VARIABLES AND THEIR DISTRIBUTIONS
(Probability and Statistics for Computer Scientists, Michael Baron, Chapman & Hall/CRC, 2007.)
A random variable is a function of an outcome,
X = f(ω).
In other words, it is a quantity that depends on chance. The domain of the random
variable is the sample space. Its range can be the set of all real numbers R, or only
the positive numbers [0, +∞), or the integers Z, or the interval (0, 1), etc.,
depending on what possible values the random variable can potentially take.
# Consider an experiment of tossing 3 fair coins and counting the number of heads. Certainly, the same
model suits the number of girls in a family with 3 children, the number of 1’s in a random binary code
consisting of 3 characters, etc.
Let X be the number of heads (girls, 1's). Prior to an experiment, its value is not known. All we can
say is that X has to be an integer between 0 and 3. Since assuming a value is an event, we can compute
probabilities:
P(X = 0) = P(three tails) = P(TTT) = (1/2)(1/2)(1/2) = 1/8
P(X = 1) = P(HTT) + P(THT) + P(TTH) = 3/8
P(X = 2) = P(HHT) + P(HTH) + P(THH) = 3/8
P(X = 3) = P(HHH) = 1/8
Summarizing,
x       P{X = x}
0       1/8
1       3/8
2       3/8
3       1/8
Total   1
This table contains everything that is known about the random variable X prior to the experiment.
Before we know the outcome ω, we cannot tell what X equals. However, we can list all the
possible values of X and determine the corresponding probabilities.
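The table above can be reproduced by brute-force enumeration of the 2^3 equally likely outcomes; a minimal sketch:

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate all 8 outcomes of tossing 3 fair coins; X = number of heads.
outcomes = list(product("HT", repeat=3))
counts = Counter(seq.count("H") for seq in outcomes)
pmf = {x: Fraction(c, len(outcomes)) for x, c in sorted(counts.items())}

for x, p in pmf.items():
    print(x, p)  # rows of the table: 1/8, 3/8, 3/8, 1/8
```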
Definition
Recall that one way to compute the probability of an event is to add the probabilities of all the
outcomes in it. Hence, for any set A,
P(X ∈ A) = Σ_{x ∈ A} P(x).
When A is an interval, its probability can be computed directly from the cdf F(x):
P(a < X ≤ b) = F(b) − F(a).
# (Errors in independent modules). A program consists of two modules. The number of
errors X1 in the first module has the pmf P1(x), and the number of errors X2 in the second
module has the pmf P2(x), independently of X1, where

x       P1(x)   P2(x)
0       0.5     0.7
1       0.3     0.2
2       0.1     0.1
3       0.1     0

Find the pmf of Y = X1 + X2, the total number of errors in the program.
Sol.: We break the problem into steps. First, determine all possible values of Y, then compute
the probability of each value. Clearly, the number of errors Y is an integer that can be as low as
0 + 0 = 0 and as high as 3 + 2 = 5. Since P2(3) = 0, the second module has at most 2 errors. Next,
P_Y(0) = P(Y = 0) = P(X1 = 0, X2 = 0) = P1(0)P2(0) = 0.5 · 0.7 = 0.35
P_Y(1) = P(Y = 1) = P1(0)P2(1) + P1(1)P2(0) = 0.5 · 0.2 + 0.3 · 0.7 = 0.31
P_Y(2) = P(Y = 2) = P1(0)P2(2) + P1(1)P2(1) + P1(2)P2(0) = 0.5 · 0.1 + 0.3 · 0.2 + 0.1 · 0.7 = 0.18
P_Y(3) = P(Y = 3) = P1(0)P2(3) + P1(1)P2(2) + P1(2)P2(1) + P1(3)P2(0) = 0.5 · 0 + 0.3 · 0.1 + 0.1 · 0.2 + 0.1 · 0.7 = 0.12
P_Y(4) = P(Y = 4) = P1(2)P2(2) + P1(3)P2(1) = 0.1 · 0.1 + 0.1 · 0.2 = 0.03
P_Y(5) = P(Y = 5) = P1(3)P2(2) = 0.1 · 0.1 = 0.01
The cumulative distribution function F(y) can be computed similarly.
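The step-by-step computation above is a convolution of the two pmfs, and it generalizes to any pair of independent discrete variables; a minimal sketch:

```python
# pmf of Y = X1 + X2 for independent X1, X2, by convolving the two
# pmfs from the table.
p1 = {0: 0.5, 1: 0.3, 2: 0.1, 3: 0.1}
p2 = {0: 0.7, 1: 0.2, 2: 0.1}

def convolve(pa, pb):
    out = {}
    for x1, q1 in pa.items():
        for x2, q2 in pb.items():
            out[x1 + x2] = out.get(x1 + x2, 0.0) + q1 * q2
    return out

py = convolve(p1, p2)
print({y: round(q, 2) for y, q in sorted(py.items())})
# {0: 0.35, 1: 0.31, 2: 0.18, 3: 0.12, 4: 0.03, 5: 0.01}
```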
Families of Discrete Distributions
Bernoulli Distribution
The simplest random variable (excluding non-random ones!) takes just two
possible values. Call them 0 and 1. Let p denote the probability of success.
Bernoulli distribution:
P(x) = q = 1 − p   if x = 0
P(x) = p           if x = 1
E(X) = p;  Var(X) = pq
Sol.: Let X be the number of people (successes), among the mentioned 15 users (trials),
who will buy the advanced version of the game. It has a Binomial distribution with n = 15
trials and probability of success
p = P(buy advanced)
  = P(buy advanced | complete all levels) P(complete all levels)
  = 0.30 · 0.60 = 0.18
Then
E(X) = 15 · 0.18 = 2.7
and
P(X ≥ 2) = 1 − P(X < 2) = 1 − P(0) − P(1) = 1 − (1 − p)^n − np(1 − p)^(n−1) = 0.7813
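With n = 15 and p = 0.18 as above, the mean and the tail probability can be checked numerically; a short sketch:

```python
# X ~ Binomial(n = 15, p = 0.18): mean and P(X >= 2) via the complement,
# exactly as in the derivation above.
n, p = 15, 0.18
p_ge_2 = 1 - (1 - p) ** n - n * p * (1 - p) ** (n - 1)
print(round(n * p, 1), round(p_ge_2, 4))  # 2.7 0.7813
```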
Geometric distribution
Definition
The number of Bernoulli trials needed to get the first success has Geometric distribution.
# A search engine goes through a list of sites looking for a given key phrase. Suppose the
search terminates as soon as the key phrase is found. The number of sites visited is
Geometric.
# A hiring manager interviews candidates, one by one, to fill a vacancy. The number of
candidates interviewed until one candidate receives an offer has Geometric distribution.
Geometric random variables can take any integer value from 1 to infinity, because one
needs at least 1 trial to have the first success, and the number of trials needed is not limited
by any specific number. (For example, there is no guarantee that among the first 10 coin
tosses there will be at least one head.) The only parameter is p, the probability of a
"success".
The Geometric probability mass function has the form
P(x) = (1 − p)^(x−1) p,  x = 1, 2, ...
Observe that
Σ_x P(x) = Σ_{x=1}^∞ (1 − p)^(x−1) p = p / (1 − (1 − p)) = p/p = 1
The mean and variance are given by (writing q = 1 − p):
E(X) = Σ_{x=1}^∞ x(1 − p)^(x−1) p = p (d/dq) Σ_{x=0}^∞ q^x = p (d/dq) [1/(1 − q)] = p/(1 − q)^2 = 1/p
Var(X) = (1 − p)/p^2
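The closed forms E(X) = 1/p and Var(X) = (1 − p)/p^2 can be confirmed by summing the series numerically, truncating where the tail is negligible; a sketch for the illustrative choice p = 0.25:

```python
# Numerically sum the geometric series for E[X] and E[X^2] with p = 0.25.
# The tail beyond x = 2000 is astronomically small, so truncation is safe.
p = 0.25
q = 1 - p
mean = sum(x * q ** (x - 1) * p for x in range(1, 2000))
ex2 = sum(x * x * q ** (x - 1) * p for x in range(1, 2000))
var = ex2 - mean ** 2
print(mean, var)  # close to 1/p = 4 and (1 - p)/p^2 = 12
```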
# (St. Petersburg Paradox). This paradox was noticed by the Swiss mathematician
Daniel Bernoulli (1700-1782), a nephew of Jacob. It describes a gambling strategy
that enables one to win any desired amount of money with probability one.
Consider a game that can be played any number of times. Rounds are independent, and each time your
winning probability is p. The game does not have to be favorable to you or even fair. This p can be any
positive probability. For each round, you bet some amount x. In case of a success, you win x. If you lose the
round, you lose x.
The strategy is simple. Your initial bet is the amount that you desire to win eventually. Then, if you win a
round, stop. If you lose a round, double your bet and continue.
Say the desired profit is $100. The game will progress as follows:

Round   Bet     Balance if lose     Balance if win
1       100     -100                +100 and stop
2       200     -300                +100 and stop
3       400     -700                +100 and stop
...     ...     ...                 ...
Sooner or later, the game will stop, and at this moment your balance will be $100. Guaranteed! But this is not
what D. Bernoulli called a paradox.
How many rounds should be played? Since each round is a Bernoulli trial, the number of rounds X until the
first win is a Geometric random variable with parameter p.
Is the game endless? No; on average, it will last E(X) = 1/p rounds. In a fair game with p = 1/2, one will
need 2 rounds, on average, to win the desired amount. In an "unfair" game, with p < 1/2, it will take
longer to win, but still a finite number of rounds. For example, with p = 0.2, i.e., one win in 5 rounds,
one stops after 1/p = 5 rounds, on average. This is not a paradox yet.
Finally, how much money does one need in order to be able to follow this strategy? Let Y be the
amount of the last bet. According to the strategy, Y = 100 · 2^(X−1). It is a discrete random variable whose
expectation is
E(Y) = Σ_x 100 · 2^(x−1) P_X(x) = 100 Σ_{x=1}^∞ 2^(x−1) (1 − p)^(x−1) p = 100p Σ_{x=1}^∞ [2(1 − p)]^(x−1)
     = 100p/(2p − 1)   if p > 1/2
     = +∞              if p ≤ 1/2
This is the St. Petersburg Paradox! A random variable that is always finite has an infinite expectation! Even
when the game is fair (a 50-50 chance to win), one has to be (on average!) infinitely rich to follow this
strategy.
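A short simulation (with the same hypothetical $100 target, assuming the fair case p = 1/2) illustrates both facts: every play ends with a profit of exactly the target, yet the stakes along the way occasionally become enormous:

```python
from random import random, seed

# Doubling strategy: bet `target`, double the stake after each loss,
# stop at the first win. Returns (rounds played, last bet placed).
def play(p, target=100):
    bet, rounds = target, 0
    while True:
        rounds += 1
        if random() < p:   # win this round: stop with profit = target
            return rounds, bet
        bet *= 2           # lose: double the stake and continue

seed(7)
results = [play(0.5) for _ in range(100_000)]
avg_rounds = sum(r for r, _ in results) / len(results)
largest_bet = max(b for _, b in results)
print(round(avg_rounds, 2), largest_bet)
# mean rounds is near 1/p = 2; the largest bet is many times the target
```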
Negative Binomial distribution (Pascal)
In the foregoing, we played the game until the first win. Now keep playing until
we reach a certain number of wins. The number of games played is then Negative
Binomial.
Definition
The number of Bernoulli trials needed to obtain k successes has a Negative Binomial
distribution. Such a variable can be written as a sum X = X1 + X2 + ... + Xk of k
independent Geometric variables, each counting the trials up to the next success. Therefore
E(X) = E(X1 + X2 + ... + Xk) = k/p;
Var(X) = Var(X1 + X2 + ... + Xk) = k(1 − p)/p^2
# (Sequential testing). In a recent production, 5% of certain electronic components are defective. We
need to find 12 non-defective components for our 12 new computers. Components are tested until 12
non-defective ones are found. What is the probability that more than 15 components will have to be
tested?
Sol.: Let X be the number of components tested until 12 non-defective ones are found. It is the number of
trials needed to see 12 successes; hence X has a Negative Binomial distribution with k = 12 and
p = 0.95 (a success being a non-defective component). We need
P(X > 15) = Σ_{x=16}^∞ P(x) = 1 − F(15),
which would require a table of the Negative Binomial distribution.
However, one may compute the left-hand side using the following argument.
P(X > 15) = P{more than 15 trials needed to get 12 successes}
          = P{15 trials are not sufficient}
          = P{there are fewer than 12 successes in 15 trials}
          = P{Y < 12}
where Y is the number of successes (non-defective components) in 15 trials, which is a Binomial
variable with parameters n = 15 and p = 0.95. Therefore
P(X > 15) = P(Y < 12) = P(Y ≤ 11) = F(11) = 0.0055.
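The final value F(11) for Y ~ Binomial(15, 0.95) can be computed directly rather than looked up in a table; a sketch:

```python
from math import comb

# F(11) = P(Y <= 11) for Y ~ Binomial(n = 15, p = 0.95): sum the
# binomial pmf over k = 0, ..., 11.
n, p = 15, 0.95
f_11 = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(12))
print(round(f_11, 4))  # 0.0055
```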
Poisson distribution
This distribution is related to a concept of rare events, or Poissonian events. Essentially
it means that two such events are extremely unlikely to occur within a very short time
or simultaneously. Arrivals of jobs, telephone calls , e-mail messages , traffic accidents,
network blackouts, virus attacks, error in software, floods, earthquakes are example of
rare events.
This distribution bears the name of a famous French mathematician Sim𝑒𝑜𝑛 ƴ Denis
Poisson (1781-1840).
Poisson approximation to the Binomial, where n ≥ 30, p ≤ 0.05, and np = λ:
b(k; n, p) ≈ Poisson(λ)
Here b(k; n, p) = C(n, k) p^k q^(n−k).
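To see how close the approximation is, one can compare b(k; n, p) with the Poisson(λ = np) pmf term by term; a sketch for a sample case (n = 100, p = 0.02, chosen here only for illustration):

```python
from math import comb, exp, factorial

# Compare the Binomial(n, p) pmf with the Poisson(lambda = n*p) pmf.
n, p = 100, 0.02
lam = n * p
for k in range(5):
    binom = comb(n, k) * p ** k * (1 - p) ** (n - k)
    poisson = exp(-lam) * lam ** k / factorial(k)
    print(k, round(binom, 4), round(poisson, 4))
```

The two columns agree to within a few thousandths, as expected for small p and large n.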
Sol.: Let n = C(N, 2) = N(N − 1)/2 be the number of pairs of students in this class. In each pair, both
students are born on the same day with probability p = 1/365. Each pair is a Bernoulli trial because
the two birthdays either match or don't match. Besides, matches in two different pairs are "nearly"
independent. Therefore X, the number of pairs sharing birthdays, is "almost" Binomial. For N ≥ 10,
n ≥ 45 is large, and p is small, so we shall use the Poisson approximation with λ = np = N(N − 1)/730:
P{there are two students sharing a birthday} = 1 − P{no matches}
    = 1 − P(X = 0) ≈ 1 − e^(−λ) ≈ 1 − e^(−N²/730)
Solving the inequality 1 − e^(−N²/730) > 0.5, we obtain N > √(730 ln 2) = 22.5. That is, in a class of at
least N = 23 students, there is a more than 50% chance that at least two students were born on the
same day of the year.
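The threshold N = 23 can be checked numerically under the same Poisson approximation; a minimal sketch:

```python
from math import comb, exp, log, sqrt

# Poisson approximation to P(at least one shared birthday) in a class of N:
# lambda = n * p with n = C(N, 2) pairs and p = 1/365 per pair.
def match_prob(N):
    lam = comb(N, 2) / 365
    return 1 - exp(-lam)

print(round(sqrt(730 * log(2)), 1))           # 22.5
print(match_prob(22) < 0.5 < match_prob(23))  # True: 23 is the threshold
```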