
Lecture 2

Sample Space

The set 𝑺 is a sample space for an experiment if every physical outcome of the experiment refers to a unique element of 𝑺.

In effect, two requirements are embodied in the foregoing definition.


(1) Every physical outcome of the experiment must refer to some element in
the sample space.
(2) The uniqueness condition means that each physical outcome must refer
to only one element in the sample space.

# Suppose a box contains 100 items of a particular sort, say 100 capacitors, and each capacitor has a unique production number running from 1101 to 1200. If an experiment consists of randomly selecting a single capacitor from the box, then an appropriate sample space would be
S1 = {1101, 1102, …, 1200}
It would also be appropriate to employ the sample space
S2 = {1100, 1101, 1102, …, 1200}
One might argue that S2 is less suitable from a modeling perspective, since no physical observation will correspond to the element 1100; nevertheless, S2 still satisfies the two necessary modeling requirements. Insofar as probability is concerned, the probability of choosing 1100 will eventually be set to zero. A set that cannot serve as a sample space is
S3 = {1101, 1102, …, 1199}
since no element in S3 corresponds to the selection of the capacitor with production number 1200.
The elements of a sample space are called outcomes. An outcome is a logical entity and refers only to the manner in which the phenomena are viewed by the experimenter. For instance, in the foregoing example, if
S4 = {0.003 µF, 0.004 µF},
then only two outcomes are realized. While there might be all sorts of information available regarding the chosen capacitor, once S4 has been chosen as the sample space, only the measured capacitance is relevant, since only its observation will result in an outcome (relative to S4).

Events
In most probability problems, the investigator is interested not merely in the collection of outcomes but in some subset of the sample space. A subset of a sample space is known as an event. Two events that do not intersect are said to be mutually exclusive (disjoint). More generally, the events E1, E2, …, En are said to be mutually exclusive if
Ei ∩ Ej = ∅
for any i ≠ j, ∅ denoting the empty set.
Probability (Modeling Random Processes for Engineers and Managers, James J. Solberg, John Wiley & Sons Inc., 2009)
When the "probability of an event" is spoken of in everyday language, almost everyone has a rough idea of what is meant. It is fortunate that this is so, because it would be quite difficult to introduce the concept to someone who had never considered it before. There are at least three distinct ways to approach the subject, none of which is wholly satisfying.

The first to appear, historically, was the frequency concept. If an experiment were to be repeated many times, then the number of times that the event was observed to occur, divided by the number of times that the experiment was conducted, would approach a number that was defined to be the probability of the event. The ratio of the number of chances for success out of the total number of possibilities is the concept with which most elementary treatments of probability start. This definition proved to be somewhat limiting, however, because circumstances frequently prohibit repetition of an experiment under precisely the same conditions, even conceptually. Imagine trying to determine the probability of global annihilation from a meteor collision.

To extend the notion of probability to a wider class of applications, a second approach involving the idea of "subjective" probabilities emerged. According to this idea, the probability of an event need not relate to the frequency with which it would occur in an infinite number of trials; it is just a measure of the degree of likelihood we believe the event to possess. This definition covers even hypothetical events, but it seems a bit too loose for engineering applications: different people could attach different probabilities to the same event.

Most modern texts use the third concept, which relies upon an axiomatic definition. According to this notion, probabilities are just elements of an abstract mathematical system obeying certain axioms. This notion is at once the most powerful and the most devoid of real-world meaning. Of course, the axioms are not purely arbitrary; they were selected to be consistent with the earlier concepts of probability and to provide probabilities with all of the properties everyone would agree they should have.
We will go with the formal axiomatic system, so that we can be rigorous in the mathematics. We want to be able to calculate probabilities to assist in making good decisions. At the same time, we want to bear in mind the real-world interpretation of probabilities as measures of the likelihood of events in the world. The whole point of learning the mathematics is to be able to use it in everyday life.

A probability is a function, P, mapping events onto real numbers and satisfying:

1. 0 ≤ P(A) ≤ 1, for any event A.

2. P(S) = 1, where S is the whole sample space, or the certain event.

3. If E1, E2, …, Ei, … is any finite or infinite collection of mutually exclusive events, then
P(E1 ∪ E2 ∪ ⋯) = P(E1) + P(E2) + ⋯

Once S has been endowed with a probability measure, S is called a probability space.
Some of the additional basic laws of probability (which could be proved from the foregoing) are:
4. P(∅) = 0, where ∅ is the empty set, or impossible event.
5. P(Ā) = 1 − P(A). In other words, the probability that an event does not occur is 1 minus the probability that it does occur.
6. P(A ∪ B) = P(A) + P(B) − P(A ∩ B), for any two events A and B. When the events are not mutually exclusive (when there is some possibility for both A and B to occur), one has to subtract off the probability that they both occur.
7. P(A|B) = P(A ∩ B)/P(B), provided P(B) ≠ 0. This "basic law" is, in reality, a definition of the conditional probability of an event A, given that another event B has occurred.
8. P(A|B) = P(A) if and only if A and B are independent. This rule can be taken as the formal definition of independence.
9. P(A ∩ B) = P(A)P(B) if and only if A and B are independent.

A set of events B1, B2, …, Bn constitutes a partition of the sample space S if they are mutually exclusive and collectively exhaustive, that is,
Bi ∩ Bj = ∅ for every pair i ≠ j
and
B1 ∪ B2 ∪ ⋯ ∪ Bn = S.
In simple terms, a partition is just any way of grouping and listing all possible outcomes such that no outcome appears in more than one group. When the experiment is performed, one and only one of the Bi will occur.
10. P(A) = Σi P(A|Bi) P(Bi) for any partition Bi, i = 1, 2, 3, …, n. This is one of the most useful relationships in modeling applications. It is one expression of the so-called law of total probability.
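The law of total probability is easy to check numerically. Below is a minimal sketch with hypothetical figures (three suppliers form the partition; the defect rates are made up for illustration):

```python
# Law of total probability: P(A) = sum_i P(A | B_i) * P(B_i)
# Hypothetical partition: which supplier a part came from; A = "part is defective".
P_B = {"supplier1": 0.5, "supplier2": 0.3, "supplier3": 0.2}             # P(B_i), sums to 1
P_A_given_B = {"supplier1": 0.01, "supplier2": 0.02, "supplier3": 0.05}  # P(A | B_i)

P_A = sum(P_A_given_B[b] * P_B[b] for b in P_B)  # weighted average over the partition
print(P_A)  # 0.5*0.01 + 0.3*0.02 + 0.2*0.05 = 0.021
```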
Counting
Given a finite sample space
S = {w1, w2, …, wn}
of cardinality n, the hypothesis of equal probability is the assumption that the physical conditions are such that each outcome in S possesses equal probability:
P(w1) = P(w2) = ⋯ = P(wn) = 1/n.
In such a case, the probability space is said to be equiprobable.

Given an urn containing n numbered balls, we consider three selection protocols:
1. Selection with replacements, order counts: For 𝒌 > 𝟎, 𝒌 balls are selected one at a
time, each chosen ball is returned to the urn, and the numbers of the chosen balls are recorded
in the order of selection.
2. Selection without replacement, order counts: For 𝟎 < 𝒌 ≤ 𝒏, 𝒌 balls are selected,
a chosen ball is not returned to the urn, and the numbers of the chosen balls are recorded in
the order of selection.
3. Selection without replacement, order does not count: For 𝟎 < 𝒌 ≤ 𝒏, 𝒌 balls
are selected, a chosen ball is not returned to the urn, and the numbers of the chosen balls are
recorded without respect to the order of selection. Note that this process is equivalent to
selecting all 𝒌 balls at once.
1. Ordered Selection with Replacement
If N = {1, 2, 3, …, n} lists the numbers of the balls in the urn, then each outcome resulting from the selection process is of the form
(n1, n2, …, nk), where each ni is an integer inclusively between 1 and n.
The number of possible selections is n^k.

# In deciding the format for a memory word in a new computer, the designer decides on a length of 16 bits. Since each bit can be 0 or 1, the problem of determining the number of possible words can be modeled as making 16 selections from an urn containing 2 balls. Thus there are 2^16 = 65,536 possible words.

2. Ordered Selection without Replacement (Permutations)

In counting the number of possible selection processes without replacement for which order counts, we are counting permutations.
# To get an idea of the counting problem, consider the set A = {1, 2, 3, 4}. The set of permutations of two objects from A is
Q = {(1,2), (1,3), (1,4), (2,1), (2,3), (2,4), (3,1), (3,2), (3,4), (4,1), (4,2), (4,3)},
so there are 12 permutations. Because of the non-replacement requirement, there are exactly three ordered pairs in Q for each choice of first-component value. Thus, there is a total of 4 × 3 = 12 permutations.
If S is a set containing n elements and 0 < k ≤ n, then there exist
n(n − 1)(n − 2)⋯(n − k + 1)
permutations of k elements from S. Letting P(n, k) denote the number of permutations and employing factorials,
P(n, k) = n!/(n − k)!
# Consider an alphabet consisting of 9 distinct symbols from which strings of length 4 that do not use the same symbol twice are to be formed. Each string is a permutation of 9 objects taken 4 at a time, and thus there are
P(9, 4) = 9 × 8 × 7 × 6 = 3024
possible passwords.

Now suppose the 4 symbols are chosen uniformly at random with replacement. What is the probability that a string is formed in which no symbol is used more than once? Let E denote the event consisting of all strings with no symbol appearing more than once; then the desired probability is
P(E) = P(9, 4)/9^4 ≈ 0.461.
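These counts can be checked in a few lines of Python (a sketch using only the standard library; math.perm gives the permutation count):

```python
import math

n, k = 9, 4
perms = math.perm(n, k)       # 9 * 8 * 7 * 6 = 3024 strings with no repeated symbol
total = n ** k                # 9^4 = 6561 strings when symbols may repeat
print(perms, perms / total)   # 3024 0.4609... ~ 0.461
```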

Fundamental Principle of Counting


Consider the problem of counting k-tuples formed according to the following scheme:
1. The first component of the k-tuple can be occupied by any one of r1 objects.
2. No matter which object has been chosen to occupy the first component, any one of r2 objects can occupy the second component.
3. Proceeding recursively, no matter which objects have been chosen to occupy the first j − 1 components, any one of rj objects can occupy the j-th component.
The fundamental principle of counting states that there are r1 r2 r3 ⋯ rk possible k-tuples that can result from application of the selection scheme; equivalently, if E denotes the set of all such k-tuples, then |E| = r1 r2 r3 ⋯ rk.
Fig. 1 below illustrates the fundamental principle of counting using a tree diagram.

Fig. 1: Tree diagram illustrating the fundamental principle of counting.

Here 4 possible branches can be chosen for the first selection, 2 for the second, and 2 for the third. As a result, the tree contains 4 × 2 × 2 = 16 final nodes. It is crucial to note that at each of the three stages (selections) of the tree, the number of branches emanating from the nodes is the same; otherwise, as illustrated in Fig. 2, the multiplication technique of the fundamental principle does not apply. The requirement that there be a constant number of emanating branches at each stage corresponds to the condition in the selection protocol that, at each component, the number of possible choices for the component is fixed and does not depend on the particular objects chosen to fill the preceding components.

Fig. 2: A tree in which the number of branches emanating from the nodes varies from node to node, so the multiplication rule of the fundamental principle does not apply.
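The multiplication rule for the tree of Fig. 1 can be illustrated by brute-force enumeration; the option lists below are hypothetical placeholders with 4, 2, and 2 choices per stage:

```python
from itertools import product

# Hypothetical choices at each of the three stages of the tree in Fig. 1
stage1 = ["a", "b", "c", "d"]   # r1 = 4 branches
stage2 = [0, 1]                 # r2 = 2 branches
stage3 = [0, 1]                 # r3 = 2 branches

tuples = list(product(stage1, stage2, stage3))  # all possible 3-tuples
print(len(tuples))                              # 4 * 2 * 2 = 16 final nodes
```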

3. Unordered Selection without Replacement (Combinations)


When selecting balls without replacement from an urn without regard to the
order of selection, the result of the procedure consists of a list of non-repeated
elements, the ordering in the list being irrelevant. A list of elements for which
the order of the listing can be interchanged without affecting the list is simply a
set. Thus the outcomes are subsets of the set from which the elements are
being chosen. Employing the urn model terminology , each outcome is a subset
consisting of 𝒌 balls from the original collection of 𝒏 balls in the urn.
Using the set A = {1, 2, 3, 4}, the set of all possible subsets that result from selecting two elements according to the unordered, without-replacement protocol is the set of sets
{{1,2}, {1,3}, {1,4}, {2,3}, {2,4}, {3,4}}.
Each subset resulting from the unordered, without-replacement protocol is called a combination (of n objects taken k at a time), denoted nCk or C(n, k).

Again consider the set A. It can be readily seen that there are two 2-tuple permutations for each 2-element combination. Thus, each 2-element subset of A yields 2! = 2 permutations. This reasoning results in
P(n, k) = k! C(n, k),
or
C(n, k) = P(n, k)/k! = n!/(k!(n − k)!).
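A quick standard-library check of the relation P(n, k) = k!·C(n, k), shown here as a sketch for the set A above:

```python
import math
from itertools import combinations

n, k = 4, 2
print(math.comb(n, k))                                          # C(4, 2) = 6
print(math.perm(n, k) == math.factorial(k) * math.comb(n, k))   # True: P(n, k) = k! * C(n, k)
print(list(combinations([1, 2, 3, 4], 2)))                      # the 6 two-element subsets of A
```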
DISCRETE RANDOM VARIABLES AND THEIR DISTRIBUTIONS
(Probability and Statistics for Computer Scientists, Michael Baron, Chapman & Hall/CRC, 2007)
A random variable is a function of an outcome,
X = f(ω).
In other words, it is a quantity that depends on chance. The domain of the random variable is the sample space. Its range can be the set of all real numbers R, or only the positive numbers (0, +∞), or the integers Z, or the interval (0, 1), etc., depending on what possible values the random variable can potentially take.

Once an experiment is completed and the outcome ω is known, the value of the random variable X(ω) becomes determined.

# Consider an experiment of tossing 3 fair coins and counting the number of heads. Certainly, the same
model suits the number of girls in a family with 3 children, the number of 1’s in a random binary code
consisting of 3 characters, etc.

Let X be the number of heads (girls, 1's). Prior to an experiment, its value is not known. All we can say is that X has to be an integer between 0 and 3. Since assuming a value is an event, we can compute probabilities:
P{X = 0} = P{three tails} = P{TTT} = (1/2)(1/2)(1/2) = 1/8
P{X = 1} = P{HTT} + P{THT} + P{TTH} = 3/8
P{X = 2} = P{HHT} + P{HTH} + P{THH} = 3/8
P{X = 3} = P{HHH} = 1/8
Summarizing,

𝒙 𝑷{𝑿 = 𝒙}
0 1/8
1 3/8
2 3/8
3 1/8
Total 1

This table contains everything that is known about random variable 𝑿 prior to the experiment.
Before we know the outcome ω, we cannot tell what X equals. However, we can list all the possible values of X and determine the corresponding probabilities.
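This pmf can be reproduced by enumerating the 8 equally likely outcomes of the experiment (a minimal sketch):

```python
from itertools import product
from collections import Counter
from fractions import Fraction

outcomes = list(product("HT", repeat=3))              # the 8 equally likely outcomes TTT, ..., HHH
counts = Counter(seq.count("H") for seq in outcomes)  # number of heads in each outcome

pmf = {x: Fraction(counts[x], len(outcomes)) for x in sorted(counts)}
for x, prob in pmf.items():
    print(x, prob)   # 0 1/8, 1 3/8, 2 3/8, 3 1/8 -- matching the table above
```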
Definition

The collection of all probabilities related to X is the distribution of X. The function
P(x) = P{X = x}
is the probability mass function (pmf). The cumulative distribution function (cdf) is defined as
F(x) = P{X ≤ x} = Σ_{y ≤ x} P(y).

Recall that one way to compute the probability of an event is to add probabilities of all the outcomes in it. Hence, for any set A,
P{X ∈ A} = Σ_{x ∈ A} P(x).
When A is an interval, its probability can be computed directly from the cdf F(x):
P{a < X ≤ b} = F(b) − F(a).
# (Errors in independent modules) . A program consists of two modules. The number of
errors 𝑋1 in the first module has the pmf 𝑃1 (𝑥), and the number of errors 𝑋2 in the second
module has the pmf 𝑃2 (𝑥), independently of 𝑋1 , where

𝒙 𝑃1 (𝑥) 𝑃2 (𝑥)
0 0.5 0.7
1 0.3 0.2
2 0.1 0.1
3 0.1 0

Find the pmf and cdf of 𝑌 = 𝑋1 + 𝑋2 , the total number of errors.

Sol.: We break the problem into steps. First, determine all possible values of Y, then compute the probability of each value. Clearly, the number of errors Y is an integer that can be as low as 0 + 0 = 0 and as high as 3 + 2 = 5 (since P2(3) = 0, the second module has at most 2 errors). Next,
P_Y(0) = P{Y = 0} = P{X1 = 0, X2 = 0} = P1(0)P2(0) = 0.5 × 0.7 = 0.35
P_Y(1) = P{Y = 1} = P1(0)P2(1) + P1(1)P2(0) = 0.5 × 0.2 + 0.3 × 0.7 = 0.31
P_Y(2) = P{Y = 2} = P1(0)P2(2) + P1(1)P2(1) + P1(2)P2(0) = 0.5 × 0.1 + 0.3 × 0.2 + 0.1 × 0.7 = 0.18
P_Y(3) = P{Y = 3} = P1(0)P2(3) + P1(1)P2(2) + P1(2)P2(1) + P1(3)P2(0) = 0.5 × 0 + 0.3 × 0.1 + 0.1 × 0.2 + 0.1 × 0.7 = 0.12
P_Y(4) = P{Y = 4} = P1(2)P2(2) + P1(3)P2(1) = 0.1 × 0.1 + 0.1 × 0.2 = 0.03
P_Y(5) = P{Y = 5} = P1(3)P2(2) = 0.1 × 0.1 = 0.01
The cumulative distribution function F_Y(y) can be computed similarly.
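The convolution carried out by hand above can be automated; this sketch reproduces the pmf of Y = X1 + X2 and its cdf from the two tables:

```python
from collections import defaultdict
from itertools import accumulate

P1 = {0: 0.5, 1: 0.3, 2: 0.1, 3: 0.1}   # pmf of X1
P2 = {0: 0.7, 1: 0.2, 2: 0.1, 3: 0.0}   # pmf of X2

PY = defaultdict(float)
for x1, p1 in P1.items():               # independence: P(X1 = x1, X2 = x2) = P1(x1) * P2(x2)
    for x2, p2 in P2.items():
        PY[x1 + x2] += p1 * p2

ys = sorted(PY)
print([round(PY[y], 2) for y in ys])                          # pmf: 0.35, 0.31, 0.18, 0.12, 0.03, 0.01
print([round(F, 2) for F in accumulate(PY[y] for y in ys)])   # cdf: 0.35, 0.66, 0.84, 0.96, 0.99, 1.0
```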
Families of Discrete Distributions

We now consider the most commonly used families of discrete distributions. Amazingly, absolutely different phenomena can be adequately described by the same mathematical model, or family of distributions. Say, the number of virus attacks, received e-mails, error messages, network blackouts, telephone calls, traffic accidents, earthquakes, and so on can all be modeled by the Poisson family of distributions.

Bernoulli Distribution
The simplest random variable (excluding non-random ones!) takes just two
possible values. Call them 0 and 1.

Definition: A random variable with two possible values, 0 and 1, is called a Bernoulli variable, its distribution is the Bernoulli distribution, and any experiment with a binary outcome is called a Bernoulli trial.
Good or defective components, parts that pass or fail tests, transmitted or lost signals, working or
malfunctioning hardware, sites that contain or do not contain a key word etc., are examples of
Bernoulli trials. All these experiments fit the same Bernoulli model, where we shall use generic names
for the two outcomes: “successes” and “failures”.
If P(1) = p is the probability of success, then P(0) = q = 1 − p is the probability of a failure. We can then compute the expectation and variance as:
E(X) = x̄ = Σ_x x P(x) = 0·(1 − p) + 1·p = p
Var(X) = σ² = Σ_x (x − x̄)² P(x) = p(1 − p) = pq

Bernoulli distribution (p = probability of success):
P(x) = q = 1 − p if x = 0,  P(x) = p if x = 1
E(X) = p,  Var(X) = pq

In fact, we see that there is a whole family of Bernoulli distributions, indexed by a parameter p. Every p between 0 and 1 defines another Bernoulli distribution. The distribution with p = q = 0.5 carries the highest level of uncertainty, because Var(X) = pq is maximized by p = q = 0.5. Distributions with lower or higher p have lower variances. The extreme parameters p = 0 and p = 1 define non-random variables 0 and 1, respectively; their variance is 0.
Binomial distribution
Consider a sequence of independent Bernoulli trials and count the number of
successes in it. This may be the number of defective computers in a shipment,
the number of updated files in a folder, the number of e-mails with attachments
etc.,
Definition
A variable described as the number of successes in a sequence of independent Bernoulli trials has the Binomial distribution. Its parameters are n, the number of trials, and p, the probability of success.

The Binomial probability mass function is
P(x) = P{X = x} = C(n, x) p^x q^(n−x),   x = 0, 1, 2, …, n,
which is the probability of exactly x successes in n trials. In this formula, p^x is the probability of x successes, probabilities being multiplied due to independence of trials. Also, q^(n−x) is the probability of the remaining (n − x) trials being failures. Finally, C(n, x) = n!/(x!(n − x)!) is the number of elements of the sample space S that form the event {X = x}. This is the number of possible orderings of x successes and (n − x) failures among n trials, and it is computed as C(n, x).
The expectation, E(X), is thus given as
E(X) = x̄ = Σ_{x=0}^{n} x P(x) = Σ_{x=0}^{n} x C(n, x) p^x q^(n−x) = np.
Similarly, the variance is given by
Var(X) = σ² = Σ_{x=0}^{n} (x − x̄)² C(n, x) p^x q^(n−x) = npq.
# An exciting computer game is released. Sixty percent of players complete all the levels.
Thirty percent of them will then buy an advanced version of the game. Among 15 users ,
what is the expected number of people who will buy the advanced version? What is the
probability that at least two people will buy it?

Sol.: Let X be the number of people (successes), among the mentioned 15 users (trials), who will buy the advanced version of the game. It has the Binomial distribution with n = 15 trials and the probability of success
p = P{buy advanced} = P{buy advanced | complete all levels} · P{complete all levels} = 0.30 × 0.60 = 0.18.
Then
E(X) = 15 × 0.18 = 2.7
and
P(X ≥ 2) = 1 − P(X < 2) = 1 − P(0) − P(1) = 1 − (1 − p)^n − np(1 − p)^(n−1) = 0.7813.
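A sketch verifying both answers directly from the Binomial pmf (standard library only):

```python
from math import comb

n, p = 15, 0.18

def pmf(x):
    # Binomial pmf: C(n, x) * p^x * q^(n - x)
    return comb(n, x) * p**x * (1 - p)**(n - x)

print(n * p)                 # E(X) = 2.7
print(1 - pmf(0) - pmf(1))   # P(X >= 2) ~ 0.7813
```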
Geometric distribution

Consider a sequence of independent Bernoulli trials. Each trial results in a "success" or a "failure".

Definition

The number of Bernoulli trials needed to get the first success has Geometric distribution.

# A search engine goes through a list of sites looking for a given key phrase. Suppose the
search terminates as soon as the key phrase is found. The number of sites visited is
Geometric.

# A hiring manager interviews candidates , one by one, to fill a vacancy. The number of
candidates interviewed until one candidate receives an offer has Geometric distribution.

Geometric random variables can take any integer value from 1 to infinity , because one
needs at least 1 trial to have the first success, and the number of trials needed is not limited
by any specific number. ( For example, there is no guarantee that among the first 10 coin
tosses there will be at least one head.) The only parameter is 𝒑, the probability of a
“success”.
The Geometric probability mass function has the form
P(x) = P{the first success occurs on the x-th trial} = (1 − p)^(x−1) p,   x = 1, 2, …,
which is the probability of (x − 1) failures on the first (x − 1) trials and a success on the last trial.

Observe that
Σ_x P(x) = Σ_{x=1}^{∞} (1 − p)^(x−1) p = p / (1 − (1 − p)) = 1.
The mean and variance are given by
x̄ = E(X) = Σ_{x=1}^{∞} x(1 − p)^(x−1) p = p (d/dq) Σ_{x=0}^{∞} q^x = p (d/dq)[1/(1 − q)] = p/(1 − q)² = 1/p
Var(X) = (1 − p)/p²
Here we have defined q = 1 − p.
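A numerical sanity check of these formulas, truncating the infinite sums at a large cutoff (a sketch with p = 0.2 chosen arbitrarily):

```python
p = 0.2
q = 1 - p
xs = range(1, 10_000)                     # truncate the infinite sums at a large cutoff
pmf = [q ** (x - 1) * p for x in xs]

total = sum(pmf)                                         # should be ~ 1
mean = sum(x * P for x, P in zip(xs, pmf))               # should be ~ 1/p = 5
var = sum((x - mean) ** 2 * P for x, P in zip(xs, pmf))  # should be ~ (1 - p)/p^2 = 20
print(round(total, 6), round(mean, 4), round(var, 4))
```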
# (St. Petersburg Paradox). This paradox was noticed by a Swiss mathematician
Daniel Bernoulli (1700-1782), a nephew of Jacob. It describes a gambling strategy
that enables one to win any desired amount of money with probability one.

Consider a game that can be played any number of times. Rounds are independent, and each time your
winning probability is 𝑝. The game does not have to be favorable to you or even fair. This 𝑝 can be any
positive probability. For each round , you bet some amount 𝑥. In case of a success , you win 𝑥. If you lose the
round , you lose 𝑥.

The strategy is simple. Your initial bet is the amount that you desire to win eventually. Then, if you win a round, stop. If you lose a round, double your bet and continue.

Say the desired profit is $100. The game will progress as follows:

Round   Bet     Balance if lose   Balance if win
1       100     −100              +100 and stop
2       200     −300              +100 and stop
3       400     −700              +100 and stop
…       …       …                 …
Sooner or later the game will stop, and at this moment your balance will be $100. Guaranteed! But this is not what D. Bernoulli called a paradox.

How many rounds should be played? Since each round is a Bernoulli trial, the number of them, X, until the first win is a Geometric random variable with parameter p.

Is the game endless? No; on the average it will last E(X) = 1/p rounds. In a fair game with p = 1/2, one will need 2 rounds, on the average, to win the desired amount. In an "unfair" game, with p < 1/2, it will take longer to win, but still a finite number of rounds. For example, with p = 0.2, i.e., one win in 5 rounds on the average, one stops after 1/p = 5 rounds, on the average. This is not a paradox yet.

Finally, how much money does one need to have in order to be able to follow this strategy? Let Y be the amount of the last bet. According to the strategy, Y = 100 · 2^(X−1). It is a discrete random variable whose expectation is
E(Y) = Σ_x 100 · 2^(x−1) P_X(x) = 100 Σ_{x=1}^{∞} 2^(x−1) (1 − p)^(x−1) p = 100p Σ_{x=1}^{∞} [2(1 − p)]^(x−1)
     = 100p / (1 − 2(1 − p))  if p > 1/2,
     = +∞                     if p ≤ 1/2.
This is the St. Petersburg Paradox! A random variable that is always finite has an infinite expectation! Even when the game is fair (a 50–50 chance to win), one has to be (on the average!) infinitely rich to follow this strategy.
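A small simulation of the doubling strategy (illustrative only, with p = 0.5): every run ends with a gain of $100, yet occasionally an enormous last bet is required, which is why the sample mean of the last bet never settles down:

```python
import random

def play(p=0.5, target=100):
    # Double the bet after every loss; stop at the first win.
    bet, rounds = target, 0
    while True:
        rounds += 1
        if random.random() < p:   # win this round
            return rounds, bet    # rounds played, size of the last bet
        bet *= 2

random.seed(1)
results = [play() for _ in range(100_000)]
print(sum(r for r, _ in results) / len(results))   # average number of rounds ~ 1/p = 2
print(max(b for _, b in results))                  # the largest bet ever needed can be huge
print(sum(b for _, b in results) / len(results))   # sample mean of the last bet keeps drifting upward
```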
Negative Binomial distribution (Pascal)

In the foregoing, we played the game until the first win. Now keep playing until we reach a certain number of wins. The number of games played is then Negative Binomial.

Definition

In a sequence of independent Bernoulli trials, the number of trials needed to obtain k successes has the Negative Binomial distribution.

The Negative Binomial probability mass function is
P(x) = P{the x-th trial is the k-th success}
     = P{(k − 1) successes in the first (x − 1) trials, and the last trial is a success}
     = C(x − 1, k − 1) (1 − p)^(x−k) p^k.
This formula accounts for the probability of k successes, the remaining (x − k) failures, and the number of outcomes, i.e., sequences with the k-th success coming on the x-th trial.
The Negative Binomial distribution has two parameters, k and p. With k = 1, it becomes Geometric. Also, each Negative Binomial variable can be represented as a sum of independent Geometric variables,
X = X1 + X2 + ⋯ + Xk,
with the same probability of success p. Indeed, the number of trials until the k-th success consists of a Geometric number of trials X1 until the first success, an additional Geometric number of trials X2 until the second success, etc.

Therefore,
E(X) = E(X1 + X2 + ⋯ + Xk) = k/p
Var(X) = Var(X1 + X2 + ⋯ + Xk) = k(1 − p)/p²
# (Sequential testing). In a recent production, 5% of certain electronic components are defective. We need 12 non-defective components for our 12 new computers. Components are tested until 12 non-defective ones are found. What is the probability that more than 15 components will have to be tested?
Sol.: Let X be the number of components tested until 12 non-defective ones are found. It is the number of trials needed to see 12 successes, hence X has the Negative Binomial distribution with k = 12 and p = 0.95. We need
P(X > 15) = Σ_{x=16}^{∞} P(x) = 1 − F(15),
which would require a table of the Negative Binomial distribution. However, one may compute the left-hand side using the following argument:
P{X > 15} = P{more than 15 trials are needed to get 12 successes}
          = P{15 trials are not sufficient}
          = P{there are fewer than 12 successes in 15 trials}
          = P{Y < 12},
where Y is the number of successes (non-defective components) in 15 trials, which is a Binomial variable with parameters n = 15 and p = 0.95. Therefore
P(X > 15) = P(Y < 12) = P(Y ≤ 11) = F(11) = 0.0055.
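The final number can be checked by summing the Binomial pmf of Y directly (a sketch):

```python
from math import comb

n, p = 15, 0.95   # Y = number of non-defective components among 15 tested

F_11 = sum(comb(n, y) * p**y * (1 - p)**(n - y) for y in range(12))   # P(Y <= 11)
print(round(F_11, 4))   # ~ 0.0055, so P(X > 15) ~ 0.0055
```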

Poisson distribution
This distribution is related to a concept of rare events, or Poissonian events. Essentially
it means that two such events are extremely unlikely to occur within a very short time
or simultaneously. Arrivals of jobs, telephone calls , e-mail messages , traffic accidents,
network blackouts, virus attacks, error in software, floods, earthquakes are example of
rare events.
This distribution bears the name of a famous French mathematician Sim𝑒𝑜𝑛 ƴ Denis
Poisson (1781-1840).

λ = frequency, the average number of events

Poisson distribution:
P(x) = e^(−λ) λ^x / x!,   x = 0, 1, 2, …
E(X) = λ,  Var(X) = λ
# ( New accounts) . Customers of an internet service provider initiate new accounts at the average
rate of 10 accounts per day. (a) What is the probability that more than 8 new accounts will be initiated
today? (b) What is the probability that more than 16 accounts will be initiated within 2 days?
Sol.: (a) Here λ = 10. The probability that more than 8 new accounts will be initiated today is
P(X > 8) = 1 − F_X(8) = 1 − Σ_{x=0}^{8} e^(−10) 10^x / x! = 1 − 0.333 = 0.667.
(b) The number of accounts, Y, opened within 2 days is not simply 2X; rather, Y is another Poisson random variable whose parameter equals 20. Therefore
P(Y > 16) = 1 − F_Y(16) = 1 − 0.221 = 0.779.
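Both answers follow from the Poisson cdf; a sketch using only the standard library:

```python
from math import exp, factorial

def poisson_cdf(k, lam):
    # F(k) = sum_{x=0}^{k} e^(-lam) * lam^x / x!
    return sum(exp(-lam) * lam**x / factorial(x) for x in range(k + 1))

print(round(1 - poisson_cdf(8, 10), 3))    # (a) P(X > 8)  ~ 0.667
print(round(1 - poisson_cdf(16, 20), 3))   # (b) P(Y > 16) ~ 0.779
```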

Poisson approximation to the Binomial: b(k; n, p) ≈ Poisson(λ), where n ≥ 30, p ≤ 0.05, and np = λ. Here
b(k; n, p) = C(n, k) p^k q^(n−k).

Mathematically, it means closeness of the Binomial and Poisson pmfs:
lim_{n→∞, p→0, np→λ} C(n, k) p^k (1 − p)^(n−k) = e^(−λ) λ^k / k!

* When p ≥ 0.95, the Poisson approximation is applicable too. The probability of a failure, q = 1 − p, is small in this case. Then, we can approximate the number of failures, which is also Binomial.
#(Birthday problem). Consider a class with 𝑵 ≥ 𝟏𝟎 students. Compute the probability that at least
two of them have their birthdays on the same day. How many students should be in class in order to
have this probability above 0.5?

Sol.: Let n = C(N, 2) = N(N − 1)/2 be the number of pairs of students in this class. In each pair, both students are born on the same day with probability p = 1/365. Each pair is a Bernoulli trial because the two birthdays either match or don't match. Besides, matches in two different pairs are "nearly" independent. Therefore, X, the number of pairs sharing birthdays, is "almost" Binomial. For N ≥ 10, n ≥ 45 is large, and p is small, thus we shall use the Poisson approximation with λ = np = N(N − 1)/730:
P{there are two students sharing a birthday} = 1 − P{no matches} = 1 − P{X = 0} ≈ 1 − e^(−λ) ≈ 1 − e^(−N²/730).
Solving the inequality 1 − e^(−N²/730) > 0.5, we obtain N > √(730 ln 2) ≈ 22.5. That is, in a class of at least N = 23 students, there is a more than 50% chance that at least two students were born on the same day of the year.
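The Poisson approximation can be compared with the exact birthday probability (a sketch; the exact value multiplies the chances that each successive birthday misses all earlier ones):

```python
from math import exp

def poisson_approx(N):
    # 1 - e^(-lambda) with lambda = N(N - 1)/730
    return 1 - exp(-N * (N - 1) / 730)

def exact(N):
    prob_no_match = 1.0
    for i in range(N):                   # the i-th student must avoid the i earlier birthdays
        prob_no_match *= (365 - i) / 365
    return 1 - prob_no_match

for N in (10, 22, 23, 30):
    print(N, round(poisson_approx(N), 4), round(exact(N), 4))
# N = 23 is the smallest class size for which the probability exceeds 0.5
```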
