The Geometric Distribution
A Director wants to recruit a secretary to assist him in delivering human resource services,
with specific responsibility for supporting department staff, providing information to
applicants and employees, maintaining clerical and financial records, and completing
assigned projects and tasks; the secretary would report to him regularly. Considering his busy
schedule, he conducts online interviews sequentially until he finds someone deserving of a
personal interview in his office. He knows that the probability of getting a deserving
candidate is 𝑝. What is the probability that he will find the ideal candidate in 𝑥 trials? What is
the probability that 𝑦 candidates will be rejected before he finds the right candidate?
A solution to these problems leads to the concept of Geometric Distribution.
In probability theory and statistics, the geometric distribution is either of two discrete
probability distributions: The probability distribution of the number X of Bernoulli trials
needed to get one success, supported on the set { 1, 2, 3, . . . }. The probability distribution of
the number 𝑌 = 𝑋 − 1 of failures before the first success, supported on the
set { 0, 1, 2, 3, . . . }. It is a matter of convention and convenience whether 𝑋 or 𝑌 is said to
follow the geometric distribution.
Consider the probability that the first success requires 𝑘 independent trials, each with success
probability 𝑝. The probability that the 𝑘th trial is the first success is:
𝑃[𝑋 = 𝑘] = 𝑓(𝑘) = (1 − 𝑝)^(𝑘−1) 𝑝 for 𝑘 = 1, 2, 3, …
Thus, the above form of geometric distribution is used to model the number of trials until the
first success. In contrast, the following form of geometric distribution is used for modelling
the number of failures preceding the first success:
𝑃[𝑌 = 𝑘] = 𝑓(𝑘) = (1 − 𝑝)^𝑘 𝑝 for 𝑘 = 0, 1, 2, 3, …
In either case, the sequence of probabilities is geometric. Suppose a fair die is thrown
repeatedly until the first time a “1” appears. The probability distribution of the number of
times it is thrown is supported on the infinite set { 1, 2, 3, … } and is a geometric distribution
with 𝑝 = 1/6.
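The die example can be checked numerically. The following is a minimal Python sketch (the function `geom_pmf` is defined here for illustration, not taken from a library):

```python
p = 1 / 6  # probability of rolling a "1" on a single throw of a fair die

def geom_pmf(k: int, p: float) -> float:
    """P[X = k]: the first success occurs on the k-th trial."""
    return (1 - p) ** (k - 1) * p

# Probability that the first "1" appears on throw 1, 2, 3
probs = [geom_pmf(k, p) for k in (1, 2, 3)]

# Consecutive probabilities have the constant ratio (1 - p):
# the sequence is geometric, which gives the distribution its name.
assert abs(probs[1] / probs[0] - (1 - p)) < 1e-12
```

Here 𝑃[𝑋 = 1] = 1/6, while 𝑃[𝑋 = 3] = (5/6)^2 · (1/6) = 25/216.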
Negative Binomial Distribution
Consider the following problems:
i. Suppose a fair die is thrown repeatedly until the “6” appears six times. What is the
probability that the sixth “6” can be achieved at the 36th trial?
ii. Suppose a fair die is thrown repeatedly until the “6” appears five times. What is the
probability that the fifth “6” can be achieved at the 25th trial?
In the HR context, filling more than one vacancy by recruiting several candidates sequentially
until all the vacancies are filled is another simple example. The solution to these problems is
associated with the negative binomial distribution.
Suppose there is a sequence of Bernoulli trials with the probability of success 𝑝 per trial. We
are observing this sequence until a predefined number 𝑟 of failures has occurred. Then, the
random number of successes we have seen, 𝑋, will have the negative binomial (or Pascal)
distribution. The probability mass function of the negative binomial distribution is
𝑓(𝑘; 𝑟, 𝑝) = 𝑃[𝑋 = 𝑘] = C(𝑘 + 𝑟 − 1, 𝑘) 𝑝^𝑘 (1 − 𝑝)^𝑟 for 𝑘 = 0, 1, 2, …
Negative Binomial Distribution (Alternative form)
Suppose there is a sequence of Bernoulli trials with the probability of success 𝑝 per trial. We
are observing this sequence until a predefined number 𝑟 of successes has occurred. Then, the
random number of trials we have seen, 𝑋, will have the negative binomial distribution with
p.m.f.:
𝑓(𝑘; 𝑟, 𝑝) = 𝑃[𝑋 = 𝑘] = C(𝑘 − 1, 𝑟 − 1) 𝑝^𝑟 (1 − 𝑝)^(𝑘−𝑟) for 𝑘 = 𝑟, 𝑟 + 1, 𝑟 + 2, …
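Problem (ii) above (the fifth “6” on the 25th throw of a fair die) can be computed from this alternative form. A minimal Python sketch, with the helper function named by us for illustration:

```python
import math

def neg_binom_trials_pmf(k: int, r: int, p: float) -> float:
    """P[X = k]: the r-th success occurs exactly on trial k (k >= r)."""
    return math.comb(k - 1, r - 1) * p ** r * (1 - p) ** (k - r)

p = 1 / 6  # probability of a "6" on one throw

# Probability that the fifth "6" is achieved at the 25th trial:
# C(24, 4) * (1/6)^5 * (5/6)^20
prob = neg_binom_trials_pmf(25, 5, p)
```

The binomial coefficient counts the ways to place the first four sixes among the first 24 throws; the 25th throw must itself be a “6”.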
Discrete Uniform Distribution
In probability theory and statistics, the discrete uniform distribution is a symmetric
probability distribution whereby a finite number of values are equally likely to be observed;
every one of 𝑛 values has equal probability 1/𝑛. Another way of saying “discrete uniform
distribution” would be “a known, finite number of outcomes equally likely to happen”. A
simple example of the discrete uniform distribution is throwing a fair die. The possible values
are 1, 2, 3, 4, 5, and 6; each time the die is thrown, the probability of a given score is 1/6.
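A minimal sketch of the fair-die pmf, using exact fractions to avoid rounding:

```python
from fractions import Fraction

# A fair die: each of the n = 6 faces is equally likely
outcomes = [1, 2, 3, 4, 5, 6]
n = len(outcomes)
pmf = {x: Fraction(1, n) for x in outcomes}  # every value has probability 1/n

assert sum(pmf.values()) == 1                # probabilities sum to one
assert all(v == Fraction(1, 6) for v in pmf.values())
```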
Distribution Function, Survival Function and Hazard Function
Often in problem-solving, we are interested in finding a probability of the type 𝑃[𝑋 ≥
𝑥] (that is, the random variable takes at least the value 𝑥), or 𝑃[𝑋 ≤ 𝑥] (that is, the random
variable takes at most the value 𝑥), or 𝑃[𝑋 > 𝑥], or 𝑃[𝑋 < 𝑥], for a given 𝑥 on the real line.
The most important of these is 𝑃[𝑋 ≤ 𝑥], for any real-valued random variable 𝑋.
The cumulative distribution function (c.d.f) [or, in short, only distribution function] of a real-
valued random variable X is the function given by 𝐹𝑋 (𝑥) = 𝑃[𝑋 ≤ 𝑥], that represents the
probability that the random variable X takes on a value less than or equal to x.
The probability that 𝑋 lies in the semi-closed interval (𝑎, 𝑏], where 𝑎 < 𝑏, is therefore
𝑃(𝑎 < 𝑋 ≤ 𝑏) = 𝐹𝑋(𝑏) − 𝐹𝑋(𝑎).
Four conditions are necessary and sufficient for a function to be a distribution function. We
shall state them without proof.
i. Every cumulative distribution function 𝐹 is monotone non-decreasing; that is, for any
𝑥1 < 𝑥2, 𝑥1, 𝑥2 ∈ ℝ, 𝐹𝑋(𝑥1) ≤ 𝐹𝑋(𝑥2).
ii. 𝐹𝑋(−∞) = 0; in other words, lim(𝑥→−∞) 𝐹𝑋(𝑥) = 0.
iii. 𝐹𝑋(+∞) = 1; in other words, lim(𝑥→+∞) 𝐹𝑋(𝑥) = 1.
iv. Every cumulative distribution function 𝐹 is right-continuous: 𝐹𝑋(𝑥 + 0) = 𝐹𝑋(𝑥); in
other words, lim(ℎ→0+) 𝐹𝑋(𝑥 + ℎ) = 𝐹𝑋(𝑥).
In the definition above, the “less than or equal to” sign, “≤”, is a convention. Much of the
older Soviet literature uses “<”, in which case the fourth property becomes left-continuity
instead of right-continuity. The choice of convention matters little for absolutely continuous
distributions but is vital for discrete distributions.
The CDF of a continuous random variable 𝑋 can be expressed as the integral of its probability
density function 𝑓𝑋(𝑡) as follows:
𝐹𝑋(𝑥) = ∫_{−∞}^{𝑥} 𝑓𝑋(𝑡) 𝑑𝑡
The survival function, also known as a reliability function or complementary cumulative
distribution function, is a property of any random variable that maps a set of events, usually
associated with mortality or failure of some system, onto time. It captures the probability that
the system will survive beyond a specified time.
The term reliability function is common in engineering, while the term survival function is
used in a broader range of applications, including human mortality.
Let T be a random variable with CDF F(t). Its survival function or reliability function is:
𝑆(𝑡) = 𝑃 (𝑇 > 𝑡) = 1 − 𝐹(𝑡).
The failure rate is the frequency with which an engineered system or component fails,
expressed in failures per unit of time. It is often denoted by 𝜆 and is widely used in
reliability engineering and engineering management. Calculating the failure rate over a
shrinking interval of time, in a limiting sense, gives the instantaneous failure rate, known as
the hazard function (also called the hazard rate), ℎ(𝑡). By definition,
ℎ(𝑡) = 𝑓(𝑡) / (1 − 𝐹(𝑡)).
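For the exponential distribution, with 𝑓(𝑡) = 𝜆e^(−𝜆𝑡) and 𝐹(𝑡) = 1 − e^(−𝜆𝑡), the definitions above give a constant hazard ℎ(𝑡) = 𝜆. A minimal Python sketch (the rate 𝜆 = 0.5 is an arbitrary illustrative value):

```python
import math

lam = 0.5  # assumed failure rate (failures per unit time), for illustration

def pdf(t):      return lam * math.exp(-lam * t)  # f(t)
def cdf(t):      return 1 - math.exp(-lam * t)    # F(t)
def survival(t): return 1 - cdf(t)                # S(t) = P(T > t) = 1 - F(t)
def hazard(t):   return pdf(t) / survival(t)      # h(t) = f(t) / (1 - F(t))

# The exponential distribution is "memoryless": its hazard is constant
for t in (0.5, 1.0, 5.0):
    assert abs(hazard(t) - lam) < 1e-9
```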
Inverse distribution function (quantile function)
We shall see rigorous use of this when we study Testing of Hypothesis. The quantile function
specifies, for a given probability 𝑝 in the probability distribution of a random variable, the
value 𝑥 at which the probability that the random variable is less than or equal to 𝑥 equals 𝑝.
It is also called the percent point function or inverse cumulative distribution function.
If the CDF 𝐹 is strictly increasing and continuous, then 𝐹 −1 (𝑝); 𝑝 ∈ [0, 1], is the unique
real number 𝑥 such that 𝐹(𝑥 ) = 𝑝, which defines the inverse distribution or quantile function
in such a case.
Some distributions do not have a unique inverse (for example, in the case where 𝑓𝑋 (𝑥) = 0
for all 𝑎 < 𝑥 < 𝑏 causing 𝐹(𝑥 ) to be constant). This problem can be solved by defining,
for 𝑝 ∈ [ 0, 1 ], the generalized inverse distribution function:
𝐹 −1 (𝑝) = inf{𝑥 ∈ ℝ: 𝐹(𝑥) ≥ 𝑝}.
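For a discrete variable, the CDF is a step function, so only the generalized inverse applies. A minimal sketch for a fair die:

```python
# Generalized inverse F^{-1}(p) = inf{x : F(x) >= p}, sketched for a fair die,
# whose step-function CDF has no classical inverse.
support = [1, 2, 3, 4, 5, 6]
cdf = {x: x / 6 for x in support}  # F(x) = x/6 on the support

def quantile(p: float) -> int:
    """Smallest x in the support with F(x) >= p."""
    return min(x for x in support if cdf[x] >= p)

assert quantile(0.50) == 3  # F(3) = 0.5 is the first value reaching 0.5
assert quantile(0.25) == 2  # F(1) = 1/6 < 0.25 <= F(2) = 1/3
```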
Median and Other Quartiles
Note that 𝑀𝑒𝑑𝑖𝑎𝑛 = 𝐹 −1 (0.50); the middlemost point in a distribution. Given a set of raw
(numerical) data arranged according to the order of magnitude, the median is usually
considered the middlemost observation when the number of data points is odd or, by
convention, the average of the two middlemost points when the number of observations is
even.
The first Quartile is 𝐹 −1 (0.25); the point below which probability is 25% and above which
probability is 75%. The third Quartile is 𝐹 −1 (0.75); the point below which probability is
75% and above which probability is 25%.
Quartiles are typical quantile measures; there are various other quantile measures of location,
such as percentiles, deciles, etc.
Quartile Deviation is a measure of dispersion or variability in the probability distribution.
Writing 𝑄3 = 𝐹 −1 (0.75); 𝑄1 = 𝐹 −1 (0.25); we define Quartile deviation (QD) as (𝑄3 −
𝑄1 )/2. The difference 𝑄3 − 𝑄1 is often called the Inter-Quartile Range (IQR).
Skewness measures the degree of asymmetry in the probability distribution or data, as the
case may be. A quick and robust measure of skewness is Bowley’s skewness (𝐵). Writing 𝑄2
as the second quartile or median, we define
𝐵 = [(𝑄3 − 𝑄2) − (𝑄2 − 𝑄1)] / (𝑄3 − 𝑄1) = (𝑄3 − 2𝑄2 + 𝑄1) / (𝑄3 − 𝑄1).
However, it may be misleading in some discrete distributions.
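The caveat about discrete distributions can be seen with the fair die. Using the generalized-inverse quartiles 𝑄1 = 2, 𝑄2 = 3, 𝑄3 = 5, a minimal sketch gives a nonzero 𝐵 even though the die is perfectly symmetric:

```python
def bowley(q1: float, q2: float, q3: float) -> float:
    """Bowley's skewness B = (Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)

# Fair die: quartiles from the generalized inverse CDF are 2, 3 and 5
b = bowley(2, 3, 5)
assert abs(b - 1 / 3) < 1e-12  # nonzero, despite a symmetric distribution
```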
Note on Quartile Measures
These measures are not based on the entire probability distribution or data, as the case may
be. As a result, they are highly robust: outliers in the tails do not influence them. However,
since they are not based on the entire distribution or data, the results are sometimes
surprising. This is because skewness is measured using just three locations of the
distribution, and further tail information is not considered. This reminds us that we must also
study measures based on the entire probability distribution or the data, as the case may be.
Average of a Random Variable (Expectation)
The average of a random variable is usually computed using the notion called “expectation”.
For a discrete random variable 𝑋 that takes values 𝑥1, 𝑥2, …, 𝑥𝑛 with probabilities
𝑝1, 𝑝2, …, 𝑝𝑛 respectively, the expectation of 𝑋, denoted by 𝐸(𝑋), is given by
𝐸(𝑋) = Σ (𝑖 = 1 to 𝑛) 𝑥𝑖 𝑝𝑖
For the binomial distribution, it is:
𝐸(𝑋) = Σ (𝑘 = 1 to 𝑛) 𝑘 C(𝑛, 𝑘) 𝑝^𝑘 (1 − 𝑝)^(𝑛−𝑘) = 𝑛𝑝.
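The identity 𝐸(𝑋) = 𝑛𝑝 can be verified by summing the pmf directly. A minimal Python sketch with illustrative values 𝑛 = 25, 𝑝 = 0.1:

```python
import math

def binom_pmf(k: int, n: int, p: float) -> float:
    """P[X = k] for a Binomial(n, p) random variable."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n, p = 25, 0.1  # illustrative parameter values

# E(X) = sum over k of k * P[X = k]
mean = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
assert abs(mean - n * p) < 1e-9  # agrees with the closed form np = 2.5
```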
Rationale Behind Expectation
For a variable 𝑋 that takes values 𝑥1, 𝑥2, …, 𝑥𝑛 with frequencies 𝑓1, 𝑓2, …, 𝑓𝑛 respectively,
the arithmetic mean of 𝑋 is given by
𝑋̄ = (1/𝑁) Σ (𝑖 = 1 to 𝑛) 𝑥𝑖 𝑓𝑖 = Σ (𝑖 = 1 to 𝑛) 𝑥𝑖 (𝑓𝑖 / 𝑁)
where 𝑁 = Σ (𝑖 = 1 to 𝑛) 𝑓𝑖.
Note that, according to the relative frequency approach to probability, 𝑓𝑖 / 𝑁 tends to the
probability of 𝑥𝑖, which may be denoted by 𝑝𝑖. That is, in the long run, 𝑋̄ tends to 𝐸(𝑋),
given by
𝐸(𝑋) = Σ (𝑖 = 1 to 𝑛) 𝑥𝑖 𝑝𝑖
For a discrete random variable 𝑋 that takes countably many values 𝑥1, 𝑥2, … with
probabilities 𝑝1, 𝑝2, … respectively, the expectation of 𝑋, denoted by 𝐸(𝑋), is given by
𝐸(𝑋) = Σ (𝑖 = 1 to ∞) 𝑥𝑖 𝑝𝑖, provided the sum is finite. In fact, the sum exists if and only if
𝐸(|𝑋|) < ∞.
For example, for a Poisson random variable 𝑋, the condition holds, and it can be shown that
𝐸(𝑋) = 𝜆.
General Rules for Expectation
For a discrete random variable 𝑋 that takes values 𝑥1, 𝑥2, … with probabilities 𝑝1, 𝑝2, …
respectively, the expectation of a regular function of 𝑋, say 𝑔(𝑋), is given by
𝐸(𝑔(𝑋)) = Σ (𝑖 = 1 to ∞) 𝑔(𝑥𝑖) 𝑃[𝑋 = 𝑥𝑖] = Σ (𝑖 = 1 to ∞) 𝑔(𝑥𝑖) 𝑝𝑖
provided 𝐸(|𝑔(𝑋)|) < ∞.
If the distribution of 𝑋 is absolutely continuous, 𝐸(𝑔(𝑋)) = ∫ 𝑔(𝑥) 𝑓(𝑥) 𝑑𝑥, where the
integral is over the support of 𝑋, provided, of course, 𝐸(|𝑔(𝑋)|) < ∞.
Example:
Suppose 𝑔(𝑋) = 𝑋^2. Then 𝐸[𝑔(𝑋)] = 𝐸(𝑋^2) = ∫ 𝑥^2 𝑓(𝑥) 𝑑𝑥 over the support of 𝑋,
provided the integral is finite.
Dispersion of a Random Variable
The variance of a random variable 𝑋 is given by
𝑉𝑎𝑟(𝑋) = 𝜎^2 = 𝐸(𝑋 − 𝜇)^2 = 𝐸(𝑋^2) − 𝜇^2.
Condition for the existence of variance: 𝐸(𝑋^2) < ∞. The standard deviation 𝜎 of a random
variable is the positive square root of the variance.
• 𝑋 ~ 𝐵𝑖𝑛𝑜𝑚𝑖𝑎𝑙 (𝑛, 𝑝), Var(𝑋) = np(1 − p)
• 𝑋 ~ 𝑃𝑜𝑖𝑠𝑠𝑜𝑛 (𝜆), Var(𝑋) = 𝜆
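Both variance formulas can be checked numerically via 𝑉𝑎𝑟(𝑋) = 𝐸(𝑋^2) − 𝜇^2. A minimal sketch (the Poisson sum is truncated at 𝑘 = 60, where the remaining tail mass is negligible for 𝜆 = 8):

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

n, p, lam = 25, 0.1, 8.0  # illustrative parameter values

# Var(X) = E(X^2) - mu^2, computed directly from each pmf
mu_b = sum(k * binom_pmf(k, n, p) for k in range(n + 1))
var_b = sum(k * k * binom_pmf(k, n, p) for k in range(n + 1)) - mu_b ** 2
assert abs(var_b - n * p * (1 - p)) < 1e-6   # np(1-p) = 2.25

mu_p = sum(k * poisson_pmf(k, lam) for k in range(60))
var_p = sum(k * k * poisson_pmf(k, lam) for k in range(60)) - mu_p ** 2
assert abs(var_p - lam) < 1e-6               # Var(X) = lambda = 8
```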
Application: Absenteeism in Call Center
Suppose the absenteeism on a particular day amongst Customer Service Representatives
(CSRs) deployed for attending inbound calls follows a binomial distribution. Further, suppose
that the probability that a CSR stays absent is 0.1. There are 25 CSRs in the call centre.
Suppose ten more call centre agents (CSAs) handle the responsibilities of sales promotion and
marketing through outbound calls, and their absenteeism on a particular day follows a
binomial distribution with parameter 0.08. Further, suppose that at any particular moment, the
number of incoming calls for the CSRs follows a Poisson distribution with a rate (average) of
8. Customers do not have to wait if at least one of the 25 CSRs (assuming no absence) is free,
as the call will automatically go to a free CSR. The company is often interested in the
following problems, which help them relook at various strategies:
I. What is the probability that 3 CSR will remain absent on a particular day?
II. What is the probability that no more than 3 CSR will remain absent on a particular
day?
III. What is the probability that three or more CSR will remain absent on a particular day?
IV. The company wants to know from the manager whether, in 95 percent of cases,
absenteeism amongst CSR is less than five on a particular day. What will be the
response?
V. What is the Median number of CSR Absentees ( that is, in about 50% of cases, up to
what number of CSRs will remain absent or in about 50% of cases, absenteeism will
be more than the given number)?
VI. What are the first and third quartiles of the distribution of CSRs’ Call centre
absenteeism?
VII. What is the Quartile Deviation of the CSRs’ Call centre absenteeism?
VIII. Find the Bowley’s Skewness in Call Center absenteeism by CSRs and comment.
IX. What is the average number of CSR absentees daily?
X. Can we say that in 95 percent of situations, absenteeism amongst CSA is less than 5?
XI. On average, how many CSA will remain absent on a particular day?
XII. Suppose the call centre manager adopts a strategy that if more than 3 CSRs remain
absent on a particular day, one CSA will be deputed as CSR. What is the probability
that one CSA has to act as a CSR on a particular day?
XIII. Further, suppose that the manager adopts a strategy to assign one CSA the CSR role
per two absent CSRs (and may ignore the absence of any one CSR) on a particular
day. What is the probability that two CSAs will have to act as CSRs on a particular
day?
XIV. At some point, the top management realizes that retaining the brand value and serving
the existing customers better is more important than trying to win a few new
customers. They directed the manager that 25 CSR should be used 24x7 as far as
practicable, even at the cost of sales promotion, if necessary. What is the probability
that there will be no one for sales promotions and marketing on a particular day?
XV. What is the probability that, at any point, just one customer has to wait?
XVI. What is the probability that, at any point, at least three customers must wait?
XVII. What is the probability that more than half of the CSR will remain free at any time?
XVIII. What are the standard deviations and variances of absenteeism among CSRs and
CSAs?
XIX. What are the standard deviations and variances of the variable denoting the number of
customers who try to reach a certain CSR at a given time?
XX. What is the median number of customers trying to reach a certain CSR at a given time?
XXI. What is the most likely value (mode) of the number of absentees among CSRs and
CSAs?
XXII. What is the most likely value of the number of customers who try to reach a certain
CSR at a given time?
XXIII. Suppose that after implementing stringent laws for casual leaves, the probabilities of
absenteeism boil down to 0.05 for both CSA and CSR. We assume that two groups
behave independently. Under the same policy as in Problem No. XIV, how will you
compute the probability that no one will be left for sales promotion and marketing on
a particular day?
XXIV. On a given day, the total number of absentees in two groups taken together was found
to be 5. What is the probability that 4 of them are CSRs?
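A few of the questions above (I, II, III, IV and IX) can be sketched numerically with the stated parameters (25 CSRs, each absent with probability 0.1). This is an illustrative outline, not a full solution set:

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

n_csr, p_csr = 25, 0.10  # 25 CSRs, each absent with probability 0.1

# I.   P(exactly 3 CSRs absent)
p_exactly_3 = binom_pmf(3, n_csr, p_csr)

# II.  P(no more than 3 CSRs absent)
p_at_most_3 = sum(binom_pmf(k, n_csr, p_csr) for k in range(4))

# III. P(three or more absent) = 1 - P(at most 2)
p_at_least_3 = 1 - sum(binom_pmf(k, n_csr, p_csr) for k in range(3))

# IX.  Average number of absent CSRs: E(X) = np
mean_absent = n_csr * p_csr  # 2.5

# IV.  Is absenteeism below 5 in at least 95% of cases?  Check P(X <= 4).
p_below_5 = sum(binom_pmf(k, n_csr, p_csr) for k in range(5))
meets_95_percent = p_below_5 >= 0.95  # False here: P(X <= 4) is about 0.90
```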