
Probability and Random Variables

Lecture Notes

2021

Rahul Mukherjee
Indian Institute of Management Calcutta

BE SURE TO BRING THESE NOTES TO EVERY CLASS STARTING FROM THE
FIRST CLASS

PROBABILITY AND RANDOM VARIABLES

Basic probability
Classical definition: A random experiment is one whose outcome cannot be predicted with
certainty. Examples include tossing a coin, rolling a die, etc. The set of all possible outcomes of a
random experiment is called the sample space, and denoted by Ω. Any particular outcome is a
sample point or simply a case. If we assume that the sample space is finite and that all cases are
equally likely, then according to the classical definition, the probability of an event A is defined as
$$P(A) = \frac{N(A)}{N},$$
where N(A) is the number of cases that favor A and N is the total number of cases.
Example 1. If a fair die with faces 1,…,6 is rolled twice, then altogether there are 36 cases,
11, 12,…,16, 21, 22,…, 26, …, 61, 62, …, 66,
i.e., N = 36. Now, if A denotes the event that the total score is a perfect square, then there are seven
cases that favor A, namely, 13, 22, 31, 36, 45, 54, 63, so that N(A) = 7, and P(A) =7/36. ■
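The count in Example 1 can be checked by listing the 36 cases directly. The following is a minimal Python sketch of the classical definition; the helper name `is_perfect_square` is ours and is not part of the notes.

```python
from fractions import Fraction
from math import isqrt

# All 36 equally likely cases when a die is rolled twice.
cases = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def is_perfect_square(k):
    """True if the integer k is a perfect square."""
    return isqrt(k) ** 2 == k

# Cases that favor A: the total score is a perfect square (4 or 9).
favorable = [c for c in cases if is_perfect_square(sum(c))]

# Classical definition: P(A) = N(A) / N.
prob_A = Fraction(len(favorable), len(cases))
print(len(favorable), prob_A)  # 7 7/36
```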
The classical definition has, however, two limitations. First, the sample space may not be finite.
Second, even if the sample space is finite, not all cases may be equally likely.
Modified definition: We now present a modified definition which is applicable to discrete sample
spaces. A discrete sample space Ω is one whose elements can be arranged in the form of a sequence
which may be finite or infinite. In this situation, we can write
Ω = {ω1, ω2,…},
where ω1, ω2,… are the sample points. Clearly, a finite sample space is discrete. As an example of
an infinite sample space which is discrete, suppose a coin is tossed till the first head appears. Then
Ω ={H, TH, TTH, …},
i.e., Ω = {ω1, ω2,…}, with ω1= H, ω2 = TH, ω3 = TTH, and so on. With a discrete sample space as
above, for each i , let pi be a nonnegative quantity associated with ωi such that p1 + p2 + … =1. The
quantities p1, p2,… etc. quantify past experience about the experimental process. Then, according to
our modified definition, the probability of an event A is given by
P(A) = ΣA pi,
where ΣA denotes the sum over those pi for which ωi ∈ A.
Example 2. From past experience, suppose the following are known about a die with faces 1,…,6:
(a) faces 1, 2, 3 are equally likely to occur,
(b) faces 4, 5, 6 are equally likely to occur,
(c) face 4 is twice as likely to occur as face 1.
Then we can take p1 = p2 = p3 = k and p4 = p5 = p6 = 2k, where k is some constant. As p1 + …+ p6
= 1, it follows that k = 1/9. Hence if A denotes the event of getting an odd number, then
P(A) = p1 + p3 + p5 = 4k = 4/9. ■
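The modified definition amounts to attaching a weight to each sample point and summing the weights over the event. A short Python sketch for Example 2 follows; the dictionary `weights` and the function `prob` are illustrative names of our own.

```python
from fractions import Fraction

# Relative weights implied by (a)-(c): faces 1-3 carry weight 1, faces 4-6 weight 2.
weights = {1: 1, 2: 1, 3: 1, 4: 2, 5: 2, 6: 2}
total = sum(weights.values())  # equals 9, so k = 1/9

# Normalize so that p1 + ... + p6 = 1.
p = {face: Fraction(w, total) for face, w in weights.items()}

def prob(event):
    """P(A) = sum of p_i over the sample points in A."""
    return sum(p[face] for face in event)

print(prob({1, 3, 5}))  # 4/9
```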
We next show how the modified definition covers the classical definition as a special case. To
that effect, suppose Ω is finite, say Ω = {ω1, ω2,…, ωN}, and let all cases be equally likely, i.e., p1 =
p2 = …= pN = 1/N. Then, for any event A, the modified definition yields
P(A) = ΣA pi = ΣA (1/N) = (1/N) +…+ (1/N) = N(A)/N,
which agrees with the classical definition. Note that in the sum shown above, (1/N) appears N(A)
times, because each ωi ∈ A, i.e., each of the N(A) cases favoring A, contributes a term.

Some simple facts: (a) 0 ≤ P(A) ≤ 1,
(b) P(Ω) = 1 and P(ϕ) = 0, where ϕ denotes the empty set,
(c) P(A^c) = 1 – P(A), where A^c denotes the complement of A,
(d) If A ⊆ B, then P(A) ≤ P(B),
(e) P(A∪B) = P(A) + P(B) – P(A∩B).
The last formula can be easily extended to the case of n events, say, B1,…, Bn. Let
S1 = P(B1) +…+ P(Bn),
S2 = P(B1∩B2) + P(B1∩B3) + … + P(Bn–1∩Bn), and so on.
Note that S1 involves n terms, S2 involves $\binom{n}{2}$ terms, etc. Then the following union-intersection
formula, known more formally as the Theorem of Total Probability, holds:
$$P(B_1 \cup \dots \cup B_n) = S_1 - S_2 + S_3 - \dots + (-1)^{n-1} S_n.$$
The above can be proved by induction on n. A shorter but deeper proof is available in Feller, vol. 1.
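The union-intersection formula can be verified numerically on a small finite sample space. The sketch below is only an illustration: it assumes one roll of a fair die, and the three events are hypothetical choices of ours.

```python
from fractions import Fraction
from itertools import combinations

omega = set(range(1, 7))  # one roll of a fair die (assumed for illustration)

def P(event):
    """Classical probability of an event (a subset of omega)."""
    return Fraction(len(event), len(omega))

def union_probability(events):
    """Compute S1 - S2 + S3 - ... + (-1)^(n-1) Sn."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        # S_k is the sum of probabilities of all k-at-a-time intersections.
        S_k = sum(P(set.intersection(*combo)) for combo in combinations(events, k))
        total += (-1) ** (k - 1) * S_k
    return total

events = [{1, 2}, {2, 3, 4}, {4, 5}]  # hypothetical events B1, B2, B3
print(union_probability(events), P(set.union(*events)))  # 5/6 5/6
```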
Conditional probability and independence: Consider events A and B such that P(B) > 0. Then the
conditional probability of A given B is defined as
$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$
In particular, if we go by the classical definition, then
$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{N(A \cap B)/N}{N(B)/N} = \frac{N(A \cap B)}{N(B)},$$
which is the proportion of cases favoring A among those that favor B.
Two events A and B are independent if
P(A∩B) = P(A) P(B).
In this situation, if P(B) > 0, then dividing both sides by P(B),
$$\frac{P(A \cap B)}{P(B)} = P(A), \quad \text{i.e.,} \quad P(A|B) = P(A),$$
i.e., the conditional probability of A given B is the same as the unconditional probability of A. In
other words, our perception of the probability of A is not affected by information about occurrence
of B, which agrees intuitively with the idea of independence. Indeed, if A and B are independent
and P(A) > 0, then we also have P(B|A) = P(B), with similar implications.
Independence of n events: Hereafter, by product rule we will mean that the probability of an
intersection equals the product of individual probabilities. Thus, two events A and B are
independent if the product rule holds for their intersection, i.e., if the probability of the intersection
A∩B is the same as the product of the two individual probabilities P(A) and P(B).
Consider now n events B1,…, Bn. They are mutually or jointly independent if the product rule
holds for all intersections involving two or more of them, e.g., with n = 3, if
P(B1∩B2) = P(B1)P(B2), P(B1∩B3) = P(B1)P(B3), P(B2∩B3) = P(B2)P(B3),
P(B1∩B2∩B3) = P(B1)P(B2)P(B3).
The n events B1,…, Bn are pairwise independent if the product rule holds for all two-at-a-time
intersections involving them, e.g., with n = 3, if
P(B1∩B2) = P(B1)P(B2), P(B1∩B3) = P(B1)P(B3), P(B2∩B3) = P(B2)P(B3).
In what follows, we shall be concerned only with joint independence. We note at this stage that
joint independence obviously implies pairwise independence, but the converse is not generally true
as the following example reveals.
Example 3. A ticket is drawn at random from a box containing four tickets 1, 2, 3 and 4. Then Ω =
{1, 2, 3, 4}. Consider the events B1 = {1, 4}, B2 = {2, 4} and B3 = {3, 4}. Then
B1∩B2 = B1∩B3 = B2∩B3 = B1∩B2∩B3 = {4}.
Hence, if we assume that all cases are equally likely and go by the classical definition of
probability, then P(B1) = P(B2) = P(B3) = 2/4 = 1/2, and
P(B1∩B2) = P(B1∩B3) = P(B2∩B3) = P(B1∩B2∩B3) = 1/4.
Therefore,
P(B1∩B2) = P(B1)P(B2), P(B1∩B3) = P(B1)P(B3), P(B2∩B3) = P(B2)P(B3),
but P(B1∩B2∩B3) = 1/4 ≠ 1/8 = P(B1)P(B2)P(B3).
In other words, the product rule holds for all two-at-a-time intersections but breaks down for the
three-at-a-time intersection. Thus the three events are pairwise but not mutually independent. ■
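The distinction in Example 3 can be confirmed mechanically by testing the product rule on every intersection. A small Python sketch, with function names of our own choosing:

```python
from fractions import Fraction
from itertools import combinations
from math import prod

omega = {1, 2, 3, 4}
B = [{1, 4}, {2, 4}, {3, 4}]  # the events B1, B2, B3 of Example 3

def P(event):
    """Classical probability when all four tickets are equally likely."""
    return Fraction(len(event), len(omega))

def product_rule_holds(events):
    """Check P(intersection) == product of the individual probabilities."""
    return P(set.intersection(*events)) == prod(P(e) for e in events)

# Pairwise independence: product rule for all two-at-a-time intersections.
pairwise = all(product_rule_holds(pair) for pair in combinations(B, 2))
# Mutual independence: product rule for all intersections of two or more events.
mutual = all(product_rule_holds(c) for k in (2, 3) for c in combinations(B, k))

print(pairwise, mutual)  # True False
```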
Mutually exclusive and exhaustive events: Two events A and B are mutually exclusive if P(A∩B) =
0. We now examine if two events can be simultaneously mutually exclusive and independent. If A
and B are two such events, then
0 = P(A∩B) [as they are mutually exclusive]
= P(A) P(B), [as they are independent]
so that at least one of P(A) and P(B) must vanish. Thus, except in the trivial case where at least one
of A and B is impossible, they cannot be simultaneously mutually exclusive and independent.
Consider n events B1,…, Bn. They are mutually exclusive if no two of them can occur together.
The n events are exhaustive if at least one of them is bound to occur, i.e., if P(B1∪…∪Bn) = 1. We
give an example to show that the two concepts just introduced have no connection with each other.
Example 4. Suppose a fair die with faces 1,…,6 is rolled once. Then Ω = {1, 2,…, 6}. Consider the
events B1 = {1, 2}, B2 = {3, 4}, B3 = {5, 6}, B4 = {1, 3, 4}. Then
(a) the events B1, B2 , B3 are mutually exclusive and exhaustive,
(b) the events B1, B2 are mutually exclusive but not exhaustive,
(c) the events B1, B3 , B4 are exhaustive but not mutually exclusive,
(d) the events B1, B4 are neither mutually exclusive nor exhaustive. ■
Bayes’ Theorem: Consider mutually exclusive and exhaustive events B1,…, Bn.
(a) Then for any event A, $P(A) = \sum_{i=1}^{n} P(A \cap B_i)$.
(b) If $P(B_i) > 0$ for every i, then $P(A) = \sum_{i=1}^{n} P(A|B_i) P(B_i)$.
(c) [Bayes’ Theorem] If, in addition, P(A) > 0, then for any fixed j,
$$P(B_j|A) = \frac{P(A|B_j) P(B_j)}{\sum_{i=1}^{n} P(A|B_i) P(B_i)}.$$
Proof. (a) Obvious.
(b) By (a), $P(A) = \sum_{i=1}^{n} \frac{P(A \cap B_i)}{P(B_i)} P(B_i) = \sum_{i=1}^{n} P(A|B_i) P(B_i)$.
(c) $$P(B_j|A) = \frac{P(B_j \cap A)}{P(A)} = \frac{P(A \cap B_j)}{P(A)}. \qquad (*)$$
Now, $P(A \cap B_j) = \frac{P(A \cap B_j)}{P(B_j)} P(B_j) = P(A|B_j) P(B_j)$. Using this in the numerator of (*) and the
result of (b) in the denominator of (*), part (c) follows.
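Parts (b) and (c) translate directly into a short computation. The sketch below is illustrative only: the function name `bayes_posterior` and the numbers (three hypothetical sources B1, B2, B3 with assumed prior and conditional probabilities) are ours and do not come from the notes.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Return [P(B_j | A)] given priors P(B_i) and likelihoods P(A | B_i)."""
    joint = [l * p for l, p in zip(likelihoods, priors)]  # P(A | B_i) P(B_i)
    p_A = sum(joint)                                      # part (b)
    return [j / p_A for j in joint]                       # part (c)

# Hypothetical inputs: P(B_i) and P(A | B_i) for i = 1, 2, 3.
priors = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]
likelihoods = [Fraction(1, 100), Fraction(2, 100), Fraction(3, 100)]

posterior = bayes_posterior(priors, likelihoods)
print(posterior)  # [Fraction(5, 17), Fraction(6, 17), Fraction(6, 17)]
```

The posterior probabilities always add up to 1, since the denominator in (c) is the sum of the numerators over j.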
Random variable: A random variable is a variable that takes its values depending on the outcome of
a random experiment. More formally, a random variable is a finite, real-valued, measurable
function defined over a sample space. The word “measurable” loosely means that one can make
probability statements on a random variable.
As an illustration, consider a coin tossing game where the player gets Rs 2 if the coin falls
heads, and pays Re 1 if the coin falls tails. Then Ω = {H, T}, and denoting the gain in Rupees by X,
we have X(H) = 2 and X(T) = –1. Thus X assigns a finite real value to each element of Ω, i.e., X is a
finite, real-valued function defined over Ω. Moreover, one can make probability statements about
X, e.g., if the coin is fair, then X equals 2 or –1, each with probability 1/2.
Discrete random variables
Preliminaries: A discrete random variable is one for which the set of possible values can be
arranged in a sequence which may be finite or infinite. Typically, we write a random variable in
upper case and its particular values in the corresponding lower case. Thus, consider a discrete
random variable X with possible values x1, x2 , …, having respective probabilities p1, p2 ,…, where
p1 + p2 + … =1. Then x1, x2 , …, together with p1, p2 ,…, define the probability distribution of X,
because they show how the total probability 1 is distributed over the possible values of X.
The mean or expectation of X is defined as
$$E(X) = \sum_i x_i p_i,$$
provided the sum is absolutely convergent, i.e., $\sum_i |x_i| p_i < \infty$. If E(X) involves an infinite sum,
then this ensures that the sum is well-defined with a value that does not depend on the ordering of
the terms. Hereafter, whenever we consider expectation, this requirement of absolute convergence
is assumed to be met, i.e., the expectation is assumed to exist. The same remark applies to variance
to be defined now. Write μ = E(X). Then the variance of X is defined as
$$V(X) = \sum_i (x_i - \mu)^2 p_i,$$
while the standard deviation of X is given by $SD(X) = \sqrt{V(X)}$. The following computational
formula will be very useful:
$$V(X) = E(X^2) - \{E(X)\}^2,$$
i.e., variance equals expectation of square minus square of expectation.
We next turn to specific discrete distributions.
1. Discrete uniform distribution: Here the possible values of X are 1,…,n, each with probability 1/n.
So, E(X) = 1(1/n) + 2(1/n) + …+ n(1/n) = (1/n)(1+2+…+n) = (n+1)/2. Similarly,
$$E(X^2) = 1^2(1/n) + 2^2(1/n) + \dots + n^2(1/n) = (1/n)(1^2 + 2^2 + \dots + n^2) = (n+1)(2n+1)/6.$$
Hence $V(X) = E(X^2) - \{E(X)\}^2 = (n+1)(2n+1)/6 - \{(n+1)/2\}^2 = (n^2-1)/12$, on simplification.
Discrete uniform distribution: $E(X) = (n+1)/2$, $V(X) = (n^2-1)/12$
Example 5. If a fair die with faces 1,…,6 is rolled once, then the score obtained, say X, has possible
values 1,…, 6, each with probability 1/6. Hence with n = 6 in the discrete uniform distribution,
E(X) = 7/2 and V(X) =35/12. ■
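The definitions of expectation and variance, together with the computational formula, can be coded once and reused for any finite discrete distribution. A minimal Python sketch (the function name `mean_and_variance` is ours), applied to the die of Example 5:

```python
from fractions import Fraction

def mean_and_variance(values, probs):
    """E(X) and V(X) = E(X^2) - {E(X)}^2 for a finite discrete distribution."""
    mean = sum(x * p for x, p in zip(values, probs))
    second_moment = sum(x * x * p for x, p in zip(values, probs))
    return mean, second_moment - mean ** 2

n = 6
values = list(range(1, n + 1))
probs = [Fraction(1, n)] * n  # discrete uniform: each value has probability 1/n

mean, var = mean_and_variance(values, probs)
print(mean, var)  # 7/2 35/12
```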
2. Two-point distribution: Here X has only two possible values a and b (a < b), having respective
probabilities p and q = 1 – p. Then $E(X) = ap + bq$ and $E(X^2) = a^2 p + b^2 q$, so that
$$V(X) = E(X^2) - \{E(X)\}^2 = a^2 p + b^2 q - (ap + bq)^2 = a^2 p(1-p) + b^2 q(1-q) - 2abpq = (b-a)^2 pq,$$
because p + q = 1.
Two-point distribution: $V(X) = (b-a)^2 pq$
3. Binomial distribution: Consider a sequence of independent trials such that each trial has two
possible outcomes, success and failure. In every trial, the probability of success is p and that of
failure is q = 1 – p. Let X be the number of successes in n such trials. Clearly, the possible values of
X are 0, 1,…, n. For any x in this range, we calculate the probability P(X = x). Since the trials are
independent, the probability of success in the first x trials and failure in the remaining n – x trials is
pp…pqq…q (here p is written x times and q is written n – x times) = $p^x q^{n-x}$. However, the x trials
leading to success can be any x of the n trials and hence can be chosen in $\binom{n}{x}$ ways, each of which
contributes $p^x q^{n-x}$ to P(X = x). Hence we obtain
$$P(X = x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \dots, n.$$
We shall use the shorthand f(x) for P(X = x) and call f(x) the probability mass function (pmf).
Thus, for the binomial distribution
$$f(x) = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, \dots, n.$$

In order to find the expectation and variance of the binomial distribution, for i = 1,…, n, define
the binary variable Xi which equals 1 if the ith trial yields a success, and 0 if the ith trial yields a
failure. Note that X = X1 +…+ Xn. Hence
$$E(X) = E(X_1) + \dots + E(X_n).$$
Furthermore,
$$V(X) = V(X_1) + \dots + V(X_n),$$
as X1,…, Xn are independent random variables, because they correspond to independent trials.
Now, for any Xi, the possible values are 1 and 0, which correspond to success and failure, and
hence have probabilities p and q, respectively. Therefore,
$$E(X_i) = 1 \cdot p + 0 \cdot q = p, \quad E(X_i^2) = 1^2 \cdot p + 0^2 \cdot q = p,$$
$$V(X_i) = E(X_i^2) - \{E(X_i)\}^2 = p - p^2 = pq.$$
Hence $E(X) = E(X_1) + \dots + E(X_n) = np$, and $V(X) = V(X_1) + \dots + V(X_n) = npq$.
Binomial distribution: E(X) = np V(X) = npq
The above ideas can be readily extended to the situation where the n trials continue to be
independent but the success probabilities vary from trial to trial. For i = 1,…, n, suppose the ith trial
leads to a success or a failure with probabilities pi and qi, respectively, where qi = 1 – pi. Then, as
before, for each i,
$$E(X_i) = p_i \quad \text{and} \quad V(X_i) = p_i q_i,$$
so that writing X for the number of successes in the n trials, we get
$$E(X) = p_1 + \dots + p_n, \quad V(X) = p_1 q_1 + \dots + p_n q_n.$$
Here we could find the expectation and variance of X without explicitly considering its distribution
which can be hard to obtain, especially for large n. There are many other similar situations where
one can find expectation and variance without much difficulty even though the distribution is
intractable. Can one then make probability statements on the basis of expectation and variance
alone? Chebyshev’s inequality, to be studied later, is a first step in this direction.
Example 6. Consider three coins which show heads with probabilities 0.3, 0.5 and 0.6, respectively.
If X is the number of heads when these coins are tossed together, then we can apply the above
formulae, with p1 = 0.3, p2 = 0.5, p3 = 0.6, and accordingly, q1 = 0.7, q2 = 0.5, q3 = 0.4, to obtain
E(X) = 0.3 + 0.5 + 0.6 = 1.4, and V(X) = (0.3)(0.7) + (0.5)(0.5) + (0.6)(0.4) = 0.7. ■
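For three coins the full distribution of X is still small enough to enumerate, so the shortcut formulae of Example 6 can be compared with a brute-force calculation. The Python sketch below uses exact fractions; variable names are ours.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Head probabilities 0.3, 0.5, 0.6 of Example 6, written as exact fractions.
p = [Fraction(3, 10), Fraction(1, 2), Fraction(3, 5)]

# Shortcut formulae: E(X) = sum of p_i, V(X) = sum of p_i q_i.
mean_formula = sum(p)
var_formula = sum(pi * (1 - pi) for pi in p)

# Brute force: build the pmf of X from all 2^3 head/tail patterns (1 = head).
pmf = {}
for pattern in product([0, 1], repeat=len(p)):
    prob = prod(pi if h else 1 - pi for pi, h in zip(p, pattern))
    pmf[sum(pattern)] = pmf.get(sum(pattern), 0) + prob

mean_direct = sum(x * f for x, f in pmf.items())
var_direct = sum(x * x * f for x, f in pmf.items()) - mean_direct ** 2

print(mean_formula, var_formula)  # 7/5 7/10
```

With n coins the enumeration has 2^n patterns, which is why the shortcut formulae matter for large n.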
4. Poisson distribution: This arises as a limiting case of the binomial distribution as n tends to
infinity and p tends to zero, such that the product λ = np remains bounded. The Poisson distribution
is specified by the pmf
$$f(x) = P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}, \quad x = 0, 1, 2, \dots$$
Poisson distribution: E(X) = λ, V(X) = λ

These can be heuristically explained from the binomial expectation and variance as follows:
(Binomial) E(X) = np = λ (Poisson)
(Binomial) V(X) = npq = np(1 – p) = np(1 – np/n) = λ(1 – λ/n) → λ as n → ∞ (Poisson)
In fact, the pmf f (x) of the Poisson distribution can be obtained from that of the binomial
distribution via a similar but more laborious limiting operation.
The Poisson distribution arises for small p and hence is useful for modeling rare events.
Consider for example, the number of major earthquakes in a region over a ten year period. Divide
this period into small intervals, say each consisting of a day. There are 3652 days, taking care of
two leap years. Each of these n = 3652 days gives a trial: there is either an earthquake on that day
or none. Of course, the probability, p, of an earthquake on any particular day is very small. Thus,
here n is large and p is small, and the number of earthquakes over the ten year period can be
supposed to have the Poisson distribution.
As another example, consider the number of incoming calls at a telephone over a one hour
period. Divide this period into small intervals, say each of one second duration. There are n = 3600
seconds, each giving a trial: there is either an incoming call at that second or none, the probability,
p, of an incoming call at any particular second being very small. So, here again n is large and p is
small, and it is reasonable to suppose that the number of incoming calls over the one hour period
has the Poisson distribution.
In practice, numerical studies show that the Poisson distribution can be safely used for
approximating a binomial distribution when n ≥ 50 and p ≤ 0.03.
Example 7. In a large population, the incidence rate of a disease is 1%. Let X be the number of
persons having the disease among 100 people randomly chosen from this population. Each of these
n = 100 people corresponds to a trial: he/she may or may not have the disease, the probability of
having the disease being p = 0.01. Furthermore, these trials are independent because the population
size is large and hence the 100 randomly chosen people behave independently. Thus, X has the
binomial distribution with n = 100 and p = 0.01. Since n is large and p is small, the distribution of X
may be supposed to be Poisson with parameter λ = np = 1. As an immediate application, the
probability that at most one of the 100 chosen people has the disease is given by
$$P(X \le 1) = P(X = 0) + P(X = 1) = f(0) + f(1) = \frac{e^{-\lambda} \lambda^0}{0!} + \frac{e^{-\lambda} \lambda^1}{1!} = (1 + \lambda) e^{-\lambda} = 2e^{-1} = 0.7358.$$ ■
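The quality of the Poisson approximation in Example 7 can be judged by computing the exact binomial probability alongside it. A minimal Python sketch, with helper names of our own:

```python
from math import comb, exp, factorial

def binomial_pmf(x, n, p):
    """f(x) = C(n, x) p^x q^(n-x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

def poisson_pmf(x, lam):
    """f(x) = exp(-lambda) lambda^x / x!."""
    return exp(-lam) * lam**x / factorial(x)

n, p = 100, 0.01
exact = sum(binomial_pmf(x, n, p) for x in (0, 1))    # binomial P(X <= 1)
approx = sum(poisson_pmf(x, n * p) for x in (0, 1))   # Poisson with lambda = np

print(round(exact, 4), round(approx, 4))  # 0.7358 0.7358
```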
5. Hypergeometric distribution: In a finite population of N units, suppose a proportion, p, of units
fall in a specific category. Then in the population, there are Np units that fall in this category, and
N – Np = Nq units that do not. Here q = 1 – p. A random sample of n units is drawn from the
population without replacement. Let X be the number of units in the sample that fall in the category
under consideration. Then
$$P(X = x) = \frac{\binom{Np}{x} \binom{Nq}{n-x}}{\binom{N}{n}},$$
where the range of x is such that the combinations are well defined. This is called the
hypergeometric distribution because the above expression for P(X = x) is linked with
hypergeometric series in pure mathematics.
Hypergeometric distribution: $E(X) = np$, $V(X) = npq \left( \dfrac{N-n}{N-1} \right)$
The expectation of the hypergeometric distribution is the same as that of the binomial distribution.
Moreover, the hypergeometric variance tends to the binomial variance as N tends to infinity, with n,
p and q held fixed. This is anticipated because in that case the distinction between without and with
replacement sampling gets blurred and the hypergeometric distribution itself converges to the
binomial distribution.
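The stated expectation and variance of the hypergeometric distribution can be checked exactly for any particular population. The sketch below assumes a hypothetical population with N = 20, Np = 8 and n = 5; these numbers and the name `hypergeometric_pmf` are ours.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(x, N, K, n):
    """P(X = x), where K = Np units fall in the category and n units are drawn."""
    return Fraction(comb(K, x) * comb(N - K, n - x), comb(N, n))

N, K, n = 20, 8, 5            # hypothetical population and sample sizes
p = Fraction(K, N)
q = 1 - p

# Range of x for which the combinations are well defined.
support = range(max(0, n - (N - K)), min(n, K) + 1)
pmf = {x: hypergeometric_pmf(x, N, K, n) for x in support}

mean = sum(x * f for x, f in pmf.items())
var = sum(x * x * f for x, f in pmf.items()) - mean ** 2

print(mean, var)  # 2 18/19
```

Here np = 2 and npq(N – n)/(N – 1) = (6/5)(15/19) = 18/19, in agreement with the formulae.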
6. Geometric distribution: In the setup of the binomial distribution, suppose the trials are continued
till the first success is obtained, and let X be the number of failures before the first success. Then the
possible values of X are 0, 1, 2,…, and for any x in this range,
$$P(X = x) = qq \cdots qp \;\; [q \text{ written } x \text{ times}] = q^x p.$$
We next calculate the expectation and variance of the geometric distribution. Note that
$$E(X) = \sum_{x=0}^{\infty} x P(X = x) = \sum_{x=0}^{\infty} x q^x p = 0 \cdot p + 1 \cdot qp + 2q^2 p + 3q^3 p + \dots = qpS,$$
where
$$S = 1 + 2q + 3q^2 + \dots$$
$$qS = q + 2q^2 + \dots$$
(subtracting) $(1 - q)S = 1 + q + q^2 + \dots = 1/(1 - q)$, i.e., $S = 1/(1 - q)^2 = 1/p^2$, and $E(X) = qpS = q/p$.
Next,
$$E(X^2) = \sum_{x=0}^{\infty} x^2 P(X = x) = \sum_{x=0}^{\infty} x^2 q^x p = 0^2 p + 1^2 qp + 2^2 q^2 p + 3^2 q^3 p + 4^2 q^4 p + \dots = qpT,$$
where
$$T = 1 + 4q + 9q^2 + 16q^3 + \dots$$
$$qT = q + 4q^2 + 9q^3 + \dots$$
(subtracting)
$$(1 - q)T = 1 + 3q + 5q^2 + 7q^3 + \dots$$
$$q(1 - q)T = q + 3q^2 + 5q^3 + \dots$$
(subtracting)
$$(1 - q)^2 T = 1 + 2q + 2q^2 + 2q^3 + \dots = 1 + 2q(1 + q + q^2 + \dots) = 1 + 2q/(1 - q) = (1 + q)/(1 - q),$$
i.e., $T = (1 + q)/(1 - q)^3 = (1 + q)/p^3$, and $E(X^2) = qpT = q(1 + q)/p^2$. Therefore,
$$V(X) = E(X^2) - \{E(X)\}^2 = q(1 + q)/p^2 - (q/p)^2 = q/p^2.$$
Geometric distribution: $E(X) = q/p$, $V(X) = q/p^2$
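The closed forms for the geometric distribution can be checked numerically by truncating the infinite sums. The sketch below assumes p = 0.3 purely for illustration; the truncation point of 2000 terms is also our choice, and the neglected tail is far below floating-point precision.

```python
p = 0.3          # assumed success probability (illustrative)
q = 1 - p
terms = 2000     # truncation point for the infinite sums

# pmf of the number of failures before the first success: q^x p.
pmf = [q**x * p for x in range(terms)]

mean = sum(x * f for x, f in enumerate(pmf))
second_moment = sum(x * x * f for x, f in enumerate(pmf))
var = second_moment - mean ** 2

print(round(mean, 6), round(q / p, 6))      # both approximately 2.333333
print(round(var, 6), round(q / p**2, 6))    # both approximately 7.777778
```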
7. Negative binomial distribution: In the setup of the binomial distribution, suppose the trials are
continued till the rth success is obtained, where r is a given positive integer. Let X be the number of
failures before the rth success. Then the possible values of X are 0, 1, 2,…, and
$$P(X = x) = \binom{r+x-1}{x} q^x p^r, \quad x = 0, 1, 2, \dots$$
The name negative binomial arises because the above is proportional to the terms in the expansion
of $(1 - q)^{-r}$. The case r = 1 corresponds to the geometric distribution. In order to understand the
form of P(X = x), note that x failures and r successes occur in any particular order with probability
$q^x p^r$, and that there are $\binom{r+x-1}{x}$ possible orders, since the (r + x)th trial must yield a success
and the x failures can occupy any x of the first r + x – 1 trials.
Negative binomial distribution: $E(X) = rq/p$, $V(X) = rq/p^2$
Arguments similar to those for the binomial distribution explain the forms of E(X) and V(X) as
stated above. Note that the number of failures, X, preceding the rth success can be expressed as
X = X1 +…+ Xr ,
where X1 is the number of failures before the first success, X2 is the number of failures after the first
but before the second success, X3 is the number of failures after the second but before the third
success, and so on. Clearly, X1 has the geometric distribution. Similarly, X2, X3,… have the
geometric distribution because the process regenerates itself after the first, second, … successes,
respectively. So, $E(X_i) = q/p$ and $V(X_i) = q/p^2$, for every i. Moreover, X1, X2, X3,… are
independent, because so are the trials. Hence
$$E(X) = E(X_1) + \dots + E(X_r) = rq/p, \quad V(X) = V(X_1) + \dots + V(X_r) = rq/p^2.$$
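The negative binomial pmf and its moments can be checked numerically in the same way as for the geometric case. The values r = 3 and p = 0.4 below are assumed for illustration only, and the sums are truncated at 2000 terms.

```python
from math import comb

def negative_binomial_pmf(x, r, p):
    """P(X = x) = C(r + x - 1, x) q^x p^r, with X the failures before the rth success."""
    return comb(r + x - 1, x) * (1 - p)**x * p**r

r, p = 3, 0.4    # assumed values (illustrative)
q = 1 - p
pmf = [negative_binomial_pmf(x, r, p) for x in range(2000)]

mean = sum(x * f for x, f in enumerate(pmf))
var = sum(x * x * f for x, f in enumerate(pmf)) - mean ** 2

print(round(mean, 6), round(var, 6))  # approximately 4.5 and 11.25
```

Setting r = 1 in `negative_binomial_pmf` returns the geometric probabilities q^x p, as noted in the text.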

Continuous random variables
Preliminaries: An (absolutely) continuous random variable X is specified by a probability density
function (pdf) f (x) (≥ 0) such that for any a and b (a < b),
$$P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b) = \int_a^b f(x)\,dx.$$
In the above, we allow $a = -\infty$ or $b = \infty$. Clearly,
$$\int_{-\infty}^{\infty} f(x)\,dx = P(-\infty < X < \infty) = 1.$$
Also, P(X = a) = 0, for any fixed a. This can be explained as
$$P(X = a) = P(a \le X \le a) = \int_a^a f(x)\,dx = 0.$$
Expectation, variance etc. can be defined as in the discrete case with sums replaced by integrals,
e.g.,
$$E(X) = \int_{-\infty}^{\infty} x f(x)\,dx,$$
provided the integral is absolutely convergent. For any fixed x, let
$$F(x) = P(X \le x) = \int_{-\infty}^{x} f(u)\,du.$$
The function F is called the cumulative distribution function or simply, the distribution function of
X. Note that $dF(x)/dx = f(x)$, provided the derivative exists. Any solution to F(x) = 1/2 is called a
median of X. Also, any maximizer of f (x) is called a mode of X. We next turn to specific
continuous distributions.
1. Continuous uniform distribution: The continuous uniform distribution over the range [a, b],
where a < b, is given by the pdf
$$f(x) = \begin{cases} \dfrac{1}{b-a} & \text{if } a \le x \le b, \\ 0 & \text{otherwise.} \end{cases}$$
This is also known as the rectangular distribution as the graph of f (x) resembles a rectangle.
Clearly, any x in the range [a, b] maximizes f (x) and hence is a mode.
For r = 1, 2,
$$E(X^r) = \int_{-\infty}^{\infty} x^r f(x)\,dx = \int_a^b x^r f(x)\,dx = \frac{1}{b-a} \int_a^b x^r\,dx = \frac{1}{b-a} \left[ \frac{x^{r+1}}{r+1} \right]_a^b = \frac{b^{r+1} - a^{r+1}}{(b-a)(r+1)}.$$
So, $E(X) = (b+a)/2$, $E(X^2) = (b^2 + ab + a^2)/3$, and $V(X) = E(X^2) - \{E(X)\}^2 = (b-a)^2/12$.

Continuous uniform distribution: $E(X) = (b+a)/2$, $V(X) = (b-a)^2/12$


We also have, for any interval I,
$$P(X \in I) = \frac{\text{length of } I \cap [a, b]}{b-a}.$$
The above can be interpreted as an extension of the classical definition of probability to the
continuous case where, under the uniform distribution, the probability of an interval I is seen to
equal the length of the relevant part of I divided by the length of [a, b].
Example 8. Let X have the continuous uniform distribution over [6, 10]. Then
$$P(5 \le X \le 8) = \int_5^8 f(x)\,dx = \int_6^8 f(x)\,dx = \int_6^8 \frac{1}{10-6}\,dx = \frac{8-6}{10-6} = \frac{\text{length of } [5, 8] \cap [6, 10]}{10-6}.$$ ■
2. Exponential distribution: This is specified by the pdf

$$f(x) = \begin{cases} \dfrac{1}{\theta} e^{-x/\theta} & \text{if } x \ge 0, \\ 0 & \text{otherwise.} \end{cases}$$
Here θ is a positive-valued parameter. It is obvious (for instance, by drawing the graph of f(x))
that f (x) is maximum at x = 0. Hence this distribution has mode 0. The following fact will be
useful in finding E(X) and V(X).
Fact: For any nonnegative integer r, $\int_0^{\infty} y^r e^{-y}\,dy = r!$.
This can be proved either by induction or by using the gamma function.
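Now, for r = 1, 2,
$$E(X^r) = \int_{-\infty}^{\infty} x^r f(x)\,dx = \int_0^{\infty} x^r f(x)\,dx = \int_0^{\infty} x^r \frac{1}{\theta} e^{-x/\theta}\,dx \quad [\text{Put } y = x/\theta, \text{ i.e., } x = \theta y \text{ and } dx = \theta\,dy]$$
$$= \int_0^{\infty} (\theta y)^r \frac{1}{\theta} e^{-y}\,\theta\,dy = \theta^r \int_0^{\infty} y^r e^{-y}\,dy = \theta^r (r!).$$
So, $E(X) = \theta$, $E(X^2) = 2\theta^2$, and $V(X) = E(X^2) - \{E(X)\}^2 = \theta^2$.

Exponential distribution: $E(X) = \theta$, $V(X) = \theta^2$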


We also have, for any fixed positive x,
$$P(X > x) = \int_x^{\infty} f(u)\,du = \int_x^{\infty} \frac{1}{\theta} e^{-u/\theta}\,du = \left[ -e^{-u/\theta} \right]_x^{\infty} = e^{-x/\theta}.$$

For any fixed positive x, $P(X > x) = e^{-x/\theta}$


From the above, it is also possible to find the median of the exponential distribution. For any x > 0,
$F(x) = P(X \le x) = 1 - P(X > x) = 1 - e^{-x/\theta}$.
Hence F(x) = 1/2 if and only if $e^{-x/\theta} = 1/2$, i.e., $x = \theta \log_e 2$. Thus, the exponential distribution
has median $\theta \log_e 2$.
Example 9. The distribution of life, in hours, of an electrical component is known to be exponential.
If the component survives for more than 40 hours with probability 0.64, then what is the probability
that it fails within 20 hours? Denote the life, in hours, of the component by X, which has the
exponential distribution, say with parameter θ. Then 0.64 = P(X > 40) = exp(– 40/θ). Hence the
probability that the component fails within 20 hours is given by
$$P(X \le 20) = 1 - P(X > 20) = 1 - \exp(-20/\theta) = 1 - \{\exp(-40/\theta)\}^{1/2} = 1 - (0.64)^{1/2} = 1 - 0.8 = 0.2.$$ ■
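Example 9 can also be worked by first solving for θ and then evaluating the survival probability. A short Python sketch follows; the function name `survival` is ours.

```python
from math import exp, log

# From P(X > 40) = exp(-40/theta) = 0.64 we get theta = -40 / ln(0.64).
theta = -40 / log(0.64)

def survival(x, theta):
    """P(X > x) = exp(-x/theta) for the exponential distribution, x > 0."""
    return exp(-x / theta)

p_fail_20 = 1 - survival(20, theta)   # P(X <= 20)
median = theta * log(2)               # solves F(x) = 1/2

print(round(p_fail_20, 4))  # 0.2
```

The route taken in the text avoids computing θ at all, which is neater by hand; the code simply confirms the same answer.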
3. Normal distribution: This is specified by the pdf
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left\{ -\frac{(x-\mu)^2}{2\sigma^2} \right\}, \quad -\infty < x < \infty.$$
Here μ and σ are parameters; μ can be any real number and σ is positive. These parameters have
natural interpretation. One can show that E(X) = μ and V(X) = $\sigma^2$, i.e., SD(X) = σ. It is easy to
see that the graph of f(x) is symmetric about μ and attains a peak at μ. Thus both the median and
the mode equal μ. Indeed, f(x) depends on x only through $(x - \mu)^2$ which appears in the
exponent with a minus sign. From this, it is evident that f(x) is maximum at x = μ.
In particular, if μ = 0 and σ = 1, then we get the standard normal distribution. Then f(x)
reduces to
$$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left( -\tfrac{1}{2} x^2 \right), \quad -\infty < x < \infty.$$
Let
$$\Phi(x) = \int_{-\infty}^{x} \phi(u)\,du.$$
Extensive tables of Φ(x) are available in the literature; see, e.g., the well-known Biometrika tables.
The following fact, arising as a consequence of the symmetry of ϕ(x) about 0, is sometimes helpful
while reading these tables.
$$\Phi(-x) = 1 - \Phi(x)$$
Consider now the case of general μ and σ. Then the following facts are useful in finding
probabilities associated with the normal distribution.
For any a and b (a < b),
(1) $P(a \le X \le b) = P(a < X \le b) = P(a \le X < b) = P(a < X < b) = \Phi\left(\frac{b-\mu}{\sigma}\right) - \Phi\left(\frac{a-\mu}{\sigma}\right)$.
(2) $P(X \le b) = P(X < b) = \Phi\left(\frac{b-\mu}{\sigma}\right)$.
(3) $P(X \ge a) = P(X > a) = 1 - \Phi\left(\frac{a-\mu}{\sigma}\right)$.
Note that (2) and (3) readily follow from (1). Thus, (2) is obtained from (1) if we allow a → –∞
and note that Φ(x) tends to 0 as x → –∞. Similarly, (3) is obtained from (1) if we allow b → ∞
and note that Φ(x) tends to 1 as x → ∞.
In the present day computing environment, the above probabilities can be computed using
standard software. For example, these can be found by the command “normcdf ” in MATLAB:
(1) P(a ≤ X ≤ b) = P(a < X ≤ b) = P(a ≤ X < b) = P(a < X < b)
= normcdf(b, μ, σ) – normcdf(a, μ, σ),
(2) P( X ≤ b) = P(X < b) = normcdf(b, μ, σ),
(3) P(X ≥ a) = P( X > a) = 1 – normcdf(a, μ, σ).
The tables are, however, still very useful because they are easily available and portable.
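For readers without MATLAB, the same three probabilities can be computed in Python. The sketch below defines a function of our own, deliberately also called `normcdf` to mirror the MATLAB command, using the standard identity Φ(z) = {1 + erf(z/√2)}/2; it is an illustration, not a replacement for the tables.

```python
from math import erf, sqrt

def normcdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal distribution with mean mu and SD sigma."""
    return 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2))))

def prob_between(a, b, mu, sigma):
    """Fact (1): P(a <= X <= b) = Phi((b - mu)/sigma) - Phi((a - mu)/sigma)."""
    return normcdf(b, mu, sigma) - normcdf(a, mu, sigma)

print(round(normcdf(1.2), 4))                  # 0.8849
print(round(prob_between(-3, 3, 0.0, 1.0), 4))  # 0.9973
```

The standard library class `statistics.NormalDist` (Python 3.8 and later) provides an equivalent `cdf` method.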
The normal distribution has many applications. Empirical studies show that it works well in such
diverse fields as biology, psychology, industry, economics, etc., either directly or via suitable
transformations. Moreover, it provides excellent approximations to other distributions and serves as
a very powerful computational tool. To motivate the ideas, consider the following example.
Example 10. If a fair coin is tossed 1000 times, then what is the probability that the number of
heads lies between 495 and 505? On the face of it, this is a simple problem. One only needs to find
P(X = 495) + P(X = 496) + … + P(X = 505),
where X has the binomial distribution with n = 1000 and p = 1/2. Now, in order to get P(X = 495),
one needs to calculate $\binom{1000}{495}$, a number with roughly 300 digits, which is impractical by hand and awkward in ordinary floating-point arithmetic. The same difficulty will
arise for the other terms in the sum above. Here n is large but p is not small, and so, the Poisson
approximation does not work. However, the normal distribution will come to our rescue. ■
A. Normal approximation to the binomial distribution: Let X have the binomial distribution with
parameters n and p. Then for large n, say n ≥ 30, the distribution of X is approximately normal with
$\mu = np$ and $\sigma = \sqrt{npq}$. This is the essence of the De Moivre-Laplace Limit Theorem. Note that
under this approximation, the expectation and standard deviation of X remain unaltered.
Example 10 (continued). Here the distribution of X is binomial with n = 1000 and p = 1/2. As n is
large, X is approximately normal with μ = (1000)(1/2) = 500 and $\sigma = \sqrt{(1000)(1/2)(1/2)} = 15.81$.
Therefore, using the normal tables, it is easy to obtain the required probability as
$$P(495 \le X \le 505) = \Phi\left(\frac{505-500}{15.81}\right) - \Phi\left(\frac{495-500}{15.81}\right) = \Phi(0.32) - \Phi(-0.32)$$
$$= \Phi(0.32) - \{1 - \Phi(0.32)\} = 2\Phi(0.32) - 1 = 2(0.6255) - 1 = 0.25.$$ ■
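With exact integer arithmetic, the binomial sum in Example 10 can also be evaluated directly, which gives a check on the normal approximation. The Python sketch below is an illustration under that assumption; `Phi` is our own helper for the standard normal distribution function.

```python
from math import comb, erf, sqrt

def Phi(z):
    """Standard normal distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

n = 1000
# Exact: with p = 1/2, P(X = x) = C(n, x) / 2^n. Python integers are exact,
# so the sum of binomial coefficients is computed without rounding.
exact = sum(comb(n, x) for x in range(495, 506)) / 2**n

# Normal approximation as in the text, with mu = np and sigma = sqrt(npq).
mu, sigma = n * 0.5, sqrt(n * 0.25)
approx = Phi((505 - mu) / sigma) - Phi((495 - mu) / sigma)

print(round(exact, 3), round(approx, 3))
```

The two values differ in the second decimal place. Much of the gap disappears if the interval is widened by half a unit on each side (the usual continuity correction), a refinement not used in the calculation above.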
At this stage, one may wonder about a choice between the normal and Poisson approximations
while working with a binomial distribution with large n and small p. Empirical studies suggest that
the Poisson approximation is better if p ≤ 0.03; on the other hand, if p > 0.03, then it is advisable to
use the normal approximation.
B. Central Limit Theorem in simplest form: As seen earlier, the expectation and variance of a
binomially distributed random variable X can be obtained readily via the representation X = X1
+…+ Xn in terms of binary variables X1,…, Xn. This representation can also be used to explain the
normal approximation to the binomial distribution. It is possible to generalize this idea. Let X1,…,
Xn be independent random variables having the same distribution such that $E(X_i) = m$ and
$V(X_i) = \delta^2$, for each i. Then for large n, say n ≥ 30, the distribution of the sum S = X1 +…+ Xn is
approximately normal with $\mu = nm$ and $\sigma = \delta\sqrt{n}$. This is the essence of what is known as the
Lindeberg-Levy form of the central limit theorem. The expressions for μ and σ match our
intuition, because
$$E(S) = E(X_1) + \dots + E(X_n) = nm, \quad V(S) = V(X_1) + \dots + V(X_n) = n\delta^2, \quad \text{i.e.,} \quad SD(S) = \delta\sqrt{n}.$$
Example 11. The waiting time, in minutes, at a bus stop has the continuous uniform distribution
over [0, 10]. How does one calculate the probability that the mean waiting time, over a period of 48
days, does not exceed 5 min 30 sec? Let X1,…, X48 denote the waiting times, in minutes, for the 48
days. These may be supposed to be independent, and each of these has the continuous uniform
distribution over [0, 10]. Hence $E(X_i) = (10 + 0)/2 = 5$ (= m) and $V(X_i) = (10 - 0)^2/12 = 25/3$ (= $\delta^2$),
for each i. So, by the central limit theorem with n = 48, the sum S = X1 +…+ X48 is
approximately normal with μ = (48)(5) = 240 and $\sigma = \sqrt{(25/3)(48)} = 20$. If we now write M for
the mean of X1,…, X48, then M = S/48 and using the normal tables,
P(M ≤ 5.5) = P(S/48 ≤ 5.5) = P(S ≤ 264) = Φ((264 – 240)/20) = Φ(1.2) = 0.885. ■
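The arithmetic of Example 11 can be checked with a short Python sketch. It is an illustration only: the variable names, the simulation size and the random seed are assumptions, and Φ is computed from the error function rather than read from tables. The simulation simply draws 48 uniform waiting times many times over and records how often their mean is at most 5.5.

```python
import math
import random


def normal_cdf(z):
    """Standard normal distribution function Φ(z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


a, b, n = 0.0, 10.0, 48           # uniform range [a, b] and number of days
m = (a + b) / 2                   # E(Xi) = 5
var = (b - a) ** 2 / 12           # V(Xi) = 25/3
mu = n * m                        # mean of S, namely 240
sigma = math.sqrt(n * var)        # SD of S, namely 20
prob = normal_cdf((n * 5.5 - mu) / sigma)  # Φ(1.2)


def simulate(trials=20000, seed=0):
    """Estimate P(M <= 5.5) by simulation (trials and seed are arbitrary choices)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        mean_wait = sum(rng.uniform(a, b) for _ in range(n)) / n
        if mean_wait <= 5.5:
            hits += 1
    return hits / trials


print("normal approximation:", round(prob, 4))
print("simulated proportion:", simulate())
```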
C. Chebyshev’s inequality: Let X be any random variable with E(X) = μ and V(X) = σ². Then for
any t > 1,
P(|X – μ| < tσ) ≥ 1 – 1/t².
Here we do not consider t ≤ 1, for then the right-hand side becomes negative or zero, and the
inequality becomes noninformative. On the other hand, for t = 3, it follows that X lies in the interval
μ ± 3σ, of length 6σ, with probability at least 8/9; for the normal distribution, this probability is
actually as high as 0.9973. This has implications in quality management. If a process is under
control, then the value of a quality characteristic must lie in the interval μ ± 3σ for at least a
proportion 8/9 of the items produced. If the actual proportion within these limits falls short of 8/9,
then there is reason to suspect that there has been a shift in the process mean or the process standard
deviation, and this needs to be investigated.
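As a small numerical illustration of Chebyshev's inequality, the Python sketch below tabulates the lower bound 1 – 1/t² next to the corresponding probability 2Φ(t) – 1 for a normal distribution. The function names are assumptions made for this illustration.

```python
import math


def chebyshev_lower_bound(t):
    """Chebyshev lower bound 1 - 1/t^2 on the probability of lying within t SDs of the mean."""
    if t <= 1:
        raise ValueError("the bound is informative only for t > 1")
    return 1.0 - 1.0 / t**2


def normal_within(t):
    """Probability that a normal variable lies within t SDs of its mean: 2Φ(t) - 1."""
    return math.erf(t / math.sqrt(2.0))


for t in (1.5, 2, 3):
    print(t, round(chebyshev_lower_bound(t), 4), round(normal_within(t), 4))
```

The bound holds for every distribution with finite variance, which is why it is much weaker than the exact normal figure.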
Correlation coefficient and independence
Let X and Y be jointly distributed random variables. Then their covariance is defined as
cov(X, Y) = E{(X – μ1)(Y – μ2)},
where μ1 = E(X) and μ2 = E(Y). Analogously to the corresponding result for variance, the following
computational formula holds for covariance:
cov(X, Y) = E(XY) – E(X)E(Y),
i.e., covariance equals expectation of product minus product of expectations.
The correlation coefficient between X and Y is defined as
ρ = cov(X, Y) / √(V(X)V(Y)).
The correlation coefficient is a unit-free pure number which lies between –1 and 1 and measures
the linear relationship between X and Y. The value ρ = 1 is indicative of perfect positive linear
relationship, while ρ = –1 is indicative of perfect negative linear relationship. Of course, ρ = 0
indicates lack of linear relationship.
Two random variables X and Y are independent if
P(X ≤ x and Y ≤ y) = P(X ≤ x)P(Y ≤ y), for every x and y.
In the discrete case, the above definition reduces to
P(X = x and Y = y) = P(X = x)P(Y = y), for every x and y,
which amounts to independence of the events X = x and Y = y, for every x and y. One can check that
if X and Y are independent, then E(XY) = E(X)E(Y), i.e., the product rule holds not only for
probabilities but also for expectation. As a result, if X and Y are independent, then cov(X, Y) = 0,
and hence ρ = 0, i.e., independence implies zero correlation. This is anticipated because
independence rules out any kind of relationship including linear relationship that ρ measures.
The converse, however, is not true, i.e., zero correlation does not imply independence. This is
again natural because ρ = 0 indicates lack of linear relationship, but other kinds of relationship may
exist. The following example serves as an illustration.
Example 12. Let X be a discrete random variable with possible values –1, 0 and 1, each having
probability 1/3. Define Y = X². Clearly, X and Y have a nonlinear relationship. Now,
E(X) = (–1)(1/3) + 0(1/3) + 1(1/3) = 0, E(XY) = E(X³) = (–1)³(1/3) + 0³(1/3) + 1³(1/3) = 0,
so that cov(X, Y) = E(XY) – E(X)E(Y) = 0, and ρ = 0. However, X and Y are not independent. For
example, P(X = 0) = 1/3 and P(Y = 1) = P(X = –1 or 1) = 2/3, but P(X = 0 and Y = 1) = 0, because Y
= X². Therefore, P(X = 0 and Y = 1) ≠ P(X = 0)P(Y = 1), and X and Y are not independent. ■
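The calculation in Example 12 can be reproduced exactly with rational arithmetic. The Python sketch below is an illustration under assumed names (joint, expect, marginal_x, marginal_y): it builds the joint distribution of X and Y, evaluates the covariance by the computational formula, and compares P(X = 0 and Y = 1) with P(X = 0)P(Y = 1).

```python
from fractions import Fraction

# Joint distribution of (X, Y): X is -1, 0 or 1 with probability 1/3 each, and Y = X squared.
third = Fraction(1, 3)
joint = {(x, x * x): third for x in (-1, 0, 1)}


def expect(g):
    """Expectation of g(X, Y) under the joint distribution."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


def marginal_x(x0):
    """P(X = x0)."""
    return sum(p for (x, y), p in joint.items() if x == x0)


def marginal_y(y0):
    """P(Y = y0)."""
    return sum(p for (x, y), p in joint.items() if y == y0)


# cov(X, Y) = E(XY) - E(X)E(Y)
cov = expect(lambda x, y: x * y) - expect(lambda x, y: x) * expect(lambda x, y: y)

p_joint = joint.get((0, 1), Fraction(0))   # P(X = 0 and Y = 1)
p_product = marginal_x(0) * marginal_y(1)  # P(X = 0) P(Y = 1)

print("cov(X, Y) =", cov)
print("P(X = 0 and Y = 1) =", p_joint, "but P(X = 0)P(Y = 1) =", p_product)
```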
Notwithstanding the above, there are two special situations where zero correlation implies
independence: (i) each of X and Y has only two possible values, (ii) the joint distribution of X and Y
is bivariate normal.
Probability and distributions: Questions for practice

1. In a factory there are two machines producing 30% and 70% respectively of the total output. Out
of the items produced by the first machine 4% are defectives, whereas out of the items produced by
the second machine 6% are defectives. An item is drawn at random from the production line and it
is found to be defective. What is the conditional probability that this item was produced by the
second machine ?
2. In a factory, there are three machines 1, 2, 3, producing 50%, 30%, 20% respectively of the total
output. Out of the items produced by machine 2, four percent are defectives. The corresponding
figure for machine 3 is 6%. The following is known:
“If an item is drawn at random from the production line and found to be defective then the
conditional probability for this item to be produced by machine 1 is 0.50”.
What is the proportion of defective items among those produced by machine 1?
3. Out of the valves produced by factories A and B, 10% and 20%, respectively, are defectives. A bag
contains 4 valves of factory A and 5 valves of factory B. If two valves are drawn at random from the
bag without replacement, find the probability that at least one valve is defective.
4. Among the candidates seeking admission to a major business school, 20% are good, 50% are
average and 30% are poor, in terms of their intrinsic merit levels. The admission process involves a
written test, followed by a personal interview. Each candidate gets one of the three letter grades A, B
and C in the written test. The following are known:
Any good candidate gets either A or B grade with respective probabilities 0.7 and 0.3.
Any average candidate gets A, B or C grade with respective probabilities 0.2, 0.5 and 0.3.
Any poor candidate gets either B or C grade with respective probabilities 0.2 and 0.8.
Only the candidates securing an A grade in the written test are called for personal interview, and every
candidate so called appears at the interview. Any candidate called for interview is offered admission
with probability 0.5 if he/she is good, and with probability 0.1 if he/she is average. Obviously,
candidates not called for interview are not offered admission.
Moreover, any candidate called for interview is assigned a numerical score which equals 3 if he/she is
offered admission, and 0 otherwise.
(a) What is the probability that a randomly chosen candidate is offered admission?
(b) Given that a randomly chosen candidate has not been offered admission, what is the conditional
probability of his/her getting B grade in the written test?
(c) Given that a randomly chosen candidate has not obtained a C grade in the written test, what is the
conditional probability of his/her not being offered admission?
(d) Given that a randomly chosen candidate has been called for interview, what is the conditional
variance of his/her numerical score?
5. A business school conducts an examination on quantitative methods in two rounds. Each student
must appear in the first round of the examination, and any such student is assigned one of the four
letter grades A, B, C or F, on the basis of his/her performance at this round. It is known that 10%,
30%, 40% and 20% of the students get A, B, C and F grades respectively, at the first round.
If a student gets an A or a B grade at the first round then this is recorded as his/her final grade.
Otherwise, he/she has the option of sitting for the second round of the examination for possible grade
improvement.
Among the students securing a C grade at the first round, 40% do not opt for the second round of
examination and for them the final grade is recorded as C. On the other hand, among those who get a
C grade at the first round and decide to sit for the second round, 10%, 10%, 50% and 30% get A, B, C
and F grades respectively, at the second round.
All those securing an F grade at the first round sit for the second round of examination and, among
them, 5%, 10%, 40% and 45% get A, B, C and F grades respectively, at the second round. As per the
rules of the business school, the grades at the second round are converted to final grades using a
formula which is summarized as follows:

Grade in the second round   A   B   C   F
Final grade                 B   C   C   F

(a) What is the probability that a randomly chosen student sits for the second round of the
examination?
(b) What is the probability that a randomly chosen student gets a final grade C ?
(c) Given that a randomly chosen student has a final grade B, what is the conditional probability that
he/she did not sit for the second round of examination?
(d) Given that a randomly chosen student has a final grade F, what is the conditional probability that
he/she got an F grade in the first round of examination?
6. A production process involves three machines A, B and C, which produce 50%, 30% and 20%
respectively, of the total output. Out of the items produced by machine A, 10% fail in a quality control
test. The corresponding figures for machines B and C are 20% and 30% respectively. All items passing
the quality control test are directly acceptable. On the other hand, items failing in the quality control
test are further processed and thus 40%, 50% and 60% of them turn out to be marginally acceptable,
depending on whether they came from machines A, B and C respectively, e.g., out of the items, that
are produced by machine A and that fail in the quality control test, 40% eventually turn out to be
marginally acceptable, and so on.
(a) Find the probability that a randomly chosen item from the production process is found to be
directly acceptable.
(b) Find the probability that a randomly chosen item from the production process turns out to be
marginally acceptable.
(c) Given that a randomly chosen item from the production process has failed in the quality control
test, what is the conditional probability that it turns out to be marginally acceptable?
(d) Given that a randomly chosen item from the production process has turned out to be marginally
acceptable, what is the conditional probability that it was produced by machine A?
(e) Given that a randomly chosen item was not produced by machine B, what is the conditional
probability that it turns out to be marginally acceptable?
7. The e-mail system of a management institute has a sensor that evaluates each incoming e-mail to
any account and then places it either in the inbox or the junk-box of that account. Consider three
accounts A, B and C with this system. Any incoming e-mail to these accounts is either educational
(called E-type for brevity) or of other kinds (called O-type for brevity). The following are known:
• The sensor puts any E-type e-mail either in the inbox or in the junk-box with respective
probabilities 0.8 and 0.2.
• The sensor puts any O-type e-mail either in the inbox or in the junk-box with respective
probabilities 0.3 and 0.7.
• Incoming e-mails are placed in the inbox or junk-box independently of one another.
(a) It is known that 60% of the e-mails coming to account A are of E-type and the rest are of O-type.
An e-mail was picked up at random from all the incoming ones to this account. If this e-mail was one
of those in the junk-box, then what is the conditional probability that this e-mail was of O-type?
(b) It is known that the number of E-type e-mails that arrive at account B on any day, during the
period 8:00am – 9:00am, equals 1, 2 or 3 each with probability 1/3. Obtain the probability that among
such e-mails on a particular day, exactly two go to the inbox.
(c) The following information is available about account C:
• Every day, during the period 8:00am – 9:00am, it receives exactly one e-mail of the E-type.
• Furthermore, during the same period, it receives either no e-mail of the O-type or exactly one
e-mail of the O-type, with respective probabilities 0.4 and 0.6.
Among the incoming e-mails to this account on a particular day during 8:00am – 9:00am, suppose X
go to the inbox and Y go to the junk-box. (I) Calculate P(X = 1). (II) Calculate P(Y = 1 | X = 0).
8. The final round of the admission test for a management institute consists of three components: (I)
group discussion, (II) interview with practicing managers, (III) interview with the faculty of the
institute. The city, where the test is held, is notorious for traffic snarls. If a candidate is held up in a
traffic jam on the way to the test venue, he gets mentally disturbed and that can adversely affect his
performance in the test. In this case, his chances of being successful in (I), (II) and (III) are 0.3, 0.5
and 0.2 respectively. Otherwise, the corresponding chances are 0.6, 0.8 and 0.5 respectively. The
performances in (I), (II) and (III) in either case can be supposed to be mutually independent. The
chance that the candidate encounters a traffic jam on the way to the test venue is 0.3.
(a) Given that the candidate did not encounter a traffic jam on the way to the test venue, what is his
conditional probability of being successful in at least two of the three components of the test?
(b) Given that the candidate encountered a traffic jam on the way to the test venue, what is his
conditional probability of being successful in at least two of the three components of the test?
(c) Given that the candidate was successful in at least two of the three components of the test, what is
the conditional probability that he did not encounter a traffic jam on the way to the test venue?
9. In 2008, there were three brands A, B and C of a product having market shares 20%, 30% and 50%
respectively among a group of 1000 consumers. A new brand D came into the market in 2009. The
following are known about the behavior of these 1000 consumers:
(i) Among the consumers of brands A, B and C in 2008, respectively 50%, 40% and 30% switch over
to brand D in 2009. These people continue to use brand D till the end of 2009, but eventually some of
them start disliking brand D. As a result, among the converts to D from A, B and C, respectively 60%,
30% and 40% return to their original brands on January 1, 2010.
(ii) Among the three brands A, B and C, there is no change of loyalty in 2009 (i.e., there is no
conversion from brand A to brand B, and so on).
(iii) Those who do not change brand loyalty in 2009 continue with their brands on January 1, 2010.
Direction for parts (a)-(c): On December 31, 2009, one of the 1000 consumers is chosen at random.
(a) What is the probability that the chosen person is a consumer of brand D?
(b) Given that the chosen person is not a consumer of brand D, what is the conditional probability that
he/she was a consumer of brand C in 2008 ?
(c) Given that the chosen person is not a consumer of brand C, what is the conditional probability that
he/she will be a consumer of brand A on January 1, 2010 ?
Direction for parts (d)-(f): On January 1, 2010, one of the 1000 consumers is chosen at random.
(d) What is the probability that the chosen person is a consumer of brand D?
(e) What is the probability that the chosen person is a consumer of brand C ?
(f) Given that the chosen person is a consumer of brand A or C, what is the conditional probability that
he/she did not change brand loyalty in 2009 ?
10. The return, X, on an investment depends on the market condition which can be upbeat, moderate
or poor with respective probabilities 0.3, 0.4 and 0.3.
If the market condition is upbeat, then X equals 15 or 20 with respective probabilities 0.3 and 0.7.
If the market condition is moderate, then X equals 10 or 15, each with probability 0.5.
If the market condition is poor, then X equals 5 or 10 with respective probabilities 0.6 and 0.4.
There is also some uncertainty about the tax, T, on the return X, but the following are known:
If X = 20, then T = 5.
If X = 15, then T equals 3 or 4, with respective probabilities 0.4 and 0.6.
If X = 10, then T equals 2 or 3, with respective probabilities 0.8 and 0.2.
If X = 5, then T = 0, i.e., there is no tax. Let Y = X – T be the post-tax value of the return.
(a) What is the probability that Y is even?
(b) Find the conditional expectation E(T | X < 12).
(c) Find the conditional variance V(X | T = 3).
(d) Given 0 < T < 4, what is the conditional probability for the market condition to be moderate?
(e) Find the conditional probability P(X > 8 | Y < 8).
11. The market condition in a region can be upbeat, moderate or poor, with respective probabilities
0.3, 0.5 and 0.2. The following are known about the profit, say X (in a certain monetary unit), of a
company under various market conditions:
(i) Upbeat: X equals 2, 3 or 4 with respective probabilities 0.2, 0.3 and 0.5;
(ii) Moderate: X equals 1, 2 or 4 with respective probabilities 0.3, 0.5 and 0.2;
(iii) Poor: X equals 1, 2 or 3 with respective probabilities 0.5, 0.3 and 0.2.
If X equals 3 or 4, then the company may be fined by the regulatory authorities for restrictive trade
practices, the amount of fine being 1 monetary unit. If X = 3, then this happens with probability 0.3,
in which case the net profit, say Y (in the same monetary unit as X), equals 2. On the other hand, if
X = 4, then the fine is imposed with probability 0.6, in which case Y equals 3. Of course, with X
= 1 or 2, the question of imposition of a fine does not arise and hence Y equals X. The same happens
if X equals 3 or 4 and no fine is imposed.
(a) Find the probability that the company has to pay a fine.
(b) Obtain the conditional probability P ( X  3 | Y  2).
(c) Given that the market condition was moderate, what is the conditional expectation of Y ?
(d) Find the conditional variance of X given Y  3 .
12. Consider two economic indicators X and Y. The possible values of X are 2, 3 and 4, with
respective probabilities 0.3, 0.5 and 0.2.
If X = 2, then Y equals 1 or 2, with respective probabilities 0.4 and 0.6.
If X = 3, then Y equals 2 or 3, with respective probabilities 0.7 and 0.3.
If X = 4, then Y equals 3 or 4, with respective probabilities 0.8 and 0.2.
If X + Y < 6, then the initial profit, Z, of a company equals X + Y. Else, Z = X + Y + 1.
The regulatory authorities may impose a penalty, W, on the company for restrictive trade practices.
If Z > 8, then W equals 1 or 2, with respective probabilities 0.2 and 0.8.
If Z = 8, then W equals 1 or 2, each with probability 0.5.
If Z < 8, then there is no penalty, that is, W = 0.
The final profit of the company is given by T = Z – W. Calculate
(a) P(T > 6), (b) E(Z | T = 7), (c) V(Z | W = 2), (d) E(YZW| W > 0) , and (e) P(Y is odd | T is even).
13. It is known that 50% of the passengers of an airline are Indians. The rest are obviously foreigners.
The passengers arrive at the check-in counter in a random order. Consider the first four passengers
who check in for the flight. If the first passenger is an Indian then we write I for him or her; otherwise
we write F. The same thing is done for the second, third and fourth passengers. This gives a sequence
of length four consisting of I and F. For example, if all the first four passengers are foreigners then we
get FFFF. On the other hand, if the first and fourth passengers are Indians and the second and the third
passengers are foreigners, then we get the sequence IFFI. In any such sequence, an F-run is an
uninterrupted chain of F’s preceded and followed by either I or nothing. Similarly, an I-run is defined.
Let X and Y denote the numbers of F- and I- runs respectively. Thus with the sequence IFFI, we have
X=1 and Y=2 since the first and last members of the sequence give two I-runs whereas the two middle
members yield an F-run. Here are some more examples corresponding to several other possibilities for
the sequence formed by the first four passengers:
Sequence X Y
FFFF 1 0
IIFF 1 1
FIFI 2 2
Obtain (a) P(X=0 | Y=1), (b) P(X=1 | Y=1) and (c) P(X=2 | Y=1).
14. A manufacturing company has plants in three locations I, II, and III.
(a) A box contains four items produced at location I. Among these four, two are defectives and two are
not. Two items are drawn from the box at random and without replacement. What is the variance of
the number of defectives among the two items so drawn?
(b) From the production line at location II, where the proportion of defective items is 0.2, items are
drawn one by one at random till two defectives appear in consecutive draws or five items are drawn,
whichever happens earlier. If X items are drawn in the process, then calculate P(X = 5).
(c) The distribution of life (in hours) of any item produced at location III is exponential with mean 10.
If five items are drawn at random from the production line at this location, then what is the probability
that at least two of these survive for less than 10 hours?
15. (a) The waiting time (in minutes) at a bus stop follows the uniform distribution over the range
[0,10]. What is the probability that the total waiting time, over 30 occasions, exceeds 150 minutes?
(b) The distribution of a certain quality characteristic is continuous uniform over [– 8, 4]. If M is the
arithmetic mean of quality characteristic for 48 randomly chosen items, then find P(4 M   9) .
(c) The distribution of scores in a public examination is known to be normal with mean μ and standard
deviation σ. (i) If 20% of the candidates score over 80 and 30% of the candidates score below 40,
then find the ratio μ/σ. (ii) If μ = 44 and σ = 7, then find the probability that among 100 randomly
chosen candidates no more than one scores below 29.
16. The probability distribution of the number of projects executed by a construction company per year
is as follows:
Value 4 5 6 7 8 9 10
2 2 2
Probability k k 7k +k 2k 2k 3k 2k
Here k is a suitable constant. Any year is "unusual" if the number of projects executed, X, is too large
or too small in the sense that |X – 7| > 2.
(a) Find the probability for any particular year to be "unusual".
(b) Also find the probability that out of four randomly chosen years, no more than one is "unusual".
(c) Find E(X).
17. There are three varieties, A, B, C, of an insect. These varieties occur in equal proportions in
nature. An entomologist is conducting research on this insect and is primarily interested in variety A.
He collects insects one by one till an insect of variety A is obtained. The process, however, is not
allowed to continue indefinitely. If an insect of variety A is not obtained even in four trials, the
entomologist stops collecting. Find the expected number of insects collected.
18. There are two machines A and B in a factory. The number of defectives, say X, produced by
machine A on a particular day follows the Poisson distribution with mean 2. Also, the number of
defectives, say Y, produced by machine B on the same day equals 0, 1, 2 or 3, each with probability
0.25. Independence of the two machines can be assumed.
(a) Given that exactly two defective items were produced on that day, find the conditional probability
that one of these defective items came from machine A and the other from machine B.
(b) Given that at least one of the machines produced two or more defective items on that day, what is
the conditional probability that one of the two machines did not produce any defective item at all on
that day?
19. The profit (in a certain unit) of a business enterprise for a particular year equals the larger root of
the quadratic equation x² – Ax + B = 0, where A and B are two economic indicators, about which
the following are known:
• A equals either 7 or 8 with respective probabilities 0.4 and 0.6.
• Given A = 7, the possible values of B are 10 or 12 with respective conditional probabilities 0.7
and 0.3.
• Given A = 8, the possible values of B are 12 or 15 with respective conditional probabilities 0.8
and 0.2.
Let Z be the profit for the year.
(a) Obtain the conditional probability P(B = 15 | Z = 5). (b) Calculate E(Z).
20. A factory manufactures electric bulbs of two brands A and B. The distribution of life (in hours) of
any bulb of brand A is uniform over the range (0,4). The corresponding distribution for brand B is
uniform over the range (1,5). Four bulbs, of which two are of brand A and two are of brand B, are put
to test in a life testing experiment. Let X be the number of bulbs that fail during the first two hours of
the experiment and Y be the number of bulbs that fail during the first three hours of the experiment.
Find E(Y– X) and P(Y=3).
21. A zoologist, conducting research on crocodiles, collects crocodiles in a mangrove forest. There are
two kinds of crocodiles in the forest, interesting and boring (depending on a certain characteristic), in
the ratio 1:2. The crocodiles are collected one by one and at random. The population of crocodiles in
the forest is large so that the successive trials may be assumed to be independent.
(a) What is the probability that the first interesting crocodile is obtained at the fifth trial?
(b) What is the probability that at least six trials are needed to get the fourth boring crocodile?
(c) What is the expected number of interesting crocodiles that are collected before the third boring
crocodile is obtained? (no derivation needed; any relevant formula may be stated without proof).
(d) Suppose the third interesting crocodile appears in the Xth trial and the fifth interesting crocodile
appears in the Zth trial. Find E(Z – X). (no derivation needed; any relevant formula may be stated
without proof).
22. A psychological trait X affecting consumer behavior is studied at three locations Lucknow, Bhopal
and Panaji.
(a) At Lucknow, X has the continuous uniform distribution over the range [–h, 2], and V(X) = 3. Obtain the
conditional probability P(– 1 < X < 1 | – 5 < X < 1) at this location.
(b) At Bhopal, X has the normal distribution with mean 6 and standard deviation 3. If 100 customers
are chosen at random at this location, then what is the probability that at most one of them has a
negative X-value? [Given Φ(1.50) = 0.9332, Φ(1.75) = 0.9599, Φ(2.00) = 0.9772, Φ(2.25) = 0.9878]
(c) At Panaji, the distribution of X is exponential with P(X > 30) = 0.343. If three customers are
chosen at random at this location, then find the probability that the maximum of their X-values
exceeds 20.
23. The number of defects on a particular brand of plastic sheet (of fixed length and breadth) follows
the Poisson distribution. Out of four such sheets chosen at random, let
X = number of sheets with no defects,
Y = number of sheets with exactly one defect,
Z = number of sheets with exactly two defects.
It is known that the expectation of Z is twice that of X. Find
(a) the variance of the number of defects on the second of the four randomly chosen sheets,
(b) the variance of the number of sheets (among the chosen four) with two or more defects,
(c) P(X=1|Y=3).
24. Let X and Y denote the numbers of defective spots on two plastic sheets. The distribution of X is
Poisson such that P(X = 2) = P(X = 1). The distribution of Y is uniform with possible values 1, 2,…, n,
such that E(Y) = 3. Assume that X and Y are independent.
(a) Find the probability that two sheets together have three or more defective spots.
(b) Given that the two sheets together have exactly five defective spots, what is the conditional
probability that none of them has more than three defective spots?
(c) In a study on quality improvement, interest lies in the quantity Z, defined as Z  ( X  Y ) / 4 for
Y  3 and Z  X / 4 for Y  3 . Find E(Z).
25. A travel agent in a hilly region offers helicopter rides. The agent has only one helicopter which
can accommodate up to 4 passengers on any one trip. A passenger must have a reservation and any
passenger with reservation turns up for the trip with probability 0.5. If six reservations are made for
a trip, what is the expected number of empty seats when the helicopter departs ?
26. A factory produces two types of electric bulbs A and B. Sixty percent of the total output are of
type A and the rest are of type B. Any bulb of type A survives for 40 hours or more with probability
0.64. The corresponding probability for any bulb of type B is 0.36. It is known that the life distribution
of any bulb, either of type A or of type B, is exponential.
(i) Consider two bulbs, one of type A and the other of type B. What is the probability that they both
survive for 20 hours or more?
(ii) Given that a randomly chosen bulb from the total output of the factory has failed within 60 hours,
what is the conditional probability that it is of type A?
27. The following information is available about the graduating batches of a business school during the
period 2004-06.
Year                                                          2004   2005   2006
Percentage of students securing a job with a multinational     40     20     30
One student is chosen at random from the graduating batch of each year. Let X be the number of
students, out of the three so selected, who had secured a job with a multinational. Obtain P(X = x) for
x = 0, 1, 2, 3. Also find E(X) and V(X).
28. The proportions of tall people in four regions of a country, say North, East, South and West, are
0.5, 0.3, 0.3 and 0.4 respectively. One person was chosen at random from each region. Assume
independence across regions. Let
X = number of tall persons among the four people so selected,
Y = number of tall persons among the two people selected from East and South.
Calculate (a) V(X), (b) P(X = 2), (c) E(Y²), and (d) V(Y | X = 3).
29. The probability distribution of the number of e-mails, X, received daily by a businessman is as
follows:
P(X = x) = (3/4)^(x – 1) (1/4), x = 1, 2, 3, …
On a particular day, if two or fewer e-mails are received then the businessman replies to these e-mails
immediately. However, if three or more e-mails are received then only the first three of these are
replied to on the same day. Define
Y = number of e-mails, among those received on that day, that are replied to on the same day.
Z = number of e-mails, among those received on that day, that are not replied to on the same day.
Obtain (a) P(Y = 2), (b) P(Y = 3), (c) P(Z = 1 | Y =3), (d) E(Y), (e) Var(Y), and (f) E(Z).
30. Let X denote the number of projects that a company bids for in a particular year and Y denote the
number of projects, out of these X, that the company gets. It is known that the possible values of X are
2, 3 or 4 with respective probabilities 0.4, 0.4 and 0.2. Furthermore, for x = 2, 3 or 4, given X = x,
conditionally Y equals either x or x-1 each with probability 0.5. Obtain (a) P(X = 2 | Y = 2), (b) P(X =
3 | Y = 3), (c) E(X | Y =2), (d) Var(X | Y =2), (e) E(Y) and (f) E(XY).
31. In a production process, the joint distribution of two quality characteristics X and Y is as follows:
P(X = 1 and Y = 1) = 0.1, P(X = 1 and Y = 2) = 0.2, P(X = 1 and Y = 3) = 0,
P(X = 2 and Y = 1) = 0.1, P(X = 2 and Y = 2) = p, P(X = 2 and Y = 3) = 0.2,
P(X = 3 and Y = 1) = 0, P(X = 3 and Y = 2) = 0.1, P(X = 3 and Y = 3) = q.
Here p, q are nonnegative and p + q = 0.3.
(a) If E(X | Y = 2) = E(Y | X = 1) + (1/12), then what is the value of p ?
(b) Now, let p = 0.2 and q = 0.1. Then calculate E(XY).
32. It is known that the life (in hours) of a certain kind of electric bulb is specified by the following
probability density function: f(x) = (1/72)(2 + 3x²), 0 < x < 4.
Let X1,…, X36 be the lives of 36 bulbs of this kind. Define
Z = max(X1, X2), W = min(X1, X2),
Y = X1 + X2 +…+ X36.
Assuming independence across bulbs, obtain (a) P(2 < X1 < 3), (b) E(X1), (c) P(2 < W < 3), (d) P(2 <
W < Z < 3) and (e) P(Y > 104).
33. A psychological trait, say X, is known to influence consumer behavior. Persons with –6 ≤ X ≤ 6
have normal behavior, while those with X > 6 are hyperactive and those with X < –6 are hypoactive.
Parts (a) – (e) below concern different geographic locations.
(a) In location A, the distribution of X is specified by the probability density function
f(x) = (λ/2) e^(–λ|x|), –∞ < x < ∞,
where λ is a positive constant. If 80% of the people in this location have a normal behavior, then what
is the value of λ?
(b) In location B, 10% of the people are hyperactive and 20% of the people are hypoactive.
Furthermore, the proportions of buyers of a particular brand among the hyperactive, normal and
hypoactive people are 0.8, p and 0.05 respectively. The following is also known about location B:
“Given that a randomly chosen person is a buyer of this particular brand, the conditional probability of
his/her being hyperactive is 0.4.”
(i) What is the value of p ?
(ii) Will the answer in (i) change if the distribution of X in location B is known to be normal?
(c) In location C, the distribution of X is unknown but it is known that E(X) = 0 and Var(X) = 4.
From this information alone, the marketing manager of a company concludes that less than 12% of the
people in this location are hyperactive. Is the manager correct?
(d) In location D, the distribution of X is normal. It is also known that in this location, 10% of the
people are hyperactive and 20% of the people are hypoactive. What are the values of the mean and
standard deviation of X for this location?
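The numerical parts of Problem 33 can be sketched as follows. Assumptions made here: normal behavior means −6 ≤ X ≤ 6, the constant in part (a) is called theta, and the density in (a) is the double exponential (theta/2)·exp(−theta·|x|), so that P(|X| ≤ 6) = 1 − exp(−6·theta). The normal quantiles come from Python's statistics.NormalDist, so the values in (d) may differ slightly in the third decimal from ones obtained with a printed table.

```python
import math
from statistics import NormalDist

# (a) 1 - exp(-6 * theta) = 0.8, hence theta = ln(5) / 6
theta = math.log(5) / 6

# (b)(i) Bayes: 0.4 = (0.1 * 0.8) / (0.1 * 0.8 + 0.7 * p + 0.2 * 0.05); solve for p
p = (0.1 * 0.8 / 0.4 - 0.1 * 0.8 - 0.2 * 0.05) / 0.7

# (c) Chebyshev: P(X > 6) <= P(|X - 0| >= 6) <= Var(X) / 6^2
chebyshev_bound = 4 / 36

# (d) (6 - mu)/sigma is the 0.9 quantile and (-6 - mu)/sigma is the 0.2 quantile
z_hi = NormalDist().inv_cdf(0.9)
z_lo = NormalDist().inv_cdf(0.2)
sigma = 12 / (z_hi - z_lo)
mu = 6 - z_hi * sigma

print(round(theta, 4), round(p, 4), round(chebyshev_bound, 4), round(mu, 3), round(sigma, 3))
```

Part (b)(ii) needs no computation: the Bayes calculation uses only the three group proportions, not the shape of the distribution of X.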
34. A study on a psychological trait, X, affecting consumer behavior is conducted over three cities
Jaipur, Patna and Kochi. In Jaipur, X is exponentially distributed with mean 3, while in Patna, the
distribution of X is continuous uniform over the range [−4, h], where h is a positive constant. Finally,
the distribution of X in Kochi is specified by the probability density function f(x) = k|x| if |x| ≤ 4,
and f(x) = 0 otherwise, where k is a positive constant.
(a) A person is called hyperactive if his/her X-value exceeds 11. Find the probability that out of 100
randomly chosen people in Jaipur at least two are hyperactive.
(b) If V(X) = 3 in Patna, then find P(X > 0 | – 3 < X < 3) for this city.
(c) Find P(3  X  1) for Kochi.
35. (a) The number of patients, say X, arriving at a specialty clinic on a particular day follows the
Poisson distribution with mean two. The clinic can handle at most two such patients. Thus if Y denotes
the number of patients handled by the clinic, then Y = X if X ≤ 2, while Y = 2 if X > 2. Find the
expectation of Y.
(b) Let X be the number of defectives among four items drawn at random from a large lot. The lot is
accepted if X = 0, and rejected if X ≥ 2. If X = 1, then two more items are drawn at random from the
lot. The lot is rejected if both these items are defectives, and accepted otherwise. If 20% of the items in
the lot are defectives, then what is the probability of acceptance of the lot?
(c) A system, with three components A, B and C arranged in series, functions if and only if all the
three components function. The life (in hours) of each component has the same exponential
distribution and these components behave independently. The probability that the system functions for
at least 8 hours is 0.6561. What is the probability that component A functions for 6 hours or more?
(d) The demand, X, of a certain commodity is known to have the continuous uniform distribution over
the range [8, 12]. The profit, Y, is related to X as follows: Y = X + 1 if X ≤ 10, while Y = X² otherwise.
What is the expectation of Y ?
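The four parts of Problem 35 reduce to short computations, sketched below. The profit rule in (d) is read as Y = X + 1 for X ≤ 10 and Y = X squared for X > 10; the helper names pois and antideriv_low are made up for this illustration.

```python
import math
from fractions import Fraction as F

# (a) Y = min(X, 2) with X ~ Poisson(2): E(Y) = 1 * P(X = 1) + 2 * P(X >= 2)
def pois(k, mean=2.0):
    return math.exp(-mean) * mean**k / math.factorial(k)

e_y_a = 1 * pois(1) + 2 * (1 - pois(0) - pois(1))

# (b) accept if X = 0, or if X = 1 and the two extra items are not both defective
q = F(1, 5)                       # proportion of defectives in the lot
p0 = (1 - q)**4                   # P(X = 0)
p1 = 4 * q * (1 - q)**3           # P(X = 1)
accept = p0 + p1 * (1 - q**2)

# (c) series system: exp(-3 * 8 * rate) = 0.6561, so exp(-6 * rate) = 0.6561 ** (6 / 24)
p_a6 = 0.6561 ** (6 / 24)

# (d) X ~ Uniform[8, 12] has density 1/4
def antideriv_low(x):
    return F(x)**2 / 2 + x        # antiderivative of (x + 1)

part_low = antideriv_low(10) - antideriv_low(8)   # integral of (x + 1) over [8, 10]
part_high = (F(12)**3 - F(10)**3) / 3             # integral of x^2 over [10, 12]
e_y_d = (part_low + part_high) / 4

print(round(e_y_a, 4), float(accept), round(p_a6, 4), float(e_y_d))
```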
36. Let X denote the life (in hours) of a certain kind of bulb which is produced in four locations A, B,
C and D. The following are known:
(i) At location A, the distribution of X is exponential with mean 10.
(ii) At location B, the distribution of X is continuous uniform with mean 10 and variance 12.
(iii) At location C, the distribution of X is specified by the probability density function
f(x) = k(12 − x)², if 8 ≤ x ≤ 12,
f(x) = 0 otherwise,
where k (> 0) is a constant.
(iv) Ten per cent of the bulbs produced at location D are defectives (i.e., fail within 5 hours).
(a) If two bulbs are selected at random from the output of location A, then what is the probability that
one will survive for 10 hours or more and the other will fail to do so?
(b) If one bulb is selected at random from the output of location B, then what is the probability that it
will survive for 12 hours or more?
(c) With reference to location C, what is the value of k ?
(d) The manufacturing cost at location D is Rs 10 per bulb and selling price of any bulb produced
here is Rs 12. If any bulb sold turns out to be defective then a complete refund is made to the
customer. What is the variance of the profit (in Rs) for any bulb sold from the output of location D ?
(e) From the output of location D, bulbs are inspected one by one at random till a defective bulb is
found. Let Y be the number of non-defective bulbs inspected in the process. A performance measure
Z is defined as Z = 2 if Y > 1, and Z = Y otherwise. What is the value of E(Z) ?
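A compact check of Problem 36 is sketched below. It reads the density at location C as k(12 − x)² on [8, 12], and treats Y in part (e) as the count of non-defective bulbs seen before the first defective one, so that P(Y = 0) = 0.1 and P(Y = 1) = 0.9 × 0.1. All variable names are illustrative.

```python
import math
from fractions import Fraction as F

# (a) exponential with mean 10: P(X >= 10) = exp(-1); one bulb survives, the other fails
s = math.exp(-1)
p_a = 2 * s * (1 - s)

# (b) uniform with variance (width^2)/12 = 12, so width = 12 and the range is [4, 16]
width = math.sqrt(12 * 12)
upper = 10 + width / 2
p_b = (upper - 12) / width

# (c) the integral of (12 - x)^2 over [8, 12] equals 4^3 / 3, and k is its reciprocal
k = 1 / (F(4)**3 / 3)

# (d) profit is 12 - 10 = 2 with probability 0.9 and -10 (full refund) with probability 0.1
d = F(1, 10)
mean_profit = 2 * (1 - d) - 10 * d
var_profit = 4 * (1 - d) + 100 * d - mean_profit**2

# (e) Z = 0 if Y = 0, Z = 1 if Y = 1, Z = 2 if Y > 1
e_z = 1 * (1 - d) * d + 2 * (1 - d)**2

print(round(p_a, 4), round(p_b, 4), k, float(var_profit), float(e_z))
```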
37. The joint probability distribution of two business indicators X and Y, each with three possible
values 1, 2 and 3, is shown below:
P(X = 1 and Y = 1) = 0.1, P(X = 1 and Y = 2) = 0, P(X = 1 and Y = 3) = 0.2,
P(X = 2 and Y = 1) = 0, P(X = 2 and Y = 2) = 0.3, P(X = 2 and Y = 3) = 0.1,
P(X = 3 and Y = 1) = 0.2, P(X = 3 and Y = 2) = 0.1, P(X = 3 and Y = 3) = 0.
Find (a) E(Y), (b) E(XY), (c) E(X| Y = 2) and (d) Var (Y| X= 3).
38. A number X is chosen at random and with equal probability from the set {2, 3, 4}. Then a number
Y is chosen at random and with equal probability from the set {1, 2, …, X}. Find (a) E(X| Y =2), (b)
V(X | Y=3) and (c) V(Y| X=2).
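Problem 38 is a two-stage experiment, so the joint probabilities are P(X = x, Y = y) = (1/3)(1/x) for y = 1, …, x. The sketch below builds this table and reads off the conditional moments; the helper cond_moments is a name invented for this illustration.

```python
from fractions import Fraction as F

# Joint pmf keyed by (x, y): X uniform on {2, 3, 4}, then Y uniform on {1, ..., x}
pmf = {(x, y): F(1, 3) * F(1, x) for x in (2, 3, 4) for y in range(1, x + 1)}

def cond_moments(pmf, target, given, value):
    """Mean and variance of coordinate `target` given that coordinate `given` equals `value`.

    Coordinates are 0 for X and 1 for Y.
    """
    rows = [(key[target], pr) for key, pr in pmf.items() if key[given] == value]
    mass = sum(pr for _, pr in rows)
    mean = sum(v * pr for v, pr in rows) / mass
    var = sum(v * v * pr for v, pr in rows) / mass - mean**2
    return mean, var

e_x_y2, _ = cond_moments(pmf, 0, 1, 2)   # (a) E(X | Y = 2)
_, v_x_y3 = cond_moments(pmf, 0, 1, 3)   # (b) V(X | Y = 3)
_, v_y_x2 = cond_moments(pmf, 1, 0, 2)   # (c) V(Y | X = 2)

print(e_x_y2, v_x_y3, v_y_x2)
```

In (b), only X = 3 and X = 4 are compatible with Y = 3, with conditional probabilities 4/7 and 3/7, which is why the variance is (4/7)(3/7).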
39. Consider the probability distributions of a quality characteristic X at three different locations of a
manufacturing unit.
(a) At location 1, the distribution of X is continuous uniform over the range [– 2, 3]. Three items are
drawn at random from the production line at this location, and let Y be the minimum of the X-values
for these three items. Find the conditional probability P(Y < 0| Y < 2).
(b) At location 2, if X is exponentially distributed and P(X > 30) = 0.216, then find P(X < 40).
(c) At location 3, the distribution of X is specified by the probability density function f(x), where
f(x) = k|2 − x| if 1 ≤ x ≤ 4, and f(x) = 0 otherwise. Find E(X) at this location.
40. A test has three components I, II and III, which are passed by a randomly chosen candidate with
probabilities 0.3, 0.2 and 0.1, respectively. The number of candidates is very large. Also, assume
independence across the three components as well as across candidates.
(a) Let X be the number of components passed by a randomly chosen candidate. Find V(X)/E(X).
(b) Obtain the probability that a randomly chosen candidate passes at least two components.
(c) If 100 candidates are chosen randomly, then find the probability that at most one of them passes all
the components.
(d) Candidates are drawn at random, one by one, till exactly two candidates passing component I are
found, or four candidates are drawn, whichever happens earlier. If Y is the number of candidates
drawn in the process, then find E(Y).
41. Let Y be a psychological trait influencing consumer behavior.
(a) If the distribution of Y is continuous uniform with expectation 10 and variance 12, then find the
conditional probability P(12 < Y < 18 | 6 < Y < 18).
(b) Suppose Y has the exponential distribution and let three consumers be chosen at random. The
probability that Y is less than 10 for at least one of these three consumers is 0.0784. Then what is the
probability that Y exceeds 15 for all three of them?
(c) Let Y have probability density function f(y), where f(y) = k(3 − |y|) if |y| ≤ 3, and f(y) = 0 if
|y| > 3. Here k is a suitable positive constant. Find V(Y).
42. In a test, there are three questions A, B and C, which are correctly answered by any randomly
chosen candidate with probabilities 0.1, 0.3 and 0.4, respectively. The number of candidates is very
large. Moreover, one can safely assume independence across questions and also across candidates.
(a) Given that a randomly chosen candidate has answered at least two questions correctly, what is
his/her conditional probability of answering question B correctly?
(b) Find the probability that out of 100 randomly chosen candidates at least two answer all three
questions correctly.
(c) Candidates are drawn one by one at random till four candidates correctly answering question C are
found. What is the probability that at least six candidates will have to be drawn for this purpose?
(d) Let X be the number of candidates, out of two chosen at random, who answer both questions A and
C incorrectly. If Y = 1 if X ≤ 1, and Y = 2, if X > 1, then obtain V(Y).
43. Let X be an indicator of business scenario, such that the scenario is upbeat if X > 12, poor if X < 6,
and moderate if 6 ≤ X ≤ 12. The indicator X is studied at four different locations P, Q, R and S.
(a) If at location P, the distribution of X is continuous uniform with E(X) = 13 and V(X) = 12, then
what is the probability that the business scenario is moderate at this location?
(b) If at location Q, distribution of X is exponential and the business scenario is poor with probability
0.16, then what is the probability that the business scenario is upbeat at this location?
(c) At location R, the distribution of X is normal with mean μ and standard deviation σ. If at this
location, the business scenario is poor with probability 0.3, and upbeat with probability 0.2, then
calculate the ratio σ/μ.
[Given: Φ(0.3) = 0.6179, Φ(0.5244) = 0.7, Φ(0.8) = 0.7881, Φ(0.8416) = 0.8, Φ(1.2816) = 0.9]
(d) At location S, the distribution of X is specified by the probability density function f(x), where
f (x) = k if 4 ≤ x ≤ 8, f (x) = c if 8 < x ≤ 14, and f (x) = 0, otherwise.
Here k and c are constants. If E(X) = 8.37 at this location, then calculate the ratio c/k.
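Parts (c) and (d) of Problem 43 both come down to two equations in two unknowns, sketched below. Two assumptions are made here: the ratio asked for in (c) is taken to be standard deviation over mean, and the quantiles are taken from statistics.NormalDist rather than from the Φ values given in the problem, so the last decimal may differ slightly.

```python
from fractions import Fraction as F
from statistics import NormalDist

# (c) P(X < 6) = 0.3 and P(X > 12) = 0.2 give
#     (6 - mu)/sigma = z_poor and (12 - mu)/sigma = z_up
z_poor = NormalDist().inv_cdf(0.3)
z_up = NormalDist().inv_cdf(0.8)
sigma = 6 / (z_up - z_poor)
mu = 6 - z_poor * sigma
ratio_c = sigma / mu

# (d) total probability: 4k + 6c = 1
#     mean: k(8^2 - 4^2)/2 + c(14^2 - 8^2)/2 = 24k + 66c = 8.37
#     multiplying the first equation by 6 and subtracting leaves 30c = 8.37 - 6
c = (F(837, 100) - 6) / 30
k = (1 - 6 * c) / 4
ratio_d = c / k

print(round(ratio_c, 4), float(c), float(k), round(float(ratio_d), 4))
```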
44. A gift coupon appears in 20% of the packets of an expensive detergent A and also in 1% of the
packets of an inexpensive detergent B. Assume independence of all packets.
(a) Ayesha purchases four packets of A. What is the conditional probability of her getting two or more
gift coupons given that she has got at least one?
(b) Hari purchases 100 packets of B. Let Y equal 0 if he does not get any gift coupon, 1 if he gets one
gift coupon, and 2 if he gets two or more gift coupons. Calculate V(Y).
(c) Surjit goes on purchasing packets of A one by one till he gets two packets with gift coupons. Find
the probability that he purchases at most four packets in the process.
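The three parts of Problem 44 can be sketched as below. For (b), both the exact binomial count and its Poisson approximation with mean 1 are evaluated so that the two can be compared; the helper names binom_pmf and var_y are chosen here for illustration.

```python
import math
from math import comb

def binom_pmf(n, k, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

# (a) four packets of A, coupon probability 0.2: P(at least 2 | at least 1)
p_ge1 = 1 - binom_pmf(4, 0, 0.2)
p_ge2 = p_ge1 - binom_pmf(4, 1, 0.2)
cond = p_ge2 / p_ge1

# (b) Y takes values 0, 1, 2 with probabilities p0, p1 and 1 - p0 - p1
def var_y(p0, p1):
    p2 = 1 - p0 - p1
    mean = p1 + 2 * p2
    return p1 + 4 * p2 - mean**2

v_binom = var_y(binom_pmf(100, 0, 0.01), binom_pmf(100, 1, 0.01))
v_pois = var_y(math.exp(-1), math.exp(-1))   # Poisson(1): P(0) = P(1) = exp(-1)

# (c) two coupons within four purchases is the same event as
#     at least two coupons among the first four packets
p_c = p_ge2

print(round(cond, 4), round(v_binom, 4), round(v_pois, 4), round(p_c, 4))
```

The Poisson value agrees with the exact binomial one to about two decimal places.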

STATISTICS FOR MANAGEMENT
PROBLEMS ON PROBABILITY AND RANDOM VARIABLES

ANSWERS
1. 7/9
2. 0.048
3. 0.2872
4. (a) 0.08, (b) 0.4022, (c) 0.8689, (d) 2
5. (a) 0.44, (b) 0.404, (c) 0.8982, (d) 0.5556
6. (a) 0.83, (b) 0.086, (c) 0.506, (d) 0.2326, (e) 0.08
7. (a) 0.7, (b) 0.3413, (c) (I) 0.692, (II) 0.4878
8. (a) 0.7, (b) 0.25, (c) 0.8673
9. (a) 0.37, (b) 0.5556, (c) 0.246, (d) 0.214, (e) 0.41, (f) 0.789
10. (a) 0.372, (b) 1.408, (c) 5.73, (d) 0.6422, (e) 0.2623
11. (a) 0.189, (b) 0.095, (c) 1.98, (d) 0.235
12. (a) 0.27, (b) 7.5496, (c) 0.2041, (d) 41.76, (e) 0.2985
13. (a) 0.1, (b) 0.6, (c) 0.3
14. (a) 1/3, (b) 0.896, (c) 0.9354
15. (a) 0.5, (b) 0.3085, (c) (i) 1.89, (ii) 0.52
16. (a) 0.21, (b) 0.8037, (c) 7.82
17. 2.407
18. (a) 0.4, (b) 0.2712
19. (a) 0.3, (b) 5.36
20. E(Y – X) = 1, P(Y=3) = 0.375
21. (a) 0.0658, (b) 0.5391, (c) 1.5, (d) 6
22. (a) 0.4, (b) 0.3355, (c) 0.8673
23. (a) 2, (b) 0.9647, (c) 0.1856
24. (a) 0.8917, (b) 0.4762, (c) 0.8
25. 1.125
26. (i) 0.48, (ii) 0.4828
27. P(X=0)= 0.336, P(X=1)= 0.452, P(X=2)= 0.188, P(X=3)= 0.024,
E(X)= 0.9, V(X) = 0.61
28. (a) 0.91, (b) 0.335, (c) 0.78, (d) 0.2271
29. (a) 3/16, (b) 9/16, (c) 3/16, (d) 37/16, (e) 0.7148, (f) 27/16
30. (a) 0.5, (b) 2/3, (c) 2.5, (d) 0.25, (e) 2.3, (f) 7
31. (a) 0.1, (b) 4.2
32. (a) 7/24, (b) 26/9, (c) 77/192, (d) 49/576, (e) 1/2
33. (a) 0.2682, (b) (i) 0.1571, (ii) No, (c) Yes, by Chebyshev’s inequality,
(d) mean = – 1.246, standard deviation = 5.66
34. (a) 0.725, (b) 0.4, (c) 0.3125
35. (a) 1.459, (b) 0.8028, (c) 0.9, (d) 65.667
36. (a) 0.4651, (b) 1/3, (c) 3/64, (d) 12.96, (e) 1.71
37. (a) 2, (b) 3.7, (c) 2.25, (d) 2/9
38. (a) 36/13, (b) 12/49, (c) 1/4
39. (a) 0.7903, (b) 0.8704, (c) 2.9333
40. (a) 0.7667, (b) 0.098, (c) 0.8781, (d) 3.694.
41. (a) 0.4, (b) 0.8847, (c) 1.5
42. (a) 0.8313, (b) 0.3374, (c) 0.9130, (d) 0.2066
43. (a) 5/12, (b) 0.7056, (c) 0.5290, (d) 0.6008
44. (a) 0.3062, (b) 0.6214, (c) 0.1808