
Introduction to Probability and Statistics


The maths, the computation, and examples.

Dr Asad Ali

Department of Space Science


Institute of Space Technology
Islamabad, Pakistan


Chapter 6: Probability Distributions

In the last chapter we learned that every random variable follows some probability distribution, for which a corresponding pmf or pdf can be established. This means that the observations on that variable are generated according to some mechanism, and that mechanism is governed by a well-defined mathematical model called the probability mass function or probability density function. Furthermore, we learned how to calculate the probabilities of different events and probabilistic statements, and that mathematical expectation can be used to deduce properties of a random variable such as its mean, variance, covariance, moments and correlation. Now we are going to study some real-world random variables for which proper probability distributions have been established.
Some discrete probability distributions

The Binomial distribution:


Before introducing the Binomial distribution, we first talk about a special type of trial called the Bernoulli trial.
Often, only two outcomes can be attributed to a process.
Here are just a few examples:
A tossed coin falls either with ‘head’ or ‘tail’ up

An industrial process produces processors that can be either ‘usable’ or ‘defective’.

Polled individuals can be either ‘for’ or ‘against’ the death penalty.

Blood tests look for the ‘presence’ or ‘absence’ of antibodies in blood.

Can you name a few?



Definition: Bernoulli Trial


A trial is called a Bernoulli trial if
1 The trial has two outcomes: Success (S) or Failure (F ).
2 For each trial, p = P (S) is the probability of Success and q = 1 − p = P (F ) is the probability of
Failure. Both p and q are fixed for all trials.
3 Trials are independent of each other.

Now we can define the binomial distribution as follows.

Definition: Binomial Distribution


An experiment is called a binomial experiment if
1 There are (fixed) n trials.
2 Each trial of the experiment has two outcomes: Success (S) or Failure (F).
3 For each trial, p = P (S) is the probability of Success and q = 1 − p = P (F ) is the probability of
Failure. Both p and q are fixed for all trials.
4 Trials are independent of each other.
The first property shows that a binomial experiment consists of n Bernoulli trials.

Binomial Distribution
Let X denote the number of successes in a binomial experiment (X is then called a binomial random variable). The pmf of X is defined as

$$f(x) = P(X = x) = \binom{n}{x} p^{x} q^{n-x}, \qquad x = 0, 1, 2, \ldots, n$$

The quantities n and p are called the parameters of the binomial distribution, which is traditionally denoted by b(x; n, p).
How to do it in R
In R the binomial probabilities can be calculated in two ways.
1. Using the cumulative probabilities:

pbinom(x, n, p)   # cumulative probability P(X <= x)
# P(X = x) = pbinom(x, n, p) - pbinom(x - 1, n, p), so a subtraction is needed
# e.g. P(X = 2) when n = 5, p = 0.25:
pbinom(2, 5, 0.25) - pbinom(1, 5, 0.25)

2. Using the exact probabilities:

dbinom(x, n, p)   # exact probability P(X = x)
# e.g. P(X = 2) when n = 5, p = 0.25:
dbinom(2, 5, 0.25)
Note: These approaches can be applied to many other distributions.

Example 1.
Consider a coin tossing experiment in which the coin is tossed five times. Find the probabilities of
obtaining various numbers of heads.
Solution:
Let's check the properties of this experiment.
1 Each toss has two possible outcomes: either a head (success) or a tail (failure) occurs.
2 The probability of a success is p = 1/2 (and hence q = 1 − p = 1/2), and it remains the same for all trials.
3 The successive trials of the experiment (i.e. the successive tosses) are independent.
4 The coin is tossed n = 5 times.
Thus it is a binomial experiment. Let the rv X denote the number of heads (successes); then it has a binomial distribution with pmf:

$$f(x) = P(X = x) = \binom{5}{x}\left(\frac{1}{2}\right)^{x}\left(\frac{1}{2}\right)^{5-x}, \qquad x = 0, 1, 2, \ldots, 5$$

Let's put in the values of X one by one, as follows.

$$P(\text{no head}) = P(X = 0) = \binom{5}{0}\left(\frac{1}{2}\right)^{0}\left(\frac{1}{2}\right)^{5-0} = 1 \times \left(\frac{1}{2}\right)^{5} = \frac{1}{32}$$

$$P(\text{one head}) = P(X = 1) = \binom{5}{1}\left(\frac{1}{2}\right)^{1}\left(\frac{1}{2}\right)^{5-1} = 5 \times \left(\frac{1}{2}\right)^{5} = \frac{5}{32}$$

$$P(\text{two heads}) = P(X = 2) = \binom{5}{2}\left(\frac{1}{2}\right)^{2}\left(\frac{1}{2}\right)^{5-2} = 10 \times \left(\frac{1}{2}\right)^{5} = \frac{10}{32}$$

$$P(\text{three heads}) = P(X = 3) = \binom{5}{3}\left(\frac{1}{2}\right)^{3}\left(\frac{1}{2}\right)^{5-3} = 10 \times \left(\frac{1}{2}\right)^{5} = \frac{10}{32}$$

$$P(\text{four heads}) = P(X = 4) = \binom{5}{4}\left(\frac{1}{2}\right)^{4}\left(\frac{1}{2}\right)^{5-4} = 5 \times \left(\frac{1}{2}\right)^{5} = \frac{5}{32}$$

$$P(\text{five heads}) = P(X = 5) = \binom{5}{5}\left(\frac{1}{2}\right)^{5}\left(\frac{1}{2}\right)^{5-5} = 1 \times \left(\frac{1}{2}\right)^{5} = \frac{1}{32}$$

The binomial probability distribution of the number of heads obtained in 5 tosses of a coin is

x_i       0      1      2      3      4      5
f(x_i)    1/32   5/32   10/32  10/32  5/32   1/32

Using R
dbinom(0:5, 5, 0.5)
[1] 0.03125 0.15625 0.31250 0.31250 0.15625 0.03125

Example 2.
The probability of getting caught copying someone else's exam is 0.2. Find the probability of not getting caught in three attempts. Assume independence.
Solution:
Here p = 0.2 (q = 1 − p = 0.8) and n = 3. Let X denote the number of successes (getting caught); then it has a binomial distribution with pmf

$$f(x) = P(X = x) = \binom{n}{x}p^{x}q^{n-x} = \binom{3}{x}(0.2)^{x}(0.8)^{3-x}, \qquad x = 0, 1, 2, 3$$

Now, since no one gets caught, X = 0:

$$f(0) = P(X = 0) = \binom{3}{0}(0.2)^{0}(0.8)^{3-0} = (0.8)^{3} = 0.512$$
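In R this is a one-liner:

dbinom(0, 3, 0.2)   # P(X = 0) for n = 3, p = 0.2
# [1] 0.512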

Example 3.
Let A and B play a game in which A's probability of winning is 2/3. In a series of 8 games, what is the probability that A will win (i) exactly 4 games, (ii) at least 4 games, (iii) at most 6 games, and (iv) from 3 to 6 games?
Solution:
We observe that:
1 Each game has two possible outcomes: A wins or does not win the game.
2 For all games the probability of A winning a game is p = 2/3.
3 The successive games are independent.
4 There are n = 8 games.
Let X denote the number of games won by A (successes); then it has a binomial distribution b(x; 8, 2/3) with pmf:

$$f(x) = P(X = x) = \binom{n}{x}p^{x}q^{n-x} = \binom{8}{x}\left(\frac{2}{3}\right)^{x}\left(\frac{1}{3}\right)^{8-x}, \qquad x = 0, 1, 2, \ldots, 8$$

Now we find the required probabilities:

$$P(X = 4) = \binom{8}{4}\left(\frac{2}{3}\right)^{4}\left(\frac{1}{3}\right)^{8-4} = \frac{1120}{6561} = 0.1707$$

and

$$P(X \ge 4) = P(X=4) + P(X=5) + P(X=6) + P(X=7) + P(X=8) = \binom{8}{4}\left(\tfrac{2}{3}\right)^{4}\left(\tfrac{1}{3}\right)^{4} + \binom{8}{5}\left(\tfrac{2}{3}\right)^{5}\left(\tfrac{1}{3}\right)^{3} + \cdots + \binom{8}{8}\left(\tfrac{2}{3}\right)^{8}\left(\tfrac{1}{3}\right)^{0} = 0.9121$$

Note that P(X ≥ 4) = 1 − P(X < 4) (complement), so you can also calculate it that way. Similarly,

$$P(X \le 6) = P(X=0) + P(X=1) + \cdots + P(X=6) \quad (7 \text{ terms})$$
$$= 1 - P(X > 6) \quad (\text{using the complement rule})$$
$$= 1 - [P(X=7) + P(X=8)] \quad (\text{only two terms})$$
$$= 1 - \left[\binom{8}{7}\left(\tfrac{2}{3}\right)^{7}\left(\tfrac{1}{3}\right)^{1} + \binom{8}{8}\left(\tfrac{2}{3}\right)^{8}\left(\tfrac{1}{3}\right)^{0}\right] = 1 - \frac{1280}{6561} = 0.8049$$

and

$$P(3 \le X \le 6) = P(X=3) + P(X=4) + P(X=5) + P(X=6) = \binom{8}{3}\left(\tfrac{2}{3}\right)^{3}\left(\tfrac{1}{3}\right)^{5} + \cdots + \binom{8}{6}\left(\tfrac{2}{3}\right)^{6}\left(\tfrac{1}{3}\right)^{2} = \frac{5152}{6561} = 0.7852$$

Binomial Frequency Distribution

Sometimes we perform an entire binomial experiment repeatedly, say N times. For example, one may want to toss a coin 10 times every day for 30 consecutive days. Recall the relative frequencies of frequency tables: we saw in the definition of mathematical expectation that the relative frequencies are essentially the probabilities f(x) for each value of X. That is, if we multiply each relative frequency (or, analogously, f(x)) by the total number of observations, we recover the frequency table. In the same manner, we can construct a frequency table, called the binomial frequency table, by multiplying the probability of each number of successes by the number of times the experiment was repeated. Mathematically, we define it as:

$$N \cdot f(x) = N \cdot \binom{n}{x} p^{x} q^{n-x}, \qquad x = 0, 1, 2, \ldots, n$$

In the case of a frequency table, one cannot reconstruct the actual frequency table exactly from the relative frequencies unless one knows the total of the frequencies (∑f); thus N, in this case, is not necessarily equal to ∑f. N is just the number of times the entire experiment was performed.
Home work:
Mean and Variance: Let X be a binomial rv with pmf

$$f(x) = \binom{n}{x} p^{x} q^{n-x}, \qquad x = 0, 1, 2, \ldots, n$$

Find E(X), E(X²) and V(X).
Example 4.
A five-dice experiment was repeated 96 times. Find the expected frequencies when getting a 4, 5 or 6 is considered a success.
Solution:
The probability of getting each of 4, 5 and 6 in a single trial is 1/6, and using the addition law of probability we observe that P(4 or 5 or 6) = 1/6 + 1/6 + 1/6 = 1/2; this is our p. So we now have n = 5, p = 1/2 and N = 96. Let X be the rv denoting the number of successes (getting a 4, 5 or 6); then X can take the values 0, 1, 2, 3, 4, 5. Thus the binomial frequency distribution is given by:

$$N \cdot f(x) = N \cdot \binom{n}{x} p^{x} q^{n-x}, \qquad x = 0, 1, 2, \ldots, n$$

Putting in the values of the relevant quantities, i.e. N, n, p and q,

$$N \cdot f(x) = 96 \cdot \binom{5}{x}\left(\frac{1}{2}\right)^{x}\left(\frac{1}{2}\right)^{5-x}, \qquad x = 0, 1, 2, \ldots, 5.$$

Now make a two-column frequency table as shown:

x    96 · C(5, x) · (1/2)^x · (1/2)^(5−x)
0    3
1    15
2    30
3    30
4    15
5    3
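In R the whole table is one line:

96 * dbinom(0:5, 5, 0.5)
# [1]  3 15 30 30 15  3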
This is known as the binomial frequency distribution.
In your intermediate mathematics, you may have seen the expansion of a binomial expression (p + q)^n. When multiplied by N, it gives the same thing as calculated in the above table. That is, we get the same results if we expand

$$96\left(\frac{1}{2} + \frac{1}{2}\right)^{5} \qquad (\text{like } N(p+q)^{n})$$

using the binomial theorem. In practice,

$$(p+q)^{n} = \sum_{x=0}^{n} \binom{n}{x} p^{x} q^{n-x}$$

so one just needs to pick out the individual terms of this sum. Also note that

$$(p+q)^{n} = 1 = \sum_{x=0}^{n} \binom{n}{x} p^{x} q^{n-x}$$

that is, the sum of all the probabilities is equal to one.

Properties of Binomial Distribution
For a discrete random variable X having a binomial distribution with parameters n and p (q = 1 − p):
1 The mean is given by E(X) = µ = np.
2 The variance is given by Var(X) = σ² = npq, or equivalently Var(X) = σ² = np(1 − p).

Example 5.
A binomial random variable has mean 12.38 and variance 8.64. Find n and p.
Solution:
We know that the mean of a binomial random variable X is

µ = np    (1)

and the variance of X is given by

σ² = npq    (2)

Putting in the values and then dividing equation (2) by equation (1) (you can use either order) gives

$$\frac{\sigma^{2}}{\mu} = \frac{npq}{np} = \frac{8.64}{12.38} \implies q = 0.698 \implies p = 1 - q = 1 - 0.698 = 0.302$$

Also,

$$\mu = np \implies n = \frac{\mu}{p} = \frac{12.38}{0.302} \approx 41$$
Example 6.
Is it possible to have a binomial distribution with mean 5 and standard deviation 3?
Solution: Let's check. We need n and p (and/or q). We have

$$\frac{\sigma^{2}}{\mu} = \frac{npq}{np} = \frac{9}{5} \implies q = 1.8$$

But q is just a probability (the probability of failure) and must lie between 0 and 1, so it is not possible to have a binomial distribution with mean 5 and standard deviation 3.

Example 7.
If X is binomially distributed with mean 3 and variance 2, find P(X = 7).
Solution: Again, to specify the pmf of X, which is binomially distributed, we need its two parameters n and p. So

$$\frac{\sigma^{2}}{\mu} = \frac{npq}{np} = \frac{2}{3} \implies q = \frac{2}{3},\ p = \frac{1}{3}, \quad \text{and} \quad \mu = np \implies n = \frac{\mu}{p} = \frac{3}{1/3} = 9$$

We now have n and p, so we can specify the pmf of X as follows.

$$b(x; 9, \tfrac{1}{3}) = \binom{9}{x}\left(\frac{1}{3}\right)^{x}\left(\frac{2}{3}\right)^{9-x}, \qquad x = 0, 1, 2, \ldots, 9$$

Thus P(X = 7) is given as

$$P(X = 7) = \binom{9}{7}\left(\frac{1}{3}\right)^{7}\left(\frac{2}{3}\right)^{9-7} = \frac{16}{2187} = 0.0073$$

Fitting a binomial distribution to observed data

In practice, we often need to know which probability distribution can best explain the nature/behaviour of our observations (the observed data). This is one of the real purposes of applying theoretically established probability models to numerical observations. Fitting a binomial distribution to observed data is very easy, as we just need n [note that n ≠ ∑f; rather, n is the largest value that the binomial rv can take in the given experiment, i.e. x = 0, 1, 2, ..., n] and the actual mean x̄ = ∑fx / ∑f of the observed data. For a binomial distribution we need n and p, and both can easily be found for any such data. We know that x̄ = np, so putting in the values of n and x̄ gives the value of p. Thus we just find x̄ for the given data and can then specify a binomial pmf with its parameters n and p. The following example should clear up any confusion.

Example 9.
Fit a binomial distribution to the following data.

x   0    1    2    3    4
f   30   62   46   10   2

Solution:
To fit a binomial distribution to this data we need the actual mean of the data. Here the largest value of X is 4, so n = 4. Now,

$$\bar{x} = \frac{\sum fx}{\sum f} = \frac{0 + 62 + 92 + 30 + 8}{150} = \frac{192}{150} = 1.28$$

Using the relation x̄ = np → 1.28 = 4p, we find p = 0.32 and q = 1 − p = 0.68. So the pmf of X can be specified as

$$b(x; 4, 0.32) = P(X = x) = \binom{4}{x}(0.32)^{x}(0.68)^{4-x}, \qquad x = 0, 1, 2, 3, 4.$$

Now what we need is the same as in the construction of the binomial frequency table (Example 4). Here we have N = 150, so we just need a table with columns of x and

$$150 \cdot \binom{4}{x}(0.32)^{x}(0.68)^{4-x}$$

List the values of x, find the corresponding P(X = x), and then N · P(X = x).
Here is what we were looking for.

x    b(x; 4, 0.32) = C(4, x)(0.32)^x (0.68)^(4−x)    150 · f(x)
0    0.21381376                                      32.072064 ≈ 32
1    0.40247296                                      60.370944 ≈ 60
2    0.28409856                                      42.614784 ≈ 43
3    0.08912896                                      13.369344 ≈ 13
4    0.01048576                                      1.572864 ≈ 2

The frequencies in the last column are called the expected frequencies, whereas the actual frequencies are called the observed frequencies. We can compare them by listing them together in the following table.

x            0    1    2    3    4
Observed f   30   62   46   10   2
Expected f   32   60   43   13   2

We can see that the actual and expected frequencies do not differ very much, so the fit is rather good. This is just a rough analysis; the proper analysis of a fit is not that simple. To check the goodness of a distribution fit, we use different tests in statistical methods.
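The whole fit takes a few lines of R:

x <- 0:4
f <- c(30, 62, 46, 10, 2)         # observed frequencies
n <- max(x)                       # n = 4
p <- sum(f * x) / sum(f) / n      # xbar = n*p  =>  p = xbar/n = 0.32
round(sum(f) * dbinom(x, n, p))   # expected frequencies: 32 60 43 13 2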
Probability Tables
In practice, probabilities can be read from the probability tables that are available for most common probability distributions. For example, for the binomial distribution, a table that lists the exact or cumulative probabilities associated with various values of X and p can be used to find the required binomial probabilities. A table of cumulative binomial probabilities gives

$$P(X \le c) = \sum_{x=0}^{c} \binom{n}{x} p^{x} q^{n-x}$$

Suppose n = 3 and p = 0.4; then, say, the probability P(X = 2) is

P(X = 2) = P(X ≤ 2) − P(X ≤ 1) = 0.936 − 0.648 = 0.288
Probability Tables
Another frequently used table is the table of exact binomial probabilities; with such a table we can read P(X = x) directly. Such a table (for instance, one truncated to n = 4) lists

$$P(X = x) = \binom{n}{x} p^{x} q^{n-x}$$

When n = 3 and p = 0.4, the probability P(X = 2) is read directly as 0.288. These tables are very useful as they help in finding probabilities without calculating the cumbersome combinations and powers of decimal probabilities. However, there is also a disadvantage: these tables are available for only a few values of p. For example, the probabilities corresponding to p = 0.2345 are not listed in such tables. Probabilities are very sensitive to rounding errors, so avoid these tables when p does not closely match one of the listed values.
Poisson Distribution
Many experimental situations occur in which we observe the counts of events within a specified unit of time, area, volume, length, etc. For example:
The number of telephone calls in an hour.
The number of cases of a disease per square kilometer in a specified area.
The number of dolphin pod sightings along a flight path through a region.
The number of particles emitted by a radioactive source in a given time.
The number of births per hour during a given day.
The Poisson distribution is a discrete probability distribution for the counts of events that occur randomly in a given interval of time (or space). Let X be the number of events, distributed independently in time, occurring in a fixed time interval or region. Then, if the mean number of events per interval is λ, the probability of observing x events in a given interval is given by

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \qquad x = 0, 1, 2, 3, 4, \ldots$$

where λ is the only parameter of the distribution, interpreted as the average number of events in a given interval/region; for example, the average number of calls per hour. Since it uses 'per', it is, in reality, a rate. The Poisson distribution is generally denoted by p(x; λ) or Poisson(λ). Poisson-distributed random variables appear in many astronomical studies, and it is thus very important to understand them well.
In R the probabilities can be found in the same way as for the binomial: here we use ppois(x, lambda) and dpois(x, lambda) according to the situation.
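For instance, with an assumed rate of λ = 4 events per interval:

dpois(2, lambda = 4)       # P(X = 2)  -> 0.1465251
ppois(2, lambda = 4)       # P(X <= 2) -> 0.2381033
1 - ppois(2, lambda = 4)   # P(X > 2)  -> 0.7618967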
Example 10.
If the number of arrivals is 10 per hour on average, determine the probability that, in any hour, there will be
1 0 arrivals;
2 6 arrivals;
3 more than 6 arrivals.
Solution:
We see that neither a probability of success p nor a number of trials n is given; we are only given the average number of arrivals per hour (the rate of arrivals). The underlying variable is a discrete rv, so we use the Poisson distribution to find these probabilities. We have λ = 10. Let X be the number of arrivals; then P(X = x) is defined as

$$P(X = x) = \frac{e^{-10}\,10^{x}}{x!}, \qquad x = 0, 1, 2, 3, 4, \ldots$$

Now,

(1) P(X = 0) = e⁻¹⁰ · 10⁰/0! = 4.539993e−05
(2) P(X = 6) = e⁻¹⁰ · 10⁶/6! = 0.063055
(3) P(X > 6) = 1 − P(X ≤ 6) = 1 − {P(X = 0) + P(X = 1) + · · · + P(X = 6)} = 1 − 0.1301414 = 0.8698586
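The same three answers in R:

dpois(0, 10)       # (1) -> 4.539993e-05
dpois(6, 10)       # (2) -> 0.06305546
1 - ppois(6, 10)   # (3) -> 0.8698586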
Example 11. Home Work
The average rate of telephone calls in a busy reception is 4 per minute. If it can be assumed that the
number of telephone calls per minute interval is Poisson distributed, calculate the probability that
1 at least 2 telephone calls will be received in any minute.
2 any minute will be free of telephone calls.
3 no more than one telephone call will be received in any one minute interval.
Also go through Section 3.7 and the exercises at its end, i.e. Exercises Section 3.7 (93-109).

Using the Poisson to approximate the Binomial

The Binomial and Poisson distributions are both discrete probability distributions, and in some circumstances they are very similar. A Binomial(n, p) can be approximated by a Poisson(λ) when n is very large (n → ∞) and p is very small (p → 0), such that np remains constant. Mathematically,

$$\lim_{n \to \infty} b(x; n, p) = \frac{e^{-\lambda}\lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots$$

where λ = np.
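A quick numerical check of the approximation, using n = 500 and p = 0.03 (so λ = 15, the setting of the next example):

dbinom(10, 500, 0.03)   # exact binomial P(X = 10)
dpois(10, 15)           # Poisson approximation; the two values are close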

Example 12.
Past experience in the production of a certain item has shown that the probability of an item being defective is p = 0.03. The items are shipped in boxes of 500. What is the probability that
1 a box contains 3 or more defectives;
2 two successive boxes contain 6 or more defectives each?
Solution:
Let X represent the number of defective items in a box. In reality, this is a binomial problem because of the nature of the outcomes (defective or good). However, we see that p = 0.03 is very small and n = 500 is very large, making it a challenge to use the binomial distribution directly (evaluating $P(X \ge 3) = \sum_{x=3}^{500}\binom{500}{x}(0.03)^{x}(0.97)^{500-x}$ by hand can drive you crazy). So to find the above probabilities we use the Poisson distribution with

λ = np = 500 × 0.03 = 15

So the probability mass function of X is specified as:

$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-15}\,15^{x}}{x!}, \qquad x = 0, 1, 2, \ldots$$

Now we can easily calculate the required probabilities.
1 The probability of 3 or more defectives in a box is:

$$P(X \ge 3) = 1 - P(X < 3) = 1 - [P(X=0) + P(X=1) + P(X=2)] = 1 - e^{-15}\left[\frac{15^{0}}{0!} + \frac{15^{1}}{1!} + \frac{15^{2}}{2!}\right] = 0.99996$$

2 Similarly, to find the probability of 6 or more defective items in two boxes, we first need the probability of 6 or more defective items in a single box, which is:

$$P(X \ge 6) = 1 - P(X < 6) = 1 - [P(X=0) + P(X=1) + \cdots + P(X=5)] = 1 - e^{-15}\left[\frac{15^{0}}{0!} + \frac{15^{1}}{1!} + \cdots + \frac{15^{5}}{5!}\right] = 0.9972076$$
Thus the probability of having 6 or more defective items in any individual box is 0.9972076. Assuming that the two boxes were filled independently, we can calculate the probability of 6 or more defective items in each of two boxes as follows:

P(X ≥ 6 in both boxes) = P(X ≥ 6) × P(X ≥ 6) = (0.9972076)² = 0.994423

Poisson frequency distribution
Like the binomial frequency distribution, we can find the Poisson frequency distribution by multiplying the Poisson probability distribution by N, the number of experiments performed. It is defined as:

$$N \cdot f(x) = N \cdot \frac{e^{-\lambda}\lambda^{x}}{x!}, \qquad x = 0, 1, 2, \ldots$$

That is, just multiply the actual Poisson probabilities corresponding to x = 0, 1, 2, 3, ... by the given N. The method is the same as was explained in Example 4.

Fitting a Poisson distribution to observed data
We can also fit a Poisson distribution to given data. One first estimates λ by calculating x̄ = ∑fx / ∑f for the given data, and then uses the possible values x = 0, 1, 2, 3, ... (ignoring the frequencies) to find the corresponding probabilities. Then multiply those probabilities by ∑f (or N) to get the Poisson frequency distribution. Go through Examples 4 and 9 in the previous slides. See also Example 8.20, page 317, in Sher Muhammad Chaudhry's book.
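A minimal R sketch of such a fit, using hypothetical observed frequencies f for x = 0, 1, ..., 5:

x <- 0:5
f <- c(40, 35, 16, 6, 2, 1)               # hypothetical observed frequencies
lambda_hat <- sum(f * x) / sum(f)         # estimate lambda by the sample mean
round(sum(f) * dpois(x, lambda_hat), 1)   # expected Poisson frequencies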
Properties of Poisson distribution
1 Mean: The mean of a Poisson rv X is E(X) = λ.
2 Variance: The variance of a Poisson rv X is V (X) = E(X 2 ) − [E(X)]2 = λ.
Interesting! The mean and variance of a Poisson random variable are the same.


Negative binomial distribution:

A negative binomial random variable is similar to a binomial random variable in that each experiment consists of a sequence of independent trials, each trial results in a success or a failure, and the probability of success is constant from trial to trial. The negative binomial differs from the binomial in that there is not a fixed number of trials: the trials are performed until a total of r successes have been observed, so the number of trials is now random. Let X = the number of failures that precede the rth success; then X is a negative binomial random variable. If there are x failures and r successes (i.e., a total of x + r trials), there must be r − 1 successes in the first x + r − 1 trials. The negative binomial distribution has two parameters, r and p, and has probability mass function:

$$P(X = x) = nb(x; r, p) = \binom{x+r-1}{r-1} p^{r} (1-p)^{x}, \qquad x = 0, 1, 2, \ldots$$

The expected value and variance of a negative binomial random variable X are:

$$E(X) = \frac{r(1-p)}{p}, \qquad V(X) = \frac{r(1-p)}{p^{2}}$$

and X is said to have a negative binomial distribution with parameters r and p. A special case arises when r = 1; the distribution is then known as the geometric distribution, i.e. one repeats the experiment until the first (and only) success occurs. The resulting pmf is then written as

$$P(X = x) = geo(x; p) = p(1-p)^{x}, \qquad x = 0, 1, 2, \ldots$$
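R uses the same failures-before-the-rth-success convention in dnbinom (and dgeom for the geometric case); a quick sketch with illustrative parameter values:

dnbinom(2, size = 3, prob = 0.5)   # P(X = 2): two failures before the 3rd success -> 0.1875
dgeom(0, prob = 0.2)               # geometric (r = 1): P(X = 0) = p -> 0.2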
Some continuous probability distributions
The uniform distribution
We know that a continuous rv assumes any value in a given interval. A continuous rv X, defined over an interval [a, b], is said to be uniformly distributed if all its values have equal probability density. Its probability density function is defined as

$$f(x) = \begin{cases} \dfrac{1}{b-a}, & a < x < b \\ 0, & \text{elsewhere.} \end{cases}$$

where a and b are the two parameters of the distribution. In other words, a random variable is uniformly distributed whenever the probability is proportional to the length of the interval. It is also called the rectangular distribution because its density looks like a perfect rectangle: a constant height of 1/(b − a) over the interval from a to b (figure omitted). The notations for the uniform distribution are U(a, b) or Uniform(a, b). The area within this rectangle is unity, that is, $\int_{a}^{b} \frac{1}{b-a}\,dx = 1$.
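In R the uniform functions are dunif, punif and runif; for example, for an assumed U(2, 6):

dunif(3, min = 2, max = 6)   # density: 1/(6 - 2) = 0.25
punif(3, min = 2, max = 6)   # cdf: (3 - 2)/(6 - 2) = 0.25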

The cdf of the uniform distribution is obtained as

$$F(x) = \int_{a}^{x} f(t)\,dt = \int_{a}^{x} \frac{1}{b-a}\,dt = \frac{1}{b-a}\,[t]_{a}^{x} = \frac{x-a}{b-a}, \qquad a < x < b$$

Hence, the cdf is

$$F(x) = \begin{cases} 0, & x \le a \\ \dfrac{x-a}{b-a}, & a < x < b \\ 1, & x \ge b \end{cases}$$

Properties of uniform distribution
1 The Mean: The mean of a uniform rv X is obtained as:

$$\mu = E(X) = \int_{a}^{b} x f(x)\,dx = \frac{1}{b-a}\int_{a}^{b} x\,dx = \frac{1}{b-a}\left[\frac{x^{2}}{2}\right]_{a}^{b} = \frac{b^{2}-a^{2}}{2(b-a)} = \frac{(b-a)(b+a)}{2(b-a)} = \frac{b+a}{2}$$

2 The Variance: The variance of X is given by V(X) = E(X²) − [E(X)]². Now,

$$E(X^{2}) = \int_{a}^{b} x^{2} f(x)\,dx = \frac{1}{b-a}\int_{a}^{b} x^{2}\,dx = \frac{1}{b-a}\left[\frac{x^{3}}{3}\right]_{a}^{b} = \frac{b^{3}-a^{3}}{3(b-a)} = \frac{(b-a)(b^{2}+ab+a^{2})}{3(b-a)}$$

So,

$$E(X^{2}) = \frac{b^{2}+ab+a^{2}}{3}$$

Thus,

$$V(X) = E(X^{2}) - [E(X)]^{2} = \frac{b^{2}+ab+a^{2}}{3} - \frac{(b+a)^{2}}{4} = \frac{(b-a)^{2}}{12}$$

Beta Distribution
We often need to model the distribution of a proportion (i.e., 0 < X < 1), where X is a continuous random variable. The beta distribution is often used in this framework. X is said to have a beta distribution with parameters α, β, A, and B if the pdf of X is:

$$f(x; \alpha, \beta, A, B) = \begin{cases} \dfrac{1}{B-A}\cdot\dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\left(\dfrac{x-A}{B-A}\right)^{\alpha-1}\left(\dfrac{B-x}{B-A}\right)^{\beta-1}, & A \le x \le B \\ 0, & \text{otherwise} \end{cases}$$

The case A = 0 and B = 1 gives the standard beta distribution, which is the form commonly used:

$$f(x; \alpha, \beta) = \begin{cases} \dfrac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\, x^{\alpha-1}(1-x)^{\beta-1}, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$$

The mean and variance for a beta distribution are:

$$E(X) = \frac{\alpha}{\alpha+\beta}, \qquad V(X) = \frac{\alpha\beta}{(\alpha+\beta)^{2}(\alpha+\beta+1)}$$

Exercise: find the mean and variance of the beta distribution using the expectation method.
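In R the standard beta density and cdf are dbeta and pbeta; for example, with assumed α = 2, β = 5:

dbeta(0.3, shape1 = 2, shape2 = 5)   # density at x = 0.3
pbeta(0.3, shape1 = 2, shape2 = 5)   # P(X <= 0.3)
2 / (2 + 5)                          # theoretical mean alpha/(alpha + beta)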


Gamma Distribution
Another widely used distributional model for skewed data is the gamma family of distributions. A continuous random variable X is said to have a gamma distribution if the pdf of X is:

$$f(x; \alpha, \beta) = \begin{cases} \dfrac{1}{\beta^{\alpha}\,\Gamma(\alpha)}\, x^{\alpha-1}\exp\left(-\dfrac{x}{\beta}\right), & x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

where α > 0, β > 0, and Γ is the gamma function defined above.

The standard gamma distribution has β = 1. The mean and variance for a gamma distribution are:

$$E(X) = \alpha\beta, \qquad V(X) = \alpha\beta^{2}$$

Finding the mean and variance of the gamma distribution is left as an exercise.
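In R, dgamma and pgamma take the shape α and either a rate (1/β) or a scale (β) argument; the β above is the scale. With assumed α = 3, β = 0.5:

dgamma(2, shape = 3, scale = 0.5)   # density at x = 2
pgamma(2, shape = 3, scale = 0.5)   # P(X <= 2)
3 * 0.5                             # theoretical mean, alpha * beta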

Exponential Distribution
A special case of the gamma distribution is the exponential distribution. Taking α = 1 and β = 1/λ gives the exponential pdf:

$$f(x; \lambda) = \begin{cases} \lambda \exp(-\lambda x), & \lambda > 0,\ x \ge 0 \\ 0, & \text{otherwise} \end{cases}$$

The expected value and variance of an exponential random variable X are:

$$E(X) = \frac{1}{\lambda}, \qquad V(X) = \frac{1}{\lambda^{2}}$$

Derive the mean and variance yourself.
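In R, dexp and pexp use the rate λ directly; for example, with an assumed λ = 2:

pexp(1, rate = 2)   # P(X <= 1) = 1 - exp(-2 * 1) -> 0.8646647
1 / 2               # theoretical mean, 1/lambda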

Chi-Squared Distribution
The chi-squared distribution is another member of the gamma family, with α = ν/2 and β = 2. A random variable X has a chi-squared distribution with ν degrees of freedom, denoted X ∼ χ²_ν, if it has pdf:

$$f(x; \nu) = \begin{cases} \dfrac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, x^{\nu/2-1}\exp\left(-\dfrac{x}{2}\right), & x \ge 0,\ \nu = 1, 2, 3, \ldots \\ 0, & \text{otherwise} \end{cases}$$

The mean and variance of X ∼ χ²_ν are

$$E(X) = \nu, \qquad V(X) = 2\nu$$

The chi-squared distribution is widely used in statistical inference. In fact, if X ∼ N(0, 1), then X² ∼ χ²₁. The most commonly used procedure in inference involving the chi-squared distribution is the Pearson goodness-of-fit statistic:

$$\sum_{i=1}^{k} \frac{(O_i - E_i)^{2}}{E_i}$$

where the O_i are the observed frequencies and the E_i are the expected frequencies (after fitting a chosen distribution to the observed data and finding the expected frequencies, as was done for the binomial and Poisson distributions).
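As a sketch, the Pearson statistic for the binomial fit of Example 9 can be computed in R; the degrees of freedom used here (k − 1 − 1 = 3, since one parameter was estimated) follow the usual test setup and are our assumption, not part of the slides:

observed <- c(30, 62, 46, 10, 2)
expected <- 150 * dbinom(0:4, 4, 0.32)
chisq <- sum((observed - expected)^2 / expected)
chisq                                       # Pearson goodness-of-fit statistic
pchisq(chisq, df = 3, lower.tail = FALSE)   # approximate p-value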
The Normal Distribution (the father of all distributions)
The normal probability distribution is the most important distribution for describing a continuous random variable. It is widely used in statistical inference (testing of hypotheses: next chapter). It has a bell-shaped probability density function, known as the Gaussian function or, informally, the bell curve. It is usually denoted by N(µ, σ²). Let X be a normal rv; then its pdf is given as:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}, \qquad -\infty < x < +\infty$$

where the mean µ (−∞ < µ < +∞) and the standard deviation σ > 0 are the two parameters that determine the normal distribution. (Figure: the bell-shaped density, symmetric about µ.) In the real world, a great many processes are approximately normally distributed.

The effects of changing µ and σ on the normal curve.

(Figure: three normal densities, with µ = 2, σ = 0.5; µ = 4, σ = 1; and µ = 8, σ = 2.)

Changes in µ cause changes in the location of the density, and changes in σ cause changes in the spread of the density. Therefore, the parameters µ and σ are also called the location and scale parameters, respectively.
The cumulative distribution function of a normal rv X is given as:

$$F(x) = P(X \le x) = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{x} e^{-\frac{1}{2}\left(\frac{t-\mu}{\sigma}\right)^{2}}\,dt$$

(Figure: the S-shaped CDF of a normal rv, rising from 0 to 1.)

Standard normal distribution
This is what you will actually use to find the probabilities associated with a normal random variable. The normal distribution is determined by the values of the two parameters µ and σ. Looking at the ranges of these two parameters (−∞ < µ < +∞ and σ > 0), we can see that there is an infinite number of normal distributions. Unfortunately, it is not practical to work with a normal rv in its raw form: the integral of the normal density has no simple closed form, and we would need a separate probability table for each distribution, which is an impossible job. So we use a standardized version in which the mean and standard deviation are fixed constants (µ = 0, σ = 1) for all normal rv's, so that a single probability table can be used for all of them. This standardization is, in reality, a transformation based on the term (X − µ)/σ in the exponent of the normal density function:

$$Z = \frac{X - \mu}{\sigma}$$

where X is a normal rv. The pdf of Z is then

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^{2}/2}, \qquad -\infty < z < +\infty$$

It has zero mean and unit variance. A cumulative probability table is constructed for various values of Z using the following cdf:

$$\Phi(z) = P(Z \le z) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{z} e^{-t^{2}/2}\,dt$$
Standard Normal Cumulative Probabilities (using half of the density; entries give P(0 ≤ Z ≤ z))

z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.0  .0000  .0040  .0080  .0120  .0160  .0199  .0239  .0279  .0319  .0359
0.1  .0398  .0438  .0478  .0517  .0557  .0596  .0636  .0675  .0714  .0753
0.2  .0793  .0832  .0871  .0910  .0948  .0987  .1026  .1064  .1103  .1141
0.3  .1179  .1217  .1255  .1293  .1331  .1368  .1406  .1443  .1480  .1517
0.4  .1554  .1591  .1628  .1664  .1700  .1736  .1772  .1808  .1844  .1879
0.5  .1915  .1950  .1985  .2019  .2054  .2088  .2123  .2157  .2190  .2224
0.6  .2257  .2291  .2324  .2357  .2389  .2422  .2454  .2486  .2517  .2549
0.7  .2580  .2611  .2642  .2673  .2704  .2734  .2764  .2794  .2823  .2852
0.8  .2881  .2910  .2939  .2967  .2995  .3023  .3051  .3078  .3106  .3133
0.9  .3159  .3186  .3212  .3238  .3264  .3289  .3315  .3340  .3365  .3389
1.0  .3413  .3438  .3461  .3485  .3508  .3531  .3554  .3577  .3599  .3621
1.1  .3643  .3665  .3686  .3708  .3729  .3749  .3770  .3790  .3810  .3830
1.2  .3849  .3869  .3888  .3907  .3925  .3944  .3962  .3980  .3997  .4015
1.3  .4032  .4049  .4066  .4082  .4099  .4115  .4131  .4147  .4162  .4177
1.4  .4192  .4207  .4222  .4236  .4251  .4265  .4279  .4292  .4306  .4319
1.5  .4332  .4345  .4357  .4370  .4382  .4394  .4406  .4418  .4429  .4441
1.6  .4452  .4463  .4474  .4484  .4495  .4505  .4515  .4525  .4535  .4545

The above table is constructed using the fact that the normal density is symmetric, so the sign of a value of Z does not matter: Z = −1.6 and Z = +1.6 are equivalent. The only care one needs is to add/subtract 0.5 (half the area under the normal curve) when the value of Z is positive/negative, depending on the direction of the inequality. For example, P(Z ≤ −1.6) and P(Z ≤ 1.6) are depicted below.
(Figures: two shaded standard normal densities illustrating P(Z ≤ −1.6) = 0.5 − P(0 ≤ Z ≤ 1.6) and P(Z ≤ 1.6) = 0.5 + P(0 ≤ Z ≤ 1.6).)
In other words, this table assumes that the total area under the normal curve is 0.5 rather than 1, and one finds the area from 0 to z rather than from −∞ to z. A few examples will clear up the ambiguities.
In the following, look at the first figure: we are trying to find P(Z ≤ −1.65). Since the table lists only positive values of Z, and since the normal curve is symmetric, we can mirror the required area to the right side of the curve, in which case it becomes P(Z ≥ 1.65). As the tabled probabilities give the area from 0 to z, we subtract P(0 ≤ Z ≤ 1.65) from 0.5 to get P(Z ≥ 1.65), which, mirrored back to the left side, is again P(Z ≤ −1.65).

(Figures: shaded densities illustrating P(Z ≤ −1.65) = 0.5 − P(0 ≤ Z ≤ 1.65), with the required left-tail area mirrored to the right, and P(0.6 ≤ Z ≤ 1.65) = P(0 ≤ Z ≤ 1.65) − P(0 ≤ Z ≤ 0.6).)

The second figure depicts P(0.6 ≤ Z ≤ 1.65), which can be written as

P(0.6 ≤ Z ≤ 1.65) = P(0.0 ≤ Z ≤ 1.65) − P(0.0 ≤ Z ≤ 0.6).

It can now easily be found from the table.
Example 13.
Let X ∼ N(50, 25). Find P(0 ≤ X ≤ 40), P(55 ≤ X ≤ 100), P(X ≥ 54) and P(X ≤ 57).
Solution: For each probability we need to convert the X values to Z values using:

$$Z = \frac{X - \mu}{\sigma}$$

We have µ = 50 and σ = √25 = 5, thus Z = (X − 50)/5.
For X = 0, Z = (0 − 50)/5 = −10.0, and similarly for X = 40, Z = −2. So,

P(0 ≤ X ≤ 40) = P(−10 ≤ Z ≤ −2)
             = P(−10 ≤ Z ≤ 0) − P(−2 ≤ Z ≤ 0)
             = P(0 ≤ Z ≤ 10) − P(0 ≤ Z ≤ 2)
             = 0.5 − 0.4772 = 0.0228

Note: most normal tables list the probabilities only up to Z = 3.0 or 3.5; the area from 0 to any Z value greater than 3.0 (or 3.5) is taken to be 0.5.
For X = 55, Z = (55 − 50)/5 = 1, and for X = 100, Z = 10. So,

P(55 ≤ X ≤ 100) = P(1 ≤ Z ≤ 10)
               = P(0 ≤ Z ≤ 10) − P(0 ≤ Z ≤ 1)
               = 0.5 − 0.3413 = 0.1587

For X = 54, Z = (54 − 50)/5 = 0.8. So,

P(X ≥ 54) = P(Z ≥ 0.8)
          = 0.5 − P(0 ≤ Z ≤ 0.8)
          = 0.5 − 0.2881 = 0.2119

For X = 57, Z = (57 − 50)/5 = 1.4. So,

P(X ≤ 57) = P(Z ≤ 1.4)
          = 0.5 + P(0 ≤ Z ≤ 1.4)
          = 0.5 + 0.4192 = 0.9192

Properties of normal distribution

1 The density function f(x) of a normal rv is a proper pdf. That is, f(x) ≥ 0 and $\int_{-\infty}^{+\infty} f(x)\,dx = 1$.
2 The mean: The mean, median and mode of a normal distribution are all equal to µ.
3 The variance: The variance is given by V(X) = E(X²) − [E(X)]² = σ².
4 The Mode and the Median: The mode and median of a normal distribution are also equal to µ. The mode is the value of x satisfying

$$\frac{d}{dx}f(x) = 0 \quad \text{with} \quad \frac{d^{2}}{dx^{2}}f(x) < 0,$$

whereas the median is derived as the solution of the following expression for m:

$$\frac{1}{2} = \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{m} e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}\,dx.$$

Both of the above expressions give µ.
5 The Mean deviation: The mean deviation (MD) of a normal distribution is approximately 4/5 of its standard deviation. That is,

$$MD = \frac{4}{5}\,\sigma.$$
Example 15.
Suppose the diameter at a certain height of trees (in inches) of a certain type is normally distributed with mean 8.8 and standard deviation 2.8, based on data in a 1997 article in the Forest Products Journal.
1 What is the probability that the diameter of a randomly selected tree of this type will exceed 10 inches?
2 What is the probability that the diameter of a randomly selected tree of this type will be between 5 and 10 inches?
Solution:
Here X ∼ N(8.8, 2.8²), so

$$Z = \frac{X - \mu}{\sigma} = \frac{X - 8.8}{2.8}$$

Now,

P(X > 10.0) = P(Z > (10.0 − 8.8)/2.8)
            = P(Z > 0.43)
            = 0.5 − P(0 ≤ Z ≤ 0.43)
            = 0.5 − 0.1664 = 0.3336

2 P(5.0 ≤ X ≤ 10.0) = ?

P(5.0 ≤ X ≤ 10.0) = P((5.0 − 8.8)/2.8 ≤ Z ≤ (10.0 − 8.8)/2.8)
                  = P(−1.36 ≤ Z ≤ 0.43)
                  = 0.4131 + 0.1664 = 0.5795

Sketching densities with shaded areas is a great help, as it makes it easier to figure out where exactly the probability areas are located and what exactly we need to find.

Example 16.
The mean height of soldiers is 68.22 in. with a variance of 10.8 in². Assuming that the heights are normally distributed, how many soldiers in a regiment of 1000 would you expect to be over 6 ft (72 in.)?
Solution:
Let X denote the heights; then X ∼ N(68.22, 10.8) and Z = (X − 68.22)/√10.8.

P(X > 72.0) = P(Z > (72.0 − 68.22)/√10.8)
            = P(Z > 1.15)
            = 0.5 − P(0 ≤ Z ≤ 1.15)
            = 0.5 − 0.3749 = 0.1251

Now, out of 1000 soldiers, 1000 × 0.1251 ≈ 125 are expected to have a height exceeding 6 ft.
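Both examples check out in R (the small differences from the table answers come from rounding Z to two decimals):

1 - pnorm(10, 8.8, 2.8)                     # Example 15 (1) -> 0.334
pnorm(10, 8.8, 2.8) - pnorm(5, 8.8, 2.8)    # Example 15 (2) -> 0.579
1000 * (1 - pnorm(72, 68.22, sqrt(10.8)))   # Example 16     -> about 125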
Normal approximation to the binomial.
Let X ∼ Binomial(n, p), with mean np and standard deviation √(npq). Now, if p is close to 1/2 (any departure from p = 1/2 results in a skewed binomial histogram) and n is sufficiently large that both np and nq are around 10 or more, then we can define the following variable:

$$Z = \frac{X \pm 0.5 - np}{\sqrt{npq}} \sim N(0, 1)$$

The ±0.5 is a continuity correction: it is needed because X is, in reality, a discrete rv being approximated by a continuous one.
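A quick sketch of the approximation, for an assumed Binomial(40, 0.5):

n <- 40; p <- 0.5; q <- 1 - p
pbinom(24, n, p)                              # exact P(X <= 24)
pnorm((24 + 0.5 - n * p) / sqrt(n * p * q))   # normal approximation with the +0.5 correction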
