BB NPTEL Lecture 4

4.1 Common discrete distributions

(in structural reliability)
©Baidurya Bhattacharya IIT Kharagpur www.facweb.iitkgp.ac.in/~baidurya/ 1

Equally likely - uniform:
The uniform distribution arises naturally when there is no reason to favour one outcome over another from the sample space, making all sample points equally likely.

$$p_X(x) = \frac{1}{n}, \quad x = x_1, x_2, \ldots, x_n$$
$$\mu = \frac{1}{n}\sum_{i=1}^{n} x_i$$
$$\sigma^2 = \frac{1}{n}\sum_{i=1}^{n} (x_i - \mu)^2$$

Example: Throw of a fair die. $\Omega = \{1,2,3,4,5,6\}$. Each face is equally likely. If {X = i} signifies the number of points face up, then P[X = i] = 1/6 for i = 1, 2, …, 6. The mean and variance are:

$$\mu = 3.5, \quad \sigma^2 = 2.92$$

This distribution also corresponds to the state of maximum Shannon entropy. Shannon's entropy,

$$H = -\sum_i p_i \ln p_i = -E[\ln p(X)]$$

for a coin toss problem, is simply:

$$H = -p \ln p - (1-p)\ln(1-p)$$

which attains its maximum at

$$\frac{dH}{dp} = 0, \quad \text{i.e.,} \quad -\ln\frac{p}{1-p} = \ln\frac{1-p}{p} = 0 \;\Rightarrow\; p = \frac{1}{2}$$
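The die moments and the entropy maximum quoted above can be checked numerically. The following is a minimal Python sketch; the function name `binary_entropy` and the 101-point grid are illustrative choices, not part of the lecture.

```python
import math

# Fair die: six equally likely faces.
faces = [1, 2, 3, 4, 5, 6]
n = len(faces)
mean = sum(faces) / n
variance = sum((x - mean) ** 2 for x in faces) / n

def binary_entropy(p):
    """Shannon entropy (in nats) of a coin toss with P[success] = p."""
    if p in (0.0, 1.0):
        return 0.0  # limit of -p ln p as p -> 0
    return -p * math.log(p) - (1 - p) * math.log(1 - p)

# Coarse grid search (illustrative only) for the p that maximizes H.
grid = [i / 100 for i in range(101)]
p_best = max(grid, key=binary_entropy)

print(mean, round(variance, 2), p_best)
```

The grid search is only a sanity check on the calculus result dH/dp = 0; it is not meant as a general optimization method.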
The Bernoulli trial:
The Bernoulli trial (BT) refers to a binary outcome:
X = 0 (often called “failure”) that occurs with probability q, and
X = 1 (often called “success”) that occurs with probability p,
so that p + q = 1.

Mean of X = 1·p + 0·(1 - p) = p
Variance of X = (1 - p)²·p + (0 - p)²·(1 - p) = p(1 - p)

A sequence of independent and identical Bernoulli trials can help model large classes of phenomena of engineering interest:
1. The number of trials to the first (or next) success gives rise to the Geometric distribution.
2. The number of trials to the kth success gives rise to the Pascal (or negative binomial) distribution.
3. The number of successes in a fixed number of Bernoulli trials follows the Binomial distribution.

The geometric distribution:
Relation between Bernoulli and Geometric:
The Geometric random variable, G, represents the trial number of the first success in a sequence of IID Bernoulli trials $\{X_i\}$:

$$\{G = n\} = \{X_1 = 0, X_2 = 0, \ldots, X_{n-1} = 0, X_n = 1\} \tag{0.1}$$
Probability law
Since the $X_i$ are IID, the PMF of the geometric random variable is easily derived from its definition:

$$\begin{aligned}
P\{G = n\} &= P\{X_1 = 0, X_2 = 0, \ldots, X_{n-1} = 0, X_n = 1\} \\
&= P\{X_1 = 0\}\,P\{X_2 = 0\} \cdots P\{X_{n-1} = 0\}\,P\{X_n = 1\} \\
&= q\,q \cdots q\,p \\
&= q^{n-1} p
\end{aligned} \tag{0.2}$$
The CDF of the Geometric RV can also be derived easily from its definition:

$$\begin{aligned}
F_G(n) &= P\{G \le n\} \\
&= 1 - P\{G > n\} \\
&= 1 - P\{G \ge n + 1\} \\
&= 1 - P\{X_1 = 0\}\,P\{X_2 = 0\} \cdots P\{X_n = 0\} \\
&= 1 - q^n
\end{aligned} \tag{0.3}$$
Mean return period / mean recurrence interval:
• Sequence of independent and identical Bernoulli trials (parameter p):
• Time instants of trials
   • known and discrete
   • equally spaced (commonly)
• No. of trials (“time”):
   • to the first occurrence is random → Geometric random variable; the mean value of this Geometric RV is 1/p
   • between successive occurrences is random → sequence of IID Geometric RVs, each with mean 1/p
• “Mean Return Period” = mean of these Geometric RVs
   • is equal to 1/p (in units of trial time interval)
• Associate a random variable $X_i$ with each trial → IID sequence
   • Known CDF $F_X$ for all $X_i$
   • Occurrence (or success) = $\{X_i > x_p\}$ in the ith trial
   • $p = P\{\text{success}\} = P\{X_i > x_p\} = 1 - F_X(x_p)$
   • $x_p$ = level corresponding to exceedance probability p
• Mean return period depends on the level $x_p$

Example: flooding due to rainfall
The storm sewers in a city are designed for rainfall having a return period of 100 years, so that p = 1/100 per year.
a) What is the probability that the sewers will be flooded for the first time in the tenth year after the completion of construction?
(a) Ans: $0.99^{9} \times 0.01 = 0.009135$
b) What is the probability of flooding within the first 10 years?
(b) Ans: $1 - 0.99^{10} = 0.09562$
c) What is the probability of no flooding in 100 years?
(c) Ans: $0.99^{100} = 0.3660$
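The three flooding answers follow directly from the Geometric PMF and CDF derived earlier. A small Python sketch, assuming only those two formulas (the helper names `geometric_pmf` and `geometric_cdf` and the truncation limit are illustrative), reproduces them and also checks the mean return period 1/p:

```python
def geometric_pmf(n, p):
    """P{G = n} = q^(n-1) * p: first occurrence on trial n."""
    return (1 - p) ** (n - 1) * p

def geometric_cdf(n, p):
    """P{G <= n} = 1 - q^n: at least one occurrence in n trials."""
    return 1 - (1 - p) ** n

p = 1 / 100   # annual exceedance probability for a 100-year return period

ans_a = geometric_pmf(10, p)        # first flooding in year 10
ans_b = geometric_cdf(10, p)        # flooding within the first 10 years
ans_c = 1 - geometric_cdf(100, p)   # no flooding in 100 years

# Truncated sum approximating the mean return period 1/p
# (the upper limit 5000 is an arbitrary cut-off with a negligible tail).
mean_return_period = sum(n * geometric_pmf(n, p) for n in range(1, 5001))

print(f"{ans_a:.6f} {ans_b:.5f} {ans_c:.4f} {mean_return_period:.2f}")
```

Note that the probability of no flooding during one full return period is only about 37%, not 0: the return period is a mean, not a guarantee.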
Binomial distribution:
The Binomial random variable counts the number of successes in a sequence of n (non-random) IID Bernoulli trials.
Since the Bernoulli RV is 0 for non-occurrence and 1 for occurrence, the Binomial RV is the sum:

$$X_{Bin} = \sum_{i=1}^{n} X_i \;\rightarrow\; \text{Normal RV by Central Limit theorem}$$

Since the $X_i$’s are IID, the mean and variance of the Binomial RV are:

$$\mu = \sum_{i=1}^{n} p = np$$
$$\sigma^2 = \sum_{i=1}^{n} pq = npq$$

The Binomial PMF is

$$P[X_{Bin} = x] = \binom{n}{x} p^x q^{n-x}, \quad x = 0, 1, 2, \ldots, n$$
Pascal or Negative Binomial distribution:
Consider a sequence of IID BTs with parameter p.

$$X_r = \text{trial \# at which the } r\text{th success occurs}, \quad x_r = r, r+1, \ldots, \infty$$

Here, it is not important on which trial numbers the previous r - 1 successes occur.

$$P[X_r = x] = \underbrace{\binom{x-1}{r-1} p^{r-1} q^{x-1-(r-1)}}_{r-1 \text{ successes in } x-1 \text{ trials}} \; \underbrace{p}_{\text{success in } x\text{th trial}} = \binom{x-1}{r-1} p^{r} q^{x-r}$$

Since $X_r$ is the sum of r IID Geometric RVs:

$$X_r = G_1 + G_2 + \ldots + G_r \;\rightarrow\; \text{Normal RV by Central Limit theorem}$$

The first two moments of the Pascal distribution are:

$$\mu = E[X_r] = E[G_1] + E[G_2] + \ldots + E[G_r] = r\,\frac{1}{p} = r/p$$
$$\sigma^2 = \mathrm{var}[X_r] = \mathrm{var}[G_1] + \mathrm{var}[G_2] + \ldots + \mathrm{var}[G_r] = r\,\frac{q}{p^2} = rq/p^2$$

The Pascal CDF from the Binomial CDF:
Start with the complementary CDF of $X_r$, the probability that the rth success occurs after k trials:

$$\{r\text{th success occurs after } k \text{ trials}\} = \{\text{fewer than } r \text{ successes in } k \text{ trials}\}$$
$$P[X_r > k] = P[X_{Bin,k} < r]$$

Hence the Pascal CDF:

$$\begin{aligned}
P[X_r \le k] &= 1 - P[X_r > k] \\
&= 1 - P[X_{Bin,k} < r] \\
&= 1 - F_{X_{Bin,k}}(r - 1)
\end{aligned}$$
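The link between the Pascal CDF and the Binomial CDF can be checked numerically. The Python sketch below sums the Pascal PMF directly and compares the result with 1 - F_{X_Bin,k}(r - 1); the values of r, k and p are arbitrary illustrative choices, not taken from the lecture.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P[X_Bin = x]: x successes in n IID Bernoulli trials."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def pascal_pmf(x, r, p):
    """P[X_r = x]: the r-th success occurs on trial x (x >= r)."""
    return comb(x - 1, r - 1) * p**r * (1 - p) ** (x - r)

r, k, p = 3, 12, 0.2   # illustrative values only

# Pascal CDF by direct summation of the PMF ...
cdf_direct = sum(pascal_pmf(x, r, p) for x in range(r, k + 1))
# ... and through the Binomial CDF: 1 - P[fewer than r successes in k trials].
cdf_via_binomial = 1 - sum(binomial_pmf(j, k, p) for j in range(r))

print(cdf_direct, cdf_via_binomial)
```

The two numbers agree to floating-point precision, which is the identity used in the CDF derivation.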
Examples: The PDF of the annual rainfall, H, in a certain region is shown in the figure. Drought in the region is defined as annual rainfall being less than 1 m.
[Figure: $f_H(h)$ versus $h$ (m); $f_H(h) = a h^2$ for $0 \le h \le 1.5$ and $f_H(h) = b$ for $1.5 \le h \le 2.0$, with $b = 1$, $a = 1/2.25$.]
a) What is the probability of drought in a year?
b) If annual rainfall magnitudes are mutually independent from year to year, find:
i. The mean return period of droughts.
ii. The probability that 4 out of 10 successive years will be drought years.
iii. The probability that, starting next year, the fourth drought will occur 10 years from now.

Define $A = \{\text{drought occurs}\}$.

a) $P[A] = P[H < 1] = \dfrac{a}{3} = 0.15$, that is, p = 0.15 / yr

bi) Mean return period, $T = 1/p = 6.75$ yr (using the unrounded $p = a/3$)

bii) $P[X_{Bin} = x] = \binom{n}{x} p^x q^{n-x}$

$$P[X_{Bin} = 4] = \binom{10}{4}\, 0.15^4\, 0.85^6 = 0.040$$

biii) $P[X_r = x] = \binom{x-1}{r-1} p^r q^{x-r}$

$$P[X_4 = 10] = \binom{9}{3}\, 0.15^4\, 0.85^6 = 0.016$$
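The drought answers can be reproduced in a few lines of Python. This is a sketch under the assumption that the rising branch of the PDF is a·h² from h = 0, so that P[H < 1] = a/3 as stated; the variable names are illustrative.

```python
from math import comb

a = 1 / 2.25              # coefficient of the quadratic branch of f_H(h)
p_exact = a * 1.0**3 / 3  # P[H < 1] = integral of a*h^2 from 0 to 1
p = round(p_exact, 2)     # the slide works with the rounded value 0.15
q = 1 - p

return_period = 1 / p_exact                   # b(i): uses the unrounded p
four_in_ten = comb(10, 4) * p**4 * q**6       # b(ii): Binomial PMF
fourth_in_year_10 = comb(9, 3) * p**4 * q**6  # b(iii): Pascal PMF

print(f"{p_exact:.4f} {return_period:.2f} {four_in_ten:.3f} {fourth_in_year_10:.3f}")
```

Parts (ii) and (iii) differ only in the binomial coefficient: in (iii) the last trial is fixed as a success, so only the first 3 droughts are free to fall anywhere in the first 9 years.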
Example: oil exploration
An oil exploration company is trying to locate subsea oil wells. For every exploratory drill, the probability of correctly locating a deposit is 5%. The company has scheduled a series of drills in a basin. Assume that the outcomes are IID. The company defines “success” as hitting the first deposit.
(a) What is the probability that the company will find success at the 3rd drill?
(a) Ans: $0.95^{2} \times 0.05 = 0.045$
(b) What is the expected number of drills for the first find?
(b) Ans: $1/0.05 = 20$
(c) After how many drills is the company 90% likely to succeed?
(c) Ans: $\ln 0.1 / \ln 0.95 = 44.9$, i.e. 45 drills
(d) What is the probability that between 10 and 30 drills (both included) will be needed to claim success?
(d) Ans: $0.95^{9} - 0.95^{30} = 0.416$
(e) 10 drills are conducted. What is the probability that 2 or more hits will occur? Does this answer change if it is known that at least 1 hit occurs?
(e) Ans: $1 - 0.95^{10} - 10 \times 0.95^{9} \times 0.05 = 0.086$. Yes: $0.086 / (1 - 0.95^{10}) = 0.21$
(f) What is the probability that the second hit will occur on the 10th drill?
(f) Ans: $\binom{9}{1} \times 0.95^{8} \times 0.05^{2} = 0.0149$
(g) What is the probability that the second hit will occur on the 10th drill or later?
(g) Ans: $0.95^{9} + \binom{9}{1}\, 0.95^{8} \times 0.05 = 0.929$
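Each answer in this example is a direct evaluation of a Geometric, Binomial or Pascal expression, so the whole set can be verified at once. A hedged Python sketch follows; the variable names simply mirror the question labels.

```python
from math import ceil, comb, log

p, q = 0.05, 0.95

a = q**2 * p                        # first hit on drill 3 (Geometric PMF)
b = 1 / p                           # mean of the Geometric RV
c = ceil(log(0.10) / log(q))        # smallest n with 1 - q^n >= 0.90
d = q**9 - q**30                    # P[10 <= G <= 30] = F_G(30) - F_G(9)
e = 1 - q**10 - 10 * q**9 * p       # at least 2 hits in 10 drills (Binomial)
e_given_hit = e / (1 - q**10)       # same event, given at least 1 hit
f = comb(9, 1) * q**8 * p**2        # second hit exactly on drill 10 (Pascal)
g = q**9 + comb(9, 1) * q**8 * p    # fewer than 2 hits in the first 9 drills

for label, value in zip("abcdefg", (a, b, c, d, e, f, g)):
    print(label, round(value, 4))
print("e | at least one hit:", round(e_given_hit, 4))
```

Part (g) uses the Pascal CDF relation: the second hit occurs on drill 10 or later exactly when the first 9 drills contain 0 or 1 hits.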
Example: darts
You and your friend go into a sports bar where a dart throwing competition is going on. You buy m darts at 1
rupee each and throw them at the board. Of these, N darts hit within the inner circle.
Your friend picks up these N darts, and throws them at the board. X of them hit the inner circle.
You and your friend earn 10 rupees for each of the X hits.
Assume that your throws are independent and each has a probability p1 of hitting the inner circle. Your friend’s
throws are also independent, and each has a probability p2 of hitting the inner circle.
a) What is the distribution of N?

b) What is the distribution of X?
c) How much do you expect to earn from this game?
d) Say, p1 > p2. Does it matter who goes first?
Clearly, N is a binomial random variable with parameters m and $p_1$. Given N = n, X too is Binomial:

$$p_{X|N=n}(x; n) = P[X = x \mid N = n] = \binom{n}{x} p_2^x (1 - p_2)^{n-x} \tag{0.1}$$

The unconditional PMF of X can be found by the theorem of total probability:

$$p_X(x) = \sum_{\text{all } n} p_{X|N=n}(x; n)\, p_N(n) = \sum_{n=0}^{m} \binom{n}{x} p_2^x (1 - p_2)^{n-x} \binom{m}{n} p_1^n (1 - p_1)^{m-n}\, I(x \le n)$$

where the indicator function ensures that your friend can never have more successes than the darts you win.
Example: darts (contd.)
You and your friend go into a sports bar where a dart throwing competition is going on….
Substituting $v = n - x$, so that $n = x \Rightarrow v = 0$ and $n = m \Rightarrow v = m - x$, allows us to rewrite the above summation as:

$$p_X(x) = \frac{m!}{x!}\, p_2^x \sum_{v=0}^{m-x} \frac{(1 - p_2)^v\, p_1^{v+x}\, (1 - p_1)^{m-v-x}}{v!\,(m - x - v)!}$$

Rearranging and substituting $m' = m - x$, we obtain

$$\begin{aligned}
p_X(x) &= \frac{m!}{x!\,(m - x)!}\,(p_1 p_2)^x \sum_{v=0}^{m'} \binom{m'}{v} [p_1 - p_1 p_2]^v\, [1 - p_1]^{m'-v} \\
&= \binom{m}{x} (p_1 p_2)^x (1 - p_1 p_2)^{m-x} \left[ \sum_{v=0}^{m'} \binom{m'}{v} \left( \frac{p_1 - p_1 p_2}{1 - p_1 p_2} \right)^{v} \left( \frac{1 - p_1}{1 - p_1 p_2} \right)^{m'-v} \right]
\end{aligned}$$

Using the binomial identity and seeing that the terms in […] add up to 1:

$$p_X(x) = \binom{m}{x} (p_1 p_2)^x (1 - p_1 p_2)^{m-x}$$

Thus, X is Binomial with parameters $(m, p_1 p_2)$.

$$E[\text{earning}] = -m \times 1 + 10 \times E[X] = -m + 10\, m\, p_1 p_2 = (10\, p_1 p_2 - 1)\, m$$

Since the solution is symmetric in $p_1$ and $p_2$, it does not matter who goes first.
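The closed-form result, X ~ Binomial(m, p1·p2), can be compared against the raw total-probability sum. The Python sketch below does this for one arbitrary parameter set (m, p1 and p2 are illustrative values, not from the lecture).

```python
from math import comb

def binomial_pmf(x, n, p):
    """Binomial PMF: x successes in n trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

m, p1, p2 = 8, 0.6, 0.3   # illustrative values only

def px_total_probability(x):
    """Sum over n of P[X = x | N = n] * P[N = n]; terms with n < x vanish."""
    return sum(binomial_pmf(x, n, p2) * binomial_pmf(n, m, p1)
               for n in range(x, m + 1))

# Largest discrepancy between the sum and the Binomial(m, p1*p2) PMF.
max_gap = max(abs(px_total_probability(x) - binomial_pmf(x, m, p1 * p2))
              for x in range(m + 1))
expected_earning = (10 * p1 * p2 - 1) * m

print(max_gap, expected_earning)
```

Swapping p1 and p2 leaves both the PMF and the expected earning unchanged, which is the symmetry argument used for part (d).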
Hypergeometric distribution:
A finite population of size N is partitioned into two groups:
- “marked” (of size d)
- “unmarked” (of size N - d).
Members of each group are otherwise indistinguishable from each other.
Sampling without replacement from the population.
Sample size n.
How many are “marked” in the sample?

$$p_X(x; n, d, N) = P(X = x; n, d, N) = \frac{\dbinom{d}{x} \dbinom{N-d}{n-x}}{\dbinom{N}{n}}$$

$$\mu_X = \frac{nd}{N}$$

$$\sigma_X^2 = \frac{nd\,(N - d)}{N^2} \left( \frac{N - n}{N - 1} \right)$$
Example: population estimation
Wildlife population N is unknown. Catch d. Mark them. Release them and let them mix.
Now, catch n, of which x are found to be marked.
What is the best estimate of the unknown N?
Solution: a) Maximum likelihood estimation:
Which value of N would maximize the likelihood of observing x marked samples?
With known x, n, d, we are looking for N such that

$$P(N-1; x, n, d) \le P(N; x, n, d) \ge P(N+1; x, n, d)$$

In terms of the hypergeometric PMF:

$$\frac{\dbinom{d}{x}\dbinom{N-1-d}{n-x}}{\dbinom{N-1}{n}} \le \frac{\dbinom{d}{x}\dbinom{N-d}{n-x}}{\dbinom{N}{n}} \ge \frac{\dbinom{d}{x}\dbinom{N+1-d}{n-x}}{\dbinom{N+1}{n}}$$

Cancelling out terms, we get:

$$\frac{1}{1} \le \frac{(N-d)(N-n)}{(N-d-n+x)\,N} \ge \frac{(N+1-d)(N-d)(N+1-n)(N-n)}{(N+1-d-n+x)(N-d-n+x)(N+1)\,N}$$

The first inequality gives $N \le nd/x$.
The second inequality gives $N + 1 \ge nd/x$.
Combining the two:

$$N = \left\lfloor \frac{nd}{x} \right\rfloor$$

b) Law of large numbers:
Sample proportion approaches population proportion. (Note: strictly speaking, the independence assumption does not hold here.)

$$\frac{x}{n} \rightarrow \frac{d}{N}, \quad n \rightarrow N$$
$$N = \frac{nd}{x}$$
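The maximum likelihood result can be checked by brute force: evaluate the hypergeometric likelihood for every candidate N and pick the largest. The Python sketch below does so for made-up catch numbers (d, n, x and the search limit 2000 are illustrative assumptions).

```python
from math import comb

def hypergeom_pmf(x, n, d, N):
    """P[X = x]: x marked in a sample of n drawn from N, of which d are marked."""
    return comb(d, x) * comb(N - d, n - x) / comb(N, n)

d, n, x = 50, 40, 7   # illustrative catch-and-release numbers only

# N must be at least d + (n - x): all marked animals plus the unmarked ones seen.
candidates = range(d + n - x, 2001)
N_mle = max(candidates, key=lambda N: hypergeom_pmf(x, n, d, N))
N_closed_form = (n * d) // x   # integer part of nd/x

print(N_mle, N_closed_form)
```

When nd/x happens to be an integer, the likelihoods at nd/x and nd/x - 1 are equal, so the maximizer is not unique; the example values avoid that case.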
The Poisson distribution:
The Poisson random variable represents the count of points occurring according to a Poisson process in a given interval of time.

$$p_X(x) = \frac{\mu^x}{x!}\, e^{-\mu}, \quad x = 0, 1, 2, 3, \ldots$$

Like the Geometric, the Poisson distribution is a single parameter distribution.
Its mean and variance are:

$$E(X) = \mu, \quad \mathrm{var}(X) = \mu$$

If $\lambda$ is the (constant) rate of occurrence of the underlying (homogeneous) Poisson process, and t is the length of the interval, then the mean is:

$$\mu = \lambda t$$

If $\lambda(t)$ is the (variable) rate of occurrence of the underlying (inhomogeneous) Poisson process, and t is the length of the interval, then the mean is:

$$\mu(t) = \int_0^t \lambda(\tau)\, d\tau$$
Example: Poisson distribution (earthquake)
Earthquakes above a certain intensity occur at the rate of 0.025 per year according to a Poisson process.
a) What is the probability of no such earthquake in 50 years?
b) What is the probability of 2 or more such earthquakes in 100 years?

Poisson PMF: $p_X(x) = \dfrac{\mu^x}{x!}\, e^{-\mu}, \quad x = 0, 1, 2, 3, \ldots$

a) $\mu = \lambda t = 0.025/\text{yr} \times 50\ \text{yr} = 1.25$

$$p_X(0) = \frac{\mu^0}{0!}\, e^{-\mu} = 0.2865$$

b) $\mu = \lambda t = 0.025/\text{yr} \times 100\ \text{yr} = 2.5$

$$\begin{aligned}
P[X \ge 2] &= 1 - p_X(0) - p_X(1) \\
&= 1 - \frac{\mu^0}{0!}\, e^{-\mu} - \frac{\mu^1}{1!}\, e^{-\mu} \\
&= 0.7127
\end{aligned}$$
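Both earthquake answers are plain evaluations of the Poisson PMF with mean equal to rate times duration. A minimal Python sketch (the helper name `poisson_pmf` is an illustrative choice):

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P[X = x] for a Poisson random variable with mean mu."""
    return mu**x * exp(-mu) / factorial(x)

rate = 0.025   # occurrences per year

p_none_50 = poisson_pmf(0, rate * 50)   # (a) mean 1.25 over 50 years
p_two_or_more_100 = (1 - poisson_pmf(0, rate * 100)
                       - poisson_pmf(1, rate * 100))   # (b) mean 2.5

print(f"{p_none_50:.4f} {p_two_or_more_100:.4f}")
```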
Example: Poisson distribution (weld)
Flaws occur in an 8 meter long weldline at the rate of 1.25 per meter according to a Poisson process.
An NDT system can detect a flaw 80% of the time. False positives do not occur. Assume that detections of individual flaws are mutually independent events. The complete weldline is inspected.
(a) What is the average number of flaws in the weld?
(b) What is the probability of finding this average number of flaws?
(c) What number of flaws can be expected to be detected?
(d) What is the probability of detecting 5 flaws in the weld?
(e) If 5 flaws are detected, what is the probability that there are 10 flaws in the weld?

N = number of flaws in the weld
X = number of flaws detected
p = P[a given flaw is detected]
N is Poisson. What is the distribution of X?

a) $\mu = \lambda l = 1.25/\text{m} \times 8\ \text{m} = 10$

b) $p_N(10) = \dfrac{10^{10}}{10!}\, e^{-10} = 0.125$
Example: Poisson distribution (weld) contd.
Flaws occur in an 8 meter long weldline at the rate of 1.25 per meter according to a Poisson process. …

N = number of flaws in the weld ~ Poisson
X = number of flaws detected
p = P[a given flaw is detected] = 0.8
q = 1 - p

Given N, X is binomial:

$$p_{X|N=n}(x) = P[X = x \mid N = n] = \binom{n}{x} p^x (1 - p)^{n-x}, \quad x \le n$$

Unconditional PMF of X:

$$\begin{aligned}
p_X(x) &= \sum_{\text{all } n} p_{X|N=n}(x)\, p_N(n) = \sum_{n=x}^{\infty} \binom{n}{x} p^x q^{n-x}\, \frac{\mu^n}{n!}\, e^{-\mu} \\
&= \frac{p^x e^{-\mu} \mu^x}{x!} \sum_{n'=0}^{\infty} \frac{(\mu q)^{n'}}{n'!}, \quad n' = n - x \\
&= \frac{(\mu p)^x\, e^{-\mu p}}{x!}
\end{aligned}$$

$$\Rightarrow X \sim \text{Poisson}(\mu p)$$

(c) What number of flaws can be expected to be detected?
(c) Ans: $10 \times 0.8 = 8$

(d) What is the probability of detecting 5 flaws in the weld?

d) $p_X(5) = \dfrac{8^5}{5!}\, e^{-8} = 0.0916$

(e) If 5 flaws are detected, what is the probability that there are 10 flaws in the weld?

$$\begin{aligned}
P[N = 10 \mid X = 5] &= P[X = 5 \mid N = 10]\, P[N = 10] \,/\, P[X = 5] \\
&= \binom{10}{5} p^5 q^5\, \frac{10^{10}}{10!}\, e^{-10} \,/\, 0.0916 = 0.0361
\end{aligned}$$
