DUDEWICZ
If P(B) > 0, then P(·|B) is a probability
function; that is,
(i) P(A|B) ≥ 0, for all events A;
(ii) P(Ω|B) = 1;
(iii) P(∪_{n=1}^∞ A_n | B) = Σ_{n=1}^∞ P(A_n|B), if A_i ∩ A_j = ∅ (i ≠ j).
Proof: Use Definition 2.3.2 and the properties of probability. .
Theorem 2.3.6. Multiplication Rule. For each n + 1 events A_0, A_1, ..., A_n for
which P(A_0 A_1 ··· A_{n-1}) > 0, we have
P(A_0 A_1 ··· A_n) = P(A_0)P(A_1|A_0)P(A_2|A_0 A_1) ··· P(A_n|A_0 A_1 ··· A_{n-1}).
Proof: Since
A_0 A_1 ··· A_{n-1} ⊆ A_0 A_1 ··· A_{n-2} ⊆ ··· ⊆ A_0,
we have
0 < P(A_0 A_1 ··· A_{n-1}) ≤ ··· ≤ P(A_0),
so all conditional probabilities involved are defined. [Recall (see Definition 2.3.2)
that P(A|B) is defined only when P(B) > 0.] It is easy to show the result for
n = 1, and the general result follows by mathematical induction.
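As a quick numerical check (ours, not from the text), the chain-rule factorization can be verified with exact rational arithmetic; the card-drawing illustration below is our own example:

```python
from fractions import Fraction
from math import comb

# Multiplication Rule with n = 2: P(A0 A1 A2) = P(A0) P(A1|A0) P(A2|A0 A1).
# Illustration (ours): the first three cards dealt from a 52-card deck are all aces.
chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

# Direct count: C(4,3) ace-triples over C(52,3) equally likely 3-card subsets.
direct = Fraction(comb(4, 3), comb(52, 3))

assert chain == direct == Fraction(1, 5525)
```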
Theorem 2.3.7. Theorem of Total Probabilities. If
P(∪_{n=1}^∞ B_n) = 1,
B_i ∩ B_j = ∅ (i ≠ j),
P(B_n) > 0 for each n,
then (for any event A)
P(A) = Σ_{n=1}^∞ P(A|B_n)P(B_n).
Proof: If we denote A ∩ (∪_{n=1}^∞ B_n) by X and A ∩ (∪_{n=1}^∞ B_n)^c by Y, then
X ∩ Y = ∅, X ∪ Y = A, and therefore P(A) = P(X) + P(Y). Now
P(∪_{n=1}^∞ B_n) = 1 implies P((∪_{n=1}^∞ B_n)^c) = 0,
and hence P(Y) = 0. Thus
P(A) = P(X) = P(∪_{n=1}^∞ (A ∩ B_n)) = Σ_{n=1}^∞ P(A B_n) = Σ_{n=1}^∞ P(A|B_n)P(B_n).
Corollary 2.3.8. Theorem 2.3.7 holds if ∞ is replaced by some finite number N.
Theorem 2.3.9. Bayes' Rule. If
P(∪_{n=1}^∞ B_n) = 1,
B_i ∩ B_j = ∅ (i ≠ j),
P(B_n) > 0 for each n,
and A is an event with P(A) > 0, then for each k,
P(B_k|A) = P(A|B_k)P(B_k) / Σ_{n=1}^∞ P(A|B_n)P(B_n).
Proof: By the definition of conditional probability and the Theorem of Total
Probabilities (2.3.7),
P(B_k|A) = P(A B_k)/P(A) = P(A|B_k)P(B_k) / Σ_{n=1}^∞ P(A|B_n)P(B_n).
The hypotheses of the Theorem of Total Probabilities and of Bayes' Rule can
be visualized as follows. Ω is essentially decomposed into disjoint sets B_1, B_2, ....
Then (Theorem of Total Probabilities) we can find P(A) by adding up the
probabilities of those parts of A that lie in B_1, B_2, .... A diagram may help
(Figure 2.3-1). [The final result is stated somewhat differently, since P(A B_n) =
P(A|B_n)P(B_n).] In Bayes' Rule we look at an inverse problem: "Given that an
outcome has occurred which is in A, what is the probability that it is in B_k?" (At
this point, examine Example 2.3.18 briefly to see some practical implications.)
Example 2.3.10. In a certain country, all legislative committee members are
either Communists or Republicans. There are 3 committees. Committee 1 has 5
Communists, committee 2 has 2 Communists and 4 Republicans, and committee
3 consists of 3 Communists and 4 Republicans. A committee is selected at
random, and a person is then selected at random from that committee. (a) Find
the probability that the person selected is a Communist. (b) Given that the
selected person is a Communist, what is the probability that he/she came from
committee 2? committee 3? committee 1?
(a) Let A denote the event that the selected person is a Communist, and B_n
denote the event that initially committee n is selected (n = 1, 2, 3).
Then by Theorem 2.3.7,
P(A) = Σ_{n=1}^3 P(A|B_n)P(B_n) = (1)(1/3) + (2/6)(1/3) + (3/7)(1/3) = 37/63 ≈ .5873.
(b) Here, using Bayes' Rule (Theorem 2.3.9), we have
P(B_2|A) = P(B_2 and A)/P(A) = P(A|B_2)P(B_2)/P(A) = ((2/6)(1/3))/(37/63) = 7/37,
P(B_3|A) = P(A|B_3)P(B_3)/P(A) = ((3/7)(1/3))/(37/63) = 9/37,
P(B_1|A) = P(A|B_1)P(B_1)/P(A) = ((1)(1/3))/(37/63) = 21/37.
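The arithmetic of Example 2.3.10 can be checked mechanically; the Python sketch below (the data layout is ours) uses exact fractions:

```python
from fractions import Fraction

# Example 2.3.10: (Communists, committee size); committee 1 is all Communists.
committees = {1: (5, 5), 2: (2, 6), 3: (3, 7)}
p_B = Fraction(1, 3)                       # each committee equally likely
p_A_given_B = {n: Fraction(c, t) for n, (c, t) in committees.items()}

# (a) Theorem of Total Probabilities.
p_A = sum(p_A_given_B[n] * p_B for n in committees)
assert p_A == Fraction(37, 63)

# (b) Bayes' Rule.
posterior = {n: p_A_given_B[n] * p_B / p_A for n in committees}
assert posterior[1] == Fraction(21, 37)
assert posterior[2] == Fraction(7, 37)
assert posterior[3] == Fraction(9, 37)
```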
Example 2.3.11. A factory has 4 machines that produce the same item. Machines
1 and 2 each produce 20% of the total output, while machines 3 and 4 are
larger and each produce 30% of the total output. It is known that 6% of machine
1's output is defective, while machine 2 produces 5% defective items, and
machines 3 and 4 each produce 8% defective items. An item is chosen at random
from the output of the factory. (a) What is the probability that this item is
defective? (b) Given that the item selected in part (a) is defective, what is the
probability that it was produced by machine 2?
(a) Let A be the event that the item chosen is defective, and B, be the
event that the item chosen was produced by machine n (n = 1,2,3,4).
Then by the Theorem of Total Probabilities (2.3.7),
P(A) = Σ_{n=1}^4 P(A|B_n)P(B_n)
= .06 × .20 + .05 × .20 + .08 × .30 + .08 × .30 = .07.
(b) Here, using Bayes’ Rule, we find that
P(B_2|A) = P(A|B_2)P(B_2)/P(A) = (.05 × .20)/.07 = 1/7 ≈ .1429.
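The same mechanical check works for Example 2.3.11 (dictionary names are ours):

```python
from fractions import Fraction

# Example 2.3.11: output share and defective rate for machines 1-4.
share  = {1: Fraction(20, 100), 2: Fraction(20, 100),
          3: Fraction(30, 100), 4: Fraction(30, 100)}
defect = {1: Fraction(6, 100), 2: Fraction(5, 100),
          3: Fraction(8, 100), 4: Fraction(8, 100)}

p_A = sum(defect[m] * share[m] for m in share)     # Theorem of Total Probabilities
assert p_A == Fraction(7, 100)

p_B2_given_A = defect[2] * share[2] / p_A          # Bayes' Rule
assert p_B2_given_A == Fraction(1, 7)
```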
Example 2.3.12. An urn contains a white chips and b blue chips. A chip is
chosen at random from the urn, discarded, and replaced by one of the opposite
color; a second chip is then drawn. What is the probability that the second chip
drawn is blue? What is the probability that the second chip is white?
Let W_i denote the event that the ith draw results in selection of a white chip,
and B_i denote the event that the ith draw results in selection of a blue chip
(i = 1, 2). Then the desired probabilities are P(B_2) and P(W_2). By the Theorem
of Total Probabilities (Theorem 2.3.7),
P(B_2) = P(B_2|W_1)P(W_1) + P(B_2|B_1)P(B_1)
= ((b+1)/(a+b))(a/(a+b)) + ((b-1)/(a+b))(b/(a+b)) = ((b-1)b + (b+1)a)/(a+b)^2.
Now, since W_2 = B_2^c, it follows (see Theorem 2.1.17) that
P(W_2) = 1 - P(B_2) = ((a-1)a + (a+1)b)/(a+b)^2.
As an example, suppose that a certain city consists of 1000 law-abiding citizens
(a = 1000) and 200 miscreants (b = 200). This city is in a totalitarian country,
and a random screening process is used to seek out the criminals. If a criminal is
found in the first stage, he/she is turned into a law-abiding citizen by a certain
"treatment"; while if a law-abiding citizen is selected in the first stage, then
he/she becomes angry at society and a miscreant in the second stage. Then the
probability of a citizen being a miscreant before the police treatment is P(B_1) =
200/1200 = .1667, while the probability of a citizen being a miscreant after the
police screening (the first-stage selection) is
P(B_2) = ((199)(200) + (201)(1000))/(1200)^2 = .1672.
Thus the treatment (similar to a roadblock through which both the bad and the
innocent must pass) increases the bad by 100(.1672 - .1667)/.1667 = 0.3%.
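The general formula and the city numbers of Example 2.3.12 can be checked together; the function name below is ours:

```python
from fractions import Fraction

def p_second_blue(a, b):
    """P(second chip blue) when the first chip drawn is discarded and
    replaced by one of the opposite color (Example 2.3.12)."""
    n = a + b
    return (Fraction(b + 1, n) * Fraction(a, n)     # first draw white
            + Fraction(b - 1, n) * Fraction(b, n))  # first draw blue

a, b = 1000, 200                                    # the city example
assert p_second_blue(a, b) == Fraction((b - 1)*b + (b + 1)*a, (a + b)**2)
assert abs(float(p_second_blue(a, b)) - 0.1672) < 5e-5
```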
Example 2.3.13. A bin contains 100 items, of which 10% are bad. A second bin
contains 100 items, of which 90% are good. Since such a large proportion of the
second bin is good, by selecting an item at random from the second bin and
adding it to the first bin do we increase the probability that an item chosen from
the first bin will be good?
We can model this situation as follows. We have n = 2 urns, each of which
contains a white balls and b blue balls (in our example, a = 90 and b = 10). A
ball chosen at random from urn 1 has probability a/(a + b) of being white. If an
item is chosen at random from urn 2 and added to urn 1, what then is the
probability of an item selected from urn 1 being white? Letting W_i and B_i
respectively denote the events that draw i yields a white or blue ball, we have
P(W_2) = P(W_2|W_1)P(W_1) + P(W_2|B_1)P(B_1)
= ((a+1)/(a+b+1))(a/(a+b)) + (a/(a+b+1))(b/(a+b))
= (a^2 + a + ab)/((a+b+1)(a+b)) = ((a+1+b)a)/((a+b+1)(a+b)) = a/(a+b).
Thus, this process cannot improve the chances of obtaining a good item.
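The invariance derived in Example 2.3.13 is easy to confirm for several compositions at once (the function name is ours):

```python
from fractions import Fraction

def p_white_after_transfer(a, b):
    """P(white from urn 1) after a ball drawn at random from urn 2
    (same composition: a white, b blue) is added to urn 1 (Example 2.3.13)."""
    n = a + b
    return (Fraction(a + 1, n + 1) * Fraction(a, n)   # transferred ball white
            + Fraction(a, n + 1) * Fraction(b, n))    # transferred ball blue

# The transfer never changes the probability from a/(a+b).
for a, b in [(90, 10), (1, 1), (3, 7)]:
    assert p_white_after_transfer(a, b) == Fraction(a, a + b)
```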
Example 2.3.14. If a certain country has n states, and the proportion of the
voters who are Democrats is the same in each state (p, where p is between 0 and
1), must it be the case that proportion p of the voters in the country overall are
Democrats?
Let A_i be the event that a voter comes from state i (i = 1, ..., n), and B be
the event that a voter is a Democrat. We know that P(B|A_i) = p for all i
(i = 1, ..., n). Using the Theorem of Total Probabilities, it then follows that
P(B) = Σ_{i=1}^n P(B|A_i)P(A_i) = p Σ_{i=1}^n P(A_i) = p · 1 = p,
so the conjectured property holds: 100p% of the country also is Democratic.
We already know how to find P(A_1 ∪ ··· ∪ A_n) if A_1, ..., A_n are disjoint
events (Theorem 2.1.13). If A_1, ..., A_n are not disjoint, however, in general we
have only the bound of Boole's Inequality 2.1.16. We now wish to remedy this
lack (Theorem 2.3.15) and to give examples of the use of our new result
(Examples 2.3.17 and 2.3.19).
Theorem 2.3.15. Let A_1, ..., A_n be events. Then
P(A_1 ∪ A_2 ∪ ··· ∪ A_n) = Σ_{i=1}^n P(A_i) - Σ_{i<j} P(A_i ∩ A_j)
+ Σ_{i<j<k} P(A_i ∩ A_j ∩ A_k) - ···
+ (-1)^{n+1} P(A_1 ∩ ··· ∩ A_n).
Proof (Ω finite or denumerable): Suppose a point ω is in A_1 ∪ ··· ∪ A_n. Then
its probability is counted once on the left side. How many times is its probability
counted on the right side? Suppose, without loss of generality, that ω is in exactly
l of A_1, ..., A_n (1 ≤ l ≤ n). Without loss of generality, we may assume ω is in
A_1, ..., A_l. Then
(Number of times ω's probability is counted on the right side)
= C(l, 1) - C(l, 2) + ··· + (-1)^{l+1} C(l, l).
But, by the binomial theorem, (a + b)^l = Σ_{k=0}^l C(l, k) a^k b^{l-k}. Thus
0 = (1 + (-1))^l = Σ_{k=0}^l C(l, k)(-1)^k = C(l, 0) - [C(l, 1) - C(l, 2) + ··· + (-1)^{l+1} C(l, l)],
so that C(l, 1) - C(l, 2) + ··· + (-1)^{l+1} C(l, l) = C(l, 0) = 1, and
ω has its probability counted exactly once on the right side also. On the other
hand, if ω is not in A_1 ∪ ··· ∪ A_n, its probability is counted zero times on each
side (why?), so the proof is complete. [Note: This proof only covers the case of a
finite or denumerable sample space Ω. Other cases are covered in the Honors
Problems.]

Figure 2.3-2.
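Theorem 2.3.15 can also be verified by brute force on a small equally likely sample space; the setup below (random events over ten sample points) is our own illustration:

```python
from itertools import combinations
from fractions import Fraction
import random

N_OMEGA = 10   # sample points 0..9, equally likely

def inclusion_exclusion(events):
    """Right-hand side of Theorem 2.3.15 for events given as subsets of Omega."""
    total = Fraction(0)
    for r in range(1, len(events) + 1):
        for group in combinations(events, r):
            inter = set.intersection(*group)
            total += (-1) ** (r + 1) * Fraction(len(inter), N_OMEGA)
    return total

random.seed(0)
events = [set(random.sample(range(N_OMEGA), random.randint(1, 6)))
          for _ in range(4)]
lhs = Fraction(len(set.union(*events)), N_OMEGA)   # P(A_1 ∪ ··· ∪ A_4) directly
assert inclusion_exclusion(events) == lhs
```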
Corollary 2.3.16. Let A, B be events. Then
P(A ∪ B) = P(A) + P(B) - P(AB).
Proof: This is the result of Theorem 2.3.15 with n = 2. A Venn diagram may
be intuitively helpful. See Figure 2.3-2.
An alternative proof can be given as follows. First, A = (AB) ∪ (AB^c) and
A ∪ B = B ∪ (AB^c) are each disjoint unions [i.e., (AB) ∩ (AB^c) = ∅ and B ∩
(AB^c) = ∅]. Hence,
P(A) = P(AB) + P(AB^c),
P(A ∪ B) = P(B) + P(AB^c),
and
P(A) - P(A ∪ B) = P(AB) - P(B),
or
P(A ∪ B) = P(A) + P(B) - P(AB).
Example 2.3.17. Problème des Rencontres. Suppose that N (N > 1) cards
(numbered 1, 2, ..., N) are placed at random onto N numbered places on a table
(the places being numbered 1, 2, ..., N), each place receiving one card. We say a
match or rencontre occurs at the ith place if the card numbered i is placed there.
What is the probability of at least one match?
Let A_i be the event "rencontre at the ith place" (i = 1, ..., N). Then
P(A_1 ∪ ··· ∪ A_N) is what we wish to find. Since our Ω consists of N! orderings
of N objects and each has probability 1/N!, and since event A_1 A_2 ··· A_j occurs
if the cards numbered 1, 2, ..., j fall on the places numbered 1, 2, ..., j,
respectively (1
Figure 2.4-1. A water-supply system.
Let A_i denote the event that pumping station i is operable at time t_0
(1 ≤ i ≤ 5). Then P(A_1) = P(A_2) = ··· = P(A_5) = p. Let B denote the event
that water is available to the destination at time t_0. Then P(B) = R(p). Now
P(B) = P(BA_5) + P(BA_5^c)
= P(B|A_5)P(A_5) + P(B|A_5^c)P(A_5^c)
= pP(B|A_5) + (1 - p)P(B|A_5^c).
But (using Property 2.4.7 in the third equality)
P(B|A_5^c) = P((A_1A_2) ∪ (A_3A_4) | A_5^c)
= P(A_1A_2|A_5^c) + P(A_3A_4|A_5^c) - P(A_1A_2A_3A_4|A_5^c)
= p^2 + p^2 - p^4 = 2p^2 - p^4,
and
P(B|A_5) = P(A_2 ∪ A_4 | A_5)
= P(A_2|A_5) + P(A_4|A_5) - P(A_2A_4|A_5)
= p + p - p^2 = 2p - p^2.
Hence,
P(B) = p(2p - p^2) + (1 - p)(2p^2 - p^4)
= 4p^2 - 3p^3 - p^4 + p^5.
Alternatively, we may express B as
B = (A_1A_2) ∪ (A_3A_4) ∪ (A_5A_2) ∪ (A_5A_4)
and use Theorem 2.3.15:
P(B) = 4p^2 - 3(p^3 + p^4) + 2(p^4 + p^5) - p^5
= 4p^2 - 3p^3 - p^4 + p^5.
Can you devise a better way of utilizing five pumping stations [i.e., a way such
that, if the sole goal is to deliver water to the destination at time t_0, the new way
yields a larger R(p)]? [Hint: Pipe water through each station directly to the
destination.] Can you do better if water must (because of the distances involved)
pass through at least two pumping stations and must (because of the pipe
strengths involved) pass through at most two pumping stations (i.e., if water must
pass through exactly two pumping stations)?
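The reliability polynomial can be verified by enumerating all 2^5 operable/failed patterns of the five stations; the sketch below (ours) assumes the network structure B = (A_1A_2) ∪ (A_3A_4) ∪ (A_5A_2) ∪ (A_5A_4) given above:

```python
from itertools import product

def R(p):
    """Reliability R(p) by enumerating the 2**5 operable/failed states
    of five independent stations, each operable with probability p."""
    total = 0.0
    for up in product([True, False], repeat=5):
        a1, a2, a3, a4, a5 = up
        # Water arrives iff some source-to-destination path is fully operable.
        if (a1 and a2) or (a3 and a4) or (a5 and a2) or (a5 and a4):
            prob = 1.0
            for u in up:
                prob *= p if u else 1 - p
            total += prob
    return total

for p in (0.0, 0.3, 0.5, 0.9, 1.0):
    assert abs(R(p) - (4*p**2 - 3*p**3 - p**4 + p**5)) < 1e-12
```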
Example 2.4.13. Simpson's Paradox. Two companies manufacture a certain type
of sophisticated electronic equipment for the government; to avoid lawsuits, let
us call them company C and company C̄. In the past, company C has had 5%
good output, whereas company C̄ has had 50% good output (i.e., 95% of C's
output and 50% of C̄'s output is not of acceptable quality). The government has
just ordered 21,100 of these electronic devices, 10,100 from company C̄ and
11,000 from company C. (Company C̄ often does receive part of the order, for
political reasons, or because company C does not have a capacity large enough to
allow it to take the whole order, and so on.)
Since the electronic devices are critically needed for an important government
program, before production of the 21,100 devices starts, government scientists
develop a new manufacturing method that they believe will almost double the
percentage of good devices received. Companies C and C̄ are given this
information, but (because of limited lead-time) its use is largely optional: they must
each use the new method for at least 100 of their devices, but use of the new
method beyond that point is left to their discretion.
When the devices are received and tested some time later, analysis of the
21,100 items shows a breakdown as in Table 2.4-1, which horrifies the
government: 46% of items produced by the usual method were good, but only 11% of
items produced by the new method were good. This shocks the responsible
government officials, who begin to berate (and worse!) their scientists and
companies C and C̄ for the "lousy new method" and for producing so many of
the items with the clearly inferior new method, respectively. Yet the scientists and
the companies insist that the new method has almost doubled the percentage of
good items. Can they possibly be right?
The answer is (amazingly) yes, as we see in Table 2.4-2. The new method has
nearly doubled the percent good in each case (10% versus 5% for company C and
95% versus 50% for company C̄). The totals put into the experimental (new)
Table 2.4-1
Analysis of Total Production

                        Production Method
                        Standard        New
Results   Bad           5950            9005
          Good          5050 (46%)      1095 (11%)
Table 2.4-2
Production Analysis by Company

                        Company C                      Company C̄
                        Production Method              Production Method
                        Standard      New              Standard      New
Results   Bad           950           9000             5000          5
          Good          50 (5%)       1000 (10%)       5000 (50%)    95 (95%)
method by each company are also quite reasonable: company C̄ (which knew its
implementation of the standard method was already quite good) put the
minimum allowed (100), whereas company C (which knew its implementation of the
standard method was inferior) put 10,000 of its 11,000 through the new method.
The fact that results like those displayed here can occur is, when stated in
terms of probabilities, called Simpson's paradox: It is possible to have P(A|B) <
P(A|B̄) even though both
P(A|BC) > P(A|B̄C),
P(A|BC̄) > P(A|B̄C̄).          (2.4.14)
Simpson’s paradox, just discussed for the case of two populations, applies
equally well to cases with several populations; it is discussed further in Problems
2.32, 2.33, and 2.34. It was first discussed by E. H. Simpson (1951); more recent
Feferences are Blyth (1972) and Wagner (1982). The crux of why it can occur is as
follows. Suppose that in Population I (the standard production method), the rate
of occurrence of good results is r, in one segment of the population (company
C), and r, in the rest of that population (company C). The overall good rate for
that population (the standard production method) is r= w,r, + w,r, (where
2 = 1 — w,). In our example, r, = .05 with w, = 1000/1100, and r, = .50 with
— w,, hence
w,
1000 10000 ts
” = F900 (5) + 7999 (59) = oe =
In Population 2 (the new production method), the comparable figures are
R_1 = .10, W_1 = 10000/10100, and R_2 = .95, W_2 = 1 - W_1, so that
R = W_1 R_1 + W_2 R_2 = (10000/10100)(.10) + (100/10100)(.95) = 1095/10100 ≈ .11.
Although .05 = r_1 < R_1 = .10 (the standard method is inferior to the new method
in company C) and .50 = r_2 < R_2 = .95 (the standard method is inferior to the
new method in company C̄), yet
.46 ≈ r = w_1 r_1 + w_2 r_2 > W_1 R_1 + W_2 R_2 = R ≈ .11.
This can occur since w_1 ≠ W_1 (the proportion of production devoted to each
method differs in the companies).
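The reversal can be reproduced directly from the counts in Table 2.4-2 (the dictionary layout and the label "Cbar" for company C̄ are ours):

```python
from fractions import Fraction

# Counts (good, total) from Table 2.4-2, by company and production method.
data = {("C", "standard"): (50, 1000),      ("C", "new"): (1000, 10000),
        ("Cbar", "standard"): (5000, 10000), ("Cbar", "new"): (95, 100)}

def rate(cells):
    good = sum(g for g, t in cells)
    total = sum(t for g, t in cells)
    return Fraction(good, total)

# Within each company the new method is better...
for comp in ("C", "Cbar"):
    assert rate([data[comp, "new"]]) > rate([data[comp, "standard"]])

# ...yet pooled over companies the standard method looks better (the paradox).
std = rate([data["C", "standard"], data["Cbar", "standard"]])
new = rate([data["C", "new"], data["Cbar", "new"]])
assert std == Fraction(5050, 11000) and new == Fraction(1095, 10100)
assert std > new
```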
Example 2.4.15. Independence / Nonindependence for Sampling With / Without
Replacement. Suppose that a box of chocolates contains 12 solid chocolates and
12 crème-filled chocolates. The candies are drawn at random from the box one at
a time. Find the probability that a solid chocolate is drawn for the fourth time on
the 7th draw if (a) sampling is done with replacement (the candies are examined,
but not eaten); (b) the sampling is done without replacement (as is usual in such
situations).
(a) When sampling is with replacement, the results of the individual
random draws are independent (draw i does not affect the results of
draw j) and hence
P(4th solid occurs on draw 7)
= P(Exactly 3 solids in first 6 draws, and solid on draw 7)
= P(3 solids and 3 crèmes in first 6 draws) P(Solid on draw 7)
= C(6, 3)(1/2)^3(1/2)^3 · (1/2) = 20/2^7 = 5/32 ≈ .1563.
(b) When sampling is without replacement, the draws are not independent
(hence conditional probability is used to obtain the values of the terms
that make up the probability of interest):
P(4th solid occurs on draw 7)
= P(Exactly 3 solids in first 6 draws, and solid on draw 7)
= C(6, 3) · (12/24)(11/23)(10/22)(12/21)(11/20)(10/19) · (9/18)
= 18150/100947 ≈ .1798.
(Here note that which draws in the first six are the solids affects only
the order of the numerator terms, not the overall value; the binomial
coefficient gives the number of possible orders, which reduces to 20
possibilities for the ordering of the 3 solids and 3 crèmes on the first 6
draws. On the seventh draw, there are 12 - 3 = 9 solids left, and a total
of 24 - 6 = 18 candies in the box, for a 9/18 probability of a solid on
the 7th draw given that exactly 3 of the previous draws yielded solids.)
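Both answers of Example 2.4.15 can be confirmed with exact arithmetic (variable names are ours):

```python
from fractions import Fraction
from math import comb

# (a) With replacement: C(6,3)(1/2)^6 for 3 solids in 6 draws, times 1/2 on draw 7.
p_with = Fraction(comb(6, 3), 2**6) * Fraction(1, 2)
assert p_with == Fraction(5, 32)

# (b) Without replacement: conditional probabilities along one ordering,
# times the comb(6,3) equally likely orderings, times 9/18 on draw 7.
one_order = Fraction(12*11*10 * 12*11*10, 24*23*22*21*20*19)
p_without = comb(6, 3) * one_order * Fraction(9, 18)
assert p_without == Fraction(18150, 100947)
assert abs(float(p_without) - 0.1798) < 5e-5
```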