
Module 1. Probability (Chapter 1 in the textbook)

§1. Introduction to probability

The subject studies Probability for Statistics.

So what is statistics?
Statistics concerns the collection, analysis and interpretation of data, for making informed decisions in all areas of science, business and government.
It is really good and useful. Not only statisticians but also people in many other disciplines do statistics. (Check http://www.statsci.org/jobs )
But statistics is often intentionally misused by people who find ways to interpret the data in their own favour.

"He uses statistics as a drunken man uses lamp posts — for support rather than illumination" — Andrew Lang.

"There are three types of lies — lies, damn lies, and statistics." — Benjamin Disraeli.

Example 1. Simpson's paradox (grounds for abusing statistics)

Death penalty verdicts by defendant's race and victim's race:

  Victim's race   Defendant's race   Death penalty: Yes    No    Percent of yes
  White           White                      53            414
  White           Black                      11             37
  Black           White                       0             16
  Black           Black                       4            139

(Source: Radelet and Pierce, Florida Law Review 43, 1-34 (1991))

Finding 1: For each category of victims, the percentage of black defendants receiving the death penalty is higher than that for white defendants.

If the victim's race factor is dropped, we get the following table:

  Defendant's race   Death penalty: Yes    No    Percent of yes
  White
  Black

Finding 2: The percentage of black defendants receiving the death penalty is lower.
Moral of the example:
When the data result from an observational study, there is a possibility that a third variable (called a confounding variable) may affect the observed relationship between the two variables of interest.
Simpson's paradox is that the relationship appears to be in a different direction when the confounding variables are not considered.
But we also want to know how much a confounding variable affects the observed relationship between the two variables. Probability theory is required for answering this question.

Now what is probability (theory)?
Probability theory provides a mathematical foundation for applying statistics, where randomness/uncertainties are almost always involved.
Probability refers to the quantification of randomness/uncertainty. (Note that probability cannot remove or reduce the randomness/uncertainty.)
It would be interesting to know how probability can be used to quantify the randomness/uncertainty.

Example 2. Probability as a long-term relative frequency.
A tennis player made 1000 first serves in last year's matches. Among these serves, 750 were successful. Suppose the player's tennis skills stay the same this year.
Question: What is the probability that his next first serve will be successful?
Apparently, the answer is Pr(success) = 750/1000 = 0.75.
Moral of the example:
The outcome in each trial occurs by chance. But there is a statistical pattern over a large number of trials. The probability here is interpreted as the long-term relative frequency:
  Pr(success) = lim_{n(trials)→∞} n(successes)/n(trials), in some sense.
If the probability of a successful first serve is really 0.75, then we would expect the player to succeed in around 75% of the first serves in the long run.
This can be illustrated through computer simulation using R or Maple.

R worksheet:
> y = sample(x=c("S","F"), size=40, replace=T, prob=c(0.75, 0.25))
    # Simulate 40 first serves and save the results in y.
> y    # List the 40 results
 [1] "S" "S" "S" "F" "S" "S" "S" "S" "S" "S" "S" "F" "S" "S" "S" "S" "F" "S" "S" "S"
[21] "F" "S" "S" "F" "S" "S" "F" "S" "S" "F" "S" "S" "F" "F" "F" "F" "F" "S" "S" "S"
> table(y)    # Summarize the result by tabulation (tally)
y
 F  S
12 28    # 28 successes out of 40, i.e. 70%, which is around 75%.

R worksheet continued:
> y = sample(x=c("S","F"), size=400, replace=T, prob=c(0.75, 0.25))
> table(y)
y
  F   S
103 297    # The relative frequency is 74.25% from 400 serves.
> y = sample(x=c("S","F"), size=4000, replace=T, prob=c(0.75, 0.25))
> table(y)
y
   F    S
 958 3042    # The relative frequency is 76.05% from 4000 serves.
> y = sample(x=c("S","F"), size=40000, replace=T, prob=c(0.75, 0.25))
> table(y)
y
    F     S
 9882 30118    # The relative frequency is 75.295% from 40000 serves.

We have performed 101 simulations, with the numbers of "first serves" being 40, 400, 800, 1200, …, 40000, respectively. (Figure omitted: a plot of the resultant relative frequencies against the number of serves.)
Maple worksheet:
> with(Statistics):   # The Statistics package is attached.
> P := [0.75, 0.25]:   # Define the probability values for "S" and "F".
> X := RandomVariable(ProbabilityTable(P)):   # Define a random variable.
> Y := Sample(X, 40):   # Generate a sample of 40 "serves" according to prob. P.
> interface(rtablesize = 40):
> Y;   # List the results
[1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 1, 1, 1, 1, 1, 1, 1, 2, 2, 1, 1, 2, 2, 1, 1, 2, 1, 1, 1, 1]
> FrequencyTable(Y, bins = 2);   # List the frequencies, relative frequencies, etc.
  Range           Freq    RelaFreq    CumFreq    CumRelaFreq
  [1 .. 1.5]       33      82.5         33         82.5
  [1.5 .. 2]        7      17.5         40        100.0
> Y := Sample(X, 40000):
> FrequencyTable(Y, bins = 2);
  [1 .. 1.5]    29927     74.8175    29927        74.8175
  [1.5 .. 2]    10073     25.1825    40000       100.0

Example 3. Probability as a measure of uncertainty of knowledge.
One of John's medical tests came back positive, implying he may have a disease D.
The test is only 95% accurate. Namely,
  The test is positive 95% of the time on patients with disease D.
  The test is negative 95% of the time on patients without disease D.
In addition, the occurrence rate of disease D in the population is 1 in 1000.
Question: What is the probability that John has disease D given his positive test result?
Remarks:
  Before answering this question, what is the interpretation of this probability?
  Can this probability be interpreted as a long-term relative frequency? The process of infection for John has already completed, is not repeatable, and has a certain outcome. It is only that we do not have knowledge of the outcome.
  However, there is a large population that John belongs to. So we are looking at the frequency of the disease in that population.
  Further, this is a conditional probability, conditional on the knowledge that his test is positive.
  We will learn a systematic method for studying this type of problem in the near future.
Now we solve the question in a special situation.
Suppose there are 100,000 people in the population.
The proportion of people having this disease is 1 in 1000. Namely, in total 100 people suffer from this disease.
Imagine every person in the population has a test for this disease. The population can be grouped according to the actual status and test status for the disease:

                Test +    Test -     Total
  Disease
  No disease
  Total                            100,000

Only ___ people will be tested positive, out of whom only ___ have the disease.
John is one of the ___ people tested positive.
So the probability of John being one of the (Test+, Disease) cases is
Pr(John has the disease | his test is positive) = ___.

Moral of the example:
Probability has a second interpretation, as measuring the uncertainty of knowledge.
The probability of event A conditional on event B is not the same as the probability of event B conditional on event A.

In order to rigorously define and systematically study probability, we need some formal notations and terminology.
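The 100,000-person table above can be checked with a short Python sketch (assuming, as stated in Example 3, a 1-in-1000 disease rate and a test that is positive for 95% of diseased and 5% of healthy people):

```python
from fractions import Fraction

population = 100_000
diseased = population // 1000        # 1 in 1000 -> 100 people
healthy = population - diseased      # 99,900 people

# 95% of the diseased test positive; 5% of the healthy also test positive.
true_positives = diseased * 95 // 100           # 95
false_positives = healthy * 5 // 100            # 4995
total_positives = true_positives + false_positives   # 5090

p = Fraction(true_positives, total_positives)   # P(disease | test positive)
print(p, float(p))   # 19/1018, about 0.0187
```

The tiny posterior, despite the "95% accurate" test, is driven by the disease's low base rate; Example 41 later confirms the value 95/5090.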
§2. Definitions and properties related to probability

Experiment: a process of obtaining an observed result.
Trial: performing an experiment.
Outcome: an observed result of an experiment.
Random experiment: an experiment in which the outcome cannot be predicted with certainty.

Definition 1. The sample space, or outcome space, denoted by S, is the set of all possible outcomes of an experiment. Each outcome in S is called a sample point, which is often generically denoted as ω.
S is called a discrete sample space if S is either finite or countably infinite.

Example 4. Let an experiment be the toss of two coins. Suppose we are interested in the observed face of each coin. Then the sample space is S4 = { }. This is a finite sample space.

Example 5. The experiment is the same as in Example 4. Now we are interested in the number of heads obtained. So the sample space is S5 = { }. This is also a finite sample space.

Remark. The same experiment can have different sample spaces depending on the objective of the experiment.

Example 6. Consider the toss of one coin. If we are interested in the number of tosses required to get a head, the sample space would be S6 = { }, the set of all natural numbers, which is a countably infinite sample space.

Example 7. Observe the lifetime T of a light bulb. The sample space is S7 = ___, which is a 1-D continuous sample space.
Example 8. Detect the epicenter of an earthquake. An outcome or a sample point is denoted as (x, y, z) for (longitude, latitude, depth). Then the sample space is S8 = ___, which is a 3-D continuous sample space.

Remark. Different experiments can have equivalent sample spaces. For example, experiments observing "Heads" or "Tails", "raining" or "not raining", and "male" or "female" can all have the same sample space S = {yes, no}, where "yes" and "no" have different interpretations for each experiment.

Definition 2. Event. A is called an event if A is a subset of the outcomes in the sample space S, i.e. A ⊂ S. We say event A has occurred if we perform a random experiment and the outcome of the experiment is in A.

Example 9. There are 4 white marbles and 2 black marbles in a bag. Now take out two marbles at random. The sample space is S9 = { }.
Each of the 4 sample points constitutes an event, i.e.
A1 = { }, A2 = { }, A3 = { }, A4 = { } are all events.
The following are also events:
1. E1 = {The first draw is a black} = { }.
2. E2 = {The second draw is a black} = { }.
3. E3 = {Exactly one draw is a black} = { }.
The difference between E1, E2, E3 and A1, A2, A3, A4 is that the former are decomposable, while the latter are elementary events.
An event is called an elementary event if it contains exactly one outcome of the experiment.
There are two special events: the empty set ∅ is called the null (or impossible) event; it can never happen. The sample space S itself is called the certain event, which always happens. For any event A, ∅ ⊂ A ⊂ S.

Remarks on operations of events:
1. Different events can be defined on one sample space.
2. An objective of probability theory is to enable computing the probabilities of complicated events from those of simple events.
3. For this reason, one needs to study the relations among different events.

Operations of events are the same as those of sets.
1. A ⊂ B or B ⊃ A: A is a sub-event of B, or B contains A, implying that the outcome is in B whenever it is in A. For example, let A = {# of phone calls < 5} and B = {# of phone calls < 6}. Then A ⊂ B.
   If A ⊂ B and B ⊂ A, then A = B.
2. Intersection: A ∩ B or AB represents the event "A and B". So A ∩ B has occurred if both A and B have occurred.
3. Union: A ∪ B represents the event "A or B". So A ∪ B has occurred if either A or B has occurred.
4. Complement: A' represents the event "not A". E.g., A ∩ B' has occurred means A has occurred but B has not.
Definition 3.
1. Events A and B are mutually exclusive if A ∩ B = ∅.
2. Events A1, A2, A3, … are mutually exclusive if they are pair-wise mutually exclusive, i.e. Ai ∩ Aj = ∅ for any i ≠ j.
3. Events A1, A2, …, Ak are collectively exhaustive if A1 ∪ A2 ∪ … ∪ Ak = S.
4. A1, A2, …, Ak are mutually exclusive and exhaustive if A1 ∪ A2 ∪ … ∪ Ak = S and Ai ∩ Aj = ∅ for any i ≠ j.

Priority of the operations: 1. complement, 2. intersection, 3. union or difference.

Properties of the event operations:
1. Commutative laws. A ∪ B = B ∪ A, A ∩ B = B ∩ A.
2. Associative laws. A ∪ (B ∪ C) = (A ∪ B) ∪ C, A ∩ (B ∩ C) = (A ∩ B) ∩ C.
3. Distributive laws. A ∩ (B ∪ C) = (A ∩ B) ∪ (A ∩ C), A ∪ (B ∩ C) = (A ∪ B) ∩ (A ∪ C).
4. De Morgan's laws. (A ∪ B)' = A' ∩ B', (A ∩ B)' = A' ∪ B'.
Proof: If ω ∈ (A ∪ B)', then ω ∉ A ∪ B, so ω ∉ A and ω ∉ B.
Hence ω ∈ A' and ω ∈ B', i.e. ω ∈ A' ∩ B'. Thus (A ∪ B)' ⊂ A' ∩ B'.
Similarly, we can show that A' ∩ B' ⊂ (A ∪ B)'.
Therefore, (A ∪ B)' = A' ∩ B'.
Similarly one can show that (A ∩ B)' = A' ∪ B'.
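Since events are just sets, De Morgan's laws can be spot-checked with Python's built-in set operations. This is a sketch on one small sample space of my own choosing, not a proof:

```python
S = set(range(10))           # a small sample space
A = {1, 2, 3, 4}             # illustrative events
B = {3, 4, 5, 6}

def complement(E):
    # E' relative to the sample space S
    return S - E

# (A ∪ B)' = A' ∩ B'
law1 = complement(A | B) == complement(A) & complement(B)
# (A ∩ B)' = A' ∪ B'
law2 = complement(A & B) == complement(A) | complement(B)
print(law1, law2)   # True True
```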
Example 10. Randomly select a student from a class. Let
A = {the selected one is a male},
B = {the selected one does not like singing},
C = {the selected one is an athlete}. Then
1. A ∩ B ∩ C' is the event that the selected student is a male, does not like singing and is not an athlete.
2. Under what conditions does A ∩ B ∩ C = A hold? (All male students do not like singing but are athletes.)
3. When does C' ⊂ B hold?
4. When does A = B hold?
5. When does A' = C hold?

§3. Classic probability models

Given a random experiment and the associated sample space S, the primary objective of probability modeling is to assign to each event A a real number P(A), called the probability of A.
Sometimes P(A) can be determined from the long-term relative frequency of A.
Sometimes P(A) can be determined by the degree of belief about A.
Generally, there is no universal method for determining the probability value P(A), which must depend on the nature of the experiment involved.
However, prior to focusing on specifying P(A), one can deduce an abstract mathematical definition of probability by following the properties of the relative frequencies of events.
With that definition and the associated properties of probability, we will be able to find the probability of any general event once some background information about the experiment is provided.
Definition 4. Probability is a real-valued set function P that assigns to each event A in the sample space S a number P(A), called the probability of the event A, such that the following properties are satisfied:
a) (Non-negativity) P(A) ≥ 0;
b) (Regularity) P(S) = 1;
c) (Countable additivity) If A1, A2, A3, … are events and Ai ∩ Aj = ∅ for i ≠ j, then
   P(A1 ∪ A2 ∪ … ∪ Ak) = P(A1) + P(A2) + … + P(Ak), or equivalently P(∪_{i=1..k} Ai) = Σ_{i=1..k} P(Ai), for each positive integer k; and further
   P(A1 ∪ A2 ∪ A3 ∪ …) = P(A1) + P(A2) + P(A3) + …, or equivalently P(∪_{i=1..∞} Ai) = Σ_{i=1..∞} P(Ai), for an infinite, but countable, number of events.

In the case of a discrete sample space, probability can be equivalently defined as a set function satisfying:
a) P({ω}) ≥ 0 for any sample point ω ∈ S;
b) Σ_{ω∈S} P({ω}) = 1;
c) for any sequence of distinct sample points ω1, ω2, … in S, P(∪_{i=1..∞} {ωi}) = Σ_{i=1..∞} P({ωi}).

Example 11 (Example 9 continued). It can be found that
P(A1) = P({(W, W)}) = ___, P(A2) = P({(W, B)}) = ___,
P(A3) = P({(B, W)}) = ___, P(A4) = P({(B, B)}) = ___.
Then one can apply Definition 4 to find the probability of any event. E.g.
P(A1 ∪ A2) = P({(W, W), (W, B)}) = ___.
Properties of probability
Theorem 1.2-1 (in the text). For each event A, P(A) = 1 − P(A').
Theorem 1.2-2. For the null event ∅, P(∅) = 0.
Theorem 1.2-3. If events A and B are such that A ⊂ B, then P(A) ≤ P(B).
Theorem 1.2-4. For each event A, P(A) ≤ 1.
Theorem 1.2-5. If A and B are any two events, then
  P(A ∪ B) = P(A) + P(B) − P(A ∩ B).
Theorem 1.2-6. If A, B and C are any three events, then
  P(A ∪ B ∪ C) = P(A) + P(B) + P(C) − P(A ∩ B) − P(A ∩ C) − P(B ∩ C) + P(A ∩ B ∩ C).

The classic probability model
We now consider a special type of random experiment that is simple but has many applications. There are two particular features of these experiments:
a) The number of possible outcomes is finite, i.e. S is finite.
b) All possible outcomes (sample points), denoted by ω1, …, ωn, are equally likely, i.e. P({ω1}) = … = P({ωn}) = 1/n.
Given that S is finite and has equally likely outcomes, the probability of any event A is
  P(A) = (# of sample points in A) / (# of sample points in S) = n(A)/n.
Probability models where the probability is defined in the above way are called classic probability models, which often appear in games of chance, genetics, quality control, theoretical physics, etc.
To compute P(A), we only need to count both n(A) and n.
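In a classic model everything reduces to counting, so a computer can do it by enumeration. A sketch on two fair dice (the two events below are my own illustrative choices), which also checks Theorem 1.2-5:

```python
from fractions import Fraction
from itertools import product

S = list(product(range(1, 7), repeat=2))    # 36 equally likely outcomes
n = len(S)

def P(event):
    # Classic model: P(A) = n(A)/n
    return Fraction(sum(1 for w in S if event(w)), n)

A = lambda w: w[0] + w[1] == 7              # the sum is 7
B = lambda w: w[0] == 4                     # 4 on the first die

print(P(A))                                 # 1/6
# Theorem 1.2-5: P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
lhs = P(lambda w: A(w) or B(w))
rhs = P(A) + P(B) - P(lambda w: A(w) and B(w))
print(lhs, lhs == rhs)                      # 11/36 True
```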
All counting techniques are derived from two basic counting rules:
a) Multiplication principle. If operation A1 can be performed in n1 ways and operation A2 can be performed in n2 ways, then there are n1 × n2 ways to carry out both A1 and A2.
b) Addition principle. If operation A1 can be performed in n1 ways and operation A2 can be performed in n2 ways, then there are n1 + n2 ways to carry out either A1 or A2.

Example 12. How many ways can a test of 20 True-False questions be answered?

Example 13. To travel to B from A, there are 2 routes by train, but 3 routes by car. How many routes are there to go to B from A?

Some useful counting formulas.
Permutations: The number of permutations of n distinct objects taken r at a time is
  nPr = P(n, r) = n!/(n − r)! = n(n − 1)(n − 2)…(n − r + 1),
which is also the number of ways of taking r objects from n distinct ones and displaying them in a line. So permutations depend on the order.
Combinations: The number of combinations of n distinct objects chosen r at a time is
  nCr = C(n, r) = (n choose r) = n!/(r!(n − r)!),
which can also be regarded as the number of ways of taking r objects from n distinct ones and putting them into a "box" or "cell". So the order does not matter here.
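The counting rules map directly onto Python's math module (math.perm and math.comb). A sketch using the numbers of Examples 12 and 13, plus an illustrative n = 10, r = 3:

```python
import math

# Multiplication principle (Example 12): 20 True-False questions, 2 choices each.
print(2 ** 20)             # 1048576 ways
# Addition principle (Example 13): 2 train routes or 3 car routes.
print(2 + 3)               # 5 routes

# Permutations and combinations of n = 10 objects taken r = 3 at a time.
print(math.perm(10, 3))    # 10*9*8 = 720, order matters
print(math.comb(10, 3))    # 720 / 3! = 120, order does not matter
# Relation between the two formulas: nPr = nCr * r!
assert math.perm(10, 3) == math.comb(10, 3) * math.factorial(3)
```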
Remark: (n choose r) is frequently called a binomial coefficient because it appears in the binomial expansion:
  (a + b)^n = Σ_{r=0..n} (n choose r) b^r a^{n−r}.

Example 14. Suppose we have 5 marbles, 2 black and 3 white, but otherwise indistinguishable. How many distinguishable permutations can these 5 marbles be arranged in?

Note the difference between distinguishable and indistinguishable objects.

Example 15. The number of permutations of n objects of which r1 are of one kind, r2 of a second kind, ..., rk of a k-th kind is
  n!/(r1! r2! … rk!),
which is often called a multinomial coefficient.

Example 16. The number of ways of partitioning a set of n distinct objects into k cells with r1 objects in the 1st cell, r2 in the 2nd cell, and so forth is
  n!/(r1! r2! … rk!),
which is the same as the answer of Example 15.
Probability computation in classic probability models

Example 17. A 4-volume text is displayed on a shelf in a random order.
What is the probability that the displayed order is 1,2,3,4 or 4,3,2,1?
Denote the event by E = {(1,2,3,4), (4,3,2,1)}. So n(E) = ___. Also, n(S) = ___. Thus
P(E) = ___.

Example 18. There are 10 resistors with resistance values 1 Ω, 2 Ω, 3 Ω, …, 10 Ω, respectively. We are to take 3 out and require that among the 3 taken, one is < 5 Ω, one is = 5 Ω, and the other is > 5 Ω.
What is the probability that this outcome is achieved?

Example 19. (Difficult?) Suppose there are N birds in a reserve park, attached with ID numbers 1, 2, …, N, respectively. A bird watcher has observed n birds and recorded their ID numbers during a visit to the park.
What is the probability that the largest number he has recorded is k?
(He may have observed the same bird more than one time.)
Consider a list of n numbers from 1 to N as a sample point. The total number of possible sample points is N^n. Among these N^n points, k^n have their largest component not larger than k, and (k − 1)^n have their largest component not larger than k − 1. Thus k^n − (k − 1)^n points have their largest component equal to k. Therefore,
  P(largest recorded number = k) = (k^n − (k − 1)^n) / N^n.
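The answer to Example 19 can be sanity-checked in two ways: the probabilities must sum to 1 over k, and for tiny N and n (my illustrative choice N = 5, n = 3) brute-force enumeration must agree:

```python
from fractions import Fraction
from itertools import product

def p_largest(N, n, k):
    # P(largest recorded ID = k) for n observations of N birds, repeats allowed
    return Fraction(k**n - (k - 1)**n, N**n)

N, n = 5, 3
# The probabilities over k = 1..N must sum to 1.
assert sum(p_largest(N, n, k) for k in range(1, N + 1)) == 1

# Brute force: enumerate all N^n equally likely ID sequences.
for k in range(1, N + 1):
    count = sum(1 for seq in product(range(1, N + 1), repeat=n) if max(seq) == k)
    assert Fraction(count, N**n) == p_largest(N, n, k)
print("formula agrees with enumeration for N=5, n=3")
```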
Example 20. A box contains n tickets, numbered from 1 to n. If 3 tickets are selected at random without replacement, what is the probability of obtaining tickets with consecutive integers?
P = ___

Example 21. (Difficult?) There are m black marbles and n white marbles in a bag. Now consider the process of randomly picking out marbles one by one without replacement.
What is the probability that the k-th marble picked out is a black one? (1 ≤ k ≤ m + n)
P = ___

Example 22. Assume that among a+b items of a certain product, a are defective and b are good.
a) If we randomly select n items from the a+b with replacement, what is the probability that exactly k are defective?
P = ___
b) If we randomly select n items from the a+b without replacement, what is the probability that exactly k are defective?
P = ___
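For Example 22, the two standard formulas (binomial counting for sampling with replacement, hypergeometric counting without) can be written with math.comb. This is a sketch of the formulas with illustrative values a = 3, b = 7, n = 4, not the worked class solution:

```python
from fractions import Fraction
from math import comb

def p_with_replacement(a, b, n, k):
    # Each draw is defective with probability a/(a+b), independently (binomial).
    p = Fraction(a, a + b)
    return comb(n, k) * p**k * (1 - p)**(n - k)

def p_without_replacement(a, b, n, k):
    # Choose k of the a defectives and n-k of the b good ones (hypergeometric).
    return Fraction(comb(a, k) * comb(b, n - k), comb(a + b, n))

a, b, n = 3, 7, 4
# Both families of probabilities must sum to 1 over k.
assert sum(p_with_replacement(a, b, n, k) for k in range(n + 1)) == 1
assert sum(p_without_replacement(a, b, n, k) for k in range(min(a, n) + 1)) == 1
```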
Example 23. (Difficult?) There are one white marble and N−1 black marbles in a bag. Consider the process of randomly picking out a marble and putting back a black marble each time. What is the probability of picking out a black marble at the k-th step?
Let A = {Pick out a black marble at the k-th step}. Now calculate P(A') first. Event A' implies that the white marble is selected at the k-th step and a black marble is selected at each of the first k−1 steps. Thus
P(A') = ___
It follows that
P(A) = 1 − P(A') = ___

From the above examples, we see that the probability calculation in classic probability models can be extremely difficult. See the book "An Introduction to Probability Theory and Its Applications, Volume 1" by William Feller (1968) for many more classic probability examples.
On the other hand, classic probability models have limited applications in practice. Due to this we will not focus on the classic probability models in this subject.
However, we do need to be familiar with the notations and the basic properties of probability introduced in the current and previous sections.
§4. Conditional probability and multiplication rule

Example 24. There are 20 tulip bulbs with similar appearance.
8 will bloom early, 12 late; 13 will be red, 7 yellow. Further, the bulbs are distributed as:

               Early (E)   Late (L)   Totals
  Red (R)          5           8        13
  Yellow (Y)       3           4         7
  Totals           8          12        20

Consider the experiment of selecting a bulb at random.
If the outcome is identified by the bulb's ID number, the sample space S1 will have 20 outcome points. It would be reasonable to assume that each outcome is equally likely, i.e., a classic probability model can be used.
If the outcome is identified by the two attributes "colour" and "time", the sample space S2 will have 4 outcome points: (E, R), (E, Y), (L, R), (L, Y). It would not seem right to assume these four outcomes to be equally likely.
Suppose we choose to use S1; then P(E) = ___, P(L) = ___, P(R ∩ E) = ___, etc.
Suppose we know "E" is to happen for the selected bulb. Conditional on this information, the probability that the selected bulb is "R" is
  P(R | E) = n(R ∩ E)/n(E) = [n(R ∩ E)/n(S1)] / [n(E)/n(S1)] = P(R ∩ E)/P(E),
namely
  P(R | E) = P(R ∩ E)/P(E).
Remark: The conditional probability of event "R" given event "E" is the proportion of that part of "E" belonging to "R", in comparison with "E".
Definition 1. The conditional probability of an event A, given that event B has occurred, is defined by
  P(A | B) = P(A ∩ B)/P(B), provided that P(B) > 0.

Example 25. Two cards are drawn without replacement from a deck of cards. Let
A1 = {An ace on the first draw} and A2 = {An ace on the second draw}.

                        1st draw is an ace   1st draw not an ace   Totals
  2nd draw is an ace
  2nd draw not an ace
  Totals

a) What is the conditional probability of getting an ace on the second draw given that the first draw is an ace?
P(A1) = ___, P(A2) = ___, P(A1 ∩ A2) = ___.
So P(A2 | A1) = ___.
This probability can also be obtained through intuition.
b) What is the conditional probability of an ace on the 1st draw given that the 2nd draw is an ace?
P(A1 | A2) = ___.
The above result is not so intuitive; the definition of conditional probability helps.

Conditional probability also satisfies the three probability axioms:
1. P(A | B) ≥ 0.
2. P(B | B) = 1.
3. If A1, A2, … are mutually exclusive events, then
   a) P(A1 ∪ A2 ∪ … ∪ Ak | B) = P(A1 | B) + P(A2 | B) + … + P(Ak | B) for any k, and
   b) P(∪_{i=1..∞} Ai | B) = Σ_{i=1..∞} P(Ai | B).
The properties in Theorems 1.2-1~1.2-6 also hold for conditional probability.
These axioms and properties are very useful in probability computing.
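Example 25 can be checked by enumerating all ordered two-card draws, assuming a standard 52-card deck with 4 aces (the table above is still left for you to fill in):

```python
from fractions import Fraction
from itertools import permutations

deck = ["A"] * 4 + ["x"] * 48               # 4 aces among 52 cards
draws = list(permutations(range(52), 2))    # ordered pairs without replacement: 52*51

def P(event):
    return Fraction(sum(1 for d in draws if event(d)), len(draws))

A1 = lambda d: deck[d[0]] == "A"            # ace on the first draw
A2 = lambda d: deck[d[1]] == "A"            # ace on the second draw

print(P(A1), P(A2))                         # 1/13 1/13 (symmetry!)
both = P(lambda d: A1(d) and A2(d))
print(both / P(A1))                         # P(A2 | A1) = 3/51 = 1/17
print(both / P(A2))                         # P(A1 | A2) = 3/51 = 1/17
```

Note how the enumeration makes part b) unmysterious: by symmetry the two conditional probabilities coincide.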
Definition 2. Multiplication rule:
  P(A ∩ B) = P(A) P(B | A), or P(A ∩ B) = P(B) P(A | B).
This rule follows straightforwardly from the definition of conditional probability.

Example 26. A bowl contains 7 blue chips and 3 red chips. Two chips are drawn successively at random and without replacement. Let
A = {The 1st draw results in a red chip}, B = {The 2nd draw results in a blue chip}.
What is the probability of first drawing a red chip and second drawing a blue chip?
P(A ∩ B) = ___

Example 27. Among the policyholders of an insurance company, 60% have auto policies, 40% have homeowner policies, and 20% have both. A person is selected at random from the policyholders. Let
A1 = {He/she has only an auto policy}, A2 = {He/she has only a homeowner policy},
A3 = {He/she has both}, A4 = {He/she has neither an auto nor a homeowner policy}, and
B = {He/she is to renew at least one of the auto and homeowner policies}.
It is known that P(B | A1) = 0.6, P(B | A2) = 0.7, P(B | A3) = 0.8, P(B | A4) = 0.
What is the conditional probability that the person will renew at least one of the auto and homeowner policies, given that he/she currently has an auto or homeowner policy or both? Namely, find P(B | A1 ∪ A2 ∪ A3). (A1, A2, A3 are mutually exclusive.)
Using a Venn diagram, it follows that P(A1) = 0.4, P(A2) = 0.2, P(A3) = 0.2, P(A4) = 0.2.
So P(B | A1 ∪ A2 ∪ A3) = ___
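Once the partition probabilities are in hand, Example 27 reduces to arithmetic. A sketch using exactly the numbers given above (this evaluates the quantity the example asks for, so treat it as a check on your own working):

```python
# P(Ai) from the Venn diagram, and the given renewal probabilities P(B | Ai)
P_A = {"A1": 0.4, "A2": 0.2, "A3": 0.2, "A4": 0.2}
P_B_given = {"A1": 0.6, "A2": 0.7, "A3": 0.8, "A4": 0.0}

# P(B ∩ (A1 ∪ A2 ∪ A3)) = sum of P(Ai) P(B | Ai) over i = 1, 2, 3
numerator = sum(P_A[i] * P_B_given[i] for i in ("A1", "A2", "A3"))
# A1, A2, A3 are mutually exclusive, so their probabilities add.
denominator = P_A["A1"] + P_A["A2"] + P_A["A3"]
print(round(numerator / denominator, 3))   # 0.54 / 0.8 = 0.675
```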
Example 28. A device has two components, C1 and C2.
It will operate if at least one of the two components is operating.
The probability that one component will fail when both are working in a one-year period is 0.01.
However, when one fails, the probability of the other failing in that one-year period is 0.03, due to added strain.
The two components cannot fail at the same time.
What is the probability that the device fails during a one-year period?
P(device fails) = ___

§5. Independent events

Events A and B are said to be independent if the occurrence of one of them does not change the probability of the occurrence of the other.
Namely, A and B are independent if P(A | B) = P(A) or P(B | A) = P(B).
Since P(A | B) = P(A ∩ B)/P(B) and P(B | A) = P(A ∩ B)/P(A), it follows that independence of A and B implies P(A ∩ B) = P(A) P(B).

Definition 1. Events A and B are independent if and only if P(A ∩ B) = P(A) P(B). Otherwise A and B are said to be dependent events.
Example 29. Flip a fair coin twice and observe the sequence of heads and tails.
The sample space is S = {HH, HT, TH, TT}.
Let A = {heads on the 1st flip} = {HH, HT}, B = {tails on the 2nd flip} = {HT, TT}, C = {tails on both flips} = {TT}.
It is easy to see that P(A) = P(B) = 2/4 = 1/2, P(C) = 1/4,
P(A ∩ B) = P({HT}) = 1/4, P(B ∩ C) = P(C) = 1/4.
So A and B are independent, but B and C are not.
Independence of A and B here is apparent because A and B are associated with different trials.

Example 30. A red die and a white die are rolled. Let event A = {4 on the red die} and event B = {sum of dice is odd}. Thus
P(A) = ___, P(B) = ___, P(A ∩ B) = ___ = P(A) P(B).
Therefore, A and B are independent events. Independence of A and B here is not so obvious.

Example 31. A red die and a white die are rolled. Let event C = {5 on the red die} and event D = {sum of dice is 11}. Then
P(C) = 6/36, P(D) = 2/36, so P(C) P(D) = 1/108, while P(C ∩ D) = 1/36.
Hence C and D are dependent events.

Remark: Events A and B are always independent if P(A) = 0 or P(B) = 0. Why?

Theorem 1.5-1. If A and B are independent, then the following are true:
a) A and B' are independent.
b) A' and B are independent.
c) A' and B' are independent.
Proof of (c):
P(A' ∩ B') = P((A ∪ B)') = 1 − P(A ∪ B)
           = 1 − [P(A) + P(B) − P(A ∩ B)] = 1 − P(A) − P(B) + P(A) P(B)
           = (1 − P(A))(1 − P(B)) = P(A') P(B').
So A' and B' are independent.
Example 32. An urn contains 4 balls numbered 1, 2, 3 and 4. One ball is to be drawn at random from the urn. Let the events A, B and C be defined by A = {1,2}, B = {1,3}, C = {1,4}. Then P(A) = P(B) = P(C) = 1/2. Furthermore,
P(A ∩ B) = 1/4 = P(A) P(B), P(A ∩ C) = 1/4 = P(A) P(C), P(B ∩ C) = 1/4 = P(B) P(C),
which implies that A, B and C are independent in pairs (called pairwise independence).
However, P(A ∩ B ∩ C) = 1/4 ≠ 1/8 = P(A) P(B) P(C).
That is, something seems lacking for the complete independence of A, B and C.
This prompts the introduction of the following definition:

Definition 2. Events A, B, and C are mutually independent if and only if the following conditions hold:
P(A ∩ B) = P(A) P(B), P(A ∩ C) = P(A) P(C), and P(B ∩ C) = P(B) P(C);
also P(A ∩ B ∩ C) = P(A) P(B) P(C).

Definition 2 can be extended to mutual independence of four or more events.
Definition 3. A1, A2, …, Ak are said to be mutually independent if for every j = 2, 3, …, k and every subset of distinct indices i1, i2, …, ij,
  P(Ai1 ∩ Ai2 ∩ … ∩ Aij) = P(Ai1) P(Ai2) … P(Aij).
Remarks:
Mutual independence implies pairwise independence; but the converse is not true.
Mutual independence of A1, A2, …, Ak implies the independence of an event induced from a subset of A1, A2, …, Ak and another event induced from the complementary subset of A1, A2, …, Ak. For example, if A1, A2, A3, A4 are mutually independent,
  o they are pairwise independent, i.e. all pairs among them are independent events;
  o A1 and A2 ∩ A4 are independent;
  o A1' and A3 ∪ A4' are independent, etc.
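Example 32's point, pairwise but not mutually independent, can be verified by enumeration over the 4 equally likely balls:

```python
from fractions import Fraction

S = {1, 2, 3, 4}                      # 4 equally likely balls
A, B, C = {1, 2}, {1, 3}, {1, 4}

def P(E):
    return Fraction(len(E & S), len(S))

# Pairwise independence holds:
assert P(A & B) == P(A) * P(B) == Fraction(1, 4)
assert P(A & C) == P(A) * P(C) == Fraction(1, 4)
assert P(B & C) == P(B) * P(C) == Fraction(1, 4)
# ...but mutual independence fails: 1/4 on the left, 1/8 on the right.
assert P(A & B & C) == Fraction(1, 4) != P(A) * P(B) * P(C)
print("pairwise independent, not mutually independent")
```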
Example 33. A rocket has a built-in redundant system. In this system, if component K1 fails, it is bypassed and component K2 is used. If component K2 fails, it is bypassed and component K3 is used. Suppose that the probability of failure of any one of these components is 0.15, and assume that the failures of these components are mutually independent events. Let Ai denote the event that component Ki fails, for i = 1, 2, 3.
Because the system fails only if all components fail, the probability that the system does not fail is given by
P[(A1 ∩ A2 ∩ A3)'] = ___

Example 34. The probability that a company's work force has at least one accident in a month is (0.01)k, where k is the number of days in that month. Assume the numbers of accidents are independent from month to month. If the company's year starts with January, the probability that the first accident is in April is
P(none in Jan to Mar, at least one in Apr) = ___

Example 35. Three inspectors look at a critical component of a product. Their probabilities of detecting a defect are different, i.e. 0.99, 0.98, and 0.96, respectively. Assuming independence,
The probability that at least one detects the defect is ___
The probability that only one finds the defect is ___

Example 36. Suppose that on five consecutive days an "instant winner" lottery ticket is bought and the probability of winning is 1/5 on each day. Assuming independent trials,
P(WWLLL) = ___, P(LWLWL) = ___
In general, the probability of buying two winning tickets and three losing tickets is ___
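Under the stated independence, Example 35 is a direct product computation. A sketch with the given detection probabilities 0.99, 0.98, 0.96 (it evaluates the two quantities asked for, so use it to check your own working):

```python
p = [0.99, 0.98, 0.96]           # detection probabilities of the three inspectors
q = [1 - x for x in p]           # miss probabilities: 0.01, 0.02, 0.04

# P(at least one detects) = 1 - P(all three miss)
at_least_one = 1 - q[0] * q[1] * q[2]

# P(exactly one detects): sum over which single inspector is the one who detects
only_one = p[0]*q[1]*q[2] + q[0]*p[1]*q[2] + q[0]*q[1]*p[2]

print(round(at_least_one, 6), round(only_one, 6))   # 0.999992 0.001376
```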
§6. Law of total probability, Bayes's theorem, prior and posterior probabilities

Example 37. A common lot of microchips comes from 3 manufacturers. The numbers of defective and non-defective microchips from this lot are summarized in the following table:

                   Manufacturer 1   Manufacturer 2   Manufacturer 3   Totals
  Defective              5               10                5
  Non-defective         20               25               35
  Totals

Now select a microchip at random from the lot.
Define the events A = {the selected is defective} and Bi = {made by manufacturer i}, i = 1, 2, 3.
It is easy to see that P(A) = ___.
Also S = B1 ∪ B2 ∪ B3, and B1, B2, B3 are mutually exclusive. So
P(A) = ___

From Example 37, we will be able to see the following results:

Law of total probability: If B1, B2, …, Bk are mutually exclusive and exhaustive events, namely, Bi ∩ Bj = ∅ for 1 ≤ i < j ≤ k and S = B1 ∪ B2 ∪ … ∪ Bk, then we have
a) A = (A ∩ B1) ∪ (A ∩ B2) ∪ … ∪ (A ∩ Bk) = ∪_{i=1..k} (A ∩ Bi) for any event A;
b) P(A) = P(A ∩ B1) + P(A ∩ B2) + … + P(A ∩ Bk) = Σ_{i=1..k} P(A ∩ Bi).

Remark: The law of total probability sometimes helps substantially in computing the probability of a complex event A, by first finding a partition of S and then finding the probability of each partitioned subset of A.
Example 38. A boy has 5 blue and 4 white marbles in his left pocket and 4 blue and 5 white marbles in his right pocket. If he transfers one marble at random from his left to his right pocket, what is the probability of his then drawing a blue marble from his right pocket?
First let BL1, BR2 and WL1 denote getting a blue marble from the left pocket in the 1st draw, getting a blue marble from the right pocket in the 2nd draw, and getting a white marble from the left pocket in the 1st draw, respectively. Then
P(BR2) = ___   (applying the law of total probability)
       = ___   (applying the multiplication rule)
       = ___
is the pursued probability.

Bayes's theorem, prior probability, posterior probability.
Applying the definition of conditional probability, the multiplication rule and the law of total probability, it follows that
  P(Bj | A) = P(Bj ∩ A)/P(A) = P(Bj) P(A | Bj) / Σ_{i=1..k} P(A ∩ Bi) = P(Bj) P(A | Bj) / Σ_{i=1..k} P(Bi) P(A | Bi).
This result is summarized by
Bayes's Theorem: Let B1, B2, …, Bk be mutually exclusive and exhaustive events, namely, Bi ∩ Bj = ∅ for 1 ≤ i < j ≤ k and S = B1 ∪ B2 ∪ … ∪ Bk. (Such B1, B2, …, Bk may also be said to constitute a partition of the sample space.)
Then for each j = 1, 2, …, k, we have
  P(Bj | A) = P(Bj) P(A | Bj) / Σ_{i=1..k} P(Bi) P(A | Bi).
In particular, if k = 2,
  P(Bj | A) = P(Bj) P(A | Bj) / [P(B1) P(A | B1) + P(B2) P(A | B2)].
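Example 38 combines the multiplication rule with the law of total probability, conditioning on the colour of the transferred marble. A sketch with exact fractions (it works out the blanks above, so compare with your own derivation):

```python
from fractions import Fraction

# Left pocket: 5 blue, 4 white. Right pocket: 4 blue, 5 white.
P_BL1 = Fraction(5, 9)    # a blue marble is transferred from the left pocket
P_WL1 = Fraction(4, 9)    # a white marble is transferred from the left pocket

# After the transfer the right pocket holds 10 marbles.
P_BR2_given_BL1 = Fraction(5, 10)   # 4 blue plus the transferred blue
P_BR2_given_WL1 = Fraction(4, 10)   # still only 4 blue

# Law of total probability: P(BR2) = P(BL1) P(BR2|BL1) + P(WL1) P(BR2|WL1)
P_BR2 = P_BL1 * P_BR2_given_BL1 + P_WL1 * P_BR2_given_WL1
print(P_BR2, float(P_BR2))   # 41/90, about 0.456
```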
Remarks:
1. Bayes's theorem tells us that the conditional probability P(Bj | A) may be computed from the partitioning probabilities P(Bj) and the conditional probabilities P(A | Bj) with respect to each partition.
2. There is a particular branch of statistics, called Bayesian statistics, that is based on the use of the theorem. There (and elsewhere) P(Bj) is called the prior probability and P(Bj | A) is called the posterior probability.
3. When applying Bayes's theorem for calculating probabilities, it is often very helpful to draw a diagram for the problem under consideration.

Example 39. In a factory, machines I, II, and III are all producing springs of the same length. Of their production, machines I, II, and III produce 2%, 1% and 3% defective springs, respectively. Of the total production of springs in the factory, machine I produces 35%, machine II produces 25%, and machine III produces 40%. One spring is selected at random from the total springs produced in a day.
a) What is the probability that the selected spring is defective?
P(D) = ___
b) If the selected spring is defective, what is the conditional probability that it was produced by machine III?
P(III | D) = ___
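Example 39 is a textbook application of the law of total probability (part a) and Bayes's theorem (part b). A sketch with the given rates (again, this evaluates the answers, so use it as a check):

```python
priors = {"I": 0.35, "II": 0.25, "III": 0.40}        # share of production = P(machine)
defect_rate = {"I": 0.02, "II": 0.01, "III": 0.03}   # P(D | machine)

# a) Law of total probability: P(D) = sum of P(machine) P(D | machine)
P_D = sum(priors[m] * defect_rate[m] for m in priors)
print(round(P_D, 4))   # 0.0215

# b) Bayes's theorem: posterior P(III | D) = P(III) P(D | III) / P(D)
P_III_given_D = priors["III"] * defect_rate["III"] / P_D
print(round(P_III_given_D, 4))   # 0.5581
```

Note that machine III ends up the most likely source of a defective spring: its high prior (40%) and high defect rate (3%) reinforce each other.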
Example 40. A Pap smear is a screening procedure used to detect cervical cancer.
For women with this cancer, there are about 16% false negatives; that is,
  P(T− = test negative | C = cancer) = 0.16, P(T+ = test positive | C = cancer) = 0.84.
For women without cancer, there are about 19% false positives; that is,
  P(T+ | C' = no cancer) = 0.19, P(T− | C' = no cancer) = 0.81.
In a certain country, about 8 women in every 100,000 have this cancer; that is,
  P(C) = 0.00008 and P(C') = 0.99992.
By Bayes's theorem,
  P(C | T+) = P(C ∩ T+)/P(T+) = (0.00008 × 0.84) / (0.00008 × 0.84 + 0.99992 × 0.19) ≈ 0.000354.
This result means that, for every million positive Pap smears, on average only about 354 represent true cases of cervical cancer. The reason for this is the low incidence of that cancer and the high error rates of the procedure, i.e. 0.16 and 0.19.

Example 41 (Example 3 revisited).
One of John's medical tests came back positive, implying he may have a disease D.
The test is only 95% accurate. Namely,
  The test is positive 95% of the time on patients with disease D.
  The test is negative 95% of the time on patients without disease D.
In addition, the occurrence rate of disease D in the population is 1 in 1000.
Question: What is the probability that John does have the disease D given his positive test result?
By Bayes's theorem,
  P(D | T+) = P(D ∩ T+)/P(T+) = (0.001 × 0.95) / (0.001 × 0.95 + 0.999 × 0.05) = 95/5090 ≈ 0.01866405.
