
MGMT 402 Data & Decisions

Fall 2020
Lecture 1 Notes

1 Learning objectives

You should be able to:

1. Understand the concepts of an experiment, events and sample space;

2. Apply the addition, total probability and multiplication rules;

3. Understand the meaning of independence;

4. Solve problems using both the probability table method and the probability
tree method.

2 Events

Let’s begin by defining some basic terms.


An experiment is an activity that produces a random outcome.
The sample space S is the set of all possible outcomes.
An event E is a set of outcomes from S.
Here is a simple example. Suppose you have a 6-sided die. Let’s define the experi-
ment as rolling the die once and recording the number that comes up. The sample
space S is
S = {1, 2, 3, 4, 5, 6}.
There are many different events we can define here:

• An event could be just a single number, e.g., E = {6} (we roll a six) or E = {2}
(we roll a two).

• An event could also be more complicated – for example, E could be that we


roll an even number. This is the same as defining E = {2, 4, 6}.

Note that in the example above, E = {2, 4, 6} means that we roll “one of 2, 4, or 6”. It
does not mean that we roll a 2, followed by a 4, followed by a 6.
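The die example can be written out as a short Python sketch (not part of the lecture; the variable names are illustrative), using sets for the sample space and events:

```python
# Sample space and events for one roll of a 6-sided die, as Python sets.
S = {1, 2, 3, 4, 5, 6}      # sample space: all possible rolls
E_six = {6}                 # event: we roll a six
E_even = {2, 4, 6}          # event: we roll an even number

# Every event is a subset of the sample space:
print(E_even <= S)          # True
```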
We have some classifications for events. A collection of events is mutually exclusive
if at most one of the events can occur. A collection of events is collectively exhaustive
if at least one of the events must occur.
In the 6-sided die example:

• {1} and {2, 4, 5} are mutually exclusive – {1} may happen, or {2, 4, 5} may happen,
but not both. Note here that it is possible that neither {1} nor {2, 4, 5} occurs
(if we roll a 3, for example) – for this reason, these events are not collectively
exhaustive.
• {1, 2, 3}, {2, 5, 6}, {2, 3, 4, 6} are collectively exhaustive; at least one of the events
must occur, because together they cover all possible outcomes. They are not
mutually exclusive because {1, 2, 3} and {2, 5, 6} could both occur if we roll a 2.
• {1, 2}, {3, 4, 5} and {6} are both mutually exclusive and collectively exhaustive;
exactly one of them must occur.
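The two definitions translate directly into checks on sets. Here is a sketch (the helper function names are my own, not from the notes):

```python
# Checking mutual exclusivity and collective exhaustivity for the die example.
S = {1, 2, 3, 4, 5, 6}

def mutually_exclusive(events):
    # no outcome may belong to two different events
    seen = set()
    for e in events:
        if seen & e:
            return False
        seen |= e
    return True

def collectively_exhaustive(events, S):
    # together the events must cover every outcome in S
    return set().union(*events) == S

print(mutually_exclusive([{1}, {2, 4, 5}]))                   # True
print(collectively_exhaustive([{1}, {2, 4, 5}], S))           # False (3 and 6 uncovered)
print(mutually_exclusive([{1, 2}, {3, 4, 5}, {6}]))           # True
print(collectively_exhaustive([{1, 2}, {3, 4, 5}, {6}], S))   # True
```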

Here is a more complicated example.


Example – Japan-Tokyo. Let’s define the following two events:

• A = you have been to Japan.


• B = you have never been to Tokyo.

Question: Are A and B mutually exclusive? Are they collectively exhaustive?


Answer: To check if they are mutually exclusive, we have to check if at most one of the events
can occur. We can check this by asking whether both A and B can happen; if yes, then they
are not mutually exclusive. Could both A and B happen? Yes, because it is possible to have
been to Japan but to have never been to Tokyo. For example, if you had only ever been to
Osaka, you would have been to Japan, but you would have never been to Tokyo. Therefore, A
and B are not mutually exclusive.
To check if they are collectively exhaustive, we have to check if at least one of the events must
occur. Another way we can check this is by asking if it is possible that neither event occurs
– if yes, then they are not collectively exhaustive. Could neither A nor B happen? No; it is
not possible for someone to never have been to Japan and to have been to Tokyo.
A different way of seeing that they are collectively exhaustive is to see that people who have
never been to Japan could never have been to Tokyo: either a person has been to Japan, in

which case they fall in event A, or they have never been to Japan, in which case they must
never have been to Tokyo, and they fall under event B.
(In Lecture 1, we did a slightly different version of the above, where A = you have
never been to Japan, B = you have been to Tokyo. The same logic can be used to
show that A and B are mutually exclusive, but are not collectively exhaustive.)

3 Venn diagrams

Some of the ideas in the previous section are better understood visually, with the aid
of a Venn diagram.
A Venn diagram is used to represent sets of outcomes. Usually we use a rectangle
to represent the sample space:

[Figure: a rectangle labeled “Sample space”]
An event A is a set of outcomes in that sample space:

[Figure: the sample-space rectangle with a circle labeled A inside it]

We might have more than one event – for example, there might be another event B:

[Figure: the sample-space rectangle containing two overlapping circles A and B; their overlap is labeled “A and B”]

A and B are the two circles; the event A ∩ B is their intersection, which is the lens-
shaped area in the middle. A ∪ B is the union of the two shapes; it is the set of all
points that those two shapes cover.

How do we think about probabilities? Think of P(A) as being the area of A relative
to the sample space.
Mutually exclusive means that A and B have no overlap:

[Figure: two disjoint circles A and B inside the sample-space rectangle]

Collectively exhaustive means that A and B together cover the whole sample space:

[Figure: regions A (red) and B (blue) that together cover the whole sample space; their intersection is the purple region]

Mutually exclusive and collectively exhaustive means that A and B do not overlap,
and together cover the entire sample space:

[Figure: regions A and B that do not overlap and together cover the entire sample space]

4 Probability definitions

We can now start to talk about probabilities. Let A be an event. P(A) is then the
probability of event A. By definition, the probability of an event is always between
0 and 1:
0 ≤ P(A) ≤ 1.

We will define B | A as a conditional event: it means that event B occurs given that
event A has occurred.
The conditional probability P(B | A) is the probability of B occurring given that A has
occurred.
To distinguish them from conditional events, we sometimes refer to the events A or B
on their own as unconditional events, and the corresponding P(A) or P(B) as uncon-
ditional probabilities.
For example, with dice:

• P(2) is the probability that we roll a 2. The event 2 is an unconditional event,


and the probability P(2) is an unconditional probability.

• P(2 | Even) is the probability that we roll a 2, given that the number that shows
up is even. The event 2 | Even is a conditional event, and the probability is a
conditional probability.

Given two events A and B, we denote the probability of A or B or both occurring


as P(A or B). We will sometimes write this in a more formal mathematical way as
P(A ∪ B) (“the probability of A union B”; A ∪ B is the set of outcomes where at least
one of A or B happens).
We denote the probability of A and B occurring as P(A and B). We refer to “A and B”
as a joint event – both must occur. In math, we will sometimes write this as P(A ∩ B)
(“the probability of A intersect B”; A ∩ B is the set of outcomes where A and B both
happen).
The sample space S is the set of all outcomes. The probability of S is always 1:

P(S) = 1.

For an event A, we denote the complement of A as the set of outcomes for which A
does not occur. We will denote the complement of A by “not A” or by AC (C stands
for complement). The probability of the complement of A is 1 minus the probability
of A:
P(not A) = 1 − P(A).

5 Rules for probabilities

The addition rule tells us, for events A and B, how to compute P(A or B):

P(A or B) = P(A) + P(B) − P(A and B).

In words, we add the probability of A and the probability of B, and then remove the
probability of A and B so as to not double count.

The addition rule can be visualized with the aid of a Venn diagram:

[Figure: Venn diagrams showing area(A or B) = area(A) + area(B) − area(A and B)]

(We add the area of A and the area of B, and then we subtract A ∩ B, so as to not
double count it.)
If A and B are mutually exclusive, then P(A and B) = 0, and the addition rule simpli-
fies:
P(A or B) = P(A) + P(B).
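The addition rule can be checked by brute-force enumeration on the die. A sketch (the events A and B here are illustrative, not from the notes; `Fraction` keeps the arithmetic exact):

```python
# Verifying the addition rule on a uniform 6-sided die.
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
P = lambda E: Fraction(len(E), len(S))   # probability = relative size of the event

A = {1, 2, 3}   # roll at most 3
B = {3, 4}      # roll a 3 or a 4

lhs = P(A | B)                   # P(A or B), via the union of the sets
rhs = P(A) + P(B) - P(A & B)     # addition rule
print(lhs == rhs)                # True; both equal 4/6 = 2/3
```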

The total probability rule tells us how to divide up a probability P(A) using the joint
probabilities P(A and B) and P(A and BC ):

P(A) = P(A and B) + P(A and BC ).

This rule holds more generally. If we have A, and any collection of mutually exclusive
and collectively exhaustive events B1 , . . . , Bk , then

P(A) = P(A and B1 ) + P(A and B2 ) + · · · + P(A and Bk ).
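The general form of the total probability rule can also be verified by enumeration. A sketch, with a three-way partition of the die outcomes chosen for illustration:

```python
# Checking the total probability rule with B1, B2, B3 mutually exclusive
# and collectively exhaustive.
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
P = lambda E: Fraction(len(E), len(S))

A = {2, 3, 4}
B1, B2, B3 = {1, 2}, {3, 4}, {5, 6}   # a partition of the sample space

total = P(A & B1) + P(A & B2) + P(A & B3)
print(total == P(A))    # True: the joint probabilities add up to P(A)
```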

The multiplication rule links conditional probabilities and joint probabilities. For
any two events A and B, we have

P(A and B) = P(A) × P(B | A)

or equivalently

P(B | A) = P(A and B) / P(A).
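For example, the conditional probability P(2 | Even) from the dice discussion above can be computed directly from this formula. A short sketch with exact arithmetic:

```python
# The multiplication rule, rearranged: P(2 | Even) = P(2 and Even) / P(Even).
from fractions import Fraction

S = {1, 2, 3, 4, 5, 6}
P = lambda E: Fraction(len(E), len(S))

Even = {2, 4, 6}
Two = {2}

p_two_given_even = P(Two & Even) / P(Even)
print(p_two_given_even)   # 1/3
```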

Example – Movies. Let:

• W be the event that a movie is given a wide release (1000+ theaters); and

• B be the event that a movie is a blockbuster ($500 million+ in revenue).

Suppose that:

P(W) = 0.48,
P(B) = 0.18,
P(W and B) = 0.16.

Question: What is P(W or B)?


Answer: Use the addition rule:

P(W or B) = P(W) + P(B) − P(W and B)


= 0.48 + 0.18 − 0.16
= 0.5.

Question: What is P(W C and B) (a movie is a blockbuster and not a wide release)?
Answer: Notice from the total probability rule that

P(B) = P(B and W) + P(B and W C )

which we can re-arrange:

P(B and W C ) = P(B) − P(B and W)


= 0.18 − 0.16
= 0.02.

Question: What is P(B | W) (a movie is a blockbuster, given that it is a wide release)?


Answer: Apply the multiplication rule:
P(B | W) = P(B and W) / P(W) = 0.16 / 0.48 = 1/3.
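The three answers above can be reproduced in a few lines. A sketch with exact fractions (the input probabilities are from the notes):

```python
# The movie example: addition, total probability, and multiplication rules.
from fractions import Fraction as F

p_W, p_B, p_WB = F(48, 100), F(18, 100), F(16, 100)

p_W_or_B = p_W + p_B - p_WB     # addition rule
p_B_not_W = p_B - p_WB          # total probability rule, rearranged
p_B_given_W = p_WB / p_W        # multiplication rule, rearranged

print(p_W_or_B)      # 1/2
print(p_B_not_W)     # 1/50
print(p_B_given_W)   # 1/3
```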

In the above computations, we proceeded by applying the formulas. Sometimes it


is easier to make a table:

        B       BC
W       0.16            0.48
WC
        0.18            1

In the middle 2×2 portion of the table, the cells correspond to the probabilities of the
four possible joint events: W ∩ B, W C ∩ B, W ∩ BC , W C ∩ BC . The row underneath
contains the column sums of the 2 × 2 portion, while the column to the right of the
2 × 2 portion contains the row sums. The row sums correspond to the probabilities
of W and W C , while the column sums correspond to the probabilities of B and BC .
The bottom-right-most cell is 1; it is both the sum of the two cells in the same row to
its left (P(B) and P(BC )) and the sum of the two cells in the same column above it (P(W) and P(W C )).
To get all of our probabilities, we simply need to fill the table in so that the row and
column sums are respected. Doing so, we get:

        B       BC
W       0.16    0.32    0.48
WC      0.02    0.50    0.52
        0.18    0.82    1

We can use this table to easily read off probabilities of individual events or joint
events – for example:

• P(W C ∩ BC ) = 0.50 (we look up row W C and column BC ); or

• P(W C ) = 0.52 (the row sum for row W C ).

The table doesn’t give us the conditional probabilities; to get those, we still need to
apply the multiplication rule.
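The table-filling procedure itself can be sketched in code: given P(W), P(B) and P(W and B), the remaining cells follow from the row/column sums (the dictionary keys are my own labels):

```python
# Filling in the probability table for the movie example.
from fractions import Fraction as F

p_W, p_B, p_WB = F(48, 100), F(18, 100), F(16, 100)

table = {
    ("W", "B"):  p_WB,
    ("W", "Bc"): p_W - p_WB,    # the W row must sum to P(W)
    ("Wc", "B"): p_B - p_WB,    # the B column must sum to P(B)
}
# the last cell follows because all four joint probabilities sum to 1
table[("Wc", "Bc")] = 1 - sum(table.values())

for cell, p in table.items():
    print(cell, p)
print(sum(table.values()) == 1)   # True: the cells cover the sample space
```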

6 Birthday paradox

In class, we did an interesting experiment: are there at least two people in the room
who have the same birthday? Our initial reaction might be no – “there are so many
days in the year, how could two people have the same birthday?”
Let’s do some calculations. Let’s assume that everybody is born in a regular year,
with 365 days. Let’s assume there are 78 people in the class. We are interested in
calculating:

P(At least two people out of 78 have the same birthday).

One way that we can do this is by thinking of the complement. Our event is:

A = At least two people out of 78 with the same birthday

and its complement is:

AC = No two people have the same birthday.

The event AC is the same as all 78 people having distinct birthdays. So we can use
the complement rule: we know that P(A) = 1 − P(AC ). But what is P(AC )?
To figure it out, let’s do some counting. Let’s suppose there are 3 people in the room,
to start with.

• How many different outcomes are there? Well, each person can be born on one
of the 365 days, and there are 3 people. So there are

365 × 365 × 365

different outcomes.

• In how many of those outcomes are the birthdays different? The answer is

365 × 364 × 363

which is obtained as follows: the first person can be born on any of the 365
days. The second person can be born on any of the days, except the one that
the first person is born on – there are 364 such days. The third person can
be born on any of the days, except those taken by the first two people – there
are 363 such days.

• The probability of AC is then just the number of outcomes of interest, divided
by the total number of outcomes:

P(AC ) = (365 × 364 × 363) / (365 × 365 × 365) = 0.991796.
P(A) is then just

P(A) = 1 − P(AC ) = 1 − 0.991796 = 0.0082.

So for 3 people, that probability – with all of our assumptions – is pretty low; only
0.0082. But what happens when there are more people in the class? Intuitively, it
starts to get more likely that at least two people will have a common birthday. In
general, for n people in the class, the probability of AC is

P(AC ) = (365 × 364 × · · · × (365 − n + 1)) / (365 × 365 × · · · × 365)

and P(A) is

P(A) = 1 − (365 × 364 × · · · × (365 − n + 1)) / (365 × 365 × · · · × 365).
The table below shows how P(A) changes as n changes. With 65 people (the size of
Section 2), this probability is over 0.997; with 71 people (the size of Section 1), this
probability is over 0.9993; with 78 people (the size of Section 3), this probability is
over 0.9998!

n P(A)
3 0.0082
10 0.1169
20 0.4114
40 0.8912
50 0.9704
60 0.9941
65 0.9977
70 0.9992
71 0.9993
78 0.99986
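The table above can be reproduced directly from the formula. A sketch (the function name is my own):

```python
# The birthday paradox: P(at least two of n people share a birthday),
# assuming 365 equally likely birthdays.
def p_shared_birthday(n, days=365):
    # P(A) = 1 - (365 * 364 * ... * (365 - n + 1)) / 365**n
    p_all_distinct = 1.0
    for k in range(n):
        p_all_distinct *= (days - k) / days
    return 1 - p_all_distinct

for n in (3, 10, 20, 40, 50, 60, 65, 70, 71, 78):
    print(n, round(p_shared_birthday(n), 4))
```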

7 Independence

We say that events A and B are independent if knowing the occurrence of one event
does not change the probability of the other event. Mathematically, there are three
equivalent ways to say that A and B are independent:

1. P(A and B) = P(A) × P(B).


(The joint probability factorizes.)

2. P(A | B) = P(A).
(The conditional probability of A given B is the same as the unconditional
probability of A.)

3. P(B | A) = P(B).
(The conditional probability of B given A is the same as the unconditional
probability of B.)

If one of these conditions is true, the other two are true. If any one is false, the other
two must be false.
If any of the conditions is false, we say that A and B are not independent. (It is not
common to say “dependent.”)
If we know that events A and B correspond to two physical processes that do not
interact with each other, then it is safe to assume that A and B are independent. Here
are some specific examples:

• Suppose you roll a 6-sided die and flip a coin. Let A = you roll an even number,
and let B = the coin lands heads. We know that

P(A) = 3/6,

P(B) = 1/2.
These events are independent. The number you roll on the die should not
be associated with or influence the side the coin lands on, and vice versa.
Therefore,
P(A and B) = 3/6 × 1/2 = 1/4,
and similarly,

P(A | B) = 3/6,
P(B | A) = 1/2.

• Suppose you roll a 6-sided die twice. Let A = the first die roll is a 6, and B =
the second die roll is a 6. We know
P(A) = 1/6,
P(B) = 1/6.
These events are independent; whether you roll a 6 the first time shouldn’t
change the probability that you roll a 6 the next time. Therefore:
P(A | B) = 1/6,
P(B | A) = 1/6,
P(A and B) = 1/6 × 1/6 = 1/36.
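The two-roll example can be verified by enumerating all 36 outcomes. A sketch (the events are encoded as predicate functions, a choice of my own):

```python
# Checking independence of "first roll is a 6" and "second roll is a 6".
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # all 36 (roll1, roll2) pairs
P = lambda event: Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] == 6   # first roll is a 6
B = lambda o: o[1] == 6   # second roll is a 6

print(P(A), P(B))                                  # 1/6 1/6
print(P(lambda o: A(o) and B(o)) == P(A) * P(B))   # True: independent
```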

Here are some examples of non-independent events:

• Suppose you roll a 6-sided die once. Let A = the die roll is ≥ 3, B = the die roll
is ≤ 4. These events are not independent – we have:
P(A) = 4/6,
P(B) = 4/6,

but when we compute the joint probability

P(A and B) = P(Die roll is in {3, 4}) = 2/6 ≠ 4/6 × 4/6 = 4/9.
The joint probability is 2/6 (= 1/3), which is not the same as what it should be if
the events were independent (= 4/9). If we check the conditional probabilities,
we get:
P(A | B) = (2/6) / (4/6) = 1/2,
P(B | A) = (2/6) / (4/6) = 1/2,
which are not the same as the unconditional probabilities P(A) and P(B). In
fact, the conditional probability is lower; this makes sense, because if we know
that A occurs, then we know that the roll was not 1 or 2, which makes it less
likely that B occurs (the die roll is less than or equal to 4).
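The same enumeration check confirms that these single-roll events are not independent:

```python
# A = roll >= 3, B = roll <= 4 on one roll of a 6-sided die.
from fractions import Fraction

S = range(1, 7)
P = lambda event: Fraction(sum(1 for o in S if event(o)), 6)

A = lambda o: o >= 3
B = lambda o: o <= 4

joint = P(lambda o: A(o) and B(o))
print(joint)                  # 1/3 (= 2/6)
print(P(A) * P(B))            # 4/9
print(joint == P(A) * P(B))   # False: not independent
```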

Independence is not the same as mutual exclusivity. In fact, if A and B are mutually
exclusive, then they are not independent. To see this, suppose that A has occurred.
Since at most one of A and B can occur, B cannot occur. The probability of B occurring
given A is therefore zero (P(B | A) = 0), even though P(B) might be different from
0. (For a concrete example of this, try the 6-sided die example above with A = the
die roll is ≤ 3, B = the die roll is ≥ 4. A and B are mutually exclusive and not
independent.)
Let’s discuss some real-world examples where we might think about independence:

• Facebook runs an ad for a Dyson air purifier to 30 different users. The users
differ in geographic locations and demographic attributes. If we let Ai be the
event that user i decides to purchase the air purifier, then it could be appropriate
to model A1 , . . . , A30 as independent.

• Now suppose that the 30 users are all in the Los Angeles area. In this situation,
independence could fail to hold in lots of ways. For example, if some of the
users are friends with each other, then the purchasing decision of one user
could affect those of the others (e.g., if user 5 buys the air purifier, then users
7, 8 and 10 who are all user 5’s friends could become more likely to buy it).
Similarly, there could also be a common event (for example, a sharp increase
in wildfires / increase in the air quality index) that links the users’ purchasing
decisions together. (If user 5 buys the air purifier, then it is more likely that the
air quality is poor, which in turn increases the probability of user 7 buying the
air purifier.)

• A pharma company invests in 10 different projects. Each project is a drug


development project with the goal of developing a drug for a disease. The drug
projects are completely unrelated and intended to treat completely different
diseases. If we let Ai be the event that project i is successful, then it might be
appropriate to model A1 , . . . , A10 as being independent.

• Now suppose that the drug projects are related. For example, suppose project
3 is a pancreatic cancer drug project and project 4 is a colon cancer drug project.
These projects could be related in that the location (gastrointestinal system) and
the type of disease (cancer) are similar; they could also be related in that the
development teams overlap, the lab resources used by the projects are similar,
or the underlying technology is similar. For any of these reasons, we might
believe that A3 and A4 are not independent; if the pancreatic cancer drug is
successful, then the probability of the colon cancer drug being successful could
change (become higher), and vice versa.

Does independence have something to do with causality? Yes and no. If A and B
are independent, then they are not causally related.
But if A and B are not independent, then it is not necessarily the case that either A
causes B or B causes A. For example, suppose that A = I eat ice cream today, and B =
I carry an umbrella today. Then P(B | A) and P(B) will not be the same, and P(A | B)
and P(A) will not be the same, which means that these events are not independent.
But this does not mean that A causes B, or that B causes A. It might be that there
is some other factor that is driving both events, such as the weather – if it is sunny
today, I am likely to eat ice cream and unlikely to carry an umbrella, and if it is
raining today, I am unlikely to eat ice cream but likely to carry an umbrella.

8 Some problems

Example – coke and pepsi. Let’s assume that every person is either a coke drinker
or a pepsi drinker. Let C1 be the event that a person is a coke drinker this week, and
C2 that he is a coke drinker next week; similarly, let P1 be the event that a person is
a pepsi drinker this week, and P2 that he is a pepsi drinker next week.
From market research, we know that 60% of coke drinkers in one week stay coke
drinkers in the next week, while 40% switch to pepsi; similarly, we know that 70% of
pepsi drinkers in one week stay loyal to pepsi, while the other 30% switch to coke.
We also know that this week, half of the population are pepsi drinkers and half are
coke drinkers.
Question. What is the probability that a random person is a coke drinker next week?
Answer. Let’s do this as a table:

        C2      P2
C1                      0.5
P1                      0.5

Notice that in the table above, the rows are labeled C1 and P1 , while the columns
are labeled C2 and P2 . We did this because C1 and P1 are mutually exclusive events,
and C2 and P2 are mutually exclusive events. Suppose we instead did the following
table:

        P1      P2
C1
C2

This table is not the right table to use because in this table, row sums and column
sums will not behave the way we want them to. For example, if we tried to add the
row for C1 , the sum would be P(C1 ∩ P1 ) + P(C1 ∩ P2 ). This quantity is not equal to
P(C1 ). This is the case because P1 and P2 are not mutually exclusive and collectively
exhaustive, so it is incorrect to apply the total probability rule here.
Back to our original table – notice that we only have P(C1 ) and P(P1 ) filled in. Let’s
first compute all of the joint probabilities:

P(C2 ∩ C1 ) = P(C2 | C1 ) × P(C1 ) = 0.6 × 0.5 = 0.3


P(P2 ∩ C1 ) = P(P2 | C1 ) × P(C1 ) = 0.4 × 0.5 = 0.2
P(C2 ∩ P1 ) = P(C2 | P1 ) × P(P1 ) = 0.3 × 0.5 = 0.15
P(P2 ∩ P1 ) = P(P2 | P1 ) × P(P1 ) = 0.7 × 0.5 = 0.35.

Let’s now fill in the table: the four joint probabilities are the entries we calculated
above, and the bottom row contains the column sums that we infer from them:

        C2      P2
C1      0.3     0.2     0.5
P1      0.15    0.35    0.5
        0.45    0.55

The question asked for the probability that a random person is a coke drinker next
week. This is P(C2 ), the column sum for the C2 column: 0.45.
Question. Are C2 and C1 independent?
Answer. From the problem statement, we have P(C2 | C1 ) = 0.6, and from the table,
we have P(C2 ) = 0.45. Since these probabilities are not the same, we can conclude
that C2 and C1 are not independent.
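Both answers follow in a few lines of code. A sketch with exact fractions (the inputs are from the problem statement):

```python
# The coke-and-pepsi example: P(C2) and the independence check.
from fractions import Fraction as F

p_C1, p_P1 = F(1, 2), F(1, 2)    # this week's shares
p_C2_given_C1 = F(6, 10)         # coke drinkers who stay with coke
p_C2_given_P1 = F(3, 10)         # pepsi drinkers who switch to coke

# Total probability + multiplication rules:
p_C2 = p_C2_given_C1 * p_C1 + p_C2_given_P1 * p_P1
print(p_C2)                       # 9/20, i.e. 0.45

# Independence check: compare P(C2 | C1) with P(C2)
print(p_C2_given_C1 == p_C2)      # False: not independent
```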

Example – coke and pepsi, with probability trees. Another way we can solve the
coke and pepsi problem is using something called a probability tree.
For this problem, the probability tree in the abstract looks like this:

C1: 0.5
    C2: ?   → ?
    P2: ?   → ?
P1: 0.5
    C2: ?   → ?
    P2: ?   → ?

This tree has two levels. The first level is whether a random person this week drinks
coke (C1 ) or drinks pepsi (P1 ). The second level is whether the person drinks coke
next week (C2 ) or pepsi next week (P2 ).
Under the two branches in the first level, the probabilities are just the unconditional
probabilities of coke/pepsi this week.
Under the branches in the second level, the probabilities are now the conditional
probabilities. The conditioning event is whatever branch we took at the previous
level.
The probabilities all the way on the right, at the ends of the branches, are the
joint probabilities. For each endpoint, this probability is obtained by multiplying the
probabilities along the branches. (This is just the multiplication rule, but represented
graphically via a tree).
We can fill out the tree with the information we have:

C1: 0.5
    C2: 0.6   → ?
    P2: 0.4   → ?
P1: 0.5
    C2: 0.3   → ?
    P2: 0.7   → ?

We have all of the unconditional and conditional probabilities, so we can multiply


them together to obtain the joint probabilities:

C1: 0.5
    C2: 0.6   → 0.3
    P2: 0.4   → 0.2
P1: 0.5
    C2: 0.3   → 0.15
    P2: 0.7   → 0.35

So in the above tree, the number 0.3 at the end of the top-most branch is P(C1 ∩ C2 )
(coke drinker this week and coke drinker next week). From here we can easily
answer the previous questions:

• What is P(C2 )? To get P(C2 ), we consider those branches with C2 :

C1: 0.5
    C2: 0.6   → 0.3   ←
    P2: 0.4   → 0.2
P1: 0.5
    C2: 0.3   → 0.15  ←
    P2: 0.7   → 0.35

We sum 0.3 (= P(C1 ∩ C2 )) and 0.15 (= P(P1 ∩ C2 )), obtaining 0.45.

• Are C1 and C2 independent? There is an easy way to see this from the tree. In
particular, notice that the conditional probability of C2 changes depending on
whether we took the C1 branch or the P1 (“not C1 ”) branch. If C1 and C2 were
independent, the conditional probability of C2 would be the same for both the
C1 and the P1 branches. But it is not, so they are not independent.
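A probability tree can also be represented directly as data: each endpoint's joint probability is the product of the branch probabilities along its path. A sketch (the path labels are my own):

```python
# The coke-and-pepsi tree, with each endpoint computed by multiplying
# along the branches (the multiplication rule).
from fractions import Fraction as F

tree = {
    ("C1", "C2"): F(1, 2) * F(6, 10),   # 0.5 x 0.6
    ("C1", "P2"): F(1, 2) * F(4, 10),   # 0.5 x 0.4
    ("P1", "C2"): F(1, 2) * F(3, 10),   # 0.5 x 0.3
    ("P1", "P2"): F(1, 2) * F(7, 10),   # 0.5 x 0.7
}

# P(C2): sum the endpoints whose second-week event is C2
p_C2 = sum(p for path, p in tree.items() if path[1] == "C2")
print(p_C2)    # 9/20, i.e. 0.45
```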

Example – lost luggage. You decide you’ve had enough of the smoky weather in
LA these days, and book a ski vacation in Switzerland for the winter break. You
book a flight from Los Angeles with three legs: Los Angeles → New York → London
→ Zurich.
You decide to check a bag from Los Angeles to Zurich. For each flight, there is a 5%
probability that your bag will be lost and not loaded on to the plane. Assume that
the flights are independent of each other.
Question. You arrive in Zurich. What is the probability that your bag was lost?
Answer 1. Let L1 , L2 and L3 be the events that your bag was lost on flight 1 (LA →
NYC), flight 2 (NYC → London) or flight 3 (London → Zurich). Our probability of
the bag being lost is
P(Lost) = P(L1 or L2 or L3 ).
The events L1 , L2 and L3 are mutually exclusive events, so we have

P(Lost) = P(L1 or L2 or L3 )
= P(L1 ) + P(L2 ) + P(L3 )

What are these individual probabilities? For L1 , it must be lost on flight 1:

P(L1 ) = 0.05.

For L2 , we need the bag to make it on flight 1 (probability of 0.95), but then to be lost
on flight 2:
P(L2 ) = 0.95 × 0.05 = 0.0475.
For L3 , we need the bag to make it on flights 1 and 2 (probability of 0.95 for each of
those two), but to then be lost on flight 3:

P(L3 ) = 0.95 × 0.95 × 0.05 = 0.045125.

Therefore, P(Lost) is:

P(Lost) = P(L1 ) + P(L2 ) + P(L3 )


= 0.05 + 0.0475 + 0.045125
= 0.142625.

Answer 2. We could have also attacked this problem by considering the complement
of Lost. We know that
P(Lost) = 1 − P(Not lost).

What is P(Not lost)? For a bag to not be lost at all on the whole journey, it must not
be lost on each of the three flights. It makes it on to each flight with probability 0.95,
so for three flights, this probability is just
P(Not lost) = 0.953 = 0.857375.
Plugging this in to the expression for P(Lost), we get:
P(Lost) = 1 − 0.857375 = 0.142625.
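Both routes to P(Lost) are easy to compute. A sketch, assuming (as the problem states) independent flights with a 5% loss probability on each leg:

```python
# P(Lost) for the three-leg journey, computed two ways.
p_lose = 0.05
p_make = 1 - p_lose

# Route 1: sum the mutually exclusive events L1, L2, L3
p_L1 = p_lose
p_L2 = p_make * p_lose
p_L3 = p_make ** 2 * p_lose
p_lost = p_L1 + p_L2 + p_L3

# Route 2: complement of "not lost on any of the three flights"
p_lost_alt = 1 - p_make ** 3

print(round(p_lost, 6), round(p_lost_alt, 6))   # both 0.142625
```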

Question. Suppose that airport information at Zurich informs us that our bag is
lost. What is the probability that it was lost at Los Angeles?
Answer. We want P(L1 | Lost). Apply the conditional probability formula:

P(L1 | Lost) = P(L1 ∩ Lost) / P(Lost).
Note that the event L1 ∩ Lost is the same as L1 (if L1 occurs, then Lost occurs). So we
just have
P(L1 | Lost) = P(L1 ) / P(Lost) = 0.05 / 0.142625 = 0.35057.

Observe that the conditional probability of our bag being lost on flight 1, given that
it was lost at all, is 0.35. For flights 2 and 3, this number is 0.33304 and 0.31639.
Given that the probability of our bag getting lost on any of the flights is the same
(0.05), it may seem strange that these numbers are not the same. Think about it this
way: imagine we start with 100 bags in Los Angeles. Of these, 5 are lost, and the
remaining 95 make it to New York. Of these 95, 4.75 are lost and 90.25 make it to
London. Of these 90.25 bags, 4.5125 bags are lost and 85.7375 bags make it to Zurich.
The number of bags that are “still standing” at London is smaller than the number
that started from Los Angeles. For a bag to be lost on flight 3 (London to Zurich), it
had to have a bit of luck to make it to London and to not be lost on the previous two
flights. So when we calculate the conditional probability, we should expect that our
bag is more likely to have been lost on flight 1 than flight 2, and more likely to have
been lost on flight 2 than flight 3.
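The three conditional probabilities can be computed in one loop. A short sketch, using the same 5% per-leg loss probability:

```python
# P(L_i | Lost) for each flight: the bag had to survive the earlier legs,
# so earlier flights carry more of the conditional probability.
p_lose, p_make = 0.05, 0.95

p_L = [p_make ** i * p_lose for i in range(3)]   # P(L1), P(L2), P(L3)
p_lost = sum(p_L)

for i, p in enumerate(p_L, start=1):
    print(f"P(L{i} | Lost) = {p / p_lost:.5f}")
# P(L1 | Lost) = 0.35057
# P(L2 | Lost) = 0.33304
# P(L3 | Lost) = 0.31639
```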
We could also have tackled this problem using a probability tree. The initial proba-
bility tree for this problem is:

Not lost (flight 1): ?
    Not lost (flight 2): ?
        Not lost (flight 3): ?    → ?
        Lost (flight 3): 0.05     → ?
    Lost (flight 2): 0.05         → ?
Lost (flight 1): 0.05             → ?

Notice that this tree has three levels – one for the outcome of each flight. Notice that
the tree is not symmetric. So for example, if we follow the lower branch at the first
juncture (bag is lost on flight 1), the tree ends at that branch; once the bag is lost,
there is nothing else that can happen to it (it cannot become “lost again” on a later
flight, or become “not lost” on a later flight).
Let’s fill the tree out. First, since the probability that the bag can be lost on a flight
is 0.05, the probability that it doesn’t get lost is 0.95. So under all of the “Not lost
(flight 1/2/3)” branches, we can write 0.95. To then obtain the probability of each of
the four outcomes – that the bag was lost on flight 1, lost on flight 2, lost on flight
3 or not lost on any of them – we just multiply along the tree. The completed tree
looks like this:

Not lost (flight 1): 0.95
    Not lost (flight 2): 0.95
        Not lost (flight 3): 0.95   → 0.857375
        Lost (flight 3): 0.05       → 0.045125
    Lost (flight 2): 0.05           → 0.0475
Lost (flight 1): 0.05               → 0.05

From the tree, we can immediately see the probability that the bag is not lost. This
is the probability at the end of the top most branch: 0.857375.
To find P(L1 | Lost), we need to find P(L1 ∩ Lost) and P(Lost). The value P(L1 ∩ Lost)
is just the probability at the end of the bottom-most branch (0.05), and P(Lost) is
just the sum of the probabilities at the three “Lost” endpoints (0.045125 +
0.0475 + 0.05 = 0.142625). So we get:

P(L1 | Lost) = 0.05 / 0.142625 = 0.35057.

