
Janko Gravner

Mathematics Department

University of California

Davis, CA 95616

gravner@math.ucdavis.edu

March 28, 2010

These notes were started in January 2009, with help from Christopher Ng, a student who

was taking the Math 135A and 135B classes at UC Davis and was typesetting the notes he was

taking during my lectures. This text is not a treatise in elementary probability, and has no lofty

goals; instead, its aim is to help a student achieve the proﬁciency in the subject required for a

typical exam and basic real-life applications. Therefore, its emphasis is on examples, which are

chosen without much redundancy. A reader should strive to understand every example given

and be able to design and solve a similar one. Problems at the end of chapters and on sample

exams (the solutions to all of which are provided) have been selected from actual exams, hence

should be used as a test for preparedness.

I have only one tip for studying probability: you cannot do it half-heartedly. You have to

devote to this class several hours of concentrated attention per week to understand the subject

enough so that standard problems become routine. If you think that coming to class and reading

the examples while also doing something else is enough, you’re in for an unpleasant surprise on

the exams.

This text will always be available free of charge to UC Davis students. Please contact me if

you spot any mistake. I am thankful to Marisano James for numerous corrections and helpful

suggestions.

Copyright 2010, Janko Gravner


1 Introduction

The theory of probability has always been associated with gambling, and many of the most accessible

examples still come from that activity. You should be familiar with the basic tools of gambling

trade: a coin, a (six-sided) die, and a full deck of 52 cards. A fair coin gives you Heads (H) or

Tails (T), with equal probability, a fair die will give you 1, 2, 3, 4, 5, or 6 with equal probability,

and a shuﬄed deck of cards means that any ordering of cards is equally likely.

Example 1.1. Here are some typical questions that we will be asking and you will learn how

to answer. This example will serve as an illustration and you shouldn’t expect yourself to

understand how to get the answer yet.

Start with a shuﬄed deck of cards and distribute all 52 cards to 4 players, 13 cards to each.

What is the probability that each player gets an Ace? Next, assume you are a player and you

get a single Ace. Now what is the probability that each player gets an Ace?

Answers. If any ordering of cards is equally likely, then any position of the four Aces in the deck is also equally likely. There are $\binom{52}{4}$ possibilities for the positions (slots) of the 4 Aces. Out of these, the number of positions that give each player an Ace is $13^4$: pick the first slot among the cards that the first player gets, then the second slot among the second player's card slots, then the third and the fourth slots. Therefore, the answer is

$$\frac{13^4}{\binom{52}{4}} \approx 0.1055.$$

After you see that you have a single Ace, the probability goes up: the previous answer needs to be divided by the probability that you get a single Ace, which is $\frac{13\cdot\binom{39}{3}}{\binom{52}{4}} \approx 0.4388$. The answer then becomes

$$\frac{13^4}{13\cdot\binom{39}{3}} \approx 0.2404.$$

Here is how you can quickly estimate the second probability during a card game: give the

second Ace to a player, then a third to a diﬀerent player (probability about 2/3) and then the

last to the third player (probability about 1/3) for the approximate answer 2/9 ≈ 0.22.
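The notes themselves contain no code, but a short Monte Carlo sketch (added here for illustration; Python, with cards 0 through 3 standing for the Aces) can check both answers numerically:

    import random

    def ace_counts():
        # shuffle the deck and count how many Aces each player receives
        deck = list(range(52))
        random.shuffle(deck)
        counts = [0, 0, 0, 0]
        for pos, card in enumerate(deck):
            if card < 4:                    # cards 0..3 stand for the Aces
                counts[pos // 13] += 1      # player = which block of 13 slots
        return counts

    trials, each, single, both = 100_000, 0, 0, 0
    for _ in range(trials):
        c = ace_counts()
        if c == [1, 1, 1, 1]:
            each += 1
        if c[0] == 1:                       # you are player 0 and hold a single Ace
            single += 1
            if c == [1, 1, 1, 1]:
                both += 1

    print(each / trials)    # close to 0.1055
    print(single / trials)  # close to 0.4388
    print(both / single)    # conditional probability, close to 0.2404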

History of probability

Although gambling dates back thousands of years, the birth of modern probability is considered to be a 1654 letter from the Flemish aristocrat and notorious gambler Chevalier de Méré to the

mathematician and philosopher Blaise Pascal. In essence the letter said,

I used to bet even money that I would get at least one 6 in four rolls of a fair die.

The probability of this is 4 times the probability of getting a 6 in a single die, i.e.,

4/6 = 2/3; clearly I had an advantage and indeed I was making money. Now I bet

even money that within 24 rolls of two dice I get at least one double 6. This has the same advantage ($24/6^2 = 2/3$), but now I'm losing money. Why?

As Pascal discussed in his correspondence with Pierre de Fermat, de Méré's reasoning was faulty;

after all, if the number of rolls were 7 in the ﬁrst game, the logic would give the nonsensical

probability 7/6. We’ll come back to this later.


Example 1.2. In a family with 4 children, what is the probability of a 2:2 boy-girl split?

One common wrong answer: $\frac{1}{5}$, as the 5 possibilities for the number of boys (0 to 4) are not equally likely.

Another common guess: close to 1, as this is the most “balanced” possibility. This repre-

sents the mistaken belief that symmetry in probabilities should very likely result in symmetry in

the outcome. A related confusion supposes that events that are probable (say, have probability

around 0.75) occur nearly certainly.

Equally likely outcomes

Suppose an experiment is performed, with n possible outcomes comprising a set S. Assume

also that all outcomes are equally likely. (Whether this assumption is realistic, depends on the

context. In the above Example 1.2, this was a bad assumption.) An event E is a set of outcomes,

i.e., E ⊂ S. If an event E consists of m diﬀerent outcomes (often called “good” outcomes for

E), then the probability of E is given by

$$P(E) = \frac{m}{n}. \tag{1.1}$$

Example 1.3. A fair die has 6 outcomes; take E = {2, 4, 6}. Then P(E) = 1/2.

What does the answer in Example 1.3 mean? Every student of probability should spend

some time thinking about this. The fact is that it is very diﬃcult to attach a meaning to P(E)

if we roll a die just a single time, or a few times. The most straightforward interpretation is

that for a very large number of rolls, about half of the outcomes will be even. Note that this

requires at least the concept of a limit! This relative frequency interpretation of probability will

be explained in detail much later. For now, take formula (1.1) as the deﬁnition of probability.
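To make the relative frequency idea concrete, here is a tiny Python sketch (an addition for illustration, not part of the original notes): the fraction of even outcomes among many rolls of a fair die should approach P(E) = 1/2.

    import random

    for n in (100, 10_000, 1_000_000):
        even = sum(1 for _ in range(n) if random.randint(1, 6) % 2 == 0)
        print(n, even / n)  # relative frequency, approaching 0.5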


2 Combinatorics

Example 2.1. Toss three fair coins. What is the probability of exactly one H?

There are 8 equally likely outcomes: HHH, HHT, HTH, HTT, THH, THT, TTH, TTT. Out

of these, 3 have exactly one H. That is, E = {HTT, THT, TTH}, and P(E) = 3/8.

Example 2.2. Now let’s compute the probability of a 2:2 boy-girl split in a four-child family.

We now have 16 outcomes, which we will assume are equally likely, although this is not quite

true in reality. We can list the outcomes, but there has to be a better way:

BBBB BBBG BBGB BBGG

BGBB BGBG BGGB BGGG

GBBB GBBG GBGB GBGG

GGBB GGBG GGGB GGGG

We conclude that

$$P(2{:}2 \text{ split}) = \frac{6}{16} = \frac{3}{8},$$

$$P(1{:}3 \text{ split or } 3{:}1 \text{ split}) = \frac{8}{16} = \frac{1}{2},$$

$$P(4{:}0 \text{ split or } 0{:}4 \text{ split}) = \frac{2}{16} = \frac{1}{8}.$$

Example 2.3. Roll two dice. What is the most likely sum?

The outcomes are ordered pairs (i, j), 1 ≤ i ≤ 6, 1 ≤ j ≤ 6.

sum no. of outcomes

2 1

3 2

4 3

5 4

6 5

7 6

8 5

9 4

10 3

11 2

12 1

Our answer is 7, and P(sum = 7) = 6/36 = 1/6.

How to count?

Listing all outcomes is very ineﬃcient, especially if their number is large.


Basic principle: If an experiment consists of two parts, and the ﬁrst part has m outcomes,

and the second part has n outcomes regardless of the outcome in the ﬁrst part, then the

experiment as a whole has mn outcomes.

Example 2.4. Roll a die 4 times. What is the probability that you get diﬀerent numbers?

At least at the beginning, you should divide every solution into the following three steps:

Step 1: Identify the set of equally likely outcomes. In this case, this is the set of all ordered 4-tuples of numbers 1, . . . , 6. That is, {(a, b, c, d) : a, b, c, d ∈ {1, . . . , 6}}.

Step 2: Compute the number of outcomes. In this case, the number of outcomes is therefore $6^4$.

Step 3: Compute the number of good outcomes. In this case this number is $6 \cdot 5 \cdot 4 \cdot 3$.

Why? We have 6 options for the ﬁrst roll, 5 options for the second roll since its number

must diﬀer from the number on the ﬁrst roll, 4 options for the third roll since its number

must not appear on the ﬁrst two rolls, etc. Note that the set of possible outcomes changes

from step to step, but their number does not!

The answer then is $\frac{6 \cdot 5 \cdot 4 \cdot 3}{6^4} = \frac{5}{18} \approx 0.2778$.

Example 2.5. Let's now compute the probabilities for de Méré's games.

In Game 1, there are 4 rolls and he wins with at least one 6. The number of good events is $6^4 - 5^4$, as the number of bad events is $5^4$. Therefore

$$P(\text{win}) = 1 - \left(\frac{5}{6}\right)^4 \approx 0.5177.$$

In Game 2, there are 24 rolls of two dice and he wins by at least one pair of 6's rolled. The number of outcomes is $36^{24}$, the number of bad ones is $35^{24}$, and thus the number of good outcomes equals $36^{24} - 35^{24}$. Therefore

$$P(\text{win}) = 1 - \left(\frac{35}{36}\right)^{24} \approx 0.4914.$$

Chevalier de Méré overcounted the good outcomes in both cases. His count $4 \cdot 6^3$ in Game 1 selects a die with a 6 and arbitrary numbers for the other dice. But many outcomes have more than one six and are hence counted more than once.

One should also note that both probabilities are barely different from 1/2, so de Méré was gambling a lot to be able to notice the difference.
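A two-line Python check of both games (added for illustration) makes the near-1/2 values and de Méré's overcount explicit:

    p1 = 1 - (5 / 6) ** 4      # Game 1: at least one 6 in 4 rolls
    p2 = 1 - (35 / 36) ** 24   # Game 2: at least one double 6 in 24 rolls
    print(p1, p2)              # approx. 0.5177 and 0.4914
    print(4 * 6 ** 3, 6 ** 4 - 5 ** 4)  # his count 864 vs. the correct 671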


Permutations

Assume you have n objects. The number of ways to fill n ordered slots with them is

$$n \cdot (n-1) \cdots 2 \cdot 1 = n!,$$

while the number of ways to fill k ≤ n ordered slots is

$$n(n-1) \cdots (n-k+1) = \frac{n!}{(n-k)!}.$$

Example 2.6. Shuﬄe a deck of cards.

• P(top card is an Ace) = $\frac{1}{13} = \frac{4 \cdot 51!}{52!}$.

• P(all cards of the same suit end up next to each other) = $\frac{4! \cdot (13!)^4}{52!} \approx 4.5 \cdot 10^{-28}$. This event never happens in practice.

• P(hearts are together) = $\frac{40! \cdot 13!}{52!} \approx 6 \cdot 10^{-11}$.

To compute the last probability, for example, collect all hearts into a block, then specify

a good event by ordering 40 items (the block of hearts plus the 39 other cards), and ordering

hearts within their block.

Before we go on to further examples, let’s agree that when the text says that a random

choice is made, without further elaboration, this means that all available choices are equally

likely. Also, in the next problem (and in statistics in general) sampling with replacement refers

to choosing, at random, an object from a population, noting its properties, putting the object

back into the population, and then repeating. Sampling without replacement omits the putting back

part.

Example 2.7. A bag has 6 pieces of paper with letters, E, E, P, P, P, and R, on them. Pull 6

pieces at random out of the bag (1) without, and (2) with replacement. What is the probability

that these pieces, in order, spell PEPPER?

There are two problems to solve. For sampling without replacement:

1. An outcome is an ordering of the pieces of paper $E_1 E_2 P_1 P_2 P_3 R$.

2. The number of outcomes thus is 6!.

3. The number of good outcomes is $3! \cdot 2!$.

The probability is $\frac{3! \cdot 2!}{6!} = \frac{1}{60}$.

For sampling with replacement, the answer is $\frac{3^3 \cdot 2^2}{6^6} = \frac{1}{2 \cdot 6^3}$, quite a lot smaller.


Example 2.8. Sit 3 men and 3 women at random (1) on a row of chairs, and (2) around a table.

Compute P(all women sit together). In case (2), also compute P(men and women alternate).

In case (1), the answer is $\frac{4! \cdot 3!}{6!} = \frac{1}{5}$.

For case (2), pick a man, say John Smith, and sit him first. Then you reduce to a row problem, with 5! outcomes; the number of good outcomes is $3! \cdot 3!$. The answer is $\frac{3}{10}$. For the last question, the seats for men and women are fixed after John Smith takes his seat and so the answer is $\frac{3! \cdot 2!}{5!} = \frac{1}{10}$.

Example 2.9. A group consists of 3 Norwegians, 4 Swedes, and 5 Finns, and they sit at

random around a table. What is the probability that all three national subgroups end up sitting

together?

The answer is $\frac{3! \cdot 4! \cdot 5! \cdot 2!}{11!}$. Pick, say, a Norwegian (Arne) and sit him down. Here is how you count the good events. There are 3! choices for ordering the group of Norwegians (who then sit on one side or the other of Arne, depending on the ordering). Then there are 4! choices for arranging the Swedes, and 5! choices for arranging the Finns. Finally, there are 2! choices to order the two blocks of Swedes and Finns.

Combinations

Let $\binom{n}{k}$ denote the number of different subsets with k elements of a set with n elements. Then

$$\binom{n}{k} = \frac{n(n-1) \cdots (n-k+1)}{k!} = \frac{n!}{k!(n-k)!}.$$

To understand why the above is true, ﬁrst choose the subset, and then order its elements in

a row, to ﬁll k ordered slots with elements from the set with n objects. Then distinct choices of

a subset and its ordering will end up in distinct orderings. Therefore,

$$\binom{n}{k} \, k! = n(n-1) \cdots (n-k+1).$$

We call $\binom{n}{k}$ “n choose k,” or a binomial coefficient (as it appears in the binomial theorem: $(x+y)^n = \sum_{k=0}^{n} \binom{n}{k} x^k y^{n-k}$). Note also that $\binom{n}{0} = \binom{n}{n} = 1$ and $\binom{n}{k} = \binom{n}{n-k}$.

More general are the multinomial coeﬃcients, deﬁned next.


The number of ways to divide a set of n elements into r (distinguishable) subsets of $n_1, n_2, \dots, n_r$ elements, where $n_1 + \dots + n_r = n$, is denoted by $\binom{n}{n_1,\dots,n_r}$, and

$$\binom{n}{n_1, \dots, n_r} = \binom{n}{n_1}\binom{n-n_1}{n_2}\binom{n-n_1-n_2}{n_3} \cdots \binom{n-n_1-\dots-n_{r-1}}{n_r} = \frac{n!}{n_1! \, n_2! \cdots n_r!}.$$

To understand the slightly confusing word distinguishable, just think of painting $n_1$ elements red, then $n_2$ different elements blue, etc. These colors distinguish between the different subsets.

Example 2.10. A fair coin is tossed 10 times. What is the probability that we get exactly 5

Heads?

$$P(\text{exactly 5 Heads}) = \frac{\binom{10}{5}}{2^{10}} \approx 0.2461,$$

as one needs to choose the positions of the five Heads among the 10 slots to fix a good outcome.

Example 2.11. We have a bag that contains 100 balls, with 50 of them red and 50 blue. Select

5 balls at random. What is the probability that 3 are blue and 2 are red?

The number of outcomes is $\binom{100}{5}$, and all of them are equally likely, a reasonable interpretation of “select 5 balls at random.” Then the answer is

$$P(3 \text{ are blue and } 2 \text{ are red}) = \frac{\binom{50}{3}\binom{50}{2}}{\binom{100}{5}} \approx 0.3189.$$

Why should this probability be less than a half? The probability that 3 are blue and 2 are

red is equal to the probability that 3 are red and 2 are blue, so if it were greater than a half,

both together would add to over 1. It cannot be exactly a half either, because other possibilities

(like all 5 chosen balls are red) have probability greater than 0.
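Counts like these are quick to verify with Python's math.comb (a sketch added here for illustration):

    from math import comb

    p = comb(50, 3) * comb(50, 2) / comb(100, 5)
    print(p)  # approx. 0.3189, safely below 1/2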

Example 2.12. Here we return to Example 1.1 and do it more slowly. Shuﬄe a standard deck

of 52 cards, and deal 13 cards to each of the 4 players.

What is the probability that each player gets an Ace? We will solve this problem in two

ways, to emphasize that you often have a choice in your set of equally likely outcomes.

The ﬁrst way uses an outcome to be an ordering of 52 cards:

1. There are 52! equally likely outcomes.

2. Let the first 13 cards go to the first player, the second 13 cards to the second player, etc. Pick a slot within each of the four segments of 13 slots for an Ace. There are $13^4$ possibilities to choose these four slots for the Aces.

3. The number of choices to ﬁll these four positions with (four diﬀerent) Aces is 4!.

4. Order the rest of the cards in 48! ways.


The probability is then $\frac{13^4 \cdot 4! \cdot 48!}{52!}$.

The second way, via a small leap of faith, assumes that each set of four positions of the four

Aces among the 52 shuﬄed cards is equally likely. You may choose to believe this intuitive fact

or try to write down a formal proof: the number of permutations that result in a given set F of

four positions of Aces in the deck is independent of F. Then:

1. Outcomes are the positions of the 4 Aces among the 52 slots for shuﬄed cards of the deck.

2. The number of outcomes is $\binom{52}{4}$.

3. The number of good outcomes is $13^4$, as we need to choose one slot among the 13 cards that go to the first player, etc.

The probability is then $\frac{13^4}{\binom{52}{4}}$, which agrees with the number we obtained the first way.

Let’s also compute P(one person has all four Aces). Doing the problem the second way, we

get

1. The number of outcomes is $\binom{52}{4}$.

2. To fix a good outcome, pick one player ($\binom{4}{1}$ choices) and pick four slots for Aces for that player ($\binom{13}{4}$ choices).

The answer is then $\frac{\binom{4}{1}\binom{13}{4}}{\binom{52}{4}} \approx 0.0106$, a lot smaller than the probability of each player getting an Ace.

Example 2.13. Roll a die 12 times. P(each number appears exactly two times)?

1. An outcome consists of filling each of the 12 slots (for the 12 rolls) with an integer between 1 and 6 (the outcome of that roll).

2. The number of outcomes is therefore $6^{12}$.

3. To fix a good outcome, pick two slots for 1, then pick two slots for 2, etc., with $\binom{12}{2}\binom{10}{2} \cdots \binom{2}{2}$ choices.

The probability is then $\frac{\binom{12}{2}\binom{10}{2} \cdots \binom{2}{2}}{6^{12}}$.

What is P(1 appears exactly 3 times and 2 appears exactly 2 times)?

To fix a good outcome now, pick three slots for 1, two slots for 2, and fill the remaining 7 slots with numbers 3, . . . , 6. The number of choices is $\binom{12}{3}\binom{9}{2}4^7$, and the answer is

$$\frac{\binom{12}{3}\binom{9}{2}4^7}{6^{12}}.$$

Example 2.14. We have 14 rooms with 4 colors, white, blue, green, and yellow. Each room

is painted at random with one of the four colors. There are $4^{14}$ equally likely outcomes, so, for example,

$$P(5 \text{ white}, 4 \text{ blue}, 3 \text{ green}, 2 \text{ yellow rooms}) = \frac{\binom{14}{5}\binom{9}{4}\binom{5}{3}\binom{2}{2}}{4^{14}}.$$


Example 2.15. A middle row on a plane sits 7 people. Three of them order chicken (C) and

the remaining four pasta (P). The ﬂight attendant returns with the meals, but has forgotten

who ordered what and discovers that they are all asleep, so she puts the meals in front of them

at random. What is the probability that they all receive correct meals?

A reformulation makes the problem clearer: we are interested in P(3 people who ordered C

get C). Let’s label people 1, . . . , 7, and assume that 1, 2, and 3 ordered C. The outcome is a

selection of 3 people from the 7 who receive C; the number of such selections is $\binom{7}{3}$, and there is a single good outcome. So the answer is $\frac{1}{\binom{7}{3}} = \frac{1}{35}$. Similarly,

$$P(\text{no one who ordered C gets C}) = \frac{\binom{4}{3}}{\binom{7}{3}} = \frac{4}{35},$$

$$P(\text{a single person who ordered C gets C}) = \frac{3\binom{4}{2}}{\binom{7}{3}} = \frac{18}{35},$$

$$P(\text{two persons who ordered C get C}) = \frac{\binom{3}{2} \cdot 4}{\binom{7}{3}} = \frac{12}{35}.$$

Problems

1. A California licence plate consists of a sequence of seven ordered symbols: number, letter,

letter, letter, number, number, number, where a letter is any one of 26 letters and a number is

one among 0, 1, . . . , 9. Assume all licence plates are equally likely. (a) What is the probability

that all symbols are diﬀerent? (b) What is the probability that all symbols are diﬀerent and the

ﬁrst number is the largest of all numbers on the plate?

2. A tennis tournament has 2n participants, n Swedes and n Norwegians. First n people are

chosen at random from the 2n (with no regard to nationality), then paired randomly with the

other n people. Each pair proceeds to play one match. An outcome is a set of n (ordered) pairs,

giving winner and loser in each of the n matches. (a) Determine the number of outcomes. (b)

What do you need to assume to conclude that all outcomes are equally likely? (c) Under this

assumption, compute the probability that all Swedes are winners.

3. 5 Norwegians, 6 Swedes and 7 Finns are seated at random around a table. Compute the

following probabilities: (a) that all Norwegians sit together, (b) that all Norwegians and all Swedes sit together, and (c) that all Norwegians, all Swedes, and all Finns sit together.

4. A bag contains 80 balls numbered 1, . . . , 80. Before the game starts, you choose 10 diﬀerent

numbers from amongst 1, . . . , 80 and write them on a piece of paper. Then 20 balls are selected

(without replacement) out of the bag at random. (a) What is the probability that all your

numbers are selected? (b) What is the probability that none of your numbers is selected? (c)

What is the probability that exactly 4 of your numbers are selected?


5. A full deck of 52 cards contains 13 Hearts. Pull 8 cards from the deck at random, (a) without

replacement and (b) with replacement. In each case compute the probability that you get no

Hearts.

Solutions to problems

1. (a) $\frac{10 \cdot 9 \cdot 8 \cdot 7 \cdot 26 \cdot 25 \cdot 24}{10^4 \cdot 26^3}$, (b) the answer in (a) times $\frac{1}{4}$.

2. (a) Divide into two groups (winners and losers), then pair them: $\binom{2n}{n} n!$ choices. Alternatively, pair the first player, then the next available player, etc., and then at the end choose winners and losers: $(2n-1)(2n-3) \cdots 3 \cdot 1 \cdot 2^n$. (Of course, these two expressions are the same.) (b) All players are of equal strength, equally likely to win or lose any match against any other player. (c) The number of good events is n!, the choice of a Norwegian paired with each Swede.

3. (a) $\frac{13! \cdot 5!}{17!}$, (b) $\frac{8! \cdot 5! \cdot 6!}{17!}$, (c) $\frac{2! \cdot 7! \cdot 6! \cdot 5!}{17!}$.

4. (a) $\frac{\binom{70}{10}}{\binom{80}{20}}$, (b) $\frac{\binom{70}{20}}{\binom{80}{20}}$, (c) $\frac{\binom{10}{4}\binom{70}{16}}{\binom{80}{20}}$.

5. (a) $\frac{\binom{39}{8}}{\binom{52}{8}}$, (b) $\left(\frac{3}{4}\right)^8$.


3 Axioms of Probability

The question here is: how can we mathematically deﬁne a random experiment? What we have

are outcomes (which tell you exactly what happens), events (sets comprising certain outcomes),

and probability (which attaches to every event the likelihood that it happens). We need to

agree on which properties these objects must have in order to compute with them and develop

a theory.

When we have ﬁnitely many equally likely outcomes all is pretty clear and we have already

seen many examples. However, as is common in mathematics, inﬁnite sets are much harder to

deal with. For example, we will see soon what it means to choose a random point within a unit

circle. On the other hand, we will also see that there is no way to choose at random a positive

integer — remember that “at random” means all choices are equally likely, unless otherwise

speciﬁed.

A probability space is then a triple $(\Omega, \mathcal{F}, P)$. The first object $\Omega$ is an arbitrary set of outcomes, sometimes called a sample space.

The second object $\mathcal{F}$ is the collection of all events, that is, a set of subsets of $\Omega$. Therefore, $A \in \mathcal{F}$ necessarily means that $A \subset \Omega$. Can we just say that each $A \subset \Omega$ is an event? In this course, you can assume so without worry, although there are good reasons for not assuming so in general! I will give the definition of what properties $\mathcal{F}$ needs to satisfy, but this is just for illustration and you should take a course in measure theory to understand what is really going on. Namely, $\mathcal{F}$ needs to be a σ-algebra, which means (1) $\emptyset \in \mathcal{F}$, (2) $A \in \mathcal{F} \Longrightarrow A^c \in \mathcal{F}$, and (3) $A_1, A_2, \dots \in \mathcal{F} \Longrightarrow \bigcup_{i=1}^{\infty} A_i \in \mathcal{F}$.

What is important is that you can take the complement $A^c$ of an event A (i.e., $A^c$ happens when A does not happen), unions of two or more events (i.e., $A_1 \cup A_2$ happens when either $A_1$ or $A_2$ happens), and intersections of two or more events (i.e., $A_1 \cap A_2$ happens when both $A_1$ and $A_2$ happen). We call events $A_1, A_2, \dots$ pairwise disjoint if $A_i \cap A_j = \emptyset$ whenever $i \neq j$ — that is, at most one of such events can occur.

Finally, the probability P is a number attached to every event A and satisﬁes the following

three axioms:

Axiom 1. For every event A, P(A) ≥ 0.

Axiom 2. P(Ω) = 1.

Axiom 3. If $A_1, A_2, \dots$ is a sequence of pairwise disjoint events, then

$$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i).$$

Whenever we have an abstract deﬁnition like this, the ﬁrst thing to do is to look for examples.

Here are some.


Example 3.1. $\Omega = \{1, 2, 3, 4, 5, 6\}$,

$$P(A) = \frac{\text{(number of elements in } A)}{6}.$$

The random experiment here is rolling a fair die. Clearly this can be generalized to any ﬁnite

set with equally likely outcomes.

Example 3.2. $\Omega = \{1, 2, \dots\}$, and you have numbers $p_1, p_2, \dots \geq 0$ with $p_1 + p_2 + \dots = 1$. For any $A \subset \Omega$,

$$P(A) = \sum_{i \in A} p_i.$$

For example, toss a fair coin repeatedly until the first Heads. Your outcome is the number of tosses. Here, $p_i = \frac{1}{2^i}$.

Note that the $p_i$ cannot be chosen to be equal, as you cannot make the sum of infinitely many equal numbers be 1.

Example 3.3. Pick a point from inside the unit circle centered at the origin. Here, $\Omega = \{(x, y) : x^2 + y^2 \leq 1\}$ and

$$P(A) = \frac{\text{(area of } A)}{\pi}.$$

It is important to observe that if A is a singleton (a set whose only element is a single point), then

P(A) = 0. This means that we cannot attach the probability to outcomes — you hit a single

point (or even a line) with probability 0, but a “fatter” set with positive area you hit with

positive probability.

Another important theoretical remark: this is a case where A cannot be an arbitrary subset of the circle — there are sets for which area cannot be defined!

Consequences of the axioms

(C0) P(∅) = 0.

Proof . In Axiom 3, take all sets to be ∅.

(C1) If $A_1 \cap A_2 = \emptyset$, then $P(A_1 \cup A_2) = P(A_1) + P(A_2)$.

Proof. In Axiom 3, take all sets other than the first two to be $\emptyset$.

(C2) $P(A^c) = 1 - P(A)$.


Proof. Apply (C1) to $A_1 = A$, $A_2 = A^c$.

(C3) 0 ≤ P(A) ≤ 1.

Proof. Use that $P(A^c) \geq 0$ in (C2).

(C4) If $A \subset B$, then $P(B) = P(A) + P(B \setminus A) \geq P(A)$.

Proof. Use (C1) for $A_1 = A$ and $A_2 = B \setminus A$.

(C5) P(A ∪ B) = P(A) +P(B) −P(A ∩ B).

Proof. Let $P(A \setminus B) = p_1$, $P(A \cap B) = p_{12}$, and $P(B \setminus A) = p_2$, and note that these sets are pairwise disjoint. Then $P(A) = p_1 + p_{12}$, $P(B) = p_2 + p_{12}$, and $P(A \cup B) = p_1 + p_2 + p_{12}$.

(C6)

$$P(A_1 \cup A_2 \cup A_3) = P(A_1) + P(A_2) + P(A_3) - P(A_1 \cap A_2) - P(A_1 \cap A_3) - P(A_2 \cap A_3) + P(A_1 \cap A_2 \cap A_3),$$

and more generally

$$P(A_1 \cup \dots \cup A_n) = \sum_{i=1}^{n} P(A_i) - \sum_{1 \leq i < j \leq n} P(A_i \cap A_j) + \sum_{1 \leq i < j < k \leq n} P(A_i \cap A_j \cap A_k) - \dots + (-1)^{n-1} P(A_1 \cap \dots \cap A_n).$$

This is called the inclusion-exclusion formula and is commonly used when it is easier to compute probabilities of intersections than of unions.

Proof. We prove this only for n = 3. Let $p_1 = P(A_1 \cap A_2^c \cap A_3^c)$, $p_2 = P(A_1^c \cap A_2 \cap A_3^c)$, $p_3 = P(A_1^c \cap A_2^c \cap A_3)$, $p_{12} = P(A_1 \cap A_2 \cap A_3^c)$, $p_{13} = P(A_1 \cap A_2^c \cap A_3)$, $p_{23} = P(A_1^c \cap A_2 \cap A_3)$, and $p_{123} = P(A_1 \cap A_2 \cap A_3)$. Again note that all these sets are pairwise disjoint, and next that the


right hand side of (C6) is

$$(p_1 + p_{12} + p_{13} + p_{123}) + (p_2 + p_{12} + p_{23} + p_{123}) + (p_3 + p_{13} + p_{23} + p_{123})$$
$$- (p_{12} + p_{123}) - (p_{13} + p_{123}) - (p_{23} + p_{123}) + p_{123}$$
$$= p_1 + p_2 + p_3 + p_{12} + p_{13} + p_{23} + p_{123} = P(A_1 \cup A_2 \cup A_3).$$

Example 3.4. Pick an integer in [1, 1000] at random. Compute the probability that it is

divisible neither by 12 nor by 15.

The sample space consists of the 1000 integers between 1 and 1000. Let $A_r$ be the subset consisting of the integers divisible by r. The cardinality of $A_r$ is $\lfloor 1000/r \rfloor$. Another simple fact is that $A_r \cap A_s = A_{\operatorname{lcm}(r,s)}$, where lcm stands for the least common multiple. Our probability equals

$$1 - P(A_{12} \cup A_{15}) = 1 - P(A_{12}) - P(A_{15}) + P(A_{12} \cap A_{15}) = 1 - P(A_{12}) - P(A_{15}) + P(A_{60}) = 1 - \frac{83}{1000} - \frac{66}{1000} + \frac{16}{1000} = 0.867.$$
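A one-line brute-force check in Python (added for illustration):

    count = sum(1 for n in range(1, 1001) if n % 12 != 0 and n % 15 != 0)
    print(count / 1000)  # 0.867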

Example 3.5. Sit 3 men and 4 women at random in a row. What is the probability that either

all the men or all the women end up sitting next to each other?

Here $A_1$ = {all women sit together}, $A_2$ = {all men sit together}, and $A_1 \cap A_2$ = {both women and men sit together}, so the answer is

$$P(A_1 \cup A_2) = P(A_1) + P(A_2) - P(A_1 \cap A_2) = \frac{4! \, 4!}{7!} + \frac{5! \, 3!}{7!} - \frac{2! \, 3! \, 4!}{7!}.$$

Example 3.6. The group of 3 Norwegians, 4 Swedes and 5 Finns is seated at random around

the table. Compute the probability that at least one of the three groups ends up sitting together.

Define $A_N$ = {Norwegians sit together}, and similarly $A_S$, $A_F$. We have

$$P(A_N) = \frac{3! \, 9!}{11!}, \quad P(A_S) = \frac{4! \, 8!}{11!}, \quad P(A_F) = \frac{5! \, 7!}{11!},$$

$$P(A_N \cap A_S) = \frac{3! \, 4! \, 6!}{11!}, \quad P(A_N \cap A_F) = \frac{3! \, 5! \, 5!}{11!}, \quad P(A_S \cap A_F) = \frac{4! \, 5! \, 4!}{11!},$$

$$P(A_N \cap A_S \cap A_F) = \frac{3! \, 4! \, 5! \, 2!}{11!},$$

and therefore

$$P(A_N \cup A_S \cup A_F) = \frac{3! \, 9! + 4! \, 8! + 5! \, 7! - 3! \, 4! \, 6! - 3! \, 5! \, 5! - 4! \, 5! \, 4! + 3! \, 4! \, 5! \, 2!}{11!}.$$

Example 3.7. Matching problem. A large company of n employees has a scheme according

to which each employee buys a Christmas gift, and gifts are then distributed at random to the

employees. What is the probability that someone gets his or her own gift?


You should note that this is a lot different from asking, assuming you are one of the employees, for the probability that you get your own gift. (Which is $\frac{1}{n}$.)

Let $A_i$ = {ith employee gets his or her own gift}. Then what we are looking for is

$$P\left(\bigcup_{i=1}^{n} A_i\right).$$

We have

$$P(A_i) = \frac{1}{n} \quad \text{(for all } i),$$

$$P(A_i \cap A_j) = \frac{(n-2)!}{n!} = \frac{1}{n(n-1)} \quad \text{(for all } i < j),$$

$$P(A_i \cap A_j \cap A_k) = \frac{(n-3)!}{n!} = \frac{1}{n(n-1)(n-2)} \quad \text{(for all } i < j < k),$$

and so on, up to

$$P(A_1 \cap \dots \cap A_n) = \frac{1}{n!},$$

and so our answer is

$$n \cdot \frac{1}{n} - \binom{n}{2} \frac{1}{n(n-1)} + \binom{n}{3} \frac{1}{n(n-1)(n-2)} - \dots + (-1)^{n-1} \frac{1}{n!} = 1 - \frac{1}{2!} + \frac{1}{3!} - \dots + (-1)^{n-1} \frac{1}{n!} \to 1 - \frac{1}{e} \approx 0.6321 \quad (\text{as } n \to \infty).$$
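The fast convergence to 1 − 1/e can be seen numerically; a short Python sketch (an illustration added here, not part of the original notes):

    from math import factorial, e

    def p_match(n):
        # partial sum 1 - 1/2! + 1/3! - ... + (-1)^(n-1)/n!
        return sum((-1) ** (j - 1) / factorial(j) for j in range(1, n + 1))

    for n in (3, 5, 10):
        print(n, p_match(n))
    print(1 - 1 / e)  # the limit, approx. 0.6321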

Example 3.8. Birthday Problem. Assume you have k people in the room. What is the

probability that there are two who share a birthday? We will ignore leap years, assume all

birthdays are equally likely, and generalize the problem a little: from n possible birthdays,

sample k times with replacement.

$$P(\text{a shared birthday}) = 1 - P(\text{no shared birthdays}) = 1 - \frac{n \cdot (n-1) \cdots (n-k+1)}{n^k}.$$

When n = 365, the lowest k for which this is above 0.5 is famously k = 23, and some values are

given in the following table.

k prob. for n = 365

10 0.1169

23 0.5073

41 0.9032

57 0.9901

70 0.9992
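The table entries are easy to reproduce; here is a short Python sketch (added for illustration):

    def shared_birthday(k, n=365):
        # probability that at least two of k people share one of n birthdays
        p_distinct = 1.0
        for i in range(k):
            p_distinct *= (n - i) / n
        return 1 - p_distinct

    for k in (10, 23, 41, 57, 70):
        print(k, round(shared_birthday(k), 4))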


Each day, the Massachusetts lottery chooses a four digit number at random, with leading 0's allowed. Thus there is sampling with replacement from among n = 10,000 choices each day. On

February 6, 1978, the Boston Evening Globe reported that

“During [the lottery's] 22 months existence [...], no winning number has ever been repeated. [David] Hughes, the expert [and a lottery official] doesn't expect to see duplicate winners until about half of the 10,000 possibilities have been exhausted.”

What would an informed reader make of this? Assuming k = 660 days, the probability of no repetition works out to be about $2.19 \cdot 10^{-10}$, a remarkably improbable event. What happened

was that Mr. Hughes, not understanding the Birthday Problem, did not check for repetitions,

conﬁdent that there would not be any. Apologetic lottery oﬃcials announced later that there

indeed were repetitions.

Example 3.9. Coupon Collector Problem. Within the context of the previous problem, assume

k ≥ n and compute P(all n birthdays are represented).

More often this is described in terms of cereal boxes, each of which contains one of n diﬀerent

cards (coupons), chosen at random. If you buy k boxes, what is the probability that you have

a complete collection?

When k = n, our probability is $\frac{n!}{n^n}$. More generally, let

$$A_i = \{i\text{th birthday is missing}\}.$$

Then we need to compute

$$1 - P\left(\bigcup_{i=1}^{n} A_i\right).$$

Now

$$P(A_i) = \frac{(n-1)^k}{n^k} \quad \text{(for all } i),$$

$$P(A_i \cap A_j) = \frac{(n-2)^k}{n^k} \quad \text{(for all } i < j),$$

and so on, down to

$$P(A_1 \cap \dots \cap A_n) = 0,$$

and our answer is

$$1 - n\left(\frac{n-1}{n}\right)^k + \binom{n}{2}\left(\frac{n-2}{n}\right)^k - \dots + (-1)^{n-1}\binom{n}{n-1}\left(\frac{1}{n}\right)^k = \sum_{i=0}^{n} \binom{n}{i} (-1)^i \left(1 - \frac{i}{n}\right)^k.$$

This must be $\frac{n!}{n^n}$ for k = n, and 0 when k < n, neither of which is obvious from the formula. Neither will you, for large n, get anything close to the correct numbers when k ≤ n if you try to compute the probabilities by computer, due to the very large size of the summands with alternating signs and the resulting rounding errors. We will return to this problem later for a much more efficient computational method, but some numbers are in the two tables below. Another remark for those who know a lot of combinatorics: you will perhaps notice that the above probability is

$$\frac{n!}{n^k} S_{k,n},$$

where $S_{k,n}$ is the Stirling number of the second kind.

k prob. for n = 6

13 0.5139

23 0.9108

36 0.9915

k prob. for n = 365

1607 0.0101

1854 0.1003

2287 0.5004

2972 0.9002

3828 0.9900

4669 0.9990
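One way around the rounding problem, at least for moderate n, is exact rational arithmetic; a Python sketch (added here, not in the original notes) using fractions.Fraction:

    from fractions import Fraction
    from math import comb

    def all_coupons(k, n):
        # exact inclusion-exclusion sum for P(all n coupons in k boxes)
        return sum(comb(n, i) * (-1) ** i * Fraction(n - i, n) ** k
                   for i in range(n + 1))

    print(float(all_coupons(13, 6)))  # approx. 0.5139
    print(float(all_coupons(23, 6)))  # approx. 0.9108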

More examples with combinatorial ﬂavor

We will now do more problems that would belong in the previous chapter, but are a little harder, so we do them here instead.

Example 3.10. Roll a die 12 times. Compute the probability that a number occurs 6 times

and two other numbers occur three times each.

The number of outcomes is $6^{12}$. To count the number of good outcomes:

1. Pick the number that occurs 6 times: $\binom{6}{1} = 6$ choices.

2. Pick the two numbers that occur 3 times each: $\binom{5}{2}$ choices.

3. Pick slots (rolls) for the number that occurs 6 times: $\binom{12}{6}$ choices.

4. Pick slots for one of the numbers that occur 3 times: $\binom{6}{3}$ choices.

Therefore, our probability is

$$\frac{6 \binom{5}{2} \binom{12}{6} \binom{6}{3}}{6^{12}}.$$

Example 3.11. You have 10 pairs of socks in the closet. Pick 8 socks at random. For every i,

compute the probability that you get i complete pairs of socks.

Now there are $\binom{20}{8}$ outcomes. To count the number of good outcomes:

1. Pick i pairs of socks from the 10: $\binom{10}{i}$ choices.

2. Pick the pairs that are represented by a single sock: $\binom{10-i}{8-2i}$ choices.

3. Pick a sock from each of the latter pairs: $2^{8-2i}$ choices.

Therefore, our probability is

$$\frac{2^{8-2i} \binom{10-i}{8-2i} \binom{10}{i}}{\binom{20}{8}}.$$
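A quick sanity check (a Python sketch added for illustration) is that these probabilities over i = 0, . . . , 4 sum to 1:

    from math import comb

    probs = [2 ** (8 - 2 * i) * comb(10 - i, 8 - 2 * i) * comb(10, i)
             / comb(20, 8) for i in range(5)]
    print(probs, sum(probs))  # the sum should be 1.0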


Example 3.12. Poker Hands. In the deﬁnitions, the word value refers to A, K, Q, J, 10, 9, 8,

7, 6, 5, 4, 3, 2. This sequence orders cards in descending consecutive values, with one exception:

an Ace may be regarded as 1 for the purposes of making a straight (but note that, for example,

K, A, 2, 3, 4 is not a valid straight sequence — A can only begin or end a straight). From lowest

to highest, here are the hands:

(a) one pair: two cards of same value plus 3 cards with diﬀerent values

J♠ J♣ 9♥ Q♣ 4♠

(b) two pairs: two pairs plus another card of diﬀerent value

J♠ J♣ 9♥ 9♣ 3♠

(c) three of a kind: three cards of the same value plus two with diﬀerent values

Q♠ Q♣ Q♥ 9♣ 3♠

(d) straight: ﬁve cards with consecutive values

5♥ 4♣ 3♣ 2♥ A♠

(e) ﬂush: ﬁve cards of the same suit

K♣ 9♣ 7♣ 6♣ 3♣

(f) full house: a three of a kind and a pair

J♣ J♦ J♥ 3♣ 3♠

(g) four of a kind: four cards of the same value

K♣ K♦ K♥ K♠ 10♠

(h) straight flush: five cards of the same suit with consecutive values

A♠ K♠ Q♠ J♠ 10♠

And here are the probabilities:

hand             no. of combinations                                      approx. prob.
one pair         $13\binom{12}{3}\binom{4}{2}4^3$                         0.422569
two pairs        $\binom{13}{2} \cdot 11 \cdot \binom{4}{2}\binom{4}{2} \cdot 4$   0.047539
three of a kind  $13\binom{12}{2}\binom{4}{3}4^2$                         0.021128
straight         $10 \cdot 4^5$                                           0.003940
flush            $4\binom{13}{5}$                                         0.001981
full house       $13 \cdot 12 \cdot \binom{4}{3}\binom{4}{2}$             0.001441
four of a kind   $13 \cdot 12 \cdot 4$                                    0.000240
straight flush   $10 \cdot 4$                                             0.000015


You should note that the probabilities of straight and ﬂush above include the possibility of a

straight ﬂush.

Let's see how some of these are computed. First, the number of all outcomes is $\binom{52}{5} = 2{,}598{,}960$. Then, for example, for three of a kind, we count the good outcomes by listing the number of choices:

1. Choose a value for the triple: 13.

2. Choose the values of the other two cards: $\binom{12}{2}$.

3. Pick three cards from the four of the same chosen value: $\binom{4}{3}$.

4. Pick a card (i.e., the suit) from each of the two remaining values: $4^2$.

One could do the same for one pair:

1. Pick a number for the pair: 13.

2. Pick the other three numbers: $\binom{12}{3}$.

3. Pick two cards from the value of the pair: $\binom{4}{2}$.

4. Pick the suits of the other three values: $4^3$.

And for the flush:

1. Pick a suit: 4.

2. Pick five numbers: $\binom{13}{5}$.

Our ﬁnal worked out case is straight ﬂush:

1. Pick a suit: 4.

2. Pick the beginning number: 10.

We end this example by computing the probability of not getting any hands listed above,

that is,

$$P(\text{nothing}) = P(\text{all cards with different values}) - P(\text{straight or flush})$$
$$= \frac{\binom{13}{5} 4^5}{\binom{52}{5}} - \left(P(\text{straight}) + P(\text{flush}) - P(\text{straight flush})\right)$$
$$= \frac{\binom{13}{5} 4^5 - 10 \cdot 4^5 - 4\binom{13}{5} + 40}{\binom{52}{5}} \approx 0.5012.$$
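The whole table is easy to regenerate with math.comb; a Python sketch (added here for illustration):

    from math import comb

    total = comb(52, 5)
    hands = {
        "one pair":        13 * comb(12, 3) * comb(4, 2) * 4 ** 3,
        "two pairs":       comb(13, 2) * 11 * comb(4, 2) ** 2 * 4,
        "three of a kind": 13 * comb(12, 2) * comb(4, 3) * 4 ** 2,
        "straight":        10 * 4 ** 5,
        "flush":           4 * comb(13, 5),
        "full house":      13 * 12 * comb(4, 3) * comb(4, 2),
        "four of a kind":  13 * 12 * 4,
        "straight flush":  10 * 4,
    }
    for name, count in hands.items():
        print(f"{name:16s} {count:8d} {count / total:.6f}")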


Example 3.13. Assume that 20 Scandinavians, 10 Finns and 10 Danes, are to be distributed

at random into 10 rooms, 2 per room. What is the probability that exactly 2i rooms are mixed, for i = 0, . . . , 5?

This is an example where careful thinking about what the outcomes should be really pays off. Consider the following model for distributing the Scandinavians into rooms. First arrange them at

random onto a row of 20 slots S1, S2, . . . , S20. Room 1 then takes people in slots S1, S2, so

let’s call these two slots R1. Similarly, room 2 takes people in slots S3, S4, so let’s call these

two slots R2, etc.

Now it is clear that we only need to keep track of the distribution of the 10 D's into the 20 slots, corresponding to the positions of the 10 Danes. Any such distribution constitutes an outcome, and the outcomes are equally likely. Their number is $\binom{20}{10}$.

To get 2i (for example, 4) mixed rooms, start by choosing 2i (e.g., 4) out of the 10 rooms which are going to be mixed. There are $\binom{10}{2i}$ of these choices. You also need to make a choice of which slot in each of the 2i chosen mixed rooms the D goes into, for $2^{2i}$ choices.

Once these two choices are made, you still have 10 − 2i (e.g., 6) D's to distribute into 5 − i (e.g., 3) rooms, as there are two D's in each of those rooms. For this, you need to choose 5 − i (e.g., 3) rooms out of the remaining 10 − 2i (e.g., 6), for $\binom{10-2i}{5-i}$ choices, and this choice fixes a good outcome.

The final answer therefore is

$$\frac{\binom{10}{2i} \, 2^{2i} \binom{10-2i}{5-i}}{\binom{20}{10}}.$$
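These probabilities over i = 0, . . . , 5 should sum to 1, which a short Python sketch (added for illustration) confirms:

    from math import comb

    def p_mixed(i):
        # probability of exactly 2i mixed rooms
        return (comb(10, 2 * i) * 2 ** (2 * i) * comb(10 - 2 * i, 5 - i)
                / comb(20, 10))

    ps = [p_mixed(i) for i in range(6)]
    print(ps, sum(ps))  # the sum should be 1.0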

Problems

1. Roll a single die 10 times. Compute the following probabilities: (a) that you get at least one

6; (b) that you get at least one 6 and at least one 5; (c) that you get three 1’s, two 2’s and ﬁve

3’s.

2. Three married couples take seats around a table at random. Compute P(no wife sits next to

her husband).

3. A group of 20 Scandinavians consists of 7 Swedes, 3 Finns, and 10 Norwegians. A committee

of ﬁve people is chosen at random from this group. What is the probability that at least one of

the three nations is not represented on the committee?

4. Choose each digit of a 5 digit number at random from digits 1, . . . 9. Compute the probability

that no digit appears more than twice.

5. Roll a fair die 10 times. (a) Compute the probability that at least one number occurs exactly

6 times. (b) Compute the probability that at least one number occurs exactly once.


6. A lottery ticket consists of two rows, each containing 3 numbers from 1, 2, . . . , 50. The

drawing consists of choosing 5 diﬀerent numbers from 1, 2, . . . , 50 at random. A ticket wins if

its ﬁrst row contains at least two of the numbers drawn, and the second row contains at least

two of the numbers drawn. Thus there are four types of tickets, as shown below:

Ticket 1: rows 1 2 3 and 4 5 6
Ticket 2: rows 1 2 3 and 1 2 3
Ticket 3: rows 1 2 3 and 2 3 4
Ticket 4: rows 1 2 3 and 3 4 5

For example, if the numbers 1, 3, 5, 6, 17 are drawn, then Ticket 1, Ticket 2 and Ticket 4 all

win, while Ticket 3 loses. Compute the winning probabilities for all four tickets.

Solutions to problems

1. (a) $1 - (5/6)^{10}$. (b) $1 - 2 \cdot (5/6)^{10} + (2/3)^{10}$. (c) $\binom{10}{3}\binom{7}{2} \, 6^{-10}$.

2. The complement is a union of the three events $A_i$ = {couple i sits together}, i = 1, 2, 3. Moreover,

$$P(A_1) = P(A_2) = P(A_3) = \frac{2}{5},$$

$$P(A_1 \cap A_2) = P(A_1 \cap A_3) = P(A_2 \cap A_3) = \frac{3! \, 2! \, 2!}{5!} = \frac{1}{5},$$

$$P(A_1 \cap A_2 \cap A_3) = \frac{2! \, 2! \, 2! \, 2!}{5!} = \frac{2}{15}.$$

For $P(A_1 \cap A_2)$, for example, pick a seat for husband 3. In the remaining row of 5 seats, pick the ordering for couple 1, couple 2, and wife 3, then the ordering of seats within each of couple 1 and couple 2. Now, by inclusion-exclusion,

$$P(A_1 \cup A_2 \cup A_3) = 3 \cdot \frac{2}{5} - 3 \cdot \frac{1}{5} + \frac{2}{15} = \frac{11}{15},$$

and our answer is $\frac{4}{15}$.

3. Let $A_1$ be the event that the Swedes are not represented, $A_2$ the event that the Finns are not represented, and $A_3$ the event that the Norwegians are not represented. Then

$$P(A_1 \cup A_2 \cup A_3) = P(A_1) + P(A_2) + P(A_3) - P(A_1 \cap A_2) - P(A_1 \cap A_3) - P(A_2 \cap A_3) + P(A_1 \cap A_2 \cap A_3)$$
$$= \frac{1}{\binom{20}{5}}\left[\binom{13}{5} + \binom{17}{5} + \binom{10}{5} - \binom{10}{5} - 0 - \binom{7}{5} + 0\right].$$

4. The number of bad events is $9\binom{5}{3}8^2 + 9\binom{5}{4}8 + 9$. The first term is the number of numbers in which a digit appears 3 times, but no digit appears 4 times: choose the digit, choose the 3 positions filled by it, then fill the remaining two positions. The second term is the number of numbers in which a digit appears 4 times, but no digit appears 5 times, and the last term is the number of numbers in which a digit appears 5 times. The answer then is

$$1 - \frac{9\binom{5}{3}8^2 + 9\binom{5}{4}8 + 9}{9^5}.$$

5. (a) Let $A_i$ be the event that the number i appears exactly 6 times. As the $A_i$ are pairwise disjoint,

$$P(A_1 \cup A_2 \cup A_3 \cup A_4 \cup A_5 \cup A_6) = 6 \binom{10}{6} \frac{5^4}{6^{10}}.$$

(b) Now $A_i$ is the event that the number i appears exactly once. By inclusion-exclusion,

$$P(A_1 \cup A_2 \cup A_3 \cup A_4 \cup A_5 \cup A_6)$$
$$= 6P(A_1) - \binom{6}{2}P(A_1 \cap A_2) + \binom{6}{3}P(A_1 \cap A_2 \cap A_3) - \binom{6}{4}P(A_1 \cap A_2 \cap A_3 \cap A_4)$$
$$+ \binom{6}{5}P(A_1 \cap A_2 \cap A_3 \cap A_4 \cap A_5) - \binom{6}{6}P(A_1 \cap A_2 \cap A_3 \cap A_4 \cap A_5 \cap A_6)$$
$$= 6 \cdot 10 \cdot \frac{5^9}{6^{10}} - \binom{6}{2} \cdot 10 \cdot 9 \cdot \frac{4^8}{6^{10}} + \binom{6}{3} \cdot 10 \cdot 9 \cdot 8 \cdot \frac{3^7}{6^{10}} - \binom{6}{4} \cdot 10 \cdot 9 \cdot 8 \cdot 7 \cdot \frac{2^6}{6^{10}} + \binom{6}{5} \cdot 10 \cdot 9 \cdot 8 \cdot 7 \cdot 6 \cdot \frac{1^5}{6^{10}} - 0.$$

6. Below, a hit is shorthand for a chosen number.

$$P(\text{ticket 1 wins}) = P(\text{two hits on each line}) + P(\text{two hits on one line, three on the other}) = \frac{3 \cdot 3 \cdot 44 + 2 \cdot 3}{\binom{50}{5}} = \frac{402}{\binom{50}{5}},$$

and

$$P(\text{ticket 2 wins}) = P(\text{two hits among } 1, 2, 3) + P(\text{three hits among } 1, 2, 3) = \frac{3\binom{47}{3} + \binom{47}{2}}{\binom{50}{5}} = \frac{49726}{\binom{50}{5}},$$

and

$$P(\text{ticket 3 wins}) = P(2, 3 \text{ both hit}) + P(1, 4 \text{ both hit and one of } 2, 3 \text{ hit}) = \frac{\binom{48}{3} + 2\binom{46}{2}}{\binom{50}{5}} = \frac{19366}{\binom{50}{5}},$$

and finally,

$$P(\text{ticket 4 wins}) = P(3 \text{ hit, at least one additional hit on each line}) + P(1, 2, 4, 5 \text{ all hit}) = \frac{\left(4\binom{45}{2} + 4 \cdot 45 + 1\right) + 45}{\binom{50}{5}} = \frac{4186}{\binom{50}{5}}.$$


4 Conditional Probability and Independence

Example 4.1. Assume that you have a bag with 11 cubes, out of which 7 have fuzzy surface

and 4 have smooth surface. Out of 7 fuzzy ones, 3 are red and 4 are blue; and out of 4 smooth

ones, 2 are red and 2 are blue. So there are 5 red and 6 blue cubes. Other than color and

fuzziness, cubes have no other distinguishing characteristics.

Your plan is to pull a cube out of the bag at random. Before you start your experiment, the

probability that the pulled-out cube is red is 5/11. Now you reach into the bag and grab the

cube and notice it is fuzzy (but do not pull it out or note its color in any other way). Clearly

the probability should now change to 3/7!

Your experiment clearly has 11 outcomes. Consider the events R, B, F, S that the pulled-

out cube is red, blue, fuzzy, and smooth, respectively. We observed that P(R) = 5/11. For

the probability of a red cube, conditioned on it being fuzzy, we do not have a notation, so we introduce it here: P(R|F) = 3/7. Note that this also equals

$$\frac{P(R \cap F)}{P(F)} = \frac{P(\text{the selected cube is red and fuzzy})}{P(\text{the selected cube is fuzzy})}.$$

This conveys the idea that with additional information the probability must be adjusted.

This is common in real life. Say bookies estimate your basketball team's chances of winning a particular game to be 0.6, 24 hours before the game starts. Two hours before the game starts, however, it comes out that your team's star player is out with a sprained ankle. You cannot expect that the bookies' odds will remain the same, and they change, say, to 0.3. Then the game starts and at half time your team leads by 25 points. Again, the odds will change, say to 0.8.

Finally, when complete information (that is, the outcome of your experiment, the game in this

case) is known, all probabilities are trivial, 0 or 1.

For the general definition, take events A and B, and assume that P(B) > 0. Then the conditional probability of A given B equals

$$P(A|B) = \frac{P(A \cap B)}{P(B)}.$$

Example 4.2. Here is a question asked on Wall Street job interviews. (This is the original

formulation; the macabre tone is not unusual for such interviews.)

“Let’s play a game of Russian roulette. You are tied to your chair. Here’s a gun, a revolver.

Here’s the barrel of the gun, six chambers, all empty. Now watch me as I put two bullets into

the barrel, into two adjacent chambers. I close the barrel and spin it. I put a gun to your head

and pull the trigger. Click. Lucky you! Now I’m going to pull the trigger one more time. Which

would you prefer, that I spin the barrel ﬁrst, or that I just pull the trigger?”

Assume that the barrel rotates clockwise after the hammer hits and is pulled back. You are

given the choice between an unconditional and a conditional probability of death. The former,


if the barrel is spun again, remains 1/3. The latter, if the trigger is pulled without the extra

spin, equals the probability that the hammer clicked on an empty slot which is next to a bullet

in the counterclockwise direction, and equals 1/4.

For a fixed condition B, and acting on events A, the conditional probability Q(A) = P(A|B) satisfies the three axioms in Chapter 3. (This is routine to check and a reader who is more theoretically inclined might view it as a good exercise.) Thus Q is another probability assignment, and all consequences of the axioms are valid for it.

Example 4.3. Toss two fair coins, blindfolded. Somebody tells you that you tossed at least one Heads. What is the probability that both your tosses are Heads?

Here A = {both H}, B = {at least one H}, and

$$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(\text{both H})}{P(\text{at least one H})} = \frac{1/4}{3/4} = \frac{1}{3}.$$

Example 4.4. Toss a coin 10 times. If you know (a) that exactly 7 Heads are tossed, (b) that

at least 7 Heads are tossed, what is the probability that your ﬁrst toss is Heads?

For (a),

$$P(\text{first toss H} \mid \text{exactly 7 H's}) = \frac{\binom{9}{6} \frac{1}{2^{10}}}{\binom{10}{7} \frac{1}{2^{10}}} = \frac{7}{10}.$$

Why is this not surprising? Conditioned on 7 Heads, they are equally likely to occur on any given subset of 7 tosses. If you choose 7 tosses out of 10 at random, the first toss is included in your choice with probability $\frac{7}{10}$.

For (b), the answer is, after canceling $\frac{1}{2^{10}}$,

$$\frac{\binom{9}{6} + \binom{9}{7} + \binom{9}{8} + \binom{9}{9}}{\binom{10}{7} + \binom{10}{8} + \binom{10}{9} + \binom{10}{10}} = \frac{65}{88} \approx 0.7386.$$

Clearly, the answer should be a little larger than before, because this condition is more advan-

tageous for Heads.

Conditional probabilities are sometimes given, or can be easily determined, especially in

sequential random experiments. Then we can use

P(A

1

∩ A

2

) = P(A

1

)P(A

2

[A

1

),

P(A

1

∩ A

2

∩ A

3

) = P(A

1

)P(A

2

[A

1

)P(A

3

[A

1

∩ A

2

),

etc.

Example 4.5. An urn contains 10 black and 10 white balls. Draw 3 (a) without replacement,

and (b) with replacement. What is the probability that all three are white?

We already know how to do part (a):

1. Number of outcomes: $\binom{20}{3}$.

2. Number of ways to select 3 balls out of the 10 white ones: $\binom{10}{3}$.

Our probability is then $\frac{\binom{10}{3}}{\binom{20}{3}}$.

To do this problem another way, imagine drawing the balls sequentially. Then we are computing the probability of an intersection of three events: P(1st ball is white, 2nd ball is white, and 3rd ball is white). The relevant probabilities are:

1. P(1st ball is white) = $\frac{1}{2}$.

2. P(2nd ball is white | 1st ball is white) = $\frac{9}{19}$.

3. P(3rd ball is white | first two picked are white) = $\frac{8}{18}$.

Our probability is then the product $\frac{1}{2} \cdot \frac{9}{19} \cdot \frac{8}{18}$, which equals, as it must, what we obtained before.

This approach is particularly easy in case (b), where the colors of previously pulled-out balls do not affect the probabilities at subsequent stages. The answer therefore is $\left(\frac{1}{2}\right)^3$.

Theorem 4.1. First Bayes' formula. Assume that $F_1, \dots, F_n$ are pairwise disjoint and that $F_1 \cup \dots \cup F_n = \Omega$, that is, exactly one of them always happens. Then, for any event A,

$$P(A) = P(F_1)P(A|F_1) + P(F_2)P(A|F_2) + \dots + P(F_n)P(A|F_n).$$

Proof.

$$P(F_1)P(A|F_1) + P(F_2)P(A|F_2) + \dots + P(F_n)P(A|F_n) = P(A \cap F_1) + \dots + P(A \cap F_n)$$
$$= P((A \cap F_1) \cup \dots \cup (A \cap F_n)) = P(A \cap (F_1 \cup \dots \cup F_n)) = P(A \cap \Omega) = P(A).$$

We call an instance of using this formula “computing the probability by conditioning on which of the events $F_i$ happens.” The formula is useful in sequential experiments, when you face different experimental conditions at the second stage depending on what happens at the first stage. Quite often there are just two events $F_i$, that is, an event F and its complement $F^c$, and we are thus conditioning on whether F happens or not.

Example 4.6. Flip a fair coin. If you toss Heads, roll 1 die. If you toss Tails, roll 2 dice.

Compute the probability that you roll exactly one 6.

Here you condition on the outcome of the coin toss, which could be Heads (event F) or Tails (event $F^c$). If A = {exactly one 6}, then $P(A|F) = \frac{1}{6}$, $P(A|F^c) = \frac{2 \cdot 5}{36}$, $P(F) = P(F^c) = \frac{1}{2}$, and so

$$P(A) = P(F)P(A|F) + P(F^c)P(A|F^c) = \frac{2}{9}.$$


Example 4.7. Roll a die, then pull at random, without replacement, as many cards from the

deck as the number shown on the die. What is the probability that you get at least one Ace?

Here $F_i$ = {number shown on the die is i}, for i = 1, . . . , 6. Clearly, $P(F_i) = \frac{1}{6}$. If A is the event that you get at least one Ace, then

1. $P(A|F_1) = \frac{1}{13}$,

2. and in general, for i ≥ 1, $P(A|F_i) = 1 - \frac{\binom{48}{i}}{\binom{52}{i}}$.

Therefore, by the Bayes' formula,

$$P(A) = \frac{1}{6}\left[\frac{1}{13} + \left(1 - \frac{\binom{48}{2}}{\binom{52}{2}}\right) + \left(1 - \frac{\binom{48}{3}}{\binom{52}{3}}\right) + \left(1 - \frac{\binom{48}{4}}{\binom{52}{4}}\right) + \left(1 - \frac{\binom{48}{5}}{\binom{52}{5}}\right) + \left(1 - \frac{\binom{48}{6}}{\binom{52}{6}}\right)\right].$$

Example 4.8. Coupon collector problem, revisited. As promised, we will develop a compu-

tationally much better formula than the one in Example 3.9. This will be another example of

conditioning, whereby you (1) reinterpret the problem as a sequential experiment and (2) use the

Bayes' formula with “conditions” $F_i$ being the relevant events at the first stage of the experiment.

Here is how it works on this example. Let $p_{k,r}$ be the probability that exactly r (out of a total of n) birthdays are represented among k people, the event we call A. We will fix n and let k and r be variable. Note that $p_{k,n}$ is what we computed by the inclusion-exclusion formula.

At the first stage you have k − 1 people; then the k'th person arrives on the scene. Let $F_1$ be the event that there are r birthdays represented among the k − 1 people, and $F_2$ the event that there are r − 1 birthdays represented among the k − 1 people. Let $F_3$ be the event that any other number of birthdays occurs with k − 1 people. Clearly, $P(A|F_3) = 0$, as the newcomer contributes either 0 or 1 new birthdays. Moreover, $P(A|F_1) = \frac{r}{n}$, the probability that the newcomer duplicates one of the existing r birthdays, and $P(A|F_2) = \frac{n-r+1}{n}$, the probability that the newcomer does not duplicate one of the existing r − 1 birthdays. Therefore,

$$p_{k,r} = P(A) = P(A|F_1)P(F_1) + P(A|F_2)P(F_2) = \frac{r}{n} \, p_{k-1,r} + \frac{n-r+1}{n} \, p_{k-1,r-1},$$

for k, r ≥ 1, and this, together with the boundary conditions

$$p_{0,0} = 1, \qquad p_{0,r} = 0 \text{ for } r > 0, \qquad p_{k,0} = 0 \text{ for } k > 0,$$

makes the computation fast and precise.
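The recursion translates directly into a short dynamic program; here is a Python sketch (added for illustration) that reproduces the n = 6 table from Example 3.9:

    def coupon_table(k_max, n):
        # p[k][r] = P(exactly r of n birthdays represented among k people)
        p = [[0.0] * (n + 1) for _ in range(k_max + 1)]
        p[0][0] = 1.0
        for k in range(1, k_max + 1):
            for r in range(1, n + 1):
                p[k][r] = (r / n) * p[k - 1][r] \
                          + ((n - r + 1) / n) * p[k - 1][r - 1]
        return p

    p = coupon_table(36, 6)
    print(p[13][6], p[23][6], p[36][6])  # approx. 0.5139, 0.9108, 0.9915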

Theorem 4.2. Second Bayes' formula. Let $F_1, \dots, F_n$ and A be as in Theorem 4.1. Then

$$P(F_j|A) = \frac{P(F_j \cap A)}{P(A)} = \frac{P(A|F_j)P(F_j)}{P(A|F_1)P(F_1) + \dots + P(A|F_n)P(F_n)}.$$


Often, $F_j$ is called a hypothesis, $P(F_j)$ its prior probability, and $P(F_j|A)$ its posterior probability.

Example 4.9. We have a fair coin and an unfair coin which always comes out Heads. Choose

one at random, toss it twice. It comes out Heads both times. What is the probability that the

coin is fair?

The relevant events are F = {fair coin}, U = {unfair coin}, and B = {both tosses H}. Then $P(F) = P(U) = \frac{1}{2}$ (as each coin is chosen with equal probability). Moreover, $P(B|F) = \frac{1}{4}$, and $P(B|U) = 1$. Our probability then is

$$\frac{\frac{1}{2} \cdot \frac{1}{4}}{\frac{1}{2} \cdot \frac{1}{4} + \frac{1}{2} \cdot 1} = \frac{1}{5}.$$

Example 4.10. A factory has three machines, $M_1$, $M_2$ and $M_3$, that produce items (say, lightbulbs). It is impossible to tell which machine produced a particular item, but some are defective. Here are the known numbers:

machine   proportion of items made   prob. any made item is defective
$M_1$     0.2                        0.001
$M_2$     0.3                        0.002
$M_3$     0.5                        0.003

You pick an item, test it, and find it is defective. What is the probability that it was made by the machine $M_2$?

The best way to think about this random experiment is as a two-stage procedure. First you choose a machine, with the probabilities given by the proportions. Then that machine produces an item, which you then proceed to test. (Indeed, this is the same as choosing the item from a large number of them and testing it.)

Let D be the event that an item is defective, and let $M_i$ also denote the event that the item is made by machine i. Then $P(D|M_1) = 0.001$, $P(D|M_2) = 0.002$, $P(D|M_3) = 0.003$, $P(M_1) = 0.2$, $P(M_2) = 0.3$, $P(M_3) = 0.5$, and so

$$P(M_2|D) = \frac{0.002 \cdot 0.3}{0.001 \cdot 0.2 + 0.003 \cdot 0.5 + 0.002 \cdot 0.3} \approx 0.26.$$

Example 4.11. Assume 10% of people have a certain disease. The test gives the correct

diagnosis with probability of 0.8; that is, if the person is sick, the test will be positive with

probability 0.8, but if the person is not sick, the test will be positive with probability 0.2. A

random person from the population has tested positive for the disease. What is the probability

that he is actually sick? (No, it is not 0.8!)

Let us define the three relevant events: S = {sick}, H = {healthy}, T = {tested positive}. Now, P(H) = 0.9, P(S) = 0.1, P(T|H) = 0.2 and P(T|S) = 0.8. We are interested in

$$P(S|T) = \frac{P(T|S)P(S)}{P(T|S)P(S) + P(T|H)P(H)} = \frac{8}{26} \approx 31\%.$$
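In code, the posterior is one line; a Python sketch (added for illustration):

    p_s, p_h = 0.1, 0.9           # priors: sick, healthy
    p_pos_s, p_pos_h = 0.8, 0.2   # P(positive | sick), P(positive | healthy)
    posterior = p_pos_s * p_s / (p_pos_s * p_s + p_pos_h * p_h)
    print(posterior)              # 8/26, approx. 0.31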


Note that the prior probability P(S) is very important! Without a very good idea about what

it is, a positive test result is diﬃcult to evaluate: a positive test for HIV would mean something

very diﬀerent for a random person as opposed to somebody who gets tested because of risky

behavior.

Example 4.12. O. J. Simpson's first trial, 1995. The famous sports star and media personality

O. J. Simpson was on trial in Los Angeles for murder of his wife and her boyfriend. One

of the many issues was whether Simpson’s history of spousal abuse can be presented by the

prosecution at the trial; that is, whether this history was “probative,” it had some evidentiary

value, or whether it was merely “prejudicial” and should be excluded. Alan Dershowitz, a

famous professor of law at Harvard and a consultant for the defense, was claiming the latter,

citing the statistics that < 0.1% of the men who abuse their wives end up killing them. As

J. F. Merz and J. C. Caulkins pointed out in the journal Chance (Vol. 8, 1995, pg. 14), this is

the wrong probability to look at!

We need to start with the fact that a woman is murdered. These numbered 4, 936 in 1992,

out of which 1, 430 were killed by partners. In other words, if we let

A = ¦the (murdered) woman was abused by the partner¦,

M = ¦the woman was murdered by the partner¦,

then we estimate the prior probabilities P(M) = 0.29, P(M

c

) = 0.71, and what we are interested

in is the posterior probability P(M[A). It was also commonly estimated at the time that

about 5% of the women had been physically abused by their husbands. Thus we can say that

P(A[M

c

) = 0.05 as there is no reason to assume that a woman murdered by somebody else

was more or less likely to be abused by her partner. The ﬁnal number we need is P(A[M).

Dershowitz states that “a considerable number” of wife murderers had previously assaulted

them, although “some” did not. So we will (conservatively) say that P(A[M) = 0.5. (The

two-stage experiment then is: choose a murdered woman at random; on the ﬁrst stage, she is

murdered by her partner, or not, with stated probabilities; on the second stage, she is among

the abused women, or not, with probabilities depending on the outcome of the ﬁrst stage.) By

the Bayes’ formula,

P(M[A) =

P(M)P(A[M)

P(M)P(A[M) +P(M

c

)P(A[M

c

)

=

29

36.1

≈ 0.8.

The law literature studiously avoids quantifying concepts such as probative value and reasonable

doubt. Nevertheless, we can probably say that 80% is considerably too high, compared to the

prior probability of 29%, to use as a sole argument that the evidence is not probative.

Independence

Events A and B are independent if P(A ∩ B) = P(A)P(B), and dependent (or correlated)

otherwise.


Assuming that P(B) > 0, one could rewrite the condition for independence,

P(A|B) = P(A),

so the probability of A is unaffected by knowledge that B occurred. Also, if A and B are independent,

P(A ∩ B^c) = P(A) − P(A ∩ B) = P(A) − P(A)P(B) = P(A)(1 − P(B)) = P(A)P(B^c),

so A and B^c are also independent — knowing that B has not occurred also has no influence on

the probability of A. Another fact to notice immediately is that disjoint events with nonzero

probability cannot be independent: given that one of them happens, the other cannot happen

and thus its probability drops to zero.

Quite often, independence is an assumption and it is the most important concept in proba-

bility.

Example 4.13. Pull a random card from a full deck. Let A = {card is an Ace} and R = {card is red}. Are A and R independent?

We have P(A) = 1/13, P(R) = 1/2, and, as there are two red Aces, P(A ∩ R) = 2/52 = 1/26. The two events are independent — the proportion of Aces among the red cards is the same as the proportion among all cards.

Now pull two cards out of the deck sequentially without replacement. Are F = {first card is an Ace} and S = {second card is an Ace} independent?

Now P(F) = P(S) = 1/13 and P(S|F) = 3/51, so they are not independent.

Example 4.14. Toss two fair coins and let F = {Heads on 1st toss} and S = {Heads on 2nd toss}. These are independent. You will notice that here the independence is in fact an assumption.

How do we define independence of more than two events? We say that events A_1, A_2, . . . , A_n are independent if

P(A_{i_1} ∩ . . . ∩ A_{i_k}) = P(A_{i_1}) P(A_{i_2}) · · · P(A_{i_k}),

where 1 ≤ i_1 < i_2 < . . . < i_k ≤ n are arbitrary (distinct) indices. The occurrence of any combination of the events does not influence the probability of the others, and again it can be shown that you can replace some A_i by A_i^c and the events remain independent.

Example 4.15. Roll a four-sided fair die, that is, choose one of the numbers 1, 2, 3, 4 at random. Let A = {1, 2}, B = {1, 3}, and C = {1, 4}. Check that these are pairwise independent (each pair is independent), but not independent.

Indeed, P(A) = P(B) = P(C) = 1/2 and P(A ∩ B) = P(A ∩ C) = P(B ∩ C) = 1/4, and pairwise independence follows. However,

P(A ∩ B ∩ C) = 1/4 ≠ 1/8.

A simple reason for the lack of independence is

A ∩ B ⊂ C,


so we have complete information on occurrence of C, as soon as we know that A and B both

happen.

Example 4.16. You roll a die, your friend tosses a coin.

• If you roll 6, you win outright.

• If you do not roll 6, and your friend tosses Heads, you lose outright.

• If neither, the game is repeated until decided.

What is the probability that you win?

One way to solve this problem certainly is this:

P(win) = P(win on 1st round) + P(win on 2nd round) + P(win on 3rd round) + . . .
       = 1/6 + (5/6 · 1/2) · 1/6 + (5/6 · 1/2)^2 · 1/6 + . . . ,

and then you sum up the geometric series. Important note: we have implicitly assumed independence between the coin and the die, as well as between the several tosses and rolls. This is very common in problems such as this!

You can avoid all the nuisance, however, by the following trick. Let

D = {game is decided on 1st round},
W = {you win}.

The events D and W are independent, which one can certainly check by computation, but in fact there is a very good reason to conclude so immediately. The crucial observation is that, provided that the game is not decided on the 1st round, you are thereafter facing the same game with the same winning probability; thus

P(W|D^c) = P(W).

In other words, D^c and W are independent, and then so are D and W, and therefore

P(W) = P(W|D).

This means that one can solve this problem by computing the relevant probabilities for the 1st round:

P(W|D) = P(W ∩ D) / P(D) = (1/6) / (1/6 + 5/6 · 1/2) = 2/7,

which is our answer.

Example 4.17. Craps. Many casinos allow you to bet even money on the following game. Two

dice are rolled, and the sum S is observed.

• If S ∈ {7, 11}, you win immediately.

• If S ∈ {2, 3, 12}, you lose immediately.

• If S ∈ {4, 5, 6, 8, 9, 10}, the pair of dice is rolled repeatedly until one of the following happens:

  – S repeats, in which case you win.

  – 7 appears, in which case you lose.

What is the winning probability?

Let's look at all the possible ways to win:

1. You win on the first roll with probability 8/36.

2. Otherwise, you win if your number repeats before a 7. By the trick from Example 4.16, only the rolls that show your number or a 7 matter, so the winning probability is the corresponding ratio:

• you roll a 4 (prob. 3/36), then win with prob. 3/(3 + 6) = 1/3;

• you roll a 5 (prob. 4/36), then win with prob. 4/(4 + 6) = 2/5;

• you roll a 6 (prob. 5/36), then win with prob. 5/(5 + 6) = 5/11;

• you roll an 8 (prob. 5/36), then win with prob. 5/(5 + 6) = 5/11;

• you roll a 9 (prob. 4/36), then win with prob. 4/(4 + 6) = 2/5;

• you roll a 10 (prob. 3/36), then win with prob. 3/(3 + 6) = 1/3.

Using the first Bayes' formula,

P(win) = 8/36 + 2 · (3/36 · 1/3 + 4/36 · 2/5 + 5/36 · 5/11) ≈ 0.4929,

a decent game by casino standards.
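The rules translate directly into a Monte Carlo check of the 0.4929 figure; a minimal sketch, with our own function name and trial count:

```python
import random

def craps_wins(trials=10**6):
    wins = 0
    for _ in range(trials):
        s = random.randint(1, 6) + random.randint(1, 6)
        if s in (7, 11):
            wins += 1
        elif s not in (2, 3, 12):
            point = s
            while True:                  # roll until the point repeats or a 7 appears
                r = random.randint(1, 6) + random.randint(1, 6)
                if r == point:
                    wins += 1
                    break
                if r == 7:
                    break
    return wins / trials                 # should be close to 0.4929

print(craps_wins())
```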

Bernoulli trials

Assume n independent experiments, each of which is a success with probability p and thus a failure with probability 1 − p.

In a sequence of n Bernoulli trials, P(exactly k successes) = C(n, k) p^k (1 − p)^{n−k}, where C(n, k) = n!/(k!(n − k)!) is the binomial coefficient.

This is because the successes can occur on any subset S of k trials out of n, i.e., on any S ⊂ {1, . . . , n} with cardinality k. These possibilities are disjoint, as exactly k successes cannot occur on two different such sets. There are C(n, k) such subsets; if you fix such an S, then successes must occur on the k trials in S, and failures on all n − k trials not in S, and the probability that this happens is, by the assumed independence, p^k (1 − p)^{n−k}.

Example 4.18. A machine produces items which are independently defective with probability

p. Let’s compute a few probabilities:

1. P(exactly two items among the first 6 are defective) = C(6, 2) p^2 (1 − p)^4.

2. P(at least one item among the first 6 is defective) = 1 − P(no defects) = 1 − (1 − p)^6.

3. P(at least 2 items among the first 6 are defective) = 1 − (1 − p)^6 − 6p(1 − p)^5.

4. P(exactly 100 items are made before 6 defective are found) equals

P(100th item defective, exactly 5 items among the first 99 defective) = p · C(99, 5) p^5 (1 − p)^{94}.

Example 4.19. Problem of Points. This involves finding the probability of n successes before m failures in a sequence of Bernoulli trials. Let's call this probability p_{n,m}. Observe that, in the first m + n − 1 trials, either at least n successes or at least m failures must occur (but not both), so

p_{n,m} = P(in the first m + n − 1 trials, the number of successes is ≥ n)
        = Σ_{k=n}^{n+m−1} C(n + m − 1, k) p^k (1 − p)^{n+m−1−k}.

The problem is solved, but it needs to be pointed out that computationally this is not the best formula. It is much more efficient to use the recursive formula obtained by conditioning on the outcome of the first trial. Assume m, n ≥ 1. Then,

p_{n,m} = P(first trial is a success) · P(n − 1 successes before m failures)
        + P(first trial is a failure) · P(n successes before m − 1 failures)
        = p · p_{n−1,m} + (1 − p) · p_{n,m−1}.

Together with the boundary conditions, valid for m, n ≥ 1,

p_{n,0} = 0,   p_{0,m} = 1,

this again allows for very speedy and precise computation for large m and n.

Example 4.20. Best of 7. Assume that two equally matched teams, A and B, play a series

of games, and the ﬁrst team with four single-game victories wins the series. As it happens, the

team A lost the ﬁrst game. What is the probability it will win the series? Assume that the

games are Bernoulli trials with success probability

1

2

.

We have

P(A wins the series) = P(4 successes before 3 failures)

=

6

k=4

_

6

k

__

1

2

_

6

=

15 + 6 + 1

2

6

≈ 0.3438.
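A minimal sketch (ours) of the recursion from Example 4.19, memoized with lru_cache and checked against the Best-of-7 answer:

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def points(n, m, p):
    """p_{n,m}: probability of n successes before m failures (Example 4.19)."""
    if n == 0:
        return 1.0
    if m == 0:
        return 0.0
    return p * points(n - 1, m, p) + (1 - p) * points(n, m - 1, p)

print(points(4, 3, 0.5))   # Best of 7 after losing the first game: 22/64 ≈ 0.3438
```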


Example 4.21. Banach Matchbox Problem. A mathematician carries two matchboxes, each

originally containing n matches. Each time he needs a match, he is equally likely to take it from

either box. What is the probability that, upon reaching for a box and ﬁnding it empty, there

are exactly k matches still in the other box? Here, 0 ≤ k ≤ n.

Let A_1 be the event that matchbox 1 is the one discovered empty and that, at that instant, matchbox 2 contains k matches. Before that point, he has used n + n − k matches, n from matchbox 1 and n − k from matchbox 2. This means that he has reached for box 1 exactly n times in the first 2n − k trials, and then reaches for it once more, at the (2n − k + 1)'st trial. Therefore, our probability is

2 P(A_1) = 2 · (1/2) · C(2n − k, n) (1/2)^{2n−k} = C(2n − k, n) (1/2)^{2n−k}.

(The factor of 2 accounts for the same event with the roles of the two matchboxes interchanged, and the factor 1/2 is the probability of reaching for box 1 on that last trial.)
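The formula is easy to test against a direct simulation of the matchboxes; the sketch below, including the parameter choices, is ours.

```python
import random
from math import comb

def banach_exact(n, k):
    return comb(2 * n - k, n) * 0.5 ** (2 * n - k)

def banach_sim(n, k, trials=10**5):
    hits = 0
    for _ in range(trials):
        boxes = [n, n]
        while True:
            b = random.randrange(2)          # reach into a box at random
            if boxes[b] == 0:                # found it empty
                hits += boxes[1 - b] == k
                break
            boxes[b] -= 1
    return hits / trials

print(banach_exact(5, 2), banach_sim(5, 2))  # the two numbers should be close
```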

Example 4.22. Each day, you decide, with probability p, independently to flip a fair coin. Otherwise, you do nothing. (a) What is the probability of getting exactly 10 Heads in the first 20 days? (b) What is the probability of getting 10 Heads before 5 Tails?

For (a), the probability of getting Heads is p/2 independently each day, so the answer is

C(20, 10) (p/2)^{10} (1 − p/2)^{10}.

For (b), you can disregard the days on which you do not flip, to get

Σ_{k=10}^{14} C(14, k) (1/2)^{14}.

Example 4.23. You roll a die, and your score is the number shown on the die. Your friend rolls five dice and his score is the number of 6's shown. Compute (a) the probability of the event A that the two scores are equal and (b) the probability of the event B that your friend's score is strictly larger.

In both cases we will condition on your friend's score — this works a little better in case (b) than conditioning on your score. Let F_i, i = 0, . . . , 5, be the event that your friend's score is i. Then P(A|F_i) = 1/6 if i ≥ 1 and P(A|F_0) = 0. By the first Bayes' formula, we get

P(A) = Σ_{i=1}^{5} P(F_i) · (1/6) = (1/6)(1 − P(F_0)) = 1/6 − 5^5/6^6 ≈ 0.0997.

Moreover, P(B|F_i) = (i − 1)/6 if i ≥ 2, and 0 otherwise, and so

P(B) = Σ_{i=1}^{5} P(F_i) (i − 1)/6
     = (1/6) Σ_{i=1}^{5} i P(F_i) − (1/6) Σ_{i=1}^{5} P(F_i)
     = (1/6) Σ_{i=1}^{5} i P(F_i) − 1/6 + 5^5/6^6
     = (1/6) Σ_{i=1}^{5} i C(5, i) (1/6)^i (5/6)^{5−i} − 1/6 + 5^5/6^6
     = (1/6)(5/6) − 1/6 + 5^5/6^6
     ≈ 0.0392.

The last equality can be obtained by a computation, but we will soon learn why the sum has to equal 5/6.
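Both answers can be checked by computing the conditioning sums directly; a short sketch (ours):

```python
from math import comb

# Friend's score is Binomial(5, 1/6): f[i] = P(F_i)
f = [comb(5, i) * (1 / 6) ** i * (5 / 6) ** (5 - i) for i in range(6)]

p_equal = sum(f[i] / 6 for i in range(1, 6))                    # P(A) ≈ 0.0997
p_friend_larger = sum(f[i] * (i - 1) / 6 for i in range(2, 6))  # P(B) ≈ 0.0392
print(p_equal, p_friend_larger)
```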

Problems

1. Consider the following game. Pull one card at random from a full deck of 52 cards. If you

pull an ace, you win outright. If not, then you look at the value of the card (K, Q, and J count

as 10). If the number is 7 or less, you lose outright. Otherwise, you pull (at random, without

replacement) that number of additional cards from the deck. (For example, if you pulled a 9 the

ﬁrst time, you pull 9 more cards.) If you get at least one ace, you win. What are your chances

of winning this game?

2. An item is defective (independently of other items) with probability 0.3. You have a method

of testing whether the item is defective, but it does not always give you the correct answer. If

the tested item is defective, the method detects the defect with probability 0.9 (and says it is

good with probability 0.1). If the tested item is good, then the method says it is defective with

probability 0.2 (and gives the right answer with probability 0.8).

A box contains 3 items. You have tested all of them and the tests detect no defects. What

is the probability that none of the 3 items is defective?

3. A chocolate egg either contains a toy or is empty. Assume that each egg contains a toy with

probability p, independently of other eggs. You have 5 eggs; you open the ﬁrst one, see if it

has a toy inside, then do the same for the second one, etc. Let E_1 be the event that you get at least 4 toys, and E_2 the event that you get at least two toys in succession. Compute P(E_1) and P(E_2). Are E_1 and E_2 independent?

4. You have 16 balls, 3 blue, 4 green, and 9 red. You also have 3 urns. For each of the 16 balls,


you select an urn at random and put the ball into it. (Urns are large enough to accommodate any

number of balls.) (a) What is the probability that no urn is empty? (b) What is the probability

that each urn contains 3 red balls? (c) What is the probability that each urn contains all three

colors?

5. Assume that you have an n–element set U and you select r independent random subsets

A_1, . . . , A_r ⊂ U. All A_i are chosen so that all 2^n choices are equally likely. Compute (in a simple closed form) the probability that the A_i are pairwise disjoint.

Solutions to problems

1. Let

F_1 = {Ace first time},
F_8 = {8 first time},
F_9 = {9 first time},
F_{10} = {10, J, Q, or K first time}.

Also let W be the event that you win. Then

P(W|F_1) = 1,
P(W|F_8) = 1 − C(47, 8)/C(51, 8),
P(W|F_9) = 1 − C(47, 9)/C(51, 9),
P(W|F_{10}) = 1 − C(47, 10)/C(51, 10),

and so

P(W) = 4/52 + 4/52 · (1 − C(47, 8)/C(51, 8)) + 4/52 · (1 − C(47, 9)/C(51, 9)) + 16/52 · (1 − C(47, 10)/C(51, 10)).

2. Let F = {none are defective} and A = {the test indicates that none are defective}. By the second Bayes' formula,

P(F|A) = P(A ∩ F)/P(A) = (0.7 · 0.8)^3 / (0.7 · 0.8 + 0.3 · 0.1)^3 = (56/59)^3.


3. P(E_1) = 5p^4(1 − p) + p^5 = 5p^4 − 4p^5 and P(E_2) = 1 − (1 − p)^5 − 5p(1 − p)^4 − C(4, 2) p^2 (1 − p)^3 − p^3 (1 − p)^2. As E_1 ⊂ E_2, E_1 and E_2 are not independent.

4. (a) Let A_i be the event that the i-th urn is empty. Then

P(A_1) = P(A_2) = P(A_3) = (2/3)^{16},
P(A_1 ∩ A_2) = P(A_1 ∩ A_3) = P(A_2 ∩ A_3) = (1/3)^{16},
P(A_1 ∩ A_2 ∩ A_3) = 0.

Because of this, by inclusion-exclusion,

P(A_1 ∪ A_2 ∪ A_3) = 3 · (2/3)^{16} − 3 · (1/3)^{16} = (2^{16} − 1)/3^{15},

and

P(no urns are empty) = 1 − P(A_1 ∪ A_2 ∪ A_3) = 1 − (2^{16} − 1)/3^{15}.

(b) We can ignore the other balls, since only the red balls matter here. Hence the result is

(9!/(3! 3! 3!))/3^9 = 9!/(8 · 3^{12}).

(c) As

P(at least one urn lacks blue) = 3(2/3)^3 − 3(1/3)^3,
P(at least one urn lacks green) = 3(2/3)^4 − 3(1/3)^4,
P(at least one urn lacks red) = 3(2/3)^9 − 3(1/3)^9,

we have, by independence,

P(each urn contains all three colors)
= (1 − 3(2/3)^3 + 3(1/3)^3) · (1 − 3(2/3)^4 + 3(1/3)^4) · (1 − 3(2/3)^9 + 3(1/3)^9).


5. This is the same as choosing an r × n matrix in which every entry is independently 0 or 1 with probability 1/2 and ending up with at most one 1 in every column. Since the columns are independent, this gives ((1 + r) 2^{−r})^n.


Interlude: Practice Midterm 1

This practice exam covers the material from the ﬁrst four chapters. Give yourself 50 minutes to

solve the four problems, which you may assume have equal point score.

1. Ten fair dice are rolled. What is the probability that:

(a) At least one 1 appears.

(b) Each of the numbers 1, 2, 3 appears exactly twice, while 4 appears four times.

(c) Each of the numbers 1, 2, 3 appears at least once.

2. Five married couples are seated at random around a round table.

(a) Compute the probability that all couples sit together (i.e., every husband-wife pair occupies

adjacent seats).

(b) Compute the probability that at most one wife does not sit next to her husband.

3. Consider the following game. The player rolls a fair die. If he rolls 3 or less, he loses

immediately. Otherwise he selects, at random, as many cards from the full deck as the number

that came up on the die. The player wins if all four aces are among the selected cards.

(a) Compute the winning probability for this game.

(b) Smith only tells you that he recently played this game once, and won. What is the probability

that he rolled a 6 on the die?

4. A chocolate egg either contains a toy or is empty. Assume that each egg contains a toy with

probability p ∈ (0, 1), independently of other eggs. Each toy is, with equal probability, red,

white, or blue (again, independently of other toys). You buy 5 eggs. Let E_1 be the event that you get at most 2 toys, and E_2 the event that you get at least one red, at least one white, and at least one blue toy (so that you have a complete collection).

(a) Compute P(E_1). Why is this probability very easy to compute when p = 1/2?

(b) Compute P(E_2).

(c) Are E_1 and E_2 independent? Explain.


Solutions to Practice Midterm 1

1. Ten fair dice are rolled. What is the probability that:

(a) At least one 1 appears.

Solution:

1 − P(no 1 appears) = 1 − (5/6)^{10}.

(b) Each of the numbers 1, 2, 3 appears exactly twice, while 4 appears four times.

Solution:

C(10, 2) C(8, 2) C(6, 2) / 6^{10} = 10!/(2^3 · 4! · 6^{10}).

(c) Each of the numbers 1, 2, 3 appears at least once.

Solution:

Let A_i be the event that the number i does not appear. We know the following:

P(A_1) = P(A_2) = P(A_3) = (5/6)^{10},
P(A_1 ∩ A_2) = P(A_1 ∩ A_3) = P(A_2 ∩ A_3) = (4/6)^{10},
P(A_1 ∩ A_2 ∩ A_3) = (3/6)^{10}.

Then,

P(1, 2, and 3 all appear at least once)
= P((A_1 ∪ A_2 ∪ A_3)^c)
= 1 − P(A_1) − P(A_2) − P(A_3) + P(A_1 ∩ A_2) + P(A_2 ∩ A_3) + P(A_1 ∩ A_3) − P(A_1 ∩ A_2 ∩ A_3)
= 1 − 3(5/6)^{10} + 3(4/6)^{10} − (3/6)^{10}.

2. Five married couples are seated at random around a round table.

(a) Compute the probability that all couples sit together (i.e., every husband-wife pair

occupies adjacent seats).

Solution:

Let i be an integer in the set {1, 2, 3, 4, 5}. Denote the i'th husband and wife by h_i and w_i, respectively.

i. Fix h_1 in one of the seats.

ii. There are 9! ways to order the remaining 9 people in the remaining 9 seats. This is our sample space.

iii. There are 2 ways to seat w_1, next to h_1 on either side.

iv. Treat each couple as a block and the remaining 8 seats as 4 pairs of adjacent seats. There are 4! ways to seat the remaining 4 couples into the 4 pairs of seats.

v. There are 2^4 ways to order h_i and w_i within their pair of seats.

Therefore, our solution is

(2 · 4! · 2^4)/9!.

(b) Compute the probability that at most one wife does not sit next to her husband.

Solution:

Let A be the event that all wives sit next to their husbands, and let B be the event that exactly one wife does not sit next to her husband. We know that P(A) = (2^5 · 4!)/9! from part (a). Moreover, B = B_1 ∪ B_2 ∪ B_3 ∪ B_4 ∪ B_5, where B_i is the event that w_i does not sit next to h_i while the remaining couples sit together. The B_i are disjoint and their probabilities are all the same, so we only need to determine P(B_1).

i. Seat h_1 in a specified seat.

ii. The sample space consists of the 9! orderings of the remaining 9 people in the remaining 9 seats.

iii. Consider each of the remaining 4 couples and w_1 as 5 blocks.

iv. As w_1 cannot be next to her husband, there are 3 possible positions for w_1 in the ordering of the 5 blocks.

v. There are 4! ways to order the remaining 4 couples.

vi. There are 2^4 ways to order the couples within their blocks.

Therefore,

P(B_1) = (3 · 4! · 2^4)/9!.

Our answer is then

5 · (3 · 4! · 2^4)/9! + (2^5 · 4!)/9!.

3. Consider the following game. The player rolls a fair die. If he rolls 3 or less, he loses

immediately. Otherwise he selects, at random, as many cards from the full deck as the

number that came up on the die. The player wins if all four aces are among the selected

cards.

(a) Compute the winning probability for this game.

Solution:

Let W be the event that you win and let F_i be the event that you roll i, where i = 1, . . . , 6; P(F_i) = 1/6.

Since we lose if we roll a 1, 2, or 3, P(W|F_1) = P(W|F_2) = P(W|F_3) = 0. Moreover,

P(W|F_4) = 1/C(52, 4),
P(W|F_5) = C(5, 4)/C(52, 4),
P(W|F_6) = C(6, 4)/C(52, 4).

Therefore,

P(W) = (1/6) · (1/C(52, 4)) · (1 + C(5, 4) + C(6, 4)).


(b) Smith only tells you that he recently played this game once, and won. What is the

probability that he rolled a 6 on the die?

Solution:

P(F_6|W) = ((1/6) · C(6, 4)/C(52, 4)) / P(W) = C(6, 4)/(1 + C(5, 4) + C(6, 4)) = 15/21 = 5/7.

4. A chocolate egg either contains a toy or is empty. Assume that each egg contains a toy with

probability p ∈ (0, 1), independently of other eggs. Each toy is, with equal probability,

red, white, or blue (again, independently of other toys). You buy 5 eggs. Let E_1 be the event that you get at most 2 toys, and E_2 the event that you get at least one red, at least one white, and at least one blue toy (so that you have a complete collection).

(a) Compute P(E_1). Why is this probability very easy to compute when p = 1/2?

Solution:

P(E_1) = P(0 toys) + P(1 toy) + P(2 toys)
       = (1 − p)^5 + 5p(1 − p)^4 + C(5, 2) p^2 (1 − p)^3.

When p = 1/2, the number of toys and the number of empty eggs have the same distribution, so

P(at most 2 toys) = P(at most 2 eggs are empty) = P(at least 3 toys).

Therefore P(E_1) = P(E_1^c), and so P(E_1) = 1/2.


(b) Compute P(E_2).

Solution:

Let A_1 be the event that red is missing, A_2 the event that white is missing, and A_3 the event that blue is missing. Then

P(E_2) = P((A_1 ∪ A_2 ∪ A_3)^c)
       = 1 − 3(1 − p/3)^5 + 3(1 − 2p/3)^5 − (1 − p)^5.

(c) Are E_1 and E_2 independent? Explain.

Solution:

No: E_1 ∩ E_2 = ∅ (a complete collection requires at least 3 toys), while both events have positive probability.


5 Discrete Random Variables

A random variable is a number whose value depends upon the outcome of a random experiment.

Mathematically, then, a random variable X is a real-valued function on Ω, the space of outcomes:

X : Ω →R.

Sometimes, when convenient, we also allow X to have the value ∞ or more rarely −∞, but

this will not occur in this chapter. The crucial theoretical property that X has to have is that,

for each interval B, the set of outcomes for which X ∈ B is an event, so we are able to talk

about its probability P(X ∈ B). Random variables are traditionally denoted by capital letters

to distinguish them from deterministic quantities.

Example 5.1. Here are some examples of random variables.

1. Toss a coin 10 times, and let X be the number of Heads.

2. Choose a random point in the unit square {(x, y) : 0 ≤ x, y ≤ 1}, and let X be its distance

from the origin.

3. Choose a random person in class, and let X be the height of the person, in inches.

4. Let X be the value of the NASDAQ stock index at the closing of the next business day.

A discrete random variable X has finitely or countably many values x_i, i = 1, 2, . . ., and p(x_i) = P(X = x_i), i = 1, 2, . . ., is called the probability mass function of X. Sometimes X is put into the subscript of its p. m. f., p = p_X.

A probability mass function p has the following properties:

1. For all i, p(x_i) > 0 (we do not list values of X which occur with probability 0).

2. For any interval B, P(X ∈ B) = Σ_{x_i ∈ B} p(x_i).

3. As X has to have some value, Σ_i p(x_i) = 1.

Example 5.2. Let X be the number of Heads in 2 fair coin tosses. Determine its p. m. f.

Possible values of X are 0, 1, and 2. Their probabilities are P(X = 0) = 1/4, P(X = 1) = 1/2, and P(X = 2) = 1/4.

You should note that the random variable Y, which counts the number of Tails in the 2 tosses, has the same p. m. f., that is, p_X = p_Y, but X and Y are far from being the same random variable! In general, random variables may have the same p. m. f. without even being defined on the same set of outcomes.

Example 5.3. An urn contains 20 balls, numbered 1, . . . , 20. Pull out 5 balls at random, without replacement. Let X be the largest number among the selected balls. Determine its p. m. f. and the probability that at least one of the selected numbers is 15 or more.

The possible values are 5, . . . , 20. To determine the p. m. f., note that you have C(20, 5) outcomes, and then

P(X = i) = C(i − 1, 4)/C(20, 5).

Finally,

P(at least one number 15 or more) = P(X ≥ 15) = Σ_{i=15}^{20} P(X = i).

Example 5.4. An urn contains 11 balls: 3 white, 3 red, and 5 blue. Take out 3 balls at random, without replacement. You win $1 for each red ball you select and lose $1 for each white ball you select. Determine the p. m. f. of X, the amount you win.

The number of outcomes is C(11, 3). X can have values −3, −2, −1, 0, 1, 2, and 3. Let's start with 0. This can occur with one ball of each color or with 3 blue balls:

P(X = 0) = (3 · 3 · 5 + C(5, 3))/C(11, 3) = 55/165.

To get X = 1, we can have 2 red and 1 white, or 1 red and 2 blue:

P(X = 1) = P(X = −1) = (C(3, 2) C(3, 1) + C(3, 1) C(5, 2))/C(11, 3) = 39/165.

The probability that X = −1 is the same because of the symmetry between the roles that the red and the white balls play. Next, to get X = 2 we have to have 2 red balls and 1 blue:

P(X = 2) = P(X = −2) = C(3, 2) C(5, 1)/C(11, 3) = 15/165.

Finally, a single outcome (3 red balls) produces X = 3:

P(X = 3) = P(X = −3) = 1/C(11, 3) = 1/165.

All seven probabilities should add up to 1, which can be used either to check the computations or to compute the seventh probability (say, P(X = 0)) from the other six.
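A useful habit is to let a computer verify that a p. m. f. adds up to exactly 1; a sketch using Python's exact rational arithmetic (the layout is our choice):

```python
from fractions import Fraction
from math import comb

total = comb(11, 3)                      # 165 equally likely outcomes
pmf = {
     0: Fraction(3 * 3 * 5 + comb(5, 3), total),
     1: Fraction(comb(3, 2) * 3 + 3 * comb(5, 2), total),
    -1: Fraction(comb(3, 2) * 3 + 3 * comb(5, 2), total),
     2: Fraction(comb(3, 2) * 5, total),
    -2: Fraction(comb(3, 2) * 5, total),
     3: Fraction(1, total),
    -3: Fraction(1, total),
}
print(sum(pmf.values()))                 # prints 1, exactly
```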

Assume X is a discrete random variable with possible values x_i, i = 1, 2, . . .. Then the expected value, also called expectation, average, or mean, of X is

EX = Σ_i x_i P(X = x_i) = Σ_i x_i p(x_i).

For any function g : R → R,

Eg(X) = Σ_i g(x_i) P(X = x_i).


Example 5.5. Let X be a random variable with P(X = 1) = 0.2, P(X = 2) = 0.3, and P(X = 3) = 0.5. What should the expected value of X be?

We can of course just use the formula, but let's instead proceed intuitively and see that the definition makes sense. What, then, should the average of X be?

Imagine a large number n of repetitions of the experiment and measure the realization of X in each. By the frequency interpretation of probability, about 0.2n realizations will have X = 1, about 0.3n will have X = 2, and about 0.5n will have X = 3. The average value of X should then be

(1 · 0.2n + 2 · 0.3n + 3 · 0.5n)/n = 1 · 0.2 + 2 · 0.3 + 3 · 0.5 = 2.3,

which of course is the same as what the formula gives.

Take a discrete random variable X and let µ = EX. How should we measure the deviation of X from µ, i.e., how "spread out" is the p. m. f. of X?

The most natural way would certainly be E|X − µ|. The only problem with this is that absolute values are annoying. Instead, we define the variance of X by

Var(X) = E(X − µ)^2.

The quantity that has the correct units is the standard deviation

σ(X) = √Var(X) = √(E(X − µ)^2).

We will give another, more convenient, formula for the variance, which uses the following property of expectation, called linearity:

E(α_1 X_1 + α_2 X_2) = α_1 EX_1 + α_2 EX_2,

valid for any random variables X_1 and X_2 and nonrandom constants α_1 and α_2. This property will be explained and discussed in more detail later. Then

Var(X) = E[(X − µ)^2]
       = E[X^2 − 2µX + µ^2]
       = E(X^2) − 2µ E(X) + µ^2
       = E(X^2) − µ^2
       = E(X^2) − (EX)^2.

In computations, bear in mind that the variance cannot be negative! Furthermore, the only way that a random variable has 0 variance is when it is equal to its expectation µ with probability 1 (so it is not really random at all): P(X = µ) = 1. Here is the summary:

The variance of a random variable X is Var(X) = E(X − EX)^2 = E(X^2) − (EX)^2.


Example 5.6. Previous example, continued. Compute Var(X).

E(X^2) = 1^2 · 0.2 + 2^2 · 0.3 + 3^2 · 0.5 = 5.9,

and (EX)^2 = (2.3)^2 = 5.29, so that Var(X) = 5.9 − 5.29 = 0.61 and σ(X) = √Var(X) ≈ 0.7810.

We will now look at some famous probability mass functions.

5.1 Uniform discrete random variable

This is a random variable with values x_1, . . . , x_n, each with equal probability 1/n. Such a random variable is simply the random choice of one among n numbers.

Properties:

1. EX = (x_1 + . . . + x_n)/n.

2. Var(X) = (x_1^2 + . . . + x_n^2)/n − ((x_1 + . . . + x_n)/n)^2.

Example 5.7. Let X be the number shown on a fair die. Compute EX, E(X^2), and Var(X).

This is the standard example of a discrete uniform random variable, and

EX = (1 + 2 + . . . + 6)/6 = 7/2,
E(X^2) = (1^2 + 2^2 + . . . + 6^2)/6 = 91/6,
Var(X) = 91/6 − (7/2)^2 = 35/12.

5.2 Bernoulli random variable

This is also called an indicator random variable. Assume that A is an event with probability p. Then I_A, the indicator of A, is given by

I_A = 1 if A happens, and I_A = 0 otherwise.

Other notations for I_A include 1_A and χ_A. Though simple, such random variables are very important as building blocks for more complicated random variables.

Properties:

1. E(I_A) = p.

2. Var(I_A) = p(1 − p).

For the variance, note that I_A^2 = I_A, so that E(I_A^2) = E(I_A) = p.


5.3 Binomial random variable

A Binomial(n, p) random variable counts the number of successes in n independent trials, each of which is a success with probability p.

Properties:

1. Probability mass function: P(X = i) = C(n, i) p^i (1 − p)^{n−i}, i = 0, . . . , n.

2. EX = np.

3. Var(X) = np(1 − p).

The expectation and variance formulas will be proved later. For now, please take them on faith.

Example 5.8. Let X be the number of Heads in 50 tosses of a fair coin. Determine EX, Var(X), and P(X ≤ 10). As X is Binomial(50, 1/2), EX = 25, Var(X) = 12.5, and

P(X ≤ 10) = Σ_{i=0}^{10} C(50, i) (1/2)^{50}.

Example 5.9. Denote by d the dominant gene and r the recessive gene at a single locus.

Then dd is called the pure dominant genotype, dr is called the hybrid, and rr the pure recessive

genotype. The two genotypes with at least one dominant gene, dd and dr, result in the phenotype

of the dominant gene, while rr results in a recessive phenotype.

Assuming that both parents are hybrid and have n children, what is the probability that at

least two will have the recessive phenotype? Each child, independently, gets one of the genes at

random from each parent.

For each child, independently, the probability of the rr genotype is 1/4. If X is the number of rr children, then X is Binomial(n, 1/4). Therefore,

P(X ≥ 2) = 1 − P(X = 0) − P(X = 1) = 1 − (3/4)^n − n · (1/4) · (3/4)^{n−1}.

5.4 Poisson random variable

A random variable is Poisson(λ), with parameter λ > 0, if it has the probability mass function given below.

Properties:

1. P(X = i) = (λ^i/i!) e^{−λ}, for i = 0, 1, 2, . . .,

2. EX = λ,

3. Var(X) = λ.

Here is how you compute the expectation:

EX = Σ_{i=1}^{∞} i e^{−λ} λ^i/i! = e^{−λ} λ Σ_{i=1}^{∞} λ^{i−1}/(i − 1)! = e^{−λ} λ e^{λ} = λ,

and the variance computation is similar (and a good exercise!).

The Poisson random variable is useful as an approximation to a Binomial one, when the

number of trials is large and the probability of success is small. In this context it is often called

the law of rare events, ﬁrst formulated by L. J. Bortkiewicz (in 1898), who studied deaths by

horse kicks in Prussian cavalry.

Theorem 5.1. Poisson approximation to Binomial. When n is large, p is small, and λ = np is of moderate size, Binomial(n, p) is approximately Poisson(λ):

If X is Binomial(n, p), with p = λ/n, then, as n → ∞,

P(X = i) → e^{−λ} λ^i/i!,

for each fixed i = 0, 1, 2, . . .

Proof.

P(X = i) = C(n, i) (λ/n)^i (1 − λ/n)^{n−i}
         = (n(n − 1) · · · (n − i + 1)/i!) · (λ^i/n^i) · (1 − λ/n)^n (1 − λ/n)^{−i}
         = (λ^i/i!) · (1 − λ/n)^n · (n(n − 1) · · · (n − i + 1)/n^i) · (1 − λ/n)^{−i}
         → (λ^i/i!) · e^{−λ} · 1 · 1,

as n → ∞.

Poisson approximation is quite good: one can prove that the mistake made by computing a probability using the Poisson approximation in place of its exact Binomial expression (in the context of the above theorem) is no more than

min(1, λ) · p.

Example 5.10. Suppose that the probability that a person is killed by lightning in a year is, independently, 1/500 million. Assume that the US population is 300 million.

1. Compute P(3 or more people will be killed by lightning next year) exactly.

If X is the number of people killed by lightning, then X is Binomial(n, p), where n = 300 million and p = 1/500 million, and the answer is

1 − (1 − p)^n − np(1 − p)^{n−1} − C(n, 2) p^2 (1 − p)^{n−2} ≈ 0.02311530.

2. Approximate the above probability.

As np = 3/5, X is approximately Poisson(3/5), and the answer is

1 − e^{−λ} − λe^{−λ} − (λ^2/2) e^{−λ} ≈ 0.02311529.

3. Approximate P(two or more people are killed by lightning within the first 6 months of next year).

This highlights the interpretation of λ as a rate. If lightning deaths occur at the rate of 3/5 a year, they should occur at half that rate in 6 months. Indeed, assuming that lightning deaths occur as a result of independent factors in disjoint time intervals, we can imagine that they operate on different people in disjoint time intervals. Thus, doubling the time interval is the same as doubling the number n of people (while keeping p the same), and then np doubles too. Consequently, halving the time interval keeps the same p but halves the number of trials, so np changes to 3/10 and λ = 3/10 as well. The answer is

1 − e^{−λ} − λe^{−λ} ≈ 0.0369.

4. Approximate P(in exactly 3 of the next 10 years exactly 3 people are killed by lightning).

In every year, the probability of exactly 3 deaths is approximately (λ^3/3!) e^{−λ}, where, again, λ = 3/5. Assuming year-to-year independence, the number of years with exactly 3 people killed is approximately Binomial(10, (λ^3/3!) e^{−λ}). The answer is then

C(10, 3) ((λ^3/3!) e^{−λ})^3 (1 − (λ^3/3!) e^{−λ})^7 ≈ 4.34 · 10^{−6}.

5. Compute the expected number of years, among the next 10, in which 2 or more people are killed by lightning.

By the same logic as above, and by the formula for the Binomial expectation, the answer is

10(1 − e^{−λ} − λe^{−λ}) ≈ 0.3694.
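For this example, both the exact Binomial value and the Poisson approximation are a few lines of Python; the sketch below is ours.

```python
from math import comb, exp, factorial

n, p = 300_000_000, 1 / 500_000_000
lam = n * p                                    # rate λ = 3/5

exact = 1 - sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(3))
approx = 1 - sum(exp(-lam) * lam**k / factorial(k) for k in range(3))
print(exact, approx)    # ≈ 0.02311530 and ≈ 0.02311529
```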

Example 5.11. Poisson distribution and law. Assume a crime has been committed. It is known that the perpetrator has certain characteristics, which occur with small frequency p (say, 10^{−8}) in a population of size n (say, 10^8). A person who matches these characteristics has been found at random (e.g., at a routine traffic stop or by airport security) and, since p is so small, charged with the crime. There is no other evidence. What should the defense be?

Let's start with a mathematical model of this situation. Assume that N is the number of people with the given characteristics. This is a Binomial random variable, but, given the assumptions, we can easily assume that it is Poisson with λ = np. Choose a person from among these N, label that person C, the criminal. Then choose at random another person, A, who is arrested. The question is whether C = A, that is, whether the arrested person is guilty. Mathematically, we can formulate the problem like this:

P(C = A | N ≥ 1) = P(C = A, N ≥ 1)/P(N ≥ 1).

We need to condition, as the experiment cannot even be performed when N = 0. Now, by the first Bayes' formula,

P(C = A, N ≥ 1) = Σ_{k=1}^{∞} P(C = A, N ≥ 1 | N = k) P(N = k)
                = Σ_{k=1}^{∞} P(C = A | N = k) P(N = k),

and

P(C = A | N = k) = 1/k,

so

P(C = A, N ≥ 1) = Σ_{k=1}^{∞} (1/k) · (λ^k/k!) e^{−λ}.

The probability that the arrested person is guilty then is

P(C = A | N ≥ 1) = (e^{−λ}/(1 − e^{−λ})) Σ_{k=1}^{∞} λ^k/(k · k!).

There is no closed-form expression for the sum, but it can be easily computed numerically. The defense may claim that the probability of innocence, 1 − (the above probability), is about 0.2330 when λ = 1, presumably enough for a reasonable doubt.
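A sketch (ours) of that numerical computation, accumulating the series term by term to avoid enormous factorials:

```python
from math import exp

def p_guilty(lam, terms=100):
    """P(C = A | N >= 1) = e^{-lam}/(1 - e^{-lam}) * sum_{k>=1} lam^k/(k * k!)."""
    s, term = 0.0, 1.0                # term holds lam^k / k!
    for k in range(1, terms + 1):
        term *= lam / k
        s += term / k
    return exp(-lam) / (1 - exp(-lam)) * s

print(1 - p_guilty(1.0))      # probability of innocence, ≈ 0.2330
print(1 - p_guilty(5 / 12))   # ≈ 0.1015, the figure quoted below
```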

This model was in fact tested in court, in the famous People v. Collins case, a 1968 jury trial in Los Angeles. In this instance, it was claimed by the prosecution (on flimsy grounds) that p = 1/12,000,000 and n would be the number of adult couples in the LA area, say n = 5,000,000. The jury convicted the charged couple in a robbery on the basis of the prosecutor's claim that, due to the low p, "the chances of there being another couple [with specified characteristics, in the LA area] must be one in a billion." The Supreme Court of California reversed the conviction, and gave two reasons. The first reason was the insufficient foundation for the estimate of p. The second reason was that the probability that another couple with matching characteristics exists is in fact

P(N ≥ 2 | N ≥ 1) = (1 − e^{−λ} − λe^{−λ})/(1 − e^{−λ}),

much larger than the prosecutor claimed; namely, for λ = 5/12, it is about 0.1939. This is about twice the (more relevant) probability of innocence, which for this λ is about 0.1015.

5.5 Geometric random variable

A Geometric(p) random variable X counts the number of trials required for the first success in independent trials with success probability p.

Properties:

1. Probability mass function: P(X = n) = p(1 − p)^{n−1}, n = 1, 2, . . .

2. EX = 1/p.

3. Var(X) = (1 − p)/p^2.

4. P(X > n) = Σ_{k=n+1}^{∞} p(1 − p)^{k−1} = (1 − p)^n.

5. P(X > n + k | X > k) = (1 − p)^{n+k}/(1 − p)^k = P(X > n).

We leave out the proofs of the first two formulas, which reduce to manipulations with geometric series.

Example 5.12. Let X be the number of tosses of a fair coin required for the first Heads. What are EX and Var(X)?

As X is Geometric(1/2), EX = 2 and Var(X) = 2.

Example 5.13. You roll a die, your opponent tosses a coin. If you roll 6 you win; if you don't roll 6 and your opponent tosses Heads you lose; otherwise, this round ends and the game repeats. On the average, how many rounds does the game last?

P(game decided on round 1) = 1/6 + (5/6)(1/2) = 7/12,

and so the number of rounds N is Geometric(7/12), and

EN = 12/7.


Problems

1. Roll a fair die repeatedly. Let X be the number of 6's in the first 10 rolls and let Y be the

number of rolls needed to obtain a 3. (a) Write down the probability mass function of X. (b)

Write down the probability mass function of Y . (c) Find an expression for P(X ≥ 6). (d) Find

an expression for P(Y > 10).

2. A biologist needs at least 3 mature specimens of a certain plant. The plant needs a year

to reach maturity; once a seed is planted, any plant will survive for the year with probability

1/1000 (independently of other plants). The biologist plants 3000 seeds. A year is deemed a

success if three or more plants from these seeds reach maturity.

(a) Write down the exact expression for the probability that the biologist will indeed end up

with at least 3 mature plants.

(b) Write down a relevant approximate expression for the probability from (a). Brieﬂy justify

the approximation.

(c) The biologist plans to do this year after year. What is the approximate probability that he

has at least 2 successes in 10 years?

(d) Devise a method to determine the number of seeds the biologist should plant in order to get

at least 3 mature plants in a year with probability at least 0.999. (Your method will probably

require a lengthy calculation – do not try to carry it out with pen and paper.)

3. You are dealt one card at random from a full deck, and your opponent is dealt 2 cards

(without any replacement). If you get an Ace, he pays you $10, if you get a King, he pays you

$5 (regardless of his cards). If you have neither an Ace, nor a King, but your card is red and

your opponent has no red cards, he pays you $1. In all other cases you pay him $1. Determine

your expected earnings. Are they positive?

4. You and your opponent both roll a fair die. If you both roll the same number, the game

is repeated, otherwise whoever rolls the larger number wins. Let N be the number of times

the two dice have to be rolled before the game is decided. (a) Determine the probability mass

function of N. (b) Compute EN. (c) Compute P(you win). (d) Assume that you get paid

$10 for winning on the ﬁrst round, $1 for winning on any other round, and nothing otherwise.

Compute your expected winnings.

5. Each of the 50 students in class belongs to exactly one of the four groups A, B, C, or D. The

memberships of the four groups are as follows: A: 5, B: 10, C: 15, D: 20. First, choose one of

the 50 students at random and let X be the size of that student’s group. Next, choose one of

the four groups at random, and let Y be its size. (Recall: all random choices are with equal

probability, unless otherwise speciﬁed.) (a) Write down the probability mass functions for X

and Y. (b) Compute EX and EY. (c) Compute Var(X) and Var(Y). (d) Assume you have s students divided into n groups with membership numbers s_1, . . . , s_n, and again X is the size of the group of a randomly chosen student while Y is the size of the randomly chosen group. Let EY = µ and Var(Y) = σ^2. Express EX with s, n, µ, and σ.

Solutions

1. (a) X is Binomial(10, 1/6):

P(X = i) = C(10, i) (1/6)^i (5/6)^{10−i},

where i = 0, 1, 2, . . . , 10.

(b) Y is Geometric(1/6):

P(Y = i) = (1/6)(5/6)^{i−1},

where i = 1, 2, . . .

(c)

P(X ≥ 6) = Σ_{i=6}^{10} C(10, i) (1/6)^i (5/6)^{10−i}.

(d)

P(Y > 10) = (5/6)^{10}.

2. (a) X, the number of mature plants, is Binomial(3000, 1/1000).

P(X ≥ 3) = 1 − P(X ≤ 2)
         = 1 − (0.999)^{3000} − 3000 · (0.999)^{2999} · (0.001) − C(3000, 2) (0.999)^{2998} (0.001)^2.

(b) By the Poisson approximation with λ = 3000 · (1/1000) = 3,

P(X ≥ 3) ≈ 1 − e^{−3} − 3e^{−3} − (9/2) e^{−3}.

(c) Denote the probability in (b) by s. Then the number of years the biologist succeeds is approximately Binomial(10, s), and the answer is

1 − (1 − s)^{10} − 10s(1 − s)^9.

(d) Solve

e^{−λ} + λe^{−λ} + (λ^2/2) e^{−λ} = 0.001

for λ and then let n = 1000λ. The equation above can be rewritten as

λ = log 1000 + log(1 + λ + λ^2/2)

and then solved by iteration. The result is that the biologist should plant 11,229 seeds.
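The iteration converges in a handful of steps; a minimal sketch, with our choice of starting point and iteration count:

```python
from math import log

lam = 1.0
for _ in range(100):                 # fixed-point iteration of the rewritten equation
    lam = log(1000) + log(1 + lam + lam**2 / 2)
print(lam, round(1000 * lam))        # λ ≈ 11.229, i.e., about 11,229 seeds
```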

3. Let X be your earnings.

P(X = 10) = 4/52,
P(X = 5) = 4/52,
P(X = 1) = (22/52) · C(26, 2)/C(51, 2) = 11/102,
P(X = −1) = 1 − 2/13 − 11/102,

and so

EX = 10/13 + 5/13 + 11/102 − 1 + 2/13 + 11/102 = 4/13 + 11/51 > 0.

4. (a) N is Geometric(5/6):

P(N = n) = (1/6)^{n−1} · (5/6),

where n = 1, 2, 3, . . .

(b) EN = 6/5.

(c) By symmetry, P(you win) = 1/2.

(d) You get paid $10 with probability 5/12, $1 with probability 1/12, and 0 otherwise, so your expected winnings are 51/12.

5. (a)

x P(X = x) P(Y = x)

5 0.1 0.25

10 0.2 0.25

15 0.3 0.25

20 0.4 0.25

(b) EX = 15, EY = 12.5.

(c) E(X^2) = 250, so Var(X) = 25. E(Y^2) = 187.5, so Var(Y) = 31.25.

(d) Let s = s_1 + . . . + s_n. Then

EX = Σ_{i=1}^{n} s_i · (s_i/s) = (n/s) Σ_{i=1}^{n} s_i^2 · (1/n) = (n/s) E(Y^2) = (n/s)(Var(Y) + (EY)^2) = (n/s)(σ^2 + µ^2).


6 Continuous Random Variables

By deﬁnition, X is a continuous random variable if there exists a nonnegative function f so

that, for every interval B,

P(X ∈ B) =

_

B

f(x) dx,

The function f = f

X

is called the density of X.

We will assume that a density function f is continuous, apart from ﬁnitely many (possibly

inﬁnite) jumps. Clearly it must hold that

∫_{−∞}^{∞} f(x) dx = 1.

Note also that

P(X ∈ [a, b]) = P(a ≤ X ≤ b) = ∫_a^b f(x) dx,
P(X = a) = 0,
P(X ≤ b) = P(X < b) = ∫_{−∞}^b f(x) dx.

The function F = F_X given by

F(x) = P(X ≤ x) = ∫_{−∞}^x f(s) ds

is called the distribution function of X. On an open interval where f is continuous,

F′(x) = f(x).

The density has the same role as the probability mass function for discrete random variables:

it tells which values x are relatively more probable for X than others. Namely, if h is very small

then

P(X ∈ [x, x + h]) = F(x + h) − F(x) ≈ F′(x) · h = f(x) · h.

By analogy with discrete random variables, we define

EX = ∫_{−∞}^{∞} x f(x) dx,
Eg(X) = ∫_{−∞}^{∞} g(x) f(x) dx,

and the variance is computed by the same formula: Var(X) = E(X^2) − (EX)^2.


Example 6.1. Let

f(x) = cx if 0 < x < 4, and f(x) = 0 otherwise.

(a) Determine c. (b) Compute P(1 ≤ X ≤ 2). (c) Determine EX and Var(X).

For (a), we use the fact that the density integrates to 1: ∫_0^4 cx dx = 1, so c = 1/8. For (b), we compute

∫_1^2 (x/8) dx = 3/16.

Finally, for (c) we get

EX = ∫_0^4 (x^2/8) dx = 8/3

and

E(X^2) = ∫_0^4 (x^3/8) dx = 8.

So Var(X) = 8 − 64/9 = 8/9.

Example 6.2. Assume that X has density

f_X(x) = 3x^2 if x ∈ [0, 1], and f_X(x) = 0 otherwise.

Compute the density f_Y of Y = 1 − X^4.

On a problem like this, compute first the distribution function F_Y of Y. Before starting, note that the density f_Y(y) will be nonzero only when y ∈ [0, 1], as the values of Y are restricted to that interval. Now, for y ∈ (0, 1),

F_Y(y) = P(Y ≤ y) = P(1 − X^4 ≤ y) = P(1 − y ≤ X^4) = P((1 − y)^{1/4} ≤ X) = ∫_{(1−y)^{1/4}}^{1} 3x^2 dx.

It follows that

f_Y(y) = (d/dy) F_Y(y) = −3((1 − y)^{1/4})^2 · (1/4)(1 − y)^{−3/4} · (−1) = (3/4)(1 − y)^{−1/4},

for y ∈ (0, 1), and f_Y(y) = 0 otherwise. You should observe that it is immaterial how f_Y is defined at y = 0 and y = 1, because those two values would contribute nothing to any integral.
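Since F_X(x) = x^3 on [0, 1], one can sample X as U^{1/3} for U uniform and compare the empirical distribution of Y with F_Y(y) = 1 − (1 − y)^{3/4}, the integral of the density just computed. The check below is ours.

```python
import random

def F_Y(y):                          # 1 - (1 - y)^{3/4}, the integral of f_Y
    return 1 - (1 - y) ** 0.75

xs = [random.random() ** (1 / 3) for _ in range(10**5)]   # X = U^{1/3} has F_X(x) = x^3
ys = [1 - x**4 for x in xs]
for y in (0.1, 0.5, 0.9):
    print(sum(v <= y for v in ys) / len(ys), F_Y(y))      # the columns should nearly agree
```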

As for discrete random variables, we now look at some famous densities.

6.1 Uniform random variable

Such a random variable represents the choice of a random number in [α, β]. For [α, β] = [0, 1], this is ideally the output of a computer random number generator.

Properties:

1. Density: f(x) = 1/(β − α) if x ∈ [α, β], and f(x) = 0 otherwise.

2. EX = (α + β)/2.

3. Var(X) = (β − α)^2/12.

Example 6.3. Assume that X is uniform on [0, 1]. What is P(X ∈ Q)? What is the probability that the binary expansion of X starts with 0.010?

As Q is countable, it has an enumeration, say, Q = {q_1, q_2, . . . }. By Axiom 3 of Chapter 3,

P(X ∈ Q) = P(∪_i {X = q_i}) = Σ_i P(X = q_i) = 0.

Note that you cannot do this for sets that are not countable, or you would "prove" that P(X ∈ R) = 0, while we of course know that P(X ∈ R) = P(Ω) = 1. As X is, with probability 1, irrational, its binary expansion is uniquely defined, so there is no ambiguity about what the second question means.

Divide [0, 1) into 2^n intervals of equal length. If the binary expansion of a number x ∈ [0, 1) is 0.x_1 x_2 . . ., the first n binary digits determine which of the 2^n subintervals x belongs to: if you know that x belongs to an interval I based on the first n − 1 digits, then the n'th digit 1 means that x is in the right half of I and the n'th digit 0 means that x is in the left half of I. For example, if the expansion starts with 0.010, the number is in [0, 1/2], then in [1/4, 1/2], and then finally in [1/4, 3/8].

Our answer is 1/8, but in fact we can make a more general conclusion. If X is uniform on [0, 1], then all 2^n possibilities for its first n binary digits are equally likely. In other words, the binary digits of X are the result of an infinite sequence of independent fair coin tosses. Choosing a uniform number on [0, 1] is thus equivalent to tossing a fair coin infinitely many times.

Example 6.4. A uniform random number X divides [0, 1] into two segments. Let R be the ratio of the smaller versus the larger segment. Compute the density of R.

As R has values in (0, 1), the density f_R(r) is nonzero only for r ∈ (0, 1), and we will deal only with such r's.

F_R(r) = P(R ≤ r)
       = P(X ≤ 1/2, X/(1 − X) ≤ r) + P(X > 1/2, (1 − X)/X ≤ r)
       = P(X ≤ 1/2, X ≤ r/(r + 1)) + P(X > 1/2, X ≥ 1/(r + 1))
       = P(X ≤ r/(r + 1)) + P(X ≥ 1/(r + 1))   (since r/(r + 1) ≤ 1/2 and 1/(r + 1) ≥ 1/2)
       = r/(r + 1) + 1 − 1/(r + 1)
       = 2r/(r + 1).

For r ∈ (0, 1), the density thus equals

f_R(r) = (d/dr) F_R(r) = 2/(r + 1)^2.


6.2 Exponential random variable

A random variable is Exponential(λ), with parameter λ > 0, if it has the density given below. This is a distribution for the waiting time for some random event, for example, for a lightbulb to burn out or for the next earthquake of some given magnitude.

Properties:

1. Density: f(x) = λe^{−λx} if x ≥ 0, and f(x) = 0 if x < 0.

2. EX = 1/λ.

3. Var(X) = 1/λ^2.

4. P(X ≥ x) = e^{−λx}.

5. Memoryless property: P(X ≥ x + y | X ≥ y) = e^{−λx}.

The last property means that, if the event still has not occurred by some given time (no matter how large), your waiting time distribution is the same as it was at the beginning. There is no "aging."

Proofs of these properties are integration exercises and are omitted.

Example 6.5. Assume that a lightbulb lasts on average 100 hours. Assuming the exponential distribution, compute the probability that it lasts more than 200 hours and the probability that it lasts less than 50 hours.

Let X be the waiting time for the bulb to burn out. Then X is Exponential with λ = 1/100, and

P(X ≥ 200) = e^{−2} ≈ 0.1353,
P(X ≤ 50) = 1 − e^{−1/2} ≈ 0.3935.

6.3 Normal random variable

A random variable is Normal with parameters µ ∈ R and σ^2 > 0 or, in short, X is N(µ, σ^2), if its density is the function given below. Such a random variable is (at least approximately) very common: for example, a measurement with a random error, the weight of a randomly caught yellow-billed magpie, the SAT (or some other) test score of a randomly chosen student at UC Davis, etc.

Properties:

1. Density:

f(x) = f_X(x) = (1/(σ√(2π))) e^{−(x−µ)^2/(2σ^2)},

where x ∈ (−∞, ∞).

2. EX = µ.

3. Var(X) = σ^2.

The fact that

∫_{−∞}^{∞} f(x) dx = 1

is a difficult exercise in integration, as is the computation of the variance. Assuming that the integral of f is 1, we can use symmetry to prove that EX must be µ:

EX = ∫_{−∞}^{∞} x f(x) dx = ∫_{−∞}^{∞} (x − µ) f(x) dx + µ ∫_{−∞}^{∞} f(x) dx
   = ∫_{−∞}^{∞} (x − µ) (1/(σ√(2π))) e^{−(x−µ)^2/(2σ^2)} dx + µ
   = ∫_{−∞}^{∞} z (1/(σ√(2π))) e^{−z^2/(2σ^2)} dz + µ
   = µ,

where the last integral was obtained by the change of variable z = x − µ and is zero because the function integrated is odd.

Example 6.6. Let X be a N(µ, σ^2) random variable and let Y = αX + β, with α > 0. How is Y distributed?

If X is a "measurement with error," then αX + β amounts to changing the units, and so Y should still be Normal. Let's see if this is the case. We start by computing the distribution function of Y,

F_Y(y) = P(Y ≤ y) = P(αX + β ≤ y) = P(X ≤ (y − β)/α) = ∫_{−∞}^{(y−β)/α} f_X(x) dx,

and then the density,

f_Y(y) = f_X((y − β)/α) · (1/α) = (1/(ασ√(2π))) e^{−(y−β−αµ)^2/(2α^2σ^2)}.

Therefore, Y is Normal with EY = αµ + β and Var(Y) = (ασ)^2.


In particular,

Z = (X − µ)/σ

has EZ = 0 and Var(Z) = 1. Such an N(0, 1) random variable is called standard Normal, and its distribution function F_Z(z) is denoted by Φ(z). Note that

f_Z(z) = (1/√(2π)) e^{−z^2/2},
Φ(z) = F_Z(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−x^2/2} dx.

The integral for Φ(z) cannot be computed as an elementary function, so approximate values are given in tables. Nowadays this is largely obsolete, as computers can easily compute Φ(z) very accurately for any given z. You should also note that it is enough to know these values for z > 0, as in this case, by using the fact that f_Z(x) is an even function,

Φ(−z) = ∫_{−∞}^{−z} f_Z(x) dx = ∫_{z}^{∞} f_Z(x) dx = 1 − ∫_{−∞}^{z} f_Z(x) dx = 1 − Φ(z).

In particular, Φ(0) = 1/2. Another way to write this is P(Z ≥ −z) = P(Z ≤ z), a form which is also often useful.
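In Python, for instance, Φ can be computed from the standard library's error function via the standard identity Φ(z) = (1 + erf(z/√2))/2; the sketch below is ours.

```python
from math import erf, sqrt

def Phi(z):
    """Standard Normal distribution function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

print(Phi(0))                  # 0.5
print(Phi(-1), 1 - Phi(1))     # equal, by the symmetry above
print(Phi(1.28))               # ≈ 0.9, a table value used later
```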

Example 6.7. What is the probability that a Normal random variable differs from its mean µ by more than σ? More than 2σ? More than 3σ?

In symbols, if X is N(µ, σ^2), we need to compute P(|X − µ| ≥ σ), P(|X − µ| ≥ 2σ), and P(|X − µ| ≥ 3σ).

In this, and all other examples of this type, the letter Z will stand for an N(0, 1) random variable.

We have

P(|X − µ| ≥ σ) = P(|(X − µ)/σ| ≥ 1) = P(|Z| ≥ 1) = 2P(Z ≥ 1) = 2(1 − Φ(1)) ≈ 0.3173.

Similarly,

P(|X − µ| ≥ 2σ) = 2(1 − Φ(2)) ≈ 0.0455,
P(|X − µ| ≥ 3σ) = 2(1 − Φ(3)) ≈ 0.0027.

Example 6.8. Assume that X is Normal with mean µ = 2 and variance σ^2 = 25. Compute the probability that X is between 1 and 4.

Here is the computation:

P(1 ≤ X ≤ 4) = P((1 − 2)/5 ≤ (X − 2)/5 ≤ (4 − 2)/5)
             = P(−0.2 ≤ Z ≤ 0.4)
             = P(Z ≤ 0.4) − P(Z ≤ −0.2)
             = Φ(0.4) − (1 − Φ(0.2))
             ≈ 0.2347.

Let S_n be a Binomial(n, p) random variable. Recall that its mean is np and its variance is np(1 − p). If we pretend that S_n is Normal, then (S_n − np)/√(np(1 − p)) is standard Normal, i.e., N(0, 1). The next theorem says that this is approximately true if p is fixed (e.g., 0.5) while n is large (e.g., n = 100).

Theorem 6.1. De Moivre–Laplace Central Limit Theorem.

Let S_n be Binomial(n, p), where p is fixed and n is large. Then (S_n − np)/√(np(1 − p)) ≈ N(0, 1); more precisely,

P((S_n − np)/√(np(1 − p)) ≤ x) → Φ(x)

as n → ∞, for every real number x.

We should also note that the above theorem is an analytical statement; it says that

Σ_{k: 0 ≤ k ≤ np + x√(np(1−p))} C(n, k) p^k (1 − p)^{n−k} → (1/√(2π)) ∫_{−∞}^{x} e^{−s^2/2} ds

as n → ∞, for every x ∈ R. Indeed it can be, and originally was, proved this way, with a lot of computational work.

An important issue is the quality of the Normal approximation to Binomial. One can prove

that the diﬀerence between the Binomial probability (in the above theorem) and its limit is at

most

1

_

np (1 −p)

.

A commonly cited rule of thumb is that this is a decent approximation when np(1−p) ≥ 10, but

when np(1−p) = 10 the bound above is larger than 0.3! Various corrections have been developed

to diminish the error, but there are in my opinion obsolete by now. In a situation when the error

1/

_

np (1 −p) is too high, you should simply compute directly with Binomial distribution and

not use the Normal approximation. (We will simply assume that the approximation is adequate

in the examples below.) You should also not forget that when n is large and p is small, say

n = 100 and p =

1

100

, the Poisson approximation (with λ = np) is much better!
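The quality of the approximation can also be inspected directly, by comparing the exact Binomial distribution function with Φ; a small sketch (Python, standard library only):

    from math import comb, erf, sqrt

    def Phi(z):
        return 0.5 * (1 + erf(z / sqrt(2)))

    n, p = 100, 0.5
    mu, sd = n * p, sqrt(n * p * (1 - p))
    for k in (45, 50, 55):
        exact = sum(comb(n, j) * p**j * (1 - p)**(n - j) for j in range(k + 1))
        print(k, round(exact, 4), round(Phi((k - mu) / sd), 4))

The two columns agree to well within the bound 1/√(np(1 − p)) above.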

Example 6.9. A roulette wheel has 38 slots: 18 red, 18 black, and 2 green. The ball ends up in one of these at random. Let's say you are a player who plays a large number of games and makes an even bet of $1 on red in every game. After n games, what is the probability that you are ahead? Answer this for n = 100 and n = 1000.

Let S_n be the number of times you win. This is a Binomial(n, 9/19) random variable.

P(ahead) = P(win more than half of the games)
    = P(S_n > n/2)
    = P((S_n − np)/√(np(1 − p)) > (n/2 − np)/√(np(1 − p)))
    ≈ P(Z > (1/2 − p)√n / √(p(1 − p))).

For n = 100, we get

P(Z > 5/√90) ≈ 0.2990,

while for n = 1000, we get

P(Z > 5/3) ≈ 0.0478.

For comparison, the true probabilities are 0.2650 and 0.0448, respectively.
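The "true probabilities" quoted above can be recomputed exactly; a sketch (Python):

    from math import comb
    from fractions import Fraction

    def prob_ahead(n, p=Fraction(9, 19)):
        # exact P(S_n > n/2) for S_n ~ Binomial(n, p)
        return float(sum(comb(n, k) * p**k * (1 - p)**(n - k)
                         for k in range(n // 2 + 1, n + 1)))

    print(prob_ahead(100))    # about 0.2650
    print(prob_ahead(1000))   # about 0.0448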

Example 6.10. What would be the answer to the previous example if the game were fair, i.e., if you bet even money on the outcome of a fair coin toss each time?

Then p = 1/2 and

P(ahead) → P(Z > 0) = 0.5,

as n → ∞.

Example 6.11. How many times do you need to toss a fair coin to get 100 heads with probability 90%?

Let n be the number of tosses that we are looking for. We have S_n, which is Binomial(n, 1/2), and need to find n so that

P(S_n ≥ 100) ≈ 0.9.

We will use below that n > 200, as the probability would be approximately 1/2 for n = 200 (see the previous example). Here is the computation:

P((S_n − n/2)/((1/2)√n) ≥ (100 − n/2)/((1/2)√n)) ≈ P(Z ≥ (100 − n/2)/((1/2)√n))
    = P(Z ≥ (200 − n)/√n)
    = P(Z ≥ −(n − 200)/√n)
    = P(Z ≤ (n − 200)/√n)
    = Φ((n − 200)/√n)
    = 0.9.

Now, according to the tables, Φ(1.28) ≈ 0.9, thus we need to solve (n − 200)/√n = 1.28, or

n − 1.28√n − 200 = 0.

This is a quadratic equation in √n, with the only positive solution

√n = (1.28 + √(1.28² + 800))/2.

Rounding up the number n we get from above, we conclude that n = 219.
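A quick numeric check of the final answer (Python):

    from math import sqrt, comb

    # the positive root of n - 1.28*sqrt(n) - 200 = 0, rounded up
    r = (1.28 + sqrt(1.28**2 + 800)) / 2
    n = int(r**2) + 1
    print(n)                                                   # 219

    # exact Binomial(n, 1/2) tail as a sanity check
    print(sum(comb(n, k) for k in range(100, n + 1)) / 2**n)   # close to 0.9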

Problems

1. A random variable X has density function

f(x) = c(x + √x), x ∈ [0, 1], and 0 otherwise.

(a) Determine c. (b) Compute E(1/X). (c) Determine the probability density function of Y = X².

2. The density function of a random variable X is given by

f(x) = a + bx, 0 ≤ x ≤ 2, and 0 otherwise.

You also know that E(X) = 7/6. (a) Compute a and b. (b) Compute Var(X).

3. After your complaint about their service, a representative of an insurance company promised to call you "between 7 and 9 this evening." Assume that this means that the time T of the call is uniformly distributed in the specified interval.

(a) Compute the probability that the call arrives between 8:00 and 8:20.

(b) At 8:30, the call still hasn't arrived. What is the probability that it arrives in the next 10 minutes?

(c) Assume that you know in advance that the call will last exactly 1 hour. From 9 to 9:30, there is a game show on TV that you wanted to watch. Let M be the amount of time of the show which you miss because of the call. Compute the expected value of M.

4. Toss a fair coin twice. You win $1 if at least one of the two tosses comes out heads.

(a) Assume that you play this game 300 times. What is, approximately, the probability that you win at least $250?

(b) Approximately how many times do you need to play so that you win at least $250 with probability at least 0.99?

5. Roll a die n times, and let M be the number of times you roll 6. Assume that n is large.

(a) Compute the expectation EM.

(b) Write down an approximation, in terms of n and Φ, of the probability that M differs from its expectation by less than 10%.

(c) How large should n be so that the probability in (b) is larger than 0.99?

Solutions

1. (a) As

1 = c ∫_0^1 (x + √x) dx = c(1/2 + 2/3) = (7/6)c,

it follows that c = 6/7.

(b) E(1/X) = (6/7) ∫_0^1 (1/x)(x + √x) dx = 18/7.

(c)

F_Y(y) = P(Y ≤ y) = P(X ≤ √y) = (6/7) ∫_0^{√y} (x + √x) dx,

and so

f_Y(y) = (3/7)(1 + y^{−1/4}) if y ∈ (0, 1), and 0 otherwise.

2. (a) From ∫_0^2 f(x) dx = 1 you get 2a + 2b = 1, and from ∫_0^2 x f(x) dx = 7/6 you get 2a + (8/3)b = 7/6. The two equations give a = b = 1/4.

(b) E(X²) = ∫_0^2 x² f(x) dx = 5/3, and so Var(X) = 5/3 − (7/6)² = 11/36.

3. (a) 1/6.

(b) Let T be the time of the call, from 7pm, in minutes; T is uniform on [0, 120]. P(T ≤ 100 | T ≥ 90) = 1/3.

(c) We have

M = 0 if 0 ≤ T ≤ 60; M = T − 60 if 60 ≤ T ≤ 90; and M = 30 if 90 ≤ T.

Then

EM = (1/120) ∫_{60}^{90} (t − 60) dt + (1/120) ∫_{90}^{120} 30 dt = 11.25.

4. (a) P(win a single game) = 3/4. If you play n times, the number X of games you win is Binomial(n, 3/4). If Z is N(0, 1), then

P(X ≥ 250) ≈ P(Z ≥ (250 − (3/4)n)/√(n · (3/4) · (1/4))).

For (a), n = 300, and the above expression is P(Z ≥ 10/3), which is approximately 1 − Φ(3.33) ≈ 0.0004.

For (b), you need to find n so that the above expression is 0.99, or that

Φ((250 − (3/4)n)/√(n · (3/4) · (1/4))) = 0.01.

The argument must be negative, so that

(250 − (3/4)n)/√(n · (3/4) · (1/4)) = −2.33.

If x = √(3n), this yields

x² − 2.33x − 1000 = 0,

and solving the quadratic equation gives x ≈ 32.81, n ≥ x²/3 ≈ 358.9, so n ≥ 359.

5. (a) M is Binomial(n, 1/6), so EM = n/6.

(b)

P(|M − n/6| < (n/6) · 0.1) ≈ P(|Z| < (n/6) · 0.1 / √(n · (1/6) · (5/6))) = 2Φ(0.1√n/√5) − 1.

(c) The above must be 0.99, and so Φ(0.1√n/√5) = 0.995, 0.1√n/√5 = 2.57, and finally n ≥ 3303.

7 Joint Distributions and Independence

Discrete Case

Assume you have a pair (X, Y) of discrete random variables X and Y. Their joint probability mass function is given by

p(x, y) = P(X = x, Y = y),

so that

P((X, Y) ∈ A) = ∑_{(x,y)∈A} p(x, y).

The marginal probability mass functions are then the p. m. f.'s of X and Y, given by

P(X = x) = ∑_y P(X = x, Y = y) = ∑_y p(x, y),
P(Y = y) = ∑_x P(X = x, Y = y) = ∑_x p(x, y).

Example 7.1. An urn has 2 red, 5 white, and 3 green balls. Select 3 balls at random and let X be the number of red balls and Y the number of white balls. Determine (a) the joint p. m. f. of (X, Y), (b) the marginal p. m. f.'s, (c) P(X ≥ Y), and (d) P(X = 2 | X ≥ Y).

The joint p. m. f. is given by P(X = x, Y = y) for all possible x and y. In our case, x can be 0, 1, or 2 and y can be 0, 1, 2, or 3. The values are given in the table.

y\x          0        1        2      P(Y = y)
0          1/120    6/120    3/120    10/120
1         15/120   30/120    5/120    50/120
2         30/120   20/120      0      50/120
3         10/120      0        0      10/120
P(X = x)  56/120   56/120    8/120       1

The last row and column are the respective column and row sums, and therefore determine the marginal p. m. f.'s. To answer (c), we merely add the relevant probabilities,

P(X ≥ Y) = (1 + 6 + 3 + 30 + 5)/120 = 3/8,

and to answer (d), we compute

P(X = 2, X ≥ Y)/P(X ≥ Y) = (8/120)/(3/8) = 8/45.
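The table can also be generated by brute-force enumeration of all C(10, 3) = 120 equally likely draws; a sketch (Python):

    from itertools import combinations
    from collections import Counter

    balls = ['R'] * 2 + ['W'] * 5 + ['G'] * 3
    joint = Counter()
    for draw in combinations(range(10), 3):
        x = sum(balls[i] == 'R' for i in draw)
        y = sum(balls[i] == 'W' for i in draw)
        joint[(x, y)] += 1          # counts out of 120 equally likely draws

    print(joint[(1, 1)])                                     # 30
    print(sum(c for (x, y), c in joint.items() if x >= y))   # 45, so P(X >= Y) = 3/8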

Two random variables are independent if

P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B)

for all intervals A and B. In the discrete case, X and Y are independent exactly when

P(X = x, Y = y) = P(X = x)P(Y = y)

for all possible values x and y of X and Y, that is, when the joint p. m. f. is the product of the marginal p. m. f.'s.

Example 7.2. In the previous example, are X and Y independent?

No, and those 0's in the table are dead giveaways. For example, P(X = 2, Y = 2) = 0, but neither P(X = 2) nor P(Y = 2) is 0.

Example 7.3. Most often, independence is an assumption. For example, roll a die twice and let X be the number on the first roll and Y the number on the second roll. Then X and Y are independent: we are used to assuming that all 36 outcomes of the two rolls are equally likely, which is the same as assuming that the two random variables are discrete uniform (on {1, 2, ..., 6}) and independent.

Continuous Case

We say that (X, Y) is a jointly continuous pair of random variables if there exists a joint density f(x, y) ≥ 0 so that

P((X, Y) ∈ S) = ∬_S f(x, y) dx dy,

where S is some nice (say, open or closed) subset of R².

Example 7.4. Let (X, Y) be a random point on S, where S is a compact (that is, closed and bounded) subset of R². What this means is that

f(x, y) = 1/area(S) if (x, y) ∈ S, and 0 otherwise.

The simplest example is a square of side length 1:

f(x, y) = 1 if 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1, and 0 otherwise.

Example 7.5. Let

f(x, y) = c x² y if x² ≤ y ≤ 1, and 0 otherwise.

Determine (a) the constant c, (b) P(X ≥ Y), (c) P(X = Y), and (d) P(X = 2Y).

For (a),

∫_{−1}^1 dx ∫_{x²}^1 c x² y dy = c · (4/21) = 1,

and so c = 21/4.

For (b), let S be the region between the graphs y = x² and y = x, for x ∈ (0, 1). Then

P(X ≥ Y) = P((X, Y) ∈ S) = ∫_0^1 dx ∫_{x²}^x (21/4) x² y dy = 3/20.

Both probabilities in (c) and (d) are 0 because a two-dimensional integral over a line is 0.

If f is the joint density of (X, Y), then the two marginal densities, which are the densities of X and Y, are computed by integrating out the other variable:

f_X(x) = ∫_{−∞}^∞ f(x, y) dy,   f_Y(y) = ∫_{−∞}^∞ f(x, y) dx.

Indeed, for an interval A, X ∈ A means that (X, Y) ∈ S, where S = A × R, and therefore

P(X ∈ A) = ∫_A dx ∫_{−∞}^∞ f(x, y) dy.

The marginal density formulas now follow from the definition of density. With some advanced calculus expertise, the following can be checked.

Two jointly continuous random variables X and Y are independent exactly when the joint density is the product of the marginal ones,

f(x, y) = f_X(x) · f_Y(y),

for all x and y.

Example 7.6. Previous example, continued. Compute the marginal densities and determine whether X and Y are independent.

We have

f_X(x) = ∫_{x²}^1 (21/4) x² y dy = (21/8) x² (1 − x⁴)

for x ∈ [−1, 1], and 0 otherwise. Moreover,

f_Y(y) = ∫_{−√y}^{√y} (21/4) x² y dx = (7/2) y^{5/2},

where y ∈ [0, 1], and 0 otherwise. The two random variables X and Y are clearly not independent, because f(x, y) ≠ f_X(x) f_Y(y).

Example 7.7. Let (X, Y) be a random point in the square of side length 1 with bottom left corner at the origin. Are X and Y independent?

Here

f(x, y) = 1 if (x, y) ∈ [0, 1] × [0, 1], and 0 otherwise.

The marginal densities are

f_X(x) = 1

if x ∈ [0, 1], and

f_Y(y) = 1

if y ∈ [0, 1], and 0 otherwise. Therefore, X and Y are independent.

Example 7.8. Let (X, Y) be a random point in the triangle {(x, y) : 0 ≤ y ≤ x ≤ 1}. Are X and Y independent?

Now

f(x, y) = 2 if 0 ≤ y ≤ x ≤ 1, and 0 otherwise.

The marginal densities are

f_X(x) = 2x

if x ∈ [0, 1], and

f_Y(y) = 2(1 − y)

if y ∈ [0, 1], and 0 otherwise. So X and Y are no longer uniformly distributed and no longer independent.

We can make a more general conclusion from the last two examples. Assume that (X, Y) is a jointly continuous pair of random variables, uniform on a compact set S ⊂ R². If they are to be independent, their marginal densities have to be constant, thus uniform on some sets, say A and B, and then S = A × B. (If A and B are both intervals, then S = A × B is a rectangle, which is the most common example of independence.)

Example 7.9. Mr. and Mrs. Smith agree to meet at a specified location "between 5 and 6 p.m." Assume that they both arrive there at a random time between 5 and 6, and that their arrivals are independent. (a) Find the density for the time one of them will have to wait for the other. (b) Mrs. Smith later tells you she had to wait; given this information, compute the probability that Mr. Smith arrived before 5:30.

Let X be the time when Mr. Smith arrives and Y the time when Mrs. Smith arrives, with the time unit 1 hour. The assumptions imply that (X, Y) is uniform on [0, 1] × [0, 1].

For (a), let T = |X − Y|, which has possible values in [0, 1]. So fix t ∈ [0, 1] and compute (drawing a picture will also help)

P(T ≤ t) = P(|X − Y| ≤ t) = P(−t ≤ X − Y ≤ t) = P(X − t ≤ Y ≤ X + t) = 1 − (1 − t)² = 2t − t²,

and so

f_T(t) = 2 − 2t.

For (b), we need to compute

P(X ≤ 0.5 | X > Y) = P(X ≤ 0.5, X > Y)/P(X > Y) = (1/8)/(1/2) = 1/4.
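A Monte Carlo check of both answers (Python):

    import random

    random.seed(1)
    n = 10**6
    wait, cond = 0, 0
    for _ in range(n):
        x, y = random.random(), random.random()   # the two arrival times
        if abs(x - y) <= 0.25:
            wait += 1
        if y < x <= 0.5:
            cond += 1
    print(wait / n)        # P(T <= 0.25) = 2(0.25) - 0.25^2 = 0.4375
    print(cond / n / 0.5)  # P(X <= 0.5 | X > Y), close to 1/4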

Example 7.10. Assume X and Y are independent, that X is uniform on [0, 1], and that Y has density f_Y(y) = 2y for y ∈ [0, 1] and 0 elsewhere. Compute P(X + Y ≤ 1).

The assumptions determine the joint density of (X, Y):

f(x, y) = 2y if (x, y) ∈ [0, 1] × [0, 1], and 0 otherwise.

To compute the probability in question, we compute either

∫_0^1 dx ∫_0^{1−x} 2y dy   or   ∫_0^1 dy ∫_0^{1−y} 2y dx,

whichever double integral is easier. The answer is 1/3.

Example 7.11. Assume you are waiting for two phone calls, from Alice and from Bob. The waiting time T₁ for Alice's call has expectation 10 minutes, and the waiting time T₂ for Bob's call has expectation 40 minutes. Assume T₁ and T₂ are independent exponential random variables. What is the probability that Alice's call will come first?

We need to compute P(T₁ < T₂). Assuming our unit is 10 minutes, we have, for t₁, t₂ > 0,

f_{T₁}(t₁) = e^{−t₁}   and   f_{T₂}(t₂) = (1/4) e^{−t₂/4},

so that the joint density is

f(t₁, t₂) = (1/4) e^{−t₁ − t₂/4},

for t₁, t₂ ≥ 0. Therefore,

P(T₁ < T₂) = ∫_0^∞ dt₁ ∫_{t₁}^∞ (1/4) e^{−t₁ − t₂/4} dt₂ = ∫_0^∞ e^{−t₁} e^{−t₁/4} dt₁ = ∫_0^∞ e^{−5t₁/4} dt₁ = 4/5.
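The answer is easy to corroborate by simulation (Python):

    import random

    random.seed(1)
    n = 10**6
    # time unit of 10 minutes: E T1 = 1 and E T2 = 4, so the rates are 1 and 1/4
    wins = sum(random.expovariate(1.0) < random.expovariate(0.25) for _ in range(n))
    print(wins / n)   # close to 4/5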

Example 7.12. Buffon needle problem. Parallel lines at distance 1 are drawn on a large sheet of paper. Drop a needle of length ℓ onto the sheet. Compute the probability that it intersects one of the lines.

Let D be the distance from the center of the needle to the nearest line, and Θ the acute angle relative to the lines. We will, reasonably, assume that D and Θ are independent and uniform on their respective intervals 0 ≤ D ≤ 1/2 and 0 ≤ Θ ≤ π/2. Then

P(the needle intersects a line) = P(D/sin Θ < ℓ/2) = P(D < (ℓ/2) sin Θ).

Case 1: ℓ ≤ 1. Then the probability equals

(∫_0^{π/2} (ℓ/2) sin θ dθ) / (π/4) = (ℓ/2)/(π/4) = 2ℓ/π.

When ℓ = 1, you famously get 2/π, which can be used to get (very poor) approximations for π.

Case 2: ℓ > 1. Now the curve d = (ℓ/2) sin θ intersects d = 1/2 at θ = arcsin(1/ℓ). Then the probability equals

(4/π) [ (ℓ/2) ∫_0^{arcsin(1/ℓ)} sin θ dθ + (π/2 − arcsin(1/ℓ)) · (1/2) ]
    = (4/π) [ ℓ/2 − (1/2)√(ℓ² − 1) + π/4 − (1/2) arcsin(1/ℓ) ].
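A Monte Carlo version of the experiment (Python):

    import random, math

    random.seed(1)

    def buffon(ell, n=10**6):
        hits = 0
        for _ in range(n):
            d = random.uniform(0, 0.5)               # distance to the nearest line
            theta = random.uniform(0, math.pi / 2)   # acute angle to the lines
            if d < (ell / 2) * math.sin(theta):
                hits += 1
        return hits / n

    print(buffon(1.0), 2 / math.pi)   # both about 0.6366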

A similar approach works for cases of more than two random variables. Let's do a single example for illustration.

Example 7.13. Assume X₁, X₂, X₃ are uniform on [0, 1] and independent. What is P(X₁ + X₂ + X₃ ≤ 1)?

The joint density now is

f_{(X₁,X₂,X₃)}(x₁, x₂, x₃) = f_{X₁}(x₁) f_{X₂}(x₂) f_{X₃}(x₃) = 1 if (x₁, x₂, x₃) ∈ [0, 1]³, and 0 otherwise.

Here then is how we get the answer:

P(X₁ + X₂ + X₃ ≤ 1) = ∫_0^1 dx₁ ∫_0^{1−x₁} dx₂ ∫_0^{1−x₁−x₂} dx₃ = 1/6.

In general, if X₁, ..., X_n are independent and uniform on [0, 1],

P(X₁ + ... + X_n ≤ 1) = 1/n!.
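Again, a simulation makes the 1/n! pattern visible (Python):

    import random, math

    random.seed(1)
    trials = 10**6
    for n in (2, 3, 4):
        hits = sum(sum(random.random() for _ in range(n)) <= 1
                   for _ in range(trials))
        print(n, hits / trials, 1 / math.factorial(n))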

Conditional distributions

The conditional p. m. f. of X given Y = y is, in the discrete case, given simply by

p_X(x | Y = y) = P(X = x | Y = y) = P(X = x, Y = y)/P(Y = y).

This is trickier in the continuous case, as we cannot divide by P(Y = y) = 0.

For a jointly continuous pair of random variables X and Y, we define the conditional density of X given Y = y as follows:

f_X(x | Y = y) = f(x, y)/f_Y(y),

where f(x, y) is, of course, the joint density of (X, Y).

Observe that when f_Y(y) = 0, we have ∫_{−∞}^∞ f(x, y) dx = 0, and so f(x, y) = 0 for every x. So we have a 0/0 situation, which we define to be 0.

Here is a "physicist's proof" of why this should be the conditional density formula:

P(X = x + dx | Y = y + dy) = P(X = x + dx, Y = y + dy)/P(Y = y + dy)
    = f(x, y) dx dy / (f_Y(y) dy)
    = (f(x, y)/f_Y(y)) dx
    = f_X(x | Y = y) dx.

Example 7.14. Let (X, Y) be a random point in the triangle {(x, y) : x, y ≥ 0, x + y ≤ 1}. Compute f_X(x | Y = y).

The joint density f(x, y) equals 2 on the triangle. For a given y ∈ [0, 1], we know that if Y = y, then X is between 0 and 1 − y. Moreover,

f_Y(y) = ∫_0^{1−y} 2 dx = 2(1 − y).

Therefore,

f_X(x | Y = y) = 1/(1 − y) if 0 ≤ x ≤ 1 − y, and 0 otherwise.

In other words, given Y = y, X is distributed uniformly on [0, 1 − y], which is hardly surprising.

Example 7.15. Suppose (X, Y) has joint density

f(x, y) = (21/4) x² y if x² ≤ y ≤ 1, and 0 otherwise.

Compute f_X(x | Y = y).

We compute first

f_Y(y) = (21/4) y ∫_{−√y}^{√y} x² dx = (7/2) y^{5/2},

for y ∈ [0, 1]. Then

f_X(x | Y = y) = ((21/4) x² y) / ((7/2) y^{5/2}) = (3/2) x² y^{−3/2},

where −√y ≤ x ≤ √y.

Suppose we are asked to compute P(X ≥ Y | Y = y). This makes no literal sense because the probability P(Y = y) of the condition is 0. We reinterpret this expression as

P(X ≥ y | Y = y) = ∫_y^∞ f_X(x | Y = y) dx,

which equals

∫_y^{√y} (3/2) x² y^{−3/2} dx = (1/2) y^{−3/2} (y^{3/2} − y³) = (1/2)(1 − y^{3/2}).

Problems

1. Let (X, Y) be a random point in the square {(x, y) : −1 ≤ x, y ≤ 1}. Compute the conditional probability P(X ≥ 0 | Y ≤ 2X). (It may be a good idea to draw a nice picture and use elementary geometry, rather than calculus.)

2. Roll a fair die 3 times. Let X be the number of 6's obtained, and Y the number of 5's. (a) Compute the joint probability mass function of X and Y. (b) Are X and Y independent?

3. X and Y are independent random variables, and they have the same density function

f(x) = c(2 − x), x ∈ (0, 1), and 0 otherwise.

(a) Determine c. (b) Compute P(Y ≤ 2X) and P(Y < 2X).

4. Let X and Y be independent random variables, both uniformly distributed on [0, 1]. Let Z = min(X, Y) be the smaller value of the two.

(a) Compute the density function of Z.

(b) Compute P(X ≤ 0.5 | Z ≤ 0.5).

(c) Are X and Z independent?

5. The joint density of (X, Y) is given by

f(x, y) = 3x if 0 ≤ y ≤ x ≤ 1, and 0 otherwise.

(a) Compute the conditional density of Y given X = x.

(b) Are X and Y independent?

Solutions to problems

1. After noting the relevant areas,

P(X ≥ 0 | Y ≤ 2X) = P(X ≥ 0, Y ≤ 2X)/P(Y ≤ 2X) = [(1/4)(2 − (1/2) · (1/2) · 1)] / (1/2) = 7/8.

2. (a) The joint p. m. f. is given by the table

y\x          0        1        2       3     P(Y = y)
0         64/216   48/216   12/216   1/216   125/216
1         48/216   24/216    3/216     0      75/216
2         12/216    3/216      0       0      15/216
3          1/216      0        0       0       1/216
P(X = x) 125/216   75/216   15/216   1/216       1

Alternatively, for x, y = 0, 1, 2, 3 and x + y ≤ 3,

P(X = x, Y = y) = C(3, x) C(3 − x, y) (1/6)^{x+y} (4/6)^{3−x−y}.

(b) No. P(X = 3, Y = 3) = 0 and P(X = 3)P(Y = 3) ≠ 0.

3. (a) From

c ∫_0^1 (2 − x) dx = 1

it follows that c = 2/3.

(b) We have

P(Y ≤ 2X) = P(Y < 2X)
    = ∫_0^1 dy ∫_{y/2}^1 (4/9)(2 − x)(2 − y) dx
    = (4/9) ∫_0^1 (2 − y) dy ∫_{y/2}^1 (2 − x) dx
    = (4/9) ∫_0^1 (2 − y) [2(1 − y/2) − (1/2)(1 − y²/4)] dy
    = (4/9) ∫_0^1 (2 − y)(3/2 − y + y²/8) dy
    = (4/9) ∫_0^1 (3 − (7/2)y + (5/4)y² − y³/8) dy
    = (4/9)(3 − 7/4 + 5/12 − 1/32) = 157/216.

4. (a) For z ∈ [0, 1],

P(Z ≤ z) = 1 − P(both X and Y are above z) = 1 − (1 − z)² = 2z − z²,

so that

f_Z(z) = 2(1 − z)

for z ∈ [0, 1], and 0 otherwise.

(b) From (a), we conclude that P(Z ≤ 0.5) = 3/4, and P(X ≤ 0.5, Z ≤ 0.5) = P(X ≤ 0.5) = 1/2, so the answer is 2/3.

(c) No: Z ≤ X.

5. (a) Assume x ∈ [0, 1]. As

f_X(x) = ∫_0^x 3x dy = 3x²,

we have

f_Y(y | X = x) = f(x, y)/f_X(x) = 3x/(3x²) = 1/x

for 0 ≤ y ≤ x. In other words, given X = x, Y is uniform on [0, x].

(b) As the answer in (a) depends on x, the two random variables are not independent.

Interlude: Practice Midterm 2

This practice exam covers the material from Chapters 5 through 7. Give yourself 50 minutes to solve the four problems, which you may assume have equal point score.

1. A random variable X has density function

f(x) = c(x + x²), x ∈ [0, 1], and 0 otherwise.

(a) Determine c.

(b) Compute E(1/X).

(c) Determine the probability density function of Y = X².

2. A certain country holds a presidential election, with two candidates running for the office. Not satisfied by their choice, each voter casts a vote independently at random, based on the outcome of a fair coin flip. At the end, there are 4,000,000 valid votes, as well as 20,000 invalid votes.

(a) Using a relevant approximation, compute the probability that in the final count of valid votes the numbers for the two candidates will differ by less than 1000 votes.

(b) Each invalid vote is double-checked independently with probability 1/5000. Using a relevant approximation, compute the probability that at least 3 invalid votes are double-checked.

3. Throw a fair coin 5 times. Let X be the total number of heads among the first three tosses, and Y the total number of heads among the last three tosses. (Note that if the third toss comes out heads, it is counted both into X and into Y.)

(a) Write down the joint probability mass function of X and Y.

(b) Are X and Y independent? Explain.

(c) Compute the conditional probability P(X ≥ 2 | X ≥ Y).

4. Every working day, John comes to the bus stop exactly at 7am. He takes the first bus that arrives. The arrival of the first bus is an exponential random variable with expectation 20 minutes.

Also every working day, and independently, Mary comes to the same bus stop at a random time, uniformly distributed between 7 and 7:30.

(a) What is the probability that John will wait tomorrow for more than 30 minutes?

(b) Assume day-to-day independence. Consider Mary late if she comes after 7:20. What is the probability that Mary will be late on 2 or more working days among the next 10 working days?

(c) What is the probability that John and Mary will meet at the station tomorrow?

Solutions to Practice Midterm 2

1. A random variable X has density function

f(x) = c(x + x²), x ∈ [0, 1], and 0 otherwise.

(a) Determine c.

Solution:

Since

1 = c ∫_0^1 (x + x²) dx = c(1/2 + 1/3) = (5/6)c,

we get c = 6/5.

(b) Compute E(1/X).

Solution:

E(1/X) = (6/5) ∫_0^1 (1/x)(x + x²) dx = (6/5) ∫_0^1 (1 + x) dx = (6/5)(1 + 1/2) = 9/5.

(c) Determine the probability density function of Y = X².

Solution:

The values of Y are in [0, 1], so we will assume that y ∈ [0, 1]. Then

F_Y(y) = P(Y ≤ y) = P(X² ≤ y) = P(X ≤ √y) = ∫_0^{√y} (6/5)(x + x²) dx,

and so

f_Y(y) = (d/dy) F_Y(y) = (6/5)(√y + y) · 1/(2√y) = (3/5)(1 + √y).

2. A certain country holds a presidential election, with two candidates running for the office. Not satisfied by their choice, each voter casts a vote independently at random, based on the outcome of a fair coin flip. At the end, there are 4,000,000 valid votes, as well as 20,000 invalid votes.

(a) Using a relevant approximation, compute the probability that in the final count of valid votes the numbers for the two candidates will differ by less than 1000 votes.

Solution:

Let S_n be the vote count for candidate 1. Thus S_n is Binomial(n, p), where n = 4,000,000 and p = 1/2. Then n − S_n is the vote count for candidate 2.

P(|S_n − (n − S_n)| ≤ 1000) = P(−1000 ≤ 2S_n − n ≤ 1000)
    = P(−500/√(n · (1/2) · (1/2)) ≤ (S_n − n/2)/√(n · (1/2) · (1/2)) ≤ 500/√(n · (1/2) · (1/2)))
    ≈ P(−0.5 ≤ Z ≤ 0.5)
    = P(Z ≤ 0.5) − P(Z ≤ −0.5)
    = P(Z ≤ 0.5) − (1 − P(Z ≤ 0.5))
    = 2P(Z ≤ 0.5) − 1
    = 2Φ(0.5) − 1
    ≈ 2 · 0.6915 − 1
    = 0.383.

(b) Each invalid vote is double-checked independently with probability 1/5000. Using a relevant approximation, compute the probability that at least 3 invalid votes are double-checked.

Solution:

Now, let S_n be the number of double-checked votes, which is Binomial(20000, 1/5000) and thus approximately Poisson(4). Then

P(S_n ≥ 3) = 1 − P(S_n = 0) − P(S_n = 1) − P(S_n = 2) ≈ 1 − e^{−4} − 4e^{−4} − (4²/2)e^{−4} = 1 − 13e^{−4}.

3. Throw a fair coin 5 times. Let X be the total number of heads among the first three tosses, and Y the total number of heads among the last three tosses. (Note that if the third toss comes out heads, it is counted both into X and into Y.)

(a) Write down the joint probability mass function of X and Y.

Solution:

P(X = x, Y = y) is given by the table

x\y    0      1      2      3
0    1/32   2/32   1/32    0
1    2/32   5/32   4/32   1/32
2    1/32   4/32   5/32   2/32
3     0     1/32   2/32   1/32

To compute these, observe that the number of outcomes is 2⁵ = 32, and then

P(X = 2, Y = 1) = P(X = 2, Y = 1, 3rd coin Heads) + P(X = 2, Y = 1, 3rd coin Tails) = 2/32 + 2/32 = 4/32,
P(X = 2, Y = 2) = (2 · 2)/32 + 1/32 = 5/32,
P(X = 1, Y = 1) = (2 · 2)/32 + 1/32 = 5/32,

etc.

(b) Are X and Y independent? Explain.

Solution:

No,

P(X = 0, Y = 3) = 0 ≠ (1/8) · (1/8) = P(X = 0)P(Y = 3).

(c) Compute the conditional probability P(X ≥ 2 | X ≥ Y).

Solution:

P(X ≥ 2 | X ≥ Y) = P(X ≥ 2, X ≥ Y)/P(X ≥ Y) = (1 + 4 + 5 + 1 + 2 + 1)/(1 + 2 + 1 + 5 + 4 + 1 + 5 + 2 + 1) = 14/22 = 7/11.

4. Every working day, John comes to the bus stop exactly at 7am. He takes the first bus that arrives. The arrival of the first bus is an exponential random variable with expectation 20 minutes.

Also every working day, and independently, Mary comes to the same bus stop at a random time, uniformly distributed between 7 and 7:30.

(a) What is the probability that John will wait tomorrow for more than 30 minutes?

Solution:

Assume the time unit is 10 minutes. Let T be the arrival time of the bus. It is Exponential with parameter λ = 1/2. Then

f_T(t) = (1/2) e^{−t/2},

for t ≥ 0, and

P(T ≥ 3) = e^{−3/2}.

(b) Assume day-to-day independence. Consider Mary late if she comes after 7:20. What is the probability that Mary will be late on 2 or more working days among the next 10 working days?

Solution:

Let X be Mary's arrival time. It is uniform on [0, 3]. Therefore,

P(X ≥ 2) = 1/3.

The number of late days among 10 days is Binomial(10, 1/3), and therefore

P(2 or more late working days among 10 working days) = 1 − P(0 late) − P(1 late) = 1 − (2/3)^{10} − 10 · (1/3) · (2/3)⁹.

(c) What is the probability that John and Mary will meet at the station tomorrow?

Solution:

They meet exactly when Mary arrives before the bus, i.e., when X ≤ T. We have

f_{(T,X)}(t, x) = (1/6) e^{−t/2},

for x ∈ [0, 3] and t ≥ 0. Therefore,

P(X ≤ T) = (1/3) ∫_0^3 dx ∫_x^∞ (1/2) e^{−t/2} dt = (1/3) ∫_0^3 e^{−x/2} dx = (2/3)(1 − e^{−3/2}).

8 More on Expectation and Limit Theorems

If you have a pair of random variables (X, Y) with joint density f, and you take another function g of two variables, then

E g(X, Y) = ∬ g(x, y) f(x, y) dx dy,

while if instead (X, Y) is a discrete pair with joint probability mass function p,

E g(X, Y) = ∑_{x,y} g(x, y) p(x, y).

Example 8.1. Assume that two among 5 items are defective. Put the items in a random order and inspect them one by one. Let X be the number of inspections needed to find the first defective item, and Y the number of additional inspections needed to find the second defective item. Compute E|X − Y|.

The joint p. m. f. of (X, Y) is given by the following table, which lists P(X = i, Y = j), together with |i − j| in brackets whenever this probability is nonzero, for i, j = 1, 2, 3, 4:

i\j     1        2        3        4
1    .1 (0)   .1 (1)   .1 (2)   .1 (3)
2    .1 (1)   .1 (0)   .1 (1)     0
3    .1 (2)   .1 (1)     0        0
4    .1 (3)     0        0        0

The answer is therefore E|X − Y| = 1.4.

Example 8.2. Assume that (X, Y) is a random point in the right triangle {(x, y) : x, y ≥ 0, x + y ≤ 1}. Compute EX, EY, and E(XY).

Note that the density is 2 on the triangle, and so

EX = ∫_0^1 dx ∫_0^{1−x} 2x dy = ∫_0^1 2x(1 − x) dx = 2(1/2 − 1/3) = 1/3,

and therefore, by symmetry, EY = EX = 1/3. Furthermore,

E(XY) = ∫_0^1 dx ∫_0^{1−x} 2xy dy = ∫_0^1 2x · (1 − x)²/2 dx = ∫_0^1 x²(1 − x) dx (substituting 1 − x for x) = 1/3 − 1/4 = 1/12.

Linearity and monotonicity of expectation

Theorem 8.1. Expectation is linear and monotone:

1. For constants a and b, E(aX + b) = aE(X) + b.

2. For arbitrary random variables X₁, ..., X_n whose expected values exist,

E(X₁ + ... + X_n) = E(X₁) + ... + E(X_n).

3. For two random variables X ≤ Y, we have EX ≤ EY.

Proof. We will check the second property for n = 2 and the continuous case, that is, that

E(X + Y) = EX + EY.

This is a consequence of the same property for two-dimensional integrals:

∬ (x + y) f(x, y) dx dy = ∬ x f(x, y) dx dy + ∬ y f(x, y) dx dy.

To prove this property for arbitrary n (and the continuous case), one could simply proceed by induction.

The way we defined expectation, the third property is not absolutely obvious. However, it is clear that Z ≥ 0 implies EZ ≥ 0, and applying this to Z = Y − X, together with linearity, does establish the monotonicity.

It should be emphasized again that linearity holds for arbitrary random variables, which do not need to be independent! This is very useful, as we can often write a random variable X as a sum of (possibly dependent) indicators, X = I₁ + ... + I_n. An instance of this method is called the indicator trick.

Example 8.3. Assume that an urn contains 10 black, 7 red, and 5 white balls. Take 5 balls out (a) with and (b) without replacement, and let X be the number of red balls pulled out. Compute EX.

Let I_i be the indicator of the event that the i'th ball is red, that is,

I_i = I_{ith ball is red} = 1 if the ith ball is red, and 0 otherwise.

In both cases, then, X = I₁ + I₂ + I₃ + I₄ + I₅.

In part (a), X is Binomial(5, 7/22), so we know that EX = 5 · (7/22), but we will not use this knowledge. Instead, it is clear that

EI₁ = 1 · P(1st ball is red) = 7/22 = EI₂ = ... = EI₅.

Therefore, by additivity, EX = 5 · (7/22).

For part (b), one way is to compute the p. m. f. of X,

P(X = i) = C(7, i) C(15, 5 − i) / C(22, 5),

where i = 0, 1, ..., 5, and then

EX = ∑_{i=0}^5 i · C(7, i) C(15, 5 − i) / C(22, 5).

However, the indicator trick works exactly as before (the fact that the I_i are now dependent does not matter), and so the answer is also exactly the same, EX = 5 · (7/22).
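The "hard way" and the indicator trick can be checked against each other exactly (Python):

    from math import comb
    from fractions import Fraction

    # without replacement: EX = sum_i i * C(7,i) C(15,5-i) / C(22,5)
    EX = sum(Fraction(i * comb(7, i) * comb(15, 5 - i), comb(22, 5))
             for i in range(6))
    print(EX, Fraction(5 * 7, 22))   # both equal 35/22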

Example 8.4. Matching problem, revisited. Assume n people buy n gifts, which are then assigned at random, and let X be the number of people who receive their own gift. What is EX?

This is another problem very well suited for the indicator trick. Let

I_i = I_{person i receives own gift}.

Then

X = I₁ + I₂ + ... + I_n.

Moreover,

EI_i = 1/n,

for all i, and so

EX = 1.

Example 8.5. Five married couples are seated around a table at random. Let X be the number of wives who sit next to their husbands. What is EX?

Now let

I_i = I_{wife i sits next to her husband}.

Then

X = I₁ + ... + I₅,

and

EI_i = 2/9,

so that

EX = 10/9.

Example 8.6. Coupon collector problem, revisited. Sample from n cards, with replacement, indefinitely. Let N be the number of cards you need to sample for a complete collection, i.e., so that all different cards are represented. What is EN?

Let N_i be the number of additional cards you need to get the i'th new card after you have received the (i − 1)'st new card.

Then N₁, the number of cards needed to receive the first new card, is trivial, as the first card you buy is new: N₁ = 1. Afterwards, N₂, the number of additional cards needed to get the second new card, is Geometric with success probability (n − 1)/n. After that, N₃, the number of additional cards needed to get the third new card, is Geometric with success probability (n − 2)/n. In general, N_i is Geometric with success probability (n − i + 1)/n, i = 1, ..., n, and

N = N₁ + ... + N_n,

so that

EN = n(1 + 1/2 + 1/3 + ... + 1/n).

Now we have

∑_{i=2}^n 1/i ≤ ∫_1^n (1/x) dx ≤ ∑_{i=1}^{n−1} 1/i,

by comparing the integral with the Riemann sums at the left and right endpoints in the division of [1, n] into [1, 2], [2, 3], ..., [n − 1, n], and so

log n ≤ ∑_{i=1}^n 1/i ≤ log n + 1,

which establishes the limit

lim_{n→∞} EN/(n log n) = 1.
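The formula EN = n(1 + 1/2 + ... + 1/n) is easy to test by simulation (Python):

    import random

    random.seed(1)

    def collect(n):
        seen, count = set(), 0
        while len(seen) < n:
            seen.add(random.randrange(n))   # buy one card, uniformly at random
            count += 1
        return count

    n, reps = 50, 2000
    avg = sum(collect(n) for _ in range(reps)) / reps
    print(avg, n * sum(1 / i for i in range(1, n + 1)))   # both around 225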

Example 8.7. Assume that an urn contains 10 black, 7 red, and 5 white balls. Pull 5 balls out (a) with replacement and (b) without replacement, and let W be the number of white balls pulled out and Y the number of different colors. Compute EW and EY.

We already know that

EW = 5 · (5/22)

in either case.

Let I_b, I_r, and I_w be the indicators of the events that, respectively, black, red, and white balls are represented. Clearly,

Y = I_b + I_r + I_w,

and so, in the case with replacement,

EY = (1 − 12⁵/22⁵) + (1 − 15⁵/22⁵) + (1 − 17⁵/22⁵),

while in the case without replacement,

EY = (1 − C(12, 5)/C(22, 5)) + (1 − C(15, 5)/C(22, 5)) + (1 − C(17, 5)/C(22, 5)).

Expectation and independence

Theorem 8.2. Multiplicativity of expectation for independent factors.

The expectation of a product of independent random variables is the product of their expectations; i.e., if X and Y are independent,

E[g(X)h(Y)] = E g(X) · E h(Y).

Proof. For the continuous case,

E[g(X)h(Y)] = ∬ g(x)h(y) f(x, y) dx dy
    = ∬ g(x)h(y) f_X(x) f_Y(y) dx dy
    = ∫ g(x) f_X(x) dx · ∫ h(y) f_Y(y) dy
    = E g(X) · E h(Y).

Example 8.8. Let's return to the random point (X, Y) in the triangle {(x, y) : x, y ≥ 0, x + y ≤ 1}. We computed that E(XY) = 1/12 and EX = EY = 1/3. The two random variables have E(XY) ≠ EX · EY, so they cannot be independent. Of course, we knew already that they are not independent.

Now, let's pick a random point (X, Y) on the square {(x, y) : 0 ≤ x, y ≤ 1} instead. Now X and Y are independent, and therefore E(XY) = EX · EY = 1/4.

Finally, pick a random point (X, Y) in the square rotated by 45 degrees, with corners at (0, 1), (1, 0), (0, −1), and (−1, 0). Clearly we have, by symmetry,

EX = EY = 0,

but also

E(XY) = (1/2) ∫_{−1}^1 dx ∫_{−1+|x|}^{1−|x|} xy dy = (1/2) ∫_{−1}^1 x dx ∫_{−1+|x|}^{1−|x|} y dy = (1/2) ∫_{−1}^1 x · 0 dx = 0.

This is an example where E(XY) = EX · EY even though X and Y are not independent.

Computing expectation by conditioning

For a pair of random variables (X, Y), define the conditional expectation of Y given X = x by

E(Y | X = x) = ∑_y y P(Y = y | X = x)   (discrete case),
E(Y | X = x) = ∫ y f_Y(y | X = x) dy   (continuous case).

Observe that E(Y | X = x) is a function of x; let's call it g(x) for just a moment. Then we denote g(X) by E(Y | X). This is the expectation of Y provided we know what X is; note again that this is an expression dependent on X, and so we can compute its expectation. Here is what we get.

Theorem 8.3. Tower property.

The formula E(E(Y | X)) = EY holds; less mysteriously, in the discrete case

EY = ∑_x E(Y | X = x) P(X = x),

and in the continuous case,

EY = ∫ E(Y | X = x) f_X(x) dx.

Proof. To verify this in the discrete case, we write out the expectation inside the sum:

∑_x ∑_y y P(Y = y | X = x) P(X = x) = ∑_x ∑_y y [P(X = x, Y = y)/P(X = x)] P(X = x)
    = ∑_x ∑_y y P(X = x, Y = y)
    = EY.

Example 8.9. Once again, consider a random point (X, Y) in the triangle {(x, y) : x, y ≥ 0, x + y ≤ 1}. Given that X = x, Y is distributed uniformly on [0, 1 − x], and so

E(Y | X = x) = (1/2)(1 − x).

So E(Y | X) = (1/2)(1 − X), and the expectation of (1/2)(1 − X) must therefore equal the expectation of Y, and indeed it does, as EX = EY = 1/3, as we know.

Example 8.10. Roll a die and then toss as many coins as shown on the die. Compute the expected number of heads.

Let X be the number on the die, x = 1, 2, ..., 6, and Y the number of heads. Fix an x ∈ {1, 2, ..., 6}. Given that X = x, Y is Binomial(x, 1/2). In particular,

E(Y | X = x) = x · (1/2),

and therefore,

E(number of heads) = E[Y] = ∑_{x=1}^6 x · (1/2) · P(X = x) = ∑_{x=1}^6 x · (1/2) · (1/6) = 7/4.
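A direct simulation of the two-stage experiment agrees (Python):

    import random

    random.seed(1)
    reps = 10**6
    total = 0
    for _ in range(reps):
        x = random.randint(1, 6)                               # the die roll
        total += sum(random.random() < 0.5 for _ in range(x))  # x coin tosses
    print(total / reps)   # close to 7/4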

Example 8.11. Here is another job interview question. You die and are presented with three doors. One of them leads to heaven, one leads to one day in purgatory, and one leads to two days in purgatory. After your stay in purgatory is over, you go back to the doors and pick again, but the doors are reshuffled each time you come back, so you must in fact choose a door at random each time. How long is your expected stay in purgatory?

Code the doors 0, 1, and 2, with the obvious meaning, and let N be the number of days in purgatory. Then

E(N | your first pick is door 0) = 0,
E(N | your first pick is door 1) = 1 + EN,
E(N | your first pick is door 2) = 2 + EN.

Therefore,

EN = (1 + EN) · (1/3) + (2 + EN) · (1/3),

and solving this equation gives

EN = 3.

Covariance

Let X and Y be random variables. We define the covariance of (or between) X and Y as

Cov(X, Y) = E((X − EX)(Y − EY))
    = E(XY − (EX) · Y − (EY) · X + EX · EY)
    = E(XY) − EX · EY − EY · EX + EX · EY
    = E(XY) − EX · EY.

To summarize, the most useful formula is

Cov(X, Y) = E(XY) − EX · EY.

You should immediately note that, if X and Y are independent, then Cov(X, Y) = 0, but the converse is false.

Let X and Y be indicator random variables, so X = I_A and Y = I_B, for two events A and B. Then EX = P(A), EY = P(B), E(XY) = E(I_{A∩B}) = P(A ∩ B), and so

Cov(X, Y) = P(A ∩ B) − P(A)P(B) = P(A)[P(B|A) − P(B)].

If P(B|A) > P(B), we say the two events are positively correlated, and in this case the covariance is positive; if the events are negatively correlated, all inequalities are reversed. For general random variables X and Y, Cov(X, Y) > 0 means, intuitively, that "on average" increasing X will result in a larger Y.

Variance of sums of random variables

Theorem 8.4. Variance-covariance formula:

E(∑_{i=1}^n X_i)² = ∑_{i=1}^n E(X_i²) + ∑_{i≠j} E(X_i X_j),
Var(∑_{i=1}^n X_i) = ∑_{i=1}^n Var(X_i) + ∑_{i≠j} Cov(X_i, X_j).

Proof. The first formula follows from writing the sum

(∑_{i=1}^n X_i)² = ∑_{i=1}^n X_i² + ∑_{i≠j} X_i X_j

and linearity of expectation. The second formula follows from the first:

Var(∑_{i=1}^n X_i) = E[∑_{i=1}^n X_i − E(∑_{i=1}^n X_i)]²
    = E[∑_{i=1}^n (X_i − EX_i)]²
    = ∑_{i=1}^n Var(X_i) + ∑_{i≠j} E[(X_i − EX_i)(X_j − EX_j)],

which is equivalent to the formula.

Corollary 8.5. Linearity of variance for independent summands.

If X₁, X₂, ..., X_n are independent, then Var(X₁ + ... + X_n) = Var(X₁) + ... + Var(X_n).

The variance-covariance formula often makes computing the variance possible even if the random variables are not independent, especially when they are indicators.

Example 8.12. Let S_n be Binomial(n, p). We will now compute its expectation and variance.

The crucial observation is that S_n = ∑_{i=1}^n I_i, where I_i is the indicator I_{ith trial is a success}; moreover, the I_i are independent. Then ES_n = np, and

Var(S_n) = ∑_{i=1}^n Var(I_i) = ∑_{i=1}^n (E(I_i²) − (EI_i)²) = n(p − p²) = np(1 − p),

as I_i² = I_i.

Example 8.13. Matching problem, revisited yet again. Recall that X is the number of people who get their own gift, and we will compute Var(X).

Recall also that X = ∑_{i=1}^n I_i, where I_i = I_{ith person gets own gift}, so that EI_i = 1/n and

E(X²) = n · (1/n) + ∑_{i≠j} E(I_i I_j) = 1 + ∑_{i≠j} 1/(n(n − 1)) = 1 + 1 = 2.

Above, E(I_i I_j) is the probability that the i'th person and the j'th person both get their own gifts, and thus equals 1/(n(n − 1)). We conclude that Var(X) = 1. (In fact, X is, for large n, very close to Poisson with λ = 1.)

Example 8.14. Roll a die 10 times. Let X be the number of 6's rolled and Y the number of 5's rolled. Compute Cov(X, Y).

Observe that X = ∑_{i=1}^{10} I_i, where I_i = I_{ith roll is 6}, and Y = ∑_{i=1}^{10} J_i, where J_i = I_{ith roll is 5}. Then EX = EY = 10/6 = 5/3. Moreover, E(I_i J_j) equals 0 when i = j (because 5 and 6 cannot both be rolled on the same roll), and E(I_i J_j) = 1/6² if i ≠ j (by independence of different rolls). Therefore,

E(XY) = ∑_{i=1}^{10} ∑_{j=1}^{10} E(I_i J_j) = ∑_{i≠j} 1/6² = (10 · 9)/36 = 5/2,

and

Cov(X, Y) = 5/2 − (5/3)² = −5/18,

negative, as should be expected.
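A simulation of this covariance (Python):

    import random

    random.seed(1)
    reps = 10**6
    sx = sy = sxy = 0
    for _ in range(reps):
        rolls = [random.randint(1, 6) for _ in range(10)]
        x, y = rolls.count(6), rolls.count(5)
        sx += x; sy += y; sxy += x * y
    print(sxy / reps - (sx / reps) * (sy / reps))   # close to -5/18 = -0.2778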

Weak Law of Large Numbers

Assume that an experiment is performed in which the event A happens with probability P(A) = p. At the beginning of the course, we promised to attach a precise meaning to the statement: "If you repeat the experiment, independently, a large number of times, the proportion of times A happens converges to p." We will now do so and actually prove the more general statement below.

Theorem 8.6. Weak Law of Large Numbers.

If X, X₁, X₂, ... are independent and identically distributed random variables with finite expectation and variance, then (X₁ + ... + X_n)/n converges to EX in the sense that, for any fixed ε > 0,

P(|(X₁ + ... + X_n)/n − EX| ≥ ε) → 0,

as n → ∞.

In particular, if S_n is the number of successes in n independent trials, each of which is a success with probability p, then, as we have observed before, S_n = I₁ + ... + I_n, where I_i = I_{success at trial i}. So, for every ε > 0,

P(|S_n/n − p| ≥ ε) → 0,

as n → ∞. Thus the proportion of successes converges to p in this sense.

Theorem 8.7. Markov's inequality. If X ≥ 0 is a random variable and a > 0, then

P(X ≥ a) ≤ (1/a) EX.

Example 8.15. If EX = 1 and X ≥ 0, it must be that P(X ≥ 10) ≤ 0.1.

Proof. Here is the crucial observation:

I_{X≥a} ≤ (1/a) X.

Indeed, if X < a, the left-hand side is 0 and the right-hand side is nonnegative; while if X ≥ a, the left-hand side is 1 and the right-hand side is at least 1. Taking expectations of both sides, we get

P(X ≥ a) = E(I_{X≥a}) ≤ (1/a) EX.

Theorem 8.8. Chebyshev's inequality. If EX = µ and Var(X) = σ² are both finite, and k > 0, then

P(|X − µ| ≥ k) ≤ σ²/k².

Example 8.16. If EX = 1 and Var(X) = 1, then P(X ≥ 10) ≤ P(|X − 1| ≥ 9) ≤ 1/81.

Example 8.17. If EX = 1 and Var(X) = 0.1, then

P(|X − 1| ≥ 0.5) ≤ 0.1/0.5² = 2/5.

As the previous two examples show, Chebyshev's inequality is useful if either σ is small or k is large.

Proof. By Markov's inequality,

P(|X − µ| ≥ k) = P((X − µ)² ≥ k²) ≤ (1/k²) E(X − µ)² = (1/k²) Var(X).

We are now ready to prove the Weak Law of Large Numbers.

Proof. Denote µ = EX, σ² = Var(X), and let S_n = X₁ + ... + X_n. Then

ES_n = EX₁ + ... + EX_n = nµ

and

Var(S_n) = nσ².

Therefore, by Chebyshev's inequality,

P(|S_n − nµ| ≥ nε) ≤ nσ²/(n²ε²) = σ²/(nε²) → 0,

as n → ∞.

A careful look through the proof above will show that it remains valid if ε depends on n, but goes to 0 slower than 1/√n. This suggests that (X₁ + ... + X_n)/n converges to EX at the rate of about 1/√n. We will make this statement more precise below.

Central Limit Theorem

Theorem 8.9. Central limit theorem.

Assume that X, X₁, X₂, ... are independent, identically distributed random variables with finite µ = EX and σ² = Var(X). Then

P((X₁ + ... + X_n − µn)/(σ√n) ≤ x) → P(Z ≤ x)

as n → ∞, where Z is standard Normal.

We will not prove this theorem in detail, but will later give a good indication as to why it holds. You should, however, observe that it is a remarkable theorem: the random variables X_i have an arbitrary distribution (with given expectation and variance), and the theorem says that their sum approximates a very particular distribution, the normal one. Adding many independent copies of a random variable erases all information about its distribution other than expectation and variance!

Example 8.18. Assume that the X_n are independent and uniform on [0, 1]. Let S_n = X₁ + ... + X_n. (a) Compute approximately P(S₂₀₀ ≤ 90). (b) Using the approximation, find n so that P(S_n ≥ 50) ≥ 0.99.

We know that EX_i = 1/2 and Var(X_i) = 1/3 − 1/4 = 1/12.

For (a),

P(S₂₀₀ ≤ 90) = P((S₂₀₀ − 200 · (1/2))/√(200 · (1/12)) ≤ (90 − 200 · (1/2))/√(200 · (1/12)))
    ≈ P(Z ≤ −√6)
    = 1 − P(Z ≤ √6)
    ≈ 1 − 0.993
    = 0.007.

For (b), we rewrite

P((S_n − n · (1/2))/√(n · (1/12)) ≥ (50 − n · (1/2))/√(n · (1/12))) = 0.99,

and then approximate

P(Z ≥ −(n/2 − 50)/√(n · (1/12))) = 0.99,

or

P(Z ≤ (n/2 − 50)/√(n · (1/12))) = Φ((n/2 − 50)/√(n · (1/12))) = 0.99.

Using the fact that Φ(z) = 0.99 for (approximately) z = 2.326, we get that

n − 1.345√n − 100 = 0,

and that n = 115.
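Both parts can be confirmed by simulation (Python; the run takes a few seconds):

    import random

    random.seed(1)
    reps = 10**5

    # (a): P(S_200 <= 90)
    hits = sum(sum(random.random() for _ in range(200)) <= 90 for _ in range(reps))
    print(hits / reps)   # about 0.007

    # (b): with n = 115, P(S_n >= 50) should be about 0.99
    hits = sum(sum(random.random() for _ in range(115)) >= 50 for _ in range(reps))
    print(hits / reps)   # about 0.99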

Example 8.19. A casino charges $1 for entrance. As a promotion, they offer to the first 30000 "guests" the following game. Roll a fair die:

• if you roll 6, you get free entrance and $2;
• if you roll 5, you get free entrance;
• otherwise, you pay the normal fee.

Compute the number s so that the revenue loss is at most s with probability 0.9.

In symbols, if L is the lost revenue, we need to find s so that

P(L ≤ s) = 0.9.

We have L = X₁ + ... + X_n, where n = 30000, the X_i are independent, and P(X_i = 0) = 4/6, P(X_i = 1) = 1/6, and P(X_i = 3) = 1/6. Therefore,

EX_i = 2/3

and

Var(X_i) = 1/6 + 9/6 − (2/3)² = 11/9.

Therefore,

P(L ≤ s) = P((L − (2/3)n)/√(n · (11/9)) ≤ (s − (2/3)n)/√(n · (11/9))) ≈ P(Z ≤ (s − (2/3)n)/√(n · (11/9))) = 0.9,

which gives

(s − (2/3)n)/√(n · (11/9)) = 1.28,

and finally,

s = (2/3)n + 1.28 √(n · (11/9)) ≈ 20245.
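A quick check of the final number (Python; the Phi helper is ours):

    from math import sqrt, erf

    def Phi(z):
        return 0.5 * (1 + erf(z / sqrt(2)))

    n, mu, var = 30000, 2/3, 11/9
    s = mu * n + 1.28 * sqrt(n * var)
    print(round(s))                            # about 20245
    print(Phi((s - mu * n) / sqrt(n * var)))   # about 0.90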

Problems

1. An urn contains 2 white and 4 black balls. Make 3 draws out of the urn in succession, without replacement. Let X be the number of white balls drawn, and Y the draw on which the first black ball appears. For example, if the draws are white, black, black, then X = 1, Y = 2. Compute E(XY).

2. The joint density of (X, Y) is given by

f(x, y) = 3x if 0 ≤ y ≤ x ≤ 1, and 0 otherwise.

Compute Cov(X, Y).

3. Five married couples are seated at random in a row of 10 seats.

(a) Compute the expected number of women that sit next to their husbands.

(b) Compute the expected number of women that sit next to at least one man.

4. There are 20 birds that sit in a row on a wire. Each bird looks left or right with equal probability. Let N be the number of birds not seen by any neighboring bird. Compute EN.

5. Roll a fair die 24 times. Compute, using a relevant approximation, the probability that the sum of the numbers exceeds 100.

Solutions to problems

1. First we determine the joint p. m. f. of (X, Y). We have

P(X = 0, Y = 1) = P(bbb) = 1/5,
P(X = 1, Y = 1) = P(bwb or bbw) = 2/5,
P(X = 1, Y = 2) = P(wbb) = 1/5,
P(X = 2, Y = 1) = P(bww) = 1/15,
P(X = 2, Y = 2) = P(wbw) = 1/15,
P(X = 2, Y = 3) = P(wwb) = 1/15,

so that

E(XY) = 1 · (2/5) + 2 · (1/5) + 2 · (1/15) + 4 · (1/15) + 6 · (1/15) = 8/5.

2. We have

EX = ∫_0^1 dx ∫_0^x x · 3x dy = ∫_0^1 3x³ dx = 3/4,
EY = ∫_0^1 dx ∫_0^x y · 3x dy = ∫_0^1 (3/2)x³ dx = 3/8,
E(XY) = ∫_0^1 dx ∫_0^x xy · 3x dy = ∫_0^1 (3/2)x⁴ dx = 3/10,

so that Cov(X, Y) = 3/10 − (3/4) · (3/8) = 3/160.

3. (a) Let the number be M. Let I_i = I_{couple i sits together}. Then

EI_i = (9! · 2!)/10! = 1/5,

and so

EM = EI₁ + ... + EI₅ = 5 · EI₁ = 1.

(b) Let the number be N. Let I_i = I_{woman i sits next to a man}. Then, by dividing into cases, whereby the woman either sits on one of the two end chairs or on one of the eight middle chairs,

EI_i = (2/10) · (5/9) + (8/10) · (1 − (4/9) · (3/8)) = 7/9,

and so

EN = EI₁ + ... + EI₅ = 5 · EI₁ = 35/9.

4. For each of the two birds at either end, the probability that it is not seen is 1/2, while for any other bird this probability is 1/4. By the indicator trick,

EN = 2 · (1/2) + 18 · (1/4) = 11/2.

5. Let X₁, X₂, ... be the numbers on successive rolls and S_n = X₁ + ... + X_n the sum. We know that EX_i = 7/2 and Var(X_i) = 35/12. So we have

P(S₂₄ ≥ 100) = P((S₂₄ − 24 · (7/2))/√(24 · (35/12)) ≥ (100 − 24 · (7/2))/√(24 · (35/12))) ≈ P(Z ≥ 1.91) = 1 − Φ(1.91) ≈ 0.028.

Interlude: Practice Final

This practice exam covers the material from Chapters 1 through 8. Give yourself 120 minutes to solve the six problems, which you may assume have equal point score.

1. Recall that a full deck of cards contains 13 cards of each of the four suits (♣, ♦, ♥, ♠). Select cards from the deck at random, one by one, without replacement.

(a) Compute the probability that the first four cards selected are all hearts (♥).

(b) Compute the probability that all suits are represented among the first four cards selected.

(c) Compute the expected number of different suits among the first four cards selected.

(d) Compute the expected number of cards you have to select to get the first hearts card.

2. Eleven Scandinavians: 2 Swedes, 4 Norwegians, and 5 Finns are seated in a row of 11 chairs at random.

(a) Compute the probability that all groups sit together (i.e., the Swedes occupy adjacent seats, and so do the Norwegians and the Finns).

(b) Compute the probability that at least one of the groups sits together.

(c) Compute the probability that the two Swedes have exactly one person sitting between them.

3. You have two fair coins. Toss the first coin three times, and let X be the number of heads. Then toss the second coin X times, that is, as many times as you got heads on the first one. Let Y be the number of heads on the second coin. (For example, if X = 0, Y is automatically 0; if X = 2, toss the second coin twice and count the number of heads to get Y.)

(a) Determine the joint probability mass function of X and Y, that is, write down a formula for P(X = i, Y = j) for all relevant i and j.

(b) Compute P(X ≥ 2 | Y = 1).

4. Assume that 2,000,000 single male tourists visit Las Vegas every year. Assume also that each of these tourists independently gets married while drunk with probability 1/1,000,000.

(a) Write down the exact probability that exactly 3 male tourists will get married while drunk next year.

(b) Compute the expected number of such drunk marriages in the next 10 years.

(c) Write down a relevant approximate expression for the probability in (a).

(d) Write down an approximate expression for the probability that there will be no such drunk marriages during at least one of the next 3 years.

5. Toss a fair coin twice. You win $2 if both tosses come out heads, lose $1 if no toss comes out heads, and win or lose nothing otherwise.

(a) What is the expected number of games you need to play to win once?

(b) Assume that you play this game 500 times. What is, approximately, the probability that you win at least $135?

(c) Again, assume that you play this game 500 times. Compute (approximately) the amount of money x such that your winnings will be at least x with probability 0.5. Then do the same with probability 0.9.

6. Two random variables X and Y are independent and have the same probability density function

g(x) = c(1 + x), x ∈ [0, 1], and 0 otherwise.

(a) Find the value of c. Here and in (b): use ∫_0^1 xⁿ dx = 1/(n+1), for n > −1.

(b) Find Var(X + Y).

(c) Find P(X + Y < 1) and P(X + Y ≤ 1). Here and in (d): when you get to a single integral involving powers, stop.

(d) Find E|X − Y|.

8 MORE ON EXPECTATION AND LIMIT THEOREMS 105

Solutions to Practice Final

1. Recall that a full deck of cards contains 13 cards of each of the four suits (♣, ♦, ♥, ♠).

Select cards from the deck at random, one by one, without replacement.

(a) Compute the probability that the ﬁrst four cards selected are all hearts (♥).

Solution:

_

13

4

_

_

52

4

_

(b) Compute the probability that all suits are represented among the ﬁrst four cards

selected.

Solution:

13

4

_

52

4

_

(c) Compute the expected number of diﬀerent suits among the ﬁrst four cards selected.

Solution:

If X is the number of suits represented, then X = I

♥

+ I

♦

+ I

♣

+ I

♠

, where I

♥

=

I

{♥ is represented}

, etc. Then

EI

♥

= 1 −

_

39

4

_

_

52

4

_,

which is the same for other three indicators, so

EX = 4 EI

♥

= 4

_

1 −

_

39

4

_

_

52

4

_

_

.

(d) Compute the expected number of cards you have to select to get the ﬁrst hearts card.

Solution:

Label non-♥ cards 1, . . . , 39 and let I

i

= I

{card i selected before any ♥ card}

. Then EI

i

=

1

14

for any i. If N is our number of cards you have to select to get the ﬁrst hearts

card, then

EN = E(I

1

+ +EI

39

) =

39

14

.

8 MORE ON EXPECTATION AND LIMIT THEOREMS 106

2. Eleven Scandinavians: 2 Swedes, 4 Norwegians and 5 Finns, are seated on a row of 11

chairs at random.

(a) Compute the probability that all groups sit together (i.e., Swedes occupy adjacent

seats, and so do Norwegians and Finns).

Solution:

2! 4! 5! 3!

11!

(b) Compute the probability that at least one of the groups sits together.

Solution:

Deﬁne A

S

= ¦Swedes sit together¦, and similarly A

N

and A

F

. Then

P(A

S

∪ A

N

∪ A

F

)

= P(A

S

) +P(A

N

) +P(A

F

)

−P(A

S

∩ A

N

) −P(A

S

∩ A

F

) −P(A

N

∩ A

F

)

+P(A

S

∩ A

N

∩ A

F

)

=

2! 10! + 4! 8! + 5! 7! −2! 4! 7! −2! 5! 6! −4! 5! 3! + 2! 4! 5! 3!

11!

(c) Compute the probability that the two Swedes have exactly one person sitting between

them.

Solution:

The two Swedes may occupy chairs 1, 3; or 2, 4; or 3, 5; ...; or 9, 11. These are exactly

9 possibilities, so the answer is

9

_

11

2

_ =

9

55

.

3. You have two fair coins. Toss the ﬁrst coin three times, and let X be the number of heads.

Then toss the second coin X times, that is, as many times as you got heads on the ﬁrst

one. Let Y be the number of heads on the second coin. (For example, if X = 0, Y is

automatically 0; if X = 2, toss the second coin twice and count the number of heads to

get Y .)

8 MORE ON EXPECTATION AND LIMIT THEOREMS 107

(a) Determine the joint probability mass function of X and Y , that is, write down a

formula for P(X = i, Y = j) for all relevant i and j.

Solution:

P(X = i, Y = j) = P(X = i)P(Y = j[X = i) =

_

3

i

_

1

2

3

_

i

j

_

1

2

i

for 0 ≤ j ≤ i ≤ 3.

(b) Compute P(X ≥ 2 [ Y = 1).

Solution:

This equals

P(X = 2, Y = 1) +P(X = 3, Y = 1)

P(X = 1, Y = 1) +P(X = 2, Y = 1) +P(X = 3, Y = 1)

=

5

9

.

4. Assume that 2, 000, 000 single male tourists visit Las Vegas every year. Assume also that

each of these tourists independently gets married while drunk with probability 1/1, 000, 000.

(a) Write down the exact probability that exactly 3 male tourists will get married while

drunk next year.

Solution:

With X equal to the number of such drunken marriages, X is Binomial(n, p) with

p = 1/1, 000, 000 and n = 2, 000, 000, so we have

P(X = 3) =

_

n

3

_

p

3

(1 −p)

n−3

(b) Compute the expected number of such drunk marriages in next 10 years.

Solution:

As X is binomial, its expected value is np = 2, so the answer is 10 EX = 20.

(c) Write down a relevant approximate expression for the probability in (a).

Solution:

We use that $X$ is approximately Poisson with $\lambda = 2$, so that the answer is

$$\frac{\lambda^3}{3!}\, e^{-\lambda} = \frac{4}{3}\, e^{-2}.$$


(d) Write down an approximate expression for the probability that there will be no such

drunk marriages during at least one of the next 3 years.

Solution:

This equals $1 - P(\text{at least one such marriage in each of the next 3 years})$, which equals

$$1 - \left(1 - e^{-2}\right)^3.$$
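It is instructive to see how close the Poisson approximation in (c) comes to the exact binomial probability in (a). A brief numerical sketch:

    from math import comb, exp

    n, p, lam = 2_000_000, 1e-6, 2.0
    exact = comb(n, 3) * p**3 * (1 - p)**(n - 3)   # (a): exact binomial, ~0.18045
    approx = lam**3 / 6 * exp(-lam)                # (c): Poisson, (4/3)e^{-2} ~ 0.18045
    print(exact, approx)
    print(1 - (1 - exp(-lam))**3)                  # (d): ~0.3535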

5. Toss a fair coin twice. You win $2 if both tosses come out heads, lose $1 if no toss comes out heads, and win or lose nothing otherwise.

(a) What is the expected number of games you need to play to win once?

Solution:

The probability of winning is $\frac{1}{4}$. The answer, the expectation of a Geometric$\left(\frac{1}{4}\right)$ random variable, is 4.

(b) Assume that you play this game 500 times. What is, approximately, the probability

that you win at least $135?

Solution:

Let $X$ be the winnings in one game, and $X_1, X_2, \ldots, X_n$ the winnings in successive games, with $S_n = X_1 + \cdots + X_n$. Then we have

$$EX = 2 \cdot \frac{1}{4} - 1 \cdot \frac{1}{4} = \frac{1}{4}$$

and

$$\mathrm{Var}(X) = 4 \cdot \frac{1}{4} + 1 \cdot \frac{1}{4} - \left(\frac{1}{4}\right)^2 = \frac{19}{16}.$$

Thus

$$P(S_n \ge 135) = P\left(\frac{S_n - \frac{n}{4}}{\sqrt{n \cdot \frac{19}{16}}} \ge \frac{135 - \frac{n}{4}}{\sqrt{n \cdot \frac{19}{16}}}\right) \approx P\left(Z \ge \frac{135 - \frac{n}{4}}{\sqrt{n \cdot \frac{19}{16}}}\right),$$

where $Z$ is standard Normal. Using $n = 500$, we get the answer

$$1 - \Phi\left(\frac{10}{\sqrt{500 \cdot \frac{19}{16}}}\right).$$
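Numerically, the normal approximation comes out to about 0.34, and a direct simulation of the game agrees. A sketch, with $\Phi$ computed from the error function (the helper names Phi and game are ad hoc):

    import random
    from math import erf, sqrt

    def Phi(z):                      # standard Normal distribution function
        return 0.5 * (1 + erf(z / sqrt(2)))

    print(1 - Phi(10 / sqrt(500 * 19 / 16)))    # normal approximation, ~0.341

    def game():                      # winnings in a single game
        heads = sum(random.random() < 0.5 for _ in range(2))
        return 2 if heads == 2 else (-1 if heads == 0 else 0)

    trials = 10**4
    wins = sum(sum(game() for _ in range(500)) >= 135 for _ in range(trials))
    print(wins / trials)             # Monte Carlo estimate of P(S_500 >= 135)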


(c) Again, assume that you play this game 500 times. Compute (approximately) the amount of money x such that your winnings will be at least x with probability 0.5. Then do the same with probability 0.9.

Solution:

For probability 0.5, the answer is $ES_n = n \cdot \frac{1}{4} = 125$, as the approximating normal distribution is symmetric about its mean. For probability 0.9, we approximate

$$P(S_n \ge x) \approx P\left(Z \ge \frac{x - 125}{\sqrt{500 \cdot \frac{19}{16}}}\right) = P\left(Z \le \frac{125 - x}{\sqrt{500 \cdot \frac{19}{16}}}\right) = \Phi\left(\frac{125 - x}{\sqrt{500 \cdot \frac{19}{16}}}\right),$$

where we have used that $x < 125$. Then we use that $\Phi(z) = 0.9$ at $z \approx 1.28$, leading to the equation

$$\frac{125 - x}{\sqrt{500 \cdot \frac{19}{16}}} = 1.28,$$

and therefore

$$x = 125 - 1.28 \sqrt{500 \cdot \frac{19}{16}}.$$

(This gives $x \approx 93$.)
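The final arithmetic deserves a quick numerical check:

    from math import sqrt

    print(125 - 1.28 * sqrt(500 * 19 / 16))   # ~93.8, so about $93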

6. Two random variables X and Y are independent and have the same probability density function

$$g(x) = \begin{cases} c(1+x), & x \in [0, 1],\\ 0, & \text{otherwise.} \end{cases}$$

(a) Find the value of c. Here and in (b), use $\int_0^1 x^n\,dx = \frac{1}{n+1}$, for $n > -1$.

Solution:

As

$$1 = c \int_0^1 (1+x)\,dx = c \cdot \frac{3}{2},$$

we have $c = \frac{2}{3}$.

(b) Find Var(X +Y ).

Solution:


By independence, this equals $2\,\mathrm{Var}(X) = 2\left(E(X^2) - (EX)^2\right)$. Moreover,

$$E(X) = \frac{2}{3} \int_0^1 x(1+x)\,dx = \frac{5}{9},$$

$$E(X^2) = \frac{2}{3} \int_0^1 x^2(1+x)\,dx = \frac{7}{18},$$

and the answer is $\frac{13}{81}$.
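For a numerical check, one can sample from $g$ by inverting its distribution function: $F(x) = (2x + x^2)/3$ on $[0, 1]$, so $x = -1 + \sqrt{1 + 3u}$ for $u$ uniform on $[0, 1]$. A sketch that compares the empirical mean and variance of $X + Y$ with $10/9$ and $13/81 \approx 0.160$:

    import random
    from math import sqrt

    def sample():    # inverse-CDF sampling: F(x) = (2x + x^2)/3 on [0, 1]
        return -1 + sqrt(1 + 3 * random.random())

    n = 10**5
    s = [sample() + sample() for _ in range(n)]   # realizations of X + Y
    m = sum(s) / n
    var = sum((v - m)**2 for v in s) / n
    print(m, var)    # mean ~ 10/9 = 1.111..., variance ~ 13/81 = 0.1605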

(c) Find P(X + Y < 1) and P(X + Y ≤ 1). Here and in (d): when you get to a single

integral involving powers, stop.

Solution:

The two probabilities are both equal to

$$\left(\frac{2}{3}\right)^2 \int_0^1 dx \int_0^{1-x} (1+x)(1+y)\,dy = \left(\frac{2}{3}\right)^2 \int_0^1 (1+x)\left((1-x) + \frac{(1-x)^2}{2}\right) dx.$$

(d) Find $E|X - Y|$.

Solution:

This equals

$$\left(\frac{2}{3}\right)^2 \cdot 2 \int_0^1 dx \int_0^x (x-y)(1+x)(1+y)\,dy = \left(\frac{2}{3}\right)^2 \cdot 2 \int_0^1 \left(x(1+x)\left(x + \frac{x^2}{2}\right) - (1+x)\left(\frac{x^2}{2} + \frac{x^3}{3}\right)\right) dx.$$
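If you do carry the single integrals through (the problem lets you stop before this), (c) evaluates to $7/18 \approx 0.389$ and (d) to $44/135 \approx 0.326$. A quick Monte Carlo confirmation, reusing the inverse-CDF sampler from part (b):

    import random
    from math import sqrt

    def sample():    # same inverse-CDF sampler as in part (b)
        return -1 + sqrt(1 + 3 * random.random())

    n = 10**5
    pairs = [(sample(), sample()) for _ in range(n)]
    print(sum(x + y <= 1 for x, y in pairs) / n)    # (c): ~7/18 = 0.3888...
    print(sum(abs(x - y) for x, y in pairs) / n)    # (d): ~44/135 = 0.3259...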

1 INTRODUCTION

1

1

Introduction

Theory of probability has always been associated with gambling and many of the most accessible examples still come from that activity. You should be familiar with the basic tools of gambling trade: a coin, a (six-sided) die, and a full deck of 52 cards. A fair coin gives you Heads (H) or Tails (T), with equal probability, a fair die will give you 1, 2, 3, 4, 5, or 6 with equal probability, and a shuﬄed deck of cards means that any ordering of cards is equally likely. Example 1.1. Here are some typical questions that we will be asking and you will learn how to answer. This example will serve as an illustration and you shouldn’t expect yourself to understand how to get the answer yet. Start with a shuﬄed deck of cards and distribute all 52 cards to 4 players, 13 cards to each. What is the probability that each player gets an Ace? Next, assume you are a player and you get a single Ace. Now what is the probability that each player gets an Ace? Answers. If any ordering of cards is equally likely, then also any position of the four Aces in the deck is equally likely. There are 52 possibilities for positions (slots) for the 4 aces. Out of 4 these, the number of positions that give each player an Ace is 134 : pick the ﬁrst slot among the cards that the ﬁrst player gets, then the second slot among the second player’s card slots, then 4 the third and the fourth slots. Therefore, the answer is 13 ≈ 0.1055. (52) 4 After you see that you have a single Ace, the probability goes up: the previous answer needs 13·(39) to be divided by the probability that you get a single Ace, which is 523 ≈ 0.4388. The answer (4) 134 then becomes 13· 39 ≈ 0.2404. (3) Here is how you can quickly estimate the second probability during a card game: give the second Ace to a player, then a third to a diﬀerent player (probability about 2/3) and then the last to the third player (probability about 1/3) for the approximate answer 2/9 ≈ 0.22.

History of probability

Although gambling dates back thousands of years, the birth of modern probability is considered to be a 1654 letter from Flemish aristocrat and notorious gambler Chevalier de M´r´ to the ee mathematician and philosopher Blaise Pascal. In essence the letter said, I used to bet even money that I would get at least one 6 in four rolls of a fair die. The probability of this is 4 times the probability of getting a 6 in a single die, i.e., 4/6 = 2/3; clearly I had an advantage and indeed I was making money. Now I bet even money that within 24 rolls of two dice I get at least one double 6. This has the same advantage (24/62 = 2/3), but now I’m losing money. Why? As Pascal discussed in his correspondence with Pierre de Fermat, de M´r´ reasonong was faulty; ee after all, if the number of rolls were 7 in the ﬁrst game, the logic would give the nonsensical probability 7/6. We’ll come back to this later.

1 INTRODUCTION

2

Example 1.2. In a family with 4 children, what is the probability of a 2:2 boy-girl split? One common wrong answer: equally likely.

1 5,

as the 5 possibilities for number of boys (0 to 4) are not

Another common guess: close to 1, as this is the most “balanced” possibility. This represents the mistaken belief that symmetry in probabilities should very likely result in symmetry in the outcome. A related confusion supposes that events that are probable (say, have probability around 0.75) occur nearly certainly.

**Equally likely outcomes
**

Suppose an experiment is performed, with n possible outcomes comprising a set S. Assume also that all outcomes are equally likely. (Whether this assumption is realistic, depends on the context. In the above Example 1.2, this was a bad assumption.) An event E is a set of outcomes, i.e., E ⊂ S. If an event E consists of m diﬀerent outcomes (often called “good” outcomes for E), then the probability of E is given by, (1.1) P (E) = m . n

1 Example 1.3. A fair die has 6 outcomes, and take E = {2, 4, 6}. Then P (E) = 2 .

What does the answer in Example 1.3 mean? Every student of probability should spend some time thinking about this. The fact is that it is very diﬃcult to attach a meaning to P (E) if we roll a die just a single time, or a few times. The most straightforward interpretation is that for a very large number of rolls, about half of the outcomes will be even. Note that this requires at least the concept of a limit! This relative frequency interpretation of probability will be explained in detail much later. For now, take formula (1.1) as the deﬁnition of probability.

That is. Example 2. TTH}. TTT. although this is not quite true in reality. 3 have exactly one H. What is the most likely sum? The outcomes are ordered pairs (i. E = {HTT. and P (E) = 3/8. We now have 16 outcomes. Toss three fair coins. Roll two dice. j). What is the probability of exactly one H? There are 8 equally likely outcomes: HHH. 1 ≤ j ≤ 6. of outcomes 1 2 3 4 5 6 5 4 3 2 1 = 1. 1 ≤ i ≤ 6. THT. HTT. especially if their number is large.2. but there has to be a better way: BBBB BGBB GBBB GGBB We conclude that P (2:2 split) = BBBG BGBG GBBG GGBG BBGB BGGB GBGB GGGB BBGG BGGG GBGG GGGG 3 6 = .1. THH. and P (sum = 7) = 6 36 no. HHT. Now let’s compute the probability of a 2:2 boy-girl split in a four-child family. We can list the outcomes. . which we will assume are equally likely. THT. HTH.2 COMBINATORICS 3 2 Combinatorics Example 2. 16 2 2 1 P (4:0 split or 0:4 split) = = . sum 2 3 4 5 6 7 8 9 10 11 12 Our answer is 7. 16 8 P (1:3 split or 3:1 split) = Example 2. 16 8 8 1 = . 6 How to count? Listing all outcomes is very ineﬃcient. TTH.3. Out of these.

the number of bad ones 3524 and thus the number of good outcomes equals 3624 − 3524 . 6. In this case.4914.4. But many outcomes have more than one six and are hence counted more than once. Example 2. you should divide every solution into the following three steps: Step 1: Identify the set of equally likely outcomes. Therefore P (win) = 1 − 5 6 4 ≈ 0. . {(a. . d) : a. Note that the set of possible outcomes changes from step to step. 4 options for the third roll since its number must not appear on the ﬁrst two rolls. there are 4 rolls and he wins with at least one 6. and the ﬁrst part has m outcomes. d ∈ {1. Step 3: Compute the number of good outcomes. and the second part has n outcomes regardless of the outcome in the ﬁrst part. The number of outcomes is 3624 . etc.5. One should also note that both probabilities are barely diﬀerent from 1/2 so de M´r´ was ee gambling a lot to be able to notice the diﬀerence. but their number does not! The answer then is 6 · 5 · 4 · 3/64 = 5/18 ≈ 0. Chevalier de M´r´ overcounted the good outcomes in both cases.5177. . In Game 2. 5 options for the second roll since its number must diﬀer from the number on the ﬁrst roll. In this case.2 COMBINATORICS 4 Basic principle: If an experiment consists of two parts. Why? We have 6 options for the ﬁrst roll. . b. 6}}. Let’s now compute probabilities for de M´r´’s games. . as the number of bad events is 54 . Roll a die 4 times. . there are 24 rolls of two dice and he wins by at least one pair of 6’s rolled. That is. . . this is the set of all ordered 4-tuples of numbers 1. In this case this number is 6 · 5 · 4 · 3. What is the probability that you get diﬀerent numbers? At least at the beginning. then the experiment as a whole has mn outcomes. . Example 2. c. b.2778. Step 2: Compute the number of outcomes. c. the number of outcomes is therefore 64 . Therefore P (win) = 1 − 35 36 24 ≈ 0. His count 4 · 63 in Game ee 1 selects a die with a 6 and arbitrary numbers for other dice. The number of good events is − 54 . ee 64 In Game 1.

3. while the number of ways to ﬁll k ≤ n ordered slots is n(n − 1) . Before we go on to further examples. Outcome is an ordering of pieces of paper E1 E2 P1 P2 P3 R. this means that all available choices are equally likely. Sampling without replacement omits the putting back part. noting its properties. quite a lot smaller. spell P EP P ER? There are two problems to solve. and ordering hearts within their block. What is the probability that these pieces. and (2) with replacement. . For sampling without replacement: 1. without further elaboration. Example 2.7. P . . 33 ·22 66 For sampling with replacement. • P (hearts are together) = 40!13! 52! ≈ 4. To compute the last probability. 2 · 1 = n!. collect all hearts into a block. let’s agree that when the text says that a random choice is made. . • P (top card is an Ace) = 1 13 n! . in order. for example. and then repeating. . 4!·(13!)4 52! • P (all cards of the same suit end up next to each other) = never happens in practice. This event ≈ 6 · 10−11 . P .5·10−28 . putting the object back into population. . then specify a good event by ordering 40 items (the block of hearts plus the 39 other cards). (n − k + 1) = Example 2. E. The number of ways to ﬁll n ordered slots with them is n · (n − 1) . on them. P . A bag has 6 pieces of paper with letters. at random. Pull 6 pieces at random out of the bag (1) without. Shuﬄe a deck of cards. and R. 2. Also.2 COMBINATORICS 5 Permutations Assume you have n objects. the answer is = 1 2·63 . (n − k)! = 4·51! 52! . The number of outcomes thus is 6!. an object from a population. in the next problem (and in statistics in general) sampling with replacement refers to choosing. The number of good outcomes is 3!2!.6. E. The probability is 3!2! 6! = 1 60 .

Sit 3 men and 3 women at random (1) on a row of chairs. The answer is 10 . say John Smith. ﬁrst choose the subset. Then distinct choices of a subset and its ordering will end up in distinct orderings. . Then. a Norwegian (Arne) and sit him down. Combinations Let n k be the number of diﬀerent subsets with k elements of a set with n elements. n k = = n(n − 1) . and then order its elements in a row. Therefore. . or a binomial coeﬃcient (as it appears in the binomial theorem: k n k n−k (x + y)n = n ). and (2) around a table. 4 Swedes. In case (1). the answer is 4!3! 6! 1 = 5. Compute P (all women sit together). For the last question. k We call n = n choose k. In case (2). Then you reduce to a row 3 problem. (n − k + 1) k! n! k!(n − k)! To understand why the above is true. There are 3! choices for ordering the group of Norwegians (and then sit them down to one of both sides of Arne. Finally.2 COMBINATORICS 6 Example 2. . Then there are 4! choices for arranging the Swedes. and they sit at random around a table. .9. . pick a man. with 5! outcomes and the number of good outcomes is 3! · 3!. to ﬁll k ordered slots with elements from the set with n objects. 5! Example 2. n k! = n(n − 1) . Pick. also compute P (men and women alternate). and sit him ﬁrst. deﬁned next. (n − k + 1). say. n−k More general are the multinomial coeﬃcients. What is the probability that all three national subgroups end up sitting together? The answer is 3!·4!·5!·2! . and 5! choices for arranging the Finns.8. and 5 Finns. A group consists of 3 Norwegians. Here is how you 11! count the good events. depending on the ordering). there are 2! choices to order the two blocks of Swedes and Finns. the seats for men and women are ﬁxed after John Smith takes his seat and so 1 the answer is 3!2! = 10 . For case (2). Note also that k=0 k x y n 0 = n n =1 and n k = n .

. as one needs to choose the position of the ﬁve heads among 10 slots to ﬁx a good outcome. Order the rest of the cards in 48! ways. . 2. These colors distinguish between diﬀerent subsets.10. etc. + nr = n is denoted by n1 .1 and do it more slowly. What is the probability that 3 are blue and 2 are red? The number of outcomes is 100 . with 50 of them red and 50 blue. It cannot be exactly a half either. second 13 cards to the second player.12.3189 Why should this probability be less than a half? The probability that 3 are blue and 2 are red is equal to the probability that 3 are red and 2 are blue.. . 4. nr ! n n1 n − n1 − n2 n − n1 − . . . The number of choices to ﬁll these four positions with (four diﬀerent) Aces is 4!.. Pick a slot with each of the four segments of 13 slots for an Ace.. Example 2. . − nr−1 . We have a bag that contains 100 balls. so if it were greater than a half. The ﬁrst way uses an outcome to be an ordering of 52 cards: 1. A fair coin is tossed 10 times. n2 . There are 134 possibilities to choose these four slots for Aces. Example 2. . There are 52! equally likely outcomes. .11. .” Then the answer is P (3 are blue and 2 are red) = 50 50 3 2 100 5 ≈ 0. and all of them are equally likely. etc. both together would add to over 1... n3 nr To understand the slightly confusing word distinguishable.2461. . nr elements where n1 + . Example 2. and n n1 . Shuﬄe a standard deck of 52 cards. a reasonable interpreta5 tion of “select 5 balls at random. .. then n2 diﬀerent elements blue. . What is the probability that we get exactly 5 Heads? P (exactly 5 Heads) = 10 5 210 ≈ 0.nr . because other possibilities (like all 5 chosen balls are red) have probability greater than 0. just think of painting n1 elements red. Let the ﬁrst 13 cards go to the ﬁrst player. to emphasize that you often have a choice in your set of equally likely outcomes. . Select 5 balls at random. and deal 13 cards to each of the 4 players. Here we return to Example 1.2 COMBINATORICS 7 The number of ways to divide a set of n elements into r (distinguishable) subsets of n n1 . What is the probability that each player gets an Ace? We will solve this problem in two ways. . nr = = n − n1 n2 n! n1 !n2 ! . 3. .

3. Doing the problem the second way.. . . 2. so. You may choose to believe this intuitive fact or try to write down a formal proof: the number of permutations that result in a given set F of four positions of Aces in the deck is independent of F . which agrees with the number we obtained the ﬁrst way. with choices. and ﬁll the remaining 7 (12)(9)47 2 slots by numbers 3.(2) 2 2 2 612 12 2 10 2 . Number of outcomes is 52 4 (52) 4 134 . etc. Number of outcomes is 52 4 . Outcomes are the positions of the 4 Aces among the 52 slots for shuﬄed cards of the deck. 4 The answer is then Ace. 2 appears exactly 2 times)? To ﬁx a good outcome now.2 COMBINATORICS 134 4!48! 52! .13. Each room is painted at random with one of the four colors. as we need to choose one slot among 13 cards that go to the ﬁrst player. 3. two slots for 2. 3 2 Example 2. Pick two slots for 1. The probability is then get 1. . 2 2 . for example.. 4 1 2. blue. and the answer 3 612 . 8 The probability is then The second way. The number of choices is 12 9 47 . Then: 1. The probability is then (12)(10). 2. via a small leap of faith. There are 414 equally likely outcomes. green. white. Let’s also compute P (one person has all four Aces). The number of good outcomes is 134 .. P (5 white. a lot smaller than the probability of each player getting an (52) 4 Example 2. To ﬁx a good outcome.. Roll a die 12 times. we .0106. What is the P (1 appears exactly 3 times. We have 14 rooms with 4 colors. assumes that each set of four positions of the four Aces among the 52 shuﬄed cards is equally likely.. choices) and pick four slots for Aces for that (4)(13) 1 4 = 0.14. To ﬁx a good outcome. Number of is therefore outcomes 612 . and yellow. 2 yellow rooms) = 14 5 9 4 5 3 2 2 414 . . 6. 4 blue. pick three slots for 1. P (each number appears exactly two times)? 1. An outcome consist of ﬁlling each of the 12 slots (for 12 rolls) with an integer between 1 and 6 (the outcome of the roll). pick one player ( player ( 13 choices). 3 green. . then pick two slots for 2. etc.

3. . 2. . (b) What do you need to assume to conclude that all outcomes are equally likely? (c) Under this assumption. 4. What is the probability that they all receive correct meals? A reformulation makes the problem clearer: we are interested in P (3 people who ordered C get C). . Assume all licence plates are equally likely. (a) What is the probability that all symbols are diﬀerent? (b) What is the probability that all symbols are diﬀerent and the ﬁrst number is the largest of all numbers on the plate? 2. . The outcome is a selection of 3 people from the 7 who receive C. letter. First n people are chosen at random from the 2n (with no regard to nationality). . Each pair proceeds to play one match. number. . number. . letter. Three of them order chicken (C) and the remaining four pasta (P). 1. n Swedes and n Norwegians. and (c) that that all Norwegians all Swedes and all Finns sit together. 35 3· ·4 4 2 7 3 P (a single person who ordered C gets C) = P (two persons who ordered C get C) = 3 2 7 3 = 12 . . Let’s label people 1. . (a) What is the probability that all your numbers are selected? (b) What is the probability that none of your numbers is selected? (c) What is the probability that exactly 4 of your numbers are selected? . . then paired randomly with the other n people. 35 18 . Similarly 7 (3) P (no one who ordered C gets C) = 4 3 7 3 = 4 . but has forgotten who ordered what and discovers that they are all asleep. So the answer is 1 = 35 . 6 Swedes and 7 Finns are seated at random around a table. Then 20 balls are selected (without replacement) out of the bag at random. 5 Norwegians. compute the probability that all Swedes are winners. . Compute the following probabilities: (a) that all Norwegians sit together. . (a) Determine the number of outcomes. so she puts the meals in front of them at random. 35 = Problems 1. where a letter is any one of 26 letters and a number is one among 0. giving winner and loser in each of the n matches. 9. 7. 80 and write them on a piece of paper. 80. . letter. (b) that all Norwegians and all Swedes sit together. .2 COMBINATORICS 9 Example 2. the number of them is 7 and there is a single 3 1 good outcome. The ﬂight attendant returns with the meals. Before the game starts. A California licence plate consists of a sequence of seven ordered symbols: number. and 3 ordered C. . and assume that 1. A middle row on a plane sits 7 people. number. . A bag contains 80 balls numbered 1. A tennis tournament has 2n participants. An outcome is a set of n (ordered) pairs. you choose 10 diﬀerent numbers from amongst 1.15.

. (a) 13!·5! 17! . Pull 8 cards from the deck at random. (c) The number of good events is n!. 17! (70) (70) (10)(70) 10 4 20 16 . then pair them: 2n ·n! choices. (a) Divide into two groups (winners and losers). 104 ·263 1 (b) the answer in (a) times 4 . 3. (b) 8!·5!·6! 17! .) (b) All players are of equal strength. Solutions to problems 1. (a) 5. these two expressions are the same. n pair ﬁrst player. .2 COMBINATORICS 10 5. (a) 10·9·8·7·26·25·24 . the choice of a Norwegian paired with each Swede. (c) (20) (20) (80) 20 (39) 8 . and then at the end choose winners and losers: (2n − 1)(2n − 3) · · · 3 · 1 · 2n . 2. (b) 80 . etc. equally likely to win or lose any match against any other player. In each case compute the probability that you get no Hearts. (c) 2!·7!·6!·5! . (a) 4. (a) without replacement and (b) with replacement. (Of course. 80 . then next available player. Alternatively. A full deck of 52 cards contains 13 Hearts. (b) (52) 8 3 8 4 .

at most one of such events can occur. P (A) ≥ 0. and probability (which attaches to every event the likelihood that it happens). · · · ∈ F =⇒ ∪∞ Ai ∈ F. but this is just for illustration and you should take a course in measure theory to understand what is really going on. Finally. unions of two or more events (i. i=1 What is important is that you can take complement Ac of an event A (i. . sometimes called a sample space. we will see soon what it means to choose a random point within a unit circle. A probability space is then a triple (Ω. On the other hand. For example. and intersection of two or more events (i. we will also see that there is no way to choose at random a positive integer — remember that “at random” means all choices are equally likely.e.. . . Therefore. .e.3 AXIOMS OF PROBABILITY 11 3 Axioms of Probability The question here is: how can we mathematically deﬁne a random experiment? What we have are outcomes (which tell you exactly what happens). . inﬁnite sets are much harder to deal with. We need to agree on which properties these objects must have in order to compute with them and develop a theory. Namely. For every event A. you can assume so without worry. unless otherwise speciﬁed. When we have ﬁnitely many equally likely outcomes all is pretty clear and we have already seen many examples. F. A1 ∩ A2 happens when both A1 and A2 happen). Axiom 3. A ∈ F necessarily means that A ⊂ Ω.e. If A1 . Can we just say that each A ⊂ Ω is an event? In this course. . (2) A ∈ F =⇒ Ac ∈ F. . We call events A1 . Axiom 2. The ﬁrst object Ω is an arbitrary set of outcomes. as is common in mathematics. the probability P is a number attached to every event A and satisﬁes the following three axioms: Axiom 1.. A2 . The second object F is the collection of all events. A2 . However. is a sequence of pairwise disjoint events. pairwise disjoint if Ai ∩ Aj = ∅ if i = j — that is.. the ﬁrst thing to do is to look for examples. i=1 Whenever we have an abstract deﬁnition like this. that is a set of subsets of Ω. events (sets comprising certain outcomes). and (3) A1 . Here are some. Ac happens when A does not happen). F needs to be a σ-algebra. P ). which means (1) ∅ ∈ F. then P( ∞ Ai ) = ∞ i=1 P (Ai ). P (Ω) = 1. A2 . although there are good reasons for not assuming so in general! I will give the deﬁnition of what properties F needs to satisfy. A1 ∪ A2 happens when either A1 or A2 happens).

toss a fair coin repeatedly until ﬁrst Heads. pi = 2i .}. Pick a point from inside the unit circle centered at the origin. Your outcome is the number of tosses. . . 5. and you have numbers p1 . (C2) P (Ac ) = 1 − P (A). In Axiom 3. . 1 Here. . of elements in A) . 6}. Example 3. 2. take all sets to be ∅. Example 3. 6 12 The random experiment here is rolling a fair die. For any A ⊂ Ω. y) : x2 + y 2 ≤ 1} and (area of A) P (A) = . then P (A) = 0. p2 . .3. take all sets other than ﬁrst two to be ∅. . Ω = {1. P (A) = pi . P (A) = (no. . π It is important to observe that if A is a singleton (as set whose element is a single point). Proof . Ω = {1. . . Proof . Clearly this can be generalized to any ﬁnite set with equally likely outcomes. i∈A For example. ≥ 0 with p1 + p2 + . 2. Here. Note that pi cannot be chosen to be equal.1. Ω = {(x. 3. This means that we cannot attach the probability to outcomes — you hit a single point (or even a line) with probability 0. as you cannot make the sum of inﬁnitely many equal numbers to be 1.2. Another important theoretical remark: this is a case where A cannot be arbitrary subset of the circle — there are sets for which area cannot be deﬁned! Consequences of the axioms (C0) P (∅) = 0. = 1. then P (A1 ∪ A2 ) = P (A1 ) + P (A2 ). but a “fatter” set with positive area you hit with positive probability. (C1) If A1 ∩ A2 = ∅. In Axiom 3.3 AXIOMS OF PROBABILITY Example 3. 4.

3 AXIOMS OF PROBABILITY Proof . . A2 = Ac . Let P (A \ B) = p1 . (C4) If A ⊂ B. and next that the . p13 = P (A1 ∩ Ac ∩ A3 ). Again note that all sets are pairwise disjoint. 13 (C3) 0 ≤ P (A) ≤ 1. . Use that P (Ac ) ≥ 0 in (C2). Proof . Use (C1) for A1 = A and A2 = B \ A. 2 3 2 1 and p123 = P (A1 ∩ A2 ∩ A3 ). p2 = P (Ac ∩ A2 ∩ Ac ). Let p1 = P (A1 ∩ Ac ∩ Ac ). Then P (A) = p1 + p12 . P (A ∩ B) = p12 and P (B \ A) = p2 . Apply (C1) to A1 = A. 2 3 1 3 c p3 = P (A1 ∩ Ac ∩ A3 ). This is called inclusion-exclusion formula and is commonly used when it is easier to compute probabilities of intersections that of unions. We prove this only for n = 3. p12 = P (A1 ∩ A2 ∩ Ac ). (C6) P (A1 ∪ A2 ∪ A3 ) = P (A1 ) + P (A2 ) + P (A3 ) + P (A1 ∩ A2 ∩ A3 ) − P (A1 ∩ A2 ) − P (A1 ∩ A3 ) − P (A2 ∩ A2 ) and more generally n P (A1 ∪ · · · ∪ An ) = i=1 P (Ai ) − n−1 1≤i<j≤n P (Ai ∩ Aj ) + 1≤i<j<k≤n P (Ai ∩ Aj ∩ Ak ) + . + (−1) P (A1 ∩ · · · ∩ An ). Proof . P (B) = p2 + p12 and P (A ∪ B) = p1 + p2 + p12 . P (B) = P (A) + P (B \ A) ≥ P (A). Proof . (C5) P (A ∪ B) = P (A) + P (B) − P (A ∩ B). Proof . and note that sets appearing here are pairwise disjoint. p23 = P (Ac ∩ A2 ∩ A3 ).

7! 7! 7! Example 3.6. P (AS ∩ AF ) = . The group of 3 Norwegians. Our probability equals 1 − P (A12 ∪ A15 ) = 1 − P (A12 ) − P (A15 ) + P (A12 ∩ A15 ) = 1 − P (A12 ) − P (A15 ) + P (A60 ) 66 16 83 − + = 0. P (AS ) = . 11! 11! 11! 3! · 4! · 6! 3! · 5! · 5! 4! · 5! · 4! P (AN ∩ AS ) = . Compute the probability that at least one of the three groups ends up sitting together. Pick an integer in [1. where lcm stands for the least common multiple. 1000] at random.867. The sample space consists of 1000 integers between 1 and 1000. 11! P (AN ) = and therefore P (AN ∪ AS ∪ AF ) = 3! · 9! + 4! · 8! + 5! · 7! − 3! · 4! · 6! − 3! · 5! · 5! − 4! · 5! · 4! + 3! · 4! · 5! · 2! . A large company of n employees has a scheme according to which each employee buys a Christmas gift. and similarly AS . and gifts are then distributed at random to the employees. The cardinality of Ar is 1000/r . Compute the probability that it is divisible neither by 12 nor by 15. Matching problem. =1− 1000 1000 1000 Example 3. A1 ∩ A2 = {both women and men sit together}. AF . Example 3. What is the probability that someone gets his or her own gift? . P (AN ∩ AF ) = .7. Deﬁne AN = {Norwegians sit together}. What is the probability that either all the men or all the women end up sitting next to each other? Here A1 = {all women sit together}.4. We have 4! · 8! 5! · 7! 3! · 9! . A2 = {all men sit together}. and let Ar be the subset consisting of integers divisible by r. 4 Swedes and 5 Finns is seated at random around the table. 11! 11! 11! 3! · 4! · 5! · 2! P (AN ∩ AS ∩ AF ) = . Sit 3 men and 4 women at random in a row. Another simple fact is that Ar ∩ As = Alcm(r. P (AN ) = .5. and so the answer is P (A1 ∪ A2 ) = P (A1 ) + P (A2 ) − P (A1 ∩ A2 ) = 4! · 4! 5! · 3! 2! · 3! · 4! + − .3 AXIOMS OF PROBABILITY 14 right hand side of (6) is (p1 + p12 + p13 + p123 ) + (p2 + p12 + p23 + p123 ) + (p3 + p13 + p23 + p123 ) + p123 − (p12 + p123 ) − (p13 + p123 ) − (p23 + p123 ) = p1 + p2 + p3 + p12 + p13 + p23 + p123 = P (A1 ∪ A2 ∪ A3 ).s) . 11! Example 3.

Assume you have k people in the room. + (−1)n−1 2! 3! n! 1 → 1 − ≈ 0. . and some values are given in the following table.) Let Ai = {ith employee gets his or her own gift}.9901 0.5073 0. for n = 365 0.3 AXIOMS OF PROBABILITY 15 You should note that this is a lot diﬀerent than asking. the lowest k for which this is above 0. . We have P (Ai ) P (Ai ∩ Aj ) P (Ai ∩ Aj ∩ Ak ) P (A1 ∩ · · · ∩ An ) and so our answer is 1 n n 1 1 1 − + − .9032 0. k 10 23 41 57 70 prob. Birthday Problem.5 is famously k = 23. Then what we are looking for is n P( i=1 Ai ).9992 . assume all birthdays are equally likely. = 1 (for all i) n (n − 2)! 1 = (for all i < j) n! n(n − 1) 1 (n − 3)! = (for all i < j < k) n! n(n − 1)(n − 2) 1 n! When n = 365.8. P (a shared birthday) = 1 − P (no shared birthdays) = 1 − n · (n − 1) · · · (n − k + 1) . What is the probability that there are two who share a birthday? We will ignore leap years. e n· Example 3. . assuming you are one of the employees. sample k times with replacement.. and generalize the problem a little: from n possible birthdays.1169 0. + (−1)n−1 · · n 2 n(n − 1) 3 n(n − 1)(n − 2) n! 1 1 1 = 1 − − + .. . 1 for the probability that you get your own gift. nk = = = . (Which is n .6321 (as n → ∞).

Within the context of the previous problem. More generally. a remarkably improbable event. not understanding the Birthday Problem. no winning number has ever been repeated.19 · 10−10 .. What happened was that Mr. our probability is n! nn . Thus there is sampling with replacement from among n = 10. conﬁdent that there would not be any. 1978. ..].9. If you buy k boxes. On February 6. what is the probability that you have a complete collection? When k = n. get anything close the correct numbers when k ≤ n if you try . Apologetic lottery oﬃcials announced later that there indeed were repetitions. the expert [and a lottery oﬃcial] doesn’t expect to see duplicate winners until about half of the 10. 000 choices each day. assume k ≥ n and compute P (all n birthday are represented). Hughes. i=1 = = ..” What would an informed reader make of this? Assuming k = 660 days. did not check for repetitions. = (n − 1)k nk (n − 2)k nk 0 (for all i) (for all i < j) n−1 n k + n 2 n−2 n k k − .. Massachusetts lottery chooses a four digit number at random. let Ai = {ith birthday is missing}. . [David] Hughes. the Boston Evening Globe reported that “During [the lottery’s] 22 months existence [. 000 possibilities have been exhausted. and 0 when k < n. neither of which is obvious from the formula. n! This must be nn for k = n. each of which contains one of n diﬀerent cards (coupons). + (−1)n−1 = i=0 n i (−1)i 1 − n i n n−1 1 n k . Then we need to compute 1 − P( Now P (Ai ) P (Ai ∩ Aj ) P (A1 ∩ · · · ∩ An ) and our answer is 1−n n n Ai ). Neither will you. More often this is described in terms of cereal boxes. Coupon Collector Problem. chosen at random. Example 3. for large n. with leading 0’s allowed.3 AXIOMS OF PROBABILITY 16 Each day. the probability of no repetition works out to be about 2.

10.9990 k 13 23 36 prob. our probability is 10−i 28−2i (8−2i)(10) i (20) 8 . Example 3. our probability is 6(5)(12)(6) 2 6 3 . Now there are 20 8 outcomes. Compute the probability that a number occurs 6 times and two other numbers occur three times each.0101 0. Pick the two numbers that occurs 3 times each: choices.1003 0. due to very large size of summands with alternating signs and resulting rounding errors. 12 6 6 3 3. We will return to this problem later for a much more eﬃcient computational method. To count the number of good outcomes: 10 i 1.n is the Stirling number of the second kind. 10−i 8−2i 2.9002 0. 612 choices.5004 0. Pick 8 socks at random. Pick the number that occurs 6 times: 6 1 = 6 choices. where Sk. The number of outcomes is 612 . Pick pairs which are represented by a single sock: choices. so we do them here instead. but some numbers are in two tables below. Pick slots (rolls) for the number that occurs 6 times: 4.n . Another remark for those n! who know a lot of combinatorics: you will perhaps notice that the above probability is nk Sk. 3.9108 0. k 1607 1854 2287 2972 3828 4669 prob. 5 2 2. Therefore. Pick i pairs of socks of the 10: choices. for n = 365 0. choices.9915 More examples with combinatorial ﬂavor We will now do more problems which would belong more into previous chapter. Example 3. .11.5139 0. To count the number of good outcomes: 1. for n = 6 0. For every i. You have 10 pairs of socks in the closet. Pick a sock from each of the latter pairs: 28−2i choices. Roll a die 12 times. Pick slots for one of the numbers that occur 3 times: Therefore. compute the probability that you get i complete pairs of socks.3 AXIOMS OF PROBABILITY 17 to compute the probabilities by computer. but are a little harder.9900 0.

In the deﬁnitions. A.021128 0.003940 0. This sequence orders cards in descending consecutive values. Q. 2. here are the hands: (a) one pair : two cards of same value plus 3 cards with diﬀerent values J♠ J♣ 9♥ Q♣ 4♠ (b) two pairs: two pairs plus another card of diﬀerent value J♠ J♣ 9♥ 9♣ 3♠ (c) three of a kind: three cards of the same value plus two with diﬀerent values Q♠ Q♣ Q♥ 9♣ 3♠ (d) straight: ﬁve cards with consecutive values 5♥ 4♣ 3♣ 2♥ A♠ (e) ﬂush: ﬁve cards of the same suit K♣ 9♣ 7♣ 6♣ 3♣ (f) full house: a three of a kind and a pair J♣ J♦ J♥ 3♣ 3♠ (g) four of a kind: four cards of the same value K♣ K♦ K♥ K♣ 10♠ (e) straight ﬂush: ﬁve cards of the same suit with consecutive values A♠ K♠ Q♠ J♠ 10♠ And here are the probabilities: hand one pair two pairs three of a kind straight ﬂush full house four of a kind straight ﬂush no.3 AXIOMS OF PROBABILITY 18 Example 3. 1. 10. 8. J.12.001981 0. for example.001441 0. 3. K.047539 0. 4. K. combinations 13 · 12 · 4 · 43 3 2 13 4 4 2 · 11 · 2 · 2 · 4 13 · 12 · 4 · 42 2 3 10 · 45 4 · 13 5 13 · 12 · 4 · 4 3 2 13 · 12 · 4 10 · 4 approx. the word value refers to A. 7. Poker Hands.000015 . 9.422569 0. 6. 5. with one exception: an Ace may be regarded as 1 for the purposes of making a straight (but note that. From lowest to highest. prob.000240 0. 3 is not a valid straight sequence — A can only begin or end a straight). 0. 2.

e. Pick a number for the pair: 13. 4 3 3. Pick a card (i. 598. 2. 4. Let’s see how some of these are computed. P (nothing) = P (all cards with diﬀerent values) − P (straight or ﬂush) = = 13 5 52 5 13 5 · 45 − (P (straight) + P (ﬂush) − P (straight ﬂush)) 13 5 · 45 − 10 · 45 − 4 · 52 5 + 40 ≈ 0. Our ﬁnal worked out case is straight ﬂush: 1. Pick two cards from the value of the pair: 4. We end this example by computing the probability of not getting any hands listed above. Pick three cards from the four of the same chosen value: . Pick the suits of the other three values: 43 And for the ﬂush: 1. 2. Pick a suit: 4. Pick ﬁve numbers: 13 5 . the number of good outcomes by listing the number of choices: 1. Choose the values of other two cards: 12 2 .5012. for the three of a kind. the suit) from each of the two remaining values: 42 .3 AXIOMS OF PROBABILITY 19 You should note that the probabilities of straight and ﬂush above include the possibility of a straight ﬂush. 960. 2. Pick a suit: 4. Pick the other three numbers: 12 3 4 2 3. One could do the same for one pair : 1. First. . Pick the beginning number: 10. the number of all outcomes is 52 = 5 2. that is. for example.. 2. Choose a value for the triple: 13. Then.

5? This is an example when careful thinking about what outcomes should be really pays oﬀ. 2.3 AXIOMS OF PROBABILITY 20 Example 3. Any such distribution constitutes an outcome. What is the probability that at least one of the three nations is not represented on the committee? 4. etc. for 10−2i choices. . What is the probability that exactly 2i rooms are mixed. 4) mixed rooms. so let’s call these two slots R2. and 10 Norwegians. 3) rooms. 10 Finns and 10 Danes. Their number is 20 . 6). 2 per room. are to be distributed at random into 10 rooms. i = 0.. and they are equally likely.. Consider the following model for distributing Scandinavians into rooms.. so let’s call these two slots R1. . For this. . and this choice ﬁxes a 5−i good outcome. First arrange them at random onto a row of 20 slots S1. S2. . Roll a single die 10 times. Problems 1. Three married couples take seats around a table at random. A committee of ﬁve people is chosen at random from this group. 10 To get 2i (for example. Compute P (no wife sits next to her husband). two 2’s and ﬁve 3’s. S4. Compute the following probabilities: (a) that you get at least one 6. 4) out of 10 rooms which are going to be mixed. 5. There are 10 of these choices. . start by choosing 2i (ex. . . as there are two D’s in each of those rooms. Compute the probability that no digit appears more than twice. Room 1 then takes people in slots S1. Once these two choices are made. (c) that you get three 1’s. (b) that you get at least one 6 and at least one 5. . room 2 takes people in slots S3. 3. Choose each digit of a 5 digit number at random from digits 1. 3) rooms out of the remaining 10 − 2i (ex. Similarly. for 22i choices. you need to choose 5 − i (ex. 6) D’s to distribute into 5 − i (ex. You also need to make a choice into 2i which slot in each of the 2i chosen mixed rooms the D goes. . (a) Compute the probability that at least one number occurs exactly 6 times. .. A group of 20 Scandinavians consists of 7 Swedes. S2.. S20. 3 Finns. corresponding to the positions of 10 Danes. you still have 10 − 2i (ex. The ﬁnal answer therefore is 10 2i 22i 20 10 10−2i 5−i . 9.13. Roll a fair die 10 times. . Assume that 20 Scandinavians. (b) Compute the probability that at least one number occurs exactly once. Now it is clear that we only need to keep track of distribution of 10 D’s into the 20 slots.

The drawing consists of choosing 5 diﬀerent numbers from 1. In the remaining row of 5 sets. Moreover. by inclusion-exclusion. 2 1 2 11 −3· + = . 5 5 15 15 3. . 3. for example. P (A1 ∪ A2 ∪ A3 ) = 3 · and our answer is 4 15 . The ﬁrst term is the number of 3 4 numbers in which a digit appears 3 times. and couple2 . while Ticket 3 loses. 6. 2. then Ticket 1. 5. . A2 = the event that Finns are not represented. Solutions to problems 1. couple2 . . . Now. P (A1 ∪ A2 ∪ A3 ) = P (A1 ) + P (A2 ) + P (A3 ) − P (A1 ∩ A2 ) − P (A1 ∩ A3 ) − P (A3 ∩ A2 ) +P (A1 ∩ A2 ∩ A3 ) 1 13 17 10 10 7 = 20 + + − −0− +0 5 5 5 5 5 5 4. P (A1 ) = P (A1 ∩ A2 ) = P (A1 ∩ A3 ) = P (A2 ∩ A3 ) = P (A1 ∩ A2 ∩ A3 ) = For P (A1 ∩ A2 ). The complement is a union of three events Ai = {couple i sits together}. A lottery ticket consists of two rows. 17 are drawn. but no digit appears 4 times: choose a digit. each containing 3 numbers from 1. A ticket wins if its ﬁrst row contains at least two of the numbers drawn. . then the ordering of seats within each of couple1 . (c) 2 = P (A2 ) = P (A3 ). pick the ordering for couple1 . 50 at random. pick a seat for husband3 . . (b) 1 − 2 · (5/6)10 + (2/3)10 . 2. and A3 = the event that Norwegians are not represented. 2. 5! 5 2! · 2! · 2! · 2! 2 = . . if the numbers 1. Thus there are four types of tickets. 2. Ticket 2 and Ticket 4 all win. The number of bad events is 9 · 5 · 82 + 9 · 5 · 8 + 9. and wife3 . i = 1. and the second row contains at least two of the numbers drawn. choose 3 . Let A1 = the event that Swedes are not represented. 3. (a) 1 − (5/6)10 . 5 3! · 2! · 2! 1 = .3 AXIOMS OF PROBABILITY 21 6. 5! 15 10 3 7 2 6−10 . . as shown below: Ticket 1 1 2 3 4 5 6 Ticket 2 1 2 3 1 2 3 Ticket 3 1 2 3 2 3 4 Ticket 4 1 2 3 3 4 5 For example. Compute the winning probabilities for all four tickets. 50.

6 (b) (a) Now Ai is the event that the number i appears exactly once. The second term is the number of numbers in which a digit appears 4 times. 5. then ﬁll the remaining position. three on the other) 3 · 3 · 44 + 2 · 3 402 = = 50 . 50 5 5 . a hit is shorthand for a chosen number. but no digit appears 5 times. By inclusion-exclusion. P (ticket 1 wins) = P (two hits on each line) + P (two hits on one line.3 AXIOMS OF PROBABILITY 22 positions ﬁlled by it. − · 10 · 9 · · 10 · 9 · 8 · · 10 · 9 · 8 · 7 · · 10 · 9 · 8 · 7 · 6 · 6. 10 · 54 P (A1 ∪ A2 ∪ A3 ∪ A4 ∪ A5 ∪ A6 ) = 6 · 6 10 . (a) Let Ai be the event that the number i appears exactly 6 times. and the last term is the number of numbers in which a digit appears 5 times. Below. The answer then is 1− 9· 5 3 · 82 + 9 · 95 5 4 ·8+9 . P (A1 ∪ A2 ∪ A3 ∪ A4 ∪ A5 ∪ A6 ) = 6P (A1 ) 6 2 6 3 6 4 6 5 6 6 − + − + − P (A1 ∩ A2 ) P (A1 ∩ A2 ∩ A3 ) P (A1 ∩ A2 ∩ A3 ∩ A4 ) P (A1 ∩ A2 ∩ A3 ∩ A4 ∩ A5 ) P (A1 ∩ A2 ∩ A3 ∩ A4 ∩ A5 ∩ A6 ) 59 610 48 610 37 610 26 610 1 610 = 6 · 10 · 6 2 6 + 3 6 − 4 6 + 5 − 0. As Ai are pairwise disjoint.

and P (ticket 3 wins) = P (2. 4 both hit and one of 2.3 AXIOMS OF PROBABILITY 23 and P (ticket 2 wins) = P (two hits among 1. P (ticket 4 wins) = P (3 hit. 4. 3) = 3· 47 3 + 50 5 47 3 = 49726 50 5 . 2. 5 all hit) = (4 45 2 + 4 · 45 + 1) + 45 50 5 = 4186 50 5 . 3 both hit) + P (1. 2. 2. 3 hit) = 48 3 +2· 50 5 46 2 = 19366 50 5 . at least one additional hit on each line) + P (1. and ﬁnally. 3) + P (three hits among 1.

B. Lucky you! Now I’m going to pull the trigger one more time. we do not have a notation. Other than color and fuzziness. P (F ) P (the selected ball is fuzzy) This conveys the idea that with additional information the probability must be adjusted. 2 are red and 2 are blue. Here’s the barrel of the gun. P (B) Example 4. The former. a revolver. We observed that P (R) = 5/11. Say bookies estimate your basketball’s team chances of winning a particular game to be 0. however. the probability that the pulled-out cube is red is 5/11.6. Which would you prefer. so we introduce it here: P (R|F ) = 3/7. Then the game starts and at half time your time lead by 25 points. the game in this case) is known.) “Let’s play a game of Russian roulette. take events A and B. For the probability of a red cube. the macabre tone is not unusual for such interviews. You cannot expect that the bookies’ odds will remain the same and they change. to 0. 3 are red and 4 are blue. Now you reach into the bag and grab the cube and notice it is fuzzy (but do not pull it out or note its color in any other way). all probabilities are trivial. Here is a question asked on Wall Street job interviews. when complete information (that is. conditioned on it being fuzzy. Before you start your experiment. Your plan is to pull a cube out of the bag at random. six chambers. You are given the choice between an unconditional and a conditional probability of death. the odds will change. out of which 7 have fuzzy surface and 4 have smooth surface.4 CONDITIONAL PROBABILITY AND INDEPENDENCE 24 4 Conditional Probability and Independence Example 4. I close the barrel and spin it. and assume that P (B) > 0. Here’s a gun. and out of 4 smooth ones. You are tied to your chair. Assume that you have a bag with 11 cubes.1. Now watch me as I put two bullets into the barrel. This is common in real life. blue. respectively. and smooth. S that the pulledout cube is red. Click. Finally. Out of 7 fuzzy ones.3. F . Two hours before the game starts. I put a gun to your head and pull the trigger. say to 0. Then the conditional probability of A given B equals P (A|B) = P (A ∩ B) . it comes out that your team’s star player is out with a sprained ankle. fuzzy. say. Again. or that I just pull the trigger?” Assume that the barrel rotates clockwise after the hammer hits and is pulled back. For general deﬁnition. Note that this also equals P (the selected ball is red and fuzzy) P (R ∩ F ) = . 24 hours before the game starts. 0 or 1. that I spin the barrel ﬁrst. .8.2. the outcome of your experiment. into two adjacent chambers. cubes have no other distinguishing characteristics. (This is the original formulation. So there are 5 red and 6 blue cubes. all empty. Clearly the probability should now change to 3/7! Your experiment clearly has 11 outcomes. Consider the events R.

7386. the answer should be a little larger than before. What is the probability that all three are white? We already know how to do part (a): . the conditional probability Q(A) = P (A|B) satisﬁes the three axioms in Chapter 3. Example 4. after canceling 9 6 10 7 1 210 . and P (A|B) = P (both H) P (A ∩ B) = = P (B) P (at least one H) 1 4 3 4 1 = . Example 4. Toss a coin 10 times. the ﬁrst toss is included in 7 your choice with probability 10 . or can be easily determined. especially in sequential random experiments.3. they are equally likely to occur on any given subset of 7 tosses.) Thus Q is another probability assignment. An urn contains 10 black and 10 white balls. 10 Why is this not surprising? Conditioned on 7 Heads. if the trigger is pulled without the extra spin. (This is routine to check and a reader who is more theoretically inclined might view it as a good exercise. P (ﬁrst toss H|exactly 7 H’s) = 9 6 10 7 · 1 210 · 21 10 = 7 . and all consequences of the axioms are valid for it. remains 1/3. blindfolded. the answer is. Toss two fair coins. equals the probability that the hammer clicked on an empty slot which is next to a bullet in the counterclockwise direction. B = {at least one H}. 3 Example 4. The latter. etc.5. and (b) with replacement. 9 8 10 9 + + 9 7 10 8 + + + + 9 9 10 10 = 65 ≈ 0. Then we can use P (A1 ∩ A2 ) = P (A1 )P (A2 |A1 ).4 CONDITIONAL PROBABILITY AND INDEPENDENCE 25 if the barrel is spun again. P (A1 ∩ A2 ∩ A3 ) = P (A1 )P (A2 |A1 )P (A3 |A1 ∩ A2 ). For a ﬁxed condition B. Here A = {both H}. For (b). because this condition is more advantageous for Heads. Somebody tells you that you tossed at least one Heads. Conditional probabilities are sometimes given. 88 Clearly. Draw 3 (a) without replacement. If you know (a) that exactly 7 Heads are tossed. and acting on events A. what is the probability that your ﬁrst toss is Heads? For (a). What is the probability that both your tosses are Heads.4. (b) that at least 7 Heads are tossed. and equals 1/4. If you choose 7 tosses out of 10 at random.

2 2. which equals. Assume that F1 . Then. then P (A|F ) = 1 .” The formula is useful in sequential experiments. P (3rd ball is white|1s two picked are white= 9 8 Our probability is then the product 1 · 19 · 18 . Number of outcomes: . imagine drawing the balls sequentially. what we obtained before. . which could be Heads (event F ) or Tails (event F c ). P (A) = P (F1 )P (A|F1 )+P (F2 )P (A|F2 )+. P (F1 )P (A|F1 ) + P (F2 )P (A|F2 ) + . that is. where the previous colors pulled-out balls do 1 3 not aﬀect probabilities on the subsequent stages. 3. ∪ Fn )) = P ((A ∩ F1 ) ∪ . . Number of ways to select 3 balls out of 10 white ones: Our probability is then . Flip a fair coin. exactly one of them always happens. and 3rd ball is white). . P (2nd ball is white|1st ball is white) = 9 19 . If you toss Tails.1. . (10) 3 . . The answer therefore is 2 . If you toss Heads. 9 . . .4 CONDITIONAL PROBABILITY AND INDEPENDENCE 20 3 26 1. . P (A|F c ) = 2·5 . . roll 1 die. . ∪ Fn = Ω. Compute the probability that you roll exactly one 6. ∪ (A ∩ Fn )) = P (A ∩ Ω) = P (A) We call an instance of using this formula “computing the probability by conditioning on which of the events Fi happens. P (1st ball is white) = 1 . 10 3 2. Fn are pairwise disjoint and that F1 ∪ . when you face diﬀerent experimental conditions on the second stage depending on what happens on the ﬁrst stage. Quite often. P (F ) = P (F c ) = 1 and 6 36 2 so 2 P (A) = P (F )P (A|F ) + P (F c )P (A|F c ) = . A. If A = {exactly one 6}. Then we are computing the probability of intersection of three events: P (1st ball is white. . Theorem 4. The relevant probabilities are: 1. that is.+P (Fn )P (A|Fn ) . . Example 4. roll 2 dice. 2nd ball is white. as it must. . .6. an event F and its complement F c and we are thus conditioning on whether F happens or not. Proof. . (20) 3 To do this problem another way. 8 18 . for any event. 2 This approach is particularly easy in case (b). + P (A ∩ Fn ) = P (A ∩ (F1 ∪ . . First Bayes’ formula. + P (Fn )P (A|Fn ) = P (A ∩ F1 ) + . there are just two events Fi . Here you condition on the outcome of the coin toss.

pk. whereby you (1) reinterpret the problem as a sequential experiment and (2) use the Bayes’ formula with “conditions” Fi being relevant events at the ﬁrst stage of the experiment. p0.1. As promised. P (Fi ) = 6 .0 = 0. Here is how it works on this example.0 = 1. Theorem 4.9. the probability that the newcomer does not duplicate one of existing r − 1 birthdays. for r > 0. . This will be another example of conditioning. Let F3 be the event that any other number of birthdays occurs with k − 1 people. Clearly. If A is the event that you get at least one Ace. At the ﬁrst stage you have k − 1 people. . the probability that n−r+1 the newcomer duplicates one of existing r birthdays. n n for k.2. Note that pk.7. P (A|F1 ) = 1 13 . the event we call A. Example 4.r = 0. Let pk. Let F1 be the event that there are r birthdays represented among the k − 1 people. together with boundary conditions p0.r + · pk−1. . and F2 the event that there are r − 1 birthdays represented among the k − 1 people. pk. Second Bayes’ formula. r ≥ 1.8. as the r newcomer contributes either 0 or 1 new birthdays. Fn and A be as in Theorem 4. . P (A|F1 ) = n .4 CONDITIONAL PROBABILITY AND INDEPENDENCE 27 Example 4. 1. and P (A|F2 ) = n . Coupon collector problem. We will ﬁx n and let k and r be variable. Let F1 . revisited. What is the probability that you get at least one Ace? 1 Here Fi = {number shown on the die is i} for i = 1. and this. . Clearly. In general. + P (A|Fn )P (Fn ) . by the Bayes’ formula. P (A|Fi ) = 1 − Therefore. 6. for k > 0. . Roll a die. for i ≥ 1. (52) i +1− 48 3 52 3 +1− 48 4 52 4 +1− 48 5 52 5 +1− 48 6 52 6 . as many cards from the deck as the number shown on the die. .r−1 .r be the probability that exactly r (out of total of n) birthdays are represented among k people. . we will develop a computationally much better formula than the one in Example 3.n is what we computed by inclusion-exclusion formula. P (A|F3 ) = 0.r = P (A) = P (A|F1 )P (F1 ) + P (A|F2 )P (F2 ) = r n−r+1 · pk−1. Therefore. P (A) = 1 6 1 +1− 13 48 2 52 2 (48) i . then pull at random. Then P (Fj |A) = P (Fj ∩ A) P (A|Fj )P (Fj ) = . . Moreover. without replacement. . than the k’th person arrives on the scene. P (A) P (A|F1 )P (F1 ) + . makes the computation fast and precise. 2.

We have a fair coin and an unfair coin which always comes out Heads. test it. the test will be positive with probability 0.3 0. What is the probability that he is actually sick? (No. Now. We are interested in P (S|T ) = 8 P (T |S)P (S) = ≈ 31%.8!) Let us deﬁne the three relevant events: S = {sick}. this is the same as choosing the item from a large number of them and test it. P (M2 ) = 0.001 0. lightbulbs).11.003 You pick an item. and B = {both tosses H}. Than that machine produces an item. P (T |H) = 0.002 · 0. P (M3 ) = 0.3 ≈ 0.5 prob. and let Mi also denote the event that the item is made by machine i.8. toss it twice.) Let D be the event that an item is defective. What is the probability that the coin is fair? The relevant events are F = {fair coin}. P (B|F ) = 1 . if the person is sick. M2 and M3 . It comes out Heads both times. that is. Fj is called a hypothesis.1.002 · 0. Example 4.8. H = {healthy}.3 Example 4.3. What is the probability that it’s made by the machine M2 ? The best way to think about this random experiment is as a two-stage procedure. which you then proceed to test. and ﬁnd it is defective. 5 ·1 Example 4.4 CONDITIONAL PROBABILITY AND INDEPENDENCE 28 Often. First you choose a machine with the probabilities given by the proportion. (Indeed.002.002 0. any made item is defective 0.2. Moreover.003 · 0. Then 1 P (F ) = P (U ) = 2 (as each coin is chosen with equal probability). A factory has three machines M1 . The test gives the correct diagnosis with probability of 0. It is impossible to tell which machine produced a particular item. 0.2 and P (T |S) = 0.10.2.26. it is not 0. Assume 10% of people have a certain disease. Our probability then is 1 2 · 1 1 2 · 4 1 1 4 + 2 1 = . P (Fj ) its prior probability and P (Fj |E) its posterior probability.2 0.2 + 0. and so P (M2 |D) = 0. that produce items (say. U = {unfair coin}. Here are the known numbers: machine M1 M2 M3 proportion of items made 0. Then P (D|M1 ) = 0. P (M1 ) = 0.5. Choose one at random.001 · 0.9.5 + 0.8. P (T |S)P (S) + P (T |H)P (H) 26 .001. P (D|M2 ) = 0. T = {tested positive}. and 4 P (B|U ) = 1. but if the person is not sick. the test will be positive with probability 0. P (S) = 0. P (H) = 0. P (D|M1 ) = 0. A random person from the population has tested positive for the disease. but some are defective.9.003.

was claiming the latter. Independence Events A and B are independent if P (A ∩ B) = P (A)P (B). As J. F.” it had some evidentiary value. Simpson was on trial in Los Angeles for murder of his wife and her boyfriend. 936 in 1992.8. c )P (A|M c ) P (M )P (A|M ) + P (M 36. J. pg. So we will (conservatively) say that P (A|M ) = 0. (The two-stage experiment then is: choose a murdered woman at random. It was also commonly estimated at the time that about 5% of the women had been physically abused by their husbands.) By the Bayes’ formula. on the second stage. Simpson’s ﬁrst trial. M = {the woman was murdered by the partner}. Alan Dershowitz. The famous sports star and media personality O. out of which 1. we can probably say that 80% is considerably too high. 8. Nevertheless. a positive test result is diﬃcult to evaluate: a positive test for HIV would mean something very diﬀerent for a random person as opposed to somebody who gets tested because of risky behavior.1 The law literature studiously avoids quantifying concepts such as probative value and reasonable doubt. with stated probabilities. O. this is the wrong probability to look at! We need to start with the fact that a woman is murdered. citing the statistics that < 0. or not. The ﬁnal number we need is P (A|M ). she is murdered by her partner. compared to the prior probability of 29%. Merz and J. One of the many issues was whether Simpson’s history of spousal abuse can be presented by the prosecution at the trial. Example 4.4 CONDITIONAL PROBABILITY AND INDEPENDENCE 29 Note that the prior probability P (S) is very important! Without a very good idea about what it is. Dershowitz states that “a considerable number” of wife murderers had previously assaulted them. whether this history was “probative.29. or whether it was merely “prejudicial” and should be excluded. Thus we can say that P (A|M c ) = 0. if we let A = {the (murdered) woman was abused by the partner}. on the ﬁrst stage. J.5. or not. In other words. and dependent (or correlated) otherwise. Caulkins pointed out in the journal Chance (Vol. to use as a sole argument that the evidence is not probative. then we estimate the prior probabilities P (M ) = 0. with probabilities depending on the outcome of the ﬁrst stage. 430 were killed by partners.05 as there is no reason to assume that a woman murdered by somebody else was more or less likely to be abused by her partner. 1995. These numbered 4. . that is. although “some” did not. P (M |A) = 29 P (M )P (A|M ) = ≈ 0. she is among the abused women. C.71. 14). 1995.12. P (M c ) = 0. a famous professor of law at Harvard and a consultant for the defense.1% of the men who abuse their wives end up killing them. and what we are interested in is the posterior probability P (M |A).

1 2 and P (A ∩ B) = P (A ∩ C) = P (B ∩ C) = 1 . independence is an assumption and it is the most important concept in probability. Occurrence of any combination of events does not inﬂuence probability of others. Example 4. . Also. 2}. A2 . Indeed. P (A|B) = P (A). Roll a four sided fair die.14. so A and B c are also independent — knowing that B has not occurred also has no inﬂuence on the probability of A. i Example 4. Another fact to notice immediately is that disjoint events with nonzero probability cannot be independent: given that one of them happens. that is. 3}. Check that these are pairwise independent (each pair is independent). 4}. so the probability of A is unaﬀected by knowledge that B occurred. . one could rewrite the condition for independence. Now pull two cards out of the deck sequentially without replacement. ≤ ik ≤ n are arbitrary indices. so they are not independent.15. . and. ∩ Aik ) = P (Ai1 )P (Ai2 ) . C = {1. How do we deﬁne independence of more than two events? We say that events A1 . but not independent. P (Aik ). and again it can be shown that you can replace some Ai by Ac and independent events remain independent. Pull a random card from a full deck. Toss 2 fair coins. P (R ∩ F ) = 52 = 26 . . Quite often. Are F = {ﬁrst card is an Ace} and S = {second card is an Ace} independent? Now P (F ) = P (S) = 1 13 and P (S|F ) = 3 51 .4 CONDITIONAL PROBABILITY AND INDEPENDENCE 30 Assuming that P (B) > 0. . the other cannot happen and thus its probability drops to zero. . F = {Heads on 1st toss} S = {Heads on 2nd toss} These are independent. An are independent if P (Ai1 ∩ . 2. 3. P (A ∩ B c ) = P (A) − P (A ∩ B) = P (A) − P (A)P (B) = P (A)(1 − P (B)) = P (A)P (B c ). Let A = {card is an Ace} and R = {card is red}. if A and B are independent. . . 2 The two events are independent — the proportion of aces among red cards is the same as the proportion among all cards. . and pairwise 4 1 1 = . B = {1. P (R) = 1 .13. 4 at random. 4 8 P (A ∩ B ∩ C) = A simple reason for lack of independence is A ∩ B ⊂ C. where 1 ≤ i1 ≤ i2 ≤ . . P (A) = P (B) = P (C) = independence follows. Example 4. choose one of the numbers 1. Let A = {1. However. as there are two red Aces. . You will notice that here the independence is in fact an assumption. Are A and R independent? 2 1 1 We have P (A) = 13 .

so we have complete information on the occurrence of C as soon as we know that A and B both happen.

Example 4.16. You roll a die and your friend tosses a coin.

• If you roll 6, you win outright.
• If you do not roll 6, and your friend tosses Heads, you lose outright.
• If neither, the game is repeated until decided.

What is the probability that you win?

One way to solve this problem certainly is this:

P(win) = P(win on 1st round) + P(win on 2nd round) + P(win on 3rd round) + ...
       = 1/6 + (5/6 · 1/2) · 1/6 + (5/6 · 1/2)^2 · 1/6 + ...,

and then you sum up the geometric series. Important note: we have implicitly assumed independence between the coin and the die, as well as between the several tosses and rolls. This is very common in problems such as this!

You can avoid all the nuisance, however, by the following trick. Let

D = {game is decided on 1st round},
W = {you win}.

The crucial observation is that, provided the game is not decided on the 1st round, you are thereafter facing the same game with the same winning probability; thus P(W|D^c) = P(W). In other words, D^c and W are independent, and then so are D and W, and therefore P(W) = P(W|D). (The events D and W are independent, which one can certainly check by computation, but in fact there is a very good reason to conclude so immediately.) This means that one can solve the problem by computing the relevant probabilities for the 1st round:

P(W|D) = P(W ∩ D)/P(D) = (1/6) / (1/6 + 5/6 · 1/2) = 2/7,

which is our answer.

Example 4.17. Craps. Many casinos allow you to bet even money on the following game. Two dice are rolled, and the sum S is observed.

• If S ∈ {7, 11}, you win immediately.
• If S ∈ {2, 3, 12}, you lose immediately.
• If S ∈ {4, 5, 6, 8, 9, 10}, the pair of dice is rolled repeatedly until one of the following happens:
  - S repeats, in which case you win;
  - 7 appears, in which case you lose.

What is the winning probability?

Let's look at all the possible ways to win. You win on the first roll with probability 8/36, and then

• you roll a 4 (prob. 3/36), then win with prob. 3/(3+6) = 1/3,
• you roll a 5 (prob. 4/36), then win with prob. 4/(4+6) = 2/5,
• you roll a 6 (prob. 5/36), then win with prob. 5/(5+6) = 5/11,
• you roll an 8 (prob. 5/36), then win with prob. 5/11,
• you roll a 9 (prob. 4/36), then win with prob. 2/5,
• you roll a 10 (prob. 3/36), then win with prob. 1/3.

Here, for example, 3/(3+6) is the probability that a 4 appears before a 7: by the trick from the previous example, only rolls resulting in 4 or 7 matter, so this probability is

(3/36) / (3/36 + 6/36) = 3/(3+6) = 1/3.

These possibilities are disjoint, so, using the Bayes' formula,

P(win) = 8/36 + 2 · (3/36 · 1/3 + 4/36 · 2/5 + 5/36 · 5/11) ≈ 0.4929,

a decent game by casino standards.
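One can reproduce this number exactly by enumeration. The following Python sketch (ours; any exact-fraction approach would do) computes the craps winning probability as a fraction:

    from fractions import Fraction

    # Enumerate the sum of two dice; for a "point" s, the probability of
    # rolling s before 7 is count(s) / (count(s) + 6), as in the text.
    counts = {}
    for a in range(1, 7):
        for b in range(1, 7):
            counts[a + b] = counts.get(a + b, 0) + 1

    win = Fraction(0)
    for s, c in counts.items():
        p = Fraction(c, 36)
        if s in (7, 11):
            win += p                          # immediate win
        elif s not in (2, 3, 12):
            win += p * Fraction(c, c + 6)     # point s before a 7

    print(win, float(win))  # 244/495, approximately 0.4929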

Bernoulli trials

Assume n independent experiments, each of which is a success with probability p, and thus a failure with probability 1 - p. In a sequence of n Bernoulli trials,

P(exactly k successes) = \binom{n}{k} p^k (1-p)^{n-k}.

This is because the successes can occur on any subset S of k trials out of n, i.e., on any S ⊂ {1, ..., n} with cardinality k. If you fix such an S, then successes must occur on the k trials in S, and failures on all n - k trials not in S; the probability that this happens, by the assumed independence, is p^k (1-p)^{n-k}. There are \binom{n}{k} such subsets, and the corresponding possibilities are disjoint, as exactly k successes cannot occur on two different such sets.

Example 4.18. A machine produces items which are independently defective with probability p. Let's compute a few probabilities:

1. P(exactly two items among the first 6 are defective) = \binom{6}{2} p^2 (1-p)^4.

2. P(at least one item among the first 6 is defective) = 1 - P(no defects) = 1 - (1-p)^6.

3. P(at least 2 items among the first 6 are defective) = 1 - (1-p)^6 - 6p(1-p)^5.

4. P(exactly 100 items are made before 6 defective are found) equals

P(100th item defective, exactly 5 items among the first 99 defective) = p · \binom{99}{5} p^5 (1-p)^{94}.

Example 4.19. Problem of Points. This involves finding the probability of n successes before m failures in a sequence of Bernoulli trials. Let's call this probability p_{n,m}. Then

p_{n,m} = P(in the first m + n - 1 trials, the number of successes is ≥ n)
        = \sum_{k=n}^{n+m-1} \binom{n+m-1}{k} p^k (1-p)^{n+m-1-k}.

The problem is solved, but it needs to be pointed out that computationally this is not the best formula. It is much more efficient to use the recursive formula obtained by conditioning on the outcome of the first trial. Assume m, n ≥ 1. Then,

p_{n,m} = P(first trial is a success) · P(n - 1 successes before m failures)
        + P(first trial is a failure) · P(n successes before m - 1 failures)
        = p · p_{n-1,m} + (1 - p) · p_{n,m-1}.

Together with the boundary conditions, valid for m, n ≥ 1,

p_{0,m} = 1,  p_{n,0} = 0,

this again allows for very speedy and precise computations for large m and n.

Example 4.20. Best of 7. Assume that two equally matched teams, A and B, play a series of games, and the first team with four single-game victories wins the series. As it happens, the team A lost the first game. What is the probability it will win the series? Assume that the games are Bernoulli trials with success probability 1/2.

We have

P(A wins the series) = P(4 successes before 3 failures)
                     = \sum_{k=4}^{6} \binom{6}{k} (1/2)^6 = (15 + 6 + 1)/2^6 ≈ 0.3438.
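The recursion is easy to put on a machine. A minimal Python sketch (ours; function names are arbitrary, and exact fractions are used to avoid rounding) is:

    from fractions import Fraction
    from functools import lru_cache

    # p_{n,m} from Example 4.19: probability of n successes before m
    # failures, computed by conditioning on the first trial.
    def successes_before_failures(p):
        @lru_cache(maxsize=None)
        def f(n, m):
            if n == 0:
                return Fraction(1)
            if m == 0:
                return Fraction(0)
            return p * f(n - 1, m) + (1 - p) * f(n, m - 1)
        return f

    f = successes_before_failures(Fraction(1, 2))
    print(f(4, 3), float(f(4, 3)))  # 11/32 = 0.34375, as in Example 4.20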

Example 4.21. Banach Matchbox Problem. A mathematician carries two matchboxes, each originally containing n matches. Each time he needs a match, he is equally likely to take it from either box. What is the probability that, upon reaching for a box and finding it empty, there are exactly k matches still in the other box? Here, 0 ≤ k ≤ n.

Let A1 be the event that matchbox 1 is the one discovered empty and that, at that instant, matchbox 2 contains k matches. Before that point, he has used n + n - k matches, n from matchbox 1 and n - k from matchbox 2. This means that he has reached for box 1 exactly n times in the first (n + n - k) trials, and then reaches for it again, for the last time, at the (n + 1 + n - k)'th trial. Therefore, our probability is

2 · P(A1) = 2 · \binom{2n-k}{n} (1/2)^{2n-k} · (1/2) = \binom{2n-k}{n} (1/2)^{2n-k}.
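A quick sanity check of the matchbox formula: the probabilities over k = 0, ..., n must add up to 1. A few lines of Python (ours) confirm this:

    import math

    # P(exactly k matches remain) = C(2n-k, n) / 2^(2n-k), Example 4.21.
    n = 10
    probs = [math.comb(2 * n - k, n) / 2 ** (2 * n - k) for k in range(n + 1)]
    print(sum(probs))  # 1.0, up to floating-point rounding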

Example 4.22. Each day, you decide, with probability p, independently of other days, to flip a fair coin. Otherwise, you do nothing. (a) What is the probability of getting exactly 10 Heads in the first 20 days? (b) What is the probability of getting 10 Heads before 5 Tails?

For (a), the probability of getting Heads is p/2 independently each day, so the answer is

\binom{20}{10} (p/2)^{10} (1 - p/2)^{10}.

For (b), you can disregard the days on which you do not flip, to get

\sum_{k=10}^{14} \binom{14}{k} (1/2)^{14}.

Example 4.23. You roll a die, and your score is the number shown on the die. Your friend rolls five dice, and his score is the number of 6's shown. Compute (a) the probability of the event A that the two scores are equal and (b) the probability of the event B that your friend's score is strictly larger.

In both cases we will condition on your friend's score; this works a little better in case (b) than conditioning on your score. Let Fi, i = 0, ..., 5, be the event that your friend's score is i. Then P(A|Fi) = 1/6 if i ≥ 1, and P(A|F0) = 0. Then, by the first Bayes' formula, we get

P(A) = \sum_{i=1}^{5} P(Fi) · 1/6 = 1/6 (1 - P(F0)) = 1/6 - 5^5/6^6 ≈ 0.0997.

Moreover, P(B|Fi) = (i-1)/6 if i ≥ 2, and 0 otherwise, and so

P(B) = \sum_{i=1}^{5} P(Fi) · (i-1)/6
     = 1/6 \sum_{i=1}^{5} i · P(Fi) - 1/6 \sum_{i=1}^{5} P(Fi)
     = 1/6 \sum_{i=1}^{5} i · \binom{5}{i} (1/6)^i (5/6)^{5-i} - 1/6 (1 - P(F0))
     = 1/6 · 5/6 - 1/6 + 5^5/6^6 ≈ 0.0392.

The last equality can be obtained by a computation, but we will soon learn why the sum \sum_i i · P(Fi) has to equal 5 · 1/6 = 5/6.

Problems

1. Consider the following game. Pull one card at random from a full deck of 52 cards. If you pull an ace, you win outright. If not, then you look at the value of the card (K, Q, and J count as 10). If the number is 7 or less, you lose outright. Otherwise, you pull (at random, without replacement) that number of additional cards from the deck. (For example, if you pulled a 9 the first time, you pull 9 more cards.) If you get at least one ace, you win. What are your chances of winning this game?

2. A box contains 3 items. An item is defective (independently of the other items) with probability 0.3. You have a method of testing whether an item is defective, but it does not always give you the correct answer. If the tested item is defective, the method detects the defect with probability 0.9 (and says it is good with probability 0.1). If the tested item is good, then the method says it is defective with probability 0.2 (and gives the right answer with probability 0.8). You have tested all three items and the tests detect no defects. What is the probability that none of the 3 items is defective?

3. A chocolate egg either contains a toy or is empty. Assume that each egg contains a toy with probability p, independently of other eggs. You have 5 eggs; you open the first one and see if it has a toy inside, then do the same for the second one, etc. Let E1 be the event that you get at least 4 toys and E2 the event that you get at least two toys in succession. Compute P(E1) and P(E2). Are E1 and E2 independent?

4. You have 16 balls: 3 blue, 4 green, and 9 red. You also have 3 urns. For each of the 16 balls,

you select an urn at random and put the ball into it. (Urns are large enough to accommodate any number of balls.) (a) What is the probability that no urn is empty? (b) What is the probability that each urn contains 3 red balls? (c) What is the probability that each urn contains all three colors?

5. Assume that you have an n-element set U and you select r independent random subsets A1, ..., Ar ⊂ U. All Ai are chosen so that all 2^n choices are equally likely. Compute (in a simple closed form) the probability that the Ai are pairwise disjoint.

Solutions to problems

1. Let F1 = {Ace first time}, F8 = {8 first time}, F9 = {9 first time}, and F10 = {10, J, Q, or K first time}. Also let W be the event that you win. Then

P(W|F1) = 1,
P(W|F8) = 1 - \binom{47}{8}/\binom{51}{8},
P(W|F9) = 1 - \binom{47}{9}/\binom{51}{9},
P(W|F10) = 1 - \binom{47}{10}/\binom{51}{10},

and so

P(W) = 4/52 + 4/52 (1 - \binom{47}{8}/\binom{51}{8}) + 4/52 (1 - \binom{47}{9}/\binom{51}{9}) + 16/52 (1 - \binom{47}{10}/\binom{51}{10}).

2. Let F = {none are defective} and A = {test indicates that none are defective}. By the second Bayes' formula,

P(F|A) = P(A ∩ F)/P(A) = (0.7 · 0.8)^3 / (0.7 · 0.8 + 0.3 · 0.1)^3 = (56/59)^3,

by independence.

3. We have

P(E1) = 5p^4(1-p) + p^5 = 5p^4 - 4p^5,

and, by counting the arrangements of toys with no two in succession,

P(E2) = 1 - (1-p)^5 - 5p(1-p)^4 - \binom{4}{2} p^2(1-p)^3 - p^3(1-p)^2.

As E1 ⊂ E2, E1 and E2 are not independent.

4. (a) Let Ai be the event that the i-th urn is empty. Then

P(A1) = P(A2) = P(A3) = (2/3)^{16},
P(A1 ∩ A2) = P(A1 ∩ A3) = P(A2 ∩ A3) = (1/3)^{16},
P(A1 ∩ A2 ∩ A3) = 0.

By inclusion-exclusion,

P(A1 ∪ A2 ∪ A3) = (2^{16} - 1)/3^{15},

and

P(no urn is empty) = 1 - (2^{16} - 1)/3^{15}.

(b) We can ignore the other balls, since only the red balls matter here. Hence the result is

(9!/(3!3!3!)) / 3^9 = 1680/19683 ≈ 0.0853.

(c) As

P(at least one urn lacks blue) = 3 (2/3)^3 - 3 (1/3)^3,
P(at least one urn lacks green) = 3 (2/3)^4 - 3 (1/3)^4,
P(at least one urn lacks red) = 3 (2/3)^9 - 3 (1/3)^9,

and the three colors are distributed independently, we have

P(each urn contains all 3 colors)
= (1 - 3 (2/3)^3 + 3 (1/3)^3) · (1 - 3 (2/3)^4 + 3 (1/3)^4) · (1 - 3 (2/3)^9 + 3 (1/3)^9).

5. This is the same as choosing an r × n matrix in which every entry is independently 0 or 1 with probability 1/2, and ending up with at most one 1 in every column. Since the columns are independent, this gives

((1 + r) 2^{-r})^n.
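The closed form is easy to test by simulation. A short Monte Carlo sketch in Python (ours; the parameters below are arbitrary):

    import random

    # Each element independently lands in each of the r subsets with
    # probability 1/2; the subsets are pairwise disjoint exactly when
    # no element is chosen by two or more of them.
    n, r, trials = 6, 3, 10**5
    rng = random.Random(0)
    hits = 0
    for _ in range(trials):
        if all(sum(rng.random() < 0.5 for _ in range(r)) <= 1 for _ in range(n)):
            hits += 1
    print(hits / trials, ((1 + r) / 2**r) ** n)  # both near 0.0156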

Interlude: Practice Midterm 1

This practice exam covers the material from the first four chapters. Give yourself 50 minutes to solve the four problems, which you may assume have equal point score.

1. Ten fair dice are rolled. What is the probability that:
(a) At least one 1 appears.
(b) Each of the numbers 1, 2, 3 appears exactly twice, while 4 appears four times.
(c) Each of the numbers 1, 2, 3 appears at least once.

2. Five married couples are seated at random around a round table.
(a) Compute the probability that all couples sit together (i.e., every husband-wife pair occupies adjacent seats).
(b) Compute the probability that at most one wife does not sit next to her husband.

3. Consider the following game. The player rolls a fair die. If he rolls 3 or less, he loses immediately. Otherwise he selects, at random, as many cards from a full deck as the number that came up on the die. The player wins if all four aces are among the selected cards.
(a) Compute the winning probability for this game.
(b) Smith only tells you that he recently played this game once, and won. What is the probability that he rolled a 6 on the die?

4. A chocolate egg either contains a toy or is empty. Assume that each egg contains a toy with probability p ∈ (0, 1), independently of other eggs. Each toy is, with equal probability, red, white, or blue (again, independently of other toys). You buy 5 eggs. Let E1 be the event that you get at most 2 toys, and E2 the event that you get at least one red, at least one white, and at least one blue toy (so that you have a complete collection).
(a) Compute P(E1). Why is this probability very easy to compute when p = 1/2?
(b) Compute P(E2).
(c) Are E1 and E2 independent? Explain.

Solutions to Practice Midterm 1

1. Ten fair dice are rolled. What is the probability that:

(a) At least one 1 appears.

Solution:

1 - P(no 1 appears) = 1 - (5/6)^{10}.

(b) Each of the numbers 1, 2, 3 appears exactly twice, while 4 appears four times.

Solution:

\binom{10}{2} \binom{8}{2} \binom{6}{2} / 6^{10} = 10! / (2^3 · 4! · 6^{10}).

(c) Each of the numbers 1, 2, 3 appears at least once.

Solution: Let Ai be the event that the number i does not appear. We know the following:

P(A1) = P(A2) = P(A3) = (5/6)^{10},
P(A1 ∩ A2) = P(A1 ∩ A3) = P(A2 ∩ A3) = (4/6)^{10},
P(A1 ∩ A2 ∩ A3) = (3/6)^{10}.

Then,

P(1, 2, and 3 all appear at least once) = P((A1 ∪ A2 ∪ A3)^c)
= 1 - P(A1) - P(A2) - P(A3) + P(A1 ∩ A2) + P(A2 ∩ A3) + P(A1 ∩ A3) - P(A1 ∩ A2 ∩ A3)
= 1 - 3 (5/6)^{10} + 3 (4/6)^{10} - (3/6)^{10}.

2. Five married couples are seated at random around a round table.

(a) Compute the probability that all couples sit together (i.e., every husband-wife pair occupies adjacent seats).

Solution: Let i be an integer in the set {1, 2, 3, 4, 5}, and denote each husband and wife as hi and wi, respectively.

i. Fix h1 to one of the seats.
ii. There are 9! ways to order the remaining 9 people in the remaining 9 seats. This is our sample space.
iii. There are 2 ways to order w1, who must sit next to h1.
iv. Treat each of the remaining 4 couples as a block, and the remaining 8 seats as 4 pairs of adjacent seats.
v. There are 4! ways to seat the remaining 4 couples into the 4 pairs of seats.
vi. There are 2^4 ways to order hi and wi within their pair of seats.

Therefore, our solution is

2 · 4! · 2^4 / 9!.

(b) Compute the probability that at most one wife does not sit next to her husband.

Solution: Let A be the event that all wives sit next to their husbands, and let B be the event that exactly one wife does not sit next to her husband. We know that P(A) = 2^5 · 4!/9! from part (a). Moreover, B = B1 ∪ B2 ∪ B3 ∪ B4 ∪ B5, where Bi is the event that wi

does not sit next to hi and the remaining couples sit together. Then the Bi are disjoint and their probabilities are all the same, so we need to determine P(B1).

i. Sit h1 on a specified seat. The sample space is 9!, for ordering the remaining 9 people into the remaining 9 seats.
ii. Consider each of the remaining 4 couples and w1 as 5 blocks.
iii. As w1 cannot be next to her husband, you have 3 positions for w1 in the ordering of the 5 blocks.
iv. There are 4! ways to order the remaining 4 couples.
v. There are 2^4 ways to order the couples within their blocks.

Therefore,

P(B1) = 3 · 4! · 2^4 / 9!,

and our answer is then

5 · (3 · 4! · 2^4)/9! + (2^5 · 4!)/9!.

3. Consider the following game. The player rolls a fair die. If he rolls 3 or less, he loses immediately. Otherwise he selects, at random, as many cards from the full deck as the number that came up on the die. The player wins if all four aces are among the selected cards.

(a) Compute the winning probability for this game.

Solution: Let W be the event that you win, and let Fi be the event that you roll i, where i = 1, ..., 6, so that P(Fi) = 1/6. Since we lose if we roll 1, 2, or 3,

P(W|F1) = P(W|F2) = P(W|F3) = 0,

while

P(W|F4) = 1/\binom{52}{4},  P(W|F5) = \binom{5}{4}/\binom{52}{4},  P(W|F6) = \binom{6}{4}/\binom{52}{4}.

Therefore,

P(W) = (1/6) · (1/\binom{52}{4}) · (1 + \binom{5}{4} + \binom{6}{4}).

(b) Smith only tells you that he recently played this game once, and won. What is the probability that he rolled a 6 on the die?

Solution:

P(F6|W) = ((1/6) · \binom{6}{4}/\binom{52}{4}) / P(W)
        = \binom{6}{4} / (1 + \binom{5}{4} + \binom{6}{4})
        = 15/21
        = 5/7.

4. A chocolate egg either contains a toy or is empty. Assume that each egg contains a toy with probability p ∈ (0, 1), independently of other eggs. Each toy is, with equal probability, red, white, or blue (again, independently of other toys). You buy 5 eggs. Let E1 be the event that you get at most 2 toys, and E2 the event that you get at least one red, at least one white, and at least one blue toy (so that you have a complete collection).

(a) Compute P(E1). Why is this probability very easy to compute when p = 1/2?

Solution:

P(E1) = P(0 toys) + P(1 toy) + P(2 toys)
      = (1-p)^5 + 5p(1-p)^4 + \binom{5}{2} p^2 (1-p)^3.

When p = 1/2, toys and empty eggs play symmetric roles, so

P(at most 2 toys) = P(at most 2 eggs are empty) = P(at least 3 toys).

Therefore, P(E1) = P(E1^c), and so P(E1) = 1/2.

(b) Compute P(E2).

Solution: Let A1 be the event that red is missing, A2 the event that white is missing, and A3 the event that blue is missing. Then, by inclusion-exclusion,

P(E2) = P((A1 ∪ A2 ∪ A3)^c)
      = 1 - 3 (1 - p/3)^5 + 3 (1 - 2p/3)^5 - (1-p)^5.

(c) Are E1 and E2 independent? Explain.

Solution: No. A complete collection requires at least 3 toys, so E1 ∩ E2 = ∅, and hence P(E1 ∩ E2) = 0, while P(E1)P(E2) > 0.

5 Discrete Random Variables

A random variable is a number whose value depends upon the outcome of a random experiment. Mathematically, a random variable X is a real-valued function on Ω, the space of outcomes: X : Ω → R. Sometimes, when convenient, we also allow X to have the value ∞ or, more rarely, -∞, but this will not occur in this chapter. The crucial theoretical property that X has to have is that, for each interval B, the set of outcomes for which X ∈ B is an event, so we are able to talk about its probability P(X ∈ B). Random variables are traditionally denoted by capital letters, to distinguish them from deterministic quantities.

Example 5.1. Here are some examples of random variables.

1. Toss a coin 10 times, and let X be the number of Heads.
2. Choose a random person in class, and let X be the height of the person, in inches.
3. Let X be the value of the NASDAQ stock index at the closing of the next business day.
4. Choose a random point in the unit square {(x, y) : 0 ≤ x, y ≤ 1}, and let X be its distance from the origin.

A discrete random variable X has finitely or countably many values xi, i = 1, 2, ..., and p(xi) = P(X = xi), i = 1, 2, ..., is called the probability mass function of X. Sometimes X is put into the subscript of its p. m. f.: p = pX.

A probability mass function p has the following properties:

1. For all i, p(xi) > 0 (we do not list values of X which occur with probability 0).
2. For any interval B, P(X ∈ B) = \sum_{xi ∈ B} p(xi).
3. As X has to have some value, \sum_i p(xi) = 1.

Example 5.2. Let X be the number of Heads in 2 fair coin tosses. Determine its p. m. f.

The possible values of X are 0, 1, and 2. Their probabilities are: P(X = 0) = 1/4, P(X = 1) = 1/2, and P(X = 2) = 1/4.

You should note that the random variable Y, which counts the number of Tails in the 2 tosses, has the same p. m. f., that is, pX = pY, but X and Y are far from being the same random variable! In general, random variables may have the same p. m. f., but not even be defined on the same set of outcomes.

Example 5.3. An urn contains 20 balls, numbered 1, ..., 20. Pull out 5 balls at random, without replacement. Let X be the largest number among the selected balls. Determine its p. m. f. and the probability that at least one of the selected numbers is 15 or more.

The possible values are 5, ..., 20. To determine the p. m. f., note that you have \binom{20}{5} outcomes, and then

P(X = i) = \binom{i-1}{4} / \binom{20}{5},  i = 5, ..., 20.

Finally,

P(at least one number 15 or more) = P(X ≥ 15) = \sum_{i=15}^{20} P(X = i).

Example 5.4. An urn contains 11 balls: 3 red, 3 white, and 5 blue. Take out 3 balls at random, without replacement. You win $1 for each red ball you select and lose $1 for each white ball you select. Determine the p. m. f. of X, the amount you win.

The number of outcomes is \binom{11}{3} = 165. X can have values -3, -2, -1, 0, 1, 2, and 3. Let's start with 0. This can occur with one ball of each color, or with 3 blue balls:

P(X = 0) = (3 · 3 · 5 + \binom{5}{3}) / \binom{11}{3} = 55/165 = 1/3.

To get X = 1, we can have 2 red and 1 white, or 1 red and 2 blue:

P(X = 1) = P(X = -1) = (\binom{3}{2}\binom{3}{1} + \binom{3}{1}\binom{5}{2}) / \binom{11}{3} = 39/165.

The probability that X = -1 is the same, because of the symmetry between the roles that the red and the white balls play. Next, to get X = 2 we have to have 2 red balls and 1 blue:

P(X = 2) = P(X = -2) = \binom{3}{2}\binom{5}{1} / \binom{11}{3} = 15/165.

Finally, a single outcome (3 red balls) produces X = 3:

P(X = 3) = P(X = -3) = 1/\binom{11}{3} = 1/165.

All the seven probabilities should add to 1, which can be used either to check the computations or to compute the seventh probability (say, P(X = 0)) from the other six.

Assume that X is a discrete random variable with possible values xi. Then the expected value, also called expectation, average, or mean, of X is

EX = \sum_i xi P(X = xi) = \sum_i xi p(xi).

For any function g : R → R,

Eg(X) = \sum_i g(xi) P(X = xi).
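For a p. m. f. such as the one in Example 5.3, the arithmetic is a natural job for a machine. A small Python sketch (ours):

    from math import comb

    # Example 5.3: the largest of 5 numbers drawn from 1..20 without
    # replacement has P(X = i) = C(i-1, 4) / C(20, 5).
    pmf = {i: comb(i - 1, 4) / comb(20, 5) for i in range(5, 21)}
    print(sum(pmf.values()))                   # 1.0: the p.m.f. checks out
    print(sum(pmf[i] for i in range(15, 21)))  # P(X >= 15), about 0.871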

Example 5.5. Let X be a random variable with P(X = 1) = 0.2, P(X = 2) = 0.3, and P(X = 3) = 0.5. What should be the expected value of X? We can of course just use the formula, but let's instead proceed intuitively and see that the definition makes sense. What, then, should be the average of X?

Imagine a large number n of repetitions of the experiment, and measure the realization of X in each. By the frequency interpretation of probability, about 0.2n realizations will have X = 1, about 0.3n will have X = 2, and about 0.5n will have X = 3. The average value of X then should be

(1 · 0.2n + 2 · 0.3n + 3 · 0.5n)/n = 1 · 0.2 + 2 · 0.3 + 3 · 0.5 = 2.3,

which of course is the same as the formula gives.

Take a discrete random variable X and let µ = EX. How should we measure the deviation of X from µ, that is, how "spread-out" is the p. m. f. of X?

The most natural way would certainly be E|X - µ|. The only problem with this is that absolute values are annoying. Instead, we define the variance of X,

Var(X) = E(X - µ)^2.

The quantity that has the correct units is the standard deviation,

σ(X) = \sqrt{Var(X)} = \sqrt{E(X - µ)^2}.

We will give another, more convenient, formula for the variance, which will use the following property of expectation, called linearity:

E(α1 X1 + α2 X2) = α1 EX1 + α2 EX2,

valid for any random variables X1 and X2 and nonrandom constants α1 and α2. This property will be explained and discussed in more detail later. Then

Var(X) = E[(X - µ)^2] = E[X^2 - 2µX + µ^2] = E(X^2) - 2µ E(X) + µ^2 = E(X^2) - µ^2 = E(X^2) - (EX)^2.

In computations, bear in mind that variance cannot be negative! Furthermore, the only way that a random variable has 0 variance is when it is equal to its expectation µ with probability 1 (so it is not really random at all): P(X = µ) = 1. Here is the summary:

The variance of a random variable X is Var(X) = E(X - EX)^2 = E(X^2) - (EX)^2.

Example 5.6. Previous example, continued. Compute Var(X).

We have E(X^2) = 1^2 · 0.2 + 2^2 · 0.3 + 3^2 · 0.5 = 5.9 and (EX)^2 = (2.3)^2 = 5.29, so that Var(X) = 5.9 - 5.29 = 0.61 and σ(X) = \sqrt{0.61} ≈ 0.7810.

We will now look at some famous probability mass functions.

5.1 Uniform discrete random variable

This is a random variable with values x1, ..., xn, each with equal probability 1/n. Such a random variable is simply the random choice of one among n numbers.

Properties:

1. EX = (x1 + ... + xn)/n.
2. Var(X) = (x1^2 + ... + xn^2)/n - ((x1 + ... + xn)/n)^2.

Example 5.7. Let X be the number shown on a fair die. Compute EX, E(X^2), and Var(X).

This is the standard example of a discrete uniform random variable, and

EX = (1 + 2 + ... + 6)/6 = 7/2,
E(X^2) = (1 + 2^2 + ... + 6^2)/6 = 91/6,
Var(X) = 91/6 - (7/2)^2 = 35/12.

5.2 Bernoulli random variable

This is also called an indicator random variable. Assume that A is an event with probability p. Then IA, the indicator of A, is given by

IA = 1 if A happens, and 0 otherwise.

Other notations for IA include 1A and χA. Though simple, such random variables are very important as building blocks for more complicated random variables.

Properties:

1. E(IA) = p.
2. Var(IA) = p(1 - p).

For the variance, note that IA^2 = IA, so that E(IA^2) = E(IA) = p.
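The two formulas EX = Σ x p(x) and Var(X) = E(X^2) - (EX)^2 translate directly into code. A tiny Python helper (ours) reproduces Examples 5.5 through 5.7:

    # Expectation and variance straight from a p.m.f. given as a dict.
    def moments(pmf):
        mean = sum(x * p for x, p in pmf.items())
        var = sum(x * x * p for x, p in pmf.items()) - mean ** 2
        return mean, var

    print(moments({1: 0.2, 2: 0.3, 3: 0.5}))         # (2.3, 0.61)
    print(moments({i: 1 / 6 for i in range(1, 7)}))  # (3.5, 35/12 = 2.9166...)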

5.3 Binomial random variable

A Binomial(n, p) random variable counts the number of successes in n independent trials, each of which is a success with probability p.

Properties:

1. Probability mass function: P(X = i) = \binom{n}{i} p^i (1-p)^{n-i}, i = 0, 1, ..., n.
2. EX = np.
3. Var(X) = np(1 - p).

The expectation and variance formulas will be proved later. For now, please take them on faith.

Example 5.8. Let X be the number of Heads in 50 tosses of a fair coin. Determine EX, Var(X), and P(X ≤ 10).

As X is Binomial(50, 1/2), EX = 25, Var(X) = 12.5, and

P(X ≤ 10) = \sum_{i=0}^{10} \binom{50}{i} / 2^{50}.

Example 5.9. Denote by d the dominant gene and by r the recessive gene at a single locus. Then dd is called the pure dominant genotype, dr is called the hybrid, and rr the pure recessive genotype. The two genotypes with at least one dominant gene, dd and dr, result in the phenotype of the dominant gene, while rr results in the recessive phenotype.

Assuming that both parents are hybrid and have n children, what is the probability that at least two will have the recessive phenotype? Each child, independently, gets one of the genes at random from each parent, so the probability of the rr genotype is 1/4. If X is the number of rr children, then X is Binomial(n, 1/4), and

P(X ≥ 2) = 1 - P(X = 0) - P(X = 1) = 1 - (3/4)^n - n · (1/4) · (3/4)^{n-1}.

5.4 Poisson random variable

A random variable is Poisson(λ), with parameter λ > 0, if it has the probability mass function given below.

Properties:

1. P(X = i) = (λ^i / i!) e^{-λ}, for i = 0, 1, 2, ....
2. EX = λ.

3. Var(X) = λ.

Here is how you compute the expectation:

EX = \sum_{i=1}^{∞} i · e^{-λ} λ^i / i! = e^{-λ} λ \sum_{i=1}^{∞} λ^{i-1}/(i-1)! = e^{-λ} λ e^{λ} = λ,

and the variance computation is similar (and a good exercise!).

The Poisson random variable is useful as an approximation to a Binomial one, when the number of trials is large and the probability of success is small. In this context it is often called the law of rare events, first formulated by L. J. Bortkiewicz (in 1898), who studied deaths by horse kicks in the Prussian cavalry.

Theorem 5.1. Poisson approximation to Binomial. When n is large, p is small, and λ = np is of moderate size, Binomial(n, p) is approximately Poisson(λ):

If X is Binomial(n, p), with p = λ/n, then, as n → ∞,

P(X = i) → e^{-λ} λ^i / i!,

for each fixed i = 0, 1, 2, ....

Proof.

P(X = i) = \binom{n}{i} (λ/n)^i (1 - λ/n)^{n-i}
         = (n(n-1)...(n-i+1)/i!) · (λ^i/n^i) · (1 - λ/n)^n (1 - λ/n)^{-i}
         = (λ^i/i!) · (n(n-1)...(n-i+1)/n^i) · (1 - λ/n)^n · (1 - λ/n)^{-i}
         → (λ^i/i!) · 1 · e^{-λ} · 1,

as n → ∞.

The Poisson approximation is quite good: one can prove that the mistake made by computing a probability using the Poisson approximation in place of its exact Binomial expression (in the context of the above theorem) is no more than

min(1, λ) · p.
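The quality of the approximation is easy to see numerically. Here is a brief Python sketch (ours; the parameters are an arbitrary illustration) comparing the two p. m. f.'s:

    from math import comb, exp, factorial

    # Binomial(n, lambda/n) versus its Poisson(lambda) limit, Theorem 5.1.
    n, lam = 1000, 0.6
    p = lam / n
    for i in range(5):
        binom = comb(n, i) * p**i * (1 - p)**(n - i)
        poisson = lam**i / factorial(i) * exp(-lam)
        print(i, round(binom, 6), round(poisson, 6))

The printed columns agree to several decimal places, consistent with the error bound min(1, λ) · p above.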

Example 5.10. Suppose that the probability that a person is killed by lightning in a year is, independently, 1/(500 million). Assume that the US population is 300 million.

1. Compute P(3 or more people will be killed by lightning next year) exactly.

If X is the number of people killed by lightning, then X is Binomial(n, p), where n = 300 million and p = 1/(500 million), and the answer is

1 - (1-p)^n - np(1-p)^{n-1} - \binom{n}{2} p^2 (1-p)^{n-2} ≈ 0.02311529.

2. Approximate the above probability.

As np = 3/5, X is approximately Poisson(3/5), and the answer is

1 - e^{-λ} - λ e^{-λ} - (λ^2/2) e^{-λ} ≈ 0.02311530.

3. Approximate P(two or more people are killed by lightning within the first 6 months of next year).

This highlights the interpretation of λ as a rate. If lightning deaths occur at the rate of 3/5 a year, they should occur at half that rate in 6 months. Indeed, assuming that lightning deaths occur as a result of independent factors operating on different people in disjoint time intervals, halving the time interval amounts to halving n (while keeping p the same); likewise, doubling the time interval is the same as doubling the number n of people, and then np doubles too. Consequently, np changes to 3/10, so λ = 3/10 as well, and the answer is

1 - e^{-λ} - λ e^{-λ} ≈ 0.0369.

4. Approximate P(in exactly 3 of the next 10 years exactly 3 people are killed by lightning).

In every year, the probability of exactly 3 deaths is approximately (λ^3/3!) e^{-λ}, where, again, λ = 3/5. Assuming year-to-year independence, the number of years, among the next 10, with exactly 3 people killed is approximately Binomial(10, (λ^3/3!) e^{-λ}). The answer is then

\binom{10}{3} ((λ^3/3!) e^{-λ})^3 (1 - (λ^3/3!) e^{-λ})^7 ≈ 8.05 · 10^{-4}.

5. Compute the expected number of years, among the next 10, in which 2 or more people are killed by lightning.

By the same logic as above, and the formula for the Binomial expectation, the answer is

10 (1 - e^{-λ} - λ e^{-λ}) ≈ 1.219.

Example 5.11. Assume a crime has been committed. It is known that the perpetrator has certain characteristics, which occur with small frequency p (say, 10^{-8}) in a population of size n (say, 10^8). A person who matches these characteristics has been

found at random (e.g., at a routine traffic stop, or by airport security) and, since p is so small, charged with the crime. There is no other evidence, and the question is whether the arrested person is guilty. What should the defense be?

Let's start with a mathematical model of this situation. Assume that N is the number of people with the given characteristics. This is a Binomial random variable but, due to the low p, we can easily assume that it is Poisson with λ = np. Choose a person from among these N at random, and label that person C, the criminal. Then choose at random another person, A, who is arrested. The question is whether C = A, that is, whether the arrested person is guilty. Mathematically, we can formulate the problem like this:

P(C = A | N ≥ 1) = P(C = A, N ≥ 1) / P(N ≥ 1).

We need to condition, as the experiment cannot even be performed when N = 0. Now, by the first Bayes' formula,

P(C = A, N ≥ 1) = \sum_{k=1}^{∞} P(C = A, N ≥ 1 | N = k) · P(N = k)
                = \sum_{k=1}^{∞} P(C = A | N = k) · P(N = k),

and

P(C = A | N = k) = 1/k.

The probability that the arrested person is guilty then is

P(C = A | N ≥ 1) = (e^{-λ} / (1 - e^{-λ})) \sum_{k=1}^{∞} λ^k / (k · k!).

There is no closed-form expression for the sum, but it can be easily computed numerically. The defense may then claim that the probability of innocence, 1 - (the above probability), is, when λ = 1, about 0.2330, presumably enough for a reasonable doubt.

This model was in fact tested in court, in the famous People v. Collins case, a 1968 jury trial in Los Angeles. The jury convicted the charged couple in a robbery on the basis of the prosecutor's claim that, given the couple's characteristics, "the chances of there being another couple [with specified characteristics, in the LA area] must be one in a billion." In this instance, it was claimed by the prosecution (on flimsy grounds) that p = 1/12,000,000, and n would be the number of adult couples in the LA area, say n = 5,000,000.

The Supreme Court of California reversed the conviction, and gave two reasons. The first reason was insufficient foundation for the estimate of p. The second reason was that the probability that another couple with matching characteristics exists is, in fact,

P(N ≥ 2 | N ≥ 1) = (1 - e^{-λ} - λ e^{-λ}) / (1 - e^{-λ}),

much larger than the prosecutor claimed: namely, for λ = 5/12, it is about 0.1939. This is about twice the (more relevant) probability of innocence, which for this λ is about 0.1015.
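Since the text notes the sum has no closed form but is easy to evaluate numerically, here is a short Python sketch (ours; the truncation at 60 terms is more than enough, as the terms decay factorially):

    from math import exp, factorial

    # P(C = A | N >= 1) = e^{-lam}/(1 - e^{-lam}) * sum_{k>=1} lam^k/(k * k!)
    def prob_guilty(lam, terms=60):
        s = sum(lam**k / (k * factorial(k)) for k in range(1, terms))
        return exp(-lam) / (1 - exp(-lam)) * s

    print(1 - prob_guilty(1.0))     # probability of innocence, about 0.2330
    print(1 - prob_guilty(5 / 12))  # about 0.1015, the People v. Collins value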

5.5 Geometric random variable

A Geometric(p) random variable X counts the number of trials required for the first success in independent trials with success probability p.

Properties:

1. Probability mass function: P(X = n) = p(1-p)^{n-1}, where n = 1, 2, ....
2. EX = 1/p.
3. Var(X) = (1-p)/p^2.
4. P(X > n) = \sum_{k=n+1}^{∞} p(1-p)^{k-1} = (1-p)^n.
5. P(X > n + k | X > k) = (1-p)^{n+k}/(1-p)^k = (1-p)^n = P(X > n).

We leave out the proofs of the first two formulas, which reduce to manipulations with geometric series.

Example 5.12. Let X be the number of tosses of a fair coin required for the first Heads. What are EX and Var(X)?

As X is Geometric(1/2), EX = 2 and Var(X) = 2.

Example 5.13. You roll a die and your opponent tosses a coin. If you roll 6 you win; if you do not roll 6 and your opponent tosses Heads you lose; otherwise, this round ends and the game repeats. On the average, how many rounds does the game last?

We have

P(game decided on round 1) = 1/6 + 5/6 · 1/2 = 7/12,

and so the number of rounds N is Geometric(7/12) and

EN = 12/7.
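A quick Monte Carlo sketch in Python (ours) confirms the last answer:

    import random

    # Example 5.13: each round decides the game with probability 7/12,
    # so the number of rounds N is Geometric(7/12) and EN = 12/7 = 1.714...
    rng = random.Random(3)

    def rounds_played():
        n = 1
        while rng.random() >= 7 / 12:  # round undecided with prob. 5/12
            n += 1
        return n

    trials = 10**5
    print(sum(rounds_played() for _ in range(trials)) / trials)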

Problems

1. Roll a fair die repeatedly. Let X be the number of 6's in the first 10 rolls, and let Y be the number of rolls needed to obtain a 3.
(a) Write down the probability mass function of X.
(b) Write down the probability mass function of Y.
(c) Find an expression for P(X ≥ 6).
(d) Find an expression for P(Y > 10).

2. A biologist needs at least 3 mature specimens of a certain plant. The plant needs a year to reach maturity; once a seed is planted, any plant will survive for the year with probability 1/1000 (independently of other plants). The biologist plants 3000 seeds. A year is deemed a success if three or more plants from these seeds reach maturity.
(a) Write down the exact expression for the probability that the biologist will indeed end up with at least 3 mature plants.
(b) Write down a relevant approximate expression for the probability from (a). Briefly justify the approximation.
(c) The biologist plans to do this year after year. What is the approximate probability that he has at least 2 successes in 10 years?
(d) Devise a method to determine the number of seeds the biologist should plant in order to get at least 3 mature plants in a year with probability at least 0.999. (Your method will probably require a lengthy calculation; do not try to carry it out with pen and paper.)

3. You and your opponent both roll a fair die. If you both roll the same number, the game is repeated; otherwise, whoever rolls the larger number wins. Let N be the number of times the two dice have to be rolled before the game is decided.
(a) Determine the probability mass function of N.
(b) Compute EN.
(c) Compute P(you win).
(d) Assume that you get paid $10 for winning on the first round, $1 for winning on any other round, and nothing otherwise. Compute your expected winnings.

4. You are dealt one card at random from a full deck, and your opponent is dealt 2 cards (without any replacement). If you get an Ace, he pays you $10; if you get a King, he pays you $5 (regardless of his cards). If you have neither an Ace nor a King, but your card is red and your opponent has no red cards, he pays you $1. In all other cases you pay him $1. Determine your expected earnings.

5. Each of the 50 students in class belongs to exactly one of the four groups A, B, C, or D. The memberships of the four groups are as follows: A: 5, B: 10, C: 15, D: 20. First, choose one of the 50 students at random, and let X be the size of that student's group. Next, choose one of the four groups at random, and let Y be its size. (Recall: all random choices are with equal probability, unless otherwise specified.)
(a) Write down the probability mass functions for X and Y.
(b) Compute EX and EY.
(c) Compute Var(X) and Var(Y).
(d) Assume you have s

. . . (c) 10 1 6 5 6 i−1 .999)2999 (0. 1. P (X ≥ 6) = (d) i=6 10 i 1 6 5 6 i 5 6 10−i . the number of mature plants. . P (X ≥ 3) = 1 − P (X ≤ 2) 1 1000 .999)2998 (0. n.5 DISCRETE RANDOM VARIABLES 55 students divided into n groups with membership numbers s1 . 2 (c) Denote the probability in (b) by s. 10 P (Y > 10) = .001 2 . 6 ): P (X = i) = where i = 0.999)3000 − 3000(0. 2 = 3. 2. (d) Solve e−λ + λe−λ + λ2 −λ e = 0. .001)2 . Let EY = µ and Var(Y ) = σ 2 . 2. 1 (b) Y is Geometric( 6 ): 10 i 1 6 i 5 6 10−i . Then the number of years the biologists succeeds is approximately Binomial(10. . . 9 P (X ≥ 3) ≈ 1 − e−3 − 3e−3 − e−3 . . and again X is the size of the group of a randomly chosen student while Y is the size of the randomly chosen group. and σ. . (a) X is Binomial(10. 10. s) and the answer is 1 − (1 − s)10 − 10s(1 − s)9 . P (Y = i) = where i = 1. (a) Let X. . µ. . Solutions 1 1. 2.001) − (b) By Poisson approximation with λ = 3000 · 1 1000 3000 (0. is Binomial 3000. sn . Express EX with s. = 1 − (0.

The equation above can be solved by rewriting

λ = log 1000 + log(1 + λ + λ^2/2),

and then solving by iteration. The result is that the biologist should plant 11,229 seeds.

3. (a) N is Geometric(5/6):

P(N = n) = (1/6)^{n-1} · (5/6),

where n = 1, 2, 3, ....

(b) EN = 6/5.

(c) By symmetry, P(you win) = 1/2.

(d) You get paid $10 with probability 5/12, $1 with probability 1/12, and 0 otherwise, so your expected winnings are 51/12.

4. Let X be your earnings. Then

P(X = 10) = 4/52 = 1/13,
P(X = 5) = 4/52 = 1/13,
P(X = 1) = (22/52) · \binom{26}{2}/\binom{51}{2} = 11/102,
P(X = -1) = 1 - 2/13 - 11/102,

and so

EX = 10/13 + 5/13 + 11/102 - 1 + 2/13 + 11/102 = 4/13 + 11/51 > 0.

5. (a)

x         5     10    15    20
P(X = x)  0.1   0.2   0.3   0.4
P(Y = x)  0.25  0.25  0.25  0.25

(b) EX = 15, EY = 12.5.

(c) E(X^2) = 250, so Var(X) = 25. E(Y^2) = 187.5, so Var(Y) = 31.25.

. n s s s . .5 DISCRETE RANDOM VARIABLES 57 (d) Let s = s1 + . + sn . Then n E(X) = i=1 si n si · = s s n i=1 s2 · i 1 n n n = · EY 2 = (Var(Y ) + (EY )2 ) = (σ 2 + µ2 ).

6 Continuous Random Variables

By definition, X is a continuous random variable if there exists a nonnegative function f so that, for every interval B,

P(X ∈ B) = \int_B f(x) dx.

The function f = fX is called the density of X. Clearly it must hold that

\int_{-∞}^{∞} f(x) dx = 1.

Note also that

P(X ∈ [a, b]) = P(a ≤ X ≤ b) = \int_a^b f(x) dx,    P(X = a) = 0,

and

P(X ≤ b) = P(X < b) = \int_{-∞}^{b} f(x) dx.

The function F = FX given by

F(x) = P(X ≤ x) = \int_{-∞}^{x} f(s) ds

is called the distribution function of X. We will assume that a density function f is continuous, apart from finitely many (possibly infinite) jumps. On an open interval where f is continuous,

F'(x) = f(x).

The density has the same role as the probability mass function for discrete random variables: it tells which values x are relatively more probable for X than others. Namely, if h is very small, then

P(X ∈ [x, x + h]) = F(x + h) - F(x) ≈ F'(x) · h = f(x) · h.

By analogy with discrete random variables, we define

EX = \int_{-∞}^{∞} x · f(x) dx,

and, for a function g : R → R,

Eg(X) = \int_{-∞}^{∞} g(x) · f(x) dx,

and the variance is computed by the same formula: Var(X) = E(X^2) - (EX)^2.

Example 6.1. Let

f(x) = cx if 0 < x < 4, and 0 otherwise.

(a) Determine c. (b) Compute P(1 ≤ X ≤ 2). (c) Determine EX and Var(X).

For (a), we use the fact that the density integrates to 1:

\int_0^4 cx dx = 8c = 1,

and so c = 1/8. For (b), we compute

\int_1^2 (x/8) dx = 3/16.

Finally, for (c) we get

EX = \int_0^4 (x^2/8) dx = 8/3,  and  E(X^2) = \int_0^4 (x^3/8) dx = 8,

so Var(X) = 8 - 64/9 = 8/9.

Example 6.2. Assume that X has density

fX(x) = 3x^2 if x ∈ [0, 1], and 0 otherwise.

Compute the density fY of Y = 1 - X^4.

On a problem like this, compute first the distribution function FY of Y. Before starting, note that the density fY(y) will be nonzero only when y ∈ [0, 1], as the values of Y are restricted to that interval. Now, for y ∈ (0, 1),

FY(y) = P(Y ≤ y) = P(1 - X^4 ≤ y) = P(1 - y ≤ X^4) = P((1-y)^{1/4} ≤ X) = \int_{(1-y)^{1/4}}^{1} 3x^2 dx.

It follows that

fY(y) = (d/dy) FY(y) = -3 ((1-y)^{1/4})^2 · (1/4) (1-y)^{-3/4} · (-1) = (3/4) (1-y)^{-1/4},

for y ∈ (0, 1), and fY(y) = 0 otherwise. You should observe that it is immaterial how fY(y) is defined at y = 0 and y = 1, because those two values would contribute nothing to any integral.
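A simulation double-checks Example 6.2. Since FX(x) = x^3, one can sample X as U^{1/3} for uniform U (inversion); the sketch below (ours) then compares empirical probabilities for Y = 1 - X^4 with FY(y) = 1 - (1-y)^{3/4}, the antiderivative computed above:

    import random

    rng = random.Random(6)
    n = 10**5
    ys = [1 - (rng.random() ** (1 / 3)) ** 4 for _ in range(n)]
    for y in (0.2, 0.5, 0.8):
        empirical = sum(v <= y for v in ys) / n
        print(y, empirical, 1 - (1 - y) ** 0.75)  # the two columns agree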

As for discrete random variables, we now look at some famous densities.

6.1 Uniform random variable

Such a random variable represents a choice of a random number in an interval [α, β]. For [α, β] = [0, 1], this is ideally the output of a computer random number generator.

Properties:

1. Density: f(x) = 1/(β-α) if x ∈ [α, β], and 0 otherwise.
2. EX = (α+β)/2.
3. Var(X) = (β-α)^2/12.

Example 6.3. Assume that X is uniform on [0, 1]. What is P(X ∈ Q)? What is the probability that the binary expansion of X starts with 0.010?

As Q is countable, it has an enumeration, say, Q = {q1, q2, ...}. By Axiom 3 of Chapter 3,

P(X ∈ Q) = P(∪_i {X = qi}) = \sum_i P(X = qi) = 0.

Note that you cannot do this for sets that are not countable, or you would "prove" that P(X ∈ R) = 0, while we of course know that P(X ∈ R) = P(Ω) = 1. As X is, with probability 1, irrational, its binary expansion is uniquely defined, so there is no ambiguity about what the second question means.

Divide [0, 1) into 2^n intervals of equal length. If the binary expansion of a number x ∈ [0, 1) is 0.x1x2..., the first n binary digits determine which of the 2^n subintervals x belongs to: if you know that x belongs to an interval I based on the first n - 1 digits, then the n'th digit 1 means that x is in the right half of I, and the n'th digit 0 means that x is in the left half of I. For example, if the expansion starts with 0.010, the number is in [0, 1/2] first, then in [1/4, 1/2], and then finally in [1/4, 3/8]. Our answer is 1/8.

As X is uniform, all 2^n possibilities for its first n binary digits are equally likely; in fact, we can make a more general conclusion: the binary digits of X are the result of an infinite sequence of independent fair coin tosses. Choosing a uniform number on [0, 1] is thus equivalent to tossing a fair coin infinitely many times.

Example 6.4. A uniform random number X divides [0, 1] into two segments. Let R be the ratio of the smaller versus the larger segment. Compute the density of R.

As R has values in (0, 1), the density fR(r) is nonzero only for r ∈ (0, 1), and we will deal only with such r's. Then

FR(r) = P(R ≤ r)
      = P(X ≤ 1/2, X/(1-X) ≤ r) + P(X > 1/2, (1-X)/X ≤ r)
      = P(X ≤ 1/2, X ≤ r/(r+1)) + P(X > 1/2, X ≥ 1/(r+1))
      = P(X ≤ r/(r+1)) + P(X ≥ 1/(r+1))     (since r/(r+1) ≤ 1/2 and 1/(r+1) ≥ 1/2)
      = r/(r+1) + 1 - 1/(r+1)
      = 2r/(r+1).

For r ∈ (0, 1), the density thus equals

fR(r) = (d/dr) FR(r) = 2/(r+1)^2.

6.2 Exponential random variable

A random variable is Exponential(λ), with parameter λ > 0, if its density is the function given below. This is the distribution of the waiting time for some random event: for example, for a lightbulb to burn out, or for the next earthquake of some given magnitude.

Properties:

1. Density: f(x) = λ e^{-λx} if x ≥ 0, and 0 if x < 0.
2. EX = 1/λ.
3. Var(X) = 1/λ^2.
4. P(X ≥ x) = e^{-λx}.
5. Memoryless property: P(X ≥ x + y | X ≥ y) = e^{-λx}.

The last property means that, if the event still has not occurred by some given time (no matter how large), the distribution of the remaining waiting time is the same as it was at the beginning. There is no "aging."

Proofs of these properties are integration exercises and are omitted.

Example 6.5. Assume that a lightbulb lasts on average 100 hours. Assuming an exponential distribution, compute the probability that it lasts more than 200 hours and the probability that it lasts less than 50 hours.

Let X be the waiting time for the bulb to burn out. Then X is Exponential with λ = 1/100, and

P(X ≥ 200) = e^{-2} ≈ 0.1353,
P(X ≤ 50) = 1 - e^{-1/2} ≈ 0.3935.
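Both the lightbulb computation and the memoryless property can be checked by sampling. A short Python sketch (ours; random.expovariate draws Exponential samples):

    from math import exp
    import random

    lam = 1 / 100
    rng = random.Random(5)
    samples = [rng.expovariate(lam) for _ in range(10**5)]

    # P(X > 200) should be e^{-2}, Example 6.5.
    print(sum(s > 200 for s in samples) / len(samples), exp(-2))

    # Memoryless: among bulbs lasting past 100 hours, the chance of
    # lasting another 200 (past 300 total) should again be e^{-2}.
    alive = [s for s in samples if s > 100]
    print(sum(s > 300 for s in alive) / len(alive), exp(-2))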

6.3 Normal random variable

A random variable is Normal with parameters µ ∈ R and σ^2 > 0 or, in short, X is N(µ, σ^2), if its density is the function given below. Such a random variable is (at least approximately) very common: for example, a measurement with a random error, the weight of a randomly caught yellow-billed magpie, the SAT (or some other) test score of a randomly chosen student at UC Davis, etc.

Properties:

1. Density:

f(x) = fX(x) = (1/(σ \sqrt{2π})) e^{-(x-µ)^2/(2σ^2)},

where x ∈ (-∞, ∞).

2. EX = µ.
3. Var(X) = σ^2.

The fact that

\int_{-∞}^{∞} f(x) dx = 1

is a difficult exercise in integration, as is the computation of the variance. Assuming that the integral of f is 1, we can use symmetry to prove that EX must be µ:

EX = \int_{-∞}^{∞} x f(x) dx = \int_{-∞}^{∞} (x - µ) f(x) dx + µ \int_{-∞}^{∞} f(x) dx
   = \int_{-∞}^{∞} z · (1/(σ \sqrt{2π})) e^{-z^2/(2σ^2)} dz + µ
   = µ,

where the last integral was obtained by the change of variable z = x - µ, and is zero because the function being integrated is odd.

Example 6.6. Let X be a N(µ, σ^2) random variable, and let Y = αX + β, with α > 0. How is Y distributed?

If X is a "measurement with error," αX + β amounts to changing the units, and so Y should still be Normal. Let's see if this is the case. We start by computing the distribution function of Y,

FY(y) = P(Y ≤ y) = P(αX + β ≤ y) = P(X ≤ (y-β)/α) = \int_{-∞}^{(y-β)/α} fX(x) dx,

and then the density,

fY(y) = fX((y-β)/α) · (1/α) = (1/(\sqrt{2π} σ α)) e^{-(y-β-αµ)^2/(2α^2σ^2)}.

Therefore, Y is Normal with EY = αµ + β and Var(Y) = (ασ)^2.

In particular,

Z = (X - µ)/σ

has EZ = 0 and Var(Z) = 1. Such N(0, 1) random variables are called standard Normal. The distribution function of Z, FZ(z), is denoted by Φ(z). Note that

fZ(z) = (1/\sqrt{2π}) e^{-z^2/2},
Φ(z) = FZ(z) = (1/\sqrt{2π}) \int_{-∞}^{z} e^{-x^2/2} dx.

The integral for Φ(z) cannot be computed as an elementary function, so approximate values are given in tables. Nowadays, this is largely obsolete, as a computer can easily compute Φ(z) very accurately for any given z. You should also note that it is enough to know these values for z > 0, as, in this case, by using the fact that fZ(x) is an even function,

Φ(-z) = \int_{-∞}^{-z} fZ(x) dx = \int_{z}^{∞} fZ(x) dx = 1 - \int_{-∞}^{z} fZ(x) dx = 1 - Φ(z).

In particular, Φ(0) = 1/2. Another way to write this is P(Z ≥ -z) = P(Z ≤ z), a form which is also often useful.

Example 6.7. What is the probability that a Normal random variable differs from its mean µ by more than σ? More than 2σ? More than 3σ?

In symbols, if X is N(µ, σ^2), we need to compute P(|X - µ| ≥ σ), P(|X - µ| ≥ 2σ), and P(|X - µ| ≥ 3σ). In this, and all other examples of this type, the letter Z will stand for an N(0, 1) random variable.

We have

P(|X - µ| ≥ σ) = P(|X - µ|/σ ≥ 1) = P(|Z| ≥ 1) = 2 P(Z ≥ 1) = 2 (1 - Φ(1)) ≈ 0.3173.

Similarly,

P(|X - µ| ≥ 2σ) = 2 (1 - Φ(2)) ≈ 0.0455,
P(|X - µ| ≥ 3σ) = 2 (1 - Φ(3)) ≈ 0.0027.

Example 6.8. Assume that X is Normal with mean µ = 2 and variance σ^2 = 25. Compute the probability that X is between 1 and 4.

Here is the computation:

P(1 ≤ X ≤ 4) = P((1-2)/5 ≤ (X-2)/5 ≤ (4-2)/5)
             = P(-0.2 ≤ Z ≤ 0.4)
             = P(Z ≤ 0.4) - P(Z ≤ -0.2)
             = Φ(0.4) - (1 - Φ(0.2))
             ≈ 0.2347.

Let Sn be a Binomial(n, p) random variable. Recall that its mean is np and its variance is np(1-p). If we pretend that Sn is Normal, then (Sn - np)/\sqrt{np(1-p)} is standard Normal, i.e., N(0, 1). The next theorem says that this is approximately true if p is fixed (e.g., 0.5) while n is large (e.g., n = 100).

Theorem 6.1. De Moivre-Laplace Central Limit Theorem.

Let Sn be Binomial(n, p), where p is fixed and n is large. Then (Sn - np)/\sqrt{np(1-p)} ≈ N(0, 1); more precisely,

P((Sn - np)/\sqrt{np(1-p)} ≤ x) → Φ(x),

as n → ∞, for every real number x.

We should also note that the above theorem is an analytical statement; it says that

\sum_{k: 0 ≤ k ≤ np + x\sqrt{np(1-p)}} \binom{n}{k} p^k (1-p)^{n-k} → (1/\sqrt{2π}) \int_{-∞}^{x} e^{-s^2/2} ds,

as n → ∞, for every x ∈ R. Indeed it can be, and originally was, proved this way, with a lot of computational work.

An important issue is the quality of the Normal approximation to the Binomial. One can prove that the difference between the Binomial probability (in the above theorem) and its limit is at most

1/\sqrt{n p (1-p)}.

A commonly cited rule of thumb is that this is a decent approximation when np(1-p) ≥ 10, but, when np(1-p) = 10, the bound above is larger than 0.3! Various corrections have been developed to diminish the error, but they are, in my opinion, obsolete by now. In a situation when the error 1/\sqrt{np(1-p)} is too high, you should simply compute directly with the Binomial distribution and not use the Normal approximation. (We will simply assume that the approximation is adequate in the examples below.) You should also not forget that, when n is large and p is small, say n = 100 and p = 1/100, the Poisson approximation (with λ = np) is much better!
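As remarked above, tables of Φ are obsolete: the standard Normal distribution function can be written in terms of the error function, Φ(z) = (1 + erf(z/√2))/2, which Python's math module provides. A brief sketch (ours) reproducing the numbers in Examples 6.7 and 6.8:

    from math import erf, sqrt

    def phi(z):
        return (1 + erf(z / sqrt(2))) / 2

    print(2 * (1 - phi(1)))          # about 0.3173
    print(2 * (1 - phi(2)))          # about 0.0455
    print(2 * (1 - phi(3)))          # about 0.0027
    print(phi(0.4) - (1 - phi(0.2))) # about 0.2347, Example 6.8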

Example 6.9. A roulette wheel has 38 slots: 18 red, 18 black, and 2 green. The ball ends at one of these at random. Let's say you are a player who plays a large number of games and makes an even bet of $1 on red in every game. After n games, what is the probability that you are ahead? Answer this for n = 100 and n = 1000.

Let Sn be the number of times you win. This is a Binomial(n, 9/19) random variable, with p = 18/38 = 9/19. Then

P(ahead) = P(win more than half of the games)
         = P(Sn > n/2)
         = P((Sn - np)/\sqrt{np(1-p)} > (n/2 - np)/\sqrt{np(1-p)})
         ≈ P(Z > (1/2 - p)\sqrt{n}/\sqrt{p(1-p)}).

For n = 100, we get

P(Z > 5/\sqrt{90}) ≈ 0.2990,

while for n = 1000, we get

P(Z > 5/3) ≈ 0.0478.

For comparison, the true probabilities are 0.2650 and 0.0448, respectively.

Example 6.10. What would the answer to the previous example be if the game were fair, i.e., if p = 1/2? Then the argument (1/2 - p)\sqrt{n}/\sqrt{p(1-p)} equals 0, and

P(ahead) → P(Z > 0) = 1/2,

as n → ∞.
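The "true probabilities" quoted above come from the exact Binomial sum, which is itself easy to compute. A Python sketch (ours):

    from math import comb

    # Exact P(ahead) for the roulette bets in Example 6.9.
    def p_ahead(n, p=9/19):
        return sum(comb(n, k) * p**k * (1 - p)**(n - k)
                   for k in range(n // 2 + 1, n + 1))

    print(p_ahead(100))   # about 0.2650
    print(p_ahead(1000))  # about 0.0448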

Example 6.11. How many times do you need to toss a fair coin to get 100 heads with probability 90%?

Let n be the number of tosses that we are looking for. We have Sn, the number of Heads, which is Binomial(n, 1/2), and we need to find n so that P(Sn ≥ 100) ≈ 0.9. We will use below that n > 200, as the probability would be approximately 1/2 for n = 200 (see the previous example). Here is the computation:

P(Sn ≥ 100) = P((Sn - n/2)/(\sqrt{n}/2) ≥ (100 - n/2)/(\sqrt{n}/2))
            ≈ P(Z ≥ (200 - n)/\sqrt{n})
            = P(Z ≥ -(n - 200)/\sqrt{n})
            = P(Z ≤ (n - 200)/\sqrt{n})
            = Φ((n - 200)/\sqrt{n})
            = 0.9.

Now, according to the tables, Φ(1.28) ≈ 0.9, thus we need to solve (n - 200)/\sqrt{n} = 1.28, or

n - 1.28 \sqrt{n} - 200 = 0.

This is a quadratic equation in \sqrt{n}, with the only positive solution

\sqrt{n} = (1.28 + \sqrt{1.28^2 + 800})/2.

Rounding up the number n we get from above, we conclude that n = 219.

Problems

1. A random variable X has the density function

f(x) = c(x + \sqrt{x}) if x ∈ [0, 1], and 0 otherwise.

(a) Determine c. (b) Compute E(1/X). (c) Determine the probability density function of Y = X^2.

2. The density function of a random variable X is given by

f(x) = a + bx if 0 ≤ x ≤ 2, and 0 otherwise.

You also know that E(X) = 7/6. (a) Compute a and b. (b) Compute Var(X).

3. After your complaint about their service, a representative of an insurance company promised to call you "between 7 and 9 this evening." Assume that this means that the time T of the call is uniformly distributed in the specified interval.

(a) Compute the probability that the call arrives between 8:00 and 8:20.
(b) At 8:30, the call still hasn't arrived. What is the probability that it arrives in the next 10 minutes?
(c) Assume that you know in advance that the call will last exactly 1 hour. From 9 to 9:30, there is a game show on TV that you wanted to watch. Let M be the amount of time of the show which you miss because of the call. Compute the expected value of M.

4. Toss a fair coin twice. You win $1 if at least one of the two tosses comes out heads.
(a) Assume that you play this game 300 times. What is, approximately, the probability that you win at least $250?
(b) Approximately how many times do you need to play so that you win at least $250 with probability at least 0.99?

5. Roll a die n times, and let M be the number of times you roll 6. Assume that n is large.
(a) Compute the expectation EM.
(b) Write down an approximation, in terms of n and Φ, of the probability that M differs from its expectation by less than 10%.
(c) How large should n be so that the probability in (b) is larger than 0.99?

Solutions

1. (a) As

1 = c \int_0^1 (x + \sqrt{x}) dx = c (1/2 + 2/3) = (7/6) c,

it follows that c = 6/7.

(b)

E(1/X) = (6/7) \int_0^1 (1/x)(x + \sqrt{x}) dx = (6/7) (1 + 2) = 18/7.

(c)

FY(y) = P(Y ≤ y) = P(X ≤ \sqrt{y}) = (6/7) \int_0^{\sqrt{y}} (x + \sqrt{x}) dx,

and so

fY(y) = (3/7)(1 + y^{-1/4}) if y ∈ (0, 1), and 0 otherwise.

2. (a) From \int_0^2 f(x) dx = 1 you get 2a + 2b = 1, and from \int_0^2 x f(x) dx = 7/6 you get 2a + (8/3) b = 7/6. The two equations give a = b = 1/4.

(b) E(X^2) = \int_0^2 x^2 f(x) dx = 5/3, and so Var(X) = 5/3 - (7/6)^2 = 11/36.

3. Let T be the time of the call, in minutes after 7 pm, so that T is uniform on [0, 120].

(a) The answer is 20/120 = 1/6.

(b) P(T ≤ 100 | T ≥ 90) = 1/3.

(c) We have

M = 0 if 0 ≤ T ≤ 60,  M = T - 60 if 60 ≤ T ≤ 90,  M = 30 if 90 ≤ T ≤ 120.

Then

EM = (1/120) \int_{60}^{90} (t - 60) dt + (1/120) · 30 · 30 = 11.25.

4. (a) P(win a single game) = 3/4. If you play n times, the number X of games you win is Binomial(n, 3/4). If n = 300, then

P(X ≥ 250) ≈ P(Z ≥ (250 - 225)/\sqrt{300 · (3/4) · (1/4)}) = P(Z ≥ 3.33),

which is approximately 1 - Φ(3.33) ≈ 0.0004.

(b) You need to find n so that

P(Z ≥ (250 - n · 3/4)/\sqrt{n · (3/4) · (1/4)}) = 0.99,

or

(250 - n · 3/4)/(\sqrt{3n}/4) = -2.33.

The argument must be negative. If x = \sqrt{3n}, this yields

x^2 - 2.33x - 1000 = 0,

and solving the quadratic equation gives x ≈ 32.81, so n > (32.81)^2/3 ≈ 358.9 and, finally, n ≥ 359.

5. (a) M is Binomial(n, 1/6), so EM = n/6.

(b)

P(|M - n/6| < (n/6) · 0.1) ≈ P(|Z| < (0.1 · n/6)/\sqrt{n · (1/6) · (5/6)}) = P(|Z| < 0.1\sqrt{n}/\sqrt{5}) = 2 Φ(0.1\sqrt{n}/\sqrt{5}) - 1.

(c) The above must be 0.99, and so Φ(0.1\sqrt{n}/\sqrt{5}) = 0.995, i.e., 0.1\sqrt{n}/\sqrt{5} ≈ 2.57, so n > (25.7)^2 · 5, and finally n ≥ 3303.

7 Joint Distributions and Independence

Discrete Case

Assume that you have a pair (X, Y) of discrete random variables X and Y. Their joint probability mass function is given by

p(x, y) = P(X = x, Y = y),

so that

P((X, Y) ∈ A) = \sum_{(x,y) ∈ A} p(x, y).

The marginal probability mass functions are then the p. m. f.'s of X and Y, given by

P(X = x) = \sum_y P(X = x, Y = y) = \sum_y p(x, y),
P(Y = y) = \sum_x P(X = x, Y = y) = \sum_x p(x, y).

Example 7.1. An urn has 2 red, 5 white, and 3 green balls. Select 3 balls at random, and let X be the number of red balls and Y the number of white balls. Determine (a) the joint p. m. f. of (X, Y), (b) the marginal p. m. f.'s, (c) P(X ≥ Y), and (d) P(X = 2 | X ≥ Y).

The joint p. m. f. is given by P(X = x, Y = y) for all possible x and y. In our case, x can be 0, 1, or 2 and y can be 0, 1, 2, or 3. The values are given in the table:

x\y       0            1              2            3        P(X = x)
0         1/120        5·3/120        10·3/120     10/120   56/120
1         2·3/120      2·5·3/120      10·2/120     0        56/120
2         3/120        5/120          0            0        8/120
P(Y = y)  10/120       50/120         50/120       10/120   1

The last column and the last row are the respective row and column sums, and therefore determine the marginal p. m. f.'s. To answer (c), we merely add the relevant probabilities:

P(X ≥ Y) = (1 + 6 + 3 + 30 + 5)/120 = 3/8,

and to answer (d), we compute

P(X = 2, X ≥ Y)/P(X ≥ Y) = (8/120)/(3/8) = 8/45.
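The table entries can be produced by brute-force enumeration, which is a good way to catch arithmetic slips. A Python sketch (ours):

    from itertools import combinations

    # Example 7.1: draw 3 of 10 balls (2 red, 5 white, 3 green) and
    # tabulate the joint counts of (number of red, number of white).
    balls = list("RR" + "WWWWW" + "GGG")
    joint = {}
    total = 0
    for draw in combinations(range(10), 3):
        x = sum(balls[i] == "R" for i in draw)
        y = sum(balls[i] == "W" for i in draw)
        joint[(x, y)] = joint.get((x, y), 0) + 1
        total += 1

    print(total)          # 120 outcomes, i.e., C(10, 3)
    print(joint[(1, 1)])  # 30, matching the 2·5·3/120 entry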

Two random variables X and Y are independent if

P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B)

for all intervals A and B. In the discrete case, X and Y are independent exactly when

P(X = x, Y = y) = P(X = x) P(Y = y)

for all possible values x and y of X and Y, that is, when the joint p. m. f. is the product of the marginal p. m. f.'s.

Example 7.2. In the previous Example, are X and Y independent?

No, and the 0's in the table are dead giveaways. For example, P(X = 2, Y = 2) = 0, but neither P(X = 2) nor P(Y = 2) is 0.

Example 7.3. Most often, independence is an assumption. For example, roll a die twice, and let X be the number on the first roll and Y the number on the second roll. Then X and Y are independent: we are used to assuming that all 36 outcomes of the two rolls are equally likely, which is the same as assuming that the two random variables are discrete uniform (on {1, 2, ..., 6}) and independent.

Continuous Case

We say that (X, Y) is a jointly continuous pair of random variables if there exists a joint density f(x, y) ≥ 0, so that

P((X, Y) ∈ S) = \int\int_S f(x, y) dx dy,

where S is some nice (say, open or closed) subset of R^2.

Example 7.4. Let (X, Y) be a random point on S, where S is a compact (that is, closed and bounded) subset of R^2. What this means is that

f(x, y) = 1/area(S) if (x, y) ∈ S, and 0 otherwise.

The simplest example is the square of side length 1, where

f(x, y) = 1 if 0 ≤ x ≤ 1 and 0 ≤ y ≤ 1, and 0 otherwise.

If f is a joint density of (X, Y), then the two marginal densities, which are the densities of X and Y, are computed by integrating out the other variable:

fX(x) = ∫_{−∞}^{∞} f(x, y) dy,
fY(y) = ∫_{−∞}^{∞} f(x, y) dx.

Indeed, for an interval A, X ∈ A means that (X, Y) ∈ S, where S = A × R, and therefore

P(X ∈ A) = ∫_A dx ∫_{−∞}^{∞} f(x, y) dy.

The marginal densities formulas now follow from the definition of density. With some advanced calculus expertise, the following can be checked: two jointly continuous random variables X and Y are independent exactly when the joint density is the product of the marginal ones,

f(x, y) = fX(x) · fY(y),

for all x and y.

Example 7.5. Let

f(x, y) = c x²y if x² ≤ y ≤ 1, and 0 otherwise.

Determine (a) the constant c, (b) P(X ≥ Y), (c) P(X = Y), and (d) P(X = 2Y).

For (a),

∫_{−1}^{1} dx ∫_{x²}^{1} c x²y dy = c · (4/21) = 1, and so c = 21/4.

For (b), let S be the region between the graphs y = x² and y = x, for x ∈ (0, 1). Then

P(X ≥ Y) = P((X, Y) ∈ S) = ∫₀¹ dx ∫_{x²}^{x} (21/4) x²y dy = 3/20.

Both probabilities in (c) and (d) are 0, because a two-dimensional integral over a line is 0.
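Both the constant in (a) and the probability in (b) can also be verified symbolically. Here is a minimal sketch using sympy (my own check, not part of the notes):

import sympy as sp

x, y, c = sp.symbols('x y c', real=True)

# Total mass of f(x, y) = c*x**2*y over the region x**2 <= y <= 1:
total = sp.integrate(c * x**2 * y, (y, x**2, 1), (x, -1, 1))
cval = sp.solve(sp.Eq(total, 1), c)[0]
print(cval)   # 21/4

# P(X >= Y): integrate the density between y = x**2 and y = x, for 0 <= x <= 1.
p = sp.integrate(cval * x**2 * y, (y, x**2, x), (x, 0, 1))
print(p)      # 3/20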

Example 7.6. The previous example, continued. Compute the marginal densities and determine whether X and Y are independent.

We have

fX(x) = ∫_{x²}^{1} (21/4) x²y dy = (21/8) x² (1 − x⁴),

for x ∈ [−1, 1], and 0 otherwise. Moreover,

fY(y) = ∫_{−√y}^{√y} (21/4) x²y dx = (7/2) y^{5/2},

for y ∈ [0, 1], and 0 otherwise. The two random variables X and Y are clearly not independent, as f(x, y) ≠ fX(x)fY(y).

Example 7.7. Let (X, Y) be a random point in the square of side length 1 with bottom left corner at the origin, so that

f(x, y) = 1 if (x, y) ∈ [0, 1] × [0, 1], and 0 otherwise.

The marginal densities are fX(x) = 1 if x ∈ [0, 1], and fY(y) = 1 if y ∈ [0, 1], both 0 otherwise. Therefore f(x, y) = fX(x)fY(y), and X and Y are independent. This is the most common example of independence.

Example 7.8. Let (X, Y) be a random point in the triangle {(x, y) : 0 ≤ y ≤ x ≤ 1}. Now

f(x, y) = 2 if 0 ≤ y ≤ x ≤ 1, and 0 otherwise.

The marginal densities are fX(x) = 2x, for x ∈ [0, 1], and fY(y) = 2(1 − y), for y ∈ [0, 1], both 0 otherwise. So X and Y are no longer distributed uniformly and no longer independent.

We can make a more general conclusion from the last two examples. Assume that (X, Y) is a jointly continuous pair of random variables, uniform on a compact set S ⊂ R². If they are to be independent, their marginal densities have to be constant, thus uniform on some sets, say A and B, and then S = A × B. (If A and B are both intervals, then S = A × B is a rectangle, the most common case.)

Example 7.9. Assume that X and Y are independent, that X is uniform on [0, 1], and that Y has density fY(y) = 2y, for y ∈ [0, 1], and 0 elsewhere. Compute P(X + Y ≤ 1).

The assumptions determine the joint density of (X, Y):

f(x, y) = 2y if (x, y) ∈ [0, 1] × [0, 1], and 0 otherwise.

To compute the probability in question we compute

∫₀¹ dx ∫₀^{1−x} 2y dy   or   ∫₀¹ dy ∫₀^{1−y} 2y dx,

whichever double integral is easier. The answer is 1/3.

Example 7.10. Mr. and Mrs. Smith agree to meet at a specified location "between 5 and 6 p.m." Assume that they both arrive there at a random time between 5 and 6, and that their arrivals are independent. (a) Find the density for the time one of them will have to wait for the other. (b) Mrs. Smith later tells you she had to wait; given this information, compute the probability that Mr. Smith arrived before 5:30.

Let X be the time when Mr. Smith arrives and Y be the time when Mrs. Smith arrives, with the time unit 1 hour. The assumptions imply that (X, Y) is uniform on [0, 1] × [0, 1].

For (a), let T = |X − Y|, which has possible values in [0, 1]. So fix t ∈ [0, 1] and compute (drawing a picture will also help)

P(T ≤ t) = P(|X − Y| ≤ t) = P(−t ≤ X − Y ≤ t) = P(X − t ≤ Y ≤ X + t) = 1 − (1 − t)² = 2t − t²,

and so fT(t) = 2 − 2t.

For (b), we need to compute

P(X ≤ 0.5 | X > Y) = P(X ≤ 0.5, X > Y)/P(X > Y) = (1/8)/(1/2) = 1/4.

Example 7.11. Assume you are waiting for two phone calls, from Alice and from Bob. The waiting time T1 for Alice's call has expectation 10 minutes, and the waiting time T2 for Bob's

call has expectation 40 minutes. Assume T1 and T2 are independent exponential random variables. What is the probability that Alice's call will come first?

We need to compute P(T1 < T2). Assuming our unit is 10 minutes, we have, for t1, t2 > 0,

fT1(t1) = e^{−t1}  and  fT2(t2) = (1/4) e^{−t2/4},

so that the joint density is

f(t1, t2) = (1/4) e^{−t1 − t2/4}, for t1, t2 ≥ 0.

Therefore,

P(T1 < T2) = ∫₀^∞ dt1 ∫_{t1}^∞ (1/4) e^{−t1 − t2/4} dt2 = ∫₀^∞ e^{−t1} e^{−t1/4} dt1 = ∫₀^∞ e^{−5t1/4} dt1 = 4/5.

Example 7.12. Buffon needle problem. Parallel lines at distance 1 are drawn on a large sheet of paper. Drop a needle of length ℓ onto the sheet. Compute the probability that it intersects one of the lines.

Let D be the distance from the center of the needle to the nearest line, and let Θ be the acute angle relative to the lines. We will, reasonably, assume that D and Θ are independent and uniform on their respective intervals 0 ≤ D ≤ 1/2 and 0 ≤ Θ ≤ π/2. Then

P(the needle intersects a line) = P(D/sin Θ < ℓ/2) = P(D < (ℓ/2) sin Θ).

Case 1: ℓ ≤ 1. Then the probability equals

[∫₀^{π/2} (ℓ/2) sin θ dθ] / (π/4) = (ℓ/2)/(π/4) = 2ℓ/π.

When ℓ = 1, you famously get 2/π, which can be used to get (very poor) approximations for π.

Case 2: ℓ > 1. Now the curve d = (ℓ/2) sin θ intersects d = 1/2 at θ = arcsin(1/ℓ). The probability equals

(4/π) [∫₀^{arcsin(1/ℓ)} (ℓ/2) sin θ dθ + (1/2)(π/2 − arcsin(1/ℓ))] = (2ℓ/π)(1 − √(1 − 1/ℓ²)) + 1 − (2/π) arcsin(1/ℓ).
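Since the probability is exactly 2ℓ/π for ℓ ≤ 1, simulating the drop gives a (very poor) estimate of π, as remarked above. Here is a minimal Monte Carlo sketch (my own illustration; the function name buffon_estimate is ad hoc):

import math
import random

def buffon_estimate(ell=1.0, trials=1_000_000, seed=1):
    """Estimate the crossing probability by sampling D and Theta directly."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        d = rng.uniform(0.0, 0.5)             # distance to the nearest line
        theta = rng.uniform(0.0, math.pi/2)   # acute angle to the lines
        if d < (ell / 2.0) * math.sin(theta):
            hits += 1
    return hits / trials

# For ell = 1 the exact answer is 2/pi, so 2/buffon_estimate(1.0)
# is a crude approximation of pi, as the text warns.
print(buffon_estimate(1.0), 2/math.pi)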

The same joint-density approach works for more than two random variables. Let's do a single example for illustration.

Example 7.13. Assume that X1, X2, X3 are uniform on [0, 1] and independent. What is P(X1 + X2 + X3 ≤ 1)?

The joint density now is

f_{X1,X2,X3}(x1, x2, x3) = fX1(x1) fX2(x2) fX3(x3) = 1 if (x1, x2, x3) ∈ [0, 1]³, and 0 otherwise.

Here then is how we get the answer:

P(X1 + X2 + X3 ≤ 1) = ∫₀¹ dx1 ∫₀^{1−x1} dx2 ∫₀^{1−x1−x2} dx3 = 1/6.

In general, if X1, ..., Xn are independent and uniform on [0, 1],

P(X1 + ... + Xn ≤ 1) = 1/n!.
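The 1/n! formula is easy to test empirically. A quick Monte Carlo sketch (illustrative only; the function name is mine):

import random

def p_sum_le_one(n=3, trials=1_000_000, seed=7):
    rng = random.Random(seed)
    hits = sum(sum(rng.random() for _ in range(n)) <= 1.0 for _ in range(trials))
    return hits / trials

# Should be close to 1/6 = 0.1667 for n = 3, and to 1/n! in general.
print(p_sum_le_one(3))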

Conditional distributions

The conditional p. m. f. of X given Y = y is, in the discrete case, given simply by

pX(x | Y = y) = P(X = x | Y = y) = P(X = x, Y = y)/P(Y = y).

This is trickier in the continuous case, as we cannot divide by P(Y = y) = 0. For a jointly continuous pair of random variables X and Y, we define the conditional density of X given Y = y as follows:

fX(x | Y = y) = f(x, y)/fY(y),

where f(x, y) is, of course, the joint density of (X, Y). Observe that when fY(y) = 0, we have ∫ f(x, y) dx = fY(y) = 0, and so f(x, y) = 0 for every x. So we have a 0/0 situation, which we define to be 0.

Here is a "physicist's proof" of why this should be the conditional density formula:

P(X = x + dx | Y = y + dy) = P(X = x + dx, Y = y + dy)/P(Y = y + dy) = [f(x, y) dx dy]/[fY(y) dy] = [f(x, y)/fY(y)] dx = fX(x | Y = y) dx.

Example 7.14. Let (X, Y) be a random point in the triangle {(x, y) : x, y ≥ 0, x + y ≤ 1}. Compute fX(x | Y = y).

The joint density f(x, y) equals 2 on the triangle, so

fY(y) = ∫₀^{1−y} 2 dx = 2(1 − y),

for y ∈ [0, 1]. Therefore,

fX(x | Y = y) = 1/(1 − y), for 0 ≤ x ≤ 1 − y,

and 0 otherwise. In other words, given Y = y, X is distributed uniformly on [0, 1 − y], which is hardly surprising.

Example 7.15. Suppose (X, Y) has joint density

f(x, y) = (21/4) x²y if x² ≤ y ≤ 1, and 0 otherwise.

Compute fX(x | Y = y). We compute first

fY(y) = (21/4) y ∫_{−√y}^{√y} x² dx = (7/2) y^{5/2},

for y ∈ [0, 1]. Then

fX(x | Y = y) = [(21/4) x²y]/[(7/2) y^{5/2}] = (3/2) x² y^{−3/2},

where −√y ≤ x ≤ √y.

Suppose we are asked to compute P(X ≥ Y | Y = y). This makes no literal sense, because the probability P(Y = y) of the condition is 0. We reinterpret this expression as

P(X ≥ y | Y = y) = ∫_y^∞ fX(x | Y = y) dx,

which equals

∫_y^{√y} (3/2) x² y^{−3/2} dx = (1/2) y^{−3/2} (y^{3/2} − y³) = (1/2)(1 − y^{3/2}).

Problems

1. Let (X, Y) be a random point in the square {(x, y) : −1 ≤ x, y ≤ 1}. Compute the conditional probability P(X ≥ 0 | Y ≤ 2X). (It may be a good idea to draw a picture and use elementary geometry, rather than calculus.)

2. Roll a fair die 3 times. Let X be the number of 6's obtained, and Y the number of 5's. (a) Compute the joint probability mass function of X and Y. (b) Are X and Y independent?

3. X and Y are independent random variables, and they have the same density function

f(x) = c(2 − x) for x ∈ (0, 1), and 0 otherwise.

(a) Determine c. (b) Compute P(Y ≤ 2X) and P(Y < 2X).

4. Let X and Y be independent random variables, both uniformly distributed on [0, 1]. Let Z = min(X, Y) be the smaller value of the two. (a) Compute the density function of Z. (b) Compute P(X ≤ 0.5 | Z ≤ 0.5). (c) Are X and Z independent?

5. The joint density of (X, Y) is given by

f(x, y) = 3x if 0 ≤ y ≤ x ≤ 1, and 0 otherwise.

(a) Compute the conditional density of Y given X = x. (b) Are X and Y independent?

Solutions to problems

1. After noting the relevant areas,

P(X ≥ 0 | Y ≤ 2X) = P(X ≥ 0, Y ≤ 2X)/P(Y ≤ 2X) = [(2 − (1/2) · (1/2) · 1)/4] / (1/2) = 7/8.

2. (a) The joint p. m. f. of X and Y is given by the following table:

x\y        0        1        2       3      P(X = x)
0       64/216   48/216   12/216  1/216    125/216
1       48/216   24/216    3/216    0       75/216
2       12/216    3/216     0       0       15/216
3        1/216     0        0       0        1/216
P(Y=y) 125/216   75/216   15/216  1/216       1

Alternatively, for x, y = 0, 1, 2, 3 and x + y ≤ 3,

P(X = x, Y = y) = C(3, x) C(3 − x, y) (1/6)^{x+y} (4/6)^{3−x−y}.

(b) No: for example, P(X = 3, Y = 3) = 0, while P(X = 3)P(Y = 3) ≠ 0.

3. (a) From c ∫₀¹ (2 − x) dx = 1 it follows that c = 2/3.

(b) We have

P(Y ≤ 2X) = P(Y < 2X)
= (4/9) ∫₀¹ dy ∫_{y/2}^{1} (2 − x)(2 − y) dx
= (4/9) ∫₀¹ (2 − y) dy ∫_{y/2}^{1} (2 − x) dx
= (4/9) ∫₀¹ (2 − y)(3/2 − y + y²/8) dy
= (4/9) ∫₀¹ (3 − (7/2)y + (5/4)y² − y³/8) dy
= (4/9)(3 − 7/4 + 5/12 − 1/32)
= 157/216.

4. (a) For z ∈ [0, 1],

P(Z ≤ z) = 1 − P(both X and Y are above z) = 1 − (1 − z)² = 2z − z²,

so that fZ(z) = 2(1 − z), for z ∈ [0, 1], and 0 otherwise.

(b) From (a), we conclude that P(Z ≤ 0.5) = 3/4. Moreover, P(X ≤ 0.5, Z ≤ 0.5) = P(X ≤ 0.5) = 1/2, so the answer is (1/2)/(3/4) = 2/3.

(c) No: Z ≤ X, so the two random variables are not independent.

5. (a) Assume x ∈ [0, 1]. As fX(x) = ∫₀^x 3x dy = 3x², we have, for 0 ≤ y ≤ x,

fY(y | X = x) = f(x, y)/fX(x) = 3x/(3x²) = 1/x.

In other words, given X = x, Y is uniform on [0, x].

(b) As the answer in (a) depends on x, the two random variables are not independent.

Interlude: Practice Midterm 2

This practice exam covers the material from chapters 5 through 7. Give yourself 50 minutes to solve the four problems, which you may assume have equal point score.

1. A random variable X has density function

f(x) = c(x + x²), x ∈ [0, 1], and 0 otherwise.

(a) Determine c. (b) Compute E(1/X). (c) Determine the probability density function of Y = X².

2. A certain country holds a presidential election, with two candidates running for the office. Not satisfied by their choice, each voter casts a vote independently at random, based on the outcome of a fair coin flip. At the end, there are 4,000,000 valid votes, as well as 20,000 invalid votes. (a) Using a relevant approximation, compute the probability that, in the final count of valid votes, the numbers for the two candidates will differ by less than 1000 votes. (b) Each invalid vote is double-checked independently with probability 1/5000. Using a relevant approximation, compute the probability that at least 3 invalid votes are double-checked.

3. Throw a fair coin 5 times. Let X be the total number of heads among the first three tosses, and Y the total number of heads among the last three tosses. (Note that if the third toss comes out heads, it is counted both into X and into Y.) (a) Write down the joint probability mass function of X and Y. (b) Are X and Y independent? Explain. (c) Compute the conditional probability P(X ≥ 2 | X ≥ Y).

4. Every working day, John comes to the bus stop exactly at 7am. He takes the first bus that arrives. The arrival of the first bus is an exponential random variable with expectation 20 minutes. Also every working day, and independently, Mary comes to the same bus stop at a random time, uniformly distributed between 7 and 7:30. (a) What is the probability that John will wait tomorrow for more than 30 minutes? (b) Assume day-to-day independence. Consider Mary late if she comes after 7:20. What is the probability that Mary will be late on 2 or more working days among the next 10 working days? (c) What is the probability that John and Mary will meet at the station tomorrow?

Solutions to Practice Midterm 2

1. A random variable X has density function

f(x) = c(x + x²), x ∈ [0, 1], and 0 otherwise.

(a) Determine c.

Solution: Since

1 = c ∫₀¹ (x + x²) dx = c (1/2 + 1/3) = c · (5/6),

it follows that c = 6/5.

(b) Compute E(1/X).

Solution:

E(1/X) = (6/5) ∫₀¹ (1/x)(x + x²) dx = (6/5) ∫₀¹ (1 + x) dx = (6/5)(1 + 1/2) = 9/5.

(c) Determine the probability density function of Y = X².

Solution:

1]. P (|Sn − (n − Sn )| ≤ 1000) = P (−1000 ≤ 2Sn − n ≤ 1000) Sn − n/2 −500 ≤ ≤ = P n· 1 · 1 n· 1 · 1 2 2 2 2 = P (Z ≤ 0. compute the probability that in the ﬁnal count of valid votes only the numbers for the two candidates will diﬀer by less than 1000 votes. Not satisﬁed by their choice. 5 and so fY (y) = = = d FY (y) dy 6 √ 1 ( y + y) · √ 5 2 y √ 3 (1 + y). n − Sn is the vote count for candidate 2. based on the outcome of a fair coin ﬂip.5) ≈ P (−0.5) − 1 = P (Z ≤ 0.5) − P (Z ≤ −0. Then F (y) = P (Y ≤ y) √ y 83 = P (X 2 ≤ y) √ = P (X ≤ y) 0 = 6 (x + x2 ) dx.383. as well as 20. 5 2. Thus Sn is Binomial(n.7 JOINT DISTRIBUTIONS AND INDEPENDENCE The values of Y are in [0. 000.5 ≤ Z ≤ 0. there are 4.6915 − 1 = 0. 1]. At the end.000 valid votes.000. A certain country holds a presidential election. . Solution: Let Sn be the vote count for candidate 1.5) − (1 − P (Z ≤ −0. p) where n = 1 4.5) − 1 = 2P (Z ≤ 0. so we will assume that y ∈ [0. 000 and p = 2 . with two candidates running for the oﬃce. (a) Using a relevant approximation. each voter casts a vote independently at random.5) 500 n· 1 2 · 1 2 ≈ 2 · 0.5)) = 2Φ(0.000 invalid votes. Then.

(b) Each invalid vote is double-checked independently with probability 1/5000. Using a relevant approximation, compute the probability that at least 3 invalid votes are double-checked.

Solution: Now, let Sn be the number of double-checked votes, which is Binomial(20000, 1/5000) and thus approximately Poisson(4). Then

P(Sn ≥ 3) = 1 − P(Sn = 0) − P(Sn = 1) − P(Sn = 2) ≈ 1 − e^{−4} − 4e^{−4} − (4²/2) e^{−4} = 1 − 13 e^{−4}.

3. Throw a fair coin 5 times. Let X be the total number of heads among the first three tosses, and Y the total number of heads among the last three tosses. (Note that if the third toss comes out heads, it is counted both into X and into Y.)

(a) Write down the joint probability mass function of X and Y.

Solution: P(X = x, Y = y) is given by the table

x\y    0      1      2      3
0    1/32   2/32   1/32    0
1    2/32   5/32   4/32   1/32
2    1/32   4/32   5/32   2/32
3     0     1/32   2/32   1/32

To compute these, observe that the number of outcomes is 2⁵ = 32, and then, for example,

P(X = 2, Y = 1) = P(X = 2, Y = 1, 3rd coin Heads) + P(X = 2, Y = 1, 3rd coin Tails) = 2/32 + 2/32 = 4/32,
P(X = 2, Y = 2) = (2 · 2)/32 + 1/32 = 5/32,

etc.

(b) Are X and Y independent? Explain.

Solution: No, as, for example,

P(X = 0, Y = 3) = 0 ≠ (1/8) · (1/8) = P(X = 0)P(Y = 3).

(c) Compute the conditional probability P(X ≥ 2 | X ≥ Y).

Solution:

P(X ≥ 2 | X ≥ Y) = P(X ≥ 2, X ≥ Y)/P(X ≥ Y)
= (1 + 4 + 5 + 1 + 2 + 1)/(1 + 2 + 1 + 5 + 4 + 1 + 5 + 2 + 1)
= 14/22 = 7/11.

4. Every working day, John comes to the bus stop exactly at 7am. He takes the first bus that arrives. The arrival of the first bus is an exponential random variable with expectation 20 minutes. Also every working day, and independently, Mary comes to the same bus stop at a random time, uniformly distributed between 7 and 7:30.

(a) What is the probability that John will wait tomorrow for more than 30 minutes?

Solution: Assume the time unit is 10 minutes. Let T be the arrival time of the bus. It is Exponential with parameter λ = 1/2, that is,

fT(t) = (1/2) e^{−t/2}, for t ≥ 0,

and so P(T ≥ 3) = e^{−3/2}.

(b) Assume day-to-day independence. Consider Mary late if she comes after 7:20. What is the probability that Mary will be late on 2 or more working days among the next 10 working days?

Solution: Let X be Mary's arrival time. It is uniform on [0, 3], and therefore

P(X ≥ 2) = 1/3.

The number of late days among 10 days is Binomial(10, 1/3), and therefore

P(2 or more late working days among 10 working days) = 1 − P(0 late) − P(1 late) = 1 − (2/3)^{10} − 10 · (1/3) · (2/3)⁹.

(c) What is the probability that John and Mary will meet at the station tomorrow?

Solution: John and Mary meet exactly when Mary arrives before the bus. We have

f_{(T,X)}(t, x) = (1/6) e^{−t/2}, for x ∈ [0, 3] and t ≥ 0.

Therefore,

P(X ≤ T) = ∫₀³ (1/3) dx ∫_x^∞ (1/2) e^{−t/2} dt = (1/3) ∫₀³ e^{−x/2} dx = (2/3)(1 − e^{−3/2}).
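As a sanity check on (c), one can simulate a day at the bus stop. A short sketch (my own; the helper name meet_probability is ad hoc, and the time unit is 10 minutes as in the solution):

import math
import random

def meet_probability(trials=1_000_000, seed=3):
    """Estimate P(Mary arrives before the bus leaves with John)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        t = rng.expovariate(0.5)     # bus arrival, Exponential with rate 1/2
        x = rng.uniform(0.0, 3.0)    # Mary's arrival, uniform on [0, 3]
        if x <= t:
            hits += 1
    return hits / trials

print(meet_probability(), (2/3) * (1 - math.exp(-1.5)))   # both about 0.518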

8 More on Expectation and Limit Theorems

If you have a pair of random variables (X, Y) with joint density f and take another function g of two variables, then

E g(X, Y) = ∫∫ g(x, y) f(x, y) dx dy,

while if instead (X, Y) is a discrete pair with joint probability mass function p,

E g(X, Y) = Σ_{x,y} g(x, y) p(x, y).

Example 8.1. Assume that two among 5 items are defective. Put the items in a random order and inspect them one by one. Let X be the number of inspections needed to find the first defective item, and Y the number of additional inspections needed to find the second defective item. Compute E|X − Y|.

The joint p. m. f. of (X, Y) is given by the following table, which lists P(X = i, Y = j), together with |i − j| in brackets whenever this probability is nonzero, for i, j = 1, 2, 3, 4:

i\j     1        2        3        4
1     .1 (0)   .1 (1)   .1 (2)   .1 (3)
2     .1 (1)   .1 (0)   .1 (1)     0
3     .1 (2)   .1 (1)     0        0
4     .1 (3)     0        0        0

The answer is therefore E|X − Y| = 1.4.

Example 8.2. Assume that (X, Y) is a random point in the right triangle {(x, y) : x, y ≥ 0, x + y ≤ 1}. Compute EX, EY, and E(XY).

Note that the density is 2 on the triangle, and so

EX = ∫₀¹ dx ∫₀^{1−x} x · 2 dy = ∫₀¹ 2x(1 − x) dx = 2(1/2 − 1/3) = 1/3,

and therefore, by symmetry, EY = EX = 1/3. Furthermore,

E(XY) = ∫₀¹ dx ∫₀^{1−x} 2xy dy = ∫₀¹ x(1 − x)² dx = ∫₀¹ (1 − x) x² dx (substitute 1 − x for x) = 1/3 − 1/4 = 1/12.

Linearity and monotonicity of expectation

Theorem 8.1. Expectation is linear and monotone:

1. For constants a and b, E(aX + b) = aE(X) + b.

2. For arbitrary random variables X1, ..., Xn whose expected values exist, E(X1 + ... + Xn) = E(X1) + ... + E(Xn).

3. For two random variables X ≤ Y, we have EX ≤ EY.

Proof. We will check the second property for n = 2 in the continuous case, that is, that E(X + Y) = EX + EY. This is a consequence of the same property for two-dimensional integrals:

∫∫ (x + y) f(x, y) dx dy = ∫∫ x f(x, y) dx dy + ∫∫ y f(x, y) dx dy.

To prove this property for arbitrary n (still in the continuous case), one could simply proceed by induction. The way we defined expectation, the third property is not absolutely obvious. However, it is clear that Z ≥ 0 implies EZ ≥ 0, and applying this to Z = Y − X, together with linearity, establishes the monotonicity.

It should be emphasized again that linearity holds for arbitrary random variables, which do not need to be independent! This is very useful, as we can often write a random variable X as a sum of (possibly dependent) indicators, X = I1 + · · · + In. An instance of this method is called the indicator trick.


Example 8.3. Assume that an urn contains 10 black, 7 red, and 5 white balls. Take 5 balls out (a) with and (b) without replacement and let X be the number of red balls pulled out. Compute EX.

Let Ii be the indicator of the event that the i'th ball is red, that is,

Ii = I{ith ball is red} = 1 if the i'th ball is red, and 0 otherwise.

In both cases then X = I1 + I2 + I3 + I4 + I5.

In part (a), X is Binomial(5, 7/22), so we know that EX = 5 · (7/22), but we will not use this knowledge. Instead, it is clear that

EI1 = 1 · P(1st ball is red) = 7/22 = EI2 = ... = EI5.

Therefore, by additivity, EX = 5 · (7/22).

For part (b), one way is to compute the p. m. f. of X,

P(X = i) = [C(7, i) · C(15, 5 − i)] / C(22, 5),

where i = 0, 1, ..., 5, and then

EX = Σ_{i=0}^{5} i · [C(7, i) · C(15, 5 − i)] / C(22, 5).

However, the indicator trick works exactly as before (the fact that the Ii are now dependent does not matter), and so the answer is also exactly the same, EX = 5 · (7/22).

Example 8.4. Matching problem, revisited. Assume n people buy n gifts, which are then assigned at random, and let X be the number of people who receive their own gift. What is EX?

This is another problem very well suited for the indicator trick. Let Ii = I{person i receives own gift}. Then X = I1 + I2 + ... + In. Moreover, EIi = 1/n for all i, and so EX = 1.
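The striking conclusion that EX = 1 for every n is easy to test by simulation. A minimal sketch (my own illustration; the function name is ad hoc):

import random

def matching_expectation(n=10, trials=200_000, seed=0):
    """Empirical E[X] for the matching problem; the indicator trick gives exactly 1."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(n))
        rng.shuffle(perm)
        total += sum(1 for i, g in enumerate(perm) if i == g)   # fixed points
    return total / trials

print(matching_expectation())   # close to 1, regardless of n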

Example 8.5. Five married couples are seated around a table at random. Let X be the number of wives who sit next to their husbands. What is EX?

Now let Ii = I{wife i sits next to her husband}, so that X = I1 + ... + I5. For each wife, each of the 9 remaining seats around the table is equally likely to be her husband's, and 2 of them are adjacent to hers, so

EIi = 2/9,

and therefore EX = 10/9.

Example 8.6. Coupon collector problem, revisited. Sample from n cards, with replacement, indefinitely, and let N be the number of cards you need to sample for a complete collection, i.e., until all n different cards are represented. What is EN?

Let Ni be the number of additional cards you need to get the i'th new card, after you have received the (i − 1)'st new card. Then N1, the number of cards needed to receive the first new card, is trivial, as the first card you buy is new: N1 = 1. Afterwards, N2, the number of additional cards to get the second new card, is Geometric with success probability (n − 1)/n. After that, N3, the number of additional cards to get the third new card, is Geometric with success probability (n − 2)/n. In general, Ni is Geometric with success probability (n − i + 1)/n, and N = N1 + ... + Nn, so that

EN = n (1 + 1/2 + 1/3 + ... + 1/n).

Now we have

Σ_{i=2}^{n} 1/i ≤ ∫₁ⁿ (1/x) dx ≤ Σ_{i=1}^{n−1} 1/i,

by comparing the integral with the Riemann sums at the left and right endpoints in the division of [1, n] into [1, 2], [2, 3], ..., [n − 1, n], and so

log n ≤ Σ_{i=1}^{n} 1/i ≤ log n + 1,

which establishes the limit

lim_{n→∞} EN/(n log n) = 1.
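To see the n log n growth concretely, one can simulate the collection process and compare with n(1 + 1/2 + ... + 1/n). A small sketch (illustrative only, with names of my choosing):

import random

def coupon_collector_mean(n=50, trials=20_000, seed=5):
    """Empirical E[N] for a complete collection of n cards; theory: n * H_n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        seen, draws = set(), 0
        while len(seen) < n:
            seen.add(rng.randrange(n))
            draws += 1
        total += draws
    return total / trials

harmonic = sum(1/k for k in range(1, 51))
print(coupon_collector_mean(), 50 * harmonic)   # both about 225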

Example 8.7. Assume that an urn contains 10 black, 7 red, and 5 white balls. Pull 5 balls out (a) with replacement and (b) without replacement. Let W be the number of white balls pulled out, and Y the number of different colors. Compute EW and EY.

We already know that EW = 5 · (5/22) in either case.

Let Ib, Ir, and Iw be the indicators of the events that, respectively, black, red, and white balls are represented. Clearly, Y = Ib + Ir + Iw, and so, in the case with replacement,

EY = (1 − 12⁵/22⁵) + (1 − 15⁵/22⁵) + (1 − 17⁵/22⁵),

while in the case without replacement,

EY = (1 − C(12,5)/C(22,5)) + (1 − C(15,5)/C(22,5)) + (1 − C(17,5)/C(22,5)).

Expectation and independence

Theorem 8.2. Multiplicativity of expectation for independent factors. The expectation of a product of independent random variables is the product of their expectations; more generally, if X and Y are independent, then

E[g(X)h(Y)] = Eg(X) · Eh(Y).

Proof. For the continuous case,

E[g(X)h(Y)] = ∫∫ g(x)h(y) f(x, y) dx dy = ∫∫ g(x)h(y) fX(x) fY(y) dx dy = ∫ g(x) fX(x) dx · ∫ h(y) fY(y) dy = Eg(X) · Eh(Y).

Example 8.8. Let's return to the random point (X, Y) in the triangle {(x, y) : x, y ≥ 0, x + y ≤ 1}. We computed that E(XY) = 1/12 and EX = EY = 1/3. The two random variables have E(XY) ≠ EX · EY, and so they cannot be independent. Of course, we knew already that they are not independent.

Now, let's pick a random point (X, Y) on the square {(x, y) : 0 ≤ x, y ≤ 1} instead. Then X and Y are independent, and therefore E(XY) = EX · EY = 1/4.

Finally, pick a random point (X, Y) in the square rotated by 45 degrees, with corners at (1, 0), (0, 1), (−1, 0), and (0, −1). Clearly we have, by symmetry, EX = EY = 0, but also

E(XY) = (1/2) ∫_{−1}^{1} dx ∫_{−1+|x|}^{1−|x|} xy dy = (1/2) ∫_{−1}^{1} x dx ∫_{−1+|x|}^{1−|x|} y dy = (1/2) ∫_{−1}^{1} x · 0 dx = 0.

This is an example where E(XY) = EX · EY even though X and Y are not independent.

Computing expectation by conditioning

For a pair of random variables (X, Y), define the conditional expectation of Y given X = x by

E(Y | X = x) = Σ_y y P(Y = y | X = x) (discrete case),
E(Y | X = x) = ∫ y fY(y | X = x) dy (continuous case).

This is the expectation of Y provided we know that X equals x. Observe that E(Y | X = x) is a function of x; let's call it g(x) for just a moment. Then we denote g(X) by E(Y | X); note again that this is an expression dependent on X, and so we can compute its expectation.

Theorem 8.3. Tower property. The formula E(E(Y | X)) = EY holds; less mysteriously, in the discrete case,

EY = Σ_x E(Y | X = x) · P(X = x),

and in the continuous case,

EY = ∫ E(Y | X = x) fX(x) dx.

Proof. To verify this in the discrete case, we write out the expectation inside the sum:

Σ_x [Σ_y y P(Y = y | X = x)] P(X = x) = Σ_x Σ_y y [P(X = x, Y = y)/P(X = x)] P(X = x) = Σ_x Σ_y y P(X = x, Y = y) = EY.

Example 8.9. Roll a die and then toss as many coins as shown up on the die. Compute the expected number of heads.

Let X be the number on the die and Y the number of heads. Fix an x ∈ {1, 2, ..., 6}. Given that X = x, Y is Binomial(x, 1/2), so

E(Y | X = x) = x · (1/2),

and therefore

E(number of heads) = EY = Σ_{x=1}^{6} x · (1/2) · P(X = x) = Σ_{x=1}^{6} x · (1/2) · (1/6) = 7/4.

Example 8.10. Once again, consider a random point (X, Y) in the triangle {(x, y) : x, y ≥ 0, x + y ≤ 1}. Given that X = x, Y is distributed uniformly on [0, 1 − x], and so

E(Y | X = x) = (1/2)(1 − x).

So E(Y | X) = (1/2)(1 − X), with the obvious meaning, and the expectation of (1/2)(1 − X) therefore must equal the expectation of Y. Indeed it does, as EX = EY = 1/3.
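The answer 7/4 in Example 8.9 is a nice target for a quick simulation of the two-stage experiment. A minimal sketch (the function name is mine):

import random

def expected_heads(trials=1_000_000, seed=2):
    """Roll a die, toss that many fair coins; the tower property gives E = 7/4."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        x = rng.randint(1, 6)
        total += sum(rng.random() < 0.5 for _ in range(x))
    return total / trials

print(expected_heads())   # about 1.75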

“on the average. then Cov(X. and solving this equation gives Covariance Let X. Y ) = P (A ∩ B) − P (A)P (B) = P (A)[P (B|A) − P (B)] If P (B|A) > P (B). and so Cov(X. so X = IA and Y = IB . E(N |your ﬁrst pick is door 1) = 1 + EN. but the converse is false. 3 3 EN = 3. For general random variable X and Y . We deﬁne the covariance of (or between) X and Y as Cov(X. for two events A and B. 1 1 EN = (1 + EN ) + (2 + EN ) . the most useful formula is Cov(X. You should immediately note that. i=j Xi ) = i=1 i=1 Var(Xi ) + .4. Y ) = 0. if the events are negatively correlated all inequalities are reversed. Y ) = E((X − EX)(Y − EY )) = E(XY − (EX) · Y − (EY ) · X + EX · EY ) = E(XY ) − EX · EY = E(XY ) − EX · EY − EY · EX + EX · EY To summarize. Variance of sums of random variables Theorem 8. Therefore. Cov(Xi . Y ) = E(XY ) − EX · EY. Xj ). E(N |your ﬁrst pick is door 2) = 2 + EN. EY = P (B).8 MORE ON EXPECTATION AND LIMIT THEOREMS 94 Then E(N |your ﬁrst pick is door 0) = 0. Y ) > 0 means intuitively that. Cov(X. Variance-covariance formula: n n E( Var( Xi ) = i=1 n i=1 n 2 EXi2 + i=j E(Xi Xj ). if X and Y are independent. E(XY ) = E(IA∩B ) = P (A ∩ B). we say the two events are positively correlated and in this case the covariance is positive. Then EX = P (A).” increasing X will result in larger Y . Y be random variables. Let X and Y be indicator random variables.

.13. Example 8. where Ii is the indicator I{ith trial is a success} . i=1 Therefore Ii are independent. Recall that X is the number of people who get their own gift. = i=1 Var(Xi ) + i=j which is equivalent to the formula.5. Example 8.+Xn ) = Var(X1 )+. Linearity of variance for independent summands. especially when they are indicators. The ﬁrst formula follows from writing the sum n 2 n Xi i=1 = i=1 Xi2 + i=j Xi Xj and linearity of expectation. Corollary 8. . We will now compute its expectation and variance. . . p). The variance-covariance formula often makes computing the variance possible even if the random variables are not independent. Then ESn = np. .12. . Xn are independent. If X1 . Matching problem. Let Sn be Binomial(n. and we will compute Var(X). revisited yet again. . +Var(Xn ). X2 . The crucial observation is that Sn = n Ii . . .8 MORE ON EXPECTATION AND LIMIT THEOREMS 95 Proof. then Var(X1 +. and n Var(Sn ) = i=1 n Var(Ii ) (E(Ii ) − E(Ii2 )) = i=1 = n(p − p2 ) = np(1 − p). The second formula follows from the ﬁrst: n n n 2 Var( i=1 Xi ) = E i=1 n Xi − E( Xi ) i=1 2 = E i=1 n (Xi − EXi ) E(Xi − EXi )E(Xj − EXj ).

a large number of times. 10 · 9 36 5 2 5 − 2 5 3 2 =− 5 . . the proportion of times A happens converges to p. EX = EY = 10 = 5 . 10 10 EXY = i=1 j=1 E(Ii Jj ) 1 62 = i=j = = and Cov(X. we promised to attach a precise meaning to the statement: “If you repeat the experiment. i=1 i=1 Then. (In fact. and actually prove the more general statement below. Theorem 8. E(Ii Ij ) is the probability that the i’th person and j’th person both get their own gifts.6. At the beginning of the course.” We will now do so. Roll a die 10 times. and Y = 10 Ji where Ji = I{ith roll is 5} .8 MORE ON EXPECTATION AND LIMIT THEOREMS n i=1 Ii . Compute Cov(X. Weak Law of Large Numbers.14.) Example 8. 18 Weak Law of Large Numbers Assume that an experiment is performed in which the event A happens with probability P (A) = p. Observe that X = 10 Ii where Ii = I{ith roll is 6} . so that EIi = 1 + n E(Ii Ij ) i=j E(X 2 ) = n · = 1+ i=j 1 n(n − 1) = 1+1 = 2. X is for large n very close to Poisson with λ = 1. as should be expected. and E(Ii Jj ) = 62 if i = j (by independence of diﬀerent rolls). Let X be the number of 6 rolled and Y be the number of 5 rolled. 1 n 96 and Recall also that X = where Ii = I{ith person gets own gift} . Therefore. Y ) = negative. We conclude that Var(X) = 1. Moreover E(Ii Jj ) equals 0 when i = j (because both 5 and 6 6 3 1 cannot be rolled on the same roll). 1 and thus equals n(n−1) . Y ). Above. independently.
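The negative covariance in Example 8.14 can be checked empirically. A short sketch (illustrative only; names are mine):

import random

def cov_sixes_fives(trials=200_000, seed=4):
    """Empirical Cov(#6's, #5's) in 10 die rolls; theory: -5/18, about -0.278."""
    rng = random.Random(seed)
    sx = sy = sxy = 0.0
    for _ in range(trials):
        rolls = [rng.randint(1, 6) for _ in range(10)]
        x, y = rolls.count(6), rolls.count(5)
        sx += x; sy += y; sxy += x * y
    n = trials
    return sxy/n - (sx/n) * (sy/n)

print(cov_sixes_fives())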

Weak Law of Large Numbers

Assume that an experiment is performed in which the event A happens with probability P(A) = p. At the beginning of the course, we promised to attach a precise meaning to the statement: "If you repeat the experiment, independently, a large number of times, the proportion of times A happens converges to p." We will now do so, and actually prove the more general statement below.

Theorem 8.6. Weak Law of Large Numbers. If X, X1, X2, ... are independent and identically distributed random variables with finite expectation and variance, then (X1 + ... + Xn)/n converges to EX in the sense that, for every ε > 0,

P(|(X1 + ... + Xn)/n − EX| ≥ ε) → 0,

as n → ∞.

In particular, if Sn = I1 + ... + In is the number of successes in n independent trials, each of which is a success with probability p (here Ii = I{success at trial i}), then, for every ε > 0,

P(|Sn/n − p| ≥ ε) → 0,

as n → ∞. Thus the proportion of successes converges to p in this sense.

Theorem 8.7. Markov Inequality. If X ≥ 0 is a random variable and a > 0, then

P(X ≥ a) ≤ (1/a) EX.

Example 8.15. If EX = 1 and X ≥ 0, it must be that P(X ≥ 10) ≤ 1/10.

Proof. Here is the crucial observation:

I{X ≥ a} ≤ (1/a) X.

Indeed, if X < a, the left-hand side is 0 and the right-hand side is nonnegative, while if X ≥ a, the left-hand side is 1 and the right-hand side is at least 1. Taking expectations of both sides, we get

P(X ≥ a) = E(I{X ≥ a}) ≤ (1/a) EX.

Theorem 8.8. Chebyshev's inequality. If EX = µ and Var(X) = σ² are both finite, and k > 0, then

P(|X − µ| ≥ k) ≤ σ²/k².

Example 8.16. If EX = 1 and Var(X) = 1, then

P(X ≥ 10) ≤ P(|X − 1| ≥ 9) ≤ 1/81.

Example 8.17. If EX = 1 and Var(X) = 0.1, then

P(|X − 1| ≥ 0.5) ≤ 0.1/0.5² = 2/5.

As the previous two examples show, Chebyshev's inequality is useful if either σ is small or k is large.

Proof. By Markov's inequality,

P(|X − µ| ≥ k) = P((X − µ)² ≥ k²) ≤ (1/k²) E(X − µ)² = (1/k²) Var(X).

We are now ready to prove the Weak Law of Large Numbers.

Proof. Denote µ = EX and σ² = Var(X), and let Sn = X1 + ... + Xn. Then ESn = EX1 + ... + EXn = nµ and Var(Sn) = nσ². Therefore, by Chebyshev's inequality,

P(|Sn − nµ| ≥ nε) ≤ nσ²/(n²ε²) = σ²/(nε²) → 0,

as n → ∞.

A careful look through the proof above will show that it remains valid if ε depends on n, but goes to 0 slower than 1/√n. This suggests that (X1 + ... + Xn)/n converges to EX at the rate of about 1/√n. We will make this statement more precise below.

Central Limit Theorem

Theorem 8.9. Central limit theorem. Assume that X, X1, X2, ... are independent, identically distributed random variables, with finite µ = EX and σ² = Var(X). Then

P((X1 + ... + Xn − µ · n)/(σ√n) ≤ x) → P(Z ≤ x),

as n → ∞, where Z is standard Normal.

We will not prove this theorem in detail, but will later give a good indication as to why it holds. You should, however, observe that it is a remarkable theorem: the random variables Xi have an arbitrary distribution (with given expectation and variance), and the theorem says that their sum approximates a very particular distribution, the normal one. Adding many independent copies of a random variable erases all information about its distribution other than expectation and variance!
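Although we do not prove the theorem here, it is easy to watch it in action: standardize a sum of, say, uniform random variables and compare with Φ. A minimal sketch (my own; the summand distribution and parameter choices below are arbitrary):

import math
import random

def clt_check(n=200, trials=100_000, x=1.0, seed=6):
    """P((S_n - n*mu)/(sigma*sqrt(n)) <= x) for uniform summands vs Phi(x)."""
    rng = random.Random(seed)
    mu, sigma = 0.5, math.sqrt(1/12)     # mean and sd of Uniform[0, 1]
    hits = 0
    for _ in range(trials):
        s = sum(rng.random() for _ in range(n))
        if (s - n*mu) / (sigma * math.sqrt(n)) <= x:
            hits += 1
    phi = 0.5 * (1 + math.erf(x / math.sqrt(2)))   # standard Normal CDF
    return hits / trials, phi

print(clt_check())   # both entries about 0.841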

Example 8.18. Assume that the Xn are independent and uniform on [0, 1]. Let Sn = X1 + ... + Xn. (a) Compute approximately P(S200 ≤ 90). (b) Using the approximation, find n so that P(Sn ≥ 50) ≥ 0.99.

We know that EXi = 1/2 and Var(Xi) = 1/3 − (1/2)² = 1/12.

For (a),

P(S200 ≤ 90) = P((S200 − 200 · (1/2))/√(200 · (1/12)) ≤ (90 − 200 · (1/2))/√(200 · (1/12))) ≈ P(Z ≤ −√6) = 1 − P(Z ≤ √6) ≈ 0.007.

For (b), we rewrite

P((Sn − n · (1/2))/√(n · (1/12)) ≥ (50 − n · (1/2))/√(n · (1/12))) = 0.99,

and then approximate

P(Z ≥ (50 − n/2)/√(n/12)) = 0.99, so that Φ((n/2 − 50)/√(n/12)) = 0.99.

Using the fact that Φ(z) = 0.99 for (approximately) z = 2.326, we get that

n − 1.345√n − 100 = 0,

and that n = 115.

Example 8.19. A casino charges $1 for entrance. For promotion, they offer to the first 30,000 "guests" the following game. Roll a fair die:

• if you roll 6, you get free entrance and $2;
• if you roll 5, you get free entrance;
• otherwise, you pay the normal fee.

Compute the number s so that the revenue loss is at most s with probability 0.9.

In symbols, if L is the lost revenue, we need to find s so that

P(L ≤ s) = 0.9.

We have L = X1 + · · · + Xn, where n = 30000, the Xi are independent, and

P(Xi = 0) = 4/6, P(Xi = 1) = 1/6, P(Xi = 3) = 1/6.

Therefore,

EXi = 1/6 + 3/6 = 2/3

and

Var(Xi) = (1²/6 + 3²/6) − (2/3)² = 11/9.

Therefore,

P(L ≤ s) = P((L − (2/3)n)/√(n · (11/9)) ≤ (s − (2/3)n)/√(n · (11/9))) ≈ P(Z ≤ (s − (2/3)n)/√(n · (11/9))) = 0.9,

which gives

(s − (2/3)n)/√(n · (11/9)) ≈ 1.28,

and finally,

s = (2/3)n + 1.28 √(n · (11/9)) ≈ 20245.

Problems

1. An urn contains 2 white and 4 black balls. Make 3 draws out of the urn in succession, without replacement. Let X be the total number of white balls drawn, and Y the draw of the first white ball. For example, if the draws are white, black, black, then X = 1, Y = 1; if the draws are black, white, black, then X = 1, Y = 2. Compute E(XY).

2. The joint density of (X, Y) is given by

f(x, y) = 3x if 0 ≤ y ≤ x ≤ 1, and 0 otherwise.

Compute Cov(X, Y).

3. Five married couples are seated at random in a row of 10 seats. (a) Compute the expected number of women that sit next to their husbands. (b) Compute the expected number of women that sit next to at least one man.

4. There are 20 birds that sit in a row on a wire. Each bird looks left or right with equal probability. Let N be the number of birds not seen by any neighboring bird. Compute EN.

5. Roll a fair die 24 times. Compute, using a relevant approximation, the probability that the sum of numbers exceeds 100.

Solutions to problems

1. We have (with Y = 0 when no white ball is drawn)

P(X = 1, Y = 1) = P(wbb) = 1/5,
P(X = 1, Y = 2) = P(bwb) = 1/5,
P(X = 1, Y = 3) = P(bbw) = 1/5,
P(X = 2, Y = 1) = P(wbw or wwb) = 2/15,
P(X = 2, Y = 2) = P(bww) = 1/15,
P(X = 0) = P(bbb) = 1/5, in which case XY = 0,

so that

E(XY) = 1 · (1/5) + 2 · (1/5) + 3 · (1/5) + 2 · (2/15) + 4 · (1/15) = 6/5 + 8/15 = 26/15.

2. As fX(x) = 3x² for x ∈ [0, 1],

EX = ∫₀¹ x · 3x² dx = 3/4,
EY = ∫₀¹ dx ∫₀^x y · 3x dy = ∫₀¹ (3/2) x³ dx = 3/8,
E(XY) = ∫₀¹ dx ∫₀^x xy · 3x dy = ∫₀¹ (3/2) x⁴ dx = 3/10,

so that

Cov(X, Y) = 3/10 − (3/4) · (3/8) = 3/160.

3. (a) Let the number be M. Let Ii = I{couple i sits together}. Then

EIi = (9! · 2!)/10! = 1/5,

and so EM = EI1 + ... + EI5 = 5 EI1 = 1.

(b) Let the number be N. Let Ii = I{woman i sits next to a man}. Then, by dividing into cases, whereby the woman either sits on one of the two end chairs or on one of the eight middle chairs,

EIi = (2/10) · (5/9) + (8/10) · (1 − (4/9) · (3/8)) = 7/9,

and so EN = EI1 + ... + EI5 = 5 EI1 = 35/9.

4. For the two birds at either end, the probability that a bird is not seen is 1/2, while for any other bird this probability is 1/4. By the indicator trick,

EN = 2 · (1/2) + 18 · (1/4) = 11/2.

5. Let X1, X2, ... be the numbers on successive rolls and Sn = X1 + ... + Xn the sum. We know that EXi = 7/2 and Var(Xi) = 35/12. So we have

P(S24 ≥ 100) = P((S24 − 24 · (7/2))/√(24 · (35/12)) ≥ (100 − 24 · (7/2))/√(24 · (35/12))) ≈ P(Z ≥ 1.91) = 1 − Φ(1.91) ≈ 0.028.

Interlude: Practice Final

This practice exam covers the material from chapters 1 through 8. Give yourself 120 minutes to solve the six problems, which you may assume have equal point score.

1. Recall that a full deck of cards contains 13 cards of each of the four suits (♣, ♥, ♦, ♠). Select cards from the deck at random, one by one, without replacement. (a) Compute the probability that the first four cards selected are all hearts (♥). (b) Compute the probability that all suits are represented among the first four cards selected. (c) Compute the expected number of different suits among the first four cards selected. (d) Compute the expected number of cards you have to select to get the first hearts card.

2. Eleven Scandinavians: 2 Swedes, 4 Norwegians, and 5 Finns, are seated on a row of 11 chairs at random. (a) Compute the probability that all groups sit together (i.e., Swedes occupy adjacent seats, and so do Norwegians and Finns). (b) Compute the probability that at least one of the groups sits together. (c) Compute the probability that the two Swedes have exactly one person sitting between them.

3. You have two fair coins. Toss the first coin three times and let X be the number of heads. Then toss the second coin X times, that is, as many times as you got heads on the first coin, and let Y be the number of heads on the second coin. (For example, if X = 0, Y is automatically 0; if X = 2, toss the second coin twice and count the number of heads to get Y.) (a) Determine the joint probability mass function of X and Y, that is, write down a formula for P(X = i, Y = j) for all relevant i and j. (b) Compute P(X ≥ 2 | Y = 1).

4. Assume that 2,000,000 single male tourists visit Las Vegas every year. Assume also that each of these tourists independently gets married while drunk with probability 1/1,000,000. (a) Write down the exact probability that exactly 3 male tourists will get married while drunk next year. (b) Compute the expected number of such drunk marriages in the next 10 years. (c) Write down a relevant approximate expression for the probability in (a). (d) Write down an approximate expression for the probability that there will be no such drunk marriages during at least one of the next 3 years.

5. Toss a fair coin twice. You win $2 if both tosses come out heads, lose $1 if no toss comes out heads, and win or lose nothing otherwise. (a) What is the expected number of games you need to play to win once?

(b) Assume that you play this game 500 times. What is, approximately, the probability that you win at least $135? (c) Again, assume that you play this game 500 times. Compute (approximately) the amount of money x such that your winnings will be at least x with probability 0.5. Then do the same with probability 0.9.

6. Two random variables X and Y are independent and have the same probability density function

g(x) = c(1 + x) for x ∈ [0, 1], and 0 otherwise.

(a) Find the value of c. (Here and in (b), use ∫₀¹ xⁿ dx = 1/(n + 1), for n > −1.)

(b) Find Var(X + Y).

(c) Find P(X + Y < 1) and P(X + Y ≤ 1). (Here and in (d), when you get to a single integral involving powers, stop.)

(d) Find E|X − Y|.

Solutions to Practice Final

1. Recall that a full deck of cards contains 13 cards of each of the four suits (♣, ♥, ♦, ♠). Select cards from the deck at random, one by one, without replacement.

(a) Compute the probability that the first four cards selected are all hearts (♥).

Solution: C(13, 4)/C(52, 4).

(b) Compute the probability that all suits are represented among the first four cards selected.

Solution: 13⁴/C(52, 4).

(c) Compute the expected number of different suits among the first four cards selected.

Solution: If X is the number of suits represented, then X = I♥ + I♦ + I♣ + I♠, where I♥ = I{♥ is represented}, etc. Then

EI♥ = 1 − C(39, 4)/C(52, 4),

which is the same for the other three indicators, so

EX = 4 EI♥ = 4 (1 − C(39, 4)/C(52, 4)).

(d) Compute the expected number of cards you have to select to get the first hearts card.

Solution: Label the non-♥ cards 1, ..., 39, and let Ii = I{card i selected before any ♥ card}. Then EIi = 1/14 for any i. If N is the number of cards you have to select to get the first hearts card, then

EN = E(I1 + · · · + I39 + 1) = 39/14 + 1 = 53/14.

2. Eleven Scandinavians: 2 Swedes, 4 Norwegians, and 5 Finns, are seated on a row of 11 chairs at random.

(a) Compute the probability that all groups sit together (i.e., Swedes occupy adjacent seats, and so do Norwegians and Finns).

Solution: (2! · 4! · 5! · 3!)/11!.

(b) Compute the probability that at least one of the groups sits together.

Solution: Define AS = {Swedes sit together}, and similarly AN and AF. Then, by inclusion-exclusion,

P(AS ∪ AN ∪ AF) = P(AS) + P(AN) + P(AF) − P(AS ∩ AN) − P(AS ∩ AF) − P(AN ∩ AF) + P(AS ∩ AN ∩ AF)
= (2! · 10! + 4! · 8! + 5! · 7! − 2! · 4! · 7! − 2! · 5! · 6! − 4! · 5! · 3! + 2! · 4! · 5! · 3!)/11!.

(c) Compute the probability that the two Swedes have exactly one person sitting between them.

Solution: The two Swedes may occupy chairs 1 and 3, or 2 and 4, ..., or 9 and 11. These are exactly 9 possibilities out of the C(11, 2) = 55 equally likely pairs of chairs, so the answer is 9/55.

3. You have two fair coins. Toss the first coin three times and let X be the number of heads. Then toss the second coin X times, that is, as many times as you got heads on the first coin, and let Y be the number of heads on the second coin.

(a) Determine the joint probability mass function of X and Y, that is, write down a formula for P(X = i, Y = j) for all relevant i and j.

Solution:

P(X = i, Y = j) = P(X = i) P(Y = j | X = i) = C(3, i) (1/2³) · C(i, j) (1/2^i),

for 0 ≤ j ≤ i ≤ 3.

(b) Compute P(X ≥ 2 | Y = 1).

Solution: This equals

[P(X = 2, Y = 1) + P(X = 3, Y = 1)] / [P(X = 1, Y = 1) + P(X = 2, Y = 1) + P(X = 3, Y = 1)] = 5/9.

4. Assume that 2,000,000 single male tourists visit Las Vegas every year, and that each of these tourists independently gets married while drunk with probability 1/1,000,000.

(a) Write down the exact probability that exactly 3 male tourists will get married while drunk next year.

Solution: With X equal to the number of such drunken marriages, X is Binomial(n, p), where p = 1/1,000,000 and n = 2,000,000, so we have

P(X = 3) = C(n, 3) p³ (1 − p)^{n−3}.

(b) Compute the expected number of such drunk marriages in the next 10 years.

Solution: As X is binomial, its expected value is np = 2, so the answer is 10 · EX = 20.

(c) Write down a relevant approximate expression for the probability in (a).

Solution: We use that X is approximately Poisson with λ = 2, so that the answer is

(λ³/3!) e^{−λ} = (4/3) e^{−2}.

(d) Write down an approximate expression for the probability that there will be no such drunk marriages during at least one of the next 3 years.

Solution: This equals 1 − P(at least one such marriage in each of the next 3 years), which equals

1 − (1 − e^{−2})³.

5. Toss a fair coin twice. You win $2 if both tosses come out heads, lose $1 if no toss comes out heads, and win or lose nothing otherwise.

(a) What is the expected number of games you need to play to win once?

Solution: The probability of winning a single game is 1/4, so the answer is 4, the expectation of a Geometric(1/4) random variable.

(b) Assume that you play this game 500 times. What is, approximately, the probability that you win at least $135?

Solution: Let X be the winnings in one game, and X1, X2, ..., Xn the winnings in successive games, with Sn = X1 + ... + Xn. Then we have

EX = 2 · (1/4) − 1 · (1/4) = 1/4

and

Var(X) = 4 · (1/4) + 1 · (1/4) − (1/4)² = 19/16.

Thus, using n = 500,

P(Sn ≥ 135) = P((Sn − n · (1/4))/√(n · (19/16)) ≥ (135 − n · (1/4))/√(n · (19/16))) ≈ P(Z ≥ (135 − 125)/√(500 · (19/16))) = 1 − Φ(0.41) ≈ 0.34,

where Z is standard Normal.

(c) Again, assume that you play this game 500 times. Compute (approximately) the amount of money x such that your winnings will be at least x with probability 0.5. Then do the same with probability 0.9.

Solution: For probability 0.5, the answer is exactly ESn = n · (1/4) = 125. For probability 0.9, we approximate

P(Sn ≥ x) ≈ P(Z ≥ (x − 125)/√(500 · (19/16))) = P(Z ≤ (125 − x)/√(500 · (19/16))) = Φ((125 − x)/√(500 · (19/16))),

where we have used that x < 125. Then we use that Φ(z) = 0.9 at z ≈ 1.28, leading to the equation

(125 − x)/√(500 · (19/16)) = 1.28,

and therefore

x = 125 − 1.28 · √(500 · (19/16)) ≈ 93.8.

6. Two random variables X and Y are independent and have the same probability density function

g(x) = c(1 + x) for x ∈ [0, 1], and 0 otherwise.

(a) Find the value of c.

Solution: As

1 = c ∫₀¹ (1 + x) dx = c · (3/2),

we have c = 2/3.

(b) Find Var(X + Y).

Solution:

By independence, this equals 2 Var(X) = 2(E(X²) − (EX)²). Moreover,

EX = (2/3) ∫₀¹ x(1 + x) dx = (2/3)(1/2 + 1/3) = 5/9,
E(X²) = (2/3) ∫₀¹ x²(1 + x) dx = (2/3)(1/3 + 1/4) = 7/18,

and the answer is 2(7/18 − (5/9)²) = 13/81.

(c) Find P(X + Y < 1) and P(X + Y ≤ 1).

Solution: The two probabilities are both equal to

(2/3)² ∫₀¹ dx ∫₀^{1−x} (1 + x)(1 + y) dy = (4/9) ∫₀¹ (1 + x)[(1 − x) + (1 − x)²/2] dx.

(d) Find E|X − Y|.

Solution: This equals

2 · (2/3)² ∫₀¹ dx ∫₀^x (x − y)(1 + x)(1 + y) dy = (8/9) ∫₀¹ (1 + x)[x(x + x²/2) − (x²/2 + x³/3)] dx,

and we stop here, per the instructions.
