
Lecture 7 - Content

• Sets

• Probability and counting

• Conditional probability

Many students are not familiar with set theory, so spend some more time explaining
the various operations carefully. If time is left, spend some time talking about the
cardinality of N, Q and R.

Statistics (Advanced): Lecture 7 1

Sets

Before we look at probability it is necessary to understand sets because probabilities

are typically described in terms of sets where an ‘event’ occurs.

Deﬁnition 1. The set of all possible outcomes of an experiment is called a sample

space, denoted by Ω. Any subset A of the sample space Ω, denoted by A ⊂ Ω is

called an event.

Deﬁnition 2. The counting operator N(A) is a set function that counts how many

elements belong to the set (event) A.

Example (Sample spaces).

Coin: Ω = {H, T} ⇒ N(Ω) = 2.

Dice: Ω = {1, 2, 3, 4, 5, 6} ⇒ N({1, 2, 5}) = 3.

Weight: Ω = R⁺ ⇒ N(R⁺) = ∞.

(We want to count objects for any event.)

Statistics (Advanced): Lecture 7 2

Set Notation

Before we introduce probability we need to introduce some notation. Let A, B ⊂ Ω.

symbol         set theory                probability
Ω              largest set               certain event
∅              empty set                 impossible event
A ∪ B          union of A and B          event A or event B
A ∩ B          intersection of A and B   event A and event B
A^C = Ω \ A    complement of A           not event A

Statistics (Advanced): Lecture 7 3

Intersection Operator

The set A ∩ B denotes the set such that if C ∈ A ∩ B then C ∈ A and C ∈ B

(∩ is called the intersection operator).

Statistics (Advanced): Lecture 7 4

Intersection Operator

Examples:

• {1, 2} ∩ {red, white} = ∅.

• {1, 2, green} ∩ {red, white, green} = {green}.

• {1, 2} ∩ {1, 2} = {1, 2}.

Some basic properties of intersections:

• A ∩ B = B ∩ A.

• A ∩ (B ∩ C) = (A ∩ B) ∩ C.

• A ∩ B ⊆ A.

• A ∩ A = A.

• A ∩ ∅ = ∅.

• A ⊆ B if and only if A ∩ B = A.

Statistics (Advanced): Lecture 7 5

Union Operator

The set A∪B denotes the set such that if C ∈ A∪B then C ∈ A and/or C ∈ B

(∪ is called the union operator).

Statistics (Advanced): Lecture 7 6

Union Operator

Examples:

• {1, 2} ∪ {red, white} = {1, 2, red, white}.

• {1, 2, green} ∪ {red, white, green} = {1, 2, red, white, green}.

• {1, 2} ∪ {1, 2} = {1, 2}.

Some basic properties of unions:

• A ∪ B = B ∪ A.

• A ∪ (B ∪ C) = (A ∪ B) ∪ C.

• A ⊆ (A ∪ B).

• A ∪ A = A.

• A ∪ ∅ = A.

• A ⊆ B if and only if A ∪ B = B.
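As a brief aside (not in the original slides), R treats vectors as sets via its built-in
union(), intersect() and setdiff() functions, so these identities can be tried directly;
a minimal sketch:

A <- c("1", "2", "green")
B <- c("red", "white", "green")
union(A, B)                          # "1" "2" "green" "red" "white"
intersect(A, B)                      # "green"
setdiff(A, B)                        # A \ B: "1" "2"
setequal(union(A, B), union(B, A))   # TRUE: commutativity of the union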

Statistics (Advanced): Lecture 7 7

Set Minus

The set A \ B denotes the set such that if C ∈ A \ B then C ∈ A and C ∉ B.

Statistics (Advanced): Lecture 7 8

Set Complement

The set A^c = Ω \ A denotes the set such that if C ∈ A^c then C ∉ A.

Statistics (Advanced): Lecture 7 9

Set Minus

Examples:

• {1, 2} \ {red, white} = {1, 2}.

• {1, 2, green} \ {red, white, green} = {1, 2}.

• {1, 2} \ {1, 2} = ∅.

• {1, 2, 3, 4} \ {1, 3} = {2, 4}.

Some basic properties of complements:

• A \ B ≠ B \ A (in general).

• A ∪ A^c = Ω.

• A ∩ A^c = ∅.

• (A^c)^c = A.

• A \ A = ∅.

Statistics (Advanced): Lecture 7 10

Theorem 1. The complement of the union of A and B equals the intersection of
the complements,

    (A ∪ B)^C = A^C ∩ B^C.

Proof. Use Venn diagrams for LHS and RHS and colour areas.

Theorem 2 (de Morgan's Laws).

    ( ⋃_{i=1}^n A_i )^c = ⋂_{i=1}^n A_i^c   and   ( ⋂_{i=1}^n A_i )^c = ⋃_{i=1}^n A_i^c.
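These laws are easy to verify numerically in R for a small universe (an illustrative
aside, not in the original slides):

Omega <- 1:10
A <- list(c(1, 2, 3), c(2, 4, 6), c(3, 6, 9))
comp <- function(S) setdiff(Omega, S)       # complement within Omega
lhs <- comp(Reduce(union, A))               # (A1 u A2 u A3)^c
rhs <- Reduce(intersect, lapply(A, comp))   # A1^c n A2^c n A3^c
setequal(lhs, rhs)                          # TRUE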

Statistics (Advanced): Lecture 7 11

Counting – Ordered Sampling without replacement

Example (Ordered samples without replacement). The number of ordered samples

of size r we can draw without replacement from n objects is,

    n · (n − 1) · · · (n − r + 1) = n! / (n − r)!

Recall: 0! = 1.
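A quick numerical check in R (an illustrative aside, not in the original slides):

n <- 5; r <- 3
prod(n:(n - r + 1))               # 5*4*3 = 60 ordered samples
factorial(n) / factorial(n - r)   # the same value via n!/(n-r)!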

Statistics (Advanced): Lecture 7 12

Counting – Unordered Sampling without replacement

Example (Unordered samples without replacement).

    nCr = (n choose r) = n! / (r!(n − r)!).

Recall,

    nCr = nC(n−r)   since   (n choose n−r) = n! / ((n − r)!(n − (n − r))!) = (n choose r),

and so

    (n choose 0) = (n choose n) = 1.
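In R, choose() gives binomial coefficients directly (an illustrative check, not in the
original slides):

choose(10, 3)                    # 120
choose(10, 3) == choose(10, 7)   # TRUE: the symmetry nCr = nC(n-r)
c(choose(10, 0), choose(10, 10)) # both 1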

Statistics (Advanced): Lecture 7 13

Sampling without replacement (cont)

Example. Consider {A, B, C} and select r = 2 items:

    {A, B}  {A, C}  {B, C}
    {B, A}  {C, A}  {C, B}     ← order important: 3 · 2 = 6 samples

    (3 choose 2) = 3 samples   ← order not important

Statistics (Advanced): Lecture 7 14

Sampling in R

# Creating ordered lists

n = 158;

x = 1:n;

set.seed(6) # set random seed to 6 to reproduce results

sample(x) # random permutation of nos 1,2,...,158: n! possibilities

sample(x,10) # choose 10 numbers without replacement

sample(x,10,TRUE) # choose 10 numbers with replacement = bootstrap sampling

Statistics (Advanced): Lecture 7 15

What is Probability?

1. Subjective probability expresses the strength of one’s belief (the basis of Bayesian

Statistics – a bit on that later).

2. Classical probability concept, mathematical answer for equally likely outcomes.

Theorem 3. If there are n equally likely possibilities of which one must occur

and s are regarded as favourable ( = successes), then the probability P of a

success is given by s/n.

Statistics (Advanced): Lecture 7 16

What is Probability?

3. The frequency interpretation of probability:

Theorem 4. The probability of an event (or outcome) is the proportion of times
the event occurs in a long run of repeated experiments,

or in words:

If an experiment is repeated n times under identical conditions, and if
the event A occurs m times, then as n becomes large (i.e. in the long run)
the probability of A occurring is approximated by the ratio m/n.

Statistics (Advanced): Lecture 7 17

What is Probability?

• The constancy of the gender ratio at birth. In Australia, the proportion of male
  births is fairly stable at 0.51. This long run relative frequency is used to estimate
  the probability that a randomly chosen birth is male.

• Cancer council records show the age standardised mortality rate from breast
  cancer in NSW was close to 20 per 100,000 over the years 1972–2000. For a
  randomly chosen woman, we use 0.0002 as the probability of breast cancer.

Example (Coin tossing).

Buffon (1707–1788): n = 4,040 ⇒ P({H}) = 50.7%.

Pearson (1857–1936): n = 24,000 ⇒ P({H}) = 50.05%.

Coin Tossing in R

table(sample(c("H","T"),4040,T))/4040

table(sample(c("H","T"),24000,T))/24000

Statistics (Advanced): Lecture 7 18

Coin Tossing 2010’s

In the 2010s Stanford Professor Persi Diaconis developed the "Coin Tosser 3000".
However, the machine is designed to flip a coin with the same result every time!

Statistics (Advanced): Lecture 7 19

What is Probability?

4. Mathematical formulation of probability

Definition 3 (due to Andrey Kolmogorov, 1933). Given a sample space Ω and an
event A ⊂ Ω, we define P(A), the probability of A, to be a value of a non-negative
additive set function that satisfies the following three axioms:

A1: For any event A, 0 ≤ P(A) ≤ 1,

A2: P(Ω) = 1,

A3: If A and B are mutually exclusive events (A ∩ B = ∅), then

    P(A ∪ B) = P(A) + P(B).

A3': If A₁, A₂, A₃, . . . is a finite or infinite sequence of mutually exclusive events
in Ω, then

    P(A₁ ∪ A₂ ∪ A₃ ∪ . . .) = P(A₁) + P(A₂) + P(A₃) + . . . .

Statistics (Advanced): Lecture 7 20

Example (Lotto). A lotto type barrel contains 10 balls numbered 1, 2, . . . , 10. Three
balls are drawn.

i. How many distinct samples can be drawn?

    n = 10C3 = (10 choose 3) = (10 · 9 · 8)/(1 · 2 · 3) = 120.

ii. Event A = (all drawn numbers are at most seven, i.e. from {1, 2, . . . , 7}).

    (7 choose 3) = (7 · 6 · 5)/(1 · 2 · 3) = 35 'successes' ⇒ P(A) = 35/120 = 7/24.

iii. B = (all drawn numbers are even): P(B) = (5 choose 3)/120 = 10/120 = 1/12.

    Only the sample {2, 4, 6} lies in A ∩ B ⇒ P(A ∩ B) = 1/120.

iv. P(A ∪ B)? To answer this we need our next theorem.
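These counts are easy to verify in R (an illustrative aside, not in the original slides):

choose(10, 3)                  # 120 distinct samples
choose(7, 3) / choose(10, 3)   # P(A) = 35/120 = 7/24
choose(5, 3) / choose(10, 3)   # P(B) = 10/120 = 1/12
1 / choose(10, 3)              # P(A n B): only {2,4,6} qualifies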

Statistics (Advanced): Lecture 7 21

Addition Theorem

Theorem 5 (Addition Theorem). If A and B are any events in Ω, then

P(A ∪ B) = P(A) + P(B) −P(A ∩ B)

Proof. Use Venn diagrams, i.e. draw two overlapping circles and colour regions.

Alternatively use the axioms only. First note that

    P(A) = P((A \ (A ∩ B)) ∪ (A ∩ B))
         = P(A \ (A ∩ B)) + P(A ∩ B)      (by A3).   (1)

Similarly, P(B) = P(B \ (A ∩ B)) + P(A ∩ B). Next,

    P(A ∪ B) = P((A \ (A ∩ B)) ∪ (B \ (A ∩ B)) ∪ (A ∩ B))
             = P(A \ (A ∩ B)) + P(B \ (A ∩ B)) + P(A ∩ B)      (by A3)
             = [P(A \ (A ∩ B)) + P(A ∩ B)]
               + [P(B \ (A ∩ B)) + P(A ∩ B)] − P(A ∩ B)
             = P(A) + P(B) − P(A ∩ B),

which follows from the result (1).

Statistics (Advanced): Lecture 7 22

Example (Lotto). A lotto type barrel contains 10 balls numbered 1, 2, . . . , 10. Three
balls are drawn.

i. How many distinct samples can be drawn? 120.

ii. Event A = (all drawn numbers are at most seven). P(A) = 7/24.

iii. B = (all drawn numbers are even): P(B) = 1/12.

    Also P(A ∩ B) = 1/120.

iv. P(A ∪ B)?

    P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 7/24 + 1/12 − 1/120 = 44/120 = 11/30.

Statistics (Advanced): Lecture 7 23

Poincaré's Theorem

Theorem 6 (Poincaré's formula, not part of M1905). Let A₁, A₂, . . . , Aₙ be any
events in Ω. Then,

    P( ⋃_{i=1}^n A_i ) = Σ_{i=1}^n P(A_i) − Σ_{i<j} P(A_i ∩ A_j) + Σ_{i<j<k} P(A_i ∩ A_j ∩ A_k) + . . .
                         + (−1)^(n−1) P( ⋂_{i=1}^n A_i ).

Statistics (Advanced): Lecture 7 24

(Unconditional) probability

• Recall the 3 axioms of probability.

• P(A^C) = 1 − P(A): since A ∩ A^C = ∅ we have, by A3,
  1 = P(Ω) = P(A ∪ A^C) = P(A) + P(A^C).

• P(∅) = 0 because ∅ = Ω^C, hence P(∅) = 1 − P(Ω) = 0.

• etc.

Statistics (Advanced): Lecture 7 25

Conditional Probability – Motivating Example

Consider the following (fictional) table of Sports Mortality Rates compiled over the
last decade:

SPORT          Description                            DEATHS
Chess          Board Game considered the
               national sport of Russia                    0
Boxing         Barbaric Sport where two
               people hit each other                       5
Chess Boxing   5 minutes of Chess followed
               by 2 minutes of Boxing                      0
Sky Diving     Jumping out of a plane
               with a parachute                           10
Lawn Bowls     Rolling a Ball across
               grass to hit other balls                 1000

Statistics (Advanced): Lecture 7 30

Hence, Lawn Bowls is the most dangerous sport by far!

Statistics (Advanced): Lecture 7 31

Conditional Probability – Motivating Example

However, the number of deaths given that the "sportsperson is young" is zero, so
that

    P("Dying from Lawn Bowls" | "sportsperson is young") ≈ 0

even though

    P("Dying from Lawn Bowls") is large.

Statistics (Advanced): Lecture 7 32

Conditional Probability – Another Motivating Example

What is the probability of the important event

A = (starting salary after uni ≥ 60k)?

What is the sample space Ω?

Possibilities are:

Ω₁ = {all students},

Ω₂ = {all male students},

Ω₃ = {all students with a maths degree}.

Statistics (Advanced): Lecture 7 33

Conclusion

• Probability depends on the underlying sample space Ω!

• Hence, if it is unclear to what sample space A refers then make it clear by
  writing

      P(A|Ω) instead of P(A),

  which we read as the conditional probability of A relative to Ω or given Ω,
  respectively.

Definition 4. If A and B are any events in Ω and P(B) ≠ 0 then the conditional
probability of A given B is

    P(A|B) = P(A ∩ B) / P(B).
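A small simulation in R illustrates the definition (an aside, not in the original slides):

set.seed(1)
x <- sample(1:6, 100000, replace = TRUE)   # 100,000 dice rolls
A <- (x == 6); B <- (x %% 2 == 0)          # A = "roll a six", B = "roll is even"
mean(A & B) / mean(B)                      # estimates P(A|B) = (1/6)/(1/2) = 1/3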

Statistics (Advanced): Lecture 7 34

Additional material for Lecture 7

A combinatorial proof of the binomial theorem

The binomial theorem says

    (x + y)^n = Σ_{k=0}^n (n choose k) x^k y^(n−k).

Consider the more complicated product

    (x₁ + y₁)(x₂ + y₂) · · · (xₙ + yₙ).

Its expansion consists of the sum of 2^n terms, each term being the product of n
factors. Each term contains either xₖ or yₖ, for each k = 1, . . . , n. For example,

    (x₁ + y₁)(x₂ + y₂) = x₁x₂ + x₁y₂ + y₁x₂ + y₁y₂.

Now, there is 1 = (n choose 0) term with y terms only, n = (n choose 1) terms with
one x term and (n − 1) y terms, etc. In general, there are (n choose k) terms with
exactly k x's and (n − k) y's. The theorem follows by letting xₖ = x and yₖ = y.

Statistics (Advanced): Lecture 7 35

More on set theory

The operations of forming unions, intersections and complements of events obey
rules similar to the rules of algebra. Some examples, for events A, B and C:

Commutative law: A ∪ B = B ∪ A and A ∩ B = B ∩ A.

Associative law: (A ∪ B) ∪ C = A ∪ (B ∪ C) and (A ∩ B) ∩ C = A ∩ (B ∩ C).

Distributive law: (A ∪ B) ∩ C = (A ∩ C) ∪ (B ∩ C) and (A ∩ B) ∪ C =
(A ∪ C) ∩ (B ∪ C).

Statistics (Advanced): Lecture 7 36

extra page

Statistics (Advanced): Lecture 7 37

Tuesday, 16 August 2011

Lecture 8 - Content

• Conditional probability

• Bayes rule

• Integer valued random variables

Conditional probability equation

    P(A|Ω) = P(A ∩ Ω)/P(Ω) = P(A)  ⇒  for P(B) > 0:  P(A|B) = P(A ∩ B)/P(B)

Statistics (Advanced): Lecture 8 38

Conditional probability (cont)

Example (Defect machine parts). Suppose that 500 machine parts are inspected

before they are shipped.

• I = (a machine part is improperly assembled)

• D = (a machine part contains one or more defective components)

N(Ω) = 500, N(I) = 30, N(D) = 15, N(I ∩ D) = 10.

Statistics (Advanced): Lecture 8 39

Example (cont)

Assumption: equal probabilities in the selection of one of the machine parts.

⇒ Using the classical concept of probability we get:

    P(D) = P(D|Ω) = N(D)/N(Ω) = 15/500 = 3/100,

    P(D|I) = N(D ∩ I)/N(I) = 10/30 = 1/3 > 3/100;

note that if N(Ω) > 0, then

    P(D|I) = [N(D ∩ I)/N(Ω)] / [N(I)/N(Ω)] = P(D ∩ I)/P(I).

Statistics (Advanced): Lecture 8 40

General multiplication rule of probability

Theorem 7 (General multiplication rule of probability). If A and B are any events
in Ω, then

    P(A ∩ B) = P(B) · P(A|B), if P(B) ≠ 0;   interchanging A and B yields
             = P(A) · P(B|A), if P(A) ≠ 0.

Proof. This holds because,

    P(A|B) := P(A ∩ B) / P(B), etc.

What happens if P(A|B) = P(A)?

⇒ the additional information of B is of no use ⇒ special multiplication rule!

    P(A ∩ B) = P(A) · P(B).

Statistics (Advanced): Lecture 8 41

Deﬁnition of independence of events

Definition 5. If A and B are any two events in a sample space Ω, we say that A
is independent of B if and only if P(A|B) = P(A).

From the general multiplication rule it follows that if P(A|B) = P(A) then P(B|A) =
P(B), and we say simply that A and B are independent.

Statistics (Advanced): Lecture 8 42

Alternative View of Independence

Alternatively, if A and B are independent then P(A ∩ B) = P(A) · P(B) and
hence,

    P(B|A) = P(A ∩ B) / P(A)        (definition of conditional probability)
           = P(A) · P(B) / P(A)     (using independence)
           = P(B),

which can also be interpreted as saying that knowing A does not affect the
probability of B.

Statistics (Advanced): Lecture 8 43

Independence

In other words the events A and B are independent if the chance that one happens

remains the same regardless of how the other turns out.

Example. Suppose that we toss a fair coin twice. Let

    A = {heads on the first toss}   and   B = {heads on the second toss}.

Now suppose A occurred. Then

    P({B knowing A has happened}) = 1/2.

Statistics (Advanced): Lecture 8 44

Independence – Example 2

Example. Consider the following 6 boxes, the first three of which are green:

    1 2 3   1 2 3

Suppose we select a box at random; as it is drawn you see that it is green. Then

    P(A = {getting a "2"}) = 2/6 = 1/3,

    P(B = {getting a "2" if I know it is green}) = 1/3.

Knowing the selected box is green has not changed our knowledge about which
numbers might be drawn.

Hence, the events A and B are independent.

Statistics (Advanced): Lecture 8 45

Independence – Example 3

Example. Consider the following 6 boxes, the first three of which are green:

    1 1 2   1 2 2

Suppose we select a box at random; as it is drawn you see that it is green. Then

    P(A = {getting a "2"}) = 3/6 = 1/2,

    P(B = {getting a "2" if I know it is green}) = 1/3.

Knowing the selected box is green HAS CHANGED our knowledge about which
numbers might be drawn.

Hence, the events A and B are NOT independent.

Statistics (Advanced): Lecture 8 46

Independence – Example 4

Example. Two cards are drawn at random from an ordinary deck of 52 playing
cards. What is the probability of getting two aces if

(a) the first card is replaced before the second is drawn?

    (Solution: 4/52 · 4/52 = 1/169, since here P(A₁ ∩ A₂) = P(A₁) · P(A₂).)

(b) the first card is not replaced before the second card is drawn?

    (Solution: 4/52 · 3/51 = 1/221, but unlike above P(A₂|A₁) ≠ P(A₂).)

⇒ Independence is violated when the sampling is without replacement.

Statistics (Advanced): Lecture 8 47

Independence – Example 5

Medical records indicate that the proportion of children who have had measles by the

age of 8 is 0.4. The corresponding proportion for chicken pox is 0.5. The proportion

who have had both diseases by the age of 8 is 0.3. An infant is randomly selected.

Let A represent the event that he contracts measles, and B that he contracts chicken

pox, by the age of 8 years.

• Estimate P(A), P(B) and P(A ∩ B).

  P(A) = 0.4, P(B) = 0.5 and P(A ∩ B) = 0.3.

• Are A and B independent?

  P(A) · P(B) = 0.2 ≠ P(A ∩ B) = 0.3, so NO, A and B are not independent.

Statistics (Advanced): Lecture 8 48

Bayes rule

Example (The burgers are better...). Assume you get your burgers

• 60% from supplier B₁ [HJ]

• 30% from supplier B₂ [McD]

• 10% from supplier B₃ [RR]

⇒ P(B₁) = 0.6, P(B₂) = 0.3, and P(B₃) = 0.1.

Interested in the event A = (good burger).

[Draw a picture that shows that B₁, B₂, and B₃ are mutually exclusive events with
B₁ ∪ B₂ ∪ B₃ = Ω.]

Statistics (Advanced): Lecture 8 49

Example (cont)

It follows that,

    A = A ∩ (B₁ ∪ B₂ ∪ B₃) = (A ∩ B₁) ∪ (A ∩ B₂) ∪ (A ∩ B₃).

Note that (A ∩ B₁), (A ∩ B₂) and (A ∩ B₃) are mutually exclusive.

By Axiom 3 we get

    P(A) = P((A ∩ B₁) ∪ (A ∩ B₂) ∪ (A ∩ B₃))
         = P(A ∩ B₁) + P(A ∩ B₂) + P(A ∩ B₃).

Remember the general multiplication rule:

We already know that

    P(A ∩ B) = P(B) · P(A|B), if P(B) ≠ 0,
             = P(A) · P(B|A), if P(A) ≠ 0.

Statistics (Advanced): Lecture 8 50

Example (cont)

So we can write

    P(A) = P(B₁) · P(A|B₁) + P(B₂) · P(A|B₂) + P(B₃) · P(A|B₃)
         = 0.6 · P(A|B₁) + 0.3 · P(A|B₂) + 0.1 · P(A|B₃)
         = 0.6 · 0.95 + 0.3 · 0.80 + 0.1 · 0.65
           (very good)   (sufficient)  (insufficient)
         = 0.875.

[The above probabilities 0.95, 0.80, 0.65 are from personal experience, i.e. subjective probability.]

What did the example teach us?

Strategy: decompose complicated events into mutually exclusive simple(r) events!

Statistics (Advanced): Lecture 8 51

Total probability rule

Theorem 8 (Total probability rule). If B₁, B₂, . . . , Bₙ are mutually exclusive events
such that B₁ ∪ B₂ ∪ . . . ∪ Bₙ = Ω, then for any event A ⊂ Ω,

    P(A) = Σ_{i=1}^n P(Bᵢ) · P(A|Bᵢ).

Example (Burger, cont). We know already that supplier B₃ is bad. So what is
P(B₃|A) (if a burger is good, is it from B₃)? By definition of the conditional
probability, since P(A) > 0,

    P(B₃|A) = P(A ∩ B₃)/P(A) = P(B₃ ∩ A)/P(A)
            = P(B₃) · P(A|B₃) / Σ_{i=1}^3 P(Bᵢ) · P(A|Bᵢ)
            = (0.1 · 0.65)/0.875 = 0.074.

After we know that a burger is good, the probability that it comes from B₃ decreases
from 0.1 to 0.074.
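The whole calculation takes a few lines of R (an illustrative aside, not in the slides):

p_B <- c(0.60, 0.30, 0.10)           # P(B1), P(B2), P(B3)
p_A_given_B <- c(0.95, 0.80, 0.65)   # P(A|Bi), the subjective assessments
p_A <- sum(p_B * p_A_given_B)        # total probability rule: 0.875
round(p_B * p_A_given_B / p_A, 3)    # Bayes' rule: P(Bi|A) = 0.651 0.274 0.074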

Statistics (Advanced): Lecture 8 52

Bayes’ rule or Theorem

What we just derived is the famous formula, called Bayes' rule or theorem.

Theorem 9 (Bayes' Rule). If B₁, B₂, . . . , Bₙ are mutually exclusive events such
that B₁ ∪ B₂ ∪ . . . ∪ Bₙ = Ω, then for any event A ⊂ Ω,

    P(Bⱼ|A) = P(A|Bⱼ) · P(Bⱼ) / Σ_{i=1}^n P(A|Bᵢ) · P(Bᵢ).

The probabilities P(Bᵢ) are called the prior probabilities and the probabilities
P(Bᵢ|A) the posterior probabilities, i = 1, . . . , n.

Statistics (Advanced): Lecture 8 53

Reverend Thomas Bayes (1701 - 1761)

• Born in Hertfordshire (London, England),

• was a Presbyterian minister,

• studied: theology and mathematics,

• best known for Essay Towards Solving a
  Problem in the Doctrine of Chances,

• where Bayes' Theorem was first proposed.

• Words: Bayes' rule, Bayes' Theorem,
  Bayesian Statistics.

Statistics (Advanced): Lecture 8 54

Example of Bayes Rule – Screening test for Tuberculosis

                       TB (D⁺)   No TB (D⁻)
X-ray Positive (S⁺)         22           51     73
X-ray Negative (S⁻)          8         1739   1747
                            30         1790   1820

What is the probability that a randomly selected individual has tuberculosis given
that his or her X-ray is positive, given that P(D⁺) = 0.000093?

• P(D⁺) = 0.000093, which implies that P(D⁻) = 0.999907.

• P(S⁺|D⁺) = 22/30 = 0.7333

• P(S⁺|D⁻) = 51/1790 = 0.0285

    P(D⁺|S⁺) = P(S⁺|D⁺)P(D⁺) / [P(S⁺|D⁺)P(D⁺) + P(S⁺|D⁻)P(D⁻)]
             = (0.7333 · 0.000093) / (0.7333 · 0.000093 + 0.0285 · 0.999907)
             = 0.00239
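The same computation in R (an illustrative aside, not in the original slides):

p_D  <- 0.000093    # prevalence P(D+)
sens <- 22/30       # P(S+|D+), sensitivity of the X-ray
fpr  <- 51/1790     # P(S+|D-), false positive rate
sens * p_D / (sens * p_D + fpr * (1 - p_D))   # 0.00239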

Statistics (Advanced): Lecture 8 55

Integer valued random variables

Many observed numbers are the random result of many possible numbers.

Deﬁnition 6. A random variable X is a real-valued function of the elements of a

sample space Ω.

Note that such functions are denoted with capital letters and their images (out-

comes) with lower case letters, e.g. x.

Examples.

• How many times (X) will you be caught speeding?

• What will your final mark (Y) for MATH1905 be?

• How old (Z, in years) do you think your stats lecturer is?

Statistics (Advanced): Lecture 8 56

Random Variable Example – 3 Coins

Consider tossing three coins. The number of heads showing when the coins land is
a random variable: it assigns the number 0 to the outcome {T, T, T}, the number
1 to the outcome {T, T, H}, the number 2 to the outcome {T, H, H}, and the
number 3 to the outcome {H, H, H}.

Statistics (Advanced): Lecture 8 57

Random Variable Example – 3 Coins

Events             Random Variable    Probability
TTT                X = 0              P(X = 0) = 1/8
TTH, THT, HTT      X = 1              P(X = 1) = 3/8
THH, HTH, HHT      X = 2              P(X = 2) = 3/8
HHH                X = 3              P(X = 3) = 1/8

where X = {Number of Heads}.
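These probabilities can be checked in R (an illustrative aside, not in the slides):

dbinom(0:3, size = 3, prob = 0.5)                   # 0.125 0.375 0.375 0.125
# or by enumerating the 8 equally likely outcomes directly:
outcomes <- expand.grid(rep(list(c("H", "T")), 3))  # all 8 outcomes
table(rowSums(outcomes == "H")) / 8                 # the same distribution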

Statistics (Advanced): Lecture 8 58

Random Variable Notation – 3 Coins

We use upper case letters to denote “unobserved” random variables, say X, and

lower case letters to their observed values, in this case x.

For example, in the above example before the three coins land we denote the number
of heads by X; after the coins have landed we denote the observed number of heads
by x, so that we can write P(X = x).

Statistics (Advanced): Lecture 8 59

The mother of all examples: Bernoulli trials!

Deﬁnition 7. Bernoulli trials satisfy the following assumptions:

(i) there are only two possible outcomes for each trial,

(ii) the probability of success is the same for each trial,

(iii) the outcomes from diﬀerent trials are independent,

(iv) there are a ﬁxed number n of Bernoulli trials conducted.

Example (n = 1, coin). Ω: Head or Tail. We can describe the trial (before flipping
the coin) in full detail. Consider a function

    X : {H, T} → {0, 1}  s.t.  X(H) = x_H = 1 and X(T) = x_T = 0.

What is the probability that X = x_H = 1?

    P(X = 1) = P(X = x_H) = P(H) = p = 1/2 ⇒ P(X = 0) = 1/2.

Statistics (Advanced): Lecture 8 60

Jacob Bernoulli (1654–1705)

• Born in Basel (Switzerland),

• 1 of 8 mathematicians in his family,

• studied: theology → maths & astronomy,

• best known for Ars Conjectandi (The Art of
  Conjecturing),

• application of probability theory to games
  of chance, introduction of the law of large
  numbers.

• Words: Bernoulli trial, Bernoulli numbers.

Statistics (Advanced): Lecture 8 61

extra page

Statistics (Advanced): Lecture 8 62

Monday, 22 August 2011

Lecture 9 - Content

• Distribution of a random variable

• Binomial distribution

• Mean of a distribution

Statistics (Advanced): Lecture 9 63

Revised Axioms of Probability

In Lecture 7 we used the following deﬁnition of probability

Definition 8 (due to Andrey Kolmogorov, 1933). Given a sample space Ω and an
event A ⊂ Ω, we define P(A), the probability of A, to be a value of a non-negative
additive set function that satisfies the following three axioms:

A1: For any event A, 0 ≤ P(A) ≤ 1,

A2: P(Ω) = 1,

A3': If A₁, A₂, A₃, . . . is a finite or infinite sequence of mutually exclusive events in
Ω, then

    P(A₁ ∪ A₂ ∪ A₃ ∪ . . .) = P(A₁) + P(A₂) + P(A₃) + . . . .

However, nothing is lost if we replace A1: 0 ≤ P(A) ≤ 1 with A1: 0 ≤ P(A).

Statistics (Advanced): Lecture 9 64

Proof by Contradiction

Assume the following 3 axioms:

A1: For any event A ⊂ Ω, 0 ≤ P(A),

A2: P(Ω) = 1,

A3': If A₁, A₂, A₃, . . . is a finite or infinite sequence of mutually exclusive events in
Ω, then

    P(A₁ ∪ A₂ ∪ A₃ ∪ . . .) = P(A₁) + P(A₂) + P(A₃) + . . . .

Now let us assume that A4: for some event A ⊂ Ω, P(A) > 1. Then

• 1 = P(Ω) = P(A ∪ A^c) = P(A) + P(A^c).

• Rearranging, we have P(A^c) = 1 − P(A).

• By A4 we have P(A^c) < 0. This contradicts A1, hence A4 cannot hold.

Statistics (Advanced): Lecture 9 65

Revised Axioms of Probability

Definition 9 (due to Andrey Kolmogorov, 1933). Given a sample space Ω and an
event A ⊂ Ω, we define P(A), the probability of A, to be a value of a non-negative
additive set function that satisfies the following three axioms:

A1: For any event A, 0 ≤ P(A),

A2: P(Ω) = 1,

A3': If A₁, A₂, A₃, . . . is a finite or infinite sequence of mutually exclusive events in
Ω, then

    P(A₁ ∪ A₂ ∪ A₃ ∪ . . .) = P(A₁) + P(A₂) + P(A₃) + . . . .

This is the minimal set of axioms needed to define probability.

Random Variables Reminder

n = 1 (Coin): X : Ω = {H, T} → {0, 1} ⊂ R. Thus X(H) = x_H = 1 and
X(T) = x_T = 0 ⇒ P(X = 1) = P(H) = p.

Statistics (Advanced): Lecture 9 66

Distribution of a random variable

Definition 10. The probability distribution of an integer-valued random variable X
is a list of the possible values of X together with their probabilities

    pᵢ = P(X = i) ≥ 0  and  Σᵢ pᵢ = 1.

There is nothing special about the subscript i; we could and will equally well use j,
k, x etc.

Definition 11. The probability that the value of a random variable X is less than
or equal to x, that is

    F(x) = P(X ≤ x),

is called the cumulative distribution function or just the distribution function.

Also, note that for integer-valued random variables,

    P(X = x) = F(x) − F(x − 1).

Statistics (Advanced): Lecture 9 67

Example (n = 3, IT problems). A network is fragile. By experience: P(F) = 0.1 =
1 − p that in any given week there is ≥ 1 major problem, and P(S) = 0.9 = p that
there is none, respectively. Out of 3 weeks, how many weeks, X, were problem-free
( = successes) and with what probability?

(a) All possible outcomes:

    FFF SFF FSF FFS
    SSF FSS SFS SSS

(b) What is the probability of each outcome? Use the special multiplication rule of
probability because weeks are independent!?

(c) What is the probability distribution of the number of successes, X, among the
3 weeks?

Statistics (Advanced): Lecture 9 68

Example (cont)

P(X = 0) = P(FFF) = P(F) · P(F) · P(F) = (1 − p)³

P(X = 1) = P(SFF ∪ FSF ∪ FFS)        (mutually exclusive events)
         = P(SFF) + P(FSF) + P(FFS)
         = 3(1 − p)²p = (3 choose 1)(1 − p)²p,  select one S out of 3 trials.

Similarly we get for X = 2 and X = 3:

P(X = 2) = (3 choose 2)(1 − p)p²,  select two S out of 3 trials,

P(X = 3) = (3 choose 3)p³,  select three S out of 3 trials.

Statistics (Advanced): Lecture 9 69

Binomial distribution

We can generalise this result for any n ≥ 1 and success probability p ∈ [0, 1].

Definition 12. The probability distribution of the number of successes X = i in
n ∈ N independent Bernoulli trials is called the binomial distribution,

    pᵢ = P(X = i) = (n choose i) p^i (1 − p)^(n−i).

The success probability of a single Bernoulli trial is p and i = 0, 1, . . . , n.

To say that the random variable X has the binomial distribution with parameters n
and p we write X ∼ B(n, p).

This defines a family of probability distributions, with each member characterised
by a given value of the parameter p and the number of trials n.

Statistics (Advanced): Lecture 9 70

Binomial distribution

Since pᵢ, 0 ≤ i ≤ n, is a probability distribution we have the identity (which we will
use later on)

    Σ_{i=0}^n (n choose i) p^i (1 − p)^(n−i) = 1

for any 0 ≤ p ≤ 1.

A special case of the binomial distribution is the Bernoulli distribution, where n = 1
and

    P(X = i) = p^i (1 − p)^(1−i),  i = 0, 1.

There is another special relationship between the Bernoulli distribution and the
binomial distribution:

If Xᵢ ∼ Bernoulli(p) for 1 ≤ i ≤ n and Y = Σ_{i=1}^n Xᵢ, then Y ∼ B(n, p).

Statistics (Advanced): Lecture 9 71

Example (Dice). Roll a fair dice 9 times. Let X be the number of sixes obtained.
Then X ∼ B(9, 1/6); that is,

    pᵢ = P(X = i) = (n choose i) p^i (1 − p)^(n−i) = (9 choose i)(1/6)^i (5/6)^(9−i)
       = (9 choose i) 5^(9−i) / 6⁹,   i = 0, 1, . . . , 9.

Statistics (Advanced): Lecture 9 72

With your pocket calculator or with R:

> n = 9;
> p = 1/6;
> round(dbinom(0:n,n,p),4) # dbinom for B(n,p) prob's
 [1] 0.1938 0.3489 0.2791 0.1302 0.0391
 [6] 0.0078 0.0010 0.0001 0.0000 0.0000
> pbinom(1,n,p) # for B(n,p) cumulative probabilities
[1] 0.5426588

Hence, P(X = 4) = 0.0391 and P(X < 2) = F(1) = 0.5426588.

Statistics (Advanced): Lecture 9 73

Shape of the binomial distribution

• We get a binomial distribution if

  1. we are counting something over a fixed number of trials or repetitions,
  2. the trials are independent, and
  3. the probability of the outcome of interest is constant across trials.

• The binomial distribution is centred at n · p,

• the closer p is to 1/2 the more symmetric the distribution/histogram,

• the larger n the closer the shape is to a bell (normal).

Statistics (Advanced): Lecture 9 74

par(mfrow=c(2,2)); n = 10 # and for n=50, 100, etc

barplot(dbinom(0:n,n,1/2))

title(main="Probabilities for X~B(n=10,p=0.5)")

barplot(dbinom(0:n,n,0.1))

title(main="Probabilities for X~B(10,0.1)")

barplot(dbinom(0:n,n,0.8))

title(main="Probabilities for X~B(10,0.8)")

barplot(dbinom(0:n,n,0.4))

title(main="Probabilities for X~B(10,0.4)")

Statistics (Advanced): Lecture 9 75

[Figure: barplots of the probabilities for X ~ B(n=10, p=0.5), X ~ B(10, 0.1),
X ~ B(10, 0.8) and X ~ B(10, 0.4), produced by the code above.]

Statistics (Advanced): Lecture 9 76

Example. In a small pond there are 50 ﬁsh, 20 of which have been tagged. Seven

ﬁsh are caught and X represents the number of tagged ﬁsh in the catch. Assume

each ﬁsh in the pond has the same chance of being caught. Is X binomial

(a) if each ﬁsh is returned before the next catch?

Yes, provided the ﬁsh do not learn from their experience, i.e. the probability of

catching each ﬁsh stays the same for each of the 7 trials.

    P(X = 1) = (7 choose 1)(20/50)¹(30/50)⁶ ≈ 0.131 = dbinom(1,7,0.4)

Statistics (Advanced): Lecture 9 77

(b) if the fish are not returned once they are caught?

This situation cannot be modelled by a binomial as the proportion of tagged fish
changes at each trial.

If there were 5,000 fish, 2,000 of which had been tagged, then the change in the
proportion would be negligible and we could model with a binomial.

    P(X = 1) = (20 choose 1)(30 choose 6) / (50 choose 7)
             = choose(20,1)*choose(30,6)/choose(50,7) (in R)
             = 0.119 (to 3 d.p.)

    P(X = 1) = (2000 choose 1)(3000 choose 6) / (5000 choose 7)
             = choose(2000,1)*choose(3000,6)/choose(5000,7) (in R)
             = 0.131 (to 3 d.p.)
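As an aside (not in the slides), the exact model in (b) is the hypergeometric
distribution, which R provides as dhyper():

dhyper(1, m = 20, n = 30, k = 7)       # 0.119; m tagged, n untagged, k drawn
dhyper(1, m = 2000, n = 3000, k = 7)   # 0.131, nearly binomial for a large pond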

Statistics (Advanced): Lecture 9 78

Mean of a distribution

Definition 13. For a random variable X taking values 0, 1, 2, . . . with

    P(X = i) = pᵢ,  i = 0, 1, 2, . . . ,

the mean or expected value of X is defined to be

    µ = E(X) = Σᵢ i · pᵢ.

Interpretation of E(X)

• Long run average of observations of X, because pᵢ ≈ fᵢ/n.

• Centre of balance of the probability density (histogram). (draw picture)

• Measure of location of the distribution.

Definition 14. For any function g(X) we define the expected value E(g(X)) by

    E(g(X)) = Σᵢ g(i) · pᵢ.

Statistics (Advanced): Lecture 9 79

Expectation of a Dice Roll

Let X = {face showing from a dice roll}, where pᵢ = P(X = i) = 1/6 for i =
1, 2, . . . , 6. Then

    µ = E(X) = Σ_{i=1}^6 i · pᵢ = (1 + 2 + . . . + 6) · 1/6 = 3.5.

Note: the expected value in this case is not one of the observed values.
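A quick check in R (an illustrative aside, not in the original slides):

sum((1:6) * (1/6))                         # exact: 3.5
mean(sample(1:6, 100000, replace = TRUE))  # long-run average, close to 3.5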

Statistics (Advanced): Lecture 9 80

Mean of a distribution (cont)

Theorem 10. For constants a and b,

    E(aX + b) = a · E(X) + b.

Proof.

    E(aX + b) = Σ_{all i} g(i) · pᵢ,  where g(i) = a · i + b,
              = Σ_{all i} [(a · i)pᵢ + b · pᵢ]
              = a Σ_{all i} i · pᵢ + b Σ_{all i} pᵢ
              = a · E(X) + b.

Statistics (Advanced): Lecture 9 81

Expectation of X ∼ B(n, p)

Theorem 11. The expectation of X ∼ B(n, p) is E(X) = np.

Proof.

E(X) = Σ_{i=0}^n i · pᵢ = Σ_{i=0}^n i · [n!/(i!(n−i)!)] p^i (1 − p)^(n−i);  the i = 0 term vanishes,

     = Σ_{i=1}^n i · [n!/(i!(n−i)!)] p^i (1 − p)^(n−i);  simplify,

     = Σ_{i=1}^n i · [n(n−1)!/(i(i−1)!(n−i)!)] p^i (1 − p)^(n−i)

     = np Σ_{i=1}^n [(n−1)!/((i−1)!(n−i)!)] p^(i−1) (1 − p)^(n−i);  substitute j = i − 1, m = n − 1.

Hence, E(X) = np Σ_{j=0}^m (m choose j) p^j (1 − p)^(m−j) = np, since the last sum
equals 1 (these are the probabilities of Y ∼ B(m, p)).

Statistics (Advanced): Lecture 9 82

Example (Multiple choice section in M1905 exam is worth 35%).

20 questions and each question has 5 possible answers. A student decides to answer

the questions by selecting an answer at random.

(a) What is the expected number of correct responses? Let X denote the number

of correct answers. X ∼ B(20, 0.2). The expected number of correct answers is

np = 4

(b) Probability that the student has more than 10 correct answers?

    P(X > 10) = 1 − P(X ≤ 10)
              = 1 − 0.9994,  with 1-pbinom(10,20,0.2)
              = 0.0006

(c) If the student scores 4 for a correct answer but −1 for a wrong response, what is
his expected score?

    E[4 · X + (−1) · (20 − X)] = E(5X − 20) = 5 · 4 − 20 = 0.

Statistics (Advanced): Lecture 9 83

Tuesday, 23 August 2011

Lecture 10 - Content

• Variance of a distribution

• More integer-valued distributions

• Probability generating functions

Statistics (Advanced): Lecture 10 84

Expectation of a distribution – Reminders

The expectation of a distribution (or expectation of a random variable) is the mean

of the probability distribution (a measure of distribution location).

Note that

• E(X) = Σᵢ i · pᵢ = Σᵢ i · P(X = i) and

• E(g(X)) = Σᵢ g(i) · pᵢ = Σᵢ g(i) · P(X = i).

Statistics (Advanced): Lecture 10 85

Variance of a distribution

Example. Suppose X (e.g. number of shoes in a suitcase) takes the values 2, 4 and
6 with probabilities

    i    2    4    6
    pᵢ  0.1  0.3  0.6

Hence,

    µ = E(X) = Σᵢ i · pᵢ = 2 · 0.1 + 4 · 0.3 + 6 · 0.6 = 5.

Statistics (Advanced): Lecture 10 86

What is E(X²)?

Suppose X (e.g. number of shoes in a suitcase) takes the values 2, 4 and 6 with
probabilities

    i    2    4    6
    pᵢ  0.1  0.3  0.6

What is E(X²)?

Solution 1: by the definition, with g(i) = i²,

    E(X²) = Σᵢ g(i)pᵢ = Σᵢ i²pᵢ = 26.8 ≠ 5².

Solution 2: map i → i² = j and X → X² = Y, and use E(Y) = Σⱼ j · pⱼ with

    j    4   16   36
    pⱼ  0.1  0.3  0.6

The distribution of Y can be hard to get (e.g. for continuous rvs).

Statistics (Advanced): Lecture 10 87

Definition 15. The variance of the random variable X is defined by

    Var(X) = σ² = E(X − µ)² = E(X²) − µ²,

where µ = E(X); σ² is also a measure of spread.

This is like the large sample limit of a sample variance.

The standard deviation of X is σ = √σ².
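For the 'number of shoes' example above (an illustrative aside, not in the slides):

i <- c(2, 4, 6); p <- c(0.1, 0.3, 0.6)
mu <- sum(i * p)       # E(X) = 5
sum(i^2 * p) - mu^2    # Var(X) = E(X^2) - mu^2 = 26.8 - 25 = 1.8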

Statistics (Advanced): Lecture 10 88

Variance of a Linear Transformation

Theorem 12. For any constants a and b,

    Var(aX + b) = a² Var(X).

Proof.

Var(aX + b) = E[(aX + b)²] − (E[aX + b])²
            = E[a²X² + 2abX + b²] − (a · E[X] + b)²
            = a² E[X²] + 2ab E[X] + b² − (a² E[X]² + 2ab E[X] + b²)
            = a² (E[X²] − E[X]²)
            = a² Var(X).

Statistics (Advanced): Lecture 10 89

Example. If X ∼ B(n, p) then we'll show later that Var(X) = n · p · (1 − p).

• Hence, if p = 0 or 1 then the variance is 0.

• The variance is largest when p = 0.5, and in this case it is σ² = n/4.

Statistics (Advanced): Lecture 10 90

More integer-valued distributions

Geometric distribution

The binomial random variable is just one possible integer-valued random variable.

Suppose we have an inﬁnite sequence of independent trials, each of which gives a

success with probability p and failure with probability q = 1 −p.

Definition 16. The geometric distribution with parameter p (= success prob.) has
probabilities for the number of failures X before the first success,

    pᵢ = P(X = i) = q^i p,  i = 0, 1, 2, . . . .

Note the probabilities add to 1:

    P(X = 0) + p₁ + . . . = p + qp + q²p + . . . = p(1 + q + q² + . . .) = p · 1/(1 − q) = 1.

[By induction we can prove that 1 + q + . . . + q^n = (1 − q^(n+1))/(1 − q).]

Statistics (Advanced): Lecture 10 91

Example. A fair die is thrown repeatedly until it shows a six.

(a) What is the probability that more than 7 throws are required?

    P(X > 7) = 1 − P(X ≤ 7) = 1 − Σ_{i=0}^7 (5/6)^i · (1/6) = 0.232 (3 dp)

with 1-pgeom(7,1/6) or with 1-sum(dgeom(0:7,1/6)).

(b) Is it more likely that an odd number of throws is required or an even number?

An odd number of throws corresponds to an even number of failures X. Because
0 ≤ P(X = i) ≤ 1 and F(∞) = 1 we find,

    P(odd) − P(even) = Σ_{j=1}^∞ P(X = 2(j−1)) − Σ_{k=1}^∞ P(X = 2k−1)
                     = Σ_{j=1}^∞ [P(X = 2(j−1)) − P(X = 2j−1)]
                     = p Σ_{j=1}^∞ (q^(2(j−1)) − q^(2j−1)) > 0,  each term being positive,

⇒ an odd number of throws is more likely.

Statistics (Advanced): Lecture 10 92

The Poisson approximation to the Binomial

The Poisson distribution often serves as a ﬁrst theoretical model for counts which

do not have a natural upper bound.

Possible examples:

• modelling the number of accidents, crashes, breakdowns

• modelling radioactivity measured by a Geiger counter

• modelling so-called rare events (meteorite impacts, heart attacks)

Whether or not the Poisson distribution sufficiently describes count data is not
answered at this early stage but is postponed to later lectures in statistics.

Statistics (Advanced): Lecture 10 93

The Poisson distribution can be seen as the limiting distribution of B(n, p):

Let n → ∞, while p → 0 and np → λ ∈ (0, ∞).

For X ∼ B(n, p) we know that

    P(X = k) = (n choose k) p^k · (1 − p)^(n−k) = (∗) · (∗∗).

Then, with p = λ/n,

    (∗) = (n choose k) p^k = (n choose k) λ^k/n^k
        = [n(n−1) · · · (n−k+1)]/(n · n · · · n) · λ^k/k!  →  λ^k/k!,

and

    (∗∗) = (1 − p)^(n−k) = (1 − λ/n)^n · (1 − λ/n)^(−k)  →  e^(−λ),

since (1 − λ/n)^(−k) → 1. Hence,

    P(X = k) → e^(−λ) λ^k/k!,  for k = 0, 1, 2, . . . .

Statistics (Advanced): Lecture 10 94

The approximation is good if n · p² is small!

X ∼ B(158, 1/365) and n · p² = 0.001186:

> # What is the probability that of 158 people, exactly k have a birthday today?

> n = 158; p=1/365;

> round(dbinom(0:7,n,p),5);

[1] 0.64826 0.28139 0.06068 0.00867 0.00092 0.00008 0.00001 0.00000

> round(dpois(0:7,p*n),5);

[1] 0.64864 0.28078 0.06077 0.00877 0.00095 0.00008 0.00001 0.00000

But for n = 10

> n = 10; p=1/3;

> round(dbinom(0:4,n,p),5);

[1] 0.01734 0.08671 0.19509 0.26012 0.22761

> round(dpois(0:4,p*n),5);

[1] 0.03567 0.11891 0.19819 0.22021 0.18351

Statistics (Advanced): Lecture 10 95

Probability distributions for X ∼ B(n, p) and X ∼ Poi(λ)

[Figure: barplots of the probabilities for X ~ B(158, 1/365) beside X ~ Poi(158/365),
and for X ~ B(10, 1/3) beside X ~ Poi(10/3).]

Statistics (Advanced): Lecture 10 96

Probability generating functions

Let X ∈ N and pᵢ = P(X = i), i = 0, 1, 2, . . .

Definition 17. The probability generating function is defined as

    π(s) = p₀ + p₁s + p₂s² + p₃s³ + . . . .

Example. If X only takes a finite number of values (e.g. X ∼ B(n, p)) then π(s)
is a polynomial.

Alternatively (e.g. X ∼ Poi(λ)) π(s) is a power series.

Statistics (Advanced): Lecture 10 97

Properties of π(s)

Let s ∈ [0, 1]. Then

• 0 ≤ π(s) ≤ 1,

• π(1) = p₀ + p₁ + . . . = 1,

• π′(s) = p₁ + 2p₂s + 3p₃s² + . . . ≥ 0, s ≥ 0,

• π′(1) = p₁ + 2p₂ + 3p₃ + . . . = E(X) (if E(X) is finite),

• π′′(s) = 2p₂ + 6p₃s + 4 · 3p₄s² + . . . ; at s = 1, π′′(1) = E(X(X − 1)) and

      Var(X) = E(X²) − (EX)² = π′′(1) + π′(1) − (π′(1))².
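A numerical sketch of these properties in R, truncating the power series (an
illustrative aside, not in the original slides):

lambda <- 2
pgf <- function(s) sum(dpois(0:200, lambda) * s^(0:200))  # pi(s), truncated
pgf(1)                       # ~ 1
h <- 1e-5
(pgf(1) - pgf(1 - h)) / h    # ~ pi'(1) = E(X) = lambda = 2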

Statistics (Advanced): Lecture 10 98

Example (Poisson distribution). For X ∼ Poi(λ),

    π(s) = Σ_{i=0}^∞ e^(−λ) (λ^i/i!) s^i = e^(−λ) Σ_{i=0}^∞ (λs)^i/i! = e^(−λ) e^(λs) = e^(λ(s−1)).

Hence,

    π′(s) = λ e^(λ(s−1)),  so  E(X) = π′(1) = λ,

    π′′(s) = λ² e^(λ(s−1)),  so  E[X(X − 1)] = λ²,

and

    Var(X) = E(X²) − (EX)² = λ² + λ − λ² = λ.

Statistics (Advanced): Lecture 10 99

Example (Binomial distribution). Let X ∼ B(n, p).

First, note that

    (x + y)^n = Σ_{i=0}^n (n choose i) x^i y^(n−i).   (2)

Then

    π(s) = Σ_{i=0}^n s^i (n choose i) p^i (1 − p)^(n−i) = Σ_{i=0}^n (n choose i) (sp)^i (1 − p)^(n−i)
         = (1 − p + ps)^n,

which follows from (2). Then

    π′(s) = np(1 − p + ps)^(n−1),  so that  π′(1) = E(X) = np,

    π′′(s) = n(n − 1)p²(1 − p + ps)^(n−2),  so that  π′′(1) = n(n − 1)p²,

and finally,

    Var(X) = π′′(1) − (π′(1))² + π′(1) = n(n − 1)p² − n²p² + np = np(1 − p).

Statistics (Advanced): Lecture 10 100

Statistics (Advanced): Lecture 10 101
