
# Stat 110 Homework 4, Fall 2017

Due: Friday 10/6 at 5:00 pm, submitted as a PDF via the course webpage. Please check carefully
to make sure you upload the correct file. Your submission must be a single PDF file, no more than
20 MB in size. It can be typeset or scanned, but must be clear and easily legible (not blurry or
faint) and correctly rotated. No submissions on paper or by email will be accepted. Please show
your work and give clear, careful, convincing justifications. See the syllabus for the collaboration
policy.

1. (BH 4.10) Consider the St. Petersburg paradox (Example 4.3.13), except that you receive $n$ dollars rather than $2^n$ dollars if the game lasts for $n$ rounds. What is the fair value of this game? What if the payoff is $n^2$ dollars?

Solution: The fair value of the game is equal to the expected earnings.
• Win $n$ dollars after $n$ rounds: The probability that the game ends on round $n$ is $2^{-n}$, so the expected earnings are $S = 1 \cdot 2^{-1} + 2 \cdot 2^{-2} + 3 \cdot 2^{-3} + \cdots$. Multiplying this sum by 2 gives $2S = 1 \cdot 2^{0} + 2 \cdot 2^{-1} + 3 \cdot 2^{-2} + \cdots$, and subtracting $S$ from $2S$ yields
$$S = 1 \cdot 2^{0} + 1 \cdot 2^{-1} + 1 \cdot 2^{-2} + \cdots = \frac{1}{1 - 1/2} = 2,$$
so the fair value is \$2.

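• Win $n^2$ dollars after $n$ rounds: The expected earnings are $S' = 1^2 \cdot 2^{-1} + 2^2 \cdot 2^{-2} + 3^2 \cdot 2^{-3} + \cdots$. Multiplying this sum by 2 gives $2S' = 1^2 \cdot 2^{0} + 2^2 \cdot 2^{-1} + 3^2 \cdot 2^{-2} + \cdots$, and subtracting $S'$ from this yields
$$S' = 1 + (2 + 1) \cdot \tfrac{1}{2} + (4 + 1) \cdot \tfrac{1}{4} + (6 + 1) \cdot \tfrac{1}{8} + \cdots = \left(1 + \tfrac{1}{2} + \tfrac{1}{4} + \cdots\right) + \left(1 + \tfrac{2}{2} + \tfrac{3}{4} + \tfrac{4}{8} + \cdots\right) = \frac{1}{1 - 1/2} + 2S = 2 + 4 = 6,$$
where $S = 2$ is the expected earnings from the first case. The fair value is therefore \$6.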
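
As a numerical sanity check on the two series, the sketch below sums $\text{payoff}(n) \cdot 2^{-n}$ over the first 200 rounds using exact rational arithmetic. The helper name `truncated_value` and the cutoff of 200 rounds are illustrative assumptions, not part of the original solution; the neglected tail is far below floating-point precision.

```python
from fractions import Fraction

def truncated_value(payoff, rounds=200):
    """Sum payoff(n) * 2**(-n) for n = 1..rounds with exact fractions.

    `rounds` is an arbitrary cutoff chosen for illustration; the terms
    decay geometrically, so the neglected tail is negligible.
    """
    return sum(Fraction(payoff(n), 2**n) for n in range(1, rounds + 1))

# Payoff of n dollars and payoff of n^2 dollars, converted to floats for display.
linear = float(truncated_value(lambda n: n))
quadratic = float(truncated_value(lambda n: n * n))
print(linear, quadratic)
```

The truncated sums agree with the series values $\sum_n n\,2^{-n} = 2$ and $\sum_n n^2\,2^{-n} = 6$ to within rounding.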

2. (BH 4.14) Let $X$ have PMF
$$P(X = k) = \frac{c\,p^k}{k} \quad \text{for } k = 1, 2, \ldots,$$
where $p$ is a parameter with $0 < p < 1$ and $c$ is a normalizing constant. We have $c = -1/\log(1-p)$, as seen from the Taylor series
$$-\log(1 - p) = p + \frac{p^2}{2} + \frac{p^3}{3} + \cdots.$$
This distribution is called the Logarithmic distribution (because of the log in the above Taylor
series), and has often been used in ecology. Find the mean and variance of X.

Solution:
$$E(X) = \sum_{k=1}^{\infty} k\,P(X = k) = \sum_{k=1}^{\infty} c\,p^k = \frac{cp}{1-p} = \frac{-p}{(1-p)\log(1-p)}.$$

$$E(X^2) = \sum_{k=1}^{\infty} k^2\,P(X = k) = \sum_{k=1}^{\infty} c\,k\,p^k.$$

Multiplying $E(X^2)$ by $p$ and subtracting from $E(X^2)$, we find $(1-p)E(X^2) = c(p + p^2 + p^3 + \cdots) = E(X)$. Dividing, we find $E(X^2) = \frac{1}{1-p}E(X)$, and
$$\mathrm{Var}(X) = E(X^2) - E(X)^2 = \frac{-p\log(1-p) - p^2}{(1-p)^2(\log(1-p))^2}.$$
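
The closed forms for the Logarithmic distribution can be compared against a direct truncated sum over the PMF $P(X=k) = c\,p^k/k$. This is only a sketch: the function names `logarithmic_moments` and `closed_form` and the cutoff `terms=5000` are assumptions made for illustration.

```python
import math

def logarithmic_moments(p, terms=5000):
    """Approximate the mean and variance by summing the PMF c * p**k / k.

    `terms` is an illustrative cutoff; p**k decays geometrically for 0 < p < 1.
    """
    c = -1.0 / math.log(1.0 - p)
    mean = 0.0
    second = 0.0
    pk = 1.0
    for k in range(1, terms + 1):
        pk *= p                  # pk now holds p**k
        pmf = c * pk / k
        mean += k * pmf
        second += k * k * pmf
    return mean, second - mean**2

def closed_form(p):
    """Mean and variance from the formulas derived in the solution."""
    log_term = math.log(1.0 - p)
    mean = -p / ((1.0 - p) * log_term)
    var = (-p * log_term - p**2) / ((1.0 - p) ** 2 * log_term**2)
    return mean, var

for p in (0.3, 0.7):
    print(p, logarithmic_moments(p), closed_form(p))
```

For each value of `p` the two pairs should match up to floating-point error.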

3. (BH 4.16) The dean of Blotchville University boasts that the average class size there is 20. But
the reality experienced by the majority of students there is quite different: they find themselves
in huge courses, held in huge lecture halls, with hardly enough seats or Haribo gummi bears for
everyone. The purpose of this problem is to shed light on the situation. For simplicity, suppose
that every student at Blotchville University takes only one course per semester.
(a) Suppose that there are 16 seminar courses, which have 10 students each, and 2 large lecture
courses, which have 100 students each. Find the dean’s-eye-view average class size (the simple
average of the class sizes) and the student’s-eye-view average class size (the average class size
experienced by students, as it would be reflected by surveying students and asking them how big
their classes are). Explain the discrepancy intuitively.
(b) Give a short proof that for any set of class sizes (not just those given above), the dean’s-eye-
view average class size will be strictly less than the student’s-eye-view average class size, unless
all classes have exactly the same size.
Hint: Relate this to the fact that variances are nonnegative.
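Solution: (a)
• Dean’s average size: $\dfrac{2 \times 100 + 16 \times 10}{2 + 16} = 20$.
• Surveyor’s average size: $\dfrac{\sum_{\text{all students}} \text{class size}}{\text{total students}} = \dfrac{(2 \times 100) \times 100 + (16 \times 10) \times 10}{2 \times 100 + 16 \times 10} = 60$.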

The discrepancy arises from the fact that the dean weights all classes equally, but the surveyor
weights all students equally. Effectively, the dean’s method gives less weight to students in large
classes, because those classes have more students. This means that the dean weights small classes
more than the surveyor, pulling the dean’s average class size below the surveyor’s average class
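size.
(b) Let there be classes $i = 1, \ldots, n$ with sizes $A_1, \ldots, A_n$. The dean’s average is $\dfrac{\sum_i A_i}{n}$, and the surveyor’s average is $\dfrac{\sum_i A_i^2}{\sum_i A_i}$.
Let $A$ be the size of a class chosen uniformly at random from the $n$ classes. We know that
$$\mathrm{Var}(A) = E(A^2) - E(A)^2 = \frac{\sum_i A_i^2}{n} - \left(\frac{\sum_i A_i}{n}\right)^2 \ge 0.$$
Rearranging, this yields $\dfrac{\sum_i A_i}{n} \le \dfrac{\sum_i A_i^2}{\sum_i A_i}$,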
which means that the dean’s average can never be greater than the surveyor’s average. Equality
holds iff V ar(A) = 0; in other words, all classes are of the same size.
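
The two averages are easy to compute for any list of class sizes. The sketch below uses exact fractions; the function names `dean_average` and `student_average` are hypothetical and introduced only for this illustration.

```python
from fractions import Fraction

def dean_average(sizes):
    """Simple average of the class sizes (each class weighted equally)."""
    return Fraction(sum(sizes), len(sizes))

def student_average(sizes):
    """Average class size experienced by students (each student weighted equally)."""
    return Fraction(sum(a * a for a in sizes), sum(sizes))

# Part (a): 16 seminars of 10 students and 2 lectures of 100 students.
sizes = [10] * 16 + [100] * 2
print(dean_average(sizes), student_average(sizes))  # prints: 20 60
```

When all classes have the same size, for example `[5, 5, 5]`, both functions return the same value, matching the equality case in part (b).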

4. (BH 4.25) Nick and Penny are independently performing independent Bernoulli trials. For concreteness, assume that Nick is flipping a nickel with probability $p_1$ of Heads and Penny is flipping a penny with probability $p_2$ of Heads. Let $X_1, X_2, \ldots$ be Nick’s results and $Y_1, Y_2, \ldots$ be Penny’s results, with $X_i \sim \mathrm{Bern}(p_1)$ and $Y_j \sim \mathrm{Bern}(p_2)$.
(a) Find the distribution and expected value of the first time at which they are simultaneously successful, i.e., the smallest $n$ such that $X_n = Y_n = 1$.
Hint: Define a new sequence of Bernoulli trials and use the story of the Geometric.
(b) Find the expected time until at least one has a success (including the success).
Hint: Define a new sequence of Bernoulli trials and use the story of the Geometric.
(c) For $p_1 = p_2$, find the probability that their first successes are simultaneous, and use this to find the probability that Nick’s first success precedes Penny’s.
Solution: (a) The probability that Nick and Penny are simultaneously successful on a given trial is $p_1 p_2$. Hence, the first time they are simultaneously successful follows the distribution $Z \sim \mathrm{FS}(p_1 p_2)$, where FS denotes the First Success distribution. Let $Z' \sim \mathrm{Geom}(p_1 p_2)$. We have $Z = Z' + 1$ and
$$E(Z) = 1 + E(Z') = 1 + \frac{1 - p_1 p_2}{p_1 p_2} = \frac{1}{p_1 p_2}.$$
(b) The chance that at least one has a success on a given trial is $p_1 + p_2 - p_1 p_2$, by inclusion-exclusion. Similarly to above, letting the r.v. $W$ be the time until at least one has a success (including the success), we have $W \sim \mathrm{FS}(p_1 + p_2 - p_1 p_2)$ and
$$E(W) = \frac{1}{p_1 + p_2 - p_1 p_2}.$$
(c) Write $p = p_1 = p_2$. The probability that the first successes are simultaneous on turn $k$ is $((1-p)^{k-1})^2 p^2$, where the squares account for the fact that there are two players. The $k-1$ factors of $1-p$ represent $k-1$ consecutive failures, and the final $p$ represents the success. Adding, we find
$$P(\text{simultaneous success}) = \sum_{k=1}^{\infty} ((1-p)^{k-1})^2 p^2 = p^2 \sum_{k=1}^{\infty} (1-p)^{2k-2} = \frac{p^2}{1 - (1-p)^2} = \frac{p}{2-p}.$$

By symmetry, Nick and Penny are equally likely to precede each other. The chance that one strictly precedes the other is $1 - P(\text{simultaneous success})$, which gives
$$P(\text{Nick first}) = \frac{1}{2}\left(1 - \frac{p}{2-p}\right) = \frac{1-p}{2-p}.$$
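
The three answers can be checked by summing the relevant First Success probabilities directly. This is a minimal sketch: the function name `first_success_summary` and the truncation point `max_n=2000` are assumptions for illustration, and the sums are truncated rather than exact.

```python
def first_success_summary(p1, p2, max_n=2000):
    """Truncated sums for the expected waiting times and simultaneity probability.

    Returns (E[time until both succeed on the same trial],
             E[time until at least one succeeds],
             P(first successes occur on the same trial)).
    """
    both = p1 * p2
    either = p1 + p2 - p1 * p2
    rounds = range(1, max_n + 1)
    e_both = sum(n * (1 - both) ** (n - 1) * both for n in rounds)
    e_either = sum(n * (1 - either) ** (n - 1) * either for n in rounds)
    # Both fail on trials 1..n-1, then both succeed on trial n.
    simultaneous = sum(((1 - p1) * (1 - p2)) ** (n - 1) * p1 * p2 for n in rounds)
    return e_both, e_either, simultaneous

p = 0.3
e_both, e_either, simultaneous = first_success_summary(p, p)
nick_first = 0.5 * (1 - simultaneous)   # symmetry argument from part (c)
print(e_both, e_either, simultaneous, nick_first)
```

With `p = 0.3` the values should agree with $1/p^2$, $1/(2p - p^2)$, $p/(2-p)$, and $(1-p)/(2-p)$ up to floating-point error.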

5. (BH 4.28) In many problems about modeling count data, it is found that values of zero in
the data are far more common than can be explained well using a Poisson model (we can make
P (X = 0) large for X ∼ Pois(λ) by making λ small, but that also constrains the mean and variance
of X to be small since both are λ). The Zero-Inflated Poisson distribution is a modification of
the Poisson to address this issue, making it easier to handle frequent zero values gracefully.
A Zero-Inflated Poisson r.v. X with parameters p and λ can be generated as follows. First
flip a coin with probability of p of Heads. Given that the coin lands Heads, X = 0. Given that
the coin lands Tails, X is distributed Pois(λ). Note that if X = 0 occurs, there are two possible
explanations: the coin could have landed Heads (in which case the zero is called a structural zero),
or the coin could have landed Tails but the Poisson r.v. turned out to be zero anyway.
For example, if X is the number of chicken sandwiches consumed by a random person in a
week, then X = 0 for vegetarians (this is a structural zero), but a chicken-eater could still have
X = 0 occur by chance (since they might not happen to eat any chicken sandwiches that week).
(a) Find the PMF of a Zero-Inflated Poisson r.v. X.
(b) Explain why X has the same distribution as (1 − I)Y , where I ∼ Bern(p) is independent of
Y ∼ Pois(λ).
(c) Find the mean of X in two different ways: directly using the PMF of X, and using the
representation from (b). For the latter, you can use the fact (which we prove in Chapter 7) that
if r.v.s Z and W are independent, then E(ZW ) = E(Z)E(W ).
(d) Find the variance of X.

Solution: (a) Let $Z \sim \mathrm{Pois}(\lambda)$. We have $P(X = 0) = p + (1-p)P(Z = 0)$ and $P(X = k) = (1-p)P(Z = k)$ for $k \ge 1$. With $P(Z = k) = \frac{e^{-\lambda}\lambda^k}{k!}$, we find that the PMF of $X$ is
$$P(X = 0) = p + (1-p)e^{-\lambda} \quad \text{and} \quad P(X = k) = \frac{(1-p)e^{-\lambda}\lambda^k}{k!} \ \text{ for } k \ge 1.$$
(b) Note that $P(1 - I = 0) = P(I = 1) = p$ and $P(1 - I = 1) = P(I = 0) = 1 - p$. If $X = (1 - I)Y$, this means that $X = Y$ with probability $1 - p$ and $X = 0$ (regardless of $Y$) with probability $p$, which is exactly the story of the Zero-Inflated Poisson r.v.

(c)
• PMF of $X$: Using the Taylor expansion of $e^{\lambda}$, we find
$$E(X) = \sum_{k=0}^{\infty} k\,P(X = k) = (1-p)e^{-\lambda}\lambda \sum_{k=1}^{\infty} \frac{\lambda^{k-1}}{(k-1)!} = (1-p)\lambda.$$

• $E(ZW) = E(Z)E(W)$: We have $E(1 - I) = 1 - E(I) = 1 - p$, and $E(Y) = \lambda$. Therefore,
$$E(X) = E(1 - I)E(Y) = (1-p)\lambda.$$

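(d) Since $1 - I$ is an indicator, $(1 - I)^2 = 1 - I$, so $E((1-I)^2) = 1 - p$. Using the same rule for independent r.v.s, $E(X^2) = E((1-I)^2)E(Y^2) = (1-p)(\mathrm{Var}(Y) + (EY)^2) = (1-p)(\lambda + \lambda^2)$. Therefore,
$$\mathrm{Var}(X) = E(X^2) - (EX)^2 = (1-p)(\lambda + \lambda^2) - (1-p)^2\lambda^2 = (1-p)(1 + \lambda p)\lambda.$$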
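
The mean and variance of the Zero-Inflated Poisson can also be computed straight from the PMF in part (a). The sketch below does this with a truncated sum; the names `zip_pmf` and `zip_mean_var` and the cutoff `terms=100` are illustrative assumptions.

```python
import math

def zip_pmf(k, p, lam):
    """PMF of the Zero-Inflated Poisson: structural zero with prob. p, else Pois(lam)."""
    poisson = math.exp(-lam) * lam**k / math.factorial(k)
    return (1 - p) * poisson + (p if k == 0 else 0.0)

def zip_mean_var(p, lam, terms=100):
    """Mean and variance from a truncated sum over k = 0..terms-1."""
    mean = sum(k * zip_pmf(k, p, lam) for k in range(terms))
    second = sum(k * k * zip_pmf(k, p, lam) for k in range(terms))
    return mean, second - mean**2

p, lam = 0.4, 2.5
mean, var = zip_mean_var(p, lam)
print(mean, (1 - p) * lam)                  # mean vs. (1 - p) * lambda
print(var, (1 - p) * lam * (1 + p * lam))   # variance vs. (1 - p) * lambda * (1 + p * lambda)
```

The second comparison uses $E(X^2) = (1-p)(\lambda + \lambda^2)$, which follows because $(1-I)^2 = 1-I$ for an indicator $I$.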

6. (BH 4.37) You have a well-shuffled 52-card deck. On average, how many pairs of adjacent cards
are there such that both cards are red?
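Solution: By symmetry, each pair of adjacent cards has the same chance of both cards being red, namely $\frac{26}{52} \cdot \frac{25}{51} = \frac{25}{102}$ (the two cards are drawn without replacement, so this is slightly less than 1/4). Since there are 51 pairs of adjacent cards, by the linearity of expectation we obtain an expected number of $51 \cdot \frac{25}{102} = \frac{25}{2} = 12.5$ such pairs.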
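
The linearity argument can be checked exactly, both for the full deck and by brute force on a small deck where every ordering can be enumerated. This is a sketch under stated assumptions: the function names `expected_red_pairs` and `brute_force_red_pairs` are hypothetical, and the 6-card deck is chosen only so the enumeration stays small.

```python
from fractions import Fraction
from itertools import permutations

def expected_red_pairs(total, red):
    """Expected number of adjacent red-red pairs via linearity of expectation."""
    p_both_red = Fraction(red, total) * Fraction(red - 1, total - 1)
    return (total - 1) * p_both_red

def brute_force_red_pairs(total, red):
    """Average the count of adjacent red-red pairs over all orderings of a small deck."""
    deck = "R" * red + "B" * (total - red)
    orders = list(permutations(deck))   # treats the cards as distinguishable
    pairs = sum(
        sum(a == "R" and b == "R" for a, b in zip(order, order[1:]))
        for order in orders
    )
    return Fraction(pairs, len(orders))

print(expected_red_pairs(52, 26))                              # standard deck
print(expected_red_pairs(6, 3), brute_force_red_pairs(6, 3))   # small-deck check
```

The brute-force average on the 6-card deck should equal the value from the linearity formula, which supports applying the same formula to the 52-card deck.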

7. (BH 4.55) Elk dwell in a certain forest. There are $N$ elk, of which a simple random sample of size $n$ is captured and tagged (so all $\binom{N}{n}$ sets of $n$ elk are equally likely). The captured elk are
returned to the population, and then a new sample is drawn. This is an important method that
is widely used in ecology, known as capture-recapture. If the new sample is also a simple random
sample, with some fixed size, then the number of tagged elk in the new sample is Hypergeometric.
For this problem, assume that instead of having a fixed sample size, elk are sampled one by one
without replacement until m tagged elk have been recaptured, where m is specified in advance (of
course, assume that 1 ≤ m ≤ n ≤ N ). An advantage of this sampling method is that it can be
used to avoid ending up with a very small number of tagged elk (maybe even zero), which would
be problematic in many applications of capture-recapture. A disadvantage is not knowing how
large the sample will be.
(a) Find the PMFs of the number of untagged elk in the new sample (call this X) and of the total
number of elk in the new sample (call this Y ).
Hint: What does the event X = k say about how many tagged and how many untagged elk there
are in the first m + k − 1 elk sampled? What does it say about the (m + k)th elk sampled?
(b) Find the expected sample size EY using symmetry, linearity, and indicator r.v.s.
Hint: We can assume that even after getting $m$ tagged elk, they continue to be captured until all $N$ of them have been obtained; briefly explain why this can be assumed. Express $X = X_1 + \cdots + X_m$, where $X_1$ is the number of untagged elk before the first tagged elk, $X_2$ is the number between the first and second tagged elk, etc. Then find $EX_j$ by creating the relevant indicator r.v. for each untagged elk in the population.
(c) Suppose that m, n, N are such that EY is an integer. If the sampling is done with a fixed
sample size equal to EY rather than sampling until exactly m tagged elk are obtained, find the
expected number of tagged elk in the sample. Is it less than m, equal to m, or greater than m
(for n < N )?

Solution:
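(a) Suppose the number of untagged elk in the new sample is $X = k$. Then there are $m - 1$ tagged elk and $k$ untagged elk among the first $m + k - 1$ elk sampled, and the $m$th tagged elk is recaptured on the $(m+k)$th draw. Therefore, for $k = 0, 1, \ldots, N - n$,
$$P(X = k) = \frac{(\text{choices of tagged elk})(\text{choices of untagged elk})}{(\text{total choices})} \cdot \frac{(\text{choices for the } m\text{th tagged elk})}{(\text{remaining elk})} = \frac{\binom{n}{m-1}\binom{N-n}{k}}{\binom{N}{m+k-1}} \cdot \frac{n - (m-1)}{N - (m+k-1)}.$$
Since $Y = X + m$, the PMF of $Y$ is $P(Y = j) = P(X = j - m)$ for $j = m, m+1, \ldots, m + N - n$.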

(b) Simply ignore any elk that are captured after the $m$th tagged elk; if elk continue to be captured, that does not change the values of $X$ and $Y$ because those are fully determined once the $m$th tagged elk is captured. Let $X_i$ be the number of untagged elk between the $(i-1)$th and $i$th tagged elk, and let $X_{n+1}$ be the number of untagged elk captured after the last ($n$th) tagged elk. We have the identity $N - n = X_1 + \cdots + X_{n+1}$, and $E(X_i) = \frac{N-n}{n+1}$ by symmetry, since there are $n + 1$ r.v.s in the preceding equation, the tagged elk randomly partition the untagged elk, and there is no systematic difference between any two $X_i$ and $X_j$. We have, then, $E(X) = E(X_1) + \cdots + E(X_m) = \frac{m(N-n)}{n+1}$. Finally, the identity $Y = m + X$ yields
$$E(Y) = m + E(X) = m + \frac{m(N-n)}{n+1} = \frac{m(N+1)}{n+1}.$$
(c) Suppose we collect a sample of fixed size $\frac{m(N+1)}{n+1}$. By symmetry, each elk in this sample has an $n/N$ chance of being tagged. By linearity of expectation, the expected number of tagged elk in the entire sample is
$$\frac{m(N+1)}{n+1} \times \frac{n}{N} = m \cdot \frac{1 + 1/N}{1 + 1/n}.$$
Since $n < N$ we have $1/n > 1/N$, and this expected value is less than $m$.
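
The PMF from part (a) and the expectation from part (b) can be cross-checked with exact arithmetic. The sketch below is illustrative only: the function names `untagged_pmf` and `expected_sample_size` and the example values $N = 10$, $n = 4$, $m = 2$ are assumptions, not part of the original problem.

```python
from fractions import Fraction
from math import comb

def untagged_pmf(k, N, n, m):
    """P(X = k): m-1 tagged and k untagged in the first m+k-1 draws, then a tagged elk."""
    prefix = Fraction(comb(n, m - 1) * comb(N - n, k), comb(N, m + k - 1))
    last = Fraction(n - (m - 1), N - (m + k - 1))
    return prefix * last

def expected_sample_size(N, n, m):
    """E(Y) = m + E(X), with E(X) summed directly from the PMF."""
    return m + sum(k * untagged_pmf(k, N, n, m) for k in range(N - n + 1))

N, n, m = 10, 4, 2   # hypothetical example values
total_prob = sum(untagged_pmf(k, N, n, m) for k in range(N - n + 1))
ey = expected_sample_size(N, n, m)
tagged_if_fixed = ey * Fraction(n, N)   # part (c): expected tagged elk in a sample of size E(Y)
print(total_prob, ey, tagged_if_fixed)
```

Here `ey` should equal $m(N+1)/(n+1)$, and `tagged_if_fixed` should be smaller than $m$, in line with part (c). The example values do not make $E(Y)$ an integer, so the last quantity only illustrates the formula rather than an actual fixed sample size.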