
Stat 244 Winter 2024 — Problem set 6

Due on Gradescope, Thursday Feb 15 (9:30am)

1. Suppose X is drawn from Uniform[0, T ]. The parameter T can be 1 or 2 or 3—these are the only
possibilities.
(a) Consider the following procedure for estimating T based on observing the data X:
• If we observe X ≤ 1, we estimate that T is 1
• If we observe 1 < X ≤ 2, we estimate that T is 2
• If we observe X > 2, we estimate that T is 3
For each possible value of T , compute the probability that we estimate T correctly or incorrectly.
(b) Consider a Bayesian framework where we place a prior on T — we assume it’s equally likely to
be 1 or 2 or 3. (In other words, T ∼ Uniform{1, 2, 3}, the uniform distribution over a finite set.)
Compute the posterior distribution of T , given the observed data X. As in the lecture, you can
assume that it is okay to combine densities and PMFs for this setting where X is continuous while
T is discrete. The final form of your answer should be very simple — your final answer should give
simple numerical values, without summation notation or anything like that, but you will need to
split into cases.
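As a sanity check on part (a), the procedure is easy to simulate. The following Python sketch (the function names are ours, not part of the problem) estimates the probability of a correct guess for any true T:

```python
import random

def estimate_T(x):
    """The estimation rule from part (a)."""
    if x <= 1:
        return 1
    elif x <= 2:
        return 2
    else:
        return 3

def prob_correct(T, n_sims=100_000):
    """Empirical probability that the rule recovers the true T,
    when X ~ Uniform[0, T]."""
    hits = sum(estimate_T(random.uniform(0, T)) == T for _ in range(n_sims))
    return hits / n_sims
```

For instance, when T = 1 every draw satisfies X ≤ 1, so the rule is always correct; the simulated probabilities for T = 2 and T = 3 should match your hand calculations.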
2. Rejection sampling for the Geometric distribution.

(a) Describe the rejection sampling procedure that we would use if we have access to draws from the
Geometric(0.5) distribution, and would like to simulate draws from the Geometric(0.6) distribu-
tion.
(b) What goes wrong if we instead try to simulate draws from the Geometric(0.4) distribution (again
assuming that Geometric(0.5) is the distribution that we can draw samples from)?
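For reference, the general discrete rejection sampling template looks like the following Python sketch. To avoid spelling out the answer to part (a), the illustration uses a Geometric(0.7) target rather than Geometric(0.6); all function names here are ours:

```python
import math
import random

def draw_geometric(p):
    """Draw from Geometric(p) on {1, 2, ...} by inversion."""
    u = 1.0 - random.random()  # u in (0, 1], so log(u) is defined
    return max(1, math.ceil(math.log(u) / math.log(1 - p)))

def geom_pmf(k, p):
    """P(X = k) for X ~ Geometric(p), k = 1, 2, ..."""
    return p * (1 - p) ** (k - 1)

def rejection_sample(target_pmf, proposal_pmf, draw_proposal, M, n):
    """Draw n samples from the target by accepting each proposal k
    with probability target_pmf(k) / (M * proposal_pmf(k)).
    Requires target_pmf(k) <= M * proposal_pmf(k) for all k."""
    out = []
    while len(out) < n:
        k = draw_proposal()
        if random.random() <= target_pmf(k) / (M * proposal_pmf(k)):
            out.append(k)
    return out

# Illustration: target Geometric(0.7), proposal Geometric(0.5).
# The pmf ratio (0.7/0.5) * (0.3/0.5)**(k - 1) is largest at k = 1,
# so the smallest valid constant is M = 0.7 / 0.5.
samples = rejection_sample(
    lambda k: geom_pmf(k, 0.7),
    lambda k: geom_pmf(k, 0.5),
    lambda: draw_geometric(0.5),
    M=0.7 / 0.5,
    n=20_000,
)
```

Notice that the template only works when the pmf ratio is bounded over the whole support, which is exactly the issue part (b) asks you to investigate.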

3. Consider the following two density functions:

[Figure: two density plots shown side by side, titled "Distribution A" and "Distribution B". Each plot has "density" on the vertical axis (0 to 5) and "x" on the horizontal axis (0.0 to 1.0).]

Consider two settings:
(1) You have access to samples from Distribution A, and you use rejection sampling to produce
samples from Distribution B.
(2) You have access to samples from Distribution B, and you use rejection sampling to produce
samples from Distribution A.

Which of these two implementations of rejection sampling will be more efficient, and which will be less
efficient? Explain your answer thoroughly. You may use pictures to help explain your solution, but a
picture alone without an explanation is not sufficient.
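One useful fact for this problem: if M = sup f(x)/g(x) is the smallest valid scaling constant for target density f and proposal density g, then on average each proposal is accepted with probability 1/M, so a larger M means a less efficient sampler. A Python sketch with stand-in densities (Beta(2, 2) target, Uniform(0, 1) proposal; these are not the densities from the figure) illustrates this:

```python
import random

def acceptance_rate(target_pdf, proposal_pdf, draw_proposal, M, n_trials=100_000):
    """Fraction of proposals accepted; in expectation this equals 1 / M
    when M = sup target_pdf / proposal_pdf (the smallest valid constant)."""
    accepted = 0
    for _ in range(n_trials):
        x = draw_proposal()
        if random.random() <= target_pdf(x) / (M * proposal_pdf(x)):
            accepted += 1
    return accepted / n_trials

# Illustration (not the densities from the figure): target Beta(2, 2)
# with density 6x(1 - x), proposal Uniform(0, 1). The density ratio is
# maximized at x = 0.5, giving M = 1.5, so roughly 2/3 of proposals
# should be accepted.
rate = acceptance_rate(lambda x: 6 * x * (1 - x), lambda x: 1.0, random.random, M=1.5)
```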
4. In this problem you will construct a loose confidence interval for Binomial data. This interval won't be
optimal (there are techniques to compute a narrower interval, which will be covered later on), but
the goal is to illustrate the idea behind confidence intervals rather than to compute the best possible
answer.
Suppose that X ∼ Binomial(n, p) (e.g., the total number of Heads, if a probability-p coin is flipped n
times). We will write p̂ = X
n , the proportion of Heads in the observed data. We know that E(X) = np
and Var(X) = np(1 − p).

(a) Use Chebyshev’s inequality to find an upper bound on

P(|p̂ − p| ≥ ϵ)

where ϵ > 0 is some small constant. Your bound will depend on the unknown p. However, by
calculating max_{p ∈ [0,1]} p(1 − p), you can construct a looser upper bound that doesn't depend
on p. So, your final answer should be of the form

P(|p̂ − p| ≥ ϵ) ≤ (some function that depends on n and on ϵ but not on p)

(b) Next, for some desired error level α ∈ (0, 1), find a value for ϵ so that the probability above is
≤ α. Your value of ϵ should depend on n and α but not on p. Once you’ve computed this, you
will have a statement of the form

P(|p̂ − p| ≥ (some function of n and α)) ≤ α.

Then, based on this answer, compute an interval of the form


[ (a lower bound which is a function of p̂ & n & α) , (an upper bound which is a function of p̂ & n & α) ]

so that, calculating probability with respect to a draw of the random variable X, it holds that

P(the true parameter p lies in the interval) ≥ 1 − α.
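After you have found your ϵ, the coverage statement can be checked by simulation. A Python sketch (the ϵ used below is an arbitrary placeholder, not the Chebyshev value you are asked to derive):

```python
import random

def coverage(n, p, eps, n_sims=20_000):
    """Fraction of simulated datasets for which the interval
    [p_hat - eps, p_hat + eps] contains the true p."""
    hits = 0
    for _ in range(n_sims):
        x = sum(random.random() < p for _ in range(n))  # one Binomial(n, p) draw
        if abs(x / n - p) <= eps:
            hits += 1
    return hits / n_sims

# With a placeholder eps = 0.15, n = 100, and true p = 0.3, the interval
# p_hat +/- eps should cover p in the vast majority of simulations.
cov = coverage(n=100, p=0.3, eps=0.15)
```

Since Chebyshev's inequality is loose, the observed coverage will typically be well above the nominal 1 − α.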

5. For each part of this problem, use the CLT to approximate the probability you need to compute. You
can ignore the issue of continuity corrections for this problem.
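Each part reduces to standardizing a sum of i.i.d. rounds and evaluating the standard normal CDF Φ. A generic Python helper for that step (function names are ours; the standard library's error function gives Φ directly):

```python
import math

def phi(z):
    """Standard normal CDF, via the error function."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

def clt_tail(mu, var, m, t):
    """CLT approximation to P(S > t), where S is the sum of m i.i.d.
    rounds, each round having mean mu and variance var."""
    z = (t - m * mu) / math.sqrt(m * var)
    return 1 - phi(z)
```

You still need to work out the per-round mean and variance for each game by hand; the helper only handles the standardization.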
(a) There are two games to play at a fair. For the first game, in each round of the game you roll a
fair die, and win $4 if you roll a 6 or lose $1 otherwise. Suppose you play this first game 36 times.
What is the approximate probability that in the end you are ahead (i.e., your total earnings are
positive)?
(b) For the second game, in each round of the game you throw a football, and the money you win
is equal to 0.1(F − 20), where F (in feet) is the distance that you threw the ball. Assume that
F follows an Exponential(0.1) distribution. Suppose you play the second game 50 times. What
is the approximate probability that you lose no more than $30 in total when playing the second
game?

(c) Now combine all your games—you play the first game 36 times and then the second game 50
times. What is the approximate probability of losing less than $72 in total?
6. Let X ∼ Binomial(60, 0.22) and let Y = X/60 be the proportion of successes in the sample.
(a) What is the normal distribution that approximates the distribution of Y ?
(b) Calculate (approximately) the probability P(Y ≤ 0.25) (you can ignore issues of continuity cor-
rections etc.). (To obtain values of Φ(x), the CDF of the standard normal distribution, you can use Table
2 in the back of your book or just search online for “standard normal table”. Or, if you have R,
you can use the command pnorm.)
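If neither a table nor R is handy, Python's standard library can produce the same values; a minimal stand-in for R's pnorm:

```python
from statistics import NormalDist

def pnorm(x, mean=0.0, sd=1.0):
    """Normal CDF, mirroring R's pnorm(q, mean, sd)."""
    return NormalDist(mean, sd).cdf(x)
```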
