1. (10 pt) Suppose we take one observation, X, from the discrete distribution

        x             −2          −1      0      1            2
        Pr(X = x|θ)   (1 − θ)/4   θ/12    1/2    (3 − θ)/12   θ/4

   where 0 ≤ θ ≤ 1.
solutions:
(a) For the statistic T(X) = 4 · 1{X = 2},

    E[T(X)] = 4 × θ/4 + 0 × ((1 − θ)/4 + θ/12 + 1/2 + (3 − θ)/12) = θ,

so T is an unbiased estimator of θ.
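This unbiasedness claim is easy to spot-check numerically. Below is a minimal Monte Carlo sketch (assuming NumPy; the statistic T(X) = 4 · 1{X = 2} is taken from the calculation above, and θ = 0.6 is an arbitrary illustrative value):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.6  # any value in [0, 1]
support = np.array([-2, -1, 0, 1, 2])
probs = np.array([(1 - theta) / 4, theta / 12, 1 / 2, (3 - theta) / 12, theta / 4])

# T(X) = 4 when X = 2 and 0 otherwise; its sample mean should be close to theta.
x = rng.choice(support, size=200_000, p=probs)
t_bar = (4.0 * (x == 2)).mean()
print(t_bar)  # close to 0.6
```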
(b) Denote the MLE by θ̂. Since there is only one observation, θ̂(x) maximizes Pr(X = x|θ) over θ ∈ [0, 1]: θ̂(−2) = 0, θ̂(−1) = 1, θ̂(1) = 0, θ̂(2) = 1, while for x = 0 the likelihood 1/2 does not depend on θ, so θ̂(0) can be any value in [0, 1]. Hence the MLE is not unique, and every MLE is biased, since

    E[θ̂] = 0 × (1 − θ)/4 + 1 × θ/12 + θ̂(0) × 1/2 + 0 × (3 − θ)/12 + 1 × θ/4 = θ/3 + θ̂(0)/2,

which cannot equal θ for every θ ∈ [0, 1].
(a) It is enough to show that the negative log likelihood ℓ(θ) = − log ∏_i f(x_i | θ) is a strictly convex function of θ ∈ R^p. Since ℓ is the sum of the negative log likelihoods of the individual observations, and a sum of strictly convex functions is strictly convex, it is enough to consider a single observation, n = 1.
To show the negative log likelihood is strictly convex it is enough to show that its Hessian, the matrix H_ij = ∂²ℓ(θ)/∂θ_i∂θ_j of second derivatives, is positive-definite everywhere (except possibly at θ̂ = x). Let's compute the necessary derivatives:
    ℓ(θ) = − log c_α + ( Σ_{i=1}^p (x_i − θ_i)² )^{α/2}

    (∂/∂θ_k) ℓ(θ) = −α ( Σ_{i=1}^p (x_i − θ_i)² )^{(α−2)/2} (x_k − θ_k)

    (∂²/∂θ_k²) ℓ(θ) = α ( Σ_{i=1}^p (x_i − θ_i)² )^{(α−2)/2} + α(α − 2) ( Σ_{i=1}^p (x_i − θ_i)² )^{(α−4)/2} (x_k − θ_k)²

                    = α ( Σ_{i=1}^p (x_i − θ_i)² )^{(α−4)/2} [ Σ_{i=1}^p (x_i − θ_i)² + (α − 2)(x_k − θ_k)² ]

    (∂²/∂θ_i∂θ_k) ℓ(θ) = α ( Σ_{i=1}^p (x_i − θ_i)² )^{(α−4)/2} (α − 2)(x_i − θ_i)(x_k − θ_k)

Writing ∆ = x − θ and collecting terms, the second derivatives can be expressed through the matrix A with entries

    A_kk = Σ_{i=1}^p (x_i − θ_i)² + (α − 2)(x_k − θ_k)²
    A_kj = (α − 2)(x_k − θ_k)(x_j − θ_j),  j ≠ k,

that is, A = |∆|² I_p + (α − 2)∆∆′.
The matrix A satisfies A∆ = λ∆ with eigenvalue λ = |∆|²(α − 1), strictly positive since α > 1 (except at ∆ = 0, i.e., θ = θ̂ = x, which is okay). The other eigenvectors are orthogonal to ∆, all with eigenvalue λ′ = |∆|², also strictly positive. Thus A is positive-definite and so is the Hessian, H = α|∆|^{α−4} A.
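The eigenvalue analysis above can be verified numerically. A short sketch (assuming NumPy; α, p, and the vector ∆ are arbitrary illustrative values):

```python
import numpy as np

alpha, p = 1.5, 4                      # any alpha > 1 works
rng = np.random.default_rng(1)
delta = rng.normal(size=p)             # delta = x - theta, an arbitrary nonzero vector
nrm2 = float(delta @ delta)            # |delta|^2

# A = |delta|^2 I_p + (alpha - 2) delta delta'; H = alpha |delta|^(alpha-4) A
# has the same definiteness as A.
A = nrm2 * np.eye(p) + (alpha - 2) * np.outer(delta, delta)
eig = np.linalg.eigvalsh(A)            # ascending order

# Predicted spectrum: |delta|^2 (alpha - 1) once, |delta|^2 with multiplicity p - 1.
predicted = np.sort(np.r_[nrm2 * (alpha - 1), np.full(p - 1, nrm2)])
print(np.allclose(eig, predicted), bool((eig > 0).all()))
```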
(b) First consider the case where α = 1 in dimension p = 1, with n = 2m even.
WLOG order the data x1 ≤ x2 ≤ · · · ≤ xn . The log likelihood function is given
by
    ℓ(θ) = n log c_α − Σ_i |x_i − θ|,
a continuous function whose derivative does not exist at the data points θ ∈ {xi }
and which elsewhere satisfies
    (d/dθ) ℓ(θ) = − Σ_i (d/dθ) |x_i − θ|
                = [ Σ_{x_i < θ} (−1) ] + [ Σ_{x_i > θ} (+1) ]
                = the number of x_i > θ minus the number of x_i < θ
(note that the derivative of |x − θ| is −1 on the interval θ ∈ (−∞, x), +1 on the interval θ ∈ (x, ∞), and undefined at the point θ = x). Thus ℓ(θ) is increasing when θ < x_m, where more than half the {x_i} exceed θ, and is decreasing when θ > x_{m+1}, where fewer than half the {x_i} exceed θ. On the interval x_m < θ < x_{m+1} the derivative is zero, so ℓ(θ) is constant there and equal to its maximum value.
In case n = 2m − 1 is odd, the same argument shows that ℓ(θ) achieves its unique maximum at the median x_m.
In dimension p ≥ 2 a similar argument holds, except that θ̂ may now be any point of the “median rectangle” (or “median block”), i.e., any point each of whose components is a median of the corresponding components of the x_i's.
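The p = 1, even-n case can be checked directly: the objective Σ|x_i − θ| is constant on [x_(m), x_(m+1)] and no smaller anywhere else. A small sketch (assuming NumPy; the data values are made up):

```python
import numpy as np

x = np.array([0.2, 1.0, 1.7, 3.5])           # n = 2m = 4, already sorted
obj = lambda t: np.abs(x - t).sum()          # minus the log likelihood, up to constants

# obj is constant on the median interval [x_(2), x_(3)] = [1.0, 1.7] ...
inside = np.array([obj(t) for t in np.linspace(1.0, 1.7, 8)])
# ... and at least as small there as anywhere on a wide grid.
grid_best = min(obj(t) for t in np.linspace(-1.0, 5.0, 601))
print(inside[0], grid_best)
```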
3. (10 pt) Let (X_1, · · · , X_n) be a random sample from the uniform distribution on the interval (θ − 1/2, θ + 1/2), where θ ∈ R is unknown. Let X_(j) be the j-th order statistic.
(a) Show that (X_(1) + X_(n))/2 is strongly consistent for θ, i.e., that lim_{n→∞} (X_(1) + X_(n))/2 = θ a.s.
(b) Show that X̄_n := (X_(1) + · · · + X_(n))/n is L2 consistent.
solution:
(a) For any ε ∈ (0, 1),

    P(|X_(1) − (θ − 1/2)| > ε) = P(X_(1) > ε + (θ − 1/2)) = [P(X_1 > ε + (θ − 1/2))]^n = (1 − ε)^n

and

    P(|X_(n) − (θ + 1/2)| > ε) = P(X_(n) < −ε + (θ + 1/2)) = [P(X_1 < −ε + (θ + 1/2))]^n = (1 − ε)^n.

Since Σ_n (1 − ε)^n < ∞, the Borel–Cantelli lemma gives X_(1) → θ − 1/2 a.s. and X_(n) → θ + 1/2 a.s., so (X_(1) + X_(n))/2 → θ a.s.
(b) E[X̄_n] = θ and Var(X̄_n) = Var(X_1)/n = 1/(12n), so E[(X̄_n − θ)²] = 1/(12n) → 0, i.e., X̄_n is L2 consistent.
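A simulation makes both consistency claims visible (assuming NumPy; θ = 1.3 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(2)
theta = 1.3
for n in (10, 100, 10_000):
    x = rng.uniform(theta - 0.5, theta + 0.5, size=n)
    midrange = (x.min() + x.max()) / 2       # (X_(1) + X_(n))/2 from part (a)
    print(n, midrange, x.mean())             # both columns approach theta
```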
4. (10 pt) Let {X_i} iid∼ No(µ, σ²) be a random sample from the normal distribution with unknown mean µ ∈ R and known variance σ² > 0. For fixed t ≠ 0, find the Uniform Minimum-Variance Unbiased Estimator (UMVUE) of e^{tµ} and show that its variance is larger than the Cramér–Rao lower bound. Show that the ratio of its variance to the Cramér–Rao lower bound converges to 1 as n → ∞.
solutions:
(a) The sample mean X̄ is complete and sufficient for µ. Since E[e^{tX̄}] = e^{µt + σ²t²/(2n)}, the UMVUE of e^{tµ} is T(X̄) = e^{−σ²t²/(2n) + tX̄}.
The Fisher information is I(µ) = n/σ², so the Cramér–Rao lower bound is ((d/dµ) e^{tµ})² / I(µ) = σ²t²e^{2tµ}/n. On the other hand,

    Var(T) = e^{−σ²t²/n} E[e^{2tX̄}] − e^{2tµ} = (e^{σ²t²/n} − 1) e^{2tµ} > σ²t²e^{2tµ}/n,

the Cramér–Rao lower bound, since e^x − 1 > x for x > 0.
(b) The ratio of the variance of the UMVUE to the Cramér–Rao lower bound is (e^{σ²t²/n} − 1)/(σ²t²/n), which converges to 1 as n → ∞, since lim_{x→0} (e^x − 1)/x = 1.
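The variance formula and the bound can also be tabulated directly: writing a = σ²t²/n, the ratio is (e^a − 1)/a. A short numeric sketch (the values of σ, t, µ are arbitrary):

```python
import math

sigma, t, mu = 2.0, 0.7, 0.4                 # arbitrary, with t != 0
for n in (5, 50, 5000):
    a = sigma**2 * t**2 / n
    var_umvue = (math.exp(a) - 1) * math.exp(2 * t * mu)
    crlb = a * math.exp(2 * t * mu)
    ratio = var_umvue / crlb                 # = (e^a - 1)/a > 1, -> 1 as n grows
    print(n, ratio)
```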
5. (10 pt) Let X have density function f(x|θ), x ∈ R^d, and let θ have (proper) prior density π(θ). Let δ^π(x) denote the Bayes estimate of θ under π(θ) for squared-error loss (i.e., δ^π(x) := E[θ | X = x]) and suppose δ^π has finite Bayes risk r(π, δ^π) = E[|δ^π(X) − θ|²]. Show that for any other estimator δ(x),

    r(π, δ) − r(π, δ^π) = ∫ (δ(x) − δ^π(x))² f(x) dx,

where f(x) is the marginal density of X. If f(x|θ) is normal No(θ, 1), consider the collection of estimators of the form δ(X) := cX + d. Show that whenever 0 ≤ c < 1 these estimators are all proper Bayes estimators, and hence admissible [Hint: find a prior π for which δ^π(X) = cX + d]. Show that if c > 1 the resulting estimator is inadmissible.
solutions:
(a)

    r(π, δ) − r(π, δ^π) = ∫∫ [(θ − δ(x))² − (θ − δ^π(x))²] f(x|θ) π(θ) dx dθ
                        = ∫∫ [(θ − δ(x))² − (θ − δ^π(x))²] π(θ|x) f(x) dθ dx
                        = ∫ f(x) [ ∫ [(θ − δ(x))² − (θ − δ^π(x))²] π(θ|x) dθ ] dx
                        = ∫ f(x) [δ(x)² − δ^π(x)² + 2δ^π(x)² − 2δ(x)δ^π(x)] dx
                        = ∫ (δ(x) − δ^π(x))² f(x) dx,

using ∫ θ π(θ|x) dθ = δ^π(x) in the fourth line.
(b) With prior θ ∼ No(µ, τ²) and X|θ ∼ No(θ, 1), the posterior mean is

    E[θ|x] = (τ²/(τ² + 1)) x + (1/(τ² + 1)) µ.

Let c = τ²/(τ² + 1) and d = µ/(τ² + 1). Whenever 0 ≤ c < 1 we can solve for the prior parameters, τ² = c/(1 − c) and µ = d/(1 − c), so cX + d is a Bayes estimator with respect to the proper prior No(µ, τ²).
(c) When c > 1, MSE(cX + d) = R(θ, δ_c) = ((c − 1)θ + d)² + c² Var(X) ≥ c² > 1 for every θ, while the MSE of δ_0(X) = X is identically 1, so the estimator δ_0(X) = X is uniformly better than cX + d, which is therefore inadmissible. Similarly, when c = 1 and d ≠ 0, X is uniformly better than X + d.
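The risk comparison in (c) can be verified on a grid of θ values (assuming NumPy; c and d are illustrative):

```python
import numpy as np

# For X ~ No(theta, 1), the risk of delta(X) = cX + d is
# R(theta) = ((c - 1) * theta + d)^2 + c^2 * Var(X), with Var(X) = 1.
def risk(c, d, theta):
    return ((c - 1) * theta + d) ** 2 + c**2

thetas = np.linspace(-10.0, 10.0, 401)
c, d = 1.2, 0.3                              # any c > 1
dominated = bool((risk(c, d, thetas) > 1.0).all())
print(dominated)  # True: delta_0(X) = X has constant risk 1 and beats cX + d everywhere
```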
(a) One way to show this is to notice that it is a limiting case of homework problem 1.2.14 from the first homework set, taking the limit as τ² goes to infinity. Then the posterior distribution of µ approaches a normal distribution with mean x̄ and variance σ²/n. If we were “bad” and interpreted a frequentist confidence
interval for µ in a Bayesian way, we would be implicitly assuming that our prior
knowledge of µ was utterly nil—that all real values were equally likely in our
minds (a questionable assumption, since it is not a proper prior). Here is a
slightly more detailed development of the solution.
We know that if X_j iid∼ No(θ, σ²), then the sufficient statistic X̄ ∼ No(θ, σ²/n). So the likelihood function for θ is given by:

    ℓ(θ) = f(x̄|θ) = (1/√(2πσ²/n)) exp( −(x̄ − θ)² / (2σ²/n) )
The posterior distribution of θ given X is given by:

    π(θ|X) ∝ 1 · (1/√(2πσ²/n)) exp( −(x̄ − θ)² / (2σ²/n) )
This function, which came about because it was the pdf of x̄, is also the kernel of
a normal density for θ, and it is easy to see that the mean is x̄ and the variance
is σ 2 /n. So θ|X ∼ No(x̄, σ 2 /n).
Under squared error loss, the Bayes estimator is the posterior mean, which in
this case is x̄.
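This flat-prior posterior can be checked numerically by normalizing the kernel on a grid (assuming NumPy; the values of σ, n, and x̄ are illustrative):

```python
import numpy as np

sigma, n, xbar = 1.5, 20, 0.8                # illustrative values
theta = np.linspace(xbar - 5.0, xbar + 5.0, 20_001)

# Unnormalized posterior under the flat prior: exp(-n (xbar - theta)^2 / (2 sigma^2)).
kernel = np.exp(-n * (xbar - theta) ** 2 / (2 * sigma**2))
w = kernel / kernel.sum()                    # discrete normalization on the grid

post_mean = (theta * w).sum()
post_var = ((theta - post_mean) ** 2 * w).sum()
print(post_mean, post_var, sigma**2 / n)     # mean ~ xbar, variance ~ sigma^2/n
```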
π(θ|X) ∝ π(θ)π(X|θ)
= (1 − θ)−2 θS (1 − θ)n−S
= θS (1 − θ)n−S−2
This is the kernel of a Be(S + 1, n − S − 1) density, so long as the conditions are
met for such a density to exist, which are that S + 1 > 0 and n − S − 1 > 0.
The first of these will always be true, but the second will only be true when
n − S > 1, i.e., when at least two “failures” are observed. (Should we have
n − S = 1 or n − S = 0, then the expression θ^S (1 − θ)^{n−S−2} would not have a finite integral, so the posterior distribution on θ would be improper, a decided no-no.)
Under the squared error loss function, the Bayes estimator is the posterior mean of θ, which for a Be(S + 1, n − S − 1) density is known to be (S + 1)/((S + 1) + (n − S − 1)) = (S + 1)/n.
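The posterior-mean formula can be confirmed by normalizing the kernel θ^S (1 − θ)^{n−S−2} on a grid (assuming NumPy; S and n are illustrative values with n − S > 1):

```python
import numpy as np

S, n = 7, 20                                 # illustrative; needs n - S > 1
theta = np.linspace(1e-6, 1 - 1e-6, 200_001)

# Kernel of the Be(S + 1, n - S - 1) posterior.
kernel = theta**S * (1 - theta) ** (n - S - 2)
w = kernel / kernel.sum()                    # discrete normalization on the grid

post_mean = (theta * w).sum()
print(post_mean, (S + 1) / n)  # both approximately 0.4
```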
• Note that some students calculated the Bayes rule for λ instead of for θ, which I also marked correct. The Bayes rule for λ is (S + 1)/(n − S − 2), which exists when S < n − 1 and is finite when S < n − 2.