
INDIAN STATISTICAL INSTITUTE

B.Stat. (Hons.) 3rd Year. 2022-23. Semester 1


Parametric Inference Assignment 5 October 17, 2022

Bayesian inference

1. (Bayes estimates in some standard situations involving a proper prior)


(a) Suppose X1 , . . . , Xn are i.i.d. Bin(1, θ), 0 ≤ θ ≤ 1, and the prior distribution Π of
Θ is given by a Beta(a, b) distribution. Write X = (X1 , . . . , Xn ).
(1) Find the posterior distribution of Θ given X.
(2) Find the Bayes estimate δπ (X) of θ when L(θ, d) = (θ − d)². Show that it is a
convex combination of the prior mean and the sample mean.
(3) What can you say about δπ (X) when n is held fixed but a and b → ∞ with the
ratio b/a being kept fixed?
(4) What can you say about δπ (X) when a and b are held fixed but n → ∞?
(5) Find the Bayes estimate δπ∗ (X) of θ(1 − θ) under squared error loss.
(6) Compare δπ∗ (X) of (5) with the UMVUE δ0 (X) of θ(1 − θ).
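
A quick numerical check of (1) and (2) (a sketch in Python, assuming numpy is available; all parameter values are illustrative):

    import numpy as np

    rng = np.random.default_rng(0)
    a, b, n, theta = 2.0, 3.0, 50, 0.4          # illustrative values
    x = rng.binomial(1, theta, size=n)
    s = x.sum()

    # Posterior is Beta(a + s, b + n - s); its mean is the Bayes estimate
    # under squared error loss.
    post_mean = (a + s) / (a + b + n)

    # The same value as a convex combination of the prior mean a/(a + b)
    # and the sample mean s/n, with weight w = (a + b)/(a + b + n) on the prior.
    w = (a + b) / (a + b + n)
    print(post_mean, w * a / (a + b) + (1 - w) * s / n)   # the two agree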

(b) Suppose X1 , . . . , Xn are i.i.d. Poisson(θ), θ > 0, and the prior distribution Π of Θ is
given by a Gamma(a, b) distribution with shape parameter a and scale parameter b. Write
X = (X1 , . . . , Xn ).
(1) Find the posterior distribution of Θ given X.
(2) Find the Bayes estimate δπ (X) of θ when L(θ, d) = (θ − d)². Show that it is a
convex combination of the prior mean and the sample mean.
(3) What happens to δπ (X) when the loss is squared error and n → ∞?
(4) What happens to δπ (X) when the loss is squared error and a → 0, b → ∞, or both?
(5) Find the Bayes estimate δπ (X) of θ when L(θ, d) = (θ − d)²/θᵏ.

(c) Suppose X1 , . . . , Xn are i.i.d. N(θ, σ0²), θ ∈ R and σ0 > 0 is known. Let the prior
distribution Π of Θ be given by a N(µ, b²) distribution. Write X = (X1 , . . . , Xn ).
(1) Find the posterior distribution of Θ given X.
(2) Find the Bayes estimate δπ (X) of θ when L(θ, d) = (θ − d)². Show that it is a
convex combination of the prior mean and the sample mean.
(3) Show that as n → ∞ with µ and b fixed, δπ (X) becomes essentially the estimate
X̄ and δπ (X) → θ in probability. Interpret these facts.

(4) Show that as b → 0, δπ (X) → µ in probability. Interpret this fact.
(5) Show that as b → ∞, δπ (X) essentially coincides with X̄. Interpret this fact.
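
A small sketch of the limits in (2)–(3): the posterior mean puts weight (σ0²/n)/(σ0²/n + b²) on the prior mean µ, and this weight vanishes as n → ∞ (Python, assuming numpy; values are illustrative):

    import numpy as np

    rng = np.random.default_rng(1)
    mu, b2, sigma0, theta = 0.0, 4.0, 1.0, 2.5   # illustrative values

    for n in (5, 50, 5000):
        xbar = rng.normal(theta, sigma0 / np.sqrt(n))   # simulate X̄
        w = (sigma0**2 / n) / (sigma0**2 / n + b2)      # weight on the prior mean
        print(n, w * mu + (1 - w) * xbar)               # approaches theta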

(d) Suppose X1 , . . . , Xn are i.i.d. Geometric(θ), 0 ≤ θ ≤ 1, and the prior distribution Π
of Θ is given by a Beta(a, b) distribution. Write X = (X1 , . . . , Xn ).
(1) Find the posterior distribution of Θ given X.
(2) Find the Bayes estimate δπ^(1)(X) of θ when L(θ, d) = (θ − d)². Show that it is a
convex combination of the prior mean and the sample mean.
(3) What can you say about δπ^(1)(X) when n is held fixed but a and b → ∞ with the
ratio b/a being kept fixed?
(4) What can you say about δπ^(1)(X) when a and b are held fixed but n → ∞?
(5) Find the Bayes estimate δπ^(2)(X) of 1/θ when the loss is squared error. Show that
it is a convex combination of the prior mean and the sample mean.
(6) What can you say about δπ^(2)(X) when n is held fixed but a and b → ∞ with the
ratio b/a being kept fixed?
(7) What can you say about δπ^(2)(X) when a and b are held fixed but n → ∞?

(e) Suppose X1 , . . . , Xn are i.i.d. N(0, σ²), σ > 0 is unknown. Let the prior distribution
Π for τ := 1/(2σ²) be given by a Gamma(g, 1/α) distribution with shape parameter g and
scale parameter 1/α. Write X = (X1 , . . . , Xn ).
(1) Find the posterior distribution of τ given X.
(2) Find the Bayes estimate δπ^(1)(X) of σ² when L(σ², d) = (σ² − d)².
(3) Find the Bayes estimate δπ^(2)(X) of σ² when L(σ², d) = (σ² − d)²/σ⁴.

2. (Bayes estimates in some standard situations involving an improper prior)


(a) Suppose X1 , . . . , Xn are i.i.d. Bin(1, θ), 0 ≤ θ ≤ 1, and the prior distribution Π of
Θ is given by a Beta(a, b) distribution, where a = b = 0. Hence, the prior is improper since
∫_0^1 [p(1 − p)]^{−1} dp = ∞. Write X = (X1 , . . . , Xn ). Consider the problem of estimating θ
when L(θ, d) = (θ − d)². Let X := X1 + · · · + Xn .
(1) Show that the posterior distribution of Θ given X is proper if 1 ≤ X ≤ n − 1, and
that it is improper if X = 0 or n.
(2) Show that for any estimate δ(X) that satisfies δ(0) = 0 and δ(n) = 1, the posterior
expected loss E[L(Θ, δ(X))|X] is finite and minimized at δ(X) = X/n.
(3) Argue that even though the resulting posterior distribution is not proper for all
values of x, δ0 (X) := X/n can be considered as a Bayes estimate.

(b) Suppose X1 , . . . , Xn are i.i.d. N(θ, σ0²), θ ∈ R and σ0 > 0 is known. Let the prior
distribution Π of Θ be given by an improper distribution with π(θ) = 1 (or any positive
constant c) for all θ ∈ R. This may be identified as a N(µ, b²) distribution with b = ∞.
Write X = (X1 , . . . , Xn ). Consider the problem of estimating θ when L(θ, d) = (θ − d)².
(1) Find the posterior distribution of Θ given X. [This is a proper posterior.]
(2) Show that the posterior expected loss E[L(Θ, δ(X))|X] is finite and minimized at
δ(X) = X̄.
(3) Argue that X̄ can be considered as a Bayes estimate.

(c) Suppose X1 , . . . , Xn are i.i.d. N(0, σ²), σ > 0 is unknown. Let the prior distribution
Π for τ := 1/(2σ²) be given by the improper density (1/τ) dτ, which corresponds to a
Gamma(g, 1/α) distribution with g = α = 0. Write X = (X1 , . . . , Xn ).
(1) Find the posterior distribution of τ given X.
(2) Find the Bayes estimate δπ^(1)(X) of σ² when L(σ², d) = (σ² − d)². Argue that
Y /(n − 2) is a Bayes estimate, where Y := ∑_{i=1}^n Xi².
(3) Find the Bayes estimate δπ^(2)(X) of σ² when L(σ², d) = (σ² − d)²/σ⁴. Argue that
Y /(n + 2) is a Bayes estimate, where Y := ∑_{i=1}^n Xi².
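
Both claims can be checked by simulation once (1) is done: the posterior of τ under this improper prior is Gamma with shape n/2 and rate Y, a fact the sketch below takes as given (Python, assuming numpy; n and Y are illustrative):

    import numpy as np

    rng = np.random.default_rng(2)
    n, Y = 20, 30.0                                # illustrative data summary

    tau = rng.gamma(shape=n / 2, scale=1 / Y, size=1_000_000)
    sigma2 = 1 / (2 * tau)

    print(sigma2.mean(), Y / (n - 2))              # posterior mean vs. Y/(n - 2)
    w = 1 / sigma2**2                              # weight 1/σ⁴ for the scaled loss
    print((w * sigma2).mean() / w.mean(), Y / (n + 2))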

3. (Bayes estimates and unbiased estimates)


(a) Prove the following result. Let Θ have distribution Π, and let Pθ denote the con-
ditional distribution of X given θ. Consider the estimation of g(θ) when the loss func-
tion is squared error. Then, no unbiased estimate δ(X) can be a Bayes estimate unless
E[δ(X) − g(Θ)]² = 0, where the expectation is taken with respect to the variation in both
X and Θ.

(b) Suppose Xi , i = 1, . . . , n, are i.i.d. with E(Xi ) = θ and Var(Xi ) = σ², which does
not depend on θ. Show that for any proper prior distribution on Θ, the sample mean X̄
cannot be a Bayes estimate.

(c) Suppose Xi , i = 1, . . . , n, are i.i.d. with E(Xi ) = θ and Var(Xi ) = v(θ). Show that
for any proper prior distribution Π on Θ, the sample mean X̄ will be a Bayes estimate only
if ∫ v(θ) dΠ(θ) = 0. Hence, show that the sample mean X̄ cannot be a Bayes estimate if
v(θ) > 0 (a.e. Π).

(d) Suppose X1 , . . . , Xn are i.i.d. Bin(1, θ), 0 ≤ θ ≤ 1, and the prior distribution of Θ
is given by Π. Show that any estimate δ(X1 , . . . , Xn ) is a Bayes estimate of θ with respect
to some prior Π if δ(0, . . . , 0) = 0, δ(1, . . . , 1) = 1. Hence, show that the sample mean X̄ is
a Bayes estimate for some prior Π.
(e) Prove the following result. Let X ∼ (1/θ)f(x/θ), x > 0, where ∫_0^∞ t f(t) dt = 1,
and let π(θ) = 1/θ², θ > 0. Then,
(1) E(X|θ) = θ, i.e., X is unbiased;
(2) π(θ|x) = (x²/θ³) f(x/θ) is a proper density;
(3) E(θ|X) = X, i.e., the posterior mean is unbiased.

4. (Conjugate prior)
(a) Suppose X1 , . . . , Xn are i.i.d. Bin(1, θ), 0 ≤ θ ≤ 1, and the prior distribution Π of
Θ is given by a Beta(r, s) distribution. Write X = (X1 , . . . , Xn ). Show that the posterior
is a beta distribution with parameters to be determined by you.

(b) By the term conjugate family of priors, we mean a family to which the posterior
also belongs after sampling. In this exercise we indicate how one may obtain such a family
when the data come from a k-parameter exponential family. The previous exercise provides
an example of a conjugate family of priors.
Suppose X1 , . . . , Xn is a sample from the k-parameter exponential family

p(x, θ) = h(x) exp[ ∑_{j=1}^k ηj(θ)Tj(x) − B(θ) ],   x ∈ X ⊂ R^q, θ ∈ Θ ⊂ R^k.   (1)

Let tj := ∑_{i=1}^n Tj(xi), j = 1, . . . , k. A conjugate exponential family is obtained by letting
n and the tj’s be “parameters” and treating θ as the variable of interest.
Let t = (t1 , . . . , tk , tk+1 ), and

ω(t) := ∫_{−∞}^{∞} · · · ∫_{−∞}^{∞} exp[ ∑_{j=1}^k tj ηj(θ) − tk+1 B(θ) ] dθ1 · · · dθk,   (2)

Ω := {t : 0 < ω(t) < ∞},

with integrals replaced by sums in the discrete case. We assume that Ω is non-empty. Show
that the (k + 1)-parameter exponential family given by

πt(θ) = exp[ ∑_{j=1}^k tj ηj(θ) − tk+1 B(θ) − log ω(t) ],   θ ∈ Θ, t ∈ Ω,

is a conjugate prior for p(x, θ) given by (1). Identify the parameters of the posterior and
indicate how the parameters of the prior are updated in the formula for the posterior.
(c) Suppose X1 , . . . , Xn is a N(θ, σ0²) sample, where σ0² is known and θ ∈ Θ = R is
unknown. Construct a family of conjugate priors as in exercise (b).

(d) Suppose X1 , . . . , Xn is a Poisson(θ) sample, where θ ∈ Θ = (0, ∞) is unknown.


Construct a family of conjugate priors as in exercise (b).

5. (Merging opinions)
Consider a parameter space consisting of two points θ1 and θ2 , and suppose that for
given θ, an experiment leads to a random variable X whose pmf p(x|θ) is given by

θ       x = 0   x = 1
θ1       0.8     0.2
θ2       0.4     0.6

Let π be the prior frequency function of θ defined by π(θ1 ) = 1/2 and π(θ2 ) = 1/2.
(a) Find the posterior frequency function π(θ|x).
(b) Suppose X1 , . . . , Xn are independent with pmf p(x|θ). Find π(θ|x1 , . . . , xn ), the
posterior distribution of θ given Xi = xi , i = 1, . . . , n. Observe that it depends only on
∑_{i=1}^n xi .

(c) Same as (b) except use the prior π1 (θ1 ) = 0.25, π1 (θ2 ) = 0.75.
(d) Give the values of P(θ = θ1 | ∑_{i=1}^n Xi = .5n) for the two priors π and π1 when
n = 2 and n = 100.
(e) Give the most probable values θ̂ = arg maxθ π(θ | ∑_{i=1}^n Xi = k) for the two priors
π and π1 . Compare these θ̂’s for n = 2 and n = 100.
(f) Give the set on which the two θ̂’s disagree. Show that the probability of this set
tends to zero as n → ∞. Assume X ∼ p(x) = ∑_{i=1}^2 π(θi)p(x|θi). For this convergence,
does it matter which prior, π or π1 , is used in the formula for p(x)?
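
The merging of the two posteriors can be seen numerically (a sketch in Python assuming numpy; the helper `posterior` is ours):

    import numpy as np

    p = {1: np.array([0.8, 0.2]), 2: np.array([0.4, 0.6])}   # p(x|theta_i), x = 0, 1

    def posterior(prior1, xs):
        # P(theta = theta_1 | data) for a prior putting mass prior1 on theta_1
        w1 = prior1 * np.exp(np.log(p[1][xs]).sum())
        w2 = (1 - prior1) * np.exp(np.log(p[2][xs]).sum())
        return w1 / (w1 + w2)

    for n in (2, 100):
        xs = np.zeros(n, dtype=int)
        xs[: n // 2] = 1                           # sum of the x_i equals .5n
        print(n, posterior(0.5, xs), posterior(0.25, xs))   # priors merge as n grows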

6. Consider an experiment in which, for given θ, the outcome X has density p(x|θ) =
2x/θ², 0 < x < θ. Let π denote a prior density for θ.
(a) Find the posterior density of θ when π(θ) = 1, 0 ≤ θ ≤ 1.
(b) Find the posterior density of θ when π(θ) = 3θ², 0 ≤ θ ≤ 1.
(c) Find E(θ|X) for the two priors in (a) and (b).
(d) Suppose X1 , . . . , Xn are independent with the same distribution as X. Find the
posterior density of θ given X1 = x1 , X2 = x2 , . . . , Xn = xn when π(θ) = 1, 0 ≤ θ ≤ 1.

7. Let X1 , . . . , Xn be distributed as p(x1 , . . . , xn |θ) = 1/θⁿ, where x1 , . . . , xn are natural
numbers between 1 and θ, and Θ = {1, 2, 3, . . .}.
(a) Suppose θ has prior pmf

π(j) = c(a)/jᵃ,   j = 1, 2, . . . ,

where a > 1 and c(a) = [ ∑_{j=1}^∞ j^{−a} ]^{−1}. Show that

π(j|x1 , . . . , xn ) = c(n + a, m)/j^{n+a},   j = m, m + 1, . . . ,

where m = max(x1 , . . . , xn ) and c(b, t) = [ ∑_{j=t}^∞ j^{−b} ]^{−1}, b > 1.
(b) Suppose that max(x1 , . . . , xn ) = x1 = m for all n. Show that π(m|x1 , . . . , xn ) → 1
as n → ∞ whatever be a. Interpret this result.

8. Suppose X1 , . . . , Xn is a sample with Xi ∼ p(x|θ), a regular model that is integrable as
a function of θ. Assume that A = {x : p(x|θ) > 0} does not depend on θ.
(a) Show that the family of priors

π(θ) = ∏_{i=1}^N p(ξi, θ) / ∫_Θ ∏_{i=1}^N p(ξi, θ) dθ,

where ξi ∈ A and N ∈ {1, 2, . . .}, is a conjugate family of prior distributions for p(x|θ), and
that the posterior distribution of θ given X = x is given by

π(θ|x) = ∏_{i=1}^{N′} p(ξi′, θ) / ∫_Θ ∏_{i=1}^{N′} p(ξi′, θ) dθ,

where N′ = N + n and (ξ1′ , . . . , ξ_{N′}′ ) = (ξ1 , . . . , ξN , x1 , . . . , xn ).

(b) Use the result in (a) to obtain π(θ) and π(θ|x) when

p(x|θ) = θ exp(−θx), x > 0, θ > 0,
       = 0 otherwise.

9. Let p(x|θ) = exp{−(x − θ)}, 0 < θ < x and let π(θ) = 2 exp(−2θ), θ > 0. Find the
posterior density π(θ|x).

10. Suppose p(x|θ) is the density of i.i.d. X1 , . . . , Xn , where Xi ∼ N(µ0 , 1/θ), µ0 is known,
and θ = σ^{−2} is called the precision of the distribution of the Xi .
(a) Show that p(x|θ) ∝ θ^{n/2} exp(−tθ/2), where t = ∑_{i=1}^n (Xi − µ0)² and ∝ denotes
“proportional to” as a function of θ.
(b) Let π(θ) ∝ θ^{(λ−2)/2} exp(−νθ/2), ν > 0, λ > 0, θ > 0. Find the posterior distribution
π(θ|x) and show that if λ is an integer, then given x, θ(t + ν) has a χ²_{λ+n} distribution.
Note that, unconditionally, νθ has a χ²_λ distribution.
(c) Find the posterior distribution of σ.

11. Show that if X1 , . . . , Xn are i.i.d. N(µ, σ²) and we formally put π(µ, σ) = 1/σ, then the
posterior density π(µ|X̄, S²) of µ given (X̄, S²) is such that √n(µ − X̄)/S ∼ t_{n−1}. [Here
X̄ := ∑_{i=1}^n Xi /n and S² := ∑_{i=1}^n (Xi − X̄)²/(n − 1).]

12. Let X have a geometric distribution with parameter θ. In other words, P(X = k|θ) =
(1 − θ)ᵏ θ, k = 0, 1, 2, . . . .
(a) Find the posterior distribution of θ given X = 2 when the prior distribution of θ is
uniform on {1/4, 1/2, 3/4}.
(b) Relative to (a), what is the most probable value of θ given X = 2? Give reasons.
(c) Find the posterior distribution of θ given X = k when the prior distribution is
Beta(r, s).

13. In exercise 1(a), suppose n is large and x̄ := (1/n) ∑_{i=1}^n xi is not close to 0 or 1,
and the prior distribution is Beta(r, s). Justify the following approximation to the posterior
distribution:

P(θ ≤ t | X1 = x1 , . . . , Xn = xn ) ≈ Φ( (t − µ̃)/σ̃ ),

where Φ is the standard normal distribution function and

µ̃ := n x̄/(n + r + s) + r/(n + r + s),   σ̃² := µ̃(1 − µ̃)/(n + r + s).
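
The approximation can be compared with the exact Beta posterior of exercise 1(a) (a sketch assuming scipy; n, r, s and the sum of the xi are illustrative):

    import numpy as np
    from scipy import stats

    n, r, s, sx = 200, 3.0, 2.0, 80                # sx: illustrative sum of the x_i
    post = stats.beta(r + sx, s + n - sx)          # exact posterior from 1(a)

    mu = (n * (sx / n) + r) / (n + r + s)          # µ̃ from exercise 13
    sd = np.sqrt(mu * (1 - mu) / (n + r + s))      # σ̃

    for t in (0.35, 0.40, 0.45):
        print(post.cdf(t), stats.norm.cdf((t - mu) / sd))  # close for large n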

14. Let (X1 , . . . , Xn+k ) be a sample from a population with density f(x|θ), θ ∈ Θ. Let θ
have prior density π. Show that the conditional distribution of (θ, Xn+1 , . . . , Xn+k ) given
X1 = x1 , . . . , Xn = xn is the same as that of (Y, Z1 , . . . , Zk ), where the marginal distribution
of Y equals the posterior distribution of θ given X1 = x1 , . . . , Xn = xn , and the conditional
distribution of the Zi ’s given Y = t is the same as that of a sample of size k from the
population with density f(x|t).

15. (Scale uniform)


Consider the following model:
Xi |θ ∼ U(0, θ) i.i.d., i = 1, . . . , n,
1/θ ∼ Gamma(a, b) [a: shape parameter, b: scale parameter]; a, b are known.
(a) Verify that the Bayes estimate of θ will depend on the data only through Y :=
maxᵢ Xᵢ .
(b) Show that

E(θ|X1 , . . . , Xn , a, b) = E(θ|Y, a, b)
  = ∫_Y^∞ θ · θ^{−(n+a+1)} e^{−1/(θb)} dθ / ∫_Y^∞ θ^{−(n+a+1)} e^{−1/(θb)} dθ.

(c) Show that

E(θ|Y, a, b) = [1/(b(n + a − 1))] · P(χ²_{2(n+a−1)} < 2/(bY)) / P(χ²_{2(n+a)} < 2/(bY)),

where χ²_ν denotes a chi-squared random variable with ν degrees of freedom.
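
Parts (b) and (c) give two routes to the same number, which makes a convenient self-check (a sketch assuming scipy; all values are illustrative):

    import numpy as np
    from scipy import stats
    from scipy.integrate import quad

    y, n, a, b = 0.7, 10, 2.0, 1.5                 # illustrative values

    def bayes_est(y, n, a, b):
        # Closed form from (c): a ratio of chi-squared probabilities.
        num = stats.chi2.cdf(2 / (b * y), df=2 * (n + a - 1))
        den = stats.chi2.cdf(2 / (b * y), df=2 * (n + a))
        return num / (b * (n + a - 1) * den)

    def integrand(t, k):
        # t^(-k) e^(-1/(tb)); k = n + a gives the numerator of (b),
        # k = n + a + 1 the denominator.
        return t ** (-k) * np.exp(-1 / (t * b))

    num, _ = quad(integrand, y, np.inf, args=(n + a,))
    den, _ = quad(integrand, y, np.inf, args=(n + a + 1,))
    print(bayes_est(y, n, a, b), num / den)        # the two values agree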

16. (Single-prior Bayes and hierarchical Bayes)


Consider the models given by (1) and (3) in the appendix. Let π(θ|x, γ) be the single-
prior Bayes posterior and let π(θ|x) be the hierarchical Bayes posterior.
(a) Show that

π(θ|x) = ∫ π(θ|x, γ) π(γ|x) dγ,

where

π(γ|x) := ∫ f(x|θ)π(θ|γ)ψ(γ) dθ / ∫∫ f(x|θ)π(θ|γ)ψ(γ) dθ dγ.

(b) Show that E(θ|x) = E[E(θ|x, γ)].


(c) Show that Var(θ|x) = E[Var(θ|x, γ)]+ Var[E(θ|x, γ)], and hence that π(θ|x) will tend
to have larger variance than π(θ|x, γ0 ).

17. (Conjugate normal hierarchy)


Consider the following model:
Xi |θ ∼ N(θ, σ0²) i.i.d., i = 1, . . . , n, σ0 known,
θ|τ ∼ N(0, τ²),
1/τ² ∼ Gamma(a, b) [a: shape parameter, b: scale parameter]; a, b are known.
Write x = (x1 , . . . , xn ).
(a) Show that the hierarchical Bayes estimate of θ under squared error loss is given by

E(θ|x) = ∫ [ nτ²x̄/(nτ² + σ0²) ] π(τ²|x) dτ².
(b) Identify the expectation in (a) as the expectation of the single-prior Bayes estimate
of θ using the density π(τ²|x).
(c) Show that the marginal prior of θ, unconditional on τ², is given by

π(θ) = Γ(a + 1/2) / [ √(2π) Γ(a) b^a ] · (1/b + θ²/2)^{−(a+1/2)}.

(d) Suppose a = ν/2, b = 2/ν. Identify the marginal prior obtained in (c) above as a
standard distribution. Show, moreover, that in this case the posterior mean is given by
E(θ|x̄) = ∫_{−∞}^{∞} θ (1 + θ²/ν)^{−(ν+1)/2} e^{−n(θ−x̄)²/(2σ0²)} dθ
        / ∫_{−∞}^{∞} (1 + θ²/ν)^{−(ν+1)/2} e^{−n(θ−x̄)²/(2σ0²)} dθ.

(e) Show that the marginal posterior of τ² is given by

π(τ²|x̄) = [ σ0²τ²/(σ0² + τ²) ]^{1/2} e^{−x̄²/(2(σ0²+τ²))} e^{−1/(bτ²)} (τ²)^{−(a+3/2)}
         / ∫_0^∞ [ σ0²τ²/(σ0² + τ²) ]^{1/2} e^{−x̄²/(2(σ0²+τ²))} e^{−1/(bτ²)} (τ²)^{−(a+3/2)} dτ².

(f) Show that


θ|x̄, τ² ∼ N( nτ²x̄/(nτ² + σ0²), σ0²τ²/(σ0² + nτ²) ),

1/τ² | x̄, θ ∼ Gamma( a + 1/2, (θ²/2 + 1/b)^{−1} ).

(g) Explain how you can use the facts in (f) to implement the Gibbs sampler for the
purpose of evaluating the expectation in (a) above.
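
A minimal Gibbs sampler along the lines of (g), alternating between the two conditionals in (f) (a sketch assuming numpy; the hyperparameters, burn-in and chain length are illustrative):

    import numpy as np

    rng = np.random.default_rng(4)
    n, sigma0, a, b, xbar = 25, 1.0, 2.0, 1.0, 1.3   # illustrative values
    M, burn = 20_000, 2_000

    theta, tau2 = 0.0, 1.0
    draws = []
    for i in range(M):
        # theta | xbar, tau2  (first line of (f))
        m = n * tau2 * xbar / (n * tau2 + sigma0**2)
        v = sigma0**2 * tau2 / (sigma0**2 + n * tau2)
        theta = rng.normal(m, np.sqrt(v))
        # 1/tau2 | xbar, theta  (second line of (f))
        prec = rng.gamma(shape=a + 0.5, scale=1 / (theta**2 / 2 + 1 / b))
        tau2 = 1 / prec
        if i >= burn:
            # Rao-Blackwellized term E(theta | xbar, tau2); cf. (8.2) in the appendix
            draws.append(n * tau2 * xbar / (n * tau2 + sigma0**2))

    print(np.mean(draws))        # approximates the expectation in (a)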

18. (Poisson hierarchy with Gibbs sampling)


Consider the following model:

X|λ ∼ Poisson(λ),
λ|b ∼ Gamma(a, b), a known,
1/b ∼ Gamma(k, τ ), k, τ known.
(a) Show that

λ|x, b ∼ Gamma( a + x, b/(1 + b) ),
1/b |x, λ ∼ Gamma( a + k, τ /(1 + λτ ) ).

(b) Explain how you can use the facts in (a) to implement the Gibbs sampler for the
purpose of calculating the hierarchical Bayes estimate of λ.
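
A sketch of the sampler suggested in (b), assuming numpy (the observed count and the hyperparameters are illustrative):

    import numpy as np

    rng = np.random.default_rng(5)
    x, a, k, tau = 7, 2.0, 3.0, 0.5
    M, burn = 20_000, 2_000

    lam, b = 1.0, 1.0
    draws = []
    for i in range(M):
        lam = rng.gamma(shape=a + x, scale=b / (1 + b))              # λ | x, b
        b = 1 / rng.gamma(shape=a + k, scale=tau / (1 + lam * tau))  # 1/b | x, λ
        if i >= burn:
            draws.append(lam)

    print(np.mean(draws))   # hierarchical Bayes estimate of λ under squared error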

19. (Estimation of normal mean with Cauchy prior)


Suppose X|θ ∼ N(θ, σ0²), σ0 is known. Let θ ∼ Cauchy(µ, τ ), where µ, τ are known.
(a) Show that the posterior mean and variance of θ are given by

E(θ|x) = ∫_{−∞}^{∞} θ exp( −(θ − x)²/(2σ0²) ) [ τ² + (θ − µ)² ]^{−1} dθ
        / ∫_{−∞}^{∞} exp( −(θ − x)²/(2σ0²) ) [ τ² + (θ − µ)² ]^{−1} dθ,

Var(θ|x) = ∫_{−∞}^{∞} θ² exp( −(θ − x)²/(2σ0²) ) [ τ² + (θ − µ)² ]^{−1} dθ
         / ∫_{−∞}^{∞} exp( −(θ − x)²/(2σ0²) ) [ τ² + (θ − µ)² ]^{−1} dθ − (E(θ|x))².

(b) Identify the posterior mean in (a) above as a ratio of two means of a normal variable
with mean x and variance σ0².
(c) Explain how you can use the fact in (b) to obtain a Monte Carlo estimate of E(θ|x).
(d) Identify the posterior mean in (a) above as a ratio of two means of a Cauchy variable
with location parameter µ and scale parameter τ.
(e) Explain how you can use the fact in (d) to obtain a Monte Carlo estimate of E(θ|x).
(f) Can you give any qualitative comparison of the estimates obtained in (c) and (e)?
(Both schemes are sketched below.)
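
Both Monte Carlo schemes in one sketch; each estimates the same ratio of integrals from (a) (assuming numpy; the values of x, σ0, µ, τ are illustrative):

    import numpy as np

    rng = np.random.default_rng(6)
    x, sigma0, mu, tau, M = 2.0, 1.0, 0.0, 1.0, 1_000_000

    # Scheme (c): sample from N(x, sigma0^2), weight by the Cauchy kernel.
    t = rng.normal(x, sigma0, size=M)
    w = 1 / (tau**2 + (t - mu)**2)
    print((w * t).mean() / w.mean())

    # Scheme (e): sample from Cauchy(mu, tau), weight by the normal kernel.
    t = mu + tau * rng.standard_cauchy(size=M)
    w = np.exp(-((t - x) ** 2) / (2 * sigma0**2))
    print((w * t).mean() / w.mean())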
(g) Argue that the given model can be replaced by the following:

X|θ ∼ N(θ, σ0²),
θ|λ ∼ N(µ, τ²/λ),
λ ∼ Gamma(1/2, 1/2).

(h) Show that under the model in (g),

θ|x, λ ∼ N( (τ²x + λσ0²µ)/(τ² + λσ0²), τ²σ0²/(τ² + λσ0²) ),

λ|θ, x =_d λ|θ ∼ Exponential( (τ² + (θ − µ)²)/(2τ²) ).

(i) Explain how you can use the facts in (h) to implement the Gibbs sampler for the
purpose of calculating the (hierarchical) Bayes estimate of θ.

20. (This exercise provides an example where the posterior density is not of any standard
form but using techniques of numerical integration, it is possible to work with it. On the
other hand, Gibbs sampling provides an excellent alternative.)
Suppose we are studying the distribution of the number of defectives X in the daily
production of a product. Consider the model (X|Y, θ) ∼ Bin(Y, θ), where Y , a day’s pro-
duction, is a random variable with a Poisson distribution with known mean λ, and θ is the
probability that any product is defective. The difficulty, however, is that Y is not observ-
able, and inference has to be made on the basis of X only. The prior distribution is such
that θ|Y = y ∼ Beta(α, γ), with known α and γ, independent of Y .
(a) Show that X|θ ∼ Poisson(λθ). Show also that the posterior pdf of θ given X = x is
given by
π(θ|x) ∝ exp(−λθ) θ^{x+α−1} (1 − θ)^{γ−1},   0 < θ < 1.
This is not a standard density. However, using techniques of numerical integration, it is
possible to work with the posterior density.
(b) Show that

Y |X = x, θ ∼ x + Poisson(λ(1 − θ)),
θ|X = x, Y = y ∼ Beta(α + x, γ + y − x).

Explain how you can use these facts to implement the Gibbs sampler for the purpose of
calculating the Bayes estimate of θ.
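
A sketch of this Gibbs sampler, alternating the two conditionals in (b) (assuming numpy; x, λ, α, γ are illustrative):

    import numpy as np

    rng = np.random.default_rng(7)
    x, lam, alpha, gam = 4, 10.0, 1.0, 1.0
    M, burn = 20_000, 2_000

    theta = 0.5
    draws = []
    for i in range(M):
        y = x + rng.poisson(lam * (1 - theta))       # Y | x, θ
        theta = rng.beta(alpha + x, gam + y - x)     # θ | x, y
        if i >= burn:
            draws.append(theta)

    print(np.mean(draws))      # Bayes estimate of θ under squared error loss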

21. Let X1 , . . . , Xn be i.i.d. N(θ, σ0²), σ0 known. Suppose θ has the N(µ, τ²) prior.
(a) Construct the 100(1 − α)% HPD credible interval for θ.
(b) Consider the uniform prior for this problem by letting τ² → ∞. Work out (a) in
this case.
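
Since the posterior here is normal, the HPD interval is symmetric about the posterior mean; a sketch of (a) and of the τ² → ∞ limit in (b) (assuming scipy; the inputs are illustrative):

    import numpy as np
    from scipy import stats

    def hpd(xbar, n, sigma0, mu, tau2, alpha):
        # Posterior is N(post_mean, post_sd^2), so the HPD interval is
        # post_mean ± z_{1-α/2} post_sd.
        post_mean = (n * tau2 * xbar + sigma0**2 * mu) / (n * tau2 + sigma0**2)
        post_sd = np.sqrt(sigma0**2 * tau2 / (n * tau2 + sigma0**2))
        z = stats.norm.ppf(1 - alpha / 2)
        return post_mean - z * post_sd, post_mean + z * post_sd

    print(hpd(1.2, 30, 1.0, 0.0, 4.0, 0.05))
    print(hpd(1.2, 30, 1.0, 0.0, 1e12, 0.05))   # τ² → ∞ gives x̄ ± z σ0/√n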

22. Let X1 , . . . , Xm and Y1 , . . . , Yn be independent random samples from N(θ1 , σ0²) and
N(θ2 , σ0²), respectively, where σ0 is known. Construct a 100(1 − α)% credible interval for
θ1 − θ2 assuming a uniform prior on (θ1 , θ2 ).

23. Let X1 , . . . , Xm and Y1 , . . . , Yn be independent random samples from N(θ, σ1²) and
N(θ, σ2²), respectively, where both σ1 and σ2 are known. Construct a 100(1 − α)% credible
interval for the common mean θ assuming a uniform prior. Show that the frequentist
100(1 − α)% confidence interval leads to the same answer.
24. Let X1 , . . . , Xn be i.i.d. observations from a Beta(θ, 1) distribution and assume that θ
has a Gamma(a, σ) prior. Find a 100(1 − α)% credible set for θ.

25. Let X1 , . . . , Xn be i.i.d. observations from an exponential distribution with scale pa-
rameter λ, where λ has the conjugate IG(a, b) prior, an inverted gamma with pdf
π(λ|a, b) = [1/(Γ(a)b^a)] (1/λ)^{a+1} e^{−1/(bλ)},   0 < λ < ∞.

Find a 100(1 − α)% HPD credible set for λ.

26. From the preceding exercise, find a 100(1 − α)% HPD credible interval for σ², the
variance of a normal distribution, based on the sample variance S² and using a conjugate
IG(a, b) prior for σ². Starting with this interval, find the limiting 100(1 − α)% HPD credible
interval for σ² obtained as a → 0 and b → ∞.

27. Let X1 , . . . , Xn be i.i.d. N(θ, σ²), where both θ and σ are unknown. Our interest is in
inference about θ only. Consider the prior pdf
π(θ, σ²|µ, τ², a, b) = [1/(√(2π) τ σ)] e^{−(θ−µ)²/(2τ²σ²)} · [1/(Γ(a)b^a)] (1/σ²)^{a+1} e^{−1/(bσ²)},

a N(µ, τ²σ²) density multiplied by an IG(a, b) density.


(a) Show that this prior is a conjugate prior for this problem.
(b) Find the posterior distribution of θ and use it to construct a 100(1 − α)% credible
set for θ.
(c) The classical (frequentist) 1 − α confidence set for θ can be expressed as

{ θ : |θ − X̄| ≤ √( F_{1,n−1,1−α} S²/n ) }.

Is there any (limiting) configuration of τ², a, and b that would allow this set to be
approached by a credible set from part (b)?

28. Suppose X1 , . . . , Xn are i.i.d. Bin(1, θ) observations.


(a) Calculate a 100(1 − α)% credible set for θ using the conjugate Beta(a, b) prior.
(b) Using the relationship between the Beta and F distributions, write the credible set
in a form that is comparable to the form of the intervals in exercise 24, section 10. Do the
intervals match for any values of a and b?

29. Let θ denote the probability of success with a particular drug for some disease. Consider
two different experiments to estimate θ. In the first experiment, n randomly chosen patients
are treated with this drug and let X denote the number of successes. In the other experi-
ment, patients are treated with this drug, one after another until r successes are observed.
In this experiment, let Y denote the total number of patients treated with this drug.
(a) Construct a 100(1 − α)% HPD credible interval for θ under the U(0, 1) prior when
X = x is observed.
(b) Construct a 100(1 − α)% HPD credible interval for θ under the U(0, 1) prior when
Y = y is observed.

30. Suppose that X|θ ∼ Bin(n, θ) and that θ has a Beta(r, s) prior with r and s integers.
Let λ := sθ/[r(1 − θ)]. Show how the quantiles of the F distribution can be used to find
upper and lower credible bounds for λ and for θ. [Hint: θ ∼ Beta(r, s) =⇒ sθ/[r(1 − θ)] ∼
F_{2r,2s} .]

31. Suppose that X1 , . . . , Xn are i.i.d. Poisson(λ) and λ ∼ χ²_k /s0 , where s0 is a constant.
Let T := ∑_{i=1}^n Xi .
(a) Show that λ|T = t ∼ χ²_{k+2t} /(s0 + 2n).
(b) Show how quantiles of the χ² distribution can be used to determine level 1 − α upper
and lower credible bounds for λ.

32. Suppose that X1 , . . . , Xn are i.i.d. U(0, θ), and that θ has a Pareto(c, s) prior with pdf

π(t) = s cˢ/t^{s+1},   t > c; s, c > 0.

Let M := X_(n) , the sample maximum.
(a) Show that θ|M = m ∼ Pareto(max(c, m), s + n).
(b) Find level 1 − α upper and lower credible bounds for θ.
(c) Give level 1 − α upper and lower confidence bounds for θ.
(d) Compare the level 1 − α upper and lower credible bounds for θ to the level 1 − α
upper and lower confidence bounds for θ. In particular consider the credible bounds as
n → ∞.
33. Suppose that X1 , . . . , Xn are i.i.d. N(µ0 , σ²), where µ0 is known. Let λ := σ^{−2}.
Suppose λ has the gamma distribution with shape parameter a/2 and scale parameter 2/b.
Find a level 1 − α upper credible bound for σ².

Appendix

A. Existence of Bayes estimates and how to find them


The following theorem and the corollary that follows tell us how to obtain Bayes estimates.
Theorem Let Θ have distribution Λ, and given Θ = θ, let X have distribution Pθ .
Suppose, in addition, the following assumptions hold for the problem of estimating g(Θ)
with non-negative loss function L(θ, d).

(a) There exists an estimator δ0 with finite risk.


(b) For almost all x, there exists a value δΛ (x) of d minimizing

E{L(Θ, d)|X = x}.

Then, δΛ (X) is a Bayes estimate of g(θ).


Corollary Suppose the assumptions of the theorem stated above hold.

(a) If L(θ, d) = [d − g(θ)]², then δΛ (x) = E[g(Θ)|x]; more generally, if
L(θ, d) = w(θ)[d − g(θ)]², then

δΛ (x) = ∫ w(θ)g(θ) dΛ(θ|x) / ∫ w(θ) dΛ(θ|x) = E[w(Θ)g(Θ)|x] / E[w(Θ)|x].

(b) If L(θ, d) = |d − g(θ)|, then δΛ (x) is any median of the posterior distri-
bution of g(Θ) given x.
(c) If L(θ, d) = 0 when |d − θ| ≤ c and L(θ, d) = 1 when |d − θ| > c,
then δΛ (x) is the midpoint of the interval I of length 2c which maximizes
P[Θ ∈ I|x].

B. Single-prior Bayes and hierarchical Bayes


A single-prior Bayes model is described as follows:

X|θ ∼ f (x|θ),
Θ|γ ∼ π(θ|γ).   (1)

Thus, conditionally on θ, X has sampling density f (x|θ), and conditionally on γ, Θ has
prior density π(θ|γ). We assume that the functional form of the prior and the value of γ
are known, so that we have one completely specified prior. The known value of γ is
sometimes denoted by γ0 .
Given a loss function L(θ, d), we then look for the estimate that minimizes

∫ L(θ, d(x)) π(θ|x, γ0 ) dθ,   (2.1)

where

π(θ|x, γ0 ) = f (x|θ)π(θ|γ0 ) / ∫ f (x|θ)π(θ|γ0 ) dθ.   (2.2)
In a hierarchical Bayes model, rather than specifying the prior distribution as a single
function, we specify it in a hierarchy. Thus, we place another level on the model (1), and
write
X|θ ∼ f (x|θ),
Θ|γ ∼ π(θ|γ), (3)
Γ ∼ ψ(γ),
where we assume that ψ(·) is known and not dependent on any other unknown hyperpa-
rameters (as parameters of a prior are sometimes called). Note that we can continue this
hierarchical modeling and add more stages to the model, but this is not often done in prac-
tice. The class of models (3) appears to be more general than the class (1) since in (1), γ
has a fixed value, but in (3), it is permitted to have an arbitrary probability distribution.
However, this appearance is deceptive. Since π(θ|γ) in (1) can be any fixed distribution, we
can, in particular, take for it π(θ) = ∫ π(θ|γ)ψ(γ) dγ, which reduces the hierarchical model
(3) to the single-prior model (1). However, there is a conceptual and practical advantage to
the hierarchical model, in that it allows us to model relatively complicated situations using
a series of simpler steps; that is, both π(θ|γ) and ψ(γ) may be of a simple form (even conju-
gate), but π(θ) may be more complex. Moreover, there is often a computational advantage
to hierarchical modeling.

C. Gibbs sampling in hierarchical Bayes models


Consider the following model:
X|θ ∼ f (x|θ),
Θ|γ ∼ π(θ|γ), (4)
Γ ∼ ψ(γ),
where we assume that ψ(·) is known and not dependent on any other unknown hyperpa-
rameters (as parameters of a prior are sometimes called). From (4), we calculate the full
conditionals
θ|x, γ ∼ π(θ|x, γ),
γ|x, θ ∼ π(γ|x, θ),   (5)
which are the posterior distributions of each parameter conditional on all others. If, for
i = 1, 2, . . . , M, random variables are generated according to
θi |x, γi−1 ∼ π(θ|x, γi−1 ),
γi |x, θi ∼ π(γ|x, θi ),   (6)
this defines a Markov chain (θi , γi ). It follows from the theory of such chains that there exist
distributions π(θ|x) and π(γ|x) such that
θi → θ ∼ π(θ|x) in distribution,
γi → γ ∼ π(γ|x) in distribution,   (7)

as i −→ ∞, and

(1/M) ∑_{i=1}^M h(θi ) −→ E(h(θ)|x) = ∫ h(θ)π(θ|x) dθ.   (8.1)

Notice also that

(1/M) ∑_{i=1}^M E(h(θ)|x, γi ) −→ ∫ E(h(θ)|x, γ) π(γ|x) dγ = E(h(θ)|x).   (8.2)

Thus, Gibbs sampling provides two methods of calculating the same quantity (the posterior
mean, in this case). Notice, in view of the Rao–Blackwell theorem, that (8.2) is superior to
(8.1).
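
The comparison can be seen in a toy chain where everything is available in closed form. The model below (γ ∼ N(0, 1), θ|γ ∼ N(γ, 1), x|θ ∼ N(θ, 1)) is our own illustration, not part of the text (a sketch assuming numpy):

    import numpy as np

    rng = np.random.default_rng(8)
    x, M = 1.5, 2_000

    def one_run():
        theta, gamma = 0.0, 0.0
        h1, h2 = [], []
        for _ in range(M):
            theta = rng.normal((x + gamma) / 2, np.sqrt(0.5))  # θ | x, γ
            gamma = rng.normal(theta / 2, np.sqrt(0.5))        # γ | x, θ = γ | θ
            h1.append(theta)               # term of estimator (8.1)
            h2.append((x + gamma) / 2)     # term of (8.2): E(θ | x, γ)
        return np.mean(h1), np.mean(h2)

    runs = np.array([one_run() for _ in range(200)])
    print(runs.mean(axis=0))   # both estimate E(θ | x) = 2x/3
    print(runs.var(axis=0))    # (8.2) typically shows the smaller variance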
