Bayesian inference
(b) Suppose X1 , . . . , Xn are i.i.d. Poisson(θ), θ > 0, and the prior distribution Π of Θ is
given by a Gamma(a, b) distribution with shape parameter a and scale parameter b. Write
X = (X1 , . . . , Xn ).
(1) Find the posterior distribution of Θ given X.
(2) Find the Bayes estimate δπ (X) of θ when L(θ, d) = (θ − d)2 . Show that it is a
convex combination of the prior mean and the sample mean.
(3) What happens to δπ(X) under squared error loss as n → ∞?
(4) What happens to δπ(X) under squared error loss as a → 0, b → ∞, or both?
(5) Find the Bayes estimate δπ (X) of θ when L(θ, d) = (θ − d)2 /θk .
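As a numerical companion to parts (1)–(2), the sketch below checks the conjugate Gamma update and the convex-combination identity on simulated data. It assumes the shape–scale convention for Gamma(a, b) stated above; the values a = 2, b = 3, n = 50 are arbitrary illustrations, not part of the exercise.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b = 2.0, 3.0                 # Gamma(a, b) prior: shape a, scale b
n = 50
x = rng.poisson(4.0, size=n)    # simulated Poisson(θ) data, θ = 4 here
s = x.sum()

# Candidate conjugate update for part (1): Gamma(a + Σx_i, b/(1 + n b))
a_post, b_post = a + s, b / (1 + n * b)
post_mean = a_post * b_post

# Part (2): the same mean written as a convex combination of the
# prior mean a*b and the sample mean x̄, with weight w = n b/(1 + n b)
w = n * b / (1 + n * b)
combo = w * x.mean() + (1 - w) * a * b

# Cross-check the posterior mean by normalizing the kernel
# θ^(a - 1 + s) e^(-θ/b - n θ) on a fine grid
theta = np.linspace(1e-6, 20.0, 200_001)
dt = theta[1] - theta[0]
logp = (a - 1 + s) * np.log(theta) - theta / b - n * theta
p = np.exp(logp - logp.max())
p /= p.sum() * dt
grid_mean = (theta * p).sum() * dt
print(round(post_mean, 4), round(grid_mean, 4))
```

As n grows the weight w tends to 1, which previews the answer to part (3).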
(c) Suppose X1 , . . . , Xn are i.i.d. N(θ, σ02 ), θ ∈ R and σ0 > 0 is known. Let the prior
distribution Π of Θ be given by a N(µ, b2 ) distribution. Write X = (X1 , . . . , Xn ).
(1) Find the posterior distribution of Θ given X.
(2) Find the Bayes estimate of δπ (X) of θ when L(θ, d) = (θ − d)2 . Show that it is a
convex combination of the prior mean and the sample mean.
(3) Show that as n → ∞ with µ and b fixed, δπ (X) becomes essentially the estimate
X̄ and δπ (X) → θ in probability. Interpret these facts.
(4) Show that as b → 0, δπ (X) → µ in probability. Interpret this fact.
(5) Show that as b → ∞, δπ (X) essentially coincides with X̄. Interpret this fact.
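The limits in parts (3)–(5) can be previewed numerically with the standard precision-weighted posterior mean (the formula to be derived in part (1)); the specific numbers below are arbitrary illustrations.

```python
def normal_posterior_mean(xbar, n, sigma0, mu, b):
    """Posterior mean of θ for i.i.d. N(θ, σ0²) data under a N(µ, b²) prior:
    a precision-weighted convex combination of x̄ and µ."""
    w = (n / sigma0**2) / (n / sigma0**2 + 1 / b**2)
    return w * xbar + (1 - w) * mu

xbar, n, sigma0, mu = 1.8, 25, 2.0, 0.0
print(normal_posterior_mean(xbar, n, sigma0, mu, b=1e6))   # b large: ≈ x̄
print(normal_posterior_mean(xbar, n, sigma0, mu, b=1e-6))  # b small: ≈ µ
```

A diffuse prior (large b) hands the estimate over to the data, while a sharp prior (small b) pins it to µ.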
(e) Suppose X1, . . . , Xn are i.i.d. N(0, σ²), σ > 0 is unknown. Let the prior distribution
Π for τ def= 1/(2σ²) be given by a Gamma(g, 1/α) distribution with shape parameter g and
scale parameter 1/α. Write X = (X1, . . . , Xn).
(1) Find the posterior distribution of τ given X.
(2) Find the Bayes estimate δπ^(1)(X) of σ² when L(σ², d) = (σ² − d)².
(3) Find the Bayes estimate δπ^(2)(X) of σ² when L(σ², d) = (σ² − d)²/σ⁴.
values of x, δ₀(X) def= X/n can be considered as a Bayes estimate.
(b) Suppose X1 , . . . , Xn are i.i.d. N(θ, σ02 ), θ ∈ R and σ0 > 0 is known. Let the prior
distribution Π of Θ be given by an improper distribution with π(θ) = 1 (or any positive
constant c) for all θ ∈ R. This may be identified as a N(µ, b2 ) distribution with b = ∞.
Write X = (X1 , . . . , Xn ). Consider the problem of estimating θ when L(θ, d) = (θ − d)2 .
(1) Find the posterior distribution of Θ given X. [This is a proper posterior.]
(2) Show that the posterior expected loss E[L(Θ, δ(X))|X] is finite and minimized at
δ(X) = X̄.
(3) Argue that X̄ can be considered as a Bayes estimate.
(c) Suppose X1, . . . , Xn are i.i.d. N(0, σ²), σ > 0 is unknown. Let the prior distribution
Π for τ def= 1/(2σ²) be given by the improper density (1/τ) dτ, which corresponds to a
Gamma(g, 1/α) distribution with g = α = 0. Write X = (X1, . . . , Xn).
(1) Find the posterior distribution of τ given X.
(2) Find the Bayes estimate δπ^(1)(X) of σ² when L(σ², d) = (σ² − d)². Argue that
Y/(n − 2) is a Bayes estimate, where Y def= Σ_{i=1}^n Xi².
(3) Find the Bayes estimate δπ^(2)(X) of σ² when L(σ², d) = (σ² − d)²/σ⁴. Argue that
Y/(n + 2) is a Bayes estimate, where Y def= Σ_{i=1}^n Xi².
(c) Suppose Xi, i = 1, . . . , n, are i.i.d. with E(Xi) = θ and Var(Xi) = v(θ). Show that
for any proper prior distribution Π on Θ, the sample mean X̄ will be a Bayes estimate only
if ∫ v(θ) dΠ(θ) = 0. Hence, show that the sample mean X̄ cannot be a Bayes estimate if
v(θ) > 0 (a.e. Π).
(d) Suppose X1, . . . , Xn are i.i.d. Bin(1, θ), 0 ≤ θ ≤ 1, and the prior distribution of Θ
is given by Π. Show that any estimate δ(X1, . . . , Xn) is a Bayes estimate of θ with respect
to some prior Π if δ(0, . . . , 0) = 0, δ(1, . . . , 1) = 1. Hence, show that the sample mean X̄ is
a Bayes estimate for some prior Π.
(e) Prove the following result. Let X ∼ (1/θ) f(x/θ), x > 0, where ∫₀^∞ t f(t) dt = 1, and
let π(θ) = 1/θ², θ > 0. Then,
(1) E(X|θ) = θ, i.e., X is unbiased;
(2) π(θ|x) = (x²/θ³) f(x/θ) is a proper density;
(3) E(θ|X) = X, i.e., the posterior mean is unbiased.
4. (Conjugate prior)
(a) Suppose X1 , . . . , Xn are i.i.d. Bin(1, θ), 0 ≤ θ ≤ 1, and the prior distribution Π of
Θ is given by a Beta(r, s) distribution. Write X = (X1 , . . . , Xn ). Show that the posterior
is a beta distribution with parameters to be determined by you.
(b) By the term conjugate family of priors, we mean one to which the posterior after
sampling also belongs. In this exercise we indicate how one may obtain such a family
when the data are drawn from a k-parameter exponential family. The previous exercise
provides an example of a conjugate family of priors.
Suppose X1, . . . , Xn is a sample from the k-parameter exponential family

  p(x, θ) = h(x) exp{ Σ_{j=1}^k ηj(θ) Tj(x) − B(θ) },  x ∈ X ⊂ R^q, θ ∈ Θ ⊂ R^k.   (1)

Let tj def= Σ_{i=1}^n Tj(xi), j = 1, . . . , k. A conjugate exponential family is obtained by letting
n and the tj's be “parameters” and treating θ as the variable of interest.
Let t = (t1, . . . , tk, tk+1), and

  ω(t) def= ∫_{−∞}^∞ · · · ∫_{−∞}^∞ exp{ Σ_{j=1}^k tj ηj(θ) − tk+1 B(θ) } dθ1 · · · dθk,
  Ω def= {t : 0 < ω(t) < ∞},   (2)

with integrals replaced by sums in the discrete case. We assume that Ω is non-empty. Show
that the (k + 1)-parameter exponential family given by

  πt(θ) = exp{ Σ_{j=1}^k tj ηj(θ) − tk+1 B(θ) − log ω(t) },  θ ∈ Θ, t ∈ Ω,

is a conjugate prior to p(x, θ) given by (1). Identify the parameters of the posterior and
indicate how the parameters of the prior are updated in the formula for the posterior.
(c) Suppose X1 , . . . , Xn is a N(θ, σ02 ) sample, where σ02 is known and θ ∈ Θ = R is
unknown. Construct a family of conjugate priors as in exercise (b).
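As a concrete instance of the recipe in (b), the Bernoulli case of exercise 4(a) reduces conjugate updating to adding sufficient statistics to hyperparameters. This is only an illustrative sketch: with η(θ) = log(θ/(1 − θ)), T(x) = x, B(θ) = −log(1 − θ), the prior π_t(θ) ∝ θ^t1 (1 − θ)^(t2 − t1) updates as (t1, t2) ↦ (t1 + Σxi, t2 + n); the numeric values are made up.

```python
def update(t1, t2, xs):
    """Posterior hyperparameters for the Bernoulli member of family (1):
    the data contribute (Σ x_i, n), added componentwise to (t1, t2)."""
    return t1 + sum(xs), t2 + len(xs)

t1, t2 = 3.0, 5.0
xs = [1, 0, 1, 1, 0]
print(update(t1, t2, xs))  # (6.0, 10): t1 + 3 successes, t2 + 5 observations
```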
5. (Merging opinions)
Consider a parameter space consisting of two points θ1 and θ2 , and suppose that for
given θ, an experiment leads to a random variable X whose pmf p(x|θ) is given by
            x
  θ       0     1
  θ1     0.8   0.2
  θ2     0.4   0.6
Let π be the prior frequency function of θ defined by π(θ1 ) = 1/2 and π(θ2 ) = 1/2.
(a) Find the posterior frequency function π(θ|x).
(b) Suppose X1, . . . , Xn are independent with pmf p(x|θ). Find π(θ|x1, . . . , xn), the
posterior distribution of θ given Xi = xi, i = 1, . . . , n. Observe that it depends only on
Σ_{i=1}^n xi.
(c) Same as (b) except use the prior π1(θ1) = 0.25, π1(θ2) = 0.75.
(d) Give the values of P(θ = θ1 | Σ_{i=1}^n Xi = .5n) for the two priors π and π1 when n = 2
and n = 100.
(e) Give the most probable values θ̂ = arg max_θ π(θ | Σ_{i=1}^n Xi = k) for the two priors π
and π1. Compare these θ̂'s for n = 2 and n = 100.
(f) Give the set on which the two θ̂'s disagree. Show that the probability of this set
tends to zero as n → ∞. Assume X ∼ p(x) = Σ_{i=1}^2 π(θi) p(x|θi). For this convergence, does
it matter which prior, π or π1, is used in the formula for p(x)?
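Parts (d) and (f) can be computed directly; the binomial coefficient cancels in the posterior ratio, so only the likelihoods of the two points matter. A minimal sketch:

```python
def post_theta1(k, n, prior1):
    """P(θ = θ1 | Σ x_i = k) for the two-point model of exercise 5;
    the binomial coefficient cancels in the ratio."""
    l1 = 0.2**k * 0.8**(n - k)   # p(1|θ1) = 0.2, p(0|θ1) = 0.8
    l2 = 0.6**k * 0.4**(n - k)   # p(1|θ2) = 0.6, p(0|θ2) = 0.4
    num = prior1 * l1
    return num / (num + (1 - prior1) * l2)

for n in (2, 100):
    k = n // 2
    print(n, round(post_theta1(k, n, 0.5), 4), round(post_theta1(k, n, 0.25), 4))
```

At n = 100 the two priors give essentially identical posteriors, illustrating the "merging of opinions" in the section title.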
6. Consider an experiment in which, for given θ, the outcome X has density p(x|θ) =
2x/θ², 0 < x < θ. Let π denote a prior density for θ.
(a) Find the posterior density of θ when π(θ) = 1, 0 ≤ θ ≤ 1.
(b) Find the posterior density of θ when π(θ) = 3θ2 , 0 ≤ θ ≤ 1.
(c) Find E(θ|X) for the two priors in (a) and (b).
(d) Suppose X1 , . . . , Xn are independent with the same distribution as X. Find the
posterior density of θ given X1 = x1 , X2 = x2 , . . . , Xn = xn when π(θ) = 1, 0 ≤ θ ≤ 1.
7. Let X1, . . . , Xn be distributed as p(x1, . . . , xn|θ) = 1/θⁿ, where x1, . . . , xn are natural
numbers between 1 and θ, and Θ = {1, 2, 3, . . .}.
(a) Suppose θ has prior pmf

  π(j) = c(a)/jᵃ,  j = 1, 2, . . . ,

where a > 1 and c(a) = [Σ_{j=1}^∞ j^{−a}]^{−1}. Show that

  π(j|x1, . . . , xn) = c(n + a, m)/j^{n+a},  j = m, m + 1, . . . ,

where m = max(x1, . . . , xn) and c(b, t) = [Σ_{j=t}^∞ j^{−b}]^{−1}, b > 1.
(b) Suppose that max(x1 , . . . , xn ) = x1 = m for all n. Show that π(m|x1 , . . . , xn ) → 1
as n → ∞ whatever be a. Interpret this result.
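The concentration in part (b) is easy to see numerically: the posterior mass at m is m^{−(n+a)} divided by a tail sum, which m dominates more and more as n grows. A sketch with an arbitrary m = 3 and a = 2 (the tail sum is truncated, which is accurate here since the terms decay polynomially with a large exponent):

```python
def post_at_m(m, n, a, terms=20_000):
    """π(m | x_1..x_n) = m^-(n+a) / Σ_{j≥m} j^-(n+a), truncating the tail sum."""
    denom = sum(j ** -(n + a) for j in range(m, m + terms))
    return m ** -(n + a) / denom

for n in (1, 5, 50):
    print(n, round(post_at_m(3, n, a=2.0), 6))
```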
  π(θ) = ∏_{i=1}^N p(ξi, θ) / ∫_Θ ∏_{i=1}^N p(ξi, θ) dθ,

where ξ ∈ A and N ∈ {1, 2, . . .}, is a conjugate family of prior distributions for p(X|θ), and
that the posterior distribution of θ given X = x is given by

  π(θ|x) = ∏_{i=1}^{N′} p(ξi′, θ) / ∫_Θ ∏_{i=1}^{N′} p(ξi′, θ) dθ.
(b) Use the result in (a) to obtain π(θ) and π(θ|x) when
9. Let p(x|θ) = exp{−(x − θ)}, 0 < θ < x and let π(θ) = 2 exp(−2θ), θ > 0. Find the
posterior density π(θ|x).
10. Suppose p(x|θ) is the density of i.i.d. X1, . . . , Xn, where Xi ∼ N(µ0, 1/θ), µ0 is known,
and θ = σ^{−2} is called the precision of the distribution of Xi.
(a) Show that p(x|θ) ∝ θ^{n/2} exp(−tθ/2), where t = Σ_{i=1}^n (Xi − µ0)² and ∝ denotes
"proportional to" as a function of θ.
(b) Let π(θ) ∝ θ^{(λ−2)/2} exp(−νθ/2), ν > 0, λ > 0, θ > 0. Find the posterior distribution
π(θ|x) and show that if λ is an integer, then given x, (t + ν)θ has a χ²_{λ+n} distribution. Note
that, unconditionally, νθ has a χ²_λ distribution.
(c) Find the posterior distribution of σ.
11. Show that if X1, . . . , Xn are i.i.d. N(µ, σ²) and we formally put π(µ, σ) = 1/σ, then the
posterior density π(µ|X̄, S²) of µ given (X̄, S²) is such that √n(µ − X̄)/S ∼ t_{n−1}. [Here
X̄ def= Σ_{i=1}^n Xi/n, S² def= Σ_{i=1}^n (Xi − X̄)²/(n − 1).]
12. Let X have geometric distribution with parameter θ. In other words, P (X = k|θ) =
(1 − θ)k θ, k = 0, 1, 2, . . . .
(a) Find the posterior distribution of θ given X = 2 when the prior distribution of θ is
uniform on {1/4, 1/2, 3/4}.
(b) Relative to (a), what is the most probable value of θ given X = 2? Give reasons.
(c) Find the posterior distribution of θ given X = k when the prior distribution is
Beta(r, s).
13. In exercise 1(a), suppose n is large, x̄ def= (1/n) Σ_{i=1}^n xi is not close to 0 or 1, and
the prior distribution is Beta(r, s). Justify the following approximation to the posterior
distribution:

  P(θ ≤ t | X1 = x1, . . . , Xn = xn) ≈ Φ((t − µ̃)/σ̃),

where Φ is the standard normal distribution function and

  µ̃ def= [n/(n + r + s)] x̄ + r/(n + r + s),   σ̃² def= µ̃(1 − µ̃)/(n + r + s).
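A quick numerical check of the approximation in exercise 13, comparing it against Monte Carlo draws from the exact Beta(r + Σxi, s + n − Σxi) posterior of exercise 4(a). The values n = 400, r = 3, s = 2, Σxi = 120 and the threshold t = 0.32 are arbitrary illustrations.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
n, r, s = 400, 3, 2
k = 120                          # assumed Σ x_i, so x̄ = 0.3
xbar = k / n

# Monte Carlo draws from the exact Beta(r + k, s + n - k) posterior
draws = rng.beta(r + k, s + n - k, size=1_000_000)

mu = n / (n + r + s) * xbar + r / (n + r + s)
sig = sqrt(mu * (1 - mu) / (n + r + s))

t = 0.32
exact = (draws <= t).mean()
approx = 0.5 * (1 + erf((t - mu) / (sig * sqrt(2))))  # Φ((t - µ̃)/σ̃)
print(round(exact, 3), round(approx, 3))
```

The two probabilities agree to a couple of decimal places at this sample size.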
14. Let (X1, . . . , Xn+k) be a sample from a population with density f(x|θ), θ ∈ Θ. Let θ
have prior density π. Show that the conditional distribution of (θ, Xn+1, . . . , Xn+k) given
X1 = x1, . . . , Xn = xn is the same as that of (Y, Z1, . . . , Zk), where the marginal distribution
of Y equals the posterior distribution of θ given X1 = x1, . . . , Xn = xn, and the conditional
distribution of the Zi's given Y = t is the same as that of a sample of size k from the
population with density f(x|t).
(a) Verify that the Bayes estimate of θ will depend on the data only through Y def= max_i Xi.
(b) Show that

  E(θ|X1, . . . , Xn, a, b) = E(θ|Y, a, b)
    = [∫_Y^∞ θ · (1/θ^{n+a+1}) e^{−1/(bθ)} dθ] / [∫_Y^∞ (1/θ^{n+a+1}) e^{−1/(bθ)} dθ].
where

  π(γ|x) def= ∫ f(x|θ) π(θ|γ) ψ(γ) dθ / ∫∫ f(x|θ) π(θ|γ) ψ(γ) dθ dγ.
(b) Identify the expectation in (a) as the expectation of the single-prior Bayes estimate
of θ using the density π(τ 2 |x).
(c) Show that the marginal prior of θ, unconditional on τ², is given by

  π(θ) = [Γ(a + 1/2) / (√(2π) Γ(a) bᵃ)] · 1/(1/b + θ²/2)^{a+1/2}.
(d) Suppose a = ν/2, b = 2/ν. Identify the marginal prior obtained in (c) above as a
standard distribution. Show, moreover, that in this case the posterior mean is given by

  E(θ|x̄) = [∫_{−∞}^∞ θ (1 + θ²/ν)^{−(ν+1)/2} e^{−n(θ−x̄)²/(2σ₀²)} dθ]
          / [∫_{−∞}^∞ (1 + θ²/ν)^{−(ν+1)/2} e^{−n(θ−x̄)²/(2σ₀²)} dθ].
  π(τ²|x̄) = [σ₀²τ²/(σ₀² + τ²)]^{1/2} exp{−x̄²/(2(σ₀² + τ²))} exp{−1/(bτ²)} (τ²)^{−(a+3/2)}
           / ∫₀^∞ [σ₀²τ²/(σ₀² + τ²)]^{1/2} exp{−x̄²/(2(σ₀² + τ²))} exp{−1/(bτ²)} (τ²)^{−(a+3/2)} dτ².
(g) Explain how you can use the facts in (f) to implement the Gibbs sampler for the
purpose of evaluating the expectation in (a) above.
  X|λ ∼ Poisson(λ),
  λ|b ∼ Gamma(a, b),  a known,
  1/b ∼ Gamma(k, τ),  k, τ known.
(a) Show that

  λ|x, b ∼ Gamma(a + x, b/(1 + b)),
  (1/b)|x, λ ∼ Gamma(a + k, τ/(1 + λτ)).
(b) Explain how you can use the facts in (a) to implement the Gibbs sampler for the
purpose of calculating the hierarchical Bayes estimate of λ.
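The Gibbs sampler of part (b) simply alternates the two full conditionals in (a). A minimal sketch, assuming the shape–scale convention for the Gamma distributions (which matches numpy's `Generator.gamma`); the values a = 2, k = 3, τ = 1 and the observation x = 7 are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(2)
a, k, tau, x = 2.0, 3.0, 1.0, 7   # hypothetical known values and observation

# Alternate the two full conditionals from part (a)
B, burn = 20_000, 2_000
lam, b = 1.0, 1.0
lams = []
for i in range(B):
    lam = rng.gamma(a + x, b / (1 + b))                # λ | x, b
    b = 1.0 / rng.gamma(a + k, tau / (1 + lam * tau))  # 1/b | x, λ
    if i >= burn:
        lams.append(lam)

est = float(np.mean(lams))   # hierarchical Bayes estimate of λ (posterior mean)
print(round(est, 2))
```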
(b) Identify the posterior mean in (a) above as a ratio of two expectations of functions of
a normal variable with mean x and variance σ₀².
(c) Explain how you can use the fact in (b) to obtain a Monte Carlo estimate of E(θ|x).
(d) Identify the posterior mean in (a) above as a ratio of two expectations of functions of
a Cauchy variable with location parameter µ and scale parameter τ.
(e) Explain how you can use the fact in (d) to obtain a Monte Carlo estimate of E(θ|x).
(f) Can you give any qualitative comparison of the estimates obtained in (c) and (e)?
(g) Argue that the given model can be replaced by the following:

  λ|θ, x =_d λ|θ ∼ Exponential((τ² + (θ − µ)²)/(2τ²)),

where =_d denotes equality in distribution.
(i) Explain how you can use the facts in (h) to implement the Gibbs sampler for the
purpose of calculating the (hierarchical) Bayes estimate of θ.
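Parts (c) and (e) above can both be sketched with the same ratio-of-averages recipe: sample θ from one factor of the posterior kernel and weight by the other. The values x = 2, σ₀ = τ = 1, µ = 0 below are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(4)
x, sigma0, mu, tau = 2.0, 1.0, 0.0, 1.0   # illustrative values
M = 1_000_000

def cauchy_pdf(t):
    return 1.0 / (np.pi * tau * (1 + ((t - mu) / tau) ** 2))

# (c): sample θ from the normal N(x, σ0²), weight by the Cauchy prior density
t1 = rng.normal(x, sigma0, M)
est_c = np.sum(t1 * cauchy_pdf(t1)) / np.sum(cauchy_pdf(t1))

# (e): sample θ from the Cauchy prior, weight by the normal likelihood
t2 = mu + tau * rng.standard_cauchy(M)
w = np.exp(-0.5 * ((x - t2) / sigma0) ** 2)
est_e = np.sum(t2 * w) / np.sum(w)

print(round(est_c, 3), round(est_e, 3))  # two estimates of the same E(θ|x)
```

Both target the same posterior mean; comparing their variability across replications is one way to approach part (f).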
20. (This exercise provides an example where the posterior density is not of any standard
form but using techniques of numerical integration, it is possible to work with it. On the
other hand, Gibbs sampling provides an excellent alternative.)
Suppose we are studying the distribution of the number of defectives X in the daily
production of a product. Consider the model (X|Y, θ) ∼ Bin(Y, θ), where Y , a day’s pro-
duction, is a random variable with a Poisson distribution with known mean λ, and θ is the
probability that any product is defective. The difficulty, however, is that Y is not observ-
able, and inference has to be made on the basis of X only. The prior distribution is such
that θ|Y = y ∼ Beta(α, γ), with known α and γ, independent of Y .
(a) Show that X|θ ∼ Poisson(λθ). Show also that the posterior pdf of θ given X = x is
given by
π(θ|x) ∝ exp(−λθ)θx+α−1 (1 − θ)γ−1 , 0 < θ < 1.
This is not a standard density. However, using techniques of numerical integration, it is
possible to work with the posterior density.
(b) Show that
Y |X = x, θ ∼ x + Poisson(λ(1 − θ)),
θ|X = x, Y = y ∼ Beta(α + x, γ + y − x).
Explain how you can use these facts to implement the Gibbs sampler for the purpose of
calculating the Bayes estimate of θ.
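The Gibbs sampler of part (b) can be sketched directly from the two conditionals, and its answer cross-checked against the one-dimensional kernel from part (a) on a grid. The values λ = 30, α = 2, γ = 8 and the observation x = 4 are made-up illustrations.

```python
import numpy as np

rng = np.random.default_rng(3)
lam, alpha, gam, x = 30.0, 2.0, 8.0, 4   # hypothetical known values

# Gibbs: alternate the two conditionals from part (b)
B, burn = 20_000, 2_000
theta = 0.5
thetas = []
for i in range(B):
    y = x + rng.poisson(lam * (1 - theta))      # Y | x, θ
    theta = rng.beta(alpha + x, gam + y - x)    # θ | x, y
    if i >= burn:
        thetas.append(theta)
mc_mean = float(np.mean(thetas))

# Cross-check: normalize exp(-λθ) θ^(x+α-1) (1-θ)^(γ-1) on a grid (part (a))
t = np.linspace(1e-6, 1 - 1e-6, 100_000)
p = np.exp(-lam * t) * t**(x + alpha - 1) * (1 - t)**(gam - 1)
grid_mean = float((t * p).sum() / p.sum())
print(round(mc_mean, 3), round(grid_mean, 3))
```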
21. Let X1 , . . . , Xn be i.i.d. N(θ, σ02 ), σ0 known. Suppose θ has the N(µ, τ 2 ) prior.
(a) Construct the 100(1 − α)% HPD credible interval for θ.
(b) Consider the uniform prior for this problem by letting τ 2 → ∞. Work out (a) in
this case.
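Since the posterior in exercise 21 is normal, the HPD interval is symmetric about the precision-weighted posterior mean, and the flat-prior limit of part (b) falls out by sending τ² → ∞. A sketch with arbitrary illustrative values:

```python
from statistics import NormalDist

def hpd_normal(xbar, n, sigma0, mu, tau, alpha):
    """100(1-α)% HPD interval for θ: the posterior is normal, so the HPD
    interval is symmetric about the posterior (precision-weighted) mean."""
    prec = n / sigma0**2 + 1 / tau**2      # posterior precision
    mean = (n * xbar / sigma0**2 + mu / tau**2) / prec
    half = NormalDist().inv_cdf(1 - alpha / 2) / prec**0.5
    return mean - half, mean + half

lo, hi = hpd_normal(xbar=1.5, n=20, sigma0=2.0, mu=0.0, tau=1.0, alpha=0.05)
# Part (b): τ² → ∞ recovers the flat-prior interval x̄ ± z_{α/2} σ0/√n
lo2, hi2 = hpd_normal(1.5, 20, 2.0, 0.0, 1e6, 0.05)
print(round(lo, 3), round(hi, 3), round(lo2, 3), round(hi2, 3))
```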
24. Let X1, . . . , Xn be i.i.d. observations from a Beta(θ, 1) distribution and assume that θ
has a Gamma(a, σ) prior. Find a 100(1 − α)% credible set for θ.
25. Let X1, . . . , Xn be i.i.d. observations from an exponential distribution with scale pa-
rameter λ, where λ has the conjugate IG(a, b) prior, an inverted gamma with pdf

  π(λ|a, b) = [1/(Γ(a) bᵃ)] (1/λ)^{a+1} e^{−1/(bλ)},  0 < λ < ∞.
26. From the preceding exercise, find a 100(1 − α)% HPD credible interval for σ², the
variance of a normal distribution, based on the sample variance S² and using a conjugate
IG(a, b) prior for σ². Starting with this interval, find the limiting 100(1 − α)% HPD credible
interval for σ², obtained as a → 0 and b → ∞.
27. Let X1, . . . , Xn be i.i.d. N(θ, σ²), where both θ and σ are unknown. Our interest is in
the inference about θ only. Consider the prior pdf

  π(θ, σ²|µ, τ², a, b) = [1/(√(2π) τσ)] e^{−(θ−µ)²/(2τ²σ²)} · [1/(Γ(a) bᵃ)] (1/σ²)^{a+1} e^{−1/(bσ²)},
Is there any (limiting) configuration of τ 2 , a, and b that would allow this set to be approached
by a credible set from part (b)?
29. Let θ denote the probability of success with a particular drug for some disease. Consider
two different experiments to estimate θ. In the first experiment, n randomly chosen patients
are treated with this drug and let X denote the number of successes. In the other experi-
ment, patients are treated with this drug, one after another until r successes are observed.
In this experiment, let Y denote the total number of patients treated with this drug.
(a) Construct a 100(1 − α)% HPD credible interval for θ under the U(0, 1) prior when
X = x is observed.
(b) Construct a 100(1 − α)% HPD credible interval for θ under the U(0, 1) prior when
Y = y is observed.
30. Suppose that X|θ ∼ Bin(n, θ) and that θ has a Beta(r, s) distribution with r and s
integers. Let λ def= sθ/[r(1 − θ)]. Show how the quantiles of the F distribution can be used
to find upper and lower credible bounds for λ and for θ. [Hint: θ ∼ Beta(r, s) =⇒
sθ/[r(1 − θ)] ∼ F_{2r,2s}.]
31. Suppose that X1, . . . , Xn are i.i.d. Poisson(λ) and λ ∼ χ²_k/s₀, where s₀ is a constant.
Let T def= Σ_{i=1}^n Xi.
32. Suppose that X1, . . . , Xn are i.i.d. U(0, θ), and that θ has a Pareto(c, s) distribution
with pdf

  π(t) = s cˢ / t^{s+1},  t > c;  s, c > 0.

Let M def= X₍ₙ₎.
(a) Show that θ|M = m ∼ Pareto(max(c, m), s + n).
(b) Find level 1 − α upper and lower credible bounds for θ.
(c) Give a level 1 − α confidence level for θ.
(d) Compare the level 1 − α upper and lower credible bounds for θ to the level 1 − α
upper and lower confidence bounds for θ. In particular, consider the credible bounds as
n → ∞.
33. Suppose that X1, . . . , Xn are i.i.d. N(µ0, σ²), where µ0 is known. Let λ def= σ^{−2}.
Suppose λ has the gamma distribution with shape parameter a/2 and scale parameter 2/b.
Find a level 1 − α upper credible bound for σ².
Appendix
The following theorem and the corollary that follows tell us how to obtain Bayes estimates.
Theorem Let Θ have distribution Λ, and given Θ = θ, let X have distribution Pθ .
Suppose, in addition, the following assumptions hold for the problem of estimating g(Θ)
with non-negative loss function L(θ, d).
(b) If L(θ, d) = |d − g(θ)|, then δΛ(x) is any median of the conditional distribution
of Θ given x.
(c) If

  L(θ, d) = { 0 when |d − θ| ≤ c,
            { 1 when |d − θ| > c,

then δΛ(x) is the midpoint of the interval I of length 2c which maximizes
P[Θ ∈ I|x].
  X|θ ∼ f(x|θ),
  Θ|γ ∼ π(θ|γ).   (1)
where

  π(θ|x, γ0) = f(x|θ) π(θ|γ0) / ∫ f(x|θ) π(θ|γ0) dθ.   (2)
In a hierarchical Bayes model, rather than specifying the prior distribution as a single
function, we specify it in a hierarchy. Thus, we place another level on the model (1), and
write
X|θ ∼ f (x|θ),
Θ|γ ∼ π(θ|γ), (3)
Γ ∼ ψ(γ),
where we assume that ψ(·) is known and not dependent on any other unknown hyperpa-
rameters (as parameters of a prior are sometimes called). Note that we can continue this
hierarchical modeling and add more stages to the model, but this is not often done in prac-
tice. The class of models (3) appears to be more general than the class (1) since in (1), γ
has a fixed value, but in (3), it is permitted to have an arbitrary probability distribution.
However, this appearance is deceptive. Since π(θ|γ) in (1) can be any fixed distribution, we
can, in particular, take for it π(θ) = ∫ π(θ|γ) ψ(γ) dγ, which reduces the hierarchical model
(3) to the single-prior model (1). However, there is a conceptual and practical advantage to
the hierarchical model, in that it allows us to model relatively complicated situations using
a series of simpler steps; that is, both π(θ|γ) and ψ(γ) may be of a simple form (even conju-
gate), but π(θ) may be more complex. Moreover, there is often a computational advantage
to hierarchical modeling.
as i → ∞, and

  (1/M) Σ_{i=1}^M h(θi) → E(h(θ)|x) = ∫ h(θ) π(θ|x) dθ.   (8.1)

Notice also that

  (1/M) Σ_{i=1}^M E(h(θ)|x, γi) → ∫ E(h(θ)|x, γ) π(γ|x) dγ = E(h(θ)|x).   (8.2)
Thus, Gibbs sampling provides two methods of calculating the same quantity (the posterior
mean, in this case). Notice, in view of the Rao-Blackwell theorem, that (8.2) is superior to
(8.1).
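The two averages can be compared empirically on any model with tractable conditionals; the sketch below reuses the Poisson-Gamma hierarchy from the exercises above (λ in the role of θ, b in the role of γ), with the same made-up values a = 2, k = 3, τ = 1, x = 7. Estimator (8.1) averages the raw λ draws; estimator (8.2) averages the conditional means E(λ|x, b_i) = (a + x) b_i/(1 + b_i).

```python
import numpy as np

a, k, tau, x = 2.0, 3.0, 1.0, 7   # hypothetical values, as in the sketch above

def gibbs_estimates(M, seed):
    """Run one Gibbs chain and return the pair of estimates (8.1) and (8.2)."""
    rng = np.random.default_rng(seed)
    lam, b = 1.0, 1.0
    e1 = e2 = 0.0
    for _ in range(M):
        lam = rng.gamma(a + x, b / (1 + b))
        b = 1.0 / rng.gamma(a + k, tau / (1 + lam * tau))
        e1 += lam                        # (8.1): raw average of λ draws
        e2 += (a + x) * b / (1 + b)      # (8.2): average of E(λ | x, b_i)
    return e1 / M, e2 / M

# Replicate the chain to compare the spread of the two estimators
runs = np.array([gibbs_estimates(2_000, s) for s in range(50)])
m1, m2 = runs.mean(axis=0)
v1, v2 = runs.var(axis=0)
print(round(m1, 2), round(m2, 2), round(v1, 5), round(v2, 5))
```

In replications like this, the Rao-Blackwellized average (8.2) typically shows the smaller spread, consistent with the remark above.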