
Homework solutions, Math 525, Fall 2003

Textbook: Bickel-Doksum, 2nd edition


Assignment # 2

Section 1.2.

1. (a) Since $\pi(\theta_1) = \pi(\theta_2)$,

$$\pi(\theta_i \mid x) = \frac{\pi(\theta_i)\,p(x \mid \theta_i)}{\pi(\theta_1)\,p(x \mid \theta_1) + \pi(\theta_2)\,p(x \mid \theta_2)}
 = \frac{p(x \mid \theta_i)}{p(x \mid \theta_1) + p(x \mid \theta_2)}, \qquad i = 1, 2.$$

Therefore,

$$\pi(\theta_1 \mid 0) = \frac{0.8}{0.8 + 0.4} = \frac{2}{3}, \qquad \pi(\theta_1 \mid 1) = \frac{0.2}{0.2 + 0.6} = \frac{1}{4},$$

$$\pi(\theta_2 \mid 0) = \frac{0.4}{0.8 + 0.4} = \frac{1}{3}, \qquad \pi(\theta_2 \mid 1) = \frac{0.6}{0.2 + 0.6} = \frac{3}{4}.$$
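
As a quick check of these numbers, here is a short Python sketch (not part of the solution) assuming, as above, success probabilities 0.2 under $\theta_1$ and 0.6 under $\theta_2$ and equal prior weights:

```python
# Sketch: posterior over {theta_1, theta_2} after a single Bernoulli observation x,
# assuming success probabilities 0.2 (theta_1) and 0.6 (theta_2) and a uniform prior.
def posterior_single(x, prior=(0.5, 0.5), success=(0.2, 0.6)):
    lik = [p if x == 1 else 1 - p for p in success]    # p(x | theta_i)
    joint = [pr * l for pr, l in zip(prior, lik)]      # pi(theta_i) * p(x | theta_i)
    total = sum(joint)
    return tuple(j / total for j in joint)             # pi(theta_i | x)

print(posterior_single(0))   # (0.666..., 0.333...), i.e. (2/3, 1/3)
print(posterior_single(1))   # (0.25, 0.75),         i.e. (1/4, 3/4)
```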

(b)

$$p(x_1, \cdots, x_n \mid \theta_1) = (0.2)^{\sum_{i=1}^n x_i}\,(0.8)^{\,n - \sum_{i=1}^n x_i},$$

$$p(x_1, \cdots, x_n \mid \theta_2) = (0.6)^{\sum_{i=1}^n x_i}\,(0.4)^{\,n - \sum_{i=1}^n x_i}.$$

Hence

$$\pi(\theta_i \mid x_1, \cdots, x_n) = \frac{\pi(\theta_i)\,p(x_1, \cdots, x_n \mid \theta_i)}{\pi(\theta_1)\,p(x_1, \cdots, x_n \mid \theta_1) + \pi(\theta_2)\,p(x_1, \cdots, x_n \mid \theta_2)}
 = \frac{p(x_1, \cdots, x_n \mid \theta_i)}{p(x_1, \cdots, x_n \mid \theta_1) + p(x_1, \cdots, x_n \mid \theta_2)}, \qquad i = 1, 2.$$

Thus, writing $S = \sum_{i=1}^n x_i$,

$$\pi(\theta_1 \mid x_1, \cdots, x_n) = \frac{(0.2)^{S}(0.8)^{n-S}}{(0.2)^{S}(0.8)^{n-S} + (0.6)^{S}(0.4)^{n-S}},$$

$$\pi(\theta_2 \mid x_1, \cdots, x_n) = \frac{(0.6)^{S}(0.4)^{n-S}}{(0.2)^{S}(0.8)^{n-S} + (0.6)^{S}(0.4)^{n-S}}.$$

(c) With the prior $\pi_1(\theta_1) = 0.25$, $\pi_1(\theta_2) = 0.75$,

$$\pi_1(\theta_1 \mid x_1, \cdots, x_n) = \frac{0.25\,(0.2)^{S}(0.8)^{n-S}}{0.25\,(0.2)^{S}(0.8)^{n-S} + 0.75\,(0.6)^{S}(0.4)^{n-S}}
 = \frac{(0.2)^{S}(0.8)^{n-S}}{(0.2)^{S}(0.8)^{n-S} + 3\,(0.6)^{S}(0.4)^{n-S}},$$

$$\pi_1(\theta_2 \mid x_1, \cdots, x_n) = \frac{3\,(0.6)^{S}(0.4)^{n-S}}{(0.2)^{S}(0.8)^{n-S} + 3\,(0.6)^{S}(0.4)^{n-S}}.$$

(d) For the prior distribution $\pi$,

$$\pi\Big(\theta_1 \,\Big|\, \sum_{i=1}^n X_i = \frac{n}{2}\Big) = \frac{(0.2)^{n/2}(0.8)^{n/2}}{(0.2)^{n/2}(0.8)^{n/2} + (0.6)^{n/2}(0.4)^{n/2}},
\qquad
\pi\Big(\theta_2 \,\Big|\, \sum_{i=1}^n X_i = \frac{n}{2}\Big) = \frac{(0.6)^{n/2}(0.4)^{n/2}}{(0.2)^{n/2}(0.8)^{n/2} + (0.6)^{n/2}(0.4)^{n/2}}.$$

For n = 2,

$$\pi\Big(\theta_1 \,\Big|\, \sum_{i=1}^2 X_i = 1\Big) = \frac{(0.2)(0.8)}{(0.2)(0.8) + (0.6)(0.4)} = \frac{2}{5},
\qquad
\pi\Big(\theta_2 \,\Big|\, \sum_{i=1}^2 X_i = 1\Big) = 1 - \frac{2}{5} = \frac{3}{5}.$$

For n = 100,

$$\pi\Big(\theta_1 \,\Big|\, \sum_{i=1}^{100} X_i = 50\Big) = \frac{(1.6)^{50}}{(1.6)^{50} + (2.4)^{50}},
\qquad
\pi\Big(\theta_2 \,\Big|\, \sum_{i=1}^{100} X_i = 50\Big) = \frac{(2.4)^{50}}{(1.6)^{50} + (2.4)^{50}}.$$

For the prior distribution $\pi_1$, as n = 2,

$$\pi_1\Big(\theta_1 \,\Big|\, \sum_{i=1}^2 X_i = 1\Big) = \frac{(0.2)(0.8)}{(0.2)(0.8) + 3(0.6)(0.4)} = \frac{2}{11},
\qquad
\pi_1\Big(\theta_2 \,\Big|\, \sum_{i=1}^2 X_i = 1\Big) = 1 - \frac{2}{11} = \frac{9}{11},$$

and as n = 100,

$$\pi_1\Big(\theta_1 \,\Big|\, \sum_{i=1}^{100} X_i = 50\Big) = \frac{(1.6)^{50}}{(1.6)^{50} + 3(2.4)^{50}},
\qquad
\pi_1\Big(\theta_2 \,\Big|\, \sum_{i=1}^{100} X_i = 50\Big) = \frac{3(2.4)^{50}}{(1.6)^{50} + 3(2.4)^{50}}.$$
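
These values can be confirmed numerically; the following sketch assumes the same Bernoulli(0.2)/Bernoulli(0.6) models and evaluates the posterior of each $\theta$ given $\sum X_i = k$ (the binomial coefficient cancels from numerator and denominator):

```python
# Sketch: posterior probability of each theta given sum(X_i) = k, assuming the
# Bernoulli(0.2) / Bernoulli(0.6) models above.
def posterior_given_sum(n, k, prior=(0.5, 0.5)):
    w1 = prior[0] * 0.2**k * 0.8**(n - k)   # proportional to pi(theta_1 | sum = k)
    w2 = prior[1] * 0.6**k * 0.4**(n - k)   # proportional to pi(theta_2 | sum = k)
    return w1 / (w1 + w2), w2 / (w1 + w2)

print(posterior_given_sum(2, 1))                      # (0.4, 0.6)   = (2/5, 3/5)
print(posterior_given_sum(2, 1, prior=(0.25, 0.75)))  # (2/11, 9/11) ~ (0.1818, 0.8182)
print(posterior_given_sum(100, 50))                   # theta_2 gets essentially all the mass
print(posterior_given_sum(100, 50, prior=(0.25, 0.75)))
```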

(e) First, I believe the $k$ there should be $n/2$. By definition,

$$\mathop{\arg\max}_{\theta}\, \pi\Big(\theta \,\Big|\, \sum_{i=1}^n X_i = \frac{n}{2}\Big)
 = \Big\{\theta :\ \pi\Big(\theta \,\Big|\, \sum_{i=1}^n X_i = \frac{n}{2}\Big) = \max_{s}\, \pi\Big(s \,\Big|\, \sum_{i=1}^n X_i = \frac{n}{2}\Big)\Big\}.$$

By (d), in both cases n = 2 and n = 100,

$$\mathop{\arg\max}_{\theta}\, \pi\Big(\theta \,\Big|\, \sum_{i=1}^n X_i = \frac{n}{2}\Big) = \theta_2.$$

3. (a) Since the prior puts equal mass on 1/4, 1/2, 3/4,

$$\pi(\theta \mid X = 2) = \frac{\pi(\theta)\,P\{X = 2 \mid \theta\}}{\pi(1/4)P\{X = 2 \mid \theta = 1/4\} + \pi(1/2)P\{X = 2 \mid \theta = 1/2\} + \pi(3/4)P\{X = 2 \mid \theta = 3/4\}}$$
$$= \frac{(1-\theta)^2\theta}{(3/4)^2(1/4) + (1/2)^2(1/2) + (1/4)^2(3/4)} = \frac{16}{5}(1-\theta)^2\theta.$$

Hence

$$\pi(1/4 \mid X = 2) = \frac{9}{20}, \qquad \pi(1/2 \mid X = 2) = \frac{2}{5}, \qquad \pi(3/4 \mid X = 2) = \frac{3}{20}.$$

(b) Given X = 2, the most probable value of θ is 1/4. For X = k,

$$\pi(\theta \mid X = k) = \frac{(1-\theta)^k\theta}{(3/4)^k(1/4) + (1/2)^k(1/2) + (1/4)^k(3/4)},$$

so we need to compare $(3/4)^k(1/4)$, $(1/2)^k(1/2)$ and $(1/4)^k(3/4)$. For k = 0 the third is
the largest, so 3/4 is most probable. For k = 1 the second is the largest, so 1/2 is
most probable. For k ≥ 2 the first is the largest, so 1/4 is most probable.

(c) The posterior density of θ is

$$\pi(\theta \mid X = k)
 = \frac{\theta^{r-1}(1-\theta)^{s-1}}{B(r,s)}\,P\{X = k \mid \theta\}
   \Big(\int_0^1 \frac{x^{r-1}(1-x)^{s-1}}{B(r,s)}\,P\{X = k \mid \theta = x\}\,dx\Big)^{-1}$$
$$= \theta^{r-1}(1-\theta)^{s-1}(1-\theta)^k\theta
   \Big(\int_0^1 x^{r-1}(1-x)^{s-1}(1-x)^k x\,dx\Big)^{-1}
 = \frac{\theta^{r}(1-\theta)^{s+k-1}}{B(r+1,\, s+k)},$$

where 0 < θ < 1. So the posterior distribution of θ given X = k is the beta distribution β(r + 1, s + k).
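
As a sanity check, one can verify on a grid that the normalized posterior coincides with the β(r + 1, s + k) density; the sketch below uses scipy, and the values of r, s, k are illustrative only:

```python
import numpy as np
from scipy.special import beta as beta_fn
from scipy.stats import beta

# Sketch: check on a grid that theta**r * (1 - theta)**(s + k - 1) / B(r + 1, s + k)
# equals the beta(r + 1, s + k) density; r, s, k below are illustrative values only.
r, s, k = 2.0, 3.0, 4
theta = np.linspace(0.01, 0.99, 99)
posterior = theta**r * (1 - theta)**(s + k - 1) / beta_fn(r + 1, s + k)
print(np.allclose(posterior, beta.pdf(theta, r + 1, s + k)))   # True
```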

4. (a) Notice that $p(x_1, \cdots, x_n \mid j) = 0$ when $j < \max\{x_1, \cdots, x_n\} = m$, and $p(x_1, \cdots, x_n \mid j) = j^{-n}$ when $j \ge m$. Hence

$$\pi(j \mid x_1, \cdots, x_n) = \frac{\pi(j)\,p(x_1, \cdots, x_n \mid j)}{\sum_{k=1}^{\infty}\pi(k)\,p(x_1, \cdots, x_n \mid k)}
 = \frac{c(a)\,j^{-a}\,j^{-n}}{c(a)\sum_{k=m}^{\infty}k^{-a-n}} = \frac{c(n+a,\, m)}{j^{a+n}}, \qquad j = m, m+1, \cdots,$$

where $c(n+a,\, m) = \big(\sum_{k=m}^{\infty}k^{-(n+a)}\big)^{-1}$.

(b)

$$\pi(m \mid x_1, \cdots, x_n) = \frac{c(n+a,\, m)}{m^{a+n}}
 = \Big(\sum_{j=m}^{\infty}\Big(\frac{m}{j}\Big)^{n+a}\Big)^{-1}
 = \Big(1 + \sum_{j=m+1}^{\infty}\Big(\frac{m}{j}\Big)^{n+a}\Big)^{-1}.$$

Therefore the conclusion follows from the fact that

$$\sum_{j=m+1}^{\infty}\Big(\frac{m}{j}\Big)^{n+a} \longrightarrow 0 \qquad (n \to \infty),$$

which can easily be checked by either the dominated or the monotone convergence theorem.
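
The convergence of the tail sum can also be seen numerically; in the sketch below the values m = 3, a = 2 and the truncation point are illustrative choices:

```python
# Sketch: the tail sum sum_{j > m} (m / j)**(n + a), truncated at J, for growing n;
# m = 3, a = 2 and J = 100_000 are illustrative choices.
m, a, J = 3, 2, 100_000
for n in (1, 2, 5, 10, 20):
    tail = sum((m / j)**(n + a) for j in range(m + 1, J + 1))
    print(n, tail)
# the tail shrinks toward 0, so pi(m | x_1, ..., x_n) -> 1 as n grows
```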

Explanation: {X1, · · · , Xn} is an i.i.d. sample from the uniform distribution on {1, · · · , θ}, where θ is an unknown parameter equal to the exact upper bound of the support. It is intuitively clear that max{X1, · · · , Xn} converges to θ in a suitable sense as n → ∞. Another way to describe this phenomenon is to say that the randomized parameter θ takes the value m = max{x1, · · · , xn} with probability close to 1 when n is large.

Section 1.3.

1. (a), (b) We use the formula on p. 25:

$$R(\theta, \delta) = l(\theta, a_1)P\{\delta(X) = a_1\} + l(\theta, a_2)P\{\delta(X) = a_2\} + l(\theta, a_3)P\{\delta(X) = a_3\}
 = \begin{cases} P_{\theta_1}\{\delta(X) = a_2\} + 2P_{\theta_1}\{\delta(X) = a_3\} & \text{if } \theta = \theta_1, \\ 2P_{\theta_2}\{\delta(X) = a_1\} + P_{\theta_2}\{\delta(X) = a_3\} & \text{if } \theta = \theta_2. \end{cases}$$

We now consider the distributions of the decision rules δi, i = 1, 2, · · · , 9:


i = 1:
Pθ1 {δ1 (X) = a1 } = 1, Pθ1 {δ1 (X) = a2 } = 0 Pθ1 {δ1 (X) = a3 } = 0
Pθ2 {δ1 (X) = a1 } = 1, Pθ2 {δ1 (X) = a2 } = 0 Pθ2 {δ1 (X) = a3 } = 0

i = 2:

Pθ1 {δ2 (X) = a1 } = p, Pθ1 {δ2 (X) = a2 } = 1 − p, Pθ1 {δ2 (X) = a3 } = 0

Pθ2 {δ2 (X) = a1 } = q, Pθ2 {δ2 (X) = a2 } = 1 − q, Pθ2 {δ2 (X) = a3 } = 0

i = 3:

Pθ1 {δ3 (X) = a1 } = p, Pθ1 {δ3 (X) = a2 } = 0, Pθ1 {δ3 (X) = a3 } = 1 − p

Pθ2 {δ3 (X) = a1 } = q, Pθ2 {δ3 (X) = a2 } = 0, Pθ2 {δ3 (X) = a3 } = 1 − q

i = 4:

Pθ1 {δ4 (X) = a1 } = 1 − p, Pθ1 {δ4 (X) = a2 } = p, Pθ1 {δ4 (X) = a3 } = 0

Pθ2 {δ4 (X) = a1 } = 1 − q, Pθ2 {δ4 (X) = a2 } = q, Pθ2 {δ4 (X) = a3 } = 0

i = 5:
Pθ1 {δ5 (X) = a1 } = 0, Pθ1 {δ5 (X) = a2 } = 1, Pθ1 {δ5 (X) = a3 } = 0
Pθ2 {δ5 (X) = a1 } = 0, Pθ2 {δ5 (X) = a2 } = 1, Pθ2 {δ5 (X) = a3 } = 0

i = 6:

Pθ1 {δ6 (X) = a1 } = 0, Pθ1 {δ6 (X) = a2 } = p, Pθ1 {δ6 (X) = a3 } = 1 − p

Pθ2 {δ6 (X) = a1 } = 0, Pθ2 {δ6 (X) = a2 } = q, Pθ2 {δ6 (X) = a3 } = 1 − q

i = 7:

Pθ1 {δ7 (X) = a1 } = 1 − p, Pθ1 {δ7 (X) = a2 } = 0, Pθ1 {δ7 (X) = a3 } = p

Pθ2 {δ7 (X) = a1 } = 1 − q, Pθ2 {δ7 (X) = a2 } = 0, Pθ2 {δ7 (X) = a3 } = q

i = 8:

Pθ1 {δ8 (X) = a1 } = 0, Pθ1 {δ8 (X) = a2 } = 1 − p, Pθ1 {δ8 (X) = a3 } = p

Pθ2 {δ8 (X) = a1 } = 0, Pθ2 {δ8 (X) = a2 } = 1 − q, Pθ2 {δ8 (X) = a3 } = q

i = 9:
Pθ1 {δ9 (X) = a1 } = 0, Pθ1 {δ9 (X) = a2 } = 0, Pθ1 {δ9 (X) = a3 } = 1
Pθ2 {δ9 (X) = a1 } = 0, Pθ2 {δ9 (X) = a2 } = 0, Pθ2 {δ9 (X) = a3 } = 1
Plugging these in, we obtain all the risk points R(θj, δi) (j = 1, 2; i = 1, 2, · · · , 9).

(c) In the case of (a), δ(X) has the same distribution regardless of the value of θ, so

$$R(\theta, \delta) = \begin{cases} P\{\delta(X) = a_2\} + 2P\{\delta(X) = a_3\} & \text{if } \theta = \theta_1, \\ 2P\{\delta(X) = a_1\} + P\{\delta(X) = a_3\} & \text{if } \theta = \theta_2. \end{cases}$$

Hence,

$$R(\theta, \delta_1) = \begin{cases} 0 & \theta = \theta_1 \\ 1 & \theta = \theta_2 \end{cases}, \qquad
R(\theta, \delta_2) = \begin{cases} 0.9 & \theta = \theta_1 \\ 0.2 & \theta = \theta_2 \end{cases}, \qquad
R(\theta, \delta_3) = \begin{cases} 1.8 & \theta = \theta_1 \\ 1.1 & \theta = \theta_2 \end{cases},$$

$$R(\theta, \delta_4) = \begin{cases} 0.1 & \theta = \theta_1 \\ 1.8 & \theta = \theta_2 \end{cases}, \qquad
R(\theta, \delta_5) = \begin{cases} 1 & \theta = \theta_1 \\ 0 & \theta = \theta_2 \end{cases}, \qquad
R(\theta, \delta_6) = \begin{cases} 1.9 & \theta = \theta_1 \\ 0.9 & \theta = \theta_2 \end{cases},$$

$$R(\theta, \delta_7) = \begin{cases} 1.1 & \theta = \theta_1 \\ 1.9 & \theta = \theta_2 \end{cases}, \qquad
R(\theta, \delta_8) = \begin{cases} 1.1 & \theta = \theta_1 \\ 0.1 & \theta = \theta_2 \end{cases}, \qquad
R(\theta, \delta_9) = \begin{cases} 2 & \theta = \theta_1 \\ 1 & \theta = \theta_2 \end{cases}.$$
The minimax rule is δ2 .

(d).
r(δ1 ) = 0.5, r(δ2 ) = 0.55, r(δ3 ) = 1.45, r(δ4 ) = 0.95, r(δ5 ) = 0.5
r(δ6 ) = 1.4, r(δ7 ) = 1.5, r(δ8 ) = 0.6, r(δ9 ) = 1.5
The Bayes rules are δ1 and δ5 .
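
Both conclusions amount to a scan over the nine risk points; a small sketch, assuming the uniform prior $\pi(\theta_1) = \pi(\theta_2) = 1/2$ that the Bayes risks in (d) correspond to:

```python
# Sketch: scan the nine risk points (R(theta_1, delta_i), R(theta_2, delta_i)) computed above;
# the prior (0.5, 0.5) is assumed for the Bayes risks.
risks = [(0, 1), (0.9, 0.2), (1.8, 1.1), (0.1, 1.8), (1, 0),
         (1.9, 0.9), (1.1, 1.9), (1.1, 0.1), (2, 1)]
prior = (0.5, 0.5)

minimax = 1 + min(range(9), key=lambda i: max(risks[i]))
bayes_risk = [prior[0] * r1 + prior[1] * r2 for r1, r2 in risks]
bayes = [i + 1 for i, b in enumerate(bayes_risk) if abs(b - min(bayes_risk)) < 1e-12]
print("minimax rule:", minimax)   # 2
print("Bayes rules: ", bayes)     # [1, 5]
```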

8. (a) Let μ be the mean of the population. Then

$$E s^2 = (n-1)^{-1}\,E\sum_{i=1}^n\big([X_i - \mu] - [\bar X - \mu]\big)^2$$
$$= (n-1)^{-1}\sum_{i=1}^n\Big\{\sigma^2 - 2E\big([X_i - \mu][\bar X - \mu]\big) + \mathrm{Var}(\bar X)\Big\}$$
$$= (n-1)^{-1}\Big\{n\sigma^2 - 2n\,\mathrm{Var}(\bar X) + n\,\mathrm{Var}(\bar X)\Big\}
 = (n-1)^{-1}\Big\{n\sigma^2 - n\,\mathrm{Var}(\bar X)\Big\} = (n-1)^{-1}\big\{n\sigma^2 - \sigma^2\big\} = \sigma^2,$$

using $E\big([X_i - \mu][\bar X - \mu]\big) = \mathrm{Var}(\bar X) = \sigma^2/n$.
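
A Monte Carlo sketch of this unbiasedness (not part of the solution; the normal population and the values n = 10, σ² = 4 are illustrative choices):

```python
import numpy as np

# Sketch: Monte Carlo check that the sample variance with divisor n - 1 has mean sigma^2;
# the normal population, n = 10 and sigma^2 = 4 are illustrative choices.
rng = np.random.default_rng(0)
n, sigma2, reps = 10, 4.0, 200_000
x = rng.normal(loc=1.0, scale=np.sqrt(sigma2), size=(reps, n))
s2 = x.var(axis=1, ddof=1)   # divisor n - 1
print(s2.mean())             # close to 4.0
```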

(b)(i) Since $s^2$ is unbiased,

$$MSE(s^2) = \mathrm{Var}(s^2) = E(s^2)^2 - \sigma^4.$$

Notice that $\sum_{i=1}^n\big(X_i - \bar X\big)^2 \sim \sigma^2\chi^2_{n-1}$, so there are i.i.d. standard normal random variables $Y_1, \cdots, Y_{n-1}$ such that

$$E(s^2)^2 = \frac{1}{(n-1)^2}\,E\Big(\sum_{i=1}^n\big(X_i - \bar X\big)^2\Big)^2 = \frac{\sigma^4}{(n-1)^2}\,E\Big(\sum_{k=1}^{n-1}Y_k^2\Big)^2$$
$$= \frac{\sigma^4}{(n-1)^2}\sum_{j,k=1}^{n-1}E\,Y_j^2Y_k^2 = \frac{\sigma^4}{(n-1)^2}\Big\{\sum_{k=1}^{n-1}E\,Y_k^4 + 2\sum_{1\le j<k\le n-1}E\,Y_j^2Y_k^2\Big\}$$
$$= \frac{\sigma^4}{(n-1)^2}\big\{3(n-1) + (n-1)^2 - (n-1)\big\} = \frac{n+1}{n-1}\,\sigma^4.$$

So

$$MSE(s^2) = \frac{n+1}{n-1}\,\sigma^4 - \sigma^4 = \frac{2}{n-1}\,\sigma^4.$$

(ii)

$$MSE(\hat\sigma_0^2) = \mathrm{Var}(\hat\sigma_0^2) + \big(E\hat\sigma_0^2 - \sigma^2\big)^2
 = E(\hat\sigma_0^2)^2 - \big(E\hat\sigma_0^2\big)^2 + \big(E\hat\sigma_0^2 - \sigma^2\big)^2$$
$$= c^2\sigma^4\big[(n-1)^2 + 2(n-1)\big] - c^2(n-1)^2\sigma^4 + \big(c(n-1)\sigma^2 - \sigma^2\big)^2
 = \sigma^4\Big\{2c^2(n-1) + \big(c(n-1) - 1\big)^2\Big\}.$$

Setting the derivative with respect to c equal to zero gives $4c(n-1) + 2(n-1)\big(c(n-1) - 1\big) = 0$, i.e. $c(n+1) = 1$, so $c = (n+1)^{-1}$ is the minimizer of the right-hand side.
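
A numerical sketch confirming the minimizer, with n = 10 as an illustrative choice:

```python
import numpy as np

# Sketch: minimize g(c) = 2 c^2 (n - 1) + (c (n - 1) - 1)^2 on a grid and compare
# with c = 1 / (n + 1); n = 10 is an illustrative choice.
n = 10
c = np.linspace(0.0, 0.5, 500_001)
g = 2 * c**2 * (n - 1) + (c * (n - 1) - 1)**2
print(c[np.argmin(g)], 1 / (n + 1))   # both approximately 0.0909
```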

9. $E\hat p = \theta$ and $E\hat\theta = 0.02 + 0.8\,E\hat p = 0.02 + 0.8\theta$. Hence

$$MSE(\hat p) = \mathrm{Var}(\hat p) = \frac{\sigma^2}{n},$$

$$MSE(\hat\theta) = \mathrm{Var}(\hat\theta) + \big(0.02 + 0.8\theta - \theta\big)^2
 = 0.64\,\mathrm{Var}(\hat p) + 0.04\,(0.1 - \theta)^2 = 0.64\,\frac{\sigma^2}{n} + 0.04\,(0.1 - \theta)^2.$$

Solving the inequality $MSE(\hat\theta) < MSE(\hat p)$ gives

$$|\theta - 0.1| < 3\,\frac{\sigma}{\sqrt n}.$$
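
A sketch comparing the two mean squared errors over a grid of θ, treating σ²/n as a fixed number (σ = 0.2 and n = 100 are illustrative choices):

```python
import numpy as np

# Sketch: find where MSE(theta_hat) = 0.64 * v + 0.04 * (0.1 - theta)^2 beats
# MSE(p_hat) = v, with v = sigma^2 / n fixed; sigma = 0.2, n = 100 are illustrative.
sigma, n = 0.2, 100
v = sigma**2 / n                                   # sigma^2 / n = 0.0004
theta = np.linspace(0.0, 1.0, 10_001)
better = 0.64 * v + 0.04 * (0.1 - theta)**2 < v    # where theta_hat beats p_hat
print(theta[better].min(), theta[better].max())    # about 0.04 and 0.16, i.e. 0.1 +/- 3*sigma/sqrt(n)
```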

19. (a) Given a decision rule δ: {0, 1} → {a1, a2},

$$R(\theta, \delta) = E\,l(\theta, \delta(X)) = \begin{cases} 2P_{\theta_1}\{\delta(X) = a_2\} & \theta = \theta_1, \\ 3P_{\theta_2}\{\delta(X) = a_1\} + P_{\theta_2}\{\delta(X) = a_2\} & \theta = \theta_2. \end{cases}$$

There are 4 possible non-randomized decision rules:

$$\delta_1(0) = a_1, \quad \delta_1(1) = a_2,$$
$$\delta_2(0) = a_2, \quad \delta_2(1) = a_1,$$
$$\delta_3(0) = a_1, \quad \delta_3(1) = a_1,$$
$$\delta_4(0) = a_2, \quad \delta_4(1) = a_2.$$

We have

$$R(\theta, \delta_1) = \begin{cases} 1.6 & \theta = \theta_1 \\ 1.8 & \theta = \theta_2 \end{cases}, \qquad
R(\theta, \delta_2) = \begin{cases} 0.4 & \theta = \theta_1 \\ 2.2 & \theta = \theta_2 \end{cases},$$

$$R(\theta, \delta_3) = \begin{cases} 0 & \theta = \theta_1 \\ 3 & \theta = \theta_2 \end{cases}, \qquad
R(\theta, \delta_4) = \begin{cases} 2 & \theta = \theta_1 \\ 1 & \theta = \theta_2 \end{cases}.$$
The minimax rule is δ1 .

(b) The risk function of any randomized decision rule δ can be written in the form

$$R(\theta, \delta) = \sum_{i=1}^4 \lambda_i R(\theta, \delta_i),$$

where $\lambda_1, \lambda_2, \lambda_3, \lambda_4$ are non-negative numbers satisfying $\lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 = 1$.
The risk set S takes the form

$$S = \Big\{\big(1.6\lambda_1 + 0.4\lambda_2 + 2\lambda_4,\ \ 1.8\lambda_1 + 2.2\lambda_2 + 3\lambda_3 + \lambda_4\big)\Big\}.$$

By graphing, the optimal point lies on the line segment between $\big(R(\theta_1, \delta_2), R(\theta_2, \delta_2)\big)$ and $\big(R(\theta_1, \delta_4), R(\theta_2, \delta_4)\big)$; that is,

$$R(\theta_1, \delta) = 0.4\lambda + 2(1 - \lambda) \qquad \text{and} \qquad R(\theta_2, \delta) = 2.2\lambda + (1 - \lambda).$$

Solving $0.4\lambda + 2(1 - \lambda) = 2.2\lambda + (1 - \lambda)$ we get $\lambda = 5/14$. So the minimax rule among the randomized decision rules is

$$\delta = \begin{cases} \delta_2 & \text{with probability } 5/14, \\ \delta_4 & \text{with probability } 9/14. \end{cases}$$
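
The value λ = 5/14 can also be found numerically by minimizing the maximum risk of the mixture of δ2 and δ4; a short sketch using the risk points from (a):

```python
import numpy as np

# Sketch: mix delta_2 and delta_4 with weight lambda on delta_2 and minimize the
# maximum of the two risk coordinates; the risk points come from part (a).
r2 = np.array([0.4, 2.2])   # (R(theta_1, delta_2), R(theta_2, delta_2))
r4 = np.array([2.0, 1.0])   # (R(theta_1, delta_4), R(theta_2, delta_4))

lam = np.linspace(0.0, 1.0, 14_001)
mix = np.outer(lam, r2) + np.outer(1 - lam, r4)   # one row of risks per lambda
best = lam[np.argmin(mix.max(axis=1))]
print(best, 5 / 14)                               # both approximately 0.3571
```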

(c). The Bayes risks are

r(δ1 ) = 0.1 × 1.6 + 0.9 × 1.8 = 1.78

r(δ2 ) = 0.1 × 0.4 + 0.9 × 2.2 = 2.02


r(δ3 ) = 0.1 × 0 + 0.9 × 3 = 2.7
r(δ4 ) = 0.1 × 2 + 0.9 × 1 = 1.1
So the Bayes rule is δ4 .
