
1 Solution to Problem 8.1

Assume the coin is fair (that is, H0 : p = 0.5 holds, where p is the probability of heads
on a single flip), and assume that the flips are independent (which seems reasonable). Letting X be
the number of heads in 1,000 flips, we have X ∼ binomial(1000, .5). If we use the normal
approximation to the binomial, then
Z = (X − 500)/√(1000 · 0.5 · 0.5)
is approximately standard normal N(0, 1). The observed value of Z is

z = (560 − 500)/√(1000 · 0.5 · 0.5) = 3.795.
This is a very large value for a Z-statistic. The corresponding p-value is

p = P[|Z| ≥ 3.795] = 2 · 7.4 × 10^−5 ≈ 0.00015.

It is very unlikely to see a value of X this far from its mean. The exact binomial p-value
would be

p = P[X ≥ 560 or X ≤ 440 | p = 0.5] = 2 · 8.252 × 10^−5 ≈ 0.00017.

It would be unreasonable to assume the coin is fair.


(Note that we computed 2-sided p-values above.)
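
As a quick numerical check, here is a short Python sketch (using scipy, which is not required for the course; it should reproduce the 3.795 and the two p-values quoted above):

    # Check of the normal-approximation and exact binomial p-values.
    from scipy.stats import binom, norm

    n, x, p0 = 1000, 560, 0.5

    # Normal approximation: Z = (X - n*p0)/sqrt(n*p0*(1 - p0))
    z = (x - n * p0) / (n * p0 * (1 - p0)) ** 0.5
    p_approx = 2 * norm.sf(abs(z))      # two-sided tail area

    # Exact binomial: P[X >= 560 or X <= 440 | p = 0.5]
    p_exact = binom.sf(x - 1, n, p0) + binom.cdf(n - x, n, p0)

    print(z, p_approx, p_exact)         # ~3.795, ~0.00015, ~0.00017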

2 Solution to Problem 8.6


(a) and (b) The unconstrained MLEs are θ̂ = X̄ and µ̂ = Ȳ. The constrained MLE (i.e.,
the MLE under the constraint θ = µ) is µ̂₀ = (nX̄ + mȲ)/(n + m), i.e., the average of the
combined sample of X_i's and Y_j's. Therefore the maximized unconstrained likelihood is

L(θ̂, µ̂ | X, Y) = [∏_i θ̂^{−1} exp(−X_i/θ̂)] [∏_j µ̂^{−1} exp(−Y_j/µ̂)]
             = θ̂^{−n} exp(−Σ_i X_i/θ̂) · µ̂^{−m} exp(−Σ_j Y_j/µ̂)
             = θ̂^{−n} µ̂^{−m} exp[−(n + m)].

The maximized constrained likelihood is

L(µ̂₀ | X, Y) = µ̂₀^{−(n+m)} exp[−(nX̄ + mȲ)/µ̂₀]
            = µ̂₀^{−(n+m)} exp[−(n + m)].

Hence, the LRT statistic is

λ(X, Y) = µ̂₀^{−(n+m)} e^{−(n+m)} / (X̄^{−n} Ȳ^{−m} e^{−(n+m)})
        = [n^{−n} (Σ_i X_i)^n m^{−m} (Σ_j Y_j)^m] / [(n + m)^{−(n+m)} (Σ_i X_i + Σ_j Y_j)^{n+m}]
        = [(n + m)^{n+m} / (n^n m^m)] T^n (1 − T)^m,

where T is given in the statement of the problem and

1 − T = Σ_j Y_j / (Σ_i X_i + Σ_j Y_j).

Note that this is a constant times h(T) = T^n (1 − T)^m, which is an unnormalized beta PDF
(of course, 0 < T ≤ 1). It is easy to check that h has a maximum at T = n/(n + m)
and decreases to 0 as one moves away from this maximum in either direction. Thus, we can also express the
acceptance region as c < T < C, where 0 < c < n/(n + m) < C < 1. Note that

E[Σ_i X_i] / E[Σ_i X_i + Σ_j Y_j] = nθ/(nθ + mµ) = n/(n + m)

when θ = µ, so n/(n + m) is the value of T at which we are most inclined to accept H0,
which seems reasonable.
(c) Note that exponential(µ) is the same as gamma(1, µ), so by Exercise 4.24 (Homework 4), Σ_i X_i ∼ gamma(n, θ), Σ_j Y_j ∼ gamma(m, µ), and if θ = µ, then

T = Σ_i X_i / (Σ_i X_i + Σ_j Y_j) ∼ beta(n, m).
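
If one wanted numerical cutoffs, the beta distribution of T under H0 makes them easy to compute. A minimal sketch (the equal-tailed choice is a convenient approximation to the exact LRT cutoffs, which would instead solve h(c) = h(C); the values of n, m, and the level are made up for illustration):

    # Equal-tailed cutoffs for T ~ beta(n, m) under H0: theta = mu.
    from scipy.stats import beta

    n, m, alpha = 10, 15, 0.05
    c = beta.ppf(alpha / 2, n, m)       # lower cutoff
    C = beta.ppf(1 - alpha / 2, n, m)   # upper cutoff
    print(c, C)                         # reject H0 if T < c or T > C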

3 Solution to Problem 8.12
(a) It probably isn’t obvious, but one can check that the IG(α, β) distribution is the
distribution of 1/Y when Y ∼ gamma(α, β). Letting V = 1/Y, the Jacobian is 1/v², so

f_V(v) = f_Y(1/v)/v²
      = [(1/v)^{α−1} / (Γ(α)β^α)] exp[−(1/v)/β] v^{−2}
      = [1/(Γ(α)β^α)] (1/v^{α+1}) e^{−1/(βv)},   v > 0.
When working with this conjugate prior, it is useful to reparameterize in terms of the
precision parameter δ = 1/σ².

Restating the result of Exercise 7.23, p. 359: if we use a gamma(α, β) prior on δ, then the
posterior is gamma(α + (n−1)/2, [(n−1)S²/2 + 1/β]^{−1}). To check this, we have to compute
the posterior. Note that the authors don’t mention a prior on µ, and it is clear that they want
the posterior in terms of S². So, we pretend that S² is our observation and work with
its PDF given σ² = 1/δ, i.e.

f_{S²}(v) = [(n − 1)/σ²] f_{χ²_{n−1}}((n − 1)v/σ²)
         = (n − 1)δ f_{χ²_{n−1}}((n − 1)δv)
         = [(n − 1)δ]^{(n−1)/2} v^{(n−1)/2−1} / (Γ((n−1)/2) 2^{(n−1)/2}) · e^{−(n−1)δv/2},   v > 0.
Using only S² here is justified in some sense because the distribution of S² does not depend on µ,
and S² is complete and sufficient for σ² alone. However, to treat the problem fully correctly,
we would specify a joint prior for µ and σ² and work with that.
Anyway, this criticism aside, the posterior is

π(δ | S² = v) = m(v)^{−1} f_{S²}(v | δ) π(δ)
            = m(v)^{−1} · [(n − 1)δ]^{(n−1)/2} v^{(n−1)/2−1} / (Γ((n−1)/2) 2^{(n−1)/2}) · e^{−(n−1)δv/2} · δ^{α−1} / (Γ(α)β^α) · e^{−δ/β}
            = m(v)^{−1} · (n − 1)^{(n−1)/2} v^{(n−1)/2−1} / (Γ((n−1)/2) Γ(α) 2^{(n−1)/2} β^α) · δ^{α+(n−1)/2−1} exp[−δ((n − 1)v/2 + 1/β)],   δ > 0,

and one recognizes from the last two factors in the last expression that the posterior for δ
is the gamma distribution given above.
Back to the problem we are supposed to work on, namely to devise a Bayesian test of
H0 : σ² ≤ 1, or equivalently H0 : δ ≥ 1. This test rejects if the posterior probability
of H0 is too small, i.e. if

P[δ ≥ 1 | δ ∼ gamma(α + (n−1)/2, [(n−1)S²/2 + 1/β]^{−1})]

is too small. The posterior probability of H0 is sometimes known as the “Bayesian p-value.”
Other than the similarity in how the rejection region is defined, it is entirely dissimilar
to the frequentist p-value.
Note that

Z ∼ gamma(α, 1) ⇒ βZ ∼ gamma(α, β). (1)

Thus for fixed c > 0,

ψ(β) = P[βZ > c] = 1 − F_Z(c/β)

is increasing in β. Thus, for the Bayesian p-value above, larger values of S² mean a smaller
scale parameter, which means a smaller posterior probability for H0. Thus, if
we put a threshold on the Bayesian p-value of (say) 0.05, then there will be some critical
value of S² above which we reject H0.
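
A minimal sketch of this Bayesian test (the prior parameters, n, and S² are illustrative, not from the problem; scipy's gamma uses the same scale parameterization as the text):

    # Posterior probability of H0: delta >= 1 (i.e., sigma^2 <= 1).
    from scipy.stats import gamma

    a, b, n, S2 = 2.0, 1.0, 20, 1.8     # prior shape/scale, sample size, observed S^2

    shape = a + (n - 1) / 2
    scale = 1.0 / ((n - 1) * S2 / 2 + 1 / b)
    bayes_p = gamma.sf(1.0, shape, scale=scale)   # P[delta >= 1 | S^2]
    print(bayes_p)                      # reject H0 if this is below the threshold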
(b) We will derive the LRT starting with i.i.d. N(µ, σ²) observations. The unconstrained
MLEs are

µ̂ = X̄,
σ̂² = (1/n) Σ_i (X_i − X̄)² = ((n − 1)/n) S².

The MLE of µ is obtained by minimizing Σ_i (X_i − µ)², and this is unaffected by whether
we assume H0 is true or not. Plugging this in, the unconstrained “concentrated” likelihood is

L(µ̂, σ² | X) = (2π)^{−n/2} (σ²)^{−n/2} exp[−(1/(2σ²)) Σ_i (X_i − X̄)²].
We say a likelihood is “concentrated” when we can analytically maximize over one of the
parameters and then plug that value back in. In order to find the MLE under the constraint
H0 : σ² ≤ 1, we want to claim that if the unconstrained MLE σ̂² > 1, then the constrained
MLE is σ̂₀² = 1. (Of course, if the unconstrained MLE σ̂² ≤ 1, then the constrained MLE
is σ̂₀² = σ̂².) In order to establish our claim, we want to show that the “concentrated”
log likelihood log L(µ̂, σ² | X) is concave in σ² on a suitable interval. Here, we have maximized
over µ, and the result doesn’t depend on σ², so the constrained and unconstrained MLEs for µ are the
same (namely, X̄).
So, in an effort to establish our claim (of concavity), we compute the first two derivatives
w.r.t. σ² of the concentrated log likelihood, which gives

∂/∂σ² log L(µ̂, σ²) = −n/(2σ²) + (1/(2(σ²)²)) Σ_i (X_i − X̄)²,
∂²/(∂σ²)² log L(µ̂, σ²) = n/(2(σ²)²) − (1/(σ²)³) Σ_i (X_i − X̄)².

Note that the second derivative changes sign from negative to positive at σ² = 2σ̂², so it is
negative for 0 < σ² < 2σ̂², i.e. the log likelihood is concave on this interval. Since the
stationary point σ̂² lies inside this interval, the log likelihood is increasing on (0, σ̂²);
hence if σ̂² > 1, then the constrained MLE is σ̂₀² = 1.
We will assume for now that σ̂² > 1. Then the LRT statistic is

λ(X) = L(µ̂, 1) / L(µ̂, σ̂²)
     = exp[−(1/2) Σ_i (X_i − X̄)²] / ((σ̂²)^{−n/2} exp[−(1/(2σ̂²)) Σ_i (X_i − X̄)²])
     = e^{n/2} (σ̂²)^{n/2} exp[−nσ̂²/2].

It is easy to check that, as a function of σ̂² on (1, ∞), the LRT statistic is decreasing, so we reject
if σ̂² is too big. It is of course more convenient to use the chi-squared statistic (n − 1)S²,
which has a χ²_{n−1} distribution if σ² = 1.
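
A sketch of that frequentist test (n, S², and α are illustrative):

    # Under sigma^2 = 1, (n-1)S^2 ~ chi^2_{n-1}; reject H0: sigma^2 <= 1
    # when (n-1)S^2 is above the upper-alpha quantile.
    from scipy.stats import chi2

    n, S2, alpha = 20, 1.8, 0.05
    stat = (n - 1) * S2
    crit = chi2.ppf(1 - alpha, n - 1)    # chi-squared critical value
    p_value = chi2.sf(stat, n - 1)       # frequentist p-value
    print(stat, crit, p_value, stat > crit)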

The final question, “Is there any choice of prior parameters for which the regions agree?”
seems very silly. For both the Bayesian test and the LRT, we reject if S² is too big. However,
we need to set a threshold for the Bayesian p-value and a level of significance for the
frequentist test (which is the threshold below which the frequentist p-value is deemed significant).
We can clearly choose thresholds for each so that the rejection regions are the same. This is
entirely justified since the two quantities mean completely different things.
Perhaps the authors mean to ask: if we set the same threshold for the Bayesian
and frequentist p-values, can we find prior parameters such that the two tests agree,
even though the p-values mean totally different things? That’s my best guess at what is intended.
This seems difficult at first glance. We can explicitly compute the frequentist rejection
region from a given level of significance (and an n), since it only depends on the level of
significance and the degrees of freedom (determined by n). However, the Bayesian test seems
to require that, given S² and everything else (sample size n, prior parameters α and β), we
compute the posterior probability of H0 and then decide whether to reject or not. Using
our “scaling law” in equation (1), we have that the Bayesian p-value can be written as

Bp-value = P[δ ≥ 1 | δ ∼ gamma(α + (n−1)/2, [(n−1)S²/2 + 1/β]^{−1})]
         = P[Z ≥ (n−1)S² + 2/β | Z ∼ gamma((2α + n − 1)/2, 2)]
         = P[Z ≥ (n−1)S² + 2/β | Z ∼ χ²_{2α+n−1}],

where I suppose we assume 2α is an integer in the last expression. Based on the distribution
theory for S², the frequentist p-value is

Fp-value = P[Z ≥ (n−1)S² | Z ∼ χ²_{n−1} = gamma((n−1)/2, 2)].

Note that for the Bayesian p-value, we have a larger threshold but a larger degree of freedom (or
shape) parameter. We can feel assured that we can choose the shape parameter increment
2α to offset the larger threshold increment 2/β, since in the limit α → 0 and β → ∞ the
two would agree. I suppose one must also check that, as the opposite limits are approached, we
cover the whole range of possible p-values (i.e., (0, 1)).

Now, we have only checked that for a given frequentist p-value we can find (α, β) such
that the Bayesian p-value matches. (This makes (α, β) dependent on the data, which no
proper Bayesian should do, though a lot of applied Bayesians do. We won’t judge,
lest we be judged, but Bayes’ formula is based on the assumption that the prior is chosen
independently of the data.) We could instead do the above calculation with S² replaced by the
critical value for a particular level of significance, and find the prior parameters in the same
way.
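
A rough numerical sketch of that suggestion (fix the prior shape a, then solve for the prior scale b that makes the Bayesian p-value at the frequentist critical value hit the same threshold; all parameter values are illustrative):

    # Match the Bayesian p-value to the frequentist threshold at the
    # frequentist critical value, using the chi-squared forms above.
    from scipy.optimize import brentq
    from scipy.stats import chi2

    n, a, level = 20, 1.0, 0.05
    crit = chi2.ppf(1 - level, n - 1)    # frequentist cutoff for (n-1)S^2

    def bayes_p(b):
        # P[Z >= (n-1)S^2 + 2/b], Z ~ chi^2_{2a+n-1}, at (n-1)S^2 = crit
        return chi2.sf(crit + 2 / b, 2 * a + n - 1)

    b = brentq(lambda b: bayes_p(b) - level, 1e-6, 1e6)
    print(b, bayes_p(b))                 # bayes_p(b) should equal `level`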
I hope none of y’all worked as hard on this problem as I did. Since I assigned it, I
guess it was my albatross to bear. I won’t assign this problem in the future without further
instructions. I knew from past experience that this chapter had a lot of issues, particularly
with the exercises.

4 Solution to Problem 8.15


This seems to be a nice, straightforward problem. We just apply the Neyman-Pearson
Lemma with a few calculations, and we are done, unlike the morass of the previous
problem.
According to Neyman-Pearson (NP), we compute the statistic

Y = f(X | θ1) / f(X | θ0),

where the null hypothesis is H0 : θ = θ0 and the alternative is H1 : θ = θ1. We don’t allow
any other values of θ than θ0 and θ1 for NP. For the specific problem at hand,
" ! #
 −n/2 1 1 1 X 2
Y = σ12 /σ02 exp − − 2 X
2 2
σ1 σ0 i i

We were given that σ0 < σ1, so the quantity multiplying Σ_i X_i² in the exponent is > 0. We
reject for large values of Y, which, by strict monotonicity of the exponential, is equivalent
to rejecting for large values of

T = Σ_i X_i².

According to NP as presented by Dr. Cox, the test function is

φ(X) = 1       if f(X | σ1²) > k f(X | σ0²)
       γ(X)    if f(X | σ1²) = k f(X | σ0²)
       0       if f(X | σ1²) < k f(X | σ0²).

However, Y is a smooth monotone increasing transformation of the continuous RV T, so
P[Y = k | σ] = 0, and we don’t need to concern ourselves with randomization.
Thus, the NP test can be expressed as

φ(X) = 1    if Σ_i X_i² > c
       0    if Σ_i X_i² ≤ c,

where c is determined by

P[Σ_i X_i² > c | σ = σ0] = α,

where α is the given level of significance.
We know that Σ_i X_i²/σ0² ∼ χ²_n, so we take c = σ0² χ²_{n,α}, where χ²_{n,α} denotes the 1 − α
quantile of the χ²_n distribution.
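
A sketch of the whole test on simulated data (σ0, n, and α are illustrative):

    # NP test for H0: sigma = sigma0 vs H1: sigma = sigma1 (> sigma0):
    # reject when sum(X_i^2) > sigma0^2 * chi^2_{n, alpha}.
    import numpy as np
    from scipy.stats import chi2

    n, sigma0, alpha = 25, 1.0, 0.05
    c = sigma0 ** 2 * chi2.ppf(1 - alpha, n)

    rng = np.random.default_rng(0)
    x = rng.normal(0.0, sigma0, size=n)  # data drawn under H0 here
    T = np.sum(x ** 2)
    print(T, c, T > c)                   # rejects with probability alpha under H0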

5 Solution to Problem 8.20


x           1     2     3     4     5     6     7
f(x|H1)    .06   .05   .04   .03   .02   .01   .79
f(x|H0)    .01   .01   .01   .01   .01   .01   .94
Y            6     5     4     3     2     1   79/94
We have reproduced the table from the book, but reversed the rows for the two PMFs,
and added a row for Y, the ratio of the likelihoods f(x|H1)/f(x|H0). Since the ratio of
the likelihoods is decreasing in x, we add the smaller values of x to the rejection region
first. If we put {1, 2, 3, 4} into the rejection region, then we achieve a level of significance
of 0.04. Thus, the Most Powerful level 0.04 test is

Reject H0 if X≤4
Accept H0 if X > 4.

The power of this test is

P[X ≤ 4 | H1] = .06 + .05 + .04 + .03 = 0.18,

so the type II error probability is 1 − 0.18 = 0.82.
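
The construction is mechanical enough to script; here is a sketch that rebuilds the rejection region greedily from the table (pure Python, no extra libraries):

    # Build the MP level-0.04 region by adding points in decreasing
    # order of the likelihood ratio f1/f0.
    f1 = {1: .06, 2: .05, 3: .04, 4: .03, 5: .02, 6: .01, 7: .79}
    f0 = {1: .01, 2: .01, 3: .01, 4: .01, 5: .01, 6: .01, 7: .94}

    order = sorted(f1, key=lambda x: f1[x] / f0[x], reverse=True)
    region, size = [], 0.0
    for x in order:
        if size + f0[x] <= 0.04 + 1e-12:     # stay within level 0.04
            region.append(x)
            size += f0[x]

    power = sum(f1[x] for x in region)
    print(sorted(region), size, power, 1 - power)   # [1,2,3,4], 0.04, 0.18, 0.82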

6 Solution to Problem 8.31


(a) Note that the sufficient statistic is

T = Σ_i X_i.

Also, T ∼ Poisson(nλ). If λ1 < λ2, the ratio of the likelihoods is

f(t | λ2)/f(t | λ1) = [(nλ2)^t e^{−nλ2}] / [(nλ1)^t e^{−nλ1}] = C(λ1, λ2) (λ2/λ1)^t,

where C(λ1, λ2) doesn’t depend on t. Thus, we see that the family has the MLR property in
T. It follows from the Karlin-Rubin theorem that a UMP test rejects when T > c, where
c is determined so that P[T > c | λ = λ0] = α, if we can achieve the level α without
randomization. If we don’t want to randomize, then we choose the smallest value of c such
that P[T > c | λ = λ0] ≤ α.
(b) Of course, these calculations are only approximate. If λ = 1, then T ∼ Poisson(n),
which for large n is approximately N(n, n). In terms of an approximate N(0, 1) test statistic,
we would use

Z = (T − n)/√n.

Since the .95 quantile of N(0, 1) is z.05 = 1.645, we want to solve for c in

(c − n)/√n = 1.645 ⟹ c = n + 1.645√n.

Now if λ = 2, then T is approximately N(2n, 2n). The .9 quantile of N(0, 1) is 1.281,
so the .1 quantile is −1.281, and so we want

.9 = P[T > c | λ = 2] ≈ P[Z > (c − 2n)/√(2n) | Z ∼ N(0, 1)],

which means

(c − 2n)/√(2n) = −1.281 ⟹ c = 2n − 1.281√(2n) = 2n − 1.812√n.

Equating the two expressions gives

n + 1.645√n = 2n − 1.812√n ⟹ 3.457√n = n ⟹ n = (3.457)² = 11.95.

Of course, we would use n = 12. Checking the actual values achieved when n = 12: we
reject H0 : λ = 1 if T > 18, which has size 0.037, and the exact power at λ = 2 is then
about 0.87. So the test is conservative on the type I error (0.037 < 0.05) but falls slightly
short of the requested power of 0.90, the price of using the normal approximation at such
a small n.
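
These exact tail probabilities are one-liners with scipy, in case anyone wants to double-check the n = 12 design:

    # Exact size and power of the test that rejects when T > 18.
    from scipy.stats import poisson

    n, c = 12, 18
    size = poisson.sf(c, n * 1)    # P[T > 18 | lambda = 1], ~0.037
    power = poisson.sf(c, n * 2)   # P[T > 18 | lambda = 2], ~0.87
    print(size, power)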

7 Solution to Problem 8.37


(a) To show the test is size α:

sup_{θ≤θ0} P[X̄ > θ0 + zα √(σ²/n) | θ]
    = sup_{θ≤θ0} P[X̄ − θ > θ0 − θ + zα √(σ²/n) | θ]
    = sup_{θ≤θ0} P[(X̄ − θ)/√(σ²/n) > (θ0 − θ)/√(σ²/n) + zα | θ]
    = sup_{θ≤θ0} P[Z > (θ0 − θ)/√(σ²/n) + zα | Z ∼ N(0, 1)]
    = sup_{θ≤θ0} [1 − Φ((θ0 − θ)/√(σ²/n) + zα)],

where Φ is the N(0, 1) CDF. We see that the last quantity inside the square brackets is
increasing in θ, so the supremum is achieved at θ = θ0. Hence, the size of the test is

P[X̄ > θ0 + zα √(σ²/n) | θ = θ0] = 1 − Φ(zα) = α.
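
The whole power function is easy to evaluate from the last display; a sketch with illustrative θ0, σ, and n:

    # Rejection probability 1 - Phi((theta0 - theta)/sqrt(sigma^2/n) + z_alpha).
    import numpy as np
    from scipy.stats import norm

    theta0, sigma, n, alpha = 0.0, 1.0, 25, 0.05
    z_alpha = norm.ppf(1 - alpha)

    def reject_prob(theta):
        return 1 - norm.cdf((theta0 - theta) / (sigma / np.sqrt(n)) + z_alpha)

    # Increasing in theta, so the sup over theta <= theta0 is at theta0:
    print([round(reject_prob(t), 4) for t in (-0.5, -0.2, 0.0)])   # last value = alpha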

To show that it is the LRT, we note that the unconstrained MLE of θ is X̄. We want
to show that if X̄ > θ0 , then the constrained MLE is θ̂0 = θ0 . The log likelihood function

is

log L(θ | X) = −(n/2) log(2πσ²) − (1/(2σ²)) Σ_i (X_i − θ)².
The 1st and 2nd derivatives w.r.t. θ are

∂/∂θ log L(θ | X) = (1/σ²) Σ_i (X_i − θ),
∂²/∂θ² log L(θ | X) = −n/σ².
Since the 2nd derivative is negative, the function is concave, so it is increasing for θ < X̄
and decreasing for θ > X̄; hence when θ0 < X̄, the MLE assuming H0 : θ ≤ θ0 is θ̂0 = θ0.
So assuming θ0 < X̄, the LRT statistic is

λ(X) = (2πσ²)^{−n/2} exp[−(1/(2σ²)) Σ_i (X_i − θ0)²] / ((2πσ²)^{−n/2} exp[−(1/(2σ²)) Σ_i (X_i − X̄)²])
     = exp[−(n/(2σ²)) (X̄ − θ0)²].

The last expression is obtained by adding and subtracting X̄ inside the squared summands
of the numerator’s exponent. Of course, λ(X) = 1 if X̄ ≤ θ0.
So, λ(X) is a nonincreasing function of X̄, and for X̄ > θ0 it is strictly decreasing. Hence
rejecting for small values of λ(X) is the same as rejecting for large values of X̄. So, as long
as α < .5, the critical value will be > θ0, and it is as given in the problem.
(b) The sufficient statistic is X̄, and its PDF is

f(x̄ | θ) = c(θ) e^{ηx̄} h(x̄),

where

c(θ) = (2πσ²/n)^{−1/2} exp[−nθ²/(2σ²)],
η = nθ/σ²,
h(x̄) = exp[−nx̄²/(2σ²)].

Hence, if θ1 < θ2,

f(x̄ | θ2)/f(x̄ | θ1) = C(θ1, θ2) exp[n(θ2 − θ1) x̄/σ²],

which is strictly increasing in x̄. Thus, the MLR property holds, the UMP test rejects
for large values of X̄, and since the given test has power function equal to α at θ = θ0, it is
the UMP test by Karlin-Rubin.
(c) Showing that the test is size α seems to be rather tricky. If θ = θ0, then
(X̄ − θ0)/(S/√n) ∼ t_{n−1}, so

P[X̄ > θ0 + t_{n−1,α} S/√n | θ = θ0] = P[(X̄ − θ0)/(S/√n) > t_{n−1,α} | θ = θ0] = α.

Intuitively, it seems clear that if θ < θ0, the rejection probability only gets smaller. If we try the
same calculation for some θ < θ0, but do the correct centering, we obtain

P[X̄ > θ0 + t_{n−1,α} S/√n | θ] = P[(X̄ − θ)/(S/√n) > (θ0 − θ)/(S/√n) + t_{n−1,α} | θ]
                               < P[(X̄ − θ)/(S/√n) > t_{n−1,α} | θ]
                               = α.

Note that the second line follows since we removed a positive quantity from the right-hand
side of the > sign inside the event (we are assuming θ < θ0), and the third line follows
from the usual facts as above.
To show it is the LRT (at least if α is small enough), we return to a fairly well-worn
calculation. The unconstrained MLEs are obvious enough:

θ̂ = X̄,
σ̂² = (1/n) Σ_i (X_i − X̄)² = ((n − 1)/n) S².
The constrained MLE (under H0) will be the same as the above if X̄ ≤ θ0. If X̄ > θ0, we
already showed in part (a) that maximizing over θ for fixed σ² results in θ̂0 = θ0. If we
further maximize over σ², the result is clear. Hence, for the case X̄ > θ0, the constrained
MLEs are

θ̂0 = θ0,
σ̂₀² = (1/n) Σ_i (X_i − θ0)².

Therefore, still assuming X̄ > θ0, the LRT statistic is

λ(X) = sup_{θ≤θ0, σ²} L(θ, σ² | X) / sup_{θ, σ²} L(θ, σ² | X)
     = (σ̂₀²)^{−n/2} exp[−(2σ̂₀²)^{−1} Σ_i (X_i − θ0)²] / ((σ̂²)^{−n/2} exp[−(2σ̂²)^{−1} Σ_i (X_i − X̄)²])
     = (σ̂₀²)^{−n/2} exp[−(2σ̂₀²)^{−1} nσ̂₀²] / ((σ̂²)^{−n/2} exp[−(2σ̂²)^{−1} nσ̂²])
     = (σ̂₀²)^{−n/2} exp[−n/2] / ((σ̂²)^{−n/2} exp[−n/2])
     = (σ̂₀²/σ̂²)^{−n/2}
     = (Σ_i (X_i − θ0)² / Σ_i (X_i − X̄)²)^{−n/2}
     = ([n(X̄ − θ0)² + Σ_i (X_i − X̄)²] / Σ_i (X_i − X̄)²)^{−n/2}
     = (T²/(n − 1) + 1)^{−n/2},

where

T = (X̄ − θ0)/(S/√n).
Clearly, rejecting for small values of λ is equivalent to rejecting for large values of T when
X̄ > θ0, which is equivalent to the test that rejects when

X̄ > θ0 + t_{n−1,α} S/√n.

This all supposes X̄ > θ0, and also that α < .5 so that t_{n−1,α} > 0. For both the LRT and
the given test, if α < .5, they both accept when X̄ ≤ θ0.
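
For completeness, a sketch of the equivalent one-sided t-test on simulated data (all numbers illustrative):

    # One-sided t-test for H0: theta <= theta0; reject if T > t_{n-1, alpha}.
    import numpy as np
    from scipy.stats import t

    theta0, n, alpha = 0.0, 15, 0.05
    rng = np.random.default_rng(1)
    x = rng.normal(0.3, 1.0, size=n)     # sample drawn under the alternative here

    T = (x.mean() - theta0) / (x.std(ddof=1) / np.sqrt(n))
    crit = t.ppf(1 - alpha, n - 1)
    print(T, crit, T > crit)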

8 Solution to Problem 8.41 parts (a) and (b)
(a) Same old LRT calculation. The unconstrained MLEs of µX and µY are easily seen to
be

µ̂X = X̄

µ̂Y = Ȳ .

Plugging these into the log likelihood log L(µX, µY, σ² | X, Y), we obtain the concentrated
log likelihood (up to an additive constant)

log L(µ̂X, µ̂Y, σ² | X, Y) = −((n + m)/2) log σ² − (1/(2σ²)) Σ_i (X_i − X̄)² − (1/(2σ²)) Σ_j (Y_j − Ȳ)².
Taking the derivative with respect to σ² and setting it equal to zero, we get

σ̂² = (1/(n + m)) [Σ_i (X_i − X̄)² + Σ_j (Y_j − Ȳ)²]
   = (1/(n + m)) [(n − 1)S²_X + (m − 1)S²_Y].
The constrained MLE, computed by assuming H0 : µX = µY is true, is easy to see, since
then the X_i's and Y_j's are i.i.d. from the same N(µ, σ²) distribution, so the result is

µ̂₀ = (nX̄ + mȲ)/(n + m),
σ̂₀² = (1/(n + m)) [Σ_i (X_i − µ̂₀)² + Σ_j (Y_j − µ̂₀)²]
    = (1/(n + m)) [Σ_i (X_i − X̄ + X̄ − µ̂₀)² + Σ_j (Y_j − Ȳ + Ȳ − µ̂₀)²]
    = (1/(n + m)) [(n − 1)S²_X + n(X̄ − µ̂₀)² + (m − 1)S²_Y + m(Ȳ − µ̂₀)²].
Following the same sort of calculations as in the previous exercise, one arrives at

λ(X, Y) = (σ̂₀²/σ̂²)^{−(n+m)/2}
        = (1 + [n(X̄ − µ̂₀)² + m(Ȳ − µ̂₀)²] / [(n − 1)S²_X + (m − 1)S²_Y])^{−(n+m)/2}.

Now

X̄ − µ̂₀ = X̄ − (nX̄ + mȲ)/(n + m)
        = [(n + m)X̄ − nX̄ − mȲ]/(n + m)
        = (m/(n + m)) (X̄ − Ȳ),

with the analogous expression for Ȳ − µ̂₀:

Ȳ − µ̂₀ = (n/(n + m)) (Ȳ − X̄).

Hence,

n(X̄ − µ̂₀)² + m(Ȳ − µ̂₀)² = (nm/(n + m)) (X̄ − Ȳ)².

Plugging this back into the final expression for the LRT statistic λ, we obtain

λ(X, Y) = (1 + (nm/(n + m)) (X̄ − Ȳ)² / [(n + m − 2) S²_P])^{−(n+m)/2}
        = (1 + (1/(n + m − 2)) (X̄ − Ȳ)² / [S²_P (1/n + 1/m)])^{−(n+m)/2}
        = (1 + T²/(n + m − 2))^{−(n+m)/2},

using (n − 1)S²_X + (m − 1)S²_Y = (n + m − 2)S²_P and 1/n + 1/m = (n + m)/(nm),

where T is given in the statement of the problem. Now this is monotone decreasing in T 2 ,
so rejecting for small values of λ (X, Y ) is equivalent to rejecting for large values of T 2 , or
equivalently if T < −c or T > c for some c > 0.
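
Before moving to part (b), here is a sketch of the resulting two-sided pooled t-test (sample sizes and data are illustrative):

    # Two-sample pooled t-test equivalent to the LRT above.
    import numpy as np
    from scipy.stats import t

    rng = np.random.default_rng(2)
    x = rng.normal(0.0, 1.0, size=12)
    y = rng.normal(0.5, 1.0, size=15)
    n, m = len(x), len(y)

    sp2 = ((n - 1) * x.var(ddof=1) + (m - 1) * y.var(ddof=1)) / (n + m - 2)
    T = (x.mean() - y.mean()) / np.sqrt(sp2 * (1 / n + 1 / m))
    crit = t.ppf(1 - 0.05 / 2, n + m - 2)   # two-sided, alpha = 0.05
    print(T, crit, abs(T) > crit)           # reject H0: mu_X = mu_Y if |T| > crit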
(b) We know that X̄ is independent of S²_X, and similarly Ȳ is independent of S²_Y. Also, the
X_i's are independent of the Y_j's. Hence, X̄ − Ȳ is independent of S²_P. Also, (n − 1)S²_X/σ² ∼
χ²_{n−1} is independent of (m − 1)S²_Y/σ² ∼ χ²_{m−1}, so (n + m − 2)S²_P/σ² = (n − 1)S²_X/σ² +
(m − 1)S²_Y/σ² ∼ χ²_{n+m−2} by the additivity properties of independent χ² RVs. Also, X̄ − Ȳ
∼ N(µX − µY, σ²(n^{−1} + m^{−1})), which is N(0, σ²(n^{−1} + m^{−1})) under H0. Thus, we have

T = (X̄ − Ȳ) / √(S²_P (1/n + 1/m))
  = [(X̄ − Ȳ) / √(σ² (1/n + 1/m))] / √(S²_P/σ²)
  = N(0, 1) / √(χ²_{n+m−2}/(n + m − 2)),

where the numerator and denominator are independent. We recognize the last expression
as a t_{n+m−2} RV (or distribution).
