# Solutions Manual for

Statistical Inference, Second Edition
George Casella
University of Florida
Roger L. Berger
North Carolina State University
Damaris Santana
University of Florida
0-2 Solutions Manual for Statistical Inference
“When I hear you give your reasons,” I remarked, “the thing always appears to me to be so
ridiculously simple that I could easily do it myself, though at each successive instance of your
reasoning I am baﬄed until you explain your process.”
Dr. Watson to Sherlock Holmes
A Scandal in Bohemia
0.1 Description
This solutions manual contains solutions for all odd numbered problems plus a large number of
solutions for even numbered problems. Of the 624 exercises in Statistical Inference, Second Edition,
this manual gives solutions for 484 (78%) of them. There is an obtuse pattern as to which solutions
were included in this manual. We assembled all of the solutions that we had from the ﬁrst edition,
and ﬁlled in so that all odd-numbered problems were done. In the passage from the ﬁrst to the
second edition, problems were shuﬄed with no attention paid to numbering (hence no attention
paid to minimize the new eﬀort), but rather we tried to put the problems in logical order.
A major change from the ﬁrst edition is the use of the computer, both symbolically through
Mathematica
tm
and numerically using R. Some solutions are given as code in either of these lan-
guages. Mathematica
tm
can be purchased from Wolfram Research, and R is a free download from
http://www.r-project.org/.
Here is a detailed listing of the solutions included.
Chapter Number of Exercises Number of Solutions Missing
1 55 51 26, 30, 36, 42
2 40 37 34, 38, 40
3 50 42 4, 6, 10, 20, 30, 32, 34, 36
4 65 52 8, 14, 22, 28, 36, 40
48, 50, 52, 56, 58, 60, 62
5 69 46 2, 4, 12, 14, 26, 28
all even problems from 36 −68
6 43 35 8, 16, 26, 28, 34, 36, 38, 42
7 66 52 4, 14, 16, 28, 30, 32, 34,
36, 42, 54, 58, 60, 62, 64
8 58 51 36, 40, 46, 48, 52, 56, 58
9 58 41 2, 8, 10, 20, 22, 24, 26, 28, 30
32, 38, 40, 42, 44, 50, 54, 56
10 48 26 all even problems except 4 and 32
11 41 35 4, 20, 22, 24, 26, 40
12 31 16 all even problems
0.2 Acknowledgement
Many people contributed to the assembly of this solutions manual. We again thank all of those
who contributed solutions to the ﬁrst edition – many problems have carried over into the second
edition. Moreover, throughout the years a number of people have been in constant touch with us,
contributing to both the presentations and solutions. We apologize in advance for those we forget to
mention, and we especially thank Jay Beder, Yong Sung Joo, Michael Perlman, Rob Strawderman,
and Tom Wehrly. Thank you all for your help.
And, as we said the ﬁrst time around, although we have beneﬁted greatly from the assistance and
ACKNOWLEDGEMENT 0-3
comments of others in the assembly of this manual, we are responsible for its ultimate correctness.
To this end, we have tried our best but, as a wise man once said, “You pays your money and you
George Casella
Roger L. Berger
Damaris Santana
December, 2001
Chapter 1
Probability Theory
“If any little problem comes your way, I shall be happy, if I can, to give you a hint or two as
to its solution.”
Sherlock Holmes
The Adventure of the Three Students
1.1 a. Each sample point describes the result of the toss (H or T) for each of the four tosses. So,
for example THTT denotes T on 1st, H on 2nd, T on 3rd and T on 4th. There are 2
4
= 16
such sample points.
b. The number of damaged leaves is a nonnegative integer. So we might use S = {0, 1, 2, . . .}.
c. We might observe fractions of an hour. So we might use S = {t : t ≥ 0}, that is, the half
inﬁnite interval [0, ∞).
d. Suppose we weigh the rats in ounces. The weight must be greater than zero so we might use
S = (0, ∞). If we know no 10-day-old rat weighs more than 100 oz., we could use S = (0, 100].
e. If n is the number of items in the shipment, then S = {0/n, 1/n, . . . , 1}.
1.2 For each of these equalities, you must show containment in both directions.
a. x ∈ A\B ⇔ x ∈ A and x / ∈ B ⇔ x ∈ A and x / ∈ A ∩ B ⇔ x ∈ A$$A ∩ B). Also, x ∈ A and x / ∈ B ⇔x ∈ A and x ∈ B c ⇔x ∈ A∩ B c . b. Suppose x ∈ B. Then either x ∈ A or x ∈ A c . If x ∈ A, then x ∈ B ∩ A, and, hence x ∈ (B∩A) ∪(B∩A c ). Thus B ⊂ (B∩A) ∪(B∩A c ). Now suppose x ∈ (B∩A) ∪(B∩A c ). Then either x ∈ (B ∩ A) or x ∈ (B ∩ A c ). If x ∈ (B ∩ A), then x ∈ B. If x ∈ (B ∩ A c ), then x ∈ B. Thus (B ∩ A) ∪ (B ∩ A c ) ⊂ B. Since the containment goes both ways, we have B = (B ∩ A) ∪ (B ∩ A c ). (Note, a more straightforward argument for this part simply uses the Distributive Law to state that (B ∩ A) ∪ (B ∩ A c ) = B ∩ (A∪ A c ) = B ∩ S = B.) c. Similar to part a). d. From part b). A∪ B = A ∪ [(B ∩ A) ∪ (B ∩ A c )] = A ∪ (B ∩ A) ∪ A ∪ (B ∩ A c ) = A ∪ [A∪ (B ∩ A c )] = A∪ (B ∩ A c ). 1.3 a. x ∈ A∪ B ⇔ x ∈ A or x ∈ B ⇔ x ∈ B ∪ A x ∈ A∩ B ⇔ x ∈ A and x ∈ B ⇔ x ∈ B ∩ A. b. x ∈ A∪ (B ∪ C) ⇔ x ∈ A or x ∈ B ∪ C ⇔ x ∈ A∪ B or x ∈ C ⇔ x ∈ (A∪ B) ∪ C. (It can similarly be shown that A∪ (B ∪ C) = (A∪ C) ∪ B.) x ∈ A∩ (B ∩ C) ⇔ x ∈ A and x ∈ B and x ∈ C ⇔ x ∈ (A∩ B) ∩ C. c. x ∈ (A∪ B) c ⇔ x / ∈ A or x / ∈ B ⇔ x ∈ A c and x ∈ B c ⇔ x ∈ A c ∩ B c x ∈ (A∩ B) c ⇔ x / ∈ A∩ B ⇔ x / ∈ A and x / ∈ B ⇔ x ∈ A c or x ∈ B c ⇔ x ∈ A c ∪ B c . 1.4 a. “A or B or both” is A∪B. From Theorem 1.2.9b we have P(A∪B) = P(A)+P(B)−P(A∩B). 1-2 Solutions Manual for Statistical Inference b. “A or B but not both” is (A∩ B c ) ∪ (B ∩ A c ). Thus we have P((A∩ B c ) ∪ (B ∩ A c )) = P(A∩ B c ) +P(B ∩ A c ) (disjoint union) = [P(A) −P(A∩ B)] + [P(B) −P(A∩ B)] (Theorem1.2.9a) = P(A) +P(B) −2P(A∩ B). c. “At least one of A or B” is A∪ B. So we get the same answer as in a). d. “At most one of A or B” is (A∩ B) c , and P((A∩ B) c ) = 1 −P(A∩ B). 1.5 a. A∩ B ∩ C = {a U.S. birth results in identical twins that are female} b. P(A∩ B ∩ C) = 1 90 × 1 3 × 1 2 1.6 p 0 = (1 −u)(1 −w), p 1 = u(1 −w) +w(1 −u), p 2 = uw, p 0 = p 2 ⇒ u +w = 1 p 1 = p 2 ⇒ uw = 1/3. These two equations imply u(1 − u) = 1/3, which has no solution in the real numbers. Thus, the probability assignment is not legitimate. 1.7 a. P(scoring i points) = _ 1 − πr 2 A if i = 0 πr 2 A _ (6−i) 2 −(5−i) 2 5 2 _ if i = 1, . . . , 5. b. P(scoring i points|board is hit) = P(scoring i points ∩ board is hit) P(board is hit) P(board is hit) = πr 2 A P(scoring i points ∩ board is hit) = πr 2 A _ (6 −i) 2 −(5 −i) 2 5 2 _ i = 1, . . . , 5. Therefore, P(scoring i points|board is hit) = (6 −i) 2 −(5 −i) 2 5 2 i = 1, . . . , 5 which is exactly the probability distribution of Example 1.2.7. 1.8 a. P(scoring exactly i points) = P(inside circle i) − P(inside circle i + 1). Circle i has radius (6 −i)r/5, so P(sscoring exactly i points) = π(6 −i) 2 r 2 5 2 πr 2 π ((6−(i + 1))) 2 r 2 5 2 πr 2 = (6 −i) 2 −(5 −i) 2 5 2 . b. Expanding the squares in part a) we ﬁnd P(scoring exactly i points) = 11−2i 25 , which is decreasing in i. c. Let P(i) = 11−2i 25 . Since i ≤ 5, P(i) ≥ 0 for all i. P(S) = P(hitting the dartboard) = 1 by deﬁnition. Lastly, P(i ∪ j) = area of i ring + area of j ring = P(i) +P(j). 1.9 a. Suppose x ∈ (∪ α A α ) c , by the deﬁnition of complement x ∈ ∪ α A α , that is x ∈ A α for all α ∈ Γ. Therefore x ∈ A c α for all α ∈ Γ. Thus x ∈ ∩ α A c α and, by the deﬁnition of intersection x ∈ A c α for all α ∈ Γ. By the deﬁnition of complement x ∈ A α for all α ∈ Γ. Therefore x ∈ ∪ α A α . Thus x ∈ (∪ α A α ) c . Second Edition 1-3 b. Suppose x ∈ (∩ α A α ) c , by the deﬁnition of complement x ∈ (∩ α A α ). Therefore x ∈ A α for some α ∈ Γ. Therefore x ∈ A c α for some α ∈ Γ. Thus x ∈ ∪ α A c α and, by the deﬁnition of union, x ∈ A c α for some α ∈ Γ. Therefore x ∈ A α for some α ∈ Γ. Therefore x ∈ ∩ α A α . Thus x ∈ (∩ α A α ) c . 1.10 For A 1 , . . . , A n (i) _ n _ i=1 A i _ c = n i=1 A c i (ii) _ n i=1 A i _ c = n _ i=1 A c i Proof of (i): If x ∈ (∪A i ) c , then x / ∈ ∪A i . That implies x / ∈ A i for any i, so x ∈ A c i for every i and x ∈ ∩A i . Proof of (ii): If x ∈ (∩A i ) c , then x / ∈ ∩A i . That implies x ∈ A c i for some i, so x ∈ ∪A c i . 1.11 We must verify each of the three properties in Deﬁnition 1.2.1. a. (1) The empty set ∅ ∈ {∅, S}. Thus ∅ ∈ B. (2) ∅ c = S ∈ B and S c = ∅ ∈ B. (3) ∅∪S = S ∈ B. b. (1) The empty set ∅ is a subset of any set, in particular, ∅ ⊂ S. Thus ∅ ∈ B. (2) If A ∈ B, then A ⊂ S. By the deﬁnition of complementation, A c is also a subset of S, and, hence, A c ∈ B. (3) If A 1 , A 2 , . . . ∈ B, then, for each i, A i ⊂ S. By the deﬁnition of union, ∪A i ⊂ S. Hence, ∪A i ∈ B. c. Let B 1 and B 2 be the two sigma algebras. (1) ∅ ∈ B 1 and ∅ ∈ B 2 since B 1 and B 2 are sigma algebras. Thus ∅ ∈ B 1 ∩ B 2 . (2) If A ∈ B 1 ∩ B 2 , then A ∈ B 1 and A ∈ B 2 . Since B 1 and B 2 are both sigma algebra A c ∈ B 1 and A c ∈ B 2 . Therefore A c ∈ B 1 ∩ B 2 . (3) If A 1 , A 2 , . . . ∈ B 1 ∩B 2 , then A 1 , A 2 , . . . ∈ B 1 and A 1 , A 2 , . . . ∈ B 2 . Therefore, since B 1 and B 2 are both sigma algebra, ∪ i=1 A i ∈ B 1 and ∪ i=1 A i ∈ B 2 . Thus ∪ i=1 A i ∈ B 1 ∩ B 2 . 1.12 First write P _ _ i=1 A i _ = P _ n _ i=1 A i _ i=n+1 A i _ = P _ n _ i=1 A i _ +P _ _ i=n+1 A i _ (A i s are disjoint) = n i=1 P(A i ) +P _ _ i=n+1 A i _ (ﬁnite additivity) Now deﬁne B k = i=k A i . Note that B k+1 ⊂ B k and B k → φ as k → ∞. (Otherwise the sum of the probabilities would be inﬁnite.) Thus P _ _ i=1 A i _ = lim n→∞ P _ _ i=1 A i _ = lim n→∞ _ n i=1 P(A i ) +P(B n+1 ) _ = i=1 P(A i ). 1.13 If A and B are disjoint, P(A ∪ B) = P(A) + P(B) = 1 3 + 3 4 = 13 12 , which is impossible. More generally, if A and B are disjoint, then A ⊂ B c and P(A) ≤ P(B c ). But here P(A) > P(B c ), so A and B cannot be disjoint. 1.14 If S = {s 1 , . . . , s n }, then any subset of S can be constructed by either including or excluding s i , for each i. Thus there are 2 n possible choices. 1.15 Proof by induction. The proof for k = 2 is given after Theorem 1.2.14. Assume true for k, that is, the entire job can be done in n 1 × n 2 × · · · × n k ways. For k + 1, the k + 1th task can be done in n k+1 ways, and for each one of these ways we can complete the job by performing 1-4 Solutions Manual for Statistical Inference the remaining k tasks. Thus for each of the n k+1 we have n 1 × n 2 × · · · × n k ways of com- pleting the job by the induction hypothesis. Thus, the number of ways we can do the job is (1 ×(n 1 ×n 2 ×· · · ×n k )) +· · · + (1 ×(n 1 ×n 2 ×· · · ×n k )) . ¸¸ . n k+1 terms = n 1 ×n 2 ×· · · ×n k ×n k+1 . 1.16 a) 26 3 . b) 26 3 + 26 2 . c) 26 4 + 26 3 + 26 2 . 1.17 There are _ n 2 _ = n(n −1)/2 pieces on which the two numbers do not match. (Choose 2 out of n numbers without replacement.) There are n pieces on which the two numbers match. So the total number of diﬀerent pieces is n +n(n −1)/2 = n(n + 1)/2. 1.18 The probability is ( n 2 )n! n n = (n−1)(n−1)! 2n n−2 . There are many ways to obtain this. Here is one. The denominator is n n because this is the number of ways to place n balls in n cells. The numerator is the number of ways of placing the balls such that exactly one cell is empty. There are n ways to specify the empty cell. There are n −1 ways of choosing the cell with two balls. There are _ n 2 _ ways of picking the 2 balls to go into this cell. And there are (n − 2)! ways of placing the remaining n − 2 balls into the n − 2 cells, one ball in each cell. The product of these is the numerator n(n −1) _ n 2 _ (n −2)! = _ n 2 _ n!. 1.19 a. _ 6 4 _ = 15. b. Think of the n variables as n bins. Diﬀerentiating with respect to one of the variables is equivalent to putting a ball in the bin. Thus there are r unlabeled balls to be placed in n unlabeled bins, and there are _ n+r−1 r _ ways to do this. 1.20 A sample point speciﬁes on which day (1 through 7) each of the 12 calls happens. Thus there are 7 12 equally likely sample points. There are several diﬀerent ways that the calls might be assigned so that there is at least one call each day. There might be 6 calls one day and 1 call each of the other days. Denote this by 6111111. The number of sample points with this pattern is 7 _ 12 6 _ 6!. There are 7 ways to specify the day with 6 calls. There are _ 12 6 _ to specify which of the 12 calls are on this day. And there are 6! ways of assigning the remaining 6 calls to the remaining 6 days. We will now count another pattern. There might be 4 calls on one day, 2 calls on each of two days, and 1 call on each of the remaining four days. Denote this by 4221111. The number of sample points with this pattern is 7 _ 12 4 __ 6 2 __ 8 2 __ 6 2 _ 4!. (7 ways to pick day with 4 calls, _ 12 4 _ to pick the calls for that day, _ 6 2 _ to pick two days with two calls, _ 8 2 _ ways to pick two calls for lowered numbered day, _ 6 2 _ ways to pick the two calls for higher numbered day, 4! ways to order remaining 4 calls.) Here is a list of all the possibilities and the counts of the sample points for each one. pattern number of sample points 6111111 7 _ 12 6 _ 6! = 4,656,960 5211111 7 _ 12 5 _ 6 _ 7 2 _ 5! = 83,825,280 4221111 7 _ 12 4 __ 6 2 __ 8 2 __ 6 2 _ 4! = 523,908,000 4311111 7 _ 12 4 _ 6 _ 8 3 _ 5! = 139,708,800 3321111 _ 7 2 __ 12 3 __ 9 3 _ 5 _ 6 2 _ 4! = 698,544,000 3222111 7 _ 12 3 __ 6 3 __ 9 3 __ 7 2 __ 5 2 _ 3! = 1,397,088,000 2222211 _ 7 5 __ 12 2 __ 10 2 __ 8 2 __ 6 2 __ 4 2 _ 2! = 314,344,800 3,162,075,840 The probability is the total number of sample points divided by 7 12 , which is 3,162,075,840 7 12 .2285. 1.21 The probability is ( n 2r )2 2r ( 2n 2r ) . There are _ 2n 2r _ ways of choosing 2r shoes from a total of 2n shoes. Thus there are _ 2n 2r _ equally likely sample points. The numerator is the number of sample points for which there will be no matching pair. There are _ n 2r _ ways of choosing 2r diﬀerent shoes Second Edition 1-5 styles. There are two ways of choosing within a given shoe style (left shoe or right shoe), which gives 2 2r ways of arranging each one of the _ n 2r _ arrays. The product of this is the numerator _ n 2r _ 2 2r . 1.22 a) ( 31 15 )( 29 15 )( 31 15 )( 30 15 )···( 31 15 ) ( 366 180 ) b) 336 366 335 365 ··· 316 336 ( 366 30 ) . 1.23 P( same number of heads ) = n x=0 P(1 st tosses x, 2 nd tosses x) = n x=0 _ _ n x __ 1 2 _ x _ 1 2 _ n−x _ 2 = _ 1 4 _ n n x=0 _ n x _ 2 . 1.24 a. P(A wins) = i=1 P(A wins on i th toss) = 1 2 + _ 1 2 _ 2 1 2 + _ 1 2 _ 4 _ 1 2 _ +· · · = i=0 _ 1 2 _ 2i+1 = 2/3. b. P(A wins) = p + (1 −p) 2 p + (1 −p) 4 p +· · · = i=0 p(1 −p) 2i = p 1−(1−p) 2 . c. d dp _ p 1−(1−p) 2 _ = p 2 [1−(1−p) 2 ] 2 > 0. Thus the probability is increasing in p, and the minimum is at zero. Using L’Hˆopital’s rule we ﬁnd lim p→0 p 1−(1−p) 2 = 1/2. 1.25 Enumerating the sample space gives S = {(B, B), (B, G), (G, B), (G, G)} ,with each outcome equally likely. Thus P(at least one boy) = 3/4 and P(both are boys) = 1/4, therefore P( both are boys | at least one boy ) = 1/3. An ambiguity may arise if order is not acknowledged, the space is S = {(B, B), (B, G), (G, G)}, with each outcome equally likely. 1.27 a. For n odd the proof is straightforward. There are an even number of terms in the sum (0, 1, · · · , n), and _ n k _ and _ n n−k _ , which are equal, have opposite signs. Thus, all pairs cancel and the sum is zero. If n is even, use the following identity, which is the basis of Pascal’s triangle: For k > 0, _ n k _ = _ n−1 k _ + _ n−1 k−1 _ . Then, for n even n k=0 (−1) k _ n k _ = _ n 0 _ + n−1 k=1 (−1) k _ n k _ + _ n n _ = _ n 0 _ + _ n n _ + n−1 k=1 (−1) k __ n −1 k _ + _ n −1 k −1 __ = _ n 0 _ + _ n n _ _ n −1 0 _ _ n −1 n −1 _ = 0. b. Use the fact that for k > 0, k _ n k _ = n _ n−1 k−1 _ to write n k=1 k _ n k _ = n n k=1 _ n −1 k −1 _ = n n−1 j=0 _ n −1 j _ = n2 n−1 . 1-6 Solutions Manual for Statistical Inference c. n k=1 (−1) k+1 k _ n k _ = n k=1 (−1) k+1 _ n−1 k−1 _ = n n−1 j=0 (−1) j _ n−1 j _ = 0 from part a). 1.28 The average of the two integrals is [(nlog n −n) + ((n + 1) log (n + 1) −n)] /2 = [nlog n + (n + 1) log (n + 1)] /2 −n ≈ (n + 1/2) log n −n. Let d n = log n! −[(n + 1/2) log n −n], and we want to show that lim n→∞ md n = c, a constant. This would complete the problem, since the desired limit is the exponential of this one. This is accomplished in an indirect way, by working with diﬀerences, which avoids dealing with the factorial. Note that d n −d n+1 = _ n + 1 2 _ log _ 1 + 1 n _ −1. Diﬀerentiation will show that ((n + 1 2 )) log((1 + 1 n )) is increasing in n, and has minimum value (3/2) log 2 = 1.04 at n = 1. Thus d n − d n+1 > 0. Next recall the Taylor expansion of log(1 + x) = x − x 2 /2 + x 3 /3 − x 4 /4 + · · ·. The ﬁrst three terms provide an upper bound on log(1 +x), as the remaining adjacent pairs are negative. Hence 0 < d n d n+1 < _ n + 1 2 __ 1 n 1 2n 2 + 1 3n 3 _ −1 = 1 12n 2 + 1 6n 3 . It therefore follows, by the comparison test, that the series 1 d n −d n+1 converges. Moreover, the partial sums must approach a limit. Hence, since the sum telescopes, lim N→∞ N 1 d n −d n+1 = lim N→∞ d 1 −d N+1 = c. Thus lim n→∞ d n = d 1 −c, a constant. 1.29 a. Unordered Ordered {4,4,12,12} (4,4,12,12), (4,12,12,4), (4,12,4,12) (12,4,12,4), (12,4,4,12), (12,12,4,4) Unordered Ordered (2,9,9,12), (2,9,12,9), (2,12,9,9), (9,2,9,12) {2,9,9,12} (9,2,12,9), (9,9,2,12), (9,9,12,2), (9,12,2,9) (9,12,9,2), (12,2,9,9), (12,9,2,9), (12,9,9,2) b. Same as (a). c. There are 6 6 ordered samples with replacement from {1, 2, 7, 8, 14, 20}. The number of or- dered samples that would result in {2, 7, 7, 8, 14, 14} is 6! 2!2!1!1! = 180 (See Example 1.2.20). Thus the probability is 180 6 6 . d. If the k objects were distinguishable then there would be k! possible ordered arrangements. Since we have k 1 , . . . , k m diﬀerent groups of indistinguishable objects, once the positions of the objects are ﬁxed in the ordered arrangement permutations within objects of the same group won’t change the ordered arrangement. There are k 1 !k 2 ! · · · k m ! of such permutations for each ordered component. Thus there would be k! k 1 !k 2 !···k m ! diﬀerent ordered components. e. Think of the m distinct numbers as m bins. Selecting a sample of size k, with replacement, is the same as putting k balls in the m bins. This is _ k+m−1 k _ , which is the number of distinct bootstrap samples. Note that, to create all of the bootstrap samples, we do not need to know what the original sample was. We only need to know the sample size and the distinct values. 1.31 a. The number of ordered samples drawn with replacement from the set {x 1 , . . . , x n } is n n . The number of ordered samples that make up the unordered sample {x 1 , . . . , x n } is n!. Therefore the outcome with average x 1 +x 2 +···+x n n that is obtained by the unordered sample {x 1 , . . . , x n } Second Edition 1-7 has probability n! n n . Any other unordered outcome from {x 1 , . . . , x n }, distinct from the un- ordered sample {x 1 , . . . , x n }, will contain m diﬀerent numbers repeated k 1 , . . . , k m times where k 1 + k 2 + · · · + k m = n with at least one of the k i ’s satisfying 2 ≤ k i ≤ n. The probability of obtaining the corresponding average of such outcome is n! k 1 !k 2 ! · · · k m !n n < n! n n , since k 1 !k 2 ! · · · k m ! > 1. Therefore the outcome with average x 1 +x 2 +···+x n n is the most likely. b. Stirling’s approximation is that, as n →∞, n! ≈ 2πn n+(1/2) e −n , and thus _ n! n n __ _ 2nπ e n _ = n!e n n n 2nπ = 2πn n+(1/2) e −n e n n n 2nπ = 1. c. Since we are drawing with replacement from the set {x 1 , . . . , x n }, the probability of choosing any x i is 1 n . Therefore the probability of obtaining an ordered sample of size n without x i is (1 − 1 n ) n . To prove that lim n→∞ (1 − 1 n ) n = e −1 , calculate the limit of the log. That is lim n→∞ nlog _ 1 − 1 n _ = lim n→∞ log _ 1 − 1 n _ 1/n . L’Hˆopital’s rule shows that the limit is −1, establishing the result. See also Lemma 2.3.14. 1.32 This is most easily seen by doing each possibility. Let P(i) = probability that the candidate hired on the ith trial is best. Then P(1) = 1 N , P(2) = 1 N −1 , . . . , P(i) = 1 N −i + 1 , . . . , P(N) = 1. 1.33 Using Bayes rule P(M|CB) = P(CB|M)P(M) P(CB|M)P(M) +P(CB|F)P(F) = .05 × 1 2 .05 × 1 2 +.0025 × 1 2 = .9524. 1.34 a. P(Brown Hair) = P(Brown Hair|Litter 1)P(Litter 1) +P(Brown Hair|Litter 2)P(Litter 2) = _ 2 3 __ 1 2 _ + _ 3 5 __ 1 2 _ = 19 30 . b. Use Bayes Theorem P(Litter 1|Brown Hair) = P(BH|L1)P(L1) P(BH|L1)P(L1) +P(BH|L2)P(L2 = _ 2 3 _ _ 1 2 _ 19 30 = 10 19 . 1.35 Clearly P(·|B) ≥ 0, and P(S|B) = 1. If A 1 , A 2 , . . . are disjoint, then P _ _ i=1 A i ¸ ¸ ¸ ¸ ¸ B _ = P ( i=1 A i ∩ B) P(B) = P ( i=1 (A i ∩ B)) P(B) = i=1 P(A i ∩ B) P(B) = i=1 P(A i |B). 1-8 Solutions Manual for Statistical Inference 1.37 a. Using the same events A, B, C and W as in Example 1.3.4, we have P(W) = P(W|A)P(A) +P(W|B)P(B) +P(W|C)P(C) = γ _ 1 3 _ + 0 _ 1 3 _ + 1 _ 1 3 _ = γ+1 3 . Thus, P(A|W) = P(A∩W) P(W) = γ/3 (γ+1)/3 = γ γ+1 where, _ ¸ _ ¸ _ γ γ+1 = 1 3 if γ = 1 2 γ γ+1 < 1 3 if γ < 1 2 γ γ+1 > 1 3 if γ > 1 2 . b. By Exercise 1.35, P(·|W) is a probability function. A, B and C are a partition. So P(A|W) +P(B|W) +P(C|W) = 1. But, P(B|W) = 0. Thus, P(A|W) + P(C|W) = 1. Since P(A|W) = 1/3, P(C|W) = 2/3. (This could be calculated directly, as in Example 1.3.4.) So if A can swap fates with C, his chance of survival becomes 2/3. 1.38 a. P(A) = P(A ∩ B) + P(A ∩ B c ) from Theorem 1.2.11a. But (A ∩ B c ) ⊂ B c and P(B c ) = 1 −P(B) = 0. So P(A∩ B c ) = 0, and P(A) = P(A∩ B). Thus, P(A|B) = P(A∩ B) P(B) = P(A) 1 = P(A) . b. A ⊂ B implies A∩ B = A. Thus, P(B|A) = P(A∩ B) P(A) = P(A) P(A) = 1. And also, P(A|B) = P(A∩ B) P(B) = P(A) P(B) . c. If A and B are mutually exclusive, then P(A ∪ B) = P(A) + P(B) and A ∩ (A ∪ B) = A. Thus, P(A|A∪ B) = P(A∩ (A∪ B)) P(A∪ B) = P(A) P(A) +P(B) . d. P(A∩ B ∩ C) = P(A∩ (B ∩ C)) = P(A|B ∩ C)P(B ∩ C) = P(A|B ∩ C)P(B|C)P(C). 1.39 a. Suppose A and B are mutually exclusive. Then A ∩ B = ∅ and P(A ∩ B) = 0. If A and B are independent, then 0 = P(A ∩ B) = P(A)P(B). But this cannot be since P(A) > 0 and P(B) > 0. Thus A and B cannot be independent. b. If A and B are independent and both have positive probability, then 0 < P(A)P(B) = P(A∩ B). This implies A∩ B = ∅, that is, A and B are not mutually exclusive. 1.40 a. P(A c ∩ B) = P(A c |B)P(B) = [1 −P(A|B)]P(B) = [1 −P(A)]P(B) = P(A c )P(B) , where the third equality follows from the independence of A and B. b. P(A c ∩ B c ) = P(A c ) −P(A c ∩ B) = P(A c ) −P(A c )P(B) = P(A c )P(B c ). Second Edition 1-9 1.41 a. P( dash sent | dash rec) = P( dash rec | dash sent)P( dash sent) P( dash rec | dash sent)P( dash sent) +P( dash rec | dot sent)P( dot sent) = (2/3)(4/7) (2/3)(4/7) + (1/4)(3/7) = 32/41. b. By a similar calculation as the one in (a) P(dot sent|dot rec) = 27/434. Then we have P( dash sent|dot rec) = 16 43 . Given that dot-dot was received, the distribution of the four possibilities of what was sent are Event Probability dash-dash (16/43) 2 dash-dot (16/43)(27/43) dot-dash (27/43)(16/43) dot-dot (27/43) 2 1.43 a. For Boole’s Inequality, P(∪ n i=1 ) ≤ n i=1 P(A i ) −P 2 +P 3 +· · · ±P n n i=1 P(A i ) since P i ≥ P j if i ≤ j and therefore the terms −P 2k + P 2k+1 ≤ 0 for k = 1, . . . , n−1 2 when n is odd. When n is even the last term to consider is −P n ≤ 0. For Bonferroni’s Inequality apply the inclusion-exclusion identity to the A c i , and use the argument leading to (1.2.10). b. We illustrate the proof that the P i are increasing by showing that P 2 ≥ P 3 . The other arguments are similar. Write P 2 = 1≤i<j≤n P(A i ∩ A j ) = n−1 i=1 n j=i+1 P(A i ∩ A j ) = n−1 i=1 n j=i+1 _ n k=1 P(A i ∩ A j ∩ A k ) +P(A i ∩ A j ∩ (∪ k A k ) c ) _ Now to get to P 3 we drop terms from this last expression. That is n−1 i=1 n j=i+1 _ n k=1 P(A i ∩ A j ∩ A k ) +P(A i ∩ A j ∩ (∪ k A k ) c ) _ n−1 i=1 n j=i+1 _ n k=1 P(A i ∩ A j ∩ A k ) _ n−2 i=1 n−1 j=i+1 n k=j+1 P(A i ∩ A j ∩ A k ) = 1≤i<j<k≤n P(A i ∩ A j ∩ A k ) = P 3 . The sequence of bounds is improving because the bounds P 1 , P 1 −P 2 +P 3 , P 1 −P 2 +P 3 −P 4 + P 5 , . . ., are getting smaller since P i ≥ P j if i ≤ j and therefore the terms −P 2k +P 2k+1 ≤ 0. The lower bounds P 1 −P 2 , P 1 −P 2 +P 3 −P 4 , P 1 −P 2 +P 3 −P 4 +P 5 −P 6 , . . ., are getting bigger since P i ≥ P j if i ≤ j and therefore the terms P 2k+1 −P 2k ≥ 0. 1-10 Solutions Manual for Statistical Inference c. If all of the A i are equal, all of the probabilities in the inclusion-exclusion identity are the same. Thus P 1 = nP(A), P 2 = _ n 2 _ P(A), . . . , P j = _ n j _ P(A), and the sequence of upper bounds on P(∪ i A i ) = P(A) becomes P 1 = nP(A), P 1 −P 2 +P 3 = _ n − _ n 2 _ + _ n 3 __ P(A), . . . which eventually sum to one, so the last bound is exact. For the lower bounds we get P 1 −P 2 = _ n − _ n 2 __ P(A), P 1 −P 2 +P 3 −P 4 = _ n − _ n 2 _ + _ n 3 _ _ n 4 __ P(A), . . . which start out negative, then become positive, with the last one equaling P(A) (see Schwa- ger 1984 for details). 1.44 P(at least 10 correct|guessing) = 20 k=10 _ 20 k _ _ 1 4 _ k _ 3 4 _ n−k = .01386. 1.45 X is ﬁnite. Therefore B is the set of all subsets of X. We must verify each of the three properties in Deﬁnition 1.2.4. (1) If A ∈ B then P X (A) = P(∪ x i ∈A {s j ∈ S : X(s j ) = x i }) ≥ 0 since P is a probability function. (2) P X (X) = P(∪ m i=1 {s j ∈ S : X(s j ) = x i }) = P(S) = 1. (3) If A 1 , A 2 , . . . ∈ B and pairwise disjoint then P X (∪ k=1 A k ) = P( _ k=1 {∪ x i ∈A k {s j ∈ S : X(s j ) = x i }}) = k=1 P(∪ x i ∈A k {s j ∈ S : X(s j ) = x i }) = k=1 P X (A k ), where the second inequality follows from the fact the P is a probability function. 1.46 This is similar to Exercise 1.20. There are 7 7 equally likely sample points. The possible values of X 3 are 0, 1 and 2. Only the pattern 331 (3 balls in one cell, 3 balls in another cell and 1 ball in a third cell) yields X 3 = 2. The number of sample points with this pattern is _ 7 2 __ 7 3 __ 4 3 _ 5 = 14,700. So P(X 3 = 2) = 14,700/7 7 ≈ .0178. There are 4 patterns that yield X 3 = 1. The number of sample points that give each of these patterns is given below. pattern number of sample points 34 7 _ 7 3 _ 6 = 1,470 322 7 _ 7 3 __ 6 2 __ 4 2 __ 2 2 _ = 22,050 3211 7 _ 7 3 _ 6 _ 4 2 __ 5 2 _ 2! = 176,400 31111 7 _ 7 3 __ 6 4 _ 4! = 88,200 288,120 So P(X 3 = 1) = 288,120/7 7 ≈ .3498. The number of sample points that yield X 3 = 0 is 7 7 −288,120 −14,700 = 520,723, and P(X 3 = 0) = 520,723/7 7 ≈ .6322. 1.47 All of the functions are continuous, hence right-continuous. Thus we only need to check the limit, and that they are nondecreasing a. lim x→−∞ 1 2 + 1 π tan −1 (x) = 1 2 + 1 π _ −π 2 _ = 0, lim x→∞ 1 2 + 1 π tan −1 (x) = 1 2 + 1 π _ π 2 _ = 1, and d dx _ 1 2 + 1 π tan −1 (x) _ = 1 1+x 2 > 0, so F(x) is increasing. b. See Example 1.5.5. c. lim x→−∞ e −e −x = 0, lim x→∞ e −e −x = 1, d dx e −e −x = e −x e −e −x > 0. d. lim x→−∞ (1 −e −x ) = 0, lim x→∞ (1 −e −x ) = 1, d dx (1 −e −x ) = e −x > 0. Second Edition 1-11 e. lim y→−∞ 1− 1+e −y = 0, lim y→∞ + 1− 1+e −y = 1, d dx ( 1− 1+e −y ) = (1−)e −y (1+e −y ) 2 > 0 and d dx ( + 1− 1+e −y ) > 0, F Y (y) is continuous except on y = 0 where lim y↓0 ( + 1− 1+e −y ) = F(0). Thus is F Y (y) right continuous. 1.48 If F(·) is a cdf, F(x) = P(X ≤ x). Hence lim x→∞ P(X ≤ x) = 0 and lim x→−∞ P(X ≤ x) = 1. F(x) is nondecreasing since the set {x : X ≤ x} is nondecreasing in x. Lastly, as x ↓ x 0 , P(X ≤ x) →P(X ≤ x 0 ), so F(·) is right-continuous. (This is merely a consequence of deﬁning F(x) with “ ≤ ”.) 1.49 For every t, F X (t) ≤ F Y (t). Thus we have P(X > t) = 1 −P(X ≤ t) = 1 −F X (t) ≥ 1 −F Y (t) = 1 −P(Y ≤ t) = P(Y > t). And for some t , F X (t ) < F Y (t ). Then we have that P(X > t ) = 1 −P(X ≤ t ) = 1 −F X (t ) > 1 −F Y (t ) = 1 −P(Y ≤ t ) = P(Y > t ). 1.50 Proof by induction. For n = 2 2 k=1 t k−1 = 1 +t = 1−t 2 1−t . Assume true for n, this is n k=1 t k−1 = 1−t n 1−t . Then for n + 1 n+1 k=1 t k−1 = n k=1 t k−1 +t n = 1−t n 1−t +t n = 1−t n +t n (1−t) 1−t = 1−t n+1 1−t , where the second inequality follows from the induction hypothesis. 1.51 This kind of random variable is called hypergeometric in Chapter 3. The probabilities are obtained by counting arguments, as follows. x f X (x) = P(X = x) 0 _ 5 0 __ 25 4 _ _ _ 30 4 _ ≈ .4616 1 _ 5 1 __ 25 3 _ _ _ 30 4 _ ≈ .4196 2 _ 5 2 __ 25 2 _ _ _ 30 4 _ ≈ .1095 3 _ 5 3 __ 25 1 _ _ _ 30 4 _ ≈ .0091 4 _ 5 4 __ 25 0 _ _ _ 30 4 _ ≈ .0002 The cdf is a step function with jumps at x = 0, 1, 2, 3 and 4. 1.52 The function g(·) is clearly positive. Also, _ x 0 g(x)dx = _ x 0 f(x) 1−F(x 0 ) dx = 1−F(x 0 ) 1−F(x 0 ) = 1. 1.53 a. lim y→−∞ F Y (y) = lim y→−∞ 0 = 0 and lim y→∞ F Y (y) = lim y→∞ 1 − 1 y 2 = 1. For y ≤ 1, F Y (y) = 0 is constant. For y > 1, d dy F Y (y) = 2/y 3 > 0, so F Y is increasing. Thus for all y, F Y is nondecreasing. Therefore F Y is a cdf. b. The pdf is f Y (y) = d dy F Y (y) = _ 2/y 3 if y > 1 0 if y ≤ 1. c. F Z (z) = P(Z ≤ z) = P(10(Y −1) ≤ z) = P(Y ≤ (z/10) + 1) = F Y ((z/10) + 1). Thus, F Z (z) = _ 0 if z ≤ 0 1 − _ 1 [(z/10)+1] 2 _ if z > 0. 1-12 Solutions Manual for Statistical Inference 1.54 a. _ π/2 0 sin xdx = 1. Thus, c = 1/1 = 1. b. _ −∞ e −|x| dx = _ 0 −∞ e x dx + _ 0 e −x dx = 1 + 1 = 2. Thus, c = 1/2. 1.55 P(V ≤ 5) = P(T < 3) = _ 3 0 1 1.5 e −t/1.5 dt = 1 −e −2 . For v ≥ 6, P(V ≤ v) = P(2T ≤ v) = P _ T ≤ v 2 _ = _ v 2 0 1 1.5 e −t/1.5 dt = 1 −e −v/3 . Therefore, P(V ≤ v) = _ 0 −∞< v < 0, 1 −e −2 0 ≤ v < 6 , 1 −e −v/3 6 ≤ v . Chapter 2 Transformations and Expectations 2.1 a. f x (x) = 42x 5 (1 − x), 0 < x < 1; y = x 3 = g(x), monotone, and Y = (0, 1). Use Theorem 2.1.5. f Y (y) = f x (g −1 (y)) ¸ ¸ ¸ ¸ d dy g −1 (y) ¸ ¸ ¸ ¸ = f x (y 1/3 ) d dy (y 1/3 ) = 42y 5/3 (1 −y 1/3 )( 1 3 y −2/3 ) = 14y(1 −y 1/3 ) = 14y −14y 4/3 , 0 < y < 1. To check the integral, _ 1 0 (14y −14y 4/3 )dy = 7y 2 −14 y 7/3 7/3 ¸ ¸ ¸ ¸ 1 0 = 7y 2 −6y 7/3 ¸ ¸ ¸ 1 0 = 1 −0 = 1. b. f x (x) = 7e −7x , 0 < x < ∞, y = 4x + 3, monotone, and Y = (3, ∞). Use Theorem 2.1.5. f Y (y) = f x ( y −3 4 ) ¸ ¸ ¸ ¸ d dy ( y −3 4 ) ¸ ¸ ¸ ¸ = 7e −(7/4)(y−3) ¸ ¸ ¸ ¸ 1 4 ¸ ¸ ¸ ¸ = 7 4 e −(7/4)(y−3) , 3 < y < ∞. To check the integral, _ 3 7 4 e −(7/4)(y−3) dy = −e −(7/4)(y−3) ¸ ¸ ¸ 3 = 0 −(−1) = 1. c. F Y (y) = P(0 ≤ X ≤ y) = F X ( y). Then f Y (y) = 1 2 y f X ( y). Therefore f Y (y) = 1 2 y 30( y) 2 (1 − y) 2 = 15y 1 2 (1 − y) 2 , 0 < y < 1. To check the integral, _ 1 0 15y 1 2 (1 − y) 2 dy = _ 1 0 (15y 1 2 −30y + 15y 3 2 )dy = 15( 2 3 ) −30( 1 2 ) + 15( 2 5 ) = 1. 2.2 In all three cases, Theorem 2.1.5 is applicable and yields the following answers. a. f Y (y) = 1 2 y −1/2 , 0 < y < 1. b. f Y (y) = (n+m+1)! n!m! e −y(n+1) (1 −e −y ) m , 0 < y < ∞. c. f Y (y) = 1 σ 2 log y y e −(1/2)((log y)/σ) 2 , 0 < y < ∞. 2.3 P(Y = y) = P( X X+1 = y) = P(X = y 1−y ) = 1 3 ( 2 3 ) y/(1−y) , where y = 0, 1 2 , 2 3 , 3 4 , . . . , x x+1 , . . . . 2.4 a. f(x) is a pdf since it is positive and _ −∞ f(x)dx = _ 0 −∞ 1 2 λe λx dx + _ 0 1 2 λe −λx dx = 1 2 + 1 2 = 1. 2-2 Solutions Manual for Statistical Inference b. Let X be a random variable with density f(x). P(X < t) = _ _ t −∞ 1 2 λe λx dx if t < 0 _ 0 −∞ 1 2 λe λx dx+ _ t 0 1 2 λe −λx dx if t ≥ 0 where, _ t −∞ 1 2 λe λx dx = 1 2 e λx ¸ ¸ t −∞ = 1 2 e λt and _ t 0 1 2 λe −λx dx = − 1 2 e −λx ¸ ¸ t 0 = − 1 2 e −λt + 1 2 . Therefore, P(X < t) = _ 1 2 e λt if t < 0 1 − 1 2 e −λt dx if t ≥ 0 c. P(|X| < t) = 0 for t < 0, and for t ≥ 0, P(|X| < t) = P(−t < X < t) = _ 0 −t 1 2 λe λx dx + _ t 0 1 2 λe −λx dx = 1 2 _ 1 −e −λt ¸ + 1 2 _ −e −λt +1 ¸ = 1 −e −λt . 2.5 To apply Theorem 2.1.8. Let A 0 = {0}, A 1 = (0, π 2 ), A 3 = (π, 2 ) and A 4 = ( 2 , 2π). Then g i (x) = sin 2 (x) on A i for i = 1, 2, 3, 4. Therefore g −1 1 (y) = sin −1 ( y), g −1 2 (y) = π −sin −1 ( y), g −1 3 (y) = sin −1 ( y) +π and g −1 4 (y) = 2π −sin −1 ( y). Thus f Y (y) = 1 ¸ ¸ ¸ ¸ 1 1 −y 1 2 y ¸ ¸ ¸ ¸ + 1 ¸ ¸ ¸ ¸ 1 1 −y 1 2 y ¸ ¸ ¸ ¸ + 1 ¸ ¸ ¸ ¸ 1 1 −y 1 2 y ¸ ¸ ¸ ¸ + 1 ¸ ¸ ¸ ¸ 1 1 −y 1 2 y ¸ ¸ ¸ ¸ = 1 π _ y(1 −y) , 0 ≤ y ≤ 1 To use the cdf given in (2.1.6) we have that x 1 = sin −1 ( y) and x 2 = π −sin −1 ( y). Then by diﬀerentiating (2.1.6) we obtain that f Y (y) = 2f X (sin −1 ( y) d dy (sin −1 ( y) −2f X (π −sin −1 ( y) d dy (π −sin −1 ( y) = 2( 1 1 1 −y 1 2 y ) −2( 1 −1 1 −y 1 2 y ) = 1 π _ y(1 −y) 2.6 Theorem 2.1.8 can be used for all three parts. a. Let A 0 = {0}, A 1 = (−∞, 0) and A 2 = (0, ∞). Then g 1 (x) = |x| 3 = −x 3 on A 1 and g 2 (x) = |x| 3 = x 3 on A 2 . Use Theorem 2.1.8 to obtain f Y (y) = 1 3 e −y 1/3 y −2/3 , 0 < y < ∞ . b. Let A 0 = {0}, A 1 = (−1, 0) and A 2 = (0, 1). Then g 1 (x) = 1 −x 2 on A 1 and g 2 (x) = 1 −x 2 on A 2 . Use Theorem 2.1.8 to obtain f Y (y) = 3 8 (1 −y) −1/2 + 3 8 (1 −y) 1/2 , 0 < y < 1 . Second Edition 2-3 c. Let A 0 = {0}, A 1 = (−1, 0) and A 2 = (0, 1). Then g 1 (x) = 1 −x 2 on A 1 and g 2 (x) = 1 −x on A 2 . Use Theorem 2.1.8 to obtain f Y (y) = 3 16 (1 − _ 1 −y) 2 1 1 −y + 3 8 (2 −y) 2 , 0 < y < 1 . 2.7 Theorem 2.1.8 does not directly apply. a. Theorem 2.1.8 does not directly apply. Instead write P(Y ≤ y) = P(X 2 ≤ y) = _ P(− y ≤ X ≤ y) if |x| ≤ 1 P(1 ≤ X ≤ y) if x ≥ 1 = _ _ y y f X (x)dx if |x| ≤ 1 _ y 1 f X (x)dx if x ≥ 1 . Diﬀerentiation gives f y (y) = _ 2 9 1 y if y ≤ 1 1 9 + 1 9 1 y if y ≥ 1 . b. If the sets B 1 , B 2 , . . . , B K are a partition of the range of Y , we can write f Y (y) = k f Y (y)I(y ∈ B k ) and do the transformation on each of the B k . So this says that we can apply Theorem 2.1.8 on each of the B k and add up the pieces. For A 1 = (−1, 1) and A 2 = (1, 2) the calculations are identical to those in part (a). (Note that on A 1 we are essentially using Example 2.1.7). 2.8 For each function we check the conditions of Theorem 1.5.3. a. (i) lim x→0 F(x) = 1 −e −0 = 0, lim x→−∞ F(x) = 1 −e −∞ = 1. (ii) 1 −e −x is increasing in x. (iii) 1 −e −x is continuous. (iv) F −1 x (y) = −log(1 −y). b. (i) lim x→−∞ F(x) = e −∞ /2 = 0, lim x→∞ F(x) = 1 −(e 1−∞ /2) = 1. (ii) e −x/2 is increasing, 1/2 is nondecreasing, 1 −(e 1−x /2) is increasing. (iii) For continuity we only need check x = 0 and x = 1, and lim x→0 F(x) = 1/2, lim x→1 F(x) = 1/2, so F is continuous. (iv) F −1 X (y) = _ log(2y) 0 ≤ y < 1 2 ≤ y < 1, 1 −log(2(1 −y)) 1 2 ≤ y < 1 c. (i) lim x→−∞ F(x) = e −∞ /4 = 0, lim x→∞ F(x) = 1 −e −∞ /4 = 1. (ii) e −x /4 and 1 −e −x /4 are both increasing in x. (iii) lim x↓0 F(x) = 1 −e −0 /4 = 3 4 = F(0), so F is right-continuous. (iv) F −1 X (y) = _ log(4y) 0 ≤ y < 1 4 −log(4(1 −y)) 1 4 ≤ y < 1 2-4 Solutions Manual for Statistical Inference 2.9 From the probability integral transformation, Theorem 2.1.10, we know that if u(x) = F x (x), then F x (X) ∼ uniform(0, 1). Therefore, for the given pdf, calculate u(x) = F x (x) = _ 0 if x ≤ 1 (x −1) 2 /4 if 1 < x < 3 1 if 3 ≤ x . 2.10 a. We prove part b), which is equivalent to part a). b. Let A y = {x : F x (x) ≤ y}. Since F x is nondecreasing, A y is a half inﬁnite interval, either open, say (−∞, x y ), or closed, say (−∞, x y ]. If A y is closed, then F Y (y) = P(Y ≤ y) = P(F x (X) ≤ y) = P(X ∈ A y ) = F x (x y ) ≤ y. The last inequality is true because x y ∈ A y , and F x (x) ≤ y for every x ∈ A y . If A y is open, then F Y (y) = P(Y ≤ y) = P(F x (X) ≤ y) = P(X ∈ A y ), as before. But now we have P(X ∈ A y ) = P(X ∈ ( −∞,x y )) = lim x↑y P(X ∈ (−∞, x]), Use the Axiom of Continuity, Exercise 1.12, and this equals lim x↑y F X (x) ≤ y. The last inequality is true since F x (x) ≤ y for every x ∈ A y , that is, for every x < x y . Thus, F Y (y) ≤ y for every y. To get strict inequality for some y, let y be a value that is “jumped over” by F x . That is, let y be such that, for some x y , lim x↑y F X (x) < y < F X (x y ). For such a y, A y = (−∞, x y ), and F Y (y) = lim x↑y F X (x) < y. 2.11 a. Using integration by parts with u = x and dv = xe −x 2 2 dx then EX 2 = _ −∞ x 2 1 e −x 2 2 dx = 1 _ −xe −x 2 2 ¸ ¸ ¸ ¸ −∞ + _ −∞ e −x 2 2 dx _ = 1 (2π) = 1. Using example 2.1.7 let Y = X 2 . Then f Y (y) = 1 2 y _ 1 e −y 2 + 1 e −y 2 _ = 1 2πy e −y 2 . Therefore, EY = _ 0 y 2πy e −y 2 dy = 1 _ −2y 1 2 e −y 2 ¸ ¸ ¸ 0 + _ 0 y −1 2 e −y 2 dy _ = 1 ( 2π) = 1. This was obtained using integration by parts with u = 2y 1 2 and dv = 1 2 e −y 2 and the fact the f Y (y) integrates to 1. b. Y = |X| where −∞ < x < ∞. Therefore 0 < y < ∞. Then F Y (y) = P(Y ≤ y) = P(|X| ≤ y) = P(−y ≤ X ≤ y) = P(x ≤ y) −P(X ≤ −y) = F X (y) −F X (−y). Second Edition 2-5 Therefore, F Y (y) = d dy F Y (y) = f X (y) +f X (−y) = 1 e −y 2 + 1 e −y 2 = _ 2 π e −y 2 . Thus, EY = _ 0 y _ 2 π e −y 2 dy = _ 2 π _ 0 e −u du = _ 2 π _ −e −u ¸ ¸ 0 ¸ = _ 2 π , where u = y 2 2 . EY 2 = _ 0 y 2 _ 2 π e −y 2 dy = _ 2 π _ −ye −y 2 ¸ ¸ ¸ 0 + _ 0 e −y 2 dy _ = _ 2 π _ π 2 = 1. This was done using integration by part with u = y and dv = ye −y 2 dy. Then Var(Y ) = 1− 2 π . 2.12 We have tan x = y/d, therefore tan −1 (y/d) = x and d dy tan −1 (y/d) = 1 1+(y/d) 2 1 d dy = dx. Thus, f Y (y) = 2 πd 1 1+(y/d) 2 , 0 < y < ∞. This is the Cauchy distribution restricted to (0, ∞), and the mean is inﬁnite. 2.13 P(X = k) = (1 −p) k p +p k (1 −p), k = 1, 2, . . .. Therefore, EX = k=1 k[(1 −p) k p +p k (1 −p)] = (1 −p)p _ k=1 k(1 −p) k−1 + k=1 kp k−1 _ = (1 −p)p _ 1 p 2 + 1 (1 −p) 2 _ = 1 −2p + 2p 2 p(1 −p) . 2.14 _ 0 (1 −F X (x))dx = _ 0 P(X > x)dx = _ 0 _ x f X (y)dydx = _ 0 _ y 0 dxf X (y)dy = _ 0 yf X (y)dy = EX, where the last equality follows from changing the order of integration. 2.15 Assume without loss of generality that X ≤ Y . Then X ∨ Y = Y and X ∧ Y = X. Thus X +Y = (X ∧ Y ) + (X ∨ Y ). Taking expectations E[X +Y ] = E[(X ∧ Y ) + (X ∨ Y )] = E(X ∧ Y ) + E(X ∨ Y ). Therefore E(X ∨ Y ) = EX + EY −E(X ∧ Y ). 2.16 From Exercise 2.14, ET = _ 0 _ ae −λt +(1 −a)e −µt ¸ dt = −ae −λt λ (1 −a)e −µt µ ¸ ¸ ¸ ¸ 0 = a λ + 1 −a µ . 2-6 Solutions Manual for Statistical Inference 2.17 a. _ m 0 3x 2 dx = m 3 set = 1 2 ⇒ m = _ 1 2 _ 1/3 = .794. b. The function is symmetric about zero, therefore m = 0 as long as the integral is ﬁnite. 1 π _ −∞ 1 1+x 2 dx = 1 π tan −1 (x) ¸ ¸ ¸ ¸ −∞ = 1 π _ π 2 + π 2 _ = 1. This is the Cauchy pdf. 2.18 E|X −a| = _ −∞ |x −a|f(x)dx = _ a −∞ −(x −a)f(x)dx + _ a (x −a)f(x)dx. Then, d da E|X −a| = _ a −∞ f(x)dx − _ a f(x)dx set = 0. The solution to this equation is a = median. This is a minimum since d 2 /da 2 E|X−a| = 2f(a) > 0. 2.19 d da E(X −a) 2 = d da _ −∞ (x −a) 2 f X (x)dx = _ −∞ d da (x −a) 2 f X (x)dx = _ −∞ −2(x −a)f X (x)dx = −2 __ −∞ xf X (x)dx −a _ −∞ f X (x)dx _ = −2[EX −a]. Therefore if d da E(X−a) 2 = 0 then −2[EX−a] = 0 which implies that EX = a. If EX = a then d da E(X −a) 2 = −2[EX −a] = −2[a −a] = 0. EX = a is a minimum since d 2 /da 2 E(X −a) 2 = 2 > 0. The assumptions that are needed are the ones listed in Theorem 2.4.3. 2.20 From Example 1.5.4, if X = number of children until the ﬁrst daughter, then P(X = k) = (1 −p) k−1 p, where p = probability of a daughter. Thus X is a geometric random variable, and EX = k=1 k(1 −p) k−1 p = p − k=1 d dp (1 −p) k = −p d dp _ k=0 (1 −p) k −1 _ = −p d dp _ 1 p −1 _ = 1 p . Therefore, if p = 1 2 ,the expected number of children is two. 2.21 Since g(x) is monotone Eg(X) = _ −∞ g(x)f X (x)dx = _ −∞ yf X (g −1 (y)) d dy g −1 (y)dy = _ −∞ yf Y (y)dy = EY, where the second equality follows from the change of variable y = g(x), x = g −1 (y) and dx = d dy g −1 (y)dy. 2.22 a. Using integration by parts with u = x and dv = xe −x 2 2 we obtain that _ 0 x 2 e −x 2 2 dx 2 = β 2 2 _ 0 e −x 2 2 dx. The integral can be evaluated using the argument on pages 104-105 (see 3.3.14) or by trans- forming to a gamma kernel (use y = −λ 2 2 ). Therefore, _ 0 e −x 2 2 dx = πβ/2 and hence the function integrates to 1. Second Edition 2-7 b. EX = 2β/ π EX 2 = 3β 2 /2 VarX = β 2 _ 3 2 4 π ¸ . 2.23 a. Use Theorem 2.1.8 with A 0 = {0}, A 1 = (−1, 0) and A 2 = (0, 1). Then g 1 (x) = x 2 on A 1 and g 2 (x) = x 2 on A 2 . Then f Y (y) = 1 2 y −1/2 , 0 < y < 1. b. EY = _ 1 0 yf Y (y)dy = 1 3 EY 2 = _ 1 0 y 2 f Y (y)dy = 1 5 VarY = 1 5 _ 1 3 _ 2 = 4 45 . 2.24 a. EX = _ 1 0 xax a−1 dx = _ 1 0 ax a dx = ax a+1 a+1 ¸ ¸ ¸ 1 0 = a a+1 . EX 2 = _ 1 0 x 2 ax a−1 dx = _ 1 0 ax a+1 dx = ax a+2 a+2 ¸ ¸ ¸ 1 0 = a a+2 . VarX = a a+2 _ a a+1 _ 2 = a (a+2)(a+1) 2 . b. EX = n x=1 x n = 1 n n x=1 x = 1 n n(n+1) 2 = n+1 2 . EX 2 = n i=1 x 2 n = 1 n n i=1 x 2 = 1 n n(n+1)(2n+1) 6 = (n+1)(2n+1) 6 . VarX = (n+1)(2n+1) 6 _ n+1 2 _ 2 = 2n 2 +3n+1 6 n 2 +2n+1 4 = n 2 +1 12 . c. EX = _ 2 0 x 3 2 (x −1) 2 dx = 3 2 _ 2 0 (x 3 −2x 2 +x)dx = 1. EX 2 = _ 2 0 x 2 3 2 (x −1) 2 dx = 3 2 _ 2 0 (x 4 −2x 3 +x 2 )dx = 8 5 . VarX = 8 5 −1 2 = 3 5 . 2.25 a. Y = −X and g −1 (y) = −y. Thus f Y (y) = f X (g −1 (y))| d dy g −1 (y)| = f X (−y)| − 1| = f X (y) for every y. b. To show that M X (t) is symmetric about 0 we must show that M X (0 + ) = M X (0 −) for all > 0. M X (0 +) = _ −∞ e (0+)x f X (x)dx = _ 0 −∞ e x f X (x)dx + _ 0 e x f X (x)dx = _ 0 e (−x) f X (−x)dx + _ 0 −∞ e (−x) f X (−x)dx = _ −∞ e −x f X (x)dx = _ −∞ e (0−)x f X (x)dx = M X (0 −). 2.26 a. There are many examples; here are three. The standard normal pdf (Example 2.1.9) is symmetric about a = 0 because (0 − ) 2 = (0 + ) 2 . The Cauchy pdf (Example 2.2.4) is symmetric about a = 0 because (0 − ) 2 = (0 + ) 2 . The uniform(0, 1) pdf (Example 2.1.4) is symmetric about a = 1/2 because f((1/2) +) = f((1/2) −) = _ 1 if 0 < < 1 2 0 if 1 2 ≤ < ∞ . b. _ a f(x)dx = _ 0 f(a +)d (change variable, = x −a) = _ 0 f(a −)d (f(a +) = f(a −) for all > 0) = _ a −∞ f(x)dx. (change variable, x = a −) 2-8 Solutions Manual for Statistical Inference Since _ a −∞ f(x)dx + _ a f(x)dx = _ −∞ f(x)dx = 1, it must be that _ a −∞ f(x) dx = _ a f(x) dx = 1/2. Therefore, a is a median. c. EX −a = E(X −a) = _ −∞ (x −a)f(x)dx = _ a −∞ (x −a)f(x)dx + _ a (x −a)f(x)dx = _ 0 (−)f(a −)d + _ 0 f(a +)d With a change of variable, = a−x in the ﬁrst integral, and = x−a in the second integral we obtain that EX −a = E(X −a) = − _ 0 f(a −)d + _ 0 f(a −)d (f(a +) = f(a −) for all > 0) = 0. (two integrals are same) Therefore, EX = a. d. If a > > 0, f(a −) = e −(a−) > e −(a+) = f(a +). Therefore, f(x) is not symmetric about a > 0. If − < a ≤ 0, f(a −) = 0 < e −(a+) = f(a +). Therefore, f(x) is not symmetric about a ≤ 0, either. e. The median of X = log 2 < 1 = EX. 2.27 a. The standard normal pdf. b. The uniform on the interval (0, 1). c. For the case when the mode is unique. Let a be the point of symmetry and b be the mode. Let assume that a is not the mode and without loss of generality that a = b+ > b for > 0. Since b is the mode then f(b) > f(b +) ≥ f(b +2) which implies that f(a −) > f(a) ≥ f(a +) which contradict the fact the f(x) is symmetric. Thus a is the mode. For the case when the mode is not unique, there must exist an interval (x 1 , x 2 ) such that f(x) has the same value in the whole interval, i.e, f(x) is ﬂat in this interval and for all b ∈ (x 1 , x 2 ), b is a mode. Let assume that a ∈ (x 1 , x 2 ), thus a is not a mode. Let also assume without loss of generality that a = (b + ) > b. Since b is a mode and a = (b + ) ∈ (x 1 , x 2 ) then f(b) > f(b + ) ≥ f(b + 2) which contradict the fact the f(x) is symmetric. Thus a ∈ (x 1 , x 2 ) and is a mode. d. f(x) is decreasing for x ≥ 0, with f(0) > f(x) > f(y) for all 0 < x < y. Thus f(x) is unimodal and 0 is the mode. Second Edition 2-9 2.28 a. µ 3 = _ −∞ (x −a) 3 f(x)dx = _ a −∞ (x −a) 3 f(x)dx + _ a (x −a) 3 f(x)dx = _ 0 −∞ y 3 f(y +a)dy + _ 0 y 3 f(y +a)dy (change variable y = x −a) = _ 0 −y 3 f(−y +a)dy + _ 0 y 3 f(y +a)dy = 0. (f(−y +a) = f(y +a)) b. For f(x) = e −x , µ 1 = µ 2 = 1, therefore α 3 = µ 3 . µ 3 = _ 0 (x −1) 3 e −x dx = _ 0 (x 3 −3x 2 + 3x −1)e −x dx = Γ(4) −3Γ(3) + 3Γ(2) −Γ(1) = 3! −3 ×2! + 3 ×1 −1 = 3. c. Each distribution has µ 1 = 0, therefore we must calculate µ 2 = EX 2 and µ 4 = EX 4 . (i) f(x) = 1 e −x 2 /2 , µ 2 = 1, µ 4 = 3, α 4 = 3. (ii) f(x) = 1 2 , −1 < x < 1, µ 2 = 1 3 , µ 4 = 1 5 , α 4 = 9 5 . (iii) f(x) = 1 2 e −|x| , −∞ < x < ∞, µ 2 = 2, µ 4 = 24, α 4 = 6. As a graph will show, (iii) is most peaked, (i) is next, and (ii) is least peaked. 2.29 a. For the binomial EX(X −1) = n x=2 x(x −1) _ n x _ p x (1 −p) n−x = n(n −1)p 2 n x=2 _ n −2 x _ p x−2 (1 −p) n−x = n(n −1)p 2 n−2 y=0 _ n −2 y _ p y (1 −p) n−2−y = n(n −1)p 2 , where we use the identity x(x −1) _ n x _ = n(n −1) _ n−2 x _ , substitute y = x −2 and recognize that the new sum is equal to 1. Similarly, for the Poisson EX(X −1) = x=2 x(x −1) e −λ λ x x! = λ 2 y=0 e −λ λ y y! = λ 2 , where we substitute y = x −2. b. Var(X) = E[X(X −1)] + EX −(EX) 2 . For the binomial Var(X) = n(n −1)p 2 +np −(np) 2 = np(1 −p). For the Poisson Var(X) = λ 2 +λ −λ 2 = λ. c. EY = n y=0 y a y +a _ n y _ _ a+b−1 a _ _ n+a+b−1 y+a _ = n y=1 n a (y −1) + (a + 1) _ n −1 y −1 _ _ a+b−1 a _ _ (n−1)+(a+1)+b−1 (y−1)+(a+1) _ 2-10 Solutions Manual for Statistical Inference = n y=1 n a (y −1) + (a + 1) _ n −1 y −1 _ _ a+b−1 a _ _ (n−1)+(a+1)+b−1 (y−1)+(a+1) _ = na a+1 _ a+b−1 a _ _ a+1+b−1 a+1 _ n y=1 a + 1 (y −1) + (a + 1) _ n −1 y −1 _ _ a+1+b−1 a+1 _ _ (n−1)+(a+1)+b−1 (y−1)+(a+1) _ = na a +b n−1 j=0 a + 1 j + (a + 1) _ n −1 j _ _ a+1+b−1 a+1 _ _ (n−1)+(a+1)+b−1 (j+(a+1) _ = na a +b , since the last summation is 1, being the sum over all possible values of a beta-binomial(n − 1, a + 1, b). E[Y (Y − 1)] = n(n−1)a(a+1) (a+b)(a+b+1) is calculated similar to EY, but using the identity y(y −1) _ n y _ = n(n−1) _ n−2 y−2 _ and adding 2 instead of 1 to the parameter a. The sum over all possible values of a beta-binomial(n −2, a + 2, b) will appear in the calculation. Therefore Var(Y ) = E[Y (Y −1)] + EY −(EY ) 2 = nab(n +a +b) (a +b) 2 (a +b + 1) . 2.30 a. E(e tX ) = _ c 0 e tx 1 c dx = 1 ct e tx ¸ ¸ c 0 = 1 ct e tc 1 ct 1 = 1 ct (e tc −1). b. E(e tX ) = _ c 0 2x c 2 e tx dx = 2 c 2 t 2 (cte tc −e tc + 1). (integration-by-parts) c. E(e tx ) = _ α −∞ 1 e (x−α)/β e tx dx + _ α 1 e −(x−α)/β e tx dx = e −α/β 1 ( 1 β +t) e x( 1 β +t) ¸ ¸ ¸ ¸ ¸ α −∞ + − e α/β 1 ( 1 β −t) e −x( 1 β −t) ¸ ¸ ¸ ¸ ¸ α = 4e αt 4−β 2 t 2 , −2/β < t < 2/β. d. E _ e tX _ = x=0 e tx _ r+x−1 x _ p r (1 − p) x = p r x=0 _ r+x−1 x _ _ (1 −p)e t _ x . Now use the fact that x=0 _ r+x−1 x _ _ (1 −p)e t _ x _ 1 −(1 −p)e t _ r = 1 for (1 −p)e t < 1, since this is just the sum of this pmf, to get E(e tX ) = _ p 1−(1−p)e t _ r , t < −log(1 −p). 2.31 Since the mgf is deﬁned as M X (t) = Ee tX , we necessarily have M X (0) = Ee 0 = 1. But t/(1−t) is 0 at t = 0, therefore it cannot be an mgf. 2.32 d dt S(t) ¸ ¸ ¸ ¸ t=0 = d dt (log(M x (t)) ¸ ¸ ¸ ¸ t=0 = d dt M x (t) M x (t) ¸ ¸ ¸ ¸ ¸ t=0 = EX 1 = EX _ since M X (0) = Ee 0 = 1 _ d 2 dt 2 S(t) ¸ ¸ ¸ ¸ t=0 = d dt _ M x (t) M x (t) ¸ ¸ ¸ t=0 = M x (t)M x (t) −[M x (t)] 2 [M x (t)] 2 ¸ ¸ ¸ ¸ ¸ t=0 = 1 · EX 2 −(EX) 2 1 = VarX. 2.33 a. M X (t) = x=0 e tx e −λ λ x x! = e −λ x=1 (e t λ) x x! = e −λ e λe t = e λ(e t −1) . EX = d dt M x (t) ¸ ¸ t=0 = e λ(e t −1) λe t ¸ ¸ ¸ t=0 = λ. Second Edition 2-11 EX 2 = d 2 dt 2 M x (t) ¸ ¸ ¸ t=0 = λe t e λ(e t −1) λe t +λe t e λ(e t −1) ¸ ¸ ¸ t=0 = λ 2 +λ. VarX = EX 2 −(EX) 2 = λ 2 +λ −λ 2 = λ. b. M x (t) = x=0 e tx p(1 −p) x = p x=0 ((1 −p)e t ) x = p 1 1−(1 −p)e t = p 1−(1 −p)e t , t < −log(1 −p). EX = d dt M x (t) ¸ ¸ ¸ ¸ t=0 = −p (1 −(1 −p)e t ) 2 _ −(1 −p)e t _ ¸ ¸ ¸ ¸ ¸ t=0 = p(1 −p) p 2 = 1−p p . EX 2 = d 2 dt 2 M x (t) ¸ ¸ ¸ ¸ t=0 = _ 1−(1 −p)e t _ 2 _ p(1 −p)e t _ +p(1 −p)e t 2 _ 1−(1 −p)e t _ (1 −p)e t (1 −(1 −p)e t ) 4 ¸ ¸ ¸ ¸ ¸ ¸ ¸ t=0 = p 3 (1 −p) + 2p 2 (1 −p) 2 p 4 = p(1 −p) + 2(1 −p) 2 p 2 . VarX = p(1 −p) + 2(1 −p) 2 p 2 (1 −p) 2 p 2 = 1−p p 2 . c. M x (t) = _ −∞ e tx 1 2πσ e −(x−µ) 2 /2σ 2 dx = 1 2πσ _ −∞ e −(x 2 −2µx−2σ 2 tx+µ 2 )/2σ 2 dx. Now com- plete the square in the numerator by writing x 2 −2µx −2σ 2 tx+µ 2 = x 2 −2(µ +σ 2 t)x ±(µ +σ 2 t) 2 2 = (x −(µ +σ 2 t)) 2 −(µ +σ 2 t) 2 2 = (x −(µ +σ 2 t)) 2 −[2µσ 2 t + (σ 2 t) 2 ]. Then we have M x (t) = e [2µσ 2 t+(σ 2 t) 2 ]/2σ 2 1 2πσ _ −∞ e 1 2 (x−(µ+σ 2 t)) 2 dx = e µt+ σ 2 t 2 2 . EX = d dt M x (t) ¸ ¸ t=0 = (µ+σ 2 t)e µt+σ 2 t 2 /2 ¸ ¸ ¸ t=0 = µ. EX 2 = d 2 dt 2 M x (t) ¸ ¸ ¸ t=0 = (µ+σ 2 t) 2 e µt+σ 2 t 2 /2 2 e µt+σ 2 t/2 ¸ ¸ ¸ t=0 = µ 2 2 . VarX = µ 2 2 −µ 2 = σ 2 . 2.35 a. EX r 1 = _ 0 x r 1 2πx e −(log x) 2 /2 dx (f 1 is lognormal with µ = 0, σ 2 = 1) = 1 _ −∞ e y(r−1) e −y 2 /2 e y dy (substitute y = log x, dy = (1/x)dx) = 1 _ −∞ e −y 2 /2+ry dy = 1 _ −∞ e −(y 2 −2ry+r 2 )/2 e r 2 /2 dy = e r 2 /2 . 2-12 Solutions Manual for Statistical Inference b. _ 0 x r f 1 (x) sin(2πlog x)dx = _ 0 x r 1 2πx e −(log x) 2 /2 sin(2π log x)dx = _ −∞ e (y+r)r 1 e −(y+r) 2 /2 sin(2πy + 2πr)dy (substitute y = log x, dy = (1/x)dx) = _ −∞ 1 e (r 2 −y 2 )/2 sin(2πy)dy (sin(a + 2πr) = sin(a) if r = 0, 1, 2, . . .) = 0, because e (r 2 −y 2 )/2 sin(2πy) = −e (r 2 −(−y) 2 )/2 sin(2π(−y)); the integrand is an odd function so the negative integral cancels the positive one. 2.36 First, it can be shown that lim x→∞ e tx−(log x) 2 = ∞ by using l’Hˆopital’s rule to show lim x→∞ tx −(log x) 2 tx = 1, and, hence, lim x→∞ tx −(log x) 2 = lim x→∞ tx = ∞. Then for any k > 0, there is a constant c such that _ k 1 x e tx e ( log x) 2 /2 dx ≥ c _ k 1 x dx = c log x| k = ∞. Hence M x (t) does not exist. 2.37 a. The graph looks very similar to Figure 2.3.2 except that f 1 is symmetric around 0 (since it is standard normal). b. The functions look like t 2 /2 – it is impossible to see any diﬀerence. c. The mgf of f 1 is e K 1 (t) . The mgf of f 2 is e K 2 (t) . d. Make the transformation y = e x to get the densities in Example 2.3.10. 2.39 a. d dx _ x 0 e −λt dt = e −λx . Verify d dx __ x 0 e −λt dt _ = d dx _ 1 λ e −λt ¸ ¸ ¸ ¸ x 0 _ = d dx _ 1 λ e −λx + 1 λ _ = e −λx . b. d _ 0 e −λt dt = _ 0 d e −λt dt = _ 0 −te −λt dt = − Γ(2) λ 2 = − 1 λ 2 . Verify d _ 0 e −λt dt = d 1 λ = − 1 λ 2 . c. d dt _ 1 t 1 x 2 dx = − 1 t 2 . Verify d dt __ 1 t 1 x 2 dx _ = d dt _ 1 x ¸ ¸ ¸ ¸ 1 t _ = d dt _ −1 + 1 t _ = − 1 t 2 . d. d dt _ 1 1 (x−t) 2 dx = _ 1 d dt _ 1 (x−t) 2 _ dx = _ 1 2(x −t) −3 dx = −(x −t) −2 ¸ ¸ ¸ 1 = 1 (1−t) 2 . Verify d dt _ 1 (x −t) −2 dx = d dt _ −(x −t) −1 ¸ ¸ ¸ 1 _ = d dt 1 1 −t = 1 (1 −t) 2 . Chapter 3 Common Families of Distributions 3.1 The pmf of X is f(x) = 1 N 1 −N 0 +1 , x = N 0 , N 0 + 1, . . . , N 1 . Then EX = N 1 x=N 0 x 1 N 1 −N 0 +1 = 1 N 1 −N 0 +1 _ N 1 x=1 x − N 0 −1 x=1 x _ = 1 N 1 −N 0 +1 _ N 1 (N 1 +1) 2 (N 0 −1)(N 0 −1 + 1) 2 _ = N 1 + N 0 2 . Similarly, using the formula for N 1 x 2 , we obtain Ex 2 = 1 N 1 −N 0 +1 _ N 1 (N 1 +1)(2N 1 +1) −N 0 (N 0 −1)(2N 0 −1) 6 _ VarX = EX 2 −EX = (N 1 −N 0 )(N 1 −N 0 +2) 12 . 3.2 Let X = number of defective parts in the sample. Then X ∼ hypergeometric(N = 100, M, K) where M = number of defectives in the lot and K = sample size. a. If there are 6 or more defectives in the lot, then the probability that the lot is accepted (X = 0) is at most P(X = 0 | M = 100, N = 6, K) = _ 6 0 __ 94 K _ _ 100 K _ = (100 −K) · · · · · (100 −K −5) 100 · · · · · 95 . By trial and error we ﬁnd P(X = 0) = .10056 for K = 31 and P(X = 0) = .09182 for K = 32. So the sample size must be at least 32. b. Now P(accept lot) = P(X = 0 or 1), and, for 6 or more defectives, the probability is at most P(X = 0 or 1 | M = 100, N = 6, K) = _ 6 0 __ 94 K _ _ 100 K _ + _ 6 1 __ 94 K−1 _ _ 100 K _ . By trial and error we ﬁnd P(X = 0 or 1) = .10220 for K = 50 and P(X = 0 or 1) = .09331 for K = 51. So the sample size must be at least 51. 3.3 In the seven seconds for the event, no car must pass in the last three seconds, an event with probability (1 − p) 3 . The only occurrence in the ﬁrst four seconds, for which the pedestrian does not wait the entire four seconds, is to have a car pass in the ﬁrst second and no other car pass. This has probability p(1 − p) 3 . Thus the probability of waiting exactly four seconds before starting to cross is [1 −p(1 −p) 3 ](1 −p) 3 . 3-2 Solutions Manual for Statistical Inference 3.5 Let X = number of eﬀective cases. If the new and old drugs are equally eﬀective, then the probability that the new drug is eﬀective on a case is .8. If the cases are independent then X ∼ binomial(100, .8), and P(X ≥ 85) = 100 x=85 _ 100 x _ .8 x .2 100−x = .1285. So, even if the new drug is no better than the old, the chance of 85 or more eﬀective cases is not too small. Hence, we cannot conclude the new drug is better. Note that using a normal approximation to calculate this binomial probability yields P(X ≥ 85) ≈ P(Z ≥ 1.125) = .1303. 3.7 Let X ∼ Poisson(λ). We want P(X ≥ 2) ≥ .99, that is, P(X ≤ 1) = e −λ + λe −λ ≤ .01. Solving e −λ + λe −λ = .01 by trial and error (numerical bisection method) yields λ = 6.6384. 3.8 a. We want P(X > N) < .01 where X ∼ binomial(1000, 1/2). Since the 1000 customers choose randomly, we take p = 1/2. We thus require P(X > N) = 1000 x=N+1 _ 1000 x __ 1 2 _ x _ 1 − 1 2 _ 1000−x < .01 which implies that _ 1 2 _ 1000 1000 x=N+1 _ 1000 x _ < .01. This last inequality can be used to solve for N, that is, N is the smallest integer that satisﬁes _ 1 2 _ 1000 1000 x=N+1 _ 1000 x _ < .01. The solution is N = 537. b. To use the normal approximation we take X ∼n(500, 250), where we used µ = 1000( 1 2 ) = 500 and σ 2 = 1000( 1 2 )( 1 2 ) = 250.Then P(X > N) = P _ X −500 250 > N −500 250 _ < .01 thus, P _ Z > N −500 250 _ < .01 where Z ∼ n(0, 1). From the normal table we get P(Z > 2.33) ≈ .0099 < .01 ⇒ N −500 250 = 2.33 ⇒ N ≈ 537. Therefore, each theater should have at least 537 seats, and the answer based on the approx- imation equals the exact answer. Second Edition 3-3 3.9 a. We can think of each one of the 60 children entering kindergarten as 60 independent Bernoulli trials with probability of success (a twin birth) of approximately 1 90 . The probability of having 5 or more successes approximates the probability of having 5 or more sets of twins entering kindergarten. Then X ∼ binomial(60, 1 90 ) and P(X ≥ 5) = 1 − 4 x=0 _ 60 x __ 1 90 _ x _ 1 − 1 90 _ 60−x = .0006, which is small and may be rare enough to be newsworthy. b. Let X be the number of elementary schools in New York state that have 5 or more sets of twins entering kindergarten. Then the probability of interest is P(X ≥ 1) where X ∼ binomial(310,.0006). Therefore P(X ≥ 1) = 1 −P(X = 0) = .1698. c. Let X be the number of States that have 5 or more sets of twins entering kindergarten during any of the last ten years. Then the probability of interest is P(X ≥ 1) where X ∼ binomial(500, .1698). Therefore P(X ≥ 1) = 1 −P(X = 0) = 1 −3.90 ×10 −41 ≈ 1. 3.11 a. lim M/N→p,M→∞,N→∞ _ M x __ N−M K−x _ _ N K _ = K! x!(K−x)! lim M/N→p,M→∞,N→∞ M!(N−M)!(N−K)! N!(M−x)!(N−M−(K−x))! In the limit, each of the factorial terms can be replaced by the approximation from Stirling’s formula because, for example, M! = (M!/( 2πM M+1/2 e −M )) 2πM M+1/2 e −M and M!/( 2πM M+1/2 e −M ) → 1. When this replacement is made, all the 2π and expo- nential terms cancel. Thus, lim M/N→p,M→∞,N→∞ _ M x __ N−M K−x _ _ N K _ = _ K x _ lim M/N→p,M→∞,N→∞ M M+1/2 (N−M) N−M+1/2 (N−K) N−K+1/2 N N+1/2 (M−x) M−x+1/2 (N−M−K+x) N−M−(K−x)+1/2 . We can evaluate the limit by breaking the ratio into seven terms, each of which has a ﬁnite limit we can evaluate. In some limits we use the fact that M →∞, N →∞ and M/N →p imply N −M →∞. The ﬁrst term (of the seven terms) is lim M→∞ _ M M −x _ M = lim M→∞ 1 _ M−x M _ M = lim M→∞ 1 _ 1+ −x M _ M = 1 e −x = e x . Lemma 2.3.14 is used to get the penultimate equality. Similarly we get two more terms, lim N−M→∞ _ N −M N −M −(K −x) _ N−M = e K−x and lim N→∞ _ N −K N _ N = e −K . 3-4 Solutions Manual for Statistical Inference Note, the product of these three limits is one. Three other terms are limM →∞ _ M M −x _ 1/2 = 1 lim N−M→∞ _ N −M N −M −(K −x) _ 1/2 = 1 and lim N→∞ _ N −K N _ 1/2 = 1. The only term left is lim M/N→p,M→∞,N→∞ (M −x) x (N −M −(K −x)) K−x (N −K) K = lim M/N→p,M→∞,N→∞ _ M −x N −K _ x _ N −M −(K −x) N −K _ K−x = p x (1 −p) K−x . b. If in (a) we in addition have K → ∞, p → 0, MK/N → pK → λ, by the Poisson approxi- mation to the binomial, we heuristically get _ M x __ N−M K−x _ _ N K _ → _ K x _ p x (1 −p) K−x e −λ λ x x! . c. Using Stirling’s formula as in (a), we get lim N,M,K→∞, M N →0, KM N →λ _ M x __ N−M K−x _ _ N K _ = lim N,M,K→∞, M N →0, KM N →λ e −x x! K x e x M x e x (N−M) K−x e K−x N K e K = 1 x! lim N,M,K→∞, M N →0, KM N →λ _ KM N _ x _ N −M N _ K−x = 1 x! λ x lim N,M,K→∞, M N →0, KM N →λ _ 1 − MK N K _ K = e −λ λ x x! . 3.12 Consider a sequence of Bernoulli trials with success probability p. Deﬁne X = number of successes in ﬁrst n trials and Y = number of failures before the rth success. Then X and Y have the speciﬁed binomial and hypergeometric distributions, respectively. And we have F x (r −1) = P(X ≤ r −1) = P(rth success on (n + 1)st or later trial) = P(at least n + 1 −r failures before the rth success) = P(Y ≥ n −r + 1) = 1 −P(Y ≤ n −r) = 1 −F Y (n −r). Second Edition 3-5 3.13 For any X with support 0, 1, . . ., we have the mean and variance of the 0−truncated X T are given by EX T = x=1 xP(X T = x) = x=1 x P(X = x) P(X > 0) = 1 P(X > 0) x=1 xP(X = x) = 1 P(X > 0) x=0 xP(X = x) = EX P(X > 0) . In a similar way we get EX 2 T = EX 2 P(X>0) . Thus, VarX T = EX 2 P(X > 0) _ EX P(X > 0) _ 2 . a. For Poisson(λ), P(X > 0) = 1 −P(X = 0) = 1 − e −λ λ 0 0! = 1 −e −λ , therefore P(X T = x) = e −λ λ x x!(1−e −λ ) x = 1, 2, . . . EX T = λ/(1 −e −λ ) VarX T = (λ 2 + λ)/(1 −e −λ ) −(λ/(1 −e −λ )) 2 . b. For negative binomial(r, p), P(X > 0) = 1 −P(X = 0) = 1− _ r−1 0 _ p r (1−p) 0 = 1−p r . Then P(X T = x) = _ r+x−1 x _ p r (1 −p) x 1−p r , x = 1, 2, . . . EX T = r(1 −p) p(1 −p r ) VarX T = r(1 −p) + r 2 (1 −p) 2 p 2 (1 −p r ) _ r(1 −p) p(1 −p r ) 2 _ . 3.14 a. x=1 −(1−p) x x log p = 1 log p x=1 −(1−p) x x = 1, since the sum is the Taylor series for log p. b. EX = −1 log p _ x=1 (1−p) x _ = −1 log p _ x=0 (1−p) x −1 _ == −1 log p _ 1 p −1 _ = −1 log p _ 1−p p _ . Since the geometric series converges uniformly, EX 2 = −1 log p x=1 x(1 −p) x = (1−p) log p x=1 d dp (1 −p) x = (1−p) log p d dp x=1 (1 −p) x = (1−p) log p d dp _ 1−p p _ = −(1−p) p 2 log p . Thus VarX = −(1−p) p 2 log p _ 1 + (1−p) log p _ . Alternatively, the mgf can be calculated, M x (t) = −1 log p x=1 _ (1−p)e t _ x = log(1+pe t −e t ) log p and can be diﬀerentiated to obtain the moments. 3-6 Solutions Manual for Statistical Inference 3.15 The moment generating function for the negative binomial is M(t) = _ p 1−(1 −p)e t _ r = _ 1 + 1 r r(1 −p)(e t −1) 1−(1 −p)e t _ r , the term r(1 −p)(e t −1) 1−(1 −p)e t λ(e t −1) 1 = λ(e t −1) as r →∞, p →1 and r(p −1) →λ. Thus by Lemma 2.3.14, the negative binomial moment generating function converges to e λ(e t −1) , the Poisson moment generating function. 3.16 a. Using integration by parts with, u = t α and dv = e −t dt, we obtain Γ(α + 1) = _ 0 t (α+1)−1 e −t dt = t α (−e −t ) ¸ ¸ ¸ 0 _ 0 αt α−1 (−e −t )dt = 0 + αΓ(α) = αΓ(α). b. Making the change of variable z = 2t, i.e., t = z 2 /2, we obtain Γ(1/2) = _ 0 t −1/2 e −t dt = _ 0 2 z e −z 2 /2 zdz = 2 _ 0 e −z 2 /2 dz = 2 π 2 = π. where the penultimate equality uses (3.3.14). 3.17 EX ν = _ 0 x ν 1 Γ(α)β α x α−1 e −x/β dx = 1 Γ(α)β α _ 0 x (ν+α)−1 e −x/β dx = Γ(ν+α)β ν+α Γ(α)β α = β ν Γ(ν+α) Γ(α) . Note, this formula is valid for all ν > −α. The expectation does not exist for ν ≤ −α. 3.18 If Y ∼ negative binomial(r, p), its moment generating function is M Y (t) = _ p 1−(1−p)e t _ r , and, from Theorem 2.3.15, M pY (t) = _ p 1−(1−p)e pt _ r . Now use L’Hˆopital’s rule to calculate lim p→0 _ p 1−(1 −p)e pt _ = lim p→0 1 (p −1)te pt +e pt = 1 1 −t , so the moment generating function converges to (1 −t) −r , the moment generating function of a gamma(r, 1). 3.19 Repeatedly apply the integration-by-parts formula 1 Γ(n) _ x z n−1 z −z dz = x n−1 e −x (n −1)! + 1 Γ(n −1) _ x z n−2 z −z dz, until the exponent on the second integral is zero. This will establish the formula. If X ∼ gamma(α, 1) and Y ∼ Poisson(x). The probabilistic relationship is P(X ≥ x) = P(Y ≤ α−1). 3.21 The moment generating function would be deﬁned by 1 π _ −∞ e tx 1+x 2 dx. On (0, ∞), e tx > x, hence _ 0 e tx 1+x 2 dx > _ 0 x 1+x 2 dx = ∞, thus the moment generating function does not exist. Second Edition 3-7 3.22 a. E(X(X−1)) = x=0 x(x −1) e −λ λ x x! = e −λ λ 2 x=2 λ x−2 (x−2)! (let y = x −2) = e −λ λ 2 y=0 λ y y! = e −λ λ 2 e λ = λ 2 EX 2 = λ 2 + EX = λ 2 + λ VarX = EX 2 −(EX) 2 = λ 2 + λ −λ 2 = λ. b. E(X(X−1)) = x=0 x(x −1) _ r + x −1 x _ pr(1 −p) x = x=2 r(r + 1) _ r + x −1 x −2 _ pr(1 −p) x = r(r + 1) (1 −p) 2 p 2 y=0 _ r + 2 + y −1 y _ pr + 2(1 −p) y = r(r −1) (1 −p) 2 p 2 , where in the second equality we substituted y = x −2, and in the third equality we use the fact that we are summing over a negative binomial(r + 2, p) pmf. Thus, VarX = EX(X −1) + EX −(EX) 2 = r(r + 1) (1 −p) 2 p 2 + r(1 −p) p r 2 (1 −p) 2 p 2 = r(1 −p) p 2 . c. EX 2 = _ 0 x 2 1 Γ(α)β α x α−1 e −x/β dx = 1 Γ(α)β α _ 0 x α+1 e −x/β dx = 1 Γ(α)β α Γ(α + 2)β α+2 = α(α + 1)β 2 . VarX = EX 2 −(EX) 2 = α(α + 1)β 2 −α 2 β 2 = αβ 2 . d. (Use 3.3.18) EX = Γ(α+1)Γ(α+β) Γ(α+β+1)Γ(α) = αΓ(α)Γ(α+β) (α+β)Γ(α+β)Γ(α) = α α+β . EX 2 = Γ(α+2)Γ(α+β) Γ(α+β+2)Γ(α) = (α+1)αΓ(α)Γ(α+β) (α+β+1)(α+β)Γ(α+β)Γ(α) = α(α+1) (α+β)(α+β+1) . VarX = EX 2 −(EX) 2 = α(α+1) (α+β)(α+β+1) α 2 (α+β) 2 = αβ (α+β) 2 (α+β+1) . 3-8 Solutions Manual for Statistical Inference e. The double exponential(µ, σ) pdf is symmetric about µ. Thus, by Exercise 2.26, EX = µ. VarX = _ −∞ (x −µ) 2 1 e −|x−µ|/σ dx = _ −∞ σz 2 1 2 e −|z| σdz = σ 2 _ 0 z 2 e −z dz = σ 2 Γ(3) = 2σ 2 . 3.23 a. _ α x −β−1 dx = −1 β x −β ¸ ¸ ¸ ¸ α = 1 βα β , thus f(x) integrates to 1 . b. EX n = βα n (n−β) , therefore EX = αβ (1 −β) EX 2 = αβ 2 (2 −β) VarX = αβ 2 2−β (αβ) 2 (1−β) 2 c. If β < 2 the integral of the second moment is inﬁnite. 3.24 a. f x (x) = 1 β e −x/β , x > 0. For Y = X 1/γ , f Y (y) = γ β e −y γ y γ−1 , y > 0. Using the transforma- tion z = y γ /β, we calculate EY n = γ β _ 0 y γ+n−1 e −y γ dy = β n/γ _ 0 z n/γ e −z dz = β n/γ Γ _ n γ +1 _ . Thus EY = β 1/γ Γ( 1 γ + 1) and VarY = β 2/γ _ Γ _ 2 γ +1 _ −Γ 2 _ 1 γ +1 __ . b. f x (x) = 1 β e −x/β , x > 0. For Y = (2X/β) 1/2 , f Y (y) = ye −y 2 /2 , y > 0 . We now notice that EY = _ 0 y 2 e −y 2 /2 dy = 2 since 1 _ −∞ y 2 e −y 2 /2 = 1, the variance of a standard normal, and the integrand is sym- metric. Use integration-by-parts to calculate the second moment EY 2 = _ 0 y 3 e −y 2 /2 dy = 2 _ 0 ye −y 2 /2 dy = 2, where we take u = y 2 , dv = ye −y 2 /2 . Thus VarY = 2(1 −π/4). c. The gamma(a, b) density is f X (x) = 1 Γ(a)b a x a−1 e −x/b . Make the transformation y = 1/x with dx = −dy/y 2 to get f Y (y) = f X (1/y)|1/y 2 | = 1 Γ(a)b a _ 1 y _ a+1 e −1/by . Second Edition 3-9 The ﬁrst two moments are EY = 1 Γ(a)b a _ 0 _ 1 y _ a e −1/by = Γ(a −1)b a−1 Γ(a)b a = 1 (a −1)b EY 2 = Γ(a −2)b a−2 Γ(a)b a = 1 (a −1)(a −2)b 2 , and so VarY = 1 (a−1) 2 (a−2)b 2 . d. f x (x) = 1 Γ(3/2)β 3/2 x 3/2−1 e −x/β , x > 0. For Y = (X/β) 1/2 , f Y (y) = 2 Γ(3/2) y 2 e −y 2 , y > 0. To calculate the moments we use integration-by-parts with u = y 2 , dv = ye −y 2 to obtain EY = 2 Γ(3/2) _ 0 y 3 e −y 2 dy = 2 Γ(3/2) _ 0 ye −y 2 dy = 1 Γ(3/2) and with u = y 3 , dv = ye −y 2 to obtain EY 2 = 2 Γ(3/2) _ 0 y 4 e −y 2 dy = 3 Γ(3/2) _ 0 y 2 e −y 2 dy = 3 Γ(3/2) π. Using the fact that 1 2 π _ −∞ y 2 e −y 2 = 1, since it is the variance of a n(0, 2), symmetry yields _ 0 y 2 e −y 2 dy = π. Thus, VarY = 6 −4/π, using Γ(3/2) = 1 2 π. e. f x (x) = e −x , x > 0. For Y = α−γ log X, f Y (y) = e −e α−y γ e α−y γ 1 γ , −∞< y < ∞. Calculation of EY and EY 2 cannot be done in closed form. If we deﬁne I 1 = _ 0 log xe −x dx, I 2 = _ 0 (log x) 2 e −x dx, then EY = E(α −γ log x) = α −γI 1 , and EY 2 = E(α −γ log x) 2 = α 2 −2αγI 1 + γ 2 I 2 .The constant I 1 = .5772157 is called Euler’s constant. 3.25 Note that if T is continuous then, P(t ≤ T ≤ t+δ|t ≤ T) = P(t ≤ T ≤ t+δ, t ≤ T) P(t ≤ T) = P(t ≤ T ≤ t+δ) P(t ≤ T) = F T (t+δ) −F T (t) 1−F T (t) . Therefore from the deﬁnition of derivative, h T (t) = 1 1 −F T (t) = lim δ→0 F T (t + δ) −F T (t) δ = F T (t) 1 −F T (t) = f T (t) 1−F T (t) . Also, d dt (log[1 −F T (t)]) = − 1 1−F T (t) (−f T (t)) = h T (t). 3.26 a. f T (t) = 1 β e −t/β and F T (t) = _ t 0 1 β e −x/β dx = − e −x/β ¸ ¸ t 0 = 1 −e −t/β . Thus, h T (t) = f T (t) 1−F T (t) = (1/β)e −t/β 1−(1 −e −t/β ) = 1 β . 3-10 Solutions Manual for Statistical Inference b. f T (t) = γ β t γ−1 e −t γ , t ≥ 0 and F T (t) = _ t 0 γ β x γ−1 e −x γ dx = _ t γ/β 0 e −u du = − e −u | t γ/β 0 = 1 −e −t γ , where u = x γ/β . Thus, h T (t) = (γ/β)t γ−1 e −t γ e −t γ = γ β t γ−1 . c. F T (t) = 1 1+e −(t−µ)/β and f T (t) = e −(t−µ)/β (1+e −(t−µ)/β ) 2 . Thus, h T (t) = 1 β e −(t−µ)/β(1+e −(t−µ)/β ) 2 1 e −(t−µ)/β 1+e −(t−µ)/β = 1 β F T (t). 3.27 a. The uniform pdf satisﬁes the inequalities of Exercise 2.27, hence is unimodal. b. For the gamma(α, β) pdf f(x), ignoring constants, d dx f(x) = x α−2 e −x/β β [β(α−1) −x], which only has one sign change. Hence the pdf is unimodal with mode β(α −1). c. For the n(µ, σ 2 ) pdf f(x), ignoring constants, d dx f(x) = x−µ σ 2 e −(−x/β) 2 /2σ 2 , which only has one sign change. Hence the pdf is unimodal with mode µ. d. For the beta(α, β) pdf f(x), ignoring constants, d dx f(x) = x α−2 (1 −x) β−2 [(α−1) −x(α+β−2)] , which only has one sign change. Hence the pdf is unimodal with mode α−1 α+β−2 . 3.28 a. (i) µ known, f(x|σ 2 ) = 1 2πσ exp _ −1 2 (x −µ) 2 _ , h(x) = 1, c(σ 2 ) = 1 2πσ 2 I (0,∞) 2 ), w 1 2 ) = − 1 2 , t 1 (x) = (x −µ) 2 . (ii) σ 2 known, f(x|µ) = 1 2πσ exp _ x 2 2 _ exp _ µ 2 2 _ exp _ µ x σ 2 _ , h(x) = exp _ −x 2 2 _ , c(µ) = 1 2πσ exp _ −µ 2 2 _ , w 1 (µ) = µ, t 1 (x) = x σ 2 . b. (i) α known, f(x|β) = 1 Γ(α)β α x α−1 e −x β , h(x) = x α−1 Γ(α) , x > 0, c(β) = 1 β α , w 1 (β) = 1 β , t 1 (x) = −x. (ii) β known, f(x|α) = e −x/β 1 Γ(α)β α exp((α −1) log x), h(x) = e −x/β , x > 0, c(α) = 1 Γ(α)β α w 1 (α) = α −1, t 1 (x) = log x. (iii) α, β unknown, f(x|α, β) = 1 Γ(α)β α exp((α −1) log x − x β ), h(x) = I {x>0} (x), c(α, β) = 1 Γ(α)β α , w 1 (α) = α −1, t 1 (x) = log x, w 2 (α, β) = −1/β, t 2 (x) = x. c. (i) α known, h(x) = x α−1 I [0,1] (x), c(β) = 1 B(α,β) , w 1 (β) = β −1, t 1 (x) = log(1 −x). (ii) β known, h(x) = (1 −x) β−1 I [0,1] (x), c(α) = 1 B(α,β) , w 1 (x) = α −1, t 1 (x) = log x. Second Edition 3-11 (iii) α, β unknown, h(x) = I [0,1] (x), c(α, β) = 1 B(α,β) , w 1 (α) = α −1, t 1 (x) = log x, w 2 (β) = β −1, t 2 (x) = log(1 −x). d. h(x) = 1 x! I {0,1,2,...} (x), c(θ) = e −θ , w 1 (θ) = log θ, t 1 (x) = x. e. h(x) = _ x−1 r −1 _ I {r,r+1,...} (x), c(p) = _ p 1−p _ r , w 1 (p) = log(1 −p), t 1 (x) = x. 3.29 a. For the n(µ, σ 2 ) f(x) = _ 1 _ _ e −µ 2 /2σ 2 σ _ _ e −x 2 /2σ 2 +xµ/σ 2 _ , so the natural parameter is (η 1 , η 2 ) = (−1/2σ 2 , µ/σ 2 ) with natural parameter space {(η 1 2 ):η 1 < 0, −∞< η 2 < ∞}. b. For the gamma(α, β), f(x) = _ 1 Γ(α)β α _ _ e (α−1) log x−x/β _ , so the natural parameter is (η 1 , η 2 ) = (α −1, −1/β) with natural parameter space {(η 1 2 ):η 1 > −1,η 2 < 0}. c. For the beta(α, β), f(x) = _ Γ(α+β) Γ(α)Γ(β) _ _ e (α−1) log x+(β−1) log(1−x) _ , so the natural parameter is (η 1 , η 2 ) = (α − 1, β − 1) and the natural parameter space is {(η 1 2 ):η 1 > −1,η 2 > −1}. d. For the Poisson f(x) = _ 1 x! _ _ e −θ _ e xlogθ so the natural parameter is η = log θ and the natural parameter space is {η:−∞< η < ∞}. e. For the negative binomial(r, p), r known, P(X = x) = _ r+x−1 x _ (p r ) _ e x log (1−p) _ , so the natural parameter is η = log(1 −p) with natural parameter space {η:η < 0}. 3.31 a. 0 = ∂θ _ h(x)c(θ) exp _ k i=1 w i (θ)t i (x) _ dx = _ h(x)c (θ) exp _ k i=1 w i (θ)t i (x) _ dx + _ h(x)c(θ) exp _ k i=1 w i (θ)t i (x) __ k i=1 ∂w i (θ) ∂θ j t i (x) _ dx = _ h(x) _ ∂θ j logc(θ) _ c(θ) exp _ k i=1 w i (θ)t i (x) _ dx + E _ k i=1 ∂w i (θ) ∂θ j t i (x) _ = ∂θ j logc(θ) + E _ k i=1 ∂w i (θ) ∂θ j t i (x) _ Therefore E _ k i=1 ∂w i (θ) ∂θ j t i (x) _ = − ∂θ j logc(θ). 3-12 Solutions Manual for Statistical Inference b. 0 = 2 ∂θ 2 _ h(x)c(θ) exp _ k i=1 w i (θ)t i (x) _ dx = _ h(x)c (θ) exp _ k i=1 w i (θ)t i (x) _ dx + _ h(x)c (θ) exp _ k i=1 w i (θ)t i (x) __ k i=1 ∂w i (θ) ∂θ j t i (x) _ dx + _ h(x)c (θ) exp _ k i=1 w i (θ)t i (x) __ k i=1 ∂w i (θ) ∂θ j t i (x) _ dx + _ h(x)c(θ) exp _ k i=1 w i (θ)t i (x) __ k i=1 ∂w i (θ) ∂θ j t i (x) _ 2 dx + _ h(x)c(θ) exp _ k i=1 w i (θ)t i (x) __ k i=1 2 w i (θ) ∂θ 2 j t i (x) _ dx = _ h(x) _ 2 ∂θ 2 j logc(θ) _ c(θ) exp _ k i=1 w i (θ)t i (x) _ dx + _ h(x) _ c (θ) c(θ) _ 2 c(θ) exp _ k i=1 w i (θ)t i (x) _ dx +2 _ ∂θ j logc(θ) _ E _ k i=1 ∂w i (θ) ∂θ j t i (x) _ +E _ ( k i=1 ∂w i (θ) ∂θ j t i (x)) 2 _ + E _ k i=1 2 w i (θ) ∂θ 2 j t i (x) _ = 2 ∂θ 2 j logc(θ) + _ ∂θ j logc(θ) _ 2 −2E _ k i=1 ∂w i (θ) ∂θ j t i (x) _ E _ k i=1 ∂w i (θ) ∂θ j t i (x) _ +E _ ( k i=1 ∂w i (θ) ∂θ j t i (x)) 2 _ + E _ k i=1 2 w i (θ) ∂θ 2 j t i (x) _ = 2 ∂θ 2 j logc(θ) + Var _ k i=1 ∂w i (θ) ∂θ j t i (x) _ + E _ k i=1 2 w i (θ) ∂θ 2 j t i (x) _ . Therefore Var _ k i=1 ∂w i (θ) ∂θ j t i (x) _ = − 2 ∂θ 2 j logc(θ) −E _ k i=1 2 w i (θ) ∂θ 2 j t i (x) _ . 3.33 a. (i) h(x) = e x I {−∞<x<∞} (x), c(θ) = 1 2πθ exp( −θ 2 )θ > 0, w 1 (θ) = 1 , t 1 (x) = −x 2 . (ii) The nonnegative real line. b. (i) h(x) = I {−∞<x<∞} (x), c(θ) = 1 2πaθ 2 exp( −1 2a ) −∞< θ < ∞, a > 0, w 1 (θ) = 1 2aθ 2 , w 2 (θ) = 1 , t 1 (x) = −x 2 , t 2 (x) = x. (ii) A parabola. Second Edition 3-13 c. (i) h(x) = 1 x I {0<x<∞} (x), c(α) = α α Γ(α) α > 0, w 1 (α) = α, w 2 (α) = α, t 1 (x) = log(x), t 2 (x) = −x. (ii) A line. d. (i) h(x) = C exp(x 4 )I {−∞<x<∞} (x), c(θ) = exp(θ 4 ) −∞< θ < ∞, w 1 (θ) = θ, w 2 (θ) = θ 2 , w 3 (θ) = θ 3 , t 1 (x) = −4x 3 , t 2 (x) = 6x 2 , t 3 (x) = −4x. (ii) The curve is a spiral in 3-space. (iii) A good picture can be generated with the Mathematica statement ParametricPlot3D[{t, t^2, t^3}, {t, 0, 1}, ViewPoint -> {1, -2, 2.5}]. 3.35 a. In Exercise 3.34(a) w 1 (λ) = 1 and for a n(e θ , e θ ), w 1 (θ) = 1 2e θ . b. EX = µ = αβ, then β = µ α . Therefore h(x) = 1 x I {0<x<∞} (x), c(α) = α α Γ(α)( µ α ) α , α > 0, w 1 (α) = α, w 2 (α) = α µ , t 1 (x) = log(x), t 2 (x) = −x. c. From (b) then (α 1 , . . . , α n , β 1 , . . . , β n ) = (α 1 , . . . , α n , α 1 µ , . . . , α n µ ) 3.37 The pdf ( 1 σ )f( (x−µ) σ ) is symmetric about µ because, for any > 0, 1 σ f _ (µ+)−µ σ _ = 1 σ f _ σ _ = 1 σ f _ σ _ = 1 σ f _ (µ−)−µ σ _ . Thus, by Exercise 2.26b, µ is the median. 3.38 P(X > x α ) = P(σZ + µ > σz α + µ) = P(Z > z α ) by Theorem 3.5.6. 3.39 First take µ = 0 and σ = 1. a. The pdf is symmetric about 0, so 0 must be the median. Verifying this, write P(Z ≥ 0) = _ 0 1 π 1 1+z 2 dz = 1 π tan −1 (z) ¸ ¸ ¸ ¸ 0 = 1 π _ π 2 −0 _ = 1 2 . b. P(Z ≥ 1) = 1 π tan −1 (z) ¸ ¸ 1 = 1 π _ π 2 π 4 _ = 1 4 . By symmetry this is also equal to P(Z ≤ −1). Writing z = (x −µ)/σ establishes P(X ≥ µ) = 1 2 and P(X ≥ µ + σ) = 1 4 . 3.40 Let X ∼ f(x) have mean µ and variance σ 2 . Let Z = X−µ σ . Then EZ = _ 1 σ _ E(X −µ) = 0 and VarZ = Var _ X −µ σ _ = _ 1 σ 2 _ Var(X −µ) = _ 1 σ 2 _ VarX = σ 2 σ 2 = 1. Then compute the pdf of Z, f Z (z) = f x (σz+µ)· σ = σf x (σz+µ) and use f Z (z) as the standard pdf. 3.41 a. This is a special case of Exercise 3.42a. b. This is a special case of Exercise 3.42b. 3.42 a. Let θ 1 > θ 2 . Let X 1 ∼ f(x −θ 1 ) and X 2 ∼ f(x −θ 2 ). Let F(z) be the cdf corresponding to f(z) and let Z ∼ f(z).Then F(x | θ 1 ) = P(X 1 ≤ x) = P(Z + θ 1 ≤ x) = P(Z ≤ x −θ 1 ) = F(x −θ 1 ) ≤ F(x −θ 2 ) = P(Z ≤ x −θ 2 ) = P(Z + θ 2 ≤ x) = P(X 2 ≤ x) = F(x | θ 2 ). 3-14 Solutions Manual for Statistical Inference The inequality is because x −θ 2 > x −θ 1 , and F is nondecreasing. To get strict inequality for some x, let (a, b] be an interval of length θ 1 −θ 2 with P(a < Z ≤ b) = F(b) −F(a) > 0. Let x = a + θ 1 . Then F(x | θ 1 ) = F(x −θ 1 ) = F(a + θ 1 −θ 1 ) = F(a) < F(b) = F(a + θ 1 −θ 2 ) = F(x −θ 2 ) = F(x | θ 2 ). b. Let σ 1 > σ 2 . Let X 1 ∼ f(x/σ 1 ) and X 2 ∼ f(x/σ 2 ). Let F(z) be the cdf corresponding to f(z) and let Z ∼ f(z). Then, for x > 0, F(x | σ 1 ) = P(X 1 ≤ x) = P(σ 1 Z ≤ x) = P(Z ≤ x/σ 1 ) = F(x/σ 1 ) ≤ F(x/σ 2 ) = P(Z ≤ x/σ 2 ) = P(σ 2 Z ≤ x) = P(X 2 ≤ x) = F(x | σ 2 ). The inequality is because x/σ 2 > x/σ 1 (because x > 0 and σ 1 > σ 2 > 0), and F is nondecreasing. For x ≤ 0, F(x | σ 1 ) = P(X 1 ≤ x) = 0 = P(X 2 ≤ x) = F(x | σ 2 ). To get strict inequality for some x, let (a, b] be an interval such that a > 0, b/a = σ 1 2 and P(a < Z ≤ b) = F(b) −F(a) > 0. Let x = aσ 1 . Then F(x | σ 1 ) = F(x/σ 1 ) = F(aσ 1 1 ) = F(a) < F(b) = F(aσ 1 2 ) = F(x/σ 2 ) = F(x | σ 2 ). 3.43 a. F Y (y|θ) = 1 −F X ( 1 y |θ) y > 0, by Theorem 2.1.3. For θ 1 > θ 2 , F Y (y|θ 1 ) = 1 −F X _ 1 y ¸ ¸ ¸ ¸ θ 1 _ ≤ 1 −F X _ 1 y ¸ ¸ ¸ ¸ θ 2 _ = F Y (y|θ 2 ) for all y, since F X (x|θ) is stochastically increasing and if θ 1 > θ 2 , F X (x|θ 2 ) ≤ F X (x|θ 1 ) for all x. Similarly, F Y (y|θ 1 ) = 1 − F X ( 1 y 1 ) < 1 − F X ( 1 y 2 ) = F Y (y|θ 2 ) for some y, since if θ 1 > θ 2 , F X (x|θ 2 ) < F X (x|θ 1 ) for some x. Thus F Y (y|θ) is stochastically decreasing in θ. b. F X (x|θ) is stochastically increasing in θ. If θ 1 > θ 2 and θ 1 , θ 2 > 0 then 1 θ 2 > 1 θ 1 . Therefore F X (x| 1 θ 1 ) ≤ F X (x| 1 θ 2 ) for all x and F X (x| 1 θ 1 ) < F X (x| 1 θ 2 ) for some x. Thus F X (x| 1 θ ) is stochastically decreasing in θ. 3.44 The function g(x) = |x| is a nonnegative function. So by Chebychev’s Inequality, P(|X| ≥ b) ≤ E|X|/b. Also, P(|X| ≥ b) = P(X 2 ≥ b 2 ). Since g(x) = x 2 is also nonnegative, again by Chebychev’s Inequality we have P(|X| ≥ b) = P(X 2 ≥ b 2 ) ≤ EX 2 /b 2 . For X ∼ exponential(1), E|X| = EX = 1 and EX 2 = VarX + (EX) 2 = 2 . For b = 3, E|X|/b = 1/3 > 2/9 = EX 2 /b 2 . Thus EX 2 /b 2 is a better bound. But for b = 2, E|X|/b = 1/ 2 < 1 = EX 2 /b 2 . Thus E|X|/b is a better bound. Second Edition 3-15 3.45 a. M X (t) = _ −∞ e tx f X (x)dx ≥ _ a e tx f X (x)dx ≥ e ta _ a f X (x)dx = e ta P(X ≥ a), where we use the fact that e tx is increasing in x for t > 0. b. M X (t) = _ −∞ e tx f X (x)dx ≥ _ a −∞ e tx f X (x)dx ≥ e ta _ a −∞ f X (x)dx = e ta P(X ≤ a), where we use the fact that e tx is decreasing in x for t < 0. c. h(t, x) must be nonnegative. 3.46 For X ∼ uniform(0, 1), µ = 1 2 and σ 2 = 1 12 , thus P(|X −µ| > kσ) = 1 −P _ 1 2 k 12 ≤ X ≤ 1 2 + k 12 _ = _ 1 − 2k 12 k < 3, 0 k ≥ 3, For X ∼ exponential(λ), µ = λ and σ 2 = λ 2 , thus P(|X −µ| > kσ) = 1 −P(λ −kλ ≤ X ≤ λ + kλ) = _ 1 + e −(k+1) −e k−1 k ≤ 1 e −(k+1) k > 1. From Example 3.6.2, Chebychev’s Inequality gives the bound P(|X −µ| > kσ) ≤ 1/k 2 . Comparison of probabilities k u(0, 1) exp(λ) Chebychev exact exact .1 .942 .926 100 .5 .711 .617 4 1 .423 .135 1 1.5 .134 .0821 .44 3 0 0.0651 .33 2 0 0.0498 .25 4 0 0.00674 .0625 10 0 0.0000167 .01 So we see that Chebychev’s Inequality is quite conservative. 3.47 P(|Z| > t) = 2P(Z > t) = 2 1 _ t e −x 2 /2 dx = _ 2 π _ t 1+x 2 1+x 2 e −x 2 /2 dx = _ 2 π __ t 1 1+x 2 e −x 2 /2 dx+ _ t x 2 1+x 2 e −x 2 /2 dx _ . 3-16 Solutions Manual for Statistical Inference To evaluate the second term, let u = x 1+x 2 , dv = xe −x 2 /2 dx, v = −e −x 2 /2 , du = 1−x 2 (1+x 2 ) 2 , to obtain _ t x 2 1 + x 2 e −x 2 /2 dx = x 1 + x 2 (−e −x 2 /2 ) ¸ ¸ ¸ ¸ t _ t 1 −x 2 (1 + x 2 ) 2 (−e −x 2 /2 )dx = t 1 + t 2 e −t 2 /2 + _ t 1 −x 2 (1 + x 2 ) 2 e −x 2 /2 dx. Therefore, P(Z ≥ t) = _ 2 π t 1 + t 2 e −t 2 /2 + _ 2 π _ t _ 1 1 + x 2 + 1 −x 2 (1 + x 2 ) 2 _ e −x 2 /2 dx = _ 2 π t 1 + t 2 e −t 2 /2 + _ 2 π _ t _ 2 (1 + x 2 ) 2 _ e −x 2 /2 dx _ 2 π t 1 + t 2 e −t 2 /2 . 3.48 For the negative binomial P(X = x + 1) = _ r + x + 1 −1 x + 1 _ p r (1 −p) x+1 = _ r + x x + 1 _ (1 −p)P(X = x). For the hypergeometric P(X = x + 1) = _ ¸ _ ¸ _ (M−x)(k−x+x+1)(x+1) P(X=x) if x < k, x < M, x ≥ M −(N −k) ( M x+1 )( N−M k−x−1 ) ( N k ) if x = M −(N −k) −1 0 otherwise. 3.49 a. E(g(X)(X −αβ)) = _ 0 g(x)(x −αβ) 1 Γ(α) β α x α−1 e −x/β dx. Let u = g(x), du = g (x), dv = (x −αβ)x α−1 e −x/β , v = −βx α e −x/β . Then Eg(X)(X −αβ) = 1 Γ(α)β α _ −g(x)βx α e −x/β ¸ ¸ ¸ 0 + β _ 0 g (x)x α e −x/β dx _ . Assuming g(x) to be diﬀerentiable, E|Xg (X)| < ∞ and lim x→∞ g(x)x α e −x/β = 0, the ﬁrst term is zero, and the second term is βE(Xg (X)). b. E _ g(X) _ β−(α−1) 1−X x __ = Γ(α+β) Γ(α)Γ(β) _ 1 0 g(x) _ β −(α −1) 1−x x _ x α−1 (1 −x) β−1 dx. Let u = g(x) and dv = (β −(α −1) 1−x x )x α−1 (1 −x) β . The expectation is Γ(α + β) Γ(α)Γ(β) _ g(x)x α−1 (1 −x) β ¸ ¸ 1 0 + _ 1 0 (1 −x)g (x)x α−1 (1 −x) β−1 dx _ = E((1 −X)g (X)), assuming the ﬁrst term is zero and the integral exists. Second Edition 3-17 3.50 The proof is similar to that of part a) of Theorem 3.6.8. For X ∼ negative binomial(r, p), Eg(X) = x=0 g(x) _ r + x −1 x _ p r (1 −p) x = y=1 g(y −1) _ r + y −2 y −1 _ p r (1 −p) y−1 (set y = x + 1) = y=1 g(y −1) _ y r + y −1 __ r + y −1 y _ p r (1 −p) y−1 = y=0 _ y r + y −1 g(y −1) 1 −p _ __ r + y −1 y _ p r (1 −p) y _ (the summand is zero at y = 0) = E _ X r + X −1 g(X −1) 1 −p _ , where in the third equality we use the fact that _ r+y−2 y−1 _ = _ y r+y−1 _ _ r+y−1 y _ . Chapter 4 Multiple Random Variables 4.1 Since the distribution is uniform, the easiest way to calculate these probabilities is as the ratio of areas, the total area being 4. a. The circle x 2 +y 2 ≤ 1 has area π, so P(X 2 +Y 2 ≤ 1) = π 4 . b. The area below the line y = 2x is half of the area of the square, so P(2X −Y > 0) = 2 4 . c. Clearly P(|X +Y | < 2) = 1. 4.2 These are all fundamental properties of integrals. The proof is the same as for Theorem 2.2.5 with bivariate integrals replacing univariate integrals. 4.3 For the experiment of tossing two fair dice, each of the points in the 36-point sample space are equally likely. So the probability of an event is (number of points in the event)/36. The given probabilities are obtained by noting the following equivalences of events. P({X = 0, Y = 0}) = P({(1, 1), (2, 1), (1, 3), (2, 3), (1, 5), (2, 5)}) = 6 36 = 1 6 P({X = 0, Y = 1}) = P({(1, 2), (2, 2), (1, 4), (2, 4), (1, 6), (2, 6)}) = 6 36 = 1 6 P({X = 1, Y = 0}) = P({(3, 1), (4, 1), (5, 1), (6, 1), (3, 3), (4, 3), (5, 3), (6, 3), (3, 5), (4, 5), (5, 5), (6, 5)}) = 12 36 = 1 3 P({X = 1, Y = 1}) = P({(3, 2), (4, 2), (5, 2), (6, 2), (3, 4), (4, 4), (5, 4), (6, 4), (3, 6), (4, 6), (5, 6), (6, 6)}) = 12 36 = 1 3 4.4 a. _ 1 0 _ 2 0 C(x + 2y)dxdy = 4C = 1, thus C = 1 4 . b. f X (x) = _ _ 1 0 1 4 (x + 2y)dy = 1 4 (x + 1) 0 < x < 2 0 otherwise c. F XY (x, y) = P(X ≤ x, Y ≤ y) = _ x −∞ _ y −∞ f(v, u)dvdu. The way this integral is calculated depends on the values of x and y. For example, for 0 < x < 2 and 0 < y < 1, F XY (x, y) = _ x −∞ _ y −∞ f(u, v)dvdu = _ x 0 _ y 0 1 4 (u + 2v)dvdu = x 2 y 8 + y 2 x 4 . But for 0 < x < 2 and 1 ≤ y, F XY (x, y) = _ x −∞ _ y −∞ f(u, v)dvdu = _ x 0 _ 1 0 1 4 (u + 2v)dvdu = x 2 8 + x 4 . 4-2 Solutions Manual for Statistical Inference The complete deﬁnition of F XY is F XY (x, y) = _ ¸ ¸ ¸ _ ¸ ¸ ¸ _ 0 x ≤ 0 or y ≤ 0 x 2 y/8 +y 2 x/4 0 < x < 2 and 0 < y < 1 y/2 +y 2 /2 2 ≤ x and 0 < y < 1 x 2 /8 +x/4 0 < x < 2 and 1 ≤ y 1 2 ≤ x and 1 ≤ y . d. The function z = g(x) = 9/(x + 1) 2 is monotone on 0 < x < 2, so use Theorem 2.1.5 to obtain f Z (z) = 9/(8z 2 ), 1 < z < 9. 4.5 a. P(X > Y ) = _ 1 0 _ 1 y (x +y)dxdy = 7 20 . b. P(X 2 < Y < X) = _ 1 0 _ y y 2xdxdy = 1 6 . 4.6 Let A = time that A arrives and B = time that B arrives. The random variables A and B are independent uniform(1, 2) variables. So their joint pdf is uniform on the square (1, 2) ×(1, 2). Let X = amount of time A waits for B. Then, F X (x) = P(X ≤ x) = 0 for x < 0, and F X (x) = P(X ≤ x) = 1 for 1 ≤ x. For x = 0, we have F X (0) = P(X ≤ 0) = P(X = 0) = P(B ≤ A) = _ 2 1 _ a 1 1dbda = 1 2 . And for 0 < x < 1, F X (x) = P(X ≤ x) = 1−P(X > x) = 1−P(B−A > x) = 1− _ 2−x 1 _ 2 a+x 1dbda = 1 2 +x− x 2 2 . 4.7 We will measure time in minutes past 8 A.M. So X ∼ uniform(0, 30), Y ∼ uniform(40, 50) and the joint pdf is 1/300 on the rectangle (0, 30) ×(40, 50). P(arrive before 9 A.M.) = P(X +Y < 60) = _ 50 40 _ 60−y 0 1 300 dxdy = 1 2 . 4.9 P(a ≤ X ≤ b, c ≤ Y ≤ d) = P(X ≤ b, c ≤ Y ≤ d) −P(X ≤ a, c ≤ Y ≤ d) = P(X ≤ b, Y ≤ d) −P(X ≤ b, Y ≤ c) −P(X ≤ a, Y ≤ d) +P(X ≤ a, Y ≤ c) = F(b, d) −F(b, c) −F(a, d) −F(a, c) = F X (b)F Y (d) −F X (b)F Y (c) −F X (a)F Y (d) −F X (a)F Y (c) = P(X ≤ b) [P(Y ≤ d) −P(Y ≤ c)] −P(X ≤ a) [P(Y ≤ d) −P(Y ≤ c)] = P(X ≤ b)P(c ≤ Y ≤ d) −P(X ≤ a)P(c ≤ Y ≤ d) = P(a ≤ X ≤ b)P(c ≤ Y ≤ d). 4.10 a. The marginal distribution of X is P(X = 1) = P(X = 3) = 1 4 and P(X = 2) = 1 2 . The marginal distribution of Y is P(Y = 2) = P(Y = 3) = P(Y = 4) = 1 3 . But P(X = 2, Y = 3) = 0 = ( 1 2 )( 1 3 ) = P(X = 2)P(Y = 3). Therefore the random variables are not independent. b. The distribution that satisﬁes P(U = x, V = y) = P(U = x)P(V = y) where U ∼ X and V ∼ Y is Second Edition 4-3 U 1 2 3 2 1 12 1 6 1 12 V 3 1 12 1 6 1 12 4 1 12 1 6 1 12 4.11 The support of the distribution of (U, V ) is {(u, v) : u = 1, 2, . . . ; v = u + 1, u + 2, . . .}. This is not a cross-product set. Therefore, U and V are not independent. More simply, if we know U = u, then we know V > u. 4.12 One interpretation of “a stick is broken at random into three pieces” is this. Suppose the length of the stick is 1. Let X and Y denote the two points where the stick is broken. Let X and Y both have uniform(0, 1) distributions, and assume X and Y are independent. Then the joint distribution of X and Y is uniform on the unit square. In order for the three pieces to form a triangle, the sum of the lengths of any two pieces must be greater than the length of the third. This will be true if and only if the length of each piece is less than 1/2. To calculate the probability of this, we need to identify the sample points (x, y) such that the length of each piece is less than 1/2. If y > x, this will be true if x < 1/2, y − x < 1/2 and 1 − y < 1/2. These three inequalities deﬁne the triangle with vertices (0, 1/2), (1/2, 1/2) and (1/2, 1). (Draw a graph of this set.) Because of the uniform distribution, the probability that (X, Y ) falls in the triangle is the area of the triangle, which is 1/8. Similarly, if x > y, each piece will have length less than 1/2 if y < 1/2, x − y < 1/2 and 1 −x < 1/2. These three inequalities deﬁne the triangle with vertices (1/2, 0), (1/2, 1/2) and (1, 1/2). The probability that (X, Y ) is in this triangle is also 1/8. So the probability that the pieces form a triangle is 1/8 + 1/8 = 1/4. 4.13 a. E(Y −g(X)) 2 = E((Y −E(Y | X)) + (E(Y | X) −g(X))) 2 = E(Y −E(Y | X)) 2 + E(E(Y | X) −g(X)) 2 + 2E[(Y −E(Y | X))(E(Y | X) −g(X))] . The cross term can be shown to be zero by iterating the expectation. Thus E(Y −g(X)) 2 = E(Y −E(Y | X)) 2 +E(E(Y | X) −g(X)) 2 ≥ E(Y −E(Y | X)) 2 , for all g(·). The choice g(X) = E(Y | X) will give equality. b. Equation (2.2.3) is the special case of a) where we take the random variable X to be a constant. Then, g(X) is a constant, say b, and E(Y | X) = EY . 4.15 We will ﬁnd the conditional distribution of Y |X +Y . The derivation of the conditional distri- bution of X|X + Y is similar. Let U = X + Y and V = Y . In Example 4.3.1, we found the joint pmf of (U, V ). Note that for ﬁxed u, f(u, v) is positive for v = 0, . . . , u. Therefore the conditional pmf is f(v|u) = f(u, v) f(u) = θ u−v e −θ (u−v)! λ v e −λ v! (θ+λ) u e −(θ+λ) u! = _ u v __ λ θ+λ _ v _ θ θ+λ _ u−v , v = 0, . . . , u. That is V |U ∼ binomial(U, λ/(θ +λ)). 4.16 a. The support of the distribution of (U, V ) is {(u, v) : u = 1, 2, . . . ; v = 0, ±1, ±2, . . .}. If V > 0, then X > Y . So for v = 1, 2, . . ., the joint pmf is f U,V (u, v) = P(U = u, V = v) = P(Y = u, X = u +v) = p(1 −p) u+v−1 p(1 −p) u−1 = p 2 (1 −p) 2u+v−2 . 4-4 Solutions Manual for Statistical Inference If V < 0, then X < Y . So for v = −1, −2, . . ., the joint pmf is f U,V (u, v) = P(U = u, V = v) = P(X = u, Y = u −v) = p(1 −p) u−1 p(1 −p) u−v−1 = p 2 (1 −p) 2u−v−2 . If V = 0, then X = Y . So for v = 0, the joint pmf is f U,V (u, 0) = P(U = u, V = 0) = P(X = Y = u) = p(1 −p) u−1 p(1 −p) u−1 = p 2 (1 −p) 2u−2 . In all three cases, we can write the joint pmf as f U,V (u, v) = p 2 (1 −p) 2u+|v|−2 = _ p 2 (1 −p) 2u _ (1 −p) |v|−2 , u = 1, 2, . . . ; v = 0, ±1, ±2, . . . . Since the joint pmf factors into a function of u and a function of v, U and V are independent. b. The possible values of Z are all the fractions of the form r/s, where r and s are positive integers and r < s. Consider one such value, r/s, where the fraction is in reduced form. That is, r and s have no common factors. We need to identify all the pairs (x, y) such that x and y are positive integers and x/(x + y) = r/s. All such pairs are (ir, i(s − r)), i = 1, 2, . . .. Therefore, P _ Z = r s _ = i=1 P(X = ir, Y = i(s −r)) = i=1 p(1 −p) ir−1 p(1 −p) i(s−r)−1 = p 2 (1 −p) 2 i=1 ((1 −p) s ) i = p 2 (1 −p) 2 (1 −p) s 1−(1 −p) s = p 2 (1 −p) s−2 1−(1 −p) s . c. P(X = x, X +Y = t) = P(X = x, Y = t −x) = P(X = x)P(Y = t −x) = p 2 (1 −p) t−2 . 4.17 a. P(Y = i + 1) = _ i+1 i e −x dx = e −i (1 −e −1 ), which is geometric with p = 1 −e −1 . b. Since Y ≥ 5 if and only if X ≥ 4, P(X −4 ≤ x|Y ≥ 5) = P(X −4 ≤ x|X ≥ 4) = P(X ≤ x) = e −x , since the exponential distribution is memoryless. 4.18 We need to show f(x, y) is nonnegative and integrates to 1. f(x, y) ≥ 0, because the numerator is nonnegative since g(x) ≥ 0, and the denominator is positive for all x > 0, y > 0. Changing to polar coordinates, x = r cos θ and y = r sin θ, we obtain _ 0 _ 0 f(x, y)dxdy = _ π/2 0 _ 0 2g(r) πr rdrdθ = 2 π _ π/2 0 _ 0 g(r)drdθ = 2 π _ π/2 0 1dθ = 1. 4.19 a. Since (X 1 −X 2 ) _ 2 ∼ n(0, 1), (X 1 −X 2 ) 2 _ 2 ∼ χ 2 1 (see Example 2.1.9). b. Make the transformation y 1 = x 1 x 1 +x 2 , y 2 = x 1 + x 2 then x 1 = y 1 y 2 , x 2 = y 2 (1 − y 1 ) and |J| = y 2 . Then f(y 1 , y 2 ) = _ Γ(α 1 2 ) Γ(α 1 )Γ(α 2 ) y α 1 −1 1 (1 −y 1 ) α 2 −1 _ _ 1 Γ(α 1 2 ) y α 1 2 −1 2 e −y 2 _ , thus Y 1 ∼ beta(α 1 , α 2 ), Y 2 ∼ gamma(α 1 1 , 1) and are independent. Second Edition 4-5 4.20 a. This transformation is not one-to-one because you cannot determine the sign of X 2 from Y 1 and Y 2 . So partition the support of (X 1 , X 2 ) into A 0 = {−∞ < x 1 < ∞, x 2 = 0}, A 1 = {−∞< x 1 < ∞, x 2 > 0} and A 2 = {−∞< x 1 < ∞, x 2 < 0}. The support of (Y 1 , Y 2 ) is B = {0 < y 1 < ∞, −1 < y 2 < 1}. The inverse transformation from B to A 1 is x 1 = y 2 y 1 and x 2 = _ y 1 −y 1 y 2 2 with Jacobian J 1 = ¸ ¸ ¸ ¸ ¸ ¸ 1 2 y 2 y 1 y 1 1 2 1−y 2 2 y 1 y 2 y 1 1−y 2 2 ¸ ¸ ¸ ¸ ¸ ¸ = 1 2 _ 1 −y 2 2 . The inverse transformation from B to A 2 is x 1 = y 2 y 1 and x 2 = − _ y 1 −y 1 y 2 2 with J 2 = −J 1 . From (4.3.6), f Y 1 ,Y 2 (y 1 , y 2 ) is the sum of two terms, both of which are the same in this case. Then f Y 1 ,Y 2 (y 1 , y 2 ) = 2 _ 1 2πσ 2 e −y 1 /(2σ 2 ) 1 2 _ 1−y 2 2 _ = 1 2πσ 2 e −y 1 /(2σ 2 ) 1 _ 1−y 2 2 , 0 < y 1 < ∞, −1 < y 2 < 1. b. We see in the above expression that the joint pdf factors into a function of y 1 and a function of y 2 . So Y 1 and Y 2 are independent. Y 1 is the square of the distance from (X 1 , X 2 ) to the origin. Y 2 is the cosine of the angle between the positive x 1 -axis and the line from (X 1 , X 2 ) to the origin. So independence says the distance from the origin is independent of the orientation (as measured by the angle). 4.21 Since R and θ are independent, the joint pdf of T = R 2 and θ is f T,θ (t, θ) = 1 e −t/2 , 0 < t < ∞, 0 < θ < 2π. Make the transformation x = t cos θ, y = t sin θ. Then t = x 2 +y 2 , θ = tan −1 (y/x), and J = ¸ ¸ ¸ ¸ 2x 2y −y x 2 +y 2 −x x 2 +y 2 ¸ ¸ ¸ ¸ = 2. Therefore f X,Y (x, y) = 2 e 1 2 (x 2 +y 2 ) , 0 < x 2 +y 2 < ∞, 0 < tan −1 y/x < 2π. Thus, f X,Y (x, y) = 1 e 1 2 (x 2 +y 2 ) , −∞< x, y < ∞. So X and Y are independent standard normals. 4.23 a. Let y = v, x = u/y = u/v then J = ¸ ¸ ¸ ¸ ∂x ∂u ∂x ∂v ∂y ∂u ∂y ∂v ¸ ¸ ¸ ¸ = ¸ ¸ ¸ ¸ 1 v u v 2 0 1 ¸ ¸ ¸ ¸ = 1 v . f U,V (u, v) = Γ(α+β) Γ(α)Γ(β) Γ(α+β+γ) Γ(α+β)Γ(γ) _ u v _ α−1 _ 1 − u v _ β−1 v α+β−1 (1−v) γ−1 1 v , 0 < u < v < 1. 4-6 Solutions Manual for Statistical Inference Then, f U (u) = Γ(α+β+γ) Γ(α)Γ(β)Γ(γ) u α−1 _ 1 u v β−1 (1 −v) γ−1 ( v−u v ) β−1 dv = Γ(α+β+γ) Γ(α)Γ(β)Γ(γ) u α−1 (1 −u) β+γ−1 _ 1 0 y β−1 (1 −y) γ−1 dy _ y = v −u 1−u , dy = dv 1−u _ = Γ(α+β+γ) Γ(α)Γ(β)Γ(γ) u α−1 (1 −u) β+γ−1 Γ(β)Γ(γ) Γ(β+γ) = Γ(α+β+γ) Γ(α)Γ(β+γ) u α−1 (1 −u) β+γ−1 , 0 < u < 1. Thus, U ∼ gamma(α, β +γ). b. Let x = uv, y = _ u v then J = ¸ ¸ ¸ ¸ ∂x ∂u ∂x ∂v ∂y ∂u ∂x ∂v ¸ ¸ ¸ ¸ = ¸ ¸ ¸ ¸ 1 2 v 1/2 u −1/2 1 2 u 1/2 v −1/2 1 2 v −1/2 u −1/2 1 2 u 1/2 v −3/2 ¸ ¸ ¸ ¸ = 1 2v . f U,V (u, v) = Γ(α +β +γ) Γ(α)Γ(β)Γ(γ) ( uv α−1 (1 − uv) β−1 __ u v _ α+β−1 _ 1 − _ u v _ γ−1 1 2v . The set {0 < x < 1, 0 < y < 1} is mapped onto the set {0 < u < v < 1 u , 0 < u < 1}. Then, f U (u) = _ 1/u u f U,V (u, v)dv = Γ(α +β +γ) Γ(α)Γ(β)Γ(γ) u α−1 (1−u) β+γ−1 . ¸¸ . _ 1/u u _ 1 − uv 1 −u _ β−1 _ 1 − _ u/v 1 −u _ γ−1 ( _ u/v) β 2v(1 −u) dv. Call it A To simplify, let z = u/v−u 1−u . Then v = u ⇒z = 1, v = 1/u ⇒z = 0 and dz = − u/v 2(1−u)v dv. Thus, f U (u) = A _ z β−1 (1 −z) γ−1 dz ( kernel of beta(β, γ)) = Γ(α+β+γ) Γ(α)Γ(β)Γ(γ) u α−1 (1 −u) β+γ−1 Γ(β)Γ(γ) Γ(β+γ) = Γ(α+β+γ) Γ(α)Γ(β+γ) u α−1 (1 −u) β+γ−1 , 0 < u < 1. That is, U ∼ beta(α, β +γ), as in a). 4.24 Let z 1 = x +y, z 2 = x x+y , then x = z 1 z 2 , y = z 1 (1 −z 2 ) and |J| = ¸ ¸ ¸ ¸ ¸ ∂x ∂z 1 ∂x ∂z 2 ∂y ∂z 1 ∂y ∂z 2 ¸ ¸ ¸ ¸ ¸ = ¸ ¸ ¸ ¸ z 2 z 1 1−z 2 −z 1 ¸ ¸ ¸ ¸ = z 1 . The set {x > 0, y > 0} is mapped onto the set {z 1 > 0, 0 < z 2 < 1}. f Z 1 ,Z 2 (z 1 , z 2 ) = 1 Γ(r) (z 1 z 2 ) r−1 e −z 1 z 2 · 1 Γ(s) (z 1 −z 1 z 2 ) s−1 e −z 1 +z 1 z 2 z 1 = 1 Γ(r+s) z r+s−1 1 e −z 1 · Γ(r+s) Γ(r)Γ(s) z r−1 2 (1 −z 2 ) s−1 , 0 < z 1 , 0 < z 2 < 1. Second Edition 4-7 f Z 1 ,Z 2 (z 1 , z 2 ) can be factored into two densities. Therefore Z 1 and Z 2 are independent and Z 1 ∼ gamma(r +s, 1), Z 2 ∼ beta(r, s). 4.25 For X and Z independent, and Y = X +Z, f XY (x, y) = f X (x)f Z (y −x). In Example 4.5.8, f XY (x, y) = I (0,1) (x) 1 10 I (0,1/10) (y −x). In Example 4.5.9, Y = X 2 +Z and f XY (x, y) = f X (x)f Z (y −x 2 ) = 1 2 I (−1,1) (x) 1 10 I (0,1/10) (y −x 2 ). 4.26 a. P(Z ≤ z, W = 0) = P(min(X, Y ) ≤ z, Y ≤ X) = P(Y ≤ z, Y ≤ X) = _ z 0 _ y 1 λ e −x/λ 1 µ e −y/µ dxdy = λ µ+λ _ 1 −exp _ _ 1 µ + 1 λ _ z __ . Similarly, P(Z ≤ z,W=1) = P(min(X, Y ) ≤ z, X ≤ Y ) = P(X ≤ z, X ≤ Y ) = _ z 0 _ x 1 λ e −x/λ 1 µ e −y/µ dydx = µ µ+λ _ 1 −exp _ _ 1 µ + 1 λ _ z __ . b. P(W = 0) = P(Y ≤ X) = _ 0 _ y 1 λ e −x/λ 1 µ e −y/µ dxdy = λ µ+λ . P(W = 1) = 1 −P(W = 0) = µ µ+λ . P(Z ≤ z) = P(Z ≤ z, W = 0) +P(Z ≤ z, W = 1) = 1 −exp _ _ 1 µ + 1 λ _ z _ . Therefore, P(Z ≤ z, W = i) = P(Z ≤ z)P(W = i), for i = 0, 1, z > 0. So Z and W are independent. 4.27 From Theorem 4.2.14 we know U ∼ n(µ + γ, 2σ 2 ) and V ∼ n(µ − γ, 2σ 2 ). It remains to show that they are independent. Proceed as in Exercise 4.24. f XY (x, y) = 1 2πσ 2 e 1 2 [(x−µ) 2 +(y−γ) 2 ] (by independence, sof XY = f X f Y ) Let u = x +y, v = x −y, then x = 1 2 (u +v), y = 1 2 (u −v) and |J| = ¸ ¸ ¸ ¸ 1/2 1/2 1/2 −1/2 ¸ ¸ ¸ ¸ = 1 2 . The set {−∞< x < ∞, −∞< y < ∞} is mapped onto the set {−∞< u < ∞, −∞< v < ∞}. Therefore f UV (u, v) = 1 2πσ 2 e 1 2 _ (( u+v 2 )−µ) 2 +(( u−v 2 )−γ) 2 ¸ · 1 2 = 1 4πσ 2 e 1 2 _ 2( u 2 ) 2 −u(µ+γ)+ (µ+γ) 2 2 +2( v 2 ) 2 −v(µ−γ)+ (µ+γ) 2 2 _ = g(u) 1 4πσ 2 e 1 2(2σ 2 ) (u −(µ +γ)) 2 · h(v)e 1 2(2σ 2 ) (v −(µ −γ)) 2 . By the factorization theorem, U and V are independent. 4-8 Solutions Manual for Statistical Inference 4.29 a. X Y = Rcos θ Rsin θ = cot θ. Let Z = cot θ. Let A 1 = (0, π), g 1 (θ) = cot θ, g −1 1 (z) = cot −1 z, A 2 = (π, 2π), g 2 (θ) = cot θ, g −1 2 (z) = π + cot −1 z. By Theorem 2.1.8 f Z (z) = 1 | −1 1 +z 2 | + 1 | −1 1 +z 2 | = 1 π 1 1 +z 2 , −∞< z < ∞. b. XY = R 2 cos θ sin θ then 2XY = R 2 2 cos θ sin θ = R 2 sin 2θ. Therefore 2XY R = Rsin 2θ. Since R = X 2 +Y 2 then 2XY X 2 +Y 2 = Rsin 2θ. Thus 2XY X 2 +Y 2 is distributed as sin 2θ which is distributed as sin θ. To see this let sin θ ∼ f sin θ . For the function sin 2θ the values of the function sin θ are repeated over each of the 2 intervals (0, π) and (π, 2π) . Therefore the distribution in each of these intervals is the distribution of sin θ. The probability of choosing between each one of these intervals is 1 2 . Thus f 2 sin θ = 1 2 f sin θ + 1 2 f sin θ = f sin θ . Therefore 2XY X 2 +Y 2 has the same distribution as Y = sin θ. In addition, 2XY X 2 +Y 2 has the same distribution as X = cos θ since sin θ has the same distribution as cos θ. To see this let consider the distribution of W = cos θ and V = sin θ where θ ∼ uniform(0, 2π). To derive the distribution of W = cos θ let A 1 = (0, π), g 1 (θ) = cos θ, g −1 1 (w) = cos −1 w, A 2 = (π, 2π), g 2 (θ) = cos θ, g −1 2 (w) = 2π −cos −1 w. By Theorem 2.1.8 f W (w) = 1 | −1 1 −w 2 | + 1 | 1 1 −w 2 | = 1 π 1 1 −w 2 , −1 ≤ w ≤ 1. To derive the distribution of V = sin θ, ﬁrst consider the interval ( π 2 , 2 ). Let g 1 (θ) = sin θ, 4g −1 1 (v) = π −sin −1 v, then f V (v) = 1 π 1 1 −v 2 , −1 ≤ v ≤ 1. Second, consider the set {(0, π 2 ) ∪( 2 , 2π)}, for which the function sin θ has the same values as it does in the interval ( −π 2 , π 2 ). Therefore the distribution of V in {(0, π 2 ) ∪ ( 2 , 2π)} is the same as the distribution of V in ( −π 2 , π 2 ) which is 1 π 1 1−v 2 , −1 ≤ v ≤ 1. On (0, 2π) each of the sets ( π 2 , 2 ), {(0, π 2 ) ∪ ( 2 , 2π)} has probability 1 2 of being chosen. Therefore f V (v) = 1 2 1 π 1 1 −v 2 + 1 2 1 π 1 1 −v 2 = 1 π 1 1 −v 2 , −1 ≤ v ≤ 1. Thus W and V has the same distribution. Let X and Y be iid n(0, 1). Then X 2 + Y 2 ∼ χ 2 2 is a positive random variable. Therefore with X = Rcos θ and Y = Rsin θ, R = X 2 +Y 2 is a positive random variable and θ = tan −1 ( Y X ) ∼ uniform(0, 1). Thus 2XY X 2 +Y 2 ∼ X ∼ n(0, 1). 4.30 a. EY = E{E(Y |X)} = EX = 1 2 . VarY = Var (E(Y |X)) + E(Var(Y |X)) = VarX + EX 2 = 1 12 + 1 3 = 5 12 . EXY = E[E(XY |X)] = E[XE(Y |X)] = EX 2 = 1 3 Cov(X, Y ) = EXY −EXEY = 1 3 _ 1 2 _ 2 = 1 12 . b. The quick proof is to note that the distribution of Y |X = x is n(1, 1), hence is independent of X. The bivariate transformation t = y/x, u = x will also show that the joint density factors. Second Edition 4-9 4.31 a. EY = E{E(Y |X)} = EnX = n 2 . VarY = Var (E(Y |X)) + E(Var(Y |X)) = Var(nX) + EnX(1 −X) = n 2 12 + n 6 . b. P(Y = y, X ≤ x) = _ n y _ x y (1 −x) n−y , y = 0, 1, . . . , n, 0 < x < 1. c. P(y = y) = _ n y _ Γ(y + 1)Γ(n −y + 1) Γ(n + 2) . 4.32 a. The pmf of Y , for y = 0, 1, . . ., is f Y (y) = _ 0 f Y (y|λ)f Λ (λ)dλ = _ 0 λ y e −λ y! 1 Γ(α)β α λ α−1 e −λ/β = 1 y!Γ(α)β α _ 0 λ (y+α)−1 exp _ _ _ −λ _ β 1+β _ _ _ _ = 1 y!Γ(α)β α Γ(y +α) _ β 1+β _ y+α . If α is a positive integer, f Y (y) = _ y +α −1 y __ β 1+β _ y _ 1 1+β _ α , the negative binomial(α, 1/(1 +β)) pmf. Then EY = E(E(Y |Λ)) = EΛ = αβ VarY = Var(E(Y |Λ)) + E(Var(Y |Λ)) = VarΛ + EΛ = αβ 2 +αβ = αβ(β + 1). b. For y = 0, 1, . . ., we have P(Y = y|λ) = n=y P(Y = y|N = n, λ)P(N = n|λ) = n=y _ n y _ p y (1 −p) n−y e −λ λ n n! = n=y 1 y!(n −y)! _ p 1−p _ y [(1 −p)λ] n e −λ = e −λ m=0 1 y!m! _ p 1−p _ y [(1 −p)λ] m+y (let m = n −y) = e −λ y! _ p 1−p _ y [(1 −p)λ] y _ m=0 [(1−p)λ] m m! _ = e −λ (pλ) y e (1−p)λ = (pλ) y e −pλ y! , 4-10 Solutions Manual for Statistical Inference the Poisson(pλ) pmf. Thus Y |Λ ∼ Poisson(pλ). Now calculations like those in a) yield the pmf of Y , for y = 0, 1, . . ., is f Y (y) = 1 Γ(α)y!(pβ) α Γ(y +α) _ 1+pβ _ y+α . Again, if α is a positive integer, Y ∼ negative binomial(α, 1/(1 +pβ)). 4.33 We can show that H has a negative binomial distribution by computing the mgf of H. Ee Ht = EE _ e Ht ¸ ¸ N _ = EE _ e (X 1 +···+X N )t ¸ ¸ ¸ N _ = E _ _ E _ e X 1 t ¸ ¸ N N _ , because, by Theorem 4.6.7, the mgf of a sum of independent random variables is equal to the product of the individual mgfs. Now, Ee X 1 t = x 1 =1 e x 1 t −1 logp (1 −p) x 1 x 1 = −1 logp x 1 =1 (e t (1 −p)) x 1 x 1 = −1 logp _ −log _ 1−e t (1 −p) __ . Then E _ log {1−e t (1 −p)} logp _ N = n=0 _ log {1−e t (1 −p)} logp _ n e −λ λ n n! (since N ∼ Poisson) = e −λ e λlog(1−e t (1−p)) logp n=0 e −λlog(1−e t (1−p)) logp _ λlog(1−e t (1−p)) logp _ n n! . The sum equals 1. It is the sum of a Poisson _ [λlog(1 −e t (1 −p))]/[logp] _ pmf. Therefore, E(e Ht ) = e −λ _ e log(1−e t (1−p)) _ λ/ log p = _ e logp _ −λ/ logp _ 1 1−e t (1 −p) _ −λ/ log p = _ p 1−e t (1 −p) _ −λ/ logp . This is the mgf of a negative binomial(r, p), with r = −λ/ log p, if r is an integer. 4.34 a. P(Y = y) = _ 1 0 P(Y = y|p)f p (p)dp = _ 1 0 _ n y _ p y (1 −p) n−y 1 B(α, β) p α−1 (1 −p) β−1 dp = _ n y _ Γ(α+β) Γ(α)Γ(β) _ 1 0 p y+α−1 (1 −p) n+β−y−1 dp = _ n y _ Γ(α+β) Γ(α)Γ(β) Γ(y+α)Γ(n+β−y) Γ(α+n+β) , y = 0, 1, . . . , n. b. P(X = x) = _ 1 0 P(X = x|p)f P (p)dp = _ 1 0 _ r +x −1 x _ p r (1 −p) x Γ(α +β) Γ(α)Γ(β) p α−1 (1 −p) β−1 dp Second Edition 4-11 = _ r +x −1 x _ Γ(α +β) Γ(α)Γ(β) _ 1 0 p (r+α)−1 (1 −p) (x+β)−1 dp = _ r +x −1 x _ Γ(α +β) Γ(α)Γ(β) Γ(r +α)Γ(x +β) Γ(r +x +α +β) x = 0, 1, . . . Therefore, EX = E[E(X|P)] = E _ r(1 −P) P _ = α −1 , since E _ 1 −P P _ = _ 1 0 _ 1 −P P _ Γ(α +β) Γ(α)Γ(β) p α−1 (1 −p) β−1 dp = Γ(α +β) Γ(α)Γ(β) _ 1 0 p (α−1)−1 (1 −p) (β+1)−1 dp = Γ(α +β) Γ(α)Γ(β) Γ(α −1)Γ(β + 1) Γ(α +β) = β α −1 . Var(X) = E(Var(X|P)) + Var(E(X|P)) = E _ r(1 −P) P 2 _ + Var _ r(1 −P) P _ = r (β + 1)(α +β) α(α −1) +r 2 β(α +β −1) (α −1) 2 (α −2) , since E _ 1 −P P 2 _ = _ 1 0 Γ(α +β) Γ(α)Γ(β) p (α−2)−1 (1 −p) (β+1)−1 dp = Γ(α +β) Γ(α)Γ(β) Γ(α −2)Γ(β + 1) Γ(α +β −1) = (β + 1)(α +β) α(α −1) and Var _ 1 −P P _ = E _ _ 1 −P P _ 2 _ _ E _ 1 −P P __ 2 = β(β + 1) (α −2)(α −1) −( β α −1 ) 2 = β(α +β −1) (α −1) 2 (α −2) , where E _ _ 1 −P P _ 2 _ = _ 1 0 Γ(α +β) Γ(α)Γ(β) p (α−2)−1 (1 −p) (β+2)−1 dp = Γ(α +β) Γ(α)Γ(β) Γ(α −2)Γ(β + 2) Γ(α −2 +β + 2) = β(β + 1) (α −2)(α −1) . 4.35 a. Var(X) = E(Var(X|P)) + Var(E(X|P)). Therefore, Var(X) = E[nP(1 −P)] + Var(nP) = n αβ (α +β)(α +β + 1) +n 2 VarP = n αβ(α +β + 1 −1) (α +β 2 )(α +β + 1) +n 2 VarP 4-12 Solutions Manual for Statistical Inference = nαβ(α +β + 1) (α +β 2 )(α +β + 1) nαβ (α +β 2 )(α +β + 1) +n 2 VarP = n α α +β β α +β −nVarP +n 2 VarP = nEP(1 −EP) +n(n −1)VarP. b. Var(Y ) = E(Var(Y |Λ)) + Var(E(Y |Λ)) = EΛ + Var(Λ) = µ + 1 α µ 2 since EΛ = µ = αβ and Var(Λ) = αβ 2 = (αβ) 2 α = µ 2 α . The “extra-Poisson” variation is 1 α µ 2 . 4.37 a. Let Y = X i . P(Y = k) = P(Y = k, 1 2 < c = 1 2 (1 +p) < 1) = _ 1 0 (Y = k|c = 1 2 (1 +p))P(P = p)dp = _ 1 0 _ n k _ [ 1 2 (1 +p)] k [1 − 1 2 (1 +p)] n−k Γ(a +b) Γ(a)Γ(b) p a−1 (1 −p) b−1 dp = _ 1 0 _ n k _ (1 +p) k 2 k (1 −p) n−k 2 n−k Γ(a +b) Γ(a)Γ(b) p a−1 (1 −p) b−1 dp = _ n k _ Γ(a +b) 2 n Γ(a)Γ(b) k j=0 _ 1 0 p k+a−1 (1 −p) n−k+b−1 dp = _ n k _ Γ(a +b) 2 n Γ(a)Γ(b) k j=0 _ k j _ Γ(k +a)Γ(n −k +b) Γ(n +a +b) = k j=0 ___ k j _ 2 n _ __ n k _ Γ(a +b) Γ(a)Γ(b) Γ(k +a)Γ(n −k +b) Γ(n +a +b) _ _ . A mixture of beta-binomial. b. EY = E(E(Y |c)) = E[nc] = E _ n _ 1 2 (1 +p) __ = n 2 _ 1 + a a +b _ . Using the results in Exercise 4.35(a), Var(Y ) = nEC(1 −EC) +n(n −1)VarC. Therefore, Var(Y ) = nE _ 1 2 (1 +P) _ _ 1 −E _ 1 2 (1 +P) __ +n(n −1)Var _ 1 2 (1 +P) _ = n 4 (1 + EP)(1 −EP) + n(n −1) 4 VarP = n 4 _ 1 − _ a a +b _ 2 _ + n(n −1) 4 ab (a +b) 2 (a +b + 1) . 4.38 a. Make the transformation u = x ν x λ , du = −x ν 2 dν, ν λ−ν = x λu . Then _ λ 0 1 ν e −x/ν 1 Γ(r)Γ(1 −r) ν r−1 (λ−ν) r Second Edition 4-13 = 1 Γ(r)Γ(1 −r) _ 0 1 x _ x λu _ r e −(u+x/λ) du = x r−1 e −x/λ λ r Γ(r)Γ(1 −r) _ 0 _ 1 u _ r e −u du = x r−1 e −x/λ Γ(r)λ r , since the integral is equal to Γ(1 −r) if r < 1. b. Use the transformation t = ν/λ to get _ λ 0 p λ (ν)dν = 1 Γ(r)Γ(1 −r) _ λ 0 ν r−1 (λ −ν) −r dν = 1 Γ(r)Γ(1 −r) _ 1 0 t r−1 (1 −t) −r dt = 1, since this is a beta(r, 1 −r). c. d dx log f(x) = d dx _ log 1 Γ(r)λ r +(r −1) log x −x/λ _ = r−1 x 1 λ > 0 for some x, if r > 1. But, d dx _ log _ 0 e −x/ν ν q λ (ν)dν _ = _ 0 1 ν 2 e −x/ν q λ (ν)dν _ 0 1 ν e −x/ν q λ (ν)dν < 0 ∀x. 4.39 a. Without loss of generality lets assume that i < j. From the discussion in the text we have that f(x 1 , . . . , x j−1 , x j+1 , . . . , x n |x j ) = (m−x j )! x 1 !· · · · ·x j−1 !·x j+1 !· · · · ·x n ! × _ p 1 1 −p j _ x 1 · · · · · _ p j−1 1 −p j _ x j−1 _ p j+1 1 −p j _ x j+1 · · · · · _ p n 1 −p j _ x n . Then, f(x i |x j ) = (x 1 ,...,x i−1 ,x i+1 ,...,x j−1 ,x j+1 ,...,x n ) f(x 1 , . . . , x j−1 , x j+1 , . . . , x n |x j ) = (x k =x i ,x j ) (m−x j )! x 1 !· · · · ·x j−1 !·x j+1 !· · · · ·x n ! ×( p 1 1 −p j ) x 1 · · · · ·( p j−1 1 −p j ) x j−1 ( p j+1 1 −p j ) x j+1 · · · · ·( p n 1 −p j ) x n × (m−x i −x j )! _ 1 − p i 1−p j _ m−x i −x j (m−x i −x j )! _ 1 − p i 1−p j _ m−x i −x j = (m−x j )! x i !(m−x i −x j )! ( p i 1 −p j ) x i _ 1 − p i 1 −p j _ m−x i −x j × (x k =x i ,x j ) (m−x i −x j )! x 1 !· · · · ·x i−1 !, x i+1 !· · · · ·x j−1 !, x j+1 !· · · · ·x n ! ×( p 1 1 −p j −p i ) x 1 · · · · ·( p i−1 1 −p j −p i ) x i−1 ( p i+1 1 −p j −p i ) x i+1 4-14 Solutions Manual for Statistical Inference ×( p j−1 1 −p j −p i ) x j−1 ( p j+1 1 −p j −p i ) x j+1 · · · · ·( p n 1 −p j −p i ) x n = (m−x j )! x i !(m−x i −x j )! ( p i 1 −p j ) x i _ 1 − p i 1 −p j _ m−x i −x j . Thus X i |X j = x j ∼ binomial(m−x j , p i 1−p j ). b. f(x i , x j ) = f(x i |x j )f(x j ) = m! x i !x j !(m−x j −x i )! p x i i p x j j (1 −p j −p i ) m−x j −x i . Using this result it can be shown that X i +X j ∼ binomial(m, p i +p j ). Therefore, Var(X i +X j ) = m(p i +p j )(1 −p i −p j ). By Theorem 4.5.6 Var(X i +X j ) = Var(X i ) + Var(X j ) + 2Cov(X i , X j ). Therefore, Cov(X i , X j ) = 1 2 [m(p i +p j )(1−p i −p j )−mp i (1−p i )−mp i (1−p i )] = 1 2 (−2mp i p j ) = −mp i p j . 4.41 Let a be a constant. Cov(a, X) = E(aX) −EaEX = aEX −aEX = 0. 4.42 ρ XY,Y = Cov(XY, Y ) σ XY σ Y = E(XY 2 )−µ XY µ Y σ XY σ Y = EXEY 2 −µ X µ Y µ Y σ XY σ Y , where the last step follows from the independence of X and Y. Now compute σ 2 XY = E(XY ) 2 −[E(XY )] 2 = EX 2 EY 2 −(EX) 2 (EY ) 2 = (σ 2 X 2 X )(σ 2 Y 2 Y ) −µ 2 X µ 2 Y = σ 2 X σ 2 Y 2 X µ 2 Y 2 Y µ 2 X . Therefore, ρ XY,Y = µ X 2 Y 2 Y )−µ X µ 2 Y 2 X σ 2 Y 2 X µ 2 Y 2 Y µ 2 X ) 1/2 σ Y = µ X σ Y 2 X σ 2 Y 2 Y σ 2 X 2 X σ 2 Y ) 1/2 . 4.43 Cov(X 1 +X 2 , X 2 +X 3 ) = E(X 1 +X 2 )(X 2 +X 3 ) −E(X 1 +X 2 )E(X 2 +X 3 ) = (4µ 2 2 ) −4µ 2 = σ 2 Cov(X 1 +X 2 )(X 1 −X 2 ) = E(X 1 +X 2 )(X 1 −X 2 ) = EX 2 1 −X 2 2 = 0. 4.44 Let µ i = E(X i ). Then Var _ n i=1 X i _ = Var (X 1 +X 2 +· · · +X n ) = E[(X 1 +X 2 +· · · +X n ) −(µ 1 2 +· · · +µ n )] 2 = E[(X 1 −µ 1 ) + (X 2 −µ 2 ) +· · · + (X n −µ n )] 2 = n i=1 E(X i −µ i ) 2 + 2 1≤i<j≤n E(X i −µ i )(X j −µ j ) = n i=1 VarX i + 2 1≤i<j≤n Cov(X i , X j ). Second Edition 4-15 4.45 a. We will compute the marginal of X. The calculation for Y is similar. Start with f XY (x, y) = 1 2πσ X σ Y _ 1−ρ 2 × exp _ 1 2(1−ρ 2 ) _ _ x−µ X σ X _ 2 −2ρ _ x−µ X σ X __ y−µ Y σ Y _ + _ y−µ Y σ Y _ 2 __ and compute f X (x) = _ −∞ f XY (x, y)dy = _ −∞ 1 2πσ X σ Y _ 1−ρ 2 e 1 2(1−ρ 2 ) 2 −2ρωz+z 2 Y dz, where we make the substitution z = y−µ Y σ Y , dy = σ Y dz, ω = x−µ X σ X . Now the part of the exponent involving ω 2 can be removed from the integral, and we complete the square in z to get f X (x) = e ω 2 2(1−ρ 2 ) 2πσ X _ 1−ρ 2 _ −∞ e 1 2(1−ρ 2 ) [(z 2 −2ρωz+ρ 2 ω 2 )−ρ 2 ω 2 ] dz = e −ω 2 /2(1−ρ 2 ) e ρ 2 ω 2 /2(1−ρ 2 ) 2πσ X _ 1−ρ 2 _ −∞ e 1 2(1−ρ 2 ) (z−ρω) 2 dz. The integrand is the kernel of normal pdf with σ 2 = (1 − ρ 2 ), and µ = ρω, so it integrates to _ 1−ρ 2 . Also note that e −ω 2 /2(1−ρ 2 ) e ρ 2 ω 2 /2(1−ρ 2 ) = e −ω 2 /2 . Thus, f X (x) = e −ω 2 /2 2πσ X _ 1−ρ 2 _ 1−ρ 2 = 1 2πσ X e 1 2 _ x−µ X σ X _ 2 , the pdf of n(µ X , σ 2 X ). b. f Y |X (y|x) = 1 2πσ X σ Y 1−ρ 2 e 1 2(1−ρ 2 ) _ _ x−µ X σ X _ 2 −2ρ _ x−µ X σ X __ y−µ Y σ Y _ + _ y−µ Y σ Y _ 2 _ 1 2πσ X e 1 2 X (x−µ X ) 2 = 1 2πσ Y _ 1−ρ 2 e 1 2(1−ρ 2 ) _ _ x−µ X σ X _ 2 −(1−ρ 2 ) _ x−µ X σ X _ 2 −2ρ _ x−µ X σ X __ y−µ Y σ Y _ + _ y−µ Y σ Y _ 2 _ = 1 2πσ Y _ 1−ρ 2 e 1 2(1−ρ 2 ) _ ρ 2 _ x−µ X σ X _ 2 −2ρ _ x−µ X σ X __ y−µ Y σ Y _ + _ y−µ Y σ Y _ 2 _ = 1 2πσ Y _ 1−ρ 2 e 1 2 Y (1−ρ 2 _ (y−µ Y )− _ ρ σ Y σ X (x−µ X ) 2 , which is the pdf of n _ Y −ρ(σ Y X )(x −µ X ) , σ Y _ 1 −ρ 2 _ . c. The mean is easy to check, E(aX +bY ) = aEX +bEY = aµ X +bµ Y , 4-16 Solutions Manual for Statistical Inference as is the variance, Var(aX +bY ) = a 2 VarX +b 2 VarY + 2abCov(X, Y ) = a 2 σ 2 X +b 2 σ 2 Y + 2abρσ X σ Y . To show that aX + bY is normal we have to do a bivariate transform. One possibility is U = aX+bY , V = Y , then get f U,V (u, v) and show that f U (u) is normal. We will do this in the standard case. Make the indicated transformation and write x = 1 a (u − bv), y = v and obtain |J| = ¸ ¸ ¸ ¸ 1/a −b/a 0 1 ¸ ¸ ¸ ¸ = 1 a . Then f UV (u, v) = 1 2πa _ 1−ρ 2 e 1 2(1−ρ 2 ) _ [ 1 a (u−bv)] 2 −2 ρ a (u−bv)+v 2 ¸ . Now factor the exponent to get a square in u. The result is 1 2(1−ρ 2 ) _ b 2 + 2ρab +a 2 a 2 _ _ u 2 b 2 + 2ρab +a 2 −2 _ b +aρ b 2 + 2ρab +a 2 _ uv +v 2 _ . Note that this is joint bivariate normal form since µ U = µ V = 0, σ 2 v = 1, σ 2 u = a 2 +b 2 +2abρ and ρ = Cov(U, V ) σ U σ V = E(aXY +bY 2 ) σ U σ V = aρ +b _ a 2 +b 2 +abρ , thus (1 −ρ ∗2 ) = 1 − a 2 ρ 2 +abρ +b 2 a 2 +b 2 + 2abρ = (1−ρ 2 )a 2 a 2 +b 2 + 2abρ = (1 −ρ 2 )a 2 σ 2 u where a _ 1−ρ 2 = σ U _ 1−ρ ∗2 . We can then write f UV (u, v) = 1 2πσ U σ V _ 1−ρ ∗2 exp _ 1 2 _ 1−ρ ∗2 _ u 2 σ 2 U −2ρ uv σ U σ V + v 2 σ 2 V _ _ , which is in the exact form of a bivariate normal distribution. Thus, by part a), U is normal. 4.46 a. EX = a X EZ 1 +b X EZ 2 + Ec X = a X 0 +b X 0 +c X = c X VarX = a 2 X VarZ 1 +b 2 X VarZ 2 + Varc X = a 2 X +b 2 X EY = a Y 0 +b Y 0 +c Y = c Y VarY = a 2 Y VarZ 1 +b 2 Y VarZ 2 + Varc Y = a 2 Y +b 2 Y Cov(X, Y ) = EXY −EX · EY = E[(a X a Y Z 2 1 +b X b Y Z 2 2 +c X c Y +a X b Y Z 1 Z 2 +a X c Y Z 1 +b X a Y Z 2 Z 1 +b X c Y Z 2 +c X a Y Z 1 +c X b Y Z 2 ) −c X c Y ] = a X a Y +b X b Y , since EZ 2 1 = EZ 2 2 = 1, and expectations of other terms are all zero. b. Simply plug the expressions for a X , b X , etc. into the equalities in a) and simplify. c. Let D = a X b Y −a Y b X = − _ 1−ρ 2 σ X σ Y and solve for Z 1 and Z 2 , Z 1 = b Y (X−c X ) −b X (Y −c Y ) D = σ Y (X−µ X )+σ X (Y −µ Y ) _ 2(1+ρ)σ X σ Y Z 2 = σ Y (X−µ X )+σ X (Y −µ Y ) _ 2(1−ρ)σ X σ Y . Second Edition 4-17 Then the Jacobian is J = _ ∂z 1 ∂x 1 ∂z 1 ∂y ∂z 2 ∂x ∂z 2 ∂y _ = _ b Y D −b X D −a Y D a X D _ = a X b Y D 2 a Y b X D 2 = 1 D = 1 _ 1−ρ 2 σ X σ Y , and we have that f X,Y (x, y) = 1 e 1 2 Y (x−µ X )+σ X (y−µ Y )) 2 2(1+ρ)σ 2 X σ 2 Y 1 e 1 2 Y (x−µ X )+σ X (y−µ Y )) 2 2(1−ρ)σ 2 X σ 2 Y 1 _ 1−ρ 2 σ X σ Y = (2πσ X σ Y _ 1 −ρ 2 ) −1 exp _ 1 2(1 −ρ 2 ) _ x −µ X σ X _ 2 _ − 2ρ x −µ X σ X _ y −µ Y σ Y _ + _ y −µ Y σ Y _ 2 , −∞< x < ∞, −∞< y < ∞, a bivariate normal pdf. d. Another solution is a X = ρσ X b X = _ (1 −ρ 2 X a Y = σ Y b Y = 0 c X = µ X c Y = µ Y . There are an inﬁnite number of solutions. Write b X = ± _ σ 2 X −a 2 X ,b Y = ± _ σ 2 Y −a 2 Y , and substitute b X ,b Y into a X a Y = ρσ X σ Y . We get a X a Y + _ ± _ σ 2 X −a 2 X __ ± _ σ 2 Y −a 2 Y _ = ρσ X σ Y . Square both sides and simplify to get (1 −ρ 2 2 X σ 2 Y = σ 2 X a 2 Y −2ρσ X σ Y a X a Y 2 Y a 2 X . This is an ellipse for ρ = ±1, a line for ρ = ±1. In either case there are an inﬁnite number of points satisfying the equations. 4.47 a. By deﬁnition of Z, for z < 0, P(Z ≤ z) = P(X ≤ z and XY > 0) +P(−X ≤ z and XY < 0) = P(X ≤ z and Y < 0) +P(X ≥ −z and Y < 0) (since z < 0) = P(X ≤ z)P(Y < 0) +P(X ≥ −z)P(Y < 0) (independence) = P(X ≤ z)P(Y < 0) +P(X ≤ z)P(Y > 0) (symmetry of Xand Y ) = P(X ≤ z)(P(Y < 0) +P(Y > 0)) = P(X ≤ z). By a similar argument, for z > 0, we get P(Z > z) = P(X > z), and hence, P(Z ≤ z) = P(X ≤ z). Thus, Z ∼ X ∼ n(0, 1). b. By deﬁnition of Z, Z > 0 ⇔ either (i)X < 0 and Y > 0 or (ii)X > 0 and Y > 0. So Z and Y always have the same sign, hence they cannot be bivariate normal. 4-18 Solutions Manual for Statistical Inference 4.49 a. f X (x) = _ (af 1 (x)g 1 (y) + (1 −a)f 2 (x)g 2 (y))dy = af 1 (x) _ g 1 (y)dy + (1 −a)f 2 (x) _ g 2 (y)dy = af 1 (x) + (1 −a)f 2 (x). f Y (y) = _ (af 1 (x)g 1 (y) + (1 −a)f 2 (x)g 2 (y))dx = ag 1 (y) _ f 1 (x)dx + (1 −a)g 2 (y) _ f 2 (x)dx = ag 1 (y) + (1 −a)g 2 (y). b. (⇒) If X and Y are independent then f(x, y) = f X (x)f Y (y). Then, f(x, y) −f X (x)f Y (y) = af 1 (x)g 1 (y) + (1 −a)f 2 (x)g 2 (y) −[af 1 (x) + (1 −a)f 2 (x)][ag 1 (y) + (1 −a)g 2 (y)] = a(1 −a)[f 1 (x)g 1 (y) −f 1 (x)g 2 (y) −f 2 (x)g 1 (y) +f 2 (x)g 2 (y)] = a(1 −a)[f 1 (x) −f 2 (x)][g 1 (y) −g 2 (y)] = 0. Thus [f 1 (x) −f 2 (x)][g 1 (y) −g 2 (y)] = 0 since 0 < a < 1. (⇐) if [f 1 (x) −f 2 (x)][g 1 (y) −g 2 (y)] = 0 then f 1 (x)g 1 (y) +f 2 (x)g 2 (y) = f 1 (x)g 2 (y) +f 2 (x)g 1 (y). Therefore f X (x)f Y (y) = a 2 f 1 (x)g 1 (y) +a(1 −a)f 1 (x)g 2 (y) +a(1 −a)f 2 (x)g 1 (y) + (1 −a) 2 f 2 (x)g 2 (y) = a 2 f 1 (x)g 1 (y) +a(1 −a)[f 1 (x)g 2 (y) +f 2 (x)g 1 (y)] + (1 −a) 2 f 2 (x)g 2 (y) = a 2 f 1 (x)g 1 (y) +a(1 −a)[f 1 (x)g 1 (y) +f 2 (x)g 2 (y)] + (1 −a) 2 f 2 (x)g 2 (y) = af 1 (x)g 1 (y) + (1 −a)f 2 (x)g 2 (y) = f(x, y). Thus X and Y are independent. c. Cov(X, Y ) = aµ 1 ξ 1 + (1 −a)µ 2 ξ 2 −[aµ 1 + (1 −a)µ 2 ][aξ 1 + (1 −a)ξ 2 ] = a(1 −a)[µ 1 ξ 1 −µ 1 ξ 2 −µ 2 ξ 1 2 ξ 2 ] = a(1 −a)[µ 1 −µ 2 ][ξ 1 −ξ 2 ]. To construct dependent uncorrelated random variables let (X, Y ) ∼ af 1 (x)g 1 (y) + (1 − a)f 2 (x)g 2 (y) where f 1 , f 2 , g 1 , g 2 are such that f 1 −f 2 = 0 and g 1 −g 2 = 0 with µ 1 = µ 2 or ξ 1 = ξ 2 . d. (i) f 1 ∼ binomial(n, p), f 2 ∼ binomial(n, p), g 1 ∼ binomial(n, p), g 2 ∼ binomial(n, 1 −p). (ii) f 1 ∼ binomial(n, p 1 ), f 2 ∼ binomial(n, p 2 ), g 1 ∼ binomial(n, p 1 ), g 2 ∼ binomial(n, p 2 ). (iii) f 1 ∼ binomial(n 1 , p n 1 ), f 2 ∼ binomial(n 2 , p n 2 ), g 1 ∼ binomial(n 1 , p), g 2 ∼ binomial(n 2 , p). Second Edition 4-19 4.51 a. P(X/Y ≤ t) = _ 1 2 t t > 1 1 2 + (1 −t) t ≤ 1 P(XY ≤ t) = t −t log t 0 < t < 1. b. P(XY/Z ≤ t) = _ 1 0 P(XY ≤ zt)dz = __ 1 0 _ zt 2 + (1 −zt) ¸ dz if t ≤ 1 _ 1 t 0 _ zt 2 + (1 −zt) ¸ dz + _ 1 1 t 1 2zt dz if t ≤ 1 = _ 1 −t/4 if t ≤ 1 t − 1 4t + 1 2t log t if t > 1 . 4.53 P(Real Roots) = P(B 2 > 4AC) = P(2 log B > log 4 + log A+ log C) = P(−2 log B ≤ −log 4 −log A−log C) = P (−2 log B ≤ −log 4 + (−log A−log C)) . Let X = −2 log B, Y = −log A−log C. Then X ∼ exponential(2), Y ∼ gamma(2, 1), indepen- dent, and P(Real Roots) = P(X < −log 4 +Y ) = _ log 4 P(X < −log 4 +y)f Y (y)dy = _ log 4 _ −log 4+y 0 1 2 e −x/2 dxye −y dy = _ log 4 _ 1 −e 1 2 log 4 e −y/2 _ ye −y dy. Integration-by-parts will show that _ a ye −y/b = b(a +b)e −a/b and hence P(Real Roots) = 1 4 (1 + log 4) − 1 24 _ 2 3 + log 4 _ = .511. 4.54 Let Y = n i=1 X i . Then P(Y ≤ y) = P( n i=1 X i ≤ y) = P( n i=1 −log X i ≥ −log y). Now, −log X i ∼ exponential(1) = gamma(1, 1). By Example 4.6.8, n i=1 −log X i ∼ gamma(n, 1). Therefore, P(Y ≤ y) = _ −log y 1 Γ(n) z n−1 e −z dz, and f Y (y) = d dy _ −log y 1 Γ(n) z n−1 e −z dz = − 1 Γ(n) (−log y) n−1 e −(−log y) d dy (−log y) = 1 Γ(n) (−log y) n−1 , 0 < y < 1. 4-20 Solutions Manual for Statistical Inference 4.55 Let X 1 , X 2 , X 3 be independent exponential(λ) random variables, and let Y = max(X 1 , X 2 , X 3 ), the lifetime of the system. Then P(Y ≤ y) = P (max(X 1 , X 2 , X 3 ) ≤ y) = P (X 1 ≤ y and X 2 ≤ y and X 3 ≤ y) = P(X 1 ≤ y)P(X 2 ≤ y)P(X 3 ≤ y). by the independence of X 1 , X 2 and X 3 . Now each probability is P(X 1 ≤ y) = _ y 0 1 λ e −x/λ dx = 1 −e −y/λ , so P(Y ≤ y) = _ 1−e −y/λ _ 3 , 0 < y < ∞, and the pdf is f Y (y) = _ 3 _ 1 −e −y/λ _ 2 e −y/λ y > 0 0 y ≤ 0. 4.57 a. A 1 = [ 1 n n x=1 x 1 i ] 1 1 = 1 n n x=1 x i , the arithmetic mean. A −1 = [ 1 n n x=1 x −1 i ] −1 = 1 1 n ( 1 x 1 +· · · + 1 x n ) , the harmonic mean. lim r→0 log A r = lim r→0 log[ 1 n n x=1 x r i ] 1 r = lim r→0 1 r log[ 1 n n x=1 x r i ] = lim r→0 1 n n i=1 rx r−1 i 1 n n i=1 x r i = lim r→0 1 n n i=1 x r i log x i 1 n n i=1 x r i = 1 n n i=1 log x i = 1 n log( n i=1 x i ). Thus A 0 = lim r→0 A r = exp( 1 n log( n i=1 x i )) = ( n i=1 x i ) 1 n , the geometric mean. The term rx r−1 i = x r i log x i since rx r−1 i = d dr x r i = d dr exp(r log x i ) = exp(r log x i ) log x i = x r i log x i . b. (i) if log A r is nondecreasing then for r ≤ r log A r ≤ log A r , then e log A r ≤ e log A r . Therefore A r ≤ A r . Thus A r is nondecreasing in r. (ii) d dr log A r = −1 r 2 log( 1 n n x=1 x r i ) + 1 r 1 n n i=1 rx r−1 i 1 n n i=1 x r i = 1 r 2 _ r n i=1 x r i log x i n x=1 x r i −log( 1 n n x=1 x r i ) _ , where we use the identity for rx r−1 i showed in a). (iii) r n i=1 x r i log x i n x=1 x r i −log( 1 n n x=1 x r i ) = log(n) + r n i=1 x r i log x i n x=1 x r i −log( n x=1 x r i ) = log(n) + n i=1 _ x r i n i=1 x r i r log x i x r i n i=1 x r i log( n x=1 x r i ) _ = log(n) + n i=1 _ x r i n i=1 x r i (r log x i −log( n x=1 x r i )) _ = log(n) − n i=1 x r i n i=1 x r i log( n x=1 x r i x r i ) = log(n) − n i=1 a i log( 1 a i ). Second Edition 4-21 We need to prove that log(n) ≥ n i=1 a i log( 1 a i ). Using Jensen inequality we have that Elog( 1 a ) = n i=1 a i log( 1 a i ) ≤ log(E 1 a ) = log( n i=1 a i 1 a i ) = log(n) which establish the result. 4.59 Assume that EX = 0, EY = 0, and EZ = 0. This can be done without loss of generality because we could work with the quantities X −EX, etc. By iterating the expectation we have Cov(X, Y ) = EXY = E[E(XY |Z)]. Adding and subtracting E(X|Z)E(Y |Z) gives Cov(X, Y ) = E[E(XY |Z) −E(X|Z)E(Y |Z)] + E[E(X|Z)E(Y |Z)]. Since E[E(X|Z)] = EX = 0, the second term above is Cov[E(X|Z)E(Y |Z)]. For the ﬁrst term write E[E(XY |Z) −E(X|Z)E(Y |Z)] = E[E{XY −E(X|Z)E(Y |Z)| Z}] where we have brought E(X|Z) and E(Y |Z) inside the conditional expectation. This can now be recognized as ECov(X, Y |Z), establishing the identity. 4.61 a. To ﬁnd the distribution of f(X 1 |Z), let U = X 2 −1 X 1 and V = X 1 . Then x 2 = h 1 (u, v) = uv+1, x 1 = h 2 (u, v) = v. Therefore f U,V (u, v) = f X,Y (h 1 (u, v), h 2 (u, v))|J| = e −(uv+1) e −v v, and f U (u) = _ 0 ve −(uv+1) e −v dv = e −1 (u + 1) 2 . Thus V |U = 0 has distribution ve −v . The distribution of X 1 |X 2 is e −x 1 since X 1 and X 2 are independent. b. The following Mathematica code will draw the picture; the solid lines are B 1 and the dashed lines are B 2 . Note that the solid lines increase with x1, while the dashed lines are constant. Thus B 1 is informative, as the range of X 2 changes. e = 1/10; Plot[{-e*x1 + 1, e*x1 + 1, 1 - e, 1 + e}, {x1, 0, 5}, PlotStyle -> {Dashing[{}], Dashing[{}],Dashing[{0.15, 0.05}], Dashing[{0.15, 0.05}]}] c. P(X 1 ≤ x|B 1 ) = P(V ≤ v | − < U < ) = _ v 0 _ ve −(uv+1) e −v dudv _ 0 _ ve −(uv+1) e −v dudv = e −1 _ e −v (1+) 1+ 1 1+ e −v (1−) 1− + 1 1− _ e −1 _ 1 1+ + 1 1− _ . Thus lim →0 P(X 1 ≤ x|B 1 ) = 1 −e −v −v e −v = _ v 0 ve −v dv = P(V ≤ v |U = 0). P(X 1 ≤ x|B 2 ) = _ x 0 _ 1+ 0 e −(x 1 +x 2 ) dx 2 dx 1 _ 1+ 0 e −x 2 dx 2 = e −(x+1+) −e −(1+) −e −x + 1 1 −e −(1+) . Thus lim →0 P(X 1 ≤ x|B 2 ) = 1 −e x = _ x 0 e x 1 dx 1 = P(X 1 ≤ x|X 2 = 1). 4-22 Solutions Manual for Statistical Inference 4.63 Since X = e Z and g(z) = e z is convex, by Jensen’s Inequality EX = Eg(Z) ≥ g(EZ) = e 0 = 1. In fact, there is equality in Jensen’s Inequality if and only if there is an interval I with P(Z ∈ I) = 1 and g(z) is linear on I. But e z is linear on an interval only if the interval is a single point. So EX > 1, unless P(Z = EZ = 0) = 1. 4.64 a. Let a and b be real numbers. Then, |a +b| 2 = (a +b)(a +b) = a 2 + 2ab +b 2 ≤ |a| 2 + 2|ab| +|b| 2 = (|a| +|b|) 2 . Take the square root of both sides to get |a +b| ≤ |a| +|b|. b. |X +Y | ≤ |X| +|Y | ⇒E|X +Y | ≤ E(|X| +|Y |) = E|X| + E|Y |. 4.65 Without loss of generality let us assume that Eg(X) = Eh(X) = 0. For part (a) E(g(X)h(X)) = _ −∞ g(x)h(x)f X (x)dx = _ {x:h(x)≤0} g(x)h(x)f X (x)dx + _ {x:h(x)≥0} g(x)h(x)f X (x)dx ≤ g(x 0 ) _ {x:h(x)≤0} h(x)f X (x)dx +g(x 0 ) _ {x:h(x)≥0} h(x)f X (x)dx = _ −∞ h(x)f X (x)dx = g(x 0 )Eh(X) = 0. where x 0 is the number such that h(x 0 ) = 0. Note that g(x 0 ) is a maximum in {x : h(x) ≤ 0} and a minimum in {x : h(x) ≥ 0} since g(x) is nondecreasing. For part (b) where g(x) and h(x) are both nondecreasing E(g(X)h(X)) = _ −∞ g(x)h(x)f X (x)dx = _ {x:h(x)≤0} g(x)h(x)f X (x)dx + _ {x:h(x)≥0} g(x)h(x)f X (x)dx ≥ g(x 0 ) _ {x:h(x)≤0} h(x)f X (x)dx +g(x 0 ) _ {x:h(x)≥0} h(x)f X (x)dx = _ −∞ h(x)f X (x)dx = g(x 0 )Eh(X) = 0. The case when g(x) and h(x) are both nonincreasing can be proved similarly. Chapter 5 Properties of a Random Sample 5.1 Let X = # color blind people in a sample of size n. Then X ∼ binomial(n, p), where p = .01. The probability that a sample contains a color blind person is P(X > 0) = 1 − P(X = 0), where P(X = 0) = _ n 0 _ (.01) 0 (.99) n = .99 n . Thus, P(X > 0) = 1 −.99 n > .95 ⇔n > log(.05)/ log(.99) ≈ 299. 5.3 Note that Y i ∼ Bernoulli with p i = P(X i ≥ µ) = 1 − F(µ) for each i. Since the Y i ’s are iid Bernoulli, n i=1 Y i ∼ binomial(n, p = 1 −F(µ)). 5.5 Let Y = X 1 +· · · + X n . Then ¯ X = (1/n)Y , a scale transformation. Therefore the pdf of ¯ X is f ¯ X (x) = 1 1/n f Y _ x 1/n _ = nf Y (nx). 5.6 a. For Z = X − Y , set W = X. Then Y = W − Z, X = W, and |J| = ¸ ¸ ¸ ¸ 0 1 −1 1 ¸ ¸ ¸ ¸ = 1. Then f Z,W (z, w) = f X (w)f Y (w −z) · 1, thus f Z (z) = _ −∞ f X (w)f Y (w −z)dw. b. For Z = XY , set W = X. Then Y = Z/W and |J| = ¸ ¸ ¸ ¸ 0 1 1/w −z/w 2 ¸ ¸ ¸ ¸ = −1/w. Then f Z,W (z, w) = f X (w)f Y (z/w) · |−1/w|, thus f Z (z) = _ −∞ |−1/w| f X (w)f Y (z/w)dw. c. For Z = X/Y , set W = X. Then Y=W/Z and |J| = ¸ ¸ ¸ ¸ 0 1 −w/z 2 1/z ¸ ¸ ¸ ¸ = w/z 2 . Then f Z,W (z, w) = f X (w)f Y (w/z) · |w/z 2 |, thus f Z (z) = _ −∞ |w/z 2 |f X (w)f Y (w/z)dw. 5.7 It is, perhaps, easiest to recover the constants by doing the integrations. We have _ −∞ B 1+ _ ω σ _ 2 dω = σπB, _ −∞ D 1+ _ ω−z τ _ 2 dω = τπD and _ −∞ _ 1+ _ ω σ _ 2 1+ _ ω−z τ _ 2 _ = _ −∞ _ 1+ _ ω σ _ 2 C(ω−z) 1+ _ ω−z τ _ 2 _ dω −Cz _ −∞ 1 1+ _ ω−z τ _ 2 = A σ 2 2 log _ 1+ _ ω σ _ 2 _ 2 2 log _ 1+ _ ω−z τ _ 2 ¸ ¸ ¸ ¸ −∞ −τπCz. The integral is ﬁnite and equal to zero if A = M 2 σ 2 , C = M 2 τ 2 for some constant M. Hence f Z (z) = 1 π 2 στ _ σπB−τπD− 2πMz τ _ = 1 π(σ+τ) 1 1+(z/(σ+τ)) 2 , if B = τ σ+τ , D = σ σ+τ) , M = −στ 2 2z(σ+τ) 1 1+( z σ+τ ) 2 . 5-2 Solutions Manual for Statistical Inference 5.8 a. 1 2n(n −1) n i=1 n j=1 (X i −X j ) 2 = 1 2n(n −1) n i=1 n j=1 (X i ¯ X + ¯ X −X j ) 2 = 1 2n(n −1) n i=1 n j=1 _ (X i ¯ X) 2 −2(X i ¯ X)(X j ¯ X) + (X j ¯ X) 2 _ = 1 2n(n −1) _ ¸ ¸ ¸ ¸ _ n i=1 n(X i ¯ X) 2 −2 n i=1 (X i ¯ X) n j=1 (X j ¯ X) . ¸¸ . =0 +n n j=1 (X j ¯ X) 2 _ ¸ ¸ ¸ ¸ _ = n 2n(n −1) n i=1 (X i ¯ X) 2 + n 2n(n −1) n j=1 (X j ¯ X) 2 = 1 n −1 n i=1 (X i ¯ X) 2 = S 2 . b. Although all of the calculations here are straightforward, there is a tedious amount of book- keeping needed. It seems that induction is the easiest route. (Note: Without loss of generality we can assume θ 1 = 0, so EX i = 0.) (i) Prove the equation for n = 4. We have S 2 = 1 24 4 i=1 4 j=1 (X i −X j ) 2 , and to calculate Var(S 2 ) we need to calculate E(S 2 ) 2 and E(S 2 ). The latter expectation is straightforward and we get E(S 2 ) = 24θ 2 . The expected value E(S 2 ) 2 = E(S 4 ) contains 256(= 4 4 ) terms of which 112(= 4 ×16 + 4 ×16 −4 2 ) are zero, whenever i = j. Of the remaining terms, • 24 are of the form E(X i −X j ) 4 = 2(θ 4 + 3θ 2 2 ) • 96 are of the form E(X i −X j ) 2 (X i −X k ) 2 = θ 4 + 3θ 2 2 • 24 are of the form E(X i −X j ) 2 (X k −X ) 2 = 4θ 2 2 Thus, Var(S 2 ) = 1 24 2 _ 24 ×2(θ 4 + 3θ 2 2 ) + 96(θ 4 + 3θ 2 2 ) + 24 ×4θ 4 −(24θ 2 ) 2 _ = 1 4 _ θ 4 1 3 θ 2 2 _ . (ii) Assume that the formula holds for n, and establish it for n+1. (Let S n denote the variance based on n observations.) Straightforward algebra will establish S 2 n+1 = 1 2n(n + 1) _ _ n i=1 n j=1 (X i −X j ) 2 + 2 n k=1 (X k −X n+1 ) 2 _ _ def’n = 1 2n(n + 1) [A+ 2B] where Var(A) = 4n(n −1) 2 _ θ 4 n −3 n −1 θ 2 2 _ (induction hypothesis) Var(B) = n(n + 1)θ 4 −n(n −3)θ 2 2 (X k and X n+1 are independent) Cov(A, B) = 2n(n −1) _ θ 4 −θ 2 2 ¸ (some minor bookkeeping needed) Second Edition 5-3 Hence, Var(S 2 n+1 ) = 1 4n 2 (n + 1) 2 [Var(A) + 4Var(B) + 4Cov(A, B)] = 1 n + 1 _ θ 4 n −2 n θ 2 2 _ , establishing the induction and verifying the result. c. Again assume that θ 1 = 0. Then Cov( ¯ X, S 2 ) = 1 2n 2 (n −1) E _ _ _ n k=1 X k n i=1 n j=1 (X i −X j ) 2 _ _ _ . The double sum over i and j has n(n − 1) nonzero terms. For each of these, the entire expectation is nonzero for only two values of k (when k matches either i or j). Thus Cov( ¯ X, S 2 ) = 2n(n −1) 2n 2 (n −1) EX i (X i −X j ) 2 = 1 n θ 3 , and ¯ X and S 2 are uncorrelated if θ 3 = 0. 5.9 To establish the Lagrange Identity consider the case when n = 2, (a 1 b 2 −a 2 b 1 ) 2 = a 2 1 b 2 2 +a 2 2 b 2 1 −2a 1 b 2 a 2 b 1 = a 2 1 b 2 2 +a 2 2 b 2 1 −2a 1 b 2 a 2 b 1 +a 2 1 b 2 1 +a 2 2 b 2 2 −a 2 1 b 2 1 −a 2 2 b 2 2 = (a 2 1 +a 2 2 )(b 2 1 +b 2 2 ) −(a 1 b 1 +a 2 b 2 ) 2 . Assume that is true for n, then _ n+1 i=1 a 2 i __ n+1 i=1 b 2 i _ _ n+1 i=1 a i b i _ 2 = _ n i=1 a 2 i +a 2 n+1 __ n i=1 b 2 i +b 2 n+1 _ _ n i=1 a i b i +a n+1 b n+1 _ 2 = _ n i=1 a 2 i __ n i=1 b 2 i _ _ n i=1 a i b i _ 2 + _ n i=1 a 2 i _ b 2 n+1 +a 2 n+1 _ n i=1 b 2 i _ −2 _ n i=1 a i b i _ a n+1 b n+1 = n−1 i=1 n j=i+1 (a i b j −a j b i ) 2 + n i=1 (a i b n+1 −a n+1 b i ) 2 = n i=1 n+1 j=i+1 (a i b j −a j b i ) 2 . If all the points lie on a straight line then Y − µ y = c(X − µ x ), for some constant c = 0. Let b i = Y −µ y and a i = (X −µ x ), then b i = ca i . Therefore n i=1 n+1 j=i+1 (a i b j −a j b i ) 2 = 0. Thus the correlation coeﬃcient is equal to 1. 5.10 a. θ 1 = EX i = µ 5-4 Solutions Manual for Statistical Inference θ 2 = E(X i −µ) 2 = σ 2 θ 3 = E(X i −µ) 3 = E(X i −µ) 2 (X i −µ) (Stein’s lemma: Eg(X)(X −θ) = σ 2 Eg (X)) = 2σ 2 E(X i −µ) = 0 θ 4 = E(X i −µ) 4 = E(X i −µ) 3 (X i −µ) = 3σ 2 E(X i −µ) 2 = 3σ 4 . b. VarS 2 = 1 n 4 n−3 n−1 θ 2 2 ) = 1 n (3σ 4 n−3 n−1 σ 4 ) = 4 n−1 . c. Use the fact that (n −1)S 2 2 ∼ χ 2 n−1 and Varχ 2 n−1 = 2(n −1) to get Var _ (n −1)S 2 σ 2 _ = 2(n −1) which implies ( (n−1) 2 σ 4 )VarS 2 = 2(n −1) and hence VarS 2 = 2(n −1) (n −1) 2 4 = 4 n −1 . Remark: Another approach to b), not using the χ 2 distribution, is to use linear model theory. For any matrix A Var(X AX) = 2µ 2 2 trA 2 + 4µ 2 θ Aθ, where µ 2 is σ 2 , θ = EX = µ1. Write S 2 = 1 n−1 n i=1 (X i ¯ X) = 1 n−1 X (I − ¯ J n )X.Where I − ¯ J n = _ _ _ _ _ _ 1− 1 n 1 n · · · − 1 n 1 n 1− 1 n . . . . . . . . . . . . 1 n · · · · · · 1− 1 n _ _ _ _ _ _ . Notice that trA 2 = trA = n −1, Aθ = 0. So VarS 2 = 1 (n −1) 2 Var(X AX) = 1 (n −1) 2 _ 4 (n −1) + 0 _ = 4 n −1 . 5.11 Let g(s) = s 2 . Since g(·) is a convex function, we know from Jensen’s inequality that Eg(S) ≥ g(ES), which implies σ 2 = ES 2 ≥ (ES) 2 . Taking square roots, σ ≥ ES. From the proof of Jensen’s Inequality, it is clear that, in fact, the inequality will be strict unless there is an interval I such that g is linear on I and P(X ∈ I) = 1. Since s 2 is “linear” only on single points, we have ET 2 > (ET) 2 for any random variable T, unless P(T = ET) = 1. 5.13 E _ c S 2 _ = c _ σ 2 n −1 E __ S 2 (n −1) σ 2 _ = c _ σ 2 n −1 _ 0 q 1 Γ _ n−1 2 _ 2 (n−1)/2 q ( n−1 2 )−1 e −q/2 dq, Since _ S 2 (n −1)/σ 2 is the square root of a χ 2 random variable. Now adjust the integrand to be another χ 2 pdf and get E _ c S 2 _ = c _ σ 2 n −1 · Γ(n/2)2 n/2 Γ((n −1)/2)2 ((n−1)/2 _ 0 1 Γ(n/2)2 n/2 q (n−1)/2 1 2 e −q/2 dq . ¸¸ . =1 since χ 2 n pdf . So c = Γ( n−1 2 ) n−1 2Γ( n 2 ) gives E(cS) = σ. Second Edition 5-5 5.15 a. ¯ X n+1 = n+1 i=1 X i n + 1 = X n+1 + n i=1 X i n + 1 = X n+1 +n ¯ X n n + 1 . b. nS 2 n+1 = n (n + 1) −1 n+1 i=1 _ X i ¯ X n+1 _ 2 = n+1 i=1 _ X i X n+1 +n ¯ X n n + 1 _ 2 (use (a)) = n+1 i=1 _ X i X n+1 n + 1 n ¯ X n n + 1 _ 2 = n+1 i=1 _ _ X i ¯ X n _ _ X n+1 n + 1 ¯ X n n + 1 __ 2 _ ± ¯ X n _ = n+1 i=1 _ _ X i ¯ X n _ 2 −2 _ X i ¯ X n _ _ X n+1 ¯ X n n + 1 _ + 1 (n + 1) 2 _ X n+1 ¯ X n _ 2 _ = n i=1 _ X i ¯ X n _ 2 + _ X n+1 ¯ X n _ 2 −2 (X n+1 ¯ X n ) 2 n + 1 + n + 1 (n + 1) 2 _ X n+1 ¯ X n _ 2 _ since n 1 (X i ¯ X n ) = 0 _ = (n −1)S 2 n + n n + 1 _ X n+1 ¯ X n _ 2 . 5.16 a. 3 i=1 _ X i −i i _ 2 ∼ χ 2 3 b. _ X i −1 i _ _ ¸ ¸ ¸ _ 3 i=2 _ X i −i i _ 2 _ 2 ∼ t 2 c. Square the random variable in part b). 5.17 a. Let U ∼ χ 2 p and V ∼ χ 2 q , independent. Their joint pdf is 1 Γ _ p 2 _ Γ _ q 2 _ 2 (p+q)/2 u p 2 −1 v q 2 −1 e −(u+v) 2 . From Deﬁnition 5.3.6, the random variable X = (U/p)/(V/q) has an F distribution, so we make the transformation x = (u/p)/(v/q) and y = u +v. (Of course, many choices of y will do, but this one makes calculations easy. The choice is prompted by the exponential term in the pdf.) Solving for u and v yields u = p q xy 1 + q p x , v = y 1 + q p x , and |J| = q p y _ 1 + q p x _ 2 . We then substitute into f U,V (u, v) to obtain f X,Y (x, y) = 1 Γ _ p 2 _ Γ _ q 2 _ 2 (p+q)/2 _ p q xy 1 + q p x _ p 2 −1 _ y 1 + q p x _ q 2 −1 e −y 2 q p y _ 1 + q p x _ 2 . 5-6 Solutions Manual for Statistical Inference Note that the pdf factors, showing that X and Y are independent, and we can read oﬀ the pdfs of each: X has the F distribution and Y is χ 2 p+q . If we integrate out y to recover the proper constant, we get the F pdf f X (x) = Γ _ p+q 2 _ Γ _ p 2 _ Γ _ q 2 _ _ q p _ p/2 x p/2−1 _ 1 + q p x _ p+q 2 . b. Since F p,q = χ 2 p /p χ 2 q /q , let U ∼ χ 2 p , V ∼ χ 2 q and U and V are independent. Then we have EF p,q = E _ U/p V/q _ = E _ U p _ E _ q V _ (by independence) = p p qE _ 1 V _ (EU = p). Then E _ 1 V _ = _ 0 1 v 1 Γ _ q 2 _ 2 q/2 v q 2 −1 e v 2 dv = 1 Γ _ q 2 _ 2 q/2 _ 0 v q−2 2 −1 e v 2 dv = 1 Γ _ q 2 _ 2 q/2 Γ _ q −2 2 _ 2 (q−2)/2 = Γ _ q−2 2 _ 2 (q−2)/2 Γ _ q−2 2 _ _ q−2 2 _ 2 q/2 = 1 q −2 . Hence, EF p,q = p p q q−2 = q q−2 , if q > 2. To calculate the variance, ﬁrst calculate E(F 2 p,q ) = E _ U 2 p 2 q 2 V 2 _ = q 2 p 2 E(U 2 )E _ 1 V 2 _ . Now E(U 2 ) = Var(U) + (EU) 2 = 2p +p 2 and E _ 1 V 2 _ = _ 0 1 v 2 1 Γ(q/2) 2 q/2 v (q/2)−1 e −v/2 dv = 1 (q −2)(q −4) . Therefore, EF 2 p,q = q 2 p 2 p(2 +p) 1 (q −2)(q −4) = q 2 p (p + 2) (q −2)(q −4) , and, hence Var(F p,q ) = q 2 (p + 2) p(q −2)(q −4) q 2 (q −2) 2 = 2 _ q q −2 _ 2 _ q +p −2 p(q −4) _ , q > 4. c. Write X = U/p V/p then 1 X = V/q U/p ∼ F q,p , since U ∼ χ 2 p , V ∼ χ 2 q and U and V are independent. d. Let Y = (p/q)X 1+(p/q)X = pX q+pX , so X = qY p(1−Y ) and ¸ ¸ ¸ dx dy ¸ ¸ ¸ = q p (1 −y) −2 . Thus, Y has pdf f Y (y) = Γ _ q+p 2 _ Γ _ p 2 _ Γ _ q 2 _ _ p q _ p 2 _ qy p(1−y) _ p−2 2 _ 1 + p q qy p(1−y) _ p+q 2 q p(1 −y) 2 = _ B _ p 2 , q 2 __ −1 y p 2 −1 (1 −y) q 2 −1 ∼ beta _ p 2 , q 2 _ . Second Edition 5-7 5.18 If X ∼ t p , then X = Z/ _ V/p where Z ∼ n(0, 1), V ∼ χ 2 p and Z and V are independent. a. EX = EZ/ _ V/p = (EZ)(E1/ _ V/p) = 0, since EZ = 0, as long as the other expectation is ﬁnite. This is so if p > 1. From part b), X 2 ∼ F 1,p . Thus VarX = EX 2 = p/(p −2), if p > 2 (from Exercise 5.17b). b. X 2 = Z 2 /(V/p). Z 2 ∼ χ 2 1 , so the ratio is distributed F 1,p . c. The pdf of X is f X (x) = _ Γ( p+1 2 ) Γ(p/2) _ 1 (1 +x 2 /p) (p+1)/2 . Denote the quantity in square brackets by C p . From an extension of Stirling’s formula (Exercise 1.28) we have lim p→∞ C p = lim p→∞ _ p−1 2 _ p−1 2 + 1 2 e p−1 2 _ p−2 2 _ p−2 2 + 1 2 e p−2 2 1 = e −1/2 π lim p→∞ _ p−1 2 _ p−1 2 + 1 2 _ p−2 2 _ p−2 2 + 1 2 p = e −1/2 π e 1/2 2 , by an application of Lemma 2.3.14. Applying the lemma again shows that for each x lim p→∞ _ 1+x 2 /p _ (p+1)/2 = e x 2 /2 , establishing the result. d. As the random variable F 1,p is the square of a t p , we conjecture that it would converge to the square of a n(0, 1) random variable, a χ 2 1 . e. The random variable qF q,p can be thought of as the sum of q random variables, each a t p squared. Thus, by all of the above, we expect it to converge to a χ 2 q random variable as p →∞. 5.19 a. χ 2 p ∼ χ 2 q + χ 2 d where χ 2 q and χ 2 d are independent χ 2 random variables with q and d = p − q degrees of freedom. Since χ 2 d is a positive random variable, for any a > 0, P(χ p > a) = P(χ 2 q 2 d > a) > P(χ 2 q > a). b. For k 1 > k 2 , k 1 F k 1 ∼ (U + V )/(W/ν), where U, V and W are independent and U ∼ χ 2 k 2 , V ∼ χ 2 k 1 −k 2 and W ∼ χ 2 ν . For any a > 0, because V/(W/ν) is a positive random variable, we have P(k 1 F k 1 > a) = P((U +V )/(W/ν) > a) > P(U/(W/ν) > a) = P(k 2 F k 2 > a). c. α = P(F k,ν > F α,k,ν ) = P(kF k,ν > kF α,k,ν ). So, kF α,k,ν is the α cutoﬀ point for the random variable kF k,ν . Because kF k,ν is stochastically larger that (k−1)F k−1,ν , the α cutoﬀ for kF k,ν is larger than the α cutoﬀ for (k −1)F k−1,ν , that is kF α,k,ν > (k −1)F α,k−1,ν . 5.20 a. The given integral is _ 0 1 e −t 2 x/2 ν x 1 Γ(ν/2)2 ν/2 (νx) (ν/2)−1 e −νx/2 dx = 1 ν ν/2 Γ(ν/2)2 ν/2 _ 0 e −t 2 x/2 x ((ν+1)/2)−1 e −νx/2 dx 5-8 Solutions Manual for Statistical Inference = 1 ν ν/2 Γ(ν/2)2 ν/2 _ 0 x ((ν+1)/2)−1 e −(ν+t 2 )x/2 dx _ integrand is kernel of gamma((ν+1)/2, 2/(ν+t 2 ) _ = 1 ν ν/2 Γ(ν/2)2 ν/2 Γ((ν + 1)/2) _ 2 ν+t 2 _ (ν+1)/2 = 1 νπ Γ((ν+1)/2) Γ(ν/2) 1 (1 +t 2 /ν) (ν+1)/2 , the pdf of a t ν distribution. b. Diﬀerentiate both sides with respect to t to obtain νf F (νt) = _ 0 yf 1 (ty)f ν (y)dy, where f F is the F pdf. Now write out the two chi-squared pdfs and collect terms to get νf F (νt) = t −1/2 Γ(1/2)Γ(ν/2)2 (ν+1)/2 _ 0 y (ν−1)/2 e −(1+t)y/2 dy = t −1/2 Γ(1/2)Γ(ν/2)2 (ν+1)/2 Γ( ν+1 2 )2 (ν+1)/2 (1 +t) (ν+1)/2 . Now deﬁne y = νt to get f F (y) = Γ( ν+1 2 ) νΓ(1/2)Γ(ν/2) (y/ν) −1/2 (1 +y/ν) (ν+1)/2 , the pdf of an F 1,ν . c. Again diﬀerentiate both sides with respect to t, write out the chi-squared pdfs, and collect terms to obtain (ν/m)f F ((ν/m)t) = t −m/2 Γ(m/2)Γ(ν/2)2 (ν+m)/2 _ 0 y (m+ν−2)/2 e −(1+t)y/2 dy. Now, as before, integrate the gamma kernel, collect terms, and deﬁne y = (ν/m)t to get f F (y) = Γ( ν+m 2 ) Γ(m/2)Γ(ν/2) _ m ν _ m/2 y m/2−1 (1 + (m/ν)y) (ν+m)/2 , the pdf of an F m,ν . 5.21 Let m denote the median. Then, for general n we have P (max(X 1 , . . . , X n ) > m) = 1 −P (X i ≤ m for i = 1, 2, . . . , n) = 1 −[P (X 1 ≤ m)] n = 1 − _ 1 2 _ n . 5.22 Calculating the cdf of Z 2 , we obtain F Z 2(z) = P((min(X, Y )) 2 ≤ z) = P(−z ≤ min(X, Y ) ≤ z) = P(min(X, Y ) ≤ z) −P(min(X, Y ) ≤ − z) = [1 −P(min(X, Y ) > z)] −[1 −P(min(X, Y ) > − z)] = P(min(X, Y ) > − z) −P(min(X, Y ) > z) = P(X > − z)P(Y > − z) −P(X > z)P(Y > z), Second Edition 5-9 where we use the independence of X and Y . Since X and Y are identically distributed, P(X > a) = P(Y > a) = 1 −F X (a), so F Z 2(z) = (1 −F X (− z)) 2 −(1 −F X ( z)) 2 = 1 −2F X (− z), since 1 −F X ( z) = F X (− z). Diﬀerentiating and substituting gives f Z 2(z) = d dz F Z 2(z) = f X (− z) 1 z = 1 e −z/2 z −1/2 , the pdf of a χ 2 1 random variable. Alternatively, P(Z 2 ≤ z) = P _ [min(X, Y )] 2 ≤ z _ = P(− z ≤ min(X, Y ) ≤ z) = P(− z ≤ X ≤ z, X ≤ Y ) +P(− z ≤ Y ≤ z, Y ≤ X) = P(− z ≤ X ≤ z|X ≤ Y )P(X ≤ Y ) +P(− z ≤ Y ≤ z|Y ≤ X)P(Y ≤ X) = 1 2 P(− z ≤ X ≤ z) + 1 2 P(− z ≤ Y ≤ z), using the facts that X and Y are independent, and P(Y ≤ X) = P(X ≤ Y ) = 1 2 . Moreover, since X and Y are identically distributed P(Z 2 ≤ z) = P(− z ≤ X ≤ z) and f Z 2(z) = d dz P(− z ≤ X ≤ z) = 1 (e −z/2 1 2 z −1/2 +e −z/2 1 2 z −1/2 ) = 1 z −1/2 e −z/2 , the pdf of a χ 2 1 . 5.23 P(Z > z) = x=1 P(Z > z|x)P(X = x) = x=1 P(U 1 > z, . . . , U x > z|x)P(X = x) = x=1 x i=1 P(U i > z)P(X = x) (by independence of the U i ’s) = x=1 P(U i > z) x P(X = x) = x=1 (1 −z) x 1 (e −1)x! = 1 (e −1) x=1 (1 −z) x x! = e 1−z −1 e −1 0 < z < 1. 5.24 Use f X (x) = 1/θ, F X (x) = x/θ, 0 < x < θ. Let Y = X (n) , Z = X (1) . Then, from Theorem 5.4.6, f Z,Y (z, y) = n! 0!(n −2)!0! 1 θ 1 θ _ z θ _ 0 _ y −z θ _ n−2 _ 1− y θ _ 0 = n(n −1) θ n (y −z) n−2 , 0 < z < y < θ. 5-10 Solutions Manual for Statistical Inference Now let W = Z/Y , Q = Y . Then Y = Q, Z = WQ, and |J| = q. Therefore f W,Q (w, q) = n(n −1) θ n (q −wq) n−2 q = n(n −1) θ n (1 −w) n−2 q n−1 , 0 < w < 1, 0 < q < θ. The joint pdf factors into functions of w and q, and, hence, W and Q are independent. 5.25 The joint pdf of X (1) , . . . , X (n) is f(u 1 , . . . , u n ) = n!a n θ an u a−1 1 · · · u a−1 n , 0 < u 1 < · · · < u n < θ. Make the one-to-one transformation to Y 1 = X (1) /X (2) , . . . , Y n−1 = X (n−1) /X (n) , Y n = X (n) . The Jacobian is J = y 2 y 2 3 · · · y n−1 n . So the joint pdf of Y 1 , . . . , Y n is f(y 1 , . . . , y n ) = n!a n θ an (y 1 · · · y n ) a−1 (y 2 · · · y n ) a−1 · · · (y n ) a−1 (y 2 y 2 3 · · · y n−1 n ) = n!a n θ an y a−1 1 y 2a−1 2 · · · y na−1 n , 0 < y i < 1; i = 1, . . . , n −1, 0 < y n < θ. We see that f(y 1 , . . . , y n ) factors so Y 1 , . . . , Y n are mutually independent. To get the pdf of Y 1 , integrate out the other variables and obtain that f Y 1 (y 1 ) = c 1 y a−1 1 , 0 < y 1 < 1, for some constant c 1 . To have this pdf integrate to 1, it must be that c 1 = a. Thus f Y 1 (y 1 ) = ay a−1 1 , 0 < y 1 < 1. Similarly, for i = 2, . . . , n − 1, we obtain f Y i (y i ) = iay ia−1 i , 0 < y i < 1. From Theorem 5.4.4, the pdf of Y n is f Y n (y n ) = na θ na y na−1 n , 0 < y n < θ. It can be checked that the product of these marginal pdfs is the joint pdf given above. 5.27 a. f X (i) |X (j) (u|v) = f X (i) ,X (j) (u, v)/f X (j) (v). Consider two cases, depending on which of i or j is greater. Using the formulas from Theorems 5.4.4 and 5.4.6, and after cancellation, we obtain the following. (i) If i < j, f X (i) |X (j) (u|v) = (j −1)! (i −1)!(j −1 −i)! f X (u)F i−1 X (u)[F X (v) −F X (u)] j−i−1 F 1−j X (v) = (j −1)! (i −1)!(j −1 −i)! f X (u) F X (v) _ F X (u) F X (v) _ i−1 _ 1− F X (u) F X (v) _ j−i−1 , u < v. Note this interpretation. This is the pdf of the ith order statistic from a sample of size j−1, from a population with pdf given by the truncated distribution, f(u) = f X (u)/F X (v), u < v. (ii) If j < i and u > v, f X (i) |X (j) (u|v) = (n −j)! (n −1)!(i −1 −j)! f X (u) [1−F X (u)] n−i [F X (u) −F X (v)] i−1−j [1−F X (v)] j−n = (n −j)! (i −j −1)!(n −i)! f X (u) 1−F X (v) _ F X (u) −F X (v) 1−F X (v) _ i−j−1 _ 1− F X (u) −F X (v) 1−F X (v) _ n−i . This is the pdf of the (i−j)th order statistic from a sample of size n−j, from a population with pdf given by the truncated distribution, f(u) = f X (u)/(1 −F X (v)), u > v. b. From Example 5.4.7, f V |R (v|r) = n(n −1)r n−2 /a n n(n −1)r n−2 (a −r)/a n = 1 a −r , r/2 < v < a −r/2. Second Edition 5-11 5.29 Let X i = weight of ith booklet in package. The X i s are iid with EX i = 1 and VarX i = .05 2 . We want to approximate P _ 100 i=1 X i > 100.4 _ = P _ 100 i=1 X i /100 > 1.004 _ = P( ¯ X > 1.004). By the CLT, P( ¯ X > 1.004) ≈ P(Z > (1.004 −1)/(.05/10)) = P(Z > .8) = .2119. 5.30 From the CLT we have, approximately, ¯ X 1 ∼ n(µ, σ 2 /n), ¯ X 2 ∼ n(µ, σ 2 /n). Since ¯ X 1 and ¯ X 2 are independent, ¯ X 1 ¯ X 2 ∼ n(0, 2σ 2 /n). Thus, we want .99 ≈ P ¸ ¯ X 1 ¯ X 2 ¸ ¸ < σ/5 _ = P _ −σ/5 σ/ _ n/2 < ¯ X 1 ¯ X 2 σ/ _ n/2 < σ/5 σ/ _ n/2 _ ≈ P _ 1 5 _ n 2 < Z < 1 5 _ n 2 _ , where Z ∼ n(0, 1). Thus we need P(Z ≥ n/5( 2)) ≈ .005. From Table 1, n/5 2 = 2.576, which implies n = 50(2.576) 2 ≈ 332. 5.31 We know that σ 2 ¯ X = 9/100. Use Chebyshev’s Inequality to get P _ −3k/10 < ¯ X−µ < 3k/10 _ ≥ 1 −1/k 2 . We need 1 −1/k 2 ≥ .9 which implies k ≥ 10 = 3.16 and 3k/10 = .9487. Thus P(−.9487 < ¯ X −µ < .9487) ≥ .9 by Chebychev’s Inequality. Using the CLT, ¯ X is approximately n _ µ, σ 2 ¯ X _ with σ ¯ X = .09 = .3 and ( ¯ X −µ)/.3 ∼ n(0, 1). Thus .9 = P _ −1.645 < ¯ X−µ .3 < 1.645 _ = P(−.4935 < ¯ X −µ < .4935). Thus, we again see the conservativeness of Chebychev’s Inequality, yielding bounds on ¯ X − µ that are almost twice as big as the normal approximation. Moreover, with a sample of size 100, ¯ X is probably very close to normally distributed, even if the underlying X distribution is not close to normal. 5.32 a. For any > 0, P ¸ ¸ _ X n a ¸ ¸ ¸ > _ = P ¸ ¸ _ X n a ¸ ¸ ¸ ¸ ¸ ¸ _ X n + a ¸ ¸ ¸ > ¸ ¸ ¸ _ X n + a ¸ ¸ ¸ _ = P _ |X n −a| > ¸ ¸ ¸ _ X n + a ¸ ¸ ¸ _ ≤ P _ |X n −a| > a _ →0, as n →∞, since X n → a in probability. Thus X n a in probability. b. For any > 0, P ¸ ¸ ¸ a X n −1 ¸ ¸ ¸ ¸ _ = P _ a 1+ ≤ X n a 1− _ = P _ a − a 1+ ≤ X n ≤ a + a 1− _ ≥ P _ a − a 1+ ≤ X n ≤ a + a 1+ _ _ a + a 1+ < a + a 1− _ = P _ |X n −a| ≤ a 1+ _ →1, as n →∞, since X n → a in probability. Thus a/X n →1 in probability. 5-12 Solutions Manual for Statistical Inference c. S 2 n → σ 2 in probability. By a), S n = _ S 2 n σ 2 = σ in probability. By b), σ/S n → 1 in probability. 5.33 For all > 0 there exist N such that if n > N, then P(X n +Y n > c) > 1 −. Choose N 1 such that P(X n > −m) > 1 −/2 and N 2 such that P(Y n > c +m) > 1 −/2. Then P(X n +Y n > c) ≥ P(X n > −m, +Y n > c +m) ≥ P(X n > −m) +P(Y n > c +m) −1 = 1 −. 5.34 Using E ¯ X n = µ and Var ¯ X n = σ 2 /n, we obtain E n( ¯ X n −µ) σ = n σ E( ¯ X n −µ) = n σ (µ −µ) = 0. Var n( ¯ X n −µ) σ = n σ 2 Var( ¯ X n −µ) = n σ 2 Var ¯ X = n σ 2 σ 2 n = 1. 5.35 a. X i ∼ exponential(1). µ X = 1, VarX = 1. From the CLT, ¯ X n is approximately n(1, 1/n). So ¯ X n −1 1/ n →Z ∼ n(0, 1) and P _ ¯ X n −1 1/ n ≤ x _ →P(Z ≤ x). b. d dx P(Z ≤ x) = d dx F Z (x) = f Z (x) = 1 e −x 2 /2 . d dx P _ ¯ X n −1 1/ n ≤ x _ = d dx _ n i=1 X i ≤ x n +n _ _ W = n i=1 X i ∼ gamma(n, 1) _ = d dx F W (x n +n) = f W (x n +n) · n = 1 Γ(n) (x n +n) n−1 e −(x n+n) n. Therefore, (1/Γ(n))(x n +n) n−1 e −(x n+n) n ≈ 1 e −x 2 /2 as n →∞. Substituting x = 0 yields n! ≈ n n+1/2 e −n 2π. 5.37 a. For the exact calculations, use the fact that V n is itself distributed negative binomial(10r, p). The results are summarized in the following table. Note that the recursion relation of problem 3.48 can be used to simplify calculations. P(V n = v) (a) (b) (c) v Exact Normal App. Normal w/cont. 0 .0008 .0071 .0056 1 .0048 .0083 .0113 2 .0151 .0147 .0201 3 .0332 .0258 .0263 4 .0572 .0392 .0549 5 .0824 .0588 .0664 6 .1030 .0788 .0882 7 .1148 .0937 .1007 8 .1162 .1100 .1137 9 .1085 .1114 .1144 10 .0944 .1113 .1024 Second Edition 5-13 b. Using the normal approximation, we have µ v = r(1 −p)/p = 20(.3)/.7 = 8.57 and σ v = _ r(1 −p)/p 2 = _ (20)(.3)/.49 = 3.5. Then, P(V n = 0) = 1 −P(V n ≥ 1) = 1 −P _ V n −8.57 3.5 1−8.57 3.5 _ = 1 −P(Z ≥ −2.16) = .0154. Another way to approximate this probability is P(V n = 0) = P(V n ≤ 0) = P _ V −8.57 3.5 0−8.57 3.5 _ = P(Z ≤ −2.45) = .0071. Continuing in this way we have P(V = 1) = P(V ≤ 1) −P(V ≤ 0) = .0154 −.0071 = .0083, etc. c. With the continuity correction, compute P(V = k) by P _ (k−.5)−8.57 3.5 ≤ Z ≤ (k+.5)−8.57 3.5 _ , so P(V = 0) = P (−9.07/3.5 ≤ Z ≤ −8.07/3.5) = .0104 − .0048 = .0056, etc. Notice that the continuity correction gives some improvement over the uncorrected normal approximation. 5.39 a. If h is continuous given > 0 there exits δ such that |h(x n )−h(x)| < for |x n −x| < δ. Since X 1 , . . . , X n converges in probability to the random variable X, then lim n→∞ P(|X n −X| < δ) = 1. Thus lim n→∞ P(|h(X n ) −h(X)| < ) = 1. b. Deﬁne the subsequence X j (s) = s +I [a,b] (s) such that in I [a,b] , a is always 0, i.e, the subse- quence X 1 , X 2 , X 4 , X 7 , . . .. For this subsequence X j (s) → _ s if s > 0 s + 1 if s = 0. 5.41 a. Let = |x −µ|. (i) For x −µ ≥ 0 P(|X n −µ| > ) = P(|X n −µ| > x −µ) = P(X n −µ < −(x −µ)) +P(X n −µ > x −µ) ≥ P(X n −µ > x −µ) = P(X n > x) = 1 −P(X n ≤ x). Therefore, 0 = lim n→∞ P(|X n −µ| > ) ≥ lim n→∞ 1−P(X n ≤ x). Thus lim n→∞ P(X n x) ≥ 1. (ii) For x −µ < 0 P(|X n −µ| > ) = P(|X n −µ| > −(x −µ)) = P(X n −µ < x −µ) +P(X n −µ > −(x −µ)) ≥ P(X n −µ < x −µ) = P(X n ≤ x). Therefore, 0 = lim n→∞ P(|X n −µ| > ) ≥ lim n→∞ P(X n ≤ x). By (i) and (ii) the results follows. b. For every > 0, P(|X n −µ| > ) ≤ P(X n −µ < −) +P(X n −µ > ) = P(X n < µ −) + 1 −P(X n ≤ µ +) → 0 as n →∞. 5-14 Solutions Manual for Statistical Inference 5.43 a. P(|Y n −θ| < ) = P ¸ ¸ _ (n)(Y n −θ) ¸ ¸ ¸ < _ (n) _ . Therefore, lim n→∞ P(|Y n −θ| < ) = lim n→∞ P ¸ ¸ _ (n)(Y n −θ) ¸ ¸ ¸ < _ (n) _ = P(|Z| < ∞) = 1, where Z ∼ n(0, σ 2 ). Thus Y n →θ in probability. b. By Slutsky’s Theorem (a), g (θ) n(Y n − θ) → g (θ)X where X ∼ n(0, σ 2 ). Therefore n[g(Y n ) −g(θ)] = g (θ) n(Y n −θ) →n(0, σ 2 [g (θ)] 2 ). 5.45 We do part (a), the other parts are similar. Using Mathematica, the exact calculation is In[120]:= f1[x_]=PDF[GammaDistribution[4,25],x] p1=Integrate[f1[x],{x,100,\[Infinity]}]//N 1-CDF[BinomialDistribution[300,p1],149] Out[120]= e^(-x/25) x^3/2343750 Out[121]= 0.43347 Out[122]= 0.0119389. The answer can also be simulated in Mathematica or in R. Here is the R code for simulating the same probability p1<-mean(rgamma(10000,4,scale=25)>100) mean(rbinom(10000, 300, p1)>149) In each case 10,000 random variables were simulated. We obtained p1 = 0.438 and a binomial probability of 0.0108. 5.47 a. −2 log(U j ) ∼ exponential(2) ∼ χ 2 2 . Thus Y is the sum of ν independent χ 2 2 random variables. By Lemma 5.3.2(b), Y ∼ χ 2 . b. β log(U j ) ∼ exponential(2) ∼ gamma(1, β). Thus Y is the sum of independent gamma random variables. By Example 4.6.8, Y ∼ gamma(a, β) c. Let V = a j=1 log(U j ) ∼ gamma(a, 1). Similarly W = b j=1 log(U j ) ∼ gamma(b, 1). By Exercise 4.24, V V +W ∼ beta(a, b). 5.49 a. See Example 2.1.4. b. X = g(U) = −log 1−U U . Then g −1 (x) = 1 1+e −y . Thus f X (x) = 1 × ¸ ¸ ¸ ¸ e −y (1 +e −y ) 2 ¸ ¸ ¸ ¸ = e −y (1 +e −y ) 2 −∞< y < ∞, which is the density of a logistic(0, 1) random variable. c. Let Y ∼ logistic(µ, β) then f Y (y) = 1 β f Z ( −(y−µ) β ) where f Z is the density of a logistic(0, 1). Then Y = βZ + µ. To generate a logistic(µ, β) random variable generate (i) generate U ∼ uniform(0, 1), (ii) Set Y = β log U 1−U +µ. 5.51 a. For U i ∼ uniform(0, 1), EU i = 1/2, VarU i = 1/12. Then X = 12 i=1 U i −6 = 12 ¯ U −6 = 12 _ ¯ U−1/2 1/ 12 _ Second Edition 5-15 is in the form n _ ( ¯ U−EU)/σ _ with n = 12, so X is approximately n(0, 1) by the Central Limit Theorem. b. The approximation does not have the same range as Z ∼ n(0, 1) where −∞ < Z < +∞, since −6 < X < 6. c. EX = E _ 12 i=1 U i −6 _ = 12 i=1 EU i −6 = _ 12 i=1 1 2 _ −6 = 6 −6 = 0. VarX = Var _ 12 i=1 U i −6 _ = Var 12 i=1 U i = 12VarU 1 = 1 EX 3 = 0 since X is symmetric about 0. (In fact, all odd moments of X are 0.) Thus, the ﬁrst three moments of X all agree with the ﬁrst three moments of a n(0, 1). The fourth moment is not easy to get, one way to do it is to get the mgf of X. Since Ee tU = (e t −1)/t, E _ e t _ 12 i=1 U i −6 _ _ = e −6t _ e t −1 t _ 12 = _ e t/2 −e −t/2 t _12 . Computing the fourth derivative and evaluating it at t = 0 gives us EX 4 . This is a lengthy calculation. The answer is EX 4 = 29/10, slightly smaller than EZ 4 = 3, where Z ∼ n(0, 1). 5.53 The R code is the following: a. obs <- rbinom(1000,8,2/3) meanobs <- mean(obs) variance <- var(obs) hist(obs) Output: > meanobs [1] 5.231 > variance [1] 1.707346 b. obs<- rhyper(1000,8,2,4) meanobs <- mean(obs) variance <- var(obs) hist(obs) Output: > meanobs [1] 3.169 > variance [1] 0.4488879 c. obs <- rnbinom(1000,5,1/3) meanobs <- mean(obs) variance <- var(obs) hist(obs) Output: > meanobs [1] 10.308 > variance [1] 29.51665 5-16 Solutions Manual for Statistical Inference 5.55 Let X denote the number of comparisons. Then EX = k=0 P(X > k) = 1 + k=1 P(U > F y (y k−1 )) = 1 + k=1 (1 −F y (y k−1 )) = 1 + k=0 (1 −F y (y i )) = 1 + EY 5.57 a. Cov(Y 1 , Y 2 ) = Cov(X 1 + X 3 , X 2 + X 3 ) = Cov(X 3 , X 3 ) = λ 3 since X 1 , X 2 and X 3 are independent. b. Z i = _ 1 if X i = X 3 = 0 0 otherwise p i = P(Z i = 0) = P(Y i = 0) = P(X i = 0, X 3 = 0) = e −(λ i 3 ) . Therefore Z i are Bernoulli(p i ) with E[Z i ] = p i , Var(Z i ) = p i (1 −p i ) and E[Z 1 Z 2 ] = P(Z 1 = 1, Z 2 = 1) = P(Y 1 = 0, Y 2 = 0) = P(X 1 +X 3 = 0, X 2 +X 3 = 0) = P(X 1 = 0)P(X 2 = 0)P(X 3 = 0) = e −λ 1 e −λ 2 e −λ 3 . Therefore, Cov(Z 1 , Z 2 ) = E[Z 1 Z 2 ] −E[Z 1 ]E[Z 2 ] = e −λ 1 e −λ 2 e −λ 3 −e −(λ i 3 ) e −(λ 2 3 ) = e −(λ i 3 ) e −(λ 2 3 ) (e λ 3 −1) = p 1 p 2 (e λ 3 −1). Thus Corr(Z 1 , Z 2 ) = p 1 p 2 (e λ 3 −1) p 1 (1−p 1 ) p 2 (1−p 2 ) . c. E[Z 1 Z 2 ] ≤ p i , therefore Cov(Z 1 , Z 2 ) = E[Z 1 Z 2 ] −E[Z 1 ]E[Z 2 ] ≤ p 1 −p 1 p 2 = p 1 (1 −p 2 ), and Cov(Z 1 , Z 2 ) ≤ p 2 (1 −p 1 ). Therefore, Corr(Z 1 , Z 2 ) ≤ p 1 (1 −p 2 ) _ p 1 (1 −p 1 ) _ p 2 (1 −p 2 ) = _ p 1 (1 −p 2 ) _ p 2 (1 −p 1 ) and Corr(Z 1 , Z 2 ) ≤ p 2 (1 −p 1 ) _ p 1 (1 −p 1 ) _ p 2 (1 −p 2 ) = _ p 2 (1 −p 1 ) _ p 1 (1 −p 2 ) which implies the result. 5.59 P(Y ≤ y) = P(V ≤ y|U < 1 c f Y (V )) = P(V ≤ y, U < 1 c f Y (V )) P(U < 1 c f Y (V )) = _ y 0 _ 1 c f Y (v) 0 dudv 1 c = 1 c _ y 0 f Y (v)dv 1 c = _ y 0 f Y (v)dv 5.61 a. M = sup y Γ(a+b) Γ(a)Γ(b) y a−1 (1−y) b−1 Γ([a]+[b]) Γ([a])Γ([b]) y [a]−1 (1−y) [b]−1 < ∞, since a −[a] > 0 and b −[b] > 0 and y ∈ (0, 1). Second Edition 5-17 b. M = sup y Γ(a+b) Γ(a)Γ(b) y a−1 (1−y) b−1 Γ([a]+b) Γ([a])Γ(b) y [a]−1 (1−y) b−1 < ∞, since a −[a] > 0 and y ∈ (0, 1). c. M = sup y Γ(a+b) Γ(a)Γ(b) y a−1 (1−y) b−1 Γ([a]+1+β) Γ([a]+1)Γ(b ) y [a]+1−1 (1−y) b −1 < ∞, since a − [a] − 1 < 0 and y ∈ (0, 1). b − b > 0 when b = [b] and will be equal to zero when b = b, thus it does not aﬀect the result. d. Let f(y) = y α (1 −y) β . Then df(y) dy = αy α−1 (1 −y) β −y α β(1 −y) β−1 = y α−1 (1 −y) β−1 [α(1 −y) +βy] which is maximize at y = α α+β . Therefore for, α = a −a and β = b −b M = Γ(a+b) Γ(a)Γ(b) Γ(a +b ) Γ(a )Γ(b ) _ a −a a −a +b −b _ a−a _ b −b a −a +b −b _ b−b . We need to minimize M in a and b . First consider _ a−a a−a +b−b _ a−a _ b−b a−a +b−b _ b−b . Let c = α + β, then this term becomes _ α c _ α _ c−α c _ c−α . This term is maximize at α c = 1 2 , this is at α = 1 2 c. Then M = ( 1 2 ) (a−a +b−b ) Γ(a+b) Γ(a)Γ(b) Γ(a +b ) Γ(a )Γ(b ) . Note that the minimum that M could be is one, which it is attain when a = a and b = b . Otherwise the minimum will occur when a − a and b − b are minimum but greater or equal than zero, this is when a = [a] and b = [b] or a = a and b = [b] or a = [a] and b = b. 5.63 M = sup y 1 e −y 2 2 1 e −|y| λ . Let f(y) = −y 2 2 + |y| λ . Then f(y) is maximize at y = 1 λ when y ≥ 0 and at y = −1 λ when y < 0. Therefore in both cases M = 1 e −1 2 1 e −1 λ 2 . To minimize M let M = λe 1 2 . Then d log M = 1 λ 1 λ 3 , therefore M is minimize at λ = 1 or λ = −1. Thus the value of λ that will optimize the algorithm is λ = 1. 5.65 P(X ≤ x) = m i=1 P(X ≤ x|q i )q i = m i=1 I(Y i ≤ x)q i = 1 m m i=1 f(Y i ) g(Y i ) I(Y i ≤ x) 1 m m i=1 f(Y i ) g(Y i ) −→ m→∞ E g f(Y ) g(Y ) I(Y ≤ x) E g f(Y ) g(Y ) = _ x −∞ f(y) g(y) g(y)dy _ −∞ f(y) g(y) g(y)dy = _ x −∞ f(y)dy. 5.67 An R code to generate the sample of size 100 from the speciﬁed distribution is shown for part c). The Metropolis Algorithm is used to generate 2000 variables. Among other options one can choose the 100 variables in positions 1001 to 1100 or the ones in positions 1010, 1020, ..., 2000. a. We want to generate X = σZ + µ where Z ∼ Student’s t with ν degrees of freedom. Therefore we ﬁrst can generate a sample of size 100 from a Student’s t distribution with ν degrees of freedom and then make the transformation to obtain the X’s. Thus f Z (z) = Γ( ν+1 2 ) Γ( ν 2 ) 1 νπ 1 _ 1+ _ z 2 ν __ (v+1)/2 . Let V ∼ n(0, ν ν−2 ) since given ν we can set EV = EZ = 0, and Var(V ) = Var(Z) = ν ν −2 . Now, follow the algorithm on page 254 and generate the sample Z 1 , Z 2 . . . , Z 100 and then calculate X i = σZ i +µ. 5-18 Solutions Manual for Statistical Inference b. f X (x) = 1 2πσ e −(log x−µ) 2 /2σ 2 x . Let V ∼ gamma(α, β) where α = (e µ+(σ 2 /2) ) 2 e 2(µ+σ 2 ) −e 2µ+σ 2 , and β = e 2(µ+σ 2 ) −e 2µ+σ 2 e µ+(σ 2 /2) , since given µ and σ 2 we can set EV = αβ = e µ+(σ 2 /2) = EX and Var(V ) = αβ 2 = e 2(µ+σ 2 ) −e 2µ+σ 2 = Var(X). Now, follow the algorithm on page 254. c. f X (x) = α β e −x α β x α−1 . Let V ∼ exponential(β). Now, follow the algorithm on page 254 where ρ i = min _ V α−1 i Z α−1 i−1 e −V α i +V i −Z i−1 +Z α i−1 β , 1 _ An R code to generate a sample size of 100 from a Weibull(3,2) is: #initialize a and b b <- 2 a <- 3 Z <- rexp(1,1/b) ranvars <- matrix(c(Z),byrow=T,ncol=1) for( i in seq(2000)) { U <- runif(1,min=0,max=1) V <- rexp(1,1/b) p <- pmin((V/Z)^(a-1)*exp((-V^a+V-Z+Z^a)/b),1) if (U <= p) Z <- V ranvars <- cbind(ranvars,Z) } #One option: choose elements in position 1001,1002,...,1100 to be the sample vector.1 <- ranvars[1001:1100] mean(vector.1) var(vector.1) #Another option: choose elements in position 1010,1020,...,2000 to be the sample vector.2 <- ranvars[seq(1010,2000,10)] mean(vector.2) var(vector.2) Output: [1] 1.048035 [1] 0.1758335 [1] 1.130649 [1] 0.1778724 5.69 Let w(v, z) = f Y (v)f V (z) f V (v)f Y (z) , and then ρ(v, z) = min{w(v, z), 1}. We will show that Z i ∼ f Y ⇒P(Z i+1 ≤ a) = P(Y ≤ a). Second Edition 5-19 Write P(Z i+1 ≤ a) = P(V i+1 ≤ a and U i+1 ≤ ρ i+1 ) +P(Z i ≤ a and U i+1 > ρ i+1 ). Since Z i ∼ f Y , suppressing the unnecessary subscripts we can write P(Z i+1 ≤ a) = P(V ≤ a and U ≤ ρ(V, Y )) +P(Y ≤ a and U > ρ(V, Y )). Add and subtract P(Y ≤ a and U ≤ ρ(V, Y )) to get P(Z i+1 ≤ a) = P(Y ≤ a) +P(V ≤ a and U ≤ ρ(V, Y )) −P(Y ≤ a and U ≤ ρ(V, Y )). Thus we need to show that P(V ≤ a and U ≤ ρ(V, Y )) = P(Y ≤ a and U ≤ ρ(V, Y )). Write out the probability as P(V ≤ a and U ≤ ρ(V, Y )) = _ a −∞ _ −∞ ρ(v, y)f Y (y)f V (v)dydv = _ a −∞ _ −∞ I(w(v, y) ≤ 1) _ f Y (v)f V (y) f V (v)f Y (y) _ f Y (y)f V (v)dydv + _ a −∞ _ −∞ I(w(v, y) ≥ 1)f Y (y)f V (v)dydv = _ a −∞ _ −∞ I(w(v, y) ≤ 1)f Y (v)f V (y)dydv + _ a −∞ _ −∞ I(w(v, y) ≥ 1)f Y (y)f V (v)dydv. Now, notice that w(v, y) = 1/w(y, v), and thus ﬁrst term above can be written _ a −∞ _ −∞ I(w(v, y) ≤ 1)f Y (v)f V (y)dydv = _ a −∞ _ −∞ I(w(y, v) > 1)f Y (v)f V (y)dydv = P(Y ≤ a, ρ(V, Y ) = 1, U ≤ ρ(V, Y )). The second term is _ a −∞ _ −∞ I(w(v, y) ≥ 1)f Y (y)f V (v)dydv = _ a −∞ _ −∞ I(w(y, v) ≤ 1)f Y (y)f V (v)dydv = _ a −∞ _ −∞ I(w(y, v) ≤ 1) _ f V (y)f Y (v) f V (y)f Y (v) _ f Y (y)f V (v)dydv = _ a −∞ _ −∞ I(w(y, v) ≤ 1) _ f Y (y)f V (v) f V (y)f Y (v) _ f V (y)f Y (v)dydv = _ a −∞ _ −∞ I(w(y, v) ≤ 1)w(y, v)f V (y)f Y (v)dydv = P(Y ≤ a, U ≤ ρ(V, Y ), ρ(V, Y ) ≤ 1). 5-20 Solutions Manual for Statistical Inference Putting it all together we have P(V ≤ a and U ≤ ρ(V, Y )) = P(Y ≤ a, ρ(V, Y ) = 1, U ≤ ρ(V, Y )) +P(Y ≤ a, U ≤ ρ(V, Y ), ρ(V, Y ) ≤ 1) = P(Y ≤ a and U ≤ ρ(V, Y )), and hence P(Z i+1 ≤ a) = P(Y ≤ a), so f Y is the stationary density. Chapter 6 Principles of Data Reduction 6.1 By the Factorization Theorem, |X| is suﬃcient because the pdf of X is f(x|σ 2 ) = 1 2πσ e −x 2 /2σ 2 = 1 2πσ e −|x| 2 /2σ 2 = g( |x|| σ 2 ) · 1 .¸¸. h(x) . 6.2 By the Factorization Theorem, T(X) = min i (X i /i) is suﬃcient because the joint pdf is f(x 1 , . . . , x n |θ) = n i=1 e iθ−x i I (iθ,+∞) (x i ) = e inθ I (θ,+∞) (T(x)) . ¸¸ . g(T(x)|θ) · e −Σ i x i . ¸¸ . h(x) . Notice, we use the fact that i > 0, and the fact that all x i s > iθ if and only if min i (x i /i) > θ. 6.3 Let x (1) = min i x i . Then the joint pdf is f(x 1 , . . . , x n |µ, σ) = n i=1 1 σ e −(x i −µ)/σ I (µ,∞) (x i ) = _ e µ/σ σ _n e −Σ i x i I (µ,∞) (x (1) ) . ¸¸ . g( x (1) i x i|µ,σ) · 1 .¸¸. h(x) . Thus, by the Factorization Theorem, _ X (1) , i X i _ is a suﬃcient statistic for (µ, σ). 6.4 The joint pdf is n j=1 _ h(x j )c(θ) exp _ k i=1 w i (θ)t i (x j ) __ = c(θ) n exp _ _ k i=1 w i (θ) n j=1 t i (x j ) _ _ . ¸¸ . g(T(x)|θ) · n j=1 h(x j ) . ¸¸ . h(x) . By the Factorization Theorem, _ n j=1 t 1 (X j ), . . . , n j=1 t k (X j ) _ is a suﬃcient statistic for θ. 6.5 The sample density is given by n i=1 f(x i |θ) = n i=1 1 2iθ I (−i(θ −1) ≤ x i ≤ i(θ + 1)) = _ 1 _ n _ n i=1 1 i _ I _ min x i i ≥ −(θ −1) _ I _ max x i i ≤ θ + 1 _ . Thus (min X i /i, max X i /i) is suﬃcient for θ. 6-2 Solutions Manual for Statistical Inference 6.6 The joint pdf is given by f(x 1 , . . . , x n |α, β) = n i=1 1 Γ(α)β α x i α−1 e −x i = _ 1 Γ(α)β α _ n _ n i=1 x i _ α−1 e −Σ i x i . By the Factorization Theorem, ( n i=1 X i , n i=1 X i ) is suﬃcient for (α, β). 6.7 Let x (1) = min i {x 1 , . . . , x n }, x (n) = max i {x 1 , . . . , x n }, y (1) = min i {y 1 , . . . , y n } and y (n) = max i {y 1 , . . . , y n }. Then the joint pdf is f(x, y|θ) = n i=1 1 3 −θ 1 )(θ 4 −θ 2 ) I 1 3 ) (x i )I 2 4 ) (y i ) = _ 1 3 −θ 1 )(θ 4 −θ 2 ) _ n I 1 ,∞) (x (1) )I (−∞,θ 3 ) (x (n) )I 2 ,∞) (y (1) )I (−∞,θ 4 ) (y (n) ) . ¸¸ . g(T(x)|θ) · 1 .¸¸. h(x) . By the Factorization Theorem, _ X (1) , X (n) , Y (1) , Y (n) _ is suﬃcient for (θ 1 , θ 2 , θ 3 , θ 4 ). 6.9 Use Theorem 6.2.13. a. f(x|θ) f(y|θ) = (2π) −n/2 e −Σ i (x i −θ) 2 /2 (2π) −n/2 e −Σ i (y i −θ) 2 /2 = exp _ 1 2 __ n i=1 x 2 i n i=1 y 2 i _ +2θn(¯ y−¯ x) __ . This is constant as a function of θ if and only if ¯ y = ¯ x ; therefore ¯ X is a minimal suﬃcient statistic for θ. b. Note, for X ∼ location exponential(θ), the range depends on the parameter. Now f(x|θ) f(y|θ) = n i=1 _ e −(x i −θ) I (θ,∞) (x i ) _ n i=1 _ e −(y i −θ) I (θ,∞) (y i ) _ = e e −Σ i x i n i=1 I (θ,∞) (x i ) e e −Σ i y i n i=1 I (θ,∞) (y i ) = e −Σ i x i I (θ,∞) (min x i ) e −Σ i y i I (θ,∞) (min y i ) . To make the ratio independent of θ we need the ratio of indicator functions independent of θ. This will be the case if and only if min{x 1 , . . . , x n } = min{y 1 , . . . , y n }. So T(X) = min{X 1 , . . . , X n } is a minimal suﬃcient statistic. c. f(x|θ) f(y|θ) = e −Σ i (x i −θ) n i=1 _ 1 +e −(x i −θ) _ 2 n i=1 _ 1 +e −(y i −θ) _ 2 e −Σ i (y i −θ) = e −Σ i (y i −x i ) _ n i=1 _ 1 +e −(y i −θ) _ n i=1 _ 1 +e −(x i −θ) _ _ 2 . This is constant as a function of θ if and only if x and y have the same order statistics. Therefore, the order statistics are minimal suﬃcient for θ. d. This is a diﬃcult problem. The order statistics are a minimal suﬃcient statistic. Second Edition 6-3 e. Fix sample points x and y. Deﬁne A(θ) = {i : x i ≤ θ}, B(θ) = {i : y i ≤ θ}, a(θ) = the number of elements in A(θ) and b(θ) = the number of elements in B(θ). Then the function f(x|θ)/f(y|θ) depends on θ only through the function n i=1 | x i −θ | − n i=1 | y i −θ | = i∈A(θ) (θ −x i ) + i∈A(θ) c (x i −θ) − i∈B(θ) (θ −y i ) − i∈B(θ) c (y i −θ) = (a(θ) −[n −a(θ)] −b(θ) + [n −b(θ)])θ + _ _ i∈A(θ) x i + i∈A(θ) c x i + i∈B(θ) y i i∈B(θ) c y i _ _ = 2(a(θ) −b(θ))θ + _ _ i∈A(θ) x i + i∈A(θ) c x i + i∈B(θ) y i i∈B(θ) c y i _ _ . Consider an interval of θs that does not contain any x i s or y i s. The second term is constant on such an interval. The ﬁrst term will be constant, on the interval if and only if a(θ) = b(θ). This will be true for all such intervals if and only if the order statistics for x are the same as the order statistics for y. Therefore, the order statistics are a minimal suﬃcient statistic. 6.10 To prove T(X) = (X (1) , X (n) ) is not complete, we want to ﬁnd g[T(X)] such that Eg[T(X)] = 0 for all θ, but g[T(X)] ≡ 0 . A natural candidate is R = X (n) −X (1) , the range of X, because by Example 6.2.17 its distribution does not depend on θ. From Example 6.2.17, R ∼ beta(n−1, 2). Thus ER = (n − 1)/(n + 1) does not depend on θ, and E(R − ER) = 0 for all θ. Thus g[X (n) , X (1) ] = X (n) −X (1) −(n−1)/(n+1) = R−ER is a nonzero function whose expected value is always 0. So, (X (1) , X (n) ) is not complete. This problem can be generalized to show that if a function of a suﬃcient statistic is ancillary, then the suﬃcient statistic is not complete, because the expectation of that function does not depend on θ. That provides the opportunity to construct an unbiased, nonzero estimator of zero. 6.11 a. These are all location families. Let Z (1) , . . . , Z (n) be the order statistics from a random sample of size n from the standard pdf f(z|0). Then (Z (1) + θ, . . . , Z (n) + θ) has the same joint distribution as (X (1) , . . . , X (n) ), and (Y (1) , . . . , Y (n−1) ) has the same joint distribution as (Z (n) + θ − (Z (1) + θ), . . . , Z (n) + θ − (Z (n−1) + θ)) = (Z (n) − Z (1) , . . . , Z (n) − Z (n−1) ). The last vector depends only on (Z 1 , . . . , Z n ) whose distribution does not depend on θ. So, (Y (1) , . . . , Y (n−1) ) is ancillary. b. For a), Basu’s lemma shows that (Y 1 , . . . ,Y n−1 ) is independent of the complete suﬃcient statistic. For c), d), and e) the order statistics are suﬃcient, so (Y 1 , . . . ,Y n−1 ) is not inde- pendent of the suﬃcient statistic. For b), X (1) is suﬃcient. Deﬁne Y n = X (1) . Then the joint pdf of (Y 1 , . . . ,Y n ) is f(y 1 , . . . , y n ) = n!e −n(y 1 −θ) e −(n−1)y n n−1 i=2 e y i , 0 < y n−1 < y n−2 < · · · < y 1 0 < y n < ∞. Thus, Y n = X (1) is independent of (Y 1 , . . . , Y n−1 ). 6.12 a. Use Theorem 6.2.13 and write f(x, n|θ) f(y, n |θ) = f(x|θ, N = n)P(N = n) f(y|θ, N = n )P(N = n ) = _ n x _ θ x (1−θ) n−x p n _ n y _ θ y (1−θ) n −y p n = θ x−y (1 −θ) n−n −x+y _ n x _ p n _ n y _ p n . 6-4 Solutions Manual for Statistical Inference The last ratio does not depend on θ. The other terms are constant as a function of θ if and only if n = n and x = y. So (X, N) is minimal suﬃcient for θ. Because P(N = n) = p n does not depend on θ, N is ancillary for θ. The point is that although N is independent of θ, the minimal suﬃcient statistic contains N in this case. A minimal suﬃcient statistic may contain an ancillary statistic. b. E _ X N _ = E _ E _ X N ¸ ¸ ¸ ¸ N __ = E _ 1 N E(X | N) _ = E _ 1 N _ = E(θ) = θ. Var _ X N _ = Var _ E _ X N ¸ ¸ ¸ ¸ N __ + E _ Var _ X N ¸ ¸ ¸ ¸ N __ = Var(θ) + E _ 1 N 2 Var (X | N) _ = 0 + E _ Nθ(1−θ) N 2 _ = θ(1 −θ)E _ 1 N _ . We used the fact that X|N ∼ binomial(N, θ). 6.13 Let Y 1 = log X 1 and Y 2 = log X 2 . Then Y 1 and Y 2 are iid and, by Theorem 2.1.5, the pdf of each is f(y|α) = αexp {αy −e αy } = 1 1/α exp _ y 1/α −e y/(1/α) _ , −∞< y < ∞. We see that the family of distributions of Y i is a scale family with scale parameter 1/α. Thus, by Theorem 3.5.6, we can write Y i = 1 α Z i , where Z 1 and Z 2 are a random sample from f(z|1). Then logX 1 logX 2 = Y 1 Y 2 = (1/α)Z 1 (1/α)Z 2 = Z 1 Z 2 . Because the distribution of Z 1 /Z 2 does not depend on α, (log X 1 )/(log X 2 ) is an ancillary statistic. 6.14 Because X 1 , . . . , X n is from a location family, by Theorem 3.5.6, we can write X i = Z i +µ, where Z 1 , . . . , Z n is a random sample from the standard pdf, f(z), and µ is the location parameter. Let M(X) denote the median calculated from X 1 , . . . , X n . Then M(X) = M(Z)+µ and ¯ X = ¯ Z+µ. Thus, M(X) − ¯ X = (M(Z) + µ) −( ¯ Z + µ) = M(Z) − ¯ Z. Because M(X) − ¯ X is a function of only Z 1 , . . . , Z n , the distribution of M(X) − ¯ X does not depend on µ; that is, M(X) − ¯ X is an ancillary statistic. 6.15 a. The parameter space consists only of the points (θ, ν) on the graph of the function ν = aθ 2 . This quadratic graph is a line and does not contain a two-dimensional open set. b. Use the same factorization as in Example 6.2.9 to show ( ¯ X, S 2 ) is suﬃcient. E(S 2 ) = aθ 2 and E( ¯ X 2 ) = Var ¯ X + (E ¯ X) 2 = aθ 2 /n +θ 2 = (a +n)θ 2 /n. Therefore, E _ n a +n ¯ X 2 S 2 a _ = _ n a +n __ a +n n θ 2 _ 1 a 2 = 0, for all θ. Thus g( ¯ X, S 2 ) = n a+n ¯ X 2 S 2 a has zero expectation so ( ¯ X, S 2 ) not complete. 6.17 The population pmf is f(x|θ) = θ(1−θ) x−1 = θ 1−θ e log(1−θ)x , an exponential family with t(x) = x. Thus, i X i is a complete, suﬃcient statistic by Theorems 6.2.10 and 6.2.25. i X i −n ∼ negative binomial(n, θ). 6.18 The distribution of Y = i X i is Poisson(nλ). Now Eg(Y ) = y=0 g(y) (nλ) y e −nλ y! . If the expectation exists, this is an analytic function which cannot be identically zero. Second Edition 6-5 6.19 To check if the family of distributions of X is complete, we check if E p g(X) = 0 for all p, implies that g(X) ≡ 0. For Distribution 1, E p g(X) = 2 x=0 g(x)P(X = x) = pg(0) + 3pg(1) + (1 −4p)g(2). Note that if g(0) = −3g(1) and g(2) = 0, then the expectation is zero for all p, but g(x) need not be identically zero. Hence the family is not complete. For Distribution 2 calculate E p g(X) = g(0)p +g(1)p 2 +g(2)(1 −p −p 2 ) = [g(1) −g(2)]p 2 + [g(0) −g(2)]p +g(2). This is a polynomial of degree 2 in p. To make it zero for all p each coeﬃcient must be zero. Thus, g(0) = g(1) = g(2) = 0, so the family of distributions is complete. 6.20 The pdfs in b), c), and e) are exponential families, so they have complete suﬃcient statistics from Theorem 6.2.25. For a), Y = max{X i } is suﬃcient and f(y) = 2n θ 2n y 2n−1 , 0 < y < θ. For a function g(y), Eg(Y ) = _ θ 0 g(y) 2n θ 2n y 2n−1 dy = 0 for all θ implies g(θ) 2nθ 2n−1 θ 2n = 0 for all θ by taking derivatives. This can only be zero if g(θ) = 0 for all θ, so Y = max{X i } is complete. For d), the order statistics are minimal suﬃcient. This is a location family. Thus, by Example 6.2.18 the range R = X (n) − X (1) is ancillary, and its expectation does not depend on θ. So this suﬃcient statistic is not complete. 6.21 a. X is suﬃcient because it is the data. To check completeness, calculate Eg(X) = θ 2 g(−1) + (1 −θ)g(0) + θ 2 g(1). If g(−1) = g(1) and g(0) = 0, then Eg(X) = 0 for all θ, but g(x) need not be identically 0. So the family is not complete. b. |X| is suﬃcient by Theorem 6.2.6, because f(x|θ) depends on x only through the value of |x|. The distribution of |X| is Bernoulli, because P(|X| = 0) = 1 − θ and P(|X| = 1) = θ. By Example 6.2.22, a binomial family (Bernoulli is a special case) is complete. c. Yes, f(x|θ) = (1 − θ)(θ/(2(1 − θ)) |x| = (1 − θ)e |x|log[θ/(2(1−θ)] , the form of an exponential family. 6.22 a. The sample density is i θx θ−1 i = θ n ( i x i ) θ−1 , so i X i is suﬃcient for θ, not i X i . b. Because i f(x i |θ) = θ n e (θ−1) log(Π i x i ) , log ( i X i ) is complete and suﬃcient by Theorem 6.2.25. Because i X i is a one-to-one function of log ( i X i ), i X i is also a complete suﬃcient statistic. 6.23 Use Theorem 6.2.13. The ratio f(x|θ) f(y|θ) = θ −n I (x (n) /2,x (1) ) (θ) θ −n I (y (n) /2,y (1) ) (θ) is constant (in fact, one) if and only if x (1) = y (1) and x (n) = y (n) . So (X (1) , X (n) ) is a minimal suﬃcient statistic for θ. From Exercise 6.10, we know that if a function of the suﬃcient statistics is ancillary, then the suﬃcient statistic is not complete. The uniform(θ, 2θ) family is a scale family, with standard pdf f(z) ∼ uniform(1, 2). So if Z 1 , . . . , Z n is a random sample 6-6 Solutions Manual for Statistical Inference from a uniform(1, 2) population, then X 1 = θZ 1 , . . . , X n = θZ n is a random sample from a uniform(θ, 2θ) population, and X (1) = θZ (1) and X (n) = θZ (n) . So X (1) /X (n) = Z (1) /Z (n) , a statistic whose distribution does not depend on θ. Thus, as in Exercise 6.10, (X (1) , X (n) ) is not complete. 6.24 If λ = 0, Eh(X) = h(0). If λ = 1, Eh(X) = e −1 h(0) +e −1 x=1 h(x) x! . Let h(0) = 0 and x=1 h(x) x! = 0, so Eh(X) = 0 but h(x) ≡ 0. (For example, take h(0) = 0, h(1) = 1, h(2) = −2, h(x) = 0 for x ≥ 3 .) 6.25 Using the fact that (n − 1)s 2 x = i x 2 i − n¯ x 2 , for any (µ, σ 2 ) the ratio in Example 6.2.14 can be written as f(x|µ, σ 2 ) f(y|µ, σ 2 ) = exp _ µ σ 2 _ i x i i y i _ 1 2 _ i x 2 i i y 2 i __ . a. Do part b) ﬁrst showing that i X 2 i is a minimal suﬃcient statistic. Because _ i X i , i X 2 i _ is not a function of i X 2 i , by Deﬁnition 6.2.11 _ i X i , i X 2 i _ is not minimal. b. Substituting σ 2 = µ in the above expression yields f(x|µ, µ) f(y|µ, µ) = exp _ i x i i y i _ exp _ 1 _ i x 2 i i y 2 i __ . This is constant as a function of µ if and only if i x 2 i = i y 2 i . Thus, i X 2 i is a minimal suﬃcient statistic. c. Substituting σ 2 = µ 2 in the ﬁrst expression yields f(x|µ, µ 2 ) f(y|µ, µ 2 ) = exp _ 1 µ _ i x i i y i _ 1 2 _ i x 2 i i y 2 i __ . This is constant as a function of µ if and only if i x i = i y i and i x 2 i = i y 2 i . Thus, _ i X i , i X 2 i _ is a minimal suﬃcient statistic. d. The ﬁrst expression for the ratio is constant a function of µ and σ 2 if and only if i x i = i y i and i x 2 i = i y 2 i . Thus, _ i X i , i X 2 i _ is a minimal suﬃcient statistic. 6.27 a. This pdf can be written as f(x|µ, λ) = _ λ _ 1/2 _ 1 x 3 _ 1/2 exp _ λ µ _ exp _ λ 2 x − λ 2 1 x _ . This is an exponential family with t 1 (x) = x and t 2 (x) = 1/x. By Theorem 6.2.25, the statistic ( i X i , i (1/X i )) is a complete suﬃcient statistic. ( ¯ X, T) given in the problem is a one-to-one function of ( i X i , i (1/X i )). Thus, ( ¯ X, T) is also a complete suﬃcient statistic. b. This can be accomplished using the methods from Section 4.3 by a straightforward but messy two-variable transformation U = (X 1 +X 2 )/2 and V = 2λ/T = λ[(1/X 1 ) +(1/X 2 ) − (2/[X 1 +X 2 ])]. This is a two-to-one transformation. Second Edition 6-7 6.29 Let f j = logistic(α j , β j ), j = 0, 1, . . . , k. From Theorem 6.6.5, the statistic T(x) = _ n i=1 f 1 (x i ) n i=1 f 0 (x i ) , . . . , n i=1 f k (x i ) n i=1 f 0 (x i ) _ = _ n i=1 f 1 (x (i) ) n i=1 f 0 (x (i) ) , . . . , n i=1 f k (x (i) ) n i=1 f 0 (x (i) ) _ is minimal suﬃcient for the family {f 0 , f 1 , . . . , f k }. As T is a 1 − 1 function of the order statistics, the order statistics are also minimal suﬃcient for the family {f 0 , f 1 , . . . , f k }. If F is a nonparametric family, f j ∈ F, so part (b) of Theorem 6.6.5 can now be directly applied to show that the order statistics are minimal suﬃcient for F. 6.30 a. From Exercise 6.9b, we have that X (1) is a minimal suﬃcient statistic. To check completeness compute f Y 1 (y), where Y 1 = X (1) . From Theorem 5.4.4 we have f Y 1 (y) = f X (y) (1−F X (y)) n−1 n = e −(y−µ) _ e −(y−µ) _ n−1 n = ne −n(y−µ) , y > µ. Now, write E µ g(Y 1 ) = _ µ g(y)ne −n(y−µ) dy. If this is zero for all µ, then _ µ g(y)e −ny dy = 0 for all µ (because ne > 0 for all µ and does not depend on y). Moreover, 0 = d __ µ g(y)e −ny dy _ = −g(µ)e −nµ for all µ. This implies g(µ) = 0 for all µ, so X (1) is complete. b. Basu’s Theorem says that if X (1) is a complete suﬃcient statistic for µ, then X (1) is inde- pendent of any ancillary statistic. Therefore, we need to show only that S 2 has distribution independent of µ; that is, S 2 is ancillary. Recognize that f(x|µ) is a location family. So we can write X i = Z i +µ, where Z 1 , . . . , Z n is a random sample from f(x|0). Then S 2 = 1 n −1 (X i ¯ X) 2 = 1 n −1 ((Z i +µ) −( ¯ Z +µ)) 2 = 1 n −1 (Z i ¯ Z) 2 . Because S 2 is a function of only Z 1 , . . . , Z n , the distribution of S 2 does not depend on µ; that is, S 2 is ancillary. Therefore, by Basu’s theorem, S 2 is independent of X (1) . 6.31 a. (i) By Exercise 3.28 this is a one-dimensional exponential family with t(x) = x. By Theorem 6.2.25, i X i is a complete suﬃcient statistic. ¯ X is a one-to-one function of i X i , so ¯ X is also a complete suﬃcient statistic. From Theorem 5.3.1 we know that (n − 1)S 2 2 ∼ χ 2 n−1 = gamma((n −1)/2, 2). S 2 = [σ 2 /(n −1)][(n −1)S 2 2 ], a simple scale transformation, has a gamma((n−1)/2, 2σ 2 /(n−1)) distribution, which does not depend on µ; that is, S 2 is ancillary. By Basu’s Theorem, ¯ X and S 2 are independent. (ii) The independence of ¯ X and S 2 is determined by the joint distribution of ( ¯ X, S 2 ) for each value of (µ, σ 2 ). By part (i), for each value of (µ, σ 2 ), ¯ X and S 2 are independent. b. (i) µ is a location parameter. By Exercise 6.14, M − ¯ X is ancillary. As in part (a) ¯ X is a complete suﬃcient statistic. By Basu’s Theorem, ¯ X and M− ¯ X are independent. Because they are independent, by Theorem 4.5.6 Var M = Var(M− ¯ X+ ¯ X) = Var(M− ¯ X)+Var ¯ X. (ii) If S 2 is a sample variance calculated from a normal sample of size N, (N − 1)S 2 2 χ 2 N−1 . Hence, (N − 1) 2 Var S 2 /(σ 2 ) 2 = 2(N − 1) and Var S 2 = 2(σ 2 ) 2 /(N − 1). Both M and M − ¯ X are asymptotically normal, so, M 1 , . . . , M N and M 1 ¯ X 1 , . . . , M N ¯ X N are each approximately normal samples if n is reasonable large. Thus, using the above expression we get the two given expressions where in the straightforward case σ 2 refers to Var M, and in the swindle case σ 2 refers to Var(M − ¯ X). c. (i) E(X k ) = E _ X Y Y _ k = E _ _ X Y _ k _ Y k _ _ indep. = E _ X Y _ k E _ Y k _ . Divide both sides by E _ Y k _ to obtain the desired equality. 6-8 Solutions Manual for Statistical Inference (ii) If α is ﬁxed, T = i X i is a complete suﬃcient statistic for β by Theorem 6.2.25. Because β is a scale parameter, if Z 1 , . . . , Z n is a random sample from a gamma(α, 1) distribution, then X (i) /T has the same distribution as (βZ (i) )/ (β i Z i ) = Z (i) / ( i Z i ), and this distribution does not depend on β. Thus, X (i) /T is ancillary, and by Basu’s Theorem, it is independent of T. We have E(X (i) |T) = E _ X (i) T T ¸ ¸ ¸ ¸ T _ = TE _ X (i) T ¸ ¸ ¸ ¸ T _ indep. = TE _ X (i) T _ part (i) = T E(X (i) ) ET . Note, this expression is correct for each ﬁxed value of (α, β), regardless whether α is “known” or not. 6.32 In the Formal Likelihood Principle, take E 1 = E 2 = E. Then the conclusion is Ev(E, x 1 ) = Ev(E, x 2 ) if L(θ|x 1 )/L(θ|x 2 ) = c. Thus evidence is equal whenever the likelihood functions are equal, and this follows from Formal Suﬃciency and Conditionality. 6.33 a. For all sample points except (2, x 2 ) (but including (1, x 1 )), T(j, x j ) = (j, x j ). Hence, g(T(j, x j )|θ)h(j, x j ) = g((j, x j )|θ)1 = f ((j, x j )|θ). For (2, x 2 ) we also have g(T(2, x 2 )|θ)h(2, x 2 ) = g((1, x 1 )|θ)C = f ((1, x 1 )|θ)C = C 1 2 f 1 (x 1 |θ) = C 1 2 L(θ|x 1 ) = 1 2 L(θ|x 2 ) = 1 2 f 2 (x 2 |θ) = f ((2, x 2 )|θ). By the Factorization Theorem, T(J, X J ) is suﬃcient. b. Equations 6.3.4 and 6.3.5 follow immediately from the two Principles. Combining them we have Ev(E 1 , x 1 ) = Ev(E 2 , x 2 ), the conclusion of the Formal Likelihood Principle. c. To prove the Conditionality Principle. Let one experiment be the E experiment and the other E j . Then L(θ|(j, x j )) = f ((j, x j )|θ) = 1 2 f j (x j |θ) = 1 2 L(θ|x j ). Letting (j, x j ) and x j play the roles of x 1 and x 2 in the Formal Likelihood Principle we can conclude Ev(E , (j, x j )) = Ev(E j , x j ), the Conditionality Principle. Now consider the Formal Suﬃciency Principle. If T(X) is suﬃcient and T(x) = T(y), then L(θ|x) = CL(θ|y), where C = h(x)/h(y) and h is the function from the Factorization Theorem. Hence, by the Formal Likelihood Principle, Ev(E, x) = Ev(E, y), the Formal Suﬃciency Principle. 6.35 Let 1 = success and 0 = failure. The four sample points are {0, 10, 110, 111}. From the likelihood principle, inference about p is only through L(p|x). The values of the likelihood are 1, p, p 2 , and p 3 , and the sample size does not directly inﬂuence the inference. 6.37 a. For one observation (X, Y ) we have I(θ) = −E _ 2 ∂θ 2 log f(X, Y |θ) _ = −E _ 2Y θ 3 _ = 2EY θ 3 . But, Y ∼ exponential(θ), and EY = θ. Hence, I(θ) = 2/θ 2 for a sample of size one, and I(θ) = 2n/θ 2 for a sample of size n. b. (i) The cdf of T is P(T ≤ t) = P _ i Y i i X i ≤ t 2 _ = P _ 2 i Y i 2 i X i θ ≤ t 2 2 _ = P(F 2n,2n ≤ t 2 2 ) Second Edition 6-9 where F 2n,2n is an F random variable with 2n degrees of freedom in the numerator and denominator. This follows since 2Y i /θ and 2X i θ are all independent exponential(1), or χ 2 2 . Diﬀerentiating (in t) and simplifying gives the density of T as f T (t) = Γ(2n) Γ(n) 2 2 t _ t 2 t 2 2 _ n _ θ 2 t 2 2 _ n , and the second derivative (in θ) of the log density is 2n t 4 + 2t 2 θ 2 −θ 4 θ 2 (t 2 2 ) 2 = 2n θ 2 _ 1 − 2 (t 2 2 + 1) 2 _ , and the information in T is 2n θ 2 _ 1 −2E _ 1 T 2 2 + 1 _ 2 _ = 2n θ 2 _ _ 1 −2E _ 1 F 2 2n,2n + 1 _ 2 _ _ . The expected value is E _ 1 F 2 2n,2n + 1 _ 2 = Γ(2n) Γ(n) 2 _ 0 1 (1 +w) 2 w n−1 (1 +w) 2n = Γ(2n) Γ(n) 2 Γ(n)Γ(n + 2) Γ(2n + 2) = n + 1 2(2n + 1) . Substituting this above gives the information in T as 2n θ 2 _ 1 −2 n + 1 2(2n + 1) _ = I(θ) n 2n + 1 , which is not the answer reported by Joshi and Nabar. (ii) Let W = i X i and V = i Y i . In each pair, X i and Y i are independent, so W and V are independent. X i ∼ exponential(1/θ); hence, W ∼ gamma(n, 1/θ). Y i ∼ exponential(θ); hence, V ∼ gamma(n, θ). Use this joint distribution of (W, V ) to derive the joint pdf of (T, U) as f(t, u|θ) = 2 [Γ(n)] 2 t u 2n−1 exp _ t ut θ _ , u > 0, t > 0. Now, the information in (T, U) is −E _ 2 ∂θ 2 log f(T, U|θ) _ = −E _ 2UT θ 3 _ = E _ 2V θ 3 _ = 2nθ θ 3 = 2n θ 2 . (iii) The pdf of the sample is f(x, y) = exp [−θ ( i x i ) −( i y i ) /θ] . Hence, (W, V ) deﬁned as in part (ii) is suﬃcient. (T, U) is a one-to-one function of (W, V ), hence (T, U) is also suﬃcient. But, EU 2 = EWV = (n/θ)(nθ) = n 2 does not depend on θ. So E(U 2 −n 2 ) = 0 for all θ, and (T, U) is not complete. 6.39 a. The transformation from Celsius to Fahrenheit is y = 9x/5 + 32. Hence, 5 9 (T (y) −32) = 5 9 ((.5)(y) + (.5)(212) −32) = 5 9 ((.5)(9x/5 + 32) + (.5)(212) −32) = (.5)x + 50 = T(x). b. T(x) = (.5)x + 50 = (.5)x + 106 = T (x). Thus, we do not have equivariance. 6-10 Solutions Manual for Statistical Inference 6.40 a. Because X 1 , . . . , X n is from a location scale family, by Theorem 3.5.6, we can write X i = σZ i +µ, where Z 1 , . . . , Z n is a random sample from the standard pdf f(z). Then T 1 (X 1 , . . . , X n ) T 2 (X 1 , . . . , X n ) = T 1 (σZ 1 +µ, . . . , σZ n +µ) T 2 (σZ 1 +µ, . . . , σZ n +µ) = σT 1 (Z 1 , . . . , Z n ) σT 2 (Z 1 , . . . , Z n ) = T 1 (Z 1 , . . . , Z n ) T 2 (Z 1 , . . . , Z n ) . Because T 1 /T 2 is a function of only Z 1 , . . . , Z n , the distribution of T 1 /T 2 does not depend on µ or σ; that is, T 1 /T 2 is an ancillary statistic. b. R(x 1 , . . . , x n ) = x (n) − x (1) . Because a > 0, max{ax 1 + b, . . . , ax n + b} = ax (n) + b and min{ax 1 +b, . . . , ax n +b} = ax (1) +b. Thus, R(ax 1 +b, . . . , ax n +b) = (ax (n) +b)−(ax (1) +b) = a(x (n) −x (1) ) = aR(x 1 , . . . , x n ). For the sample variance we have S 2 (ax 1 +b, . . . , ax n +b) = 1 n −1 ((ax i +b) −(a¯ x +b)) 2 = a 2 1 n −1 (x i − ¯ x) 2 = a 2 S 2 (x 1 , . . . , x n ). Thus, S(ax 1 +b, . . . , ax n +b) = aS(x 1 , . . . , x n ). Therefore, R and S both satisfy the above condition, and R/S is ancillary by a). 6.41 a. Measurement equivariance requires that the estimate of µ based on y be the same as the estimate of µ based on x; that is, T (x 1 +a, . . . , x n +a) −a = T (y) −a = T(x). b. The formal structures for the problem involving Xand the problem involving Yare the same. They both concern a random sample of size n from a normal population and estimation of the mean of the population. Thus, formal invariance requires that T(x) = T (x) for all x. Combining this with part (a), the Equivariance Principle requires that T(x 1 +a, . . . , x n +a)− a = T (x 1 +a, . . . , x n +a)−a = T(x 1 , . . . , x n ), i.e., T(x 1 +a, . . . , x n +a) = T(x 1 , . . . , x n )+a. c. W(x 1 + a, . . . , x n + a) = i (x i + a)/n = ( i x i ) /n + a = W(x 1 , . . . , x n ) + a, so W(x) is equivariant. The distribution of (X 1 , . . . , X n ) is the same as the distribution of (Z 1 + θ, . . . , Z n + θ), where Z 1 , . . . , Z n are a random sample from f(x − 0) and EZ i = 0. Thus, E θ W = E i (Z i +θ)/n = θ, for all θ. 6.43 a. For a location-scale family, if X ∼ f(x|θ, σ 2 ), then Y = g a,c (X) ∼ f(y|cθ + a, c 2 σ 2 ). So for estimating σ 2 , ¯ g a,c 2 ) = c 2 σ 2 . An estimator of σ 2 is invariant with respect to G 1 if W(cx 1 + a, . . . , cx n + a) = c 2 W(x 1 , . . . , x n ). An estimator of the form kS 2 is invariant because kS 2 (cx 1 +a, . . . , cx n +a) = k n −1 n i=1 _ (cx i +a) − n i=1 (cx i +a)/n _ 2 = k n −1 n i=1 ((cx i +a) −(c¯ x +a)) 2 = c 2 k n −1 n i=1 (x i − ¯ x) 2 = c 2 kS 2 (x 1 , . . . , x n ). To show invariance with respect to G 2 , use the above argument with c = 1. To show invariance with respect to G 3 , use the above argument with a = 0. ( G 2 and G 3 are both subgroups of G 1 . So invariance with respect to G 1 implies invariance with respect to G 2 and G 3 .) b. The transformations in G 2 leave the scale parameter unchanged. Thus, ¯ g a 2 ) = σ 2 . An estimator of σ 2 is invariant with respect to this group if W(x 1 +a, . . . , x n +a) = W(g a (x)) = ¯ g a (W(x)) = W(x 1 , . . . , x n ). Second Edition 6-11 An estimator of the given form is invariant if, for all a and (x 1 , . . . , x n ), W(x 1 +a, . . . , x n +a) = φ _ ¯ x+a s _ s 2 = φ _ ¯ x s _ s 2 = W(x 1 , . . . , x n ). In particular, for a sample point with s = 1 and ¯ x = 0, this implies we must have φ(a) = φ(0), for all a; that is, φ must be constant. On the other hand, if φ is constant, then the estimators are invariant by part a). So we have invariance if and only if φ is constant. Invariance with respect to G 1 also requires φ to be constant because G 2 is a subgroup of G 1 . Finally, an estimator of σ 2 is invariant with respect to G 3 if W(cx 1 , . . . , cx n ) = c 2 W(x 1 , . . . , x n ). Estimators of the given form are invariant because W(cx 1 , . . . , cx n ) = φ _ c¯ x cs _ c 2 s 2 = c 2 φ _ ¯ x s _ s 2 = c 2 W(x 1 , . . . , x n ). Chapter 7 Point Estimation 7.1 For each value of x, the MLE ˆ θ is the value of θ that maximizes f(x|θ). These values are in the following table. x 0 1 2 3 4 ˆ θ 1 1 2 or 3 3 3 At x = 2, f(x|2) = f(x|3) = 1/4 are both maxima, so both ˆ θ = 2 or ˆ θ = 3 are MLEs. 7.2 a. L(β|x) = n i=1 1 Γ(α)β α x α−1 i e −x i = 1 Γ(α) n β _ n i=1 x i _ α−1 e −Σ i x i logL(β|x) = −log Γ(α) n −nαlog β + (α−1) log _ n i=1 x i _ i x i β ∂logL ∂β = − β + i x i β 2 Set the partial derivative equal to 0 and solve for β to obtain ˆ β = i x i /(nα). To check that this is a maximum, calculate 2 logL ∂β 2 ¸ ¸ ¸ ¸ β= ˆ β = β 2 2 i x i β 3 ¸ ¸ ¸ ¸ β= ˆ β = (nα) 3 ( i x i ) 2 2(nα) 3 ( i x i ) 2 = − (nα) 3 ( i x i ) 2 < 0. Because ˆ β is the unique point where the derivative is 0 and it is a local maximum, it is a global maximum. That is, ˆ β is the MLE. b. Now the likelihood function is L(α, β|x) = 1 Γ(α) n β _ n i=1 x i _ α−1 e −Σ i x i , the same as in part (a) except α and β are both variables. There is no analytic form for the MLEs, The values ˆ α and ˆ β that maximize L. One approach to ﬁnding ˆ α and ˆ β would be to numerically maximize the function of two arguments. But it is usually best to do as much as possible analytically, ﬁrst, and perhaps reduce the complexity of the numerical problem. From part (a), for each ﬁxed value of α, the value of β that maximizes L is i x i /(nα). Substitute this into L. Then we just need to maximize the function of the one variable α given by 1 Γ(α) n ( i x i /(nα)) _ n i=1 x i _ α−1 e −Σ i x i /(Σ i x i /(nα)) = 1 Γ(α) n ( i x i /(nα)) _ n i=1 x i _ α−1 e −nα . 7-2 Solutions Manual for Statistical Inference For the given data, n = 14 and i x i = 323.6. Many computer programs can be used to maximize this function. From PROC NLIN in SAS we obtain ˆ α = 514.219 and, hence, ˆ β = 323.6 14(514.219) = .0450. 7.3 The log function is a strictly monotone increasing function. Therefore, L(θ|x) > L(θ |x) if and only if log L(θ|x) > log L(θ |x). So the value ˆ θ that maximizes log L(θ|x) is the same as the value that maximizes L(θ|x). 7.5 a. The value ˆ z solves the equation (1 −p) n = i (1 −x i z), where 0 ≤ z ≤ (max i x i ) −1 . Let ˆ k = greatest integer less than or equal to 1/ˆ z. Then from Example 7.2.9, ˆ k must satisfy [k(1 −p)] n i (k −x i ) and [(k + 1)(1 −p)] n < i (k + 1 −x i ). Because the right-hand side of the ﬁrst equation is decreasing in ˆ z, and because ˆ k ≤ 1/ˆ z (so ˆ z ≤ 1/ ˆ k) and ˆ k + 1 > 1/ˆ z, ˆ k must satisfy the two inequalities. Thus ˆ k is the MLE. b. For p = 1/2, we must solve _ 1 2 _ 4 = (1 − 20z)(1 − z)(1 − 19z), which can be reduced to the cubic equation −380z 3 + 419z 2 − 40z + 15/16 = 0. The roots are .9998, .0646, and .0381, leading to candidates of 1, 15, and 26 for ˆ k. The ﬁrst two are less than max i x i . Thus ˆ k = 26. 7.6 a. f(x|θ) = i θx −2 i I [θ,∞) (x i ) = _ i x −2 i _ θ n I [θ,∞) (x (1) ). Thus, X (1) is a suﬃcient statistic for θ by the Factorization Theorem. b. L(θ|x) = θ n _ i x −2 i _ I [θ,∞) (x (1) ). θ n is increasing in θ. The second term does not involve θ. So to maximize L(θ|x), we want to make θ as large as possible. But because of the indicator function, L(θ|x) = 0 if θ > x (1) . Thus, ˆ θ = x (1) . c. EX = _ θ θx −1 dx = θ logx| θ = ∞. Thus the method of moments estimator of θ does not exist. (This is the Pareto distribution with α = θ, β = 1.) 7.7 L(0|x) = 1, 0 < x i < 1, and L(1|x) = i 1/(2 x i ), 0 < x i < 1. Thus, the MLE is 0 if 1 ≥ i 1/(2 x i ), and the MLE is 1 if 1 < i 1/(2 x i ). 7.8 a. EX 2 = Var X +µ 2 = σ 2 . Therefore X 2 is an unbiased estimator of σ 2 . b. L(σ|x) = 1 2πσ e −x 2 /(2σ 2 ) . log L(σ|x) = log(2π) −1/2 −log σ −x 2 /(2σ 2 ). ∂logL ∂σ = − 1 σ + x 2 σ 3 set = 0 ⇒ ˆ σX 2 = ˆ σ 3 ⇒ ˆ σ = X 2 = |X|. 2 logL ∂σ 2 = −3x 2 σ 2 σ 6 + 1 σ 2 , which is negative at ˆ σ = |x|. Thus, ˆ σ = |x| is a local maximum. Because it is the only place where the ﬁrst derivative is zero, it is also a global maximum. c. Because EX = 0 is known, just equate EX 2 = σ 2 = 1 n 1 i=1 X 2 i = X 2 ⇒ ˆ σ = |X|. 7.9 This is a uniform(0, θ) model. So EX = (0 + θ)/2 = θ/2. The method of moments estimator is the solution to the equation ˜ θ/2 = ¯ X, that is, ˜ θ = 2 ¯ X. Because ˜ θ is a simple function of the sample mean, its mean and variance are easy to calculate. We have E ˜ θ = 2E ¯ X = 2EX = 2 θ 2 = θ, and Var ˜ θ = 4Var ¯ X = 4 θ 2 /12 n = θ 2 3n . Second Edition 7-3 The likelihood function is L(θ|x) = n i=1 1 θ I [0,θ] (x i ) = 1 θ n I [0,θ] (x (n) )I [0,∞) (x (1) ), where x (1) and x (n) are the smallest and largest order statistics. For θ ≥ x (n) , L = 1/θ n , a decreasing function. So for θ ≥ x (n) , L is maximized at ˆ θ = x (n) . L = 0 for θ < x (n) . So the overall maximum, the MLE, is ˆ θ = X (n) . The pdf of ˆ θ = X (n) is nx n−1 n , 0 ≤ x ≤ θ. This can be used to calculate E ˆ θ = n n + 1 θ, E ˆ θ 2 = n n + 2 θ 2 and Var ˆ θ = 2 (n + 2)(n + 1) 2 . ˜ θ is an unbiased estimator of θ; ˆ θ is a biased estimator. If n is large, the bias is not large because n/(n + 1) is close to one. But if n is small, the bias is quite large. On the other hand, Var ˆ θ < Var ˜ θ for all θ. So, if n is large, ˆ θ is probably preferable to ˜ θ. 7.10 a. f(x|θ) = i α β α x α−1 i I [0,β] (x i ) = _ α β α _ n ( i x i ) α−1 I (−∞,β] (x (n) )I [0,∞) (x (1) ) = L(α, β|x). By the Factorization Theorem, ( i X i , X (n) ) are suﬃcient. b. For any ﬁxed α, L(α, β|x) = 0 if β < x (n) , and L(α, β|x) a decreasing function of β if β ≥ x (n) . Thus, X (n) is the MLE of β. For the MLE of α calculate ∂α logL = ∂α _ nlogα−nαlogβ+(α−1)log i x i _ = n α −nlog β + log i x i . Set the derivative equal to zero and use ˆ β = X (n) to obtain ˆ α = n nlogX (n) −log i X i = _ 1 n i (logX (n) −logX i ) _ −1 . The second derivative is −n/α 2 < 0, so this is the MLE. c. X (n) = 25.0, log i X i = i log X i = 43.95 ⇒ ˆ β = 25.0, ˆ α = 12.59. 7.11 a. f(x|θ) = i θx θ−1 i = θ n _ i x i _ θ−1 = L(θ|x) d log L = d _ nlogθ+(θ−1)log i x i _ = n θ + i log x i . Set the derivative equal to zero and solve for θ to obtain ˆ θ = (− 1 n i log x i ) −1 . The second derivative is −n/θ 2 < 0, so this is the MLE. To calculate the variance of ˆ θ, note that Y i = −log X i ∼ exponential(1/θ), so − i log X i ∼ gamma(n, 1/θ). Thus ˆ θ = n/T, where T ∼ gamma(n, 1/θ). We can either calculate the ﬁrst and second moments directly, or use the fact that ˆ θ is inverted gamma (page 51). We have E 1 T = θ n Γ(n) _ 0 1 t t n−1 e −θt dt = θ n Γ(n) Γ(n −1) θ n−1 = θ n −1 . E 1 T 2 = θ n Γ(n) _ 0 1 t 2 t n−1 e −θt dt = θ n Γ(n) Γ(n −2) θ n−2 = θ 2 (n −1)(n −2) , 7-4 Solutions Manual for Statistical Inference and thus E ˆ θ = n n −1 θ and Var ˆ θ = n 2 (n −1) 2 (n −2) θ 2 →0 as n →∞. b. Because X ∼ beta(θ, 1), EX = θ/(θ + 1) and the method of moments estimator is the solution to 1 n i X i = θ θ+1 ˜ θ = i X i n− i X i . 7.12 X i ∼ iid Bernoulli(θ), 0 ≤ θ ≤ 1/2. a. method of moments: EX = θ = 1 n i X i = ¯ X ⇒ ˜ θ = ¯ X. MLE: In Example 7.2.7, we showed that L(θ|x) is increasing for θ ≤ ¯ x and is decreasing for θ ≥ ¯ x. Remember that 0 ≤ θ ≤ 1/2 in this exercise. Therefore, when ¯ X ≤ 1/2, ¯ X is the MLE of θ, because ¯ X is the overall maximum of L(θ|x). When ¯ X > 1/2, L(θ|x) is an increasing function of θ on [0, 1/2] and obtains its maximum at the upper bound of θ which is 1/2. So the MLE is ˆ θ = min _ ¯ X, 1/2 _ . b. The MSE of ˜ θ is MSE( ˜ θ) = Var ˜ θ + bias( ˜ θ) 2 = (θ(1 −θ)/n) + 0 2 = θ(1 −θ)/n. There is no simple formula for MSE( ˆ θ), but an expression is MSE( ˆ θ) = E( ˆ θ −θ) 2 = n y=0 ( ˆ θ −θ) 2 _ n y _ θ y (1 −θ) n−y = [n/2] y=0 _ y n −θ _ 2 _ n y _ θ y (1 −θ) n−y + n y=[n/2]+1 _ 1 2 −θ _ 2 _ n y _ θ y (1 −θ) n−y , where Y = i X i ∼ binomial(n, θ) and [n/2] = n/2, if n is even, and [n/2] = (n − 1)/2, if n is odd. c. Using the notation used in (b), we have MSE( ˜ θ) = E( ¯ X −θ) 2 = n y=0 _ y n −θ _ 2 _ n y _ θ y (1 −θ) n−y . Therefore, MSE( ˜ θ) −MSE( ˆ θ) = n y=[n/2]+1 _ _ y n −θ _ 2 _ 1 2 −θ _ 2 _ _ n y _ θ y (1 −θ) n−y = n y=[n/2]+1 _ y n + 1 2 −2θ __ y n 1 2 __ n y _ θ y (1 −θ) n−y . The facts that y/n > 1/2 in the sum and θ ≤ 1/2 imply that every term in the sum is positive. Therefore MSE( ˆ θ) < MSE( ˜ θ) for every θ in 0 < θ ≤ 1/2. (Note: MSE( ˆ θ) = MSE( ˜ θ) = 0 at θ = 0.) 7.13 L(θ|x) = i 1 2 e 1 2 |x i −θ| = 1 2 n e 1 2 Σ i |x i −θ| , so the MLE minimizes i |x i − θ| = i |x (i) − θ|, where x (1) , . . . , x (n) are the order statistics. For x (j) ≤ θ ≤ x (j+1) , n i=1 |x (i) −θ| = j i=1 (θ −x (i) ) + n i=j+1 (x (i) −θ) = (2j −n)θ − j i=1 x (i) + n i=j+1 x (i) . Second Edition 7-5 This is a linear function of θ that decreases for j < n/2 and increases for j > n/2. If n is even, 2j − n = 0 if j = n/2. So the likelihood is constant between x (n/2) and x ((n/2)+1) , and any value in this interval is the MLE. Usually the midpoint of this interval is taken as the MLE. If n is odd, the likelihood is minimized at ˆ θ = x ((n+1)/2) . 7.15 a. The likelihood is L(µ, λ|x) = λ n/2 (2π) n i x i exp _ λ 2 i (x i −µ) 2 µ 2 x i _ . For ﬁxed λ, maximizing with respect to µ is equivalent to minimizing the sum in the expo- nential. d i (x i −µ) 2 µ 2 x i = d i ((x i /µ) −1) 2 x i = − i 2 ((x i /µ) −1) x i x i µ 2 . Setting this equal to zero is equivalent to setting i _ x i µ −1 _ = 0, and solving for µ yields ˆ µ n = ¯ x. Plugging in this ˆ µ n and maximizing with respect to λ amounts to maximizing an expression of the form λ n/2 e −λb . Simple calculus yields ˆ λ n = n 2b where b = i (x i − ¯ x) 2 2¯ x 2 x i . Finally, 2b = i x i ¯ x 2 −2 i 1 ¯ x + i 1 x i = − n ¯ x + i 1 x i = i _ 1 x i 1 ¯ x _ . b. This is the same as Exercise 6.27b. c. This involved algebra can be found in Schwarz and Samanta (1991). 7.17 a. This is a special case of the computation in Exercise 7.2a. b. Make the transformation z = (x 2 −1)/x 1 , w = x 1 ⇒ x 1 = w, x 2 = wz + 1. The Jacobean is |w|, and f Z (z) = _ f X 1 (w)f X 2 (wz + 1)wdw = 1 θ 2 e −1/θ _ we −w(1+z)/θ dw, where the range of integration is 0 < w < −1/z if z < 0, 0 < w < ∞ if z > 0. Thus, f Z (z) = 1 θ 2 e −1/θ _ _ −1/z 0 we −w(1+z)/θ dw if z < 0 _ 0 we −w(1+z)/θ dw if z ≥ 0 Using the fact that _ we −w/a dw = −e −w/a (aw +a 2 ), we have f Z (z) = e −1/θ _ zθ+e (1+z)/zθ (1+z−zθ) θz(1+z) 2 if z < 0 1 (1+z) 2 if z ≥ 0 7-6 Solutions Manual for Statistical Inference c. From part (a) we get ˆ θ = 1. From part (b), X 2 = 1 implies Z = 0 which, if we use the second density, gives us ˆ θ = ∞. d. The posterior distributions are just the normalized likelihood times prior, so of course they are diﬀerent. 7.18 a. The usual ﬁrst two moment equations for X and Y are ¯ x = EX = µ X , 1 n i x 2 i = EX 2 = σ 2 X 2 X , ¯ y = EY = µ Y , 1 n i y 2 i = EY 2 = σ 2 Y 2 Y . We also need an equation involving ρ. 1 n i x i y i = EXY = Cov(X, Y ) + (EX)(EY ) = ρσ X σ Y X µ Y . Solving these ﬁve equations yields the estimators given. Facts such as 1 n i x 2 i − ¯ x 2 = i x 2 i −( i x i ) 2 /n n = i (x i − ¯ x) 2 n are used. b. Two answers are provided. First, use the Miscellanea: For L(θ|x) = h(x)c(θ) exp _ k i=1 w i (θ)t i (x) _ , the solutions to the k equations n j=1 t i (x j ) = E θ _ n j=1 t i (X j ) _ = nE θ t i (X 1 ), i = 1, . . . , k, provide the unique MLE for θ. Multiplying out the exponent in the bivariate normal pdf shows it has this exponential family form with k = 5 and t 1 (x, y) = x, t 2 (x, y) = y, t 3 (x, y) = x 2 , t 4 (x, y) = y 2 and t 5 (x, y) = xy. Setting up the method of moment equations, we have i x i = nµ X , i x 2 i = n(µ 2 X 2 X ), i y i = nµ Y , i y 2 i = n(µ 2 Y 2 Y ), i x i y i = i [Cov(X, Y ) +µ X µ Y ] = n(ρσ X σ Y X µ Y ). These are the same equations as in part (a) if you divide each one by n. So the MLEs are the same as the method of moment estimators in part (a). For the second answer, use the hint in the book to write L(θ|x, y) = L(θ|x)L(θ, x|y) = (2πσ 2 X ) n 2 exp _ 1 2 X i (x i −µ X ) 2 _ . ¸¸ . A × _ 2πσ 2 Y (1−ρ 2 ) _ n 2 exp _ −1 2 Y (1 −ρ 2 ) i _ y i _ µ Y σ Y σ X (x i −µ X ) __ 2 _ . ¸¸ . B Second Edition 7-7 We know that ¯ x and ˆ σ 2 X = i (x i − ¯ x) 2 /n maximizes A; the question is whether given σ Y , µ Y , and ρ, does ¯ x, ˆ σ 2 X maximize B? Let us ﬁrst ﬁx σ 2 X and look for ˆ µ X , that maximizes B. We have ∂logB ∂µ X ∝ −2 _ i _ (y i −µ Y )− ρσ Y σ X (x i −µ X ) _ _ ρσ Y σ X set = 0 i (y i −µ Y ) = ρσ Y σ X Σ(x i − ˆ µ X ). Similarly do the same procedure for L(θ|y)L(θ, y|x) This implies i (x i −µ X ) = ρσ X σ Y i (y i ˆ µ Y ). The solutions ˆ µ X and ˆ µ Y therefore must satisfy both equations. If i (y i − ˆ µ Y ) = 0 or i (x i − ˆ µ X ) = 0, we will get ρ = 1/ρ, so we need i (y i − ˆ µ Y ) = 0 and i (x i − ˆ µ X ) = 0. This implies ˆ µ X = ¯ x and ˆ µ Y = ¯ y. ( 2 log B ∂µ 2 X < 0. Therefore it is maximum). To get ˆ σ 2 X take ∂log B ∂σ 2 X i ρσ Y σ 2 X (x i − ˆ µ X ) _ (y i −µ Y )− ρσ Y σ X (x i −µ X ) _ set = 0. i (x i − ˆ µ X )(y i − ˆ µ Y ) = ρσ Y ˆ σ X (x i − ˆ µ X ) 2 . Similarly, i (x i − ˆ µ X )(y i − ˆ µ Y ) = ρσ X ˆ σ Y i (y i − ˆ µ Y ) 2 . Thus ˆ σ 2 X and ˆ σ 2 Y must satisfy the above two equations with ˆ µ X = ¯ X, ˆ µ Y = ¯ Y . This implies ˆ σ Y ˆ σ X i (x i − ¯ x) 2 = ˆ σ X ˆ σ Y i (y i − ¯ y) 2 i (x i − ¯ x) 2 ˆ σ 2 X = i (y i − ¯ y) 2 ˆ σ 2 Y . Therefore, ˆ σ 2 X = a i (x i − ¯ x) 2 , ˆ σ 2 Y = a i (y i − ¯ y) 2 where a is a constant. Combining the knowledge that _ ¯ x, 1 n i (x i − ¯ x) 2 _ = (ˆ µ X , ˆ σ 2 X ) maximizes A, we conclude that a = 1/n. Lastly, we ﬁnd ˆ ρ, the MLE of ρ. Write log L(¯ x, ¯ y, ˆ σ 2 X , ˆ σ 2 Y , ρ|x, y) = − n 2 log(1 −ρ 2 ) − 1 2(1−ρ 2 ) i _ (x i − ¯ x) 2 ˆ σ 2 X 2ρ(x i − ¯ x)(y i − ¯ y) ˆ σ X ,ˆ σ Y + (y i − ¯ y) 2 ˆ σ 2 Y _ = − n 2 log(1 −ρ 2 ) − 1 2(1−ρ 2 ) _ ¸ ¸ ¸ ¸ _ 2n −2ρ i (x i − ¯ x)(y i − ¯ y) ˆ σ X ˆ σ Y . ¸¸ . A _ ¸ ¸ ¸ ¸ _ because ˆ σ 2 X = 1 n i (x i − ¯ x) 2 and ˆ σ 2 Y = 1 n i (y i − ¯ y) 2 . Now log L = − n 2 log(1 −ρ 2 ) − n 1 −ρ 2 + ρ 1 −ρ 2 A and ∂log L ∂ρ = n 1 −ρ 2 (1−ρ 2 ) 2 + A(1−ρ 2 ) + 2Aρ 2 (1−ρ 2 ) 2 set = 0. This implies A+Aρ 2 −nˆ ρ−nˆ ρ 3 (1−ρ 2 ) 2 = 0 ⇒ A(1 + ˆ ρ 2 ) = nˆ ρ(1 + ˆ ρ 2 ) ⇒ ˆ ρ = A n = 1 n i (x i − ¯ x)(y i − ¯ y) ˆ σ X ˆ σ Y . 7-8 Solutions Manual for Statistical Inference 7.19 a. L(θ|y) = i 1 2πσ 2 exp _ 1 2 (y i −βx i ) 2 _ = (2πσ 2 ) −n/2 exp _ 1 2 i (y 2 i −2βx i y i 2 x 2 i ) _ = (2πσ 2 ) −n/2 exp _ β 2 i x 2 i 2 _ exp _ 1 2 i y 2 i + β σ 2 i x i y i _ . By Theorem 6.1.2, ( i Y 2 i , i x i Y i ) is a suﬃcient statistic for (β, σ 2 ). b. logL(β,σ 2 |y) = − n 2 log(2π) − n 2 log σ 2 1 2 y 2 i + β σ 2 i x i y i β 2 2 i x 2 i . For a ﬁxed value of σ 2 , ∂logL ∂β = 1 σ 2 i x i y i β σ 2 i x 2 i set = 0 ⇒ ˆ β = i x i y i i x 2 i . Also, 2 logL ∂β 2 = 1 σ 2 i x 2 i < 0, so it is a maximum. Because ˆ β does not depend on σ 2 , it is the MLE. And ˆ β is unbiased because E ˆ β = i x i EY i i x 2 i = i x i · βx i i x 2 i = β. c. ˆ β = i a i Y i , where a i = x i / j x 2 j are constants. By Corollary 4.6.10, ˆ β is normally dis- tributed with mean β, and Var ˆ β = i a 2 i Var Y i = i _ x i j x 2 j _ 2 σ 2 = i x 2 i ( j x 2 j ) 2 σ 2 = σ 2 i x 2 i . 7.20 a. E i Y i i x i = 1 i x i i EY i = 1 i x i i βx i = β. b. Var _ i Y i i x i _ = 1 ( i x i ) 2 i Var Y i = i σ 2 ( i x i ) 2 = 2 n 2 ¯ x 2 = σ 2 n¯ x 2 . Because i x 2 i −n¯ x 2 = i (x i − ¯ x) 2 ≥ 0, i x 2 i ≥ n¯ x 2 . Hence, Var ˆ β = σ 2 i x 2 i σ 2 n¯ x 2 = Var _ i Y i i x i _ . (In fact, ˆ β is BLUE (Best Linear Unbiased Estimator of β), as discussed in Section 11.3.2.) Second Edition 7-9 7.21 a. E 1 n i Y i x i = 1 n i EY i x i = 1 n i βx i x i = β. b. Var 1 n i Y i x i = 1 n 2 i Var Y i x 2 i = σ 2 n 2 i 1 x 2 i . Using Example 4.7.8 with a i = 1/x 2 i we obtain 1 n i 1 x 2 i n i x 2 i . Thus, Var ˆ β = σ 2 i x 2 i σ 2 n 2 i 1 x 2 i = Var 1 n i Y i x i . Because g(u) = 1/u 2 is convex, using Jensen’s Inequality we have 1 ¯ x 2 1 n i 1 x 2 i . Thus, Var _ i Y i i x i _ = σ 2 n¯ x 2 σ 2 n 2 i 1 x 2 i = Var 1 n i Y i x i . 7.22 a. f(¯ x, θ) = f(¯ x|θ)π(θ) = n 2πσ e −n(¯ x−θ) 2 /(2σ 2 ) 1 2πτ e −(θ−µ) 2 /2τ 2 . b. Factor the exponent in part (a) as −n 2 (¯ x −θ) 2 1 2 (θ −µ) 2 = − 1 2v 2 (θ −δ(x)) 2 1 τ 2 2 /n (¯ x −µ) 2 , where δ(x) = (τ 2 ¯ x+(σ 2 /n)µ)/(τ 2 2 /n) and v = (σ 2 τ 2 /n) _ (τ +σ 2 /n). Let n(a, b) denote the pdf of a normal distribution with mean a and variance b. The above factorization shows that f(x, θ) = n(θ, σ 2 /n) ×n(µ, τ 2 ) = n(δ(x), v 2 ) ×n(µ, τ 2 2 /n), where the marginal distribution of ¯ X is n(µ, τ 2 2 /n) and the posterior distribution of θ|x is n(δ(x), v 2 ). This also completes part (c). 7.23 Let t = s 2 and θ = σ 2 . Because (n −1)S 2 2 ∼ χ 2 n−1 , we have f(t|θ) = 1 Γ((n −1)/2) 2 (n−1)/2 _ n −1 θ t _ [(n−1)/2]−1 e −(n−1)t/2θ n −1 θ . With π(θ) as given, we have (ignoring terms that do not depend on θ) π(θ|t) ∝ _ _ 1 θ _ ((n−1)/2)−1 e −(n−1)t/2θ 1 θ _ _ 1 θ α+1 e −1/βθ _ _ 1 θ _ ((n−1)/2)+α+1 exp _ 1 θ _ (n −1)t 2 + 1 β __ , 7-10 Solutions Manual for Statistical Inference which we recognize as the kernel of an inverted gamma pdf, IG(a, b), with a = n −1 2 +α and b = _ (n −1)t 2 + 1 β _ −1 . Direct calculation shows that the mean of an IG(a, b) is 1/((a −1)b), so E(θ|t) = n−1 2 t + 1 β n−1 2 +α−1 = n−1 2 s 2 + 1 β n−1 2 +α−1 . This is a Bayes estimator of σ 2 . 7.24 For n observations, Y = i X i ∼ Poisson(nλ). a. The marginal pmf of Y is m(y) = _ 0 (nλ) y e −nλ y! 1 Γ(α)β α λ α−1 e −λ/β = n y y!Γ(α)β α _ 0 λ (y+α)−1 e λ β/(nβ+1) dλ = n y y!Γ(α)β α Γ(y +α) _ β nβ+1 _ y+α . Thus, π(λ|y) = f(y|λ)π(λ) m(y) = λ (y+α)−1 e λ β/(nβ+1) Γ(y+α) _ β nβ+1 _ y+α ∼ gamma _ y +α, β nβ+1 _ . b. E(λ|y) = (y +α) β nβ+1 = β nβ+1 y + 1 nβ+1 (αβ). Var(λ|y) = (y +α) β 2 (nβ+1) 2 . 7.25 a. We will use the results and notation from part (b) to do this special case. From part (b), the X i s are independent and each X i has marginal pdf m(x|µ, σ 2 , τ 2 ) = _ −∞ f(x|θ, σ 2 )π(θ|µ, τ 2 ) dθ = _ −∞ 1 2πστ e −(x−θ) 2 /2σ 2 e −(θ−µ) 2 /2τ 2 dθ. Complete the square in θ to write the sum of the two exponents as _ θ − _ 2 σ 2 2 + µσ 2 σ 2 2 __ 2 2 σ 2 τ 2 σ 2 2 (x −µ) 2 2(σ 2 2 ) . Only the ﬁrst term involves θ; call it −A(θ). Also, e −A(θ) is the kernel of a normal pdf. Thus, _ −∞ e −A(θ) dθ = στ σ 2 2 , and the marginal pdf is m(x|µ, σ 2 , τ 2 ) = 1 2πστ στ σ 2 2 exp _ (x −µ) 2 2(σ 2 2 ) _ = 1 σ 2 2 exp _ (x −µ) 2 2(σ 2 2 ) _ , a n(µ, σ 2 2 ) pdf. Second Edition 7-11 b. For one observation of X and θ the joint pdf is h(x, θ|τ) = f(x|θ)π(θ|τ), and the marginal pdf of X is m(x|τ) = _ −∞ h(x, θ|τ) dθ. Thus, the joint pdf of X = (X 1 , . . . , X n ) and θ = (θ 1 , . . . , θ n ) is h(x, θ|τ) = i h(x i , θ i |τ), and the marginal pdf of X is m(x|τ) = _ −∞ · · · _ −∞ i h(x i , θ i |τ) dθ 1 . . . dθ n = _ −∞ · · · __ −∞ h(x 1 , θ 1 |τ) dθ 1 _ n i=2 h(x i , θ i |τ) dθ 2 . . . dθ n . The dθ 1 integral is just m(x 1 |τ), and this is not a function of θ 2 , . . . , θ n . So, m(x 1 |τ) can be pulled out of the integrals. Doing each integral in turn yields the marginal pdf m(x|τ) = i m(x i |τ). Because this marginal pdf factors, this shows that marginally X 1 , . . . , X n are independent, and they each have the same marginal distribution, m(x|τ). 7.26 First write f(x 1 , . . . , x n |θ)π(θ) ∝ e n 2 (¯ x−θ) 2 −|θ|/a where the exponent can be written n 2 (¯ x −θ) 2 |θ| a = n 2 (θ −δ ± (x)) + n 2 _ ¯ x 2 −δ 2 ± (x) _ with δ ± (x) = ¯ x ± σ 2 na , where we use the “+” if θ > 0 and the “−” if θ < 0. Thus, the posterior mean is E(θ|x) = _ −∞ θe n 2 (θ−δ ± (x)) 2 _ −∞ e n 2 (θ−δ ± (x)) 2 . Now use the facts that for constants a and b, _ 0 e a 2 (t−b) 2 dt = _ 0 −∞ e a 2 (t−b) 2 dt = _ π 2a , _ 0 te a 2 (t−b) 2 dt = _ 0 (t −b)e a 2 (t−b) 2 dt + _ 0 be a 2 (t−b) 2 dt = 1 a e a 2 b 2 +b _ π 2a , _ 0 −∞ te a 2 (t−b) 2 dt = − 1 a e a 2 b 2 +b _ π 2a , to get E(θ|x) = _ πσ 2 2n (x) +δ + (x)) + σ 2 n _ e n 2 δ 2 + (x) −e n 2 δ 2 (x) _ 2 _ πσ 2 2n . 7-12 Solutions Manual for Statistical Inference 7.27 a. The log likelihood is log L = n i=1 −βτ i +y i log(βτ i ) −τ i +x i log(τ i ) −log y i ! −log x i ! and diﬀerentiation gives ∂β log L = n i=1 −τ i + y i τ i βτ i ⇒ β = n i=1 y i n i=1 τ i ∂τ j log L = −β + y j β βτ j −i + x j τ j ⇒ τ j = x j +y j 1 +β n j=1 τ j = n j=1 x j + n j=1 y j 1 +β . Combining these expressions yields ˆ β = n j=1 y j / n j=1 x j and ˆ τ j = x j +y j 1+ ˆ β . b. The stationary point of the EM algorithm will satisfy ˆ β = n i=1 y i ˆ τ 1 + n i=2 x i ˆ τ 1 = ˆ τ 1 +y 1 ˆ β + 1 ˆ τ j = x j +y j ˆ β + 1 . The second equation yields τ 1 = y 1 /β, and substituting this into the ﬁrst equation yields β = n j=2 y j / n j=2 x j . Summing over j in the third equation, and substituting β = n j=2 y j / n j=2 x j shows us that n j=2 ˆ τ j = n j=2 x j , and plugging this into the ﬁrst equa- tion gives the desired expression for ˆ β. The other two equations in (7.2.16) are obviously satisﬁed. c. The expression for ˆ β was derived in part (b), as were the expressions for ˆ τ i . 7.29 a. The joint density is the product of the individual densities. b. The log likelihood is log L = n i=1 −mβτ i +y i log(mβτ i ) +x i log(τ i ) + log m! −log y i ! −log x i ! and ∂β log L = 0 ⇒ β = n i=1 y i n i=1 i ∂τ j log L = 0 ⇒ τ j = x j +y j . Since τ j = 1, ˆ β = n i=1 y i /m = n i=1 y i / n i=1 x i . Also, j τ j = j (y j +x j ) = 1, which implies that mβ = j (y j +x j ) and ˆ τ j = (x j +y j )/ i (y i +x i ). c. In the likelihood function we can ignore the factorial terms, and the expected complete-data likelihood is obtained by on the r th iteration by replacing x 1 with E(X 1 |ˆ τ (r) 1 ) = mˆ τ (r) 1 . Substituting this into the MLEs of part (b) gives the EM sequence. Second Edition 7-13 The MLEs from the full data set are ˆ β = 0.0008413892 and ˆ τ = (0.06337310, 0.06374873, 0.06689681, 0.04981487, 0.04604075, 0.04883109, 0.07072460, 0.01776164, 0.03416388, 0.01695673, 0.02098127, 0.01878119, 0.05621836, 0.09818091, 0.09945087, 0.05267677, 0.08896918, 0.08642925). The MLEs for the incomplete data were computed using R, where we take m = x i . The R code is #mles on the incomplete data# xdatam<-c(3560,3739,2784,2571,2729,3952,993,1908,948,1172, 1047,3138,5485,5554,2943,4969,4828) ydata<-c(3,4,1,1,3,1,2,0,2,0,1,3,5,4,6,2,5,4) xdata<-c(mean(xdatam),xdatam); for (j in 1:500) { xdata<-c(sum(xdata)*tau[1],xdatam) beta<-sum(ydata)/sum(xdata) tau<-c((xdata+ydata)/(sum(xdata)+sum(ydata))) } beta tau The MLEs from the incomplete data set are ˆ β = 0.0008415534 and ˆ τ = (0.06319044, 0.06376116, 0.06690986, 0.04982459, 0.04604973, 0.04884062, 0.07073839, 0.01776510, 0.03417054, 0.01696004, 0.02098536, 0.01878485, 0.05622933, 0.09820005, 0.09947027, 0.05268704, 0.08898653, 0.08644610). 7.31 a. By direct substitution we can write log L(θ|y) = E _ log L(θ|y, X)| ˆ θ (r) , y _ −E _ log k(X|θ, y)| ˆ θ (r) , y _ . The next iterate, ˆ θ (r+1) is obtained by maximizing the expected complete-data log likelihood, so for any θ, E _ log L( ˆ θ (r+1) y, X) ¸ ¸ ¸ ˆ θ (r) , y _ ≥ E _ log L(θ|y, X)| ˆ θ (r) , y _ b. Write E[log k(X|θ, y)|θ , y] = _ log k(x|θ, y) log k(x|θ , y)dx ≤ _ log k(x|θ , y) log k(x|θ , y)dx, from the hint. Hence E _ log k(X| ˆ θ (r+1) , y) ¸ ¸ ¸ ˆ θ (r) , y _ ≤ E _ log k(X| ˆ θ (r) , y) ¸ ¸ ¸ ˆ θ (r) , y _ , and so the entire right hand side in part (a) is decreasing. 7.33 Substitute α = β = _ n/4 into MSE(ˆ p B ) = np(1−p) (α+β+n) 2 + _ np+α α+β+n −p _ 2 and simplify to obtain MSE(ˆ p B ) = n 4( n +n) 2 , independent of p, as desired. 7.35 a. δ p (g(x)) = δ p (x 1 +a, . . . , x n +a) = _ −∞ t i f(x i +a −t) dt _ −∞ i f(x i +a −t) dt = _ −∞ (y +a) i f(x i −y) dy _ −∞ i f(x i −y) dy (y = t −a) = a +δ p (x) = ¯ g (δ p (x)) . 7-14 Solutions Manual for Statistical Inference b. i f(x i −t) = 1 (2π) n/2 e 1 2 Σ i (x i −t) 2 = 1 (2π) n/2 e 1 2 n(¯ x−t) 2 e 1 2 (n−1)s 2 , so δ p (x) = ( n/ 2π) _ −∞ te 1 2 n(¯ x−t) 2 dt ( n/ 2π) _ −∞ e 1 2 n(¯ x−t) 2 dt = ¯ x 1 = ¯ x. c. i f(x i −t) = i I _ t − 1 2 ≤ x i ≤ t + 1 2 _ = I _ x (n) 1 2 ≤ t ≤ x (1) + 1 2 _ , so δ p (x) = _ x (1) +1/2 x (n) +1/2 t dt _ x (1) +1/2 x (n) +1/2 1 dt = x (1) +x (n) 2 . 7.37 To ﬁnd a best unbiased estimator of θ, ﬁrst ﬁnd a complete suﬃcient statistic. The joint pdf is f(x|θ) = _ 1 _ n i I (−θ,θ) (x i ) = _ 1 _ n I [0,θ) (max i |x i |). By the Factorization Theorem, max i |X i | is a suﬃcient statistic. To check that it is a complete suﬃcient statistic, let Y = max i |X i |. Note that the pdf of Y is f Y (y) = ny n−1 n , 0 < y < θ. Suppose g(y) is a function such that Eg(Y ) = _ θ 0 ny n−1 θ n g(y) dy = 0, for all θ. Taking derivatives shows that θ n−1 g(θ) = 0, for all θ. So g(θ) = 0, for all θ, and Y = max i |X i | is a complete suﬃcient statistic. Now EY = _ θ 0 y ny n−1 θ n dy = n n + 1 θ ⇒ E _ n + 1 n Y _ = θ. Therefore n+1 n max i |X i | is a best unbiased estimator for θ because it is a function of a complete suﬃcient statistic. (Note that _ X (1) , X (n) _ is not a minimal suﬃcient statistic (recall Exercise 5.36). It is for θ < X i < 2θ, −2θ < X i < θ, 4θ < X i < 6θ, etc., but not when the range is symmetric about zero. Then max i |X i | is minimal suﬃcient.) 7.38 Use Corollary 7.3.15. a. ∂θ logL(θ|x) = ∂θ log i θx θ−1 i = ∂θ i [logθ + (θ−1) logx i ] = i _ 1 θ + logx i _ = −n _ i logx i n 1 θ _ . Thus, − i log X i /n is the UMVUE of 1/θ and attains the Cram´er-Rao bound. Second Edition 7-15 b. ∂θ logL(θ|x) = ∂θ log i logθ θ−1 θ x i = ∂θ i [loglogθ −log(θ−1) +x i logθ] = i _ 1 θlogθ 1 θ−1 _ + 1 θ i x i = n θlogθ n θ−1 + n¯ x θ = n θ _ ¯ x− _ θ θ−1 1 logθ __ . Thus, ¯ X is the UMVUE of θ θ−1 1 logθ and attains the Cram´er-Rao lower bound. Note: We claim that if ∂θ log L(θ|X) = a(θ)[W(X) − τ(θ)], then EW(X) = τ(θ), because under the condition of the Cram´er-Rao Theorem, E ∂θ log L(θ|x) = 0. To be rigorous, we need to check the “interchange diﬀerentiation and integration“ condition. Both (a) and (b) are exponential families, and this condition is satisﬁed for all exponential families. 7.39 E θ _ 2 ∂θ 2 log f(X|θ) _ = E θ _ ∂θ _ ∂θ log f(X|θ) __ = E θ _ ∂θ _ ∂θ f(X|θ) f(X|θ) __ = E θ _ _ 2 ∂θ 2 f(X|θ) f(X|θ) _ ∂θ f(X|θ) f(X|θ) _ 2 _ _ . Now consider the ﬁrst term: E θ _ 2 ∂θ 2 f(X|θ) f(X|θ) _ = _ _ 2 ∂θ 2 f(x|θ) _ dx = d _ ∂θ f(x|θ) dx (assumption) = d E θ _ ∂θ log f(X|θ) _ = 0, (7.3.8) and the identity is proved. 7.40 ∂θ logL(θ|x) = ∂p log i p x i (1 −p) 1−x i = ∂p i x i log p + (1 −x i ) log(1 −p) = i _ x i p (1 −x i ) 1−p _ = n¯ x p n −n¯ x 1−p = n p(1 −p) [¯ x −p]. By Corollary 7.3.15, ¯ X is the UMVUE of p and attains the Cram´er-Rao lower bound. Alter- natively, we could calculate −nE θ _ 2 ∂θ 2 logf(X|θ) _ = −nE _ 2 ∂p 2 log _ p X (1 −p) 1−X _ _ = −nE _ 2 ∂p 2 [Xlogp + (1 −X) log(1 −p)] _ = −nE _ ∂p _ X p (1 −X) 1−p __ = −nE _ −X p 2 1−X (1 −p) 2 _ = −n _ 1 p 1 1−p _ = n p(1 −p) . 7-16 Solutions Manual for Statistical Inference Then using τ(θ) = p and τ (θ) = 1, τ (θ) −nE θ _ 2 ∂θ 2 logf(X|θ) _ = 1 n/p(1 −p) = p(1 −p) n = Var ¯ X. We know that E ¯ X = p. Thus, ¯ X attains the Cram´er-Rao bound. 7.41 a. E( i a i X i ) = i a i EX i = i a i µ = µ i a i = µ. Hence the estimator is unbiased. b. Var ( i a i X i ) = i a 2 i Var X i = i a 2 i σ 2 = σ 2 i a 2 i . Therefore, we need to minimize i a 2 i , subject to the constraint i a i = 1. Add and subtract the mean of the a i , 1/n, to get i a 2 i = i __ a i 1 n _ + 1 n _ 2 = i _ a i 1 n _ 2 + 1 n , because the cross-term is zero. Hence, i a 2 i is minimized by choosing a i = 1/n for all i. Thus, i (1/n)X i = ¯ X has the minimum variance among all linear unbiased estimators. 7.43 a. This one is real hard - it was taken from an American Statistician article, but the proof is not there. A cryptic version of the proof is in Tukey (Approximate Weights, Ann. Math. Statist. 1948, 91-92); here is a more detailed version. Let q i = q i (1 + λt i ) with 0 ≤ λ ≤ 1 and |t i | ≤ 1. Recall that q i = (1/σ 2 i )/ j (1/σ 2 j ) and VarW = 1/ j (1/σ 2 j ). Then Var _ q i W i j q j _ = 1 ( j q j ) 2 i q i σ 2 i = 1 [ j q j (1 +λt j )] 2 i q ∗2 i (1 +λt i ) 2 σ 2 i = 1 [ j q j (1 +λt j )] 2 j (1/σ 2 j ) i q i (1 +λt i ) 2 , using the deﬁnition of q i . Now write i q i (1 +λt i ) 2 = 1 +2λ j q j t j 2 j q j t 2 j = [1 +λ j q j t j ] 2 2 [ j q j t 2 j −( j q j t j ) 2 ], where we used the fact that j q j = 1. Now since [ j q j (1 +λt j )] 2 = [1 +λ j q j t j ] 2 , Var _ q i W i j q j _ = 1 j (1/σ 2 j ) _ 1 + λ 2 [ j q j t 2 j −( j q j t j ) 2 ] [1 +λ j q j t j ] 2 _ 1 j (1/σ 2 j ) _ 1 + λ 2 [1 −( j q j t j ) 2 ] [1 +λ j q j t j ] 2 _ , since j q j t 2 j ≤ 1. Now let T = j q j t j , and Var _ q i W i j q j _ 1 j (1/σ 2 j ) _ 1 + λ 2 [1 −T 2 ] [1 +λT] 2 _ , Second Edition 7-17 and the right hand side is maximized at T = −λ, with maximizing value Var _ q i W i j q j _ 1 j (1/σ 2 j ) _ 1 + λ 2 [1 −λ 2 ] [1 −λ 2 ] 2 _ = VarW 1 1 −λ 2 . Bloch and Moses (1988) deﬁne λ as the solution to b max /b min = 1 +λ 1 −λ , where b i /b j are the ratio of the normalized weights which, in the present notation, is b i /b j = (1 +λt i )/(1 +λt j ). The right hand side is maximized by taking t i as large as possible and t j as small as possible, and setting t i = 1 and t j = −1 (the extremes) yields the Bloch and Moses (1988) solution. b. b i = 1/k (1/σ 2 i ) __ j 1/σ 2 j _ = σ 2 i k j 1/σ 2 j . Thus, b max = σ 2 max k j 1/σ 2 j and b min = σ 2 min k j 1/σ 2 j and B = b max /b min = σ 2 max 2 min . Solving B = (1 + λ)/(1 −λ) yields λ = (B −1)/(B + 1). Substituting this into Tukey’s inequality yields Var W Var W (B + 1) 2 4B = ((σ 2 max 2 min ) + 1) 2 4(σ 2 max 2 min ) . 7.44 i X i is a complete suﬃcient statistic for θ when X i ∼ n(θ, 1). ¯ X 2 − 1/n is a function of i X i . Therefore, by Theorem 7.3.23, ¯ X 2 − 1/n is the unique best unbiased estimator of its expectation. E _ ¯ X 2 1 n _ = Var ¯ X + (E ¯ X) 2 1 n = 1 n 2 1 n = θ 2 . Therefore, ¯ X 2 −1/n is the UMVUE of θ 2 . We will calculate Var _ ¯ X 2 −1/n _ = Var( ¯ X 2 ) = E( ¯ X 4 ) −[E( ¯ X 2 )] 2 , where ¯ X ∼ n (θ, 1/n) , but ﬁrst we derive some general formulas that will also be useful in later exercises. Let Y ∼ n(θ, σ 2 ). Then here are formulas for EY 4 and Var Y 2 . EY 4 = E[Y 3 (Y −θ +θ)] = EY 3 (Y −θ) + EY 3 θ = E Y 3 (Y −θ) +θEY 3 . EY 3 (Y −θ) = σ 2 E(3Y 2 ) = σ 2 3 _ σ 2 2 _ = 3σ 4 + 3θ 2 σ 2 . (Stein’s Lemma) θEY 3 = θ _ 3θσ 2 3 _ = 3θ 2 σ 2 4 . (Example 3.6.6) Var Y 2 = 3σ 4 + 6θ 2 σ 2 4 −(σ 2 2 ) 2 = 2σ 4 + 4θ 2 σ 2 . Thus, Var _ ¯ X 2 1 n _ = Var ¯ X 2 = 2 1 n 2 + 4θ 2 1 n > 2 n . 7-18 Solutions Manual for Statistical Inference To calculate the Cram´er-Rao lower bound, we have E θ _ 2 logf(X|θ) ∂θ 2 _ = E θ _ 2 ∂θ 2 log 1 e −(X−θ) 2 /2 _ = E θ _ 2 ∂θ 2 _ log(2π) −1/2 1 2 (X−θ) 2 __ = E θ _ ∂θ (X−θ) _ = −1, and τ(θ) = θ 2 , [τ (θ)] 2 = (2θ) 2 = 4θ 2 so the Cram´er-Rao Lower Bound for estimating θ 2 is (θ)] 2 −nE θ _ 2 ∂θ 2 logf(X|θ) _ = 2 n . Thus, the UMVUE of θ 2 does not attain the Cram´er-Rao bound. (However, the ratio of the variance and the lower bound →1 as n →∞.) 7.45 a. Because ES 2 = σ 2 , bias(aS 2 ) = E(aS 2 ) −σ 2 = (a −1)σ 2 . Hence, MSE(aS 2 ) = Var(aS 2 ) + bias(aS 2 ) 2 = a 2 Var(S 2 ) + (a −1) 2 σ 4 . b. There were two typos in early printings; κ = E[X −µ] 4 4 and Var(S 2 ) = 1 n _ κ − n −3 n −1 _ σ 4 . See Exercise 5.8b for the proof. c. There was a typo in early printings; under normality κ = 3. Under normality we have κ = E[X −µ] 4 σ 4 = E _ X −µ σ _ 4 = EZ 4 , where Z ∼ n(0, 1). Now, using Lemma 3.6.5 with g(z) = z 3 we have κ = EZ 4 = Eg(Z)Z = 1E(3Z 2 ) = 3EZ 2 = 3. To minimize MSE(S 2 ) in general, write Var(S 2 ) = Bσ 4 . Then minimizing MSE(S 2 ) is equivalent to minimizing a 2 B + (a − 1) 2 . Set the derivative of this equal to 0 (B is not a function of a) to obtain the minimizing value of a is 1/(B+1). Using the expression in part (b), under normality the minimizing value of a is 1 B + 1 = 1 1 n _ 3 − n−3 n−1 _ + 1 = n −1 n + 1 . d. There was a typo in early printings; the minimizing a is a = n −1 (n + 1) + (κ−3)(n−1) n . To obtain this simply calculate 1/(B + 1) with (from part (b)) B = 1 n _ κ − n −3 n −1 _ . Second Edition 7-19 e. Using the expression for a in part (d), if κ = 3 the second term in the denominator is zero and a = (n − 1)/(n + 1), the normal result from part (c). If κ < 3, the second term in the denominator is negative. Because we are dividing by a smaller value, we have a > (n−1)/(n+1). Because Var(S 2 ) = Bσ 4 , B > 0, and, hence, a = 1/(B+1) < 1. Similarly, if κ > 3, the second term in the denominator is positive. Because we are dividing by a larger value, we have a < (n −1)/(n + 1). 7.46 a. For the uniform(θ, 2θ) distribution we have EX = (2θ +θ)/2 = 3θ/2. So we solve 3θ/2 = ¯ X for θ to obtain the method of moments estimator ˜ θ = 2 ¯ X/3. b. Let x (1) , . . . , x (n) denote the observed order statistics. Then, the likelihood function is L(θ|x) = 1 θ n I [x (n) /2,x (1) ] (θ). Because 1/θ n is decreasing, this is maximized at ˆ θ = x (n) /2. So ˆ θ = X (n) /2 is the MLE. Use the pdf of X (n) to calculate EX (n) = 2n+1 n+1 θ. So E ˆ θ = 2n+1 2n+2 θ, and if k = (2n + 2)/(2n + 1), Ek ˆ θ = θ. c. From Exercise 6.23, a minimal suﬃcient statistic for θ is (X (1) , X (n) ). ˜ θ is not a function of this minimal suﬃcient statistic. So by the Rao-Blackwell Theorem, E( ˜ θ|X (1) , X (n) ) is an unbiased estimator of θ ( ˜ θ is unbiased) with smaller variance than ˜ θ. The MLE is a function of (X (1) , X (n) ), so it can not be improved with the Rao-Blackwell Theorem. d. ˜ θ = 2(1.16)/3 = .7733 and ˆ θ = 1.33/2 = .6650. 7.47 X i ∼ n(r, σ 2 ), so ¯ X ∼ n(r, σ 2 /n) and E ¯ X 2 = r 2 + σ 2 /n. Thus E[(π ¯ X 2 − πσ 2 /n)] = πr 2 is best unbiased because ¯ X is a complete suﬃcient statistic. If σ 2 is unknown replace it with s 2 and the conclusion still holds. 7.48 a. The Cram´er-Rao Lower Bound for unbiased estimates of p is _ d dp p _ 2 −nE d 2 dp 2 logL(p|X) = 1 −nE _ d 2 dp 2 log[p X (1 −p) 1−X ] _ = 1 −nE _ X p 2 (1−X) (1−p) 2 _ = p(1 −p) n , because EX = p. The MLE of p is ˆ p = i X i /n, with E ˆ p = p and Var ˆ p = p(1−p)/n. Thus ˆ p attains the CRLB and is the best unbiased estimator of p. b. By independence, E(X 1 X 2 X 3 X 4 ) = i EX i = p 4 , so the estimator is unbiased. Because i X i is a complete suﬃcient statistic, Theorems 7.3.17 and 7.3.23 imply that E(X 1 X 2 X 3 X 4 | i X i ) is the best unbiased estimator of p 4 . Evaluating this yields E _ X 1 X 2 X 3 X 4 ¸ ¸ ¸ ¸ ¸ i X i = t _ = P(X 1 = X 2 = X 3 = X 4 = 1, n i=5 X i = t −4) P( i X i = t) = p 4 _ n−4 t−4 _ p t−4 (1 −p) n−t _ n t _ p t (1 −p) n−t = _ n −4 t −4 _ _ _ n t _ , for t ≥ 4. For t < 4 one of the X i s must be zero, so the estimator is E(X 1 X 2 X 3 X 4 | i X i = t) = 0. 7.49 a. From Theorem 5.5.9, Y = X (1) has pdf f Y (y) = n! (n −1)! 1 λ e −y/λ _ 1−(1 −e −y/λ ) _ n−1 = n λ e −ny/λ . Thus Y ∼ exponential(λ/n) so EY = λ/n and nY is an unbiased estimator of λ. 7-20 Solutions Manual for Statistical Inference b. Because f X (x) is in the exponential family, i X i is a complete suﬃcient statistic and E(nX (1) | i X i ) is the best unbiased estimator of λ. Because E( i X i ) = nλ, we must have E(nX (1) | i X i ) = i X i /n by completeness. Of course, any function of i X i that is an unbiased estimator of λ is the best unbiased estimator of λ. Thus, we know directly that because E( i X i ) = nλ, i X i /n is the best unbiased estimator of λ. c. From part (a), ˆ λ = 601.2 and from part (b) ˆ λ = 128.8. Maybe the exponential model is not a good assumption. 7.50 a. E(a ¯ X + (1 − a)cS) = aE ¯ X + (1 − a)E(cS) = aθ + (1 − a)θ = θ. So a ¯ X + (1 − a)cS is an unbiased estimator of θ. b. Because ¯ X and S 2 are independent for this normal model, Var(a ¯ X+(1−a)cS) = a 2 V 1 +(1− a) 2 V 2 , where V 1 = Var ¯ X = θ 2 /n and V 2 = Var(cS) = c 2 ES 2 −θ 2 = c 2 θ 2 −θ 2 = (c 2 −1)θ 2 . Use calculus to show that this quadratic function of a is minimized at a = V 2 V 1 +V 2 = (c 2 −1)θ 2 ((1/n) +c 2 −1)θ 2 = (c 2 −1) ((1/n) +c 2 −1) . c. Use the factorization in Example 6.2.9, with the special values µ = θ and σ 2 = θ 2 , to show that ( ¯ X, S 2 ) is suﬃcient. E( ¯ X −cS) = θ −θ = 0, for all θ. So ¯ X −cS is a nonzero function of ( ¯ X, S 2 ) whose expected value is always zero. Thus ( ¯ X, S 2 ) is not complete. 7.51 a. Straightforward calculation gives: E _ θ −(a 1 ¯ X +a 2 cS) ¸ 2 = a 2 1 Var ¯ X +a 2 2 c 2 Var S +θ 2 (a 1 +a 2 −1) 2 . Because Var ¯ X = θ 2 /n and Var S = ES 2 −(ES) 2 = θ 2 _ c 2 −1 c 2 _ , we have E _ θ −(a 1 ¯ X +a 2 cS) ¸ 2 = θ 2 _ a 2 1 _ n +a 2 2 (c 2 −1) + (a 1 +a 2 −1) 2 _ , and we only need minimize the expression in square brackets, which is independent of θ. Diﬀerentiating yields a 2 = _ (n + 1)c 2 −n ¸ −1 and a 1 = 1 − _ (n + 1)c 2 −n ¸ −1 . b. The estimator T has minimum MSE over a class of estimators that contain those in Exercise 7.50. c. Because θ > 0, restricting T ≥ 0 will improve the MSE. d. No. It does not ﬁt the deﬁnition of either one. 7.52 a. Because the Poisson family is an exponential family with t(x) = x, i X i is a complete suﬃcient statistic. Any function of i X i that is an unbiased estimator of λ is the unique best unbiased estimator of λ. Because ¯ X is a function of i X i and E ¯ X = λ, ¯ X is the best unbiased estimator of λ. b. S 2 is an unbiased estimator of the population variance, that is, ES 2 = λ. ¯ X is a one-to-one function of i X i . So ¯ X is also a complete suﬃcient statistic. Thus, E(S 2 | ¯ X) is an unbiased estimator of λ and, by Theorem 7.3.23, it is also the unique best unbiased estimator of λ. Therefore E(S 2 | ¯ X) = ¯ X. Then we have Var S 2 = Var _ E(S 2 | ¯ X) _ + EVar(S 2 | ¯ X) = Var ¯ X + EVar(S 2 | ¯ X), so Var S 2 > Var ¯ X. c. We formulate a general theorem. Let T(X) be a complete suﬃcient statistic, and let T (X) be any statistic other than T(X) such that ET(X) = ET (X). Then E[T (X)|T(X)] = T(X) and Var T (X) > Var T(X). Second Edition 7-21 7.53 Let a be a constant and suppose Cov θ 0 (W, U) > 0. Then Var θ 0 (W +aU) = Var θ 0 W +a 2 Var θ 0 U + 2aCov θ 0 (W, U). Choose a ∈ _ −2Cov θ 0 (W, U) _ Var θ 0 U, 0 _ . Then Var θ 0 (W + aU) < Var θ 0 W, so W cannot be best unbiased. 7.55 All three parts can be solved by this general method. Suppose X ∼ f(x|θ) = c(θ)m(x), a < x < θ. Then 1/c(θ) = _ θ a m(x) dx, and the cdf of X is F(x) = c(θ)/c(x), a < x < θ. Let Y = X (n) be the largest order statistic. Arguing as in Example 6.2.23 we see that Y is a complete suﬃcient statistic. Thus, any function T(Y ) that is an unbiased estimator of h(θ) is the best unbiased estimator of h(θ). By Theorem 5.4.4 the pdf of Y is g(y|θ) = nm(y)c(θ) n /c(y) n−1 , a < y < θ. Consider the equations _ θ a f(x|θ) dx = 1 and _ θ a T(y)g(y|θ) dy = h(θ), which are equivalent to _ θ a m(x) dx = 1 c(θ) and _ θ a T(y)nm(y) c(y) n−1 dy = h(θ) c(θ) n . Diﬀerentiating both sides of these two equations with respect to θ and using the Fundamental Theorem of Calculus yields m(θ) = − c (θ) c(θ) 2 and T(θ)nm(θ) c(θ) n−1 = c(θ) n h (θ) −h(θ)nc(θ) n−1 c (θ) c(θ) 2n . Change θs to ys and solve these two equations for T(y) to get the best unbiased estimator of h(θ) is T(y) = h(y) + h (y) nm(y)c(y) . For h(θ) = θ r , h (θ) = rθ r−1 . a. For this pdf, m(x) = 1 and c(θ) = 1/θ. Hence T(y) = y r + ry r−1 n(1/y) = n +r n y r . b. If θ is the lower endpoint of the support, the smallest order statistic Y = X (1) is a complete suﬃcient statistic. Arguing as above yields the best unbiased estimator of h(θ) is T(y) = h(y) − h (y) nm(y)c(y) . For this pdf, m(x) = e −x and c(θ) = e θ . Hence T(y) = y r ry r−1 ne −y e y = y r ry r−1 n . c. For this pdf, m(x) = e −x and c(θ) = 1/(e −θ −e −b ). Hence T(y) = y r ry r−1 ne −y (e −y −e −b ) = y r ry r−1 (1 −e −(b−y) ) n . 7-22 Solutions Manual for Statistical Inference 7.56 Because T is suﬃcient, φ(T) = E[h(X 1 , . . . , X n )|T] is a function only of T. That is, φ(T) is an estimator. If Eh(X 1 , . . . , X n ) = τ(θ), then Eh(X 1 , · · · , X n ) = E[E( h(X 1 , . . . , X n )| T)] = τ(θ), so φ(T) is an unbiased estimator of τ(θ). By Theorem 7.3.23, φ(T) is the best unbiased estimator of τ(θ). 7.57 a. T is a Bernoulli random variable. Hence, E p T = P p (T = 1) = P p _ n i=1 X i > X n+1 _ = h(p). b. n+1 i=1 X i is a complete suﬃcient statistic for θ, so E _ T ¸ ¸ ¸ n+1 i=1 X i _ is the best unbiased estimator of h(p). We have E _ T ¸ ¸ ¸ ¸ ¸ n+1 i=1 X i = y _ = P _ n i=1 X i > X n+1 ¸ ¸ ¸ ¸ ¸ n+1 i=1 X i = y _ = P _ n i=1 X i > X n+1 , n+1 i=1 X i = y _ _ P _ n+1 i=1 X i = y _ . The denominator equals _ n+1 y _ p y (1 −p) n+1−y . If y = 0 the numerator is P _ n i=1 X i > X n+1 , n+1 i=1 X i = 0 _ = 0. If y > 0 the numerator is P _ n i=1 X i > X n+1 , n+1 i=1 X i = y, X n+1 = 0 _ +P _ n i=1 X i > X n+1 , n+1 i=1 X i = y, X n+1 = 1 _ which equals P _ n i=1 X i > 0, n i=1 X i = y _ P(X n+1 = 0) +P _ n i=1 X i > 1, n i=1 X i = y −1 _ P(X n+1 = 1). For all y > 0, P _ n i=1 X i > 0, n i=1 X i = y _ = P _ n i=1 X i = y _ = _ n y _ p y (1 −p) n−y . If y = 1 or 2, then P _ n i=1 X i > 1, n i=1 X i = y −1 _ = 0. And if y > 2, then P _ n i=1 X i > 1, n i=1 X i = y −1 _ = P _ n i=1 X i = y −1 _ = _ n y −1 _ p y−1 (1 −p) n−y+1 . Second Edition 7-23 Therefore, the UMVUE is E _ T ¸ ¸ ¸ ¸ ¸ n+1 i=1 X i = y _ = _ ¸ ¸ ¸ _ ¸ ¸ ¸ _ 0 if y = 0 ( n y )p y (1−p) n−y (1−p) ( n+1 y )p y (1−p) n−y+1 = ( n y ) ( n+1 y ) = 1 (n+1)(n+1−y) if y = 1 or 2 (( n y )+( n y−1 ))p y (1−p) n−y+1 ( n+1 y )p y (1−p) n−y+1 = ( n y )+( n y−1 ) ( n+1 y ) = 1 if y > 2. 7.59 We know T = (n −1)S 2 2 ∼ χ 2 n−1 . Then ET p/2 = 1 Γ _ n−1 2 _ 2 n−1 2 _ 0 t p+n−1 2 −1 e t 2 dt = 2 p 2 Γ _ p+n−1 2 _ Γ _ n−1 2 _ = C p,n . Thus E _ (n −1)S 2 σ 2 _ p/2 = C p,n , so (n − 1) p/2 S p _ C p,n is an unbiased estimator of σ p . From Theorem 6.2.25, ( ¯ X, S 2 ) is a complete, suﬃcient statistic. The unbiased estimator (n−1) p/2 S p _ C p,n is a function of ( ¯ X, S 2 ). Hence, it is the best unbiased estimator. 7.61 The pdf for Y ∼ χ 2 ν is f(y) = 1 Γ(ν/2)2 ν/2 y ν/2−1 e −y/2 . Thus the pdf for S 2 = σ 2 Y/ν is g(s 2 ) = ν σ 2 1 Γ(ν/2)2 ν/2 _ s 2 ν σ 2 _ ν/2−1 e −s 2 ν/(2σ 2 ) . Thus, the log-likelihood has the form (gathering together constants that do not depend on s 2 or σ 2 ) log L(σ 2 |s 2 ) = log _ 1 σ 2 _ +K log _ s 2 σ 2 _ −K s 2 σ 2 +K , where K > 0 and K > 0. The loss function in Example 7.3.27 is L(σ 2 , a) = a σ 2 −log _ a σ 2 _ −1, so the loss of an estimator is the negative of its likelihood. 7.63 Let a = τ 2 /(τ 2 + 1), so the Bayes estimator is δ π (x) = ax. Then R(µ, δ π ) = (a −1) 2 µ 2 +a 2 . As τ 2 increases, R(µ, δ π ) becomes ﬂatter. 7.65 a. Figure omitted. b. The posterior expected loss is E(L(θ, a)|x) = e ca Ee −cθ −cE(a−θ)−1, where the expectation is with respect to π(θ|x). Then d da E(L(θ, a)|x) = ce ca Ee −cθ −c set = 0, and a = − 1 c log Ee −cθ is the solution. The second derivative is positive, so this is the mini- mum. 7-24 Solutions Manual for Statistical Inference c. π(θ|x) = n(¯ x, σ 2 /n). So, substituting into the formula for a normal mgf, we ﬁnd Ee −cθ = e −c¯ x+σ 2 c 2 /2n , and the LINEX posterior loss is E(L(θ, a)|x) = e c(a−¯ x)+σ 2 c 2 /2n −c(a − ¯ x) −1. Substitute Ee −cθ = e −c¯ x+σ 2 c 2 /2n into the formula in part (b) to ﬁnd the Bayes rule is ¯ x −cσ 2 /2n. d. For an estimator ¯ X +b, the LINEX posterior loss (from part (c)) is E(L(θ, ¯ x +b)|x) = e cb e c 2 σ 2 /2n −cb −1. For ¯ X the expected loss is e c 2 σ 2 /2n − 1, and for the Bayes estimator (b = −cσ 2 /2n) the expected loss is c 2 σ 2 /2n. The marginal distribution of ¯ X is m(¯ x) = 1, so the Bayes risk is inﬁnite for any estimator of the form ¯ X +b. e. For ¯ X + b, the squared error risk is E _ ( ¯ X +b) −θ ¸ 2 = σ 2 /n + b 2 , so ¯ X is better than the Bayes estimator. The Bayes risk is inﬁnite for both estimators. 7.66 Let S = i X i ∼ binomial(n, θ). a. E ˆ θ 2 = E S 2 n 2 = 1 n 2 ES 2 = 1 n 2 (nθ(1 −θ) + (nθ) 2 ) = θ n + n−1 n θ 2 . b. T (i) n = _ j=i X j _ 2 _ (n − 1) 2 . For S values of i, T (i) n = (S − 1) 2 /(n − 1) 2 because the X i that is dropped out equals 1. For the other n − S values of i, T (i) n = S 2 /(n − 1) 2 because the X i that is dropped out equals 0. Thus we can write the estimator as JK(T n ) = n S 2 n 2 n −1 n _ S (S −1) 2 (n −1) 2 + (n −S) S 2 (n −1) 2 _ = S 2 −S n(n −1) . c. EJK(T n ) = 1 n(n−1) (nθ(1 −θ) + (nθ) 2 −nθ) = n 2 θ 2 −nθ 2 n(n−1) = θ 2 . d. For this binomial model, S is a complete suﬃcient statistic. Because JK(T n ) is a function of S that is an unbiased estimator of θ 2 , it is the best unbiased estimator of θ 2 . Chapter 8 Hypothesis Testing 8.1 Let X = # of heads out of 1000. If the coin is fair, then X ∼ binomial(1000, 1/2). So P(X ≥ 560) = 1000 x=560 _ 1000 x __ 1 2 _ x _ 1 2 _ n−x ≈ .0000825, where a computer was used to do the calculation. For this binomial, EX = 1000p = 500 and Var X = 1000p(1 −p) = 250. A normal approximation is also very good for this calculation. P {X ≥ 560} = P _ X −500 250 559.5−500 250 _ ≈ P {Z ≥ 3.763} ≈ .0000839. Thus, if the coin is fair, the probability of observing 560 or more heads out of 1000 is very small. We might tend to believe that the coin is not fair, and p > 1/2. 8.2 Let X ∼ Poisson(λ), and we observed X = 10. To assess if the accident rate has dropped, we could calculate P( X ≤ 10| λ = 15) = 10 i=0 e −15 15 i i! = e −15 _ 1+15+ 15 2 2! +· · · + 15 10 10! _ ≈ .11846. This is a fairly large value, not overwhelming evidence that the accident rate has dropped. (A normal approximation with continuity correction gives a value of .12264.) 8.3 The LRT statistic is λ(y) = sup θ≤θ 0 L(θ|y 1 , . . . , y m ) sup Θ L(θ|y 1 , . . . , y m ) . Let y = m i=1 y i , and note that the MLE in the numerator is min {y/m,θ 0 } (see Exercise 7.12) while the denominator has y/m as the MLE (see Example 7.2.7). Thus λ(y) = _ 1 if y/m ≤ θ 0 0 ) y (1−θ 0 ) m−y (y/m) y (1−y/m) m−y if y/m > θ 0 , and we reject H 0 if 0 ) y (1−θ 0 ) m−y (y/m) y (1 −y/m) m−y < c. To show that this is equivalent to rejecting if y > b, we could show λ(y) is decreasing in y so that λ(y) < c occurs for y > b > mθ 0 . It is easier to work with log λ(y), and we have log λ(y) = y log θ 0 + (m−y) log (1 −θ 0 ) −y log _ y m _ −(m−y) log _ m−y m _ , 8-2 Solutions Manual for Statistical Inference and d dy logλ(y) = log θ 0 −log(1 −θ 0 ) −log _ y m _ −y 1 y + log _ m−y m _ + (m−y) 1 m−y = log _ θ 0 y/m _ m−y m _ 1−θ 0 _ . For y/m > θ 0 , 1 − y/m = (m− y)/m < 1 − θ 0 , so each fraction above is less than 1, and the log is less than 0. Thus d dy log λ < 0 which shows that λ is decreasing in y and λ(y) < c if and only if y > b. 8.4 For discrete random variables, L(θ|x) = f(x|θ) = P(X = x|θ). So the numerator and denomi- nator of λ(x) are the supremum of this probability over the indicated sets. 8.5 a. The log-likelihood is log L(θ, ν|x) = nlog θ +nθ log ν −(θ + 1) log _ i x i _ , ν ≤ x (1) , where x (1) = min i x i . For any value of θ, this is an increasing function of ν for ν ≤ x (1) . So both the restricted and unrestricted MLEs of ν are ˆ ν = x (1) . To ﬁnd the MLE of θ, set ∂θ log L(θ, x (1) |x) = n θ +nlog x (1) −log _ i x i _ = 0, and solve for θ yielding ˆ θ = n log( i x i /x n (1) ) = n T . (∂ 2 /∂θ 2 ) log L(θ, x (1) |x) = −n/θ 2 < 0, for all θ. So ˆ θ is a maximum. b. Under H 0 , the MLE of θ is ˆ θ 0 = 1, and the MLE of ν is still ˆ ν = x (1) . So the likelihood ratio statistic is λ(x) = x n (1) /( i x i ) 2 (n/T) n x n 2 /T (1) _ ( i x i ) n/T+1 = _ T n _ n e −T (e −T ) n/T = _ T n _ n e −T+n . (∂/∂T) log λ(x) = (n/T) − 1. Hence, λ(x) is increasing if T ≤ n and decreasing if T ≥ n. Thus, T ≤ c is equivalent to T ≤ c 1 or T ≥ c 2 , for appropriately chosen constants c 1 and c 2 . c. We will not use the hint, although the problem can be solved that way. Instead, make the following three transformations. First, let Y i = log X i , i = 1, . . . , n. Next, make the n-to-1 transformation that sets Z 1 = min i Y i and sets Z 2 , . . . , Z n equal to the remaining Y i s, with their order unchanged. Finally, let W 1 = Z 1 and W i = Z i − Z 1 , i = 2, . . . , n. Then you ﬁnd that the W i s are independent with W 1 ∼ f W 1 (w) = nν n e −nw , w > log ν, and W i ∼ exponential(1), i = 2, . . . , n. Now T = n i=2 W i ∼ gamma(n − 1, 1), and, hence, 2T ∼ gamma(n −1, 2) = χ 2 2(n−1) . 8.6 a. λ(x, y) = sup Θ 0 L(θ|x, y) sup Θ L(θ|x, y) = sup θ n i=1 1 θ e −x i m j=1 1 θ e −y j sup θ,µ n i=1 1 θ e −x i m j=1 1 µ e −y j = sup θ 1 θ m+n exp _ _ n i=1 x i + m j=1 y j __ θ _ sup θ,µ 1 θ n exp {− n i=1 x i /θ} 1 µ m exp _ m j=1 y j _. Second Edition 8-3 Diﬀerentiation will show that in the numerator ˆ θ 0 = ( i x i + j y j )/(n +m), while in the denominator ˆ θ = ¯ x and ˆ µ = ¯ y. Therefore, λ(x, y) = _ n+m i x i + j y j _ n+m exp _ _ n+m i x i + j y j _ _ i x i + j y j _ _ _ n i x i _ n exp _ _ n i x i _ i x i __ m j y j _ m exp _ _ m j y j _ j y j _ = (n +m) n+m n n m m ( i x i ) n _ j y j _ m _ i x i + j y j _ n+m . And the LRT is to reject H 0 if λ(x, y) ≤ c. b. λ = (n +m) n+m n n m m _ i x i i x i + j y j _ n _ j y j i x i + j y j _ m = (n +m) n+m n n m m T n (1 −T) m . Therefore λ is a function of T. λ is a unimodal function of T which is maximized when T = n m+n . Rejection for λ ≤ c is equivalent to rejection for T ≤ a or T ≥ b, where a and b are constants that satisfy a n (1 −a) m = b n (1 −b) m . c. When H 0 is true, i X i ∼ gamma(n, θ) and j Y j ∼ gamma(m, θ) and they are indepen- dent. So by an extension of Exercise 4.19b, T ∼ beta(n, m). 8.7 a. L(θ, λ|x) = n i=1 1 λ e −(x i −θ)/λ I [θ,∞) (x i ) = _ 1 λ _ n e −(Σ i x i −nθ)/λ I [θ,∞) (x (1) ), which is increasing in θ if x (1) ≥ θ (regardless of λ). So the MLE of θ is ˆ θ = x (1) . Then ∂log L ∂λ = − n λ + i x i −n ˆ θ λ 2 set = 0 ⇒ n ˆ λ = i x i −n ˆ θ ⇒ ˆ λ = ¯ x −x (1) . Because 2 log L ∂λ 2 = n λ 2 −2 i x i −n ˆ θ λ 3 ¸ ¸ ¸ ¸ ¸ ¯ x−x (1) = n (¯ x −x (1) ) 2 2n(¯ x −x (1) ) (¯ x −x (1) ) 3 = −n (¯ x −x (1) ) 2 < 0, we have ˆ θ = x (1) and ˆ λ = ¯ x−x (1) as the unrestricted MLEs of θ and λ. Under the restriction θ ≤ 0, the MLE of θ (regardless of λ) is ˆ θ 0 = _ 0 if x (1) > 0 x (1) if x (1) ≤ 0. For x (1) > 0, substituting ˆ θ 0 = 0 and maximizing with respect to λ, as above, yields ˆ λ 0 = ¯ x. Therefore, λ(x) = sup Θ 0 L(θ,λ | x) sup Θ L(θ,λ | x) = sup {(λ,θ):θ≤0} L(λ,θ | x) L( ˆ θ, ˆ λ | x) = _ 1 if x (1) ≤ 0 L(¯ x,0|x) L( ˆ λ, ˆ θ|x) if x (1) > 0, where L(¯ x, 0 | x) L( ˆ λ, ˆ θ | x) = (1/¯ x) n e −n¯ x/¯ x _ 1/ ˆ λ _ n e −n(¯ x−x (1) )/(¯ x−x (1) ) = _ ˆ λ ¯ x _ n = _ ¯ x−x (1) ¯ x _ n = _ 1 − x (1) ¯ x _ n . So rejecting if λ(x) ≤ c is equivalent to rejecting if x (1) /¯ x ≥ c , where c is some constant. 8-4 Solutions Manual for Statistical Inference b. The LRT statistic is λ(x) = sup β (1/β n )e −Σ i x i sup β,γ n n )( i x i ) γ−1 e −Σ i x γ i . The numerator is maximized at ˆ β 0 = ¯ x. For ﬁxed γ, the denominator is maximized at ˆ β γ = i x γ i /n. Thus λ(x) = ¯ x −n e −n sup γ n / ˆ β n γ )( i x i ) γ−1 e −Σ i x γ i / ˆ β γ = ¯ x −n sup γ n / ˆ β n γ )( i x i ) γ−1 . The denominator cannot be maximized in closed form. Numeric maximization could be used to compute the statistic for observed data x. 8.8 a. We will ﬁrst ﬁnd the MLEs of a and θ. We have L(a, θ | x) = n i=1 1 2πaθ e −(x i −θ) 2 /(2aθ) , log L(a, θ | x) = n i=1 1 2 log(2πaθ) − 1 2aθ (x i −θ) 2 . Thus ∂log L ∂a = n i=1 _ 1 2a + 1 2θa 2 (x i −θ) 2 _ = − n 2a + 1 2θa 2 n i=1 (x i −θ) 2 set = 0 ∂log L ∂θ = n i=1 _ 1 + 1 2aθ 2 (x i −θ) 2 + 1 (x i −θ) _ = − n + 1 2aθ 2 n i=1 (x i −θ) 2 + n¯ x −nθ set = 0. We have to solve these two equations simultaneously to get MLEs of a and θ, say ˆ a and ˆ θ. Solve the ﬁrst equation for a in terms of θ to get a = 1 n i=1 (x i −θ) 2 . Substitute this into the second equation to get n + n + n(¯ x−θ) = 0. So we get ˆ θ = ¯ x, and ˆ a = 1 n¯ x n i=1 (x i − ¯ x) 2 = ˆ σ 2 ¯ x , the ratio of the usual MLEs of the mean and variance. (Veriﬁcation that this is a maximum is lengthy. We omit it.) For a = 1, we just solve the second equation, which gives a quadratic in θ that leads to the restricted MLE ˆ θ R = −1+ _ 1+4(ˆ σ 2 +¯ x 2 ) 2 . Second Edition 8-5 Noting that ˆ a ˆ θ = ˆ σ 2 , we obtain λ(x) = L( ˆ θ R | x) L(ˆ a, ˆ θ | x) = n i=1 1 ˆ θ R e −(x i ˆ θ R ) 2 /(2 ˆ θ R ) n i=1 1 2πˆ a ˆ θ e −(x i ˆ θ) 2 /(2ˆ a ˆ θ) = _ 1/(2π ˆ θ R ) _ n/2 e −Σ i (x i ˆ θ R ) 2 /(2 ˆ θ R ) (1/(2πˆ σ 2 )) n/2 e −Σ i (x i −¯ x) 2 /(2ˆ σ 2 ) = _ ˆ σ 2 / ˆ θ R _ n/2 e (n/2)−Σ i (x i ˆ θ R ) 2 /(2 ˆ θ R ) . b. In this case we have log L(a, θ | x) = n i=1 _ 1 2 log(2πaθ 2 ) − 1 2aθ 2 (x i −θ) 2 _ . Thus ∂logL ∂a = n i=1 _ 1 2a + 1 2a 2 θ 2 (x i −θ) 2 _ = − n 2a + 1 2a 2 θ 2 n i=1 (x i −θ) 2 set = 0. ∂logL ∂θ = n i=1 _ 1 θ + 1 3 (x i −θ) 2 + 1 2 (x i −θ) _ = − n θ + 1 3 n i=1 (x i −θ) 2 + 1 2 n i=1 (x i −θ) set = 0. Solving the ﬁrst equation for a in terms of θ yields a = 1 2 n i=1 (x i −θ) 2 . Substituting this into the second equation, we get n θ + n θ +n i (x i −θ) i (x i −θ) 2 = 0. So again, ˆ θ = ¯ x and ˆ a = 1 n¯ x 2 n i=1 (x i − ¯ x) 2 = ˆ σ 2 ¯ x 2 in the unrestricted case. In the restricted case, set a = 1 in the second equation to obtain ∂log L ∂θ = − n θ + 1 θ 3 n i=1 (x i −θ) 2 + 1 θ 2 n i=1 (x i −θ) set = 0. Multiply through by θ 3 /n to get −θ 2 + 1 n n i=1 (x i −θ) 2 θ n n i=1 (x i −θ) = 0. Add ±¯ x inside the square and complete all sums to get the equation −θ 2 + ˆ σ 2 + (¯ x −θ) 2 +θ(¯ x −θ) = 0. 8-6 Solutions Manual for Statistical Inference This is a quadratic in θ with solution for the MLE ˆ θ R = ¯ x + _ ¯ x+4(ˆ σ 2 +¯ x 2 ) _ 2. which yields the LRT statistic λ(x) = L( ˆ θ R | x) L(ˆ a, ˆ θ | x) = n i=1 1 _ ˆ θ 2 R e −(x i ˆ θ R ) 2 /(2 ˆ θ 2 R ) n i=1 1 2πˆ a ˆ θ 2 e −(x i ˆ θ) 2 /(2ˆ a ˆ θ 2 ) = _ ˆ σ ˆ θ R _ n e (n/2)−Σ i (x i ˆ θ R ) 2 /(2 ˆ θ R ) . 8.9 a. The MLE of λ under H 0 is ˆ λ 0 = _ ¯ Y _ −1 , and the MLE of λ i under H 1 is ˆ λ i = Y −1 i . The LRT statistic is bounded above by 1 and is given by 1 ≥ _ ¯ Y _ −n e −n ( i Y i ) −1 e −n . Rearrangement of this inequality yields ¯ Y ≥ ( i Y i ) 1/n , the arithmetic-geometric mean inequality. b. The pdf of X i is f(x i i ) = (λ i /x 2 i )e −λ i /x i , x i > 0. The MLE of λ under H 0 is ˆ λ 0 = n/ [ i (1/X i )], and the MLE of λ i under H 1 is ˆ λ i = X i . Now, the argument proceeds as in part (a). 8.10 Let Y = i X i . The posterior distribution of λ|y is gamma (y +α, β/(β + 1)). a. P(λ ≤ λ 0 |y) = (β+1) y+α Γ(y+α)β y+α _ λ 0 0 t y+α−1 e −t(β+1)/β dt. P(λ > λ 0 |y) = 1 −P(λ ≤ λ 0 |y). b. Because β/(β + 1) is a scale parameter in the posterior distribution, (2(β + 1)λ/β)|y has a gamma(y + α, 2) distribution. If 2α is an integer, this is a χ 2 2y+2α distribution. So, for α = 5/2 and β = 2, P(λ ≤ λ 0 |y) = P _ 2(β+1)λ β 2(β+1)λ 0 β ¸ ¸ ¸ ¸ y _ = P(χ 2 2y+5 ≤ 3λ 0 ). 8.11 a. From Exercise 7.23, the posterior distribution of σ 2 given S 2 is IG(γ, δ), where γ = α+(n− 1)/2 and δ = [(n − 1)S 2 /2 + 1/β] −1 . Let Y = 2/(σ 2 δ). Then Y |S 2 ∼ gamma(γ, 2). (Note: If 2α is an integer, this is a χ 2 distribution.) Let M denote the median of a gamma(γ, 2) distribution. Note that M depends on only α and n, not on S 2 or β. Then we have P(Y ≥ 2/δ|S 2 ) = P(σ 2 ≤ 1|S 2 ) > 1/2 if and only if M > 2 δ = (n −1)S 2 + 2 β , that is, S 2 < M −2/β n −1 . b. From Example 7.2.11, the unrestricted MLEs are ˆ µ = ¯ X and ˆ σ 2 = (n −1)S 2 /n. Under H 0 , ˆ µ is still ¯ X, because this was the maximizing value of µ, regardless of σ 2 . Then because L(¯ x, σ 2 |x) is a unimodal function of σ 2 , the restricted MLE of σ 2 is ˆ σ 2 , if ˆ σ 2 ≤ 1, and is 1, if ˆ σ 2 > 1. So the LRT statistic is λ(x) = _ 1 if ˆ σ 2 ≤ 1 (ˆ σ 2 ) n/2 e −n(ˆ σ 2 −1)/2 if ˆ σ 2 > 1. Second Edition 8-7 We have that, for ˆ σ 2 > 1, ∂(ˆ σ 2 ) log λ(x) = n 2 _ 1 ˆ σ 2 −1 _ < 0. So λ(x) is decreasing in ˆ σ 2 , and rejecting H 0 for small values of λ(x) is equivalent to rejecting for large values of ˆ σ 2 , that is, large values of S 2 . The LRT accepts H 0 if and only if S 2 < k, where k is a constant. We can pick the prior parameters so that the acceptance regions match in this way. First, pick α large enough that M/(n−1) > k. Then, as β varies between 0 and ∞, (M −2/β)/(n −1) varies between −∞ and M/(n −1). So, for some choice of β, (M −2/β)/(n −1) = k and the acceptance regions match. 8.12 a. For H 0 : µ ≤ 0 vs. H 1 : µ > 0 the LRT is to reject H 0 if ¯ x > cσ/ n (Example 8.3.3). For α = .05 take c = 1.645. The power function is β(µ) = P _ ¯ X−µ σ/ n > 1.645− µ σ/ n _ = P _ Z > 1.645− σ _ . Note that the power will equal .5 when µ = 1.645σ/ n. b. For H 0 : µ = 0 vs. H A : µ = 0 the LRT is to reject H 0 if |¯ x| > cσ/ n (Example 8.2.2). For α = .05 take c = 1.96. The power function is β(µ) = P _ −1.96 − nµ/σ ≤ Z ≤ 1.96 + nµ/σ _ . In this case, µ = ±1.96σ/ n gives power of approximately .5. 8.13 a. The size of φ 1 is α 1 = P(X 1 > .95|θ = 0) = .05. The size of φ 2 is α 2 = P(X 1 +X 2 > C|θ = 0). If 1 ≤ C ≤ 2, this is α 2 = P(X 1 +X 2 > C|θ = 0) = _ 1 1−C _ 1 C−x 1 1 dx 2 dx 1 = (2 −C) 2 2 . Setting this equal to α and solving for C gives C = 2 − 2α, and for α = .05, we get C = 2 − .1 ≈ 1.68. b. For the ﬁrst test we have the power function β 1 (θ) = P θ (X 1 > .95) = _ 0 if θ ≤ −.05 θ +.05 if −.05 < θ ≤ .95 1 if .95 < θ. Using the distribution of Y = X 1 +X 2 , given by f Y (y|θ) = _ y −2θ if 2θ ≤ y < 2θ + 1 2θ + 2 −y if 2θ+1 ≤ y < 2θ + 2 0 otherwise, we obtain the power function for the second test as β 2 (θ) = P θ (Y > C) = _ ¸ _ ¸ _ 0 if θ ≤ (C/2) −1 (2θ + 2 −C) 2 /2 if (C/2) −1 < θ ≤ (C −1)/2 1 −(C −2θ) 2 /2 if (C −1)/2 < θ ≤ C/2 1 if C/2 < θ. c. From the graph it is clear that φ 1 is more powerful for θ near 0, but φ 2 is more powerful for larger θs. φ 2 is not uniformly more powerful than φ 1 . 8-8 Solutions Manual for Statistical Inference d. If either X 1 ≥ 1 or X 2 ≥ 1, we should reject H 0 , because if θ = 0, P(X i < 1) = 1. Thus, consider the rejection region given by {(x 1 , x 2 ): x 1 +x 2 > C} _ {(x 1 , x 2 ): x 1 > 1} _ {(x 1 , x 2 ): x 2 > 1}. The ﬁrst set is the rejection region for φ 2 . The test with this rejection region has the same size as φ 2 because the last two sets both have probability 0 if θ = 0. But for 0 < θ < C −1, The power function of this test is strictly larger than β 2 (θ). If C − 1 ≤ θ, this test and φ 2 have the same power. 8.14 The CLT tells us that Z = ( i X i −np)/ _ np(1 −p) is approximately n(0, 1). For a test that rejects H 0 when i X i > c, we need to ﬁnd c and n to satisfy P _ Z > c−n(.49) _ n(.49)(.51) _ = .01 and P _ Z > c−n(.51) _ n(.51)(.49) _ = .99. We thus want c−n(.49) _ n(.49)(.51) = 2.33 and c−n(.51) _ n(.51)(.49) = −2.33. Solving these equations gives n = 13,567 and c = 6,783.5. 8.15 From the Neyman-Pearson lemma the UMP test rejects H 0 if f(x | σ 1 ) f(x | σ 0 ) = (2πσ 2 1 ) −n/2 e −Σ i x 2 i /(2σ 2 1 ) (2πσ 2 0 ) −n/2 e −Σ i x 2 i /(2σ 2 0 ) = _ σ 0 σ 1 _ n exp _ 1 2 i x 2 i _ 1 σ 2 0 1 σ 2 1 _ _ > k for some k ≥ 0. After some algebra, this is equivalent to rejecting if i x 2 i > 2log (k (σ 1 0 ) n ) _ 1 σ 2 0 1 σ 2 1 _ = c _ because 1 σ 2 0 1 σ 2 1 > 0 _ . This is the UMP test of size α, where α = P σ 0 ( i X 2 i > c). To determine c to obtain a speciﬁed α, use the fact that i X 2 i 2 0 ∼ χ 2 n . Thus α = P σ 0 _ i X 2 i 2 0 > c/σ 2 0 _ = P _ χ 2 n > c/σ 2 0 _ , so we must have c/σ 2 0 = χ 2 n,α , which means c = σ 2 0 χ 2 n,α . 8.16 a. Size = P(reject H 0 | H 0 is true) = 1 ⇒ Type I error = 1. Power = P(reject H 0 | H A is true) = 1 ⇒ Type II error = 0. b. Size = P(reject H 0 | H 0 is true) = 0 ⇒ Type I error = 0. Power = P(reject H 0 | H A is true) = 0 ⇒ Type II error = 1. 8.17 a. The likelihood function is L(µ, θ|x, y) = µ n _ i x i _ µ−1 θ n _ _ j y j _ _ θ−1 . Second Edition 8-9 Maximizing, by diﬀerentiating the log-likelihood, yields the MLEs ˆ µ = − n i log x i and ˆ θ = − m j log y j . Under H 0 , the likelihood is L(θ|x, y) = θ n+m _ _ i x i j y j _ _ θ−1 , and maximizing as above yields the restricted MLE, ˆ θ 0 = − n +m i log x i + j log y j . The LRT statistic is λ(x, y) = ˆ θ m+n 0 ˆ µ θ m _ i x i θ 0 −ˆ µ _ _ j y j _ _ ˆ θ 0 ˆ θ . b. Substituting in the formulas for ˆ θ, ˆ µ and ˆ θ 0 yields ( i x i ) ˆ θ 0 −ˆ µ _ j y j θ 0 ˆ θ = 1 and λ(x, y) = ˆ θ m+n 0 ˆ µ θ m = ˆ θ n 0 ˆ µ n ˆ θ m 0 ˆ θ m = _ m+n m _ m _ m+n n _ n (1 −T) m T n . This is a unimodal function of T. So rejecting if λ(x, y) ≤ c is equivalent to rejecting if T ≤ c 1 or T ≥ c 2 , where c 1 and c 2 are appropriately chosen constants. c. Simple transformations yield −log X i ∼ exponential(1/µ) and −log Y i ∼ exponential(1/θ). Therefore, T = W/(W + V ) where W and V are independent, W ∼ gamma(n, 1/µ) and V ∼ gamma(m, 1/θ). Under H 0 , the scale parameters of W and V are equal. Then, a simple generalization of Exercise 4.19b yields T ∼ beta(n, m). The constants c 1 and c 2 are determined by the two equations P(T ≤ c 1 ) +P(T ≥ c 2 ) = α and (1 −c 1 ) m c n 1 = (1 −c 2 ) m c n 2 . 8.18 a. β(θ) = P θ _ | ¯ X−θ 0 | σ/ n > c _ = 1 −P θ _ | ¯ X−θ 0 | σ/ n ≤ c _ = 1 −P θ _ n ¯ X−θ 0 n _ = 1 −P θ _ −cσ/ n +θ 0 −θ σ/ n ¯ X−θ σ/ n cσ/ n +θ 0 −θ σ/ n _ = 1 −P _ −c + θ 0 −θ σ/ n ≤ Z ≤ c + θ 0 −θ σ/ n _ = 1 + Φ _ −c + θ 0 −θ σ/ n _ −Φ _ c + θ 0 −θ σ/ n _ , where Z ∼ n(0, 1) and Φ is the standard normal cdf. 8-10 Solutions Manual for Statistical Inference b. The size is .05 = β(θ 0 ) = 1 + Φ(−c) − Φ(c) which implies c = 1.96. The power (1 − type II error) is .75 ≤ β(θ 0 +σ) = 1 + Φ(−c − n) −Φ(c − n) = 1 + Φ(−1.96− n) . ¸¸ . ≈0 −Φ(1.96 − n). Φ(−.675) ≈ .25 implies 1.96 − n = −.675 implies n = 6.943 ≈ 7. 8.19 The pdf of Y is f(y|θ) = 1 θ y (1/θ)−1 e −y 1/θ , y > 0. By the Neyman-Pearson Lemma, the UMP test will reject if 1 2 y −1/2 e y−y 1/2 = f(y|2) f(y|1) > k. To see the form of this rejection region, we compute d dy _ 1 2 y −1/2 e y−y 1/2 _ = 1 2 y −3/2 e y−y 1/2 _ y − y 1/2 2 1 2 _ which is negative for y < 1 and positive for y > 1. Thus f(y|2)/f(y|1) is decreasing for y ≤ 1 and increasing for y ≥ 1. Hence, rejecting for f(y|2)/f(y|1) > k is equivalent to rejecting for y ≤ c 0 or y ≥ c 1 . To obtain a size α test, the constants c 0 and c 1 must satisfy α = P(Y ≤ c 0 |θ = 1) +P(Y ≥ c 1 |θ = 1) = 1 −e −c 0 +e −c 1 and f(c 0 |2) f(c 0 |1) = f(c 1 |2) f(c 1 |1) . Solving these two equations numerically, for α = .10, yields c 0 = .076546 and c 1 = 3.637798. The Type II error probability is P(c 0 < Y < c 1 |θ = 2) = _ c 1 c 0 1 2 y −1/2 e −y 1/2 dy = −e −y 1/2 ¸ ¸ ¸ c 1 c 0 = .609824. 8.20 By the Neyman-Pearson Lemma, the UMP test rejects for large values of f(x|H 1 )/f(x|H 0 ). Computing this ratio we obtain x 1 2 3 4 5 6 7 f(x|H 1 ) f(x|H 0 ) 6 5 4 3 2 1 .84 The ratio is decreasing in x. So rejecting for large values of f(x|H 1 )/f(x|H 0 ) corresponds to rejecting for small values of x. To get a size α test, we need to choose c so that P(X ≤ c|H 0 ) = α. The value c = 4 gives the UMP size α = .04 test. The Type II error probability is P(X = 5, 6, 7|H 1 ) = .82. 8.21 The proof is the same with integrals replaced by sums. 8.22 a. From Corollary 8.3.13 we can base the test on i X i , the suﬃcient statistic. Let Y = i X i ∼ binomial(10, p) and let f(y|p) denote the pmf of Y . By Corollary 8.3.13, a test that rejects if f(y|1/4)/f(y|1/2) > k is UMP of its size. By Exercise 8.25c, the ratio f(y|1/2)/f(y|1/4) is increasing in y. So the ratio f(y|1/4)/f(y|1/2) is decreasing in y, and rejecting for large value of the ratio is equivalent to rejecting for small values of y. To get α = .0547, we must ﬁnd c such that P(Y ≤ c|p = 1/2) = .0547. Trying values c = 0, 1, . . ., we ﬁnd that for c = 2, P (Y ≤ 2|p = 1/2) = .0547. So the test that rejects if Y ≤ 2 is the UMP size α = .0547 test. The power of the test is P (Y ≤ 2|p = 1/4) ≈ .526. Second Edition 8-11 b. The size of the test is P (Y ≥ 6|p = 1/2) = 10 k=6 _ 10 k _ _ 1 2 _ k _ 1 2 _ 10−k ≈ .377. The power function is β(θ) = 10 k=6 _ 10 k _ θ k (1 −θ) 10−k c. There is a nonrandomized UMP test for all α levels corresponding to the probabilities P (Y ≤ i|p = 1/2), where i is an integer. For n = 10, α can have any of the values 0, 1 1024 , 11 1024 , 56 1024 , 176 1024 , 386 1024 , 638 1024 , 848 1024 , 968 1024 , 1013 1024 , 1023 1024 , and 1. 8.23 a. The test is Reject H 0 if X > 1/2. So the power function is β(θ) = P θ (X > 1/2) = _ 1 1/2 Γ(θ+1) Γ(θ)Γ(1) x θ−1 (1 −x) 1−1 dx = θ 1 θ x θ ¸ ¸ ¸ ¸ 1 1/2 = 1 − 1 2 θ . The size is sup θ∈H 0 β(θ) = sup θ≤1 (1 −1/2 θ ) = 1 −1/2 = 1/2. b. By the Neyman-Pearson Lemma, the most powerful test of H 0 : θ = 1 vs. H 1 : θ = 2 is given by Reject H 0 if f(x | 2)/f(x | 1) > k for some k ≥ 0. Substituting the beta pdf gives f(x|2) f(x|1) = 1 β(2,1) x 2−1 (1 −x) 1−1 1 β(1,1) x 1−1 (1 −x) 1−1 = Γ(3) Γ(2)Γ(1) x = 2x. Thus, the MP test is Reject H 0 if X > k/2. We now use the α level to determine k. We have α = sup θ∈Θ 0 β(θ) = β(1) = _ 1 k/2 f X (x|1) dx = _ 1 k/2 1 β(1, 1) x 1−1 (1 −x) 1−1 dx = 1 − k 2 . Thus 1 −k/2 = α, so the most powerful α level test is reject H 0 if X > 1 −α. c. For θ 2 > θ 1 , f(x|θ 2 )/f(x|θ 1 ) = (θ 2 1 )x θ 2 −θ 1 , an increasing function of x because θ 2 > θ 1 . So this family has MLR. By the Karlin-Rubin Theorem, the test that rejects H 0 if X > t is the UMP test of its size. By the argument in part (b), use t = 1 −α to get size α. 8.24 For H 0 : θ = θ 0 vs. H 1 : θ = θ 1 , the LRT statistic is λ(x) = L(θ 0 |x) max{L(θ 0 |x), L(θ 1 |x)} = _ 1 if L(θ 0 |x) ≥ L(θ 1 |x) L(θ 0 |x)/L(θ 1 |x) if L(θ 0 |x) < L(θ 1 |x). The LRT rejects H 0 if λ(x) < c. The Neyman-Pearson test rejects H 0 if f(x|θ 1 )/f(x|θ 0 ) = L(θ 1 |x)/L(θ 0 |x) > k. If k = 1/c > 1, this is equivalent to L(θ 0 |x)/L(θ 1 |x) < c, the LRT. But if c ≥ 1 or k ≤ 1, the tests will not be the same. Because c is usually chosen to be small (k large) to get a small size α, in practice the two tests are often the same. 8.25 a. For θ 2 > θ 1 , g(x | θ 2 ) g(x | θ 1 ) = e −(x−θ 2 ) 2 /2σ 2 e −(x−θ 1 ) 2 /2σ 2 = e x(θ 2 −θ 1 )/σ 2 e 2 1 −θ 2 2 )/2σ 2 . Because θ 2 −θ 1 > 0, the ratio is increasing in x. So the families of n(θ, σ 2 ) have MLR. b. For θ 2 > θ 1 , g(x | θ 2 ) g(x | θ 1 ) = e −θ 2 θ x 2 /x! e −θ 1 θ x 1 /x! = _ θ 2 θ 1 _ x e θ 1 −θ 2 , which is increasing in x because θ 2 1 > 1. Thus the Poisson(θ) family has an MLR. c. For θ 2 > θ 1 , g(x | θ 2 ) g(x | θ 1 ) = _ n x _ θ x 2 (1−θ 2 ) n−x _ n x _ θ x 1 (1−θ 1 ) n−x = _ θ 2 (1−θ 1 ) θ 1 (1−θ 2 ) _ x _ 1 −θ 2 1 −θ 1 _ n . Both θ 2 1 > 1 and (1 −θ 1 )/(1 −θ 2 ) > 1. Thus the ratio is increasing in x, and the family has MLR. (Note: You can also use the fact that an exponential family h(x)c(θ) exp(w(θ)x) has MLR if w(θ) is increasing in θ (Exercise 8.27). For example, the Poisson(θ) pmf is e −θ exp(xlog θ)/x!, and the family has MLR because log θ is increasing in θ.) 8-12 Solutions Manual for Statistical Inference 8.26 a. We will prove the result for continuous distributions. But it is also true for discrete MLR families. For θ 1 > θ 2 , we must show F(x|θ 1 ) ≤ F(x|θ 2 ). Now d dx [F(x|θ 1 ) −F(x|θ 2 )] = f(x|θ 1 ) −f(x|θ 2 ) = f(x|θ 2 ) _ f(x|θ 1 ) f(x|θ 2 ) −1 _ . Because f has MLR, the ratio on the right-hand side is increasing, so the derivative can only change sign from negative to positive showing that any interior extremum is a minimum. Thus the function in square brackets is maximized by its value at ∞ or −∞, which is zero. b. From Exercise 3.42, location families are stochastically increasing in their location param- eter, so the location Cauchy family with pdf f(x|θ) = (π[1+(x−θ) 2 ]) −1 is stochastically increasing. The family does not have MLR. 8.27 For θ 2 > θ 1 , g(t|θ 2 ) g(t|θ 1 ) = c(θ 2 ) c(θ 1 ) e [w(θ 2 )−w(θ 1 )]t which is increasing in t because w(θ 2 ) − w(θ 1 ) > 0. Examples include n(θ, 1), beta(θ, 1), and Bernoulli(θ). 8.28 a. For θ 2 > θ 1 , the likelihood ratio is f(x|θ 2 ) f(x|θ 1 ) = e θ 1 −θ 2 _ 1+e x−θ 1 1+e x−θ 2 _ 2 . The derivative of the quantity in brackets is d dx 1+e x−θ 1 1+e x−θ 2 = e x−θ 1 −e x−θ 2 (1+e x−θ 2 ) 2 . Because θ 2 > θ 1 , e x−θ 1 > e x−θ 2 , and, hence, the ratio is increasing. This family has MLR. b. The best test is to reject H 0 if f(x|1)/f(x|0) > k. From part (a), this ratio is increasing in x. Thus this inequality is equivalent to rejecting if x > k . The cdf of this logistic is F(x|θ) = e x−θ _ (1 +e x−θ ). Thus α = 1 −F(k |0) = 1 1+e k and β = F(k |1) = e k −1 1+e k −1 . For a speciﬁed α, k = log(1 −α)/α. So for α = .2, k ≈ 1.386 and β ≈ .595. c. The Karlin-Rubin Theorem is satisﬁed, so the test is UMP of its size. 8.29 a. Let θ 2 > θ 1 . Then f(x|θ 2 ) f(x|θ 1 ) = 1+(x −θ 1 ) 2 1+(x −θ 2 ) 2 = 1 + (1+θ 1 ) 2 /x 2 −2θ 1 /x 1 + (1+θ 2 ) 2 /x 2 −2θ 2 /x . The limit of this ratio as x → ∞ or as x → −∞ is 1. So the ratio cannot be monotone increasing (or decreasing) between −∞ and ∞. Thus, the family does not have MLR. b. By the Neyman-Pearson Lemma, a test will be UMP if it rejects when f(x|1)/f(x|0) > k, for some constant k. Examination of the derivative shows that f(x|1)/f(x|0) is decreasing for x ≤ (1 − 5)/2 = −.618, is increasing for (1 − 5)/2 ≤ x ≤ (1 + 5)/2 = 1.618, and is decreasing for (1+ 5)/2 ≤ x. Furthermore, f(1|1)/f(1|0) = f(3|1)/f(3|0) = 2. So rejecting if f(x|1)/f(x|0) > 2 is equivalent to rejecting if 1 < x < 3. Thus, the given test is UMP of its size. The size of the test is P(1 < X < 3|θ = 0) = _ 3 1 1 π 1 1+x 2 dx = 1 π arctanx ¸ ¸ ¸ ¸ 3 1 ≈ .1476. Second Edition 8-13 The Type II error probability is 1 −P(1 < X < 3|θ = 1) = 1 − _ 3 1 1 π 1 1+(x −1) 2 dx = 1 − 1 π arctan(x −1) ¸ ¸ ¸ ¸ 3 1 ≈ .6476. c. We will not have f(1|θ)/f(1|0) = f(3|θ)/f(3|0) for any other value of θ = 1. Try θ = 2, for example. So the rejection region 1 < x < 3 will not be most powerful at any other value of θ. The test is not UMP for testing H 0 : θ ≤ 0 versus H 1 : θ > 0. 8.30 a. For θ 2 > θ 1 > 0, the likelihood ratio and its derivative are f(x|θ 2 ) f(x|θ 1 ) = θ 2 θ 1 θ 2 1 +x 2 θ 2 2 +x 2 and d dx f(x|θ 2 ) f(x|θ 1 ) = θ 2 θ 1 θ 2 2 −θ 2 1 2 2 +x 2 ) 2 x. The sign of the derivative is the same as the sign of x (recall, θ 2 2 − θ 2 1 > 0), which changes sign. Hence the ratio is not monotone. b. Because f(x|θ) = (θ/π)(θ 2 +|x| 2 ) −1 , Y = |X| is suﬃcient. Its pdf is f(y|θ) = π 1 θ 2 +y 2 , y > 0. Diﬀerentiating as above, the sign of the derivative is the same as the sign of y, which is positive. Hence the family has MLR. 8.31 a. By the Karlin-Rubin Theorem, the UMP test is to reject H 0 if i X i > k, because i X i is suﬃcient and i X i ∼ Poisson(nλ) which has MLR. Choose the constant k to satisfy P ( i X i > k|λ = λ 0 ) = α. b. P _ i X i > k ¸ ¸ ¸ ¸ ¸ λ = 1 _ ≈ P _ Z > (k −n)/ n _ set = .05, P _ i X i > k ¸ ¸ ¸ ¸ ¸ λ = 2 _ ≈ P _ Z > (k −2n)/ 2n _ set = .90. Thus, solve for k and n in k −n n = 1.645 and k −2n 2n = −1.28, yielding n = 12 and k = 17.70. 8.32 a. This is Example 8.3.15. b. This is Example 8.3.19. 8.33 a. From Theorems 5.4.4 and 5.4.6, the marginal pdf of Y 1 and the joint pdf of (Y 1 , Y n ) are f(y 1 |θ) = n(1 −(y 1 −θ)) n−1 , θ < y 1 < θ + 1, f(y 1 , y n |θ) = n(n −1)(y n −y 1 ) n−2 , θ < y 1 < y n < θ + 1. Under H 0 , P(Y n ≥ 1) = 0. So α = P(Y 1 ≥ k|0) = _ 1 k n(1 −y 1 ) n−1 dy 1 = (1 −k) n . Thus, use k = 1 −α 1/n to have a size α test. 8-14 Solutions Manual for Statistical Inference b. For θ ≤ k −1, β(θ) = 0. For k −1 < θ ≤ 0, β(θ) = _ θ+1 k n(1 −(y 1 −θ)) n−1 dy 1 = (1 −k +θ) n . For 0 < θ ≤ k, β(θ) = _ θ+1 k n(1 −(y 1 −θ)) n−1 dy 1 + _ k θ _ θ+1 1 n(n −1)(y n −y 1 ) n−2 dy n dy 1 = α + 1 −(1 −θ) n . And for k < θ, β(θ) = 1. c. (Y 1 , Y n ) are suﬃcient statistics. So we can attempt to ﬁnd a UMP test using Corollary 8.3.13 and the joint pdf f(y 1 , y n |θ) in part (a). For 0 < θ < 1, the ratio of pdfs is f(y 1 , y n |θ) f(y 1 , y n |0) = _ 0 if 0 < y 1 ≤ θ, y 1 < y n < 1 1 if θ < y 1 < y n < 1 ∞ if 1 ≤ y n < θ + 1, θ < y 1 < y n . For 1 ≤ θ, the ratio of pdfs is f(y 1 , y n |θ) f(y 1 , y n |0) = _ 0 if y 1 < y n < 1 ∞ if θ < y 1 < y n < θ + 1. For 0 < θ < k, use k = 1. The given test always rejects if f(y 1 , y n |θ)/f(y 1 , y n |0) > 1 and always accepts if f(y 1 , y n |θ)/f(y 1 , y n |0) < 1. For θ ≥ k, use k = 0. The given test always rejects if f(y 1 , y n |θ)/f(y 1 , y n |0) > 0 and always accepts if f(y 1 , y n |θ)/f(y 1 , y n |0) < 0. Thus the given test is UMP by Corollary 8.3.13. d. According to the power function in part (b), β(θ) = 1 for all θ ≥ k = 1 − α 1/n . So these conditions are satisﬁed for any n. 8.34 a. This is Exercise 3.42a. b. This is Exercise 8.26a. 8.35 a. We will use the equality in Exercise 3.17 which remains true so long as ν > −α. Recall that Y ∼ χ 2 ν = gamma(ν/2, 2). Thus, using the independence of X and Y we have ET = E X _ Y/ν = (EX) νEY −1/2 = µ ν Γ((ν −1)/2) Γ(ν/2) 2 if ν > 1. To calculate the variance, compute E(T ) 2 = E X 2 Y/ν = (EX 2 )νEY −1 = (µ 2 + 1)ν Γ((ν −2)/2) Γ(ν/2)2 = 2 + 1)ν ν −2 if ν > 2. Thus, if ν > 2, Var T = 2 + 1)ν ν −2 _ µ ν Γ((ν −1)/2) Γ(ν/2) 2 _ 2 b. If δ = 0, all the terms in the sum for k = 1, 2, . . . are zero because of the δ k term. The expression with just the k = 0 term and δ = 0 simpliﬁes to the central t pdf. c. The argument that the noncentral t has an MLR is fairly involved. It may be found in Lehmann (1986, p. 295). Second Edition 8-15 8.37 a. P( ¯ X > θ 0 + z α σ/ n|θ 0 ) = P _ ( ¯ X−θ 0 )/(σ/ n) > z α 0 _ = P(Z > z α ) = α, where Z ∼ n(0, 1). Because ¯ x is the unrestricted MLE, and the restricted MLE is θ 0 if ¯ x > θ 0 , the LRT statistic is, for ¯ x ≥ θ 0 λ(x) = (2πσ 2 ) −n/2 e −Σ i (x i −θ 0 ) 2 /2σ 2 (2πσ 2 ) −n/2 e −Σ i (x i −¯ x) 2 /2σ 2 = e −[n(¯ x−θ 0 ) 2 +(n−1)s 2 ]] _ 2 e −(n−1)s 2 /2σ 2 = e −n(¯ x−θ 0 ) 2 /2σ 2 . and the LRT statistic is 1 for ¯ x < θ 0 . Thus, rejecting if λ < c is equivalent to rejecting if (¯ x −θ 0 )/(σ/ n) > c (as long as c < 1 – see Exercise 8.24). b. The test is UMP by the Karlin-Rubin Theorem. c. P( ¯ X > θ 0 + t n−1,α S/ n|θ = θ 0 ) = P(T n−1 > t n−1,α ) = α, when T n−1 is a Student’s t random variable with n − 1 degrees of freedom. If we deﬁne ˆ σ 2 = 1 n (x i − ¯ x) 2 and ˆ σ 2 0 = 1 n (x i −θ 0 ) 2 , then for ¯ x ≥ θ 0 the LRT statistic is λ = (ˆ σ 2 /ˆ σ 2 0 ) n/2 , and for ¯ x < θ 0 the LRT statistic is λ = 1. Writing ˆ σ 2 = n−1 n s 2 and ˆ σ 2 0 = (¯ x −θ 0 ) 2 + n−1 n s 2 , it is clear that the LRT is equivalent to the t-test because λ < c when n−1 n s 2 (¯ x−θ 0 ) 2 + n−1 n s 2 = (n −1)/n (¯ x−θ 0 ) 2 /s 2 +(n −1)/n < c and ¯ x ≥ θ 0 , which is the same as rejecting when (¯ x −θ 0 )/(s/ n) is large. d. The proof that the one-sided t test is UMP unbiased is rather involved, using the bounded completeness of the normal distribution and other facts. See Chapter 5 of Lehmann (1986) for a complete treatment. 8.38 a. Size = P θ 0 _ | ¯ X −θ 0 |> t n−1,α/2 _ S 2 /n _ = 1 −P θ 0 _ −t n−1,α/2 _ S 2 /n ≤ ¯ X −θ 0 ≤ t n−1,α/2 _ S 2 /n _ = 1 −P θ 0 _ −t n−1,α/2 ¯ X −θ 0 _ S 2 /n ≤ t n−1,α/2 _ _ ¯ X −θ 0 _ S 2 /n ∼ t n−1 under H 0 _ = 1 −(1 −α) = α. b. The unrestricted MLEs are ˆ θ = ¯ X and ˆ σ 2 = i (X i ¯ X) 2 /n. The restricted MLEs are ˆ θ 0 = θ 0 and ˆ σ 2 0 = i (X i −θ 0 ) 2 /n. So the LRT statistic is λ(x) = (2πˆ σ 0 ) −n/2 exp{−nˆ σ 2 0 /(2ˆ σ 2 0 )} (2πˆ σ) −n/2 exp{−nˆ σ 2 /(2ˆ σ 2 )} = _ i (x i −¯ x) 2 i (x i −θ 0 ) 2 _ n/2 = _ i (x i −¯ x) 2 i (x i −¯ x) 2 +n(¯ x−θ 0 ) 2 _ n/2 . For a constant c, the LRT is reject H 0 if _ i (x i −¯ x) 2 i (x i −¯ x) 2 +n(¯ x−θ 0 ) 2 _ = 1 1 +n(¯ x−θ 0 ) 2 / i (x i −¯ x) 2 < c 2/n . After some algebra we can write the test as reject H 0 if |¯ x −θ 0 |> _ _ c −2/n −1 _ (n −1) s 2 n _ 1/2 . 8-16 Solutions Manual for Statistical Inference We now choose the constant c to achieve size α, and we reject if |¯ x −θ 0 |> t n−1,α/2 _ s 2 /n. c. Again, see Chapter 5 of Lehmann (1986). 8.39 a. From Exercise 4.45c, W i = X i − Y i ∼ n(µ W , σ 2 W ), where µ X − µ Y = µ W and σ 2 X + σ 2 Y ρσ X σ Y = σ 2 W . The W i s are independent because the pairs (X i , Y i ) are. b. The hypotheses are equivalent to H 0 : µ W = 0 vs H 1 : µ W = 0, and, from Exercise 8.38, if we reject H 0 when | ¯ W |> t n−1,α/2 _ S 2 W /n, this is the LRT (based on W 1 , . . . , W n ) of size α. (Note that if ρ > 0, Var W i can be small and the test will have good power.) 8.41 a. λ(x, y) = sup H 0 L(µ X , µ Y , σ 2 | x, y) supL(µ X , µ Y , σ 2 | x, y) = L(ˆ µ, ˆ σ 2 0 | x, y) L(ˆ µ X , ˆ µ Y , ˆ σ 2 1 | x, y) . Under H 0 , the X i s and Y i s are one sample of size m+n from a n(µ, σ 2 ) population, where µ = µ X = µ Y . So the restricted MLEs are ˆ µ = i X i + i Y i n +m = n¯ x+n¯ y n +m and ˆ σ 2 0 = i (X i − ˆ µ) 2 + i (Y i − ˆ µ) 2 n +m . To obtain the unrestricted MLEs, ˆ µ x , ˆ µ y , ˆ σ 2 , use L(µ X , µ Y , σ 2 |x, y) = (2πσ 2 ) −(n+m)/2 e −[Σ i (x i −µ X ) 2 i (y i −µ Y ) 2 ]/2σ 2 . Firstly, note that ˆ µ X = ¯ x and ˆ µ Y = ¯ y, because maximizing over µ X does not involve µ Y and vice versa. Then ∂log L ∂σ 2 = − n +m 2 1 σ 2 + 1 2 _ i (x i − ˆ µ X ) 2 + i (y i − ˆ µ Y ) 2 _ 1 2 ) 2 set = 0 implies ˆ σ 2 = _ n i=1 (x i − ¯ x) 2 + m i=1 (y i − ¯ y) 2 _ 1 n +m . To check that this is a maximum, 2 log L ∂(σ 2 ) 2 ¸ ¸ ¸ ¸ ¸ ˆ σ 2 = n +m 2 1 2 ) 2 _ i (x i − ˆ µ X ) 2 + i (y i − ˆ µ Y ) 2 _ 1 2 ) 3 ¸ ¸ ¸ ¸ ¸ ˆ σ 2 = n +m 2 1 (ˆ σ 2 ) 2 −(n +m) 1 (ˆ σ 2 ) 2 = − n +m 2 1 (ˆ σ 2 ) 2 < 0. Thus, it is a maximum. We then have λ(x, y) = (2πˆ σ 2 0 ) n+m 2 exp _ 1 2ˆ σ 2 0 _ n i=1 (x i − ˆ µ) 2 + m i=1 (y i − ˆ µ) 2 __ (2πˆ σ 2 ) n+m 2 exp _ 1 2ˆ σ 2 _ n i=1 (x i − ¯ x) 2 + m i=1 (y i − ¯ y) 2 __ = _ ˆ σ 2 0 ˆ σ 2 1 _ n+m 2 and the LRT is rejects H 0 if ˆ σ 2 0 /ˆ σ 2 > k. In the numerator, ﬁrst substitute ˆ µ = (n¯ x + m¯ y)/(n +m) and write n i=1 _ x i n¯ x+m¯ y n +m _ 2 = n i=1 _ (x i −¯ x)+ _ ¯ x− n¯ x+m¯ y n +m __ 2 = n i=1 (x i − ¯ x) 2 + nm 2 (n +m) 2 (¯ x − ¯ y) 2 , Second Edition 8-17 because the cross term is zero. Performing a similar operation on the Y sum yields ˆ σ 2 0 ˆ σ 2 = (x i −¯ x) 2 + (y i −¯ y) 2 + nm n+m (¯ x−¯ y) 2 ˆ σ 2 = n +m+ nm n +m (¯ x−¯ y) 2 ˆ σ 2 . Because ˆ σ 2 = n+m−2 n+m S 2 p , large values of ˆ σ 2 0 _ ˆ σ 2 are equivalent to large values of (¯ x−¯ y) 2 _ S 2 p and large values of |T|. Hence, the LRT is the two-sample t-test. b. T = ¯ X − ¯ Y _ S 2 p (1/n + 1/m) = ( ¯ X − ¯ Y ) _ _ σ 2 (1/n + 1/m) _ [(n +m−2)S 2 p 2 ]/(n +m−2) . Under H 0 , ( ¯ X− ¯ Y ) ∼ n(0, σ 2 (1/n+1/m)). Under the model, (n−1)S 2 X 2 and (m−1)S 2 Y 2 are independent χ 2 random variables with (n − 1) and (m− 1) degrees of freedom. Thus, (n + m − 2)S 2 p 2 = (n − 1)S 2 X 2 + (m − 1)S 2 Y 2 ∼ χ 2 n+m−2 . Furthermore, ¯ X − ¯ Y is independent of S 2 X and S 2 Y , and, hence, S 2 p . So T ∼ t n+m−2 . c. The two-sample t test is UMP unbiased, but the proof is rather involved. See Chapter 5 of Lehmann (1986). d. For these data we have n = 14, ¯ X = 1249.86, S 2 X = 591.36, m = 9, ¯ Y = 1261.33, S 2 Y = 176.00 and S 2 p = 433.13. Therefore, T = −1.29 and comparing this to a t 21 distribution gives a p-value of .21. So there is no evidence that the mean age diﬀers between the core and periphery. 8.42 a. The Satterthwaite approximation states that if Y i ∼ χ 2 r i , where the Y i ’s are independent, then i a i Y i approx χ 2 ˆ ν ˆ ν where ˆ ν = ( i a i Y i ) 2 i a 2 i Y 2 i /r i . We have Y 1 = (n −1)S 2 X 2 X ∼ χ 2 n−1 and Y 2 = (m−1)S 2 Y 2 Y ∼ χ 2 m−1 . Now deﬁne a 1 = σ 2 X n(n −1) [(σ 2 X /n) + (σ 2 Y /m)] and a 2 = σ 2 Y m(m−1) [(σ 2 X /n) + (σ 2 Y /m)] . Then, a i Y i = σ 2 X n(n −1) [(σ 2 X /n) + (σ 2 Y /m)] (n −1)S 2 X σ 2 X + σ 2 Y m(m−1) [(σ 2 X /n) + (σ 2 Y /m)] (m−1)S 2 Y σ 2 Y = S 2 X /n +S 2 Y /m σ 2 X /n+σ 2 Y /m χ 2 ˆ ν ˆ ν where ˆ ν = _ S 2 X /n+S 2 Y /m σ 2 X /n+σ 2 Y /m _ 2 1 (n−1) S 4 X n 2 2 X /n+σ 2 Y /m) 2 + 1 (m−1) S 4 Y m 2 2 X /n+σ 2 Y /m) 2 = _ S 2 X /n +S 2 Y /m _ 2 S 4 X n 2 (n−1) + S 4 Y m 2 (m−1) . b. Because ¯ X − ¯ Y ∼ n _ µ X −µ Y , σ 2 X /n+σ 2 Y /m _ and S 2 X /n+S 2 Y /m σ 2 X /n+σ 2 Y /m approx ∼ χ 2 ˆ ν /ˆ ν, under H 0 : µ X −µ Y = 0 we have T = ¯ X − ¯ Y _ S 2 X /n +S 2 Y /m = ( ¯ X − ¯ Y ) _ _ σ 2 X /n+σ 2 Y /m _ (S 2 X /n+S 2 Y /m) 2 X /n+σ 2 Y /m) approx ∼ t ˆ ν . 8-18 Solutions Manual for Statistical Inference c. Using the values in Exercise 8.41d, we obtain T = −1.46 and ˆ ν = 20.64. So the p-value is .16. There is no evidence that the mean age diﬀers between the core and periphery. d. F = S 2 X /S 2 Y = 3.36. Comparing this with an F 13,8 distribution yields a p-value of 2P(F ≥ 3.36) = .09. So there is some slight evidence that the variance diﬀers between the core and periphery. 8.43 There were typos in early printings. The t statistic should be ( ¯ X − ¯ Y ) −(µ 1 −µ 2 ) _ 1 n 1 + ρ 2 n 2 _ (n 1 −1)s 2 X +(n 2 −1)s 2 Y 2 n 1 +n 2 −2 , and the F statistic should be s 2 Y /(ρ 2 s 2 X ). Multiply and divide the denominator of the t statistic by σ to express it as ( ¯ X − ¯ Y ) −(µ 1 −µ 2 ) _ σ 2 n 1 + ρ 2 σ 2 n 2 divided by ¸ (n 1 −1)s 2 X 2 + (n 2 −1)s 2 Y /(ρ 2 σ 2 ) n 1 +n 2 −2 . The numerator has a n(0, 1) distribution. In the denominator, (n 1 − 1)s 2 X 2 ∼ χ 2 n 1 −1 and (n 2 −1)s 2 Y /(ρ 2 σ 2 ) ∼ χ 2 n 2 −1 and they are independent, so their sum has a χ 2 n 1 +n 2 −2 distribution. Thus, the statistic has the form of n(0, 1)/ _ χ 2 ν /ν where ν = n 1 + n 2 − 2, and the numerator and denominator are independent because of the independence of sample means and variances in normal sampling. Thus the statistic has a t n 1 +n 2 −2 distribution. The F statistic can be written as s 2 Y ρ 2 s 2 X = s 2 Y /(ρ 2 σ 2 ) s 2 X 2 = [(n 2 −1)s 2 Y /(ρ 2 σ 2 )]/(n 2 −1) [(n 1 −1)s 2 X /(σ 2 )]/(n 1 −1) which has the form [χ 2 n 2 −1 /(n 2 − 1)]/[χ 2 n 1 −1 /(n 1 − 1)] which has an F n 2 −1,n 1 −1 distribution. (Note, early printings had a typo with the numerator and denominator degrees of freedom switched.) 8.44 Test 3 rejects H 0 : θ = θ 0 in favor of H 1 : θ = θ 0 if ¯ X > θ 0 +z α/2 σ/ n or ¯ X < θ 0 −z α/2 σ/ n. Let Φ and φ denote the standard normal cdf and pdf, respectively. Because ¯ X ∼ n(θ, σ 2 /n), the power function of Test 3 is β(θ) = P θ ( ¯ X < θ 0 −z α/2 σ/ n) +P θ ( ¯ X > θ 0 +z α/2 σ/ n) = Φ _ θ 0 −θ σ/ n −z α/2 _ + 1 −Φ _ θ 0 −θ σ/ n +z α/2 _ , and its derivative is dβ(θ) = − n σ φ _ θ 0 −θ σ/ n −z α/2 _ + n σ φ _ θ 0 −θ σ/ n +z α/2 _ . Because φ is symmetric and unimodal about zero, this derivative will be zero only if _ θ 0 −θ σ/ n −z α/2 _ = θ 0 −θ σ/ n +z α/2 , that is, only if θ = θ 0 . So, θ = θ 0 is the only possible local maximum or minimum of the power function. β(θ 0 ) = α and lim θ→±∞ β(θ) = 1. Thus, θ = θ 0 is the global minimum of β(θ), and, for any θ = θ 0 , β(θ ) > β(θ 0 ). That is, Test 3 is unbiased. Second Edition 8-19 8.45 The veriﬁcation of size α is the same computation as in Exercise 8.37a. Example 8.3.3 shows that the power function β m (θ) for each of these tests is an increasing function. So for θ > θ 0 , β m (θ) > β m 0 ) = α. Hence, the tests are all unbiased. 8.47 a. This is very similar to the argument for Exercise 8.41. b. By an argument similar to part (a), this LRT rejects H + 0 if T + = ¯ X − ¯ Y −δ _ S 2 p _ 1 n + 1 m _ ≤ −t n+m−2,α . c. Because H 0 is the union of H + 0 and H 0 , by the IUT method of Theorem 8.3.23 the test that rejects H 0 if the tests in parts (a) and (b) both reject is a level α test of H 0 . That is, the test rejects H 0 if T + ≤ −t n+m−2,α and T ≥ t n+m−2,α . d. Use Theorem 8.3.24. Consider parameter points with µ X − µ Y = δ and σ → 0. For any σ, P(T + ≤ −t n+m−2,α ) = α. The power of the T test is computed from the noncentral t distribution with noncentrality parameter |µ x −µ Y −(−δ)|/[σ(1/n + 1/m)] = 2δ/[σ(1/n + 1/m)] which converges to ∞as σ →0. Thus, P(T ≥ t n+m−2,α ) →1 as σ →0. By Theorem 8.3.24, this IUT is a size α test of H 0 . 8.49 a. The p-value is P __ 7 or more successes out of 10 Bernoulli trials ¸ ¸ ¸ θ= 1 2 _ = _ 10 7 __ 1 2 _ 7 _ 1 2 _ 3 + _ 10 8 __ 1 2 _ 8 _ 1 2 _ 2 + _ 10 9 __ 1 2 _ 9 _ 1 2 _ 1 + _ 10 10 __ 1 2 _ 10 _ 1 2 _ 0 = .171875. b. P-value = P{X ≥ 3 | λ = 1} = 1 −P(X < 3 | λ = 1) = 1 − _ e −1 1 2 2! + e −1 1 1 1! + e −1 1 0 0! _ ≈ .0803. c. P-value = P{ i X i ≥ 9 | 3λ = 3} = 1 −P(Y < 9 | 3λ = 3) = 1 −e −3 _ 3 8 8! + 3 7 7! + 3 6 6! + 3 5 5! +· · · + 3 1 1! + 3 0 0! _ ≈ .0038, where Y = 3 i=1 X i ∼ Poisson(3λ). 8.50 From Exercise 7.26, π(θ|x) = _ n 2πσ 2 e −n(θ−δ ± (x)) 2 /(2σ 2 ) , where δ ± (x) = ¯ x ± σ 2 na and we use the “+” if θ > 0 and the “−” if θ < 0. a. For K > 0, P(θ > K|x, a) = _ n 2πσ 2 _ K e −n(θ−δ + (x)) 2 /(2σ 2 ) dθ = P _ Z > n σ [K−δ + (x)] _ , where Z ∼ n(0, 1). 8-20 Solutions Manual for Statistical Inference b. As a →∞, δ + (x) → ¯ x so P(θ > K) →P _ Z > n σ (K−¯ x) _ . c. For K = 0, the answer in part (b) is 1 −(p-value) for H 0 : θ ≤ 0. 8.51 If α < p(x), sup θ∈Θ 0 P(W(X) ≥ c α ) = α < p(x) = sup θ∈Θ 0 P(W(X) ≥ W(x)). Thus W(x) < c α and we could not reject H 0 at level α having observed x. On the other hand, if α ≥ p(x), sup θ∈Θ 0 P(W(X) ≥ c α ) = α ≥ p(x) = sup θ∈Θ 0 P(W(X) ≥ W(x)). Either W(x) ≥ c α in which case we could reject H 0 at level α having observed x or W(x) < c α . But, in the latter case we could use c α = W(x) and have {x : W(x ) ≥ c α } deﬁne a size α rejection region. Then we could reject H 0 at level α having observed x. 8.53 a. P(−∞< θ < ∞) = 1 2 + 1 2 1 2πτ 2 _ −∞ e −θ 2 /(2τ 2 ) dθ = 1 2 + 1 2 = 1. b. First calculate the posterior density. Because f(¯ x|θ) = n 2πσ e −n(¯ x−θ) 2 /(2σ 2 ) , we can calculate the marginal density as m π (¯ x) = 1 2 f(¯ x|0) + 1 2 _ −∞ f(¯ x|θ) 1 2πτ e −θ 2 /(2τ 2 ) = 1 2 n 2πσ e −n¯ x 2 /(2σ 2 ) + 1 2 1 _ 2 /n)+τ 2 e −¯ x 2 /[2((σ 2 /n)+τ 2 )] (see Exercise 7.22). Then P(θ = 0|¯ x) = 1 2 f(¯ x|0)/m π (¯ x). c. P _ | ¯ X| > ¯ x ¸ ¸ θ = 0 _ = 1 −P _ | ¯ X| ≤ ¯ x ¸ ¸ θ = 0 _ = 1 −P _ −¯ x ≤ ¯ X ≤ ¯ x ¸ ¸ θ = 0 _ = 2 _ 1−Φ _ ¯ x/(σ/ n) , where Φ is the standard normal cdf. d. For σ 2 = τ 2 = 1 and n = 9 we have a p-value of 2 (1 −Φ(3¯ x)) and P (θ = 0| ¯ x) = _ 1 + _ 1 10 e 81¯ x 2 /20 _ −1 . The p-value of ¯ x is usually smaller than the Bayes posterior probability except when ¯ x is very close to the θ value speciﬁed by H 0 . The following table illustrates this. Some p-values and posterior probabilities (n = 9) ¯ x 0 ±.1 ±.15 ±.2 ±.5 ±.6533 ±.7 ±1 ±2 p-value of ¯ x 1 .7642 .6528 .5486 .1336 .05 .0358 .0026 ≈ 0 posterior P(θ = 0|¯ x) .7597 .7523 .7427 .7290 .5347 .3595 .3030 .0522 ≈ 0 Second Edition 8-21 8.54 a. From Exercise 7.22, the posterior distribution of θ|x is normal with mean [τ 2 /(τ 2 2 /n)]¯ x and variance τ 2 /(1 +nτ 2 2 ). So P(θ ≤ 0|x) = P _ Z ≤ 0 −[τ 2 /(τ 2 2 /n)]¯ x _ τ 2 /(1 +nτ 2 2 ) _ = P _ Z ≤ − τ _ 2 /n)(τ 2 2 /n) ¯ x _ = P _ Z ≥ τ _ 2 /n)(τ 2 2 /n) ¯ x _ . b. Using the fact that if θ = 0, ¯ X ∼ n(0, σ 2 /n), the p-value is P( ¯ X ≥ ¯ x) = P _ Z ≥ ¯ x −0 σ/ n _ = P _ Z ≥ 1 σ/ n ¯ x _ c. For σ 2 = τ 2 = 1, P(θ ≤ 0|x) = P _ Z ≥ 1 _ (1/n)(1 + 1/n) ¯ x _ and P( ¯ X ≥ ¯ x) = P _ Z ≥ 1 _ 1/n ¯ x _ . Because 1 _ (1/n)(1 + 1/n) < 1 _ 1/n , the Bayes probability is larger than the p-value if ¯ x ≥ 0. (Note: The inequality is in the opposite direction for ¯ x < 0, but the primary interest would be in large values of ¯ x.) d. As τ 2 →∞, the constant in the Bayes probability, τ _ 2 /n)(τ 2 2 /n) = 1 _ 2 /n)(1+σ 2 /(τ 2 n)) 1 σ/ n , the constant in the p-value. So the indicated equality is true. 8.55 The formulas for the risk functions are obtained from (8.3.14) using the power function β(θ) = Φ(−z α 0 −θ), where Φ is the standard normal cdf. 8.57 For 0–1 loss by (8.3.12) the risk function for any test is the power function β(µ) for µ ≤ 0 and 1 − β(µ) for µ > 0. Let α = P(1 < Z < 2), the size of test δ. By the Karlin-Rubin Theorem, the test δ z α that rejects if X > z α is also size α and is uniformly more powerful than δ, that is, β δ z α (µ) > β δ (µ) for all µ > 0. Hence, R(µ, δ z α ) = 1 −β δ z α (µ) < 1 −β δ (µ) = R(µ, δ), for all µ > 0. Now reverse the roles of H 0 and H 1 and consider testing H 0 : µ > 0 versus H 1 : µ ≤ 0. Consider the test δ that rejects H 0 if X ≤ 1 or X ≥ 2, and the test δ z α that rejects H 0 if X ≤ z α . It is easily veriﬁed that for 0–1 loss δ and δ have the same risk functions, and δ z α and δ z α have the same risk functions. Furthermore, using the Karlin-Rubin Theorem as before, we can conclude that δ z α is uniformly more powerful than δ . Thus we have R(µ, δ) = R(µ, δ ) ≥ R(µ, δ z α ) = R(µ, δ z α ), for all µ ≤ 0, with strict inequality if µ < 0. Thus, δ z α is better than δ. Chapter 9 Interval Estimation 9.1 Denote A = {x : L(x) ≤ θ} and B = {x : U(x) ≥ θ}. Then A ∩ B = {x : L(x) ≤ θ ≤ U(x)} and 1 ≥ P {A∪ B} = P {L(X) ≤ θ or θ ≤ U(X)} ≥ P {L(X) ≤ θ or θ ≤ L(X)} = 1, since L(x) ≤ U(x). Therefore, P(A∩B) = P(A)+P(B)−P(A∪B) = 1−α 1 +1−α 2 −1 = 1−α 1 −α 2 . 9.3 a. The MLE of β is X (n) = max X i . Since β is a scale parameter, X (n) /β is a pivot, and .05 = P β (X (n) /β ≤ c) = P β (all X i ≤ cβ) = _ β _ α 0 n = c α 0 n implies c = .05 1/α 0 n . Thus, .95 = P β (X (n) /β > c) = P β (X (n) /c > β), and {β : β < X (n) /(.05 1/α 0 n )} is a 95% upper conﬁdence limit for β. b. From 7.10, ˆ α = 12.59 and X (n) = 25. So the conﬁdence interval is (0, 25/[.05 1/(12.59·14) ]) = (0, 25.43). 9.4 a. λ(x, y) = sup λ=λ 0 L _ σ 2 X , σ 2 Y ¸ ¸ x, y _ sup λ∈(0,+∞) L( σ 2 X , σ 2 Y | x, y) The unrestricted MLEs of σ 2 X and σ 2 Y are ˆ σ 2 X = ΣX 2 i n and ˆ σ 2 Y = ΣY 2 i m , as usual. Under the restriction, λ = λ 0 , σ 2 Y = λ 0 σ 2 X , and L _ σ 2 X , λ 0 σ 2 X ¸ ¸ x, y _ = _ 2πσ 2 X _ −n/2 _ 2πλ 0 σ 2 X _ −m/2 e −Σx 2 i /(2σ 2 X ) · e −Σy 2 i /(2λ 0 σ 2 X ) = _ 2πσ 2 X _ −(m+n)/2 λ −m/2 0 e −(λ 0 Σx 2 i +Σy 2 i )/(2λ 0 σ 2 X ) Diﬀerentiating the log likelihood gives d log L d (σ 2 X ) 2 = d 2 X _ m+n 2 log σ 2 X m+n 2 log (2π) − m 2 log λ 0 λ 0 Σx 2 i + Σy 2 i 0 σ 2 X _ = − m+n 2 _ σ 2 X _ −1 + λ 0 Σx 2 i + Σy 2 i 0 _ σ 2 X _ −2 set = 0 which implies ˆ σ 2 0 = λ 0 Σx 2 i + Σy 2 i λ 0 (m+n) . To see this is a maximum, check the second derivative: d 2 log L d (σ 2 X ) 2 = m+n 2 _ σ 2 X _ −2 1 λ 0 _ λ 0 Σx 2 i + Σy 2 i _ _ σ 2 X _ −3 ¸ ¸ ¸ ¸ σ 2 X =ˆ σ 2 0 = − m+n 2 (ˆ σ 2 0 ) −2 < 0, 9-2 Solutions Manual for Statistical Inference therefore ˆ σ 2 0 is the MLE. The LRT statistic is _ ˆ σ 2 X _ n/2 _ ˆ σ 2 Y _ m/2 λ m/2 0 (ˆ σ 2 0 ) (m+n)/2 , and the test is: Reject H 0 if λ(x, y) < k, where k is chosen to give the test size α. b. Under H 0 , Y 2 i /(λ 0 σ 2 X ) ∼ χ 2 m and X 2 i 2 X ∼ χ 2 n , independent. Also, we can write λ(X, Y ) = _ _ 1 n m+n + (ΣY 2 i 0 σ 2 X )/m (ΣX 2 i 2 X )/n · m m+n _ _ n/2 _ _ 1 m m+n + (ΣX 2 i 2 X )/n (ΣY 2 i 0 σ 2 X )/m · n m+n _ _ m/2 = _ 1 n n+m + m m+n F _ n/2 _ 1 m m+n + n m+n F −1 _ m/2 where F = ΣY 2 i 0 m ΣX 2 i /n ∼ F m,n under H 0 . The rejection region is _ ¸ _ ¸ _ (x, y): 1 _ n n+m + m m+n F _ n/2 · 1 _ m m+n + n m+n F −1 _ m/2 < c α _ ¸ _ ¸ _ where c α is chosen to satisfy P _ _ n n +m + m m+n F _ −n/2 _ m n +m + n m+n F −1 _ −m/2 < c α _ = α. c. To ease notation, let a = m/(n +m) and b = a y 2 i / x 2 i . From the duality of hypothesis tests and conﬁdence sets, the set c(λ) = _ _ _ λ: _ 1 a +b/λ _ n/2 _ 1 (1 −a)+ a(1−a) b λ _ m/2 ≥ c α _ _ _ is a 1−α conﬁdence set for λ. We now must establish that this set is indeed an interval. To do this, we establish that the function on the left hand side of the inequality has only an interior maximum. That is, it looks like an upside-down bowl. Furthermore, it is straightforward to establish that the function is zero at both λ = 0 and λ = ∞. These facts imply that the set of λ values for which the function is greater than or equal to c α must be an interval. We make some further simpliﬁcations. If we multiply both sides of the inequality by [(1 − a)/b] m/2 , we need be concerned with only the behavior of the function h(λ) = _ 1 a +b/λ _ n/2 _ 1 b +aλ _ m/2 . Moreover, since we are most interested in the sign of the derivative of h, this is the same as the sign of the derivative of log h, which is much easier to work with. We have d log h(λ) = d _ n 2 log(a +b/λ) − m 2 log(b +aλ) _ = n 2 b/λ 2 a +b/λ m 2 a b +aλ = 1 2 (a +b/λ)(b +aλ) _ −a 2 2 +ab(n −m)λ+nb 2 ¸ . Second Edition 9-3 The sign of the derivative is given by the expression in square brackets, a parabola. It is easy to see that for λ ≥ 0, the parabola changes sign from positive to negative. Since this is the sign change of the derivative, the function must increase then decrease. Hence, the function is an upside-down bowl, and the set is an interval. 9.5 a. Analogous to Example 9.2.5, the test here will reject H 0 if T < k(p 0 ). Thus the conﬁdence set is C = {p: T ≥ k(p)}. Since k(p) is nondecreasing, this gives an upper bound on p. b. k(p) is the integer that simultaneously satisﬁes n y=k(p) _ n y _ p y (1 −p) n−y ≥ 1 −α and n y=k(p)+1 _ n y _ p y (1 −p) n−y < 1 −α. 9.6 a. For Y = X i ∼ binomial(n, p), the LRT statistic is λ(y) = _ n y _ p y 0 (1 −p 0 ) n−y _ n y _ ˆ p y (1 − ˆ p) n−y = _ p 0 (1 − ˆ p) ˆ p(1 −p 0 ) _ y _ 1−p 0 1−ˆ p _ n where ˆ p = y/n is the MLE of p. The acceptance region is A(p 0 ) = _ y : _ p 0 ˆ p _ y _ 1−p 0 1−ˆ p _ n−y ≥ k _ where k is chosen to satisfy P p 0 (Y ∈ A(p 0 )) = 1 −α. Inverting the acceptance region to a conﬁdence set, we have C(y) = _ p: _ p ˆ p _ y _ (1 −p) 1−ˆ p _ n−y ≥ k _ . b. For given n and observed y, write C(y) = _ p: (n/y) y (n/(n −y)) n−y p y (1 −p) n−y ≥ k _ . This is clearly a highest density region. The endpoints of C(y) are roots of the n th degree polynomial (in p), (n/y) y (n/(n −y)) n−y p y (1 −p) n−y −k . The interval of (10.4.4) is _ p: ¸ ¸ ¸ ¸ ¸ ˆ p −p _ p(1 −p)/n ¸ ¸ ¸ ¸ ¸ ≤ z α/2 _ . The endpoints of this interval are the roots of the second degree polynomial (in p), (ˆ p−p) 2 z 2 α/2 p(1 − p)/n. Typically, the second degree and n th degree polynomials will not have the same roots. Therefore, the two intervals are diﬀerent. (Note that when n →∞ and y →∞, the density becomes symmetric (CLT). Then the two intervals are the same.) 9.7 These densities have already appeared in Exercise 8.8, where LRT statistics were calculated for testing H 0 : a = 1. a. Using the result of Exercise 8.8(a), the restricted MLE of θ (when a = a 0 ) is ˆ θ 0 = −a 0 + _ a 2 0 + 4 x 2 i /n 2 , and the unrestricted MLEs are ˆ θ = ¯ x and ˆ a = (x i − ¯ x) 2 n¯ x . 9-4 Solutions Manual for Statistical Inference The LRT statistic is λ(x) = _ ˆ a ˆ θ a 0 ˆ θ 0 _ n/2 e 1 2a 0 ˆ θ 0 Σ(x i ˆ θ 0 ) 2 e 1 2ˆ a ˆ θ Σ(x i ˆ θ) 2 = _ 1 2πa 0 ˆ θ 0 _ n/2 e n/2 e 1 2a 0 ˆ θ 0 Σ(x i ˆ θ 0 ) 2 The rejection region of a size α test is {x: λ(x) ≤ c α }, and a 1 − α conﬁdence set is {a 0 : λ(x) ≥ c α }. b. Using the results of Exercise 8.8b, the restricted MLE (for a = a 0 ) is found by solving −a 0 θ 2 + [ˆ σ 2 + (¯ x −θ) 2 ] +θ(¯ x −θ) = 0, yielding the MLE ˆ θ R = ¯ x + _ ¯ x + 4a 0 (ˆ σ 2 + ¯ x 2 )/2a 0 . The unrestricted MLEs are ˆ θ = ¯ x and ˆ a = 1 n¯ x 2 n i=1 (x i − ¯ x) 2 = ˆ σ 2 ¯ x 2 , yielding the LRT statistic λ(x) = _ ˆ σ/ ˆ θ R _ n e (n/2)−Σ(x i ˆ θ R ) 2 /(2 ˆ θ R ) . The rejection region of a size α test is {x: λ(x) ≤ c α }, and a 1 − α conﬁdence set is {a 0 : λ(x) ≥ c α }. 9.9 Let Z 1 , . . . , Z n be iid with pdf f(z). a. For X i ∼ f(x −µ), (X 1 , . . . , X n ) ∼ (Z 1 +µ, . . . , Z n +µ), and ¯ X −µ ∼ Z +µ −µ = ¯ Z. The distribution of ¯ Z does not depend on µ. b. For X i ∼ f(x/σ)/σ, (X 1 , . . . , X n ) ∼ (σZ 1 , . . . , σZ n ), and ¯ X/σ ∼ σZ/σ = ¯ Z. The distribu- tion of ¯ Z does not depend on σ. c. For X i ∼ f((x − µ)/σ)/σ, (X 1 , . . . , X n ) ∼ (σZ 1 + µ, . . . , σZ n + µ), and ( ¯ X − µ)/S X (σZ +µ − µ)/S σZ+µ = σ ¯ Z/(σS Z ) = ¯ Z/S Z . The distribution of ¯ Z/S Z does not depend on µ or σ. 9.11 Recall that if θ is the true parameter, then F T (T|θ) ∼ uniform(0, 1). Thus, P θ 0 ({T : α 1 ≤ F T (T|θ 0 ) ≤ 1 −α 2 }) = P(α 1 ≤ U ≤ 1 −α 2 ) = 1 −α 2 −α 1 , where U ∼ uniform(0, 1). Since t ∈ {t : α 1 ≤ F T (t|θ) ≤ 1 −α 2 } ⇔ θ ∈ {θ : α 1 ≤ F T (t|θ) ≤ 1 −α 2 } the same calculation shows that the interval has conﬁdence 1 −α 2 −α 1 . 9.12 If X 1 , . . . , X n ∼ iid n(θ, θ), then n( ¯ X − θ)/ θ ∼ n(0, 1) and a 1 − α conﬁdence interval is {θ: | n(¯ x −θ)/ θ| ≤ z α/2 }. Solving for θ, we get _ θ: nθ 2 −θ _ 2n¯ x +z 2 α/2 _ +n¯ x 2 ≤ 0 _ = _ θ: θ ∈ _ 2n¯ x +z 2 α/2 ± _ 4n¯ xz 2 α/2 +z 4 α/2 _ /2n _ . Simpler answers can be obtained using the t pivot, ( ¯ X−θ)/(S/ n), or the χ 2 pivot, (n−1)S 2 2 . (Tom Werhley of Texas A&M university notes the following: The largest probability of getting a negative discriminant (hence empty conﬁdence interval) occurs when nθ = 1 2 z α/2 , and the probability is equal to α/2. The behavior of the intervals for negative values of ¯ x is also interesting. When ¯ x = 0 the lefthand endpoint is also equal to 0, but when ¯ x < 0, the lefthand endpoint is positive. Thus, the interval based on ¯ x = 0 contains smaller values of θ than that based on ¯ x < 0. The intervals get smaller as ¯ x decreases, ﬁnally becoming empty.) Second Edition 9-5 9.13 a. For Y = −(log X) −1 , the pdf of Y is f Y (y) = θ y 2 e −θ/y , 0 < y < ∞, and P(Y/2 ≤ θ ≤ Y ) = _ θ θ y 2 e −θ/y dy = e −θ/y ¸ ¸ ¸ θ = e −1/2 −e −1 = .239. b. Since f X (x) = θx θ−1 , 0 < x < 1, T = X θ is a good guess at a pivot, and it is since f T (t) = 1, 0 < t < 1. Thus a pivotal interval is formed from P(a < X θ < b) = b −a and is _ θ: log b log x ≤ θ ≤ log a log x _ . Since X θ ∼ uniform(0, 1), the interval will have conﬁdence .239 as long as b −a = .239. c. The interval in part a) is a special case of the one in part b). To ﬁnd the best interval, we minimize log b −log a subject to b −a = 1 −α, or b = 1 −α +a. Thus we want to minimize log(1 − α + a) − log a = log _ 1+ 1−α a _ , which is minimized by taking a as big as possible. Thus, take b = 1 and a = α, and the best 1 −α pivotal interval is _ θ: 0 ≤ θ ≤ log α log x _ . Thus the interval in part a) is nonoptimal. A shorter interval with conﬁdence coeﬃcient .239 is {θ: 0 ≤ θ ≤ log(1 −.239)/log(x)}. 9.14 a. Recall the Bonferroni Inequality (1.2.9), P(A 1 ∩ A 2 ) ≥ P (A 1 ) + P (A 2 ) − 1. Let A 1 = P(interval covers µ) and A 2 = P(interval covers σ 2 ). Use the interval (9.2.14), with t n−1,α/4 to get a 1 −α/2 conﬁdence interval for µ. Use the interval after (9.2.14) with b = χ 2 n−1,α/4 and a = χ 2 n−1,1−α/4 to get a 1−α/2 conﬁdence interval for σ. Then the natural simultaneous set is C a (x) = _ (µ, σ 2 ): ¯ x −t n−1,α/4 s n ≤ µ ≤ ¯ x +t n−1,α/4 s n and (n −1)s 2 χ 2 n−1,α/4 ≤ σ 2 (n −1)s 2 χ 2 n−1,1−α/4 _ and P _ C a (X) covers (µ, σ 2 ) _ = P(A 1 ∩A 2 ) ≥ P (A 1 ) +P (A 2 ) −1 = 2(1−α/2) −1 = 1−α. b. If we replace the µ interval in a) by _ µ: ¯ x− n ≤ µ ≤ ¯ x + n _ then ¯ X−µ σ/ n ∼ n(0, 1), so we use z α/4 and C b (x) = _ (µ, σ 2 ): ¯ x −z α/4 σ n ≤ µ ≤ ¯ x +z α/4 σ n and (n −1)s 2 χ 2 n−1,α/4 ≤ σ 2 (n −1)s 2 χ 2 n−1,1−α/4 _ and P _ C b (X) covers (µ, σ 2 ) _ ≥ 2(1 −α/2) −1 = 1 −α. c. The sets can be compared graphically in the (µ, σ) plane: C a is a rectangle, since µ and σ 2 are treated independently, while C b is a trapezoid, with larger σ 2 giving a longer interval. Their areas can also be calculated Area of C a = _ 2t n−1,α/4 s n _ _ _ (n −1)s 2 _ 1 χ 2 n−1,1−α/4 1 χ 2 n−1,α/4 __ Area of C b = _ z α/4 s n n −1 χ 2 n−1,1−α/4 + ¸ n −1 χ 2 n−1,α/4 __ × _ _ (n −1)s 2 _ 1 χ 2 n−1,1−α/4 1 χ 2 n−1,α/4 __ and compared numerically. 9-6 Solutions Manual for Statistical Inference 9.15 Fieller’s Theorem says that a 1 −α conﬁdence set for θ = µ Y X is _ θ: _ ¯ x 2 t 2 n−1,α/2 n −1 s 2 X _ θ 2 −2 _ ¯ x¯ y − t 2 n−1,α/2 n −1 s Y X _ θ + _ ¯ y 2 t 2 n−1,α/2 n −1 s 2 Y _ ≤ 0 _ . a. Deﬁne a = ¯ x 2 − ts 2 X , b = ¯ x¯ y − ts Y X , c = ¯ y 2 − ts 2 Y , where t = t 2 n−1,α/2 n−1 . Then the parabola opens upward if a > 0. Furthermore, if a > 0, then there always exists at least one real root. This follows from the fact that at θ = ¯ y/¯ x, the value of the function is negative. For ¯ θ = ¯ y/¯ x we have _ ¯ x 2 −ts 2 X _ _ ¯ y ¯ x _ 2 −2 (¯ x¯ y −ts XY ) _ ¯ y ¯ x _ + _ ¯ y 2 −as 2 Y _ = −t _ ¯ y 2 ¯ x 2 s 2 X −2 ¯ y ¯ x s XY +s 2 Y _ = −t _ n i=1 _ ¯ y 2 ¯ x 2 (x i − ¯ x) 2 −2 ¯ y ¯ x (x i − ¯ x)(y i − ¯ y) + (y i − ¯ y) 2 _ _ = −t _ n i=1 _ ¯ y ¯ x (x i − ¯ x) −(y i − ¯ y) _ 2 _ which is negative. b. The parabola opens downward if a < 0, that is, if ¯ x 2 < ts 2 X . This will happen if the test of H 0 : µ X = 0 accepts H 0 at level α. c. The parabola has no real roots if b 2 < ac. This can only occur if a < 0. 9.16 a. The LRT (see Example 8.2.1) has rejection region {x : |¯ x − θ 0 | > z α/2 σ/ n}, acceptance region A(θ 0 ) = {x: −z α/2 σ/ n ≤ ¯ x −θ 0 ≤ z α/2 σ/ n}, and 1−α conﬁdence interval C(θ) = {θ: ¯ x −z α/2 σ/ n ≤ θ ≤ ¯ x +z α/2 σ/ n}. b. We have a UMP test with rejection region {x : ¯ x − θ 0 < −z α σ/ n}, acceptance region A(θ 0 ) = {x: ¯ x−θ 0 ≥ −z α σ/ n}, and 1−α conﬁdence interval C(θ) = {θ: ¯ x+z α σ/ n ≥ θ}. c. Similar to b), the UMP test has rejection region {x: ¯ x −θ 0 > z α σ/ n}, acceptance region A(θ 0 ) = {x: ¯ x −θ 0 ≤ z α σ/ n}, and 1−α conﬁdence interval C(θ) = {θ: ¯ x−z α σ/ n ≤ θ}. 9.17 a. Since X − θ ∼ uniform(−1/2, 1/2), P(a ≤ X − θ ≤ b) = b − a. Any a and b satisfying b = a + 1 −α will do. One choice is a = − 1 2 + α 2 , b = 1 2 α 2 . b. Since T = X/θ has pdf f(t) = 2t, 0 ≤ t ≤ 1, P(a ≤ X/θ ≤ b) = _ b a 2t dt = b 2 −a 2 . Any a and b satisfying b 2 = a 2 + 1 −α will do. One choice is a = _ α/2, b = _ 1 −α/2. 9.18 a. P p (X = 1) = _ 3 1 _ p 1 (1 −p) 3−1 = 3p(1 −p) 2 , maximum at p = 1/3. P p (X = 2) = _ 3 2 _ p 2 (1 −p) 3−2 = 3p 2 (1 −p), maximum at p = 2/3. b. P(X = 0) = _ 3 0 _ p 0 (1 −p) 3−0 = (1 −p) 3 , and this is greater than P(X = 2) if (1 −p) 2 > 3p 2 , or 2p 2 + 2p −1 < 0. At p = 1/3, 2p 2 + 2p −1 = −1/9. c. To show that this is a 1 − α = .442 interval, compare with the interval in Example 9.2.11. There are only two discrepancies. For example, P(p ∈ interval | .362 < p < .634) = P(X = 1 or X = 2) > .442 by comparison with Sterne’s procedure, which is given by Second Edition 9-7 x interval 0 [.000,.305) 1 [.305,.634) 2 [.362,.762) 3 [.695,1]. 9.19 For F T (t|θ) increasing in θ, there are unique values θ U (t) and θ L (t) such that F T (t|θ) < 1 − α 2 if and only if θ < θ U (t) and F T (t|θ) > α 2 if and only if θ > θ L (t). Hence, P (θ L (T) ≤ θ ≤ θ U (T)) = P (θ ≤ θ U (T)) −P (θ ≤ θ L (T)) = P _ F T (T) ≤ 1 − α 2 _ −P _ F T (T) ≤ α 2 _ = 1 −α. 9.21 To construct a 1 −α conﬁdence interval for p of the form {p: ≤ p ≤ u} with P( ≤ p ≤ u) = 1 −α, we use the method of Theorem 9.2.12. We must solve for and u in the equations (1) α 2 = x k=0 _ n k _ u k (1 −u) n−k and (2) α 2 = n k=x _ n k _ k (1 −) n−k . In equation (1) α/2 = P(K ≤ x) = P(Y ≤ 1 − u), where Y ∼ beta(n − x, x + 1) and K ∼ binomial(n, u). This is Exercise 2.40. Let Z ∼ F 2(n−x),2(x+1) and c = (n −x)/(x + 1). By Theorem 5.3.8c, cZ/(1 +cZ) ∼ beta(n −x, x + 1) ∼ Y . So we want α/2 = P _ cZ (1 +cZ) ≤ 1 −u _ = P _ 1 Z cu 1 −u _ . From Theorem 5.3.8a, 1/Z ∼ F 2(x+1),2(n−x) . So we need cu/(1−u) = F 2(x+1),2(n−x),α/2 . Solving for u yields u = x+1 n−x F 2(x+1),2(n−x),α/2 1 + x+1 n−x F 2(x+1),2(n−x),α/2 . A similar manipulation on equation (2) yields the value for . 9.23 a. The LRT statistic for H 0 : λ = λ 0 versus H 1 : λ = λ 0 is g(y) = e −nλ 0 (nλ 0 ) y /e −n ˆ λ (n ˆ λ) y , where Y = X i ∼ Poisson(nλ) and ˆ λ = y/n. The acceptance region for this test is A(λ 0 ) = {y : g(y) > c(λ 0 )) where c(λ 0 ) is chosen so that P(Y ∈ A(λ 0 )) ≥ 1 − α. g(y) is a unimodal function of y so A(λ 0 ) is an interval of y values. Consider constructing A(λ 0 ) for each λ 0 > 0. Then, for a ﬁxed y, there will be a smallest λ 0 , call it a(y), and a largest λ 0 , call it b(y), such that y ∈ A(λ 0 ). The conﬁdence interval for λ is then C(y) = (a(y), b(y)). The values a(y) and b(y) are not expressible in closed form. They can be determined by a numerical search, constructing A(λ 0 ) for diﬀerent values of λ 0 and determining those values for which y ∈ A(λ 0 ). (Jay Beder of the University of Wisconsin, Milwaukee, reminds us that since c is a function of λ, the resulting conﬁdence set need not be a highest density region of a likelihood function. This is an example of the eﬀect of the imposition of one type of inference (frequentist) on another theory (likelihood).) b. The procedure in part a) was carried out for y = 558 and the conﬁdence interval was found to be (57.78, 66.45). For the conﬁdence interval in Example 9.2.15, we need the values χ 2 1116,.95 = 1039.444 and χ 2 1118,.05 = 1196.899. This conﬁdence interval is (1039.444/18, 1196.899/18) = (57.75, 66.49). The two conﬁdence intervals are virtually the same. 9-8 Solutions Manual for Statistical Inference 9.25 The conﬁdence interval derived by the method of Section 9.2.3 is C(y) = _ µ: y + 1 n log _ α 2 _ ≤ µ ≤ y + 1 n log _ 1 − α 2 _ _ where y = min i x i . The LRT method derives its interval from the test of H 0 : µ = µ 0 versus H 1 : µ = µ 0 . Since Y is suﬃcient for µ, we can use f Y (y | µ). We have λ(y) = sup µ=µ 0 L(µ|y) sup µ∈(−∞,∞) L(µ|y) = ne −n (y −µ 0 )I 0 ,∞)(y) ne −(y−y) I [µ,∞)(y) = e −n(y−µ 0 ) I 0 ,∞) (y) = _ 0 if y < µ 0 e −n(y−µ 0 ) if y ≥ µ 0 . We reject H 0 if λ(y) = e −n(y−µ 0 ) < c α , where 0 ≤ c α ≤ 1 is chosen to give the test level α. To determine c α , set α = P {reject H 0 | µ = µ 0 } = P _ Y > µ 0 log c α n or Y < µ 0 ¸ ¸ ¸ ¸ µ = µ 0 _ = P _ Y > µ 0 log c α n ¸ ¸ ¸ ¸ µ = µ 0 _ = _ µ 0 log c α n ne −n(y−µ 0 ) dy = −e −n(y−µ 0 ) ¸ ¸ ¸ µ 0 log c α n = e log c α = c α . Therefore, c α = α and the 1 −α conﬁdence interval is C(y) = _ µ: µ ≤ y ≤ µ − log α n _ = _ µ: y + 1 n log α ≤ µ ≤ y _ . To use the pivotal method, note that since µ is a location parameter, a natural pivotal quantity is Z = Y −µ. Then, f Z (z) = ne −nz I (0,∞) (z). Let P{a ≤ Z ≤ b} = 1 −α, where a and b satisfy α 2 = _ a 0 ne −nz dz = −e −nz ¸ ¸ a 0 = 1 −e −na ⇒ e −na = 1 − α 2 ⇒ a = −log _ 1 − α 2 _ n α 2 = _ b ne −nz dz = −e −nz ¸ ¸ b = e −nb ⇒ −nb = log α 2 ⇒ b = − 1 n log _ α 2 _ Thus, the pivotal interval is Y +log(α/2)/n ≤ µ ≤ Y +log(1 −α/2), the same interval as from Example 9.2.13. To compare the intervals we compare their lengths. We have Length of LRT interval = y −(y + 1 n log α) = − 1 n log α Length of Pivotal interval = _ y + 1 n log(1 −α/2) _ −(y + 1 n log α/2) = 1 n log 1 −α/2 α/2 Thus, the LRT interval is shorter if −log α < log[(1 −α/2)/(α/2)], but this is always satisﬁed. 9.27 a. Y = X i ∼ gamma(n, λ), and the posterior distribution of λ is π(λ|y) = (y + 1 b ) n+a Γ(n +a) 1 λ n+a+1 e 1 λ (y+ 1 b ) , Second Edition 9-9 an IG _ n +a, (y + 1 b ) −1 _ . The Bayes HPD region is of the form {λ : π(λ|y) ≥ k}, which is an interval since π(λ|y) is unimodal. It thus has the form {λ: a 1 (y) ≤ λ ≤ a 2 (y)}, where a 1 and a 2 satisfy 1 a 1 n+a+1 e 1 a 1 (y+ 1 b ) = 1 a 2 n+a+1 e 1 a 2 (y+ 1 b ) . b. The posterior distribution is IG(((n−1)/2) +a, (((n−1)s 2 /2) +1/b) −1 ). So the Bayes HPD region is as in part a) with these parameters replacing n +a and y + 1/b. c. As a →0 and b →∞, the condition on a 1 and a 2 becomes 1 a 1 ((n−1)/2)+1 e 1 a 1 (n−1)s 2 2 = 1 a 2 ((n−1)/2)+1 e 1 a 2 (n−1)s 2 2 . 9.29 a. We know from Example 7.2.14 that if π(p) ∼ beta(a, b), the posterior is π(p|y) ∼ beta(y + a, n −y +b) for y = x i . So a 1 −α credible set for p is: {p: β y+a,n−y+b,1−α/2 ≤ p ≤ β y+a,n−y+b,α/2 }. b. Converting to an F distribution, β c,d = (c/d)F 2c,2d 1+(c/d)F 2c,2d , the interval is y+a n−y+b F 2(y+a),2(n−y+b),1−α/2 1 + y+a n−y+b F 2(y+a),2(n−y+b),1−α/2 ≤ p ≤ y+a n−y+b F 2(y+a),2(n−y+b),α/2 1 + y+a n−y+b F 2(y+a),2(n−y+b),α/2 or, using the fact that F m,n = F −1 n,m , 1 1 + n−y+b y+a F 2(n−y+b),2(y+a),α/2 ≤ p ≤ y+a n−y+b F 2(y+a),2(n+b),α/2 1 + y+a n−y+b F 2(y+a),2(n−y+b),α/2 . For this to match the interval of Exercise 9.21, we need x = y and Lower limit: n −y +b = n −x + 1 ⇒ b = 1 y +a = x ⇒ a = 0 Upper limit: y +a = x + 1 ⇒ a = 1 n −y +b = n −x ⇒ b = 0. So no values of a and b will make the intervals match. 9.31 a. We continually use the fact that given Y = y, χ 2 2y is a central χ 2 random variable with 2y degrees of freedom. Hence 2 2Y = E[E(χ 2 2Y |Y )] = E2Y = 2λ Varχ 2 2Y = E[Var(χ 2 2Y |Y )] + Var[E(χ 2 2Y |Y )] = E[4Y ] + Var[2Y ] = 4λ + 4λ = 8λ mgf = Ee 2 2Y = E[E(e 2 2Y |Y )] = E _ 1 1 −2t _ Y = y=0 e −λ _ λ 1−2t _ y y! = e −λ+ λ 1−2t . From Theorem 2.3.15, the mgf of (χ 2 2Y −2λ)/ 8λ is e −t λ/2 _ e −λ+ λ 1−t/ _ . 9-10 Solutions Manual for Statistical Inference The log of this is _ λ/2t −λ + λ 1 −t/ = t 2 λ −t 2 + 2 λ = t 2 −(t 2/ λ) + 2 →t 2 /2 as λ →∞, so the mgf converges to e t 2 /2 , the mgf of a standard normal. b. Since P(χ 2 2Y ≤ χ 2 2Y,α ) = α for all λ, χ 2 2Y,α −2λ →z α as λ →∞. In standardizing (9.2.22), the upper bound is nb nb+1 χ 2 2(Y +a),α/2 −2λ = _ 8(λ +a) _ nb nb+1 2 2(Y +a),α/2 −2(λ +a)] _ 8(λ +a) + nb nb+1 2(λ +a) −2λ _ 8(λ +a) _ . While the ﬁrst quantity in square brackets →z α/2 , the second one has limit lim λ→∞ −2 1 nb+1 λ +a nb nb+1 _ 8(λ +a) →−∞, so the coverage probability goes to zero. 9.33 a. Since 0 ∈ C a (x) for every x, P( 0 ∈ C a (X)| µ = 0) = 1. If µ > 0, P(µ ∈ C a (X)) = P(µ ≤ max{0, X +a}) = P(µ ≤ X +a) (since µ > 0) = P(Z ≥ −a) (Z ∼ n(0, 1)) = .95 (a = 1.645.) A similar calculation holds for µ < 0. b. The credible probability is _ max(0,x+a) min(0,x−a) 1 e 1 2 (µ−x) 2 dµ = _ max(−x,a) min(−x,−a) 1 e 1 2 t 2 dt = P (min(−x, −a) ≤ Z ≤ max(−x, a)) . To evaluate this probability we have two cases: (i) |x| ≤ a ⇒ credible probability = P(|Z| ≤ a) (ii) |x| > a ⇒ credible probability = P(−a ≤ Z ≤ |x|) Thus we see that for a = 1.645, the credible probability is equal to .90 if |x| ≤ 1.645 and increases to .95 as |x| →∞. 9.34 a. A 1 − α conﬁdence interval for µ is {µ : ¯ x − 1.96σ/ n ≤ µ ≤ ¯ x + 1.96σ/ n}. We need 2(1.96)σ/ n ≤ σ/4 or n ≥ 4(2)(1.96). Thus we need n ≥ 64(1.96) 2 ≈ 245.9. So n = 246 suﬃces. b. The length of a 95% conﬁdence interval is 2t n−1,.025 S/ n. Thus we need P _ 2t n−1,.025 S n σ 4 _ ≥ .9 ⇒ P _ 4t 2 n−1,.025 S 2 n σ 2 16 _ ≥ .9 ⇒ P _ _ _ _ _ (n −1)S 2 σ 2 . ¸¸ . ∼χ 2 n−1 (n −1)n t 2 n−1,.025 · 64 _ _ _ _ _ ≥ .9. Second Edition 9-11 We need to solve this numerically for the smallest n that satisﬁes the inequality (n −1)n t 2 n−1,.025 · 64 ≥ χ 2 n−1,.1 . Trying diﬀerent values of n we ﬁnd that the smallest such n is n = 276 for which (n −1)n t 2 n−1,.025 · 64 = 306.0 ≥ 305.5 = χ 2 n−1,.1 . As to be expected, this is somewhat larger than the value found in a). 9.35 length = 2z α/2 σ/ n, and if it is unknown, E(length) = 2t α/2,n−1 cσ/ n, where c = n −1Γ( n−1 2 ) 2Γ(n/2) and EcS = σ (Exercise 7.50). Thus the diﬀerence in lengths is (2σ/ n)(z α/2 −ct α/2 ). A little work will show that, as n →∞, c →constant. (This can be done using Stirling’s formula along with Lemma 2.3.14. In fact, some careful algebra will show that c →1 as n →∞.) Also, we know that, as n → ∞, t α/2,n−1 → z α/2 . Thus, the diﬀerence in lengths (2σ/ n)(z α/2 − ct α/2 ) → 0 as n →∞. 9.36 The sample pdf is f(x 1 , . . . , x n |θ) = n i=1 e iθ−x i I (iθ,∞) (x i ) = e Σ(iθ−x i ) I (θ,∞) [min(x i /i)]. Thus T = min(X i /i) is suﬃcient by the Factorization Theorem, and P(T > t) = n i=1 P(X i > it) = n i=1 _ it e iθ−x dx = n i=1 e i(θ−t) = e n(n+1) 2 (t−θ) , and f T (t) = n(n + 1) 2 e n(n+1) 2 (t−θ) , t ≥ θ. Clearly, θ is a location parameter and Y = T − θ is a pivot. To ﬁnd the shortest conﬁdence interval of the form [T + a, T + b], we must minimize b − a subject to the constraint P(−b ≤ Y ≤ −a) = 1 −α. Now the pdf of Y is strictly decreasing, so the interval length is shortest if −b = 0 and a satisﬁes P(0 ≤ Y ≤ −a) = e n(n+1) 2 a = 1 −α. So a = 2 log(1 −α)/(n(n + 1)). 9.37 a. The density of Y = X (n) is f Y (y) = ny n−1 n , 0 < y < θ. So θ is a scale parameter, and T = Y/θ is a pivotal quantity. The pdf of T is f T (t) = nt n−1 , 0 ≤ t ≤ 1. b. A pivotal interval is formed from the set {θ: a ≤ t ≤ b} = _ θ: a ≤ y θ ≤ b _ = _ θ: y b ≤ θ ≤ y a _ , and has length Y (1/a − 1/b) = Y (b − a)/ab. Since f T (t) is increasing, b − a is minimized and ab is maximized if b = 1. Thus shortest interval will have b = 1 and a satisfying α = _ a 0 nt n−1 dt = a n ⇒ a = α 1/n . So the shortest 1 −α conﬁdence interval is {θ: y ≤ θ ≤ y/α 1/n }. 9-12 Solutions Manual for Statistical Inference 9.39 Let a be such that _ a −∞ f(x) dx = α/2. This value is unique for a unimodal pdf if α > 0. Let µ be the point of symmetry and let b = 2µ −a. Then f(b) = f(a) and _ b f(x) dx = α/2. a ≤ µ since _ a −∞ f(x) dx = α/2 ≤ 1/2 = _ µ −∞ f(x) dx. Similarly, b ≥ µ. And, f(b) = f(a) > 0 since f(a) ≥ f(x) for all x ≤ a and _ a −∞ f(x) dx = α/2 > 0 ⇒ f(x) > 0 for some x < a ⇒ f(a) > 0. So the conditions of Theorem 9.3.2 are satisﬁed. 9.41 a. We show that for any interval [a, b] and > 0, the probability content of [a − , b − ] is greater (as long as b − > a). Write _ a b f(x) dx − _ b− a− f(x) dx = _ b b− f(x) dx − _ a a− f(x) dx ≤ f(b −)[b −(b −)] −f(a)[a −(a −)] ≤ [f(b −) −f(a)] ≤ 0, where all of the inequalities follow because f(x) is decreasing. So moving the interval toward zero increases the probability, and it is therefore maximized by moving a all the way to zero. b. T = Y −µ is a pivot with decreasing pdf f T (t) = ne −nt I [0,∞] (t). The shortest 1 −α interval on T is [0, − 1 n log α], since _ b 0 ne −nt dt = 1 −α ⇒ b = − 1 n log α. Since a ≤ T ≤ b implies Y −b ≤ µ ≤ Y −a, the best 1−α interval on µ is Y + 1 n log α ≤ µ ≤ Y . 9.43 a. Using Theorem 8.3.12, identify g(t) with f(x|θ 1 ) and f(t) with f(x|θ 0 ). Deﬁne φ(t) = 1 if t ∈ C and 0 otherwise, and let φ be the indicator of any other set C satisfying _ C f(t) dt ≥ 1 −α. Then (φ(t) −φ (t))(g(t) −λf(t)) ≤ 0 and 0 ≥ _ (φ −φ )(g −λf) = _ C g − _ C g −λ __ C f − _ C f _ _ C g − _ C g, showing that C is the best set. b. For Exercise 9.37, the pivot T = Y/θ has density nt n−1 , and the pivotal interval a ≤ T ≤ b results in the θ interval Y/b ≤ θ ≤ Y/a. The length is proportional to 1/a − 1/b, and thus g(t) = 1/t 2 . The best set is {t : 1/t 2 ≤ λnt n−1 }, which is a set of the form {t : a ≤ t ≤ 1}. This has probability content 1 −α if a = α 1/n . For Exercise 9.24 (or Example 9.3.4), the g function is the same and the density of the pivot is f k , the density of a gamma(k, 1). The set {t : 1/t 2 ≤ λf k (t)} = {t : f k+2 (t) ≥ λ }, so the best a and b satisfy _ b a f k (t) dt = 1 − α and f k+2 (a) = f k+2 (b). 9.45 a. Since Y = X i ∼ gamma(n, λ) has MLR, the Karlin-Rubin Theorem (Theorem 8.3.2) shows that the UMP test is to reject H 0 if Y < k(λ 0 ), where P (Y < k(λ 0 )|λ = λ 0 ) = α. b. T = 2Y/λ ∼ χ 2 2n so choose k(λ 0 ) = 1 2 λ 0 χ 2 2n,α and {λ: Y ≥ k(λ)} = _ λ: Y ≥ 1 2 λχ 2 2n,α _ = _ λ: 0 < λ ≤ 2Y/χ 2 2n,α _ is the UMA conﬁdence set. c. The expected length is E 2Y χ 2 2n,α = 2nλ χ 2 2n,α . d. X (1) ∼ exponential(λ/n), so EX (1) = λ/n. Thus E(length(C )) = 2 ×120 251.046 λ = .956λ E(length(C m )) = −λ 120 ×log(.99) = .829λ. Second Edition 9-13 9.46 The proof is similar to that of Theorem 9.3.5: P θ ∈ C (X)) = P θ (X ∈ A )) ≤ P θ (X ∈ A(θ )) = P θ ∈ C(X)) , where A and C are any competitors. The inequality follows directly from Deﬁnition 8.3.11. 9.47 Referring to (9.3.2), we want to show that for the upper conﬁdence bound, P θ ∈ C) ≤ 1 −α if θ ≥ θ. We have P θ ∈ C) = P θ ¯ X +z α σ/ n). Subtract θ from both sides and rearrange to get P θ ∈ C) = P θ _ θ −θ σ/ n ¯ X −θ σ/ n +z α _ = P _ Z ≥ θ −θ σ/ n −z α _ , which is less than 1 − α as long as θ ≥ θ. The solution for the lower conﬁdence interval is similar. 9.48 a. Start with the hypothesis test H 0 : θ ≥ θ 0 versus H 1 : θ < θ 0 . Arguing as in Example 8.2.4 and Exercise 8.47, we ﬁnd that the LRT rejects H 0 if ( ¯ X − θ 0 )/(S/ n) < −t n−1,α . So the acceptance region is {x: (¯ x −θ 0 )/(s/ n) ≥ −t n−1,α } and the corresponding conﬁdence set is {θ: ¯ x +t n−1,α s/ n ≥ θ}. b. The test in part a) is the UMP unbiased test so the interval is the UMA unbiased interval. 9.49 a. Clearly, for each σ, the conditional probability P θ 0 ( ¯ X > θ 0 + z α σ/ n | σ) = α, hence the test has unconditional size α. The conﬁdence set is {(θ,σ): θ ≥ ¯ x −z α σ/ n}, which has conﬁdence coeﬃcient 1 −α conditionally and, hence, unconditionally. b. From the Karlin-Rubin Theorem, the UMP test is to reject H 0 if X > c. To make this size α, P θ 0 (X > c) = P θ 0 (X > c| σ = 10) P(σ = 10) +P (X > c| σ = 1) P(σ = 1) = pP _ X −θ 0 10 > c −θ 0 10 _ + (1 −p)P(X −θ 0 > c −θ 0 ) = pP _ Z > c −θ 0 10 _ + (1 −p)P(Z > c −θ 0 ), where Z ∼ n(0, 1). Without loss of generality take θ 0 = 0. For c = z (α−p)/(1−p) we have for the proposed test P θ 0 (reject) = p + (1 −p)P _ Z > z (α−p)/(1−p) _ = p + (1 −p) (α−p) (1 −p) = p +α −p = α. This is not UMP, but more powerful than part a. To get UMP, solve for c in pP(Z > c/10) + (1 −p)P(Z > c) = α, and the UMP test is to reject if X > c. For p = 1/2, α = .05, we get c = 12.81. If α = .1 and p = .05, c = 1.392 and z.1−.05 .95 = .0526 = 1.62. 9.51 P θ (θ ∈ C(X 1 , . . . , X n )) = P θ _ ¯ X −k 1 ≤ θ ≤ ¯ X +k 2 _ = P θ _ −k 2 ¯ X −θ ≤ k 1 _ = P θ _ −k 2 Z i /n ≤ k 1 _ , where Z i = X i − θ, i = 1, . . . , n. Since this is a location family, for any θ, Z 1 , . . . , Z n are iid with pdf f(z), i. e., the Z i s are pivots. So the last probability does not depend on θ. 9-14 Solutions Manual for Statistical Inference 9.52 a. The LRT of H 0 : σ = σ 0 versus H 1 : σ = σ 0 is based on the statistic λ(x) = sup µ,σ=σ 0 L(µ, σ 0 | x) sup µ,σ∈(0,∞) L(µ, σ 2 | x) . In the denominator, ˆ σ 2 = (x i − ¯ x) 2 /n and ˆ µ = ¯ x are the MLEs, while in the numerator, σ 2 0 and ˆ µ are the MLEs. Thus λ(x) = _ 2πσ 2 0 _ −n/2 e Σ(x i −¯ x) 2 2 0 (2πˆ σ 2 ) −n/2 e Σ(x i −¯ x) 2 2 = _ σ 2 0 ˆ σ 2 _ −n/2 e Σ(x i −¯ x) 2 2 0 e −n/2 , and, writing ˆ σ 2 = [(n −1)/n]s 2 , the LRT rejects H 0 if _ σ 2 0 n−1 n s 2 _ −n/2 e (n−1)s 2 2 0 < k α , where k α is chosen to give a size α test. If we denote t = (n−1)s 2 σ 2 0 , then T ∼ χ 2 n−1 under H 0 , and the test can be written: reject H 0 if t n/2 e −t/2 < k α . Thus, a 1 −α conﬁdence set is _ σ 2 : t n/2 e −t/2 ≥ k α _ = _ σ 2 : _ (n −1)s 2 σ 2 _ n/2 e (n−1)s 2 σ 2 /2 ≥ k α _ . Note that the function t n/2 e −t/2 is unimodal (it is the kernel of a gamma density) so it follows that the conﬁdence set is of the form _ σ 2 : t n/2 e −t/2 ≥ k α _ = _ σ 2 : a ≤ t ≤ b _ = _ σ 2 : a ≤ (n −1)s 2 σ 2 ≤ b _ = _ σ 2 : (n −1)s 2 b ≤ σ 2 (n −1)s 2 b _ , where a and b satisfy a n/2 e −a/2 = b n/2 e −b/2 (since they are points on the curve t n/2 e −t/2 ). Since n 2 = n+2 2 −1, a and b also satisfy 1 Γ _ n+2 2 _ 2 (n+2)/2 a ((n+2)/2)−1 e −a/2 = 1 Γ _ n+2 2 _ 2 (n+2)/2 b ((n+2)/2)−1 e −b/2 , or, f n+2 (a) = f n+2 (b). b. The constants a and b must satisfy f n−1 (b)b 2 = f n−1 (a)a 2 . But since b ((n−1)/2)−1 b 2 = b ((n+3)/2)−1 , after adjusting constants, this is equivalent to f n+3 (b) = f n+3 (a). Thus, the values of a and b that give the minimum length interval must satisfy this along with the probability constraint. The conﬁdence interval, say I(s 2 ) will be unbiased if (Deﬁnition 9.3.7) c. P σ 2 _ σ 2 ∈ I(S 2 ) _ ≤ P σ 2 _ σ 2 ∈ I(S 2 ) _ = 1 −α. Some algebra will establish P σ 2 _ σ 2 ∈ I(S 2 ) _ = P σ 2 _ (n −1)S 2 2 σ 2 σ 2 (n −1)S 2 2 _ = P σ 2 _ χ 2 n−1 b σ 2 σ 2 χ 2 n−1 a _ = _ bc ac f n−1 (t) dt, Second Edition 9-15 where c = σ 2 2 . The derivative (with respect to c) of this last expression is bf n−1 (bc) − af n−1 (ac), and hence is equal to zero if both c = 1 (so the interval is unbiased) and bf n−1 (b) = af n−1 (a). From the form of the chi squared pdf, this latter condition is equivalent to f n+1 (b) = f n+1 (a). d. By construction, the interval will be 1 −α equal-tailed. 9.53 a. E[blength(C) −I C (µ)] = 2cσb −P(|Z| ≤ c), where Z ∼ n(0, 1). b. d dc [2cσb −P(|Z| ≤ c)] = 2σb −2 _ 1 e −c 2 /2 _ . c. If bσ > 1/ 2π the derivative is always positive since e −c 2 /2 < 1. 9.55 E[L((µ,σ), C)] = E[L((µ,σ), C)|S < K] P(S < K) + E[L((µ,σ), C)|S > K] P(S > K) = E _ L((µ,σ), C )|S < K ¸ P(S < K) + E[L((µ,σ), C)|S > K] P(S > K) = R _ L((µ,σ), C ) ¸ + E[L((µ,σ), C)|S > K] P(S > K), where the last equality follows because C = ∅ if S > K. The conditional expectation in the second term is bounded by E[L((µ,σ), C)|S > K] = E[blength(C) −I C (µ)|S > K] = E[2bcS −I C (µ)|S > K] > E[2bcK −1|S > K] (since S > K and I C ≤ 1) = 2bcK −1, which is positive if K > 1/2bc. For those values of K, C dominates C. 9.57 a. The distribution of X n+1 ¯ X is n[0, σ 2 (1 + 1/n)], so P _ X n+1 ¯ X ±z α/2 σ _ 1 + 1/n _ = P(|Z| ≤ z α/2 ) = 1 −α. b. p percent of the normal population is in the interval µ±z p/2 σ, so ¯ x±kσ is a 1−α tolerance interval if P(µ ±z p/2 ⊆ σ ¯ X ±kσ) = P( ¯ X −kσ ≤ µ −z p/2 σ and ¯ X +kσ ≥ µ +z p/2 σ) ≥ 1 −α. This can be attained by requiring P( ¯ X −kσ ≥ µ −z p/2 σ) = α/2 and P( ¯ X +kσ ≤ µ +z p/2 σ) = α/2, which is attained for k = z p/2 +z α/2 / n. c. From part (a), (X n+1 ¯ X)/(S _ 1 + 1/n) ∼ t n−1 , so a 1 − α prediction interval is ¯ X ± t n−1,α/2 S _ 1 + 1/n. Chapter 10 Asymptotic Evaluations 10.1 First calculate some moments for this distribution. EX = θ/3, EX 2 = 1/3, VarX = 1 3 θ 2 9 . So 3 ¯ X n is an unbiased estimator of θ with variance Var(3 ¯ X n ) = 9(VarX)/n = (3 −θ 2 )/n →0 as n →∞. So by Theorem 10.1.3, 3 ¯ X n is a consistent estimator of θ. 10.3 a. The log likelihood is n 2 log (2πθ) − 1 2 (x i −θ)/θ. Diﬀerentiate and set equal to zero, and a little algebra will show that the MLE is the root of θ 2 +θ −W = 0. The roots of this equation are (−1 ± 1 + 4W)/2, and the MLE is the root with the plus sign, as it has to be nonnegative. b. The second derivative of the log likelihood is (−2 x 2 i + nθ)/(2θ 3 ), yielding an expected Fisher information of I(θ) = −E θ −2 X 2 i +nθ 3 = 2nθ +n 2 , and by Theorem 10.1.12 the variance of the MLE is 1/I(θ). 10.4 a. Write X i Y i X 2 i = X i (X i + i ) X 2 i = 1 + X i i X 2 i . From normality and independence EX i i = 0, VarX i i = σ 2 2 2 ), EX 2 i = µ 2 2 , VarX 2 i = 2τ 2 (2µ 2 2 ), and Cov(X i , X i i ) = 0. Applying the formulas of Example 5.5.27, the asymptotic mean and variance are E _ X i Y i X 2 i _ ≈ 1 and Var _ X i Y i X 2 i _ 2 2 2 ) [n(µ 2 2 )] 2 = σ 2 n(µ 2 2 ) b. Y i X i = β + i X i with approximate mean β and variance σ 2 /(nµ 2 ). 10-2 Solutions Manual for Statistical Inference c. 1 n Y i X i = β + 1 n i X i with approximate mean β and variance σ 2 /(nµ 2 ). 10.5 a. The integral of ET 2 n is unbounded near zero. We have ET 2 n > _ n 2πσ 2 _ 1 0 1 x 2 e −(x−µ) 2 /2σ 2 dx > _ n 2πσ 2 K _ 1 0 1 x 2 dx = ∞, where K = max 0≤x≤1 e −(x−µ) 2 /2σ 2 b. If we delete the interval (−δ, δ), then the integrand is bounded, that is, over the range of integration 1/x 2 < 1/δ 2 . c. Assume µ > 0. A similar argument works for µ < 0. Then P(−δ < X < δ) = P[ n(−δ −µ) < n(X −µ) < n(δ −µ)] < P[Z < n(δ −µ)], where Z ∼ n(0, 1). For δ < µ, the probability goes to 0 as n →∞. 10.7 We need to assume that τ(θ) is diﬀerentiable at θ = θ 0 , the true value of the parameter. Then we apply Theorem 5.5.24 to Theorem 10.1.12. 10.9 We will do a more general problem that includes a) and b) as special cases. Suppose we want to estimate λ t e −λ /t! = P(X = t). Let T = T(X 1 , . . . , X n ) = _ 1 if X 1 = t 0 if X 1 = t. Then ET = P(T = 1) = P(X 1 = t), so T is an unbiased estimator. Since X i is a complete suﬃcient statistic for λ, E(T| X i ) is UMVUE. The UMVUE is 0 for y = X i < t, and for y ≥ t, E(T|y) = P(X 1 = t| X i = y) = P(X 1 = t, X i = y) P( X i = y) = P(X 1 = t)P( n i=2 X i = y −t) P( X i = y) = t e −λ /t!}{[(n −1)λ] y−t e −(n−1)λ /(y −t)!} (nλ) y e −nλ /y! = _ y t _ (n −1) y−t n y . a. The best unbiased estimator of e −λ is ((n −1)/n) y . b. The best unbiased estimator of λe −λ is (y/n)[(n −1)/n] y−1 c. Use the fact that for constants a and b, d λ a b λ = b λ λ a−1 (a +λlog b), to calculate the asymptotic variances of the UMVUEs. We have for t = 0, ARE _ _ n −1 n _ n ˆ λ , e −λ _ = _ e −λ _ n−1 n _ log _ n−1 n _ n _ 2 , Second Edition 10-3 and for t = 1 ARE _ n n −1 ˆ λ _ n −1 n _ n ˆ λ , ˆ λe −λ _ = _ (λ −1)e −λ n n−1 _ n−1 n _ _ 1 + log _ n−1 n _ n ¸ _ 2 . Since [(n −1)/n] n →e −1 as n →∞, both of these AREs are equal to 1 in the limit. d. For these data, n = 15, X i = y = 104 and the MLE of λ is ˆ λ = ¯ X = 6.9333. The estimates are MLE UMVUE P(X = 0) .000975 .000765 P(X = 1) .006758 .005684 10.11 a. It is easiest to use the Mathematica code in Example A.0.7. The second derivative of the log likelihood is 2 ∂µ 2 log _ 1 Γ[µ/β]β µ/β x −1+µ/β e −x/β _ = 1 β 2 ψ (µ/β), where ψ(z) = Γ (z)/Γ(z) is the digamma function. b. Estimation of β does not aﬀect the calculation. c. For µ = αβ known, the MOM estimate of β is ¯ x/α. The MLE comes from diﬀerentiating the log likelihood d _ −αnlog β − i x i _ set = 0 ⇒β = ¯ x/α. d. The MOM estimate of β comes from solving 1 n i x i = µ and 1 n i x 2 i = µ 2 +µβ, which yields ˜ β = ˆ σ 2 /¯ x. The approximate variance is quite a pain to calculate. Start from E ¯ X = µ, Var ¯ X = 1 n µβ, Eˆ σ 2 ≈ µβ, Varˆ σ 2 2 n µβ 3 , where we used Exercise 5.8(b) for the variance of ˆ σ 2 . Now using Example 5.5.27 and (and assuming the covariance is zero), we have Var ˜ β ≈ 3 . The ARE is then ARE( ˆ β, ˜ β) = _ 3 ¸ _ E _ d 2 2 l(µ, β|X __ . Here is a small table of AREs. There are some entries that are less than one - this is due to using an approximation for the MOM variance. µ β 1 3 6 10 1 1.878 0.547 0.262 0.154 2 4.238 1.179 0.547 0.317 3 6.816 1.878 0.853 0.488 4 9.509 2.629 1.179 0.667 5 12.27 3.419 1.521 0.853 6 15.075 4.238 1.878 1.046 7 17.913 5.08 2.248 1.246 8 20.774 5.941 2.629 1.451 9 23.653 6.816 3.02 1.662 10 26.546 7.704 3.419 1.878 10-4 Solutions Manual for Statistical Inference 10.13 Here are the 35 distinct samples from {2, 4, 9, 12} and their weights. {12, 12, 12, 12}, 1/256 {9, 12, 12, 12}, 1/64 {9, 9, 12, 12}, 3/128 {9, 9, 9, 12}, 1/64 {9, 9, 9, 9}, 1/256 {4, 12, 12, 12}, 1/64 {4, 9, 12, 12}, 3/64 {4, 9, 9, 12}, 3/64 {4, 9, 9, 9}, 1/64 {4, 4, 12, 12}, 3/128 {4, 4, 9, 12}, 3/64 {4, 4, 9, 9}, 3/128 {4, 4, 4, 12}, 1/64 {4, 4, 4, 9}, 1/64 {4, 4, 4, 4}, 1/256 {2, 12, 12, 12}, 1/64 {2, 9, 12, 12}, 3/64 {2, 9, 9, 12}, 3/64 {2, 9, 9, 9}, 1/64 {2, 4, 12, 12}, 3/64 {2, 4, 9, 12}, 3/32 {2, 4, 9, 9}, 3/64 {2, 4, 4, 12}, 3/64 {2, 4, 4, 9}, 3/64 {2, 4, 4, 4}, 1/64 {2, 2, 12, 12}, 3/128 {2, 2, 9, 12}, 3/64 {2, 2, 9, 9}, 3/128 {2, 2, 4, 12}, 3/64 {2, 2, 4, 9}, 3/64 {2, 2, 4, 4}, 3/128 {2, 2, 2, 12}, 1/64 {2, 2, 2, 9}, 1/64 {2, 2, 2, 4}, 1/64 {2, 2, 2, 2}, 1/256 The veriﬁcations of parts (a) − (d) can be done with this table, or the table of means in Example A.0.1 can be used. For part (e),verifying the bootstrap identities can involve much painful algebra, but it can be made easier if we understand what the bootstrap sample space (the space of all n n bootstrap samples) looks like. Given a sample x 1 , x 2 , . . . , x n , the bootstrap sample space can be thought of as a data array with n n rows (one for each bootstrap sample) and n columns, so each row of the data array is one bootstrap sample. For example, if the sample size is n = 3, the bootstrap sample space is x 1 x 1 x 1 x 1 x 1 x 2 x 1 x 1 x 3 x 1 x 2 x 1 x 1 x 2 x 2 x 1 x 2 x 3 x 1 x 3 x 1 x 1 x 3 x 2 x 1 x 3 x 3 x 2 x 1 x 1 x 2 x 1 x 2 x 2 x 1 x 3 x 2 x 2 x 1 x 2 x 2 x 2 x 2 x 2 x 3 x 2 x 3 x 1 x 2 x 3 x 2 x 2 x 3 x 3 x 3 x 1 x 1 x 3 x 1 x 2 x 3 x 1 x 3 x 3 x 2 x 1 x 3 x 2 x 2 x 3 x 2 x 3 x 3 x 3 x 1 x 3 x 3 x 2 x 3 x 3 x 3 Note the pattern. The ﬁrst column is 9 x 1 s followed by 9 x 2 s followed by 9 x 3 s, the second column is 3 x 1 s followed by 3 x 2 s followed by 3 x 3 s, then repeated, etc. In general, for the entire bootstrap sample, Second Edition 10-5 ◦ The ﬁrst column is n n−1 x 1 s followed by n n−1 x 2 s followed by, . . ., followed by n n−1 x n s ◦ The second column is n n−2 x 1 s followed by n n−2 x 2 s followed by, . . ., followed by n n−2 x n s, repeated n times ◦ The third column is n n−3 x 1 s followed by n n−3 x 2 s followed by, . . ., followed by n n−3 x n s, repeated n 2 times . . . ◦ The n th column is 1 x 1 followed by 1 x 2 followed by, . . ., followed by 1 x n , repeated n n−1 times So now it is easy to see that each column in the data array has mean ¯ x, hence the entire bootstrap data set has mean ¯ x. Appealing to the 3 3 × 3 data array, we can write the numerator of the variance of the bootstrap means as 3 i=1 3 j=1 3 k=1 _ 1 3 (x i +x j +x k ) − ¯ x _ 2 = 1 3 2 3 i=1 3 j=1 3 k=1 [(x i − ¯ x) + (x j − ¯ x) + (x k − ¯ x)] 2 = 1 3 2 3 i=1 3 j=1 3 k=1 _ (x i − ¯ x) 2 + (x j − ¯ x) 2 + (x k − ¯ x) 2 ¸ , because all of the cross terms are zero (since they are the sum of deviations from the mean). Summing up and collecting terms shows that 1 3 2 3 i=1 3 j=1 3 k=1 _ (x i − ¯ x) 2 + (x j − ¯ x) 2 + (x k − ¯ x) 2 ¸ = 3 3 i=1 (x i − ¯ x) 2 , and thus the average of the variance of the bootstrap means is 3 3 i=1 (x i − ¯ x) 2 3 3 which is the usual estimate of the variance of ¯ X if we divide by n instead of n − 1. The general result should now be clear. The variance of the bootstrap means is n i 1 =1 n i 2 =1 · · · n i n =1 _ 1 n (x i 1 +x i 2 +· · · +x i n ) − ¯ x _ 2 = 1 n 2 n i 1 =1 n i 2 =1 · · · n i n =1 _ (x i 1 − ¯ x) 2 + (x i 2 − ¯ x) 2 +· · · + (x i n − ¯ x) 2 ¸ , since all of the cross terms are zero. Summing and collecting terms shows that the sum is n n−2 n i=1 (x i − ¯ x) 2 , and the variance of the bootstrap means is n n−2 n i=1 (x i − ¯ x) 2 /n n = n i=1 (x i − ¯ x) 2 /n 2 . 10.15 a. As B →∞ Var B ( ˆ θ) = Var ( ˆ θ). b. Each Var B i ( ˆ θ) is a sample variance, and they are independent so the LLN applies and 1 m m i=1 Var B i ( ˆ θ) m→∞ → EVar B ( ˆ θ) = Var ( ˆ θ), where the last equality follows from Theorem 5.2.6(c). 10-6 Solutions Manual for Statistical Inference 10.17 a. The correlation is .7781 b. Here is R code (R is available free at http://cran.r-project.org/) to bootstrap the data, calculate the standard deviation, and produce the histogram: cor(law) n <- 15 theta <- function(x,law){ cor(law[x,1],law[x,2]) } results <- bootstrap(1:n,1000,theta,law,func=sd) results[2] hist(results[[1]]) The data “law” is in two columns of length 15, “results[2]” contains the standard deviation. The vector “results[[1]]” is the bootstrap sample. The output is V1 V2 V1 1.0000000 0.7781716 V2 0.7781716 1.0000000 func.thetastar [1] 0.1322881 showing a correlation of .7781 and a bootstrap standard deviation of .1323. c. The R code for the parametric bootstrap is mx<-600.6;my<-3.09 sdx<-sqrt(1791.83);sdy<-sqrt(.059) rho<-.7782;b<-rho*sdx/sdy;sdxy<-sqrt(1-rho^2)*sdx rhodata<-rho for (j in 1:1000) { y<-rnorm(15,mean=my,sd=sdy) x<-rnorm(15,mean=mx+b*(y-my),sd=sdxy) rhodata<-c(rhodata,cor(x,y)) } sd(rhodata) hist(rhodata) where we generate the bivariate normal by ﬁrst generating the marginal then the condid- ional, as R does not have a bivariate normal generator. The bootstrap standard deviation is 0.1159, smaller than the nonparametric estimate. The histogram looks similar to the nonparametric bootstrap histogram, displaying a skewness left. d. The Delta Method approximation is r ∼ n(ρ, (1 −ρ 2 ) 2 /n), and the “plug-in” estimate of standard error is _ (1 −.7782 2 ) 2 /15 = .1018, the smallest so far. Also, the approximate pdf of r will be normal, hence symmetric. e. By the change of variables t = 1 2 log _ 1 +r 1 −r _ , dt = 1 1 −r 2 , the density of r is 1 2π(1 −r 2 ) exp _ n 2 _ 1 2 log _ 1 +r 1 −r _ 1 2 log _ 1 +ρ 1 −ρ __ 2 _ , −1 ≤ r ≤ 1. More formally, we could start with the random variable T, normal with mean 1 2 log _ 1+ρ 1−ρ _ and variance 1/n, and make the transformation to R = e 2T +1 e 2T −1 and get the same answer. Second Edition 10-7 10.19 a. The variance of ¯ X is Var ¯ X = E( ¯ X −µ) 2 = E _ 1 n i X i −µ _ 2 = 1 n 2 E _ _ i (X i −µ) 2 + 2 i>j (X i −µ)(X j −µ) _ _ = 1 n 2 _ 2 + 2 n(n −1) 2 ρσ 2 _ = σ 2 n + n −1 n ρσ 2 b. In this case we have E _ _ i>j (X i −µ)(X j −µ) _ _ = σ 2 n i=2 i−1 j=1 ρ i−j . In the double sum ρ appears n −1 times, ρ 2 appears n −2 times, etc.. so n i=2 i−1 j=1 ρ i−j = n−1 i=1 (n −i)ρ i = ρ 1 −ρ _ n − 1 −ρ n 1 −ρ _ , where the series can be summed using (1.5.4), the partial sum of the geometric series, or using Mathematica. c. The mean and variance of X i are EX i = E[E(X i |X i−1 )] = EρX i−1 = · · · = ρ i−1 EX 1 and VarX i = VarE(X i |X i−1 ) + EVar(X i |X i−1 ) = ρ 2 σ 2 + 1 = σ 2 for σ 2 = 1/(1 −ρ 2 ). Also, by iterating the expectation EX 1 X i = E[E(X 1 X i |X i−1 )] = E[E(X 1 |X i−1 )E(X i |X i−1 )] = ρE[X 1 X i−1 ], where we used the facts that X 1 and X i are independent conditional on X i−1 . Continuing with the argument we get that EX 1 X i = ρ i−1 EX 2 1 . Thus, Corr(X 1 , X i ) = ρ i−1 EX 2 1 −ρ i−1 (EX 1 ) 2 VarX 1 VarX i = ρ i−1 σ 2 σ 2 σ 2 = ρ i−1 . 10.21 a. If any x i → ∞, s 2 → ∞, so it has breakdown value 0. To see this, suppose that x 1 → ∞. Write s 2 = 1 n −1 n i=1 (x i − ¯ x) 2 = 1 n −1 _ [(1 − 1 n )x 1 − ¯ x −1 ] 2 + n i=2 (x i − ¯ x) 2 _ , where ¯ x −1 = (x 2 + . . . + x n )/n. It is easy to see that as x 1 → ∞, each term in the sum →∞. b. If less than 50% of the sample → ∞, the median remains the same, and the median of |x i − M| remains the same. If more than 50% of the sample → ∞, M → ∞ and so does the MAD. 10-8 Solutions Manual for Statistical Inference 10.23 a. The ARE is [2σf(µ)] 2 . We have Distribution Parameters variance f(µ) ARE normal µ = 0, σ = 1 1 .3989 .64 logistic µ = 0, β = 1 π 2 /3 .25 .82 double exp. µ = 0, σ = 1 2 .5 2 b. If X 1 , X 2 , . . . , X n are iid f X with EX 1 = µ and VarX 1 = σ 2 , the ARE is σ 2 [2 ∗ f X (µ)] 2 . If we transform to Y i = (X i − µ)/σ, the pdf of Y i is f Y (y) = σf X (σy + µ) with ARE [2 ∗ f Y (0)] 2 = σ 2 [2 ∗ f X (µ)] 2 c. The median is more eﬃcient for smaller ν, the distributions with heavier tails. ν VarX f(0) ARE 3 3 .367 1.62 5 5/3 .379 .960 10 5/4 .389 .757 25 25/23 .395 .678 50 25/24 .397 .657 ∞ 1 .399 .637 d. Again the heavier tails favor the median. δ σ ARE .01 2 .649 .1 2 .747 .5 2 .895 .01 5 .777 .1 5 1.83 .5 5 2.98 10.25 By transforming y = x −θ, _ −∞ ψ(x −θ)f(x −θ)dx = _ −∞ ψ(y)f(y)dy. Since ψ is an odd function, ψ(y) = −ψ(−y), and _ −∞ ψ(y)f(y)dy = _ 0 −∞ ψ(y)f(y)dy + _ 0 ψ(y)f(y)dy = _ 0 −∞ −ψ(−y)f(y)dy + _ 0 ψ(y)f(y)dy = − _ 0 ψ(y)f(y)dy + _ 0 ψ(y)f(y)dy = 0, where in the last line we made the transformation y →−y and used the fact the f is symmetric, so f(y) = f(−y). From the discussion preceding Example 10.2.6, ˆ θ M is asymptotically normal with mean equal to the true θ. 10.27 a. lim δ→0 1 δ [(1 −δ)µ +δx −µ] = lim δ→0 δ(x −µ) δ = x −µ. b. P(X ≤ a) = P(X ≤ a|X ∼ F)(1 −δ) +P(x ≤ a|X = x)δ = (1 −δ)F(a) +δI(x ≤ a) Second Edition 10-9 and (1 −δ)F(a) = 1 2 ⇒ a = F −1 _ 1 2(1 −δ) _ (1 −δ)F(a) +δ = 1 2 ⇒ a = F −1 _ 1 2 −δ 2(1 −δ) _ c. The limit is lim δ→0 a δ −a 0 δ = a δ | δ=0 by the deﬁnition of derivative. Since F(a δ ) = 1 2(1−δ) , d F(a δ ) = d 1 2(1 −δ) or f(a δ )a δ = 1 2(1 −δ) 2 ⇒a δ = 1 2(1 −δ) 2 f(a δ ) . Since a 0 = m, the result follows. The other limit can be calculated in a similar manner. 10.29 a. Substituting cl for ψ makes the ARE equal to 1. b. For each distribution is the case that the given ψ function is equal to cl , hence the resulting M-estimator is asymptotically eﬃcient by (10.2.9). 10.31 a. By the CLT, n 1 ˆ p 1 −p 1 _ p 1 (1 −p 1 ) →n(0, 1) and n 2 ˆ p 2 −p 2 _ p 2 (1 −p 2 ) →n(0, 1), so if ˆ p 1 and ˆ p 2 are independent, under H 0 : p 1 = p 2 = p, ˆ p 1 − ˆ p 2 _ _ 1 n 1 + 1 n 2 _ ˆ p(1 − ˆ p) →n(0, 1) where we use Slutsky’s Theorem and the fact that ˆ p = (S 1 +S 2 )/(n 1 +n 2 ) is the MLE of p under H 0 and converges to p in probability. Therefore, T →χ 2 1 . b. Substitute ˆ p i s for S i and F i s to get T = n 2 1 (ˆ p 1 − ˆ p) 2 n 1 ˆ p + n 2 2 (ˆ p 2 − ˆ p) 2 n 2 ˆ p + n 2 1 [(1 − ˆ p 1 ) −(1 − ˆ p)] 2 n 1 (1 − ˆ p) + n 2 2 [(1 − ˆ p 2 ) −(1 − ˆ p)] 2 n 2 ˆ p = n 1 (ˆ p 1 − ˆ p) 2 ˆ p(1 − ˆ p) + n 2 (ˆ p 2 − ˆ p) 2 ˆ p(1 − ˆ p) Write ˆ p = (n 1 ˆ p 1 + n 2 ˆ p 2 )/(n 1 + n 2 ). Substitute this into the numerator, and some algebra will get n 1 (ˆ p 1 − ˆ p) 2 +n 2 (ˆ p 2 − ˆ p) 2 = (ˆ p 1 − ˆ p 2 ) 2 1 n 1 + 1 n 2 , so T = T. 10-10 Solutions Manual for Statistical Inference c. Under H 0 , ˆ p 1 − ˆ p 2 _ _ 1 n 1 + 1 n 2 _ p(1 −p) →n(0, 1) and both ˆ p 1 and ˆ p 2 are consistent, so ˆ p 1 (1 − ˆ p 1 ) → p(1 − p) and ˆ p 2 (1 − ˆ p 2 ) → p(1 − p) in probability. Therefore, by Slutsky’s Theorem, ˆ p 1 −ˆ p 2 _ ˆ p 1 (1−ˆ p 1 ) n 1 + ˆ p 2 (1−ˆ p 2 ) n 2 →n(0, 1), and (T ∗∗ ) 2 →χ 2 1 . It is easy to see that T ∗∗ = T in general. d. The estimator (1/n 1 + 1/n 2 )ˆ p(1 − ˆ p) is the MLE of Var(ˆ p 1 − ˆ p 2 ) under H 0 , while the estimator ˆ p 1 (1 − ˆ p 1 )/n 1 + ˆ p 2 (1 − ˆ p 2 )/n 1 is the MLE of Var(ˆ p 1 − ˆ p 2 ) under H 1 . One might argue that in hypothesis testing, the ﬁrst one should be used, since under H 0 , it provides a better estimator of variance. If interest is in ﬁnding the conﬁdence interval, however, we are making inference under both H 0 and H 1 , and the second one is preferred. e. We have ˆ p 1 = 34/40, ˆ p 2 = 19/35, ˆ p = (34 + 19)/(40 + 35) = 53/75, and T = 8.495. Since χ 2 1,.05 = 3.84, we can reject H 0 at α = .05. 10.32 a. First calculate the MLEs under p 1 = p 2 = p. We have L(p|x) = p x 1 p x 2 p x 3 · · · p x n−1 n−1 _ 1−2p− n−1 i=3 p i _ m−x 1 −x 2 −···−x n−1 . Taking logs and diﬀerentiating yield the following equations for the MLEs: ∂logL ∂p = x 1 +x 2 p 2 _ m− n−1 i=1 x i _ 1−2p− n−1 i=3 p i = 0 ∂logL ∂p i = x i p i x n 1−2p− n−1 i=3 p i = 0, i = 3, . . . , n −1, with solutions ˆ p = x 1 +x 2 2m , ˆ p i = x i m , i = 3, . . . , n − 1, and ˆ p n = _ m− n−1 i=1 x i _ /m. Except for the ﬁrst and second cells, we have expected = observed, since both are equal to x i . For the ﬁrst two terms, expected = mˆ p = (x 1 +x 2 )/2 and we get (observed −expected) 2 expected = _ x 1 x 1 +x 2 2 _ 2 x 1 +x 2 2 + _ x 2 x 1 +x 2 2 _ 2 x 1 +x 2 2 = (x 1 −x 2 ) 2 x 1 +x 2 . b. Now the hypothesis is about conditional probabilities is given by H 0 : P(change—initial agree)=P(change—initial disagree) or, in terms of the parameters H 0 : p 1 p 1 +p 3 = p 2 p 2 +p 4 . This is the same as p 1 p 4 = p 2 p 3 , which is not the same as p 1 = p 2 . 10.33 Theorem 10.1.12 and Slutsky’s Theorem imply that ˆ θ −θ _ 1 n I n ( ˆ θ) →n(0, 1) and the result follows. 10.35 a. Since σ/ n is the estimated standard deviation of ¯ X in this case, the statistic is a Wald statistic Second Edition 10-11 b. The MLE of σ 2 is ˆ σ 2 µ = i (x i −µ) 2 /n. The information number is d 2 d(σ 2 ) 2 _ n 2 log σ 2 1 2 ˆ σ 2 µ σ 2 ¸ ¸ ¸ ¸ σ 2 =ˆ σ 2 µ = n 2ˆ σ 2 µ . Using the Delta method, the variance of ˆ σ µ = _ ˆ σ 2 µ is ˆ σ 2 µ /8n, and a Wald statistic is ˆ σ µ −σ 0 _ σ 2 µ /8n . 10.37 a. The log likelihood is log L = − n 2 log σ 2 1 2 i (x i −µ) 2 2 with d = 1 σ 2 i (x i −µ) = n σ 2 (¯ x −µ) d 2 2 = − n σ 2 , so the test statistic for the score test is n σ 2 (¯ x −µ) _ σ 2 /n = n ¯ x −µ σ b. We test the equivalent hypothesis H 0 : σ 2 = σ 2 0 . The likelihood is the same as Exercise 10.35(b), with ﬁrst derivative d 2 = n(ˆ σ 2 µ −σ 2 ) 4 and expected information number E _ n(2ˆ σ 2 µ −σ 2 ) 6 _ = n(2σ 2 −σ 2 ) 6 = n 4 . The score test statistic is _ n 2 ˆ σ 2 µ −σ 2 0 σ 2 0 10.39 We summarize the results for (a) −(c) in the following table. We assume that the underlying distribution is normal, and use that for all score calculations. The actual data is generated from normal, logistic, and double exponential. The sample size is 15, we use 1000 simulations and draw 20 bootstrap samples. Here θ 0 = 0, and the power is tabulated for a nominal α = .1 test. 10-12 Solutions Manual for Statistical Inference Underlying pdf Test θ 0 θ 0 + .25σ θ 0 + .5σ θ 0 + .75σ θ 0 + 1σ θ 0 + 2σ Laplace Naive 0.101 0.366 0.774 0.957 0.993 1. Boot 0.097 0.364 0.749 0.932 0.986 1. Median 0.065 0.245 0.706 0.962 0.995 1. Logistic Naive 0.137 0.341 0.683 0.896 0.97 1. Boot 0.133 0.312 0.641 0.871 0.967 1. Median 0.297 0.448 0.772 0.944 0.993 1. Normal Naive 0.168 0.316 0.628 0.878 0.967 1. Boot 0.148 0.306 0.58 0.836 0.957 1. Median 0.096 0.191 0.479 0.761 0.935 1. Here is Mathematica code: This program calculates size and power for Exercise 10.39, Second Edition We do our calculations assuming normality, but simulate power and size under other distri- butions. We test H 0 : θ = 0. theta_0=0; Needs["Statistics‘Master‘"] Clear[x] f1[x_]=PDF[NormalDistribution[0,1],x]; F1[x_]=CDF[NormalDistribution[0,1],x]; f2[x_]=PDF[LogisticDistribution[0,1],x]; f3[x_]=PDF[LaplaceDistribution[0,1],x]; v1=Variance[NormalDistribution[0,1]]; v2=Variance[LogisticDistribution[0,1]]; v3=Variance[LaplaceDistribution[0,1]]; Calculate m-estimate Clear[k,k1,k2,t,x,y,d,n,nsim,a,w1] ind[x_,k_]:=If[Abs[x]<k,1,0] rho[y_,k_]:=ind[y,k]*y^2 + (1-ind[y,k])*(k*Abs[y]-k^2) alow[d_]:=Min[Mean[d],Median[d]] aup[d_]:=Max[Mean[d],Median[d]] sol[k_,d_]:=FindMinimum[Sum[rho[d[[i]]-a,k],{i,1,n}],{a,{alow[d],aup[d]}}] mest[k_,d_]:=sol[k,d][[2]] generate data - to change underlying distributions change the sd and the distribution in the Random statement. n = 15; nsim = 1000; sd = Sqrt[v1]; theta = {theta_0, theta_0 +.25*sd, theta_0 +.5*sd, theta_0 +.75*sd, theta_0 + 1*sd, theta_0 +2*sd} ntheta = Length[theta] data = Table[Table[Random[NormalDistribution[0, 1]], {i, 1, n}],{j, 1,nsim}]; m1 = Table[Table[a /. mest[k1, data[[j]] - theta[[i]]], {j, 1, nsim}], {i, 1, n\theta}]; Calculation of naive variance and test statistic Psi[x_, k_] = x*If[Abs[x]<= k, 1, 0]- k*If[x < -k, 1, 0] + Second Edition 10-13 k*If[x > k, 1, 0]; Psi1[x_, k_] = If[Abs[x] <= k, 1, 0]; num =Table[Psi[w1[[j]][[i]], k1], {j, 1, nsim}, {i, 1,n}]; den =Table[Psi1[w1[[j]][[i]], k1], {j, 1, nsim}, {i, 1,n}]; varnaive = Map[Mean, num^2]/Map[Mean, den]^2; naivestat = Table[Table[m1[[i]][[j]] -theta_0/Sqrt[varnaive[[j]]/n], {j, 1, nsim}],{i, 1, ntheta}]; absnaive = Map[Abs, naivestat]; N[Table[Mean[Table[If[absnaive[[i]][[j]] > 1.645, 1, 0], {j, 1, nsim}]], {i, 1, n\theta}]] Calculation of bootstrap variance and test statistic nboot=20; u:=Random[DiscreteUniformDistribution[n]] databoot=Table[data[[jj]][[u]],{jj,1,nsim},{j,1,nboot},{i,1,n}]; m1boot=Table[Table[a/.mest[k1,databoot[[j]][[jj]]], {jj,1,nboot}],{j,1,nsim}]; varboot = Map[Variance, m1boot]; bootstat = Table[Table[m1[[i]][[j]] -theta_0/Sqrt[varboot[[j]]], {j, 1, nsim}], {i, 1, ntheta}]; absboot = Map[Abs, bootstat]; N[Table[Mean[Table[If[absboot[[i]][[j]] > 1.645, 1,0], {j, 1, nsim}]], {i, 1, ntheta}]]$$
Calculation of median test - use the score variance at the root density (normal)
med = Map[Median, data];
medsd = 1/(n*2*f1[theta_0]);
medstat = Table[Table[med[[j]] + \theta[[i]] - theta_0/medsd,
{j, 1, nsim}], {i, 1, ntheta}];
absmed = Map[Abs, medstat];
N[Table[Mean[Table[If[\(absmed[[i]][[j]] > 1.645, 1, 0],
{j, 1, nsim}]], {i, 1, ntheta}]]
10.41 a. The log likelihood is
log L = nr log p +n¯ xlog(1 −p)
with
d
dp
log L =
nr
p

n¯ x
1 −p
and
d
2
dp
2
log L = −
nr
p
2

n¯ x
(1 −p)
2
,
expected information
nr
p
2
(1−p)
and (Wilks) score test statistic

n
_
r
p

n¯ x
1−p
_
_
r
p
2
(1−p)
=
_
n
r
_
(1 −p)r +p¯ x

1 −p
_
.
Since this is approximately n(0, 1), a 1 −α conﬁdence set is
_
p :
¸
¸
¸
¸
_
n
r
_
(1 −p)r −p¯ x

1 −p

¸
¸
¸
≤ z
α/2
_
.
10-14 Solutions Manual for Statistical Inference
b. The mean is µ = r(1 −p)/p, and a little algebra will verify that the variance, r(1 −p)/p
2
can be written r(1 −p)/p
2
= µ +µ
2
/r. Thus
_
n
r
_
(1 −p)r −p¯ x

1 −p
_
=

n
µ − ¯ x
_
µ +µ
2
/r
.
The conﬁdence interval is found by setting this equal to z
α/2
, squaring both sides, and
solving the quadratic for µ. The endpoints of the interval are
r(8¯ x +z
2
α/2
) ±
_
rz
2
α/2
_
16r¯ x + 16¯ x
2
+rz
2
α/2
8r −2z
2
α/2
.
For the continuity correction, replace ¯ x with ¯ x+1/(2n) when solving for the upper endpoint,
and with ¯ x −1/(2n) when solving for the lower endpoint.
c. We table the endpoints for α = .1 and a range of values of r. Note that r = ∞ is the
Poisson, and smaller values of r give a wider tail to the negative binomial distribution.
r lower bound upper bound
1 22.1796 364.42
5 36.2315 107.99
10 38.4565 95.28
50 40.6807 85.71
100 41.0015 84.53
1000 41.3008 83.46
∞ 41.3348 83.34
10.43 a. Since
P
_

i
X
i
= 0
_
= (1 −p)
n
= α/2 ⇒p = 1 −α
1/n
and
P
_

i
X
i
= n
_
= p
n
= α/2 ⇒p = α
1/n
,
these endpoints are exact, and are the shortest possible.
b. Since p ∈ [0, 1], any value outside has zero probability, so truncating the interval shortens
it at no cost.
10.45 The continuity corrected roots are
2ˆ p +z
2
α/2
/n ±
1
n
±
_
z
2
α/2
n
3
[±2n(1 −2ˆ p) −1] + (2ˆ p +z
2
α/2
/n)
2
−4ˆ p
2
(1 +z
2
α/2
/n)
2(1 +z
2
α/2
/n)
where we use the upper sign for the upper root and the lower sign for the lower root. Note that
the only diﬀerences between the continuity-corrected intervals and the ordinary score intervals
are the terms with ± in front. But it is still diﬃcult to analytically compare lengths with the
non-corrected interval - we will do a numerical comparison. For n = 10 and α = .1 we have
the following table of length ratios, with the continuity-corrected length in the denominator
n 0 1 2 3 4 5 6 7 8 9 10
Ratio 0.79 0.82 0.84 0.85 0.86 0.86 0.86 0.85 0.84 0.82 0.79
The coverage probabilities are
Second Edition 10-15
p 0 .1 .2 .3 .4 .5 .6 .7 .8 .9 1
score .99 .93 .97 .92 .90 .89 .90 .92 .97 .93 .99
cc .99 .99 .97 .92 .98 .98 .98 .92 .97 .99 .99
Mathematica code to do the calculations is:
Needs["Statistics‘Master‘"]
Clear[p, x]
pbino[p_, x_] = PDF[BinomialDistribution[n, p], x];
cut = 1.645^2;
n = 10;
The quadratic score interval with and without continuity correction
slowcc[x_] := p /. FindRoot[(x/n - 1/(2*n) - p)^2 ==
cut*(p*((1 - p))/n, {p, .001}]
supcc[x_] := p /. FindRoot[(x/n + 1/(2*n) - p)^2 ==
cut*(p*((1 - p)/n, {p, .999}]
slow[x_] := p /. FindRoot[(x/n - p))^2 ==
cut*(p*(1 - p))/n, {p, .001}]
sup[x_] := p /. FindRoot[(x/n - p)^2 ==
cut*(p*(1 - p)/n, {p, .999}]
scoreintcc=Partition[Flatten[{{0,sup[0]},Table[{slowcc[i],supcc[i]},
{i,1,n-1}],{slowcc[n],1}},2],2];
scoreint=Partition[Flatten[{{0,sup[0]},Table[{slow[i],sup[i]},
{i,1,n-1}],{slowcc[n],1}},2],2];
Length Comparison
Table[(sup[i] - slow[i])/(supcc[i] - slowcc[i]), {i, 0, n}]
Now we’ll calculate coverage probabilities
scoreindcc[p_,x_]:=If[scoreintcc[[x+1]][[1]]<=p<=scoreintcc[[x+1]][[2]],1,0]
scorecovcc[p_]:=scorecovcc[p]=Sum[pbino[p,x]*scoreindcc[p,x],{x,0,n}]
scoreind[p_,x_]:=If[scoreint[[x+1]][[1]]<=p<=scoreint[[x+1]][[2]],1,0]
scorecov[p_]:=scorecov[p]=Sum[pbino[p,x]*scoreind[p,x],{x,0,n}]
{scorecovcc[.0001],Table[scorecovcc[i/10],{i,1,9}],scorecovcc[.9999]}//N
{scorecov[.0001],Table[scorecov[i/10],{i,1,9}],scorecov[.9999]}//N
10.47 a. Since 2pY ∼ χ
2
nr
(approximately)
P(χ
2
nr,1−α/2
≤ 2pY ≤ χ
2
nr,α/2
) = 1 −α,
and rearrangment gives the interval.
b. The interval is of the form P(a/2Y ≤ p ≤ b/2Y ), so the length is proportional to b − a.
This must be minimized subject to the constraint
_
b
a
f(y)dy = 1 −α, where f(y) is the pdf
of a χ
2
nr
. Treating b as a function of a, diﬀerentiating gives
b

−1 = 0 and f(b)b

−f(a) = 0
which implies that we need f(b) = f(a).
Chapter 11
Analysis of Variance and Regression
11.1 a. The ﬁrst order Taylor’s series approximation is
Var[g(Y )] ≈ [g

(θ)]
2
· VarY = [g

(θ)]
2
· v(θ).
b. If we choose g(y) = g

(y) =
_
y
a
1

v(x)
dx, then
dg

(θ)

=
d

_
θ
a
1
_
v(x)
dx =
1
_
v(θ)
,
by the Fundamental Theorem of Calculus. Then, for any θ,
Var[g

(Y )] ≈
_
1
_
v(θ)
_
2
v(θ) = 1.
11.2 a. v(λ) = λ, g

(y) =

y,
dg

(λ)

=
1
2

λ
, Varg

(Y ) ≈
_
dg

(λ)

_
2
· v(λ) = 1/4, independent of λ.
b. To use the Taylor’s series approximation, we need to express everything in terms of θ =
EY = np. Then v(θ) = θ(1 −θ/n) and
_
dg

(θ)

_
2
=
_
_
1
_
1 −
θ
n
·
1
2
_
θ
n
·
1
n
_
_
2
=
1
4nθ(1 −θ/n)
.
Therefore
Var[g

(Y )] ≈
_
dg

(θ)

_
2
v(θ) =
1
4n
,
independent of θ, that is, independent of p.
c. v(θ) = Kθ
2
,
dg

(θ)

=
1
θ
and Var[g

(Y )] ≈
_
1
θ
_
2
· Kθ
2
= K, independent of θ.
11.3 a. g

λ
(y) is clearly continuous with the possible exception of λ = 0. For that value use
l’Hˆopital’s rule to get
lim
λ→0
y
λ
−1
λ
= lim
λ→0
(log y)y
λ
1
= log y.
b. From Exercise 11.1, we want to ﬁnd v(λ) that satisﬁes
y
λ
−1
λ
=
_
y
a
1
_
v(x)
dx.
Taking derivatives
d
dy
_
y
λ
−1
λ
_
= y
λ−1
=
d
dy
_
y
a
1
_
v(x)
dx =
1
_
v(y)
.
11-2 Solutions Manual for Statistical Inference
Thus v(y) = y
−2(λ−1)
. From Exercise 11.1,
Var
_
y
λ
−1
λ
_

_
d
dy
θ
λ
−1
λ
_
2
v(θ) = θ
2(λ−1)
θ
−2(λ−1)
= 1.
Note: If λ = 1/2, v(θ) = θ, which agrees with Exercise 11.2(a). If λ = 1 then v(θ) = θ
2
,
which agrees with Exercise 11.2(c).
11.5 For the model
Y
ij
= µ +τ
i

ij
, i = 1, . . . , k, j = 1, . . . , n
i
,
take k = 2. The two parameter conﬁgurations
(µ, τ
1
, τ
2
) = (10, 5, 2)
(µ, τ
1
, τ
2
) = (7, 8, 5),
have the same values for µ +τ
1
and µ +τ
2
, so they give the same distributions for Y
1
and Y
2
.
11.6 a. Under the ANOVA assumptions Y
ij
= θ
i
+
ij
, where
ij
∼ independent n(0, σ
2
), so Y
ij

independent n(θ
i
, σ
2
). Therefore the sample pdf is
k

i=1
n
i

j=1
(2πσ
2
)
−1/2
e

(y
ij
−θ
i
)
2

2
= (2πσ
2
)
−Σn
i
/2
exp
_
_
_

1

2
k

i=1
n
i

j=1
(y
ij
−θ
i
)
2
_
_
_
= (2πσ
2
)
−Σn
i
/2
exp
_

1

2
k

i=1
n
i
θ
2
i
_
×exp
_
_
_

1

2

i

j
y
2
ij
+
2

2
k

i=1
θ
i
n
i
¯
Y

_
_
_
.
Therefore, by the Factorization Theorem,
_

Y

,
¯
Y

, . . . ,
¯
Y

,

i

j
Y
2
ij
_
_
is jointly suﬃcient for
_
θ
1
, . . . , θ
k
, σ
2
_
. Since (
¯
Y

, . . . ,
¯
Y

, S
2
p
) is a 1-to-1 function of this
vector, (
¯
Y

, . . . ,
¯
Y

, S
2
p
) is also jointly suﬃcient.
b. We can write
(2πσ
2
)
−Σn
i
/2
exp
_
_
_

1

2
k

i=1
n
i

j=1
(y
ij
−θ
i
)
2
_
_
_
= (2πσ
2
)
−Σn
i
/2
exp
_
_
_

1

2
k

i=1
n
i

j=1
([y
ij
− ¯ y

] + [¯ y

−θ
i
])
2
_
_
_
= (2πσ
2
)
−Σn
i
/2
exp
_
_
_

1

2
k

i=1
n
i

j=1
[y
ij
− ¯ y

]
2
_
_
_
exp
_

1

2
k

i=1
n
i
[¯ y

−θ
i
]
2
_
,
so, by the Factorization Theorem,
¯
Y

, i = 1, . . . , n, is independent of Y
ij

¯
Y

, j = 1, . . . , n
i
,
so S
2
p
is independent of each
¯
Y

.
c. Just identify n
i
¯
Y

with X
i
and redeﬁne θ
i
as n
i
θ
i
.
Second Edition 11-3
11.7 Let U
i
=
¯
Y

−θ
i
. Then
k

i=1
n
i
[(
¯
Y

¯
¯
Y ) −(θ
i

¯
θ)]
2
=
k

i=1
n
i
(U
i

¯
U)
2
.
The U
i
are clearly n(0, σ
2
/n
i
). For K = 2 we have
S
2
2
= n
1
(U
1

¯
U)
2
+n
2
(U
2

¯
U)
2
= n
1
_
U
1

n
1
¯
U
1
+n
2
¯
U
2
n
1
+n
2
_
2
+n
2
_
U
2

n
1
¯
U
1
+n
2
¯
U
2
n
1
+n
2
_
2
= (U
1
−U
2
)
2
_
n
1
_
n
2
n
1
+n
2
_
2
+n
2
_
n
1
n
1
+n
2
_
2
_
=
(U
1
−U
2
)
2
1
n
1
+
1
n
2
.
Since U
1
− U
2
∼ n(0, σ
2
(1/n
1
+ 1/n
2
)), S
2
2

2
∼ χ
2
1
. Let
¯
U
k
be the weighted mean of k U
i
s,
and note that
¯
U
k+1
=
¯
U
k
+
n
k+1
N
k+1
(U
k+1

¯
U
k
),
where N
k
=

k
j=1
n
j
. Then
S
2
k+1
=
k+1

i=1
n
i
(U
i

¯
U
k+1
)
2
=
k+1

i=1
n
i
_
(U
i

¯
U
k
) −
n
k+1
N
k+1
(U
k+1

¯
U
k
)
_
2
= S
2
k
+
n
k+1
N
k
N
k+1
(U
k+1

¯
U
k
)
2
,
where we have expanded the square, noted that the cross-term (summed up to k) is zero, and
did a boat-load of algebra. Now since
U
k+1

¯
U
k
∼ n(0, σ
2
(1/n
k+1
+ 1/N
k
)) = n(0, σ
2
(N
k+1
/n
k+1
N
k
)),
independent of S
2
k
, the rest of the argument is the same as in the proof of Theorem 5.3.1(c).
11.8 Under the oneway ANOVA assumptions, Y
ij
∼ independent n(θ
i
, σ
2
). Therefore
¯
Y

∼ n
_
θ
i
, σ
2
/n
i
_
(Y
ij
’s are independent with common σ
2
.)
a
i
¯
Y

∼ n
_
a
i
θ
i
, a
2
i
σ
2
/n
i
_
k

i=1
a
i
¯
Y

∼ n
_

a
i
θ
i
, σ
2
k

i=1
a
2
i
/n
i
_
.
All these distributions follow from Corollary 4.6.10.
11.9 a. From Exercise 11.8,
T =

a
i
¯
Y
i
∼ n
_

a
i
θ
i
, σ
2

a
2
i
_
,
and under H
0
, ET = δ. Thus, under H
0
,

a
i
¯
Y
i
−δ
_
S
2
p

a
2
i
∼ t
N−k
,
11-4 Solutions Manual for Statistical Inference
where N =

n
i
. Therefore, the test is to reject H
0
if
¸
¸

a
i
¯
Y
i
−δ
¸
¸
_
S
2
p

a
2
i
/n
i
> t
N−k,
α
2
.
b. Similarly for H
0
:

a
i
θ
i
≤ δ vs. H
1
:

a
i
θ
i
> δ, we reject H
0
if

a
i
¯
Y
i
−δ
_
S
2
p

a
2
i
/n
i
> t
N−k,α
.
11.10 a. Let H
i
0
, i = 1, . . . , 4 denote the null hypothesis using contrast a
i
, of the form
H
i
0
:

j
a
ij
θ
j
≥ 0.
If H
1
0
is rejected, it indicates that the average of θ
2
, θ
3
, θ
4
, and θ
5
is bigger than θ
1
which
is the control mean. If all H
i
0
’s are rejected, it indicates that θ
5
> θ
i
for i = 1, 2, 3, 4. To see
this, suppose H
4
0
and H
5
0
are rejected. This means θ
5
>
θ
5

4
2
> θ
3
; the ﬁrst inequality is
implied by the rejection of H
5
0
and the second inequality is the rejection of H
4
0
. A similar
argument implies θ
5
> θ
2
and θ
5
> θ
1
. But, for example, it does not mean that θ
4
> θ
3
or
θ
3
> θ
2
. It also indicates that
1
2

5

4
) > θ
3
,
1
3

5

4

3
) > θ
2
,
1
4

5

4

3

2
) > θ
1
.
b. In part a) all of the contrasts are orthogonal. For example,
5

i=1
a
2i
a
3i
=
_
0, 1, −
1
3
, −
1
3
, −
1
3
_
_
_
_
_
_
_
0
0
1
1
2
1
2
_
_
_
_
_
_
= −
1
3
+
1
6
+
1
6
= 0,
and this holds for all pairs of contrasts. Now, from Lemma 5.4.2,
Cov
_

i
a
ji
¯
Y

,

i
a
j

i
¯
Y

_
=
σ
2
n

i
a
ji
a
j

i
,
which is zero because the contrasts are orthogonal. Note that the equal number of obser-
vations per treatment is important, since if n
i
= n
i
for some i, i

, then
Cov
_
k

i=1
a
ji
¯
Y
i
,
k

i=1
a
j

i
¯
Y
i
_
=
k

i=1
a
ji
a
j

i
σ
2
n
i
= σ
2
k

i=1
a
ji
a
j

i
n
i
= 0.
c. This is not a set of orthogonal contrasts because, for example, a
1
×a
2
= −1. However, each
contrast can be interpreted meaningfully in the context of the experiment. For example, a
1
tests the eﬀect of potassium alone, while a
5
looks at the eﬀect of adding zinc to potassium.
11.11 This is a direct consequence of Lemma 5.3.3.
11.12 a. This is a special case of (11.2.6) and (11.2.7).
Second Edition 11-5
b. From Exercise 5.8(a) We know that
s
2
=
1
k −1
k

i=1
(¯ y

¯
¯ y)
2
=
1
2k(k −1)

i,i

(¯ y

− ¯ y
i

·
)
2
.
Then
1
k(k −1)

i,i

t
2
ii
=
1
2k(k −1)

i,i

(¯ y

− ¯ y
i

·
)
2
s
2
p
/n
=
k

i=1
(¯ y

¯
¯ y)
2
(k −1)s
2
p
/n
=

i
n(¯ y

¯
¯ y)
2
/(k −1)
s
2
p
,
which is distributed as F
k−1,N−k
under H
0
: θ
1
= · · · = θ
k
. Note that

i,i

t
2
ii
=
k

i=1
k

i

=1
t
2
ii
,
therefore t
2
ii
and t
2
i

i
are both included, which is why the divisor is k(k−1), not
k(k−1)
2
=
_
k
2
_
.
Also, to use the result of Example 5.9(a), we treated each mean
¯
Y

as an observation, with
overall mean
¯
¯
Y . This is true for equal sample sizes.
11.13 a.
L(θ|y) =
_
1
2πσ
2
_
Nk/2
e

1
2

k
i=1

n
i
j=1
(y
ij
−θ
i
)
2

2
.
Note that
k

i=1
n
i

j=1
(y
ij
−θ
i
)
2
=
k

i=1
n
i

j=1
(y
ij
− ¯ y

)
2
+
k

i=1
n
i
(¯ y

−θ
i
)
2
= SSW +
k

i=1
n
i
(¯ y

−θ
i
)
2
,
and the LRT statistic is
λ = (ˆ τ
2
/ˆ τ
2
0
)
Nk/2
where
ˆ τ
2
= SSW and ˆ τ
2
0
= SSW +

i
n
i
(¯ y

− ¯ y
··
)
2
= SSW +SSB.
Thus λ < k if and only if SSB/SSW is large, which is equivalent to the F test.
b. The error probabilities of the test are a function of the θ
i
s only through η =

θ
2
i
. The
distribution of F is that of a ratio of chi squared random variables, with the numerator
being noncentral (dependent on η). Thus the Type II error is given by
P (F > k|η) = P
_
χ
2
k−1
(η)/(k −1)
χ
2
N−k
/(N −k)
> k
_
≥ P
_
χ
2
k−1
(0)/(k −1)
χ
2
N−k
/(N −k)
> k
_
= α,
where the inequality follows from the fact that the noncentral chi squared is stochastically
increasing in the noncentrality parameter.
11-6 Solutions Manual for Statistical Inference
11.14 Let X
i
∼ n(θ
i
, σ
2
). Then from Exercise 11.11
Cov
_

i
a
i

c
i
X
i
,

i

c
i
v
i
X
i
_
= σ
2

a
i
v
i
Var
_

i
a
i

c
i
X
i
_
= σ
2

a
2
i
c
i
, Var
_
i

c
i
v
i
X
i
_
= σ
2

c
i
v
2
i
,
and the Cauchy-Schwarz inequality gives
_

a
i
v
i
___

a
2
i
/c
i
_

c
i
v
2
i
.
If a
i
= c
i
v
i
this is an equality, hence the LHS is maximized. The simultaneous statement is
equivalent to
_

k
i=1
a
i
(¯ y

−θ
i
)
_
2
_
s
2
p

k
i=1
a
2
i
/n
_ ≤ M for all a
1
, . . . , a
k
,
and the LHS is maximized by a
i
= n
i
(¯ y

−θ
i
). This produces the F statistic.
11.15 a. Since t
2
ν
= F
1,ν
, it follows from Exercise 5.19(b) that for k ≥ 2
P[(k −1)F
k−1,ν
≥ a] ≥ P(t
2
ν
≥ a).
So if a = t
2
ν,α/2
, the F probability is greater than α, and thus the α-level cutoﬀ for the F
must be greater than t
2
ν,α/2
.
b. The only diﬀerence in the intervals is the cutoﬀ point, so the Scheﬀ´e intervals are wider.
c. Both sets of intervals have nominal level 1 − α, but since the Scheﬀ´e intervals are wider,
tests based on them have a smaller rejection region. In fact, the rejection region is contained
in the t rejection region. So the t is more powerful.
11.16 a. If θ
i
= θ
j
for all i, j, then θ
i
−θ
j
= 0 for all i, j, and the converse is also true.
b. H
0
: θ ∈ ∩
ij
Θ
ij
and H
1
: θ ∈ ∪
ij

ij
)
c
.
11.17 a. If all of the means are equal, the Scheﬀ´e test will only reject α of the time, so the t tests
will be done only α of the time. The experimentwise error rate is preserved.
b. This follows from the fact that the t tests use a smaller cutoﬀ point, so there can be rejection
using the t test but no rejection using Scheﬀ´e. Since Scheﬀ´e has experimentwise level α,
the t test has experimentwise error greater than α.
c. The pooled standard deviation is 2.358, and the means and t statistics are
Mean t statistic
Low Medium High Med-Low High-Med High-Low
3.51. 9.27 24.93 3.86 10.49 14.36
The t statistics all have 12 degrees of freedom and, for example, t
12,.01
= 2.68, so all of the
tests reject and we conclude that the means are all signiﬁcantly diﬀerent.
11.18 a.
P(Y > a|Y > b) = P(Y > a, Y > b)/P(Y > b)
= P(Y > a)/P(Y > b) (a > b)
> P(Y > a). (P(Y > b) < 1)
b. If a is a cutoﬀ point then we would declare signiﬁcance if Y > a. But if we only check if Y is
signiﬁcant because we see a big Y (Y > b), the proper signiﬁcance level is P(Y > a|Y > b),
which will show less signiﬁcance than P(Y > a).
Second Edition 11-7
11.19 a. The marginal distributions of the Y
i
are somewhat straightforward to derive. As X
i+1

gamma(λ
i+1
, 1) and, independently,

i
j=1
X
j
∼ gamma(

i
j=1
λ
j
, 1) (Example 4.6.8), we
only need to derive the distribution of the ratio of two independent gammas. Let X ∼
gamma(λ
1
, 1) and Y ∼ gamma(λ
2
, 1). Make the transformation
u = x/y, v = y ⇒ x = uv, y = v,
with Jacobian v. The density of (U, V ) is
f(u, v) =
1
Γ(λ
1
)Γ(λ
2
)
(uv)
λ
1
−1
v
λ
2
−1
ve
−uv
e
−v
=
u
λ
1
−1
Γ(λ
1
)Γ(λ
2
)
v
λ
1

2
−1
e
−v(1+u)
.
To get the density of U, integrate with respect to v. Note that we have the kernel of a
gamma(λ
1

2
, 1/(1 +u)), which yields
f(u) =
Γ(λ
1

2
)
Γ(λ
1
)Γ(λ
2
)
u
λ
1
−1
(1 +u)
λ
1

2
−1
.
The joint distribution is a nightmare. We have to make a multivariate change of variable.
This is made a bit more palatable if we do it in two steps. First transform
W
1
= X
1
, W
2
= X
1
+X
2
, W
3
= X
1
+X
2
+X
3
, . . . , W
n
= X
1
+X
2
+· · · +X
n
,
with
X
1
= W
1
, X
2
= W
2
−W
1
, X
3
= W
3
−W
2
, . . . X
n
= W
n
−W
n−1
,
and Jacobian 1. The joint density of the W
i
is
f(w
1
, w
2
, . . . , w
n
) =
n

i=1
1
Γ(λ
i
)
(w
i
−w
i−1
)
λ
i
−1
e
−w
n
, w
1
≤ w
2
≤ · · · ≤ w
n
,
where we set w
0
= 0 and note that the exponent telescopes. Next note that
y
1
=
w
2
−w
1
w
1
, y
2
=
w
3
−w
2
w
2
, . . . y
n−1
=
w
n
−w
n−1
w
n−1
, y
n
= w
n
,
with
w
i
=
y
n

n−1
j=i
(1 +y
j
)
, i = 1, . . . , n −1, w
n
= y
n
.
Since each w
i
only involves y
j
with j ≥ i, the Jacobian matrix is triangular and the
determinant is the product of the diagonal elements. We have
dw
i
dy
i
= −
y
n
(1 +y
i
)

n−1
j=i
(1 +y
j
)
, i = 1, . . . , n −1,
dw
n
dy
n
= 1,
and
f(y
1
, y
2
, . . . , y
n
) =
1
Γ(λ
1
)
_
y
n

n−1
j=1
(1 +y
j
)
_
λ
1
−1
×
n−1

i=2
1
Γ(λ
i
)
_
y
n

n−1
j=i
(1 +y
j
)

y
n

n−1
j=i−1
(1 +y
j
)
_
λ
i
−1
e
−y
n
×
n−1

i=1
y
n
(1 +y
i
)

n−1
j=i
(1 +y
j
)
.
11-8 Solutions Manual for Statistical Inference
Factor out the terms with y
n
and do some algebra on the middle term to get
f(y
1
, y
2
, . . . , y
n
) = y
Σ
i
λ
i
−1
n
e
−y
n
1
Γ(λ
1
)
_
1

n−1
j=1
(1 +y
j
)
_
λ
1
−1
×
n−1

i=2
1
Γ(λ
i
)
_
y
i−1
1 +y
i−1
1

n−1
j=i
(1 +y
j
)
_
λ
i
−1
×
n−1

i=1
1
(1 +y
i
)

n−1
j=i
(1 +y
j
)
.
We see that Y
n
is independent of the other Y
i
(and has a gamma distribution), but there
does not seem to be any other obvious conclusion to draw from this density.
b. The Y
i
are related to the F distribution in the ANOVA. For example, as long as the sum
of the λ
i
are integers,
Y
i
=
X
i+1

i
j=1
X
j
=
2X
i+1
2

i
j=1
X
j
=
χ
2
λ
i+1
χ
2

i
j=1
λ
j
∼ F
λ
i+1
,

i
j=1
λ
j
.
Note that the F density makes sense even if the λ
i
are not integers.
11.21 a.
Grand mean ¯ y
··
=
188.54
15
= 12.57
Total sum of squares =
3

i=1
5

j=1
(y
ij
− ¯ y
··
)
2
= 1295.01.
Within SS =
3

1
5

1
(y
ij
− ¯ y

)
2
=
5

1
(y
1j
−3.508)
2
+
5

1
(y
2j
−9.274)
2
+
5

1
(y
3j
−24.926)
2
= 1.089 + 2.189 + 63.459 = 66.74
Between SS = 5
_
3

1
(y
ij
− ¯ y

)
2
_
= 5(82.120 + 10.864 + 152.671) = 245.65 · 5 = 1228.25.
ANOVA table:
Source df SS MS F
Treatment 2 1228.25 614.125 110.42
Within 12 66.74 5.562
Total 14 1294.99
Note that the total SS here is diﬀerent from above – round oﬀ error is to blame. Also,
F
2,12
= 110.42 is highly signiﬁcant.
b. Completing the proof of (11.2.4), we have
k

i=1
n
i

j=1
(y
ij

¯
¯ y)
2
=
k

i=1
n
i

j=1
((y
ij
− ¯ y

) +(¯ y
i

¯
¯ y))
2
Second Edition 11-9
=
k

i=1
n
i

j=1
(y
ij
− ¯ y

)
2
+
k

i=1
n
i

j=1
(¯ y

¯
¯ y)
2
+
k

i=1
n
i

j=1
(y
ij
− ¯ y

) (¯ y

¯
¯ y) ,
where the cross term (the sum over j) is zero, so the sum of squares is partitioned as
k

i=1
n
i

j=1
(y
ij
− ¯ y

)
2
+
k

i=1
n
i
(¯ y
i

¯
¯ y)
2
c. From a), the F statistic for the ANOVA is 110.42. The individual two-sample t’s, using
s
2
p
=
1
15−3
(66.74) = 5.5617, are
t
2
12
=
(3.508 −9.274)
2
(5.5617)(2/5)
=
33.247
2.2247
= 14.945,
t
2
13
=
(3.508 −24.926)
2
2.2247
= 206.201,
t
2
23
=
(9.274 −24.926)
2
2.2247
= 110.122,
and
2(14.945) + 2(206.201) + (110.122)
6
= 110.42 = F.
11.23 a.
EY
ij
= E(µ +τ
i
+b
j
+
ij
) = µ +τ
i
+ Eb
j
+ E
ij
= µ +τ
i
VarY
ij
= Varb
j
+ Var
ij
= σ
2
B

2
,
by independence of b
j
and
ij
.
b.
Var
_
n

i=1
a
i
¯
Y

_
=
n

i=1
a
2
i
Var
¯
Y

+ 2

i>i

Cov(a
i
Y

, a
i
Y
i

·
).
The ﬁrst term is
n

i=1
a
2
i
Var
¯
Y

=
n

i=1
a
2
i
Var
_
_
1
r
r

j=1
µ +τ
i
+b
j
+
ij
_
_
=
1
r
2
n

i=1
a
2
i
(rσ
2
B
+rσ
2
)
from part (a). For the covariance
E
¯
Y

= µ +τ
i
,
and
E(
¯
Y

¯
Y
i

·
) = E
_
_
_
_
µ +τ
i
+
1
r

j
(b
j
+
ij
)
_
_
_
_
µ +τ
i
+
1
r

j
(b
j
+
i

j
)
_
_
_
_
= (µ +τ
i
)(µ +τ
i
) +
1
r
2
E
_
_
_
_

j
(b
j
+
ij
)
_
_
_
_

j
(b
j
+
i

j
)
_
_
_
_
11-10 Solutions Manual for Statistical Inference
since the cross terms have expectation zero. Next, expanding the product in the second term
again gives all zero cross terms, and we have
E(
¯
Y

¯
Y
i

·
) = (µ +τ
i
)(µ +τ
i
) +
1
r
2
(rσ
2
B
),
and
Cov(
¯
Y

,
¯
Y
i

·
) = σ
2
B
/r.
Finally, this gives
Var
_
n

i=1
a
i
¯
Y

_
=
1
r
2
n

i=1
a
2
i
(rσ
2
B
+rσ
2
) + 2

i>i

a
i
a
i
σ
2
B
/r
=
1
r
_
n

i=1
a
2
i
σ
2

2
B
(
n

i=1
a
i
)
2
_
=
1
r
σ
2
n

i=1
a
2
i
=
1
r

2

2
B
)(1 −ρ)
n

i=1
a
2
i
,
where, in the third equality we used the fact that

i
a
i
= 0.
11.25 Diﬀerentiation yields
a.

∂c

[y
i
−(c+dx
i
)] (−1)
set
= 0 ⇒ nc +d

x
i
=

y
i

∂d

[y
i
−(c
i
+dx
i
)] (−x
i
)
set
= 0 ⇒ c

x
i
+d

x
2
i
=

x
i
y
i
.
b. Note that nc +d

x
i
=

y
i
⇒c = ¯ y −d¯ x. Then
(¯ y −d¯ x)

x
i
+d

x
2
i
=

x
i
y
i
and d
_

x
2
i
−n¯ x
2
_
=

x
i
y
i

x
i
¯ y
which simpliﬁes to d =

x
i
(y
i
− ¯ y)/

(x
i
− ¯ x)
2
. Thus c and d are the least squares
estimates.
c. The second derivatives are

2
∂c
2

2
∂c∂d

x
i
,

2
∂d
2

x
2
i
.
Thus the Jacobian of the second-order partials is
¸
¸
¸
¸
n

x
i

x
i

x
2
i
¸
¸
¸
¸
= n

x
2
i

_

x
i
_
2
= n

(x
i
− ¯ x)
2
> 0.
11.27 For the linear estimator

i
a
i
Y
i
to be unbiased for α we have
E
_

i
a
i
Y
i
_
=

i
a
i
(α +βx
i
) = α ⇒

i
a
i
= 1 and

i
a
i
x
i
= 0.
Since Var

i
a
i
Y
i
= σ
2

i
a
2
i
, we need to solve:
minimize

i
a
2
i
subject to

i
a
i
= 1 and

i
a
i
x
i
= 0.
Second Edition 11-11
A solution can be found with Lagrange multipliers, but verifying that it is a minimum is
excruciating. So instead we note that

i
a
i
= 1 ⇒a
i
=
1
n
+k(b
i

¯
b),
for some constants k, b
1
, b
2
, . . . , b
n
, and

i
a
i
x
i
= 0 ⇒k =
−¯ x

i
(b
i

¯
b)(x
i
− ¯ x)
and a
i
=
1
n

¯ x(b
i

¯
b)

i
(b
i

¯
b)(x
i
− ¯ x)
.
Now

i
a
2
i
=

i
_
1
n

¯ x(b
i

¯
b)

i
(b
i

¯
b)(x
i
− ¯ x)
_
2
=
1
n
+
¯ x
2

i
(b
i

¯
b)
2
[

i
(b
i

¯
b)(x
i
− ¯ x)]
2
,
since the cross term is zero. So we need to minimize the last term. From Cauchy-Schwarz we
know that

i
(b
i

¯
b)
2
[

i
(b
i

¯
b)(x
i
− ¯ x)]
2

1

i
(x
i
− ¯ x)]
2
,
and the minimum is attained at b
i
= x
i
. Substituting back we get that the minimizing a
i
is
1
n

¯ x(x
i
−¯ x)

i
(x
i
−¯ x)
2
, which results in

i
a
i
Y
i
=
¯
Y −
ˆ
β¯ x, the least squares estimator.
11.28 To calculate
max
σ
2
L(σ
2
|y, ˆ α
ˆ
β) = max
σ
2
_
1
2πσ
2
_
n/2
e

1
2
Σ
i
[y
i
−(ˆ α+
ˆ
βx
i
)]
2

2
take logs and diﬀerentiate with respect to σ
2
to get
d

2
log L(σ
2
|y, ˆ α,
ˆ
β) = −
n

2
+
1
2

i
[y
i
−(ˆ α +
ˆ
βx
i
)]
2

2
)
2
.
Set this equal to zero and solve for σ
2
. The solution is ˆ σ
2
.
11.29 a.

i
= E(Y
i
− ˆ α −
ˆ
βx
i
) = (α +βx
i
) −α −βx
i
= 0.
b.
Varˆ
i
= E[Y
i
− ˆ α −
ˆ
βx
i
]
2
= E[(Y
i
−α −βx
i
) −(ˆ α −α) −x
i
(
ˆ
β −β)]
2
= VarY
i
+ Varˆ α +x
2
i
Var
ˆ
β −2Cov(Y
i
, ˆ α) −2x
i
Cov(Y
i
,
ˆ
β) + 2x
i
Cov(ˆ α,
ˆ
β).
11.30 a. Straightforward algebra shows
ˆ α = ¯ y −
ˆ
β¯ x
=

1
n
y
i

¯ x

(x
i
− ¯ x)y
i

(x
i
− ¯ x)
2
=

_
1
n

¯ x(x
i
− ¯ x)

(x
i
− ¯ x)
2
_
y
i
.
11-12 Solutions Manual for Statistical Inference
b. Note that for c
i
=
1
n

¯ x(x
i
−¯ x)

(x
i
−¯ x)
2
,

c
i
= 1 and

c
i
x
i
= 0. Then
Eˆ α = E

c
i
Y
i
=

c
i
(α +βx
i
= α,
Varˆ α =

c
2
i
VarY
i
= σ
2

c
2
i
,
and

c
2
i
=

_
1
n

¯ x(x
i
− ¯ x)

(x
i
− ¯ x)
2
_
2
=

1
n
2
+

¯ x
2
(x
i
− ¯ x)
2
(

(x
i
− ¯ x)
2
)
2
(cross term = 0)
=
1
n
+
¯ x
2

(x
i
− ¯ x)
2
=

x
2
i
nS
xx
.
c. Write
ˆ
β =

d
i
y
i
, where
d
i
=
x
i
− ¯ x

(x
i
− ¯ x)
2
.
From Exercise 11.11,
Cov(ˆ α,
ˆ
β) = Cov
_

c
i
Y
i
,

d
i
Y
i
_
= σ
2

c
i
d
i
= σ
2

_
1
n

¯ x(x
i
− ¯ x)

(x
i
− ¯ x)
2
_
(x
i
− ¯ x)

(x
i
− ¯ x)
2
=
−σ
2
¯ x

(x
i
− ¯ x)
2
.
11.31 The fact that
ˆ
i
=

i

ij
−(c
j
+d
j
x
i
)]Y
j
follows directly from (11.3.27) and the deﬁnition of c
j
and d
j
. Since ˆ α =

i
c
i
Y
i
, from Lemma
11.3.2
Cov(ˆ
i
, ˆ α) = σ
2

j
c
j

ij
−(c
j
+d
j
x
i
)]
= σ
2
_
_
c
i

j
c
j
(c
j
+d
j
x
i
)
_
_
= σ
2
_
_
c
i

j
c
2
j
−x
i

j
c
j
d
j
_
_
.
Substituting for c
j
and d
j
gives
c
i
=
1
n

(x
i
− ¯ x)¯ x
S
xx

j
c
2
j
=
1
n
+
¯ x
2
S
xx
x
i

j
c
j
d
j
= −
x
i
¯ x
S
xx
,
and substituting these values shows Cov(ˆ
i
, ˆ α) = 0. Similarly, for
ˆ
β,
Cov(ˆ
i
,
ˆ
β) = σ
2
_
_
d
i

j
c
j
d
j
−x
i

j
d
2
j
_
_
Second Edition 11-13
with
d
i
=
(x
i
− ¯ x)
S
xx

j
c
j
d
j
= −
¯ x
S
xx
x
i

j
d
2
j
=
1
S
xx
,
and substituting these values shows Cov(ˆ
i
,
ˆ
β) = 0.
11.32 Write the models as
3y
i
= α +βx
i
+
i
y
i
= α

(x
i
− ¯ x) +
i
= α

z
i
+
i
.
a. Since ¯ z = 0,
ˆ
β =

(x
i
− ¯ x)(y
i
− ¯ y)

(x
i
− ¯ x)
2
=

z
i
(y
i
− ¯ y)

z
2
i
=
ˆ
β

.
b.
ˆ α = ¯ y −
ˆ
β¯ x,
ˆ α

= ¯ y −
ˆ
β

¯ z = ¯ y
since ¯ z = 0.
ˆ α

∼ n(α +β¯ z, σ
2
/n) = n(α, σ
2
/n).
c. Write
ˆ α

=

1
n
y
i
ˆ
β

=

_
z
i

z
2
i
_
y
i
.
Then
Cov(ˆ α,
ˆ
β) = −σ
2

1
n
_
z
i

z
2
i
_
= 0,
since

z
i
= 0.
11.33 a. From (11.23.25), β = ρ(σ
Y

X
), so β = 0 if and only if ρ = 0 (since we assume that the
variances are positive).
b. Start from the display following (11.3.35). We have
ˆ
β
2
S
2
/S
xx
=
S
2
xy
/S
xx
= (n −2)
S
2
xy
_
S
yy
−S
2
xy
/S
xx
_
S
xx
= (n −2)
S
2
xy
_
S
yy
S
xx
−S
2
xy
_,
and dividing top and bottom by S
yy
S
xx
ﬁnishes the proof.
c. From (11.3.33) if ρ = 0 (equivalently β = 0), then
ˆ
β/(S/

S
xx
) =

n −2 r/

1 −r
2
has a
t
n−2
distribution.
11-14 Solutions Manual for Statistical Inference
11.34 a. ANOVA table for height data
Source df SS MS F
Regression 1 60.36 60.36 50.7
Residual 6 7.14 1.19
Total 7 67.50
The least squares line is ˆ y = 35.18 +.93x.
b. Since y
i
− ¯ y = (y
i
− ˆ y
i
) + (ˆ y
i
− ¯ y), we just need to show that the cross term is zero.
n

i=1
(y
i
− ˆ y
i
)(ˆ y
i
− ¯ y) =
n

i=1
_
y
i
−(ˆ α +
ˆ
βx
i
)
_ _
(ˆ α +
ˆ
βx
i
) − ¯ y
_
=
n

i=1
_
(ˆ y
i
− ¯ y) −
ˆ
β(x
i
− ¯ x)
_ _
ˆ
β(x
i
− ¯ x)
_
(ˆ α = ¯ y −
ˆ
β¯ x)
=
ˆ
β
n

i=1
(x
i
− ¯ x)(y
i
− ¯ y) −
ˆ
β
2
n

i=1
(x
i
− ¯ x)
2
= 0,
from the deﬁnition of
ˆ
β.
c.

(ˆ y
i
− ¯ y)
2
=
ˆ
β
2

(x
i
− ¯ x)
2
=
S
2
xy
S
xx
.
11.35 a. For the least squares estimate:
d

i
(y
i
−θx
2
i
)
2
= 2

i
(y
i
−θx
2
i
)x
2
i
= 0
which implies
ˆ
θ =

i
y
i
x
2
i

i
x
4
i
.
b. The log likelihood is
log L = −
n
2
log(2πσ
2
) −
1

2

i
(y
i
−θx
2
i
)
2
,
and maximizing this is the same as the minimization in part (a).
c. The derivatives of the log likelihood are
d

log L =
1
σ
2

i
(y
i
−θx
2
i
)x
2
i
d
2

2
log L =
−1
σ
2

i
x
4
i
,
so the CRLB is σ
2
/

i
x
4
i
. The variance of
ˆ
θ is
Var
ˆ
θ = Var
_
i
y
i
x
2
i

i
x
4
i
_
=

i
_
x
2
i

j
x
4
j
_
σ
2
= σ
2
/

i
x
4
i
,
so
ˆ
θ is the best unbiased estimator.
Second Edition 11-15
11.36 a.
Eˆ α = E(
¯
Y −
ˆ
β
¯
X) = E
_
E(
¯
Y −
ˆ
β
¯
X|
¯
X)
_
= E
_
α+β
¯
X −β
¯
X
¸
= Eα = α.
E
ˆ
β = E[E(
ˆ
β|
¯
X)] = Eβ = β.
b. Recall
VarY = Var[E(Y |X)] + E[Var(Y |X)]
Cov(Y , Z) = Cov[E(Y |X), E(Z|X)] + E[Cov(Y, Z|X)].
Thus
Varˆ α = E[Var(ˆ α|X)] = σ
2
E
_

X
2
i
_
S
XX
_
Var
ˆ
β = σ
2
E[1/S
XX
]
Cov(ˆ α,
ˆ
β) = E[Cov(ˆ α,
ˆ
β|
ˆ
X)] = −σ
2
E[
¯
X/S
XX
].
11.37 This is almost the same problem as Exercise 11.35. The log likelihood is
log L = −
n
2
log(2πσ
2
) −
1

2

i
(y
i
−βx
i
)
2
.
The MLE is

i
x
i
y
i
/

i
x
2
i
, with mean β and variance σ
2
/

i
x
2
i
, the CRLB.
11.38 a. The model is y
i
= θx
i
+
i
, so the least squares estimate of θ is

x
i
y
i
/

x
2
i
(regression
through the origin).
E
_
x
i
Y
i

x
2
i
_
=

x
i
(x
i
θ)

x
2
i
= θ
Var
_
x
i
Y
i

x
2
i
_
=

x
2
i
(x
i
θ)
(

x
2
i
)
2
= θ

x
3
i
(

x
2
i
)
2
.
The estimator is unbiased.
b. The likelihood function is
L(θ|x) =
n

i=1
e
−θx
i
(θx
i
)
y
i
(y
i
)!
=
e
−θΣx
i

(θx
i
)
y
i

y
i
!

∂θ
logL =

∂θ
_
−θ

x
i
+

y
i
log(θx
i
) −log

y
i
!
_
= −

x
i
+

x
i
y
i
θx
i
set
= 0
which implies
ˆ
θ =

y
i

x
i
E
ˆ
θ =

θx
i

x
i
= θ and Var
ˆ
θ = Var
_
y
i

x
i
_
=

θx
i
(

x
i
)
2
=
θ

x
i
.
c.

2
∂θ
2
log L =

∂θ
_

x
i
+

y
i
θ
_
=

y
i
θ
2
and E −

2
∂θ
2
log L =

x
i
θ
.
Thus, the CRLB is θ/

x
i
, and the MLE is the best unbiased estimator.
11-16 Solutions Manual for Statistical Inference
11.39 Let A
i
be the set
A
i
=
_
_
_
ˆ α,
ˆ
β :
_
(ˆ α +
ˆ
βx
0i
) −(α +βx
0i
)
_ _
_
_
S
¸
1
n
+
(x
0i
− ¯ x)
2
S
xx
_
_
≤ t
n−2,α/2m
_
_
_
.
Then P (∩
m
i=1
A
i
) is the probability of simultaneous coverage, and using the Bonferroni In-
equality (1.2.10) we have
P (∩
m
i=1
A
i
) ≥
m

i=1
P(A
i
) −(m−1) =
m

i=1
_
1 −
α
m
_
−(m−1) = 1 −α.
11.41 Assume that we have observed data (y
1
, x
1
), (y
2
, x
2
), . . . , (y
n−1
, x
n−1
) and we have x
n
but not
y
n
. Let φ(y
i
|x
i
) denote the density of Y
i
, a n(a +bx
i
, σ
2
).
a. The expected complete-data log likelihood is
E
_
n

i=1
log φ(Y
i
|x
i
)
_
=
n−1

i=1
log φ(y
i
|x
i
) + Elog φ(Y |x
n
),
where the expectation is respect to the distribution φ(y|x
n
) with the current values of the
parameter estimates. Thus we need to evaluate
Elog φ(Y |x
n
) = E
_

1
2
log(2πσ
2
1
) −
1

2
1
(Y −µ
1
)
2
_
,
where Y ∼ n(µ
0
, σ
2
0
). We have
E(Y −µ
1
)
2
= E([Y −µ
0
] + [µ
0
−µ
1
])
2
= σ
2
0
+ [µ
0
−µ
1
]
2
,
since the cross term is zero. Putting this all together, the expected complete-data log
likelihood is

n
2
log(2πσ
2
1
) −
1

2
1
n−1

i=1
[y
i
−(a
1
+b
1
x
i
)]
2

σ
2
0
+ [(a
0
+b
0
x
n
) −(a
1
+b
1
x
n
)]
2

2
1
= −
n
2
log(2πσ
2
1
) −
1

2
1
n

i=1
[y
i
−(a
1
+b
1
x
i
)]
2

σ
2
0

2
1
if we deﬁne y
n
= a
0
+b
0
x
n
.
b. For ﬁxed a
0
and b
0
, maximizing this likelihood gives the least squares estimates, while the
maximum with respect to σ
2
1
is
ˆ σ
2
1
=

n
i=1
[y
i
−(a
1
+b
1
x
i
)]
2

2
0
n
.
So the EM algorithm is the following: At iteration t, we have estimates ˆ a
(t)
,
ˆ
b
(t)
, and ˆ σ
2(t)
.
We then set y
(t)
n
= ˆ a
(t)
+
ˆ
b
(t)
x
n
(which is essentially the E-step) and then the M-step is
to calculate ˆ a
(t+1)
and
ˆ
b
(t+1)
as the least squares estimators using (y
1
, x
1
), (y
2
, x
2
), . . .
(y
n−1
, x
n−1
), (y
(t)
n
, x
n
), and
ˆ σ
2(t+1)
1
=

n
i=1
[y
i
−(a
(t+1)
+b
(t+1)
x
i
)]
2

2(t)
0
n
.
Second Edition 11-17
c. The EM calculations are simple here. Since y
(t)
n
= ˆ a
(t)
+
ˆ
b
(t)
x
n
, the estimates of a and b
must converge to the least squares estimates (since they minimize the sum of squares of
the observed data, and the last term adds nothing. For ˆ σ
2
we have (substituting the least
squares estimates) the stationary point
ˆ σ
2
=

n
i=1
[y
i
−(ˆ a +
ˆ
bx
i
)]
2
+ ˆ σ
2
n
⇒ ˆ σ
2
= σ
2
obs
,
where σ
2
obs
is the MLE from the n−1 observed data points. So the MLE s are the same as
those without the extra x
n
.
d. Now we use the bivariate normal density (see Deﬁnition 4.5.10 and Exercise 4.45 ). Denote
the density by φ(x, y). Then the expected complete-data log likelihood is
n−1

i=1
log φ(x
i
, y
i
) + Elog φ(X, y
n
),
where after iteration t the missing data density is the conditional density of X given Y = y
n
,
X|Y = y
n
∼ n
_
µ
(t)
X

(t)

(t)
X

(t)
Y
)(y
n
−µ
(t)
Y
), (1 −ρ
2(t)

2(t)
X
_
.
Denoting the mean by µ
0
and the variance by σ
2
0
, the expected value of the last piece in
the likelihood is
Elog φ(X, y
n
)
= −
1
2
log(2πσ
2
X
σ
2
Y
(1 −ρ
2
))

1
2(1 −ρ
2
)
_
E
_
X −µ
X
σ
X
_
2
−2ρE
_
(X −µ
X
)(y
n
−µ
Y
)
σ
X
σ
Y
_
+
_
y
n
−µ
Y
σ
Y
_
2
_
= −
1
2
log(2πσ
2
X
σ
2
Y
(1 −ρ
2
))

1
2(1 −ρ
2
)
_
σ
2
0
σ
2
X
+
_
µ
0
−µ
X
σ
X
_
2
−2ρ
_

0
−µ
X
)(y
n
−µ
Y
)
σ
X
σ
Y
_
+
_
y
n
−µ
Y
σ
Y
_
2
_
.
So the expected complete-data log likelihood is
n−1

i=1
log φ(x
i
, y
i
) + log φ(µ
0
, y
n
) −
σ
2
0
2(1 −ρ
2

2
X
.
The EM algorithm is similar to the previous one. First note that the MLEs of µ
Y
and σ
2
Y
are the usual ones, ¯ y and ˆ σ
2
Y
, and don’t change with the iterations. We update the other
estimates as follows. At iteration t, the E-step consists of replacing x
(t)
n
by
x
(t+1)
n
= ˆ µ
(t)
X

(t)
σ
(t)
X
σ
(t)
Y
(y
n
− ¯ y).
Then µ
(t+1)
X
= ¯ x and we can write the likelihood as

1
2
log(2πσ
2
X
ˆ σ
2
Y
(1 −ρ
2
)) −
1
2(1 −ρ
2
)
_
S
xx

2
0
σ
2
X
−2ρ
S
xy
σ
X
ˆ σ
Y
+
S
yy
ˆ σ
2
Y
_
.
11-18 Solutions Manual for Statistical Inference
which is the usual bivariate normal likelihood except that we replace S
xx
with S
xx
+ σ
2
0
.
So the MLEs are the usual ones, and the EM iterations are
x
(t+1)
n
= ˆ µ
(t)
X

(t)
σ
(t)
X
σ
(t)
Y
(y
n
− ¯ y)
ˆ µ
(t+1)
X
= ¯ x
(t)
ˆ σ
2(t+1)
X
=
S
(t)
xx
+ (1 − ˆ ρ
2(t)
)ˆ σ
2(t)
X
n
ˆ ρ
(t+1)
=
S
(t)
xy
_
(S
(t)
xx
+ (1 − ˆ ρ
2(t)
)ˆ σ
2(t+1)
X
)S
yy
.
Here is R code for the EM algorithm:
nsim<-20;
xdata0<-c(20,19.6,19.6,19.4,18.4,19,19,18.3,18.2,18.6,19.2,18.2,
18.7,18.5,18,17.4,16.5,17.2,17.3,17.8,17.3,18.4,16.9)
ydata0<-(1,1.2,1.1,1.4,2.3,1.7,1.7,2.4,2.1,2.1,1.2,2.3,1.9,2.4,2.6,
2.9,4,3.3,3,3.4,2.9,1.9,3.9,4.2)
nx<-length(xdata0);
ny<-length(ydata0);
#initial values from mles on the observed data#
xmean<-18.24167;xvar<-0.9597797;ymean<-2.370833;yvar<- 0.8312327;
rho<- -0.9700159;
for (j in 1:nsim) {
#This is the augmented x (O2) data#
xdata<-c(xdata0,xmean+rho*(4.2-ymean)/(sqrt(xvar*yvar)))
xmean<-mean(xdata);
Sxx<-(ny-1)*var(xdata)+(1-rho^2)*xvar
xvar<-Sxx/ny
rho<-cor(xdata,ydata0)*sqrt((ny-1)*var(xdata)/Sxx)
}
The algorithm converges very quickly. The MLEs are
ˆ µ
X
= 18.24, ˆ µ
Y
= 2.37, ˆ σ
2
X
= .969, ˆ σ
2
Y
= .831, ˆ ρ = −0.969.
Chapter 12
Regression Models
12.1 The point (ˆ x

, ˆ y

) is the closest if it lies on the vertex of the right triangle with vertices (x

, y

)
and (x

, a +bx

). By the Pythagorean theorem, we must have
_
(ˆ x

−x

)
2
+
_
ˆ y

−(a +bx

)
_
2
_
+
_
(ˆ x

−x

)
2
+(ˆ y

−y

)
2
_
= (x

−x

)
2
+ (y

−(a +bx

))
2
.
Substituting the values of ˆ x

and ˆ y

from (12.2.7) we obtain for the LHS above
_
_
b(y

−bx

−a)
1+b
2
_2
+
_
b
2
(y

−bx

−a)
1+b
2
_2
_
+
_
_
b(y

−bx

−a)
1+b
2
_2
+
_
y

−bx−a)
1+b
2
_
2
_
= (y

−(a +bx

))
2
_
b
2
+b
4
+b
2
+1
(1+b
2
)
2
_
= (y

−(a +bx

))
2
.
12.3 a. Diﬀerentiation yields ∂f/∂ξ
i
= −2(x
i
− ξ
i
) − 2λβ [y
i
−(α+βξ
i
)]
set
= 0 ⇒ ξ
i
(1 + λβ
2
) =
x
i
−λβ(y
i
−α), which is the required solution. Also, ∂
2
f/∂ξ
2
= 2(1 +λβ
2
) > 0, so this is a
minimum.
b. Parts i), ii), and iii) are immediate. For iv) just note that D is Euclidean distance between
(x
1
,

λy
1
) and (x
2
,

λy
2
), hence satisﬁes the triangle inequality.
12.5 Diﬀerentiate log L, for L in (12.2.17), to get

∂σ
2
δ
log L =
−n
σ
2
δ
+
1
2(σ
2
δ
)
2
λ
1+
ˆ
β
2
n

i=1
_
y
i
−(ˆ α +
ˆ
βx
i
)
_
2
.
Set this equal to zero and solve for σ
2
δ
12.7 a. Suppressing the subscript i and the minus sign, the exponent is
(x−ξ)
2
σ
2
δ
+
[y−(α+βξ)]
2
σ
2

=
_
σ
2

2
σ
2
δ
σ
2

σ
2
δ
_
(ξ−k)
2
+
[y−(α+βx)]
2
σ
2

2
σ
2
δ
,
where k =
σ
2

x+σ
2
δ
β(y−α)
σ
2

2
σ
2
δ
. Thus, integrating with respect to ξ eliminates the ﬁrst term.
b. The resulting function must be the joint pdf of X and Y . The double integral is inﬁnite,
however.
12.9 a. From the last two equations in (12.2.19),
ˆ σ
2
δ
=
1
n
S
xx
− ˆ σ
2
ξ
=
1
n
S
xx

1
n
S
xy
ˆ
β
,
which is positive only if S
xx
> S
xy
/
ˆ
β. Similarly,
ˆ σ
2

=
1
n
S
yy

ˆ
β
2
ˆ σ
2
ξ
=
1
n
S
yy

ˆ
β
2
1
n
S
xy
ˆ
β
,
which is positive only if S
yy
>
ˆ
βS
xy
.
12-2 Solutions Manual for Statistical Inference
b. We have from part a), ˆ σ
2
δ
> 0 ⇒ S
xx
> S
xy
/
ˆ
β and ˆ σ
2

> 0 ⇒ S
yy
>
ˆ
βS
xy
. Furthermore,
ˆ σ
2
ξ
> 0 implies that S
xy
and
ˆ
β have the same sign. Thus S
xx
> |S
xy
|/|
ˆ
β| and S
yy
> |
ˆ
β||S
xy
|.
Combining yields
|S
xy
|
S
xx
<
¸
¸
¸
ˆ
β
¸
¸
¸ <
S
yy
|S
xy
|
.
12.11 a.
Cov(aY +bX, cY +dX)
= E(aY +bX)(cY +dX) −E(aY +bX)E(cY +dX)
= E
_
acY
2
2
_
−E(aY +bX)E(cY +dX)
= acVarY +ac(EY )
2
2
−E(aY +bX)E(cY +dX)
= acVarY + (bc +ad)Cov(X, Y ) +bdVarX.
b. Identify a = βλ, b = 1, c = 1, d = −β, and using (12.3.19)
Cov(βλY
i
+X
i
, Y
i
−βX
i
) = βλVarY + (1 −λβ
2
)Cov(X, Y ) −βVarX
= βλ
_
σ
2

2
σ
2
ξ
_
+ (1 −λβ
2
)βσ
2
ξ
−β
_
σ
2
δ

2
ξ
_
= βλσ
2

−βσ
2
δ
= 0
if λσ
2

= σ
2
δ
. (Note that we did not need the normality assumption, just the moments.)
c. Let W
i
= βλY
i
+ X
i
, V
i
= Y
i
+ βX
i
. Exercise 11.33 shows that if Cov(W
i
, V
i
) = 0,
then

n −2r/

1 −r
2
has a t
n−2
distribution. Thus

n −2r
λ
(β)/
_
1 −r
2
λ
(β) has a t
n−2
distribution for all values of β, by part (b). Also
P
__
β:
(n −2)r
2
λ(β)
1 −r
2
λ(β)
≤ F
1,n−2,α
__
= P
__
(X, Y ):
(n −2)r
2
λ
(β)
1 −r
2
λ
(β)
≤ F
1,n−2,α
__
= 1 −α.
12.13 a. Rewrite (12.2.22) to get
_
β:
ˆ
β −
tˆ σ
β

n −2
≤ β ≤
ˆ
β +
tˆ σ
β

n −2
_
=
_
_
_
β:
(
ˆ
β−β)
2
σ
2
β
_
(n −2)
≤ F
_
_
_
.
b. For
ˆ
β of (12.2.16), the numerator of r
λ
(β) in (12.2.22) can be written
βλS
yy
+(1−β
2
λ)S
xy
−βS
xy
= β
2
(λS
xy
) +β(S
xx
−λS
yy
) +S
xy
= λS
xy
(β −
ˆ
β)
_
β+
1
λ
ˆ
β
_
.
Again from (12.2.22), we have
r
2
λ
(β)
1 −r
2
λ
(β)
=
_
βλS
yy
+(1−β
2
λ)S
xy
−βS
xy
_
2

2
λ
2
S
yy
+2βλS
xy
+S
xx
) (S
yy
−2βS
xy

2
S
xx
) −(βλS
yy
+(1−β
2
λ)S
xy
−βS
xx
)
2
,
and a great deal of straightforward (but tedious) algebra will show that the denominator
of this expression is equal to
(1 +λβ
2
)
2
_
S
yy
S
xx
−S
2
xy
_
.
Second Edition 12-3
Thus
r
2
λ
(β)
1 −r
2
λ
(β)
= y
λ
2
S
2
xy
_
β −
ˆ
β
_
2
_
β+
1
λ
ˆ
β
_
2
(1−λβ
2
)
2
_
S
yy
S
x
−S
2
xy
_
=
_
β−
ˆ
β
_
2
ˆ σ
2
β
_
1+λβ
ˆ
β
1+λβ
2
_
2
(1 +λ
ˆ
β
2
)
2
S
2
xy
ˆ
β
2
_
(S
xx
−λS
yy
)
2
+ 4λS
2
xy
_,
after substituting ˆ σ
2
β
from page 588. Now using the fact that
ˆ
β and −1/λ
ˆ
β are both roots
of the same quadratic equation, we have
(1+λ
ˆ
β
2
)
2
ˆ
β
2
=
_
1
ˆ
β

ˆ
β
_
2
=
(S
xx
−λS
yy
)
2
+4λS
2
xy
S
2
xy
.
Thus the expression in square brackets is equal to 1.
12.15 a.
π(−α/β) =
e
α+β(−α/β)
1 +e
α+β(−α/β)
=
e
0
1 +e
0
=
1
2
.
b.
π((−α/β) +c) =
e
α+β((−α/β)+c)
1 +e
α+β((−α/β)+c)
=
e
βc
1 +e
βc
,
and
1 −π((−α/β) −c) = 1 −
e
−βc
1 +e
−βc
=
e
βc
1 +e
βc
.
c.
d
dx
π(x) = β
e
α+βx
[1 +e
α+βx
]
2
= βπ(x)(1 −π(x)).
d. Because
π(x)
1 −π(x)
= e
α+βx
,
the result follows from direct substitution.
e. Follows directly from (d).
f. Follows directly from

∂α
F(α +βx) = f(α +βx) and

∂β
F(α +βx) = xf(α +βx).
g. For F(x) = e
x
/(1 +e
x
), f(x) = F(x)(1 −F(x)) and the result follows. For F(x) = π(x) of
(12.3.2), from part (c) if follows that
f
F(1−F)
= β.
12.17 a. The likelihood equations and solution are the same as in Example 12.3.1 with the exception
that here π(x
j
) = Φ(α +βx
j
), where Φ is the cdf of a standard normal.
b. If the 0 − 1 failure response in denoted “oring” and the temperature data is “temp”, the
following R code will generate the logit and probit regression:
12-4 Solutions Manual for Statistical Inference
For the logit model we have
Estimate Std. Error z value Pr(> |z|)
Intercept 15.0429 7.3719 2.041 0.0413
temp −0.2322 0.1081 −2.147 0.0318
and for the probit model we have
Estimate Std. Error z value Pr(> |z|)
Intercept 8.77084 3.86222 2.271 0.0232
temp −0.13504 0.05632 −2.398 0.0165
Although the coeﬃcients are diﬀerent, the ﬁt is qualitatively the same, and the probability
of failure at 31

, using the probit model, is .9999.
12.19 a. Using the notation of Example 12.3.1, the likelihood (joint density) is
J

j=1
_
e
α+βx
j
1 +e
α+βx
j
_
y

j
_
1
1 +e
α+βx
j
_
n
j
−y

j
=
J

j=1
_
1
1 +e
α+βx
j
_
n
j
e
α

j
y

j

j
x
j
y

j
.
By the Factorization Theorem,

j
y

j
and

j
x
j
y

j
are suﬃcient.
b. Straightforward substitution.
12.21 Since
d

log(π/(1 −π)) = 1/(π(1 −π)),
Var log
_
ˆ π
1 − ˆ π
_

_
1
π(1 −π)
_
2
π(1 −π)
n
=
1
nπ(1 −π)
12.23 a. If

a
i
= 0,
E

i
a
i
Y
i
=

i
a
i
[α +βx
i
+µ(1 −δ)] = β

i
a
i
x
i
= β
for a
i
= x
i
− ¯ x.
b.
E(
¯
Y −β¯ x) =
1
n

i
[α +βx
i
+µ(1 −δ)] −β¯ x = α +µ(1 −δ),
so the least squares estimate a is unbiased in the model Y
i
= α

+ βx
i
+
i
, where α

=
α +µ(1 −δ).
12.25 a. The least absolute deviation line minimizes
|y
1
−(c +dx
1
)| +|y
2
−(c +dx
1
)| +|y
3
−(c +dx
3
)| .
Any line that lies between (x
1
, y
1
) and (x
1
, y
2
) has the same value for the sum of the ﬁrst
two terms, and this value is smaller than that of any line outside of (x
1
, y
1
) and (x
2
, y
2
).
Of all the lines that lie inside, the ones that go through (x
3
, y
3
) minimize the entire sum.
b. For the least squares line, a = −53.88 and b = .53. Any line with b between (17.9−14.4)/9 =
.39 and (17.9 −11.9)/9 = .67 and a = 17.9 −136b is a least absolute deviation line.
12.27 In the terminology of M-estimators (see the argument on pages 485 − 486),
ˆ
β
L
is consistent
for the β
0
that satisﬁes E
β
0

i
ψ(Y
i
− β
0
x
i
) = 0, so we must take the “true” β to be this
value. We then see that

i
ψ(Y
i

ˆ
β
L
x
i
) →0
as long as the derivative term is bounded, which we assume is so.
Second Edition 12-5
12.29 The argument for the median is a special case of Example 12.4.3, where we take x
i
= 1
so σ
2
x
= 1. The asymptotic distribution is given in (12.4.5) which, for σ
2
x
= 1, agrees with
Example 10.2.3.
12.31 The LAD estimates, from Example 12.4.2 are ˜ α = 18.59 and
˜
β = −.89. Here is Mathematica
code to bootstrap the standard deviations. (Mathematica is probably not the best choice here,
as it is somewhat slow. Also, the minimization seemed a bit delicate, and worked better when
done iteratively.) Sad is the sum of the absolute deviations, which is minimized iteratively
in bmin and amin. The residuals are bootstrapped by generating random indices u from the
discrete uniform distribution on the integers 1 to 23.
1. First enter data and initialize
Needs["Statistics‘Master‘"]
Clear[a,b,r,u]
a0=18.59;b0=-.89;aboot=a0;bboot=b0;
y0={1,1.2,1.1,1.4,2.3,1.7,1.7,2.4,2.1,2.1,1.2,2.3,1.9,2.4,
2.6,2.9,4,3.3,3,3.4,2.9,1.9,3.9};
x0={20,19.6,19.6,19.4,18.4,19,19,18.3,18.2,18.6,19.2,18.2,
18.7,18.5,18,17.4,16.5,17.2,17.3,17.8,17.3,18.4,16.9};
model=a0+b0*x0;
r=y0-model;
u:=Random[DiscreteUniformDistribution[23]]
2. Here is the actual bootstrap. The vectors aboot and bboot contain the bootstrapped values.
B=500;
Do[
rstar=Table[r[[u]],{i,1,23}];
astar=a/.amin[[2]];
bstar=b/.bmin[astar][[2]];
aboot=Flatten[{aboot,astar}];
bboot=Flatten[{bboot,bstar}],
{i,1,B}]
3. Summary Statistics
Mean[aboot]
StandardDeviation[aboot]
Mean[bboot]
StandardDeviation[bboot]
4. The results are Intercept: Mean 18.66, SD .923 Slope: Mean −.893, SD .050.

0-2

Solutions Manual for Statistical Inference

“When I hear you give your reasons,” I remarked, “the thing always appears to me to be so ridiculously simple that I could easily do it myself, though at each successive instance of your reasoning I am baﬄed until you explain your process.” Dr. Watson to Sherlock Holmes A Scandal in Bohemia 0.1 Description This solutions manual contains solutions for all odd numbered problems plus a large number of solutions for even numbered problems. Of the 624 exercises in Statistical Inference, Second Edition, this manual gives solutions for 484 (78%) of them. There is an obtuse pattern as to which solutions were included in this manual. We assembled all of the solutions that we had from the ﬁrst edition, and ﬁlled in so that all odd-numbered problems were done. In the passage from the ﬁrst to the second edition, problems were shuﬄed with no attention paid to numbering (hence no attention paid to minimize the new eﬀort), but rather we tried to put the problems in logical order. A major change from the ﬁrst edition is the use of the computer, both symbolically through Mathematicatm and numerically using R. Some solutions are given as code in either of these languages. Mathematicatm can be purchased from Wolfram Research, and R is a free download from http://www.r-project.org/. Here is a detailed listing of the solutions included. Chapter 1 2 3 4 5 6 7 8 9 10 11 12 Number of Exercises 55 40 50 65 69 43 66 58 58 48 41 31 Number of Solutions 51 37 42 52 46 35 52 51 41 26 35 16 Missing 26, 30, 36, 42 34, 38, 40 4, 6, 10, 20, 30, 32, 34, 36 8, 14, 22, 28, 36, 40 48, 50, 52, 56, 58, 60, 62 2, 4, 12, 14, 26, 28 all even problems from 36 − 68 8, 16, 26, 28, 34, 36, 38, 42 4, 14, 16, 28, 30, 32, 34, 36, 42, 54, 58, 60, 62, 64 36, 40, 46, 48, 52, 56, 58 2, 8, 10, 20, 22, 24, 26, 28, 30 32, 38, 40, 42, 44, 50, 54, 56 all even problems except 4 and 32 4, 20, 22, 24, 26, 40 all even problems

0.2 Acknowledgement Many people contributed to the assembly of this solutions manual. We again thank all of those who contributed solutions to the ﬁrst edition – many problems have carried over into the second edition. Moreover, throughout the years a number of people have been in constant touch with us, contributing to both the presentations and solutions. We apologize in advance for those we forget to mention, and we especially thank Jay Beder, Yong Sung Joo, Michael Perlman, Rob Strawderman, and Tom Wehrly. Thank you all for your help. And, as we said the ﬁrst time around, although we have beneﬁted greatly from the assistance and

ACKNOWLEDGEMENT

0-3

comments of others in the assembly of this manual, we are responsible for its ultimate correctness. To this end, we have tried our best but, as a wise man once said, “You pays your money and you takes your chances.” George Casella Roger L. Berger Damaris Santana December, 2001

Each sample point describes the result of the toss (H or T) for each of the four tosses. If x ∈ (B ∩ Ac ). x ∈ A\B ⇔ x ∈ A and x ∈ B ⇔ x ∈ A and x ∈ A ∩ B ⇔ x ∈ A\(A ∩ B). So we might use S = {t : t ≥ 0}. x ∈ (A ∪ B)c ⇔ x ∈ A or x ∈ B ⇔ x ∈ Ac and x ∈ B c ⇔ x ∈ Ac ∩ B c / / x ∈ (A ∩ B)c ⇔ x ∈ A ∩ B ⇔ x ∈ A and x ∈ B ⇔ x ∈ Ac or x ∈ B c ⇔ x ∈ Ac ∪ B c . 1/n. 100]. Then either x ∈ (B ∩ A) or x ∈ (B ∩ Ac ). c. Suppose we weigh the rats in ounces.4 a. for example THTT denotes T on 1st. Similar to part a). then x ∈ B. / b. x ∈ A ∪ (B ∪ C) ⇔ x ∈ A or x ∈ B ∪ C ⇔ x ∈ A ∪ B or x ∈ C ⇔ x ∈ (A ∪ B) ∪ C. . we have B = (B ∩ A) ∪ (B ∩ Ac ). 1}. x ∈ A ∪ B ⇔ x ∈ A or x ∈ B ⇔ x ∈ B ∪ A x ∈ A ∩ B ⇔ x ∈ A and x ∈ B ⇔ x ∈ B ∩ A. b. If x ∈ (B ∩ A). From part b). 1. (Note.. Also. So we might use S = {0. b. d. c. / / / 1. e.) c. a. If we know no 10-day-old rat weighs more than 100 oz.9b we have P (A∪B) = P (A)+P (B)−P (A∩B).Chapter 1 Probability Theory “If any little problem comes your way. then x ∈ B. Since the containment goes both ways. Suppose x ∈ B.1 a. .3 a. If x ∈ A. then S = {0/n. that is. The number of damaged leaves is a nonnegative integer. 1. Now suppose x ∈ (B ∩ A) ∪ (B ∩ Ac ). So. If n is the number of items in the shipment. Thus (B ∩ A) ∪ (B ∩ Ac ) ⊂ B. . We might observe fractions of an hour. “A or B or both” is A∪B. . the half inﬁnite interval [0. . There are 24 = 16 such sample points.}. T on 3rd and T on 4th. ∞).” Sherlock Holmes The Adventure of the Three Students 1. The weight must be greater than zero so we might use S = (0. I shall be happy. a more straightforward argument for this part simply uses the Distributive Law to state that (B ∩ A) ∪ (B ∩ Ac ) = B ∩ (A ∪ Ac ) = B ∩ S = B. Thus B ⊂ (B ∩ A) ∪ (B ∩ Ac ). . then x ∈ B ∩ A. H on 2nd. 1. A ∪ B = A ∪ [(B ∩ A) ∪ (B ∩ Ac )] = A ∪ (B ∩ A) ∪ A ∪ (B ∩ Ac ) = A ∪ [A ∪ (B ∩ Ac )] = A ∪ (B ∩ Ac ). . 2.2.2 For each of these equalities. . you must show containment in both directions.) x ∈ A ∩ (B ∩ C) ⇔ x ∈ A and x ∈ B and x ∈ C ⇔ x ∈ (A ∩ B) ∩ C. to give you a hint or two as to its solution. (It can similarly be shown that A ∪ (B ∪ C) = (A ∪ C) ∪ B. we could use S = (0. if I can. hence x ∈ (B ∩ A) ∪ (B ∩ Ac ). Then either x ∈ A or x ∈ Ac . d. x ∈ A and / / x ∈ B ⇔ x ∈ A and x ∈ B c ⇔ x ∈ A ∩ B c . From Theorem 1. ∞). and.

p0 = p2 p1 = p2 These two equations imply u(1 − u) = 1/3. 1.S. Lastly. 52 πr2 52 πr2 52 2 2 2 b.2. so P (sscoring exactly i points) = π(6 − i) r2 π ((6−(i + 1)))2 r2 (6 − i) −(5 − i) − = . . by the deﬁnition of intersection α α x ∈ Ac for all α ∈ Γ. Therefore x ∈ Ac for all α ∈ Γ. 5. 5. which is exactly the probability distribution of Example 1.6 “At least one of A or B” is A ∪ B. . P (scoring exactly i points) = P (inside circle i) − P (inside circle i + 1). 5 = = = P (scoring i points ∩ board is hit) P (board is hit) πr2 A πr2 (6 − i)2 − (5 − i)2 i = 1. that is x ∈ Aα for all α ∈ Γ. d. So we get the same answer as in a). Since i ≤ 5. 1. b. p2 = uw. P (scoring i points) = b. Expanding the squares in part a) we ﬁnd P (scoring exactly i points) = 11−2i . . . . p1 = u(1 − w) + w(1 − u). “At most one of A or B” is (A ∩ B)c .1-2 Solutions Manual for Statistical Inference b.2. and P ((A ∩ B)c ) = 1 − P (A ∩ B). 1. Suppose x ∈ (∪α Aα )c . birth results in identical twins that are female} 1 P (A ∩ B ∩ C) = 90 × 1 × 1 3 2 p0 = (1 − u)(1 − w). Therefore α x ∈ ∪α Aα . 1. . . c. P (scoring i points|board is hit) = (6 − i)2 − (5 − i)2 52 i = 1. P (i ∪ j) = area of i ring + area of j ring = P (i) + P (j). Circle i has radius (6 − i)r/5. By the deﬁnition of complement x ∈ Aα for all α ∈ Γ.9a) = P (A) + P (B) − 2P (A ∩ B).5 a. which has no solution in the real numbers. Let P (i) = 11−2i . which is 25 decreasing in i.8 a. Thus we have P ((A ∩ B c ) ∪ (B ∩ Ac )) = P (A ∩ B c ) + P (B ∩ Ac ) (disjoint union) = [P (A) − P (A ∩ B)] + [P (B) − P (A ∩ B)] (Theorem1. A ∩ B ∩ C = {a U.7. the probability assignment is not legitimate. P (scoring i points|board is hit) P (board is hit) P (scoring i points ∩ board is hit) Therefore. . . Thus x ∈ (∪α Aα )c .7 a. P (S) = P (hitting the dartboard) = 1 by 25 deﬁnition. by the deﬁnition of complement x ∈ ∪α Aα . . Thus. P (i) ≥ 0 for all i. c. . A 52 1− πr 2 A πr 2 A (6−i)2 −(5−i)2 52 if i = 0 if i = 1. . ⇒ u+w =1 ⇒ uw = 1/3. “A or B but not both” is (A ∩ B c ) ∪ (B ∩ Ac ). Thus x ∈ ∩α Ac and. . 1.9 a.

1. ∪Ai ∈ B. A2 . so x ∈ Ac for every i / / i and x ∈ ∩Ai . . x ∈ Ac for some α ∈ Γ. for each i. Thus ∪∞ Ai ∈ B1 ∩ B2 . (1) The empty set ∅ is a subset of any set. By the deﬁnition of complementation.) Thus ∞ ∞ n ∞ P i=1 Ai = lim P n→∞ i=1 Ai = lim n→∞ P (Ai ) + P (B n+1 ) = i=1 i=1 P (Ai ). Note that Bk+1 ⊂ Bk and Bk → φ as k → ∞. then A ⊂ B c and P (A) ≤ P (B c ). . . since B1 and B2 are both sigma algebra. Thus x ∈ ∪α Ac and. 1. that is. For k + 1. (Otherwise the sum of the probabilities would be inﬁnite. . by the deﬁnition of complement x ∈ (∩α Aα ). the entire job can be done in n1 × n2 × · · · × nk ways. . Therefore x ∈ ∩α Aα . (3) If A1 . Thus ∅ ∈ B1 ∩ B2 . then.12 First write ∞ n ∞ P i=1 Ai = P i=1 n Ai ∪ i=n+1 Ai ∞ = P i=1 n Ai +P i=n+1 ∞ Ai (Ai s are disjoint) = i=1 ∞ P (Ai ) + P i=n+1 Ai (ﬁnite additivity) Now deﬁne Bk = i=k Ai .2. then A1 . (3) If A1 . ∪∞ Ai ∈ B1 and ∪∞ Ai ∈ B2 .14 If S = {s1 . . That implies x ∈ Ac for some i. so A and B cannot be disjoint. then A ∈ B1 and A ∈ B2 . Therefore x ∈ Aα for some α ∈ Γ. But here P (A) > P (B c ). Let B1 and B2 be the two sigma algebras. sn }. Therefore x ∈ Ac for some α ∈ Γ. . Ai ⊂ S. hence. 1. in particular. if A and B are disjoint. (1) The empty set ∅ ∈ {∅. S}. so x ∈ ∪Ac . A2 . (2) If A ∈ B1 ∩ B2 . (2) ∅c = S ∈ B and S c = ∅ ∈ B. Thus there are 2n possible choices.1.Second Edition 1-3 b.11 We must verify each of the three properties in Deﬁnition 1. Therefore x ∈ Aα for some α ∈ Γ. The proof for k = 2 is given after Theorem 1. . Assume true for k. . ∈ B1 ∩ B2 . . Therefore. . ∈ B. By the deﬁnition of union. Since B1 and B2 are both sigma algebra Ac ∈ B1 and Ac ∈ B2 . . ∈ B1 and A1 . then any subset of S can be constructed by either including or excluding si .10 For A1 . Therefore Ac ∈ B1 ∩ B2 . . then x ∈ ∪Ai . i=1 i=1 i=1 1. Thus ∅ ∈ B. . c.15 Proof by induction. b.14. . ∈ B2 . ∅ ⊂ S. by the deﬁnition of α α union. (3) ∅∪S = S ∈ B. That implies x ∈ Ai for any i. (2) If A ∈ B. Ac is also a subset of S. / i i 1. ∪Ai ⊂ S. Thus α x ∈ (∩α Aα )c . Ac ∈ B. . . for each i. which is impossible. Proof of (ii): If x ∈ (∩Ai )c . and for each one of these ways we can complete the job by performing . A2 .13 If A and B are disjoint.2. then A ⊂ S. . . the k + 1th task can be done in nk+1 ways. A2 . P (A ∪ B) = P (A) + P (B) = 1 + 3 = 13 . An n c n n c n (i) i=1 Ai = i=1 Ac i (ii) i=1 Ai = i=1 Ac i Proof of (i): If x ∈ (∪Ai )c . a. Hence. Suppose x ∈ (∩α Aα )c . Thus ∅ ∈ B. . More 3 4 12 generally. (1) ∅ ∈ B1 and ∅ ∈ B2 since B1 and B2 are sigma algebras. then x ∈ ∩Ai . and. 1.

6 to pick two days with two calls. 6 = 15. So the total number of diﬀerent pieces is n + n(n − 1)/2 = n(n + 1)/2.825.000 314. There are 2r ways of choosing 2r diﬀerent shoes . 2 calls on each of two days.21 The probability is 2r2n . We will now count another pattern. There are n − 1 ways of choosing the cell with two balls.19 a. and there are n+r−1 ways to do this. and 1 call on each of the remaining four days. There are 2n ways of choosing 2r shoes from a total of 2n shoes. 3 b) 26 + 26 .800 3. nk+1 terms 1. 2r ( 2r ) 2n Thus there are 2r equally likely sample points. And there are 6! ways of assigning the remaining 6 calls to the remaining 6 days. 1.708. The number of sample points with this pattern is 7 12 6!.840 The probability is the total number of sample points divided by 712 . the number of ways we can do the job is (1 × (n1 × n2 × · · · × nk )) + · · · + (1 × (n1 × n2 × · · · × nk )) = n1 × n2 × · · · × nk × nk+1 . There might be 6 calls one day and 1 call each of the other days. pattern 6111111 5211111 4221111 4311111 3321111 3222111 2222211 number of sample points 7 12 6! = 6 7 12 6 7 5! = 5 2 7 12 6 8 6 4! = 4 2 2 2 7 12 6 8 5! = 4 3 7 12 9 6 2 3 3 5 2 4! = 12 6 9 7 5 7 3 3 3 2 2 3! = 7 12 10 8 6 4 5 2 2 2 2 2 2! = 4.162. There are n 2 ways of picking the 2 balls to go into this cell. ( n )22r 1. (n)n! 1.16 a) 26 . Thus there are r unlabeled balls to be placed in n unlabeled bins. Think of the n variables as n bins. Thus.17 There are n = n(n − 1)/2 pieces on which the two numbers do not match. And there are (n − 2)! ways of placing the remaining n − 2 balls into the n − 2 cells.162. The n 2nn−2 denominator is nn because this is the number of ways to place n balls in n cells.960 83. Thus for each of the nk+1 we have n1 × n2 × · · · × nk ways of completing the job by the induction hypothesis.18 The probability is 2 n = (n−1)(n−1)! . There are n ways to specify the empty cell. There are many ways to obtain this.344. one ball in each cell. which is 3. 2 4! ways to order remaining 4 calls. The numerator is the number of ways of placing the balls such that exactly one cell is empty.000 1.800 698. There are several diﬀerent ways that the calls might be assigned so that there is at least one call each day.544. (7 ways to pick day with 4 4 2 2 2 calls. There might be 4 calls on one day. 3 2 c) 264 + 263 + 262 . 8 ways to pick 4 2 2 two calls for lowered numbered day. (Choose 2 out of 2 n numbers without replacement. r 1.000 139. There are 7 ways to specify the day with 6 calls.075. The product of these is the numerator n(n − 1) n (n − 2)! = n n!. Diﬀerentiating with respect to one of the variables is equivalent to putting a ball in the bin.) Here is a list of all the possibilities and the counts of the sample points for each one.1-4 Solutions Manual for Statistical Inference the remaining k tasks.075.20 A sample point speciﬁes on which day (1 through 7) each of the 12 calls happens.280 523. 2 2 1. The numerator is the number of sample points n for which there will be no matching pair. Denote this by 6111111.840 ≈ 712 .397. Denote this by 4221111. 6 ways to pick the two calls for higher numbered day.656. The number of sample points with this pattern is 7 12 6 8 6 4!.2285. There are 12 to specify which of 6 6 the 12 calls are on this day.) There are n pieces on which the two numbers match.088. Thus there are 712 equally likely sample points.908. 12 to pick the calls for that day. Here is one. 4 b.

for n even k k k−1 n (−1) k=0 k n k = n + 0 n−1 (−1)k k=1 n n + k n (−1)k n−1 n−1 + k k−1 = 0. Thus.25 Enumerating the sample space gives S = {(B. There are an even number of terms in the sum n (0. 1−(1−p)2 p = [1−(1−p)2 ]2 > 0. B). Thus P (at least one boy) = 3/4 and P (both are boys) = 1/4.22 a) 1. Use the fact that for k > 0. (G. Then. 1. G)}. have opposite signs. and the minimum p is at zero. . B). with each outcome equally likely. An ambiguity may arise if order is not acknowledged. G). d dp p 1−(1−p)2 2 p(1 − p)2i = p . use the following identity. G)} . o 1. = = b. P (A wins) = p + (1 − p)2 p + (1 − p)4 p + · · · = c. (G.Second Edition 1-5 styles. 1. therefore P ( both are boys | at least one boy ) = 1/3. Thus the probability is increasing in p. 2nd tosses x) n x 1 2 x = x=0 1 2 n−x 2 = 1 4 n n x=0 n x 2 . The product of this is the numerator n 2r 2r 2 . · · · .27 a. For n odd the proof is straightforward. which is the basis of Pascal’s triangle: For k > 0. P ( same number of heads ) = x=0 n P (1st tosses x. all pairs cancel k and the sum is zero. 1.with each outcome equally likely. If n is even. B). There are two ways of choosing within a given shoe style (left shoe or right shoe). which are equal. k n n k n n + + 0 n n−1 k=1 n n n−1 n−1 + − − 0 n 0 n−1 =n n n−1 k−1 to write n−1 k k=1 n k = n k=1 n−1 k−1 =n j=0 n−1 j = n2n−1 . 1. G).23 n (31)(29)(31)(30)···(31) 15 15 15 15 15 (366) 180 b) 336 335 316 366 365 ··· 336 366 30 ( ) . (B. n).24 a. (G. Using L’Hˆpital’s rule we ﬁnd limp→0 1−(1−p)2 = 1/2. which n gives 22r ways of arranging each one of the 2r arrays. b. n = n−1 + n−1 . ∞ P (A wins) = i=1 P (A wins on ith toss) 1 + 2 1 2 2 = 1 + 2 1 2 4 1 2 ∞ + ··· = i=0 ∞ i=0 1 2 2i+1 = 2/3. and n and n−k . the space is S = {(B. (B.

N N →∞ lim dn − dn+1 = lim d1 − dN +1 = c.4. Thus there would be k1 !k2k! m ! diﬀerent ordered components. .2. 2.12).20). d. k n = k=1 (−1) k=1 (−1) k 1.9.4. . km diﬀerent groups of indistinguishable objects. (12. !···k e. we do not need to know what the original sample was. There are k1 !k2 ! · · · km ! of such permutations for each ordered component. (12. 2 n 1 1 Diﬀerentiation will show that ((n + 2 )) log((1 + n )) is increasing in n. (2.12. (9.9.4). . a constant. the partial sums must approach a limit.2.4) Unordered Ordered (2.28 The average of the two integrals is =n n−1 j n−1 j=0 (−1) j = 0 from part a). This would complete the problem. 1 N →∞ Thus limn→∞ dn = d1 − c.9).9). Think of the m distinct numbers as m bins. Therefore the outcome with average x1 +x2 +···+xn that is obtained by the unordered sample {x1 . since the sum telescopes.12). We only need to know the sample size and the distinct values. xn } n . which is the number of distinct k bootstrap samples. 14. (12. Unordered Ordered 1.9) (9. and we want to show that limn→∞ mdn = c.29 a. . xn } is nn . by working with diﬀerences. . since the desired limit is the exponential of this one. 8. Let dn = log n! − [(n + 1/2) log n − n].12.9).4. (2.9). If the k objects were distinguishable then there would be k! possible ordered arrangements.2) b. 2 12n 6n It therefore follows.9. as the remaining adjacent pairs are negative.12.9.9.12).04 at n = 1. (4. Hence. .2). once the positions of the objects are ﬁxed in the ordered arrangement permutations within objects of the same group won’t change the ordered arrangement.12.1-6 n k+1 Solutions Manual for Statistical Inference n n−1 k−1 k+1 c. by the comparison test. 1. Hence 0 < dn dn+1 < n+ 1 2 1 1 1 + 3 2 n 2n 3n −1= ∞ 1 1 + 3. 20}.9.12} (9. and has minimum value (3/2) log 2 = 1. Note that 1 1 dn − dn+1 = n + log 1 + − 1. The ﬁrst three terms provide an upper bound on log(1 + x). . a constant. (9. This is k+m−1 . Same as (a). 180 Thus the probability is 66 . c. (12.12. {4. The number of ordered samples drawn with replacement from the set {x1 .9.2. to create all of the bootstrap samples.12.2.2. There are 66 ordered samples with replacement from {1.12.9).2. . that the series 1 dn − dn+1 converges. Selecting a sample of size k. .2).4. 7. The number of or6! dered samples that would result in {2.9. 8.12). xn } is n!. Thus dn − dn+1 > 0. The number of ordered samples that make up the unordered sample {x1 . .4.9.9. (12. . (9. with replacement.12.9. . This is accomplished in an indirect way. (4. 14. .12.12) (12.31 a. Since we have k1 . 7. Moreover.4). . (9. 7.2. [(n log n − n) + ((n + 1) log (n + 1) − n)] /2 = [n log n + (n + 1) log (n + 1)] /2 − n ≈ (n + 1/2) log n − n. Next recall the Taylor expansion of log(1 + x) = x − x2 /2 + x3 /3 − x4 /4 + · · ·. .9.4. 14} is 2!2!1!1! = 180 (See Example 1.9.4. is the same as putting k balls in the m bins.12.12} (4.12.12) {2. which avoids dealing with the factorial.12.9.12. . Note that.

Any other unordered outcome from {x1 . 3 2 5 2 30 b. √ b..3. . the probability of choosing 1 any xi is n .0025 × 2 1 2 = . . n n n e nn 2nπ nn 2nπ c. N −i+1 ..9524. km times where k1 + k2 + · · · + km = n with at least one of the ki ’s satisfying 2 ≤ ki ≤ n.Second Edition 1-7 n! has probability nn .32 This is most easily seen by doing each possibility. xn }. . and thus √ √ n! 2nπ n!en 2πnn+(1/2) e−n en √ √ = = = 1. then ∞ P i=1 Ai B = = P( ∞ i=1 Ai ∩ B) P (B) P (Ai ∩ B) P (B) = = P( ∞ ∞ i=1 (Ai ∩ B)) P (B) ∞ i=1 P (Ai |B). A2 . L’Hˆpital’s rule shows that the limit is −1.05 × 1 P (CB|M )P (M ) 2 = P (CB|M )P (M ) + P (CB|F )P (F ) . distinct from the unordered sample {x1 . . n! ≈ 2πnn+(1/2) e−n . . That is lim n log 1 − 1 n = lim log 1 − n→∞ 1/n 1 n Therefore the outcome with average x1 +x2 +···+xn n n→∞ . . are disjoint. .33 Using Bayes rule P (M |CB) = 1. . .35 Clearly P (·|B) ≥ 0. P (i) = 1 . o 1. . k1 !k2 ! · · · km !nn n is the most likely. . 19 1. and P (S|B) = 1.05 × 1 +. since k1 !k2 ! · · · km ! > 1. as n → ∞. P (Brown Hair) = P (Brown Hair|Litter 1)P (Litter 1) + P (Brown Hair|Litter 2)P (Litter 2) 2 1 3 1 19 = + = . Then P (1) = 1.34 a. . . If A1 .14.. i=1 . P (N ) = 1. . establishing the result. . will contain m diﬀerent numbers repeated k1 . xn }. . calculate the limit of the log. . Since we are drawing with replacement from the set {x1 .. . . xn }. The probability of obtaining the corresponding average of such outcome is n! n! < n . Stirling’s approximation is that. To prove that limn→∞ (1 − n )n = e−1 . Use Bayes Theorem P (Litter 1|Brown Hair) = P (BH|L1)P (L1) P (BH|L1)P (L1) + P (BH|L2)P (L2 = 2 3 19 30 1 2 1 . N P (2) = 1 . See also Lemma 2. Therefore the probability of obtaining an ordered sample of size n without xi 1 n 1 is (1 − n ) . = 10 . . . Let P (i) = probability that the candidate hired on the ith trial is best. . N −1 .

then P (A ∪ B) = P (A) + P (B) and A ∩ (A ∪ B) = A. Thus A and B cannot be independent. P (A ∪ B) P (A) + P (B) d. P (A|W) + P (C|W) = 1. P (A) = P (A ∩ B) + P (A ∩ B c ) from Theorem 1. A and B are not mutually exclusive.4.2. P (C|W) = 2/3. and P (A) = P (A ∩ B). P (B) P (B) P (A ∩ B) P (A) = = 1. P (A|B) = .39 a. So P (A ∩ B c ) = 0. B and C are a partition. 2 b. b.3. as in Example 1. P (B|W) = 0.11a.38 a. If A and B are independent and both have positive probability. (This could be calculated directly.35. If A and B are independent. where the third equality follows from the independence of A and B. B. Suppose A and B are mutually exclusive. that is. A. P (Ac ∩ B c ) = P (Ac ) − P (Ac ∩ B) = P (Ac ) − P (Ac )P (B) = P (Ac )P (B c ). 1. P (A|W) = P (A∩W) P (W) = γ/3 (γ+1)/3 = γ γ+1 where. then 0 < P (A)P (B) = P (A ∩ B).) So if A can swap fates with C. P (A) P (A) P (A) P (A ∩ B) = = P (A) P (B) 1 c. A ⊂ B implies A ∩ B = A. P (A|B) = P (A ∩ B) P (A) = . . then 0 = P (A ∩ B) = P (A)P (B). 1. This implies A ∩ B = ∅. P (Ac ∩ B) = P (Ac |B)P (B) = [1 − P (A|B)]P (B) = [1 − P (A)]P (B) = P (Ac )P (B) . If A and B are mutually exclusive.1-8 Solutions Manual for Statistical Inference 1. b. So P (A|W) + P (B|W) + P (C|W) = 1. b. we have P (W) = P (W|A)P (A) + P (W|B)P (B) + P (W|C)P (C) 1 1 1 γ+1 = γ +0 +1 = . Then A ∩ B = ∅ and P (A ∩ B) = 0. Thus. But. P (A ∩ B ∩ C) = P (A ∩ (B ∩ C)) = P (A|B ∩ C)P (B ∩ C) = P (A|B ∩ C)P (B|C)P (C). By Exercise 1. But (A ∩ B c ) ⊂ B c and P (B c ) = 1 − P (B) = 0. P (A ∩ (A ∪ B)) P (A) P (A|A ∪ B) = = . his chance of survival becomes 2/3. Since P (A|W) = 1/3. P (·|W) is a probability function. But this cannot be since P (A) > 0 and P (B) > 0. Thus. 3 3 3 3 Thus.40 a. 1 3 1 3 1 3  γ  γ+1 =  γ <  γ+1  γ > γ+1 if γ = 1 2 if γ < 1 2 if γ > 1 . C and W as in Example 1. Using the same events A. Thus.3. P (B|A) = And also.4.37 a. Thus. 1.

n n Probability (16/43)2 (16/43)(27/43) (27/43)(16/43) (27/43)2 P (∪n ) ≤ i=1 i=1 P (Ai ) − P2 + P3 + · · · ± Pn ≤ i=1 P (Ai ) since Pi ≥ Pj if i ≤ j and therefore the terms −P2k + P2k+1 ≤ 0 for k = 1. are getting smaller since Pi ≥ Pj if i ≤ j and therefore the terms −P2k + P2k+1 ≤ 0. Given that dot-dot was received. . . P1 −P2 +P3 .. (2/3)(4/7) + (1/4)(3/7) b.10). For Bonferroni’s Inequality apply the inclusion-exclusion identity to the Ac ..Second Edition 1-9 1.2. the distribution of the four possibilities of what was sent are Event dash-dash dash-dot dot-dash dot-dot 1. The sequence of bounds is improving because the bounds P1 . The lower bounds P1 − P2 . are getting bigger since Pi ≥ Pj if i ≤ j and therefore the terms P2k+1 − P2k ≥ 0. The other arguments are similar. P1 − P2 + P3 − P4 + P5 − P6 . . P1 −P2 +P3 −P4 + P5 . . . . That is n−1 n n P (Ai ∩ Aj ∩ Ak ) + P (Ai ∩ Aj ∩ (∪k Ak )c ) i=1 j=i+1 n−1 k=1 n n ≥ i=1 j=i+1 n−2 n−1 k=1 n P (Ai ∩ Aj ∩ Ak ) ≥ i=1 j=i+1 k=j+1 P (Ai ∩ Aj ∩ Ak ) = 1≤i<j<k≤n P (Ai ∩ Aj ∩ Ak ) = P3 . . When n is even the last term to consider is −Pn ≤ 0. n−1 when 2 n is odd. i b. . . P1 − P2 + P3 − P4 . For Boole’s Inequality. and use the argument leading to (1.41 a. P ( dash sent | dash rec) = = P ( dash rec | dash sent)P ( dash sent) P ( dash rec | dash sent)P ( dash sent) + P ( dash rec | dot sent)P ( dot sent) (2/3)(4/7) = 32/41. Write n−1 n P2 = 1≤i<j≤n P (Ai ∩ Aj ) = i=1 j=i+1 n−1 n P (Ai ∩ Aj ) n = i=1 j=i+1 k=1 P (Ai ∩ Aj ∩ Ak ) + P (Ai ∩ Aj ∩ (∪k Ak )c ) Now to get to P3 we drop terms from this last expression. . We illustrate the proof that the Pi are increasing by showing that P2 ≥ P3 . By a similar calculation as the one in (a) P (dot sent|dot rec) = 27/434. Then we have 16 P ( dash sent|dot rec) = 43 .43 a. .

dx e−e = e−x e−e > 0.20. and P (X3 = 0) = 520.723/77 ≈ . The number of sample points with this pattern is 7 7 4 5 = 14. . The number of sample points that yield X3 = 0 is 77 − 288. . with the last one equaling P (A) (see Schwager 1984 for details).5. so F (x) is increasing. . limx→∞ e−e = 1. (2) PX (X ) = P (∪m {sj ∈ S : X(sj ) = xi }) = P (S) = 1. limx→−∞ 1 + π tan−1 (x) = 1 + π −π = 0. The number of sample points that give each of these patterns is given below. . limx→∞ 1 + π tan−1 (x) = 2 2 2 2 d 1 1 1 −1 (x) = 1+x2 > 0.120 − 14. limx→−∞ e−e = 0. . all of the probabilities in the inclusion-exclusion identity are the same.47 All of the functions are continuous. For the lower bounds we get P1 − P2 = n − n 2 P (A). Thus we only need to check the limit. where the second inequality follows from the fact the P is a probability function. P1 − P2 + P3 = n − n n + 2 3 P (A).5. d dx (1 − e−x ) = e−x > 0. (3) If i=1 A1 .700/77 ≈ .400 3 2 2 7 7 6 4! = 88. k 4 4 1. We must verify each of the three properties in Deﬁnition 1.6322. There are 77 equally likely sample points.723. 2 3 3 So P (X3 = 2) = 14.700. Only the pattern 331 (3 balls in one cell. . . limx→−∞ (1 − e−x ) = 0.470 3 7 7 6 4 2 = 22. 3 1. ∈ B and pairwise disjoint then ∞ 20 k n−k PX (∪∞ Ak ) k=1 = P( k=1 ∞ {∪xi ∈Ak {sj ∈ S : X(sj ) = xi }}) ∞ = k=1 P (∪xi ∈Ak {sj ∈ S : X(sj ) = xi }) = k=1 PX (Ak ). limx→∞ (1 − e−x ) = 1.3498.120 So P (X3 = 1) = 288. dx 2 + π tan b.200 3 4 288.0178. . and that they are nondecreasing 1 1 1 a.44 P (at least 10 correct|guessing) = k=10 20 1 = .4. 1. 1. Therefore B is the set of all subsets of X . 1 and 2. which start out negative. pattern 34 322 3211 31111 number of sample points 7 7 6 = 1. then become positive. and d. so the last bound is exact. P1 − P2 + P3 − P4 = n − n n n + − 2 3 4 P (A). −x −x −x −x d c. . (1) If A ∈ B then PX (A) = P (∪xi ∈A {sj ∈ S : X(sj ) = xi }) ≥ 0 since P is a probability function.050 3 2 2 2 7 7 6 4 5 2! = 176.700 = 520. See Example 1. Pj = P (A).01386.120/77 ≈ . 1 2 + 1 π π 2 = 1. There are 4 patterns that yield X3 = 1. A2 . 3 balls in another cell and 1 ball in a third cell) yields X3 = 2. . which eventually sum to one. . Thus n n P1 = nP (A).2. hence right-continuous.46 This is similar to Exercise 1. P2 = P (A). . 2 j and the sequence of upper bounds on P (∪i Ai ) = P (A) becomes P1 = nP (A).45 X is ﬁnite. . The possible values of X3 are 0.1-10 Solutions Manual for Statistical Inference c. If all of the Ai are equal. .

d FY (y) = 0 is constant.4616 ≈ .53 a.50 Proof by induction. as x ↓ x0 . 2. Thus is FY (y) right continuous. Thus for all y. Then for n + 1 tk−1 = k=1 k=1 tk−1 + tn = 1−tn 1−tn +tn (1−t) 1−tn+1 + tn = = . Thus. 1. 1−t 1−t 1−t where the second inequality follows from the induction hypothesis. Thus we have P (X > t) = 1 − P (X ≤ t) = 1 − FX (t) ≥ 1 − FY (t) = 1 − P (Y ≤ t) = P (Y > t). 3 and 4. 1.) 1.1095 ≈ .Second Edition −y 1-11 (1− 1− 1− d 1− d 1− e. The probabilities are obtained by counting arguments. Then we have that P (X > t∗ ) = 1 − P (X ≤ t∗ ) = 1 − FX (t∗ ) > 1 − FY (t∗ ) = 1 − P (Y ≤ t∗ ) = P (Y > t∗ ). so F (·) is right-continuous. 1. dy FY (y) = 2/y 3 > 0. 1. 2/y 3 if y > 1 d b. Also.52 The function g(·) is clearly positive. FX (t) ≤ FY (t). P (X ≤ x) → P (X ≤ x0 ). F (x) = P (X ≤ x). FY (y) is continuous except on y = 0 where limy↓0 ( + 1+e−y ) = F (0). FX (t∗ ) < FY (t∗ ). Lastly. The pdf is fY (y) = dy FY (y) = 0 if y ≤ 1. ∞ ∞ g(x)dx = x0 x0 1−F (x0 ) f (x) dx = = 1. For y > 1. Therefore FY is a cdf. Hence limx→∞ P (X ≤ x) = 0 and limx→−∞ P (X ≤ x) = 1. 1−F (x0 ) 1−F (x0 ) 1. c. For n = 2 2 tk−1 = 1 + t = k=1 1−t2 . limy→−∞ FY (y) = limy→−∞ 0 = 0 and limy→∞ FY (y) = limy→∞ 1 − y12 = 1. limy→∞ + 1+e−y = 1. dx ( 1+e−y ) = (1+e)e )2 > 0 and dx ( + 1+e−y ) > −y 1− 0. And for some t∗ . FZ (z) = 0 1− 1 [(z/10)+1]2 if z ≤ 0 if z > 0. F (x) is nondecreasing since the set {x : X ≤ x} is nondecreasing in x.49 For every t. this is n+1 n n k−1 k=1 t = 1−tn 1−t . x 0 1 2 3 4 5 0 5 1 5 2 5 3 5 4 fX (x) = P (X = x) 25 4 25 3 25 2 25 1 25 0 30 4 30 4 30 4 30 4 30 4 ≈ . as follows. 1−t Assume true for n.0091 ≈ .0002 The cdf is a step function with jumps at x = 0. . FZ (z) = P (Z ≤ z) = P (10(Y − 1) ≤ z) = P (Y ≤ (z/10) + 1) = FY ((z/10) + 1).51 This kind of random variable is called hypergeometric in Chapter 3. so FY is increasing. (This is merely a consequence of deﬁning F (x) with “ ≤ ”. limy→−∞ 1+e−y = 0.48 If F (·) is a cdf. FY is nondecreasing. 1. For y ≤ 1.4196 ≈ .

c = 1/1 = 1. 3 P (V ≤ 5) = P (T < 3) = 0 1 −t/1. b. Thus.1-12 Solutions Manual for Statistical Inference π/2 sin xdx 0 ∞ e−|x| dx −∞ 1. 1.5 For v ≥ 6.5 e dt = 1 − e−2 .5 −∞ < v < 0.5 e dt = 1 − e−v/3 . 0≤v<6. 1. 6≤v . Thus.54 a.55 = 1. c = 1/2. . = 0 ex dx −∞ + ∞ −x e dx 0 = 1 + 1 = 2. P (V ≤ v) = 0 1 − e−2 1 − e−v/3 v 2 v 2 = 0 1 −t/1. 1. P (V ≤ v) = P (2T ≤ v) = P T ≤ Therefore.

. 0 < y < 1. FY (y) = P (0 ≤ X ≤ √ √ 1 √ 2 y fX ( y). 0 < y < n!m! 1 log y −(1/2)((log y)/σ)2 .5. . 4 dy 4 4 4 To check the integral. . 0 < y < ∞. 0 < y < 1. To check the integral. 0 < x < ∞. 0 < x < 1. fY (y) = c. . 3 < y < ∞. y = x3 = g(x). 3 3 2 3 4 2. Therefore 0 < y < 1. . X 2. 2 y To check the integral. 1 1 √ √ √ fY (y) = √ 30( y)2 (1 − y)2 = 15y 2 (1 − y)2 . ∞).1. Theorem 2.1 a. 1).5. fY (y) = fx ( y−3 d y−3 1 7 ) ( ) = 7e−(7/4)(y−3) = e−(7/4)(y−3) . and Y = (3. Use Theorem 2. Then fY (y) = ∞ = 0 − (−1) = 1. ∞ 3 7 −(7/4)(y−3) e dy = −e−(7/4)(y−3) 4 √ y) = FX ( y). where y = 0.4 a. y = 4x + 3.1.2 In all three cases. monotone. fx (x) = 7e−7x . 1 0 15y 2 (1 − 1 √ 1 y)2 dy = 0 3 1 2 1 2 (15y 2 − 30y + 15y 2 )dy = 15( ) − 30( ) + 15( ) = 1. 1 .3 P (Y = y) = P ( X+1 = y) = P (X = x = 1 ( 2 )y/(1−y) . . fY (y) = 2 y −1/2 . 0 b. 2 . 3 2 5 2. fY (y) = fx (g −1 (y)) = d −1 g (y) dy = fx (y 1/3 ) d 1/3 1 (y ) = 42y 5/3 (1 − y 1/3 )( y −2/3 ) dy 3 14y(1 − y 1/3 ) = 14y − 14y 4/3 . x+1 . monotone. 1 a.5 is applicable and yields the following answers. fY (y) = ∞. (n+m+1)! −y(n+1) e (1 − e−y )m . σ2 y e y 1−y ) b. 1 (14y − 14y 4/3 )dy = 7y 2 −14 0 y 7/3 7/3 1 = 7y 2 −6y 7/3 0 1 = 1 − 0 = 1. 3 . f (x) is a pdf since it is positive and ∞ 0 f (x)dx = −∞ −∞ 1 λx λe dx + 2 ∞ 0 1 −λx 1 1 λe dx = + = 1. Use Theorem 2. and Y = (0. fx (x) = 42x5 (1 − x). . 3 c. 2 2 2 . .Chapter 2 Transformations and Expectations 2.1.

Thus fY (y) = = 1 1 1 1 1 1 1 1 1 1 1 1 √ √ −√ −√ √ + √ + √ + √ 2π 2π 2π 2π 1−y2 y 1−y2 y 1−y2 y 1−y2 y 1 .1.2-2 Solutions Manual for Statistical Inference b. Then g1 (x) = |x| = −x3 on A1 and 3 g2 (x) = |x| = x3 on A2 .1. 2 2 P (X < t) = c.1. 3 3 (1 − y)−1/2 + (1 − y)1/2 . A1 = (−1. 3. Then by diﬀerentiating (2. A1 = (−∞. g2 (y) = π − sin−1 ( y). b. Then 2 2 2 √ √ −1 −1 gi (x) = sin2 (x) on Ai for i = 1. Let A0 = {0}. −1 −1 −1 √ −1 √ g3 (y) = sin ( y) + π and g4 (y) = 2π − sin ( y).8 to obtain fY (y) = 1 −y1/3 −2/3 e y . 2π).1. −∞ 2 λeλx dx = Therefore. 2. Let A0 = {0}. 4. 3π ) and A4 = ( 3π . 0≤y≤1 π y(1 − y) √ √ To use the cdf given in (2. 0) and A2 = (0. A1 = (0. 8 8 0<y<1 . Use Theorem 2.8 to obtain fY (y) = . Let X be a random variable with density f (x). Therefore g1 (y) = sin−1 ( y). 3 0<y<∞ 3 .1. 1).6) we have that x1 = sin−1 ( y) and x2 = π − sin−1 ( y). Let A0 = {0}.6) we obtain that fY (y) = = = √ d √ √ d √ 2fX (sin−1 ( y) (sin−1 ( y) − 2fX (π − sin−1 ( y) (π − sin−1 ( y) dy dy 1 1 1 1 −1 1 2( √ √ ) − 2( √ √ ) 2π 1 − y 2 y 2π 1 − y 2 y 1 π y(1 − y) 2. A3 = (π. a. 2.5 To apply Theorem 2. t 1 λx t 2e −∞ t −∞ 0 −∞ 1 λx 2 λe dx t 1 1 λx −λx dx 2 λe dx+ 0 2 λe t 1 λe−λx dx 0 2 if t < 0 if t ≥ 0 t 0 = 1 eλt and 2 = − 1 e−λx 2 = − 1 e−λt + 1 .8 can be used for all three parts. P (|X| < t) = 0 for t < 0.6 Theorem 2. 1 λt 2e 1 − 1 e−λt dx 2 if t < 0 if t ≥ 0 0 P (|X| < t) = P (−t < X < t) = −t 1 λx λe dx + 2 t 0 1 −λx λe dx 2 = 1 1 1 − e−λt + −e−λt +1 2 2 = 1 − e−λt . and for t ≥ 0. Use Theorem 2. ∞). Then g1 (x) = 1 − x2 on A1 and g2 (x) = 1 − x2 on A2 .1. π ). 0) and A2 = (0.8. P (X < t) = 1 where.

(i) (ii) (iii) (iv) −1 FX (y) = limx→0 F (x) = 1 − e−0 = 0.1. limx→−∞ F (x) = 1 − e−∞ = 1.1. .7). e−x/2 is increasing. log(2y) 0 ≤ y < 1 ≤ y < 1. limx→1 F (x) = 1/2. and limx→0 F (x) = 1/2. 1 − e−x is increasing in x. Instead write P (Y ≤ y) = P (X 2 ≤ y) √ √ P (− y ≤ X ≤ y) if |x| ≤ 1 √ = P (1 ≤ X ≤ y) if x ≥ 1 = Diﬀerentiation gives fy (y) = 1 2√ 9 y 1 1√ 1 9 + 9 y √ y √ f (x)dx − y X √ y fX (x)dx 1 3 (1 − 16 1 − y)2 √ 1 3 + (2 − y)2 .8 to obtain fY (y) = . If the sets B1 . For A1 = (−1. 2) the calculations are identical to those in part (a). 1).3. So this says that we can apply Theorem 2. limx→−∞ F (x) = e−∞ /2 = 0. Let A0 = {0}. . (i) (ii) (iii) (iv) b. 1) and A2 = (1. Then g1 (x) = 1 − x2 on A1 and g2 (x) = 1 − x on A2 . 2.5. (iii) limx↓0 F (x) = 1 − e−0 /4 = 3 = F (0).Second Edition 2-3 c. 1 − e−x is continuous. (ii) e−x /4 and 1 − e−x /4 are both increasing in x. b.1. limx→∞ F (x) = 1 − (e1−∞ /2) = 1. .8 For each function we check the conditions of Theorem 1.8 on each of the Bk and add up the pieces. we can write fY (y) = k fY (y)I(y ∈ Bk ) and do the transformation on each of the Bk .8 does not directly apply. Theorem 2. so F is continuous.7 Theorem 2. −1 Fx (y) = − log(1 − y).1. 1 − (e1−x /2) is increasing. limx→∞ F (x) = 1 − e−∞ /4 = 1. BK are a partition of the range of Y . . a. B2 . 2. (Note that on A1 we are essentially using Example 2. A1 = (−1. 1−y 8 0<y<1 if |x| ≤ 1 if x ≥ 1 . a. Use Theorem 2. For continuity we only need check x = 0 and x = 1.8 does not directly apply. if y ≤ 1 if y ≥ 1 . 0) and A2 = (0. 1/2 is nondecreasing. 4 log(4y) 0≤y< 1 −1 4 (iv) FX (y) = 1 − log(4(1 − y)) 4 ≤ y < 1 . 2 1 − log(2(1 − y)) 1 ≤ y < 1 2 c.1. so F is right-continuous. (i) limx→−∞ F (x) = e−∞ /4 = 0.

1. for the given pdf. Then −y 1 −y 1 1 1 −y e 2 . Thus. The last inequality is true since Fx (x) ≤ y for every x ∈ Ay . xy ). Let Ay = {x : Fx (x) ≤ y}. 2π −y 2 This was obtained using integration by parts with u = 2y 2 and dv = 1 e 2 fY (y) integrates to 1. which is equivalent to part a). then FY (y) = P (Y ≤ y) = P (Fx (X) ≤ y) = P (X ∈ Ay ). x↑y 0 if x ≤ 1 (x − 1)2 /4 if 1 < x < 3 1 if 3 ≤ x Use the Axiom of Continuity. x]). let y be a value that is “jumped over” by Fx . either open. Y = |X| where −∞ < x < ∞. and Fx (x) ≤ y for every x ∈ Ay .11 a.1. say (−∞. 2. Since Fx is nondecreasing. To get strict inequality for some y.9 From the probability integral transformation. If Ay is closed.12. and this equals limx↑y FX (x) ≤ y.2-4 Solutions Manual for Statistical Inference 2. b. ∞ EY = 0 √ −y −y 1 y 1 e 2 dy = √ −2y 2 e 2 2πy 2π ∞ ∞ + 0 0 y 1 −1 2 e −y 2 1 √ dy = √ ( 2π) = 1. Then FY (y) and the fact the = P (Y ≤ y) = P (|X| ≤ y) = P (−y ≤ X ≤ y) = P (x ≤ y) − P (X ≤ −y) = FX (y) − FX (−y). Exercise 1.7 let Y = X 2 . . 2π Using example 2. b.10. we know that if u(x) = Fx (x).xy )) = lim P (X ∈ (−∞. then FY (y) = P (Y ≤ y) = P (Fx (X) ≤ y) = P (X ∈ Ay ) = Fx (xy ) ≤ y. FY (y) ≤ y for every y. If Ay is open. Using integration by parts with u = x and dv = xe ∞ −x2 2 dx then ∞ EX 2 = −∞ x2 1 −x2 1 e 2 dx = 2π 2π −xe −x2 2 ∞ + −∞ −∞ e −x2 2 dx = 1 (2π) = 1. But now we have P (X ∈ Ay ) = P (X ∈ ( − ∞. lim FX (x) < y < FX (xy ). fY (y) = √ √ e 2 + √ e 2 = √ 2 y 2πy 2π 2π Therefore. and FY (y) = limx↑y FX (x) < y. xy ). 1). That is. for some xy . 2. let y be such that. Theorem 2. xy ]. x↑y For such a y. Therefore. say (−∞. as before. Ay is a half inﬁnite interval. Therefore 0 < y < ∞. Ay = (−∞. calculate u(x) = Fx (x) = . The last inequality is true because xy ∈ Ay .10 a. that is. for every x < xy . or closed. then Fx (X) ∼ uniform(0. We prove part b).

FY (y) = Thus. . 2. π EY = 0 y e−u du = 0 2 −e−u π ∞ 0 = where u = y2 2 . This is the Cauchy distribution restricted to (0. Thus X + Y = (X ∧ Y ) + (X ∨ Y ). ∞ ∞ ∞ EX = k=1 k[(1 − p)k p + pk (1 − p)] = (1 − p)p k=1 k(1 − p)k−1 + k=1 kpk−1 = 2.13 P (X = k) = (1 − p)k p + pk (1 − p). = dx. Then Var(Y ) = 1− π .15 Assume without loss of generality that X ≤ Y . λ µ . Then X ∨ Y = Y and X ∧ Y = X.Second Edition 2-5 Therefore. . 2. π 2 . therefore tan−1 (y/d) = x and fY (y) = 2 1 . Taking expectations E[X + Y ] = E[(X ∧ Y ) + (X ∨ Y )] = E(X ∧ Y ) + E(X ∨ Y ). where the last equality follows from changing the order of integration. ∞ d 1 −y 1 −y FY (y) = fX (y) + fX (−y) = √ e 2 + √ e 2 = dy 2π 2π 2 −y e 2 dy = π 2 π ∞ 2 −y e 2 . 2. and the mean is inﬁnite. ∞ ET = 0 ae−λt +(1 − a)e−µt dt = −ae−λt (1 − a)e−µt − λ µ ∞ = 0 a 1−a + . ∞). . Therefore.12 We have tan x = y/d. Thus. ∞ EY 2 = 0 y2 2 −y e 2 dy = π −y 2 −ye 2 π ∞ ∞ + 0 0 e −y 2 dy = −y 2 2 π π = 1. k = 1. 1 1 dy 1+(y/d)2 d tan−1 (y/d) = 0 < y < ∞. Therefore E(X ∨ Y ) = EX + EY − E(X ∧ Y ). p(1 − p) ∞ ∞ (1 − FX (x))dx = 0 0 ∞ P (X > x)dx ∞ = 0 ∞ x y fX (y)dydx dxfX (y)dy 0 ∞ 0 = = 0 yfX (y)dy = EX.16 From Exercise 2. 2 This was done using integration by part with u = y and dv = ye 2.14 1 1 (1 − p)p 2 + p (1 − p)2 = 1 − 2p + 2p2 ..14. 2. πd 1+(y/d)2 d dy 2 dy.

if X = number of children until the ﬁrst daughter. The assumptions that are needed are the ones listed in Theorem 2. x = g −1 (y) and d dx = dy g −1 (y)dy. 1 π ∞ −∞ 1 1 dx = tan−1 (x) 1+x2 π a −∞ ∞ = −∞ 1 π π π + 2 2 = 1.5. 2. set d E|X − a| = da f (x)dx − −∞ a f (x)dx = 0.3. Then. . 2 2 b. This is the Cauchy pdf.19 d E(X − a)2 da = = −∞ d da ∞ ∞ ∞ (x − a)2 fX (x)dx = −∞ −∞ d (x − a)2 fX (x)dx da ∞ ∞ −2(x − a)fX (x)dx = −2 −∞ xfX (x)dx − a −∞ fX (x)dx = −2[EX − a]. 2 2. d Therefore if da E(X − a)2 = 0 then −2[EX − a] = 0 which implies that EX = a. This is a minimum since d2 /da2 E|X −a| = 2f (a) > 0.14) or by trans√ 2 2 ∞ forming to a gamma kernel (use y = −λ2 /β 2 ). 0 3x2 dx = m3 = 1 ⇒ m = 1 = . Thus X is a geometric random variable. ∞ 2.4. p Therefore. where p = probability of a daughter.21 Since g(x) is monotone ∞ ∞ Eg(X) = −∞ g(x)fX (x)dx = −∞ yfX (g −1 (y)) d −1 g (y)dy = dy ∞ yfY (y)dy = EY.794.the expected number of children is two. −∞ where the second equality follows from the change of variable y = g(x).17 a. 0 e−x /β dx = πβ/2 and hence the function integrates to 1. 2.22 a.20 From Example 1. if p = 1 . and ∞ ∞ EX = k=1 k(1 − p)k−1 p = p − k=1 d (1 − p)k dp = −p d dp ∞ (1 − p) −1 k=0 k = −p d 1 −1 dp p = 1 . The function is symmetric about zero. EX = a is a minimum since d /da E(X − a) = 2 > 0. Therefore.2-6 m set Solutions Manual for Statistical Inference 1/3 2. 2.4. Using integration by parts with u = x and dv = xe−x ∞ 0 2 /β 2 we obtain that 2 x2 e−x 2 /β 2 dx2 = β2 2 ∞ 0 e−x /β 2 dx. therefore m = 0 as long as the integral is ﬁnite. The integral can be evaluated using the argument on pages 104-105 (see 3. If EX = a then d 2 2 2 2 da E(X − a) = −2[EX − a] = −2[a − a] = 0.3. then P (X = k) = (1 − p)k−1 p.18 E|X − a| = −∞ |x − a|f (x)dx = −(x − a)f (x)dx + a ∞ ∞ (x a − a)f (x)dx. The solution to this equation is a = median.

1) pdf (Example 2. 1 0 if 2 ≤ < ∞ f (x)dx = a 0 ∞ f (a + )d f (a − )d 0 a (change variable. . 0) and A2 = (0. Then fY (y) = b. EX = EX 2 = VarX = c.1.24 a. EX = 2β/ π EX 2 = 3β 2 /2 VarX = β 2 . 1 = 0 a a+2 . To show that MX (t) is symmetric about 0 we must show that MX (0 + ) = MX (0 − ) for all > 0.4) is symmetric about a = 0 because (0 − )2 = (0 + )2 . A1 = (−1. n n x 1 1 n(n+1) = n+1 . axa+1 a+1 = 0 1 0 x2 axa−1 dx = − a a+1 2 axa+1 dx = axa+2 a+2 a a+1 . Thus fY (y) = fX (g −1 (y))| dy g −1 (y)| = fX (−y)| − 1| = fX (y) for every y. a a+2 = a . here are three. The uniform(0. Use Theorem 2. ∞ ∞ 1 1 if 0 < < 2 . i=1 n 6 6 2 2 2 2 (n+1)(2n+1) +1 − n+1 = 2n +3n+1 − n +2n+1 = n 12 . (a+2)(a+1)2 2.23 a. EX = EX 2 = VarX = b. 1 5 yfY (y)dy = xaxa−1 dx = 1 3 1 0 EY 2 = axa dx = 1 0 1 0 y 2 fY (y)dy = 1 VarY = 1 5 − 1 2 3 = 4 45 . There are many examples. ∞ 0 ∞ MX (0 + ) = −∞ ∞ e(0+ e 0 ∞ )x fX (x)dx = −∞ 0 e x fX (x)dx + 0 e x fX (x)dx ∞ = = −∞ (−x) fX (−x)dx + −∞ e (−x) fX (−x)dx = −∞ e− x fX (x)dx e(0− )x fX (x)dx = MX (0 − ). = x − a) > 0) = = −∞ (f (a + ) = f (a − ) for all (change variable.25 a. Then g1 (x) = x2 on A1 and g2 (x) = x2 on A2 . 6 2 6 4 2 2 EX = 0 x 3 (x − 1)2 dx = 3 0 (x3 − 2x2 + x)dx = 1. 2 2 2 2 3 EX 2 = 0 x2 2 (x − 1)2 dx = 3 0 (x4 − 2x3 + x2 )dx = 8 . The standard normal pdf (Example 2. x = a − ) f (x)dx. 2 5 8 3 2 VarX = 5 − 1 = 5 . d Y = −X and g −1 (y) = −y.26 a.9) is symmetric about a = 0 because (0 − )2 = (0 + )2 .8 with A0 = {0}.1. 1). 2. b.1. 2. 2 0 < y < 1.Second Edition 2-7 3 4 2−π √ b. The Cauchy pdf (Example 2. EY = 2.4) is symmetric about a = 1/2 because f ((1/2) + ) = f ((1/2) − ) = b.2. 1 0 1 0 1 −1/2 y . x=1 x = n x=1 n = n 2 2 2 n n 1 1 n(n+1)(2n+1) x 2 = n i=1 x = n = (n+1)(2n+1) .

b. f (x) is decreasing for x ≥ 0. c. Therefore. d. The median of X = log 2 < 1 = EX. there must exist an interval (x1 . Since b is the mode then f (b) > f (b + ) ≥ f (b + 2 ) which implies that f (a − ) > f (a) ≥ f (a + ) which contradict the fact the f (x) is symmetric. . d. = a − x in the ﬁrst integral. f (a − ) = e−(a− ) > e−(a+ ) = f (a + ). The uniform on the interval (0. The standard normal pdf. Let a be the point of symmetry and b be the mode. i. f (x) is not symmetric about a > 0. If a > > 0.27 a. c. 1). x2 ) then f (b) > f (b + ) ≥ f (b + 2 ) which contradict the fact the f (x) is symmetric. f (x) is not symmetric about a ≤ 0. Therefore. Thus a is the mode. b is a mode. Therefore. either. x2 ) such that f (x) has the same value in the whole interval. e. a is a median. Thus a ∈ (x1 . f (x) is ﬂat in this interval and for all b ∈ (x1 . For the case when the mode is not unique. For the case when the mode is unique. thus a is not a mode. it must be that f (x) dx = −∞ a f (x) dx = 1/2. x2 ) and is a mode. Let also assume without loss of generality that a = (b + ) > b. (two integrals are same) Therefore. with f (0) > f (x) > f (y) for all 0 < x < y. Let assume that a ∈ (x1 . 2.2-8 Solutions Manual for Statistical Inference Since a ∞ ∞ f (x)dx + −∞ a a f (x)dx = −∞ ∞ f (x)dx = 1. EX = a. Since b is a mode and a = (b + ) ∈ (x1 . Let assume that a is not the mode and without loss of generality that a = b+ > b for > 0. and = x − a in the second integral we obtain that EX − a = E(X − a) ∞ ∞ = − 0 f (a − )d + 0 f (a − )d (f (a + ) = f (a − ) for all > 0) = 0. x2 ). x2 ).e. f (a − ) = 0 < e−(a+ ) = f (a + ). Thus f (x) is unimodal and 0 is the mode. ∞ EX − a = = E(X − a) = −∞ a (x − a)f (x)dx ∞ (x − a)f (x)dx + −∞ ∞ a (x − a)f (x)dx ∞ = 0 (− )f (a − )d + 0 f (a + )d With a change of variable. If − < a ≤ 0.

∞ a ∞ µ3 = −∞ 0 (x − a)3 f (x)dx = −∞ ∞ (x − a)3 f (x)dx + a (x − a)3 f (x)dx = −∞ ∞ y 3 f (y + a)dy + 0 y 3 f (y + a)dy ∞ (change variable y = x − a) = 0 −y 3 f (−y + a)dy + 0 y 3 f (y + a)dy (f (−y + a) = f (y + a)) = 0. α4 = 9 . Similarly. c. y = n(n − 1)p2 x=2 n−2 = n(n − 1)p2 y=0 where we use the identity x(x − 1) n = n(n − 1) n−2 . −1 < x < 1. Each distribution has µ1 = 0.28 a. For the binomial Var(X) = n(n − 1)p2 + np − (np)2 = np(1 − p). therefore α3 = µ3 . and (ii) is least peaked. (iii) is most peaked. (ii) f (x) = 2 . (i) f (x) = √1 e−x /2 . n ∞ ∞ EY = y=0 y a n y+a y a+b−1 a n+a+b−1 y+a n = y=1 n a n−1 (y − 1) + (a + 1) y − 1 a+b−1 a (n−1)+(a+1)+b−1 (y−1)+(a+1) . For f (x) = e−x . 3 5 (iii) f (x) = 1 e−|x| . α4 = 6. α4 = 3. (i) is next. c. µ1 = µ2 = 1.Second Edition 2-9 2. For the binomial n 2 EX(X − 1) = x=2 x(x − 1) n n x p (1 − p)n−x x n − 2 x−2 p (1 − p)n−x x n−2 y p (1 − p)n−2−y = n(n − 1)p2 . 2π 1 1 µ2 = 1 . −∞ < x < ∞.29 a. b. 2. therefore we must calculate µ2 = EX 2 and µ4 = EX 4 . for the Poisson e−λ λx e−λ λy EX(X − 1) = x(x − 1) = λ2 = λ2 . ∞ ∞ µ3 = 0 (x − 1)3 e−x dx = 0 (x3 − 3x2 + 3x − 1)e−x dx = Γ(4) − 3Γ(3) + 3Γ(2) − Γ(1) = 3! − 3 × 2! + 3 × 1 − 1 = 3. 2 As a graph will show. substitute y = x − 2 and recognize x x that the new sum is equal to 1. µ4 = 24. Var(X) = E[X(X − 1)] + EX − (EX)2 . µ4 = 5 . µ2 = 2. b. µ2 = 1. µ4 = 3. x! y! y=0 x=2 where we substitute y = x − 2. For the Poisson Var(X) = λ2 + λ − λ2 = λ.

33 a. being the sum over all possible values of a beta-binomial(n − 1. we necessarily have MX (0) = Ee0 = 1. The sum over all y y−2 possible values of a beta-binomial(n − 2. E(e c. E etX = that e −α/β α ∞ 2β +− −∞ α 4eαt . = e−λ eλe = eλ(e t t ∞ (et λ)x x=1 x! −1) . E[Y (Y − 1)] = n(n−1)a(a+1) is calculated similar to EY. c c tx 1 1 1 1 1 e c dx = ct etx 0 = ct etc − ct 1 = ct (etc 0 c 2x tx e dx = c22t2 (ctetc − etc + 1). but using the identity (a+b)(a+b+1) y(y − 1) n = n(n − 1) n−2 and adding 2 instead of 1 to the parameter a.2-10 n Solutions Manual for Statistical Inference = y=1 n a n−1 (y − 1) + (a + 1) y − 1 n a+b−1 a (n−1)+(a+1)+b−1 (y−1)+(a+1) a+1+b−1 a+1 (n−1)+(a+1)+b−1 (y−1)+(a+1) = na a+b−1 a+1 a a+1+b−1 a+1 n−1 a+1 (y − 1) + (a + 1) y − 1 y=1 = na a+b n−1 j=0 a+1 n−1 j + (a + 1) j a+1+b−1 a+1 (n−1)+(a+1)+b−1 (j+(a+1) = na . therefore it cannot be an mgf. 2. d dt Mx (t) t=0 = eλ(e −1) λet t=0 = λ. (a + b)2 (a + b + 1) )= )= b. t < − log(1 − p). b) will appear in the calculation. Therefore Var(Y ) = E[Y (Y − 1)] + EY − (EY )2 = 2. b).32 d S(t) dt = t=0 d (log(M x (t)) dt = t=0 d dt Mx (t) Mx (t) = t=0 EX = EX 1 since MX (0) = Ee0 = 1 d2 S(t) dt2 = t=0 d dt Mx (t) Mx (t) = t=0 2 Mx (t)M x (t) − [M x (t)] [M x (t)] 2 2 t=0 = 2. 4−β 2 t2 −2/β < t < 2/β. ∞ r+x−1 x=0 x r ∞ tx r+x−1 x=0 e x pr (1 − p)x = pr t x (1 − p)e t x .30 a. a + 1. (integration-by-parts) 1 −(x−α)/β tx e e dx 2β 1 eα/β 1 e−x( β −t) 1 2β ( β − t) E(e ) tx = −∞ 1 (x−α)/β tx e e dx + 2β 1 1 ex( β +t) 1 ( β +t) ∞ α = = d. since this is just the sum of this pmf. to get E(etX ) = . MX (t) = EX = ∞ tx e−λ λx x=0 e x! 1 · EX 2 −(EX) 1 = e−λ t = VarX. a + 2. Now use the fact ∞ r+x−1 x=0 x (1 − p)e 1 − (1 − p)e p 1−(1−p)et t r = 1 for (1 − p)et < 1.31 Since the mgf is deﬁned as MX (t) = EetX . 0 c2 α − 1). . a+b since the last summation is 1. But t/(1 − t) is 0 at t = 0. 2. E(e tX tX nab(n + a + b) .

−(1 − p)e t t=0 t=0 p(1 − p) p2 2 d Mx (t) dt2 = 1−p . p2 Now com- 1 1 c. Mx (t) = x=0 etx p(1 − p)x = p x=0 ((1 − p)e ) t x 1 = p t 1−(1 − p)e EX = = EX 2 = d Mx (t) dt = = p t. = eµt+ σ 2 t2 2 Then we have Mx (t) = e[2µσ EX = EX 2 = d dt Mx (t) t=0 2 2 2 2 1 ∞ t+(σ 2 t)2 ]/2σ 2 √ 1 e− 2σ2 (x−(µ+σ t)) dx 2πσ −∞ 2 2 . ∞ ∞ d2 Mx (t) = dt2 t=0 2 2 λet eλ(e t −1) λet +λet eλ(e t −1) t=0 = λ2 + λ.Second Edition 2-11 EX 2 = VarX = EX − (EX) = λ2 + λ − λ2 = λ. . σ2 = 1) (substitute y = log x. 2. Mx (t) = −∞ etx √2πσ e−(x−µ) /2σ dx = √2πσ plete the square in the numerator by writing 2 2 2 2 ∞ e−(x −2µx−2σ tx+µ )/2σ dx. dy = (1/x)dx) ∞ −∞ ∞ = = ey(r−1) e−y −∞ ∞ 2 /2 y e dy e−y −∞ 2 /2+ry 1 dy = √ 2π e−(y 2 −2ry+r 2 )/2 r 2 /2 e dy = er /2 .35 a. p2 2 2 p(1 − p) + 2(1 − p) (1 − p) − p2 p2 2 2 = 1−p . ∞ d Mx (t) = dt2 t=0 2 2 2 t=0 2 µt+σ t /2 t/2 t=0 = µ2 + σ 2 . −∞ x2 − 2µx − 2σ 2 tx+µ2 = x2 − 2(µ + σ 2 t)x ± (µ + σ 2 t)2 + µ2 = (x − (µ + σ 2 t))2 − (µ + σ 2 t)2 + µ2 = (x − (µ + σ 2 t))2 − [2µσ 2 t + (σ 2 t)2 ]. p t=0 t 2 1−(1 − p)e = p(1 − p)e t +p(1 − p)e 2 1−(1 − p)e t 4 t t (1 − p)e t (1 − (1 − p)e ) p (1 − p) + 2p (1 − p) p4 2 3 2 2 t=0 = VarX = ∞ = p(1 − p) + 2(1 − p) . EX r 1 = 0 xr √ 1 √ 2π 1 √ 2π 2 2 1 e−(log x) /2 dx 2πx (f1 is lognormal with µ = 0. +σ 2 eµt+σ 2 VarX = µ + σ − µ = σ 2 . b. = (µ+σ 2 t)eµt+σ (µ+σ 2 t) e t /2 2 2 = µ. 1−(1 − p)e −p (1 − (1 − p)e ) t 2 t < − log(1 − p).

2. d dt 1 1 dx t x2 = − t1 . The functions look like t2 /2 – it is impossible to see any diﬀerence. Make the transformation y = ex to get the densities in Example 2. dt 1 − t (1 − t) . The graph looks very similar to Figure 2. x→∞ ∞ k Then for any k > 0. dλ λ λ c. dy = (1/x)dx) ∞ 2 2 1 √ e(r −y )/2 sin(2πy)dy 2π −∞ (sin(a + 2πr) = sin(a) if r = 0. c. 2. the integrand is an odd function so the negative integral cancels the positive one. The mgf of f1 is eK1 (t) . ∞ xr f1 (x) sin(2πlog x)dx = 0 = = = 2 2 2 2 1 e−(log x) /2 sin(2π log x)dx 2πx 0 ∞ 2 1 e(y+r)r √ e−(y+r) /2 sin(2πy + 2πr)dy 2π −∞ (substitute y = log x. . x d 2. Verify d dx b. . dx 0 e−λt dt = e−λx . (1−t)2 Verify d dt (x − t)−2 dx = 1 d −1 −(x − t) dt = 1 d 1 1 = 2. x Hence Mx (t) does not exist. t2 ∞ d. = = d dλ 1 −te−λt dt = − Γ(2) = − λ2 .2-12 Solutions Manual for Statistical Inference b.37 a. b. Verify λ2 ∞ e−λt dt = 0 d 1 1 = − 2. 2.36 First. d dλ ∞ −λt e dt 0 x e−λt dt = 0 ∞ d −λt e dt 0 dλ d 1 − e−λt dx λ ∞ 0 x = 0 d dx 1 1 − e−λx + λ λ = e−λx .3.2 except that f1 is symmetric around 0 (since it is standard normal).3.10. there is a constant c such that 1 tx ( log x)2 /2 e e dx ≥ c x 1 ∞ dx = c log x|k = ∞. d dt ∞ 1 dx 1 (x−t)2 = ∞ d dt 1 ∞ dx = 2(x − t)−3 dx = −(x − t) ∞ −2 = 1 1 . d. 1.39 a.) 0. The mgf of f2 is eK2 (t) . ∞ xr √ because e(r −y )/2 sin(2πy) = −e(r −(−y) )/2 sin(2π(−y)). 2 Verify d dt 1 t 1 d dx = 2 x dt 1 (x−t)2 − ∞ 1 1 x 1 = t d dt −1 + 1 t =− 1 . x→∞ ∞ k lim tx − (log x)2 = lim tx = ∞. hence. x→∞ tx lim and. . it can be shown that 2 lim etx−(log x) = ∞ x→∞ 2 by using l’Hˆpital’s rule to show o tx − (log x)2 = 1.

09331 for K = 51. . . K) where M = number of defectives in the lot and K = sample size.1 The pmf of X is f (x) = 1 N1 −N0 +1 . 12 1 N1 −N 0 +1 EX 2 − EX 3. 2 N 1 N1 (N 1 +1) (N 0 −1)(N 0 −1 + 1) − 2 2 Similarly. the probability is at most P (X = 0 or 1 | M = 100. By trial and error we ﬁnd P (X = 0 or 1) = . The only occurrence in the ﬁrst four seconds. for 6 or more defectives. a. . N1 x = N0 . Thus the probability of waiting exactly four seconds before starting to cross is [1 − p(1 − p)3 ](1 − p)3 . K) = 6 0 94 K 100 K = (100 − K) · · · · · (100 − K − 5) .10220 for K = 50 and P (X = 0 or 1) = . using the formula for Ex2 VarX = = x2 .09182 for K = 32. Now P (accept lot) = P (X = 0 or 1). 100 · · · · · 95 By trial and error we ﬁnd P (X = 0) = . for which the pedestrian does not wait the entire four seconds. N = 6. an event with probability (1 − p)3 . is to have a car pass in the ﬁrst second and no other car pass. 3.2 Let X = number of defective parts in the sample. So the sample size must be at least 51. So the sample size must be at least 32. If there are 6 or more defectives in the lot.10056 for K = 31 and P (X = 0) = . M. This has probability p(1 − p)3 . . then the probability that the lot is accepted (X = 0) is at most P (X = 0 | M = 100. N = 6. b. no car must pass in the last three seconds. Then X ∼ hypergeometric(N = 100. N1 . K) = 6 0 94 K 100 K + 6 1 94 K−1 100 K . Then N1 N0 −1 EX = x=N 0 x 1 1 = N1 −N 0 +1 N1 −N 0 +1 x− x=1 x=1 x = = 1 N1 −N 0 +1 N1 + N0 . and. .3 In the seven seconds for the event.Chapter 3 Common Families of Distributions 3. N0 + 1. we obtain N1 (N 1 +1)(2N 1 +1) − N0 (N 0 −1)(2N 0 −1) 6 (N 1 −N 0 )(N 1 −N 0 +2) = .

and the answer based on the approximation equals the exact answer. Since the 1000 customers choose randomly.33) ≈ .8x . even if the new drug is no better than the old. we cannot conclude the new drug is better. Note that using a normal approximation to calculate this binomial probability yields P (X ≥ 85) ≈ P (Z ≥ 1.01 Z> < . If the new and old drugs are equally eﬀective. then the probability that the new drug is eﬀective on a case is .01.8 a.01 where X ∼ binomial(1000. From the normal table we get P (Z > 2.Then 1000 1000 x=N +1 1000 x < .6384. Therefore. we take p = 1/2. 1 b. 1/2). P (X ≤ 1) = e−λ + λe−λ ≤ . 250). Solving e−λ + λe−λ = . . To use the normal approximation we take X ∼ n(500.01.1285. Hence. P (X > N ) = P thus. 3. P X − 500 N − 500 √ > √ 250 250 N − 500 √ 250 < . This last inequality can be used to solve for N .2100−x = . that is. If the cases are independent then X ∼ binomial(100. x So. and 100 P (X ≥ 85) = x=85 100 .8).3-2 Solutions Manual for Statistical Inference 3. that is. the chance of 85 or more eﬀective cases is not too small.125) = .01 where Z ∼ n(0. N is the smallest integer that satisﬁes 1 2 The solution is N = 537. 3.1303.01 ⇒ N − 500 √ = 2. each theater should have at least 537 seats.7 Let X ∼ Poisson(λ).5 Let X = number of eﬀective cases. where we used µ = 1000( 2 ) = 500 1 1 2 and σ = 1000( 2 )( 2 ) = 250.0099 < .99.33 250 ⇒ N ≈ 537. We thus require 1000 P (X > N ) = x=N +1 1000 x 1 2 x 1− 1 2 1000−x < .01 which implies that 1 2 1000 1000 x=N +1 1000 x < . We want P (X > N ) < . We want P (X ≥ 2) ≥ .8. 1).01 by trial and error (numerical bisection method) yields λ = 6.01. .

14 is used to get the penultimate equality. Let X be the number of elementary schools in New York state that have 5 or more sets of twins entering kindergarten. We can think of each one of the 60 children entering kindergarten as 60 independent Bernoulli 1 trials with probability of success (a twin birth) of approximately 90 . Therefore P (X ≥ 1) = 1 − P (X = 0) = . lim N −M N − M − (K − x) N −K N N N −M N −M →∞ = eK−x and N →∞ lim = e−K .N →∞ lim M M +1/2 (N −M ) N N +1/2 (M −x) N −M +1/2 (N −K) N −K+1/2 M −x+1/2 (N −M −K+x) N −M −(K−x)+1/2 . c. . b. Therefore P (X ≥ 1) = 1 − P (X = 0) = 1 − 3. Then the probability of interest is P (X ≥ 1) where X ∼ binomial(310. √ √ M ! = (M !/( 2πM M +1/2 e−M )) 2πM M +1/2 e−M √ √ and M !/( 2πM M +1/2 e−M ) → 1. Then X ∼ binomial(60. N → ∞ and M/N → p imply N − M → ∞.M →∞.90 × 10−41 ≈ 1.M →∞.. When this replacement is made. e−x Lemma 2.N →∞ = K! M !(N −M )!(N −K)! lim x!(K−x)! M/N →p.1698).0006). Thus. lim M x N −M K−x N K M/N →p.0006. In some limits we use the fact that M → ∞. all the 2π and exponential terms cancel. lim K x M x N −M K−x N K M/N →p. .1698. 3.11 a. which is small and may be rare enough to be newsworthy. The probability of having 5 or more successes approximates the probability of having 5 or more sets of twins entering 1 kindergarten. Then the probability of interest is P (X ≥ 1) where X ∼ binomial(500.M →∞. for example. Let X be the number of States that have 5 or more sets of twins entering kindergarten during any of the last ten years. each of which has a ﬁnite limit we can evaluate.Second Edition 3-3 3.3.N →∞ = M/N →p. The ﬁrst term (of the seven terms) is lim M M −x M M →∞ = lim 1 M −x M M M →∞ = lim 1 M 1+ −x M M →∞ = 1 = ex .N →∞ N !(M −x)!(N −M −(K−x))! In the limit. We can evaluate the limit by breaking the ratio into seven terms.9 a. 90 ) and 4 P (X ≥ 5) = 1 − x=0 60 x 1 90 x 1− 1 90 60−x = .M →∞. Similarly we get two more terms. each of the factorial terms can be replaced by the approximation from Stirling’s formula because.

The only term left is lim (M − x) (N − M − (K − x)) (N − K) M −x N −K K x K−x M/N →p. the product of these three limits is one. KM →λ x! N N lim 1 lim x! N.K→∞.K→∞. M K/N → pK → λ. Deﬁne X = number of successes in ﬁrst n trials and Y = number of failures before the rth success. Then X and Y have the speciﬁed binomial and hypergeometric distributions. If in (a) we in addition have K → ∞. we get lim M x N −M K−x N K −x N. M →0.K→∞.N →∞ lim N − M − (K − x) N −K K−x = px (1 − p)K−x .M. KM →λ K N N e−λ λx . x! 3. respectively. by the Poisson approximation to the binomial.M. . x x! c.M →∞.3-4 Solutions Manual for Statistical Inference Note. Three other terms are lim M → ∞ lim M M −x 1/2 = 1 1/2 N −M →∞ N −M N − M − (K − x) N −K N x 1/2 = 1 and N →∞ lim = 1.M. KM →λ N N KM N x K−x K−x e N −M N K K−x = = MK 1 x λ lim 1− N x! N. KM →λ N N = = e K x ex M x ex (N −M ) N K eK N.K→∞. b.N →∞ = M/N →p. we heuristically get M x N −M K−x N K → K x e−λ λx p (1 − p)K−x → . M →0. p → 0. M →0. And we have Fx (r − 1) = = = = = = P (X ≤ r − 1) P (rth success on (n + 1)st or later trial) P (at least n + 1 − r failures before the rth success) P (Y ≥ n − r + 1) 1 − P (Y ≤ n − r) 1 − FY (n − r). Using Stirling’s formula as in (a). M →0.M.M →∞.12 Consider a sequence of Bernoulli trials with success probability p.

1. EX .Second Edition 3-5 3. . p2 log p ∞ (1−p) d log p dp (1 − p)x = x=1 (1−p) d 1−p log p dp p (1−p) . . = λ/(1 − e−λ ) = (λ2 + λ)/(1 − e−λ ) − (λ/(1 − e−λ ))2 . ∞ −(1−p)x x=1 x log p pr (1 − p)0 = 1 − pr . x = 1. . 1−pr r(1 − p) p(1 − pr ) r(1 − p) + r2 (1 − p)2 r(1 − p) − . Mx (t) = −1 log p ∞ (1−p)e x=1 t x = log(1+pet −et ) log p and can be diﬀerentiated to obtain the moments. the mgf can be calculated. . r−1 0 b. Since the geometric series converges uniformly. P (X > 0) 2 In a similar way we get EXT = Thus.13 For any X with support 0. EX P (X > 0) e−λ λ0 0! 2 VarXT = EX 2 − P (X > 0) .14 a. For Poisson(λ). . a. ∞ r+x−1 x x = 1 log p ∞ ∞ −(1−p)x x=1 x EX = −1 log p (1−p) x=1 x = −1 log p (1−p) −1 == x=0 x −1 1 −1 −1 = log p p log p 1−p p . Then = = = pr (1 − p) . p2 (1 − pr ) p(1 − pr )2 = 1. . . we have the mean and variance of the 0−truncated XT are given by ∞ ∞ EXT = x=1 xP (XT = x) = x=1 ∞ x P (X = x) P (X > 0) 1 xP (X = x) = P (X > 0) x=0 ∞ = 1 xP (X = x) = P (X > 0) x=1 EX 2 P (X>0) . P (X > 0) = 1 − P (X = 0) = 1 − P (XT = x) EXT VarXT 3. log p −(1−p) p2 log p 1+ Alternatively. b. . . P (X > 0) = 1 − P (X = 0) = 1 − P (XT = x) EXT VarXT = e−λ λx x!(1−e−λ ) = 1 − e−λ . therefore x = 1. For negative binomial(r. since the sum is the Taylor series for log p.. . EX 2 = = Thus VarX = −1 log p ∞ x(1 − p)x = x=1 ∞ (1−p) log p d (1 − p)x dp x=1 = −(1−p) . 2. p). 2.

and. we obtain √ √ ∞ √ √ π √ 2 2 −z2 /2 e zdz = 2 e−z /2 dz = 2 √ = π. 3. the moment generating function of a gamma(r. u = tα and dv = e−t dt. this formula is valid for all ν > −α. the Poisson moment generating function. . . 3.3-6 Solutions Manual for Statistical Inference 3.15. Γ(α)β α Γ(α) xν ∞ 1 Γ(α)β α ∞ x(ν+α)−1 e−x/β dx 0 Note. 3. If X ∼ gamma(α. 3. pt 1−t (p − 1)te +ept p→0 p pt 1−(1 − p)e = lim p→0 so the moment generating function converges to (1 − t)−r . The probabilistic relationship is P (X ≥ x) = P (Y ≤ α − 1). Now use L’Hˆpital’s rule to calculate o 1 1 = . MpY (t) = lim p 1−(1−p)ept r p 1−(1−p)et r . 1+x2 thus the moment generating function does not exist.16 a.17 EX ν = = 1 xα−1 e−x/β dx = Γ(α)β α 0 Γ(ν+α)β ν+α β ν Γ(ν+α) = . as r → ∞. This will establish the formula.3. −∞ 1+x2 On (0.3.18 If Y ∼ negative binomial(r.3. p)..21 The moment generating function would be deﬁned by ∞ 0 1 π ∞ etx dx. ∞).15 The moment generating function for the negative binomial is M (t) = the term r(1 − p)(e −1) λ(e −1) → = λ(et − 1) t 1 1−(1 − p)e t t p t 1−(1 − p)e r = 1+ 1 r(1 − p)(e −1) r 1−(1 − p)et t r . The expectation does not exist for ν ≤ −α.14). 3. 1) and Y ∼ Poisson(x).19 Repeatedly apply the integration-by-parts formula 1 Γ(n) ∞ z n−1 z −z dz = x xn−1 e−x 1 + (n − 1)! Γ(n − 1) ∞ z n−2 z −z dz. hence etx dx > 1+x2 ∞ 0 x dx = ∞. x until the exponent on the second integral is zero. b. Making the change of variable z = ∞ ∞ Γ(1/2) = 0 t−1/2 e−t dt = 0 2t. etx > x. t = z 2 /2. its moment generating function is MY (t) = from Theorem 2. p → 1 and r(p − 1) → λ. 1).14.e. z 2 0 where the penultimate equality uses (3. Thus by Lemma 2. Using integration by parts with. the negative binomial moment generating function converges to t eλ(e −1) . we obtain ∞ Γ(α + 1) = 0 t(α+1)−1 e−t dt = tα (−e ) √ −t ∞ ∞ − 0 0 αtα−1 (−e−t )dt = 0 + αΓ(α) = αΓ(α). i.

(α+β)Γ(α+β)Γ(α) α+β (α+1)αΓ(α)Γ(α+β) α(α+1) = = . p2 ∞ 2 2 = r(r + 1) = c. = − 2 (α+β)(α+β+1) (α+β)2 (α+β) (α+β+1) = . ∞ EX 2 = 0 x2 1 xα−1 e−x/β dx = Γ(α)β α 1 Γ(α)β α xα+1 e−x/β dx 0 = VarX d. p2 2 where in the second equality we substituted y = x − 2. VarX = EX(X − 1) + EX − (EX)2 r(1 − p) r2 (1 − p) (1 − p) + − 2 p p p2 r(1 − p) .Second Edition 3-7 3. (α+β+1)(α+β)Γ(α+β)Γ(α) (α+β)(α+β+1) αβ α(α+1) α2 = . ∞ λx−2 (x−2)! x=2 λy y! y=0 ∞ = e−λ λ2 eλ = λ2 = λ2 + EX = λ2 + λ = EX 2 − (EX)2 = λ2 + λ − λ2 = λ. Thus. Γ(α)β α EX 2 − (EX)2 = α(α + 1)β 2 − α2 β 2 = αβ 2 . ∞ E(X(X−1)) = x=0 x(x − 1) ∞ e−λ λx x! (let y = x − 2) = e−λ λ2 = e−λ λ2 EX 2 VarX b. Γ(α+1)Γ(α+β) Γ(α+β+1)Γ(α) Γ(α+2)Γ(α+β) Γ(α+β+2)Γ(α) EX 2 − (EX)2 αΓ(α)Γ(α+β) α = . E(X(X−1)) = x=0 ∞ x(x − 1) r(r + 1) x=2 r+x−1 pr(1 − p)x x r+x−1 pr(1 − p)x x−2 2 ∞ y=0 = = r(r + 1) = r(r − 1) (1 − p) p2 r+2+y−1 pr + 2(1 − p)y y (1 − p) . and in the third equality we use the fact that we are summing over a negative binomial(r + 2.3. (Use 3.18) EX EX 2 VarX = = = = 1 Γ(α + 2)β α+2 = α(α + 1)β 2 .22 a. p) pmf.

The double exponential(µ. γ 1 Thus EY = β 1/γ Γ( γ + 1) and VarY = β 2/γ Γ 1 γ +1 /2 . ∞ x−β−1 dx = α −1 −β x β ∞ = α 1 . b. For Y = X 1/γ . we calculate γ −y γ /β γ−1 y . x > 0. where we take u = y 2 . fx (x) = β e−x/β . EX n = βαn (n−β) . 3. βαβ thus f (x) integrates to 1 . Thus VarY = 2(1 − π/4). the variance of a standard normal. The gamma(a. 1 xa−1 e−x/b . and the integrand is sym2π metric. 1 3. βe y > 0. dv = ye−y c.23 a. Thus.26.24 a. therefore EX EX 2 VarX = = = αβ (1 − β) αβ 2 (2 − β) αβ 2 (αβ) − 2 2−β (1−β) 2 c. fY (y) = ye−y ∞ . EX = µ. σ) pdf is symmetric about µ. y > 0 . b. by Exercise 2. If β < 2 the integral of the second moment is inﬁnite.3-8 Solutions Manual for Statistical Inference e. Use integration-by-parts to calculate the second moment ∞ EY 2 = 0 y 3 e−y 2 2 ∞ /2 dy = 2 0 ye−y 2 /2 dy = 2. βe x > 0. Γ(a)ba fX (x) = Make the transformation y = 1/x with dx = −dy/y 2 to get fY (y) = fX (1/y)|1/y 2 | = 1 Γ(a)ba 1 y a+1 e−1/by . Using the transforma- EY n = γ β ∞ y γ+n−1 e−y 0 γ ∞ /β dy = β n/γ 0 2 γ +1 z n/γ e−z dz = β n/γ Γ −Γ2 2 n +1 . b) density is /2 . fx (x) = 1 −x/β . . For Y = (2X/β)1/2 . ∞ VarX = −∞ (x − µ)2 ∞ 1 −|x−µ|/σ e dx = 2σ 1 σz 2 e−|z| σdz 2 −∞ ∞ = σ2 0 z 2 e−z dz = σ 2 Γ(3) = 2σ 2 . fY (y) = tion z = y γ /β. We now notice that √ y e 2 −y 2 /2 EY = 0 ∞ 2 dy = 2π 2 since √1 −∞ y 2 e−y /2 = 1.

The constant I1 = . fx (x) = = = ∞ 1 Γ(a − 1)ba−1 1 e−1/by = a Γ(a)b 0 y Γ(a)ba Γ(a − 2)ba−2 1 = . since it is the variance of a n(0. Γ(a)ba (a − 1)(a − 2)b2 a = 1 (a − 1)b 1 (a−1)2 (a−2)b2 . 2 0 e. hT (t) = Also. using Γ(3/2) = 1 π. fY (y) = 2 ∞ 0 calculate the moments we use integration-by-parts with u = y . −∞ < y < ∞. Γ(3/2) y e −y 2 y > 0. t ≤ T ) P (t ≤ T ) P (t ≤ T ≤ t+δ) P (t ≤ T ) FT (t+δ) − F T (t) . For Y = α − γ log X. I2 = 0 (log x)2 e−x dx. and EY 2 = E(α − γ log x)2 = α2 − 2αγI1 + γ 2 I2 . 1 x3/2−1 e−x/β . If we deﬁne ∞ ∞ α−y α−y 1 γ. hT (t) = fT (t) (1/β)e−t/β 1 = .5772157 is called Euler’s constant. 3. then EY = E(α − γ log x) = α − γI1 . 1−F T (t) 1 1 − FT (t) = δ→0 lim FT (t + δ) − FT (t) δ = FT (t) 1 − FT (t) = fT (t) . x > 0. Thus. fY (y) = e−e γ e γ of EY and EY 2 cannot be done in closed form. dv = ye−y to obtain EY 2 = 2 Γ(3/2) ∞ 0 ∞ 2 y 4 e−y dy = 2 3 Γ(3/2) ∞ 0 y 2 e−y dy = 2 3 √ π. dv = ye EY = 2 Γ(3/2) ∞ 0 2 2 2 −y 2 . Γ(3/2)β 3/2 x > 0. fT (t) = 1 −t/β βe P (t ≤ T ≤ t+δ. For Y = (X/β)1/2 . = −t/β 1−F T (t) β 1−(1 − e ) . − 3. 1−F T (t) d 1 (log[1 − F T (t)]) = − (−fT (t)) = hT (t). symmetry yields √ √ 2 ∞ 2 −y y e dy = π. Γ(3/2) 1 Using the fact that 2√π −∞ y 2 e−y = 1. dt 1−F T (t) t 1 −x/β e dx 0 β and FT (t) = = − e−x/β t 0 = 1 − e−t/β .25 Note that if T is continuous then. VarY = 6 − 4/π. Calculation I1 = 0 log xe−x dx. To to obtain y 3 e−y dy = 2 2 Γ(3/2) ye−y dy = 2 1 Γ(3/2) and with u = y 3 .Second Edition 3-9 The ﬁrst two moments are EY EY 2 and so VarY = d. Thus.26 a. fx (x) = e−x . P (t ≤ T ≤ t+δ|t ≤ T ) = = = Therefore from the deﬁnition of derivative. 2).

c(β) = B(α. Thus. w1 (β) = β − 1. 1 exp 2πσ −1 (x − µ)2 . (i) α known. t2 (x) = x. (ii) β known. . c(σ 2 ) = (ii) σ 2 known. h(x) = (1 − x) I[0. σ 2 ) pdf f (x). t ≥ 0 and FT (t) = γ 1 − e−t /β . x > 0. where u = xγ/β . Hence the pdf is unimodal with mode µ.27 a. β and fT (t) = (1+e−(t−µ)/β ) . 1 c. Γ(α)β α w1 (α) = α − 1.∞) α−1 α+β−2 . For the gamma(α. FT (t) = 1 1+e−(t−µ)/β (γ/β)tγ−1 e−t e−tγ /β e−(t−µ)/β 2 γ /β = γ γ−1 t . β) pdf f (x). ignoring constants. Γ(α)β α β 1 h(x) = I{x>0} (x). 1 β−1 (ii) β known. Hence the pdf is unimodal with mode 3. t1 (x) = log(1 − x). dx f (x) = x β [β(α−1) − x]. 1 e−(t−µ)/β 1+e−(t−µ)/β hT (t) = 1 −(t−µ)/β (1+e−(t−µ)/β )2 e β = 1 FT (t). α Γ(α)β c(β) = 1 βα .β) . β 3. t1 (x) = log x. d c. −x2 2σ 2 1 x2 exp − 2 2σ 2πσ √1 2πσ exp − . 2 α−2 −x/β /2σ 2 .3-10 γ Solutions Manual for Statistical Inference γ b. t1 (x) = log x. Thus.β) . which only has d f (x) = xα−2 (1 − x)β−2 [(α−1) − x(α+β−2)] . β) = 1 x exp((α − 1) log x − ). √ 1 I (σ 2 ). w2 (α.28 a. 2πσ 2 (0. w1 (x) = α − 1. The uniform pdf satisﬁes the inequalities of Exercise 2. β) = Γ(α)β α . c(α. c(α) = B(α. f (x|µ) = √ h(x) = exp b. µ2 2σ 2 . x > 0. β) = −1/β. 1 w1 (σ 2 ) = − 2σ2 .27. t1 (x) = log x. For the n(µ. t γ γ−1 −xγ /β x e dx 0 β = tγ/β 0 e−u du = − e−u |0 tγ/β = hT (t) = c. For the beta(α. d e b. dx which only has one sign change. ignoring constants. 1 Γ(α)β α 1 exp((α − 1) log x).1] (x). f (x|σ 2 ) = √ h(x) = 1. β unknown. which only has one sign change. β) pdf f (x). α−1 −x 1 xα−1 e β . (i) α known. c(α) = (iii) α. dx f (x) = x−µ e−(−x/β) σ2 one sign change. σ2 x σ2 . (i) µ known. c(µ) = exp −µ2 2σ 2 w1 (µ) = µ. t1 (x) = f (x|β) = h(x) = x Γ(α) .1] (x). 2σ 2 t1 (x) = (x − µ)2 . exp µ x . f (x|α. ignoring constants. hence is unimodal. w1 (β) = 1 β. d. t1 (x) = −x. Hence the pdf is unimodal with mode β(α − 1). w1 (α) = α − 1. f (x|α) = e−x/β h(x) = e−x/β . fT (t) = β tγ−1 e−t /β . h(x) = xα−1 I[0.

h(x) = r − 1 I{r.η2 ):η1 > −1. For the n(µ. t1 (x) = x.1] (x). so the natural parameter is (η1 . 0 = ∂ ∂θ k h(x)c(θ) exp i=1 k wi (θ)ti (x) dx = h(x)c (θ) exp i=1 k wi (θ)ti (x) dx k + h(x)c(θ) exp i=1 wi (θ)ti (x) i=1 k ∂wi (θ) ti (x) dx ∂θj k = h(x) ∂ logc(θ) c(θ) exp ∂θj k wi (θ)ti (x) dx + E i=1 i=1 ∂wi (θ) ti (x) ∂θj = Therefore E ∂ logc(θ) + E ∂θj k ∂wi (θ) i=1 ∂θj ti (x) i=1 ∂wi (θ) ti (x) ∂θj ∂ = − ∂θj logc(θ). 2 2 3.η2 < 0}. d.} (x). −1/β) with natural parameter space {(η1 . −∞ < η2 < ∞}. Γ(α)β α so the natural parameter is (η1 . η2 ) = (α − 1.β) . h(x) = x! I{0.31 a. w1 (p) = log(1 − p). σ 2 ) f (x) = 1 √ 2π e−µ /2σ σ e−x 2 /2σ 2 +xµ/σ 2 .1. c(θ) = e−θ . f (x) = Γ(α+β) Γ(α)Γ(β) e(α−1) log x+(β−1) log(1−x) . c(p) = x−1 p 1−p r . 1 d. For the negative binomial(r. 3. β unknown. w1 (θ) = log θ. w1 (α) = α − 1. β − 1) and the natural parameter space is {(η1 .η2 ):η1 > −1... η2 ) = (−1/2σ 2 . 1 h(x) = I[0. β) = B(α. c(α. For the beta(α.2. so the natural parameter is (η1 ..η2 > −1}. e. β). µ/σ 2 ) with natural parameter space {(η1 . β). t1 (x) = x.29 a. b. For the gamma(α. r known. 1 f (x) = e(α−1) log x−x/β . P (X = x) = r+x−1 x (pr ) ex log (1−p) . e.η2 ):η1 < 0. t2 (x) = log(1 − x). . For the Poisson 1 f (x) = e−θ exlogθ x! so the natural parameter is η = log θ and the natural parameter space is {η:−∞ < η < ∞}. p).Second Edition 3-11 (iii) α.r+1.} (x). η2 ) = (α − 1.. c.. t1 (x) = log x.. w2 (β) = β − 1. so the natural parameter is η = log(1 − p) with natural parameter space {η:η < 0}.

3-12

Solutions Manual for Statistical Inference

b. 0 = ∂2 ∂θ2
k

h(x)c(θ) exp
i=1 k

wi (θ)ti (x) dx

=

h(x)c (θ) exp
i=1

wi (θ)ti (x) dx
k k

+

h(x)c (θ) exp
i=1 k

wi (θ)ti (x)
i=1 k

∂wi (θ) ti (x) dx ∂θj ∂wi (θ) ti (x) dx ∂θj
2

+

h(x)c (θ) exp
i=1 k

wi (θ)ti (x)
i=1 k

+

h(x)c(θ) exp
i=1 k

wi (θ)ti (x)
i=1 k

∂wi (θ) ti (x) ∂θj

dx

+

h(x)c(θ) exp
i=1 2

wi (θ)ti (x)
i=1 k

∂ 2 wi (θ) ti (x) dx 2 ∂θj

=

h(x)

∂ 2 logc(θ) c(θ) exp ∂θj
2 k

wi (θ)ti (x) dx
i=1

+

c (θ) h(x) c(θ)

c(θ) exp
i=1 k

wi (θ)ti (x) dx

+2

∂ logc(θ) E ∂θj
k

i=1

∂wi (θ) ti (x) ∂θj
k

+E (
i=1 2

∂wi (θ) ti (x))2 + E ∂θj
2

i=1

∂ 2 wi (θ) ti (x) 2 ∂θj

=

∂ ∂ 2 logc(θ) + ∂θ logc(θ) ∂θj j
k

−2E
i=1 k

∂wi (θ) ti (x) E ∂θj

k

i=1

∂wi (θ) ti (x) ∂θj
k

+E (
i=1

∂wi (θ) ti (x))2 + E ∂θj
k

i=1

∂ 2 wi (θ) ti (x) 2 ∂θj
k

=

∂2 2 logc(θ) + Var ∂θj
k ∂wi (θ) i=1 ∂θj ti (x)

i=1
2

∂wi (θ) ti (x) ∂θj

+E
i=1

∂ 2 wi (θ) ti (x) . 2 ∂θj .

Therefore Var

∂ = − ∂θ2 logc(θ) − E
j

k ∂ 2 wi (θ) ti (x) 2 i=1 ∂θj 1 2θ ,

3.33 a. (i) h(x) = ex I{−∞<x<∞} (x), c(θ) = (ii) The nonnegative real line.

√1 2πθ

exp( −θ )θ > 0, w1 (θ) = 2

t1 (x) = −x2 .

1 b. (i) h(x) = I{−∞<x<∞} (x), c(θ) = √2πaθ2 exp( −1 ) − ∞ < θ < ∞, a > 0, 2a 1 1 w1 (θ) = 2aθ2 , w2 (θ) = aθ , t1 (x) = −x2 , t2 (x) = x. (ii) A parabola.

Second Edition
1 α c. (i) h(x) = x I{0<x<∞} (x), c(α) = Γ(α) α > 0, w1 (α) = α, w2 (α) = α, t1 (x) = log(x), t2 (x) = −x. (ii) A line. d. (i) h(x) = C exp(x4 )I{−∞<x<∞} (x), c(θ) = exp(θ4 ) − ∞ < θ < ∞, w1 (θ) = θ, w2 (θ) = θ2 , w3 (θ) = θ3 , t1 (x) = −4x3 , t2 (x) = 6x2 , t3 (x) = −4x. (ii) The curve is a spiral in 3-space. (iii) A good picture can be generated with the Mathematica statement
α

3-13

ParametricPlot3D[{t, t^2, t^3}, {t, 0, 1}, ViewPoint -> {1, -2, 2.5}]. 3.35 a. In Exercise 3.34(a) w1 (λ) =
1 2λ

and for a n(eθ , eθ ), w1 (θ) =

1 . 2eθ

µ 1 b. EX = µ = αβ, then β = α . Therefore h(x) = x I{0<x<∞} (x), αα c(α) = Γ(α)( µ )α , α > 0, w1 (α) = α, w2 (α) = α , t1 (x) = log(x), t2 (x) = −x. µ
α

c. From (b) then (α1 , . . . , αn , β1 , . . . , βn ) = (α1 , . . . , αn , α1 , . . . , αn ) µ µ
1 3.37 The pdf ( σ )f ( (x−µ) ) is symmetric about µ because, for any σ

> 0, 1 f σ (µ− )−µ σ .

1 f σ

(µ+ )−µ σ

=

1 f σ σ

=

1 f − σ σ

=

Thus, by Exercise 2.26b, µ is the median. 3.38 P (X > xα ) = P (σZ + µ > σzα + µ) = P (Z > zα ) by Theorem 3.5.6. 3.39 First take µ = 0 and σ = 1. a. The pdf is symmetric about 0, so 0 must be the median. Verifying this, write

P (Z ≥ 0) =
0 ∞

1 1 1 dz = tan−1 (z) 2 π 1+z π

=
0

1 π

π 1 −0 = . 2 2

1 1 b. P (Z ≥ 1) = π tan−1 (z) 1 = π π − π = 1 . By symmetry this is also equal to P (Z ≤ −1). 2 4 4 Writing z = (x − µ)/σ establishes P (X ≥ µ) = 1 and P (X ≥ µ + σ) = 1 . 2 4

3.40 Let X ∼ f (x) have mean µ and variance σ 2 . Let Z = EZ = and VarZ = Var X −µ σ = 1 σ2 1 σ

X−µ σ .

Then

E(X − µ) = 0

Var(X − µ) =

1 σ2

VarX =

σ2 = 1. σ2

Then compute the pdf of Z, fZ (z) = fx (σz +µ)·σ = σfx (σz +µ) and use fZ (z) as the standard pdf. 3.41 a. This is a special case of Exercise 3.42a. b. This is a special case of Exercise 3.42b. 3.42 a. Let θ1 > θ2 . Let X1 ∼ f (x − θ1 ) and X2 ∼ f (x − θ2 ). Let F (z) be the cdf corresponding to f (z) and let Z ∼ f (z).Then F (x | θ1 ) = P (X1 ≤ x) = P (Z + θ1 ≤ x) = P (Z ≤ x − θ1 ) = F (x − θ1 ) ≤ F (x − θ2 ) = P (Z ≤ x − θ2 ) = P (Z + θ2 ≤ x) = P (X2 ≤ x) = F (x | θ2 ).

3-14

Solutions Manual for Statistical Inference

The inequality is because x − θ2 > x − θ1 , and F is nondecreasing. To get strict inequality for some x, let (a, b] be an interval of length θ1 − θ2 with P (a < Z ≤ b) = F (b) − F (a) > 0. Let x = a + θ1 . Then F (x | θ1 ) = F (x − θ1 ) = F (a + θ1 − θ1 ) = F (a) < F (b) = F (a + θ1 − θ2 ) = F (x − θ2 ) = F (x | θ2 ).

b. Let σ1 > σ2 . Let X1 ∼ f (x/σ1 ) and X2 ∼ f (x/σ2 ). Let F (z) be the cdf corresponding to f (z) and let Z ∼ f (z). Then, for x > 0, F (x | σ1 ) = P (X1 ≤ x) = P (σ1 Z ≤ x) = P (Z ≤ x/σ1 ) = F (x/σ1 ) ≤ F (x/σ2 ) = P (Z ≤ x/σ2 ) = P (σ2 Z ≤ x) = P (X2 ≤ x) = F (x | σ2 ).

The inequality is because x/σ2 > x/σ1 (because x > 0 and σ1 > σ2 > 0), and F is nondecreasing. For x ≤ 0, F (x | σ1 ) = P (X1 ≤ x) = 0 = P (X2 ≤ x) = F (x | σ2 ). To get strict inequality for some x, let (a, b] be an interval such that a > 0, b/a = σ1 /σ2 and P (a < Z ≤ b) = F (b) − F (a) > 0. Let x = aσ1 . Then F (x | σ1 ) = F (x/σ1 ) = F (aσ1 /σ1 ) = F (a) < F (b) = F (aσ1 /σ2 ) = F (x/σ2 ) = F (x | σ2 ).
1 3.43 a. FY (y|θ) = 1 − FX ( y |θ) y > 0, by Theorem 2.1.3. For θ1 > θ2 ,

FY (y|θ1 ) = 1 − FX

1 θ1 y

≤ 1 − FX

1 θ2 y

= FY (y|θ2 )

for all y, since FX (x|θ) is stochastically increasing and if θ1 > θ2 , FX (x|θ2 ) ≤ FX (x|θ1 ) for 1 1 all x. Similarly, FY (y|θ1 ) = 1 − FX ( y |θ1 ) < 1 − FX ( y |θ2 ) = FY (y|θ2 ) for some y, since if θ1 > θ2 , FX (x|θ2 ) < FX (x|θ1 ) for some x. Thus FY (y|θ) is stochastically decreasing in θ. b. FX (x|θ) is stochastically increasing in θ. If θ1 > θ2 and θ1 , θ2 > 0 then θ1 > θ1 . Therefore 2 1 1 FX (x| θ1 ) ≤ FX (x| θ1 ) for all x and FX (x| θ1 ) < FX (x| θ1 ) for some x. Thus FX (x| θ ) is 1 2 1 2 stochastically decreasing in θ. 3.44 The function g(x) = |x| is a nonnegative function. So by Chebychev’s Inequality, P (|X| ≥ b) ≤ E|X|/b. Also, P (|X| ≥ b) = P (X 2 ≥ b2 ). Since g(x) = x2 is also nonnegative, again by Chebychev’s Inequality we have P (|X| ≥ b) = P (X 2 ≥ b2 ) ≤ EX 2 /b2 . For X ∼ exponential(1), E|X| = EX = 1 and EX 2 = VarX + (EX)2 = 2 . For b = 3, E|X|/b = 1/3 > 2/9 = EX 2 /b2 . √ Thus EX 2 /b2 is a better bound. But for b = 2, √ E|X|/b = 1/ 2 < 1 = EX 2 /b2 . Thus E|X|/b is a better bound.

µ = 1 2 and σ 2 = 1 12 .00674 0.6.0498 0.617 . ∞ ∞ MX (t) = −∞ etx fX (x)dx ≥ a ∞ etx fX (x)dx ≥ eta a fX (x)dx = eta P (X ≥ a). where we use the fact that etx is increasing in x for t > 0.0821 0. µ = λ and σ 2 = λ2 .134 0 0 0 0 exp(λ) exact .135 .0625 .711 . From Example 3.1 .44 . b.47 P (|Z| > t) = = = 1 2P (Z > t) = 2 √ 2π 2 π 2 π ∞ t ∞ t e−x 2 /2 dx 1+x2 −x2 /2 e dx 1+x2 1 −x2 /2 e dx+ 1+x2 ∞ t ∞ t x2 −x2 /2 e dx .942 .5 √ 3 2 4 10 Comparison of probabilities u(0. k .01 So we see that Chebychev’s Inequality is quite conservative. x) must be nonnegative. Chebychev’s Inequality gives the bound P (|X − µ| > kσ) ≤ 1/k 2 . c. 1+x2 . ∞ a MX (t) = −∞ etx fX (x)dx ≥ −∞ a ta −∞ etx fX (x)dx ≥ e fX (x)dx = eta P (X ≤ a). 3. 1). 3.45 a. 1) exact .423 . h(t.25 .2.33 . where we use the fact that etx is decreasing in x for t < 0. 3.926 .0651 0.46 For X ∼ uniform(0. thus P (|X − µ| > kσ) = 1 − P (λ − kλ ≤ X ≤ λ + kλ) = 1 + e−(k+1) − ek−1 e−(k+1) k≤1 k > 1. For X ∼ exponential(λ).5 1 1. thus = 1− 0 2k √ 12 P (|X − µ| > kσ) = 1 − P 1 k 1 k −√ ≤X≤ +√ 2 2 12 12 k< k≥ √ √ 3.0000167 Chebychev 100 4 1 .Second Edition 3-15 3.

b. E(g(X)(X − αβ)) = 0 g(x)(x − αβ) 1 β α xα−1 e−x/β dx.49 a. Let u = g(x) and dv = (β − (α − 1) 1−x )xα−1 (1 − x)β . x ≥ M − (N − k) if x = M − (N − k) − 1 otherwise. ∞ 2 2 t e−t /2 + π 1 + t2 2 2 t e−t /2 + 2 π1+t 2 2 t e−t /2 . the ﬁrst term is zero. . to x2 −x2 /2 e dx = 1 + x2 = 2 x (−e−x /2 ) 2 1+x 2 t e−t /2 + 2 1+t − t ∞ t 2 1 − x2 (−e−x /2 )dx 2 )2 (1 + x t 1 − x2 −x2 /2 e dx. Then Eg(X)(X − αβ) = 1 −g(x)βxα e−x/β Γ(α)β α ∞ ∞ +β 0 0 g (x)xα e−x/β dx . du = g (x). x < M . The expectation is x Γ(α + β) g(x)xα−1 (1 − x)β Γ(α)Γ(β) 1 0 1 + 0 (1 − x)g (x)xα−1 (1 − x)β−1 dx = E((1 − X)g (X)). E g(X) β−(α−1) 1−X x = Γ(α+β) Γ(α)Γ(β) 1 g(x) β − (α − 1) 0 1−x x xα−1 (1 − x)β−1 dx. dv = (x − αβ)xα−1 e−x/β . v = −e−x ∞ 2 /2 .3-16 Solutions Manual for Statistical Inference To evaluate the second term. let u = obtain ∞ t x 1+x2 . assuming the ﬁrst term is zero and the integral exists. P (Z ≥ t) = = ≥ 3. dv = xe−x ∞ 2 /2 dx. and the second term is βE(Xg (X)). Assuming g(x) to be diﬀerentiable. π 1 + t2 2 π 2 π ∞ t ∞ t 1 1 − x2 + 1 + x2 (1 + x2 )2 2 (1 + x2 )2 e−x 2 e−x 2 /2 dx /2 dx r+x+1−1 r p (1 − p)x+1 = x+1 r+x x+1 (1 − p)P (X = x). du = 1−x2 (1+x2 )2 . if x < k.48 For the negative binomial P (X = x + 1) = For the hypergeometric  (M −x)(k−x+x+1)(x+1)  P (X=x)  M N −M P (X = x + 1) = (x+1)(k−x−1)   (N ) k 0 3. v = −βxα e−x/β . Γ(α) Let u = g(x). E|Xg (X)| < ∞ and limx→∞ g(x)xα e−x/β = 0. (1 + x2 )2 Therefore.

6.50 The proof is similar to that of part a) of Theorem 3. p). . Eg(X) ∞ = x=0 ∞ g(x) r+x−1 r p (1 − p)x x r+y−2 r p (1 − p)y−1 y−1 y r+y−1 (set y = x + 1) = y=1 ∞ g(y − 1) = y=1 ∞ g(y − 1) r+y−1 r p (1 − p)y−1 y r+y−1 r p (1 − p)y y .Second Edition 3-17 3. r+y−2 y−1 = y=0 y g(y − 1) r+y−1 1−p X g(X − 1) r+X −1 1−p (the summand is zero at y = 0) = E where in the third equality we use the fact that = y r+y−1 r+y−1 y . For X ∼ negative binomial(r.8.

1 2 0 0 C(x + 2y)dxdy = 4C = 1. (2. 4).4 a. fX (x) = + 2y)dy = 1 (x + 1) 0 < x < 2 4 0 otherwise x y c. 5). The circle x2 + y 2 ≤ 1 has area π. 1). y) = P (X ≤ x. for 0 < x < 2 and 0 < y < 1. (6. (3. 2). 4). 5)}) 12 1 = = 36 3 P ({X = 1. 4. 4. 6)}) 12 1 = = 36 3 4. 3). thus C = 1 . (1. 4 8 4 . so P (2X − Y > 0) = 4 . u)dvdu.2. (4. 5)}) = = P ({(1. (1.5 with bivariate integrals replacing univariate integrals. 4 8 4 But for 0 < x < 2 and 1 ≤ y. Y = 0}) = P ({(3. The area below the line y = 2x is half of the area of the square. 1). 2). 2). Y = 1}) = P ({(3. y) = −∞ −∞ f (u. (4. 4). (6. 4). the total area being 4. 4 1 1 (x 0 4 b. (4.3 For the experiment of tossing two fair dice. FXY (x. (4. 5). 2). The given probabilities are obtained by noting the following equivalences of events. (6. The proof is the same as for Theorem 2. So the probability of an event is (number of points in the event)/36. (3. y) = −∞ −∞ f (u. (4. P ({X = 0. 6). The way this integral is calculated depends on the values of x and y. For example. each of the points in the 36-point sample space are equally likely. 6).2 These are all fundamental properties of integrals. (3. Clearly P (|X + Y | < 2) = 1. x y x 1 0 FXY (x. (1. (6. c. (2. 3). (2.1 Since the distribution is uniform. v)dvdu = 0 1 x2 y y 2 x (u + 2v)dvdu = + . 6)}) = 6 36 6 36 = = 1 6 1 6 P ({X = 1. the easiest way to calculate these probabilities is as the ratio of areas. 6). (5. 1). 4). 3). 2). (5. 3). (4. Y = 0}) P ({X = 0. Y = 1}) = P ({(1. 1). 3). (6. 1). 3). a. (2. (5. so P (X 2 + Y 2 ≤ 1) = π . (6. v)dvdu = 0 1 x2 x (u + 2v)dvdu = + . x y x y 0 FXY (x. 2). 5). (5. 4). Y ≤ y) = −∞ −∞ f (v. (2. 4 2 b. 6). (2. 1). (3. (1. (5.Chapter 4 Multiple Random Variables 4. 5). (5.

P (X 2 < Y < X) = 0 y 2xdxdy = 1 . For x = 0.9 P (a ≤ X ≤ b. y) = y/2 + y 2 /2 .6 Let A = time that A arrives and B = time that B arrives. 2−x 2 FX (x) = P (X ≤ x) = 1 − P (X > x) = 1 − P (B − A > x) = 1 − 1 a+x 1dbda = x2 1 +x− . 2) variables. 2 2 4. Y ∼ uniform(40. c ≤ Y ≤ d) = P (X ≤ b. FX (x) = P (X ≤ x) = 0 for x < 0. The function z = g(x) = 9/(x + 1)2 is monotone on 0 < x < 2. P (X > Y ) = 0 √y (x + y)dxdy = 20 .7 We will measure time in minutes past 8 A. √ 1 1 7 4. 2) × (1.M.10 a. d) − F (a. The marginal distribution of X is P (X = 1) = P (X = 3) = 1 and P (X = 2) = 4 marginal distribution of Y is P (Y = 2) = P (Y = 3) = P (Y = 4) = 1 . we have 2 a 1 √ y FX (0) = P (X ≤ 0) = P (X = 0) = P (B ≤ A) = 1 1 1dbda = 1 . Y ≤ c) = F (b. c) = FX (b)FY (d) − FX (b)FY (c) − FX (a)FY (d) − FX (a)FY (c) = P (X ≤ b) [P (Y ≤ d) − P (Y ≤ c)] − P (X ≤ a) [P (Y ≤ d) − P (Y ≤ c)] = P (X ≤ b)P (c ≤ Y ≤ d) − P (X ≤ a)P (c ≤ Y ≤ d) = P (a ≤ X ≤ b)P (c ≤ Y ≤ d). V = y) = P (U = x)P (V = y) where U ∼ X and V ∼ Y is 1 2. The . Y ≤ d) + P (X ≤ a. 30). 2 3 Therefore the random variables are not independent. so use Theorem 2. c) − F (a. 50).1. 1 < z < 9.5 a. Then.M.5 to obtain fZ (z) = 9/(8z 2 ). b. d) − F (b. 4.  2  x /8 + x/4 0 < x < 2 and 1 ≤ y   1 2 ≤ x and 1 ≤ y d.4-2 Solutions Manual for Statistical Inference The complete deﬁnition of FXY is  x ≤ 0 or y ≤ 0  02   x y/8 + y 2 x/4 0 < x < 2 and 0 < y < 1  2 ≤ x and 0 < y < 1 FXY (x. b. 6 4. So their joint pdf is uniform on the square (1. So X ∼ uniform(0. Y ≤ d) − P (X ≤ b. 30) × (40. 2 And for 0 < x < 1.) = P (X + Y < 60) = 40 1 1 dxdy = . Y = 3) = 0 = ( )( ) = P (X = 2)P (Y = 3). 2). 50) and the joint pdf is 1/300 on the rectangle (0. Let X = amount of time A waits for B. The random variables A and B are independent uniform(1. But 3 1 1 P (X = 2. c ≤ Y ≤ d) − P (X ≤ a. c ≤ Y ≤ d) = P (X ≤ b. and FX (x) = P (X ≤ x) = 1 for 1 ≤ x. The distribution that satisﬁes P (U = x. 300 2 4. 50 60−y 0 P (arrive before 9 A. Y ≤ c) − P (X ≤ a.

. Therefore. the probability that (X. . . Similarly. Y ) falls in the triangle is the area of the triangle. and E(Y | X) = EY . Note that for ﬁxed u. y − x < 1/2 and 1 − y < 1/2. v = 0. y) such that the length of each piece is less than 1/2. . In Example 4. Thus E(Y − g(X)) = E(Y − E(Y | X))2 + E(E(Y | X) − g(X))2 ≥ E(Y − E(Y | X))2 . . So for v = 1. g(X) is a constant. Equation (2. E(Y − g(X))2 = = E ((Y − E(Y | X)) + (E(Y | X) − g(X))) E(Y − E(Y | X))2 + E(E(Y | X) − g(X))2 + 2E [(Y − E(Y | X))(E(Y | X) − g(X))] . . The choice g(X) = E(Y | X) will give equality. v = 0. To calculate the probability of this. u + 2. v = u + 1. Y ) is in this triangle is also 1/8. . the sum of the lengths of any two pieces must be greater than the length of the third. (1/2. . . . the joint pmf is fU. In order for the three pieces to form a triangle. u. . 0). 4.12 One interpretation of “a stick is broken at random into three pieces” is this. These three inequalities deﬁne the triangle with vertices (1/2. .V (u. The support of the distribution of (U. Let X and Y denote the two points where the stick is broken. These three inequalities deﬁne the triangle with vertices (0.}. . then we know V > u. v) : u = 1. we need to identify the sample points (x.Second Edition 4-3 1 V 2 3 4 1 12 1 12 1 12 U 2 1 6 1 6 1 6 3 1 12 1 12 1 12 4. which is 1/8. 1) distributions. . 1/2) and (1/2. if x > y. . If y > x.) Because of the uniform distribution.3) is the special case of a) where we take the random variable X to be a constant. . This will be true if and only if the length of each piece is less than 1/2. λ/(θ + λ)). .11 The support of the distribution of (U. ±1. Then.16 a.13 a. The probability that (X. Let X and Y both have uniform(0. Let U = X + Y and V = Y . v) is positive for v = 0. v) : u = 1. . v) = f (v|u) = f (u) θ u−v e−θ λv e−λ (u−v)! v! (θ+λ)u e−(θ+λ) u! 2 = u v λ θ+λ v θ θ+λ u−v . b. V = v) = P (Y = u. So the probability that the pieces form a triangle is 1/8 + 1/8 = 1/4.15 We will ﬁnd the conditional distribution of Y |X + Y .. 1/2). u. Therefore the conditional pmf is f (u. we found the joint pmf of (U. X = u + v) = p(1 − p)u+v−1 p(1 − p)u−1 = p2 (1 − p)2u+v−2 . . V ). .1. x − y < 1/2 and 1 − x < 1/2. say b. If V > 0. this will be true if x < 1/2. 4. (1/2. v) = P (U = u. (Draw a graph of this set. That is V |U ∼ binomial(U. V ) is {(u. U and V are not independent. 1/2). if we know U = u.3. each piece will have length less than 1/2 if y < 1/2. 2. 4. 2. ±2.}. 2. . This is not a cross-product set. . V ) is {(u. 1/2) and (1. Suppose the length of the stick is 1. 4. 2 The cross term can be shown to be zero by iterating the expectation. . 1). for all g(·). . then X > Y . . . f (u.2. The derivation of the conditional distribution of X|X + Y is similar. More simply. and assume X and Y are independent. Then the joint distribution of X and Y is uniform on the unit square.

where r and s are positive integers and r < s. P Z= r s ∞ ∞ = i=1 P (X = ir. 0 4. −2. Since (X1 − X2 ) √ 2 ∼ n(0. r/s. Since the joint pmf factors into a function of u and a function of v. . Since Y ≥ 5 if and only if X ≥ 4. b. P (X − 4 ≤ x|Y ≥ 5) = P (X − 4 ≤ x|X ≥ 4) = P (X ≤ x) = e−x . y) such that x and y are positive integers and x/(x + y) = r/s. b.18 We need to show f (x. 1−(1 − p) s−2 P (X = x. and the denominator is positive for all x > 0. u = 1. U and V are independent. Therefore. Y = t − x) = P (X = x)P (Y = t − x) = p2 (1 − p)t−2 . V = v) = P (X = u. i = 1. because the numerator is nonnegative since g(x) ≥ 0. the joint pmf is fU. y) is nonnegative and integrates to 1. . In all three cases. 2. 4. which is geometric with p = 1 − e−1 . then X < Y . the joint pmf is fU. since the exponential distribution is memoryless.V (u.17 a.. V = 0) = P (X = Y = u) = p(1 − p)u−1 p(1 − p)u−1 = p2 (1 − p)2u−2 . 4. ±2. x = r cos θ and y = r sin θ. α2 ). then X = Y . .4-4 Solutions Manual for Statistical Inference If V < 0. X + Y = t) = P (X = x. That is. We need to identify all the pairs (x. v) = P (U = u. v = 0. r and s have no common factors. . Then f (y 1 . Y = u − v) = p(1 − p)u−1 p(1 − p)u−v−1 = p2 (1 − p)2u−v−2 . . p 2 2 ∞ (1 − p) ((1 − p) ) i=1 s i = (1 − p) s 1−(1 − p) (1 − p) 2 = p2 (1 − p) s . Consider one such value. (X1 − X2 )2 2 ∼ χ2 (see Example 2. y 2 ) = y2 = x1 + x2 then x1 = y1 y2 .. . b. we can write the joint pmf as fU.1. If V = 0. P (Y = i + 1) = i e−x dx = e−i (1 − e−1 ). 1) and are independent. . Make the transformation y1 = |J| = y2 . 2. y > 0. 0) = P (U = u. y)dxdy = 0 2 2g(r) rdrdθ = πr π π/2 0 0 ∞ g(r)drdθ = 2 π π/2 1dθ = 1. . 1). . i(s − r)). Γ(α1 +α2 ) 2 Γ(α1 +α2 ) α1 −1 α −1 y (1 − y 1 ) 2 Γ(α1 )Γ(α2 ) 1 thus Y1 ∼ beta(α1 . where the fraction is in reduced form. . y) ≥ 0.V (u. . we obtain ∞ 0 0 ∞ π/2 ∞ 0 i+1 f (x.19 a. . So for v = 0. All such pairs are (ir. f (x. x2 = y2 (1 − y1 ) and 1 y α1 +α2 −1 e−y2 . Y = i(s − r)) = i=1 p(1 − p)ir−1 p(1 − p)i(s−r)−1 p 2 s = c. The possible values of Z are all the fractions of the form r/s. v) = p2 (1 − p)2u+|v|−2 = p2 (1 − p) 2u (1 − p)|v|−2 .9). 1 x1 x1 +x2 .V (u. . Changing to polar coordinates. So for v = −1. . . Y2 ∼ gamma(α1 + α1 . ±1.

4. From (4. x2 = 0}.21 Since R and θ are independent. e 2 4π 2x −y x2 +y 2 2y −x x2 +y 2 = 2. A1 = {−∞ < x1 < ∞.Y 2 (y1 . −∞ < x.6). y < ∞.Y 2 (y1 . 0 < t < ∞. Y1 is the square of the distance from (X1 . v fU. We see in the above expression that the joint pdf factors into a function of y1 and a function of y2 . θ) = J= Therefore fX. θ = tan−1 (y/x). v) = Γ(α+β) Γ(α+β+γ) Γ(α)Γ(β) Γ(α+β)Γ(γ) u v α−1 β−1 1 v α+β−1 (1−v)γ−1 . This transformation is not one-to-one because you cannot determine the sign of X2 from Y1 and Y2 . −1 < y2 < 1}. fX. Then fY1 . x2 > 0} and A2 = {−∞ < x1 < ∞.3. So partition the support of (X1 . 0 < tan−1 y/x < 2π. y) = 1 − 1 (x2 +y2 ) e 2 . X2 ) to the origin. 0 < x2 + y 2 < ∞. So independence says the distance from the origin is independent of the orientation (as measured by the angle). So X and Y are independent standard normals. fY1 . x = u/y = u/v then J= ∂x ∂u ∂y ∂u ∂x ∂v ∂y ∂v = 1 v 0 1− u v u − v2 1 = 1 . both of which are the same in this case. 2π 2 − 1 (x2 +y2 ) . Let y = v. 2πσ 2 1−y 2 2 0 < y1 < ∞. v . y2 ) is the sum of two terms. 1−y 2 2 √ 2 The inverse transformation from B to A2 is x1 = y2 y1 and x2 = − y1 −y 1 y2 with J2 = −J1 . 0 < θ < 2π. So Y1 and Y2 are independent. Then t = x2 + y 2 . and fT.Second Edition 4-5 4.V (u. X2 ) to the origin. Y2 ) √ is B = {0 < y1 < ∞. X2 ) into A0 = {−∞ < x1 < ∞.θ (t. −1 < y2 < 1.23 a.Y (x. 0 < u < v < 1. x2 < 0}. 4π √ √ Make the transformation x = t cos θ. y = t sin θ. y2 ) = = 2 1 −y1 /(2σ2 ) 1 e 2πσ 2 2 1−y 2 2 1 −y1 /(2σ2 ) 1 e . 4. b. the joint pdf of T = R2 and θ is 1 −t/2 e . The support of (Y1 .20 a. The inverse transformation from B to A1 is x1 = y2 y1 2 and x2 = y1 −y 1 y2 with Jacobian y 1 √2 2 y1 √ 2 1−y 1 √ 2 y1 2 √ y2 √ y1 y1 J1 = √ = 1 2 2 1 − y2 .Y (x. y) = Thus. Y2 is the cosine of the angle between the positive x1 -axis and the line from (X1 .

Let x = uv. The set {x > 0. z2 ) = = 1 (z1 z2 )r−1 e−z1 z2 Γ(r) 1 z r+s−1 e−z1 · Γ(r+s) 1 1 (z1 − z1 z2 )s−1 e−z1 +z1 z2 z1 Γ(s) Γ(r+s) r−1 z (1 − z2 )s−1 . U ∼ gamma(α. 0 < y < 1} is mapped onto the set {0 < u < v < u . β + γ).V (u. fU (u) 1/u = u fU. fU (u) = = = = Γ(α+β+γ) α−1 u Γ(α)Γ(β)Γ(γ) 1 v β−1 (1 − v)γ−1 ( u v−u β−1 ) dv v y= dv v−u . v) = √ Γ(α + β + γ) √ α−1 ( uv (1 − uv)β−1 Γ(α)Γ(β)Γ(γ) u v α+β−1 1− 1 . x 4. 0 < u < 1. U ∼ beta(α.24 Let z1 = x + y. dy = 1−u 1−u 1 Γ(α+β+γ) α−1 u (1 − u)β+γ−1 y β−1 (1 − y)γ−1 dy Γ(α)Γ(β)Γ(γ) 0 Γ(β)Γ(γ) Γ(α+β+γ) α−1 u (1 − u)β+γ−1 Γ(α)Γ(β)Γ(γ) Γ(β+γ) Γ(α+β+γ) α−1 u (1 − u)β+γ−1 . fU (u) = A = = z β−1 (1 − z)γ−1 dz ( kernel of beta(β. 0 < z2 < 1}. then x = z1 z2 . 2v(1 − u) Call it A √ √ u/v u/v−u To simplify. Γ(α)Γ(β+γ) That is. β + γ). fZ1 .4-6 Solutions Manual for Statistical Inference Then. 0 < u < 1}. v = 1/u ⇒ z = 0 and dz = − 2(1−u)v dv. 0 < z2 < 1. Thus. Then. 2v u v γ−1 fU. 2v 1 The set {0 < x < 1. Γ(r)Γ(s) 2 · . let z = 1−u . γ)) Γ(α+β+γ) α−1 Γ(β)Γ(γ) u (1 − u)β+γ−1 Γ(α)Γ(β)Γ(γ) Γ(β+γ) Γ(α+β+γ) α−1 u (1 − u)β+γ−1 .Z2 (z 1 . as in a).V (u. y = u then v J= ∂x ∂u ∂y ∂u ∂x ∂v ∂x ∂v = 1 1/2 −1/2 u 2v 1 −1/2 −1/2 v u 2 1 1/2 −1/2 v 2u − 1 u1/2 v −3/2 2 = 1 . 0 < u < 1. √ b. Then v = u ⇒ z = 1. v)dv Γ(α + β + γ) α−1 u (1−u)β+γ−1 Γ(α)Γ(β)Γ(γ) 1/u u = √ 1 − uv 1−u β−1 1 − u/v 1−u γ−1 ( u/v)β dv. y = z1 (1 − z2 ) and |J| = ∂x ∂z1 ∂y ∂z1 ∂x ∂z2 ∂y ∂z2 = z2 1−z 2 z1 −z 1 = z1 . 0 < z1 . z2 = x+y . y > 0} is mapped onto the set {z1 > 0. Γ(α)Γ(β+γ) Thus.

z2 ) can be factored into two densities. 4. sofXY = fX fY ) Let u = x + y. Proceed as in Exercise 4. It remains to show that they are independent. Y = X 2 + Z and fXY (x. then x = 1 (u + v). s).1/10) (y − x).26 a. 2 2 2 2 = .Second Edition 4-7 fZ1 .1) (x) In Example 4. y) = 1 − 12 [(x−µ)2 +(y−γ)2 ] e 2σ 2πσ 2 (by independence. W = i) = P (Z ≤ z)P (W = i). 2σ 2 ).5.27 From Theorem 4. 1. 1). U and V are independent. λ 1 −x/λ 1 −y/µ e e dxdy = . P (Z ≤ z. P (W = 0) = P (Y ≤ X) = fXY (x.W =1) = P (min(X. So Z and W are independent. −∞ < y < ∞} is mapped onto the set {−∞ < u < ∞. v = x − y.25 For X and Z independent. y) = fX (x)fZ (y − x).2. Z2 ∼ beta(r. W = 0) + P (Z ≤ z. P (Z ≤ z. fXY (x.8.5. y = 1 (u − v) and 2 2 |J| = 1/2 1/2 1/2 −1/2 = 1 . fXY (x. Y ) ≤ z.9. Therefore 1 1 − 2σ2 (( u+v )−µ)2 +(( u−v )−γ )2 1 2 2 fU V (u. y) = fX (x)fZ (y − x2 ) = 4.1/10) (y − x2 ).1) (x) I(0. In Example 4. 10 1 1 I(−1. 2 4πσ By the factorization theorem. P (Z ≤ z. for i = 0. 2σ 2 ) and V ∼ n(µ − γ. Y ≤ X) = P (Y ≤ z. 2 10 λ µ+λ 1 − exp − 1 1 + µ λ z . µ λ Therefore.Z 2 (z1 . W = 0) = P (min(X. Therefore Z1 and Z2 are independent and Z1 ∼ gamma(r + s. Y ≤ X) z ∞ 1 −x/λ 1 −y/µ = e e dxdy λ µ 0 y = Similarly. 4.24. −∞ < v < ∞}. λ µ µ+λ 0 y µ P (W = 1) = 1 − P (W = 0) = . v) = e · 2πσ 2 2 (µ+γ) (µ+γ) 1 1 − 2σ2 2( u ) −u(µ+γ)+ 2 +2( v ) −v(µ−γ)+ 2 2 2 e 4πσ 2 1 1 − 2(2σ2 ) − 1 2 2 = g(u) e (u − (µ + γ)) · h(v)e 2(2σ2 ) (v − (µ − γ)) . 2 The set {−∞ < x < ∞. b.14 we know U ∼ n(µ + γ. and Y = X + Z. Y ) ≤ z. 1 1 + µ λ z . X ≤ Y ) = P (X ≤ z. y) = I(0. W = 1) = 1 − exp − + z . z > 0. µ+λ 1 1 P (Z ≤ z) = P (Z ≤ z. X ≤ Y ) z ∞ 1 −x/λ 1 −y/µ µ = e e dydx = 1 − exp − λ µ µ+λ 0 x ∞ ∞ 1 I(0.

The probability of choosing between each one of these intervals is 1 .8 fW (w) = 1 −1 1 1 1 1 |√ |√ |+ |= √ .1. π). consider the set {(0. 2π). To see this let consider the distribution of W = cos θ and V = sin θ where θ ∼ uniform(0. 2 π 1 − v2 2 π 1 − v2 π 1 − v2 −1 ≤ v ≤ 1. 2π)} has probability 1 of being chosen. g1 (z) = cot−1 z. 1). g1 (θ) = cot θ. Therefore the distribution in each of these intervals is the distribution of sin θ. π ) ∪ ( 3π . u = x will also show that the joint density factors. The bivariate transformation t = y/x. The quick proof is to note that the distribution of Y |X = x is n(1. 2π)} is 2 2 2 2 1 1 the same as the distribution of V in ( −π . √X 2 +Y 2 has the same distribution as X = cos θ since sin θ has the same distribution as cos θ. −1 ≤ w ≤ 1. then fV (v) = 1 1 √ . 1). 2 2 2 2XY 2XY Therefore √X 2 +Y 2 has the same distribution as Y = sin θ. For the function sin 2θ the values of the function sin θ are repeated over each of the 2 intervals (0.8 X Y 4. By Theorem 2. Y ) = = = = E {E(Y |X)} = EX = 1 . hence is independent of X. g2 (θ) = cot θ. π ) which is π √1−v2 . 2 2 −1 4g1 (v) = π − sin−1 v. Thus √X 2 +Y 2 is distributed as sin 2θ which is distributed as sin θ. −1 −1 g2 (θ) = cos θ. Therefore ∼ 2 with X = R cos θ and Y = R sin θ. 2π). Thus W and V has the same distribution. Therefore 2XY = R sin 2θ. Thus √X 2 +Y 2 ∼ X ∼ n(0. Let A1 = (0. XY = R2 cos θ sin θ then 2XY = R2 2 cos θ sin θ = R2 sin 2θ. 2π)}. π 1 − v2 −1 ≤ v ≤ 1. g1 (θ) = cos θ. 4.1. . R sin −1 −1 A2 = (π. g2 (z) = π + cot z. 3π ). g1 (w) = cos−1 w. To see this let sin θ ∼ fsin θ . Therefore the distribution of V in {(0. 2π 1 + z 2 2π 1 + z 2 π 1 + z2 −∞ < z < ∞. R = X 2 + Y 2 is a positive random variable and Y 2XY θ = tan−1 ( X ) ∼ uniform(0. In addition. Then X 2 + Y 2 √ χ2 is a positive random variable. Therefore 2 2 2 2 2 fV (v) = 1 11 11 1 1 1 √ √ + = √ . 12 Var (E(Y |X)) + E (Var(Y |X)) = VarX + EX 2 = E[E(XY |X)] = E[XE(Y |X)] = EX 2 = EXY − EXEY = 1 − 3 1 2 2 1 3 = 1 . −1 ≤ v ≤ 1. fZ (z) = 1 −1 1 −1 1 1 | |+ | |= . for which the function sin θ has the same values 2 2 as it does in the interval ( −π . Let X and Y be iid n(0. π ) ∪ ( 3π . ﬁrst consider the interval ( π . By Theorem 2. To derive −1 the distribution of W = cos θ let A1 = (0. π ) ∪ ( 3π . 2π). 3π ). On (0. π ). π). Let g1 (θ) = sin θ. A2 = (π.4-8 Solutions Manual for Statistical Inference −1 θ = R cos θ = cot θ. 1). R √ 2XY 2XY Since R = X 2 + Y 2 then √X 2 +Y 2 = R sin 2θ. 2π 1 − w2 2π 1 − w2 π 1 − w2 To derive the distribution of V = sin θ. π) and (π. 2 1 1 + 12 3 = 5 . 1). 2π) . 2π) each 2 2 of the sets ( π . g2 (w) = 2π − cos w. Let Z = cot θ. Second. 12 b.30 a. EY VarY EXY Cov(X.29 a. Thus f2 sin θ = 1 fsin θ + 1 fsin θ = fsin θ . b. {(0.

. 1. b. .. For y = 0. P (Y = y. we have ∞ P (Y = y|λ) = n=y ∞ P (Y = y|N = n. 1. Then EY VarY = = E(E(Y |Λ)) = EΛ = αβ Var(E(Y |Λ)) + E(Var(Y |Λ)) = VarΛ + EΛ = αβ 2 + αβ = αβ(β + 1). . y y = 0. . If α is a positive integer. 0 < x < 1. . The pmf of Y . X ≤ x) = c. . is ∞ n y x (1 − x)n−y . 2 n n2 + . . n.Second Edition 4-9 4. 12 6 VarY = Var (E(Y |X)) + E (Var(Y |X)) = Var(nX) + EnX(1 − X) = b. y Γ(n + 2) fY (y) = 0 fY (y|λ)fΛ (λ)dλ = 1 y!Γ(α)β α = λy e−λ 1 λα−1 e−λ/β dλ y! Γ(α)β α 0    −λ  ∞ λ(y+α)−1 exp dλ  β  0 1+β ∞ = 1 Γ(y + α) y!Γ(α)β α β 1+β y+α . 1/(1 + β)) pmf.32 a. fY (y) = y+α−1 y β 1+β y 1 1+β α . 1. EY = E{E(Y |X)} = EnX = n . for y = 0. y! . . λ)P (N = n|λ) n y e−λ λn p (1 − p)n−y n! y 1 y!(n − y)! 1 y!m! m=0 p 1−p y ∞ = n=y ∞ = n=y p 1−p p 1−p y [(1 − p)λ]n e−λ y = e−λ e−λ y! [(1 − p)λ]m+y [(1−p)λ] m! m=0 ∞ m (let m = n − y) = [(1 − p)λ]y = e−λ (pλ)y e(1−p)λ y (pλ) e−pλ = ..31 a. . the negative binomial(α. P (y = y) = 4. . n Γ(y + 1)Γ(n − y + 1) .

Then E log {1−et (1 − p)} logp N ∞ = n=0 log {1−et (1 − p)} logp λlog(1−et (1−p)) logp n e−λ λn n! (since N ∼ Poisson) λlog(1−et (1−p)) logp n ∞ = e−λ e e −λlog(1−et (1−p)) logp n=0 t n! . The sum equals 1. 1. .33 We can show that H has a negative binomial distribution by computing the mgf of H. Y ∼ negative binomial(α. elogp −λ/ logp 1 1−et (1 − p) −λ/ log p p 1−et (1 − p) −λ/ logp This is the mgf of a negative binomial(r. 4. 1 n Γ(α+β) py+α−1 (1 − p)n+β−y−1 dp y Γ(α)Γ(β) 0 n Γ(α+β) Γ(y+α)Γ(n+β−y) .7. Now. y = 0. 1 P (Y = y) = 0 1 P (Y = y|p)fp (p)dp n y 1 p (1 − p)n−y pα−1 (1 − p)β−1 dp y B(α. y Γ(α)Γ(β) Γ(α+n+β) 1 P (X = x) = 0 1 P (X = x|p)fP (p)dp r+x−1 r Γ(α + β) α−1 p (1 − p)x p (1 − p)β−1 dp x Γ(α)Γ(β) = 0 . p).4-10 Solutions Manual for Statistical Inference the Poisson(pλ) pmf. for y = 0. It is the sum of a Poisson [λlog(1 − e (1 − p))]/[logp] pmf. Therefore. Again. 4. . ∞ EeX1 t = x1 =1 ex1 t −1 (1 − p) logp x1 x1 = −1 logp x ∞ 1 (et (1 − p)) x1 =1 x1 = −1 − log 1−et (1 − p) logp .34 a. EeHt = EE eHt N = EE e(X1 +···+XN )t N = E E eX1 t N N . with r = −λ/ log p. n. by Theorem 4. if α is a positive integer. E(e Ht ) = e−λ elog(1−e = t (1−p)) λ/ log p = . 1. . . . Now calculations like those in a) yield the pmf of Y . β) = 0 = = b. . if r is an integer. because. . is fY (y) = 1 α Γ(y + α) Γ(α)y!(pβ) pβ 1+pβ y+α . the mgf of a sum of independent random variables is equal to the product of the individual mgfs. Thus Y |Λ ∼ Poisson(pλ). 1/(1 + pβ))..6.

P α−1 EX = E[E(X|P )] = E since E 1−P P 1 = 0 1−P P 1 0 Γ(α + β) α−1 p (1 − p)β−1 dp Γ(α)Γ(β) p(α−1)−1 (1 − p)(β+1)−1 dp = Γ(α + β) Γ(α − 1)Γ(β + 1) Γ(α)Γ(β) Γ(α + β) = = Γ(α + β) Γ(α)Γ(β) β . Γ(α)Γ(β) Γ(r + x + α + β) r(1 − P ) rβ = .35 a. Var(X) E[nP (1 − P )] + Var(nP ) αβ = n + n2 VarP (α + β)(α + β + 1) αβ(α + β + 1 − 1) = n + n2 VarP (α + β 2 )(α + β + 1) = . (α − 1)2 (α − 2) where E 1−P P 2 1 = 0 Γ(α + β) (α−2)−1 p (1 − p)(β+2)−1 dp Γ(α)Γ(β) = β(β + 1) . . r+x−1 x r+x−1 x Γ(α + β) 1 (r+α)−1 p (1 − p)(x+β)−1 dp Γ(α)Γ(β) 0 Γ(α + β) Γ(r + α)Γ(x + β) x = 0. Therefore.Second Edition 4-11 = = Therefore. α(α − 1) (α − 1)2 (α − 2) r(1 − P ) + Var P2 r(1 − P ) P = r since E 1−P P2 = = and Var 1−P P Γ(α + β) (α−2)−1 p (1 − p)(β+1)−1 dp = 0 Γ(α)Γ(β) (β + 1)(α + β) α(α − 1) 1 Γ(α + β) Γ(α − 2)Γ(β + 1) Γ(α)Γ(β) Γ(α + β − 1) = = E 1−P P 2 − E 1−P P 2 = β(β + 1) β 2 −( ) (α − 2)(α − 1) α−1 β(α + β − 1) . . . Var(X) = E(Var(X|P )) + Var(E(X|P )). α−1 Var(X) = E(Var(X|P )) + Var(E(X|P )) = E β(α + β − 1) (β + 1)(α + β) + r2 . 1. (α − 2)(α − 1) = Γ(α + β) Γ(α − 2)Γ(β + 2) Γ(α)Γ(β) Γ(α − 2 + β + 2) 4.

Var(Y ) = E(Var(Y |Λ)) + Var(E(Y |Λ)) = EΛ + Var(Λ) = µ + α µ2 since EΛ = µ = αβ and Var(Λ) = αβ 2 = 4. Therefore. EY = E(E(Y |c)) = E[nc] = E n Using the results in Exercise 4.37 a. λ−ν = x λu . Make the transformation u = λ 0 − x λ .38 a. A mixture of beta-binomial. = = 0 1 1 1 < c = (1 + p) < 1) 2 2 1 1 (Y = k|c = (1 + p))P (P = p)dp 2 0 1 n 1 1 Γ(a + b) a−1 [ (1 + p)]k [1 − (1 + p)]n−k p (1 − p)b−1 dp k 2 2 Γ(a)Γ(b) n (1 + p)k (1 − p)n−k Γ(a + b) a−1 p (1 − p)b−1 dp k 2k 2n−k Γ(a)Γ(b) k 0 1 = 0 = n Γ(a + b) k 2n Γ(a)Γ(b) n Γ(a + b) k 2n Γ(a)Γ(b) k k j 2n pk+a−1 (1 − p)n−k+b−1 dp j=0 k = j=0 k Γ(k + a)Γ(n − k + b) j Γ(n + a + b) . The “extra-Poisson” variation is 1 2 αµ . Var(Y ) = nE = = 1 1 (1 + P ) + n(n − 1)Var (1 + P ) 2 2 n n(n − 1) (1 + EP )(1 − EP ) + VarP 4 4 2 n a n(n − 1) ab 1− + . Then 1 −x/ν 1 ν r−1 dν e ν Γ(r)Γ(1 − r) (λ−ν)r . Let Y = Xi . = 1 b. Var(Y ) = nEC(1 − EC) + n(n − 1)VarC. du = −x ν ν 2 dν.35(a). 2 (a + b + 1) 4 a+b 4 (a + b) 1−E x ν 1 (1 + p) 2 = n 2 1+ a a+b . P (Y = k) (αβ)2 α = µ2 α . = j=0 n Γ(a + b) Γ(k + a)Γ(n − k + b) k Γ(a)Γ(b) Γ(n + a + b) b. = P (Y = k. 1 (1 + P ) 2 4.4-12 Solutions Manual for Statistical Inference nαβ(α + β + 1) nαβ − + n2 VarP 2 )(α + β + 1) 2 )(α + β + 1) (α + β (α + β β α = n − nVarP + n2 VarP α+β α+β = nEP (1 − EP ) + n(n − 1)VarP.

From the discussion in the text we have that f (x1 ... . . . f (xi |xj ) = (x1 . d log dx ∞ 0 − e−x/ν qλ (ν)dν = ν ∞ 1 −x/ν e qλ (ν)dν 0 ν2 ∞ 1 −x/ν e qλ (ν)dν 0 ν < 0 ∀x.xn ) p1 1 − pj x1 ····· pj−1 1 − pj xj−1 pj+1 1 − pj xj+1 ····· pn 1 − pj xn . since this is a beta(r. . if r > 1. xn |xj ) (m − xj )! = x1 !· · · · ·xj−1 !·xj+1 !· · · · ·xn ! × Then..39 a.xj−1 . f (x1 .. 0 c. xj−1 . 4... d d 1 r−1 1 log f (x) = log +(r − 1) log x − x/λ = − >0 dx dx Γ(r)λr x λ for some x. ..xi+1 . xj+1 . xj+1 !· · · · ·xn ! (xk =xi . xj−1 . .. . b. . ..xj+1 .. .Second Edition 4-13 r = = 1 Γ(r)Γ(1 − r) ∞ 0 1 x ∞ x λu 1 u e−(u+x/λ) du xr−1 e−x/λ .xj ) ×( p1 pi−1 pi+1 )x1 · · · · ·( )xi−1 ( )xi+1 1 − pj − p i 1 − p j − pi 1 − pj − pi . Without loss of generality lets assume that i < j. . But.xi−1 . . Use the transformation t = ν/λ to get λ pλ (ν)dν = 0 1 Γ(r)Γ(1 − r) λ ν r−1 (λ − ν)−r dν = 0 1 Γ(r)Γ(1 − r) 1 tr−1 (1 − t)−r dt = 1.. xn |xj ) (m − xj )! x1 !· · · · ·xj−1 !·xj+1 !· · · · ·xn ! = (xk =xi . Γ(r)λr xr−1 e−x/λ r Γ(r)Γ(1 − r) λ r e−u du = 0 since the integral is equal to Γ(1 − r) if r < 1. . .. xi+1 !· · · · ·xj−1 !.xj ) ×( p1 x1 pj−1 xj−1 pj+1 xj+1 pn xn ) · · · · ·( ) ( ) · · · · ·( ) 1 − pj 1 − pj 1 − pj 1 − pj pi 1−pj pi 1−pj m−xi −xj m−xi −xj m−x −x (m − xi − xj )! 1 − × (m − xi − xj )! 1 − = i j (m − xj )! pi xi pi ( ) 1− xi !(m − xi − xj )! 1 − pj 1 − pj (m − xi − xj )! × x1 !· · · · ·xi−1 !. . xj+1 . . 1 − r).

Then n Var i=1 Xi = = = = Var (X1 + X2 + · · · + Xn ) E [(X 1 + X2 + · · · + Xn ) − (µ1 + µ2 + · · · + µn )] E [(X 1 −µ1 ) + (X 2 −µ2 ) + · · · + (X n −µn )] n 2 2 E(Xi − µi )2 + 2 i=1 n 1≤i<j≤n E(Xi − µi )(Xj − µj ) = i=1 VarXi + 2 1≤i<j≤n Cov(Xi .Y = Cov(XY.42 ρXY. 1−pj ). Var(Xi + Xj ) = m(pi + pj )(1 − pi − pj ). . pi Thus Xi |Xj = xj ∼ binomial(m − xj .Y = 4.44 Let µi = E(Xi ). 4. Y ) E(XY )−µXY µY EXEY 2 −µX µY µY = = . σXY σY σXY σY σXY σY 2 where the last step follows from the independence of X and Y. Cov(Xi . Therefore. Cov(a. 2 2 4. 2 µX (σY +µ2 )−µX µ2 Y Y 2 2 (σX σY 2 +σX µ2 Y 2 +σY 1/2 µ2 ) σY X = µX σY 2 (µ2 σY X +µ2 Y 2 2 2 σX +σX σY ) 1/2 .43 Cov(X1 + X2 . X) = E(aX) − EaEX = aEX − aEX = 0. Therefore. X Y X Y Y X Therefore. Xj ) = 1 1 [m(pi +pj )(1−pi −pj )−mpi (1−pi )−mpi (1−pi )] = (−2mpi pj ) = −mpi pj . Xj ). 4. By Theorem 4.6 Var(Xi + Xj ) = Var(Xi ) + Var(Xj ) + 2Cov(Xi . X2 + X3 ) = E(X1 + X2 )(X2 + X3 ) − E(X1 + X2 )E(X2 + X3 ) = (4µ2 + σ 2 ) − 4µ2 = σ 2 2 2 Cov(X 1 +X 2 )(X 1 −X 2 ) = E(X1 + X2 )(X1 − X2 ) = EX1 − X2 = 0. xj ) = f (xi |xj )f (xj ) = m! x pxi p j (1 − pj − pi )m−xj −xi . ρXY.41 Let a be a constant. pi + pj ). b.4-14 Solutions Manual for Statistical Inference ×( = pj−1 pj+1 pn )xj−1 ( )xj+1 · · · · ·( )xn 1 − pj − p i 1 − pj − p i 1 − pj − p i 1− pi 1 − pj m−xi −xj (m − xj )! pi xi ( ) xi !(m − xi − xj )! 1 − pj . xi !xj !(m − xj − xi )! i j Using this result it can be shown that Xi + Xj ∼ binomial(m. Xj ).5. Now compute 2 σXY = E(XY )2 − [E(XY )]2 = EX 2 EY 2 − (EX)2 (EY )2 2 2 2 2 2 2 = (σX + µ2 )(σY + µ2 ) − µ2 µ2 = σX σY + σX µ2 + σY µ2 . f (xi .

Now the part of the σY σX exponent involving ω 2 can be removed from the integral. fX (x) = 2 the pdf of n(µX . so it integrates √ 2 2 2 2 2 2 to 2π 1−ρ2 . and µ = ρω. where we make the substitution z = y−µY . Start with fXY (x. .Second Edition 4-15 4. y)dy = −∞ 1 2πσX σY 1−ρ2 e − 1 2(1−ρ2 ) (ω2 −2ρωz+z2 )σY dz. fY |X (y|x) 1√ − 1−ρ2 2πσX σY e 1 2(1−ρ2 ) x−µX σX 2 −2ρ x−µX σX y−µY σY + y−µY σY 2 = − 12 (x−µX )2 √ 1 e 2σX 2πσX = √ √ √ 1 2πσY 1 2πσY 1 2πσY 1−ρ2 1−ρ2 1−ρ2 − e − 1 2(1−ρ2 ) x−µX σX 2 −(1−ρ2 ) x−µX σX 2 −2ρ x−µX σX y−µY σY + y−µY σY 2 = = e − 1 2(1−ρ2 ) ρ2 x−µX σX 2 −2ρ x−µX σX y−µY σY + y−µY σY 2 e 2σ 2 Y 1 √ (1−ρ2 (y−µY )− ρ σ Y (x−µX ) X σ 2 . σY c. which is the pdf of n (µY − ρ(σY /σX )(x − µX ) . The integrand is the kernel of normal pdf with σ 2 = (1 − ρ2 ).45 a. Also note that e−ω /2(1−ρ ) eρ ω /2(1−ρ ) = e−ω /2 . y) = 1 2πσX σY 1−ρ2 x−µX σX 2 × exp − and compute ∞ 1 2(1−ρ2 ) −2ρ x−µX σX y−µY σY + y−µY σY 2 ∞ fX (x) = −∞ fXY (x. dy = σY dz. b. ω = x−µX . σX ). The calculation for Y is similar. Thus. The mean is easy to check. and we complete the square in z to get e − ω2 2(1−ρ2 ) ∞ −∞ fX (x) = = 2πσX e−ω 2 1−ρ2 e 2πσX e − 1 2(1−ρ2 ) [(z2 −2ρωz+ρ2 ω2 )−ρ2 ω2 ] ∞ − 1 (z−ρω)2 2(1−ρ2 ) dz /2(1−ρ2 ) ρ2 ω 2 /2(1−ρ2 ) 1−ρ2 e −∞ dz. e−ω 2πσX 2 /2 √ 1−ρ2 2π 1−ρ2 = √ 1 −1 e 2 2πσX x−µX σX 2 . 1 − ρ2 . We will compute the marginal of X. E(aX + bY ) = aEX + bEY = aµX + bµY .

4.V (u. v) = 2πa 1 1−ρ2 e − 1 2(1−ρ2 ) 1 [ a (u−bv)] 2 ρ −2 a (u−bv)+v 2 . Simply plug the expressions for aX . We can then write 1 2πσU σV 1−ρ∗2 exp − 1 2 1−ρ∗2 u2 uv v2 −2ρ + 2 2 σU σU σV σV . bX . The result is − b2 + 2ρab + a2 1 2) 2(1−ρ a2 b2 u2 −2 + 2ρab + a2 b + aρ b2 + 2ρab + a2 uv + v 2 . To show that aX + bY is normal we have to do a bivariate transform. into the equalities in a) and simplify. 0 1 a Then fU V (u. U is normal. 2 2 Note that this is joint bivariate normal form since µU = µV = 0. then get fU. = = = = = = 2 2 since EZ1 = EZ2 = 1. Y ) aX EZ1 + bX EZ2 + EcX = aX 0 + bX 0 + cX = cX a2 VarZ1 + b2 VarZ2 + VarcX = a2 + b2 X X X X aY 0 + bY 0 + cY = cY a2 VarZ1 + b2 VarZ2 + VarcY = a2 + b2 Y Y Y Y EXY − EX · EY 2 2 E[(aX aY Z1 + bX bY Z2 + cX cY + aX bY Z1 Z2 + aX cY Z1 + bX aY Z2 Z1 + bX cY Z2 + cX aY Z1 + cX bY Z2 ) − cX cY ] = aX aY + bX bY . One possibility is U = aX + bY . Y ) = a2 σX + b2 σY + 2abρσX σY . and expectations of other terms are all zero. y = v and obtain 1 1/a −b/a |J| = = . by part a). We will do this in 1 the standard case. Thus. V ) E(aXY + bY 2 ) aρ + b ρ∗ = = = . v) = a2 ρ2 + abρ + b2 (1−ρ2 )a (1 − ρ2 )a = 2 = 2 + b2 + 2abρ 2 + 2abρ 2 a a +b σu 2 2 1−ρ∗2 . etc. 2 2 Var(aX + bY ) = a2 VarX + b2 VarY + 2abCov(X.4-16 Solutions Manual for Statistical Inference as is the variance. Let D = aX bY − aY bX = − 1−ρ2 σX σY and solve for Z1 and Z2 .46 a. Now factor the exponent to get a square in u. σU σV σU σV a2 + b2 + abρ thus (1 − ρ∗2 ) = 1 − where a 1−ρ2 = σU fU V (u. v) and show that fU (u) is normal. c. σu = a2 + b2 + 2abρ and Cov(U . which is in the exact form of a bivariate normal distribution. Make the indicated transformation and write x = a (u − bv). EX VarX EY VarY Cov(X. V = Y . b. Z1 Z2 = = bY (X−cX ) − bX (Y −cY ) D σY (X−µX )+σX (Y −µY ) 2(1−ρ)σX σY . = σY (X−µX )+σX (Y −µY ) 2(1+ρ)σX σY . σv = 1.

we get P (Z > z) = P (X > z). Write bX = ± σX −a2 . So Z and Y always have the same sign. Square both sides and simplify to get 2 2 2 2 (1 − ρ2 )σX σY = σX a2 − 2ρσX σY aX aY + σY a2 . and X Y substitute bX . −∞ < x < ∞. Y X This is an ellipse for ρ = ±1. 1). −∞ < y < ∞. Z ∼ X ∼ n(0. y) = = 1 −1 2 √ e 2π (2πσX σY x − µX σX (σY (x−µX )+σX (y−µY ))2 2(1+ρ)σ 2 σ 2 X Y 1 −1 2 √ e 2π (σY (x−µX )+σX (y−µY ))2 2(1−ρ)σ 2 σ 2 X Y 1 1−ρ2 σX σY 1 − ρ2 )−1 exp − y − µY σY + 1 2(1 − ρ2 ) 2 x − µX σX 2 − 2ρ y − µY σY . for z > 0. b. a bivariate normal pdf. By deﬁnition of Z.bY into aX aY = ρσX σY . In either case there are an inﬁnite number of points satisfying the equations. Another solution is aX aY cX cY = = = = ρσX bX = σ Y bY = 0 µX µY . By a similar argument.Second Edition 4-17 Then the Jacobian is J= ∂z1 ∂x1 ∂z2 ∂x ∂z1 ∂y ∂z2 ∂y = bY D −aY D −bX D aX D = aY bX 1 1 aX bY − = = . and hence. D2 D2 D − 1−ρ2 σX σY and we have that fX. Z > 0 ⇔ either (i)X < 0 and Y > 0 or (ii)X > 0 and Y > 0. We get 2 aX aY + ± σX −a2 X 2 ± σY −a2 Y = ρσX σY . P (Z ≤ z) = P (X ≤ z). By deﬁnition of Z. (1 − ρ2 )σX 2 2 There are an inﬁnite number of solutions. P (Z ≤ z) = = = = = = P (X P (X P (X P (X P (X P (X ≤ z and XY > 0) + P (−X ≤ z and XY < 0) ≤ z and Y < 0) + P (X ≥ −z and Y < 0) (since z < 0) ≤ z)P (Y < 0) + P (X ≥ −z)P (Y < 0) (independence) ≤ z)P (Y < 0) + P (X ≤ z)P (Y > 0) (symmetry of Xand Y ) ≤ z)(P (Y < 0) + P (Y > 0)) ≤ z).47 a. Thus. for z < 0.bY = ± σY −a2 . a line for ρ = ±1.Y (x. . d. hence they cannot be bivariate normal. 4.

g2 ∼ binomial(n. . (⇒) If X and Y are independent then f (x. Thus [f1 (x) − f2 (x)][g1 (y) − g2 (y)] = 0 since 0 < a < 1. f (x. y) = fX (x)fY (y). c. f2 ∼ binomial(n2 . p1 ). Thus X and Y are independent.49 a. g1 ∼ binomial(n1 . d. fX (x) = (af1 (x)g1 (y) + (1 − a)f2 (x)g2 (y))dy g1 (y)dy + (1 − a)f2 (x) g2 (y)dy = af1 (x) = af1 (x) + (1 − a)f2 (x). (⇐) if [f1 (x) − f2 (x)][g1 (y) − g2 (y)] = 0 then f1 (x)g1 (y) + f2 (x)g2 (y) = f1 (x)g2 (y) + f2 (x)g1 (y). p). (ii) f1 ∼ binomial(n. Cov(X. g2 ∼ binomial(n2 . p2 ). y) − fX (x)fY (y) = af1 (x)g1 (y) + (1 − a)f2 (x)g2 (y) − [af1 (x) + (1 − a)f2 (x)][ag1 (y) + (1 − a)g2 (y)] = a(1 − a)[f1 (x)g1 (y) − f1 (x)g2 (y) − f2 (x)g1 (y) + f2 (x)g2 (y)] = a(1 − a)[f1 (x) − f2 (x)][g1 (y) − g2 (y)] = 0. f2 ∼ binomial(n. Then. p). p). f2 .4-18 Solutions Manual for Statistical Inference 4. n2 ). g1 ∼ binomial(n. g1 . (i) f1 ∼ binomial(n. Therefore fX (x)fY (y) = a2 f1 (x)g1 (y) + a(1 − a)f1 (x)g2 (y) + a(1 − a)f2 (x)g1 (y) + (1 − a)2 f2 (x)g2 (y) = a2 f1 (x)g1 (y) + a(1 − a)[f1 (x)g2 (y) + f2 (x)g1 (y)] + (1 − a)2 f2 (x)g2 (y) = a2 f1 (x)g1 (y) + a(1 − a)[f1 (x)g1 (y) + f2 (x)g2 (y)] + (1 − a)2 f2 (x)g2 (y) = af1 (x)g1 (y) + (1 − a)f2 (x)g2 (y) = f (x. p). y). g2 are such that f1 − f2 = 0 and g1 − g2 = 0 with µ1 = µ2 or ξ1 = ξ2 . b. p1 ). f2 ∼ binomial(n. p). 1 − p). Y ) ∼ af1 (x)g1 (y) + (1 − a)f2 (x)g2 (y) where f1 . g1 ∼ binomial(n. p2 ). fY (y) = (af1 (x)g1 (y) + (1 − a)f2 (x)g2 (y))dx f1 (x)dx + (1 − a)g2 (y) f2 (x)dx = ag1 (y) = ag1 (y) + (1 − a)g2 (y). To construct dependent uncorrelated random variables let (X. n1 ). g2 ∼ binomial(n. Y ) = aµ1 ξ1 + (1 − a)µ2 ξ2 − [aµ1 + (1 − a)µ2 ][aξ1 + (1 − a)ξ2 ] = a(1 − a)[µ1 ξ1 − µ1 ξ2 − µ2 ξ1 + µ2 ξ2 ] = a(1 − a)[µ1 − µ2 ][ξ1 − ξ2 ]. p p (iii) f1 ∼ binomial(n1 .

if t > 1 P (B 2 > 4AC) P (2 log B > log 4 + log A + log C) P (−2 log B ≤ − log 4 − log A − log C) P (−2 log B ≤ − log 4 + (− log A − log C)) . i=1 − log Xi ∼ gamma(n. Y = − log A − log C. 1).8.6.511.53 P (Real Roots) = = = = dz dz + 1 1 1 2zt dz t if t ≤ 1 if t ≤ 1 1 t 1 − t/4 1 t − 4t + log t if t ≤ 1 . independent. ye−y/b = b(a + b)e−a/b and hence 2 + log 4 3 n Integration-by-parts will show that P (Real Roots) = n ∞ a 1 1 (1 + log 4) − 4 24 n = . Let X = −2 log B. − log y Γ(n) and fY (y) = d dy ∞ − log y 1 n−1 −z z e dz Γ(n) 1 d = − (− log y)n−1 e−(− log y) (− log y) Γ(n) dy 1 = (− log y)n−1 . By Example 4. Now.51 a. 1 P (XY /Z ≤ t) = 0 P (XY ≤ zt)dz 1 0 0 zt 2 + (1 − zt) zt 2 + (1 − zt) 1 2t = = 4. Γ(n) . 0 < y < 1. and P (Real Roots) = = log 4 ∞ P (X < − log 4 + Y ) ∞ P (X < − log 4 + y)fY (y)dy − log 4+y 0 1 = log 4 ∞ 1 −x/2 e dxye−y dy 2 = log 4 1 − e− 2 log 4 e−y/2 ye−y dy. Then X ∼ exponential(2). 4. P (X/Y ≤ t) t>1 (1 − t) t ≤ 1 P (XY ≤ t) = t − t log t 0 < t < 1.54 Let Y = i=1 Xi . ∞ 1 n−1 −z P (Y ≤ y) = z e dz. Then P (Y ≤ y) = P ( i=1 Xi ≤ y) = P ( i=1 − log Xi ≥ − log y). 1). n − log Xi ∼ exponential(1) = gamma(1.Second Edition 4-19 4. Therefore. = 1 2t 1 2 + b. 1). Y ∼ gamma(2.

Then P (Y ≤ y) = P (max(X1 . the lifetime of the system.4-20 Solutions Manual for Statistical Inference 4. 1 . X3 ). (ii) (iii) d dr log Ar = −1 r2 1 log( n n x=1 xr ) + i 1 r 1 n 1 n n i=1 n rxr−1 i xr i=1 i = 1 r2 r n i=1 n xr log xi i xr i 1 − log( n n x=1 xr ) . X2 . and let Y = max(X1 . n i=1 1 xi )) = ( i=1 xi ) n . r→0 lim log Ar 1 1 = lim log[ xr ] r r→0 n x=1 i 1 1 = lim log[ xr ] = r→0 r n x=1 i = 1 n n r→0 n lim 1 n n r−1 i=1 rxi n 1 r i=1 xi n = r→0 lim 1 n n r i=1 xi log xi n 1 r i=1 xi n n i=1 log xi = i=1 n 1 1 log( xi ). n x=1 1 1 n ( x1 the arithmetic mean.57 a. ai . i b. (i) if log Ar is nondecreasing then for r ≤ r log Ar ≤ log Ar . i r n r i=1 xi log xi n r x=1 xi − log( 1 xr ) n x=1 i n n = log(n) + r n n r i=1 xi log xi n r x=1 xi − log( x=1 xr ) i xr i n = log(n) + i=1 n xr i n i=1 xr i n i=1 xr i n i=1 xr i xr i r log xi − n i=1 xr i n log( x=1 xr ) i = = log(n) + i=1 n (r log xi − log( x=1 n x=1 xr i xr )) i n log(n) − i=1 xr i log( xr i ) = log(n) − i=1 ai log( 1 ). 3 1 − e−y/λ 0 n 2 −y/λ n = 1 xi . The term Thus A0 = limr→0 Ar = exp( n log( r−1 r−1 d d r r rxi = xi log xi since rxi = dr xi = dr exp(r log xi ) = exp(r log xi ) log xi = xr log xi . X3 ) ≤ y) = P (X1 ≤ y and X2 ≤ y and X3 ≤ y) = P (X1 ≤ y)P (X2 ≤ y)P (X3 ≤ y). Thus Ar is nondecreasing in r.55 Let X1 . Now each probability is P (X1 ≤ y) = 1 − e−y/λ . X2 . Therefore Ar ≤ Ar . y 1 −x/λ e dx 0 λ by the independence of X1 . X2 . X2 and X3 . X3 be independent exponential(λ) random variables. so P (Y ≤ y) = 1−e−y/λ and the pdf is fY (y) = 4. i x=1 where we use the identity for rxr−1 showed in a). the geometric mean. e 0 < y < ∞. y>0 y ≤ 0. xn ) n 1 x−1 ]−1 = n x=1 i n n 1 + ··· + the harmonic mean. then elog Ar ≤ elog Ar . A1 A−1 = = [ [ 1 1 x1 ] 1 n x=1 i = 3 .

Second Edition n 4-21 1 We need to prove that log(n) ≥ i=1 ai log( ai ). This can be done without loss of generality because we could work with the quantities X − EX. Therefore X2 −1 X1 and V = X1 . Y ) = E[E(XY |Z) − E(X|Z)E(Y |Z)] + E[E(X|Z)E(Y |Z)]. and EZ = 0. 4.Y (h1 (u. fU. as the range of X2 changes. 1 − e−(1+ ) P (X1 ≤ x|B2 ) Thus lim e−(x1 +x2 ) dx2 dx1 e−x2 dx2 x x1 e dx1 0 1+ 0 = →0 P (X1 ≤ x|B2 ) = 1 − ex = = P (X1 ≤ x|X2 = 1). 0. v). 5}.15. {x1.e. the solid lines are B1 and the dashed lines are B2 . while the dashed lines are constant.61 a. let U = x1 = h2 (u. Then x2 = h1 (u. EY = 0. . v) = fX. 1+ − 1 1+ − e −v ∗ (1− ) 1− 1 1− v∗ 0 + 1 1− 1 e−1 − 1+ + ∗ ∗ Thus lim →0 P (X1 ≤ x|B1 ) = 1 − e−v − v ∗ e−v = = x 1+ 0 0 ve−v dv = P (V ≤ v ∗ |U = 0). 1 . 0. etc. h2 (u. Y |Z). Using Jensen inequality we have that n n 1 1 1 1 E log( a ) = i=1 ai log( ai ) ≤ log(E a ) = log( i=1 ai ai ) = log(n) which establish the result. (u + 1)2 Thus V |U = 0 has distribution ve . Thus B1 is informative. e−(x+1+ ) − e−(1+ ) − e−x + 1 . v))|J| = e−(uv+1) e−v v. establishing the identity. Y ) = EXY = E[E(XY |Z)]. Adding and subtracting E(X|Z)E(Y |Z) gives Cov(X.05}]}] c.05}].59 Assume that EX = 0. b. This can now be recognized as ECov(X. By iterating the expectation we have Cov(X. Dashing[{}]. the second term above is Cov[E(X|Z)E(Y |Z)]. 4.Dashing[{0. For the ﬁrst term write E[E(XY |Z) − E(X|Z)E(Y |Z)] = E [E { XY − E(X|Z)E(Y |Z)| Z}] where we have brought E(X|Z) and E(Y |Z) inside the conditional expectation. The following Mathematica code will draw the picture. v) = uv +1. v) = v. PlotStyle -> {Dashing[{}]. 1 + e}. e = 1/10. P (X1 ≤ x|B1 ) = P (V ≤ v | − < U < ) = e−1 = e −v ∗ (1+ ) ∗ v∗ 0 − ∞ 0 − ve−(uv+1) e−v dudv ve−(uv+1) e−v dudv .15. Dashing[{0. Plot[{-e*x1 + 1. Note that the solid lines increase with x1. Since E[E(X|Z)] = EX = 0. 0. To ﬁnd the distribution of f (X1 |Z). and fU (u) = 0 −v ∞ ve−(uv+1) e−v dv = e−1 . e*x1 + 1. The distribution of X1 |X2 is e−x1 since X1 and X2 are independent.V (u.

4.4-22 Solutions Manual for Statistical Inference 4. |X + Y | ≤ |X| + |Y | ⇒ E|X + Y | ≤ E(|X| + |Y |) = E|X| + E|Y |. Note that g(x0 ) is a maximum in {x : h(x) ≤ 0} and a minimum in {x : h(x) ≥ 0} since g(x) is nondecreasing. unless P (Z = EZ = 0) = 1. Let a and b be real numbers.64 a. For part (a) ∞ E(g(X)h(X)) = −∞ g(x)h(x)fX (x)dx g(x)h(x)fX (x)dx + {x:h(x)≤0} {x:h(x)≥0} = ≤ g(x0 ) g(x)h(x)fX (x)dx h(x)fX (x)dx {x:h(x)≥0} h(x)fX (x)dx + g(x0 ) {x:h(x)≤0} ∞ = −∞ h(x)fX (x)dx = g(x0 )Eh(X) = 0. where x0 is the number such that h(x0 ) = 0. by Jensen’s Inequality EX = Eg(Z) ≥ g(EZ) = e0 = 1. b. Take the square root of both sides to get |a + b| ≤ |a| + |b|. there is equality in Jensen’s Inequality if and only if there is an interval I with P (Z ∈ I) = 1 and g(z) is linear on I. The case when g(x) and h(x) are both nonincreasing can be proved similarly. So EX > 1.63 Since X = eZ and g(z) = ez is convex.65 Without loss of generality let us assume that Eg(X) = Eh(X) = 0. 4. For part (b) where g(x) and h(x) are both nondecreasing ∞ E(g(X)h(X)) = −∞ g(x)h(x)fX (x)dx g(x)h(x)fX (x)dx + {x:h(x)≤0} {x:h(x)≥0} = ≥ g(x0 ) g(x)h(x)fX (x)dx h(x)fX (x)dx {x:h(x)≥0} h(x)fX (x)dx + g(x0 ) {x:h(x)≤0} ∞ = −∞ h(x)fX (x)dx = g(x0 )Eh(X) = 0. In fact. But ez is linear on an interval only if the interval is a single point. Then. |a + b|2 = (a + b)(a + b) = a2 + 2ab + b2 ≤ |a|2 + 2|ab| + |b|2 = (|a| + |b|)2 . .

Hence fZ (z) = if B = τ σ+τ . 5. σπB−τ πD− = π 2 στ τ π(σ+τ ) 1+ (z/(σ+τ ))2 M= −στ 2 1 2z(σ+τ ) 1+( z )2 . 5. For Z = X − Y .01.3 Note that Yi ∼ Bernoulli with pi = P (Xi ≥ µ) = 1 − F (µ) for each i. p). Then b. −∞ 1 2 The integral is ﬁnite and equal to zero if A = M σ2 . set W = X. The probability that a sample contains a color blind person is P (X > 0) = 1 − P (X = 0). w) = fX (w)fY (z/w) · |−1/w|. thus fZ (z) = −∞ |w/z 2 |fX (w)fY (w/z)dw. where P (X = 0) = n (. 1 1 = 1. 0 P (X > 0) = 1 − . 0 1 = w/z 2 . Thus.99n . Then Y = W − Z.99) ≈ 299. Then Y = Z/W and |J| = 0 1 = −1/w.7 It is. thus fZ (z) = −∞ |−1/w| fX (w)fY (z/w)dw. Then 2 −w/z 1/z ∞ fZ. −∞ D 1+ dω ω−z 2 τ = τ πD and ∞ −∞ Aω 1+ ∞ − ω 2 1+ σ Aω 1+ ω σ Cω ω−z 2 τ 2− dω ∞ = −∞ C(ω−z) 1+ 2 ω−z 2 τ dω − Cz = A σ2 ω log 1+ 2 σ − Cτ 2 log 1+ 2 ω−z τ dω ω−z 2 −∞ 1+ τ 2 ∞ − τ πCz. ¯ 5. i=1 Yi ∼ binomial(n. p = 1 − F (µ)). For Z = X/Y . and |J| = 0 −1 ∞ fZ. perhaps.01)0 (. set W = X. a scale transformation.99n > . 1 2πM z 1 1 . set W = X. We have c. ¯ ¯ 5.W (z.Chapter 5 Properties of a Random Sample 5. w) = fX (w)fY (w − z) · 1. . Then 2 1/w −z/w ∞ fZ. where p = . For Z = XY . σ+τ D= σ σ+τ ) . thus fZ (z) = −∞ fX (w)fY (w − z)dw. Therefore the pdf of X is x 1 fX (x) = 1/n fY 1/n = nfY (nx).6 a. Then X ∼ binomial(n.05)/ log(. Then X = (1/n)Y .99)n = . Then Y=W/Z and |J| = ∞ −∞ B 1+ ω σ ∞ 2 dω = σπB.W (z.5 Let Y = X1 + · · · + Xn . X = W . C = M τ22 for some constant M .95 ⇔ n > log(.1 Let X = # color blind people in a sample of size n. easiest to recover the constants by doing the integrations.W (z. w) = fX (w)fY (w/z) · |w/z 2 |. Since the Yi ’s are iid n Bernoulli.

B) = 2n(n − 1) θ4 − θ2 . (Note: Without loss of generality we can assume θ1 = 0. 1 2n(n − 1) = n n (X i −Xj ) i=1 j=1 n n 2 1 2n(n − 1) 1 2n(n − 1) ¯ ¯ (Xi − X + X − Xj )2 i=1 j=1 n n = ¯ 2 ¯ ¯ ¯ 2 (X i −X) −2(X i −X)(X j −X) + (X j −X) i=1 j=1  =   n  n n n   1 ¯ ¯ ¯ ¯  (Xj − X)2  n(Xi − X)2 − 2 (Xi − X) (Xj −X) +n   2n(n − 1)  i=1  i=1 j=1 j=1 =0 = n 2n(n − 1) 1 n−1 n n ¯ (Xi − X)2 + i=1 n 2n(n − 1) n ¯ (Xj − X)2 j=1 = ¯ (Xi − X)2 = S 2 . Although all of the calculations here are straightforward.) 1 (i) Prove the equation for n = 4. so EXi = 0. 4 4 Var(S 2 ) = 1 1 1 2 2 2 2 24 × 2(θ4 + 3θ2 ) + 96(θ4 + 3θ2 ) + 24 × 4θ4 − (24θ2 ) = θ4 − θ2 . (Let Sn denote the variance based on n observations. Of the remaining terms. there is a tedious amount of bookkeeping needed. The expected value E(S 2 )2 = E(S 4 ) contains 256(= 44 ) terms of which 112(= 4 × 16 + 4 × 16 − 42 ) are zero. and to calculate Var(S 2 ) we need to calculate E(S 2 )2 and E(S 2 ). i=1 b.8 a. It seems that induction is the easiest route. The latter expectation is straightforward and we get E(S 2 ) = 24θ2 . 2 24 4 3 (ii) Assume that the formula holds for n.5-2 Solutions Manual for Statistical Inference 5. and establish it for n+1. We have S 2 = 24 i=1 j=1 (Xi − Xj )2 . whenever i = j.) Straightforward algebra will establish   n n n 1 2 2 2  Sn+1 = (X −Xj ) + 2 (X k −Xn+1 ) 2n(n + 1) i=1 j=1 i k=1 def’n = 1 [A + 2B] 2n(n + 1) where Var(A) = 4n(n − 1)2 θ4 − n−3 2 θ n−1 2 (induction hypothesis) (Xk and Xn+1 are independent) (some minor bookkeeping needed) 2 Var(B) = n(n + 1)θ4 − n(n − 3)θ2 2 Cov(A. 2 • 24 are of the form E(Xi − Xj )4 = 2(θ4 + 3θ2 ) 2 2 2 • 96 are of the form E(Xi − Xj ) (Xi − Xk ) = θ4 + 3θ2 2 2 2 • 24 are of the form E(Xi − Xj ) (Xk − X ) = 4θ2 Thus.

then n+1 n+1 n+1 2 a2 i i=1 i=1 n b2 i a2 i i=1 n − i=1 ai bi n n 2 = + a2 n+1 i=1 n b2 i b2 i + n b2 n+1 2 − i=1 ai bi + an+1 bn+1 = i=1 a2 i i=1 n − i=1 ai bi n n + i=1 n−1 n a2 i b2 + a2 n+1 n+1 i=1 n b2 i −2 i=1 ai bi an+1 bn+1 = i=1 j=i+1 n n+1 (ai bj − aj bi )2 + i=1 (ai bn+1 − an+1 bi )2 = i=1 j=i+1 (ai bj − aj bi )2 . θ1 = EXi = µ . Therefore i=1 j=i+1 (ai bj − aj bi )2 = 0. then bi = cai .10 a. S 2 ) = 2 2n (n − 1)  n n n Xk k=1 i=1 j=1   2 (X i −Xj ) . (a1 b2 − a2 b1 )2 = a2 b2 + a2 b2 − 2a1 b2 a2 b1 1 2 2 1 2 2 2 2 = a1 b2 + a2 b1 − 2a1 b2 a2 b1 + a2 b2 + a2 b2 − a2 b2 − a2 b2 1 1 2 2 1 1 2 2 = (a2 + a2 )(b2 + b2 ) − (a1 b1 + a2 b2 )2 . For each of these. 1 2 1 2 Assume that is true for n. for some constant c = 0. c. the entire expectation is nonzero for only two values of k (when k matches either i or j). 2n2 (n − 1) n ¯ and X and S 2 are uncorrelated if θ3 = 0. Thus the correlation coeﬃcient is equal to 1. Then   1 ¯ E Cov(X. 5. Var(S n+1 ) = 2 1 4n2 (n + 1) 2 [Var(A) + 4Var(B) + 4Cov(A. Let n n+1 bi = Y − µy and ai = (X − µx ).9 To establish the Lagrange Identity consider the case when n = 2.  The double sum over i and j has n(n − 1) nonzero terms. Thus ¯ Cov(X. B)] = 1 n−2 2 θ4 − θ .Second Edition 5-3 Hence. 5. Again assume that θ1 = 0. If all the points lie on a straight line then Y − µy = c(X − µx ). n+1 n 2 establishing the induction and verifying the result. S 2 ) = 1 2n(n − 1) EXi (Xi − Xj )2 = θ3 .

 . is to use linear model theory.13 √ E c S2 = c = c σ2 E n−1 σ2 n−1 ∞ 0 S 2 (n − 1) σ2 √ q Γ n−1 2 n−1 1 q ( 2 )−1 e−q/2 dq. σ ≥ ES. Use the fact that (n − 1)S 2 /σ 2 ∼ χ2 n−1 and Varχn−1 = 2(n − 1) to get Var 2 (n − 1)S σ2 2 = 2(n − 1) which implies ( (n−1) )VarS 2 = 2(n − 1) and hence σ4 VarS 2 = 2(n − 1) (n − 1) /σ 4 2 = 2σ 4 . . n−1 n−1 2 c. (n−1)/2 2 Since S 2 (n − 1)/σ 2 is the square root of a χ2 random variable. Aθ = 0. in fact. . 2 Γ(n/2)2n/2 =1 since χ2 n pdf So c = Γ( √ n−1 2 ) √ n−1 2Γ( n ) 2 gives E(cS) = σ. . So VarS 2 = 1 (n − 1) 2 Var(X AX) = 1 (n − 1) 2 2σ 4 (n − 1) + 0 = 2σ 4 . n−1 5. . Write 2 n 1 1 ¯ ¯ S 2 = n−1 i=1 (Xi − X) = n−1 X (I − Jn )X. Since s2 is “linear” only on single points. the inequality will be strict unless there is an interval I such that g is linear on I and P (X ∈ I) = 1. Taking square roots.Where  1 1 1  1− n − n · · · − n  .. not using the χ2 distribution. For any matrix A Var(X AX) = 2µ2 trA2 + 4µ2 θ Aθ. VarS 2 = n (θ4 − n−3 θ2 ) = n (3σ 4 − n−3 σ 4 ) = n−1 . 1 1 −n · · · · · · 1− n Notice that trA2 = trA = n − 1. n  . we know from Jensen’s inequality that Eg(S) ≥ g(ES).   − 1 1− 1 .5-4 Solutions Manual for Statistical Inference θ2 θ3 θ4 = E(Xi − µ)2 = σ 2 = E(Xi −µ)3 = E(Xi − µ)2 (Xi − µ) (Stein’s lemma: Eg(X)(X − θ) = σ 2 Eg (X)) = 2σ 2 E(Xi − µ) = 0 = E(Xi − µ)4 = E(Xi − µ)3 (Xi − µ) = 3σ 2 E(Xi − µ)2 = 3σ 4 . 5. θ = EX = µ1. where µ2 is σ 2 . Now adjust the integrand to be another χ2 pdf and get √ E c S2 = c σ2 Γ(n/2)2n/2 · n − 1 Γ((n − 1)/2)2((n−1)/2 ∞ 0 1 1 q (n−1)/2 − e−q/2 dq . From the proof of Jensen’s Inequality. . unless P (T = ET ) = 1.  n ¯ I − Jn =  . . Since g(·) is a convex function. 4 1 1 2σ 2 b. we have ET 2 > (ET )2 for any random variable T . n−1 Remark: Another approach to b). .   .11 Let g(s) = s2 . which implies σ 2 = ES 2 ≥ (ES)2 . it is clear that.

From Deﬁnition 5. independent. so we make the transformation x = (u/p)/(v/q) and y = u + v. v) to obtain fX. and |J| = 1 + px q py q 1 + px 2. b. y) = 1 Γ p 2 p q xy q + px p 2 −1 Γ q 2 2(p+q)/2 1 y q 1 + px q 2 −1 e −y 2 q py q 1 + px 2. but this one makes calculations easy.) Solving for u and v yields u= p q xy q . n+1 n+1 n+1 n+1 i=1 ¯ Xi − Xn+1 i=1 2 2 = i=1 n+1 Xi − ¯ Xn+1 + nXn n+1 (use (a)) 2 = i=1 n+1 ¯ Xn+1 nXn Xi − − n+1 n+1 ¯ Xi − Xn − = i=1 n+1 ¯ Xn+1 Xn − n+1 n+1 2 ¯ ±Xn 1 (n + 1) 2 2 = i=1 n ¯ Xi −Xn ¯ Xi − Xn i=1 2 ¯ −2 Xi −Xn ¯ + Xn+1 − Xn ¯ Xn+1 −Xn n+1 2 + ¯ Xn+1 −Xn 2 = 2 −2 ¯ (X n+1 −Xn ) n+1 ¯ + 2 Xn+1 − Xn n+1 (n + 1) n 2 since 1 ¯ (Xi − Xn ) = 0 = 5.Y (x. + px 1 v= y q . nS 2 n+1 = n (n + 1) − 1 n+1 n+1 n ¯ Xi Xn+1 + i=1 Xi Xn+1 + nXn = = . Square the random variable in part b). We then substitute into fU.V (u. 3 i=1 Xi −1 i (n − 2 1)Sn n ¯ + Xn+1 − Xn n+1 2 .Second Edition 5-5 5. 5.3.6. .15 a. many choices of y will do. Their joint pdf is p q 1 Γ p 2 Γ q 2 2(p+q)/2 u 2 −1 v 2 −1 e p q −(u+v) 2 . Xi −i 2 i ∼ χ2 3 3 i=2 Xi −i 2 i 2 ∼ t2 c. ¯ Xn+1 = b.17 a. (Of course. Let U ∼ χ2 and V ∼ χ2 .16 a. The choice is prompted by the exponential term in the pdf. the random variable X = (U/p)/(V /q) has an F distribution.

q ) = E Now E(U 2 ) = Var(U ) + (EU )2 = 2p + p2 and E Therefore. we get the F pdf fX (x) = Γ Γ p 2 p+q 2 Γ q 2 q p p/2 xp/2−1 1+ q px p+q 2 . Since Fp. Let Y = U/p V /p q 2 (p + 2) q2 =2 − p(q − 2)(q − 4) (q − 2)2 1 X q q−2 2 q+p−2 p(q − 4) . ﬁrst calculate U 2 q2 p2 V 2 = q2 E(U 2 )E p2 1 V2 . 2 p (q − 2)(q − 4) p (q − 2)(q − 4) and.p .q = Γ q 2 2q/2 q q−2 .q ) = c. V ∼ χ2 and U and V are independent. V ∼ χ2 and U and V are independent. 2 Γ (q/2) 2q/2 v (q − 2)(q − 4) 1 q2 (p + 2) q2 p(2 + p) = . Then we have p q = = E U/p V /q p 1 qE p V = E U p E q V (by independence) (EU = p). q let U ∼ χ2 . If we integrate out y to recover the p+q proper constant. b. 2 2 . 2 2 q 2 −1 p q p qy p(1−y) q p+q 2 1+ p qy q p(1−y) q p(1 − y) 2 = y 2 −1 (1 − y) 2 −1 ∼ beta p q . p q qY p(1−Y ) (p/q)X 1+(p/q)X pX q+pX . EFp. . Y has pdf p−2 2 fY (y) = Γ Γ B p 2 q+p 2 Γ p q . hence Var(Fp. Thus.q = χ2 /p p χ2 /q . q > 4. q−2 v = Hence.5-6 Solutions Manual for Statistical Inference Note that the pdf factors.q Then E 1 V ∞ = 0 1 vΓ 1 1 q 2 v 2 −1 e− 2 dv = 2q/2 q−2 2 2(q−2)/2 = q v 1 Γ Γ q 2 ∞ 2q/2 q−2 2 q−2 2 v 0 q−2 2 −1 e− 2 dv = 1 . 2 E(Fp. 2 EFp. then = = V /q U/p ∼ Fq. EFp. Write X = d. and we can read oﬀ the pdfs of each: X has the F distribution and Y is χ2 . since U ∼ χ2 .q = 1 V2 ∞ = 0 1 1 1 v (q/2)−1 e−v/2 dv = . Γ Γ 2(q−2)/2 q−2 2 2q/2 p q p q−2 = if q > 2. showing that X and Y are independent. so X = and p 2 dx dy q = p (1 − y)−2 . To calculate the variance.

q d q b.ν > Fα.ν . for any a > 0.20 a. kFα. 1 e. fX (x) = √ Γ(p/2) pπ (1 + x2 /p)(p+1)/2 Denote the quantity in square brackets by Cp .ν > a).ν is stochastically larger that (k−1)Fk−1. Thus. EX = EZ/ V /p = (EZ)(E1/ V /p) = 0. For k1 > k2 .k. d P (χp > a) = P (χ2 + χ2 > a) > P (χ2 > a).28) we have √ p→∞ lim Cp = p→∞ lim √ 2π 2π p−1 2 p−2 2 p−1 1 2 +2 p−2 1 2 +2 e− e p−1 2 − p−2 2 1 √ pπ = e−1/2 e1/2 √ √ . b.p . ν k we have P (k1 Fk1 . V and W are independent and U ∼ χ22 . The pdf of X is Γ( p+1 ) 1 2 .ν is the α cutoﬀ point for the random variable kFk. as long as the other expectation is ﬁnite. k V ∼ χ21 −k2 and W ∼ χ2 . then X = Z/ V /p where Z ∼ n(0.k.ν . since EZ = 0.ν ∼ (U + V )/(W/ν).14. So. V ∼ χ2 and Z and V are independent. χ2 ∼ χ2 + χ2 where χ2 and χ2 are independent χ2 random variables with q and d = p − q p q q d d degrees of freedom. α = P (Fk. Because kFk.p is the square of a tp . so the ratio is distributed F1. each a tp squared. Applying the lemma again shows that for each x p→∞ lim 1+x2 /p (p+1)/2 = ex 2 /2 . d. we expect it to converge to a χ2 random variable as q p → ∞. a χ2 . that is kFα. This is so if p > 1. X 2 ∼ F1.k−1.ν is larger than the α cutoﬀ for (k − 1)Fk−1. establishing the result.ν ). we conjecture that it would converge to the square of a n(0. From part b).ν .3.ν > (k − 1)Fα. Z 2 ∼ χ2 .k.ν . From an extension of Stirling’s formula (Exercise 1.Second Edition 5-7 5.17b).k. π 2 = e−1/2 √ lim π p→∞ p−1 2 p−2 2 p−1 1 2 +2 p−2 1 2 +2 √ p by an application of Lemma 2. 5. 1). X 2 = Z 2 /(V /p). the α cutoﬀ for kFk. 1) random variable. Since χ2 is a positive random variable.ν > a) = P ((U + V )/(W/ν) > a) > P (U/(W/ν) > a) = P (k2 Fk2 . c. The given integral is ∞ 0 √ 2 1 1 √ e−t x/2 ν x (νx)(ν/2)−1 e−νx/2 dx ν/2 2π Γ(ν/2)2 1 ν ν/2 √ 2π Γ(ν/2)2ν/2 ∞ 0 = e−t 2 x/2 ((ν+1)/2)−1 −νx/2 x e dx .19 a. because V /(W/ν) is a positive random variable. As the random variable F1.p can be thought of as the sum of q random variables. 1 c. The random variable qFq.18 If X ∼ tp . k1 Fk1 .ν > kFα. if p > 2 (from Exercise 5.p .ν ) = P (kFk. where U . p a. Thus VarX = EX 2 = p/(p − 2). by all of the above. For any a > 0. 5.

for general n we have P (max(X 1 . collect terms. 2/(ν+t2 ) 1 ν √ Γ((ν + 1)/2) 2π Γ(ν/2)2ν/2 (ν+1)/2 1 1 Γ((ν+1)/2) √ . write out the chi-squared pdfs. Y ) > z)] − [1 − P (min(X.5-8 Solutions Manual for Statistical Inference = = = 1 ν ν/2 √ 2π Γ(ν/2)2ν/2 ν/2 ∞ 0 x((ν+1)/2)−1 e−(ν+t 2 ν+t2 2 )x/2 dx integrand is kernel of gamma((ν+1)/2. .22 Calculating the cdf of Z 2 . Xn ) > m) = = 5. 2 νπ Γ(ν/2) (1 + t /ν)(ν+1)/2 the pdf of a tν distribution.21 Let m denote the median. as before. 1 − P (Xi ≤ m for i = 1. 5. .ν . . b.ν . n) 1 − [P (X1 ≤ m)] n n = 1− 1 2 . the pdf of an F1. Now. the pdf of an Fm. . and deﬁne y = (ν/m)t to get fF (y) = Γ( ν+m ) 2 Γ(m/2)Γ(ν/2) m ν m/2 y m/2−1 (1 + (m/ν)y) (ν+m)/2 . Y ))2 ≤ z) = P (−z ≤ min(X. Y ) > − z)] √ √ P (min(X. . c. and collect terms to obtain (ν/m)fF ((ν/m)t) = t−m/2 Γ(m/2)Γ(ν/2)2 (ν+m)/2 0 ∞ y (m+ν−2)/2 e−(1+t)y/2 dy. . Diﬀerentiate both sides with respect to t to obtain ∞ νfF (νt) = 0 yf1 (ty)fν (y)dy. . Y ) > − z) − P (min(X. Again diﬀerentiate both sides with respect to t. Y ) ≤ − z) √ √ [1 − P (min(X. Y ) > z) √ √ √ √ P (X > − z)P (Y > − z) − P (X > z)P (Y > z). Y ) ≤ z) √ √ P (min(X. Y ) ≤ z) − P (min(X. where fF is the F pdf. . νΓ(1/2)Γ(ν/2) (1 + y/ν)(ν+1)/2 t−1/2 Γ(1/2)Γ(ν/2)2 t−1/2 Γ(1/2)Γ(ν/2)2 (ν+1)/2 (ν+1)/2 0 ∞ y (ν−1)/2 e−(1+t)y/2 dy Γ( ν+1 )2(ν+1)/2 2 (1 + t) (ν+1)/2 . integrate the gamma kernel. . Now write out the two chi-squared pdfs and collect terms to get νfF (νt) = = Now deﬁne y = νt to get fF (y) = −1/2 Γ( ν+1 ) (y/ν) 2 . Then. we obtain FZ 2 (z) = = = = = √ P ((min(X. 2.

0 < z < y < θ. √ √ d P (− z ≤ X ≤ z) = dz 1 √ z −1/2 e−z/2 .Y (z. 0 < x < θ. FX (x) = x/θ. 5. 1 5. Since X and Y are identically distributed. . Let Y = X(n) . Ux > z|x)P (X = x) (by independence of the Ui ’s) ∞ P (Ui > z)P (X = x) x=1 i=1 ∞ = x=1 P (Ui > z)x P (X = x) = x=1 ∞ x (1 − z)x 1 (e − 1)x! = 1 (1 − z) (e − 1) x=1 x! = e 1−z −1 e−1 0 < z < 1. . Moreover. from Theorem 5.23 ∞ ∞ . P (X > a) = P (Y > a) = 1 − FX (a).4. Z = X(1) . θn . fZ. Alternatively.Second Edition 5-9 where we use the independence of X and Y .6. √ √ since 1 − FX ( z) = FX (− z). Y )] ≤ z √ √ = P (− z ≤ min(X. X ≤ Y ) + P (− z ≤ Y ≤ z. Y ≤ X) √ √ = P (− z ≤ X ≤ z|X ≤ Y )P (X ≤ Y ) √ √ +P (− z ≤ Y ≤ z|Y ≤ X)P (Y ≤ X) √ √ √ √ 1 1 P (− z ≤ X ≤ z) + P (− z ≤ Y ≤ z). y) = n! 11 0!(n − 2)!0! θ θ z θ 0 y−z θ n−2 1− y θ 0 = n(n − 1) (y − z)n−2 . Then. so √ √ √ FZ 2 (z) = (1 − FX (− z))2 − (1 − FX ( z))2 = 1 − 2FX (− z). . 2π 1 1 1 √ (e−z/2 z −1/2 + e−z/2 z −1/2 ) 2 2 2π P (Z > z) = = P (Z > z|x)P (X = x) = x=1 ∞ x x=1 P (U1 > z. dz z 2π the pdf of a χ2 random variable. Y ) ≤ z) √ √ √ √ = P (− z ≤ X ≤ z.24 Use fX (x) = 1/θ. Diﬀerentiating and substituting gives fZ 2 (z) = √ 1 d 1 FZ 2 (z) = fX (− z) √ = √ e−z/2 z −1/2 . = 2 2 1 2 2 using the facts that X and Y are independent. 1 P (Z ≤ z) 2 = P [min(X. and P (Y ≤ X) = P (X ≤ Y ) = since X and Y are identically distributed √ √ 2 P (Z ≤ z) = P (− z ≤ X ≤ z) and fZ 2 (z) = = the pdf of a χ2 . .

4. . integrate out the other variables and obtain that fY1 (y1 ) = c1 y1 .7. Make the one-to-one transformation to Y1 = X(1) /X(2) . . 0 < w < 1. we obtain fYi (yi ) = iayi . the pdf of Yn is fYn (yn ) = θna yn product of these marginal pdfs is the joint pdf given above. From Example 5. we obtain the following.5-10 Solutions Manual for Statistical Inference Now let W = Z/Y . ia−1 0 < y1 < 1. Z = W Q. yn ) factors so Y1 . . u > v. .4. . 5.25 The joint pdf of X(1) . b. . . from a population with pdf given by the truncated distribution. W and Q are independent. yn ) = = n!an 2 n−1 (y1 · · · yn )a−1 (y2 · · · yn )a−1 · · · (yn )a−1 (y2 y3 · · · yn ) θan n!an a−1 2a−1 na−1 y y2 · · · yn . .4. Yn = X(n) . . 0 < yn < θ. a−r r/2 < v < a − r/2. 5. depending on which of i or j is greater. . . Yn are mutually independent. This is the pdf of the (i−j)th order statistic from a sample of size n−j. n − 1. To get the pdf of a−1 Y1 . . u < v. f (u) = fX (u)/FX (v). . fV |R (v|r) = n(n − 1)r n(n − 1)r n−2 /a n n n−2 (a − r)/a = 1 . from a population with pdf given by the truncated distribution. 0 < y1 < 1. un ) = n!an a−1 u · · · ua−1 . n − 1. and. it must be that c1 = a. . So the joint pdf of Y1 . and after cancellation. . 0 < yi < 1. (i) If i < j. Q = Y . 0 < yn < θ. From na na−1 . Similarly. 0 < yi < 1. . . . . Then Y = Q. . . . Consider two cases. θn θn The joint pdf factors into functions of w and q. . u < v. .27 a. Using the formulas from Theorems 5. X(n) is f (u1 .6. and |J| = q. . θan 1 We see that f (y1 . . Yn is f (y 1 .4 and 5.4. . fX(i) |X(j) (u|v) = fX(i) . fX(i) |X(j) (u|v) = = (n − j)! n−i i−1−j j−n fX (u) [1−F X (u)] [FX (u) − F X (v)] [1−F X (v)] (n − 1)!(i − 1 − j)! FX (u) − F X (v) (n − j)! fX (u) (i − j − 1)!(n − i)! 1−F X (v) 1−F X (v) i−j−1 1− FX (u) − F X (v) 1−F X (v) n−i . hence. . v)/fX(j) (v). . Therefore fW. . . .X (j) (u. . fX(i) |X(j) (u|v) = = (j − 1)! 1−j i−1 fX (u)FX (u)[FX (v) − FX (u)]j−i−1 FX (v) (i − 1)!(j − 1 − i)! (j − 1)! fX (u) FX (u) (i − 1)!(j − 1 − i)! FX (v) FX (v) i−1 1− FX (u) FX (v) j−i−1 . Thus fY1 (y1 ) = ay1 . . It can be checked that the Theorem 5. n θan 1 0 < u1 < · · · < un < θ. for some a−1 constant c1 . To have this pdf integrate to 1. f (u) = fX (u)/(1 − FX (v)). q) = n(n − 1) n(n − 1) (q − wq)n−2 q = (1 − w)n−2 q n−1 . (ii) If j < i and u > v. 2 n−1 The Jacobian is J = y2 y3 · · · yn .Q (w. 0 < q < θ. . Note this interpretation. i = 1. This is the pdf of the ith order statistic from a sample of size j−1.4. Yn−1 = X(n−1) /X(n) . . . for i = 2.

004 − 1)/(.32 a. yielding bounds on X − µ that are almost twice as big as the normal approximation.3 ¯ = P (−. n/5 2 = 2. since Xn → a in probability. X .3 ∼ n(0. we want ¯ are independent.9 which implies k ≥ 10 = 3. 2 5 2 √ √ √ √ where Z ∼ n(0. ¯ Thus. X1 ∼ n(µ.4935 < X − µ < .8) = .2119. √ √ as n → ∞. ¯ ¯ ¯ ¯ 5. P (X > 1. which implies n = 50(2. √ We need 1 − 1/k 2 ≥ .4935). σX with σX = ¯ ¯ ¯ − µ)/. ¯ X is probably very close to normally distributed. σ 2 /n). Use Chebyshev’s Inequality to get ¯ ¯ P −3k/10 < X−µ < 3k/10 ≥ 1 − 1/k 2 . Thus we need P (Z ≥ n/5( 2)) ≈ .16 and 3k/10 = . 2 5. For any > 0. Since X1 and X2 ¯ 1 − X2 ∼ n(0.09 = .9 2 ¯ by Chebychev’s Inequality.30 From the CLT we have.29 Let Xi = weight of ith booklet in package. Thus Xn → a in probability. 5. 1+ a+ a a <a+ 1+ 1− as n → ∞. Thus a/Xn → 1 in probability.99 ≈ P = P ≈ P ¯ ¯ X1 −X2 < σ/5 −σ/5 σ/ − 1 5 n/2 < ¯ ¯ X1 −X2 σ/ n/2 < σ/5 σ/ n/2 n 1 n <Z< .9 = P −1. Using the CLT. 2σ 2 /n). approximately.Second Edition 5-11 5. P a −1 ≤ Xn = P = P ≥ P = P a a ≤ Xn ≤ 1+ 1− a a a− ≤ Xn ≤ a + 1+ 1− a a a− ≤ Xn ≤ a + 1+ 1+ a |Xn − a| ≤ → 1. ¯ By the CLT.052 .004 = P (X > 1.4 = P i=1 Xi /100 > 1.645 . The Xi s are iid with EXi = 1 and VarXi = . σ 2 /n). .31 We know that σX = 9/100.9487. even if the underlying X distribution is not close to normal. 1).3 . √ √ √ √ P Xn − a > = P Xn − a Xn + a > Xn + a √ = P |Xn − a| > Xn + a √ ≤ P |Xn − a| > a → 0. Thus.9487 < X − µ < . b. 100 100 ¯ We want to approximate P i=1 Xi > 100.004) ≈ P (Z > (1. since Xn → a in probability. X2 ∼ n(µ.645 < ¯ X−µ < 1. Moreover. with a sample of size 100.005. Thus and (X √ . X is approximately n µ.004). we again see the conservativeness of Chebychev’s Inequality. For any > 0. Thus ¯ P (−.576)2 ≈ 332. 1). From Table 1.05/10)) = P (Z > .576.9487) ≥ .

2 d d 1 P (Z ≤ x) = FZ (x) = fZ (x) = √ e−x /2 .34 Using EXn = µ and VarXn = σ 2 /n.1085 .0147 . Substituting x = 0 2π √ yields n! ≈ nn+1/2 e−n 2π.1024 . For the exact calculations. (1/Γ(n))(x n + n)n−1 e−(x n+n) n ≈ √1 e−x /2 as n → ∞.1007 .0258 . 1) 1/ n b. Sn → σ 2 in probability. 5. .0332 .1113 . we obtain √ ¯ √ √ n(Xn −µ) n ¯ n E = E(Xn − µ) = (µ − µ) = 0.0588 .0882 . σ σ σ σ n ¯ 5. Xn is approximately n(1. σ σ σ √ ¯ n(Xn −µ) n n n σ2 ¯ ¯ Var = 2 Var(Xn − µ) = 2 VarX = 2 = 1. Note that the recursion relation of problem 3. 1) i=1 √ √ √ d FW (x n + n) = fW (x n + n) · n = dx √ √ √ 1 (x n + n)n−1 e−(x n+n) n.0392 .1148 .1030 . Sn = Sn → σ 2 = σ in probability.0824 .0788 .0056 . Then P (Xn + Yn > c) ≥ P (Xn > −m.48 can be used to simplify calculations.1114 . By b).1100 . +Yn > c + m) ≥ P (Xn > −m) + P (Yn > c + m) − 1 = 1 − .33 For all > 0 there exist N such that if n > N . From the CLT. d P dx = = ¯ Xn −1 √ ≤x 1/ n d dx n √ Xi ≤ x n + n n W = i=1 Xi ∼ gamma(n. 1/n).0664 .5-12 Solutions Manual for Statistical Inference √ 2 2 c. use the fact that Vn is itself distributed negative binomial(10r.0572 . dx dx 2π and P ¯ Xn −1 √ ≤x 1/ n → P (Z ≤ x).1162 . By a).0151 . µX = 1.1144 .0083 . then P (Xn + Yn > c) > 1 − .37 a. Choose N1 such that P (Xn > −m) > 1 − /2 and N2 such that P (Yn > c + m) > 1 − /2. σ/Sn → 1 in probability.0201 .0937 .0071 .0263 .1137 .0048 .0549 . ¯ ¯ 5. Normal w/cont.0008 . p). 5.0944 P (Vn = v) (b) (c) Normal App. Γ(n) √ √ √ 2 Therefore. The results are summarized in the following table. VarX = 1. So ¯ Xn −1 √ → Z ∼ n(0.0113 . v 0 1 2 3 4 5 6 7 8 9 10 (a) Exact .35 a. Xi ∼ exponential(1).

Second Edition

5-13

b. Using the normal approximation, we have µv = r(1 − p)/p = 20(.3)/.7 = 8.57 and σv = Then, P (Vn = 0) = 1 − P (Vn ≥ 1) = 1 − P Vn −8.57 1−8.57 ≥ 3.5 3.5 = 1 − P (Z ≥ −2.16) = .0154. r(1 − p)/p =
2

(20)(.3)/.49 = 3.5.

Another way to approximate this probability is P (Vn = 0) = P (Vn ≤ 0) = P V − 8.57 0−8.57 ≤ 3.5 3.5 = P (Z ≤ −2.45) = .0071.

Continuing in this way we have P (V = 1) = P (V ≤ 1) − P (V ≤ 0) = .0154 − .0071 = .0083, etc. c. With the continuity correction, compute P (V = k) by P (k−.5)−8.57 ≤ Z ≤ (k+.5)−8.57 , so 3.5 3.5 P (V = 0) = P (−9.07/3.5 ≤ Z ≤ −8.07/3.5) = .0104 − .0048 = .0056, etc. Notice that the continuity correction gives some improvement over the uncorrected normal approximation. 5.39 a. If h is continuous given > 0 there exits δ such that |h(xn ) − h(x)| < for |xn − x| < δ. Since X1 , . . . , Xn converges in probability to the random variable X, then limn→∞ P (|Xn − X| < δ) = 1. Thus limn→∞ P (|h(Xn ) − h(X)| < ) = 1. b. Deﬁne the subsequence Xj (s) = s + I[a,b] (s) such that in I[a,b] , a is always 0, i.e, the subsequence X1 , X2 , X4 , X7 , . . .. For this subsequence Xj (s) → 5.41 a. Let = |x − µ|. s if s > 0 s + 1 if s = 0.

(i) For x − µ ≥ 0 P (|Xn − µ| > ) = = ≥ = P (|Xn − µ| > x − µ) P (Xn − µ < −(x − µ)) + P (Xn − µ > x − µ) P (Xn − µ > x − µ) P (Xn > x) = 1 − P (Xn ≤ x).

Therefore, 0 = limn→∞ P (|Xn −µ| > ) ≥ limn→∞ 1−P (Xn ≤ x). Thus limn→∞ P (Xn ≤ x) ≥ 1. (ii) For x − µ < 0 P (|Xn − µ| > ) = = ≥ = P (|Xn − µ| > −(x − µ)) P (Xn − µ < x − µ) + P (Xn − µ > −(x − µ)) P (Xn − µ < x − µ) P (Xn ≤ x).

Therefore, 0 = limn→∞ P (|Xn − µ| > ) ≥ limn→∞ P (Xn ≤ x). By (i) and (ii) the results follows. b. For every > 0, P (|Xn − µ| > ) ≤ P (Xn − µ < − ) + P (Xn − µ > ) = P (Xn < µ − ) + 1 − P (Xn ≤ µ + ) → 0 as n → ∞.

5-14

Solutions Manual for Statistical Inference

5.43 a. P (|Yn − θ| < ) = P

(n)(Yn − θ) <

(n)

. Therefore, (n) = P (|Z| < ∞) = 1,

n→∞

lim P (|Yn − θ| < ) = lim P
n→∞

(n)(Yn − θ) <

where Z ∼ n(0, σ 2 ). Thus Yn → θ in probability. √ b. By Slutsky’s Theorem (a), g (θ) n(Yn − θ) → g (θ)X where X ∼ n(0, σ 2 ). Therefore √ √ n[g(Yn ) − g(θ)] = g (θ) n(Yn − θ) → n(0, σ 2 [g (θ)]2 ). 5.45 We do part (a), the other parts are similar. Using Mathematica, the exact calculation is In[120]:= f1[x_]=PDF[GammaDistribution[4,25],x] p1=Integrate[f1[x],{x,100,\[Infinity]}]//N 1-CDF[BinomialDistribution[300,p1],149] Out[120]= e^(-x/25) x^3/2343750 Out[121]= 0.43347 Out[122]= 0.0119389. The answer can also be simulated in Mathematica or in R. Here is the R code for simulating the same probability p1<-mean(rgamma(10000,4,scale=25)>100) mean(rbinom(10000, 300, p1)>149) In each case 10,000 random variables were simulated. We obtained p1 = 0.438 and a binomial probability of 0.0108. 5.47 a. −2 log(Uj ) ∼ exponential(2) ∼ χ2 . Thus Y is the sum of ν independent χ2 random variables. 2 2 By Lemma 5.3.2(b), Y ∼ χ2 . 2ν b. β log(Uj ) ∼ exponential(2) ∼ gamma(1, β). Thus Y is the sum of independent gamma random variables. By Example 4.6.8, Y ∼ gamma(a, β) c. Let V = j=1 log(Uj ) ∼ gamma(a, 1). Similarly W = V Exercise 4.24, V +W ∼ beta(a, b). 5.49 a. See Example 2.1.4. 1 b. X = g(U ) = − log 1−U . Then g −1 (x) = 1+e−y . Thus U fX (x) = 1 × e−y e−y = −y )2 (1 + e (1 + e−y )2
a b j=1

log(Uj ) ∼ gamma(b, 1). By

− ∞ < y < ∞,

which is the density of a logistic(0, 1) random variable.
1 c. Let Y ∼ logistic(µ, β) then fY (y) = β fZ ( −(y−µ) ) where fZ is the density of a logistic(0, 1). β Then Y = βZ + µ. To generate a logistic(µ, β) random variable generate (i) generate U ∼ U uniform(0, 1), (ii) Set Y = β log 1−U + µ. 5.51 a. For Ui ∼ uniform(0, 1), EUi = 1/2, VarUi = 1/12. Then 12

X=
i=1

¯ Ui − 6 = 12U − 6 =

12

¯ U −1/2 √ 1/ 12

Second Edition

5-15

√ ¯ is in the form n (U −EU )/σ with n = 12, so X is approximately n(0, 1) by the Central Limit Theorem. b. The approximation does not have the same range as Z ∼ n(0, 1) where −∞ < Z < +∞, since −6 < X < 6. c.
12 12 12

EX = E
i=1

Ui −6

=
i=1 12

EUi − 6 =
i=1 12

1 2

− 6 = 6 − 6 = 0.

VarX = Var
i=1

Ui −6

= Var
i=1

Ui = 12VarU1 = 1

EX 3 = 0 since X is symmetric about 0. (In fact, all odd moments of X are 0.) Thus, the ﬁrst three moments of X all agree with the ﬁrst three moments of a n(0, 1). The fourth moment is not easy to get, one way to do it is to get the mgf of X. Since EetU = (et − 1)/t, E e
t
12 i=1

Ui −6

= e−6t

et −1 t

12

=

et/2 − e−t/2 t

12

.

Computing the fourth derivative and evaluating it at t = 0 gives us EX 4 . This is a lengthy calculation. The answer is EX 4 = 29/10, slightly smaller than EZ 4 = 3, where Z ∼ n(0, 1). 5.53 The R code is the following: a. obs <- rbinom(1000,8,2/3) meanobs <- mean(obs) variance <- var(obs) hist(obs) Output: > meanobs [1] 5.231 > variance [1] 1.707346 b. obs<- rhyper(1000,8,2,4) meanobs <- mean(obs) variance <- var(obs) hist(obs) Output: > meanobs [1] 3.169 > variance [1] 0.4488879 c. obs <- rnbinom(1000,5,1/3) meanobs <- mean(obs) variance <- var(obs) hist(obs) Output: > meanobs [1] 10.308 > variance [1] 29.51665

Therefore Zi are Bernoulli(pi ) with E[Zi ] = pi . Var(Zi ) = pi (1 − pi ) and E[Z1 Z2 ] = P (Z1 = 1. X3 ) = λ3 since X1 .61 a.5-16 Solutions Manual for Statistical Inference 5. Cov(Z1 . Z2 ) = E[Z1 Z2 ] − E[Z1 ]E[Z2 ] = e−λ1 e−λ2 e−λ3 − e−(λi +λ3 ) e−(λ2 +λ3 ) = e−(λi +λ3 ) e−(λ2 +λ3 ) (eλ3 − 1) = p1 p2 (eλ3 − 1). Y2 ) = Cov(X1 + X3 . since a − [a] > 0 and b − [b] > 0 and y ∈ (0.57 a. X3 = 0) = e−(λi +λ3 ) . Therefore. Y2 = 0) = P (X1 + X3 = 0. Cov(Y1 .55 Let X denote the number of comparisons. Therefore. X2 + X3 = 0) = P (X1 = 0)P (X2 = 0)P (X3 = 0) = e−λ1 e−λ2 e−λ3 . b. Z2 ) ≤ which implies the result. therefore √ p1 (1−p1 ) p2 (1−p2 ) .59 P (Y ≤ y) = P (V ≤ y|U < = y 0 1 c fY p1 (1 − p2 ) p1 (1 − p1 ) p2 (1 − p2 ) p2 (1 − p1 ) p1 (1 − p1 ) p2 (1 − p2 ) = p1 (1 − p2 ) p2 (1 − p1 ) p2 (1 − p1 ) p1 (1 − p2 ) = 1 fY (V )) = c = 1 c y 0 P (V ≤ y. Z2 ) ≤ and Corr(Z1 . Z2 = 1) = P (Y1 = 0. Zi = 1 0 if Xi = X3 = 0 otherwise pi = P (Zi = 0) = P (Yi = 0) = P (Xi = 0. M = supy Γ(a+b) y a−1 (1−y)b−1 Γ(a)Γ(b) Γ([a]+[b]) y [a]−1 (1−y)[b]−1 Γ([a])Γ([b]) < ∞. Z2 ) = E[Z1 Z2 ] − E[Z1 ]E[Z2 ] ≤ p1 − p1 p2 = p1 (1 − p2 ). E[Z1 Z2 ] ≤ pi . Z2 ) = √ c. and Cov(Z1 . Corr(Z1 . X2 + X3 ) = Cov(X3 . X2 and X3 are independent. 1). 5. Then ∞ ∞ EX = k=0 P (X > k) = 1 + k=1 ∞ P (U > Fy (yk−1 )) ∞ = 1+ k=1 (1 − Fy (yk−1 )) = 1 + k=0 (1 − Fy (yi )) = 1 + EY 5. Z2 ) ≤ p2 (1 − p1 ). Cov(Z1 . p1 p2 (eλ3 −1) Thus Corr(Z1 . . U < 1 fY (V )) c P (U < 1 fY (V )) c fY (v)dv 1 c y (v) 0 1 c dudv = 0 fY (v)dv 5.

follow the algorithm on page 254 and generate the sample Z1 . this is when a = [a] and b = [b] or a = a and b = [b] or a = [a] and b = b. c This term is b−b a−a +b−b maximize at α c .63 M = supy y= −1 λ −y 2 √1 e 2 2π −|y| 1 λ 2λ e . therefore M is minimize at λ = 1 or λ = −1. this is at α = 1 c. Therefore in both cases M = . First consider c = α + β. a. 5.. We want to generate X = σZ + µ where Z ∼ Student’s t with ν degrees of freedom. 5. Let = 1 2. Let V ∼ n(0. Then df (y) = αy α−1 (1 − y)β − y α β(1 − y)β−1 = y α−1 (1 − y)β−1 [α(1 − y) + βy] dy which is maximize at y = M= α α+β . Among other options one can choose the 100 variables in positions 1001 to 1100 or the ones in positions 1010.. ν−2 Now. The Metropolis Algorithm is used to generate 2000 variables. Let f (y) = −y 2 2 + |y| λ .65 m m P (X ∗ ≤ x) = i=1 −→ m→∞ P (X ∗ ≤ x|qi )qi = i=1 ) Eg f (Y ) I(Y ≤ g(Y ) Eg f (Y ) g(Y I(Yi ≤ x)qi = x −∞ ∞ −∞ f (y) g(y) g(y)dy f (y) g(y) g(y)dy 1 m m f (Yi ) i=1 g(Yi ) I(Yi ≤ m f (Yi ) 1 i=1 g(Yi ) m x) x) = x = −∞ f (y)dy. b−b We need to minimize M in a and b . < ∞. Otherwise the minimum will occur when a − a and b − b are minimum but greater or equal than zero. 1 1 Then d log M = λ − λ3 . 2000. b − b > 0 when b = [b] and will be equal to zero when b = b.. α = a − a and β = b − b a−a a−a +b−b a−a Γ(a+b) Γ(a)Γ(b) Γ(a +b ) Γ(a )Γ(b ) b−b a−a +b−b a−a a−a +b−b a−a b−b . and Var(V ) = Var(Z) = ν . 1). ν−2 ) since given ν we can set EV = EZ = 0. which it is attain when a = a and b = b . M = supy c. Therefore for. Then f (y) is maximize at y = −1 √1 e 2λ2 2π −1 1 λ2 2λ e 1 λ when y ≥ 0 and at 1 when y < 0. since a − [a] − 1 < 0 and y ∈ (0. Therefore we ﬁrst can generate a sample of size 100 from a Student’s t distribution with ν degrees of freedom and then make the transformation to obtain the X’s. Thus fZ (z) = Γ( ν+1 ) 1 2 √ Γ( ν ) νπ 2 1 1+ z2 ν (v+1)/2 ν . Let f (y) = y α (1 − y)β . . 1). 1020. 5. d. Z2 . . since a − [a] > 0 and y ∈ (0. thus it does not aﬀect the result. Note that the minimum that M could be is one. . To minimize M let M = λe 2λ2 . then this term becomes α α c Γ(a+b) Γ(a)Γ(b) Γ(a +b ) Γ(a )Γ(b ) c−α c−α . Thus the value of λ that dλ will optimize the algorithm is λ = 1.Second Edition 5-17 b. . . M = supy Γ(a+b) y a−1 (1−y)b−1 Γ(a)Γ(b) Γ([a]+b) y [a]−1 (1−y)b−1 Γ([a])Γ(b) Γ(a+b) y a−1 (1−y)b−1 Γ(a)Γ(b) Γ([a]+1+β) y [a]+1−1 (1−y)b −1 Γ([a]+1)Γ(b ) < ∞.67 An R code to generate the sample of size 100 from the speciﬁed distribution is shown for part c). Z100 and then calculate Xi = σZi + µ. Then M = ( 1 )(a−a +b−b ) 2 2 .

.cbind(ranvars. follow the algorithm on page 254 where ρi = min Viα−1 −Vi α−1 e Zi−1 α +V −Z α i i−1 +Zi−1 β .2000.runif(1.1778724 5.min=0.1/b) ranvars <. fX (x) = α βe −xα β 2 2 /2) = EX 2 ) − e2µ+σ = Var(X).2000 to be the sample vector..1002. β) where 2 (eµ+(σ /2) )2 α = 2(µ+σ2 ) .ranvars[seq(1010.3 Z <.byrow=T.max=1) V <.2) is: #initialize a and b b <... and then ρ(v..1/b) p <.69 Let w(v. c. z).1) #Another option: choose elements in position 1010.rexp(1.ncol=1) for( i in seq(2000)) { U <. e − e2µ+σ2 since given µ and σ 2 we can set e2(µ+σ ) − e2µ+σ and β = . eµ+(σ2 /2) 2 2 EV = αβ = eµ+(σ and Var(V ) = αβ 2 = e2(µ+σ Now. 1}.V ranvars <..10)] mean(vector..1) if (U <= p) Z <. Now. z) = fY (v)fV (z) fV (v)fY (z) .1100 to be the sample vector.1) var(vector. We will show that Zi ∼ fY ⇒ P (Zi+1 ≤ a) = P (Y ≤ a). xα−1 . .2) Output: [1] 1.2) var(vector. z) = min{w(v.1 <.Z) } #One option: choose elements in position 1001.5-18 Solutions Manual for Statistical Inference b.matrix(c(Z).. fX (x) = e−(log x−µ) √1 x 2πσ 2 /2σ 2 .1020. follow the algorithm on page 254.1758335 [1] 1.pmin((V/Z)^(a-1)*exp((-V^a+V-Z+Z^a)/b).048035 [1] 0.1 An R code to generate a sample size of 100 from a Weibull(3.ranvars[1001:1100] mean(vector. Let V ∼ gamma(α.130649 [1] 0.rexp(1. Let V ∼ exponential(β).2 <.2 a <.

Y ) ≤ 1). v) ≤ 1)w(y. Y )) + P (Y ≤ a and U > ρ(V. Y )) a ∞ = −∞ a −∞ ∞ ρ(v. ρ(V. y) ≤ 1)fY (v)fV (y)dydv −∞ a −∞ ∞ = −∞ I(w(y. v) > 1)fY (v)fV (y)dydv = P (Y ≤ a. −∞ Now. and thus ﬁrst term above can be written a −∞ ∞ I(w(v. Y )). v) ≤ 1) fV (y)fY (v) fV (y)fY (v) fY (y)fV (v) fV (y)fY (v) fY (y)fV (v)dydv fV (y)fY (v)dydv = −∞ I(w(y. ρ(V. y)fY (y)fV (v)dydv I(w(v. Y )). Y ) = 1. notice that w(v. y) = 1/w(y. y) ≥ 1)fY (y)fV (v)dydv −∞ a −∞ a ∞ = −∞ ∞ I(w(y. Since Zi ∼ fY . U ≤ ρ(V. Y )) to get P (Zi+1 ≤ a) = P (Y ≤ a) + P (V ≤ a and U ≤ ρ(V. v) ≤ 1) −∞ a −∞ ∞ = = −∞ a −∞ ∞ −∞ I(w(y. Y ). Thus we need to show that P (V ≤ a and U ≤ ρ(V. y) ≥ 1)fY (y)fV (v)dydv −∞ ∞ = + −∞ I(w(v. Y )). v). .Second Edition 5-19 Write P (Zi+1 ≤ a) = P (Vi+1 ≤ a and Ui+1 ≤ ρi+1 ) + P (Zi ≤ a and Ui+1 > ρi+1 ). y) ≥ 1)fY (y)fV (v)dydv. y) ≤ 1)fY (v)fV (y)dydv −∞ −∞ a ∞ I(w(v. v)fV (y)fY (v)dydv = P (Y ≤ a. y) ≤ 1) −∞ −∞ a ∞ = + −∞ a fY (v)fV (y) fV (v)fY (y) fY (y)fV (v)dydv I(w(v. suppressing the unnecessary subscripts we can write P (Zi+1 ≤ a) = P (V ≤ a and U ≤ ρ(V. Y )) −P (Y ≤ a and U ≤ ρ(V. The second term is a −∞ ∞ I(w(v. Write out the probability as P (V ≤ a and U ≤ ρ(V. Add and subtract P (Y ≤ a and U ≤ ρ(V. v) ≤ 1)fY (y)fV (v)dydv I(w(y. U ≤ ρ(V. Y )). Y )) = P (Y ≤ a and U ≤ ρ(V.

Y ) ≤ 1) = P (Y ≤ a and U ≤ ρ(V. and hence P (Zi+1 ≤ a) = P (Y ≤ a). so fY is the stationary density.5-20 Solutions Manual for Statistical Inference Putting it all together we have P (V ≤ a and U ≤ ρ(V. ρ(V. Y ). U ≤ ρ(V. ρ(V. Y )) = P (Y ≤ a. Y )). . U ≤ ρ(V. Y ) = 1. Y )) + P (Y ≤ a.

5 The sample density is given by n n j=1 t1 (Xj ). 6. 6. .3 Let x(1) = mini xi . T (X) = mini (Xi /i) is suﬃcient because the joint pdf is n f (x1 . . .4 The joint pdf is n k i Xi is a suﬃcient statistic for (µ. . |X| is suﬃcient because the pdf of X is f (x|σ 2 ) = √ 2 2 2 2 1 1 e−x /2σ = √ e−|x| /2σ = g( |x|| σ 2 ) · 1 .Σi xi |µ. i i Thus (min Xi /i.∞) (xi ) = σ i=1 n eµ/σ σ n e−Σi xi /σ I(µ. and the fact that all xi s > iθ if and only if mini (xi /i) > θ. σ).2 By the Factorization Theorem. . .+∞) (T (x)) · e−Σi xi . . X(1) . σ) = 1 −(xi −µ)/σ e I(µ. xn |θ) = i=1 eiθ−xi I(iθ.  wi (θ)ti (xj ) = c(θ)n exp  k n  ti (xj ) · n h(xj )c(θ) exp j=1 i=1 wi (θ) i=1 g(T (x)|θ) j=1 h(xj ) . f (xi |θ) i=1 = 1 I (−i(θ − 1) ≤ xi ≤ i(θ + 1)) 2iθ i=1 1 2θ n n = 1 i i=1 n I min xi xi ≥ −(θ − 1) I max ≤θ+1 . Then the joint pdf is f (x1 . h(x) g ( x(1) . g(T (x)|θ) h(x) Notice. . . . xn |µ. . by the Factorization Theorem. 2πσ 2πσ h(x) 6. we use the fact that i > 0. j=1 h(x) By the Factorization Theorem.σ ) Thus.1 By the Factorization Theorem. . 6.∞) (x(1) ) · 1 . n j=1 tk (Xj ) is a suﬃcient statistic for θ.+∞) (xi ) = einθ I(θ. max Xi /i) is suﬃcient for θ.Chapter 6 Principles of Data Reduction 6. .

Therefore.θ4 ) (yi ) (θ3 − θ1 )(θ4 − θ2 ) 1 3 i=1 1 (θ3 − θ1 )(θ4 − θ2 ) n n I(θ1 .θ ) (xi )I(θ2 . xn }. f (x|θ) f (y|θ) = e−Σi (xi −θ) n i=1 n i=1 2 2 1 + e−(yi −θ) e−Σi (yi −θ) 2 1 + e−(xi −θ) n i=1 n i=1 = e−Σi (yi −xi ) 1 + e−(yi −θ) 1 + e−(xi −θ) . e−Σi yi I(θ. ¯ This is constant as a function of θ if and only if y = x . This will be the case if and only if min{x1 . xn }. β). . . 6. . the order statistics are minimal suﬃcient for θ. . By the Factorization Theorem. . θ4 ). . y(1) = mini {y1 .∞) (min yi ) To make the ratio independent of θ we need the ratio of indicator functions independent of θ. .∞) (min xi ) .9 Use Theorem 6. . . So T (X) = min{X1 . . h(x) g(T (x)|θ) By the Factorization Theorem. . . θ3 . b. f (x|θ) (2π) e−Σi (xi −θ) /2 1 = = exp − −n/2 −Σi (yi −θ)2 /2 f (y|θ) 2 (2π) e −n/2 2 n n x2 − i i=1 i=1 2 yi +2θn(¯−¯) y x . ( Xi .7 Let x(1) = mini {x1 . a. therefore X is a minimal suﬃcient ¯ ¯ statistic for θ.∞) (xi ) n enθ e−Σi yi i=1 I(θ. X(1) . n i=1 Xi ) is suﬃcient for (α. d. . the range depends on the parameter. Then the joint pdf is f (x.∞) (xi ) i=1 e n −(yi −θ) I (θ. Now f (x|θ) f (y|θ) = = n −(xi −θ) I(θ. X(n) . . θ2 . Note. c. . . β) = 1 xi α−1 e−xi /β = Γ(α)β α i=1 n i=1 n 1 Γ(α)β α n n α−1 xi i=1 e−Σi xi /β .6 The joint pdf is given by f (x1 . 6. . y|θ) = = 1 I(θ . yn }. Y(n) is suﬃcient for (θ1 .∞) (x(1) )I(−∞. . xn } = min{y1 . for X ∼ location exponential(θ). This is a diﬃcult problem. .∞) (y i ) i=1 e n nθ −Σi xi e e i=1 I(θ. . . . yn } and y(n) = maxi {y1 . .13. xn |α. . . yn }. .∞) (y(1) )I(−∞.2. . Y(1) .θ3 ) (x(n) )I(θ2 . . x(n) = maxi {x1 . .∞) (yi ) = e−Σi xi I(θ. . The order statistics are a minimal suﬃcient statistic.θ4 ) (y(n) ) · 1 . This is constant as a function of θ if and only if x and y have the same order statistics. . .6-2 Solutions Manual for Statistical Inference 6. . Xn } is a minimal suﬃcient statistic.

Let Z(1) . This problem can be generalized to show that if a function of a suﬃcient statistic is ancillary. Fix sample points x and y. Consider an interval of θs that does not contain any xi s or yi s. Then (Z(1) + θ. so (Y1 . . Y(n−1) ) has the same joint distribution as (Z(n) + θ − (Z(1) + θ). Yn−1 ). Y(n−1) ) is ancillary.2. Z(n) − Z(n−1) ). . Basu’s lemma shows that (Y1 . Yn = X(1) is independent of (Y1 . . The last vector depends only on (Z1 . 6. a(θ) = the number of elements in A(θ) and b(θ) = the number of elements in B(θ). . Therefore. N = n)P (N = n) f (y|θ. Zn ) whose distribution does not depend on θ. . . . . This will be true for all such intervals if and only if the order statistics for x are the same as the order statistics for y. . B(θ) = {i : yi ≤ θ}. . because the expectation of that function does not depend on θ.2. and E(R − E R) = 0 for all θ. Z(n) + θ − (Z(n−1) + θ)) = (Z(n) − Z(1) . .13 and write f (x. . . but g[T (X)] ≡ 0 . . . So.Yn−1 ) is not independent of the suﬃcient statistic. . . . . Thus g[X(n) . . . X(1) is suﬃcient. . . . n|θ) f (y. . . . Use Theorem 6. 6. .17 its distribution does not depend on θ. and (Y(1) . Thus E R = (n − 1)/(n + 1) does not depend on θ. Z(n) + θ) has the same joint distribution as (X(1) .12 a. Deﬁne A(θ) = {i : xi ≤ θ}. . X(1) ] = X(n) − X(1) − (n − 1)/(n + 1) = R − E R is a nonzero function whose expected value is always 0. . From Example 6. Deﬁne Yn = X(1) . we want to ﬁnd g[T (X)] such that E g[T (X)] = 0 for all θ. the range of X. X(n) ).10 To prove T (X) = (X(1) . Z(n) be the order statistics from a random sample of size n from the standard pdf f (z|0).Yn ) is n−1 f (y1 . . . For b). .Second Edition 6-3 e. the order statistics are a minimal suﬃcient statistic. Thus. n |θ) = = f (x|θ. . b. because by Example 6. 2). then the suﬃcient statistic is not complete. . on the interval if and only if a(θ) = b(θ). . . .Yn−1 ) is independent of the complete suﬃcient statistic. . R ∼ beta(n−1. So. nonzero estimator of zero. For c). . A natural candidate is R = X(n) − X(1) . (X(1) . yn ) = n!e−n(y1 −θ) e−(n−1)yn i=2 eyi . For a). . N = n )P (N = n ) n x n y θx (1−θ) θy (1−θ) n−x pn pn n −y = θx−y (1 − θ)n−n −x+y n x n y pn pn . . . . . . That provides the opportunity to construct an unbiased. . . X(n) ) is not complete. The second term is constant on such an interval. . .11 a. Then the function f (x|θ)/f (y|θ) depends on θ only through the function n n | xi −θ | − i=1 i=1 | yi −θ | (xi − θ) − i∈A(θ)c i∈B(θ) = i∈A(θ) (θ − xi ) + (θ − yi ) − i∈B(θ)c (yi − θ) = (a(θ) − [n − a(θ)] − b(θ) + [n − b(θ)])θ  + − i∈A(θ)  yi  i∈B(θ) c xi + i∈A(θ) c xi + i∈B(θ) yi −  = 2(a(θ) − b(θ))θ + − i∈A(θ)  xi + i∈A(θ) c xi + i∈B(θ) yi − i∈B(θ) c yi  . 6. Then the joint pdf of (Y1 . X(n) ) is not complete. The ﬁrst term will be constant. and e) the order statistics are suﬃcient. d). . 0 < yn−1 < yn−2 < · · · < y1 0 < yn < ∞.17.2. These are all location families. . . (Y(1) .

¯ = (M (Z) + µ) − (Z + µ) = M (Z) − Z. . . 6. 2 N N E E We used the fact that X|N ∼ binomial(N. . i Xi is a complete. X N X Var N E = X 1 1 N = E E (X | N ) = E N θ = E(θ) = θ. We see that the family of distributions of Yi is a scale family with scale parameter 1/α.6. . θ 6. S 2 ) not complete. Because P (N = n) = pn does not depend on θ. .5. Then (1/α)Z 1 Y1 Z1 logX1 = = = . E(S 2 ) = aθ2 ¯ 2 ) = VarX + (EX)2 = aθ2 /n + θ2 = (a + n)θ2 /n. for all θ. Therefore. b. M (X) − X ¯ ¯ only Z1 . Xn is from a location family.17 The population pmf is f (x|θ) = θ(1 − θ)x−1 = 1−θ elog(1−θ)x . The parameter space consists only of the points (θ. the minimal suﬃcient statistic contains N in this case. Zn . S 2 ) = − S2 a ¯ has zero expectation so (X. .2. the pdf of each is f (y|α) = α exp {αy − eαy } = 1 exp 1/α y − ey/(1/α) . Xn . S 2 ) is suﬃcient. f (z). Then Y1 and Y2 are iid and. logX2 Y2 (1/α)Z 2 Z2 Because the distribution of Z1 /Z2 does not depend on α. 1/α −∞ < y < ∞. the distribution of M (X) − X does not depend on µ. M (X) − X is an ancillary statistic. a ¯ Thus g(X. by Theorem 3. Thus. ¯ ¯ and E(X E n ¯ 2 S2 X − a+n a n ¯2 a+n X = n a+n a+n 2 θ n 1 − aθ2 = 0. where Z1 and Z2 are a random sample from f (z|1). . N is ancillary for θ. .1. by Theorem 2. .10 and 6. The point is that although N is independent of θ. ¯ b. .13 Let Y1 = log X1 and Y2 = log X2 . . . 1 by Theorem 3. A minimal suﬃcient statistic may contain an ancillary statistic.25. (log X1 )/(log X2 ) is an ancillary statistic. 6. . Let ¯ ¯ M (X) denote the median calculated from X1 . Now ∞ Eg(Y ) = y=0 g(y) (nλ) e−nλ .18 The distribution of Y = i Xi is Poisson(nλ). Thus.15 a. . 6. this is an analytic function which cannot be identically zero. Then M (X) = M (Z)+µ and X = Z +µ. Zn is a random sample from the standard pdf. Use the same factorization as in Example 6. that is. we can write Yi = α Zi .6-4 Solutions Manual for Statistical Inference The last ratio does not depend on θ. y! y If the expectation exists. .2. This quadratic graph is a line and does not contain a two-dimensional open set.14 Because X1 . an exponential family with t(x) = x. and µ is the location parameter. i Xi − n ∼ negative binomial(n. . So (X.6. Because M (X) − X is a function of ¯ ¯ ¯ Thus. where Z1 . ν) on the graph of the function ν = aθ2 .5.5. N N N X X 1 = Var E N + E Var N = Var(θ) + E Var (X | N ) N N N2 N θ(1−θ) 1 = 0+E = θ(1 − θ)E . N ) is minimal suﬃcient for θ. θ). θ). . suﬃcient statistic by Theorems 6.2. The other terms are constant as a function of θ if and only if n = n and x = y. we can write Xi = Zi +µ. 6.9 to show (X.

then the suﬃcient statistic is not complete. then the expectation is zero for all p.y(1) ) (θ) is constant (in fact. Y = max{Xi } is suﬃcient and f (y) = For a function g(y). 6. For Distribution 1.25. i b. by Example 6. θ 2n 2n−1 y . By Example 6. i Xi is also a complete suﬃcient statistic. So (X(1) . For Distribution 2 calculate Ep g(X) = g(0)p + g(1)p2 + g(2)(1 − p − p2 ) = [g(1) − g(2)]p2 + [g(0) − g(2)]p + g(2).Second Edition 6-5 6. b. a binomial family (Bernoulli is a special case) is complete. The uniform(θ. Because i f (xi |θ) = θn e(θ−1) log(Πi xi ) . so the family of distributions is complete. implies that g(X) ≡ 0.x(1) ) (θ) f (x|θ) = −n f (y|θ) θ I(y(n) /2. Because i Xi is a one-to-one function of log ( i Xi ). 6. 2 Ep g(X) = x=0 g(x)P (X = x) = pg(0) + 3pg(1) + (1 − 4p)g(2). This can only be zero if g(θ) = 0 for all θ. Note that if g(0) = −3g(1) and g(2) = 0.2. we check if Ep g(X) = 0 for all p. So the family is not complete. Zn is a random sample .10.22. The ratio θ−n I(x(n) /2. . |X| is suﬃcient by Theorem 6. g(0) = g(1) = g(2) = 0. Yes.22 a. . with standard pdf f (z) ∼ uniform(1.20 The pdfs in b). but g(x) need not be identically 0. X is suﬃcient because it is the data. c). and e) are exponential families.2. . Hence the family is not complete. c. we know that if a function of the suﬃcient statistics is ancillary. the form of an exponential family. not i Xi . one) if and only if x(1) = y(1) and x(n) = y(n) .25. So this suﬃcient statistic is not complete. E g(Y ) = 0 g(y) 2n 2n−1 2nθ2n−1 y dy = 0 for all θ implies g(θ) = 0 for all θ θ2n θ2n by taking derivatives. From Exercise 6. So if Z1 . because P (|X| = 0) = 1 − θ and P (|X| = 1) = θ. To check completeness.23 Use Theorem 6. This is a location family.2. . the order statistics are minimal suﬃcient.2.19 To check if the family of distributions of X is complete. θ2n 0 < y < θ. Thus. but g(x) need not be identically zero.2. and its expectation does not depend on θ.21 a. f (x|θ) = (1 − θ)(θ/(2(1 − θ))|x| = (1 − θ)e|x|log[θ/(2(1−θ)] . For a). To make it zero for all p each coeﬃcient must be zero. log ( i Xi ) is complete and suﬃcient by Theorem 6. Thus. 6. For d). The distribution of |X| is Bernoulli. calculate Eg(X) = θ θ g(−1) + (1 − θ)g(0) + g(1). then Eg(X) = 0 for all θ. 2θ) family is a scale family.6. 2).13.18 the range R = X(n) − X(1) is ancillary. so they have complete suﬃcient statistics from Theorem 6. so Y = max{Xi } is complete.2. so i Xi is suﬃcient for θ. 6. 2 2 If g(−1) = g(1) and g(0) = 0. The sample density is i θxθ−1 = θn ( i xi )θ−1 . because f (x|θ) depends on x only through the value of |x|. X(n) ) is a minimal suﬃcient statistic for θ. This is a polynomial of degree 2 in p.

2) population. By Theorem 6. Because i Xi . Thus. Thus. The ﬁrst expression for the ratio is constant a function of µ and σ 2 if and only if 2 2 2 i yi and i xi = i yi . This pdf can be written as f (x|µ. X(n) ) is not complete. 2 yi . (X(1) .3 by a straightforward but messy two-variable transformation U = (X1 + X2 )/2 and V = 2λ/T = λ[(1/X1 ) + (1/X2 ) − (2/[X1 + X2 ])]. Substituting σ 2 = µ in the above expression yields f (x|µ. If λ = 1. by Deﬁnition 6. This is an exponential family with t1 (x) = x and t2 (x) = 1/x. σ 2 ) σ i x2 − n¯2 . i xi = i yi and i x2 = i i d. T ) is also a complete suﬃcient statistic. So X(1) /X(n) = Z(1) /Z(n) . as in Exercise 6. Substituting σ 2 = µ2 in the ﬁrst expression yields f (x|µ. Xn = θZn is a random sample from a uniform(θ. Eh(X) = h(0). 2 Xi 2 a. so Eh(X) = 0 but h(x) ≡ 0.24 If λ = 0. Thus. i (1/Xi )). and X(1) = θZ(1) and X(n) = θZ(n) . 2 yi . σ 2 ) = exp 2 f (y|µ. This is a two-to-one transformation.11 Xi .10. . µ) = exp f (y|µ. µ) xi − i i yi exp − 1 2µ x2 − i i i 2 yi . Thus. µ2 ) 1 = exp 2) f (y|µ. µ µ xi − i i i x2 = i i i yi − 1 2µ2 x2 − i i i 2 yi . h(2) = −2. for any (µ. Eh(X) = e−1 h(0) + e−1 ∞ h(x) . b. c. σ 2 ) the ratio in Example 6. i Xi is a minimal suﬃcient statistic. then X1 = θZ1 . 2 Xi is a minimal This is constant as a function of µ if and only if suﬃcient statistic. i Xi is not minimal. x! h(1) = 1. (X.6-6 Solutions Manual for Statistical Inference from a uniform(1.25. i (1/Xi )) is a complete suﬃcient statistic. . i Xi is a minimal suﬃcient statistic. . 2 2 is not a function of i Xi .2. 6. i Xi . . (X.14 can x i 1 2σ 2 xi − i i yi − x2 − i i i 2 yi . h(x) = 0 for x ≥ 3 . Thus. λ) = λ 2π 1/2 i xi = 1 x3 1/2 exp λ µ exp − λ λ1 x− 2 2µ 2x . This is constant as a function of µ if and only if 2 i Xi . take h(0) = 0.) 6. This can be accomplished using the methods from Section 4.25 Using the fact that (n − 1)s2 = x be written as µ f (x|µ. 6. Do part b) ﬁrst showing that i Xi is a minimal suﬃcient statistic.27 a. the ¯ statistic ( i Xi . i i b. T ) given in the problem ¯ is a one-to-one function of ( i Xi .2. x! x=1 ∞ Let h(0) = 0 and x=1 h(x) = 0. a statistic whose distribution does not depend on θ.2. . 2θ) population. (For example.

As T is a 1 − 1 function of the order statistics. If this is zero for all µ. . By Theorem ¯ 6.4 we have fY1 (y) = fX (y) (1−FX (y)) ∞ n−1 n = e−(y−µ) e−(y−µ) n−1 n = ne−n(y−µ) . . k. c. and in the swindle case σ 2 refers to Var(M − X).14. i Xi is a complete suﬃcient statistic. 2σ 2 /(n − 1)) distribution. Now.28 this is a one-dimensional exponential family with t(x) = x. . 1. MN and M1 − X1 . a simple scale n−1 transformation.1 we know that (n − 1)S 2 /σ 2 ∼ χ2 = gamma((n − 1)/2. 6. S 2 is ancillary.Second Edition 6-7 6. Because S 2 is a function of only Z1 . . To check completeness compute fY1 (y). From Theorem 5. . Thus. By Basu’s Theorem. . so part (b) of Theorem 6. . So we can write Xi = Zi + µ. then X(1) is independent of any ancillary statistic. by Basu’s theorem. which does not depend ¯ on µ. .4. (N − 1)S /σ ∼ χ2 −1 . From Exercise 6.2. Therefore. . f0 (x(i) ) n i=1 fk (x(i) ) n i=1 f0 (x(i) ) is minimal suﬃcient for the family {f0 ...25. .9b.. (i) E(X k ) = E X Y Y k =E X Y k Yk indep.5 can now be directly applied to show that the order statistics are minimal suﬃcient for F. S 2 is ancillary. where Y1 = X(1) .6. Zn is a random sample from f (x|0). If F is a nonparametric family.5. (i) µ is a location parameter. X is a one-to-one function of i Xi . This implies g(µ) = 0 for all µ. f1 . then µ g(y)e−ny dy = 0 for all µ (because nenµ > 0 for all µ and does not depend on y). 6. . By part (i). . . . 2). fk }. Both M N ¯ ¯ ¯ and M − X are asymptotically normal. has a gamma((n − 1)/2. Recognize that f (x|µ) is a location family. fk }. . so. b. By Exercise 6.31 a. . ¯ so X is also a complete suﬃcient statistic.3. the distribution of S 2 does not depend on µ. ¯ ¯ (ii) The independence of X and S 2 is determined by the joint distribution of (X. Hence. σ 2 ). X and S 2 are independent. . . the statistic T (x) = n i=1 n i=1 f1 (xi ) . j = 0. Moreover. βj ). . f0 (xi ) n i=1 fk (xi ) n i=1 f0 (xi ) = n i=1 n i=1 f1 (x(i) ) . = E X Y k E Yk . that is. f1 . S 2 = [σ 2 /(n − 1)][(n − 1)S 2 /σ 2 ].. we need to show only that S 2 has distribution independent of µ. . Because ¯ complete suﬃcient statistic. Therefore. using the above expression we get the two given expressions where in the straightforward case σ 2 refers ¯ to Var M .. .29 Let fj = logistic(αj . 0= d dµ ∞ g(y)e−ny dy = −g(µ)e−nµ µ for all µ. (N − 1)2 Var S 2 /(σ 2 )2 = 2(N − 1) and Var S 2 = 2(σ 2 )2 /(N − 1). X and S 2 are independent.. . where Z1 . M − X is ancillary. . . so X(1) is complete.. by Theorem 4. MN − XN are each approximately normal samples if n is reasonable large. Then S2 = 1 n−1 ¯ (Xi − X)2 = 1 n−1 ¯ ((Zi + µ) − (Z + µ))2 = 1 n−1 ¯ (Zi − Z)2 . (i) By Exercise 3. By Basu’s Theorem. write Eµ g(Y1 ) = µ g(y)ne−n(y−µ) dy. the order statistics are also minimal suﬃcient for the family {f0 . . .6 Var M = Var(M − X + X) = Var(M − X)+Var X. fj ∈ F. ¯ ¯ b. 2 2 2 (ii) If S is a sample variance calculated from a normal sample of size N . From Theorem 6.5. X ¯ ¯ ¯ ¯ they are independent. Zn . that is. σ 2 ). .30 a. . Basu’s Theorem says that if X(1) is a complete suﬃcient statistic for µ.6. that is. . S 2 is ancillary. S 2 ) for each ¯ value of (µ. M1 . As in part (a) X is a ¯ and M − X are independent.. we have that X(1) is a minimal suﬃcient statistic. From Theorem 5. for each value of (µ. Divide both sides by E Y k to obtain the desired equality. S 2 is independent of X(1) . ∞ y > µ.

I(θ) = 2/θ2 for a sample of size one. From the likelihood principle. xj )) = Ev(Ej . 6. Equations 6. this expression is correct for each ﬁxed value of (α. 6. x∗ ) (but including (1.4 and 6. .25. T (J.5 follow immediately from the two Principles. x∗ )|θ)C 1 1 L(θ|x∗ ) = 2 2 1 = C f1 (x∗ |θ) 1 2 1 f2 (x∗ |θ) = f ∗ ((2. Then the conclusion is Ev(E. For (2. and p3 . 110. For one observation (X. 10. xj )|θ). x∗ ) we also have 2 g(T (2. the Conditionality Principle. x2 ) if L(θ|x1 )/L(θ|x2 ) = c. x1 ) = Ev(E. the conclusion of the Formal Likelihood Principle. (j. = TE X(i) T part (i) = T E(X(i) ) .2n ≤ t2 /θ2 ) .33 a. and I(θ) = 2n/θ2 for a sample of size n.6-8 Solutions Manual for Statistical Inference (ii) If α is ﬁxed. Thus. x∗ )|θ). where C = h(x)/h(y) and h is the function from the Factorization Theorem. 111}. xj )|θ) = 1 1 fj (xj |θ) = L(θ|xj ). xj ) and xj play the roles of x∗ and x∗ in the Formal Likelihood Principle we 1 2 can conclude Ev(E ∗ . . xj )) = f ∗ ((j. by the Formal Likelihood Principle. xj ). Thus evidence is equal whenever the likelihood functions are equal. it is independent of T . 2 2 2 By the Factorization Theorem. Let one experiment be the E ∗ experiment and the other Ej . the Formal Suﬃciency Principle. x∗ ) = Ev(E2 .3. b. The four sample points are {0.3. Y |θ) ∂θ2 = −E − 2Y θ3 = 2E Y . inference about p is only through L(p|x). . x) = Ev(E. y).32 In the Formal Likelihood Principle. β). x∗ )|θ)C 1 1 = C L(θ|x∗ ) = 1 2 = f ∗ ((1. take E1 = E2 = E.2. Y ∼ exponential(θ). b.35 Let 1 = success and 0 = failure. For all sample points except (2. Ev(E. Hence. Combining them we have Ev(E1 . 2 2 Letting (j. 1 2 c. p.37 a. To prove the Conditionality Principle. . p2 . Hence. and this follows from Formal Suﬃciency and Conditionality. Because β is a scale parameter. and the sample size does not directly inﬂuence the inference. 6. then X(i) /T has the same distribution as (βZ(i) )/ (β i Zi ) = Z(i) / ( i Zi ). If T (X) is suﬃcient and T (x) = T (y). xj )|θ)1 = f ∗ ((j. T (j. x∗ ). regardless whether α is “known” or not. θ3 But. x∗ )|θ)h(2. if Z1 . xj ) = g((j. T = i Xi is a complete suﬃcient statistic for β by Theorem 6. 2 1 g(T (j. x∗ ) 2 2 = g((1. and E Y = θ. and this distribution does not depend on β. xj )|θ)h(j. 6. The values of the likelihood are 1. Then L(θ|(j. X(i) /T is ancillary. xj ). then L(θ|x) = CL(θ|y). 1) distribution. Now consider the Formal Suﬃciency Principle. Zn is a random sample from a gamma(α. xj ) = (j. We have E(X(i) |T ) = E X(i) T T T = TE X(i) T T indep. (i) The cdf of T is P (T ≤ t) = P Yi ≤ t2 Xi i i =P 2 2 i Yi /θ ≤ t2 /θ2 Xi θ i = P (F2n. Hence. and by Basu’s Theorem. ET Note. XJ ) is suﬃcient. x∗ )). Y ) we have I(θ) = −E ∂2 log f (X.

U ) is also suﬃcient. 1/θ). W ∼ gamma(n.2n is an F random variable with 2n degrees of freedom in the numerator and denominator. But. This follows since 2Yi /θ and 2Xi θ are all independent exponential(1). U ) is not complete.5)(y) + (. (W. or χ2 . and (T. so W and V are independent. y) = exp [−θ ( i xi ) − ( i yi ) /θ] . Yi ∼ exponential(θ).5)x + 50 = (. V ) deﬁned as in part (ii) is suﬃcient. Use this joint distribution of (W. V ). 9 b.Second Edition 6-9 where F2n. V ) to derive the joint pdf of (T. the information in (T. hence.39 a.5)x + 106 = T ∗ (x). u|θ) = 2t [Γ(n)] t θ Now. So E(U 2 − n2 ) = 0 for all θ. U ) is a one-to-one function of (W. E U 2 = E W V = (n/θ)(nθ) = n2 does not depend on θ. t > 0. T (x) = (.2n 2 1 T 2 /θ2 + 1 2  2n  = 2 1 − 2E θ 1 2 F2n. +1 = Γ(2n) Γ(n)2 ∞ 0 1 wn−1 Γ(2n) Γ(n)Γ(n + 2) n+1 = = . The transformation from Celsius to Fahrenheit is y = 9x/5 + 32. (T. Xi and Yi are independent. Hence. and the information in T is 2n 1 − 2E θ2 The expected value is E 1 2 F2n. (ii) Let W = i Xi and V = i Yi . 2 (1 + w)2n (1 + w) Γ(n)2 Γ(2n + 2) 2(2n + 1) Substituting this above gives the information in T as 2n n+1 n 1−2 = I(θ) . f (t. 5 ∗ (T (y) − 32) 9 = = 5 ((. θ3 θ (iii) The pdf of the sample is f (x. U |θ) ∂θ2 = −E − 2U T θ3 =E 2V θ3 = 2nθ 2n = 2. we do not have equivariance. Thus. 6.5)(9x/5 + 32) + (. . hence (T. In each pair.2n + 1 2  . θ). U ) as 2 uθ ut u2n−1 exp − − . u > 0. Hence. V ∼ gamma(n. 2 θ 2(2n + 1) 2n + 1 which is not the answer reported by Joshi and Nabar.5)(212) − 32) = (. Xi ∼ exponential(1/θ). U ) is −E ∂2 log f (T. Diﬀerentiating (in t) and simplifying gives the density of T as 2 fT (t) = Γ(2n) 2 Γ(n)2 t t2 2 + θ2 t n θ2 2 + θ2 t n . hence. and the second derivative (in θ) of the log density is 2n 2n t4 + 2t2 θ2 − θ4 = 2 θ2 (t2 + θ2 )2 θ 1− 2 (t2 /θ2 + 1)2 .5)(212) − 32) 9 5 ((.5)x + 50 = T (x).

. we can write Xi = σZi + µ. . . . and R/S is ancillary by a). . . . . axn + b) = 1 ((axi + b) − (a¯ + b))2 x n−1 1 = a2 (xi − x)2 = a2 S 2 (x1 . . . . . . . = = T2 (X 1 . c. . . . . . that is.c (σ 2 ) = c2 σ 2 . R(x1 . b. . S(ax1 + b. where Z1 . . Xn ) is the same as the distribution of (Z1 + θ. . . . Zn is a random sample from the standard pdf f (z). .) b. .41 a. i. . . σZn +µ) σT2 (Z 1 . . cxn +a) = = k n−1 k n−1 n n 2 (cxi + a) − i=1 n i=1 (cxi + a)/n 2 ((cxi + a) − (c¯ + a)) x i=1 n k = c2 n−1 (xi − x)2 = c2 kS 2 (x1 . cxn + a) = c2 W (x1 . . ga. max{ax1 + b. . . . . ga (σ 2 ) = σ 2 . . for all θ. Thus. . . σZn +µ) = . . Combining this with part (a). 6. . Zn ) Because T1 /T2 is a function of only Z1 . . Therefore. so W (x) is equivariant. then Y = ga. . . . . . . . T1 /T2 is an ancillary statistic. . . . . . The formal structures for the problem involving X and the problem involving Y are the same. . xn ).6-10 Solutions Manual for Statistical Inference 6. axn +b} = ax(1) +b. Then T1 (X 1 . Zn are a random sample from f (x − 0) and E Zi = 0. ¯ n−1 6. . . xn + a) = i (xi + a)/n = ( i xi ) /n + a = W (x1 . . if X ∼ f (x|θ. Thus.43 a.6. T ∗ (x1 + a. . They both concern a random sample of size n from a normal population and estimation of the mean of the population. Eθ W = E i (Zi + θ)/n = θ. . . . use the above argument with c = 1. xn + a) − a = T ∗ (y) − a = T (x). . . by Theorem 3. . . . . the distribution of T1 /T2 does not depend on µ or σ. . . . W (x1 + a. . T (x1 +a. xn ) + a. that is. . .. xn ). . xn ). . . Measurement equivariance requires that the estimate of µ based on y be the same as the estimate of µ based on x. An estimator of σ 2 is invariant with respect to G1 if ¯ W (cx1 + a. . . Xn ) σT1 (Z 1 . . . . . . . . . An estimator of the form kS 2 is invariant because kS 2 (cx1 +a. . . formal invariance requires that T (x) = T ∗ (x) for all x. xn ). . xn +a)−a = T (x1 . . . b. . Zn ) T1 (Z 1 . . axn +b) = (ax(n) +b)−(ax(1) +b) = a(x(n) − x(1) ) = aR(x1 . . . ¯ i=1 To show invariance with respect to G2 . . . For the sample variance we have S 2 (ax1 + b. . . . . . Zn . axn + b) = aS(x1 . An ¯ estimator of σ 2 is invariant with respect to this group if W (x1 + a. Zn ) T1 (σZ1 +µ. . . . . xn +a) = T (x1 . . For a location-scale family. . . use the above argument with a = 0. The transformations in G2 leave the scale parameter unchanged. .5. To show invariance with respect to G3 . . R and S both satisfy the above condition. . So invariance with respect to G1 implies invariance with respect to G2 and G3 . ¯ . Because a > 0. where Z1 . ( G2 and G3 are both subgroups of G1 . . . So for estimating σ 2 . Xn ) T2 (σZ1 +µ. σ 2 ). . Zn + θ). . . Thus. xn ) = x(n) − x(1) . . . . . Thus. . . The distribution of (X1 . . . . . xn )+a. . . Zn ) T2 (Z 1 . R(ax1 +b. .40 a. . . . Thus. . . . . . . xn + a) = W (ga (x)) = ga (W (x)) = W (x1 . the Equivariance Principle requires that T (x1 +a. . xn ).e. Because X1 . . Xn is from a location scale family. . . . . . xn +a)− a = T ∗ (x1 +a. c2 σ 2 ). xn ). . . xn ).c (X) ∼ f (y|cθ + a. . axn + b} = ax(n) + b and min{ax1 +b.

for a sample point with s = 1 and x = 0. . cxn ) = φ x 2 ¯ c¯ 2 2 x c s = c2 φ s = c2 W (x1 . . On the other hand. an estimator of σ 2 is invariant with respect to G3 if W (cx1 . . So we have invariance if and only if φ is constant. Invariance with respect to G1 also requires φ to be constant because G2 is a subgroup of G1 . φ must be constant. xn ). xn + a) = φ x+a ¯ s s2 = φ x 2 ¯ s = W (x1 . cs s . that is. . . . . . . . this implies we must have φ(a) = φ(0). W (x1 + a. . . . xn ). ¯ for all a. . if φ is constant. Estimators of the given form are invariant because W (cx1 . . . . . . Finally. . . . then the estimators are invariant by part a). xn ). .Second Edition 6-11 An estimator of the given form is invariant if. for all a and (x1 . . . . . cxn ) = c2 W (x1 . s In particular. xn ). .

Substitute this into L. calculate ∂ 2 logL ∂β 2 = ˆ β=β xi /(nα). ﬁrst. L(β|x) 1 xα−1 e−xi /β = Γ(α)β α i i=1 n = 1 Γ(α)n β nα n n α−1 xi i=1 e−Σi xi /β i logL(β|x) ∂logL ∂β = − log Γ(α)n − nα log β + (α−1) log i=1 xi − xi β = − nα + β i xi β2 i ˆ Set the partial derivative equal to 0 and solve for β to obtain β = that this is a maximum. the same as in part (a) except α and β are both variables. The values α and β that maximize L. f (x|2) = f (x|3) = 1/4 are both maxima. x 0 1 2 3 4 ˆ θ 1 1 2 or 3 3 3 ˆ ˆ At x = 2. That is. 7. ˆ Because β is the unique point where the derivative is 0 and it is a local maximum. Now the likelihood function is L(α. β is the MLE. . These values are in the following table. Then we just need to maximize the function of the one variable α given by 1 nα i xi /(nα)) n α−1 Γ(α)n ( = xi i=1 n e−Σi xi /(Σi xi /(nα)) α−1 Γ(α)n ( 1 nα i xi /(nα)) xi i=1 e−nα . But it is usually best to do as much as possible analytically. the MLE θ is the value of θ that maximizes f (x|θ). One approach to ﬁnding α and β would be to ˆ ˆ numerically maximize the function of two arguments. and perhaps reduce the complexity of the numerical problem.1 For each value of x.Chapter 7 Point Estimation ˆ 7. b. There is no analytic form for the ˆ ˆ MLEs. so both θ = 2 or θ = 3 are MLEs. it is a ˆ global maximum. the value of β that maximizes L is i xi /(nα). To check nα 2 i xi − β2 β3 = ˆ β=β (nα)3 ( i xi ) 2 − 2(nα)3 ( i xi ) 2 =− (nα)3 ( i xi ) 2 < 0.2 a. for each ﬁxed value of α. β|x) = 1 Γ(α)n β nα n α−1 xi i=1 e−Σi xi /β . From part (a).

6 ˆ β = 14(514. Thus.0646. 7. E X = θ θx−1 dx = θ logx|θ = ∞. Thus the method of moments estimator of θ does not exist. σ6 σ 4 ∞ ∞ = √ Thus. and the MLE is 1 if 1 < i 1/(2 xi ). n = 14 and i xi = 323. just equate E X 2 = σ 2 = n i=1 Xi = X 2 ⇒ σ = |X|. and 26 for k. which is negative at σ = |x|. For p = 1/2. 0 < xi < 1.8 a. The ﬁrst two are less than maxi xi . ˆ 323.6 a. The method of moments estimator ˜ ˜ ˜ ¯ ¯ is the solution to the equation θ/2 = X. k must satisfy the two inequalities. 2πσ √ 1 x2 set = − + 3 = 0 ⇒ σ X 2 = σ 3 ⇒ σ = X 2 = |X|.7-2 Solutions Manual for Statistical Inference For the given data.0381.3 The log function is a strictly monotone increasing function.∞) (x(1) ). . b. Thus. f (x|θ) = i θx−2 I[θ. Therefore X 2 is an unbiased estimator of σ 2 . θ = 2X. ˆ ˆ ˆ σ σ −3x2 σ 2 1 ˆ = + 2 . From PROC NLIN in SAS we obtain α = 514. the MLE is 0 if √ √ 1 ≥ i 1/(2 xi ).2. θ) model. (This is the Pareto distribution with α = θ. which can be reduced to the 2 cubic equation −380z 3 + 419z 2 − 40z + 15/16 = 0. Because θ is a simple function of the sample mean. n 3n . ˆ ˆ leading to candidates of 1. 7. −2 7.∞) (xi ) = θn I[θ. i xi So to maximize L(θ|x).5 a.) √ 7. 2 θ2 /12 θ2 ˜ ¯ and Var θ = 4Var X = 4 = . 15. θn is increasing in θ.∞) (x(1) ).219) = . and L(1|x) = i 1/(2 xi ).7 L(0|x) = 1. The roots are . ˆ ˆ ˆ z ≤ 1/k) ˆ z b. and because k ≤ 1/ˆ (so ˆ z ˆ and k + 1 > 1/ˆ. Because E X = 0 is known. L(θ|x) > L(θ |x) if and ˆ only if log L(θ|x) > log L(θ |x). its mean and variance are easy to calculate. L(σ|x) ∂logL ∂σ ∂ 2 logL ∂σ 2 2 2 1 e−x /(2σ ) . The second term does not involve θ.9. θ = x(1) . Thus.9998. L(θ|x) = θn I[θ. that is. hence. So the value θ that maximizes log L(θ|x) is the same as the value that maximizes L(θ|x). 0 < xi < 1. E X 2 = Var X + µ2 = σ 2 . Many computer programs can be used to maximize this function.9 This is a uniform(0. Because it is the only place where the ﬁrst derivative is ˆ zero.0450. 1 1 2 ˆ c. we must solve 1 = (1 − 20z)(1 − z)(1 − 19z). 7. X(1) is a suﬃcient statistic for i i xi θ by the Factorization Theorem. So E X = (0 + θ)/2 = θ/2. we want to make θ as large as possible. it is also a global maximum. log L(σ|x) = log(2π)−1/2 − log σ − x2 /(2σ 2 ). Thus k = 26. L(θ|x) = 0 if θ > x(1) . Thus k is the MLE.6. σ = |x| is a local maximum.219 and. 7. c. k must satisfy [k(1 − p)] ≥ i n (k − xi ) and [(k + 1)(1 − p)] < i n (k + 1 − xi ). ˆ Because the right-hand side of the ﬁrst equation is decreasing in z . The value z solves the equation ˆ (1 − p)n = i (1 − xi z). But because of the indicator ˆ function. and . Therefore. Then from z ˆ Example 7. ˆ where 0 ≤ z ≤ (maxi xi )−1 . β = 1. −2 b. We have θ ˜ ¯ E θ = 2E X = 2E X = 2 = θ. Let k = greatest integer less than or equal to 1/ˆ.

If n is large. β|x). log i Xi = i log Xi = 43. n−1 t Γ(n) θ n−1 1 n−1 −θt θn Γ(n − 2) θ2 t e dt = = . L is maximized at θ = x(n) . ˆ ˜ ˆ ˜ Var θ < Var θ for all θ. This can be used to calculate ˆ Eθ = n θ. The second ˆ derivative is −n/θ2 < 0. ˆ c. So the ˆ ˆ overall maximum.95 ⇒ β = 25. and L(α. ˜ ˆ θ is an unbiased estimator of θ. a ˆ decreasing function. For any ﬁxed α.59. the bias is quite large.11 a. θ is a biased estimator. the MLE. 2 n−2 t Γ(n) θ (n − 1)(n − 2) . For θ ≥ x(n) . the bias is not large because n/(n + 1) is close to one.θ] (x(n) )I[0. The pdf of θ = X(n) is nxn−1 /θn . n+1 ˆ E θ2 = n 2 θ n+2 ˆ and Var θ = nθ2 (n + 2)(n + 1) 2. X(n) is the MLE of β. α = 12.β] (xi ) = β α ( i xi ) I(−∞. f (x|θ) = i β α xα−1 I[0. θ is probably preferable to θ. X(n) = 25. On the other hand. i ˆ Set the derivative equal to zero and use β = X(n) to obtain n α= ˆ nlogX(n) − log 1 = n i Xi −1 (logX(n) − logXi ) i . β|x) a decreasing function of β if β ≥ x(n) . 1/θ). L = 1/θn . For the MLE of α calculate n α−1 ∂ ∂ logL = nlogα−nαlogβ+(α−1)log ∂α ∂α xi = i n − n log β + log α xi . 0 ≤ x ≤ θ. is θ = X(n) . b. note that ˆ = n/T . We have 1 T 1 E 2 T E = = θn Γ(n) θn Γ(n) ∞ 0 ∞ 0 1 n−1 −θt θn Γ(n − 1) θ t e dt = = .∞) (x(1) ). θ θ i=1 n where x(1) and x(n) are the smallest and largest order statistics. But if n is small.∞) (x(1) ) = L(α. Thus θ T ∼ gamma(n. So for θ ≥ x(n) . The second derivative is −n/α2 < 0.10 a. so − i log Xi ∼ gamma(n. We can either calculate the ﬁrst and second moments directly. so this is the MLE. i 1 ˆ Set the derivative equal to zero and solve for θ to obtain θ = (− n i log xi )−1 . Thus. where Yi = − log Xi ∼ exponential(1/θ). if n is large. ( i Xi . α α 7.0. L = 0 for θ < x(n) .θ] (xi ) = n I[0.β] (x(n) )I[0. By i the Factorization Theorem. So. 1/θ).Second Edition 7-3 The likelihood function is L(θ|x) = 1 1 I[0. β|x) = 0 if β < x(n) .0. L(α. so this is the MLE. X(n) ) are suﬃcient. To calculate the variance of θ. or use ˆ the fact that θ is inverted gamma (page 51). θ−1 f (x|θ) = i θxθ−1 i = θ n i xi xi i = L(θ|x) n + θ d log L = dθ d nlogθ+(θ−1)log dθ = log xi . ˆ 7.

13 L(θ|x) = i 1 e− 2 |xi −θ| = 21 e− 2 Σi |xi −θ| . ˜ ˜ ˜ ˜ b. n ˜ ˆ MSE(θ) − MSE(θ) = y=[n/2]+1 n y −θ n 2 − 1 −θ 2 2 n y θ (1 − θ)n−y y n y θ (1 − θ)n−y . . method of moments: EX = θ = 1 n ¯ Xi = X i ⇒ ˜ ¯ θ = X. 1/2] and obtains its maximum at the upper bound of θ which ˆ ¯ is 1/2. n j n 1 1 i |xi − θ| = n i |x(i) − θ|. n i θ+1 n− i Xi 7. So the MLE is θ = min X. because X increasing function of θ on [0.7-4 Solutions Manual for Statistical Inference and thus ˆ Eθ = n θ n−1 and ˆ Var θ = n2 (n − 1) (n − 2) 2 θ2 → 0 as n → ∞. b. For x(j) ≤ θ ≤ x(j+1) . we showed that L(θ|x) is increasing for θ ≤ x and is decreasing ¯ ¯ ¯ for θ ≥ x.7. Because X ∼ beta(θ. 1). 0 ≤ θ ≤ 1/2. . y where Y = i Xi ∼ binomial(n. . c. ˆ ˜ ˆ ˜ Therefore MSE(θ) < MSE(θ) for every θ in 0 < θ ≤ 1/2. x(n) are the order statistics. y Therefore. 1/2 . Remember that 0 ≤ θ ≤ 1/2 in this exercise. . if n is even. L(θ|x) is an ¯ the MLE of θ. if n is odd.2. Therefore. E X = θ/(θ + 1) and the method of moments estimator is the solution to 1 θ i Xi ˜ Xi = ⇒θ= . we have n ˜ ¯ MSE(θ) = E(X − θ)2 = y=0 y −θ n 2 n y θ (1 − θ)n−y . MLE: In Example 7. j |x(i) − θ| = i=1 i=1 (θ − x(i) ) + i=j+1 (x(i) − θ) = (2j − n)θ − i=1 x(i) + i=j+1 x(i) . θ) and [n/2] = n/2. There is no ˆ but an expression is simple formula for MSE(θ).) 7. . The MSE of θ is MSE(θ) = Var θ + bias(θ)2 = (θ(1 − θ)/n) + 02 = θ(1 − θ)/n. and [n/2] = (n − 1)/2. X is ¯ ¯ is the overall maximum of L(θ|x). y = y=[n/2]+1 y 1 + − 2θ n 2 y 1 − n 2 The facts that y/n > 1/2 in the sum and θ ≤ 1/2 imply that every term in the sum is positive. when X ≤ 1/2. When X > 1/2. n ˆ MSE(θ) = ˆ E(θ − θ)2 = y=0 [n/2] ˆ (θ − θ)2 n y θ (1 − θ)n−y y n = y=0 y −θ n 2 n y θ (1 − θ)n−y + y y=[n/2]+1 1 −θ 2 2 n y θ (1 − θ)n−y . a.12 Xi ∼ iid Bernoulli(θ). so the MLE minimizes n 2 where x(1) . Using the notation used in (b). (Note: MSE(θ) = MSE(θ) = 0 at θ = 0.

x2 = wz + 1. maximizing with respect to µ is equivalent to minimizing the sum in the exponential. Usually the midpoint of this interval is taken as the MLE. the likelihood is minimized at θ = x((n+1)/2) .15 a. and any value in this interval is the MLE. d dµ (xi − µ)2 d = 2x µ i dµ ((xi /µ) − 1) =− xi 2 i i i 2 ((xi /µ) − 1) xi . xi µ2 Setting this equal to zero is equivalent to setting xi −1 µ = 0. i and solving for µ yields µn = x. where the range of integration is 0 < w < −1/z if z < 0. ⇒ x1 = w. w = x1 The Jacobean is |w|.27b. 7. 7. b. Make the transformation z = (x2 − 1)/x1 . c. 2j − n = 0 if j = n/2. 2¯2 xi x 1 = xi 1 1 − xi x ¯ xi −2 x2 ¯ i 1 + x ¯ i 1 n =− + xi x ¯ . λ|x) = λ λn/2 exp − (2π)n i xi 2 (xi − µ)2 µ2 xi . This involved algebra can be found in Schwarz and Samanta (1991). 0 < w < ∞ if z > 0. i i b.2a. Plugging in this µn and maximizing with respect to λ ˆ ¯ ˆ amounts to maximizing an expression of the form λn/2 e−λb .17 a. Simple calculus yields n ˆ λn = 2b Finally. So the likelihood is constant between x(n/2) and x((n/2)+1) . fZ (z) = Using the fact that 1 −1/θ e θ2 −1/z we−w(1+z)/θ dw 0 ∞ we−w(1+z)/θ dw 0 if z < 0 if z ≥ 0 we−w/a dw = −e−w/a (aw + a2 ). and fZ (z) = fX1 (w)fX2 (wz + 1)wdw = 1 −1/θ e θ2 we−w(1+z)/θ dw. The likelihood is L(µ. Thus. we have fZ (z) = e −1/θ zθ+e(1+z)/zθ (1+z−zθ) θz(1+z)2 1 2 (1+z) if z < 0 if z ≥ 0 . i For ﬁxed λ. If ˆ n is odd. This is a special case of the computation in Exercise 7. This is the same as Exercise 6. 2b = i where b= i (xi − x)2 ¯ . If n is even.Second Edition 7-5 This is a linear function of θ that decreases for j < n/2 and increases for j > n/2.

k. y) = x. Y ) + (E X)(E Y ) = ρσX σY + µX µY . i Solving these ﬁve equations yields the estimators given. 1 n xi yi = E XY = Cov(X. Facts such as 1 n x2 − x2 = ¯ i i i x2 − ( i n i xi ) /n 2 = i (xi − x)2 ¯ n are used. t3 (x. . 1 n 1 n 2 x2 = E X 2 = σX + µ2 . y) = y. . i X 2 2 yi = n(µ2 + σY ). we have xi i = nµX . y) = x2 . use the Miscellanea: For k L(θ|x) = h(x)c(θ) exp i=1 n wi (θ)ti (x) . Y ) + µX µY ] = n(ρσX σY + µX µY ).7-6 Solutions Manual for Statistical Inference ˆ c. i X i 2 2 yi = E Y 2 = σY + µ2 . The posterior distributions are just the normalized likelihood times prior. gives us θ = ∞. y) = xy. From part (b). Multiplying out the exponent in the bivariate normal pdf shows it has this exponential family form with k = 5 and t1 (x. x|y) = 2 (2πσX )− 2 exp − n 1 2 2σX A (xi − µX )2 i 2 × 2πσY (1−ρ2 ) −n 2 exp −1 2 2σY (1 − ρ2 ) yi − µY + ρ i B σY (x − µX ) σX i 2 . t4 (x. These are the same equations as in part (a) if you divide each one by n. y) = y 2 and t5 (x.18 a. if we use the second ˆ density. t2 (x. use the hint in the book to write L(θ|x. Setting up the method of moment equations. So the MLEs are the same as the method of moment estimators in part (a). Two answers are provided. . Y i We also need an equation involving ρ. i 2 x2 = n(µ2 + σX ). provide the unique MLE for θ. For the second answer. i = 1. 7. Y i yi i = nµY . X2 = 1 implies Z = 0 which. From part (a) we get θ = 1. d. n the solutions to the k equations j=1 ti (xj ) = Eθ j=1 ti (X j ) = nEθ ti (X1 ). b. = µY . = i xi yi i [Cov(X. y) = L(θ|x)L(θ. so of course they are diﬀerent. . First. The usual ﬁrst two moment equations for X and Y are x = ¯ y = ¯ EX EY = µX .

Now ¯ n n ρ log(1 − ρ2 ) − + A 2 1 − ρ2 1 − ρ2 and nρ A(1−ρ2 ) + 2Aρ2 set ∂log L n = − + = 0. Write ˆ log L(¯. ¯ = a i (yi − y )2 where a is a constant. ¯ ˆ ˆ We have ∂logB ∂µX ∝ ⇒ i −2 i (y i − µY )− (yi − µY ) = ρσY (x − µX ) σX i ρσY Σ(xi − µX ).ˆY ˆ σ σY ˆ2   (xi − x)(y i − y )  ¯ ¯   σX σ Y ˆ ˆ  A 2 σY ˆ2 2 i    1 n 2 2n − 2ρ = − log(1 − ρ ) − 2)  2 2(1−ρ  because σX ˆ2 = 1 n i (xi i − x) and ¯ log L = − 2 σY ˆ2 = 1 n i (yi − y ) . 2 (xi − µX ) (y i − µY )− σ σX X ρσY (xi − µX )(yi − µY ) = ˆ ˆ (xi − µX )2 . ∂ 2 log B This implies µX = x and µY = y . ρ|x. = a i (xi − x) . we will get ρ = 1/ρ. i (xi − µX )(yi − µY ) = ρσY ˆ ˆ i (yi − µY ) . n n i σX σY ˆ ˆ This implies . y) x ¯ ˆ2 ˆ2 n 1 = − log(1 − ρ2 ) − 2 2(1−ρ2 ) ¯ ¯ ¯ ¯ (xi − x)2 2ρ(xi − x)(y i − y ) (y i − y )2 − + σX ˆ2 σX . we conclude that a = 1/n. y . Combining the ¯ 1 knowledge that x. Therefore it is maximum). y|x) This implies i (xi −µX ) = ρσY i (yi − σ µY ). that maximizes B. To get σX take ˆ ¯ ˆ ¯ ˆ2 X ∂log B 2 ∂σX ∝ i ⇒ i ρσY ρσY set ˆ (xi − µX ) = 0. ˆ ˆ σY ˆ σX ˆ σX ˆ2 (xi − x)2 = ¯ i σX ˆ σY ˆ 2 (yi − y )2 ⇒ ¯ i i (xi − x)2 ¯ = 2 σX ˆ i (y i − y )2 ¯ . ¯ ¯ µ ˆ2 Lastly. ¯ ˆ2 2 2 µY . so we need i (yi − µY ) = 0 and i (xi − µX ) = 0. ( ∂µ2 < 0. n i (xi − x) = (ˆX . ∂ρ 1 − ρ2 (1−ρ2 )2 (1−ρ2 )2 A + Aρ2 −nˆ−nˆ3 ρ ρ = 0 ⇒ A(1 + ρ2 ) = nˆ(1 + ρ2 ) ˆ ρ ˆ (1−ρ2 )2 (xi − x)(y i − y ) ¯ ¯ A 1 ⇒ ρ= ˆ = . and ρ. If i (yi − µY ) = 0 or ˆ ˆ ˆ ˆ ˆ ˆ ˆ i (xi − µX ) = 0. Thus σX and σY must satisfy the σ ˆ ¯ µY = Y . 2 σY ˆ Therefore. the question is whether given σY . does x. we ﬁnd ρ. σY . the MLE of ρ. σX ) maximizes A. ˆ σX ˆ X ˆ 2 ˆ2 ˆ2 Similarly. σX maximize B? Let us ﬁrst ﬁx σX and look for µX . This implies ¯ above two equations with µX = X. ˆ σX ρσY σX set = 0 X Similarly do the same procedure for L(θ|y)L(θ. σX . The solutions µX and µY therefore must satisfy both equations.Second Edition 7-7 ¯ We know that x and σX = i (xi − x)2 /n maximizes A.

19 a.6. And β is unbiased because xi E Yi xi · βxi ˆ Eβ = i = i = β. ¯ x2 ≥ n¯2 . β is normally disj 2 a2 Var Yi i = i xi 2 j xj σ2 = ( x2 2 i σ = x2 )2 j j i σ2 2. i xi ∂ 2 logL 1 = 2 2 ∂β σ x2 < 0. β is BLUE (Best Linear Unbiased Estimator of β). it is the MLE. i Yi i xi = ( 1 2 i xi ) Var Yi = i i ( σ2 nσ 2 σ2 = 2 2 = . Var Because i i Yi = i xi 1 i xi i E Yi = 1 i xi βxi = β. xi yi − i β σ2 set ˆ x2 = 0 ⇒ β = i i i xi yi 2 .2. i i ∂logL 1 = 2 ∂β σ Also.σ 2 |y) = − For a ﬁxed value of σ 2 . i i ˆ ˆ so it is a maximum. By Corollary 4.) . Hence. 2 n x ¯ n¯2 x i xi ) i i x2 − n¯2 = x i i (xi − x)2 ≥ 0. i xi 7. β σ2 β2 2σ 2 logL(β. where ai = xi / tributed with mean β. β = i ai Yi .7-8 Solutions Manual for Statistical Inference 7. ( b. and ˆ Var β = i j ˆ x2 are constants. Because β does not depend on σ 2 .1. E b. By Theorem 6. x i i ˆ Var β = σ2 σ2 ≤ = Var 2 n¯2 x i xi Yi i xi . L(θ|y) = i √ 1 2πσ 2 exp − 1 (y − βxi )2 2σ 2 i 1 2σ 2 β 2 = (2πσ 2 )−n/2 exp − (2πσ 2 )−n/2 exp − Yi2 . ˆ (In fact.2. σ 2 ).10. as discussed in Section 11.3. 2 2 i xi i xi ˆ c. i i xi Yi ) is a suﬃcient statistic for (β. n 1 n log(2π) − log σ 2 − 2 2 2 2σ 2 yi + xi yi − i x2 . (y i −2βxi yi + β 2 x2 ) i i 2 = 2 i xi 2 2σ exp − 1 2σ 2 2 yi + i β σ2 xi yi i .20 a.

θ With π(θ) as given. we have (ignoring terms that do not depend on θ) π(θ|t) ∝ ∝ 1 θ 1 θ ((n−1)/2)−1 e−(n−1)t/2θ ((n−1)/2)+α+1 1 θ 1 −1/βθ e θα+1 . v 2 ) × n(µ.8 with ai = 1/x2 i we obtain 1 n 1 ≥ x2 i n 2. θ) = f (¯|θ)π(θ) = √ x x e x e−(θ−µ) /2τ . x2 i 1 1 2 = Var n xi Yi . E b. 7.22 a. The above factorization shows that f (x. b) denote ¯ the pdf of a normal distribution with mean a and variance b. xi i Thus. ¯ where the marginal distribution of X is n(µ. x 2σ 2 2τ 2v τ + σ 2 /n where δ(x) = (τ 2 x + (σ 2 /n)µ)/(τ 2 + σ 2 /n) and v = (σ 2 τ 2 /n) (τ + σ 2 /n).21 a. x2 i 1 n 1 Yi = xi n 1 E Yi = xi n βxi = β. θ) = n(θ. ˆ Var β = σ2 σ2 ≤ 2 2 n i xi i i Because g(u) = 1/u2 is convex. 2πσ 2πτ b. √ 2 2 n −n(¯−θ)2 /(2σ2 ) 1 √ f (¯. τ 2 + σ 2 /n) and the posterior distribution of θ|x is n(δ(x). xi i i i i i i Using Example 4. v 2 ).Second Edition 7-9 7. This also completes part (c). Factor the exponent in part (a) as 1 1 1 −n (¯ − θ)2 − 2 (θ − µ)2 = − 2 (θ − δ(x))2 − 2 x (¯ − µ)2 . τ 2 + σ 2 /n). Let n(a.7. we have n−1 f (t|θ) = 1 Γ ((n − 1)/2) 2(n−1)/2 n−1 t θ [(n−1)/2]−1 i i 1 . Because (n − 1)S 2 /σ 2 ∼ χ2 . σ 2 /n) × n(µ. exp − 1 (n − 1)t 1 + θ 2 β . Var 7. using Jensen’s Inequality we have 1 1 ≤ x2 ¯ n Thus. xi Yi i xi = σ2 σ2 ≤ 2 n¯2 x n i i e−(n−1)t/2θ n−1 . i xi 1 1 = Var x2 n i Yi . Var 1 n Yi 1 = 2 xi n Var Yi σ2 = 2 x2 n i 1 .23 Let t = s2 and θ = σ 2 . τ 2 ) = n(δ(x).

σ 2 )π(θ|µ. b) is 1/((a − 1)b). ∞ e−A(θ) dθ = −∞ √ 2π √ σ2 στ . σ 2 . σ 2 . with a= n−1 +α 2 and b= (n − 1)t 1 + 2 β −1 . 2πστ Complete the square in θ to write the sum of the two exponents as θ− − xτ 2 σ 2 +τ 2 + µσ 2 σ 2 +τ 2 2 2τ 2 2 σσ +τ 2 2 − (x − µ)2 . E(λ|y) Var(λ|y) = = (y + α) (y + α) β nβ+1 β2 = 2. Y = i Xi ∼ Poisson(nλ). the Xi s are independent and each Xi has marginal pdf ∞ ∞ m(x|µ. e−A(θ) is the kernel of a normal pdf. a. From part (b). τ 2 ) = −∞ f (x|θ. ny y!Γ(α)β α λ(y+α)−1 e− β/(nβ+1) dλ = λ ny Γ(y + α) y!Γ(α)β α β nβ+1 y+α . λ(y+α)−1 e− β/(nβ+1) β f (y|λ)π(λ) = π(λ|y) = y+α ∼ gamma y + α. b). 7. IG(a. τ 2 ) = = a n(µ. (x − µ)2 1 √ στ exp − 2π √ 2(σ 2 + τ 2 ) 2πστ σ2 + τ 2 (x − µ)2 1 √ √ exp − .7-10 Solutions Manual for Statistical Inference which we recognize as the kernel of an inverted gamma pdf. We will use the results and notation from part (b) to do this special case.24 For n observations. This is a Bayes estimator of σ 2 . τ 2 ) dθ = −∞ 1 −(x−θ)2 /2σ2 −(θ−µ)2 /2τ 2 e e dθ. λ . Thus. The marginal pmf of Y is ∞ m(y) = 0 (nλ)y e−nλ 1 λα−1 e−λ/β dλ y! Γ(α)β α ∞ 0 = Thus. 2(σ 2 + τ 2 ) 2π σ 2 + τ 2 . Direct calculation shows that the mean of an IG(a. + τ2 and the marginal pdf is m(x|µ. call it −A(θ). β 1 y+ (αβ). so E(θ|t) = n−1 1 2 t+ β n−1 2 + α−1 = n−1 2 2 s n−1 2 + + 1 β α−1 . Also. σ 2 + τ 2 ) pdf.25 a. m(y) nβ+1 β Γ(y+α) nβ+1 b. 2(σ 2 + τ 2 ) Only the ﬁrst term involves θ. nβ+1 nβ+1 (nβ+1) 7.

n 2 n 2 πσ 2 2n . m(x|τ ). ∞ 0 ∞ 0 0 −∞ e− 2 (t−b) dt te − a (t−b)2 2 a 2 0 = −∞ ∞ e− 2 (t−b) dt = (t − b)e− 2 (t−b) dt + 0 a 2 a 2 π . . Thus. For one observation of X and θ the joint pdf is h(x. . this shows that marginally X1 . Doing each integral in turn yields the marginal pdf m(x|τ ) = i m(xi |τ ). 2 n ∞ e− 2σ2 (θ−δ± (x)) dθ −∞ Now use the facts that for constants a and b.Second Edition 7-11 b. . θ|τ ) = f (x|θ)π(θ|τ ). . dθn . θ1 |τ ) dθ1 i=2 h(xi . θn . . Because this marginal pdf factors. m(x1 |τ ) can be pulled out of the integrals. . . and this is not a function of θ2 . . dθn n = −∞ ··· −∞ h(x1 . θ|τ ) = i h(xi . . . . θi |τ ) dθ2 . Xn are independent. . and the marginal pdf of X is ∞ m(x|τ ) = −∞ h(x. The dθ1 integral is just m(x1 |τ ). and the marginal pdf of X is ∞ ∞ m(x|τ ) = −∞ ∞ ··· −∞ ∞ i h(xi . 7. . 2a 2 E(θ|x) = (δ− (x) + δ+ (x)) + σ n 2 e− 2σ2 δ+ (x) −e− 2σ2 δ− (x) . . . 2a te− 2 (t−b) dt to get a 2 1 a 2 = − e− 2 b + b a πσ 2 2n π . and they each have the same marginal distribution.26 First write 2 n x f (x1 . θn ) is h(x. the joint pdf of X = (X1 . . θi |τ ). θ|τ ) dθ. . . Thus. θi |τ ) dθ1 . . . Xn ) and θ = (θ1 . . 2a ∞ 0 dt = be− 2 (t−b) dt = a 2 1 − a b2 e 2 +b a π . . the posterior E(θ|x) = 2 n ∞ θe− 2σ2 (θ−δ± (x)) dθ −∞ . where we use the “+” if θ > 0 and the “−” if θ < 0. xn |θ)π(θ) ∝ e− 2σ2 (¯−θ) −|θ|/a where the exponent can be written |θ| n n n 2 (¯ − θ)2 − x = (θ − δ± (x)) + 2 x2 − δ± (x) ¯ 2σ 2 a 2σ 2 2σ with δ± (x) = x ± ¯ mean is σ2 na . . So. .

1+β b. mβ ˆ Since τj = 1. β = i=1 yi /m = i=1 yi / i=1 xi .7-12 Solutions Manual for Statistical Inference 7. and plugging this into the ﬁrst equaˆ tion gives the desired expression for β. The log likelihood is n log L = i=1 −mβτi + yi log(mβτi ) + xi log(τi ) + log m! − log yi ! − log xi ! and ∂ log L = 0 ∂β ∂ log L = 0 ∂τj n n ⇒ ⇒ n β= τj = n i=1 yi n i=1 mτi xj + yj . ˆ β+1 xi The second equation yields τ1 = y1 /β. b. j τj = j (yj + xj ) = 1. ˆ 7. The stationary point of the EM algorithm will satisfy ˆ β = τ1 ˆ τj ˆ = = n i=1 yi n i=2 τ1 + ˆ τ1 + y 1 ˆ ˆ β+1 xj + yj . ˆ c. The log likelihood is n log L = i=1 −βτi + yi log(βτi ) − τi + xi log(τi ) − log yi ! − log xi ! and diﬀerentiation gives ∂ log L ∂β ∂ log L ∂τj n = i=1 −τi + −β + n y i τi βτi ⇒ β= ⇒ n j=1 n i=1 yi n i=1 τi = ⇒ yj β xj −i+ βτj τj n j=1 τj = yj . which implies that mβ = j (yj + xj ) and τj = (xj + yj )/ i (yi + xi ). . xj + yj 1+β τj = j=1 xj + 1+β ˆ Combining these expressions yields β = n j=1 yj / n j=1 xj and τj = ˆ xj +yj ˆ . and substituting β = n n n n ˆ j=2 yj / j=2 xj shows us that j=2 τj = j=2 xj . τ τ Substituting this into the MLEs of part (b) gives the EM sequence. as were the expressions for τi . In the likelihood function we can ignore the factorial terms. The expression for β was derived in part (b).27 a. The joint density is the product of the individual densities.16) are obviously satisﬁed. ˆ c. and substituting this into the ﬁrst equation yields n n β = j=2 yj / j=2 xj . Also. and the expected complete-data (r) (r) likelihood is obtained by on the rth iteration by replacing x1 with E(X1 |ˆ1 ) = mˆ1 .29 a.2. The other two equations in (7. Summing over j in the third equation.

08642925). 0.1908.01695673. 0. 7. 0. y] = log k(x|θ.06689681. y)| θ(r) . θ(r+1) is obtained by maximizing the expected complete-data log likelihood. 1047.01878119. 0.2571. and so the entire right hand side in part (a) is decreasing. 0.Second Edition 7-13 ˆ The MLEs from the full data set are β = 0. . X) θ(r) . 0. 0. 0.1.0008415534 and τ = ˆ (0. y ≤ E log k(X|θ(r) .3952.993. where we take m = R code is #mles on the incomplete data# xdatam<-c(3560. ˆ ˆ ˆ so for any θ. 0.2729.4) xdata<-c(mean(xdatam). By direct substitution we can write ˆ ˆ log L(θ|y) = E log L(θ|y. xn + a) = ∞ t −∞ ∞ −∞ i i n √ .4969.2. y − E log k(X|θ.01776510. . y ≥ E log L(θ|y. 0. δp (g(x)) = δp (x1 + a. as desired.35 a. 0. 0. 7.01878485.02098536. 0.09945087. 0.02098127. X)| θ(r) .06337310.1.09820005. 0. 0.3739.08644610). 0. .04981487. 0. 0. ˆ ˆ ˆ ˆ from the hint. X)| θ(r) .04604973.04604075.1.06376116.0008413892 and τ = ˆ (0. . 0.05621836.04982459.3138. y) log k(x|θ .3. y . ˆ The next iterate.2. 0. 0.04884062. y) θ(r) . Hence E log k(X|θ(r+1) .06690986.5.2.3.01776164. y b.07073839.5554.31 a. E log L(θ(r+1) y.08898653.948. 4( n + n)2 f (xi + a − t) dt f (xi + a − t) dt = ∞ −∞ (y + a) i i f (xi − y) dy ∞ −∞ f (xi − y) dy (y = t − a) = a + δp (x) = g (δp (x)) .03416388. y) log k(x|θ . 7. y)|θ .4828) ydata<-c(3.4. y)dx.05267677.xdatam) beta<-sum(ydata)/sum(xdata) tau<-c((xdata+ydata)/(sum(xdata)+sum(ydata))) } beta tau ˆ The MLEs from the incomplete data set are β = 0.01696004. 0.1172.5485.0.04883109. 0.06319044. y)dx ≤ log k(x|θ . The The MLEs for the incomplete data were computed using R. y .xdatam). 0. y) θ(r) . Write E [log k(X|θ.5. 0. 0. ¯ . 0. 0.06374873.1. 0.05268704. xi . 0. 0.09947027.08896918.33 Substitute α = β = n/4 into MSE(ˆB ) = p np(1−p) (α+β+n)2 + np+α α+β+n 2 −p and simplify to obtain MSE(ˆB ) = p independent of p. 0. for (j in 1:500) { xdata<-c(sum(xdata)*tau[1].03417054.6.2784.07072460.2943.4. 0.05622933.09818091.0.

Now θ EY = 0 y ny n−1 n dy = θ θn n+1 ⇒ E n+1 Y n = θ. for all θ. n θ Thus. Note that the pdf of Y is fY (y) = ny n−1 /θn . X(n) is not a minimal suﬃcient statistic (recall Exercise 5. for all θ. To check that it is a complete suﬃcient statistic.θ) (xi ) = i 1 2θ n I[0. − i log Xi /n is the UMVUE of 1/θ and attains the Cram´r-Rao bound. for all θ. θn Taking derivatives shows that θn−1 g(θ) = 0.. Suppose g(y) is a function such that θ E g(Y ) = 0 ny n−1 g(y) dy = 0. i By the Factorization Theorem. and Y = maxi |Xi | is a complete suﬃcient statistic. It is for θ < Xi < 2θ. So g(θ) = 0. let Y = maxi |Xi |. The joint pdf is f (x|θ) = 1 2θ n I(−θ.7-14 Solutions Manual for Statistical Inference b. Then maxi |Xi | is minimal suﬃcient. Therefore n+1 maxi |Xi | is a best unbiased estimator for θ because it is a function of a complete n suﬃcient statistic.3. (2π)n/2 (2π)n/2 so √ √ ( n/ 2π) δp (x) = √ √ ( n/ 2π) 2 1 ∞ x te− 2 n(¯−t) dt −∞ 2 1 ∞ x e− 2 n(¯−t) dt −∞ = x ¯ = x. etc. ¯ 1 c.θ) (max|xi |).) 7. so t dt = 1 dt δp (x) = x(1) +1/2 x(n) +1/2 x(1) + x(n) . ﬁrst ﬁnd a complete suﬃcient statistic. a. f (xi − t) = i 2 2 2 1 1 1 1 1 x e− 2 Σi (xi −t) = e− 2 n(¯−t) e− 2 (n−1)s .37 To ﬁnd a best unbiased estimator of θ. f (xi − t) = i i I t− 1 1 ≤ xi ≤ t + 2 2 x(1) +1/2 x(n) +1/2 = I x(n) − 1 1 ≤ t ≤ x(1) + 2 2 . 0 < y < θ. 4θ < Xi < 6θ. 2 7. but not when the range is symmetric about zero.15.36). maxi |Xi | is a suﬃcient statistic. (Note that X(1) . ∂ logL(θ|x) ∂θ = ∂ log ∂θ θxθ−1 = i i ∂ ∂θ [logθ + (θ−1) logxi ] i = i 1 + logxi θ = −n − i logxi 1 − .38 Use Corollary 7. e . −2θ < Xi < θ.

3. and this condition is satisﬁed for all exponential families. then E W (X) = τ (θ). X is the UMVUE of ∂ ∂θ and attains the Cram´r-Rao lower bound.Second Edition 7-15 b. x p(1 − p) ¯ By Corollary 7. e Note: We claim that if log L(θ|X) = a(θ)[W (X) − τ (θ)]. To be rigorous. p(1 − p) . Both (a) and (b) are exponential families. ¯ Thus. because ∂ under the condition of the Cram´r-Rao Theorem. f (X|θ) f (X|θ) = = ∂2 f (x|θ) dx = ∂θ2 d ∂ Eθ log f (X|θ) dθ ∂θ d dθ = 0. we e need to check the “interchange diﬀerentiation and integration“ condition. X is the UMVUE of p and attains the Cram´r-Rao lower bound. E ∂θ log L(θ|x) = 0. 7.40 ∂ logL(θ|x) ∂θ = = i ∂ log ∂p pxi (1 − p)1−xi i = ∂ ∂p xi log p + (1 − xi ) log(1 − p) i xi (1 − xi ) − p 1−p = n¯ n − n¯ x x − p 1−p = n [¯ − p]. ∂ logL(θ|x) ∂θ = = i ∂ log ∂θ i logθ xi θ θ−1 = + 1 θ ∂ ∂θ [loglogθ − log(θ−1) + xi logθ] i 1 1 − θlogθ θ−1 xi = i n n n¯ x − + θlogθ θ−1 θ = n x− ¯ θ θ θ−1 θ 1 − θ−1 logθ − 1 logθ .15.39 Eθ ∂2 log f (X|θ) ∂θ2 = Eθ = Now consider the ﬁrst term: Eθ ∂2 ∂θ 2 f (X|θ) ∂ ∂θ ∂ ∂θ ∂ log f (X|θ) ∂θ ∂ ∂θ f (X|θ)  = Eθ  Eθ ∂2 ∂θ 2 f (X|θ) f (X|θ) f (X|θ) − ∂ ∂θ f (X|θ) 2  .8) and the identity is proved.3. we could calculate −nEθ ∂2 logf (X|θ) ∂θ2 ∂2 1−X = −nE log pX (1 − p) ∂p2 = −nE ∂ X (1 − X) − ∂p p 1−p = = −nE = −nE ∂2 [Xlogp + (1 − X) log(1 − p)] ∂p2 −X 1−X − p2 (1 − p)2 1 1 = −n − − p 1−p n . 7. ∂ f (x|θ) dx ∂θ (assumption) (7. Altere natively.

∗ ∗ 2 2 Let qi = qi (1 + λti ) with 0 ≤ λ ≤ 1 and |ti | ≤ 1. τ (θ) −nEθ ∂2 ∂θ 2 logf (X|θ) = 1 p(1 − p) ¯ = = VarX. i (1/n)Xi = X has the minimum variance among all linear unbiased estimators. b. Ann. here is a more detailed version. 7.43 a. Thus. and ≤ 1 λ2 [1 − T 2 ] . e 7. i ( [ [ 1 ∗ (1 + λt )]2 j j qj 1 ∗ qj (1 + λtj )]2 j ∗ using the deﬁnition of qi . n/p(1 − p) n ¯ ¯ We know that EX = p. A cryptic version of the proof is in Tukey (Approximate Weights. Statist. Now since ∗ qj (1 + λtj )]2 = [1 + λ j qj tj ]2 . X attains the Cram´r-Rao bound. 2 ) 1 + [1 + λT ]2 j (1/σj . 91-92). Hence the estimator is unbiased. 1/n. i a2 is minimized by choosing ai = 1/n for all i. i ¯ Thus. This one is real hard . where we used the fact that [ j j ∗ qj = 1. Therefore. n because the cross-term is zero. we need to minimize i i i subject to the constraint i ai = 1. Then Var qi Wi j qj = = = 1 2 j qj ) 2 qi σi i ∗2 2 qi (1 + λti )2 σi i 2 j (1/σj ) ∗ qi (1 + λti )2 . Recall that qi = (1/σi )/ j (1/σj ) and ∗ 2 VarW = 1/ j (1/σj ). Add and subtract the mean of the ai . Hence. Now let T = j Var qi Wi j qj qj tj .it was taken from an American Statistician article. 1948.41 a. but the proof is not there. since j qj t2 ≤ 1. Var ( i ai Xi ) = i a2 Var Xi = i a2 σ 2 = σ 2 i a2 . i ai − 1 n + 1 n 2 = i ai − 1 n 2 + 1 .7-16 Solutions Manual for Statistical Inference Then using τ (θ) = p and τ (θ) = 1. E ( i ai Xi ) = i ai E Xi = i ai µ = µ i ai = µ. to get a2 = i i i i a2 . Var qi Wi j qj = ≤ λ2 [ j qj t2 − ( j qj tj )2 ] 1 j 1+ 2) [1 + λ j qj tj ]2 j (1/σj λ2 [1 − ( 1 1+ 2 [1 + λ j (1/σj ) j j j qj tj )2 ] qj tj ]2 . Now write ∗ qi (1 + λti )2 = 1 + 2λ i j q j tj + λ 2 j qj t2 = [1 + λ j j qj tj ]2 + λ2 [ j q j t2 − ( j j qj tj )2 ]. Math.

bi = 1/k 2 (1/σi ) 2 j 1/σj 2 σi k = 2 1/σj .23. 2 2 Var W ∗ 4B 4(σmax /σmin ) 7. Let Y ∼ n(θ. n n n n i ¯ Therefore. 4 2 2 3σ + 6θ σ + θ − (σ + θ ) . 1 1 1 1 ¯ ¯ ¯ E X 2− = Var X + (E X)2 − = + θ2 − = θ2 .6) = 2σ + 4θ σ .3. E Y 4 = E[Y 3 (Y − θ + θ)] = E Y 3 (Y − θ) + E Y 3 θ = E Y 3 (Y − θ) + θE Y 3 . X 2 − 1/n is the UMVUE of θ2 . but ﬁrst we derive some general formulas that will also be useful in later exercises. 1−λ where bi /bj are the ratio of the normalized weights which. X expectation. Then here are formulas for E Y 4 and Var Y 2 . ¯ where X ∼ n (θ. 1/n) . bmax = 2 σmax k 2 1/σj j and bmin = 2 σmin k 2 1/σj j 2 2 and B = bmax /bmin = σmax /σmin . with maximizing value Var qi Wi j qj ≤ 1 λ2 [1 − λ2 ] 1 = VarW ∗ . Substituting this into Tukey’s inequality yields 2 2 (B + 1)2 ((σmax /σmin ) + 1)2 Var W ≤ = . j Thus. b. We will calculate ¯ ¯ ¯ ¯ Var X 2 −1/n = Var(X 2 ) = E(X 4 ) − [E(X 2 )]2 . is bi /bj = (1 + λti )/(1 + λtj ). X 2 − 1/n is a function of ¯ 2 − 1/n is the unique best unbiased estimator of its i Xi . 2 n n n 2 = θ 3θσ 2 + θ3 = 4 2 2 = 3θ2 σ 2 + θ4 . by Theorem 7. 2 ) 1 + [1 − λ2 ]2 (1/σj 1 − λ2 j Bloch and Moses (1988) deﬁne λ as the solution to bmax /bmin = 1+λ . σ 2 ). Solving B = (1 + λ)/(1 − λ) yields λ = (B − 1)/(B + 1). 1 ¯ Var X 2 − n ¯ = Var X 2 = 2 1 1 4θ2 + 4θ2 > . (Stein’s Lemma) θE Y 3 Var Y Thus. and setting ti = 1 and tj = −1 (the extremes) yields the Bloch and Moses (1988) solution. 1).6.44 ¯ Xi is a complete suﬃcient statistic for θ when Xi ∼ n(θ. 4 2 2 2 (Example 3. The right hand side is maximized by taking ti as large as possible and tj as small as possible.Second Edition 7-17 and the right hand side is maximized at T = −λ. E Y 3 (Y −θ) = σ 2 E(3Y 2 ) = σ 2 3 σ 2 +θ2 = 3σ 4 + 3θ2 σ 2 . Therefore. in the present notation.

∂2 n ∂θ 2 logf (X|θ) 2 −nEθ Thus. the ratio of the e variance and the lower bound → 1 as n → ∞.8b for the proof. = E Z 4.) 7.6. n+1 d. Now. Under normality we have κ= E[X − µ]4 X −µ =E σ4 σ 4 1 n κ− n−3 n−1 σ4 . under normality the minimizing value of a is 1 = B+1 1 1 n 3− n−3 n−1 = +1 n−1 .45 a. using Lemma 3. Then minimizing MSE(S 2 ) is equivalent to minimizing a2 B + (a − 1)2 . [τ (θ)]2 = (2θ)2 = 4θ2 so the Cram´r-Rao Lower Bound for estimating θ2 is e [τ (θ)] 4θ2 = . we have e Eθ ∂ 2 logf (X|θ) ∂θ2 = = Eθ Eθ 2 ∂2 1 log √ e−(X−θ) /2 2 ∂θ 2π 2 ∂ 1 log(2π)−1/2 − (X−θ)2 ∂θ2 2 = Eθ ∂ (X−θ) ∂θ = −1. and τ (θ) = θ2 . write Var(S 2 ) = Bσ 4 . under normality κ = 3. There were two typos in early printings. Because E S 2 = σ 2 . MSE(aS 2 ) = Var(aS 2 ) + bias(aS 2 )2 = a2 Var(S 2 ) + (a − 1)2 σ 4 . κ = E[X − µ]4 /σ 4 and Var(S 2 ) = See Exercise 5. Set the derivative of this equal to 0 (B is not a function of a) to obtain the minimizing value of a is 1/(B + 1). Hence. where Z ∼ n(0. . b. 1).7-18 Solutions Manual for Statistical Inference To calculate the Cram´r-Rao lower bound.5 with g(z) = z 3 we have κ = E Z 4 = E g(Z)Z = 1E(3Z 2 ) = 3E Z 2 = 3. There was a typo in early printings. the UMVUE of θ2 does not attain the Cram´r-Rao bound. To minimize MSE(S 2 ) in general. There was a typo in early printings. (However. the minimizing a is a= n−1 (n + 1) + (κ−3)(n−1) n . To obtain this simply calculate 1/(B + 1) with (from part (b)) B= 1 n κ− n−3 n−1 . Using the expression in part (b). bias(aS 2 ) = E(aS 2 ) − σ 2 = (a − 1)σ 2 . c.

ˆ = θ.23.48 a. ¯ for θ to obtain the method of moments estimator θ b. 2 ¯ ∼ n(r.17 and 7. Then. The Cram´r-Rao Lower Bound for unbiased estimates of p is e d dp p 2 d2 −nE dp2 logL(p|X) = −nE 1 X d2 log[p (1 dp2 − p) 1−X = ] −nE 1 (1−X) X − p2 − (1−p)2 = p(1 − p) .49 a. Theorems 7. So E θ = 2n+2 θ. θ = 2(1.6650. if κ = 3 the second term in the denominator is zero and a = (n − 1)/(n + 1). 2θ) distribution we have E X = (2θ + θ)/2 = 3θ/2. If σ 2 is unknown replace it with s2 best unbiased because X and the conclusion still holds. Evaluating this yields E X1 X2 X3 X4 i Xi = t = P (X 1 = X 2 = X 3 = X 4 = 1. P ( i Xi = t) p4 n−t n−4 t−4 (1 − p) t−4 p n−t n t t p (1 − p) n i=5 Xi = t − 4) n . So θ = X(n) /2 is the MLE. θn (n) ˆ ˆ Because 1/θn is decreasing. From Theorem 5. B > 0. Thus E [(π X 2 − πσ 2 /n)] = πr2 is ¯ ¯ 7. the second term in the denominator is negative.16)/3 = . So by the Rao-Blackwell Theorem. σ ). The MLE of p is p = i Xi /n. hence.Second Edition 7-19 e. .x(1) ] (θ). this is maximized at θ = x(n) /2. X(n) ) is an ˜ ˜ unbiased estimator of θ (θ is unbiased) with smaller variance than θ. so it can not be improved with the Rao-Blackwell Theorem. the likelihood function is L(θ|x) = 1 I[x /2. σ 2 /n) and E X 2 = r2 + σ 2 /n. . E(θ|X(1) . Because i Xi is a complete suﬃcient statistic. For the uniform(θ. t i = = n−4 t−4 for t ≥ 4. the second term in the denominator is positive. By independence. Similarly. Using the expression for a in part (d). a minimal suﬃcient statistic for θ is (X(1) . Y = X(1) has pdf fY (y) = n! 1 −y/λ −y/λ e 1−(1 − e ) (n − 1)! λ n−1 Xi = = n −ny/λ e .23 imply that E(X1 X2 X3 X4 | 4 i Xi ) is the best unbiased estimator of p . λ Thus Y ∼ exponential(λ/n) so E Y = λ/n and nY is an unbiased estimator of λ. E kθ ˜ c. Because Var(S 2 ) = Bσ 4 . we have a > (n − 1)/(n + 1). n because E X = p. ˆ b. Use 2n+1 2n+1 ˆ the pdf of X(n) to calculate E X(n) = n+1 θ.33/2 = . The MLE is a function of (X(1) . if κ > 3. ¯ 7. and if k = (2n + 2)/(2n + 1). x(n) denote the observed order statistics. so the estimator is unbiased. If κ < 3. . ˜ ˆ d. Because we are dividing by a smaller value. the normal result from part (c). X(n) ). So we solve 3θ/2 = X ˜ = 2X/3.3.9. E(X1 X2 X3 X4 ) = i E Xi = p4 .7733 and θ = 1. . 7.3.47 Xi ∼ n(r. and. a = 1/(B + 1) < 1. with E p = p and Var p = p(1 − p)/n.5. so the estimator is E(X1 X2 X3 X4 | t) = 0. so X ¯ is a complete suﬃcient statistic. θ is not a function ˜ of this minimal suﬃcient statistic. we have a < (n − 1)/(n + 1). For t < 4 one of the Xi s must be zero. Let x(1) . From Exercise 6. Because we are dividing by a larger value. 7. X(n) ).46 a. . Thus ˆ ˆ ˆ p attains the CRLB and is the best unbiased estimator of p.

b.2. So X − cS is a nonzero function ¯ ¯ of (X. Then E[T (X)|T (X)] = T (X) and Var T (X) > Var T (X). The estimator T ∗ has minimum MSE over a class of estimators that contain those in Exercise 7. 7. Let T (X) be a complete suﬃcient statistic. Use the factorization in Example 6. 2 2 2 V1 +V 2 ((1/n) + c −1)θ ((1/n) + c −1) 2 2 c.9. Of course. where V1 = VarX Use calculus to show that this quadratic function of a is minimized at a= (c −1)θ2 (c −1) V2 = = .50. 7. X is a one-to-one ¯ ¯ function of i Xi . So X is also a complete suﬃcient statistic. E S 2 = λ. a) V2 . c. restricting T ∗ ≥ 0 will improve the MSE. E(X − cS) = θ − θ = 0. for all θ. Because E ( i Xi ) = nλ. Thus (X. −1 −1 Diﬀerentiating yields a2 = (n + 1)c2 − n and a1 = 1 − (n + 1)c2 − n . So aX + (1 − a)cS is an unbiased estimator of θ. that is. Because θ > 0. by Theorem 7. S 2 ) is suﬃcient. S 2 ) whose expected value is always zero. it is also the unique best unbiased estimator of λ. ¯ ¯ ¯ 7. c.3. ¯ ¯ b. with the special values µ = θ and σ 2 = θ2 . ¯ ¯ Therefore E(S 2 |X) = X. Var(aX +(1−a)cS) = a2 V1 +(1− 2 ¯ = θ2 /n and V2 = Var(cS) = c2 E S 2 − θ2 = c2 θ2 − θ2 = (c2 − 1)θ2 . Straightforward calculation gives: ¯ E θ − (a1 X + a2 cS) 2 ¯ = a2 Var X + a2 c2 Var S + θ2 (a1 + a2 − 1)2 . which is independent of θ. 1 2 and we only need minimize the expression in square brackets. Because X is a function of i Xi and E X = λ.52 a. S 2 ) is not complete. Any function of i Xi that is an unbiased estimator of λ is the unique ¯ ¯ ¯ best unbiased estimator of λ. Thus. From part (a). ˆ ˆ c. Because X and S 2 are independent for this normal model. we know directly that because E( i Xi ) = nλ. Then we have ¯ ¯ ¯ ¯ Var S 2 = Var E(S 2 |X) + E Var(S 2 |X) = Var X + E Var(S 2 |X). i Xi /n is the best unbiased estimator of λ.23. d. and let T (X) be any statistic other than T (X) such that E T (X) = E T (X). E(aX + (1 − a)cS) = aE X + (1 − a)E(cS) = aθ + (1 − a)θ = θ. . We formulate a general theorem. i Xi is a complete suﬃcient statistic and E (nX(1) | i Xi ) is the best unbiased estimator of λ. Because fX (x) is in the exponential family. ¯ so Var S 2 > Var X.2 and from part (b) λ = 128. Thus.7-20 Solutions Manual for Statistical Inference b. i Xi is a complete suﬃcient statistic. ¯ b. we must have E (nX(1) | i Xi ) = i Xi /n by completeness. any function of i Xi that is an unbiased estimator of λ is the best unbiased estimator of λ.50 a. No. S 2 is an unbiased estimator of the population variance. Maybe the exponential model is not a good assumption. we have = θ2 a2 n + a2 (c − 1) + (a1 + a2 − 1)2 . 1 2 c2 −1 c2 ¯ Because Var X = θ2 /n and Var S = E S 2 − (E S)2 = θ2 ¯ E θ − (a1 X + a2 cS) 2 2 . It does not ﬁt the deﬁnition of either one.8. λ = 601. X is the best unbiased estimator of λ. Because the Poisson family is an exponential family with t(x) = x.51 a. E(S 2 |X) is an unbiased estimator of λ and. to show ¯ ¯ ¯ that (X.

a < y < θ. If θ is the lower endpoint of the support.23 we see that Y is a complete suﬃcient statistic. nm(y)c(y) c. so W cannot be best unbiased. the smallest order statistic Y = X(1) is a complete suﬃcient statistic. Hence T (y) = y r + ry r−1 n+r r = y . Suppose X ∼ f (x|θ) = c(θ)m(x). which are equivalent to θ m(x) dx = a 1 c(θ) θ and a T (y)nm(y) h(θ) dy = . By Theorem 5. ne−y n . Then Varθ0 (W + aU ) = Varθ0 W + a2 Varθ0 U + 2aCovθ0 (W. any function T (Y ) that is an unbiased estimator of h(θ) is the best unbiased estimator of h(θ). c(y)n−1 c(θ)n Diﬀerentiating both sides of these two equations with respect to θ and using the Fundamental Theorem of Calculus yields m(θ) = − c (θ) c(θ)2 and T (θ)nm(θ) c(θ)n h (θ) − h(θ)nc(θ)n−1 c (θ) = . m(x) = 1 and c(θ) = 1/θ.53 Let a be a constant and suppose Covθ0 (W. m(x) = e−x and c(θ) = eθ . and the cdf of X is F (x) = c(θ)/c(x). Choose a ∈ −2Covθ0 (W. 7. For this pdf. U ) Varθ0 U.2. a < x < θ.4. Hence T (y) = y r − ry r−1 ry r−1 = yr − . a. U ). Arguing as above yields the best unbiased estimator of h(θ) is T (y) = h(y) − For this pdf. n(1/y) n b. U ) > 0.4 the pdf of Y is g(y|θ) = nm(y)c(θ)n /c(y)n−1 .55 All three parts can be solved by this general method. For this pdf. −y ey ne n h (y) . n−1 c(θ) c(θ)2n Change θs to ys and solve these two equations for T (y) to get the best unbiased estimator of h(θ) is h (y) . Thus. T (y) = h(y) + nm(y)c(y) For h(θ) = θr .Second Edition 7-21 7. m(x) = e−x and c(θ) = 1/(e−θ − e−b ). a < x < θ θ. Consider the equations θ θ f (x|θ) dx = 1 a and a T (y)g(y|θ) dy = h(θ). Then Varθ0 (W + aU ) < Varθ0 W . Hence T (y) = y r − ry r−1 −y ry r−1 (1 − e−(b−y) ) (e − e−b ) = y r − . Let Y = X(n) be the largest order statistic. 0 . Then 1/c(θ) = a m(x) dx. h (θ) = rθr−1 . Arguing as in Example 6.

· · · .23. We have n+1 n n+1 n+1 i=1 Xi is the best unbiased E T i=1 Xi = y = P i=1 n Xi > Xn+1 i=1 n+1 Xi = y n+1 = P i=1 Xi > Xn+1 . φ(T ) is the best unbiased estimator of τ (θ). If y > 0 the numerator is n n+1 n n+1 P i=1 Xi > Xn+1 . X n+1 = 0 +P i=1 Xi > Xn+1 . i=1 Xi = y − 1 = 0. . then n n P i=1 Xi > 1. Xn ) = τ (θ). n Ep T = Pp (T = 1) = Pp i=1 n+1 i=1 Xi > Xn+1 = h(p). 7. That is. X n )| T )] = τ (θ). T is a Bernoulli random variable. . φ(T ) = E[h(X1 . The denominator equals n+1 y py (1 − p)n+1−y . so φ(T ) is an unbiased estimator of τ (θ). . . For all y > 0. If E h(X1 . Xi is a complete suﬃcient statistic for θ. Xn )|T ] is a function only of T . X n+1 = 1 which equals n n n n P i=1 Xi > 0. And if y > 2.3. Hence. then E h(X1 . φ(T ) is an estimator. i=1 Xi = y =P i=1 Xi = y = n y p (1 − p)n−y . If y = 0 the numerator is n n+1 P i=1 Xi > Xn+1 . i=1 Xi = 0 = 0. i=1 Xi = y − 1 =P i=1 Xi = y − 1 = n py−1 (1 − p)n−y+1 . .56 Because T is suﬃcient. . so E T estimator of h(p). . b. n n n P i=1 Xi > 0. By Theorem 7. i=1 Xi = y. i=1 Xi = y. y If y = 1 or 2. . .57 a. i=1 Xi = y P (Xn+1 = 0) + P i=1 Xi > 1. . y−1 . . . Xn ) = E [E ( h(X 1 .7-22 Solutions Manual for Statistical Inference 7. then n n n P i=1 Xi > 1. i=1 Xi = y − 1 P (Xn+1 = 1). i=1 Xi = y P i=1 Xi = y .

63 Let a = τ 2 /(τ 2 + 1). (X. where the expectation is with respect to π(θ|x). δ π ) becomes ﬂatter. σ2 σ2 2 so the loss of an estimator is the negative of its likelihood.59 We know T = (n − 1)S 2 /σ 2 ∼ χ2 . Then R(µ. 7. Hence.n .61 The pdf for Y ∼ χ2 is ν f (y) = Thus the pdf for S 2 = σ 2 Y /ν is g(s2 ) = 1 ν 2 Γ(ν/2)2ν/2 σ s2 ν σ2 ν/2−1 1 y ν/2−1 e−y/2 .27 is L(σ 2 . a) = a a − log − 1. the UMVUE is n+1 E T i=1 Xi = y   0n y   (y )p (1−p)n−y (1−p) (n)  y 1 = n+1 = (n+1)(n+1−y) (n+1)py (1−p)n−y+1 ( y ) = y n  ((n)+( n ))py (1−p)n−y+1  y y−1 (n)+(y−1)   = y n+1 =1 n+1 y (1−p)n−y+1 ( y )p ( y ) if y = 0 if y = 1 or 2 if y > 2.65 a. From Theorem 6. The unbiased estimator (n−1)p/2 S p Cp. ¯ so (n − 1)p/2 S p Cp. .n is a function of (X. S 2 ).2. σ2 σ2 σ where K > 0 and K > 0. 7. 7. S 2 ) is a ¯ complete. 7. a)|x) = eca E e−cθ −cE(a−θ)−1. Γ(ν/2)2ν/2 e−s 2 ν/(2σ 2 ) . As τ 2 increases.25. b. a)|x) = ceca E e−cθ − c = 0. Thus.Second Edition 7-23 Therefore. The second derivative is positive. so the Bayes estimator is δ π (x) = ax. The loss function in Example 7.n . p/2 = Cp. Then d set E (L(θ. R(µ.3.n is an unbiased estimator of σ p . The posterior expected loss is E (L(θ. da and a = − 1 log E e−cθ is the solution. Figure omitted. Then n−1 ET Thus E (n − 1)S σ2 2 p/2 = Γ 1 n−1 2 ∞ 2 n−1 2 t 0 p+n−1 −1 2 e t −2 22 Γ dt = Γ p p+n−1 2 n−1 2 = Cp. δ π ) = (a − 1) µ2 + a2 . the log-likelihood has the form (gathering together constants that do not depend on s2 or σ 2 ) 1 s2 s2 log L(σ 2 |s2 ) = log + K log −K 2 +K . suﬃcient statistic. so this is the minic mum. it is the best unbiased estimator.

Thus we can write the estimator as JK(Tn ) = n S2 n−1 − n2 n S (S − 1) (n − 1) 2 2 + (n − S) n2 θ 2 −nθ 2 n(n−1) S2 (n − 1) 2 = S 2 −S . so the Bayes risk is x ¯ inﬁnite for any estimator of the form X + b. Tn = (i) j=i 2 1 2 n2 E S 2 = 1 n2 (nθ(1 − θ) + (nθ)2 ) = θ n + (i) n−1 2 n θ . For an estimator X + b. θ). ¯ x Substitute E e−cθ = e−c¯+σ c /2n into the formula in part (b) to ﬁnd the Bayes rule is 2 x − cσ /2n. The marginal distribution of X is m(¯) = 1. For the other n − S values of i. S is a complete suﬃcient statistic. ¯ ¯ d. E θ2 = E S2 = n b.7-24 Solutions Manual for Statistical Inference c. ¯ For X the expected loss is ec σ /2n − 1. x + b)|x) = ecb ec ¯ 2 2 2 σ 2 /2n − cb − 1. it is the best unbiased estimator of θ2 .66 Let S = i Xi ∼ binomial(n. we ﬁnd E e−cθ = x 2 2 x e−c¯+σ c /2n . the squared error risk is E (X + b) − θ = σ 2 /n + b2 . 2 ¯ ¯ ¯ e. substituting into the formula for a normal mgf. Xj (n − 1)2 . and the LINEX posterior loss is x E (L(θ. 7. E JK(Tn ) = 1 n(n−1) (nθ(1 − θ) + (nθ)2 − nθ) = = θ2 . Because JK(Tn ) is a function of S that is an unbiased estimator of θ2 . so X is better than the Bayes estimator. n(n − 1) c. So. For this binomial model. The Bayes risk is inﬁnite for both estimators. d. the LINEX posterior loss (from part (c)) is E (L(θ. ˆ a. Tn = (S − 1)2 /(n − 1)2 because the Xi (i) that is dropped out equals 1. σ 2 /n). and for the Bayes estimator (b = −cσ 2 /2n) the ¯ expected loss is c2 σ 2 /2n. π(θ|x) = n(¯. Tn = S 2 /(n − 1)2 because the Xi that is dropped out equals 0. . For X + b. a)|x) = ec(a−¯)+σ 2 2 2 2 c /2n − c(a − x) − 1. For S values of i.

12) while the denominator has y/m as the MLE (see Example 7. . To assess if the accident rate has dropped. To show that this is equivalent to rejecting if y > b. . Thus 1 λ(y) = and we reject H0 if (θ0 ) (1−θ0 ) y y m−y m−y (θ0 )y (1−θ0 )m−y (y/m)y (1−y/m)m−y if y/m ≤ θ0 if y/m > θ0 . and note that the MLE in the numerator is min {y/m. P {X ≥ 560} = P 559. (A normal approximation with continuity correction gives a value of .) 8.Chapter 8 Hypothesis Testing 8. For this binomial. and we observed X = 10. Thus.1 Let X = # of heads out of 1000. .2 Let X ∼ Poisson(λ).763} ≈ . we could show λ(y) is decreasing in y so that λ(y) < c occurs for y > b > mθ0 . . and p > 1/2. . .5−500 X − 500 √ √ ≥ 250 250 ≈ P {Z ≥ 3. ym ) . 1/2).0000839. supΘ L(θ|y1 .11846. the probability of observing 560 or more heads out of 1000 is very small.3 The LRT statistic is λ(y) = m supθ≤θ0 L(θ|y1 . 8. A normal approximation is also very good for this calculation. ym ) Let y = i=1 yi . not overwhelming evidence that the accident rate has dropped.2. . (y/m) (1 − y/m) < c. where a computer was used to do the calculation. So 1000 P (X ≥ 560) = x=560 1000 x 1 2 x 1 2 n−x ≈ . . E X = 1000p = 500 and Var X = 1000p(1 − p) = 250. . We might tend to believe that the coin is not fair. and we have log λ(y) = y log θ0 + (m − y) log (1 − θ0 ) − y log y − (m − y) log m m−y m .θ0 } (see Exercise 7.0000825. then X ∼ binomial(1000. i! 2! 10! This is a fairly large value.7). we could calculate 10 P ( X ≤ 10| λ = 15) = i=0 e−15 15i 152 1510 = e−15 1+15+ +···+ ≈ . It is easier to work with log λ(y).12264. If the coin is fair. if the coin is fair.

although the problem can be solved that way. w > log ν. make the n-to-1 transformation that sets Z1 = mini Yi and sets Z2 . . so each fraction above is less than 1.8-2 Solutions Manual for Statistical Inference and d logλ(y) dy = = log θ0 − log(1 − θ0 ) − log log θ0 m y/m 1−θ0 m−y y 1 − y + log m y m−y m + (m − y) 1 m−y . x(1) |x) = −n/θ2 < 0. So both the restricted and unrestricted MLEs of ν are ν = x(1) . set ˆ ∂ n log L(θ. So the numerator and denominator of λ(x) are the supremum of this probability over the indicated sets. Under H0 . 1). with their order unchanged. Next. ˆ (∂ 2 /∂θ2 ) log L(θ. for appropriately chosen constants c1 and c2 .µ supθ n 1 −xi /θ i=1 θ e n 1 −xi /θ i=1 θ e 1 θ m+n m 1 −y j /θ j=1 θ e m 1 −y j /µ j=1 µ e n i=1 exp − n i=1 xi + = m j=1 yj m j=1 θ . Zn equal to the remaining Yi s. 2) = χ2(n−1) . . and the d log is less than 0. n T i xi /x(1) ) xi i = 0. n. . Hence. Now T = i=2 Wi ∼ gamma(n − 1.5 a. i = 2. For y/m > θ0 . 8.4 For discrete random variables. . (∂/∂T ) log λ(x) = (n/T ) − 1. where x(1) = mini xi . So θ is a maximum. Instead. this is an increasing function of ν for ν ≤ x(1) . ˆ b. 8. . i = 2. First. make the following three transformations. yj /µ supθ. . We will not use the hint. for all θ. Thus dy log λ < 0 which shows that λ is decreasing in y and λ(y) < c if and only if y > b. T ≤ c is equivalent to T ≤ c1 or T ≥ c2 . and. Then you ﬁnd that the Wi s are independent with W1 ∼ fW1 (w) = nν n e−nw . . c. Finally. ν|x) = n log θ + nθ log ν − (θ + 1) log i xi . n. i = 1. λ(x) is increasing if T ≤ n and decreasing if T ≥ n. Thus. . . y) = supθ supθ. 1 − y/m = (m − y)/m < 1 − θ0 . To ﬁnd the MLE of θ. y) supΘ L(θ|x. . the MLE of θ is θ0 = 1. 8. .6 a.µ θ1 exp {− n xi /θ} µ1 exp − m . For any value of θ. y) = supΘ0 L(θ|x. . . 2 2T ∼ gamma(n − 1. . L(θ|x) = f (x|θ) = P (X = x|θ). λ(x. let Yi = log Xi . The log-likelihood is log L(θ. n. and the MLE of ν is still ν = x(1) . . n and Wi ∼ exponential(1). So the likelihood ratio ˆ statistic is λ(x) = xn /( (1) n n2 /T (n/T ) x(1) i xi ) i 2 n/T +1 = ( xi ) T n n e−T (e −T n/T = ) T n n e−T +n . ν ≤ x(1) . x(1) |x) = + n log x(1) − log ∂θ θ and solve for θ yielding ˆ θ= log( n n = . . hence. let W1 = Z1 and Wi = Zi − Z1 .

λ(x) = where L(¯. ˆ ˆ ¯ we have θ = x(1) and λ = x − x(1) as the unrestricted MLEs of θ and λ. ˆ which is increasing in θ if x(1) ≥ θ (regardless of λ).θ | x) supΘ0 L(θ.θ|x) if x(1) ≤ 0 if x(1) > 0. Therefore λ is a function of T .λ | x) = = ˆ ˆ supΘ L(θ. Rejection for λ ≤ c is equivalent to rejection for T ≤ a or T ≥ b.0|x) x ˆ ˆ L(λ. Therefore. λ|x) = 1 −(xi −θ)/λ e I[θ. x . T ∼ beta(n. ¯ Therefore. where c∗ is some constant. Then ∂log L n =− + ∂λ λ Because n ∂ 2 log L = 2 −2 ∂λ2 λ i i ˆ xi − nθ set = 0 λ2 ⇒ ˆ nλ = i ˆ xi − nθ ⇒ ˆ ¯ λ = x − x(1) .λ | x) L(θ. Under the restriction θ ≤ 0. θ) and j Yj ∼ gamma(m.19b. ˆ xi − nθ 3 λ = x−x(1) ¯ n (¯ − x(1) ) x 2 − 2n(¯ − x(1) ) x (¯ − x(1) ) x 3 = −n (¯ − x(1) ) x 2 < 0. When H0 is true. the MLE of θ (regardless of λ) is ˆ θ0 = 0 x(1) if x(1) > 0 if x(1) ≤ 0. 0 | x) x = ˆ ˆ L(λ. So the MLE of θ is θ = x(1) . ˆ ¯ n+m n+m xi + yj n i xi + j yj )/(n + m). λ | x) ˆ λ x ¯ n 1 L(¯.∞) (xi ) = λ i=1 n 1 λ n e−(Σi xi −nθ)/λ I[θ. y) = n i i j n+m xi + m j j yj m i xi + m j j yj j n xi exp − i xi j i xi m yj exp − yj yj = (n + m)n+m ( n n mm xi ) n yj n+m . L(θ. i xi + j yj And the LRT is to reject H0 if λ(x.∞) (x(1) ). y) ≤ c. So by an extension of Exercise 4. θ | x) x x (1/¯) e−n¯/¯ x n sup{(λ. c. yields λ0 = x. m). i Xi ∼ gamma(n.Second Edition 8-3 i ˆ Diﬀerentiation will show that in the numerator θ0 = ( ˆ ¯ denominator θ = x and µ = y . ˆ ˆ For x(1) > 0.θ):θ≤0} L(λ. b. while in the exp − i λ(x. substituting θ0 = 0 and maximizing with respect to λ. So rejecting if λ(x) ≤ c is equivalent to rejecting if x(1) /¯ ≥ c∗ . where a and b are constants that satisfy an (1 − a)m = bn (1 − b)m . (n + m) λ= n n mm n+m xi i xi + i n j j yj j m yj i xi + yj = (n + m) n n mm n+m T n (1 − T )m . as above. ˆ 1/λ n = = x x e−n(¯−x(1) )/(¯−x(1) ) x−x(1) ¯ x ¯ n = 1− x(1) x ¯ n . θ) and they are independent. 8. λ is a unimodal function of T which is maximized when n T = m+n .7 a.

For ﬁxed γ. 2 . the denominator is maximized at ¯ γ ˆγ = β xi /n. i xi ) e ˆ The numerator is maximized at β0 = x. 2πaθ = 1 1 (xi − θ)2 . which gives a quadratic in θ that leads to the restricted MLE ˆ θR = −1+ 1+4(ˆ +¯2 ) σ x 2 .8-4 Solutions Manual for Statistical Inference b. θ | x) log L(a. x ¯ (xi − x)2 = ¯ i=1 the ratio of the usual MLEs of the mean and variance. We will ﬁrst ﬁnd the MLEs of a and θ. − log(2πaθ) − 2 2aθ i=1 = i=1 n − − i=1 1 1 2 + (x − θ) 2a 2θa2 i = − n 1 + 2a 2θa2 n (xi − θ)2 i=1 set = 0 = 1 1 1 2 + (x − θ) + (xi − θ) 2θ 2aθ2 i aθ n n 1 = − + 2θ 2aθ2 (xi − θ)2 + i=1 n¯ − nθ x aθ set = 0. We omit it. say a and θ.γ (γ n /β n )( i γ−1 −Σi xγ /β . (Veriﬁcation that this is a maximum is lengthy. The denominator cannot be maximized in closed form. θ | x) Thus ∂log L ∂a ∂log L ∂θ n = i=1 n √ 2 1 e−(xi −θ) /(2aθ) . and a= ˆ 1 n¯ x n n n n(¯−θ) x + + = 0. we just solve the second equation. Numeric maximization could be used to compute the statistic for observed data x. ˆ Solve the ﬁrst equation for a in terms of θ to get a= 1 nθ n (xi − θ)2 .8 a. 2θ 2θ aθ σ2 ˆ .) For a = 1. The LRT statistic is λ(x) = supβ (1/β n )e−Σi xi /β supβ. Thus i λ(x) = x−n e−n ¯ ˆ supγ (γ n /β γ )( n i γ−1 ˆ −Σi xγ /β γ i = xi ) e x−n ¯ n n /β )( supγ (γ ˆγ i xi ) γ−1 . ˆ We have to solve these two equations simultaneously to get MLEs of a and θ. i=1 Substitute this into the second equation to get − ˆ ¯ So we get θ = x. We have n L(a. 8.

i=1 (xi − θ)2 − i=1 θ n n (xi − θ) = 0.Second Edition 8-5 Noting that aθ = σ 2 . θ = x and a= ˆ 1 n¯2 x n n n + +n θ θ i (xi −θ) 2 i (xi −θ) = 0. set a = 1 in the second equation to obtain ∂log L n 1 =− + 3 ∂θ θ θ Multiply through by θ3 /n to get −θ2 + 1 n n n (xi − θ)2 + i=1 1 θ2 n set (xi − θ) = 0. i=1 Substituting this into the second equation. Solving the ﬁrst equation for a in terms of θ yields a= 1 nθ2 n (xi − θ)2 . log L(a. In this case we have n n/2 n i=1 √ 1 ˆ 2π θR e−(xi −θR ) e−(xi −θ) ˆ 2 ˆ /(2θ R ) λ(x) = = n/2 n i=1 √1 ˆ ˆ 2 /(2ˆθ) aˆ 2πˆθ aˆ 2 e−Σi (xi −θR ) ˆ /(2θ R ) (1/(2πˆ 2 )) σ σ 2 /θR ˆ ˆ n/2 −Σi (x −¯)2 /(2ˆ 2 ) σ i x e e(n/2)−Σi (xi −θR ) ˆ 2 ˆ /(2θ R ) . 2 2aθ2 Thus ∂logL ∂a ∂logL ∂θ n = i=1 n − 1 1 2 + 2 2 (xi − θ) 2a 2a θ = − n 1 + 2 2 2a 2a θ n (xi − θ)2 i=1 set = 0. i=1 Add ±¯ inside the square and complete all sums to get the equation x −θ2 + σ 2 + (¯ − θ)2 + θ(¯ − θ) = 0. (xi − x)2 = ¯ i=1 σ2 ˆ x2 ¯ in the unrestricted case. = i=1 1 1 1 2 − + 3 (xi − θ) + 2 (xi − θ) θ aθ aθ n = − n 1 + 3 θ aθ (xi − θ)2 + i=1 1 aθ2 n (xi − θ) i=1 set = 0. θ | x) a ˆ ˆ 1/(2π θR ) = = b. θ | x) = i=1 1 1 − log(2πaθ2 ) − (xi − θ)2 . In the restricted case. ˆ x x . we obtain ˆˆ ˆ ˆ L(θR | x) L(ˆ. we get − ˆ ¯ So again.

If 2α is an integer. Then Y |S 2 ∼ gamma(γ. S2 < M − 2/β . The pdf of Xi is f (xi |λi ) = (λi /x2 )e−λi /xi . x ˆ ˆ if σ 2 > 1.23. (Note: If 2α is an integer.8-6 Solutions Manual for Statistical Inference This is a quadratic in θ with solution for the MLE ˆ θR = x + ¯ which yields the LRT statistic ˆ L(θR | x) λ(x) = = L(ˆ. ˆ . this is a χ2 2y+2α distribution. δ). 8. Note that M depends on only α and n. The MLE of λ under H0 is λ0 = Y i LRT statistic is bounded above by 1 and is given by 1≥ ¯ Y ( i −n −n Yi .11 a. b. From Example 7. the arithmetic-geometric mean inequality. Let Y = 2/(σ 2 δ). P (λ > λ0 |y) = 1 − P (λ ≤ λ0 |y). −1 ) e−n e 1/n ¯ Rearrangement of this inequality yields Y ≥ ( i Yi ) . the argument proceeds as in n/ [ i (1/Xi )]. From Exercise 7. and the MLE of λi under H1 is λi = Y −1 . ˆ L(¯. The 8. δ β that is. ˆ b.10 Let Y = i Xi . 2). σ 2 |x) is a unimodal function of σ 2 . 8. Then we have P (Y ≥ 2/δ|S 2 ) = P (σ 2 ≤ 1|S 2 ) > 1/2 if and only if M> 2 2 = (n − 1)S 2 + . if σ 2 ≤ 1. 2) 2γ distribution. for α = 5/2 and β = 2. the unrestricted MLEs are µ = X and σ 2 = (n − 1)S 2 /n. xi > 0. and the MLE of λi under H1 is λ part (a). (2(β + 1)λ/β)|y has a gamma(y + α.9 a. Because β/(β + 1) is a scale parameter in the posterior distribution. e−(xi −θR ) ˆ 2 ˆ /(2θ R ) ˆ2 2 √ 1 2πˆθ 2 aˆ ˆ2 a e−(xi −θ) /(2ˆθ ) = σ ˆ ˆR θ n e(n/2)−Σi (xi −θR ) ˆ 2 ˆ /(2θ R ) . Under H0 . θ | x) a ˆ n i=1 n i=1 1 ˆ2 2π θR x+4(ˆ +¯2 ) ¯ σ x 2 2. this is a χ2 distribution.) Let M denote the median of a gamma(γ. The MLE of λ under H0 is λ0 = i ˆ i = Xi .11. where γ = α + (n − 1)/2 and δ = [(n − 1)S 2 /2 + 1/β]−1 . ˆ ˆ ¯ −1 . P (λ ≤ λ0 |y) = (β+1) Γ(y+α)β y+α y+α λ0 0 ty+α−1 e−t(β+1)/β dt. Then because µ is still X. ˆ ˆ ¯ because this was the maximizing value of µ. regardless of σ 2 . and is 1. Now. not on S 2 or β. the restricted MLE of σ 2 is σ 2 . P (λ ≤ λ0 |y) = P 2(β+1)λ0 2(β+1)λ ≤ y β β = P (χ2 2y+5 ≤ 3λ0 ). β/(β + 1)).2. The posterior distribution of λ|y is gamma (y + α. 2) distribution. So. the posterior distribution of σ 2 given S 2 is IG(γ. n−1 ¯ b. So the LRT statistic is ˆ λ(x) = 1 2 n/2 σ2 (ˆ ) e−n(ˆ −1)/2 σ if σ 2 ≤ 1 ˆ if σ 2 > 1. a.

ˆ ∂ ∂(ˆ ) σ 2 log λ(x) = n 2 1 −1 σ2 ˆ < 0.05 take c = 1. The power function is √ √ β(µ) = P −1. 8. ˆ where k is a constant. but φ2 is more powerful for larger θs. and for α = .3. b. HA : µ = 0 the LRT is to reject H0 if |¯| > cσ/ n (Example 8.96. (M − 2/β)/(n − 1) varies between −∞ and M/(n − 1). We can pick the prior parameters so that the acceptance regions match in this way. H1 : µ > 0 the LRT is to reject H0 if x > cσ/ n (Example 8. as β varies between 0 and ∞. we get 0 if θ ≤ −. 2 2 Setting this equal to α and solving for C gives C = 2 − √ C = 2 − . The size of φ2 is α2 = P (X1 +X2 > C|θ = 0).Second Edition 8-7 We have that. .1 ≈ 1. The LRT accepts H0 if and only if S 2 < k. For ¯ α = . For x α = .645− √ . Then. For H0 : µ ≤ 0 vs.95|θ = 0) = . we obtain the power function for the second test as  0  (2θ + 2 − C)2 /2 β2 (θ) = Pθ (Y > C) = 2   1 − (C − 2θ) /2 1 if if if if θ ≤ (C/2) − 1 (C/2) − 1 < θ ≤ (C − 1)/2 (C − 1)/2 < θ ≤ C/2 C/2 < θ. = P Z > 1.05 if −.05 θ + . √ b.645. So λ(x) is decreasing in σ 2 . given by fY (y|θ) = y − 2θ 2θ + 2 − y 0 if 2θ ≤ y < 2θ + 1 if 2θ+1 ≤ y < 2θ + 2 otherwise. for σ 2 > 1. For H0 : µ = 0 vs. If 1 ≤ C ≤ 2.96σ/ n gives power of approximately .3).05 < θ ≤ . The power function is √ ¯ µ nµ X−µ √ > 1. this is 1 1 α2 = P (X1 + X2 > C|θ = 0) = 1−C C−x1 1 dx2 dx1 = √ (2 − C) .645σ/ n.96 − nµ/σ ≤ Z ≤ 1. for some choice of β.95 1 if . From the graph it is clear that φ1 is more powerful for θ near 0. √ In this case.05. large values of S 2 .5 when µ = 1. For the ﬁrst test we have the power function β1 (θ) = Pθ (X1 > . c.2). and rejecting H0 for small values of λ(x) is equivalent to rejecting ˆ for large values of σ 2 . √ 8. µ = ±1.95) = 2α.12 a.96 + nµ/σ .645− β(µ) = P σ σ/ n σ/ n √ Note that the power will equal . φ2 is not uniformly more powerful than φ1 . First. that is.5.05 take c = 1. Using the distribution of Y = X1 + X2 . The size of φ1 is α1 = P (X1 > . pick α large enough that M/(n − 1) > k.13 a.05.2. (M − 2/β)/(n − 1) = k and the acceptance regions match. So.68.95 < θ.

n. To determine c to obtain a speciﬁed 2 2 2 Xi /σ0 > c/σ0 2 = P χ2 > c/σ0 . 8. The ﬁrst set is the rejection region for φ2 . But for 0 < θ < C − 1. The likelihood function is µ−1 ⇒ ⇒ Type I error = 1. Size = P (reject H0 | H0 is true) = 1 Power = P (reject H0 | HA is true) = 1 b.51) = . this is equivalent to rejecting if x2 > i i 2log (k (σ1 /σ0 ) ) 1 2 σ0 n − 1 2 σ1 =c because 1 1 2 − σ2 > 0 .49) = −2. x2 ) : x1 > 1} {(x1 .51) n(.51) c−n(. because if θ = 0.8-8 Solutions Manual for Statistical Inference d.α 8. 1).49) n(. The test with this rejection region has the same size as φ2 because the last two sets both have probability 0 if θ = 0. we should reject H0 . θ|x. n 2 2 so we must have c/σ0 = χ2 . The power function of this test is strictly larger than β2 (θ).33.α n. use the fact that i Xi /σ0 ∼ χ2 .15 From the Neyman-Pearson lemma the UMP test rejects H0 if 2 f (x | σ1 ) (2πσ1 ) e−Σi xi /(2σ1 ) = = 2 2 −n/2 e−Σi x2 /(2σ0 ) f (x | σ0 ) i (2πσ0 ) −n/2 2 2 σ0 σ1 n exp 1 2 x2 i i 1 1 2 − σ2 σ0 1 >k for some k ≥ 0. σ0 1 This is the UMP test of size α. ⇒ ⇒ Type I error = 0. we need to ﬁnd c and n to satisfy P We thus want Z> c−n(. this test and φ2 have the same power. = 2. Thus n α = Pσ0 i i 2 Xi > c).33 and c−n(. For a test that rejects H0 when i Xi > c.5.49) = . x2 ) : x2 > 1}. Type II error = 0. P (Xi < 1) = 1. Solving these equations gives n = 13.51)(. If either X1 ≥ 1 or X2 ≥ 1. Type II error = 1. where α = Pσ0 ( 2 2 α. which means c = σ0 χ2 .51) n(. Size = P (reject H0 | H0 is true) = 0 Power = P (reject H0 | HA is true) = 0 8.14 The CLT tells us that Z = ( i Xi − np)/ np(1 − p) is approximately n(0. After some algebra.567 and c = 6.783. 8. x2 ) : x1 + x2 > C} {(x1 .49) n(.16 a.49)(.  θn  j θ−1 yj  .51)(. L(µ. consider the rejection region given by {(x1 . Thus.17 a.01 and P Z> c−n(. y) = µn i xi .99.49)(. If C − 1 ≤ θ.

a simple generalization of Exercise 4. the likelihood is  L(θ|x. µ and θ0 yields ( λ(x. Substituting in the formulas for θ. i log xi + j log yj   j θ0 −θ ˆ ˆ yj  . Therefore. yields the MLEs µ=− ˆ Under H0 . 1/µ) and V ∼ gamma(m. Simple transformations yield − log Xi ∼ exponential(1/µ) and − log Yi ∼ exponential(1/θ). y) = ˆ ˆ ˆm+n θ0 θn θm = 0 0 = µn θm ˆ ˆ µn θm ˆ ˆ m+n m m i xi ) ˆ µ θ0 −ˆ ˆ ˆ θ0 −θ j yj n = 1 and m+n n (1 − T )m T n .18 a. . This is a unimodal function of T . y) ≤ c is equivalent to rejecting if T ≤ c1 or T ≥ c2 . Under H0 . 1/θ). y) = θn+m  i n i log xi ˆ and θ = − m . T = W/(W + V ) where W and V are independent. β(θ) ¯ ¯ |X−θ0 | |X−θ0 | √ >c √ ≤c = 1 − Pθ σ/ n σ/ n cσ cσ ¯ = 1 − Pθ − √ ≤ X−θ0 ≤ √ n n √ √ ¯ −cσ/ n + θ0 −θ X−θ cσ/ n + θ0 −θ √ √ ≤ √ = 1 − Pθ ≤ σ/ n σ/ n σ/ n θ0 −θ θ0 −θ = 1 − P −c + √ ≤ Z ≤ c + √ σ/ n σ/ n θ0 −θ θ0 −θ = 1 + Φ −c + √ −Φ c+ √ . where c1 and c2 are appropriately chosen constants. by diﬀerentiating the log-likelihood. c. σ/ n σ/ n = Pθ and (1 − c1 )m cn = (1 − c2 )m cn . 1) and Φ is the standard normal cdf. j log yj θ−1 xi j yj  . ˆ θ0 = − The LRT statistic is ˆ θm+n λ(x. xi i ˆ ˆ ˆ b. m). the scale parameters of W and V are equal. y) = 0 µn θm ˆ ˆ ˆ µ θ0 −ˆ n+m .Second Edition 8-9 Maximizing.19b yields T ∼ beta(n. W ∼ gamma(n. The constants c1 and c2 are determined by the two equations P (T ≤ c1 ) + P (T ≥ c2 ) = α 8. So rejecting if λ(x. and maximizing as above yields the restricted MLE. Then. 1 2 where Z ∼ n(0.

1 (1/θ)−1 −y1/θ y e . a test i that rejects if f (y|1/4)/f (y|1/2) > k is UMP of its size. The size is .675 implies n = 6.25c. 8. Let Y = Xi ∼ binomial(10. 1. So the test that rejects if Y ≤ 2 is the UMP size α = .20 By the Neyman-Pearson Lemma. we compute d dy 1 −1/2 y−y1/2 y e 2 = 1 −3/2 y−y1/2 y e 2 y− y 1/2 1 − 2 2 which is negative for y < 1 and positive for y > 1.96. Hence.84 The ratio is decreasing in x. P (Y ≤ 2|p = 1/2) = . f (c0 |1) f (c1 |1) Solving these two equations numerically.0547 test. the suﬃcient statistic. and rejecting for large value of the ratio is equivalent to rejecting for small values of y. 7|H1 ) = . yields c0 = . The power of the test is P (Y ≤ 2|p = 1/4) ≈ .21 The proof is the same with integrals replaced by sums.96 − n). Thus f (y|2)/f (y|1) is decreasing for y ≤ 1 and increasing for y ≥ 1.526. the UMP test will reject if f (y|θ) = f (y|2) 1 −1/2 y−y1/2 y e = > k.22 a.675) ≈ .10.04 test. From Corollary 8. . To obtain a size α test. the UMP test rejects for large values of f (x|H1 )/f (x|H0 ). The value c = 4 gives the UMP size α = .13. rejecting for f (y|2)/f (y|1) > k is equivalent to rejecting for y ≤ c0 or y ≥ c1 . p) and let f (y|p) denote the pmf of Y .. Computing this ratio we obtain x f (x|H1 ) f (x|H0 ) 1 6 2 3 4 5 6 7 5 4 3 2 1 . y > 0. θ By the Neyman-Pearson Lemma.0547. So rejecting for large values of f (x|H1 )/f (x|H0 ) corresponds to rejecting for small values of x.637798.19 The pdf of Y is √ n = −. By Exercise 8.8-10 Solutions Manual for Statistical Inference b. To get α = . we need to choose c so that P (X ≤ c|H0 ) = α.25 implies 1.609824.3.076546 and c1 = 3.3. The Type II error probability is c1 P (c0 < Y < c1 |θ = 2) = c0 1/2 1 −1/2 −y1/2 y e dy = −e−y 2 c1 = . ≈0 Φ(−. Trying values c = 0. By Corollary 8. . The power (1 − type II error) is √ √ √ √ .0547. To get a size α test.13 we can base the test on i Xi . we ﬁnd that for c = 2. 8.96− n) −Φ(1. c0 8. The Type II error probability is P (X = 5.82. the ratio f (y|1/2)/f (y|1/4) is increasing in y. 2 f (y|1) To see the form of this rejection region. 6.0547.75 ≤ β(θ0 + σ) = 1 + Φ(−c − n) − Φ(c − n) = 1 + Φ(−1. .943 ≈ 7. .05 = β(θ0 ) = 1 + Φ(−c) − Φ(c) which implies c = 1. for α = . So the ratio f (y|1/4)/f (y|1/2) is decreasing in y. we must ﬁnd c such that P (Y ≤ c|p = 1/2) = .96 − 8. the constants c0 and c1 must satisfy α = P (Y ≤ c0 |θ = 1) + P (Y ≥ c1 |θ = 1) = 1 − e−c0 + e−c1 and f (c0 |2) f (c1 |2) = .

) . For θ2 > θ1 .23 a.Second Edition 10 k 10−k 8-11 10 1 1 b. 1024 . so the most powerful α level test is reject H0 if X > 1 − α.25 a. Thus the ratio is increasing in x. The test is Reject H0 if X > 1/2. c. where i is an integer. The LRT rejects H0 if λ(x) < c. By the Karlin-Rubin Theorem. an increasing function of x because θ2 > θ1 . the LRT statistic is λ(x) = L(θ0 |x) = max{L(θ0 |x).377. n−x x θ1 (1−θ2 ) 1 − θ1 g(x | θ1 ) x θ1 (1−θ1 ) Both θ2 /θ1 > 1 and (1 − θ1 )/(1 − θ2 ) > 1. 8. H1 : θ = θ1 . σ 2 ) have MLR. There is a nonrandomized UMP test for all α levels corresponding to the probabilities P (Y ≤ i|p = 1/2).24 For H0 : θ = θ0 vs. α can have any of the values 0. We have 1 1 α = sup β(θ) = β(1) = θ∈Θ0 k/2 fX (x|1) dx = k/2 1 k x1−1 (1 − x)1−1 dx = 1 − . So the power function is 1 β(θ) = Pθ (X > 1/2) = 1/2 1 Γ(θ+1) θ−1 x (1 − x)1−1 dx = θ xθ Γ(θ)Γ(1) θ θ 1 =1− 1/2 1 . 8. and the family has MLR. and the family has MLR because log θ is increasing in θ. 1 11 56 176 386 638 848 968 1013 1023 1024 . 1) 2 Thus 1 − k/2 = α. the test that rejects H0 if X > t is the UMP test of its size. b. 2 2 2 2 2 2 g(x | θ2 ) e−(x−θ2 ) /2σ = −(x−θ )2 /2σ2 = ex(θ2 −θ1 )/σ e(θ1 −θ2 )/2σ . β(1. For θ2 > θ1 . For n = 10. Γ(2)Γ(1) Thus. in practice the two tests are often the same.27). the tests will not be the same. For example. x g(x | θ1 ) e θ1 /x! θ1 which is increasing in x because θ2 /θ1 > 1. 1024 . Substituting the beta pdf gives f (x|2) = f (x|1) 1 2−1 (1 β(2. The Neyman-Pearson test rejects H0 if f (x|θ1 )/f (x|θ0 ) = L(θ1 |x)/L(θ0 |x) > k. 1024 . So this family has MLR. Because c is usually chosen to be small (k large) to get a small size α. the ratio is increasing in x. For θ2 > θ1 . L(θ1 |x)} 1 if L(θ0 |x) ≥ L(θ1 |x) L(θ0 |x)/L(θ1 |x) if L(θ0 |x) < L(θ1 |x). 1024 . 2θ The size is supθ∈H0 β(θ) = supθ≤1 (1 − 1/2 ) = 1 − 1/2 = 1/2. By the argument in part (b). We now use the α level to determine k. the LRT. c.1) x 1 1−1 (1 β(1. H1 : θ = 2 is given by Reject H0 if f (x | 2)/f (x | 1) > k for some k ≥ 0. n−x x n n x g(x | θ2 ) θ2 (1−θ1 ) 1 − θ2 x θ (1−θ2 ) = n 2 = . The size of the test is P (Y ≥ 6|p = 1/2) = ≈ . 1 g(x | θ1 ) e Because θ2 − θ1 > 0. f (x|θ2 )/f (x|θ1 ) = (θ2 /θ1 )xθ2 −θ1 . the most powerful test of H0 : θ = 1 vs.1) x − x) − x) 1−1 1−1 = Γ(3) x = 2x. use t = 1 − α to get size α. 8. Thus the Poisson(θ) family has an MLR. But if c ≥ 1 or k ≤ 1. For θ2 > θ1 . So the families of n(θ. x g(x | θ2 ) e−θ2 θx /x! θ2 = −θ1 2 = eθ1 −θ2 . 1024 . the MP test is Reject H0 if X > k/2. (Note: You can also use the fact that an exponential family h(x)c(θ) exp(w(θ)x) has MLR if w(θ) is increasing in θ (Exercise 8. If k = 1/c > 1. By the Neyman-Pearson Lemma. 1024 . 1024 . the Poisson(θ) pmf is e−θ exp(x log θ)/x!. 1024 . and 1. b. 1024 . The power k=6 k 2 2 10 10 k 10−k function is β(θ) = k=6 k θ (1 − θ) c. this is equivalent to L(θ0 |x)/L(θ1 |x) < c.

location families are stochastically increasing in their location param2 eter. = 2 = 2 2 f (x|θ1 ) 1+(x − θ2 ) 1 + (1+θ2 ) /x − 2θ2 /x The limit of this ratio as x → ∞ or as x → −∞ is 1. c. 1+ek −1 2 . this ratio is increasing in x. which is zero. Examination of the derivative shows that f (x|1)/f (x|0) is decreasing √ √ √ for x ≤ (1 − 5)/2√ −. By the Neyman-Pearson Lemma. c(θ2 ) [w(θ2 )−w(θ1 )]t g(t|θ2 ) e = g(t|θ1 ) c(θ1 ) which is increasing in t because w(θ2 ) − w(θ1 ) > 0. But it is also true for discrete MLR families. 1 . Thus this inequality is equivalent to rejecting if x > k .8-12 Solutions Manual for Statistical Inference 8. Thus. the family does not have MLR. So the ratio cannot be monotone increasing (or decreasing) between −∞ and ∞. Thus the function in square brackets is maximized by its value at ∞ or −∞.2. the likelihood ratio is f (x|θ2 ) 1+ex−θ1 = eθ1 −θ2 f (x|θ1 ) 1+ex−θ2 The derivative of the quantity in brackets is d 1+ex−θ1 ex−θ1 − ex−θ2 = 2 . The Karlin-Rubin Theorem is satisﬁed. From Exercise 3.27 For θ2 > θ1 . for some constant k. Thus α = 1 − F (k |0) = 1 1+ek and β = F (k |1) = ek −1 . The best test is to reject H0 if f (x|1)/f (x|0) > k. and is = decreasing for (1 + 5)/2 ≤ x.29 a. So rejecting if f (x|1)/f (x|0) > 2 is equivalent to rejecting if 1 < x < 3. For a speciﬁed α. For θ1 > θ2 .26 a. the given test is UMP of its size. Examples include n(θ. This family has MLR. k ≈ 1. f (1|1)/f (1|0) = f (3|1)/f (3|0) = 2. Thus. so the test is UMP of its size. so the location Cauchy family with pdf f (x|θ) = (π[1+(x−θ) ])−1 is stochastically increasing. 8. hence. So for α = . 8.618. the ratio is increasing. Let θ2 > θ1 . The cdf of this logistic is F (x|θ) = ex−θ (1 + ex−θ ).28 a. Furthermore.618. The size of the test is 3 2 2 2 P (1 < X < 3|θ = 0) = 1 1 1 1 2 dx = π arctanx π 1+x 3 ≈ . beta(θ. the ratio on the right-hand side is increasing. ex−θ1 > ex−θ2 . dx 1+ex−θ2 (1+ex−θ2 ) Because θ2 > θ1 . and. Then f (x|θ2 ) 1 + (1+θ1 ) /x − 2θ1 /x 1+(x − θ1 ) . 8. From part (a). We will prove the result for continuous distributions. k = log(1 − α)/α. b. we must show F (x|θ1 ) ≤ F (x|θ2 ). The family does not have MLR.1476. so the derivative can only change sign from negative to positive showing that any interior extremum is a minimum.42. 1).595. b. 1). and Bernoulli(θ).386 and β ≈ . is increasing for (1 − 5)/2 ≤ x ≤ (1 + 5)/2 = 1. f (x|θ2 ) Because f has MLR. For θ2 > θ1 . b. a test will be UMP if it rejects when f (x|1)/f (x|0) > k. Now d [F (x|θ1 ) − F (x|θ2 )] = f (x|θ1 ) − f (x|θ2 ) = f (x|θ2 ) dx f (x|θ1 ) −1 .

which is positive. 8. By the Karlin-Rubin Theorem.4. because i Xi is suﬃcient and i Xi ∼ Poisson(nλ) which has MLR. 1 c.90. Diﬀerentiating as above.05. b. So 1 α = P (Y1 ≥ k|0) = k n(1 − y1 )n−1 dy1 = (1 − k)n .4 and 5. Because f (x|θ) = (θ/π)(θ2 + |x|2 )−1 . θ < y1 < θ + 1. b. 8.3.6476. We wi