CATEGORICAL DATA ANALYSIS

Solutions to Selected Odd-Numbered Problems
Alan Agresti
Version March 15, 2006, © Alan Agresti 2006

This manual contains solutions and hints to solutions for many of the odd-numbered
exercises in Categorical Data Analysis, second edition, by Alan Agresti (John Wiley &
Sons, 2002).
Please report errors in these solutions to the author (Department of Statistics,
University of Florida, Gainesville, Florida 32611-8545, e-mail AA@STAT.UFL.EDU), so they
can be corrected in future revisions of this site. The author regrets that he cannot provide
students with more detailed solutions or with solutions of other problems not in this file.

Chapter 1

1. a. nominal, b. ordinal, c. interval, d. nominal, e. ordinal, f. nominal, g. ordinal.
3. π varies from batch to batch, so the counts come from a mixture of binomials rather
than a single bin(n, π). Var(Y) = E[Var(Y | π)] + Var[E(Y | π)] > E[Var(Y | π)] =
E[nπ(1 − π)].
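
The variance decomposition in problem 3 can be checked exactly on a small case. The sketch below is only an illustration: the two-point mixing distribution for π (values 0.2 and 0.6, each with probability 1/2) and n = 10 are hypothetical choices, not part of the exercise.

```python
from math import comb

n = 10
# Hypothetical mixing distribution for pi: {value: probability}.
mixing = {0.2: 0.5, 0.6: 0.5}

def binom_pmf(y, n, pi):
    """Binomial probability of y successes in n trials."""
    return comb(n, y) * pi**y * (1 - pi) ** (n - y)

# Marginal pmf of Y under the mixture of binomials.
pmf = [sum(w * binom_pmf(y, n, pi) for pi, w in mixing.items())
       for y in range(n + 1)]

mean = sum(y * p for y, p in enumerate(pmf))
var_y = sum((y - mean) ** 2 * p for y, p in enumerate(pmf))

# The two pieces of Var(Y) = E[Var(Y | pi)] + Var[E(Y | pi)].
expected_binomial_var = sum(w * n * pi * (1 - pi) for pi, w in mixing.items())
var_of_cond_mean = sum(w * (n * pi - mean) ** 2 for pi, w in mixing.items())

print(var_y, expected_binomial_var, var_of_cond_mean)
```

For these assumed values the marginal variance exceeds E[nπ(1 − π)] by exactly Var(nπ), which is the overdispersion the solution describes.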
5. π̂ = 842/1824 = .462, so z = (.462 − .5)/√[.5(.5)/1824] = −3.28, for which P = .001
for Ha: π ≠ .5. The 95% Wald CI is .462 ± 1.96√[.462(.538)/1824] = .462 ± .023, or (.439,
.485). The 95% score CI is also (.439, .485).
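
The arithmetic in problem 5 can be reproduced with a few lines of standard-library Python. This is a sketch, not the author's code: the function name `one_sample_proportion` is made up for illustration, and the score interval uses the usual closed-form (Wilson) expression with z = 1.96.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_proportion(y, n, pi0=0.5, z_crit=1.96):
    """Score test of H0: pi = pi0, plus Wald and score (Wilson) intervals."""
    p_hat = y / n
    # Score statistic uses the null standard error.
    z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # Wald interval uses the estimated standard error.
    half = z_crit * sqrt(p_hat * (1 - p_hat) / n)
    wald = (p_hat - half, p_hat + half)
    # Score interval: closed-form solution of |z(pi0)| <= z_crit.
    denom = 1 + z_crit**2 / n
    center = (p_hat + z_crit**2 / (2 * n)) / denom
    margin = z_crit * sqrt(p_hat * (1 - p_hat) / n
                           + z_crit**2 / (4 * n**2)) / denom
    score = (center - margin, center + margin)
    return p_hat, z, p_value, wald, score

p_hat, z, p_value, wald, score = one_sample_proportion(842, 1824)
print(round(p_hat, 3), round(z, 2), round(p_value, 4), wald, score)
```

With n this large the two intervals agree to three decimals, which is why the solution reports the same endpoints for both.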
7. a. ℓ(π) = π^20, so π̂ = 1.0.
b. Wald statistic z = (1.0 − .5)/√[1.0(0)/20] = ∞. Wald CI is 1.0 ± 1.96√[1.0(0)/20] =
1.0 ± 0.0, or (1.0, 1.0).
c. z = (1.0 − .5)/√[.5(.5)/20] = 4.47, P < .0001. Score CI is (0.839, 1.000).
d. Test statistic 2(20) log(20/10) = 27.7, df = 1. From problem 1.25a, the CI is
(exp(−1.96^2/40), 1) = (0.908, 1.0).
e. P-value = 2(.5)^20 = .00000191. Clopper-Pearson CI is (0.832, 1.000). CI using Blaker
method is (0.840, 1.000).
f. n = 1.96^2(.9)(.1)/(.05)^2 = 138.
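
Most of the numbers in problem 7 have closed forms when y = n = 20, so they can be checked directly. The sketch below assumes z = 1.96 and α = .05 and uses variable names chosen for illustration. The Blaker interval has no closed form and is not computed here.

```python
from math import exp, log, sqrt

n, z, alpha = 20, 1.96, 0.05

# (c) Score statistic for H0: pi = 0.5 when pi_hat = 1.0.
score_z = (1.0 - 0.5) / sqrt(0.5 * 0.5 / n)

# (c) Lower score limit when pi_hat = 1: solving the score equation
#     gives n / (n + z^2); the upper limit is 1.
score_lower = 1 / (1 + z**2 / n)

# (d) Likelihood-ratio statistic 2 n log(pi_hat / pi0) and the LR lower limit.
lr_stat = 2 * n * log(1.0 / 0.5)
lr_lower = exp(-z**2 / (2 * n))

# (e) Exact two-sided P-value and the Clopper-Pearson lower limit,
#     which solves pi^n = alpha / 2.
exact_p = 2 * 0.5**n
cp_lower = (alpha / 2) ** (1 / n)

# (f) Sample size for margin of error .05 when pi = .9.
n_needed = z**2 * 0.9 * 0.1 / 0.05**2

print(score_z, score_lower, lr_stat, lr_lower, exact_p, cp_lower, n_needed)
```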
9. The sample mean is 0.61. Fitted probabilities for the truncated distribution are 0.543,
0.332, 0.102, 0.021, 0.003. The estimated expected frequencies are 108.5, 66.4, 20.3, 4.1,
and 0.6, and the Pearson X^2 = 0.7 with df = 3 (0.3 with df = 2 if one truncates at 3
and above).

11. Var(π̂) = π(1 − π)/n decreases as π moves toward 0 or 1 from 0.5.
13. This is the binomial probability of y successes and k − 1 failures in y + k − 1 trials
times the probability of a failure at the next trial.
15. For binomial, m(t) = E(e^(tY)) = Σ_y (n choose y)(πe^t)^y (1 − π)^(n−y) = (1 − π + πe^t)^n,
so m′(0) = nπ.
17. a. ℓ(µ) = exp(−nµ)µ^(Σ yi), so L(µ) = −nµ + (Σ yi) log(µ) and L′(µ) = −n +
(Σ yi)/µ = 0 yields µ̂ = (Σ yi)/n.
b. (i) zw = (ȳ − µ0)/√(ȳ/n), (ii) zs = (ȳ − µ0)/√(µ0/n), (iii) −2[−nµ0 + (Σ yi) log(µ0) +
nȳ − (Σ yi) log(ȳ)].
c. (i) ȳ ± zα/2 √(ȳ/n), (ii) all µ0 such that |zs| ≤ zα/2, (iii) all µ0 such that LR statistic
≤ χ^2_1(α).
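
The three Poisson statistics in problem 17b can be sketched as a small function. The function name `poisson_tests` and the sample `counts` are hypothetical and serve only to show the formulas in use. The sample mean must be positive for the Wald and LR statistics to be defined.

```python
from math import log, sqrt

def poisson_tests(y, mu0):
    """Wald, score, and likelihood-ratio statistics for H0: mu = mu0."""
    n = len(y)
    total = sum(y)
    ybar = total / n  # ML estimate of mu; must be > 0 here
    z_wald = (ybar - mu0) / sqrt(ybar / n)   # SE evaluated at the ML estimate
    z_score = (ybar - mu0) / sqrt(mu0 / n)   # SE evaluated at the null value
    # -2 [L(mu0) - L(ybar)] with L(mu) = -n mu + (sum y) log(mu)
    lr = -2 * (-n * mu0 + total * log(mu0) + n * ybar - total * log(ybar))
    return ybar, z_wald, z_score, lr

counts = [2, 3, 1, 4, 2, 0, 3, 1]  # hypothetical data, mean 2
ybar, z_wald, z_score, lr = poisson_tests(counts, mu0=1.0)
print(ybar, z_wald, z_score, lr)
```

The Wald and score statistics differ only in where the standard error is evaluated, which is also what distinguishes the intervals in part c.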
19. a. No outcome can give P ≤ .05, and hence one never rejects H0.
b. When T = 2, mid P-value = .04 and one rejects H0. Thus, P(Type I error) = P(T =
2) = .08.
c. P-values of the two tests are .04 and .02; P(Type I error) = P(T = 2) = .04 with both
tests.
d. P(Type I error) = E[P(Type I error | T)] = (5/8)(.08) = .05.
21. a. With the binomial test the smallest possible P-value, from y = 0 or y = 5, is
2(1/2)^5 = 1/16. Since this exceeds .05, it is impossible to reject H0, and thus P(Type I
error) = 0. With the large-sample score test, y = 0 and y = 5 are the only outcomes to
give P ≤ .05 (e.g., with y = 5, z = (1.0 − .5)/√[.5(.5)/5] = 2.24 and P = .025). Thus, for
that test, P(Type I error) = P(Y = 0) + P(Y = 5) = 1/16.
b. For every possible outcome the Clopper-Pearson CI contains .5; e.g., when y = 5, the
CI is (.478, 1.0), since for π0 = .478 the binomial probability of y = 5 is .478^5 = .025.
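
The comparison in problem 21 can be verified by enumerating all six outcomes for n = 5. The sketch below makes one assumption worth flagging: the exact two-sided P-value is taken as twice the smaller tail probability (capped at 1), which matches the 2(1/2)^5 quoted in the solution. Variable and function names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, pi0, alpha = 5, 0.5, 0.05
pmf = [comb(n, y) * pi0**y * (1 - pi0) ** (n - y) for y in range(n + 1)]

def exact_p(y):
    """Two-sided exact P-value: twice the smaller tail (an assumed convention)."""
    lower = sum(pmf[: y + 1])
    upper = sum(pmf[y:])
    return min(1.0, 2 * min(lower, upper))

def score_p(y):
    """Two-sided P-value for the large-sample score test."""
    z = (y / n - pi0) / sqrt(pi0 * (1 - pi0) / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Actual P(Type I error): null probability of the outcomes that reject H0.
exact_size = sum(pmf[y] for y in range(n + 1) if exact_p(y) <= alpha)
score_size = sum(pmf[y] for y in range(n + 1) if score_p(y) <= alpha)

# Clopper-Pearson lower limit when y = n solves pi^n = alpha / 2.
cp_lower = (alpha / 2) ** (1 / n)

print(exact_size, score_size, cp_lower)
```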
23. For π just below .18/n, P(CI contains π) = P(Y = 0) = (1 − π)^n = (1 − .18/n)^n ≈
exp(−.18) = 0.84.
25. a. The likelihood-ratio (LR) CI is the set of π0 for testing H0: π = π0 such
that LR statistic = −2 log[(1 − π0)^n/(1 − π̂)^n] ≤ (zα/2)^2, with π̂ = 0.0. Solving for π0,
n log(1 − π0) ≥ −(zα/2)^2/2, or (1 − π0) ≥ exp(−(zα/2)^2/2n), or π0 ≤ 1 − exp(−(zα/2)^2/2n).
Using exp(x) = 1 + x + ... for small x, the upper bound is roughly
1 − (1 − (z.025)^2/2n) = (z.025)^2/2n = 1.96^2/2n ≈ 2^2/2n = 2/n.
b. Solve for (0 − π)/√[π(1 − π)/n] = −zα/2.
c. Upper endpoint is solution to π0^0 (1 − π0)^n = α/2, or (1 − π0) = (α/2)^(1/n), or
π0 = 1 − (α/2)^(1/n). Using the expansion exp(x) ≈ 1 + x for x close to 0, (α/2)^(1/n) =
exp{log[(α/2)^(1/n)]} ≈ 1 + log[(α/2)^(1/n)], so the upper endpoint is
≈ 1 − {1 + log[(α/2)^(1/n)]} = −log[(α/2)^(1/n)] = −log(.025)/n = 3.69/n.
d. The mid P-value when y = 0 is half the probability of that outcome, so the upper
bound for this CI sets (1/2)π0^0 (1 − π0)^n = α/2, or π0 = 1 − α^(1/n).
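
The four upper bounds in problem 25 for y = 0 are easy to compare numerically. The sketch below uses n = 100 purely as an illustrative choice, with α = .05 and z = 1.96. The closed form for part b, z^2/(n + z^2), is obtained here by squaring and solving the stated equation and is not written out in the solution itself.

```python
from math import exp, log

n, alpha, z = 100, 0.05, 1.96   # n is a hypothetical sample size

# (a) Likelihood-ratio upper bound and its 2/n approximation.
lr_upper = 1 - exp(-z**2 / (2 * n))
lr_approx = 2 / n

# (b) Score upper bound: pi / sqrt(pi (1 - pi) / n) = z gives pi = z^2 / (n + z^2).
score_upper = z**2 / (n + z**2)

# (c) Clopper-Pearson upper bound and its -log(alpha / 2) / n approximation.
cp_upper = 1 - (alpha / 2) ** (1 / n)
cp_approx = -log(alpha / 2) / n

# (d) Mid-P upper bound.
midp_upper = 1 - alpha ** (1 / n)

print(lr_upper, lr_approx, score_upper, cp_upper, cp_approx, midp_upper)
```

For this n the approximations 2/n and 3.69/n are each within about 0.001 of the exact bounds.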
29. The right-tail mid P-value equals P(T > t_o) + (1/2)p(t_o) = 1 − P(T ≤ t_o) +
(1/2)p(t_o) = 1 − F_mid(t_o).

(Note similarity to relative risks. 3.. 7. Heart disease. 33.83 P(V=w) + . e.5. if N = no.83) = 76.001304/. Relative risk: Lung cancer. which is m(t) = (1 − 2t)−(ν1 +ν2 )/2 . 11. Sensitivity = P (+|C) = 1 − P (−|C) = 3/4. which simplifies to n(1 + π)/π(1 q − π). That is.62. X 2 = (n1 − nπ0 )2 /nπ0 + (n2 − n(1 − π2 ))2 /n(1 − π0 ) = n[(π̂ − π0 )2 (1 − π0 ) + ((1 − π̂) − (1 − π0 ))2 π0 ]/π0 (1 − π0 ). P (−|C) = 1/4.02 times those for nonsmokers. then the mgf of Y1 + Y2 is the product of the mgfs. relative risk. Then.0085. It is unclear from the wording. and denote the null probabilities in the two categories by π0 and (1 − π0 ). the odds of dying from lung cancer for smokers are estimated to be 14. this happens when the proportion in the first category is close to zero.998696)/(. the relative risk of fatality for ‘none’ is 7. 5. a. Specificity = P (−|C̄) = 1 − P (+|C̄) can’t be determined from information given. which equals (π̂ − π0 )2 /[π0 (1 − π0 )/n] = zS2 .g. Applying Bayes theorem. which is the mgf of a χ2 with df = ν1 + ν2 .3. b. P (V = w|M = w) = P (M = w|V = w)P (V = w)/[P (M = w|V = w)P (V = w) + P (M = w|V = b)P (V = b)] = . Heart disease. 3 31. (. (Cigarette smok- ing seems more highly associated with heart disease) Odds ratio: Lung cancer. which is 2nπ 2 /π 2 + nπ(1 − π)/π 2 + nπ(1 − π)/(1 − π)2 + n(1 − π)/(1 − π)2 . The proportion of fatal injuries is close to zero for each row. a.62.17/. c.00256. 9. but presumably this means that P (C̄|+) = 2/3. since difference of proportions makes it appear there is no association. 1994 probability of gun-related death in U.5. .S.00130.999879) = 10.0012. so the odds ratio is similar to the relative risk. Difference of proportions describes excess deaths due to smoking.965.06)/(. was 34. If Y1 is χ2 with df = ν1 and if Y2 is independent χ2 with df = ν2 . (Cigarette smoking seems more highly associated with lung cancer) Difference of proportions: Lung cancer.4.g. The odds ratio is θ̂ = 7.06 P(V=b)]. 10.. 11. 1. 34. 5. 
we predict there would be . Odds ratio = (. We need to know the relative numbers of victims who were white and black. the information is its negative expected value. Relative risks are 3. Heart disease. difference of proportions = . The asymptotic standard error is the square root of the inverse information.78. Chapter 2 1. 35. Since ∂ 2 L/∂π 2 = −(2n11 /π 2 ) − n12 /π 2 − n12 /(1 − π)2 − n22 /(1 − π)2 .000121/. X given Y . 14.94/. . e.00.897 times that for ‘seat belt’.00130N fewer deaths per year from .7. 1. and (1 − π̂) = n2 /n.) b.02. 14. smokers in population.79. . or π(1 − π)/n(1 + π). Let π̂ = n1 /n.7 times that in England and Wales.83 P(V=w)/[.

5%) subjects are not HIV+. This could happen if Jones tends to have relatively more observations (i. columns = (hit. 25.04975 .05(. “dis” denote subject has disease. where rows = (Smith. 27.7 = 2.005)/[. and θ = [π1 /(1−π1 )]/[π2 /(1−π2 )] > π1 /π2 > 1.00475 .94525 . The odds ratio = 361. the difference between the proportion of concordant pairs and the proportion of discordant pairs equals . This simply states that ordinary independence for a two-way table holds in each partial table. D = 709). The 5% errors for them swamp (in frequency) the 95% correct cases for subjects who truly are HIV+. 19.087.1/11.00256N fewer deaths per year from heart disease. layers = (year 1.00025 . The odds of carcinoma for the various smoking levels satisfy: (Odds for high smokers)/(Odds for low smokers) = (Odds for high smokers)/(Odds for nonsmokers) (Odds for low smokers)/(Odds for nonsmokers) = 26. then 1 − π1 > 1 − π2 . 1−π1 < 1−π2 . gamma = . Yes.360 (C = 1508. i.. Thus elimination of cigarette smoking would have biggest impact on deaths due to heart disease. There is a tendency for wife’s rating to be higher when husband’s rating is higher. Equality of the I conditional distributions is equivalent to independence. . this would be an occurrence of Simpson’s paradox. Jones). . The age distribution is relatively higher in Maine.995 Nearly all (99. If π1 < π2 . This condition is equivalent to the conditional distributions of Y in the first I − 1 rows being identical to the one in row I. 17. “at bats”) for years in which his average is high. 23. b.360.2. a.e.95(. Then. a.4 lung cancer if they had never smoked. 29. year K).95(. . Suppose π1 > π2 . 21. and .995)] = . and θ = [π1 /(1 − π1 )]/[π2 /(1 − π2 )] < π1 /π2 < 1. . . Use Bayes Theorem and result that RR = P (D | E)/P (D | Ē). The numerator is the extra proportion that got the disease above and beyond what the proportion would be if no one had been exposed (which is P (D | Ē)). of the untied pairs.e. . 
the odds of a positive test result are 361 times higher for those who are HIV+ than for those not HIV+.005) + .. Let “pos” denote positive diagnosis. 33. Test + − Total Reality + . 15. One could display the data as a 2 × 2 × K table. P (pos|dis)P (dis) P (dis|pos) = P (pos|dis)P (dis) + P (pos|no dis)P (no dis) b.005 − . out) response for each time at bat.

5.23 standard errors. df = 2.. c.4.59. quite wide.2755) of the observed table. λ = 0 can occur even when the variables are not independent. for those columns combined and compared to column three. so λ = 0. .37 (df = 1).2755) = .9.95. no evidence of difference).3808. (i) P -value is hypergeometric probability P (n11 = 21 or 22 or 23) = .243. 7. The main evidence of association relates to whether one suffered a heart attack. rows 3 and 4 combined.27. 5 37.01 and G2 = 7.638 is sum of probabilities that are no greater than the probability (.5(. rows 3 and 4 (G2 = . The standardized Pearson residuals show that the number of female Democrats and Male Republicans is significantly greater than expected under independence. e. The asymptotic CI (.76. df = 6.a. and the 3 × 2 table consisting of rows 1 and 2 combined.75. P = 0. and the difference between the observed count and fitted value is 2. then E[V (Y | X)] = i πi+ (1 − π1|i ) = 1 − π+1 = 1 − max{π+j }. Sample odds ratio is 0. and the null . G2 = 27. and much stronger evidence of an association.. a.77.03.03).74.3808 . It is plausible that control of cancer is independent of treatment used. so P < . TY = j n+j (n+j −1)/2.6.001. With this type of P -value.g. df = 1. and row 5 (G2 = 95. G2 = 2. TX = i ni+ (ni+ −1)/2. test treats variables as nominal and ignores the information on the ordering. X 2 = 0. Chapter 3 3. 14. Since the maximum P being the same in each row does not imply independence. G2 = 25. The free throws are plausibly indepen- dent. strong evidences of differences). c.22 (df = 1). and the number of female Republicans and Male Democrats is significantly less than expected under independence. a.29. df = 2. b. The denominator is the number of pairs that are untied on X. P P P P c. 11. X 2 = 8.07). the actual error probability tends to be closer to the nominal value. no evidence of difference). there were 279 female Democrats. TXY = i j nij (nij −1)/2. P = . 
the estimated expected frequency under independence is 261. and 95% CI for true odds ratio is (0. If in each row the maximum probability falls in the same column. Note that ties on X and Y are counted both in TX and TY .29. df = 1. and so TXY must be subtracted. df = 1. For first two columns.02.00 (df = 2) show considerable evidence against the hypothesis of independence (P -value = . G2 = 0. 39. 27. the sum of the two one-sided P-values is 1. The ‘exact’ CI (. P-value about 0. Compare rows 1 and 2 (G2 = . b.18. b.21. say column 1.1) for the SE.31. Residuals suggest tendency for aspirations to be higher when family income is higher. 2. (ii) P -value = 0. Ordinal test gives M 2 = 4. 13.15) uses the delta method formula (3. 9. The values X 2 = 7.55) is the Cornfield tail-method interval that guarantees a coverage probability of at least .

The free throws are plausibly independent and identically distributed. For given π. XX i j Since nij ≤ ni+ . this is approximately 1 − α if B/ Po (1 − Po )/M = zα/2 . It follows that X 2 cannot P P exceed n[min(I. ∞). For any “reasonable” significance test. The log likelihood has kernel L = n11 log(θ2 ) + (n12 + n21 ) log[θ(1 − θ)] + n22 log(1 − θ)2 ∂L/∂θ = 2n11 /θ + (n12 + n21 )/θ − (n12 + n21 )/(1 − θ) − 2n22 /(1 − θ) = 0 gives θ̂ = (2n11 + n12 + n21 )/2(n11 + n12 + n21 + n22 ) = (n1+ + n+1 )/2n = (p1+ + p+1 )/2. By expanding the square and simplifying. q q 45.5. Calculate estimated expected frequencies (e. b. the P -value will be small if the sample size is large enough.5. As noted in problem 42.5. 21.0.164. 35. P (|P̂ − Po | ≤ B) = P (|P̂ − Po |/ Po (1 − Po )/M ≤ B/ Po (1 − Po )/M. a.618. X 2 = n[ (n2ij /ni+ n+j ) − 1]. the test statistic tends to be larger and the P -value tends to be smaller as the sample size increases. J) − 1] = n[min(I − 1. Most statisticians feel we learn more by estimating parameters using confidence intervals than by conducting significance tests. so df = (4-1)-1 = 2 (one higher than in testing independence without assuming identical marginal distributions). (. whenever H0 is false. P = 0. ∞).8. 17. 29. the double sum cannot exceed i j nij /ni+ = I. and obtain Pearson X 2 . the minimum is with proportions (. (0.0035 takes into account the positive linear trend information in the sample.6 expected value is 0. the contribution to the asymptotic variance is [1/nπ + 1/n(1 − π)]. the double sum term cannot exceed i j nij /n+j = J. Because G2 for full table = G2 for collapsed table + G2 for table consisting of the two rows that are combined. a. which is 2. P = 0. c. which is less than 0 for π < 0. We estimated one parameter. . the probability of this table is highest at π = .5) in the two categories.. By the q approximate normality of P̂ . 31. and since P P nij ≤ n+j . and P (X ≥ 6 | n+1 = k) is the P P -value for Fisher’s exact test. 
Note θ = π1+ = π+1 .5 and greater than 0 for π > 0. The observed table has X 2 = 6. .g. For proportions π and 1 −π in the two categories for a given sample. 33. however. it does not guarantee that the actual error probability is no greater than the nominal value. P (X 2 ≥ 6) = k P (X 2 ≥ 6 and P 2 2 n+1 = k) = k P (X ≥ 6 | n+1 = k)P (n+1 = k). The derivative of this with respect to π is 1/n(1 − π)2 − 1/nπ 2 . µ̂11 = nθ̂2 ). 43. J − 1)]. 15. b. one can obtain the alternative formula for X 2.5. Thus. Even if H0 is just slightly false.

15. Need log likelihood value when q β = 0. 11. The relatively small SE for k̂ −1 gives strong evidence that this model fits better than the Poisson model and that it is necessary to allow for overdispersion. X 2 = 35.5893 ± 1. The relative risk is . and likelihood-ratio statistic for testing color equals 8. predicted values are 3.1. at weight = 5.452c1 + .0774) = .037 and adjusted interval . df = 3. . Thus.27.0003 + .2. The actual value is 3. c. much higher than the upper bound of 1. The exact test using X 2 gives P = . df = 2.099(5.502 + .5893(2. Multiply standard errors by 535.0650) = (. df = 1. considerably larger than Poisson variance unless µ̂ is very small.238 + 1. 5. π̂ = Φ−1 (−2. and log(. c2 .173(color).6 (ii) 2. where c1 .03.0) = .53. a. logit(π̂) = -3.74. O’Neal’s estimated probability of making a free throw is . (i) 3. There is still very strong evidence of a positie weight effect.5.1. predicted logit = 5. 2.40.4619.323(weight). d. 13.74.74. (.0650)2 = 82. e.002c3 . = 1. likelihood-ratio stat. which together with Fig.0304(. 9. However.0011 = 9.028. log(µ̂) = -.3. π̂ = e−6.0.546(weight) . b. exp[−. 4.0102/.546(weight) + .2182 ] = .8 times the predicted value.896/171 = 1. a.2. a.5 (df = 22) provides evidence of lack of fit (P = . and a 95% confidence interval is (.145 + . so adjusted SE = .456 (SE = . Using quasi likelihood. a 1 cm increase in width corresponds to an estimated in- crease of 21% in the expected number of satellites. the estimated variance is µ̂ + 1. b.21. c.5893/.034).9968/. a.1. Palm Beach County is an outlier.2 kg. predicted probability = 1.0011(7. c. whereby darker crabs tend to have fewer satellites. d. c3 are dummy variables for the first three color levels.029).9997 7.456.0 for a probability. For estimated mean µ̂. d.2182 /[1 + e−6.7167). Using the or- dinality of color yields stronger evidence of an effect. log(µ̂) = .0025 + . at 5. q b.695 + 1.0020. α̂ = .247c2 + .089 + .0021. π̂ = -.8 suggests it is an outlier. c.4.0032) = 5. 
there is evidence of lack of fit. . . so the simpler model does not give a significantly poorer fit.48) = .3. Estimated proportion π̂ = −. probit(π̂) = -2.4288 + .51). The estimated probability of malformation increases from . X 2 /df = 1. 3.77. Chapter 4 1. Roughly 3%.815(weight). P = ..192) = 1. Using scores 1. Compared to the more complex model in (a).2)) = Φ−1 (3. 7 Solving for M gives the result.099(weight).238 + 1. Since exp(.3.44)] = 2. Test statistic = 9.11µ̂2. a. The much larger SE of β̂ in this model also reflects the overdispersion. b.96(.0102 at x = 7.0011 at x = 0 to .

∂µA /∂ηi is constant. With identity link the GLM likelihood equations simplify to.23z. we could get negative predicted values.17 so it is not significant. j=1 (yij − µi )/µi = 0. Φ(α + βx) = Φ( x−(−α/β) 1/β ). P P H = −( i yi )/µ2. Letting φ = Φ′ . For j = 1. so µ̂B = ȳB .27 has a std. P b. any real number predicted value for the linear model corresponds to a probability between 0 and 1. and for observations in group B. With the logit link. With the log link. as the coefficient of the cross product of . Deviance = 2 i j [yij log(yij /ȳi ). so µ̂A = ȳA .53). This shows some tendency for a lower rate of imperfections at the high thickness level (z = 1). The link function determines the function of the mean that is predicted by the linear predictor in a GLM. it is not unusual to get predicted probabilities below 0 or above 1. Poisson means must be nonnegative. c.8 is (. A restriction of the model is that to ensure 0 < π(x) < 1.23 equals . xij = 1 and the P likelihood equation gives (yi − µA ) ∂µA X (yi − µB ) ∂µB X     + = 0. Adding an interaction (cross-product) term does not provide a significantly better fit.38. a. The logistic pdf√has φ(x) = ex /(1 + ex )2 which equals . P P 29. log[π(x)] = α + βx. P 25. b.72 + . 19. Setting α + βx = 0 gives x = −α/β. for each i. error of . For log likelihood L(µ) = −nµ + ( i yi ) log(µ). The identity link models the binomial probability directly as a linear function of the predictors. ∂µB /∂ηi is constant. a. and for observations in group A. The derivative of Φ at x = −α/β is βφ(α + β(−α/β)) = βφ(0).36. the standard normal pdf equals 1/ 2π at x = 0.25 at x = 0. Since log[π(x + 1)] − log[π(x)] = β. 35. wi = [φ( βj xij )]2 /[Φ( βj xij )(1 − Φ( βj xij ))/ni ] P P P j j j P ni 27. 17. from which µ̂i = j yij /ni . Since φ is symmetric. Similarly. because probabilities must fall between 0 and 1. error of -. A µA ∂ηi B µB ∂ηi The first sum is 0 from the first likelihood equation. so second sum sets B (yi − µB )/µB = 0. 
reflecting slight overdispersion. Note . so that µ(t+1) = 2µ(t) − (µ(t) )2 /ȳ. and hence µ(t+1) = ȳ. . so likelihood equation sets A (yi − µA )/µA = 0. the relative risk is π(x + 1)/π(x) = exp(β). For j = 0. 23.59x − . For Newton- P Raphson. the adjustment to µ(t) is µ(t) − (µ(t) )2 /ȳ. 15. When the probability is near 0 or 1 for some predictor values or when there are several predictors. It is not often used. Φ(0) = . the score is u = ( i yi − nµ)/µ. and the information is n/µ. whereas straight lines provide predictions that can be any real number. though the std. it is necessary that α + βx < 0. If we use an identity link. It follows that the adjustment to µ(t) in P Fisher scoring is [µ(t) /n][( i yi −nµ(t) )/µ(t) ] = ȳ−µ(t) . With single predictor.5. xij = 0 for group B. a predicted negative log mean still corresponds to a positive mean. Model with main effects and no interaction has fit log(µ̂) = 1.

c. Goodness-of-fit statistic G2 = 2. For the Poisson.8. and 2.1449(8) /[1 + e−3.7 and e3. Given race. logit(π̂) = -3.3671)2 = 5. so wi = (∂µi /∂ηi )2 /Var(Yi ) = v(µi )/v(µi = 1.932) = .7771+.015 has SE = . However the fit shows strong evidence of a tendency for the likelihood of lung cancer to increase at higher levels of smoking.26.3068 = 3.8 times the odds of a black athlete.96.g.06.20. Black defendants with white victims had estimated probability e−3.1449(8) ]. b. df = 1.7771+. 3.1449/.6. d. .0. Goodness-of-fit test gives G2 = 0. f.2 times the odds when the victim was black. 9.132. Likelihood-ratio statistic = 34. df = 1. controlling for gender. For a given defendant’s race. For main effects logit model with intercourse as response. Chapter 5 1. Wald statistic (−. so g ′ (µi ) = µi .025 for LR statistic. 11. X 2 = . 7. −1/2 √ v(µi ) = µi . G2 = . LR statistic = 5. so model fits well.7.9 for gender. so rate of change is β̂ π̂(1 − π̂) = . 13.087. π̂ = .068. The odds of remission at LI = x + 1 are estimated to fall between 1. The main effects model fits very well (G2 = .044. the odds of a white athlete graduating are estimated to be exp(1.009. the odds of the death penalty when the victim was white are estimated to be between e1. Given gender. and tends to give smaller P -values when there truly is a linear trend.30.1449 = 1.029 and 1.7771/.23. Wald statistic = (. π̂ = e−3. π̂ = . with Wald or likelihood-ratio tests (e. At LI = 8.72 times higher for blacks than for whites. P -value = . P -value = .21 for two-unit change. b. then also µ(t+1) = ȳ. eβ̂ = e.298 times the odds of remission at LI = x.4044 /[1 + e−3. Fitted probabilities are .7175 = 41.0593)2 = 5.068)(. 37.5961+2. h. the odds of having ever had sexual intercourse are estimated to be exp(1.7 for race and 1. The model does not fit well (G2 = 31.5 at −α̂/β̂ = 3.004.4044 ] = . df = 1. 9 that if µ(t) = ȳ. df = 2 shows no evidence of lack of fit. estimated conditional odds ratios are 3. each with df = 1. c. . 
The Cochran–Armitage test uses the ordering of rows and has df = 1.g. a.. the odds of a female graduating are estimated to be exp(.38. df = 1. with a particularly large negative residual for the first count.16.397(snoring). the race effect of 1. P -value = .093.0146 for Ha :β 6= 0.37 .49 for one-unit change in snoring.015) = 2.866 + 0.1449 = 26.313) = 3. so g(µi ) = 2 µ.1449(. Both effects are highly significant. g. Mul- tiplicative effect on odds equals exp(0.0002. and the gender .8678/. e. df = 4).352) = 1.. a. ∂ηi /∂µi = v(µi )−1/2 .397) = 1.021. 5. e.4 times the odds of a male graduating.5961+2. so model fits well. df = 1). .07 = 8.

1A + 1. then the logit is the same for each row. the prediction equation is logit(π̂) = −10.2S.2. of S of 1. Given {πi }.80zc + 2. Note that .556 + . 33.48.2.465 and -. which correspond to estimated proba- bilities of . For large n.11. a. The original variables c and x relate to the standardized variables zc and zx by zc = (c−2. the log likelihood is i [yi log πi + (1 − yi ) log(1 − πi )]. 35. 21. R = 0: logit(π̂) = −7.4) = 4. With constraint βI = 0. When yi is a 0 or 1.2) = 3.44 and x = 2.7 + .509[. the coeff. By Bayes Theorem. and the log likelihood equals 0. where β = (µ1 − µ0 )/σ 2 and α = − log[(1 − ρ)/ρ] + [µ20 − µ21 ]/2σ 2 . The odds ratio eβ is approximately equal to the relative risk when the probability is near 0 and the complement is near 1.80 and zx = (x−26.10 effect of .11zx +26.458[2. the ratio of (α̂ + β̂x . At x̄ = 26.4S. The coefficients of the standardized variables are -.509(.97. βi is the log odds ratio for rows i and I of the table.44)/.3. the estimated logits at c = 1 and at c = 4 are 1. The square of the denominator is the variance of logit(π̂) = α̂ + β̂x.352 has SE = .01 for smoking represents the result of the test that the log odds ratio between Y and S for whites is 0.2 is the log odds ratio between Y and S when R = 0 (whites). When all βi are equal.3 for whites. in terms of the ML .458(2.053(income). and (for fixed π0 ) all x for which the absolute ratio is no larger than zα/2 are not contra- dictory. P (Y = 1|x) = ρ exp[−(x−µ1 )2 /2σ 2 ]/{ρ exp[−(x−µ1 )2 /2σ 2 +(1−ρ) exp[−(x−µ0 )2 /2σ 2 ]} = 1/{1 + [(1 − ρ)/ρ] exp{−[µ20 − µ21 + 2x(µ1 − µ0 )]/2σ 2 } = 1/{1 + exp[−(α + βx)]} = exp(α + βx)/[1 + exp(α + βx)]. since eβ = [π(x + 1)/(1 − π(x + 1))]/[π(x)/(1 − π(x))] ≈ π(x + 1)/π(x). it follows that βi = log[πi /(1 − πi )]) − log[πI /(1 − πI )]. a one standard deviation change in x has more than double the effect of a one standard deviation change in c. 
is the difference between the log odds ratios 1.logit(π0 ) to its standard deviation is approximately standard normal.3]. 15. P For the saturated model.41 and .1A + 1. log[πI /(1 − πI )] = α determines α.80zc +2.062. of the cross-product term.4 and 1. d. The YS conditional odds ratio is exp(1. π̂i = yi . Logit model gives fit. Thus.3)/2.080). so πi is the same in each row. 37.11zx + 26.3. so that c = . in which case the RS interaction does not enter the equation. Since log[πi /(1 − πi )] = α + βi . a.81 and . That is. logit(π̂) = -3. Controlling for the other variable. So.80) = -.071 − . 31. The coeff. The P -value of P < .0 + .11) = . Let ρ = P(Y=1). 25. 29. so there is independence. we can find parameters so model holds exactly.1 for blacks and exp(1.44] + . R = 1: logit(π̂) = −6.

146)/(. which in chi- squared form equals (14 − 6)2 /(14 + 6) = 3. logit(π̂) = −12.671)2 = 1. Let pi = yi /ni .834(weight) + . whereas women applied relatively more often to Departments C.e.87. The CMH statistic simplifies to the McNemar statistic of Sec.21 and . P P P P So.3 is . E. These two effects combine to give a relative advantage to men for admissions when we study the marginal association. D. F. so λ = 1.85.90. so this is the problem of multicollinearity. P For this model. For the latter model. There is slight evidence of a better response with treatment B (P = .83. Then n = 75. = 32. 9.2 (df = 1).84 times higher for men than women.35 + .55 and (. 10. admissions rates were relatively high for Departments A and B and relatively low for C.854. 5. = .307/.074 for the two-sided alternative).1. c. a. θ̂AG(D) = . Like. At the same time. The values of G2 are 2.1. above mean) is . one std.674.68 for the model with no G effect and 2. E. the deviance simplifies to D = −2[α̂ i π̂i + β̂ i xi π̂i + i log(1 − π̂i )] P P P  β̂xi )+ i log(1 − π̂i )] = −2[ i π̂i (α̂ + P P π̂i = −2 π̂i log −2 log(1 − π̂i ). logit(π̂) = -9.351 + . The estimated odds of admission were 1. δ = 5. so given department. the  deviance equals π̂i D = −2 i [yi log π̂i + (1 − yi ) log(1 − π̂i )] = −2 i [yi log + log(1 − π̂i )] P P 1−π̂i = −2 i [yi (α̂ + β̂xi ) + log(1 − π̂i )]. The odds ratio is [(. F. These predictors are highly correlated (Pearson corr. However.22).9 (df = 2). the likelihood equations are i yi = i π̂i and i xi yi = i xi π̂i . The ith sample logit is (t) (t) (t) (t) (t) log[pi /(1 − pi )] ≈ log[πi /(1 − πi )] + (pi − πi )/πi (1 − πi ) (t) (t) (t) (t) (t) = log[πi /(1 − πi )] + [yi − ni πi ]/ni πi (1 − πi ) Chapter 6 1. and take just the term with the first derivative.674/. a. and the P -values are . a. 13. 1.887).90 times as high for men as for women. b. Simpson’s paradox strikes again! Men applied relatively more often to Departments A and B.497x. 
Expand log[p/(1 − p)] in a Taylor series for a neighborhood of points around p = π. These each have df = 1. Wald statistics are (.09.854/. There is extremely strong evidence that at least one variable affects the response. at x = 26. prob.04. P < . at x = 28.4 (i.182)2 = 2.. D.834/.56 for the model with G and D main effects. dev. the estimated odds of admission were .0001. . Prob.307(width). CI for conditional AG odds ratio is (0. b.326)] = 2. 11 fit and the ML estimates {π̂i } for this linear trend model. ratio stat. P P i 1−π̂i i 41.

For males in urban areas wearing seat belts. for whites. 31. For x2 − x1 = 1.271)x2 = 1. The noncentrality is the same for models (X + Z) and (Z).419) = 1.342) = 1.fitted)2 /fitted for the “success” category and the second component is (observed - fitted)2 /fitted for the “failure” category. f.975)/[1 + exp(−2. 25. For each of 2 logits.271)] = . a.2.419+.183 for Democrats. . all dummy variables equal 0 and the estimated cumulative proba- bilities are exp(3.469)] = . so the power goes to 1 as n increases. . Combining terms gives (y − nπ)2 /nπ(1 − π). controlling for gender.3494)] = . exp(7.419 − . P (y =√ 1) = P (α1 + β1 x1 + ǫ1 > α0 + β0 x0 + ǫ0 ) = √ P [(ǫ0 − ǫ1 )/ 2 <√ ((α1 − α0 ) + ∗ ∗ ∗ ∗ (β1 − β0 )x)/ 2] = Φ(α + β x) where α = (α1 − α0 )/ 2.8 between gender and party ID and 9.758+. and 1. where the first component is (observed . . c. so df = 2(4) − 2(3) = 2.342)+exp(−. 5. The likelihood-ratio statistic of 7. Both gender and race have significant effects. we use the notation of (4.883+. 7. The estimated odds for females are exp(0. b. so the difference statistic has noncentrality 0. with estimated conditional odds ratios of 1.0.2563)] = . √ 29 a. so π(x2 ) = π(x1 )exp[β(x2 −x1 )] .76.469 + . π̂1 = exp(.03 and shows evidence of a gender effect.965.65 times the estimated odds for Republicans.342 − . Chapter 7 1.5 times those for males.3074)/[1 + exp(3.2563)/[1 + exp(7.883 + . has a P -value of .975) = 2. The estimated odds of preferring Democrat instead of Republican are higher for females and for blacks. Then. The esti- mated probability of a very liberal response equals exp(−2. which is the square of the residual. with G2 = 0. β = (β1 − β0 )/ 2. For simplicity.105)x1 + (. Four intercepts are needed for five response categories.314x + 1 + .4818)] = . a. The logit model with additive effects and no interaction fits well.8 between race and party ID. 
We consider the contribution to the X 2 statistic of its two components (corresponding to the two levels of the response) at level i of the explanatory variable.4 times those for blacks. exp(5. log(π̂1 /π̂2 ) = (. controlling for race.2 based on df = 2. 3. based on df = 2. there are 4 gender-race combinations and 3 parameters.025.071x2.0007.3494)/[1 + exp(5.469)/[1 + exp(−2. for Democrats the estimated odds of response in the liberal direction are exp(.975)] = . π(x2 ) equals π(x1 ) raised to the power exp(β). (log π(x2 ))/(log π(x1 )) = exp[β(x2 − x1 )].965.105+.21) but suppress the subscripts.9993.883+.758) + (.469 + . The conditional XY independence model has noncentrality propor- tional to n.004. . Adding these chi-squared components therefore gives the sum of the squared residuals.995.419+.005. The corresponding response probabilities are . For any collapsing of the response. and .4818)/[1 + exp(3.3074)] = .641 + .078 for Republicans and exp(−2.12 23.970. they are exp(0. that contribution is (y − nπ)2 /nπ + [(n − y) − n(1 − π)]2 /n(1 − π).342)/[1+exp(. exp(3.

549.74. b. the estimated odds of response with sequential therapy below any fixed level are . female) and (1. Since exp(.3 (df = 1) and has P -value = . gives similar results as in (a).7602) = . a.1244) = . CMH statistic for correlation alternative. because the baseline-category logit model refers to individual categrories rather than cumulative probabilities. c. For this age group the log odds ratio with smoking is β1 +β3 for smoking levels one unit apart and 2(β1 + β3 ) for smoking levels two units apart. since the model allows different effects for each logit and treats A as a factor.18 and exp[2(. logit[P (Y ≤ j | X = xi )] .581 (SE = 0. The model does not account for ordinality. LR statistic comparing this model to model with four separate operation parameters equals 2. When there is roughly a linear trend. for each logit). µ = 2. 13 b. alternating). P̂ (Y > 2) = 1 − P̂ (Y ≤ 1) = 1 − Φ(−. a.58.541 (SE = . For j < k.47 in urban locations. the effect of smoking is 94% higher for the older-aged subjects.600). LR statistic for cumulative logit model with linear effect of operation = 6. c. and the numerator is negative when β1 > 0 and β2 > 0.logit[P (Y ≤ k | X = xi )] = (αj − αk ) + (βj − βk )x. No. The main effects model fits well (G2 = 5. the estimated odds ratios are exp(. 13. There is not linear structure for baseline-category logits that implies identical effects for each cumulative logit. the estimated odds of injury below any fixed level for a female are between .115 + .161 + .6.493)) = (.56 and .0) for (male.0272)] = (exp(−. strong evidence that operation has an effect on dumping.83]/5. 27.012.161 − . b.13.611 times the estimated odds for a male. c. so df = 4. using equally-spaced scores.549 and .663) = 2. The model has 12 baseline-category logits (2 for each of the 6 combinations of S and A) and 8 parameters (an intercept. Also. 9. and adding an interaction term does not give an improved fit (The interaction model has G2 = 4. 
The sequential therapy leads to a better response than the alternating therapy. a. For x2 = 1.67 and σ = 5.41 in rural locations and exp(−. The denominator is positive.195x1 ) = Φ(. b.96(. Setting up dummy variables (1.1244 is the difference between the two log odds ratios. . The estimated odds ratios are . a.7602−. Give seat belt use and location.83 and σ = 5. Estimated odds ratio equals exp(−. Thus.5463 ± 1.663)] = 4. 31.13.195x1 ) = Φ([x1 − . an A effect and two S effects. 29. exp(−.0) for (sequential. For the extreme categories of B. Wald CI is exp[−.212) and gender effect = -. the log odds ratios double and the odds ratios square. The interaction effect -.611). so the shape is that of a normal cdf with µ = . df = 1. since it focuses on a single degree of freedom.8 (df = 3). 11.01.115 + . P = .663) = 1. ∂π3 (x)/∂x = −[β 1 exp(α1 +β1 x)+β2 exp(α2 +β2 x)] [1+exp(α1 +β1 x)+exp(α2 +β2 x)]2 . 17. it does not permit interaction. df = 6). we get treatment effect = -.56 times the estimated odds with alternating therapy. this tends to be more powerful and give smaller P -values.13).295).7.94.5. For x2 = 0. which are two units apart. df = 7). This difference of cumulative probabilities cannot be positive since . so simpler model is adequate. equals 6.
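The rewriting Φ(−.161 + .195x1) = Φ([x1 − .83]/5.13) quoted in this solution uses the general fact that, for β > 0, Φ(α + βx) is a normal cdf in x with mean −α/β and standard deviation 1/β. A minimal sketch of that conversion follows; only the two coefficients −.161 and .195 come from the text, and the function names are illustrative.

```python
from math import erf, sqrt


def normal_cdf(z):
    """Standard normal cdf via the error function."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def probit_to_normal(alpha, beta):
    """Return (mu, sigma) with Phi(alpha + beta*x) == Phi((x - mu)/sigma), beta > 0."""
    return -alpha / beta, 1.0 / beta


mu, sigma = probit_to_normal(-0.161, 0.195)

# The two parameterizations give the same probability at any x.
x = 3.0
p_linear = normal_cdf(-0.161 + 0.195 * x)
p_shifted = normal_cdf((x - mu) / sigma)
```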

.175) for GH association.14). and .96(. For a given gender.38 (df = 2) for (GI. G2 values are 2. The local odds ratios refer to a narrow region of the response scale (categories j and j − 1 alone). HI.464 (SE = .3.4 times higher for those ejected (controlling for S). 35. estimated log odds ratio is . EI) = 2.252 (SE = . For a given subject. a. b. 33. The cumulative probabilities in row a are all smaller or all greater than those in row b depending on whether µa > µb or µa < µb . and if βj > βk then the difference is positive for small x. GI. G2 (SE. 7. a. if βj > βk then the difference is positive for large x.252 ± 1. the log odds of selecting a over b depend on ua − ub .30 (df = 1) for (GI.57 for S and I. a. the effect β refers to an underlying continuous variable with a normal distribution with standard deviation 1. h αh + βh x + γuh For a given cost. wearers of seat belts are much less likely to be ejected). the estimated conditional log odds ratio equals λ̂AC AC AC AC 11 + λ̂22 − λ̂12 − λ̂21 5.175)]. I = injury. even though simpler models fit adequately. The estimated conditional odds ratios are .464 ± 1.96(. Let S = safety equipment. Then. Since the intervals contain values rather far from 1. The loglinear model (SE. c. SI. and . Similarly. For either approach. The full model has an extra I − 1 parameters. df = 1. 5. EI) is equivalent to a logit model in which S and E have additive effects on I. so CI for odds ratio is exp[−. df = I(J − 1) − [(J − 1) + (I − 1)] = (I − 1)(J − 2). and exp(1. 37.798) = 16.57 times higher for those not wearing seat belts (controlling .e. however. leading to CI of exp[. Chapter 8 1. from (8. E = whether ejected. Estimated log odds ratios is -.241) for GI association. the odds a female selects a over b are exp(βa − βb ) times the odds for males. GH). 3. SI.061 for E and I. b. Loglinear models containing SE are equivalent to logit models with I as response variable and S and E as explanatory variables. it is safest to use model (GH. 
The estimated odds of a fatal injury are exp(2.241)]. b.2.0. so it seems there is an association for each pair of variables. From the argument in Sec.85. HI). 41. the model has the form αj + βj x + γuj πj = P . and that association can be regarded as the same at each level of the third variable.14 P (Y ≤ j) ≤ P (Y ≤ k). whereas cumulative odds ratios refer to the entire response scale. Any simpler model has G2 > 1000.091 for S and E (i. HI).717) = 5.

Model (BP R. overall the most likely case for injury is therefore females not wearing seat belts in rural locations. For those who agree with birth control availability. females are more likely to be injured. logit(π) = α + βiR . other things being fixed. RS) fits well (G2 = 7. e. Model deleting PR association also fits well (G2 = 10. Model with A as response and additive factor effects for R and P . πijk = πi++ π+j+ π++k .44 times the odds for seat belt use.147) = 3. we find that P (X = i. For mutual independence. RS) has G2 = 5. “No” is category 1 of I. Summing both sides over k. 9. and the odds of no injury for no seat belt use are estimated to be .0.58 times the odds of no injury for males (controlling for L and S). the odds of no injury for urban location are estimated to be 2. BS. P R. b.13 with location.45.05). . as in problem 16c this simplifies to 4λXY 11 . the estimated odds of view- ing premarital sex as wrong only sometimes or not wrong at all are about triple the estimated odds for those who disagree with birth control availability. Y = j|Z = k) = P (X = i|Z = k)P (Y = j) = P (X = i|Z = k)P (Y = j|Z = k) and there is XY conditional independence. logit(π) = α + βiR + βjP . that is.7. P (X = i. (ii) (GRP. AP ). df = 7. b. Use equations such as ! ! µi11 µij1 µ111 λ = log(µ111 ). P (Y = j|Z = k) = π+jk /π++k = π+j+ π++k /π++k = π+j+ = P (Y = j). Similarly. df = 11). The 95% CI is exp(1.147 ± 1. add term of form βijRP to logit model in Exercise 5. so injury is more likely at a rural location. AR). Set βhG = 0 in model in previous logit model. When Y is jointly independent of X and Z. a. so the odds of no injury for females are estimated to be . 13. 15 for E). Dividing πijk by π++k . and also a good fit.645(.13 times the odds for rural location. 4. Homogeneous association model (BP. P S. and “female” is category 1 of G.8. df = 9). AR. Hence. Since there is no interaction for this model. Injury has estimated conditional odds ratios . b. A). 
But when πijk = π+j+ πi+k . AG. λXY ij = log µ111 µi11 µ1j1 ! XY Z [µijk µ11k /µi1k µ1jk ] λijk = log [µij1 µ111 /µi11 µ1j1 ] 19.58 with gender. for zero-sum constraints. and .23. (GRP. (i) (GRP. BR. logit(π) = α. P S.153)) = (2. estimated conditional BS odds ratio equals exp(1.15. a. so injury is more likely for no seat belt use. 17. c. Y = j|Z = k) = P (X = i|Z = k)P (Y = j).44 with seat-belt use. For homogeneous association model. AG). 2. but we use the full model. there is a positive association between support for birth control availability and premarital sex. log θ11(k) = log µ11k + log µ22k − log θ12k − log θ21k = λXY XY XY XY 11 + λ22 − λ12 − λ21 . πij+ = πi++ π+j+ . 7. which is marginal independence in the XY marginal table. πijk = π+j+ πi+k . AP R. BS. (iii) (GRP. λX i = log .
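The mutual-independence argument can be checked on a small table: if πijk = πi++ π+j+ π++k, then the XY marginal table factors as πij+ = πi++ π+j+, and X and Y are also independent within each level of Z. The marginal probabilities in this sketch are arbitrary illustrative values.

```python
from itertools import product

# Arbitrary one-way marginal distributions (illustrative only).
px = [0.2, 0.8]
py = [0.5, 0.3, 0.2]
pz = [0.6, 0.4]

cells = list(product(range(len(px)), range(len(py)), range(len(pz))))
joint = {(i, j, k): px[i] * py[j] * pz[k] for i, j, k in cells}


def margin(table, axes):
    """Sum a three-way table of probabilities down to the listed axes."""
    out = {}
    for cell, prob in table.items():
        key = tuple(cell[a] for a in axes)
        out[key] = out.get(key, 0.0) + prob
    return out


xy = margin(joint, (0, 1))
xz = margin(joint, (0, 2))
yz = margin(joint, (1, 2))
z = margin(joint, (2,))

# Marginal independence: pi_{ij+} = pi_{i++} * pi_{+j+}.
marginal_independence = all(
    abs(xy[(i, j)] - px[i] * py[j]) < 1e-12 for i, j, _ in cells
)

# Conditional independence: P(X=i, Y=j | Z=k) = P(X=i | Z=k) * P(Y=j | Z=k).
conditional_independence = all(
    abs(joint[(i, j, k)] / z[(k,)]
        - (xz[(i, k)] / z[(k,)]) * (yz[(j, k)] / z[(k,)])) < 1e-12
    for i, j, k in cells
)
```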

C is the 3×12 matrix with rows (1. YZ) is not defined in the same way. 0. x2 . 1.125 . 0. 25. in terms of cell probabilities as functions of marginal probabilities. 0. XZ) satisfies this. Since the odds ratios are identical. 0. 0.Z) called the equiprobability model. / 0. 0. / 0. x2 . No.0 but are the same at each level of the third variable.125 Z=1 Z=2 This is actually a special case of (X. x3 .125 .10 . XZ.125 . for it. XZ. 0. -1. 0.. 0. Model (XY. a. model (Y. b. When X and Y are conditionally independent. 0. 0. then an odds ratio relating them using two levels of each variable equals 1.15 . 0. .15 c. there is no three-factor interaction.15 . β)′. 0. 0. W Y. 0. 0.125 . -1.10 . α2 . 0. W Y Z) involve X and Y . 0. b. for this model). a. 1. W Z. 0. 0. x1 . but X and Z are dependent (the conditional association being the same as the marginal association in each case. / 1.10 . 0. 2/16 1/16 4/16 1/16 1/16 4/16 1/16 2/16 e. For instance. 21. When one specifies sufficient marginal probabilities that have the required one-way marginal probabilities of 1/2 each. . 0. 0.125 . 0. 0. All terms in the saturated model that are not in model (W XZ. (W X.Y. 1/4 1/24 1/12 1/8 1/8 1/12 1/24 1/4 d. x3 ).125 X . 1. 0. The λXY term does not appear in the model. / 1. Use the definitions of the models. / 0. -1. β = (α1 . / 0. these specified marginal distributions then determine the joint distribution. 0. 0. X is the 6×3 matrix with rows (1. + = 1i 1T −i = (1 + 1)T . Y Y . 0 / 0. Y Z) 27. 0. one needs to determine cell probabilities for which each set of partial odds ratios do not equal 1. Number of terms = 1 + + + . P i 1 2 T i by the Binomial theorem..15 .10 . d.16 c. so permit an XY conditional association. 1.125 . Any 2 × 2 × 2 table ! ! ! ! T T T T 23. so X and Y are conditionally independent. 0. 0. 0. 0. 1. 0. x1 .0 at each level of Z. 0. 1.
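The term count obtained from the Binomial theorem, 1 + C(T,1) + C(T,2) + ... + C(T,T) = (1 + 1)^T, is easy to confirm by brute force. The function name below is hypothetical.

```python
from math import comb


def term_count(T):
    """1 + C(T,1) + C(T,2) + ... + C(T,T), summed directly."""
    return sum(comb(T, i) for i in range(T + 1))


counts = {T: term_count(T) for T in range(1, 11)}
```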

0. (t+1) (t) (t) (t+2) (t+1) (t+1) 39. / 1. 1. 0.0). / 1. µ = (µ111 . 0. 1. 0. the marginal odds ratio is the same as the conditional odds ratio (and hence 1. 33. 1). 0. 0. / 1. 0. 0. {n++k }. 1. (0) so column totals match. µ222 ) . 0. 0. µ121 . 1. 1. The likelihood equations are µ̂i+ = ni+ for all i. 0. 0. 0. 1. 0. Thus. 1. categories of (Y Z)-1] = (HI − 1)(JK − 1). 0. µ112 . so row totals match {ri }. 0. 0. 0. 1. 0. For model (XY. 37. Z). 0. -1). 0. / 0. 1. b. 0. 0. 31. 0. 1. 29. 0. 0. For (XY. 1. and k. µ211 . 1. 0. XZ. so the usual results apply to the two-way table having Z in one dimension. categories of (XY )-1][no. 0. µ̂++j+ = n++j+ . λX Y Z ′ ′ 1 . β = (λ. 0. 0. 0. 0. 0. Z). Y Z) says that the composite variable (having marginal frequencies {nhi++ }) is independent of the Y Z composite variable (having marginal frequencies {n++jk }). 0. 0. so the marginal odds ratio equals the conditional odds ratio. 0. 1. 0. Residual df = IJK − [1 + (I − 1) + (J − 1) + (K − 1) + (I − 1)(J − 1)] = (IJ − 1)(K − 1). 0. a.. For (XY. µ122 . df = IJK − [1 + (I − 1) + (J − 1) + (K − 1) + (I − 1)(J − 1) + (J − 1)(K − 1)]. Y Z). Differentiating with respect to λXY ij and λZk gives the likelihood equations µ̂ij+ = nij+ and µ̂++k = n++k for all i. 0. 0. (i) For each pair of variables. 17 -1. 1. 1. e. / 0. which has probabilistic form πhijk = πh+++ π+i++ π++j+ π+++k . 0). df = (HIJ − 1)(K − 1). 1. 0. Model (W X. 0. 0. log likelihood is ni++ λX n+j+ λYj + n++k λZk + nij+ λXY X X X XX XXX L = nλ + i + ij − µijk i j k i j The minimal sufficient statistics are {nij+ }. λ1 . 0. 0. j. 0. a. 0. 0. 0. and then πij = πij (cj /π+j ). df = IJK − [1 + (I − 1) + (J − 1) + (K − 1) + (I − 1)(J − 1)]. 0. Y Z). Model (W XY. 0. 0. 0. 0. 0. 0. For (XY. 0. and they satisfy the model. Z) says that Z is independent of the W XY composite vari- able. For this model. 0. λ1 ) . 0. 0. / 1. 1. Chapter 9 1. Take πij = πij (ri /πi+ ). µ̂+++k = n+++k . 0. 0. since πijk = πij+ π++k . 1. µ̂+i++ = n+i++ . 
in a given row the J cell probabilities are equal. / 1. (ii) . 0. 1. 0. so by Birch’s results they are ML estimates. µ221 . 0. 0. since the remaining variable is conditionally independent of each of those two. For this model. / 1. 0. 2. 0. For any pair of variables. The formula reported in the table satisfies the likelihood equations µ̂h+++ = nh+++ . / 1. and X is a 8×4 matrix with rows (1. 0. / 0. at least one of them is conditionally independent of the remaining variable. 1. df = [no. where {πij = pij }. The fitted values that satisfy the model and the likelihood equations are µ̂ij = ni+ /J. 0. b. HIJ levels of W XY composite variable in the other. µ212 . and A is the 12×9 matrix with rows (1. -1. 1. 0. 0. 35. 0. · · ·. for t = 1. 0. 0. 0. µ̂ijk = µ̂ij+ µ̂++k /n = nij+ n++k /n. df = IJK − [1 + (I − 1) + (J − 1) + (K − 1) + (I − 1)(J − 1) + (J − 1)(K − 1) + (I − 1)(J − 1)].g. / 0.
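The two alternating steps described in this solution (rescale each row so row totals match {ri}, then rescale each column so column totals match {cj}, starting from the sample proportions) can be sketched as follows. This is a minimal illustration under assumed inputs: the starting table and the target margins are made up, and the fixed number of cycles is a simplification rather than a convergence test.

```python
def fit_margins(p, row_targets, col_targets, cycles=50):
    """Alternately rescale rows and columns of p to match the target margins."""
    table = [row[:] for row in p]
    for _ in range(cycles):
        # Row step: multiply row i by r_i / (current row total).
        table = [
            [x * row_targets[i] / sum(row) for x in row]
            for i, row in enumerate(table)
        ]
        # Column step: multiply column j by c_j / (current column total).
        col_totals = [sum(row[j] for row in table) for j in range(len(col_targets))]
        table = [
            [x * col_targets[j] / col_totals[j] for j, x in enumerate(row)]
            for row in table
        ]
    return table


# Illustrative starting proportions and uniform target margins.
p = [[0.10, 0.20], [0.30, 0.40]]
fitted = fit_margins(p, [0.5, 0.5], [0.5, 0.5])

start_odds_ratio = p[0][0] * p[1][1] / (p[0][1] * p[1][0])
fitted_odds_ratio = fitted[0][0] * fitted[1][1] / (fitted[0][1] * fitted[1][0])
```

Because each step multiplies whole rows or whole columns by constants, the odds ratio of the starting table is unchanged while the margins move to their targets.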

so the association may change when one controls for M. also. 5. then. with std. so these odds ratios all are at least equal to 1. the variables are negatively likelihood-ratio dependent. df = 4.79. since A and C are conditionally independent (given M). Yes – let U be a composite variable consisting of combinations of levels of Y and Z. a.5. (i) no pairs of variables are conditionally independent. CM). No.1.7 standard errors. the conditional distributions on Y are stochastically increasing.25 = 3.49 times that for younger group. e. For age scores (1.73 for each successive increase of one age category. X and Z are conditionally independent given W and Y or given only Y alone. 27. which implies the difference between an observed and fitted count in one cell is the negative of that in an adjacent cell. a. (ii) These are likelihood equations implied by the three association terms in the model. Model (AC. error = . when β > 0. and the . (ii) For the AM odds ratio. Thus. c. Do a likelihood-ratio test with and without time as a factor in the model 19. 33.309.3.g. when i < h and j < k. their SE values are thus identical. as Y increases. The other models fit poorly. CM) fits well. The estimated constant collision rate is exp() = . Monotonicity of scores implies ui < uh and vj < vk . 21. the difference between the observed and fitted counts is 3. in the cell with each variable equal to yes. When β < 0. The interaction term = -. so collapsibility conditions are not satisfied for any pair of variables. AM. W and Y are conditionally independent given X and Z (as the model symbol implies) or conditional on X alone since X separates W and Y . collapsibility conditions are satisfied as W is conditionally independent of U. as X increases. d. From the definition. W and Z are separated using X alone or Y alone or X and Y together. estimated death rate for older age group is e1.097. G2 = 1. With log link.0153 accidents per million miles of travel.309) = .2. 
The ratio of the rate for smokers to nonsmokers decreases markedly as age increases. c.18 these are the likelihood equations implied by the λAC term in the model. the conditional distributions on X are stochastically in- creasing. df = 3. for model (AM. 15. as are the standardized Pearson residuals.0 when β ≥ 0. G2 = 12. b. For L×L model. It has df = 1. G2 = 3.5). it follows that a joint distribution of two discrete variables is positively likelihood-ratio dependent if all odds ratios of form µij µhk /µik µhj ≥ 1. (iii) These are likelihood equations implied by the λAM and λCM terms in the model. the odds ratio is the same when one collapses over C. given X. df = 2. 17. this odds ratio equals exp[β(uh −ui )(vk −vj )]. the estimated ratio of rates is multiplied by exp(−. and the likelihood equations imply fitted values equal observed in each two-way marginal table.a. b. 25. (i) Both A and C are conditionally dependent with M.4. The model appears to fit adequately.
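For the L×L model, the statement that the odds ratio µij µhk/µik µhj equals exp[β(uh − ui)(vk − vj)], and so is at least 1 when β ≥ 0 and the scores are monotone, can be verified directly. In this sketch the scores, β, and the main-effect terms are arbitrary illustrative values.

```python
from itertools import combinations
from math import exp


def ll_fitted(beta, u, v, row_effects, col_effects):
    """Cell means with log mu_ij = row_i + col_j + beta * u_i * v_j."""
    return [
        [exp(row_effects[i] + col_effects[j] + beta * u[i] * v[j])
         for j in range(len(v))]
        for i in range(len(u))
    ]


beta = 0.3
u = [1, 2, 4]          # monotone row scores (illustrative)
v = [1, 2, 3, 5]       # monotone column scores (illustrative)
mu = ll_fitted(beta, u, v, [0.0, 0.4, -0.2], [0.1, 0.0, -0.3, 0.2])

# Odds ratios for every pair of rows i < h and every pair of columns j < k.
odds_ratios = {}
for i, h in combinations(range(len(u)), 2):
    for j, k in combinations(range(len(v)), 2):
        odds_ratios[(i, h, j, k)] = mu[i][j] * mu[h][k] / (mu[i][k] * mu[h][j])
```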

Y Z). from which follows P P the I equations in the third set of likelihood equations. b.9). P P which under indep. Also under H0 . where λ = λ + λI + λI + uI vI . I. a. and πij φij simplifies to P P PP −( ui πi+ )( vj π+j ). . For instance. λ̄j = λj −λI + Y µ1 (vj −vI ). This has row effects form with the indicated constraints. and { j nij vj }. For row effects model with j < k. 19 conditional distributions on Y (X) are stochastically decreasing as X (Y ) increases. This model uses the ordinality of X and Y . and Y is stochastically higher in row i. log likelihood is ni+ λX n+j λYj + X X X X XX L = nλ + i + µi [ nij vj ] − exp(λ + . Y Z). µhj µik /µhk µij = exp[(µi − µh )(vk − vj )]. a. i = 1. or simply note that the model has one more parameter than the conditional XY independence model (XZ.) i j i j i j Thus. so it has one fewer df than that model. X X X X [ i i j j √ The asymptotic standard error is σ/ n. Then log mij = λ+(λ̄X X Y Y i +λI )+[λ̄j +λI +µI (vI −vj )]+(µ̄i +µI )vj = λ′ + λ̄X Y ′ X Y i + λ̄j + µ̄i vj . set λ̄X X X Y Y i = λi −λI .. a.. there is likelihood-ratio dependence for the 2 × J table consisting of rows i and h.9) simplifies to u2i πi+ − ( ui πi+ )2 ][ vj2 π+j − ( vj π+j )2 ]. These equations are obtained successively by differentiating with respect to λXZ . λY Z . P P πij φ2ij = u2i vj2 πi+ π+j + ( vj π+j )2 ( u2i πi+ ) + ( ui πi+ )2 ( vj2 π+j ) XX XX X X X X i j i j j i i j u2i πi+ )( vj π+j )2 −2( vj2 π+j )( ui πi+ )2 . b. b. and β. and is a parsimonious special case of model (XY. Use formula (3. In this context. {n+j }. 39. λXY has only one nonredundant value. XX X X X X X X +2( ui vj πi+ π+j )( ui πi+ )( vj π+j )−2( i j i j i j j i Then σ 2 in (3. Differentiating P with respect to the parameters and setting results equal to zero gives the likelihood equations. since scores on Y are monotone increasing. For Poisson sampling. (Note the distinction between . 35. For zero-sum constraints.. the minimal sufficient statistics are {ni+ }. 
we can take u1 = v1 = −u2 = −v2 and have βui vj = λXY ij . Can calculate this directly. Note the derivative of the log likelihood with respect to β is i j ui vj (nij − µij ). Thus. ζ = ui vj (πij − πi+ π+j ) and φij = ui vj − PP ui ( b vb π+b ) − vj ( a ua πa+ ) Under H0 .. When µi − µh > 0. ∂L/∂µi = j vj nij − j vj µij . If parameters do not satisfy these constraints.. 37. When I = J = 2. Note these equations imply that the correlation between the scores for X and the scores for Y is the same for the fitted and observed data. all such odds ratios are positive. estimates is n i j ui vj (pij − pi+ p+j ). XZ. the estimate of which is the same formula with πij replaced by pij . c. µ̄i = µi −µI . P P b. πij = πi+ π+j .

(Note the distinction between ordinal and nominal is irrelevant when there are only two categories; we cannot exploit “trends” until there are at least 3 categories.)
d. The third equation is replaced by the K equations
Σ_i Σ_j u_i v_j µ̂_ijk = Σ_i Σ_j u_i v_j n_ijk, k = 1, ..., K.
This model corresponds to fitting the L × L model separately at each level of Z. The G² value is the sum of G² for separate fits, and df is the sum of IJ − I − J values from separate fits (i.e., df = K(IJ − I − J)).
41. a. log µ_ijk = λ + λ_i^X + λ_j^Y + λ_k^Z + λ_ik^XZ + λ_jk^YZ + µ_i v_j, where {v_j} are fixed scores and the row effects satisfy a constraint such as Σ_i µ_i = 0. The likelihood equations are {µ̂_i+k = n_i+k}, {µ̂_+jk = n_+jk}, and {Σ_j v_j µ̂_ij+ = Σ_j v_j n_ij+}, and residual df = IJK − [1 + (I−1) + (J−1) + (K−1) + (I−1)(K−1) + (J−1)(K−1) + (I−1)]. For unit-spaced scores, within each level k of Z, the odds that Y is in category j + 1 instead of j are exp(µ_a − µ_b) times higher in level a of X than in level b of X. Note this model treats Y alone as ordinal, and corresponds to an adjacent-categories logit model for Y as a response in which X and Z have additive effects but no interaction. For equally-spaced scores, those effects are the same for each logit.
b. Replace final term in model in (a) by µ_ik v_j, where parameters satisfy constraint such as Σ_i µ_ik = 0 for each k. Replace final term in df expression by K(I − 1). Within level k of Z, the odds that Y is in category j + 1 rather than j are exp(µ_ak − µ_bk) times higher at level a of X than at level b of X.
47. Suppose ML estimates did exist, and let c = µ̂_111. Then c > 0, since we must be able to evaluate the logarithm for all fitted values. But then µ̂_112 = n_112 − c, since likelihood equations for the model imply that µ̂_111 + µ̂_112 = n_111 + n_112 (i.e., µ̂_11+ = n_11+). Using similar arguments for other two-way margins implies that µ̂_122 = n_122 + c, µ̂_212 = n_212 + c, and µ̂_222 = n_222 − c. But since n_222 = 0, µ̂_222 = −c < 0, which is impossible. Thus we have a contradiction, and it follows that ML estimates cannot exist for this model.

Chapter 10

1. a. Sample marginal proportions are 1300/1825 = 0.712 and 1187/1825 = 0.650. The difference of .062 has an estimated variance of [(90 + 203)/1825 − (90 − 203)²/1825²]/1825 = .000086, for SE = .0093. The 95% Wald CI is .062 ± 1.96(.0093), or .062 ± .018, or (.044, .080).
b. McNemar chi-squared = (203 − 90)²/(203 + 90) = 43.6, df = 1, P < .0001; there is strong evidence of a higher proportion of ‘yes’ responses for ‘let patient die.’
c. β̂ = log(203/90) = log(2.26) = 0.81. For a given respondent, the odds of a ‘yes’ response for ‘let patient die’ are estimated to equal 2.26 times the odds of a ‘yes’ response for ‘suicide.’
3. a. Ignoring order, (A=1,B=0) occurred 45 times and (A=0,B=1) occurred 22 times.
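The Chapter 10, Problem 1 figures can be recomputed from the counts quoted in the solution (marginal totals 1300 and 1187 out of 1825, discordant counts 203 and 90). The sketch below simply re-evaluates the formulas as written there; the variable names are illustrative.

```python
from math import log, sqrt

n = 1825
n12, n21 = 203, 90                 # discordant counts quoted in the solution
p1, p2 = 1300 / n, 1187 / n        # sample marginal proportions

# (a) Difference of dependent proportions, its variance, and the Wald CI.
diff = p1 - p2
variance = ((n12 + n21) / n - (n12 - n21) ** 2 / n ** 2) / n
se = sqrt(variance)
wald_ci = (diff - 1.96 * se, diff + 1.96 * se)

# (b) McNemar chi-squared statistic, df = 1.
mcnemar = (n12 - n21) ** 2 / (n12 + n21)

# (c) Conditional ML estimate of the log odds ratio and the odds ratio.
beta_hat = log(n12 / n21)
odds_ratio = n12 / n21
```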

296 (SE = 0.4 (df = 4). Pearson statistic = 7. The ranking is Graf.01 (df = 1) and is identical to quasi symmetry.240). Marginal homogeneity is plausible. 25. β̂3 = . this is n12 /n21 . . Kappa = . Setting the β̂5 = 0 for Sanchez.585. df = 3 − 1 = 2. Seles.389 (SE = .591 − 0. .53 for Seles. there is a contribution of 1 to the numerator or the denominator. 0. β̂2 = 1. This is a conditional odds ratio. c.4)] = . a.2 (df = 6).90). Coke is preferred to Classic Coke. or the reverse.91) for β1 − β2 translates to (. kappa = 0. a.73 for Sabatini. or in terms of the original 2×2 table. The estimate β̂1 − β̂2 = −. with G2 = 0. A 95% CI of (-1.4)/[1 + exp(−.0635). G2 = 4. Using a 98% CI for each of the 10 pairs. Sanchez. Navratilova.669.71) for the probability of a Seles win. b. and β̂4 = 1.57. so averaging over them to get the marginal Y1 and Y2 gives bino- mials with the same parameters. on the main diagonal. .81. b. The parameter estimates for Coke. the individual trials for the conditional model are identical as well as independent. and quasi independence has X 2 = .580 (SE = 0. and 0. df = 1. weighted kappa equals .40 has SE = .71. Symmetry model has X 2 = . and McNemar‘s test compares proportions for dependent samples. a. The t test is valid for interval-scale data (with normally-distributed differences. When {αi } are identical.8. yet there is clearly strong association in the table. sample proportion = 29/49 = 0. exp(−. 13. The matched-pairs t test compares means for dependent samples. and Classic Coke are 0.40. Independence has X 2 = 45. Thus. The overall estimator then is the ratio of the numbers of such pairs. Sabatini. 15. model estimate = 0. which has a two-tail P -value of . b. only the difference between Graf and Sanchez is significant. Under independence. In the three-way representation. 23. This is simply the mean of the expected values of the individual binary observations. so that table makes no contribution to the statistic. b. 21. 
The symmetry and quasi independence models fit well.427 (SE = .59. 17. then each cross-product that contributes to the M-H estimator equals 0. for small samples) whereas McNemar’s test is valid for binary data. Good fit. depending on whether the first observation is a success and the second a failure. d. c. Otherwise.93 for Graf. df = 1 5. G2 (S | QS) = 0. fitted = 5 = observed. c. 2/5.6.060).09 for Navratilova. β̂1 = 1. X 2 = 3.006 = 0. note that each partial table has one observation in each row. If each response in a partial table is identical. a.15. conditional on the subject. For testing fit of the model.59. Pepsi.005 and provides strong evi- dence that the response rate of successes is higher for drug A. 21 The McNemar z = 2.240). but the other model is a marginal model so its odds ratio is not conditional on the subject. based on df = 3 (P = .3. This also equals the sample proportion.
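The step from a difference of Bradley-Terry parameters to a win probability uses P = exp(d)/[1 + exp(d)] with d = β1 − β2. The sketch below applies it to the two values that can be read from this solution, the estimate −.40 and the upper confidence limit .91; the function name is hypothetical.

```python
from math import exp


def win_probability(beta_diff):
    """Probability that player 1 beats player 2 when beta_1 - beta_2 = beta_diff."""
    return exp(beta_diff) / (1.0 + exp(beta_diff))


point_estimate = win_probability(-0.40)   # from the estimated difference
upper_limit = win_probability(0.91)       # from the upper CI limit for the difference
```

Because the transformation is monotone increasing, the endpoints of a CI for β1 − β2 map directly to the endpoints of a CI for the probability.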

identifying βb with αb (1 − β). Subjects can select any number of the sources.11 for the model with separate effects).20) = . . It is a special case of the quasi association model in which the main-diagonal parameters are replaced by a common pa- rameter δ.19. df = 1. With a linear effect for age using scores 9. One can then use such a parsimonious model that sets certain parameters to be equal.46 times the estimated odds for black subjects. c.063 compared to values around . exp(−.2. Since R = G = S1 = S2 = 0. for cigarettes.22) = 1.66 for cigarettes. a µ̂aa = a naa . a. β = κ = 0 is equivalent to independence for this model. b.2 (df = 3). which then implies marginal homogeneity and quasi symmetry as special cases. Likewise. a b ua ub µ̂ab = P P b ua ub nab . so it also satisfies quasi independence. The estimated correlation is weak.97. a. the GEE estimate of the age effect is .82. Estimated odds ratio = exp(1.22 29.2).086 (SE = . for all a and b. a. / .02. (.0001).46. Since πab = πba . µ̂+b = n+b . P P P P a Chapter 11 1. the likelihood-ratio statistic equals 1322. it satisfies symmetry. by row. and . b.37) = 9. 39. for marijuana. To test marginal homogeneity. .37) = 1. Bhapkar W = 12. exp(−. µ̂a+ = na+ . P < .93 + . For a 6= b. c.3. Likelihood equations are. . D. and thus results in a smaller SE for the estimate of that effect (.003). e. The P -value . 12. For source A the estimated size effect is 1.38) = 1.57).025).8 (P = . . so results will not be much different from treating the 5 responses by a subject as if they came from 5 independent subjects. so the estimated odds for white subjects are exp(0.57 and estimated odds = exp(−.005) b.08 and highly significant (Wald statistic = 6. 0. estimated logit is −. The multinomial distribution does not apply to these 40 cells.93) = 6. 0.86 for alcohol. d.20+ . so a given subject could have anywhere from 0 to 5 observations in this table. / 0.20+0. 41. For alcohol. For sources C.10. πab has form αa βb . 
The general CMH statistic equals 14. 3. estimated odds ratio = exp(−. and E the size effect estimates are all roughly -. a.42 for marijuana. based on the exchangeable working correlation. 11.2.10. c. extremely strong evidence of differences among the marginal distributions. . 9. a.89. c. and β = κ = 1 is equivalent to perfect agreement.3 and the general CMH statistic equals 1354. Race does not interact with gender or substance type. The association term is symmetric in a and b. 10. 7. showing strong evidence against marginal homogeneity (P = . Estimated odds ratio = exp(1. from 0 to 5. The sample proportions of yes responses are .0 with df = 2. Consider the 3×3 table with cell probabilities.10.

CMH methods summarize information from the counts in the various strata. Also. 21. The model-based estimate tends to be better P when the model holds. P 25. V = [v(µi )]−1 = P P i i ∂β ∂β [ i µ−1 i ] −1 = [n/β]−1 = β/n.0006) is even smaller than in (a). 11. They are equivalent if one assumes in addition that the distribution is in the natural exponential .  ′ ∂µi 23. Since v(µi ) = µi for the Poisson and since µi = β. 23 (.29) gives strong evidence that the active drug group tended to fall asleep more quickly. treating them as hypergeometric after conditioning on row and column totals in each stratum. the actual asymptotic variance that allows for vari- P ance misspecification is ∂µi ′ ∂µi X    [v(µi )]−1 Var(Yi )[v(µi )]−1 µ−1 2 −1 2 X V V = (β/n)[ i µi µi ](β/n) = β /n. u(β) = v(µi )−1 (yi − µi ) = µ−1 i (yi − µi ) = i (yi − P P P i ∂β i   ′  −1 ∂µi ∂µi β)/β. b. as the test is focused on df = 1. the model-based asymptotic variance is ∂µi ′ −1 ∂µi X   [v(µi )]−1 (1/µi )]−1 = β/n. QL assumes only a variance structure but not a particular distribution. yi = nβ. X V= =[ i ∂β ∂β i Thus. If we add association terms for the other pairs of ages. the last expression simplifies (using µi = β) to i (yi − ȳ)2 /n2 . λ̂ = 1.81 and X 2 = 0. and the generalized hypergeometric distribution is degenerate and has variance 0 for each count. Also. Since ∂µi /∂β = 1. and the robust estimate tends to be better when there is severe overdispersion so that the model-based estimate tends to underestimate the SE. the model-based asymptotic variance estimate is ȳ/n. First-order Markov model has G2 = 40. i ∂β ∂β i Replacing the true variance µ2i in this expression by (yi − ȳ)2 . we get G2 = 0. X X = (β/n)[ (1/µi )Var(Yi )(1/µi )](β/n) = ( i i 2 2 which is estimated by [ i (Yi − ȳ) ]/n . a poor fit.12).84 (df = 5) and a good fit.52 (SE = . a. The actual asymptotic variance that allows for variance misspecification is ∂µi ′ ∂µi X    V [v(µi )]−1 Var(Yi )[v(µi )]−1 V i ∂β ∂β Var(Yi ))/n2 . 
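For the Poisson-type quasi-likelihood with constant mean µi = β, the two variance estimates discussed in this solution are ȳ/n (model-based) and Σ(yi − ȳ)²/n² (robust, allowing for variance misspecification). A small sketch comparing them on overdispersed counts follows; the data are made up for illustration.

```python
def null_model_variances(y):
    """Return (beta_hat, model_based_variance, robust_variance) for mu_i = beta."""
    n = len(y)
    ybar = sum(y) / n                       # solves sum(y_i - beta) / beta = 0
    model_based = ybar / n                  # estimate of beta / n
    robust = sum((yi - ybar) ** 2 for yi in y) / n ** 2
    return ybar, model_based, robust


# Made-up counts with far more spread than a Poisson with mean 3 would show.
counts = [0, 0, 1, 2, 12]
beta_hat, model_based, robust = null_model_variances(counts)
```

With these counts the robust estimate is several times the model-based one, which is the situation in which the model-based SE understates the uncertainty.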
GEE estimate of cumulative log odds ratio is 2. the stratum for the subject has observations in one column only. for those at the two highest levels of initial time to fall asleep.08 (SE = . 27. Setting this equal to 0.0 (df = 8). 13. 15. similar to ML. When a subject makes the same response for each drug.

Because of the extra variance component. double the change in maximized log likelihood is 13. 3. A disadvantage is not having a likelihood function and related likelihood- ratio tests and confidence intervals. An advantage is being able to extend ordinary GLMs to allow for overdispersion. the odds of having used cigarettes are estimated to equal exp[1. yt−1 . .5 reflects strong associations among the three responses.6 on df = 10. Yt+1 does not depend on y0 . . so the evidence of a period effect is weak.19.0 times the odds of having used marijuana. a.25 times the estimated odds for the first question. 9. P = . for a given subject for any sequence. 33. β̂ converges to 0. is . the probability of transition to a particular state is independent of the time of transition. Chapter 12 1.59 comparing B and C).24 family with that variance function. The simpler model is adequate.37). They are not consistent if one misspecifies the model for the mean.51 (SE = . the estimate of β is not as precise.6209 − (−. B and C are better than A. Same as conditional ML β̂ = log(203/90) (SE = 0. 31. For a given department. Yes.163) = 1. c. For a given subject. d. They are consistent if the model for the mean is correct.13 times the estimated odds for B (and odds ratio = .127).176. . Taking into account SE values. but only a variance function and a correlation structure. The large value of σ̂ = 3. The estimated mean log odds ratio between gender and admissions. the estimated odds of admission for a female are exp(. d.07) = . e.127. c. b. because it is still true that given yt .35).19 times the estimated odds of admission for a male.99 (SE = . given the state at a particular time.7751) = 11. the estimated odds of admission for a female are exp(.813 with SE = 0.93 is in a different direction. a. 7.99) = . b.g. the esti- mated odds of approval for the second question are exp(.08 comparing A to C and .19 for comparing models. 
even if one misspecifies the variance function and correlation structure. The marginal odds ratio of exp(−. for instance by permitting the variance to be some constant multiple of the variance for the usual GLM. the likelihood-ratio statistic = . Comparing the simpler model with the model in which treatment effects vary by se- quence. a.173) = 1. with σ̂ = 0. GEE does not assume a parametric distribution. . Adding period effects to the simpler model. For a given subject. corresponding to an odds ratio of 1. For a given department. the number of which depends on starting values. β̂B = 1.. the estimated odds of relief for A are exp(−1. y1 . given department.813) = 2. 11. df = 2. For this model. permitting heterogeneity among departments.18 times the estimated odds of admission for a male. corresponding . β̂C = 2.5. With a sufficient number of quadrature points. b.
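Several Chapter 12 figures on this page can be reproduced from numbers in the text. The SE formula sqrt(1/n12 + 1/n21) for the conditional ML log odds ratio is the standard matched-pairs result and is an assumption here, since the solution quotes only the value 0.127.

```python
from math import exp, log, sqrt

n12, n21 = 203, 90

# Conditional ML estimate log(203/90) and its usual standard error.
beta_hat = log(n12 / n21)
se = sqrt(1 / n12 + 1 / n21)       # assumed standard formula, not stated in the text

# Subject-specific odds ratio for the second question versus the first.
odds_ratio_question = exp(0.813)

# Cigarettes versus marijuana: exp[1.6209 - (-.7751)].
odds_ratio_cigarettes = exp(1.6209 - (-0.7751))
```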

39 (SE = . The random effects model assumes the true log odds ratios come from a normal dis- tribution. the fit is that of the symmetry model. so the integral is the same. Since Z − zit ui has a ′ ′ ′ N(0.06). a. the ratio in each row estimates exp(β). the sum of the counts across the rows gives row totals for which the ratio also estimates exp(β). Thus. The estimated probability is mono- tone increasing in α̂. 35.1. and the P -value is half that of a χ21 variate. When σ̂ is large. 21. and is 0 to many decimal places. which in the univariate case is 1 + σ 2 . . b. the log likelihood is flat and many N values are consistent with the sample. σ̂2 = 1. the model is equivalent to the marginal one deleting the random effect. For the (I − 1) × 2 table of the off-diagonal counts from each collapsed table. b. shrinking them toward a common mean. b. For a given subject. It is also the random effects ML estimate if the log odds ratio in that collapsed table is non- negative. The null distribution is an equal mixture of degenerate at 0 and χ21 . t he probability in the integrand is Φ(xit β[1 + zit Σzit ]−1/2 ). 17.33 between random effects. c. and by results in Chapter 9 on collapsibility is possible when Department is associated both with gender and with admissions.07 (SE = . a. The likelihood-ratio statistic equals −2(−593 − (−621)) = 56. probability = odds/(1 + odds) = exp[logit(qi ) + α]/[1 + exp[logit(qi ) + α]]. Two terms drop out because µ̂11 = n11 and µ̂22 = n22 .8. exp[logit(qi )] = exp[log(qi )−log(1−qi )] = qi /(1−qi ). There is more support for increasing government spending on education than on the others. This is Simpson’s paradox. 1 + zit Σzit ) distribution. When σ̂ = 0. 19. d. e. c. 31. for which log(µ̂21 /µ̂12 ) = 0. ′ ′ 29. so Z ′ ′ P (Yit = 1) = P (Z ≤ xit β + zit ui )f (u. 27. If the model is incorrect. The model applies to that collapsed table and those estimates are consistent for it. Then. 
the odds of response (Yit ≤ j) for the second observation are exp(β) times those for the first observation. with σ̂1 = 4. The estimate given is the conditional ML estimate of β for the collapsed table. Setting β0 = 0. P (Yit = 1|ui ) = Φ(xit β + zit ui ). so does the estimated Democratic vote in this election. There is extremely strong evidence that σ > 0. Thus. as the Democratic vote in the previous election increases. β̂2A − β̂1A = . Σ)dui . which does not depend on ui . the actual coverage probability may be much less than the nominal probability.09). The parameters in the marginal model√equal those in the GLMM divided by [1 + ′ zit Σzit ]1/2 . The corresponding marginal model has a population-averaged rather than subject-specific interpretation. a. 15. The log of this ratio is β̃. A narrower interval is not necessarily more reliable. ′ where Z is a standard normal variate that is independent of ui . Also. It smooths the sample values. 25 to an odds of being admitted that is lower for females than for males. β̂2M − β̂1M = . and estimated correlation .
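The probit result above, that integrating out a normal random effect leaves marginal parameters equal to the GLMM parameters divided by the square root of 1 + σ² in the univariate case, can be checked numerically. The sketch below assumes a random intercept model with a single linear predictor value; the values eta = 0.8 and sigma = 1.5 are made up for illustration, and the function names are hypothetical.

```python
import math

def norm_cdf(x):
    """Standard normal cdf, Phi(x)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def norm_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def marginal_prob_numeric(eta, sigma, half_width=10.0, steps=20000):
    """Integrate Phi(eta + u) against the N(0, sigma^2) density (trapezoid rule)."""
    lo, hi = -half_width * sigma, half_width * sigma
    h = (hi - lo) / steps
    total = 0.0
    for k in range(steps + 1):
        u = lo + k * h
        weight = 0.5 if k in (0, steps) else 1.0
        total += weight * norm_cdf(eta + u) * norm_pdf(u / sigma) / sigma
    return total * h

def marginal_prob_closed(eta, sigma):
    """Closed form Phi(eta / sqrt(1 + sigma^2)) from the argument in the text."""
    return norm_cdf(eta / math.sqrt(1.0 + sigma ** 2))

eta, sigma = 0.8, 1.5  # illustrative values only
numeric = marginal_prob_numeric(eta, sigma)
closed = marginal_prob_closed(eta, sigma)
print(numeric, closed)
```

The two numbers should agree to many decimal places, which illustrates why the population-averaged effect is smaller in absolute value than the subject-specific one.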

..177 and sample variance = . We estimate that Shaq’s probability of success varies from game to game with a mean of . but SE = ..0 for the random effect.015 per million miles of travel. . Since q = 2.454 for the mean of theqbeta distribution for πi .454 and standard deviation of .yT = [ P (Yt = yt | Z = z)]P (Z = z). with estimated standard deviation 1. Since the dispersion parameter estimate is rel- atively small.071(. log µ̂ = −4. The estimated difference of means is 1.02.552 ± 1.9) simplifies n to y µ (1 − µ)n−y .102 with SE = . the model has residual and df = I T − qT (I − 1) − q = 0 and is saturated.546) = . The estimated constant accident rate is exp(−4. the same as for a Poisson model.196 for the Poisson model and SE = .. I = 2.86).442. 17. The ρ̂ estimates for the four groups are . a 95% Wald confidence interval for the difference of means is 1. 7. When  θ = 0. In the multinomial log likelihood.164). y . 9. there is mild overdispersion with ρ̂ = . 29.133. The estimated standard deviation of that distribution is then . An estimate of -.26 Chapter 13 1. Only the placebo group shows evidence of overdispersion. the SE of .186 for the common value of the logit corresponds to an estimate of . The only significant difference is between whites and blacks.186 (SE = .96(.yT . or (. the beta distribution is degenerate at µ and formula (13. Using the negative binomial model.. The intercept estimate is -0.132 for the log rate for the Poisson model in Problem 9.665 for the negative binomial model. a. its estimate is -. 2.. 13. and . also showing evidence of overdispersion. There is not strong evidence of a litter size effect.32.02.454 × . sample mean = . b. and T = 3. b. 11.yT log πy1 . the parameter in the null must fall in the interior of the parameter space.. X ny1 .a. With the QL approach with beta-binomial type variance.177) = .19.153 for the log rate with this model is not much greater than the SE of .05 + 0...19x.071 (logit link).665). 
The null model falls on the boundary of the parameter space in which the weights given the two components are (1.24x.03.25.78 + 0.133. 0).. z=1 t=1 27. 25.. For the other group. log µ̂ = −5..070. The Poisson SE is not realistic because of the extreme overdispersion. one substitutes q Y X T πy1 .552 for each model. -. Including litter size as a predictor. For ordinary chi-squared distributions to apply. 15.
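The latent class decomposition quoted above, in which the joint probability of (y1, ..., yT) is a sum over classes z of the product of the conditional probabilities P(Yt = yt | Z = z) times P(Z = z), and the residual df count I^T − qT(I − 1) − q, can be illustrated with a short Python sketch. The class and item probabilities below are invented for illustration and are not estimates from any exercise; the function names are hypothetical.

```python
from itertools import product

def latent_class_prob(y, class_probs, item_probs):
    """Joint probability of the response pattern y under local independence.

    class_probs[z] is P(Z = z); item_probs[z][t][k] is P(Y_t = k | Z = z).
    """
    total = 0.0
    for z, pz in enumerate(class_probs):
        term = pz
        for t, yt in enumerate(y):
            term *= item_probs[z][t][yt]
        total += term
    return total

def residual_df(I, T, q):
    """Residual df for q latent classes, T items, I response categories each."""
    return I ** T - q * T * (I - 1) - q

# Made-up parameters for q = 2 classes and T = 3 binary items (I = 2).
class_probs = [0.6, 0.4]
item_probs = [
    [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3]],  # class 0
    [[0.2, 0.8], [0.3, 0.7], [0.4, 0.6]],  # class 1
]

cells = list(product(range(2), repeat=3))
probs = [latent_class_prob(y, class_probs, item_probs) for y in cells]
print(sum(probs), residual_df(2, 3, 2))
```

With q = 2, I = 2, and T = 3 the df count is 0, matching the statement that the model is saturated, and the eight cell probabilities sum to 1.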

31. E(Yit) = πi = E(Yit²), so Var(Yit) = πi − πi² = πi(1 − πi). Also, Cov(Yit, Yis) = Corr(Yit, Yis) √([Var(Yit)][Var(Yis)]) = ρπi(1 − πi). Then,
Var(Σt Yit) = Σt Var(Yit) + 2 Σs<t Cov(Yit, Yis) = niπi(1 − πi) + ni(ni − 1)ρπi(1 − πi) = niπi(1 − πi)[1 + (ni − 1)ρ].
33. A binary response must have variance equal to µi(1 − µi) (see, e.g., Problem 13.31), which implies φ = 1 when ni = 1.
37. The likelihood is proportional to
[k/(µ + k)]^(nk) [µ/(µ + k)]^(Σi yi).
The log likelihood depends on µ through
−nk log(µ + k) + Σi yi [log µ − log(µ + k)].
Differentiating with respect to µ, setting the result equal to 0, and solving for µ yields µ̂ = ȳ.
45. With a normally distributed random effect, with positive probability the Poisson mean (conditional on the random effect) is negative.

Chapter 14

5. Using the delta method, the asymptotic variance is (1 − 2π)²/4n. This vanishes when π = 1/2, and the convergence of the estimated standard deviation to the true value is then faster than the usual rate.
7a. By the delta method with the square root function, √n[√(Tn/n) − √µ] is asymptotically normal with mean 0 and variance (1/(2√µ))²(µ) = 1/4, or in other words √Tn − √(nµ) is asymptotically N(0, 1/4).
b. If g(p) = arcsin(√p), then g′(p) = (1/√(1 − p))(1/(2√p)) = 1/(2√(p(1 − p))), and the result follows using the delta method. Ordinary least squares assumes constant variance.
13. a. The vector ∂π/∂θ equals (2θ, 1 − 2θ, 1 − 2θ, −2(1 − θ))′. Multiplying this by the diagonal matrix with elements [1/θ, [θ(1 − θ)]^(−1/2), [θ(1 − θ)]^(−1/2), 1/(1 − θ)] on the main diagonal shows that A is the 4×1 vector A = [2, (1 − 2θ)/[θ(1 − θ)]^(1/2), (1 − 2θ)/[θ(1 − θ)]^(1/2), −2]′. Since A′A = 8 + 2(1 − 2θ)²/θ(1 − θ), the asymptotic variance of θ̂ is (A′A)^(−1)/n = 1/n[8 + 2(1 − 2θ)²/θ(1 − θ)], which simplifies to θ(1 − θ)/2n. This is maximized when θ = .5, in which case the asymptotic variance is 1/8n. From Problem 3.31, recall that θ̂ = (p1+ + p+1)/2. When θ = 0, then p1+ = p+1 = 0 with probability 1, so θ̂ = 0 with probability 1, and the asymptotic variance is 0. When θ = 1, θ̂ = 1 with probability 1, and the asymptotic variance is also 0. In summary, the asymptotic normality of θ̂ applies for 0 < θ < 1, that is when θ is not on the boundary of the parameter space. This is one of the regularity conditions that is assumed in deriving results about asymptotic distributions.
b. The asymptotic covariance matrix is (∂π/∂θ)(A′A)^(−1)(∂π/∂θ)′ = [θ(1 − θ)/2][2θ, 1 − 2θ, 1 − 2θ, −2(1 − θ)]′[2θ, 1 − 2θ, 1 − 2θ, −2(1 − θ)].
c. df = N − t − 1 = 4 − 1 − 1 = 2, 1 higher than for the independence model.
d. This is a parsimonious special case of the independence model. (Note: This model is equivalent to the Poisson loglinear model log mij = µ + λi + λj, which gives marginal homogeneity; that is, there is independence plus each marginal distribution has the same parameters. There are four cells and two parameters in the model, so the usual df formula for loglinear models tells us df = 2.)
17. The vector of partial derivatives, evaluated at the parameter value, is zero. Hence the asymptotic normal distribution is degenerate, having a variance of zero. Using the second-order terms in the Taylor expansion yields an asymptotic chi-squared distribution.
23. X² and G² necessarily take very similar values when (1) the model holds, (2) the sample size n is large, and (3) the number of cells N is small compared to the sample size n, so that the expected frequencies in the cells are relatively large.

Chapter 15

17a. From Sec. 15.2.1, the Bayes estimator is (n1 + α)/(n + α + β), in which α > 0, β > 0. No proper prior leads to the ML estimate, n1/n.
b. The ML estimator is the limit of Bayes estimators as α and β both converge to 0.
c. This happens with the improper prior, proportional to [π1(1 − π1)]^(−1), which we get from the beta density by taking the improper settings α = β = 0.
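As a numerical check on Problem 14.13a, the following Python sketch builds the vector A from ∂π/∂θ and confirms that (A′A)^(−1)/n reduces to θ(1 − θ)/2n. It assumes the cell probabilities are (θ², θ(1 − θ), θ(1 − θ), (1 − θ)²), which is what the stated derivative vector and diagonal matrix imply; the function names and the choice n = 100 are hypothetical.

```python
import math

def model_probs(theta):
    """Assumed cell probabilities for the one-parameter 2x2 model."""
    return [theta ** 2, theta * (1 - theta), theta * (1 - theta), (1 - theta) ** 2]

def dpi_dtheta(theta):
    """Derivative of each cell probability with respect to theta."""
    return [2 * theta, 1 - 2 * theta, 1 - 2 * theta, -2 * (1 - theta)]

def a_vector(theta):
    """A = diag(pi)^(-1/2) times d(pi)/d(theta)."""
    return [d / math.sqrt(p) for d, p in zip(dpi_dtheta(theta), model_probs(theta))]

def asymptotic_variance(theta, n):
    """Asymptotic variance of the ML estimator, (A'A)^(-1) / n."""
    ata = sum(a * a for a in a_vector(theta))
    return 1.0 / (n * ata)

n = 100
for theta in (0.1, 0.3, 0.5, 0.8):
    # The two printed values on each row should coincide.
    print(theta, asymptotic_variance(theta, n), theta * (1 - theta) / (2 * n))
```

The check is valid only for 0 < θ < 1, in line with the boundary remark in the solution; at θ = 0 or θ = 1 the vector A is undefined because a cell probability is 0.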