Example

[Table: approval ratings cross-classified by First Survey and Second Survey; counts lost in extraction]
Dependent Proportions
• The probabilities of approval at the two surveys are π1+ and
π+1 , the first row and first column totals.
• When these are identical, the probabilities of disapproval are
also identical, and there is marginal homogeneity.
• Note that π1+ − π+1 = π12 − π21, so marginal homogeneity holds if and only if π12 = π21.
Inference For Dependent Proportions
• Use δ = π+1 − π1+, estimated by d = p+1 − p1+.
• Thus,
var(√n d) = π1+(1 − π1+) + π+1(1 − π+1) − 2(π11π22 − π12π21)
σ̂²(d) = [p1+(1 − p1+) + p+1(1 − p+1) − 2(p11p22 − p12p21)]/n
       = [(p12 + p21) − (p12 − p21)²]/n    (1)
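The two forms of σ̂²(d) in equation (1) can be checked numerically. A minimal sketch in Python, using the matched-pairs counts from the diabetes–MI table that appears later in these notes (n11 = 9, n12 = 16, n21 = 37, n22 = 82):

```python
import math

# Matched-pairs counts (diabetes-MI table from these notes)
n11, n12, n21, n22 = 9, 16, 37, 82
n = n11 + n12 + n21 + n22

p11, p12, p21, p22 = (x / n for x in (n11, n12, n21, n22))
p1p = p11 + p12          # p_{1+}, first row total
pp1 = p11 + p21          # p_{+1}, first column total

d = pp1 - p1p            # estimates delta = pi_{+1} - pi_{1+}

# Long form of the estimated variance (first line of (1))
var_long = (p1p * (1 - p1p) + pp1 * (1 - pp1)
            - 2 * (p11 * p22 - p12 * p21)) / n
# Simplified form (second line of (1))
var_d = ((p12 + p21) - (p12 - p21) ** 2) / n

se_d = math.sqrt(var_d)
z = d / se_d                                  # Wald statistic
ci = (d - 1.96 * se_d, d + 1.96 * se_d)       # 95% Wald CI for delta
print(d, se_d, z, ci)
```

The two variance expressions agree exactly, which verifies the algebraic simplification in (1).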
McNemar’s Test
• Under H0 , an alternative estimated variance is
σ̂0²(d) = (p12 + p21)/n = (n12 + n21)/n²
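Using this null-variance estimate gives McNemar's statistic, which depends only on the discordant counts. A short sketch, again with the counts n12 = 16, n21 = 37 from the diabetes–MI table in these notes:

```python
import math

# Discordant counts from a 2x2 matched-pairs table (diabetes-MI example)
n12, n21, n = 16, 37, 144

d = (n21 - n12) / n                  # estimate of delta = pi_{+1} - pi_{1+}
var0 = (n12 + n21) / n ** 2          # null-variance estimate sigma0^2(d)
z = d / math.sqrt(var0)              # = (n21 - n12) / sqrt(n12 + n21)
chi2 = z ** 2                        # McNemar chi-squared, df = 1
print(z, chi2)
```

Note that n cancels: z reduces to (n21 − n12)/√(n12 + n21), so concordant pairs do not affect the test.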
                Response
Pair   Survey   1   0
1      First    1   0
       Second   1   0
2      First    0   1
       Second   0   1
3      First    0   1
       Second   1   0
4      First    1   0
       Second   0   1
Connection
• An alternative representation of binary responses for n
matched pairs presents the data in n partial tables, one 2 × 2
table for each pair.
• For our example, we have a 2 × 2 × 1600 contingency table.
• For each subject, suppose that the probability of approval is
identical in each survey. Then conditional independence exists
between the opinion outcome and the survey time, controlling
for the subject.
Discussion
• The probability of approval is also the same for each survey in
the marginal table collapsed over the subjects.
• But this implies that the true probabilities satisfy marginal
homogeneity. Thus, a test of conditional independence in the
2 × 2 × 1600 table provides a test of marginal homogeneity.
• The CMH test statistic for this purpose is identical to the
squared McNemar’s statistic.
• McNemar’s test is a special case of the CMH test applied to
the binary responses of n matched pairs displayed in n partial
tables.
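This equivalence is easy to verify numerically. A sketch that represents each matched pair as its own 2 × 2 partial table (rows = survey time, columns = response 1/0) and computes the CMH statistic; the pair counts below are hypothetical, since the survey counts were lost from these slides:

```python
# Hypothetical pair counts: (y1, y2) = (1,1), (1,0), (0,1), (0,0)
a, b, c, d = 425, 120, 85, 170

pairs = [(1, 1)] * a + [(1, 0)] * b + [(0, 1)] * c + [(0, 0)] * d

num = 0.0   # sum over strata of (n11k - E[n11k])
den = 0.0   # sum over strata of var(n11k)
for y1, y2 in pairs:
    N = 2                         # each partial table has 2 observations
    row1, col1 = 1, y1 + y2       # row totals are 1; column-1 total is y1+y2
    e = row1 * col1 / N           # hypergeometric mean of n11k
    v = (row1 * (N - row1) * col1 * (N - col1)) / (N ** 2 * (N - 1))
    num += y1 - e                 # n11k is the first observation y1
    den += v

cmh = num ** 2 / den
mcnemar = (b - c) ** 2 / (b + c)  # (n12 - n21)^2 / (n12 + n21)
print(cmh, mcnemar)
```

Concordant pairs contribute zero to both numerator and denominator, so the CMH statistic collapses to McNemar's chi-squared exactly.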
Conditional Logistic Regression For Binary Matched Pairs
• Let (Y1 , Y2 ) denote the pair of binary observations for a
randomly selected subject.
• The difference δ = P (Y2 = 1) − P (Y1 = 1) between the marginal
probabilities occurs as a parameter in
P (Yt = 1) = α + δxt
where x1 = 0 and x2 = 1.
• Then P (Y1 = 1) = α and P (Y2 = 1) = α + δ.
• Alternatively, the logit link yields
logit[P (Yt = 1)] = α + βxt
Marginal Models
• The previous models are marginal models.
• They focus on the marginal distributions of responses for the
two observations.
• For instance, the ML estimate of β in the previous logit model
is the log odds ratio of the marginal proportions:
β̂ = log[(p+1 p2+)/(p+2 p1+)]
Conditional Models
• By contrast, the subject specific tables implicitly allow
probabilities to vary by subject.
• Let (Yi1 , Yi2 ) denote the i-th pair of observations, i = 1, · · · , n.
• A model then has the form
logit[P (Yit = 1)] = αi + βxt
where x1 = 0 and x2 = 1.
• It assumes a common effect β.
• For subject i,
P (Yi1 = 1) = exp(αi)/[1 + exp(αi)],   P (Yi2 = 1) = exp(αi + β)/[1 + exp(αi + β)]
Discussions
• A subject with a large positive αi has high P (Yit = 1) for each
t and is likely to have success each time.
• The greater the variability in {αi }, the greater the overall
positive association between responses.
• This is true for any β.
• No association occurs only when {αi } are identical.
• The model accounts for the dependence within matched pairs.
• Through its structure, it permits only nonnegative association
between the two responses.
Conditional ML Inference
• The large number of parameters {αi } causes difficulties with the fitting process.
• Conditional ML treats them as nuisance parameters and
maximizes the likelihood function for a conditional distribution
that eliminates them.
• The conditional ML estimator of β in the subject specific
model and its standard error are:
β̂ = log(n21 /n12 ),   SE = √(1/n21 + 1/n12 )
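A minimal sketch computing the conditional ML estimate and its standard error, with the discordant counts n12 = 16, n21 = 37 read from the diabetes–MI table in these notes:

```python
import math

# Discordant pair counts from the diabetes-MI matched-pairs table
n12, n21 = 16, 37

beta_hat = math.log(n21 / n12)            # conditional ML estimate of beta
se = math.sqrt(1 / n21 + 1 / n12)         # its standard error
or_hat = n21 / n12                        # subject-specific odds ratio
wald_ci = (math.exp(beta_hat - 1.96 * se),
           math.exp(beta_hat + 1.96 * se))  # 95% CI for the odds ratio
print(beta_hat, se, or_hat, wald_ci)
```

As with McNemar's test, only the discordant pairs carry information about β.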
                   MI Controls
MI Cases       Diabetes  No Diabetes  Total
Diabetes              9           16     25
No Diabetes          37           82    119
Total                46           98    144
Example
• Consider the model:
Marginal homogeneity: πi+ = π+i , i = 1, · · · , I
Symmetry: πij = πji
Symmetry Models
• The ML fit of the symmetry model is
µ̂ij = (nij + nji )/2
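The fit is immediate to compute. A short sketch applying µ̂ij = (nij + nji)/2 to the pathologist agreement table shown later in these notes, together with the likelihood-ratio statistic G²:

```python
import math

# Pathologist ratings table (from these notes): rows = X, columns = Y
n = [[22, 2, 2, 0],
     [5, 7, 14, 0],
     [0, 2, 36, 0],
     [0, 1, 17, 10]]
I = len(n)

# ML fit of the symmetry model: average each cell with its mirror cell
mu = [[(n[i][j] + n[j][i]) / 2 for j in range(I)] for i in range(I)]

# Likelihood-ratio statistic; cells with n_ij = 0 contribute zero
g2 = sum(2 * n[i][j] * math.log(n[i][j] / mu[i][j])
         for i in range(I) for j in range(I) if n[i][j] > 0)

df = I * (I - 1) // 2   # residual df for symmetry on an I x I table
print(g2, df)
```

The diagonal cells are fitted perfectly (µ̂ii = nii), so only the off-diagonal cells contribute to G².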
Second Purchase
First
purchase   High Point   Taster's   Sanka   Nescafe   Brim   Total
[counts lost in extraction]
Quasi-Symmetry
• The symmetry model is so simple that it rarely fits well.
• One can accommodate marginal homogeneity by permitting
the loglinear main-effects terms to differ.
• The resulting model, called the quasi-symmetry model, is
log µij = λ + λX_i + λY_j + λij
where λij = λji for all i and j.
Quasi-Symmetry Model
• The fitted marginal totals equal the observed totals: µ̂i+ = ni+
and µ̂+i = n+i , i = 1, · · · , I.
• The symmetry model is the special case in which λX_i = λY_i for all i.
• The independence model is the special case in which all λij = 0.
• This model is useful partly because it contains these two
models as special cases.
• Fitting the quasi-symmetry model requires iterative
procedures for loglinear models.
An Ordinal Quasi-Symmetry Model
• Consider the case with ordinal responses.
• Let u1 ≤ u2 ≤ · · · ≤ uI denote ordered scores for both the row
and column categories.
• The ordinal quasi-symmetry model has the form
log µij = λ + λi + λj + βuj + λij
where λij = λji for all i and j.
• It is the special case of the quasi-symmetry model in which
λY_j − λX_j = βuj
Discussions
• For this model, the fitted marginal counts need not equal the
observed marginal counts, but they do have the same means.
• When β > 0, π1+ > π+1 , π1+ + π2+ > π+1 + π+2 , and so forth.
• That is, responses are more likely to be at the low end of the
ordinal scale for the row variable than for the column variable.
• When β < 0, the mean response is higher for the row
classification.
• The ordinal quasi-symmetry model is equivalent to the model
of logit form
logit(πij /πji ) = β(uj − ui )
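The logit form shows that β can be estimated from the off-diagonal cells alone: for each pair i < j, nij is binomial(nij + nji) with success logit β(uj − ui). A Newton–Raphson sketch using the pathologist table from these notes, with illustrative scores u = (1, 2, 3, 4) (the choice of scores is an assumption here):

```python
import math

# Pathologist ratings table (from these notes) and assumed ordinal scores
n = [[22, 2, 2, 0],
     [5, 7, 14, 0],
     [0, 2, 36, 0],
     [0, 1, 17, 10]]
u = [1, 2, 3, 4]
I = len(n)

beta = 0.0
for _ in range(25):                      # Newton-Raphson iterations
    score, info = 0.0, 0.0
    for i in range(I):
        for j in range(i + 1, I):
            m = n[i][j] + n[j][i]        # discordant total for pair (i, j)
            if m == 0:
                continue
            dij = u[j] - u[i]
            p = 1 / (1 + math.exp(-beta * dij))  # P(cell (i,j) | pair)
            score += dij * (n[i][j] - m * p)     # score function
            info += m * dij ** 2 * p * (1 - p)   # observed information
    beta += score / info
print(beta)
```

The log-likelihood is concave in the single parameter β, so Newton–Raphson converges quickly from β = 0.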
Pathologist Y
Pathologist X 1 2 3 4 Total
1 22 2 2 0 26
2 5 7 14 0 26
3 0 2 36 0 38
4 0 1 17 10 28
Total 27 12 69 10 118
Pathologist Y
Pathologist X 1 2 3 4 Total
1 22 2 2 0 26
(8.5) (-0.5) (-5.9) (-1.8)
2 5 7 14 0 26
(-0.5) (3.2) (-0.5) (-1.8)
3 0 2 36 0 38
(-4.1) (-1.2) (5.5) (-2.3)
4 0 1 17 10 28
(-3.3) (-1.3) (0.3) (5.9)
Total 27 12 69 10 118
Quasi-Independence

log µij = λ + λX_i + λY_j + δi I(i = j)
Examples
• For the diagnoses of carcinoma example, the
quasi-independence model has G2 = 13.2 and χ2 = 11.5 with
df = 5.
• It fits much better than the independence model, though some
lack of fit remains.
• For the Coffee brand example, the quasi-independence model
has G2 = 13.8 with df = 11.
• This is a dramatic improvement over independence, which has
G2 = 346.4 with df = 16.
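These fits can be reproduced with a standard device: treat the main diagonal as structural zeros and fit independence to the off-diagonal cells by iterative proportional fitting (the δi terms then fit the diagonal perfectly). A sketch for the pathologist table:

```python
import math

# Pathologist ratings table (from these notes)
n = [[22, 2, 2, 0],
     [5, 7, 14, 0],
     [0, 2, 36, 0],
     [0, 1, 17, 10]]
I = len(n)

# Off-diagonal row and column totals (diagonal excluded)
row = [sum(n[i][j] for j in range(I) if j != i) for i in range(I)]
col = [sum(n[i][j] for i in range(I) if i != j) for j in range(I)]

# IPF on off-diagonal cells; diagonal cells held at structural zero
m = [[0.0 if i == j else 1.0 for j in range(I)] for i in range(I)]
for _ in range(200):
    for i in range(I):                     # scale rows
        s = sum(m[i][j] for j in range(I) if j != i)
        f = row[i] / s if s > 0 else 0.0
        for j in range(I):
            if j != i:
                m[i][j] *= f
    for j in range(I):                     # scale columns
        s = sum(m[i][j] for i in range(I) if i != j)
        f = col[j] / s if s > 0 else 0.0
        for i in range(I):
            if i != j:
                m[i][j] *= f

# G^2 over off-diagonal cells; fitted diagonal equals observed
g2 = sum(2 * n[i][j] * math.log(n[i][j] / m[i][j])
         for i in range(I) for j in range(I) if i != j and n[i][j] > 0)
df = (I - 1) ** 2 - I                      # = 5 for a 4 x 4 table
print(g2, df)
```

The result should match the G² = 13.2 with df = 5 quoted above for the quasi-independence model.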
Partial Output

purchase1 × purchase2
Frequency     1      2      3      4      5    Total
2            13     46     11      2    6.5     78.5
4           6.5      2      9     15      2     34.5
5            10    6.5     12      2     27     57.5