
CROSSTABS

The notation and statistics refer to bivariate subtables defined by a row variable X
and a column variable Y, unless specified otherwise. By default, CROSSTABS
deletes cases with missing values on a table-by-table basis.

Notation
The following notation is used throughout this chapter unless otherwise stated:

$X_i$: distinct values of the row variable, arranged in ascending order: $X_1 < X_2 < \cdots < X_R$

$Y_j$: distinct values of the column variable, arranged in ascending order: $Y_1 < Y_2 < \cdots < Y_C$

$f_{ij}$: sum of cell weights for cases in cell $(i, j)$

$c_j = \sum_{i=1}^{R} f_{ij}$: the $j$th column subtotal

$r_i = \sum_{j=1}^{C} f_{ij}$: the $i$th row subtotal

$W = \sum_{j=1}^{C} c_j = \sum_{i=1}^{R} r_i$: the grand total

Marginal and Cell Statistics


Count

$$\text{count} = f_{ij}$$


Expected Count

$$E_{ij} = \frac{r_i c_j}{W}$$

Row Percent

$$\text{row percent} = 100 \times \frac{f_{ij}}{r_i}$$

Column Percent

$$\text{column percent} = 100 \times \frac{f_{ij}}{c_j}$$

Total Percent

$$\text{total percent} = 100 \times \frac{f_{ij}}{W}$$

Residual

$$R_{ij} = f_{ij} - E_{ij}$$

Standardized Residual

$$SR_{ij} = \frac{R_{ij}}{\sqrt{E_{ij}}}$$

Adjusted Residual

$$AR_{ij} = \frac{R_{ij}}{\sqrt{E_{ij}\left(1 - \dfrac{r_i}{W}\right)\left(1 - \dfrac{c_j}{W}\right)}}$$
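The following minimal Python sketch (not SPSS code) illustrates the cell statistics above for a small hypothetical table; all names and values are illustrative only.

```python
import math

# hypothetical 2 x 2 table of summed cell weights f[i][j]
f = [[10.0, 20.0], [30.0, 40.0]]
r = [sum(row) for row in f]               # row subtotals r_i
c = [sum(col) for col in zip(*f)]         # column subtotals c_j
W = sum(r)                                # grand total

for i, row in enumerate(f):
    for j, fij in enumerate(row):
        E = r[i] * c[j] / W                               # expected count E_ij
        resid = fij - E                                   # residual R_ij
        std_resid = resid / math.sqrt(E)                  # standardized residual
        adj_resid = resid / math.sqrt(E * (1 - r[i] / W) * (1 - c[j] / W))
        print(i, j, E, resid, std_resid, adj_resid)
```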

Chi-Square Statistics
Pearson’s Chi-Square

$$\chi_p^2 = \sum_{i,j}\frac{\left(f_{ij} - E_{ij}\right)^2}{E_{ij}}$$

The degrees of freedom are $(R-1)(C-1)$.


Likelihood Ratio

$$\chi_{LR}^2 = -2\sum_{i,j} f_{ij}\ln\left(E_{ij}/f_{ij}\right)$$

The degrees of freedom are $(R-1)(C-1)$.


Fisher’s Exact Test

If the table is a 2 × 2 table that does not result from a larger table with missing cells, and at least one expected cell count is less than 5, then the Fisher exact test is calculated. See Appendix 5 for details.

Yates Continuity Correction for 2 × 2 Tables

$$\chi_c^2 = \begin{cases} \dfrac{W\left(\left|f_{11}f_{22} - f_{12}f_{21}\right| - 0.5W\right)^2}{r_1 r_2 c_1 c_2} & \text{if } \left|f_{11}f_{22} - f_{12}f_{21}\right| > 0.5W \\ 0 & \text{otherwise} \end{cases}$$

The degrees of freedom are 1.
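As an illustration, the chi-square statistics above can be sketched in a few lines of Python; the table is hypothetical and the code is not the SPSS implementation.

```python
import math

f = [[10.0, 20.0], [30.0, 40.0]]          # hypothetical 2 x 2 table
r = [sum(row) for row in f]
c = [sum(col) for col in zip(*f)]
W = sum(r)
E = [[r[i] * c[j] / W for j in range(len(c))] for i in range(len(r))]

chi2_pearson = sum((f[i][j] - E[i][j]) ** 2 / E[i][j]
                   for i in range(len(r)) for j in range(len(c)))
chi2_lr = -2 * sum(f[i][j] * math.log(E[i][j] / f[i][j])
                   for i in range(len(r)) for j in range(len(c)) if f[i][j] > 0)
df = (len(r) - 1) * (len(c) - 1)

# Yates continuity correction, defined only for 2 x 2 tables
d = abs(f[0][0] * f[1][1] - f[0][1] * f[1][0])
chi2_yates = (W * (d - 0.5 * W) ** 2 / (r[0] * r[1] * c[0] * c[1])
              if d > 0.5 * W else 0.0)
print(chi2_pearson, chi2_lr, df, chi2_yates)
```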

Mantel-Haenszel Test of Linear Association

$$\chi_{MH}^2 = \left(W - 1\right)r^2$$
where r is the Pearson correlation coefficient to be defined later. The degrees of
freedom are 1.

Other Measures of Association


Phi Coefficient

For a table that is not 2 × 2,

$$\varphi = \sqrt{\frac{\chi_p^2}{W}}$$

For a 2 × 2 table only, $\varphi$ is equal to the Pearson correlation coefficient, so that the sign of $\varphi$ matches that of the correlation coefficient.

Coefficient of Contingency

$$CC = \left(\frac{\chi_p^2}{\chi_p^2 + W}\right)^{1/2}$$

Cramér’s V

$$V = \left(\frac{\chi_p^2}{W\left(q - 1\right)}\right)^{1/2}$$

where $q = \min\{R, C\}$.
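A small sketch of these association measures, assuming a Pearson chi-square value and table dimensions obtained as in the previous example; the numbers plugged in here are hypothetical.

```python
import math

chi2_p, W, R, C = 0.794, 100.0, 2, 2   # hypothetical chi-square and table size
q = min(R, C)

phi = math.sqrt(chi2_p / W)            # phi (unsigned form)
cc = math.sqrt(chi2_p / (chi2_p + W))  # coefficient of contingency
v = math.sqrt(chi2_p / (W * (q - 1)))  # Cramer's V
print(phi, cc, v)
```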

Measures of Proportional Reduction in Predictive Error


Lambda

Let $f_{im}$ be the largest cell count in row $i$, and $f_{mj}$ the largest cell count in column $j$. Also, let $r_m$ be the largest row subtotal and $c_m$ the largest column subtotal. Define $\lambda_{Y|X}$ as the proportion of relative error in predicting an individual's Y category that can be eliminated by knowledge of the X category. $\lambda_{Y|X}$ is computed as

$$\lambda_{Y|X} = \frac{\sum_{i=1}^{R} f_{im} - c_m}{W - c_m}$$

The standard errors are

$$ASE_0 = \frac{\sqrt{\sum_{i=1}^{R}\sum_{j=1}^{C} f_{ij}\left(\delta_{ij} - \delta_j\right)^2 - \left(\sum_{i=1}^{R} f_{im} - c_m\right)^2 / W}}{W - c_m}$$

$$ASE_1 = \frac{\sqrt{\sum_{i=1}^{R}\sum_{j=1}^{C} f_{ij}\left(\delta_{ij} - \delta_j + \lambda\delta_j\right)^2 - W\lambda_{Y|X}^2}}{W - c_m}$$

where

$$\delta_{ij} = \begin{cases} 1 & \text{if } j \text{ is the column index of } f_{im} \\ 0 & \text{otherwise} \end{cases}$$

$$\delta_j = \begin{cases} 1 & \text{if } j \text{ is the index of } c_m \\ 0 & \text{otherwise} \end{cases}$$

Lambda for predicting X from Y, $\lambda_{X|Y}$, is obtained by permuting the indices in the above formulae.

The two asymmetric lambdas are averaged to obtain the symmetric lambda:

$$\lambda = \frac{\sum_{i=1}^{R} f_{im} + \sum_{j=1}^{C} f_{mj} - c_m - r_m}{2W - r_m - c_m}$$

The standard errors are

$$ASE_0 = \frac{\sqrt{\sum_{i=1}^{R}\sum_{j=1}^{C} f_{ij}\left(\delta_{ij}^r + \delta_{ij}^c - \delta_i^r - \delta_j^c\right)^2 - \left(\sum_{i=1}^{R} f_{im} + \sum_{j=1}^{C} f_{mj} - r_m - c_m\right)^2 / W}}{2W - r_m - c_m}$$

$$ASE_1 = \frac{\sqrt{\sum_{i=1}^{R}\sum_{j=1}^{C} f_{ij}\left(\delta_{ij}^r + \delta_{ij}^c - \delta_i^r - \delta_j^c + \lambda\left(\delta_i^r + \delta_j^c\right)\right)^2 - 4W\lambda^2}}{2W - r_m - c_m}$$

where

$$\delta_{ij}^r = \begin{cases} 1 & \text{if } i \text{ is the row index of } f_{mj} \\ 0 & \text{otherwise} \end{cases} \qquad \delta_i^r = \begin{cases} 1 & \text{if } i \text{ is the index of } r_m \\ 0 & \text{otherwise} \end{cases}$$

and

$$\delta_{ij}^c = \begin{cases} 1 & \text{if } j \text{ is the column index of } f_{im} \\ 0 & \text{otherwise} \end{cases} \qquad \delta_j^c = \begin{cases} 1 & \text{if } j \text{ is the index of } c_m \\ 0 & \text{otherwise} \end{cases}$$
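The lambda point estimates (not the standard errors) can be sketched as follows for a hypothetical table; this is an illustration, not the SPSS implementation.

```python
# hypothetical 2 x 3 table of summed cell weights
f = [[25.0, 10.0, 5.0], [5.0, 30.0, 25.0]]
r = [sum(row) for row in f]
c = [sum(col) for col in zip(*f)]
W = sum(r)
rm, cm = max(r), max(c)

sum_fim = sum(max(row) for row in f)             # largest cell in each row
sum_fmj = sum(max(col) for col in zip(*f))       # largest cell in each column

lambda_y_given_x = (sum_fim - cm) / (W - cm)
lambda_x_given_y = (sum_fmj - rm) / (W - rm)
lambda_symmetric = (sum_fim + sum_fmj - cm - rm) / (2 * W - rm - cm)
print(lambda_y_given_x, lambda_x_given_y, lambda_symmetric)
```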

Goodman and Kruskal’s Tau (Goodman & Kruskal, 1954)

Similarly defined is Goodman and Kruskal's tau ($\tau$):

$$\tau_{Y|X} = \frac{W\sum_{i,j} f_{ij}^2 / r_i - \sum_{j=1}^{C} c_j^2}{W^2 - \sum_{j=1}^{C} c_j^2}$$

with standard error

$$ASE_1 = \frac{2}{\delta^2}\sqrt{\sum_{i,j} f_{ij}\left\{\left(v - \delta\right)\left(\frac{1}{r_i}\sum_{j=1}^{C} f_{ij} c_j - c_j\right) - W\delta\left(\frac{1}{r_i^2}\sum_{j=1}^{C} f_{ij}^2 - \frac{f_{ij}}{r_i}\right)\right\}^2}$$

in which

$$\delta = W^2 - \sum_{j=1}^{C} c_j^2 \qquad \text{and} \qquad v = W\sum_{i,j} f_{ij}^2 / r_i - \sum_{j=1}^{C} c_j^2$$

τ X Y and its standard error can be obtained by interchanging the roles of X and Y.

The significance level is based on the chi-square distribution, since

$$\left(W - 1\right)\left(C - 1\right)\tau_{Y|X} \sim \chi^2_{(R-1)(C-1)}$$

$$\left(W - 1\right)\left(R - 1\right)\tau_{X|Y} \sim \chi^2_{(R-1)(C-1)}$$
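A minimal sketch of the Goodman and Kruskal tau point estimate for a hypothetical table; $\tau_{X|Y}$ follows by transposing the table.

```python
f = [[25.0, 10.0, 5.0], [5.0, 30.0, 25.0]]       # hypothetical table
r = [sum(row) for row in f]
c = [sum(col) for col in zip(*f)]
W = sum(r)

num = W * sum(fij ** 2 / r[i] for i, row in enumerate(f) for fij in row) \
      - sum(cj ** 2 for cj in c)
den = W ** 2 - sum(cj ** 2 for cj in c)
tau_y_given_x = num / den
print(tau_y_given_x)
```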



Uncertainty Coefficient

Let $U_{Y|X}$ be the proportional reduction in the uncertainty (entropy) of Y that can be eliminated by knowledge of X. It is computed as

$$U_{Y|X} = \frac{U(X) + U(Y) - U(XY)}{U(Y)}$$

where

$$U(X) = -\sum_{i=1}^{R} \frac{r_i}{W}\ln\left(\frac{r_i}{W}\right)$$

$$U(Y) = -\sum_{j=1}^{C} \frac{c_j}{W}\ln\left(\frac{c_j}{W}\right)$$

and

$$U(XY) = -\sum_{i,j} \frac{f_{ij}}{W}\ln\left(\frac{f_{ij}}{W}\right) \qquad \text{for } f_{ij} > 0$$

The asymptotic standard errors are

$$ASE_1 = \frac{1}{W\,U(Y)^2}\sqrt{\sum_{i,j} f_{ij}\left\{U(Y)\ln\left(\frac{f_{ij}}{r_i}\right) + \left[U(X) - U(XY)\right]\ln\left(\frac{c_j}{W}\right)\right\}^2}$$

$$ASE_0 = \frac{\sqrt{P - W\left[U(X) + U(Y) - U(XY)\right]^2}}{W\,U(Y)}$$

where

$$P = \sum_{i,j} f_{ij}\ln^2\left(\frac{c_j r_i}{W f_{ij}}\right)$$

The formulas for $U_{X|Y}$ can be obtained by interchanging the roles of X and Y.


A symmetric version of the two asymmetric uncertainty coefficients is defined
as follows:

$$U = 2\left[\frac{U(X) + U(Y) - U(XY)}{U(X) + U(Y)}\right]$$

with asymptotic standard errors

$$ASE_1 = \frac{2}{W\left[U(X) + U(Y)\right]^2}\sqrt{\sum_{i,j} f_{ij}\left\{U(XY)\ln\left(\frac{r_i c_j}{W^2}\right) - \left[U(X) + U(Y)\right]\ln\left(\frac{f_{ij}}{W}\right)\right\}^2}$$

or, under the hypothesis of independence,

$$ASE_0 = \frac{2\sqrt{P - W\left[U(X) + U(Y) - U(XY)\right]^2}}{W\left[U(X) + U(Y)\right]}$$

with P as defined above.
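The uncertainty-coefficient point estimates can be sketched as follows; the table is hypothetical and the standard errors are omitted.

```python
import math

f = [[25.0, 10.0, 5.0], [5.0, 30.0, 25.0]]       # hypothetical table
r = [sum(row) for row in f]
c = [sum(col) for col in zip(*f)]
W = sum(r)

U_X = -sum(ri / W * math.log(ri / W) for ri in r)
U_Y = -sum(cj / W * math.log(cj / W) for cj in c)
U_XY = -sum(fij / W * math.log(fij / W) for row in f for fij in row if fij > 0)

u_y_given_x = (U_X + U_Y - U_XY) / U_Y
u_symmetric = 2 * (U_X + U_Y - U_XY) / (U_X + U_Y)
print(u_y_given_x, u_symmetric)
```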

Cohen’s Kappa
Cohen's kappa ($\kappa$), defined only for square tables ($R = C$), is computed as

$$\kappa = \frac{W\sum_{i=1}^{R} f_{ii} - \sum_{i=1}^{R} r_i c_i}{W^2 - \sum_{i=1}^{R} r_i c_i}$$

with variance

$$\operatorname{var}_1 = W\left\{\frac{\sum_i f_{ii}\left(W - \sum_i f_{ii}\right)}{\left(W^2 - \sum_i r_i c_i\right)^2} + \frac{2\left(W - \sum_i f_{ii}\right)\left[2\sum_i f_{ii}\sum_i r_i c_i - W\sum_i f_{ii}\left(r_i + c_i\right)\right]}{\left(W^2 - \sum_i r_i c_i\right)^3} + \frac{\left(W - \sum_i f_{ii}\right)^2\left[W\sum_{i,j} f_{ij}\left(r_j + c_i\right)^2 - 4\left(\sum_i r_i c_i\right)^2\right]}{\left(W^2 - \sum_i r_i c_i\right)^4}\right\}$$

and, under the hypothesis of independence,

$$\operatorname{var}_0 = \frac{W^2\sum_i r_i c_i + \left(\sum_i r_i c_i\right)^2 - W\sum_i r_i c_i\left(r_i + c_i\right)}{W\left(W^2 - \sum_i r_i c_i\right)^2}$$
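A minimal sketch of the kappa point estimate for a hypothetical square agreement table (the variance formulas are not reproduced here).

```python
f = [[40.0, 5.0], [10.0, 45.0]]        # hypothetical table: rows rater A, columns rater B
r = [sum(row) for row in f]
c = [sum(col) for col in zip(*f)]
W = sum(r)

observed = sum(f[i][i] for i in range(len(f)))          # sum of diagonal cells
chance = sum(r[i] * c[i] for i in range(len(f)))        # sum of r_i * c_i
kappa = (W * observed - chance) / (W ** 2 - chance)
print(kappa)
```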

Kendall’s Tau-b and Tau-c


Define
$$D_r = W^2 - \sum_{i=1}^{R} r_i^2$$

$$D_c = W^2 - \sum_{j=1}^{C} c_j^2$$

$$C_{ij} = \sum_{h<i}\sum_{k<j} f_{hk} + \sum_{h>i}\sum_{k>j} f_{hk}$$

$$D_{ij} = \sum_{h<i}\sum_{k>j} f_{hk} + \sum_{h>i}\sum_{k<j} f_{hk}$$

$$P = \sum_{i,j} f_{ij} C_{ij}$$

$$Q = \sum_{i,j} f_{ij} D_{ij}$$

Note: the P and Q listed above are double the “usual” P (number of concordant
pairs) and Q (number of discordant pairs). Likewise, Dr is double the “usual”
P + Q + X 0 (the number of concordant pairs, discordant pairs, and pairs on which
the row variable is tied) and Dc is double the “usual” P + Q + Y0 (the number of
concordant pairs, discordant pairs, and pairs on which the column variable is tied).
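The concordance bookkeeping above translates directly into code; the following sketch computes $C_{ij}$, $D_{ij}$, P, Q, $D_r$ and $D_c$ for a hypothetical ordinal table.

```python
# hypothetical 3 x 3 ordinal table (rows and columns in ascending order)
f = [[20.0, 10.0, 5.0], [5.0, 15.0, 10.0], [2.0, 8.0, 25.0]]
R, C = len(f), len(f[0])
r = [sum(row) for row in f]
c = [sum(col) for col in zip(*f)]
W = sum(r)

def C_ij(i, j):
    # weight of cells concordant with cell (i, j)
    return sum(f[h][k] for h in range(R) for k in range(C)
               if (h < i and k < j) or (h > i and k > j))

def D_ij(i, j):
    # weight of cells discordant with cell (i, j)
    return sum(f[h][k] for h in range(R) for k in range(C)
               if (h < i and k > j) or (h > i and k < j))

P = sum(f[i][j] * C_ij(i, j) for i in range(R) for j in range(C))
Q = sum(f[i][j] * D_ij(i, j) for i in range(R) for j in range(C))
Dr = W ** 2 - sum(ri ** 2 for ri in r)
Dc = W ** 2 - sum(cj ** 2 for cj in c)
print(P, Q, Dr, Dc)
```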

Kendall’s Tau-b

$$\tau_b = \frac{P - Q}{\sqrt{D_r D_c}}$$

with standard error


$$ASE_1 = \frac{1}{D_r D_c}\sqrt{\sum_{i,j} f_{ij}\left[2\sqrt{D_r D_c}\left(C_{ij} - D_{ij}\right) + \tau_b v_{ij}\right]^2 - W^3\tau_b^2\left(D_r + D_c\right)^2}$$

where

$$v_{ij} = r_i D_c + c_j D_r$$

Under the independence assumption, the standard error is

$$ASE_0 = \frac{2}{\sqrt{D_r D_c}}\sqrt{\sum_{i,j} f_{ij}\left(C_{ij} - D_{ij}\right)^2 - \frac{1}{W}\left(P - Q\right)^2}$$

Kendall’s Tau-c

$$\tau_c = \frac{q\left(P - Q\right)}{W^2\left(q - 1\right)}$$

with standard error

$$ASE_1 = \frac{2q}{\left(q - 1\right)W^2}\sqrt{\sum_{i,j} f_{ij}\left(C_{ij} - D_{ij}\right)^2 - \frac{1}{W}\left(P - Q\right)^2}$$

or, under the independence assumption,

$$ASE_0 = \frac{2q}{\left(q - 1\right)W^2}\sqrt{\sum_{i,j} f_{ij}\left(C_{ij} - D_{ij}\right)^2 - \frac{1}{W}\left(P - Q\right)^2}$$

where

$$q = \min\{R, C\}$$

Gamma
Gamma ($\gamma$) is estimated by

$$\gamma = \frac{P - Q}{P + Q}$$

with standard error

$$ASE_1 = \frac{4}{\left(P + Q\right)^2}\sqrt{\sum_{i,j} f_{ij}\left(Q C_{ij} - P D_{ij}\right)^2}$$

or, under the hypothesis of independence,

$$ASE_0 = \frac{2}{P + Q}\sqrt{\sum_{i,j} f_{ij}\left(C_{ij} - D_{ij}\right)^2 - \frac{1}{W}\left(P - Q\right)^2}$$
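Given P, Q, $D_r$, $D_c$ and the table dimensions, tau-b, tau-c and gamma are one-liners; the inputs below are hypothetical stand-ins for values computed as in the earlier sketch.

```python
import math

# hypothetical values; in practice computed as in the concordance sketch above
P, Q, Dr, Dc, W, R, C = 2840.0, 920.0, 6666.0, 6612.0, 100.0, 3, 3
q = min(R, C)

tau_b = (P - Q) / math.sqrt(Dr * Dc)
tau_c = q * (P - Q) / (W ** 2 * (q - 1))
gamma = (P - Q) / (P + Q)
print(tau_b, tau_c, gamma)
```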

Somers’ d
Somers’ d with row variable X as the independent variable is calculated as

$$d_{Y|X} = \frac{P - Q}{D_r}$$

with standard error

$$ASE_1 = \frac{2}{D_r^2}\sqrt{\sum_{i,j} f_{ij}\left[D_r\left(C_{ij} - D_{ij}\right) - \left(P - Q\right)\left(W - r_i\right)\right]^2}$$

or, under the hypothesis of independence,

$$ASE_0 = \frac{2}{D_r}\sqrt{\sum_{i,j} f_{ij}\left(C_{ij} - D_{ij}\right)^2 - \frac{1}{W}\left(P - Q\right)^2}$$

By interchanging the roles of X and Y, the formulas for Somers' d with X as the dependent variable can be obtained.

The symmetric version of Somers' d is

$$d = \frac{P - Q}{\frac{1}{2}\left(D_c + D_r\right)}$$

The standard error is

$$ASE_1 = \frac{2\sigma_{\tau_b}\sqrt{D_r D_c}}{D_r + D_c}$$

where $\sigma_{\tau_b}^2$ is the variance of Kendall's $\tau_b$. Under the hypothesis of independence,

$$ASE_0 = \frac{4}{D_c + D_r}\sqrt{\sum_{i,j} f_{ij}\left(C_{ij} - D_{ij}\right)^2 - \frac{1}{W}\left(P - Q\right)^2}$$
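Somers' d uses the same quantities; again the inputs below are hypothetical stand-ins for values computed as sketched earlier.

```python
# hypothetical values; in practice computed as in the concordance sketch above
P, Q, Dr, Dc = 2840.0, 920.0, 6666.0, 6612.0

d_y_given_x = (P - Q) / Dr          # row variable X independent
d_x_given_y = (P - Q) / Dc          # column variable Y independent
d_symmetric = (P - Q) / (0.5 * (Dr + Dc))
print(d_y_given_x, d_x_given_y, d_symmetric)
```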

Pearson’s r
The Pearson’s product moment correlation r is computed as

$$r = \frac{\operatorname{cov}(X, Y)}{S(X)\,S(Y)} \equiv \frac{S}{T}$$

where

$$\operatorname{cov}(X, Y) = \sum_{i,j} X_i Y_j f_{ij} - \left(\sum_{i=1}^{R} X_i r_i\right)\left(\sum_{j=1}^{C} Y_j c_j\right)\bigg/ W$$

$$S(X) = \sqrt{\sum_{i=1}^{R} X_i^2 r_i - \left(\sum_{i=1}^{R} X_i r_i\right)^2\bigg/ W}$$

and

$$S(Y) = \sqrt{\sum_{j=1}^{C} Y_j^2 c_j - \left(\sum_{j=1}^{C} Y_j c_j\right)^2\bigg/ W}$$

The variance of r is

$$\operatorname{var}_1 = \frac{1}{T^4}\sum_{i,j} f_{ij}\left\{T\left(X_i - \bar X\right)\left(Y_j - \bar Y\right) - \frac{S}{2T}\left[\left(X_i - \bar X\right)^2 S^2(Y) + \left(Y_j - \bar Y\right)^2 S^2(X)\right]\right\}^2$$

If the null hypothesis is true,

$$\operatorname{var}_0 = \frac{\sum_{i,j} f_{ij} X_i^2 Y_j^2 - \left(\sum_{i,j} f_{ij} X_i Y_j\right)^2\bigg/ W}{\left(\sum_i r_i X_i^2\right)\left(\sum_j c_j Y_j^2\right)}$$

where

$$\bar X = \sum_{i=1}^{R} X_i r_i \bigg/ W \qquad \text{and} \qquad \bar Y = \sum_{j=1}^{C} Y_j c_j \bigg/ W$$

Under the hypothesis that ρ = 0 ,

$$t = \frac{r\sqrt{W - 2}}{\sqrt{1 - r^2}}$$

is distributed as a t with W − 2 degrees of freedom.
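A sketch of the table-based Pearson correlation and its t statistic, using hypothetical scores and cell weights.

```python
import math

f = [[20.0, 10.0, 5.0], [5.0, 15.0, 10.0], [2.0, 8.0, 25.0]]   # hypothetical table
X = [1.0, 2.0, 3.0]                    # row scores X_i
Y = [1.0, 2.0, 3.0]                    # column scores Y_j
r_tot = [sum(row) for row in f]
c_tot = [sum(col) for col in zip(*f)]
W = sum(r_tot)

sum_xr = sum(X[i] * r_tot[i] for i in range(len(X)))
sum_yc = sum(Y[j] * c_tot[j] for j in range(len(Y)))
cov = sum(X[i] * Y[j] * f[i][j] for i in range(len(X))
          for j in range(len(Y))) - sum_xr * sum_yc / W
S_X = math.sqrt(sum(X[i] ** 2 * r_tot[i] for i in range(len(X))) - sum_xr ** 2 / W)
S_Y = math.sqrt(sum(Y[j] ** 2 * c_tot[j] for j in range(len(Y))) - sum_yc ** 2 / W)

r = cov / (S_X * S_Y)
t = r * math.sqrt((W - 2) / (1 - r ** 2))   # compare with a t on W - 2 df
print(r, t)
```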

Spearman Correlation
Spearman's rank correlation coefficient $r_s$ is computed by using rank scores $R_i$ for $X_i$ and $C_j$ for $Y_j$. These rank scores are defined as follows:

$$R_i = \sum_{k<i} r_k + \frac{r_i + 1}{2} \qquad \text{for } i = 1, 2, \ldots, R$$

$$C_j = \sum_{h<j} c_h + \frac{c_j + 1}{2} \qquad \text{for } j = 1, 2, \ldots, C$$

The formulas for $r_s$ and its asymptotic variance can be obtained from the Pearson formulas by substituting $R_i$ and $C_j$ for $X_i$ and $Y_j$, respectively.
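The midrank scores can be sketched as below; feeding them into the Pearson computation above, in place of $X_i$ and $Y_j$, yields $r_s$. The subtotals are hypothetical.

```python
r_tot = [35.0, 30.0, 35.0]             # hypothetical row subtotals r_i
c_tot = [27.0, 33.0, 40.0]             # hypothetical column subtotals c_j

rank_R = [sum(r_tot[:i]) + (r_tot[i] + 1) / 2 for i in range(len(r_tot))]
rank_C = [sum(c_tot[:j]) + (c_tot[j] + 1) / 2 for j in range(len(c_tot))]
print(rank_R, rank_C)
```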

Eta
Asymmetric η with the column variable Y as dependent is

$$\eta_Y = \sqrt{1 - \frac{S_{YW}}{S^2(Y)}}$$

where $S(Y)$ is as defined for Pearson's r and

$$S_{YW} = \sum_{i,j} Y_j^2 f_{ij} - \sum_{i=1}^{R} \frac{1}{r_i}\left(\sum_{j=1}^{C} Y_j f_{ij}\right)^2$$
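A sketch of asymmetric eta with Y dependent, for hypothetical column scores and cell weights.

```python
import math

f = [[20.0, 10.0, 5.0], [5.0, 15.0, 10.0], [2.0, 8.0, 25.0]]   # hypothetical table
Y = [1.0, 2.0, 3.0]                                            # hypothetical column scores
r = [sum(row) for row in f]
c = [sum(col) for col in zip(*f)]
W = sum(r)

S_Y2 = sum(Y[j] ** 2 * c[j] for j in range(len(Y))) \
       - sum(Y[j] * c[j] for j in range(len(Y))) ** 2 / W      # total sum of squares of Y
S_YW = sum(Y[j] ** 2 * f[i][j] for i in range(len(f)) for j in range(len(Y))) \
       - sum(sum(Y[j] * f[i][j] for j in range(len(Y))) ** 2 / r[i]
             for i in range(len(f)))                           # within-row sum of squares
eta_y = math.sqrt(1 - S_YW / S_Y2)
print(eta_y)
```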


Relative Risk
Consider a 2 × 2 table (that is, R = C = 2). In a case-control study, the relative risk is estimated as

$$R_0 = \frac{f_{11} f_{22}}{f_{12} f_{21}}$$

The $100(1-\alpha)$ percent CI for the relative risk is obtained as

$$\left(R_0\exp\left(-z_{1-\alpha/2}\,v\right),\; R_0\exp\left(z_{1-\alpha/2}\,v\right)\right)$$

where

$$v = \left(\frac{1}{f_{11}} + \frac{1}{f_{12}} + \frac{1}{f_{21}} + \frac{1}{f_{22}}\right)^{1/2}$$
The relative risk ratios in a cohort study are computed for both columns. For
column 1, the risk is

$$R_1 = \frac{f_{11}\left(f_{21} + f_{22}\right)}{f_{21}\left(f_{11} + f_{12}\right)}$$

and the corresponding $100(1-\alpha)$ percent CI is

$$\left(R_1\exp\left(-z_{1-\alpha/2}\,v\right),\; R_1\exp\left(z_{1-\alpha/2}\,v\right)\right)$$

where

$$v = \left(\frac{f_{12}}{f_{11}\left(f_{11} + f_{12}\right)} + \frac{f_{22}}{f_{21}\left(f_{21} + f_{22}\right)}\right)^{1/2}$$

The relative risk for column 2 and the confidence interval are computed similarly.
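A sketch of the case-control estimate and the cohort relative risk for column 1, with their confidence intervals; the 2 × 2 counts and the 95% critical value are hypothetical inputs.

```python
import math

f11, f12, f21, f22 = 30.0, 70.0, 15.0, 85.0      # hypothetical 2 x 2 counts
z = 1.959964                                     # z_{1 - alpha/2} for a 95% interval

# case-control estimate
R0 = (f11 * f22) / (f12 * f21)
v0 = math.sqrt(1 / f11 + 1 / f12 + 1 / f21 + 1 / f22)
ci0 = (R0 * math.exp(-z * v0), R0 * math.exp(z * v0))

# cohort relative risk for column 1
R1 = (f11 * (f21 + f22)) / (f21 * (f11 + f12))
v1 = math.sqrt(f12 / (f11 * (f11 + f12)) + f22 / (f21 * (f21 + f22)))
ci1 = (R1 * math.exp(-z * v1), R1 * math.exp(z * v1))
print(R0, ci0, R1, ci1)
```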

McNemar’s Test
Suppose the test sample is $(x_1, y_1), (x_2, y_2), \ldots, (x_n, y_n)$. The null hypothesis $H_0$ is $P(X < Y) = P(X > Y)$.

Notation

$n_1$: number of cases where $x_i < y_i$, $i = 1, \ldots, n$

$n_2$: number of cases where $x_i > y_i$, $i = 1, \ldots, n$

$r = \min(n_1, n_2)$

Probability
If there is no real difference between the two trials, we expect the frequencies n1
and n2 to be related as 1:1. Deviations from this ratio can be tested by using the
binomial distribution. The two-tailed probability level is

$$2 \times \sum_{i=0}^{r} \binom{n_1 + n_2}{i}\left(\frac{1}{2}\right)^{n_1 + n_2}$$

Note: This is a generalized version of McNemar's test. The original version is for a 2 × 2 table.
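A direct transcription of the two-tailed binomial probability above, with hypothetical discordant counts.

```python
from math import comb

n1, n2 = 4, 12                      # hypothetical discordant counts
r = min(n1, n2)
n = n1 + n2
p_two_tailed = 2 * sum(comb(n, i) for i in range(r + 1)) * 0.5 ** n
print(p_two_tailed)
```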

Conditional Independence and Homogeneity


The Cochran’s and Mantel-Haenzel statistics test the independence of two
dichotomous variables, controlling for one or more other categorical variables.
These “other” categorical variables define a number of strata, across which these
statistics are computed.
The Breslow-Day statistic is used to test homogeneity of the common odds ratio,
which is a weaker condition than the conditional independence (i.e., homogeneity
with the common odds ratio of 1) tested by Cochran’s and Mantel-Haenszel
statistics. Tarone’s statistic is the Breslow-Day statistic adjusted for the consistent
but inefficient estimator such as the Mantel-Haenszel estimator of the common
odds ratio.

Notation and Definitions


The addition of strata requires the following modifications to the notation:

$K$: the number of strata.

$f_{ijk}$: sum of cell weights for cases in the $i$th row and $j$th column of the $k$th stratum.

$c_{jk} = \sum_{i=1}^{R} f_{ijk}$: the $j$th column subtotal of the $k$th stratum.

$r_{ik} = \sum_{j=1}^{C} f_{ijk}$: the $i$th row subtotal of the $k$th stratum.

$n_k = \sum_{j=1}^{C} c_{jk} = \sum_{i=1}^{R} r_{ik}$: the grand total of the $k$th stratum.

$E_{ijk} = E(f_{ijk}) = r_{ik} c_{jk} / n_k$: the expected count of the cell in the $i$th row and $j$th column of the $k$th stratum.

A stratum such that $n_k = 0$ is omitted from the analysis. ($K$ must be modified accordingly.) If $n_k = 0$ for all $k$, then no computation is done.

Preliminarily, define for each stratum $k$

$$\hat p_{ik} = \frac{f_{i1k}}{r_{ik}}, \qquad d_k = \hat p_{1k} - \hat p_{2k}, \qquad \hat p_k = \frac{c_{1k}}{n_k}, \qquad w_k = \frac{r_{1k} r_{2k}}{n_k}.$$

Cochran’s Statistic
Cochran’s (1954) statistic is

$$C = \frac{\sum_{k=1}^{K} w_k d_k \Big/ \sum_{k=1}^{K} w_k}{\sqrt{\sum_{k=1}^{K} w_k \hat p_k\left(1 - \hat p_k\right)}\Big/ \sum_{k=1}^{K} w_k} = \frac{\sum_{k=1}^{K} w_k d_k}{\sqrt{\sum_{k=1}^{K} w_k \hat p_k\left(1 - \hat p_k\right)}}.$$

All strata such that $r_{1k} = 0$ or $r_{2k} = 0$ are excluded, because $d_k$ is undefined for them. If every stratum is excluded, C is undefined. Note that a stratum with $r_{1k} > 0$ and $r_{2k} > 0$ but $c_{1k} = 0$ or $c_{2k} = 0$ is a valid stratum, although it contributes nothing to the numerator or denominator. However, if every stratum is of this kind, C is again undefined. Thus, in order to compute a non-system-missing value of C, at least one stratum must have all nonzero marginal totals.

Alternatively, Cochran's statistic can be written as
$$C = \frac{\sum_{k=1}^{K}\left(f_{11k} - E_{11k}\right)}{\sqrt{\sum_{k=1}^{K} w_k \hat p_k\left(1 - \hat p_k\right)}}.$$

When the number of strata is fixed and the sample sizes within each stratum increase, Cochran's statistic is asymptotically standard normal, and thus its square asymptotically follows a chi-square distribution with 1 degree of freedom.

Mantel and Haenszel's Statistic


Mantel and Haenszel’s (1959) statistic is simply Cochran’s statistic with small-
sample corrections for continuity and variance “inflation”. These corrections are
desirable when r1k and r2 k are small, but the corrections can make a noticeable
difference even for relatively large r1k and r2 k (Snedecor and Cochran, 1980, p.
213). The statistic is defined as:

$$M = \frac{\left\{\left|\sum_{k=1}^{K}\left(f_{11k} - E_{11k}\right)\right| - 0.5\right\}\operatorname{sgn}\left\{\sum_{k=1}^{K}\left(f_{11k} - E_{11k}\right)\right\}}{\sqrt{\sum_{k=1}^{K}\dfrac{r_{1k} r_{2k}}{n_k - 1}\,\hat p_k\left(1 - \hat p_k\right)}},$$

where sgn is the signum function

$$\operatorname{sgn}(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{if } x = 0 \\ -1 & \text{if } x < 0. \end{cases}$$

Any stratum in which $n_k = 1$ is excluded from the computation. If every stratum is excluded, then M is undefined. M is also undefined if every stratum is such that $r_{1k} = 0$, $r_{2k} = 0$, $c_{1k} = 0$, or $c_{2k} = 0$. In order to compute a non-system-missing value of M, at least one stratum must have all nonzero marginal totals, just as for C.
When the number of strata is fixed and the sample sizes within each stratum increase, or when the sample sizes within each stratum are fixed and the number of strata increases, this statistic is asymptotically standard normal, and thus its square asymptotically follows a chi-square distribution with 1 degree of freedom.
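A sketch of Cochran's C and the Mantel-Haenszel M statistic; each stratum below is a hypothetical 2 × 2 table [[f11, f12], [f21, f22]], and strata that the exclusion rules above would drop are assumed absent.

```python
import math

strata = [[[12.0, 18.0], [8.0, 22.0]],
          [[20.0, 10.0], [15.0, 15.0]]]   # hypothetical strata

num = den_c = den_m = 0.0
for tab in strata:
    (f11, f12), (f21, f22) = tab
    r1, r2 = f11 + f12, f21 + f22
    c1, n = f11 + f21, f11 + f12 + f21 + f22
    p1, p2, p = f11 / r1, f21 / r2, c1 / n
    w = r1 * r2 / n
    num += w * (p1 - p2)                       # equals sum of (f11k - E11k)
    den_c += w * p * (1 - p)                   # Cochran denominator
    den_m += r1 * r2 / (n - 1) * p * (1 - p)   # Mantel-Haenszel denominator

sgn = (num > 0) - (num < 0)
C_stat = num / math.sqrt(den_c)
M_stat = (abs(num) - 0.5) * sgn / math.sqrt(den_m)
print(C_stat, M_stat)
```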

The Breslow-Day Statistic

The Breslow-Day statistic for any estimator $\hat\theta$ is

$$\sum_{k=1}^{K}\frac{\left\{f_{11k} - \operatorname{E}\left(f_{11k} \mid c_{1k}; \hat\theta\right)\right\}^2}{\operatorname{V}\left(f_{11k} \mid c_{1k}; \hat\theta\right)}.$$

E and V are based on the exact moments, but it is customary to replace them with the asymptotic expectation and variance. Let $\hat{\operatorname{E}}$ and $\hat{\operatorname{V}}$ denote the estimated asymptotic expectation and the estimated asymptotic variance, respectively. Given the Mantel-Haenszel common odds ratio estimator $\hat\theta_{MH}$, we use the following statistic as the Breslow-Day statistic:

$$B = \sum_{k=1}^{K}\frac{\left\{f_{11k} - \hat{\operatorname{E}}\left(f_{11k} \mid c_{1k}; \hat\theta_{MH}\right)\right\}^2}{\hat{\operatorname{V}}\left(f_{11k} \mid c_{1k}; \hat\theta_{MH}\right)},$$

where

$$\hat{\operatorname{E}}\left(f_{11k} \mid c_{1k}; \hat\theta_{MH}\right) = \hat f_{11k}$$

satisfies the equation

$$\frac{\hat f_{11k}\left(n_k - r_{1k} - c_{1k} + \hat f_{11k}\right)}{\left(r_{1k} - \hat f_{11k}\right)\left(c_{1k} - \hat f_{11k}\right)} = \hat\theta_{MH},$$

with constraints such that

$$\hat f_{11k} \ge 0, \qquad r_{1k} - \hat f_{11k} > 0, \qquad c_{1k} - \hat f_{11k} > 0, \qquad n_k - r_{1k} - c_{1k} + \hat f_{11k} \ge 0;$$

and

$$\hat{\operatorname{V}}\left(f_{11k} \mid c_{1k}; \hat\theta_{MH}\right) = \left(\frac{1}{\hat f_{11k}} + \frac{1}{\hat f_{12k}} + \frac{1}{\hat f_{21k}} + \frac{1}{\hat f_{22k}}\right)^{-1}$$

with constraints such that

$$\hat f_{11k} > 0, \qquad \hat f_{12k} = r_{1k} - \hat f_{11k} > 0, \qquad \hat f_{21k} = c_{1k} - \hat f_{11k} > 0, \qquad \hat f_{22k} = n_k - r_{1k} - c_{1k} + \hat f_{11k} > 0.$$

All strata such that $r_{1k} = 0$ or $c_{1k} = 0$ are excluded. If every stratum is excluded, B is undefined. Strata such that $\hat f_{11k} = 0$ are also excluded; if every stratum is excluded, B is again undefined.

The Breslow-Day statistic is asymptotically distributed as a chi-square random variable with K − 1 degrees of freedom under the null hypothesis of a constant odds ratio.

Tarone’s Statistic
Tarone (1985) proposes an adjustment to the Breslow-Day statistic for the case in which the common odds ratio estimator is consistent but inefficient, specifically the Mantel-Haenszel common odds ratio estimator. The adjusted statistic, Tarone's statistic, for $\hat\theta_{MH}$ is

$$T = \sum_{k=1}^{K}\frac{\left\{f_{11k} - \hat{\operatorname{E}}\left(f_{11k} \mid c_{1k}; \hat\theta_{MH}\right)\right\}^2}{\hat{\operatorname{V}}\left(f_{11k} \mid c_{1k}; \hat\theta_{MH}\right)} - \frac{\left[\sum_{k=1}^{K}\left\{f_{11k} - \hat{\operatorname{E}}\left(f_{11k} \mid c_{1k}; \hat\theta_{MH}\right)\right\}\right]^2}{\sum_{k=1}^{K}\hat{\operatorname{V}}\left(f_{11k} \mid c_{1k}; \hat\theta_{MH}\right)} = B - \frac{\left[\sum_{k=1}^{K}\left\{f_{11k} - \hat{\operatorname{E}}\left(f_{11k} \mid c_{1k}; \hat\theta_{MH}\right)\right\}\right]^2}{\sum_{k=1}^{K}\hat{\operatorname{V}}\left(f_{11k} \mid c_{1k}; \hat\theta_{MH}\right)},$$

where $\hat{\operatorname{E}}$ and $\hat{\operatorname{V}}$ are as before.

The required data conditions are the same as for the Breslow-Day statistic; T is, of course, undefined when B is undefined. T is also asymptotically distributed as a chi-square random variable with K − 1 degrees of freedom under the null hypothesis of a constant odds ratio.
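A sketch of the Breslow-Day statistic and Tarone's adjustment. The common odds ratio value is a hypothetical input; the fitted count solves the quadratic implied by the odds-ratio equation, subject to the stated range constraints.

```python
import math

strata = [[[12.0, 18.0], [8.0, 22.0]],
          [[20.0, 10.0], [15.0, 15.0]]]          # hypothetical strata
theta = 1.8                                      # hypothetical common odds ratio

def fitted_f11(r1, c1, n, theta):
    # solve f(n - r1 - c1 + f) = theta (r1 - f)(c1 - f) for f in the admissible range
    a, b, c = 1.0 - theta, n - r1 - c1 + theta * (r1 + c1), -theta * r1 * c1
    if abs(a) < 1e-12:                           # theta == 1: equation is linear
        return r1 * c1 / n
    disc = math.sqrt(b * b - 4 * a * c)
    for f in ((-b + disc) / (2 * a), (-b - disc) / (2 * a)):
        if max(0.0, r1 + c1 - n) <= f <= min(r1, c1):
            return f
    raise ValueError("no admissible root")

B = dev_sum = var_sum = 0.0
for (f11, f12), (f21, f22) in strata:
    r1, c1 = f11 + f12, f11 + f21
    n = f11 + f12 + f21 + f22
    e = fitted_f11(r1, c1, n, theta)
    var = 1.0 / (1 / e + 1 / (r1 - e) + 1 / (c1 - e) + 1 / (n - r1 - c1 + e))
    B += (f11 - e) ** 2 / var
    dev_sum += f11 - e
    var_sum += var

T = B - dev_sum ** 2 / var_sum                   # Tarone's adjustment
print(B, T)
```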

Estimation of the Common Odds Ratio

For K strata of 2 × 2 tables, write the true odds ratios as

$$\theta_k = \frac{p_{1k}\left(1 - p_{2k}\right)}{\left(1 - p_{1k}\right)p_{2k}}$$

for $k = 1, \ldots, K$. Assuming that the true common odds ratio exists, $\theta = \theta_1 = \cdots = \theta_K$, Mantel and Haenszel's (1959) estimator of this common odds ratio is

$$\hat\theta_{MH} = \frac{\sum_{k=1}^{K} f_{11k} f_{22k} / n_k}{\sum_{k=1}^{K} f_{12k} f_{21k} / n_k}.$$

If every stratum is such that $f_{12k} = 0$ or $f_{21k} = 0$, then $\hat\theta_{MH}$ is undefined.



The (natural) log of the estimated common odds ratio is asymptotically normal. Note, however, that if $f_{11k} = 0$ or $f_{22k} = 0$ in every stratum, then $\hat\theta_{MH}$ is zero and $\log\left(\hat\theta_{MH}\right)$ is undefined.

The Asymptotic Confidence Interval

Robins et al. (1986) give an estimated asymptotic variance for $\log\left(\hat\theta_{MH}\right)$ that is appropriate in both asymptotic cases:

$$\hat\sigma^2\left[\log\left(\hat\theta_{MH}\right)\right] = \frac{\sum_{k=1}^{K}\left(f_{11k} + f_{22k}\right) f_{11k} f_{22k} / n_k^2}{2\left(\sum_{k=1}^{K} f_{11k} f_{22k} / n_k\right)^2} + \frac{\sum_{k=1}^{K}\left[\left(f_{11k} + f_{22k}\right) f_{12k} f_{21k} + \left(f_{12k} + f_{21k}\right) f_{11k} f_{22k}\right] / n_k^2}{2\left(\sum_{k=1}^{K} f_{11k} f_{22k} / n_k\right)\left(\sum_{k=1}^{K} f_{12k} f_{21k} / n_k\right)} + \frac{\sum_{k=1}^{K}\left(f_{12k} + f_{21k}\right) f_{12k} f_{21k} / n_k^2}{2\left(\sum_{k=1}^{K} f_{12k} f_{21k} / n_k\right)^2}.$$

An asymptotic $100(1-\alpha)$ percent confidence interval for $\log(\theta)$ is

$$\log\left(\hat\theta_{MH}\right) \pm z_{\alpha/2}\,\hat\sigma\left[\log\left(\hat\theta_{MH}\right)\right],$$

where $z_{\alpha/2}$ is the upper $\alpha/2$ critical value for the standard normal distribution. All these computations are valid only if $\hat\theta_{MH}$ is defined and greater than 0.
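A sketch of the Mantel-Haenszel common odds ratio, the Robins-Breslow-Greenland variance of its log, and the resulting confidence interval, for hypothetical strata.

```python
import math

strata = [[[12.0, 18.0], [8.0, 22.0]],
          [[20.0, 10.0], [15.0, 15.0]]]          # hypothetical strata
z = 1.959964                                     # z_{alpha/2} for a 95% interval

s = sum(f11 * f22 / (f11 + f12 + f21 + f22)
        for (f11, f12), (f21, f22) in strata)    # sum of f11k f22k / nk
t = sum(f12 * f21 / (f11 + f12 + f21 + f22)
        for (f11, f12), (f21, f22) in strata)    # sum of f12k f21k / nk
theta_mh = s / t

a = b = c = 0.0
for (f11, f12), (f21, f22) in strata:
    n = f11 + f12 + f21 + f22
    a += (f11 + f22) * f11 * f22 / n ** 2
    b += ((f11 + f22) * f12 * f21 + (f12 + f21) * f11 * f22) / n ** 2
    c += (f12 + f21) * f12 * f21 / n ** 2

var_log = a / (2 * s ** 2) + b / (2 * s * t) + c / (2 * t ** 2)
se_log = math.sqrt(var_log)
ci = (theta_mh * math.exp(-z * se_log), theta_mh * math.exp(z * se_log))
print(theta_mh, ci)
```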

The Asymptotic P-value

We compute an asymptotic P-value under the null hypothesis that $\theta\,(= \theta_k \;\forall k) = \theta_0\,(>0)$ against the two-sided alternative $\theta \ne \theta_0$, using the standard normal variate, as follows:

$$\Pr\left(|Z| > \frac{\left|\log\left(\hat\theta_{MH}\right) - \log\left(\theta_0\right)\right|}{\hat\sigma\left[\log\left(\hat\theta_{MH}\right)\right]}\right) = 2\Pr\left(Z > \frac{\left|\log\left(\hat\theta_{MH}\right) - \log\left(\theta_0\right)\right|}{\hat\sigma\left[\log\left(\hat\theta_{MH}\right)\right]}\right),$$

given that $\log\left(\hat\theta_{MH}\right)$ is defined.

Alternatively, we can consider using $\hat\theta_{MH}$ and the estimated variance of $\hat\theta_{MH}$, which is still consistent in both limiting cases:

$$\hat\sigma^2\left[\log\left(\hat\theta_{MH}\right)\right]\hat\theta_{MH}^2.$$

Then the asymptotic P-value may be approximated by

$$\Pr\left(|Z| > \frac{\left|\hat\theta_{MH} - \theta_0\right|}{\hat\sigma\left[\log\left(\hat\theta_{MH}\right)\right]\theta_0}\right).$$

The caveat for this formula is that $\hat\theta_{MH}$ may be quite skewed even in moderate sample sizes (Robins et al., 1986, p. 314).

References
Agresti, A. (1990). Categorical Data Analysis. New York: John Wiley.

Agresti, A. (1996). An Introduction to Categorical Data Analysis. New York: John Wiley.

Bishop, Y. M. M., Fienberg, S. E., and Holland, P. W. (1975). Discrete Multivariate Analysis: Theory and Practice. Cambridge, Mass.: MIT Press.

Breslow, N. E., and Day, N. E. (1980). Statistical Methods in Cancer Research, 1: The Analysis of Case-Control Studies. Lyon: International Agency for Research on Cancer.

Brown, M. B. (1975). The asymptotic standard errors of some estimates of uncertainty in the two-way contingency table. Psychometrika, 40(3), 291.

Brown, M. B., and Benedetti, J. K. (1977). Sampling behavior of tests for correlation in two-way contingency tables. Journal of the American Statistical Association, 72, 309-315.

Cochran, W. G. (1954). Some methods of strengthening the common chi-square tests. Biometrics, 10, 417-451.

Goodman, L. A., and Kruskal, W. H. (1954). Measures of association for cross-classification. Journal of the American Statistical Association, 49, 732-764.

Goodman, L. A., and Kruskal, W. H. (1972). Measures of association for cross-classification, IV: Simplification and asymptotic variances. Journal of the American Statistical Association, 67, 415-421.

Hauck, W. (1989). Odds ratio inference from stratified samples. Communications in Statistics, Theory and Methods, 18, 767-800.

Mantel, N., and Haenszel, W. (1959). Statistical aspects of the analysis of data from retrospective studies of disease. Journal of the National Cancer Institute, 22, 719-748.

Robins, J., Breslow, N., and Greenland, S. (1986). Estimators of the Mantel-Haenszel variance consistent in both sparse data and large-strata limiting models. Biometrics, 42, 311-323.

Snedecor, G. W., and Cochran, W. G. (1980). Statistical Methods, 7th ed. Ames, Iowa: Iowa State University Press.

Somes, G. W., and O'Brien, K. F. (1985). Mantel-Haenszel statistic. In Encyclopedia of Statistical Sciences, Vol. 5 (S. Kotz and N. L. Johnson, eds.), 214-217. New York: John Wiley.

Tarone, R. E. (1985). On heterogeneity tests based on efficient scores. Biometrika, 72, 91-95.
