
Contingency Tables

1. Explain 2 Test of Independence


2. Measure of Association
Contingency Tables

• Tables representing all combinations of levels of explanatory and response variables
• Numbers in the table represent counts of the number of cases in each cell
• Row and column totals are called marginal counts
2x2 Tables

• Each variable has 2 levels
– Explanatory Variable – Groups (typically based on demographics, exposure)
– Response Variable – Outcome (typically presence or absence of a characteristic)
2x2 Tables - Notation

                Outcome Present   Outcome Absent   Group Total
Group 1         n11               n12              n1.
Group 2         n21               n22              n2.
Outcome Total   n.1               n.2              n..
χ² Test of Independence

• 1. Shows If a Relationship Exists Between 2 Qualitative Variables
– One Sample Is Drawn
– Does Not Show Causality
• 2. Assumptions
– Multinomial Experiment
– All Expected Counts ≥ 5
• 3. Uses Two-Way Contingency Table
χ² Test of Independence
Contingency Table
• 1. Shows # Observations From 1 Sample Jointly in 2 Qualitative Variables

                  House Location
House Style    Urban   Rural   Total
Split-Level    63      49      112
Ranch          15      33      48
Total          78      82      160

Rows: levels of variable 1 (House Style). Columns: levels of variable 2 (House Location).
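χ² Test of Independence
Hypotheses & Statistic
• 1. Hypotheses
– H0: Variables Are Independent
– Ha: Variables Are Related (Dependent)
• 2. Test Statistic

$$\chi^2 = \sum_{\text{all cells}} \frac{\left[ n_{ij} - E(n_{ij}) \right]^2}{E(n_{ij})}$$

where n_ij is the observed count and E(n_ij) is the expected count in cell (i, j).
• Degrees of Freedom: (r - 1)(c - 1), where r = number of rows and c = number of columns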
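The statistic and degrees of freedom above can be sketched in a few lines of Python. This is a minimal illustration, not a library routine: the function name `chi_square_independence` is hypothetical, and the input is the house style by location table from the contingency table slide.

```python
def chi_square_independence(table):
    """Return (statistic, df) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence: row total * column total / n
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# House style (rows: Split-Level, Ranch) by location (columns: Urban, Rural).
house = [[63, 49], [15, 33]]
stat, df = chi_square_independence(house)
print(f"chi-square = {stat:.3f}, df = {df}")
```

The function loops over every cell, so it applies to any r x c table, not only 2x2.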
χ² Test of Independence
Expected Counts
• 1. Statistical Independence Means Joint Probability Equals Product of Marginal Probabilities
• 2. Compute Marginal Probabilities & Multiply for Joint Probability
• 3. Expected Count Is Sample Size Times Joint Probability
Expected Count Example

                  Location
House Style    Urban Obs.   Rural Obs.   Total
Split-Level    63           49           112
Ranch          15           33           48
Total          78           82           160

Marginal probability (Split-Level) = 112/160
Marginal probability (Urban) = 78/160
Joint probability = (112/160)(78/160)
Expected count = 160 · (112/160)(78/160) = 54.6
Expected Count Calculation

$$\text{Expected count} = \frac{(\text{Row total})(\text{Column total})}{\text{Sample size}}$$

                  House Location
               Urban          Rural
House Style    Obs.   Exp.    Obs.   Exp.    Total
Split-Level    63     54.6    49     57.4    112
Ranch          15     23.4    33     24.6    48
Total          78     78      82     82      160

Expected counts: 112·78/160 = 54.6, 112·82/160 = 57.4, 48·78/160 = 23.4, 48·82/160 = 24.6
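As a quick check of the expected-count rule, the following sketch recomputes the four expected counts for the house table. The helper name `expected_counts` is an assumption made for illustration.

```python
def expected_counts(table):
    """Expected counts under independence: (row total)(column total) / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts: rows Split-Level, Ranch; columns Urban, Rural.
house = [[63, 49], [15, 33]]
expected = expected_counts(house)
rounded = [[round(e, 1) for e in row] for row in expected]
print(rounded)
```

Each row of expected counts sums to the observed row total, and each column to the observed column total, which is a useful sanity check on hand calculations.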
χ² Test of Independence
Example
• You’re a marketing research analyst. You ask a random sample of 286 consumers if they purchase Diet Pepsi or Diet Coke. At the .05 level, is there evidence of a relationship?

               Diet Pepsi
Diet Coke    No     Yes    Total
No           84     32     116
Yes          48     122    170
Total        132    154    286
χ² Test of Independence
Solution

E(n_ij) ≥ 5 in all cells

                 Diet Pepsi
             No             Yes
Diet Coke    Obs.   Exp.    Obs.   Exp.    Total
No           84     53.5    32     62.5    116
Yes          48     78.5    122    91.5    170
Total        132    132     154    154     286

Expected counts: 116·132/286 = 53.5, 116·154/286 = 62.5, 170·132/286 = 78.5, 170·154/286 = 91.5
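χ² Test of Independence
Solution

$$\chi^2 = \sum_{\text{all cells}} \frac{\left[ n_{ij} - E(n_{ij}) \right]^2}{E(n_{ij})} = \frac{\left[ n_{11} - E(n_{11}) \right]^2}{E(n_{11})} + \frac{\left[ n_{12} - E(n_{12}) \right]^2}{E(n_{12})} + \cdots + \frac{\left[ n_{22} - E(n_{22}) \right]^2}{E(n_{22})}$$

$$= \frac{(84 - 53.5)^2}{53.5} + \frac{(32 - 62.5)^2}{62.5} + \cdots + \frac{(122 - 91.5)^2}{91.5} = 54.29$$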
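The value 54.29 comes from expected counts rounded to one decimal. The sketch below, with hypothetical variable names, computes the statistic both ways so the effect of rounding is visible; unrounded expected counts give a slightly smaller value (about 54.15), which does not change the decision.

```python
# Observed counts: rows Diet Coke (No, Yes); columns Diet Pepsi (No, Yes).
observed = [[84, 32], [48, 122]]
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)


def pearson(expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(observed, expected)
        for o, e in zip(o_row, e_row)
    )


exact = [[r * c / n for c in col_totals] for r in row_totals]
rounded = [[round(e, 1) for e in row] for row in exact]

chi2_rounded = pearson(rounded)  # uses 53.5, 62.5, 78.5, 91.5 as on the slide
chi2_exact = pearson(exact)      # uses unrounded expected counts
print(f"rounded: {chi2_rounded:.2f}, exact: {chi2_exact:.2f}")
```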
χ² Test of Independence
Solution
• H0: No Relationship
• Ha: Relationship
• α = .05
• df = (2 - 1)(2 - 1) = 1
• Critical Value(s): 3.841 (reject H0 if χ² > 3.841; upper-tail area α = .05)
Test Statistic: χ² = 54.29
Decision: Reject at α = .05
Conclusion: There is evidence of a relationship
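The decision rule can also be expressed through a p-value. For 1 degree of freedom the upper-tail probability of a χ² variable has a closed form via the complementary error function, since a χ² variable with 1 df is the square of a standard normal. The sketch below assumes df = 1 only; the function name `chi2_sf_df1` is hypothetical.

```python
import math


def chi2_sf_df1(x):
    """Upper-tail probability P(chi-square > x) for 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


alpha = 0.05
critical_value = 3.841   # tabled value for df = 1, alpha = .05
statistic = 54.29        # test statistic from the solution slide

reject = statistic > critical_value
p_value = chi2_sf_df1(statistic)
print(f"reject H0: {reject}, p-value: {p_value:.2e}")
```

Comparing the statistic with the critical value and comparing the p-value with α are equivalent ways of stating the same decision.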
Siskel and Ebert

           |        Ebert         |
    Siskel |    Con    Mix    Pro |  Total
-----------+----------------------+-------
       Con |     24      8     13 |     45
       Mix |      8     13     11 |     32
       Pro |     10      9     64 |     83
-----------+----------------------+-------
     Total |     42     30     88 |    160

Siskel and Ebert

Each cell shows the observed count above the expected count.

           |        Ebert         |
    Siskel |    Con    Mix    Pro |  Total
-----------+----------------------+-------
       Con |     24      8     13 |     45
           |   11.8    8.4   24.8 |   45.0
-----------+----------------------+-------
       Mix |      8     13     11 |     32
           |    8.4    6.0   17.6 |   32.0
-----------+----------------------+-------
       Pro |     10      9     64 |     83
           |   21.8   15.6   45.6 |   83.0
-----------+----------------------+-------
     Total |     42     30     88 |    160
           |   42.0   30.0   88.0 |  160.0

• Pearson chi2(4) = 45.3569, p < 0.001
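The reported Pearson statistic can be reproduced from the observed counts. The sketch below is illustrative only: it uses the closed-form upper tail for 4 degrees of freedom, exp(-x/2)(1 + x/2), which holds for df = 4 specifically, and all names are hypothetical.

```python
import math

# Observed ratings: rows Siskel (Con, Mix, Pro); columns Ebert (Con, Mix, Pro).
ratings = [[24, 8, 13], [8, 13, 11], [10, 9, 64]]

row_totals = [sum(row) for row in ratings]
col_totals = [sum(col) for col in zip(*ratings)]
n = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(ratings):
    for j, observed in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi2 += (observed - expected) ** 2 / expected

df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Closed-form upper-tail probability, valid for df = 4 only.
p_value = math.exp(-chi2 / 2.0) * (1.0 + chi2 / 2.0)
print(f"chi2({df}) = {chi2:.4f}, p = {p_value:.2e}")
```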


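Yates’ Statistic

• Method of testing for association in 2x2 tables when the sample size is moderate (total observations between 6 and 25)

$$\chi^2 = \sum_i \sum_j \frac{\left( \left| O_{ij} - e_{ij} \right| - 0.5 \right)^2}{e_{ij}}$$

where O_ij is the observed count and e_ij is the expected count in cell (i, j).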
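A minimal sketch of the continuity-corrected statistic follows. The 2x2 counts are hypothetical, chosen only so that the total (20) falls in the moderate range mentioned above; they do not come from the slides. The function name `yates_chi_square` is likewise an assumption.

```python
def yates_chi_square(table, correction=0.5):
    """Pearson statistic with |O - e| reduced by `correction` in each cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (abs(observed - expected) - correction) ** 2 / expected
    return stat


# Hypothetical 2x2 counts (illustration only, n = 20).
toy = [[7, 3], [2, 8]]
corrected = yates_chi_square(toy)
uncorrected = yates_chi_square(toy, correction=0.0)
print(f"corrected: {corrected:.3f}, uncorrected: {uncorrected:.3f}")
```

In this example the correction pulls the statistic toward 0, which makes the test more conservative.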
Measures of Association

– Relative Risk
– Odds Ratio
– Absolute Risk
Relative Risk

• Ratio of the probability that the outcome characteristic is present for one group, relative to the other
• Sample proportions with characteristic from groups 1 and 2:

$$\hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}}$$
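Relative Risk
• Estimated Relative Risk:

$$RR = \frac{\hat{\pi}_1}{\hat{\pi}_2}$$

95% Confidence Interval for Population Relative Risk:

$$\left( RR \, e^{-1.96\sqrt{v}} \; , \; RR \, e^{1.96\sqrt{v}} \right) \qquad e \approx 2.71828 \qquad v = \frac{1 - \hat{\pi}_1}{n_{11}} + \frac{1 - \hat{\pi}_2}{n_{21}}$$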
Relative Risk

• Interpretation
– Conclude that the probability that the
outcome is present is higher (in the
population) for group 1 if the entire interval is
above 1
– Conclude that the probability that the
outcome is present is lower (in the
population) for group 1 if the entire interval is
below 1
– Do not conclude that the probability of the
outcome differs for the two groups if the
interval contains 1
Example - Coccidioidomycosis and TNF-antagonists
• Research Question: Risk of developing coccidioidomycosis associated with arthritis therapy?
• Groups: Patients receiving tumor necrosis factor α (TNF) versus patients not receiving TNF (all patients arthritic)

         COC   No COC   Total
TNF      7     240      247
Other    4     734      738
Total    11    974      985

Source: Bergstrom, et al (2004)

Example - Coccidioidomycosis and TNF-antagonists
• Group 1: Patients on TNF
• Group 2: Patients not on TNF

$$\hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054$$

$$RR = \frac{\hat{\pi}_1}{\hat{\pi}_2} = \frac{.0283}{.0054} = 5.24 \qquad v = \frac{1 - .0283}{7} + \frac{1 - .0054}{4} = .3874$$

$$95\%\ CI: \left( 5.24 \, e^{-1.96\sqrt{.3874}} \; , \; 5.24 \, e^{1.96\sqrt{.3874}} \right) = (1.55 \, , \, 17.76)$$

Entire CI above 1 ⇒ Conclude higher risk if on TNF
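The relative-risk interval can be reproduced with a short sketch. The function name `relative_risk_ci` is hypothetical. Because the code keeps full precision for the sample proportions, its results differ slightly from the slide values, which were computed from proportions rounded to four decimals.

```python
import math


def relative_risk_ci(n11, n1_total, n21, n2_total, z=1.96):
    """Return (RR, lower, upper) using v = (1 - p1)/n11 + (1 - p2)/n21."""
    p1 = n11 / n1_total
    p2 = n21 / n2_total
    rr = p1 / p2
    v = (1 - p1) / n11 + (1 - p2) / n21
    half_width = z * math.sqrt(v)
    return rr, rr * math.exp(-half_width), rr * math.exp(half_width)


# Group 1: TNF (7 of 247 with COC); group 2: other therapy (4 of 738 with COC).
rr, lower, upper = relative_risk_ci(7, 247, 4, 738)
print(f"RR = {rr:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```

The interval is built on the log scale and then exponentiated, which is why it is not symmetric around RR.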


Odds Ratio

• Odds of an event is the probability it occurs divided by the probability it does not occur
• Odds ratio is the odds of the event for group 1 divided by the odds of the event for group 2
• Sample odds of the outcome for each group:

$$odds_1 = \frac{n_{11} / n_{1.}}{n_{12} / n_{1.}} = \frac{n_{11}}{n_{12}} \qquad odds_2 = \frac{n_{21}}{n_{22}}$$
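Odds Ratio
• Estimated Odds Ratio:

$$OR = \frac{odds_1}{odds_2} = \frac{n_{11} / n_{12}}{n_{21} / n_{22}} = \frac{n_{11} n_{22}}{n_{12} n_{21}}$$

95% Confidence Interval for Population Odds Ratio

$$\left( OR \, e^{-1.96\sqrt{v}} \; , \; OR \, e^{1.96\sqrt{v}} \right) \qquad e \approx 2.71828 \qquad v = \frac{1}{n_{11}} + \frac{1}{n_{12}} + \frac{1}{n_{21}} + \frac{1}{n_{22}}$$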
Odds Ratio

• Interpretation
– Conclude that the probability that the
outcome is present is higher (in the
population) for group 1 if the entire interval is
above 1
– Conclude that the probability that the
outcome is present is lower (in the
population) for group 1 if the entire interval is
below 1
– Do not conclude that the probability of the
outcome differs for the two groups if the
interval contains 1
Example - NSAIDs and GBM
• Case-Control Study (Retrospective)
– Cases: 137 Self-Reporting Patients with Glioblastoma Multiforme (GBM)
– Controls: 401 Population-Based Individuals matched to cases with respect to demographic factors

                  GBM Present   GBM Absent   Total
NSAID User        32            138          170
NSAID Non-User    105           263          368
Total             137           401          538

Source: Sivak-Sears, et al (2004)
Example - NSAIDs and GBM

$$OR = \frac{32(263)}{138(105)} = \frac{8416}{14490} = 0.58$$

$$v = \frac{1}{32} + \frac{1}{138} + \frac{1}{105} + \frac{1}{263} = 0.0518$$

$$95\%\ CI: \left( 0.58 \, e^{-1.96\sqrt{0.0518}} \; , \; 0.58 \, e^{1.96\sqrt{0.0518}} \right) = (0.37 \, , \, 0.91)$$

Interval is entirely below 1, NSAID use appears to be lower among cases than controls
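The odds-ratio calculation above can be checked with the following sketch. The function name `odds_ratio_ci` is hypothetical, and the cell arguments follow the n11, n12, n21, n22 notation of the 2x2 table.

```python
import math


def odds_ratio_ci(n11, n12, n21, n22, z=1.96):
    """Return (OR, lower, upper) using v = 1/n11 + 1/n12 + 1/n21 + 1/n22."""
    odds_ratio = (n11 * n22) / (n12 * n21)
    v = 1 / n11 + 1 / n12 + 1 / n21 + 1 / n22
    half_width = z * math.sqrt(v)
    lower = odds_ratio * math.exp(-half_width)
    upper = odds_ratio * math.exp(half_width)
    return odds_ratio, lower, upper


# NSAID users: 32 with GBM, 138 without; non-users: 105 with GBM, 263 without.
odds_ratio, lower, upper = odds_ratio_ci(32, 138, 105, 263)
print(f"OR = {odds_ratio:.2f}, 95% CI = ({lower:.2f}, {upper:.2f})")
```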
Absolute Risk

• Difference between the proportions of individuals with the outcome characteristic in the 2 groups
• Sample proportions with characteristic from groups 1 and 2:

$$\hat{\pi}_1 = \frac{n_{11}}{n_{1.}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2.}}$$
Absolute Risk
Estimated Absolute Risk:

$$AR = \hat{\pi}_1 - \hat{\pi}_2$$

95% Confidence Interval for Population Absolute Risk

$$AR \pm 1.96 \sqrt{ \frac{\hat{\pi}_1 (1 - \hat{\pi}_1)}{n_{1.}} + \frac{\hat{\pi}_2 (1 - \hat{\pi}_2)}{n_{2.}} }$$
Absolute Risk

• Interpretation
– Conclude that the probability that the
outcome is present is higher (in the
population) for group 1 if the entire interval is
positive
– Conclude that the probability that the
outcome is present is lower (in the
population) for group 1 if the entire interval is
negative
– Do not conclude that the probability of the
outcome differs for the two groups if the
interval contains 0
Example - Coccidioidomycosis and TNF-antagonists
• Group 1: Patients on TNF
• Group 2: Patients not on TNF

$$\hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054$$

$$AR = \hat{\pi}_1 - \hat{\pi}_2 = .0283 - .0054 = .0229$$

$$95\%\ CI: .0229 \pm 1.96 \sqrt{ \frac{.0283(.9717)}{247} + \frac{.0054(.9946)}{738} } = .0229 \pm .0213 = (0.0016 \, , \, 0.0442)$$

Interval is entirely positive, TNF is associated with higher risk
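A short sketch of the absolute-risk interval, using the same TNF counts. The function name `absolute_risk_ci` is hypothetical, and full-precision proportions are used, so the last digit can differ from a hand calculation based on rounded proportions.

```python
import math


def absolute_risk_ci(n11, n1_total, n21, n2_total, z=1.96):
    """Return (AR, lower, upper) for the difference of two proportions."""
    p1 = n11 / n1_total
    p2 = n21 / n2_total
    ar = p1 - p2
    se = math.sqrt(p1 * (1 - p1) / n1_total + p2 * (1 - p2) / n2_total)
    return ar, ar - z * se, ar + z * se


# Group 1: TNF (7 of 247 with COC); group 2: other therapy (4 of 738 with COC).
ar, lower, upper = absolute_risk_ci(7, 247, 4, 738)
print(f"AR = {ar:.4f}, 95% CI = ({lower:.4f}, {upper:.4f})")
```

Unlike the relative-risk and odds-ratio intervals, this interval is symmetric around the estimate and is compared with 0 rather than 1.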
Ordinal Explanatory and Response
Variables
• Pearson’s Chi-square test can be used to test
associations among ordinal variables, but more
powerful methods exist
• When theories exist that the association is
directional (positive or negative), measures exist
to describe and test for these specific
alternatives from independence:
– Gamma
– Kendall’s τ_b
Concordant and Discordant Pairs
• Concordant Pairs - Pairs of individuals where one
individual scores “higher” on both ordered
variables than the other individual
• Discordant Pairs - Pairs of individuals where one
individual scores “higher” on one ordered
variable and the other individual scores “lower”
on the other
• C = # Concordant Pairs, D = # Discordant Pairs
– Under Positive association, expect C > D
– Under Negative association, expect C < D
– Under No association, expect C ≈ D
Example - Alcohol Use and Sick Days

• Alcohol Risk (Without Risk, Hardly any Risk, Some to Considerable Risk)
• Sick Days (0, 1-6, ≥7)
• Concordant Pairs - Pairs of respondents where
one scores higher on both alcohol risk and sick
days than the other
• Discordant Pairs - Pairs of respondents where
one scores higher on alcohol risk and the other
scores higher on sick days
Source: Hermansson, et al (2003)
Example - Alcohol Use and Sick Days

                                   Sick Days
Alcohol Risk                   0      1-6    ≥7     Total
Without Risk                   347    113    145    605
Hardly any Risk                154    63     56     273
Some to Considerable Risk      52     25     34     111
Total                          553    201    235    989

• Concordant Pairs: Each individual in a given cell is concordant with each individual in cells “Southeast” of theirs
• Discordant Pairs: Each individual in a given cell is discordant with each individual in cells “Southwest” of theirs
Example - Alcohol Use and Sick Days

$$C = 347(63 + 56 + 25 + 34) + 113(56 + 34) + 154(25 + 34) + 63(34) = 83164$$

$$D = 145(154 + 63 + 52 + 25) + 113(154 + 52) + 56(52 + 25) + 63(52) = 73496$$
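The "Southeast" and "Southwest" counting rule can be sketched as a pair of nested loops. The 3x3 cell counts below are taken from the terms of the C and D expressions, with rows assumed to be alcohol risk and columns assumed to be sick days, both in increasing order. The function name `concordant_discordant` is hypothetical.

```python
def concordant_discordant(table):
    """Count concordant (C) and discordant (D) pairs in an ordered r x c table."""
    rows, cols = len(table), len(table[0])
    concordant = 0
    discordant = 0
    for i in range(rows):
        for j in range(cols):
            for k in range(i + 1, rows):       # rows strictly below cell (i, j)
                for m in range(cols):
                    if m > j:                  # "Southeast": higher on both
                        concordant += table[i][j] * table[k][m]
                    elif m < j:                # "Southwest": higher on one, lower on other
                        discordant += table[i][j] * table[k][m]
    return concordant, discordant


# Cell counts read off the C and D expressions (row/column orientation assumed).
alcohol_by_sick_days = [
    [347, 113, 145],
    [154, 63, 56],
    [52, 25, 34],
]
C, D = concordant_discordant(alcohol_by_sick_days)
print(C, D)
```

Pairs that tie on either variable (same row or same column) are neither concordant nor discordant and are skipped.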
Measures of Association
• Goodman and Kruskal’s Gamma:

$$\hat{\gamma} = \frac{C - D}{C + D} \qquad -1 \le \hat{\gamma} \le 1$$

• Kendall’s τ_b:

$$\hat{\tau}_b = \frac{C - D}{0.5 \sqrt{ \left( n^2 - \sum_i n_{i.}^2 \right) \left( n^2 - \sum_j n_{.j}^2 \right) }}$$

When there’s no association between the ordinal variables, the population values of these measures are 0. Statistical software packages provide these tests.
Example - Alcohol Use and Sick Days

$$\hat{\gamma} = \frac{C - D}{C + D} = \frac{83164 - 73496}{83164 + 73496} = 0.0617$$
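Both ordinal measures can be computed from C, D, and the marginal totals. The sketch below makes two assumptions: the 3x3 cell counts are those appearing in the C and D expressions, and Kendall's τ_b uses the standard form with denominator 0.5·sqrt((n² − Σ n_i.²)(n² − Σ n_.j²)). The function name `gamma_and_tau_b` is hypothetical.

```python
import math


def gamma_and_tau_b(table, concordant, discordant):
    """Return (gamma, tau_b) from pair counts and the table's marginal totals."""
    n = sum(sum(row) for row in table)
    row_sq = sum(sum(row) ** 2 for row in table)         # sum of squared row totals
    col_sq = sum(sum(col) ** 2 for col in zip(*table))   # sum of squared column totals
    diff = concordant - discordant
    gamma = diff / (concordant + discordant)
    tau_b = diff / (0.5 * math.sqrt((n ** 2 - row_sq) * (n ** 2 - col_sq)))
    return gamma, tau_b


# Cell counts read off the C and D expressions (orientation assumed).
table = [[347, 113, 145], [154, 63, 56], [52, 25, 34]]
gamma, tau_b = gamma_and_tau_b(table, concordant=83164, discordant=73496)
print(f"gamma = {gamma:.4f}, tau_b = {tau_b:.4f}")
```

Because τ_b accounts for tied pairs in its denominator, its magnitude is never larger than that of gamma for the same table.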

