Associations Between Categorical Variables
Categorical Variables
Case where both the explanatory (independent) variable and the response
(dependent) variable are qualitative (Chapter 7 covers the case where both
are binary, i.e. have 2 levels)
[Figure: chart by season; image not recovered]
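P-value: area above the observed chi-squared statistic ($\chi^2_{obs}$) under
the chi-squared distribution with (r-1)(c-1) degrees of freedom, where r and c
are the numbers of rows and columns. (Critical values in Table 8.5)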
Example - Cyclones Near Antarctica
Observed Cell Counts (fo):
Region\Season   Autumn   Winter   Spring   Summer   Total
40-49S             370      452      273      422    1517
50-59S             526      624      513     1059    2722
60-79S             980     1200      995     1751    4926
Total             1876     2276     1781     3232    9165
REGION by SEASON
                                    Autumn   Winter   Spring   Summer    Total
REGION  40-49S  Count                  370      452      273      422     1517
                Expected Count       310.5    376.7    294.8    535.0   1517.0
                % within REGION      24.4%    29.8%    18.0%    27.8%   100.0%
        50-59S  Count                  526      624      513     1059     2722
                Expected Count       557.2    676.0    529.0    959.9   2722.0
                % within REGION      19.3%    22.9%    18.8%    38.9%   100.0%
        60-79S  Count                  980     1200      995     1751     4926
                Expected Count      1008.3   1223.3    957.3   1737.1   4926.0
                % within REGION      19.9%    24.4%    20.2%    35.5%   100.0%
Total           Count                 1876     2276     1781     3232     9165
                Expected Count      1876.0   2276.0   1781.0   3232.0   9165.0
                % within REGION      20.5%    24.8%    19.4%    35.3%   100.0%
Chi-Square Tests
                                 Value     df   Asymp. Sig. (2-sided) = P-value
Pearson Chi-Square               71.189a    6   .000
Likelihood Ratio                 71.337     6   .000
Linear-by-Linear Association     23.418     1   .000
N of Valid Cases                 9165
a. 0 cells (.0%) have expected count less than 5. The minimum expected count is 294.79.
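The Pearson statistic in the output above can be re-derived from the observed counts. The following is a minimal sketch in plain Python, assuming the usual Pearson form (the sum over cells of (fo - fe)^2 / fe, with fe = row total x column total / n); the variable names are illustrative and not part of the original slides. The result should agree closely with the Pearson Chi-Square value, degrees of freedom, and minimum expected count reported in the table.

```python
# Observed cyclone counts from the table above (rows: regions, columns: seasons).
observed = [
    [370, 452, 273, 422],    # 40-49S
    [526, 624, 513, 1059],   # 50-59S
    [980, 1200, 995, 1751],  # 60-79S
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count under independence: (row total * column total) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

# Pearson chi-squared statistic: sum over all cells of (fo - fe)^2 / fe.
chi_sq = sum(
    (fo - fe) ** 2 / fe
    for obs_row, exp_row in zip(observed, expected)
    for fo, fe in zip(obs_row, exp_row)
)

# Degrees of freedom: (r - 1)(c - 1).
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Smallest expected count, used to check the "expected counts above 5" condition.
min_expected = min(min(row) for row in expected)

print(f"chi-squared = {chi_sq:.3f}, df = {df}, min expected = {min_expected:.2f}")
```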
Misuses of chi-squared Test
Expected frequencies too small (all expected counts should be above 5;
this is not required of the observed counts)
Dependent samples (the same individuals appear in each row; see
McNemar's test)
The test can be used for nominal or ordinal variables, but more powerful
methods exist when both variables are ordinal and a directional
association is hypothesized
Residual Analysis
Once a chi-squared test has established dependence, we are often interested
in determining which cells contributed to it
Residual: fo - fe measures the difference between the observed and expected
counts
A positive residual means more observations than expected in that cell
A residual's practical importance depends on the size of fe
Adjusted Residual (computed for each cell):
$$\text{adjusted residual} = \frac{f_o - f_e}{\sqrt{f_e\,(1 - \text{row proportion})(1 - \text{column proportion})}}$$
Adjusted residuals above 3 in absolute value give strong evidence against independence in
that cell
Example - Cyclones Near Antarctica
Adjusted residuals are computed in the following table.
Row proportion for Region 40-49S: 1517/9165=0.1655
Column Proportion for Season Autumn is: 1876/9165=0.2047
Region   Season   fo     fe       row prop   col prop   adj res
40-49S   Autumn    370    310.5   0.1655     0.2047      4.144837
40-49S   Winter    452    376.7   0.1655     0.2483      4.898484
40-49S   Spring    273    294.8   0.1655     0.1943     -1.54843
40-49S   Summer    422    535     0.1655     0.3526     -6.64664
50-59S   Autumn    526    557.2   0.297      0.2047     -1.76769
50-59S   Winter    624    676     0.297      0.2483     -2.75125
50-59S   Spring    513    529     0.297      0.1943     -0.92433
50-59S   Summer   1059    959.9   0.297      0.3526      4.741291
60-79S   Autumn    980   1008.3   0.5375     0.2047     -1.4695
60-79S   Winter   1200   1223.3   0.5375     0.2483     -1.12983
60-79S   Spring    995    957.3   0.5375     0.1943      1.996065
60-79S   Summer   1751   1737.1   0.5375     0.3526      0.609481
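The adjusted residuals in this table can be reproduced cell by cell. The sketch below is one way to do it in plain Python; the function name `adjusted_residual` and the variable names are hypothetical, chosen for illustration. It works from unrounded expected counts and proportions, so results may differ from the tabulated values in the third decimal place, since the table appears to use rounded inputs.

```python
import math

# Observed cyclone counts (rows: regions, columns: seasons).
regions = ["40-49S", "50-59S", "60-79S"]
seasons = ["Autumn", "Winter", "Spring", "Summer"]
observed = [
    [370, 452, 273, 422],
    [526, 624, 513, 1059],
    [980, 1200, 995, 1751],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)


def adjusted_residual(fo, row_total, col_total, n):
    """(fo - fe) / sqrt(fe * (1 - row proportion) * (1 - column proportion))."""
    fe = row_total * col_total / n
    row_prop = row_total / n
    col_prop = col_total / n
    return (fo - fe) / math.sqrt(fe * (1 - row_prop) * (1 - col_prop))


# One adjusted residual per cell, keyed by (region, season).
adj_res = {
    (regions[i], seasons[j]): adjusted_residual(
        observed[i][j], row_totals[i], col_totals[j], n
    )
    for i in range(len(regions))
    for j in range(len(seasons))
}

# Cells whose adjusted residual exceeds 3 in absolute value.
flagged = sorted(cell for cell, value in adj_res.items() if abs(value) > 3)

for cell, value in adj_res.items():
    print(cell, round(value, 3))
```

With the threshold of 3 from the previous slide, `flagged` picks out the cells that give strong evidence against independence.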
2x2 Tables
$$\hat{\pi}_1 = \frac{n_{11}}{n_{1\cdot}} \qquad \hat{\pi}_2 = \frac{n_{21}}{n_{2\cdot}}$$
Relative Risk
Estimated Relative Risk:
$$RR = \frac{\hat{\pi}_1}{\hat{\pi}_2}$$
Interpretation (confidence interval for the population relative risk)
Conclude that the probability that the outcome is
present is higher (in the population) for group 1 if
the entire interval is above 1
Conclude that the probability that the outcome is
present is lower (in the population) for group 1 if
the entire interval is below 1
Do not conclude that the probability of the
outcome differs for the two groups if the interval
contains 1
Example - Coccidioidomycosis and
TNF-antagonists
Research Question: Is the risk of developing
Coccidioidomycosis associated with arthritis
therapy?
Groups: Patients receiving tumor necrosis
factor (TNF) antagonists versus patients not
receiving TNF antagonists (all patients arthritic)
         COC   No COC   Total
TNF        7      240     247
Other      4      734     738
Total     11      974     985
Source: Bergstrom, et al (2004)
Example - Coccidioidomycosis and
TNF-antagonists
Group 1: Patients on TNF
Group 2: Patients not on TNF
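$$\hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054$$
$$RR = \frac{\hat{\pi}_1}{\hat{\pi}_2} = \frac{.0283}{.0054} = 5.24 \qquad v = \frac{1 - .0283}{7} + \frac{1 - .0054}{4} = .3874$$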
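The quantities on this slide can be checked numerically. The sketch below recomputes the two sample proportions, the relative risk, and v. The final step is an assumption: it treats v as the estimated variance of ln(RR) and forms the common large-sample 95% interval RR·exp(±1.96·√v). The slides as extracted do not show the interval formula, so that step should be read as illustrative. Variable names are hypothetical.

```python
import math

# Counts from the 2x2 table: outcome present (COC) and group sizes.
n11, n1_total = 7, 247   # group 1: patients on TNF antagonists
n21, n2_total = 4, 738   # group 2: patients not on TNF antagonists

# Sample proportions with the outcome in each group.
p1 = n11 / n1_total
p2 = n21 / n2_total

# Estimated relative risk.
rr = p1 / p2

# v as computed on the slide: (1 - p1)/n11 + (1 - p2)/n21.
v = (1 - p1) / n11 + (1 - p2) / n21

# ASSUMPTION: v is the variance of ln(RR), giving the interval
# (RR * exp(-1.96 * sqrt(v)), RR * exp(+1.96 * sqrt(v))).
margin = 1.96 * math.sqrt(v)
ci_lower = rr * math.exp(-margin)
ci_upper = rr * math.exp(margin)

print(f"p1 = {p1:.4f}, p2 = {p2:.4f}, RR = {rr:.2f}, v = {v:.4f}")
print(f"approximate 95% CI for RR: ({ci_lower:.2f}, {ci_upper:.2f})")
```

Because the code carries full precision, RR and v can differ from the slide's figures in the last printed digit. Under the stated assumption the whole interval lies above 1.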
Example - NSAIDs and GBM
Case-Control Study (Retrospective)
Cases: 137 Self-Reporting Patients with Glioblastoma
Multiforme (GBM)
Controls: 401 Population-Based Individuals matched to
cases with respect to demographic factors
$$AR = \hat{\pi}_1 - \hat{\pi}_2$$
Interpretation
Conclude that the probability that the outcome is
present is higher (in the population) for group 1 if
the entire interval is positive
Conclude that the probability that the outcome is
present is lower (in the population) for group 1 if
the entire interval is negative
Do not conclude that the probability of the
outcome differs for the two groups if the interval
contains 0
Example - Coccidioidomycosis and
TNF-antagonists
Group 1: Patients on TNF
Group 2: Patients not on TNF
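$$\hat{\pi}_1 = \frac{7}{247} = .0283 \qquad \hat{\pi}_2 = \frac{4}{738} = .0054$$
$$AR = \hat{\pi}_1 - \hat{\pi}_2 = .0283 - .0054 = .0229$$
$$95\%\ CI: \; .0229 \pm 1.96\sqrt{\frac{.0283(.9717)}{247} + \frac{.0054(.9946)}{738}} = .0229 \pm .0213 = (0.0016,\ 0.0442)$$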
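The interval arithmetic above can be verified with a few lines of Python. This is a minimal sketch using the counts from the 2x2 table and the standard error shown on the slide; the variable names are illustrative only.

```python
import math

# Counts from the 2x2 table: outcome present (COC) and group sizes.
n11, n1_total = 7, 247   # group 1: patients on TNF antagonists
n21, n2_total = 4, 738   # group 2: patients not on TNF antagonists

p1 = n11 / n1_total
p2 = n21 / n2_total

# Difference in sample proportions (AR on the slide).
ar = p1 - p2

# Standard error of the difference: sqrt(p1(1-p1)/n1 + p2(1-p2)/n2).
se = math.sqrt(p1 * (1 - p1) / n1_total + p2 * (1 - p2) / n2_total)

# 95% confidence interval: AR +/- 1.96 * SE.
margin = 1.96 * se
ci_lower = ar - margin
ci_upper = ar + margin

print(f"AR = {ar:.4f}, margin = {margin:.4f}")
print(f"95% CI: ({ci_lower:.4f}, {ci_upper:.4f})")
```

The whole interval is positive, so by the interpretation rule above the outcome is judged more probable in group 1.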
Count
                                        SICKDAYS
                                  0 days   1-6 days   7+ days   Total
ALCOHOL  Without Risk                347        113       145     605
         Hardly any Risk             154         63        56     273
         Some-Considerable Risk       52         25        34     111
Total                                553        201       235     989
Kendall's $\tau_b$:
$$\hat{\tau}_b = \frac{C - D}{0.5\sqrt{\left(n^2 - \sum_i n_{i\cdot}^2\right)\left(n^2 - \sum_j n_{\cdot j}^2\right)}}$$
Symmetric Measures
                                       Value   Asymp. Std. Error(a)   Approx. T(b)   Approx. Sig.
Ordinal by Ordinal   Kendall's tau-b   .035    .030                   1.187          .235
                     Gamma             .062    .052                   1.187          .235
N of Valid Cases                       989
a. Not assuming the null hypothesis.
b. Using the asymptotic standard error assuming the null hypothesis.
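The two point estimates in the Symmetric Measures table can be recomputed from the ALCOHOL by SICKDAYS counts. The sketch below counts concordant (C) and discordant (D) pairs directly, then applies the tau-b formula given above. The gamma line uses the usual definition (C - D)/(C + D), which is not stated on the slides and is included as an assumption. Only the estimates are reproduced, not the standard errors or the approximate T; names are illustrative.

```python
import math

# ALCOHOL (rows, increasing risk) by SICKDAYS (columns, increasing days).
table = [
    [347, 113, 145],  # Without Risk
    [154, 63, 56],    # Hardly any Risk
    [52, 25, 34],     # Some-Considerable Risk
]
n_rows, n_cols = len(table), len(table[0])

# Count concordant and discordant pairs: pair each cell with every cell in a
# lower row; a higher column is concordant, a lower column is discordant.
concordant = 0
discordant = 0
for i in range(n_rows):
    for j in range(n_cols):
        for k in range(i + 1, n_rows):
            for l in range(n_cols):
                if l > j:
                    concordant += table[i][j] * table[k][l]
                elif l < j:
                    discordant += table[i][j] * table[k][l]

row_totals = [sum(row) for row in table]
col_totals = [sum(col) for col in zip(*table)]
n = sum(row_totals)

# Kendall's tau-b: (C - D) / (0.5 * sqrt((n^2 - sum n_i.^2)(n^2 - sum n_.j^2))).
row_term = n**2 - sum(r**2 for r in row_totals)
col_term = n**2 - sum(c**2 for c in col_totals)
tau_b = (concordant - discordant) / (0.5 * math.sqrt(row_term * col_term))

# ASSUMPTION: gamma defined as (C - D) / (C + D).
gamma = (concordant - discordant) / (concordant + discordant)

print(f"C = {concordant}, D = {discordant}")
print(f"tau-b = {tau_b:.3f}, gamma = {gamma:.3f}")
```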