to use?
• For every model we will cover you must ask a series of
questions
1. What type of dependent variable do I have?
2. How many independent variables do I have?
3. What type of independent variable do I have?
• If the independent is categorical, how many categories does it have?
4. Were subjects measured more than once (repeated measures)?
• The next slide provides a decision chart which summarizes
these questions and leads to a statistical method
• We will only cover a few of the models shown; the other
methods are FYI
• This lecture will focus on contingency analysis
Contingency
analysis
Contingency analysis
• Contingency analysis estimates and tests for
associations between two or more categorical
variables
Contingency analysis
• Contingency analysis is complemented by
• Contingency table
• Graphing: mosaic plots or grouped barplots
Contingency analysis
• Contingency analysis is complemented by
• An associated effect size: the odds ratio
Contingency analysis
• Assumptions of contingency analysis are linked to
the test statistic used
• It is commonplace to apply the χ² test.
• Assumptions include:
• Random sample
• Sufficiently large sample size
• Expected cell counts ≥ 5 (Yates’s correction can be used)
• The observations are assumed to be independent
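The expected-count assumption can be checked directly in R. The sketch below uses the Titanic counts from the example later in this lecture; the object names (`titanic`, `plain`, `yates`) are illustrative assumptions, not part of the original slides.

```r
# Titanic counts from this lecture: rows = outcome, columns = sex
titanic <- matrix(c(1329, 338, 109, 316), nrow = 2,
                  dimnames = list(outcome = c("Died", "Survived"),
                                  sex = c("Men", "Women")))

# Chi-squared test without and with Yates's continuity correction
plain <- chisq.test(titanic, correct = FALSE)
yates <- chisq.test(titanic, correct = TRUE)

# Expected counts under independence; every cell should be at least 5
expected <- plain$expected
print(round(expected, 3))
print(all(expected >= 5))
```

Yates's correction shrinks the test statistic slightly, so with counts this large the two versions lead to the same conclusion.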
Contingency analysis
• At its heart, contingency analysis is the
investigation of the independence of the variables
• If two variables are independent, then the state of
one variable tells nothing about the probability of
the different values of the other variable
Contingency analysis
Example: how did women fare on the Titanic?
Observed frequencies
            Men   Women    Sum
Died       1329     109   1438
Survived    338     316    654
Sum        1667     425   2092
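A minimal sketch of how this table, and the mosaic plot and grouped barplot mentioned earlier, could be produced in base R. The object name `titanic` and the plot titles are assumptions made for illustration.

```r
# Observed frequencies from the table above: rows = outcome, columns = sex
titanic <- matrix(c(1329, 338, 109, 316), nrow = 2,
                  dimnames = list(outcome = c("Died", "Survived"),
                                  sex = c("Men", "Women")))

# Contingency table with row and column sums added
print(addmargins(titanic))

# Mosaic plot with sex on the horizontal axis (hence the transpose)
mosaicplot(t(titanic), main = "Titanic: outcome by sex")

# Grouped barplot: one group of bars per sex, one bar per outcome
barplot(titanic, beside = TRUE, legend.text = TRUE,
        main = "Titanic: outcome by sex")
```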
χ² contingency test
• H0: Survival was independent of sex on the Titanic
• HA: Survival was not independent of sex on the
Titanic
• To perform a χ² contingency test we calculate the
expected frequencies and compare them to the
observed frequencies
• The expected frequencies are those under the
assumption the null hypothesis is true
χ² contingency test
• What would we expect if sex and death were
independent?
• The mosaic chart below shows this expectation: equal
proportions of men and women dying on the Titanic. But
this isn't what happened
χ² contingency test
• However, in reality sex and death were not
independent
• More men died
χ² contingency test
• Calculating the expected frequencies
• If two events are independent, then by definition, the
probability of both occurring is equal to the probability
of one event occurring times the probability of the other
event occurring
• Thus: $\Pr[\text{died and man}] = \Pr[\text{died}] \times \Pr[\text{man}]$
• Multiplying this probability by the total count gives the
expected frequency of men who died:
$\frac{1438}{2092} \times \frac{1667}{2092} \times 2092 = 1145.863$
• You would do this for every cell and construct this table:
Expected frequencies
                 Men     Women        Sum
Died        1145.863   292.137   1438.000
Survived     521.137   132.863    654.000
Sum         1667.000   425.000   2092.000
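The expected table can be reproduced in a few lines of R. This is a sketch under the independence assumption described above: each expected count is (row total × column total) / grand total. The names `observed` and `expected` are illustrative.

```r
# Observed frequencies: rows = outcome, columns = sex
observed <- matrix(c(1329, 338, 109, 316), nrow = 2,
                   dimnames = list(outcome = c("Died", "Survived"),
                                   sex = c("Men", "Women")))

n <- sum(observed)  # grand total

# Outer product of the row and column totals, divided by the grand total
expected <- outer(rowSums(observed), colSums(observed)) / n

print(round(addmargins(expected), 3))
```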
χ² contingency test
• χ² statistic
$$\chi^2 = \frac{(1329-1145.863)^2}{1145.863} + \frac{(338-521.137)^2}{521.137} + \frac{(109-292.137)^2}{292.137} + \frac{(316-132.863)^2}{132.863} = 460.866$$
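The same statistic can be computed by hand in R and compared with the built-in `chisq.test()`. This is a sketch: the degrees of freedom, (rows − 1) × (columns − 1), and the p-value step are standard for this test but are not shown on the slides above, and the variable names are illustrative.

```r
# Observed frequencies: rows = outcome, columns = sex
observed <- matrix(c(1329, 338, 109, 316), nrow = 2,
                   dimnames = list(outcome = c("Died", "Survived"),
                                   sex = c("Men", "Women")))

# Expected frequencies under independence
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Chi-squared statistic: sum over cells of (observed - expected)^2 / expected
chi_sq <- sum((observed - expected)^2 / expected)

# Degrees of freedom and upper-tail p-value
df <- (nrow(observed) - 1) * (ncol(observed) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

# Built-in test; correct = FALSE turns off Yates's correction
builtin <- chisq.test(observed, correct = FALSE)

print(round(chi_sq, 3))
print(p_value)
```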
$$\hat{O} = \frac{\hat{p}}{1 - \hat{p}}$$
• Where:
• $\hat{p}$ is the probability of success
• $1 - \hat{p}$ is the probability of failure
• $\hat{O}$ are the odds of success
Odds ratios
• What is the difference between
probability and odds?
• Probability expresses chance as a ratio of the number of
desired outcomes to the total number of possible
outcomes
• Odds express chance as the ratio of success to failure: the
number of desired outcomes to the number of undesired
outcomes
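The distinction is easy to see with two small helper functions. These are hypothetical helpers written for illustration, not functions from any package.

```r
# Convert a probability to odds: successes per failure
prob_to_odds <- function(p) p / (1 - p)

# Convert odds back to a probability
odds_to_prob <- function(o) o / (1 + o)

# A probability of 0.75 means 3 successes for every 1 failure
print(prob_to_odds(0.75))

# Even odds (1 to 1) correspond to a probability of 0.5
print(odds_to_prob(1))
```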
Odds ratios
• 1st – estimate the proportion of men that died
$\hat{p}_1 = \frac{1329}{1667} = 0.797$
Odds ratios
• 2nd – estimate the proportion of men that lived
$1 - \hat{p}_1 = 1 - 0.797 = 0.203$
$\hat{O}_1 = \frac{\hat{p}_1}{1 - \hat{p}_1} = \frac{0.797}{0.203} = 3.93$
Odds ratios
• Now repeat this process for estimating the odds
that a woman died on the Titanic
$\hat{p}_2 = \frac{109}{425} = 0.256 \qquad 1 - \hat{p}_2 = 1 - 0.256 = 0.744$
$\hat{O}_2 = \frac{\hat{p}_2}{1 - \hat{p}_2} = \frac{0.256}{0.744} = 0.345$
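Both sets of odds can be computed in one pass in R. This is a sketch using the counts from the observed table; the vector names are illustrative assumptions.

```r
# Deaths and totals by sex, from the observed frequencies
died  <- c(Men = 1329, Women = 109)
total <- c(Men = 1667, Women = 425)

# Estimated probability of dying, and the odds of dying
p_hat <- died / total
odds  <- p_hat / (1 - p_hat)

print(round(p_hat, 3))
print(round(odds, 3))
```

Working from the unrounded proportions gives 3.932 for men rather than the 3.93 obtained from the rounded values 0.797 and 0.203.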
Odds ratios
• Now that we have the odds we can calculate the
odds ratio
• The odds ratio measures the magnitude of
association between two categorical variables
when each variable has only two categories
$$OR = \frac{\hat{O}_1}{\hat{O}_2}$$
Odds ratios
• The odds ratio is a measure of effect size
describing the strength of association or
non-independence between two binary variables
Odds ratios
• If the odds ratio is equal to one, the odds of success
in the response variable are independent of
treatment
• If OR > 1, the event has higher odds in the first group
than the second
• If OR < 1, then the odds are higher in the second group
Odds ratios
$$OR = \frac{\hat{O}_1}{\hat{O}_2} = \frac{3.932}{0.345} = 11.399$$
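library(epitools)
# Assumed layout: rows = sex (reference group first), columns = outcome (event of interest last)
titanicTable <- matrix(c(316, 338, 109, 1329), nrow = 2, dimnames = list(c("Women", "Men"), c("Survived", "Died")))
oddsratio(titanicTable, method = "wald")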
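For comparison, the odds ratio can also be computed by hand in base R. This is a sketch: the confidence interval uses the usual Wald approximation on the log odds ratio, which is assumed here to be what `method = "wald"` refers to. The variable names are illustrative.

```r
# Cell counts from the observed frequencies
men_died     <- 1329
men_lived    <- 338
women_died   <- 109
women_lived  <- 316

# Odds of dying in each group, then their ratio (men relative to women)
odds_men   <- men_died / men_lived
odds_women <- women_died / women_lived
odds_ratio <- odds_men / odds_women

# Wald standard error of log(OR): sqrt of the summed reciprocal cell counts
se_log_or <- sqrt(1 / men_died + 1 / men_lived + 1 / women_died + 1 / women_lived)

# Approximate 95% confidence interval, back-transformed from the log scale
ci <- exp(log(odds_ratio) + c(-1, 1) * qnorm(0.975) * se_log_or)

print(round(odds_ratio, 3))
print(round(ci, 3))
```

An interval that excludes 1 is consistent with the χ² test's rejection of independence.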