
Outline

Association for nominal variables


Association for ordinal data
Association for dependent samples

STAT 63103
Lecture 03 – Testing independence for two-way contingency tables

Niroshan Withanage
niroshan@sjp.ac.lk

Dec 12, 2021


Outline

1 Pearson Chi-Squared test
2 Likelihood-ratio test
3 Linear trend alternative to independence
4 Comparing dependent proportions (McNemar's test)


Chi-Squared test

This is a nonparametric test used to test the association (statistical independence) between two categorical variables.
The test only assesses association between categorical variables and cannot support any inference about causation.
Observations should be independent (i.e. the categorical variables are not "paired" in any way).
The expected frequency should be at least 5 in each cell.


Chi-Squared test (contd.)

Hypothesis
H0 : Variable 1 and Variable 2 are independent
H1 : Variable 1 and Variable 2 are dependent
Test statistic

$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(o_{ij}-e_{ij})^2}{e_{ij}} \sim \chi^2_{(r-1)(c-1)}$$

where
$o_{ij}$ is the observed cell count in the ith row and jth column of the table,
$e_{ij}$ is the expected cell count in the ith row and jth column of the table, computed as

$$e_{ij} = \frac{(i\text{th row total}) \times (j\text{th column total})}{\text{Grand total}}, \quad i = 1, \dots, r;\ j = 1, \dots, c.$$

Chi-Squared test (contd.)

Decision rule

Reject the null hypothesis if $\chi^2 \ge \chi^2_{(r-1)(c-1),\,\alpha}$,

where $\alpha$ is the significance level.
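
The test statistic and decision rule can be sketched in a few lines of Python. This is only an illustration and not part of the original slides: the function name `pearson_chi2`, the 2 x 3 table of counts, and the hard-coded 5% critical value 5.991 for 2 degrees of freedom (from standard chi-squared tables) are assumptions made for the example.

```python
def pearson_chi2(table):
    """Return (statistic, degrees of freedom) for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o_ij in enumerate(row):
            e_ij = row_totals[i] * col_totals[j] / n  # expected count under H0
            stat += (o_ij - e_ij) ** 2 / e_ij
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


# Hypothetical 2 x 3 table (illustrative counts only).
observed = [[20, 30, 50],
            [30, 30, 40]]
stat, df = pearson_chi2(observed)

CRITICAL_5PCT_DF2 = 5.991  # tabulated chi-squared critical value, df = 2, alpha = 0.05
reject_h0 = stat >= CRITICAL_5PCT_DF2
print(f"X^2 = {stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

For these made-up counts the statistic is about 3.11, which is below 5.991, so H0 would not be rejected at the 5% level.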


Likelihood ratio test (Λ)

The test determines the parameter value that maximizes the likelihood function under the assumption that H0 is true.
It also determines the parameter value that maximizes the likelihood under the more general condition that H0 may or may not be true (that is, over the entire parameter space).

$$\Lambda = \frac{\text{maximum likelihood (ML) under } H_0}{\text{ML when no parameters are restricted}}$$

$\Lambda \le 1$, since the unrestricted maximum can never be smaller than the maximum under H0.
A value of $\Lambda$ far below 1 indicates strong evidence against H0.


Likelihood ratio test (contd.)

Likelihood ratio statistic

$$G^2 = -2\log(\Lambda)$$

$G^2$ is non-negative, and "small" values of $\Lambda$ yield "large" values of $-2\log(\Lambda)$.
For a two-way contingency table,

$$G^2 = 2\sum_{i=1}^{I}\sum_{j=1}^{J} o_{ij}\log\left(\frac{o_{ij}}{e_{ij}}\right) = 2\sum_{i=1}^{I}\sum_{j=1}^{J} n_{ij}\log\left(\frac{n_{ij}}{\hat{\mu}_{ij}}\right) \sim \chi^2_{(I-1)(J-1)}$$

where $n_{ij} = o_{ij}$ is the observed count and $\hat{\mu}_{ij} = e_{ij}$ is the expected frequency under H0 (proof later).

Aspirin and Heart Attacks example

Example: Aspirin and heart attacks in doctors

Observed counts ($o_{ij}$):

Group      Myocardial infarction      Total
           Yes        No
Placebo    189        10845           11034
Aspirin    104        10933           11037
Total      293        21778           22071

Prospective study (clinical trial).
The number on each treatment is fixed by design.
Check the association between aspirin usage and heart attacks in doctors using the Pearson chi-squared test and the likelihood-ratio test.


Complete the following table with the expected frequencies under the null hypothesis.

Expected counts ($e_{ij}$):

Group      Myocardial infarction      Total
           Yes        No
Placebo                               11034
Aspirin                               11037
Total      293        21778           22071
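
As a check on the hand calculation, the expected frequencies can be computed from the margins of the observed table. The following is a minimal sketch, not the lecturer's own solution; the variable names are chosen for illustration only.

```python
# Observed counts from the aspirin table:
# rows = (Placebo, Aspirin), columns = (MI yes, MI no).
observed = [[189, 10845],
            [104, 10933]]

row_totals = [sum(row) for row in observed]        # 11034, 11037
col_totals = [sum(col) for col in zip(*observed)]  # 293, 21778
n = sum(row_totals)                                # 22071

# e_ij = (row total i) * (column total j) / (grand total)
expected = [[r * c / n for c in col_totals] for r in row_totals]

for label, row in zip(("Placebo", "Aspirin"), expected):
    print(label, [round(e, 2) for e in row])
```

Each row of expected counts adds up to its row total, and each column to its column total, which is a quick way to verify a completed table.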


Answer for χ2 test statistic value
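
One way to verify the value is a short Python sketch applied to the aspirin table. It is an illustrative check under stated assumptions rather than the worked answer from the lecture: the p-value line relies on the fact that, for 1 degree of freedom only, the chi-squared upper tail probability equals erfc(sqrt(x/2)).

```python
import math

# Aspirin table: rows = (Placebo, Aspirin), columns = (MI yes, MI no).
observed = [[189, 10845],
            [104, 10933]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

x2 = 0.0
for i, row in enumerate(observed):
    for j, o_ij in enumerate(row):
        e_ij = row_totals[i] * col_totals[j] / n  # expected count under H0
        x2 += (o_ij - e_ij) ** 2 / e_ij

df = (len(observed) - 1) * (len(observed[0]) - 1)

# Upper tail of the chi-squared distribution; this closed form is valid for df = 1 only.
p_value = math.erfc(math.sqrt(x2 / 2))

print(f"X^2 = {x2:.2f}, df = {df}, p-value = {p_value:.1e}")
```

The statistic comes out at roughly 25.0, far above the tabulated 5% critical value of 3.841 for 1 degree of freedom.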


Answer for $G^2$ test statistic value
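
The likelihood-ratio statistic for the same aspirin table can be checked in the same way. Again this is a sketch for verification, not the lecturer's solution; it uses the natural logarithm, as the formula for $G^2$ requires.

```python
import math

# Aspirin table: rows = (Placebo, Aspirin), columns = (MI yes, MI no).
observed = [[189, 10845],
            [104, 10933]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

g2 = 0.0
for i, row in enumerate(observed):
    for j, o_ij in enumerate(row):
        e_ij = row_totals[i] * col_totals[j] / n  # expected count under H0
        g2 += 2 * o_ij * math.log(o_ij / e_ij)    # natural log

print(f"G^2 = {g2:.2f}")
```

The result is roughly 25.4, close to the Pearson statistic, as expected for a sample of this size.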


Comparing test statistics

Complete the calculated values in the following table.

Statistic    Calculated value
$G^2$
$X^2$

Hypotheses tested:
H0 : There is no association
H1 : There is an association

Both statistics are approximately $\chi^2_1$ distributed under the null hypothesis.


Comparing test statistics (contd.)

Note:

The $G^2$ and $X^2$ statistics depend on the estimated expected cell counts (predicted probabilities), not on how we measure the treatment differences.


Limitations of the $\chi^2$ test

The $X^2$ test only indicates the degree of evidence for an association; it does not describe the nature of the association.
$X^2$ and $G^2$ require large samples (special techniques are required for small samples).
$X^2$ and $G^2$ do not depend on the order in which the rows and columns are listed, so they ignore any ordering of the categories.


Testing independence for ordinal data

The chi-squared tests of independence using $X^2$ and $G^2$ treat both classifications as nominal.
When the rows and/or columns are ordinal, test statistics that utilize the ordinality are more appropriate (more powerful).
In that case, it is common to investigate a "trend" association:
as the level of X increases, responses on Y tend to increase towards higher levels, or to decrease towards lower levels.


Linear trend alternative to independence

Assign scores to the categories.
Measure the degree of linear trend or correlation.
Let $U_1 \le U_2 \le \cdots \le U_I$ denote the scores for the rows.
Let $V_1 \le V_2 \le \cdots \le V_J$ denote the scores for the columns.
The scores reflect the distances between categories.


Linear trend alternative to independence (contd.)

Hypothesis
H0 : The two variables are independent
H1 : The two variables are correlated
Test statistic

$$M^2 = (n-1)\,r^2 \sim \chi^2_1$$

for large n, where r is Pearson's product-moment correlation coefficient:

$$r = \frac{\sum_{i,j} U_i V_j n_{ij} - \dfrac{\left(\sum_i U_i n_{i+}\right)\left(\sum_j V_j n_{+j}\right)}{n}}{\sqrt{\left[\sum_i U_i^2 n_{i+} - \dfrac{\left(\sum_i U_i n_{i+}\right)^2}{n}\right]\left[\sum_j V_j^2 n_{+j} - \dfrac{\left(\sum_j V_j n_{+j}\right)^2}{n}\right]}}$$

Independence implies r = 0; r always lies between −1 and +1.
The test treats the variables symmetrically.

Advantages over $X^2$ and $G^2$

Power advantage: if the association truly has a positive or negative trend, $M^2$ is more likely to detect it as significant.
If several cell counts are small, the chi-squared approximation is worse for $X^2$ and $G^2$ than for $M^2$.


Choices of scores

If the variable is ordinal (e.g. None, Mild, Severe), investigators often use equally spaced scores such as 1, 2 and 3.
If the variable is a crude grouping of an underlying continuous variable, e.g. "Age" with levels [20, 30), [30, 40), [40, 50), ..., investigators often use the midpoints of the intervals.
For some variables like "Dose", investigators mostly use the actual numerical values.


Example: Alcohol consumption and infant malformation

Prospective study of maternal drinking and infant malformation.

Alcohol        Malformation           Total     % of
consumption    Absent     Present               malformation
0              17066      48          17114     0.28
<1             14464      38          14502     0.26
1–2            788        5           793       0.63
3–5            126        1           127       0.79
≥6             37         1           38        2.63

Check whether there is a relationship between alcohol consumption and infant malformation at the 5% level of significance using an appropriate test. Compare your results with the $X^2$ and $G^2$ tests.
Note: You may use the scores 0, 0.5, 1.5, 4, 7 for alcohol consumption.

Example (contd.)

The percentage of malformation cases tends to increase with the level of alcohol consumption, although not at every step (it dips slightly from 0.28% to 0.26% between the first two levels).
This suggests a tendency for malformation to be more likely at higher levels of alcohol consumption.
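
The comparison asked for in this example can be sketched in Python as follows. This is an illustrative calculation, not the lecturer's worked solution: the row scores are the ones suggested in the example, while the column scores 0 (absent) and 1 (present) are an assumption (for a two-level variable any two distinct scores give the same $r^2$).

```python
import math

# Rows: alcohol consumption levels; columns: (malformation absent, present).
counts = [[17066, 48],
          [14464, 38],
          [788, 5],
          [126, 1],
          [37, 1]]
row_scores = [0, 0.5, 1.5, 4, 7]  # scores suggested in the example
col_scores = [0, 1]               # assumed scores: 0 = absent, 1 = present

row_tot = [sum(r) for r in counts]
col_tot = [sum(c) for c in zip(*counts)]
n = sum(row_tot)

# Pearson correlation between row and column scores, weighted by cell counts.
sum_u = sum(u * t for u, t in zip(row_scores, row_tot))
sum_v = sum(v * t for v, t in zip(col_scores, col_tot))
sum_uu = sum(u * u * t for u, t in zip(row_scores, row_tot))
sum_vv = sum(v * v * t for v, t in zip(col_scores, col_tot))
sum_uv = sum(u * v * counts[i][j]
             for i, u in enumerate(row_scores)
             for j, v in enumerate(col_scores))
r = (sum_uv - sum_u * sum_v / n) / math.sqrt(
    (sum_uu - sum_u ** 2 / n) * (sum_vv - sum_v ** 2 / n))
m2 = (n - 1) * r ** 2  # refer to chi-squared with df = 1

# X^2 and G^2 for comparison, df = (5 - 1)(2 - 1) = 4.
x2 = g2 = 0.0
for i, row in enumerate(counts):
    for j, o in enumerate(row):
        e = row_tot[i] * col_tot[j] / n  # expected count under independence
        x2 += (o - e) ** 2 / e
        g2 += 2 * o * math.log(o / e)

print(f"r = {r:.4f}, M^2 = {m2:.2f}, X^2 = {x2:.2f}, G^2 = {g2:.2f}")
```

With the tabulated 5% critical values 3.841 (df = 1) and 9.488 (df = 4), $M^2$ (about 6.6) and $X^2$ (about 12.1) lead to rejection of independence, while $G^2$ (about 6.2) does not. The disagreement between $X^2$ and $G^2$ reflects the very small counts in the last rows, where the chi-squared approximation for these two statistics is unreliable.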


Association for dependent samples


Example:

The following table shows polling results on British citizens' opinions of the Prime Minister's performance in office at two time points. In a random sample of 1600 voting-age British citizens, 944 indicated approval of the Prime Minister's performance in office. Six months later, 880 of these same 1600 people indicated approval.

First          Second survey
survey         Approve     Disapprove    Total
Approve        794         150           944
Disapprove     86          570           656
Total          880         720           1600

The diagonal elements count the subjects who had the same opinion at both surveys.

Question of interest:
Is the probability of approval of the Prime Minister's performance greater at the first survey than at the second?
In this example, the responses in the two samples are statistically dependent, because the same subjects were surveyed both times.
The pairs of dependent observations are called matched pairs.


Let πij be the probability that a subject makes response i at Survey 1 and response j at Survey 2.

First          Second survey
survey         Approve     Disapprove    Total
Approve        π11         π12           π1+
Disapprove     π21         π22           π2+
Total          π+1         π+2           1

The probabilities of approval at the two surveys are π1+ and π+1.
When these are identical, the probabilities of disapproval are identical as well. This condition is called marginal homogeneity.


Marginal homogeneity

π1+ = π+1
π1+ − π+1 = 0
(π11 + π12 ) − (π11 + π21 ) = 0
π12 = π21

Marginal homogeneity implies that π12 = π21 .


McNemar test

This is a test for marginal homogeneity (homogeneity for dependent samples).
Hypothesis:

H0 : π1+ = π+1, or equivalently H0 : π12 = π21,

versus a one- or two-sided alternative.

Test statistic:

$$Z = \frac{n_{21}-n_{12}}{\sqrt{n_{21}+n_{12}}} \sim N(0, 1)$$

under H0, provided that the number of discordant pairs satisfies $n_{21} + n_{12} > 10$.


Is the probability of approval of the Prime Minister's performance greater at the first survey than at the second?
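
A possible way to answer this with the McNemar statistic is sketched below in Python, using the discordant counts from the approval table. It is an illustration rather than the lecturer's worked answer: the one-sided alternative π1+ > π+1 is read off the question, and the standard normal lower tail is computed with `math.erfc`.

```python
import math

# Discordant counts from the approval table.
n12 = 150  # approve at first survey, disapprove at second
n21 = 86   # disapprove at first survey, approve at second

z = (n21 - n12) / math.sqrt(n21 + n12)

# One-sided alternative pi_{1+} > pi_{+1} (equivalently pi_12 > pi_21):
# a large negative z supports it, so the p-value is the lower tail P(Z <= z).
p_value = 0.5 * math.erfc(-z / math.sqrt(2))

print(f"z = {z:.2f}, one-sided p-value = {p_value:.2e}")
```

The statistic is about −4.17 with a very small p-value, which gives strong evidence that approval was higher at the first survey than at the second.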


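A (1 − α)100% CI for δ = π+1 − π1+

For the population we have:

δ = π+1 − π1+

For the sample data we have:

d = p+1 − p1+

$$\mathrm{Var}(d) = \frac{\pi_{1+}(1-\pi_{1+}) + \pi_{+1}(1-\pi_{+1}) - 2\,(\pi_{11}\pi_{22} - \pi_{12}\pi_{21})}{n}$$

For large samples, a (1 − α)100% CI for δ = π+1 − π1+ is given by:

$$d \pm z_{\alpha/2}\sqrt{\widehat{\mathrm{Var}}(d)} \qquad (1)$$

where $\widehat{\mathrm{Var}}(d)$ replaces each π in Var(d) by the corresponding sample proportion p.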


For the Prime Minister's approval rating example, construct a 95% CI for π+1 − π1+.
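
The interval can be computed with a short Python sketch such as the one below. It is an illustrative calculation, not the lecturer's worked answer: it plugs the sample proportions into the variance formula and assumes the tabulated standard normal quantile 1.96 for a 95% interval.

```python
import math

# Cell counts from the approval table (first survey in rows, second in columns).
n11, n12, n21, n22 = 794, 150, 86, 570
n = n11 + n12 + n21 + n22

p11, p12, p21, p22 = (c / n for c in (n11, n12, n21, n22))
p1_plus = p11 + p12   # sample proportion approving at the first survey
p_plus1 = p11 + p21   # sample proportion approving at the second survey
d = p_plus1 - p1_plus

# Estimated variance of d: sample proportions substituted for the pi's.
var_d = (p1_plus * (1 - p1_plus) + p_plus1 * (1 - p_plus1)
         - 2 * (p11 * p22 - p12 * p21)) / n
se_d = math.sqrt(var_d)

Z_975 = 1.96  # tabulated standard normal quantile for a 95% interval
lower, upper = d - Z_975 * se_d, d + Z_975 * se_d
print(f"d = {d:.3f}, SE = {se_d:.4f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

The interval is roughly (−0.059, −0.021). It lies entirely below zero, which is consistent with a drop in approval between the two surveys.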
