
Categorical Data Analysis

Sahadeb Sarkar
IIM Calcutta

• Slides Adapted from Prof Ayanendranath Basu’s Class-notes


• R Programs and Data Sets in Textbook (Tang, He & Tu):
http://accda.sph.tulane.edu/r.html
• Readings: Chapters 1-6, Text

1
Terminology
Discrete data: relates to a countable number of outcomes;
described by discrete distributions
• Categorical data: discrete data with finitely many possible
values on a nominal/ordinal scale (e.g., the state a person
lives in, the political party one might vote for, the blood
type of a patient, grades in a course); Bernoulli and
Multinomial distributions. Central tendency is given by the
mode
• Count data (non-negative integer valued): records the
frequency of an event ('success') and may not have an upper
bound (e.g., Poisson, Negative Binomial distributions). It
arises out of counting, not ranking.
• A categorical variable has a finite range, while a count
variable has a possibly infinite range (text, p. 3)
3
Data Types
• Dichotomous data: can take only two values,
such as "Yes" and "No"
• Nonordered polytomous data: e.g., five different
detergents
• Ordered polytomous data: grades A, B, C, D;
"old", "middle-aged", "young" employees

• Integer valued: nonnegative counts


4
Derivation Tools in CDA, Text p.18
Delta Method:
If $\hat{\theta}_n \xrightarrow{d} N(\theta, \frac{1}{n}\Sigma)$ and $g(\theta)$ is an m×1 differentiable function
from $R^k$ to $R^m$, then
$g(\hat{\theta}_n) \xrightarrow{d} N\left(g(\theta),\ \frac{1}{n}D^T\Sigma D\right)$
where $D_{k\times m} = \frac{\partial}{\partial\theta}g(\theta)$; dim(θ) = k, dim(g(θ)) = m

Example (Exc): $Z_i \sim \mathrm{Bin}(1,p)$ iid, $\hat{p} = \bar{Z}_n$, $g(p) = \ln\frac{p}{1-p}$,
$\hat{p} = \bar{Z}_n \xrightarrow{d} N\left(p,\ \frac{p(1-p)}{n}\right)$, then $\ln\frac{\hat{p}}{1-\hat{p}} \xrightarrow{d}$ ??
5
Derivation Tools in CDA, Text p.18
Delta Method:
If $\hat{\theta}_n \xrightarrow{d} N(\theta, \frac{1}{n}\Sigma)$ and $g(\theta)$ is an m×1 differentiable function from $R^k$ to $R^m$,
then
$g(\hat{\theta}_n) \xrightarrow{d} N\left(g(\theta),\ \frac{1}{n}D^T\Sigma D\right)$
where $D_{k\times m} = \frac{\partial}{\partial\theta}g(\theta)$

Example (Exc): $Z_i \sim \mathrm{Bin}(1,p)$ iid, $\hat{p} = \bar{Z}_n$, $g(p) = \ln\frac{p}{1-p}$,
$\hat{p} = \bar{Z}_n \xrightarrow{d} N\left(p,\ \frac{p(1-p)}{n}\right)$; k = 1, m = 1;
then $\ln\frac{\hat{p}}{1-\hat{p}} \xrightarrow{d} N\left(\ln\frac{p}{1-p},\ \frac{p(1-p)}{n}\left(\frac{1}{p(1-p)}\right)^2\right) = N\left(\ln\frac{p}{1-p},\ \frac{1}{np(1-p)}\right)$
6
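A quick way to sanity-check this delta-method result is by simulation: draw many Bernoulli samples, compute the sample log-odds, and compare its empirical variance with 1/(np(1−p)). Below is a minimal Python sketch (the textbook's programs are in R; the sample size, number of replications, and seed here are arbitrary choices):

```python
import math
import random

random.seed(42)
n, p, reps = 400, 0.3, 5000

# Empirical distribution of the sample log-odds ln(p_hat/(1-p_hat))
logits = []
for _ in range(reps):
    x = sum(random.random() < p for _ in range(n))  # one Bin(n, p) draw
    p_hat = x / n
    logits.append(math.log(p_hat / (1 - p_hat)))

mean_logit = sum(logits) / reps
emp_var = sum((v - mean_logit) ** 2 for v in logits) / reps

# Delta-method approximation: Var(logit(p_hat)) ~ 1/(n p (1-p))
delta_var = 1 / (n * p * (1 - p))
```

For n = 400 the empirical variance and the delta-method value should agree to within a few percent.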
Derivation Tools in CDA, Text p.18
Delta Method:
If $\hat{\theta}_n \xrightarrow{d} N(\theta, \frac{1}{n}\Sigma)$ and $g(\theta)$ is an m×1 differentiable function
from $R^k$ to $R^m$, then
$g(\hat{\theta}_n) \xrightarrow{d} N\left(g(\theta),\ \frac{1}{n}D^T\Sigma D\right)$
where $D_{k\times m} = \frac{\partial}{\partial\theta}g(\theta)$
Example 1.4 (p.19): θ scalar, $g(\theta) = \exp(\theta)$,
$\hat{\theta}_n \xrightarrow{d} N\left(\theta,\ \frac{\sigma^2}{n}\right)$, then $\exp(\hat{\theta}_n) \xrightarrow{d}$ ??
7
Derivation Tools in CDA, Text p.18
Delta Method:
If $\hat{\theta}_n \xrightarrow{d} N(\theta, \frac{1}{n}\Sigma)$ and $g(\theta)$ is an m×1 differentiable function
from $R^k$ to $R^m$, then
$g(\hat{\theta}_n) \xrightarrow{d} N\left(g(\theta),\ \frac{1}{n}D^T\Sigma D\right)$
where $D_{k\times m} = \frac{\partial}{\partial\theta}g(\theta)$
Example 1.4 (p.19): θ scalar, $g(\theta) = \exp(\theta)$, $\hat{\theta}_n \xrightarrow{d} N\left(\theta,\ \frac{\sigma^2}{n}\right)$;
k = 1, m = 1; then $\exp(\hat{\theta}_n) \xrightarrow{d} N\left(\exp(\theta),\ \frac{\sigma^2}{n}\exp(2\theta)\right)$
8
Derivation Tools in CDA, Text p.15-19
Set Up: Suppose we have an i.i.d. sample X1, X2, …, Xn, n
large, with E(Xi) = µ, V(Xi) = σ², and E(Xi − µ)⁴ = µ₄ (fourth
central moment notation); the underlying probability distribution
may be complex and is usually unknown. Let $\bar{X}_n$ = sample mean;
$s_n^2$ = sample variance with divisor n.
Convergence in Probability or (Weak) Law of Large
Numbers: P(|$\bar{X}_n$ − µ| > ε) converges to 0 as n → ∞,
for every ε > 0
(written as: $\bar{X}_n \xrightarrow{p} \mu$).
[$\bar{X}_n$ is said to be a consistent estimate of µ.]
9
Derivation Tools in CDA, Text p.15-19
CLT and Convergence in Distribution (or Law) (p.16):
Let Fn(x) denote the CDF of $\sqrt{n}\left[\frac{\bar{X}_n - \mu}{\sigma}\right]$ and F(x) = Φ(x) denote
the CDF of N(0,1). Then for every continuity point x of F(·) =
Φ(·), Fn(x) converges to F(x) as n → ∞ [written as:
$\sqrt{n}\left[\frac{\bar{X}_n - \mu}{\sigma}\right] \xrightarrow{d} N(0, 1)$, as n → ∞]
Application: Asymptotic variance of $\bar{X}_n$ = $\frac{\sigma^2}{n}$; estimated
asymptotic variance of $\bar{X}_n$ = $\frac{s_n^2}{n}$.
[What about the asymptotic distribution of $s_n^2$ as an estimator of $\sigma^2$? For that we
need Slutsky's Theorem.]
10
Derivation Tools in CDA, Text p.15-19
Convergence in Probability: If, for every ε > 0, P(|Xn − X| > ε)
converges to 0 as n → ∞, then $X_n \xrightarrow{p} X$.
Convergence in Distribution: Let Fn(x) denote the CDF of
Xn and F(x) denote the CDF of X. If, for every
continuity point x of F(·), Fn(x) converges to F(x) as n → ∞,
then $X_n \xrightarrow{d} X$.

Result: Convergence in probability implies convergence in
distribution. Convergence in distribution implies
convergence in probability when the limiting random
variable X is a constant.
11
Derivation Tools in CDA, Text p.15-19
Slutsky's Theorem (p.18):
Suppose $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} c$, a constant. Then
1. $X_n + Y_n \xrightarrow{d} X + c$
2. $Y_n X_n \xrightarrow{d} cX$
3. If c ≠ 0, $X_n/Y_n \xrightarrow{d} X/c$

Example 1.5 (Exc): $s_n^2 = \frac{\sum_{i=1}^{n}(Z_i - \bar{Z}_n)^2}{n}$;
show that $\sqrt{n}\left(s_n^2 - \sigma^2\right) \xrightarrow{d} N(0, \mathrm{Var}((Z_i - \mu)^2))$
Estimated asymptotic variance of $s_n^2$ = ??
12
Derivation Tools in CDA, Text p.18
Slutsky's Theorem:
Suppose $X_n \xrightarrow{d} X$ and $Y_n \xrightarrow{d} c$, a constant. Then
1. $X_n + Y_n \xrightarrow{d} X + c$
2. $Y_n X_n \xrightarrow{d} cX$
3. If c ≠ 0, $X_n/Y_n \xrightarrow{d} X/c$

Example 1.5 (Exc):
$\sqrt{n}\left(s_n^2 - \sigma^2\right) \xrightarrow{d} N(0, \mathrm{Var}((Z_i - \mu)^2))$
⟹ $s_n^2 = \frac{\sum_{i=1}^{n}(Z_i - \bar{Z}_n)^2}{n} \approx N\left(\sigma^2,\ \frac{\mathrm{Var}((Z_i - \mu)^2)}{n}\right)$
Estimated asymptotic variance of $s_n^2$ = $\frac{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z}_n)^4 - (s_n^2)^2}{n}$
(How??)
13
Derivation Tools in CDA, Text p.18
$s_n^2 - \sigma^2 = \frac{\sum_{i=1}^{n}(Z_i - \bar{Z}_n)^2}{n} - \sigma^2 = \frac{\sum_{i=1}^{n}(Z_i - \mu)^2}{n} - \sigma^2 - (\bar{Z}_n - \mu)^2.$
Let $Z_i^* = (Z_i - \mu)^2$; $E(Z_i^*) = \sigma^2$, $V(Z_i^*) = E(Z_i - \mu)^4 - \sigma^4 = \mu_4 - \sigma^4$.
$\sqrt{n}\left(s_n^2 - \sigma^2\right) = \sqrt{n}\left[\frac{\sum_{i=1}^{n}(Z_i - \mu)^2}{n} - \sigma^2\right] - \sqrt{n}(\bar{Z}_n - \mu)\cdot(\bar{Z}_n - \mu)$
= Term 1 − Term 2 × Term 3
Term 1 $\xrightarrow{d}$ N(0, $V((Z_i - \mu)^2)$) by the CLT for iid $Z_i^*$;
Term 2 $\xrightarrow{d}$ N(0, $\sigma^2$) by the CLT; Term 3 $\xrightarrow{p}$ 0 by the WLLN.
By Slutsky's Theorem, $\sqrt{n}\left(s_n^2 - \sigma^2\right) \xrightarrow{d} N\left(0,\ V((Z_i - \mu)^2) = \mu_4 - \sigma^4\right)$.
Estimated asymptotic variance of $s_n^2$ = $\frac{\hat{\mu}_4 - \hat{\sigma}^4}{n} = \frac{\frac{1}{n}\sum_{i=1}^{n}(Z_i - \bar{Z}_n)^4 - (s_n^2)^2}{n}$
14
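The conclusion above can also be checked numerically. For Exp(1) data, σ² = 1 and µ₄ = E(X − 1)⁴ = 9, so the asymptotic variance of s_n² is (µ₄ − σ⁴)/n = 8/n. A minimal simulation sketch in Python (the sample size, replication count, and seed are arbitrary choices):

```python
import random

random.seed(1)
n, reps = 400, 4000

# For Exp(1): sigma^2 = 1 and mu_4 = E(X-1)^4 = 9, so (mu_4 - sigma^4)/n = 8/n
s2_values = []
for _ in range(reps):
    z = [random.expovariate(1.0) for _ in range(n)]
    zbar = sum(z) / n
    s2_values.append(sum((zi - zbar) ** 2 for zi in z) / n)  # divisor-n variance

m = sum(s2_values) / reps
var_s2 = sum((v - m) ** 2 for v in s2_values) / reps  # empirical Var(s_n^2)
theory = 8 / n                                         # (mu_4 - sigma^4)/n
```

The empirical variance of s_n² across replications should be close to 8/n, and its mean close to σ² = 1.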
Inference for One-way Frequency
Table

• Binary case (Sec 2.1.1, Text)


• Inference for Multinomial Variable (Sec 2.1.2)
• Inference for Count Variable (Sec 2.1.3)

R Programs and Data Sets in Textbook (Tang, He & Tu):


http://accda.sph.tulane.edu/r.html

15
Binomial Distribution
(leading to One-Way Frequency Table)
Suppose Y is a random variable with 2 possible outcome
categories c1, c2 with probabilities π1, π2 = (1 − π1).
Suppose there are n observations on Y; we can summarize
the responses through the vector of observed frequencies
(random variables) of data on Y: (X1, X2 = n − X1).

Then (X1, X2 = n − X1) is said to have a Binomial distribution
with parameters n and (π1, π2 = 1 − π1), or simply X1 is said to
have a Binomial distribution with parameters n and π1.
$P(X_1 = x_1) = \frac{n!}{x_1!(n-x_1)!}\pi_1^{x_1}(1-\pi_1)^{n-x_1}, \quad x_1 = 0, 1, \dots, n$
Then, E(X1) = nπ1, V(X1) = nπ1(1 − π1) < E(X1)
16
Example 1.1, p. 6, Text

What is Metabolic Syndrome?

Exc: Test if the prevalence of Metabolic Syndrome is 40% in
this study population
H0: π = π0 vs Ha: ??
$Z = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}}$ = ??, P-value = ??
Construct a 95% Confidence Interval for the prevalence in this
population: $\hat{\pi} \pm Z_{\alpha/2}\sqrt{\hat{\pi}(1-\hat{\pi})/n}$ = ??
17
Metabolic syndrome
(https://en.wikipedia.org/wiki/Metabolic_syndrome)

Metabolic syndrome, sometimes known by other
names, is a clustering of at least three of the
following five medical conditions (yielding a total of 16
possible combinations that give the syndrome):
 Abdominal (central) obesity
 High blood pressure
 High blood sugar
 High serum triglycerides
 Low high-density lipoprotein (HDL) levels
18
Example 1.1 (Binary Case), p. 37, Text
• Test if the prevalence of Metabolic Syndrome is 40% in this study
population
$Z = \frac{\hat{\pi} - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} = \frac{\frac{48}{93} - 0.4}{\sqrt{0.4\times 0.6/93}} = 2.286$;
P-value = 2Φ(−2.286) = 0.0223

• Construct a 95% Confidence Interval for the prevalence in this
population ("Pivot quantity" = $\frac{\hat{\pi} - \pi}{\sqrt{\hat{\pi}(1-\hat{\pi})/n}} \sim N(0,1)$)

$\hat{\pi} \pm Z_{\alpha/2}\sqrt{\hat{\pi}(1-\hat{\pi})/n}$
$= \frac{48}{93} \pm 1.96\times\sqrt{\frac{48}{93}\left(1-\frac{48}{93}\right)/93}$
= [0.4146, 0.6177]
19
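The Z statistic, two-sided P-value, and Wald confidence interval above can be reproduced with a few lines of Python; the normal tail probability is computed via math.erfc, so no external packages are needed:

```python
import math

x, n, pi0 = 48, 93, 0.40
pi_hat = x / n

# One-sample Z test for a proportion: Z = (pi_hat - pi0)/sqrt(pi0(1-pi0)/n)
z = (pi_hat - pi0) / math.sqrt(pi0 * (1 - pi0) / n)

# Two-sided P-value, using Phi(-|z|) = 0.5*erfc(|z|/sqrt(2))
p_value = math.erfc(abs(z) / math.sqrt(2))

# 95% Wald confidence interval: pi_hat +/- 1.96*sqrt(pi_hat(1-pi_hat)/n)
half = 1.96 * math.sqrt(pi_hat * (1 - pi_hat) / n)
ci = (pi_hat - half, pi_hat + half)
```

This reproduces the slide's Z = 2.286, P-value ≈ 0.0223, and CI = [0.4146, 0.6177].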
Negative Binomial Distribution (p. 41)
• A sequence of independent Bernoulli trials, having two potential
outcomes "success" and "failure". In each trial the probability of
success is p and of failure is (1 − p). Observe this sequence until
a predefined number r of failures has occurred. Then X =
number of 'successes' observed will have the negative
binomial distribution:

• $P(X = k) = \frac{\Gamma(r+k)}{\Gamma(r)\,k!}\,p^k(1-p)^r$, where $\Gamma(n) = \int_0^\infty x^{n-1}e^{-x}\,dx = (n-1)!$;
$\binom{k+r-1}{k} = (-1)^k\binom{-r}{k}$
• Why called Negative Binomial?
• $(1-p)^{-r} = \sum_{k=0}^{\infty}\binom{k+r-1}{k}p^k = \sum_{k=0}^{\infty}\binom{-r}{k}(-p)^k$
20
Negative Binomial Distribution (p. 41)
• A sequence of independent Bernoulli trials, having two
potential outcomes "success" and "failure". In each trial the
probability of success is p and of failure is (1 − p). Observe this
sequence until a predefined number r of failures has occurred.
Then X = number of 'successes' observed will have
the negative binomial distribution:

• µ = E(X) = rp/(1 − p), V(X) = rp/(1 − p)² > E(X).

• Sometimes one puts $\alpha = \frac{1}{r}$ and $\mu = rp/(1-p)$ for
reparameterization
21
Negative Binomial Distribution (p. 41)

• $P(X = k) = \frac{\Gamma(r+k)}{\Gamma(r)\,k!}\,p^k(1-p)^r$ ……… (1a)
• $E(X) = r\left(\frac{p}{1-p}\right)$, $V(X) = r\frac{p}{(1-p)^2} > E(X)$ …….. (1b)
• Extension through reparameterization:
$\alpha = \frac{1}{r}\ (>0)$, $\mu = r\left(\frac{p}{1-p}\right)$ in (1)
• Then, $P(X = k) = \frac{\Gamma\left(\frac{1}{\alpha}+k\right)}{\Gamma\left(\frac{1}{\alpha}\right)k!}\left(\frac{\mu}{\frac{1}{\alpha}+\mu}\right)^k\left(\frac{1}{1+\alpha\mu}\right)^{\frac{1}{\alpha}}$ ……… (2a)

• $E(X) = \mu$; $V(X) = \mu + \alpha\mu^2$ ……………………(2b)
22
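The moment identities (1b) and (2b) can be verified numerically by summing the pmf in (1a); for integer r, Γ(r+k)/(Γ(r)k!) equals C(k+r−1, k). A small Python sketch (r and p are arbitrary test values):

```python
from math import comb

r, p = 3, 0.4  # arbitrary test values; X = successes before the r-th failure

# pmf (1a): P(X=k) = C(k+r-1, k) p^k (1-p)^r, truncated far into the tail
pmf = [comb(k + r - 1, k) * p**k * (1 - p) ** r for k in range(200)]

total = sum(pmf)
mean = sum(k * q for k, q in enumerate(pmf))
var = sum(k * k * q for k, q in enumerate(pmf)) - mean**2

# (1b): E(X) = rp/(1-p), V(X) = rp/(1-p)^2
mean_theory = r * p / (1 - p)
var_theory = r * p / (1 - p) ** 2

# (2b): with alpha = 1/r and mu = rp/(1-p), V(X) = mu + alpha*mu^2
alpha, mu = 1 / r, mean_theory
var_reparam = mu + alpha * mu**2
```

Both parameterizations should give the same variance, confirming that (2b) is just (1b) rewritten.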
Hypergeometric Distribution
(used in calculating P-value in Exact Tests)

• Randomly sample n elements from a finite (dichotomous)
population of size N, without replacement, having K "success"-type
and (N − K) "failure"-type elements (e.g., Pass/Fail or Employed/
Unemployed).
• The probability of a success changes on each draw, as each draw
decreases the population.
• X = number of successes in the sample. Then X has the
hypergeometric distribution:
$P(X = x) = \frac{\binom{K}{x}\binom{N-K}{n-x}}{\binom{N}{n}}$
• E(X) = n(K/N), V(X) = {n(K/N)(1 − K/N)}×[(N − n)/(N − 1)]

• Thus, E(X/n) = (K/N) ⟹ Estimate K/N with X/n ⟹ Estimate N with
K(n/X)
23
Hypergeometric Distribution
(used to calculate ‘Population’ Total)

• Capture Recapture Sampling to estimate the total number of individuals in a


population (https://online.stat.psu.edu/stat506/lesson/12/12.1)
• Example 1: To estimate fish or animal population.
• Example 2: To estimate the total number of homeless individuals.
K - initial sample size, captured and marked
n - second sample size, recaptured independently without replacement
X - number of marked individuals in the recaptured sample
N - total population size
• E(X)=n(K/N)
• Thus, E(X/n)=(K/N)  Estimate K/N with X/n  Estimate N with K(n/X)

Assignment: Find nice applications in business management

24
Hypergeometric Distribution
(used in acceptance sampling)

• It has applications in acceptance sampling. Items are


produced in finite batches and a decision is made to accept or
reject the batch on the basis of a random sample selected
from the batch and the observed number of ‘defective’ items.
• Suppose an acceptance sampling plan is to choose a random
sample of size n from a batch of 20 computer chips. The batch
will be accepted if number of defective chips X in the sample
equals zero.
• Assume (unknown to the buyer) K = 5 defective items;
– for n=5, then the acceptance probability of the batch is P(X=0) =
0.1937 [= combin(15,5)/combin(20,5)].
– If n=8, then P(X=0) = 0.0511 [= combin(15,8)/combin(20,8)].
– If n=10, then P(X=0) = 0.0163 [= combin(15,10)/combin(20,10)].

25
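The three acceptance probabilities above can be checked directly with math.comb, mirroring the Excel combin() calls on the slide:

```python
from math import comb

N, K = 20, 5  # batch size and (unknown to the buyer) number of defectives

def accept_prob(n):
    """P(X = 0): a random sample of n chips contains no defective."""
    return comb(N - K, n) / comb(N, n)

p5, p8, p10 = accept_prob(5), accept_prob(8), accept_prob(10)
```

Increasing the sample size n sharply reduces the chance of wrongly accepting this defective batch, which is the point of the slide's comparison.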
Multivariate Hypergeometric Distribution
(used in calculating P-value in Exact Tests)

• Randomly sample n elements from a finite (polytomous)
population of size N, without replacement, having K1, K2, ..., Kc
elements of types 1, 2, …, c.
• Xi = number of i-th type elements in the sample, i = 1, …, c. Then
X has the multivariate hypergeometric distribution:
$P(X_i = x_i,\ i = 1, \dots, c) = \frac{\prod_{i=1}^{c}\binom{K_i}{x_i}}{\binom{N}{n}}$
• E(Xi) = n(Ki/N),
• V(Xi) = {n(Ki/N)(1 − (Ki/N))} × [(N − n)/(N − 1)]
• Cov(Xi, Xj) = −{n(Ki/N)(Kj/N)} × [(N − n)/(N − 1)], i ≠ j
26
Inference for Multinomial Case

27
Multinomial Distribution
(may lead to One-Way, Two-Way, … Frequency Table)
Suppose Y is a categorical variable with k possible
outcome categories c1, c2, …, ck with probabilities π1,
π2, …, πk = (1 − π1 − … − πk−1).
Suppose there are n observations on categorical Y;
we can summarize the responses through the vector
of observed frequencies (random variables) of data
on Y: X = (X1, X2, …, Xk), where Xk = n − X1 − … − Xk−1.

Then X = (X1, X2, …, Xk) is said to have a multinomial
distribution with parameters n and (π1, π2, …, πk).
$P(X_1 = x_1, \dots, X_k = x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,\pi_1^{x_1}\cdots\pi_k^{x_k}$
28
Multinomial Distribution
(may lead to One-Way, Two-Way, … Frequency Table)
X = (X1, X2, …, Xk) has a multinomial distribution with
parameters n and (π1, π2, …, πk).
$P(X_1 = x_1, \dots, X_k = x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,\pi_1^{x_1}\cdots\pi_k^{x_k}$
$E(X_i) = n\pi_i$,
$V(X_i) = n\pi_i(1-\pi_i)$;
$Cov(X_i, X_j) = -n\pi_i\pi_j$, i ≠ j [Prove it, Exercise]

Maximum Likelihood Estimation (MLE):
MLE of $\pi_i$ = $X_i/n$ (Prove it, Exercise)
$L(\pi_1, \dots, \pi_k, \lambda) = \ln(\text{likelihood}) + \lambda\left(1 - \sum_{i=1}^{k}\pi_i\right)$
29
Multinomial Distribution
(may lead to One-Way, Two-Way, … Frequency Table)
X = (X1, X2, …, Xk) has a multinomial distribution with
parameters n and (π1, π2, …, πk).
$P(X_1 = x_1, \dots, X_k = x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,\pi_1^{x_1}\cdots\pi_k^{x_k}$
$E(X_i) = n\pi_i$, $V(X_i) = n\pi_i(1-\pi_i)$; $Cov(X_i, X_j) = -n\pi_i\pi_j$, i ≠ j
$E(X) = \mu = n\begin{pmatrix}\pi_1\\ \vdots\\ \pi_k\end{pmatrix}$; $V(X) = n\Sigma = n\begin{pmatrix}\pi_1(1-\pi_1) & \cdots & -\pi_1\pi_k\\ \vdots & \ddots & \vdots\\ -\pi_1\pi_k & \cdots & \pi_k(1-\pi_k)\end{pmatrix}$;
Thus, $\Sigma = \mathrm{diag}(\pi) - \pi\pi^T = \begin{pmatrix}\pi_1 & \cdots & 0\\ \vdots & \ddots & \vdots\\ 0 & \cdots & \pi_k\end{pmatrix} - \begin{pmatrix}\pi_1\\ \vdots\\ \pi_k\end{pmatrix}\left(\pi_1\ \dots\ \pi_k\right)$
30
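The moment formulas above, including Σ = diag(π) − ππᵀ, can be verified exactly by enumerating all outcomes of a small multinomial. A short sketch with k = 3 (n and π are arbitrary test values):

```python
from math import factorial

n, pi = 4, (0.2, 0.3, 0.5)  # arbitrary test values

def pmf(x1, x2, x3):
    """Multinomial pmf: n!/(x1! x2! x3!) * pi1^x1 * pi2^x2 * pi3^x3."""
    c = factorial(n) // (factorial(x1) * factorial(x2) * factorial(x3))
    return c * pi[0] ** x1 * pi[1] ** x2 * pi[2] ** x3

# Exact moments by enumeration over all (x1, x2, x3) with x1+x2+x3 = n
e1 = e11 = e12 = 0.0
for x1 in range(n + 1):
    for x2 in range(n + 1 - x1):
        q = pmf(x1, x2, n - x1 - x2)
        e1 += x1 * q
        e11 += x1 * x1 * q
        e12 += x1 * x2 * q

var_x1 = e11 - e1**2               # should equal n*pi1*(1-pi1)
cov_x12 = e12 - e1 * (n * pi[1])   # should equal -n*pi1*pi2 (negative!)
```

Note that the covariance comes out negative, as the off-diagonal entries of Σ require.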
Linking Poisson with Multinomial Dist.
(pp. 202-203, Text)
Result: Let Y1, …, Yk be independent Poi(µi) r.v.'s. Then given (Y1 +
… + Yk) = n, the conditional joint distribution of (Y1, …, Yk) is the
multinomial distribution with parameters: n trials, and $\pi_i = \frac{\mu_i}{\sum_{j=1}^{k}\mu_j}$, i = 1, …, k.

We may assume cell counts in a contingency table are generated
by independent Poisson distributions; then, conditional on n
= sum of the cell counts, one can do conditional inference based
on the multinomial distribution for the cell counts.
31
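This result can be confirmed numerically for two independent Poisson counts: the conditional pmf of Y1 given Y1 + Y2 = n should coincide with a Binomial(n, µ1/(µ1+µ2)) pmf. A short Python check (the µ values and n are arbitrary choices):

```python
from math import exp, factorial, comb

mu1, mu2, n = 1.0, 2.0, 3  # arbitrary test values

def pois(y, mu):
    """Poisson pmf e^{-mu} mu^y / y!."""
    return exp(-mu) * mu**y / factorial(y)

# P(Y1 + Y2 = n): the sum of independent Poissons is Poisson(mu1 + mu2)
p_sum = pois(n, mu1 + mu2)

pi1 = mu1 / (mu1 + mu2)
max_diff = 0.0
for y1 in range(n + 1):
    conditional = pois(y1, mu1) * pois(n - y1, mu2) / p_sum
    binom = comb(n, y1) * pi1**y1 * (1 - pi1) ** (n - y1)
    max_diff = max(max_diff, abs(conditional - binom))
```

The two pmfs agree to floating-point precision, which is the k = 2 case of the stated result.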
Example 1.1, p. 6, Text
One-Way Frequency Table for Metabolic Syndrome Study

             MS
  Present   Absent   Total
    48        45       93

Two-Way Frequency Table for Metabolic Syndrome Study

                  MS
  Gender    Present   Absent   Total
  male        31        31       62
  female      17        14       31
  Total       48        45       93
32
Pearson’s Chi-square (χ2) Test
$H_0: \pi_i = \pi_{0i},\quad i = 1, \dots, k \qquad (1)$

The fit of the model is assessed by comparing the frequencies
expected in each cell against the observed frequencies. If there
is substantial discrepancy between the observed frequencies
and those expected from the null model, then it would be wise
to reject the null model. The best known goodness-of-fit
statistic used to test the hypothesis in (1) is the Pearson’s Chi-
Square (PCS):
$PCS,\ \chi^2 = \sum_{i=1}^{k}\frac{(\mathrm{Observed}_i - \mathrm{Expected}_i)^2}{\mathrm{Expected}_i} = \sum_{i=1}^{k}\frac{(X_i - n\pi_{0i})^2}{n\pi_{0i}}$;
Result: PCS has a chi-square distribution with df = (k − 1) under H0
For a proof see pages 1-4 of https://arxiv.org/pdf/1808.09171.pdf
33
Example: Pearson’s χ2 Test
When we are trying to do a test of hypothesis to determine
whether a die is a fair die, it is a simple hypothesis.
Suppose we roll it 120 times and the summarized data
are as follows: [observed frequency table shown on slide]

In this case, k=?? and n=??. H0: πi = ?? (= π0i), i=1,2,…,??

34
Example: Pearson’s χ2 Test
When we are trying to do a test of hypothesis to determine
whether a die is a fair die, it is a simple hypothesis.
Suppose we roll it 120 times and the summarized data
are as follows: [observed frequency table shown on slide]
In this case, k=6 and n=120. H0: πi = 1/6 (= π0i), i=1,2,…,6

P-value = CHISQ.DIST.RT(6.1, 5), using Excel


35
Multinomial Example, p. 38, Text
Multinomial Case:
Depression Diagnosis in the DOS Study
Major Dep   Minor Dep   No Dep   Total
   128         136        481     745
DOS = Depression Of Seniors

Test H0: P(No Dep) = 0.65, P(Minor Dep) = 0.2, P(Major Dep) = 0.15
Here, k = 3, n = 745
$PCS = \frac{(481-484.25)^2}{484.25} + \frac{(136-149)^2}{149} + \frac{(128-111.75)^2}{111.75} = 3.519$
df = (no. of categories − 1) = 2; P-value = 0.1721
36
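The PCS and P-value above can be reproduced in a few lines: for df = 2 the chi-square survival function has the closed form exp(−x/2), so no external packages are needed. A minimal Python sketch:

```python
import math

observed = [481, 136, 128]   # No Dep, Minor Dep, Major Dep
pi0 = [0.65, 0.20, 0.15]     # hypothesized category probabilities
n = sum(observed)            # 745

expected = [n * p for p in pi0]  # 484.25, 149, 111.75
pcs = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# P-value: for df = 2, P(chi2_2 > x) = exp(-x/2)
p_value = math.exp(-pcs / 2)
```

This reproduces PCS = 3.519 and P-value = 0.1721, so H0 is not rejected at the 5% level.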
Multinomial Example, p. 38, Text
Multinomial Case:
Depression Diagnosis in the DOS Study
Major Dep   Minor Dep   No Dep   Total
   128         136        481     745
DOS = Depression Of Seniors

Test H0: P(No Dep) = 0.65, P(Minor Dep) = 0.2, P(Major Dep) = 0.15
Here, k = ??, n = ??, PCS = ??, df = ??; P-value = ??

DOS: Depression of Seniors Study (p.4)

37
Example 2.2, p.4, p.38, Text

[Output table shown on slide]
Conclusion: The null hypothesis claim appears to be true
38


Pearson’s Chi-Square (contd.)
The hypothesis presented in Equation (1) is an
example of a simple hypothesis. (Simple in the sense
that the hypothesis completely specifies the true
distribution).

The hypothesis is said to be composite when the null


is not completely spelt out, but is specified in terms
of d parameters (d < k − 1).

39
Testing Composite Hypothesis
in Inference for Count Data

40
Poisson Distribution Case
Suppose Y is a random variable taking integer values y = 0,
1, 2, …, with probability $P(Y = y) = e^{-\mu}\frac{\mu^y}{y!}$.
Suppose there are n observations on Y; we can summarize
the observations through the vector of observed
frequencies for value-categories 0, 1, 2, …

Suppose all counts ≥ 6 are combined to make the combined
frequency more than 5. Then with k = 7 value-categories
0, 1, 2, 3, 4, 5, and ≥6 (say), the observed frequencies X =
(X1, X2, …, Xk), where Xk = (n − X1 − … − Xk−1), have a
multinomial distribution with parameters n and (π1, π2, …,
πk), where π1 = P(Y=0) = $e^{-\mu}$, π2 = P(Y=1) = $\mu e^{-\mu}$, …,
π7 = P(Y ≥ 6).
41
Example 2.3, p.42, Text

[Frequency table and output shown on slide]
Exc: Check MLE of µ:
9.1 or 3.6489??
Here, k = ??, n = ??, PCS = ??, df = ??;
P-value = ??

Conclusion: Null hypothesis claim appears to be false or true??
42
MLE of µ = 9.1 or 3.6489?
$L(\theta) = \left(e^{-\theta}\right)^{32}\left(e^{-\theta}\theta\right)^{4}\left(e^{-\theta}\frac{\theta^2}{2}\right)^{5}\left(e^{-\theta}\frac{\theta^3}{6}\right)^{5}\left(e^{-\theta}\frac{\theta^4}{24}\right)^{5}\left(e^{-\theta}\frac{\theta^5}{120}\right)^{6}\times$
$\left(1 - e^{-\theta} - e^{-\theta}\theta - e^{-\theta}\frac{\theta^2}{2} - \cdots - e^{-\theta}\frac{\theta^5}{120}\right)^{41}$
Maximize this function L(θ) w.r.t. θ
Numerical maximization is needed
Do it as an Exercise
43
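A minimal sketch of the numerical maximization in Python, using the category counts as recovered from this slide (32, 4, 5, 5, 5, 6 for values 0–5 and 41 for the ≥6 tail — treat these as an assumption, since the slide is partly garbled) and a simple grid search over θ:

```python
import math

counts = [32, 4, 5, 5, 5, 6]  # frequencies for Y = 0,...,5 (as recovered from the slide)
tail_count = 41               # frequency for Y >= 6

def loglik(theta):
    """Grouped-data Poisson log-likelihood, with a P(Y >= 6) tail term."""
    q = [math.exp(-theta) * theta**y / math.factorial(y) for y in range(6)]
    ll = sum(n_y * math.log(q_y) for n_y, q_y in zip(counts, q))
    ll += tail_count * math.log(1 - sum(q))  # tail probability P(Y >= 6)
    return ll

# Simple grid search for the maximizer over theta in (1, 8)
grid = [1 + 0.0005 * i for i in range(14000)]
theta_hat = max(grid, key=loglik)
```

Under these assumed counts, the grid search lands at θ̂ ≈ 3.649, matching the 3.6489 value on the neighboring slides; 9.1 would then be the answer from the ungrouped data.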
Example 2.3, p.42, Text

[Frequency table and output shown on slide]
Exc: Check MLE of µ:
9.1 or 3.6489??

Conclusion: Null hypothesis claim appears to be false (df = 7 − 1 − 1)
44


Sampling Schemes
Leading to (2×2) Contingency Tables

47
Layout of the 2×2 table

                            Column factor (‘Response’)
                            Level 1           Level 2           Row marginal totals
Row factor      Level 1     n11               n12               R1 = n1+ = n10 = n1
(‘Explanatory’) Level 2     n21               n22               R2 = n2+ = n20 = n2

Column marginal totals:     C1 = n+1 = n01    C2 = n+2 = n02    Grand total: T = n
48
Sampling schemes
leading to 2×2 contingency tables

Sampling scheme                  Marginal total fixed in advance
Poisson                          None
Multinomial                      Grand total (sample size)
Prospective Product Binomial     Row (explanatory) totals
Retrospective Product Binomial   Column (response) totals

49
Poisson Sampling
• Poisson Sampling (French mathematician Simeon
Denis Poisson): Here a fixed amount of time (or space,
volume, money etc.) is employed to collect a random
sample from a single population and each member of
the population falls into one of the four cells in the
2×2 table.
• In the CVD Death example 1 (next slide), researchers
spent a certain amount of time sampling the health
records of 3112 women who were categorized as
obese and non-obese against died of CVD or not. In
this case, none of the marginal totals or the sample
size was known in advance.
50
Example-1*: Cardio-Vascular Deaths and Obesity
among women in American Samoa

[7.76 (=16/2061) observed deaths per thousand among obese women versus
6.66 (=7/1051) deaths per thousand among non-obese women … .]

*This is an “Observational Study“, an example of “Poisson Sampling”


[Ramsey, F. L. and Schafer, D. W. (1997). The Statistical Sleuth. Duxbury
Press, Belmont, California.]

51
American Samoa on World Map

52
Multinomial Sampling
• This is same as the Poisson sampling scheme except
for the fact that here the overall sample size is
predetermined, and not the amount of time for
sampling (or space or volume or money etc.)
• Suppose in the CVD Death Example 1, researchers
decided to sample the health records of exactly 3112
women and then for each woman note (i) if obese or
non-obese and (ii) if died of CVD or did not die of
CVD. Then it would have been multinomial sampling.

53
Prospective Product Binomial Sampling
• Prospective Product Binomial Sampling
(“cohort” study): First identify explanatory variable(s)
that explain “causation” . Population is categorized according
to levels of explanatory variable and random samples are then
selected from each explanatory group.
Suppose separate lists of obese and non-obese American
Samoan women were available in Example 1, a random
sample of 2061 and 1051 respectively could have been
selected from the lists. The term Binomial refers to the
dichotomy of the explanatory variable. The term Product
refers to the fact that sampling is done from more than one
population independently.

54
Example-1: Cardio-Vascular Deaths and Obesity
among women in American Samoa

[7.76 (=16/2061) observed deaths versus 6.66 (=7/1051) deaths per


thousand.]
Assuming prospective product binomial sampling, test equal
proportions of CVD deaths in populations of obese and non-obese
Samoan women [Test statistic Z=??, value of Z = ??, P-value = ??]

55
Layout for Inference on the 2×2 table
(Sec 2.2, Text)

                            Column factor (‘Response’)
                            Level 1           Level 2           Row marginal totals
Row factor      Level 1     n11               n12               R1 = n1+ = n10 = n1
(‘Explanatory’) Level 2     n21               n22               R2 = n2+ = n20 = n2

Column marginal totals:     C1 = n+1 = n01    C2 = n+2 = n02    Grand total: T = n
56
Example-1: Cardio-Vascular Deaths and Obesity among
women in American Samoa

[7.76 (=16/2061) observed deaths versus 6.66 (=7/1051) deaths
per thousand.]

Test equality of proportions of CVD deaths in populations of
obese and non-obese Samoan women [Z = 0.3397, P-value =
0.7341]; the test statistic
$Z = \frac{\hat{\pi}_1 - \hat{\pi}_2}{\sqrt{\hat{\pi}(1-\hat{\pi})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$ is asymptotically N(0,1), under H0: $\pi_1 = \pi_2$,
where the estimate of $\pi_1 = \pi_2$ is given by $\hat{\pi} = \frac{(n_1\hat{\pi}_1) + (n_2\hat{\pi}_2)}{n}$
57
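The pooled two-proportion Z test above can be reproduced with a few lines of Python; math.erfc supplies the normal tail probability:

```python
import math

x1, n1 = 16, 2061   # CVD deaths among obese women
x2, n2 = 7, 1051    # CVD deaths among non-obese women

pi1_hat, pi2_hat = x1 / n1, x2 / n2
pooled = (x1 + x2) / (n1 + n2)  # estimate of the common proportion under H0

se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
z = (pi1_hat - pi2_hat) / se

# Two-sided P-value: 2*Phi(-|z|) = erfc(|z|/sqrt(2))
p_value = math.erfc(abs(z) / math.sqrt(2))
```

This reproduces Z = 0.3397 and the two-sided P-value 0.7341 from the slide.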
Example-2*: Vitamin-C versus Placebo Experiment

COLD NO COLD TOTAL


PLACEBO 335 76 411
VITAMIN-C 302 105 407
TOTAL 637 181 818

Testing equality of proportions of Colds in populations of


Placebo and Vitamin-C takers. Z = ??, One sided P-value = ??
[Observed proportion 82% versus 74%]

*This is a ‘Double Blind Randomized’ Study [Ramsey and Schafer]
58
Example-2*: Vitamin-C versus Placebo Experiment

COLD NO COLD TOTAL


PLACEBO 335 76 411
VITAMIN-C 302 105 407
TOTAL 637 181 818

Testing equality of proportions of Colds in populations of


Placebo and Vitamin-C takers. Z = 2.52, One sided P-value=
0.0059 [Observed proportion 82% versus 74%]

59
Retrospective Product Binomial Sampling

• Retrospective Product Binomial Sampling


(“Case- Control” study): This sampling scheme is
technically same as the previous one. However, roles
of the response and the explanatory factors are
reversed. In this scheme, we categorize the
population according to the identified response
levels and random samples are selected from each
response group.

60
Example 3*: Smoking versus Lung Cancer Outcome

CANCER CONTROL TOTAL

SMOKER 83 72 155
NON-SMOKER 3 14 17
TOTAL 86 86 172

We would like to test equality of proportions of cancers in
populations of smokers and nonsmokers
But can only test equality of proportions of smokers in
populations of cancers and non-cancers
(Homogeneity). Z = ??, One sided P-value = ??
*A Retrospective Observational Study [Ramsey and Schafer]
61
Example 3*: Smoking versus Lung Cancer Outcome

CANCER CONTROL TOTAL

SMOKER 83 72 155
NON-SMOKER 3 14 17
TOTAL 86 86 172

We would like to test equality of proportions of cancers
(‘Homogeneity’) in populations of smokers and nonsmokers
But can only test equality of proportions of smokers in
populations of cancers and non-cancers through the Z test. Z =
2.82, One sided P-value = 0.0025.

62
Why Retrospective Product Binomial
Sampling at all ?
• We cannot test for the equality of proportions along the
explanatory variable if the sampling scheme is
retrospective.
• We only get odds ratio from a case control study which is
an inferior measure of strength of association as
compared to relative risk.
• Why do retrospective sampling at all, then?
Compared to prospective cohort studies they tend to be less
costly and shorter in duration. Case-control studies are often
used in the study of rare diseases, or as a preliminary study
where little is known about the association between possible
risk factor and disease of interest.

63
Retrospective Product Binomial
Sampling (Continued)
1. If the probabilities of the “Yes” response (e.g., cancer)
are very small for a particular level (e.g., non-smoker) of
explanatory variable, it may need a huge sample size to
get any “Yes” response at all through prospective
sampling.
2. Retrospective sampling guarantees that we have at
least a reasonable number of “Yes” responses for each
level of explanatory variable.
3. Retrospective sampling may be accomplished without
having to follow the subjects throughout their lifetime
(in the smoking versus lung cancer study, Example 3).

64
Sampling scheme versus Hypotheses Testing

Sampling scheme   Marginal total fixed in advance   Usual hypothesis tested
Poisson           None                              Independence
Multinomial       Grand total (sample size)         Independence
Prospective       Row (explanatory) totals          Homogeneity
Retrospective     Column (response) totals          Homogeneity (through "Odds Ratio" only)

65
Prospective: subjects selected according to the levels of the
explanatory variable.

  Explanatory Variable → Response Variable

Retrospective: subjects selected according to the levels of the
response variable.

66
Layout for Inference on the 2×2 table
(Sec 2.2, Text)

                            Column factor (‘Response’)
                            Level 1           Level 2           Row marginal totals
Row factor      Level 1     n11               n12               R1 = n1+ = n10 = n1
(‘Explanatory’) Level 2     n21               n22               R2 = n2+ = n20 = n2

Column marginal totals:     C1 = n+1 = n01    C2 = n+2 = n02    Grand total: T = n
67
Estimated Proportions
Prospective Product Binomial Sampling:
• Proportion of "Yes" (Level 1) responses in the first level of the
explanatory variable:
$\hat{\pi}_1 = \frac{n_{11}}{n_{1+}}$
• Similarly, the proportion of "Yes" (Level 1) responses in the
second level of the explanatory variable:
$\hat{\pi}_2 = \frac{n_{21}}{n_{2+}}$
68
Assumptions for Asymptotic Chi-square tests in
22 Contingency Tables
• We will assume that the frequencies of all the entries in
the 2x2 table are greater than 5.
• This ensures that the “asymptotic tests” performed on
the 2x2 tables are reasonably accurate. (“asymptotic”
means ‘appropriate in large samples’)
• If some entries in the 2x2 table are not greater than 5,
one may try Fisher’s Exact test.

69
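Fisher's exact test conditions on both margins, so under H0 the count in one cell follows a hypergeometric distribution. A minimal sketch using the smoking vs. lung cancer table from Example 3 (3 non-smoker cancer cases out of 17 non-smokers, 86 cancer cases among 172 subjects), coded with math.comb:

```python
from math import comb

# Margins of the smoking vs lung cancer table (Example 3):
# 17 non-smokers, 86 cancer cases, 172 subjects in total
N, K, n = 172, 17, 86
x_obs = 3  # observed non-smoker cancer cases

def hyper_pmf(x):
    """P(X = x) for X ~ Hypergeometric(N, K, n)."""
    return comb(K, x) * comb(N - K, n - x) / comb(N, n)

# One-sided exact P-value: probability of a count as small as observed
p_one_sided = sum(hyper_pmf(x) for x in range(x_obs + 1))
```

Because the expected non-smoker cancer count is 86×17/172 = 8.5 and only 3 were observed, the exact one-sided P-value is small, in line with the asymptotic Z-test conclusion on that slide.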
Example-1*: Cardio-Vascular Deaths and Obesity among
women in American Samoa

[7.76 observed deaths versus 6.66 deaths per thousand.]


Assuming multinomial (or Poisson) sampling scheme, test
independence of two attributes (obesity and cardio-vascular
deaths) [Test statistic PCS =??, P-value = ??]
*This is an “Observational Study“, an example of “Poisson Sampling” [Ramsey, F. L. and
Schafer, D. W. (1997). The Statistical Sleuth. Duxbury Press, Belmont, California.]

70
Example-1 Calculations

[Expected-count table shown on slide]
DF = 1; Two-sided P-value = 1-CHISQ.DIST(0.115,1,TRUE)
One-sided P-value = 1-NORM.S.DIST(0.34,TRUE)
Calculation of expected cell count under H0: $p_{ij} = p_{i+}\cdot p_{+j}$: $\hat{n}_{ij} = n\hat{p}_{ij} = n\left(\frac{n_{i+}}{n}\right)\left(\frac{n_{+j}}{n}\right)$
71
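The expected-count recipe and the PCS statistic can be coded directly for the 2×2 CVD table (cell counts implied by the slide's margins: 16 and 2045 for obese women, 7 and 1044 for non-obese); for df = 1 the chi-square survival function is erfc(√(x/2)):

```python
import math

table = [[16, 2045],   # obese: died of CVD, did not
         [7, 1044]]    # non-obese: died of CVD, did not

row = [sum(r) for r in table]
col = [sum(c) for c in zip(*table)]
n = sum(row)

# Expected counts under independence: n_hat_ij = n * (n_i+/n) * (n_+j/n)
expected = [[row[i] * col[j] / n for j in range(2)] for i in range(2)]
pcs = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
          for i in range(2) for j in range(2))

# Two-sided P-value: for df = 1, P(chi2_1 > x) = erfc(sqrt(x/2))
p_value = math.erfc(math.sqrt(pcs / 2))
```

This gives PCS ≈ 0.115 and P ≈ 0.734, matching the slide's Excel calls and the earlier Z test (PCS = Z² = 0.3397²).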
Pearson’s Chi-square (PCS) Test
$PCS = \sum_{\text{category } c}\frac{(O_c - E_c)^2}{E_c}$
where Oc = observed count in category c, Ec = expected
count in category c, as per the proposed model.

H0: Proposed model generated the observed data
Ha: Proposed model did not generate the data
If H0 is true then PCS has a chi-square distribution with
appropriate degrees of freedom (df).
72
Chi-square Distribution
Let Z1, …, Zk be independent random variables, each
having the N(0,1) distribution. Then $Z_1^2 + \dots + Z_k^2$ is said to
follow the chi-square ($\chi^2_k$) distribution with k degrees of freedom (df).

Result: The expected value and variance of a chi-square
($\chi^2_k$) random variable are given by: $E(\chi^2_k) = k$ (= df);
$Var(\chi^2_k) = 2k$ (= 2·df).

For given k and α, let $\chi^2_{k,\alpha}$ denote the real number which is
exceeded with probability α by a $\chi^2_k$ random variable.
73
Chi-square Distribution
[Density plots shown on slide]
74
Example-2: Vitamin-C versus Placebo Experiment

COLD NO COLD TOTAL


PLACEBO 335 76 411
VITAMIN-C 302 105 407
TOTAL 637 181 818

Testing equal proportions of colds in populations of Placebo


and Vitamin-C takers. [Observed proportion 82% versus 74%]
PCS = ??, two sided P-value= ??, Z = ??, One sided P-value = ??,

75
Example-2: Vitamin-C versus Placebo Experiment

COLD NO COLD TOTAL


PLACEBO 335 76 411
VITAMIN-C 302 105 407
TOTAL 637 181 818

Testing equal proportions (homogeneity) of colds in


populations of Placebo and Vitamin-C takers. PCS = 6.3366, P-
value = 0.012; Z = 2.5173, One sided P-value for this example is
0.0059 [Observed proportion 82% versus 74%]

76
Example-2 Calculations

[Expected-count table shown on slide]
• Expected cell count for the PCS test under H0: $p_{1+} = p_{2+}$: $\hat{n}_{ij} = n_{i+}\left(\frac{n_{+j}}{n}\right)$;
• Equivalent Z test for homogeneity of two binomial populations:
$Z = \frac{\hat{p}_{1+} - \hat{p}_{2+}}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_{1+}} + \frac{1}{n_{2+}}\right)}}$, where $\hat{p} = \frac{n_{+1}}{n}$ and $1-\hat{p} = \frac{n_{+2}}{n}$;
• Note: the PCS test can be used to test homogeneity of binomial populations, as
PCS = Z².
77
Example 3*: Smoking versus Lung Cancer Outcome

CANCER CONTROL TOTAL

SMOKER 83 72 155
NON-SMOKER 3 14 17
TOTAL 86 86 172

• Testing equal proportions of smokers in populations of


cancers and non-cancers.
• PCS = ??, two-sided P-value = ??; Z = ??, One sided P-value =
??

78
Example 3*: Smoking versus Lung Cancer Outcome

CANCER CONTROL TOTAL

SMOKER 83 72 155
NON-SMOKER 3 14 17
TOTAL 86 86 172

Testing equal proportions of smokers in populations of cancers


and non-cancers. PCS=7.8983, two-sided P-value = 0.005; Z =
2.8104, One sided P-value = 0.0025

80
Example-3 Calculations

[Expected-count table shown on slide]
• Expected cell count for the PCS test under H0: $p_{1+} = p_{2+}$: $\hat{n}_{ij} = n_{i+}\left(\frac{n_{+j}}{n}\right)$;
• Equivalent Z test for homogeneity of two binomial populations:
$Z = \frac{\hat{p}_{1+} - \hat{p}_{2+}}{\sqrt{\hat{p}(1-\hat{p})\left(\frac{1}{n_{1+}} + \frac{1}{n_{2+}}\right)}}$, where $\hat{p} = \frac{n_{+1}}{n}$ and $1-\hat{p} = \frac{n_{+2}}{n}$;
• Note: the PCS test can be used to test homogeneity of binomial populations
81
Homogeneity versus Independence
Hypotheses
• Hypothesis of homogeneity (for prospective product
binomial sampling)
H0: π1 = π2
Not done in Retrospective Product Binomial Sampling

• Hypothesis of Independence
(At this stage qualitatively expressed)
Done only in Multinomial or Poisson Sampling

83
Homogeneity versus Independence
Hypotheses (contd.)
• The hypothesis of independence is used to
investigate an association between row and column
factors without specifying one of them as a
response. Although the hypotheses may be
expressed in terms of parameters, it is more
convenient to use the qualitative wording:
• H0: The row categorization is independent of the
column categorization

84
Sampling scheme versus Hypotheses Testing

Sampling scheme   Marginal total fixed in advance   Usual hypothesis tested
Poisson           None                              Independence
Multinomial       Grand total (sample size)         Independence
Prospective       Row (explanatory) totals          Homogeneity
Retrospective     Column (response) totals          Homogeneity (through "Odds Ratio" only)

85
Assumptions for Pearson’s Chi-square tests

• We will assume that the frequencies of all the entries


in the 2x2 table are greater than 5.
• This ensures that the “asymptotic tests” performed
on the 2x2 tables are reasonably accurate.
(“asymptotic” means ‘appropriate in large samples’)

• If some entries in the 2x2 table are not greater than
5, one may try Fisher’s Exact test.

86
Layout for Inference on the 2×2 table
(Sec 2.2, Text)

                            Column factor (‘Response’)
                            Level 1           Level 2           Row marginal totals
Row factor      Level 1     n11               n12               R1 = n1+ = n10 = n1
(‘Explanatory’) Level 2     n21               n22               R2 = n2+ = n20 = n2

Column marginal totals:     C1 = n+1 = n01    C2 = n+2 = n02    Grand total: T = n
88
Inference for 2×2 Table
(Sec 2.2, Text)

Measures of Association:
• (i) Difference Between Proportions,
• (ii) Relative Risk (or Risk Ratio or Incidence Rate
Ratio or ‘Probability Ratio’)
• (iii) Odds Ratio

[Example: Expected number of trials up to the first failure = $\frac{1}{1-\pi}$;
expected number of successes before the first failure = $\frac{\pi}{1-\pi}$ = odds
for success]
89
Is “Tutoring” Helpful in a Business Stat Course?

                Success   Failure   Row marginal
Tutoring           a         b        (a+b)
No Tutoring        c         d        (c+d)
Col. marginal    (a+c)     (b+d)    n = (a+b+c+d)

Estimated difference of proportions:
$\hat{\pi}_1 - \hat{\pi}_2 = \frac{a}{a+b} - \frac{c}{c+d}$,
where $\hat{\pi}_1 = \frac{a}{a+b}$, $\hat{\pi}_2 = \frac{c}{c+d}$
Estimated Risk Ratio:
$\frac{\hat{\pi}_1}{\hat{\pi}_2} = \frac{a/(a+b)}{c/(c+d)} = \frac{a(c+d)}{(a+b)c} = \frac{ad+ac}{bc+ac}$

Estimated Odds Ratio = $\frac{\hat{\pi}_1}{1-\hat{\pi}_1}\times\frac{1-\hat{\pi}_2}{\hat{\pi}_2} = \frac{a/b}{c/d} = \frac{ad}{bc}$
90
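All three measures are easy to compute from the four cell counts. A short sketch with made-up illustrative counts (a, b, c, d below are hypothetical, not from the slide):

```python
# Hypothetical 2x2 counts for illustration only
a, b = 30, 10   # Tutoring: success, failure
c, d = 20, 20   # No tutoring: success, failure

pi1_hat = a / (a + b)           # success proportion with tutoring
pi2_hat = c / (c + d)           # success proportion without tutoring

diff = pi1_hat - pi2_hat        # difference of proportions
risk_ratio = pi1_hat / pi2_hat  # relative risk
odds_ratio = (a * d) / (b * c)  # odds ratio = (a/b)/(c/d)
```

For these counts the difference of proportions is 0.25, the risk ratio 1.5, and the odds ratio 3.0, illustrating how the odds ratio can be much larger than the risk ratio when the proportions are not small.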
Motivating Use of Odds-Ratios
• High School A reduced the dropout rate from 10% to 5%, a dramatic 50%
  decrease!
• High School B increased the graduation rate from 90% to 95%, a modest 5.5%
  increase.
• The Principal of School A was lauded by the NY Times for slashing the
  dropout rate by half. The other principal got a short mention in the local
  newspaper. Even though they did the same thing.
• Log-odds ratios put these on even terms (base-10 logs here):
  School A: log10((0.05/0.95)/(0.10/0.90)) = −0.32
  School B: log10((0.95/0.05)/(0.90/0.10)) = +0.32
• Log-odds-ratios consider a change from 10% to 5% equivalent to a change
  from 90% to 95%.
91
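The symmetry claimed above can be verified directly; a small sketch using base-10 logs, matching the ±0.32 on the slide:

```python
import math

def log10_odds_ratio(p_new, p_old):
    """Base-10 log of the odds ratio for a change from p_old to p_new."""
    return math.log10((p_new / (1 - p_new)) / (p_old / (1 - p_old)))

school_a = log10_odds_ratio(0.05, 0.10)  # dropout rate 10% -> 5%
school_b = log10_odds_ratio(0.95, 0.90)  # graduation rate 90% -> 95%
print(round(school_a, 2), round(school_b, 2))  # -0.32 0.32
```

The two changes are exact mirror images on the log-odds scale, which is the point of the example.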
'Odds for Success' vs 'Probability of Success'
in a Geometric Distribution Setup (p. 41)
• Consider a sequence of independent Bernoulli trials with two potential
  outcomes, "success" and "failure". In each trial the probability of success
  is π and of failure is (1 − π). Observe the sequence until the first failure
  occurs. Then X = number of 'successes' observed has the geometric (negative
  binomial with r = 1) distribution: Pr(X = k) = π^k (1 − π), k = 0, 1, 2, …
• μ = E(X) = π/(1 − π),  V(X) = π/(1 − π)^2 > E(X).
• Thus π/(1 − π) = 'odds for success' equals the 'average number' of
  successes before the first failure in a 'geometric experiment' of trials.
• Compare it with π = 'probability of success' = 'proportion' of successes in
  an experiment of infinitely many trials.
92
Odds versus Probabilities (contd.)
Interpretation: An event with chance of occurrence 0.95 has odds of
0.95/0.05 = 19 to 1 in favour of its occurrence.

An event with chance 0.05 has the same odds, 19 to 1, but against it
(since 0.05/0.95 = 1/19). Generally express the larger number first.

93
Relation between Probability, Odds & Logit

Probability   Odds   Log(Odds) = Logit
   0          0         NC
   0.1        0.11     −2.20
   0.2        0.25     −1.39
   0.3        0.43     −0.85
   0.4        0.67     −0.41
   0.5        1.00      0.00
   0.6        1.50      0.41
   0.7        2.33      0.85
   0.8        4.00      1.39
   0.9        9.00      2.20
   1          NC        NC
(NC = not computable)

Odds maps probability from [0,1] to [0,∞) asymmetrically, while Logit maps
probability to (−∞, ∞) symmetrically.

94
Example: NFL Football
TEAM “ODDS against” (Prob of Win)
San Francisco 49ers Even (1/2)
Denver Broncos 5 to 2 (2/7)
New York Giants 3 to 1 (1/4)
Cleveland Browns 9 to 2 (2/11)
Los Angeles Rams 5 to 1 (1/6)
Minnesota Vikings 6 to 1 (1/7)
Buffalo Bills 8 to 1 (1/9)
Pittsburgh Steelers 10 to 1 (1/11)

Total probability is 1.73!!


[Christensen]
95
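The "odds against" quotes above convert to probabilities as b/(a+b) for odds of a to b; a small sketch reproducing the table and the 1.73 total (it exceeds 1 because bookmakers' odds embed a margin):

```python
from fractions import Fraction

# "Odds against" of a to b imply a win probability of b/(a + b).
odds_against = [(1, 1), (5, 2), (3, 1), (9, 2), (5, 1), (6, 1), (8, 1), (10, 1)]
probs = [Fraction(b, a + b) for a, b in odds_against]
print([f"{p.numerator}/{p.denominator}" for p in probs])
print(round(float(sum(probs)), 2))  # 1.73: the bookmaker's margin
```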
The Following are Equivalent
1. Proportions π1, π2 are equal.
2. Odds ω1, ω2 are equal.
3. Odds ratio φ = ω1/ω2 is equal to 1.
4. ln(odds ratio) = ln(φ) is equal to 0.

Result: The estimate of the Odds Ratio (OR), given by
OR̂ = (n11 n22)/(n12 n21), remains invariant over the sampling designs
(i.e., works even in case of retrospective sampling).

96
Advantage of Odds Ratio over
Risk Ratio or Difference of Proportions
1. The estimate of the Odds Ratio (OR) remains invariant over the sampling
   design (i.e., works even in case of retrospective sampling), and it is
   given by OR̂ = (n11 n22)/(n12 n21), since

   OR = [P(Y=1|X=1)/P(Y=0|X=1)] / [P(Y=1|X=0)/P(Y=0|X=0)]
      = [P(Y=1,X=1)/P(Y=0,X=1)] / [P(Y=1,X=0)/P(Y=0,X=0)]
      = P(Y=1,X=1) P(Y=0,X=0) / [P(Y=0,X=1) P(Y=1,X=0)]
      = [P(X=1|Y=1)/P(X=0|Y=1)] / [P(X=1|Y=0)/P(X=0|Y=0)]
   (the P(Y=1) and P(Y=0) factors cancel).

2. Comparison of odds extends nicely to regression analysis when the
   response (Y) is a categorical variable.   97
(iii) Odds and Odds Ratio
Odds of an outcome: Let π be the population proportion of "YES" outcomes.
Then the corresponding odds is given by
    ω = π/(1 − π)

The sample odds is given by
    ω̂ = π̂/(1 − π̂)

98
(iii) Odds and Odds Ratio (contd.)
• πi = population proportion of "YES" responses for Group X = i. Then the
  odds of "YES" happening is given by: ωi = πi/(1 − πi), 0 ≤ ωi < ∞.
• The sample odds of "YES" in Group i give the estimate:
  ω̂i = π̂i/(1 − π̂i), i = 1, 2
• Odds Ratio of "YES" response in Group 1 to that in Group 2:
    φ = ω1/ω2 = [π1/(1 − π1)] × [(1 − π2)/π2]

99
Odds versus Probability
Given the probability  of a “YES” outcome, the
corresponding odds is given by,
𝜔 = 𝜋/(1 − 𝜋)
Similarly, given the odds ω of a “YES” response, the
corresponding probability  is given by

𝜋 = 𝜔/(1 + 𝜔)

100
Properties of Odds

ω = π/(1 − π)  ⇔  π = ω/(1 + ω)

1. Odds must be greater than or equal to zero but have no upper limit.
2. Odds are not defined for proportions that are exactly 0 or 1.
3. If the odds of a "YES" outcome is ω, then the odds of a "NO" is 1/ω.

101
Difference Between Proportions or Relative Risk?
[ (π1 − π2) or (π1/π2) ? ]
• Two proportions π1 = 0.5 and π2 = 0.45 have the same difference as π1 = 0.1
  and π2 = 0.05, even though in the second case one is twice the other. In
  such cases Relative Risk is a better measure than the Difference Between
  Proportions.
• An alternative to comparing proportions (relative risk, i.e., π1/π2) is to
  compare the corresponding odds through the odds ratio φ = ω1/ω2, where
  ω1 = π1/(1 − π1) and ω2 = π2/(1 − π2).

103
Confidence Interval for π1 − π2
• Estimate of π1 − π2:  π̂1 − π̂2 = n11/n1+ − n21/n2+
• Var̂(π̂1 − π̂2) = π̂1(1 − π̂1)/n1+ + π̂2(1 − π̂2)/n2+
• s.e.(π̂1 − π̂2) = sqrt[ π̂1(1 − π̂1)/n1+ + π̂2(1 − π̂2)/n2+ ]
• 100(1 − α)% CI for π1 − π2:
    (π̂1 − π̂2) ± z_{α/2} sqrt[ π̂1(1 − π̂1)/n1+ + π̂2(1 − π̂2)/n2+ ],
  where z_{α/2} = 1.96 for α = .05

104
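The CI above can be sketched in a few lines of Python, here applied to the Vitamin C and cold counts used later in the deck:

```python
import math

def wald_ci_diff(n11, n12, n21, n22, z=1.96):
    """Wald CI for pi1 - pi2 from a 2x2 table (rows = groups)."""
    n1, n2 = n11 + n12, n21 + n22
    p1, p2 = n11 / n1, n21 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z * se, d + z * se

# Vitamin C and cold data: Placebo 335/411 colds, Vitamin C 302/407 colds
lo, hi = wald_ci_diff(335, 76, 302, 105)
print(round(lo, 4), round(hi, 4))
```

The interval excludes 0, consistent with the significant risk-ratio interval computed for the same data later on.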
Testing H0: π1 − π2 = 0
• Estimate of π1 − π2:  (π̂1 − π̂2) = n11/n1+ − n21/n2+
• Pooled estimate under H0:  π̂ = (n11 + n21)/(n1+ + n2+)
• Var̂(π̂1 − π̂2) under H0 = π̂(1 − π̂)(1/n1+ + 1/n2+)
• Test statistic
    Z = (π̂1 − π̂2) / sqrt[ π̂(1 − π̂)(1/n1+ + 1/n2+) ]
  is asymptotically N(0,1) under H0, if n1+, n2+ are 'large'.

  z_{α/2} = 1.96 for α = .05

105
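A minimal sketch of the pooled z statistic, again using the Vitamin C and cold counts from the deck:

```python
import math

def two_prop_z(n11, n12, n21, n22):
    """Pooled two-proportion z statistic for H0: pi1 = pi2."""
    n1, n2 = n11 + n12, n21 + n22
    p1, p2 = n11 / n1, n21 / n2
    p = (n11 + n21) / (n1 + n2)      # pooled estimate under H0
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

# Vitamin C and cold data: Placebo 335/411 colds, Vitamin C 302/407 colds
z = two_prop_z(335, 76, 302, 105)
print(round(z, 3))
```

Since |Z| exceeds 1.96, H0: π1 = π2 would be rejected at α = .05 for these data.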
Relative Risk vs Odds Ratio

• Relative risk tells how much 'risk' (probability) is increased or decreased
  from an initial level. It is readily understood. A relative risk of 0.5
  means the initial risk has halved. A relative risk of 2 means the initial
  risk has increased twofold.
• Odds ratio is simply the ratio of odds in two groups of interest. If the
  odds ratio is less than one then the odds (and therefore the risk too) has
  decreased, and if the odds ratio is greater than one then they have
  increased. But by how much?
• How to interpret an odds ratio of, say, 0.5 or an odds ratio of 2? Lack of
  familiarity with odds implies no intuitive feel for the size of the
  difference when expressed in this way.

106
Relative Risk (RR) or Incidence Rate Ratio (IRR)
(Text, p. 53)

(Population proportion p, also denoted by the Greek letter π)

• The relative risk (RR) of response Y = 1 of the population X = 1 to the
  population X = 0 is the ratio of two population proportions:
    RR = P(Y=1|X=1) / P(Y=1|X=0) = π1/π2
• RR > 1 means the probability of response is larger in the population with
  X = 1 than in the population with X = 0.
• Estimate of RR:  RR̂ = (n11/n1+) / (n21/n2+)

107
Confidence Intervals for Relative Risk (RR)
(Text, p. 54)

• Estimate of RR (= π1/π2):  RR̂ = (n11/n1+) / (n21/n2+)
• Exercise:
    ln RR̂ ~ N( ln RR, (1−π1)/(π1 n1+) + (1−π2)/(π2 n2+) )
           ≈ N( ln RR, (1−π̂1)/n11 + (1−π̂2)/n21 )
• Var̂(ln RR̂) = (1−π̂1)/n11 + (1−π̂2)/n21
• 100(1−α)% CI for ln RR:  ln RR̂ ∓ z_{α/2} sqrt[ (1−π̂1)/n11 + (1−π̂2)/n21 ]
• 100(1−α)% CI for RR:
    RR̂ exp( −z_{α/2} sqrt(Var̂(ln RR̂)) )  to  RR̂ exp( +z_{α/2} sqrt(Var̂(ln RR̂)) ),
  where z_α = 100(1−α)-th percentile of the N(0,1) distribution
• Also, by the DELTA method,
    RR̂ ~ N( RR, RR^2 [(1−π̂1)/n11 + (1−π̂2)/n21] );
    Var̂(RR̂) = RR̂^2 [(1−π̂1)/n11 + (1−π̂2)/n21]

Note: RR can NOT be estimated with retrospective sampling.   108


Confidence interval for Risk Ratio
(Example 2: Vitamin C and Cold data)

             Cold   No Cold
Placebo      335      76
Vitamin C    302     105

α = .05, z_{α/2} = 1.96

Risk Ratio = 1.098476;
ln(Risk Ratio) = 0.093923;
Var̂( ln(Risk Ratio) ) = 0.001406
95% CI for ln(Risk Ratio): [0.020425, 0.167422]
95% CI for Risk Ratio: [1.020635, 1.182253]
Conclusion: The risk of a cold for the placebo group is estimated to be about
1.1 times the risk of a cold for the vitamin C group (approximate 95% CI:
1.02 to 1.18).

109
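The numbers on this slide can be reproduced with a short sketch of the log-scale CI:

```python
import math

def rr_ci(n11, n12, n21, n22, z=1.96):
    """Point estimate and 95% CI for the relative risk via the log scale."""
    n1, n2 = n11 + n12, n21 + n22
    p1, p2 = n11 / n1, n21 / n2
    rr = p1 / p2
    var_log = (1 - p1) / n11 + (1 - p2) / n21  # estimated Var of ln(RR-hat)
    half = z * math.sqrt(var_log)
    return rr, rr * math.exp(-half), rr * math.exp(half)

# Vitamin C and cold data: Placebo 335/411 colds, Vitamin C 302/407 colds
rr, lo, hi = rr_ci(335, 76, 302, 105)
print(round(rr, 4), round(lo, 4), round(hi, 4))
```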
Computation of odds ratio
(Example 3)

             Cancer   Control
Smoker         83       72
Non-Smoker      3       14

Odds ratio = (83)(14)/((3)(72)) = 5.38

Calculate the odds ratio by dividing the product of the diagonal elements of
the table by the product of the off-diagonal elements of the table.
The above result indicates that the odds of getting cancer for a smoker are
estimated to be 5.38 times the odds of getting cancer for a non-smoker.
Note: RR can not be estimated with retrospective sampling.   111
Example 2: Computation of odds ratio

             Cold   No Cold
Placebo      335      76
Vitamin C    302     105

Odds ratio = (335)(105)/((302)(76)) = 4.41/2.88 = 1.53

Calculate the odds ratio by dividing the product of the diagonal elements of
the table by the product of the off-diagonal elements of the table.
The above result indicates that the odds of getting a cold on the placebo
treatment are estimated to be 1.53 times the odds of getting a cold on the
vitamin C treatment.   112

Derivation: Asymptotic Distribution of OR̂
• OR = [p1/(1−p1)] / [p2/(1−p2)];  OR̂ = n11 n22/(n21 n12);
  ln OR̂ = ln( p̂1/(1−p̂1) ) − ln( p̂2/(1−p̂2) )
• p̂1 ~ N( p1, p1(1−p1)/n1+ ). By the DELTA method,
    ln( p̂1/(1−p̂1) ) ~ N( ln(p1/(1−p1)), [p1(1−p1)/n1+]·[1/(p1(1−p1))]^2 )
                    = N( ln(p1/(1−p1)), 1/(n1+ p1(1−p1)) ),
  and 1/(n1+ p̂1(1−p̂1)) = 1/n11 + 1/n12
• Similarly, ln( p̂2/(1−p̂2) ) ≈ N( ln(p2/(1−p2)), 1/(n2+ p̂2(1−p̂2)) = 1/n21 + 1/n22 )
• ln OR̂ ~ N( ln OR, 1/n11 + 1/n12 + 1/n21 + 1/n22 )
• OR̂ ~ N( OR, OR^2 (1/n11 + 1/n12 + 1/n21 + 1/n22) ), by the DELTA method
• Under H0, ln OR̂ ~ N( 0, [1/(p̂c(1−p̂c))](1/n1+ + 1/n2+) ),
  where the estimated common p = p̂c = (n11 + n21)/n

113
Confidence Interval for Odds Ratio
(through that for loge of the Odds Ratio; Text, p. 52)

• Estimate of OR:  OR̂ = n11 n22/(n21 n12)
• Estimate of the "asymptotic" variance of loge(OR̂):
    ln OR̂ ~ N( ln OR, 1/n11 + 1/n12 + 1/n21 + 1/n22 ), by the DELTA method
    Var̂(loge OR̂) = 1/n11 + 1/n12 + 1/n21 + 1/n22
• 100(1−α)% CI for OR:
    OR̂ exp( ∓ z_{α/2} sqrt(Var̂(loge OR̂)) ), i.e.,
    [n11 n22/(n21 n12)] exp( ∓ z_{α/2} sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22) )

Note: RR can not be estimated with retrospective sampling (but OR can).

114
Confidence interval for Odds Ratio
(Example 3: Smoking and Cancer Data)

             CANCER   CONTROL
SMOKER         83       72
NON-SMOKER      3       14

1. Odds ratio and its log:  φ̂ = 5.38 → ln φ̂ = 1.683
2. Shortcut method for the SE of the log odds ratio:
   sqrt(1/83 + 1/72 + 1/3 + 1/14) = 0.656
3. 95% interval for the log of the odds ratio:
   1.683 ± 1.96 × 0.656 = [0.396, 2.969]
4. 95% interval for the odds ratio: exp(0.396) to exp(2.969), or 1.486 to 19.471

Conclusion: The odds of cancer for the smokers are estimated to be 5.38 times
the odds of cancer for non-smokers (approximate 95% CI: 1.486 to 19.471).
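The four steps above can be sketched directly; a minimal version for the smoking and cancer counts:

```python
import math

def or_ci(n11, n12, n21, n22, z=1.96):
    """Point estimate and 95% CI for the odds ratio via the log scale."""
    orr = (n11 * n22) / (n12 * n21)
    se_log = math.sqrt(1 / n11 + 1 / n12 + 1 / n21 + 1 / n22)  # step 2
    return orr, orr * math.exp(-z * se_log), orr * math.exp(z * se_log)

# Smoking and cancer data: smokers (83 cancer, 72 control), non-smokers (3, 14)
orr, lo, hi = or_ci(83, 72, 3, 14)
print(round(orr, 2), round(lo, 3), round(hi, 3))
```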
Confidence interval for Odds Ratio
(Example 2: Vitamin C and Cold data)

             Cold   No Cold
Placebo      335      76
Vitamin C    302     105

1. Odds ratio and its log:  φ̂ = 1.53 → ln φ̂ = 0.427
2. Shortcut method for the SE of the log odds ratio:
   sqrt(1/335 + 1/76 + 1/302 + 1/105) = 0.170
3. 95% interval for the log odds ratio:
   0.427 ± 1.96 × 0.170 = [0.093, 0.761]
4. 95% interval for the odds ratio: exp(0.093) to exp(0.761); or 1.10 to 2.14

Conclusion: The odds of a cold for the placebo group are estimated to be 1.53
times the odds of a cold for the vitamin C group (approximate 95% CI: 1.10 to 2.14).
116
Test for Homogeneity
• Hypothesis of homogeneity
    H0: π1 = π2
• Alternatively (ω = odds, φ = odds ratio):
    H0: ω1 = ω2,  or  H0: φ = 1,  or  H0: log(φ) = 0

117
Testing Equality of Two Population
Odds

118
Odds Ratio (Contd.)
Interpretation:
If the odds ratio φ = ω1/ω2 equals 4, then ω1 = 4ω2.
This means that the odds of a "yes" outcome in the first group are four times
the odds of a "yes" outcome in the second group.

119
Example 3: Cancer vs Smoking

             Cancer   Control
Smoker         83       72
Non-Smoker      3       14

Odds ratio = (83)(14)/((3)(72)) = 5.38

Calculate the odds ratio by dividing the product of the diagonal elements of
the table by the product of the off-diagonal elements of the table.

The above result indicates that the odds of getting cancer for a smoker are
estimated to be 5.38 times those of getting cancer for a non-smoker.

120
Sampling Distribution of ln(OR̂)
• OR = [p1/(1−p1)] / [p2/(1−p2)];  OR̂ = n11 n22/(n21 n12)
• ln OR̂ = ln( p̂1/(1−p̂1) ) − ln( p̂2/(1−p̂2) )
• p̂1 ~ N( p1, p1(1−p1)/n1+ )
• By the DELTA method,
    ln( p̂1/(1−p̂1) ) ~ N( ln(p1/(1−p1)), [p1(1−p1)/n1+]·[1/(p1(1−p1))]^2 )
                    ≈ N( ln(p1/(1−p1)), 1/(n1+ p̂1(1−p̂1)) = 1/n11 + 1/n12 )
• ln( p̂2/(1−p̂2) ) ≈ N( ln(p2/(1−p2)), 1/(n2+ p̂2(1−p̂2)) = 1/n21 + 1/n22 )
• ln OR̂ ~ N( ln OR, 1/n11 + 1/n12 + 1/n21 + 1/n22 ), by the DELTA method
• OR̂ ~ N( OR, OR^2 (1/n11 + 1/n12 + 1/n21 + 1/n22) ), by the DELTA method
• Under H0, ln OR̂ ~ N( 0, [1/(p̂c(1−p̂c))](1/n1+ + 1/n2+) ),
  where p̂c = (n11 + n21)/n

121
Example 1: Cardio-Vascular Deaths and Obesity
among women in American Samoa

[7.76 (=16/2061) observed deaths versus 6.66 (=7/1051) deaths per


thousand.]
Test equality of odds of CVD deaths in populations of obese and
nonobese Samoan women ( ln(Odds Ratio) =0 )
This is an “Observational Study“, an example of “Poisson Sampling” [Ramsey, F. L.
and Schafer, D. W. (1997). The Statistical Sleuth. Duxbury Press, Belmont,
California.]
122
Testing equality of two population odds:
Cardiovascular disease and obesity data

1. Estimate the odds of CVD death in group 1 (obese) and group 2 (nonobese).
2. Odds ratio and its log.
3. Proportion from the combined sample.
4. SE for the log odds ratio estimate (test version).
5. Z-statistic.
6. One-sided P-value.   123
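Applying the six steps to the Samoa counts quoted earlier (16 CVD deaths among 2061 obese, 7 among 1051 nonobese women); a sketch using the test version of the SE with the pooled proportion:

```python
import math

# CVD deaths: obese 16/2061, nonobese 7/1051
n11, n12 = 16, 2061 - 16
n21, n22 = 7, 1051 - 7

log_or = math.log((n11 * n22) / (n12 * n21))           # step 2: ln(odds ratio)
p_c = (n11 + n21) / (2061 + 1051)                       # step 3: pooled proportion
se0 = math.sqrt((1 / (p_c * (1 - p_c))) * (1 / 2061 + 1 / 1051))  # step 4
z = log_or / se0                                        # step 5
p_one_sided = 0.5 * math.erfc(z / math.sqrt(2))         # step 6: P(Z > z)
print(round(z, 3), round(p_one_sided, 3))
```

The Z statistic is small, so these data give little evidence that the odds of CVD death differ between the two populations.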
Testing equality of two population odds:
Cancer vs Smoking data
• Exercise: do the calculations for dataset in Example 2
and Example 3

124
Test for Marginal Homogeneity
(McNemar's Test, Text, pp. 55-56)

Comparing dependent proportions in matched or paired data, usually in a
pre-post treatment study design.

H0: Prevalence of Depression at the two time points is equal
(P(X=1) = π1+ = π+1 = P(Y=1), i.e., "treatment" has no effect)

McNemar's (Chi-square) test statistic = (n12 − n21)^2/(n12 + n21) ~ χ²_1
126
Test for Marginal Homogeneity
(McNemar's Test, Text, pp. 55-56)

Year 0: 9/285 = 0.03;  Year 1: 41/317 = 0.13

H0: Prevalence of Depression at the two time points is equal

McNemar's (Chi-square) test statistic = (n12 − n21)^2/(n12 + n21) ~ χ²_1

Here, (9 − 41)^2/(9 + 41) = 20.48;  P-value = 6.02E-06.
Conclusion: The null hypothesis of marginal homogeneity is rejected.
127
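The statistic and P-value on this slide can be checked in a few lines:

```python
import math

# McNemar's test for the depression data: n12 = 9, n21 = 41
n12, n21 = 9, 41
stat = (n12 - n21) ** 2 / (n12 + n21)
# chi-square(1) upper-tail probability: P(chi2_1 > s) = P(|Z| > sqrt(s)) = erfc(sqrt(s/2))
p_value = math.erfc(math.sqrt(stat / 2))
print(round(stat, 2), p_value)  # 20.48 and roughly 6e-06
```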
Derivation: McNemar's Test
H0: π1+ = π+1 (⇔ π2+ = π+2) (⇔ π12 = π21, since π1+ − π+1 = π12 − π21)
• Define the statistic d = p1+ − p+1 (= p+2 − p2+), where the p's denote
  sample proportions.
• Var(d) = Var(p1+) + Var(p+1) − 2 Cov(p1+, p+1)
         = [π1+(1−π1+) + π+1(1−π+1) − 2(π11π22 − π12π21)]/n,
  since
  Cov(p1+, p+1) = Cov(p11+p12, p11+p21)
                = V(p11) + Cov(p11, p21) + Cov(p12, p11) + Cov(p12, p21)
                = [π11(1−π11) − π11π21 − π11π12 − π12π21]/n
                = [π11π22 − π12π21]/n

128
Derivation: McNemar's Test
• H0: π1+ = π+1; define the statistic d = p1+ − p+1 (= p+2 − p2+).
• Var(√n d) = π1+(1−π1+) + π+1(1−π+1) − 2(π11π22 − π12π21)
  = (π11+π12)(π21+π22) + (π11+π21)(π12+π22) − 2(π11π22 − π12π21)
  = (π11π21 + π11π22 + π12π21 + π12π22) + (π11π12 + π11π22 + π21π12 + π21π22)
    − 2(π11π22 − π12π21)
  = (π11π21 + π12π21 + π12π22) + (π11π12 + π21π12 + π21π22) + 2π12π21
  = π12(π22 + π11 + π21) + π21(π11 + π12 + π22) + 2π12π21
  = π12(1−π12) + π21(1−π21) + 2π12π21 = (π12+π21) − (π12−π21)^2

• Under H0, Var(√n d) = (π12+π21); hence asymptotically d ~ N(0, (p12+p21)/n);
  d / sqrt((p12+p21)/n) = [(n12 − n21)/n] / sqrt((n12+n21)/n^2)
                        = (n12 − n21)/sqrt(n12+n21)
  Thus, McNemar's test statistic: n d^2/(p12+p21) = (n12 − n21)^2/(n12+n21)
129
Cochran-Mantel-Haenszel Test for no row by
column association in any of the 22 Tables
(Text pp. 94-101, Agresti, p. 237)

131
Cochran-Mantel-Haenszel Test
(Text pp. 94-101; Agresti, p. 237)

m11^(h) = n1+^(h) n+1^(h) / n^(h)

Q_CMH = [ Σ_{h=1}^{q} ( n11^(h) − m11^(h) ) ]^2 / Σ_{h=1}^{q} v11^(h)  ~ χ²_1,

where v11^(h) = n1+^(h) n+1^(h) n2+^(h) n+2^(h) / [ (n^(h))^2 (n^(h) − 1) ]

Text, p. 100 (here h = 1, 2): Q_CMH = (18 − 16.4 + 32 − 28.8)^2 / (2.3855 +
3.7236) = 3.7714; P-value = 0.052 with the χ²_1 dist.

Note: For the Hypergeometric Dist(N, K, n): V(X) = {n(K/N)(1 − K/N)}×[(N−n)/(N−1)]
= n(N−n)K(N−K)/[N^2(N−1)]; the test statistic compares Σ_h n11^(h) to its
expected value Σ_h m11^(h).
132
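The text's Q_CMH can be reproduced from the summary quantities quoted above (the per-stratum 2×2 counts themselves are not reproduced here, so only the final combination step is shown):

```python
# CMH statistic from the text's summary numbers (p. 100):
# observed n11 per stratum, their null expectations m11, and variances v11
n11 = [18, 32]
m11 = [16.4, 28.8]
v11 = [2.3855, 3.7236]

q_cmh = sum(o - e for o, e in zip(n11, m11)) ** 2 / sum(v11)
print(round(q_cmh, 4))  # compare with chi-square(1); text reports 3.7714
```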
Cochran-Armitage Trend Test
(See Text, pp. 60-61; Agresti, p. 178)

Binary categorical (row) variable X, ordered (column) variable Y.
Test H0: Proportions of X=1 follow some (linear) pattern as a function of the
levels of Y. (Test whether the proportion of females changes as the
depression level changes; (0.57, 0.77, 0.73).)
The Cochran-Armitage test statistic ~ N(0,1).
It basically tests whether the slope of the linear regression of the
proportions of X=1 on the levels of Y is zero or not [πi = α + β·(Dep level i)].
[Exercise: Check the calculations on p. 61, text: Z_CA = 4.21?, P-value < .0001?]
134
Exact Tests for Small Samples

135
Layout for Inference on the 2×2 Table
(Sec 2.2, Text)

                                  Column factor ('Response')
                                  Level 1      Level 2      Row marginal total
Row factor        Level 1         n11          n12          R1 = n1+ = n10 = n1
('Explanatory')   Level 2         n21          n22          R2 = n2+ = n20 = n2
Column marginal totals            C1 = n+1     C2 = n+2     Grand total T = n
                                  (= n01)      (= n02)

136
Assumptions for Pearson's Chi-square tests

• We will assume that the frequencies of all the entries in the 2×2 table are
  greater than 5.
• This ensures that the "asymptotic tests" performed on the 2×2 tables are
  reasonably accurate. ("Asymptotic" means 'appropriate in large samples'.)
• If all the entries in the 2×2 table are not greater than 5, one may try
  Fisher's Exact test.

137
Exact Test: Independence of Two Attributes

• Example 4: Data collected on a random sample of people attending the
  preview of a movie.
• Did the movie have equal appeal to the young and the old, or was it liked
  more by the young?
• Test H0: the two attributes are independent, against the one-sided
  Ha: the movie is liked more by the young.
Exact Test: Independence of Two Attributes
(GGD, Fundamentals, Vol 1, Sec 17.12, p. 489)

• To test if two qualitative characters (attributes) A and B are
  independent, let P(A=Ai, B=Bj) = pij, i=1,…,k, j=1,…,l.
• Let P(A=Ai) = Σ_{j=1}^{l} pij = pi0; let P(B=Bj) = Σ_{i=1}^{k} pij = p0j.
• To test H0: pij = pi0 p0j, for all i, j.
• nij = observed frequency for cell (Ai, Bj).
• The marginal frequencies of Ai and Bj are ni0 = Σ_{j=1}^{l} nij and
  n0j = Σ_{i=1}^{k} nij.

139
Exact (Conditional) Test: Independence of
Two Attributes
• To test if two qualitative characters (attributes) A and B are independent,
  let P(A=Ai, B=Bj) = pij, i=1,…,k, j=1,…,l.
• To test H0: pij = pi0 p0j, for all i, j.
• nij = observed frequency for cell AiBj. The marginal frequencies of Ai and
  Bj are ni0 = Σ_{j=1}^{l} nij and n0j = Σ_{i=1}^{k} nij.
• Under H0, the conditional distribution of {nij, all i,j} given the current
  sample marginals {ni0, n0j, all i,j} has the (multivariate hypergeometric) pmf

    [ n!/(∏i∏j nij!) ] ∏_{i=1}^{k} ∏_{j=1}^{l} (pi0 p0j)^{nij}
    ---------------------------------------------------------------
    [ n!/(∏i ni0!) ∏i pi0^{ni0} ] · [ n!/(∏j n0j!) ∏j p0j^{n0j} ]

    =  ∏i ni0! ∏j n0j! / [ n! ∏i∏j nij! ]

140
Multivariate Hypergeometric Distribution
(used in calculating the P-value in Exact Tests)

• Randomly sample n elements from a finite (polytomous) population of size N,
  without replacement, having K1, K2, ..., Kc elements of types 1, 2, …, c.
• Xi = number of i-th type elements in the sample, i=1,…,c. Then X has the
  multivariate hypergeometric distribution:

    P(Xi = xi, i = 1, …, c) = [ ∏_{i=1}^{c} C(Ki, xi) ] / C(N, n)

• E(Xi) = n(Ki/N)
• V(Xi) = {n(Ki/N)(1 − Ki/N)}×[(N−n)/(N−1)]
• Cov(Xi, Xj) = −{n(Ki/N)(Kj/N)}×[(N−n)/(N−1)], i ≠ j (negative, since the
  counts compete for the same n draws)

141
Derivation: Conditional distribution of {nij} given
current sample marginals {ni0, n0j}
• To test if two qualitative characters (attributes) A and B are independent,
  let P(A=Ai, B=Bj) = pij, i=1,…,k, j=1,…,l.
• To test H0: pij = pi0 p0j, for all i, j.
• nij = observed frequency for cell AiBj. The marginal frequencies of Ai and
  Bj are ni0 = Σ_{j=1}^{l} nij and n0j = Σ_{i=1}^{k} nij.
• Under H0, the pmf of {nij, all i,j} is
    [n!/(∏i∏j nij!)] ∏_{i=1}^{k}∏_{j=1}^{l} pij^{nij}
    = [n!/(∏i∏j nij!)] ∏i∏j (pi0 p0j)^{nij}
• Under H0, the unconditional pmf of {ni0, all i} = [n!/(∏i ni0!)] ∏i pi0^{ni0}
• Under H0, the unconditional pmf of {n0j, all j} = [n!/(∏j n0j!)] ∏j p0j^{n0j}
• Under H0, the conditional pmf of {nij, all i,j} given the current sample
  marginals {ni0, n0j, all i,j} is

    [n!/(∏i∏j nij!)] ∏_{i=1}^{k}∏_{j=1}^{l} (pi0 p0j)^{nij}
    -----------------------------------------------------------------
    ( [n!/(∏i ni0!)] ∏i pi0^{ni0} ) · ( [n!/(∏j n0j!)] ∏j p0j^{n0j} )

    =  ∏i ni0! ∏j n0j! / [ n! ∏i∏j nij! ]
142
Example 4: Exact Test of Indep. of Attributes
• Add up the probabilities, under H0, of the given table and of those
  indicating a more extreme positive association (and having the same
  marginals). These tables and corresponding probabilities:

• In the above 'tables' read 'P-value' as the conditional probability of
  observing such a sample.
• P-value for the test = 0.0198 < 0.05 ⇒ Ha seems to be true.   143
Exact Test of Homogeneity of Proportions
(Homogeneity of two Binomial Distributions)
• Example: Compare two methods of treatment of an allergy. Method 1 (A) uses
  15 patients and Method 2 (B) uses 14. Is method 2 better than method 1?

• Here n1+ = 15, n2+ = 14, n11 = 6, n21 = 11 and Ha: p1 < p2. Here the sample
  sizes are not large, hence asymptotic tests are not applicable. We need to
  use exact tests.

Testing equality of two or more multinomial distributions (PCS as a large
sample test): https://online.stat.psu.edu/stat415/lesson/17/17.1

144
Exact Test of Two Proportions (Homogeneity)
(GGD, Fundamentals, Vol 1)

• Two populations for which proportions of subjects with


certain characteristic are p1 and p2. Random samples of sizes
n1 (same as n1+ notation) and n2 (same as n2+ notation) are
drawn independently from the two pop. Let X1 and X2 denote
numbers of members having characteristic in the samples.
• Want to test H0: p1 = p2 (=p, unknown)
• Make use of the statistics X1 (=n11), X2 (=n21), but concentrate
on samples for which sum X=X1+X2 is fixed at observed sum
(x1+x2) (=n+1).

145
Exact Test of Two Proportions
(Homogeneity of two Binomial Distributions)

• f(x1) = C(n1, x1) p^{x1} (1−p)^{n1−x1}, under H0
• f(x2) = C(n2, x2) p^{x2} (1−p)^{n2−x2}, under H0
• f(x)  = C(n, x) p^{x} (1−p)^{n−x}, under H0, where n = n1 + n2, X = X1 + X2
• The conditional pmf of X1 given X = x = x1 + x2, under H0, is

              C(n1, x1) p^{x1} (1−p)^{n1−x1} · C(n2, x−x1) p^{x−x1} (1−p)^{n2−x+x1}
    f(x1|x) = ---------------------------------------------------------------------
                                 C(n, x) p^{x} (1−p)^{n−x}

            = C(n1, x1) C(n2, x−x1) / C(n1+n2, x)

If the observed value of X1 is x10 and that of X is x0, then use the
conditional pmf of X1, f(x1|x0), for testing H0.

146
Exact Test of Two Proportions
(Homogeneity of two Binomial Distributions)

• H0: p1 = p2 against Ha: p1 > p2; the P-value is computed as

    P(X1 ≥ x10 | x = x0) = Σ_{x1 ≥ x10} C(n1, x1) C(n2, x0−x1) / C(n1+n2, x0)

• H0: p1 = p2 against Ha: p1 < p2; the P-value is computed as

    P(X1 ≤ x10 | x = x0) = Σ_{x1 ≤ x10} C(n1, x1) C(n2, x0−x1) / C(n1+n2, x0)
147
Example 5: Exact Test of Two Proportions
(Homogeneity of two Binomial Distributions)
• Compare two methods of treatment of an allergy. Method 1 (A) uses 15
  patients and Method 2 (B) uses 14. Is method 2 (B) better than method 1 (A)?

• Here n1 = 15, n2 = 14, x0 = 17, x10 = 6 and H0: p1 = p2 vs Ha: p1 < p2

• P-value = P(X1 ≤ x10 | x = x0)
  = [ C(15,6)C(14,11) + C(15,5)C(14,12) + C(15,4)C(14,13) + C(15,3)C(14,14) ] / C(29,17)
  = 0.0407

[P-value: here add up the probabilities for samples with X1 = n11 away from
the expected value = (17/29)×15 = 8.7931 in the smaller direction.]
148
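The tail sum above can be sketched directly with the hypergeometric pmf:

```python
from math import comb

def fisher_lower_p(n1, n2, x0, x10):
    """P(X1 <= x10 | X1 + X2 = x0) under H0: p1 = p2 (hypergeometric lower tail)."""
    denom = comb(n1 + n2, x0)
    # x1 cannot be below x0 - n2 (group 2 holds at most n2 of the successes)
    return sum(comb(n1, x1) * comb(n2, x0 - x1)
               for x1 in range(max(0, x0 - n2), x10 + 1)) / denom

p = fisher_lower_p(15, 14, 17, 6)  # allergy-treatment data above
print(round(p, 4))  # 0.0407
```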
Exact Test for Homogeneity of two Multinomial
Distributions (GGD, Fundamentals, Vol 1)
• Two multinomial populations with distributions Mult(n1, (p11,…,p1k)) and
  Mult(n2, (p21,…,p2k)). Random samples of sizes n1 and n2 are drawn
  independently from the two populations. Let X1 = (X11,…,X1k) and
  X2 = (X21,…,X2k) be the samples; ni = Σ_{j=1}^{k} Xij.
• Want to test H0: (p11,…,p1k) = (p21,…,p2k) [= (p1,…,pk), say, but unknown].
• Make use of the statistics X1, X2, but concentrate on samples for which the
  sum X = X1 + X2 is fixed at the observed sum (x1 + x2) = x.
150
Exact Test for Homogeneity of two Multinomial
Distributions
• f(x1) = [n1!/(x11! x12! … x1k!)] p1^{x11} … pk^{x1k}, under H0
• f(x2) = [n2!/(x21! x22! … x2k!)] p1^{x21} … pk^{x2k}, under H0
• f(x)  = [n!/(x1! x2! … xk!)] p1^{x1} … pk^{xk}, under H0
• The conditional pmf of X1 given X = x = x1 + x2, under H0, is
    f(x1|x) = [n1!/(x11! … x1k!)] · [n2!/(x21! … x2k!)] · (x1! x2! … xk!)/n!
If the observed value of X1 is x10 and that of X is x0, then use the
conditional pmf of X1, f(x1|x0), for testing H0.
• Exercise: Find a good business example/application.

151
Testing Homogeneity of Two Multinomial Pop.
(https://online.stat.psu.edu/stat415/lesson/17/17.1 ; Example 17-3)

               Level 1  ⋯  Level J
Population 1     n11    ⋯    n1J     n1 = n1+
Population 2     n21    ⋯    n2J     n2 = n2+
                 n+1    ⋯    n+J     n = n++

H0: p11 = p21, …, p1J = p2J;
Define p̂j = (n1j + n2j)/(n1 + n2) = n+j/n.
Asymptotic Test:

    PCS = Σ_{i=1}^{2} Σ_{j=1}^{J} (nij − ni+ p̂j)^2 / (ni+ p̂j)

152
Testing Homogeneity of Two Multinomial Pop.
(https://online.stat.psu.edu/stat415/lesson/17/17.1 ; Example 17-3)

The head of a surgery department at a university medical center was concerned that
surgical residents in training applied unnecessary blood transfusions at a different
rate than the more experienced attending physicians. Therefore, he ordered a study
of the 49 Attending Physicians and 71 Residents in Training with privileges at the
hospital. For each of the 120 surgeons, the number of blood transfusions prescribed
unnecessarily in a one-year period was recorded. Based on the number recorded, a
surgeon was identified as either prescribing unnecessary blood transfusions
Frequently, Occasionally, Rarely, or Never.

            Frequently  Occasionally  Rarely   Never   Total
Attending    n11=2       n12=3        n13=31   n14=13  n1=n1+=49
Physician
Resident     n21=15      n22=28       n23=23   n24=5   n2=n2+=71
Physician
             n+1=17      n+2=31       n+3=54   n+4=18  n=n++=120   153
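The PCS statistic for this table can be sketched as follows (a check against the PSU example; compare the result with the χ² distribution on (2−1)(4−1) = 3 df):

```python
# Pearson chi-square for homogeneity of the two physician groups
rows = [[2, 3, 31, 13],      # attending physicians (n = 49)
        [15, 28, 23, 5]]     # resident physicians (n = 71)

n = sum(map(sum, rows))
col_tot = [sum(r[j] for r in rows) for j in range(4)]
# expected count in cell (i, j) under H0 is n_{i+} * p_hat_j = n_{i+} * n_{+j} / n
pcs = sum((rows[i][j] - sum(rows[i]) * col_tot[j] / n) ** 2
          / (sum(rows[i]) * col_tot[j] / n)
          for i in range(2) for j in range(4))
print(round(pcs, 2))
```

The statistic is far beyond the χ²₃ critical value (7.81 at α = .05), so the two groups' transfusion-rate distributions differ.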
Exact Small Sample Test for Marginal Homogeneity
(Agresti, p. 416)

H0: π1+ = π+1 (⇔ π2+ = π+2)
H0: π12 = π21 ⇔ π12/(π12 + π21) = 1/2
For small samples an exact test conditions on n* = n12 + n21.
Under H0, given n*, n21 conditionally has a Binomial(n*, 1/2) distribution,
for which E(n21) = n*/2.
Example (Agresti, p. 416): Opinion changes of 63 young male voters (less than
30 years) in Presidential voting for the 2004 election: n11 = 32, n12 = 4,
n21 = 8, n22 = 19.
Here n* = n12 + n21 = 12; conditionally, n21 ~ Binomial(12, 1/2) and
P-value = P( |n21 − 6| ≥ |8 − 6| ) = P(n21 ≥ 8 or n21 ≤ 4) = 2P(n21 ≤ 4) = 0.388.
Note: Asymptotic McNemar's Test: (n12 − n21)^2/(n12 + n21) = 1.333,
P-value = 0.248.
154
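Both the exact and asymptotic calculations on this slide can be checked directly:

```python
from math import comb

# Exact test: n* = n12 + n21 = 12, observed n21 = 8; n21 ~ Bin(12, 1/2) under H0
n_star, n21 = 12, 8
p_tail = sum(comb(n_star, k) for k in range(0, 5)) / 2 ** n_star  # P(n21 <= 4)
p_value = 2 * p_tail                                               # two-sided
print(round(p_value, 3))  # 0.388

# Asymptotic McNemar statistic for the same data
n12 = 4
stat = (n12 - n21) ** 2 / (n12 + n21)
print(round(stat, 3))  # 1.333
```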
Test of Poisson Mean
(GGD, Fundamentals, Vol 1)
• Poisson population with mean λ. A random sample {X1, …, Xn} is drawn
  independently. Then Y = Σ_{j=1}^{n} Xj ~ Poi(nλ).
• Want to test H0: λ = λ0 vs Ha: λ > λ0.
• Let Y = y0. Exact P-value = P(Y ≥ y0) = Σ_{y ≥ y0} e^{−nλ0} (nλ0)^y / y!
• CI for λ: X̄ = (Σ_{j=1}^{n} Xj)/n ~ N(λ, λ/n) ≈ N(λ, X̄/n)
• (X̄ − λ)/sqrt(X̄/n) ~ N(0,1)
• 100(1−α)% CI for λ: X̄ ± z_{α/2} sqrt(X̄/n)

• https://online.stat.psu.edu/stat504/node/57/ (World Cup soccer data on the
  number of goals in 95 matches)   156
Poisson Example
Exercise: Test that the data came from a Poisson dist.

(X̄ − λ)/sqrt(X̄/n) ~ N(0,1)
100(1−α)% CI for λ, with X̄ = 1.38, n = 95:
X̄ ± z_{α/2} sqrt(X̄/n) = 1.38 ± 1.96 sqrt(1.38/95) = [1.14, 1.62]

157
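The interval above is a one-line calculation:

```python
import math

# Approximate 95% CI for the Poisson mean: xbar +/- z * sqrt(xbar/n)
xbar, n, z = 1.38, 95, 1.96
half = z * math.sqrt(xbar / n)
print(round(xbar - half, 2), round(xbar + half, 2))  # 1.14 1.62
```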
Exact Test of Two Poisson Means
(GGD, Fundamentals, Vol 1)
• Two Poisson populations with means λ1 and λ2. Random samples {X11, …, X1n1}
  and {X21, …, X2n2} of sizes n1 and n2 are drawn independently from the two
  populations. Then Yi = Σ_{j=1}^{ni} Xij ~ Poi(ni λi).
• Suppose we want to test H0: λ1 = λ2 vs Ha: λ1 > λ2.
• The conditional pmf of Y1 given Y1 + Y2 = y is Bin(y, n1/(n1+n2)), under H0.
• P-value = Σ_{y1 ≥ y10} C(y0, y1) (n1/(n1+n2))^{y1} (n2/(n1+n2))^{y0−y1}
• Exercise: Collect data for two (cricket) World Cup tournaments and test
  equality of the two Poisson means.

158
Degrees of Freedom for Likelihood Ratio Tests

Degrees of freedom for the LRT = −2 loge(Λ) is the difference in
dimensionality of Θ and Θ0, when H0: θ ∈ Θ0 is true. (H1: θ ∈ Θ0^c)
Let I = no. of distinct patterns of covariates; then df:
1. Independence Test: {IJ − 1} − {(I−1) + (J−1)} = (I−1)(J−1)
2. Homogeneity Test: {I(J−1)} − (J−1)
3. Multinomial Logistic (J = 3, say): {I(J−1)} − (J−1); (no. of model
   parameters under H0 = J−1; the null hypothesis of a multinomial logistic
   regression is that there is no relationship between the X variables and
   the Y variable)
4. Binary Logistic (J = 2): I(2−1) − 1

160
References
• Agresti, A. (2012). Categorical Data Analysis, Wiley Series in
Probability and Statistics.
• Bishop, Y., Fienberg, S. E. and Holland, P. W. (1975). Discrete
Multivariate Analysis, MIT Press, Cambridge.
• Christensen, R. (1990). Loglinear Models. Springer-Verlag,
New York.
• Ramsey, F. L. and Schafer, D. W. (1997). The Statistical Sleuth.
Duxbury Press, Belmont, California.
• Read, T. R. C. and Cressie, N. (1988). Goodness of fit Statistics
for Discrete Multivariate Data. Springer-Verlag, New York.
• Goon, Gupta, Dasgupta, Fundamentals of Statistics, Volume
One.
161
162
Bivariate Normal Distribution
• Two "related", normally distributed variables X ~ N(µx, σ²x) and
  Y ~ N(µy, σ²y) with correlation coefficient ρ.
• Example: X = Adv Exp, Y = Sales.
• The outline of a bivariate normal histogram can be represented by the
  bivariate normal (probability density) function:

163
Bivariate Normal Distribution
• Correlation coefficient = 0

164
Bivariate Normal Distribution
• Correlation coefficient = 0.8

165
