離散資料分析
Categorical Data Analysis
陳俞成
Email:ycchen@mail.chna.edu.tw
2005.9.26
An I × J contingency table (rows: levels of X; columns: levels of Y; nij = count in cell (i, j)):

             Y
X       y1     y2     ···    yJ
x1      n11    n12    ···    n1J
x2      n21    n22    ···    n2J
⋮       ⋮      ⋮             ⋮
xI      nI1    nI2    ···    nIJ
Chapter 2 Two-Way Contingency Tables
Outline
• Probability Structure for Contingency Tables
• Comparing Proportions in Two-By-Two Tables
• The Odds Ratio
Population joint probabilities πij, with the conditional probabilities πj|i in parentheses:

            Column
Row      1          2          Total
1        π11        π12        π1+
         (π1|1)     (π2|1)     (1.0)
2        π21        π22        π2+
         (π1|2)     (π2|2)     (1.0)
Total    π+1        π+2        1.0
Sample proportions pij, with the conditional proportions pj|i in parentheses:

            Column
Row      1          2          Total
1        p11        p12        p1+
         (p1|1)     (p2|1)     (1.0)
2        p21        p22        p2+
         (p1|2)     (p2|2)     (1.0)
Total    p+1        p+2        1.0
Cell counts nij:

            Column
Row      1      2      Total
1        n11    n12    n1+
2        n21    n22    n2+
Total    n+1    n+2    n
• $p_{ij} = n_{ij}/n$
• The marginal frequencies are the row totals $\{n_{i+}\}$ and the column totals $\{n_{+j}\}$.
                 Belief in Afterlife
Gender     Yes          No or Undecided    Total
Females    n11 = 435    n12 = 147          n1+ = 582
Males      n21 = 375    n22 = 134          n2+ = 509
Total      n+1 = 810    n+2 = 281          n = 1091
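A minimal Python sketch of pij = nij/n and of the marginal totals {ni+} and {n+j}, applied to the Belief in Afterlife counts. The helper names `joint_proportions` and `marginal_totals` are hypothetical, chosen for illustration; the table is stored as a plain list of rows so only the standard library is needed.

```python
def joint_proportions(counts):
    """Sample joint proportions p_ij = n_ij / n."""
    n = sum(sum(row) for row in counts)
    return [[n_ij / n for n_ij in row] for row in counts]


def marginal_totals(counts):
    """Row totals {n_i+} and column totals {n_+j}."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    return row_totals, col_totals


# Belief in Afterlife: rows = Females, Males; columns = Yes, No or Undecided
afterlife = [[435, 147],
             [375, 134]]

rows, cols = marginal_totals(afterlife)
print(rows, cols)  # [582, 509] [810, 281]
for row in joint_proportions(afterlife):
    print([round(p, 3) for p in row])
```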
Independence
• Two variables are statistically independent if the conditional distributions of Y are identical at each level of X.
• That is, $\pi_{j|i} = \pi_{+j}$ for every $i$ and $j$.
• When both variables are response variables, their relationship can be described by their joint distribution, by the conditional distribution of Y given X, or by the conditional distribution of X given Y.
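To see the independence definition at work on the afterlife data, the sketch below estimates each conditional distribution pj|i = nij/ni+ and the marginal distribution p+j = n+j/n. The helpers `conditional_rows` and `column_marginal` are hypothetical names. The sample conditional proportions of "Yes" (about 0.747 for females and 0.737 for males) are close to the marginal 0.742 but not identical; deciding whether such a gap exceeds sampling variation calls for a formal test rather than inspection.

```python
def conditional_rows(counts):
    """Sample conditional distributions p_{j|i} = n_ij / n_i+ (one per level of X)."""
    return [[n_ij / sum(row) for n_ij in row] for row in counts]


def column_marginal(counts):
    """Sample marginal distribution p_{+j} = n_+j / n of Y."""
    n = sum(sum(row) for row in counts)
    return [sum(col) / n for col in zip(*counts)]


# Belief in Afterlife: rows = Females, Males; columns = Yes, No or Undecided
afterlife = [[435, 147],
             [375, 134]]

for label, dist in zip(["Females", "Males"], conditional_rows(afterlife)):
    print(label, [round(p, 3) for p in dist])
print("Marginal", [round(p, 3) for p in column_marginal(afterlife)])
```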
Independence
for $i = 1, \dots, I$
Sampling Distributions
Poisson Sampling
H0: Whether the driver had been drinking is independent of whether a traffic accident was fatal.

                Drinking
Death      Yes      No       Total
Yes        n11      n12      n1+
No         n21      n22      n2+
Total      n+1      n+2      n++ = n
H0: Catching a cold is independent of whether vitamins were taken.

                 Cold
Taken        Yes      No       Total
Vitamin      n11      n12      n1+
Placebo      n21      n22      n2+
Total        n+1      n+2      n++ = n
Multinomial Sampling
H0: Belief in an afterlife is independent of gender.

              Believes in Afterlife
Gender     Yes      No       Total
Female     n11      n12      n1+
Male       n21      n22      n2+
Total      n+1      n+2      n++ = n
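The two sampling schemes named above differ in what is fixed: under Poisson sampling each cell count is an independent Poisson variable, so the total n is itself random, whereas under multinomial sampling n is fixed in advance and only its split among the cells is random. The simulation below is a sketch of that difference only. The cell means and probabilities are hypothetical values, not data from these slides, and Poisson variates are drawn with Knuth's multiplication method so the example needs only the standard library.

```python
import math
import random


def poisson(rng, lam):
    """One Poisson(lam) draw via Knuth's multiplication method (fine for moderate lam)."""
    threshold = math.exp(-lam)
    k, prod = 0, 1.0
    while True:
        prod *= rng.random()
        if prod <= threshold:
            return k
        k += 1


def poisson_table(rng, means):
    """Poisson sampling: an independent Poisson count in every cell, so n is random."""
    return [[poisson(rng, mu) for mu in row] for row in means]


def multinomial_table(rng, n, probs):
    """Multinomial sampling: n is fixed; each observation falls in cell (i, j) w.p. pi_ij."""
    cells = [(i, j) for i, row in enumerate(probs) for j in range(len(row))]
    weights = [probs[i][j] for i, j in cells]
    table = [[0] * len(row) for row in probs]
    for i, j in rng.choices(cells, weights=weights, k=n):
        table[i][j] += 1
    return table


rng = random.Random(2005)              # arbitrary seed for reproducibility
means = [[40.0, 15.0], [35.0, 10.0]]   # hypothetical cell means (not from the slides)
probs = [[0.40, 0.15], [0.35, 0.10]]   # hypothetical pi_ij, summing to 1
for _ in range(3):
    pt = poisson_table(rng, means)
    mt = multinomial_table(rng, 100, probs)
    print("Poisson n =", sum(map(sum, pt)), "| multinomial n =", sum(map(sum, mt)))
```

Each run prints a Poisson total that changes from draw to draw next to a multinomial total that is always 100.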
Difference of Proportions
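• $H_0: \pi_1 = \pi_2$ vs. $H_a: \pi_1 \neq \pi_2$ at significance level $\alpha = 0.05$.
• With $p_1 = n_{11}/n_{1+}$ and $p_2 = n_{21}/n_{2+}$,
$$z = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{p^*(1 - p^*)\left(\frac{1}{n_{1+}} + \frac{1}{n_{2+}}\right)}} \overset{\cdot}{\sim} N(0, 1), \qquad p^* = \frac{n_{11} + n_{21}}{n_{1+} + n_{2+}},$$
where $\overset{\cdot}{\sim}$ means "approximately distributed as"; under $H_0$ the term $\pi_1 - \pi_2$ equals 0.
• Reject $H_0$ if $z > 1.96\ (= z_{\alpha/2})$ or $z < -1.96\ (= -z_{\alpha/2})$.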
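A sketch of the pooled z test above, assuming $p_1 = n_{11}/n_{1+}$ and $p_2 = n_{21}/n_{2+}$. The function name `two_proportion_z` is hypothetical; it returns the statistic under $H_0$ (so the $\pi_1 - \pi_2$ term is dropped), and the example plugs in the aspirin trial counts from the table that follows. For those counts z comes out at about 5.0, well beyond 1.96.

```python
import math


def two_proportion_z(n11, n1, n21, n2):
    """Pooled z statistic for H0: pi1 = pi2.

    p1 = n11/n1, p2 = n21/n2, p* = (n11 + n21)/(n1 + n2), with n1 = n_{1+}, n2 = n_{2+}.
    """
    p1, p2 = n11 / n1, n21 / n2
    p_star = (n11 + n21) / (n1 + n2)
    se0 = math.sqrt(p_star * (1 - p_star) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se0


z = two_proportion_z(189, 11034, 104, 11037)   # placebo vs. aspirin MI counts
z_crit = 1.96                                   # z_{alpha/2} for alpha = 0.05
print(round(z, 2), "reject H0" if abs(z) > z_crit else "do not reject H0")
```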
Difference of Proportions

               Myocardial Infarction
Group      Yes     No       Total
Placebo    189     10845    11034
Aspirin    104     10933    11037
Source: N. Engl. J. Med., 318:262-264 (1988)
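• H0: Taking aspirin is independent of having a myocardial infarction (MI).
• $\pi_1 = P(\text{MI} \mid \text{placebo})$, $\pi_2 = P(\text{MI} \mid \text{aspirin})$
• $H_0: \pi_1 = \pi_2$, or equivalently $H_0: \pi_1 - \pi_2 = 0$
• $p_1 = 189/11034 = 0.0171$, $p_2 = 104/11037 = 0.0094$
• $\hat\sigma(p_1 - p_2) = \sqrt{\dfrac{(0.0171)(0.9829)}{11034} + \dfrac{(0.0094)(0.9906)}{11037}} = 0.0015$
• A 95% C.I. for $\pi_1 - \pi_2$ is $(p_1 - p_2) \pm 1.96\,\hat\sigma(p_1 - p_2) = 0.0077 \pm 1.96(0.0015) \approx (0.005, 0.011)$.
• Since $0 \notin (0.005, 0.011)$, reject $H_0: \pi_1 - \pi_2 = 0$.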
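The confidence-interval calculation above can be reproduced in a few lines. This is a minimal sketch with a hypothetical helper `diff_ci`. Unlike the pooled z test, the interval uses each group's own proportion in the standard error, which is why the two approaches can give slightly different numbers for the same data.

```python
import math


def diff_ci(n11, n1, n21, n2, z_crit=1.96):
    """Wald confidence interval for pi1 - pi2 using the unpooled standard error."""
    p1, p2 = n11 / n1, n21 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    d = p1 - p2
    return d - z_crit * se, d + z_crit * se, se


lo, hi, se = diff_ci(189, 11034, 104, 11037)   # placebo vs. aspirin
print(f"SE = {se:.4f}, 95% CI = ({lo:.3f}, {hi:.3f})")
print("reject H0" if not (lo <= 0 <= hi) else "do not reject H0")
```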
Relative Risk
• In 2 × 2 tables, the relative risk is the ratio of the “success” probabilities for the two groups, $\pi_1/\pi_2$.
• The sample relative risk is $p_1/p_2$.
Relative Risk
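• Suppose we compare the side effects of two drugs. In one case the proportions are $p_1 = 0.01$ and $p_2 = 0.001$; in another they are $p_1 = 0.41$ and $p_2 = 0.401$. Measured by $p_1 - p_2$, both differences equal 0.009, but the relative risk is $0.01/0.001 = 10$ in the first case and $0.41/0.401 = 1.02$ in the second. The relative risk therefore makes it much clearer that the first case deserves attention.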
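The arithmetic of the side-effect example, as a short sketch (the helper names `risk_difference` and `relative_risk` are hypothetical). Both comparisons share the same difference of 0.009, while the relative risks are 10 and about 1.02.

```python
def risk_difference(p1, p2):
    """Difference of proportions p1 - p2."""
    return p1 - p2


def relative_risk(p1, p2):
    """Sample relative risk p1 / p2 (requires p2 > 0)."""
    return p1 / p2


# The two hypothetical side-effect comparisons from the text
for p1, p2 in [(0.01, 0.001), (0.41, 0.401)]:
    print(f"p1 = {p1}, p2 = {p2}: difference = {risk_difference(p1, p2):.3f}, "
          f"relative risk = {relative_risk(p1, p2):.2f}")
```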
Relative Risk
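• If $\pi_1 = \pi_2$, the relative risk equals 1; that is, the explanatory and response variables are independent.
• The ratio of the “failure” probabilities, $(1 - \pi_1)/(1 - \pi_2)$, can sometimes also be informative.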
Relative risk for the aspirin data (same placebo/aspirin counts as in the Difference of Proportions example; $\pi_1 = P(\text{MI} \mid \text{placebo})$, $\pi_2 = P(\text{MI} \mid \text{aspirin})$)
• $H_0: \pi_1 = \pi_2$, or equivalently $H_0: \pi_1/\pi_2 = 1$
• $p_1 = 189/11034 = 0.0171$, $p_2 = 104/11037 = 0.0094$
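A sketch of the sample relative risk for the aspirin data, together with the ratio of “failure” probabilities mentioned earlier; the variable names are illustrative. The relative risk comes out at about 1.82, while the failure ratio $(1 - p_1)/(1 - p_2)$ is about 0.992, which shows how differently the same data read depending on which outcome goes in the numerator.

```python
# Aspirin data: MI counts and group totals
n11, n1 = 189, 11034   # placebo
n21, n2 = 104, 11037   # aspirin

p1, p2 = n11 / n1, n21 / n2
relative_risk = p1 / p2                # ratio of "success" (MI) probabilities
failure_ratio = (1 - p1) / (1 - p2)    # ratio of "failure" (no MI) probabilities
print(f"relative risk = {relative_risk:.2f}, failure ratio = {failure_ratio:.4f}")
```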
The Odds Ratio for the aspirin data (same placebo/aspirin counts as in the Difference of Proportions example; $\pi_1 = P(\text{MI} \mid \text{placebo})$, $\pi_2 = P(\text{MI} \mid \text{aspirin})$)
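• $H_0: \pi_1 = \pi_2$, or equivalently $H_0: \theta = 1$, where the odds ratio is $\theta = \dfrac{\pi_1/(1 - \pi_1)}{\pi_2/(1 - \pi_2)}$
• $\text{odds}_1 = n_{11}/n_{12} = 189/10845 = 0.0174$ (about 1.74 MIs per 100 non-MIs), $\text{odds}_2 = n_{21}/n_{22} = 104/10933 = 0.0095$ (about 0.95 per 100)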
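A sketch of the two odds and the sample odds ratio $\hat\theta = \text{odds}_1/\text{odds}_2$ for the aspirin data; the cross-product form $n_{11}n_{22}/(n_{12}n_{21})$ is algebraically the same quantity. `odds_ratio` is a hypothetical helper name. The result is about 1.832.

```python
def odds_ratio(n11, n12, n21, n22):
    """Sample odds ratio: (n11/n12) / (n21/n22) = n11*n22 / (n12*n21)."""
    return (n11 * n22) / (n12 * n21)


odds1 = 189 / 10845   # placebo: MI per no-MI
odds2 = 104 / 10933   # aspirin: MI per no-MI
theta_hat = odds_ratio(189, 10845, 104, 10933)
print(f"odds1 = {odds1:.4f}, odds2 = {odds2:.4f}, theta_hat = {theta_hat:.3f}")
```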
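• $H_0: \pi_1 = \pi_2$, or equivalently $H_0: \log\theta = 0$
• $\hat\theta = (189 \times 10933)/(10845 \times 104) = 1.832$, so $\log\hat\theta = \log(1.832) = 0.605$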
Ever       Myocardial
Smoker     Infarction    Controls    Total
Yes        172           173         345
No         90            346         436
Total      262           519         781
Source: J. Epidemiol. and Commun. Health, 43:214-217 (1989)
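• Under $H_0: \pi_1 = \pi_2 = \pi$, with significance level $\alpha = 0.05$:
• $p^* = \dfrac{n_{11} + n_{12}}{n_{+1} + n_{+2}}$
• $z = \dfrac{p_1 - p_2}{\sqrt{p^*(1 - p^*)\left(\frac{1}{n_{+1}} + \frac{1}{n_{+2}}\right)}} \overset{\cdot}{\sim} N(0, 1)$, where $p_1 = n_{11}/n_{+1}$ and $p_2 = n_{12}/n_{+2}$ are the proportions of ever-smokers in the two columns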
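A sketch of this column-wise pooled z statistic for the smoker table, taking $n_{11} = 172$, $n_{12} = 173$, $n_{+1} = 262$, $n_{+2} = 519$, and treating $p_1 = n_{11}/n_{+1}$ and $p_2 = n_{12}/n_{+2}$ as the proportions of ever-smokers among MI cases and among controls (the reading consistent with the pooled $p^*$). The function name `pooled_z_columns` is hypothetical. The statistic comes out at roughly 8.6, far beyond 1.96.

```python
import math


def pooled_z_columns(n11, n12, n1_col, n2_col):
    """z for H0: pi1 = pi2, comparing the two columns of a 2x2 table.

    p1 = n11 / n_{+1}, p2 = n12 / n_{+2}, p* = (n11 + n12) / (n_{+1} + n_{+2}).
    """
    p1, p2 = n11 / n1_col, n12 / n2_col
    p_star = (n11 + n12) / (n1_col + n2_col)
    se0 = math.sqrt(p_star * (1 - p_star) * (1 / n1_col + 1 / n2_col))
    return (p1 - p2) / se0


# Ever smokers: 172 of 262 MI cases, 173 of 519 controls
z = pooled_z_columns(172, 173, 262, 519)
print(round(z, 2), "reject H0" if abs(z) > 1.96 else "do not reject H0")
```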
• Under $H_0: \log\theta = 0$, with significance level $\alpha = 0.05$:
• $z = \dfrac{\log\hat\theta}{\sqrt{\sum_{i,j} 1/n_{ij}}} \overset{\cdot}{\sim} N(0, 1)$
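A sketch of the Wald test for $H_0: \log\theta = 0$ using $\text{SE}(\log\hat\theta) = \sqrt{\sum_{i,j} 1/n_{ij}}$, plus the matching 95% interval for $\theta$ obtained by exponentiating the endpoints of the interval for $\log\theta$. The helper name is hypothetical, and the sketch assumes every cell count is positive (a zero cell makes $\log\hat\theta$ and its standard error undefined). It is applied to both the aspirin and smoker tables from these slides.

```python
import math


def log_odds_ratio_test(n11, n12, n21, n22, z_crit=1.96):
    """Wald z for H0: log(theta) = 0, plus a Wald CI for theta (assumes no zero cells)."""
    log_theta = math.log((n11 * n22) / (n12 * n21))
    se = math.sqrt(sum(1 / n for n in (n11, n12, n21, n22)))
    z = log_theta / se
    ci = (math.exp(log_theta - z_crit * se), math.exp(log_theta + z_crit * se))
    return log_theta, se, z, ci


for name, table in [("aspirin", (189, 10845, 104, 10933)),
                    ("smoking", (172, 173, 90, 346))]:
    log_theta, se, z, (lo, hi) = log_odds_ratio_test(*table)
    print(f"{name}: log theta_hat = {log_theta:.3f}, SE = {se:.3f}, "
          f"z = {z:.2f}, 95% CI for theta = ({lo:.2f}, {hi:.2f})")
```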
http://www.stat.ufl.edu/~aa/cda/cda.html
http://www.ats.ucla.edu/stat/examples/icda
Summary