
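III. Probability

(Chapter 4)
Professor Jen-pei Liu
Division of Biometry, Graduate Institute of Agronomy, National Taiwan University
Institute of Epidemiology and Preventive Medicine, National Taiwan University
Division of Biostatistics and Bioinformatics, National Health Research Institutes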
jpliu@ntu.edu.tw

[Unless otherwise noted, the content of this work is released under the Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Taiwan license.]

1 111/10/03 Jen-pei Liu, PhD


• Concept of Probability
• Sample Space and Events
• Elementary Probability Rules
• Conditional Probability and Independence
• Applications


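Experiment

• A process of collecting observations whose outcome is uncertain; each trial yields exactly one outcome.
Example: tossing a coin once
• There are two possible outcomes: heads (H) or tails (T)
• Only one outcome occurs on each toss
• Before the toss, we do not know which outcome will be observed
• The probability of each outcome can nevertheless be calculated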


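Probability

1. A probability lies between 0 and 1.
2. The probabilities of all outcomes sum to 1.
Example: tossing a fair coin once
The probability of heads is 0.5
The probability of tails is 0.5
Both lie between 0 and 1, and heads and tails are the only two outcomes
→ 0.5 + 0.5 = 1

Sample Space

• Sample space: the set of all possible outcomes of an experiment
Example: tossing a coin once
{ H, T }
Rolling a die once
{ 1, 2, 3, 4, 5, 6 }
Sexes of a couple's two children
{boy-boy, girl-girl, boy-girl, girl-boy}
= { BB, GG, BG, GB }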


Event

An event is a subset of the sample space.
Example: at least one of a couple's two children is a girl
{ GG, BG, GB }
Rolling a die once with a result greater than 3
{ 4, 5, 6 }


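Probability of an Event

• The probability of an event is the sum of the probabilities of all outcomes in the event.
• If every single outcome of the experiment is equally likely, then
probability of the event = (number of outcomes in the event) / (total number of possible outcomes in the sample space)
If E denotes an event, then P(E) denotes the probability of that event.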


Probability of an Event

• Example: sexes of a couple's two children
Sample space = { BB, GG, BG, GB }
Total number of possible outcomes = 4
At least one child is a girl:
E = { GG, BG, GB }
Number of outcomes in the event = 3
P(E) = 3/4 = 0.75

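The counting rule for equally likely outcomes can be checked by enumerating the sample space. The sketch below is only an illustration; the helper name `event_probability` is not from the slides.

```python
from fractions import Fraction
from itertools import product

# Sample space for the sexes of two children: B = boy, G = girl.
sample_space = ["".join(pair) for pair in product("BG", repeat=2)]


def event_probability(event, space):
    """P(E) = (outcomes in E) / (outcomes in the space), assuming equally likely outcomes."""
    space = list(space)
    favourable = [outcome for outcome in space if event(outcome)]
    return Fraction(len(favourable), len(space))


# Event: at least one child is a girl.
p_at_least_one_girl = event_probability(lambda o: "G" in o, sample_space)

# Event: a single die roll is greater than 3.
p_die_gt_3 = event_probability(lambda o: o > 3, range(1, 7))

print(sample_space, p_at_least_one_girl, p_die_gt_3)
```

Exact fractions are used so that the result can be compared directly with 3/4 on the slide.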
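Elementary Probability Rules

1. The probability of the complement Ec of an event E is
P(Ec) = 1 - P(E)
P(E) + P(Ec) = 1
Example: sexes of a couple's two children
E: at least one child is a girl = { GG, BG, GB }
Ec: both children are boys = { BB }
P(Ec) = 1/4 = 1 - P(E) = 1 - 3/4
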
Elementary Probability Rules

2. Addition rule
The intersection of two events A and B, A∩B, contains the outcomes that belong to both A and B.
The union of two events A and B, A∪B, contains the outcomes that belong to A or to B.

• P(A∪B) = P(A) + P(B) - P(A∩B)

• Example:

                   Age
Sex        <=20 years   >20 years   Total
Male           14            6        20
Female         21            9        30
Total          35           15        50

A: age <=20 years; B: female
• P(A) = P(<=20 years) = 35/50 = 0.70
• P(B) = P(female) = 30/50 = 0.60
• P(A∩B) = P(<=20 years and female) = 21/50 = 0.42
• P(A∪B) = P(<=20 years or female) = 0.70 + 0.60 - 0.42 = 0.88
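As a quick check of the addition rule, the following sketch recomputes P(A∪B) from the age-by-sex counts and compares it with a direct count of the cells in A∪B. The dictionary layout and variable names are illustrative choices, not part of the slides.

```python
from fractions import Fraction

# Counts from the age-by-sex table: counts[sex][age_group].
counts = {
    "male": {"<=20": 14, ">20": 6},
    "female": {"<=20": 21, ">20": 9},
}
total = sum(sum(row.values()) for row in counts.values())  # 50 students

p_a = Fraction(sum(row["<=20"] for row in counts.values()), total)  # A: age <=20
p_b = Fraction(sum(counts["female"].values()), total)               # B: female
p_a_and_b = Fraction(counts["female"]["<=20"], total)               # A and B

# Addition rule: subtract the overlap so it is not counted twice.
p_a_or_b = p_a + p_b - p_a_and_b

# Cross-check: count every cell that is <=20 or female.
direct = Fraction(14 + 21 + 9, total)

print(float(p_a_or_b), p_a_or_b == direct)
```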


Mutually Exclusive Events

• Mutually exclusive events: events A and B have no outcomes in common.
A∩B = ∅, so P(A∩B) = 0
P(A∪B) = P(A) + P(B)

Mutually Exclusive Events

• Example: draw one card at random from a deck of playing cards
A: the card is a jack (J)
B: the card is a queen (Q)
C: the card is red (a heart or a diamond)
A∩B = ∅ → P(A∩B) = 0
$$P(A \cup B) = P(A) + P(B) = \frac{4}{52} + \frac{4}{52} = \frac{8}{52} = \frac{2}{13}$$
$$P(A \cup C) = P(A) + P(C) - P(A \cap C) = \frac{4}{52} + \frac{26}{52} - \frac{2}{52} = \frac{28}{52} = \frac{7}{13}$$

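The card example can be verified by building the 52-card deck as a set and using set operations for intersection and union. This is a minimal sketch; the rank and suit labels are illustrative.

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = set(product(ranks, suits))  # 52 equally likely cards

A = {card for card in deck if card[0] == "J"}                      # jacks
B = {card for card in deck if card[0] == "Q"}                      # queens
C = {card for card in deck if card[1] in ("hearts", "diamonds")}   # red cards


def prob(event):
    """Probability of an event (a set of cards) under a uniform draw."""
    return Fraction(len(event), len(deck))


# A and B are mutually exclusive, so the union probability is a plain sum.
p_a_or_b = prob(A | B)
# A and C overlap in the two red jacks, so the overlap must be subtracted.
p_a_or_c = prob(A) + prob(C) - prob(A & C)

print(p_a_or_b, p_a_or_c)
```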
Conditional Probability

• Conditional probability

                   Age
Sex        <=20 years   >20 years   Total
Male           14            6        20
Female         21            9        30
Total          35           15        50

A: age <=20 years; B: female
The probability that a student is female, given that the student is aged <=20 years:
P(B|A) = 21/35 = 0.6


Conditional Probability

• For two events A and B, the conditional probability that B occurs given A is
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{0.42}{0.70} = 0.60$$
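The definition can be checked against the age-by-sex table: dividing the joint probability by P(A) gives the same value as restricting attention to the 35 students aged <=20. The function name `conditional` is a hypothetical helper for this sketch.

```python
from fractions import Fraction


def conditional(p_joint, p_condition):
    """P(B|A) = P(A and B) / P(A); undefined when P(A) = 0."""
    if p_condition == 0:
        raise ValueError("cannot condition on an event with probability 0")
    return p_joint / p_condition


p_a = Fraction(35, 50)        # A: age <=20
p_a_and_b = Fraction(21, 50)  # A and B: age <=20 and female

p_b_given_a = conditional(p_a_and_b, p_a)

# Same answer from the restricted table: 21 females among 35 students aged <=20.
from_counts = Fraction(21, 35)

print(p_b_given_a, float(p_b_given_a))
```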


Conditional Probability

• Multiplication rule
P(A∩B) = P(B|A)·P(A)
       = P(A|B)·P(B)
• Independent events: two events that do not affect each other
P(B|A) = P(B)
P(A∩B) = P(B|A)·P(A)
       = P(B)·P(A)
       = P(A|B)·P(B)
       = P(A)·P(B)


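Conditional Probability

• Example: tossing a coin twice

               Second toss
First toss     H     T     Total
H              1     1       2
T              1     1       2
Total          2     2       4

A: the first toss is heads
B: the second toss is heads
P(A) = P(first toss is heads) = 2/4 = 1/2
P(B) = P(second toss is heads) = 2/4 = 1/2
P(B|A) = P(second toss is heads | first toss is heads)
$$P(B \mid A) = \frac{P(A \cap B)}{P(A)} = \frac{1/4}{2/4} = \frac{1}{2} = P(B)$$
Whether the second toss is heads or tails does not depend on the first toss:
P(A∩B) = 1/4 = (1/2)(1/2) = P(A)·P(B)
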
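Independence can be tested numerically by comparing P(A∩B) with P(A)·P(B) over the four equally likely outcomes of two tosses. The sketch also adds a contrasting event C ("at least one head"), which is not in the slides and is included only to show a pair that fails the test.

```python
from fractions import Fraction
from itertools import product

space = list(product("HT", repeat=2))  # (first toss, second toss)


def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for o in space if event(o)), len(space))


def first_heads(o):
    return o[0] == "H"


def second_heads(o):
    return o[1] == "H"


def at_least_one_head(o):  # illustrative extra event, not from the slides
    return "H" in o


p_a, p_b = prob(first_heads), prob(second_heads)
p_ab = prob(lambda o: first_heads(o) and second_heads(o))
a_b_independent = p_ab == p_a * p_b

p_c = prob(at_least_one_head)
p_ac = prob(lambda o: first_heads(o) and at_least_one_head(o))
a_c_independent = p_ac == p_a * p_c

print(a_b_independent, a_c_independent)
```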
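Personal Probability: Two Philosophies for Defining Probability

• Measure of belief
– Quantifies each person's subjective view of how likely a particular event is to occur
(e.g., P(this is a nutrition class), P(I can catch that bus))
– Critics: it is subjective! But should it not be subjective?
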
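Personal Probability

• Do different people's subjective beliefs lead to different probabilities?
Yes, which is why we collect data to find evidence!
• People in the East are usually good at hiding their personal beliefs, so either everyone shares a single opinion, or no one offers an opinion and the decision is left entirely to those at the top.
• Free and open? A single dominant voice? Afraid of being challenged? Unable to accept one's own mistakes?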


Personal Probability: Bayes' Theorem

• Based on Bayesian statistics
• Subjective belief + data → objective inference
• First proposed in the 1700s

Personal Probability: Bayes' Theorem

• P(B) = P(Ac∩B) + P(A∩B)

By conditional probability,
P(A∩B) = P(B|A)P(A)
and
P(Ac∩B) = P(B|Ac)P(Ac)
so
P(B) = P(A∩B) + P(Ac∩B)
     = P(B|A)P(A) + P(B|Ac)P(Ac)


Personal Probability: Bayes' Theorem

$$\Pr(A \mid B) = \frac{\Pr(B \cap A)}{\Pr(B)} = \frac{\Pr(B \mid A)\Pr(A)}{\Pr(B \cap A) + \Pr(B \cap A^c)} = \frac{\Pr(B \mid A)\Pr(A)}{\Pr(B \mid A)\Pr(A) + \Pr(B \mid A^c)\Pr(A^c)} = \frac{\Pr(B \mid A)\Pr(A)}{\Pr(B)}$$

Conditional probability; prior probability: Pr(A); posterior probability: Pr(A|B).
The prior probability is a personal probability.

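Bayes' theorem can be exercised on the age-by-sex table used earlier, with A = "age <=20" and B = "female". The sketch below computes P(B) by the law of total probability and then the posterior P(A|B); the function names are illustrative. The posterior should agree with the direct count 21/30 from the table.

```python
from fractions import Fraction


def total_probability(p_b_given_a, p_a, p_b_given_not_a):
    """P(B) = P(B|A)P(A) + P(B|Ac)P(Ac)."""
    return p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)


def posterior(p_b_given_a, p_a, p_b_given_not_a):
    """P(A|B) = P(B|A)P(A) / P(B)."""
    return p_b_given_a * p_a / total_probability(p_b_given_a, p_a, p_b_given_not_a)


p_a = Fraction(35, 50)              # prior: age <=20
p_b_given_a = Fraction(21, 35)      # female among those aged <=20
p_b_given_not_a = Fraction(9, 15)   # female among those aged >20

p_b = total_probability(p_b_given_a, p_a, p_b_given_not_a)
p_a_given_b = posterior(p_b_given_a, p_a, p_b_given_not_a)

print(p_b, p_a_given_b)
```

In this table the posterior equals the prior (0.7) because age group and sex happen to be independent in these counts.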
Applications

• Diagnosis of Diseases
– Classification
– Pattern Recognition
• Estimation of Survival Function


Diagnosis of Diseases
Contingency Table

                       True Condition Status
Test Results       Present (S2)   Absent (S1)   Total
Positive (R2)           a              b         a+b
Negative (R1)           c              d         c+d
Total                  a+c            b+d


Indices of Diagnostic Accuracy

• Sensitivity (true positive rate): the capacity for making a correct diagnosis in subjects with the disease
• Estimated sensitivity: P(R2|S2) = 100% × a/(a+c)
• Specificity (true negative rate): the capacity for making a correct diagnosis in subjects without the disease
• Estimated specificity: P(R1|S1) = 100% × d/(b+d)


Indices of Diagnostic Accuracy

• Positive Predictive Value (Positive Predictive Accuracy): the proportion of subjects with the disease among those with a positive result.
P(S2|R2) = 100% × a/(a+b)
• Negative Predictive Value (Negative Predictive Accuracy): the proportion of subjects without the disease among those with a negative result.
P(S1|R1) = 100% × d/(c+d)
• False positive rate: the proportion of subjects without the disease among those with a positive result.
P(S1|R2) = 1 – positive predictive value = 100% × b/(a+b)
• False negative rate: the proportion of subjects with the disease among those with a negative result.
P(S2|R1) = 1 – negative predictive value = 100% × c/(c+d)


Personal Probability: Application of Bayes' Theorem
Positive Predictive Value
Prior probability: P(D); posterior probability: P(D|T+)
$$P(D \mid T^+) = \frac{P(T^+ \mid D)\,P(D)}{P(T^+)} = \frac{P(T^+ \mid D)\,P(D)}{P(T^+ \mid D)\,P(D) + P(T^+ \mid D^c)\,P(D^c)} = \frac{\text{sensitivity} \times \text{prevalence}}{\text{sens.} \times \text{prev.} + (1 - \text{specificity}) \times (1 - \text{prev.})}$$

Example 2 (Feinstein, 2002)

New Marker       Diseased   Non-diseased
Test Result       Cases       Controls      Total
Positive            46            2           48
Negative             4           48           52
Total               50           50          100


Indices of Diagnostic Accuracy
Data from Example 2 (Feinstein, 2002)
• Sensitivity = 100% × 46/50 = 92.0%
• Specificity = 100% × 48/50 = 96.0%
• Prevalence = 100% × 50/100 = 50.0%
• Positive Predictive Value
= 100% × 46/48 = 95.8%
= (0.92 × 0.5)/[0.92 × 0.5 + (1 – 0.96) × (1 – 0.5)]
• Negative Predictive Value
= 100% × 48/52 = 92.3%
• False Positive Rate = 100% × 2/48 = 4.2%
• False Negative Rate = 100% × 4/52 = 7.7%
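The indices above all follow from the four cell counts a, b, c, d of the contingency table. A minimal sketch, using the definitions given on the earlier slides (including the slides' definitions of false positive and false negative rate as 1 – PPV and 1 – NPV); the function name `diagnostic_indices` is illustrative.

```python
def diagnostic_indices(a, b, c, d):
    """Indices from a 2x2 table.

    a: test positive, disease present    b: test positive, disease absent
    c: test negative, disease present    d: test negative, disease absent
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "prevalence": (a + c) / (a + b + c + d),
        "ppv": a / (a + b),
        "npv": d / (c + d),
        # As defined on the slides: conditional on the test result.
        "false_positive_rate": b / (a + b),
        "false_negative_rate": c / (c + d),
    }


example2 = diagnostic_indices(a=46, b=2, c=4, d=48)
for name, value in example2.items():
    print(f"{name}: {100 * value:.1f}%")
```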


Example 3 (Feinstein, 2002)

New Marker       Diseased   Non-diseased
Test Result       Cases       Controls      Total
Positive            46           38           84
Negative             4          912          916
Total               50          950         1000


Indices of Diagnostic Accuracy

Data from Example 3 (Feinstein, 2002)
• Sensitivity = 100% × 46/50 = 92.0%
• Specificity = 100% × 912/950 = 96.0%
• Prevalence = 100% × 50/1000 = 5.0%
• Positive Predictive Value = 100% × 46/84 = 54.8%
= (0.92 × 0.05)/[0.92 × 0.05 + (1 – 0.96) × (1 – 0.05)]
• Negative Predictive Value
= 100% × 912/916 = 99.6%
• False Positive Rate = 100% × 38/84 = 45.2%
• False Negative Rate = 100% × 4/916 = 0.4%
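Examples 2 and 3 share the same sensitivity (92%) and specificity (96%) and differ only in prevalence, which is what drives the drop in positive predictive value. The sketch below applies the Bayes formula for PPV to both prevalences; `ppv_from_bayes` is an illustrative name.

```python
def ppv_from_bayes(sensitivity, specificity, prevalence):
    """P(D|T+) = sens*prev / [sens*prev + (1 - spec)*(1 - prev)]."""
    true_positive = sensitivity * prevalence
    false_positive = (1 - specificity) * (1 - prevalence)
    return true_positive / (true_positive + false_positive)


ppv_example2 = ppv_from_bayes(0.92, 0.96, prevalence=0.50)
ppv_example3 = ppv_from_bayes(0.92, 0.96, prevalence=0.05)

print(f"prevalence 50%: PPV = {100 * ppv_example2:.1f}%")
print(f"prevalence  5%: PPV = {100 * ppv_example3:.1f}%")
```

Both values agree with the direct table calculations 46/48 and 46/84.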


Error rates associated with a screening test (Fleiss, 1981)

Prevalence     False Positive Rate   False Negative Rate
1/million            .9999                 0
1/100,000            .9991                 0
1/10,000             .9906                 .00001
1/1000               .913                  .00005
1/500                .840                  .00010
1/200                .677                  .00025
1/100                .510                  .00051
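The slide does not state the sensitivity and specificity behind this table. As an assumption, the sketch below uses sensitivity 0.95 and specificity 0.99, values that appear to reproduce the tabulated rates when the false positive and false negative rates are defined as on the earlier slides (conditional on the test result). Treat these two parameters as inferred, not as figures given in the source.

```python
def error_rates(prevalence, sensitivity=0.95, specificity=0.99):
    """Return (false positive rate, false negative rate) given the test result.

    The default sensitivity and specificity are ASSUMED values chosen because
    they seem to match the table; they are not stated on the slide.
    """
    p_positive = sensitivity * prevalence + (1 - specificity) * (1 - prevalence)
    p_negative = 1 - p_positive
    false_positive_rate = (1 - specificity) * (1 - prevalence) / p_positive
    false_negative_rate = (1 - sensitivity) * prevalence / p_negative
    return false_positive_rate, false_negative_rate


for label, prev in [("1/1000", 1 / 1000), ("1/500", 1 / 500),
                    ("1/200", 1 / 200), ("1/100", 1 / 100)]:
    fpr, fnr = error_rates(prev)
    print(f"{label:>7}  FPR = {fpr:.3f}  FNR = {fnr:.5f}")
```

Even with an accurate test, most positive results are false positives when the disease is rare, which is the point of the table.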


Indices of Diagnostic Accuracy

Types of Diagnostic Tests (Feinstein, 1977)

• Screening or discovery tests (e.g., mammogram, fasting blood sugar): require high sensitivity, which leads to a high false positive rate.
• Exclusion tests, which rule out the presence of the disease (e.g., colonoscopic examination): require extremely high sensitivity.
• Confirmation tests, which verify a suspicion that the disease is present (e.g., biopsy for lung cancer): require extremely high specificity, with very few false positives.


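Odds and Odds Ratio

• Odds: p/(1-p)
• Among people who get lung cancer, some smoke and some do not
P(smoker | lung cancer)/P(non-smoker | lung cancer) = 5
• The odds of being a smoker among people with lung cancer are 5
P(smoker | lung cancer) = 5/(1+5)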
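The relationship between odds and probability runs both ways: odds = p/(1-p) and p = odds/(1+odds). A minimal sketch with illustrative function names, using the odds of 5 from the lung cancer example:

```python
from fractions import Fraction


def odds(p):
    """Odds in favour of an event with probability p (requires p < 1)."""
    if p == 1:
        raise ValueError("odds are undefined when p = 1")
    return p / (1 - p)


def probability_from_odds(o):
    """Invert odds = p/(1-p) to get p = o/(1+o)."""
    return o / (1 + o)


p_smoker = probability_from_odds(Fraction(5))  # odds of 5 -> p = 5/6
round_trip = odds(p_smoker)                    # back to 5

print(p_smoker, round_trip)
```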


Odds and Odds Ratio

• The odds of passing the exam for those who studied, relative to those who never study, are 999
• P(studied and passed)/P(never studied and passed) = 999
• P(never studied and passed the exam) = 0.001

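Odds and Odds Ratio
- Relationship between blackfoot disease and drinking arsenic-contaminated well water

• Put simply, the odds ratio is a ratio with the A-D diagonal in the numerator and the B-C diagonal in the denominator.
• If this odds ratio is greater than 1, then "the odds of blackfoot disease among people who drink arsenic-contaminated well water" are higher than "the odds of blackfoot disease among people who do not drink arsenic-contaminated well water"; that is, drinking arsenic-contaminated well water may carry a higher risk of blackfoot disease.
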
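The cross-product form of the odds ratio, AD/BC, is the same as the ratio of the odds of disease in the exposed group to the odds in the unexposed group. The 2×2 table for this slide is not available in the text, so the sketch below assumes the usual layout (A = exposed with disease, B = exposed without disease, C = unexposed with disease, D = unexposed without disease) and uses made-up counts purely for illustration.

```python
from fractions import Fraction


def odds_ratio(a, b, c, d):
    """Cross-product odds ratio AD/BC for a 2x2 table.

    Assumed layout (not confirmed by the slide):
        a: exposed, diseased      b: exposed, not diseased
        c: unexposed, diseased    d: unexposed, not diseased
    """
    return Fraction(a * d, b * c)


# Hypothetical counts for illustration only; these are NOT data from the lecture.
a, b, c, d = 30, 70, 10, 90

cross_product = odds_ratio(a, b, c, d)
ratio_of_odds = Fraction(a, b) / Fraction(c, d)  # odds in exposed / odds in unexposed

print(cross_product, float(cross_product), cross_product == ratio_of_odds)
```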
Odds and Odds Ratio
- Relationship between breast cancer in women and oral contraceptives


Computation of Kaplan-Meier Estimate
of Survival (Actuarial Estimate)

Time points t1, t2, and t3

E1: event of surviving from 0 to t1;
E2: event of surviving from t1 to t2;
E3: event of surviving from t2 to t3;
E1∩E2∩E3: event of surviving from 0 to t3.
By conditional probability,
P(E1∩E2∩E3) = P(E3|E1∩E2)P(E1∩E2)
            = P(E3|E1∩E2)P(E2|E1)P(E1)


Computation of Kaplan-Meier Estimate of
Survival (Actuarial Estimate)
• Divide time into intervals at the time points where the pre-defined event (death) occurred.
• For each interval, count the number of patients who were alive at the beginning of the interval and the number who were still alive at the end of the interval.
• Compute the survival rate for each interval as the number of patients still alive at the end of the interval divided by the number of patients alive at the beginning of the interval.
• At a time point where the pre-defined event occurred, the Kaplan-Meier estimate is the product of the survival rates of all preceding intervals and the present one.


Computation of Kaplan-Meier Survival

Ŝ[y(k)] = P(alive at y(k))
= P(alive through y(1), y(2), ..., y(k-1), y(k))
= P(alive at y(k) | alive through y(1), y(2), ..., y(k-1))
  × P(alive through y(1), y(2), ..., y(k-1))
= P(alive at y(k) | alive through y(1), y(2), ..., y(k-1))
  × P(alive at y(k-1) | alive through y(1), y(2), ..., y(k-2))
  × ... × P(alive at y(2) | alive at y(1)) × P(alive at y(1)).


Time in Months to Progression of the Patients with Stage II or IIIA
Ovarian Carcinoma by Low-grade or Well-differentiated Cancer

Patient Number   Time in Months   Death (non-censored)   Cell Grade
 1                 0.92            Yes                    Low Grade
 2                 2.93            Yes                    Low Grade
 3                 5.76            Yes                    Low Grade
 4                 6.41            Yes                    Low Grade
 5                10.16            Yes                    Low Grade
 6                12.40            No                     Low Grade
 7                12.93            No                     Low Grade
 8                13.85            No                     Low Grade
 9                14.70            No                     Low Grade
10                15.20            Yes                    Low Grade
11                23.32            No                     Low Grade
12                24.47            No                     Low Grade
13                25.33            No                     Low Grade
14                36.38            No                     Low Grade
15                39.67            No                     Low Grade
16                 1.12            Yes                    High Grade
17                 2.89            Yes                    High Grade
18                 4.51            Yes                    High Grade
19                 6.55            Yes                    High Grade
20                 9.21            Yes                    High Grade
21                 9.57            Yes                    High Grade
22                 9.84            No                     High Grade
23                 9.87            No                     High Grade
24                10.16            Yes                    High Grade
25                11.55            Yes                    High Grade
26                11.78            Yes                    High Grade
27                12.14            Yes                    High Grade
28                12.14            Yes                    High Grade
29                12.17            Yes                    High Grade
30                12.34            Yes                    High Grade
31                12.57            Yes                    High Grade
32                12.89            Yes                    High Grade
33                14.11            Yes                    High Grade
34                14.84            Yes                    High Grade
35                36.81            No                     High Grade

Source: Fleming, et al. (1980)


Data Layout for Computation of Kaplan-Meier Estimates of
Survival Function

Ordered Distinct   Number of   Number Censored in   Number in   S(y)
Event Time         Events      [y(k), y(k+1)]       Risk Set
y(0) = 0           d0 = 0      m0                   n0          1
y(1)               d1          m1                   n1          1 - d1/n1
y(2)               d2          m2                   n2          (1 - d1/n1)(1 - d2/n2)
y(k)               dk          mk                   nk          (1 - d1/n1)(1 - d2/n2)…(1 - dk/nk)


Computation of Kaplan-Meier Estimates of Survival Function for
Patients with Low-grade Cancer

Ordered Distinct    Number of   Number Censored in   Number in   S(y)
Progression Time    Events      [y(k), y(k+1)]       Risk Set
 0                  0           0                    15          1
 0.92               1           0                    15          0.9333
 2.93               1           0                    14          0.8667
 5.76               1           0                    13          0.8000
 6.41               1           0                    12          0.7333
10.16               1           4                    11          0.6667
15.20               1           1                     6          0.5556


Example

• K-M estimate of survival at 15.20 months or longer

$$\hat{S}(15.20 \text{ months or longer}) = \left(\frac{14}{15}\right)\left(\frac{13}{14}\right)\left(\frac{12}{13}\right)\left(\frac{11}{12}\right)\left(\frac{10}{11}\right)\left(\frac{5}{6}\right) = 0.5556$$
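The product-limit calculation can be sketched in a few lines: at each distinct event time, count the patients still at risk and the events, and multiply the running survival estimate by (1 - d/n). The data below are the 15 low-grade patients from the table; the function name `kaplan_meier` and the tuple layout are illustrative, and this is a bare-bones sketch without confidence intervals or tie-handling refinements.

```python
# (time in months, event observed?) for the 15 low-grade patients.
low_grade = [
    (0.92, True), (2.93, True), (5.76, True), (6.41, True), (10.16, True),
    (12.40, False), (12.93, False), (13.85, False), (14.70, False),
    (15.20, True), (23.32, False), (24.47, False), (25.33, False),
    (36.38, False), (39.67, False),
]


def kaplan_meier(data):
    """Return [(event_time, n_at_risk, n_events, survival)] per distinct event time."""
    event_times = sorted({t for t, observed in data if observed})
    survival = 1.0
    rows = []
    for t in event_times:
        # Risk set: everyone whose follow-up time is at least t.
        n_at_risk = sum(1 for time, _ in data if time >= t)
        n_events = sum(1 for time, observed in data if time == t and observed)
        survival *= 1 - n_events / n_at_risk
        rows.append((t, n_at_risk, n_events, survival))
    return rows


km_rows = kaplan_meier(low_grade)
for t, n, d, s in km_rows:
    print(f"{t:6.2f}  at risk = {n:2d}  events = {d}  S = {s:.4f}")
```

Censored patients leave the risk set without contributing a factor, which is why the risk set drops from 11 to 6 between 10.16 and 15.20 months.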


Summary

Probability
Concept of probability
Sample space and events
0 ≤ P(E) ≤ 1
Addition rule
P(A∪B) = P(A) + P(B) - P(A∩B)
Mutually exclusive events: P(A∩B) = 0


Summary

Probability: conditional probability
P(A|B) = P(A∩B)/P(B)
Independent events: P(A|B) = P(A);
P(B|A) = P(B);
P(A∩B) = P(A)·P(B)
Bayes' theorem
Applications
Diagnosis, odds ratio, survival probability



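Page    Work    License terms    Author / Source

1-51    Reproduced from the Microsoft Office 2003 Media Gallery; used under fair use in accordance with the Microsoft Services Agreement and Articles 46, 52, and 65 of the Copyright Act.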
