002association Rule 13 PDF

Association Rules
Apriori Algorithm
郭忠義
jykuo@ntut.edu.tw
Department of Computer Science and Information Engineering, National Taipei University of Technology

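Association Rules
• Used to reduce a potentially large, disorganized body of data to a small amount of data that is easy to inspect and understand.
• Finds which attributes are related to one another.
• A relationship is expressed as "if A, then B", together with the rule's support and confidence.
  - Support: of all records, the proportion that contain both A and B.
  - Confidence: of all records that contain A, the proportion that also contain B.
• Association rule methods include Apriori (unsupervised) and GRI (Generalized Rule Induction).
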
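Association Rules
• One example is Market Basket Analysis.
  - 80% of customers who buy a toner cartridge also buy report paper: {Toner cartridge} → {Report paper}
  - If a customer buys beer, there is a 70% probability that they also buy diapers: {Beer} → {Diaper}
• Form: {condition} → {conclusion}, for example {Milk, Diaper} → {Beer}
• Itemset
  - A collection of one or more items.
• k-itemset
  - An itemset that contains k items.

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Each row is a transaction, each entry in a row (such as Eggs) is an item, and any group of items (such as Milk, Diaper, Beer) is an itemset.
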
Itemset
• Support
  - The percentage of transactions that contain the itemset.
  - Support(A) = P(A)
  - Support(A → B) = Support(A ∩ B) = P(A ∩ B), where A ∩ B means that A and B both appear in the transaction.
  - ex. Using the five transactions in the TID/Items table: Support(Beer → Diaper) = 3/5 = 0.6
• Confidence
  - The percentage of transactions containing A that also contain B.
  - $$\mathrm{Confidence}(A \to B) = P(B \mid A) = \frac{\mathrm{Support}(A \cap B)}{\mathrm{Support}(A)}$$
  - ex. Confidence({Bread, Milk} → {Beer}) = Support(Bread, Milk, Beer) / Support(Bread, Milk) = 1/3 ≈ 0.33

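The two worked examples above can be checked with a short Python sketch. The transaction list is transcribed from the TID/Items table; the fifth row is assumed to be Bread, Milk, Diaper, Coke, which is consistent with the counts 3/5 and 1/3 used in the examples. The names `TRANSACTIONS`, `support`, and `confidence` are illustrative and do not come from the slides.

```python
# Five transactions from the TID/Items table.
# Assumption: row 5 is {Bread, Milk, Diaper, Coke}.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions=TRANSACTIONS):
    """Support of antecedent and consequent together, divided by support of the antecedent."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Beer and Diaper appear together in 3 of the 5 transactions.
print(support({"Beer", "Diaper"}))
# Of the 3 transactions with Bread and Milk, 1 also contains Beer.
print(confidence({"Bread", "Milk"}, {"Beer"}))
```

Representing each transaction as a set keeps the containment test (`itemset <= t`) close to the definition of support.
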
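Lift & Leverage
• Lift
  - How many times more often A and B occur together than would be expected if they were independent. A lift above 1 indicates a positive association, a lift of 1 indicates independence, and a lift below 1 indicates a negative association.
  - $$\mathrm{Lift}(A \to B) = \frac{\mathrm{Support}(A \cap B)}{\mathrm{Support}(A) \cdot \mathrm{Support}(B)}$$
• Leverage
  - The difference between the observed probability that A and B occur together and the probability expected if they were independent.
  - Leverage(A → B) = Support(A ∩ B) − Support(A) * Support(B)
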
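As a rough illustration of the two formulas, the sketch below computes lift and leverage from raw counts. It uses the Diaper, Beer, and Milk counts that this document reports for 1000 records (700, 631, 710, with 544 and 527 joint occurrences); the function names and the count-based signature are assumptions made for the example.

```python
def lift(n_a, n_b, n_ab, n_total):
    """Observed joint support divided by the joint support expected under independence."""
    s_a, s_b, s_ab = n_a / n_total, n_b / n_total, n_ab / n_total
    return s_ab / (s_a * s_b)


def leverage(n_a, n_b, n_ab, n_total):
    """Observed joint support minus the joint support expected under independence."""
    s_a, s_b, s_ab = n_a / n_total, n_b / n_total, n_ab / n_total
    return s_ab - s_a * s_b


# Diaper (700) and Beer (631) occur together in 544 of 1000 records.
print(round(lift(700, 631, 544, 1000), 2))      # 1.23
print(round(leverage(700, 631, 544, 1000), 4))  # 0.1023
# Diaper (700) and Milk (710) occur together in 527 of 1000 records.
print(round(lift(700, 710, 527, 1000), 2))      # 1.06
```

Both measures compare the same two quantities: lift as a ratio, leverage as a difference.
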
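Apriori
• Characteristics
  - Apriori accepts only categorical inputs.
  - Apriori is a way to reduce the number of candidate rules.
  - The biggest problem in association analysis is that the number of rules is too large.
  - If an itemset Z is infrequent, then Z combined with any other items is still infrequent.
  - All nonempty subsets of a frequent itemset must also be frequent.
• Frequent itemset (or large itemset)
  - Among all candidate itemsets (Ck), those whose support ≥ the minimum support.
• Lk (large k-itemset)
  - The set of all frequent k-itemsets.
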
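Apriori
• Apriori steps
  - Find the frequent k-itemsets.
  - Given a minimum support threshold, compute the support of every candidate in C1 (candidate 1-itemsets) and keep those that meet the threshold to obtain L1 (frequent 1-itemsets).
  - Combine L1 with itself to generate the candidate 2-itemsets C2; keep the itemsets in C2 that meet the threshold to obtain L2 (frequent 2-itemsets).
  - Join L2 with L1 to generate the candidate 3-itemsets C3; keep the itemsets in C3 that meet the threshold to obtain L3 (frequent 3-itemsets). (The standard formulation joins L2 with itself; both produce the same frequent itemsets after filtering.)
  - Continue until the largest frequent itemsets Lk are obtained.
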
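The level-by-level procedure can be sketched in plain Python as follows. This is a minimal illustration and not the implementation used in the course: it generates candidates with the common self-join of the frequent (k-1)-itemsets, prunes any candidate that has an infrequent subset, and rescans the data for each support count. The function name, the fractional `min_support` parameter, and the sample data (the five-transaction TID/Items table, with row 5 assumed to be Bread, Milk, Diaper, Coke) are assumptions.

```python
from itertools import combinations


def apriori_frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # C1 -> L1: keep the single items that meet the threshold.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = {}
    k = 2
    while current:
        frequent.update(current)
        prev = set(current)
        # Join step: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # Ck -> Lk: count support and keep the candidates that meet the threshold.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        k += 1
    return frequent


# Assumption: row 5 of the TID/Items table is {Bread, Milk, Diaper, Coke}.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

result = apriori_frequent_itemsets(TRANSACTIONS, min_support=0.6)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With this data and a 0.6 threshold, four single items and four pairs are frequent. The only 3-itemset that survives pruning, {Bread, Milk, Diaper}, appears in just two transactions and is dropped.
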
Example 1
minimum support = 2
[Figure: worked Apriori example. The itemsets that survive from the figure are {2, 3, 6}, {2, 3, 8}, {2, 3, 9}, {3, 6, 8}, and {3, 6, 9}.]

Example 2
minimum confidence = 80%

Frequent itemset  Rule            Antecedent  Count  Itemset         Count  Confidence
Diaper, Beer      Diaper → Beer   Diaper      700    {Diaper, Beer}  544    544/700 = 77.7%
Diaper, Beer      Beer → Diaper   Beer        631    {Beer, Diaper}  544    544/631 = 86.2%
Diaper, Milk      Diaper → Milk   Diaper      700    {Diaper, Milk}  527    527/700 = 75.3%
Diaper, Milk      Milk → Diaper   Milk        710    {Milk, Diaper}  527    527/710 = 74.2%

Only Beer → Diaper reaches the 80% minimum confidence.

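The filtering in this example can be reproduced with a few lines of Python. The counts are taken from the table; the tuple layout and the names `rules`, `confidence`, and `accepted` are illustrative assumptions.

```python
MIN_CONFIDENCE = 0.80

# (antecedent, consequent, count of antecedent, count of antecedent and consequent together)
rules = [
    ("Diaper", "Beer", 700, 544),
    ("Beer", "Diaper", 631, 544),
    ("Diaper", "Milk", 700, 527),
    ("Milk", "Diaper", 710, 527),
]


def confidence(n_antecedent, n_both):
    """Share of the antecedent's records that also contain the consequent."""
    return n_both / n_antecedent


# Keep only the rules whose confidence meets the minimum.
accepted = [
    (a, c, confidence(n_a, n_ab))
    for a, c, n_a, n_ab in rules
    if confidence(n_a, n_ab) >= MIN_CONFIDENCE
]

for a, c, conf in accepted:
    print(f"{a} -> {c}: {conf:.1%}")
```

The two rules built from the same itemset differ only in their denominators, which is why Beer → Diaper passes while Diaper → Beer does not.
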
Example 3
1000 credit records

Frequent itemset  Item A  Count  Item B  Count  Itemset         Count  Lift
Diaper, Beer      Diaper  700    Beer    631    {Diaper, Beer}  544    0.544/(0.7*0.631) = 1.23
Diaper, Milk      Diaper  700    Milk    710    {Diaper, Milk}  527    0.527/(0.7*0.71) = 1.06

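GRI
• Accepts categorical or numeric input values; the output is categorical.
• A rule is measured by its J-measure; the higher the J-measure, the more meaningful the rule.
• GRI steps
  - Decide the minimum support and confidence.
  - Decide how many rules to find in total (denoted n).
  - Find all rules with a single antecedent and compute the J-measure of each.
  - Keep the n rules with the highest J-measure.
  - After the single-antecedent rules, GRI repeats the J-measure computation for more complex rules until all possibilities have been evaluated.
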
J-measure
• Definition of the J-measure

$$J = p(x)\left[\, p(y \mid x)\,\ln\frac{p(y \mid x)}{p(y)} + \bigl(1 - p(y \mid x)\bigr)\,\ln\frac{1 - p(y \mid x)}{1 - p(y)} \right]$$

p(x) is the probability that x occurs.
p(y) is the probability that y occurs.
p(y|x) is the probability that y occurs given that x has occurred.
• In the rule "if firewood is bought, then salt is bought", x is firewood and y is salt.
• Suppose p(x) = 0.6, p(y) = 0.7, and p(y|x) = 0.667. Then

$$J = 0.6\left[\, 0.667\,\ln\frac{0.667}{0.7} + (1 - 0.667)\,\ln\frac{1 - 0.667}{1 - 0.7} \right] \approx 0.001525$$

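The worked value can be checked numerically. The sketch below assumes the usual form of the J-measure, in which the first logarithm compares p(y|x) with p(y); this matches the substituted numbers (0.667 over 0.7). The function name `j_measure` is illustrative.

```python
import math


def j_measure(p_x, p_y, p_y_given_x):
    """J-measure of the rule x -> y; requires 0 < p_y < 1 and 0 < p_y_given_x < 1."""
    # Contribution of the cases where y occurs given x.
    hit = p_y_given_x * math.log(p_y_given_x / p_y)
    # Contribution of the cases where y does not occur given x.
    miss = (1 - p_y_given_x) * math.log((1 - p_y_given_x) / (1 - p_y))
    return p_x * (hit + miss)


# Firewood -> salt example: p(x) = 0.6, p(y) = 0.7, p(y|x) = 0.667.
print(round(j_measure(0.6, 0.7, 0.667), 6))  # about 0.001525
```

The value is close to zero because p(y|x) = 0.667 is nearly the same as p(y) = 0.7, so knowing x says little about y.
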
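Python
import pandas as pd
# Assumption: the slide does not name the library; apriori() from the
# apyori package matches the way it is called here.
from apyori import apriori


def testApriori_s():
    # Each inner list is one transaction. The rows differ in length, so a plain
    # list is used; recent NumPy versions reject ragged input to np.array.
    data = [
        ['Milk', 'Bread', 'Apple'],
        ['Milk', 'Bread'],
        ['Milk', 'Bread', 'Apple', 'Banana'],
        ['Milk', 'Banana', 'Rice', 'Chicken'],
        ['Apple', 'Rice', 'Chicken'],
        ['Milk', 'Bread', 'Banana'],
        ['Rice', 'Chicken'],
        ['Bread', 'Apple', 'Chicken'],
        ['Bread', 'Chicken'],
        ['Apple', 'Banana'],
    ]
    for i in data:
        print(i)
    print("\n\n")
    result = list(apriori(data))  # One record per frequent itemset
    df = pd.DataFrame(result)
    df.to_csv("appriori_results.csv")  # Save to CSV format for a detailed view
    print(df.head())  # Print the first 5 items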
