Lecture 06
Association Mining
Definitions: Association Rule Mining
Given a set of transactions, find rules that will predict the
occurrence of an item based on the occurrences of other items
in the transaction.
It is an example of unsupervised (undirected) data mining.
Example:
Set of transactions:

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Association rules found in the transactions:
{Diaper} ⇒ {Beer}
{Milk, Bread} ⇒ {Eggs, Coke}
{Beer, Bread} ⇒ {Milk}
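As a concrete illustration (a sketch, not library code), the five transactions above can be represented as Python sets, with a helper that measures how often an itemset occurs:

```python
# Sketch: the five example transactions as Python sets, plus a helper
# to compute the support of an itemset (fraction of transactions
# containing every item in the itemset).

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

# For the rule {Diaper} => {Beer}, support counts transactions
# containing both the antecedent and the consequent.
print(support({"Diaper", "Beer"}, transactions))  # 3 of 5 transactions -> 0.6
```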
Applications of association mining in Business enterprises
1. Market Basket Analysis:
Association rules are often used by retail stores to analyze
market basket transactions.
Given a database of customer transactions, where each
transaction is represented as a set of items, the aim is to
find groups of items which are frequently purchased
together, e.g. the beer-and-diaper case.
The discovered association rules can be used by
management to increase the effectiveness (and reduce the
cost) associated with advertising, target marketing,
inventory, and stock location on the floor.
2. Credit Cards / Banking Services: analyzing payments, where
each card/account is represented as a transaction containing
the set of the customer's payments.
Definitions: Association Rule
An association rule is an implication of the form Itemset1 ⇒ Itemset2,
where:
• Itemset1 and Itemset2 are disjoint
• Itemset2 is non-empty
Example (from the transaction database above): {Diaper} ⇒ {Beer}
[Figure: Venn diagram showing customers who buy diapers, customers who buy beer, and customers who buy both]
Types of Association Rules
Definitions
2. Objective measures:
An association rule (pattern) is interesting if it meets or exceeds the required:
a) minimum support; and/or
b) minimum confidence
Further objective measures include:
c) simplicity
d) lift
e) leverage
f) conviction
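The measures beyond support and confidence have standard definitions; the sketch below computes lift, leverage, and conviction for the earlier example rule {Diaper} ⇒ {Beer} (helper names are illustrative, not from any library):

```python
# Illustrative sketch of three objective measures for a rule A => B,
# using the standard definitions:
#   lift(A=>B)       = conf(A=>B) / sup(B)
#   leverage(A=>B)   = sup(A|B) - sup(A) * sup(B)
#   conviction(A=>B) = (1 - sup(B)) / (1 - conf(A=>B))

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def sup(itemset):
    """Support: fraction of transactions containing the itemset."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)

A, B = {"Diaper"}, {"Beer"}
conf = sup(A | B) / sup(A)               # 0.6 / 0.8  = 0.75
lift = conf / sup(B)                     # 0.75 / 0.6 ≈ 1.25
leverage = sup(A | B) - sup(A) * sup(B)  # 0.6 - 0.48 ≈ 0.12
conviction = (1 - sup(B)) / (1 - conf)   # 0.4 / 0.25 ≈ 1.6
```

A lift above 1 (here ≈ 1.25) indicates that buying diapers and buying beer occur together more often than expected if they were independent.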
Lecture Notes for data mining
Rule Evaluation Metrics: Example
Determine the support and confidence of the following rule over the
five-transaction database above:
{Milk, Diaper} ⇒ {Beer}

Solution:
s = σ(Milk, Diaper, Beer) / |T| = 2/5 = 0.4
c = σ(Milk, Diaper, Beer) / σ(Milk, Diaper) = 2/3 ≈ 0.67
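The same computation can be checked in code (a sketch using the five transactions from the example):

```python
# Verifying support and confidence of {Milk, Diaper} => {Beer}.

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions)

s = count({"Milk", "Diaper", "Beer"}) / len(transactions)          # 2/5 = 0.4
c = count({"Milk", "Diaper", "Beer"}) / count({"Milk", "Diaper"})  # 2/3 ≈ 0.67
print(s, round(c, 2))  # 0.4 0.67
```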
Rule Evaluation Metrics: Example
Given a minimum support of 50% and a minimum confidence of 50%,
find all rules of the form {X, Y} ⇒ Z that meet the minimum support and confidence.
c) Simplicity
A simple rule has smaller itemsets.
With smaller itemsets, the rule is easier to interpret.
The length of a rule can be limited by a user-defined threshold.
Example:
Consider the following association rule between cocoa powder and other items:
buys(cocoa powder) ⇒ buys(bread, milk, salt)
Simpler rules would be:
buys(cocoa powder) ⇒ buys(milk)
buys(cocoa powder) ⇒ buys(bread)
buys(cocoa powder) ⇒ buys(salt)
Rule Evaluation Metrics: Other Objective Measures
Achieving the Association Rule Mining Goal
Association rule mining is achieved in two steps: frequent itemset
generation and rule generation. These two tasks are executed
iteratively until no new rules emerge.
1. Generating frequent itemsets
Example
Given a minimum support count of 2, find the frequent itemsets in
the following transaction database:

TID  Products
1    A, B, E
2    B, D
3    B, C
4    A, B, D
5    A, C
6    B, C
7    A, C
8    A, B, C, E
9    A, B, C

Solution:
• sup({A, B, E}) = 2
• sup({B, C}) = 4
• and others
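The solution can be verified with a brute-force sketch that enumerates every candidate itemset and keeps those whose support count is at least 2:

```python
from itertools import combinations

# Brute-force check of the example above: enumerate all possible
# itemsets over the item universe and keep those with support count >= 2.
# (Illustrative sketch, not an efficient algorithm.)

transactions = [
    {"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"},
    {"A", "C"}, {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"},
    {"A", "B", "C"},
]

def sup_count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(set(itemset) <= t for t in transactions)

items = sorted({i for t in transactions for i in t})
frequent = {
    frozenset(c): sup_count(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
    if sup_count(c) >= 2
}

print(frequent[frozenset({"A", "B", "E"})])  # 2
print(frequent[frozenset({"B", "C"})])       # 4
```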
1. Generating frequent itemsets
Apriori principle
This is a rule used in association mining algorithms to create
frequent itemsets from a given set of itemsets.
The Apriori principle states that if an itemset is frequent, then all of
its subsets must also be frequent.
The Apriori principle holds due to the following property of the
support measure:
∀X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)
where X is a subset of itemset Y.
The support of an itemset never exceeds the support of its subsets.
This is known as the anti-monotone property of support
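The anti-monotone property can be checked numerically on the earlier five-transaction example (illustrative sketch):

```python
from itertools import combinations

# Numerical check of the anti-monotone property: for the itemset
# Y = {Milk, Diaper, Beer}, every proper subset X satisfies s(X) >= s(Y).

transactions = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]

def s(itemset):
    """Support of an itemset."""
    return sum(set(itemset) <= t for t in transactions) / len(transactions)

Y = {"Milk", "Diaper", "Beer"}
for k in range(1, len(Y)):
    for X in combinations(Y, k):
        # Adding items can only shrink (or preserve) support.
        assert s(X) >= s(Y), (X, Y)
print("anti-monotone property holds for all subsets of", sorted(Y))
```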
Brute-force Approach
Steps
1. Each itemset in the transaction database is a candidate
frequent itemset (i.e., minsup count = 1).
2. List all possible association rules.
3. Compute the support and confidence for each rule.
4. Prune rules that fail the minsup and minconf thresholds.
Brute-force approach: Example
Steps
i. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup
Apriori Algorithm
Apriori is a Latin word that means "from what comes before".
The algorithm is called Apriori because it uses knowledge from the
previous iteration to produce frequent itemsets.
It attempts to find subsets which are common (frequent) in a
given database of itemsets (e.g., collections of items bought by
customers).
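A minimal level-wise sketch of the idea (illustrative only, not WEKA's or any production implementation): join frequent (k−1)-itemsets into k-itemset candidates, prune any candidate that has an infrequent subset, then count support.

```python
from itertools import combinations

def apriori(transactions, minsup_count):
    """Minimal level-wise Apriori sketch: generate candidates by joining
    frequent (k-1)-itemsets, prune candidates with an infrequent subset
    (Apriori principle), then keep those meeting the min support count."""
    transactions = [set(t) for t in transactions]

    def count(itemset):
        return sum(itemset <= t for t in transactions)

    items = {i for t in transactions for i in t}
    current = [s for s in (frozenset([i]) for i in items)
               if count(s) >= minsup_count]
    frequent = {s: count(s) for s in current}
    k = 2
    while current:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [c for c in candidates
                      if all(frozenset(s) in frequent
                             for s in combinations(c, k - 1))]
        current = [c for c in candidates if count(c) >= minsup_count]
        frequent.update((c, count(c)) for c in current)
        k += 1
    return frequent

# The nine-transaction example with minimum support count 2:
db = [{"A", "B", "E"}, {"B", "D"}, {"B", "C"}, {"A", "B", "D"}, {"A", "C"},
      {"B", "C"}, {"A", "C"}, {"A", "B", "C", "E"}, {"A", "B", "C"}]
result = apriori(db, 2)
print(result[frozenset({"A", "B", "E"})])  # 2
```

On this database the sketch reproduces the earlier example: sup({A, B, E}) = 2 and sup({B, C}) = 4.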
2. Two-step Approach: Example
A ⇒ B is an association rule if Confidence(A ⇒ B) ≥ minConf.

Triplets (3-itemsets):
Itemset                  Count
{Bread, Milk, Diaper}    3
Lift = 1.1

Sample run summary (WEKA Apriori):
Number of cycles performed: 13
Size of set of large itemsets L(1): 22 (one-item sets)
Size of set of large itemsets L(2): 36 (two-item sets)
Size of set of large itemsets L(3): 3 (three-item sets)
Association rule mining Parameters
Usually, Apriori in WEKA starts with an upper bound on support and
incrementally decreases it (by delta increments, 0.05 or 5% by default).
The algorithm halts when either the specified number of rules has been
generated or the lower bound for minimum support is reached.
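That control loop can be sketched schematically; here `mine_rules` is a stand-in for the actual rule miner, and the parameter names are illustrative, not WEKA's real API:

```python
# Schematic sketch of the decreasing-support control loop described above.
# `mine_rules` stands in for a rule miner; names are illustrative only.

def mine_with_decreasing_support(mine_rules, upper=1.0, lower=0.1,
                                 delta=0.05, num_rules=10):
    support = upper
    while True:
        rules = mine_rules(support)
        # Halt when enough rules are found, or the next step would
        # pass the lower support bound.
        if len(rules) >= num_rules or support - delta < lower:
            return support, rules
        support = round(support - delta, 10)

# Stub miner: pretend each 0.05 drop in support yields two more rules.
stub = lambda s: list(range(int(round((1.0 - s) / 0.05)) * 2))
final_support, rules = mine_with_decreasing_support(stub)
print(final_support, len(rules))  # 0.75 10
```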
Association Rules vs. Classification Rules

Association Rules                         Classification Rules
1. Many target fields                     1. Focus on one target field
2. Applicable in some cases               2. Specify class in all cases
3. Measures: support, confidence,         3. Measures: accuracy, coverage
   lift, leverage, conviction
Lab exercise
Thank you
Questions