Customer Analytics at Bigbasket

– Product Recommendations
Association Rule Mining
• Given a set of transactions, find rules that will predict the occurrence
of an item based on the occurrences of other items in the transaction

Market-Basket transactions

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Example of Association Rules
{Diaper} → {Beer},
{Milk, Bread} → {Eggs, Coke},
{Beer, Bread} → {Milk},

Implication means co-occurrence, not causality!
Definition: Frequent Itemset
• Itemset
• A collection of one or more items
• Example: {Milk, Bread, Diaper}
• k-itemset
• An itemset that contains k items
• Support count (σ)
• Frequency of occurrence of an itemset
• E.g. σ({Milk, Bread, Diaper}) = 2 (transactions 4 and 5 of the market-basket data)
• Support
• Fraction of transactions that contain an itemset
• E.g. s({Milk, Bread, Diaper}) = 2/5
• Frequent Itemset
• An itemset whose support is greater than or equal to a minsup threshold
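
The definitions above can be checked directly on the five market-basket transactions. The following is a minimal sketch; the names `TRANSACTIONS`, `support_count`, `support` and `is_frequent` are illustrative choices, not part of the slides.

```python
# Market-basket transactions from the slides, keyed by TID.
TRANSACTIONS = {
    1: {"Bread", "Milk"},
    2: {"Bread", "Diaper", "Beer", "Eggs"},
    3: {"Milk", "Diaper", "Beer", "Coke"},
    4: {"Bread", "Milk", "Diaper", "Beer"},
    5: {"Bread", "Milk", "Diaper", "Coke"},
}


def support_count(itemset, transactions=TRANSACTIONS):
    """sigma(itemset): number of transactions that contain every item."""
    items = set(itemset)
    return sum(1 for t in transactions.values() if items <= t)


def support(itemset, transactions=TRANSACTIONS):
    """s(itemset): fraction of transactions that contain the itemset."""
    return support_count(itemset, transactions) / len(transactions)


def is_frequent(itemset, minsup, transactions=TRANSACTIONS):
    """An itemset is frequent when its support is at least minsup."""
    return support(itemset, transactions) >= minsup


print(support_count({"Milk", "Bread", "Diaper"}))  # 2
print(support({"Milk", "Bread", "Diaper"}))        # 0.4
```

With a minsup of 0.6, for instance, {Milk, Diaper} (support 3/5) counts as frequent while {Milk, Bread, Diaper} (support 2/5) does not.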
Definition: Association Rule
• Association Rule
– An implication expression of the form X → Y, where X and Y are itemsets
– Example: {Milk, Diaper} → {Beer}
• Rule Evaluation Metrics
– Support (s)
◆ Fraction of transactions that contain both X and Y
– Confidence (c)
◆ Measures how often items in Y appear in transactions that contain X
• Example: {Milk, Diaper} → {Beer}, over the five market-basket transactions (|T| = 5)
s = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4
c = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 ≈ 0.67
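
As a quick check of the two metrics, the sketch below recomputes support and confidence for {Milk, Diaper} → {Beer} on the market-basket data. The function name `rule_metrics` is a hypothetical helper introduced for illustration.

```python
# The five market-basket transactions from the slides.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def rule_metrics(antecedent, consequent, transactions):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    x, y = set(antecedent), set(consequent)
    n_xy = sum(1 for t in transactions if (x | y) <= t)  # sigma(X u Y)
    n_x = sum(1 for t in transactions if x <= t)         # sigma(X)
    s = n_xy / len(transactions)
    c = n_xy / n_x if n_x else 0.0  # confidence is undefined when X never occurs
    return s, c


s, c = rule_metrics({"Milk", "Diaper"}, {"Beer"}, TRANSACTIONS)
print(s, round(c, 2))  # 0.4 0.67
```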
Association Rule Mining Task
• Given a set of transactions T, the goal of association rule mining is to
find all rules having
• support ≥ minsup threshold
• confidence ≥ minconf threshold
• Brute-force approach:
• List all possible association rules
• Compute the support and confidence for each rule
• Prune rules that fail the minsup and minconf thresholds
⇒ Computationally prohibitive!
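
To give a sense of scale for the brute-force approach, the sketch below simply enumerates every rule X → Y (X and Y non-empty and disjoint) over the six items of the market-basket example. The enumeration strategy and the name `all_rules` are illustrative; for d items the count equals 3^d − 2^(d+1) + 1, so it grows exponentially with the number of items.

```python
from itertools import product

ITEMS = ["Bread", "Milk", "Diaper", "Beer", "Eggs", "Coke"]


def all_rules(items):
    """Yield every rule X -> Y with X and Y non-empty and disjoint."""
    # Each item is either unused (0), in the antecedent X (1) or in the consequent Y (2).
    for roles in product((0, 1, 2), repeat=len(items)):
        x = frozenset(i for i, r in zip(items, roles) if r == 1)
        y = frozenset(i for i, r in zip(items, roles) if r == 2)
        if x and y:
            yield x, y


n_rules = sum(1 for _ in all_rules(ITEMS))
print(n_rules)  # 602
```

Each of these candidate rules would then need its own support and confidence computation, which is what makes the approach impractical for realistic item catalogues.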
Mining Association Rules
• Two-step approach:
1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup

2. Rule Generation
– Generate high confidence rules from each frequent itemset, where each rule is a binary
partitioning of a frequent itemset
Frequent Itemset Generation: Apriori
• Apriori principle:
• If an itemset is frequent, then all of its subsets must also be frequent

• Apriori principle holds due to the following property of the support measure:

∀X, Y : (X ⊆ Y) ⇒ s(X) ≥ s(Y)
• Support of an itemset never exceeds the support of its subsets
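
The anti-monotone property can be verified exhaustively on a small dataset. The sketch below, which assumes the five market-basket transactions used earlier in the slides, checks every pair X ⊆ Y of non-empty itemsets and collects any pair where s(X) < s(Y); the variable names are illustrative.

```python
from itertools import combinations

# The five market-basket transactions from the slides.
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
ITEMS = sorted(set().union(*TRANSACTIONS))


def support(itemset):
    """Fraction of transactions that contain the itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t) / len(TRANSACTIONS)


# All non-empty itemsets over the six items (2^6 - 1 = 63 of them).
itemsets = [
    frozenset(c)
    for k in range(1, len(ITEMS) + 1)
    for c in combinations(ITEMS, k)
]

# A violation would be a pair X subset-of Y with s(X) < s(Y).
violations = [
    (x, y) for x in itemsets for y in itemsets
    if x <= y and support(x) < support(y)
]
print(len(violations))  # 0
```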
Frequent Itemset Generation: Apriori
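[Figure: lattice of all itemsets over the items A, B, C, D, E, with one level per itemset size]

null
A B C D E
AB AC AD AE BC BD BE CD CE DE
ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE
ABCD ABCE ABDE ACDE BCDE
ABCDE

Once an itemset is found to be infrequent, all of its supersets are pruned.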
Frequent Itemset Generation: Apriori
• A level-wise, candidate-generation-and-test approach (Agrawal &
Srikant 1994)

Database D (Min_sup = 2)

TID  Items
10   a, c, d
20   b, c, e
30   a, b, c, e
40   b, e

Scan D: 1-candidates

Itemset  Sup
a        2
b        3
c        3
d        1
e        3

Freq 1-itemsets

Itemset  Sup
a        2
b        3
c        3
e        3

2-candidates: ab, ac, ae, bc, be, ce

Scan D: counting

Itemset  Sup
ab       1
ac       2
ae       1
bc       2
be       3
ce       2

Freq 2-itemsets

Itemset  Sup
ac       2
bc       2
be       3
ce       2

3-candidates: bce

Scan D: Freq 3-itemsets

Itemset  Sup
bce      2
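
The level-wise walk-through above can be reproduced with a short script. This is a minimal sketch rather than the algorithm exactly as published by Agrawal & Srikant: the join step here is a simplification (union of any two frequent (k−1)-itemsets that yields k items), followed by the usual subset-based pruning. The function name `apriori` and its `min_count` parameter (an absolute support count) are illustrative.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: support count} for every frequent itemset."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One scan of the database per level.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    items = {item for t in transactions for item in t}
    singles = count({frozenset([i]) for i in items})
    level = {c: n for c, n in singles.items() if n >= min_count}
    frequent = dict(level)
    k = 2
    while level:
        prev = set(level)
        # Join (simplified): unions of two frequent (k-1)-itemsets with exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: keep a candidate only if all of its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        level = {c: n for c, n in count(candidates).items() if n >= min_count}
        frequent.update(level)
        k += 1
    return frequent


# Database D from the slide, with Min_sup = 2 as an absolute count.
D = [{"a", "c", "d"}, {"b", "c", "e"}, {"a", "b", "c", "e"}, {"b", "e"}]
result = apriori(D, min_count=2)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print("".join(sorted(itemset)), result[itemset])
```

On database D this yields the same frequent itemsets as the tables: a, b, c, e at level 1; ac, bc, be, ce at level 2; and bce at level 3.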
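Apriori: Example

TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke

Minimum Support = 3

Items (1-itemsets)

Item    Count
Bread   4
Coke    2
Milk    4
Beer    3
Diaper  4
Eggs    1

Pairs (2-itemsets)
(No need to generate candidates involving Coke or Eggs)

Itemset          Count
{Bread,Milk}     3
{Bread,Beer}     2
{Bread,Diaper}   3
{Milk,Beer}      2
{Milk,Diaper}    3
{Beer,Diaper}    3

Triplets (3-itemsets)

Itemset               Count
{Bread,Milk,Diaper}   2

Only transactions 4 and 5 contain Bread, Milk and Diaper together, which agrees with the support count of 2 given for this itemset in the frequent-itemset definition. The candidate therefore falls below the minimum support of 3, and no 3-itemset is frequent at this threshold.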
Rule Generation
• Given a frequent itemset L, find all non-empty subsets f ⊂ L such that
f → L – f satisfies the minimum confidence requirement
• If {A,B,C,D} is a frequent itemset, candidate rules:
ABC → D, ABD → C, ACD → B, BCD → A,
A → BCD, B → ACD, C → ABD, D → ABC,
AB → CD, AC → BD, AD → BC, BC → AD,
BD → AC, CD → AB

• If |L| = k, then there are 2^k – 2 candidate association rules (ignoring L → ∅ and ∅ → L)
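
The candidate rules for a frequent itemset can be enumerated as binary partitions and then filtered by confidence. The sketch below is one straightforward way to do this, with hypothetical helper names `candidate_rules` and `generate_rules`; the transactions and the minconf value of 0.6 are assumptions chosen for illustration, reusing the market-basket data from earlier slides.

```python
from itertools import combinations


def candidate_rules(itemset):
    """Yield every rule f -> L - f for non-empty proper subsets f of L."""
    items = sorted(itemset)
    full = frozenset(items)
    for r in range(1, len(items)):
        for lhs in combinations(items, r):
            yield frozenset(lhs), full - frozenset(lhs)


def generate_rules(itemset, transactions, minconf):
    """Keep the candidate rules whose confidence is at least minconf."""
    def sigma(s):
        return sum(1 for t in transactions if s <= t)

    full = frozenset(itemset)
    n_full = sigma(full)
    rules = []
    for lhs, rhs in candidate_rules(full):
        n_lhs = sigma(lhs)
        if n_lhs and n_full / n_lhs >= minconf:
            rules.append((lhs, rhs, n_full / n_lhs))
    return rules


# |L| = 4 gives 2^4 - 2 = 14 candidate rules.
print(len(list(candidate_rules("ABCD"))))  # 14

TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]
kept = generate_rules({"Milk", "Diaper", "Beer"}, TRANSACTIONS, minconf=0.6)
for lhs, rhs, conf in kept:
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))
```

For {Milk, Diaper, Beer}, four of the six candidate rules reach a confidence of 0.6; the rules with only Milk or only Diaper on the left-hand side have confidence 0.5 and are dropped.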
Association Rules: Applications
• Supermarket shelf management
• Goal: to identify items which are bought together (by sufficiently
many customers)
• Approach: process POS data to find dependencies among items
• Example:
• If a customer buys diapers and milk, then they are very likely to buy beer
• So stack six-packs next to the diapers
Cosine Similarity
Item-Item similarity: Cosine similarity
𝑆1 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
𝑆2 = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]
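
The cosine similarity of the two vectors above is their dot product divided by the product of their Euclidean norms. The slides do not state what each position represents; the sketch below assumes each position is a customer (or transaction) and a 1 means the item was purchased. The function name `cosine_similarity` is an illustrative choice.

```python
import math


def cosine_similarity(u, v):
    """cos(u, v) = (u . v) / (|u| * |v|); returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


S1 = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
S2 = [0, 0, 1, 1, 1, 1, 1, 1, 1, 1]

# Dot product = 4 (positions 3 to 6), |S1| = sqrt(6), |S2| = sqrt(8).
print(round(cosine_similarity(S1, S2), 4))  # 0.5774
```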
