
Mining Association Rules in Large Databases

Alternative Methods for Frequent Itemset Generation

Traversal of Itemset Lattice

General-to-specific vs Specific-to-general

[Figure: position of the frequent itemset border in the lattice between null and {a1,a2,...,an}: (a) General-to-specific, (b) Specific-to-general, (c) Bidirectional]
Alternative Methods for Frequent Itemset
Generation
 Traversal of Itemset Lattice
Equivalence Classes
[Figure: itemset lattice over {A, B, C, D} partitioned into equivalence classes: (a) Prefix tree, (b) Suffix tree]


Alternative Methods for Frequent Itemset
Generation

 Traversal of Itemset Lattice


 Breadth-first vs Depth-first

[Figure: (a) Breadth-first and (b) Depth-first traversal of the lattice]


ECLAT
 For each item, store a list of transaction ids (tids)

Horizontal Data Layout:
TID  Items
1    A,B,E
2    B,C,D
3    C,E
4    A,C,D
5    A,B,C,D
6    A,E
7    A,B
8    A,B,C
9    A,C,D
10   B

Vertical Data Layout (TID-lists):
A: 1,4,5,6,7,8,9
B: 1,2,5,7,8,10
C: 2,3,4,8,9
D: 2,4,5,9
E: 1,3,6
ECLAT

Determine the support of any k-itemset by intersecting the TID-lists of two of its (k-1)-subsets (see the sketch below).

Example:  A: 1,4,5,6,7,8,9  ∩  B: 1,2,5,7,8,10  →  AB: 1,5,7,8  (support = 4)
 3 traversal approaches:
 top-down, bottom-up and hybrid
 Advantage: very fast support counting
 Disadvantage: intermediate tid-lists may become
too large for memory
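
Below is a minimal Python sketch of the ideas above: building the vertical TID-list layout from the horizontal one and computing support by intersecting TID-lists. It uses the toy database from the slide; the helper names and the simple level-wise enumeration (rather than ECLAT's prefix-class traversal) are illustrative assumptions.

from itertools import combinations

# Toy horizontal database from the slide.
transactions = {
    1: {"A", "B", "E"}, 2: {"B", "C", "D"}, 3: {"C", "E"},
    4: {"A", "C", "D"}, 5: {"A", "B", "C", "D"}, 6: {"A", "E"},
    7: {"A", "B"}, 8: {"A", "B", "C"}, 9: {"A", "C", "D"}, 10: {"B"},
}

# Vertical data layout: one TID-list (here a set) per item.
tidlists = {}
for tid, items in transactions.items():
    for item in items:
        tidlists.setdefault(item, set()).add(tid)

def support(itemset):
    """Support of an itemset = size of the intersection of its items' TID-lists."""
    return len(set.intersection(*(tidlists[i] for i in itemset)))

print(sorted(tidlists["A"] & tidlists["B"]))  # [1, 5, 7, 8]
print(support(("A", "B")))                    # 4

# Simple bottom-up, level-wise enumeration of all frequent itemsets (min_sup = 2).
min_sup = 2
items = sorted(tidlists)
frequent = []
for k in range(1, len(items) + 1):
    level = [c for c in combinations(items, k) if support(c) >= min_sup]
    if not level:
        break
    frequent.extend(level)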
Mining Frequent Closed Patterns:
CLOSET
Flist: list of all frequent items in support-ascending order
  Flist: d-a-f-e-c (Min_sup = 2)

Transaction DB:
TID  Items
10   a, c, d, e, f
20   a, b, e
30   c, e, f
40   a, c, d, f
50   c, e, f

Divide the search space:
  Patterns having d
  Patterns having d but no a, etc.
Find frequent closed patterns recursively:
  Every transaction having d also has c, f, a  ⇒  cfad is a frequent closed pattern (illustrated in the sketch below)
J. Pei, J. Han & R. Mao, "CLOSET: An Efficient Algorithm for Mining Frequent Closed Itemsets", DMKD'00.
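
CLOSET itself mines closed patterns recursively over FP-tree-style projections; the brute-force Python sketch below is only meant to make the notion of a frequent closed itemset concrete on the toy database above (min_sup = 2). All names in it are illustrative.

from itertools import combinations

db = {
    10: {"a", "c", "d", "e", "f"}, 20: {"a", "b", "e"},
    30: {"c", "e", "f"}, 40: {"a", "c", "d", "f"}, 50: {"c", "e", "f"},
}
min_sup = 2

def support(itemset):
    return sum(1 for t in db.values() if itemset <= t)

items = sorted(set().union(*db.values()))
frequent = {frozenset(c): support(frozenset(c))
            for k in range(1, len(items) + 1)
            for c in combinations(items, k)
            if support(frozenset(c)) >= min_sup}

# An itemset is closed if no proper superset has the same support.
closed = {s for s, sup in frequent.items()
          if not any(s < t and frequent[t] == sup for t in frequent)}

print(sorted("".join(sorted(s)) for s in closed))
# 'acdf' (the pattern cfad from the slide) appears here: it is closed with support 2.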
Iceberg Queries
Iceberg query: compute aggregates over one attribute or a set of attributes, but only for those groups whose aggregate value is above a certain threshold.
 Example:
select P.custID, P.itemID, sum(P.qty)
from purchase P
group by P.custID, P.itemID
having sum(P.qty) >= 10
Compute iceberg queries efficiently with an Apriori-style strategy (see the sketch below):
  First compute the lower dimensions
  Then compute higher dimensions only when all the lower ones are above the threshold
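
Below is a minimal Python sketch of that Apriori-style strategy for the query above. The sample purchase rows are made up; the point is that, assuming non-negative quantities, a (custID, itemID) group can reach the threshold only if both of its one-dimensional groups do, so the lower-dimensional aggregates are used to prune before the pair-level aggregation.

from collections import defaultdict

THRESHOLD = 10
purchases = [  # (custID, itemID, qty)
    ("c1", "i1", 6), ("c1", "i1", 5), ("c1", "i2", 2),
    ("c2", "i1", 12), ("c2", "i2", 1), ("c3", "i2", 3),
]

# Pass 1: lower-dimensional aggregates.
by_cust, by_item = defaultdict(int), defaultdict(int)
for cust, item, qty in purchases:
    by_cust[cust] += qty
    by_item[item] += qty

heavy_custs = {c for c, s in by_cust.items() if s >= THRESHOLD}
heavy_items = {i for i, s in by_item.items() if s >= THRESHOLD}

# Pass 2: aggregate only candidate (custID, itemID) pairs.
by_pair = defaultdict(int)
for cust, item, qty in purchases:
    if cust in heavy_custs and item in heavy_items:
        by_pair[(cust, item)] += qty

result = {pair: s for pair, s in by_pair.items() if s >= THRESHOLD}
print(result)  # {('c1', 'i1'): 11, ('c2', 'i1'): 12}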
Multiple-Level Association Rules
Items often form a hierarchy.
Items at the lower level are expected to have lower support.
Rules regarding itemsets at appropriate levels could be quite useful.
A transactional database can be encoded based on dimensions and levels.
We can explore shared multi-level mining.

[Concept hierarchy: Food → milk, bread; milk → skim, full fat; bread → wheat, white; lower levels include brands such as Fraser and Wonder]
Mining Multi-Level Associations
A top-down, progressive deepening approach (see the sketch below):
  First find high-level strong rules:
    milk ⇒ bread [20%, 60%]
  Then find their lower-level "weaker" rules:
    full fat milk ⇒ wheat bread [6%, 50%]
Variations in mining multiple-level association rules:
  Level-crossed association rules:
    full fat milk ⇒ Wonder wheat bread
  Association rules with multiple, alternative hierarchies:
    full fat milk ⇒ Wonder bread
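
A minimal sketch of the progressive-deepening idea: mine the high level of the concept hierarchy with a larger support threshold, then descend only into children of the items found frequent there, using a reduced threshold. The tiny hierarchy, transactions, and thresholds below are invented for illustration.

# Top-down, progressive-deepening multi-level mining: mine each level of the
# concept hierarchy in turn and descend only into the children of items found
# frequent at the level above.

hierarchy = {  # parent -> children
    "milk": ["skim milk", "full fat milk"],
    "bread": ["wheat bread", "white bread"],
}

transactions = [
    {"full fat milk", "wheat bread"}, {"skim milk", "white bread"},
    {"full fat milk", "wheat bread"}, {"full fat milk"},
]

def generalize(t, level_items):
    """Map a transaction's items up to the given level's items."""
    up = {p for p, kids in hierarchy.items() if t & set(kids)}
    return (t | up) & level_items

def frequent(level_items, min_sup):
    counts = {i: sum(1 for t in transactions if i in generalize(t, level_items))
              for i in level_items}
    return {i for i, c in counts.items() if c >= min_sup}

# Level 1: high-level items, higher support threshold.
level1 = frequent(set(hierarchy), min_sup=3)          # {'milk', 'bread'}
# Level 2: only children of frequent level-1 items, lower threshold.
level2_candidates = {c for p in level1 for c in hierarchy[p]}
level2 = frequent(level2_candidates, min_sup=2)       # {'full fat milk', 'wheat bread'}
print(level1, level2)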
Multi-Dimensional Association:
Concepts
Single-dimensional rules:
  buys(X, "milk") ⇒ buys(X, "bread")
Multi-dimensional rules: ≥ 2 dimensions or predicates
  Inter-dimension association rules (no repeated predicates):
    age(X, "19-25") ∧ occupation(X, "student") ⇒ buys(X, "coke")
  Hybrid-dimension association rules (repeated predicates):
    age(X, "19-25") ∧ buys(X, "popcorn") ⇒ buys(X, "coke")
 Categorical Attributes
 finite number of possible values, no ordering among values
 Quantitative Attributes
 numeric, implicit ordering among values
Techniques for Mining Multi-
Dimensional Associations
 Search for frequent k-predicate set:
 Example: {age, occupation, buys} is a 3-predicate set.
 Techniques can be categorized by how age is treated.
1. Using static discretization of quantitative
attributes
Quantitative attributes are statically discretized using predefined concept hierarchies (see the sketch after this list).
2. Quantitative association rules
Quantitative attributes are dynamically discretized into "bins" based on the distribution of the data.
3. Distance-based association rules
 This is a dynamic discretization process that considers
the distance between data points.
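
As a small illustration of technique 1 (static discretization), the Python sketch below maps a quantitative attribute onto predefined concept-hierarchy ranges so that each record becomes a set of predicate "items" ready for ordinary frequent-itemset mining. The age ranges and the sample record are hypothetical.

AGE_HIERARCHY = [          # (low, high, label), predefined ahead of mining
    (0, 18, "age:0-18"),
    (19, 25, "age:19-25"),
    (26, 40, "age:26-40"),
    (41, 120, "age:41+"),
]

def discretize_age(age: int) -> str:
    for low, high, label in AGE_HIERARCHY:
        if low <= age <= high:
            return label
    raise ValueError(f"age {age} outside hierarchy")

# Each record becomes a set of predicate "items", so standard frequent-itemset
# mining (e.g. Apriori) can be applied to the multi-dimensional data.
record = {"age": 22, "occupation": "student", "buys": "coke"}
items = {discretize_age(record["age"]),
         f"occupation:{record['occupation']}",
         f"buys:{record['buys']}"}
print(items)  # {'age:19-25', 'occupation:student', 'buys:coke'}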
Quantitative Association Rules
 Numeric attributes are dynamically discretized
 Such that the confidence or compactness of the rules
mined is maximized.
2-D quantitative association rules: A_quan1 ∧ A_quan2 ⇒ A_cat
Cluster "adjacent" association rules to form general rules using a 2-D grid.
Example:
  age(X, "30-34") ∧ income(X, "24K-48K") ⇒ buys(X, "high resolution TV")
ARCS (Association Rule Clustering
System)
How does ARCS work? (simplified sketch below)
1. Binning
2. Find frequent predicate-sets
3. Clustering
4. Optimize
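
The sketch below walks through a heavily simplified version of these four steps on two quantitative LHS attributes (age, income) and a categorical RHS (buys). The bin widths, toy records, and the naive "merge adjacent frequent cells" clustering are illustrative assumptions only; ARCS itself clusters the frequent cells of the 2-D grid more carefully.

from collections import Counter

records = [(23, 30_000, "coke"), (24, 32_000, "coke"), (31, 45_000, "hdtv"),
           (33, 47_000, "hdtv"), (34, 44_000, "hdtv"), (52, 80_000, "wine")]

AGE_BIN, INC_BIN, MIN_SUP = 5, 10_000, 2

# 1. Binning: map each record to an (age-bin, income-bin) grid cell.
def cell(age, income):
    return (age // AGE_BIN, income // INC_BIN)

# 2. Find frequent predicate-sets: count each (cell, RHS item) combination.
counts = Counter((cell(a, i), item) for a, i, item in records)
frequent = [(c, item) for (c, item), n in counts.items() if n >= MIN_SUP]

# 3. Clustering: greedily merge frequent cells that are adjacent in the grid
#    and predict the same RHS item.
def adjacent(c1, c2):
    return abs(c1[0] - c2[0]) <= 1 and abs(c1[1] - c2[1]) <= 1

clusters = []
for c, item in frequent:
    for cl in clusters:
        if cl["item"] == item and any(adjacent(c, c2) for c2 in cl["cells"]):
            cl["cells"].append(c)
            break
    else:
        clusters.append({"item": item, "cells": [c]})

# 4. Optimize: report one generalized rule per cluster.
for cl in clusters:
    ages = [c[0] * AGE_BIN for c in cl["cells"]]
    incs = [c[1] * INC_BIN for c in cl["cells"]]
    print(f'age(X, "{min(ages)}-{max(ages) + AGE_BIN - 1}") and '
          f'income(X, "{min(incs)}-{max(incs) + INC_BIN - 1}") '
          f'=> buys(X, "{cl["item"]}")')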
Limitations of ARCS
 Only quantitative attributes on LHS of rules.
 Only 2 attributes on LHS. (2D limitation)
 An alternative to ARCS
 Non-grid-based
 equi-depth binning
 clustering based on a measure of partial completeness.
 “Mining Quantitative Association Rules in Large
Relational Tables” by R. Srikant and R. Agrawal.
Interestingness Measurements
 Objective measures
Two popular measurements:
 support; and
 confidence

 Subjective measures
A rule (pattern) is interesting if
 it is unexpected (surprising to the user); and/or
 actionable (the user can do something with it)
Computing Interestingness Measure

Given a rule X ⇒ Y, the information needed to compute rule interestingness can be obtained from a contingency table.

Contingency table for X ⇒ Y:

           Y     ¬Y
  X       f11    f10   |  f1+
  ¬X      f01    f00   |  f0+
          f+1    f+0   |  |T|

  f11: support of X and Y
  f10: support of X and ¬Y
  f01: support of ¬X and Y
  f00: support of ¬X and ¬Y

Used to define various measures: support, confidence, lift, Gini, J-measure, etc.
Drawback of Confidence
            Coffee   ¬Coffee
  Tea         15        5     |  20
  ¬Tea        75        5     |  80
              90       10     | 100

Association rule: Tea ⇒ Coffee

Confidence = P(Coffee | Tea) = 15/20 = 0.75,
but P(Coffee) = 0.9

Although confidence is high, the rule is misleading:
P(Coffee | ¬Tea) = 75/80 = 0.9375
Statistical Independence
 Population of 1000 students
 600 students know how to swim (S)
 700 students know how to bike (B)
 420 students know how to swim and bike (S,B)

P(S ∧ B) = 420/1000 = 0.42
P(S) × P(B) = 0.6 × 0.7 = 0.42

P(S ∧ B) = P(S) × P(B)  ⇒  statistical independence
P(S ∧ B) > P(S) × P(B)  ⇒  positively correlated
P(S ∧ B) < P(S) × P(B)  ⇒  negatively correlated
Statistical-based Measures
Measures that take statistical dependence into account (see the sketch below):

  Lift = P(Y | X) / P(Y)

  Interest = P(X, Y) / ( P(X) P(Y) )

  PS = P(X, Y) - P(X) P(Y)

  φ-coefficient = ( P(X, Y) - P(X) P(Y) ) / sqrt( P(X) [1 - P(X)] P(Y) [1 - P(Y)] )
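
A minimal Python sketch of these measures computed from a 2x2 contingency table, checked against the Tea/Coffee table on the next slide; the function name is illustrative.

from math import sqrt

def measures(f11, f10, f01, f00):
    n = f11 + f10 + f01 + f00
    p_xy = f11 / n                      # P(X, Y)
    p_x = (f11 + f10) / n               # P(X)
    p_y = (f11 + f01) / n               # P(Y)
    lift = (p_xy / p_x) / p_y           # P(Y|X) / P(Y)
    interest = p_xy / (p_x * p_y)       # equal to lift for a 2x2 table
    ps = p_xy - p_x * p_y
    phi = ps / sqrt(p_x * (1 - p_x) * p_y * (1 - p_y))
    return lift, interest, ps, phi

# Tea/Coffee example: f11 = 15, f10 = 5, f01 = 75, f00 = 5
lift, interest, ps, phi = measures(15, 5, 75, 5)
print(round(lift, 4))   # 0.8333  (< 1: Tea and Coffee are negatively associated)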
Example: Lift/Interest
            Coffee   ¬Coffee
  Tea         15        5     |  20
  ¬Tea        75        5     |  80
              90       10     | 100

Association rule: Tea ⇒ Coffee

Confidence = P(Coffee | Tea) = 0.75,
but P(Coffee) = 0.9

Lift = 0.75 / 0.9 = 0.8333  (< 1, therefore negatively associated)
There are lots of measures proposed in the literature.

Some measures are good for certain applications, but not for others.

What criteria should we use to determine whether a measure is good or bad?

What about Apriori-style support-based pruning? How does it affect these measures?
Example: φ-Coefficient

The φ-coefficient is analogous to the correlation coefficient for continuous variables.

Table (a):
         Y    ¬Y
  X     60    10  |  70
  ¬X    10    20  |  30
        70    30  | 100

Table (b):
         Y    ¬Y
  X     20    10  |  30
  ¬X    10    60  |  70
        30    70  | 100

φ(a) = (0.6 - 0.7 × 0.7) / sqrt(0.7 × 0.3 × 0.7 × 0.3) = 0.5238
φ(b) = (0.2 - 0.3 × 0.3) / sqrt(0.7 × 0.3 × 0.7 × 0.3) = 0.5238

The φ-coefficient is the same for both tables.
Properties of a Good Measure

Piatetsky-Shapiro: three properties a good measure M must satisfy:

  M(A,B) = 0 if A and B are statistically independent

  M(A,B) increases monotonically with P(A,B) when P(A) and P(B) remain unchanged

  M(A,B) decreases monotonically with P(A) [or P(B)] when P(A,B) and P(B) [or P(A)] remain unchanged
Constraint-Based Mining
Interactive, exploratory mining of gigabytes of data?
  Could it be real? Yes, by making good use of constraints!
 What kinds of constraints can be used in mining?
 Knowledge type constraint: classification, association,
etc.
 Data constraint: SQL-like queries
 Find product pairs sold together in Vancouver in Dec.’98.
 Dimension/level constraints:
 in relevance to region, price, brand, customer category.
Interestingness constraints:
  strong rules (min_support ≥ 3%, min_confidence ≥ 60%)
Rule constraints:
  cheap item sales (price < $10, left-hand side) trigger big sales (sum > $200, right-hand side)

Rule Constraints in Association Mining


Two kinds of rule constraints:
  Rule form constraints: meta-rule guided mining.
    P(x, y) ∧ Q(x, w) ⇒ takes(x, "database systems")
  Rule (content) constraints: constraint-based query optimization.
    sum(LHS) < 100 ∧ min(LHS) > 20 ∧ count(LHS) > 3 ∧ sum(RHS) > 1000
Constraint-Based Association Query

Database: (1) trans(TID, Itemset), (2) itemInfo(Item, Type, Price)
A constrained association query (CAQ) is of the form {(S1, S2) | C},
  where C is a set of constraints on S1, S2, including a frequency constraint.
A classification of (single-variable) constraints:
  Class constraint: S ⊆ A, e.g. S ⊆ Item
  Domain constraints:
    S θ v, θ ∈ {=, ≠, <, ≤, >, ≥}, e.g. S.Price < 100
    v θ S, θ is ∈ or ∉, e.g. snacks ∉ S.Type
    V θ S, or S θ V, θ ∈ {⊆, ⊂, ⊄, =, ≠}, e.g. {snacks, sodas} ⊆ S.Type
  Aggregation constraint: agg(S) θ v, where agg ∈ {min, max, sum, count, avg} and θ ∈ {=, ≠, <, ≤, >, ≥}
    e.g. count(S1.Type) = 1, avg(S2.Price) ≥ 100
Constrained Association Query
Optimization Problem
Given a CAQ = {(S1, S2) | C}, the algorithm should be:
  Sound: it finds only frequent sets that satisfy the given constraints C.
  Complete: all frequent sets that satisfy the given constraints C are found.
A naïve solution:
  Apply Apriori to find all frequent sets, and then test them for constraint satisfaction one by one.
A better approach:
  Analyze the properties of the constraints comprehensively and push them as deeply as possible inside the frequent set computation.
Anti-monotone and Monotone
Constraints
 A constraint Ca is anti-monotone iff. for
any pattern S not satisfying Ca, none of
the super-patterns of S can satisfy Ca
 A constraint Cm is monotone iff. for any
pattern S satisfying Cm, every super-
pattern of S also satisfies it
Property of Constraints:
Anti-Monotone
 Anti-monotonicity: If a set S violates the
constraint, any superset of S violates the
constraint.
Examples:
  sum(S.Price) ≤ v is anti-monotone
  sum(S.Price) ≥ v is not anti-monotone
Application (see the sketch below):
  Push "sum(S.Price) ≤ 1000" deeply into the iterative frequent-set computation.
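
Below is a minimal, deliberately simplified Python sketch of pushing the anti-monotone constraint sum(S.Price) ≤ 1000 into a level-wise computation: an itemset that violates the constraint is pruned just like an infrequent one, so it never seeds a higher level. The prices and transactions are invented, and the full Apriori candidate join is replaced by plain enumeration over the surviving items for brevity.

from itertools import combinations

prices = {"tv": 900, "dvd": 300, "snack": 5, "soda": 3, "chips": 4}
transactions = [{"tv", "snack"}, {"dvd", "soda", "chips"},
                {"snack", "soda", "chips"}, {"tv", "dvd"}]
MIN_SUP, MAX_SUM = 2, 1000

def support(itemset):
    return sum(1 for t in transactions if set(itemset) <= t)

def satisfies(itemset):  # the anti-monotone constraint sum(S.Price) <= MAX_SUM
    return sum(prices[i] for i in itemset) <= MAX_SUM

all_frequent = []
# Items pruned here (infrequent OR violating the constraint) never reappear
# inside a larger candidate itemset.
survivors = sorted(i for i in prices
                   if support((i,)) >= MIN_SUP and satisfies((i,)))
for k in range(1, len(survivors) + 1):
    level = [c for c in combinations(survivors, k)
             if support(c) >= MIN_SUP and satisfies(c)]
    if not level:
        break
    all_frequent += level

print(all_frequent)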
Characterization of Anti-Monotonicity Constraints

Constraint                      Anti-monotone?
S θ v, θ ∈ {=, ≤, ≥}            yes
v ∈ S                           no
S ⊇ V                           no
S ⊆ V                           yes
min(S) ≤ v                      no
min(S) ≥ v                      yes
max(S) ≤ v                      yes
max(S) ≥ v                      no
count(S) ≤ v                    yes
count(S) ≥ v                    no
sum(S) ≤ v                      yes
sum(S) ≥ v                      no
avg(S) θ v, θ ∈ {=, ≤, ≥}       convertible
(frequency constraint)          (yes)
Succinct Constraint
If a rule constraint is succinct, we can directly generate precisely the sets that satisfy it, before support counting begins.
Example: min(I.Price) ≥ 500, where I is a set of products
  Knowing the price of each product, we can directly evaluate this predicate.
Such constraints are pre-counting prunable (see the sketch below).
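
A minimal sketch of pre-counting pruning with a succinct constraint: for min(S.Price) ≥ v the satisfying itemsets are exactly the non-empty subsets of the items priced at or above v, so they can be generated directly before any support counting. The item names and prices below are invented.

from itertools import combinations

prices = {"laptop": 1200, "camera": 650, "phone": 500, "cable": 15, "case": 30}
V = 500

# Member generating function: items that may appear in a satisfying set.
allowed = [item for item, p in prices.items() if p >= V]

satisfying_sets = [set(c)
                   for k in range(1, len(allowed) + 1)
                   for c in combinations(allowed, k)]
print(satisfying_sets)
# Only these candidates need to be support-counted against the database;
# sets containing 'cable' or 'case' can never satisfy min(price) >= 500.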
Property of Constraints:
Succinctness
Succinctness:
  For any sets S1 and S2 satisfying C, S1 ∪ S2 satisfies C.
  Given A1, the set of size-1 itemsets satisfying C, any set S satisfying C is based on A1, i.e., S contains a subset belonging to A1.
Examples:
  sum(S.Price) ≥ v is not succinct
  min(S.Price) ≤ v is succinct
Optimization:
  If C is succinct, C is pre-counting prunable: whether an itemset satisfies the constraint does not depend on the iterative support counting.
Characterization of Constraints by Succinctness

Constraint                      Succinct?
S θ v, θ ∈ {=, ≤, ≥}            yes
v ∈ S                           yes
S ⊇ V                           yes
S ⊆ V                           yes
S = V                           yes
min(S) ≤ v                      yes
min(S) ≥ v                      yes
min(S) = v                      yes
max(S) ≤ v                      yes
max(S) ≥ v                      yes
max(S) = v                      yes
count(S) ≤ v                    weakly
count(S) ≥ v                    weakly
count(S) = v                    weakly
sum(S) ≤ v                      no
sum(S) ≥ v                      no
sum(S) = v                      no
avg(S) θ v, θ ∈ {=, ≤, ≥}       no
(frequency constraint)          (no)
Convertible Constraint
 Suppose all items in patterns are listed in
a total order R
A constraint C is convertible anti-monotone iff a pattern S satisfying the constraint implies that each suffix of S w.r.t. R also satisfies C.
Example: avg(I.Price) ≤ 500
  If items are added to the itemset sorted by (ascending) price, avg(I.Price) can only increase with the addition of each new item.
Similarly, there are convertible monotone constraints.
Example of Convertible Constraints: avg(S) ≥ v

Let R be the value-descending order over the set of items,
  e.g. I = {9, 8, 6, 4, 3, 1}
avg(S) ≥ v is convertible monotone w.r.t. R (checked in the sketch below):
  If S is a suffix of S1, then avg(S1) ≥ avg(S)
    {8, 4, 3} is a suffix of {9, 8, 4, 3}
    avg({9, 8, 4, 3}) = 6 ≥ avg({8, 4, 3}) = 5
  So if S satisfies avg(S) ≥ v, so does S1
    {8, 4, 3} satisfies avg(S) ≥ 4, so does {9, 8, 4, 3}
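
The small Python sketch below simply re-checks the slide's numbers and the general suffix property under the value-descending order R.

def avg(xs):
    return sum(xs) / len(xs)

R = [9, 8, 6, 4, 3, 1]                 # value-descending order from the slide
v = 4

s, s1 = [8, 4, 3], [9, 8, 4, 3]        # s is a suffix of s1 w.r.t. R
assert s == s1[1:]
print(avg(s1), avg(s))                 # 6.0 5.0
assert avg(s) >= v and avg(s1) >= v    # s satisfies avg >= 4, and so does s1

# More generally: prepending the next item of R (which is >= every item already
# in the suffix) can never lower the average.
for k in range(1, len(R)):
    assert avg(R[k - 1:]) >= avg(R[k:])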
