
CP610 Data Analysis

Mining Frequent Patterns and Associations: Advanced Methods
Midterm
• Feb 14th 16:00-17:20
• Room BA102
• 20 multiple choice questions
• 5 short-answer questions
• You can bring a one-sided handwritten sheet for reference
• Covers lectures from week 1 to week 5
Advanced Frequent Pattern Mining
• Pattern Mining in Multi-Level, Multi-Dimensional Space
• Mining Multi-Level Association
• Mining Multi-Dimensional Association
• Mining Quantitative Association Rules
• Mining Rare Patterns and Negative Patterns
• Constraint-Based Frequent Pattern Mining
• Mining High-Dimensional Data and Colossal Patterns
• Mining Compressed or Approximate Patterns
• Pattern Exploration and Application
• Summary

Multiple-Level Association Rules
• Patterns or association rules may have items or
concepts residing at high, low, or multiple
abstraction levels.
Mining Multiple-Level Association
Rules: Concept hierarchy
• Concept hierarchy for the items: a sequence of
mappings from a set of low-level concepts to a
higher-level, more general concept set.
Mining Multiple-Level Association
Rules: A support-confidence
framework
• Top-down strategy
• Counts are accumulated for the calculation of frequent
itemsets at each concept level
• Starting at concept level 1 and working downward in the
hierarchy
• For each level, any algorithm for discovering frequent
itemsets may be used, such as Apriori or its variations.
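The top-down strategy can be sketched as follows (hypothetical item names and hierarchy; the small Apriori-style routine stands in for any frequent-itemset algorithm):

```python
from itertools import combinations

# Hypothetical concept hierarchy: leaf item -> level-1 concept
hierarchy = {"2% milk": "milk", "skim milk": "milk",
             "wheat bread": "bread", "white bread": "bread"}

transactions = [
    {"2% milk", "wheat bread"},
    {"skim milk", "wheat bread"},
    {"2% milk", "white bread"},
    {"skim milk"},
]

def frequent_itemsets(transactions, min_sup):
    """Tiny Apriori-style miner: returns {itemset: support count}."""
    items = sorted({i for t in transactions for i in t})
    freq, k = {}, 1
    candidates = [frozenset([i]) for i in items]
    while candidates:
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        frequent_k = [c for c, n in counts.items() if n >= min_sup]
        freq.update({c: counts[c] for c in frequent_k})
        # join step: merge frequent k-itemsets into (k+1)-candidates
        candidates = list({a | b for a in frequent_k for b in frequent_k
                           if len(a | b) == k + 1})
        k += 1
    return freq

def mine_at_level_1(transactions, min_sup):
    """Generalize each transaction to level-1 concepts, then mine.
    Lower levels reuse the same miner on the original items."""
    lifted = [{hierarchy.get(i, i) for i in t} for t in transactions]
    return frequent_itemsets(lifted, min_sup)
```

At level 1 with min_sup = 3, {milk, bread} is frequent; descending a level (typically with a lower threshold) then examines the specialized items themselves.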
Mining Multiple-Level Association Rules: Flexible
support settings

• Flexible support settings
• Items at the lower levels are expected to have lower support
• Exploration of shared multi-level mining (Agrawal & Srikant @VLDB'95, Han & Fu @VLDB'95)

Uniform support: min_sup = 5% at Level 1 and min_sup = 5% at Level 2
Reduced support: min_sup = 5% at Level 1 and min_sup = 3% at Level 2

Example: Milk [support = 10%] at Level 1; 2% Milk [support = 6%] and Skim Milk [support = 4%] at Level 2. Under uniform support, Skim Milk (4% < 5%) is pruned; under reduced support, both 2% Milk and Skim Milk remain frequent.
Multi-level Association: Flexible Support and
Redundancy filtering
• Flexible min-support thresholds: Some items are more valuable but less
frequent
• Use non-uniform, group-based min-support
• E.g., {diamond, watch, camera}: 0.05%; {bread, milk}: 5%; …

• Redundancy Filtering: Some rules may be redundant due to “ancestor”


relationships between items
• milk ⇒ wheat bread [support = 8%, confidence = 70%]
• 2% milk ⇒ wheat bread [support = 2%, confidence = 72%]
The first rule is an ancestor of the second rule

• A rule is redundant if its support is close to the "expected" value, based on the rule's ancestor
• Suppose 1/4 of milk items are 2% milk. Does the second rule provide novel information? Its expected support is 8% × 1/4 = 2%, which equals its actual support, so it adds no new information: the rule is redundant.
Mining Multi-Dimensional Association

• Single-dimensional rules:
buys(X, "milk") ⇒ buys(X, "bread")

• Multi-dimensional rules: ≥ 2 dimensions or predicates
• Inter-dimension association rules (no repeated predicates):
age(X, "19-25") ∧ occupation(X, "student") ⇒ buys(X, "coke")
• Hybrid-dimension association rules (repeated predicates):
age(X, "19-25") ∧ buys(X, "popcorn") ⇒ buys(X, "coke")

• Search for frequent predicate sets rather than frequent itemsets
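A minimal sketch of the frequent-predicate-set search (hypothetical attribute table; each tuple becomes a set of (predicate, value) pairs so that ordinary itemset counting applies):

```python
from itertools import combinations

# Hypothetical relational data: one tuple per customer
rows = [
    {"age": "19-25", "occupation": "student", "buys": "coke"},
    {"age": "19-25", "occupation": "student", "buys": "coke"},
    {"age": "26-35", "occupation": "engineer", "buys": "coke"},
    {"age": "19-25", "occupation": "student", "buys": "chips"},
]

# Each tuple becomes a set of (predicate, value) "items"
predicate_sets = [frozenset(r.items()) for r in rows]

def frequent_predicate_sets(data, min_sup, k):
    """Count every k-subset of predicates; keep those meeting min_sup."""
    counts = {}
    for t in data:
        for combo in combinations(sorted(t), k):
            counts[combo] = counts.get(combo, 0) + 1
    return {c: n for c, n in counts.items() if n >= min_sup}
```

Here the frequent 3-predicate set {age = 19-25, occupation = student, buys = coke} (support 2) corresponds directly to an inter-dimension association rule.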

Mining Quantitative Associations

Methods to discover associations with quantitative attributes


1. Static discretization based on predefined concept hierarchies
2. Dynamic discretization based on data distribution (quantitative rules, e.g., Agrawal & Srikant @SIGMOD'96)
3. Clustering: distance-based association (e.g., Yang & Miller @SIGMOD'97)
   • One-dimensional clustering, then association
4. Statistical theory (e.g., Aumann & Lindell @KDD'99)
   Sex = female ⇒ Wage: mean = $7/hr (overall mean = $9/hr)

Negative and Rare Patterns
• Rare patterns: very low support but interesting
• E.g., buying Rolex watches
• Mining approach: set individualized or special group-based support thresholds for valuable items

• Negative patterns: itemsets X and Y are both frequent but rarely occur together
• E.g., it is unlikely that one buys bread and bagels together

Defining Negative Correlated Patterns (I)
• Definition 1 (support-based)
• If itemsets X and Y are both frequent but rarely occur together, i.e.,
  sup(X ∪ Y) < sup(X) × sup(Y),
  then X and Y are negatively correlated, and X ∪ Y is a negatively correlated pattern.
• If sup(X ∪ Y) ≪ sup(X) × sup(Y),
  then X and Y are strongly negatively correlated, and X ∪ Y is a strongly negatively correlated pattern.

• Problem: A store sold 100 packages of A and 100 packages of B, with only one transaction containing both A and B.
• When there are 200 transactions in total, we have
  s(A ∪ B) = 0.005, s(A) × s(B) = 0.25, so s(A ∪ B) < s(A) × s(B)
• When there are 10^5 transactions, we have
  s(A ∪ B) = 1/10^5, s(A) × s(B) = 1/10^3 × 1/10^3 = 1/10^6, so s(A ∪ B) > s(A) × s(B)
• Where is the problem? Null transactions: the support-based definition is not null-invariant!

Defining Negative Correlated Patterns (II)
• Definition 2 (Kulczynski measure-based): If itemsets X and Y are frequent but (P(X|Y) + P(Y|X))/2 < ε, where ε is a negative-pattern threshold, then X and Y are negatively correlated.
• Ex. For the same package problem, no matter whether there are 200 or 10^5 transactions, if ε = 0.02 we have
  (P(A|B) + P(B|A))/2 = (0.01 + 0.01)/2 = 0.01 < ε
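Both definitions can be checked numerically. In this sketch the Kulczynski value stays at 0.01 no matter how many null transactions are added, while the support-based verdict flips (the threshold ε = 0.02 is an assumed value chosen so the comparison goes through):

```python
def support_based_negative(n_total, n_a, n_b, n_ab):
    """Definition 1: s(A U B) < s(A) * s(B)."""
    return (n_ab / n_total) < (n_a / n_total) * (n_b / n_total)

def kulczynski(n_a, n_b, n_ab):
    """(P(A|B) + P(B|A)) / 2; independent of the total transaction count."""
    return (n_ab / n_b + n_ab / n_a) / 2

# 100 packages of A, 100 of B, one transaction containing both
small = support_based_negative(200, 100, 100, 1)     # True: looks negative
large = support_based_negative(10**5, 100, 100, 1)   # False: verdict flips!
kulc = kulczynski(100, 100, 1)                       # 0.01 in both scenarios
```

Definition 1 changes its answer when null transactions are added, while the Kulczynski measure stays at 0.01 < ε = 0.02, illustrating null-invariance.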

Constraints in Data Mining

• Knowledge type constraint:


• association, classification, etc.
• Data constraint — using SQL-like queries
• find product pairs sold together in stores in Chicago this year
• Dimension/level constraint
• in relevance to region, price, brand, customer category
• Interestingness constraint
• strong rules: min_support ≥ 3%, min_confidence ≥ 60%
• Rule (or pattern) constraint
• small sales (price < $10) triggers big sales (sum > $200)

Meta-Rule Guided Mining
• A meta-rule can be in a rule form with partially instantiated predicates and constants
P1(X, Y) ∧ P2(X, W) ⇒ buys(X, "iPad")
• The resulting rule derived can be
age(X, "15-25") ∧ profession(X, "student") ⇒ buys(X, "iPad")
• In general, it can be in the form
P1 ∧ P2 ∧ … ∧ Pl ⇒ Q1 ∧ Q2 ∧ … ∧ Qr
• Method to find meta-rules:
• Find frequent (l + r)-predicate sets (based on a min-support threshold)
• Push constraints deeply into the mining process when possible (see the remaining discussion of constraint-push techniques)
• Use confidence, correlation, and other measures when possible

Naïve Algorithm: Apriori + Constraint

Database D (TID : items):
100 : 1 3 4
200 : 2 3 5
300 : 1 2 3 5
400 : 2 5

Scan D → C1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
L1: {1}:2, {2}:3, {3}:3, {5}:3

C2: {1 2}, {1 3}, {1 5}, {2 3}, {2 5}, {3 5}
Scan D → counts: {1 2}:1, {1 3}:2, {1 5}:1, {2 3}:2, {2 5}:3, {3 5}:2
L2: {1 3}:2, {2 3}:2, {2 5}:3, {3 5}:2

C3: {2 3 5}; Scan D → L3: {2 3 5}:2
Naïve Algorithm: Apriori + Constraint

The same Apriori run as above, now with the constraint Sum{S.price} < 5: the naïve approach first mines all frequent itemsets (through L3 = {2 3 5}) and only afterwards filters out those that violate the constraint.
How can we use rule constraints
to prune the search space?
• Pruning the pattern search space
• Check candidate patterns and decide whether a pattern can be pruned

• Pruning the data search space
• Check the data set to determine whether a particular piece of data can contribute to the subsequent generation of satisfiable patterns (for a particular pattern) in the remaining mining process
Constraint-Based Frequent Pattern Mining
• Pattern space pruning constraints
• Anti-monotonic: if constraint c is violated, further mining along that branch can be terminated
• Monotonic: if c is satisfied, there is no need to check c again
• Succinct: c must be satisfied, so one can start with only the data sets satisfying c
• Convertible: c is neither monotonic nor anti-monotonic, but it can be converted into one of them if the items in a transaction are properly ordered
• Inconvertible

• Data space pruning constraints

Pattern Space Pruning with Anti-Monotonicity Constraints

• A constraint C is anti-monotone if, whenever a super-pattern satisfies C, all of its sub-patterns do too
• In other words, anti-monotonicity: if an itemset S violates the constraint, so does every superset of S
• Ex. 1: sum(S.price) ≤ v is anti-monotone
• Ex. 2: sum(S.price) ≥ v is not anti-monotone
• Ex. 3: support_count(S) ≥ v is anti-monotone: the core property used in Apriori

• Application: anti-monotone constraints can be pushed into the mining process, e.g., applied at each iteration of Apriori-style algorithms
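A sketch of pushing an anti-monotone constraint into an Apriori-style loop (hypothetical prices and transactions): candidates violating sum(S.price) ≤ v are discarded before counting, and nothing is lost because all their supersets would violate the constraint too.

```python
price = {"a": 10, "b": 20, "c": 30, "d": 40}   # hypothetical item prices

transactions = [{"a", "b"}, {"a", "b", "c"}, {"a", "c"}, {"b", "c", "d"}]

def apriori_with_sum_constraint(transactions, min_sup, max_sum):
    """Level-wise mining with the anti-monotone constraint
    sum(S.price) <= max_sum applied at candidate-generation time."""
    items = sorted({i for t in transactions for i in t})
    freq, k = {}, 1
    candidates = [frozenset([i]) for i in items]
    while candidates:
        # push the constraint: a violating candidate can never recover
        candidates = [c for c in candidates
                      if sum(price[i] for i in c) <= max_sum]
        counts = {c: sum(c <= t for t in transactions) for c in candidates}
        frequent_k = [c for c, n in counts.items() if n >= min_sup]
        freq.update({c: counts[c] for c in frequent_k})
        candidates = list({a | b for a in frequent_k for b in frequent_k
                           if len(a | b) == k + 1})
        k += 1
    return freq
```

With min_sup = 2 and max_sum = 40, the candidate {b, c} (total price 50) is never even counted, although it is frequent, while {a, c} (total price 40) survives.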

Pattern Space Pruning with Monotonicity Constraints

• A constraint C is monotone if, once a pattern satisfies C, we do not need to check C in subsequent mining
• Alternatively, monotonicity: if an itemset S satisfies the constraint, so does every superset of S
• Ex. 1: sum(S.price) ≥ v is monotone
• Ex. 2: min(S.price) ≤ v is monotone
• Ex. 3: range(S.profit) < v is, by contrast, anti-monotone

• Application: no further testing of the constraint is needed on super-itemsets once it is satisfied

Pattern Space Pruning with Succinctness
• Succinctness:
• Given A1, the set of items satisfying a succinct constraint C, any set S satisfying C is based on A1, i.e., S contains a subset belonging to A1
• Idea: whether an itemset S satisfies constraint C can be determined from the selection of items alone, without looking at the transaction database
• min(S.price) > v is succinct
• sum(S.price) > v is not succinct

• Application: if C is succinct, then C is pre-counting pushable; there is no need to check the rule constraint iteratively during the mining process.
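A sketch of pre-counting pushing for the succinct constraint min(S.price) > v (hypothetical prices): since every item of a satisfying itemset must cost more than v, the item universe and the transactions can be filtered once, up front, and any miner then runs on the projection.

```python
price = {"milk": 3, "bread": 2, "camera": 300, "watch": 150}

transactions = [{"milk", "bread"}, {"camera", "watch"},
                {"milk", "camera", "watch"}, {"camera", "watch"}]

def project_for_min_price(transactions, v):
    """min(S.price) > v holds iff every item costs more than v,
    so restrict the data to expensive items before mining."""
    allowed = {i for i, p in price.items() if p > v}
    projected = [t & allowed for t in transactions]
    return [t for t in projected if t]      # drop emptied transactions
```

No per-candidate constraint check is needed afterwards: every itemset mined from the projection satisfies the constraint by construction.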
Constrained Apriori: Push a Succinct Constraint Deep

The same Apriori run as in the naïve-algorithm slide, now with the succinct constraint min{S.price} ≤ 1. Because the constraint is succinct, it can be pushed deep: only candidates that contain at least one item with price ≤ 1 are ever generated and counted, instead of mining all frequent itemsets first and filtering afterwards.
Constraint-Based Mining — A General Picture

Constraint | Anti-monotone | Monotone | Succinct
v ∈ S | no | yes | yes
S ⊇ V | no | yes | yes
S ⊆ V | yes | no | yes
min(S) ≤ v | no | yes | yes
min(S) ≥ v | yes | no | yes
max(S) ≤ v | yes | no | yes
max(S) ≥ v | no | yes | yes
count(S) ≤ v | yes | no | weakly
count(S) ≥ v | no | yes | weakly
sum(S) ≤ v (∀a ∈ S, a ≥ 0) | yes | no | no
sum(S) ≥ v (∀a ∈ S, a ≥ 0) | no | yes | no
range(S) ≤ v | yes | no | no
range(S) ≥ v | no | yes | no
avg(S) θ v, θ ∈ {=, ≤, ≥} | convertible | convertible | no
support(S) ≥ ξ | yes | no | no
support(S) ≤ ξ | no | yes | no
Convertible Constraints: Ordering Data in Transactions

TDB (min_sup = 2):
TID 10: a, b, c, d, f
TID 20: b, c, d, f, g, h
TID 30: a, c, d, e, f
TID 40: c, e, f, g

Item profits: a: 40, b: 0, c: -20, d: 10, e: -30, f: 30, g: 20, h: -10

• Convert tough constraints into anti-monotone or monotone constraints by properly ordering the items
• Examine C: avg(S.profit) ≥ 25
• Order items in value-descending order: <a, f, g, d, b, h, c, e>
• If an itemset afb violates C, so do afbh and every other afb* extension
• The constraint becomes anti-monotone!
Strongly Convertible Constraints

• A constraint that is both convertible monotone and convertible anti-monotone is called strongly convertible
• avg(X) ≥ 25 is convertible anti-monotone w.r.t. the item-value descending order R: <a, f, g, d, b, h, c, e>
• If an itemset af violates constraint C, so does every itemset with af as a prefix, such as afd
• avg(X) ≥ 25 is convertible monotone w.r.t. the item-value ascending order R⁻¹: <e, c, h, b, d, g, f, a>
• If an itemset d satisfies constraint C, so do itemsets df and dfa, which have d as a prefix
• Thus, avg(X) ≥ 25 is strongly convertible

(Item profits: a: 40, b: 0, c: -20, d: 10, e: -30, f: 30, g: 20, h: -10)

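The prefix behavior under the descending order R can be checked directly (profit table from the slide):

```python
profit = {"a": 40, "b": 0, "c": -20, "d": 10, "e": -30,
          "f": 30, "g": 20, "h": -10}

# Item-value descending order R = <a, f, g, d, b, h, c, e>
R = sorted(profit, key=profit.get, reverse=True)

def avg_profit(itemset):
    return sum(profit[i] for i in itemset) / len(itemset)

def prefix_satisfaction(threshold=25):
    """With itemsets grown as prefixes of R, avg(X) >= threshold is
    anti-monotone: once a prefix fails, every longer prefix fails too."""
    flags, prefix = [], []
    for item in R:
        prefix.append(item)
        flags.append(avg_profit(prefix) >= threshold)
    return flags
```

The flags come out [True, True, True, True, False, False, False, False]: <a, f, g, d> is the last prefix whose average is still ≥ 25, and the constraint never turns back on once violated.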
What Constraints Are Convertible?

Constraint | Convertible anti-monotone | Convertible monotone | Strongly convertible
avg(S) ≤ v, ≥ v | yes | yes | yes
median(S) ≤ v, ≥ v | yes | yes | yes
sum(S) ≤ v (items of any value, v ≥ 0) | yes | no | no
sum(S) ≤ v (items of any value, v ≤ 0) | no | yes | no
sum(S) ≥ v (items of any value, v ≥ 0) | no | yes | no
sum(S) ≥ v (items of any value, v ≤ 0) | yes | no | no
……
Constraint-Based Frequent Pattern Mining
• Pattern space pruning constraints
• Data space pruning constraints
• Data succinct: the data space can be pruned at the initial step of the pattern-mining process
• Ex.: patterns must contain "digital camera"
• Data anti-monotonic: if a transaction t does not satisfy c, then t can be pruned from further mining
• Ex.: sum(I.price) ≤ $100

Mining Long Patterns

• Colossal pattern: a frequent pattern of very large size (a long pattern)

• Mining long patterns is needed in bioinformatics,


social network analysis, software engineering, …
• For example, analyze microarray data that contain a large
number of genes (e.g., 10,000 to 100,000) but only a small
number of samples (e.g., 100 to 1000).

Challenges of mining long
patterns
• The curse of the "downward closure" property of frequent patterns:
• Any sub-pattern of a frequent pattern is frequent
• If {a1, a2, …, a100} is frequent, then {a1}, {a2}, …, {a100}, {a1, a2}, {a1, a3}, …, {a1, a100}, {a1, a2, a3}, … are all frequent: about 2^100 frequent itemsets in total!
• Whether searching breadth-first (e.g., Apriori) or depth-first (e.g., FP-growth), the step-by-step "small to large" growing paradigm must examine this many patterns, which leads to combinatorial explosion!
Example Dataset (min_support σ = 20):
T1  = 2 3 4 … 39 40
T2  = 1 3 4 … 39 40
⋮
T40 = 1 2 3 4 … 39
T41 = 41 42 43 … 79
T42 = 41 42 43 … 79
⋮
T60 = 41 42 43 … 79

# of mid-sized (length = 20) closed patterns: C(40, 20) ≈ 1.4 × 10^11
# of long patterns (length = 39): just one, α = {41, 42, …, 79} of size 39
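The counts in this example can be verified directly:

```python
import math

# Mid-sized closed patterns of length 20 over items {1, ..., 40}
mid_sized = math.comb(40, 20)     # 137,846,528,820, roughly 1.4e11

# Long pattern of length 39: just one, alpha = {41, ..., 79}
alpha = set(range(41, 80))
```

So a step-by-step miner would wade through on the order of 10^11 mid-sized patterns before reaching the single colossal one.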

Question: How to find it without generating an exponential number of mid-sized patterns?
Colossal Pattern Set: Small but Interesting

• It is often the case that only a small number of patterns are colossal, i.e., of large size

• Colossal patterns usually carry greater importance than patterns of small size

Mining Colossal Patterns: Philosophy

• No hope for completeness

• Jumping out of the swamp of the mid-sized results

• Striving for mining almost complete colossal patterns

Pattern-Fusion Strategy: Core Patterns

• Core patterns: intuitively, for a frequent pattern α, a sub-pattern β is a τ-core pattern of α if β shares a similar support set with α, i.e.,

  |D_α| / |D_β| ≥ τ,  0 < τ ≤ 1,

where τ is called the core ratio and D_X denotes the set of transactions containing pattern X

Example: Core Patterns
A data set with 400 transactions; let τ = 0.5.
(ab) occurs in the (abe) transactions 100 times and in the (abcef) transactions 100 times, so sup(ab) = 200 and sup(abcef)/sup(ab) = 100/200 = 0.5; hence (ab) is a 0.5-core pattern of (abcef).

Transaction (# of Ts) | Core patterns (τ = 0.5)
(abe)   (100) | (abe), (ab), (be), (ae), (e)
(bcf)   (100) | (bcf), (bc), (bf)
(acf)   (100) | (acf), (ac), (af)
(abcef) (100) | (ab), (ac), (af), (ae), (bc), (bf), (be), (ce), (fe), (e), (abc), (abf), (abe), (ace), (acf), (afe), (bcf), (bce), (bfe), (cfe), (abcf), (abce), (bcfe), (acfe), (abfe), (abcef)
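A sketch that recomputes the τ-core patterns from the grouped data above, checking a few entries of the (abcef) row:

```python
from itertools import combinations

# The 400 transactions, grouped: transaction itemset -> count
groups = {frozenset("abe"): 100, frozenset("bcf"): 100,
          frozenset("acf"): 100, frozenset("abcef"): 100}

def support(pattern):
    return sum(n for t, n in groups.items() if pattern <= t)

def core_patterns(alpha, tau=0.5):
    """All tau-core patterns beta of alpha: sup(alpha)/sup(beta) >= tau."""
    sup_a = support(alpha)
    cores = set()
    for k in range(1, len(alpha) + 1):
        for combo in combinations(sorted(alpha), k):
            beta = frozenset(combo)
            if support(beta) and sup_a / support(beta) >= tau:
                cores.add(beta)
    return cores
```

For α = (abcef): sup(ab) = 200, so (ab) is a core pattern (ratio exactly 0.5), while (c) alone, with sup(c) = 300, falls below the ratio and is excluded.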

Robustness of Core Patterns
• A colossal pattern has far more core
patterns than a small-sized pattern

• A colossal pattern can be generated


by merging a set of core patterns

• A random draw from the complete set of patterns of size c is more likely to pick a core descendant of a colossal pattern
Idea of Pattern-Fusion Algorithm
• Generate a complete set of frequent patterns up to a small
size
• Randomly pick a pattern β; β has a high probability of being a core-descendant of some colossal pattern α
• Identify all of α's core-descendants in this complete set and merge them; this generates a much larger core-descendant of α
• The set of all core patterns of α is bounded in a "ball" of diameter r(τ), where the distance between two patterns is measured on their supporting transaction sets
Pattern-Fusion: The Algorithm

• Initialization (Initial pool): Use an existing algorithm to mine all


frequent patterns up to a small size, e.g., 3
• Iteration (Iterative Pattern Fusion) to find k Colossal patterns:
• At each iteration, k seed patterns are randomly picked from
the current pattern pool
• For each seed pattern, we find all the patterns within a
bounding ball centered at the seed pattern
• All these patterns found are fused together to generate a set
of super-patterns. All the super-patterns thus generated
form a new pool for the next iteration
• Termination: support threshold
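A toy sketch of one fusion step (hypothetical patterns and supporting-transaction sets; the pattern distance here is assumed to be the Jaccard-style distance on support sets, and the seed, fixed here for illustration, would in practice be picked at random):

```python
# Hypothetical initial pool: pattern -> supporting transaction ids
support_sets = {
    frozenset("ab"): {1, 2, 3, 4},
    frozenset("ac"): {1, 2, 3},
    frozenset("bc"): {1, 2, 3, 4},
    frozenset("de"): {7, 8, 9},
    frozenset("df"): {7, 8},
}

def dist(p, q):
    """Distance between two patterns via their supporting sets."""
    a, b = support_sets[p], support_sets[q]
    return 1 - len(a & b) / len(a | b)

def fuse(seed, pool, radius):
    """Fuse the seed with every pool pattern inside its bounding ball,
    producing one (hopefully much larger) core-descendant."""
    ball = [p for p in pool if dist(seed, p) <= radius]
    return frozenset().union(*ball)
```

With seed (ab) and radius 0.5 the ball contains (ab), (ac), (bc), and fusing them yields (abc); repeating this with k random seeds per iteration builds the pool for the next round.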

Why Is Pattern-Fusion Efficient?

• A bounded-breadth pattern
tree traversal

• Ability to identify “short-cuts”


and take “leaps”

Mining Compressed Patterns by
Pattern Clustering
• Clustering is the automatic process of grouping
similar objects together
• The frequent closed patterns are clustered using a tightness measure called δ-cluster.
• A representative pattern is selected for each cluster, thereby offering a compressed version of the set of frequent patterns.
Mining Compressed Patterns:
Distance Measure
• Let P1 and P2 be two closed patterns, with supporting transaction sets T(P1) and T(P2), respectively. The pattern distance of P1 and P2 is defined as

  Pat_Dist(P1, P2) = 1 − |T(P1) ∩ T(P2)| / |T(P1) ∪ T(P2)|

Example: T(P1) = {t1, t2, t3, t4, t5} and T(P2) = {t1, t2, t3, t4, t6}. The distance between P1 and P2 is 1 − 4/6 = 1/3.
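The example computes directly (the pattern distance used here is the Jaccard distance between the two supporting transaction sets):

```python
def pattern_distance(t1, t2):
    """1 - |T(P1) intersect T(P2)| / |T(P1) union T(P2)|."""
    return 1 - len(t1 & t2) / len(t1 | t2)

T1 = {"t1", "t2", "t3", "t4", "t5"}
T2 = {"t1", "t2", "t3", "t4", "t6"}
d = pattern_distance(T1, T2)   # 1 - 4/6 = 1/3
```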
Mining Compressed Patterns:
Expression of patterns
• Given two patterns A and B, we say B can be expressed by A if O(B) ⊆ O(A), where O(A) is the corresponding itemset of pattern A.
• Assume patterns P1, P2, …, Pk are in the same cluster. The representative pattern Pr of the cluster should be able to express all the other patterns in the cluster.
Mining Compressed Patterns :
𝛿 –clustering
• A pattern P is δ-covered by another pattern P′ if O(P) ⊆ O(P′) and the pattern distance of P and P′ is at most δ

• A set of patterns forms a δ-cluster if there exists a representative pattern Pr such that for each pattern P in the set, P is δ-covered by Pr
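A small checker for the δ-cover condition (hypothetical patterns and supporting sets; the representative must be a superset of P and close to it in pattern distance):

```python
def is_delta_covered(p, p_rep, T, delta):
    """P is delta-covered by Pr if O(P) is a subset of O(Pr) and the
    pattern distance between them is at most delta.
    T maps a pattern (frozenset of items) to its transaction-id set."""
    if not p <= p_rep:
        return False
    d = 1 - len(T[p] & T[p_rep]) / len(T[p] | T[p_rep])
    return d <= delta

# Hypothetical closed patterns with their supporting transactions
T = {frozenset("ab"): {1, 2, 3, 4, 5},
     frozenset("abc"): {1, 2, 3, 4}}
```

Here (ab) is 0.25-covered by (abc): the itemset is contained and the distance is 1 − 4/5 = 0.2; with δ = 0.1 the cover fails.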
Pattern compression problem
• Given a transaction database, a minimum support
min_sup, and the cluster quality measure 𝛿
• The pattern compression problem is to find a set of representative patterns R such that
• for each frequent pattern P (with respect to min_sup), there is a representative pattern Pr ∈ R that δ-covers P, and
• the value of |R| is minimized.

A detailed approach: Liu, Guimei, Haojun Zhang, and Limsoon Wong. "A flexible approach to finding representative pattern sets." IEEE
Transactions on Knowledge and Data Engineering (TKDE) 26.7 (2013): 1562-1574.
How to Understand and Interpret Patterns?

• Do they all make sense?
• What do they mean?
• How are they useful?
(Figure: the {diaper, beer} pattern)

Frequent patterns by themselves carry only morphological information and simple statistics; interpreting them requires semantic information.
Not all frequent patterns are useful, only meaningful ones.
→ Annotate patterns with semantic information
A Dictionary Analogy
Word: "pattern" (from Merriam-Webster):
• Non-semantic info.
• Definitions indicating semantics
• Examples of usage
• Synonyms
• Related words
(Figure: analogous semantic annotation of the pattern {frequent, pattern})
Automatic annotation of frequent
patterns: Context Modeling
• A context unit is a basic object in a database, D,
that carries semantic information and co-occurs
with at least one frequent pattern, p, in at least one
transaction in D.
• A context unit can be an item, a pattern, or even a
transaction, depending on the specific task and data.
• The context of a pattern, p, is a selected set of
weighted context units (referred to as context
indicators) in the database
Semantic Analysis with Context Models

• Task 1: Extract the strongest context indicators
• E.g., rank by cosine similarity
• Task 2: Extract representative transactions
• Represent each transaction as a context vector, then measure similarity
• Task 3: Extract semantically similar patterns
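These similarity computations can be sketched as follows (entirely hypothetical context vectors; in practice the indicator weights come from the context-modeling step above):

```python
import math

# Hypothetical context vectors: context indicator -> weight
context = {
    "frequent pattern": {"mining": 0.9, "apriori": 0.7, "support": 0.6},
    "association rule": {"mining": 0.8, "support": 0.7, "confidence": 0.6},
    "neural network":   {"training": 0.9, "backprop": 0.8},
}

def cosine(u, v):
    """Cosine similarity between two sparse weight vectors."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v)
```

In this toy data, "frequent pattern" scores higher against "association rule" than against "neural network", so the former would be extracted as a semantically similar pattern.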


Example: Annotations Generated for Frequent
Patterns in the DBLP Data Set
Summary

• Mining patterns in multi-level, multi-dimensional space


• Mining rare and negative patterns
• Constraint-based pattern mining
• Specialized methods for mining high-dimensional data and
colossal patterns
• Mining compressed or approximate patterns
• Pattern exploration and understanding: Semantic annotation
of frequent patterns

