MINING FREQUENT PATTERNS, ASSOCIATIONS AND CORRELATIONS
September 3, 2022 Data Mining: Concepts and Techniques 4
Generate association rules
Support(A=>B) = P(A ∪ B)
- the probability that a transaction contains both A and B (the union of A and B).
Confidence(A=>B) = P(B|A)
- = sup_count(A ∪ B) / sup_count(A)
- measures how often item B appears in transactions that contain A.
- These are the two interestingness measures used to generate association rules.
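As a sketch, the two measures can be computed over a toy transaction database (the transactions and items below are hypothetical, not from the slides):

```python
# Hypothetical toy transactions; A = {beer}, B = {diaper} are illustrative.
transactions = [
    {"beer", "diaper", "milk"},
    {"beer", "diaper"},
    {"beer", "milk"},
    {"diaper", "milk"},
]

def support(itemset, db):
    """P(itemset) = fraction of transactions containing every item."""
    return sum(itemset <= t for t in db) / len(db)

def confidence(a, b, db):
    """P(B|A) = sup_count(A u B) / sup_count(A)."""
    return support(a | b, db) / support(a, db)

A, B = {"beer"}, {"diaper"}
print(support(A | B, transactions))   # 2/4 = 0.5
print(confidence(A, B, transactions)) # 2/3 ≈ 0.667
```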
Main Steps:
Join step
Pruning step
3rd scan:
C3: Itemset = {B, C, E}
L3: Itemset = {B, C, E}, sup = 2
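The join and prune steps can be sketched as follows; L2 below is assumed from the running example, in which only {B, C, E} survives into C3:

```python
from itertools import combinations

# L2 from the running example (min_sup = 2); itemsets kept sorted.
L2 = [("A", "C"), ("B", "C"), ("B", "E"), ("C", "E")]

def apriori_gen(Lk):
    """Join step: merge pairs agreeing on all but the last item;
    prune step: drop candidates with an infrequent (k-1)-subset."""
    prev = set(Lk)
    k = len(Lk[0]) + 1
    candidates = []
    for p, q in combinations(sorted(Lk), 2):
        if p[:-1] == q[:-1]:                      # join condition
            cand = p + (q[-1],)
            # prune: every (k-1)-subset must already be frequent
            if all(sub in prev for sub in combinations(cand, k - 1)):
                candidates.append(cand)
    return candidates

print(apriori_gen(L2))  # [('B', 'C', 'E')]
```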
Cont...
Calculate confidence for the final frequent itemset {B, C, E} (sup = 2).
Possible antecedents: {B}, {C}, {E}, {B,C}, {B,E}, {C,E}.
B => C ∧ E ; sup{B} = 3 ; 2/3 = 0.66 = 66%
C => B ∧ E ; sup{C} = 3 ; 2/3 = 0.66 = 66%
E => B ∧ C ; sup{E} = 3 ; 2/3 = 0.66 = 66%
B ∧ C => E ; sup{B,C} = 2 ; 2/2 = 1 = 100%
B ∧ E => C ; sup{B,E} = 3 ; 2/3 = 0.66 = 66%
C ∧ E => B ; sup{C,E} = 2 ; 2/2 = 1 = 100%
Average confidence = (66+66+66+100+66+100)/6 = 464/6 ≈ 77.3% (given minimum support = 2)
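A minimal sketch of this rule-generation step, using the support counts implied by the fractions above. Note that with exact fractions the average is 77.8%; the 77.3% figure comes from rounding each 2/3 to 66% before averaging:

```python
from itertools import combinations

# Support counts taken from the worked example (sup{B,C,E} = 2).
sup = {frozenset("B"): 3, frozenset("C"): 3, frozenset("E"): 3,
       frozenset("BC"): 2, frozenset("BE"): 3, frozenset("CE"): 2,
       frozenset("BCE"): 2}

itemset = frozenset("BCE")
rules = []
for r in range(1, len(itemset)):
    for ante in combinations(sorted(itemset), r):
        a = frozenset(ante)
        conf = sup[itemset] / sup[a]      # conf(A => B) = sup(A u B)/sup(A)
        rules.append((set(a), set(itemset - a), conf))
        print(f"{set(a)} => {set(itemset - a)}: {conf:.0%}")

avg = sum(c for *_, c in rules) / len(rules)
print(f"average confidence = {avg:.1%}")  # 77.8% with exact fractions
```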
Completeness
Preserve complete information for frequent pattern
mining
Never break a long pattern of any transaction
Compactness
Reduce irrelevant info—infrequent items are gone
[Figure: the global FP-tree with Min_sup = 3. Header Table (item : frequency): f:4, c:4, a:3, b:3, m:3, p:3. The prefix paths of m form the m-conditional pattern base, from which the m-conditional FP-tree is built.]
All frequent patterns relating to m: m, fm, cm, am, fcm, fam, cam, fcam
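A sketch of the final step for item m, assuming the classic example's m-conditional pattern base (fca:2, fcab:1, read off the prefix paths in the figure). Because the resulting m-conditional FP-tree is a single path, every combination of its items suffixed with m is frequent:

```python
from itertools import combinations

# m-conditional pattern base (prefix paths ending at m, with counts);
# values assumed from the classic example in the figure.
cond_base = {('f', 'c', 'a'): 2, ('f', 'c', 'a', 'b'): 1}
min_sup = 3

# Count item support within the conditional base.
support = {}
for path, cnt in cond_base.items():
    for item in path:
        support[item] = support.get(item, 0) + cnt

# Items frequent in m's conditional base form the m-conditional FP-tree.
freq = [i for i, s in support.items() if s >= min_sup]  # ['f', 'c', 'a']

# Single-path shortcut: every combination of these items, plus m, is frequent.
patterns = [''.join(c) + 'm'
            for r in range(len(freq) + 1)
            for c in combinations(freq, r)]
print(patterns)  # ['m', 'fm', 'cm', 'am', 'fcm', 'fam', 'cam', 'fcam']
```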
Why Is FP-Growth the Winner?
Divide-and-conquer:
leads to focused search of smaller databases
Other factors
no candidate generation, no candidate test
compressed database: FP-tree structure
no repeated scan of entire database
basic operations: counting local frequent items and building sub-FP-trees.
Exploration of shared multi-level mining.
(age,income,buys)
Stores aggregates, which are essential for computing support and confidence.
Dimensions: age, income, buys.
The base cuboid aggregates the task-relevant data by age, income, and buys.
2-D cuboids: (age, income), (age, buys), (income, buys).
1-D cuboids: (age), (income), (buys).
The 0-D (apex) cuboid contains the total number of transactions in the task-relevant data.
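A minimal sketch of such a cube, aggregating hypothetical task-relevant tuples into every cuboid from the 3-D base down to the 0-D apex:

```python
from collections import Counter
from itertools import combinations

# Hypothetical task-relevant tuples over the dimensions (age, income, buys).
data = [("20-29", "30-50K", "TV"), ("20-29", "30-50K", "TV"),
        ("30-39", "50-70K", "PC"), ("30-39", "30-50K", "TV")]
dims = ("age", "income", "buys")

# One Counter per cuboid: 3-D base, three 2-D, three 1-D, and the 0-D apex.
cuboids = {}
for r in range(len(dims), -1, -1):
    for group in combinations(range(len(dims)), r):
        key = tuple(dims[i] for i in group)
        cuboids[key] = Counter(tuple(t[i] for i in group) for t in data)

print(cuboids[("age", "income")])  # 2-D aggregate
print(cuboids[()])                 # 0-D apex: total transaction count
```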
Cat – categorical attributes.
Numeric attributes are dynamically discretized so as to maximize the confidence or compactness of the rules mined; this is called dynamic discretization.
Example:
age(X, "34-35") ∧ income(X, "30-50K") => buys(X, "high resolution TV")
all_conf(X) = sup(X) / max_item_sup(X)
coh(X) = sup(X) / |universe(X)|
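Both measures can be sketched over a hypothetical transaction set (item names are illustrative); universe(X) is the set of transactions containing at least one item of X:

```python
# Sketch of the two measures over sets of transactions.
transactions = [
    {"milk", "bread"}, {"milk"}, {"bread"}, {"milk", "bread"}, {"coffee"},
]

def all_conf(X, db):
    """all_conf(X) = sup(X) / max_item_sup(X)."""
    sup_X = sum(X <= t for t in db)
    max_item = max(sum(item in t for t in db) for item in X)
    return sup_X / max_item

def coh(X, db):
    """coh(X) = sup(X) / |universe(X)|, where universe(X) is the set of
    transactions containing at least one item of X."""
    sup_X = sum(X <= t for t in db)
    universe = sum(bool(X & t) for t in db)
    return sup_X / universe

X = {"milk", "bread"}
print(all_conf(X, transactions))  # 2/3
print(coh(X, transactions))       # 2/4 = 0.5
```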
lift = 1: A and B are independent; no correlation.
Example: p(game) = 0.60, p(video) = 0.75,
and the probability of purchasing both is p(game, video) = 0.40.
Lift = P(A∪B) / (P(A)·P(B))
     = 0.40 / (0.60 × 0.75) = 0.89
The value is less than 1, so game and video are negatively correlated.
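The lift calculation, using the probabilities from the example:

```python
# Lift for the game/video example; probabilities taken from the slide.
p_game, p_video, p_both = 0.60, 0.75, 0.40

lift = p_both / (p_game * p_video)
print(round(lift, 2))  # 0.89 -> less than 1, so negatively correlated
```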
Data constraints:
- the set of task-relevant data.
- specified via queries/tools.
Dimension/level constraints:
- relevance to region, price, brand, customer category.
Interestingness constraints:
- support, confidence, correlation;
- specify thresholds on these statistical measures.
Rule constraints
- specify the syntactic form of the rules to be mined.
- improve the efficiency of the mining process.
- describe the set of rules the user expects, i.e., the relationships between variables to analyze.
- such rule templates are known simply as metarules.
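As an illustration, mined rules can be filtered against a metarule template; the rules and predicate names below are hypothetical:

```python
# Hypothetical mined rules as (antecedent predicates, consequent predicate).
rules = [
    ({"age", "income"}, "buys"),
    ({"age"}, "buys"),
    ({"income", "region"}, "buys"),
]

# Metarule template P1(X) ∧ P2(X) => buys(X): exactly two antecedent
# predicates and 'buys' as the consequent.
def matches_metarule(rule):
    ante, cons = rule
    return len(ante) == 2 and cons == "buys"

print([r for r in rules if matches_metarule(r)])
```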