
LECTURE 4

DATA MINING:
CHARACTERIZATION

Jimma University, Faculty of Computing


Arranged by: Dessalegn Y.
CONCEPT DESCRIPTION:

CHARACTERIZATION AND COMPARISON

WHAT IS CONCEPT DESCRIPTION?

 Descriptive vs. predictive data mining


 Descriptive mining: describes concepts or task-relevant data sets in concise,
summarative, informative, and discriminative forms
 Predictive mining: based on the data and its analysis, constructs models for the
database and predicts the trends and properties of unknown data
 Concept description:
 Characterization: provides a concise summarization of the given collection of data
 Comparison: provides descriptions comparing two or more collections of data
 Data generalization: a process which abstracts a large set of task-relevant data in a
database from low conceptual levels to higher ones
ATTRIBUTE-ORIENTED INDUCTION

How is it done?
 Collect the task-relevant data (initial relation) using a relational database
query
 Perform generalization by attribute removal or attribute generalization.
 replacing relatively low-level values (e.g., numeric values for an attribute age) with
higher-level concepts (e.g., young, middle-aged, and senior)

 Apply aggregation by merging identical, generalized tuples and


accumulating their respective counts.
 Interactive presentation with users.
BASIC PRINCIPLES OF ATTRIBUTE-ORIENTED INDUCTION

 Data focusing: task-relevant data, including dimensions, and the result is


the initial relation.
 Attribute-removal: remove attribute A if there is a large set of distinct
values for A but (1) there is no generalization operator on A, or (2) A’s
higher level concepts are expressed in terms of other attributes.
 Attribute-generalization: If there is a large set of distinct values for A, and
there exists a set of generalization operators on A, then select an operator
and generalize A.
 Generalized relation threshold control: control the final relation/rule size.
EXAMPLE
 Describe general characteristics of graduate students in the Big-University
database
use Big_University_DB
mine characteristics as “Science_Students”
in relevance to name, gender, major, birth_place, birth_date, residence,
phone#, gpa
from student
where status in “graduate”
 Corresponding SQL statement:
Select name, gender, major, birth_place, birth_date, residence, phone#, gpa
from student
where status in {“Msc”, “MBA”, “PhD” }
CLASS CHARACTERIZATION: EXAMPLE

Initial relation:

  Name            Gender  Major    Birth-Place            Birth_date  Residence                 Phone #   GPA
  Jim Woodman     M       CS       Vancouver, BC, Canada  8-12-76     3511 Main St., Richmond   687-4598  3.67
  Scott Lachance  M       CS       Montreal, Que, Canada  28-7-75     345 1st Ave., Richmond    253-9106  3.70
  Laura Lee       F       Physics  Seattle, WA, USA       25-8-70     125 Austin Ave., Burnaby  420-5232  3.83
  ...             ...     ...      ...                    ...         ...                       ...       ...

Generalization applied to each attribute: Name is removed; Gender is retained;
Major is generalized to {Sci, Eng, Bus}; Birth-Place is generalized to the country;
Birth_date is generalized to an age range; Residence is generalized to the city;
Phone # is removed; GPA is generalized to {Excl, VG, ...}.

Prime generalized relation:

  Gender  Major    Birth_region  Age_range  Residence  GPA        Count
  M       Science  Canada        20-25      Richmond   Very-good  16
  F       Science  Foreign       25-30      Burnaby    Excellent  22
  ...     ...      ...           ...        ...        ...        ...

Crosstab of gender vs. birth region (counts):

  Gender   Canada  Foreign  Total
  M        16      14       30
  F        10      22       32
  Total    26      36       62
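
A rough illustrative sketch (not part of the original slides) of the generalization and
aggregation steps above: low-level values are replaced with higher-level concepts using
hypothetical concept hierarchies (the MAJOR_TO_CATEGORY map and age_range helper are made
up for the example), removed attributes are dropped, and identical generalized tuples are
merged while their counts are accumulated.

  from collections import Counter

  # Hypothetical concept hierarchy for the "major" attribute (illustration only).
  MAJOR_TO_CATEGORY = {"CS": "Science", "Physics": "Science", "MBA": "Business"}

  def age_range(age):
      """Map a numeric age to a 5-year range such as '20-25'."""
      low = (age // 5) * 5
      return f"{low}-{low + 5}"

  def generalize(tuples):
      """Drop removed attributes (name, phone), replace low-level values with
      higher-level concepts, then merge identical generalized tuples while
      accumulating their counts."""
      counts = Counter()
      for t in tuples:
          key = (t["gender"], MAJOR_TO_CATEGORY.get(t["major"], "Other"), age_range(t["age"]))
          counts[key] += 1
      return counts

  students = [
      {"name": "Jim Woodman",    "gender": "M", "major": "CS",      "age": 22, "phone": "687-4598"},
      {"name": "Scott Lachance", "gender": "M", "major": "CS",      "age": 23, "phone": "253-9106"},
      {"name": "Laura Lee",      "gender": "F", "major": "Physics", "age": 28, "phone": "420-5232"},
  ]

  for (gender, major, ages), count in generalize(students).items():
      print(gender, major, ages, "count =", count)
  # M Science 20-25 count = 2
  # F Science 25-30 count = 1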
ATTRIBUTE RELEVANCE ANALYSIS

 Why?
 Which dimensions should be included?
 How high a level of generalization?
 Reduce the number of attributes; make the patterns easier to understand

 What?
 statistical method for preprocessing data
 filter out irrelevant or weakly relevant attributes
 retain or rank the relevant attributes

 relevance related to dimensions and levels


 analytical characterization, analytical comparison
ATTRIBUTE RELEVANCE ANALYSIS (CONT’D)

 How?
 Data Collection
 Analytical Generalization
 Use information gain analysis (e.g., entropy or other measures) to identify
highly relevant dimensions and levels.
 Relevance Analysis
 Sort and select the most relevant dimensions and levels.

 Attribute-oriented Induction for class description


 On selected dimension/level

 OLAP operations (e.g. drilling, slicing) on relevance rules


INFORMATION-THEORETIC APPROACH
 Decision tree
 each internal node tests an attribute
 each branch corresponds to attribute value
 each leaf node assigns a classification

 ID3 algorithm
 build decision tree based on training objects with known class labels
to classify testing objects
 rank attributes with information gain measure
 minimal height
 the least number of tests to classify an object
See example
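
A minimal sketch of the information-gain ranking used by ID3. The tiny PlayTennis-style
dataset and attribute names below are made up for the example; higher gain marks a more
relevant attribute.

  from collections import Counter
  from math import log2

  def entropy(labels):
      """Entropy of a list of class labels."""
      n = len(labels)
      return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

  def information_gain(examples, attribute, target="PlayTennis"):
      """Information gain of splitting `examples` (a list of dicts) on `attribute`."""
      total = entropy([e[target] for e in examples])
      n = len(examples)
      remainder = 0.0
      for value in set(e[attribute] for e in examples):
          subset = [e[target] for e in examples if e[attribute] == value]
          remainder += len(subset) / n * entropy(subset)
      return total - remainder

  data = [
      {"Outlook": "sunny",    "Wind": "weak",   "PlayTennis": "no"},
      {"Outlook": "sunny",    "Wind": "strong", "PlayTennis": "no"},
      {"Outlook": "overcast", "Wind": "weak",   "PlayTennis": "yes"},
      {"Outlook": "rain",     "Wind": "weak",   "PlayTennis": "yes"},
  ]
  print(information_gain(data, "Outlook"))  # 1.0
  print(information_gain(data, "Wind"))     # ~0.311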
TOP-DOWN INDUCTION OF DECISION TREE

Attributes = {Outlook, Temperature, Humidity, Wind}


PlayTennis = {yes, no}

  Outlook = sunny    -> test Humidity:  high -> no,   normal -> yes
  Outlook = overcast -> yes
  Outlook = rain     -> test Wind:      strong -> no, weak -> yes
SIMILARITY AND DISTANCE

 For many different problems we need to quantify how close two objects are.
 Examples:
 For an item bought by a customer, find other similar items
 Group together the customers of a site so that similar customers are shown the same
ad.
 Group together web documents so that you can separate the ones that talk about
politics and the ones that talk about sports.
 Find all the near-duplicate mirrored web documents.
 Find credit card transactions that are very different from previous transactions.
 To solve these problems we need a definition of similarity, or distance.
 The definition depends on the type of data that we have
SIMILARITY
 Numerical measure of how alike two data objects are.
 A function that maps pairs of objects to real values
 Higher when objects are more alike.
 Often falls in the range [0,1], sometimes in [-1,1]

 Desirable properties for similarity


1. s(p, q) = 1 (or maximum similarity) only if p = q. (Identity)
2. s(p, q) = s(q, p) for all p and q. (Symmetry)
SIMILARITY BETWEEN SETS
 Consider the following documents:

   D1: apple releases new ipod
   D2: apple releases new ipad
   D3: new apple pie recipe

 Which ones are more similar?

 How would you quantify their similarity?

SIMILARITY: INTERSECTION

 Number of words in common:

   D1: apple releases new ipod
   D2: apple releases new ipad
   D3: new apple pie recipe

 Sim(D1,D2) = 3, Sim(D1,D3) = Sim(D2,D3) = 2

 What about this document?

   D4: Vefa releases new book with apple pie recipes

 Sim(D1,D4) = Sim(D2,D4) = 3
JACCARD SIMILARITY
 The Jaccard similarity (Jaccard coefficient) of two sets S1, S2 is the size of
their intersection divided by the size of their union.
 JSim(C1, C2) = |C1 ∩ C2| / |C1 ∪ C2|

  Example: 3 elements in the intersection and 8 in the union give a
  Jaccard similarity of 3/8.

 Extreme behavior:
 JSim(X,Y) = 1 iff X = Y
 JSim(X,Y) = 0 iff X and Y have no elements in common
 JSim is symmetric
JACCARD SIMILARITY BETWEEN SETS

 The Jaccard similarity for the documents:

   D1: apple releases new ipod
   D2: apple releases new ipad
   D3: new apple pie recipe
   D4: Vefa releases new book with apple pie recipes

 JSim(D1,D2) = 3/5
 JSim(D1,D3) = JSim(D2,D3) = 2/6
 JSim(D1,D4) = JSim(D2,D4) = 3/9
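
A small sketch of the Jaccard computation on the example documents above (D1-D4 are
taken from the slide, treated as sets of words):

  def jaccard(s1, s2):
      """Jaccard similarity: |intersection| / |union| of two sets."""
      return len(s1 & s2) / len(s1 | s2)

  d1 = set("apple releases new ipod".split())
  d2 = set("apple releases new ipad".split())
  d3 = set("new apple pie recipe".split())
  d4 = set("Vefa releases new book with apple pie recipes".split())

  print(jaccard(d1, d2))  # 3/5 = 0.6
  print(jaccard(d1, d3))  # 2/6 ~ 0.33
  print(jaccard(d1, d4))  # 3/9 ~ 0.33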
SIMILARITY BETWEEN VECTORS
Documents (and sets in general) can also be represented as vectors
document Apple Microsoft Obama Election
D1 10 20 0 0
D2 30 60 0 0
D3 60 30 0 0
D4 0 0 10 20
How do we measure the similarity of two vectors?

• We could view them as sets of words. Jaccard similarity will show
  that D4 is different from the rest.
• But all pairs of the other three documents are equally similar.
We want to capture how well the two vectors are aligned
COSINE SIMILARITY

 Sim(X,Y) = cos(X,Y)

 The cosine of the angle between X and Y

 If the vectors are aligned (correlated) angle is zero degrees and cos(X,Y)=1
 If the vectors are orthogonal (no common coordinates) angle is 90 degrees and
cos(X,Y) = 0

 Cosine is commonly used for comparing documents, where we assume that the
vectors are normalized by the document length.
COSINE SIMILARITY - MATH
 If d1 and d2 are two vectors, then
     cos(d1, d2) = (d1 · d2) / (||d1|| ||d2||),
  where · indicates the vector dot product and ||d|| is the length of vector d.
 Example:

  d1 = 3 2 0 5 0 0 0 2 0 0
  d2 = 1 0 0 0 0 0 0 1 0 2

  d1 · d2 = 3*1 + 2*0 + 0*0 + 5*0 + 0*0 + 0*0 + 0*0 + 2*1 + 0*0 + 0*2 = 5
  ||d1|| = (3*3 + 2*2 + 0*0 + 5*5 + 0*0 + 0*0 + 0*0 + 2*2 + 0*0 + 0*0)^0.5 = (42)^0.5 = 6.481
  ||d2|| = (1*1 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 0*0 + 1*1 + 0*0 + 2*2)^0.5 = (6)^0.5 = 2.449
  cos(d1, d2) = 5 / (6.481 * 2.449) ≈ 0.3150
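
A short sketch reproducing the cosine computation above:

  from math import sqrt

  def cosine(x, y):
      """Cosine of the angle between two equal-length vectors."""
      dot = sum(a * b for a, b in zip(x, y))
      norm_x = sqrt(sum(a * a for a in x))
      norm_y = sqrt(sum(b * b for b in y))
      return dot / (norm_x * norm_y)

  d1 = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
  d2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
  print(round(cosine(d1, d2), 4))  # 0.315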
EXAMPLE
document Apple Microsoft Obama Election
D1 10 20 0 0
D2 30 60 0 0
D3 60 30 0 0
D4 0 0 10 20

Cos(D1,D2) = 1

Cos (D3,D1) = Cos(D3,D2) = 4/5

Cos(D4,D1) = Cos(D4,D2) = Cos(D4,D3) = 0


DISTANCE

 Numerical measure of how different two data objects are


 A function that maps pairs of objects to real values
 Lower when objects are more alike
 Higher when two objects are different

 Minimum distance is 0, when comparing an object with itself.


 Upper limit varies
SIMILARITIES INTO DISTANCES

 A similarity s in [0,1] can be turned into a distance, e.g., d = 1 - s
  (for example, the Jaccard distance between two sets is 1 - JSim).

HAMMING DISTANCE
 Hamming distance is the number of positions in which two bit-vectors differ.
 Example: p1 = 10101 p2 = 10011.
 d(p1, p2) = 2 because the bit-vectors differ in the 3rd and 4th positions.
 The L1 norm for the binary vectors

 Hamming distance between two vectors of categorical


attributes is the number of positions in which they differ.
 Example: x = (married, low income, cheat),
y = (single, low income, not cheat)

d(x,y) = 2
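
A minimal sketch of Hamming distance for both examples above:

  def hamming(x, y):
      """Number of positions at which two equal-length sequences differ."""
      return sum(a != b for a, b in zip(x, y))

  print(hamming("10101", "10011"))                       # 2
  print(hamming(("married", "low income", "cheat"),
                ("single", "low income", "not cheat")))  # 2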
DISTANCE BETWEEN STRINGS

 How do we define similarity between strings?

weird wierd
intelligent unintelligent
Athena Athina

 Important for recognizing and correcting typing errors and analyzing DNA
sequences.
EDIT DISTANCE FOR STRINGS

 The edit distance of two strings is the number of inserts and deletes of
characters needed to turn one into the other.
 Example: x = abcde ; y = bcduve.
 Turn x into y by deleting a, then inserting u and v after d.
 Edit distance = 3.

 Minimum number of operations can be computed using dynamic


programming
 Common distance measure for comparing DNA sequences
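
A dynamic-programming sketch of the edit distance defined above (insertions and
deletions only), reproducing the abcde / bcduve example:

  def edit_distance(x, y):
      """Edit distance counting only insertions and deletions, via dynamic programming."""
      m, n = len(x), len(y)
      # dp[i][j] = distance between x[:i] and y[:j]
      dp = [[0] * (n + 1) for _ in range(m + 1)]
      for i in range(m + 1):
          dp[i][0] = i              # delete all of x[:i]
      for j in range(n + 1):
          dp[0][j] = j              # insert all of y[:j]
      for i in range(1, m + 1):
          for j in range(1, n + 1):
              if x[i - 1] == y[j - 1]:
                  dp[i][j] = dp[i - 1][j - 1]
              else:
                  dp[i][j] = 1 + min(dp[i - 1][j],   # delete x[i-1]
                                     dp[i][j - 1])   # insert y[j-1]
      return dp[m][n]

  print(edit_distance("abcde", "bcduve"))  # 3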

APPLICATIONS OF SIMILARITY:
RECOMMENDATION SYSTEMS
IMPORTANT PROBLEM

 Recommendation systems
 When a user buys an item (initially books) we want to recommend other
items that the user may like
 When a user rates a movie, we want to recommend movies that the user
may like
 When a user likes a song, we want to recommend other songs that they may
like
 A big success of data mining
UTILITY (PREFERENCE) MATRIX

  Ratings of users A-D for the movies Harry Potter 1-3 (HP1-HP3), Twilight (TW)
  and Star Wars 1-3 (SW1-SW3) on a 1-5 scale; blank cells are unrated:

        HP1  HP2  HP3  TW   SW1  SW2  SW3
    A    4              5    1
    B    5    5    4
    C                   2    4    5
    D   has rated two of the movies, both with a 3

 How can we fill the empty entries of the matrix?

RECOMMENDATION SYSTEMS

 Content-based:
 Represent the items into a feature space and recommend
items to customer C similar to previous items rated
highly by C
 Movie recommendations: recommend movies with same actor(s),
director, genre, …
 Websites, blogs, news: recommend other sites with “similar”
content
CONTENT-BASED PREDICTION

  (Same utility matrix as before, with Black Panther in place of Twilight.)

 Someone who likes one of the Harry Potter or Star Wars movies is likely to
  like the rest:
  • Same actors, similar story, same genre
APPROACH

 Map items into a feature space:


 For movies:
 Actors, directors, genre, rating, year,…
 Challenge: make all features compatible.

 To compare items with users we need to map users to the same feature space.
How?
 Take all the movies that the user has seen and take the average vector
 Other aggregation functions are also possible.

 Recommend to user C the most similar item i, computing the similarity in the common
feature space
 Distributional distance measures also work well.
LIMITATIONS OF CONTENT-BASED APPROACH

 Finding the appropriate features


 e.g., images, movies, music

 Overspecialization
 Never recommends items outside user’s content profile
 People might have multiple interests

 Recommendations for new users


 How to build a profile?
COLLABORATIVE FILTERING

  (Same utility matrix as before.)

 Two users are similar if they rate the same items in a similar way.

 Recommend to user C the items liked by many of the most similar users.
USER SIMILARITY

  (Same utility matrix as before.)

 Which pair of users do you consider the most similar?

 What is the right definition of similarity?

USER SIMILARITY

 Jaccard similarity: each user is the set of movies they have rated
  (every rating in the matrix becomes a 1), disregarding the rating values.

  Jsim(A,B) = 1/5
  Jsim(A,C) = Jsim(B,D) = 1/2
USER SIMILARITY

  (Same utility matrix as before.)

 Cosine similarity:
  Treats a missing entry as a zero, which implicitly acts as a very negative rating:
  Cos(A,B) = 0.38
  Cos(A,C) = 0.32
USER SIMILARITY

  Mean-centered utility matrix (each user's mean rating is subtracted from
  that user's ratings; blanks stay empty):

        HP1   HP2   HP3   TW    SW1   SW2   SW3
    A   2/3               5/3  -7/3
    B   1/3   1/3  -2/3
    C                    -5/3   1/3   4/3
    D   (both of D's ratings become 0)

 Normalized cosine similarity:
  • Subtract the mean and then compute the cosine (this is the Pearson
    correlation coefficient)
  Corr(A,B) = 0.092
  Corr(A,C) = -0.559
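
A small sketch reproducing the user-similarity numbers above. The exact column
positions of the ratings for users A, B and C are inferred from the similarity
values on these slides (an assumption); 0 stands for an unrated entry.

  from math import sqrt

  def cosine(x, y):
      dot = sum(a * b for a, b in zip(x, y))
      return dot / (sqrt(sum(a * a for a in x)) * sqrt(sum(b * b for b in y)))

  def center(x):
      """Subtract the user's mean rating from every rated entry (0 = unrated)."""
      rated = [v for v in x if v != 0]
      mean = sum(rated) / len(rated)
      return [v - mean if v != 0 else 0 for v in x]

  # Column order: HP1, HP2, HP3, Twilight, SW1, SW2, SW3 (placement inferred).
  A = [4, 0, 0, 5, 1, 0, 0]
  B = [5, 5, 4, 0, 0, 0, 0]
  C = [0, 0, 0, 2, 4, 5, 0]

  print(round(cosine(A, B), 2))                  # 0.38
  print(round(cosine(A, C), 2))                  # 0.32
  print(round(cosine(center(A), center(B)), 3))  # 0.092  (Pearson correlation)
  print(round(cosine(center(A), center(C)), 3))  # -0.559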
USER-USER COLLABORATIVE FILTERING

 Consider user c
 Find set D of other users whose ratings are most “similar”
to c’s ratings
 Estimate user’s ratings based on ratings of users in D using
some aggregation function

 Advantage: for each user, only a small amount of computation is needed.
ITEM-ITEM COLLABORATIVE FILTERING
 We can transpose (flip) the matrix and perform the same
computation as before to define similarity between items
 Intuition: Two items are similar if they are rated in the same
way by many users.
 Better defined similarity since it captures the notion of genre
of an item
 Users may have multiple interests.

 Algorithm: For each user c and item i


 Find the set D of most similar items to item i that have been rated by user c.
 Aggregate their ratings to predict the rating for item i.

 Disadvantage: we need to consider each user-item pair separately


EXAMPLE: CUSTOMER SEGMENTATION

Problem: develop meaningful customer groups that are similar based on
individual behaviors.
Goal: know your customers better and apply that knowledge to increase
profitability, reduce operational cost, and enhance customer service.
– Why are my customers leaving?
– What do my best customers look like?
Approach: clustering
– Low correlation between input variables produces more stable clusters
– A class attribute tends to dominate cluster formation
– Low skewness reduces the chance of creating small outlier clusters
LECTURE 5
DATA MINING:
ASSOCIATION

WHAT IS ASSOCIATION MINING?

 Association rule mining:


 Finding frequent patterns, associations, correlations, or causal structures
among sets of items or objects in transaction databases, relational
databases, and other information repositories.
 Applications:
 cross-marketing and analysis, catalog design, clustering, classification, etc.

 Examples:
 Rule form: "Body → Head [support, confidence]"
 buys(x, "diapers") → buys(x, "beers") [0.5%, 60%]
 major(x, "CS") ^ takes(x, "DB") → grade(x, "A") [1%, 75%]
CONT.
Association Rule Mining is one of the ways to find patterns in data. It finds:
 features (dimensions) which occur together
 features (dimensions) which are “correlated”
 What does the value of one feature tell us about the value of another feature?
 For example, people who buy diapers are likely to buy baby powder. Or we can
rephrase the statement by saying: If (people buy diaper), then (they buy baby powder).
 When to use Association Rules
 We can use Association Rules in any dataset where features take only two values i.e.,
0/1. Some examples are listed below:
 Market Basket Analysis is a popular application of Association Rules.
 People who visit webpage X are likely to visit webpage Y
 People who have age-group [30,40] & income [>$100k] are likely to own home
BASIC CONCEPTS AND RULE MEASURES

  Tid   Items bought
  10    Beer, Nuts, Diaper
  20    Beer, Coffee, Diaper
  30    Beer, Diaper, Eggs
  40    Nuts, Eggs, Milk
  50    Nuts, Coffee, Diaper, Eggs, Milk

 itemset: a set of one or more items
 k-itemset: X = {x1, ..., xk}
 (absolute) support, or support count, of X: the frequency (number of
  occurrences) of the itemset X
 (relative) support, s: the fraction of transactions that contain X
  (i.e., the probability that a transaction contains X)
 An itemset X is frequent if X's support is no less than a minsup threshold
BASIC CONCEPTS: ASSOCIATION RULES

  Tid   Items bought
  10    Beer, Nuts, Diaper
  20    Beer, Coffee, Diaper
  30    Beer, Diaper, Eggs
  40    Nuts, Eggs, Milk
  50    Nuts, Coffee, Diaper, Eggs, Milk

 Find all the rules X → Y with minimum support and confidence
 support, s: the probability that a transaction contains X ∪ Y
 confidence, c: the conditional probability that a transaction having X
  also contains Y

  Let minsup = 50%, minconf = 50%.
  Frequent patterns: Beer:3, Nuts:3, Diaper:4, Eggs:3, {Beer, Diaper}:3

 Association rules (many more exist):
   Beer → Diaper (support 60%, confidence 100%)
   Diaper → Beer (support 60%, confidence 75%)
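
A short sketch computing support and confidence on the transaction table above,
reproducing the Beer/Diaper rules:

  transactions = [
      {"Beer", "Nuts", "Diaper"},
      {"Beer", "Coffee", "Diaper"},
      {"Beer", "Diaper", "Eggs"},
      {"Nuts", "Eggs", "Milk"},
      {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
  ]

  def support(itemset):
      """Fraction of transactions containing every item in `itemset`."""
      return sum(itemset <= t for t in transactions) / len(transactions)

  def confidence(lhs, rhs):
      """Conditional probability that a transaction with `lhs` also contains `rhs`."""
      return support(lhs | rhs) / support(lhs)

  print(support({"Beer", "Diaper"}))       # 0.6  (3 of 5 transactions)
  print(confidence({"Beer"}, {"Diaper"}))  # 1.0  -> Beer => Diaper (60%, 100%)
  print(confidence({"Diaper"}, {"Beer"}))  # 0.75 -> Diaper => Beer (60%, 75%)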
ASSOCIATION RULE: BASIC CONCEPTS

 Given: (1) database of transactions, (2) each transaction is a list of items


(purchased by a customer in a visit)
 Find: all rules that correlate the presence of one set of items with that of
another set of items
 E.g., 98% of people who purchase a laptop and a tablet also buy a bag
 Applications
 Home Electronics (what other products should the store stock up on?)
 Attached mailing in direct marketing
 Detecting “ping-pong”ing of patients, faulty “collisions”
 etc.
ASSOCIATION RULE MINING: A ROAD MAP

 Boolean vs. quantitative associations (Based on the types of values


handled)
 buys(x, "SQLServer") ^ buys(x, "DMBook") → buys(x, "DBMiner") [0.2%, 60%]
 age(x, "30..39") ^ income(x, "42..48K") → buys(x, "PC") [1%, 75%]
 Single dimension vs. multiple dimensional associations
 Single level vs. multiple-level analysis
 What brands of beers are associated with what brands of diapers?
ASSOCIATION RULE MINING TASK

 Given a set of transactions T, the goal of association rule mining is to find all
rules having
 support ≥ minsup threshold
 confidence ≥ minconf threshold
 Brute-force approach:
 List all possible association rules
 Compute the support and confidence for each rule
 Prune rules that fail the minsup and minconf thresholds
 Computationally prohibitive!

MINING ASSOCIATION RULES

  Tid   Items bought
  10    Bread, Milk
  20    Bread, Diaper, Beer, Eggs
  30    Milk, Diaper, Beer, Coke
  40    Bread, Milk, Diaper, Beer
  50    Bread, Milk, Diaper, Coke

  Example rules:
  {Milk, Diaper} → {Beer}   (s=0.4, c=0.67)
  {Milk, Beer} → {Diaper}   (s=0.4, c=1.0)
  {Diaper, Beer} → {Milk}   (s=0.4, c=0.67)
  {Beer} → {Milk, Diaper}   (s=0.4, c=0.67)
  {Diaper} → {Milk, Beer}   (s=0.4, c=0.5)
  {Milk} → {Diaper, Beer}   (s=0.4, c=0.5)

  Observations:
  • All of the above rules are binary partitions of the same itemset
    {Milk, Diaper, Beer}.
  • Rules originating from the same itemset have identical support but can
    have different confidence.
  • Thus, we may decouple the support and confidence requirements.
MINING ASSOCIATION RULES

 Two-step approach:
1. Frequent Itemset Generation
 Generate all itemsets whose support ≥ minsup

2. Rule Generation
 Generate high confidence rules from each frequent itemset, where each
rule is a binary partitioning of a frequent itemset

 Frequent itemset generation is still computationally


expensive

FREQUENT ITEMSET GENERATION

ITEMSET CANDIDATE GENERATION

 Brute-force approach:
 Count the support of each candidate by scanning the database.

  Tid   Items bought
  10    Bread, Milk
  20    Bread, Diaper, Beer, Eggs
  30    Milk, Diaper, Beer, Coke
  40    Bread, Milk, Diaper, Beer
  50    Bread, Milk, Diaper, Coke

  – Match each of the N transactions (of maximum width w) against every one
    of the M = 2^d candidate itemsets.
  – Complexity: O(N M w), which is costly.
FREQUENT ITEMSET GENERATION STRATEGIES

 Reduce the number of candidates (M)


 Complete search: M = 2^d
 Use pruning techniques to reduce M
 Reduce the number of transactions (N)
 Reduce size of N as the size of itemset increases
 Used by DHP and vertical-based mining algorithms
 Reduce the number of comparisons (NM)
 Use efficient data structures to store the candidates or transactions
 No need to match every candidate against every transaction

REDUCING NUMBER OF CANDIDATES

 Apriori principle:
 If an itemset is frequent, then all of its subsets must also be frequent

 Apriori principle holds due to the following property of the


support measure:
 Support of an itemset never exceeds the support of its subsets (anti-
monotone property of support)

EXAMPLE APRIORI PRINCIPLE

THE APRIORI ALGORITHM—AN EXAMPLE

  Supmin = 2 (minimum support count)

  Database TDB:
    Tid   Items
    10    A, C, D
    20    B, C, E
    30    A, B, C, E
    40    B, E

  1st scan, C1:  {A}:2  {B}:3  {C}:3  {D}:1  {E}:3
           L1:  {A}:2  {B}:3  {C}:3  {E}:3

  C2 (from L1):  {A,B}  {A,C}  {A,E}  {B,C}  {B,E}  {C,E}
  2nd scan, C2:  {A,B}:1  {A,C}:2  {A,E}:1  {B,C}:2  {B,E}:3  {C,E}:2
           L2:  {A,C}:2  {B,C}:2  {B,E}:3  {C,E}:2

  C3 (from L2):  {B,C,E}
  3rd scan, L3:  {B,C,E}:2
THE APRIORI ALGORITHM (PSEUDO-CODE)

 Ck: candidate itemsets of size k
 Lk: frequent itemsets of size k

  L1 = {frequent items};
  for (k = 1; Lk != ∅; k++) do begin
      Ck+1 = candidates generated from Lk;
      for each transaction t in the database do
          increment the count of all candidates in Ck+1 that are contained in t;
      Lk+1 = candidates in Ck+1 with min_support;
  end
  return ∪k Lk;
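
A compact, illustrative Python sketch of the level-wise loop above (not an optimized
implementation). Run on the four-transaction example database, it reproduces L1, L2
and L3 from the worked example.

  from itertools import combinations

  def apriori(transactions, min_support):
      """Level-wise Apriori: L1 -> C2 -> L2 -> ... until no frequent itemsets remain.
      `min_support` is a fraction; transactions are sets of items."""
      n = len(transactions)

      def support_count(itemset):
          return sum(itemset <= t for t in transactions)

      # L1: frequent 1-itemsets
      items = {item for t in transactions for item in t}
      level = {frozenset([i]) for i in items
               if support_count(frozenset([i])) >= min_support * n}
      all_frequent = {}
      k = 1
      while level:
          all_frequent.update({s: support_count(s) for s in level})
          # Candidate generation: join L_k with itself, keep (k+1)-itemsets
          candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
          # Prune candidates that have an infrequent k-subset (Apriori principle)
          candidates = {c for c in candidates
                        if all(frozenset(sub) in level for sub in combinations(c, k))}
          # Count supports against the transaction list and keep the frequent ones
          level = {c for c in candidates if support_count(c) >= min_support * n}
          k += 1
      return all_frequent

  db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
  for itemset, count in sorted(apriori(db, 0.5).items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
      print(set(itemset), count)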
IMPLEMENTATION OF APRIORI

 How to generate candidates?

 Step 1: self-join Lk with itself

 Step 2: prune

 Example of candidate generation:

 L3 = {abc, abd, acd, ace, bcd}

 Self-joining L3*L3:
  abcd from abc and abd
  acde from acd and ace

 Pruning:
  acde is removed because ade is not in L3

 C4 = {abcd}
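
A small sketch of the self-join and prune steps on the L3 example above (itemsets are
kept as sorted tuples):

  from itertools import combinations

  L3 = [("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"), ("a", "c", "e"), ("b", "c", "d")]

  def candidate_gen(Lk):
      """Self-join Lk with itself, then prune candidates with an infrequent k-subset."""
      k = len(Lk[0])
      frequent = set(Lk)
      # Self-join: merge itemsets that share their first k-1 items
      joined = {tuple(sorted(set(a) | set(b)))
                for a in Lk for b in Lk
                if a[:k - 1] == b[:k - 1] and a < b}
      # Prune: every k-subset of a surviving candidate must already be frequent
      return [c for c in joined
              if all(sub in frequent for sub in combinations(c, k))]

  print(candidate_gen(L3))  # [('a', 'b', 'c', 'd')] -- acde is pruned: ade is not in L3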
MINING ASSOCIATION RULES—AN EXAMPLE

  Min. support 50%, min. confidence 50%

  Transaction ID   Items Bought
  2000             A, B, C
  1000             A, C
  4000             A, D
  5000             B, E, F

  Frequent Itemset   Support
  {A}                75%
  {B}                50%
  {C}                50%
  {A, C}             50%

  For the rule A → C:
    support = support({A, C}) = 50%
    confidence = support({A, C}) / support({A}) = 66.6%

  The Apriori principle: any subset of a frequent itemset must be frequent.
MINING FREQUENT ITEM SETS: THE KEY STEP

 Find the frequent itemsets: the sets of items that have


minimum support
 A subset of a frequent itemset must also be a frequent itemset

 i.e., if {A, B} is a frequent itemset, both {A} and {B} must be frequent
itemsets

 Iteratively find frequent itemsets with cardinality from 1 to k (k-


itemset)

 Use the frequent itemsets to generate association rules.


THE APRIORI ALGORITHM — EXAMPLE

  Database D:
    TID   Items
    100   1, 3, 4
    200   2, 3, 5
    300   1, 2, 3, 5
    400   2, 5

  Scan D, C1:  {1}:2  {2}:3  {3}:3  {4}:1  {5}:3
         L1:  {1}:2  {2}:3  {3}:3  {5}:3

  C2 (from L1):  {1,2}  {1,3}  {1,5}  {2,3}  {2,5}  {3,5}
  Scan D, C2:    {1,2}:1  {1,3}:2  {1,5}:1  {2,3}:2  {2,5}:3  {3,5}:2
         L2:    {1,3}:2  {2,3}:2  {2,5}:3  {3,5}:2

  C3 (from L2):  {2,3,5}
  Scan D, L3:    {2,3,5}:2
MULTI-DIMENSIONAL ASSOCIATION: CONCEPTS

 Single-dimensional rules:
    buys(X, "milk") → buys(X, "bread")
 Multi-dimensional rules: 2 or more dimensions or predicates
 Inter-dimension association rules (no repeated predicates):
    age(X, "19-25") ∧ occupation(X, "student") → buys(X, "Laptop")
 Categorical Attributes
 finite number of possible values, no ordering among values
 Quantitative Attributes
 numeric, implicit ordering among values
FP-growth: Mining Frequent Patterns
Using FP-tree
MINING FREQUENT PATTERNS USING FP-TREE

 General idea (divide-and-conquer)


Recursively grow frequent patterns using the FP-tree: looking for shorter
ones recursively and then concatenating the suffix:
 For each frequent item, construct its conditional pattern base, and then
its conditional FP-tree;
 Repeat the process on each newly created conditional FP-tree until the
resulting FP-tree is empty, or it contains only one path (single path will
generate all the combinations of its sub-paths, each of which is a frequent
pattern)

3 MAJOR STEPS

Starting the processing from the end of list L:


Step 1:
Construct conditional pattern base for each item in the header table

Step 2
Construct conditional FP-tree from each conditional pattern base

Step 3
Recursively mine conditional FP-trees and grow frequent patterns obtained
so far. If the conditional FP-tree contains a single path, simply enumerate all the
patterns
STEP 1: CONSTRUCT CONDITIONAL PATTERN BASE
 Start at the bottom of the frequent-item header table of the FP-tree
 Traverse the FP-tree by following the link of each frequent item
 Accumulate all of the transformed prefix paths of that item to form its
conditional pattern base

  Header table items (in frequency order): f, c, a, b, m, p

  FP-tree paths (item:count along each branch):
    {} - f:4 - c:3 - a:3 - m:2 - p:2
    {} - f:4 - c:3 - a:3 - b:1 - m:1
    {} - f:4 - b:1
    {} - c:1 - b:1 - p:1

  Conditional pattern bases:
    item   conditional pattern base
    p      fcam:2, cb:1
    m      fca:2, fcab:1
    b      fca:1, f:1, c:1
    a      fc:3
    c      f:3
    f      {}
PROPERTIES OF FP-TREE

 Node-link property
 For any frequent item ai, all the possible frequent patterns that
contain ai can be obtained by following ai's node-links, starting from
ai's head in the FP-tree header.

 Prefix path property


 To calculate the frequent patterns for a node ai in a path P, only the
prefix sub-path of ai in P need to be accumulated, and its frequency
count should carry the same count as node ai.
STEP 2: CONSTRUCT CONDITIONAL FP-TREE
 For each pattern base:
 Accumulate the count for each item in the base
 Construct the FP-tree for the frequent items of the pattern base

  Header table (item : count): f:4, c:4, a:3, b:3, m:3, p:3
  (global FP-tree as in Step 1)

  m-conditional pattern base: fca:2, fcab:1
  m-conditional FP-tree: the single path {} - f:3 - c:3 - a:3
  All frequent patterns concerning m:
    m, fm, cm, am, fcm, fam, cam, fcam
MINING FREQUENT PATTERNS BY CREATING CONDITIONAL PATTERN BASES

  Item   Conditional pattern base     Conditional FP-tree
  p      {(fcam:2), (cb:1)}           {(c:3)} | p
  m      {(fca:2), (fcab:1)}          {(f:3, c:3, a:3)} | m
  b      {(fca:1), (f:1), (c:1)}      Empty
  a      {(fc:3)}                     {(f:3, c:3)} | a
  c      {(f:3)}                      {(f:3)} | c
  f      Empty                        Empty
STEP 3: RECURSIVELY MINE THE CONDITIONAL FP-TREE

  m-conditional FP-tree: {} - f:3 - c:3 - a:3

  Conditional pattern base of "am": (fc:3)
    am-conditional FP-tree: {} - f:3 - c:3
  Conditional pattern base of "cm": (f:3)
    cm-conditional FP-tree: {} - f:3
  Conditional pattern base of "cam": (f:3)
    cam-conditional FP-tree: {} - f:3
STEP 3: RECURSIVELY MINE THE CONDITIONAL FP-TREE (CONT'D)

  Growing frequent patterns from the conditional FP-trees:

  Conditional FP-tree of "m": (fca:3)
    add "a" → conditional FP-tree of "am": (fc:3), frequent pattern am
    add "c" → conditional FP-tree of "cm": (f:3), frequent pattern cm
    add "f" → conditional FP-tree of "fm": 3, frequent pattern fm
  Conditional FP-tree of "am": (fc:3)
    add "c" → conditional FP-tree of "cam": (f:3), frequent pattern cam
    add "f" → conditional FP-tree of "fam": 3, frequent pattern fam
  Conditional FP-tree of "cm": (f:3)
    add "f" → conditional FP-tree of "fcm": 3, frequent pattern fcm
  Conditional FP-tree of "cam": (f:3)
    add "f" → frequent pattern fcam
PRINCIPLES OF FP-GROWTH

 Pattern growth property:

 Let α be a frequent itemset in DB, B be α's conditional pattern base,
  and β be an itemset in B. Then α ∪ β is a frequent itemset in DB iff β
  is frequent in B.

 "abcdef" is a frequent pattern, if and only if

 "abcde" is a frequent pattern, and

 "f" is frequent in the set of transactions containing "abcde"

CONDITIONAL PATTERN BASES AND CONDITIONAL FP-TREES

  Item   Conditional pattern base     Conditional FP-tree
  p      {(fcam:2), (cb:1)}           {(c:3)} | p
  m      {(fca:2), (fcab:1)}          {(f:3, c:3, a:3)} | m
  b      {(fca:1), (f:1), (c:1)}      Empty
  a      {(fc:3)}                     {(f:3, c:3)} | a
  c      {(f:3)}                      {(f:3)} | c
  f      Empty                        Empty

  (Items are processed in the order of the frequent-item list L.)
SINGLE FP-TREE PATH GENERATION

 Suppose an FP-tree T has a single path P. The complete set of frequent
  patterns of T can be generated by enumerating all the combinations of the
  sub-paths of P.

  m-conditional FP-tree: the single path {} - f:3 - c:3 - a:3

  All frequent patterns concerning m are the combinations of {f, c, a}
  appended with m:
    m, fm, cm, am, fcm, fam, cam, fcam
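
A short sketch of the single-path enumeration, reproducing the eight patterns
concerning m:

  from itertools import combinations

  def patterns_from_single_path(path_items, suffix):
      """Enumerate all combinations of the items on a single-path conditional
      FP-tree and append the suffix item(s)."""
      result = []
      for r in range(len(path_items) + 1):
          for combo in combinations(path_items, r):
              result.append(set(combo) | set(suffix))
      return result

  # The m-conditional FP-tree is the single path f:3 - c:3 - a:3
  for pattern in patterns_from_single_path(["f", "c", "a"], ["m"]):
      print(pattern)
  # {'m'}, {'f','m'}, {'c','m'}, {'a','m'}, {'f','c','m'}, {'f','a','m'},
  # {'c','a','m'}, {'f','c','a','m'}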
EFFICIENCY ANALYSIS

Facts: usually
1. FP-tree is much smaller than the size of the DB
2. Pattern base is smaller than original FP-tree
3. Conditional FP-tree is smaller than pattern base
 mining process works on a set of usually much smaller pattern bases and conditional
FP-trees
 Divide-and-conquer and dramatic scale of shrinking

ADVANTAGES OF THE PATTERN GROWTH APPROACH

 Divide-and-conquer:

 Decompose both the mining task and DB according to the frequent patterns
obtained so far

 Lead to focused search of smaller databases

 Other factors

 No candidate generation, no candidate test

 Compressed database: FP-tree structure

 No repeated scan of entire database

 Basic ops: counting local freq items and building sub FP-tree, no pattern search
and matching (Grahne and J. Zhu, FIMI'03)
INTERESTINGNESS MEASUREMENTS

 Objective measures
  Two popular measurements:
  (1) support, and
  (2) confidence

 Subjective measures
  A rule (pattern) is interesting if
  (1) it is unexpected (surprising to the user), and/or
  (2) it is actionable (the user can do something with it)
CRITICISM OF SUPPORT AND CONFIDENCE (CONT.)

 Example 2:
 X and Y are positively correlated,
 X and Z are negatively related,
 yet the support and confidence of X ⇒ Z dominate

      X   1 1 1 1 0 0 0 0
      Y   1 1 0 0 0 0 0 0
      Z   0 1 1 1 1 1 1 1

  Rule     Support   Confidence
  X ⇒ Y    25%       50%
  X ⇒ Z    37.50%    75%

 We need a measure of dependent or correlated events:

      corr(A,B) = P(A ∧ B) / (P(A) P(B))

 P(B|A)/P(B) is also called the lift of the rule A ⇒ B
OTHER INTERESTINGNESS MEASURES: INTEREST

 Interest (correlation, lift):

      Interest(A, B) = P(A ∧ B) / (P(A) P(B))

 takes both P(A) and P(B) into consideration
 P(A ∧ B) = P(A) P(B) if A and B are independent events
 A and B are negatively correlated if the value is less than 1; otherwise
  A and B are positively correlated

      X   1 1 1 1 0 0 0 0
      Y   1 1 0 0 0 0 0 0
      Z   0 1 1 1 1 1 1 1

  Itemset   Support   Interest
  X, Y      25%       2
  X, Z      37.50%    0.9
  Y, Z      12.50%    0.57
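
A small sketch computing the interest (lift) values from the X, Y, Z columns above
(note that the exact value of lift(X,Z) is about 0.86, which the slide rounds to 0.9):

  X = [1, 1, 1, 1, 0, 0, 0, 0]
  Y = [1, 1, 0, 0, 0, 0, 0, 0]
  Z = [0, 1, 1, 1, 1, 1, 1, 1]

  def p(*columns):
      """Probability that all the given binary columns are 1 in the same row."""
      n = len(columns[0])
      return sum(all(col[i] for col in columns) for i in range(n)) / n

  def lift(a, b):
      """Interest (lift) = P(A and B) / (P(A) * P(B)); 1 means independence."""
      return p(a, b) / (p(a) * p(b))

  print(lift(X, Y))  # 2.0   -> positively correlated
  print(lift(X, Z))  # ~0.86 -> below 1, negatively correlated
  print(lift(Y, Z))  # ~0.57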
