DATA MINING:
CHARACTERIZATION
How is it done?
Collect the task-relevant data (the initial relation) using a relational database
query
Perform generalization by attribute removal or attribute generalization, e.g.,
replacing relatively low-level values (such as numeric values for an attribute age) with
higher-level concepts (such as young, middle-aged, and senior)
Generalized relation (counts by Gender and Birth_Region):
Gender | Canada | Foreign | Total
M      |   16   |   14    |  30
F      |   10   |   22    |  32
Total  |   26   |   36    |  62
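As a concrete illustration of attribute generalization, here is a minimal sketch that replaces a numeric age attribute with higher-level concepts. The column names and cut-off ages are assumptions for illustration only, not values from the slides.

```python
# Minimal sketch of attribute generalization: numeric age -> concept level.
# The cut-offs (30, 60) and column names are assumed for illustration only.

def generalize_age(age):
    if age < 30:
        return "young"
    elif age < 60:
        return "middle-aged"
    return "senior"

rows = [
    {"gender": "M", "age": 24, "birth_region": "Canada"},
    {"gender": "F", "age": 47, "birth_region": "Foreign"},
]

generalized = [
    {**row, "age": generalize_age(row["age"])}  # attribute generalization
    for row in rows
]
print(generalized)
```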
ATTRIBUTE RELEVANCE ANALYSIS
Why?
Which dimensions should be included?
How high a level of generalization?
Reduce the number of attributes; patterns become easier to understand
What?
A statistical method for preprocessing data:
filter out irrelevant or weakly relevant attributes
retain or rank the relevant attributes
How?
Data Collection
Analytical Generalization
Use information gain analysis (e.g., entropy or other measures) to identify
highly relevant dimensions and levels.
Relevance Analysis
Sort and select the most relevant dimensions and levels.
ID3 algorithm
builds a decision tree from training objects with known class labels
in order to classify testing objects
ranks attributes with the information gain measure
aims for minimal tree height, i.e.,
the least number of tests needed to classify an object
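To make the information gain ranking concrete, here is a minimal sketch of entropy and information gain for a categorical attribute. The tiny dataset, attribute names, and class labels are assumed purely for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Entropy reduction achieved by splitting rows on one attribute."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder

# Tiny illustrative dataset (assumed, not from the slides).
data = [
    {"outlook": "sunny",    "windy": "no",  "play": "no"},
    {"outlook": "sunny",    "windy": "yes", "play": "no"},
    {"outlook": "overcast", "windy": "no",  "play": "yes"},
    {"outlook": "rain",     "windy": "no",  "play": "yes"},
    {"outlook": "rain",     "windy": "yes", "play": "no"},
]

# Rank attributes by information gain, as ID3 does at each node.
for attr in ("outlook", "windy"):
    print(attr, round(information_gain(data, attr, "play"), 3))
```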
See example
TOP-DOWN INDUCTION OF DECISION TREE
Decision tree (reconstructed from the figure):
Outlook?
  sunny    -> Humidity?  high -> no,   normal -> yes
  overcast -> yes
  rain     -> Wind?      strong -> no, weak -> yes
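A minimal sketch of the same tree written as a nested structure, with a lookup that walks it. The attribute and value spellings come from the figure above; everything else is assumed.

```python
# The decision tree from the figure, written as nested dicts.
# Leaves are class labels ("yes"/"no"); inner nodes map attribute values to subtrees.
tree = {
    "Outlook": {
        "sunny":    {"Humidity": {"high": "no", "normal": "yes"}},
        "overcast": "yes",
        "rain":     {"Wind": {"strong": "no", "weak": "yes"}},
    }
}

def classify(node, example):
    """Walk the tree until a leaf (class label) is reached."""
    while isinstance(node, dict):
        attribute, branches = next(iter(node.items()))
        node = branches[example[attribute]]
    return node

print(classify(tree, {"Outlook": "sunny", "Humidity": "normal", "Wind": "weak"}))  # yes
```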
SIMILARITY AND DISTANCE
For many different problems we need to quantify how close two objects are.
Examples:
For an item bought by a customer, find other similar items
Group together the customers of a site so that similar customers are shown the same
ad.
Group together web documents so that you can separate the ones that talk about
politics and the ones that talk about sports.
Find all the near-duplicate mirrored web documents.
Find credit card transactions that are very different from previous transactions.
To solve these problems we need a definition of similarity, or distance.
The definition depends on the type of data that we have
SIMILARITY
Numerical measure of how alike two data objects are.
A function that maps pairs of objects to real values
Higher when objects are more alike.
Often falls in the range [0,1], sometimes in [-1,1]
JSim(X, Y) = |X ∩ Y| / |X ∪ Y|
Example: two sets with 3 elements in the intersection and 8 in the union have
Jaccard similarity = 3/8
Extreme behavior:
JSim(X,Y) = 1 iff X = Y
JSim(X,Y) = 0 iff X and Y have no elements in common
JSim is symmetric
JACCARD SIMILARITY BETWEEN SETS
Documents as sets of words:
D1: "Vefa releases new book with apple pie recipes"
D2: "apple releases new ipod"
D3: "apple releases new ipad"
D4: "new apple pie recipe"
JSim(D2,D3) = 3/5
JSim(D4,D2) = JSim(D4,D3) = 2/6
JSim(D1,D2) = JSim(D1,D3) = 3/9
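A minimal sketch of Jaccard similarity on word sets; the helper name is mine, and the documents are the ones listed above.

```python
def jaccard(x, y):
    """Jaccard similarity between two sets: |intersection| / |union|."""
    x, y = set(x), set(y)
    return len(x & y) / len(x | y)

d1 = "Vefa releases new book with apple pie recipes".lower().split()
d2 = "apple releases new ipod".lower().split()
d3 = "apple releases new ipad".lower().split()
d4 = "new apple pie recipe".lower().split()

print(jaccard(d2, d3))  # 3/5 = 0.6
print(jaccard(d4, d2))  # 2/6 ~ 0.33
print(jaccard(d1, d2))  # 3/9 ~ 0.33
```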
SIMILARITY BETWEEN VECTORS
Documents (and sets in general) can also be represented as vectors
document Apple Microsoft Obama Election
D1 10 20 0 0
D2 30 60 0 0
D3 60 30 0 0
D4 0 0 10 20
How do we measure the similarity of two vectors?
Sim(X,Y) = cos(X,Y)
If the vectors are aligned (correlated), the angle between them is zero degrees and cos(X,Y) = 1
If the vectors are orthogonal (no common coordinates), the angle is 90 degrees and
cos(X,Y) = 0
Cosine is commonly used for comparing documents, where we assume that the
vectors are normalized by the document length.
COSINE SIMILARITY - MATH
If d1 and d2 are two vectors, then
cos(d1, d2) = (d1 · d2) / (||d1|| * ||d2||),
where · indicates the vector dot product and ||d|| is the length (Euclidean norm) of vector d.
Example:
d1 = (3, 2, 0, 5, 0, 0, 0, 2, 0, 0)
d2 = (1, 0, 0, 0, 0, 0, 0, 1, 0, 2)
d1 · d2 = 3*1 + 2*0 + 0*0 + 5*0 + 0*0 + 0*0 + 0*0 + 2*1 + 0*0 + 0*2 = 5
||d1|| = (3^2 + 2^2 + 0^2 + 5^2 + 0^2 + 0^2 + 0^2 + 2^2 + 0^2 + 0^2)^0.5 = 42^0.5 ≈ 6.481
||d2|| = (1^2 + 0^2 + 0^2 + 0^2 + 0^2 + 0^2 + 0^2 + 1^2 + 0^2 + 2^2)^0.5 = 6^0.5 ≈ 2.449
cos(d1, d2) = 5 / (6.481 * 2.449) ≈ 0.3150
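A minimal sketch that reproduces the calculation above; the function name is mine.

```python
import math

def cosine_similarity(x, y):
    """cos(x, y) = dot(x, y) / (||x|| * ||y||)."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

d1 = [3, 2, 0, 5, 0, 0, 0, 2, 0, 0]
d2 = [1, 0, 0, 0, 0, 0, 0, 1, 0, 2]
print(round(cosine_similarity(d1, d2), 4))  # ~0.315
```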
EXAMPLE
document Apple Microsoft Obama Election
D1 10 20 0 0
D2 30 60 0 0
D3 60 30 0 0
D4 0 0 10 20
Cos(D1,D2) = 1
HAMMING DISTANCE
Hamming distance is the number of positions in which bit-vectors differ.
Example: p1 = 10101, p2 = 10011.
d(p1, p2) = 2 because the bit-vectors differ in the 3rd and 4th positions.
For binary vectors the Hamming distance equals the L1 norm:
d(p1, p2) = ||p1 - p2||_1 = 2
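A minimal sketch of the Hamming distance on equal-length bit strings; the function name is mine.

```python
def hamming_distance(p1, p2):
    """Number of positions in which two equal-length bit strings differ."""
    assert len(p1) == len(p2)
    return sum(a != b for a, b in zip(p1, p2))

print(hamming_distance("10101", "10011"))  # 2
```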
DISTANCE BETWEEN STRINGS
weird wierd
intelligent unintelligent
Athena Athina
Important for recognizing and correcting typing errors and analyzing DNA
sequences.
EDIT DISTANCE FOR STRINGS
The edit distance of two strings is the number of inserts and deletes of
characters needed to turn one into the other.
Example: x = abcde ; y = bcduve.
Turn x into y by deleting a, then inserting u and v after d.
Edit distance = 3.
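A minimal sketch of the insert/delete-only edit distance described above (no substitutions), computed with dynamic programming; the function name is mine.

```python
def edit_distance(x, y):
    """Minimum number of single-character inserts and deletes turning x into y."""
    m, n = len(x), len(y)
    # dp[i][j] = distance between x[:i] and y[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i          # delete the remaining characters of x
    for j in range(n + 1):
        dp[0][j] = j          # insert all characters of y
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                dp[i][j] = 1 + min(dp[i - 1][j],   # delete from x
                                   dp[i][j - 1])   # insert into x
    return dp[m][n]

print(edit_distance("abcde", "bcduve"))  # 3
```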
APPLICATIONS OF SIMILARITY:
RECOMMENDATION SYSTEMS
IMPORTANT PROBLEM
Recommendation systems
When a user buys an item (initially books) we want to recommend other
items that the user may like
When a user rates a movie, we want to recommend movies that the user
may like
When a user likes a song, we want to recommend other songs that they may
like
A big success of data mining
UTILITY (PREFERENCE) MATRIX
Content-based:
Represent the items in a feature space and recommend
to customer C items similar to previous items rated
highly by C
Movie recommendations: recommend movies with same actor(s),
director, genre, …
Websites, blogs, news: recommend other sites with “similar”
content
CONTENT-BASED PREDICTION
To compare items with users we need to map users to the same feature space.
How?
Take all the movies that the user has seen and take the average vector
Other aggregation functions are also possible.
Recommend to user C the most similar item i, computing similarity in the common
feature space
Distributional distance measures also work well.
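A minimal sketch of content-based prediction under the scheme above: the user profile is the average of the feature vectors of items the user has rated, and candidate items are ranked by cosine similarity. All item vectors and names are assumed for illustration.

```python
import math

def cosine(x, y):
    dot = sum(a * b for a, b in zip(x, y))
    return dot / (math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y)))

def user_profile(rated_item_vectors):
    """Average the feature vectors of the items the user has seen/rated."""
    n = len(rated_item_vectors)
    return [sum(col) / n for col in zip(*rated_item_vectors)]

# Assumed item feature vectors (e.g., genre weights), not from the slides.
items = {
    "movie_a": [1.0, 0.0, 0.5],
    "movie_b": [0.9, 0.1, 0.4],
    "movie_c": [0.0, 1.0, 0.0],
}
seen = ["movie_a"]

profile = user_profile([items[i] for i in seen])
candidates = {name: cosine(profile, vec) for name, vec in items.items() if name not in seen}
print(max(candidates, key=candidates.get))  # most similar unseen item
```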
LIMITATIONS OF CONTENT-BASED APPROACH
Overspecialization
Never recommends items outside user’s content profile
People might have multiple interests
Two users are similar if they rate the same items in a similar way
Cosine similarity (assumes zero, i.e., unrated, entries are negatives):
Cos(A,B) = 0.38
Cos(A,C) = 0.32
USER SIMILARITY
Consider user c
Find set D of other users whose ratings are most “similar”
to c’s ratings
Estimate user’s ratings based on ratings of users in D using
some aggregation function
LECTURE 5
DATA MINING:
ASSOCIATION
WHAT IS ASSOCIATION MINING?
Examples.
Rule form: "Body → Head [support, confidence]".
buys(x, "diapers") → buys(x, "beers") [0.5%, 60%]
major(x, "CS") ^ takes(x, "DB") → grade(x, "A") [1%, 75%]
CONT.
Association Rule Mining is one of the ways to find patterns in data. It finds:
features (dimensions) which occur together
features (dimensions) which are “correlated”
What does the value of one feature tell us about the value of another feature?
For example, people who buy diapers are likely to buy baby powder. Or we can
rephrase the statement by saying: If (people buy diaper), then (they buy baby powder).
When to use Association Rules
We can use Association Rules in any dataset where features take only two values, i.e.,
0/1. Some examples are listed below:
Market Basket Analysis is a popular application of Association Rules.
People who visit webpage X are likely to visit webpage Y.
People in the age group [30,40] with income >$100k are likely to own a home.
BASIC CONCEPTS AND RULE MEASURES
Transaction database:
Tid | Items bought
10  | Beer, Nuts, Diaper
20  | Beer, Coffee, Diaper
30  | Beer, Diaper, Eggs
40  | Nuts, Eggs, Milk
50  | Nuts, Coffee, Diaper, Eggs, Milk

itemset: a set of one or more items
k-itemset: X = {x1, ..., xk}
(absolute) support, or support count, of X: frequency or number of occurrences of the itemset X
(relative) support, s: the fraction of transactions that contain X (i.e., the probability that a transaction contains X)
An itemset X is frequent if X's support is no less than a minsup threshold
BASIC CONCEPTS: ASSOCIATION RULES
Transaction database (as above):
Tid | Items bought
10  | Beer, Nuts, Diaper
20  | Beer, Coffee, Diaper
30  | Beer, Diaper, Eggs
40  | Nuts, Eggs, Milk
50  | Nuts, Coffee, Diaper, Eggs, Milk

Find all the rules X → Y with minimum support and confidence
support, s: probability that a transaction contains X ∪ Y
confidence, c: conditional probability that a transaction containing X also contains Y
Let minsup = 50%, minconf = 50%
Frequent patterns: Beer:3, Nuts:3, Diaper:4, Eggs:3, {Beer, Diaper}:3
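A minimal sketch that computes support and confidence for one candidate rule over the five transactions above; the helper functions and the chosen rule ({Beer} → {Diaper}) are my own illustration.

```python
transactions = [
    {"Beer", "Nuts", "Diaper"},
    {"Beer", "Coffee", "Diaper"},
    {"Beer", "Diaper", "Eggs"},
    {"Nuts", "Eggs", "Milk"},
    {"Nuts", "Coffee", "Diaper", "Eggs", "Milk"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs):
    """P(rhs in transaction | lhs in transaction)."""
    return support(lhs | rhs) / support(lhs)

lhs, rhs = {"Beer"}, {"Diaper"}
print(support(lhs | rhs))     # 3/5 = 0.6  -> meets minsup = 50%
print(confidence(lhs, rhs))   # 3/3 = 1.0  -> meets minconf = 50%
```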
Given a set of transactions T, the goal of association rule mining is to find all
rules having
support ≥ minsup threshold
confidence ≥ minconf threshold
Brute-force approach:
List all possible association rules
Compute the support and confidence for each rule
Prune rules that fail the minsup and minconf thresholds
Computationally prohibitive!
MINING ASSOCIATION RULES
Observations:
• All rules generated from a given itemset, e.g., {Milk, Diaper, Beer}, are binary partitions of that itemset.
• Rules originating from the same itemset have identical support but can have different confidence.
Two-step approach:
1. Frequent Itemset Generation
Generate all itemsets whose support ≥ minsup
2. Rule Generation
Generate high confidence rules from each frequent itemset, where each
rule is a binary partitioning of a frequent itemset
FREQUENT ITEMSET GENERATION
ITEMSET CANDIDATE GENERATION
Brute-force approach:
Count the support of each candidate by scanning the database
Match each transaction against every candidate
Complexity: O(Nmw), where N is the number of transactions, m the number of candidates,
and w the transaction width; this is costly
FREQUENT ITEMSET GENERATION STRATEGIES
REDUCING NUMBER OF CANDIDATES
Apriori principle:
If an itemset is frequent, then all of its subsets must also be frequent
EXAMPLE APRIORI PRINCIPLE
THE APRIORI ALGORITHM—AN EXAMPLE
Supmin = 2
Database TDB:
Tid | Items
10  | A, C, D
20  | B, C, E
30  | A, B, C, E
40  | B, E

1st scan -> C1:
Itemset | sup
{A} | 2
{B} | 3
{C} | 3
{D} | 1
{E} | 3

L1 (candidates with sup >= Supmin):
Itemset | sup
{A} | 2
{B} | 3
{C} | 3
{E} | 3

C2 (generated from L1): {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}

2nd scan -> C2 with counts:
Itemset | sup
{A,B} | 1
{A,C} | 2
{A,E} | 1
{B,C} | 2
{B,E} | 3
{C,E} | 2

L2:
Itemset | sup
{A,C} | 2
{B,C} | 2
{B,E} | 3
{C,E} | 2

C3: {B, C, E}
3rd scan -> L3:
Itemset | sup
{B,C,E} | 2
THE APRIORI ALGORITHM (PSEUDO-CODE)
L1 = {frequent items};
for (k = 1; Lk != ∅; k++) do begin
    Ck+1 = candidates generated from Lk;
    for each transaction t in database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
end
return ∪k Lk;
IMPLEMENTATION OF APRIORI
How to generate candidates?
Step 1: self-joining Lk
Step 2: pruning
Example of candidate generation:
Self-joining: L3 * L3
abcd from abc and abd
acde from acd and ace
Pruning:
acde is removed because ade is not in L3
C4 = {abcd}
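The sketch below ties the pseudo-code and the candidate-generation steps together in Python: level-wise generation by self-join, Apriori pruning, and support counting. It is a minimal, assumed implementation, not the course's reference code; it reproduces the TDB example above with min_support = 2.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support_count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_support}

    # L1: frequent 1-itemsets
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = count(items)
    all_frequent = dict(frequent)

    k = 1
    while frequent:
        # Self-join: union two frequent k-itemsets whose union has k+1 items.
        prev = list(frequent)
        candidates = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        # Apriori pruning: every k-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k))}
        frequent = count(candidates)
        all_frequent.update(frequent)
        k += 1
    return all_frequent

tdb = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
for itemset, sup in sorted(apriori(tdb, 2).items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(set(itemset), sup)
```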
MINING ASSOCIATION RULES—AN EXAMPLE
i.e., if {A, B} is a frequent itemset, both {A} and {B} must also be frequent
itemsets
Single-dimensional rules:
buys(X, "milk") → buys(X, "bread")
Multi-dimensional rules: 2 or more dimensions or predicates
Inter-dimension association rules (no repeated predicates)
age(X, "19-25") ^ occupation(X, "student") → buys(X, "laptop")
Categorical Attributes
finite number of possible values, no ordering among values
Quantitative Attributes
numeric, implicit ordering among values
FP-growth: Mining Frequent Patterns
Using FP-tree
MINING FREQUENT PATTERNS USING FP-TREE
3 MAJOR STEPS
Step 1
Construct a conditional pattern base for each frequent item from the FP-tree
Step 2
Construct a conditional FP-tree from each conditional pattern base
Step 3
Recursively mine conditional FP-trees and grow the frequent patterns obtained
so far. If a conditional FP-tree contains a single path, simply enumerate all the
patterns
STEP 1: CONSTRUCT CONDITIONAL PATTERN BASE
Starting at the bottom of the frequent-item header table in the FP-tree
Traverse the FP-tree by following the link of each frequent item
Accumulate all transformed prefix paths of that item to form its
conditional pattern base
Node-link property
For any frequent item ai, all the possible frequent patterns that
contain ai can be obtained by following ai's node-links, starting from
ai's head in the FP-tree header.
Example (summary of the FP-tree figure):
Header table (item : frequency): f:4, c:4, a:3, b:3, m:3, p:3
m-conditional pattern base: fca:2, fcab:1
m-conditional FP-tree: the single path {} -> f:3 -> c:3 -> a:3
All frequent patterns concerning m: m, fm, cm, am, fcm, fam, cam, fcam
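A minimal sketch of Step 1 under assumed inputs: it builds an FP-tree with node-links from a transaction list (items already ordered by descending frequency) and collects the conditional pattern base of one item. The five transactions are a textbook-style example consistent with the figure summary above; treat them as an assumption.

```python
from collections import defaultdict

class Node:
    def __init__(self, item, parent):
        self.item, self.parent, self.count = item, parent, 1
        self.children = {}

def build_fp_tree(transactions):
    """Insert each (frequency-ordered) transaction into the tree; keep node-links per item."""
    root = Node(None, None)
    node_links = defaultdict(list)          # item -> all tree nodes holding that item
    for t in transactions:
        node = root
        for item in t:
            if item in node.children:
                node.children[item].count += 1
            else:
                child = Node(item, node)
                node.children[item] = child
                node_links[item].append(child)
            node = node.children[item]
    return root, node_links

def conditional_pattern_base(item, node_links):
    """Prefix paths leading to each occurrence of `item`, with that occurrence's count."""
    base = []
    for node in node_links[item]:
        path, parent = [], node.parent
        while parent is not None and parent.item is not None:
            path.append(parent.item)
            parent = parent.parent
        if path:
            base.append((list(reversed(path)), node.count))
    return base

# Assumed example transactions, already sorted by item frequency (f > c > a > b > m > p).
transactions = [
    ["f", "c", "a", "m", "p"],
    ["f", "c", "a", "b", "m"],
    ["f", "b"],
    ["c", "b", "p"],
    ["f", "c", "a", "m", "p"],
]
root, links = build_fp_tree(transactions)
print(conditional_pattern_base("m", links))  # [(['f', 'c', 'a'], 2), (['f', 'c', 'a', 'b'], 1)]
```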
PRINCIPLES OF FP-GROWTH
CONDITIONAL PATTERN BASES AND
CONDITIONAL FP-TREE
SINGLE FP-TREE PATH GENERATION
EFFICIENCY ANALYSIS
Facts (usually):
1. The FP-tree is much smaller than the size of the DB
2. A conditional pattern base is smaller than the original FP-tree
3. A conditional FP-tree is smaller than its pattern base
The mining process therefore works on a set of usually much smaller pattern bases and
conditional FP-trees
Divide-and-conquer with a dramatic scale of shrinking
ADVANTAGES OF THE PATTERN GROWTH APPROACH
Divide-and-conquer:
Decompose both the mining task and DB according to the frequent patterns
obtained so far
Other factors
Basic operations are counting local frequent items and building sub FP-trees; no pattern
search and matching (Grahne and J. Zhu, FIMI'03)
INTERESTINGNESS MEASUREMENTS
Objective measures
Two popular measurements:
support, and
confidence
Subjective measures
A rule (pattern) is interesting if
it is unexpected (surprising to the user), and/or
actionable (the user can do something with it)
CRITICISM TO SUPPORT AND CONFIDENCE (CONT.)
Example 2:
X: 1 1 1 1 0 0 0 0
Y: 1 1 0 0 0 0 0 0
Z: 0 1 1 1 1 1 1 1
X and Y: positively correlated; X and Z: negatively related, yet the support and
confidence of X=>Z dominate:
Rule | Support | Confidence
X=>Y | 25%     | 50%
X=>Z | 37.50%  | 75%
We need a measure of dependent or correlated events:
corr(A,B) = P(A ∪ B) / (P(A) P(B)), where A ∪ B means that a transaction contains both A and B
P(B|A) / P(B) is also called the lift of rule A => B
OTHER INTERESTINGNESS MEASURES: INTEREST
A and B are negatively correlated if the value is less than 1; otherwise A and B are
positively correlated.
X: 1 1 1 1 0 0 0 0
Y: 1 1 0 0 0 0 0 0
Z: 0 1 1 1 1 1 1 1
Itemset | Support | Interest
X,Y     | 25%     | 2
X,Z     | 37.50%  | 0.9
Y,Z     | 12.50%  | 0.57
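A minimal sketch that reproduces the interest (lift) values in the table from the X, Y, Z indicator vectors; the helper names are mine, and the exact decimals differ only by rounding.

```python
def support(*vectors):
    """Fraction of transactions in which all given indicator vectors are 1."""
    n = len(vectors[0])
    return sum(all(v[i] for v in vectors) for i in range(n)) / n

def interest(a, b):
    """Interest (lift): P(A and B) / (P(A) * P(B)); < 1 means negative correlation."""
    return support(a, b) / (support(a) * support(b))

X = [1, 1, 1, 1, 0, 0, 0, 0]
Y = [1, 1, 0, 0, 0, 0, 0, 0]
Z = [0, 1, 1, 1, 1, 1, 1, 1]

print(round(interest(X, Y), 2))  # 2.0
print(round(interest(X, Z), 2))  # ~0.86 (about 0.9)
print(round(interest(Y, Z), 2))  # ~0.57
```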