
Data Mining

Sunitha R S
Dept of ISE,
RIT
Compact Representation of Frequent Itemsets
• The number of frequent itemsets produced from a
transaction dataset can be very large.

• It is useful to identify a small representative set of
itemsets from which all other frequent itemsets can be
derived.
Definitions
• Frequent Itemset: An itemset whose support is greater
than or equal to a user-specified minimum support.

• Maximal Frequent Itemset: A frequent itemset is maximal
if none of its immediate supersets is frequent.

• Closed Frequent Itemset: An itemset is closed if none of
its immediate supersets has the same support as the
itemset. A closed frequent itemset is a closed itemset
whose support also meets the minimum support.
Downward closure property:
• All subsets of any frequent itemset must also be
frequent.
• If {milk, bread, butter} is a frequent itemset, then
the following itemsets are also frequent:
{milk}
{bread}
{butter}
{milk, bread}
{milk, butter}
{bread, butter}
• With k items, up to 2^k - 1 frequent itemsets can be
generated (one for every non-empty subset of the items).
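
The count of 2^k - 1 non-empty itemsets over k items can be checked by enumeration. A minimal sketch; the helper name `nonempty_subsets` is ours, not part of the slides:

```python
from itertools import combinations

def nonempty_subsets(items):
    """Return every non-empty subset of `items` as a sorted tuple."""
    items = sorted(items)
    return [combo
            for size in range(1, len(items) + 1)
            for combo in combinations(items, size)]

subsets = nonempty_subsets({"milk", "bread", "butter"})
print(len(subsets))  # 2**3 - 1 = 7
for s in subsets:
    print(s)
```

The list includes {milk, bread, butter} itself plus the six proper subsets listed above.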
Need for Maximal and Closed itemsets
• Used when the amount of data is huge.

• When computing every frequent itemset is too expensive
and there is no interest in enumerating the subsets, it
is enough to find the maximal frequent itemsets; all of
their subsets are frequent by downward closure.

• Disadvantage of maximal frequent itemsets: although all
their subsets are known to be frequent, the support of
those subsets is not known. Support information is
important for mining rules.

• So closed frequent itemsets are preferred, because they
preserve the support information.
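
Why closed itemsets preserve support: the support of any frequent itemset equals the largest support among the closed itemsets that contain it. A minimal sketch, assuming the closed itemsets and support counts of the five-transaction example used in these slides; they are hard-coded here, and `support_from_closed` is a hypothetical helper name:

```python
# Closed itemsets and their support counts, hard-coded from the
# five-transaction example in these slides (an assumption of this sketch).
closed_supports = {
    frozenset("B"): 5,
    frozenset("AB"): 4,
    frozenset("BD"): 4,
    frozenset("ABD"): 3,
    frozenset("BCD"): 3,
    frozenset("ABCD"): 2,
}

def support_from_closed(itemset):
    """Support of `itemset` = largest support among its closed supersets."""
    itemset = frozenset(itemset)
    candidates = [s for c, s in closed_supports.items() if itemset <= c]
    return max(candidates, default=0)  # 0 if no closed superset exists

print(support_from_closed("A"))   # 4, from {A,B}
print(support_from_closed("CD"))  # 3, from {B,C,D}
print(support_from_closed("AC"))  # 2, from {A,B,C,D}
```

The same lookup is not possible from maximal frequent itemsets alone, since they only say that a subset is frequent, not how frequent.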


Closed Itemset

• An itemset is closed if none of its immediate
supersets has the same support as the itemset.

TID   Items
1     {A,B}
2     {B,C,D}
3     {A,B,C,D}
4     {A,B,D}
5     {A,B,C,D}

Itemset     Support
{A}         4
{B}         5
{C}         3
{D}         4
{A,B}       4
{A,C}       2
{A,D}       3
{B,C}       3
{B,D}       4
{C,D}       3
{A,B,C}     2
{A,B,D}     3
{A,C,D}     2
{B,C,D}     3
{A,B,C,D}   2
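
The closed itemsets of this example can be found by comparing each itemset with its immediate supersets. A brute-force sketch, suitable only for tiny examples; the function names are ours:

```python
from itertools import combinations

# Transactions copied from the TID table above.
transactions = [
    {"A", "B"},
    {"B", "C", "D"},
    {"A", "B", "C", "D"},
    {"A", "B", "D"},
    {"A", "B", "C", "D"},
]
items = sorted(set().union(*transactions))

def support(itemset):
    """Count the transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)

# Support count of every non-empty itemset.
supports = {
    frozenset(c): support(c)
    for k in range(1, len(items) + 1)
    for c in combinations(items, k)
}

def immediate_supersets(itemset):
    """Itemsets obtained by adding exactly one more item."""
    return [itemset | {i} for i in items if i not in itemset]

def is_closed(itemset):
    """Closed: every immediate superset has strictly smaller support."""
    return all(supports[s] < supports[itemset]
               for s in immediate_supersets(itemset))

closed = sorted("".join(sorted(s)) for s in supports if is_closed(s))
print(closed)  # ['AB', 'ABCD', 'ABD', 'B', 'BCD', 'BD']
```

For instance, {A} is not closed because {A,B} has the same support (4), while {B} is closed because every superset of {B} has support below 5.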
Maximal vs Closed Itemsets

[Figure: nested sets. Maximal Frequent Itemsets lie inside
Closed Frequent Itemsets, which lie inside Frequent Itemsets.]
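
The nesting can be checked on the five-transaction example. A sketch under one assumption: a minimum support count of 3, which is our choice and does not appear in the slides:

```python
from itertools import combinations

transactions = [
    {"A", "B"}, {"B", "C", "D"}, {"A", "B", "C", "D"},
    {"A", "B", "D"}, {"A", "B", "C", "D"},
]
MIN_SUPPORT = 3  # hypothetical threshold, not given in the slides
items = sorted(set().union(*transactions))

def support(itemset):
    """Count the transactions that contain `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def supersets(itemset):
    """Immediate supersets: add exactly one item."""
    return [itemset | {i} for i in items if i not in itemset]

all_itemsets = [frozenset(c)
                for k in range(1, len(items) + 1)
                for c in combinations(items, k)]
frequent = {s for s in all_itemsets if support(s) >= MIN_SUPPORT}
closed_frequent = {s for s in frequent
                   if all(support(t) < support(s) for t in supersets(s))}
maximal_frequent = {s for s in frequent
                    if all(t not in frequent for t in supersets(s))}

# Maximal is a subset of closed frequent, which is a subset of frequent.
assert maximal_frequent <= closed_frequent <= frequent
print(len(frequent), len(closed_frequent), len(maximal_frequent))  # 11 5 2
```

With this threshold the maximal frequent itemsets are {A,B,D} and {B,C,D}; both are also closed.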
Alternative Methods for Generating Frequent Itemsets
• Traversal of Itemset Lattice
– General-to-specific vs Specific-to-general
[Figure: three itemset lattices, each running from null at
the top to {a1,a2,...,an} at the bottom, with the frequent
itemset border marked: (a) General-to-specific,
(b) Specific-to-general, (c) Bidirectional]
• The general-to-specific strategy works effectively if
the maximum length of a frequent itemset is not too long.

• In the general-to-specific strategy, pairs of frequent
(k-1)-itemsets are merged to obtain candidate k-itemsets.

• In the specific-to-general strategy, the more specific
(longer) frequent itemsets are considered first, before
the more general (shorter) frequent itemsets.
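
The merge step of the general-to-specific strategy can be sketched as follows. The slides do not say how pairs are chosen; this sketch assumes the common rule of merging two (k-1)-itemsets that agree on their first k-2 items. The input list is the set of frequent 2-itemsets of the five-transaction example at an assumed minimum support count of 3, and `merge_candidates` is a hypothetical name:

```python
def merge_candidates(frequent_prev):
    """Merge pairs of (k-1)-itemsets that share their first k-2 items.

    Itemsets are sorted tuples; the result is a list of candidate
    k-itemsets. No support counting or pruning is done here.
    """
    prev = sorted(frequent_prev)
    candidates = []
    for i, a in enumerate(prev):
        for b in prev[i + 1:]:
            if a[:-1] == b[:-1]:          # same prefix of length k-2
                candidates.append(a + (b[-1],))
    return candidates

frequent_2 = [("A", "B"), ("A", "D"), ("B", "C"), ("B", "D"), ("C", "D")]
print(merge_candidates(frequent_2))
# [('A', 'B', 'D'), ('B', 'C', 'D')]
```

Each candidate still has to be checked against the data, since merging only guarantees that two of its subsets are frequent.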
• Traversal of Itemset Lattice
– Equivalence Classes
null                          null

A   B   C   D                 A   B   C   D

AB  AC  AD  BC  BD  CD        AB  AC  BC  AD  BD  CD

ABC  ABD  ACD  BCD            ABC  ABD  ACD  BCD

ABCD                          ABCD

(a) Prefix tree               (b) Suffix tree
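
In the prefix tree, itemsets that start with the same item fall into one equivalence class; in the suffix tree, itemsets that end with the same item do. A small sketch for the 2-itemsets over A, B, C, D; the helper `classes` is our own name:

```python
from collections import defaultdict
from itertools import combinations

items = ["A", "B", "C", "D"]
pairs = ["".join(c) for c in combinations(items, 2)]

def classes(itemsets, key):
    """Group itemsets into equivalence classes by the given key."""
    groups = defaultdict(list)
    for s in itemsets:
        groups[key(s)].append(s)
    return dict(groups)

prefix_classes = classes(pairs, key=lambda s: s[0])   # first item
suffix_classes = classes(pairs, key=lambda s: s[-1])  # last item

print(prefix_classes)
# {'A': ['AB', 'AC', 'AD'], 'B': ['BC', 'BD'], 'C': ['CD']}
print(suffix_classes)
# {'B': ['AB'], 'C': ['AC', 'BC'], 'D': ['AD', 'BD', 'CD']}
```

The two groupings match the left-to-right order of the 2-itemset row in panels (a) and (b).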


• Traversal of Itemset Lattice
– Breadth-first vs Depth-first

[Figure: order in which the itemset lattice is visited,
(a) Breadth first, (b) Depth first]
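
The two visiting orders can be compared on the prefix tree over A, B, C, D. A minimal sketch that only enumerates itemsets and does no support counting; the function names are ours:

```python
from collections import deque

ITEMS = "ABCD"

def children(node):
    """Extend an itemset (a string in item order) with each later item."""
    start = ITEMS.index(node[-1]) + 1 if node else 0
    return [node + i for i in ITEMS[start:]]

def breadth_first():
    """Visit all 1-itemsets, then all 2-itemsets, and so on."""
    order, queue = [], deque([""])
    while queue:
        node = queue.popleft()
        if node:
            order.append(node)
        queue.extend(children(node))
    return order

def depth_first(node=""):
    """Follow one branch down to its longest itemset before backtracking."""
    order = [node] if node else []
    for child in children(node):
        order.extend(depth_first(child))
    return order

print(breadth_first())
# ['A', 'B', 'C', 'D', 'AB', 'AC', 'AD', 'BC', 'BD', 'CD',
#  'ABC', 'ABD', 'ACD', 'BCD', 'ABCD']
print(depth_first())
# ['A', 'AB', 'ABC', 'ABCD', 'ABD', 'AC', 'ACD', 'AD',
#  'B', 'BC', 'BCD', 'BD', 'C', 'CD', 'D']
```

Depth-first search reaches long itemsets such as ABCD early, which is why it is often paired with the search for maximal frequent itemsets.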


• Representation of Database
– horizontal vs vertical data layout
Horizontal Data Layout

TID   Items
1     A,B,E
2     B,C,D
3     C,E
4     A,C,D
5     A,B,C,D
6     A,E
7     A,B
8     A,B,C
9     A,C,D
10    B

Vertical Data Layout (TIDs of the transactions containing
each item)

A    B    C    D    E
1    1    2    2    1
4    2    3    4    3
5    5    4    5    6
6    7    5    9
7    8    8
8    10   9
9
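
In the vertical layout, the support of an itemset is the size of the intersection of its items' TID-lists. A minimal sketch that builds the vertical layout from the horizontal one; `to_vertical` and `support` are our own helper names:

```python
# Horizontal layout: TID -> items, copied from the table above.
horizontal = {
    1: {"A", "B", "E"}, 2: {"B", "C", "D"}, 3: {"C", "E"},
    4: {"A", "C", "D"}, 5: {"A", "B", "C", "D"}, 6: {"A", "E"},
    7: {"A", "B"}, 8: {"A", "B", "C"}, 9: {"A", "C", "D"}, 10: {"B"},
}

def to_vertical(db):
    """Invert TID -> items into item -> set of TIDs."""
    vertical = {}
    for tid, items in db.items():
        for item in items:
            vertical.setdefault(item, set()).add(tid)
    return vertical

vertical = to_vertical(horizontal)

def support(itemset):
    """Support count = size of the intersection of the TID-lists."""
    return len(set.intersection(*(vertical[i] for i in itemset)))

# Transaction 5 contains C, so the TID-list of C includes 5.
print(sorted(vertical["C"]))  # [2, 3, 4, 5, 8, 9]
print(support("AB"))          # 4  (TIDs 1, 5, 7, 8)
```

No pass over the transactions is needed once the TID-lists exist, at the cost of keeping the lists in memory.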
