THEORY :
Frequent Itemset Problem :
The frequent itemset problem consists of mining a set of items to find a subset of items that have
a strong connection between them.
Example: Given a set of baskets in a supermarket, a frequent itemset would be hamburgers and
ketchup. These items appear frequently in the baskets, and very often together. In general, a
set of items that appears in many baskets is said to be frequent.
In computing, we could use this technique to recommend items for a user to purchase. If
A and B form a frequent itemset, then once a user buys A, B is likely to be a good
recommendation.
In this problem, the number of "baskets" is assumed to be very large, greater than what could fit
in memory. The number of items in a basket, on the other hand, is considered small.
The main challenge in this problem is the amount of data to be held in memory. In a basket of n
items, for example, there are n!/(2!(n-2)!) = n(n-1)/2 pair combinations of items. We would have to
keep all these combinations for all baskets and iterate through them to find the frequent pairs.
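To make the pair count concrete, here is a minimal Python sketch that enumerates the pairs in one basket and checks the result against n(n-1)/2. The helper name `pairs_in_basket` and the sample basket are assumptions made for illustration only.

```python
from itertools import combinations
from math import comb


def pairs_in_basket(basket):
    """Return every unordered pair of distinct items in one basket."""
    # Sorting gives each pair a canonical order, so (a, b) and (b, a)
    # are never counted separately.
    return list(combinations(sorted(set(basket)), 2))


# Hypothetical basket, used only to illustrate the count.
basket = ["bread", "milk", "ketchup", "hamburger"]
pairs = pairs_in_basket(basket)

# The number of pairs matches n! / (2! (n-2)!) = n(n-1)/2.
n = len(set(basket))
assert len(pairs) == comb(n, 2) == n * (n - 1) // 2
print(len(pairs))
```

Because the count grows quadratically with the basket size, storing every pair for every basket quickly exceeds memory when the number of baskets is very large.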
Solution :
● Frequent itemset mining aims to find regularities in a transaction dataset. MapReduce
maps the presence of each set of data items in a transaction and, in the reduce step, discards
the item sets with low frequency.
● The input consists of a set of transactions and each transaction contains several items.
● The Map function reads the items from each transaction and emits a key-value pair for each
one. The key is the item and the value is 1.
● After the map phase is completed, a reduce function is executed that aggregates the values
corresponding to each key. From the results, the frequent items are determined on the basis of
a minimum support value.
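The map and reduce steps described above can be sketched in plain Python, without a Hadoop cluster. This is only an in-memory illustration: the function names (`map_phase`, `reduce_phase`, `frequent_items`) and the sample transactions are assumptions, and a real job would let the framework group the keys between the two phases.

```python
from collections import defaultdict


def map_phase(transaction):
    """Emit an (item, 1) pair for every distinct item in one transaction."""
    return [(item, 1) for item in set(transaction)]


def reduce_phase(pairs):
    """Sum the values that share a key, as a reducer would do per key."""
    counts = defaultdict(int)
    for item, value in pairs:
        counts[item] += value
    return dict(counts)


def frequent_items(transactions, min_support):
    """Keep the items whose total count reaches the minimum support."""
    emitted = [kv for t in transactions for kv in map_phase(t)]
    counts = reduce_phase(emitted)
    return {item: c for item, c in counts.items() if c >= min_support}


# Hypothetical transactions, used only for illustration.
transactions = [
    ["hamburger", "ketchup", "bread"],
    ["hamburger", "ketchup"],
    ["milk", "bread"],
    ["hamburger", "ketchup", "milk"],
]
result = frequent_items(transactions, min_support=3)
print(result)
```

With a minimum support of 3, only hamburger and ketchup survive, since bread and milk each appear in just two transactions.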
1. Take the union of all the frequent itemsets found in each chunk. Why? By the "monotonicity"
idea, an itemset cannot be frequent in the entire set of baskets unless it is frequent in at
least one subset (chunk), with the support threshold scaled to the size of that chunk.
SON Algorithm :
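MapReduce for the two passes
● Distributed data mining
● Pass 1 : Find candidate itemsets
1. Map: (F,1)
a. F : itemset that is frequent in the chunk
2. Reduce: Union all the (F,1)
● Pass 2 : Find the truly frequent itemsets
1. Map: (C,v)
a. C : possible candidate, v : its count in the chunk
2. Reduce: Add all the (C,v) and keep the candidates whose total meets the minimum support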
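The two passes can be simulated in memory as follows. This is a rough sketch, not a Hadoop implementation. The names `local_frequent` and `son`, the naive per-chunk miner, the limit to itemsets of at most two items, and the sample baskets are all assumptions made for illustration. Each chunk uses the minimum support scaled to its share of the baskets.

```python
from collections import Counter
from itertools import combinations


def local_frequent(chunk, threshold, max_size=2):
    """Naive miner: itemsets of up to max_size items whose count in the
    chunk is at least threshold. Itemsets are sorted tuples."""
    counts = Counter()
    for basket in chunk:
        items = sorted(set(basket))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return {s for s, c in counts.items() if c >= threshold}


def son(baskets, min_support, num_chunks=2, max_size=2):
    """Return {itemset: count} for itemsets with count >= min_support."""
    if not baskets:
        return {}
    size = -(-len(baskets) // num_chunks)  # ceiling division
    chunks = [baskets[i:i + size] for i in range(0, len(baskets), size)]

    # Pass 1 map: each chunk emits (F, 1) for its locally frequent itemsets,
    # using the support threshold scaled to the chunk's share of the data.
    # Pass 1 reduce: the union of all F forms the candidate set.
    candidates = set()
    for chunk in chunks:
        local_threshold = min_support * len(chunk) / len(baskets)
        candidates |= local_frequent(chunk, local_threshold, max_size)

    # Pass 2 map: each chunk emits (C, v), the count of candidate C there.
    # Pass 2 reduce: add the v values and keep totals >= min_support.
    totals = Counter()
    for chunk in chunks:
        for basket in chunk:
            items = set(basket)
            for c in candidates:
                if items.issuperset(c):
                    totals[c] += 1
    return {c: v for c, v in totals.items() if v >= min_support}


# Hypothetical baskets, used only for illustration.
baskets = [
    ["hamburger", "ketchup", "bread"],
    ["hamburger", "ketchup"],
    ["milk", "bread"],
    ["hamburger", "ketchup", "milk"],
]
result = son(baskets, min_support=3, num_chunks=2)
print(result)
```

In this run, milk is a candidate after Pass 1 because it is frequent in the second chunk, but Pass 2 rejects it because its total count of 2 is below the minimum support of 3. Pass 1 can therefore produce false positives but never false negatives.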
CONCLUSION :
Thus, the MapReduce programming model is used for mining frequent itemsets from a dataset.
Frequent itemset mining is one of the most commonly used techniques in data mining, and it
has a long running time on large datasets. By implementing frequent itemset mining in Hadoop,
its running time can be reduced and its efficiency improved.
OUTPUT :