
Experiment No. : 6

Aim : Write a program to implement Frequent ItemSet Algorithm using Map-Reduce.


Lab Outcome No. : 8.ITL 801.3
Lab Outcome : Construct scalable algorithms for large Datasets using Map Reduce
techniques.
Date of Performance: 4/4/22
Date of Submission: 8/4/22

Program formation / Execution / Ethical practices (07) | Documentation (02) | Timely Submission (03) | Viva Answer (03) | Experiment Marks (15) | Teacher Signature with date

EXPERIMENT NO : 6

AIM : Write a program to implement Frequent ItemSet Algorithm using Map-Reduce.

THEORY :
Frequent Itemset Problem :
The frequent itemset problem consists of mining a set of items to find a subset of items that have
a strong connection between them.

Example: Given a set of baskets in a supermarket, a frequent itemset could be hamburgers and
ketchup. These items appear frequently in the baskets and, very often, together. In general, a
set of items that appears in many baskets is said to be frequent.

In computing, we could use this algorithm to recommend items for a user to purchase. If
A and B form a frequent itemset, then once a user buys A, B is likely to be a good
recommendation.

In this problem, the number of "baskets" is assumed to be very large, greater than what could fit
in memory. The number of items in a basket, on the other hand, is considered small.

The main challenge in this problem is the amount of data that must be held in memory. A basket
of n items, for example, contains n!/(2!(n-2)!) = n(n-1)/2 pairs of items. We would have to
keep counts for all these pairs across all baskets and iterate through them to find the frequent pairs.
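
As a quick check of the pair count, the sketch below enumerates the pairs in one small basket and
compares the result with n(n-1)/2. The basket contents are made up for illustration only.

```python
from itertools import combinations
from math import comb

# Hypothetical basket, used only for illustration.
basket = ["bread", "hamburger", "ketchup", "milk"]

# Every unordered pair of distinct items in the basket.
pairs = list(combinations(sorted(basket), 2))

n = len(basket)
# n! / (2! (n-2)!) simplifies to n (n - 1) / 2.
expected = n * (n - 1) // 2

print(len(pairs), expected, comb(n, 2))  # 6 6 6
```

The count grows quadratically with the basket size, which is why counting every pair over a very
large number of baskets becomes expensive.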

This is where the Apriori algorithm enters!


The Apriori algorithm is based on the idea that for a pair of items to be frequent, each individual
item should also be frequent.
If the hamburger-ketchup pair is frequent, the hamburger itself must also appear frequently in the
baskets. The same can be said about the ketchup.
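
A minimal sketch of this idea, restricted to pairs, is shown below. It is not the program used in
this experiment: the function name frequent_pairs, the sample baskets and the support threshold are
assumptions chosen for illustration. The first pass counts single items; the second pass counts
only pairs whose two members are both frequent.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(baskets, min_support):
    """Two-pass Apriori restricted to pairs (illustrative sketch)."""
    # Pass 1: count how many baskets contain each single item.
    item_counts = Counter(item for basket in baskets for item in set(basket))
    frequent_items = {i for i, c in item_counts.items() if c >= min_support}

    # Pass 2: count only pairs whose members are both frequent items.
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(set(basket) & frequent_items)
        pair_counts.update(combinations(kept, 2))

    return {p: c for p, c in pair_counts.items() if c >= min_support}


# Hypothetical baskets, used only for illustration.
baskets = [
    ["hamburger", "ketchup", "bread"],
    ["hamburger", "ketchup"],
    ["hamburger", "ketchup", "milk"],
    ["bread", "milk"],
]

print(frequent_pairs(baskets, min_support=3))
# {('hamburger', 'ketchup'): 3}
```

Because bread and milk each appear in only two baskets, no pair containing them is ever counted in
the second pass, which is where the memory saving comes from.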

Solution :
● Frequent itemset mining aims to find regularities in a transaction dataset. MapReduce
maps the presence of each item (or set of items) in a transaction, then reduces these records to
counts so that itemsets with low frequency can be discarded.
● The input consists of a set of transactions, and each transaction contains several items.
● The Map function reads the items from each transaction and emits key-value pairs.
The key is an item and the value is 1.
● After the map phase is completed, the Reduce function is executed; it aggregates the values
corresponding to each key. From the results, the frequent items are selected on the basis of
the minimum support value.
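
The map and reduce steps above can be imitated in plain Python without a Hadoop cluster. The
sketch below is only a single-machine simulation: the function names (map_items, shuffle,
reduce_counts), the sample transactions and the MIN_SUPPORT value are assumptions for illustration.
It also assumes each item is counted at most once per transaction.

```python
from collections import defaultdict


def map_items(transaction):
    """Map: emit (item, 1) for each distinct item in a transaction."""
    return [(item, 1) for item in set(transaction)]


def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reduce: sum the 1s emitted for a key to get its support count."""
    return key, sum(values)


# Hypothetical transactions and threshold, used only for illustration.
transactions = [["a", "b", "c"], ["a", "b"], ["a", "d"], ["b", "c"]]
MIN_SUPPORT = 2

mapped = [kv for t in transactions for kv in map_items(t)]
counts = dict(reduce_counts(k, v) for k, v in shuffle(mapped).items())
frequent = {item: c for item, c in counts.items() if c >= MIN_SUPPORT}

print(sorted(frequent.items()))  # [('a', 3), ('b', 3), ('c', 2)]
```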

It is implemented using the SON Algorithm.


SON (Savasere, Omiecinski, and Navathe) Algorithm
● Repeatedly read small subsets (chunks) of the baskets into main memory and run an in-memory
algorithm on each chunk to find all itemsets that are frequent in that chunk.
● Candidate itemsets:

1. Take the union of all the itemsets found frequent in at least one chunk. Why? The
"monotonicity" idea: an itemset cannot be frequent in the entire set of baskets unless it is
frequent in at least one chunk (with the support threshold scaled down in proportion to the chunk size).

● On a second pass, count all the candidates over the full dataset and keep those that meet the
support threshold.

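SON Algorithm with MapReduce :
● Distributed data mining
● Pass 1 : Find candidate itemsets

1. Map: emit (F, 1) for each chunk
a. F : an itemset that is frequent in the chunk
2. Reduce: take the union of all the (F, 1) keys; the value is ignored

● Pass 2 : Find true frequent itemsets

1. Map: emit (C, v)
a. C : a candidate itemset from Pass 1
b. v : the support of C among the baskets in this chunk
2. Reduce: add up all the v for each C and keep the candidates whose total meets the support threshold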
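
The two passes can be sketched as a single-machine simulation in Python. This is not the Hadoop
program run in the experiment. The names son and local_frequent, the limit of itemsets to size 2,
the number of chunks and the sample baskets are all assumptions made for illustration. Each chunk
uses the support threshold scaled to its share of the data.

```python
from collections import Counter
from itertools import combinations
from math import ceil


def local_frequent(chunk, threshold, max_size=2):
    """In-memory miner for one chunk: itemsets up to max_size with count >= threshold."""
    counts = Counter()
    for basket in chunk:
        items = sorted(set(basket))
        for k in range(1, max_size + 1):
            counts.update(combinations(items, k))
    return {itemset for itemset, c in counts.items() if c >= threshold}


def son(baskets, min_support, num_chunks=2):
    """Simulate the two SON passes on one machine (illustrative sketch)."""
    chunk_size = ceil(len(baskets) / num_chunks)
    chunks = [baskets[i:i + chunk_size] for i in range(0, len(baskets), chunk_size)]

    # Pass 1 map: each chunk emits (F, 1) for its locally frequent itemsets.
    # Pass 1 reduce: the union of the emitted keys is the candidate set.
    candidates = set()
    for chunk in chunks:
        local_threshold = max(1, ceil(min_support * len(chunk) / len(baskets)))
        candidates |= local_frequent(chunk, local_threshold)

    # Pass 2 map: each chunk emits (C, v), the count of candidate C in that chunk.
    # Pass 2 reduce: sum v per candidate and keep totals >= min_support.
    totals = Counter()
    for chunk in chunks:
        for basket in chunk:
            items = set(basket)
            for cand in candidates:
                if items.issuperset(cand):
                    totals[cand] += 1
    return {c: v for c, v in totals.items() if v >= min_support}


# Hypothetical baskets, used only for illustration.
baskets = [
    ["hamburger", "ketchup", "bread"],
    ["hamburger", "ketchup"],
    ["bread", "milk"],
    ["hamburger", "ketchup", "milk"],
]

result = son(baskets, min_support=3)
print(sorted(result.items()))
```

In this example milk is frequent in the second chunk, so it becomes a candidate after Pass 1, but
its total count of 2 falls below the threshold in Pass 2 and it is dropped. Pass 1 can produce
such false positives but no false negatives.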

CONCLUSION :
Thus, the MapReduce programming model can be used to mine frequent itemsets from a dataset.
Frequent itemset mining is one of the most commonly used techniques in data mining, but it has
a long running time on large datasets. Implementing it on Hadoop distributes the work across
nodes, which can reduce the running time and improve efficiency.
OUTPUT :
