
IEEE International Conference on Computer, Communication and Control (IC4-2015).

A New Data Structure for Finding Maximum Frequent Itemset in Online Data Mining
Lakhan Yadav (1), Pramod S. Nair (2)
(1, 2) M.I.T.M., Indore, M.P.
(1) Yadavlakhan25@gmail.com, (2) pramodsnair@yahoo.com

Abstract-Frequent itemset mining is the first step of association
rule mining. Association rule mining over online data is one of the
most challenging tasks because the input is a data stream. A data
stream is a huge, infinite, continuous, fast-changing and rapid
sequence of data elements. Traditional techniques for finding
frequent itemsets require many passes, but stream data require a
single scan for finding frequent itemsets, so it is essential to use
online algorithms for streaming data. This paper proposes an
algorithm as well as a data structure for finding the maximum
frequent itemset in online data mining. The data structure consists
of a tree known as the Ordered Tree. The Ordered Tree has 26 paths
if the items are coded as letters of the alphabet; each path starts
with a letter and ends with the character Z. When there are more
unique items, they can be coded with numbers instead. This tree is
also known as a multipath tree, in which every node is connected to
the node holding the same item on the neighbouring path. Every
transaction is inserted into the Ordered Tree in sorted form, and
online frequent itemset mining is then performed. The proposed
algorithm works online as well as offline. Experimental results show
it to be a better algorithm for both online and offline frequent
itemset mining.

Keywords- Frequent Itemset; Frequent Itemset Mining; Online Data
Mining

I. INTRODUCTION

Frequent pattern mining [3] plays an essential role in many data
mining applications, such as mining association rules,
classification and clustering.
An association rule is an expression of the form X → Y, where X and
Y are sets of items. The main goal of association rule mining is to
discover all the rules in the database whose support and confidence
are greater than or equal to the minimum support and the minimum
confidence.
A data stream is a huge, infinite, continuous, fast-changing, rapid
sequence of data elements. Online mining requires a one-pass
algorithm to deal with streaming data for knowledge discovery. The
offline mining process takes its input from a fixed-size data set,
and no more transactions are added once the process of finding
frequent itemsets has started. The classical association rule mining
algorithms on static data collect the count information for all
itemsets and discard the non-frequent itemsets and their count
information after multiple scans of the database.

II. LITERATURE SURVEY

Frequent itemset mining techniques are divided into two categories:
online frequent itemset mining and offline frequent itemset mining.
Mining frequent itemsets from a dynamic data stream is known as
online mining, and mining them from a fixed database is known as
offline mining.

A. Online Frequent Itemset Mining Techniques
An algorithm is considered online if: 1) it gives continuous
feedback, 2) it is user controllable during processing and 3) it
produces an accurate and deterministic result.

estDec [1] and Carma [5] present algorithms that generate continuous
association rules over an online data stream. Online mining means
that an endless sequence of transactions is generated, and the whole
information is scanned to generate the frequent itemsets in the
least number of scans. The online mining method is used to overcome
the problem of fixed data size. It takes values from a network and
updates the dataset regularly; online association rule mining is
therefore required to increase the accuracy of the mining result.

B. Offline Frequent Itemset Mining Technique

Offline mining means finding frequent itemsets from a fixed dataset.
Apriori [2] and FP-tree [4] are the most classical and important
algorithms for mining frequent itemsets from a fixed-size database.
The basic idea behind the Apriori algorithm is to make many passes
over the database. It employs an iterative level-wise search
approach, known as breadth-first search, where k-itemsets are used
to explore (k+1)-itemsets. FP-tree follows a divide-and-conquer
strategy.

III. PROPOSED METHODOLOGY

We propose a new data structure for finding the maximum frequent
item set in online data mining. The data structure consists of a
tree, which is known as the Ordered Tree. It is a one-pass
algorithm.

A. New Data Structure
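As a rough illustration of the Ordered Tree, the following Python sketch models one possible reading of the structure: one path per starting letter, each path running to Z, so that every path is one node shorter than the one before it. The class name `OrderedTree`, its methods, the use of absolute support counts and the rule that a sorted transaction is counted on the path of its first item are all assumptions made for this sketch, not the authors' implementation. Items are assumed to be single uppercase letters.

```python
from string import ascii_uppercase


class OrderedTree:
    """Simplified multipath tree: one path per starting letter, each ending at 'Z'."""

    def __init__(self, alphabet=ascii_uppercase):
        # paths[start][item] is the count of `item` on the path that starts at `start`.
        # The path of the i-th letter holds the letters from that one to the last,
        # so each path is one node shorter than the previous one.
        self.paths = {
            start: {item: 0 for item in alphabet[i:]}
            for i, start in enumerate(alphabet)
        }

    def insert(self, transaction):
        """Sort one transaction and count its items on the path of its first item."""
        items = sorted(set(transaction))
        if not items:
            return
        path = self.paths[items[0]]
        for item in items:
            path[item] += 1

    def support(self, item):
        """Sum the counts of `item` over all paths (the linked same-item nodes)."""
        return sum(path.get(item, 0) for path in self.paths.values())

    def frequent_items(self, min_support):
        """Keep only the items whose total support reaches `min_support`."""
        all_items = self.paths[next(iter(self.paths))]  # the first path lists every item
        return [item for item in all_items if self.support(item) >= min_support]


# Four short sample transactions, inserted one at a time as in a stream.
tree = OrderedTree()
for transaction in ["BCADE", "CDBE", "DEC", "EA"]:
    tree.insert(transaction)

print({item: tree.support(item) for item in "ABCDE"})
print(tree.frequent_items(3))
```

Summing counts across paths stands in for following the links between nodes that hold the same item; a real implementation would more likely keep explicit node objects and pointers.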

Figure 1 Ordered Tree.
Figure 1 shows the Ordered Tree for locating the maximum frequent
itemset. It consists of 26 paths, one for every letter of the
alphabet, and is therefore also known as a multipath tree. In the
Ordered Tree the label of the root node is null, and the child nodes
of each path run in sequence and end with the letter Z. The number
of nodes per path decreases by one from each path to the next.

B. Proposed Algorithm

It is an online algorithm, so it requires only one pass to find the
frequent sets.
The algorithm for finding the maximum frequent item set as well as
the user-specified frequent item set is shown in Figure 2.

Maximum Frequent Item Set Generator
Input: Data stream.
Output: Frequent item set, maximum frequent item set, user-specified
frequent item set.

Insertion Procedure Ordered Tree (T, Node, E)
{
    Root Node = Null;
    Input transaction T from database;
    Lattice L[];
    While (Transaction T != empty)
    do
    {
        For each transaction
            Sort elements e using insertion sort;
        Insert L[e];
        If (child node == Null)
        {
        }
        else if (child node != Null)
        {
            If (child node follows same path)
            {
                Insert L[e];
                Support++;
            }
            else
            {
                Create new node;
                Insert L[e];
            }
        }
    }
}

Pruning Procedure Ordered Tree (E, Support)
{
    Using the previous procedure, find the unique elements and
    their support values in Lattice L;
    While (given support > element support)
    do
    {
        e1 = L[e];
        Delete elements from Lattice L[] whose support value is
        less than the given support value;
        L[e--];
    }
}

Frequent Set Mining (Node, L[e])
{
    While (all paths of Lattice are not traversed)
    {
        Do
        {
            If (single path is obtained)
            {
                If (no element is found)
                {
                    Stop algorithm;
                    Return (empty frequent set);
                }
                else
                {
                    Traverse all elements from Lattice L;
                    Using continuous association rule;
                    Find frequent item set;
                }
            }
            else
            {
                Traverse all multipaths one by one;
                Find all frequent item sets using continuous
                association rules;
            }
        }
    }
}

Finding Maximum Frequent Item Set
{
    From Lattice L[e];
    After finding frequent item set;
    Search each element from node;
    i = 1;
    While (maximum frequent item set != found)
    {
        Max Chain = Linear search L[i];
        i++;
    }
    Return (maximum frequent item set);
}

User Specified Frequent Item Set
{
    From Lattice L[e];
    User input is applied;
    While (Lattice != empty)
    {
        Using linear search in frequent item set;
        If (element == L[e])
    }
    Return (user specified maximum frequent set);
}

Figure 2 Algorithm for Maximum Frequent Item Set In Online Data Mining.

To find the maximum frequent item set in online data mining, the new
data structure works as follows.
Sort the items within each transaction. Then insert the sorted items
into the Ordered Tree according to their alphabetical order (when
the items are encoded as letters).
In the Ordered Tree, nodes that hold the same item on different
paths are connected to each other, which makes finding frequent
itemsets easier.

TABLE 1 VARIOUS TRANSACTION

Tid   Transactions
1     BCADE
2     CDBE
3     DEC
4     EA

Table 1 shows various transactions. The procedure of the proposed
algorithm is as follows.

Sort every transaction, as shown in Table 2.

TABLE 2 SORT TRANSACTION
Tid   Transactions
1     ABCDE
2     BCDE
3     CDE
4     AE

Insert the transactions one by one into the Ordered Tree according
to their prefix structure.
When the same value is repeated, increment its count by one in the
Ordered Tree.

Figure 3 Ordered Tree Insertion.

With the help of Figure 3, the maximum frequent itemset as well as
the user-defined frequent itemset can be found.

C. User Specified Frequent Itemset
The proposed data structure also finds user-specified frequent
itemsets along with frequent itemsets. The difference is that it
brings out only the items of interest from the tree.

IV. RESULT ANALYSIS

Maximum frequent itemset mining in online data mining has been
implemented using real datasets. The Frequent Itemset Mining Dataset
Repository (FIMI) prepared the following datasets. The algorithms
were developed in Java, and the mining time was measured in
milliseconds and the memory usage in kilobytes.

A. DATASETS
In this paper [6], maximum frequent item set mining is performed on
three different datasets (shown in Table 3). These datasets have
different numbers of transactions and features that cover different
cases.

TABLE 3 VARIOUS DATASETS USED FOR EXPERIMENT
S.N.  Dataset Name  Number of Transactions  Size (KB)
1     Accident      1101                    27
2     Kosarak       2774                    82
3     RetailSet     505                     12

B. RESULTS
This section presents and analyzes the results on the Accident,
Kosarak and RetailSet datasets for mining time and memory usage when
finding the maximum frequent item set in online data mining.

1) Mining Time
Mining time is the time an algorithm takes to complete the whole
process of finding frequent itemsets from a data stream. Various
datasets are used to compare the mining time of estDec and the
proposed algorithm.

TABLE 4 MINING TIME BASED ON ACCIDENT DATASET
S.N.  Support  EstDec (ms)  Proposed (ms)
1     0.2      355          185
2     0.3      116          86
3     0.4      67           32
4     0.5      60           7
5     0.6      50           4
6     0.7      41           4
7     0.8      38           4
8     0.9      36           4

Figure 3 Mining Time Based on Accident Dataset.

TABLE 5 MINING TIME BASED ON KOSARAK DATASET
S.N.  Support  EstDec (ms)  Proposed (ms)
1     0.2      859          209
2     0.3      185          105
3     0.4      173          12
4     0.5      166          10
5     0.6      164          17
6     0.7      158          16
7     0.8      172          14
8     0.9      175          10

Figure 4 Mining Time Based on Kosarak Dataset.

TABLE 6 MINING TIME BASED ON RETAILSET DATASET
S.N.  Support  EstDec (ms)  Proposed (ms)
1     0.2      98           4
2     0.3      12           4
3     0.4      10           3
4     0.5      10           3
5     0.6      10           7
6     0.7      13           3
7     0.8      10           3
8     0.9      10           3

Figure 5 Mining Time based on RetailSet Dataset.

2) Memory Consumption
Memory consumption is the amount of primary memory used to find
frequent itemsets from a data stream, as shown in the tables below.
Various datasets are used to compare the memory usage of estDec and
the proposed algorithm.

TABLE 7 MEMORY USAGE BASED ON ACCIDENT DATASET
S.N.  Support  EstDec (KB)  Proposed (KB)
1     0.2      11           7
2     0.3      13           13
3     0.4      13           10
4     0.5      12           11
5     0.6      10           10
6     0.7      15           14
7     0.8      13           10
8     0.9      14           10

Figure 6 Memory usage Based on Accident Dataset.

TABLE 8 MEMORY USAGE BASED ON KOSARAK DATASET
S.N.  Support  EstDec (KB)  Proposed (KB)
1     0.2      10           9
2     0.3      11           10
3     0.4      13           11
4     0.5      14           10
5     0.6      13           11
6     0.7      13           10
7     0.8      12           9
8     0.9      10           9

Figure 7 Memory usage based on Kosarak Dataset.

TABLE 9 MEMORY USAGE BASED ON RETAILSET DATASET
S.N.  Support  EstDec (KB)  Proposed (KB)
1     0.2      13           12
2     0.3      16           15
3     0.4      18           12
4     0.5      14           14
5     0.6      17           10
6     0.7      13           13
7     0.8      15           15
8     0.9      17           12

Figure 8 Memory Usage based on RetailSet Dataset.

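The results show a clear difference between estDec and the proposed
technique. The proposed algorithm is faster than estDec at every
support threshold on all three datasets (for example, 185 ms against
355 ms on the Accident dataset at support 0.2), and it uses the same
amount of memory or less in every case.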
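To put a number on the difference in mining time, the ratio of the estDec time to the proposed algorithm's time can be computed for each support threshold. The sketch below does this for the Accident dataset using the values reported in Table 4; the helper name `speedups` is hypothetical and chosen only for this illustration.

```python
# Mining times in milliseconds copied from Table 4: support -> (EstDec, Proposed).
ACCIDENT_MS = {
    0.2: (355, 185), 0.3: (116, 86), 0.4: (67, 32), 0.5: (60, 7),
    0.6: (50, 4), 0.7: (41, 4), 0.8: (38, 4), 0.9: (36, 4),
}


def speedups(table):
    """Return support -> EstDec time divided by Proposed time."""
    return {support: est / proposed for support, (est, proposed) in table.items()}


ratios = speedups(ACCIDENT_MS)
for support, ratio in ratios.items():
    print(f"support {support}: {ratio:.2f}x")
```

On these figures the ratio ranges from about 1.3 at support 0.3 to 12.5 at support 0.6. The same calculation applies to the Kosarak and RetailSet tables.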

V. CONCLUSIONS AND FUTURE WORK

A new data structure has been proposed for finding the maximum
frequent item set in online data mining. Its purpose is to find
frequent itemsets in streaming data. The proposed data structure
successfully overcomes the problem of unnecessary scanning.
The new data structure finds the maximum frequent item set in online
data mining using an Ordered Tree. This is an approximation-based
approach.

REFERENCES
[1] J. Chang and W. Lee. "Finding Recent Frequent Itemsets Adaptively over
Online Data Streams". In Proc. of the 9th ACM SIGKDD International
Conference on Knowledge Discovery and Data Mining (KDD-2003), 2003.
[2] R. Agrawal and R. Srikant. "Fast Algorithms for Mining Association
Rules". In Proc. of the 20th VLDB Conference, pages 487-499, 1994.
[3] J. Han, J. Pei, and Y. Yin. "Mining Frequent Patterns without Candidate
Generation". In Proc. ACM-SIGMOD Int'l Conf. Management of Data
(SIGMOD), 2000.
[4] C. Borgelt. "An Implementation of the FP-growth Algorithm". In Proc.
Workshop Open Software for Data Mining, pages 1-5, ACM Press, New York,
NY, USA, 2005.
[5] C. Hidber. "Online Association Rule Mining". In Proc. of the ACM SIGMOD
Int'l Conference on Management of Data, pages 145-156, Philadelphia, PA,
May 1999.
[6] A. Asuncion and D. J. Newman. "UCI Machine Learning Repository",
http://www.ics.uci.edu/mlearn/MLRepository.html, CA: University of
California, Department of Information and Computer Science, 2007.
