Published by: ijcsis on Oct 12, 2011
Copyright: Attribution Non-commercial

Parallel Edge Projection and Pruning (PEPP) Based Sequence Graph Protrude Approach for Closed Itemset Mining

Kalli Srinivasa Nageswara Prasad
Research Scholar in Computer Science
Sri Venkateswara University, Tirupati, Andhra Pradesh, India.

Prof. S. Ramakrishna
Department of Computer Science
Sri Venkateswara University, Tirupati, Andhra Pradesh, India.

Abstract:
Past observations have shown that a frequent item set mining algorithm should mine the closed item sets, because the closed ones give a compact yet complete result set and better efficiency. However, the latest closed item set mining algorithms follow a candidate maintenance-and-test paradigm, which is expensive in both runtime and space usage when the support threshold is low or the item sets become long. Here we present PEPP, an efficient algorithm for mining closed sequences without candidate maintenance. It implements a novel sequence closure checking scheme based on Sequence Graph protruding, using an approach called “Parallel Edge Projection and Pruning”, PEPP for short. A thorough evaluation on sparse and dense real-life data sets showed that PEPP outperforms older algorithms: it consumes less memory and is faster than the algorithms frequently cited in the literature.

Key words – Data Mining; Graph Based Mining; Frequent Itemset; Closed Itemset; Pattern Mining; Candidate; Itemset Mining; Sequential Itemset Mining.
I.
 
INTRODUCTION
Sequential item set mining is an important task with many applications, including market, customer and web log analysis and item set discovery in protein sequences. Efficient mining techniques have been studied extensively, including general sequential item set mining [1, 2, 3, 4, 5, 6], constraint-based sequential item set mining [7, 8, 9], frequent episode mining [10], cyclic association rule mining [11], temporal relation mining [12], partial periodic pattern mining [13], and long sequential item set mining [14]. It has recently become widely accepted that, when mining frequent item sets, one should mine all the closed ones, since this leads to a compact and complete result set with high efficiency [15, 16, 17, 18]. Unlike frequent item set mining, however, there are few methods for mining closed sequential item sets. This is due to the difficulty of the problem, and CloSpan is the only algorithm of this kind [17]. Similar to the frequent closed item set mining algorithms, it follows a candidate maintenance-and-test paradigm: it maintains a set of already mined closed sequence candidates, which is used to prune the search space and to verify whether a newly found frequent sequence is closed. Unfortunately, a closed item set mining algorithm under this paradigm scales poorly in the number of frequent closed item sets, because a large number of frequent closed item sets (or just candidates) consumes memory and leads to a large search space for the closure checking of new item sets. This happens when the support threshold is low or the item sets become long.

Finding a way to mine frequent closed sequences without candidate maintenance seems difficult. Here we present a solution, the PEPP algorithm, which efficiently mines the complete set of frequent closed sequences through a sequence graph protruding approach. In PEPP, we do not need to keep track of any previously found frequent closed sequence for a new pattern's closure checking. This leads to the proposal of the Sequence Graph edge pruning technique and other optimization techniques.

Our observations show the performance of PEPP in finding closed frequent itemsets using the Sequence Graph. The comparative study shows some interesting performance improvements over BIDE and other frequently cited algorithms.

Section II reviews the most frequently cited work and its limits. Section III explains the dataset adoption and formulation. Section IV introduces PEPP and its use for Sequence Graph protruding. Section V describes the algorithms used in PEPP. Section VI summarizes the results of a comparative study, followed by the conclusion of the study.

(IJCSIS) International Journal of Computer Science and Information Security, Vol. 9, No. 9, September 2011. http://sites.google.com/site/ijcsis/ ISSN 1947-5500

II.
 
RELATED WORK
The sequential item set mining problem was introduced by Agrawal and Srikant, who also developed a filtered algorithm, GSP [2], based on the Apriori property [19]. Since then, many sequential item set mining algorithms have been developed for efficiency, among them SPADE [4], PrefixSpan [5], and SPAM [6]. SPADE is based on a vertical id-list format and uses a lattice-theoretic method to decompose the search space into many small spaces. PrefixSpan, on the other hand, uses a horizontal format dataset representation and mines the sequential item sets with the pattern-growth paradigm: grow a prefix item set to obtain longer sequential item sets by building and scanning its projected database. Both SPADE and PrefixSpan greatly outperform GSP. SPAM is a more recent algorithm for mining long sequential item sets and uses a vertical bitmap representation. Its reported results show that SPAM is more efficient at mining long item sets than SPADE and PrefixSpan, but it still takes more space than both.

Since the introduction of frequent closed item set mining [15], many efficient frequent closed item set mining algorithms have been proposed, such as A-Close [15], CLOSET [20], CHARM [16], and CLOSET+ [18]. Most of these algorithms maintain the already mined frequent closed item sets in order to perform item set closure checking. To reduce the memory usage and search space for item set closure checking, two algorithms, TFP [21] and CLOSET+, implement a compact 2-level hash-indexed result-tree structure to keep the already mined frequent closed item set candidates. Some of the pruning methods and item set closure checking methods they introduced can also be extended to optimize the mining of closed sequential item sets.

CloSpan is a recent algorithm for mining frequent closed sequences [17]. It follows the candidate maintenance-and-test method: it first creates a set of closed sequence candidates stored in a hash-indexed result-tree structure and then performs post-pruning on it. It uses pruning techniques such as Common Prefix and Backward Sub-Item set pruning to prune the search space. Because CloSpan must maintain the set of closed sequence candidates, it consumes much memory and faces a large search space for item set closure checking when there are many frequent closed sequences. As a result, it does not scale well with the number of frequent closed sequences.

BIDE [26] is another closed pattern mining algorithm, and it ranks high in performance compared with the other algorithms discussed. BIDE projects the sequences and, after projection, prunes the patterns that are subsets of current patterns if and only if the subset and superset have the same support. However, this model performs projection and pruning sequentially, and this sequential approach can become expensive when the sequence length is considerably high. In our earlier work [27] we discussed some other interesting works published in the recent literature.

Here we introduce Sequence Graph protruding based on edge projection and pruning, an asymmetric parallel algorithm for finding the set of frequent closed sequences. The contributions of this paper are:
(A) An improved sequence-graph-based idea for mining closed sequences without candidate maintenance, termed Parallel Edge Projection and Pruning (PEPP) based Sequence Graph Protruding for closed itemset mining. Edge projection is a forward approach that grows as long as an edge with the required support is possible, and during that time the edges are pruned. During this pruning process, the vertices of an edge that differs in support from the next projected edge are considered a closed itemset. Likewise, a sequence of vertices connected by edges with the same support, from which no further projection is possible, is also considered a closed itemset.
(B) Within the Edge Projection and Pruning based Sequence Graph Protruding for closed itemset mining, we create algorithms for forward edge projection and back edge pruning.
(C) The performance results clearly show that the proposed model is highly efficient: it can be more than an order of magnitude faster than CloSpan while using order(s) of magnitude less memory in several cases. It scales well with the database size. Compared with BIDE, the model proves to be equivalent, and its efficiency advantage grows in proportion to the increase in pattern length and data density.

III.
 
DATASET ADOPTION AND FORMULATION
Item Sets ‘I’: A set of diverse elements from which the sequences are generated.
$I = \bigcup_{i=1}^{n} \{ e_i \}$
Note: ‘I’ is the set of diverse elements.
Sequence set ‘S’: A set of sequences, where each sequence contains elements, and each element ‘e’ belongs to ‘I’ and satisfies a function p(e). A sequence can be formulated as
$s = \bigcup_{i=1}^{m} \langle e_i \mid (p(e_i),\ e_i \in I) \rangle$
which represents a sequence ‘s’ of items that belong to the set of distinct items ‘I’.
‘m’: total number of ordered items.
$p(e_i)$: a transaction, where the usage of $e_i$ is true for that transaction.
$S = \bigcup_{j=1}^{t} s_j$
‘S’: represents the set of sequences.
‘t’: represents the total number of sequences; its value is volatile.
$s_j$: a sequence that belongs to ‘S’.
Subsequence: a sequence $s_p$ of sequence set ‘S’ is considered a subsequence of another sequence $s_q$ of sequence set ‘S’ if all items in sequence $s_p$ belong to $s_q$ as an ordered list. This can be formulated as
If $\bigcup_{i=1}^{n} (s_{p_i} \in s_q) \Rightarrow (s_p \subseteq s_q)$
Then $\bigcup_{i=1}^{n} s_{p_i} < \bigcup_{j=1}^{m} s_{q_j}$, where $s_p \in S$ and $s_q \in S$.
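The ordered-list containment in the subsequence definition above can be illustrated with a minimal Python sketch. It assumes that each element of a sequence is a single item rather than an itemset, and the function name `is_subsequence` is hypothetical, not taken from the paper.

```python
from typing import Hashable, Sequence


def is_subsequence(s_p: Sequence[Hashable], s_q: Sequence[Hashable]) -> bool:
    """Return True if all items of s_p occur in s_q in the same relative order."""
    remaining = iter(s_q)
    # `item in remaining` consumes the iterator up to the first match, so every
    # later item of s_p has to be found after the previously matched position.
    return all(item in remaining for item in s_p)


print(is_subsequence(list("ace"), list("abcde")))  # True
print(is_subsequence(list("aec"), list("abcde")))  # False: 'c' does not follow 'e'
```

Gaps between matched items are allowed here, which is one common reading of "as an ordered list"; a contiguous-match variant would need a different check.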
Total Support ‘ts’: the occurrence count of a sequence as an ordered list in all sequences of the sequence set ‘S’ is adopted as the total support ‘ts’ of that sequence. The total support ‘ts’ of a sequence can be determined by the following formulation.
$f_{ts}(s_t) = |\{ s_t < s_p : \text{for each } p = 1..|DB_S| \}|$
$DB_S$: the set of sequences.
$f_{ts}(s_t)$: represents the total support ‘ts’ of sequence $s_t$, which is the number of super-sequences of $s_t$.
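As a rough illustration of the total support count, the sketch below scans a small in-memory sequence database and counts the sequences that contain a given sequence as an ordered list. The names `total_support` and `db_s` and the toy data are assumptions made for this example only.

```python
def is_subsequence(s_t, s_p):
    """True if all items of s_t occur in s_p in the same relative order."""
    remaining = iter(s_p)
    return all(item in remaining for item in s_t)


def total_support(s_t, db_s):
    """Count the sequences of db_s that are super-sequences of s_t."""
    return sum(1 for s_p in db_s if is_subsequence(s_t, s_p))


# Hypothetical toy database of four sequences.
db_s = [list("abcd"), list("acd"), list("bd"), list("abd")]

print(total_support(list("ad"), db_s))  # 3: contained in abcd, acd and abd
print(total_support(list("ca"), db_s))  # 0: 'a' never follows 'c'
```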
Qualified support ‘qs’: the quotient of the total support divided by the size of the sequence database is adopted as the qualified support ‘qs’. Qualified support can be found by using the following formulation.
$f_{qs}(s_t) = \frac{f_{ts}(s_t)}{|DB_S|}$
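As a hypothetical worked instance of the qualified support formulation (the numbers are illustrative and not taken from the paper's experiments), suppose the database holds four sequences and $s_t$ occurs as an ordered list in three of them:

```latex
|DB_S| = 4, \qquad f_{ts}(s_t) = 3
\quad\Longrightarrow\quad
f_{qs}(s_t) = \frac{f_{ts}(s_t)}{|DB_S|} = \frac{3}{4} = 0.75
```

Under these assumed numbers, a required support given as a fraction of the database, for example 0.5, would be met by $s_t$.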
Sub-sequence and Super-sequence:
Sub-sequence: a sequence is a sub-sequence of its next projected sequence if both sequences have the same total support.
Super-sequence: a sequence is a super-sequence of the sequence from which it was projected if both have the same total support.
Sub-sequence and super-sequence can be formulated as
If $f_{ts}(s_t) \geq rs$, where ‘rs’ is the required support threshold given by the user,
and $s_t < s_p$ for any value of $p$, where $f_{ts}(s_t) = f_{ts}(s_p)$.
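The condition above can be read as a check on a pair of sequences: $s_t$ is frequent, $s_p$ extends it, and both have the same total support. The following Python sketch tests exactly that for one pair. The helper names, the parameter `rs` expressed as an absolute count, and the toy database are assumptions for illustration and do not reproduce the paper's graph-based procedure.

```python
def is_subsequence(s_t, s_p):
    """True if all items of s_t occur in s_p in the same relative order."""
    remaining = iter(s_p)
    return all(item in remaining for item in s_t)


def total_support(s, db_s):
    """Number of sequences in db_s that contain s as an ordered list."""
    return sum(1 for seq in db_s if is_subsequence(s, seq))


def is_absorbed(s_t, s_p, db_s, rs):
    """True if s_t is frequent and the longer sequence s_p has the same support."""
    ts_t = total_support(s_t, db_s)
    return (
        ts_t >= rs
        and len(s_p) > len(s_t)
        and is_subsequence(s_t, s_p)
        and ts_t == total_support(s_p, db_s)
    )


db_s = [list("abc"), list("abc"), list("abd")]  # hypothetical toy database

print(is_absorbed(list("ac"), list("abc"), db_s, rs=2))  # True: both have support 2
print(is_absorbed(list("ab"), list("abc"), db_s, rs=2))  # False: support 3 versus 2
```

In the second call the shorter sequence keeps a higher support than its extension, so under the usual notion of closedness it would not be absorbed by that extension.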
 IV.
 
PARALLEL EDGE PROJECTION AND PRUNING BASED SEQUENCE GRAPH PROTRUDE
Preprocess:
As the first stage of the proposal, we perform dataset preprocessing and initialize the itemset database. We find the itemsets with a single element and, in parallel, prune those single-element itemsets whose total support is less than the required support.
Forward Edge Projection:
In this phase, we select all itemsets from the given itemset database as input in parallel. We then start projecting edges from each selected itemset to all possible elements. The first iteration includes the pruning process in parallel; from the second iteration onwards this pruning is not required, which we claim makes the process more efficient than similar techniques such as BIDE. In the first iteration, we project an itemset $s_p$ spawned from a selected itemset $s_i$ from $DB_S$ and an element $e_i$ taken from ‘I’. If $f_{ts}(s_p)$ is greater than or equal to $rs$,
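The Preprocess stage of Section IV (finding single-element itemsets and pruning the infrequent ones in parallel) might be sketched as follows. This is only an illustration under stated assumptions: the function names, the worker count, and the use of a thread pool are choices made for this example, and the paper does not specify its parallel runtime. In CPython a thread pool mainly shows the structure of the parallel step rather than giving a CPU speed-up.

```python
from concurrent.futures import ThreadPoolExecutor


def item_support(item, db_s):
    """Number of sequences in db_s that contain the single element `item`."""
    return sum(1 for seq in db_s if item in seq)


def frequent_single_items(db_s, rs, workers=4):
    """Count each distinct element concurrently and drop those with support below rs."""
    items = sorted({e for seq in db_s for e in seq})
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # One counting task per distinct element; map() keeps the input order.
        supports = list(pool.map(lambda e: item_support(e, db_s), items))
    return {e: ts for e, ts in zip(items, supports) if ts >= rs}


db_s = [list("abc"), list("abd"), list("ac")]  # hypothetical toy database

print(frequent_single_items(db_s, rs=2))  # {'a': 3, 'b': 2, 'c': 2}
```

Here `rs` is treated as an absolute count; element 'd' occurs in only one sequence and is pruned.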
