Professional Documents
Culture Documents
33
leased any other video in the middle of the pre- means of task execution of projection of the ini-
vious sequence, so that these sequences of trans- tial data base and the obtaining of patterns, with-
actions also fall into the same pattern. out involving a process of lead generation. Using
The researches on mining sequential patterns this technique and approach under the divide and
are based on events that took place in an orderly rule, Han et al. (1996) proposed the algorithm
fashion at the time. FreeSpan (Frecuent Pattern-Project Sequential
Most of the implemented algorithms for the Pattern mining), and Pei et al. (2001) proposes
extraction of frequent sequences, using three dif- PrefixSpan (Prefix-projected Sequential Pattern
ferent types of approaches according to the form mining). In these algorithms the database of se-
of evaluating the support of the candidate se- quences is projected recursively in a set of small
quential patterns. The first group of algorithms is databases from which the fragments of sub se-
based on the ownership apriori, introduced by quences grow based on the current set of fre-
Agrawal and Srikant (1994) in the mining of as- quent sequences, where the patterns are extract-
sociation rules. This property suggests that any ed.
sub pattern from a frequent pattern is also fre- Han et al. (1996)] show that FreeSpan extracts
quent, allowing pruning sequences candidates the full set of patterns and is more efficient and
during the process of lead generation. Based on considerably faster than the algorithm GSP.
this heuristics, Agrawal and Srikant (1995) pro- However, a sub sequence can be generated by
posed algorithms as the AprioriAll and Apri- the combinations of sub strings in a sequence,
oriSome. The substantial difference between the- while the projection in FreeSpan must follow the
se two algorithms is that the AprioriAll generates sequence in the initial database without reducing
the candidates from all the large sequences the length. In addition, it is very expensive the
found, but that might not be lowest panning val- fact that the growths of a sub sequence it will be
ues, however, AprioriSome only counts those explored in any point of the division within a
sequences that are large but lowest panning val- candidate sequence. As an alternative to this
ues, thus reducing the search space of the pat- problem, Pei (2001) proposes PrefixSpan. The
terns. general idea is to examine only the prefixes for
In subsequent work, Srikant and Agrawal the sub project only sequences and their corre-
(1996) propose the same algorithm GSP (Gener- sponding sub sequences postfijas within data-
alization Sequential Patterns), also based on the bases planned. In each of these databases
technical apriori, surpassing previous in 20 mag- planned, it will find the sequential patterns ex-
nitudes of time. Until this time, the algorithms panded exploration only local patterns frequent-
that had been proposed for mining sequential ly. PrefixSpan extracts the full set of patterns and
patterns focused on obtaining patterns taking into their efficiency and implementation are consid-
account only the minimal support given by the erably better both GSP and FreeSpan.
user. But these patterns could fit into transactions The third group is formed by algorithms that
that had been given at intervals of time very dis- kept in memory only information necessary for
tant, which was not convenient for the purposes the evaluation of the bracket. These algorithms
of mining. So, in this paper, we propose the idea are based on the calls of occurrence lists that
that in addition to the minimal support, the user contain the description of the location where the
could be in the ability to specify your interest in patterns occur in the database. Under this ap-
obtaining patterns that fit into transactions that proach, Zaki (2001) proposes the SPADE algo-
have been given in certain periods of time, and rithm (Sequential Pattern Discovery using
this is made from the inclusion of restrictions on Equivalence classes) where he introduces the
the maximum and minimum distance, the size of technical processing of the data base to vertical
the window in which the sequences and the in- format, in addition there is a difference from the
heritance relationships - taxonomies, which are algorithms based on apriori, it does not perform
cross-relations through a hierarchy. multiple passes on the database, and you can ex-
In these algorithms based on the principle of tract all the frequent sequences in only three
apriori, the greater effort focused on developing passes. This is due to the incorporation of new
specific structures that allow sequential patterns techniques and concepts such as the list of identi-
represent the candidates and in this way make the fiers (id-list) with vertical format that is associat-
counting operations support more quickly. ed with the sequences. In these lists by means of
The second group is the algorithms that seek temporary unions can be generated frequent se-
to reduce the size of the set of scanned data, by quences. Also used the grid based approach to
34
break down the search space in small classes that consequently, its performance is better than
can be processed independently in the main Wang et al.
memory. Also, uses the search in both breadth Fiot (2008) in her work suggests that an item
and depth to find the frequent sequences within quantitative is partitioned into several fuzzy sets.
each class. In the context of fuzzy logic, a diffuse item is the
In addition to the techniques mentioned earli- association of a fuzzy set b to its corresponding
er, Lin and Lee (2005) proposes the first algo- item x, i.e. [x,b]. In the DB each record is asso-
rithm that implements the idea of indexing called ciated with a diffuse item [x,b] according to their
Memisp memory (Memory Indexing for sequen- degree of membership. A set of diffuse items
tial pattern mining). The central idea of Memisp will be implicated by the pair (X,B), where X is
is to use the memory for both the data streams as the set of items, and B is a set of fuzzy sets.
to the indexes in the mining process and imple- In addition, it argues that a sequence g-k-
ment a strategy of indexing and search to find all sequence (s1, s2,…, sp) is formed by g item sets
frequent sequences from a sequence of data in diffuse s=(X,B) grouped to diffuse k items [x,b],
memory, sequences that were read from the da- therefore the sequential pattern mining diffuse
tabase in a first tour. Only requires a tour on the consists in finding the maximum frequency dif-
basis of data, at most, two for databases too fuse g-k-sequence.
large. Also avoids the generation of candidates Fiot (2008), provides a general definition of
and the projection of database, but presented as frequency of a sequence, and presents three algo-
disadvantage a high CPU utilization and rithms to find the fuzzy sequential patterns:
memory. SpeedyFuzzy, which has all the objects or items
The fourth group of algorithms is composed of of a fuzzy set, regardless of the degree, if it is
all those who use fuzzy techniques. One of the greater than 0 objects have the same weight,
first work performed is the Wang et al. (1999), MiniFuzzy is responsible for counting the objects
who propose a new data-mining algorithm, or items of a fuzzy set, but supports only those
which takes the advantages of fuzzy sets theory, items of the sequence that candidate have a
to enhance the capability of exploring interesting greater degree of belonging to a specified thresh-
sequential patterns from the databases with quan- old; and TotallyFuzzy that account each object
titative values. The proposed algorithm integrates and each sequence. In this algorithm takes into
concepts of fuzzy sets and the AprioriAll algo- account the importance of the set or sequence of
rithm to find interesting sequential patterns and data, and is considered the best grade of mem-
fuzzy association rules from transaction data. bership.
The rules can thus predict what products and
quantities will be bought next for a customer and 4 Description of the Problem
can be used to provide some suggestions to ap-
A sequence s, denoted by <e1e2… in>, is an or-
propriate supervisors.
dered set of n elements, where each element ei is
Wang et al. (1999) propose fuzzy quantitative
a set of objects called itemset. An itemset, which
sequential patterns (FQSP) algorithm, where an
is denoted by (x1 [c1], x2 [c2] , …, Xq[cq] ), is a
item’s quantity in the pattern is represented by a
non-empty set of elements q, where each element
fuzzy term rather than a quantity interval. In their
xj is an item and is represented by a literal, and cj
work an Apriori-like algorithm was developed to
is the amount associated with the item xj that is
mine all FQSP, it suffers from the same weak-
represented by a number in square brackets.
nesses, including: (1) it may generate a huge set
Without loss of generality, the objects of an ele-
of candidate sequences and (2) it may require
ment are supposed to be found in lexicographical
multiple scans of the database. Therefore, an
order by the literal. The size of the sequence s,
Apriori-like algorithm often does not have a
denoted by |s|, is the total number of objects of
good performance when a sequence database is
all elements of the s, so a sequence s is a k-
large and/or when the number of sequential pat-
sequence, if |s|=k.
terns to be mined is large.
For example, <(a[5])(c[2])(a[1])>,
Chen et al. (2006) propose divide-and-conquer
<(a[2],c[4])(a[3])> and <(b[2])(a[2],e[3])> are
fuzzy sequential mining (DFSM) algorithm, to
all 3-sequences. A sequence s = <e1e2… in> is a
solve the same problem presented by Hong using
sub-sequence of another sequence of s'=<e1'e2'…
the divide-and-conquer strategy, which possesses
em'> if there are 1≤i1<i2<…<in≤m such that e1⊆
the same merits as the PrefixSpan algorithm;
ei1', e2⊆ei2 ', ... , and en⊆ ein'. The sequence s'
35
contains the sequence s if s is a sub-sequence of Consider the sequence C6, which consists of
s'. three elements, the first has the objects b and c,
Similarly, <(b,c)(c)(a,c,e)> contains the second has the object c, and the third has the
<(b)(a,e)> where the quantities may be different. objects a, c, and e. Therefore, the support of
The support (sup) of a sequential pattern X is <(b)(a)> is 4/6 since all the sequences of data
defined as the percentage on the fraction of rec- with the exception of C2 and C3 contain a
ords that contains X the total number of records <(b)(a)>. The sequence <(a,d)(a)> is a sub se-
in the database. The counter for each item is in- quence of both C1 and C4; and therefore,
creased by one each time the item is found in <(a,d)(a)>.sup=2/6.
different transactions in the database during the Given the pattern <(a)> and the frequent item
scanning process. This means that the counter of b, gets the pattern type-1 <(a)(b)> adding (b) to
support does not take into account the quantity of <(a)>, and the pattern type-2 <(a,b)> by the
the item. For example, in a transaction a custom- extension of <(a)> with b.
er buys three bottles of beer, but only increases Similarly, <(a)> is the prefix pattern (_pat)
the number of the counter to support {beer} by which in turn is a frequent sequence, and b is the
one; in other words, if a transaction contains an stem of both: <(a)(b)> and <(a,b)>.
item, then, the support counter that item only is Note that the sequence null, denoted by <>, is
incremented by one. the _pat of any 1-frequent sequence. Therefore,
Each sequence in the database is known as a a k-sequence is like a frequent pattern type-1 or
sequence of data. The support of the sequence s, type-2 of a (k-1)-frequent sequence.
is denoted as s.sup, and represents the number of
sequences of data that contain s divided by the 5 Algorithm for the discovery of se-
total number of sequences that there is in the da- quential patterns with quantity fac-
tabase. minSup threshold is the minimum speci- tors - MSP-QF
fied by the user. A sequence s is frequent if
s.sup≥minSup, therefore it will be a sequential The algorithm for mining sequential patterns
pattern frequently. with quantity factors, arises from the need to dis-
Then, given the value of the minSup and a da- cover from a database of sequences, the set of
tabase of sequences, the problem of the sequen- sequential patterns that include the amounts as-
tial pattern mining is to discover the set of all sociated with each of the items that are part of
sequential patterns whose supports are greater the transactions in the database, since having this
equal to the value of the minimum support additional information can be known with greater
(s.sup≥ minSup). precision not only what is the relationship with
Definition: given a pattern and a frequent respect to the time that exists between the vari-
item x in the database of sequences, ' is a: ous items involved in the transactions of a se-
Pattern Type-1: if ' can be formed by add- quence, but also as is the relationship to the
ing to the itemset that contains the item x, amount of these items.
The algorithm MSP-QF, it is based on the idea
as a new element of .
of the use of prefixes, and the creation of indexes
Pattern Type-2: if ' can be formed by the from the database of sequences or other indices
extension of the last element of with x. that are generated during the mining process,
The item x is called stem of the sequential pat- where recursively searching for frequent pat-
tern ', and prefix is the pattern of '. terns. As a result of the exploration of a particu-
That is, the following database of sequences of lar index, fewer and shorter sequences of data
figure 1, which includes amounts for the items need to be processed, while the patterns that are
and that, has six sequences of data. found will be made longer.
In addition, if the database is very large se-
Sequences
C1 = <(a[1],d[2]) (b[3],c[4]) (a[3],e[2])>
quence uses the techniques of partitioning in a
C2 = <(d[2],g[1]) (c[5],f[3]) (b[2],d[1])> manner that the algorithm is applied to each of
C3 = <(a[5],c[3]) (d[2]) (f[2]) (b[3])> the partitions as if it were a database of lesser
C4 = <(a[4],b[2],c[3],d[1]) (a[3]) (b[4])>
C5 = <(b[3],c[2],d[1]) (a[3],c[2],e[2]) (a[4])> size.
C6 = <(b[4],c[3]) (c[2]) (a[1],c[2],e[3])>
36
5.1 Procedure of the algorithm MSP-QF Step 5: When there is no more stems, filtered
index stems all those who are frequent. For all
Listed below are the steps of the proposed al-
stems (sequences) frequently, we proceed to dis-
gorithm.
cretize the quantities that were associated with
each item and stored in the set of frequent pat-
Step 1: Partitioning and scanning of the data-
terns. For this, before adding it to the set of fre-
base of sequences. Depending on the size of the
quent patterns, we proceed to verify that the
database are applicable to so it can be partitioned
common pattern found recently has not already
and formatted through and then to scan each of
been added before this set as a result of applying
the partitions of independently. For each parti-
the algorithm to a partition of the database that
tion, the sequences are constructed and stored in
was processed with previously. If frequent pat-
the structure DBSeq. At the same time generates
tern already exists in the set of frequent patterns,
the index of items where is stored the support for
the discretization process is again applied to the
each one of them, which is found during the
set of quantities associated with the sequence is
scanning process.
stored as a frequent pattern and set of quantities
of newly discovered frequent pattern; otherwise,
Step 2: The index of items are filtered out
the common pattern found in the current partition
those that are frequent, i.e., whose support is
is added directly to the set of frequent patterns.
greater than or equal to minSup determined by
Then we proceed to perform recursively steps
the user. All these items come to form sequences
3, 4 and 5 with each one of the frequent patterns
of size |s| =1, therefore, form the set of 1-
that are found in the process.
sequences. For all these sequences frequent item
is to write the amounts associated with each item
Discretization Function: This function is re-
to the time it is saved in the whole of frequent
sponsible for making the set of quantities associ-
patterns.
ated with an item, the range of values given by
the mean and standard deviation of this set. For
Step 3: For each one of the frequent patterns
example, given the sequences of the figure 1, the
, found in step 2, or as a result of the step 4, the
set of quantities associated with the item <(a)>
index is constructed _idx, with inputs (ptr_ds, is: 1,5,4,3,1, which after being discretized would
pos), where ptr_ds refers to a sequence of the DB be the interval formed by: [ 2.8±1.6 ]
in which appears the pattern, and pos is the
pair (posItemSet, posItem), where posItemSet is To summarize the steps carried out in the pro-
the position of the itemset in the sequence and posed algorithm, figure 3 shows a schematic of
posItem the position of the item in the itemset the entire procedure.
from the sequence where the pattern appears.
The values of pos allow the following scans are
performed only on the basis of these positions in 5.2 Algorithm specification MSP-QF
a certain sequence. Here we show the specification of the pro-
posed algorithm MSP-QF.
Step 4: Find the stems of type-1 and/or type-2
for each pattern and its corresponding index Algorithm MSP-QF
_idx generated in the previous step, considering In: DB = database sequences
only those items of the sequences referred to in minSup = minimum support
partition = number of sequences included in each of the partitions
_idx and the respective values of pos. At the Out: set of all sequential patterns with quantity factors.
same time as are the stems are calculated their Procedure:
1. Partitioning the DB
supports, and in addition is added to the list of 2. Each partition scan it in main memory and:
quantities of the item that is part of the stem the (i) build sequences and store them in DBSeq structure.
amount referred to in the item of the sequence of (ii) index the items and determine the support of each item.
(iv) associate the quantities of each item in a sequence list of item
the DB which is being examined. The infor- quantities in the index.
mation of the stems and their quantities are 3. Find the set of frequent items
4. For each frequent item x,
stored in another index of stems. This step is re- (i) form the sequential pattern = <(x)>
peated for the same pattern, until they were no (ii) call Discretize() to discretize the set of quantities associated
longer more stems from this. with each item x.
(iii) storing in the set frequent patterns.
(iv) call Indexing (x, <>, DBSeq) to build the -idx index.
(v) call Mining (, -idx) to obtain patterns from index -idx.
37
Subrutine Indexing (x, , set_Seq) 3. For each stem x found in the previous step,
Parameters: (i) form a sequential pattern '' from the prefix pattern -pat and
x = one stem type-1 or type-2; the stem x.
= prefix pattern (-pat); (ii) call Discretize(') to discretize the amounts associated with
set_Seq = set of data sequences the items of '.
/ * If set_Seq is an index, then each data sequence in the index is (iii) call Index(', , ’-idx) to build the index ’-idx.
referenced by the element ptr_ds¸ which is formed at the input (iv) call Mining(’, '-idx) to discover sequential patterns from
(ptr_ds, pos) index * / index ’-idx
Out: índex '-idx, where ' represents the pattern formed by the stem x
and prefix pattern -pat.
Procedure:
1. For each data sequence ds of set_Seq Subrutina Discretize()
(i) If set_Seq = DBSeq the pos_inicial = 0, else pos_inicial = pos. Parameters:
(ii) Find the stem in each sequence ds from the position = a pattern that is a sequence;
(pos_inicial + 1), Output: the arithmetic mean and standard deviation of the amounts
1. If the stem x is in position pos in ds, then insert a pair (ptr_ds, associated with each item pattern.
pos) in '-idx index, where ptr_ds reference to ds. Procedure:
2. If the stems x is equal to the item x’ of the ds sequence, add- 1. For each itemset do
ed the quantity q associated with the item x’, to the list of a) For each item x do
quantities related to x. (i) Calculate the arithmetic mean and standard deviation of
2. Return the '-idx index. the set of quantities associated with the item x
(ii) storing the arithmetic mean and standard deviation in the
pattern
Subrutine Mining(,-idx)
Parameters:
= a pattern;
-idx = an índex.
Procedure:
1. For each data sequence ds referenced by ptr_ds of input (ptr_ds,
pos) in -idx,
(i) Starting from the (pos +1) position until |ds|, determining poten-
tial stems and increase in one support each of these stems.
2. Filter those stems that have a large enough support.
38
In the test with minSup=2% were obtained
6 Experiments and Result 824 sequential patterns with quantity factors,
some of which are:
The experiments to test the technical proposal Olive[ 1.06±0.51 ]$
were implemented in two different scenarios, Olive[ 1.03±0.5 ]$ Paprika[ 0.37±0.23 ]$
which are described below. Olive[ 1.01±0.5 ]$ Porro[ 0.57±0.27 ]$
Celery[ 0.56±0.25 ]$ Lettuce[ 0.53±0.24 ], Lentils[ 0.54±0.23 ]$
Celery[ 0.56±0.26 ]$ Lettuce[ 0.55km Air ±0.24 ],
6.1 Scenario 1: Real data Paprika[ 0.34±0.17 ]$
Lettuce[ 0.61±0.25 ], Lentils[ 0.56±0.26 ]$ Porro[ 0.59±0.24 ],
The technique was applied in the analysis of Paprika[ 0.33±0.15 ]$
the market basket of a supermarket. These tests Porro[ 0.54±0.27 ], Lentils[at 0.62±0.25 ]$ Lentils[ 0.58±0.25 ]$
consisted of obtain the set of frequent sequential Paprika[ 0.35±0.17 ]$
MEMISP MSP-QF
This second scenario is generated multiple da-
minSup tabases (datasets) of synthetic form by means of
Exe.Time No. Exe.Time No.
(%)
(seg.) Patterns (seg.) Patterns the Synthetic Data Generator Tool.
10.00 4 50 5 50 The process followed to synthetic generation
2.00 12 824 15 824
1.50 16 1371 19 1371 of the dataset, it is the describing Agrawal and
1.00 22 2773 27 2773 Srikan (1995), and under the parameters referred
0.75 28 4582 35 4582 to in the work of Lin and Lee (2005).
0.50 39 9286 50 9286
0.25 72 30831 89 30831 In this scenario, tests were carried out both of
effectiveness, efficiency and scalability.
The evidence of effectiveness and efficiency
Analysis of the market basket
100 were made with dataset generated with the fol-
90
lowing parameters: NI = 25000, NS = 5000, N =
Execution time (seg.)
80
70 10000, |S| = 4, |I| = 1.25, corrS = 0.25, crupS =
60 0.75, corrI = 0.25 and crupI = 0.75.
50
40
MEMISP The results of these tests were compared with
30 MSP-QF the results obtained for the algorithms Pre-
20
10
fixSpan-1, PrefixSpan-2 and Memisp.
0
10 2 1.5 1 0.75 0.5 0.25 Efficiency Tests: Ran a first subset of tests for
Minimal Support (%)
|C|=10 and a database of 200,000 sequences,
Figure 4. Results of the tests for scenario 1 with different values for minSup. The results are
shown in figure 5.
39
1600 Efficacy tests: Were carried out 92 efficacy
Exxecution Tme (seg.)
1400 trials with the same dataset of the tests of effi-
1200 ciency. In four of the tests carried out with values
1000
|C| =20 and |T| equal to 2.5, 5 and 7.5 respective-
800
600
ly, it did not achieve the same amount of sequen-
400 tial patterns found with the algorithms Pre-
200 fixSpan-1 and PrefixSpan-2. These 4 tests repre-
0
0.25 0.5 0.75 1 1.5 2
sent 4% of the total.
Minimal Support (%)
PrefixSpan-1 PrefixSpan-2 MEMISP MSP-QF Scalability tests: The scalability tests were
used datasets synthetically generated with the
Figure 5. Test results for |C|=10 and |DB|=200K same values of the parameters of the first subset
of tests of efficiency, and with minimal support
The second subgroup of tests was conducted equal to 0.75%. The amount of sequences in the
with a dataset with values for |C| and |T| of 20 dataset for these tests ranged from |DB|=1000K
and 5 respectively. This value of |T| implies that to 10000K, i.e. of a million to 10 million se-
the number of items of transactions increases, quences. In figure 8, you can watch the results of
which represents that the database is also larger these tests.
and more dense with respect to the number of
frequent sequential patterns that may be ob- 40
tained. The results of these tests are those seen in
Execution Time (relative |DB| =
35
figure 6. 30
25
1000KB)
10000 20
Execution Time (seg.)
15
8000
10
6000
5
4000
0
2000 1 2 3 4 5 6 7 8 9 10
Millions of sequences
0
PrefixSpan-1 PrefixSpan-2 MEMISP MSP-QF
0.25 0.5 0.75 1 1.5 2
Minimal Support (%)
PrefixSpan-1 PrefixSpan-2 MEMISP MSP-QF Figure 8, Results of scalability tests for min-
sup=0.75% and different sizes of datasets
Figure 6. Test results for |C|=20, |T|=5 and
|DB|=200K 7 Conclusion
We have proposed an algorithm for the dis-
A last subset of efficiency tests were carried covery of sequential patterns that allows, as part
out under the same parameters of the subset of the mining process, to infer from the amounts
above with the exception of |T| increased to 7.5. associated with each of the items of the transac-
The results are shown in figure 7. tions that make up a sequence, the quantity fac-
tors linked to the frequent sequential patterns.
20000 The technical proposal has been designed in
18000
16000
such a way that uses a compact set of indices in
Execution Time (seg.)
40
but also, what is the relationship of the quantities Karel Filip. 2006. Quantitative and Ordinal Associa-
of some items to others, which enriches the se- tion Rules Mining (QAR Mining). 10th Internation-
mantics provided by the set of sequential pat- al Conference on Knowledge-Based & Intelligent
terns. Information & Engineering Systems (KES 2006).
South Coast, UK: Springer, Heidelberg.
Finally, the results obtained in section 5, we
can conclude by saying that the technical pro- Lin Ming-Yen and Lee Suh-Yin. 2005. Fast Discov-
posal meets the objectives of the mining process; ery of Sequential Patterns through Memory Index-
it is effective, is efficient and is scalable because ing and Database Partitioning. Journal of Infor-
it has a linear behavior in accordance with the mation Science and Engineering.
sequence database grows, and that when applied Market-Basket Synthetic Data Generator,
to large data bases his performance turned out to http://synthdatagen.codeplex.com/.
be better than the techniques discussed in this Molina L. Carlos. 2001. Torturando los Datos hasta que
work. Confiesen. Departamento de Lenguajes y Sistemas In-
formáticos, Universidad Politécnica de Cataluña. Bar-
Reference celona, España.
Agrawal Rakesh and Srikant Ramakrishnan. 1994. Papadimitriou Stergios and Mavroudi Seferina. 2005.
Fast algorithms for mining association rules. In The fuzzy frequent pattern Tree. In 9th WSEAS Inter-
Proceeding 20th International Conference Very national Conference on Computers. Athens, Greece:
Large Data Bases, VLDB. World Scientific and Engineering Academy and Soci-
ety.
Agrawal Rakesh and Srikant Ramakrishnan. 1995.
Pei Jian, Han Jiawei, Mortazavi-Asl Behzad and Pinto
Minning Pattern Sequential. 11th International
Helen. 2001. PrefixSpan: Mining sequential patterns
Conference on Data Engineering (ICDE’95), Tai-
efficiently by prefix-projected pattern growth. In
pei, Taiwan. ICDE '01 Proceedings of the 17th International Con-
Agrawal Rakesh and Srikant Ramakrishnan. 1996. ference on Data Engineering.
Mining Quantitative Association Rules in Large Srikant Ramakrishnan and Agrawal Rakesh. 1996.
Relational Tables. In Proceeding of the 1996 ACM Mining Sequential Patterns: Generalizations and
SIGMOD Conference, Montreal, Québec, Canada. Performance Improvements. In Proc.5th Int. Conf.
Alatas Bilal, Akin Erhan and Karci Ali. 2008. Extending Database Technology (EDBT’96), pages
MODENAR: Multi-objective differential evolution algo- 3–17, Avignon, France.
rithm for mining numeric association rules. Applied Soft
Takashi Washio, Yuki Mitsunaga and Hiroshi Moto-
Computing.
da. 2005. Mining Quantitative Frequent Itemsets
Chen Yen-Liang and Haung T. Cheng-Kui. 2006. A Using Adaptive Density-Based Subspace Cluster-
new approach for discovering fuzzy quantitative ing. In Fifth IEEE International Conference on Da-
sequential patterns in sequence databases. Fuzzy ta Mining (ICDM'05). Houston, Texas, USA:
Sets and Systems 157(12):1641–1661. IEEE Computer Society.
Fiot Céline. 2008. Fuzzy Sequential Patterns for Wang Shyue, Kuo Chun-Yn and Hong Tzung-Pei.
Quantitative Data Mining. In Galindo, J. (Ed.), 1999. Mining fuzzy sequential patterns from quan-
Handbook of Research on Fuzzy Information Pro- titative data”, 1999 IEEE Internat. Conference Sys-
cessing in Databases. tems, Man, and Cybernetics, vol. 3, 1999, pp. 962–
966.
Han Jiawei, Pei Jian, Mortazavi-Asl Behzad, Chen
Qiming, Dayal Umeshwar and Hsu Mei-Chun. Zaki Mohammed J. 2001. SPADE: An efficient algo-
1996. Freespan: Frequent pattern-projected se- rithm for mining frequent sequences. Machine
quential pattern mining. Conference of the sixth Learning, 42(1-2):31–60.
ACM SIGKDD international conference on
Knowledge discovery and data mining.
41