
JOURNAL OF INFORMATION AND COMMUNICATION TECHNOLOGIES, VOLUME 2, ISSUE 3, MARCH 2012

Suffix Tree; from Beginning to constructing
for Sequences Larger than the Main Memory
M. Alavi
Abstract: A suffix tree is a data structure which represents all the suffixes of a string. Once it has been
built, operations such as finding a substring can be carried out efficiently on long sequences such as DNA, or on
sequences in general. There are different algorithms for constructing a suffix tree. The basic algorithms construct
the suffix tree in linear time, but they are not efficient enough for long sequences, because the input sequence and
the tree do not fit in main memory and the disk has to be accessed. Later algorithms therefore focus on reducing
these disk accesses by constructing the suffix tree in pieces which fit in memory. This paper introduces and reviews
the existing algorithms for constructing the suffix tree.
Index Terms: Indexing of Sequences, Suffix Tree.

1 INTRODUCTION
Bioinformatics is the application of computer science to
molecular biology. At least until the late 1980s, the term
was mainly applied to genetics, and particularly to DNA
sequencing (identifying the order of the nucleotides A, G,
C and T) [3]. Projects in the field of genetics are very
common. For example, the Human Genome Project [8] started in
1990 and was completed in 2003. At present, the sequence of
all human chromosomes is available [2].
Currently, bioinformatics includes the creation and
improvement of databases and algorithms for the analysis of
biological data. The rapid development of genetics-related
technologies in recent decades has produced a large amount
of information on the genomes (genetic content) of human
beings, animals and plants, and has led to the creation of
large databases in this field. The size of the gene database
GenBank doubles every 18 months [4].
Given the size of the databases which contain the genomes of
living organisms, operations such as searching these
databases have become a critical issue, and efficient
algorithms are necessary to use the databases effectively.
Indexing the sequences stored in these databases helps in
designing such algorithms.
Because a DNA sequence is a run of consecutive characters,
it cannot be broken into words efficiently. Hence, indexing
methods for ordinary strings, such as the B-Tree, cannot be
used efficiently for DNA sequences [11].
The suffix tree is a suitable data structure for
representing DNA sequences. It is a rooted tree in which
each edge label represents a substring of the main string.
For each suffix there is a unique path in the tree which
starts at the root and ends in a leaf. After the tree has
been constructed, operations such as matching a pattern of
length m take O(m) time.
The concept of the suffix tree was first introduced in 1973
by Weiner [16], together with a linear-time construction
algorithm. The construction was simplified in 1976 by
McCreight [7]. An online linear-time construction algorithm
was introduced in 1995 by Ukkonen [10]. In the following
years, other algorithms were presented which do not run in
linear time but are more efficient in practice than the
linear-time algorithms.
The remainder of this paper is organized as follows. Section
2 explains the definitions related to the suffix tree.
Section 3 introduces the basic algorithms for constructing a
suffix tree; these algorithms perform no memory management
during construction. Section 4 introduces other algorithms,
each of which tries to improve some aspect of suffix tree
construction. These algorithms accept an increase in
construction time in exchange for less time spent on I/O
operations. The conclusion is presented in Section 5.
2 BASIC DEFINITIONS
The input string S is defined over a finite alphabet Σ as
S[0..n-1] = s_0 s_1 s_2 ... s_{n-1}. The suffix tree of S
represents the n suffixes of S. It is a rooted, directed
tree with the following properties:
1. Every path from the root to a leaf represents a suffix
of S, so there are exactly n leaves.
2. The label of every edge is a non-empty substring of S.
3. Every internal (branching) node has at least two
children.
4. The labels of the edges leaving an internal node cannot
start with the same character.

M. Alavi, MS student, Faculty of Engineering and Technology, Payam-e
Noor University, Tehran.


2012 JICT
www.jict.co.uk

The terminal symbol $, which does not occur in the alphabet,
is appended to the input string so that the first property
holds for the suffix tree: no suffix is then a prefix of
another suffix, so every suffix ends in its own leaf. The
suffix consisting only of this symbol is called the empty
string. Fig. 1 shows the suffix tree for the string TATAT$.
The suffix link is another feature used in suffix trees.
The older linear-time algorithms use suffix links, although
recent algorithms do not. Every internal node other than the
root has a suffix link to another internal node. For each
internal node v with path label xα, where x ∈ Σ and α is a
(possibly empty) substring of S, the suffix link of v points
to the node w whose path label is α. The suffix links of the
suffix tree of TATAT$ are shown in Fig. 2.
The symbols used in this paper are as follows:
• The suffix of S which starts at position i is denoted
S_i = S[i, n-1] (0 ≤ i < n), so S_0 = S and S_n = $. The
i-th character of the string is denoted s_i.
• The prefix P_i is the substring S[0, i].
• The Longest Common Prefix (LCP) of two suffixes S_i and
S_j is the substring S[i, i+k] such that
S[i, i+k] = S[j, j+k] and S[i, i+k+1] ≠ S[j, j+k+1].
For example, for S = TATAT$, LCP_{0,2} = TAT and
|LCP_{0,2}| = 3 (here k = 2).
• The suffix tree is denoted ST.
• Path(u) for a node u is the concatenation of the edge
labels on the path which starts at the root and ends in u.
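As a small illustration of the LCP definition above, the following sketch compares two suffixes character by character. The function name lcp is chosen here for illustration only and does not come from any of the reviewed algorithms.

```python
def lcp(s: str, i: int, j: int) -> str:
    """Return the longest common prefix of the suffixes s[i:] and s[j:]."""
    k = 0
    # Advance while both suffixes still have characters and those characters agree.
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return s[i:i + k]


print(lcp("TATAT$", 0, 2))       # TAT
print(len(lcp("TATAT$", 0, 2)))  # 3
```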
Constructing a suffix tree for a string is not difficult;
the main challenge is constructing it efficiently. The basic
algorithm [1] works as follows. In each stage, ST_{i+1} is
constructed from ST_i by inserting the suffix S_i: starting
from the root of ST_i, we find the longest path which
matches a prefix of S_i. Because of the fourth property of
the suffix tree, this path is unique. The search ends at the
first mismatch. A new node is created at this location and
the edges are labeled accordingly. This algorithm runs in
O(n^2) time for a string of length n.
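The basic algorithm can be sketched as follows. This is a minimal illustration, not the implementation of [1]: the names Node, build_naive, find and count_leaves are hypothetical, edge labels are stored as index pairs into the input string, and the input is assumed to end with a terminal symbol which occurs nowhere else.

```python
class Node:
    def __init__(self, start=0, end=0):
        self.start, self.end = start, end  # edge label is s[start:end]
        self.children = {}                 # first character of edge label -> child


def build_naive(s):
    """Insert the suffixes s[0:], s[1:], ... one by one; O(n^2) time overall."""
    if not s or s.count(s[-1]) != 1:
        raise ValueError("s must end with a unique terminal symbol such as '$'")
    n = len(s)
    root = Node()
    for i in range(n):
        node, pos = root, i
        while True:
            child = node.children.get(s[pos])
            if child is None:
                node.children[s[pos]] = Node(pos, n)  # new leaf for suffix i
                break
            # Follow the edge to child as long as it matches the suffix.
            k = child.start
            while k < child.end and s[k] == s[pos]:
                k += 1
                pos += 1
            if k == child.end:
                node = child                          # whole edge matched
                continue
            # First mismatch inside the edge: split it and hang a new leaf.
            middle = Node(child.start, k)
            node.children[s[child.start]] = middle
            child.start = k
            middle.children[s[k]] = child
            middle.children[s[pos]] = Node(pos, n)
            break
    return root


def find(root, s, pattern):
    """Return True if pattern occurs in s; at most len(pattern) comparisons."""
    node, m = root, 0
    while m < len(pattern):
        child = node.children.get(pattern[m])
        if child is None:
            return False
        k = child.start
        while k < child.end and m < len(pattern):
            if s[k] != pattern[m]:
                return False
            k += 1
            m += 1
        node = child
    return True


def count_leaves(node):
    if not node.children:
        return 1
    return sum(count_leaves(c) for c in node.children.values())


text = "TATAT$"
tree = build_naive(text)
print(count_leaves(tree))        # 6, one leaf per suffix
print(find(tree, text, "ATA"))   # True
print(find(tree, text, "TAA"))   # False
```

The find function also illustrates why matching a pattern of length m needs only O(m) character comparisons once the tree exists.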
3 AN INTRODUCTION TO MEMORY-BASED
ALGORITHMS
The memory required for a suffix tree is much larger than
the memory required for the input string. If the input
string is very large, neither the input string nor the tree
fits in main memory. This is very common for DNA sequences:
the human genome is about 3G characters long. If the input
string and the tree do not fit in main memory, constructing
the tree causes disk accesses, and if these accesses are not
managed, the performance of the algorithm degrades severely.
The algorithms introduced in this section do not manage the
use of main memory; they assume that the input string and
the tree both fit in main memory. Among the algorithms for
constructing the suffix tree, Ukkonen's algorithm [10] and
the WOTD (Write Only Top Down) algorithm [13] belong to this
group.
The main idea of Ukkonen's algorithm is that in the i-th
step the symbol s_i is appended to all suffixes of ST_{i-1},
so the suffix tree can be constructed by scanning the input
string from left to right. ST_0 is created first, ST_1 is
then formed from it, and the process continues until the
full tree is formed. In each step, the suffixes of ST_{i-1}
are extended in order from the longest to the shortest (the
empty string). The extension of a suffix in ST_{i-1} ends at
one of three kinds of node: a leaf, an internal node or a
virtual node (a position inside an edge). If it ends at a
leaf, the label of the leaf's incoming edge is simply
extended and no new node is created, so ST_{i-1} is actually
updated only at internal and virtual nodes. The extension
starts from the active node, which is the node where the
previous iteration of the algorithm ended. If the active
node has an edge which starts with s_i, the active node
moves along this edge and the addition of s_i to the tree is
complete. If the active node has no such edge, a leaf is
added to the tree as a child of the active node; the suffix
links are then followed and s_i is added to the remaining
suffixes. Without suffix links the algorithm could not run
in linear time, because adding a new symbol to the tree
would require a traversal down from the root for every
suffix.
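The following sketch shows one common way to organize the steps described above (active node, virtual node inside an edge, suffix links). It is an illustration under assumptions rather than the code of [10]: the names UNode, build_ukkonen, leaf_count and contains are hypothetical, the bookkeeping with an active edge, an active length and a counter of pending suffixes is one widespread formulation, and the input is assumed to end with a unique terminal symbol.

```python
class UNode:
    def __init__(self, start, end):
        self.start = start    # edge label is s[start:end]
        self.end = end        # None marks a leaf whose edge grows with the input
        self.children = {}    # first character of edge label -> child
        self.link = None      # suffix link (set for internal nodes)


def build_ukkonen(s):
    """Online construction: after step i the tree represents s[0:i+1]."""
    root = UNode(0, 0)
    active_node, active_edge, active_length = root, 0, 0
    remainder = 0                     # suffixes still waiting to be inserted
    for i, ch in enumerate(s):
        last_internal = None          # internal node created earlier in this step
        remainder += 1
        while remainder > 0:
            if active_length == 0:
                active_edge = i
            first = s[active_edge]
            child = active_node.children.get(first)
            if child is None:
                # No edge starts with the symbol: add a leaf below the active node.
                active_node.children[first] = UNode(i, None)
                if last_internal is not None:
                    last_internal.link = active_node
                    last_internal = None
            else:
                end = i + 1 if child.end is None else child.end
                edge_length = end - child.start
                if active_length >= edge_length:
                    # The active point lies beyond this edge: walk down.
                    active_edge += edge_length
                    active_length -= edge_length
                    active_node = child
                    continue
                if s[child.start + active_length] == ch:
                    # The symbol is already on the edge: this step is finished.
                    active_length += 1
                    if last_internal is not None and active_node is not root:
                        last_internal.link = active_node
                    break
                # Mismatch inside the edge: split it at the virtual node.
                split = UNode(child.start, child.start + active_length)
                active_node.children[first] = split
                split.children[ch] = UNode(i, None)
                child.start += active_length
                split.children[s[child.start]] = child
                if last_internal is not None:
                    last_internal.link = split
                last_internal = split
            remainder -= 1
            if active_node is root and active_length > 0:
                active_length -= 1
                active_edge = i - remainder + 1
            elif active_node is not root:
                link = active_node.link
                active_node = link if link is not None else root
    return root


def leaf_count(node):
    if not node.children:
        return 1
    return sum(leaf_count(c) for c in node.children.values())


def contains(root, s, pattern):
    """Return True if pattern is a substring of s, using the finished tree."""
    node, m = root, 0
    while m < len(pattern):
        child = node.children.get(pattern[m])
        if child is None:
            return False
        end = len(s) if child.end is None else child.end
        for k in range(child.start, end):
            if m == len(pattern):
                return True
            if s[k] != pattern[m]:
                return False
            m += 1
        node = child
    return True


text = "TATAT$"
tree = build_ukkonen(text)
print(leaf_count(tree))             # 6
print(contains(tree, text, "ATAT")) # True
```

Because leaf edges are left open (end is None), extending a suffix which ends at a leaf costs nothing, which matches the observation above that updates happen only at internal and virtual nodes.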
Analysis: Online construction of the suffix tree is the most
important property of Ukkonen's algorithm: the suffix tree
can be constructed even if the entire input string is not
yet available. If the input string grows later, there is no
need to rebuild the tree from the beginning, and the new
suffixes can simply be added to the existing tree. None of
the other algorithms presented so far for constructing the
suffix tree has this property. The algorithm assumes that
the input string and the tree can be accessed in constant
time, but in practice, when either of these data structures
exceeds main memory, it is accessed on disk, so the access
time depends on where on the disk the data reside. Since the
traversal of the tree follows no specific pattern, the disk
head has to move from one place to another, which reduces
the efficiency of the algorithm [12]. Node creation is also
random and has no locality; in other words, nodes may be
generated at any time during the algorithm. If sibling nodes
were created consecutively, they would be stored in
consecutive disk locations, which would make the tree easier
to traverse.

Fig. 1. Suffix tree for the string TATAT$.
Fig. 2. Suffix links in the suffix tree of TATAT$.
The main idea of the WOTD algorithm is that the sub-tree
below an internal node u contains all the suffixes of S
which have Path(u) as a prefix. To evaluate a node, all of
these suffixes are considered and grouped by their first
remaining character, so that an α-group is formed for each
character α. If the α-group of u has only one member, u has
a single outgoing edge which starts with α and leads to a
leaf. If the α-group has more than one member, u has an
outgoing edge which starts with α and leads to another
internal node. The WOTD algorithm starts by evaluating the
root, considering all suffixes of S. All nodes of the suffix
tree are then evaluated recursively, top-down, from the
remaining suffixes. After the groups have been determined
from the first characters of the suffixes, the members of
each group are sorted and their LCP is calculated, and an
edge labeled with the LCP is added to the tree. In the next
stage of the algorithm, the LCP is removed from the
beginning of the group members and new groups are formed
from the remaining suffixes. This process continues until
all suffixes have been added.
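The top-down evaluation can be sketched as a short recursive function. This is an illustration only and not the lazy implementation of [13]: the function name wotd and the nested-dictionary representation (edge label mapped to a sub-tree, or to the start position of the suffix for a leaf) are assumptions made here, and the input is assumed to end with a unique terminal symbol.

```python
def wotd(s):
    """Build the suffix tree of s top-down as nested dicts.

    Each dict maps an edge label to a sub-tree (dict) or, for a leaf,
    to the start position of the suffix that ends there.
    """
    if not s or s.count(s[-1]) != 1:
        raise ValueError("s must end with a unique terminal symbol such as '$'")

    def evaluate(suffixes, depth):
        # suffixes: start positions of the suffixes below this node;
        # depth: length of the path label shared by all of them.
        groups = {}
        for i in suffixes:
            groups.setdefault(s[i + depth], []).append(i)
        node = {}
        for _, members in sorted(groups.items()):
            if len(members) == 1:
                i = members[0]
                node[s[i + depth:]] = i            # leaf edge: rest of the suffix
                continue
            # Length of the LCP of the group, found character by character.
            k = 1
            while len({s[i + depth + k] for i in members}) == 1:
                k += 1
            first = members[0]
            label = s[first + depth:first + depth + k]
            node[label] = evaluate(members, depth + k)  # strip LCP and regroup
        return node

    return evaluate(list(range(len(s))), 0)


print(wotd("TATAT$"))
```

Each call only appends to the result and never returns to a node evaluated earlier, which is the write-only property the algorithm is named after. The sketch reads the input string at arbitrary positions, so it also shows why the string has to stay in memory.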
Analysis: Top-down construction is the main feature of the
WOTD algorithm. Once a node has been added to the tree, it
never needs to be revisited in later stages of construction,
so only a part of the suffix tree has to be kept in memory.
Although this is very helpful during construction, it
becomes a problem at query time, because the suffix tree is
usually larger than main memory. Since the method repeatedly
forms groups (considering different suffixes) and finds
their LCPs (by character-by-character comparison), it
accesses the input string continually and therefore requires
the input string to be kept in memory. If the input string
is larger than main memory, this leads to random disk
accesses and consequently to reduced efficiency.
4 AN INTRODUCTION TO DISK-BASED
ALGORITHMS
In recent decades, more effort has gone into constructing
suffix trees for larger, real-world strings. Some of these
efforts manage access to the tree only, while others manage
both the tree and the input string. They can therefore be
categorized into two groups: those which require the input
string to be kept in memory (tree management) and those
which do not (input string and tree management). The first
group is called In-Core String and the second group
Out-of-Core String. The first group is introduced in Section
4.1 and the second in Section 4.2.
4.1 An Introduction to In-Core String Algorithms
By partitioning the input string, constructing the suffix
tree top-down, or combining the two techniques, the
algorithms introduced here decompose the suffix tree into
small sections (sub-trees) and construct each sub-tree
independently in memory. This is their main idea: each
sub-tree can be accessed independently in main memory, so
navigating a sub-tree requires no disk access, which reduces
I/O costs. In all of these methods, however, some disk
accesses are still needed if the input string is larger than
the available memory.
The first practical algorithm for constructing a suffix tree
in external memory builds on the basic algorithm introduced
in Section 2. Hunt et al. [14] were the first to propose a
method for constructing suffix trees larger than the
available memory. In this method, suffix links are not used,
in order to obtain better locality. Fixed-length prefixes
are considered, and the sub-tree for each prefix is built
separately. The length of the prefix (which ultimately
determines the size of the sub-trees) is chosen according to
the available memory. The main idea is that the suffixes
which start with one prefix are placed in a different
sub-tree from the suffixes which start with another prefix.
Each sub-tree is constructed entirely in main memory and
then stored on disk.
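The partitioning by fixed-length prefixes can be sketched as below. Only the assignment of suffixes to prefixes is shown; building and storing each sub-tree is omitted. The function name partitions_by_prefix and its parameters are hypothetical, and suffixes shorter than the prefix length p are assumed to be handled separately.

```python
from itertools import product


def partitions_by_prefix(s, alphabet, p):
    """Yield (prefix, positions) for every prefix of length p over alphabet.

    The string is scanned once per prefix, so that only the suffixes of a
    single sub-tree have to be held in memory at a time.
    """
    for letters in product(alphabet, repeat=p):
        prefix = "".join(letters)
        positions = [i for i in range(len(s) - p + 1) if s.startswith(prefix, i)]
        if positions:
            yield prefix, positions


for prefix, positions in partitions_by_prefix("TATAT$", "$AT", 2):
    print(prefix, positions)
```

The sizes of the position lists differ from prefix to prefix, which is where skewed input becomes a problem for a fixed prefix length.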
Analysis: The entire input string has to be scanned to
construct each sub-tree. This partitioning method works very
well for non-skewed input strings. In skewed data, some
letters of the alphabet occur much more often than others,
and this is the case for DNA strings. Some prefixes
therefore have many more suffixes than others, and their
sub-trees become large. Such sub-trees do not fit in memory,
so the disk has to be accessed to construct them, while
small sub-trees may not use the memory efficiently.
DYNACLUSTER [5] is the next algorithm. It tries to solve the
problems of the static partitioning in Hunt's algorithm by
partitioning the input string dynamically. The method uses a
set of fixed-length prefixes (prefix patterns). For each
prefix pattern, a queue is kept of the indexes of the
suffixes which start with that pattern. Each cluster
consists of a table of prefix patterns and their associated
queues. The root cluster is created first, and the queue
associated with each pattern is initialized during a
traversal of the input string. In the next stage of the
algorithm, all suffixes S_k in a queue are processed.
Depending on the size of the queue and a threshold value t,
either a leaf cluster or a non-leaf cluster is created: if
the queue is larger than t, a non-leaf cluster is created;
otherwise, a leaf cluster is created. Let j denote the
current level of the tree and l the length of the prefix. If
a leaf cluster is created, the suffix S_{k+j·l} is processed
for each S_k. For a non-leaf cluster, an edge with the label
S_k is created for each S_k and the queue associated with
the first character of S_{k+j·l} is updated. This process
continues recursively for the next cluster, in depth-first
order.
Analysis: Because construction is top-down, nodes created in
earlier stages of the algorithm are never accessed again and
can be removed from memory. This is very useful for reducing
memory use during construction, but the entire tree must be
in memory at query time. Since the suffix tree is usually
much larger than main memory, the disk will be accessed.
TDD [9] is the next method. It improves the WOTD algorithm
by adding a partitioning stage so that independent sub-trees
can be built in memory. The method consists of two parts:
partitioning the input string, as in Hunt's algorithm, and
constructing each suffix sub-tree top-down with the WOTD
algorithm. TDD is in effect a combination of Hunt's
algorithm and WOTD, so it inherits some of the problems of
both methods.
Analysis: As in Hunt's algorithm, fixed-length prefixes are
used. Each prefix has a queue of the start indexes of its
suffixes. To fill these queues the input string is scanned
only once, so there is no need to scan the string for each
prefix. As in Hunt's algorithm, fixed-length prefixes handle
skewed data poorly and produce sub-trees of different sizes.
The top-down approach is desirable at construction time but
undesirable at query time. In TDD, however, partitioning the
input string limits the disadvantages of the top-down
approach to the sub-trees which do not fit in main memory.
The input string is accessed both when the set of prefixes
is created and when the sub-tree for each prefix is
constructed. As before, if the input string does not fit in
main memory, the disk is accessed, which increases I/O
costs.
The next algorithm is Trellis [15]. It uses variable-length
prefixes to solve the problem of skewed data. The method
consists of four steps: 1) creating a set of prefixes, 2)
partitioning the input string, 3) merging the constructed
sub-trees and 4) recovering the suffix links.
To create the set of prefixes, the input string is scanned
several times, and the frequency of each prefix is
calculated in each scan. The frequency must be less than a
threshold value t; otherwise, the prefix is extended and the
input string is re-scanned to count the occurrences of the
extended prefixes. The value of t is chosen according to the
memory available for constructing the suffix tree, so that
the sub-tree of each prefix fits in main memory.
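The creation of variable-length prefixes can be sketched as a loop which performs one scan of the input per prefix length. The function name variable_length_prefixes is hypothetical, the strict comparison with t follows the description above, and the input is assumed to end with a unique terminal symbol so that every prefix eventually becomes rare enough.

```python
from collections import Counter


def variable_length_prefixes(s, t):
    """Return {prefix: frequency} in which every frequency is below t.

    Prefixes which occur t times or more are extended by one character
    and counted again in the next scan of s.
    """
    if t < 2:
        raise ValueError("t must be at least 2")
    accepted = {}
    pending = {""}           # prefixes which are still too frequent
    length = 0
    while pending:
        length += 1
        counts = Counter()
        # One scan of s: count every extension of a pending prefix.
        for i in range(len(s) - length + 1):
            if s[i:i + length - 1] in pending:
                counts[s[i:i + length]] += 1
        pending = set()
        for prefix, frequency in counts.items():
            if frequency < t:
                accepted[prefix] = frequency
            else:
                pending.add(prefix)
    return accepted


print(variable_length_prefixes("TATAT$", 3))
```

With t = 3, the frequent prefix T is extended to TA and T$, while A and $ are accepted after the first scan, so frequent prefixes end up longer than rare ones.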
For partitioning, the input string is divided into
r = ⌈(n+1)/t⌉ consecutive substrings, or partitions. The
suffix tree of each partition is built with Ukkonen's
algorithm; because of the choice of t, it fits entirely in
memory. If the prefix set P has size |P| = m, the suffix
tree of each partition is stored on disk as m separate
files, one per prefix. In the third step, for each prefix
p ∈ P, the file for p from each partition is loaded into
memory and merged into the growing sub-tree for p. Trellis
ignores all suffix links after the partitioning step and
recovers them in the fourth step. To recover the suffix
links, the constructed sub-tree is traversed depth-first,
starting from the children of the root.
Analysis: Despite the variable-length prefixes, the sizes of
the sub-trees are not completely balanced [11]. Because edge
labels are represented by start and end indexes, merging
edges in the third step requires access to the input string,
so if the input string is larger than memory, the disk is
accessed. According to [11], recovering suffix links does
not pay off for a tree which resides on disk, because the
target of a node's suffix link usually lies in a sub-tree
stored on a different disk page, so it cannot be assumed
that the link can be followed in constant time.
4.2 An Introduction to Out-of-Core String Algorithms
This section discusses algorithms which reduce the I/O costs
for an input string larger than the available memory by
dividing the input string into parts which fit in main
memory. The tree under construction is also managed in such
a way that the whole tree never has to be accessed. In these
methods, one part of memory is allocated to a suffix
sub-tree and the other part to a substring of the input
string.
The authors of TDD also introduced an algorithm named
ST-MERGE [9]. In this method, the input string is divided
into k partitions, where k is chosen so that the sub-tree
for each partition fits in memory. A suffix sub-tree is
built for each of the k partitions with the TDD algorithm,
and finally all the sub-trees are merged. To merge them, the
roots of all the sub-trees are merged first. The outgoing
edges of the source nodes are then grouped by their first
character, and the edges in each group are examined. If a
group has only one member, there is nothing to merge, and
the edge and its sub-tree are simply copied to the result
tree. If the group has more than one edge, the LCP of the
group members is found and a new edge labeled with the LCP
is created in the result tree.
Analysis: Although this method tries to decrease I/O costs
by partitioning the input string, it makes many random
accesses to the input string in the merging stage.
WAVEFRONT [6] is the next efficient algorithm. It has three
main stages: 1) creating the set of prefixes, 2)
constructing the suffix sub-trees and 3) recovering the
suffix links. In the first stage, the set of prefixes is
created in such a way that the suffix sub-tree for each
prefix fits in memory. WAVEFRONT uses variable-length
prefixes.
For efficient construction of each suffix sub-tree, the
current suffix and the edge labels of the sub-tree should be
in memory; otherwise the disk is accessed. The method
manages these two entities with two types of tiling.
In the first type, the input string is divided into blocks
of B bytes, called Tree Blocks. In the i-th stage, only the
edges (substrings of S) located in the i-th Tree Block are
accessed. If the LCP of a suffix with the tree under
construction cannot be determined within the current Tree
Block, the insertion of that suffix remains unfinished until
the next Tree Block is loaded into memory.
In the second type, the input string is divided into blocks
of B bytes, called Insert Blocks. At any moment, the
algorithm processes one Insert Block, adding the suffixes
whose start index lies in the current Insert Block. In each
stage of the algorithm, only one Tree Block and one Insert
Block are in memory. After the suffix tree has been
constructed, the suffix links can optionally be recovered.
Analysis: This algorithm manages the growing tree and the
input string well, and its variable-length prefixes solve
the problem of skewed data. However, since the whole input
string is scanned for all the prefixes in the first stage,
the number of scans of the input string increases.
Disk-Based Genomic Suffix Tree (DiGeST) [11] is the next
algorithm introduced for constructing a suffix tree. It
consists of three steps: 1) preprocessing, 2) sorting the
suffixes and 3) merging. In the preprocessing step, the
input string is divided into k partitions of equal size. The
value of k is determined according to the available memory.
In the next step, the suffixes in each partition are sorted.
After sorting, the start position of each suffix is stored,
so there is a list of suffix start positions for each
partition. Next to each start position, the 32-character
prefix of the suffix is also stored, which helps in finding
the LCP of two suffixes.
In the merging step, an input buffer is allocated for each
of the k partitions, and the sorted suffixes are loaded into
the corresponding input buffer. The top elements of the
input buffers then compete with each other to be added to
the tree. The winning element moves to an output buffer
which holds a part of the suffix tree. The 32-character
prefixes are used to identify the winner. If the prefixes
being compared are equal, the input string has to be
accessed to determine the winning suffix, so storing the
32-character prefix of each suffix plays an important role
in reducing accesses to the input string. To add the winning
suffix to the tree, the length of the LCP of this suffix and
the last suffix added to the tree is calculated first, again
using the 32-character prefixes. Once the LCP is known, a
leaf corresponding to the current suffix is added by
traversing the lexicographically largest path in the
existing tree for LCP characters and creating a new internal
node and a new leaf.
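The sorting and merging steps can be sketched as follows. The sketch stops at the merged order of the suffixes together with the LCP length of each suffix and its predecessor, and does not build tree nodes or use disk buffers. All names (sorted_partitions, precedes, lcp_length, merge_partitions) are hypothetical, the LCP is computed directly from the string for brevity, and the prefix length is a parameter so that ties can be provoked on a short example.

```python
def sorted_partitions(s, k, prefix_len=32):
    """Split the suffix start positions into k ranges and sort each range."""
    n = len(s)
    size = -(-n // k)  # ceiling of n / k
    parts = []
    for lo in range(0, n, size):
        positions = sorted(range(lo, min(lo + size, n)), key=lambda i: s[i:])
        # Store a short prefix next to each start position.
        parts.append([(s[i:i + prefix_len], i) for i in positions])
    return parts


def precedes(s, a, b, prefix_len):
    """Return True if suffix entry a sorts before suffix entry b."""
    (prefix_a, i), (prefix_b, j) = a, b
    if prefix_a != prefix_b:
        return prefix_a < prefix_b       # decided by the stored prefixes alone
    # Tie on the stored prefixes: the input string has to be consulted.
    return s[i + prefix_len:] < s[j + prefix_len:]


def lcp_length(s, i, j):
    k = 0
    while i + k < len(s) and j + k < len(s) and s[i + k] == s[j + k]:
        k += 1
    return k


def merge_partitions(s, parts, prefix_len=32):
    """k-way merge; returns (start position, LCP length with predecessor)."""
    heads = [0] * len(parts)             # next unread entry of each buffer
    merged, previous = [], None
    while True:
        live = [p for p in range(len(parts)) if heads[p] < len(parts[p])]
        if not live:
            return merged
        winner = live[0]
        for p in live[1:]:
            if precedes(s, parts[p][heads[p]], parts[winner][heads[winner]], prefix_len):
                winner = p
        _, i = parts[winner][heads[winner]]
        heads[winner] += 1
        merged.append((i, 0 if previous is None else lcp_length(s, previous, i)))
        previous = i


text = "TATAT$"
parts = sorted_partitions(text, k=2, prefix_len=2)
print(merge_partitions(text, parts, prefix_len=2))
# [(5, 0), (3, 0), (1, 2), (4, 0), (2, 1), (0, 3)]
```

With a prefix length of 2, the suffixes starting at positions 1 and 3 share the stored prefix AT, so their comparison falls back to the input string; with longer stored prefixes such fallbacks become rare.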
Analysis: This algorithm also manages the growing tree and
the input string well. However, parallelism can be used to
speed up suffix tree construction, and the structure of this
algorithm makes it hard to parallelize.
5 CONCLUSION
The suffix tree is a very important data structure for
processing long sequences. For sequences larger than main
memory, constructing the suffix tree is time-consuming
because of the lack of data locality and the resulting I/O
costs. In this paper, the existing algorithms for
constructing the suffix tree were classified and the
limitations of each were explained. Among the algorithms
discussed, only WAVEFRONT and DiGeST manage both the input
string and the growing suffix tree acceptably. Reviewing
these algorithms and identifying their limitations can lead
to the design of an algorithm which overcomes the problems
of the previous ones.
REFERENCES
[1] D. Gusfield, Algorithms on strings, trees, and sequences.
Cambridge University Press, 1997.
[2] "-,-'-,--,,-, Retrieved from Wikipedia",
http://fa.wikipedia.org/w/index.php?title=-,-'-,--,,-&oldid=
4922348, 2011.
[3] "Bioinformatics, Retrieved from Wikipedia",
http://en.wikipedia.org/wiki/Bioinformatics, 2011.
[4] "GenBank release notes. Retrieved from NCBI",
ftp://ftp.ncbi.nih.gov/genbank/gbrel.txt, 2011.
[5] C. Cheung, J. Yu and H. Lu, "Constructing Suffix Tree for
Gigabyte Sequences with Megabyte Memory", IEEE Transactions on
Knowledge and Data Engineering, vol. 17, Issue 1, pp. 90-105,
2005.
[6] A. Ghoting, and K. Makarychev, "I/O Efficient Algorithms for
Serial and Parallel Suffix Tree Construction", ACM Transactions
on Database Systems, vol. 35, Issue 4, no. 25, 2010.
[7] E. McCreight, "A Space-Economical Suffix Tree Construction
Algorithm", Journal of the ACM, vol. 23, Issue 2, pp. 262-272,
1976.
[8] J. Szustakowski, "Initial sequencing and analysis of the human
genome", Nature, pp. 860-921, 2001.
[9] Y. Tian, S. Tata, R. Hankins and J. Patel, "Practical methods for
constructing suffix trees", The VLDB Journal: The International
Journal on Very Large Data Bases, vol. 14, Issue 3, pp. 281-299,
2005.
[10] E. Ukkonen, "On-line construction of suffix trees", Algorithmica,
vol. 14, no. 3, pp. 249-260, 1995.
[11] M. Barsky, U. Stege, A. Thomo and C. Upton, "A New Method
for Indexing Genomes Using On-Disk Suffix Trees", Proceeding
of the 17th ACM conference on Information and knowledge
management, pp. 649-658, 2008.
[12] S. Bedathur, and J. Haritsa, "Engineering a Fast Online
Persistent Suffix Tree Construction", ICDE '04 Proceedings of the
20th International Conference on Data Engineering, 720, 2004.
[13] R. Giegerich, S. Kurtz and J. Stoye, "Efficient implementation of
lazy suffix trees", Software: Practice and Experience,
pp. 30-42, 1999.
[14] E. Hunt, M.P. Atkinson and R.W. Irving, "A Database Index to
Large Biological Sequences", VLDB '01 Proceedings of the 27th
International Conference on Very Large Data Bases, pp. 139-148, 2001.
[15] B. Phoophakdee and M.J. Zaki, "Genome-scale disk-based suffix
tree indexing", SIGMOD '07 Proceedings of the 2007 ACM SIGMOD
international conference on Management of data, pp. 833-844,
2007.
[16] P. Weiner, "Linear pattern matching algorithm", 14th Annual
IEEE Symposium on Switching and Automata Theory. pp. 1-11,
1973.

M. Alavi received a BA degree from the University of Kashan in 2008
and is now an MS student at Payam-e Noor University of Tehran,
working at Pooyesh University as a network administrator. Research
activities are related to sequence indexing.
