Professional Documents
Culture Documents
temporal [29] and spatio-temporal [5] databaeea will ● S[i : j] denotes the subsequence that includes entries
in positions i through j
*This research was partially funded by the Institute for
Systems Research (ISR), by the National Science Foundation
● S[i] denotes the i-th entry of sequence S
under Grants IRI-9205273 and IRI-S958546 (PYI), with matching
funds from EMPRESS Software Inc. and Thinking Machines Inc.
● D(S, Q) denotes the distance of the two (equal
t Also with IBM Federal Systems Company, Gaitheraburg, MD
20879. lerigth) sequences S and Q.
$On sabbatical leave from the Department of Informatics,
Aristotle University, Thessaloniki, Greece 54006. Similarity queries can been clsaaified into two cate-
gories:
Permission to cc y without fee all or part of this material Is . Whole Matching. Given a collection of N data
granted providJ that the copies are not made or distributed for
direct commercial advantage, the ACM copyright notice and the sequences of real numbers S1, Sa, . . . . SN and a query
title of the publication and its date appear, and notice is given sequence Q, we want to find those data aequencee
that copying is by permission of the Association of Computing that are within distance e from Q. Notice that data
Machinery. To copy otherwise, or to republish, requires a fee
anct/orspecific permission. and query sequences must have the same length.
419
● Subsequence Matching. Given N data sequences the contributions of the present paper, giving some
Sl,sz,..., SN of arbitrary lengths, a query sequence extensions of this technique and outlining some open
Q and a tolerance e, we want to identify the data problems.
sequences Si (1 < i < N) that cent ain matching
subsequences (i.e. subsequences with distance < c 2 Background
from Q). Report those data sequences, along with To the best of the authors’ knowledge, this is the first
the correct offsets within the data sequences that best work that examines indexing methods for approximate
match the query sequence. (We assume that we are subsequence matching in time-series databases. The
given a function D(S, Q), which gives the distance of following work is related, in different respects:
the sequences S and Q. For example, V() can be the
Euclidean distance.) indexing in text [13] and DNA databases [6]. Text
and DNA strings can be viewed as l-dimensional
The case of ‘whole match’ queries can be handled as
sequences; however, they consist of discrete symbols
follows [2]: A distance-preserving transform, such as
w opposed to continuous numbers, which makes a
the Discrete Fourier transform (DFT), can be used
difference when we do the feature extraction.
to extract ~ features from sequences (eg., the first
~ DFT coefficients), thus mapping them into points ‘whole matching’ approximate queries on time-
in the f-dimensional feature space. Subsequently, sequences [2] or on color images [14, 21] or even on
any spatial access method (such as R*-trees) can be 3-d MRI brain scans [8]. In all these methods, the
used to search for range fapproximate queries. This idea is to use f feature extraction functions to map
approach exploits the assumption that data sequences a whole sequence or image into a point in the (d-
and query sequences SH have the same length. Here, imensional) feature space [18]; then, spatial access
we generalize the problem and present a method to methods may be used to search for similar sequences
answer approximate-match queries for subsequences of or images. The resulting index that contains points
arbitrary length Len(Q) . The ideal method should in feature space is called F – iradez [2].
fulfill the following requirements:
The F-index works as follows: Given N sequences, all
● it should be fast. Sequential scanning and distance of the same length n, we apply the n-point Discrete
calculation at each and every possible offset will be Fourier Transform (DFT) and we keep the first few
too slow for large databases. coefficients. Let’s assume that we keep f numbers -
thus, each sequence is mapped into a point in an f-
● it should be ‘correct’. In other words, it should
dimensional space. These points are organized in an
return all the qualifying subsequences, without
R*-tree, for faster searching. In the typical query, the
missing any (i.e., no ‘false dismissals’). Notice
user specifies a query sequence Q (of length n again) and
that ‘false alarms’ are acceptable, since they can be
a tolerance c, requesting all the data sequences that are
discarded easily through a post-processing step.
within distance e from Q. To resolve this query, (a) we
● the proposed method should require a small space apply the n-point DFT on the sequence Q, we keep the
overhead. f features, thus mapping Q into a f-dimensional point
~t in feature space; (b) we use the F-index to retrieve
✠ the method should be dynamic. It should be easy to all the points within distance c from qf; (c) we discard
insert and delete sequences, as well as to append new the false alarms (see more explanations in Lemma 1),
measurements at the end of a given data sequence. and we return the rest to the user.
Here, we generalize the ‘F-index’ method, which was
● the method should handle data sequences of varying
designed to handle ‘whole matching’ queries, Our goal
length, as well as queries of varying length.
is to handle subsequence queries, by mapping data se-
The remainder of the paper is organized as follows, quences into a few rectangles in feature space. Since we
Section 2 gives some background material on past rely on spatial access methods as the eventual indexing
related work, on spatial access methods and on the mechanism, we mention that several multidimensional
Discrete Fourier Transform. Section 3 focuses on indexing methods have been proposed, forming three
subsequence matching; we propose a new indexing classes: R*-trees [9] and the rest of the R-tree family
mechanism and we show how off-th~shelf spatial access [15, 17, 28]; linear quadtrees [26, 24]; and grid-files [22].
methods (and specifically the R*-tree) can be used. To guarantee that the ‘F-index’ method above does
Section 4 discusses performance results obtained from not result in any false dismissals, the dist ante in feature
experiments on real and synthetic data, which show space should match or underestimate the distance
the effectiveness of our method. Section 5 summarizes between two objects. Mathematically, let 01 and 02 be
420
two objects (e.g., same-length sequences) with distance where j is the imaginary unit j = H. The signal Z
fUIICtiOII Dobject () (e.g., the Euclidean distance) and can be recovered by the inverse transform:
F(O1), 17(02) be their feature vectors (e.g., their
n-1
first few Fourier coefficients), with distance function
Zi=l/@~xFeXp(jzZF~/~) iDO, l,. ..,l–l
~;y~e,() (e.g., the Euclidean distance, again). Then
F=o
(6)
Lemma 1 To guarantee no false dismissals for range X~ is a complex number (with the exception of X.,
queries, the feature extraction function F() should which is a real, if the signal E is real). The energy E(Z)
satisfy the following formula: of a sequence Z is defined as the sum of energies (squares
of the amplitude [Zi 1) at every point of the sequence:
Djeawre(F(@), F(02)) < ~object(@> 02) (1)
n-1
Djeata~.(l’(Q)j F(o))< Dobj~~t(Q, 0) < ‘ (s)
~ lzi12= ~ IXF12 (8)
D i=O F=o
Following [2], we use the Euclidean distance as the
Since the DFT is a linear transformation [23], Parseval’s
distance function between two sequences, that is, the
theorem implies that the DFT also preserves the
sum of squared differences. Formally, for two sequences
Euclidean distance between two signals F and @
S and Q of the same length 1, we define their distance
D(S, Q) as
9(5, y) = qi, i) (9)
1/2
D(S, Q)
As an example
s
( ~(S[~
i=l
of feature
- Q[i])2
extraction
)
function F()
(4)
we
where ~
respectively.
and ~ are Fourier
421
application. For example, in stock price databases,
Symbols Definitions. analysts are interested in weekly or monthly patterns
N Number of data sequences. because shorter patterns are susceptible to noise [12].
Si The i-th data sequence (1 < i < N). Notice that we never lose the ability to answer shorter
Len(S) Length of sequence S. than w queries, because we can always resort to
S[i : j] Subsequence of S, including entries in Generalizing the reasoning of the method for ‘whole
matching’, we use a sliding window of size w and place
positions i through j.
it at every possible position (offset), on every data
Q A query sequence.
sequence. For each such placement of the window,
w Minimum query sequence length.
we extract the features of the subsequence inside the
V(Q, S) (Euclidean) distance between sequences window. Thus, a data sequence of length Len(S) is
Q and S of equal length. mapped to a trail in feature space, consisting of
Tolerance (max. acceptable distance). Len(S) -w+l points:
one point for each possible offset
; Number of featurea. of the sliding window. Figure l(a) gives an example
me Marginal cost of a point. of trails. Assume that we have two sequences, S1 and
Sz (not shown in the figure), and that we keep the first
~=2 features (eg, the amplitude of the first and second
Table 1: Summary of Symbols and Definitions coefficient of the w-point DFT). When the window of
length w is placed at offset=O on S1, we obtain the first
point of the trail Cl; as the window slidee over S1, we
This concludes our discussion on prior work, which
obtain the rest of the points of the trail Cl. The trail
concentrated on ‘whole match’ queries. Next, we
C’z is derived by S2 in the same manner.
describe in detail how we handie the requests for
Figure 4 gives an example of trails, using a real time-
matching subsequences.
series (stock-price movements).
One straightforward way to index these trails would
3 Proposed Method
be to keep track of the individual points of each trail,
Here, we examine the problem of subsequence matching. storing them in a spatial access method. We call
Specifically, the problem is defined as follows: this method ‘1-naive’ method, where ‘I’ stands for
‘Index’ (as opposed to sequential scanning). When
● We are given a collection of N sequences of real
presented with a query of length w and tolerance c,
numbers S1, Sz, SN, each one of potentially different
we could extract the features of the query and search
length.
the spatial access method for a range query with radius
● The user specifies query subsequence Q of length e; the retrieved points would correspond to promising
Len(Q) (which may vary) and the tolerance q subsequences; after discarding the false alarms (by
that is, the maximum acceptable dis-similarity (= retrieving all those subsequences and calculating their
distance). actual distance from the query) we would have the
desired answer set, Notice that the method will not
● We want to find quickly all the sequences Si ( 1<
miss any qualifying subsequence, because it satisfies the
i < N), along wit-h the correct offsets k, such’ that
condition of Lemma 1.
the subsequence S~[k : k+ Len(Q) – 1] matches the
However, storing the individual points of the trail in
query sequence: D(Q, Si[k : k + Len(Q) – 1]) < E.
an R*-tree is inefficient, both in terms of space as well
The brute-force solution is to examine sequentially every as search speed. The reason is that, almost every point
possible subsequence of the data sequences for a match. in a data sequence will correspond to a point in the
We shall refer to this method by ‘SequentialScan’ ~-dimensional feature space, leading to an index with
method. Next, we describe a method that uses a a 1:f increase in storage requirements. Moreover, the
small space overhead, to achieve order of magnitudes search performance will also suffer because the R-tree
savings over the ‘SequentialScan’ method. The main will become tall and slow. As we shall see in the section
symbols used through the paper and their definitions with the experiments, the ‘I-naive’ method ended up
are summarized in Table 1. being almost twice as slow as the ‘Sequentia/Scan’ .
Thus, we want to improve the ‘l-naive’ method, by
3.1 Sketch of the approach - ‘ST-index’ making the representation of the trails more compact.
Without loss of generality, we assume that the minimum The solution we propose exploits the fact that
query length is w , where w (> 1) depends on the successive points of the trail will probably be similar,
422
El
trails. Notice that it is possible that MBRs belonging
to the same trail may overlap, ss C’2 illustrates.
Thus, we propose to map a data sequence into a set
of rectangles in feature space. This yields significant
improvements with respect to space, as well as with
respect to response time, as we shall see in section
4. Each MBR corresponds to a whole sub-trail,
that is, points in feature space that correspond to
successive positioning of the sliding window on the data
sequences. For each such MBR we have to store
t,ta,t, t,~d which are the offsets of the first and last
C2 such positioning;
423
● Queries: How to handle queries, and especially the
ones that are longer than w. Method Description
‘SequentialScan’ Sequential scan of the whole
These are the topics of the next two subsections,
database.
respectively.
‘I-naive’ Search using an ‘ST-index’ with
1 point per sub-trail.
Fz ‘I-$xed’ Search using an ‘ST-indez’ with
a fixed number of points per
El
;5 sub-trail.
:4 ‘I-adaptive’ Search using an ‘ST-indez’ with
1
P; P6 a variable number of points per
sub-trail.
ik
P8
*
P2 Table 2: Summary of searching methods and descrip-
P9*
PI * n tions
QP3 $4
<5 an index,
the
with some fixed-length
‘I-naive’ method is a special
method, when the sub-trail length
Thus we are looking
sub-trails.
for a method
Clearly,
csse of the
is set to 1.
that will
‘I-fixed’
be able
to adapt to the distribution of the points of the trail.
We propose to group points into sub-trails by means
of an ‘adaptive’ heuristic, which is based on a greedy
c1PI l%
algorithm. The algorithm
tries to estimate
uses a cost function,
the number
which
of disk accesses for each
of the options. The resulting indexing method will
F] - be called ‘I-adaptive’. This is the lsst of the four
alternatives we have introduced. Table 2 lists all of
Figure 3: Packing points using (a) a fixed heuristic (sub- them, along with a brief description for each method.
trail size = 3), and (b) an adaptive heuristic. To complete the description of the ‘I-adaptive ‘method,
we have to define a cost function and the concept of
marginal cost of a point. In [19] we developed a for-
3.2 Insertion - Methods to divide trails into mula which, given the sides ~ = (Ll, L2, . . . Ln ) of the
sub-trails n-dimensional MBR of a node in an R-tre~, estimates
the average number of disk accesses DA(L) that this
As we saw before, each data sequence is mapped into
node will contribute for the average range query:
a ‘trail’ in feature space. Then the question arising
is: how should we optimally divide a trail in feature
space into sub-trails and eventually MBRs, so that the DA(Z) = fi(L~ + 0.5) (lo)
number of disk accesses is minimized? A first idea i=l
would be to pack points in sub-trails according to a The formula sssumes that the address space hss been
pre-determined, fixed number (e.g., 50). However, there normalized to the unit hyper-cube ( [0, l)n). We use
is no justifiable way to decide the optimal value of the expected number of disk accesses DA() as the cost
this constant. Another idea would be to use a simple function. The marginal cost (me) of a point is defined as
th of the stored sequence for this sub- follows: Consider a sub-trail of k points with an MBR of
trail size (e.g. Len(S) ). However, both heuristics sizes Ll, ..., Ln. Then the marginal cost of each point
may lead to poor results. Figure 3 illustrates the in this sub-trail is
problem of having a pre-determined sub-trail size. It
shows a trail with 9 points, and it assumes that the mc = DA(~)/k (11)
424
That is, we divide the cost of this MBR equally among ‘D(S, Q)<C ~ 9(S[i:j], Q[i:j])<c (l<i<j<i)
the contained points. The algorithm is then as follows: (12)
Proofi Since D() is the Euclidean distance, we have
/* Algorithm Divide-to-Subtralls */
Assign the first point of the trail in a
(trivial) sub-trail (13)
k=l
FOR each successive point
IF it increases the marginal cost of the Since
current sub-trail
TliEIl start another sub-trail
~(S[k] - Q[k])2 < ~(S[k] - ~k])z (14)
ELSE include it in the current sub-trail
k=i k=]
superset of the sub-trails we should actually retrieve; as we show, the tolerance specified for each sub-query
can be reduced to c/@. The final result is that
if a sub-trail intersects the query region, its MBR will
definitely do so (while the reverse is not necessarily the total query volume in feature space is reduced.
true). Thus themethod introduces no false dismissals. The following Lemma establishes the correctness of the
proposed method. Consider two sequences Q and S of
Handling longer queries (Len(Q) > w) is more
the same length Len(Q) = Len(S) = p *w. Consider
complicated. Thereason is that the ‘ST-index’’knows’
their p disjoint subsequences q~ = Q[i* w+ 1: (i+ 1) * w]
only about subsequences of length w. A straightforward
ands~ = S[i*w+l: (i+l)*w], where O~ i<p–1.
approach would be to select a subsequence (e.g., the
prefix) of Q of length w, and use our ‘ST-indez’ to Lemma 3 If Q and S agree within tolerance e then
search for data subsequences that match the prefix of Q at least one of the pairs (si, qi) of corresponding sub-
within tolerance ~. We call this method ‘PrejixSearch’. sequences agree within tolerance e~fi:
This will clearly return a superset of the qualifying
subsequences: a subsequence Tthat is within tolerance 9(Q, S) < c ~ V:~: (D(q~, S~) < c/@ (17)
c of the query sequence Q (Len(Q) = Len(T)) will
where V indicates disjunction.
have all its (sub) subsequences withb tolerance < e from
Proof. By contradiction: If all the pairs of subs~
the corresponding subsequence of Q. In general we can
quences have distance > ej@, then, by adding all these
prove the following lemma:
distances, the distance of Q and S will be > c, which is
a contradiction. More formally, since for i = O, ..., p – 1
Lemma 2 If two sequences S and Q with the same
length 1 agree within tolerance c, then any pair (S[i : j], (i+l).w
Q[i : j]) of corresponding subsequences agree within the (18)
‘Z(qi, ‘i)= ~ (WUI - ‘iLd)2
same tolerance. j=i*w+l
425
we have that was a real number, having a size of 4 bytes. Figure
4(a) shows a sample of 250 points of such a stock price
Vi (D(q~, s~) > C/@) + (19) sequence. The system is written in ‘C’ under AIX, on
an IBM RS/6000. Based on the results of [2], we used
(i+l)*w
(
only the first 3 frequencies of the DFT; thus our feature
Vi ~ (!?ihl - ‘iti])’ > e2/P * (20)
space has f=6 dimensions (real and imaginary parts of
j=i*w+l
) each retained DFT coefficient). Figure 4(b) illustrates
p*w a trail in feature space: it is a 2-dimensional ‘phase’
(21) plot of the amplitude of the O-th DFT coefficient vs.
X(QUI - shl)’ >p.t2/p= ,2
j=l the amplitude of the l-st DFT coefficient. Figure 4(c)
or similarly plots the amplitudes of the l-st and 2-rid DFT
D(Q, S) > t (22) coefficients. For both figures the window size w was 512
points. The smooth evolution of both curves justifies
which contradicts the hsmothesis. ❑
our method to group successive points of the feature
The searching algori~hm that uses Lemma 3 will be
space in MBRs. An R*-tree [9] was used to store the
called ‘MultiPiece’ search. It works as follows:
MBRs of each sub-trail in feature space.
Algorithm ‘Search_Long ( ‘MultiPiece’ )’
We carried out experiments to measure the perfor-
● the query sequence Q is broken in p sub-queries mance of the most promising method: ‘I-adaptive’.
which correspond to p spheres in feature space with For each setting, we asked queries with variable se-
radius c/~; lectivities, and we measured the wall-clock time on
a dedicated machine. More specifically, query se-
● we use our ‘ST-indez’ to retrieve the sub-trails quences were generated by taking random offsets into
whose MBRs intersect at least one the sub-query the data sequence and obtaining subsequences of length
regions; Len(Q) from those offsets. For each such query se-
quence, a tolerance ~ was specified and the query was
● then, we examine the corresponding subsequences of
run with that tolerance. This method was followed in
the data sequences, to discard the false alarms.
order to eliminate any bias in the results that may be
Next, we compare the two search algorithms caused by the index structure, which is not uniform at
( ‘PrefixSearch’ and ‘MultiPiece’ ) with respect to the the leaf level. Unless otherwise specified, in all the ex-
volume they require in feature space. The volume of an periments we used w = 512 and Len(Q) = w.
f-dimensional sphere of radius c is given by We carried out four groups of experiments, as follows:
426
O.tz
‘&4
0.1
OM
2.006
0.06
265
I
2s45 0114
2.64
Om
2.SW
11,~ ~ o~
0.12 0.1s a14 a16 0.16 0.17 0.18 0.10 a2 021 0.0SS0.00 O.WO0.070.075 o,M 0.0$ 50.020.0850,1 O,lM
A@2mbafO+iID~~ An@u2n d M DITmdRc+eti
(b) (c)
Figure 4: (a) Sample stock-price sequence; (b) its trail in the feature space of the O-th and l-st DFT coefficients and
(c) the equivalent trail of the l-st and 2-rid DFT coefficients
‘“+08
~
❑ I-neive
1e+07
100 , ●
g : ● ● “. . ..” .“ ● *-$,*.:.
●
●
* ● ●
.”●
10 , **8*” ●
.;””
..* ● ●. ● ●* ●*:
● ●
loco” ● .’.:r~}
1000 <
1 10 100 1000 1000o ie-06 1945 O.owl 0.001 0.01 0.1
Average cub-trail length
Query Selectivity
Figure 5: Index space vs. the average sub-trail length Figure 6: Relative wall clock time VS, selectivity in log-
(log-log scales) log scale (Len(Q) =w=512 points).
the ‘I-naive’ method was approximately two times Response time - longer queries. Next we examine
slower than sequentially scanning the entire database. the relative performance of the two methods for queries
that are longer than w. As explained in the previous
Response time - ‘Short’ queries. We start examin-
section, in these cases we have to split the query and
ing our method’s response time using the shortest ac-
merge the results ( WultiPiece’ method). As illustrated
ceptable queries, that is, queries of length equal to the
in Figure 7, again the proposed ‘I-adaptive ‘method
window size (Len(Q) = w). We used 512 points for
outperforms the sequential scanning, from 2 to 40 times.
Len(Q) and w. Figure 6 gives the relative response
Synthetic data. In Figure 8 we examine our tech-
time of the sequential scanning method (Z) vs. our in-
nique’s performance against a time-series database con-
dex assisted method (T,, where r stands for ‘R-tree’),
sisting of 500,000 points produced by a random-walk
by counting the actual wall-clock time for each method.
The horizontal axis is the selectivity; both axes are in method. These points were generated with a starting
value of 1.5, whereas the step increment on each step
logarithmic scales. The query selectivity varies up to
was + .001. Again we remark that our method outper-
10% which is fairly large in comparison with our 329,000
forms sequential scanning from 100 to 10 times approx-
points time-series database. We see that our method
imately for selectivitiea up to 10%.
achieves from 3 up to 100 times better response time
for selectivities in the range from 10-4 to 10%.
5 Conclusions
We carried out similar experiments for Len(Q) = w
= 256, 128 etc., and we observed similar behavior and We have presented the detailed design of a method that
similar savings. Thus, we omit those plots for brevity. efficiently handles approximate (and exact) queries for
Our conclusion is that our method consistently achieves subsequence matching on a stored collection of data
large savings over the sequential scanning. sequences. This method generalizes the work in [2],
427
I mo ● it is provably ‘correct’, that is, it never misses
Relative Measured Wall Clock Time ●
1
qualifying subsequences (Lemmaa 1-3)
Query Saieetivity
References
Figure 7: Relative wall clock time vs. selectivity in a
[1] R. Agrawal, T. Imielinski, and A. Swami. Database
log-log scale (w=128, Len(Q) =512 points).
mining: a performance perspective. IEEE tins.
lWO \ on Knowledge and Data Engineen”ng, 5(6):914–925,
-1
Relative Measured Well Clock Time ● 1993.
428
[9] N. Beckmann, H.-P. Kriegel, R. Schneider, and Also available as IBM Research Report RJ 9203
B. Seeger. The r*-tree: an efficient and robust (81511), Feb. 1, 1993, Computer Science.
access method for points and rectangles. ACM
SIGMOD, pages 322-331, May 1990. [22] J. Nievergelt, H. Hinterberger, and K.C. Sevcik.
The grid file: an adaptable, symmetric multikeyfile
[10] Christopher Chatfield. The Analysis of Time structure. ACM TODS, 9(1):38–71, March 1984.
Series: an Introduction. Chapman and Hall,
[23] Alan Victor Oppenheim and Ronald W. Schafer.
London & New York, 1984. Third Edition.
Digital SignaJ Processing. Prentice-Hall, Engl&
[11] Mathematical Committee on Physical and NSF En- wood Cliffs, N. J., 1975.
gineering Sciences. Grand Challenges: High Per-
[24] J. Orenstein. Spatial query processing in an object-
formance Computing and Communications. Na-
oriented database system. Proc. ACM SIGMOD,
tional Science Foundation, 1992. The FY 1992 U.S.
pages 326–336, May 1986.
Research and Development Program.
[25] Mary Beth Ruskai, Gregory Beylkin, Ronald Coif-
[12] Robert D. Edwards and John Magee. Technical
man, Ingrid Daubechies, Stephane Mallat, Yves
Analysis of Stock llends. John Magee, Springfield,
Meyer, and Louise Raphael. Wavelets and Their
Massachusetts, 1966. 5th Edition, second printing.
Applications. Jones and Bartlett Publishers,
[13] C, Faloutsos. Access methods for text. ACM Boston, MA, 1992.
Computing Surveys, 17(1):49-74, March 1985.
[26] H. Samet. The Design and Analysis of Spatial Data
[14] C. Faloutsos, W. Equitz, M. Flickner, W. Niblack, Structures. Addison-Wesley, 1989.
D. Petkovic, and R. Barber. Efficient and effective
[27] Manfred Schroeder. Fractals, Chaos, Power Laws:
querying by image content. Journal of Ini?elligent
Minutes From an Injinite Paradise, W,H. Freeman
Information Systems, 1993. (to appear).
and Company, New York, 1991.
[15] A. Guttman. R-trees: a dynamic index structure
[28] T, Sellis, N. Roussopoulos, and C. Faloutsos. The
for spatial searching. Proc, ACM SIGMOD, pages
r+ tree: a dynamic index foi multi-dimensional
47-57, June 1984.
objects. In Proc. l%h International Conference on
[16] Richard Wesley Hamming. Digital Filters. VLDB, pages 507-518, England,, September 1987.
Prentice-Hall Signal Processing Series, Englewood also available aa SRC-TR-87-32, UMIACS-TR-87-
Cliffs, N.J., 1977. 3, CS-TR-1795.
[17] H. V. Jagadish. Spatial search with polyhedra. [29] R. Stare and Richard Snodgrass. A bibliography
Proc. Sixth IEEE Int’1 Conf. on Data Engineering, on temporal databases. IEEE Bulletin on Data
February 1990. Engineering, 11(4), December 1988.
[18] H,V. Jagadish. A retrieval technique for similar [30] Dimitris Vassiliadis. The input-state space ap-
shapes. Proc. ACM SIGMOD Conf., pages 208- proach to the prediction of auroral geomagnetic ac-
217, May 1991. tivity from solar wind variables. Int. Workshop on
Applications of Arts’’cial Intelligence in Solar Ter-
[19] Ibrahim Kamel and Christos Faloutsos. On packing restrial Physics, September 1993.
r-trees. Second Ini. Conf on Information and
Knowledge Management (CIKM), November 1993. [31] Gregory K. Wallace. The jpeg still picture compres-
sion standard. CA CM, 34(4):31-44, April 1991.
[20] B. Mandelbrot. Fractal Geomety of Nature. W.H.
Freeman, New York, 1977.
429