Professional Documents
Culture Documents
Trajectory Clustering
Trajectory Clustering
jaegil@uiuc.edu, hanj@cs.uiuc.edu
Kyu-Young Whang
KAIST
kywhang@cs.kaist.ac.kr
ABSTRACT
1. INTRODUCTION
Permission to make digital or hard copies of all or part of this work for personal or
classroom use is granted without fee provided that copies are not made or
distributed for profit or commercial advantage and that copies bear this notice and
the full citation on the first page. To copy otherwise, to republish, to post on servers
or to redistribute to lists, requires prior specific permission and/or a fee.
and STING [22]. Previous research has mainly dealt withclustering of point data.
There is increasing interest to perform data analysis over these trajectory data. A
typical data analysis task is to ¯nd objects that have moved in a similar way [21].
Thus, an eficient clustering algorithm for trajectories is essential for such data
analysis tasks.
Then, unsupervised learning is carried out using the maximum likelihood principle.
Speci¯cally, the EM algorithm is used to determine the cluster memberships. This
algorithm clusters trajectories as a whole; i.e., the basic unit of clustering is the
whole trajectory.
Our key observation is that clustering trajectories as a whole could not detect similar
portions of the trajectories. We note that a trajectory may have a long and
complicated path. Hence, even though some portions of trajectories show a common
behavior, the whole trajectories might not.
Example 1. Consider the ¯ve trajectories in Figure 1. We can clearly see that there is
a common behavior, denoted by the thick arrow, in the dotted rectangle. However, if
we cluster these trajectories as a whole, we cannot discover the common behavior
since they move to totally diferent directions; thus, we miss this valuable information.
TR2
TR3
TR4
Meteorologists are trying to improve the ability to forecast the location and time of
hurricane landfall. An accurate landfall location forecast is of prime importance
ing) or at sea (i.e., before landing). Thus, discovering the common sub-trajectories
helps improve the accuracy of hurricane landfall forecasts.
Zoologists are investigating the impacts of the varying levels of vehicular tra±c on the
movement, distribution, and habitat use of animals. They mainly measure the mean
distance between the road and animals. Zoologists will be interested in the common
behaviors of animals near the road where the tra±c rate has been varied.
Hence, discovering the common sub-trajectories helps reveal the efects of roads and
tra±c.
Example 2. Consider an animal habitat and roads in Figure 2. Thick lines represent
roads, and they have diferent tra±c rates. Wisdom et al. [23] explore the spatial
patterns of mule deer and elk in relation to roads of varying traffic rates. More
speci¯cally, one of the objectives is to assess the degree to which mule deer and elk
may avoid areas near roads based on variation in rates of motorized tra±c. These
areas are represented as solid rectangles. Thus, our framework is indeed useful for
this kind of research. 2
Low
Traffic Rate
Roads
One might argue that, if we prune the useless parts of trajectories and keep only the
interesting ones, we can use traditional clustering algorithms that cluster trajectories
as a whole. This alternative, however, has two major drawbacks compared with our
partition-and-group framework.
partitioned into a set of line segments. These line segments are provided to the next
phase.
(2) The grouping phase: Similar line segments are grouped into a cluster. Here, a
density-based clustering method is exploited.
Density-based clustering methods [2, 6] are the most suitable for line segments
because they can discover clusters of arbitrary shape and can ¯lter out noises [11].
We can easily
see that line segment clusters are usually of arbitrary shape, and a trajectory
database contains typically a large amount of noise (i.e., outliers).
dition, we provide a simple but efective heuristic for determining parameter values of
the algorithm.
² We demonstrate, by using various real data sets, that our clustering algorithm
e®ectively discovers the representative trajectories from a trajectory database.
The rest of the paper is organized as follows. Section presents an overview of our
trajectory clustering algorithm.
Section 3 proposes a trajectory partitioning algorithm for the ¯rst phase. Section 4
proposes a line segment clustering algorithm for the second phase. Section 5
presents the results of experimental evaluation. Section 6 discusses related
2. TRAJECTORY CLUSTERING
In this section, we present an overview of our design. Section 2.1 formally presents
the problem statement. Section
2.2 presents the skeleton of our trajectory clustering algorithm, which we call
TRACLUS. Section 2.3 de¯nes our distance function for line segments.
trajectory for each cluster Ci, where the trajectory, cluster, and representative
trajectory are de¯ned as follows.
points chosen from the same trajectory. Line segments that belong to the same
cluster are close to each other according to the distance measure. Notice that a
trajectory can belong to multiple clusters since a trajectory is partitioned into multiple
line segments, and clustering is performed over these line segments.
to our distance measure are grouped together into a cluster. Then, a representative
trajectory is generated for each cluster. 2
TR5
TR1
TR2
TR3
TR4
A set of trajectories
A cluster
(1) Partition
(2) Group
A representative trajectory
2.2 The TRACLUS Algorithm Figure 4 shows the skeleton of our trajectory clustering
=1 is de¯ned
=1 ap
k Pnk
=1 ap¡1
Algorithm:
/* Partitioning Phase */
/* Figure 8 */
02: Execute ApproximateTrajectory Partitioning;
/* Grouping Phase */
/* Figure 12 */
/* Figure 15 */
Li
Lj
si ei
ej
sj
l^1
l^2
dq
l||1 l|| 2
sin( )
12
2
2
q=´q
=+
^^
^^
d Lj
dll
ll
ll
ps pe
d?(Li;Lj) =
l2?
1 + l2?
l?1 + l?2
(1)
pe to si and ei.
and Lj .
dµ(Li;Lj) =
where u1 =
k¡¡! sieik2 ; u2 =
k¡¡! sieik2
(4)
cos(µ) =
(5)
3. TRAJECTORY PARTITIONING
(c1 < c2 < c3 < ¢ ¢ ¢ < cpari ). Then, the trajectory TRi is
g.
tions.
p1
p2 p3
p4
p5
p6 p7
p8
pc1 pc2
pc3 pc4
TRi
tory partitions.
g. Then,
notes the length of a line segment pcj pcj+1, i.e., the Eu-
L(H) =
paXri¡1
j=1
L(DjH) =
paXri¡1
j=1
cjX+1¡1
k=cj
(7)
log ( ( , ) ( , ) ( , ))
( | ) log ( ( , ) ( , ) ( , ))
( ) log ( ( ))
2141214231434
2141214231434
214
dppppdppppdpppp
LDHdppppdppppdpppp
L H len p p
q+q+q
=+++
^^^
pc1 pc2
p1
p2 p3
p4
p5
ter than the endpoints of line segments. That is, the new
Algorithm ApproximateTrajectoryPartitioning
Algorithm:
12: Add pleni into the set CPi; /* the ending point */
a trajectory.
approximate algorithm.
of a trajectory TRi.
TRi. 2
p1
p2
p4
(,)(,)
(,)(,)
1515
1414
MDL p p MDL p p
MDL p p MDL p p
par nopar
par nopar
<
>
approximate solution
exact solution
p3 p5
is symmetric. 2
line segments.
De¯nition 4. The "-neighborhood N"(Li) of a line segment
MinLns.
two conditions:
L1
L3
L5 L2 L4
L6
L6 L5 L3 L1 L2 L4
connectivity.
4.1.3 Observations
are far apart. Thus, it turns out that a short line segment
(,)(,)0
(,)(,)
12232
1223
==»
=<
dLLdLLL
dist L L dist L L
L3
L2
L1
density-reachable
e
q=q=>
=>
12232
1223
(,)(,)
(,)(,)
dLLdLLL
dist L L dist L L
L3
L2
L1
clustering.
algorithm DBSCAN.
line segments. For example, in the extreme, all the line seg-
De¯nition 10.
core line segment. In the third step (lines 13»16), the algo-
tering algorithm.
Algorithm:
/* Step 1 */
/* Step 2 */
11: else
/* Step 3 */
18: while (Q 6= ?) do
28: g
line segments.
segments.
line. This number changes only when the sweep line passes
otherwise, we skip the current point (e.g., the 5th and 6th
cated too close (e.g., the 3rd position in Figure 13), we skip
sweep
MinLns = 3
2345678
sentative trajectory.
1; ¡!v
2, ¡!v
3,
¡!V
of V is de¯ned as
¡!V
¡!v
1 + ¡!v
2 + ¡!v
3 + ¢ ¢ ¢ + ¡!vn
jVj
(8)
x0
y0
cos Á sin Á
¡sin Á cos Á
¸·
(9)
X¢
Y¢
1x
1y
1 x¢
1 y¢
(,)(,123
111
lllyyy
xyx
¢¢+¢+¢¢¢=
average coordinate
( , ) 1 l1 x¢ y¢
( , ) 1 l 2 x¢ y¢
( , ) 1 l 3 x¢ y¢
Figure 14: Rotation of the X and Y axes.
Algorithm:
¡!V
¡!V
sentative trajectory.
tion. If all the outcomes are equally likely, then the entropy
should be maximal.
for too small an ", jN"(L)j becomes 1 for almost all line
technique.
H(X) =
Xn
i=1
p(xi) log2
p(xi)
=¡
Xn
i=1
where p(xi) =
jN"(xi)j Pn
j=1 jN"(xj)j
and n = numln
(10)
heuristic in Section 5.
5. EXPERIMENTAL EVALUATION
the results for two real data sets in Sections 5.2 and 5.3. We
ricanes from the years 1950 through 2004. This data set has
4http://weather.unisys.com/hurricane/atlantic/
5http://www.fs.fed.us/pnw/starkey/data/tables/
in the animal movement data set are much longer than those
nuXmclus
i=1
@1
2jCij
x2Ci
y2Ci
dist(x; y)2
A+
2jN j
w2N
z2N
dist(w; z)2
10.06
10.08
10.1
10.12
10.14
10.16
10.18
10.2
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58
Eps
Entropy
optimum
130000
140000
150000
160000
170000
180000
27 28 29 30 31 32 33
Eps
QMeasure
MinLns=5
MinLns=6
MinLns=7
optimum
11.36
11.38
11.4
11.42
11.44
11.46
11.48
11.5
1 4 7 10 13 16 19 22 25 28 31 34 37 40 43 46 49 52 55 58
Eps
Entropy
optimum
510000
530000
550000
570000
590000
610000
630000
25 26 27 28 29 30 31
Eps
QMeasure
MinLns=8
MinLns=9
optimum MinLns=10
upper-right region looks dense, we ¯nd out that the elks ac-
on average.
6. RELATEDWORK
PK
the distance measure LCSS, and Chen et al. [5] the distance
measure EDR. Both LCSS and EDR are based on the edit
rates. EDR can represent the gap between two similar sub-
ever, are not adequate for our problem since they are orig-
portions of trajectories.
divided into disjoint row and column groups such that the
7.1 Discussion
tering algorithm.
tive trajectory.
our algorithm.
7.2 Conclusions
trajectories. This work is just the ¯rst step, and there are
8. REFERENCES
[1] Achtert, E., BÄohm, C., Kriegel, H.-P., KrÄoger, P., and
[4] Chen, J., Leung, M. K. H., and Gao, Y., \Noisy Logo
[5] Chen, L., Ä Ozsu, M. T., and Oria, V., \Robust and
[6] Ester, M., Kriegel, H.-P., Sander, J., and Xu, X., \A
Aug. 1999.
2002.
169{184, 2001.
1982.
2003.
1. INTRODUCCIÓN
El permiso para hacer copias digitales o impresas de todo o parte de este trabajo
para uso personal o en el aula se otorga sin cargo, siempre que las copias no se
hagan o distribuyan con fines de lucro o ventaja comercial y que las copias lleven
este aviso y la cita completa en la primera página . Copiar lo contrario, volver a
publicar, publicar en servidores o redistribuir en listas, requiere un permiso
específico previo y / o una tarifa.
TR2
TR3
TR4
ya que es crucial para reducir los daños causados por huracanes. Los meteorólogos
estarán interesados en los comportamientos comunes de los huracanes cerca de la
costa (es decir, en el momento del aterrizaje).
ing) o en el mar (es decir, antes de aterrizar). Por lo tanto, descubrir las
subtrayectorias comunes ayuda a mejorar la precisión de los pronósticos de
huracanes.
Los zoólogos están investigando los impactos de los diversos niveles de tráfico
vehicular en el movimiento, la distribución y el uso del hábitat de los animales.
Miden principalmente la distancia media entre la carretera y los animales. Los
zoólogos estarán interesados en los comportamientos comunes de los animales
cerca del camino donde la tasa de tráfico ha sido variada.
Por lo tanto, descubrir las subtrayectorias comunes ayuda a revelar los efectos de
las carreteras y el tráfico.
Bajo
Tasa de tráfico
Carreteras
vea que los grupos de segmentos de línea generalmente tienen una forma arbitraria,
y una base de datos de trayectoria contiene típicamente una gran cantidad de ruido
(es decir, valores atípicos).
Además, proporcionamos una heurística simple pero efectiva para determinar los
valores de los parámetros del algoritmo.
2. AGRUPACIÓN DE TRAYECTORIA
jectorios Una trayectoria pc1pc2 ¢ ¢ ¢ pck (1 · c1 <c2 <¢ ¢ ¢ <ck · leni) se denomina
subtrayectoria de TRi.
Enviar comentarios
Historial
Guardadas
Comunidad