
Expert Systems with Applications 38 (2011) 14732–14743. doi:10.1016/j.eswa.2011.05.007

Contents lists available at ScienceDirect

Expert Systems with Applications


journal homepage: www.elsevier.com/locate/eswa

Similarity measure based on piecewise linear approximation and derivative dynamic time warping for time series mining

Hailin Li a, Chonghui Guo a,*, Wangren Qiu b

a Institute of Systems Engineering, Dalian University of Technology, Dalian 116024, China
b Research Center of Information and Control, Dalian University of Technology, Dalian 116024, China

* Corresponding author. Tel.: +86 41184708007. E-mail addresses: hailin@mail.dlut.edu.cn (H. Li), guochonghui@tsinghua.org.cn (C. Guo), qiuone@163.com (W. Qiu).

Article info

Keywords:
Similarity measure
Dynamic time warping
Piecewise linear approximation
Time series mining

Abstract
We propose a new method to calculate the similarity of time series based on piecewise linear approximation (PLA) and derivative dynamic time warping (DDTW). The proposed method includes two phases. One is a divisive approach of piecewise linear approximation based on the middle curve of the original time series. Apart from its attractive results, it can create line segments to approximate time series faster than conventional linear approximation. Meanwhile, the high dimensional space can be reduced to a lower one, and the line segments approximating the time series are used to calculate the similarity. In the other phase, we utilize the main idea of DDTW to provide another similarity measure based on the line segments obtained in the first phase. We empirically compare our new approach to other techniques and demonstrate its superiority.
Crown Copyright © 2011 Published by Elsevier Ltd. All rights reserved.

1. Introduction
Time series are a ubiquitous form of data related to time and distributed in various fields, such as stock data (Yang, Wang, & Philip, 2003) and the data of production consumption and web transactions (Samia & Conrad, 2007). Although some data have nothing to do with time, they can be transformed into the form of time series and studied by the models and algorithms of time series. For example, the data of the shape of tree leaves can be treated as time series (Ye & Keogh, 2009). There is much valuable information hidden in time series, including interesting patterns (Ajumobi, Pken, & Preda, 2004; Anthony, Wu, & Lee, 2009), anomaly points (Keogh, Lin, & Fu, 2005) and motifs (Lin, Keogh, Lonardi, & Patel, 2002). In most cases, we need to measure the similarity (Chen, Hong, & Tseng, 2009) or dissimilarity (distance) between two time series in advance. However, the dimensionality curse of time series goes against an accurate similarity measure.
There are many ways to reduce the dimensionality, such as the discrete Fourier transform (DFT) (Agrawal, Faloutsos, & Swami, 1993; Agrawal, Psaila, Wimmers, & Zait, 1995), singular value decomposition (SVD) (Korn, Jagadish, & Faloutsos, 1997), discrete wavelet transform (DWT) (Chan & Fu, 1999), piecewise linear approximation (PLA) (Manjula, Morgan, & Layne, 2008), and symbolic aggregate approximation (SAX) (Keogh & Pazzani, 1998; Keogh et al., 2005) based on the piecewise aggregate approximation (PAA) (Hung & Duong, 2008). In particular, SAX and PLA are widely applied in many fields (Keogh, Chakrabarti, Pazzani, & Mehrotra, 2001; Keogh, Chu, Hart, & Pazzani, 2001; Lin & Keogh, 2006) and obtain very good results.
After reducing the dimensionality of time series data, the Euclidean distance is useful and simple for similarity measure, but it has some disadvantages. For example, abnormal data in a time series affects the whole similarity measure. Moreover, it may abandon a sequence query too early, which causes false alarms when indexing. Another popular method to compare time series in diverse areas is dynamic time warping (DTW) (Keogh & Pazzani, 1999; Keogh & Ratanamahatana, 2005). It offers a more reasonable measure for describing the relations between different time series by warping the time axis. Its improved version, called derivative dynamic time warping (DDTW) (Keogh & Pazzani, 2001), can produce more intuitive warpings and better results by considering the derivatives of the time series.
In this paper, we propose a novel approach to measure the similarity of time series. Firstly, a divisive approach of piecewise linear approximation (DPLA), whose time complexity is lower than that of the conventional ones, is given to approximate time series. Secondly, we propose middle curve piecewise linear approximation (MPLA) based on DPLA to approximate time series. This design has at least two advantages. One is that the middle curve is more suitable for describing the local and whole trends: because some conventional methods based on the original time series have difficulty expressing the trend and easily fall into local optima, it is reasonable to utilize a middle curve to represent the trends of the original time series. The other is the lower time

consumption of the approximation. Therefore, we adopt the middle curve based on DPLA to obtain finite line segments of the time series. These two processes constitute the new piecewise linear approximation, analogous to PAA. Finally, due to the particularities of the line segments derived from MPLA, we provide another reasonable similarity measure based on DDTW.
The rest of the paper is organized as follows. Section 2 discusses some methods of piecewise linear approximation and SAX based on PAA; meanwhile, we analyze a few problems which need to be solved and introduce derivative dynamic time warping. Section 3 presents the MPLA algorithm based on the DPLA method and the similarity measure based on DDTW in detail. In Section 4, we demonstrate the superiority of our approach by some experiments. The last section concludes the whole work and shows some views for future research.

2. Related work

2.1. Piecewise linear approximation

There are many kinds of piecewise linear approximation (Keogh, Chu, et al., 2001) to reduce the dimensionality of time series data, which can be grouped into three classes.

Window-Sliding: Once the next point joins the window and makes the cost of the line segment approximating the subsequence larger than a threshold value, create the line segment of the subsequence.
Bottom-Up: Merge the adjacent segments until every merging cost of the segments is larger than a threshold value.
Top-Down: Partition the time series recursively from top to down until some threshold condition is met.

The main idea of the window-sliding algorithm is to slide the window to the first point which does not fall into the window; at that moment, the line segment approximating the subsequence within the window is formed. The approximation depends on a threshold value, which will cause several pathologically poor results under some circumstances. Shatkay noticed the problem and gave some explanation (Shatkay & Zdonik, 1996). Later, a modified version (Park, Lee, & Chu, 1999) was given to improve the algorithm. The two most important properties of this linear approximation are its linear time complexity and its online computation.

The bottom-up method is different from the sliding window. Firstly, it starts with m/2 segments approximating a time series of length m. Secondly, after merging the adjacent segments with a minimum merging cost, it must delete one of the two merged segments and recalculate the merging costs between the new segment and its neighbors. These steps are repeated until every merging cost is larger than the threshold value. Similar to the sliding window, the bottom-up approach is widely used for time series mining (Hunter & McIntosh, 1999) and its time complexity is linear in the length of the time series, i.e. O(Km), where K is the number of line segments. However, it is not an online algorithm.

As opposed to the bottom-up algorithm, the top-down algorithm approximates the time series with different line segments by searching for the best location each time. At each recursive step the algorithm searches for a minimum divisive error. If the minimum error at the i-th point is smaller than a threshold value ε, the subsequence of time series Q is partitioned into two parts at the i-th point. The algorithm then considers the two subsequences again and repeats the above steps until all the divisive minimum errors are larger than the threshold value.

The top-down algorithm has also been widely used in diverse areas, such as frameworks for sequence mining, text mining and time series mining. Park et al. introduced a modified version where they first mark every peak and valley after scanning the entire dataset (Park et al., 1999). However, the time complexity of the top-down algorithm is larger than that of the two previous ones: it is O(Km^2). The reason is that each recursion should calculate the divisive errors of the two line segments when one point breaks one sequence into two subsequences. That is, each recursion should find the best breakpoint to partition the present sequence so that the approximation error of the two subsequences located on both sides of the breakpoint is minimal.

The PLA algorithms represent time series with some line segments. Most of the algorithms have a low computational complexity which is linear in the length of the time series, but some have higher complexity (Bauman, Dorofeyuk, & Kornilovm, 2004; Zhang & Wan, 2008) because they pursue optimal results. In Section 3, we will propose another algorithm based on the top-down algorithm, whose time complexity is linear in the length of the time series. We call it divisive piecewise linear approximation (DPLA).

2.2. Symbolic aggregate approximation


SAX transforms an original time series into some discrete strings. The whole work includes two phases. The first, piecewise aggregate approximation (PAA), is a process of high dimensionality reduction for time series. The second is a transformation from mean values to some discrete strings, which is the symbolic procedure. Since SAX has two important advantages, dimensionality reduction and lower bounding (Lin, Keogh, & Lonardi, 2003), it is often used to mine time series.
A time series of length m can be represented as a vector Q = {q1, q2, . . . , qm}, which can be transformed into a w-dimensional space as a new vector Q′ = {q′_1, q′_2, . . . , q′_w} according to

$$ q'_i = \frac{1}{k} \sum_{j=k(i-1)+1}^{ki} q_j, \quad i = 1, 2, \ldots, w, \qquad (1) $$

where k = m/w.
From formula (1), we know that the time series data is divided into w equal-sized frames of length k, and each element of the new vector is the mean of the points falling within the corresponding frame. The time complexity of PAA for the similarity measure is linear in the length of the time series, i.e. O(m). The representative values of the original time series in the new space are shown in Fig. 1. SAX is a method to change the mean values into discrete string representations (Lin et al., 2003). Fig. 2 shows the result of SAX after the PAA procedure.
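As a concrete illustration, here is a minimal Python sketch of the PAA transform of formula (1); like the formula, it assumes the length m is divisible by w (the function name and the use of NumPy are our choices, not part of the original method):

```python
import numpy as np

def paa(q, w):
    """Piecewise aggregate approximation, formula (1).

    Reduces a length-m series to w frame means; assumes m is divisible by w.
    """
    q = np.asarray(q, dtype=float)
    k = len(q) // w                  # frame length k = m / w
    # Mean of the points falling within each of the w equal-sized frames.
    return q[:w * k].reshape(w, k).mean(axis=1)
```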
Fig. 1. New values in lower space by PAA.

It is easy to find that some special values in the original space are generalized in a way that cannot express the properties of the time series well, such as the local trend, the whole trend and the individual point distribution. There are several different cases hiding the

valuable information in the new space, as shown in Fig. 2. It means that in some cases the points within the frames have equal mean values by SAX but, unfortunately, their corresponding Euclidean distance is still very large, which results in mismatches between time series.
In Fig. 2, the SAX algorithm cannot represent the trend of the original time series well. It only generalizes those particular segments of the time series with four discrete letters (A, B, C, D) without any reflection of the local trend of the time series. The trend of a time series is very important. It not only describes the features of the subsequences but also embodies the whole trend of the time series. Therefore, we should retain these significant properties of time series.

Fig. 2. Symbolic representation of time series by SAX; the letters only reflect the trend of some points.

2.3. Derivative dynamic time warping

Dynamic time warping (DTW), used for similarity measure, not only reflects the naturally common features of two different time series but also can compare time series of different lengths. Many papers regard it as a method to compute the true distance between two different objects. Meanwhile, its improved version, derivative dynamic time warping (DDTW), can warp the time axis more suitably and obtains better results than DTW.
Now suppose we have two time series, Q = {q1, q2, . . ., qn} and C = {c1, c2, . . . , cm}. There is an n-by-m matrix D whose element is d(i, j) = (q_i − c_j)².
The warping path W is a contiguous set of D's elements, which denotes the mapping between Q and C. The l-th element of W is defined as w_l = d(i, j)_l, so we have W = {w1, w2, . . . , w_l, . . . , w_k}, where k ∈ [max(m, n), m + n − 1); in particular, when m = n, k ∈ [m, 2m − 1). The relative information is also illustrated in Fig. 3.
The warping path is typically subject to several constraints, as follows.
Boundary conditions: w_1 = d(1, 1) and w_k = d(m, n), meaning that the path must begin at element (1, 1) and end at element (m, n) of matrix D.
Continuity and monotonicity: given w_l = (a, b), then w_{l−1} = (a′, b′), where 0 ≤ a − a′ ≤ 1 and 0 ≤ b − b′ ≤ 1. This not only restricts the allowable steps in the warping path to adjacent cells, but also forces the points in the warping path to be monotonically ordered in time.
We know that there are many warping paths satisfying these constraints, but only the best one is required. Therefore, we should choose it carefully according to the minimum warping cost

$$ \mathrm{DTW}(Q, C) = \min_{W} \left\{ \frac{1}{k} \sqrt{\sum_{l=1}^{k} w_l} \right\}. \qquad (2) $$

Generally, the optimal path can be found by using a dynamic programming method:

$$ r(i, j) = d(i, j) + \min\{\, r(i, j-1),\ r(i-1, j-1),\ r(i-1, j) \,\}. \qquad (3) $$

The value of the cumulative distance r(i, j) is the distance d(i, j) found in the current cell plus the minimum of the cumulative distances of the three adjacent elements. We need to consider only these three, rather than eight, adjacent elements because of the continuity and monotonicity of the warping path.

Fig. 3. The warping path of dynamic time warping.
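To make the recursion concrete, the following is a minimal sketch of filling the cumulative distance matrix r of formula (3); it is an illustrative implementation (without windowing or path recovery), not the authors' code:

```python
import numpy as np

def dtw_cost(q, c):
    """Fill the cumulative distance r(i, j) of formula (3); return r(n, m)."""
    n, m = len(q), len(c)
    r = np.full((n + 1, m + 1), np.inf)
    r[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = (q[i - 1] - c[j - 1]) ** 2       # d(i, j) = (q_i - c_j)^2
            # Continuity and monotonicity allow only three adjacent cells.
            r[i, j] = d + min(r[i, j - 1], r[i - 1, j - 1], r[i - 1, j])
    return r[n, m]
```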

Fig. 4. The difference of the DTW and DDTW: (a) DTW; (b) DDTW.


Since DTW tries to explain the variability of time series values by warping the time axis, it causes unintuitive alignments where a single point of one time series maps to a large subsection of another. Keogh and Pazzani (2001) call this "singularity". Moreover, if q_i and c_j are, respectively, points of a rising trend of one time series and of a falling trend of the other, we should not map one to the other directly because of this obvious feature. To deal with this problem, DDTW modifies DTW by considering higher-level features of shape; for example, the slope at the points is an important feature.
In DTW, the distance between q_i and c_j is d(i, j) = (q_i − c_j)². In DDTW, however, the square of the difference of the estimated derivatives of q_i and c_j replaces d(i, j). It uses

$$ D_i(q) = \frac{(q_i - q_{i-1}) + (q_{i+1} - q_{i-1})/2}{2} \qquad (4) $$

to estimate the derivative at q_i. Likewise, the derivatives of the points of C can be obtained by formula (4) and are denoted as D_j(c). Then a new distance is given by

$$ D'(i, j) = (D_i(q) - D_j(c))^2. \qquad (5) $$
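A short sketch of the derivative estimate of formula (4); the treatment of the two boundary points is our assumption (they copy the nearest interior estimate), since the formula is defined only for interior points:

```python
def estimated_derivatives(q):
    """Derivative estimate of formula (4) at each interior point of q."""
    d = [((q[i] - q[i - 1]) + (q[i + 1] - q[i - 1]) / 2.0) / 2.0
         for i in range(1, len(q) - 1)]
    # Boundary handling is an assumption: reuse the nearest interior value.
    return [d[0]] + d + [d[-1]]
```

Running dtw_cost from the sketch above on the two derivative sequences then gives the DDTW cost, with d(i, j) replaced by D′(i, j) of formula (5).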

Now the original D is replaced by the new D′, and the remaining steps of DDTW are the same as those of DTW. Fig. 4 shows the difference between DTW and DDTW, illustrating that DDTW can reflect the trend of a sequence while DTW cannot.

3. New piecewise linear approximation and similarity measure

The representations of the original time series in the new space should not only reduce the dimensionality but also reflect the trends of the subsequences and of the whole time series. That means the representations must indicate the trends clearly so as to distinguish ascending segments from descending segments. Since the slope of a line segment is an important feature of every sequence, an approximation had better reflect the changing trend of the time series. Traditionally, some kinds of piecewise linear approximation obtain line segments without considering the slope, like the PAA algorithm. As shown in Fig. 5, the segments marked 1 and 3 have the same mean, which cannot reflect the different local trends well. Some papers use the mean of the points in a frame to reduce the dimensionality and simultaneously let the slope express the trend of the sequence. For instance, LPAA, proposed by Hung and Duong (2008), is a method based on the original PAA and the slope to improve the similarity measure. It can alleviate but not eliminate the disadvantages for time series with strong turns at some points. In other words, LPAA loses the ability of trend expression after approximating the limited subsequence by PAA. Fortunately, our proposed method can approximate the trend with the slope and reduce the dimensionality better.

Fig. 5. The curve has a strong turning. It has three different trends (descending, flat and ascending), but PAA cannot express the sharp turning and the trends well.

3.1. Divisive piecewise linear approximation

Divisive piecewise linear approximation (DPLA) is a kind of top-down algorithm. Our motivation is to reduce the dimensionality and approximate the trends, including the local trend changes of subsequences and the whole trend changes of the time series. If one line segment L(i : j) approximates the subsequence Q(i : j) of time series Q, the approximation cost of the subsequence should be smaller than a threshold value ε. In this paper, we do not consider all the points together to find the minimum approximation cost. Instead, we only find the point of the subsequence Q(i : j) farthest from the corresponding line segment L(i : j) and judge whether its distance is larger than the threshold value ε. If it is, the farthest point is regarded as a breakpoint; otherwise, the line segment is accepted as the best approximation of the subsequence.
For a subsequence Q(i : j) of a time series, there is a line segment L(i : j) that approximates it well. The distance of the point q_t (i ≤ t ≤ j) to the line segment L(i : j) is denoted as D(q_t, L(i : j)). Because the two endpoints of the line segment L(i : j) are q_i and q_j, we have

$$ L(i:j) = \frac{q_i j - q_j i}{j - i} + \frac{q_j - q_i}{j - i}\, t, \quad i \le t \le j. \qquad (6) $$

Suppose the equation of the straight line segment is q = b + at, namely at − q + b = 0; then we have

$$ a = \frac{q_j - q_i}{j - i}, \qquad b = \frac{q_i j - q_j i}{j - i}. \qquad (7) $$

So the distance of the point q_t to L(i : j) is

$$ D(q_t, L(i:j)) = \frac{|a t - q_t + b|}{\sqrt{a^2 + 1}}. \qquad (8) $$
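For concreteness, a small Python sketch of formulas (6)-(8) with 0-based indices (an illustrative helper, not the authors' code; the algebra is unchanged by the index base):

```python
def point_to_segment(q, i, j, t):
    """Distance of q[t] to the chord L(i:j) through (i, q[i]) and (j, q[j])."""
    a = (q[j] - q[i]) / (j - i)            # slope a, formula (7)
    b = (q[i] * j - q[j] * i) / (j - i)    # intercept b, formula (7)
    return abs(a * t - q[t] + b) / (a * a + 1.0) ** 0.5   # formula (8)
```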

For a time series Q of length m, or a subsequence Q(i : j), with discrete real values, if we directly link every two adjacent points with m − 1 or j − i line segments, the distance of every point to the corresponding line segment is equal to 0. This means the line segment approximation of the time series is the best possible and the approximation cost is minimal (equal to 0). Although it approximates the time series best, it yields no dimensionality reduction and is meaningless. Therefore, we allow the distance between a point and the corresponding line segment to approach some value rather than the minimum; in other words, a subsequence is divided only when this distance is larger than a defined threshold value ε.
For subsequence Q(i : j), if the distance of point q_l (i ≤ l ≤ j) to the line segment L(i : j) is maximal and D(q_l, L(i : j)) > ε, the point q_l is a breakpoint and subsequence Q(i : j) is divided into two parts, Q(i : l) and Q(l : j); otherwise, the line segment approximates subsequence Q(i : j) well and it is not necessary to divide the subsequence Q(i : j) any further.
The algorithm of divisive piecewise linear approximation is as follows.
Step 1: Input time series Q(i : j) and threshold value ε. A vector Bp is used to store the breakpoints, k records the number of the present breakpoints, and pos denotes the position of the newest breakpoint. Initially, i = 1 and j = m, where m is the length of the time series. Since the first point and the last point are special breakpoints, let k = 2, Bp(1) = q_1 and Bp(2) = q_m.
Step 2: For time series Q(i : j), create line segment L(i : j) according to formula (6). Set two variables l = i + 1 and best_so_far = 0.


Step 3: Calculate the distance of point q_l to the line segment L(i : j), that is, D(q_l, L(i : j)).
Step 4: If D(q_l, L(i : j)) > best_so_far, set best_so_far = D(q_l, L(i : j)) and pos = l.
Step 5: l = l + 1. If l ≥ j, go to Step 6; otherwise, go back to Step 3.
Step 6: If best_so_far ≥ ε, set k = k + 1 and Bp(k) = q_pos, and let the two subsequences Q(i : pos) and Q(pos : j) redo Step 2 to Step 6, respectively.
Step 7: Sort the elements of vector Bp by ascending time and output the sorted result.
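The steps above can be condensed into a few lines of Python. This is an illustrative recursion that reuses the point_to_segment sketch given earlier; it is not the authors' implementation:

```python
def dpla(q, eps):
    """Divisive PLA, Steps 1-7: return the sorted breakpoint indices of q."""
    bp = {0, len(q) - 1}                    # first and last points, Step 1

    def divide(i, j):                       # Steps 2-6 applied to Q(i : j)
        best_so_far, pos = 0.0, None
        for l in range(i + 1, j):           # farthest point from L(i : j)
            d = point_to_segment(q, i, j, l)
            if d > best_so_far:
                best_so_far, pos = d, l
        if best_so_far >= eps:              # Step 6: split and recurse
            bp.add(pos)
            divide(i, pos)
            divide(pos, j)

    divide(0, len(q) - 1)
    return sorted(bp)                       # Step 7
```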

Every breakpoint is one of the endpoints of the line segments approximating the time series, so we only need to link the adjacent breakpoints to create the line segments. As the above algorithm shows, our approach depends on the threshold value ε, just as the conventional top-down method does. As shown in Fig. 6, various groups of line segments are produced for different ε.
It is neither reasonable nor feasible to use a preset threshold value to control the approximation. Every time series in a large database has unique features; in other words, we should consider the special features of different time series. Therefore, it is important to automatically find a reasonable threshold value ε to control the number and the trends of the line segments so as to approximate the time series adaptively. Usually, the standard deviation is used to describe the variability of a data set around its mean. We choose the standard deviation of the distances of the points of the time series to the first line segment L(1 : m) as the upper bound of the permissible divisive condition. That is,

$$ \varepsilon = \mathrm{STD}(D(q_t, L(1:m))) = \sqrt{ \frac{1}{m} \sum_{t=1}^{m} \left( D(q_t, L(1:m)) - \frac{1}{m} \sum_{t=1}^{m} D(q_t, L(1:m)) \right)^2 }. \qquad (9) $$

We must choose the threshold ε carefully, because it affects the final results. At the beginning, DPLA creates the first line segment to approximate all points of the time series; the first and last points of the time series are the two endpoints of this first line segment. If one or both of these endpoints abnormally deviate from most of the points, we cannot use this line segment to calculate ε. Instead, we should search for the next suitable line segment that does not deviate from most of the points and treat it as the first line segment L(1 : m) for calculating the threshold value ε, as shown in Fig. 7.
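A sketch of this adaptive threshold, again reusing point_to_segment; NumPy's population standard deviation matches the 1/m factors of formula (9):

```python
import numpy as np

def adaptive_eps(q):
    """Threshold epsilon of formula (9): std of the distances to L(1 : m)."""
    m = len(q)
    d = np.array([point_to_segment(q, 0, m - 1, t) for t in range(m)])
    return d.std()     # population standard deviation (ddof = 0)
```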
Although DPLA is similar to the top-down algorithm, there are some obvious differences. One is that the time complexity of our approach is linear in the length of the time series: it is O(km), where k is the number of line segments and m is the length of the time series, whereas the time complexity of the traditional top-down method is O(km^2). Another is that we use the distance measure and the standard deviation to replace the original approximation cost, which has to be calculated for every point of the time series in the traditional method. In particular, we set the standard deviation as the threshold value, which makes the approximation of a time series adaptive and reflective of its own features.

Fig. 6. The number and the trends of line segments vary according to the different threshold values.


Fig. 7. L1 is far away from most of the points in the time series, which will make the standard deviation large. We should look for the next line segment to replace it, and find that L2 embraces most of the points in subsequence Q(1:59). So we regard L2 as L1 and apply the formula to compute the ε value. Since the distance deviation of the points in subsequence Q(59:60) to L3 is equal to zero, L3 is not suitable to be regarded as L1; actually, it is one of the final line segments approximating the subsequence.

Actually, the perpendicular distance D(q_t, L(i : j)) of the point q_t to the line segment L(i : j) can be replaced by the vertical difference: the perpendicular distance is a leg of a right-angled triangle whose hypotenuse is the vertical segment, and by the right triangle theorem the hypotenuse is longer than either leg. Therefore, to calculate the distance faster, we use

$$ D(q_t, L(i:j)) = |q_t - (a t + b)| = \left| q_t - \frac{q_i j - q_j i}{j - i} - \frac{q_j - q_i}{j - i}\, t \right| \qquad (10) $$

instead of formula (8), which speeds up the whole divisive piecewise linear approximation algorithm.
3.2. Middle curve-based piecewise linear approximation

In Fig. 8, some line segments with sharp slopes cannot express the trends of the subsequences well. For example, the line segment marked "a" directly links its two endpoints in disregard of any point between them; all the points of the subsequence appear on the bottom-left side of the line segment, so this approximation is not reasonable. Likewise, for the subsequences of the time series marked "b", there are too many line segments approximating the trends. Actually, it is easy to find that these sequences are smooth except for some frequent small amplitudes and can be approximated naturally by one or two line segments with slightly flat slopes.
Why does this happen in our algorithm? The reason is that we search for the breakpoints and link them directly, without considering the other points between the two breakpoints, whenever the distance of every point within the sequence to the corresponding line segment is smaller than the ε value.
To overcome these disadvantages, we propose another piecewise linear approximation based on the middle curve (MPLA), which looks like the center line of an irregular pipeline. The whole algorithm has two phases. One is to find a middle curve to represent the original time series; the other is to use DPLA to approximate the middle curve. Because in the second phase we can regard the middle curve as the original time series and put it into the DPLA algorithm directly, we only consider how to create the middle curve in this subsection.
The middle curve is the center line of the time series; it resembles the center line of an irregular pipeline, so we should create the pipeline in advance. This means that the peaks and valleys of the time series should be on the edges of the pipeline; in other words, the peaks and valleys construct the edges of the pipeline. So we should mine the peaks and valleys of the time series and store them in two matrices, Up and Lw, which record the sites of the peaks and valleys, respectively.
If there are continuously ascending points r times in some subsequence, then the last ascending point of this subsequence is a peak point. We store the peak points in the matrix Up. Likewise, we obtain the valley matrix Lw. In other words, let q_i be the first point of the subsequence Q(i : j). For each l (i < l ≤ r + i) with r + i + 1 ≤ j, if q_{l−1} < q_l and q_{r+i} > q_{r+i+1}, the point q_{r+i} is a peak point and is stored into the matrix Up. Similarly, for each l (i < l ≤ r + i) with r + i + 1 ≤ j, if q_{l−1} > q_l and q_{r+i} < q_{r+i+1}, the point q_{r+i} is a valley point and is stored into the matrix Lw.
Sometimes a small number of continuous points have several large amplitudes; that is, the differences between adjacent points in such a subsequence are very large. For example, in Fig. 9 the points marked 1, 2 and 3 are continuous and have large amplitudes. Although the subsequence is made up of only three points, it has very large amplitudes. In this case, the algorithm must recognize them and save the first and last points of the point group with the same trend for the middle curve; that is, it must save the points marked 1 and 4. The point marked 1 or 3 could be regarded as a point of the middle curve without any changes, but because the points marked 1 to 4 belong to the same ascending group, our algorithm only saves the points marked 1 and 4 for the middle curve instead of the points marked 1 and 2. In other words, if the difference of the adjacent points q_{i−1} and q_i is much larger than the threshold value ε, we directly save the points q_i and q_{i−1} for the middle curve, which means that the two points are upper and lower bound points simultaneously, as shown in Fig. 11. The points marked C are the upper and lower bound points.

Fig. 8. DPLA cannot express the slope well.

Fig. 9. Three points in time series have large amplitudes and should be saved for the middle curve, considering the trends of the points.

To execute the proposed algorithm conveniently, as illustrated in Fig. 10, we only interpolate several new values into the line segments (L12 and L23).

Fig. 10. Interpolate four values into line segment L12.

Given a line segment L(i : i + 1), if it is interpolated with n values, the interpolated line segment is L(l : l + n)′ = {q′_l, q′_{l+1}, . . . , q′_{l+n}}, i.e.

$$ q'_{l+j} = q_i + (q_{i+1} - q_i)\, j / n, \quad 0 \le j \le n. \qquad (11) $$

By the above analysis, we obtain the upper bound matrix Up and the lower bound matrix Lw. We put them into a combination matrix T = [Up; Lw] and sort the elements of T by the time column. Finally, with the sorted T, we can calculate every point of the middle curve and store it into the matrix M, i.e.

$$ M(i) = \frac{T(i) + T(i+1)}{2}, \quad i = 1, 2, \ldots, \operatorname{length}(T) - 1. \qquad (12) $$

The algorithm of middle curve-based piecewise linear approximation is illustrated as follows.

Step 1: Input time series Q = {q1, q2, . . . , qm}. The vector Len stores the difference of every two adjacent points, namely Len(i − 1) = q_i − q_{i−1}, i = 2, 3, . . . , m. The variable minLen is the mean of Len. Use the matrix Q′ to record the new values of the interpolated time series.
Step 2: Let j = 1, i = 2. Interpolate some values into the irregular subsequence by executing the following sub-steps.
Step 2.1: k = 1. If Len(i) > minLen, the number of interpolated values is num = ⌊Len(i)/minLen⌋ and the step length is l = Len(i)/num.
Step 2.2: k = k + 1. If q_i > q_{i−1}, then q′_k = q_{i−1} + l·j; otherwise, q′_k = q_{i−1} − l·j.
Step 2.3: If j < num − 1, set j = j + 1 and go back to Step 2.2; otherwise, execute the next sub-step.
Step 2.4: k = k + 1; q′_k = q_i. If i < m, set i = i + 1 and go back to Step 2.1; otherwise, go to Step 3.
Step 3: If q′_1 < q′_2, then tag = 1; otherwise, tag = 0. tag = 1 means that the values of the points in the time series are currently ascending and tag = 0 means they are descending. At the beginning, append the first point of time series Q′ into Lw and Up respectively, Lw = append(Lw, q′_1) and Up = append(Up, q′_1). Set n = length(Q′) and i = 2.
Step 4: If q′_i < q′_{i+1} and tag = 0, which means the trend of the points has changed from descending to ascending, then q′_i is the minimum value at present; append q′_i into matrix Lw, namely Lw = append(Lw, q′_i). Meanwhile, the tag value must be changed: tag = 1. If q′_i > q′_{i+1} and tag = 1, which means the trend has changed from ascending to descending, then q′_i is the maximum value at present; Up = append(Up, q′_i) and tag = 0.
Step 5: i = i + 1. If i ≤ n − 1, go back to Step 4; otherwise, store the last point of the time series into Lw and Up, namely Lw = append(Lw, q′_n) and Up = append(Up, q′_n).
Step 6: According to formula (12), combine the upper bound point values Up with the lower bound point values Lw to calculate the point values of the middle curve M.
Step 7: Put the middle curve M into the DPLA algorithm; we then obtain the breakpoints Bp of the middle curve for the original time series.
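For illustration, here is a compressed Python sketch of Steps 3-6 under simplifying assumptions: the interpolation of Step 2 and the large-amplitude special case are omitted, and runs of equal adjacent values are ignored, so it only approximates the full preprocessing:

```python
def middle_curve(q):
    """Average adjacent extrema (the sorted T = [Up; Lw]), as in formula (12)."""
    n = len(q)
    # Endpoints belong to both bounds; interior extrema are trend turns.
    idx = [0] + [i for i in range(1, n - 1)
                 if (q[i] - q[i - 1]) * (q[i + 1] - q[i]) < 0] + [n - 1]
    t = [q[i] for i in idx]                 # already sorted by time
    return [(u + v) / 2.0 for u, v in zip(t, t[1:])]   # M(i) of formula (12)
```

The resulting curve M can then be fed directly into the dpla sketch of Section 3.1, as Step 7 prescribes.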

By the above algorithm, we can obtain the upper bound point values and lower bound point values as shown in Fig. 11. They look like the points on the edges of an irregular pipeline. According to formula (12), we then obtain the point values M of the middle curve shown in Fig. 12.
Fig. 13 illustrates the whole process of the middle curve-based piecewise linear approximation for time series. The data used in Fig. 13 seems to be different from the data used to illustrate our ideas earlier; actually, they are identical except for a stretching of the time axis. From the bottom picture in Fig. 13, we find that our method not only reduces the dimensionality but also approximates the local and whole trends of the time series well.

Fig. 11. The upper bound and lower bound of the time series. The points marked A represent the lower bound points, the points marked B the upper bound points, and the points marked C denote points that are upper and lower bound points together.

Fig. 12. The middle curve and the original time series.

Fig. 13. The process of creating the middle piecewise linear approximation.

3.3. Similarity measure based on MPLA and DDTW

We extract the features of the time series by MPLA. The features of a time series are a set of line segments with slopes. Moreover, different time series have different numbers of line segments. Therefore, we cannot use the Euclidean distance to measure the similarity; instead, the derivative dynamic time warping algorithm is a good choice.

Due to the particularity of the line segments, such as the diverse number and lengths of the segments, we should not use derivative dynamic time warping directly. We define a warping window length to constrain the warping path, in the spirit of the Sakoe–Chiba band and the Itakura parallelogram (Rabiner & Juang, 1993; Sakeo & Chiba, 1990). For two time series Q and C, we obtain the breakpoints QBp = {qbp_1, qbp_2, . . . , qbp_l} and CBp = {cbp_1, cbp_2, . . . , cbp_s}, respectively. If r denotes the half length of the warping window in the original time series, then we take another formula as the distance between two points. That is,

$$ Ld(i, j) = \begin{cases} d(i, j), & \text{if } |qbp_i(1) - cbp_j(1)| < r, \\ d(i, j) + P(i, j), & \text{otherwise}, \end{cases} \qquad (13) $$

where d(i, j) = (qbp_i(2) − cbp_j(2))² and P(i, j) = ((|qbp_i(1) − cbp_j(1)| − r)·w)², with qbp_i(1) and qbp_i(2) denoting the time and the value of the breakpoint qbp_i, respectively, and w is the mean Euclidean distance of subsequences of length m chosen from the two time series. That is,

$$ w = \sqrt{\frac{1}{m} \sum_{i=1}^{m} (q_i - c_i)^2}, \quad m < \min(\operatorname{length}(Q), \operatorname{length}(C)). $$
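A minimal sketch of formula (13); each breakpoint is taken as a (time, value) pair, and r and w are assumed to be precomputed as described above (the function name is ours):

```python
def ld(qbp, cbp, r, w):
    """Windowed breakpoint distance Ld(i, j) of formula (13)."""
    d = (qbp[1] - cbp[1]) ** 2          # d(i, j): squared value difference
    gap = abs(qbp[0] - cbp[0])          # time difference of the endpoints
    if gap < r:
        return d
    return d + ((gap - r) * w) ** 2     # penalty P(i, j) outside the window
```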
From formula (13), we know that if the time difference of two endpoints, respectively from two line segments, surpasses the window length r, we punish the distance between q(i) and c(j) so that time series from different groups have larger distance values.
So, we transform the data with formula (5) and calculate the similarity by DDTW. The warping result is shown in Fig. 14. It is well known that the complexities of DTW and DDTW are O(nm), where n and m are the lengths of Q and C, respectively. However, the time complexity of our approach is O(NM), where N and M are the numbers of endpoints of the line segments approximating Q and C. They are much smaller than n and m, that is, N ≪ n and M ≪ m.

4. Experimental evaluation

Many experiments by Keogh and Lin (2002) have already demonstrated that the SAX algorithm based on PAA and its extended versions are useful and feasible for time series mining. Therefore, in this section, we mainly compare our method with the SAX algorithm.

4.1. Approximation results

We begin our experiments by approximating time series from the Stock data web page (2005) dataset, whose length is 2,119,415. We arbitrarily choose 50 subsequences from the time series; the length of every subsequence is larger than 1000. After executing MPLA, we obtain the resulting line segments. It is interesting to find that the average compress ratios are stable and approximately equal to 1/4. Fig. 15 shows the result of the average compress ratio, where

$$ \text{Compress Ratio} = \frac{\text{Number of the line segments}}{\text{Length of the time series}}. \qquad (14) $$

Fig. 16 shows that MPLA automatically creates 73 line segments to approximate a time series with 300 points. The line segments express the original time series well: they not only reflect the whole trend but also describe the local trends. Of course, if we want to use a smaller number of line segments to express the original time series, we only need to set the threshold ε manually to control the line segments in the process of MPLA.
Since SAX is based on PAA, we perform another experiment to illustrate the average errors of the approximations of the original time series by PAA and MPLA, respectively. We still use the Stock data web page (2005) dataset and arbitrarily choose 30 subsequences of length 2000. When reducing the dimensionality by the two methods, we deliberately let the dimensionality number of PAA be larger than that of MPLA, as shown in Fig. 17(a); this means that the approximation error of SAX should be smaller than that of MPLA. However, as Fig. 17(b) shows, the experiments demonstrate that the error of MPLA is smaller than that of PAA in spite of its lower dimensionality. Therefore, MPLA can express the time series better than PAA.
Fig. 14. The warping result of middle line segments based on DDTW.

Fig. 15. The average compress ratio for different lengths of time series.


Fig. 16. (a) The separated curves of line segments and time series. (b) The overlap of line segments and time series, which means the line segments fit the time series well.

Fig. 17. (a) Dimensionality number of PAA is greater than that of MPLA. (b) The error of MPLA is smaller than that of PAA to approximate the original time series, whose length is 2000. It means that we can use the lower dimensionality of MPLA to approximate the time series better than the higher dimensionality of PAA.

better than PAA.


4.2. Clustering

To test the new similarity measure, we perform a clustering study on the synthetic control chart dataset (Alcock & Manolopoulos, 1999) from the UCI data. Comparing hierarchical clusterings is one of the best ways to compare similarity measures, and the evaluation is typically objective: we only observe which dissimilarity measure is close to the Euclidean distance. In view of the results of comparing Euclidean, SAX, IMPACTS and SDA in the paper of Lin et al. (2003), we know that SAX was the best one whose distance measure is close to Euclidean. So we only compare our method with Euclidean and SAX and observe whether our method is better than SAX; if it is, our method is at least better than SAX, IMPACTS and SDA. We arbitrarily choose 14 time series to cluster with hierarchical clustering. The results, produced by the different methods Euclidean, DDTW, SAX and MPLA, are shown in Figs. 18–21. Note that the 14 data objects derive from 6 groups: normal {1, 2}, cyclic {3, 4, 5}, increasing trend {6, 7}, decreasing trend {8, 9}, upward shift {10, 11, 12}, downward shift {13, 14}.
From Figs. 18–21, we know that the clustering result of DDTW is the worst, and our method is at least better than DDTW. This also means that our method is an improved DDTW because of the new similarity measure based on DDTW.

Fig. 18. The best cluster result of the Euclidean distance.

Moreover, it may seem surprising that the result of SAX is better than that of the new similarity measure. Actually, for the 3rd, 4th and 5th time series, the distance between the 3rd and the 5th time series is larger than that between the 4th and the 5th. However, since SAX only considers the whole trend and neglects the local trend, it prefers to classify the 3rd and 5th time series into the same group. Since our method considers both the local and the whole trend of time series, it not only classifies the 4th and 5th time series into the same cluster in advance but also clusters the 3rd, 4th and 5th time series into the same group.

Fig. 19. The best cluster result of the derivative dynamic time warping (DDTW).

Fig. 20. The best cluster result of SAX, whose parameters are w = 10, word_size = 9.

Fig. 21. The cluster result of the new similarity measure.

Especially, for the 1st and 2nd time series, our method does not group them into the normal group at the prior phase, which is the same as Euclidean. But SAX prefers to combine the 2nd with the 1st and 4th because it considers the means of the points within the frames rather than the local trend. To examine the local trend of time series, the cyclic group (3rd, 4th, 5th) should be analyzed, because the cyclic property can better reflect the local trend. We find that SAX cannot classify the cyclic group into the same cluster, but our method is able to do so, and does so early. It is very important that the time series {6, 7}, {8, 9}, {10, 12}, {13, 14} are grouped into the same classes respectively, which SAX and the other methods cannot achieve. Therefore, our method can deal with the local trend and the whole trend simultaneously.
Apart from the above advantages, we point out that there is no parameter to be preset for MPLA, whereas SAX needs two parameters (the number of segments w and the alphabet size a) which are usually hard to decide. Fig. 20 is the best clustering result obtained by adjusting the two parameters. Although MPLA essentially has one threshold to set, it is easily defined by formula (9). With different parameters, SAX produces the rough clustering result shown in Fig. 22. It means that we must have prior information to set the parameters, otherwise we cannot get good clustering results.
In addition, our approach can recognize similar shapes in different time series. For example, the similar shapes in two time series are shown in Fig. 23. If we use SAX to measure the similarity, it cannot get the proper result of the shape comparison, which should be equal to 1; however, the similarity result of our approach is equal to 1, which reflects the trend of the whole sequence and of the subsequences well. Therefore, our method is better for measuring the similarity of time series of equal or unequal length, which is beneficial to similarity search for time series mining.

Fig. 22. The result of SAX used to cluster is rough when the two parameters are set to w = 10 and a = 4, which demonstrates that it does not consider the local trend at all.

4.3. Similarity search

Similarity search in time series is to mine the subsequences similar to a pattern sequence. Here we study the symbolic representations of the line segments created by MPLA. We initially use the cloud model (Li, Han, Shi, & Chan, 1998) to transform the angles or the slopes into string representations; the similarity search is then based on these symbolic representations.
The stock data is very large and has enough information for similarity search, so we test our method on it. We choose the subsequence Q(200 : 250) as the pattern sequence and let the similarity search algorithm search the time series Q(1 : 2000). The results of the similarity search are shown in Fig. 24. We find that the algorithm based on our method not only can find the pattern sequence marked "a" but also discovers the similar subsequences in the time series Q(1 : 2000), namely the subsequences marked "a", "b", "c" and "d". Therefore, similarity search based on our method also obtains a good result.

Fig. 23. (a) The two time series with very similar shapes in the same frame of axes; (b) the result of the SAX approximation; (c) the result of the new method's approximation.

Fig. 24. (a) The result of similarity search. (b) The zoomed view of the result of similarity search.


5. Conclusions
In this paper, we propose a new method to measure the similarity of time series. DPLA adaptively divides a time series into unequal segments and does not require any preset parameter. Moreover, its time complexity is O(km), which is much lower than that of the conventional top-down linear approximation. We use MPLA, which is based on DPLA, to approximate time series with line segments; it better reflects the trends of the subsequences and of the whole sequence. Moreover, the line segments produced by MPLA to approximate the middle curve of the original time series make the approximation error smaller. For the particularity of the results of MPLA, a modified derivative dynamic time warping is proposed to calculate the similarity of time series, which is beneficial to separating the members of different groups. The empirical results demonstrate that the new similarity measure is an effective method for time series mining.
The experimental analysis shows that our method is better for dimensionality reduction and similarity measure, but its time complexity is a little higher than that of SAX. The reason is that our method is based on derivative dynamic time warping to calculate the similarity.
Symbolic representations for measuring time series can also be applied in our method. We have found a way (Li et al., 1998) to objectively describe the slopes or angles of line segments as symbolic representations. Perhaps our method is a good choice for improving the present and available SAX-based methods for finding motifs and discovering rules. Of course, we can also use our method for other time series mining work, such as classification, clustering and other time series mining tasks.

Acknowledgments
This work is supported by the Natural Science Foundation of
China (70871015 and 71031002) and the Fundamental Research
Funds for the Central Universities (DUT11SX04). We also would
like to acknowledge Prof. Eamonn Keogh for the dataset and his
procedure source code.

References
Agrawal, R., Psaila, G., Wimmers, E. L., & Zait, M. (1995). Querying shapes of histories. In Proceedings of the 21st international conference on very large databases, Zurich, Switzerland (pp. 502–514).
Agrawal, R., Faloutsos, C., & Swami, A. (1993). Efficient similarity search in sequence databases. In Proceedings of the 4th international conference on foundations of data organization and algorithms. Chicago: Springer-Verlag.
Ajumobi, U., Pken, B., & Preda, A. (2004). Discovering all frequent trends in time series. In Proceedings of the winter international symposium on information and communication technologies (Vol. 6, pp. 1–6).
Alcock, R. J., & Manolopoulos, Y. (1999). Time-series similarity queries employing a feature-based approach. In 7th Hellenic conference on informatics (pp. 1–9).
Anthony, J. T. L., Wu, H. W., & Lee, T. Y. (2009). Mining closed patterns in multi-sequence time-series databases. Data & Knowledge Engineering, 68, 1071–1090.
Bauman, E. V., Dorofeyuk, A. A., & Kornilovm, G. V. (2004). Optimal piecewise-linear approximation algorithms for complex dependencies. Automation and Remote Control, 65, 1667–1674.
Chan, K., & Fu, W. (1999). Efficient time series matching by wavelets. In Proceedings of the 15th IEEE international conference on data engineering (pp. 117–126).
Chen, C. H., Hong, T. P., & Tseng, V. S. (2009). Mining fuzzy frequent trends from time series. Expert Systems with Applications: An International Journal, 36, 4147–4153.
Hung, N. Q., & Duong, T. A. (2008). An improvement of PAA for dimensionality reduction in large time series databases. In Proceedings of the 10th Pacific rim international conference on artificial intelligence (Vol. 12, pp. 698–707).
Hunter, J., & McIntosh, N. (1999). Knowledge-based event detection in complex time series data. In Proceedings of the joint European conference on artificial intelligence in medicine and medical decision making (pp. 271–280).
Keogh, E., & Lin, J. (2002). <http://www.cs.ucr.edu/~eamonn/>.
Keogh, E., & Pazzani, M. (1998). An enhanced representation of time series which allows fast and accurate classification, clustering, and relevance feedback. In Proceedings of the 4th international conference on knowledge discovery and data mining (Vol. 9, pp. 239–241).
Keogh, E., & Pazzani, M. (1999). Scaling up dynamic time warping to massive datasets. In Proceedings of the 3rd European conference on principles of data mining and knowledge discovery (Vol. 9, pp. 1–11).
Keogh, E., & Pazzani, M. (2001). Derivative dynamic time warping. In Proceedings of the 1st SIAM international conference on data mining (pp. 1–11).
Keogh, E., Chakrabarti, K., Pazzani, M. J., & Mehrotra, S. (2001). Dimensionality reduction for fast similarity search in large time series databases. Knowledge and Information Systems, 3, 263–286.
Keogh, E., Chu, S., Hart, D., & Pazzani, M. (2001). An online algorithm for segmenting time series. In IEEE international conference on data mining (pp. 289–296).
Keogh, E., Lin, J., & Fu, A. (2005). HOT SAX: Efficiently finding the most unusual time series subsequence. In Proceedings of the 5th IEEE international conference on data mining (Vol. 11, pp. 226–233).
Keogh, E., & Ratanamahatana, C. (2005). Exact indexing of dynamic time warping. Knowledge and Information Systems, 3, 358–386.
Korn, F., Jagadish, H. V., & Faloutsos, C. (1997). Efficiently supporting ad hoc queries in large datasets of time sequences. In Special Interest Group on Management of Data (SIGMOD '97) (pp. 289–300).
Li, D. Y., Han, J. W., Shi, X. M., & Chan, M. C. (1998). Knowledge representation and discovery based on linguistic atoms. Knowledge-Based Systems, 10, 431–440.
Lin, J., & Keogh, E. (2006). Group SAX: Extending the notion of contrast sets to time series and multimedia data. In Proceedings of the 10th European conference on principles and practice of knowledge discovery in databases (pp. 284–296).
Lin, J., Keogh, E., Lonardi, S., & Patel, P. (2002). Finding motifs in time series. In The 8th ACM international conference on knowledge discovery and data mining (pp. 53–68).
Lin, J., Keogh, E., & Lonardi, S. (2003). A symbolic representation of time series, with implications for streaming algorithms. In Proceedings of the 8th ACM SIGMOD workshop on research issues in data mining and knowledge discovery (Vol. 7, pp. 2–11).
Manjula, A. I., Morgan, M. H., & Layne, T. W. (2008). A performance comparison of piecewise linear estimation methods. In Proceedings of the 2008 spring simulation multi-conference (Vol. 4, pp. 273–278).
Park, S., Lee, D., & Chu, W. W. (1999). Fast retrieval of similar subsequences in long sequence databases. In Proceedings of the 3rd IEEE knowledge and data engineering exchange workshop (pp. 60–67).
Rabiner, L., & Juang, B. (1993). Fundamentals of speech recognition. Englewood Cliffs, NJ: Prentice Hall.
Sakeo, H., & Chiba, S. (1990). Dynamic programming algorithm optimization for spoken word recognition. In Readings in speech recognition (pp. 159–165).
Samia, M., & Conrad, F. (2007). A time-series representation for temporal web mining using a data band approach. In Proceedings of the 2007 conference on databases and information systems IV (Vol. 6, pp. 161–174).
Shatkay, H., & Zdonik, S. B. (1996). Approximate queries and representations for large data sequences. In Proceedings of the 12th international conference on data engineering (pp. 536–545).
Stock data web page (2005). <http://www.cs.ucr.edu/~wli/FilteringData/stock.zip>.
Yang, J., Wang, W., & Philip, S. Y. (2003). Mining asynchronous periodic patterns in time series data. IEEE Transactions on Knowledge and Data Engineering, 15, 613–628.
Ye, L. X., & Keogh, E. (2009). Time series shapelets: A new primitive for data mining. In International conference on knowledge discovery and data mining, Paris, France (pp. 947–956).
Zhang, H., & Wan, S. N. (2008). Linearly constrained global optimization via piecewise-linear approximation. Journal of Computational and Applied Mathematics, 214, 111–120.
