Elena Tsiporkova

Dynamic Time Warping Algorithm
for
Gene Expression Time Series
Gene expression time series are expected to vary not
only in terms of expression amplitudes, but also in terms
of time progression since biological processes may unfold
with different rates in response to different experimental
conditions or within different organisms and individuals.
What is Special about Time Series Data?
time
i
i+2
i
i
i
time time
Why Dynamic Time Warping?
Any distance (Euclidean, Manhattan,
…) which aligns the i-th point on one
time series with the i-th point on the
other will produce a poor similarity
score.
A non-linear (elastic) alignment
produces a more intuitive similarity
measure, allowing similar shapes to
match even if they are out of phase in
the time axis.
j
s

i
s

m
1
n 1
Time Series B
Time Series A
Warping Function
p
k
p
s
p
1
To find the best alignment
between A and B one needs to
find the path through the grid
P = p
1
, … , p
s
, … , p
k
p
s
= (i
s
, j
s
)

which minimizes the total
distance between them.
P

is called a warping function.
Time-Normalized Distance Measure
D(A , B ) =
(
(
(
(
¸
(

¸

·
¿
¿
=
=
k
s
s
k
s
s s
w
w p d
1
1
) (
d(p
s
): distance between i
s
and j
s

P
min arg
w
s
> 0: weighting coefficient.
Best alignment path between A
and B :
Time-normalized distance between
A and B :
P
0
= (D(A , B )).
j
s

i
s

m
1
n 1
Time Series B
Time Series A
p
k
p
s
p
1
Optimisations to the DTW Algorithm
The number of possible warping
paths through the grid is
exponentially explosive!
Restrictions on the warping function:
• monotonicity
• continuity
• boundary conditions
• warping window
• slope constraint.
reduction of the
search space
j
s

i
s

m
1
n 1
Time Series B
Time Series A
Restrictions on the Warping Function
Monotonicity: i
s-1
≤ i
s
and j
s-1
≤ j
s
.
The alignment path does not go back
in “time” index.
Continuity: i
s
– i
s-1
≤ 1 and j
s
– j
s-1
≤ 1.
The alignment path does not jump in
“time” index.
Guarantees that features are not
repeated in the alignment.
Guarantees that the alignment does
not omit important features.
i
j
i
j
Restrictions on the Warping Function
Boundary Conditions: i
1
= 1, i
k
= n and
j
1
= 1, j
k
= m.
The alignment path starts at the bottom
left and ends at the top right.
Warping Window: |i
s
– j
s
| ≤ r, where r > 0
is the window length.
A good alignment path is unlikely to
wander too far from the diagonal.
Guarantees that the alignment does not
consider partially one of the sequences.
Guarantees that the alignment does not
try to skip different features and gets
stuck at similar features.
n
m
i
j
(1,1) i
j
r
Restrictions on the Warping Function
Slope Constraint: ( j
s
p
– j
s
0
) / ( i
s
p
– i
s
0
) ≤ p and ( i
s
q
– i
s
0
) / ( j
s
q
– j
s
0
) ≤ q , where q ≥ 0
is the number of steps in the x-direction and p ≥ 0 is the number of steps in the y-
direction. After q steps in x one must step in y and vice versa: S = p / q e[0 , ·].
Prevents that very short parts of the sequences
are matched to very long ones.
The alignment path should not be too steep or
too shallow.
i
j
≤ p
≤ q
The Choice of the Weighting Coefficient
D(A , B ) =
.
) (
min
1
1
(
(
(
(
¸
(

¸

·
¿
¿
=
=
k
s
s
k
s
s s
P
w
w p d
Time-normalized distance between A and B :
complicates
optimisation
(
¸
(

¸

·
¿
=
k
s
s s
P
w p d
C
1
) ( min
1
D(A , B ) =
¿
=
=
k
s
s
w C
1
Seeking a weighting coefficient function which
guarantees that:
can be solved by use of dynamic programming.
is independent of the warping function. Thus
Weighting Coefficient Definitions
• Symmetric form
w
s
= (i
s
– i
s-1
) + (j
s
– j
s-1
),


then C = n + m.
• Asymmetric form
w
s
= (i
s
– i
s-1
),


then C = n.
Or equivalently,
w
s
= (j
s
– j
s-1
),


then C = m.
Initial condition:

g(1,1) = 2d(1,1).
DP-equation:

g(i, j – 1) + d(i, j)
g(i, j) = min g(i – 1, j – 1) + 2d(i, j) .
g(i – 1, j) + d(i, j)
Warping window:

j – r ≤ i ≤ j + r.
Time-normalized distance:
D(A , B ) = g(n, m) / C
C = n + m.
Symmetric DTW Algorithm
(warping window, no slope constraint)
j
m
1
n 1 i


g(1,1)
g(n, m)
i = j + r
i = j - r
1
1
2
Time Series B
Time Series A
Initial condition:

g(1,1) = d(1,1).
DP-equation:

g(i, j – 1)
g(i, j) = min g(i – 1, j – 1) + d(i, j) .
g(i – 1, j) + d(i, j)
Warping window:

j – r ≤ i ≤ j + r.
Time-normalized distance:
D(A , B ) = g(n, m) / C
C = n.
Asymmetric DTW Algorithm
(warping window, no slope constraint)
j
m
1
n 1 i


g(1,1)
g(n, m)
i = j + r
i = j - r
1
0
1
Time Series B
Time Series A
Initial condition:

g(1,1) = d(1,1).
DP-equation:

g(i, j – 1) + d(i, j)
g(i, j) = min g(i – 1, j – 1) + d(i, j) .
g(i – 1, j) + d(i, j)
Warping window:

j – r ≤ i ≤ j + r.
Time-normalized distance:
D(A , B ) = g(n, m) / C
C = n + m.
Quazi-symmetric DTW Algorithm
(warping window, no slope constraint)
j
m
1
n 1 i


g(1,1)
g(n, m)
i = j + r
i = j - r
1
1
1
Time Series B
Time Series A
j
m
1
n 1 i
Time Series B
Time Series A
i = j + r
i = j - r














































DTW Algorithm at Work
Start with the calculation of g(1,1) = d(1,1).
Move to the second row g(i, 2) = min(g(i, 1),
g(i–1, 1), g(i – 1, 2)) + d(i, 2). Book keep for
each cell the index of this neighboring cell,
which contributes the minimum score (red
arrows).
Calculate the first row g(i, 1) = g(i–1, 1) +
d(i, 1).
Calculate the first column g(1, j) = g(1, j) +
d(1, j).
Trace back the best path through the grid
starting from g(n, m) and moving towards
g(1,1) by following the red arrows.
Carry on from left to right and from bottom
to top with the rest of the grid g(i, j) =
min(g(i, j–1), g(i–1, j–1), g(i – 1, j)) + d(i, j).
DTW Algorithm: Example
-0.87 -0.84 -0.85 -0.82 -0.23 1.95 1.36 0.60 0.0 -0.29
-0.88 -0.91 -0.84 -0.82 -0.24 1.92 1.41 0.51 0.03 -0.18
-
0
.
6
0




-
0
.
6
5




-
0
.
7
1




-
0
.
5
8




-
0
.
1
7




0
.
7
7




1
.
9
4

-
0
.
4
6




-
0
.
6
2




-
0
.
6
8




-
0
.
6
3




-
0
.
3
2




0
.
7
4




1
.
9
7

0.02 0.05 0.08 0.11 0.13 0.34 0.49 0.58 0.63 0.66
0.04 0.04 0.06 0.08 0.11 0.32 0.49 0.59 0.64 0.66
0.06 0.06 0.06 0.07 0.11 0.32 0.50 0.60 0.65 0.68
0.08 0.08 0.08 0.08 0.10 0.31 0.47 0.57 0.62 0.65
0.13 0.13 0.13 0.12 0.08 0.26 0.40 0.47 0.49 0.49
0.27 0.27 0.26 0.25 0.16 0.18 0.23 0.25 0.31 0.68
0.51 0.51 0.49 0.49 0.35 0.17 0.21 0.33 0.41 0.49
Time Series B
Time Series A
Euclidean distance between vectors

time . but also in terms of time progression since biological processes may unfold with different rates in response to different experimental conditions or within different organisms and individuals.What is Special about Time Series Data? Gene expression time series are expected to vary not only in terms of expression amplitudes.

. A non-linear (elastic) alignment produces a more intuitive similarity measure.Why Dynamic Time Warping? i i i time i i+2 time Any distance (Euclidean. …) which aligns the i-th point on one time series with the i-th point on the other will produce a poor similarity score. Manhattan. allowing similar shapes to match even if they are out of phase in the time axis.

Time Series B 1 p1 . … . pk ps = (is . P is called a warping function. ps . js ) js ps which minimizes the total distance between them.Warping Function Time Series A 1 m is n pk To find the best alignment between A and B one needs to find the path through the grid P = p1. … .

B ) =  k  d ( ps )  ws    s 1 k    ws    s 1   d(ps): distance between is and js ws > 0: weighting coefficient. B )). js ps Best alignment path between A and B : Time Series B 1 p1 P0 = arg min (D(A . P .Time-Normalized Distance Measure Time Series A 1 m is n pk Time-normalized distance between A and B : D(A .

Optimisations to the DTW Algorithm Time Series A 1 m is n The number of possible warping paths through the grid is exponentially explosive! reduction of the search space Restrictions on the warping function: • monotonicity js • continuity • boundary conditions • warping window Time Series B 1 • slope constraint. .

j i Guarantees that features are not repeated in the alignment. The alignment path does not jump in “time” index. The alignment path does not go back in “time” index. . Continuity: is – is-1 ≤ 1 and js – js-1 ≤ 1. j i Guarantees that the alignment does not omit important features.Restrictions on the Warping Function Monotonicity: is-1 ≤ is and js-1 ≤ js.

where r > 0 is the window length. Warping Window: |is – js| ≤ r. m j n (1.Restrictions on the Warping Function Boundary Conditions: i1 = 1.1) i Guarantees that the alignment does not consider partially one of the sequences. A good alignment path is unlikely to wander too far from the diagonal. j i Guarantees that the alignment does not try to skip different features and gets stuck at similar features. jk = m. ik = n and j1 = 1. The alignment path starts at the bottom left and ends at the top right. r .

j i ≤p ≤q Prevents that very short parts of the sequences are matched to very long ones.Restrictions on the Warping Function Slope Constraint: ( js – js ) / ( is – is ) ≤ p and ( is – is ) / ( js – js ) ≤ q . ]. where q ≥ 0 p 0 p 0 q 0 q 0 is the number of steps in the x-direction and p ≥ 0 is the number of steps in the ydirection. . The alignment path should not be too steep or too shallow. After q steps in x one must step in y and vice versa: S = p / q [0 .

Or equivalently. then C = n + m. Thus 1 k  D(A . is independent of the warping function. B ) = min  d ( ps )  ws  C P  s 1  can be solved by use of dynamic programming.The Choice of the Weighting Coefficient Time-normalized distance between A and B :    d ( ps )  ws   . then C = m. then C = n. D(A . . B ) = min  s 1 k P   ws  complicates   optimisation s 1   k Weighting Coefficient Definitions • Symmetric form ws = (is – is-1) + (js – js-1). • Asymmetric form Seeking a weighting coefficient function which k guarantees that: C   ws s 1 ws = (is – is-1). ws = (js – js-1).

i=j-r 1 1 2 g(i – 1. j) Warping window: j – r ≤ i ≤ j + r. B ) = g(n. j – 1) + 2d(i. Time-normalized distance: j D(A .Symmetric DTW Algorithm (warping window. j) g(i. m) g(i. no slope constraint) Time Series A 1 m i n Initial condition: g(1. j) . j – 1) + d(i.1) = 2d(1. j) + d(i. m) / C Time Series B 1 i=j+r g(1.1). j) = min g(i – 1.1) C = n + m. . DP-equation: g(n.

. no slope constraint) Time Series A 1 m i n Initial condition: g(1.Asymmetric DTW Algorithm (warping window. m) g(i. Time-normalized distance: j D(A .1) = d(1. j) = min g(i – 1. j) Warping window: j – r ≤ i ≤ j + r. i=j-r 1 0 1 g(i – 1. DP-equation: g(n. B ) = g(n. j) + d(i. m) / C Time Series B 1 i=j+r g(1.1).1) C = n. j – 1) + d(i. j) . j – 1) g(i.

j) + d(i. i=j-r 1 1 1 g(i – 1. j) = min g(i – 1.Quazi-symmetric DTW Algorithm (warping window. DP-equation: g(n. Time-normalized distance: j D(A .1) C = n + m. j – 1) + d(i. m) g(i. B ) = g(n.1). j) g(i. no slope constraint) Time Series A 1 m i n Initial condition: g(1. j – 1) + d(i.1) = d(1. j) Warping window: j – r ≤ i ≤ j + r. j) . m) / C Time Series B 1 i=j+r g(1. .

g(i – 1. 2)) + d(i. j). Carry on from left to right and from bottom to top with the rest of the grid g(i. m) and moving towards g(1.DTW Algorithm at Work Start with the calculation of g(1. j). j) = g(1. j) = min(g(i. 1) = g(i–1. 1).1). Time Series A 1 m i n Calculate the first row g(i. 2) = min(g(i. Book keep for each cell the index of this neighboring cell. j) + d(1. Trace back the best path through the grid starting from g(n. 1). g(i–1. g(i–1.1) = d(1. which contributes the minimum score (red arrows). 1) + d(i. i=j-r j Time Series B 1 i=j+r . j)) + d(i. 1). Move to the second row g(i. g(i – 1. j–1). j–1). Calculate the first column g(1. 2).1) by following the red arrows.

34 1.25 0.60 0.71 -0.08 0.10 0.17 -0.11 0.65 0.41 0.06 0.36 1.84 0.33 0.11 -0.12 0.68 0.51 0.31 0.13 1.68 -0.82 0.92 0.82 -0.11 0.32 0.51 0.63 -0.47 0.60 -0.88 0.08 0.74 Time Series B Euclidean distance between vectors .03 0.16 0.57 0.50 0.49 0.06 0.04 0.65 -0.26 0.21 0.02 -0.25 0.32 0.58 -0.59 0.41 0.49 0.13 0.17 0.04 0.35 0.85 -0.49 0.97 -0.47 0.24 0.13 0.66 -0.08 -0.77 0.27 0.18 0.91 0.08 0.27 0.08 0.66 0.26 0.62 -0.08 0.94 1.07 0.49 0.49 0.23 0.40 0.87 -0.32 0.06 0.64 0.62 0.60 0.68 0.06 0.31 0.84 -0.29 -0.63 -0.65 0.23 -0.49 0.95 1.08 0.46 -0.51 0.05 -0.DTW Algorithm: Example Time Series A 1.18 0.49 0.58 0.13 0.0 0.