applied to speech analysis and speech recognition. We will give details in the
next section and sketch here how time warping works. Suppose that two
sequences $\{f(i),\ i = 1, \ldots, M\}$ and $\{g(j),\ j = 1, \ldots, N\}$ characterize two
signals $f$ and $g$, respectively. We want to find the best match between $f$ and $g$
by some alignment $w$, based on minimizing a cost function. The classical cost
function is given by

$$\inf_{w} \sum_{(i,j) \in w} \bigl(f(i) - g(j)\bigr)^2 .$$
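The classical criterion can be minimized by dynamic programming. The following Python sketch is illustrative only and is not the authors' implementation. It assumes a monotone path with unit steps that starts at the first pair and ends at the last pair (0-based indices, so from (0, 0) to (M-1, N-1)); the function name `dtw_classical` is ours.

```python
import numpy as np

def dtw_classical(f, g):
    """Minimize sum over (i, j) in w of (f[i] - g[j])**2 over monotone paths w.

    Assumed side conditions: the path starts at (0, 0), ends at
    (M-1, N-1), and moves by one step in i, in j, or in both.
    """
    f = np.asarray(f, dtype=float)
    g = np.asarray(g, dtype=float)
    M, N = len(f), len(g)
    d = (f[:, None] - g[None, :]) ** 2      # local quadratic costs
    D = np.full((M, N), np.inf)             # accumulated costs
    D[0, 0] = d[0, 0]
    for i in range(M):
        for j in range(N):
            if i == 0 and j == 0:
                continue
            best = np.inf
            if i > 0:
                best = min(best, D[i - 1, j])
            if j > 0:
                best = min(best, D[i, j - 1])
            if i > 0 and j > 0:
                best = min(best, D[i - 1, j - 1])
            D[i, j] = d[i, j] + best
    # Backtrack from the end point to recover one optimal path.
    i, j = M - 1, N - 1
    path = [(i, j)]
    while (i, j) != (0, 0):
        candidates = []
        if i > 0 and j > 0:
            candidates.append((D[i - 1, j - 1], i - 1, j - 1))
        if i > 0:
            candidates.append((D[i - 1, j], i - 1, j))
        if j > 0:
            candidates.append((D[i, j - 1], i, j - 1))
        _, i, j = min(candidates)
        path.append((i, j))
    path.reverse()
    return float(D[-1, -1]), path
```

For example, `dtw_classical([0, 1, 2, 1, 0], [0, 0, 1, 2, 1, 0])` aligns the two sequences at zero cost, because the second sequence only repeats its first value.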
(1)  $$y_{ij} = f_i(t_{ij}) + \varepsilon_{ij}, \qquad j = 1, \ldots, n_i \ \text{(timing for subject } i\text{)};\quad i = 1, \ldots, m \ \text{(subjects)}.$$

(2)  $$\hat f_0(\cdot) = \frac{1}{m} \sum_{i=1}^{m} \hat f_i\bigl(\hat u_i(\cdot)\bigr).$$
FIG. 1. Example of aligning two curves: top left gives two smoothed velocity curves; top right is
the alignment by dynamic time warping; bottom left is the alignment by steps (2)-(3); bottom
right gives differences of shift functions and the identity function. The shift functions are produced
by dynamic time warping (solid line) and by steps (2)-(3) (dotted).
reference for basic ideas, and we present this approach in Section 2.1. A new
cost function is introduced there. In Section 2.2, we deal with the problem of
aligning m regression functions.
(3)  $$w = \bigl((i(1), j(1)), \ldots, (i(K), j(K))\bigr)$$

(4)  $$C_0(F, G, w) \equiv \sum_{k=1}^{K} d\bigl(f(i(k)),\, g(j(k))\bigr)\, r(k)$$

is small enough. The warping path $w$ has to satisfy several side conditions.

(5)  $$D_0(F, G) = \inf_{w} C_0(F, G, w).$$
Conditions (i) and (ii) are natural. Condition (iii) is imposed since the start
and end of words are detected before time normalization. Condition (iv) is
imposed on the grounds that time-axis fluctuations should not lead to
excessive differences in timing. Finally, (v) is used to prevent unrealistic
warping if $|M - N|$ is relatively large. Figure 2 shows what time warping
does.
In principle a cost function could be any functional of the two input
sequences and a warping path, depending on the purpose of the application.
The cost functions used in speech recognition and pattern classification are
varieties of the classical quadratic cost function (see Table 1 for some
examples).
Some explanations might be helpful. In the second line of the table, $P(w)$
is a penalty function which penalizes jumps and flat spots on the warping
path $w$, and $b$ is a coefficient which controls how severe the penalty is
[$b = 0.0075$ in Roberts, Lawrence, Eisen and Hoirch (1987), chosen by trial
1256 K. WANG AND T. GASSER
FIG. 2. Tutorial illustration of dynamic time warping. Top left: two given curves; top right:
warping function; bottom figures: the warping of one curve to the other.
$$-\sum_{(i,j) \in w} f(i)\, g(j) = \frac{1}{2} \sum_{(i,j) \in w} \bigl(f(i) - g(j)\bigr)^2 - \frac{1}{2} \sum_{(i,j) \in w} \bigl(f(i)^2 + g(j)^2\bigr).$$

The first term is just the classical cost function. Since the second term tends
to make the warping path longer, a penalty is needed. A version of $P(w)$
given in Roberts, Lawrence, Eisen and Hoirch (1987) is the following. Write
$w = \{(i(k), j(k)) : k = 1, \ldots, K\}$ where $K$ is the length of $w$. Then

$$P(w) = \sum_{k=2}^{K} p(w, k),$$
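TABLE 1
Cost functions

Classical:
  $$\sum_{(i,j) \in w} \bigl(f(i) - g(j)\bigr)^2$$

[Second row, the penalized cost of Roberts, Lawrence, Eisen and Hoirch (1987) involving $b\,P(w)$, is missing from the source.]

Proposed here:
  $$\int_0^1 \left\{ a^2 \left( \frac{f(t)}{\|f\|} - \frac{g(u(t))}{\|g\|} \right)^2 + (1 - a)^2 \left( \frac{f'(t)}{\|f'\|} - \frac{g'(u(t))}{\|g'\|} \right)^2 + \varphi(u'(t)) \right\} dt$$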
ALIGNMENT OF CURVES 1257
where

$$p(w, k) = \begin{cases}
(k - r_k - 1)^2, & \text{if } i(k) = i(k-1),\ r_k = \min\{r \le k-1 : i(r) = i(k)\}, \\
(k - r_k - 1)^2, & \text{if } j(k) = j(k-1),\ r_k = \min\{r \le k-1 : j(r) = j(k)\}, \\
0, & \text{otherwise.}
\end{cases}$$
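To make the penalty concrete, here is a small Python sketch. It reads the definition as $p(w,k) = (k - r_k - 1)^2$ whenever the path repeats its $i$-index or its $j$-index, where $r_k$ is the first position of the current run; this reading is reconstructed from a damaged display and should be checked against Roberts et al. (1987). The function name `path_penalty` and the 0-based path representation are ours.

```python
def path_penalty(path):
    """Sum p(w, k) over a warping path given as a list of (i, j) pairs.

    Assumed reading: a step that keeps i (or j) fixed is charged
    (k - r_k - 1)**2, where r_k is the first index of the current run.
    Differences k - r_k do not depend on 0- or 1-based indexing.
    """
    total = 0
    for k in range(1, len(path)):
        i_k, j_k = path[k]
        i_prev, j_prev = path[k - 1]
        if i_k == i_prev:
            # first earlier position with the same i-index
            r_k = min(r for r in range(k) if path[r][0] == i_k)
            total += (k - r_k - 1) ** 2
        elif j_k == j_prev:
            # first earlier position with the same j-index
            r_k = min(r for r in range(k) if path[r][1] == j_k)
            total += (k - r_k - 1) ** 2
    return total
```

Under this reading a purely diagonal path has zero penalty, and the charge grows quadratically with the length of each flat or vertical run.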
With the penalty term added, the cost function is no longer a convex function.
The third line of Table 1 is the cost function proposed in this paper. Details
are given below. Our cost function is inspired by Sobolev norms and by the
least squares principle. The normalization with the sup-norm $\|\cdot\|$ in (6) is
intended to reduce the differences in amplitudes of the curves when estimating
the shift functions. This should prevent us from explaining amplitude
variability between curves in terms of dynamic variability. There are other
cost functions used in speech analysis. For example, a cost function defined in
terms of the linear predictive coding feature sets of signals is used in Höhne,
Coker, Levinson and Rabiner (1983).
For our purpose, where aligning maxima, minima, and inflection points of
curves is important, we incorporate derivatives of functions into the cost
function. Apart from heuristics, theoretical and simulation analysis lends
support to this idea. Incorporating higher order derivatives (the second
derivative in particular) is possible in principle, but problems of estimating
higher order derivatives from noisy data might arise.
Now, we give details of the new cost function. Define a functional $F$ of the
functions $f, f', g, g', u$ and a real variable $a$ by

(6)  $$F(f, f', g, g', u, a)(t) \equiv a^2 \left( \frac{f(t)}{\|f\|} - \frac{g(u(t))}{\|g\|} \right)^2 + (1 - a)^2 \left( \frac{f'(t)}{\|f'\|} - \frac{g'(u(t))}{\|g'\|} \right)^2 .$$

(7)  $$C(f, f', g, g', u, a) \equiv \int_0^1 \bigl\{ F(f, f', g, g', u, a)(t) + \varphi(u'(t)) \bigr\}\, dt.$$
The function $\varphi$ in the last term of (7) serves as a penalty function which plays a role
similar to the side conditions (i)-(v). It is specified as follows. Let $M > d > 0$ be constants
and define $\varphi$ to be a convex function satisfying the following conditions:
$\varphi(x) = 0$ for $x \in [d + r, M - r]$ with a small positive number $r$; $\varphi(d+) = \varphi(M-) = \infty$;
$\varphi(x) = \infty$ for $x \in (d, M)^c$; and $\varphi \in C^4(d, M)$. An example of
such a $\varphi$ is given by

$$\varphi(t) = c\, (d + r - t)^5\, I_{(d,\, d+r]}(t) / (t - d) + (t - M + r)^5\, I_{[M-r,\, M)}(t) / (M - t), \qquad t \in (d, M).$$
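The penalty and the cost functional (7) can be evaluated numerically for a candidate shift function. The sketch below is only an illustration under our own assumptions: the constants `d`, `M`, `r`, `c` are arbitrary example values, sup-norms are approximated by maxima over a grid on [0, 1], $u'$ is approximated by finite differences, and the integral by the trapezoidal rule. The names `phi` and `warping_cost` are hypothetical.

```python
import numpy as np

def phi(x, d=0.1, M=10.0, r=0.05, c=1.0):
    """Example penalty: 0 on [d + r, M - r], infinite outside (d, M)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    out = np.full(x.shape, np.inf)
    inside = (x > d) & (x < M)
    xi = x[inside]
    # Both denominators are strictly positive for points inside (d, M).
    low = np.where(xi <= d + r, c * (d + r - xi) ** 5 / (xi - d), 0.0)
    high = np.where(xi >= M - r, (xi - M + r) ** 5 / (M - xi), 0.0)
    out[inside] = low + high
    return out

def warping_cost(f, df, g, dg, u, a, n_grid=501):
    """Discretized version of (7) for callables f, f', g, g' and u."""
    t = np.linspace(0.0, 1.0, n_grid)
    sup = lambda h: np.max(np.abs(h(t)))     # grid approximation of the sup-norm
    ut = u(t)
    du = np.gradient(ut, t)                  # finite-difference u'(t)
    F = (a ** 2 * (f(t) / sup(f) - g(ut) / sup(g)) ** 2
         + (1.0 - a) ** 2 * (df(t) / sup(df) - dg(ut) / sup(dg)) ** 2)
    integrand = F + phi(du)
    # Trapezoidal rule written out explicitly.
    return float(np.sum(0.5 * (integrand[1:] + integrand[:-1]) * np.diff(t)))
```

With `g` equal to `f` and the identity as shift function, the cost is zero. A candidate whose derivative leaves the interval $(d, M)$ anywhere on the grid receives an infinite cost, which is how the penalty enforces the side conditions.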
The principle is simple. Let $f_e$ be the chosen reference curve. Warp each
curve $f_i$, $i = 1, \ldots, m$, to $f_e$ and denote the warping function by $h_i(t)$,
$t \in [0, 1]$. Then

$$h(t) \equiv \frac{1}{m} \sum_{i=1}^{m} h_i(t)$$

is the average timing with respect to $f_e$. Since each $h_i$ is strictly increasing,
the function $h$ is strictly increasing and it has an inverse $h^{-1}$. Now it is clear
that

$$u_i(t) \equiv h_i\bigl(h^{-1}(t)\bigr)$$

is the correct shift function to transform $f_i$ to the average time scale. A
structural average of $\{f_i : i = 1, \ldots, m\}$ is then computed as

$$f_0(t) = \frac{1}{m} \sum_{i=1}^{m} f_i\bigl(u_i(t)\bigr).$$
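The construction above can be sketched on a grid. This Python example assumes the warping functions $h_i$ have already been estimated and are supplied as strictly increasing arrays on a common grid `t` in [0, 1]; inversion and composition use linear interpolation, which is our simplification. The function name `structural_average` is ours.

```python
import numpy as np

def structural_average(curves, warps, t):
    """Structural average f_0 from curves f_i and warping functions h_i.

    curves : list of callables f_i, each accepting a NumPy array
    warps  : array-like of shape (m, len(t)); row i holds h_i(t),
             assumed strictly increasing with h_i(0) = 0, h_i(1) = 1
    t      : increasing grid on [0, 1]
    """
    H = np.asarray(warps, dtype=float)
    h_bar = H.mean(axis=0)                    # average timing h(t)
    h_inv = np.interp(t, h_bar, t)            # h^{-1}(t): swap the roles of x and y
    # u_i(t) = h_i(h^{-1}(t)), evaluated by interpolating each h_i
    U = np.array([np.interp(h_inv, t, h_i) for h_i in H])
    f0 = np.mean([f(u) for f, u in zip(curves, U)], axis=0)
    return f0, U
```

By construction the shift functions average to the identity, so `U.mean(axis=0)` reproduces `t` up to interpolation error; this is a convenient sanity check in practice.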
The reference curve should be close to the typical pattern of the sample
curves and should have more or less the same features as most sample
curves. A consideration in choosing a reference curve is the trade-off between
accuracy and computational effort. Several possibilities are given here.
1. In principle one could choose a curve randomly from the sample as
reference curve, following the arguments given above. This is computationally
attractive even when $m$ is large, but the statistical quality may suffer
if an atypical curve is selected. This would inevitably make it more
difficult to estimate the warping function well.
2. Take each $f_i$, $i = 1, \ldots, m$, as reference curve. Warp every other curve to $f_i$
and compute the total cost (i.e., the sum of the cost for warping $f_j$ to $f_i$,
$j \ne i$). Now choose $f_e$ to be the curve corresponding to the minimum total
cost. The main problem with this procedure is that it can require prohibitive
computing time if $m$ is large. Note that one would need to solve
the variational problem (8) $m(m - 1)/2$ times.
3. An iterative method can be used. First take $f_e(t) = (1/m) \sum_{i=1}^{m} f_i(t)$, the
cross-sectional average. Then compute a structural average based on $f_e$. In
the following steps, take the structural average computed in the previous
step as the reference curve of the next step and iterate. Computation is not
a problem since a few iterations are enough. This proposal shows good
statistical properties if the relative shifts among curves are small. If the
shifts are large, then the cross-sectional average might be too atypical to
start with, since structure gets lost.
4. For large $m$, one could select a random sample of size $k$ from $\{f_i\}$. Assume
$k = 2^j$. Partition this selected sample into $2^{j-1}$ pairs and compute
a structural average for each pair by a single warping. Now we have
$2^{j-1}$ structural averages. Partition this group into $2^{j-2}$ pairs and compute
a structural average for each pair. Repeat this procedure till only one
structural average is left. Take this one as the reference curve $f_e$.
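Proposal 4 is a pairwise tournament, which can be sketched in a few lines of Python. The function `pair_average` is a hypothetical callable supplied by the user that returns the structural average of two curves from a single warping; it is not defined here, and the names `tournament_reference` and `seed` are ours.

```python
import random

def tournament_reference(curves, pair_average, k, seed=0):
    """Reduce a random subsample of size k = 2**j to one reference curve.

    pair_average(x, y) is assumed to return the structural average of
    two curves; any representation of a curve may be used as long as
    pair_average accepts and returns it.
    """
    if k < 1 or k & (k - 1):
        raise ValueError("k must be a power of two")
    rng = random.Random(seed)
    group = rng.sample(list(curves), k)       # random sample without replacement
    while len(group) > 1:
        # average neighbouring pairs; this halves the group each round
        group = [pair_average(group[i], group[i + 1])
                 for i in range(0, len(group), 2)]
    return group[0]
```

The loop needs $k - 1$ pairwise warpings in total, compared with the $m(m-1)/2$ warpings of proposal 2.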
Now the question arises as to which of proposals (1)-(4) should be used in
practice. Our suggestion is the following. If $m$ is small, (2) should be the
choice. If $m$ is large but one has confidence that the relative shifts among
sample curves are small and that no outlier should be in the sample, (3)
would be a good choice. Finally, if $m$ is large and no prior information about
the quality of the sample is available, (4) is recommended. Another possibility
would be to combine (2) and (4): select a random sample of size $k$ from $\{f_i\}$
and perform (2) on this subsample to compute a reference curve. As mentioned
before, analysis of the methods proposed in this subsection is not
considered in this paper.
(10)  $$f_i(t) = a_i\, s\!\left( \frac{t - b_i}{c_i} \right) + d_i .$$

$$(u(t), a) = \bigl( c_j (t - b_i)/c_i + b_j,\ 0 \bigr).$$
Thus, the newly introduced cost function is able to identify linear shifts
within SIM correctly. To appreciate this, note that using other cost functions
in Table 1 will not lead to correct shift functions within this simple model.
For comparing two functions $f_1$ and $f_2$, Härdle and Marron (1990) considered
a somewhat more general model with amplitude-phase modulation:

(11)  $$f_2(t) = S_{u_0}^{-1} f_1\bigl(T_{u_0}^{-1} t\bigr),$$

where $S_u$ and $T_u$ are invertible parametric transformations. For linear
transformations $S_u$ and $T_u$ this reduces to (10). They proposed to estimate $u_0$ by
minimizing the loss function

$$L(u) = \int \bigl[ f_1(t) - S_u f_2(T_u t) \bigr]^2\, r(t)\, dt,$$
Despite the great generality allowed for shifts, this model is still identifiable.
The optimal shift function from $f_j$ to $f_i$ can be obtained by dynamic time
warping with cost function (7) as

$$(u(t), a) = \bigl( u_j^{-1}(u_i(t)),\ 1 \bigr).$$

Thus, shift functions can only be extracted in relative terms with respect to
some function chosen as reference ($f_i$ in this example). Again, other cost
functions in Table 1 are not successful for extracting the correct shift
function.
An interesting generalization emerges when replacing the individual factor
$a_i$ by some parametric function $a_i(t, b_i)$:

(13)  $$f_i(t) = a_i(t, b_i)\, s\bigl(u_i(t)\bigr) + d_i .$$

Here $a_i$ is an a priori known function with individual parameter $b_i \in \mathbb{R}^d$,
and $d_i$ is an unknown constant. Possibly, this quite general semiparametric
model is also identifiable when requiring proper conditions for $b_i$, $g_i$, and $d_i$.
In any case, recovering the shift function between two such functions would
require a more complicated cost function than (7). It is plausible that a
back-fitting procedure, such as the one used in Kneip and Gasser (1988),
could solve this problem. However, the general amplitude-phase modulated
model

$$f_i(t) = a_i(t)\, s\bigl(u_i(t)\bigr)$$

is clearly not identifiable. This is true even when requiring obvious conditions
such as $E a_i(t) \equiv 1$ and $E g_i(t) \equiv t$. Nonetheless, simulations show that cost
function (7) yields reasonable results even for this model.
With the new cost function (7), shift functions can be fully recovered by
dynamic time warping in models (SIM) and (NLSM) if data are noise free.
This seems to us an important achievement, since dynamic time warping is a
relatively easy, automatic method. It can be attributed to the inclusion of
derivatives in the cost function. For noisy data, some nonparametric function
fitting method like kernel estimators or local polynomial fitting allows the
estimation of the function itself and of its derivatives as a preliminary step.
The problem of estimating derivatives from noisy data might have prevented
their earlier use in a cost function.
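and the support of $K_\nu$ is $[-1, 1]$. Note that optimal kernels are explicitly
known as polynomials of order $(\nu + 2)$. Then we define

$$\hat f(t) = \frac{1}{b_0} \sum_{j=1}^{n} y_j \int_{s_{j-1}}^{s_j} K_0\!\left( \frac{v - t}{b_0} \right) dv,$$

$$\hat f'(t) = \frac{1}{b_1^2} \sum_{j=1}^{n} y_j \int_{s_{j-1}}^{s_j} K_1\!\left( \frac{v - t}{b_1} \right) dv.$$

$$\inf \bigl\{ C(\hat f, \hat f', \hat g, \hat g', u, a) : u \in C^1,\ a \in \mathbb{R} \bigr\}.$$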
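The two kernel estimators can be computed in closed form when the kernels are polynomials. The Python sketch below is an illustration under assumptions that are ours, not the paper's: it uses $K_0(x) = \tfrac{3}{4}(1 - x^2)$ and $K_1(x) = \tfrac{15}{4}(x - x^3)$ on $[-1, 1]$, the latter normalized so that $\int x K_1(x)\,dx = 1$ under the $(v - t)$ argument convention written above, and it takes the $s_j$ as midpoints between design points with $s_0 = 0$ and $s_n = 1$. No boundary correction is attempted.

```python
import numpy as np

def _W0(x):
    """Antiderivative of K0(x) = 0.75 * (1 - x**2), clipped to [-1, 1]."""
    x = np.clip(x, -1.0, 1.0)
    return 0.75 * (x - x ** 3 / 3.0)

def _W1(x):
    """Antiderivative of K1(x) = 3.75 * (x - x**3), clipped to [-1, 1]."""
    x = np.clip(x, -1.0, 1.0)
    return 3.75 * (x ** 2 / 2.0 - x ** 4 / 4.0)

def gasser_mueller(t_eval, t_obs, y, b0, b1):
    """Return (f_hat, df_hat) at t_eval from data (t_obs, y) on [0, 1]."""
    t_obs = np.asarray(t_obs, dtype=float)
    y = np.asarray(y, dtype=float)
    t_eval = np.atleast_1d(np.asarray(t_eval, dtype=float))
    # interval end points s_0 < s_1 < ... < s_n (assumed midpoint rule)
    s = np.concatenate(([0.0], 0.5 * (t_obs[1:] + t_obs[:-1]), [1.0]))
    z0 = (s[None, :] - t_eval[:, None]) / b0
    z1 = (s[None, :] - t_eval[:, None]) / b1
    # integral of K((v - t)/b) over [s_{j-1}, s_j] equals b * (W(z_j) - W(z_{j-1}))
    w0 = np.diff(_W0(z0), axis=1)             # factor b0 cancels 1/b0
    w1 = np.diff(_W1(z1), axis=1) / b1        # b1 / b1**2 = 1 / b1
    return w0 @ y, w1 @ y
```

For noise-free linear data and an interior point, the sketch returns the function value and the slope up to a small discretization error, which is a quick way to check the sign and scaling of the derivative kernel.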
Obviously some theoretical and practical questions need to be addressed.
Does the variational problem (8) have a solution? Is a solution unique? How
does dynamic time warping perform with noisy data? We address these
questions in this section. As stated before, the analysis focuses on the
alignment of two functions. First we prove the existence of a solution to (8).
$$= \begin{pmatrix}
F_{uu}(f, f', g, g', u, a) & F_{ua}(f, f', g, g', u, a) \\
F_{ua}(f, f', g, g', u, a) & F_{aa}(f, f', g, g', u, a)
\end{pmatrix}.$$
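(a)
$$E(\hat u, \hat a)(t) - (u, a)(t) = \begin{cases}
O(b_1^2) + o\bigl(n^{-1/2} b_1^{-3/2} \log^2(n)\bigr), & a \ne 1, \\
O(b_0^2) + o\bigl(n^{-1/2} b_0^{-1/2} \log^2(n)\bigr), & a = 1.
\end{cases}$$

(b) If $a \ne 1$ then

$$\sqrt{n b_1^5}\, \bigl[ (\hat u, \hat a)(t) - E(\hat u, \hat a)(t) \bigr] \to N\bigl(0,\ A^{-1}(t)\, V(a)\, A^{-1}(t)\bigr).$$

[The display defining $V(a)$ is garbled in the source; the surviving fragments involve $f'(t)$, $g'(u(t))$ and an integral.]

If $V(a) = 0$ then $\sqrt{n b_1^5}\, [(\hat u, \hat a)(t) - E(\hat u, \hat a)(t)] \to 0$ in probability.

(c) If $a = 1$ then

$$\sqrt{n b_0^3}\, \bigl[ (\hat u, \hat a)(t) - E(\hat u, \hat a)(t) \bigr] \to N\bigl(0,\ A^{-1}(t)\, V_1\, A^{-1}(t)\bigr).$$

[The display defining $V_1$ is garbled in the source; the surviving fragments involve $f(t)$, $g(u(t))$ and an integral.]

If $V_1 = 0$ then $\sqrt{n b_0^3}\, [(\hat u, \hat a)(t) - E(\hat u, \hat a)(t)] \to 0$ in probability.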
The proof is given in the Appendix. It shows that dynamic time warping
performs reasonably well with noisy data, though the convergence rate is not
very fast ($n^{-1/7}$ for $a \ne 1$ and $n^{-1/5}$ for $a = 1$). We point out that these
results on bias and variance hold for any cost function with three continuous
Fréchet derivatives. This will be clear from the proof. We make some remarks
about this theorem.
REMARKS. (a) Condition (i) is not a restriction since $d$, $r$ and $M^{-1}$ can be
as small as one wants.
(b) It would be nice to have a convergence rate for $\|\hat u - u\|$. As a result of
the nonuniqueness of solutions of (8), it could, however, be complicated to show
just that there exists an optimal solution $\hat u$ such that $\|\hat u - u\| \to 0$.
$$\int_0^1 \bigl\{ (u_1, a_1)\, A(t)\, (u_1, a_1)^T + \varphi''(u')\, (u_1')^2 \bigr\}\, dt,$$

is nonnegative for any admissible $(u_1, a_1)$. See the proof in the Appendix for
the definition of the second variation, or second Fréchet derivative. Since $u$
satisfies (i), $\varphi''(u') \equiv 0$. Hence $A(t)$ is semipositive definite for all $t \in [0, 1]$.
Since $A(t)$ is continuous in $t$, condition (ii) implies that $A(t)$ is positive
definite in a neighborhood of $t$. This makes the solution of (8) unique in a
neighborhood of $t$, and therefore makes it possible to prove the pointwise
result given in this theorem.
(d) One can check condition (ii) for some interesting models. SIM is an
easy example while the proof for NLSM is not so easy.
(e) In the structural analysis proposed by Kneip and Gasser [(1992),
Theorem 3, page 1289], the estimated shift function from noisy data
converges to the true shift function at a rate of $O(n^{-1/5})$. Here the rate
$(n b_1^5)^{-1/2} = O(n^{-1/7})$ when $a \ne 1$ is slower because the second derivative of $\hat g$ is
involved when solving (8) (see the proof given in the Appendix).
[A display defining the simulated curves is garbled in the source; only the case condition $t \ge 0.55$ survives.]
Under optimal shifting of the second function to the first one, the extreme
$t_{2j}$ should be shifted to $t_{1j}$ for $j \in K \equiv K_1 \cap K_2$. Let $\hat t_{1j}$ be the image of $t_{2j}$
under dynamic time warping for $j \in K$. We use

(15)  $$r \equiv \frac{1}{|K|} \sum_{j \in K} \bigl( \hat t_{1j} - t_{1j} \bigr)^2$$

as an error measurement for aligning extremes. This choice is made for the
following reasons. First, the norm $\|\hat u - u\|_2$, with $\hat u$ and $u$ the estimated and true
shift functions, respectively, is not very sensitive and thus not appropriate as a
criterion (a different standardization might, however, help). One could also
compute the difference $\|f_1(\cdot) - f_2(u(\cdot))\|_2$, but this is often not informative
enough about the appropriate alignment.
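The error measure (15) is straightforward to compute once the extremes common to both curves have been matched. The Python sketch below assumes, as our own convention, that the estimated shift function satisfies $f_1(t) \approx f_2(\hat u(t))$, so that the image of an extreme $t_{2j}$ on the time scale of $f_1$ is $\hat u^{-1}(t_{2j})$; if the warping is stored in the opposite direction, the interpolation arguments must be swapped. The function name is ours.

```python
import numpy as np

def extreme_alignment_error(t1_ext, t2_ext, u_hat, grid):
    """Mean squared distance between warped and target extreme locations.

    t1_ext, t2_ext : matched extreme locations of f_1 and f_2 (j in K)
    u_hat          : estimated shift function on `grid`, strictly increasing
    grid           : increasing time grid for u_hat
    """
    t1_ext = np.asarray(t1_ext, dtype=float)
    t2_ext = np.asarray(t2_ext, dtype=float)
    # invert the increasing function u_hat by interpolating with swapped axes
    t1_hat = np.interp(t2_ext, u_hat, grid)
    return float(np.mean((t1_hat - t1_ext) ** 2))
```

If the estimated shift function equals the true one, the measure is zero up to interpolation error.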
TABLE 2
Simulation with SIM model
The sample size for each curve is 100. We happen to take $\sigma = \sigma_\varepsilon$ in this and
subsequent subsections, but there is no special reason for doing so.
Table 2 gives the results of 100 runs. Let $r_i$, $i = 1, \ldots, 100$, be the error
(15) of the $i$th run. The column "Mean" is the sample mean of $\{r_i\}$, while the
column "Variance" is the sample variance of $\{r_i\}$, leading easily to standard
errors of simulation. The rows starting with "Classical" give the results for
the classical quadratic cost function, and the rows starting with "New" give
the results for cost function (7).
A typical run with $\sigma = \sigma_\varepsilon = 0.2$ is shown in Figure 4. The error of aligning
extremes using the classical cost function is 3.33e-3, and the error using
the new cost function is 1.66e-4. The true shift function is linear for the
SIM model.
The results are visually appealing, and even more so for the new cost
function.
5.2. Nonlinear shift model. We consider a nonlinear shift model for the
simulation. The shift function is modeled by

$$h_i(t) = t + \frac{\alpha_i}{2\pi} \sin\bigl( 2\pi(\beta_i t + \gamma_i) \bigr)$$

and the model is again

$$f_i(t) = a_i\, s\bigl(h_i(t)\bigr) + d_i .$$

The parameters $a_i$, $d_i$ are generated as in the last simulation, while $\alpha_i$, $\beta_i$, $\gamma_i$
are generated as follows:

$$\alpha_i = \max\{-0.7,\ \min\{0.35\, N(0, 1),\ 0.7\}\},$$
$$\beta_i = \min\{1 + 0.2\, N(0, 1),\ 1.4\}, \qquad \gamma_i = 0.5\,(U(0, 1) - 0.5).$$

Note that typically $|\beta_i| \le 1.4$ and therefore

$$h_i'(t) = 1 + \alpha_i \beta_i \cos\bigl( 2\pi(\beta_i t + \gamma_i) \bigr) \ge 1 - |\alpha_i \beta_i| > 0.$$

Thus we have strictly increasing shift functions.
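The parameter draws and the monotonicity argument can be checked numerically. The following Python sketch uses NumPy's `Generator` API; the Greek names `alpha`, `beta`, `gamma` for the shift parameters and the function names are our own labels, chosen to keep them apart from the amplitude parameters $a_i$, $d_i$.

```python
import numpy as np

def draw_shift_params(m, rng):
    """Draw (alpha_i, beta_i, gamma_i), i = 1..m, as described in the text."""
    alpha = np.clip(0.35 * rng.standard_normal(m), -0.7, 0.7)
    beta = np.minimum(1.0 + 0.2 * rng.standard_normal(m), 1.4)
    gamma = 0.5 * (rng.uniform(0.0, 1.0, m) - 0.5)
    return alpha, beta, gamma

def shift(t, alpha, beta, gamma):
    """h(t) = t + alpha / (2 pi) * sin(2 pi (beta t + gamma))."""
    return t + alpha / (2.0 * np.pi) * np.sin(2.0 * np.pi * (beta * t + gamma))

def shift_derivative(t, alpha, beta, gamma):
    """h'(t) = 1 + alpha * beta * cos(2 pi (beta t + gamma))."""
    return 1.0 + alpha * beta * np.cos(2.0 * np.pi * (beta * t + gamma))

# Example: 100 shift functions on a grid of 201 points.
rng = np.random.default_rng(0)
alpha, beta, gamma = draw_shift_params(100, rng)
t = np.linspace(0.0, 1.0, 201)
h = shift(t[None, :], alpha[:, None], beta[:, None], gamma[:, None])
dh = shift_derivative(t[None, :], alpha[:, None], beta[:, None], gamma[:, None])
```

Since $|\alpha_i| \le 0.7$ and $\beta_i \le 1.4$, the product $|\alpha_i \beta_i|$ stays below 0.98 whenever $\beta_i \ge -1.4$, so every row of `dh` should be positive.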
FIG. 4. Alignment for SIM model. Top left: two smoothed sample curves; top right: warping
result using the classical cost function; bottom left: warping result using the new cost function;
bottom right: two warping functions (dotted line for classical cost function, solid line for new cost
function).
The simulation is done as follows. First 100 shift functions are generated
and the data for 100 curves are formed as

$$y_{ij} = f_i(t_{ij}) + N(0, \sigma_\varepsilon).$$

Each curve is sampled at 100 points. After kernel smoothing, each curve is
aligned to the shape function $s(t)$ and the error $r_i$ is computed. Finally, we
compute the sample mean and variance of $\{r_i\}$. The results from 100 runs are
given in Table 3.
A typical run with $\sigma = \sigma_\varepsilon = 0.2$ is shown in Figure 5. The error of aligning
extremes using the classical cost function is 1.2e-4, and the error using the
new cost function is 2.55e-5. Evidently, dynamic time warping with the
classical cost function has difficulty in aligning the signals properly. This is
not due to noise but to the more complicated shift function. The new cost
function (7) performs much better.
TABLE 3
Simulation with NLSM model
APPENDIX A
Proofs.
$$\lim_{n \to \infty} C(f, f', g, g', u_n, a_n) = \inf_{u, a} C(f, f', g, g', u, a).$$
FIG. 5. Alignment for NLSM model. Top left: two smoothed sample curves; top right: warping
result using the classical cost function; bottom left: warping result using the new cost function;
bottom right: warping functions (dotted line for classical cost function, solid line for new cost
function, dashed line for true shift function).
A.2. Proof of Theorem 4.1. The proof proceeds as follows. First, the
difference $(\hat u, \hat a) - (u, a)$ can be represented as a linear functional of the
basic statistics plus higher order error terms. These basic statistics are
$\hat f(t) - f(t)$, $\hat f^{(1)}(t) - f'(t)$, $\hat f'(t) - f'(t)$, $(\hat f')^{(1)}(t) - f''(t)$, $\|\hat f\| - \|f\|$, $\|\hat f'\| - \|f'\|$,
and similar statistics with $f(t)$ replaced by $g(u(t))$. Note that $\hat f^{(1)}(t)$, the
derivative of the estimate $\hat f$, is different from $\hat f'(t)$, the kernel estimate of $f'$.
To treat bias and variance asymptotically, we need only
to take care of these basic statistics. The bias of the linear functional is
simply the linear combination of the bias of those basic statistics, available in
the literature. To deal with the variance we note that the dominating terms
are those involving second derivatives: $(\hat f')^{(1)}(t) - f''(t)$ and $(\hat g')^{(1)}(u(t)) - g''(u(t))$.
All other terms can be neglected as far as asymptotic variance is
concerned. Therefore the main issue is to develop such a representation,
which will be the first thing to do.
$$\cdots - DC(f, f', g, g', u, a)[u_1, a_1] \bigr)\, d^{-1},$$

and $D^3 C$ is defined in the same way.
Since $(\hat u, \hat a)$ is the minimizer of $C(\hat f, \hat f', \hat g, \hat g', u, a)$, for any $(u_1, a_1)$ with
$u_1 \in C^1[0, 1]$,

(16)
$$\begin{aligned}
0 &= DC(\hat f, \hat f', \hat g, \hat g', \hat u, \hat a)[u_1, a_1] \\
  &= DC(f, f', g, g', \hat u, \hat a)[u_1, a_1] + R(f, f', g, g', \hat f, \hat f', \hat g, \hat g', \hat u, \hat a)[u_1, a_1] \\
  &= D^2 C(f, f', g, g', u, a)[u_1, a_1;\ \hat u - u,\ \hat a - a] + O\bigl( |(\hat u - u,\ \hat a - a)|^2 \bigr) \\
  &\qquad + R(f, f', g, g', \hat f, \hat f', \hat g, \hat g', \hat u, \hat a)[u_1, a_1].
\end{aligned}$$

We have used the fact that $DC(f, f', g, g', u, a)[u_1, a_1] = 0$. The $O$ term
equals

$$D^3 C(f, f', g, g', c_u, c_a)[u_1, a_1;\ \hat u - u,\ \hat a - a;\ \hat u - u,\ \hat a - a]$$

for some $(c_u, c_a)$ between $(u, a)$ and $(\hat u, \hat a)$. The term $R$ is simply

$$DC(\hat f, \hat f', \hat g, \hat g', \hat u, \hat a)[u_1, a_1] - DC(f, f', g, g', \hat u, \hat a)[u_1, a_1].$$
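Since $\varphi$ does not depend on $f, f', g, g', \hat f, \hat f', \hat g, \hat g'$ and their derivatives, $R$
does not depend on $\varphi$ (canceled out). Consequently

(17)
$$R(f, f', g, g', \hat f, \hat f', \hat g, \hat g', \hat u, \hat a)[u_1, a_1]
= \int_0^1 \Bigl\{ \bigl( F_u(\hat f, \hat f', \hat g, \hat g', \hat u, \hat a) - F_u(f, f', g, g', \hat u, \hat a) \bigr) u_1
+ \bigl( F_a(\hat f, \hat f', \hat g, \hat g', \hat u, \hat a) - F_a(f, f', g, g', \hat u, \hat a) \bigr) a_1 \Bigr\}\, dt.$$

$$D^2 C(f, f', g, g', u, a)[u_1, a_1;\ \hat u - u,\ \hat a - a]
= \int_0^1 \bigl\{ (u_1, a_1)\, A(t)\, (\hat u - u,\ \hat a - a)^T + \varphi''(u')\, u_1'\, (\hat u' - u') \bigr\}\, dt.$$

Since $(u_1, a_1)$ can be chosen arbitrarily, it follows from (16), (17) and $\varphi''(u') \equiv 0$ that
[display (18) is missing from the source]
with

$$\hat B(t) = \begin{pmatrix}
F_u(\hat f, \hat f', \hat g, \hat g', \hat u, \hat a) - F_u(f, f', g, g', \hat u, \hat a) \\
F_a(\hat f, \hat f', \hat g, \hat g', \hat u, \hat a) - F_a(f, f', g, g', \hat u, \hat a)
\end{pmatrix}.$$

One could also derive (18) from Euler equations. Since the inverse $A^{-1}$ exists
and since $F_u$, $F_a$ are continuous, one immediately gets $(\hat u - u,\ \hat a - a) \to 0$ in
probability from the fact that $(\hat f, \hat f', \hat g, \hat g', \hat f^{(1)}, \hat g^{(1)}, (\hat f')^{(1)}, (\hat g')^{(1)}) - (f, f', g, g', f', g', f'', g'') \to 0$ in probability.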
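We now deal with $\hat B$. First note the following two algebraic equalities:

$$\frac{\hat f}{\|\hat f\|} = \frac{f}{\|f\|} + \frac{\hat f - f}{\|f\|} - \frac{f}{\|f\|} \cdot \frac{\|\hat f\| - \|f\|}{\|f\|} - \frac{\|\hat f\| - \|f\|}{\|f\|} \left( \frac{\hat f}{\|\hat f\|} - \frac{f}{\|f\|} \right),$$

$$\frac{\hat f^{(1)}}{\|\hat f\|} = \frac{f'}{\|f\|} + \frac{\hat f^{(1)} - f'}{\|f\|} - \frac{f'}{\|f\|} \cdot \frac{\|\hat f\| - \|f\|}{\|f\|} - \frac{\|\hat f\| - \|f\|}{\|f\|} \left( \frac{\hat f^{(1)}}{\|\hat f\|} - \frac{f'}{\|f\|} \right).$$

The same relations hold if we replace $f, f', \hat f, \hat f^{(1)}$ by $f', f'', \hat f', (\hat f')^{(1)}$. For any
$(u, a)$, we simply write $f$ for $f(t)$, $g$ for $g(u(t))$, and so on. Then it follows
from the two algebraic equalities that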
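$$\begin{aligned}
F_u(\hat f, \hat f', \hat g, \hat g', u, a) &- F_u(f, f', g, g', u, a) \\
&= 2a^2 \left[ \left( \frac{\hat g}{\|\hat g\|} - \frac{\hat f}{\|\hat f\|} \right) \frac{\hat g^{(1)}}{\|\hat g\|} - \left( \frac{g}{\|g\|} - \frac{f}{\|f\|} \right) \frac{g'}{\|g\|} \right] \\
&\quad + 2(1 - a)^2 \left[ \left( \frac{\hat g'}{\|\hat g'\|} - \frac{\hat f'}{\|\hat f'\|} \right) \frac{(\hat g')^{(1)}}{\|\hat g'\|} - \left( \frac{g'}{\|g'\|} - \frac{f'}{\|f'\|} \right) \frac{g''}{\|g'\|} \right] \\
&\quad + o(|G_2|).
\end{aligned}$$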
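$$\begin{aligned}
G_1(g, \hat g, f, \hat f, u, a)
&= \frac{g'}{\|g\|} \cdot \frac{\hat g - g}{\|g\|}
 - 2\, \frac{g'}{\|g\|} \cdot \frac{g}{\|g\|} \cdot \frac{\|\hat g\| - \|g\|}{\|g\|}
 - \frac{g'}{\|g\|} \cdot \frac{\hat f - f}{\|f\|} \\
&\quad + 2\, \frac{g}{\|g\|} \cdot \frac{\|\hat f\| - \|f\|}{\|f\|}
 + \left( \frac{g}{\|g\|} - \frac{f}{\|f\|} \right) \frac{\hat g^{(1)} - g'}{\|g\|},
\end{aligned}$$

$$\begin{aligned}
G_2(g', \hat g', f', \hat f', u, a)
&= \frac{g''}{\|g'\|} \cdot \frac{\hat g' - g'}{\|g'\|}
 - 2\, \frac{g''}{\|g'\|} \cdot \frac{g'}{\|g'\|} \cdot \frac{\|\hat g'\| - \|g'\|}{\|g'\|}
 - \frac{g''}{\|g'\|} \cdot \frac{\hat f' - f'}{\|f'\|} \\
&\quad + 2\, \frac{g'}{\|g'\|} \cdot \frac{\|\hat f'\| - \|f'\|}{\|f'\|}
 + \left( \frac{g'}{\|g'\|} - \frac{f'}{\|f'\|} \right) \frac{(\hat g')^{(1)} - g''}{\|g'\|}.
\end{aligned}$$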
$$F_a(\hat f, \hat f', \hat g, \hat g', u, a) - F_a(f, f', g, g', u, a)
= 4a\, H(f, \hat f, g, \hat g, u, a) - 4(1 - a)\, H(f', \hat f', g', \hat g', u, a) + o(|H|).$$

The $H$ functional is defined by

$$H(f, \hat f, g, \hat g, u, a) = \left( \frac{f}{\|f\|} - \frac{g}{\|g\|} \right)
\left( \frac{\hat f - f}{\|f\|} - \frac{f}{\|f\|} \cdot \frac{\|\hat f\| - \|f\|}{\|f\|} - \frac{\hat g - g}{\|g\|} + \frac{g}{\|g\|} \cdot \frac{\|\hat g\| - \|g\|}{\|g\|} \right),$$

$$o(|H|) = o\bigl( |H(f, \hat f, g, \hat g, u, a)| \bigr) + o\bigl( |H(f', \hat f', g', \hat g', u, a)| \bigr).$$
Combining all the above considerations, we have shown that

$$\hat B(t) = \hat B_0(t) + \text{higher order error terms}$$

with

$$\hat B_0(t) = \begin{pmatrix}
2\hat a^2\, G_1(g, \hat g, f, \hat f, \hat u, \hat a) + 2(1 - \hat a)^2\, G_2(g', \hat g', f', \hat f', \hat u, \hat a) \\
4\hat a\, H(f, \hat f, g, \hat g, \hat u, \hat a) - 4(1 - \hat a)\, H(f', \hat f', g', \hat g', \hat u, \hat a)
\end{pmatrix}.$$

Let us make one more observation here. If we replace $(\hat u, \hat a)$ by $(u, a)$ in
$\hat B_0(t)$, the error terms are those like $(\hat g(\xi) - g(\xi))(\hat u - u)$, with some $\xi$
between $\hat u$ and $u$, by a first order Taylor expansion. Therefore,

$$\hat B(t) = \hat B_1(t) + \text{higher order error terms},$$
PROOF OF (a). This part follows from (19) and the lemma of Kneip and
Gasser [(1992), pages 1291 and 1292], which shows that

$$E(\hat f - f) = O(b_0^2), \qquad E(\hat f^{(1)} - f') = O(b_0^2),$$
$$E(\hat f' - f') = O(b_1^2), \qquad E\bigl((\hat f')^{(1)} - f''\bigr) = O(b_1^2).$$

Also,

$$O(b_0^2) = o(b_1^2), \qquad (n b_0)^{-1} = o\bigl( (n b_1^3)^{-1} \bigr).$$

Part (a) follows.
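$$\begin{aligned}
E\bigl( \|\hat f'\| - E\|\hat f'\| \bigr)^2
&= E\bigl( \|\hat f'\| - \|E\hat f'\| + \|E\hat f'\| - E\|\hat f'\| \bigr)^2 \\
&\le 4\, E\|\hat f' - E\hat f'\|^2 = o\!\left( \frac{\log^4(n)}{n b_1^3} \right) \\
&= o\bigl( n^{-1} b_1^{-5} \bigr) = o\Bigl( E\bigl( (\hat g')^{(1)} - E(\hat g')^{(1)} \bigr)^2 \Bigr).
\end{aligned}$$

Now it is clear that the covariance of any two different terms in $\hat B_1(t)$ is an
$o(\cdot)$ term of $\mathrm{Var}((\hat g')^{(1)})$ because $|\mathrm{Cov}(X, Y)| \le (\mathrm{Var}(X)\,\mathrm{Var}(Y))^{1/2}$ for any $X$
and $Y$.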
Part (b) now follows from (19) and the fact that [cf. the lemma of Kneip and
Gasser (1992)]

$$\mathrm{Var}\bigl( (\hat g')^{(1)} \bigr) = \frac{\sigma^2}{n b_1^5} \int_{-1}^{1} K_1^{(1)}(x)^2\, dx + o\!\left( \frac{1}{n b_1^5} \right)$$

and that $(\hat g')^{(1)}(t) - E(\hat g')^{(1)}(t)$ converges to a normal variable if properly scaled;
compare Gasser and Müller (1984).
PROOF OF (c). If $a = 1$, then $\hat B_1(t)$ does not depend on $\hat f'$, $\hat g'$ and $(\hat g')^{(1)}$. The
dominating term is then the term involving $\hat g^{(1)}$, as far as asymptotic
variance-covariance is concerned. Again the covariance of any two terms in
$\hat B_1(t)$ is an $o(\cdot)$ term of $\mathrm{Var}(\hat g^{(1)})$. Note that

$$\mathrm{Var}\bigl( \hat g^{(1)} \bigr) = \frac{\sigma^2}{n b_0^3} \int_{-1}^{1} K_0^{(1)}(x)^2\, dx + o\!\left( \frac{1}{n b_0^3} \right).$$

The rest of the proof for (c) is the same as in the proof of (b). □
REFERENCES
GASSER, TH., KNEIP, A., BINDING, A., PRADER, A. and MOLINARI, L. (1991). The dynamics of linear
growth in distance, velocity and acceleration. Ann. of Human Biology 18 187-205.
GASSER, TH., KNEIP, A. and KÖHLER, W. (1991). A flexible and fast method for automatic
smoothing. J. Amer. Statist. Assoc. 86 643-652.
GASSER, TH., KNEIP, A., ZIEGLER, P., MOLINARI, L., PRADER, A. and LARGO, R. (1994). Development
and outcome of indices of obesity in normal children. Ann. of Human Biology 21
275-286.
GASSER, TH. and MÜLLER, H. (1984). Estimating regression functions and their derivatives by
the kernel method. Scand. J. Statist. 11 171-185.
GASSER, TH., MÜLLER, H. and MAMMITZSCH, V. (1985). Kernels for nonparametric curve
estimation. J. Roy. Statist. Soc. Ser. B 47 238-252.
HÄRDLE, W. and MARRON, J. S. (1990). Semiparametric comparison of regression curves. Ann.
Statist. 18 63-89.
HÖHNE, H., COKER, C., LEVINSON, S. and RABINER, L. (1983). On temporal alignment of sentences
of natural and synthetic speech. IEEE Trans. Acoust. Speech Signal Process. 31
807-813.
KNEIP, A. and ENGEL, J. (1995). Model estimation in nonlinear regression under shape
invariance. Ann. Statist. 23 551-570.
KNEIP, A. and GASSER, TH. (1988). Convergence and consistency results for self-modeling
nonlinear regression. Ann. Statist. 16 82-112.
KNEIP, A. and GASSER, TH. (1992). Statistical tools to analyze data representing a sample of
curves. Ann. Statist. 20 1266-1305.
LAWTON, W. H., SYLVESTRE, E. A. and MAGGIO, M. S. (1972). Self-modeling regression.
Technometrics 14 513-532.
PARSONS, T. (1986). Voice and Speech Processing. McGraw-Hill, New York.
QI, Y. (1992). Time normalization in voice analysis. J. Acoust. Soc. Am. 92 2569-2576.
RABINER, L. and SCHMIDT, C. (1980). Application of dynamic time warping to connected digit
recognition. IEEE Trans. Acoust. Speech Signal Process. 28 377-388.
RAMSAY, J. and DALZELL, C. (1991). Some tools for functional data analysis (with discussion).
J. Roy. Statist. Soc. Ser. B 53 539-572.
RAO, C. (1958). Some statistical methods for the comparison of growth curves. Biometrics 14
1-17.
RICE, J. and SILVERMAN, B. (1991). Estimating the mean and covariance structure
nonparametrically when data are curves. J. Roy. Statist. Soc. Ser. B 53 233-243.
ROBERTS, K., LAWRENCE, P., EISEN, A. and HOIRCH, M. (1987). Enhancement and dynamic time
warping of somatosensory evoked potential components applied to patients with
multiple sclerosis. IEEE Trans. Biomed. Eng. BME-34 397-405.
SAKOE, H. and CHIBA, S. (1978). Dynamic programming algorithm optimization for spoken word
recognition. IEEE Trans. Acoust. Speech Signal Process. 26 43-49.
SILVERMAN, B. (1995). Incorporating parametric effects into functional principal components
analysis. J. Roy. Statist. Soc. Ser. B 57 673-689.
STÜTZLE, W., GASSER, TH., MOLINARI, L., LARGO, R., PRADER, A. and HUBER, P. (1980).
Shape-invariant modeling of human growth. Ann. of Human Biology 7 507-528.