Pôle d'ingénierie mathématique (INMA)
Master's thesis
Advisor: Prof. Pierre-Antoine Absil

DISCRETE CURVE FITTING ON MANIFOLDS

NICOLAS BOUMAL

Louvain-la-Neuve, June 2010
I warmly thank
Pierre-Antoine Absil, my advisor,
as well as
Quentin Rentmeesters, his doctoral student,
for their enlightened guidance.

I dedicate this work to
Pierre Bolly, who introduced me to mathematics.
Contents

Contents ............................................................. i
Notations ............................................................ iii
Introduction ......................................................... 1

1 Discrete curve fitting in R^n ...................................... 3
  1.1 The problem in a continuous setting ........................... 4
  1.2 Objective discretization ...................................... 4
  1.3 Least-squares formulation ..................................... 6
  1.4 What if we generalize to Riemannian manifolds? ................ 6
  1.5 Short state of the art ........................................ 7

2 Elements of Riemannian geometry .................................... 9
  2.1 Charts and manifolds .......................................... 10
  2.2 Tangent spaces and tangent vectors ............................ 11
  2.3 Inner products and Riemannian manifolds ....................... 13
  2.4 Scalar fields on manifolds and the gradient vector field ...... 14
  2.5 Connections and covariant derivatives ......................... 15
  2.6 Distances and geodesic curves ................................. 17
  2.7 Exponential and logarithmic maps .............................. 18
  2.8 Parallel translation .......................................... 20

3 Objective generalization and minimization .......................... 23
  3.1 Geometric finite differences .................................. 24
  3.2 Generalized objective ......................................... 25
  3.3 Alternative discretizations of the objective* ................. 26
  3.4 A geometric steepest descent algorithm ........................ 28
  3.5 A geometric nonlinear conjugate gradient algorithm ............ 29
  3.6 Step choosing algorithms ...................................... 31

4 Discrete curve fitting on the sphere ............................... 33
  4.1 S^2 geometric toolbox ......................................... 34
  4.2 Curve space and objective function ............................ 35
  4.3 Gradient of the objective ..................................... 37
  4.4 Results and comments .......................................... 37

5 Discrete curve fitting on positive-definite matrices ............... 49
  5.1 P_n^+ geometric toolbox ....................................... 50
  5.2 Curve space and objective function ............................ 51
  5.3 Gradient of the objective ..................................... 52
  5.4 Alternative I: linear interpolation ........................... 56
  5.5 Alternative II: convex programming ............................ 56
  5.6 Alternative III: vector space structure ....................... 57
  5.7 Results and comments .......................................... 58

Conclusions and perspectives ......................................... 63

A Line search derivative on S^2 ...................................... 65
B Gradient of E_a for P_n^+ .......................................... 67
C SO(3) geometric toolbox ............................................ 69

Bibliography ......................................................... 71
Notations

‖·‖         usual 2-norm in R^n, Frobenius norm for matrices ............. 4
ℳ, 𝒩        (usually) smooth, finite-dimensional, Riemannian manifolds ... 10
x, y, p     points on a manifold
T_x ℳ       tangent space to ℳ at x ...................................... 12
ξ, η, v     tangent vectors to a manifold ................................ 12
X, Y        vector fields on a manifold .................................. 13
Df(x)[ξ]    derivative of f at x in the direction ξ ...................... 14
dist        geodesic distance on a manifold .............................. 17
Exp         exponential map on a manifold ................................ 19
Log         logarithmic map on a manifold ................................ 19
S^2         unit sphere embedded in R^3 .................................. 34
H_n         set of symmetric matrices .................................... 50
P_n^+       set of positive-definite matrices ............................ 50

Table 1: Notations
Introduction
Let p_1, p_2, ..., p_N be N data points in the Euclidean space R^n with time labels t_1 ≤ t_2 ≤ ··· ≤ t_N. A classic problem arising in many disciplines consists in searching for a curve γ : [t_1, t_N] → R^n that simultaneously (i) reasonably fits the data and (ii) is sufficiently smooth. This may, among other things, help reduce measurement noise and fill gaps in the data. One way of formalizing this loose description of a natural need is to express γ as the minimizer of an energy function such as

    E(\gamma) = \frac{1}{2} \sum_{i=1}^{N} \|p_i - \gamma(t_i)\|^2 + \frac{\lambda}{2} \int_{t_1}^{t_N} \|\dot\gamma(t)\|^2 \, dt + \frac{\mu}{2} \int_{t_1}^{t_N} \|\ddot\gamma(t)\|^2 \, dt,    (1)

defined over some suitable curve space Γ. The tuning parameters λ and µ let the user balance the conflicting goals of fitting and smoothness. It is well known (see the state of the art, section 1.5) that (i) when λ > 0, µ = 0, the optimal γ is piecewise affine, and (ii) when λ = 0, µ > 0, the optimal γ is an approximating cubic spline (provided Γ contains these curves). Cubic splines are piecewise cubic polynomial curves of class C^2.
We focus on a broader problem. The data points p_i are now allowed to lie on a Riemannian manifold ℳ. R^n is the simplest example of such a manifold. Other examples treated in this document include the sphere embedded in R^3, which we denote S^2, the cone of positive-definite matrices P_n^+ and the special orthogonal group SO(3). Generalizations of (1) to manifolds have been proposed. In particular, Samir et al. give, mutatis mutandis, the following definition in [SASK09]:

    E(\gamma) = \frac{1}{2} \sum_{i=1}^{N} \mathrm{dist}^2(p_i, \gamma(t_i)) + \frac{\lambda}{2} \int_{t_1}^{t_N} \langle \dot\gamma(t), \dot\gamma(t) \rangle_{\gamma(t)} \, dt + \frac{\mu}{2} \int_{t_1}^{t_N} \left\langle \frac{D^2\gamma}{dt^2}, \frac{D^2\gamma}{dt^2} \right\rangle_{\gamma(t)} \, dt,    (2)

where dist denotes the Riemannian (geodesic) distance on ℳ, ⟨·, ·⟩ the Riemannian metric, γ̇ the first derivative of γ and D²γ/dt² the second covariant derivative of γ (chapter 2 defines these terms and other relevant elements of Riemannian geometry). Samir et al. solve this problem by computing the gradient of E in the so-called Palais metric and applying a steepest descent scheme to minimize E. In essence, the problem is a variational problem in infinite dimension. The (necessary) discretization for implementation purposes arises at the very end of the algorithm design process. In the conclusion of [SASK09], the authors wonder what would happen if one were to discretize (2) directly. The present work is an attempt to answer that question.
The main contribution of this document lies in the proposed discretization of the problem. The curve space is reduced to the set of sequences of N_d points on the manifold, Γ = ℳ × ··· × ℳ (N_d copies of ℳ). For a discrete curve γ = (γ_1, ..., γ_{N_d}) ∈ Γ, each γ_i is associated to a fixed time τ_i such that t_1 = τ_1 < τ_2 < ··· < τ_{N_d} = t_N. We propose the following discretization of (2):

    E(\gamma) = \frac{1}{2} \sum_{i=1}^{N} \mathrm{dist}^2(p_i, \gamma_{s_i}) + \frac{\lambda}{2} \sum_{i=1}^{N_d} \alpha_i \langle v_i, v_i \rangle_{\gamma_i} + \frac{\mu}{2} \sum_{i=1}^{N_d} \beta_i \langle a_i, a_i \rangle_{\gamma_i}.    (3)
Here, the indices s_i are chosen such that τ_{s_i} is closest (ideally equal) to t_i. The tangent vectors v_i and a_i, rooted at γ_i, are approximations to the velocity and acceleration vectors γ̇(t_i) and (D²γ/dt²)(t_i). We compute these with generalized finite differences, which we introduce in this work. The weights α_i and β_i are chosen based on a suitable numerical integration scheme like, e.g., the trapezium method, such that the sums effectively discretize the integrals present in (2). We also propose other ways of discretizing (2). Interestingly, they all come down to the same discretization in R^n but, in general, differ on manifolds. Since the curve space is now finite-dimensional, we may apply the optimization algorithms from [AMS08] to minimize (3). Chapter 3 covers the details.
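As a preview of the geometric steepest descent scheme developed in chapter 3, here is a minimal sketch on the sphere S^2. It minimizes f(x) = ½ dist²(x, p), whose Riemannian gradient is −Log_x(p), using the standard closed-form Exp and Log of S^2 (the chapter 4 toolbox derives these); the step size and iteration count are arbitrary illustration choices.

```python
import numpy as np

def sphere_exp(x, v):
    """Exponential map on S^2: follow the geodesic from x along tangent vector v."""
    nv = np.linalg.norm(v)
    if nv < 1e-16:
        return x
    return np.cos(nv) * x + np.sin(nv) * (v / nv)

def sphere_log(x, y):
    """Logarithmic map on S^2: tangent vector at x pointing to y, with norm dist(x, y)."""
    u = y - np.dot(x, y) * x          # project y onto the tangent space at x
    nu = np.linalg.norm(u)
    if nu < 1e-16:
        return np.zeros(3)
    theta = np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))  # geodesic distance
    return theta * (u / nu)

# Steepest descent for f(x) = (1/2) dist^2(x, p); grad f(x) = -Log_x(p).
p = np.array([0.0, 0.0, 1.0])                  # target point (north pole)
x = np.array([1.0, 0.0, 0.0])                  # initial guess on the equator
for _ in range(50):
    grad = -sphere_log(x, p)
    x = sphere_exp(x, -0.5 * grad)             # fixed step size 0.5

print(np.linalg.norm(x - p))                   # x converges to p
```

Each iteration stays exactly on the manifold because the update is performed with Exp rather than with a vector space addition; this is precisely the substitute for linear combinations discussed in chapter 1.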
Measurements on manifolds arise naturally in applications. To cite but a few examples:

• Positions on Earth closely resemble points on S^2;
• Configurations of rigid bodies in space are fully described by the position of their center of mass and their orientation, which jointly constitute a point of SE(3) = R^3 × SO(3), the special Euclidean group. Rigid body motion estimation thus comes down to regression on SE(3). The same mathematics can be used for smooth camera transitions in computer graphics;
• Diffusion-Tensor MRI measures (noisy) positive-definite matrices in the brain for medical imaging purposes;
• Shapes (seen as closed curves) can be measured, e.g., by contour detectors applied to video streams. Shapes belong to the shape space, a complex manifold we plan on working with in the future.
Each time an application uses data belonging to (tractable) manifolds, noise reduction (i.e., smoothing) of the measurements and interpolation can be carried out by the generic techniques developed in this document. For the control engineer, our algorithms may be helpful, e.g., when the control variables of the systems at hand belong to manifolds.

We successfully implemented our methods in R^n (chapter 1), on the sphere S^2 (chapter 4), in the cone of positive-definite matrices P_n^+ (chapter 5) and on the special orthogonal group SO(3) (appendix C). For P_n^+ and SO(3), we constructed explicit formulas for the gradients of real-valued functions involving the matrix logarithm. Our results correspond nicely to what one would expect based on the continuous case. Our short answer to the question Samir et al. ask is that the regression problem on these manifolds can, in general, be solved satisfactorily via our direct discretization approach.
Chapter 1

Discrete curve fitting in R^n
In this work, we show how one can find a discrete representation of a curve lying on a manifold that strikes a (tunable) balance between data fitting and smoothness. Before we give a general formulation of that problem (along with a proper definition of fitting and smoothness on manifolds), it is helpful to investigate the Euclidean case. This will help us appreciate how the vector space structure of R^n simplifies the whole process of formalizing and solving the problem. Later on, we will use these observations to establish a list of tools we should have on manifolds to adequately generalize some of the more fundamental concepts that can sometimes be hidden behind generic linear combinations.

First off, we give a formal definition of the regression problem for time-labeled data in R^n in the continuous case, i.e., what we are looking for is a suitable smooth curve defined over some time interval and taking values in R^n. An objective function (energy function) will be introduced to capture the balance we are seeking between fitting and smoothness. The second step consists in proposing a discretization for that objective. As we will see, it is very natural to use finite differences for this purpose. Doing so, it will be sufficient to describe a curve using a finite set of sampling points. We will solve the discrete problem in a least-squares framework. We will then investigate where the vector space structure of R^n has been used, and where we should generalize the problem formulation for the case of non-vector spaces. More specifically, in the following chapters, we will consider finite-dimensional, smooth Riemannian manifolds, namely: the sphere and the set of positive-definite matrices. We close this chapter with a short review of the state of the art.
1.1 The problem in a continuous setting
Let us consider a collection of N weighted, time-labeled points p_i in R^n: {(w_i, t_i, p_i) : i = 1, ..., N}, such that the time labels t_i are increasing with i and such that the weights w_i are nonnegative. We are searching for a C^k (smooth "enough") function γ : [t_1, t_N] → R^n such that γ approximately fits the data and such that γ is not too "wiggly". The former means that we would like γ(t_i) to be close to p_i (even more so when w_i is high) whereas the latter means that we would like the derivatives of γ to have a small norm, effectively discouraging sharp turns and detours. Furthermore, we would like to be able to tune the balance between those two competing objectives.

The following objective (energy) function, defined over the set of differentiable functions γ from [t_1, t_N] to R^n, C^k([t_1, t_N], R^n) for some appropriate k, captures the problem description:
    E(\gamma) = \frac{1}{2} \sum_{i=1}^{N} w_i \|\gamma(t_i) - p_i\|^2 + \frac{\lambda}{2} \int_{t_1}^{t_N} \|\dot\gamma(t)\|^2 \, dt + \frac{\mu}{2} \int_{t_1}^{t_N} \|\ddot\gamma(t)\|^2 \, dt.    (1.1)

We use the usual 2-norm in R^n, ‖x‖² = x^T x. The two parameters λ and µ enable us to adjust the importance of the penalties on the 2-norms of the first derivative γ̇ and the second derivative γ̈. Playing with these parameters lets us move the minimizer of E from near interpolation (piecewise linear or cubic splines) to near linear regression.

The curve space over which E has to be minimized to obtain our regression model has infinite dimension. Minimizing E is a variational problem, which can be complicated, not to mention that our goal in this document is to solve the problem on manifolds rather than in R^n. The added complexity of the continuous approach is the main reason why, in the next section, we discretize equation (1.1).
For the sake of completeness, let us mention that the continuous objective (1.1) has been generalized to Riemannian manifolds to take this form:

    E(\gamma) = \frac{1}{2} \sum_{i=1}^{N} \mathrm{dist}^2(\gamma(t_i), p_i) + \frac{\lambda}{2} \int_{t_1}^{t_N} \langle \dot\gamma(t), \dot\gamma(t) \rangle_{\gamma(t)} \, dt + \frac{\mu}{2} \int_{t_1}^{t_N} \left\langle \frac{D^2\gamma}{dt^2}, \frac{D^2\gamma}{dt^2} \right\rangle_{\gamma(t)} \, dt,    (1.2)

where dist denotes the Riemannian (geodesic) distance, ⟨·, ·⟩ the Riemannian metric, γ̇ the first derivative of γ and D²γ/dt² the second covariant derivative of γ. Among others (see section 1.5), in [SASK09], Samir et al. have solved regression problems based on this energy function. Their strategy was to discretize as late as possible in the process of writing minimization algorithms. In this document, we take the opposite strategy. We discretize curves immediately, reducing the curve space to a finite-dimensional manifold, hence reverting to the optimization setting already covered in [AMS08]. This is in direct response to the concluding remarks of [SASK09].
1.2 Objective discretization
Instead of optimizing (1.1) over all functions in C^k([t_1, t_N], R^n), we would like to restrict ourselves to a finite-dimensional curve space Γ. A sensible way to do that would be to consider a finite basis of smooth functions, then to optimize (1.1) over the space spanned by said basis. This approach works well in the Euclidean case. It can be generalized to some extent, and at the cost of increased complexity, to manifolds; see section 1.5.

We would like to treat the Euclidean regression problem in a way that will help us work out the Riemannian case. This is why, even in this section devoted to the Euclidean case, we study the simple discretization obtained by sampling curves in time.

More precisely, we consider the discrete curve space Γ = R^n × ··· × R^n ≡ R^{n×N_d} consisting of all sequences of N_d points in R^n (the d subscript stands for discrete). The sampling times τ_1, ..., τ_{N_d} are fixed.
For each discrete curve γ = (γ_1, ..., γ_{N_d}) ∈ Γ, there is an underlying continuous curve γ(t) such that γ(τ_i) = γ_i. Giving a precise definition of such a γ(t) would yield a clean way of discretizing (1.1). We refrain from doing that because this is the specific point that makes matters intricate on manifolds. In the Euclidean case though, one could use, e.g., shape functions like the ones we use in finite element methods.

Let us review the three components of (1.1) and see how we can discretize them without explicitly constructing an underlying γ(t).
Misfit penalty The first term of (1.1) penalizes misfit between the curve γ and the N data points p_i. We introduce indices s_1, ..., s_N such that t_i is closest to τ_{s_i} and define the misfit penalty, E_d, as:

    E_d : \Gamma \to R^+ : \gamma \mapsto E_d(\gamma) = \frac{1}{2} \sum_{i=1}^{N} w_i \|\gamma_{s_i} - p_i\|^2.    (1.3)

We can always choose the sampling times τ_i such that, for some s_i, t_i = τ_{s_i}. There may be multiple data points associated to the same discretization point, i.e., s_i = s_j for some i ≠ j.
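In practice, with the sampling times sorted, the indices s_i can be computed with a binary search; a sketch (the helper name nearest_indices is ours, and the indices here are 0-based):

```python
import numpy as np

def nearest_indices(tau, t):
    """For each data time t_i, return the index s_i of the sampling time
    tau_{s_i} closest to t_i (tau is assumed sorted in increasing order)."""
    tau, t = np.asarray(tau), np.asarray(t)
    # insertion position of each t_i in the sorted tau array
    right = np.searchsorted(tau, t).clip(1, len(tau) - 1)
    left = right - 1
    # pick whichever neighbor is closer
    return np.where(t - tau[left] <= tau[right] - t, left, right)

tau = np.array([0.0, 1.0, 2.0, 3.0, 4.0])   # sampling times tau_i
t = np.array([0.1, 1.9, 4.0])               # data time labels t_i
print(nearest_indices(tau, t))              # -> [0 2 4]
```

Several data times may map to the same index, which matches the remark above that s_i = s_j can occur for i ≠ j.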
Velocity penalty The second term of (1.1) penalizes a kind of L² norm of the first derivative of γ(t). Two things need to be discretized here: (i) the derivative γ̇ and (ii) the integral. Mainstream numerical methods can help us here. We define the velocity penalty, E_v, as:

    E_v : \Gamma \to R^+ : \gamma \mapsto E_v(\gamma) = \frac{1}{2} \sum_{i=1}^{N_d} \alpha_i \|v_i\|^2,    (1.4)

where the velocity approximations v_i are obtained via finite differences and the coefficients α_i correspond to the weights in, e.g., the trapezium rule for numerical integration. For the latter, any method suited for fixed, nonhomogeneous sampling is fine.
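For instance, on a fixed, nonhomogeneous grid the trapezium rule yields weights α_i such that Σ_i α_i f(τ_i) approximates the integral of f over [τ_1, τ_{N_d}]; a minimal sketch:

```python
import numpy as np

def trapezium_weights(tau):
    """Weights alpha such that sum(alpha * f(tau)) approximates the integral
    of f from tau[0] to tau[-1] by the trapezium rule (nonuniform grid)."""
    tau = np.asarray(tau, dtype=float)
    dt = np.diff(tau)                    # step sizes
    alpha = np.zeros_like(tau)
    alpha[:-1] += dt / 2.0               # each interval contributes half of
    alpha[1:] += dt / 2.0                # its length to both of its endpoints
    return alpha

tau = np.array([0.0, 0.5, 2.0, 3.0])
alpha = trapezium_weights(tau)           # -> [0.25, 1.0, 1.25, 0.5]
# Exact for affine integrands: the integral of t over [0, 3] is 4.5.
print(np.dot(alpha, tau))                # -> 4.5
```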
We delay the explicit construction of the formulas for v_i, which is standard material, to section 3.1. The important feature is that v_i ≈ γ̇(τ_i) is a linear combination of γ_i and its neighbor(s). At the end points, v_1 and v_{N_d} are defined as unilateral finite differences.
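To make this concrete, here is a sketch of one standard choice for n = 1: central differences in the interior, unilateral differences at the end points (section 3.1 gives the formulas we actually use, which may differ in the details):

```python
import numpy as np

def velocities(gamma, tau):
    """Finite-difference approximations v_i of the first derivative at each
    sampling time tau_i (central in the interior, one-sided at the ends)."""
    gamma, tau = np.asarray(gamma, float), np.asarray(tau, float)
    v = np.empty_like(gamma)
    v[0] = (gamma[1] - gamma[0]) / (tau[1] - tau[0])           # forward difference
    v[-1] = (gamma[-1] - gamma[-2]) / (tau[-1] - tau[-2])      # backward difference
    v[1:-1] = (gamma[2:] - gamma[:-2]) / (tau[2:] - tau[:-2])  # central differences
    return v

tau = np.array([0.0, 1.0, 2.5, 4.0])
gamma = 2.0 * tau + 1.0            # an affine curve: the exact derivative is 2
print(velocities(gamma, tau))      # -> [2. 2. 2. 2.]
```

Every v_i is indeed a linear combination of γ_i and its neighbors, which is the structural fact exploited in the least-squares formulation of section 1.3.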
Acceleration penalty The third term of (1.1) penalizes the same kind of L² norm of the second derivative of γ(t). The discretization process is very similar to what was done for E_v. We introduce the acceleration penalty E_a as:

    E_a : \Gamma \to R^+ : \gamma \mapsto E_a(\gamma) = \frac{1}{2} \sum_{i=1}^{N_d} \beta_i \|a_i\|^2.    (1.5)

Again, the acceleration vector a_i at γ_i can be obtained via finite differences as a linear combination of γ_i and its neighbors, except at the end points for which we need unilateral differences. The appropriate formulas can be found in section 3.1. The β_i's are adequate weights for a numerical integration method, which need not be the same as the α_i's used for E_v.
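As with the velocities, a sketch of one standard three-point formula for nonuniform grids, exact for quadratics (again, section 3.1 gives the formulas we actually use):

```python
import numpy as np

def accelerations(gamma, tau):
    """Finite-difference approximations a_i of the second derivative at tau_i
    (three-point differences; the end points reuse the nearest interior stencil)."""
    gamma, tau = np.asarray(gamma, float), np.asarray(tau, float)
    h = np.diff(tau)                                   # step sizes h_i
    slope = np.diff(gamma) / h                         # first divided differences
    a = np.empty_like(gamma)
    a[1:-1] = 2.0 * np.diff(slope) / (h[:-1] + h[1:])  # interior second differences
    a[0], a[-1] = a[1], a[-2]                          # unilateral at the ends
    return a

tau = np.array([0.0, 1.0, 2.5, 4.0, 5.0])
gamma = tau**2                      # the exact second derivative is 2
print(accelerations(gamma, tau))    # -> [2. 2. 2. 2. 2.]
```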
The complete objective function is the following, with tuning parameters λ and µ:

    E : \Gamma \to R^+ : \gamma \mapsto E(\gamma) = E_d(\gamma) + \lambda E_v(\gamma) + \mu E_a(\gamma).    (1.6)

We state without proof that, as the maximum discretization step ∆τ_i = τ_{i+1} − τ_i goes to zero, the optimal discrete objective value and the optimal continuous objective value become equivalent. This is a consequence of Taylor's theorem and of the definition of Riemann integrals.

In the next section, we show how (1.6) can be minimized by solving a sparse linear system in a least-squares sense.
1.3 Least-squares formulation
First off, let us observe that the objective function E is decoupled along the n dimensions of the problem. Hence, we can focus on solving the problem in R^n with n = 1. The objective function E,

    E(\gamma) = \frac{1}{2} \sum_{i=1}^{N} w_i \|\gamma_{s_i} - p_i\|^2 + \frac{\lambda}{2} \sum_{i=1}^{N_d} \alpha_i \|v_i\|^2 + \frac{\mu}{2} \sum_{i=1}^{N_d} \beta_i \|a_i\|^2,    (1.7)
can be rewritten as a sum of vector 2-norms:

    E(\gamma) = \frac{1}{2} \left\| \begin{pmatrix} \sqrt{w_1}\,(\gamma_{s_1} - p_1) \\ \vdots \\ \sqrt{w_N}\,(\gamma_{s_N} - p_N) \end{pmatrix} \right\|^2 + \frac{1}{2} \left\| \begin{pmatrix} \sqrt{\lambda \alpha_1}\, v_1 \\ \vdots \\ \sqrt{\lambda \alpha_{N_d}}\, v_{N_d} \end{pmatrix} \right\|^2 + \frac{1}{2} \left\| \begin{pmatrix} \sqrt{\mu \beta_1}\, a_1 \\ \vdots \\ \sqrt{\mu \beta_{N_d}}\, a_{N_d} \end{pmatrix} \right\|^2.
The vectors appearing in this expression are affine combinations of the variables γ_i and hence can be rewritten as (temporarily considering γ as a column vector):

    E(\gamma) = \frac{1}{2} \|P_d \gamma - q_d\|^2 + \frac{1}{2} \|P_v \gamma\|^2 + \frac{1}{2} \|P_a \gamma\|^2
              = \frac{1}{2} \gamma^T (P_d^T P_d + P_v^T P_v + P_a^T P_a) \gamma - q_d^T P_d \gamma + \frac{q_d^T q_d}{2}
              = \frac{1}{2} \gamma^T A \gamma - b^T \gamma + c,
for adequate matrices P_d ∈ R^{N×N_d}, P_v, P_a ∈ R^{N_d×N_d}, a vector q_d ∈ R^N, and the obvious definitions for A, b and c. The vector q_d is the only object that depends on the data points p_i. The matrices P_d, P_v and P_a depend on the sampling times τ_i as well as on the various weights introduced. Hence, A does not depend on the data and must be constructed only once to solve a problem in R^n, even for n > 1. Differentiating and setting the gradient of E to zero, we straightforwardly obtain the linear system

    A\gamma = b,

which needs to be solved n times, for n different b vectors. This problem has a rich structure because of the local nature of the finite differences. Namely, A is 5-diagonal and positive-definite. The regression problem in R^n can thus reliably be solved in O(nN_d) operations. This is a standard result from numerical linear algebra; see for example [VD08].

The discrete regression problem reduces to a highly structured quadratic optimization problem without constraints, which can be solved, e.g., with an LU decomposition and n back-substitutions. If one wants to add constraints (possibly forcing interpolation by some or all of the points p_i, or constraining speed/acceleration to respect lower and/or upper bounds on some time intervals), modern quadratic programming solvers can come in handy.
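To make the construction concrete, here is a small self-contained sketch for n = 1. The matrix names P_d, P_v and P_a follow the text, but the particular stencils, end-point conventions and integration weights are simplifying assumptions chosen for illustration:

```python
import numpy as np

# Data: N points p_i with weights w_i at times t_i; N_d uniform sampling times tau.
Nd = 20
tau = np.linspace(0.0, 1.0, Nd)
h = tau[1] - tau[0]
t = np.array([0.0, 0.3, 0.6, 1.0])
p = np.array([0.0, 1.0, 0.5, -0.2])
w = np.ones_like(p)
lam, mu = 1e-2, 1e-3

s = np.array([np.argmin(np.abs(tau - ti)) for ti in t])  # indices s_i

# P_d: row i picks gamma_{s_i}, scaled by sqrt(w_i); q_d collects sqrt(w_i) p_i.
Pd = np.zeros((len(p), Nd))
Pd[np.arange(len(p)), s] = np.sqrt(w)
qd = np.sqrt(w) * p

# P_v, P_a: finite-difference rows scaled by sqrt(lambda alpha_i), sqrt(mu beta_i);
# only interior rows are used and all integration weights are taken equal to h.
Pv = np.zeros((Nd, Nd))
Pa = np.zeros((Nd, Nd))
for i in range(1, Nd - 1):
    Pv[i, [i - 1, i + 1]] = np.sqrt(lam * h) * np.array([-1.0, 1.0]) / (2 * h)
    Pa[i, [i - 1, i, i + 1]] = np.sqrt(mu * h) * np.array([1.0, -2.0, 1.0]) / h**2

A = Pd.T @ Pd + Pv.T @ Pv + Pa.T @ Pa  # 5-diagonal, positive-definite
b = Pd.T @ qd
gamma = np.linalg.solve(A, b)          # one solve per dimension in general

print(gamma[s])                        # fitted values at the data indices
```

A dense solve is used here for brevity; exploiting the pentadiagonal structure (e.g., with a banded or sparse Cholesky factorization) gives the O(nN_d) cost mentioned above.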
1.4 What if we generalize to Riemannian manifolds?
In the next chapters, we generalize to Riemannian manifolds. In doing so, the lack of a vector space structure raises two issues:

• How can we generalize the objective E? Finite differences, since they are linear combinations, do not make sense on manifolds. The same goes for the Euclidean distance. We will need substitutes for these elements. As we will see in section 3.1, the linear combinations appearing in finite differences are not arbitrary: they make use of particular vectors that, in a Euclidean space, lead from one point to another in the same space. We exploit that.
• Since the objective will not be quadratic anymore, how can we minimize it? We will need optimization algorithms on manifolds. We will investigate iterative descent methods. For this to work, we will need tools to transform curve guesses into other curves, moving in some direction in the curve space.
In the next chapter, we study a few properties and tools of Riemannian geometry. These will
enable us to give a precise meaning to the loose terms used in this section. In the next section,
we give a brief review of the state of the art.
1.5 Short state of the art
Several existing methods for tackling the conflicting nature of the curve fitting problem (i.e., goodness of fit vs. regularity) rely on turning the fitting task into an optimization problem where one of the criteria becomes the objective function and the other criterion is turned into a constraint.

Let us first consider the case where the goodness of fit is optimized under a regularity constraint. When ℳ = R^n, a possible regularity constraint, already considered by Lagrange, is to restrict the curve γ to the family of polynomial functions of degree not exceeding m (m ≤ N − 1). This least-squares problem cannot be straightforwardly generalized to Riemannian manifolds because of the lack of an equivalent of polynomial curves on Riemannian manifolds. The notion of polynomial does not carry over to ℳ in an obvious way. An exception is the case m = 1; the polynomial functions in R^n are then straight lines, whose natural generalization on Riemannian manifolds are geodesics. The problem of fitting geodesics to data on a Riemannian manifold ℳ was considered in [MS06] for the case where ℳ is the special orthogonal group SO(n) or the unit sphere S^n.
The other case is when a regularity criterion is optimized under constraints on the goodness of fit. In this situation, it is common to require a perfect fit: the curve fitting problem becomes an interpolation problem. When ℳ = R^n, the interpolating curves γ that minimize \frac{1}{2} \int_0^1 \|\ddot\gamma(t)\|^2 \, dt (in the appropriate function space) are known as cubic splines. For the case where ℳ is a nonlinear manifold, several results on interpolation can be found in the literature. Crouch and Silva Leite [CS91, CS95] generalized cubic splines to Riemannian manifolds, defined as curves γ that minimize, under interpolation conditions, the function

    \frac{1}{2} \int_{t_1}^{t_N} \left\langle \frac{D^2\gamma}{dt^2}(t), \frac{D^2\gamma}{dt^2}(t) \right\rangle_{\gamma(t)} dt,
where D²γ/dt² denotes the (Levi-Civita) second covariant derivative and ⟨·, ·⟩_x stands for the Riemannian metric on ℳ at x. They gave a necessary condition for optimality in the form of a fourth-order differential equation, which generalizes a result of Noakes et al. [NHP89]. Splines of class C^k were generalized to Riemannian manifolds by Camarinha et al. [CSC95]. Still in the context of interpolation on manifolds, but without a variational interpretation, we mention the literature on splines based on generalized Bézier curves, defined by a generalization to manifolds of the de Casteljau algorithm; see [CKS99, Alt00, PN07]. Recently, Jakubiak et al. [JSR06] presented a geometric two-step algorithm to generate splines of an arbitrary degree of smoothness in Euclidean spaces, then extended the algorithm to matrix Lie groups and applied it to generate smooth motions of 3D objects.

Another approach to interpolation on manifolds consists of mapping the data points onto the tangent space at a particular point of ℳ, then computing an interpolating curve in the tangent space, and finally mapping the resulting curve back to the manifold. The mapping can be defined, e.g., by a rolling procedure; see [HS07, KDL07].
Yet another approach, the one we focus on in this work, consists in optimizing (1.2) directly without constraints on γ other than differentiability conditions. While the concept is well known in R^n, its generalization to Riemannian manifolds is more recent.

In [MSH06], the objective function is defined by equation (1.2) with µ = 0, over the class of all piecewise smooth curves γ : [0, 1] → ℳ, where λ (> 0) is a smoothing parameter. Solutions to this variational problem are piecewise smooth geodesics that best fit the given data. As shown in [MSH06], when λ goes to +∞, the optimal curve converges to a single point which, for certain classes of manifolds, is shown to be the Riemannian mean of the data points. When λ goes to zero, the optimal curve goes to a broken geodesic on ℳ interpolating the data points.
In [MS06], the objective function is defined by equation (1.2) with λ = 0, over a certain set of admissible C^2 curves. The authors give a necessary condition of optimality that takes the form of a fourth-order differential equation involving the covariant derivative and the curvature tensor along with certain regularity conditions at the time instants t_i, i = 1, ..., N [MS06, Th. 4.4]. The optimal curves are approximating cubic splines: they are approximating because in general γ(t_i) differs from p_i, and they are cubic splines because they are obtained by smoothly piecing together segments of cubic polynomials on ℳ, in the sense of Noakes et al. [NHP89]. It is also shown in [MS06, Prop. 4.5] that, as the smoothing parameter µ goes to +∞, the optimal curve converges to the best least-squares geodesic fit to the data points at the given instants of time. When µ goes to zero, the approximating cubic spline converges to an interpolating cubic spline [MS06, Prop. 4.6].

In summary, while the concept of smoothing spline has been satisfactorily generalized to Riemannian manifolds, the emphasis has been laid on studying properties of these curves rather than on proposing efficient methods to compute them. In this work, we focus on computing discrete representations of such curves.
Chapter 2

Elements of Riemannian geometry
In the first chapter, we showed how easily the discrete regression problem can be stated and solved in a vector space. We then made a few statements about where that structure is used and what needs to be done to generalize the concepts to other spaces.

In this chapter, we give a brief overview of some elements of Riemannian geometry. We start by giving a proper definition of manifolds as sets accompanied by atlases, which are sets of charts. We will not explicitly use charts in our applications. However, defining charts and atlases is a necessary effort to build the higher-level tools we do use later on. Loosely, manifolds are sets that locally resemble R^n. Charts are the mathematical concept that embodies this idea. At each point of a manifold we can define a vector space, called the tangent space, that captures this essential property. Inner products are then defined as positive-definite forms on each tangent space. Considering spaces where these inner products vary continuously on the manifold leads to the concept of Riemannian manifolds. Using these inner products, we give a natural definition of distance between points and of gradients of real-valued functions on the manifold. Subsequently, the exponential and logarithmic maps are introduced. They are the key concepts that, later on, will enable us to generalize finite differences and descent methods for optimization. Finally, we introduce parallel translation, which is the tool that lets us compare tangent vectors rooted at different points of a manifold.

Should you not be familiar with this material, we suggest you think of the general manifold ℳ used henceforth simply as the sphere. It is also helpful to think about how the concepts specialize in the case of R^n. This chapter is mainly based on [AMS08, Hai09, Boo86]. All the (beautiful) figures come from the book [AMS08] by Absil et al. I thank the authors for them.
2.1 Charts and manifolds
In this document, we are mainly concerned with embedded Riemannian manifolds (we will define these terms shortly). Nevertheless, it is important to give a proper definition of smooth manifolds first. Intuitively, manifolds are sets that can be locally identified with patches of R^n. These identifications are called charts. A set of charts that covers the whole set is called an atlas for the set. The set and the atlas together constitute a manifold. More formally:

Definition 2.1.1 (chart). Let M be a set. A chart of M is a pair (U, ϕ) where U ⊂ M and ϕ is a bijection between U and an open set of R^n. U is the chart's domain and n is the chart's dimension. Given p ∈ U, the elements of ϕ(p) = (x_1, ..., x_n) are called the coordinates of p in the chart (U, ϕ).
Definition 2.1.2 (compatible charts). Two charts (U, ϕ) and (V, ψ) of M, of dimensions n and m respectively, are smoothly compatible (C^∞-compatible) if either U ∩ V = ∅, or U ∩ V ≠ ∅ and

• ϕ(U ∩ V) is an open set of R^n,
• ψ(U ∩ V) is an open set of R^m,
• ψ ∘ ϕ^{-1} : ϕ(U ∩ V) → ψ(U ∩ V) is a smooth diffeomorphism (i.e., a smooth invertible function with smooth inverse).

When U ∩ V ≠ ∅, the latter implies n = m.
[Figure 2.1: Charts — two overlapping chart domains U and V mapped by ϕ and ψ to open sets of R^n, with transition maps ψ ∘ ϕ^{-1} and ϕ ∘ ψ^{-1} between ϕ(U ∩ V) and ψ(U ∩ V). Figure courtesy of Absil et al., [AMS08].]
Definition 2.1.3 (atlas). A set 𝒜 of pairwise smoothly compatible charts {(U_i, ϕ_i), i ∈ I} such that ∪_{i∈I} U_i = M is a smooth atlas of M.

Two atlases 𝒜_1 and 𝒜_2 are compatible if 𝒜_1 ∪ 𝒜_2 is an atlas. Given an atlas 𝒜, one can generate a unique maximal atlas 𝒜^+. Such an atlas contains 𝒜 as well as all the charts compatible with 𝒜. Classically, we define:

Definition 2.1.4 (manifold). A smooth manifold is a pair ℳ = (M, 𝒜^+), where M is a set and 𝒜^+ is a maximal atlas of M.
Example 2.1.5. The vector space R^n can be endowed with an obvious manifold structure. Simply consider ℳ = (R^n, 𝒜^+) where the atlas 𝒜^+ contains the identity map (R^n, ϕ), ϕ : U = R^n → R^n : x ↦ ϕ(x) = x. Defined as such, ℳ is called a Euclidean space. We always use the notation R^n regardless of whether we consider the vector space or the Euclidean space.

Oftentimes, we will refer to M when we really mean ℳ and vice versa. Once the differential manifold structure is clearly stated, no confusion is possible. For example, the notation ℳ ⊂ R^n means M ⊂ R^n.
Definition 2.1.6 (dimension). Given a manifold ℳ = (M, 𝒜⁺), if all the charts of 𝒜⁺ have the same dimension n, we call n the dimension of the manifold.
We need one last definition to assess smoothness of curves defined on manifolds:
Definition 2.1.7 (smooth mapping). Let ℳ and 𝒩 be two smooth manifolds. A mapping f : ℳ → 𝒩 is of class C^k if, for all p in ℳ, there is a chart (U, ϕ) of ℳ and a chart (V, ψ) of 𝒩 such that p ∈ U, f(U) ⊂ V and

ψ ◦ f ◦ ϕ⁻¹ : ϕ(U) → ψ(V)

is of class C^k. The latter is called the local expression of f in the charts (U, ϕ) and (V, ψ). It maps between open sets of Euclidean spaces, hence all the classic tools apply.
This definition does not depend on the choice of charts.
In the next section, we introduce tangent spaces and tangent vectors. Those are objects that
give a local representation of manifolds as vector spaces.
2.2 Tangent spaces and tangent vectors
As is customary in differential geometry, we will define tangent vectors as equivalence classes of functions. This surprising detour from the very simple idea underlying tangent vectors (namely that they point in directions one can follow at a given point on a manifold), once again, comes from the lack of a vector space structure. We first construct a simpler definition. In R^n, one can define derivatives for curves c : R → R^n:

c'(0) := lim_{t→0} (c(t) − c(0)) / t.

The difference appearing in the numerator does not, in general, make sense for manifolds. For manifolds embedded in R^n, however (like, e.g., the sphere), we can still make sense of the definition with the appropriate space identifications. A simple definition of tangent spaces in this setting follows.
Definition 2.2.1 (tangent spaces for manifolds embedded in R^n). Let ℳ ⊂ R^n be a smooth manifold. The tangent space at x ∈ ℳ, noted T_x ℳ, is the vector subspace of R^n defined by:

T_x ℳ = {v ∈ R^n : v = c'(0) for some smooth c : R → ℳ such that c(0) = x}.

The dimension of T_x ℳ is the dimension of a chart of ℳ containing x.
The following theorem is useful when dealing with manifolds defined by equality constraints on the Cartesian coordinates.

Theorem 2.2.2. Let ℳ be a subset of R^n. Statements (1) and (2) are equivalent:
(1) ℳ is a smooth submanifold of R^n of dimension n − m;
(2) for all x ∈ ℳ, there is an open set V of R^n containing x and a smooth function f : V → R^m such that the differential df_x : R^n → R^m has rank m and V ∩ ℳ = f⁻¹(0).
Furthermore, the tangent space at x is given by T_x ℳ = ker df_x.
We did not define the term submanifold properly. Essentially, (1) means that ℳ is a manifold endowed with the differential structure naturally inherited from R^n.
Example 2.2.3. An example of a smooth, two-dimensional submanifold of R^3 is the sphere S^2 = {x ∈ R^3 : x^T x = 1}. Use f : R^3 → R : x ↦ f(x) = x^T x − 1 in theorem 2.2.2. The tangent spaces are then T_x S^2 = {v ∈ R^3 : v^T x = 0}. We illustrate this on figure 2.2.
Figure 2.2: Tangent space on the sphere. Since S^2 is an embedded submanifold of R^3, the tangent space T_x S^2 can be pictured as the plane tangent to the sphere at x, with origin at x. Figure courtesy of Absil et al., [AMS08].
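To make definition 2.2.1 concrete, here is a small numerical sketch (ours, using NumPy, not taken from the text) that differentiates a curve on S^2 and checks the tangency condition v^T x = 0 of example 2.2.3:

```python
import numpy as np

# A curve on the sphere S^2: c(t) = (cos t, sin t, 0) runs along the equator,
# with c(0) = x = (1, 0, 0).
def c(t):
    return np.array([np.cos(t), np.sin(t), 0.0])

x = c(0.0)

# Approximate c'(0) with a central difference; the limit lives in T_x S^2.
h = 1e-6
v = (c(h) - c(-h)) / (2 * h)

# v should satisfy the tangency condition v^T x = 0 of Example 2.2.3.
print(float(v @ x))   # ≈ 0
```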
In general, ℳ is not embedded in R^n. We now give a more general definition of tangent vectors that does not need the manifold to be embedded in a vector space. Let ℳ be a smooth manifold and p be a point on ℳ. We define:

C_p = {c : I → ℳ : c ∈ C^1, 0 ∈ I an open interval in R, c(0) = p},

the set of differentiable curves on ℳ passing through p at t = 0. Here, c ∈ C^1 is to be understood with definition 2.1.7, with the obvious manifold structure on open intervals of R derived from example 2.1.5. We define an equivalence relation on C_p, noted ∼. Let (U, ϕ) be a chart of ℳ such that p ∈ U, and let c_1, c_2 ∈ C_p. Then, c_1 ∼ c_2 if and only if ϕ ◦ c_1 and ϕ ◦ c_2 have the same derivative at 0, i.e.:

c_1 ∼ c_2 ⇔ (d/dt) ϕ(c_1(t))|_{t=0} = (d/dt) ϕ(c_2(t))|_{t=0}.
It is easy to prove that this is independent of the choice of chart.
Definition 2.2.4 (tangent space, tangent vector). The tangent space to ℳ at p, noted T_p ℳ, is the quotient space:

T_p ℳ = C_p / ∼.

Given c ∈ C_p, the equivalence class [c] is an element of T_p ℳ that we call a tangent vector to ℳ at p.
The mapping

θ_p^ϕ : T_p ℳ → R^n : [c] ↦ θ_p^ϕ([c]) = (d/dt) ϕ(c(t))|_{t=0}

is bijective and naturally defines a vector space structure over T_p ℳ. This structure, again, is independent of the chart. When ℳ ⊂ R^n, it is possible to build a vector space isomorphism (i.e., an invertible linear map) proving that the two definitions 2.2.1 and 2.2.4 are, essentially, equivalent.
Definition 2.2.5 (tangent bundle). Let ℳ be a smooth manifold. The tangent bundle, noted Tℳ, is the set:

Tℳ = ⊔_{p∈M} T_p ℳ,

where ⊔ stands for disjoint union. We define π as the natural projection on the roots of vectors, i.e., π(ξ) = p if and only if ξ ∈ T_p ℳ.
The tangent bundle is itself a manifold. This enables us to define vector fields on manifolds as smooth mappings.
Definition 2.2.6 (vector field). A vector field is a smooth mapping X from ℳ to Tℳ such that π ◦ X = Id, the identity map. The vector at p will be written as X_p or X(p) and lies in T_p ℳ.
An important example of a vector field is the gradient of a scalar field on a manifold, which we will define later on and extensively use in our optimization algorithms.
Let us introduce an alternative notation for tangent vectors to curves (velocity vectors) that will make things look more natural in the sequel. Given a curve of class C^1, γ : [a, b] → ℳ, and t ∈ [a, b], define another such curve on ℳ:

γ_t : [a − t, b − t] → ℳ : τ ↦ γ_t(τ) = γ(t + τ).

This curve is such that γ_t(0) = γ(t) =: p, and [γ_t] ∈ T_p ℳ is a vector tangent to γ at time t. We propose to write

γ̇(t) := [γ_t].

When using definition 2.2.1, γ̇(t) is identified with γ'(t). This notation has the advantage of capturing the nature of this vector (a velocity vector).
2.3 Inner products and Riemannian manifolds
Exploiting the vector space structure of tangent spaces, we define inner products as follows:
Definition 2.3.1 (inner product). Let ℳ be a smooth manifold and fix p ∈ ℳ. An inner product ⟨·, ·⟩_p on T_p ℳ is a bilinear, symmetric, positive-definite form on T_p ℳ, i.e., ∀ξ, ζ, η ∈ T_p ℳ, a, b ∈ R:
• ⟨aξ + bζ, η⟩_p = a⟨ξ, η⟩_p + b⟨ζ, η⟩_p,
• ⟨ξ, ζ⟩_p = ⟨ζ, ξ⟩_p,
• ⟨ξ, ξ⟩_p ≥ 0, and ⟨ξ, ξ⟩_p = 0 ⇔ ξ = 0.
Often, when it is clear from the context that ξ and η are rooted at p, i.e., ξ, η ∈ T_p ℳ, we write ⟨ξ, η⟩ instead of ⟨ξ, η⟩_p. The norm of a tangent vector ξ ∈ T_p ℳ is ‖ξ‖_p = √⟨ξ, ξ⟩_p.
Let ℳ be a smooth manifold of dimension n and p be a point on ℳ. Let (U, ϕ) be a chart of ℳ such that p ∈ U, and let (e_1, ..., e_n) be a basis of R^n. We define the tangent vectors E_i = (θ_p^ϕ)⁻¹(e_i) for i ranging from 1 to n. We state that every vector ξ ∈ T_p ℳ can be uniquely decomposed as ξ = Σ_{i=1}^n ξ^i E_i, i.e., (E_1, ..., E_n) is a basis of T_p ℳ. Now using the inner product of definition 2.3.1, we define g_{ij} = ⟨E_i, E_j⟩_p. Hence, decomposing ζ = Σ_{i=1}^n ζ^i E_i like we did for ξ, we have:

⟨ξ, ζ⟩_p = Σ_{i=1}^n Σ_{j=1}^n g_{ij} ξ^i ζ^j = ξ̂^T G_p ζ̂,

where we defined the vectors ξ̂ = (ξ^1, ..., ξ^n)^T, ζ̂ = (ζ^1, ..., ζ^n)^T and the matrix G_p such that (G_p)_{ij} = ⟨E_i, E_j⟩_p = g_{ij}. The matrix G_p is symmetric, positive-definite.
Definition 2.3.2 (Riemannian manifold). A Riemannian manifold is a pair (ℳ, g), where ℳ is a smooth manifold and g is a Riemannian metric. A Riemannian metric is a smoothly varying inner product defined on the tangent spaces of ℳ, i.e., for each p ∈ ℳ, g_p(·, ·) is an inner product on T_p ℳ.
In this definition, smoothly varying means that the functions (G_p)_{ij}, which are functions from U to R, are smooth, in the sense described in definition 2.1.7, with the obvious manifold structure on R.
As is customary, we will often refer to a Riemannian manifold (ℳ, g) simply as ℳ when the metric is clear from the context. From now on, when we use the term manifold, unless otherwise stated, we refer to a smooth, finite-dimensional Riemannian manifold.
2.4 Scalar fields on manifolds and the gradient vector field
Let ℳ be a smooth manifold. A scalar field on ℳ is a smooth function f : ℳ → R. We now generalize the concept of directional derivatives to scalar fields on manifolds.
Definition 2.4.1 (directional derivative). The directional derivative of a scalar field f on ℳ at p ∈ ℳ in the direction ξ = [c] ∈ T_p ℳ is the scalar:

Df(p)[ξ] = (d/dt) f(c(t))|_{t=0}.
It is easily checked that this definition does not depend on the choice of c, the representative of the equivalence class ξ. In the above notation, the brackets around ξ are a convenient way of denoting that ξ is the direction. They do not mean that we are considering some sort of equivalence class of ξ (that would not make sense).
The following definition is of major importance for our purpose. It generalizes gradients of scalar fields to manifolds.
Definition 2.4.2 (gradient). Let f be a scalar field on a smooth, finite-dimensional Riemannian manifold ℳ. The gradient of f at p, denoted by grad f(p), is defined as the unique element of T_p ℳ satisfying:

Df(p)[ξ] = ⟨grad f(p), ξ⟩_p, ∀ξ ∈ T_p ℳ.

Thus, grad f : ℳ → Tℳ is a vector field on ℳ.
For a scalar field f on a Euclidean space, grad f is the usual gradient, which we note ∇f. Remarkably, and similarly to the Euclidean case, the gradient defined above is the steepest-ascent vector field, and the norm ‖grad f(p)‖_p is the steepest slope of f at p.
Based on this definition, one way to derive an expression for the gradient of a scalar field f is to work out an expression for the directional derivative of f, according to definition 2.4.1, then to write it as an inner product suitable for direct identification. That might not be practical. Theorem 2.4.5 provides an alternative, often shorter, way for Riemannian submanifolds of Euclidean spaces. The proof was derived independently but undoubtedly resembles existing proofs. We first need a definition.
Definition 2.4.3 (Riemannian submanifold). A Riemannian submanifold ℳ of R^n is a submanifold of R^n (as indirectly defined by theorem 2.2.2) equipped with the Riemannian metric inherited from R^n, i.e., the metric on ℳ is obtained by restricting the metric on R^n to ℳ.
Riemannian submanifolds of R^n are quite common and easy to picture. In particular, since such manifolds are embedded in R^n, following definition 2.2.1, the tangent space T_x ℳ is a vector subspace of R^n. We can thus define (unique) orthogonal projectors

P_x : R^n → T_x ℳ : v ↦ P_x v, and
P_x^⊥ : R^n → T_x^⊥ ℳ : v ↦ P_x^⊥ v

such that v = P_x v + P_x^⊥ v. This follows from the direct sum decomposition R^n = T_x ℳ ⊕ T_x^⊥ ℳ.
Example 2.4.4 (continued from example 2.2.3). The Riemannian metric on the sphere is obtained by restricting the metric on R^3 to S^2. Hence, for x ∈ S^2 and v_1, v_2 ∈ T_x S^2, ⟨v_1, v_2⟩_x = v_1^T v_2. The orthogonal projector on the tangent space T_x S^2 is P_x = I − xx^T.
Theorem 2.4.5. Let ℳ ⊂ R^n be a Riemannian submanifold of R^n equipped with the inner product ⟨·, ·⟩ and let f̄ : R^n → R be a smooth scalar field. Let f = f̄|_ℳ be the restriction of f̄ to ℳ. Then:

grad f : ℳ → Tℳ : x ↦ grad f(x) = P_x ∇f̄(x),

where P_x is the orthogonal projector from R^n onto T_x ℳ.
Proof. Let x ∈ ℳ. For all v ∈ T_x ℳ, consider a smooth curve c_v : R → ℳ such that c_v(0) = x and c_v'(0) = v. Then, by definition 2.4.1:

Df(x)[v] = (d/dt) f(c_v(t))|_{t=0} = (d/dt) f̄(c_v(t))|_{t=0} = ⟨∇f̄(x), c_v'(0)⟩.

Decompose the gradient in its tangent and normal components:

Df(x)[v] = ⟨P_x ∇f̄(x) + P_x^⊥ ∇f̄(x), v⟩.

Since v is a tangent vector, the normal component contributes nothing:

Df(x)[v] = ⟨P_x ∇f̄(x), v⟩.

The result follows immediately from definition 2.4.2.
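As a sanity check of theorem 2.4.5, the following sketch (ours, not from the thesis) computes grad f(x) = P_x ∇f̄(x) for the restriction to S^2 of the hypothetical field f̄(x) = x^T A x, and verifies the defining property of definition 2.4.2 against a finite difference along a great circle:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 3)); A = A + A.T       # symmetric, so grad of fbar is 2 A x

fbar = lambda x: x @ A @ x
x = rng.standard_normal(3); x /= np.linalg.norm(x)  # a point on S^2
P = np.eye(3) - np.outer(x, x)                      # projector onto T_x S^2

grad = P @ (2 * A @ x)                              # Theorem 2.4.5

# Check Df(x)[v] = <grad f(x), v> for a tangent direction v, using the
# great circle c(t) = cos(t||v||) x + sin(t||v||) v/||v||, which stays on S^2.
v = P @ rng.standard_normal(3)
nv = np.linalg.norm(v)
c = lambda t: np.cos(t * nv) * x + np.sin(t * nv) * v / nv
h = 1e-6
dd = (fbar(c(h)) - fbar(c(-h))) / (2 * h)           # directional derivative
print(dd, grad @ v)                                  # the two should agree
```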
2.5 Connections and covariant derivatives
Let ℳ be a Riemannian manifold and X, Y be vector fields on ℳ. We would like to define the derivative of Y at x in the direction X_x. If ℳ were a Euclidean space, we would write:

DY(x)[X_x] = lim_{t→0} (Y(x + tX_x) − Y(x)) / t.

Of course, when ℳ is not a vector space, the above equation does not make sense because x + tX_x is undefined. Furthermore, even if we give meaning to this sum (and we will, in section 2.7), Y(x + tX_x) and Y(x) would not belong to the same vector space. Hence their difference would be undefined too.
To overcome these difficulties, we need the concept of a connection. Connections can be defined generically for manifolds. Since we are mainly interested in Riemannian manifolds, we focus on the so-called Riemannian, or Levi-Civita, connection. The derivatives defined via these connections are notably interesting because they give a coordinate-free means of defining acceleration along a curve (i.e., the derivative of the velocity vector) as well as the Hessian of a scalar field (i.e., the derivative of the gradient vector field). By coordinate-free, we mean that the choice of charts is made irrelevant.
We now quickly go over the definition of affine connections for manifolds and the Levi-Civita theorem, specific to Riemannian manifolds. We then show a result for Riemannian submanifolds and promptly specialize it to submanifolds of Euclidean spaces. The latter is the important result for our study and comes in a simple form. The reader can safely skim through the technical definitions that we have to introduce in order to get there.
Definition 2.5.1 (affine connection). Let 𝔛(ℳ) denote the set of smooth vector fields on ℳ and 𝔉(ℳ) denote the set of smooth scalar fields on ℳ. An affine connection ∇ on a manifold ℳ is a mapping

∇ : 𝔛(ℳ) × 𝔛(ℳ) → 𝔛(ℳ),

which is denoted by (X, Y) ↦ ∇_X Y and satisfies the following properties:
(1) 𝔉(ℳ)-linearity in X: ∇_{fX+gY} Z = f ∇_X Z + g ∇_Y Z,
(2) R-linearity in Y: ∇_X(aY + bZ) = a ∇_X Y + b ∇_X Z,
(3) product rule (Leibniz' law): ∇_X(fY) = (Xf)Y + f ∇_X Y,
in which X, Y, Z ∈ 𝔛(ℳ), f, g ∈ 𝔉(ℳ) and a, b ∈ R. We have used a standard interpretation of vector fields as derivations on ℳ: the notation Xf stands for the scalar field on ℳ such that Xf(p) = Df(p)[X_p]. The above properties should be compared to the usual properties
of derivations in R^n. Every smooth manifold admits infinitely many affine connections. This approach is called an axiomatization: we state the properties we desire in the definition, then only investigate whether such objects exist.
Definition 2.5.2 (covariant derivative). The vector field ∇_X Y is called the covariant derivative of Y with respect to X for the affine connection ∇.
Essentially, at each point p ∈ ℳ, ∇_X Y(p) is a vector telling us how the vector field Y changes in the direction X_p. This interpretation shows that it is not essential that the vector fields X and Y be defined over all of ℳ: we can define covariant derivatives along curves. Typically, Y would be defined on the trajectory of a curve γ and the derivative direction X_{γ(t)} would be equal to γ̇(t), the tangent vector to the curve at γ(t), i.e., the velocity vector.
The following example shows a natural affine connection in Euclidean space.
Example 2.5.3. In R^n, the classical directional derivative defines an affine connection:

(∇_X Y)_x = lim_{t→0} (Y(x + tX_x) − Y(x)) / t = DY(x)[X_x].
This should give us confidence that definition 2.5.1 is a good definition. As often, the added structure of Riemannian manifolds makes for stronger results: the Levi-Civita theorem singles out one particular affine connection for each Riemannian manifold.
Theorem 2.5.4 (Levi-Civita). On a Riemannian manifold ℳ there exists a unique affine connection ∇ that satisfies
(1) ∇_X Y − ∇_Y X = [X, Y] (symmetry), and
(2) Z⟨X, Y⟩ = ⟨∇_Z X, Y⟩ + ⟨X, ∇_Z Y⟩ (compatibility with the Riemannian metric),
for all X, Y, Z ∈ 𝔛(ℳ). This affine connection is called the Levi-Civita connection or the Riemannian connection.
In the above theorem, we used the notation [X, Y] for the Lie bracket of X and Y, which is the vector field defined by [X, Y]f = X(Yf) − Y(Xf), ∀f ∈ 𝔉(ℳ), again using the interpretation of vector fields as derivations. Not surprisingly, the connection exhibited in example 2.5.3 is the Riemannian connection on Euclidean spaces. From now on, we always work with the Levi-Civita connection. The next theorem, taken from [AMS08], is an important result about the Riemannian connection of a submanifold of a Riemannian manifold. The situation is illustrated on figure 2.3.
Figure 2.3: Riemannian connection ∇̄ in a Euclidean space ℳ̄ applied to a vector field X tangent to a circle ℳ. We observe that ∇̄_X X is not tangent to the circle, hence simply restricting ∇̄ to the circle is not an option. As theorem 2.5.5 shows, we need to project (∇̄_X X)_x on the tangent space T_x ℳ to obtain (∇_X X)_x. Figure courtesy of Absil et al., [AMS08].
Theorem 2.5.5. Let ℳ be a Riemannian submanifold of a Riemannian manifold ℳ̄, and let ∇ and ∇̄ denote the Riemannian connections on ℳ and ℳ̄. Then,

(∇_X Y)_p = P_p (∇̄_X Y)_p

for all X_p ∈ T_p ℳ and Y ∈ 𝔛(ℳ).
In particular, for Euclidean spaces (example 2.5.3),

(∇_X Y)_x = P_x (DY(x)[X_x]).   (2.1)
This means that the Riemannian connection on ℳ can be computed via a classical directional derivative in the embedding space, followed by a projection on the tangent space. Equation (2.1) is the most important result of this section: it gives us a practical means of computing derivatives of vector fields on the sphere, for example, which is useful in chapter 4.
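Equation (2.1) can be checked numerically. In this sketch (our own illustration, not from the text), Y(x) = P_x a is the tangent part of a constant field a on S^2; we differentiate it classically in R^3, project, and compare with the closed form one obtains by hand:

```python
import numpy as np

rng = np.random.default_rng(1)
a = rng.standard_normal(3)

# Vector field on S^2: Y(x) = P_x a = a - (x^T a) x, the tangent part of a.
Y = lambda x: a - (x @ a) * x

x = rng.standard_normal(3); x /= np.linalg.norm(x)
P = np.eye(3) - np.outer(x, x)
v = P @ rng.standard_normal(3)          # direction X_x in T_x S^2

# Classical directional derivative DY(x)[v] in the embedding space R^3 ...
h = 1e-6
DY = (Y(x + h * v) - Y(x - h * v)) / (2 * h)
# ... followed by a projection onto T_x S^2, per equation (2.1).
cov = P @ DY

# For this particular field, differentiating by hand gives
# DY(x)[v] = -(v^T a) x - (x^T a) v, whose tangent part is -(x^T a) v.
print(cov, -(x @ a) * v)
```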
The acceleration along a curve can now be defined.

Definition 2.5.6 (acceleration along a curve). Let γ : R → ℳ be a C^2 curve on ℳ. The acceleration along γ is given by:

(D²/dt²) γ(t) = (D/dt) γ̇(t) = ∇_{γ̇(t)} γ̇(t).

In the above equation, we abused the notation for γ̇, which is tacitly supposed to be smoothly extended to a vector field X ∈ 𝔛(ℳ) such that X_{γ(t)} = γ̇(t). For submanifolds of Euclidean spaces, by equation (2.1) and using definition 2.2.1 for tangent vectors, this reduces to:

(D²/dt²) γ(t) = P_{γ(t)} (d²/dt²) γ(t).

An axiomatic version of this definition is given in [AMS08, p. 102]. It has the added advantage of showing linearity of the operator D/dt, as well as a natural product and composition rule.
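For instance, on the sphere, a great circle γ(t) = cos(t) x + sin(t) u (with x, u orthonormal) has ambient second derivative −γ(t), which is purely normal at γ(t), so the projected acceleration vanishes. A quick numerical illustration (our sketch):

```python
import numpy as np

# A unit-speed great circle through x: gamma(t) = cos(t) x + sin(t) u.
x = np.array([1.0, 0.0, 0.0])
u = np.array([0.0, 1.0, 0.0])
gamma = lambda t: np.cos(t) * x + np.sin(t) * u

t, h = 0.3, 1e-4
g = gamma(t)
# Classical second derivative in R^3 (central difference); equals -gamma(t).
acc = (gamma(t + h) - 2 * g + gamma(t - h)) / h**2
# Projecting on the tangent space kills the normal component entirely:
P = np.eye(3) - np.outer(g, g)
print(np.linalg.norm(P @ acc))   # ≈ 0: a great circle has zero acceleration
```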
2.6 Distances and geodesic curves
The availability of inner products on the tangent spaces makes for an easy definition of curve length and distance.
Definition 2.6.1 (length of a curve). The length of a curve of class C^1, γ : [a, b] → ℳ, on a Riemannian manifold (ℳ, g), with ⟨ξ, η⟩_p := g_p(ξ, η), is defined by

L(γ) = ∫_a^b √⟨γ̇(t), γ̇(t)⟩_{γ(t)} dt = ∫_a^b ‖γ̇(t)‖_{γ(t)} dt.

If ℳ is embedded in R^n, γ̇(t) can be replaced by γ'(t), where γ is considered to be a function from [a, b] to R^n and with the right definition of g.
Definition 2.6.2 (Riemannian distance). The Riemannian distance (or geodesic distance) on ℳ is given by:

dist : ℳ × ℳ → R⁺ : (p, q) ↦ dist(p, q) = inf_{γ∈Γ} L(γ),

where Γ is the set of all C^1 curves γ : [0, 1] → ℳ such that γ(0) = p and γ(1) = q.
Under very reasonable conditions (see [AMS08, p. 46]), one can show that the Riemannian distance defines a metric. The definition above captures the idea that the distance between two points is the length of the shortest path joining these two points. In a Euclidean space, such a path would simply be the line segment joining the points. Another characteristic of line segments, seen as curves with arc-length parameterization, is that they have zero acceleration. The next definition generalizes the concept of straight lines, preserving this zero-acceleration characteristic, to Riemannian manifolds.
Definition 2.6.3 (geodesic curve). A curve γ : R → ℳ is a geodesic curve if and only if (D²/dt²) γ(t) ≡ 0, i.e., if it has zero acceleration on all of its domain.
For close points, geodesic curves are shortest paths. This is not true for arbitrary pairs of points on a geodesic, though. Indeed, think of two points on the equator of the unit sphere in R^3, at geodesic distance r from each other. The equator itself, parameterized by arc length, is a geodesic. Following this geodesic, one can join the two points via a path of length r or a path of length 2π − r. Unless r = π, one of these paths is bound to be suboptimal.
2.7 Exponential and logarithmic maps
Exponentials are mappings that, given a point x on a manifold and a tangent vector ξ at x, generalize the concept of "x + ξ". In a Euclidean space, the sum x + ξ is a point in space that can be reached by leaving x in the direction ξ. With exponentials, Exp_x(ξ) is a point on the manifold that can be reached by leaving x and moving in the ξ direction while remaining on the manifold. Furthermore, the trajectory followed is a geodesic (zero acceleration) and the distance traveled equals the norm of ξ.
Definition 2.7.1 (exponential map). Let ℳ be a Riemannian manifold and x ∈ ℳ. For every ξ ∈ T_x ℳ, there exists an open interval I ∋ 0 and a unique geodesic γ(t; x, ξ) : I → ℳ such that γ(0) = x and γ̇(0) = ξ. Moreover, we have the homogeneity property γ(t; x, aξ) = γ(at; x, ξ). The mapping

Exp_x : T_x ℳ → ℳ : ξ ↦ Exp_x(ξ) = γ(1; x, ξ)

is called the exponential map at x. In particular, Exp_x(0) = x, ∀x ∈ ℳ.
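On the sphere, the exponential map has a well-known closed form, Exp_x(ξ) = cos(‖ξ‖) x + sin(‖ξ‖) ξ/‖ξ‖, which follows the great circle leaving x along ξ. A minimal sketch (ours; the thesis treats the sphere in chapter 4):

```python
import numpy as np

def sphere_exp(x, xi):
    """Exponential map on the sphere: follow the great circle leaving x along xi."""
    n = np.linalg.norm(xi)
    if n < 1e-16:
        return x.copy()
    return np.cos(n) * x + np.sin(n) * (xi / n)

x = np.array([0.0, 0.0, 1.0])
xi = np.array([np.pi / 2, 0.0, 0.0])   # tangent at the north pole, norm pi/2
y = sphere_exp(x, xi)
print(y)                                # lands on the equator, at (1, 0, 0)
```

The geodesic distance traveled, arccos(x^T y) = π/2, equals ‖ξ‖, as the definition requires.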
Exponentials can be expensive to compute. The following concept, retractions, has a simpler definition that captures the most important aspects of exponentials as far as we are concerned. Essentially, we drop the requirement that the trajectory be a geodesic, as well as the equality between the distance traveled and ‖ξ‖. Retractions were introduced by Absil et al. in [AMS08]. Figure 2.4 illustrates the concept.
Figure 2.4: Retraction. Figure courtesy of Absil et al., [AMS08].
Definition 2.7.2 (retraction). A retraction on a manifold ℳ is a smooth mapping R from the tangent bundle Tℳ onto ℳ with the following properties. Let R_x denote the restriction of R to T_x ℳ.
(1) R_x(0) = x, where 0 is the zero element of T_x ℳ.
(2) The differential (DR_x)_0 : T_0(T_x ℳ) ≃ T_x ℳ → T_x ℳ is the identity map on T_x ℳ:
(DR_x)_0 = Id (local rigidity).
Equivalently, the local rigidity condition can be stated as: ∀ξ ∈ T_x ℳ, the curve γ_ξ : t ↦ R_x(tξ) satisfies γ̇_ξ(0) = ξ. In particular, an exponential map is a retraction. One can think of retractions as mappings that share the important properties we need with the exponential map, while being defined in a flexible enough way that we will be able to propose retractions that are, computationally, cheaper than exponentials. Retractions are the core concept needed to generalize descent algorithms to manifolds, see sections 3.4 and 3.5.
A related concept is the logarithmic map. Not surprisingly, it is defined as the inverse mapping of the exponential map. For two points x and y, logarithms generalize the concept of "y − x".
Definition 2.7.3 (logarithmic map). Let ℳ be a Riemannian manifold. We define

Log_x : ℳ → T_x ℳ : y ↦ Log_x(y) = ξ, such that Exp_x(ξ) = y and ‖ξ‖_x = dist(x, y).

Given a root point x and a target point y, the logarithmic map returns a tangent vector at x pointing toward y and such that ‖Log_x(y)‖ = dist(x, y). As is, this definition is not perfect.
There might indeed be more than one eligible ξ. For example, think of the sphere S^2 and place x and y at opposite poles: for any vector η ∈ T_x S^2 such that ‖η‖ = π, we have Exp_x(η) = y. For a more careful definition of the logarithm, see, e.g., [dC92]. As long as x and y are not "too far apart", this definition is satisfactory. Again, computing this map can be computationally expensive. Moreover, we will need to differentiate it with respect to x and y later on. We therefore pragmatically introduce, in this work, the notion of generalized logarithm, just like we introduced retractions as proxies for the exponential.
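On the sphere, a standard closed form is Log_x(y) = θ w/‖w‖ with θ = arccos(x^T y) and w = P_x(y − x) = y − (x^T y)x. The following sketch (ours) implements it:

```python
import numpy as np

def sphere_log(x, y):
    """Logarithmic map on the sphere: tangent vector at x pointing toward y,
    with norm equal to the geodesic distance arccos(x^T y)."""
    theta = np.arccos(np.clip(x @ y, -1.0, 1.0))
    if theta < 1e-16:
        return np.zeros_like(x)
    w = y - (x @ y) * x            # tangent component of y at x
    return theta * w / np.linalg.norm(w)

x = np.array([0.0, 0.0, 1.0])
y = np.array([1.0, 0.0, 0.0])
xi = sphere_log(x, y)
print(xi, np.linalg.norm(xi))      # points along e_1, with norm pi/2 = dist(x, y)
```

Note that at the antipode, x^T y = −1, the tangent component w vanishes and the direction is indeed ambiguous, exactly the defect discussed above.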
The inverse function theorem on manifolds can be stated as follows; see, e.g., [Boo86].

Theorem 2.7.4 (inverse function theorem). Let ℳ, 𝒩 be smooth manifolds and let f : ℳ → 𝒩 be a smooth function. If the differential of f at x ∈ ℳ,

Df_x : T_x ℳ → T_{f(x)} 𝒩,

is a vector space isomorphism (i.e., an invertible linear map), then there exists an open neighborhood U ⊂ ℳ of x such that the restriction

f|_U : U → f(U)

is a diffeomorphism (i.e., a smooth invertible function with smooth inverse). Furthermore, writing g = f|_U, we have (Dg⁻¹)_{g(x)} = (Dg_x)⁻¹.
Let us consider a retraction R on a smooth manifold ℳ and a point x ∈ ℳ. The map

R_x : T_x ℳ → ℳ

is a smooth function from the smooth manifold T_x ℳ to ℳ. Furthermore, according to definition 2.7.2 of retractions, the differential

(DR_x)_0 : T_0(T_x ℳ) ≃ T_x ℳ → T_x ℳ

is the identity map Id on T_x ℳ which, of course, is a vector space isomorphism. By the inverse function theorem, there exist a neighborhood U ⊂ T_x ℳ of 0 and a smooth function

L_x : R_x(U) ⊂ ℳ → U

such that L_x(R_x(ξ)) = ξ, ∀ξ ∈ U. In particular, L_x(x) = 0 since R_x(0) = x. Since Id⁻¹ = Id, we also have that the differential

(DL_x)_x : T_x ℳ → T_0(T_x ℳ) ≃ T_x ℳ

is the identity map on T_x ℳ. We introduce a new definition for the purpose of this work.
Definition 2.7.5 (generalized logarithm). A generalized logarithm is a smooth mapping L from ℳ × ℳ to Tℳ with the following properties. Let L_x : ℳ → T_x ℳ denote the restriction of L to {x} × ℳ, with x ∈ ℳ.
(1) L_x(x) = 0, where 0 is the zero element of T_x ℳ.
(2) The differential (DL_x)_x : T_x ℳ → T_0(T_x ℳ) ≃ T_x ℳ is the identity map on T_x ℳ:
(DL_x)_x = Id.
Based on the considerations above, generalized logarithms can be constructed as extensions
of local inverses of retractions. In particular, we can think of Log as a generalized logarithm
constructed from the retraction Exp.
Let us exhibit another way of constructing generalized logarithms on a smooth manifold ℳ. Given x ∈ ℳ and a smooth scalar field f_x : ℳ → R, let us consider the smooth map

L_x : ℳ → T_x ℳ : y ↦ L_x(y) = f_x(y) Log_x(y).

Exploiting the fact that Log is a generalized logarithm (so that Log_x(x) = 0), we verify that L_x(x) = 0 and that

(DL_x)_x = f_x(x) (D Log_x)_x + Log_x(x) Df_x(x) = f_x(x) Id

is the identity map on T_x ℳ if and only if f_x(x) = 1. Hence, when Log_x(y) admits an expression in the form of a vector scaled by a complicated nonlinear function, a generalized logarithm may sometimes be constructed by (carefully) getting rid of the scaling factor. We do this for the sphere in section 4.1.
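For instance, on the sphere, Log_x(y) = (θ/sin θ) P_x(y − x) with θ = arccos(x^T y), and θ/sin θ → 1 as y → x; dropping that factor leaves the candidate L_x(y) = P_x(y − x). The following check (our sketch, not necessarily the construction of section 4.1) verifies both properties of definition 2.7.5 numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.standard_normal(3); x /= np.linalg.norm(x)
P = np.eye(3) - np.outer(x, x)

# Candidate generalized logarithm: drop the theta/sin(theta) factor of Log_x.
L = lambda y: P @ (y - x)

# Property (1): L_x(x) = 0.
print(np.linalg.norm(L(x)))

# Property (2): (DL_x)_x = Id on T_x S^2. Differentiate along a curve on the
# sphere through x with velocity v, and compare with v itself.
v = P @ rng.standard_normal(3)
nv = np.linalg.norm(v)
c = lambda t: np.cos(t * nv) * x + np.sin(t * nv) * v / nv
h = 1e-6
DL = (L(c(h)) - L(c(-h))) / (2 * h)
print(np.linalg.norm(DL - v))    # ≈ 0
```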
2.8 Parallel translation
In Euclidean spaces, it is natural to compare vectors rooted at different points in space, so much so that the notion of root of a vector is utterly unimportant. On manifolds, each tangent vector belongs to a tangent space specific to its root point. Vectors from different tangent spaces cannot be compared immediately. We need a mathematical tool capable of transporting vectors between tangent spaces while retaining the information they contain.
The proper tool from differential geometry for this is called parallel translation. Let us consider two points x, y ∈ ℳ, a vector ξ ∈ T_x ℳ and a curve γ on ℳ such that γ(0) = x and γ(1) = y. We introduce X, a vector field defined along the trajectory of γ and such that X_x = ξ and ∇_{γ̇(t)} X(γ(t)) ≡ 0. We say that X is constant along γ. The transported vector is X_y; it depends on γ.
In general, computing X_y requires one to solve a differential equation on ℳ. Just like we introduced retractions as simpler proxies for exponentials and generalized logarithms for logarithms, we now introduce the concept of vector transport as a proxy for parallel translation. Again, this concept was first described by Absil et al. in [AMS08].
Roughly speaking, the notion of vector transport tells us how to transport a vector ξ ∈ T_x ℳ from a point x ∈ ℳ to a point R_x(η) ∈ ℳ, η ∈ T_x ℳ. We first introduce the Whitney sum, then quote the definition of vector transport:

Tℳ ⊕ Tℳ = {(η, ξ) : η, ξ ∈ T_x ℳ, x ∈ ℳ}.

Hence Tℳ ⊕ Tℳ is the set of pairs of tangent vectors belonging to a same tangent space. In the next definition, one of them will be the vector to transport and the other will be the vector along which to transport. This definition is illustrated on figure 2.5.
Definition 2.8.1 (vector transport). A vector transport on a manifold ℳ is a smooth mapping

Tℳ ⊕ Tℳ → Tℳ : (η, ξ) ↦ T_η(ξ) ∈ Tℳ

satisfying the following properties for all x ∈ ℳ:
(1) (associated retraction) There exists a retraction R, called the retraction associated with T, such that T_η(ξ) ∈ T_{R_x(η)} ℳ.
(2) (consistency) T_0(ξ) = ξ for all ξ ∈ T_x ℳ.
(3) (linearity) T_η(aξ + bζ) = a T_η(ξ) + b T_η(ζ), ∀η, ξ, ζ ∈ T_x ℳ, a, b ∈ R.
Figure 2.5: Vector transport. Figure courtesy of Absil et al., [AMS08].
In the following, we will use this definition to give a geometric version of the nonlinear conjugate gradients method in section 3.5. The next example, taken from [AMS08, p. 172], gives the retraction and the associated transport we use on the sphere in chapter 4.
Example 2.8.2. A valid retraction on the sphere S^2 is given by:

R_x(v) = (x + v) / ‖x + v‖.

An associated vector transport is:

T_η(ξ) = (1 / ‖x + η‖) (I − (x + η)(x + η)^T / ((x + η)^T (x + η))) ξ.

On the right-hand side, x, η and ξ are to be treated as elements of R^3. The scaling factor is not critical, according to definition 2.8.1. In our implementation for chapter 4, we chose to normalize such that ‖T_η(ξ)‖ = ‖ξ‖.
Chapter 3

Objective generalization and minimization
The second chapter gave us useful tools to replace the forbidden linear combinations that appear in the Euclidean case with reasonable equivalents in the more general setting of manifolds.
In this third chapter, we use these tools to introduce what we call geometric finite differences. These are then straightforwardly plugged into our original objective function to obtain a generalized objective. In what constitutes the second part of this chapter, we successively give a quick reminder of the steepest descent method and nonlinear conjugate gradient methods for the minimization of a real-valued function in R^n. These reviews are followed by geometric versions, in which we simply replace every forbidden operation by its geometric counterpart. We close this chapter with a few suggestions for the step-choosing algorithms. These algorithms need not be adapted, since in both the Euclidean and the Riemannian settings they are concerned with loosely minimizing real functions of one real variable.
Sadly, the theoretical convergence results we had for these methods in the Euclidean case do not all hold in our geometric setting, or at least cannot be proven as easily. Still, some useful results can be found in [AMS08]. As we will see in chapters 4 and 5, numerical results tend to be reassuring on these matters.
3.1 Geometric finite differences
We start by considering a smooth function x : R → R^n. We assume that we know x(t) at times t_0, t_0 + Δt_f and t_0 − Δt_b (the subscripts f and b stand for forward and backward). The goal is to derive formulas for the velocity ẋ(t_0) and the acceleration ẍ(t_0) based on these values. As is customary, we write down the following Taylor expansions of x about t_0:

x(t_0 + Δt_f) = x(t_0) + Δt_f ẋ(t_0) + (Δt_f²/2) ẍ(t_0) + O(Δt_f³),
x(t_0 − Δt_b) = x(t_0) − Δt_b ẋ(t_0) + (Δt_b²/2) ẍ(t_0) + O(Δt_b³).   (3.1)
We introduce some notation here to simplify the expressions:

x_f = x(t_0 + Δt_f), x_b = x(t_0 − Δt_b), x_0 = x(t_0),
ẋ_0 = ẋ(t_0), ẍ_0 = ẍ(t_0) and Δt = max(|Δt_f|, |Δt_b|).

Rewriting, we get:

x_f = x_0 + Δt_f ẋ_0 + (Δt_f²/2) ẍ_0 + O(Δt³),
x_b = x_0 − Δt_b ẋ_0 + (Δt_b²/2) ẍ_0 + O(Δt³).   (3.2)
This is a linear system in $\dot{x}_0$ and $\ddot{x}_0$. Solving it yields:
$$\dot{x}_0 = \frac{1}{\Delta t_f + \Delta t_b}\,\frac{1}{\Delta t_f \Delta t_b}\left[\Delta t_b^2\,(x_f - x_0) - \Delta t_f^2\,(x_b - x_0)\right] + O(\Delta t^2)$$
$$\ddot{x}_0 = \frac{2}{\Delta t_f + \Delta t_b}\,\frac{1}{\Delta t_f \Delta t_b}\left[\Delta t_b\,(x_f - x_0) + \Delta t_f\,(x_b - x_0)\right] + O(\Delta t) \qquad (3.3)$$
As expected, our formulas are linear combinations of the values $x_b$, $x_0$ and $x_f$. Arbitrary linear combinations do not, in general, make sense on manifolds, since manifolds do not, in general, have a vector space structure. However, we have assembled the terms of (3.3) in such a way that the differences $x_f - x_0$ and $x_b - x_0$ are apparent. These may be interpreted as vectors rooted at $x_0$ and pointing toward, respectively, $x_f$ and $x_b$. Using the logarithmic map introduced in section 2.7, we now propose a generalization of (3.3). We call the following geometric finite differences:
$$\dot{x}_0 \approx \frac{1}{\Delta t_f + \Delta t_b}\,\frac{1}{\Delta t_f \Delta t_b}\left[\Delta t_b^2\,\mathrm{Log}_{x_0}(x_f) - \Delta t_f^2\,\mathrm{Log}_{x_0}(x_b)\right]$$
$$\ddot{x}_0 \approx \frac{2}{\Delta t_f + \Delta t_b}\,\frac{1}{\Delta t_f \Delta t_b}\left[\Delta t_b\,\mathrm{Log}_{x_0}(x_f) + \Delta t_f\,\mathrm{Log}_{x_0}(x_b)\right] \qquad (3.4)$$
The points $x_b$, $x_0$ and $x_f$ are now considered to be on a manifold $\mathcal{M}$. The objects $\dot{x}_0$ and $\ddot{x}_0$ are vectors in the tangent space $T_{x_0}\mathcal{M}$.
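To make these formulas concrete, here is a small Python sketch (an illustration of ours, not code from this work) that evaluates the geometric finite differences (3.4) on the sphere $S^2$, using the closed-form exponential and logarithm given later in table 4.1; all function names are our own.

```python
import numpy as np

def sphere_exp(x, v):
    # Exponential map on the unit sphere: follow the great circle from x along v.
    n = np.linalg.norm(v)
    return x if n == 0 else x * np.cos(n) + (v / n) * np.sin(n)

def sphere_log(x, y):
    # Logarithmic map: tangent vector at x pointing toward y, with norm
    # equal to the geodesic distance arccos(x^T y).
    p = y - np.dot(x, y) * x               # project y - x onto T_x S^2
    nrm = np.linalg.norm(p)
    if nrm < 1e-15:
        return np.zeros_like(x)
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0)) * p / nrm

def geometric_fd(xb, x0, xf, dtb, dtf):
    # Geometric finite differences (3.4): velocity and acceleration at x0,
    # from a backward sample xb (at t0 - dtb) and a forward sample xf (at t0 + dtf).
    lf, lb = sphere_log(x0, xf), sphere_log(x0, xb)
    scale = (dtf + dtb) * dtf * dtb
    vel = (dtb**2 * lf - dtf**2 * lb) / scale
    acc = 2.0 * (dtb * lf + dtf * lb) / scale
    return vel, acc
```

As discussed below, on samples taken from a geodesic the acceleration estimate vanishes.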
It is interesting to observe that the vector $\ddot{x}_0$ is zero if $x(t)$ is a geodesic curve, which is a desirable property. The converse is also true, i.e., if $\ddot{x}_0 = 0$, then there exists a geodesic $\gamma(t)$ fitting the data. We give an outline of a proof. Simply build $\gamma$ as follows:
$$\gamma(t) = \begin{cases} \mathrm{Exp}_{x_0}\!\left(\dfrac{t_0 - t}{\Delta t_b}\,\mathrm{Log}_{x_0}(x_b)\right) & \text{if } t < t_0,\\[2mm] \mathrm{Exp}_{x_0}\!\left(\dfrac{t - t_0}{\Delta t_f}\,\mathrm{Log}_{x_0}(x_f)\right) & \text{if } t \geq t_0. \end{cases}$$
By construction, $\gamma$ is a geodesic before and after $t_0$ and is continuous. $\gamma$ is also globally a geodesic. Indeed, we can compute the velocity vector $\dot{\gamma}(t_0)$ “coming from the left”:
$$\dot{\gamma}(t_0^-) = \frac{-1}{\Delta t_b}\,\mathrm{Log}_{x_0}(x_b),$$
and “coming from the right”:
$$\dot{\gamma}(t_0^+) = \frac{1}{\Delta t_f}\,\mathrm{Log}_{x_0}(x_f).$$
Assuming $\ddot{x}_0 = 0$, these two velocity vectors agree, hence $\gamma$ is a geodesic.
Similar developments yield unilateral finite differences for end points. Considering $x_0 = x(t_0)$, $x_1 = x(t_0 + \Delta t_1)$ and $x_2 = x(t_0 + \Delta t_1 + \Delta t_2)$ and defining $\Delta t = \max(|\Delta t_1|, |\Delta t_2|)$, we get:
$$\dot{x}_0 \approx \frac{\Delta t_1 + \Delta t_2}{\Delta t_1 \Delta t_2}\,\mathrm{Log}_{x_0}(x_1) - \frac{\Delta t_1}{\Delta t_2 (\Delta t_1 + \Delta t_2)}\,\mathrm{Log}_{x_0}(x_2)$$
$$\ddot{x}_0 \approx \frac{-2}{\Delta t_1 \Delta t_2}\,\mathrm{Log}_{x_0}(x_1) + \frac{2}{\Delta t_2 (\Delta t_1 + \Delta t_2)}\,\mathrm{Log}_{x_0}(x_2) \qquad (3.5)$$
Since the formulas for the acceleration are only of order 1, we may as well derive order 1 formulas for the velocity. Either of the following is fine:
$$\dot{x}_0 \approx \frac{1}{\Delta t_f}\,\mathrm{Log}_{x_0}(x_f), \qquad \dot{x}_0 \approx \frac{-1}{\Delta t_b}\,\mathrm{Log}_{x_0}(x_b)$$
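As a sanity check (ours, not part of the thesis), the one-sided formulas (3.5) can be verified in the Euclidean case, where $\mathrm{Log}_x(y) = y - x$; there they recover velocity and acceleration exactly on any quadratic curve.

```python
import numpy as np

def onesided_fd(x0, x1, x2, dt1, dt2):
    # One-sided formulas (3.5) with Log_x(y) = y - x (Euclidean case).
    l1, l2 = x1 - x0, x2 - x0
    vel = (dt1 + dt2) / (dt1 * dt2) * l1 - dt1 / (dt2 * (dt1 + dt2)) * l2
    acc = -2.0 / (dt1 * dt2) * l1 + 2.0 / (dt2 * (dt1 + dt2)) * l2
    return vel, acc

# Quadratic curve x(t) = c0 + c1 t + c2 t^2: velocity c1, acceleration 2 c2 at t = 0.
c0, c1, c2 = np.array([1.0, 2.0]), np.array([3.0, -1.0]), np.array([0.5, 2.0])
x = lambda t: c0 + c1 * t + c2 * t**2
vel, acc = onesided_fd(x(0.0), x(0.3), x(0.8), 0.3, 0.5)
```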
We have generalized finite differences in a way that makes sense on any smooth manifold equipped with a logarithm. We built this generalization based on our interpretation of the linear combinations appearing in the classical finite differences used in the discrete objective, equation (1.6). Instead, we could have tried to directly discretize terms such as $\frac{D^2}{dt^2}\gamma(t)$, appearing in the generalized continuous objective, equation (1.2). In the case of Riemannian submanifolds of $\mathbb{R}^n$, definition 2.5.6 for the acceleration would have led us to compute the finite differences in the ambient space, then project the result back onto the tangent space. As it turns out, on the sphere, this is equivalent to computing the geometric finite differences with a particular generalized logarithm, definition 2.7.5. We use that generalized logarithm in chapter 4.
3.2 Generalized objective
In section 1.2, we defined an objective function for discrete curve fitting in Euclidean spaces. Equations (1.3), (1.4) and (1.5) used classical tools such as Euclidean distances and finite differences, the latter being used to evaluate velocity and acceleration along a discrete curve. We now have new tools to rewrite these elements in a very similar form:
$$E(\gamma) = E_d(\gamma) + \lambda E_v(\gamma) + \mu E_a(\gamma) \qquad (3.6)$$
$$E_d(\gamma) = \frac{1}{2}\sum_{i=1}^{N} w_i\,\mathrm{dist}^2(p_i, \gamma_{s_i}), \qquad E_v(\gamma) = \frac{1}{2}\sum_{i=1}^{N_d} \alpha_i\,\|v_i\|^2_{\gamma_i}, \qquad E_a(\gamma) = \frac{1}{2}\sum_{i=1}^{N_d} \beta_i\,\|a_i\|^2_{\gamma_i}$$
This time, the data $p_1, \ldots, p_N$ lives on a manifold $\mathcal{M}$. The function $E$ and its components $E_d$, $E_v$ and $E_a$ are defined over a curve space $\Gamma = \mathcal{M} \times \cdots \times \mathcal{M}$, which is simply $N_d$ copies of the manifold $\mathcal{M}$. It is a product manifold, and as such inherits many of the properties of $\mathcal{M}$. In particular, if $\mathcal{M}$ is finite dimensional, which will always be the case in this document, then so is $\Gamma$. This is useful because the definition of the gradient, definition 2.4.2, applies.
The distance $\mathrm{dist}(\cdot,\cdot)$ is the geodesic distance on $\mathcal{M}$ (if available), or a proxy for it. The tangent vectors for velocity $v_i$ and acceleration $a_i$ at points $\gamma_i$ are computed according to the formulas given in section 3.1. If the logarithmic map needed for geometric finite differences has too complicated a form, a proxy for it can be used too. If even coming up with a generalized logarithm proves difficult, one can envision the alternative generalized objectives described in section 3.3. To the best of our knowledge, this is a novel objective function for the regression problem on manifolds.
Other generalizations can be proposed. We will discuss some of them in the next section.
Interestingly, these alternative formulations all come down to ﬁnite diﬀerences in the Euclidean
case, although they may not be equivalent in the general, Riemannian case.
3.3 Alternative discretizations of the objective*
In this optional section, we propose two alternative discretizations of the second order term in (1.2),
$$\int_{t_1}^{t_N} \left\langle \frac{D^2\gamma}{dt^2}, \frac{D^2\gamma}{dt^2} \right\rangle_{\gamma(t)} dt. \qquad (3.7)$$
The geometric finite differences make for clean developments and can be used to derive higher order formulas in a straightforward way. The discretizations we present here are limited to the second order. As we will see, they both come down to finite differences when specialized to the Euclidean case, but differ on manifolds. We give explicit formulas on the sphere $S^2$ to illustrate this. We introduce $S^2$ as a manifold in section 4.1. The reader may want to read that section first.
Both our alternatives are based on the idea that (3.7) penalizes departure from straightness. Let us consider three points $\gamma_1$, $\gamma_2$ and $\gamma_3$ on a manifold, associated with times $\tau_1$, $\tau_2$ and $\tau_3$. Define $\Delta\tau_1 = \tau_2 - \tau_1$ and $\Delta\tau_2 = \tau_3 - \tau_2$. If there were a geodesic $\gamma(t)$ such that $\gamma(\tau_i) = \gamma_i$, $\forall i \in \{1, 2, 3\}$, and such that the length of $\gamma$ between $\tau_i$ and $\tau_{i+1}$ equals the geodesic distance between $\gamma_i$ and $\gamma_{i+1}$, $\forall i \in \{1, 2\}$, then $\gamma_2$ would be equal to
$$\gamma^\star = \mathrm{Exp}_{\gamma_1}\!\left(\frac{\Delta\tau_1}{\Delta\tau_1 + \Delta\tau_2}\,\mathrm{Log}_{\gamma_1}(\gamma_3)\right),$$
and $\gamma_3$ would be equal to
$$\gamma^\circ = \mathrm{Exp}_{\gamma_1}\!\left(\frac{\Delta\tau_1 + \Delta\tau_2}{\Delta\tau_1}\,\mathrm{Log}_{\gamma_1}(\gamma_2)\right).$$
Our first alternative penalizes terms of the form $\mathrm{dist}^2(\gamma_2, \gamma^\star)$ whereas our second alternative penalizes terms of the form $\mathrm{dist}^2(\gamma_3, \gamma^\circ)$. Let us consider $\Delta\tau_1 = \Delta\tau_2 = \Delta\tau$ for ease of notation. In this particular case, we can define $\gamma^\star$ as the mean of $\gamma_1$ and $\gamma_3$. Specializing the penalties to a Euclidean space, we obtain:
$$\mathrm{dist}^2(\gamma_2, \gamma^\star) = \left\| \gamma_1 + \tfrac{1}{2}(\gamma_3 - \gamma_1) - \gamma_2 \right\|^2 = \tfrac{1}{4}\,\|\gamma_1 - 2\gamma_2 + \gamma_3\|^2$$
$$\mathrm{dist}^2(\gamma_3, \gamma^\circ) = \left\| (\gamma_1 + 2(\gamma_2 - \gamma_1)) - \gamma_3 \right\|^2 = \|\gamma_1 - 2\gamma_2 + \gamma_3\|^2$$
We recognize classical finite differences for acceleration at $\gamma_2$. Hence, these penalties certainly make sense and are equivalent up to a scaling factor in a Euclidean space. They are not equivalent on the sphere. We show this explicitly.
Let $(x, y) \mapsto \langle x, y \rangle = x^T y$ be our inner product on $\mathbb{R}^3$. When $\gamma_i \in S^2 = \{x \in \mathbb{R}^3 : \langle x, x \rangle = 1\}$, $\forall i \in \{1, 2, 3\}$, $\gamma^\star$ can be computed as:
$$\gamma^\star = \mathrm{mean}(\gamma_1, \gamma_3) = \frac{\gamma_1 + \gamma_3}{\|\gamma_1 + \gamma_3\|} = \frac{\gamma_1 + \gamma_3}{\sqrt{2(1 + \langle \gamma_1, \gamma_3 \rangle)}}.$$
Since $\mathrm{dist}^2(x, y) = \arccos^2(\langle x, y \rangle)$ on $S^2$ (see table 4.1), the associated penalty is
$$\mathrm{dist}^2(\gamma_2, \gamma^\star) = \arccos^2\!\left( \frac{\langle \gamma_2, \gamma_1 + \gamma_3 \rangle}{\sqrt{2(1 + \langle \gamma_1, \gamma_3 \rangle)}} \right).$$
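A quick numerical illustration (ours): the two expressions for $\gamma^\star$ agree, since $\|\gamma_1 + \gamma_3\|^2 = 2(1 + \langle \gamma_1, \gamma_3 \rangle)$ for unit vectors, and the penalty vanishes exactly when $\gamma_2$ is the midpoint.

```python
import numpy as np

def sphere_mean(g1, g3):
    # Midpoint of g1 and g3 on S^2: normalize the chord.
    return (g1 + g3) / np.linalg.norm(g1 + g3)

def penalty_star(g1, g2, g3):
    # dist^2(gamma_2, gamma_star), written directly with inner products.
    c = np.dot(g2, g1 + g3) / np.sqrt(2.0 * (1.0 + np.dot(g1, g3)))
    return np.arccos(np.clip(c, -1.0, 1.0))**2

e1, e2, e3 = np.eye(3)
m = sphere_mean(e1, e2)      # midpoint of two orthogonal unit vectors
```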
This is a rather involved function considering that we will need to differentiate it. We now compute $\gamma^\circ$. The latter is a point on the sphere lying in the plane defined by $\gamma_1$ and $\gamma_2$ and such that the geodesic distance, i.e., the angle, between $\gamma_1$ and $\gamma_2$ equals the angle between $\gamma_2$ and $\gamma^\circ$. We can formalize this as:
$$\langle \gamma^\circ, \gamma^\circ \rangle = 1, \qquad \gamma^\circ = \alpha_1 \gamma_1 + \alpha_2 \gamma_2, \qquad \langle \gamma_1, \gamma_2 \rangle = \langle \gamma_2, \gamma^\circ \rangle.$$
This system in $(\alpha_1, \alpha_2)$ has two solutions. Either $\gamma^\circ = \gamma_1$, which we discard as not interesting, or:
$$\gamma^\circ = 2\langle \gamma_1, \gamma_2 \rangle\,\gamma_2 - \gamma_1.$$
The associated penalty takes the nice form:
$$\mathrm{dist}^2(\gamma_3, \gamma^\circ) = \arccos^2\!\left( 2\langle \gamma_1, \gamma_2 \rangle \langle \gamma_2, \gamma_3 \rangle - \langle \gamma_1, \gamma_3 \rangle \right).$$
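This solution can be checked numerically (our illustration): $\gamma^\circ = 2\langle\gamma_1,\gamma_2\rangle\gamma_2 - \gamma_1$ is a unit vector, lies in the plane of $\gamma_1$ and $\gamma_2$, and makes the same angle with $\gamma_2$ as $\gamma_1$ does.

```python
import numpy as np

def extrapolate(g1, g2):
    # gamma_circ = 2 <g1, g2> g2 - g1: extends the great circle through g1 and g2
    # past g2 by the angle separating g1 and g2.
    return 2.0 * np.dot(g1, g2) * g2 - g1

rng = np.random.default_rng(0)
g1 = rng.standard_normal(3); g1 /= np.linalg.norm(g1)
g2 = rng.standard_normal(3); g2 /= np.linalg.norm(g2)
gc = extrapolate(g1, g2)
```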
We would like to exploit this to propose a cheap discretization of (3.7) on $S^2$ for uniform time sampling. When consecutive $\gamma_i$'s are close to each other, the geodesic distance can be approximated by the Euclidean distance in the ambient space $\mathbb{R}^3$. We have:
$$\|a - b\|^2 = 2(1 - \langle a, b \rangle), \qquad a, b \in S^2.$$
Using this instead of $\arccos^2(\langle a, b \rangle)$ (this corresponds to a first order Taylor approximation), we propose a simple form for $E_a$ on $S^2$:
$$E_a = \frac{1}{\Delta\tau^3} \sum_{i=2}^{N_d - 1} \left[ 1 - 2\langle \gamma_{i-1}, \gamma_i \rangle \langle \gamma_i, \gamma_{i+1} \rangle + \langle \gamma_{i-1}, \gamma_{i+1} \rangle \right] \qquad (3.8)$$
We neglected the endpoints, which is not a liability since this formula is intended for fine time sampling. The $1/\Delta\tau^3$ factor is explained as follows. In the Euclidean case, the penalty is given by:
by:
E
a
=
1
2
N
d
−1
¸
i=2
∆τ
γ
i+1
−2γ
i
+γ
i−1
∆τ
2
2
=
1
2
N
d
−1
¸
i=2
1
∆τ
3
γ
i+1
−2γ
i
+γ
i−1

2
As we showed earlier, in the Euclidean case, $\mathrm{dist}^2(\gamma_3, \gamma^\circ) = \|\gamma_3 - 2\gamma_2 + \gamma_1\|^2$, hence we identified corresponding terms. The derivatives of (3.8), seen as a scalar field in the ambient space $\mathbb{R}^{3 \times N_d}$, are given by:
$$\Delta\tau^3\,\frac{\partial E_a}{\partial \gamma_1} = \gamma_3 - 2\langle \gamma_2, \gamma_3 \rangle\,\gamma_2$$
$$\Delta\tau^3\,\frac{\partial E_a}{\partial \gamma_2} = \gamma_4 - 2(\langle \gamma_1, \gamma_2 \rangle + \langle \gamma_3, \gamma_4 \rangle)\,\gamma_3 - 2\langle \gamma_2, \gamma_3 \rangle\,\gamma_1$$
$$\Delta\tau^3\,\frac{\partial E_a}{\partial \gamma_k} = \gamma_{k+2} - 2(\langle \gamma_{k-1}, \gamma_k \rangle + \langle \gamma_{k+1}, \gamma_{k+2} \rangle)\,\gamma_{k+1} + \gamma_{k-2} - 2(\langle \gamma_{k-2}, \gamma_{k-1} \rangle + \langle \gamma_k, \gamma_{k+1} \rangle)\,\gamma_{k-1}, \quad \forall k \in \{3, \ldots, N_d - 2\}$$
$$\Delta\tau^3\,\frac{\partial E_a}{\partial \gamma_{N_d - 1}} = \gamma_{N_d - 3} - 2(\langle \gamma_{N_d - 3}, \gamma_{N_d - 2} \rangle + \langle \gamma_{N_d - 1}, \gamma_{N_d} \rangle)\,\gamma_{N_d - 2} - 2\langle \gamma_{N_d - 2}, \gamma_{N_d - 1} \rangle\,\gamma_{N_d}$$
$$\Delta\tau^3\,\frac{\partial E_a}{\partial \gamma_{N_d}} = \gamma_{N_d - 2} - 2\langle \gamma_{N_d - 2}, \gamma_{N_d - 1} \rangle\,\gamma_{N_d - 1}$$
Both $E_a$ and its gradient can thus be computed with simple inner products of close points on $S^2$ and elementary operations such as sums, subtractions and multiplications. This is cheap and reliable. Regretfully, this nice property is lost when we try to generalize to non-homogeneous time sampling.
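The interior-point derivative formula can be checked against numerical differentiation; a small sketch of ours, using 0-based indices:

```python
import numpy as np

def Ea_cheap(G, dtau):
    # Equation (3.8), 0-based: sum over the interior points of the chain G.
    s = sum(1.0 - 2.0 * np.dot(G[i-1], G[i]) * np.dot(G[i], G[i+1])
            + np.dot(G[i-1], G[i+1]) for i in range(1, len(G) - 1))
    return s / dtau**3

def dEa_interior(G, k, dtau):
    # Ambient-space partial derivative of (3.8) w.r.t. an interior gamma_k
    # (0-based k, with 2 <= k <= len(G) - 3).
    g = (G[k+2] - 2.0 * (np.dot(G[k-1], G[k]) + np.dot(G[k+1], G[k+2])) * G[k+1]
         + G[k-2] - 2.0 * (np.dot(G[k-2], G[k-1]) + np.dot(G[k], G[k+1])) * G[k-1])
    return g / dtau**3

rng = np.random.default_rng(1)
G = rng.standard_normal((7, 3))
G /= np.linalg.norm(G, axis=1, keepdims=True)   # 7 random points on S^2
k, h, dtau = 3, 1e-6, 0.1
num = np.zeros(3)                               # central finite differences
for j in range(3):
    Gp, Gm = G.copy(), G.copy()
    Gp[k, j] += h; Gm[k, j] -= h
    num[j] = (Ea_cheap(Gp, dtau) - Ea_cheap(Gm, dtau)) / (2.0 * h)
```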
3.4 A geometric steepest descent algorithm
The steepest descent algorithm is probably the most straightforward means of minimizing a differentiable function $f : \mathbb{R}^n \to \mathbb{R}$ without constraints. Following Nocedal and Wright in [NW99], we give a reminder of the method in algorithm 1. We delay the details of how one can choose the step length to section 3.6. Of course, whether or not the sequence generated by algorithm 1 converges to a critical point of $f$ (i.e., a point $x^\star$ such that $\nabla f(x^\star) = 0$) depends on the step choosing algorithm. The initial guess can be constructed based on a priori knowledge of the problem at hand. For practical purposes, one should, of course, integrate a stopping criterion into the algorithm.
Algorithm 1: Steepest descent method in $\mathbb{R}^n$
input : A function $f : \mathbb{R}^n \to \mathbb{R}$ and its gradient $\nabla f : \mathbb{R}^n \to \mathbb{R}^n$; an initial guess $x_0 \in \mathbb{R}^n$.
output: A sequence $x_0, x_1, \ldots$ converging toward a critical point of $f$.
for $k = 0, 1, \ldots$ do
    if $\nabla f(x_k) \neq 0$ then
        $d_k := -\nabla f(x_k) / \|\nabla f(x_k)\|$
        $\alpha_k :=$ choose step length
        $x_{k+1} := x_k + \alpha_k d_k$
    else
        $x_{k+1} := x_k$
    end
end
We would like to use the tools developed in the previous chapter to generalize algorithm 1 to a scalar field $f$ on a manifold $\mathcal{M}$. We already defined the gradient $\mathrm{grad}\,f : \mathcal{M} \to T\mathcal{M}$ in definition 2.4.2. What we still need is the right interpretation for the update line of the steepest descent algorithm, namely $x_{k+1} := x_k + \alpha_k d_k$. In the geometric context, $x_k$ is a point on the manifold and $\alpha_k d_k$ is a tangent vector to $\mathcal{M}$ at $x_k$. The summation of these two elements is undefined. However, the meaning of the sum is clear: starting at $x_k$, we follow the vector $\alpha_k d_k$ until we reach a new point, which we name $x_{k+1}$. Retractions, definition 2.7.2, and in particular the exponential map do exactly this. We can thus rewrite the algorithm as algorithm 2, in very much the same way.
Algorithm 2: Geometric steepest descent method on $\mathcal{M}$
input : A scalar field $f : \mathcal{M} \to \mathbb{R}$ and its gradient $\mathrm{grad}\,f : \mathcal{M} \to T\mathcal{M}$; a retraction $R$ on $\mathcal{M}$; an initial guess $x_0$ on $\mathcal{M}$.
output: A sequence $x_0, x_1, \ldots$ converging toward a critical point of $f$.
for $k = 0, 1, \ldots$ do
    if $\|\mathrm{grad}\,f(x_k)\|_{x_k} \neq 0$ then
        $d_k := -\mathrm{grad}\,f(x_k) / \|\mathrm{grad}\,f(x_k)\|_{x_k}$
        $\alpha_k :=$ choose step length
        $x_{k+1} := R_{x_k}(\alpha_k d_k)$
    else
        $x_{k+1} := x_k$
    end
end
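A minimal Python sketch of algorithm 2 on $S^2$ (our illustration, not thesis code; the retraction is (4.1) from chapter 4 and the step is chosen by plain Armijo backtracking):

```python
import numpy as np

def retract(x, v):
    # Retraction on S^2: step in the ambient space, then renormalize.
    y = x + v
    return y / np.linalg.norm(y)

def steepest_descent_sphere(f, egrad, x, steps=500, sigma=1e-4):
    # Geometric steepest descent on S^2. egrad is the gradient of f seen as a
    # function of the ambient space; projecting it onto T_x S^2 gives grad f(x).
    for _ in range(steps):
        g = egrad(x)
        g = g - np.dot(x, g) * x           # Riemannian gradient
        ng = np.linalg.norm(g)
        if ng < 1e-12:
            break
        d = -g / ng
        a = 1.0                            # Armijo backtracking
        while f(retract(x, a * d)) > f(x) - sigma * a * ng and a > 1e-12:
            a *= 0.5
        x = retract(x, a * d)
    return x

# Example: f(x) = 1 - <p, x>, minimized over S^2 at x = p.
p = np.array([1.0, 2.0, 2.0]) / 3.0
xmin = steepest_descent_sphere(lambda x: 1.0 - np.dot(p, x),
                               lambda x: -p,
                               np.array([1.0, 0.0, 0.0]))
```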
Such a generalization can be found in [AMS08], along with proper analysis giving, notably, sufficient conditions to guarantee convergence toward a critical point of the scalar field $f$. Such a treatment of the method is not our main focus in this document. We thus propose to simply state the relevant theorem from [AMS08] without proof. The following results are concerned with general descent methods, i.e., $d_k$ is not forced to point in the opposite direction of the gradient. Global convergence results then depend upon the directions sequence $d_0, d_1, \ldots$ as well as on the step lengths sequence $\alpha_0, \alpha_1, \ldots$
Definition 3.4.1 (gradient-related sequence). Let $f$ be a scalar field on $\mathcal{M}$. The sequence $\{d_0, d_1, \ldots\}$, $d_k \in T_{x_k}\mathcal{M}$, is gradient-related if, for any subsequence $\{x_k\}_{k \in K}$ of $\{x_k\}$ that converges to a non-critical point of $f$, the corresponding subsequence $\{d_k\}_{k \in K}$ is bounded and satisfies
$$\limsup_{k \to \infty,\, k \in K} \langle \mathrm{grad}\,f(x_k), d_k \rangle_{x_k} < 0.$$
The above definition essentially forces the directions $d_k$ to be descent directions in the end. Obviously, the steepest descent sequence
$$d_k = -\frac{\mathrm{grad}\,f(x_k)}{\|\mathrm{grad}\,f(x_k)\|}$$
is gradient-related. The next definition, based on Armijo's backtracking procedure, forces the decrease in $f$ to be sufficient at each step (and simultaneously defines what sufficient means), based on the steepness of the chosen direction. It differs slightly from [AMS08].
Definition 3.4.2 (Armijo point). Given a cost function $f$ on $\mathcal{M}$ with retraction $R$, a point $x \in \mathcal{M}$, a tangent vector $d \in T_x\mathcal{M}$ and scalars $\alpha > 0$, $\beta, \sigma \in (0, 1)$, the Armijo point is $R_x(d^A) = R_x(t^A d) = R_x(\beta^m \alpha d)$, where $m$ is the smallest nonnegative integer such that
$$f(x) - f(R_x(\beta^m \alpha d)) \geq \sigma\,\langle -\mathrm{grad}\,f(x), \beta^m \alpha d \rangle_x.$$
The real number $t^A$ is the Armijo step size.
When d is a descent direction, such a point and step size are guaranteed to exist. The next
theorem uses both these deﬁnitions to make a statement about the convergence properties of
variations of algorithm 2.
Theorem 3.4.3. Let $\{x_k\}$ be an infinite sequence of iterates generated by a variation of algorithm 2 such that the sequence $\{d_k\}$ is gradient-related and such that, at each step, $f(x_k) - f(x_{k+1}) \geq c\,(f(x_k) - f(R_{x_k}(t_k^A d_k)))$, where $c \in (0, 1)$ and $t_k^A$ is the Armijo step size for given parameters. Further assume that the initial level set $L = \{x \in \mathcal{M} : f(x) \leq f(x_0)\}$ is compact (which is certainly the case if $\mathcal{M}$ itself is compact). Then,
$$\lim_{k \to \infty} \|\mathrm{grad}\,f(x_k)\| = 0.$$
We do not need to use the Armijo step size per se. Any step choosing algorithm ensuring sufficient descent will do. The Armijo point, though, is a nice way of defining sufficient descent. It also makes it possible to prove theorem 3.4.3 fairly independently of the actual directions and steps used. In the end, one needs to provide a problem-specific, efficient algorithm, as we will do in the application chapters.
3.5 A geometric nonlinear conjugate gradient algorithm
In $\mathbb{R}^n$, the steepest descent algorithm is known for its linear convergence rate. It should not come as a surprise that its geometric version, which we presented in the previous section, is equally slow (although this is a bit more technical to establish). In order to achieve faster convergence rates, one could resort to second-order methods, like Newton or quasi-Newton schemes. This is impractical (but not impossible) in a geometric setting, since it requires a proper definition of the Hessian of a scalar field $f$, i.e., the derivative of a vector field, as well as a means to compute it. The affine connections defined in section 2.5 are the right tool, but do not lend themselves to easy calculations. Since the objective functions we deal with in this document are quite involved, we decided to postpone such investigations to later work and to focus on, hopefully superlinear, schemes based on first order derivatives only.
The nonlinear conjugate gradient (NLCG) method is such a scheme in $\mathbb{R}^n$. Algorithm 3 is Nocedal and Wright's version of it, see [NW99], based on Fletcher-Reeves coefficients. The unit-length update directions $d_k$ have been introduced so that the step length chosen always corresponds to the actual displacement vector's length. This way, it is easier to compare the steepest descent step lengths and the NLCG step lengths.
Algorithm 3: Nonlinear conjugate gradient method in $\mathbb{R}^n$
input: A function $f : \mathbb{R}^n \to \mathbb{R}$ and its gradient $\nabla f : \mathbb{R}^n \to \mathbb{R}^n$; an initial guess $x_0 \in \mathbb{R}^n$.
Evaluate $f_0 = f(x_0)$, $\nabla f_0 = \nabla f(x_0)$
$p_0 := -\nabla f_0$
$k := 0$
while $\nabla f_k \neq 0$ do
    $d_k := p_k / \|p_k\|$
    $\alpha_k :=$ choose step length
    $x_{k+1} := x_k + \alpha_k d_k$
    Evaluate $\nabla f_{k+1} = \nabla f(x_{k+1})$
    $\beta^{FR}_{k+1} := \dfrac{\nabla f_{k+1}^T \nabla f_{k+1}}{\|\nabla f_k\|^2}$
    $p_{k+1} := -\nabla f_{k+1} + \beta^{FR}_{k+1} p_k$
    $k := k + 1$
end
An alternative algorithm, termed the Polak-Ribière version of algorithm 3, uses the following $\beta$'s:
$$\beta^{PR}_{k+1} = \frac{\nabla f_{k+1}^T (\nabla f_{k+1} - \nabla f_k)}{\|\nabla f_k\|^2}$$
The update directions $d_k$ are not guaranteed to be descent directions unless we impose some constraints on the $\alpha_k$'s. Indeed, we have:
$$\langle p_k, \nabla f_k \rangle = -\|\nabla f_k\|^2 + \beta_k \langle p_{k-1}, \nabla f_k \rangle,$$
which needs to be negative for $d_k$ to be a descent direction. This can be guaranteed by imposing the well-known strong Wolfe conditions on the $\alpha_k$'s. For a step length choosing algorithm which cannot guarantee a descent direction for the NLCG method, an easy workaround is to perform a steepest descent iteration every time $p_k$ is not a descent direction. Theorem 3.4.3 tells us that such an algorithm would still converge. Interestingly, the particular choice $\beta_k = 0$ makes the algorithm revert to the steepest descent method altogether.
In [NW99], Nocedal and Wright mention a wealth of variations on algorithm 3 as well as useful and interesting results concerning convergence and numerical performance. It is not our aim to provide an exhaustive treatment of CG algorithms in this document. In the application chapters later on, we will illustrate the performance of the methods outlined here.
We now generalize algorithm 3 to manifolds. Such a generalization can be found in [AMS08]. The update step $x_{k+1} := x_k + \alpha_k d_k$ can be modified in the very same way we did for algorithm 2. The direction update though, $p_{k+1} := -\nabla f_{k+1} + \beta_{k+1} p_k$, is tricky. Indeed, $p_{k+1}$ and $\mathrm{grad}\,f_{k+1}$ live in $T_{x_{k+1}}\mathcal{M}$ but $p_k$ is in $T_{x_k}\mathcal{M}$. Hence, the sum is undefined. The vector transport concept, definition 2.8.1, comes in handy here. It enables us to transport the vector $p_k$ from $T_{x_k}\mathcal{M}$ to $T_{x_{k+1}}\mathcal{M}$, hence giving us a reasonable substitute for which the sum with $-\mathrm{grad}\,f_{k+1}$ is defined. The same goes for the $\beta_k$'s:
$$\beta^{FR}_{k+1} = \frac{\langle \mathrm{grad}\,f_{k+1}, \mathrm{grad}\,f_{k+1} \rangle}{\|\mathrm{grad}\,f_k\|^2}, \qquad \beta^{PR}_{k+1} = \frac{\langle \mathrm{grad}\,f_{k+1}, \mathrm{grad}\,f_{k+1} - \mathcal{T}_{\alpha_k d_k}(\mathrm{grad}\,f_k) \rangle}{\|\mathrm{grad}\,f_k\|^2} \qquad (3.9)$$
Algorithm 4 synthesizes these remarks.
Algorithm 4: Geometric nonlinear conjugate gradient method on $\mathcal{M}$
input: A scalar field $f : \mathcal{M} \to \mathbb{R}$ and its gradient $\mathrm{grad}\,f : \mathcal{M} \to T\mathcal{M}$; an initial guess $x_0 \in \mathcal{M}$.
Evaluate $f_0 = f(x_0)$, $\mathrm{grad}\,f_0 = \mathrm{grad}\,f(x_0)$
$p_0 := -\mathrm{grad}\,f_0$
$k := 0$
while $\mathrm{grad}\,f_k \neq 0$ do
    $d_k := p_k / \|p_k\|$
    $\alpha_k :=$ choose step length
    $x_{k+1} := R_{x_k}(\alpha_k d_k)$
    Evaluate $\mathrm{grad}\,f_{k+1} = \mathrm{grad}\,f(x_{k+1})$
    $\beta_{k+1} :=$ cf. equation (3.9)
    $p_{k+1} := -\mathrm{grad}\,f_{k+1} + \beta_{k+1}\,\mathcal{T}_{\alpha_k d_k}(p_k)$
    $k := k + 1$
end
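A compact Python sketch of algorithm 4 on $S^2$ (ours; Fletcher-Reeves $\beta$, orthogonal projection onto the new tangent space as vector transport — one valid choice among others for embedded submanifolds — Armijo backtracking, and a steepest descent restart whenever $p_k$ fails to be a descent direction, as suggested above):

```python
import numpy as np

def retract(x, v):
    y = x + v
    return y / np.linalg.norm(y)

def nlcg_sphere(f, egrad, x, steps=500, sigma=1e-4):
    # Geometric Fletcher-Reeves NLCG on S^2.
    g = egrad(x); g = g - np.dot(x, g) * x
    p = -g
    for _ in range(steps):
        if np.linalg.norm(g) < 1e-12:
            break
        if np.dot(p, g) >= 0:              # not a descent direction: restart
            p = -g
        d = p / np.linalg.norm(p)
        slope = -np.dot(g, d)              # > 0 since d is a descent direction
        a = 1.0
        while f(retract(x, a * d)) > f(x) - sigma * a * slope and a > 1e-12:
            a *= 0.5
        xn = retract(x, a * d)
        gn = egrad(xn); gn = gn - np.dot(xn, gn) * xn
        beta = np.dot(gn, gn) / np.dot(g, g)           # Fletcher-Reeves (3.9)
        p = -gn + beta * (p - np.dot(xn, p) * xn)      # transport p to T_xn
        x, g = xn, gn
    return x

p_data = np.array([2.0, 1.0, 2.0]) / 3.0
xmin = nlcg_sphere(lambda x: 1.0 - np.dot(p_data, x), lambda x: -p_data,
                   np.array([0.0, 0.0, 1.0]))
```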
When minimizing a quadratic objective in $n$ variables, the CG method converges to the exact solution in at most $n$ iterations. It is therefore customary to restart the nonlinear CG method every $n$ iterations or so, by applying a steepest descent step (which boils down to setting $\beta_k = 0$). This way, old, irrelevant information is erased and the algorithm can start anew. In particular, if the objective is globally convex and is a strictly convex quadratic in a neighborhood about the minimizer, the algorithm will eventually enter that neighborhood and restart in it, hence reverting to a linear CG method and converging in at most $n$ additional steps. In our algorithms, we restart every $n$ iterations, where $n$ is the dimension of $\mathcal{M}$.
Nonlinear CG methods and their convergence properties are not fully understood in $\mathbb{R}^n$, let alone on manifolds. We thus stop here, and rely on numerical tests in the application chapters to assess the performance of algorithm 4.
3.6 Step choosing algorithms
To choose the step length in algorithms 2 and 4, we need to solve a line search problem, i.e., compute
$$\alpha^\star = \arg\min_{\alpha > 0} \varphi(\alpha) = \arg\min_{\alpha > 0} f(R_x(\alpha p)).$$
Despite the fact that our original problem involves manifolds, the line search problem only involves real functions: we need to (crudely) minimize a function $\varphi : \mathbb{R} \to \mathbb{R}$, knowing that $\varphi'(0) < 0$ (descent direction). A lot of work has been done on this problem in the optimization community. Because of this, we keep the present section to a minimum, referring the reader to, e.g., [NW99].
We single out one algorithm that proved particularly interesting in our case. Since the regression problem is quadratic in $\mathbb{R}^n$, we expect the objective function to “look like” a quadratic in a neighborhood around the minimizer on a manifold, precisely because manifolds locally resemble $\mathbb{R}^n$. Hence, it makes sense to expect the line search functions to be almost quadratic around $\alpha = 0$ when we get near the minimum of $f$. This is indeed the case when working on the sphere. These considerations led us to choose algorithm 5 as our chief step choosing algorithm. It is based on an Armijo backtracking technique with automatic choice of the initial guess $\alpha_1$.
In practice, we also add cubic interpolation to algorithm 5 for $i \geq 2$ and check that the sequence $(\alpha_1, \alpha_2, \ldots)$ does not decrease too rapidly, along with a few other ad hoc numerical considerations. Following Nocedal and Wright in [NW99], we picked $c = 10^{-4}$. We also implemented algorithms ensuring respect of the Wolfe conditions from [NW99] and actual minimization of $\varphi$
Algorithm 5: Backtracking-interpolation step choosing algorithm
require: A small constant $c \in (0, 1)$, an iteration limit $N$, a line search function $\varphi$ and its derivative $\varphi'$ (only computed at 0).
input : The previously chosen step $\alpha_{\mathrm{prev}}$ and the previous slope $\varphi'_{0,\mathrm{prev}}$.
output : A step $\alpha$ respecting the sufficient decrease condition $\varphi(0) - \varphi(\alpha) \geq -c\alpha\varphi'(0)$, or 0 if $N$ iterations did not suffice.
Compute $\varphi_0 := \varphi(0)$ and $\varphi'_0 := \varphi'(0)$
Set $\alpha_1 := 2\alpha_{\mathrm{prev}}\,\varphi'_{0,\mathrm{prev}} / \varphi'_0$
for $i = 1, 2, \ldots$ do
    if $i > N$ then
        return 0
    else
        Compute $\varphi_i = \varphi(\alpha_i)$
        if $\varphi_0 - \varphi_i \geq -c\alpha_i\varphi'_0$ then
            return $\alpha_i$
        else
            $\alpha_{i+1} := -\dfrac{\alpha_i^2 \varphi'_0}{2(\varphi_i - \varphi_0 - \alpha_i \varphi'_0)}$
        end
    end
end
with Matlab’s fminsearch algorithm. These performed well too, but we do not use them for the
results shown in chapters 4 and 5.
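The backtracking-interpolation core of algorithm 5 (without the cubic refinement and safeguards mentioned above) can be sketched as follows; this is our illustration, not the thesis code. In practice $\alpha_1$ would come from the previous iteration via $\alpha_1 = 2\alpha_{\mathrm{prev}}\varphi'_{0,\mathrm{prev}}/\varphi'_0$.

```python
def choose_step(phi, dphi0, alpha1, c=1e-4, N=30):
    # Armijo backtracking where each new trial step is the minimizer of the
    # quadratic interpolating phi(0), phi'(0) and phi(alpha_i).
    phi0 = phi(0.0)
    a = alpha1
    for _ in range(N):
        pa = phi(a)
        if phi0 - pa >= -c * a * dphi0:    # sufficient decrease condition
            return a
        a = -(a * a * dphi0) / (2.0 * (pa - phi0 - a * dphi0))
    return 0.0

# On an exactly quadratic phi, the first interpolation lands on the minimizer:
step = choose_step(lambda a: (a - 1.0)**2, -2.0, 3.0)
```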
Chapter 4
Discrete curve fitting on the sphere
The previous chapter constitutes a formal deﬁnition of the regression problem we intend to solve
on manifolds and of minimization algorithms on manifolds. These algorithms could, in principle,
be used to solve the regression problem at hand. Still, tractability has not been assessed yet
and is a fundamental factor of success for any numerical method. Mostly, tractability will be
determined by:
• the complexity of the formulas for geodesic distances and logarithmic maps as well as their
derivatives,
• the convergence rate of the minimization algorithms, and
• the amount of data as well as the sampling precision required.
The intricacy of writing the code (mostly inherent to the complexity of working out the gradient of the objective) and the actual performance are shown to be reasonable on the sphere. It will be clear, however, that the results of this chapter do not transpose to every manifold.
One of the main arguments justifying the geometric approach instead of, e.g., a direct use of spherical coordinates, is the absence of any type of singularity problem at the poles. In this chapter, we give explicit formulas for all the elements needed to apply the nonlinear conjugate gradient scheme to discrete curve fitting on the sphere $S^2$, i.e., the unit sphere embedded in $\mathbb{R}^3$. As we will see, the difficulty of writing an algorithm to solve the problem can be reduced by using proxies for, e.g., the logarithmic map. This can have an influence on the objective function's global minimum, but this influence can be made arbitrarily small by refining the curve sampling. We then show some results for different scenarios and discuss performance as well as tricks to speed things up a bit.
Set: $S^2 = \{x \in \mathbb{R}^3 : x^T x = 1\}$
Tangent spaces: $T_x S^2 = \{v \in \mathbb{R}^3 : x^T v = 0\}$
Projector: $P_x : \mathbb{R}^3 \to T_x S^2 : v \mapsto P_x(v) = (I - xx^T)v$
Inner product: $\langle v, w \rangle_x = \langle v, w \rangle = v^T w$ (induced metric)
Vector norm: $\|v\|_x = \sqrt{\langle v, v \rangle_x} = \|v\|$
Distance: $\mathrm{dist}(x, y) = \arccos(x^T y)$
Exponential: $\mathrm{Exp}_x(v) = x\cos(\|v\|) + \frac{v}{\|v\|}\sin(\|v\|)$
Logarithm: $\mathrm{Log}_x(y) = \frac{\mathrm{dist}(x, y)}{\|P_x(y - x)\|}\,P_x(y - x)$
Mean: $\mathrm{mean}(x, y) = \frac{x + y}{\|x + y\|}$
Table 4.1: Geometric toolbox for the Riemannian manifold $S^2$
4.1 $S^2$ geometric toolbox
We define the set $S^2$, the unit sphere embedded in $\mathbb{R}^3$, and give it a natural Riemannian manifold structure inherited from $\mathbb{R}^3$ (i.e., $S^2$ is a submanifold of the Euclidean space $\mathbb{R}^3$). From now on, we indiscriminately use the symbol $S^2$ to refer to the set or the manifold, just like we often use the symbol $\mathcal{M}$ instead of the actual set the manifold is built upon, and vice versa. Table 4.1 contains a few closed-form expressions for the geometric tools we will use henceforth. In the next few paragraphs, we go over these formulas and give a word of explanation.
Since $S^2$ is a submanifold of $\mathbb{R}^3$ defined by an equality constraint, theorem 2.2.2 applies and we are entitled to use the identification between tangent vectors and elements of $\mathbb{R}^3$, definition 2.2.1. Defining $f : \mathbb{R}^3 \to \mathbb{R} : x \mapsto f(x) = x^T x - 1$, the differential $df_x = 2x^T$ yields $\ker df_x = T_x S^2 = \{v \in \mathbb{R}^3 : x^T v = 0\}$. We thus think of tangent vectors as actual vectors of $\mathbb{R}^3$ tangent to the sphere.
It is readily checked that $P_x \circ P_x = P_x$ and that $P_x = P_x^T$ (i.e., $P_x$ is an orthogonal projector). Furthermore, $\mathrm{Im}\,P_x = T_x S^2$ and $\ker P_x = T_x^\perp S^2$ (the orthogonal complement of $T_x S^2$). The Riemannian metric as well as the norm are inherited from $\mathbb{R}^3$ because of the submanifold nature of $S^2$.
The geodesic distance between $p$ and $q$ is, almost by definition, the angle separating $p$ and $q$ in radians. It corresponds to the length of the shortest arc of great circle joining the two points. Great circles are geodesic curves of $S^2$. A great circle is a unit-radius circle centered at the origin. To simplify, when computing the distance between close points, one could use the distance in the embedding space: $\|x - y\|^2 = 2(1 - x^T y)$ for $x, y \in S^2$. Proxies for the geodesic distance are compared in figure 4.1.
The curve $t \mapsto \mathrm{Exp}_x(tv)$ describes a great circle passing through $x$ at $t = 0$ and such that $\frac{d}{dt}\mathrm{Exp}_x(tv)\big|_{t=0} = v$, hence Exp is indeed the exponential map for $S^2$ (this can be more thoroughly checked by computing the second covariant derivative and verifying that it vanishes). While the closed-form expression for the map is not too complicated, we suggest using an even simpler map instead, called a retraction, see definition 2.7.2. The retraction we use on $S^2$ is the following:
$$R_x : T_x S^2 \to S^2 : v \mapsto R_x(v) = \frac{x + v}{\|x + v\|} = \frac{x + v}{\sqrt{1 + \|v\|^2}}. \qquad (4.1)$$
Using this retraction, it is impossible to reach points further away from $x$ than a right angle (i.e., it is impossible to leave the hemisphere whose pole is $x$). As descent methods usually perform small steps anyway, this will not be an issue.
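Numerically (our sketch), the retraction (4.1) agrees with the exponential map up to third order in $\|v\|$, and indeed never leaves the hemisphere centered at $x$:

```python
import numpy as np

def retract(x, v):
    # Retraction (4.1): step in the ambient space, then renormalize.
    y = x + v
    return y / np.linalg.norm(y)

def sphere_exp(x, v):
    # Exponential map from table 4.1.
    n = np.linalg.norm(v)
    return x if n == 0 else x * np.cos(n) + (v / n) * np.sin(n)

x = np.array([1.0, 0.0, 0.0])
v = 0.01 * np.array([0.0, 1.0, 0.0])        # small tangent vector at x
gap = np.linalg.norm(retract(x, v) - sphere_exp(x, v))
far = retract(x, 100.0 * np.array([0.0, 1.0, 0.0]))   # huge step
```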
The Log map listed in table 4.1 is indeed the inverse of the exponential map, and as such is the logarithmic map on $S^2$. This time, the expression is quite complicated (remember that we
[Figure 4.1 — plot omitted: relative error of squared proxy-distances versus the angle between $x$ and $y$ (0 to 180 degrees), for three proxies: retraction-based, Euclidean, second order Taylor.]
Figure 4.1: Different proxies can be envisioned to replace the geodesic distance between $x$ and $y$ on $S^2$ to lower the computational burden. Not all of them are valuable, though. This plot illustrates the relative error between the genuine squared geodesic distance and (1) the squared Euclidean distance (in the embedding space), which coincides with a first order Taylor development of $x^T y \mapsto \arccos^2(x^T y)$, (2) a second order Taylor development and, least interestingly, (3) the squared norm that a vector $v$ at $x$ pointing toward $y$ should have such that $R_x(v) = y$. In this work, we always use the genuine geodesic distance.
need to differentiate it). We propose a generalized logarithm, definition 2.7.5:
$$L_x : S^2 \to T_x S^2 : y \mapsto L_x(y) = P_x(y - x) = y - (x^T y)\,x. \qquad (4.2)$$
This is indeed a generalized logarithm. Following the discussion at the end of section 2.7:
$$L_x(y) = f_x(y)\,\mathrm{Log}_x(y), \qquad f_x(y) = \frac{\|P_x(y - x)\|}{\mathrm{dist}(x, y)} = \frac{\sqrt{1 - (x^T y)^2}}{\arccos(x^T y)},$$
where $f_x$ is smoothly extended by defining $f_x(x) = \lim_{x^T y \to 1} f_x(y) = 1$. $L_x$ maps points on the sphere to tangent vectors at $x$, pointing in the same direction as the genuine logarithmic map, but with a smaller norm. The error on the norm can be made arbitrarily small by constraining $y$ to be close enough to $x$. We illustrate this fact in figure 4.2.
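The relative norm error between $L_x$ and $\mathrm{Log}_x$ behaves like $\theta^2/6$ for a separating angle $\theta$, consistent with the 1% figure quoted at 12.6 degrees; a quick check of ours:

```python
import numpy as np

def sphere_log(x, y):
    # True logarithmic map from table 4.1.
    p = y - np.dot(x, y) * x
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0)) * p / np.linalg.norm(p)

def gen_log(x, y):
    # Generalized logarithm (4.2): same direction as Log_x(y), smaller norm.
    return y - np.dot(x, y) * x

theta = 0.2                                  # about 11.5 degrees
x = np.array([1.0, 0.0, 0.0])
y = np.array([np.cos(theta), np.sin(theta), 0.0])
rel = (np.linalg.norm(sphere_log(x, y) - gen_log(x, y))
       / np.linalg.norm(sphere_log(x, y)))   # equals (theta - sin theta) / theta
```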
4.2 Curve space and objective function
Section 3.2 gave us a generic definition of the curve space and of the objective function to minimize over that space for a regression problem on a manifold $\mathcal{M}$. In this section, we specialize the definitions to $\mathcal{M} = S^2$. We simply have that the curve space $\Gamma$ consists of $N_d$ copies of the sphere, i.e.,
$$\Gamma = S^2 \times \cdots \times S^2 = (S^2)^{N_d}.$$
Since $\Gamma$ is just a product manifold based on $S^2$, the tools we listed in table 4.1 for $S^2$ can readily be extended to $\Gamma$ component-wise. For the sake of completeness, we show this explicitly in table 4.2. This second toolbox will mainly be useful for the optimization algorithms. In order to define the objective function, one only needs the first toolbox.
Building on sections 3.1 and 3.2, we propose the following energy function for the regression problem on $S^2$:
$$E : \Gamma \to \mathbb{R} : \gamma \mapsto E(\gamma) = E_d(\gamma) + \lambda E_v(\gamma) + \mu E_a(\gamma) \qquad (4.3)$$
[Figure 4.2 — plot omitted: relative norm error $\|\mathrm{Log}_x(y) - L_x(y)\| / \|\mathrm{Log}_x(y)\|$ versus the angle between $x$ and $y$ (0 to 180 degrees).]
Figure 4.2: Both $\mathrm{Log}_x(y)$ and $L_x(y)$ point in the same direction, but the latter has a smaller norm. This plot illustrates the relative error introduced by considering the generalized logarithm (4.2) instead of the true logarithm, as a function of the angle separating $x$ and $y$. In particular, for angles under 12.6 degrees (or 0.22 radians), the error is under 1%, then quickly decays. Since we deal with finer samplings anyway, the additional error is a very cheap price to pay in comparison to the decrease in complexity.
Set: $\Gamma = S^2 \times \cdots \times S^2$ ($N_d$ copies)
Tangent spaces: $T_\gamma\Gamma = T_{\gamma_1}S^2 \times \cdots \times T_{\gamma_{N_d}}S^2$
Projector: $P_\gamma : \mathbb{R}^{3 \times N_d} \to T_\gamma\Gamma : v \mapsto P_\gamma(v) = w$, $w_i = P_{\gamma_i}(v_i)$
Inner product: $\langle v, w \rangle_\gamma = \sum_{i=1}^{N_d} \langle v_i, w_i \rangle_{\gamma_i}$
Vector norm: $\|v\|_\gamma = \sqrt{\langle v, v \rangle_\gamma}$
Exponential: $\mathrm{Exp}_\gamma(v) = \tilde\gamma$, $\tilde\gamma_i = \mathrm{Exp}_{\gamma_i}(v_i)$
Table 4.2: Geometric toolbox for the Riemannian manifold $\Gamma = S^2 \times \cdots \times S^2$
$$E_d(\gamma) = \frac{1}{2}\sum_{i=1}^{N} w_i \arccos^2(p_i^T \gamma_{s_i})$$
$$E_v(\gamma) = \frac{1}{2}\sum_{i=1}^{N_d - 1} \frac{1}{\Delta\tau_i} \arccos^2(\gamma_i^T \gamma_{i+1})$$
$$E_a(\gamma) = \frac{1}{2}\left[ \frac{\Delta\tau_1}{2}\,\|a_1\|^2 + \sum_{i=2}^{N_d - 1} \frac{\Delta\tau_{i-1} + \Delta\tau_i}{2}\,\|a_i\|^2 + \frac{\Delta\tau_{N_d - 1}}{2}\,\|a_{N_d}\|^2 \right] \qquad (4.4)$$
$E_v$ follows from choosing the forward order 1 formula for velocity as well as the rectangle method for the weights $\alpha_i$. In particular, $\alpha_{N_d} = 0$, hence we do not need backward differences. $E_v$ has a very simple form because the norm of $\mathrm{Log}_x(y)$ is simply the geodesic distance between $x$ and $y$. We choose the trapezium method for the weights $\beta_i$. The main reason for doing so is purely aesthetic, as the weights appearing in the finite difference formulas cancel out the integration weights. The $\Delta\tau_i$'s are defined as $\Delta\tau_i = \tau_{i+1} - \tau_i$.
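To make these conventions concrete, here is a minimal NumPy sketch of the velocity term $E_v$ of (4.4) (the helper names are ours, not those of the accompanying code):

```python
import numpy as np

def sphere_dist(x, y):
    # Geodesic distance on S^2 between unit vectors: the angle arccos(x^T y).
    return np.arccos(np.clip(x @ y, -1.0, 1.0))

def E_v(gamma, tau):
    # Velocity energy: (1/2) * sum_i arccos^2(gamma_i^T gamma_{i+1}) / dtau_i.
    # The rectangle rule sets the last weight alpha_{N_d} to zero, so only
    # forward differences gamma_i -> gamma_{i+1} appear.
    total = 0.0
    for i in range(len(gamma) - 1):
        dtau = tau[i + 1] - tau[i]
        total += sphere_dist(gamma[i], gamma[i + 1]) ** 2 / dtau
    return 0.5 * total
```

For a curve sampled uniformly (in angle and time) along a great circle of total angle $\Theta$ over total time $T$, every segment contributes equally and $E_v = \Theta^2/(2T)$ regardless of $N_d$, which is consistent with the later observation that refining a piecewise geodesic curve leaves the optimal objective value unchanged.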
Replacing Log by $L$ in the geometric finite differences for acceleration, equations (3.4) and (3.5), we define the $a_i$'s as follows:
$$a_1 = \frac{-2}{\Delta\tau_1\,\Delta\tau_2} L_{\gamma_1}(\gamma_2) + \frac{2}{\Delta\tau_2(\Delta\tau_1+\Delta\tau_2)} L_{\gamma_1}(\gamma_3)$$
$$a_i = \frac{2}{\Delta\tau_{i-1}+\Delta\tau_i}\left[ \frac{1}{\Delta\tau_i} L_{\gamma_i}(\gamma_{i+1}) + \frac{1}{\Delta\tau_{i-1}} L_{\gamma_i}(\gamma_{i-1}) \right], \quad \forall i \in 2\ldots N_d-1$$
$$a_{N_d} = \frac{-2}{\Delta\tau_{N_d-1}\,\Delta\tau_{N_d-2}} L_{\gamma_{N_d}}(\gamma_{N_d-1}) + \frac{2}{\Delta\tau_{N_d-2}(\Delta\tau_{N_d-1}+\Delta\tau_{N_d-2})} L_{\gamma_{N_d}}(\gamma_{N_d-2})$$
These will be easier to differentiate, hence making the next section less cumbersome. Remarkably, the $a_i$'s defined above via simplified geometric differences are the same as the ones we would obtain by computing the finite differences in the ambient space $\mathbb{R}^3$ and then projecting the results back onto the tangent planes using the $P_{\gamma_i}$'s.
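This last claim is easy to check numerically: since $P_x(x) = 0$, projecting the ambient finite difference gives exactly the simplified formula. A small sketch for an interior point (helper names are ours):

```python
import numpy as np

def proj(x, v):
    # Orthogonal projector onto T_x S^2: P_x(v) = (I - x x^T) v.
    return v - (x @ v) * x

def accel_simplified(gm, g0, gp, dtm, dtp):
    # Interior a_i from the simplified geometric differences, with L_x(y) = P_x(y).
    return 2.0 / (dtm + dtp) * (proj(g0, gp) / dtp + proj(g0, gm) / dtm)

def accel_projected(gm, g0, gp, dtm, dtp):
    # Standard second-order finite difference in R^3, projected onto T_{g0} S^2.
    fd = 2.0 / (dtm + dtp) * ((gp - g0) / dtp + (gm - g0) / dtm)
    return proj(g0, fd)
```

Both functions agree to machine precision, because the extra $-g_0$ terms in the ambient difference vanish under $P_{g_0}$.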
4.3 Gradient of the objective
Since $S^2$ is a Riemannian submanifold of $\mathbb{R}^3$, $\Gamma$ is a Riemannian submanifold of $\mathbb{R}^{3\times N_d}$ and, as such, theorem 2.4.5 applies. We can therefore write:

$$\mathrm{grad}\,E : \Gamma \to T\Gamma : \gamma \mapsto \mathrm{grad}\,E(\gamma) = P_\gamma \nabla \bar{E}(\gamma).$$

In the above equation, $\bar{E}$ is a scalar field on an open set of the ambient space $\mathbb{R}^{3\times N_d}$ containing $\Gamma$ and such that $\bar{E}|_\Gamma = E$; $T\Gamma$ is the tangent bundle of $\Gamma$, see definition 2.2.5.
This makes it a lot easier to derive the gradient of $E$, equation (4.3). We just need to provide the gradients on $\Gamma$ of the functions defined in equations (4.4), interpreted as smooth scalar fields defined on an open set of $\mathbb{R}^{3\times N_d}$ containing $\Gamma$.
The following atoms are enough to compute $\nabla E$, hence $\mathrm{grad}\,E$ (with $x, q \in S^2$, $g : \mathbb{R}^3 \to \mathbb{R}^3$):

$$\nabla\left(x \mapsto \arccos^2(q^T x)\right)(x) = \frac{-2\arccos(q^T x)}{\sqrt{1-(q^T x)^2}}\, q$$
$$\nabla\left(x \mapsto \|g(x)\|^2\right)(x) = 2(Jg(x))^T g(x)$$
$$J\left(x \mapsto L_q(x)\right)(x) = P_q$$
$$J\left(x \mapsto L_x(q)\right)(x) = -(x^T q)I - xq^T$$
$J$ denotes the Jacobian and $\nabla$ the gradient of real-valued functions. A completely explicit derivation of $\mathrm{grad}\,E$ would be overly cumbersome; the material above is sufficient to obtain the needed expressions. The derivation does not present any challenges, since it reduces to standard differentiation of a real function followed by a projection (theorem 2.4.5).
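As a sanity check, the first atom can be verified against central finite differences (a quick sketch; the function names are ours):

```python
import numpy as np

def grad_arccos2(q, x):
    # Ambient gradient of x -> arccos^2(q^T x), the first atom above.
    c = q @ x
    return -2.0 * np.arccos(c) / np.sqrt(1.0 - c * c) * q

def num_grad(f, x, h=1e-6):
    # Central finite differences, used only for checking the formula.
    g = np.zeros_like(x)
    for k in range(x.size):
        e = np.zeros_like(x)
        e[k] = h
        g[k] = (f(x + e) - f(x - e)) / (2.0 * h)
    return g
```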
4.4 Results and comments
We implemented algorithm 4 for $\Gamma = S^2 \times \cdots \times S^2$ to optimize objective (4.3). We skip over the technical details of the implementation: since the material exposed in this document is still a proof of concept, we do not deem it important to delve into such details yet. The accompanying code was written with care, and the interested reader can inspect it for further information.
We divide this section in two parts. The first one qualitatively describes the kind of solutions we expect based on the tuning parameters $\lambda$ and $\mu$; practical results produced by our algorithms are shown to illustrate the agreement between predicted and obtained curves. The second part deals with the behavior our algorithms exhibit when converging toward solutions. In particular, we show that CG methods are usually more efficient than the steepest descent method and behave better (essentially reducing oscillations around the solution). Also, what we later refer to as our refinement technique, algorithm 6, is shown to speed up convergence and increase reliability.
[Figure 4.3 here: the sphere colored by the value of $E$, with a colorbar from 0 to 3.5.]

Figure 4.3: The red points are the data on the sphere, with identical weights. The color on the sphere gives the value of the scalar field $E$, i.e., the sum of squared geodesic distances to the data. The minimum of $E$ corresponds to the mean of the data, the single white dot. A very good initial guess is to project the arithmetic mean in $\mathbb{R}^n$ back on the sphere by normalizing it. Convergence to six digits of accuracy is typically achieved in under four iterations.
4.4.1 Types of solutions
Zeroth order
Setting $\lambda = \mu = 0$ yields an objective function $E$ that completely disregards smoothness and just tries to fit the discretization points $\gamma_i$ to the associated data. Discretization points with no data point attached become irrelevant. This makes little sense unless $N_d = 1$, in which case the only point we search for corresponds to the weighted mean of the data on the manifold. In this setting, we ignore the time labels $t_i$. The objective we minimize is:
$$E : S^2 \to \mathbb{R} : \gamma \mapsto E(\gamma) = \frac{1}{2}\sum_{i=1}^{N} w_i \arccos^2(p_i^T \gamma)$$
This is to be compared to the usual definition of the arithmetic mean in $\mathbb{R}^n$:

$$\bar{p} = \arg\min_{\gamma\in\mathbb{R}^n} \frac{1}{2}\sum_{i=1}^{N} w_i \|p_i - \gamma\|^2 = \frac{\sum_{i=1}^{N} w_i p_i}{\sum_{i=1}^{N} w_i}$$
Of course, the explicit formula for $\bar{p}$ as a linear combination of the $p_i$'s does not make sense on $S^2$ and we therefore use our numerical algorithms instead¹. Figure 4.3 shows an example. An excellent initial guess for the CG algorithm is the Euclidean mean projected back on the sphere, i.e., $\gamma^0 = \bar{p}/\|\bar{p}\|$. Typically, convergence to 6 significant digits is achieved in under 4 iterations.
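A bare-bones version of this mean computation can be written with the exponential and logarithmic maps of table 4.1 (this is a plain fixed-step gradient iteration, not the CG method used in the text; the helper names are ours):

```python
import numpy as np

def sph_exp(x, v):
    # Exponential map on S^2: follow the great circle from x along v in T_x S^2.
    n = np.linalg.norm(v)
    return x if n < 1e-16 else np.cos(n) * x + np.sin(n) * v / n

def sph_log(x, y):
    # Logarithmic map on S^2: tangent vector at x pointing toward y,
    # with norm equal to the geodesic distance.
    p = y - (x @ y) * x
    n = np.linalg.norm(p)
    return np.zeros(3) if n < 1e-16 else np.arccos(np.clip(x @ y, -1.0, 1.0)) * p / n

def sphere_mean(points, weights, iters=50):
    # Weighted mean on S^2: step along the negative gradient of
    # E(x) = (1/2) sum_i w_i dist^2(x, p_i), which is -sum_i w_i Log_x(p_i).
    w = np.asarray(weights, float)
    w = w / w.sum()
    x = points[0] / np.linalg.norm(points[0])
    for _ in range(iters):
        x = sph_exp(x, sum(wi * sph_log(x, p) for wi, p in zip(w, points)))
    return x
```

For two equally weighted points placed symmetrically about the north pole, the iteration lands on the pole, as the geometry dictates.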
First order
In $\mathbb{R}^n$, setting $\lambda > 0$ and $\mu = 0$ disregards second order derivatives, yielding piecewise affine regression curves. Solving the corresponding problem on $S^2$, we expect solutions to be piecewise geodesic. This is indeed what we observe. This means that a solution is completely specified by its breaking points, i.e., one $\gamma_i$ associated to each different data time label $t_i$. In the continuous case, Machado et al. showed the same behavior in [MSH06]. It would be interesting to check whether the breaking points are the same following both the continuous and our discrete approach.

¹ Interestingly, since $S^2$ is compact and $E$ is continuous, we can just as easily define and compute the point that is as far away as possible from the data, as the maximizer of $E$; this was not possible in $\mathbb{R}^n$.
As λ goes to 0, we expect the solutions to converge toward piecewise geodesic interpolation,
since reducing the distance between the curve and the data will become increasingly important.
As λ goes to inﬁnity, we expect the curve to collapse toward the mean of the data, since the
length of the curve will be highly penalized. We show a few numerical results corroborating our
intuition. Figure 4.4 shows diﬀerent solutions for three diﬀerent λ’s.
Second order
In $\mathbb{R}^n$, setting $\lambda = 0$ and $\mu > 0$ disregards first order derivatives and penalizes second order derivatives, yielding solutions in a well-known class: cubic splines. Cubic splines are piecewise cubic curves constrained to be of class $C^2$ (in the continuous case). These are characterized by a continuous, piecewise affine acceleration profile $\ddot\gamma(t)$ along each dimension, with breaking points at the data times. This is indeed what we obtain when solving the problem in $\mathbb{R}^n$ as described in chapter 1 (acceleration is computed with standard finite differences).

We may expect to observe this same behavior on $S^2$. The results depicted on figure 4.5 (where, for visualization, a minus sign has been added to the acceleration for times $t > \frac{1}{2}$, since the curvature changed at that point) indeed exhibit a compatible acceleration profile. But figure 4.6 shows a different situation, where the acceleration profile is only continuous and piecewise differentiable: the profile segments are curved. This surprising result was already predicted and quantified in a continuous setting by Machado and Silva Leite in [MS06, Thm 4.4]. The result essentially gives a differential equation for the connecting segments along with continuity constraints, notably showing that the fourth covariant derivative of $\gamma$ does not, in general, vanish. Figure 4.6 shows that a similar effect exists in the discrete case. It would be interesting to investigate whether the effects are quantitatively the same or not. We expect this to be the case.
Tuning µ, again, gives the user access to a whole family of curves. It is important to realize
that, regardless of µ, the length of the curve is not penalized. As µ goes to 0, ﬁtting the data
becomes increasingly important, yielding near interpolation. As µ goes to inﬁnity, we expect the
curve to converge toward a geodesic regression of the data. This is indeed the behavior we can
see on ﬁgure 4.5 and ﬁgure 4.6.
Geodesic regression is such an important special case that we give another illustration of it on figure 4.7. Of course, our general approach is not well suited to it, because of the need to make $\mu$ increase to infinity. Moreover, we deal with too many degrees of freedom considering the simplicity of the targeted object. Machado and Silva Leite, in [MS06], exploited the fact that a geodesic is completely defined by a point $p$ on $S^2$ and a vector $v \in T_p S^2$ to exhibit a set of necessary conditions for a geodesic to be optimal for the regression problem. Sadly, these equations are complicated and the authors do not try to solve them explicitly. In future work, we would like to try our descent algorithms on the set of pairs $(p, v)$, which can be identified with the tangent bundle, hence is a manifold in its own right. The tangent bundle of a Riemannian manifold can be endowed with a metric in various ways; this is a question worth investigating.
Mixed
When $\lambda$ and $\mu$ are both positive, there is no clear classification of the solutions, even in $\mathbb{R}^n$. Reasoning on the first and second order families described above can give some intuition as to how one should tune the parameters to obtain the desired curve. We show an example on figure 4.8.
[Figure 4.4 here: three spheres with piecewise geodesic solutions for $\lambda = 10^{-3}$, $10^{-2}$ and $10^{-1}$ (all with $\mu = 0$), with speed and acceleration profiles against time.]

Figure 4.4: In this figure, $p_1 = p_4$ is the upper point. By tuning $\lambda$ with $\mu$ set to zero, one can generate a family of piecewise geodesic curves ranging from near interpolation to a curve collapsed onto the mean point of the data. We only need to compute the end points and the breaking points, i.e., the $\gamma(t_i)$'s. Numerical experiments indeed confirm that optimizing a finer curve with $\gamma_i$'s associated to intermediate times $\tau_i$ gives the same breaking points (up to numerical precision), while the additional points lie on the geodesics joining them, as they should. The optimal objective values are also the same. These statements hold for the definition of $E_v$ given in equation (4.4), since integration of a constant function with the rectangle method is exact. This observation is of great practical importance. Indeed, we know exactly how to choose the discretization of $\gamma$ to optimize at minimal computational cost. We also know that setting the $\gamma_i$'s on the data points is a good initial guess for the descent method. On this example, convergence to a curve such that the norm of $\mathrm{grad}\,E$ is less than $10^{-8}$ is achieved in, respectively, 6, 11 and 15 CG iterations with the Fletcher-Reeves coefficients and the backtracking-interpolation step choosing algorithm. The speed and acceleration profiles have been computed using true geometric finite differences, equations (3.4) and (3.5). Data at the breaking points has been removed. For the spheres from left to right, the speed and acceleration profile colors are respectively blue, green and red.
[Figure 4.5 here: three spheres with solutions for $\lambda = 0$ and $\mu = 10^{-4}$, $10^{-3}$ and $10^{0}$, with speed and acceleration profiles against time.]

Figure 4.5: $\lambda = 0$, $\mu > 0$. The speed and acceleration profiles are computed with geometric finite differences using the exact logarithm. For the spheres from left to right, the profile colors are respectively blue, green and red. The acceleration profiles, as displayed with corrected sign on the second half, are piecewise affine (up to numerical precision). This is consistent with cubic splines in $\mathbb{R}^n$, but is atypical on manifolds. The near-geodesic solution is the hardest to compute; see figure 4.7 for more details. The flat segments at the end points ($t = 0$ and $t = 1$) are caused by our implementation of $E$, which artificially sets the integration weights $\beta_1$ and $\beta_{N_d}$ to zero to spare us the burden of computing the gradient of unilateral differences. This only slightly affects the solution.
[Figure 4.6 here: three spheres with solutions for $\lambda = 0$ and $\mu = 10^{-4}$, $10^{-2}$ and $10^{1}$, with speed and acceleration profiles against time.]

Figure 4.6: This figure is to be compared with figure 4.5. Notice how the acceleration profiles are not piecewise affine anymore. Previous work by Machado and Silva Leite with continuous curves, see [MS06], predicted this departure from what can be observed with cubic splines in $\mathbb{R}^n$. This numerical experiment shows the existence of a similar effect for discretized curves.
Figure 4.7: When the acceleration on $\gamma \in \Gamma$ is evaluated using geometric finite differences with the exact logarithm on $S^2$, it is zero if and only if there is a geodesic $\gamma(t)$ such that $\gamma(\tau_i) = \gamma_i$ for each $i$, see section 3.1. Hence, when setting $\lambda = 0$ and some large value for $\mu$, it is sufficient to find (optimize) the position of $\gamma(t_i)$ for each distinct data time $t_i$. This observation yields an effective, reliable manner of computing the geodesic regression. Unfortunately, when we use the simplified logarithm and the $\Delta\tau_i$'s are not all equal, the claim is not true anymore. The above procedure still gives a nice initial guess for the optimization algorithm, but we further need to refine the curve (increasing $N_d$, hence also the computational cost) to reduce the impact of the generalized logarithm on the final accuracy. With large $N_d$, the optimization algorithm tends to "be satisfied" with any geodesic not too far from the data. To avoid this, an efficient workaround is to optimize with $N_d = N$ first (placing the initial guess on the data), then successively reoptimize while increasing $N_d$ by curve refinement, algorithm 6. The curve shown in this figure has maximum acceleration of $1.5 \cdot 10^{-4}$ for a total length of 1.79 over 1 unit of time. The parameter values are: $\lambda = 0$, $\mu = 10$.
[Figure 4.8 here: a sphere with the solution for $\lambda = 10^{-5}$, $\mu = 10^{-6}$, with speed and acceleration profiles against time.]

Figure 4.8: When $\lambda > 0$ and $\mu > 0$, the solution lives in a large family of curves. The acceleration profile is still continuous and looks piecewise differentiable, but we do not have an easy characterization of the family anymore (not even in the Euclidean case). The flat pieces at the end points on the acceleration plot were explained in the comments of figure 4.5. Notice how picking small positive values for both tuning parameters yields a near-interpolatory yet smooth-looking curve.
4.4.2 Numerical performances
Aside from the data volume, i.e., N, the computational cost depends on a number of parameters.
Namely:
• The core descent algorithm used,
• The step choosing algorithm,
• The quality of the initial guess,
• The number of discretization points, and
• The parameters λ and µ.
The relationship between numerical performance and factors such as $N_d$, $\lambda$ and $\mu$ is complex to describe. For the other factors, we give a few guidelines hereafter.
The core descent algorithms we introduced in chapter 3 can be anything from a steepest descent method to a Polak-Ribière or Fletcher-Reeves flavored CG. The difference between Fletcher-Reeves and Polak-Ribière is not, in general, significant. They usually both perform better than a steepest descent. Detailed algorithms to apply second order descent methods on manifolds, including trust region methods, are described in [AMS08]. We would like to try these in future work.
We select algorithm 5 as our step length choosing method. It is particularly well suited to our problem since, for small steps, the line-search function $\phi(\alpha) = E(R_\gamma(\alpha p))$ can be (amazingly) well approximated by a quadratic. This comes from the fact that, in $\mathbb{R}^n \times \cdots \times \mathbb{R}^n$, our objective function $E$ is quadratic and $\Gamma$ locally resembles a Euclidean space. Some alternative step choosing algorithms require the computation of $\phi'(\alpha)$, the derivative of the line-search function. For the sake of completeness, we provide an explicit formula for it in appendix A.
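The core of such an interpolation-based step choice is the quadratic fit itself. Here is our own minimal version of that step (not necessarily identical to algorithm 5):

```python
def quadratic_step(phi0, dphi0, alpha, phi_alpha):
    # Fit q(a) = phi0 + dphi0 * a + c * a^2 through phi(0), phi'(0) and
    # phi(alpha), then return the minimizer of q (meaningful when c > 0).
    c = (phi_alpha - phi0 - dphi0 * alpha) / alpha ** 2
    return -dphi0 / (2.0 * c)
```

When $\phi$ really is quadratic, as is nearly the case here for small steps, this recovers the exact minimizer from a single extra function evaluation.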
Picking a smart initial guess is important to ensure convergence. Fortunately, numerical experiments tend to show that the attraction basin of the global minimum is surprisingly large (but not equal to $\Gamma$). Even picking $\gamma^0$ at random often works. Here are a few choices we recommend depending on the scenario:

• For the data mean, choose $\gamma^0 = \bar{p}/\|\bar{p}\|$, where $\bar{p}$ is the arithmetic mean of the data in $\mathbb{R}^3$;
• For piecewise geodesic regression, choose $\gamma^0_i = p_i$;
• For geodesic regression, choose $\gamma^0_i = p_i$, optimize, then successively refine the curve and reoptimize;
• For mixed order regression, solve the problem in $\mathbb{R}^3$ (chapter 1) then project the solution back on the sphere, or do the same as for geodesic regression.
As a rule of thumb, it is a good idea to presolve the problem with low $N_d$. Typically, presolving with one discretization point $\gamma_i$ for each distinct data time label $t_i$ yields good performance. Placing the initial curve $\gamma^0$ on the data for this presolving step has never failed in our numerical experiments, regardless of the scenario. The refinement step we refer to is highlighted in algorithm 6. It requires the ability to compute the mean of two points on the manifold². The main advantage of this algorithm is that it lets us set a tolerance on the spacing between two discretization points without oversampling. In other words, it is fine to work with curves that alternate slow and fast segments. The resulting time sampling $\tau_1, \ldots, \tau_{N_d}$ will, in general, be non-homogeneous. The added advantage is that most of the optimization is done on fewer points than the final discretization requires.

² If that is overly complicated, one can always use $\gamma_i$ or $\gamma_{i+1}$ as an approximation for the mean of these two points. It is then up to the optimization process to move the new points to the right places.
Algorithm 6: Iterative curve refinement strategy

input: An initial curve $\gamma$; tol, a refinement tolerance.
output: An optimized curve $\gamma$ such that two successive points are separated by a distance less than tol.

continue := true
while continue do
    $\gamma$ := optimize($\gamma$)
    continue := false
    foreach $i$ such that dist($\gamma_i$, $\gamma_{i+1}$) > tol do
        Insert mean($\gamma_i$, $\gamma_{i+1}$) between them, at time $(\tau_i + \tau_{i+1})/2$
        continue := true
    end
end
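In Python, algorithm 6 can be sketched as follows (the `optimize`, `dist` and `mean` callables stand in for the manifold-specific routines):

```python
def refine_and_optimize(gamma, tau, optimize, dist, mean, tol):
    # Alternate optimization and midpoint insertion until all successive
    # points are closer than tol, as in algorithm 6.
    cont = True
    while cont:
        gamma = optimize(gamma, tau)
        cont = False
        i = 0
        while i < len(gamma) - 1:
            if dist(gamma[i], gamma[i + 1]) > tol:
                tau.insert(i + 1, 0.5 * (tau[i] + tau[i + 1]))
                gamma.insert(i + 1, mean(gamma[i], gamma[i + 1]))
                cont = True
                i += 2  # skip the freshly inserted midpoint on this pass
            else:
                i += 1
    return gamma, tau
```

With scalar "points", an identity optimizer and the Euclidean midpoint, refining the segment $[0, 1]$ with tol $= 0.3$ produces the uniform grid $0, 0.25, 0.5, 0.75, 1$.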
The CG algorithm performed really well on the piecewise geodesic regression (order 1) problems. Figure 4.9 shows the evolution of step length, gradient norm and objective value over the 15 descent iterations. Figure 4.10 shows the same thing with a steepest descent algorithm. CG clearly beats SD. Both exhibit a nice behavior.

Second order problems need more iterations. Furthermore, when $\mu$ has a high value (which is desirable when near-geodesic regression is needed), the raw algorithm tends to get stuck with a close-to-geodesic curve not too far away from the data. This stems from the fact that $\mu E_a$ overshadows $E_d$. To circumvent this, we use algorithm 6. The convergence behavior for the second case exposed in figure 4.5 is shown in figure 4.11. Only the pre-optimization step is shown, i.e., the placement of the $\gamma_i$'s corresponding to the $p_i$'s. Subsequent steps (refinement + optimization) can be rapidly carried out (around 30 iterations). Fletcher-Reeves and Polak-Ribière perform about the same. The steepest descent method does as good a job as CG on the first two instances, and takes as much as five times more iterations on the close-to-geodesic example for comparable precision.

Computing geodesic regressions is an interesting problem. As we showed before, it turns out to be the most challenging one for our algorithms. Figure 4.12 shows step length and gradient norm evolution over a huge number of iterations (1250) of the CG algorithm for the geodesic regression shown in figure 4.7. We have other algorithms in mind for geodesic regression; we will explore them in future work.
[Figure 4.9 here: step length, gradient norm and objective function value against iteration number.]

Figure 4.9: Convergence of the Fletcher-Reeves CG algorithm on the most difficult case of figure 4.4, i.e., $\lambda = 10^{-1}$. The behavior of the algorithm is excellent; it is slightly better than with Polak-Ribière. The evolution of the norm of the gradient is typical for superlinear convergence. Notice how the objective value quickly reaches a plateau. To assess convergence, it is best to look at the gradient norm evolution.
[Figure 4.10 here: step length, gradient norm and objective function value against iteration number.]

Figure 4.10: Convergence of the steepest descent algorithm on the same problem as figure 4.9. The behavior of the algorithm is good, but it needs more than twice as many iterations as its CG counterpart. The evolution of the norm of the gradient is typical for linear convergence.
[Figure 4.11 here: step length, gradient norm and objective function value against iteration number.]

Figure 4.11: Optimization of the $\gamma_i$'s corresponding to the $p_i$'s for the second case displayed in figure 4.5. Convergence is achieved in about ten iterations. All our algorithms perform about the same except for the close-to-geodesic regression example, for which the CG algorithm performs much better. In subsequent refinement steps, the points computed in this step will be slightly displaced.
[Figure 4.12 here: step length and gradient norm over 1250 iterations.]

Figure 4.12: Optimization with CG for the problem shown in figure 4.7, with initial guess for $\gamma$ equal to the data. The parameter $\mu$ is set to 10 and the optimization is carried out by successive refinements. After about 750 iterations, the algorithm reaches a threshold: the gradient norm has been reduced by 8 orders of magnitude, and numerical precision becomes an issue. We then refine the curve by adding points on the geodesics joining the $\gamma_i$'s and reoptimize in a few steps.
Chapter 5

Discrete curve fitting on positive-definite matrices
In the previous chapter, we built and analyzed a method to fit curves to data on the sphere. One of the main advantages of that manifold is that it is very easy to visualize the data and how the algorithms behave. In that setting, intuition can get you a long way and one might argue that the differential geometry background is expendable. $S^2$ was an ideal toy example to test our ideas, but we need to work on more complicated manifolds to justify the complexity of our method.
In this chapter, we propose to study curve fitting on a manifold that is both difficult to picture and of practical interest: the set of positive-definite matrices $P_n^+$, endowed with a Riemannian metric. As is, $P_n^+$ is an open convex cone. With the usual metric, $P_n^+$ is thus not complete¹. Completeness is a desirable property because our optimization algorithms produce sequences of points on $P_n^+$ and we certainly want these sequences to converge in $P_n^+$. A possible workaround is to use a metric that deforms the space so as to stretch the limits of the cone to infinity. One of the metrics that does this, later referred to as the affine-invariant metric, is used in this chapter, and the methodology developed in this document is applied to the resulting manifold. As we will see, computations are more involved, mainly because of the need for derivatives of matrix functions.

We then exhibit other ways of solving similar problems. We first observe that, $P_n^+$ being a convex set, we can interpolate between points by using piecewise linear interpolation. This does not give us the freedom to tune between fitting and smoothness, but it is nevertheless a very simple method that we need to compare our algorithms against. Yet another alternative is to exploit the convex nature of our problem: modern convex solvers can readily deal with the usual matrix norms. Next, we investigate an interesting metric introduced by Arsigny et al. in [AFPA08], called the Log-Euclidean metric. It endows $P_n^+$ with a vector space structure, effectively enabling us to use the simple mathematics developed in the first chapter to solve our problem rapidly. We conclude this chapter by testing all described methods and comparing them against each other.

¹ Quoting [Wei10], a complete metric space is a metric space in which every Cauchy sequence is convergent. In this definition, the metric refers to the Riemannian distance induced by the Riemannian metric.
5.1 $P_n^+$ geometric toolbox
We note $H_n$ the set of symmetric matrices of size $n$ and $P_n^+ \subset H_n$ the set of positive-definite matrices of size $n$. We use the notation $A \in P_n^+ \Leftrightarrow A \succ 0$. The embedding space $H_n$ is endowed with the usual metric

$$\langle \cdot, \cdot\rangle : H_n \times H_n \to \mathbb{R} : (H_1, H_2) \mapsto \langle H_1, H_2\rangle = \mathrm{trace}(H_1 H_2) = \mathrm{trace}(H_2 H_1).$$

The norm associated to this metric is the Frobenius norm

$$\|H\| = \sqrt{\langle H, H\rangle} = \sqrt{\mathrm{trace}(H^2)} = \sqrt{\textstyle\sum_{i,j} H_{ij}^2}.$$
The set of positive-definite matrices is an open convex subset of $H_n$, hence it is not complete. Giving $P_n^+$ a Riemannian manifold structure by restricting the above metric to $P_n^+$ would thus yield a non-complete manifold. We can endow $P_n^+$ with a different Riemannian metric in order to make the resulting manifold complete. One such metric is given in table 5.1, see [AFPA08, Bha07]. Following Arsigny et al. in [AFPA08], we call it the affine-invariant metric. Since the affine-invariant metric is not the restriction of the metric on $H_n$ to $P_n^+$, with this metric, $P_n^+$ is not a Riemannian submanifold of $H_n$. Regardless of the metric, $P_n^+$ being an open subset of $H_n$, the tangent space $T_A P_n^+$ at any positive matrix $A$ corresponds to the set $H_n$. We illustrate this on figure 5.1. From now on, $P_n^+$ refers to the set of positive-definite matrices endowed with the geometric toolbox described in table 5.1.
Set: $P_n^+ = \{A \in \mathbb{R}^{n\times n} : A = A^T \text{ and } x^T A x > 0 \ \forall x \in \mathbb{R}^n, x \neq 0\}$
Tangent spaces: $T_A P_n^+ \equiv H_n = \{H \in \mathbb{R}^{n\times n} : H = H^T\}$
Inner product: $\langle H_1, H_2\rangle_A = \langle A^{-1/2} H_1 A^{-1/2},\ A^{-1/2} H_2 A^{-1/2}\rangle$
Vector norm: $\|H\|_A = \sqrt{\langle H, H\rangle_A}$
Distance: $\mathrm{dist}(A, B) = \|\log(A^{-1/2} B A^{-1/2})\|$
Exponential: $\mathrm{Exp}_A(H) = A^{1/2} \exp(A^{-1/2} H A^{-1/2}) A^{1/2}$
Logarithm: $\mathrm{Log}_A(B) = A^{1/2} \log(A^{-1/2} B A^{-1/2}) A^{1/2}$
Mean: $\mathrm{mean}(A, B) = A^{1/2} (A^{-1/2} B A^{-1/2})^{1/2} A^{1/2}$

Table 5.1: Geometric toolbox for the Riemannian manifold $P_n^+$ endowed with the affine-invariant metric.
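The whole toolbox reduces to applying scalar functions to eigenvalues, so it can be sketched compactly in NumPy (our helper names; a minimal sketch, not the accompanying code):

```python
import numpy as np

def sym_fun(A, f):
    # Apply a real function to a symmetric matrix through A = U D U^T.
    lam, U = np.linalg.eigh(A)
    return U @ np.diag(f(lam)) @ U.T

def spd_dist(A, B):
    # Affine-invariant distance: || log(A^{-1/2} B A^{-1/2}) ||.
    Ais = sym_fun(A, lambda l: l ** -0.5)
    return np.linalg.norm(sym_fun(Ais @ B @ Ais, np.log))

def spd_log(A, B):
    # Log_A(B) = A^{1/2} log(A^{-1/2} B A^{-1/2}) A^{1/2}.
    As = sym_fun(A, np.sqrt)
    Ais = sym_fun(A, lambda l: l ** -0.5)
    return As @ sym_fun(Ais @ B @ Ais, np.log) @ As

def spd_exp(A, H):
    # Exp_A(H) = A^{1/2} exp(A^{-1/2} H A^{-1/2}) A^{1/2}.
    As = sym_fun(A, np.sqrt)
    Ais = sym_fun(A, lambda l: l ** -0.5)
    return As @ sym_fun(Ais @ H @ Ais, np.exp) @ As
```

As expected for an exponential/logarithm pair, $\mathrm{Exp}_A(\mathrm{Log}_A(B)) = B$ holds up to rounding error.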
[Figure 5.1 here.]

Figure 5.1: Tangent vectors to an open subset $\mathcal{E}^*$ of a vector space $\mathcal{E}$. Figure courtesy of Absil et al., [AMS08].
In loose terms, the metric described in table 5.1 stretches the limits of $P_n^+$ to infinity. To visualize this, it is instructive to particularize the exponential map to positive numbers, i.e., positive-definite matrices of size 1. Starting at $A = 1$ and moving along the tangent vector $H = -1$, we follow the curve $\mathrm{Exp}_A(tH) = \exp(-t)$: we never reach non-positive numbers for finite $t$. The space deformation is stronger close to singular matrices.
We use the matrix exponential and logarithm. These are defined as series derived from the Taylor expansions of their scalar equivalents:

$$\exp(A) = I + A + \frac{1}{2}A^2 + \ldots = \sum_{k=0}^{\infty} \frac{1}{k!} A^k \quad (5.1)$$
$$\log(A) = (A - I) - \frac{1}{2}(A - I)^2 + \frac{1}{3}(A - I)^3 - \ldots = \sum_{k=1}^{\infty} \frac{(-1)^{k+1}}{k} (A - I)^k \quad (5.2)$$
The series for exp converges for any square matrix $A$. The series for log is derived from the real series $\log(1+x) = x - x^2/2 + x^3/3 - \ldots$, which we know converges if $|x| < 1$ (and diverges if $|x| > 1$). We thus expect the matrix counterpart to converge when $\rho(A - I) < 1$, where $\rho(X)$ is the spectral radius of $X$:

$$\rho(X) = \max(|\lambda_1|, \ldots, |\lambda_n|), \quad \text{with } \lambda_1, \ldots, \lambda_n \text{ the eigenvalues of } X.$$
This is indeed the case. On the other hand, we know that $\log(x)$ is well defined for $x > 0$. According to Higham, see [Hig08, problem 11.1], a Taylor series can be used to define the principal logarithm of any matrix having no eigenvalue on $\mathbb{R}^-$. A hint to this is that, using the identity

$$\log(A) = s \log(A^{1/s}),$$

it is always possible, with $s$ big enough, to bring the eigenvalues of $A^{1/s}$ into a circle centered at 1 (in the complex plane) and with radius strictly smaller than 1. Consequently, the matrix $A^{1/s} - I$ has spectral radius less than 1 and the Taylor series (5.2) can be used. In particular, the matrix logarithm is well defined as such over $P_n^+$. Furthermore, Bhatia and Arsigny et al., see [Bha07, AFPA08], show that the restriction

$$\exp : H_n \to P_n^+ : H \mapsto \exp(H)$$

is a diffeomorphism between the metric spaces $H_n$ and $P_n^+$. The inverse mapping is the restriction $\log : P_n^+ \to H_n$.
Every matrix $H \in H_n$ can be diagonalized as $H = UDU^T$ with $U^T U = I$, the identity matrix, and $D$ a diagonal matrix composed of the real eigenvalues $\lambda_i$ of $H$. This yields easier formulas for the exponential:

$$\exp(H) = \sum_{k=0}^{\infty} \frac{1}{k!} (UDU^T)^k = U \left[ \sum_{k=0}^{\infty} \frac{1}{k!} D^k \right] U^T = U \exp(D) U^T,$$

where $\exp(D)$ is the diagonal matrix whose entries are $\exp(\lambda_i)$. The same goes for the logarithm of a positive matrix $A = UDU^T$:

$$\log(A) = U \log(D) U^T.$$
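These eigenvalue formulas translate directly into a working implementation of the diffeomorphism and its inverse (a minimal sketch; `eigh` computes the decomposition $UDU^T$ of a symmetric matrix):

```python
import numpy as np

def sym_exp(H):
    # exp(H) = U exp(D) U^T for symmetric H: the result is positive-definite.
    lam, U = np.linalg.eigh(H)
    return U @ np.diag(np.exp(lam)) @ U.T

def sym_log(A):
    # Principal logarithm log(A) = U log(D) U^T for positive-definite A.
    lam, U = np.linalg.eigh(A)
    return U @ np.diag(np.log(lam)) @ U.T
```

Round-tripping $H \mapsto \exp(H) \mapsto \log(\exp(H))$ recovers $H$, reflecting the bijection between $H_n$ and $P_n^+$.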
5.2 Curve space and objective function
The curve space on $P_n^+$ is given by

$$\Gamma = P_n^+ \times \cdots \times P_n^+ = (P_n^+)^{N_d}.$$

Each of the $N_d$ points $\gamma_i$ composing a curve $\gamma$ is a positive matrix. The tools shown in table 5.1 can be used on $\Gamma$ by componentwise extension. The objective function $E$ is defined as:

$$E : \Gamma \to \mathbb{R} : \gamma \mapsto E(\gamma) = E_d(\gamma) + \lambda E_v(\gamma) + \mu E_a(\gamma) \quad (5.3)$$
$$E_d(\gamma) = \frac{1}{2}\sum_{i=1}^{N} w_i \,\mathrm{dist}^2(p_i, \gamma_{s_i})$$
$$E_v(\gamma) = \frac{1}{2}\sum_{i=1}^{N_d-1} \frac{1}{\Delta\tau_i} \,\mathrm{dist}^2(\gamma_i, \gamma_{i+1})$$
$$E_a(\gamma) = \frac{1}{2}\left[ \frac{\Delta\tau_1}{2}\|a_1\|^2_{\gamma_1} + \sum_{i=2}^{N_d-1} \frac{\Delta\tau_{i-1}+\Delta\tau_i}{2}\|a_i\|^2_{\gamma_i} + \frac{\Delta\tau_{N_d-1}}{2}\|a_{N_d}\|^2_{\gamma_{N_d}} \right] \quad (5.4)$$
The $p_i$'s are positive-definite matrices. This is to be compared to the definition given in section 4.2. The acceleration vectors $a_i \in T_{\gamma_i} P_n^+ \equiv H_n$ are symmetric matrices defined by
$$a_1 = \frac{-2}{\Delta\tau_1\,\Delta\tau_2} \mathrm{Log}_{\gamma_1}(\gamma_2) + \frac{2}{\Delta\tau_2(\Delta\tau_1+\Delta\tau_2)} \mathrm{Log}_{\gamma_1}(\gamma_3),$$
$$a_i = \frac{2}{\Delta\tau_{i-1}+\Delta\tau_i}\left[ \frac{1}{\Delta\tau_i} \mathrm{Log}_{\gamma_i}(\gamma_{i+1}) + \frac{1}{\Delta\tau_{i-1}} \mathrm{Log}_{\gamma_i}(\gamma_{i-1}) \right], \quad \forall i \in 2\ldots N_d-1,$$
$$a_{N_d} = \frac{-2}{\Delta\tau_{N_d-1}\,\Delta\tau_{N_d-2}} \mathrm{Log}_{\gamma_{N_d}}(\gamma_{N_d-1}) + \frac{2}{\Delta\tau_{N_d-2}(\Delta\tau_{N_d-1}+\Delta\tau_{N_d-2})} \mathrm{Log}_{\gamma_{N_d}}(\gamma_{N_d-2}).$$
5.3 Gradient of the objective
We now compute the gradient of $E$ as defined in the previous section. This requires formulas for the derivatives of the matrix logarithm, which appears in the distance function and the logarithmic map defined in table 5.1. We derive a generic way of computing derivatives of functions of symmetric matrices. The way we derive the material in this section is inspired by [FPAA07, appendix] and [Bha07].

Let $A \in H_n$, $U$ an orthogonal matrix and $D = \mathrm{diag}(\lambda_1, \ldots, \lambda_n)$, such that $A = UDU^T$. Let $f$ be a smooth, real function defined by a Taylor series

$$f(x) = \sum_{k=0}^{\infty} a_k x^k.$$
Generalized to matrices, this yields

$$f(A) = \sum_{k=0}^{\infty} a_k A^k = U \,\mathrm{diag}(f(\lambda_1), \ldots, f(\lambda_n))\, U^T.$$

We would like to compute $Df(A)[H]$, the directional derivative of $f$ at $A$ in the direction $H \in H_n$. We differentiate the series term by term. To this end, we need the following result:

$$D(X \mapsto X^k)(A)[H] = \lim_{h\to 0} \frac{(A + hH)^k - A^k}{h} = \sum_{l=1}^{k} A^{l-1} H A^{k-l}.$$
Df(A)[H] =
∞
¸
k=1
a
k
k
¸
l=1
A
l−1
HA
k−l
= U
¸
∞
¸
k=1
a
k
k
¸
l=1
D
l−1
U
T
HUD
k−l
¸
U
T
= U Df(D)[U
T
HU] U
T
which is a simpler object to compute because D is diagonal. Let us write
˜
H = U
T
HU and
5.3. GRADIENT OF THE OBJECTIVE 53
M = Df(D)[
˜
H]. Entrywise, we get:
$$M_{ij} = \sum_{k=1}^{\infty} a_k \sum_{l=1}^{k} (D^{l-1} \tilde{H} D^{k-l})_{ij} = \sum_{k=1}^{\infty} a_k \sum_{l=1}^{k} \lambda_i^{l-1} \lambda_j^{k-l}\, \tilde{H}_{ij} = \tilde{H}_{ij} \sum_{k=1}^{\infty} a_k \frac{\lambda_j^k}{\lambda_i} \sum_{l=1}^{k} \left(\frac{\lambda_i}{\lambda_j}\right)^l.$$
Using the identity
¸
k
l=1
x
k
= x
1−x
k
1−x
, valid for x = 1, we distinguish two cases:
\[
\frac{\lambda_j^k}{\lambda_i} \sum_{l=1}^{k} \left( \frac{\lambda_i}{\lambda_j} \right)^l =
\begin{cases}
\dfrac{\lambda_i^k - \lambda_j^k}{\lambda_i - \lambda_j}, & \text{if } \lambda_i \neq \lambda_j, \\[2mm]
k \lambda_i^{k-1}, & \text{if } \lambda_i = \lambda_j.
\end{cases}
\]
Hence:
\[
M_{ij} = \tilde{H}_{ij}\, \tilde{f}(\lambda_i, \lambda_j),
\]
with:
\[
\tilde{f}(\lambda_i, \lambda_j) =
\begin{cases}
\dfrac{f(\lambda_i) - f(\lambda_j)}{\lambda_i - \lambda_j}, & \text{if } \lambda_i \neq \lambda_j, \\[2mm]
f'(\lambda_i), & \text{if } \lambda_i = \lambda_j.
\end{cases}
\]
The coefficients $\tilde{f}(\lambda_i, \lambda_j)$ are termed the first divided differences by Bhatia, see [Bha07, p. 60]. We build the matrix $\tilde{F}$ such that $\tilde{F}_{ij} = \tilde{f}(\lambda_i, \lambda_j)$. Then,
\[
M = \tilde{H} \odot \tilde{F},
\]
where $\odot$ stands for entrywise multiplication (Hadamard's product, also known as Schur's product). Setting $f = \log$, we explicitly compute $D\log(A)[H]$, $A \succ 0$, $H = H^T$, as follows:

1. Diagonalize $A$: $A = UDU^T$, $U$ orthogonal, $D = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$;
2. Compute $\tilde{H} = U^T H U$;
3. Compute $\tilde{F}$ (first divided differences of $\log$ based on the $\lambda_i$'s);
4. Compute $M = \tilde{H} \odot \tilde{F}$;
5. Compute $D\log(A)[H] = UMU^T$.

By construction, $D\log(A)[H]$ is symmetric.
Remark 5.3.1. As one can see from this algorithm, it is not critical that $H$ be symmetric nor that $A$ be positive-definite. As long as $A$ does not have eigenvalues on $\mathbb{R}^-$, diagonalizability is sufficient. We then use $U^{-1}$ instead of $U^T$.
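The five-step algorithm above is straightforward to implement. A minimal sketch, checked against a centered difference of `scipy.linalg.logm` (the function name `dlog` is ours, not from the accompanying software):

```python
import numpy as np
from scipy.linalg import logm

def dlog(A, H):
    """D log(A)[H] for a symmetric positive-definite A, via steps 1-5 above."""
    lam, U = np.linalg.eigh(A)                    # step 1: A = U diag(lam) U^T
    Ht = U.T @ H @ U                              # step 2: H~ = U^T H U
    L, R = lam[:, None], lam[None, :]
    close = np.isclose(L, R)
    denom = np.where(close, 1.0, L - R)
    F = np.where(close, 1.0 / L,                  # step 3: divided differences,
                 (np.log(L) - np.log(R)) / denom)  #   f'(lam) = 1/lam for f = log
    M = Ht * F                                    # step 4: Hadamard product
    return U @ M @ U.T                            # step 5

rng = np.random.default_rng(0)
S = rng.standard_normal((4, 4))
A = S @ S.T + 4.0 * np.eye(4)                     # random SPD matrix
H = rng.standard_normal((4, 4)); H = H + H.T      # symmetric direction
h = 1e-6                                          # centered finite difference
fd = np.real(logm(A + h * H) - logm(A - h * H)) / (2.0 * h)
err = np.linalg.norm(dlog(A, H) - fd)
```

The result is symmetric by construction, in line with the text.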
The objective function components $E_d$ and $E_v$ are linear combinations of squared distances. To compute their gradients, we need only derive an expression for $\operatorname{grad} f_A(X)$, with
\[
f_A(X) = \frac{1}{2} \operatorname{dist}^2(A, X) = \frac{1}{2} \left\langle \log\left(A^{-1/2} X A^{-1/2}\right), \log\left(A^{-1/2} X A^{-1/2}\right) \right\rangle.
\]
By using these identities:
\[
D(f \circ g)(X)[H] = Df(g(X))[Dg(X)[H]] \quad \text{(composition rule)},
\]
\[
D(X \mapsto \langle f(X), g(X) \rangle)(X)[H] = \langle Df(X)[H], g(X) \rangle + \langle f(X), Dg(X)[H] \rangle \quad \text{(product rule)},
\]
\[
D(X \mapsto AXB)(X)[H] = \lim_{h \to 0} \frac{A(X + hH)B - AXB}{h} = AHB,
\]
54 CHAPTER 5. DISCRETE CURVE FITTING ON POSITIVEDEFINITE MATRICES
we work out the following formula:
\[
Df_A(X)[H] = \left\langle D\log\left(A^{-1/2} X A^{-1/2}\right)\left[A^{-1/2} H A^{-1/2}\right], \log\left(A^{-1/2} X A^{-1/2}\right) \right\rangle.
\]
By definition 2.4.2, the gradient $\operatorname{grad} f_A(X)$ is the unique symmetric matrix verifying
\[
\forall H \in H_n, \quad Df_A(X)[H] = \langle \operatorname{grad} f_A(X), H \rangle_X.
\]
Considering $H_1, \ldots, H_{n(n+1)/2}$, an orthonormal basis of $H_n$ endowed with the inner product $\langle \cdot, \cdot \rangle$, we can compute an orthonormal basis $\bar{H}_1, \ldots, \bar{H}_{n(n+1)/2}$ of $H_n$ endowed with the inner product $\langle \cdot, \cdot \rangle_X$ as $\bar{H}_i = X^{1/2} H_i X^{1/2}$, since
\[
\langle \bar{H}_i, \bar{H}_j \rangle_X = \left\langle X^{-1/2} \bar{H}_i X^{-1/2}, X^{-1/2} \bar{H}_j X^{-1/2} \right\rangle = \langle H_i, H_j \rangle = \delta_{ij},
\]
where $\delta_{ij}$ is the Kronecker delta. We thus compute the gradient of $f_A$ at $X$ as follows:
\[
\operatorname{grad} f_A(X) = \sum_{k=1}^{n(n+1)/2} Df_A(X)[\bar{H}_k]\, \bar{H}_k = \sum_{k=1}^{n(n+1)/2} Df_A(X)\left[X^{1/2} H_k X^{1/2}\right] X^{1/2} H_k X^{1/2}.
\]
Collecting the relevant equations yields an explicit algorithm to compute $\operatorname{grad} f_A(X)$. Regrettably, this method requires $n(n+1)/2$ computations of $D\log$. We can do better.
Remembering that for square matrices $\operatorname{trace}(AB) = \operatorname{trace}(BA)$, and with $X = UDU^T$:
\begin{align*}
D\left(X \mapsto \tfrac{1}{2} \|\log(X)\|^2\right)(X)[H] &= \langle D\log(X)[H], \log(X) \rangle \\
&= \operatorname{trace}\left( U(\tilde{H} \odot \tilde{F}) U^T \, U \log(D) U^T \right) = \operatorname{trace}\left( (\tilde{H} \odot \tilde{F}) \log(D) \right) \\
&= \sum_{i=1}^{n} \log(\lambda_i)\, \tilde{H}_{ii} \tilde{F}_{ii} = \sum_{i=1}^{n} \log(\lambda_i)\, \lambda_i^{-1} \tilde{H}_{ii} \\
&= \operatorname{trace}\left( D^{-1} \log(D)\, \tilde{H} \right) = \operatorname{trace}\left( U D^{-1} U^T\, U \log(D) U^T\, U \tilde{H} U^T \right) \\
&= \operatorname{trace}\left( X^{-1} \log(X)\, H \right) = \left\langle X^{-1} \log(X), H \right\rangle,
\end{align*}
\begin{align*}
Df_A(X)[H] &= \left\langle A^{1/2} X^{-1} A^{1/2} \log\left(A^{-1/2} X A^{-1/2}\right), A^{-1/2} H A^{-1/2} \right\rangle \\
&= \left\langle X^{-1} \log(X A^{-1}), H \right\rangle = \left\langle \log(X A^{-1})\, X, H \right\rangle_X,
\end{align*}
where we used $A \log(B) A^{-1} = \log(ABA^{-1})$. Hence,
\[
\operatorname{grad} f_A(X) = \log(X A^{-1})\, X.
\]
This is a symmetric matrix. Indeed,
\[
\log(X A^{-1})\, X = \log\left( X^{1/2}\, X^{1/2} A^{-1} X^{1/2}\, X^{-1/2} \right) X = X^{1/2} \log\left( X^{1/2} A^{-1} X^{1/2} \right) X^{1/2}.
\]
From this equation, we also see that $\operatorname{grad} f_A(B) = -\operatorname{Log}_B(A)$, which makes perfect sense: to increase the distance between $A$ and $B$ as fast as possible with an infinitesimal change, $B$ ought to move away from $A$. By the symmetry $f_A(B) = f_B(A) = \frac{1}{2} \operatorname{dist}^2(A, B)$,
\[
\operatorname{grad}\left(A \mapsto \tfrac{1}{2} \operatorname{dist}^2(A, B)\right)(A) = \operatorname{grad} f_B(A) = \log(A B^{-1})\, A,
\]
\[
\operatorname{grad}\left(B \mapsto \tfrac{1}{2} \operatorname{dist}^2(A, B)\right)(B) = \operatorname{grad} f_A(B) = \log(B A^{-1})\, B.
\]
Figure 5.2 compares $\langle \operatorname{grad} f_A(X), H \rangle_X$ to a centered difference for some fixed $A$, $X$, $H$. This figure provides further evidence that our formula for $\operatorname{grad} f_A(X)$ is correct.
Figure 5.2: Comparison of $\frac{f_A(\operatorname{Exp}_X(hH)) - f_A(\operatorname{Exp}_X(-hH))}{2h}$ with $\langle \operatorname{grad} f_A(X), H \rangle_X$ (order of magnitude: 1) for some fixed $A$, $X$, $H$, with the error plotted against $h$ on a log-log scale. The slope of the straight segment is 2. Typically, the breaking point occurs for $h$ close to $10^{-5}$, due to numerical accuracy. This plot gives us confidence in the formula derived for $\operatorname{grad} f_A(X)$. Similar graphs have been generated to check every gradient formula given in this document, including $\operatorname{grad} E$ itself.
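A check of the same kind is easy to reproduce. The sketch below compares a centered difference of $f_A$ along the straight line $X + hH$ (which stays inside $P_n^+$ for small $h$, and has the same first-order behavior as the exponential map) with $\langle \log(XA^{-1})X, H \rangle_X$; all names are ours:

```python
import numpy as np
from scipy.linalg import logm, sqrtm

def f_A(A, X):
    """f_A(X) = 0.5 * dist^2(A, X) for the affine-invariant metric."""
    Ais = np.real(sqrtm(np.linalg.inv(A)))        # A^{-1/2}
    L = np.real(logm(Ais @ X @ Ais))
    return 0.5 * np.sum(L * L)

rng = np.random.default_rng(1)
def spd(n):
    S = rng.standard_normal((n, n))
    return S @ S.T + n * np.eye(n)

A, X = spd(3), spd(3)
H = rng.standard_normal((3, 3)); H = H + H.T
G = np.real(logm(X @ np.linalg.inv(A))) @ X       # grad f_A(X) = log(X A^{-1}) X
Xi = np.linalg.inv(X)
ip = np.trace(Xi @ G @ Xi @ H)                    # <grad f_A(X), H>_X
h = 1e-5                                          # centered difference of f_A
fd = (f_A(A, X + h * H) - f_A(A, X - h * H)) / (2.0 * h)
err = abs(fd - ip)
```

The two quantities agree to finite-difference accuracy, and `G` is symmetric as claimed in the text.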
Deriving the gradient for $E_a$ is more cumbersome. We construct an explicit formula for it in appendix B. We still highlight the important points here. First, we needed to prove the following identity, for three square matrices $A$, $B$, $C$ of size $n$:
\[
\langle A \odot B, C \rangle = \operatorname{trace}\left( (A \odot B) C \right) = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} B_{ij} C_{ji} = \sum_{i=1}^{n} \sum_{j=1}^{n} A_{ij} (B^T)_{ji} (C^T)_{ij} = \left\langle A \odot C^T, B^T \right\rangle \tag{5.5}
\]
We also needed to prove the following proposition.

Proposition 5.3.2. Let $A, B \in P_n^+$. Then $AB^{-1}$ has real, positive eigenvalues and is diagonalizable.

Proof. We note $\Lambda(A)$ the spectrum of $A$, i.e., the set of eigenvalues of $A$. We know that, for any invertible matrix $S$, $\Lambda(SAS^{-1}) = \Lambda(A)$. Indeed:
\[
1 = \det(I) = \det(S S^{-1}) = \det(S) \det(S^{-1}),
\]
\[
\det(SAS^{-1} - \lambda I) = \det\left( S (A - \lambda I) S^{-1} \right) = \det(S) \det(A - \lambda I) \det(S^{-1}) = \det(A - \lambda I).
\]
Then,
\[
\Lambda(AB^{-1}) = \Lambda\left( A^{1/2} \left( A^{1/2} B^{-1} A^{1/2} \right) A^{-1/2} \right) = \Lambda\left( A^{1/2} B^{-1/2}\, B^{-1/2} A^{1/2} \right) = \Lambda\left( \left( B^{-1/2} A^{1/2} \right)^T \left( B^{-1/2} A^{1/2} \right) \right).
\]
Furthermore, any symmetric matrix of the form $X^T X$ with $\ker X = \{0\}$ is positive-definite, since
\[
\forall x \in \mathbb{R}^n, x \neq 0, \quad x^T X^T X x = \|X x\|^2 > 0.
\]
Setting $X = B^{-1/2} A^{1/2}$, we see that $AB^{-1}$ has the spectrum of a positive-definite matrix. We diagonalize the symmetric matrix $A^{1/2} B^{-1} A^{1/2} = UDU^T$, $U$ orthogonal and $D$ diagonal. Define $V = U^T A^{-1/2}$. Then $V AB^{-1} V^{-1} = D$ is diagonal, hence $AB^{-1}$ is diagonalizable.
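The proposition is easy to probe numerically (a small sketch; `random_spd` is our helper):

```python
import numpy as np

rng = np.random.default_rng(2)
def random_spd(n):
    """A random symmetric positive-definite matrix."""
    S = rng.standard_normal((n, n))
    return S @ S.T + n * np.eye(n)

A, B = random_spd(5), random_spd(5)
M = A @ np.linalg.inv(B)          # generally NOT symmetric
w = np.linalg.eigvals(M)          # spectrum of A B^{-1}
# proposition 5.3.2: the eigenvalues are real and positive
```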
The joint material of this section and of appendix B is sufficient to implement direct algorithms to compute the gradient of $E$. We use these algorithms in our descent methods from chapter 3. In section 5.7, we illustrate our results.
5.4 Alternative I: linear interpolation

$P_n^+$ is a convex set. Indeed, let $A, B \in P_n^+$. Then, $tA + (1 - t)B \in P_n^+$ for all $t \in [0, 1]$, since
\[
\forall x \in \mathbb{R}^n, x \neq 0, \quad x^T (tA + (1 - t)B) x = t\, x^T A x + (1 - t)\, x^T B x > 0.
\]
Hence, to interpolate between two points $p_1, p_2 \in P_n^+$, we can simply interpolate between the (real) entries of $p_1$ and $p_2$, linearly. This alternative to our geometric approach is so simple that we have to consider it. In section 5.7, we compare our main results for piecewise geodesic interpolation to this alternative. Of course, we cannot use linear interpolation to solve the generic problem of smooth regression with more than two data points.
5.5 Alternative II: convex programming

When we endowed $P_n^+$ with the affine-invariant metric, we "stretched" the limits of the (open) set $P_n^+$ to infinity, making the resulting metric space complete. This ensures that solutions of the regression problem are made of matrices $\gamma_i \succ 0$. The Log-Euclidean metric, which we introduce in the next section, has a similar stretching effect, needed to map the convex set $P_n^+$ onto the vector space $H_n$.

We can sometimes solve the problem without such deformations. Using the metric $\langle \cdot, \cdot \rangle$ defined on $H_n$ and the associated norm $\|\cdot\|$, $P_n^+$ is now seen as a Riemannian submanifold of $H_n$. A natural objective for the regression problem is given by equations (1.3) through (1.6). The acceleration vectors are defined using finite differences in the ambient space $H_n$.

This objective is convex. The set $\Gamma$ is convex too. Standard solvers for semidefinite programming, like SeDuMi or SDPT3 [Stu98, TTT99], can solve a relaxed version of our problem: minimize $E(\gamma)$ such that each $\gamma_i$ is positive semidefinite. The usage of such solvers is made easy via modeling languages such as Yalmip and CVX [Löf04, GB10]. We use the latter with the Frobenius norm for the figures shown in section 5.7. If, at optimality, none of the $\gamma_i \succeq 0$ constraints is active (i.e., $\gamma_i \succ 0$, $\forall i$), the optimal solution corresponds to regression in $H_n$. This can be done with the simple tools from chapter 1. If, however, one of the constraints is active, there is, a priori, no way to convert the solution of the relaxed problem into a solution for our original problem. The original problem might even not have a solution, specifically because $\Gamma$ is not complete. We exhibit this in figure 5.5.
Among the advantages of convex programming, we note that:

• efficient, robust algorithms are available, and provide optimality certificates;
• we can use the 1-, 2-, ∞- and Frobenius norms over $H_n$;
• we need only provide the problem description (no gradients);
• it is easy to add convex constraints.
The affine-invariant and Log-Euclidean distances, seen as functions from $P_n^+ \times P_n^+$ to $\mathbb{R}^+$, where $P_n^+$ is understood as a convex subset of the vector space $H_n$, are not convex. Indeed, for $n = 1$, $P_n^+$ is the set of positive real numbers. The two metrics are equal, such that $\operatorname{dist}^2(x, y) = (\log(y) - \log(x))^2$. The Hessian of this function at $(x, y) = (1, 2)$ has a negative eigenvalue, hence it is not convex. Consequently, convex programming cannot be used to solve the regression problem with the affine-invariant and Log-Euclidean metrics.
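For $n = 1$ this is a two-variable calculus exercise; a quick numerical confirmation of the negative eigenvalue at $(x, y) = (1, 2)$ (the analytic second derivatives below are our own computation, not from the thesis):

```python
import numpy as np

def hessian(x, y):
    """Hessian of f(x, y) = (log(y) - log(x))^2, computed analytically."""
    u = np.log(y) - np.log(x)
    fxx = 2.0 * (1.0 + u) / x ** 2
    fyy = 2.0 * (1.0 - u) / y ** 2
    fxy = -2.0 / (x * y)
    return np.array([[fxx, fxy], [fxy, fyy]])

eigs = np.linalg.eigvalsh(hessian(1.0, 2.0))   # ascending order
# the Hessian is indefinite, so dist^2 is not convex on P_1^+ x P_1^+
```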
5.6 Alternative III: vector space structure

We mentioned in section 5.1 that the restricted matrix exponential
\[
\exp : H_n \to P_n^+ : H \mapsto \exp(H)
\]
is a smooth diffeomorphism. Following [AFPA08], this can be used to endow $P_n^+$ with a vector space structure. We first introduce new summation and scaling operators:
\[
\oplus : P_n^+ \times P_n^+ \to P_n^+ : (A, B) \mapsto A \oplus B = \exp(\log(A) + \log(B)),
\]
\[
\otimes : \mathbb{R} \times P_n^+ \to P_n^+ : (\alpha, A) \mapsto \alpha \otimes A = \exp(\alpha \log(A)) = A^\alpha.
\]
By construction,
\[
\exp : (H_n, +, \cdot) \to (P_n^+, \oplus, \otimes)
\]
is a vector space isomorphism. The inverse mapping is the (principal) matrix logarithm. Arsigny et al. use this to endow $P_n^+$ with the metric described in table 5.2, termed the Log-Euclidean metric, see [AFPA08]. The vector space structure simplifies a number of problems. For example, the geodesic between $A$ and $B$ is $\gamma(t) = \exp(t \log(A) + (1 - t) \log(B))$.
Set: $P_n^+ = \{A \in \mathbb{R}^{n \times n} : A = A^T \text{ and } x^T A x > 0\ \forall x \in \mathbb{R}^n, x \neq 0\}$
Tangent spaces: $T_A P_n^+ \equiv H_n = \{H \in \mathbb{R}^{n \times n} : H = H^T\}$
Inner product: $\langle H_1, H_2 \rangle_A = \langle D\log(A)[H_1], D\log(A)[H_2] \rangle$
Vector norm: $\|H\|_A = \sqrt{\langle H, H \rangle_A}$
Distance: $\operatorname{dist}(A, B) = \|\log(B) - \log(A)\|$
Exponential: $\operatorname{Exp}_A(H) = \exp(\log(A) + D\log(A)[H])$
Logarithm: $\operatorname{Log}_A(B) = D\exp(\log(A))[\log(B) - \log(A)]$
Mean: $\operatorname{mean}(A, B) = \exp\left(\frac{1}{2}(\log(A) + \log(B))\right)$

Table 5.2: Geometric toolbox for the Riemannian manifold $P_n^+$ endowed with the Log-Euclidean metric.
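The Log-Euclidean toolbox of table 5.2 takes only a few lines of code. A sketch of the distance, mean and geodesic (function names are ours):

```python
import numpy as np
from scipy.linalg import expm, logm

def le_dist(A, B):
    """Log-Euclidean distance ||log(B) - log(A)|| (Frobenius norm)."""
    return np.linalg.norm(logm(B) - logm(A))

def le_mean(A, B):
    return expm(0.5 * (logm(A) + logm(B)))

def le_geodesic(A, B, t):
    """gamma(t) = exp(t log(A) + (1 - t) log(B)): gamma(0) = B, gamma(1) = A."""
    return expm(t * logm(A) + (1.0 - t) * logm(B))

rng = np.random.default_rng(3)
def spd(n):
    S = rng.standard_normal((n, n))
    return S @ S.T + n * np.eye(n)

A, B = spd(3), spd(3)
mid = le_geodesic(A, B, 0.5)   # midpoint of the geodesic is the LE mean
```

As expected, the midpoint coincides with the mean of table 5.2, and the geodesic traverses the distance at constant speed.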
Let $A, B \in P_n^+$ be commuting matrices, i.e., $AB = BA$. Then, there exists $U$, orthogonal, such that $A = U D_A U^T$ and $B = U D_B U^T$, with $D_A$ and $D_B$ diagonal, see [Bha07, p. 23]². Using commutativity of diagonal matrices:
\begin{align*}
\|\log(B) - \log(A)\|^2 &= \left\| U (\log(D_B) - \log(D_A)) U^T \right\|^2 = \left\| U \log\left( D_B D_A^{-1} \right) U^T \right\|^2 \\
&= \left\| U \log\left( D_A^{-1/2} D_B D_A^{-1/2} \right) U^T \right\|^2 \\
&= \left\| U \log\left( U^T A^{-1/2} U\, U^T B U\, U^T A^{-1/2} U \right) U^T \right\|^2 = \left\| \log\left( A^{-1/2} B A^{-1/2} \right) \right\|^2.
\end{align*}

²In [Bha07], Bhatia states that it is sufficient to have $A, B \in H_n$ for this to be true.
This proves that, if $A$ and $B$ commute, the affine-invariant and the Log-Euclidean metrics are identical. In particular, this is true when $A$ and $B$ are diagonal. This is linked to the fact that, for commuting $A$ and $B$, $\log(AB) = \log(A) + \log(B) = \log(BA)$.
We solve the smooth regression problem on $P_n^+$ endowed with the Log-Euclidean metric as follows:

1. Compute the new data points $\tilde{p}_i = \log(p_i) \in H_n$ for each original data point $p_i \in P_n^+$;
2. Compute $\tilde{\gamma} \in H_n \times \cdots \times H_n$, the solution of the regression problem in $H_n$ with the new data points;
3. Compute $\gamma$, the solution of the original problem, as $\gamma_i = \exp(\tilde{\gamma}_i)$.
Step 2 can be carried out by solving a regression problem in $\mathbb{R}^{n(n+1)/2}$ as described in chapter 1. Thanks to the vector space structure, we achieve guaranteed minimization of the objective by solving sparse, structured linear systems. Furthermore, we need not go through the hassle of computing gradients like we did in section 5.3. Using modern QP solvers, it is also easy to add constraints to the original problem in the form of linear equalities and inequalities. Linear matrix inequalities can be added using semidefinite programming solvers, see section 5.5.
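The three steps translate directly into code. In the sketch below we stand in for the chapter 1 solver with a plain per-entry least-squares straight line (an oversimplification, but it shows the log/exp round trip); all names are ours:

```python
import numpy as np
from scipy.linalg import expm, logm

def le_regression(points, times, eval_times):
    """Steps 1-3: map the data to H_n, fit there, map back via exp.
    A straight line in the log domain stands in for the chapter 1 solver."""
    times = np.asarray(times, dtype=float)
    logs = np.array([np.real(logm(p)) for p in points])              # step 1
    n = logs.shape[1]
    D = np.vstack([times, np.ones_like(times)]).T                    # design [t, 1]
    coef, *_ = np.linalg.lstsq(D, logs.reshape(len(times), -1), rcond=None)  # step 2
    curve = []
    for t in eval_times:                                             # step 3
        G = (coef[0] * t + coef[1]).reshape(n, n)
        curve.append(expm(0.5 * (G + G.T)))                          # symmetrize for safety
    return curve

# data lying on a Log-Euclidean geodesic (log-linear) is reproduced exactly
rng = np.random.default_rng(4)
L0 = rng.standard_normal((3, 3)); L0 = 0.2 * (L0 + L0.T)
L1 = rng.standard_normal((3, 3)); L1 = 0.2 * (L1 + L1.T)
times = [0.0, 0.5, 1.0, 2.0]
points = [expm(t * L1 + L0) for t in times]
fit = le_regression(points, times, [1.25])[0]
```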
5.7 Results and comments

The solutions to our regression problem on $P_n^+$ can be categorized in the same way we did in section 4.4 for the sphere. In this section, we show plots for different scenarios. We compare linear interpolation, convex programming, the Log-Euclidean metric and the affine-invariant metric. Positive-definite matrices of size $n = 2$ are represented as ellipses. The axis orientations of these ellipses correspond to the eigenvector directions and the axis lengths to the corresponding eigenvalues.

In many special cases, some of these methods yield the same solutions. For example, we proved that when the data points $p_i$ are commuting matrices, the affine-invariant and the Log-Euclidean metrics are equivalent. Of course, it is then computationally advantageous to solve the problem with the LE metric. Because of this, we always use the LE metric solution as our initial guess for the affine-invariant metric optimization. Additionally, since the convex programming approach uses the Frobenius norm, the shortest path between two matrices corresponds to linear interpolation in $H_n$.
We use the CG algorithm with the following naive vector transport:
\[
T_\eta(\xi) = \xi.
\]
This means that we simply compare tangent vectors belonging to different tangent spaces as is. We expect this to work because our descent algorithms usually make small steps. Because of the smoothness of the inner product on Riemannian manifolds, close tangent spaces of $P_n^+$ "resemble each other". In practice, we observe that the CG algorithm with this vector transport performs about as well as the steepest descent method, and sometimes much better.
We propose an alternative vector transport, theoretically more attractive, based on the following vector transport on $P_n^+$, with $H \in T_A P_n^+$:
\[
T_{\operatorname{Log}_A(B)}(H) = B^{1/2} A^{-1/2} H A^{-1/2} B^{1/2} \in T_B P_n^+.
\]
It has the nice property that, for any two tangent vectors $H_1, H_2 \in T_A P_n^+$, noting $\bar{H}_i = T_{\operatorname{Log}_A(B)}(H_i) \in T_B P_n^+$, we have:
\[
\langle \bar{H}_1, \bar{H}_2 \rangle_B = \left\langle B^{-1/2} \bar{H}_1 B^{-1/2}, B^{-1/2} \bar{H}_2 B^{-1/2} \right\rangle = \left\langle A^{-1/2} H_1 A^{-1/2}, A^{-1/2} H_2 A^{-1/2} \right\rangle = \langle H_1, H_2 \rangle_A.
\]
Surprisingly, the CG method with this vector transport extended to $\Gamma$ showed significantly poorer performance on the tests we ran. Consequently, in the following, we use $T_\eta(\xi) = \xi$.
Zeroth order

The weighted mean $\bar{p}$ of positive-definite matrices can be defined as
\[
\bar{p} = \arg\min_{X \in P_n^+} \frac{\sum_{i=1}^{N} w_i\, f_X(p_i)}{\sum_{i=1}^{N} w_i}. \tag{5.6}
\]
Recall that $f_X(A) = \frac{1}{2} \operatorname{dist}^2(X, A)$. For the Log-Euclidean metric, we simply have
\[
\bar{p} = \exp\left( \frac{\sum_{i=1}^{N} w_i \log(p_i)}{\sum_{i=1}^{N} w_i} \right).
\]
This is a generalization of the geometric mean for positive reals. For the affine-invariant metric, we use the Log-Euclidean mean as an initial guess and apply one of our minimization algorithms to the expression (5.6). When the $p_i$'s commute, the initial guess is optimal. Otherwise, convergence (i.e., $\|\operatorname{grad} E\| < 10^{-9}$) with 250 random matrices of size 5 is achieved with any of our algorithms in, typically, three steps or less. In [Bha07], Bhatia proves that the optimization problem (5.6) is strictly convex in some sense.
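For reference, the weighted Log-Euclidean mean in code (a sketch; `le_weighted_mean` is our name). For $1 \times 1$ matrices it reduces to the weighted geometric mean of positive reals:

```python
import numpy as np
from scipy.linalg import expm, logm

def le_weighted_mean(points, weights):
    """exp( sum_i w_i log(p_i) / sum_i w_i ), the weighted LE mean."""
    w = np.asarray(weights, dtype=float)
    acc = sum(wi * np.real(logm(p)) for wi, p in zip(w, points))
    return expm(acc / w.sum())

p1 = np.array([[2.0]])
p2 = np.array([[8.0]])
m_equal = le_weighted_mean([p1, p2], [1.0, 1.0])   # geometric mean sqrt(2 * 8) = 4
m_skewed = le_weighted_mean([p1, p2], [3.0, 1.0])  # 2^(3/4) * 8^(1/4) = 2^1.5
```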
First order

Setting $\lambda > 0$ and $\mu = 0$, again, produces piecewise geodesic regressions. Figure 5.3 compares solutions obtained with our main method and the three mentioned alternatives. Figure 5.4 shows the behavior of the CG and the steepest descent methods. Both converge rapidly. Many of the remarks we made in section 4.4 for the sphere still hold. More specifically, notice how the curves obtained with the deforming metrics pass through thinner matrices than the others. This effect has already been observed by Arsigny et al. for the geodesic joining two data points, see [AFPA08]. The geodesic between $A$ and $B$ has a closed-form expression for the affine-invariant metric:
\[
\gamma_{A,B} : [0, 1] \to P_n^+ : t \mapsto \gamma_{A,B}(t) = \operatorname{Exp}_A(t \operatorname{Log}_A(B)) = A^{1/2} \exp\left( t \log\left( A^{-1/2} B A^{-1/2} \right) \right) A^{1/2}. \tag{5.7}
\]
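Equation (5.7) in code, with a check that the curve hits both endpoints and traverses the affine-invariant distance at constant speed (names are ours):

```python
import numpy as np
from scipy.linalg import expm, logm, sqrtm

def ai_geodesic(A, B, t):
    """(5.7): gamma(t) = A^{1/2} exp(t log(A^{-1/2} B A^{-1/2})) A^{1/2}."""
    As = np.real(sqrtm(A))
    Ai = np.linalg.inv(As)
    return As @ expm(t * np.real(logm(Ai @ B @ Ai))) @ As

def ai_dist(A, B):
    """Affine-invariant distance ||log(A^{-1/2} B A^{-1/2})||."""
    As = np.real(sqrtm(A))
    Ai = np.linalg.inv(As)
    return np.linalg.norm(np.real(logm(Ai @ B @ Ai)))

rng = np.random.default_rng(5)
def spd(n):
    S = rng.standard_normal((n, n))
    return S @ S.T + n * np.eye(n)

A, B = spd(3), spd(3)
mid = ai_geodesic(A, B, 0.5)   # halfway point of the geodesic
```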
Second order

For geodesic regression (high $\mu$, zero $\lambda$), we simply compute the $\gamma_i$'s at the data time labels $t_i$, because we use the exact logarithmic map on $P_n^+$ (see our discussion in subsection 4.4.1). This drastically decreases the computational cost. We show an example in figure 5.6. Geodesics built with the deforming metrics can be indefinitely extended on both ends, whereas the regression built with convex programming would, eventually, reach a singular matrix.

Setting $\lambda = 0$ and $\mu > 0$ yields spline-like curves. Figure 5.7 shows an example for the four solving techniques we discussed. Computing the solution for the affine-invariant metric is computationally more intensive, as figure 5.8 demonstrates. The CG algorithm, despite our poor choice of vector transport, performs much better than the steepest descent method. SD needed 3.5 times as many iterations for equal quality (judged based on the gradient norm). The refinement technique, algorithm 6, cut down the number of needed iterations by about 20% for the example on figure 5.7, compared to using the Log-Euclidean solution as initial guess directly.
Figure 5.3: $\lambda = 10^{-1}$, $\mu = 0$. Rows, from top to bottom: data, linear, convex, Log-Euclidean, affine-invariant. The data consists of three positive-definite, non-commuting matrices shown on the first line. The second line shows linear interpolation between the data. The third line is obtained by solving the problem with SDPT3, using CVX. When $\lambda$ goes to zero, this goes to linear interpolation. The fourth line is computed by solving a linear least-squares problem on the $\log(p_i)$'s and going back on $P_n^+$ via the matrix exponential. For the fifth line, we use the Log-Euclidean solution at the three data times $t_i$ as initial guess for the breaking points. We apply a minimization algorithm starting with them. The additional intermediate points are computed with equation (5.7). This is still optimal. Computation times for the three last lines are comparable.
Figure 5.4 (panels: step length, gradient norm and objective function value, versus iteration): These plots show the behavior of the minimization process for the affine-invariant metric solution shown in figure 5.3. Both the geometric steepest descent method (blue) and the nonlinear geometric CG method (red) perform really well. The important point here is that we need only compute the $\gamma_i$'s at the data times (the breaking points), i.e., we search for just three positive-definite matrices.
Figure 5.5: $\lambda = 0$, $\mu = 10^{-4}$. The four data points are $2 \times 2$ positive-definite diagonal matrices. They are pictured as black circles, placed in the eigenvalue plane $(\lambda_1, \lambda_2)$ in linear scale (left) and logarithmic scale (right). Since the data matrices commute, the solutions with the affine-invariant and the Log-Euclidean metrics are the same. The red curve shows regression with these metrics. The blue curve shows regression with convex programming (metric inherited from $H_n$). SDPT3, our semidefinite programming solver, claims optimality (according to the default tolerances). Notice how the blue curve passes through almost singular matrices. The actual solution most likely passes through singular matrices. SDPT3 uses an interior point method, hence constraints are never tight. The fact that the optimal curve leaves $P_n^+$ is a main argument against convex programming in this setting.
Figure 5.6: $\lambda = 0$, $\mu = 10^3$. Rows, from top to bottom: data, linear, convex, Log-Euclidean, affine-invariant. An example of geodesic regression across three data points with our three solving techniques, and linear interpolation (second line). Convex and Log-Euclidean metric solutions (third and fourth lines) are exact (up to numerical accuracy) and quick to compute. The affine-invariant metric solution (bottom line) is reasonably quick to compute. On this plot, the curve displayed has maximum acceleration of $10^{-3}$ for a total length of 3.3 traveled in one time unit.
Figure 5.7: $\lambda = 0$, $\mu = 10^{-3}$. Rows, from top to bottom: data, linear, convex, Log-Euclidean, affine-invariant. The three methods of interest (bottom lines) give sensible results. Which one should be used in practice is application-dependent. Of course, piecewise linear interpolation is not acceptable when spline-like curves are required.
Figure 5.8 (panels: step length and gradient norm, versus iteration): These plots show the convergence behavior for figure 5.7 with the CG algorithm and the successive refinement technique. Notice how the CG algorithm exhibits a linear convergence rate. Convergence is excessively slow compared to alternatives II and III, which gave almost instantaneous results.
Conclusions and perspectives

In the present work, we discretized the problem of fitting smooth curves to data on manifolds (1.2) and we designed and implemented algorithms to solve the discrete problem (3.6). To this end, we introduced the concept of geometric finite differences, which fill the classic role of finite differences on manifolds, and computed explicit formulas for the objective (3.6) and its gradient on $\mathbb{R}^n$, $S^2$, $P_n^+$ and $SO(3)$. The optimization schemes we used are various flavors of a geometric version of the nonlinear conjugate gradient scheme taken from [AMS08]. To alleviate the computational burden, we exploited the concepts of retraction and vector transport, borrowed from [AMS08], as proxies for the expensive exponential map and parallel transport. We also introduced a similar concept for the logarithmic map, which we here call generalized logarithms.

Performance was shown to be excellent for first order problems ($\mu = 0$): our optimization algorithms reach small values of the gradient in a few iterations. The more interesting case of second order regression ($\mu > 0$), however, is solved significantly more slowly. The iterative refinement technique we introduced, although it has proven valuable, only partly helps. Furthermore, the algebra needed to obtain the gradient of (3.6) on matrix manifolds turned out to be quite involved. This adds to the intricacy of implementing the algorithms we designed. The attractive alternatives we proposed on $P_n^+$ are further arguments pointing toward the need for faster, simpler algorithms. Two options we intend to pursue in future work to address these concerns are

1. the usage of automatic differentiation techniques or finite difference approximations of the derivatives of the objective, and
2. the implementation of higher-order optimization schemes like, e.g., the geometric trust-region method exposed in [AMS08].
Let us mention a few research directions we would like to investigate in future work.

• We believe that, as $\max_i |\tau_{i+1} - \tau_i| \to 0$, the solution of the discrete problem may tend toward a sampled version of the solution of the continuous problem. We intend to work on a proof of this statement. On a related note, we would like to verify that the breaking points appearing in optimal curves for $\lambda > 0$, $\mu = 0$ in the continuous case coincide with the breaking points computed with our discrete approach.
• Geodesic regression is a special case of interest in applications, but is slow to achieve with our present algorithms. A geodesic on $\mathcal{M}$ is completely specified by an element of the tangent bundle $T\mathcal{M}$, that is: a point $p \in \mathcal{M}$ and a tangent vector $\xi \in T_p \mathcal{M}$. We would like to, generically, equip the manifold $T\mathcal{M}$ with a toolbox based on the toolbox for $\mathcal{M}$, so we could apply the algorithms described in this document to efficiently compute geodesic regressions.
• To apply our techniques to a manifold $\mathcal{M}$, one needs a geometric toolbox for $\mathcal{M}$, possibly a vector transport, and the gradients of the functions $f(A, B) = \operatorname{dist}^2(A, B)$ and $g(A, B, C) = \langle \operatorname{Log}_A(B), \operatorname{Log}_A(C) \rangle_A$ with respect to all variables. At some point, it would be interesting to build software that only asks the user for these elements. This would enable fast prototyping and usefulness assessment. For all manifolds treated in this document, we found that $\operatorname{grad}(A \mapsto f(A, B))(A) = -2 \operatorname{Log}_A(B)$, which makes sense geometrically. This is a general property, see [SASK09, Theorem 3.1]. This observation partly alleviates the implementation burden for second order problems ($\mu > 0$). For first order problems ($\mu = 0$), no gradient formulas have to be computed at all.
• A typical strategy in (discrete-time) control is to compute the optimal commands to apply to the system up to some distant horizon in the future, then only apply the first computed command. The criterion is usually expressed in terms of the predicted future states of the system, based on the proposed command. It is customary to also include a regularization term on the command in the objective function. A well-established strategy to tackle this problem is, e.g., Model Predictive Control. In practical applications, the system is often subject to constraints. When the admissible set for the command is a manifold $\mathcal{M}$, we really are looking for an optimal smooth curve on $\mathcal{M}$. Investigating these applications is a natural extension of our work.
• The software written for this work³ is only a proof of concept. The rich structure of the problems we treat gives plenty of opportunities for efficient organization of the computations, which should drastically improve the performance. When our algorithms reach maturity, it will be useful to design and distribute efficient, usable software.
• Other manifolds of practical interest are, e.g., the Grassmann (quotient) manifold and the (infinite-dimensional) shape space. Smooth interpolation in the shape space corresponds to smooth morphing between given shapes, with applications in computer graphics, object tracking, etc.
The positive results we obtained in the present work and the appealing questions left to
answer strongly encourage us to further our investigations.
³Downloadable here: http://www.inma.ucl.ac.be/~absil/Boumal/
Appendix A

Line search derivative on $S^2$
We give an explicit formula for the derivative of the line search function $\varphi : \mathbb{R} \to \mathbb{R}$ appearing in the descent methods applied on the sphere $S^2$. The equivalent formula for the line search function on $\Gamma = S^2 \times \cdots \times S^2$ is a componentwise extension of the one given below.
With $x \in S^2$, $d \in T_x S^2$ and $\|d\| = 1$:
\begin{align*}
R_x(\alpha d) &= \frac{x + \alpha d}{\sqrt{1 + \alpha^2}}, \\
\frac{d}{d\alpha} R_x(\alpha d) &= \frac{1}{(1 + \alpha^2)^{3/2}} (d - \alpha x) \in T_{R_x(\alpha d)} S^2, \\
\varphi(\alpha) &= E(R_x(\alpha d)), \\
\varphi'(\alpha) &= \left\langle \nabla E(R_x(\alpha d)), \frac{d}{d\alpha} R_x(\alpha d) \right\rangle = \frac{1}{(1 + \alpha^2)^{3/2}} \left\langle \nabla E(R_x(\alpha d)), d - \alpha x \right\rangle \\
&= \frac{1}{(1 + \alpha^2)^{3/2}} \left\langle \operatorname{grad} E(R_x(\alpha d)), d - \alpha x \right\rangle_{R_x(\alpha d)}, \\
\varphi'(0) &= \langle \operatorname{grad} E(x), d \rangle_x.
\end{align*}
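The formula is easy to validate numerically for a simple objective. Below we take the linear test objective $E(p) = c^T p$ (our choice, not from the thesis) and compare $\varphi'(\alpha)$ to a centered difference:

```python
import numpy as np

c = np.array([1.0, 2.0, 3.0])
E = lambda p: c @ p                      # simple test objective on S^2
gradE = lambda p: c - (c @ p) * p        # Riemannian gradient: project nabla E = c

x = np.array([1.0, 0.0, 0.0])            # point on the sphere
d = np.array([0.0, 1.0, 0.0])            # unit tangent vector at x (d . x = 0)
R = lambda a: (x + a * d) / np.sqrt(1.0 + a ** 2)   # retraction R_x(alpha d)

def dphi(a):
    """phi'(alpha) from the formula above."""
    return (gradE(R(a)) @ (d - a * x)) / (1.0 + a ** 2) ** 1.5

a0, h = 0.7, 1e-6
num = (E(R(a0 + h)) - E(R(a0 - h))) / (2.0 * h)   # centered difference of phi
```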
Appendix B

Gradient of $E_a$ for $P_n^+$
In this appendix, we derive an explicit formula for the gradient of $E_a$ from chapter 5, equation (5.4). For $i \in 2, \ldots, N_d - 1$, with the appropriate constants $r_1$, $r_2$:
\begin{align*}
\frac{1}{2} \|a_i\|^2_{\gamma_i} &= \frac{1}{2} \left\| r_1 \operatorname{Log}_{\gamma_i}(\gamma_{i+1}) + r_2 \operatorname{Log}_{\gamma_i}(\gamma_{i-1}) \right\|^2_{\gamma_i} \\
&= \frac{1}{2} \left\| \gamma_i^{-1/2} \left( r_1 \operatorname{Log}_{\gamma_i}(\gamma_{i+1}) + r_2 \operatorname{Log}_{\gamma_i}(\gamma_{i-1}) \right) \gamma_i^{-1/2} \right\|^2 \\
&= \frac{1}{2} \left\| r_1 \log\left( \gamma_i^{-1/2} \gamma_{i+1} \gamma_i^{-1/2} \right) + r_2 \log\left( \gamma_i^{-1/2} \gamma_{i-1} \gamma_i^{-1/2} \right) \right\|^2 \\
&= r_1^2\, f_{\gamma_i}(\gamma_{i+1}) + r_2^2\, f_{\gamma_i}(\gamma_{i-1}) + r_1 r_2 \left\langle \log\left( \gamma_i^{-1/2} \gamma_{i+1} \gamma_i^{-1/2} \right), \log\left( \gamma_i^{-1/2} \gamma_{i-1} \gamma_i^{-1/2} \right) \right\rangle.
\end{align*}
We need to provide formulas for the gradient of the third term with respect to $\gamma_i$ and $\gamma_{i+1}$ (which is symmetric with $\gamma_{i-1}$). We introduce the real function $g$,
\[
g : (P_n^+)^3 \to \mathbb{R} : (A, B, C) \mapsto g(A, B, C) = \left\langle \log\left( A^{-1/2} B A^{-1/2} \right), \log\left( A^{-1/2} C A^{-1/2} \right) \right\rangle.
\]
Using the identity $\langle A \odot B, C \rangle = \langle A \odot C^T, B^T \rangle$ we proved in section 5.3, we compute:
\[
D(B \mapsto g(A, B, C))(B)[H] = \left\langle D\log\left( A^{-1/2} B A^{-1/2} \right)\left[ A^{-1/2} H A^{-1/2} \right], \log\left( A^{-1/2} C A^{-1/2} \right) \right\rangle.
\]
Introducing $A^{-1/2} B A^{-1/2} = UDU^T$, $\tilde{H} = U^T A^{-1/2} H A^{-1/2} U$ and $\tilde{F}$, the first divided differences of $\log$ w.r.t. the $\lambda_i = D_{ii}$:
\begin{align*}
&= \left\langle U (\tilde{H} \odot \tilde{F}) U^T, \log\left( A^{-1/2} C A^{-1/2} \right) \right\rangle = \left\langle \tilde{H} \odot \tilde{F}, U^T \log\left( A^{-1/2} C A^{-1/2} \right) U \right\rangle \\
\intertext{Using the identity (5.5):}
&= \left\langle \left[ U^T \log\left( A^{-1/2} C A^{-1/2} \right) U \right] \odot \tilde{F}, \tilde{H} \right\rangle \\
&= \left\langle A^{-1/2} U \left( \left[ U^T \log\left( A^{-1/2} C A^{-1/2} \right) U \right] \odot \tilde{F} \right) U^T A^{-1/2}, H \right\rangle,
\end{align*}
\[
\operatorname{grad}(B \mapsto g(A, B, C))(B) = B A^{-1/2} U \left( \left[ U^T \log\left( A^{-1/2} C A^{-1/2} \right) U \right] \odot \tilde{F} \right) U^T A^{-1/2} B.
\]
We still need the gradient with respect to $A$. For this, we observe that:
\begin{align*}
g(A, B, C) &= \left\langle \log\left( A^{-1/2} B A^{-1/2} \right), \log\left( A^{-1/2} C A^{-1/2} \right) \right\rangle \\
&= \left\langle A^{-1/2} \log\left( B A^{-1} \right) A^{1/2}, A^{-1/2} \log\left( C A^{-1} \right) A^{1/2} \right\rangle = \left\langle \log\left( B A^{-1} \right), \log\left( C A^{-1} \right) \right\rangle \\
\intertext{Using $\log(X) = -\log(X^{-1})$:}
&= \left\langle \log\left( A B^{-1} \right), \log\left( A C^{-1} \right) \right\rangle.
\end{align*}
The matrices $AB^{-1}$ and $AC^{-1}$ are not, in general, symmetric. They still have real, positive eigenvalues and are diagonalizable. We proved this statement in proposition 5.3.2. We introduce:
\begin{align*}
A B^{-1} &= U_B D_B U_B^{-1}, \\
A C^{-1} &= U_C D_C U_C^{-1}, \\
\tilde{F}_B &= \text{first divided differences of } \log \text{ for } \lambda_i = (D_B)_{ii}, \\
\tilde{F}_C &= \text{first divided differences of } \log \text{ for } \lambda_i = (D_C)_{ii}, \\
Y_B &= U_B^{-1} \log\left( A C^{-1} \right) U_B = U_B^{-1} U_C \log(D_C) U_C^{-1} U_B, \\
Y_C &= U_C^{-1} \log\left( A B^{-1} \right) U_C = U_C^{-1} U_B \log(D_B) U_B^{-1} U_C.
\end{align*}
Then, similar algebra yields
\[
\operatorname{grad}(A \mapsto g(A, B, C))(A) = U_B D_B (Y_B \odot \tilde{F}_B) U_B^{-1} A + U_C D_C (Y_C \odot \tilde{F}_C) U_C^{-1} A.
\]
Although it is not obvious from its expression, this matrix is, by construction, symmetric.

The equations presented in section 5.3 and in this appendix are sufficient to compute the gradient $\operatorname{grad} E(\gamma)$.
Appendix C
SO(3) geometric toolbox
Our algorithms and problem formulation from chapter 3 are well defined for any smooth, finite-dimensional Riemannian manifold. In practice, they can be applied to such manifolds only if the expressions for, notably, the logarithmic map and the geodesic distance are tractable. SO(3) (the special orthogonal group) is such a manifold of great practical interest. SO(3) has dimension 3. Elements of SO(3) are characterized by an axis of rotation in $\mathbb{R}^3$ and an angle of rotation.

In this appendix, we give a full toolbox of formulas needed to apply the techniques described in this document to SO(3). The material in this appendix has been tested and works very well. Practical applications include computer graphics for example, where one could require a camera to show some 3D scene under different rotations at predetermined times, with smooth transitions between rotations. Typically, one would apply our algorithms with zero $\lambda$ and low $\mu$ to achieve smooth interpolation. The proposed approach yields considerably smoother looking transitions than could be obtained with piecewise geodesic interpolation. Another area of potential applications could be related to control of mechanical systems. Our reference for this appendix is [Bak00], an introduction to matrix Lie groups.
The linear group is defined as
\[
GL(3) = \{ A \in \mathbb{R}^{3 \times 3} : \det(A) \neq 0 \}.
\]
A particular subgroup of it is the special linear group:
\[
SL(3) = \{ A \in GL(3) : \det(A) = 1 \}.
\]
Considering only the orthogonal matrices in SL(3) yields the special orthogonal group:
\[
SO(3) = \{ A \in SL(3) : A^T A = I \}.
\]
SO(3) is a Lie group, i.e., it is a group and a manifold. Applying theorem 2.2.2 gives a simple expression for the tangent spaces, see table C.1. The tangent space at the origin,
\[
\mathfrak{so}(3) = T_I SO(3) = \{ H \in \mathbb{R}^{3 \times 3} : H + H^T = 0 \},
\]
is called the Lie algebra of SO(3) and corresponds to the set of skew-symmetric matrices. Note that $T_A SO(3) = A\, \mathfrak{so}(3) = \{ H = A \tilde{H} : \tilde{H} \in \mathfrak{so}(3) \}$. Endowing $\mathfrak{so}(3)$ with the inner product from GL(3) gives a natural metric on SO(3):
from GL(3) gives a natural metric on SO(3):
'H
1
, H
2
`
A
=
A
T
H
1
, A
T
H
2
I
= trace
(A
T
H
1
)
T
A
T
H
2
= trace
H
T
1
H
2
= 'H
1
, H
2
` .
Hence, SO(3) is a submanifold of GL(3). It can be proven that, at the identity matrix $I$, the Exp and Log maps reduce to the matrix exponential and logarithm:
\[
\operatorname{Exp}_I : \mathfrak{so}(3) \to SO(3) : H \mapsto \operatorname{Exp}_I(H) = \exp(H),
\]
\[
\operatorname{Log}_I : SO(3) \to \mathfrak{so}(3) : A \mapsto \operatorname{Log}_I(A) = \log(A).
\]
Set:             SO(3) = {A ∈ R^{3×3} : A^T A = I and det(A) = 1}
Tangent spaces:  T_A SO(3) = {H ∈ R^{3×3} : A^T H + H^T A = 0}
Inner product:   ⟨H_1, H_2⟩_A = trace(H_1^T H_2)
Vector norm:     ‖H‖_A = sqrt(⟨H, H⟩_A)
Distance:        dist(A, B) = ‖Log_A(B)‖_A = ‖log(A^T B)‖
Exponential:     Exp_A(H) = A Exp_I(A^T H) = A exp(A^T H)
Logarithm:       Log_A(B) = A log(A^T B)
Mean:            mean(A, B) = A exp(0.5 log(A^T B))

Table C.1: Toolbox for the special orthogonal group, SO(3).
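Table C.1 translates directly into code. The following is a minimal sketch of such a toolbox (the function names so3_dist, so3_exp, so3_log, so3_mean are ours, built on SciPy's expm and logm); it assumes the rotations involved are close enough that the principal matrix logarithm is well defined:

```python
import numpy as np
from scipy.linalg import expm, logm

def rlog(X):
    """Principal real matrix logarithm (imaginary round-off discarded)."""
    return np.real(logm(X))

def so3_dist(A, B):
    return np.linalg.norm(rlog(A.T @ B), 'fro')   # ||log(A^T B)||

def so3_exp(A, H):
    return A @ expm(A.T @ H)                      # Exp_A(H) = A exp(A^T H)

def so3_log(A, B):
    return A @ rlog(A.T @ B)                      # Log_A(B) = A log(A^T B)

def so3_mean(A, B):
    return A @ expm(0.5 * rlog(A.T @ B))          # geodesic midpoint

# Two rotations to test with (exponentials of skew-symmetric matrices).
A = expm(np.array([[0, -0.5, 0.1], [0.5, 0, -0.2], [-0.1, 0.2, 0]]))
B = expm(np.array([[0, 0.3, -0.4], [-0.3, 0, 0.2], [0.4, -0.2, 0]]))

M = so3_mean(A, B)
assert np.isclose(so3_dist(A, M), so3_dist(M, B))   # M is halfway
assert np.allclose(so3_exp(A, so3_log(A, B)), B)    # Exp_A inverts Log_A
```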
From there, we can define the Exp and Log maps everywhere on SO(3), see table C.1.
We introduce the next two functions, as we did in section 5.3 for positive matrices:
    f_A : SO(3) → R^+ : B ↦ f_A(B) = (1/2) dist^2(A, B),
    g : (SO(3))^3 → R^+ : (A, B, C) ↦ g(A, B, C) = ⟨A log(A^T B), A log(A^T C)⟩.
Algebra similar to that of section 5.3 and appendix B yields formulas for the gradients, using remark 5.3.1 and the fact that orthogonal matrices are diagonalizable (over C).
    grad f_A(B) = B log(A^T B) = −Log_B(A) ∈ T_B SO(3).
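This expression can be validated against a finite-difference approximation of the directional derivative of f_A along a geodesic t ↦ B exp(tK): by definition of the gradient, that derivative must equal ⟨grad f_A(B), BK⟩_B = trace(log(A^T B)^T K). A sketch of such a check (ours, not from the thesis):

```python
import numpy as np
from scipy.linalg import expm, logm

def skew(v):
    """Skew-symmetric matrix with axis vector v."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

A = expm(skew([0.2, -0.1, 0.4]))
B = expm(skew([-0.3, 0.5, 0.1]))
K = skew([0.7, 0.2, -0.5])          # eta = B K is a tangent vector at B

def f(R):
    """(1/2) dist^2(A, R) on SO(3)."""
    return 0.5 * np.linalg.norm(np.real(logm(A.T @ R)), 'fro')**2

# Directional derivative by central differences along t -> B exp(t K).
t = 1e-6
fd = (f(B @ expm(t * K)) - f(B @ expm(-t * K))) / (2 * t)

# <grad f_A(B), B K>_B = trace(log(A^T B)^T K)
an = np.trace(np.real(logm(A.T @ B)).T @ K)
assert abs(fd - an) < 1e-5
```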
For g, first diagonalize

    A^T B = U_B D_B U_B^{-1}    (diagonalize),
    A^T C = U_C D_C U_C^{-1}    (diagonalize),

and let F̃_B and F̃_C be the matrices of first divided differences of log at the eigenvalues λ_i = (D_B)_{ii}, respectively λ_i = (D_C)_{ii}. Writing ⊙ for the entrywise (Hadamard) product, the gradients read:

    grad(A ↦ g(A, B, C))(A) = B U_B (F̃_B ⊙ (U_B^{-1} log(C^T A) U_B)) U_B^{-1}
                              + C U_C (F̃_C ⊙ (U_C^{-1} log(B^T A) U_C)) U_C^{-1} ∈ T_A SO(3),

    grad(B ↦ g(A, B, C))(B) = A U_B^{-T} (F̃_B ⊙ (U_B^T log(A^T C) U_B^{-T})) U_B^T ∈ T_B SO(3),

    grad(C ↦ g(A, B, C))(C) = A U_C^{-T} (F̃_C ⊙ (U_C^T log(A^T B) U_C^{-T})) U_C^T ∈ T_C SO(3).
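The divided-difference construction above is the classical Daleckii-Krein formula for the Fréchet derivative of the matrix logarithm: if X = U D U^{-1} with eigenvalues λ_i, then D log(X)[E] = U (F̃ ⊙ (U^{-1} E U)) U^{-1}, where F̃_ij = (log λ_i − log λ_j)/(λ_i − λ_j) for λ_i ≠ λ_j and F̃_ii = 1/λ_i. A sketch (ours) checking this building block against finite differences:

```python
import numpy as np
from scipy.linalg import expm, logm

def dlog(X, E):
    """Frechet derivative D log(X)[E] via eigendecomposition and
    first divided differences (assumes X is diagonalizable with
    distinct eigenvalues off the negative real axis)."""
    lam, U = np.linalg.eig(X)              # X = U diag(lam) U^{-1}, complex in general
    Uinv = np.linalg.inv(U)
    L, M = np.meshgrid(lam, lam, indexing='ij')
    with np.errstate(divide='ignore', invalid='ignore'):
        F = (np.log(L) - np.log(M)) / (L - M)   # first divided differences of log
    mask = np.isclose(L, M)
    F[mask] = 1.0 / L[mask]                     # diagonal: d/dx log(x) = 1/x
    return U @ (F * (Uinv @ E @ U)) @ Uinv      # Hadamard product, then undiagonalize

# X: a rotation with distinct eigenvalues; E: an arbitrary perturbation.
X = expm(np.array([[0, -0.7, 0.2], [0.7, 0, -0.3], [-0.2, 0.3, 0]]))
E = np.array([[0.1, 0.4, -0.2], [0.0, -0.3, 0.5], [0.2, 0.1, -0.1]])

t = 1e-6
fd = (logm(X + t * E) - logm(X - t * E)) / (2 * t)
assert np.allclose(np.real(dlog(X, E)), np.real(fd), atol=1e-5)
```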
This is sufficient to compute the gradient of the objective function (3.6) applied to Γ = SO(3) × · · · × SO(3). The descent algorithms from chapter 3 apply directly, with the following vector transport on SO(3), extended componentwise to Γ:

    T_ξ(η) = B A^T η,    with ξ, η ∈ T_A SO(3) and ξ = Log_A(B).
This vector transport preserves inner products (it is an isometric transport), i.e.,

    ⟨η_1, η_2⟩_A = ⟨T_ξ(η_1), T_ξ(η_2)⟩_B.
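A quick numerical check (ours) that this transport indeed maps T_A SO(3) into T_B SO(3) and preserves the inner product:

```python
import numpy as np
from scipy.linalg import expm

def skew(v):
    """Skew-symmetric matrix with axis vector v."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

A = expm(skew([0.1, 0.6, -0.2]))
B = expm(skew([-0.4, 0.2, 0.3]))

# Tangent vectors at A have the form eta = A K with K skew-symmetric.
eta1 = A @ skew([0.5, -0.1, 0.2])
eta2 = A @ skew([-0.3, 0.4, 0.7])

def transport(eta):
    """T_xi(eta) = B A^T eta, with xi = Log_A(B)."""
    return B @ A.T @ eta

t1, t2 = transport(eta1), transport(eta2)
# Transported vectors are tangent at B: B^T t must be skew-symmetric.
assert np.allclose(B.T @ t1 + (B.T @ t1).T, 0)
# Inner products are preserved: <eta1, eta2>_A = <t1, t2>_B.
assert np.isclose(np.trace(eta1.T @ eta2), np.trace(t1.T @ t2))
```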
Bibliography
[AFPA08] V. Arsigny, P. Fillard, X. Pennec, and N. Ayache. Geometric means in a novel vector space structure on symmetric positive-definite matrices. SIAM Journal on Matrix Analysis and Applications, 29(1):328, 2008.

[Alt00] C. Altafini. The de Casteljau algorithm on SE(3). In Nonlinear Control in the Year 2000, pages 23–34, 2000.

[AMS08] P.-A. Absil, R. Mahony, and R. Sepulchre. Optimization Algorithms on Matrix Manifolds. Princeton University Press, Princeton, NJ, 2008.

[Bak00] Andrew Baker. An introduction to matrix groups and their applications. http://www.maths.gla.ac.uk/~ajb/dvips/liebern.pdf, 2000.

[Bha07] R. Bhatia. Positive Definite Matrices. Princeton University Press, 2007.

[Boo86] W. M. Boothby. An Introduction to Differentiable Manifolds and Riemannian Geometry. Academic Press Inc., 1986.

[CKS99] P. Crouch, G. Kun, and F. Silva Leite. The de Casteljau algorithm on Lie groups and spheres. Journal of Dynamical and Control Systems, 5:397–429, 1999.

[CS91] P. Crouch and F. Silva Leite. Geometry and the dynamic interpolation problem. In Proc. Am. Control Conf., Boston, 26–29 July 1991, pages 1131–1136, 1991.

[CS95] P. Crouch and F. Silva Leite. The dynamic interpolation problem: on Riemannian manifolds, Lie groups, and symmetric spaces. J. Dynam. Control Systems, 1(2):177–202, 1995.

[CSC95] M. Camarinha, F. Silva Leite, and P. Crouch. Splines of class C^k on non-Euclidean spaces. IMA J. Math. Control Inform., 12(4):399–410, 1995.

[dC92] Manfredo Perdigão do Carmo. Riemannian Geometry. Mathematics: Theory & Applications. Birkhäuser Boston Inc., Boston, MA, 1992. Translated from the second Portuguese edition by Francis Flaherty.

[FPAA07] P. Fillard, X. Pennec, V. Arsigny, and N. Ayache. Clinical DT-MRI estimation, smoothing, and fiber tracking with log-Euclidean metrics. IEEE Transactions on Medical Imaging, 26(11):1472–1482, 2007.

[GB10] M. Grant and S. Boyd. CVX: Matlab software for disciplined convex programming, version 1.21. http://cvxr.com/cvx, May 2010.

[Hai09] Luc Haine. Éléments de géométrie différentielle (MAT2110 course notes). Mathematics Department, Université catholique de Louvain (UCL), 2009.

[Hig08] Nicholas J. Higham. Functions of Matrices: Theory and Computation. Society for Industrial and Applied Mathematics, Philadelphia, PA, USA, 2008.

[HS07] K. Hüper and F. Silva Leite. On the geometry of rolling and interpolation curves on S^n, SO_n, and Grassmann manifolds. J. Dyn. Control Syst., 13(4):467–502, 2007.
[JSR06] Janusz Jakubiak, Fátima Silva Leite, and Rui C. Rodrigues. A two-step algorithm of smooth spline generation on Riemannian manifolds. J. Comput. Appl. Math., 194(2):177–191, 2006.

[KDL07] Alfred Kume, Ian L. Dryden, and Huiling Le. Shape-space smoothing splines for planar landmark data. Biometrika, 94(3):513–528, 2007.

[Löf04] J. Löfberg. YALMIP: a toolbox for modeling and optimization in MATLAB. In Proceedings of the CACSD Conference, Taipei, Taiwan, 2004.

[MS06] Luís Machado and F. Silva Leite. Fitting smooth paths on Riemannian manifolds. Int. J. Appl. Math. Stat., 4(J06):25–53, 2006.

[MSH06] Luís Machado, F. Silva Leite, and Knut Hüper. Riemannian means as solutions of variational problems. LMS J. Comput. Math., 9:86–103 (electronic), 2006.

[NHP89] Lyle Noakes, Greg Heinzinger, and Brad Paden. Cubic splines on curved spaces. IMA J. Math. Control Inform., 6(4):465–473, 1989.

[NW99] J. Nocedal and S. J. Wright. Numerical Optimization. Springer Verlag, 1999.

[PN07] Tomasz Popiel and Lyle Noakes. Bézier curves and C^2 interpolation in Riemannian manifolds. J. Approx. Theory, 148(2):111–127, 2007.

[SASK09] Chafik Samir, P.-A. Absil, Anuj Srivastava, and Eric Klassen. A gradient-descent method for curve fitting on Riemannian manifolds. Technical Report UCL-INMA-2009.023 (submitted), Université catholique de Louvain, 2009.

[Stu98] Jos F. Sturm. Using SeDuMi 1.02, a MATLAB toolbox for optimization over symmetric cones, 1998.

[TTT99] K. C. Toh, M. J. Todd, and R. H. Tütüncü. SDPT3: a MATLAB software package for semidefinite programming. Optimization Methods and Software, 11:545–581, 1999.

[VD08] Paul Van Dooren. Algorithmique numérique (INMA2710 course notes). École Polytechnique de Louvain, Université catholique de Louvain (UCL), 2008.

[Wei10] Eric W. Weisstein. Complete metric space. From MathWorld, a Wolfram Web Resource. http://mathworld.wolfram.com/CompleteMetricSpace.html, 2010.
Je remercie chaleureusement PierreAntoine Absil, mon promoteur ainsi que Quentin Rentmeesters, son doctorant, pour leur guidance ´clair´e. e e Je d´dicace ce travail a e ` Pierre Bolly, qui m’a introduit aux math´matiques. e
. . . . . . . .4 What if we generalize to Riemannian 1. . . . . . . . . . . . . 3. . . . . . . . .3 Alternative discretizations of the objective* . . . . . . . . . . .5 Connections and covariant derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .6 Alternative III: vector space structure . . . . . . . . . . . . . . . .6 Step choosing algorithms .2 Tangent spaces and tangent vectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .1 Charts and manifolds . . . . . . . . . . . . . . . . . . . . . . . . . 2. . . . . . . . . . . 1. 2. . . . . . . . . . . . . 4. . . . . . 4 Discrete curve ﬁtting on the sphere 4. . . . . . . . .4 Scalar ﬁelds on manifolds and the gradient vector 2. . 3. . . . . . . . . . . . .5 Short state of the art . . . . . .2 Curve space and objective function . . . . . . . . . . . . . . . 4. . . .2 Generalized objective . . . . . . . . . . . . . . . . . . . . . . 5. . . . . . . . . . Conclusions and perspectives . . . . . .8 Parallel translation . . . . . . . . . . . .Contents Contents Notations Introduction 1 Discrete curve ﬁtting in Rn 1. . . . 3 Objective generalization and minimization 3.5 A geometric nonlinear conjugate gradient algorithm 3. . . . . 3. . . . . . . . . . . . . . . . . . . . . .1 S2 geometric toolbox . . . . . . . . . . . . . . . . . . . .1 Pn geometric toolbox . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .7 Results and comments . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5. . 2. . . . . . . . . . . . . .1 Geometric ﬁnite diﬀerences . . . . . . . . . . . . . . . . . . .4 A geometric steepest descent algorithm . . . . . . . . . . . . . . . manifolds? . . . . . . . . . . 2. . . . . . . . . . . . . . . . . . 1. . 
.6 Distances and geodesic curves . . .3 Leastsquares formulation . . . . . . . . . i iii 1 3 4 4 6 6 7 9 10 11 13 14 15 17 18 20 23 24 25 26 28 29 31 33 34 35 37 37 49 50 51 52 56 56 57 58 63 i 2 Elements of Riemannian geometry 2. . . . . . . . 5 Discrete curve ﬁtting on positivedeﬁnite matrices 5. . . .3 Gradient of the objective . . . . . . ﬁeld . . . . . 5. . . .4 Results and comments . . . 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4. . . . . . . .2 Curve space and objective function . . .3 Inner products and Riemannian manifolds . . . . . . 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2 Objective discretization . . . . . . . . . . . . . . . . . . . . . . . . . . . . . + 5. . . . . . . . . .3 Gradient of the objective . . 2. .1 The problem in a continuous setting 1. . . . . . . . . .7 Exponential and logarithmic maps . . . . .4 Alternative I: linear interpolation . . . . . . . . . 5. . . . . . . . . . . . .5 Alternative II: convex programming . . . 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
ii A Line search derivative on S2 n B Gradient of Ea for P+ CONTENTS 65 67 69 71 C SO(3) geometric toolbox Bibliography .
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . tangent space to M at x . . η. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Notations · M. . . . unit sphere embedded in R3 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . exponential map on a manifold . . . . . . . . . . . . . . . . . . . v X. . . . . . . . . . . . . . . . . . . . . . . . . . . Frobenius norm for matrices . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Y Df (x)[ξ] dist Exp Log S2 Hn Pn + usual 2norm in Rn . . . . . . ﬁnitedimensional. . . . . . . . . . . . . . p Tx M ξ. . . . . vector ﬁelds on a manifold . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . set of positivedeﬁnite matrices . . . . derivative of f at x in the direction ξ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . logarithmic map on a manifold . . . . tangent vectors to a manifold . . . . . . . . . . . . . . set of symmetric matrices . . . . . . . . . . . . . . . . y. . . . . . . . . . . . . . . . (usually) smooth. . . . . . . points on a manifold . . . . . . . . . . . . . . . . . . . . . . . . . N x. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . geodesic distance on a manifold . . . . . . . . . Table 1: Notations 4 10 12 12 13 14 17 19 19 34 50 50 iii . . . . . . . . . . . . . . . . . . Riemannian manifolds . . . .
iv .
. µ = 0.Introduction Let p1 . each γi is associated to a ﬁxed time τi such that t1 = τ1 < τ2 < · · · < τNd = tN . which we introduce in this work. give. The (necessary) discretization for implementation purposes arises at the very end of the algorithm design process.5) that (i) when λ > 0. The curve space is reduced to the set of sequences of Nd points on the manifold. . The main contribution of this document lies in the proposed discretization of the problem. The data points pi are now allowed to lie on a Riemannian manifold M. γ(ti )) + i=1 λ 2 tN γ(t). One way of formalizing this loose description of a natural need is to express γ as the minimizer of an energy function such as N 1 λ tN µ tN E(γ) = pi − γ(ti ) 2 + γ(t) 2 dt + ˙ γ (t) 2 dt. pN be N data points in the Euclidean space Rn with time labels t1 ≤ t2 ≤ · · · ≤ tN . . . the optimal γ is piecewise aﬃne and (ii) when λ = 0. dt2 The weights αi and βi are chosen based on a suitable numerical integration scheme like. · the Riemannian metric. are approximations to the velocity and acceleration vectors γ(ti ) and ˙ D2 γ (ti ). We propose the following discretization of (2): E(γ) = 1 2 N dist2 (pi . (3) Here. the authors wonder what would happen if one were to discretize (2) directly. For a discrete curve γ = (γ1 . . which we note S2 . γ ˙ 2 γ the ﬁrst derivative of γ and D 2 the second covariant derivative of γ (chapter 2 deﬁnes these dt terms and other relevant elements of Riemannian geometry). vi i=1 γi + µ 2 Nd βi a i . such that the sums eﬀectively discretize the integrals present in (2). . . In conclusion of [SASK09]. γ(t) ˙ ˙ t1 γ(t) dt + µ 2 tN t1 D2 γ D2 γ . We focus on a broader problem. The tuning parameters λ and µ enable the user to tune the balance between the conﬂicting goals of ﬁtting and smoothness. The present work is an attempt to answer that question. Cubic splines are piecewise cubic polynomial curves of class C 2 . In particular.×M (Nd copies of M). Samir et al. . 
solve this problem by computing the gradient of E in the socalled Palais metric and applying a steepest descent scheme to minimize E. A classic problem arising in many disciplines consists in searching for a curve γ : [t1 . p2 . Rn is the simplest example of such a manifold. In essence.g. the following deﬁnition in [SASK09]: E(γ) = 1 2 N dist2 (pi . help reduce measurement noise and ﬁll gaps in the data. Γ = M×. dt2 dt2 dt. e. the cone of positivedeﬁnite matrices Pn and the special orthogonal group SO(3). the trapezium method. the indices si are chosen such that τsi is closest (ideally equal) to ti . rooted at γi . section 1. tN ] → Rn that simultaneously (i) reasonably ﬁts the data and (ii) is suﬃciently smooth. among other things. γNd ) ∈ Γ. γsi ) + i=1 λ 2 Nd αi vi . Samir et al. a i i=1 γi . Other examples treated in this document include the sphere embedded in R3 . . We compute these with generalized ﬁnite diﬀerences. . It is well known (see state of the art. This may. . ¨ (1) 2 i=1 2 t1 2 t1 deﬁned over some suitable curve space Γ. mutatis mutandis. the problem is a variational problem in inﬁnite dimension. Generalizations of (1) to manifolds have been + proposed. γ(t) (2) where dist denotes the Riemannian (geodesic) distance on M. The tangent vectors vi and ai . µ > 0 the optimal γ is an approximating cubic spline (provided Γ contains these curves). ·. We also 1 .
g. Our short answer to the question Samir et al. To cite but a few examples: • Positions on Earth closely resemble points on S2 . e.. Each time an application uses data belonging to (tractable) manifolds.g. in case the control variables from the systems at hand belong to manifolds. on the sphere S2 (chapter 4). • Conﬁgurations of rigid bodies in space are fully described by the position of their center of mass and their orientation. a complex manifold we plan on working with in the future. Our results correspond nicely to what one would expect based on the continuous case. We successfully implemented our methods in Rn (chapter 1). ask is that the regression problem on these manifolds can. Shapes belong to the shape space. Interestingly..e. For Pn and SO(3). noise reduction (i. • DiﬀusionTensor MRI measures (noisy) positivedeﬁnite matrices in the brain for medical imaging purposes. For the control engineer. diﬀer on manifolds. the special Euclidean group. e. in the cone of positivedeﬁnite matrices Pn (chapter 5) and on the special orthogonal group + SO(3) (appendix C). in general. Rigid body motion estimation thus comes down to regression on SE(3). • Shapes (seen as closed curves) can be measured.. Chapter 3 covers the details. we constructed explicit formulas for the gradients of + realvalued functions involving the matrix logarithm. which jointly constitute a point of SE(3) = R3 × SO(3). they all come down to the same discretization in Rn but. our algorithms may be helpful. in general. The same mathematics can be used for smooth camera transitions in computer graphics.2 propose other ways of discretizing (2). we may apply the optimization algorithms from [AMS08] to minimize (3). by contour detectors applied to video streams. smoothing) of the measurements and interpolation can be carried out by the generic techniques developed in this document. . be solved satisfactorily via our direct discretization approach. Since the curve space is now ﬁnitedimensional. 
Measurements on manifolds arise naturally in applications.
An objective function (energy function) will be introduced to capture the balance we are seeking between ﬁtting and smoothness. We will then investigate where the vector space structure of Rn has been used. namely: the sphere and the set of positivedeﬁnite matrices. we give a formal deﬁnition of the regression problem for timelabeled data in Rn in the continuous case. As we will see. in the following chapters. Before we give a general formulation of that problem (along with a proper deﬁnition of ﬁtting and smoothness on manifolds). it is very natural to use ﬁnite diﬀerences for that matter. what we are looking for is a suitable smooth curve deﬁned over some time interval and taking values in Rn . This will help us take conscience of how the vector space structure of Rn simpliﬁes the whole process of formalizing and solving the problem. More speciﬁcally. we will consider ﬁnitedimensional. it will be suﬃcient to describe a curve using a ﬁnite set of sampling points. smooth Riemannian manifolds. i. The second step consists in proposing a discretization for that objective. We close this chapter by a short state of the art. First oﬀ. and where we should generalize the problem formulation for the case of nonvector spaces. We will solve the discrete problem in a leastsquares framework.. Later on.e. 3 . it is helpful to investigate the Euclidean case. Doing so.Chapter 1 Discrete curve ﬁtting in Rn In this work. we show how one can ﬁnd a discrete representation of a curve lying on a manifold that strikes a (tunable) balance between data ﬁtting and smoothness. we will use these observations to establish a list of tools we should have on manifolds to adequately generalize some of the more fundamental concepts that can sometimes be hidden behind generic linear combinations.
DISCRETE CURVE FITTING IN RN 1. tN ] → Rn such that γ approximately ﬁts the data and such that γ is not too “wiggly”. . we discretize equation (1. x 2 = xT x. N }. This approach works great in the Euclidean case. We are searching for a C k (smooth “enough”) function γ : [t1 . then to optimize (1. It can be generalized to some extent. This is in direct response to the concluding remarks of [SASK09].1) We use the usual 2norm in Rn . tN ] to Rn . hence reverting to the optimization setting already covered in [AMS08].1) over the space spanned by said basis. dt2 dt2 dt.5). such that the time labels ti are increasing with i and such that the weights wi are nonnegative. see section 1. Playing with these parameters lets us move the minimizer of E from near interpolation ¨ (piecewise linear or cubic splines) to near linear regression. γ(t) (1. For the sake of completeness.2) where dist denotes the Riemannian (geodesic) distance. Rn ) for some appropriate k. captures the problem description: E(γ) = 1 2 N i=1 wi γ(ti ) − pi 2 + λ 2 tN γ(t) ˙ t1 2 + µ 2 tN γ (t) ¨ t1 2 (1. · the Riemannian metric. × Rn ≡ Rn×Nd consisting of all sequences of Nd points in Rn (the d subscript stands for discrete). pi ) : i = 1. and at the cost of increased complexity. Minimizing E is a variational problem.1) over all functions in C k ([t1 . The curve space over which E has to be minimized to obtain our regression model has inﬁnite dimension. Their strategy was to discretize as late as possible in the process of writing minimization algorithms.1). γ(t) ˙ ˙ t1 γ(t) dt + µ 2 tN t1 D2 γ D2 γ . A sensible way to do that would be to consider a ﬁnite basis of smooth functions. Furthermore. let us mention that the continuous objective (1. More precisely. . deﬁned over the set of diﬀerentiable functions γ from [t1 . ti . we consider the discrete curve space Γ = Rn × . The sampling times . The following objective (energy) function. 
which can be complicated.1) has been generalized to Riemannian manifolds to take this form: E(γ) = 1 2 N dist2 (γ(ti ).2 Objective discretization Instead of optimizing (1. .4 CHAPTER 1. The two parameters λ and µ enable us to adjust the importance of the penalties on the 2norms of the ﬁrst derivative γ and the second deriva˙ tive γ . Rn ). timelabeled points pi in Rn : {(wi . ·.1 The problem in a continuous setting Let us consider a collection of N weighted. we take the opposite strategy. pi ) + i=1 λ 2 tN γ(t). 1. reducing the curve space to a ﬁnite dimensional manifold. We discretize curves immediately. γ the ﬁrst ˙ D2 γ derivative of γ and dt2 the second covariant derivative of γ. The added complexity of the continuous approach is the main reason why.5. to manifolds. Among others (see section 1. have solved regression problems based on this energy function. not to mention that our goal in this document is to solve the problem on manifolds rather than on Rn . This is why. In this document. . we would like to be able to tune the balance between those two competing objectives. The former means that we would like γ(ti ) to be close to pi (even more so when wi is high) whereas the latter means that we would like the derivatives of γ to have a small norm. We would like to treat the Euclidean regression problem in a way that will help us work out the Riemannian case. tN ]. even in this section devoted to the Euclidean case. we would like to restrict ourselves to a ﬁnite dimensional curve space Γ. we study the simple discretization obtained by sampling curves in time. tN ]. C k ([t1 . eﬀectively discouraging sharp turns and detours. in the next section. . Samir et al. in [SASK09]. .
We deﬁne the velocity penalty. This is a consequence of Taylor’s theorem and of the deﬁnition of Riemann integrals. one could use. There may be multiple data points associated to the same discretization point. γNd ) ∈ Γ. i=1 (1. e. i. . . sN such that ti is closest to τsi and deﬁne the misﬁt penalty. . si = sj for some i = j. . The important feature is that vi ≈ γ(ti ) is a linear combination of γi and its ˙ neighbor(s).1) and see how we can discretize them without explicitly constructing an underlying γ(t).5) Again. The complete objective function is the following. to section 3. which is standard material.. . Giving a precise deﬁnition of such a γ(t) would yield a clean way of discretizing (1. the trapezium rule for numerical integration.1). The discretization process is very similar to what was done for Ev . which need not be the same as the αi ’s used for Ev . i=1 (1. any method suited for ﬁxed. In the next section. with tuning parameters λ and µ: E : Γ → R+ : γ → E(γ) = Ed (γ) + λEv (γ) + µEa (γ) (1.3) wi γsi − pi 2 . τNd are ﬁxed. except at the end points for which we need unilateral diﬀerences. v0 and vNd are deﬁned as unilateral ﬁnite diﬀerences. . as the maximum discretization step ∆τi = τi+1 −τi goes to zero. At the end points.6) We state without proof that. the acceleration vector ai at γi can be obtained via ﬁnite diﬀerences as a linear combination of γi and its neighbors.g. .6) can be minimized by solving a sparse linear system in a leastsquares sense.1) penalizes the same kind of L2 norm of the second derivative of γ(t). nonhomogeneous sampling is ﬁne. For the latter. e.1. . OBJECTIVE DISCRETIZATION τ1 . shape functions like the ones we use in ﬁnite element methods. as: N 1 Ed : Γ → R+ : γ → Ed (γ) = (1.4) where the velocity approximations vi are obtained via ﬁnite diﬀerences and the coeﬃcients αi correspond to the weights in.1) penalizes misﬁt between the curve γ and the N data points pi .e. . 
We refrain from doing that because this is the speciﬁc point that makes matters intricate on manifolds.1) penalizes a kind of L2 norm of the ﬁrst derivative of γ(t). 5 For each discrete curve γ = (γ1 . Ev . The βi ’s are adequate weights for a numerical integration method. for some si . ˙ Mainstream numerical methods can help us here.. there is an underlying continuous curve γ(t) such that γ(τi ) = γi . Let us review the three components of (1.g. Velocity penalty The second term of (1. In the Euclidean case though. The appropriate formulas can be found in section 3. the optimal discrete objective value and the optimal continuous objective value become equivalent. . . Ed . as: Ev : Γ → R+ : γ → Ev (γ) = 1 2 Nd αi vi 2 . Two things need to be discretized here: (i) the derivative γ and (ii) the integral. We delay the explicit construction of the formulas for vi . Acceleration penalty The third term of (1.2. we show how (1.1. 2 i=1 We can always choose the sampling times τi such that. We introduce indices s1 .1. Misﬁt penalty The ﬁrst term of (1. .. . ti = τsi . We introduce the acceleration penalty Ea as: 1 Ea : Γ → R : γ → Ea (γ) = 2 + Nd βi a i 2 .
. Diﬀerentiating and setting the gradient E to zero. with an LU decomposition and n backsubstitutions. a vector qd ∈ RN . even for n > 1. since they are linear combinations. . the linear combinations appearing in ﬁnite diﬀerences are not arbitrary: they make use of particular vectors that. let us observe that the objective function E is decoupled along the n dimensions of the problem. Hence. we can focus on solving the problem in Rn with n = 1. for n diﬀerent b vectors.g. we straightforwardly obtain the linear system: Aγ = b. . e. in a Euclidean space. the lack of a vector space structure raises two issues: • How can we generalize the objective E? Finite diﬀerences. we will need tools to transform curve guesses into other curves. E(γ) = + . µβNd aNd 2 . Pv . The vector qd is the only object that depends on the data points pi . As we will see in section 3. The discrete regression problem reduces to a highly structured quadratic optimization problem without constraints. Pv and Pa depend on the sampling times τi as well as on the various weights introduced. In doing so. we generalize to Riemannian manifolds. . how can we minimize it? We will need optimization algorithms on manifolds. and the obvious deﬁnitions for A. We exploit that. Same goes for the Euclidean distance. for adequate matrices Pd ∈ RN ×Nd . This problem has a rich structure because of the nature of the constraints. . (1. DISCRETE CURVE FITTING IN RN 1. modern Quadratic Programming solvers can come in handy. b and c.6 CHAPTER 1. Hence. . The objective function E: N N N 1 λ d µ d E(γ) = wi γsi − pi 2 + αi vi 2 + βi a i 2 . We will need substitutes for these elements. If one wants to add constraints (possibly forcing interpolation by some or all of the points pi or constraining speed/acceleration to respect lower and/or upper bounds on some time intervals). Pa ∈ RNd ×Nd .7) 2 i=1 2 i=1 2 i=1 can be rewritten as a sum of vector 2norms: 2 √ √ λα1 v1 w1 (γs1 − p1 ) 1 1 . 
A does not depend on the data and must be constructed only once to solve a problem in Rn .3 Leastsquares formulation First oﬀ. Namely. This is a standard result from numerical linear algebra. see for example [VD08]. moving in some direction in the curve space. 2 1 + 2 √ µβ1 a1 . The matrices Pd . do not make sense on manifolds. 1. 2 2 √ wN (γsN − pN ) λαNd vNd 2 The vectors appearing in this expression are aﬃne combinations of the variables γi and hence can be rewritten as (temporarily considering γ as a column vector): E(γ) = 1 1 1 Pd γ − qd 2 + Pv γ 2 + Pa γ 2 2 2 2 1 q T qd T T T T = γ T (Pd Pd + Pv Pv + Pa Pa )γ − qd Pd γ + d 2 2 1 T T = γ Aγ − b γ + c. which needs to be solved n times. . The regression problem in Rn can thus reliably be solved in O(nNd ) operations. lead from one point to another in the same space. . A is 5diagonal and positivedeﬁnite. • Since the objective will not be quadratic anymore.4 What if we generalize to Riemannian manifolds? In the next chapters.1. We will investigate iterative descent methods. which can be solved. For this to work.
The notion of polynomial does not carry over to M in an obvious way. In the next section, we give a brief review of the state of the art.

1.5 Short state of the art

Several existing methods for tackling the conflicting nature of the curve fitting problem (i.e., goodness of fit vs regularity) rely on turning the fitting task into an optimization problem where one of the criteria becomes the objective function and the other criterion is turned into a constraint.

Let us first consider the case where the goodness of fit is optimized under a regularity constraint. A possible regularity constraint, already considered by Lagrange, is to restrict the curve γ to the family of polynomial functions of degree not exceeding m (m ≤ N − 1). This least-squares problem cannot be straightforwardly generalized to Riemannian manifolds because of the lack of an equivalent of polynomial curves on Riemannian manifolds. An exception is the case m = 1: the polynomial functions in Rn are then straight lines, whose natural generalization on Riemannian manifolds are geodesics. The problem of fitting geodesics to data on a Riemannian manifold M was considered in [MS06] for the case where M is the special orthogonal group SO(n) or the unit sphere Sn.

In this situation, it is common to require a perfect fit: the curve fitting problem becomes an interpolation problem. When M = Rn, the interpolating curves γ that minimize

∫₀¹ ‖γ̈(t)‖² dt

(in the appropriate function space) are known as cubic splines. While the concept is well known in Rn, its generalization to Riemannian manifolds is more recent. For the case where M is a nonlinear manifold, several results on interpolation can be found in the literature. Crouch and Silva Leite [CS91, CS95] generalized cubic splines to Riemannian manifolds, defined as curves γ that minimize, over the class of all piecewise smooth curves γ : [0, 1] → M satisfying interpolation conditions, the function

(1/2) ∫_{t1}^{tN} ⟨ (D²γ/dt²)(t), (D²γ/dt²)(t) ⟩_{γ(t)} dt,

where D²/dt² denotes the (Levi-Civita) second covariant derivative and ⟨·, ·⟩_x stands for the Riemannian metric on M at x. They gave a necessary condition for optimality in the form of a fourth-order differential equation, which generalizes a result of Noakes et al. [NHP89]. Splines of class C^k were generalized to Riemannian manifolds by Camarinha et al. [CSC95]. Still in the context of interpolation on manifolds, we mention the literature on splines based on generalized Bézier curves, defined by a generalization to manifolds of the de Casteljau algorithm; see [CKS99, PN07]. Recently, Jakubiak et al. [JSR06] presented a geometric two-step algorithm to generate splines of an arbitrary degree of smoothness in Euclidean spaces, then extended the algorithm to matrix Lie groups and applied it to generate smooth motions of 3D objects. Another approach to interpolation on manifolds consists of mapping the data points onto the tangent space at a particular point of M, then computing an interpolating curve in the tangent space, and finally mapping the resulting curve back to the manifold. The mapping can be defined, e.g., by a rolling procedure; see [HS07, KDL07].

The other case is when a regularity criterion is optimized under constraints on the goodness of fit. Yet another approach, the one we focus on in this work, consists in optimizing (1.2) directly, without constraints on γ other than differentiability conditions.

In [MSH06], the objective function is defined by equation (1.2) with µ = 0, where λ (> 0) is a smoothing parameter. Solutions to this variational problem are piecewise smooth geodesics that best fit the given data. As shown in [MSH06], when λ goes to +∞, the optimal curve converges to a single point which, for certain classes of manifolds, is shown to be the Riemannian mean of the data points. When λ goes to zero, the optimal curve goes to a broken geodesic on M interpolating the data points.

In [MS06], the objective function is defined by equation (1.2) with λ = 0. The optimal curves are approximating cubic splines: they are approximating because in general γ(ti) differs from pi, and they are cubic splines because they are obtained by smoothly piecing together segments of cubic polynomials on M, in the sense of Noakes et al. [NHP89]. The authors give a necessary condition of optimality that takes the form of a fourth-order differential equation involving the covariant derivative and the curvature tensor, along with certain regularity conditions at the time instants ti, i = 1, ..., N [MS06, Th. 4.4]. It is also shown in [MS06, Prop. 4.5] that, as the smoothing parameter µ goes to +∞, the optimal curve converges to the best least-squares geodesic fit to the data points at the given instants of time. When µ goes to zero, the approximating cubic spline converges to an interpolating cubic spline [MS06, Prop. 4.6].

In summary, while the concept of smoothing spline has been satisfactorily generalized to Riemannian manifolds, the emphasis has been laid on studying properties of these curves rather than on proposing efficient methods to compute them. In this work, we focus on computing discrete representations of such curves.

In the next chapter, we study a few properties and tools of Riemannian geometry. These will enable us to give a precise meaning to the loose terms used in this section.
Chapter 2

Elements of Riemannian geometry

In the first chapter, we showed how easily the discrete regression problem can be stated and solved in a vector space. We then made a few statements about where that structure is used and what needs to be done to generalize the concepts to other spaces. In this chapter, we give a brief overview of some Riemannian geometry elements.

Loosely, manifolds are sets that locally resemble Rn. Charts are the mathematical concept that embodies this idea. We start by giving a proper definition of manifolds as sets accompanied by atlases, which are sets of charts. We will not explicitly use charts in our applications. However, defining charts and atlases is a necessary effort to build the higher level tools we do use later on. At each point of a manifold we can define a vector space, called the tangent space, that captures this local resemblance to Rn. Inner products are then defined as positive-definite forms on each tangent space. Considering spaces where these inner products vary continuously on the manifold leads to the concept of Riemannian manifolds. Using these inner products, we give a natural definition of distance between points and of gradients of real-valued functions on the manifold. Subsequently, the exponential and logarithmic maps are introduced. They are the key concepts that, later on, will enable us to generalize finite differences and descent methods for optimization. Finally, we introduce parallel translation, which is the tool that lets us compare tangent vectors rooted at different points of a manifold.

This chapter is mainly based on [AMS08, Hai09, Boo86]. Should you not be familiar with this material, we suggest you think of the general manifold M used henceforth simply as the sphere. It is also helpful to think about how the concepts specialize in the case of Rn. All the (beautiful) figures come from the book [AMS08] by Absil et al. I thank the authors for them.
2.1 Charts and manifolds

In this document, we are mainly concerned with embedded Riemannian manifolds (we will define these terms shortly). Classically though, it is important to give a proper definition of smooth manifolds first. Intuitively, manifolds are sets that can be locally identified with patches of Rn. These identifications are called charts. More formally, we define:

Definition 2.1.1 (chart). Let M be a set. A chart of M is a pair (U, ϕ) where U ⊂ M and ϕ is a bijection between U and an open set of Rn. U is the chart's domain and n is the chart's dimension. Given p ∈ U, the elements of ϕ(p) = (x1, ..., xn) are called the coordinates of p in the chart (U, ϕ).

A set of charts that covers the whole set is called an atlas for the set. The charts of an atlas need to be compatible, in the following sense:

Definition 2.1.2 (compatible charts). Two charts (U, ϕ) and (V, ψ) of M, of dimensions n and m respectively, are smoothly compatible (C∞-compatible) if either U ∩ V = ∅, or U ∩ V ≠ ∅ and
• ϕ(U ∩ V) is an open set of Rn,
• ψ(U ∩ V) is an open set of Rm,
• ψ ◦ ϕ−1 : ϕ(U ∩ V) → ψ(U ∩ V) is a smooth diffeomorphism (i.e., a smooth invertible function with smooth inverse).
When U ∩ V ≠ ∅, the latter implies n = m.

Definition 2.1.3 (atlas). A set A of pairwise smoothly compatible charts {(Ui, ϕi), i ∈ I} such that ∪_{i∈I} Ui = M is a smooth atlas of M. Two atlases A1 and A2 are compatible if A1 ∪ A2 is an atlas. Given an atlas A, one can generate a unique maximal atlas A+. Such an atlas contains A as well as all the charts compatible with A.

The set and the atlas together constitute a manifold:

Definition 2.1.4 (manifold). A smooth manifold is a pair M = (M, A+), where M is a set and A+ is a maximal atlas of M.

Once the differential manifold structure is clearly stated, no confusion is possible. Nevertheless, often times we will refer to M when we really mean M, and vice versa. In particular, the notation M ⊂ Rn means M ⊂ Rn.

[Figure 2.1: Charts. Figure courtesy of Absil et al. [AMS08].]

Example 2.1.5. The vector space Rn can be endowed with an obvious manifold structure. Simply consider M = (Rn, A+), where the atlas A+ contains the identity map ϕ : U = Rn → Rn : x → ϕ(x) = x. Defined as such, M is called a Euclidean space. We always use the notation Rn regardless of whether we consider the vector space or the Euclidean space.

Definition 2.1.6 (dimension). Given a manifold M = (M, A+), if all the charts of A+ have the same dimension n, we call n the dimension of the manifold.

We need one last definition to assess smoothness of mappings defined on manifolds:

Definition 2.1.7 (smooth mapping). Let M and N be two smooth manifolds. A mapping f : M → N is of class C^k if, for all p in M, there is a chart (U, ϕ) of M and a chart (V, ψ) of N such that p ∈ U, f(U) ⊂ V and ψ ◦ f ◦ ϕ−1 : ϕ(U) → ψ(V) is of class C^k. The latter is called the local expression of f in the charts (U, ϕ) and (V, ψ). It maps open sets of Rn to open sets of Rm, hence all classic tools apply. This definition does not depend on the choice of charts.

2.2 Tangent spaces and tangent vectors

As is customary in differential geometry, we introduce tangent spaces and tangent vectors. Those are objects that give a local representation of manifolds as vector spaces. In Rn, one can define derivatives for curves c : R → Rn:

c'(0) := lim_{t→0} (c(t) − c(0)) / t.

The difference appearing in the numerator does not, in general, make sense for manifolds. This, once again, comes from the lack of a vector space structure. For manifolds embedded in Rn however (like, e.g., the sphere), we can still make sense of the definition with the appropriate space identifications. A simple definition of tangent spaces in this setting follows. Further on, we will define tangent vectors as equivalence classes of curves. That surprising detour from the very simple idea underlying tangent vectors (namely, that they point in directions one can follow at a given point on a manifold) will free us from the need for an embedding space.

Definition 2.2.1 (tangent spaces for manifolds embedded in Rn). Let M ⊂ Rn be a smooth manifold. The tangent space at x ∈ M, noted Tx M, is the vector subspace of Rn defined by:

Tx M = {v ∈ Rn : v = c'(0) for some smooth c : R → M such that c(0) = x}.

We did not define the term submanifold properly. The following theorem is useful when dealing with manifolds defined by equality constraints on the Cartesian coordinates. In its statement, (1) means that M is a manifold endowed with the differential structure naturally inherited from Rn.

Theorem 2.2.2. Let M be a subset of Rn. Then, (1) and (2) are equivalent:
(1) M is a smooth submanifold of Rn of dimension n − m;
(2) For all x ∈ M, there is an open set V of Rn containing x and a smooth function f : V → Rm such that the differential dfx : Rn → Rm has rank m and V ∩ M = f−1(0).
Furthermore, the tangent space at x is then given by Tx M = ker dfx.

Example 2.2.3. An example of a smooth, two-dimensional submanifold of R3 is the sphere S2 = {x ∈ R3 : xT x = 1}. Use f : R3 → R : x → f(x) = xT x − 1 in theorem 2.2.2. The tangent spaces are then Tx S2 = {v ∈ R3 : vT x = 0}. We illustrate this on figure 2.2.
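The characterization Tx S2 = ker dfx is easy to check numerically. The sketch below (our own; the test curve is an arbitrary choice) differentiates a curve that stays on the sphere and verifies that its velocity is orthogonal to the foot point:

```python
import numpy as np

x = np.array([0.0, 0.0, 1.0])                 # a point on S^2

# A curve c(t) on the sphere with c(0) = x: rotate x in the xz-plane.
def c(t):
    return np.array([np.sin(t), 0.0, np.cos(t)])

h = 1e-6
v = (c(h) - c(-h)) / (2 * h)                  # numerical velocity c'(0)

# Theorem 2.2.2 with f(x) = x^T x - 1 gives T_x S^2 = {v : x^T v = 0}
assert abs(x @ v) < 1e-8                      # v is tangent at x

# the orthogonal projector onto the tangent plane at x fixes v
P = np.eye(3) - np.outer(x, x)
assert np.allclose(P @ v, v)
```

The projector I − x xᵀ used in the last check reappears below as the key computational tool for gradients and covariant derivatives on the sphere.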
[Figure 2.2: Tangent space on the sphere. Since S2 is an embedded submanifold of R3, the tangent space Tx S2 can be pictured as the plane tangent to the sphere at x, with origin at x. Figure courtesy of Absil et al. [AMS08].]

We now give a more general definition of tangent vectors that does not need the manifold to be embedded in a vector space (in general, M is not embedded in Rn).

Definition 2.2.4 (tangent space, tangent vector). Let M be a smooth manifold and p be a point on M. We define

Cp = {c : I → M : c ∈ C1, 0 ∈ I an open interval in R, c(0) = p},

the set of differentiable curves on M passing through p at t = 0. Here, c ∈ C1 is to be understood with definition 2.1.7, with the obvious manifold structure on open intervals of R derived from example 2.1.5. Let (U, ϕ) be a chart of M such that p ∈ U. We define an equivalence relation, noted ∼, on Cp: c1 ∼ c2 if and only if ϕ ◦ c1 and ϕ ◦ c2 have the same derivative at 0, i.e.:

c1 ∼ c2 ⇔ (d/dt) ϕ(c1(t))|_{t=0} = (d/dt) ϕ(c2(t))|_{t=0}.

The tangent space to M at p, noted Tp M, is the quotient space Tp M = Cp / ∼. Given c ∈ Cp, the equivalence class [c] is an element of Tp M that we call a tangent vector to M at p.

It is easy to prove that this equivalence relation is independent of the choice of chart. The mapping

θpϕ : Tp M → Rn : [c] → θpϕ([c]) = (d/dt) ϕ(c(t))|_{t=0}

is bijective and naturally defines a vector space structure over Tp M. This structure, again, is independent of the chart. When M ⊂ Rn, it is possible to build a vector space isomorphism (i.e., an invertible linear map) proving that the two definitions 2.2.1 and 2.2.4 are, essentially, equivalent.

Definition 2.2.5 (tangent bundle). Let M be a smooth manifold. The tangent bundle, noted TM, is the set

TM = ⊔_{p∈M} Tp M,

where ⊔ stands for disjoint union.

The tangent bundle is itself a manifold. This enables us to define vector fields on manifolds as smooth mappings. We define π as the natural projection on the roots of vectors, i.e., π(ξ) = p if and only if ξ ∈ Tp M.
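Definition 2.2.4 compares curves through a chart, so it helps to have a concrete atlas in hand. The following sketch builds the two standard stereographic charts on S² (a classical construction; the formulas are textbook material, not taken from the text above) and checks numerically that they are smoothly compatible in the sense of definition 2.1.2:

```python
import numpy as np

def phi_north(p):                      # chart domain: S^2 minus the north pole
    x, y, z = p
    return np.array([x, y]) / (1.0 - z)

def phi_north_inv(u):                  # inverse of the north chart
    s = u @ u
    return np.array([2 * u[0], 2 * u[1], s - 1.0]) / (s + 1.0)

def phi_south(p):                      # chart domain: S^2 minus the south pole
    x, y, z = p
    return np.array([x, y]) / (1.0 + z)

p = np.array([3.0, 0.0, 4.0]) / 5.0    # a point on S^2, in both domains
u = phi_north(p)
assert np.allclose(phi_north_inv(u), p)          # the chart is a bijection

# transition map phi_south o phi_north^{-1}: (u, v) -> (u, v) / ||(u, v)||^2,
# a smooth diffeomorphism wherever both charts are defined
assert np.allclose(phi_south(phi_north_inv(u)), u / (u @ u))
```

The two domains cover all of S², so these two charts form a smooth atlas.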
Definition 2.2.6 (vector field). A vector field X is a smooth mapping from M to TM such that π ◦ X = Id, the identity map. The vector at p will be written as Xp or X(p) and lies in Tp M.

An important example of a vector field is the gradient of a scalar field on a manifold, which we will define later on and use extensively in our optimization algorithms.

Let us introduce an alternative notation for tangent vectors to curves (velocity vectors) that will make things look more natural in the sequel. Given a curve of class C1, γ : [a, b] → M, and t ∈ [a, b], define another such curve on M: γt : [a − t, b − t] → M : τ → γt(τ) = γ(t + τ). This curve is such that γt(0) = γ(t); hence [γt] ∈ T_{γ(t)} M is a vector tangent to γ at time t. We propose to write γ̇(t) := [γt]. This notation has the advantage of capturing the nature of this vector (a velocity vector). When M = Rn, γ̇(t) is identified with γ'(t).

2.3 Inner products and Riemannian manifolds

Exploiting the vector space structure of tangent spaces, we define inner products as follows:

Definition 2.3.1 (inner product). Let M be a smooth manifold and fix p ∈ M. An inner product ⟨·, ·⟩p on Tp M is a bilinear, symmetric, positive-definite form on Tp M, i.e., ∀ξ, η, ζ ∈ Tp M, a, b ∈ R:
• ⟨aξ + bζ, η⟩p = a ⟨ξ, η⟩p + b ⟨ζ, η⟩p,
• ⟨ξ, η⟩p = ⟨η, ξ⟩p,
• ⟨ξ, ξ⟩p ≥ 0, and ⟨ξ, ξ⟩p = 0 ⇔ ξ = 0.
The norm of a tangent vector ξ ∈ Tp M is ‖ξ‖p = √⟨ξ, ξ⟩p. Often, when it is clear from the context that ξ and η are rooted at p, we write ⟨ξ, η⟩ instead of ⟨ξ, η⟩p and ‖ξ‖ instead of ‖ξ‖p.

Let M be a smooth manifold of dimension n and p be a point on M. Let (e1, ..., en) be a basis of Rn and let (U, ϕ) be a chart of M such that p ∈ U. We define the tangent vectors Ei = (θpϕ)−1(ei) for i ranging from 1 to n; (E1, ..., En) is a basis of Tp M. Every vector ξ ∈ Tp M can be uniquely decomposed as ξ = Σ_{i=1}^n ξi Ei. Now, using the inner product of definition 2.3.1, we define gij = ⟨Ei, Ej⟩p and, decomposing ζ = Σ_{i=1}^n ζi Ei like we did for ξ, we have:

⟨ξ, ζ⟩p = Σ_{i=1}^n Σ_{j=1}^n gij ξi ζj = ξ̂T Gp ζ̂,

where we defined the vectors ξ̂ = (ξ1, ..., ξn)T, ζ̂ = (ζ1, ..., ζn)T and the matrix Gp such that (Gp)ij = ⟨Ei, Ej⟩p. The matrix Gp is symmetric, positive-definite.

Definition 2.3.2 (Riemannian manifold). A Riemannian manifold is a pair (M, g), where M is a smooth manifold and g is a Riemannian metric. A Riemannian metric is a smoothly varying inner product defined on the tangent spaces of M, i.e., for each p ∈ M, gp(·, ·) is an inner product on Tp M, and we write ⟨ξ, η⟩p = gp(ξ, η). In this definition, smoothly varying means that the functions p → (Gp)ij, which are functions from U to R, are smooth.

As is customary, when the metric is clear from the context, we will often refer to a Riemannian manifold (M, g) simply as M. From now on, when we use the term manifold, we refer to a smooth, finite-dimensional Riemannian manifold, unless otherwise stated.
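The coordinate formula ⟨ξ, ζ⟩p = ξ̂ᵀ Gp ζ̂ can be illustrated numerically. The sketch below (our own; the non-orthonormal basis is an arbitrary choice) checks it against the ambient Euclidean inner product for a two-dimensional tangent space sitting in R³:

```python
import numpy as np

# Gram matrix of a (non-orthonormal) basis (E1, E2) of a tangent plane,
# with the Euclidean inner product of R^3 playing the role of the metric.
E = np.array([[1.0, 0.0, 0.0],
              [1.0, 1.0, 0.0]])        # rows are the basis vectors E1, E2
G = E @ E.T                            # G_ij = <E_i, E_j>

xi_hat = np.array([1.0, 1.0])          # coordinates of xi   =   E1 + E2
zeta_hat = np.array([0.0, 1.0])        # coordinates of zeta =        E2

xi = xi_hat @ E                        # the vectors themselves, in R^3
zeta = zeta_hat @ E

# the coordinate expression agrees with the ambient inner product
assert np.isclose(xi_hat @ G @ zeta_hat, xi @ zeta)
print(xi_hat @ G @ zeta_hat)           # -> 3.0
```

In an orthonormal basis G is the identity and the formula collapses to the familiar dot product of coordinate vectors.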
2.4 Scalar fields on manifolds and the gradient vector field

Let M be a smooth manifold. A scalar field on M is a smooth function f : M → R. We now generalize the concept of directional derivatives to scalar fields on manifolds.

Definition 2.4.1 (directional derivative). The directional derivative of a scalar field f on M at p ∈ M in the direction ξ = [c] ∈ Tp M is the scalar:

Df(p)[ξ] = (d/dt) f(c(t))|_{t=0}.

It is easily checked that this definition does not depend on the choice of c. In the above notation, the brackets around ξ are a convenient way of denoting that ξ is the direction. They do not mean that we are considering some sort of equivalence class of ξ (that would not make sense).

Based on this definition, and similarly to the Euclidean case, we can state the following definition. It is of major importance for our purpose: it generalizes gradients of scalar fields to manifolds.

Definition 2.4.2 (gradient). Let f be a scalar field on a smooth, finite-dimensional Riemannian manifold M. The gradient of f at p, denoted by grad f(p), is defined as the unique element of Tp M satisfying:

Df(p)[ξ] = ⟨grad f(p), ξ⟩p, ∀ξ ∈ Tp M.

Thus, grad f : M → TM is a vector field on M. For a scalar field f on a Euclidean space, grad f is the usual gradient. As in the Euclidean case, the gradient defined above is the steepest-ascent vector field, and the norm ‖grad f(p)‖p is the steepest slope of f at p.

One way to derive an expression for the gradient of a scalar field f is to work out an expression for the directional derivative of f, then to write it as an inner product suitable for direct identification. That might not always be practical. Theorem 2.4.5 provides an alternative, often shorter, way for Riemannian submanifolds of Euclidean spaces. We first need a definition.

Definition 2.4.3 (Riemannian submanifold). A Riemannian submanifold M of Rn is a submanifold of Rn (as indirectly defined by theorem 2.2.2) equipped with the Riemannian metric inherited from Rn, i.e., the metric on M is obtained by restricting the metric on Rn to M.

Riemannian submanifolds of Rn are quite common and easy to picture. Since such manifolds are embedded in Rn, the tangent space Tx M is a vector subspace of Rn. We can thus define (unique) orthogonal projectors

Px : Rn → Tx M : v → Px v and Px⊥ : Rn → Tx M⊥ : v → Px⊥ v

such that v = Px v + Px⊥ v. This follows from the direct sum decomposition Rn = Tx M ⊕ Tx M⊥.

Example 2.4.4 (continued from example 2.2.3). The Riemannian metric on the sphere is obtained by restricting the metric on R3 to S2: for x ∈ S2 and v1, v2 ∈ Tx S2, ⟨v1, v2⟩x = v1T v2. The orthogonal projector onto the tangent space Tx S2 is Px = I − xxT.

Theorem 2.4.5. Let M ⊂ Rn be a Riemannian submanifold of Rn equipped with the inner product ⟨·, ·⟩ and let f : Rn → R be a smooth scalar field. Let f̄ = f|M be the restriction of f to M. Then:

grad f̄ : M → TM : x → grad f̄(x) = Px ∇f(x),

where Px is the orthogonal projector from Rn onto Tx M. (The proof was derived independently but undoubtedly resembles existing proofs.)
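Theorem 2.4.5 is straightforward to exercise on the sphere. The sketch below (our own; the linear scalar field is an arbitrary test case) projects the Euclidean gradient and checks the defining identity Df̄(x)[ξ] = ⟨grad f̄(x), ξ⟩ numerically:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
f = lambda x: a @ x                    # smooth field on R^3, restricted to S^2

def riem_grad(x):
    # grad fbar(x) = P_x grad f(x), with P_x = I - x x^T on the sphere
    P = np.eye(3) - np.outer(x, x)
    return P @ a                       # the Euclidean gradient of f is a

x = np.array([0.0, 0.0, 1.0])
g = riem_grad(x)

# check Df(x)[xi] = <grad fbar(x), xi> along a curve staying on the sphere
xi = np.array([1.0, 0.0, 0.0])        # tangent at x (x^T xi = 0)
c = lambda t: np.array([np.sin(t), 0.0, np.cos(t)])   # c(0) = x, c'(0) = xi
h = 1e-6
num_deriv = (f(c(h)) - f(c(-h))) / (2 * h)
assert np.isclose(num_deriv, g @ xi, atol=1e-6)

assert np.isclose(x @ g, 0.0)          # the Riemannian gradient is tangent at x
```

Note that the plain Euclidean gradient a itself is not tangent to the sphere; the projection is what makes it a legitimate element of Tx S².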
Proof. Let x ∈ M. For all v ∈ Tx M, consider a smooth curve cv : R → M such that cv(0) = x and cv'(0) = v. Then, following definition 2.4.1:

Df̄(x)[v] = (d/dt) f̄(cv(t))|_{t=0} = (d/dt) f(cv(t))|_{t=0} = ⟨∇f(x), cv'(0)⟩ = ⟨∇f(x), v⟩.

Decompose the gradient into its tangent and normal components: ∇f(x) = Px ∇f(x) + Px⊥ ∇f(x). Since v is a tangent vector, ⟨Px⊥ ∇f(x), v⟩ = 0, hence Df̄(x)[v] = ⟨Px ∇f(x), v⟩. The result follows immediately from definition 2.4.2.

2.5 Connections and covariant derivatives

Let M be a Riemannian manifold and let X, Y be vector fields on M. Let x ∈ M. We would like to define the derivative of Y at x in the direction Xx. If M were a Euclidean space, we would write:

DY(x)[Xx] = lim_{t→0} (Y(x + tXx) − Y(x)) / t.

Of course, when M is not a vector space, the above equation does not make sense because x + tXx is undefined. Furthermore, even if we give meaning to this sum — and we will in section 2.7 — Y(x + tXx) and Y(x) would not belong to the same vector spaces; hence their difference would be undefined too. To overcome these difficulties, we need the concept of connection, which can be generically defined for manifolds. The derivatives defined via these connections are notably interesting because they give a coordinate-free means of defining the acceleration along a curve (i.e., the derivative of the velocity vector) as well as the Hessian of a scalar field (i.e., the derivative of the gradient vector field). By coordinate-free, we mean that the choice of charts is made irrelevant.

We now quickly go over the definition of affine connection for manifolds and the Levi-Civita theorem. Since we are mainly interested in Riemannian manifolds, we focus on the so-called Riemannian, or Levi-Civita, connections, specific to Riemannian manifolds. We then show a result for Riemannian submanifolds and promptly specialize it to submanifolds of Euclidean spaces. The latter is the important result for our study and comes in a simple form. The reader can safely skim through the technical definitions that we have to introduce in order to get there.

Let X(M) denote the set of smooth vector fields on M and F(M) denote the set of smooth scalar fields on M.

Definition 2.5.1 (affine connection). An affine connection ∇ on a manifold M is a mapping

∇ : X(M) × X(M) → X(M) : (X, Y) → ∇X Y

which satisfies the following properties, for all X, Y, Z ∈ X(M), f, g ∈ F(M) and a, b ∈ R:
(1) F(M)-linearity in X: ∇_{fX+gY} Z = f ∇X Z + g ∇Y Z;
(2) R-linearity in Y: ∇X (aY + bZ) = a ∇X Y + b ∇X Z;
(3) Product rule (Leibniz' law): ∇X (fY) = (Xf)Y + f ∇X Y.

In the product rule, we have used a standard interpretation of vector fields as derivations on M: the notation Xf stands for the scalar field on M such that Xf(p) = Df(p)[Xp]. The above properties should be compared to the usual properties of derivations in Rn.
Every smooth manifold admits infinitely many affine connections. The following example shows a natural affine connection in Euclidean space.

Example 2.5.3. In Rn, the classical directional derivative defines an affine connection:

(∇X Y)x = lim_{t→0} (Y(x + tXx) − Y(x)) / t = DY(x)[Xx].

This should give us confidence that definition 2.5.1 is a good definition. As often, this approach is called an axiomatization: we state the properties we desire in the definition, then only investigate whether such objects exist.

Definition 2.5.2 (covariant derivative). The vector field ∇X Y is called the covariant derivative of Y with respect to X for the affine connection ∇.

Essentially, ∇X Y(p) is a vector telling us how the vector field Y is changing in the direction Xp. This interpretation shows that it is nonessential that the vector fields X and Y be defined over all of M. Typically, Y would be defined on the trajectory of a curve γ and the derivative direction Xγ(t) would be equal to γ̇(t), the tangent vector to the curve at γ(t). We can thus define covariant derivatives along curves.

Not surprisingly, the added structure of Riemannian manifolds makes for stronger results. The Levi-Civita theorem singles out one particular affine connection for each Riemannian manifold.

Theorem 2.5.4 (Levi-Civita). On a Riemannian manifold M there exists a unique affine connection ∇ that satisfies
(1) ∇X Y − ∇Y X = [X, Y] (symmetry), and
(2) Z⟨X, Y⟩ = ⟨∇Z X, Y⟩ + ⟨X, ∇Z Y⟩ (compatibility with the Riemannian metric),
for all X, Y, Z ∈ X(M). This affine connection is called the Levi-Civita connection or the Riemannian connection.

In the above theorem, we used the notation [X, Y] for the Lie bracket of X and Y, which is the vector field defined by [X, Y]f = X(Yf) − Y(Xf), ∀f ∈ F(M), again using the interpretation of vector fields as derivations. As theorem 2.5.4 shows, the connection exposed in example 2.5.3 is the Riemannian connection on Euclidean spaces. We always assume that the Levi-Civita connection is the one we use.

The next theorem is an important result about the Riemannian connection of a submanifold of a Riemannian manifold, taken from [AMS08].

Theorem 2.5.5. Let M be a Riemannian submanifold of a Riemannian manifold M̄, and let ∇ and ∇̄ denote the Riemannian connections on M and M̄. Then, at each point p ∈ M,

(∇X Y)p = Pp (∇̄X Y)p

for all Xp ∈ Tp M and Y ∈ X(M).

[Figure 2.3: Riemannian connection ∇̄X X in a Euclidean space M̄ applied to a tangent vector field X to a circle M. We observe that ∇̄X X is not tangent to the circle, hence simply restricting ∇̄ to the circle is not an option: we need to project (∇̄X X)x onto the tangent space Tx M to obtain (∇X X)x. This situation is illustrated on figure 2.3. Figure courtesy of Absil et al. [AMS08].]
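The projection formula of theorem 2.5.5 can be checked numerically on the sphere. The sketch below (our own; the equator curve and the finite-difference step are arbitrary choices) computes the classical derivative of the velocity field of the equator in R³ and then projects it:

```python
import numpy as np

# On S^2, the Riemannian covariant derivative is the classical directional
# derivative in R^3 followed by projection onto the tangent space.
c = lambda t: np.array([np.cos(t), np.sin(t), 0.0])     # the equator
h = 1e-5
t = 0.7
x = c(t)
vel = (c(t + h) - c(t - h)) / (2 * h)                   # velocity at c(t)
acc = (c(t + h) - 2 * c(t) + c(t - h)) / h**2           # classical D_X X

assert np.allclose(x @ vel, 0.0, atol=1e-8)             # velocity is tangent
# the classical derivative points towards the sphere's center: not tangent
assert np.allclose(acc, -x, atol=1e-4)

# projecting onto T_x S^2 gives the covariant derivative, which vanishes here
P = np.eye(3) - np.outer(x, x)
assert np.allclose(P @ acc, 0.0, atol=1e-4)
```

The vanishing projected acceleration is exactly the phenomenon pictured in figure 2.3, and it foreshadows the next section: the equator is a geodesic of the sphere.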
In particular, for Euclidean spaces (example 2.5.3), theorem 2.5.5 yields

(∇X Y)x = Px (DY(x)[Xx]).    (2.1)

Equation (2.1) is the most important result of this section. It means that the Riemannian connection on a Riemannian submanifold M of Rn can be computed via a classical directional derivative in the embedding space followed by a projection onto the tangent space. It gives us a practical means of computing derivatives of vector fields, on the sphere for example.

Thanks to the Riemannian connection, the acceleration along a curve can now be defined.

Definition 2.5.6 (acceleration along a curve). Let γ : R → M be a C2 curve on M. The acceleration along γ is given by:

(D²γ/dt²)(t) = (D/dt) γ̇(t) = ∇_{γ̇(t)} γ̇(t).

In the above equation, we abused the notation for γ̇, which is tacitly supposed to be smoothly extended to a vector field X ∈ X(M) such that Xγ(t) = γ̇(t). An axiomatic version of this definition is given in [AMS08, p. 102], which is useful in chapter 4. It has the added advantage of showing linearity of the operator D/dt as well as natural product and composition rules. For submanifolds of Euclidean spaces, by equation (2.1), the acceleration reduces to:

(D²γ/dt²)(t) = P_{γ(t)} (d²γ/dt²)(t).

2.6 Distances and geodesic curves

The availability of inner products on the tangent spaces makes for an easy definition of curve length and distance.

Definition 2.6.1 (length of a curve). The length of a curve of class C1, γ : [a, b] → M, is defined by

L(γ) = ∫_a^b √(⟨γ̇(t), γ̇(t)⟩_{γ(t)}) dt = ∫_a^b ‖γ̇(t)‖_{γ(t)} dt.

If M is embedded in Rn, γ̇(t) can be replaced by γ'(t), where γ is considered to be a function from [a, b] to Rn and, with the right definition of g, everything reduces to the classical Euclidean formula.

Definition 2.6.2 (Riemannian distance). The Riemannian distance (or geodesic distance) on a Riemannian manifold (M, g) is given by:

dist : M × M → R+ : (p, q) → dist(p, q) = inf_{γ∈Γ} L(γ),

where Γ is the set of all C1 curves γ : [0, 1] → M such that γ(0) = p and γ(1) = q.

The definition above captures the idea that the distance between two points is the length of the shortest path joining these two points. In a Euclidean space, such a path would simply be the line segment joining the points. Under very reasonable conditions (see [AMS08, p. 46]), one can show that the Riemannian distance defines a metric.

Another characteristic of line segments, seen as curves with arc-length parameterization, is that they have zero acceleration. The next definition generalizes the concept of straight lines to Riemannian manifolds, preserving this zero-acceleration characteristic.

Definition 2.6.3 (geodesic curve). A curve γ : R → M is a geodesic curve if it has zero acceleration on all its domain, i.e., if and only if

(D²γ/dt²)(t) ≡ 0.

For close points, geodesic curves are shortest paths. It is not true for any two points on a geodesic though. Indeed, think of two points on the equator of the unit sphere in R3, separated by an arc of length r ≤ π. The equator itself, parameterized by arc length, is a geodesic. Following this geodesic, one can join the two points via a path of length r or a path of length 2π − r. Unless r = π, one of these paths is bound to be suboptimal.
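On the sphere with the inherited metric, the Riemannian distance has the well-known closed form dist(x, y) = arccos(xᵀy). The sketch below (our own) checks this formula against definition 2.6.2 by measuring the length of a great-circle arc with a fine polygonal approximation:

```python
import numpy as np

def sphere_dist(x, y):
    # great-circle distance on S^2; clip guards against round-off outside [-1, 1]
    return np.arccos(np.clip(x @ y, -1.0, 1.0))

x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
assert np.isclose(sphere_dist(x, y), np.pi / 2)

# the infimum of curve lengths, checked numerically: the length of the
# great-circle arc from x to y matches arccos(x^T y)
t = np.linspace(0.0, 1.0, 20001)
arc = np.outer(np.cos(t * np.pi / 2), x) + np.outer(np.sin(t * np.pi / 2), y)
chords = np.linalg.norm(np.diff(arc, axis=0), axis=1)
assert np.isclose(chords.sum(), sphere_dist(x, y), atol=1e-6)
```

Any other path on the sphere joining x and y would come out longer in the same polygonal measurement, in line with the infimum in definition 2.6.2.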
2.7 Exponential and logarithmic maps

Exponentials are mappings that, given a point x on a manifold and a tangent vector ξ at x, generalize the concept of "x + ξ". In a Euclidean space, the sum x + ξ is a point in space that can be reached by leaving x in the direction ξ. Essentially, Expx(ξ) is a point on the manifold that can be reached by leaving x and moving in the ξ direction while remaining on the manifold. Moreover, the trajectory followed is a geodesic (zero acceleration) and the distance traveled equals the norm of ξ.

Definition 2.7.1 (exponential map). Let M be a Riemannian manifold and x ∈ M. For every ξ ∈ Tx M, there exists an open interval I ∋ 0 and a unique geodesic γ(t; x, ξ) : I → M such that γ(0) = x and γ̇(0) = ξ. Furthermore, we have the homogeneity property γ(t; x, aξ) = γ(at; x, ξ). The mapping

Expx : Tx M → M : ξ → Expx(ξ) = γ(1; x, ξ)

is called the exponential map at x.

In particular, Expx(0) = x, where 0 is the zero element of Tx M. A related concept is the logarithmic map: for two points x and y, logarithms generalize the concept of "y − x". Essentially, the logarithm is defined as the inverse mapping of the exponential map.

Exponentials can be expensive to compute. The following concept, retractions, has a simpler definition that captures the most important aspects of exponentials as far as we are concerned, while being defined in a flexible enough way that we will be able to propose retractions that are, computationally, cheaper than exponentials; see sections 3.4 and 3.5. Retractions are the core concept needed to generalize descent algorithms to manifolds. The concept was introduced by Absil et al. in [AMS08].

Definition 2.7.2 (retraction). A retraction on a manifold M is a smooth mapping R from the tangent bundle TM onto M with the following properties. Let Rx denote the restriction of R to Tx M. Then, ∀x ∈ M:
(1) Rx(0) = x, where 0 is the zero element of Tx M;
(2) the differential (DRx)0 : T0(Tx M) ≡ Tx M → Tx M is the identity map on Tx M, i.e., (DRx)0 = Id (local rigidity).

Equivalently, the local rigidity condition can be stated as: ∀ξ ∈ Tx M, the curve γξ : t → Rx(tξ) satisfies γ̇ξ(0) = [γξ] = ξ. One can think of retractions as mappings that share the important properties we need with the exponential map; in particular, an exponential map is a retraction. With retractions in general, we drop the requirement that the trajectory be a geodesic, as well as the equality between distance traveled and ‖ξ‖. Figure 2.4 illustrates the concept.

[Figure 2.4: Retraction. Figure courtesy of Absil et al. [AMS08].]
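On the sphere both maps have simple closed forms, which makes the relationship between them easy to observe. The sketch below (our own; the standard formulas for S² are used) compares the exponential map with the cheap "step then renormalize" retraction and checks local rigidity numerically:

```python
import numpy as np

def exp_map(x, xi):
    # exponential map on S^2: follow the geodesic from x with velocity xi
    n = np.linalg.norm(xi)
    if n < 1e-16:
        return x.copy()
    return np.cos(n) * x + np.sin(n) * (xi / n)

def retraction(x, xi):
    # a cheaper retraction: take the step in R^3, then project onto the sphere
    y = x + xi
    return y / np.linalg.norm(y)

x = np.array([0.0, 0.0, 1.0])
xi = np.array([0.3, -0.4, 0.0])                 # tangent at x, norm 0.5

y = exp_map(x, xi)
# the distance travelled along the geodesic equals the norm of xi
assert np.isclose(np.arccos(np.clip(x @ y, -1, 1)), np.linalg.norm(xi))

# local rigidity: retraction and exponential agree up to first order in xi
for s in (1e-1, 1e-2, 1e-3):
    gap = np.linalg.norm(exp_map(x, s * xi) - retraction(x, s * xi))
    assert gap < 2.0 * s**2                     # the gap is o(s)
```

The retraction gives up the geodesic trajectory and the exact travelled distance, but it costs only a vector addition and a normalization, which is precisely why it is attractive in iterative algorithms.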
Definition 2.7.3 (logarithmic map). Let M be a Riemannian manifold. Given a root point x ∈ M and a target point y ∈ M, we define

Logx : M → Tx M : y → Logx(y) = ξ, such that Expx(ξ) = y and ‖ξ‖x = dist(x, y).

That is, the logarithmic map returns a tangent vector at x pointing toward y and such that ‖Logx(y)‖ = dist(x, y). As long as x and y are not "too far apart", this definition is satisfactory. As is, though, this definition is not perfect: there might indeed be more than one eligible ξ. For example, think of the sphere S2 and place x and y at opposite poles: for any vector η ∈ Tx S2 such that ‖η‖ = π, we have Expx(η) = y. For a more careful definition of the logarithm, see, e.g., [dC92]. Furthermore, computing this map can be computationally expensive, and we will need to differentiate it with respect to x and y later on. Just like we introduced retractions as proxies for the exponential, we pragmatically introduce a new definition for the purpose of this work: the notion of generalized logarithm.

Definition 2.7.5 (generalized logarithm). A generalized logarithm is a smooth mapping L from M × M to TM with the following properties. Let Lx : M → Tx M denote the restriction of L to {x} × M.
(1) Lx(x) = 0;
(2) the differential (DLx)x : Tx M → T0(Tx M) ≡ Tx M is the identity map on Tx M, i.e., (DLx)x = Id.

Generalized logarithms can be constructed as extensions of local inverses of retractions. To see this, we need the inverse function theorem on manifolds, which can be stated like so; see, e.g., [Boo86].

Theorem 2.7.4 (inverse function theorem). Let M, N be smooth manifolds and let f : M → N be a smooth function. If the differential of f at x ∈ M, Dfx : Tx M → Tf(x) N, is a vector space isomorphism (i.e., an invertible linear map), then there exists an open neighborhood U ⊂ M of x such that the restriction f|U : U → f(U) is a diffeomorphism (i.e., a smooth invertible function with smooth inverse). Furthermore, writing g = f|U, we have (Dg−1)g(x) = (Dgx)−1.

Let us consider a retraction R on a smooth manifold M and a point x ∈ M. The map Rx : Tx M → M is a smooth function from the smooth manifold Tx M (with the obvious manifold structure) to M. According to definition 2.7.2, the differential (DRx)0 : T0(Tx M) ≡ Tx M → Tx M is the identity map Id on Tx M, which is, of course, a vector space isomorphism. By the inverse function theorem, there exists a neighborhood U ⊂ Tx M of 0, where 0 is the zero element of Tx M, and a smooth function Lx : Rx(U) ⊂ M → U such that Lx(Rx(ξ)) = ξ, ∀ξ ∈ U. In particular, Lx(x) = 0 since Rx(0) = x. Since Id−1 = Id, we also have that the differential (DLx)x : Tx M → T0(Tx M) ≡ Tx M is the identity map on Tx M. Hence, such an Lx satisfies the properties of definition 2.7.5. In particular, the exponential map being a retraction, we can think of Log as a generalized logarithm constructed from the retraction Exp.
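A numpy sketch of both maps on the sphere (our own; the closed-form S² logarithm is standard, and the "drop the scaling factor" construction below is one possible generalized logarithm in the sense above):

```python
import numpy as np

def exp_map(x, xi):
    n = np.linalg.norm(xi)
    return x if n < 1e-16 else np.cos(n) * x + np.sin(n) * (xi / n)

def log_map(x, y):
    # logarithmic map on S^2 (valid when y != ±x): the tangent vector at x
    # pointing toward y, with norm dist(x, y) = arccos(x^T y)
    p = y - (x @ y) * x                     # project y onto T_x S^2
    theta = np.arccos(np.clip(x @ y, -1.0, 1.0))
    return theta * p / np.linalg.norm(p)

x = np.array([0.0, 0.0, 1.0])
y = np.array([np.sin(0.8), 0.0, np.cos(0.8)])

xi = log_map(x, y)
assert np.isclose(np.linalg.norm(xi), 0.8)        # ||Log_x(y)|| = dist(x, y)
assert np.allclose(exp_map(x, xi), y)             # Exp_x(Log_x(y)) = y

# a generalized logarithm: drop the nonlinear scaling factor theta/sin(theta)
# and keep only the projected difference; L_x(x) = 0 and (DL_x)_x = Id
L = lambda x, y: y - (x @ y) * x
assert np.allclose(L(x, x), 0.0)
```

The map L is much cheaper than log_map (no arccos, no division by a sine that vanishes at the cut locus), which is the computational point of the generalized-logarithm concept.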
Let us exhibit another way of constructing generalized logarithms on a smooth manifold M. Given x ∈ M and a smooth scalar field fx : M → R, consider the smooth map

Lx : M → Tx M : y → Lx(y) = fx(y) · Logx(y).

Exploiting the fact that Log is a generalized logarithm, we verify that Lx(x) = 0 and that

(DLx)x = fx(x) · (D Logx)x + Logx(x) · Dfx(x) = fx(x) · Id,

which is the identity map on Tx M if and only if fx(x) = 1. Hence, when Logx(y) admits an expression in the form of a vector scaled by a complicated nonlinear function, a generalized logarithm may sometimes be constructed by (carefully) getting rid of the scaling factor. We do this for the sphere in section 4.1.

2.8 Parallel translation

In Euclidean spaces, it is natural to compare vectors rooted at different points in space, so much so that the notion of root of a vector is utterly unimportant. On manifolds, each tangent vector belongs to a tangent space specific to its root point. Vectors from different tangent spaces cannot be compared immediately. We need a mathematical tool capable of transporting vectors between tangent spaces while retaining the information they contain. The proper tool from differential geometry for this is called parallel translation.

Let us consider two points x, y ∈ M, a vector ξ ∈ Tx M and a curve γ on M such that γ(0) = x and γ(1) = y. We introduce X, a vector field defined along the trajectory of γ and such that Xx = ξ and ∇_{γ̇(t)} X(γ(t)) ≡ 0. We say that X is constant along γ; the transported vector is Xy. In general, it depends on γ. In general, too, computing Xy requires one to solve a differential equation on M.

Just like we introduced retractions as a simpler proxy for exponentials, and generalized logarithms for logarithms, we now introduce the concept of vector transport as a proxy for parallel translation. Again, this concept was first described by Absil et al. in [AMS08]. We first introduce the Whitney sum, then quote the definition of vector transport:

TM ⊕ TM = {(η, ξ) : η, ξ ∈ Tx M, x ∈ M}.

Hence, TM ⊕ TM is the set of pairs of tangent vectors belonging to a same tangent space. In the next definition, one of them will be the vector to transport and the other will be the vector along which to transport.

Definition 2.8.1 (vector transport). A vector transport on a manifold M is a smooth mapping

TM ⊕ TM → TM : (η, ξ) → Tη(ξ) ∈ TM

satisfying the following properties for all x ∈ M:
(1) (associated retraction) there exists a retraction R, called the retraction associated with T, such that Tη(ξ) ∈ T_{Rx(η)} M, ∀η, ξ ∈ Tx M;
(2) (consistency) T0(ξ) = ξ for all ξ ∈ Tx M;
(3) (linearity) Tη(aξ + bζ) = a Tη(ξ) + b Tη(ζ), ∀η, ξ, ζ ∈ Tx M, a, b ∈ R.

Roughly speaking, the notion of vector transport tells us how to transport a vector ξ ∈ Tx M from a point x ∈ M to a point Rx(η) ∈ M. This definition is illustrated on figure 2.5. In chapter 3, we will use this definition to give a geometric version of the nonlinear conjugate gradients method.
In our implementation for chapter 4. . In the following.8.. Example 2.8.1. chapter 4. section 3. PARALLEL TRANSLATION 21 Tx M x ξ η Rx (η) Tη (ξ) M Figure 2. in the next example taken from [AMS08.2. A valid retraction on the sphere S2 is given by: Rx (v) = An associated vector transport is: Tη (ξ) = 1 x+η I− (x + η)(x + η)T (x + η)T (x + η) ξ x+v .2. p. The scaling factor is not critical. We give the retraction and the associated transport we use on the sphere. x+v On the righthand side. we will use this deﬁnition to give a geometric version of the nonlinear conjugate gradients method. we chose to normalize such that Tη (ξ) = ξ .5: Vector transport. [AMS08]. η and ξ are to be treated as elements of R3 . x.8. according to deﬁnition 2. 172]. Figure courtesy of Absil et al.5.
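The retraction and transport of example 2.8.2 are straightforward to implement. The following sketch is our own illustration (the helper names `proj`, `retract`, `transport` are ours, not from the source); it checks the three defining properties of definition 2.8.1 numerically on random tangent vectors.

```python
import numpy as np

def proj(x, v):
    # Orthogonal projector onto the tangent space T_x S^2.
    return v - np.dot(x, v) * x

def retract(x, v):
    # R_x(v) = (x + v) / ||x + v||
    return (x + v) / np.linalg.norm(x + v)

def transport(x, eta, xi):
    # T_eta(xi) = (1/||x+eta||) (I - y y^T / ||y||^2) xi, with y = x + eta:
    # project xi onto the tangent space at R_x(eta), then rescale.
    y = x + eta
    n = np.linalg.norm(y)
    return (xi - (np.dot(y, xi) / n**2) * y) / n

rng = np.random.default_rng(0)
x = rng.standard_normal(3); x /= np.linalg.norm(x)
eta, xi, zeta = (proj(x, rng.standard_normal(3)) for _ in range(3))

# (1) associated retraction: the result is tangent at R_x(eta)
assert abs(np.dot(retract(x, eta), transport(x, eta, xi))) < 1e-12
# (2) consistency: T_0(xi) = xi
assert np.allclose(transport(x, np.zeros(3), xi), xi)
# (3) linearity
lhs = transport(x, eta, 2.0 * xi - 3.0 * zeta)
rhs = 2.0 * transport(x, eta, xi) - 3.0 * transport(x, eta, zeta)
assert np.allclose(lhs, rhs)
```

Note that the extra normalization used in chapter 4 (forcing ‖Tη(ξ)‖ = ‖ξ‖) would sacrifice strict linearity; the version above is the unnormalized transport of the example.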
Chapter 3

Objective generalization and minimization

The second chapter gave us useful tools to replace the forbidden linear combinations that appear in the Euclidean case by reasonable equivalents in the more general setting of manifolds. In this third chapter, we use these tools to introduce what we call geometric finite differences. These are then straightforwardly plugged into our original problem objective function to obtain a generalized objective.

In what constitutes a second part of this chapter, we successively give a quick reminder of the steepest descent method and nonlinear conjugate gradient methods for the minimization of a real-valued function in Rⁿ. These reviews are followed by geometric versions, where we simply replaced every forbidden operation by its geometric counterpart. Sadly, the theoretical convergence results we had for these methods in the Euclidean case do not all hold in our geometric setting, or at least cannot be proven as easily. Still, some useful results can be found in [AMS08]. As we will see in chapters 4 and 5, numerical results tend to be reassuring on these matters.

We close this chapter with a few suggestions for the step choosing algorithms. These algorithms need not be adapted, since in both the Euclidean and the Riemannian settings they are concerned with loosely minimizing real functions of one real variable.
3.1 Geometric finite differences

We start by considering a smooth function x : R → Rⁿ. We assume that we know x(t) at times t0, t0 + ∆tf and t0 − ∆tb (the subscripts f and b stand for forward and backward). The goal is to derive a formula for the velocity ẋ(t0) and the acceleration ẍ(t0) based on these values. As is customary, we write down the following Taylor expansions of x about t0:

x(t0 + ∆tf) = x(t0) + ∆tf ẋ(t0) + (∆tf²/2) ẍ(t0) + O(∆tf³),
x(t0 − ∆tb) = x(t0) − ∆tb ẋ(t0) + (∆tb²/2) ẍ(t0) + O(∆tb³).

We introduce some notations here to simplify the expressions: x0 = x(t0), xf = x(t0 + ∆tf), xb = x(t0 − ∆tb), ẋ0 = ẋ(t0), ẍ0 = ẍ(t0) and ∆t = max(∆tf, ∆tb). Rewriting, we get:

xf = x0 + ∆tf ẋ0 + (∆tf²/2) ẍ0 + O(∆t³),   (3.1)
xb = x0 − ∆tb ẋ0 + (∆tb²/2) ẍ0 + O(∆t³).   (3.2)

This is a linear system in ẋ0 and ẍ0. Solving it yields:

ẋ0 = 1/(∆tf + ∆tb) · 1/(∆tf ∆tb) · [ ∆tb² (xf − x0) − ∆tf² (xb − x0) ] + O(∆t²),
ẍ0 = 1/(∆tf + ∆tb) · 2/(∆tf ∆tb) · [ ∆tb (xf − x0) + ∆tf (xb − x0) ] + O(∆t).   (3.3)

As expected, our formulas are linear combinations of the values xb, x0 and xf. Arbitrary linear combinations do not, in general, make sense on manifolds, since manifolds do not, in general, have a vector space structure. The points x0, xb and xf are now considered to be on a manifold M. Using the logarithmic map introduced in section 2.7, we now propose a generalization of (3.3). Indeed, we have assembled the terms of (3.3) in such a way that the differences xf − x0 and xb − x0 are apparent. Those may be interpreted as vectors rooted at x0 and pointing toward, respectively, xf and xb. We call the following geometric finite differences:

ẋ0 ≈ 1/(∆tf + ∆tb) · 1/(∆tf ∆tb) · [ ∆tb² Logx0(xf) − ∆tf² Logx0(xb) ],
ẍ0 ≈ 1/(∆tf + ∆tb) · 2/(∆tf ∆tb) · [ ∆tb Logx0(xf) + ∆tf Logx0(xb) ].   (3.4)

The objects ẋ0 and ẍ0 are vectors in the tangent space Tx0 M.

It is interesting to observe that the vector ẍ0 in (3.4) is zero if x(t) is a geodesic curve, which is a desirable property. The converse is also true: if ẍ0 = 0, then there exists a geodesic γ(t) fitting the data. We give an outline of a proof. Simply build γ as follows:

γ(t) = Expx0( (t0 − t)/∆tb · Logx0(xb) )  if t < t0,
γ(t) = Expx0( (t − t0)/∆tf · Logx0(xf) )  if t ≥ t0.

By construction, γ is a geodesic before and after t0 and is continuous. We can compute the velocity vector γ̇(t0) “coming from the left”:

γ̇(t0⁻) = −1/∆tb · Logx0(xb),

and “coming from the right”:

γ̇(t0⁺) = 1/∆tf · Logx0(xf).

Assuming ẍ0 = 0, i.e., ∆tb Logx0(xf) = −∆tf Logx0(xb), these two velocity vectors agree. Hence, γ is also globally a geodesic.

Similar developments yield unilateral finite differences for end points. Considering x0 = x(t0), x1 = x(t0 + ∆t1) and x2 = x(t0 + ∆t1 + ∆t2), and defining ∆t = max(∆t1, ∆t2), we get:

ẋ0 ≈ (∆t1 + ∆t2)/(∆t1 ∆t2) · Logx0(x1) − ∆t1/(∆t2 (∆t1 + ∆t2)) · Logx0(x2),
ẍ0 ≈ −2/(∆t1 ∆t2) · Logx0(x1) + 2/(∆t2 (∆t1 + ∆t2)) · Logx0(x2).   (3.5)

Since the formulas for the acceleration are only of order 1, we may as well derive order 1 formulas for the velocity. Either of the following is fine:

ẋ0 ≈ 1/∆tf · Logx0(xf),    ẋ0 ≈ −1/∆tb · Logx0(xb).

We generalized finite differences in a way that makes sense on any smooth manifold equipped with a logarithm, or a proxy for it. We built this generalization based on our interpretation of the linear combinations appearing in the classical finite differences used in the discrete objective. Instead, we could have tried to directly discretize terms such as (D²/dt²)γ(t), appearing in the generalized continuous objective. In the case of Riemannian submanifolds of Rⁿ, the definition of the acceleration (section 2.5) would have led us to compute the finite differences in the ambient space, then project the result back on the tangent space. As it turns out, this is equivalent to computing the geometric finite differences with a particular generalized logarithm. In particular, on the sphere, we show this explicitly in section 4.1, and we use that generalized logarithm in chapter 4.

3.2 Generalized objective

In section 1.2, we defined an objective function for discrete curve fitting in Euclidean spaces. Equations (1.2), (1.3), (1.4) and (1.5) used classical tools such as Euclidean distances and finite differences. We now have new tools to rewrite these elements in a very similar form:

E(γ) = Ed(γ) + λ Ev(γ) + µ Ea(γ),
Ed(γ) = 1/2 · Σᵢ₌₁ᴺ wi dist²(pi, γ_si),
Ev(γ) = 1/2 · Σᵢ₌₁^{Nd} αi ‖vi‖²_γi,
Ea(γ) = 1/2 · Σᵢ₌₁^{Nd} βi ‖ai‖²_γi.   (3.6)

This time, the data p1, ..., pN lives on a manifold M. The distance dist(·, ·) is the geodesic distance on M (if available), or a proxy for it. The tangent vectors for velocity vi and acceleration ai at points γi are computed according to the formulas given in section 3.1, the latter being used to evaluate velocity and acceleration along a discrete curve. The function E and its components Ed, Ev and Ea are defined over a curve space Γ = M × ... × M, which is simply Nd copies of the manifold M. It is a product manifold, and as such inherits many of the properties of M. In particular, if M is finite dimensional, then so is Γ, which will always be the case in this document. This is useful because the definition of the gradient (section 2.4) applies.

If the logarithmic map needed for geometric finite differences has a too complicated form, a proxy for it can be used too. If even coming up with a generalized logarithm proves difficult, one can envision the alternative generalized objectives described in section 3.3.
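The geometric finite differences (3.4) are easy to exercise numerically. The sketch below is our own illustration on S², using the closed-form exponential and logarithm of chapter 4 (the helper names are ours); it checks that, on data sampled from a geodesic, the velocity is recovered exactly and the acceleration estimate vanishes.

```python
import numpy as np

def sphere_exp(x, v):
    # Exp_x(v) = cos(||v||) x + sin(||v||) v/||v|| on the unit sphere.
    nv = np.linalg.norm(v)
    return x.copy() if nv < 1e-16 else np.cos(nv) * x + np.sin(nv) * (v / nv)

def sphere_log(x, y):
    # Log_x(y) = dist(x, y) * P_x(y - x) / ||P_x(y - x)||
    p = y - np.dot(x, y) * x
    npn = np.linalg.norm(p)
    if npn < 1e-16:
        return np.zeros(3)
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0)) * p / npn

def geometric_fd(x0, xf, xb, dtf, dtb):
    # Geometric finite differences (3.4): velocity and acceleration at x0.
    lf, lb = sphere_log(x0, xf), sphere_log(x0, xb)
    s = 1.0 / ((dtf + dtb) * dtf * dtb)
    v = s * (dtb**2 * lf - dtf**2 * lb)
    a = 2.0 * s * (dtb * lf + dtf * lb)
    return v, a

# Sample a geodesic (great circle) at t0 - dtb, t0, t0 + dtf.
x0 = np.array([1.0, 0.0, 0.0])
w = np.array([0.0, 0.7, 0.3])            # tangent velocity at x0 (orthogonal to x0)
dtf, dtb = 0.1, 0.07
xf, xb = sphere_exp(x0, dtf * w), sphere_exp(x0, -dtb * w)

v, a = geometric_fd(x0, xf, xb, dtf, dtb)
assert np.allclose(v, w)                 # velocity recovered on a geodesic
assert np.linalg.norm(a) < 1e-10         # acceleration vanishes on a geodesic
```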
To the best of our knowledge, this is a novel objective function for the regression problem on manifolds. Other generalizations can be proposed; we will discuss some of them in the next section.

3.3 Alternative discretizations of the objective*

In this optional section, we propose two alternative discretizations of the second order term in (1.1):

∫_{t1}^{tN} ⟨ D²γ/dt², D²γ/dt² ⟩_{γ(t)} dt.   (3.7)

The geometric finite differences make for clean developments and can be used to derive higher-order formulas in a straightforward way. The discretizations we present here are limited to the second order. Both our alternatives are based on the idea that (3.7) penalizes departure from straightness. Interestingly, as we will see, these alternative formulations all come down to finite differences when specialized to the Euclidean case, although they may not be equivalent in the general, Riemannian case.

Let us consider three points γ1, γ2 and γ3 on a manifold, associated with times τ1, τ2 and τ3. Define ∆τ1 = τ2 − τ1 and ∆τ2 = τ3 − τ2. If there was a geodesic γ(t) such that γ(τi) = γi, ∀i ∈ {1, 2, 3}, and such that the length of γ between τi and τi+1 equals the geodesic distance between γi and γi+1, ∀i ∈ {1, 2}, then γ2 would be equal to

γ∗ = Expγ1( ∆τ1/(∆τ1 + ∆τ2) · Logγ1(γ3) )

and γ3 would be equal to

γ⋄ = Expγ1( (∆τ1 + ∆τ2)/∆τ1 · Logγ1(γ2) ).

Our first alternative penalizes terms of the form dist²(γ2, γ∗), whereas our second alternative penalizes terms of the form dist²(γ3, γ⋄). Let us consider ∆τ1 = ∆τ2 ≜ ∆τ for ease of notations. Specializing the penalties to a Euclidean space, we obtain:

dist²(γ2, γ∗) = ‖γ1 + (γ3 − γ1)/2 − γ2‖² = 1/4 · ‖γ1 − 2γ2 + γ3‖²,
dist²(γ3, γ⋄) = ‖(γ1 + 2(γ2 − γ1)) − γ3‖² = ‖γ1 − 2γ2 + γ3‖².

We recognize classical finite differences for acceleration at γ2. Hence, in a Euclidean space, these penalties certainly make sense and are equivalent up to a scaling factor. They are not equivalent on the sphere, and we show this explicitly by giving explicit formulas on the sphere S². We introduce S² as a manifold in section 4.1; the reader may want to read that section first.

Let ⟨x, y⟩ = xᵀy be our inner product on R³ and let γi ∈ S² = {x ∈ R³ : ⟨x, x⟩ = 1}, ∀i ∈ {1, 2, 3}. In this particular case, γ∗ can be computed as:

γ∗ = mean(γ1, γ3) = (γ1 + γ3)/‖γ1 + γ3‖.

Since dist²(x, y) = arccos²(⟨x, y⟩) on S² (see table 4.1), the associated penalty is:

dist²(γ2, γ∗) = arccos²( ⟨γ2, γ1 + γ3⟩ / √(2(1 + ⟨γ1, γ3⟩)) ).

We now compute γ⋄. It is a point on the sphere lying in the plane defined by γ1 and γ2 and such that the geodesic distance between γ2 and γ⋄ equals the geodesic distance between γ1 and γ2, i.e., the angle between γ1 and γ2 equals the angle between γ2 and γ⋄. We can formalize this as:

⟨γ⋄, γ⋄⟩ = 1,   γ⋄ = α1 γ1 + α2 γ2,   ⟨γ1, γ2⟩ = ⟨γ2, γ⋄⟩.

This system in (α1, α2) has two solutions. Either γ⋄ = γ1, which we discard as not interesting, or:

γ⋄ = 2⟨γ1, γ2⟩ γ2 − γ1.

The associated penalty takes the nice form:

dist²(γ3, γ⋄) = arccos²( 2⟨γ1, γ2⟩⟨γ2, γ3⟩ − ⟨γ1, γ3⟩ ).

This is a rather involved function considering that we will need to differentiate it. When consecutive γi's are close to each other, the geodesic distance can be approximated by the Euclidean distance in the ambient space R³. We have:

‖a − b‖² = 2(1 − ⟨a, b⟩),   a, b ∈ S².

Using this instead of arccos²(⟨a, b⟩) (this corresponds to a first order Taylor approximation) introduces an error, which is not a liability since this formula is intended for fine time sampling. We would like to exploit this to propose a cheap discretization of (3.7) on S² for uniform time sampling. In the Euclidean case, the penalty is given by:

Ea = 1/2 · Σᵢ₌₂^{Nd−1} ∆τ ‖(γi+1 − 2γi + γi−1)/∆τ²‖² = 1/2 · Σᵢ₌₂^{Nd−1} (1/∆τ³) ‖γi+1 − 2γi + γi−1‖².

The 1/∆τ³ factor is explained as follows: it combines the integration weight ∆τ with the squared 1/∆τ² factor coming from the second order finite differences. Based on the approximation above, we propose a simple form for Ea on S²:

Ea = (1/∆τ³) Σᵢ₌₂^{Nd−1} ( 1 − 2⟨γi−1, γi⟩⟨γi, γi+1⟩ + ⟨γi−1, γi+1⟩ ).   (3.8)

We neglected the endpoints. The derivatives of (3.8), seen as a scalar field in the ambient space R^{3×Nd}, are given by:

∆τ³ · ∂Ea/∂γ1 = γ3 − 2⟨γ2, γ3⟩ γ2,
∆τ³ · ∂Ea/∂γ2 = γ4 − 2(⟨γ1, γ2⟩ + ⟨γ3, γ4⟩) γ3 − 2⟨γ2, γ3⟩ γ1,
∆τ³ · ∂Ea/∂γk = γk+2 − 2(⟨γk−1, γk⟩ + ⟨γk+1, γk+2⟩) γk+1 + γk−2 − 2(⟨γk−2, γk−1⟩ + ⟨γk, γk+1⟩) γk−1,   ∀k ∈ {3, ..., Nd − 2},
∆τ³ · ∂Ea/∂γ_{Nd−1} = γ_{Nd−3} − 2(⟨γ_{Nd−3}, γ_{Nd−2}⟩ + ⟨γ_{Nd−1}, γ_{Nd}⟩) γ_{Nd−2} − 2⟨γ_{Nd−2}, γ_{Nd−1}⟩ γ_{Nd},
∆τ³ · ∂Ea/∂γ_{Nd} = γ_{Nd−2} − 2⟨γ_{Nd−2}, γ_{Nd−1}⟩ γ_{Nd−1}.

The inner product being symmetric, we identified corresponding terms. Both Ea and its gradient can thus be computed with simple inner products of close points on S² and elementary operations such as sums, subtractions and multiplications. This is cheap and reliable. Regretfully, this nice property is lost when we try to generalize to non-homogeneous time sampling.
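The relation between the exact second penalty and the cheap surrogate in (3.8) can be checked numerically. The sketch below is our own verification (variable names are ours): it confirms that γ⋄ = 2⟨γ1, γ2⟩γ2 − γ1 stays on the sphere and makes equal angles at γ2, and that each term of (3.8) is exactly half the squared ambient distance ‖γ⋄ − γ3‖², i.e. the first order replacement of arccos² by the Euclidean distance.

```python
import numpy as np

rng = np.random.default_rng(1)
g1, g2, g3 = (x / np.linalg.norm(x) for x in rng.standard_normal((3, 3)))

# "Doubled" point: on the great circle through g1, g2, at equal angle beyond g2.
g_diamond = 2.0 * np.dot(g1, g2) * g2 - g1
assert abs(np.linalg.norm(g_diamond) - 1.0) < 1e-12          # stays on the sphere
assert abs(np.dot(g1, g2) - np.dot(g2, g_diamond)) < 1e-12   # equal angles at g2

# Exact penalty: dist^2(g3, g_diamond) = arccos^2(2<g1,g2><g2,g3> - <g1,g3>).
c = np.clip(np.dot(g_diamond, g3), -1.0, 1.0)
exact = np.arccos(c) ** 2

# Cheap surrogate term from (3.8): 1 - 2<g1,g2><g2,g3> + <g1,g3>,
# which equals half the squared ambient distance between g_diamond and g3.
cheap = 1.0 - 2.0 * np.dot(g1, g2) * np.dot(g2, g3) + np.dot(g1, g3)
assert abs(cheap - 0.5 * np.linalg.norm(g_diamond - g3) ** 2) < 1e-12
assert np.isfinite(exact) and cheap >= -1e-12
```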
3.4 A geometric steepest descent algorithm

The steepest descent algorithm is probably the most straightforward means of minimizing a differentiable function f : Rⁿ → R without constraints. We give a reminder of the method in algorithm 1. We delay the details of how one can choose the step length to section 3.6. For practical purposes, one should, of course, integrate a stopping criterion into the algorithm.

Algorithm 1: Steepest descent method in Rⁿ
input : A function f : Rⁿ → R and its gradient ∇f : Rⁿ → Rⁿ. An initial guess x0 ∈ Rⁿ.
output: A sequence x0, x1, ..., converging toward a critical point of f.
for k = 0, 1, ... do
    if ∇f(xk) ≠ 0 then
        dk := −∇f(xk)/‖∇f(xk)‖
        αk := choose step length
        xk+1 := xk + αk dk
    else
        xk+1 := xk
    end
end

We would like to use the tools developed in the previous chapter to generalize algorithm 1 to a scalar field f on a manifold M. We already defined the gradient grad f : M → T M (section 2.4). What we still need is the right interpretation for the update line of the steepest descent algorithm, namely xk+1 := xk + αk dk. In the Euclidean case, the meaning of the sum is clear: starting at xk, we follow the vector αk dk until we reach a new point, which we name xk+1. In the geometric context, xk is a point on the manifold and αk dk is a tangent vector to M at xk. The summation of these two elements is undefined. Retractions, and in particular the exponential map, do exactly this: they start at a point and follow a tangent vector to reach a new point, in very much the same way. We can thus rewrite the algorithm as algorithm 2. Such a generalization can be found in [AMS08]. The initial guess can be constructed based on a priori knowledge of the problem at hand.

Algorithm 2: Geometric steepest descent method on M
input : A scalar field f : M → R and its gradient grad f : M → T M. A retraction R on M. An initial guess x0 on M.
output: A sequence x0, x1, ..., converging toward a critical point of f.
for k = 0, 1, ... do
    if grad f(xk) ≠ 0 then
        dk := −grad f(xk)/‖grad f(xk)‖_xk
        αk := choose step length
        xk+1 := Rxk(αk dk)
    else
        xk+1 := xk
    end
end

Of course, whether or not the sequence generated by algorithm 2 converges to a critical point of f (i.e., a point x∗ such that grad f(x∗) = 0) depends on the step choosing algorithm. In variations of algorithm 2, dk is not forced to point in the opposite direction of the gradient. Following Nocedal and Wright in [NW99], global convergence results then depend upon the directions sequence d0, d1, ..., as well as on the step lengths sequence α0, α1, .... The following results are concerned with general descent methods, along with proper analysis giving, notably, sufficient conditions to guarantee convergence toward a critical point of the scalar field f. Such a treatment of the method is not our main focus in this document. We thus propose to simply state the relevant theorem from [AMS08] without proof; it differs slightly from [AMS08].

Definition 3.4.1 (gradient-related sequence). The sequence {d0, d1, ...}, dk ∈ Txk M, is gradient-related if, for any subsequence {xk}k∈K of {xk} that converges to a non-critical point of f, the corresponding subsequence {dk}k∈K is bounded and satisfies

lim sup_{k→∞, k∈K} ⟨grad f(xk), dk⟩_xk < 0.

The above definition essentially forces the directions dk to be descent directions in the end. Obviously, the steepest descent sequence dk = −grad f(xk)/‖grad f(xk)‖ is gradient-related.

The next definition, based on Armijo's backtracking procedure, is a nice way of defining sufficient descent: it forces the decrease in f to be sufficient at each step (and simultaneously defines what sufficient means), based on the steepness of the chosen direction.

Definition 3.4.2 (Armijo point). Given a cost function f on M with retraction R, a point x ∈ M, a tangent vector d ∈ Tx M and scalars α > 0, β, σ ∈ (0, 1), the Armijo point is Rx(dA) = Rx(tA d) = Rx(βᵐ α d), where m is the smallest nonnegative integer such that

f(x) − f(Rx(βᵐ α d)) ≥ σ ⟨−grad f(x), βᵐ α d⟩_x.

The real number tA = βᵐ α is the Armijo step size. When d is a descent direction, such a point and step size are guaranteed to exist.

The next theorem uses both these definitions to make a statement about the convergence properties of variations of algorithm 2. It also makes it possible to prove theorem 3.4.3 fairly independently of the actual directions and steps used.

Theorem 3.4.3. Let f be a scalar field on M. Let {xk} be an infinite sequence of iterates generated by a variation of algorithm 2 such that the sequence {dk} is gradient-related and such that, at each step,

f(xk) − f(xk+1) ≥ c ( f(xk) − f(Rxk(tA_k dk)) ),

where c ∈ (0, 1) and tA_k is the Armijo step size for given parameters. Further assume that the initial level set L = {x ∈ M : f(x) ≤ f(x0)} is compact (which is certainly the case if M itself is compact). Then,

lim_{k→∞} ‖grad f(xk)‖ = 0.

We do not need to use the Armijo step size per se. Any step choosing algorithm assuring sufficient descent will do. In the end, one needs to provide a problem-specific, efficient algorithm, as we will do in the application chapters.

3.5 A geometric nonlinear conjugate gradient algorithm

In Rⁿ, the steepest descent algorithm is known for its linear convergence rate. It should not come as a surprise that the geometric version of it, which we presented in the previous section, is equally slow (although it is a bit more technical to establish it). In order to achieve faster convergence rates, one could resort to second-order methods, like Newton or quasi-Newton schemes. This is unpractical (but not impossible) in a geometric setting, since it requires a proper definition of the Hessian of a scalar field f, as well as a means to compute it. The affine connections defined in section 2.5, which give a meaning to the derivative of a vector field, are the right tool, but do not lend themselves to easy calculations. The objective functions we deal with in this document being quite involved, we decided to postpone such investigations to later work and to focus on, hopefully, superlinear schemes based on first order derivatives only.
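Algorithm 2 with Armijo backtracking fits in a few lines. The sketch below is our own illustration on S² with a toy cost f(x) = 1 − ⟨x, p⟩ (our choice, not from the source), whose Riemannian gradient is −Px(p) and whose minimizer is p; the function names are ours.

```python
import numpy as np

def retract(x, v):
    # R_x(v) = (x + v)/||x + v||, the retraction of chapter 4.
    return (x + v) / np.linalg.norm(x + v)

def steepest_descent(f, grad, x0, alpha=1.0, beta=0.5, sigma=1e-4, iters=100):
    # Algorithm 2, with the Armijo point of definition 3.4.2 as step choice.
    x = x0
    for _ in range(iters):
        g = grad(x)
        ng = np.linalg.norm(g)
        if ng < 1e-12:
            break
        d = -g / ng                     # unit-length steepest descent direction
        t = alpha
        # backtrack until f(x) - f(R_x(t d)) >= sigma * <-g, t d> = sigma * t * ng
        while f(x) - f(retract(x, t * d)) < sigma * t * ng:
            t *= beta
        x = retract(x, t * d)
    return x

p = np.array([0.0, 0.0, 1.0])                 # minimizer of the toy cost
f = lambda x: 1.0 - np.dot(x, p)
grad = lambda x: -(p - np.dot(x, p) * x)      # Riemannian gradient: -P_x(p)

xs = steepest_descent(f, grad, np.array([1.0, 0.0, 0.0]))
assert np.linalg.norm(xs - p) < 1e-6
```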
The nonlinear conjugate gradient method is such a scheme in Rⁿ. Algorithm 3 is Nocedal and Wright's version of it, based on Fletcher-Reeves coefficients. In [NW99], Nocedal and Wright mention a wealth of variations on algorithm 3 as well as useful and interesting results concerning convergence and numerical performances. It is not our aim to provide an exhaustive treatment of CG algorithms in this document.

Algorithm 3: Nonlinear conjugate gradient method in Rⁿ
input: A function f : Rⁿ → R and its gradient ∇f : Rⁿ → Rⁿ. An initial guess x0 ∈ Rⁿ.
Evaluate f0 = f(x0), ∇f0 = ∇f(x0)
p0 := −∇f0
k := 0
while ∇fk ≠ 0 do
    dk := pk/‖pk‖
    αk := choose step length
    xk+1 := xk + αk dk
    Evaluate ∇fk+1 = ∇f(xk+1)
    β^FR_{k+1} := ∇fk+1ᵀ ∇fk+1 / (∇fkᵀ ∇fk)
    pk+1 := −∇fk+1 + β^FR_{k+1} pk
    k := k + 1
end

The unit-length update directions dk have been introduced so that the step length chosen would always correspond to the actual displacement vector's length. This way, it is easier to compare the steepest descent step lengths and the NLCG step lengths.

An alternative algorithm, termed the Polak-Ribière version of algorithm 3, uses the following β's:

β^PR_{k+1} = ∇fk+1ᵀ(∇fk+1 − ∇fk) / ‖∇fk‖².

The update directions dk are not guaranteed to be descent directions unless we impose some constraints on the αk's. Indeed, we have:

⟨pk, ∇fk⟩ = −‖∇fk‖² + βk ⟨pk−1, ∇fk⟩,

which needs to be negative for dk to be a descent direction. This can be guaranteed by using the well-known strong Wolfe conditions on the αk's, see [NW99].

We now generalize algorithm 3 to manifolds. The update step xk+1 := xk + αk dk can be modified in the very same way we did for algorithm 2. The direction update though, pk+1 := −∇fk+1 + βk+1 pk, is tricky. Indeed, pk+1 and grad fk+1 live in Txk+1 M but pk is in Txk M. Hence, the sum is undefined. The vector transport concept, definition 2.8.1, comes in handy here. It enables us to transport the vector pk from Txk M to Txk+1 M, hence giving us a reasonable substitute for which the sum with −grad fk+1 is defined. Same thing goes for the βk's:

β^FR_{k+1} = ‖grad fk+1‖² / ‖grad fk‖²,
β^PR_{k+1} = ⟨grad fk+1, grad fk+1 − T_{αk dk}(grad fk)⟩ / ‖grad fk‖².   (3.9)

Interestingly, the particular choice βk = 0 makes the algorithm revert to the steepest descent method altogether. For a step length choosing algorithm which cannot guarantee a descent direction for the NLCG method, an easy workaround is to accomplish a steepest descent iteration every time pk is not a descent direction. Theorem 3.4.3 tells us that such an algorithm would still converge. In the application chapters later on, we will illustrate performances of the methods outlined here.
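To make the transported direction update concrete, here is a small sketch of our own, performing one Polak-Ribière update per (3.9) on S² with a toy cost f(x) = 1 − ⟨x, p⟩ (our choice). The transport used is the tangent-space projection of example 2.8.2, up to the immaterial scaling factor; all names are ours.

```python
import numpy as np

def proj(x, v):
    return v - np.dot(x, v) * x

def retract(x, v):
    return (x + v) / np.linalg.norm(x + v)

def transport(x, eta, xi):
    # Project xi onto the tangent space at R_x(eta), cf. example 2.8.2.
    return proj(retract(x, eta), xi)

# Toy field f(x) = 1 - <x, p> on S^2, with Riemannian gradient -P_x(p).
p = np.array([0.0, 0.0, 1.0])
grad = lambda x: -proj(x, p)

x0 = np.array([1.0, 0.0, 0.0])
p0 = -grad(x0)                           # first direction: steepest descent
alpha = 0.3
x1 = retract(x0, alpha * p0)

g0, g1 = grad(x0), grad(x1)
g0t = transport(x0, alpha * p0, g0)      # transported old gradient
beta_pr = np.dot(g1, g1 - g0t) / np.dot(g0, g0)
p1 = -g1 + beta_pr * transport(x0, alpha * p0, p0)

assert abs(np.dot(x1, p1)) < 1e-12       # update direction lives in T_{x1} S^2
assert np.dot(p1, g1) < 0                # and is a descent direction at x1
```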
Algorithm 4 synthesizes these remarks. Such a generalization can be found in [AMS08].

Algorithm 4: Geometric nonlinear conjugate gradient method on M
input: A scalar field f : M → R and its gradient grad f : M → T M. An initial guess x0 ∈ M.
Evaluate f0 = f(x0), grad f0 = grad f(x0)
p0 := −grad f0
k := 0
while grad fk ≠ 0 do
    dk := pk/‖pk‖
    αk := choose step length
    xk+1 := Rxk(αk dk)
    Evaluate grad fk+1 = grad f(xk+1)
    βk+1 := cf. equation (3.9)
    pk+1 := −grad fk+1 + βk+1 T_{αk dk}(pk)
    k := k + 1
end

When minimizing a quadratic objective in n variables, the CG method converges to the exact solution in at most n iterations. It is therefore customary to restart the nonlinear CG method every n iterations or so, by applying a steepest descent step (which boils down to setting βk = 0). This way, old, irrelevant information is erased and the algorithm can start anew. In particular, if the objective is globally convex and is a strictly convex quadratic in a neighborhood about the minimizer, the algorithm will eventually enter that neighborhood and restart in it, hence reverting to a linear CG method and converging in at most n additional steps. In our algorithms, we decide to restart every n iterations, where n is the dimension of M.

Nonlinear CG methods and their convergence properties are not well understood in Rⁿ, let alone on manifolds. We thus stop here, and rely on numerical tests in the application chapters to assess the performances of algorithm 4.

3.6 Step choosing algorithms

To choose the step length in algorithms 2 and 4, we need to solve a line search problem, i.e., compute

α∗ = arg min_{α>0} φ(α) = arg min_{α>0} f(Rx(αp)).

Despite the fact that our original problem uses manifolds, the line search problem only uses real functions: we need to (crudely) minimize a function φ : R → R, knowing that φ′(0) < 0 (descent direction). A lot of work has been done for this problem in the optimization community. Because of this, we keep the present section to a minimum, referring the reader to, e.g., [NW99].

We single out one algorithm that proved particularly interesting in our case, algorithm 5. It is based on an Armijo backtracking technique with automatic choice of the initial guess α1. Since the regression problem is quadratic in Rⁿ, we expect the objective function to “look like” a quadratic in a neighborhood around the minimizer on a manifold, precisely because manifolds locally resemble Rⁿ. This is indeed the case when working on the sphere. Hence, it makes sense to expect the line search functions to be almost quadratic around α = 0 when we get near the minimum of f. These considerations led us to choose algorithm 5 as our chief step choosing algorithm. In practice, we add cubic interpolation to algorithm 5 as well for i ≥ 2 and check that the sequence (α1, α2, ...) does not decrease too rapidly, along with a few other ad hoc numerical considerations. Following Nocedal and Wright in [NW99], we picked c = 10⁻⁴.

Algorithm 5: Backtracking-interpolation step choosing algorithm
require: A small constant c ∈ (0, 1), an iteration limit N.
input : The previous step chosen αprev and the previous slope φ′0,prev; a line search function φ and its derivative φ′ (only computed at 0).
output : A step α respecting the sufficient decrease condition φ(0) − φ(α) ≥ −cαφ′(0), or 0 if N iterations did not suffice.
Compute φ0 := φ(0) and φ′0 := φ′(0)
Set α1 := 2αprev φ′0,prev/φ′0
for i = 1, 2, ... do
    if i > N then
        return 0
    else
        Compute φi = φ(αi)
        if φ0 − φi ≥ −cαi φ′0 then
            return αi
        else
            αi+1 := −αi² φ′0 / (2(φi − φ0 − αi φ′0))
        end
    end
end

We also implemented algorithms ensuring respect of the Wolfe conditions from [NW99] and actual minimization of φ with Matlab's fminsearch algorithm. These performed well too, but we do not use them for the results shown in chapters 4 and 5.
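The interpolation step of algorithm 5 picks the minimizer of the quadratic that matches φ(0), φ′(0) and φ(αi). The following sketch is our own minimal transcription (omitting the automatic initial guess and the cubic refinement); on a strictly convex quadratic, the first interpolation lands exactly on the minimizer.

```python
def choose_step(phi, phi0, dphi0, alpha1, c=1e-4, N=50):
    # Backtracking with quadratic interpolation, cf. algorithm 5.
    # phi0 = phi(0), dphi0 = phi'(0) < 0 (descent direction).
    alpha = alpha1
    for _ in range(N):
        phi_a = phi(alpha)
        if phi0 - phi_a >= -c * alpha * dphi0:   # sufficient decrease
            return alpha
        # minimizer of the quadratic interpolating phi(0), phi'(0), phi(alpha)
        alpha = -alpha**2 * dphi0 / (2.0 * (phi_a - phi0 - alpha * dphi0))
    return 0.0

# On a quadratic, a single interpolation recovers the exact minimizer.
phi = lambda a: (a - 2.0) ** 2
alpha = choose_step(phi, phi(0.0), -4.0, alpha1=10.0)
assert abs(alpha - 2.0) < 1e-12
assert phi(alpha) <= phi(0.0)
```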
Chapter 4

Discrete curve fitting on the sphere

The previous chapter constitutes a formal definition of the regression problem we intend to solve on manifolds and of minimization algorithms on manifolds. These algorithms could, in principle, be used to solve the regression problem at hand. Tractability, however, has not been assessed yet and is a fundamental factor of success for any numerical method. Mostly, tractability will be determined by:

• the complexity of the formulas for geodesic distances and logarithmic maps as well as their derivatives,
• the convergence rate of the minimization algorithms, and
• the amount of data as well as the sampling precision required.

In this chapter, we give explicit formulas for all the elements needed to apply the nonlinear conjugate gradient scheme to discrete curve fitting on the sphere S², i.e., the unit sphere embedded in R³. The intricacy of writing the code (mostly inherent to the complexity of working out the gradient of the objective) and the actual performances are shown to be reasonable on the sphere. It will be clear, however, that the results of this chapter do not transpose to every manifold. As we will see, the difficulty of writing an algorithm to solve the problem can be reduced by using proxies for, e.g., the logarithmic map. This can have an influence on the objective function's global minimum, but this influence can be made arbitrarily small by refining the curve sampling. One of the main arguments justifying the geometric approach instead of, e.g., a direct use of spherical coordinates, is the absence of any type of singularity problem at the poles. We then show some results for different scenarios and discuss performances as well as tricks to speed things up a bit.
4.1 S² geometric toolbox

We define the set S², the unit sphere embedded in R³, and give it a natural Riemannian manifold structure inherited from R³ (i.e., S² is a submanifold of the Euclidean space R³). From now on, we indiscriminately use the symbol S² to refer to the set or the manifold, just like we often do when we use the symbol M instead of the actual set the manifold is built upon and vice versa. Table 4.1 contains a few closed-form expressions for the geometric tools we will use henceforth. In the next few paragraphs, we go over these formulas and give a word of explanation.

Set:           S² = {x ∈ R³ : xᵀx = 1}
Tangent spaces: Tx S² = {v ∈ R³ : xᵀv = 0}
Projector:     Px : R³ → Tx S² : v ↦ Px(v) = (I − xxᵀ)v
Inner product: ⟨v, w⟩x = ⟨v, w⟩ = vᵀw (induced metric)
Vector norm:   ‖v‖x = ‖v‖ = √⟨v, v⟩
Distance:      dist(x, y) = arccos(xᵀy)
Exponential:   Expx(v) = x cos(‖v‖) + (v/‖v‖) sin(‖v‖)
Logarithm:     Logx(y) = dist(x, y) · Px(y − x)/‖Px(y − x)‖
Mean:          mean(x, y) = (x + y)/‖x + y‖

Table 4.1: Geometric toolbox for the Riemannian manifold S²

S² being a submanifold of R³ defined by an equality constraint, theorem 2.2.2 applies and we are entitled to use the identification between tangent vectors and elements of R³. Defining f : R³ → R : x ↦ f(x) = xᵀx − 1, the differential dfx = 2xᵀ yields ker dfx = Tx S² = {v ∈ R³ : xᵀv = 0}. We thus think of tangent vectors as actual vectors of R³ tangent to the sphere. Furthermore, it is readily checked that Px ∘ Px = Px and that Pxᵀ = Px (i.e., Px is an orthogonal projector), with Im Px = Tx S² and ker Px = (Tx S²)⊥ (the orthogonal complement of Tx S²). The Riemannian metric as well as the norm are inherited from R³ because of the submanifold nature of S².

The geodesic distance between p and q, almost by definition, is the angle separating p and q in radians. It corresponds to the length of the shortest arc of great circle joining the two points. A great circle is a unit-radius circle centered at the origin. Great circles are geodesic curves for S². The curve t ↦ Expx(tv) describes a great circle passing through x at t = 0 and such that (d/dt) Expx(tv)|_{t=0} = v, hence Exp is indeed the exponential map for S² (this can be more thoroughly checked by computing the second covariant derivative and verifying that it vanishes).

The retraction we use on S² is the following:

Rx : Tx S² → S² : v ↦ Rx(v) = (x + v)/‖x + v‖ = (x + v)/√(1 + ‖v‖²).   (4.1)

Using this retraction, it is impossible to reach points further away from x than a right angle (i.e., it is impossible to leave the hemisphere whose pole is x). As descent methods usually perform small steps anyway, this will not be an issue.

The Log map listed in table 4.1 is indeed the inverse of the exponential map (see section 2.7), and as such is the logarithmic map on S². While the closed-form expression for the map is not too complicated, we suggest using an even simpler map instead. To simplify, when computing the distance between close points, one could use the distance in the embedding space: ‖x − y‖² = 2(1 − xᵀy) for x, y ∈ S². Proxies for the geodesic distance are compared on figure 4.1. In this work, we always use the genuine geodesic distance.
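The toolbox of table 4.1 is simple to code up. The following sketch is our own transcription (the function names are ours); it checks that Exp stays on the sphere, that dist(x, Expx(v)) = ‖v‖, and that Log inverts Exp for ‖v‖ < π.

```python
import numpy as np

def sphere_exp(x, v):
    # Exp_x(v) = cos(||v||) x + sin(||v||) v/||v||
    nv = np.linalg.norm(v)
    return x.copy() if nv < 1e-16 else np.cos(nv) * x + np.sin(nv) * (v / nv)

def sphere_log(x, y):
    # Log_x(y) = dist(x, y) * P_x(y - x) / ||P_x(y - x)||
    p = (y - x) - np.dot(x, y - x) * x
    return sphere_dist(x, y) * p / np.linalg.norm(p)

def sphere_dist(x, y):
    # dist(x, y) = arccos(x^T y), clipped against rounding just outside [-1, 1]
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

x = np.array([0.0, 0.0, 1.0])
v = np.array([0.3, -0.4, 0.0])      # tangent vector at x (orthogonal to x)
y = sphere_exp(x, v)

assert abs(np.linalg.norm(y) - 1.0) < 1e-12          # Exp stays on the sphere
assert abs(sphere_dist(x, y) - np.linalg.norm(v)) < 1e-12  # dist(x, Exp_x(v)) = ||v||
assert np.allclose(sphere_log(x, y), v)              # Log inverts Exp (||v|| < pi)
```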
To lower the computational burden, different proxies can be envisioned to replace the squared geodesic distance between x and y on S2: (1) the squared Euclidean distance (in the embedding space), which coincides with a first order Taylor development of x^T y → arccos²(x^T y); (2) a second order Taylor development; and, least interestingly, (3) the squared norm that a vector v at x pointing toward y should have such that R_x(v) = y. Figure 4.1 compares the relative error of these proxies as a function of the angle separating x and y. In this work, we always use the genuine geodesic distance.

[Figure 4.1 plot: relative error of the squared Euclidean, second order Taylor and retraction-based proxies for the squared geodesic distance, for angles between x and y from 0 to 180 degrees.]

Figure 4.1: Different proxies can be envisioned to replace the geodesic distance between x and y on S2 to lower the computational burden.

The logarithmic map Log_x, on the other hand, has a rather complicated expression, and remember that we need to differentiate it. We therefore propose a generalized logarithm, mapping points on the sphere to tangent vectors at x:

L_x : S2 → T_x S2 : y → L_x(y) = P_x(y − x) = y − (x^T y)x.    (4.1)

This is indeed a generalized logarithm: it points in the same direction as the genuine logarithmic map, but has a smaller norm. Explicitly,

L_x(y) = f_x(y) · Log_x(y),  with  f_x(y) = sqrt(1 − (x^T y)²)/arccos(x^T y),    (4.2)

where f_x is smoothly extended by defining f_x(x) = lim_{x^T y → 1} f_x(y) = 1. The error on the norm can be made arbitrarily small by constraining y to be close enough to x; we illustrate this fact by figure 4.2. Since we deal with fine samplings anyway, the additional error is a very cheap price to pay in comparison to the decrease in complexity.

4.2 Curve space and objective function

Section 3.2 gave us a generic definition of the curve space and of the objective function to minimize over that space for a regression problem on a manifold M. In this section, we specialize these definitions to M = S2. We simply have that the curve space Γ consists of Nd copies of the sphere:

Γ = S2 × ... × S2 = (S2)^{Nd}.

Since Γ is just a product manifold based on S2, the tools we listed in table 4.1 for S2 can readily be extended to Γ component wise; for the sake of completeness, we show this explicitly in table 4.2. This second toolbox will mainly be useful for the optimization algorithms: in order to define the objective function, one only needs the first toolbox.

Building on sections 3.1 and 3.2, we propose the following energy function for the regression problem on S2:

E : Γ → R : γ → E(γ) = Ed(γ) + λ Ev(γ) + µ Ea(γ).    (4.3)
The objective terms in (4.3) are defined as follows:

Ed(γ) = (1/2) Σ_{i=1}^{N} wi arccos²(pi^T γ_{si}),

Ev(γ) = (1/2) Σ_{i=1}^{Nd−1} (1/∆τi) arccos²(γi^T γ_{i+1}),

Ea(γ) = (1/2) [ (∆τ1/2) ||a1||² + Σ_{i=2}^{Nd−1} ((∆τ_{i−1} + ∆τi)/2) ||ai||² + (∆τ_{Nd−1}/2) ||a_{Nd}||² ],    (4.4)

where the ∆τi's are defined as ∆τi = τ_{i+1} − τi. Ev follows from choosing the forward order 1 formula for velocity as well as the rectangle method for the weights αi; in particular, α_{Nd} = 0, hence we do not need backward differences. Ev has a very simple form because the norm of Log_x(y) is simply the geodesic distance between x and y, so the weights appearing in the finite differences formulas cancel out the integration weights. For the weights βi, we choose the trapezium method.

Replacing Log by L in the geometric finite differences for acceleration, equations (3.4) and (3.5), we define the ai's as follows:

a1 = (2/(∆τ1 ∆τ2)) L_{γ1}(γ2) − (2/(∆τ2 (∆τ1 + ∆τ2))) L_{γ1}(γ3),

ai = (2/(∆τ_{i−1} + ∆τi)) [ (1/∆τi) L_{γi}(γ_{i+1}) + (1/∆τ_{i−1}) L_{γi}(γ_{i−1}) ],  ∀i ∈ 2 ... Nd − 1,

a_{Nd} = −(2/(∆τ_{Nd−1} ∆τ_{Nd−2})) L_{γNd}(γ_{Nd−1}) + (2/(∆τ_{Nd−2} (∆τ_{Nd−1} + ∆τ_{Nd−2}))) L_{γNd}(γ_{Nd−2}).

These will be easier to differentiate than their exact-logarithm counterparts; the main reason for doing so is to make the next section less cumbersome.

    Set:            Γ = S2 × ... × S2   (Nd copies)
    Tangent spaces: T_γ Γ = T_{γ1} S2 × ... × T_{γNd} S2
    Projector:      P_γ : R^{3×Nd} → T_γ Γ : v → P_γ(v) = w, with wi = P_{γi}(vi)
    Inner product:  ⟨v, w⟩_γ = Σ_{i=1}^{Nd} ⟨vi, wi⟩_{γi}
    Vector norm:    ||v||_γ = sqrt(⟨v, v⟩_γ)
    Exponential:    Exp_γ(v) = γ~, with γ~i = Exp_{γi}(vi)

Table 4.2: Geometric toolbox for the Riemannian manifold Γ = S2 × ... × S2.

[Figure 4.2 plot: relative norm error ||Log_x(y) − L_x(y)|| / ||Log_x(y)|| on S2, for angles between x and y from 0 to 180 degrees.]

Figure 4.2: Both Log_x(y) and L_x(y) point in the same direction, but the latter has a smaller norm. This plot illustrates the relative error introduced by considering the generalized logarithm (4.2) instead of the true logarithm, as a function of the angle separating x and y. In particular, for angles under 12.6 degrees (or 0.22 radians), the error is under 1%.

4.3 Gradient of the objective

Since S2 is a Riemannian submanifold of R3, Γ is a Riemannian submanifold of R3×Nd. Let Ē be a scalar field on an open set of the ambient space R3×Nd containing Γ and such that Ē|Γ = E, obtained by interpreting the functions defined in equations (4.4) as smooth scalar fields on that open set. We can then write:

grad E : Γ → TΓ : γ → grad E(γ) = P_γ(∇Ē(γ)),

where TΓ is the tangent bundle of Γ. The derivation does not present any challenges, since it reduces to a standard derivation of a real function followed by a projection. A completely explicit derivation of grad E would be overly cumbersome; the material exposed in this document still being a proof of concept, we do not deem it important to delve into such details yet. The following atoms are enough to obtain the needed expressions (J denotes the Jacobian, ∇ the gradient of real functions, and g : R3 → R3):

∇(x → arccos²(q^T x))(x) = (−2 arccos(q^T x)/sqrt(1 − (q^T x)²)) q,
∇(x → ||g(x)||²)(x) = 2 (Jg(x))^T g(x),
J(x → L_q(x))(x) = P_q,
J(x → L_x(q))(x) = −(x^T q) I − x q^T.

Remarkably, the ai's defined above via simplified geometric finite differences are the same as the ones we would obtain by computing the finite differences in the ambient space R3, then projecting the results back on the tangent planes using the P_{γi}'s. This makes it a lot easier to derive the gradient of E.

4.4 Results and comments

We implemented algorithm 4 for M = S2 × ... × S2 = Γ to optimize objective (4.3). We divide this section in two parts. The first one qualitatively exposes the kind of solutions we expect based on the tuning parameters λ and µ; practical results produced by our algorithms are shown to illustrate the adequacy between predicted and obtained curves. The second part deals with the behavior our algorithms exhibit when converging toward those solutions. In particular, we show that CG methods are usually more efficient than the steepest descent method and behave better (essentially reducing oscillations around the solution). Also, what we later on refer to as our refinement technique, algorithm 6, is shown to speed up convergence and increase reliability. We skip over the technical details of the implementation; the accompanying code was written with care, and the interested reader can inspect it for further information.
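All experiments below minimize the energy (4.3)-(4.4). As an illustration of how cheap one evaluation is, here is a minimal NumPy sketch (with simplifying assumptions of our own: a uniform time step, one data point attached to each discretization point, endpoint acceleration weights set to zero as in our implementation, and the generalized logarithm L in the accelerations):

```python
import numpy as np

def energy(gamma, p, w, dt, lam, mu):
    """Discrete energy E = Ed + lam*Ev + mu*Ea on S2.
    gamma, p: (Nd, 3) arrays of unit vectors; w: (Nd,) weights; dt: time step."""
    ang = lambda x, y: np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))
    L = lambda x, y: y - np.dot(x, y) * x          # generalized logarithm L_x(y)
    Nd = len(gamma)
    Ed = 0.5 * sum(w[i] * ang(p[i], gamma[i]) ** 2 for i in range(Nd))
    Ev = 0.5 * sum(ang(gamma[i], gamma[i + 1]) ** 2 / dt for i in range(Nd - 1))
    Ea = 0.0                                       # interior points only (beta_1 = beta_Nd = 0)
    for i in range(1, Nd - 1):
        a_i = (L(gamma[i], gamma[i + 1]) + L(gamma[i], gamma[i - 1])) / dt ** 2
        Ea += 0.5 * dt * np.dot(a_i, a_i)          # trapezium weight (dt + dt)/2 = dt
    return Ed + lam * Ev + mu * Ea
```

On a curve sampled from a geodesic with the data sitting on the curve, the acceleration term vanishes and the velocity term reduces to the sum of squared segment angles divided by the time step.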
4.4.1 Types of solutions

Zeroth order. Setting λ = µ = 0 yields an objective function E that completely disregards smoothness and just tries to fit the discretization points γi to the associated data. In this setting, we ignore the time labels ti, and discretization points with no data point attached become irrelevant. This makes little sense unless Nd = 1, in which case the only point we search for corresponds to the weighted mean of the data on the manifold. The objective we minimize is:

E : S2 → R : γ → E(γ) = (1/2) Σ_{i=1}^{N} wi arccos²(pi^T γ).

This is to be compared to the usual definition of the weighted arithmetic mean in Rn:

p̄ = argmin_{γ ∈ Rn} (1/2) Σ_{i=1}^{N} wi ||pi − γ||² = (Σ_{i=1}^{N} wi pi)/(Σ_{i=1}^{N} wi).

Of course, the explicit formula for p̄ as a linear combination of the pi's does not make sense on S2, and we therefore use our numerical algorithms instead¹. An excellent initial guess for the CG algorithm is the arithmetic mean in Rn projected back on the sphere by normalizing it, i.e., γ0 = p̄/||p̄||. Typically, convergence to six significant digits is achieved in under four iterations. Figure 4.3 shows an example with identical weights.

¹ Interestingly, since S2 is compact and E is continuous, we can just as easily define and compute the point that is as far away as possible from the data, as the maximizer of E; this was not possible in Rn.

Figure 4.3: The red points are the data on the sphere. The color on the sphere gives the value of the scalar field E, i.e., the sum of squared geodesic distances to the data. The minimum of E corresponds to the mean of the data, the single white dot.
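The mean computation just described can be sketched as follows (our own minimal implementation of the descent, using the projected Euclidean mean as initial guess):

```python
import numpy as np

def sphere_log(x, y):
    """Log_x(y): tangent vector at x pointing to y, with norm = angle(x, y)."""
    c = np.clip(np.dot(x, y), -1.0, 1.0)
    p = y - c * x
    n = np.linalg.norm(p)
    return np.zeros(3) if n < 1e-16 else np.arccos(c) * p / n

def sphere_exp(x, v):
    n = np.linalg.norm(v)
    return x if n < 1e-16 else x * np.cos(n) + (v / n) * np.sin(n)

def sphere_mean(points, weights, iters=20):
    """Weighted mean on S2, initialized at the normalized Euclidean mean."""
    w = np.asarray(weights, dtype=float)
    g = np.average(points, axis=0, weights=w)
    g = g / np.linalg.norm(g)                      # initial guess: projected mean
    for _ in range(iters):
        step = sum(wi * sphere_log(g, p) for wi, p in zip(w, points)) / w.sum()
        g = sphere_exp(g, step)                    # one steepest descent update
    return g
```

Each update moves γ along the weighted average of the logarithms of the data, which is minus the Riemannian gradient of E (up to the normalization by the total weight).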
First order. Setting λ > 0 and µ = 0 disregards second order derivatives and penalizes the velocity, i.e., the length of the curve. In Rn, this yields piecewise affine regression curves, with breaking points at the data times; a solution is thus completely specified by its breaking points, one γi associated to each distinct data time label ti. Solving the corresponding problem on S2, we expect the solutions to be piecewise geodesic. This is indeed the behavior we can see on figure 4.4, which shows different solutions for three different λ's. As λ goes to infinity, we expect the curve to collapse toward the mean of the data, since the length of the curve will be highly penalized. As λ goes to 0, reducing the distance between the curve and the data becomes increasingly important, yielding near interpolation. It would be interesting to check whether the breaking points are the same following both the continuous and our discrete approach.

Second order. In Rn, setting λ = 0 and µ > 0 disregards first order derivatives and penalizes second order derivatives, yielding solutions in a well-known class: cubic splines. Cubic splines are piecewise cubic curves constrained to be of class C² (in the continuous case). These are characterized by a continuous, piecewise affine acceleration profile γ̈(t) along each dimension, with breaking points at the data times. This is indeed what we obtain when solving the problem in Rn as described in chapter 1 (acceleration is computed with standard finite differences). Figure 4.5 shows that a similar effect exists in the discrete case on S2: the acceleration profile (where, for visualization, a minus sign has been added to the acceleration on the second half, since the curvature changed there) indeed exhibits a compatible, piecewise affine shape.

We may expect to always observe this same behaviour on S2. But figure 4.6 shows a different situation, where the acceleration profile is only continuous and piecewise differentiable: the profile segments are curved. This surprising result was already predicted and quantified in a continuous setting by Machado and Silva Leite in [MS06, Thm 4.4], notably showing that the fourth covariant derivative of γ does not, in general, vanish; in the continuous case, the result essentially gives a differential equation for the connecting segments and continuity constraints. Machado et al. showed the same behavior in [MSH06]. It would be interesting to investigate whether the effects are quantitatively the same or not in the continuous and discrete settings.

As µ goes to infinity, we expect the curve to converge toward a geodesic regression of the data. As µ goes to 0, fitting the data becomes increasingly important, yielding near interpolation. Tuning µ hence gives the user access to a whole family of curves. Geodesic regression is such an important special case that we give another illustration of it on figure 4.7. Sadly, for geodesic regression itself, our general approach is not well suited because of the need to make µ increase to infinity; plus, we deal with too many degrees of freedom considering the simplicity of the targeted object. In future work, we would like to try our descent algorithms directly on the set of pairs (p, v), with p ∈ S2 and v ∈ Tp S2, which can be identified with the tangent bundle of S2. The tangent bundle of a Riemannian manifold can be endowed with a metric in various ways, hence is a manifold in its own right. Machado and Silva Leite, in [MS06], exploited the fact that a geodesic is completely defined by such a pair (p, v) to exhibit a set of necessary conditions for a geodesic to be optimal for the regression problem. Sadly, these equations are complicated and the authors do not try to solve them explicitly.

Mixed. When λ and µ are both positive, there is no clear classification of the solutions, even in Rn. Reasoning on the first and second order families described above can, however, give some intuition as to how one should tune the parameters to obtain the desired curve. We show an example on figure 4.8, together with a few numerical results corroborating our intuition.
For piecewise geodesic regression, these observations are of great practical importance: we know exactly how to choose the discretization of γ to optimize at minimal computational cost. We only need to compute the end points and the breaking points, i.e., the γ(ti)'s. Numerical experiments indeed confirm that optimizing a finer curve, with γi's associated to intermediate times τi, gives the same breaking points (up to numerical precision), while the additional points lie on the geodesics joining them. The optimal objective values will also be the same, as they should. These statements hold for the definition of Ev given in equation (4.4), since integration of a constant function with the rectangle method is exact. We also know that setting the γi's on the data points is a good initial guess for the descent method.

[Figure 4.4: three spheres with fitted curves for λ = 10−3, 10−2, 10−1 and µ = 0, with speed and acceleration profiles over time.]

Figure 4.4: In this figure, p1 = p4 is the upper point; data at the breaking points has been removed. By tuning λ with µ set to zero, one can generate a family of piecewise geodesic curves ranging from near interpolation to a curve collapsed onto the mean point of the data. For the spheres from left to right, λ = 10−3, 10−2, 10−1, with µ = 0; the speed and acceleration profile colors are respectively blue, green and red. The speed and acceleration profiles have been computed using true geometric finite differences. Convergence to a curve such that the norm of grad E is less than 10−8 is achieved in, respectively, 6, 11 and 15 CG iterations with the Fletcher-Reeves coefficients and the backtracking-interpolation step choosing algorithm.

[Figure 4.5: three spheres with fitted curves for λ = 0 and µ = 10−4, 10−3, 100, with speed and acceleration profiles over time.]

Figure 4.5: λ = 0, µ = 10−4, 10−3, 100. For the spheres from left to right, the profile colors are respectively blue, green and red. The speed and acceleration profiles are computed with geometric finite differences using the exact logarithm. The acceleration profiles, as displayed with corrected sign on the second half, are piecewise affine (up to numerical precision). This is consistent with cubic splines in Rn. The near-geodesic solution is the hardest to compute; see figure 4.7 for more details. The flat segments at the end points (t = 0 and t = 1) are caused by our implementation of E, which artificially sets the integration weights β1 and βNd to zero to spare us the burden of computing the gradient of unilateral differences. This only slightly affects the solution.
[Figure 4.6: three spheres with fitted curves for λ = 0 and µ = 10−4, 10−2, 101, with speed and acceleration profiles over time.]

Figure 4.6: This figure is to be compared with figure 4.5. λ = 0, µ = 10−4, 10−2, 101. For the spheres from left to right, the profile colors are respectively blue, green and red. Notice how the acceleration profiles are not piecewise affine anymore. Previous work by Machado and Silva Leite on continuous curves predicted this departure from what can be observed with cubic splines in Rn, see [MS06]. This numerical experiment shows the existence of a similar effect for discretized curves.
Figure 4.7: When the acceleration on γ ∈ Γ is evaluated using geometric finite differences with the exact logarithm on S2, it is zero if and only if there is a geodesic g(t) such that g(τi) = γi for each i. Unfortunately, when we use the simplified logarithm and the ∆τi's are not all equal, the claim is not true anymore. The curve shown in this figure has maximum acceleration of 1.5 × 10−4 for a total length of 1.79 over 1 unit of time. The parameter values are: λ = 0, µ = 10.

Typically, when setting λ = 0 and some large value for µ, the optimization algorithm tends to "be satisfied" with any geodesic not too far from the data. To avoid this, observe that it is sufficient to find (optimize) the position of γ(ti) for each distinct data time ti. Hence, an efficient workaround is to optimize with Nd = N first (place the initial guess on the data), then successively reoptimize while increasing Nd by curve refinement, see algorithm 6. The first stage gives a nice initial guess for the optimization algorithm; we then need to refine the curve (increasing Nd, hence also the computational cost) to reduce the impact of the generalized logarithm on the final accuracy. This observation yields an effective, reliable manner of computing the geodesic regression.
[Figure 4.8 plot: sphere with fitted curve for λ = 10−5, µ = 10−6, with speed and acceleration profiles over time.]
Figure 4.8: When λ > 0 and µ > 0, the solution lives in a large family of curves. The acceleration profile is still continuous and looks piecewise differentiable, but we do not have an easy characterization of the family anymore (not even in the Euclidean case). The flat pieces at the end points on the acceleration plot were explained in the comments of figure 4.5. Notice how picking small positive values for both tuning parameters yields a near-interpolatory yet smooth-looking curve.
4.4.2 Numerical performances
Aside from the data volume, i.e., N, the computational cost depends on a number of parameters, namely:

• the core descent algorithm used,
• the step choosing algorithm,
• the quality of the initial guess,
• the number of discretization points Nd, and
• the parameters λ and µ.

The relationship between numerical performance and factors such as Nd, λ and µ is complex to describe. For the other factors, we give a few guidelines hereafter.

The core descent algorithms we introduced in chapter 3 can be anything from a steepest descent method to a Polak-Ribière or Fletcher-Reeves flavored CG. The difference between Fletcher-Reeves and Polak-Ribière is not, in general, significant. They usually both perform better than a steepest descent. Detailed algorithms to apply second order descent methods on manifolds, including trust region methods, are described in [AMS08]. We would like to try these in future work.

We select algorithm 5 as our step length choosing method. It is particularly well suited to our problem since, for small steps, the line-search function φ(α) = E(R_γ(αp)) can be (amazingly) well approximated by a quadratic. This comes from the fact that, in Rn × ... × Rn, our objective function is quadratic and Γ locally resembles a Euclidean space. Some alternative step choosing algorithms require the computation of φ′(α), the derivative of the line-search function. For the sake of completeness, we provide an explicit formula for it in appendix A.

Picking a smart initial guess is important to ensure convergence. Fortunately, numerical experiments tend to show that the attraction basin of the global minimum is surprisingly large (but not equal to Γ). Even picking γ0 at random often works. Here are a few choices we recommend depending on the scenario:

• For the data mean, choose γ0 = p̄/||p̄||, where p̄ is the arithmetic mean of the data in R3;
• For piecewise geodesic regression, choose the initial points γ0,i = pi;
• For geodesic regression, choose γ0,i = pi, optimize, then successively refine the curve and reoptimize;
• For mixed order regression, solve the problem in R3 (chapter 1) then project the solution back on the sphere, or do the same as for geodesic regression.

As a rule of thumb, it is a good idea to presolve the problem with low Nd. Typically, presolving with one discretization point γi for each distinct data time label ti yields good performances. Placing the initial curve γ0 on the data for this presolving step has never failed in our numerical experiments, regardless of the scenario.

The refinement step we refer to is highlighted in algorithm 6. It requires the possibility to compute the mean of two points on the manifold². The main advantage of this algorithm is that it lets us set a tolerance on the spacing between two discretization points without oversampling. In other words, it is fine to work with curves that alternate slow and fast segments: the resulting time sampling τ1, ..., τNd will, in general, be nonhomogeneous. The added advantage is that most of the optimization is done on fewer points than the final discretization quality.
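Since φ is nearly quadratic for small steps, a single quadratic interpolation through φ(0), φ′(0) and one trial value φ(α̂) already yields a good step length. A sketch of such a rule (our own minimal variant, not algorithm 5 verbatim):

```python
def quadratic_step(phi0, dphi0, alpha_hat, phi_hat):
    """Minimizer of the quadratic q with q(0) = phi0, q'(0) = dphi0 < 0
    and q(alpha_hat) = phi_hat. Here phi0 = E(gamma) and dphi0 = <grad E, p>."""
    c = (phi_hat - phi0 - dphi0 * alpha_hat) / alpha_hat ** 2
    if c <= 0:            # the quadratic model has no minimizer: keep the trial step
        return alpha_hat
    return -dphi0 / (2.0 * c)
```

When φ really is quadratic, this step is exact, which explains why very few line-search evaluations are needed in practice.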
² If that is overly complicated, one can always use γi or γi+1 as an approximation for the mean of these same points. It is then up to the optimization process to move the new points to the right places.
Algorithm 6: Iterative curve refinement strategy
input : an initial curve γ; tol, a refinement tolerance.
output: an optimized curve γ such that two successive points are separated by a distance less than tol.

continue := true
while continue do
    γ := optimize(γ)
    continue := false
    foreach i such that dist(γi, γi+1) > tol do
        insert mean(γi, γi+1) between them, at time (τi + τi+1)/2
        continue := true
    end
end
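Algorithm 6 can be sketched on S2 as follows (a minimal illustration of our own; `optimize` is left abstract and defaults to a no-op, and the midpoint of two non-antipodal points is the normalized chord midpoint, which on the sphere is the geodesic midpoint):

```python
import numpy as np

def sphere_dist(x, y):
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

def geo_midpoint(x, y):
    """Geodesic midpoint of two non-antipodal points of S2."""
    m = 0.5 * (x + y)
    return m / np.linalg.norm(m)

def refine(gamma, tau, tol, optimize=lambda g, t: g):
    """Iterative curve refinement (algorithm 6): re-optimize, then split every
    segment longer than tol at its midpoint, until no segment exceeds tol."""
    cont = True
    while cont:
        gamma = optimize(gamma, tau)
        cont = False
        i = 0
        while i < len(gamma) - 1:
            if sphere_dist(gamma[i], gamma[i + 1]) > tol:
                gamma.insert(i + 1, geo_midpoint(gamma[i], gamma[i + 1]))
                tau.insert(i + 1, 0.5 * (tau[i] + tau[i + 1]))
                cont = True
                i += 2          # skip the freshly inserted point
            else:
                i += 1
    return gamma, tau
```

Each outer pass at most halves the longest segment, so the loop terminates for any tol > 0.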
The CG algorithm performed really well on the piecewise geodesic regression (order 1) problems. Figure 4.9 shows the evolution of step length, gradient norm and objective value over the 15 descent iterations. Figure 4.10 shows the same thing with a steepest descent algorithm. CG clearly beats SD, but both exhibit a nice behavior.

Second order problems need more iterations. Furthermore, when µ has a high value (which is desirable when near geodesic regression is needed), the raw algorithm tends to get stuck with a close-to geodesic curve not too far away from the data. This stems from the fact that µEa overshadows Ed. To circumvent this, we use algorithm 6. The convergence behavior for the second case exposed in figure 4.5 is shown in figure 4.11. Only the pre-optimization step is shown, i.e., the placement of the γi's corresponding to the pi's. Subsequent steps (refinement + optimization) can be rapidly carried out (around 30 iterations). Fletcher-Reeves and Polak-Ribière perform about the same. The steepest descent method does as good a job as CG on the two first instances, and takes as much as five times more iterations on the close-to geodesic example for comparable precision.

Computing geodesic regressions is an interesting problem. As we showed before, it turns out to be the most challenging one for our algorithms. Figure 4.12 shows step length and gradient norm evolution over a huge number of iterations (1250) of the CG algorithm for the geodesic regression shown in figure 4.7. We have other algorithms in mind for geodesic regression; we will explore them in future work.
[Figure 4.9 plots: step length, gradient norm and objective function value versus iteration (15 iterations).]
Figure 4.9: Convergence of the Fletcher-Reeves CG algorithm on the most difficult case of figure 4.4, i.e., λ = 10−1. The behavior of the algorithm is excellent, and slightly better than with Polak-Ribière. The evolution of the norm of the gradient is typical of superlinear convergence. Notice how the objective value quickly reaches a plateau: to assess convergence, it is best to look at the gradient norm evolution.
[Figure 4.10 plots: step length, gradient norm and objective function value versus iteration (30 iterations).]
Figure 4.10: Convergence of the steepest descent algorithm on the same problem as figure 4.9. The behavior of the algorithm is good, but it needs more than twice as many iterations as its CG counterpart. The evolution of the norm of the gradient is typical of linear convergence.
[Figure 4.11 plots: step length, gradient norm and objective function value versus iteration (about ten iterations).]

Figure 4.11: Optimization of the γi's corresponding to the pi's for the second case displayed in figure 4.5, with initial guess for γ equal to the data. The parameter µ is set to 10 and the optimization is carried out by successive refinements. Convergence is achieved in about ten iterations. We then refine the curve by adding points on the geodesics joining the γi's and reoptimize in a few steps; in subsequent refinement steps, the points computed in this step will be slightly displaced. All our algorithms perform about the same, except for the close-to geodesic regression example, for which the CG algorithm performs much better.

[Figure 4.12 plots: step length and gradient norm over 1250 iterations.]

Figure 4.12: Optimization with CG for the problem shown in figure 4.7. After about 750 iterations, the algorithm reached a threshold, and numerical precision becomes an issue. The gradient norm was reduced by 8 orders of magnitude.
Chapter 5

Discrete curve fitting on positive-definite matrices

In the previous chapter, we built and analyzed a method to fit curves to data on the sphere. S2 was an ideal toy example to test our ideas. One of the main advantages of that manifold is that it is very easy to visualize the data and how the algorithms behave; intuition can get you a long way, and one might argue that the differential geometry background is expendable. But we need to work on more complicated manifolds to justify the complexity of our method. In this chapter, we propose to study curve fitting on a manifold that is both difficult to picture and of practical interest: the set of positive-definite matrices Pn+.

We first observe that, Pn+ being a convex set, we can interpolate between points by using piecewise linear interpolation. Pn+ is an open convex cone, and modern solvers can readily deal with the usual matrix norms. This does not give us the freedom to tune between fitting and smoothness, but it is nevertheless a very simple method that we need to compare our algorithms against. Next, we investigate an interesting metric introduced by Arsigny et al. in [AFPA08], called the Log-Euclidean metric. It endows Pn+ with a vector space structure, effectively enabling us to use the simple mathematics developed in the first chapter to solve our problem rapidly. Yet another alternative is to exploit the convex nature of our problem.

With the usual metric, i.e., the metric inherited from the embedding space, Pn+ is not complete¹, since it is an open subset of that space. Completeness is a desirable property, because our optimization algorithms produce sequences of points on Pn+ and we certainly want these sequences to converge in Pn+. A possible workaround is to use a metric that will deform the space so as to stretch the cone limits to infinity. One of the metrics that does that, later referred to as the affine-invariant metric, is used in this chapter, and the methodology developed in this document is applied to the resulting manifold. As we will see, computations are more involved, mainly because of the need for derivatives of matrix functions. We conclude this chapter by testing all described methods and comparing them against each other.

¹ Quoting [Wei10], a complete metric space is a metric space in which every Cauchy sequence is convergent. In this definition, the metric refers to the Riemannian distance induced by the Riemannian metric.
5.1 Pn+ geometric toolbox

We note Hn the set of symmetric matrices of size n and Pn+ ⊂ Hn the set of positive-definite matrices of size n. We use the notation A ∈ Pn+ ⇔ A ≻ 0. The embedding space Hn is endowed with the usual metric

⟨·,·⟩ : Hn × Hn → R : (H1, H2) → ⟨H1, H2⟩ = trace(H1 H2) = trace(H2 H1).

The norm associated to this metric is the Frobenius norm, ||H|| = sqrt(⟨H, H⟩) = sqrt(trace(H²)) = sqrt(Σ_{i,j} H_ij²).

The set of positive-definite matrices is an open convex subset of Hn, hence it is not complete. Regardless of the metric, Pn+ being an open subset of Hn, the tangent space T_A Pn+ at any positive matrix A corresponds to the set Hn; we illustrate this on figure 5.1. Giving Pn+ a Riemannian manifold structure by restricting the above metric to Pn+ would thus yield a non-complete manifold. We can endow Pn+ with a different Riemannian metric in order to make the resulting manifold complete. One such metric is given in table 5.1; following Arsigny et al., we call it the affine-invariant metric, see [AFPA08, Bha07]. In loose terms, the metric described in table 5.1 stretches the limits of Pn+ to infinity. Since the affine-invariant metric is not the restriction of the metric on Hn to Pn+, Pn+ is not a Riemannian submanifold of Hn.

    Set:            Pn+ = {A ∈ Rn×n : A = A^T and x^T A x > 0 ∀x ∈ Rn, x ≠ 0}
    Tangent spaces: T_A Pn+ ≡ Hn = {H ∈ Rn×n : H = H^T}
    Inner product:  ⟨H1, H2⟩_A = ⟨A^{−1/2} H1 A^{−1/2}, A^{−1/2} H2 A^{−1/2}⟩
    Vector norm:    ||H||_A = sqrt(⟨H, H⟩_A)
    Distance:       dist(A, B) = ||log(A^{−1/2} B A^{−1/2})||
    Exponential:    Exp_A(H) = A^{1/2} exp(A^{−1/2} H A^{−1/2}) A^{1/2}
    Logarithm:      Log_A(B) = A^{1/2} log(A^{−1/2} B A^{−1/2}) A^{1/2}
    Mean:           mean(A, B) = A^{1/2} (A^{−1/2} B A^{−1/2})^{1/2} A^{1/2}

Table 5.1: Geometric toolbox for the Riemannian manifold Pn+ endowed with the affine-invariant metric.

Figure 5.1: Tangent vectors to an open subset E∗ of a vector space E. Figure courtesy of Absil et al. [AMS08].
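Table 5.1 is straightforward to implement once functions of symmetric matrices are computed through an eigendecomposition (a minimal NumPy sketch with naming of our own):

```python
import numpy as np

def sym_fun(A, f):
    """Apply a real function f to a symmetric matrix via eigendecomposition."""
    w, U = np.linalg.eigh(A)
    return (U * f(w)) @ U.T

def spd_dist(A, B):
    """Affine-invariant distance: || log(A^{-1/2} B A^{-1/2}) ||_F."""
    Aih = sym_fun(A, lambda w: w ** -0.5)   # A^{-1/2}
    return np.linalg.norm(sym_fun(Aih @ B @ Aih, np.log), 'fro')

def spd_exp(A, H):
    """Exp_A(H) = A^{1/2} exp(A^{-1/2} H A^{-1/2}) A^{1/2}."""
    Ah, Aih = sym_fun(A, np.sqrt), sym_fun(A, lambda w: w ** -0.5)
    return Ah @ sym_fun(Aih @ H @ Aih, np.exp) @ Ah

def spd_log(A, B):
    """Log_A(B) = A^{1/2} log(A^{-1/2} B A^{-1/2}) A^{1/2}."""
    Ah, Aih = sym_fun(A, np.sqrt), sym_fun(A, lambda w: w ** -0.5)
    return Ah @ sym_fun(Aih @ B @ Aih, np.log) @ Ah
```

One can check that spd_exp(A, spd_log(A, B)) recovers B, and that the distance is symmetric in its arguments, as it should be.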
It is instructive to particularize the exponential map to positive-definite matrices of size 1, i.e., positive numbers. Starting at A = 1 and moving along the tangent vector H = −1, we follow the curve Exp_A(tH) = exp(−t): we never reach nonpositive numbers for finite t.

We use the matrix exponential and logarithm. These are defined as series derived from the Taylor expansions of their scalar equivalents:

exp(A) = I + A + (1/2)A² + ... = Σ_{k=0}^{∞} (1/k!) A^k,    (5.1)

log(A) = (A − I) − (1/2)(A − I)² + (1/3)(A − I)³ − ... = Σ_{k=1}^{∞} ((−1)^{k+1}/k) (A − I)^k.    (5.2)

The series for exp is convergent for any square matrix A. The series for log is derived from the real series log(1 + x) = x − x²/2 + x³/3 − ..., which we know is convergent if |x| < 1 (and divergent if |x| > 1). We thus expect the matrix counterpart to be convergent when ρ(A − I) < 1, where ρ(X) is the spectral radius of X: ρ(X) = max(|λ1|, ..., |λn|), with λ1, ..., λn the eigenvalues of X. This is indeed the case. Furthermore, it is always possible, with s big enough, to bring the eigenvalues of A^{1/s} into a circle centered at 1 (in the complex plane) and with radius strictly smaller than 1. According to Higham, using the identity log(A) = s log(A^{1/s}), the matrix A^{1/s} − I then has spectral radius less than 1 and the Taylor series (5.2) can be used to define the principal logarithm of any matrix having no eigenvalue on R−, see [Hig08, problem 11.1]. In particular, the matrix logarithm is well defined as such over Pn+.

Every matrix H ∈ Hn can be diagonalized as H = U D U^T, with U^T U = I and D a diagonal matrix composed with the real eigenvalues λi of H. This yields easier formulas for the exponential:

exp(H) = Σ_{k=0}^{∞} (1/k!) (U D U^T)^k = U (Σ_{k=0}^{∞} (1/k!) D^k) U^T = U exp(D) U^T,

where exp(D) is the diagonal matrix whose entries are exp(λi). The same goes for the logarithm of a positive matrix A = U D U^T: log(A) = U log(D) U^T. Bhatia and Arsigny et al. show that the restriction

exp : Hn → Pn+ : H → exp(H)

is a diffeomorphism between the metric spaces Hn and Pn+. The inverse mapping is the restriction log : Pn+ → Hn, see [Bha07, AFPA08].

5.2 Curve space and objective function

The curve space on Pn+ is given by

Γ = Pn+ × ... × Pn+ = (Pn+)^{Nd}.

Each of the Nd points γi composing a curve γ is a positive matrix. The tools shown in table 5.1 can be used on Γ by component wise extension. The objective function E is defined as:

E : Γ → R : γ → E(γ) = Ed(γ) + λ Ev(γ) + µ Ea(γ),    (5.3)
To this end. d d ∆τNd −1 ∆τNd −2 ∆τNd −2 (∆τNd −1 + ∆τNd −2 ) a1 = 5. . γsi ) i=1 Nd −1 i=1 1 dist2 (γi . We derive a generic way of computing derivatives of functions of symmetric matrices. we need the following result: D(X → X k )(A)[H] = lim Then. . appearing in the distance function and the logarithmic map deﬁned in table 5. This is to be compared to the deﬁnition given in section 4.4) The pi ’s are positivedeﬁnite matrices. Generalized to matrices. λn ). γi+1 ) ∆τi Nd −1 2 γ1 1 ∆τ1 a1 2 2 + i=2 ∆τi−1 + ∆τi ai 2 2 γi + ∆τNd −1 aN d 2 2 γN d (5. . We would like to compute Df (A)[H]. the directional derivative of f at A in the direction H ∈ Hn .1. . This requires formulas for the derivatives of the matrix logarithm. ∞ k (A + hH)k − Ak = h→0 h k Al−1 HAk−l l=1 Df (A)[H] = k=1 ak l=1 ∞ Al−1 HAk−l k =U k=1 ak Dl−1 U T HU Dk−l U T l=1 T = U · Df (D)[U HU ] · U T ˜ which is a simpler object to compute because D is diagonal. We diﬀerentiate the series term by term. The way we derive the material in this section is inspired by [FPAA07. f (λn )) U T .52 CHAPTER 5. .2. Let f be a smooth. . ∆τi−1 + ∆τi ∆τi ∆τi−1 −2 2 aN d = LogγN (γNd −1 ) + LogγN (γNd −2 ) . ∆τ1 ∆τ2 ∆τ2 (∆τ1 + ∆τ2 ) 2 1 1 ai = Logγi (γi+1 ) + Logγi (γi−1 ) . real function deﬁned by a Taylor series ∞ f (x) = k=0 ak xk . Let A ∈ Hn . such that A = U DU T . Nd − 1. The acceleration vectors ai ∈ Tγi Pn ≡ Hn are symmetric matrices deﬁned by + 2 −2 Logγ1 (γ2 ) + Logγ1 (γ3 ) . DISCRETE CURVE FITTING ON POSITIVEDEFINITE MATRICES 1 2 1 2 N Ed (γ) = Ev (γ) = Ea (γ) = wi dist2 (pi . . Let us write H = U T HU and . ∀i ∈ 2 . this yields ∞ f (A) = k=0 ak Ak = U diag (f (λ1 ). . U an orthogonal matrix and D = diag (λ1 .3 Gradient of the objective We now compute the gradient of E as deﬁned in the previous section. . appendix] and [Bha07]. .
Entrywise, we get:

    M̃_ij = Σ_{k=1}^∞ a_k Σ_{l=1}^k (D^{l−1} H̃ D^{k−l})_ij = H̃_ij Σ_{k=1}^∞ a_k Σ_{l=1}^k λ_i^{l−1} λ_j^{k−l}.

Using the identity Σ_{l=1}^k x^l = x (1 − x^k)/(1 − x), valid for x ≠ 1, we distinguish two cases:

    Σ_{l=1}^k λ_i^{l−1} λ_j^{k−l} = (λ_j^k / λ_i) Σ_{l=1}^k (λ_i/λ_j)^l = (λ_i^k − λ_j^k)/(λ_i − λ_j)   if λ_i ≠ λ_j,
                                  = k λ_i^{k−1}                                                        if λ_i = λ_j.

Hence:

    M̃_ij = H̃_ij f̃(λ_i, λ_j),   with   f̃(λ_i, λ_j) = (f(λ_i) − f(λ_j))/(λ_i − λ_j)   if λ_i ≠ λ_j,
                                         f̃(λ_i, λ_j) = f'(λ_i)                          if λ_i = λ_j.

The coefficients f̃(λ_i, λ_j) are termed the first divided differences by Bhatia, see [Bha07, p. 60]. We build the matrix F̃ such that F̃_ij = f̃(λ_i, λ_j). Then, M̃ = H̃ ∘ F̃, where ∘ stands for entrywise multiplication (Hadamard's product, also known as Schur's product).

Setting f = log, we explicitly compute D log(A)[H], A ≻ 0, as follows:

1. Diagonalize A: A = U D U^T, U orthogonal, D = diag(λ_1, …, λ_n).
2. Compute H̃ = U^T H U.
3. Compute F̃ (first divided differences of log based on the λ_i's).
4. Compute M̃ = H̃ ∘ F̃. Then, D log(A)[H] = U M̃ U^T.

By construction, if H = H^T, then D log(A)[H] is symmetric.

Remark 5.1. As one can see from this algorithm, it is not critical that H be symmetric nor that A be positive-definite: diagonalizability is sufficient. We then use U^{−1} instead of U^T. This works as long as A does not have eigenvalues on R^−.

By using these identities:

    D(f ∘ g)(X)[H] = Df(g(X))[Dg(X)[H]]   (composition rule),
    D(X → ⟨f(X), g(X)⟩)(X)[H] = ⟨Df(X)[H], g(X)⟩ + ⟨f(X), Dg(X)[H]⟩   (product rule),
    D(X → AXB)(X)[H] = lim_{h→0} (A(X + hH)B − AXB)/h = AHB,

we can differentiate the objective. The objective function components E_d and E_v are linear combinations of squared distances. To compute their gradients, we need only derive an expression for grad f_A(X), with

    f_A(X) = (1/2) dist²(A, X) = (1/2) ‖log(A^{−1/2} X A^{−1/2})‖².
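Steps 1-4 can be sketched as follows (a minimal numpy sketch; `logdd` and `dlog` are our names, and the centered-difference comparison mirrors the validation strategy used throughout this chapter):

```python
import numpy as np

def spd_fun(A, fun):
    # apply a real function to a symmetric matrix via eigendecomposition
    lam, U = np.linalg.eigh(A)
    return (U * fun(lam)) @ U.T

def logdd(lam):
    # first divided differences of log at the eigenvalues lam:
    # (log(li) - log(lj)) / (li - lj), and 1/li on the diagonal cases
    L, R = np.meshgrid(lam, lam, indexing='ij')
    den = L - R
    safe = np.where(den == 0.0, 1.0, den)
    return np.where(den == 0.0, 1.0 / L, (np.log(L) - np.log(R)) / safe)

def dlog(A, H):
    # directional derivative D log(A)[H] for symmetric positive-definite A
    lam, U = np.linalg.eigh(A)      # 1. diagonalize A = U diag(lam) U^T
    Ht = U.T @ H @ U                # 2. rotate H
    M = Ht * logdd(lam)             # 3.-4. Hadamard product with F~
    return U @ M @ U.T
```

A centered difference of the matrix logarithm along H then agrees with `dlog(A, H)` to second order in the step size.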
Applying these rules to f_A, we work out the following formula:

    Df_A(X)[H] = ⟨log(A^{−1/2} X A^{−1/2}), D log(A^{−1/2} X A^{−1/2})[A^{−1/2} H A^{−1/2}]⟩.

By definition 2.4, the gradient grad f_A(X) is the unique symmetric matrix verifying, ∀H ∈ H_n,

    Df_A(X)[H] = ⟨grad f_A(X), H⟩_X.

Considering ⟨·,·⟩_X, and given H̄_1, …, H̄_{n(n+1)/2}, an orthonormal basis of H_n endowed with the inner product ⟨H̄_i, H̄_j⟩ = δ_ij, where δ_ij is the Kronecker delta, we can compute an orthonormal basis of H_n endowed with the inner product ⟨·,·⟩_X as H_i = X^{1/2} H̄_i X^{1/2}, since

    ⟨H_i, H_j⟩_X = ⟨X^{−1/2} H_i X^{−1/2}, X^{−1/2} H_j X^{−1/2}⟩ = ⟨H̄_i, H̄_j⟩ = δ_ij.

We thus compute the gradient of f_A at X as follows:

    grad f_A(X) = Σ_{k=1}^{n(n+1)/2} Df_A(X)[H_k] H_k = Σ_{k=1}^{n(n+1)/2} Df_A(X)[X^{1/2} H̄_k X^{1/2}] X^{1/2} H̄_k X^{1/2}.

Regretfully, this method requires n(n+1)/2 computations of D log. We can do better.

Remembering that for square matrices trace(AB) = trace(BA), and with X = U D U^T:

    D(X → (1/2)‖log(X)‖²)(X)[H] = ⟨D log(X)[H], log(X)⟩
        = trace(U (H̃ ∘ F̃) U^T · U log(D) U^T)
        = trace((H̃ ∘ F̃) log(D))
        = Σ_{i=1}^n log(λ_i) H̃_ii F̃_ii
        = Σ_{i=1}^n log(λ_i) λ_i^{−1} H̃_ii
        = trace(D^{−1} log(D) H̃)
        = trace(U^T U D^{−1} U^T U log(D) U^T U H U^T U)
        = trace(X^{−1} log(X) H) = ⟨X^{−1} log(X), H⟩.

Using this result together with the composition rule:

    Df_A(X)[H] = ⟨A^{1/2} X^{−1} A^{1/2} log(A^{−1/2} X A^{−1/2}), A^{−1/2} H A^{−1/2}⟩
               = ⟨X^{−1} log(X A^{−1}), H⟩
               = ⟨log(X A^{−1}) X, H⟩_X,

where we used A log(B) A^{−1} = log(ABA^{−1}). Hence,

    grad f_A(X) = log(X A^{−1}) X.

This is a symmetric matrix. Indeed,

    log(X A^{−1}) X = log(X^{1/2} X^{1/2} A^{−1} X^{1/2} X^{−1/2}) X = X^{1/2} log(X^{1/2} A^{−1} X^{1/2}) X^{1/2}.

Collecting the relevant equations yields an explicit algorithm to compute grad f_A(X).
From this equation, we also see that grad f_A(B) = −Log_B(A), which makes perfect sense: to increase the distance between A and B as fast as possible with an infinitesimal change, B ought to move away from A.

[Figure: log-log plot of the error against h, for h between 10^{−10} and 10^{−1}; the error spans roughly 10^{−12} to 10^{−4}.]

Figure 5.2: Comparison of (f_A(Exp_X(hH)) − f_A(Exp_X(−hH)))/(2h) with ⟨grad f_A(X), H⟩_X (order of magnitude: 1) for some fixed A, X, H, plotting

    error = 100 · | (f_A(Exp_X(hH)) − f_A(Exp_X(−hH)))/(2h) − ⟨log(XA^{−1})X, H⟩_X |.

The slope of the straight segment is 2. Typically, the breaking point occurs for h close to 10^{−5}, due to numerical accuracy. This plot gives us confidence in the formula derived for grad f_A(X). Similar graphs have been generated to check every gradient formula given in this document, including grad E itself. This figure provides further evidence that our formula for grad f_A(X) is correct.

Deriving the gradient for E_a is more cumbersome. We construct an explicit formula for it in appendix B. We still highlight the important points here. First, we needed to prove the following identity, valid for any three square matrices A, B, C of size n:

    ⟨A ∘ B, C⟩ = trace((A ∘ B) C) = Σ_{i=1}^n Σ_{j=1}^n A_ij B_ij C_ji = Σ_{i=1}^n Σ_{j=1}^n A_ij (B^T)_ji (C^T)_ij = ⟨A ∘ C^T, B^T⟩.    (5.5)

We also needed to prove the following propositions.

Proposition 5.2. By the symmetry f_A(B) = f_B(A) = (1/2) dist²(A, B),

    grad(B → (1/2) dist²(A, B))(B) = grad f_A(B) = log(BA^{−1}) B,
    grad(A → (1/2) dist²(A, B))(A) = grad f_B(A) = log(AB^{−1}) A.

Proposition 5.3. Let A, B ∈ P_+^n. Then AB^{−1} has real, positive eigenvalues and is diagonalizable.

Proof. We note Λ(A) the spectrum of A, i.e., the set of eigenvalues of A. We know that, for any invertible matrix S, Λ(SAS^{−1}) = Λ(A). Indeed: 1 = det(I) = det(SS^{−1}) = det(S) det(S^{−1}), and

    det(SAS^{−1} − λI) = det(S(A − λI)S^{−1}) = det(S) det(A − λI) det(S^{−1}) = det(A − λI).
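The experiment behind figure 5.2 can be reproduced in a few lines. This sketch uses Euclidean perturbations X ± hH instead of the exponential map, which suffices for a directional-derivative check (helper names are ours):

```python
import numpy as np

def spd_fun(A, fun):
    # apply a real function to a symmetric positive-definite matrix
    lam, U = np.linalg.eigh(A)
    return (U * fun(lam)) @ U.T

def f(A, X):
    # f_A(X) = (1/2) || log(A^{-1/2} X A^{-1/2}) ||_F^2
    Aih = spd_fun(A, lambda l: 1.0 / np.sqrt(l))
    L = spd_fun(Aih @ X @ Aih, np.log)
    return 0.5 * np.sum(L * L)

def grad_f(A, X):
    # grad f_A(X) = log(X A^{-1}) X = X^{1/2} log(X^{1/2} A^{-1} X^{1/2}) X^{1/2}
    Xh = spd_fun(X, np.sqrt)
    Ai = spd_fun(A, lambda l: 1.0 / l)
    return Xh @ spd_fun(Xh @ Ai @ Xh, np.log) @ Xh
```

A centered difference of f along a symmetric H should then match the pairing ⟨grad f_A(X), H⟩_X = trace(X^{−1} · grad f_A(X) · X^{−1} · H).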
Then,

    Λ(AB^{−1}) = Λ(A^{−1/2} · AB^{−1} · A^{1/2}) = Λ(A^{1/2} B^{−1} A^{1/2}) = Λ(A^{1/2} B^{−1/2} B^{−1/2} A^{1/2}) = Λ((B^{−1/2} A^{1/2})^T (B^{−1/2} A^{1/2})).

Setting X = B^{−1/2} A^{1/2}, we note that any symmetric matrix of the form X^T X with ker X = {0} is positive-definite, since ∀x ∈ R^n, x ≠ 0, x^T X^T X x = ‖Xx‖² > 0. Hence, AB^{−1} has the spectrum of a positive-definite matrix. Furthermore, AB^{−1} is diagonalizable. We diagonalize the symmetric matrix A^{1/2} B^{−1} A^{1/2} = U D U^T, with U orthogonal and D diagonal, and define V = U^T A^{−1/2}. Then, V AB^{−1} V^{−1} = U^T A^{1/2} B^{−1} A^{1/2} U = D is diagonal, hence AB^{−1} is diagonalizable. ∎

The joint material of this section and of appendix B is sufficient to implement direct algorithms to compute the gradient of E.

5.4 Alternative I: linear interpolation

P_+^n is a convex set, since ∀x ∈ R^n, x ≠ 0,

    x^T (tA + (1 − t)B) x = t x^T A x + (1 − t) x^T B x > 0,

hence tA + (1 − t)B ∈ P_+^n for all t ∈ [0, 1]. Consequently, to interpolate between two points p_1, p_2 ∈ P_+^n, we can simply interpolate, linearly, between the (real) entries of p_1 and p_2. The curve space Γ is convex too. This can be done with the simple tools from chapter 1. Of course, we cannot use linear interpolation to solve the generic problem of smooth regression with more than two data points. This alternative to our geometric approach is so simple that we have to consider it. In section 5.7, we compare our main results for piecewise geodesic interpolation to this alternative.

5.5 Alternative II: convex programming

When we endowed P_+^n with the affine-invariant metric, we "stretched" the limits of the (open) set P_+^n to infinity, making the resulting metric space complete. The Log-Euclidean metric, which we introduce in the next section, has a similar stretching effect. We can sometimes solve the problem without such deformations. Using the metric ⟨·,·⟩ defined on H_n and the associated norm ‖·‖ (we use the Frobenius norm for the figures shown in section 5.7), P_+^n is now seen as a Riemannian submanifold of H_n. The acceleration vectors are defined using finite differences in the ambient space H_n.

A natural objective for the regression problem is given by equations (1.3) through (1.6). This objective is convex. Standard solvers for semidefinite programming, like SeDuMi or SDPT3 [Stu98, TTT99], can solve a relaxed version of our problem: minimize E(γ) such that each γ_i is positive semidefinite, γ_i ⪰ 0. This ensures that solutions of the relaxed regression problem are made of matrices γ_i ⪰ 0. If, at optimality, none of the γ_i ⪰ 0 constraints is active (i.e., γ_i ≻ 0 ∀i), the optimal solution corresponds to regression in H_n. If, however, one of the constraints is active, there is, a priori, no way to convert the solution of the relaxed problem into a solution for our original problem. The original problem might even not have a solution, specifically because Γ is not complete. We exhibit this in figure 5.5.

Among the advantages of convex programming, we note that:
• efficient, robust algorithms are available, and provide optimality certificates;
• we can use 1, ∞ and Frobenius norms over H_n;
• we need only provide the problem description (no gradients);
• it is easy to add convex constraints.

The usage of such solvers is made easy via modeling languages such as Yalmip [Löf04] and CVX [GB10]. We use the latter. In section 5.7, we illustrate our results.
5.6 Alternative III: vector space structure

We mentioned in section 5.1 that the restricted matrix exponential

    exp : H_n → P_+^n : H → exp(H)

is a smooth diffeomorphism. The inverse mapping is the (principal) matrix logarithm. Following [AFPA08], this can be used to endow P_+^n with a vector space structure. We first introduce new summation and scaling operators:

    ⊕ : P_+^n × P_+^n → P_+^n : (A, B) → A ⊕ B = exp(log(A) + log(B)),
    ⊗ : R × P_+^n → P_+^n : (α, A) → α ⊗ A = exp(α log(A)) = A^α.

By construction,

    exp : (H_n, +, ·) → (P_+^n, ⊕, ⊗)

is a vector space isomorphism. Arsigny et al. use this to endow P_+^n with the metric described in table 5.2, termed the Log-Euclidean metric, see [AFPA08]. The vector space structure simplifies a number of problems. For example, the geodesic between A and B is γ(t) = exp(t log(A) + (1 − t) log(B)).

Table 5.2: Geometric toolbox for the Riemannian manifold P_+^n endowed with the Log-Euclidean metric.
    Set:             P_+^n = {A ∈ R^{n×n} : A = A^T and x^T A x > 0 ∀x ∈ R^n, x ≠ 0}
    Tangent spaces:  T_A P_+^n ≡ H_n = {H ∈ R^{n×n} : H = H^T}
    Inner product:   ⟨H_1, H_2⟩_A = ⟨D log(A)[H_1], D log(A)[H_2]⟩
    Vector norm:     ‖H‖_A = √⟨H, H⟩_A
    Distance:        dist(A, B) = ‖log(B) − log(A)‖
    Exponential:     Exp_A(H) = exp(log(A) + D log(A)[H])
    Logarithm:       Log_A(B) = D exp(log(A))[log(B) − log(A)]
    Mean:            mean(A, B) = exp(0.5 (log(A) + log(B)))

The affine-invariant and Log-Euclidean distances, seen as functions from P_+^n × P_+^n to R^+, where P_+^n is understood as a convex subset of the vector space H_n, are not convex. Indeed, for n = 1, P_+^n is the set of positive real numbers, such that dist²(x, y) = (log(y) − log(x))². The Hessian of this function at (x, y) = (1, 2) has a negative eigenvalue, hence it is not convex. Consequently, convex programming cannot be used to solve the regression problem with the affine-invariant and Log-Euclidean metrics.

Let A, B ∈ P_+^n be commuting matrices, i.e., AB = BA. Then, the two metrics are equal. Indeed, there exists U, orthogonal, such that A = U D_A U^T and B = U D_B U^T, with D_A and D_B diagonal.² Using commutativity of diagonal matrices:

    ‖log(B) − log(A)‖² = ‖U (log(D_B) − log(D_A)) U^T‖²
                       = ‖U log(D_B D_A^{−1}) U^T‖²
                       = ‖U log(D_A^{−1/2} D_B D_A^{−1/2}) U^T‖²
                       = ‖U log(U^T A^{−1/2} U U^T B U U^T A^{−1/2} U) U^T‖²
                       = ‖log(A^{−1/2} B A^{−1/2})‖².

² In [Bha07], Bhatia states that it is sufficient to have A, B ∈ H_n for this to be true, see [Bha07, p. 23].
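The equality of the two distances for commuting matrices is easy to check numerically, for instance on diagonal matrices (a minimal sketch with our own helper names):

```python
import numpy as np

def sym_fun(S, fun):
    # apply a real function to a symmetric matrix via eigendecomposition
    lam, U = np.linalg.eigh(S)
    return (U * fun(lam)) @ U.T

def dist_le(A, B):
    # Log-Euclidean distance: || log(B) - log(A) ||_F
    return np.linalg.norm(sym_fun(B, np.log) - sym_fun(A, np.log))

def dist_ai(A, B):
    # affine-invariant distance: || log(A^{-1/2} B A^{-1/2}) ||_F
    Aih = sym_fun(A, lambda l: 1.0 / np.sqrt(l))
    return np.linalg.norm(sym_fun(Aih @ B @ Aih, np.log))

def mean_le(A, B):
    # Log-Euclidean mean: exp((log A + log B) / 2)
    return sym_fun(0.5 * (sym_fun(A, np.log) + sym_fun(B, np.log)), np.exp)
```

For diagonal A and B, `mean_le` reduces to the entrywise geometric mean of the diagonals, in line with the interpretation of table 5.2.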
This proves that, for commuting matrices, the affine-invariant and the Log-Euclidean metrics are identical. In particular, this is true when the matrices are diagonal. Consequently, when the data points p_i are commuting matrices, it is computationally advantageous to solve the problem with the LE metric. Because of this, in the following, we always use the LE metric solution as our initial guess for the affine-invariant metric optimization.

We solve the smooth regression problem on P_+^n endowed with the Log-Euclidean metric as follows:
1. Compute the new data points p̃_i = log(p_i) ∈ H_n for each original data point p_i ∈ P_+^n.
2. Compute γ̃ ∈ H_n × ⋯ × H_n, the solution of the regression problem in H_n with the new data points.
3. Compute γ, the solution of the original problem, as γ_i = exp(γ̃_i).

Step 2 can be carried out by solving a regression problem in R^{n(n+1)/2} as described in chapter 1. Thanks to the vector space structure, we need not go through the hassle of computing gradients like we did in section 5.3: we achieve guaranteed minimization of the objective by solving sparse, structured linear systems. Using modern QP solvers, it is also easy to add constraints to the original problem in the form of linear equalities and inequalities. Linear matrix inequalities can be added using semidefinite programming solvers.

For the affine-invariant metric, we use the CG algorithm with the following naive vector transport:

    T_η(ξ) = ξ.

This means that we simply compare tangent vectors belonging to different tangent spaces as is. We expect this to work because our descent algorithms usually make small steps: because of the smoothness of the inner product on Riemannian manifolds, close tangent spaces from P_+^n "resemble each other". In practice, we observe that the CG algorithm with this vector transport performs about as well as the steepest descent method, and sometimes much better.

We propose an alternative, theoretically more attractive, vector transport, based on the following vector transport on P_+^n, with H ∈ T_A P_+^n:

    T_{Log_A(B)}(H) = B^{1/2} A^{−1/2} H A^{−1/2} B^{1/2} ∈ T_B P_+^n.

It has the nice property that it preserves inner products: for any two tangent vectors H_1, H_2 ∈ T_A P_+^n, noting H̄_i = T_{Log_A(B)}(H_i) ∈ T_B P_+^n, we have:

    ⟨H̄_1, H̄_2⟩_B = ⟨B^{−1/2} H̄_1 B^{−1/2}, B^{−1/2} H̄_2 B^{−1/2}⟩ = ⟨A^{−1/2} H_1 A^{−1/2}, A^{−1/2} H_2 A^{−1/2}⟩ = ⟨H_1, H_2⟩_A.

Surprisingly, the CG method with this vector transport extended to Γ showed significantly poorer performances on the tests we ran.

5.7 Results and comments

The solutions to our regression problem on P_+^n can be categorized in the same way we did in section 4.4 for the sphere. In this section, we show plots for different scenarios. We compare linear interpolation, convex programming, the Log-Euclidean metric and the affine-invariant metric. In many special cases, some of these methods yield the same solutions. Positive-definite matrices of size n = 2 are represented as ellipses. The axis orientations of these ellipses correspond to the eigenvector directions and the axis lengths to the corresponding eigenvalues.
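Steps 1-3 above can be sketched for the simplest case, fitting a single Log-Euclidean geodesic through the data by least squares on the log-data (a minimal sketch; the closed-form line fit stands in for the chapter 1 machinery, and the function names are ours):

```python
import numpy as np

def sym_fun(S, fun):
    lam, U = np.linalg.eigh(S)
    return (U * fun(lam)) @ U.T

def le_geodesic_fit(ps, ts):
    Hs = np.array([sym_fun(p, np.log) for p in ps])       # step 1: log-data
    T = np.stack([np.ones(len(ts)), ts], axis=1)          # step 2: linear LS
    coef, *_ = np.linalg.lstsq(T, Hs.reshape(len(ts), -1), rcond=None)
    B0, B1 = (c.reshape(ps[0].shape) for c in coef)
    return lambda t: sym_fun(B0 + t * B1, np.exp)         # step 3: exp back
```

If the data lies exactly on a Log-Euclidean geodesic, the fit reproduces it exactly.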
Zeroth order. The weighted mean p̄ of positive-definite matrices p_1, …, p_N can be defined as

    p̄ = arg min_{X ∈ P_+^n} ( Σ_{i=1}^N w_i f_X(p_i) ) / ( Σ_{i=1}^N w_i ).    (5.6)

Recall that f_X(A) = (1/2) dist²(X, A). In [Bha07], Bhatia proves that the optimization problem (5.6) is strictly convex in some sense. This is a generalization of the geometric mean for positive reals. When the p_i's commute, we simply have

    p̄ = exp( ( Σ_{i=1}^N w_i log(p_i) ) / ( Σ_{i=1}^N w_i ) ).

Otherwise, we use the Log-Euclidean mean as an initial guess and apply one of our minimization algorithms to the expression (5.6). Typically, convergence (i.e., ‖grad E‖ < 10^{−9}) with 250 random matrices of size 5 is achieved with any of our algorithms in three steps or less.

First order. Setting λ > 0 and µ = 0 produces piecewise geodesic regressions. The geodesic between A and B has a closed-form expression for the affine-invariant metric:

    γ_{A,B} : [0, 1] → P_+^n : t → γ_{A,B}(t) = Exp_A(t Log_A(B)) = A^{1/2} exp( t log(A^{−1/2} B A^{−1/2}) ) A^{1/2}.    (5.7)

Figure 5.3 compares solutions obtained with our main method and the three mentioned alternatives. Figure 5.4 shows the behavior of the CG and the steepest descent methods. Both converge rapidly, because we use the exact logarithmic map on P_+^n (see our discussion in subsection 4.4.1). The CG algorithm, despite our poor choice of vector transport, performs much better than the steepest descent method: typically, SD needed 3.5 times as many iterations for equal quality (judged based on the gradient norm). The refinement technique, again, cut down the number of needed iterations by about 20% for the example on figure 5.3, compared to using the Log-Euclidean solution as initial guess directly. Computing the solution for the affine-invariant metric remains computationally more intensive than the alternatives.

For geodesic regression (high µ, zero λ), we simply compute the γ_i's at the data time labels t_i. This drastically decreases the computational cost. We show an example in figure 5.6.

Second order. Setting λ = 0 and µ > 0 yields spline-like curves. Figure 5.7 shows an example for the four solving techniques we discussed. Many of the remarks we made in section 4.4 for the sphere still hold. Geodesics built with the deforming metrics can be indefinitely extended on both ends, whereas the regression built with convex programming would, eventually, reach a singular matrix. Notice also how the curves obtained with the deforming metrics pass through thinner matrices than the others. This effect has already been observed by Arsigny et al., see [AFPA08]. For the affine-invariant metric, convergence is slow, as figure 5.8 demonstrates.
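Equation (5.7) is straightforward to implement; the endpoints and the symmetry of the metric midpoint (the geometric mean of A and B) are easy to verify (helper names are ours):

```python
import numpy as np

def sym_fun(S, fun):
    lam, U = np.linalg.eigh(S)
    return (U * fun(lam)) @ U.T

def ai_geodesic(A, B, t):
    # gamma_{A,B}(t) = A^{1/2} exp(t log(A^{-1/2} B A^{-1/2})) A^{1/2}
    Ah = sym_fun(A, np.sqrt)
    Aih = sym_fun(A, lambda l: 1.0 / np.sqrt(l))
    L = sym_fun(Aih @ B @ Aih, np.log)
    return Ah @ sym_fun(t * L, np.exp) @ Ah
```

At t = 1/2 this curve passes through the geometric mean, which does not depend on the order of A and B.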
[Figure: five rows of ellipses, labelled data, linear, convex, Log-Euclidean, affine-invariant.]

Figure 5.3: λ = 10^{−1}, µ = 0. The data consists in three positive-definite, non-commuting matrices shown on the first line. The second line shows linear interpolation between the data. The third line is obtained by solving the problem with SDPT3, using CVX; when λ goes to zero, this goes to linear interpolation. The fourth line is computed by solving a linear least-squares problem on the log(p_i)'s and going back on P_+^n via the matrix exponential. For the fifth line, we use the Log-Euclidean solution at the three data times t_i as initial guess for the breaking points, i.e., we search for just three positive-definite matrices, and we apply a minimization algorithm starting with them. This is still optimal: the important point here is that we need only compute the γ_i's at the data times (the breaking points). The additional intermediate points are computed with equation (5.7). Computation times for the three last lines are comparable.

[Figure: three panels plotting the objective function value (from 0.57 down to 0.56), the gradient norm (from 10^{−1} down to 10^{−9}) and the step length (from 10^{−1} down to 10^{−7}) against the iteration count, 1 to 16.]

Figure 5.4: These plots show the behavior of the minimization process for the affine-invariant metric solution shown in figure 5.3. Both the geometric steepest descent method (blue) and the nonlinear geometric CG method (red) perform really well.
[Figure: four data points and two regression curves in the (λ_1, λ_2) eigenvalue plane, shown in linear scale (left) and logarithmic scale (right).]

Figure 5.5: λ = 0, µ = 10^{−4}. The four data points are 2×2 positive-definite diagonal matrices, pictured as black circles, placed in the eigenvalue plane in linear scale (left) and logarithmic scale (right). The blue curve shows regression with convex programming (metric inherited from H_n). Notice how the blue curve passes through almost singular matrices. The actual solution most likely passes through singular matrices: SDPT3, our semidefinite programming solver, claims optimality (according to the default tolerances), but SDPT3 uses an interior point method, hence constraints are never tight. The fact that the optimal curve leaves P_+^n is a main argument against convex programming in this setting. Since the data matrices commute, the solutions with the affine-invariant and the Log-Euclidean metrics are the same. The red curve shows regression with these metrics; the curve displayed has maximum acceleration of 10^{−3} for a total length of 3.3 traveled in one time unit.

[Figure: five rows of ellipses, labelled data, linear, convex, Log-Euclidean, affine-invariant.]

Figure 5.6: λ = 0, µ = 10^3. An example of geodesic regression across three data points with our three solving techniques and linear interpolation (second line). Convex and Log-Euclidean metric solutions (third and fourth lines) are exact (up to numerical accuracy) and quick to compute. The affine-invariant metric solution (bottom line) is reasonably quick to compute.
[Figure: five rows of ellipses, labelled data, linear, convex, Log-Euclidean, affine-invariant.]

Figure 5.7: λ = 0, µ = 10^{−3}. The three methods of interest (bottom lines) give sensible results. Which one to use in practice is application-dependent. Of course, piecewise linear interpolation is not acceptable when spline-like curves are required.

[Figure: two panels plotting the gradient norm (10^{−1} down to 10^{−7}) and the step length (10^{−3} down to 10^{−9}) against the iteration count, 0 to 2400.]

Figure 5.8: These plots show the convergence behavior for figure 5.7 with the CG algorithm and the successive refinement technique. Notice how the CG algorithm exhibits a linear convergence rate. Convergence is excessively slow compared to alternatives II and III, which gave almost instantaneous results.
Conclusions and perspectives

In the present work, we discretized the problem of fitting smooth curves to data on manifolds (1.2) and we designed and implemented algorithms to solve the discrete problem (3.6). To this end, we introduced the concept of geometric finite differences, which fill the classic role of finite differences on manifolds. We also introduced a similar concept for the logarithmic map, which we here call generalized logarithms. We computed explicit formulas for the objective (3.6) and its gradient on R^n, S², P_+^n and SO(3). The optimization schemes we used are various flavors of a geometric version of the nonlinear conjugate gradient scheme taken from [AMS08]. To alleviate the computational burden, we exploited the concepts of retraction and vector transport, borrowed from [AMS08], as proxies for the expensive exponential map and parallel transport.

Performances were shown to be excellent for first order problems (µ = 0): our optimization algorithms reach small values of the gradient in a few iterations. The more interesting case of second order regression (µ > 0) is solved significantly more slowly. The iterative refinement technique we introduced, although it has proven valuable, only partly helps. The attractive alternatives we proposed on P_+^n are further arguments pointing toward the need for faster, simpler algorithms. Two options we intend to pursue in future work to address these concerns are 1. the implementation of higher-order optimization schemes like, e.g., the geometric trust-region method exposed in [AMS08], and 2. the usage of automatic differentiation techniques or finite difference approximations of the derivatives of the objective.

Let us mention a few research directions we would like to investigate in future work.

• To apply our techniques to a manifold M, one needs a geometric toolbox for M. For all manifolds treated in this document, the algebra needed to obtain the gradient of (3.6) on matrix manifolds turned out to be quite involved. This adds to the intricacy of implementing the algorithms we designed. At some point, it would be interesting to build software that only asks the user for these elements: a geometric toolbox for M, possibly a vector transport, and the gradients of the functions f(A, B) = dist²(A, B) and g(A, B, C) = ⟨Log_A(B), Log_A(C)⟩_A with respect to all variables. This would enable fast prototyping and usefulness assessment. On a related note, for all manifolds treated in this document, we found that grad(A → f(A, B))(A) = −2 Log_A(B), which makes sense geometrically. We believe this is a general property, and we intend to work on a proof of this statement. As a consequence, for first order problems (µ = 0), no gradient formulas have to be computed at all. This observation partly alleviates the implementation burden for second order problems (µ > 0) too.

• Geodesic regression is a special case of interest in applications. A geodesic on M is completely specified by an element of the tangent bundle TM, that is: a point p ∈ M and a tangent vector ξ ∈ T_p M. We would like to equip the manifold TM with a toolbox based on the toolbox for M, so we could apply the algorithms described in this document to efficiently compute geodesic regressions.

• We believe that, generically, as max_i |τ_{i+1} − τ_i| → 0, the solution of the discrete problem may tend toward a sampled version of the solution of the continuous problem. Furthermore, we would like to verify that the breaking points appearing in optimal curves for λ > 0, µ = 0 in the continuous case coincide with the breaking points computed with our discrete approach.

• A typical strategy in (discrete-time) control is to compute the optimal commands to apply to the system up to some distant horizon in the future, then only apply the first computed command. A well-established strategy to tackle this problem is, e.g., Model Predictive Control. The criterion is usually expressed in terms of the predicted future states of the system, based on the proposed command. It is customary to also include a regularization term on the command in the objective function. In practical applications, the system is often subject to constraints. When the admissible set for the command is a manifold M, we really are looking for an optimal smooth curve on M, see [SASK09, Theorem 3.1].

• Other manifolds of practical interest are, e.g., the Grassmann (quotient) manifold and the (infinite-dimensional) shape space, with applications in computer graphics, e.g., object tracking. Smooth interpolation in the shape space corresponds to smooth morphing between given shapes. Investigating these applications is a natural extension of our work.

• The software written for this work³ is only a proof of concept. The rich structure of the problems we treat gives plenty of opportunities for efficient organization of the computations, which should drastically improve the performances. When our algorithms reach maturity, it will be useful to design and distribute efficient, usable software.

The positive results we obtained in the present work and the appealing questions left to answer strongly encourage us to further our investigations.

³ Downloadable here: http://www.inma.ucl.ac.be/~absil/Boumal/
Appendix A

Line search derivative on S²

We give an explicit formula for the derivative of the line search function φ : R → R appearing in the descent methods applied on the sphere S². The equivalent formula for the line search function on Γ = S² × ⋯ × S² is a component-wise extension of the one given below. With x ∈ S², d ∈ T_x S² and ‖d‖ = 1:

    R_x(αd) = (x + αd) / (1 + α²)^{1/2},
    (d/dα) R_x(αd) = (1/(1 + α²)^{3/2}) (d − αx) ∈ T_{R_x(αd)} S²,
    φ(α) = E(R_x(αd)),
    φ'(α) = ⟨grad E(R_x(αd)), (d/dα) R_x(αd)⟩ = (1/(1 + α²)^{3/2}) ⟨grad E(R_x(αd)), d − αx⟩,
    φ'(0) = ⟨grad E(x), d⟩.
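The formula above is easy to validate against a centered difference for any smooth scalar field; the quadratic E below is a hypothetical test field, not one from the thesis:

```python
import numpy as np

rng = np.random.default_rng(3)
Q = rng.standard_normal((3, 3)); Q = (Q + Q.T) / 2
E = lambda x: 0.5 * x @ Q @ x                       # hypothetical test field
gradE = lambda x: Q @ x - (x @ Q @ x) * x           # Riemannian gradient on S^2

x = rng.standard_normal(3); x /= np.linalg.norm(x)
d = rng.standard_normal(3); d -= (d @ x) * x; d /= np.linalg.norm(d)

R = lambda a: (x + a * d) / np.sqrt(1 + a * a)      # the retraction R_x(alpha d)
phi = lambda a: E(R(a))

a = 0.3
dphi = (1 + a * a) ** -1.5 * (gradE(R(a)) @ (d - a * x))
```

Since (d − αx)/(1 + α²)^{3/2} is tangent to the sphere at R_x(αd), pairing it with the Riemannian (projected) gradient gives the same value as pairing it with the Euclidean gradient.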
Appendix B

Gradient of E_a for P_+^n

In this appendix, we derive an explicit formula for the gradient of E_a from chapter 5. For i ∈ 2 … N_d − 1, with the appropriate constants r_1, r_2:

    (1/2) ‖a_i‖²_{γ_i} = (1/2) ‖r_1 Log_{γ_i}(γ_{i+1}) + r_2 Log_{γ_i}(γ_{i−1})‖²_{γ_i}
        = (1/2) ‖γ_i^{−1/2} ( r_1 Log_{γ_i}(γ_{i+1}) + r_2 Log_{γ_i}(γ_{i−1}) ) γ_i^{−1/2}‖²
        = (1/2) ‖r_1 log(γ_i^{−1/2} γ_{i+1} γ_i^{−1/2}) + r_2 log(γ_i^{−1/2} γ_{i−1} γ_i^{−1/2})‖²
        = r_1² f_{γ_i}(γ_{i+1}) + r_2² f_{γ_i}(γ_{i−1}) + r_1 r_2 ⟨log(γ_i^{−1/2} γ_{i+1} γ_i^{−1/2}), log(γ_i^{−1/2} γ_{i−1} γ_i^{−1/2})⟩.

We need to provide formulas for the gradient of the third term with respect to γ_i and γ_{i+1} (which is symmetric with γ_{i−1}). We introduce the real function g:

    g : (P_+^n)³ → R : (A, B, C) → g(A, B, C) = ⟨log(A^{−1/2} B A^{−1/2}), log(A^{−1/2} C A^{−1/2})⟩.

For the gradient with respect to B, we compute:

    D(B → g(A, B, C))(B)[H] = ⟨D log(A^{−1/2} B A^{−1/2})[A^{−1/2} H A^{−1/2}], log(A^{−1/2} C A^{−1/2})⟩.

Introducing A^{−1/2} B A^{−1/2} = U D U^T and F̃, the first divided differences of log w.r.t. the λ_i = D_ii, and writing H̃ = U^T A^{−1/2} H A^{−1/2} U, this becomes, using the identity (5.5):

    D(B → g(A, B, C))(B)[H] = ⟨U (H̃ ∘ F̃) U^T, log(A^{−1/2} C A^{−1/2})⟩
        = ⟨H̃ ∘ F̃, U^T log(A^{−1/2} C A^{−1/2}) U⟩
        = ⟨H̃, [U^T log(A^{−1/2} C A^{−1/2}) U] ∘ F̃⟩
        = ⟨A^{−1/2} U ([U^T log(A^{−1/2} C A^{−1/2}) U] ∘ F̃) U^T A^{−1/2}, H⟩.

Hence:

    grad(B → g(A, B, C))(B) = B A^{−1/2} U ([U^T log(A^{−1/2} C A^{−1/2}) U] ∘ F̃) U^T A^{−1/2} B.

We still need the gradient with respect to A. Using log(X) = −log(X^{−1}), we observe that:

    g(A, B, C) = ⟨log(A^{−1/2} B A^{−1/2}), log(A^{−1/2} C A^{−1/2})⟩
               = ⟨A^{−1/2} log(BA^{−1}) A^{1/2}, A^{−1/2} log(CA^{−1}) A^{1/2}⟩
               = ⟨log(BA^{−1}), log(CA^{−1})⟩
               = ⟨log(AB^{−1}), log(AC^{−1})⟩.

The matrices AB^{−1} and AC^{−1} are not, by construction, symmetric. They still have real, positive eigenvalues and are diagonalizable: we proved this statement in proposition 5.3. We introduce:

    AB^{−1} = U_B D_B U_B^{−1}   (diagonalize),
    AC^{−1} = U_C D_C U_C^{−1}   (diagonalize),
    F̃_B = first divided differences of log for λ_i = (D_B)_ii,
    F̃_C = first divided differences of log for λ_i = (D_C)_ii,
    Y_B = U_B^{−1} log(AC^{−1}) U_B = U_B^{−1} U_C log(D_C) U_C^{−1} U_B,
    Y_C = U_C^{−1} log(AB^{−1}) U_C = U_C^{−1} U_B log(D_B) U_B^{−1} U_C.

Then, similar algebra yields:

    grad(A → g(A, B, C))(A) = U_B D_B (Y_B ∘ F̃_B) U_B^{−1} A + U_C D_C (Y_C ∘ F̃_C) U_C^{−1} A.

Although it is not obvious from its expression, this matrix is, in general, symmetric. The equations presented in section 5.3 and in this appendix are sufficient to compute the gradient grad E(γ).
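The B-gradient formula can be checked numerically in the same way as the other gradients (a sketch with our own helper names; the Riemannian pairing ⟨G, H⟩_B = trace(B^{−1} G B^{−1} H) is compared against a centered difference):

```python
import numpy as np

def sym_fun(S, fun):
    lam, U = np.linalg.eigh(S)
    return (U * fun(lam)) @ U.T

def logdd(lam):
    # first divided differences of log at the eigenvalues lam
    L, R = np.meshgrid(lam, lam, indexing='ij')
    den = L - R
    safe = np.where(den == 0.0, 1.0, den)
    return np.where(den == 0.0, 1.0 / L, (np.log(L) - np.log(R)) / safe)

def g(A, B, C):
    # g(A, B, C) = < log(A^{-1/2} B A^{-1/2}), log(A^{-1/2} C A^{-1/2}) >
    Aih = sym_fun(A, lambda l: 1.0 / np.sqrt(l))
    LB = sym_fun(Aih @ B @ Aih, np.log)
    LC = sym_fun(Aih @ C @ Aih, np.log)
    return np.trace(LB @ LC)

def grad_g_B(A, B, C):
    # grad(B -> g)(B) = B A^{-1/2} U ([U^T log(A^{-1/2} C A^{-1/2}) U] o F~) U^T A^{-1/2} B
    Aih = sym_fun(A, lambda l: 1.0 / np.sqrt(l))
    lam, U = np.linalg.eigh(Aih @ B @ Aih)
    Q = U.T @ sym_fun(Aih @ C @ Aih, np.log) @ U
    W = Aih @ U @ (Q * logdd(lam)) @ U.T @ Aih
    return B @ W @ B
```

The same pattern, with `dlog` replaced by its non-symmetric variant, applies to the A-gradient.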
Appendix C

SO(3) geometric toolbox

Our algorithms and problem formulation from chapter 3 are well defined for any smooth, finite-dimensional Riemannian manifold. In practice, they can be applied to such manifolds only if the expressions for, notably, the logarithmic map and the geodesic distance are tractable. SO(3) (the special orthogonal group) is such a manifold of great practical interest. In this appendix, we give a full toolbox of formulas needed to apply the techniques described in this document to SO(3). Our reference for this appendix is [Bak00], an introduction to matrix Lie groups. The material in this appendix has been tested and works very well.

Practical applications include computer graphics, for example, where one could require a camera to show some 3D scene under different rotations at predetermined times. Typically, one would apply our algorithms with zero λ and low µ to achieve smooth interpolation, with smooth transitions between rotations. The proposed approach yields considerably smoother looking transitions than could be obtained with piecewise geodesic interpolation. Another area of potential applications could be related to control of mechanical systems.

The linear group is defined as GL(3) = {A ∈ R^{3×3} : det(A) ≠ 0}. A particular subgroup of it is the special linear group: SL(3) = {A ∈ GL(3) : det(A) = 1}. Considering only the orthogonal matrices in SL(3) yields the special orthogonal group:

    SO(3) = {A ∈ SL(3) : A^T A = I}.

Elements of SO(3) are characterized by an axis of rotation in R³ and an angle of rotation; SO(3) has dimension 3. SO(3) is a Lie group, i.e., it is a group and a manifold. SO(3) is a submanifold of GL(3). Applying theorem 2.2 gives a simple expression for the tangent spaces. The tangent space at the origin, at the identity matrix I,

    so(3) = T_I SO(3) = {H ∈ R^{3×3} : H + H^T = 0},

is called the Lie algebra of SO(3) and corresponds to the set of skew-symmetric matrices. Note that T_A SO(3) = A so(3) = {H = AH̃ : H̃ ∈ so(3)}.

Endowing so(3) with the inner product from GL(3) gives a natural metric on SO(3):

    ⟨H_1, H_2⟩_A = ⟨A^T H_1, A^T H_2⟩_I = trace((A^T H_1)^T A^T H_2) = trace(H_1^T H_2) = ⟨H_1, H_2⟩.

It can be proven that, at the identity, the Exp and Log maps reduce to the matrix exponential and logarithm:

    Exp_I : so(3) → SO(3) : H → Exp_I(H) = exp(H),
    Log_I : SO(3) → so(3) : A → Log_I(A) = log(A).

From there, we can define the Exp and Log maps everywhere on SO(3), see table C.1.

Table C.1: Toolbox for the special orthogonal group.
    Set:             SO(3) = {A ∈ R^{3×3} : A^T A = I and det(A) = 1}
    Tangent spaces:  T_A SO(3) = {H ∈ R^{3×3} : A^T H + H^T A = 0}
    Inner product:   ⟨H_1, H_2⟩_A = trace(H_1^T H_2) = ⟨H_1, H_2⟩
    Vector norm:     ‖H‖_A = √⟨H, H⟩_A
    Distance:        dist(A, B) = ‖Log_A(B)‖ = ‖log(A^T B)‖
    Exponential:     Exp_A(H) = A Exp_I(A^T H) = A exp(A^T H)
    Logarithm:       Log_A(B) = A log(A^T B)
    Mean:            mean(A, B) = A exp(0.5 log(A^T B))

We introduce the two next functions like we did in section 5.3 for positive matrices:

    f_A : SO(3) → R^+ : B → f_A(B) = (1/2) dist²(A, B),
    g : (SO(3))³ → R : (A, B, C) → g(A, B, C) = ⟨A log(A^T B), A log(A^T C)⟩.

Similar algebra to what was done in section 5.3 and appendix B yields formulas for the gradients; we used remark 5.1 and the fact that orthogonal matrices are diagonalizable. We obtain:

    grad f_A(B) = B log(A^T B) = −Log_B(A) ∈ T_B SO(3).

And, with:

    A^T B = U_B D_B U_B^{−1}   (diagonalize),
    A^T C = U_C D_C U_C^{−1}   (diagonalize),
    F̃_B = first divided differences of log for λ_i = (D_B)_ii,
    F̃_C = first divided differences of log for λ_i = (D_C)_ii,

we have:

    grad(A → g(A, B, C))(A) = B U_B (F̃_B ∘ [U_B^{−1} log(C^T A) U_B]) U_B^{−1} + C U_C (F̃_C ∘ [U_C^{−1} log(B^T A) U_C]) U_C^{−1} ∈ T_A SO(3),
    grad(B → g(A, B, C))(B) = A U_B^{−T} (F̃_B ∘ [U_B^T log(A^T C) U_B^{−T}]) U_B^T ∈ T_B SO(3),
    grad(C → g(A, B, C))(C) = A U_C^{−T} (F̃_C ∘ [U_C^T log(A^T B) U_C^{−T}]) U_C^T ∈ T_C SO(3).

This is sufficient to compute the gradient of the objective function (3.6) applied to Γ = SO(3) × ⋯ × SO(3). The descent algorithms from chapter 3 apply directly, with the following vector transport on SO(3) (with a component-wise extension to Γ), with ξ, η ∈ T_A SO(3) and ξ = Log_A(B):

    T_ξ(η) = BA^T η.

This vector transport preserves inner products (it is an isometric transport), i.e., ⟨η_1, η_2⟩_A = ⟨T_ξ(η_1), T_ξ(η_2)⟩_B.
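Table C.1 can be sketched directly in numpy, using Rodrigues' formula for the exponential and its inverse for the principal logarithm (helper names are ours; rotation angles are assumed below π):

```python
import numpy as np

def hat(w):
    # skew-symmetric matrix of w, so that hat(w) @ v = w x v
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def so3_exp(H):
    # Rodrigues' formula for exp of a skew-symmetric 3x3 matrix
    w = np.array([H[2, 1], H[0, 2], H[1, 0]])
    t = np.linalg.norm(w)
    if t < 1e-12:
        return np.eye(3) + H
    return np.eye(3) + (np.sin(t) / t) * H + ((1 - np.cos(t)) / t**2) * (H @ H)

def so3_log(R):
    # principal log of a rotation matrix (rotation angle < pi assumed)
    t = np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))
    if t < 1e-12:
        return (R - R.T) / 2
    return t / (2 * np.sin(t)) * (R - R.T)

def Exp(A, H):   # Exp_A(H) = A exp(A^T H)
    return A @ so3_exp(A.T @ H)

def Log(A, B):   # Log_A(B) = A log(A^T B)
    return A @ so3_log(A.T @ B)

def dist(A, B):  # dist(A, B) = || log(A^T B) ||_F
    return np.linalg.norm(so3_log(A.T @ B))
```

As expected of a logarithmic map, `Exp(A, Log(A, B))` returns B, and the distance is symmetric in its arguments.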