You are on page 1of 20

Journal of Mathematical Imaging and Vision 23: 5–24, 2005


c 2005 Springer Science + Business Media, Inc. Manufactured in The Netherlands.

Image Manifolds which are Isometric to Euclidean Space

DAVID L. DONOHO
Department of Statistics, Stanford University, 390 Serra Mall, Stanford, CA 94305-4065, USA
donoho@stat.stanford.edu

CARRIE GRIMES∗
Google, Inc., 1600 Amphithatre Parkway, Mountain View, CA 94043
cgrimes@google.com

First online version published in July, 2005

Abstract. Recently, the Isomap procedure [10] was proposed as a new way to recover a low-dimensional
parametrization of data lying on a low-dimensional submanifold in high-dimensional space. The method assumes
that the submanifold, viewed as a Riemannian submanifold of the ambient high-dimensional space, is isometric to
a convex subset of Euclidean space. This naturally raises the question: what datasets can reasonably be modeled by
this condition? In this paper, we consider a special kind of image data: families of images generated by articulation
of one or several objects in a scene—for example, images of a black disk on a white background with center placed
at a range of locations. The collection of all images in such an articulation family, as the parameters of the articu-
lation vary, makes up an articulation manifold, a submanifold of L 2 . We study the properties of such articulation
manifolds, in particular, their lack of differentiability when the images have edges. Under these conditions, we show
that there exists a natural renormalization of geodesic distance which yields a well-defined metric. We exhibit a list
of articulation models where the corresponding manifold equipped with this new metric is indeed isometric to a
convex subset of Euclidean space. Examples include translations of a symmetric object, rotations of a closed set,
articulations of a horizon, and expressions of a cartoon face.
The theoretical predictions from our study are borne out by empirical experiments with published Isomap code.
We also note that in the case where several components of the image articulate independently, isometry may fail; for
example, with several disks in an image avoiding contact, the underlying Riemannian manifold is locally isometric
to an open, connected, but not convex subset of Euclidean space. Such a situation matches the assumptions of our
recently-proposed Hessian Eigenmaps procedure, but not the original Isomap procedure.

Keywords: Isomap, multidimensional scaling, manifolds of articulated images, isometry

1. Introduction in spectral balance. The collection of all such images


of an object, as the parameters (locations, scale, pose,
An object in the world can be imaged in various ways: light color, etc.) vary, can be thought of as a mani-
it can be observed from various distances and orienta- fold in the high-dimensional space of all conceivable
tions; also the object itself can change appearance in images; compare Nayar et al. [8] and Belhumeur and
pose and articulation; and finally the lighting can vary Kriegman [1].
Often one is given many digital images (Ii : i =
∗ Thiswork has been partially supported by National Science Foun-
1, . . . , N ) of the same object in a variety of artic-
dation grant DMS 00-72661, and by DARPA Applied and Compu- ulations and poses, but the data are ‘unlabeled’ in
tational Mathematics Program. the sense that the underlying parametrization and the
6 Donoho and Grimes

corresponding parameter values are unknown. The im- There are several reasons such a purely empirical
ages are thus thought to arise from a manifold M, with- investigation based on running Isomap on artificial ex-
out the manifold or the associated parametrization be- amples may not be enlightening.
ing known. For concrete tasks in image understanding
and image coding, it could be useful to ‘learn’ the struc- • Sampling issues. Whether Isomap works or not
ture of such image articulation manifolds and to recover might depend on how many model images Ii are
the underlying parameters (location, scale, etc.) from in the database. In some vague sense, it will be im-
unlabeled data. This could be helpful for recognizing, portant for these images to be well-distributed across
for example, articulated vehicles in target recognition the image manifold, but exactly what this means in a
or faces with different expressions in facial recognition. specific instance is unclear. Consequently, if Isomap
The general problem of learning the shape of a man- ‘fails’, this may be due to poor data sampling rather
ifold from scattered observations has been around for a than any intrinsic property of Isomap.
long time; it has been the source of many multivariate • Digitization issues. In some sense, the fact that im-
techniques, including principal components analysis, ages are discretized into pixels makes them ‘noisy’/
independent components analysis, multidimensional ‘blocky’; this again causes some difficulty in inter-
scaling, self-organizing mappings, and other important preting ‘failure’ of Isomap—is it due to the pixeliza-
methodological developments. tion or is it intrinsic to the type of articulation?
Recently, Tenenbaum et al. [10] proposed the • ‘Big Picture’ issues. Empirical work really doesn’t
Isomap procedure as a general tool for recovering give us an intellectual framework that we can lever-
the unknown parametrization underlying a set of age into other settings, at least not the kind of frame-
digital images, {Ii }, of faces in various attitudes work that might be possible by a more theoretical
and articulations. The general principle of Isomap approach.
is to measure distance between images, not using
Euclidean distance (which obscures the intrinsic For these and other reasons, we propose to develop
manifold structure), but using distance according to an alternate framework for understanding the behavior
the shortest path in a nearest neighbor graph; and to of Isomap on image manifolds. We think of an image
use this graph distance as input to a classical “principal as a function I (x) of a continuous variable x ∈ R2 .
coordinates” multidimensional scaling procedure. We consider articulations of a base image I0 , produc-
ing a family of images {Iθ : θ ∈ }, where θ is the
parameter of the articulation and  is the parameter
space. Examples we will consider include translation
1.1. Validating Isomap
families, where  = R2 and Iθ (x) = I0 (x − θ). In
this “continuum” viewpoint, neither sampling nor dig-
Tenenbaum et al. [10] published a few interesting ex-
itization can cause problems, and a clear intellectual
amples, for example mapping out the parameters un-
framework exists naturally.
derlying a face seen from a variety of viewpoints. These
We consider families of images as a subset of L 2 (R2 )
empirical successes lead to the . . .
and, initially, measure distance between images by
Obvious Question: how “correct” is the Isomap pro- µ(θ1 , θ2 ) = Iθ1 − Iθ2  L 2 . The subset M = {Iθ : θ ∈
cedure; does it really recover the “true” underlying } becomes a metric space (M, µ) and, one can check,
parametrization of families of articulated images? a continuous submanifold of L 2 = L 2 (R2 ).
This leads immediately to the ... Translating the isometry principle underlying
Obvious Approach: Test Isomap for synthetic data Isomap into this continuum setting, we let G(θ0 , θ1 )
where we know a priori “the natural” parametriza- be the geodesic distance defined by taking the shortest
tions and see if it can recover the parametrization of path lying in M going from Iθ0 to Iθ1 . Then we ask if
image manifolds. G(θ0 , θ1 ) is proportional to Euclidean distance between
θ0 and θ1 , for all pairs (θ0 , θ1 ). In short, we ask if (, G)
In following the ‘obvious’ approach, we would con- is isometric to (, ·) up to a constant scaling factor.
struct synthetic datasets of artificial images undergoing When the indicated isometry property holds, then it is
standard articulations—translations, rotations, etc.— possible to recover, from knowledge of M alone, the
and ask if the Isomap parametrization correctly dis- parameter space  and the parametrization θ ↔ Iθ ,
covers the (known) underlying parametrizations. modulo a rigid motion and rescaling of the parameter
Image Manifolds which are Isometric to Euclidean Space 7

space. We will therefore also say, when isometry holds, case when there are several articulating objects which
that Continuum Isomap works. avoid each other—the exclusion requirement creates a
Our approach provides a clear theoretical framework nonconvex parameter space. In other cases, even local
with crisply-stated problems that are easily answered. isometry fails, for example, when objects occlude.
There will, however, be a certain amount of technical A special feature of our approach is the analysis
detail required to develop this view, having to do with framework we have developed. In this framework we
the fact that when images have edges, the underlying view images as defined on a continuum, and we trans-
‘surface’ M ⊂ L 2 (R2 ) is typically not a differentiable late an algorithm of interest into a continuum version
manifold; indeed, the geodesic distance G will not be fi- of the algorithm. We have applied the framework here
nite. We will derive a renormalized geodesic distance δ, to understanding the Isomap procedure, but it could
a rescaled limit of distances defined on smoothed ver- equally well be applied to understanding other mani-
sions of M as the smoothing parameter tends to zero. fold learning procedures, such as Locally Linear Em-
Finally our criterion for success in the continuum set- bedding (LLE) [9]. We in fact used this framework to
ting will be that (, δ) is isometric to (, ·). understand LLE specifically, and what we learned from
the effort led us to define a new procedure, Hessian
1.2. Results Eigenmaps [4], introduced in other work, that can solve
the local isometry problem defined here. Potentially,
Our criterion for saying that ‘Continuum Isomap the ‘continuum framework’ could be applied usefully
works’ is quite stringent, demanding exact isometry to additional algorithms.
of the underlying metric spaces, and therefore it may The article is organized as follows: Section 2 de-
seem unlikely that it could ever hold. As it turns out, in velops the mathematical foundations of renormalized
the image articulation setting that we describe, there are distance δ; Section 3 gives a few general ‘simple’
a number of interesting image libraries where isometry image families where δ can be calculated; Section 4
really does hold. These cases include: tests the theoretical results against generated image
data; Section 5 interprets the methodology for more
• Translation of simple black objects on a white back- complex example image families (such as a horizon
ground; parametrized by a basis function expansion, or com-
• Pivoting certain simple black objects on a white posite articulations of more than one object). Section 6
background around a fixed point; discusses a more elaborate example: movies; Section 7
• Morphing of boundaries of black objects on a white offers examples where isometry fails; and further issues
background; and discussion appear in Section 8.
• Articulation of ‘fingers’ of a digital ‘hand’;
• Articulation of a cartoon face by arranging its eye- 2. Geodesic Distance Between Images
brows, eyelids, and lips.

It also turns out that isometry holds for certain ‘movies’, In this section, we develop a continuum setting for un-
i.e. for images articulating in time; an example be- derstanding the issues behind the Isomap procedure.
ing a cartoon face gesturing in time according to a This involves a review of some basic geometric no-
sufficiently rich and complete inventory of gestures. tions, such as arclength and geodesic, showing the
Our theory predicts that from ‘watching’ sufficiently ill-posedness of the ‘obvious’ approach to Continuum
many movies of a cartoon face gesturing in time, Isomap, and defining a special renormalized distance
Isomap could in principle correctly learn the correct and formulas for computing it.
parametrization underlying the facial gesturing. Many
such results are possible to state, but we do not pursue 2.1. A Simple Manifold of Images over
them in this paper. the Continuum
Our examples provide a theoretical framework, in the
setting of image databases, which validates the claim Consider a smooth function f 0 : R2 → R2 that is
implicit in early uses of Isomap. However, we also point radially symmetric with unit L 2 -norm. For a parameter
out examples of image articulation manifolds where θ ∈ R2 , we then have the translation family
isometry fails. At one extreme, local isometry holds, but
the underlying parameter space is not convex; this is the f θ (x1 , x2 ) = f 0 (x − θ ), (2.1)
8 Donoho and Grimes

so that the ‘image’ is being shifted. A simple calcula- x2


f 0 (x) = e− 2 , and f θ (x) = f 0 (x − θ). Then
tion shows that
 1/2 G(θ1 , θ0 ; M) = cθ1 − θ0 2 (2.3)
   
 fθ − fθ  2 =  f θ (x) − f θ (x)2 d x
1 0 L 1 0
where
= η(θ1 − θ0 ), (2.2)  
 ∂ 
c−1 =  
 ∂ x 0 2 2 .
f
where η : R+ → R is a continuous nonlinear function 1 L (R )
satisfying η(0) = 0. Moreover, when θ0 −θ1  is large, In short, up to a constant, geodesic distance agrees
f θ0 and f√
θ1 are almost entirely separated and therefore globally with Euclidean distance on the natural param-
η(∞) = 2. eter space. We can then conclude that M, viewed as a
Suppose  = {θ  ≤ 1}, and set M = { f θ : θ ∈ Riemannian submanifold of L 2 is isometric to (, ·).
}. M is a 2-dimensional subset of the ambient space
L 2 (R2 ).
2.3. Recovering the Natural Parametrization
Define the metric µ(θ0 , θ1 ) =  f θ0 − f θ1  L 2 (R2 ) . The
relation µ(θ0 , θ1 ) = η(θ0 − θ1 ) means that, while the
The idea behind Isomap [10] is that an isometry be-
topology of M is equivalent to the Euclidean topology,
tween M (viewed as a Riemannian submanifold of an
the submanifold is a curved subset of L 2 .
ambient space) and a subset of Rd allows one to recover
the underlying isometric parametrization. Isomap ex-
2.2. Geodesic Distance ploits the usual procedure of metric multidimensional
scaling [2, 7]. Given a matrix of pairwise distances
Let M be a submanifold of L 2 , and consider now a (di, j : 1 ≤ i, j ≤ n) between pairs of objects, and sup-
curve γ : [0, 1] → M. Suppose that γ is C 1 -smooth posing these are truly the Euclidean distances between
when viewed as a curve [0, 1] → L 2 (R2 ). Thus for each points in a d-dimensional Euclidean space, this proce-
t ∈ [0, 1] there exists a derivative γ̇ (t) : R2 → R that dure returns n points xi , say, obeying
has finite L 2 -norm and that varies continuously in t.
Then a first definition of the length of γ is xi − x j  = di, j .
 1
L(γ ) = γ̇ (t) L 2 dt. The Isomap proposal is to use specially-derived empir-
0 ical geodesic distances for the di, j .
A more general definition of arclength, which agrees Applying this idea in the continuum image setting,
with the calculus definition when γ is C 1 -smooth, sets if we use true geodesic distances di, j = G(θi , θ j ; M)
and the isometry property (2.3) holds, we would obtain

n−1 a set of points (xi ) obeying
L(γ ) = sup γ (ti+1 ) − γ (ti ) L 2 ,
i=0 xi = U θi + b.
where the supremum is taken over all partitions of the for some fixed orthogonal matrix U and vector b.
unit interval with arbitrary breakpoints 0 = t0 < t1 <
· · · < tn = 1. This definition is valid even for merely Definition 2.1. Suppose that the geodesic distance be-
continuous curves γ . tween points in M, viewed as a submanifold of L 2 ,
The geodesic distance between points θ0 and θ1 in is proportional to Euclidean distance in the parameter
the smooth manifold M is the length of the shortest space . Then we say that (, G) and (, ·) are in
curve joining the two points, or isometry, and that Continuum Isomap works.
 
G(θ0 , θ1 ; M) = inf L(γ ) : γ (0) = f θ0 , γ (1) = f θ1 . Of course, isometry allows recovery only up to a
scaling and rigid motion. Summarizing the above dis-
We note that G is the distance between points in M, cussion, we have:
when it is viewed as a Riemannian submanifold of the
ambient space L 2 . As a simple example, suppose f 0 is Corollary 2.1. Suppose we have a parametrized fam-
a smooth radial function on R2 , such as the Gaussian ily of ‘images’ f θ : R2
→ R defined by translation of
Image Manifolds which are Isometric to Euclidean Space 9

a common prototype: f θ (x) = f 0 (x − θ ), that f 0 is ra- which increases without bound as n → ∞. This means
dially symmetric, belongs to L 2 and is differentiable that the mapping θ
→ Iθ cannot be differentiable in
in L 2 . Then, for an appropriate c > 0, the geodesic L 2.
distance between f θ0 and f θ1 has the form More generally, we have:

G(θ0 , θ1 ) = c · θ0 − θ1 . Theorem 2.1. If M is the articulation manifold got-


ten by translating an image with edges, then M is a
Hence isometry holds and Continuum Isomap works. continuous but not a differentiable manifold. Moreover
paths in M do not have finite length.
For complete clarity, we also point out that, in these
cases, using ordinary L 2 distance rather than geodesic
distance would not recover the true parameter space. For the formal proof of this result, see [6]. In short,
Indeed as we have seen µ(θ0 , θ1 ) = η(θ0 −θ1 ) where geodesic distance is typically not well-defined on an
η(·) is nonlinear. Therefore, the properties of geodesic image articulation manifold when the generating im-
distance make it uniquely successful in this setting. age has edges. Since any reasonably interesting image
has edges, this seems at first glance to be fatal to the
notion of using isometry to uncover parametrization in
2.4. Non-differentiable Image Manifolds the continuum setting.

The above definitions rely on the implicit assumption


that the submanifold M is differentiable, with finite 2.5. Renormalized Geodesic Distance
geodesic distances everywhere. Unfortunately, objects
in the real world have edges, and so a respectable To rescue the notion of isometry in cases where the
model of images on the continuum has discontinuities underlying image has edges, we introduce a process of
at edges of the object boundaries. This turns out to mean regularization and renormalization.
that the submanifold M is typically non-differentiable, Consider again the case where I0 = 1{|x|≤1} is the
and that typically the geodesics do not have finite indicator of the disk, and define the regularized object
length. I0h by I0h = I0  φh where φ is a smooth radial function
Consider the indicator of the unit disk, of width h and  stands for convolution. Each image
Iθh = I0h (x − θ) is smooth, and if we let Mh denote the
I0 (x) = 1{|x|≤1} , image manifold generated by translations of Iθh , with
L 2 metric µ(θ1 , θ0 ; Mh ) = Iθh − Iθh0  L 2 , we get that Mh
and a family of translated images, θ ∈  = R2 , Iθ = is a smooth submanifold of L 2 . Obviously, Mh tends
I0 (x − θ), θ ∈ . Then let M = {Iθ }. Now we can to M as h → 0, and if we consider a curve ϑ(t) in
write µ(θ1 , θ0 ; M) = Iθ1 − Iθ0  L 2 = η(θ1 − θ0 ), for the common parameter space  of all the Mh and of
a new function η(·). Indeed, Iθ1 − Iθ0  is just the square M, then the lengths of the induced paths on the respec-
root of the area formed by the symmetric difference tive manifolds Mh must converge to the lengths of the
between the support of Iθ1 and Iθ0 . This area depends on limit manifold M. It follows that, if we let G(·, ·; Mh )
r = θ1 − θ0  only. Also, we can calculate that η is not denote the corresponding geodesic distance on Mh ,
differentiable at the origin: η(d) d 1/2 . Indeed when we have
r = θ1 − θ0  the area Ar of the symmetric difference
supp(Iθ0 ) supp(Iθ1 ) obeys
√ Ar √∼ cr as r → 0. Hence lim G(θ1 , θ0 ; Mh ) = ∞.
its square root obeys Ar ∼ cr as r → 0, which h→0

implies the asymptotic equivalence η(d) d 1/2 .


In short, regularization makes the geodesic distance
We see immediately that every path in M has infinite
finite, but this effect disappears as h → 0.
length. Indeed, using the extended notion of arclength
To get a finite limiting result, we make the observa-
discussed in Section 2.2, and setting rn = I1/n − I0  L 2 ,
tion that, if τ0 and τ1 are two fixed ‘landmarks’, then

n−1   the ratio
L(γ ) ≥  I(k+1)/n − Ik/n 

k=0 G Iθh0 , Iθh1 ; Mh


= n · η(rn ) ∼ n · η(c/n) n · n −1/2 = n 1/2 , G Iτh0 , Iτh1 ; Mh


10 Donoho and Grimes

need not diverge as h → 0. In effect, if we renormalize δ(θ0, θ1)


the distance so that τ0 and τ1 stay at unit distance on Θ
Mh for all h > 0, we get a finite limiting ratio.
In our running example, let I0 be the indicator of the
Figure 1. Correspondence between parameter space and smoothed
unit disk in R2 and let Iθ (x) = I (x−θ) be translation by manifold.
θ . For regularization, let φh denote the standard bivari-
ate Gaussian density with covariance matrix h 2 · I d. Let γ τh
the path ϑτ (t) be the line segment between τ0 = (0, 0)
and τ1 = (1, 0). The length of the corresponding path h
γh M
(the ‘landmark distance’), γτh , on the smooth manifold
Mh generated by γτh (t) = Iϑ(τ h
) is given by

 

∂ h 
1
L γτh =  
 ∂t γτ (t) 2 2 dt
Θ
0 L (R )
 1 
∂ h 
=  I  Figure 2. The abstract manifold M.
 ∂t ϑτ (t)  2 2 dt. (2.4)
0 L (R )

3. The Abstract Manifold M


Now note that because of the translation invariance of
the L 2 norm, we can write, for a constant Ah,τ not Inspired by the success of renormalization in the spe-
depending on t, cial case above, we propose the following shift of
viewpoint. We define an abstract manifold M (see
     Fig. 1) which consists of a pair (, δ), where  is the
∂ h  ∂ h  
 I  =  
 ∂t ϑτ (t)   ∂t ϑτ (t)  2 2  = Ah,τ ,
I parameter family underlying M, and δ is a notion of
L 2 (R2 ) L (R ) t=0
distance on  ×  which is derived from geodesic dis-
tance on the smoothed manifolds by the above limiting
so L(γτh ) = Ah,τ . Now for any other chosen parameter procedure.
points θ0 and θ1 in R2 , connected by a line segment,
consider the corresponding path γ01h
in Mh between Iθh0 Definition 3.1. Let ϑ(t) denote a path in both the ab-
h
and Iθ1 . We can perform the same calculation and see stract manifold M and in the parameter space . Con-
that struct a family Mh of smooth manifolds, where h is the
smoothing parameter, converging in L 2 distance to M
h
as h → 0. Let γ h denote the corresponding path in
L γ01 = Ah,τ · θ0 − θ1 .
Mh , and let γτh be the path in Mh induced from the line
Moreover γ01h
is actually, as one might guess, the segment connecting τ0 to τ1 (see Fig. 2). Then define
geodesic in Mh between Iθh0 and Iθh1 . Hence renormalized length λϑ in M by

L(γ h ; Mh )

λ(ϑ; M) = lim
.
G Iθh0 , Iθh1 ; Mh Ah θ0 − θ1  h→0 L γτh ; Mh

= = θ0 − θ1 .
L γτ ; Mh
h Ah
How does this work in our running example of
translating the disk? Fix landmarks τ0 = (0, 0) and
In short, the dependence on h vanishes, and the result τ1 = (1, 0); consider for our path the line segment
is a stable, well-defined (and interpretable!) quantity. ϑ(t) = θ0 + t(θ1 − θ0 ) which runs from θ0 to θ1 accord-
The argument given above for translations of the in- ing to a straight line. We find that
dicator disk is specialized for this particular example.
However, in Appendix A we show that the conclusion λ(ϑ; M) = θ1 − θ0 .
that renormalization works is valid in a more general
context, and that the renormalized limit is finite and Definition 3.2. In referring to renormalized geodesic
defines a sensible metric in a broad range of cases. distance on M we intend the following: fix two
Image Manifolds which are Isometric to Euclidean Space 11

landmarks τ0 and τ1 , construct a sequence Mh of that different regularizations might conceivably lead
smooth manifolds, where h is the smoothing parame- to different answers; second, this might not define an
ter, and which converge in L 2 distance to M as h → 0. actual distance; third, the manifold might not be con-
Then set tinuous or smooth according to this distance. While
there are satisfactory general answers to these ques-
δ(θ0 , θ1 ; M) = min{λ(ϑ(·); M) : ϑ(0) tions, we prefer simply to say for now that in specific
= θ0 , ϑ(1) = θ1 }. cases we have studied, these issues do not arise. See
also [6].
Again, in our running example of translating the disk
indicator, the length of every line segment in the ab-
stract manifold is the same as its Euclidean length.
It follows that the geodesics for δ will just be the 3.1. Calculation of λ, δ
Euclidean geodesics. We also propose the following
interpretation of δ: We now give a formula for the length of paths through
the image manifold deriving from a family of articu-
Definition 3.3. Suppose that, for a constant c > 0, lated images. Our formula applies to a range of articula-
the renormalized geodesic distance δ obeys δ(θ1 , θ2 ) = tion types. In particular, we will not assume a particular
cθ1 − θ2 2 , for all θ1 , θ2 ∈ . Then we say that (, δ) form (such as translation or rotation) for the articula-
and (, ·) are in isometry. tions; merely that they involve smooth transformations
of the object in question.
In the rest of this paper, we will study only mod- In our model, we consider black-and-white images
els of images with edges. We therefore will abandon where the black region Bθ is a kind of ‘blob’—a com-
Definition 2.1, and the associated viewpoint, in favor pact set in the plane, with simple closed curve for
of this new definition of isometry. No longer do we boundary (see Fig. 3). The images themselves are indi-
think of the submanifold M of L 2 , instead we think of cator functions Iθ (x) = 1 Bθ (x), with white represented
the abstract manifold M modeled on , with its own by 0 and black by 1. The boundary ∂ Bθ is a smooth
metric. When isometry holds under these definitions, curve.
we will say that Continuum Isomap works. In defining the regularization procedure, we will
The new definition really matches the original pur- use the 2-dimensional standard Gaussian kernel
pose of our study: applications of Isomap to images. φ(x1 , x2 ) = exp{−(x12 + x22 )/2}, and, with φh (x) =
An object in a scene is rendered into a digital image h −2 φ(h −1 x), define Iθh = φh  Iθ .
by pixelization, and this process provides a limited- The set Bθ has an outward-pointing normal at each
resolution image much as φh  I0 is a limited-resolution point b of the boundary, labeled n(b; θ). Each boundary
image. In some sense applying Isomap to sufficiently point also undergoes a motion with changing θ . We use
fine but limited-resolution imagery gives geodesic dis- the notation νi (b, θ ) for components of the ‘motion
tances heuristically of the form G(·, ·; Mh ). But, as we vector’: the rate of change of a specific boundary point
show in the proof of Theorem 3.1, those distances obey with change in the i-th component of the parameter
vector θ . This is defined carefully in the Appendix.
G(θ1 , θ0 ; Mh ) ∼ δ(θ1 , θ0 ; M)h − 2 ,
1
h → 0. We have the following results.

Obviously, the factor h −1/2 is completely transpar-


ent to all operations involved in metric multidimen-
sional scaling, and will not change the reconstruction
of pointsets except for a homothetic dilation that can
easily be ‘standardized away’ by fixing the distance be-
tween landmark points. This suggests that what really
matters for Isomap is the leading coefficient: δ(·).
There are several things that might be checked
about our definition of δ: first, the choice of smooth- Figure 3. Generic object undergoing a transformation by motion
ing is not specified, and the definition might conceiv- vector ν(b, t); n(b; θ ) is the boundary normal, ν(b, t) is the motion
ably not be invariant to choice of regularization, so vector.
12 Donoho and Grimes

Theorem 3.1. Let (ϑ(t) : t ∈ [0, 1]) be a smooth Again, if we translate from θ0 to θ1 , the linear path
curve in parameter space and, using the notation ϑ(t) = θ0 + t(θ1 − θ0 ), and so
above, set
 dϑi ν(b, t) = (θ1 − θ0 ).
ν(b, t) = νi (b, ϑ(t)).
i
dt
Plugging all these formulas into the inner integral in
Supposing that the boundary is smooth and that its
(3.5) gives
motion is smooth, the length of the curve in M is
  1
1/2   2π
λ(ϑ) = Cτ · n(b), ν(b, t) db
2
dt. n(b), ν(b, t) db = θ1 − θ0 
2 2
cos(ω)2 dω
0 ∂ Bθ ∂ Bθ 0

(3.5) = π · θ1 − θ0  . 2

Here Cτ depends on the choice of landmark points, τ1 Hence we have, as expected:


and τ2 , used in defining the renormalized length.

Theorem 3.2. Let (ϑ(t) : t ∈ [0, 1]) be a smooth λ(θ ) = Cτ · θ1 − θ0 .


curve in parameter space and, using the notation
above, set As far as the Riemannian structure goes, we have

gi j (θ) = n(b), νi (b; θ ) · n(b), ν j (b; θ ) db  2π
∂ Bθ
g11 (θ ) = cos(ω)2 dω = π,
Then gi j defines a Riemannian structure on  and 0

the renormalized length of the induced curve in M is


the same as the length of the curve in  with respect and similarly g12 = g21 = 0, while g22 = π . Hence the
to this Riemannian structure: Riemannian structure is simply Euclidean and isometry
holds.
 1 
dθi dθ j
λ(ϑ) = Cτ · gi j (θ ) dt. (3.6)
0 ij
dt dt
3.2. Inferring Euclidean Structure
Here Cτ depends on the choice of landmark points used
Below, we will see several examples where gi j is con-
in defining the renormalized length.
stant and proportional to the identity. This would seem
We outline the proof for the first of these results in to indicate that the abstract manifold is in fact isomet-
Appendix A; the other is proved similarly; a complete ric to a subset of Euclidean space, and that therefore
discussion is given in [6]. To see how the results can be Isomap works. In order to make such an inference from
applied, consider again our running example of trans- local characteristics to global ones, however, we need
lations of the disk. In this case, boundary points b can an extra element. In order to be globally Euclidean,
be parametrized by the disk center θ and the arclength the local structure must be Euclidean and the space 
ω along the perimeter, and must be convex. If it is not convex, then shortest paths in
 will not be dictated by the infinitesimal metric struc-
b = θ + (cos(ω), sin(ω)), ture alone, but will in some cases be dictated by the non-
convexity of the space, e.g. to avoid holes. Therefore,
while we will sometimes emphasize the issue of convexity.
In some cases, one can demonstrate that isometry
n(b; θ ) = (cos(ω), sin(ω)), holds without computing either δ or g. In fact, all that
is necessary is that  be convex, and that, for every
and pair of points θi ∈ , the linear path in parameter
 space ϑ(t) = θ0 + t(θ1 − θ0 ) has a length proportional
(1, 0) i = 1, to the Euclidean distance θ1 − θ0 , where the constant
νi (b; θ) =
(0, 1) i = 2. of proportionality does not depend on θi .
Image Manifolds which are Isometric to Euclidean Space 13

4. Simple Examples

We now give a few simple examples of the preceding


theory.

4.1. Translating a 4-Fold Symmetric Object

We now generalize the problem of translating a disk,


considering a broader class of symmetric objects.

Definition 4.1. An object B0 has 4-fold symmetry if


it is invariant under rotations about the origin by 90,
180, and 270 degrees.

This class of objects includes  p balls for 0 < p <


∞:
Figure 4. Translation of a 4-fold symmetric object.
 p = {x : |x1 | p + |x2 | p ≤ 1},

although for the cases p ∈ {1, ∞} the boundaries are Now for every b ∈ Q θ ,
not smooth.
• there are three corresponding boundary points b1 ,
Theorem 4.1. For translations of 4-fold symmet- b2 , b3 obtained by rotations through 90, 180, and
ric figures with smooth boundary, isometry holds; the 270 degrees;
renormalized geodesic distance is proportional to the • we have antisymmetric pairs of normals at b, b2 and
Euclidean distance in parameter space. at b1 , b3 , in the sense that the respective normals n(·)
point in opposite directions at each member of the
As  p balls for p = 2 are just disks, this contains our pair; and
earlier result about translating disks as a special case. • we have orthogonal pairs of normals at b, b1 , and
However, this result is much more general than just at b2 , b3 in the sense each pair of normals n(·) is
to  p balls; it works for a wide variety of nonconvex mutually orthogonal.
figures; see Fig. 4.

Proof: Consider the integrand of (3.5): Now letting n ⊥ (b) be perpendicular to n(b), we have
 
3
n(b), ν(b, t)2 db n(bi ), v2 = 2(n(b), v2 + n ⊥ (b), v2 ) = 2v2 .
∂ Bθ k=0

Now for a translation family, and a path ϑ(t) = θ0 + Hence,


t(θ1 − θ0 ), we have that ν(b, t) = θ1 − θ0 ; this is a
 
constant, v, say, independent of t and b. Hence the v2 v2
integral becomes: n(b), v2 db = db = L(∂ B0 )
∂ Bθ 2 ∂ Bθ 2

n(b), v2 db In short, recalling the definition of v = θ1 − θ0 :
∂ Bθ 
λ(ϑ) = L(∂ B0 )/2 · θ1 − θ0 .
Now for a 4-fold symmetric object, let Q θ denote the
northeast boundary (i.e. the intersection of ∂ Bθ with the
This implies the proportionality of geodesic and
positive orthant). Hence, we may write (taking b0 = b)
Euclidean distance.
  
3
n(b), v db =
2
n(bi ), v2 db Vin de Silva explained to us that a more general result
∂ Bθ Q θ k=0 is true, in which 4-fold symmetry is replaced by k-fold
14 Donoho and Grimes

symmetry for any k ≥ 3. De Silva’s argument starts by


noting that the infinitesimal structure of the metric δ n
is quadratic, and so has ellipsoidal level sets. At the v
same time, k-fold symmetry of B0 would impose k-fold
symmetry of that ellipsoid, which, for k ≥ 3, forces it to
be circular, and hence makes the infinitesimal structure θ

conformal. Isometry follows by noting that the scale of


the problem is translation-invariant.

4.2. Pivoting an Object

Consider an object B0 with smooth boundary ∂ B0 , and


suppose that 0 ∈ ∂ B0 . Now consider a family of articu- Figure 5. Pivoting a disk around a boundary point.
lations that pivot the object around zero. More formally,
let Rθ denote the rotation matrix
  of the square, and indeed this is the case. To verify that,
cos(θ ) sin(θ ) we need to know that (3.5) still works if ∂ Bθ is merely
Rθ = ,
− sin(θ ) cos(θ ) piecewise smooth. The demonstration of this appears in
[6]. It is clear that the same extension applies not only to
let I0 (x) = 1 B0 (x), and let squares, but to 4k-gons, for any k (squares are the case
k = 1). It is also clear that the same extension applies to
Iθ = I0 (Rθ x). the case of objects with piecewise smooth boundaries
and 4-fold symmetry. In all cases, isometry holds.
Theorem 4.2. In the Pivot model, isometry holds; On the other hand, as a limiting case of Theorem
for |θ0 − θ1 | < π renormalized geodesic distance is 4.2, consider a ‘pie slice’ B0 outlined by two radial
proportional to the Euclidean distance in parameter line segments, {(r cos (+ω), r sin (+ω)) : 0 ≤ r < R}
space. and {(r cos (−ω), r sin (−ω)) : 0 ≤ r < R}; and an
arc connecting them, {(R cos (ω ), R sin ω ) : −ω ≤
Proof: One has merely to notice that g11 (θ ) is inde- ω ≤ ω}.
pendent of θ. This is easy to see by observing that the Articulate the structure by a pivoting it according to
integrand in g11 , Rθ as in Section 4.2 above. Hence, with I0 = 1 B0 , set
Iθ (x) = I0 (Rθ x).
n(b), ν1 (b; θ )2 This is a limiting case of Theorem 4.2 because
the boundary is piecewise smooth rather than smooth.
is independent of θ . Indeed, for any value of θ, n(b; θ ) However, because (3.5) extends to the case of objects
points normal to ∂ B, while ν1 (b; θ ) points normal to with piecewise smooth boundaries, the conclusion of
the line segment joining 0 to b. The angle between Theorem 4.2 carries over to this case as well.
these two vectors is the same independent of θ (See
Fig. 5).
5. Empirical Results

In this section, we compare our theoretical results


4.3. Extensions to Piecewise Smooth Boundaries showing isometry in a continuum setting against em-
pirical results applying Isomap to families of digital
The results above can be generalized to the case of images.
piecewise smooth boundary curves.
On the one hand, as a limiting case of Theorem 4.1,
consider translations of an  p ball for p = ∞, where 5.1. Rigid Translation of a Single Disk
B0 is the unit square in the plane. Extrapolating the
above result from p < ∞ to p = ∞ suggests that it Consider our standing example where the image
might also be true that isometry holds for translations I0 (x) = 1{|x|≤1} is the indicator of a unit disk, and
Image Manifolds which are Isometric to Euclidean Space 15

Original Parameter Space, Single Disk 2D Embedding, Single Disk


4 4

3 3

2 2

1 1

0 0

−1 −1

−2 −2

−3 −3

−4 −4
−4 −2 0 2 4 −4 −2 0 2 4

Figure 6. Left panel: Original Disk Centers, colored by polar angle. Right panel: 2-dimensional embedding recovered by Isomap. The points
in the right panel have been colored by original polar angle (optimally rotated and scaled). Similarly-colored points are typically in similar
positions in the two panels.

our articulation translates the center of the disk by a by (3.5) after some calculation as
vector θ in R2 . As indicated above, isometry holds in
1
the continuum setting. We expect that an empirical trial λ(ϑ) ∝ π 2 × |dθ|
with a finite database of digital imagery should show
essentially ‘perfect’ recovery of the original parameter where the articulation is a pivoting through an angle
space. dθ = |θ0 − θ1 |. As in the previous experiment, we
Our test uses the publicly-available Isomap code create a database of 100 images of a black-on-white
[10], running with the nearest-neighbor parameter k = disk at 64-by-64 pixels, and articulate the disk under
7, and a set of 100 black-on-white images at 64-by- pivoting through a randomly selected angle θ .
64 pixels. The translation of the prototype was per- The resulting experiment, optimally rotated and
formed by randomly selecting x, y-coordinates of the scaled, shows a satisfactory match to the original
disk centers, restricting the centers so that the disk parameter space in Fig. 7. (We plot the disk cen-
stayed inside the borders of the image. The results of ters in two-dimensions for comparison to the image
the unit disk experiment are seen in Fig. 6. The ac- field).
tual Isomap 2-dimensional embedding was optimally
rotated and scaled using a Procrustes rotation [2] to
match it to the original parameter space for graphi- 6. Variations
cal presentation. Isomap almost perfectly recovers the
original spacing and relative positioning of the disk In this section, we return to the continuum setting
centers. In fact, we note that the only noticeable devia- and give more complex examples of articulation/object
tions tend to occur where the sampling was particularly models exhibiting isometry.
sparse, such as in the upper left corner of the parameter
domain.
6.1. Horizon Articulation

5.2. Pivoting of a Single Disk We consider first a variation in both the image index set
and the articulation model. Now the image I (u, v) will
Consider pivoting a unit disk around a fixed boundary have domain 0 ≤ u ≤ 1 and −∞ < v < ∞, forming
point, which for convenience is chosen as the origin a vertical ‘strip’, and the images of interest to us will
of our coordinate system. The position of the center of be black and white images with the boundary between
the disk provides a natural parametrization; as it lies the two colors being a horizon v = ψ(u). Hence
on the unit circle, it can be measured in radians θ . The
renormalized length of a line segment [θ0 , θ1 ] is given I (u, v; ψ) = 1{v≤ψ(u)} .
16 Donoho and Grimes

Original Disk Centers Embedded disk centers, scaled to original


4 4

3 3

2 2

1 1

0 0

−1 −1

−2 −2

−3 −3

−4 −4
−4 −2 0 2 4 −4 −2 0 2 4

Figure 7. Right Panel: Original disk centers. Left Panel: 2-dimensional embedding for single disk pivots.

this case is to smooth ‘vertically’. So let ϕ denote the


1-dimensional standard normal density, and the
ψ(u;θ1) smoothing operation
 ∞
(ϕh v f )(u, v) = ϕh (w) f (u, v − w) dw;
ψ(u;θ2) −∞

define then the smoothed image I h by

Iθh = ϕh v Iθ .

For fixed h > 0, the collection of all Iθh for all θ ∈ 


Figure 8. Horizon Articulation Model. defines a smooth image manifold Mh , and so we can
define renormalized length in a fashion similar to (3.5).
In this setting, the normal n(b) is no longer relevant;
We are interested in families Iθ of articulated images we define the motion vector
where the horizon is generated by a linear combination  dθ j
of basis elements ψ j (an example of an articulation v(t, u) = ψ j (u);
appears in Fig. 8) j
dt

ψ(u; θ ) = θi ψi (u). and the formula for renormalized length is
i
  1/2
1 1
Hence θ is a vector of expansion coefficients governing
λ(ϑ) = Cτ · v(t, u) du
2
dt. (6.7)
the shape of the horizon. We assume the basis functions 0 0
are orthogonal for L 2 [0, 1]:
 1
When we do this, we find that, if we select a linear
ψ j (u)ψ j  (u)du = 1{ j= j  } ; path in parameter space,
0

examples include orthonormal wavelets, and orthonor- ϑ(t) = θ0 + t(θ1 − θ0 )


mal sinusoids. We will assume that the basis functions
are smooth: at least C 2 . then
In defining the metric δ for this setting, we need
to define image regularization. The natural choice in λ(ϑ) = Cτ · θ1 − θ0 .
Image Manifolds which are Isometric to Euclidean Space 17

Since the length of every line segment is proportional As long as the condition that the mobile wedges can-
to the Euclidean distance, it follows that δ itself is pro- not overlap is enforced, the demonstration can be given
portional to Euclidean distance. for any number of ‘fingers’.
Also, the appropriate formula for the Riemannian
metric is simply Theorem 6.2. For the ‘Bunny Ears’ and ‘Hand’
 Models, with the indicated convex parameter spaces,
1
isometry holds.
gi j (θ) = Cτ2 · φi (u)φ j (u) du = Cτ2 · 1{i= j} .
0

which shows the same thing another way. 6.3. Translating Several Ordered Nonoverlapping
Disks
Theorem 6.1. In the Horizon Articulation model, if Consider now continuum images containing n disks,
the parameter space  is convex, isometry holds. each of radius one, with centers θi , i = 1, . . . , n. With
I0 the indicator of the unit disk, and θ = (θ1 , . . . , θn ),
An interesting feature here is that there is no limit to put
the dimensionality of θ; in particular, we can consider
quite complex articulations of a horizon; in each case, 
n
the natural parametrization will be recovered up to a Iθ (x) = I0 (x − θi ).
rigid motion of parameter space. I =1

In this section we constrain


6.2. Bunny Ears and Hands
θi − θ j  > 1
We now consider multiple-object articulations. Restric-
tions on the joint behavior of the objects will be essen- so that no two terms in the sum overlap, and the im-
tial in order to have isometry. For detailed discussion age remains of the ‘black-and-white’ type in previous
of the necessary conditions and proofs of the results, sections. We will go further than merely constrain the
see [6]. disks to be nonoverlapping; we do so in several ways,
Consider a two-object ‘Bunny Ears’ composed of each defining a different parameter space .
two wedges on a disk: Let θ = (θ 1 , θ 2 ) denote a pair
of angles, and let Bθ be the unit disk with two ‘ears’
attached, with the inclination of the ‘ears’ controlled • Single File. Let 1 denote the set of θ where the disk
by the components of θ . Each ear is an annular wedge centers lie along the x-axis, so that
with angular opening 2ω. Such a basic wedge is defined
by boundary segments {(r cos (θ + ω), r sin (θ + ω)) : θi,2 = 0 i = 1, . . . , n;
1 ≤ r < 2} and {(r cos (θ − ω), r sin (θ − ω)) : 1 ≤
r < 2}; and two arcs connecting them, along r = 1 (here θi, j denotes the j-coordinate of θi ); moreover
and r = 2. The first ear is a wedge centered at θ = θ 1 ; the centers are ordered along the x-axis:
the second ear is a wedge centered at θ = θ 2 . The
parameter space  is chosen so the ears don’t overlap: θ1,1 < θ2,1 < · · · < θn,1 .

 = {(θ 1 , θ 2 ) : 2ω < θ 1 − θ 2 < 2π − 2ω}. • Separated Columns. Let 2 = 2 ((ai )) denote the
set of θ arising where the disk centers can be
We can extend this to the Hands model, in which distributed in the 2-dimensional plane relatively
the unit disk has 5 ‘fingers’ attached, and each ‘finger’ broadly, but are constrained to lie in zones defined by
is again an annular wedge at specific angle θ i , i = regions of the x-axis. We fix cutpoints a0 , a1 , . . . , an ,
1, . . . , 5. with ai > ai−1 + 2 and demand that

 = {(θ 1 , θ 2 , . . . , θ 5 ) : 2ω < θ i − θ i+1 < 2π − 2ω}. ai−1 + 1 < θi,1 < ai − 1 i = 1, . . . , n.


18 Donoho and Grimes

• Northeast/Southwest ordering. We let 3 denote the both horizons are touching at the endpoints u = 0, 1,
parameter space where the disks are arranged so that and that elsewhere ψ1 (u) < ψ2 (u). Suppose that the
the i-th one has all earlier disks lying to the southwest two horizons are each parametrized as in Section 5.1,
and all later disks lying in the northeast quadrant. with respective parameter vectors θi , and that, with
θ = (θ1 , θ2 ), we have
Each such constraint corresponds to a convex family .
Now we consider independent translations of the fam- Iθ = 1{ψ(u;θ1 )≤v≤ψ(u;θ2 )} .
ily of disks that maintain the assumed constraint, i.e.
translations that preserve membership in . We let  be the subset of θ vectors where
In this setting, we can adapt arguments from the case
of a single disk to see that, if θ(0) = (θ01 , . . . , θ0n ) is ψ(u; θ1 ) ≤ ψ(u; θ2 ) u ∈ [0, 1],
a collection of disk centers and θ(1) = (θ11 , . . . , θ1n )
is another collection of disk centers, then for ϑ(t) = so that the definition makes sense. Note that, for each
θ(0) + t(θ(1) − θ(0) ), we have the renormalized length: fixed u, θ
→ ψ(u; θ ) is linear; therefore,  is a convex
parameter space.

 n It is not hard to see that if θ(0) = (θ01 , θ02 ) is a col-
  
λ(θ ; M) = Cτ ·  θ i − θ i 2 lection of Horizon parameters in  and θ(1) = (θ11 , θ12 )
1 0
i=1 is another collection of horizon parameters in , then
for ϑ(t) = θ(0) + t(θ(1) − θ(0) ) a linear path we have the
and a Riemannian structure which is in fact Euclidean: arclength

 2
g(θ ) = Cτ · I d, ∀ θ ∈ .   
λ(ϑ; M) = Cτ ·  θ i − θ i 2
1 0
i=1
Thus, in this setting, the shortest distance between pairs
of images indeed goes by morphing the centers along and a Riemannian structure which is in fact Euclidean:
line segments.
g(θ ) = Cτ2 · Id, ∀ θ ∈ .
Theorem 6.3. Isometry holds for the Ordered
Nonoverlapping Disks model with any of the param- Thus, again in this setting, the shortest path between
eter spaces 1 , 2 , or 3 . images defined by two horizons indeed goes by follow-
ing a linear path in parameter space.
The extension from Disks to 4-fold symmetry seems
likely to hold as well. Theorem 6.4. Isometry holds for the Two Non-
Crossing Horizons model.
6.4. Two Non-Crossing Horizons
6.5. Cartoon Faces
Now consider a region which is white everywhere
except between two horizons which are constrained Now consider a very simple model of a cartoon face
never to cross. For example, suppose that, as in Fig. 9, undergoing various articulations.

Figure 9. Successful Composite Articulations: Multiple Ordered Disks, Horizon models, and Bunny Ears.
Image Manifolds which are Isometric to Euclidean Space 19

Theorem 6.5. Isometry holds for the Cartoon Face


Model.

7. Movies

In this section only, we consider the case of ‘movies’,


which are objects with an additional time index: I (x, α)
Figure 10. Examples of cartoon faces generated by horizon artic-
ulations. where x ∈ R2 and the time index α ∈ [0, 1]. Think of a
(continuous) collection of 2-D images I (·, α); instead
of looking at the parameter space for one movie, we
A fixed oval region in the plane is called the ‘head’, consider the joint parameter space of an entire set of
and five regions inside it, called ‘brows’, ‘eyes’ and movies.
‘mouth’ are defined. Each of the brows, eyes and mouth We consider two kinds of articulations under the
is a black region defined by two articulating horizons. movie model. For much more ambitious examples, see
The eyes and mouth have horizons which, as in the the technical report [3].
previous section, are joined at the ends, creating an al-
mond shape. The brows have two horizons ending in
vertical line segments and having fixed widths. Exam- 7.1. Motions of a Disk
ples of these faces appear in Fig. 10. The parameter
space under this model is convex for the same reasons Let I0 be a 2-dimensional image—the indicator func-
as the parameter space in Section 6.4. tion of a disk—and consider a movie showing motion
The underlying parameter vector θ = (θ 1 , ..., θ 8 ); θ 1 of the disk. Let in fact χ (α) be the center of the disk as
and θ 2 , give the upper and lower horizon of the mouth a function of time and put
(i.e. the lips), θ 3 and θ 4 give the upper and lower horizon
of the left eye, (i.e. the lids), etc. The parameter space I (x, α) = I0 (x − χ (α)).
 is constrained so that the upper and lower boundaries
of an object such as the right eye never cross, and so
that the upper boundary of the right eye never crosses For a simple example, consider a movie which shows
the lower boundary of the right brow, etc. linear motion of the disk (see Fig. 11). Then
We evaluate closeness of the images using L 2 (dx)
as usual. Applying the same smoothing recipe as in the χ (α) = ν · α.
single horizon case, we get a renormalized distance.
It is not hard to see that if θ(0) = (θ01 , . . . , θ08 ) is
a collection of Horizon parameters in  and θ(1) = The center of the disk is at (0, 0) at the movie’s begin-
(θ11 , . . . , θ18 ) is another collection of horizon parameters ning, at time α = 0, and it is at ν at the movie’s end, at
in , then for a linear path ϑ(t) = θ(0) + t(θ(1) − θ(0) ) time α = 1.
we have the arclength

 8
  
λ(ϑ; M) = Cτ ·  θ i − θ i 2
1 0
i=1

and a Riemannian structure which is in fact Euclidean: Time Axis


I(θ;α)

gi j (θ ) = Cτ2 · Id, ∀ θ ∈ .

Thus, again in this setting, the shortest path between α


images defined by two horizons indeed goes by follow-
ing a linear path in parameter space. Figure 11. Movie model: sequence of articulated disks in time.
20 Donoho and Grimes

For a more interesting class of motions, we consider the hand components. Let in fact ζ (α) be the parameters
the parametric family: of the hand as a function of time and put

χ (α) = χ (α; θ ) = θ j ψ j (α) I (x, α) = Iζ (α) (x).
j
For a simple example, consider a movie which shows
where ψ j are vector-valued and orthogonal in L [0, 1]. 2 uniform rotation of the fingers around a common center.
Define now the family of articulated images Iθ (x) = Then
I0 (x − χ(α; θ)), and for regularization, do 2-d smooth-
ing within each frame: ζ (α) = ζ0 + α(ζ1 − ζ0 ).

The hand is in configuration ζ0 at the movie’s begin-


Iθh = φh (x,y) Iθ ;
ning, at time α = 0, and it is at ζ1 at the movie’s end,
at time α = 1.
here φh is a 2-d smoothing kernel in (x, y) only (not in For a more interesting class of motions, we consider
α). For a smooth path ϑ(·) in  we have the following the parametric family:
formula for arclength:

 ζ (α) = ζ (α; θ) = θ j ζ j (α)
 1 j
λ(ϑ) = Cτ ν, n2 dbdα dt
0
where ζ j are vector-valued and orthogonal in L 2 [0, 1].
Define now the family of articulated images
where b runs along the perimeter of the disk, ν = Iθ (x, α) = Iζ (α) (x) and for regularization, do 1-D
ν(b, α; t) is the motion vector of the boundary element smoothing angularly:
b at time α in the movie and at parameter t in the path,
and n is the (x, y)-normal to the surface of the disk
Iθh = φh ω Iθ
at (b, α). Then we notice that, just as in the case of
translating a 2-d image,
here φh is a 1-D smoothing kernel acting convolution-
   ally in r = constant. For a smooth path ϑ(·) in  we
 dχ 2
ν, n2 db = C ·  
 dα 
have the following formula for arclength:
2
 
1
We conclude that λ(ϑ) = ν, n2 db dα dt
0

δ(θ0 , θ1 ) = Cτ · θ1 − θ0 2 . where b runs across the perimeter of the hand within the
two-dimensional image, ν(b, α; t) is the motion vector
Theorem 7.1. Isometry holds for movies of a trans- of the boundary element at time α in the movie and at
lating disk. parameter t in the path, and n is the (x, y)-normal to the
surface of the disk at (b, α). Then we notice that, just
For similar reasons, Continuum Isomap works for as in the case of translating a two-dimensional image,
movies of several translating disks which are ordered   2
 dζ 
and nonoverlapping. ν, n2 db = C ·  
 dα  .
2

7.2. Gestures of a Hand We conclude that

Consider the Hands model of the previous section, only δ(θ0 , θ1 ) = C · θ1 − θ0 2 .
relabel the parameter θ of that model by ζ . Then ζ has
several components, controlling the position of the var- Theorem 7.2. Isometry holds for movies of Hand
ious fingers. Now consider a movie showing motion of Gestures.
Image Manifolds which are Isometric to Euclidean Space 21

Rectangle volumes and aspect ratios and common centers. With


α1 and α2 the halflengths in the axial directions, and
θ λα12 θ = (α1 , α2 ), the image Iθ is the the indicator of the
α1 region defined by |x| ≤ α1 , |y| ≤ α2 . Let  be the
rectangle αi ∈ (1/2, 2).
Overlapping Disks Now consider a smooth path in . Because at each
point of ∂ Bθ the normal vector is either coincident with
the motion vector or perpendicular to it, and because
Ellipse articulation in α1 alone has an impact corresponding to
α1 the vertical profile of the rectangle,
α2
θ(3) 
θ(2)
θ(1) g11 = Cτ × dx
Jointed Finger x∈[−α2 ,α2 ]

In short,
Figure 12. Failures: Two Disk Articulation, Rectangle Morphing,
Ellipse Morphing, Jointed Finger.   C α 0 
g11 g12 τ 2
=
g21 g22 0 C τ α1
8. Failures
As the Riemannian structure (gi j ) depends in an es-
The conditions established for Continuum Isomap to sential way on θ, it is not constant, and the abstract
‘work’ can fail in two different ways. First, the param- manifold M is not flat. Hence, we do not have isom-
eter space can be inherently non-convex. Second, the etry between the apparently natural parameter space
manifold can be essentially curved. (, δ) and Euclidean space.
Other examples of this type, e.g. images of ellipses
with various aspect ratios and volumes, are discussed
8.1. Non-Convexity: Two Disks with Exclusion in [3]. Many three-dimensional perspective transfor-
mations result in inherent curvature of the abstract
Consider images of two black disks on a white back- manifold.
ground within a square image as in Fig. 12. Suppose
the disks are forbidden to occupy the same space at the
same time. Partitioning the parameter θ as (θ(1), θ (2)), 8.3. Non-Flatness: Fingers with Multiple Joints
with components denoting the positions of the image
subcomponents, the disks cannot overlap. At any given Imagine a model as in the fourth panel of Fig. 12 where
position of the first disk, the parameter space of the sec- we define a single ‘skeletal finger’ composed of three
ond disk has a hole representing the parameter values connected line segments, thickened slightly into ‘rods’.
leading to overlap with the first disk. As a result, the In this case, we assume that the joints of the finger are
parameter space is not convex. For further examples of negligible compared to the segment lengths (all seg-
non-convexity, see [3]. ments are of length L). Parametrize each of the seg-
In this case, the Riemannian structure gi j (θ ) is pro- ments by an angular position: θi , i = 1, 2, 3, where
portional to the identity everywhere, but the domain is 0 ≤ θi ≤ π/2. The entire skeleton of the finger is then,
not convex. We have local isometry but not isometry. for l ∈ [0, L]:
As it turns out, local isometry is sufficient to allow re-
construction of the parameter space, using one of the S1 = (l cos(θ1 ), l sin(θ1 ))
methods described in [4]. S2 = (L cos(θ1 ) + l cos(θ2 ), L sin(θ1 ) + l sin(θ2 ))
S3 = (L cos(θ1 ) + L cos(θ2 ) + l cos(θ3 ), L sin(θ1 )
8.2. Non-Flatness: Rectangles +L sin(θ2 ) + l sin(θ3 ))

We now turn to examples where the abstract manifold As a result, the normal and motion vectors under a
is not flat. Consider the ‘world’ of rectangles of various three-parameter articulation, (θ1 , θ2 , θ3 ), depend on the
22 Donoho and Grimes

values of the other variables. To achieve the minimal and, if θ evolves along a path ϑ(t), the boundary ∂ Bθ
distance between two positions of θ1 requires that the flows with time according to
other parameters θi , i > 1 be in optimal position, and
the definition of an ‘optimal’ position depends on the  dθi
current value of θ1 . The result is curvature. ν(b, t) = νi (b, ϑ(t)).
i
dt

9. Conclusion We can also parametrize the boundary ∂ Bθ , for all θ


sufficiently close to θ0 , in terms of a function β(b, θ )
We have studied manifolds of model images defined giving the u-coordinate of the boundary point in ∂ Bθ
over the continuum plane, finding that for a num- corresponding to a given b in ∂ Bθ0 . We will make the
ber of interesting articulation families, the resulting quantitative assumption that β is a C 2 function of both
abstract image manifolds are isometric to the under- θ and b jointly. The proof is a series of observations
lying ‘natural’ parameter space. This gives a theo- recorded in the form of Lemmas.
retical framework explaining why the Isomap algo-
rithm can be expected to work in the case of image Lemma A.1. If ψ is C ∞ (R2 ) and of rapid decay at
manifolds. ∞,
Although our analysis abstracts away several possi-

ble confounding factors such as pixelization, the com- d
parison of theoretical predictions with empirical tests It , ψ|t=0 = σ (b) ψ(b) db
dt ∂B
of Isomap shows that the theory is accurate in predict-
ing actual empirical results. In addition, our theoretical where σ (b) ≡ ν(b, 0), n(b).
perspective allows consideration of variations of the
image case (movies and very high-dimensional articu-
Proof: The Tubular Neighborhood Theorem [5] al-
lations, for example) that would not be easy to explore
lows us to write, for small enough  > 0
by empirical techniques.
  β(b,θ )
I − I0 , ψ = ψ(b + un(b))J (u, b) du db
Appendix A: Proof of Theorem 3.1 ∂B 0

Notation: Let It ≡ Iϑ(t) , where Iθ is a function of where J (u, b) is the Jacobian of the cartesian-to-tubular
x ∈ R2 . Let the boundary of the set B be denoted coordinates map (and which is 1 + o(1) as u → 0),
by ∂ B; points in the boundary are b ∈ ∂ B. We assume and where we interpret an 0integral with reversed limits
−|a|
that B and ∂ B have the same topology throughout , according to 0 ≡ − −|a| as usual. Now
and so, for a fixed θ0 there exists a parametrizing map
α : ∂ Bθ0 ×  → R2 using ∂ Bθ0 as a parameter space
β(b, θ ) = σ (b) + O( 2 )
for each boundary point and each θ ⊂ . We also as-
sume that the radius of curvature of ∂ B is uniformly
bounded away from zero both over every b ∈ ∂ Bθ and so
over θ ⊂ . It follows from this that for fixed θ0 there  β(b,θ )
is a tubular neighborhood of ∂ Bθ0 with coordinates
ψ(b + un(b))J (u, b)du ∼ ψ(b)σ (b),
(b, u), where any x in that neighborhood has for its b- 0
coordinate the closest point in ∂ Bθ0 whose normal n(b)  → 0.
is parallel to x − b, and for its u-coordinate the signed
distance to ∂ Bθ0 . Fix now θ0 and consider the coordinate
Dividing by  and letting  → 0 gives the result.
system associated to ∂ Bθ0 . The motion vectors νi (b, θ )
referred to verbally in Section 3 are then defined
by Lemma A.2.

∂ d
νi (b, θ) = α(b, θ), [φh  It (x)]t=0 = φh (x − b)σ (b) db (1.8)
∂θi dt ∂B
Image Manifolds which are Isometric to Euclidean Space 23

Proof: Define the function ψx0 according to ψx0 (y) = b = b + t ḃ + O(t 2 ), and so
φh (x0 − y). Self-adjointness of convolution allows us
   
to write b − b −2 
σ (b)σ (b )ψ h db db
 ∂B ∂B h
1 1   ∞
(φh  (I − I0 ))(x0 ) = (I − I0 )(y)ψx0 (y) dy
  ∼ h −1 σ (b)σ (b)ψ(t ḃ) dt db
∂ B −∞
d
→ It , ψx0 l|t=0 ,  → 0, 
dt ∼ h −1 Cφ × σ 2 (b) db,
∂B
= ψx0 (b)σ (b) db
∂B
where we define
where the last step follows from Lemma (A.1). (1.8) 
ψ(t ḃ) dt = Cφ ,
follows.

Lemma A.3. Let ψh (x) ≡ φ√2h (x). Then which is independent of the choice of unit vector ḃ
because ψ is isotropic. We conclude that
  2
d
(φh  It ) d x   2 
dt d
  (φh  It ) d x ∼ h −1 Cφ σ 2 (b) db,
dt ∂B
= σ (b)σ (b )ψh (b − b ) db db .
∂B ∂B h → 0.

Proof: Combining this result for both (It ), a path of interest,


and (Itτ ), a line segment between landmark points, gives
  2 the result.
d
(φh  It ) d x
dt
  
= φh (x − b)σ (b) db Acknowledgments
R2
  ∂B 
CG would like to thank Vin de Silva and DLD would
× φh (x − b )σ (b ) db d x
∂B like to thank Raphy Coifman, in both cases for dis-
    cussions and references. This work has been partially
= σ (b)σ (b ) φh (x−b)φh (x−b ) d x db db supported by National Science Foundation grants
 ∂ B ∂ B R2
DMS-00-72661, ANI-00-85984, and DMS-01-
= σ (b)σ (b )ψh (b − b ) db db 40698.
∂B ∂B

where ψh denotes the cross-correlation φh ⊗ φh . Be- References


cause φh is Gaussian, its cross-correlation with itself,
ψh , is also √
Gaussian, with a scale parameter larger by 1. P.N. Belhumeur and D.J. Kriegman, “What is the set of images
a factor of 2. of an object under all possible illumination conditions?” Inter-
national Journal of Computer Vision, Vol. 28, No. 3, pp. 1–16,
Now combine the above lemmas. With ψ an 1998.
2. T.F. Cox and M.A.A. Cox, Multidimensional Scaling, Chapman,
appropriately-scaled isotropic Gaussian,
Hall, 1994.
  3. D.L. Donoho and C. Grimes, “When does Isomap recover
b − b 1 the true parametrization of manifolds of articulated images?”
ψh (b − b ) = ψ . Department of Statistics, Stanford University, Technical Report
h h2
TR2002-27, 2002.
4. D. Donoho and C. Grimes, “Hessian Eigenmaps: Locally linear
Note that near b, ∂ B is approximated by L = {b + embedding techniques for high-dimensional data,” Proceedings
t ḃ : t ∈ R} where ḃ is the unit tangent vector to ∂ B of the National Academy of Sciences, Vol. 100, No. 10, pp. 5591–
at b. Therefore, if b ∈ ∂ B is near b, we can write 5596, 2003.
24 Donoho and Grimes

5. A. Gray, Tubes, Addison-Wesley, 1990. in Statistics at Princeton University where his thesis adviser was John
6. C. Grimes, “New Methods in Nonlinear Dimensionality Reduc- W. Tukey and his Ph.D. in Statistics at Harvard University, where his
tion,” PhD, Department of Statistics, Stanford University, Stan- thesis adviser was Peter J. Huber. He is a member of the US National
ford, CA, 2003. Academy of Sciences and of the American Academy of Arts and
7. K.V. Mardia, J.T. Kent, and J.M. Bibby, Multivariate Analysis, Sciences. According to the Institute for Scientific Information, he
Academic Press: Harcourt Brace, 1979. was the most cited author in the Mathematical Sciences for 1991–
8. S. Nayar, S. Baker, and H. Murase, “Parametric feature detec- 2000.
tion,” in Proceedings of the IEEE Conference on Computer Vi-
sion and Pattern Recognition, San Francisco, California, 1996,
pp. 471–477.
9. S.T. Roweis and L.K. Saul, “Nonlinear dimensionality analy-
sis by locally linear embedding,” Science, Vol. 290, No. 5500,
pp. 2323–2326, 2003.
10. J.B. Tenenbaum, V. de Silva, and J.C. Langford. “A global geo-
metric framework for nonlinear dimensionality reduction,” Sci-
ence, Vol. 290, No. 5500, pp. 2319–2323, 2000.

Carrie Grimes is a Senior Research Scientist in the Search Quality


group at Google, Inc. in Mountain View, CA. She received her A.B.
in Anthropology/Archaeology at Harvard University in 1998 where
her thesis advisor was William Fash, and her Ph.D. in Statistics at
Stanford University in 2003 where her thesis advisor was David L.
Donoho.

David L. Donoho is Anne T. and Robert M. Bass Professor in the


Humanities and Sciences at Stanford University. He received his A.B.

You might also like