
SUPPORT VECTOR MACHINE FOR DATA ON MANIFOLDS: AN APPLICATION TO IMAGE ANALYSIS

Suman K. Sen, Mark Foskey, James S. Marron, Martin A. Styner

Medical Image Display & Analysis Group, University of North Carolina, Chapel Hill

ABSTRACT

The Support Vector Machine (SVM) is a powerful tool for classification. We generalize SVM to work with data objects that are naturally understood to lie on curved manifolds, and not in the usual d-dimensional Euclidean space. Such data arise from medial representations (m-reps) in medical images, Diffusion Tensor MRI (DT-MRI), diffeomorphisms, etc. Considering such data objects to be embedded in a higher dimensional Euclidean space results in invalid projections (on the separating direction), while Kernel Embedding does not provide a natural separating direction. We use geodesic distances defined on the manifold to formulate our methodology. This approach addresses the important issue of analyzing the change that accompanies the difference between groups, by implicitly defining the notions of separating surface and separating direction on the manifold. The methods are applied in shape analysis, with the target data being m-reps of 3-dimensional medical images.

Index Terms— Image classification, Image shape analysis.

1. INTRODUCTION

Classification plays an important role in statistical shape analysis [1, 2, 3] of medical images. For example, brain disorders like Alzheimer's and schizophrenia are often accompanied by structural changes. By detecting these changes, classification helps in understanding differences between populations.

Classification methods like Fisher Linear Discrimination (FLD) [4], Support Vector Machines [5, 6] and Distance Weighted Discrimination (DWD) [7] were designed for data which are vectors in Euclidean space, and do not deal extensively with data which are parameterized by elements of curved manifolds. See [8] and [9] for an overview of common existing classification methods. Examples of data on manifolds include m-reps [10, 11] and DT-MRI [12].

SVM has been widely used in image analysis since it handles the issue of High Dimension Low Sample Size (HDLSS) data reasonably well. Kernel Embedding [13] is another approach, where the data are mapped to a higher dimensional feature space and the above mentioned Euclidean methods are applied; SVM is widely used together with Kernel Embedding. The drawback of this method is that the results are not interpretable, since it does not provide a natural direction of separation. This issue is addressed here by generalizing SVM so that it naturally handles data on curved manifolds.

The notion of a separating hyperplane, fundamental to many Euclidean classifiers, is intractable to compute explicitly when generalized to arbitrary manifolds. The approach adopted here is to find control points on the manifold, which represent the different classes of data, and then define the classifier as a function of the (geodesic) distances of individual points from the control points (see Section 2). We thus bypass the problem of explicitly finding separating boundaries on the manifold. In Section 2.3.1, a choice of the control points is proposed which generalizes the SVM criterion to manifolds.

This approach will enable us not only to use the method on our motivating example of m-rep data; it is also applicable to DT-MRI, and to fields like human movement analysis, mechanical engineering, robotics, computer vision and molecular biology, where non-Euclidean data often appear.

1.1. M-reps as Shape Descriptors

The medial representation is based on the medial axis of Blum [14]. There is a medial manifold sampled over an approximately regular lattice, the elements of which are called medial atoms. When a medial figure consists of n medial atoms, its parameters are naturally understood to lie in the cartesian product manifold {R^3 × R^+ × S^2 × S^2}^n [15, 16].

Fig. 1. Medial atom with a cross section of the boundary surface it implies (left). An m-rep model of a hippocampus and its boundary surface (right).
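To make this parameter space concrete, the following is a minimal sketch (our own Python illustration, not the authors' code) of a geodesic distance on the atom manifold R^3 × R^+ × S^2 × S^2 and its n-fold product. The equal weighting of the position, radius, and spoke-direction components is an assumption made here for simplicity; a real implementation would need to commensurate the components.

    import numpy as np

    def sphere_dist(u, v):
        """Geodesic (arc-length) distance between unit vectors on S^2."""
        return np.arccos(np.clip(np.dot(u, v), -1.0, 1.0))

    def atom_dist2(a, b):
        """Squared geodesic distance between two medial atoms.

        Each atom is a tuple (x, r, s0, s1): position x in R^3, radius
        r in R^+ (compared on the log scale, the standard geodesic for
        the positive reals), and two unit spoke directions s0, s1 on S^2.
        Equal weighting of the components is an illustrative choice,
        not prescribed by the paper.
        """
        x_a, r_a, s0_a, s1_a = a
        x_b, r_b, s0_b, s1_b = b
        return (np.sum((x_a - x_b) ** 2)
                + np.log(r_a / r_b) ** 2
                + sphere_dist(s0_a, s0_b) ** 2
                + sphere_dist(s1_a, s1_b) ** 2)

    def mrep_dist(m1, m2):
        """Distance between two m-rep figures (lists of n atoms), i.e. on
        the n-fold product manifold: root of the summed squared atom
        distances."""
        return np.sqrt(sum(atom_dist2(a, b) for a, b in zip(m1, m2)))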



2. METHODS

Here we extend the ideas of separating hyperplane and separating direction (which are the foundations of many Euclidean classification methods such as Mean Difference, FLD, SVM and DWD) to data lying on a manifold. Our solution is based on the idea of control points (and the geodesic distance of data from these control points), as described in the following section.

2.1. Control Points and The General Classification Rule


We think of control points as representatives of the two classes. If we name the control points c_1 and c_2, then the proposed classification rule f_{c_1,c_2}(x) is given by

    f_{c_1,c_2}(x) = d^2(c_2, x) − d^2(c_1, x),    (1)

where c_1, c_2, x ∈ M and d(·,·) is the geodesic distance metric defined on the manifold M. This rule assigns a new point x to class 1 if it is closer to c_1 than to c_2, and to class 2 otherwise.

Fig. 2. Two pairs of control points, showing their respective separating boundary and separating direction on the surface of the sphere. Different colors (with symbols) represent classes. The solid red surface (great circle) separates the data better than the dotted black surface.

2.2. The Implied Separating Surface and Direction of Separation

The zero level set of f_{c_1,c_2}(·) is the analog of the separating hyperplane, while the geodesic joining c_1 and c_2 is the analog of the direction of separation. Thus, the separating surface is the set of points equidistant from c_1 and c_2. If we denote the separating surface by H(c_1, c_2), we can write

    H(c_1, c_2) = {x ∈ M : f_{c_1,c_2}(x) = 0}    (2)
                = {x ∈ M : d^2(c_1, x) = d^2(c_2, x)}.    (3)

In d-dimensional Euclidean space, H(c_1, c_2) is a hyperplane of dimension d − 1 that is the perpendicular bisector of the line segment joining c_1 and c_2. Note that the Mean Difference method is a particular case of this rule, where the control points are the means of the respective classes.

2.3. Choice of Control Points

Having set the framework for the general decision rule on manifolds, the critical issue now is the choice of control points. For example, Fig. 2 shows that for the given set of data, the control points corresponding to the red solid separating boundary do a better job of classification than the pair corresponding to the black dotted boundary. So, the key to the construction of a good classification rule is to find the right pair of control points.

2.3.1. The Manifold SVM (MSVM) Method

MSVM determines a pair of control points that maximizes the minimum distance of the training points to the separating boundary. While the SVM criterion has many interpretations, it is the maximum margin idea that generalizes most naturally to manifolds, where some Euclidean notions such as distance are much more readily available than others (e.g., the inner product). MSVM appears to be the first approach where all calculations are done on the manifold. As in the classical SVM literature, y_i denotes the class label, taking values −1 and 1.

The zero level set of the function f_{c_1,c_2}(·) (defined in (1)) defines the separating boundary H(c_1, c_2) for a given pair (c_1, c_2). Also, let X_{(c_1,c_2)} denote the set (to handle possible ties) of training points which are nearest to H(c_1, c_2). We would like to solve for some c̃_1 and c̃_2 such that

    (c̃_1, c̃_2) = arg max_{(c_1,c_2)} min_{i=1,...,n} d(x_i, H(c_1, c_2)).    (4)

In other words, we want to maximize the minimum distance of the training points from the separating boundary.

It is important to note that the solution (c̃_1, c̃_2) of (4) is not unique. In fact, in the d-dimensional Euclidean case there is a (d − 1)-dimensional space of solutions. Therefore, in order to make the search space for (c̃_1, c̃_2) smaller, we propose to find (c̃_1, c̃_2) as

    (c̃_1, c̃_2) = arg max_{(c_1,c_2) ∈ C_k} min_{i=1,...,n} d(x_i, H(c_1, c_2)),    (5)

where, for a given k > 0,

    C_k = {(c_1, c_2) : ŷ_{c_1,c_2} f_{c_1,c_2}(x̂_{(c_1,c_2)}) = k},    (6)

and

    x̂_{(c_1,c_2)} = arg min_{x ∈ X_{(c_1,c_2)}} f_{c_1,c_2}(x),    (7)

where ŷ_{c_1,c_2} is the class label of x̂_{(c_1,c_2)}.

In Euclidean space, note that the distance of any point x from the separating boundary H(c_1, c_2) is

    d(x, H(c_1, c_2)) = | f_{c_1,c_2}(x) | / (2 d(c_1, c_2)),    (8)

since in that case f_{c_1,c_2} is affine in x, with gradient 2(c_1 − c_2).
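As an illustration, here is a minimal sketch of the decision rule (1) and the Euclidean boundary distance (8), taking the unit sphere S^2 as the manifold M (so that the geodesic distance is arc length); the function names are ours, not from the paper.

    import numpy as np

    def geodesic_dist(p, q):
        """Arc-length distance between unit vectors p, q on the sphere."""
        return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

    def f(c1, c2, x):
        """Decision function (1): d^2(c2, x) - d^2(c1, x)."""
        return geodesic_dist(c2, x) ** 2 - geodesic_dist(c1, x) ** 2

    def classify(c1, c2, x):
        """+1 (class 1) if x is closer to c1, else -1 (class 2),
        matching the labels y_i of the SVM literature."""
        return 1 if f(c1, c2, x) > 0 else -1

    def euclidean_boundary_dist(c1, c2, x):
        """Euclidean-case distance (8) of x from the hyperplane
        H(c1, c2): |f(x)| / (2 d(c1, c2)), with f evaluated using
        Euclidean distances."""
        f_euc = np.sum((x - c2) ** 2) - np.sum((x - c1) ** 2)
        return abs(f_euc) / (2.0 * np.linalg.norm(c1 - c2))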

Therefore, using (5)-(8) and the fact that x̂_{(c_1,c_2)} is one of the training points closest to H(c_1, c_2), the optimization problem can be recast as the following penalized minimization problem: minimize

    g_λ(c_1, c_2) = d^2(c_1, c_2) + (λ/n) Σ_{i=1}^{n} (h_i)_+,    (9)

where λ is the penalty for violating the constraints given by

    h_i = k − y_i {d^2(x_i, c_2) − d^2(x_i, c_1)}.

2.3.2. Tuning Parameter λ

Note that the second term in (9) not only penalizes misclassification, but also penalizes cases where training points come too close to the separating boundary. The parameter λ is called the tuning parameter. For large λ, the training error (the proportion of misclassified training data) tends to decrease. But increasing λ indiscriminately tends to result in overfitting. This tradeoff is reflected by the cross-validation error (the proportion of misclassified test data), which initially decreases, but increases once λ becomes large enough that the error is driven by overfitting. A sensible choice of λ is one with a low cross-validation error.

3. RESULTS

In this section, we compare the performance of MSVM with two other methods. Method (a) is Euclidean SVM on a single tangent plane (with the overall geodesic mean as base point); we will call this method TSVM. Method (b), called Geodesic Mean Difference (GMD), is attained by choosing the control points as the geodesic means of the respective classes.

In our experiment, several values of the tuning parameter λ (λ = 15^k, k = 0, ..., 7) are considered for each of MSVM and TSVM. The choice of the base 15 for λ is not set in stone: we chose it as a reasonable compromise between coverage of a large range of λ and the number of grid points.

3.1. Application to Hippocampi Data

The data consist of 82 m-rep models of hippocampi, 56 of which are from schizophrenic individuals and the remaining 26 from healthy control individuals (see [17]). Each of these models has 24 medial atoms, placed in an 8 × 3 lattice (see Fig. 1).

3.1.1. Training Error and Cross-Validation Error

We conduct the simulation study in the following way. For each run we randomly remove five data points from the population of 82, train our classifiers (for each λ) on the remaining 77 data points, and test on the five held-out points. Aggregating over several simulated replications, the training error and the cross-validation error are calculated.

Fig. 3 shows the performance (training error, left panel; cross-validation error, right panel) of the different methods.

Fig. 3. Left: training errors against log_15 λ. Right: cross-validation errors against log_15 λ. The cross-validation error for MSVM is robust to the choice of λ.

From Fig. 3, we see that MSVM (λ = 1) has training error either very close to that of GMD or substantially smaller. On the other hand, for small values of λ the training error of TSVM is much higher. MSVM fails to attain a training error of zero, while TSVM achieves zero training error (at λ = 15^5 or higher). But this could be due to overfitting by TSVM, an interpretation supported by its increased cross-validation error for high λ values.

We note that the cross-validation error of TSVM is sensitive to the choice of λ, i.e., a good choice of λ appears to be critical. In contrast, MSVM is much more robust to the choice of λ. In particular, the fact that the cross-validation error of MSVM is much more stable for high values of λ is promising. MSVM attains the minimum cross-validation error of all the methods at λ = 15^2.

3.1.2. Shape Change Between the Two Groups

In our context, the rule that best shows structural change in the hippocampus is the most valuable. The structural change captured by each method is shown in Fig. 4. For each classification rule (at the λ with the least cross-validation error), we project the data points onto the direction of separation, and the mean of the projected data is calculated. The projected data points with the lowest and highest projection scores give the extent of structural change captured by the separating direction. The objects on the left are the projected shapes with the highest score, and on the right, with the lowest score. The colormap shows the surface distance maps between the mean (of the projected data points) and the projected shapes.

Fig. 4 shows that GMD captures a large structural change, but its relevance is questionable because of GMD's poor discriminating performance (Fig. 3): it fails to isolate the important features which actually separate the two groups. Among the other methods, MSVM captures the change most strongly.

Fig. 4. Structural change captured by the different methods (rows: GMD, TSVM, MSVM; left: projected model of the extreme patient, right: projected model of the extreme control). Red, green, and blue are inward distance, zero distance, and outward distance, respectively.
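As a sketch of how the pieces fit together, the code below evaluates the penalized criterion (9) on the sphere and runs the leave-five-out protocol of Section 3.1.1 over the λ grid (λ = 15^k). The optimizer minimize_g is a hypothetical placeholder, since the paper does not specify the numerical method used to minimize (9).

    import numpy as np

    def geodesic_dist(p, q):
        """Arc-length distance between unit vectors on the sphere."""
        return np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))

    def f(c1, c2, x):
        """Decision function (1)."""
        return geodesic_dist(c2, x) ** 2 - geodesic_dist(c1, x) ** 2

    def g_lambda(c1, c2, X, y, lam, k=1.0):
        """Penalized MSVM criterion (9): squared control-point distance
        plus the hinge penalties (h_i)_+ averaged and scaled by lambda."""
        h = np.array([k - yi * f(c1, c2, xi) for xi, yi in zip(X, y)])
        return geodesic_dist(c1, c2) ** 2 + lam / len(X) * np.maximum(h, 0.0).sum()

    def cv_error(X, y, lam, minimize_g, n_reps=100, n_test=5, seed=0):
        """Leave-five-out cross-validation error (Sec. 3.1.1): train on
        the remaining points, test on the held-out five, and average."""
        rng = np.random.default_rng(seed)
        errs = []
        for _ in range(n_reps):
            idx = rng.permutation(len(X))
            test, train = idx[:n_test], idx[n_test:]
            c1, c2 = minimize_g(X[train], y[train], lam)  # hypothetical optimizer for (9)
            pred = [1 if f(c1, c2, X[i]) > 0 else -1 for i in test]
            errs.append(np.mean([p != y[i] for p, i in zip(pred, test)]))
        return float(np.mean(errs))

    lambdas = [15.0 ** k for k in range(8)]  # the grid lambda = 15^k, k = 0,...,7
    # errors = [cv_error(X, y, lam, minimize_g) for lam in lambdas]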

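The GMD baseline above needs the geodesic mean of each class as its control points. A standard fixed-point iteration computes such a (Karcher) mean on the sphere by averaging in the tangent space; the sketch below uses our own naming and is only one common way to do this.

    import numpy as np

    def log_map(p, x):
        """Log map on the unit sphere: tangent vector at p pointing to x."""
        cos_t = np.clip(np.dot(p, x), -1.0, 1.0)
        v = x - cos_t * p                      # component of x orthogonal to p
        nv = np.linalg.norm(v)
        return np.zeros_like(p) if nv < 1e-12 else np.arccos(cos_t) * v / nv

    def exp_map(p, v):
        """Exp map on the unit sphere: follow tangent vector v from p."""
        t = np.linalg.norm(v)
        return p if t < 1e-12 else np.cos(t) * p + np.sin(t) * v / t

    def geodesic_mean(points, n_iter=50):
        """Karcher mean: repeatedly lift to the tangent space at the
        current estimate, average, and map back with the exp map."""
        mu = points[0] / np.linalg.norm(points[0])
        for _ in range(n_iter):
            v = np.mean([log_map(mu, x) for x in points], axis=0)
            mu = exp_map(mu, v)
        return mu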
4. DISCUSSION

In this article we attempt to generalize SVM so that it can handle data on manifolds. The method was applied to m-rep models of hippocampi and compared with other approaches, and the results were encouraging. It seems that by virtue of working on the manifold (and not on a tangent plane, like TSVM), MSVM provides a good balance of classifying power and an informative separating direction. TSVM hardly captures any change. This could be related to overfitting, where the separating direction reflects the microscopic noisy features of the training data and thus fails to capture the relevant structural changes.

One could argue for comparing the method with the "appropriate" Kernel Embedding approach. It should be recalled that such an approach does not produce interpretable projected m-reps (necessary to analyze the difference between the groups), and thus we are not interested in it. The possibility of performing Euclidean SVM on multiple tangent planes has not been explored here, but it is a potential area of research. We believe this is one of the first attempts to do classification working on the manifold, and preliminary results suggest that this approach identifies a meaningful separating direction.

5. ACKNOWLEDGEMENTS

The authors would like to thank Sarang Joshi for helpful insights, and Ja-Yeon Jeong and Rohit Saboo for help with the visualization tools and software. This research has been supported by NIH grant NCI/P01 CA47982.

6. REFERENCES

[1] I. Dryden and K. Mardia, "Multivariate shape analysis," Sankhya, vol. 55, pp. 460–480, 1993.

[2] D. Thompson, On Growth and Form, Cambridge University Press, second edition, 1942.

[3] C. G. Small, The Statistical Theory of Shape, Springer, 1996.

[4] R. A. Fisher, "The use of multiple measurements in taxonomic problems," Annals of Eugenics, vol. 7, no. 2, pp. 179–188, 1936.

[5] V. Vapnik, S. Golowich, and A. Smola, "Support vector method for function approximation, regression estimation, and signal processing," Advances in Neural Information Processing Systems, vol. 9, pp. 281–287, 1996.

[6] C. J. Burges, "A tutorial on support vector machines for pattern recognition," Data Mining and Knowledge Discovery, vol. 2, pp. 121–167, 1998.

[7] J. S. Marron, M. Todd, and J. Ahn, "Distance weighted discrimination," Technical Report 1339, Operations Research and Industrial Engineering, Cornell University, 2004.

[8] R. O. Duda, P. E. Hart, and D. G. Stork, Pattern Classification, Wiley-Interscience, 2001.

[9] T. Hastie, R. Tibshirani, and J. H. Friedman, The Elements of Statistical Learning, New York: Springer, 2001.

[10] S. Pizer, D. Fritsch, P. Yushkevich, V. Johnson, and E. Chaney, "Segmentation, registration, and measurement of shape variation via image object shape," IEEE Transactions on Medical Imaging, vol. 18, pp. 851–865, 1999.

[11] P. Yushkevich, P. T. Fletcher, S. Joshi, A. Thall, and S. M. Pizer, "Continuous medial representations for geometric object modeling in 2D and 3D," Image and Vision Computing, vol. 21, no. 1, pp. 17–28, 2003.

[12] P. J. Basser, J. Mattiello, and D. Le Bihan, "MR diffusion tensor spectroscopy and imaging," Biophysical Journal, vol. 66, pp. 259–267, 1994.

[13] B. Schölkopf and A. Smola, Learning with Kernels, Cambridge: MIT Press, 2002.

[14] H. Blum, "A transformation for extracting new descriptors of shape," in W. Wathen-Dunn, editor, Models for the Perception of Speech and Visual Form, pp. 363–380, MIT Press, Cambridge, MA, 1967.

[15] P. T. Fletcher, C. Lu, and S. Joshi, "Statistics of shape via principal geodesic analysis on Lie groups," in Proceedings IEEE Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 95–101, 2003.

[16] P. T. Fletcher, C. Lu, S. M. Pizer, and S. Joshi, "Principal geodesic analysis for the study of nonlinear statistics of shape," IEEE Transactions on Medical Imaging, vol. 23, no. 8, pp. 995–1005, 2004.

[17] M. Styner, J. A. Lieberman, D. Pantazis, and G. Gerig, "Boundary and medial shape analysis of the hippocampus in schizophrenia," MedIA, pp. 197–203, 2004.

