Professional Documents
Culture Documents
Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Alfred Kobsa
University of California, Irvine, CA, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
University of Dortmund, Germany
Madhu Sudan
Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max-Planck Institute of Computer Science, Saarbruecken, Germany
Andrea Torsello Francisco Escolano
Luc Brun (Eds.)
Graph-Based
Representations in
Pattern Recognition
7th IAPR-TC-15 International Workshop, GbRPR 2009
Venice, Italy, May 26-28, 2009
Proceedings
Volume Editors
Andrea Torsello
Department of Computer Science
“Ca’ Foscari” University of Venice
Venice, Italy
E-mail: torsello@dsi.unive.it
Francisco Escolano
Department of Computer Science
and Artificial Intelligence
Alicante University
Alicante, Spain
E-mail: sco@dccia.ua.es
Luc Brun
GreyC
University of Caen
Caen Cedex, France
E-mail: luc.brun@greyc.ensicaen.fr
ISSN 0302-9743
ISBN-10 3-642-02123-9 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-02123-7 Springer Berlin Heidelberg New York
This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
springer.com
© Springer-Verlag Berlin Heidelberg 2009
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper SPIN: 12682966 06/3180 543210
Preface
This volume contains the papers presented at the 7th IAPR-TC-15 Workshop
on Graph-Based Representations in Pattern Recognition – GbR 2009. The work-
shop was held in Venice, Italy between May 26–28, 2009. The previous work-
shops in the series were held in Lyon, France (1997), Haindorf, Austria (1999),
Ischia, Italy (2001), York, UK (2003), Poitiers, France (2005), and Alicante,
Spain (2007).
The Technical Committee (TC15, http://www.greyc.ensicaen.fr/iapr-tc15/)
of the IAPR (International Association for Pattern Recognition) was founded in
order to federate and to encourage research work at the intersection of pattern
recognition and graph theory. Among its activities, the TC15 encourages the
organization of special graph sessions in many computer vision conferences and
organizes the biennial GbR Workshop.
The scientific focus of these workshops covers research in pattern recognition
and image analysis within the graph theory framework. This workshop series
traditionally provides a forum for presenting and discussing research results and
applications in the intersection of pattern recognition, image analysis and graph
theory.
The papers in the workshop cover the use of graphs at all levels of represen-
tation, from low-level image segmentation to high-level human behavior. There
are papers on formalizing the use of graphs for representing and recognizing data
ranging from visual shape to music, papers focusing on the development of new
and efficient approaches to matching graphs, on the use of graphs for super-
vised and unsupervised classification, on learning the structure of sets of graphs,
and on the use of graph pyramids and combinatorial maps to provide suitable
coarse-to-fine representations. Encouragingly, the workshop saw the convergence
of ideas from several fields, from spectral graph theory, to machine learning, to
graphics.
The papers presented in these proceedings were reviewed by at least two
members of the Program Committee, and each paper received an average of three
reviews, with the more contentious papers receiving as many as five. We sincerely
thank all the members of the Program Committee and all the additional referees
for their effort and invaluable help. We received 47 papers from 18 countries and
5 continents. The Program Committee selected 18 of them for oral presentation
and 19 as posters. The resulting 37 papers revised by the authors are published
in this volume.
General Chairs
Andrea Torsello Università Ca’ Foscari di Venezia, Italy
Francisco Escolano Universidad de Alicante, Spain
Luc Brun GREYC ENSICAEN, France
Program Committee
I. Bloch TELECOM ParisTech, France
H. Bunke University of Bern, Switzerland
S. Dickinson University of Toronto, Ontario, Canada
M. Figueiredo Instituto Superior Técnico, Portugal
E. R. Hancock University of York, UK
C. de la Higuera University of Saint-Etienne, France
J.-M. Jolion Universite de Lyon, France
W. G. Kropatsch Vienna University of Technology, Austria
M. Pelillo Università Ca’ Foscari di Venezia, Italy
A. Robles-Kelly National ICT Australia (NICTA), Australia
A. Shokoufandeh Drexel University, PA, USA
S. Todorovic Oregon State University, OR, USA
M. Vento Università di Salerno, Italy
R. Zabih Cornell University, NY, USA
Organizing Committee
S. Rota Bulò Università Ca’ Foscari di Venezia, Italy
A. Albarelli Università Ca’ Foscari di Venezia, Italy
E. Rodolà Università Ca’ Foscari di Venezia, Italy
Additional Referees
Andrea Albarelli Daniela Giorgi Emanuele Rodolà
Xiang Bai Michael Jamieson Samuel Rota Bulò
Sebastien Bougleux Jean-Christophe Janodet Émilie Samuel
Gustavo Carneiro Rolf Lakemper Cristian Sminchisescu
Fatih Demirci Miguel Angel Lozano
Aykut Erdem James Maclean
Sébastien Fourey Anand Rangarajan
Sponsoring Institutions
Dipartimento di Informatica
Università Ca’ Foscari di Venezia, Italy
Table of Contents
Graph Matching
A Polynomial Algorithm for Submap Isomorphism: Application to
Searching Patterns in Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Guillaume Damiand, Colin de la Higuera, Jean-Christophe Janodet,
Émilie Samuel, and Christine Solnon
Graph-Based Segmentation
1 Introduction
This paper is about shape matching by using: (1) a new hierarchical shape representa-
tion, and (2) a new quadratic-assignment objective function that is efficiently optimized
via convexification. Many psychophysical studies suggest that shape perception is the
major route for acquiring knowledge about the visual world [1]. However, while hu-
mans are very efficient in recognizing shapes, this proves a challenging task for com-
puter vision. This is mainly due to certain limitations in existing shape representations
and matching criteria used, which typically cannot adequately address matching of de-
formable shapes. Two perceptually similar deformable shapes may have some parts
that are very different, or even missing, while other parts are very similar. Accounting
for shape parts in matching is therefore important. However, it is not always clear how to
define a shape part. The motivation behind the work described in this paper is to improve
robustness of shape matching by using a rich hierarchical shape representation that will
provide access to all shape parts existing at all scales, and by formulating a matching
criterion that will account for these shape parts and their hierarchical properties.
We address the following problem: Given two shapes find correspondences between
all their parts that are similar in terms of photometric, geometric, and structural prop-
erties, the same holds recursively for their subparts, and the same holds for their neigh-
bor parts. To this end, a shape is represented by a hierarchical attributed graph whose
node attributes encode the intrinsic properties of corresponding multiscale shape parts
(e.g., intensity gradient, length, orientation), and edge attributes capture the strength of
neighbor and part-of interactions between the parts. We formulate shape matching as
finding the subgraph isomorphism that preserves the original graph connectivity and
minimizes a quadratic cost whose linear and quadratic terms account for differences
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 1–10, 2009.
© Springer-Verlag Berlin Heidelberg 2009
2 N. Payet and S. Todorovic
between node and edge attributes, respectively. The cost is defined so as to be invariant
to scale changes and in-plane rotation of the shapes. The search in the matching space
of all shape-part pairs is accelerated by convexifying the quadratic cost, which also re-
duces the chances to get trapped in a local minimum. As our experiments demonstrate,
the proposed approach is robust against large variations of individual shape parts and
partial occlusion.
In the rest of this paper, Sec. 2 points out the main contributions of our approach with re-
spect to prior work, Sec. 3 describes our hierarchical representation of a shape, Sec. 4.1
specifies node and edge compatibilities and formulates our matching algorithm, Sec. 4.2
explains how to convexify and solve the quadratic program, and Sec. 5 presents experi-
mental evaluation of our approach.
This section reviews prior work and points out our main contributions. Hierarchical
shape representations are aimed at efficiently capturing both global and local properties
of shapes, and thus facilitating their matching. Shortcomings of existing representations
typically reduce the efficiency of matching algorithms. For example, the arc-tree [2,3]
trades off its accuracy and stability for lower complexity, since it is a binary tree, gener-
ated by recursively splitting the curve in two halves. Arc-trees are different for similar
shapes with some part variations, which will be hard to match. Another example is the
curvature scale-space [4,5] that loses its descriptive power by pre-specifying the degree
of image decimation (i.e., blurring and subsampling), while capturing salient curvature
points of a contour at different degrees of smoothing. Also, building the articulation-
invariant, part-based signatures of deformable shapes, presented in [6], is sensitive to
the correct identification of the shape’s landmark points and to the multidimensional
scaling and estimating of the shortest path between these points. Other hierarchical
shape descriptions include the Markov-tree graphical models [7], and the hierarchy of
polygons [8] that are based on the restrictive assumptions about the number, size, and
hierarchy depth of parts that a curve consists of. The aforementioned methods encode
only geometric properties of shape parts, and their part-of relationships, yielding a strict
tree. In contrast, we use a more general, hierarchical graph that encodes the strength of
all ascendant-descendant and neighbor relationships between shape parts, as well as
their geometric and photometric properties. The sensitivity of the graph structure to
small shape variations is reduced, since we estimate the shape’s salient points at multi-
ple scales. Also, unlike in prior work, the number of nodes, depth, and branching factor
in different parts of the hierarchical graph are data dependent.
Graph-based shape matching has been the focus of sustained research activity for more
than three decades. Graph matching may be performed by: (i) exploiting spectral prop-
erties of the graphs’ adjacency matrices [9,10]; (ii) minimizing the graph edit-distance
[11,12]; (iii) finding a maximum clique of the association graph [13]; (iv) using the
expectation-maximization of a statistical, generative model [14]. Regardless of a particu-
lar formulation, graph matching in general can be cast as a quadratic assignment problem,
where a linear term in the objective function encodes node compatibility functions, and
a quadratic term encodes edge compatibility functions. Therefore, approaches to graph
Matching Hierarchies of Deformable Shapes 3
matching mainly focus on: (i) finding suitable definitions of the compatibility functions;
and (ii) developing efficient algorithms for approximately solving the quadratic assign-
ment problem (since it is NP-hard), including a suitable reformulation of the quadratic
into linear assignment problem. However, most popular approximation algorithms (e.g.,
relaxation labeling, and loopy belief propagation) critically depend on a good initial-
ization and may be easily trapped in a local minimum, while some (e.g., deterministic
annealing schemes) can be used only for graphs with a small number of nodes. Graduated
nonconvexity schemes [15], and successive convexification methods [16] have been used
to convexify the objective function of graph matching, and thus alleviate these problems.
Since it is difficult to convexify matching cost surfaces that are not explicit functions,
these methods resort to restrictive assumptions about the functional form of a matching
cost, or reformulate the quadratic objective function into a linear program. In this pa-
per, we develop a convexification scheme that shrinks the pool of matching candidates
for each individual node in the shape hierarchy, and thus renders the objective function
amenable to solution by a convex quadratic solver.
Fig. 1. An example contour: (left) Lines approximating the detected contour parts are marked
with different colors. (right) The shape parts are organized in a hierarchical graph that encodes
their part-of and neighbor relationships. Only a few ascendant-descendant and neighbor edges
are depicted for clarity.
larger approximation error than a pre-set threshold. This threshold controls the resolu-
tion level (i.e., scale) at which we seek to represent the contour’s fine details. How to
compute this approximation error is explained later in this section. After the desired
resolution level is reached, the shape parts obtained at different scales can be organized
in a tree structure, where nodes and parent-child (directed) edges represent the shape
parts and their part-of relationships. The number of nodes, depth, and branching factors
of each node of this tree are all automatically determined by the shape at hand.
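A minimal sketch of this recursive splitting step follows. It is illustrative only: the hypothetical `build_shape_tree` below uses the simpler maximum point-to-chord distance as the approximation error, rather than the area-based error defined later in this section, and all names are our own.

```python
import numpy as np

def chord_dist(pts, a, b):
    """Perpendicular distance of each point in pts to the chord a-b."""
    ab = b - a
    n = np.array([-ab[1], ab[0]]) / (np.linalg.norm(ab) + 1e-12)
    return np.abs((pts - a) @ n)

def build_shape_tree(pts, eps):
    """Recursively split the contour pts (N x 2) at the point farthest
    from the chord while the approximation error exceeds eps; nodes
    are dicts {'span': (i, j), 'children': [...]}."""
    def split(i, j):
        node = {'span': (i, j), 'children': []}
        if j - i < 2:
            return node
        d = chord_dist(pts[i + 1:j], pts[i], pts[j])
        if d.max() > eps:
            k = i + 1 + int(np.argmax(d))
            node['children'] = [split(i, k), split(k, j)]
        return node
    return split(0, len(pts) - 1)

# toy contour: a right angle sampled as 9 points
t = np.linspace(0.0, 1.0, 5)
pts = np.vstack([np.c_[t, np.zeros(5)], np.c_[np.ones(4), t[1:]]])
tree = build_shape_tree(pts, eps=0.05)
```

The number of nodes, depth, and branching follow from the contour itself, as in the text: the corner forces one split, and the two straight halves become leaves.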
Transitive closure. Small, perceptually negligible shape variations (e.g., due to varying
illumination in images) may lead to undesired, large structural changes in the shape
tree (e.g., causing a tree node to split into multiple descendants at multiple levels).
As in [18], we address these potential structural changes of the shape tree by adding
new directed edges that connect every node with all of its descendants, resulting in a
transitive closure of the tree. Later, in matching, the transitive closure will allow
the search for a maximally matching node pair to be conducted over all descendants under a
visited ancestor node pair, rather than stopping the search if the ancestors’ children do
not match. This, in turn, will make matching more robust.
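The closure step itself is a simple traversal; the hypothetical helper below, given a parent-to-children map, returns for each node the full set of its descendants, i.e., the directed edges of the transitive closure.

```python
def transitive_closure(children):
    """Given a tree as a parent -> list-of-children map, return a map
    node -> set of ALL its descendants (the edge set of the
    transitive closure of the tree)."""
    desc = {}
    def collect(u):
        s = set()
        for c in children.get(u, []):
            s.add(c)
            s |= collect(c)
        desc[u] = s
        return s
    # roots are parents that never appear as a child
    roots = set(children) - {c for cs in children.values() for c in cs}
    for r in roots:
        collect(r)
    return desc

# tree: a -> {b, c}, b -> {d}; the closure adds the edge a -> d
closure = transitive_closure({'a': ['b', 'c'], 'b': ['d']})
```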
Neighbors. Like other strictly hierarchical representations, the transitive closure of the
shape tree is capable of encoding only a limited description of spatial-layout properties
of the shape parts. For example, it cannot distinguish different layouts of the same set
of parts along the shape. In the literature, this problem has been usually addressed by
associating a context descriptor with each part. In this paper, we instead augment the
transitive closure with new, undirected edges, capturing the neighbor relationships be-
tween parts. This transforms the transitive closure of the shape tree into a more general
graph that we call the shape hierarchy.
Node Attributes. Both nodes and edges of the shape hierarchy are attributed. Node
attributes are vectors whose elements describe photometric and geometric properties of
the corresponding shape part. The following estimates help us define the shape proper-
ties. We estimate the contour’s mean intensity gradient, and use this vector to identify
the contour’s direction – namely, the sequence of points along the shape – by the right-
hand rule. The principal axis of the entire contour is estimated as the principal axis of
an ellipse fitted to all points of the shape. The attribute vector of a node (i.e., shape
part) includes the following properties: (1) length as a percentage of the parent length;
(2) angle between the principal axes of this shape part and its parent; (3) approxima-
tion error estimated as the total area between the shape part and its associated straight
line, expressed as a percentage of the area of the fitted ellipse; (4) signed approximation
error is similar to the approximation error except that the total area between the shape
part and its approximating straight line is computed by accounting for the sign of the
intensity gradient along the shape; and (5) curvature at the two end points of the shape
part. All the properties are normalized to be in [0, 1].
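Two of the five attributes, the length ratio (1) and the principal-axis angle (2), can be sketched as follows. A PCA direction stands in for the paper's ellipse fit, and all names are illustrative.

```python
import numpy as np

def arc_length(pts):
    return np.linalg.norm(np.diff(pts, axis=0), axis=1).sum()

def principal_axis(pts):
    """Dominant direction of the points via PCA (a stand-in for the
    paper's ellipse fit)."""
    c = pts - pts.mean(axis=0)
    return np.linalg.svd(c, full_matrices=False)[2][0]

def node_attributes(part, parent):
    """(1) length as a fraction of the parent length; (2) angle
    between the principal axes of part and parent, mapped to [0, 1]."""
    ratio = arc_length(part) / arc_length(parent)
    cosang = abs(np.clip(principal_axis(part) @ principal_axis(parent),
                         -1.0, 1.0))          # axes carry no sign
    angle = np.arccos(cosang) / (np.pi / 2.0)
    return np.array([ratio, angle])

parent = np.array([[0., 0.], [1., 0.], [1., 1.]])
attrs = node_attributes(parent[:2], parent)    # the horizontal half
```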
Edge Attributes. The attribute of an edge in the shape hierarchy encodes the strength
of the corresponding part-of or neighbor relationship. Given a directed edge between a
shape part and its descendant part, the attribute of this edge is defined as the percentage
that the length of the descendant makes up in the length of the shape part. Thus, the shorter
the descendant, or the longer the ancestor, the smaller the strength of their interaction. The attribute
of an undirected edge between two shape parts can be either 1 or 0, where 1 means that
the parts have one common end point, and 0 means that the parts are not neighbors.
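Both edge attributes follow directly from the part geometry; the helpers below are illustrative, assuming parts are given as point sequences.

```python
import numpy as np

def partof_strength(desc, anc):
    """Directed (part-of) edge attribute: fraction of the ancestor's
    length contributed by the descendant."""
    length = lambda p: np.linalg.norm(np.diff(p, axis=0), axis=1).sum()
    return length(desc) / length(anc)

def neighbor_attr(p1, p2, tol=1e-9):
    """Undirected edge attribute: 1 if the parts share an end point,
    0 otherwise."""
    ends1, ends2 = [p1[0], p1[-1]], [p2[0], p2[-1]]
    return int(any(np.linalg.norm(a - b) < tol
                   for a in ends1 for b in ends2))

part = np.array([[0., 0.], [1., 0.], [1., 1.]])   # an L-shaped part
sub = part[:2]                                     # its horizontal half
other = np.array([[1., 1.], [2., 1.]])             # touches part at (1, 1)
```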
4 Shape Matching
Given two shapes, our goal is to identify best matching shape parts and discard dis-
similar parts, so that the total cost is minimized. This cost is defined as a function of
geometric, photometric, and structural properties of the matched parts, their subparts,
and their neighbor parts, as explained below.
where $A$ is a vector of costs $a_{vv'}$ of matching nodes $v$ and $v'$, and $B$ is a matrix of costs
$b_{vv'uu'}$ of matching edges $(v,u)$ and $(v',u')$. We define $a_{vv'} = \frac{1}{d}\,\|\psi(v)-\psi(v')\|^2$,
where $d$ is the dimensionality of the node attribute vector. Also, we define $b_{vv'uu'}$ so
that matching edges of different types is prohibited, and matches between edges of the
same type with similar properties are favored in (2): $b_{vv'uu'} = \infty$ if edges $(v,u)$ and
$(v',u')$ are not of the same type; and $b_{vv'uu'} = |\phi(v,v') - \phi(u,u')| \in [0,1]$ if edges
$(v,u)$ and $(v',u')$ are of the same type.
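These cost definitions translate almost literally into code. The sketch below uses hypothetical names; the edge "type" distinguishes part-of from neighbor edges.

```python
import numpy as np

def node_cost(psi_v, psi_vp):
    """a_{vv'} = (1/d) ||psi(v) - psi(v')||^2; with attributes in
    [0, 1], the cost also stays in [0, 1]."""
    v, vp = np.asarray(psi_v), np.asarray(psi_vp)
    return float(np.sum((v - vp) ** 2) / len(v))

def edge_cost(type_e, type_ep, phi_e, phi_ep):
    """b_{vv'uu'}: infinite when the edge types (part-of vs. neighbor)
    differ, |phi(v,v') - phi(u,u')| in [0, 1] otherwise."""
    if type_e != type_ep:
        return float('inf')
    return abs(phi_e - phi_ep)

a = node_cost([0.2, 0.4], [0.2, 0.8])          # ≈ 0.16 / 2
b = edge_cost('part-of', 'part-of', 0.5, 0.3)  # same type: |0.5 - 0.3|
```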
The constraints in (2) are typically too restrictive, because of potentially large struc-
tural changes of $V'$ or $E'$ in $H'$ that may be caused by relatively small variations of
certain shape parts. For example, suppose $H$ and $H'$ represent similar shapes. It may
happen that node $v$ in $H$ corresponds to a subgraph consisting of nodes $\{v'_1, \ldots, v'_m\}$
in $H'$, and vice versa. Therefore, a more general many-to-many matching formulation
would be more appropriate for our purposes. The literature reports a number of heuris-
tic approaches to many-to-many matching [19,20,21], which however are developed
only for weighted graphs, and thus cannot be used for our shape hierarchies that have
Fig. 2. Convexification of costs $\{a_{vv'}\}_{v' \in V'}$ for each node $v \in V$. Matching candidates of $v$ that
belong to the region of support of the lower convex hull, $v' \in \breve V(v)$, are marked red.
attributes on both nodes and edges. To relax the constraints in (2), we first match $H$ to
$H'$, which yields solution $X_1$. Then, we match $H'$ to $H$, which yields solution $X_2$. The
final solution, $\tilde X$, is estimated as the intersection of non-zero elements of $X_1$ and $X_2$.
Formally, the constraints are relaxed as follows: (i) $\forall (v,v') \in V \times V'$, $x_{vv'} \ge 0$; and
(ii) $\forall v \in V$, $\sum_{v' \in V'} x_{vv'} = 1$ when matching $H$ to $H'$; and $\forall v' \in V'$, $\sum_{v \in V} x_{vv'} = 1$
when matching $H'$ to $H$.
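The two-directional scheme can be sketched as follows, assuming both solutions are laid out as $|V| \times |V'|$ matrices (names are illustrative).

```python
import numpy as np

def symmetric_match(X1, X2):
    """Final many-to-many solution: keep a pair (v, v') only if it is
    non-zero both when matching H to H' (X1) and when matching H' to
    H (X2); both assumed here in the same |V| x |V'| layout."""
    return (X1 > 0) & (X2 > 0)

X1 = np.array([[0.9, 0.1], [0.0, 1.0]])   # H -> H'
X2 = np.array([[1.0, 0.0], [0.0, 1.0]])   # H' -> H (same layout)
X = symmetric_match(X1, X2)
```

Only the diagonal pairs survive here: (0, 1) is non-zero in `X1` but zero in `X2`, so the intersection discards it.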
The QP in (2) is in general non-convex, and defines a matching space of typically $10^4$
possible node pairs in our experiments. In order to efficiently find a solution, we con-
vexify the QP. This significantly reduces the number of matching candidates.
Given $H$ and $H'$ to be matched, for each node $v \in V$ of $H$, we identify those
matching candidates $v' \in V'$ of $H'$ that form the region of support of the lower convex
hull of costs $\{a_{vv'}\}_{v' \in V'}$, as illustrated in Fig. 2. Let $\breve V(v) \subset V'$ denote this region
of support of the convex hull, and let $\tilde V(v) \subset V'$ denote the set of true matches of
node $v$ that minimize the QP in (2) (i.e., the solution). Then, by definition, we have
that $\tilde V(v) \subseteq \breve V(v)$, i.e., the true matches must be located in the region of support of
the convex hull. It follows that, for each node $v \in V$, we can discard those matching
candidates from $V'$ that do not belong to $\breve V(v)$. In our experiments, we typically ob-
tain $|\breve V(v)| \ll |V'|$, which leads to a dramatic reduction of the dimensionality of the
original matching space, $|V \times V'|$.
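The pruning step amounts to a lower-convex-hull computation over the per-candidate costs (cf. Fig. 2); a monotone-chain sketch, with illustrative names:

```python
def lower_hull_support(costs):
    """Indices of candidates lying on the lower convex hull of the
    points (i, costs[i]) -- the region of support used to prune the
    matching candidates of a node."""
    hull = []
    for p in enumerate(costs):
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            # pop hull[-1] when it lies on or above the segment
            # hull[-2] -> p (non-left turn)
            if (x2 - x1) * (p[1] - y1) - (y2 - y1) * (p[0] - x1) <= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return [i for i, _ in hull]

support = lower_hull_support([5.0, 3.0, 4.0, 1.0, 2.0, 0.5, 3.0, 4.0])
```

Note that the cheapest candidate always lies on the lower hull, consistent with $\tilde V(v) \subseteq \breve V(v)$, while candidates above the hull are discarded.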
In summary, we compute $\breve A$, $\breve X$, $\breve B$ from the original $A$, $X$, $B$, respectively, by deleting
all their elements $a_{vv'}$, $x_{vv'}$, $b_{vv'uu'}$ for which $v' \notin \breve V(v)$. Then, we use the standard
interior-reflective Newton method to solve the following program:
$$\min_{\breve X}\ \beta\,\breve A^T \breve X + (1-\beta)\,\breve X^T \breve B\,\breve X, \qquad (3)$$
$$\text{s.t. } \forall (v,v') \in V \times \breve V(v):\ x_{vv'} \ge 0; \quad \forall v \in V:\ \sum_{v' \in \breve V(v)} x_{vv'} = 1.$$
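A rough stand-in for solving (3) follows. The paper uses MATLAB's interior-reflective Newton method; here we substitute projected gradient descent onto the row-wise simplex, assuming a symmetric $B$, purely as an illustration of the constrained QP.

```python
import numpy as np

def project_simplex(v):
    """Euclidean projection of v onto {x : x >= 0, sum(x) = 1}."""
    u = np.sort(v)[::-1]
    css = np.cumsum(u) - 1.0
    rho = np.nonzero(u * np.arange(1, len(v) + 1) > css)[0][-1]
    return np.maximum(v - css[rho] / (rho + 1.0), 0.0)

def solve_qp(a, B, beta=0.4, steps=500, lr=0.05):
    """Minimize beta * a.x + (1 - beta) * x.B.x subject to x >= 0 and
    each row of x summing to 1, by projected gradient descent
    (a stand-in for the interior-reflective Newton solver).
    a: (n, m) node costs; B: (n*m, n*m), assumed symmetric."""
    n, m = a.shape
    x = np.full((n, m), 1.0 / m)
    for _ in range(steps):
        g = beta * a + 2.0 * (1 - beta) * (B @ x.ravel()).reshape(n, m)
        x = np.apply_along_axis(project_simplex, 1, x - lr * g)
    return x

a = np.array([[0.0, 1.0], [1.0, 0.0]])    # node costs favor identity
x = solve_qp(a, np.zeros((4, 4)))
```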
5 Results
This section presents the experimental evaluation of our approach on the standard
MPEG-7 and Brown shape datasets [12]. MPEG-7 has 1400 silhouette images show-
ing 70 different object classes, with 20 images per object class, as illustrated in Fig. 3.
MPEG-7 presents many challenges due to a large intra-class variability within each
class, and small differences between certain classes. The Brown shape dataset has 11
examples from 9 different object categories, totaling 99 images. This dataset introduces
additional challenges, since many of the shapes have missing parts (e.g., due to occlu-
sion), and the images may contain clutter in addition to the silhouettes, as illustrated
in Figs. 1, 4, 5. We use the standard evaluation on both datasets. For every silhouette
in MPEG-7, we retrieve the 40 best matches, and count the number of those that are
in the same class as the query image. The retrieval rate is defined as the ratio of the
total number of correct hits obtained and the best possible number of correct hits. The
latter number is 1400 · 20. Also, for each shape in the Brown dataset, we first retrieve
the 10 best matches, then, check if they are in the same class as the query shape, and,
finally, compute the retrieval rate, as explained above. Input to our algorithm consists
of two parameters: the fine-resolution level (approximation error defined in Sec. 3) of
representing the contour, and β. For silhouettes in both datasets, and the approximation
error (defined in Sec. 3) equal to 1%, we obtain shape hierarchies with typically 50-100
nodes, maximum hierarchy depths of 5-7, and maximum branching factors of 4-6. For
every query shape, the distances to other shapes are computed as the normalized total
matching cost, $D$, between the query and these other shapes. If $X$ is the solution of our
quadratic program, then $D = [\beta A^T X + (1-\beta) X^T B X]\,/\,[|V| + |V'|]$, where $|V|$ is
the total number of nodes in one shape hierarchy. Matching two shape hierarchies takes
about 5–10 s in MATLAB on a 3.1 GHz, 1 GB RAM PC.
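The distance $D$ and the retrieval protocol can be sketched as follows (illustrative implementations; the toy distance matrix below stands in for real matching costs).

```python
import numpy as np

def normalized_cost(A, B, X, nV, nVp, beta=0.4):
    """D = [beta * A^T X + (1 - beta) * X^T B X] / (|V| + |V'|)."""
    X = np.ravel(X)
    return (beta * A @ X + (1 - beta) * X @ B @ X) / (nV + nVp)

def retrieval_rate(dist, labels, k, class_size):
    """Correct hits among each query's k best matches, divided by the
    best possible number of hits (1400 * 20 for MPEG-7)."""
    hits = 0
    for q in range(len(labels)):
        best = np.argsort(dist[q])[:k]
        hits += sum(labels[i] == labels[q] for i in best)
    return hits / (len(labels) * min(k, class_size))

# toy distance matrix: two classes, same-class shapes are closer
dist = np.array([[0, 1, 5, 6], [1, 0, 6, 5],
                 [5, 6, 0, 1], [6, 5, 1, 0]], float)
rate = retrieval_rate(dist, [0, 0, 1, 1], k=2, class_size=2)
```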
Qualitative evaluation. Fig. 3 shows a few examples of our shape retrieval results on
MPEG-7. From the figure, our approach makes errors mainly due to the non-optimal
pre-setting of the fine-resolution level at which contours are represented by the shape
hierarchy. Also, some object classes in the MPEG-7 are characterized by multiple
Fig. 3. MPEG-7 retrieval results on three query examples and comparison with [6]. For each
query, we show 11 retrieved shapes with smallest to highest cost. (top) Results of [6]. (bottom)
Our results. Note that for deer we make the first mistake at the 6th retrieval, and then get confused
with shapes whose parts are very similar to those of deer. Mistakes for other queries usually occur
due to failing to capture fine details of the curves in the shape hierarchy in our implementation.
Fig. 4. The Brown dataset – each of the four columns shows one example pair of silhouettes, and
each of the two rows shows shape parts at a specific scale that got matched; top row shows finer
scale and bottom row shows coarser scale. As can be seen, silhouettes that belong to the same
class may have large differences; despite the differences, corresponding parts got successfully
matched (each match is marked with unique color).
Fig. 5. The Brown dataset – two example pairs of silhouettes, and their shape parts that got
matched. The shapes belong to different classes, but the algorithm identifies their similar parts, as
expected (each match is marked with unique color). The normalized total matching cost between
the bunny and gen (left), or the fish and tool (right) is larger than the costs computed for the
examples shown in Fig 4, since there are fewer similar than dissimilar parts. (β = 0.4).
disjoint contours, whereas our approach is aimed at matching only one single contour
at a time. Next, Fig. 4 shows four example pairs of silhouettes from the same class,
and their matched shape parts. Similar shape parts at multiple scales got successfully
matched in all cases, as expected. Fig. 5 presents two example pairs of silhouettes that
belong to different classes. As in the previous case, similar shape parts got successfully
matched; however, since there are fewer similar than dissimilar parts the normalized
total matching cost in this case is larger. This helps discriminate between the shapes
from different classes in the retrieval.
Quantitative evaluation. To evaluate the sensitivity of our approach to input param-
eter β, we compute the average retrieval rate on the Brown dataset as a function of
input β = 0.1 : 0.1 : 0.9. The maximum retrieval rate of 99% is obtained for β = 0.4,
and for β ∈ {0.3, 0.5, 0.6} we obtain a rate of 98%, while for other values of β
the retrieval rate gracefully decreases. This suggests that both intrinsic properties of
shape parts and their spatial relations are important for shape matching, and that our
approach benefits from modeling both.
Retrieval rates (%) per rank of retrieved shape, Brown dataset:

Approach      1st  2nd  3rd  4th  5th  6th  7th  8th  9th  10th
[12]           99   99   99   98   98   97   96   95   93    82
[6]            99   99   99   98   98   97   97   98   94    79
[3]            99   99   99   99   99   99   99   97   93    86
Our method     99   99   98   98   98   97   96   94   93    82
6 Conclusion
Matching deformable shapes is difficult: two perceptually similar shapes may
have some parts that are very different, or even missing. We have presented an approach aimed
at robust matching of deformable shapes by identifying multiscale salient shape parts,
and accounting for their intrinsic properties, and part-of and neighbor relationships.
Experimental evaluation of the proposed hierarchical shape representation and shape
matching via minimizing a quadratic cost has demonstrated that the approach robustly
deals with large variations or missing parts of perceptually similar shapes.
References
1. Biederman, I.: Recent psychophysical and neural research in shape recognition. In: Osaka,
N., Rentschler, I., Biederman, I. (eds.) Object Recognition, Attention, and Action, pp. 71–88.
Springer, Heidelberg (2007)
2. Günther, O., Wong, E.: The arc tree: an approximation scheme to represent arbitrary curved
shapes. Comput. Vision Graph. Image Process. 51(3), 313–337 (1990)
3. Felzenszwalb, P.F., Schwartz, J.D.: Hierarchical matching of deformable shapes. In: CVPR
(2007)
4. Mokhtarian, F., Mackworth, A.K.: A theory of multiscale, curvature-based shape representa-
tion for planar curves. IEEE TPAMI 14(8), 789–805 (1992)
5. Ueda, N., Suzuki, S.: Learning visual models from shape contours using multiscale con-
vex/concave structure matching. IEEE TPAMI 15(4), 337–352 (1993)
6. Ling, H., Jacobs, D.: Shape classification using the inner-distance. IEEE TPAMI 29(2), 286–
299 (2007)
7. Fan, X., Qi, C., Liang, D., Huang, H.: Probabilistic contour extraction using hierarchical
shape representation. In: ICCV, pp. 302–308 (2005)
8. Mcneill, G., Vijayakumar, S.: Hierarchical procrustes matching for shape retrieval. In: CVPR
(2006)
9. Siddiqi, K., Shokoufandeh, A., Dickinson, S.J., Zucker, S.W.: Shock graphs and shape match-
ing. Int. J. Comput. Vision 35(1), 13–32 (1999)
10. Shokoufandeh, A., Macrini, D., Dickinson, S., Siddiqi, K., Zucker, S.W.: Indexing hierarchi-
cal structures using graph spectra. IEEE TPAMI 27(7), 1125–1140 (2005)
11. Bunke, H., Allermann, G.: Inexact graph matching for structural pattern recognition. Pattern
Rec. Letters 1(4), 245–253 (1983)
12. Sebastian, T.B., Klein, P.N., Kimia, B.B.: Recognition of shapes by editing their shock
graphs. IEEE Trans. Pattern Anal. Machine Intell. 26(5), 550–571 (2004)
13. Pelillo, M., Siddiqi, K., Zucker, S.W.: Matching hierarchical structures using association
graphs. IEEE TPAMI 21(11), 1105–1120 (1999)
14. Tu, Z., Yuille, A.: Shape matching and recognition - using generative models and informative
features. In: Pajdla, T., Matas, J(G.) (eds.) ECCV 2004. LNCS, vol. 3023, pp. 195–209.
Springer, Heidelberg (2004)
15. Gold, S., Rangarajan, A.: A graduated assignment algorithm for graph matching. IEEE
TPAMI 18(4), 377–388 (1996)
16. Jiang, H., Drew, M.S., Li, Z.N.: Matching by linear programming and successive convexifi-
cation. IEEE TPAMI 29(6), 959–975 (2007)
17. Teh, C.H., Chin, R.T.: On the detection of dominant points on digital curves. IEEE Trans.
Pattern Anal. Mach. Intell. 11(8), 859–872 (1989)
18. Torsello, A., Hancock, E.R.: Computing approximate tree edit distance using relaxation la-
beling. Pattern Recogn. Lett. 24(8), 1089–1097 (2003)
19. Pelillo, M., Siddiqi, K., Zucker, S.W.: Many-to-many matching of attributed trees using as-
sociation graphs and game dynamics. In: Arcelli, C., Cordella, L.P., Sanniti di Baja, G. (eds.)
IWVF 2001. LNCS, vol. 2059, pp. 583–593. Springer, Heidelberg (2001)
20. Demirci, M.F., Shokoufandeh, A., Keselman, Y., Bretzner, L., Dickinson, S.J.: Object recog-
nition as many-to-many feature matching. Int. J. Computer Vision 69(2), 203–222 (2006)
21. Todorovic, S., Ahuja, N.: Region-based hierarchical image matching. Int. J. of Computer
Vision 78(1), 47–66 (2008)
Edition within a Graph Kernel Framework for
Shape Recognition
1 Introduction
The skeleton is a key feature within the shape recognition framework [1,2,3].
Indeed, this representation holds many properties: it is a thin set, homotopic to
the shape and invariant under Euclidean transformations. Moreover, any shape
can be reconstructed from the maximal circles of the skeleton points.
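The reconstruction property can be illustrated on a binary grid: the union of the maximal disks attached to the skeleton points recovers the shape. A sketch, using the simplest case of a discrete disk, whose skeleton is its single center point (names are illustrative):

```python
import numpy as np

def reconstruct(shape_hw, centers, radii):
    """Rasterize the union of the maximal disks (center, radius) of a
    skeleton: the medial-axis transform recovers the shape."""
    h, w = shape_hw
    yy, xx = np.mgrid[0:h, 0:w]
    out = np.zeros((h, w), bool)
    for (cy, cx), r in zip(centers, radii):
        out |= (yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2
    return out

# a discrete disk of radius 2: one skeleton point, one maximal circle
img = reconstruct((7, 7), [(3, 3)], [2])
```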
The set of points composing a skeleton does not highlight the structure of a
shape. Consequently, the recognition step is usually based on a graph compari-
son where graphs encode the main properties of the skeletons. Several encoding
systems have been proposed: Di Ruberto [4] proposes a direct translation of the
skeleton to the graph using many attributes. Siddiqi [5] proposes a graph which
characterises both structural properties of a skeleton and the positive, negative
or null slopes of the radius of the maximal circles along a branch. Finally this
last encoding has been improved and extended to 3D by Leymarie and Kimia [6].
The recognition of shapes using graph comparisons may be tackled using
various methods. A first family of methods is based on the graph edit distance
which is defined as the minimal number of operations required to transform
the graph encoding the first shape into the graph encoding the second one [2,3].
Another method, introduced by Pelillo [1], transforms graphs into trees and then
This work is performed in close collaboration with the laboratory Cycéron and is
supported by the CNRS and the région Basse-Normandie.
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 11–20, 2009.
© Springer-Verlag Berlin Heidelberg 2009
12 F.-X. Dupé and L. Brun
models the tree matching problem as a maximal clique problem within a specific
association graph. A last method, proposed by Bai and Latecki [7], compares
paths between end-nodes (nodes with only one neighbor) after a matching task on
the end-nodes. Contrary to the previously mentioned approaches, this last method
can deal with loops and may thus characterize shapes with holes.
All the above methods operate in graph space, which carries almost no
mathematical structure. This rules out many common mathematical tools; for instance,
the mean of a set of graphs has to be replaced by its median. A solution consists in
projecting graphs into a richer space. Graph kernels provide such an embedding: by
using appropriate kernels, graphs can be mapped either explicitly or implicitly
into a vector space whose dot product corresponds to the kernel function.
The most famous graph kernels are the random walk kernel, the marginalized
graph kernel and the geometric kernel [8]. A further family of kernels is based on
the notion of bag of paths [9]. These methods describe each graph by a subset of
its paths, the similarity between two graphs being deduced from the similarities
between their paths. Path similarity is based on a comparison between the edge
and node attributes of both paths.
However, skeletonization is not a continuous process and small perturbations
of a shape may produce ligatures and spurious branches. Graph kernels may
in this case lead to inaccurate comparisons. Neuhaus and Bunke have proposed
several kernels (e.g. [10]) based on the graph edit distance in order to reduce
the influence of graph perturbations. However, the graph edit distance does not
usually fulfill all the properties of a metric, and the design of a positive-definite
kernel from such a distance is not straightforward. Our approach is slightly
different: instead of considering a direct edit distance between graphs,
our kernel is based on a rewriting process applied to the bags of paths of two
graphs. The path rewriting follows the same basic idea as the string edit
distance but provides a positive-definite kernel between paths.
This paper follows a first contribution [11] where we introduced the notion
of path rewriting within the graph kernel framework. It is structured as follows:
first, we recall how to construct a bag of path kernel [9,11] (Section 2). Then, we
propose a graph structure (Section 3) which encodes both the structure of the
skeleton and its major characteristics. This graph contains a sufficient amount
of information for shape reconstruction. We then extend the edition operations
(Section 4) by taking into account all the attributes and by controlling the effect
of the edition on them. Finally, we present experiments (Section 5) in order to
highlight the benefit of the edition process.
Let us consider a graph G = (V, E) where V denotes the set of vertices and
E ⊂ V × V the set of edges. A bag of paths P associated with G is defined as a
set of paths of G whose cardinality is denoted by |P|. Let us denote by Kpath
a generic path kernel. Given two graphs G1 and G2 and two paths h1 ∈ P1
and h2 ∈ P2 of respectively G1 and G2, Kpath(h1, h2) may be interpreted as a
measure of the similarity of the two paths.
Desobry [12] proposed a general approach for the comparison of two sets which
has straightforward applications in the design of a bag of path kernel (bags
are sets). The two bags are modelled as the observation of two sets of random
variables in a feature space.
Desobry proposes to estimate a distance between the two distributions without
explicitly building the pdf of the two sets. The considered feature space is based
on a normalised kernel, K(h, h′) = Kpath(h, h′)/√(Kpath(h, h) Kpath(h′, h′)). Using
such a kernel we have ‖h‖²_K = K(h, h) = 1 for any path. The image in the feature
space of our set of paths thus lies on a hypersphere of radius 1 centered at the
origin (Fig. 1(a)). Using the one-class ν-SVM, we associate a set of paths to a
region on this sphere. This region corresponds to the density support estimate
of the set of paths’ unknown pdf.
Once the two density supports are estimated, the one-class SVM yields w1
(resp. w2 ), the mean vector, and ρ1 (resp. ρ2 ), the ordinate at the origin, for the
first bag (resp. the second bag). In order to compare the two mean vectors w1
and w2 , we define the following distance function:
dmean(w1, w2) = arccos( w1ᵀ K1,2 w2 / (‖w1‖ ‖w2‖) ),   (1)
Fig. 1. (a) Separating two sets on the unit sphere using one-class SVM. The symbols
(w1, ρ1) and (w2, ρ2) denote the parameters of the two hyperplanes, which are
represented by dashed lines. (b) Original skeleton, (c) spurious branch and (d) ligature:
influence of small perturbations on the skeleton (in black).
the difference between the two coordinates at the origin (ρ1 and ρ2):

Kchange(P1, P2) = exp(−dmean(w1, w2)²/(2σmean²)) · exp(−(ρ1 − ρ2)²/(2σorigin²)).   (2)
Finally, we define the kernel between two graphs G1, G2 as the kernel between
their two bags of paths: Kchange(G1, G2) = Kchange(P1, P2).
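As a concrete illustration (a sketch, not the authors' implementation), the combination of Eqs. (1) and (2) can be written in a few lines of NumPy. The coefficient vectors a1, a2 and offsets ρ1, ρ2 are assumed to come from the one-class SVM step, and K11, K22, K12 are the path-kernel Gram matrices within and between the two bags:

```python
import numpy as np

def k_change(a1, a2, rho1, rho2, K11, K22, K12,
             sigma_mean=1.0, sigma_origin=20.0):
    # Norms of the mean vectors w1, w2 in the (implicit) feature space.
    norm1 = np.sqrt(a1 @ K11 @ a1)
    norm2 = np.sqrt(a2 @ K22 @ a2)
    # Angle between the two mean vectors on the unit hypersphere (Eq. 1).
    cos = (a1 @ K12 @ a2) / (norm1 * norm2)
    d = np.arccos(np.clip(cos, -1.0, 1.0))
    # Product of two Gaussian terms on the angle and on the offsets (Eq. 2).
    return (np.exp(-d**2 / (2 * sigma_mean**2)) *
            np.exp(-(rho1 - rho2)**2 / (2 * sigma_origin**2)))
```

For two identical bags (same coefficients and offsets) the angle is zero and the kernel value is 1, as expected from the normalisation.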
The distance between the mean vectors is a metric based on a normalized
scalar product combined with arccos, which is bijective on [0, 1]. However, since the
relationship between the couple (w, ρ) and the bag of paths is not bijective,
the final kernel between bags is only positive semi-definite [13]. Nevertheless, in all
the experiments run so far, the Gram matrices associated to the bags of paths
were positive-definite.
where ϕ(v) and ψ(e) denote respectively the vectors of features associated to the
node v and the edge e. The terms Kv and Ke denote two kernels for node
and edge features respectively. For the sake of flexibility and simplicity, we use
Gaussian RBF kernels based on the distance between the attributes defined in
Section 3.2.
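A minimal sketch of a tensor-product path kernel of this kind, assuming scalar node and edge attributes: the structure used here (a product of Kv and Ke factors along the two paths, zero for paths of different lengths) is our reading of the construction, not the paper's exact definition:

```python
import math

def rbf(a, b, sigma):
    # Gaussian RBF on the attribute distance d(a, b) = |a - b|.
    return math.exp(-abs(a - b)**2 / (2 * sigma**2))

def k_path(nodes1, edges1, nodes2, edges2, sigma_v=0.1, sigma_e=0.1):
    # Tensor-product path kernel: product of node and edge kernels
    # along the two paths; only defined for paths of equal length.
    if len(nodes1) != len(nodes2):
        return 0.0
    k = rbf(nodes1[0], nodes2[0], sigma_v)
    for e1, e2, v1, v2 in zip(edges1, edges2, nodes1[1:], nodes2[1:]):
        k *= rbf(e1, e2, sigma_e) * rbf(v1, v2, sigma_v)
    return k
```

With this structure the kernel of a path with itself is 1, consistent with the normalisation used in Section 2.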
3 Skeleton-Based Graph
3.1 Graph Representations
Medial-axis based skeletons are built upon a distance function whose evolution
along the skeleton is generally modeled as a continuous function. This function
presents important changes of slope, mainly located at the transitions between
two parts of the shape. Based on this remark, Siddiqi and Kimia distinguish three
kinds of branches within the shock graph construction scheme [2]: branches with
positive, null or negative slopes. Nodes corresponding to these slope transitions
are inserted within the graph. Such nodes may thus have a degree 2. Finally,
edges are directed using the slope sign information.
Compared to the shock graph representation, we do not use oriented edges
since small positive or negative values of the slope may change the orientation
of an edge and thus alter the graph representation. On the other hand our set
of nodes corresponds to junction points and to any point encoding an important
change of slope of the radius function. Such a significant change may encode a
change from a positive to a negative slope but also an important change of slope
with a same sign (Fig. 2(a)). Encoding these changes improves the detection of
the different parts of the shape. The main difficulty remains the detection of the
slope changes due to the discrete nature of the data. The slopes are obtained
using regression methods based on first order splines [15]. These methods are
robust to discrete noise and first order splines lead to a continuous representation
of the data. Moreover, such methods intrinsically select the most significant
slopes using a stochastic criterion. Nodes encoding slope transitions are thus
located at the junctions (or knots) between first order splines.
3.2 Attributes
The graph associated with a shape only provides information about its structural
properties. Additional geometrical properties of the shape may be encoded using
node and edge attributes. From a structural point of view, a node represents a
particular point inside the shape skeleton and an edge a branch. However, a
branch also represents the set of points of the shape which are closer to this
branch than to any other one. This set of points is defined as the influence zone
of the branch and can be computed using SKIZ transforms [16].
Descriptors computed from the influence zone are called local, whilst the ones
computed from the whole shape are called global. In [3] Goh introduces this
notion and points out that an equilibrium between local and global descriptors is
crucial for the efficiency of a shape matching algorithm. Indeed, local descriptors
provide robustness against occlusions, while global ones provide robustness
against noise.
We have thus selected a set of attributes which provides an equilibrium be-
tween local and global features. Torsello in [17] proposes as edge attribute an
approximation of the perimeter of the boundary which contributes to the forma-
tion of the edge, normalized by the approximated perimeter of the whole shape.
Suard proposes [9] as node attribute the distance between the node position and
the gravity center of the shape divided by the square of the shape area. These
two attributes correspond to our global descriptors.
Goh proposes several local descriptors [3] for edges based on the evolution
of the radius of the maximal circle along a branch. For each point (x(t), y(t))
of a branch, t ∈ [0, 1], we consider the radius R(t) of its maximal circle. In or-
der to normalize the data, the radius is divided by the square root of the area
of the influence zone of the branch. We also introduce α(t), the angle formed
by the tangent vector at (x(t), y(t)) and the x-axis. Then we consider (ak )k∈N
and (bk )k∈N the coefficients of two regression polynomials that fit respectively
R(t) and α(t) in the least squares sense. If both polynomials are of sufficient
orders, the skeleton, and hence the shape, can be reconstructed from the graph
(Section 1). Following Goh [3], our two local descriptors are defined by Σk ak/k
and Σk bk/k.
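These two local descriptors can be sketched as follows, assuming sampled R(t) and α(t) and least-squares polynomial fits; the restriction of the sums to k ≥ 1 is our reading (the k = 0 term would divide by zero), and the function name is illustrative:

```python
import numpy as np

def local_descriptors(R, alpha, t, order=3):
    # Fit R(t) and alpha(t) with least-squares polynomials; polyfit
    # returns coefficients in increasing degree (a[k] multiplies t^k).
    a = np.polynomial.polynomial.polyfit(t, R, order)
    b = np.polynomial.polynomial.polyfit(t, alpha, order)
    # Combine coefficients as sum_{k>=1} a_k / k and sum_{k>=1} b_k / k.
    k = np.arange(1, order + 1)
    return np.sum(a[1:] / k), np.sum(b[1:] / k)
```

For example, a branch with linearly growing radius R(t) = t yields a radius descriptor of 1, since the only nonzero coefficient is a1 = 1.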
The distance associated to each attribute is defined as the absolute value of
the difference between the values a and b of the attribute: d(a, b) = |a − b|. As the
attributes are normalized, the distances are invariant to changes of scale and
rotation. Such distances are used to define the Gaussian RBF kernels,
exp(−d(·,·)²/(2σ²)), used to design Kpath (Section 2.2).
4 Hierarchical Kernels
The node suppression operation removes a node from the path and all the graph
structures that are connected to this path by this node. Within the path, the
two edges incident to the node are then merged. This operation corresponds
to the removal of a part of the shape: for example, if we remove the node 2 in
Fig. 2(b1), a new shape similar to Fig. 2(b2) is obtained.
The edge contraction operation contracts an edge and merges its two extremity
nodes. This results in a contraction of the shape: for example, if we contract
the edge e1,2 of the shape in Fig. 2(b1) then the new shape will be similar to
Fig. 2(b3).
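The two edit operations can be sketched on a path whose node and edge attributes are scalars. The merge rules chosen here (adding edge attributes, averaging node attributes) are illustrative assumptions, not the paper's exact definitions:

```python
def suppress_node(nodes, edges, i, merge=lambda e1, e2: e1 + e2):
    # Remove inner node i from a path and merge its two incident edges.
    assert 0 < i < len(nodes) - 1, "only inner path nodes can be suppressed"
    new_nodes = nodes[:i] + nodes[i+1:]
    new_edges = edges[:i-1] + [merge(edges[i-1], edges[i])] + edges[i+1:]
    return new_nodes, new_edges

def contract_edge(nodes, edges, i, merge=lambda v1, v2: 0.5 * (v1 + v2)):
    # Contract edge i and merge its two extremity nodes.
    new_nodes = nodes[:i] + [merge(nodes[i], nodes[i+1])] + nodes[i+2:]
    new_edges = edges[:i] + edges[i+1:]
    return new_nodes, new_edges
```

Each operation shortens the path by one element, mirroring the removal of a shape part (node suppression) or the contraction of a shape section (edge contraction).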
where σcost is a tuning variable. This kernel is composed of two parts: a scalar
product of the edition costs in a particular space and a path kernel. For a small
value of σcost the behavior of the kernel will be close to Kclassic, as only low
edition costs will contribute to Kedit(h, h′). Conversely, for a high value every
edition will contribute to Kedit(h, h′) with approximately equal importance.
The kernel Kclassic is a tensor product kernel based on positive-definite kernels
(Section 2.2), so it is positive-definite. The kernel over edition costs is constructed
from a scalar product and is thus positive-definite. These two last kernels form
a tensor product kernel. Finally, Kedit is proportional (by a factor D + 1) to an
R-convolution kernel [18, Lemma 1]; this kernel is thus positive-definite.
5 Experiments
For the following experiments, we defined the importance of a path as the sum of
the weights of its edges. For each graph, we first consider all its paths composed
of at most 7 nodes and we sort them according to their importance using a
descending order. The bag of paths is then constructed using the first 5 percent of
the sorted paths. For all the experiments, the tuning variable of the deformation
cost γ (Section 4.2) is set to 0.5.
The first experiment consists in an indexation of the shapes using the distances
induced by the kernels, i.e. d(G, G′) = k(G, G) + k(G′, G′) − 2k(G, G′), where k is
a graph kernel. The different σ of the attribute RBF kernels involved in Kclassic
(Section 3.2) are fixed as follows: σperimeter = σradius = σorientation = 0.1 and
σgravity center = 0.2. Note that Kclassic constitutes the basis of all the kernels
defined below. The parameters of Kchange are set to: σmean = 1.0, σorigin = 20
and ν = 0.9. The maximal number of editions is fixed to 6. Let us consider the
class tool from the LEMS database [19] of 99 shapes with 11 elements per class.
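The quantity k(G, G) + k(G′, G′) − 2k(G, G′) used above is the squared distance between the feature-space images of the two graphs. Given a precomputed Gram matrix, all pairwise values can be obtained in one vectorized step (a sketch, not the authors' code):

```python
import numpy as np

def kernel_distances(K):
    # Pairwise squared feature-space distances induced by a kernel:
    # d(G, G')^2 = k(G, G) + k(G', G') - 2 k(G, G').
    diag = np.diag(K)
    return diag[:, None] + diag[None, :] - 2 * K
```

Sorting one row of the resulting matrix yields the ranking of database shapes against the corresponding query, as done in the indexing experiment.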
Two kinds of robustness are considered: robustness against ligatures and per-
turbations, and robustness against erroneous slope nodes. Ligatured skeletons of
the shapes are created by varying the threshold parameter ζ of the skeletoniza-
tion algorithm [17]: high values lead to ligatured skeletons while low values tend
to remove relevant branches. Skeletons with erroneous slope nodes are created
by varying the parameter of our slope detection algorithm. This detection is
based on the BIC criterion which uses the standard error of the noise σBIC .
Fig. 3. Resistance to spurious slope changes (a) and spurious branches (b). For (a) and
(b) the kernels are from top to bottom: Kchange,edit2 ( ), Kchange,edit1 ( ), random
walk kernel( ), and Kchange,classic ( · ). (c) ROC curves for the classification of dogs
and cats using: Kchange,edit ( ), random walk kernel ( ) and Kchange,classic ( · ).
A small value of σBIC makes the criterion sensitive to small changes of slope
and gives many slope nodes, while high values make the criterion insensitive
to slope changes. Four kernels are compared: the random walk kernel [8], Kchange
with Kclassic (denoted as Kchange,classic) and two kernels using Kchange with Kedit
(with σcost = 0.1 for Kchange,edit1 and σcost = 0.2 for Kchange,edit2). Using the
distances induced by the kernels, shapes are sorted in ascending order according
to their distance to the perturbed tool. Fig. 3(a) shows the mean number of tools
inside the first 11 sorted shapes for an increasing value of σBIC . Fig. 3(b) shows
the same number but for a decreasing threshold value ζ. The two edition kernels
show a good resistance to perturbations and ligatures, as they retrieve almost all the
tools for each query. Their performances slightly decrease when shapes become
strongly distorted. The kernel Kchange,classic gives the worst results, as the re-
duction of the bag of paths leads to paths of different lengths which cannot be
compared with Kclassic (Section 2.2). The random walk kernel is robust against
slight perturbations of the shapes but cannot deal with severe distortion.
In the second experiment, we strain kernels by separating 49 dogs from 49 cats
using a ν-SVM. The three considered kernels are Kchange,classic , Kchange,edit
(with σcost = 0.5) and random walk. The different σ of the attributes RBF
kernels (Section 3.2) are fixed as follows: σperimeter = σradius = σorientation = 0.1
and σgravity center = 0.5. The parameters of Kchange are set to: σmean = 5.0,
σorigin = 20 and ν = 0.9. We compute the ROC curves produced by kernels using
a 10-fold cross-validation. Fig. 3(c) presents the three ROC curves. The random
walk kernel gives correct results, whilst the Kchange,classic kernel confirms its
poor performance. The Kchange,edit kernel shows the best performances and a
behaviour similar to the random walk kernel. Furthermore, on our computer, a
Core 2 Duo at 2 GHz, computing the 98×98 Gram matrix takes approximately
23 minutes for Kchange,edit and 2.5 hours for the random walk
kernel.
6 Conclusion
We have defined in this paper a positive-definite kernel for shape classification
which is robust to perturbations. Our bag of paths contains the most important
paths of a shape below a given length, in order to capture only the main infor-
mation about the shape. Only the Kedit kernel provides enough flexibility for
path comparison, and it gives better results than the classical random walk kernel.
In the near future, we would like to improve the selection of paths. An extension
of the edition process to graphs is also planned.
References
1. Pelillo, M., Siddiqi, K., Zucker, S.: Matching hierarchical structures using associa-
tion graphs. IEEE Trans. on PAMI 21(11), 1105–1120 (1999)
2. Sebastian, T., Klein, P., Kimia, B.: Recognition of shapes by editing their shock
graphs. IEEE Trans. on PAMI 26(5), 550–571 (2004)
3. Goh, W.B.: Strategies for shape matching using skeletons. Computer Vision and
Image Understanding 110, 326–345 (2008)
4. Ruberto, C.D.: Recognition of shapes by attributed skeletal graphs. Pattern Recog-
nition 37(1), 21–31 (2004)
5. Siddiqi, K., Shokoufandeh, A., Dickinson, S.J., Zucker, S.W.: Shock graphs and
shape matching. Int. J. Comput. Vision 35(1), 13–32 (1999)
6. Leymarie, F.F., Kimia, B.B.: The shock scaffold for representing 3d shape. In:
Arcelli, C., Cordella, L.P., Sanniti di Baja, G. (eds.) IWVF 2001. LNCS, vol. 2059,
pp. 216–229. Springer, Heidelberg (2001)
7. Bai, X., Latecki, L.J.: Path similarity skeleton graph matching. IEEE Trans. Pattern
Anal. Mach. Intell. 30(7), 1282–1292 (2008)
8. Vishwanathan, S., Borgwardt, K.M., Kondor, I.R., Schraudolph, N.N.: Graph ker-
nels. Journal of Machine Learning Research 9, 1–37 (2008)
9. Suard, F., Rakotomamonjy, A., Bensrhair, A.: Mining shock graphs with kernels.
Technical report, LITIS (2006),
http://hal.archives-ouvertes.fr/hal-00121988/en/
10. Neuhaus, M., Bunke, H.: Edit-distance based kernel for structural pattern classifi-
cation. Pattern Recognition 39, 1852–1863 (2006)
11. Dupé, F.X., Brun, L.: Hierarchical bag of paths for kernel based shape classification.
In: SSPR 2008, pp. 227–236 (2008)
12. Desobry, F., Davy, M., Doncarli, C.: An online kernel change detection algorithm.
IEEE Transaction on Signal Processing 53(8), 2961–2974 (2005)
13. Berg, C., Christensen, J.P.R., Ressel, P.: Harmonic Analysis on Semigroups.
Springer, Heidelberg (1984)
14. Kashima, H., Tsuda, K., Inokuchi, A.: Marginalized kernel between labeled graphs.
In: Proc. of the Twentieth International conference on machine Learning (2003)
15. DiMatteo, I., Genovese, C., Kass, R.: Bayesian curve fitting with free-knot splines.
Biometrika 88, 1055–1071 (2001)
16. Meyer, F.: Topographic distance and watershed lines. Signal Proc. 38(1) (1994)
17. Torsello, A., Hancock, E.R.: A skeletal measure of 2d shape similarity. CVIU 95,
1–29 (2004)
18. Haussler, D.: Convolution kernels on discrete structures. Technical report, Depart-
ment of Computer Science, University of California at Santa Cruz (1999)
19. LEMS: shapes databases, http://www.lems.brown.edu/vision/software/
Coarse-to-Fine Matching of Shapes Using
Disconnected Skeletons by Learning
Class-Specific Boundary Deformations
1 Introduction
There is a long history of research in computer vision on representing generic
shape, since shape information is a very strong visual cue for recognizing and
classifying objects. A generic shape representation should be insensitive not
only to geometric similarity transformations (i.e. translation, rotation, and scaling)
but also to visual transformations such as occlusion, deformation and articulation
of parts. Since their introduction by Blum [4], local symmetry axis based
representations (commonly referred to as shape skeletons) have attracted and
still attract many scientists in the field, and have become a superior alternative to
capture part structure by modeling any given shape via a set of axial curves, each
of which explicitly represents some part of the shape. Once the relations among
extracted shape primitives, i.e. the skeleton branches, are expressed in terms of
a graph or a tree data structure (e.g. [5,6,7]), resulting shape descriptions are
insensitive to articulations and occlusions.
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 21–30, 2009.
© Springer-Verlag Berlin Heidelberg 2009
22 A. Erdem and S. Tari
Fig. 1. Disconnected skeletons of some shapes and the corresponding tree representa-
tions. Note that each disconnection point (except the pruned major branches) gives
rise to two different nodes in the tree, representing the positive and negative skeleton
branches meeting at that disconnection point. However, for illustration purposes, only
one node is drawn.
descriptions that they do not explicitly carry any information about boundary
details. This issue is in fact about a philosophical choice of compromise between
sensitivity and stability. Clearly, in distinguishing shapes, it might happen that
the similarity of boundary details is more distinctive than the similarity of the
structure of disconnection points (Figs. 6 and 7).
In this study, we present a coarse-to-fine strategy to deal with such situations.
The organization of the paper is as follows. In Section 2, we describe a way
to obtain radius functions [4] (associated with the positive skeleton branches)
in order to enrich the disconnected skeleton representation with information
about shape boundary details. In Section 3, we utilize this extra information to
enhance the class-specific knowledge used in the category influenced matching
method proposed in [3], so that boundary deformations in a shape category are
additionally learned from examples. Following that, in Section 4, we introduce
a fine tuning step to the category influenced matching method, which then takes
into account the similarity of boundary details. In Section 5, we present some
matching results. Finally, in Section 6, we give a brief summary and provide
some concluding remarks.
Fig. 2. (a) A camel shape. The level curves of the surfaces (b) φ, (c) v, computed with
ρ = 16, (d) v, computed with ρ = 64, (e) v, computed with ρ = 256.
The value of v at the skeleton point (the midpoint x = d) is equal to
1/cosh(d/ρ); equivalently, the distance from the skeleton
point to the closest point on the boundary is given by ρ cosh⁻¹(1/v(d)). This
explicit solution is certainly not valid for the 2D case, as the interactions in the
diffusion process are more complicated, but it can be used as an approximation.
Let s be a skeleton point located at (sx , sy ) along a positive skeleton branch.
Given a corresponding edge strength function v computed with a sufficiently
large value of ρ, the minimum distance from s to the shape boundary, denoted
by r(s), can be approximated with:
r(s) = ρ cosh⁻¹( 1 / v(sx, sy) )   (2)
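Equation (2) can be sketched in a couple of lines, assuming the edge strength surface v is stored as a 2-D array indexed as v[row, col] (an illustrative convention, not prescribed by the paper):

```python
import numpy as np

def approx_radius(v, s, rho=256.0):
    # Approximate the minimum distance from skeleton point s = (sx, sy)
    # to the shape boundary, Eq. (2): r(s) = rho * arccosh(1 / v(sx, sy)).
    sx, sy = s
    return rho * np.arccosh(1.0 / v[sy, sx])
```

On the 1-D model above the approximation is exact: if v equals 1/cosh(d/ρ) at a skeleton point, the recovered radius is d.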
Fig. 4(a) shows the disconnected skeleton of a horse shape where the radius
functions of the positive skeleton branches are approximately obtained from the
edge strength function computed with ρ = 256 (the same value of ρ is used in
the experiments). The reconstructions of the shape sections associated with the
positive skeleton branches are given separately in Fig. 4(b). Notice that small de-
tails on the shape boundary, e.g. the horse’s ears, cannot be recovered completely
since the perturbations on the shape boundary are ignored in disconnected skele-
ton representation. Moreover, the reconstructions might deviate from their true
form at some locations, e.g. the skeleton points close to the leg joints, where a
positive branch loses its ribbon-like structure of having slowly varying width.
However, these approximate radius functions, when normalized with respect to
the radius of maximum circle associated with the shape center, can be used as
the descriptions of the most prominent boundary details (Fig. 4(c)).
Fig. 4. (a) Disconnected skeleton of a horse shape and the radius functions obtained
from the edge strength function computed with ρ = 256 (the maximal inscribed circles
are drawn at every 3 consecutive skeleton points). (b) Shape sections associated with
the positive skeleton branches. (c) Normalized radius functions associated with the
branches A-F (from top left to bottom right).
[Fig. 5: plots of normalized radius functions over 32 sampled medial points, panels (a) and (b)]
positive branches (Fig. 5). The deformation space is then modeled by applying
principal component analysis (PCA), where the first few principal components
describe the representation space for possible deformations. In the experiments,
our sampling rate is 32 points per positive skeleton branch, and we use the first
five principal components. Hence, each sampled radius function is represented
by a 5-dimensional vector.
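A minimal sketch of this construction via the SVD of the centred training data (not the authors' code); `project` returns the 5-dimensional coordinate vectors, i.e. the α and β appearing in Eq. (5):

```python
import numpy as np

def deformation_space(samples, n_components=5):
    # samples: one row per training radius function (32 samples each).
    # The rows of `basis` span the low-dimensional deformation space.
    mean = samples.mean(axis=0)
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    basis = vt[:n_components]
    # Map a new radius function to its coordinates in that space.
    project = lambda r: basis @ (r - mean)
    return mean, basis, project
```

Storing `mean` and `basis` per category-tree node matches the description above: the mean of the matched radius functions plus the reduced set of principal components.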
case, we additionally store the mean of the matched radius functions together
with the reduced set of principal components in the nodes of the category tree.
More formally, the overall algorithm can be summarized with the following two
successive steps:
1. Let T1 be the shape tree of the query shape which is being compared with
the shape tree of a database shape, denoted by T2, whose nodes are linked
with a specific leaf node of the corresponding category tree. Compute an
initial distance and the correspondences between T1 and T2 using the category
influenced matching method:
where Λ and Δ respectively denote the set of nodes removed from T1 and
the set of nodes inserted into T1 from T2, and Ω denotes the set of matched
nodes (see [3] for the details of the cost functions associated
with the edit operations rem(ove), ins(ert) and ch(ange)).
Φ(u, v) = (1/√(2πσ²)) exp( −Σi=1..5 (αi − βi)² / (2σ²) ) if u and v express
positive branches, and Φ(u, v) = 0 otherwise.   (5)
where α and β are the vectors formed by projecting the radius functions
associated with u and v onto the related deformation space (σ is taken as σ = 0.4
in the experiments).
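Equation (5) reduces to a few lines of code; the boolean flag standing in for "u and v express positive branches" is an illustrative simplification of the node-type test:

```python
import math

def phi(alpha, beta, sigma=0.4, positive=True):
    # Node-similarity term of Eq. (5): a Gaussian on the 5-D PCA
    # coordinates, zero unless both nodes express positive branches.
    if not positive:
        return 0.0
    sq = sum((a - b)**2 for a, b in zip(alpha, beta))
    return math.exp(-sq / (2 * sigma**2)) / math.sqrt(2 * math.pi * sigma**2)
```

Identical coordinate vectors yield the maximal value 1/√(2πσ²); the similarity decays smoothly as the projected radius functions move apart.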
5 Experimental Results
costs obtained with the category influenced matching method in [3] do not
reflect the perceptual dissimilarities well1. On the other hand, when one examines the
differences in the boundary details, it is clear that a better decision can
be made. For example, refer to Fig. 6. The pairs of radius functions associated
with the matched branches are much more similar in the case of matching the two
horse shapes than in the matching of the query horse shape with the cat
shape. The only exception is the similarity of the horses' tails (Fig. 6(b), in the
middle row and on the right) but note that these radius functions are compared
in the corresponding deformation spaces that are learned from the given set of
examples. In this regard, the proposed coarse-to-fine strategy can be used to
refine the matching results.
Fig. 6. Some matching results and the uniformly sampled radius functions of matched
branches. The final matching costs are (a) 0.5800 (reduced from 0.7240), (b) 0.5368
(reduced from 0.7823). Note that the similarity of radius functions is actually computed
in the related low-dimensional deformation spaces.
1 In each experiment, the knowledge about the category of the database shape (the
ones on the right) is defined by 15 examples of that category, randomly selected from
the shape database given in [3].
(a) (b)
(c) (d)
(e) (f)
Fig. 7. Some other matching results. The final matching costs are (a) 1.1989 (reduced
from 1.2904), (b) 0.9458 (reduced from 1.4936), (c) 1.9576 (reduced from 2.1879), (d)
1.8744 (reduced from 3.0387), (e) 0.8052 (reduced from 0.8105), (f) 0.6738 (reduced
from 1.0875).
References
1. Aslan, C., Tari, S.: An axis-based representation for recognition. In: ICCV 2005,
vol. 2, pp. 1339–1346 (2005)
2. Aslan, C., Erdem, A., Erdem, E., Tari, S.: Disconnected skeleton: Shape at its
absolute scale. IEEE Trans. Pattern Anal. Mach. Intell. 30(12), 2188–2203 (2008)
3. Baseski, E., Erdem, A., Tari, S.: Dissimilarity between two skeletal trees in a con-
text. Pattern Recognition 42(3), 370–385 (2009)
An Optimisation-Based Approach to Mesh
Smoothing: Reformulation and Extensions
Smoothing mesh data is a common issue in computer graphics, finite element mod-
elling and data visualisation. A simple and natural method to perform this task
is known as Laplacian smoothing, or Gaussian smoothing: it basically consists of
moving, in parallel, every mesh vertex towards the center of mass of its neigh-
bours, and repeating this operation until the desired smoothing effect is obtained.
In practice, it gives reasonable results with low computational effort when cor-
rectly tuned, and it is extremely simple to implement, hence its popularity.
This somewhat naive method has the drawback of shrinking objects: when
repeated until stability, it reduces any finite object to a single point. Thus, the
choice of when to stop smoothing iterations is a crucial issue.
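The basic iteration just described can be sketched as follows (a minimal Python/NumPy illustration; the toy polyline, neighbour lists and parameter values are made-up examples, not taken from the paper):

```python
import numpy as np

def laplacian_smooth(points, neighbours, alpha=0.5, iterations=10):
    """Move every vertex, in parallel, towards the centroid of its
    neighbours; alpha in (0, 1] controls the step, alpha = 1 jumping
    straight to the centroid."""
    x = np.asarray(points, dtype=float).copy()
    for _ in range(iterations):
        centroids = np.array([x[nb].mean(axis=0) for nb in neighbours])
        x += alpha * (centroids - x)
    return x

# Toy example: a noisy polyline; each vertex's neighbours are its chain
# neighbours.  Repeated smoothing flattens the zig-zag (and, as the text
# notes, eventually shrinks the whole shape).
pts = np.array([[0, 0], [1, 0.8], [2, -0.7], [3, 0.9], [4, 0]])
nbs = [[1], [0, 2], [1, 3], [2, 4], [3]]
smoothed = laplacian_smooth(pts, nbs, alpha=0.5, iterations=20)
```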
However, Laplacian smoothing has inspired a number of variants and alterna-
tive methods. Taubin’s method [1] avoids shrinkage by alternating contraction
and expansion phases. The method of Vollmer et al. [2] introduces a term which corresponds to a (loose) attachment of the points to their initial positions.
Another criticism made against Laplacian smoothing is that it lacks motiva-
tion, because it is not directly connected to any specific mesh quality criterion
[3,4,5]. A common approach to mesh smoothing consists of defining a cost func-
tion related to the mesh elements (relative positions of vertices, edge lengths,
triangle areas, angles, etc) and to design an algorithm that minimises this cost
function [4,5,6,7,8,9,10,11,12].
Mesh smoothing has also been tackled from the signal filtering point of view.
In this framework, [13] analyses the shrinkage effect of the Laplacian smooth-
ing method, and explains the nice behaviour of the operator described in [1].
Other classical filters, such as the mean and median filters, were also adapted to
meshes [14]. Other approaches are based on physical analogies [15,16], anisotropic
and non-linear diffusion [17,18,19,20,21], and curvature flow [22,23].
This work has been partially supported by the “ANR BLAN07-2 184378” project.
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 31–41, 2009.
© Springer-Verlag Berlin Heidelberg 2009
32 Y. Hamam and M. Couprie
Optimising this function leads to grouping all the points in one, thus shrinking the mesh to a single point. We will show in what follows that, under certain conditions, optimising it by gradient descent yields the basic Laplacian smoothing technique. The function may be represented in matrix form. Let C be the incidence matrix of the graph G, with one row per edge e and one column per vertex v, defined as:

    C_ev =  1, if v is the sending end of edge e;
           −1, if v is the receiving end of edge e;
            0, otherwise.
Then J may be written as:

    J = (1/2)(Cx)^t(Cx) = (1/2) x^t C^t C x = (1/2) x^t A x    (2)
where A = C^t C. Since the rows of C sum to zero, C is not of full rank and the determinant of A is zero¹. Furthermore, for any y, letting z = Cy, we have y^t C^t C y = z^t z ≥ 0, hence A is positive semi-definite.
The A matrix is usually sparse for large problems, with diagonal elements

    a_ii = number of edges incident to vertex i,

and off-diagonal elements

    a_ij = −1, if an edge exists between vertices i and j;
            0, otherwise.

¹ For the notions of linear algebra used here, see e.g. [24].
In the literature this matrix is referred to as the Laplacian matrix (also called the topological or graph Laplacian); it plays a central role in various mesh processing applications [25]. Later on we will give some of the properties of matrix A.
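These definitions are easy to check numerically. The sketch below (Python/NumPy; the small example graph is hypothetical) builds C for a 4-vertex graph and verifies that A = C^t C has the stated diagonal and off-diagonal structure, a zero determinant, and non-negative eigenvalues:

```python
import numpy as np

# Hypothetical example graph: 4 vertices, edges (0,1), (1,2), (2,3), (3,0), (0,2)
edges = [(0, 1), (1, 2), (2, 3), (3, 0), (0, 2)]
n = 4

# Incidence matrix: one row per edge, one column per vertex
C = np.zeros((len(edges), n))
for e, (u, v) in enumerate(edges):
    C[e, u] = 1.0    # u is the sending end of edge e
    C[e, v] = -1.0   # v is the receiving end of edge e

A = C.T @ C   # the (graph) Laplacian matrix

assert np.allclose(np.diag(A), [3, 2, 3, 2])   # a_ii = degree of vertex i
assert A[0, 1] == -1 and A[1, 3] == 0          # a_ij = -1 iff {i,j} is an edge
assert abs(np.linalg.det(A)) < 1e-9            # A is singular
assert np.linalg.eigvalsh(A).min() > -1e-9     # A is positive semi-definite
```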
Optimisation-based smoothing. Consider the problem of minimising the function J. The gradient is ∇x J = C^t C x = Ax, and the gradient descent algorithm may be written as

    x^{n+1} = x^n − α A x^n    (3)
An attachment term keeping the vertices close to their initial positions may be added, giving the function

    J = (1/2) [ (x − x̄)^t (x − x̄) + θ x^t A x ]    (4)

where x̄ is the initial value of the coordinate vector x and θ is a positive constant that allows changing the respective weights of the two parts of the function. If θ = 0 then there is no need for optimisation and the minimum of J is obtained for x = x̄. For θ ≫ 1 the function is equivalent to (2). Thus this function is a compromise between keeping the vertices at their initial positions and reducing the distance between points. Now consider the gradient of J with respect to x: ∇x J = (x − x̄) + θAx. At the optimum, we have (x − x̄) + θAx = 0, that is, (I + θA)x = x̄.
Consider the matrix (I + θA). Since A is symmetric positive semi-definite, its eigenvalues are greater than or equal to zero. Adding the identity matrix to θA with θ ≥ 0 thus gives a positive definite matrix. Hence the inverse of (I + θA) exists and, for small-size problems, the above equation may be solved directly to give x = (I + θA)^{-1} x̄. Also note that, due to this property, the solution is unique.
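Such a direct solve might look as follows (a sketch; the path-graph Laplacian and the value of θ are illustrative choices, not taken from the paper):

```python
import numpy as np

def fowa_closed_form(x_bar, A, theta):
    """Minimise J = 1/2[(x - x_bar)^t (x - x_bar) + theta x^t A x] exactly.
    (I + theta*A) is positive definite for theta >= 0, so the linear
    solve always succeeds and the minimiser is unique."""
    return np.linalg.solve(np.eye(len(x_bar)) + theta * A, x_bar)

# Illustrative path graph on 4 vertices, 1-D coordinates
A = np.array([[ 1., -1.,  0.,  0.],
              [-1.,  2., -1.,  0.],
              [ 0., -1.,  2., -1.],
              [ 0.,  0., -1.,  1.]])
x_bar = np.array([0.0, 1.3, 1.7, 3.0])
x = fowa_closed_form(x_bar, A, theta=10.0)
# Large theta: strong smoothing; the spread of x is reduced, while the
# mean is preserved (the all-ones vector is in the null space of A).
```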
By identifying the terms with equation (5), M with (I + θA) and z with ∇x J, we get the following inequality: α_n ≥ 1/(1 + θ λ_max(A)).
The optimal value given by equation (5) does not satisfy the limit on α required for monotonic convergence, but gives faster convergence. However, it doubles the computation time at each iteration. In our experiments, we noticed that oscillations are indeed obtained in the optimal-step case. Furthermore, note that the optimisation problem is quadratic, so it may be solved by the conjugate gradient method. If this is done, the exact solution is obtained in N iterations, where N is the size of the matrix A, i.e. the number of vertices in the graph.
In the above section, two special cases of functions were given. Many proposals have been made to smooth while conserving the size of objects. In this section, a second order function is proposed. Special cases are then considered and compared. It is then shown that many published methods are special cases of the optimisation of this function. Consider the following second order function with attachment to the initial coordinates:
    J = (1/2) [ (x − x̄)^t Q (x − x̄) + θ0 x^t x + θ1 x^t A x + θ2 x^t A² x ]    (6)
where
– Q is a symmetric positive definite weighting matrix,
– θ0, θ1 and θ2 are weighting scalars for the zero, first and second order terms,
– A = C^t Ω C, and Ω is a diagonal matrix of weights associated to each edge (see [27]).
Let us now consider two special cases of the proposed function: the first without a term attaching the vertices to their original positions, and the second with such a term. The first order objective function minimises the sum of the squares of distances between adjacent vertices; the objective function proposed here instead minimises the sum of the squares of the distances between vertices and the geometric centre of their neighbours. The method obtained by optimising this function will be referred to as the Second Order (SO) algorithm.
Case 1. Consider the function

    J = (1/2) x^t (AA) x = (1/2) x^t A² x    (7)
In this function Ax gives a measure of the deviation of each xi from the geometric
centre of its neighbours. So (Ax)t Ax = xt AAx is the sum of the squares of
the distances of each vertex from the geometric centre of its neighbours. In
comparison with the Laplacian case, where the sum of the squares of distances
between neighbouring vertices is minimised, this function is proposed to reduce
shrinkage.
Application of the Gradient Descent Method. In a similar manner to the development above, one iteration of the gradient descent method is x^{n+1} = x^n − α_n ∇x J = x^n − α_n A² x^n. With α_n constant (α_n = α), the condition ensuring monotonic convergence is α < 1/λ²_max(A), which is satisfied whenever α < 1/(4a²).
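This iteration can be sketched as follows (Python/NumPy; the cycle-graph example is hypothetical, and the symbol a is assumed here to be the largest diagonal element of A, i.e. the maximal vertex degree):

```python
import numpy as np

def so_smooth(x0, A, iterations=50):
    """Second Order (SO) smoothing: gradient descent on J = 1/2 x^t A^2 x
    with a constant step below the monotonic-convergence bound."""
    a = A.diagonal().max()              # assumed meaning of the symbol a
    alpha = 0.99 / (4.0 * a * a)        # alpha < 1/(4 a^2)
    A2 = A @ A
    x = np.asarray(x0, dtype=float).copy()
    for _ in range(iterations):
        x -= alpha * (A2 @ x)           # x^{n+1} = x^n - alpha A^2 x^n
    return x

# Illustrative cycle graph on 6 vertices, noisy 1-D signal
n = 6
A = 2 * np.eye(n)
for i in range(n):
    A[i, (i - 1) % n] = A[i, (i + 1) % n] = -1
x0 = np.array([0.0, 1.1, -0.2, 1.3, 0.1, 0.9])
x = so_smooth(x0, A)
# The signal is smoothed while its mean is preserved (A^2 annihilates
# the constant vector).
```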
As for the previous case, an extra term is added to the objective function that
attaches the vertices to their original positions. This gives the method that we
will refer to as the Second Order With Attach (SOWA) algorithm.
Case 2. Consider the function

    J = (1/2) [ (x − x̄)^t (x − x̄) + θ x^t (AA) x ]    (8)

where x̄ is the initial value of the coordinate vector x. Now consider the gradient of J with respect to x: ∇x J = (x − x̄) + θA²x. At the optimum, we have (x − x̄) + θA²x = 0, that is, (I + θA²)x = x̄.
In a similar manner to the previous case, the inverse of (I + θA²) exists and, for small-size problems, the above equation may be solved to give x = (I + θA²)^{-1} x̄; the solution is unique.
Application of the Gradient Descent Method. With similar considerations as above, one iteration of the gradient descent method is x^{n+1} = x^n − α_n ∇x J = x^n − α_n [(x^n − x̄) + θA²x^n], and for monotonic convergence α_n is taken constant (α_n = α). The algorithm converges monotonically when α < 1/(1 + θλ²_max(A)), which is ensured if α < 1/(1 + 4θa²).
In Sorkine's work [25], the function to be optimised (eq. 3 of [25]) is a special case of eq. (6), where only the first term is used. In Sorkine's notation, the matrix Q equals A^t(D^{-1})²A in our framework. Sorkine adds a second term for the purpose of fixing some vertices to their original positions.
In a more recent work by Bougleux et al. [12], eq. 1 gives the optimisation
function. In the same way as in Sorkine’s work, a term is added for the purpose
of fixing some vertices to their original positions. In the case where no attach
points are given and p = 2 (the most common one), this function is equal to the
first order term of ours.
Other methods may also be represented using the proposed framework. We
may cite here, as other examples, the work of Vollmer et al. [2], the work of
Nealen et al. [11] and the one of Ji et al. [10].
4 Numerical Results
In this section the following five smoothing algorithms are compared:
1. FO algorithm: the first order (FO) optimisation-based smoothing, with the objective function the sum of the squares of distances between adjacent points, as given in eq. (2).
2. FOWA algorithm: The previous method with an attach term (FOWA) added
to the objective function related to the original positions of the points as
given in eq. (4).
3. SO algorithm: the optimisation-based smoothing with the objective function the sum of the squares of the distances between vertices and the geometric centre of their neighbours. This is referred to as the second order (SO) algorithm, and corresponds to the function given in eq. (7).
4. SOWA algorithm: the previous method with an attachment term added to the objective function, related to the original positions of the points. It is referred to as the second order with attach (SOWA) algorithm and corresponds to the function given in eq. (8).
5. HC algorithm: This is the algorithm described by [2]. It is used in the compar-
ison since it is considered to be quite efficient for smoothing while reducing
shrinkage.
In order to test the shrinkage and smoothing properties of the functions, we first compare the FO and SO algorithms, so as to compare the effect of the two distance measures. To run this test, a sphere with a radius of one is used. Random noise was added to the sphere, giving a mesh with a mean distance to the center of 1.001 and a standard deviation of 0.665; the mean angle between two facets is 0.527. This sphere was then smoothed using the FO and SO algorithms, with iteration counts chosen to give smoothing of the same order of magnitude.
The mesh smoothed by the FO algorithm using 13 iterations gave the following properties: mean angle = 0.069, standard deviation = 0.075, mean distance to center = 0.977. The mesh smoothed by the SO algorithm using 30 iterations gave: mean angle = 0.070, standard deviation = 0.075, mean distance to center = 1.00015. The shrinkage in the first case is 2.42%, whereas in the second case it is 0.125%.
These results show that it is advantageous to use as objective function the sum of the squares of the distances between the vertices and the geometric center of their adjacent vertices (the second order term): for equivalent smoothing, this method gives significantly less shrinkage.
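This experiment can be re-enacted in miniature (a Python/NumPy sketch on a noisy circle rather than the paper's sphere; the noise level, step sizes and iteration counts are illustrative choices, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
t = np.linspace(0, 2 * np.pi, n, endpoint=False)
pts = np.stack([np.cos(t), np.sin(t)], axis=1)
pts += 0.05 * rng.standard_normal(pts.shape)        # noisy unit circle

# Laplacian of the cycle graph: every vertex linked to its two neighbours
A = 2 * np.eye(n)
for i in range(n):
    A[i, (i - 1) % n] = A[i, (i + 1) % n] = -1

def mean_radius(x):
    return np.linalg.norm(x - x.mean(axis=0), axis=1).mean()

fo, so = pts.copy(), pts.copy()
for _ in range(13):                     # FO: gradient descent on 1/2 x^t A x
    fo -= 0.2 * (A @ fo)
for _ in range(30):                     # SO: gradient descent on 1/2 x^t A^2 x
    so -= 0.05 * (A @ (A @ so))

shrink_fo = 1 - mean_radius(fo) / mean_radius(pts)
shrink_so = 1 - mean_radius(so) / mean_radius(pts)
# As in the paper's experiment, SO shrinks the shape far less than FO
# for a comparable amount of smoothing.
```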
Consider next the two algorithms (FOWA and SOWA) based on functions with attachment to the original points. These are compared using the sphere with three noise levels (0.05, 0.5 and 1.0), as shown in Table 1.
Table 1. Comparison of the four methods for the sphere with various noise levels and
various values of θ. The values of α used are those calculated to ensure monotonic
convergence.
In the above table, the MSE value is the sum of the squares of the error
between the smoothed points and the original sphere. The shrink value given
is the ratio between the average distance of points to the center of the sphere
divided by that of the sphere with noise. Notice that in all cases when the sum
of the squares of the distances between the vertices and the geometric center
of the adjacent vertices is used the shrinkage is less than 1%. The method that
gives best results is SOWA.
In this table, some results obtained for the same sphere by the HC algorithm [2] are also given. This algorithm requires the tuning of two parameters (α and β); results using the tuning suggested in [2] are reported. The HC algorithm gives results equivalent to those of our method.
Fig. 1. Some results of Laplacian smoothing. a: Original cube, b: cube with added noise, c: after 5 iterations, d: after 10 iterations, e: after 15 iterations.
References
1. Taubin, G.: A signal processing approach to fair surface design. In: Computer
Graphics Proceedings, Annual Conference Series, pp. 351–358 (1995)
2. Vollmer, J., Mencl, R., Müller, H.: Improved Laplacian smoothing of noisy surface
meshes. Computer Graphics Forum 18(3), 131–138 (1999)
3. Parthasarathy, V., Kodiyalam, S.: A constrained optimization approach to finite
element mesh smoothing. Finite Elements in Analysis and Design 9, 309–320 (1991)
4. Freitag, L.: On combining Laplacian and optimization-based mesh smoothing tech-
niques. In: Joint ASME, ASCE, SES symposium on engineering mechanics in man-
ufacturing processes and materials processing, pp. 37–43 (1997)
5. Amenta, N., Bern, M., Eppstein, D.: Optimal point placement for mesh smoothing.
Journal of Algorithms 30, 302–322 (1999)
6. Bank, R., Smith, R.: Mesh smoothing using a posteriori error estimates. SIAM
Journal on Numerical Analysis 34(3), 979–997 (1997)
7. Freitag, L., Jones, M., Plassmann, P.: A parallel algorithm for mesh smoothing.
SIAM Journal on Scientific Computing 20(6), 2023–2040 (1999)
8. Freitag, L., Knupp, P., Munson, T., Shontz, S.: A comparison of optimization soft-
ware for mesh shape-quality improvement problems. In: Int. Meshing Roundtable,
pp. 29–40 (2002)
9. Chen, Z., Tristano, J., Kwok, W.: Combined Laplacian and optimization-based
smoothing for quadratic mixed surface meshes. In: 12th International Meshing
Roundtable (2003)
10. Ji, Z., Liu, L., Wang, G.: A global Laplacian smoothing approach with feature
preservation. In: Int. Conf. on Computer Aided Design and Computer Graphics,
pp. 269–274 (2005)
11. Nealen, A., Igarashi, T., Sorkine, O., Alexa, M.: Laplacian mesh optimization. In:
ACM GRAPHITE, pp. 381–389 (2006)
12. Bougleux, S., Elmoataz, A., Melkemi, M.: Discrete regularization on weighted
graphs for image and mesh filtering. In: Sgallari, F., Murli, A., Paragios, N. (eds.)
SSVM 2007. LNCS, vol. 4485, pp. 128–139. Springer, Heidelberg (2007)
13. Taubin, G.: Curve and surface smoothing without shrinkage. In: Fifth International
Conference on Computer Vision, pp. 852–857 (1995)
14. Yagou, H., Ohtake, Y., Belyaev, A.: Mesh smoothing via mean and median filtering
applied to face normals. In: Procs. Geometric Modeling and Processing, pp. 124–
131 (2002)
15. Djidjev, H.N.: Force-directed methods for smoothing unstructured triangular and
tetrahedral meshes. In: Ninth International Meshing Roundtable, pp. 395–406
(2000)
16. Mezentsev, A.: A generalized graph-theoretic mesh optimization model. In: 13th
International Meshing Roundtable, pp. 255–264 (2004)
17. Ohtake, Y., Belyaev, A.G., Bogaevski, I.A.: Polyhedral surface smoothing with
simultaneous mesh regularization. In: Procs. Geometric Modeling and Processing,
pp. 229–237 (2000)
18. Ohtake, Y., Belyaev, A.G., Bogaevski, I.A.: Mesh regularization and adaptive
smoothing. Computer-Aided Design 33(11), 789–800 (2001)
19. Ohtake, Y., Belyaev, A., Seidel, H.: Mesh smoothing by adaptive and anisotropic
gaussian filter applied to mesh normals. In: Vision, Modeling, and Visualization,
pp. 203–210 (2002)
20. Tasdizen, T., Whitaker, R., Burchard, P., Osher, S.: Geometric surface smoothing
via anisotropic diffusion of normals. In: IEEE Visualization 2002, pp. 125–132
(2002)
21. Fleishman, S., Drori, I., Cohen-Or, D.: Bilateral mesh denoising. ACM Transactions
on Graphics 22(3), 950–953 (2003)
22. Desbrun, M., Meyer, M., Schröder, P., Barr, A.H.: Implicit fairing of irregular
meshes using diffusion and curvature flow. In: 26th annual conference on Computer
graphics and interactive techniques, pp. 317–324 (1999)
23. Zhao, H., Xu, G.: Triangular surface mesh fairing via gaussian curvature flow.
Journal of Computational and Applied Mathematics 195(1-2), 300–311 (2006)
24. Strang, G.: Introduction to Linear Algebra. Wellesley-Cambridge Press (2003)
25. Sorkine, O.: Differential representations for mesh processing. Computer Graphics
Forum 25(4), 789–807 (2006); alt. title: Laplacian mesh processing (Eurographics
2005 presentation)
26. Chung, F.R.: Spectral Graph Theory. Amer. Mathematical Society, Providence
(1997)
27. Field, D.A.: Laplacian smoothing and Delaunay triangulations. Communications in
Applied Numerical Methods 4(6), 709–712 (1988)
Graph-Based Representation of Symbolic
Musical Data
1 Introduction
The ever increasing amount of music collections available in online stores or
public databases has created a need for user-friendly and powerful interactive
tools which allow an intuitive browsing and searching of musical pieces.
Ongoing research in the field of music information retrieval thus includes
the adaptation of many standard data mining and retrieval tools to the music
domain. In this regard the topographic mapping and visualization of large music
compilations combines several important features: data and class structures are
arranged in such a way that an inspection of the full dataset as well as an
intuitive motion through partial views of the database become possible.
Generally, there are two basic ways to automatically construct the topographic arrangement for a mapping. The classical one uses a set of features to position each subject in a Euclidean space. Then, since the number of features is usually much larger than three, the high-dimensional space has to be projected to two or three dimensions for visualization, usually by techniques like the linear Principal Components Analysis (PCA) or the non-linear Self-Organizing Map (SOM). Unfortunately, it is often not possible
to represent complex data such as symbolic musical data, i.e. sequences of notes,
by a set of Euclidean vectors. Therefore, there is demand for representations
that are capable of capturing complex structures.
A more sophisticated way uses pairwise dissimilarities between all subjects, which are able to capture more complex structures in the data. In general, however, these dissimilarities are no longer Euclidean distances, and there is no distortion-free embedding into a Euclidean space; they may not even be metric at all. Hence, the classical methods cannot be applied in this case. Recently, variants of Neural Gas and Self-Organizing Maps for dissimilarity datasets have been introduced
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 42–51, 2009.
© Springer-Verlag Berlin Heidelberg 2009
that are able to directly handle arbitrary similarity data instead of only simple
Euclidean ones [9].
To make use of those sophisticated non-Euclidean methods, it is essential to
have a dissimilarity measure at hand that provides a reliable pairwise measure-
ment of the complex data. For music, there are many different features that can
be extracted by algorithmic methods, either from acoustic data or a symbolic
description of a musical piece. Global features like the overall tempo, musical
key, pitch transition statistics, dynamics statistics or spectral features can be
measured for a song and used directly for mapping. However, their importance
and level of expression is different in the various styles of music and thus it
is difficult to find musical feature sets that are generally valid and equally significant for every genre. A variety of approaches handling musical data based on pairwise comparisons have been proposed, including standard metrics popular in data mining, such as the cosine distance based on tf×idf weightings of the basic constituents of the given data, complex mathematical constructions such as the Hausdorff distance [6,17], or the spectra of an associated graph [16].
To compute a dissimilarity between pieces based instead on their tonal and
rhythmic progression, it is possible to use the temporal progression of features
like rhythmic patterns, note or chord sequences to measure dissimilarity (an
overview of encoding methods can be found in [5]). This can be achieved with
a suitable method to measure string dissimilarity like the edit distance or the
powerful and universal compression distance. Especially the latter has been used
in this way on symbolic representations of musical data with promising results
in recent years, like e.g. in [2,4,13].
Due to the nature of acoustic signals, it usually requires much more effort to extract high-level features from acoustic audio data than from symbolic music representations like MIDI¹ or MusicXML². But recent progress in developing efficient and reliable automated extraction methods that are able to obtain musical notation directly from complex acoustic material, as presented in [11,15,20], opens the way towards mapping techniques which directly rely on a symbolic description of musical data.
When defining a similarity measure on sequences of musical notes, certain
invariances should be respected to comply with the average human perception
of melodies: These include invariances to transposition of the notes to a different
key and the scaling of the tempo (further described in [7]).
In the following section, we introduce a new way to convert symbolic representations of music into strings via previously constructed precedence trees. We show how the string encoding derived from the graph structure, together with the subsequently applied Normalized Compression Distance (NCD), is beneficial with respect to the named invariances.
We implemented our method in Matlab³ and used MIDI files as symbolic input data. We processed selected subsets of classical pieces from the comprehensive

¹ http://www.midi.org
² http://www.musicxml.org
³ http://www.mathworks.com
44 B. Mokbel, A. Hasenfuss, and B. Hammer
Kunst der Fuge⁴ MIDI collection, containing pieces from almost a thousand composers and spanning various musical forms and epochs. The generated dissimilarities were mapped with non-metric Multi-Dimensional Scaling using Kruskal's normalized stress-1 criterion. The experiments show most of the data arranged in meaningful clusters, with a reasonable separation of composers and eras.
To measure the dissimilarity between the tonal and rhythmic progressions of two pieces of music, our method compares string representations derived from their note successions. We therefore developed an algorithm that converts the symbolic note sequences in a MIDI file into a string, following a previously constructed precedence structure. Our algorithmic approach is based on the assumption that a human's subjective perception of musical identity is strongly context-driven. We suppose that most listeners will consider a melody similar, to a certain extent, to a copy of it that has been changed in the following ways:
– It is shifted in its overall pitch, i.e. it has been transposed to another funda-
mental note.
– It is scaled in its overall tempo, i.e. all note lengths and pauses have been
contracted or elongated by a constant factor.
Thus we assume that the human perception of melodies is, to a certain extent, invariant to overall pitch translation (pitch translation invariance⁵) and to an overall scaling of the tempo (time scaling invariance). To gain a measure that
is close to the human music perception we therefore encode the note sequences
to new symbolic sequences with an encoding method that is invariant to the
aforementioned changes of the note sequences. That means it produces the same
output, whether the input is the original or the altered note sequence. The information that is not encoded in the new strings is the magnitude by which the pitch was shifted or the tempo scaled. As the described human assessment of similarity would probably decrease with a rise in the magnitude of such changes, it might be described more faithfully by distinguishing degrees of similarity. Although this is not part of our encoding scheme at the moment, it would be easy to incorporate such information into our measure in the future.
Some related methods found in the literature partially provide the emphasized invariances, e.g. [4,13,18,19]. In [4], pitch translation invariance was achieved with a global pitch normalization throughout the entire piece, making the encoding very sensitive to the automatic choice of the global point of reference. In [13], every note's pitch was encoded as the difference to the pitch of its directly preceding note. In addition to independence of the overall pitch, this method yields local separation: parts in the strings are equal for
⁴ http://www.kunstderfuge.com
⁵ Also referred to as pitch invariance or transposition invariance. The terms differ in the literature; we use the ones described in [7].
parts of two songs that, aside from transposition, have equal note sequences, even if the rest of these songs is completely different. Using common string dissimilarity measures on those two representations would therefore reflect the partial equality in the output value. In addition, one could store note lengths and pauses analogously to gain time scaling invariance. Still, in these strings one will find only very little equality in the case of two songs playing the same melody (like a riff or a theme) with dissimilar accompanying notes; the output of a string dissimilarity measure would then, in our opinion, not represent most listeners' assessments.
Our goal for the generated string representation was therefore a decomposition of the tonal and rhythmic progression of a song that has the benefit of local points of reference but, on top of that, represents melodic lines and themes more independently of the surrounding melodic context. Our strategy is to automatically define precedence relationships between notes throughout the entire piece:
The functions pitch(n), start(n) and length(n) return the absolute pitch and the
normalized start time and length of a note n respectively. For every note n in
the sequence of notes N played in the song, the algorithm picks one designated
predecessor. For the current note cn ∈ N , the function pred(cn) returns one of
its time-wise preceding notes on the same MIDI channel (the sequence P (cn))
as the predecessor note.
We define

    pred(cn) = argmin_{p ∈ P(cn)} [ k · |pitch(p) − pitch(cn)| + r · (start(cn) − start(p)) / length(cn) ]
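This selection can be sketched as follows (Python; notes are modelled as dictionaries, and the weight values k and r are hypothetical, as no concrete values appear here):

```python
def pred(cn, preceding, k=1.0, r=1.0):
    """Return the designated predecessor of note cn among its time-wise
    preceding notes P(cn) on the same MIDI channel: the note minimising
    a weighted sum of pitch distance and normalised temporal gap."""
    def cost(p):
        return (k * abs(p['pitch'] - cn['pitch'])
                + r * (cn['start'] - p['start']) / cn['length'])
    return min(preceding, key=cost)

# Toy example: the note closer in pitch wins despite a larger time gap.
preceding = [{'pitch': 60, 'start': 0.0, 'length': 1.0},
             {'pitch': 72, 'start': 1.0, 'length': 1.0}]
cn = {'pitch': 62, 'start': 2.0, 'length': 1.0}
chosen = pred(cn, preceding)
```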
Fig. 1. The precedence tree structure (pitch vs. time) for the first 60 notes of Beethoven's "Für Elise". The edges are marked with all nonzero pitch changes of notes relative to their predecessors.
notations that represent the same musical output with different combinations of
overall tempo and note durations.
Considering the precedence structure, a small, local change in the musical
progression will subsequently cause it to alter locally, resulting yet only in a
local symbolic change of the string representation. To explain the benefit of
this behavior on a practical example, imagine a bass line which is - due to its
lower pitch - isolated in the tonal progression. Its melody would be represented
independently in the string, unaffected by changes of higher lead melodies played
at the same time on the same MIDI channel. In this way any two melodies
will have independent representations within the string as long as their note
sequences are sufficiently separated in pitch locally.
To sum it up, our encoding method is fully pitch translation invariant and
time scaling invariant on an entire song but also shows highly invariant behavior
upon changes to certain subsets of notes and hence to variations of melodies. It
offers a structural decomposition for the representation of polyphonic music or
polyphonic instrument tracks.
To calculate the dissimilarity of the string representations, we used the popular Normalized Compression Distance (NCD) (see [3]), a measure based on approximations of the Kolmogorov complexity from algorithmic information theory described in [12]. The NCD is defined as

    NCD(x, y) = (C(xy) − min{C(x), C(y)}) / max{C(x), C(y)}

where x and y are strings, C(x) denotes the compressed size of x, and C(xy) the compressed size of the concatenation of x and y, using a real compressor. For our experiments the bzip2 compression method was used. Since bzip2, like most common compression methods, works byte-oriented, the size of a reasonable set of symbols is restricted to 2⁸. Therefore, we utilize the integer values in [1..255]
to code every possible relative state change of a note compared to its predeces-
sor. For every musical piece, we automatically build two strings, one that holds
the pitch changes relpitch(n) for every note n and another one for the rhythmic
progression. The latter is compiled from rellength(n) and reltiming(n), result-
ing in a string which is twice as long as the one representing the pitches. The
dissimilarity of two songs is then calculated as the mean value of the normalized
NCD of the pitch strings and the normalized NCD of the rhythmic strings. If one
disregards the possibility to change the overall tuning of a MIDI file or the use
of pitch-bending and vibrato while notes are played, MIDI distinguishes at most
128 different note pitches [0..127]. So our byte representation has to cover a total
of 255 possible values of relative pitch change, 127 upwards and 127 downwards
plus one for ’no change’ which in total is presentable with one byte. Our code
thus indicates decreasing pitches with the values [1..127], ’no change’ with 128
and increases with [129..255]. To encode the fractions of note durations given by
rellength, our algorithm maps all occurring numeric values to 127 real-valued in-
tervals that are centered around rhythmically important ratio values. These are
all the possible fractions between the lengths of a whole note, a half, a quarter, an eighth, a 1/16th, a 1/32nd and a 1/64th note, as well as the corresponding triplets and five-lets in between. The encoding thereby treats a 1/64th note followed by a whole note as the farthest upward change in note length, and downward vice versa.
This way, very steep transitions of note lengths which exceed this magnitude
are all being treated as equal maximal steps as they are assigned to the same
interval. The resulting unique ratios consist of 63 values that are less than 1
(meaning the duration of the current note is longer than the predecessor's),
another 63 larger than 1 (the current note is shorter than the predecessor), and
the ratio equal to 1 (durations are equal). These 127 ratios occupy half of the
available byte values; the remaining byte values are used analogously as symbols
to encode the values of reltiming.
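Under these conventions, the dissimilarity computation can be sketched in a few lines. This is our own illustration, not the authors' Matlab code: Python's bz2 module stands in for the bzip2 compressor, the function names are ours, and the melody literals are made up.

```python
import bz2

def compressed_size(data: bytes) -> int:
    """C(x): compressed size of x, here using bzip2 as in the experiments."""
    return len(bz2.compress(data))

def ncd(x: bytes, y: bytes) -> float:
    """Normalized Compression Distance:
    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx, cy, cxy = compressed_size(x), compressed_size(y), compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)

def encode_pitch_changes(pitches):
    """Map each relative pitch change to a byte:
    [1..127] = decrease, 128 = 'no change', [129..255] = increase."""
    return bytes(128 + (b - a) for a, b in zip(pitches, pitches[1:]))

# Two hypothetical melodies with the same contour, transposed by 5 semitones:
melody = [60, 62, 64, 65, 67, 65, 64, 62] * 24
transposed = [p + 5 for p in melody]
# Their relative-pitch strings are identical, so the NCD is small.
print(ncd(encode_pitch_changes(melody), encode_pitch_changes(transposed)))
```

The relative encoding makes the representation transposition invariant, which is why the two strings above coincide.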
Besides information about the instrumentation, our coding scheme also disre-
gards musical phrasings like pitch-bending, vibrato, etc. We are currently work-
ing on an algorithm to correctly convert seamless pitch transitions (i.e., bendings)
into discrete notes. After the conversion, our algorithm would treat the emulated
pitch-bending like normal notes. This opens the way to further tests of our dis-
similarity measure on datasets of mainstream and popular music. In popular
music, phrasings are usually very important in the melodic progression, espe-
cially in the notation of the vocal/singing voices.
3 Experiments
We implemented our algorithms in Matlab 7.5 and used the Matlab MIDI
Toolbox [8] to read the MIDI files. To show the performance of the introduced
48 B. Mokbel, A. Hasenfuss, and B. Hammer
[Figure: visualization of the music dataset, with each piece labeled by its composer (Bach, Beethoven, Buxtehude, Chopin, Debussy, Haydn, Mozart).]
[Figure: two pitch-over-time plots of example note sequences, annotated with the relative pitch changes between successive notes.]
References
1. Birmingham, W., Dannenberg, R., Pardo, B.: Query by humming with the vocal-
search system. Commun. ACM 49(8), 49–52 (2006)
2. Cataltepe, Z., Yaslan, Y., Sonmez, A.: Music genre classification using midi and
audio features. EURASIP Journal on Advances in Signal Processing, Article ID
36409 (2007)
3. Cilibrasi, R., Vitányi, P.: Clustering by compression. IEEE Transactions on Infor-
mation Theory 51(4), 1523–1545 (2005)
4. Cilibrasi, R., Vitányi, P., de Wolf, R.: Algorithmic clustering of music based on
string compression. Computer Music Journal 28(4), 49–67 (2004)
5. Cruz-Alcázar, P.P., Vidal, E.: Two grammatical inference applications in music
processing. Applied Artificial Intelligence 22(1&2), 53–76 (2008)
6. Di Lorenzo, P., Di Maio, G.: The hausdorff metric in the melody space: A new
approach to melodic similarity. In: ICMPC (2006)
7. Dorrell, P.: What Is Music?: Solving a Scientific Mystery. Lulu (print on demand)
(2005)
8. Eerola, T., Toiviainen, P.: MIDI Toolbox: MATLAB Tools for Music Research.
University of Jyvaskyla (2004)
9. Hammer, B., Hasenfuss, A.: Relational neural gas. In: Hertzberg, J., Beetz, M.,
Englert, R. (eds.) KI 2007. LNCS (LNAI), vol. 4667, pp. 190–204. Springer, Hei-
delberg (2007)
10. Hammer, B., Hasenfuss, A.: Relational neural gas. In: Hertzberg, J., Beetz, M.,
Englert, R. (eds.) KI 2007. LNCS (LNAI), vol. 4667, pp. 190–204. Springer, Hei-
delberg (2007)
11. Klapuri, A., Davy, M. (eds.): Signal Processing Methods for Music Transcription.
Springer, New York (2006)
12. Li, M., Vitányi, P.: An Introduction to Kolmogorov Complexity and Its Applica-
tions. Springer, Heidelberg (1997)
13. Londei, A., Loreto, V., Belardinelli, M.O.: Musical style and authorship catego-
rization by informative compressors. In: Proc. ESCOM Conference (2003)
14. Neuhaus, M., Bunke, H.: Bridging the Gap Between Graph Edit Distance and
Kernel Machines. World Scientific, Singapore (2007)
15. Pardo, B., Birmingham, W.P.: Algorithms for chordal analysis. Computer Music
Journal 26(2), 27–49 (2002)
16. Pinto, A., van Leuken, R.H., Demirci, M.F., Wiering, F., Veltkamp, R.C.: Index-
ing music collection through graph spectra. In: Proceedings of the International
Conference of Music Information Retrieval (2007)
17. Romming, C.A., Selfridge-Field, E.: Algorithms for polyphonic music retrieval:
The hausdorff metric and geometric hashing. In: Proceedings of the International
Conference of Music Information Retrieval (2007)
18. Ruppin, A., Yeshurun, H.: Midi music genre classification by invariant features. In:
Proceedings of the International Conference of Music Information Retrieval (2006)
19. Ukkonen, E., Lemström, K., Maekinen, V.: Geometric algorithms for transposi-
tion invariant content-based music retrieval. In: Proceedings of the International
Conference of Music Information Retrieval (2003)
20. Woodruff, J., Pardo, B.: Using pitch, amplitude modulation, and spatial cues for
separation of harmonic instruments from stereo music recordings. EURASIP Jour-
nal on Advances in Signal Processing, Article ID 86369 (2007)
Graph-Based Analysis of Nasopharyngeal
Carcinoma with Bayesian Network
Learning Methods
Alex Aussem¹, Sergio Rodrigues de Morais¹, Marilys Corbex², and Joël Favrel¹
¹ University of Lyon, LIESP, Université de Lyon 1, F-69622 Villeurbanne, France
aaussem@univ-lyon1.fr
² International Agency for Research on Cancer (IARC),
150 cours Albert Thomas, F-69280 Lyon Cedex 08, France
CORBEXM@emro.who.int
1 Introduction
The identification of relevant subsets of risk factors that are not captured by
traditional statistical testing is a topic of considerable interest within the epi-
demiologic community. It is also a very challenging topic of pattern recognition
research that has attracted much attention in recent years [1,2]. In this study,
we apply a new graphical framework for extracting the relevant risk factors that
are statistically associated with Nasopharyngeal Carcinoma (NPC), based on
a case-control epidemiologic study. The database consists of 1289 subjects (664
cases of NPC and 625 controls) and 150 binary random variables.
Nasopharyngeal Carcinoma (NPC for short) is a malignancy with unusually
variable incidence rates across the world. In most parts of the world it is a
rare disease but in some regions it occurs in an endemic form. Endemic regions
include the southern parts of China, other parts of south-east Asia and North
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 52–61, 2009.
© Springer-Verlag Berlin Heidelberg 2009
2 Preliminaries
For the paper to be accessible to those outside the domain, we recall first the
principles of Bayesian networks. In this paper, we only deal with discrete random
variables. Formally, a BN is a tuple <G, P>, where G = <V, E> is a directed
acyclic graph (DAG) with nodes representing the random variables V and P a
joint probability distribution on V. A BN structure G entails a set of conditional
independence assumptions. They can all be identified by the d-separation crite-
rion [8]. We use X ⊥G Y |Z to denote the assertion that X is d-separated from
the algorithm to handle large neighborhoods while still being correct under the
faithfulness condition. The theorem below (see [6] for the proof) establishes
HPC's correctness under the faithfulness condition:
Theorem 2. Under the assumptions that the independence tests are reliable
and that the database is an independent and identically distributed sample from
a probability distribution P faithful to a DAG G, HPC(T ) returns PCT .
4 Experiments
Before we proceed to the experiments with HPC on the NPC database, we
report some results on synthetic data that are independent and identically dis-
tributed samples from well known BN benchmarks ALARM, CHILD, INSUR-
ANCE, GENE and PIGS. The aim is to evaluate empirically the inevitable errors
that will arise on our epidemiologic data. Therefore, we consider the same sample
size as the NPC data to get an empirical estimate of the accuracy of HPC on the
NPC data. To implement the conditional independence test, we calculate the G2
statistic as in [12], under the null hypothesis of conditional independence. The
significance level of the test is fixed to 0.05 for all algorithms. The test is deemed
unreliable when the number of instances is less than ten times the number of
degrees of freedom.
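For binary variables, the G2 test just described can be sketched as follows. This is our own illustration, not the authors' implementation: the function and variable names are ours, and the hard-coded constants are the standard 95% chi-square quantiles for the degrees of freedom that arise with binary data.

```python
import math
from collections import Counter
from itertools import product

# 95% chi-square quantiles for the degrees of freedom arising with binary variables
CHI2_CRIT_05 = {1: 3.841, 2: 5.991, 4: 9.488, 8: 15.507, 16: 26.296}

def g2_test(data, x, y, z):
    """G2 conditional independence test of x and y given the set z on binary data.

    data is a list of dicts mapping variable names to 0/1.  Returns a pair
    (independent, reliable); as in the text, the test is deemed unreliable
    when the number of instances is below ten times the degrees of freedom.
    """
    n = len(data)
    counts = Counter((tuple(r[v] for v in z), r[x], r[y]) for r in data)
    g2 = 0.0
    for zval in {k[0] for k in counts}:
        nz = sum(c for (zv, _, _), c in counts.items() if zv == zval)
        for xv, yv in product((0, 1), repeat=2):
            o = counts.get((zval, xv, yv), 0)              # observed count
            nx = sum(c for (zv, xa, _), c in counts.items() if zv == zval and xa == xv)
            ny = sum(c for (zv, _, ya), c in counts.items() if zv == zval and ya == yv)
            e = nx * ny / nz                               # expected under independence
            if o > 0 and e > 0:
                g2 += 2.0 * o * math.log(o / e)
    dof = 2 ** len(z)        # (|X|-1)(|Y|-1)|dom(Z)| with binary variables
    reliable = n >= 10 * dof
    return g2 <= CHI2_CRIT_05.get(dof, float("inf")), reliable
```

With X = Y in every record the statistic is large and independence is rejected; with exactly balanced counts within each stratum of Z, G2 is zero and independence is accepted.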
Fig. 2. Local PDAG around variable NPC obtained by HPC. A selection of 37 variables
out of 150 is shown for the sake of clarity. Line width is proportional to the G2 statistical
association measure. The links were partially directed by the domain expert. Dashed
nodes and arrows denote latent variables that were added by the expert for the sake of
clarity, coherence and conciseness.
the variable “bad habits” is a common “cause” of alcohol, cannabis and tobacco;
the principle of a “healthy diet” is clearly to eat “fruits” and “vegetables”; in-
dustrial workers (associated with the variable “working in industry”) are exposed
to noxious chemicals and poisonous substances that are often used in the course
of manufacturing, etc. Now, adding a parent node (the cause) explains the
correlation between its child variables (the effects).
We now turn to the epidemiological interpretation of the PDAG. As may
be seen, the extracted variables provide a coherent picture of the population
under study. The NPC variable is directly linked to 15 variables: chemical
products, pesticides, fume intake, dust exposure, number of NPC cases in the
family, diabetes, otitis, other disease, kitchen ventilation, burning incense and
perfume, sheep fat, house-made proteins, industrial harissa, traditional treat-
ments during childhood and cooked vegetables. More specifically, the graph re-
veals that people exposed to dust, pesticide and chemical products are much
more likely to have NPC. Indeed, industrial workers are often exposed to noxious
chemicals and poisonous substances that are used in the course of manufactur-
ing etc. The PDAG also suggests that pesticides may be a contributing factor
for NPC along with other factors such as chemical manure exposure and hav-
ing a family history of NPC. Consumption of a number of preserved food items
(variables “house made proteins”, “sheep fat” and “harissa” in the PDAG) was
already found to be a major risk factor for NPC [13,14,15]. Consumption of
“cooked vegetables” was also shown to be associated with reduced risk of NPC in
[14]. There is also strong evidence that intense exposure to smoke particles from
incomplete combustion of coal and wood (as occurs under occupational settings;
variables “burning incense” and “ventilation” in the graph) is associated with a
duration-dependent, increased risk of NPC [16]. In [17], the authors show that do-
mestic fume intake from wood fire and cooking with kanoun (i.e., compact sized
oven) is significantly associated with NPC risk. Apart from smoke particles, long
term use of incense is also known to increase the risk of developing cancers of the
respiratory tract. Therefore, the CPDAG supports previous findings that some
occupational inhalants are risk factors for NPC. The rest of the graph is also in-
formative and the edges lend themselves to interpretation. For instance, gender,
cigarette smoking and alcohol drinking are highly correlated with lifestyle habits in
the Maghrebian societies, but not with NPC. It was shown that NPC is less sensi-
tive to the carcinogenic effects of tobacco constituents [13], and that alcohol has
a marginal effect on NPC [17]. Poor housing condition is characterized by over-
crowding and lack of ventilation. Instruction, lodging conditions and professional
category are correlated. Consumption of traditional food (spicy food, house-made
proteins and harissa) is related to consumption of traditional rancid fat (sheep fat,
smen) cooked with traditional techniques (kanoun, tabouna), etc.
5 Conclusion
We discussed in this paper the situation where NPC survey data are passed to
a graphical discovery process to infer the risk factors associated with NPC. The
extracted features match previous biological findings and open new hypotheses
for future studies.
Acknowledgment
This work is supported by the “Ligue contre le Cancer, Comité du Rhône, France”.
The NPC data was kindly supplied by the International Agency for Research on
Cancer, Lyon, France.
References
1. Nilsson, R., Peña, J.M., Björkegren, J., Tegnér, J.: Consistent feature selection for
pattern recognition in polynomial time. Journal of Machine Learning Research 8,
589–612 (2007)
2. Peña, J.M., Nilsson, R., Björkegren, J., Tegnér, J.: Towards scalable and data effi-
cient learning of Markov boundaries. International Journal of Approximate Rea-
soning 45(2), 211–232 (2007)
3. Peña, J.M., Björkegren, J., Tegnér, J.: Growing Bayesian network models of gene
networks from seed genes. Bioinformatics 40, 224–229 (2005)
4. Guyon, I., Aliferis, C., Cooper, G., Elisseeff, A., Pellet, J.P., Statnikov, A.: Design
and analysis of the causation and prediction challenge. In: JMLR: Workshop and
Conference Proceedings, vol. 1, pp. 1–16 (2008)
5. Aussem, A., Rodrigues de Morais, S., Perraud, F., Rome, S.: Robust gene selec-
tion from microarray data with a novel Markov boundary learning method: Appli-
cation to diabetes analysis. In: European Conference on Symbolic and Quantitative
Approaches to Reasoning with Uncertainty ECSQARU 2009 (2009) (to appear)
6. Rodrigues de Morais, S., Aussem, A.: A novel scalable and data efficient feature
subset selection algorithm. In: European Conference on Machine Learning and
Principles and Practice of Knowledge Discovery in Databases ECML-PKDD 2008,
Antwerp, Belgium, pp. 298–312 (2008)
7. Rodrigues de Morais, S., Aussem, A.: A novel scalable and correct Markov bound-
ary learning algorithm under faithfulness condition. In: 4th European Workshop
on Probabilistic Graphical Models PGM 2008, Hirtshals, Denmark, pp. 81–88
(2008)
8. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible
Inference. Morgan Kaufmann, San Francisco (1988)
9. Neapolitan, R.E.: Learning Bayesian Networks. Prentice Hall, Englewood Cliffs
(2004)
10. Chickering, D.M., Heckerman, D., Meek, C.: Large-sample learning of Bayesian
networks is NP-hard. Journal of Machine Learning Research 5, 1287–1330 (2004)
11. Tsamardinos, I., Brown, L.E., Aliferis, C.F.: The max-min hill-climbing Bayesian
network structure learning algorithm. Machine Learning 65(1), 31–78 (2006)
12. Spirtes, P., Glymour, C., Scheines, R.: Causation, Prediction, and Search, 2nd edn.
The MIT Press, Cambridge (2000)
13. Yu, M.C., Yuan, J.-M.: Epidemiology of nasopharyngeal carcinoma. Seminars in
Cancer Biology 12, 421–429 (2002)
14. Feng, B.J., et al.: Dietary risk factors for nasopharyngeal carcinoma in Maghrebian
countries. International Journal of Cancer 121(7), 1550–1555 (2007)
15. Jeannel, D., et al.: Diet, living conditions and nasopharyngeal carcinoma in Tunisia:
a case-control study. Int. J. Cancer 46, 421–425 (1990)
16. Armstrong, R.W., Imrey, P.B., Lye, M.S., Armstrong, M.J., Yu, M.C.: Nasopharyn-
geal carcinoma in Malaysian Chinese: occupational exposures to particles, formalde-
hyde and heat. Int. J. Epidemiol. 29, 991–998 (2000)
17. Feng, B.J., et al.: Cannabis smoking and domestic fume intake are associated with
nasopharyngeal carcinoma in North Africa (2009) (submitted)
Computing and Visualizing a Graph-Based
Decomposition for Non-manifold Shapes
1 Introduction
Non-manifold models were introduced in geometric modeling a long time ago.
They are relevant in describing the shape of mechanical models, which are usu-
ally represented as volumes, surfaces and lines connected together. Informally,
a manifold (with boundary) M is a compact and connected subset of the Eu-
clidean space for which the neighborhood of each point of M is homeomorphic
to an open ball (or to an open half-ball). Shapes that do not fulfill this property
at one or more points are called non-manifold.
Non-manifold shapes are usually discretized as cell or simplicial complexes and
arise in several applications, including finite element analysis, computer-aided
manufacturing, rapid prototyping, reverse engineering, and animation. In Computer
Aided Design (CAD), non-manifold shapes are usually obtained through an ide-
alization process which consists of operations, such as removal of details, hole
removal, or reduction in the dimensionality of some parts. For instance, parts pre-
senting a beam behavior in an object can be replaced with one-dimensional en-
tities, and parts presenting a plate behavior can be replaced by two-dimensional
surfaces. This process reduces the complexity of the object, thus resulting in a
representation which captures only its essential features.
A natural way to deal with the intrinsic complexity of modeling non-manifold
shapes consists of considering a topological decomposition of the shape into
manifold or “almost” manifold parts. We consider here a decomposition of a
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 62–71, 2009.
© Springer-Verlag Berlin Heidelberg 2009
2 Related Work
Shape analysis is an active research area in geometric and solid modeling, com-
puter vision, and computer graphics. The major approaches to shape analysis
64 L. De Floriani, D. Panozzo, and A. Hui
are based on computing the decomposition of a shape into simpler parts. Such ap-
proaches are either interior-based, or boundary-based [5]. Interior-based
approaches implicitly partition the volume of a shape by describing it as a geo-
metric, or a topological skeleton [6]. Boundary-based methods provide a decom-
position of the boundary of an object into parts, by considering local properties
of the boundary of the shape, such as critical features or curvature. These lat-
ter methods aim at decomposing an object into meaningful components, i.e.,
components which can be perceptually distinguished from the remaining part
of the object. Boundary-based methods have been developed in CAD/CAM for
extracting form features and produce a boundary-based decomposition of a 3D
object guided by geometric, topological and semantic criteria [7].
All shape segmentation and feature extraction algorithms, however, work on
manifold shapes. Only a few techniques have been proposed in the literature for
decomposing the boundary of regular non-manifold 3D shapes [8, 9].
The partition of an analytic variety into analytic manifolds, called a strati-
fication, has been studied in mathematics to investigate the properties of such
varieties [10]. A stratification expresses the variety as the disjoint union of a
locally finite set of analytic manifolds, called strata. Pesco et al. [11] introduced
the concept of combinatorial stratification as the basis for a data structure for
representing non-manifold 3D shapes described by their boundary. The combi-
natorial stratification for a cell complex is a collection of manifold sub-complexes
of different dimensions, the union of which forms the original complex. A com-
binatorial stratification as discussed in [11], however, is not unique.
3 Background Notions
incident into σ. Any simplex σ such that star(σ) contains only σ is called a top
simplex. A simplicial d-complex in which all top simplexes are of dimension d
is called regular, or of uniform dimension. An h-path in a simplicial d-complex
Σ joining two (h+1)-simplexes in Σ, where h = 0, 1, ..., d − 1, is a path formed
by an alternating sequence of h-simplexes and (h+1)-simplexes. A complex Σ
is said to be h-connected if and only if there exists an h-path joining every pair
of (h+1)-simplexes in Σ. A subset Σ′ of Σ is a sub-complex if Σ′ is a simplicial
complex. Any maximal h-connected sub-complex of a d-complex Σ is called an
h-connected component of Σ.
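As an illustration of these notions, the h-connected components of the top simplexes can be computed with a union-find pass over shared h-faces; this is a sketch under our own representation of simplexes as vertex-id tuples, not an algorithm from the paper.

```python
from collections import defaultdict
from itertools import combinations

def h_connected_components(simplices, h):
    """Partition the (h+1)-simplexes of a complex into h-connected components.

    Simplexes are given as tuples of vertex ids; an (h+1)-simplex has h+2
    vertices.  Two (h+1)-simplexes are h-connected exactly when they are
    linked, transitively, through shared h-faces, which is equivalent to the
    alternating h-path definition above.
    """
    top = [frozenset(s) for s in simplices if len(s) == h + 2]
    by_face = defaultdict(list)              # h-face -> incident (h+1)-simplexes
    for i, s in enumerate(top):
        for face in combinations(sorted(s), h + 1):
            by_face[face].append(i)
    parent = list(range(len(top)))           # union-find over the top simplexes
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]    # path halving
            i = parent[i]
        return i
    for members in by_face.values():
        for i in members[1:]:
            parent[find(i)] = find(members[0])
    components = defaultdict(set)
    for i, s in enumerate(top):
        components[find(i)].add(tuple(sorted(s)))
    return list(components.values())

# Two triangles sharing only a vertex are 0-connected but not 1-connected:
print(len(h_connected_components([(0, 1, 2), (2, 3, 4)], 1)))  # 2 components
```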
belonging to C. Figure 1(c) shows the MC-decomposition graph for the pinched
torus depicted in Figure 1(a): the graph contains one self-loop corresponding to
the non-manifold edges and vertices forming the non-manifold singularity in the
shape.
Fig. 2. A simplicial 2-complex (a), its corresponding MC-decomposition graph (b) and
the exploded version of the MC-decomposition graph (c)
Figure 3 depicts a screenshot from GCViewer showing the original shape, its
MC-decomposition (into twelve MC-components), the MC-decomposition graph
and its exploded version. The MC-decomposition is shown in the original shape
by assigning different colors to the components. Note that the MC-components
are the back, the seat, the two armrests, the four legs and four pieces which
connect the legs to the seat.
Figure 4(a) shows a shape formed by two bottles connected by two laminas
(2-dimensional MC-components), plus the caps, each of which consists of two
MC-components. The two bottles with the two laminas form a 1-cycle in the
shape. This is reflected in the cycle in the MC-decomposition graph, shown in
Figure 4(c). As shown by this example, there is a relation between the cycles in
the graph and the 1-cycles in the original shape which is not, however, a one-to-
one correspondence. Not all the cycles in the graph correspond to 1-cycles in the
shape, as shown in the example of Figure 3. 1-cycles in the shape that appear
as cycles in the MC-decomposition graph are those containing non-manifold
singularities. We are currently investigating the relation of the 1-cycles in the
shape with the properties of the MC-decomposition graph.
Binary beta versions of the visualization tool and of the MC-decomposition
algorithm are available at http://www.disi.unige.it/person/PanozzoD/mc/.
7 Concluding Remarks
Acknowledgements
This work has been partially supported by the MIUR-FIRB project SHALOM
under contract number RBIN04HWR8.
References
1. Hui, A., De Floriani, L.: A two-level topological decomposition for non-manifold
simplicial shapes. In: Proceedings of the 2007 ACM Symposium on Solid and Phys-
ical Modeling, Beijing, China, pp. 355–360 (June 2007)
2. De Floriani, L., Panozzo, D., Hui, A.: A dimension-independent data structure for
simplicial complexes (in preparation)
3. Léon, J.C., De Floriani, L.: Contribution to a taxonomy of non-manifold models
based on topological properties. In: Proceedings CIE 2008. ASME 2008 Computers
and Information in Engineering Conference, New York City, USA, August 3-6
(2008)
4. Crovetto, C., De Floriani, L., Giannini, F.: Form features in non-manifold shapes:
A first classification and analysis. In: Eurographics Italian Chapter Conference,
Trento, Italy, Eurographics, February 14–16, pp. 1–8 (2007)
5. Shamir, A.: Segmentation and shape extraction of 3D boundary meshes. In: State-
of-the-Art Report, Eurographics, Vienna, Austria, September 7 (2006)
6. Cornea, N., Silver, D., Min, P.: Curve-skeleton properties, applications and algo-
rithms. IEEE Transactions on Visualization and Computer Graphics 13(3), 530–
548 (2007)
7. Shah, J., Mantyla, M.: Parametric and feature-based CAD/CAM: concepts, tech-
niques and applications. John Wiley, Interscience (1995)
8. Falcidieno, B., Ratto, O.: Two-manifold cell-decomposition of R-sets. In: Kilgour,
A., Kjelldahl, L. (eds.) Proceedings Computer Graphics Forum, vol. 11, pp. 391–
404 (September 1992)
9. Rossignac, J., Cardoze, D.: Matchmaker: manifold BReps for non-manifold R-sets.
In: Bronsvoort, W.F., Anderson, D.C. (eds.) Proceedings Fifth Symposium on Solid
Modeling and Applications, pp. 31–41. ACM Press, New York (1999)
10. Whitney, H.: Local properties of analytic varieties. In: Cairns, S.S. (ed.) Differential
and combinatorial topology, A Symposium in Honor of Marston Morse, pp. 205–
244. Princeton University Press, Princeton (1965)
11. Pesco, S., Tavares, G., Lopes, H.: A stratification approach for modeling two-
dimensional cell complexes. Computers and Graphics 28, 235–247 (2004)
12. Agoston, M.: Computer Graphics and Geometric Modeling. Springer, Heidelberg
(2005)
A Graph Based Data Model for Graphics Interpretation
Endre Katona
1 Introduction
Although a lot of papers have been published presenting graphics interpretation
systems [5, 6], most of them concentrate on algorithmic questions, and less attention
is paid to data storage and handling. In this section we give an overview of the data
models that we investigated when creating our own model.
It is natural to use a graph representation for a vectorized drawing.
Lladós et al. [11] define an attributed graph, and after extracting minimum closed
loops, a region adjacency graph is generated. This model concentrates on region
matching, and this fact restricts its applicability. A somewhat similar approach is
given in [15], defining an attributed relational graph where nodes represent shape
primitives and edges correspond to relations between primitives. A special approach
is applied in [1]: after an initial run-graph vectorization, a mixed graph representation
is used as an interface between raster and vector data.
Some interpretation systems use relational database tables to store geometric in-
formation of vectorized maps [2, 18]. The advantage of this approach is that commer-
cial database management systems can be used to handle data. Although there are
existing techniques to store spatial data in relational and object-relational tables [14],
it is clear that the relational model is not the best choice for graphics interpretation.
Object-oriented models are more flexible than relational ones and can be used in
graphics recognition [3] as well as in GIS (Geographical Information System) ap-
proaches. Object-oriented GIS typically uses a hierarchy of spatial object types, such
as defined in the Geometry Object Model of the Open Geospatial Consortium [14]
supporting interoperability of different systems. The object-oriented concept is excel-
lent for high level description, but it does not support low level algorithmic efficiency
during recognition.
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 72–81, 2009.
© Springer-Verlag Berlin Heidelberg 2009
– Zn denotes the set of normal objects; they are used to describe the current drawing.
– Zs denotes the set of sample objects, giving an a priori knowledge description.
For instance, sample objects can describe a symbol library of the map legend
or vector fonts of a given language.
Each object has the structure (id, layer, references) where id is the object identifier
number, and layer is a CAD-like attribute to classify objects. Normal objects have 0,
1, 2, etc. layer numbers, while sample objects are kept in a distinguished layer S.
Layer 0 is reserved for unrecognized objects. Each object may have references to
other objects using their id’s. The set of references R ⊂ Z × Z forms a directed
acyclic graph, termed the reference graph. Two types of references can be distinguished:
– A “contains” reference means that the current object involves the referred object
as a component. Rc ⊂ Z × Z denotes the set of “contains” references.
– A “defined by” reference means that the current object is a transformed version
of a referred sample object. Such references are mainly used to describe
recognized instances of sample objects. Rd ⊂ Z × Zs denotes the set of “defined
by” references, and R = Rc ∪ Rd holds.
Let domain(u) denote the set of all objects v that have a reference path from u to v,
and scope(u) the set of all objects v with a reference path from v to u. The notations
domainc(u) and scopec(u) denote the restrictions to “contains” reference paths. For
any sample object s, domain(s) ⊆ Zs is required.
The DG model contains three basic object types (instances of each may be normal
or sample objects as well):
– A NODE object represents a point with coordinates; a NODE instance is denoted
as node(x, y). Normally, a NODE has no references to other objects.
– An EDGE object is a straight line section given by “contains” references to the
endpoints. An EDGE instance is denoted as edge(node1, node2). A “line width”
attribute may be attached, if necessary.
– A PAT (pattern) object represents a set of arbitrary DG-objects given by “contains”
references to its components. A PAT instance is denoted as pat(obj1,..., objn).
Coordinates (x, y) of a center point may be attached to the PAT, usually giving the
“center of gravity” or other characteristic point of the pattern.
At a first look, NODE and EDGE objects form a usual graph structure describing
the drawing after initial vectorization. A PAT object typically contains a set of edges
identifying a recognized pattern on the drawing, but PATs can be utilized also in very
different ways.
The object type NODE has an important subtype, termed TEXT. Basically it repre-
sents a recognized inscription on the drawing, defined as a special transformation of a
vector font. (Vector fonts are given as sample objects.) Generalizing this idea, a
TEXT object can be used to describe a transformed instance of any other sample
object, as will be shown in Section 3. A TEXT instance is denoted as text(sample, x,
y, T, string) where sample is a “defined by” reference to a sample object, x and y are
coordinates of the insertion point, T is a transformation usually given by an enlarge-
ment factor and rotation angle, and string is an ASCII sequence of characters.
If string is given, then sample refers to a vector font pat(letter1,..., letterm) where,
for any i, letteri is a pat(edge1, edge2,...) object defining a character shape. In this case
text(sample, x, y, T, string) describes a recognized inscription on the drawing.
If string is omitted, then sample refers to the description of a certain symbol. For in-
stance, sample may refer to a pat(edge1,..., edgen) object giving vector description of a
map symbol with (0, 0) coordinates as center point. In this case text(sample, x, y, T)
describes a recognized instance of the symbol at point (x, y) transformed according to T.
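A minimal sketch of these object types follows. This is our own illustration in Python, not part of the paper: the class and field names are ours, and only the "contains"/"defined by" reference structure described above is modeled.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class DGObject:
    id: int
    layer: int                                          # 0 = unrecognized
    contains: List[int] = field(default_factory=list)   # "contains" references (Rc)
    defined_by: Optional[int] = None                    # "defined by" reference (Rd)

@dataclass
class Node(DGObject):
    x: float = 0.0
    y: float = 0.0

@dataclass
class Edge(DGObject):                                   # contains = [node1, node2]
    pass

@dataclass
class Pat(DGObject):                                    # contains = component objects
    center: Optional[Tuple[float, float]] = None

def domain_c(doc, uid):
    """domainc(u): all objects reachable from u along "contains" reference paths."""
    seen, stack = set(), list(doc[uid].contains)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(doc[v].contains)
    return seen

# A pat containing one edge between two nodes:
doc = {1: Node(1, 0, x=0.0, y=0.0), 2: Node(2, 0, x=1.0, y=0.0)}
doc[3] = Edge(3, 0, contains=[1, 2])
doc[4] = Pat(4, 1, contains=[3])
print(domain_c(doc, 4))  # {1, 2, 3}
```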
3 Interpretation Strategy
Initially the DG-document contains only sample objects coding prototypes of symbols
and characters to be recognized. Interpretation starts with some raw vectorization
process (see [20] for an overview of vectorization methods). As a result of the
vectorization, a NODE-EDGE graph description of the drawing is inserted in the
DG-document. At this moment all normal objects are in layer 0. The processing is
performed as a sequence of recognition steps, each step may consist of three phases:
1. Hypothesis generation. PAT objects are created in the DG-document. For in-
stance, if a set of edges e1,..., en is recognized as a map object, then a pat(e1,..., en)
is created with the layer number associated with the current map object type.
Such an operation does not change the underlying data, thus the hypothesis gen-
eration is a reversible step ensuring the possibilities of backtracking and ignoring.
PAT objects can describe a hierarchy of higher level structures like blocks and
entities in [19].
2. Verification of hypotheses can be made by the user or by a higher level algorithm:
PATs of false hypotheses are marked as “rejected” while correct ones as “accepted”.
3. Finalization. Rejected hypotheses are dropped and accepted ones are processed,
possibly making irreversible changes in the underlying data. In some cases final-
ization can be omitted or postponed, in this way preserving the possibility of
backtracking.
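The three-phase recognition step can be sketched in code. This is an illustrative reconstruction, not the paper's implementation; all names (`Pat`, `recognition_step`, `detect`, `verify`) are assumptions:

```python
# Hypothetical sketch of one recognition step in the DG model: hypothesis
# generation creates PAT objects (reversible, underlying data untouched),
# verification marks them "accepted" or "rejected", finalization drops the
# rejected ones. Class and function names are illustrative only.

class Pat:
    def __init__(self, layer, scope):
        self.layer = layer          # layer number of the recognized object type
        self.scope = list(scope)    # ids of the contained objects (e.g. edges)
        self.status = "hypothesis"  # -> "accepted" or "rejected"

def recognition_step(document, detect, verify):
    # Phase 1: hypothesis generation -- reversible, data unchanged.
    hypotheses = [Pat(layer, scope) for layer, scope in detect(document)]
    document.extend(hypotheses)
    # Phase 2: verification by the user or a higher-level algorithm.
    for pat in hypotheses:
        pat.status = "accepted" if verify(pat) else "rejected"
    # Phase 3: finalization -- rejected hypotheses are simply dropped.
    document[:] = [obj for obj in document
                   if getattr(obj, "status", "") != "rejected"]
    return document
```

Because phase 1 only adds PAT objects, undoing a hypothesis is just removing it, which is what makes backtracking cheap.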
The above procedure will be demonstrated on two examples.
Fig. 2. Example of text recognition using the DG model. Arrows denote references between
objects.
4 Implementation
The whole DG-document can be stored in RAM, since the DG description of a whole
map sheet normally does not exceed 10 Mbytes. The data structure consists of two
arrays, Obj and In. The Obj array contains the object descriptions, and In[k] gives
the starting address of the object with identifier k. (Note that the description of an
object does not contain its id.) This mode of storage ensures constant access time
along object references.
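A minimal sketch of this storage scheme; the list-based layout below is an assumption for illustration:

```python
# Assumed layout of the Obj/In scheme: Obj holds all object descriptions
# back to back, In[k] is the start offset of the object with identifier k,
# so following a reference costs O(1).

def build_storage(descriptions):
    """descriptions[k] is the sequence describing object k
    (the description itself does not contain the id)."""
    obj, index = [], []
    for desc in descriptions:
        index.append(len(obj))   # In[k]: starting address of object k
        obj.extend(desc)
    return obj, index

def fetch(obj, index, k):
    """Constant-time access to the description of object k."""
    start = index[k]
    end = index[k + 1] if k + 1 < len(index) else len(obj)
    return obj[start:end]
```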
To ensure computational efficiency, “contained in” references – as inverses of
“contains” references – should be applied in some cases. For instance, all NODE
objects should have “contained in” references to the connected EDGE objects, so
that efficient graph algorithms can be programmed. Note that in our implementation,
when necessary, all “contained in” references can be generated in linear time.
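Generating the inverse references in linear time amounts to a single pass over all “contains” references (names below are illustrative):

```python
# One pass over the "contains" references builds all inverse
# "contained in" references in O(total number of references).

def invert_references(contains):
    """contains: dict mapping object id -> list of contained object ids.
    Returns a dict mapping object id -> list of ids that contain it."""
    contained_in = {}
    for container, members in contains.items():
        for member in members:
            contained_in.setdefault(member, []).append(container)
    return contained_in
```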
Automatic interpretation always needs human control and corrections; therefore it
is important to display the current recognition state quickly on the monitor screen.
When displaying a DG-document, only straight line sections need to be drawn,
because all objects can be traced back to EDGE objects along “contains” and “defined
by” references. A color and line style are associated with each layer number (except
the S layer, because sample objects are not displayed). To demonstrate recognition
hypotheses on the screen, edges in layer 0 are displayed according to the maximum
layer number in their scope.
5 Spatial Indexing
The above DG implementation ensures fast data access along references, but spatial
searches, for instance to find the nearest node to a given node, may be very slow. The
problem can be solved by spatial indexing (for an overview see [16]). There are two
main types of spatial indexes: tree-structured indexes are based on hierarchical tiling
of the space (usually quadtrees are applied), while grid-structured indexes use homo-
geneous grid tiling.
Although quadtrees have nice properties in general, a grid index may be a better
choice for drawing interpretation, for the following reasons. On the one hand, drawing
density is limited by readability constraints; as a consequence, the number of objects
in a grid cell is bounded a priori. On the other hand, map interpretation algorithms
normally use a fixed search window (for instance, when recognizing dashed lines). A
grid index with a cell size close to the search window size can therefore work efficiently.
To discuss indexing techniques, we define the minimum bounding box (MBB) of
an object as the minimum enclosing rectangle whose edges are parallel to the
coordinate axes. Considering the DG model, the MBB of an object z can be determined
by computing the minimum and maximum coordinates of the nodes in domainc(z).
(The MBB can also be defined by domain(z); in this case the MBB of a TEXT object
involves not only the insertion point, but also the transformed vectors of the sample
object.)
Fig. 4 shows a grid index example of 3 × 3 tiles, where a list of object ids is created
for each grid cell. An id appears in the i-th list if the MBB of the object overlaps
Ci. In this way the same object id may appear in several lists.
78 E. Katona
C1: 1             C2: –        C3: 2
C4: 1, 3, 4, 6    C5: 6        C6: –
C7: 4, 5          C8: 5, 6     C9: 5
Our grid index implementation [10] ensures insertion of a new id in constant time.
As a consequence, a grid index for N objects can be generated in O(N) time, which
is better than the usual O(N⋅log N) building time for quadtrees.
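A sketch of such a grid index; the interface and the `GridIndex` name are assumptions, not the implementation from [10]:

```python
# Sketch of a grid spatial index: the map area is tiled by an equidistant
# grid and an object's id is appended, in constant time per cell, to every
# cell its minimum bounding box (MBB) overlaps.

class GridIndex:
    def __init__(self, xmin, ymin, cell, nx, ny):
        self.xmin, self.ymin, self.cell = xmin, ymin, cell
        self.nx, self.ny = nx, ny
        self.cells = [[] for _ in range(nx * ny)]

    def _clamp(self, v, hi):
        return max(0, min(v, hi - 1))

    def insert(self, obj_id, mbb):
        """mbb = (x0, y0, x1, y1); append obj_id to each overlapped cell."""
        x0, y0, x1, y1 = mbb
        i0 = self._clamp(int((x0 - self.xmin) // self.cell), self.nx)
        i1 = self._clamp(int((x1 - self.xmin) // self.cell), self.nx)
        j0 = self._clamp(int((y0 - self.ymin) // self.cell), self.ny)
        j1 = self._clamp(int((y1 - self.ymin) // self.cell), self.ny)
        for j in range(j0, j1 + 1):
            for i in range(i0, i1 + 1):
                self.cells[j * self.nx + i].append(obj_id)

    def query(self, x, y):
        """Return the id list of the cell containing point (x, y)."""
        i = self._clamp(int((x - self.xmin) // self.cell), self.nx)
        j = self._clamp(int((y - self.ymin) // self.cell), self.ny)
        return self.cells[j * self.nx + i]
```

Since each insertion touches only the cells overlapped by one MBB, building the index for N objects is linear in N for bounded-size MBBs.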
Processing one recognition step takes only a few seconds for a whole map sheet.
This supports interactivity and makes it possible to run rather complex algorithms
in realistic time.
7 Conclusions
A universal graph-based data model has been introduced for graphics interpretation.
The same data structure is used
- to describe the original (raw vectorized) drawing,
- to describe and display the recognized drawing,
- to support the recognition process as well as manual corrections.
Our specification is independent of recognition algorithms, but it suggests an
interpretation methodology on the one hand and provides a technical background on
the other.
A common difficulty when applying an interpretation system in practice is that the
user is not familiar with its internal algorithms and data structures, and therefore
cannot control the system optimally. We think that the basic ideas of the DG model
(with only four object types) are simple enough for the user to understand, and this
supports efficient interactive work.
References
1. Boatto, L., Consorti, V., Buono, M., Zenzo, S., Eramo, V., Esposito, A., Melcarne, F.,
Meucci, M., Morelli, A., Mosciatti, M., Scarci, S., Tucci, M.: An Interpretation System for
Land Register Maps. Computer 25(7), 25–33 (1992)
2. Chen, L.-H., Liao, H.-Y., Wang, J.-Y., Fan, K.-C., Hsieh, C.-C.: An Interpretation System
for Cadastral Maps. In: Proc. of 13th Internat. Conf. on Pattern Recognition, pp. 711–715.
IEEE Press, Los Alamitos (1996)
3. Delalandre, M., Trupin, E., Labiche, J., Ogier, J.M.: Graphical Knowledge Management in
Graphics Recognition Systems. In: Brun, L., Vento, M. (eds.) GbRPR 2005. LNCS,
vol. 3434, pp. 35–44. Springer, Heidelberg (2005)
4. Ebi, N.B.: Image Interpretation of Topographic Maps on a Medium Scale Via Frame-based
modelling. In: International Conference on Image Processing, vol. I, pp. 250–253. IEEE
Press, California (1995)
5. Graph-based Representations in Pattern Recognition. Series of conference proceedings.
LNCS, vol. 2726 (2003), vol. 3434 (2005), vol. 4538 (2007). Springer, Heidelberg (last
three volumes)
6. Graphics Recognition (series). Selected papers of GREC workshops. LNCS, vol. 3088
(2004), vol. 3926 (2006), vol. 5046 (2008). Springer, Heidelberg (last three volumes)
7. Hartog, J., Kate, T., Gerbrands, J.: Knowledge-Based Segmentation for Automatic Map In-
terpretation. In: Kasturi, R., Tombre, K. (eds.) Graphics Recognition 1995. LNCS,
vol. 1072, pp. 159–178. Springer, Heidelberg (1996)
8. Hoel, E., Menon, S., Morehouse, S.: Building a robust relational Implementation of To-
pology. In: Hadzilacos, T., Manolopoulos, Y., Roddick, J.F., Theodoridis, Y. (eds.) SSTD
2003. LNCS, vol. 2750, pp. 508–524. Springer, Heidelberg (2003)
9. Katona, E., Hudra, G.: An Interpretation System for Cadastral Maps. In: Proceedings of
10th International Conference on Image Analysis and Processing (ICIAP 1999), pp. 792–
797. IEEE Press, Los Alamitos (1999)
10. Katona, E.: Automatic map interpretation. Ph.D. Thesis (in Hungarian), University of
Szeged (2001)
11. Lladós, J., Sanchez, G., Marti, E.: A String-Based Method to Recognize Symbols and
Structural Textures in Architectural Plans. In: Chhabra, A.K., Tombre, K. (eds.) GREC
1997. LNCS, vol. 1389, pp. 91–103. Springer, Heidelberg (1998)
12. Messner, B.T., Bunke, H.: Automatic Learning and Recognition of Graphical Symbols in
Engineering Drawings. In: Kasturi, R., Tombre, K. (eds.) Graphics Recognition 1995.
LNCS, vol. 1072, pp. 123–134. Springer, Heidelberg (1996)
13. Niemann, H., Sagerer, G.F., Schröder, S., Kummert, F.: ERNEST: A Semantic Network
System for Pattern Understanding. IEEE Trans. on Pattern Analysis and Machine Intelli-
gence 12(9), 883–905 (1990)
14. Open Geospatial Consortium: Simple Features Specification for SQL – Version 1.1.,
http://www.opengeospatial.org
15. Qureshi, R.J., Ramel, J.Y., Cardot, H.: Graph Based Shapes Representation and Recogni-
tion. In: Escolano, F., Vento, M. (eds.) GbRPR 2007. LNCS, vol. 4538, pp. 49–60.
Springer, Heidelberg (2007)
A Graph Based Data Model for Graphics Interpretation 81
16. Samet, H.: Design and Analysis of Spatial Data Structures. Addison Wesley, Reading
(1989)
17. Schawemaker, J.G.M., Reinders, M.J.T.: Information Fusion for Conflict Resolution in
Map Interpretation. In: Chhabra, A.K., Tombre, K. (eds.) GREC 1997. LNCS, vol. 1389,
pp. 231–242. Springer, Heidelberg (1998)
18. Suzuki, S., Yamada, T.: MARIS: Map Recognition Input System. Pattern Recogni-
tion 23(8), 919–933 (1990)
19. Vaxiviere, P., Tombre, K.: Celesstin: CAD Conversion of Mechanical Drawings. Com-
puter 25(7), 46–54 (1992)
20. Wenyin, L., Dori, D.: From Raster to Vectors: Extracting Visual Information from Line
Drawings. In: Pattern Analysis and Applications, pp. 10–21. Springer, Heidelberg (1999)
Tracking Objects beyond Rigid Motion
1 Introduction
Tracking multiple features belonging to rigid as well as articulated objects is
a challenging task in computer vision. Features of rigid parts can change their
relative positions due to variable detection precision, or can become occluded. To
address this, one can use part-based models that tolerate small irregular shifts in
relative position (non-rigid motion) while still imposing the global structure, and
that can be extended to handle articulation.
One possibility to solve this task is to describe the relationships of the parts
of an object in a deformable configuration, a spring system. This was already
proposed in 1973 by Fischler and Elschlager [1]. Felzenszwalb et al. employed this
idea in [2] to do part-based object recognition for faces and articulated objects
(humans). Their approach is a statistical framework minimizing the energy of
the spring system learned from training examples using maximum likelihood
estimation. The energy of the spring system depends on how well the parts
match the image data and how well the relative locations fit into the deformable
model. Ramanan et al. apply in [3] the ideas from [2] in tracking people. They
model the human body with colored and textured rectangles, and look in each
frame for likely configurations of the body parts. Mauthner et al. present in [4]
an approach using a two-level hierarchy of particle filters for tracking objects
described by spatially related parts in a mass spring system.
Partially supported by the Austrian Science Fund under grants P18716-N13 and
S9103-N13.
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 82–91, 2009.
c Springer-Verlag Berlin Heidelberg 2009
Tracking Objects beyond Rigid Motion 83
In this paper we employ spring systems, but in comparison to the related work
we try to stress solutions that emerge from the underlying structure, instead of
using structure to verify statistical hypotheses. The approach presented here refines
the concepts in [5] and extends them to handle articulation. Initial thoughts
related to this work have been presented in the informal workshop [6]. The aim
is to successfully track objects, consisting of one or more rigid parts, undergo-
ing non-rigid motion. Every part is represented by a spring system encoding
the spatial relationships of the features describing it. For articulated objects,
the articulation points are found through observation of the behavior/motion of
the object parts over time. The articulation points are integrated into the spring
systems as additional distance constraints of the parts connected to them.
Looking at related work in a broader field, the work done in tracking and motion
analysis is also related to our approach. There is a vast amount of work in this
field, as can be seen in the surveys [7,8,9,10]; mentioning all of it would go beyond
the scope of this paper. It is interesting to note that early works date back to the
seventies, when Badler and Smoliar [11] discussed different approaches to representing
the information concerning the movement of the human body (as an articulated object).
The paper is organized as follows: Sec. 2 introduces tracking rigid parts with
a spring system. In Sec. 3 this concept is extended to tracking articulated ob-
jects. Experiments on real and synthetic videos and a discussion are in Sec. 4.
Conclusion and future plans can be found in Sec. 5.
where E(v) is the set of edges e incident to vertex v, k is the elasticity constant of
the edges in the structure, e is the edge length in the initial state and e′ the length
at a different point in time, and d(e, v) is the unit vector in the direction of edge e
that points toward v. Fig. 1 shows two simple examples of graph relaxation.
Fig. 1. Graph relaxation examples. B is the initial state of the vertex and B′ the
deformed one. The arrows visualize the structural offset vectors O(B′).
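Eq. 1 itself is not reproduced in this excerpt; the following sketch shows one plausible form of the structural offset under the definitions above. The sign convention and all names are assumptions:

```python
# Hedged sketch of a spring-system structural offset (the exact Eq. 1 is
# not visible here; the sign convention is an assumption): each edge
# incident to v contributes a force proportional to its deviation from the
# initial length, directed so that stretched edges pull v back.
import math

def structural_offset(v, pos, init_pos, edges, k=0.2):
    """pos/init_pos: dicts vertex -> (x, y); edges: list of vertex pairs."""
    ox = oy = 0.0
    for a, b in edges:
        if v not in (a, b):
            continue
        other = b if v == a else a
        dx, dy = pos[v][0] - pos[other][0], pos[v][1] - pos[other][1]
        length = math.hypot(dx, dy)
        init_len = math.hypot(init_pos[a][0] - init_pos[b][0],
                              init_pos[a][1] - init_pos[b][1])
        ux, uy = dx / length, dy / length   # unit vector d(e, v) toward v
        # stretched edge (length > init_len): offset points back toward the
        # other vertex; compressed edge: offset pushes v away
        ox -= k * (length - init_len) * ux
        oy -= k * (length - init_len) * uy
    return ox, oy
```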
For more details on the Bhattacharyya coefficient see [13]. The regions are ordered
in descending order of the Bhattacharyya coefficient, so that the iterations start
with the most confident regions.
To compute the position of each region (vertex in AG), Mean Shift offset and
structure-induced offset are combined using a mixing coefficient
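The mixing rule itself is elided in this excerpt; a plausible sketch (an assumption, including the name `combined_offset` and the coefficient `alpha`) is a convex combination of the two offsets:

```python
# Assumed form of the combination: a convex mix of the Mean Shift offset
# and the structure-induced offset, weighted by alpha in [0, 1].

def combined_offset(mean_shift_offset, structure_offset, alpha=0.5):
    """Both offsets are (dx, dy) tuples; returns the mixed offset."""
    mx, my = mean_shift_offset
    sx, sy = structure_offset
    return (alpha * mx + (1 - alpha) * sx,
            alpha * my + (1 - alpha) * sy)
```

With alpha near 1 the tracker trusts the image data; near 0 it trusts the spring structure.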
Tracking Objects beyond Rigid Motion 85
3 Imposing Articulation
Articulated motion is piecewise rigid motion: the rigid parts conform to the rigid
motion constraints, but the overall motion is not rigid [10]. An articulation point
connects several rigid parts. The parts can move independently of each other, but
their distance to the articulation point remains the same. This paper considers
articulation in the image plane (one degree of freedom).
As described in Sec. 2, the rigid parts of an articulated object are tracked
combining the forces of the deterministic tracker and the graph structure. To
integrate articulation, two vertices of each rigid part are connected with the
common articulation point.1 These two reference vertices constrain the distance
of all other vertices of the same part to the articulation point. The reference
vertices are directly influenced by the articulation point and propagate the “in-
formation” from the other connected parts during tracking.
Each rigid part is iteratively optimized as explained in Sec. 2 and for artic-
ulated objects the articulation points are integrated into this process through
their connection to the reference vertices.
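A minimal sketch of such a distance constraint (an assumed form; `enforce_distance` is an illustrative name): after each update, a reference vertex can be projected back onto the circle of its initial distance around the articulation point.

```python
# Assumed enforcement of the articulation distance constraint: project the
# reference vertex back onto the circle of radius target_dist around the
# current articulation point, keeping its direction.
import math

def enforce_distance(vertex, articulation, target_dist):
    vx, vy = vertex
    ax, ay = articulation
    dx, dy = vx - ax, vy - ay
    d = math.hypot(dx, dy)
    if d == 0:
        return vertex  # direction undefined; leave unchanged
    scale = target_dist / d
    return (ax + dx * scale, ay + dy * scale)
```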
Important features of the structure of an object do not necessarily correspond
to easily trackable visual features; articulation points, for example, can be occluded
or hard to track and localize. Articulation points are thus not associated with a
tracked region (as opposed to the tracked features of the rigid parts). The position
of the articulation points is determined in an initial frame (see Sec. 3.1) and
used in the rest of the video (see Sec. 3.2).
where Zi is the sum of all Bhattacharyya coefficients (see Eq. 2) of part i with vi
regions/vertices, m is the number of adjacent regions, and ai is the gain for part i.
Fig. 2. Encoding and deriving of an articulation point in the local coordinate system,
during two time steps: t and t + δ
4 Experiments
The following experiments show sequences with one articulation point. More
articulation points can be handled by pairwise processing of all adjacent rigid
parts (a more efficient strategy is planned). In all experiments we employ a priori
knowledge about the structure of the target object (number of rigid parts and
articulation points). A method like the one in [14] could be used to automatically
delineate the rigid parts and articulation points of an object. The elasticity
constant k (see Eq. 1) is set to 0.2 for all experiments (this value was selected
empirically).
88 N. Artner, A. Ion, and W.G. Kropatsch
Fig. 3. Experiment 1. Tracking non-rigid motion without (top row) and with structure
(bottom row). Frame 25 in the bottom row shows how the graph should look.
Table 1. Sum of spatial deviations in pixels from ground truth for experiment 1
Fig. 4. Experiment 2. Top row: with structure and articulation point. Bottom row:
without structure. The red star-like symbol represents the estimated articulation point.
Fig. 5. Experiment 3. Top row without articulation point and bottom row with.
4.1 Discussion
The Mean Shift tracker fits very well into our approach, as the spring system
optimization is also iterative and we are able to re-initialize Mean Shift at any
given state of a vertex in the spring system. Another tracker with the same
Fig. 6. Spatial deviation for each region. (a) without and (b) with articulation point.
The big deviations are a result of the full occlusion in frame 8 in Fig. 5.
properties could also be used. As tracking with Mean Shift is used to solve the
association task (avoiding complex graph matching), the success of this approach
is highly dependent on the results of the trackers. It is necessary that at least
part of the vertices of the spring system can be matched.
The current approach extends the rigid structure to handle articulation. This
only imposes a distance constraint and does not consider any information related
to the motion of the parts. During an occlusion the articulation point improves
the reconstruction of the positions of the occluded regions. Nevertheless, the
distance constraint introduced by the articulation point is not always enough
to successfully estimate the positions (it is sufficient for translations, but not
for rotations of parts). For example, if one of two rigid parts of an object is
completely occluded and the occluded part undergoes a big rotation between
adjacent frames, this approach may fail.
At the moment the two reference vertices are selected with no special criteria.
Such criteria could be the connectivity of the vertices or their visual support.
5 Conclusion
This paper presents a structural approach for tracking objects undergoing non-
rigid motion. The focus lies on the integration of articulation into the spring
systems describing the spatial relationships between features of the rigid parts
of an object. The position of the articulation points is derived by observing
the movements of the parts of an articulated object. Integrating the articulation
point into the optimization process of the spring system leads to improved tracking
results in videos with large transformations and occlusions. A weakness of this
approach is that it cannot deal with large rotations during occlusions. Therefore,
we plan to consider higher level knowledge like spatio-temporal continuity to
observe the occluded part reappearing around the borders of the visible occlud-
ing object. Another open issue is dealing with scaling and perspective changes.
Future work is also to cope with pose variations and the resulting changes in the
features representing the object.
References
1. Fischler, M.A., Elschlager, R.A.: The representation and matching of pictorial
structures. Transactions on Computers 22, 67–92 (1973)
2. Felzenszwalb, P.F.: Pictorial structures for object recognition. IJCV 61, 55–79
(2005)
3. Ramanan, D., Forsyth, D.: Finding and tracking people from the bottom up. In:
Proceedings of 2003 IEEE Computer Society Conference on Computer Vision and
Pattern Recognition, June 2003, vol. 2, pp.II-467–II-474 (2003)
4. Mauthner, T., Donoser, M., Bischof, H.: Robust tracking of spatial related compo-
nents. In: ICPR, pp. 1–4. IEEE, Los Alamitos (2008)
5. Artner, N., Mármol, S.B.L., Beleznai, C., Kropatsch, W.G.: Kernel-based tracking
using spatial structure. In: 32nd Workshop of the AAPR, OCG, May 2008, pp.
103–114 (2008)
6. Artner, N.M., Ion, A., Kropatsch, W.G.: Tracking articulated objects using struc-
ture (accepted). In: Computer Vision Winter Workshop 2009, PRIP, Vienna Uni-
versity of Technology, Austria (February 2009)
7. Gavrila, D.M.: The visual analysis of human movement: A survey. Computer Vision
and Image Understanding 73(1), 82–98 (1999)
8. Moeslund, T.B., Hilton, A., Krüger, V.: A survey of advances in vision-based human
motion capture and analysis. Computer Vision and Image Understanding 104(2–3),
90–126 (2006)
9. Aggarwal, J.K., Cai, Q.: Human motion analysis: A review. Computer Vision and
Image Understanding 73(3), 428–440 (1999)
10. Aggarwal, J.K., Cai, Q., Liao, W., Sabata, B.: Articulated and elastic non-rigid
motion: A review. In: IEEE Workshop on Motion of Non-Rigid and Articulated
Objects, pp. 2–14 (1994)
11. Badler, N.I., Smoliar, S.W.: Digital representations of human movement. ACM
Comput. Surv. 11(1), 19–38 (1979)
12. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from
maximally stable extremal regions. Image and Vision Computing 22(10), 761–767
(2004)
13. Comaniciu, D., Ramesh, V., Meer, P.: Kernel-based object tracking. PAMI 25(5),
564–575 (2003)
14. Mármol, S.B.L., Artner, N.M., Ion, A., Kropatsch, W.G., Beleznai, C.: Video object
segmentation using graphs. In: Ruiz-Shulcloper, J., Kropatsch, W.G. (eds.) CIARP
2008. LNCS, vol. 5197, pp. 733–740. Springer, Heidelberg (2008)
Graph-Based Registration of Partial Images of
City Maps Using Geometric Hashing
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 92–101, 2009.
c Springer-Verlag Berlin Heidelberg 2009
Fig. 1. Existing approaches to track a mobile-camera device over paper maps: Visual
markers (left), grid of black dots (middle), application scenario (right)
2 Registration Algorithm
In this section, we present our algorithm for the registration of city maps. The
algorithm stores city map data by generating a graph representation and saving this
graph representation as a geometric hash. The goal of our algorithm is to compute a
registration between stored city maps and images of arbitrary parts of city maps,
which may be translated, rotated and scaled. The matching is performed using
geometric hashing and is divided into an offline phase and an online phase. Figure 2
gives an overview of the two phases and the corresponding steps.
The offline phase is used to efficiently store given city maps in the form of
a geometric hash, which can later be accessed with high efficiency, e.g. on mobile
devices. Our algorithm can extract and store information from different kinds
of city maps. To extract information from maps given in the form of images, we
use a map-dependent preprocessing that creates the graph representation. To extract
information from maps consisting of vector data, e.g. data for navigation systems,
the algorithm transforms the vector data into the appropriate graph representation.
From this graph representation, a geometric hash is created.
The online phase is a query phase: a part of a map is presented to the algorithm
with the goal of finding the best registration between this map part and the stored
map data. From the map part to be registered, a graph representation is generated
and used as the query. Such map parts may be images from low-resolution cameras,
e.g. from camera phones, so we use special preprocessing to create the graph
representations from such low-quality images. Due to the use of geometric hashing,
the registration is completely translation, scale and rotation invariant, and robust
to noise, small perspective distortions, and occlusions or missing data.
94 S. Wachenfeld et al.
Fig. 2. Overview of the two phases of our algorithm and their corresponding steps
The result of the online phase is a registration of the smaller part onto one
of the stored larger maps. This registration implicitly yields a transformation
function between the coordinate systems of the two maps, which enables one of the
main applications: an overlay function for mobile devices. A camera phone can be
used to take an image or a video of part of a map. The part of the map visible in
this image or video is registered, and additional information is overlaid at the
corresponding positions, e.g. locations of WLAN spots, cash machines or other
points of interest.
The following three subsections will present the algorithm in detail. First, we
will explain our preprocessing steps, which transform an image of a city map
into a graph representation. Then, we will explain how the geometric hash is
created from the graph representations of city maps. In the last subsection we
show how the graph of the query image is used to find the best registration and
to compute the transformation function.
2.1 Preprocessing
Street Detection. The intention of the street detection step is to localize streets
and separate them from the background. As already mentioned, this step is not
necessary for vector data. The standard input, however, is map images, which can
be of different types; maps from different sources generally use different colors.
a) b) c)
Fig. 3. Three maps of Orlando: Google Map (a), Map24 (b), OpenStreetMap (c)
a) b) c)
Fig. 4. From a map image to a graph representation: Map image (Google style) (a),
binary street map with skeleton (b), resulting graph representation (c)
Figure 3 shows the color differences between three example maps of the same
part of the town of Orlando.
To localize the streets of a certain map type, we use a specific color profile.
If, for example, our algorithm is to localize streets of a Google map, it uses a
specific Google Map color profile. This profile defines specific shades of yellow,
orange and white as streets, while specific shades of green, blue, as well as light
and dark gray, are defined as background. Pixels of a given image are classified as
foreground or background according to their distance to the specified colors.
The classification result is an intermediate binary image. Due to noise, this image
may contain small holes or touching streets that should not be connected. Also,
larger holes may occur where text was in the original map. This step is not
completely map-type independent, though it works with different map types simply
by replacing the profile data.
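The profile-based classification can be sketched as a nearest-color decision. The palette below is invented for illustration and is not an actual map provider's color profile:

```python
# Nearest-color classification sketch: a pixel is foreground (street) if
# its nearest profile color is a street color, background otherwise.

def classify_pixel(rgb, profile):
    """profile: list of (color, is_street) pairs; nearest color wins."""
    def dist2(c1, c2):
        return sum((a - b) ** 2 for a, b in zip(c1, c2))
    _, is_street = min(profile, key=lambda entry: dist2(rgb, entry[0]))
    return is_street

# Illustrative palette only (not a real provider's profile):
PROFILE = [((255, 255, 255), True),   # white streets
           ((255, 230, 150), True),   # yellow streets
           ((190, 220, 180), False),  # green parks
           ((170, 200, 250), False)]  # blue water
```

Swapping `PROFILE` is all that is needed to adapt the step to another map type, which mirrors the map-dependence discussed above.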
Onto this intermediate binary image, we apply morphological opening and
closing operations to close small holes and to remove noise. Larger holes, e.g.
from text in the map image, are not closed by morphological operations, as using
larger closing masks leads to unwanted connections of streets. We close larger
holes or remove isolated areas of foreground pixels by investigating the size of
connected components. This way, we get a satisfying binary image, where streets
are foreground and non-streets are background.
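The connected-component cleanup can be sketched as follows; this is a pure-Python illustration, not the authors' implementation:

```python
# Remove isolated foreground components smaller than a threshold; the same
# traversal could fill small background holes by swapping the roles of 0/1.
from collections import deque

def remove_small_components(img, min_size):
    """img: 2D list of 0/1; removes 4-connected 1-components < min_size."""
    h, w = len(img), len(img[0])
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if img[sy][sx] != 1 or seen[sy][sx]:
                continue
            comp, queue = [], deque([(sy, sx)])
            seen[sy][sx] = True
            while queue:                      # BFS over one component
                y, x = queue.popleft()
                comp.append((y, x))
                for ny, nx in ((y-1, x), (y+1, x), (y, x-1), (y, x+1)):
                    if 0 <= ny < h and 0 <= nx < w \
                            and img[ny][nx] == 1 and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            if len(comp) < min_size:          # erase too-small components
                for y, x in comp:
                    img[y][x] = 0
    return img
```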
The input image may result from a low quality camera of a mobile device,
such as a camera phone or a PDA. This leads to extra noise and inhomogeneous
illumination. While, in our experience, the noise is not critical, inhomogeneous
illumination leads to misclassifications. We divide the image into 9 larger regions
(3 × 3) and 64 smaller regions (8 × 8) and use local histograms to determine
illumination changes. We then adapt the intensity of the expected colors for the
classification step. This way, we get a satisfying binary image even for noisy and
inhomogeneously illuminated camera images (see Figure 6 for an example).
Skeletonization. The second step builds a skeleton from the binary street image,
i.e. the streets are thinned to a width of one pixel. Figure 4b shows an example
of a binary street map and the resulting skeleton. This step is relatively
straightforward and is therefore, due to space limitations, not described in detail.
Graph Computation. The graph can easily be created from the skeleton by following
the skeleton from node to node. At crossings of larger roads, multiple nodes may
result, which have to be merged: edges between two nodes have to be significantly
longer than the widths of the corresponding streets, otherwise the nodes are merged.
Figure 4c shows a resulting graph. Remember that edges represent the existence of
street connections, not their shape.
To be independent of the map colorization and thereby enable the use of different
map types for offline and online phase, we compare structural information using
geometric hashing. Geometric hashing is a well known method in computer vision
to match objects which have undergone transformations or when only partial
information exists (see Wolfson [14] for a good overview). The idea is to use a
computationally expensive offline phase to store geometric map data in the form
of a hash. Later, during the online phase, the hash can be accessed in a fast
manner to find a best matching result for given query data.
In the preprocessing step, we have extracted the geometric features of the
city map. The result of the preprocessing is a graph representation, which is
completely map type independent. Two graphs from two different maps of the
same location will most probably look similar (crossings are nodes and streets
are edges).
The hash which is going to be created is a 2D plane which will hold information
about node positions of transformed (translated, scaled, and rotated) graphs. It
can be visualized as a plane full of dots, which represent node positions. The
hash for a city map is created by transforming the graph representation of the
map many times. This is similar to the generalized Hough transform (see [1])
with the difference that our set of transformations also contains translations and
is generated from the graph. The 2D positions of the graph’s nodes build one
hash plane for each transformation. The final hash represents the information of
all hash planes in an efficient way.
Hash planes are created by selecting an edge e and by translating, rotating and
scaling the whole map so that one of the two nodes belonging to edge e is projected
onto position x1 = (−1, 0) and the other onto position x2 = (1, 0). This transforms
edge e into an edge of length two between x1 and x2. All other node positions
undergo the same transformation and build the hash plane he.
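The normalization that builds a hash plane can be sketched with complex arithmetic. The name `hash_plane` and the use of complex numbers for 2D positions are conveniences assumed for illustration:

```python
# The similarity transform mapping the endpoints of edge e = (a, b) to
# (-1, 0) and (1, 0) is z -> 2 (z - m) / (b - a), with m the midpoint of
# a and b; positions are represented as complex numbers.

def hash_plane(nodes, a, b):
    """nodes: list of complex node positions; a, b: complex edge endpoints.
    Returns all node positions in the normalized frame of edge (a, b)."""
    m = (a + b) / 2
    return [2 * (z - m) / (b - a) for z in nodes]
```

Dividing by (b − a) handles the rotation and scaling in one step, which is why one hash plane per edge suffices for rotation and scale invariance.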
To yield rotation and scale invariance, the original map is scaled and rotated
many times, once for each edge. Thus, the hash consists of multiple hash planes,
one for each edge. If the two nodes of an edge are very close to or very far from
each other, extremely large or small scales result, respectively. Extremely small
scales lead to agglomerations of projected nodes in the hash, and extremely large
scales lead to error amplification. As a consequence, our algorithm prefers edges
whose length is close to a preferred length d∗. Also, if the nodes of a selected
edge are located near the borders of the image, the resulting hash plane will have
large empty areas (i.e. the upper or lower half). To avoid such heavily unused hash
space, we prefer to select edges near the image’s center c.
We use the best rated edges e = (n1 , n2 ), according to the rating function
r(e) = d(n1, n2) · exp(−d(n1, n2)/d∗) · 1/(d(c, n1) + d(c, n2) + 1)
(the first factor prefers edges of length d∗, the second prefers nodes near the center c),
where d(a, b) is the distance between a and b, d∗ is the preferred length, and c
is the image center.
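As a sanity check, the rating function can be transcribed directly; `dist`, `rating`, and the sample coordinates are illustrative:

```python
# Direct transcription of the rating r(e): the exponential factor peaks at
# edge length d*, the last factor favors edges near the image center c.
import math

def dist(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def rating(n1, n2, c, d_star):
    length = dist(n1, n2)
    return (length * math.exp(-length / d_star)
            / (dist(c, n1) + dist(c, n2) + 1.0))
```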
Later, the online phase will require a nearest neighbor search on the hash data.
To facilitate a fast search, the positions of the projected nodes are saved in
buckets. Buckets result from dividing the 2D plane of the hash using an equidistant
grid: a bucket is one field of this grid and stores information about the projected
nodes of all overlaid hash planes within this field. The optimal number of buckets
depends on the number of nodes in the hash. Non-equidistant grids are also possible,
to yield a uniform distribution of nodes over the buckets.
2.3 Registration
The registration is done in two steps, first the matching between the graph
representation of the query map image and the geometric hash, and second, the
computation of the transformation function.
Matching. To match the query graph with the hash, edges e from the query
graph which have a good rating r(e) are selected. As in the hash plane
creation, each selected edge is projected onto x1 and x2, and the query graph is
transformed accordingly. The transformed query graph is then projected onto the
hash. At the positions of the projected query graph's nodes, the hash's buckets
are searched for coinciding nodes of hash planes. If the query image shows a part
of a stored map, the query graph will approximately be a subgraph of the graph
of the stored map. All well-rated edges of the stored map have been used to create
the hash planes; thus, the projection of the query graph will lead to matches.
Noise and perspective distortion will certainly affect the exactness
of the matches, but projected nodes may still be expected to be found in the
right buckets. For each selected edge e, we compute a matching quality q(e, h)
for each hash plane h. This quality indicates how well the edge e corresponds
to the edge which was used to create the hash plane h. It is measured by
investigating the distances of all transformed query nodes to the nearest nodes of
hash plane h. For each edge, the five best matching hash planes are considered
for further investigation, on the assumption that the correct matching is
amongst these five. See Figure 5 for examples of best matching hash
planes for selected edges of the same query image.
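The paper does not give the exact form of q(e, h); the sketch below is one plausible choice based on the stated nearest-node distances, with `matching_quality` as our own name:

```python
import math

def matching_quality(query_nodes, plane_nodes):
    """Score how well a transformed query graph fits a hash plane h:
    average each query node's distance to its nearest plane node and map
    the result into (0, 1], higher meaning a better fit. The exact form
    of q(e, h) is an assumption; the paper only states that it is based
    on these nearest-node distances."""
    mean_nn = sum(min(math.dist(q, p) for p in plane_nodes)
                  for q in query_nodes) / len(query_nodes)
    return 1.0 / (1.0 + mean_nn)
```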
98 S. Wachenfeld et al.
Fig. 5. Examples of best matching hash planes for four different query node pairs: best
matching hash plane (blue), query graph (green), selected node pair for alignment (red)
The transformation T has the form T(x, y) = (c·x − s·y + tx, s·x + c·y + ty),
where tx and ty are translations, and c and s represent scale and rotation.
To determine the parameters, we generate an association matrix A. This ma-
trix stores information about matchings between query nodes and nodes of the
five best matching hash planes per selected query edge. For m nodes in the query
graph and n nodes of a stored graph, A is an m × n matrix. If, for a query edge
e, the query node mi is matched to the stored node nj of hash plane hk, then
the value of A(i, j) is increased by the quality of the matching q(e, hk). Since the
five best matching hash planes are considered, the node mi will be associated with
five nodes. Normally, one association is correct and the other four are wrong.
This is repeated for each selected query edge. Correct associations will occur re-
peatedly for many query edges, while wrong associations will vary. Thus, correct
associations will be indicated by high accumulated quality values in the rows of
matrix A. The highest entry of each row indicates a correct association.
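The accumulation scheme can be sketched as follows (an illustrative helper of our own; matches are triples (i, j, q)):

```python
def accumulate(matches, m, n):
    """Build the m x n association matrix A: each recorded match
    (i, j, q) adds the matching quality q to A[i][j]; the highest entry
    per row then indicates the presumably correct association."""
    A = [[0.0] * n for _ in range(m)]
    for i, j, q in matches:
        A[i][j] += q
    return A, [max(range(n), key=lambda j: A[i][j]) for i in range(m)]
```

Because correct associations recur over many query edges while wrong ones scatter, the row maxima accumulate quality fastest at the correct columns.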
Assuming that these associations are correct, we select pairs of these associ-
ations to solve the linear equation system for the variables c, s, tx and ty of T .
For each pair we get a transformation. If we apply this transformation, we can
measure an error based on the distance of all query nodes to their associated
nodes. We select the transformation with the least median error (LME) as result.
Because several associations will be wrong, we use the LME which is robust to
up to 50% of outliers [12].
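Assuming the standard linear parametrisation T(x, y) = (c·x − s·y + tx, s·x + c·y + ty), which the text's lost equation presumably stated, solving for a pair of associations and selecting by least median error can be sketched as:

```python
import statistics

def solve_transform(p1, q1, p2, q2):
    """Solve T(p) = q for two point correspondences, with
    T(x, y) = (c*x - s*y + tx, s*x + c*y + ty)."""
    (x1, y1), (x2, y2) = p1, p2
    dx, dy = x1 - x2, y1 - y2
    du, dv = q1[0] - q2[0], q1[1] - q2[1]
    det = dx * dx + dy * dy
    c = (du * dx + dv * dy) / det
    s = (dv * dx - du * dy) / det
    return (c, s, q1[0] - (c * x1 - s * y1), q1[1] - (s * x1 + c * y1))

def apply_t(t, p):
    c, s, tx, ty = t
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

def least_median_error(transforms, pairs):
    """Select the transformation with the least median squared residual
    over all associated node pairs; robust to up to 50% outliers."""
    def med(t):
        return statistics.median(
            (apply_t(t, p)[0] - q[0]) ** 2 + (apply_t(t, p)[1] - q[1]) ** 2
            for p, q in pairs)
    return min(transforms, key=med)
```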
3 Experimental Results
We have performed two kinds of experiments: laboratory experiments by simu-
lation and live experiments using images from a Nokia N95 camera phone.
We took screenshots of very large displayed city maps (∼5000 × 3000 pixels)
from Google for two German cities (Münster and Hannover). These images were
used to produce a ground truth hash for each city. For testing, small parts from
known positions of these maps were generated in various test series and then
presented to the system.
To measure the quality of a resulting transformation, we compute the RMSE
for the area of the input image. If TR is the resulting transformation and TG the
ground truth transformation, the RMSE for an input image I of size w × h is
calculated by
    RMSE(TR(I), TG(I)) = sqrt( (1/(w·h)) ∫₀ʰ ∫₀ʷ ‖TR(x, y) − TG(x, y)‖² dx dy )
This RMSE value measures the spatial distance between the ground truth trans-
formation and the computed transformation and can be interpreted as an error
in pixels. To distinguish between successful and failed registrations, we have set
a threshold of 5 pixels with regard to the RMSE measure. For our purpose of
map augmentation this is a sufficient accuracy.
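A discretised version of this RMSE can be sketched as follows, where TR and TG are callables returning transformed coordinates:

```python
import math

def rmse(TR, TG, w, h):
    """Discretised version of the RMSE integral over a w x h image:
    average the squared deviation between the two transformations over
    all pixels and take the root, interpretable as an error in pixels."""
    total = 0.0
    for y in range(h):
        for x in range(w):
            (xr, yr), (xg, yg) = TR(x, y), TG(x, y)
            total += (xr - xg) ** 2 + (yr - yg) ** 2
    return math.sqrt(total / (w * h))
```

A constant misregistration of 3 pixels in x and 4 pixels in y, for instance, gives an RMSE of exactly 5 pixels, matching the threshold's interpretation.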
We generated 1080 query (sub)graphs of a ground truth image. This was done
by choosing 6 rotation angles and 36 translation vectors and applying these
transformations to a fixed subarea of the ground truth image, resulting in 216
different subimages and, accordingly, 216 subgraphs. We repeated the experiments
5 times (1080 query graphs in total) and determined the average accuracy
measure. Due to space limitations, Table 1 shows only the results of a small
fraction of all experiments, which were conducted using the following test series,
in which only the query graph was modified and the hash was left unaffected:
– k% of the nodes were randomly deleted to simulate missing data.
– k% of the nodes were shifted to new positions to simulate map differences.
Each coordinate of the shift vector was subject to a Gaussian distribution
with standard deviation σ.
– Insertion (mode 1) of k% new nodes, which are randomly positioned and
connected to 1 to 4 nearest nodes to simulate map updates.
– In a second insertion (mode 2), k% new nodes were generated, each lying on
an original edge to simulate variations in the graph creation procedure.
– Finally, a series of query graphs was generated by combining the node dele-
tion, shift, and insertion (mode 1) operations.
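The deletion and shift series can be simulated along these lines (a sketch with our own parameter names):

```python
import random

def distort(nodes, delete_frac=0.0, shift_frac=0.0, sigma=0.0, seed=0):
    """Simulate the test-series distortions on a query graph's nodes:
    randomly delete a fraction of the nodes, then shift a fraction of the
    survivors by Gaussian noise in each coordinate."""
    rng = random.Random(seed)
    kept = [p for p in nodes if rng.random() >= delete_frac]
    out = []
    for x, y in kept:
        if rng.random() < shift_frac:
            x, y = x + rng.gauss(0.0, sigma), y + rng.gauss(0.0, sigma)
        out.append((x, y))
    return out
```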
The various kinds of artificially generated distortions in the test series were in-
tended to simulate different errors we encounter in dealing with real images. The
experiments have shown that our algorithm is robust against such distortions
to a certain extent. Particularly interesting is the case of insertion (mode 2).
Adding new nodes lying on an edge of the query graph means an oversegmenta-
tion of street contours and directly leads to substantial changes in graph topol-
ogy. Fortunately, the simulation results indicate that this distortion source does
not introduce more registration inaccuracy than other errors. This remarkable
property is due to the robust behavior of geometric hashing.
Fig. 6. Live experiment: Image taken by Nokia N95 camera phone (left), visualization
of detected streets, skeleton and nodes (middle), registration result (right)
Live experiments. For the live experiments, we printed the city maps on paper
and took images of parts of these maps using a Nokia N95 camera phone under
uncontrolled illumination. Good results for such low quality images (640 × 480
pixels) could be observed. In this experiment series the ground truth is not
exactly known and we thus determined successful and failed registrations by
visual inspection. One such successful registration can be seen in Figure 6, where
the mobile camera image suffers from inhomogeneous illumination and small
perspective distortions.
The online phase of our hashing algorithm is very fast and the memory needed
to store the hash of a city map is low (∼200 KB–2 MB per city map), which allows
the algorithm to be implemented directly on mobile devices. Our long-term goal is thus
to realize a realtime implementation using Symbian C++. This would allow for
realtime registration and augmentation of city maps using our Nokia N95 camera
phones. Further, we would like to implement an automatic map type recognition
based on color distribution. This would allow for an adaptation of the color profile
to completely unknown map styles.
References
1. Ballard, D.H.: Generalizing the Hough transform to detect arbitrary shapes. In:
Readings in computer vision: issues, problems, principles and paradigms, pp. 714–
725. Morgan Kaufmann Publishers Inc., San Francisco (1987)
2. Bier, E.A., Stone, M.C., Pier, K., Buxton, W., DeRose, T.D.: Toolglass and Magic
Lenses: the See-Through Interface. In: Proc. of the 20th Annual Conf. on Computer
Graphics and Interactive Techniques, pp. 73–80. ACM Press, New York (1993)
3. Hecht, B., Rohs, M., Schöning, J., Krüger, A.: WikEye–Using Magic Lenses to
Explore Georeferenced Wikipedia Content. In: Proc. of the 3rd Int. Workshop on
Pervasive Mobile Interaction Devices, PERMID (2007)
4. Lowe, D.G.: Distinctive Image Features from Scale-Invariant Keypoints. Int. Jour-
nal of Computer Vision 60(2), 91–110 (2004)
5. Ozuysal, M., Fua, P., Lepetit, V.: Fast Keypoint Recognition in Ten Lines of Code.
In: Proc. of Int. Conf. on Computer Vision and Pattern Recognition, pp. 1–8 (2007)
6. Reilly, D., Rodgers, M., Argue, R., Nunes, M., Inkpen, K.: Marked-up Maps: Com-
bining Paper Maps and Electronic Information Resources. Personal and Ubiquitous
Computing 10(4), 215–226 (2006)
7. Reitmayr, G., Eade, E., Drummond, T.: Localisation and Interaction for Aug-
mented Maps. In: Proc. ISMAR, pp. 120–129 (2005)
8. Rohs, M., Schöning, J., Krüger, A., Hecht, B.: Towards Real-Time Markerless
Tracking of Magic Lenses on Paper Maps. In: Adjunct Proc. of the 5th Int. Conf.
on Pervasive Computing, Late Breaking Results, pp. 69–72 (2007)
9. Rohs, M., Schöning, J., Raubal, M., Essl, G., Krüger, A.: Map Navigation with
Mobile Devices: Virtual Versus Physical Movement with and without Visual Con-
text. In: Proc. of the 9th Int. Conf. on Multimodal Interfaces, pp. 146–153. ACM,
New York (2007)
10. Schöning, J., Hecht, B., Starosielski, N.: Evaluating Automatically Generated
Location-based Stories for Tourists. In: Extended Abstracts on Human Factors
in Computing Systems, pp. 2937–2942. ACM, New York (2008)
11. Schöning, J., Krüger, A., Müller, H.J.: Interaction of Mobile Devices with Maps.
In: Adjunct Proc. of the 4th Int. Conf. on Pervasive Computing, vol. 27. Oesterre-
ichische Computer Gesellschaft (2006)
12. Stewart, C.V.: Robust Parameter Estimation in Computer Vision. SIAM
Rev. 41(3), 513–537 (1999)
13. Wagner, D., Reitmayr, G., Mulloni, A., Drummond, T., Schmalstieg, D.: Pose
Tracking from Natural Features on Mobile Phones. In: Proc. ISMAR, pp. 125–134
(2008)
14. Wolfson, H.J., Rigoutsos, I.: Geometric Hashing: An Overview. IEEE Comput. Sci.
Eng. 4(4), 10–21 (1997)
A Polynomial Algorithm for Submap
Isomorphism
Application to Searching Patterns in Images
1 Introduction
In order to manage the huge image sets that are now available, and more partic-
ularly to classify them or search through them, one needs similarity measures.
A key point that motivates our work lies in the choice of data structures for
modelling images: These structures must be rich enough to describe images in a
relevant way, while allowing an efficient exploitation. When images are modelled
by vectors of numerical values, similarity is both mathematically well defined and
easy to compute. However, images may be poorly modelled with such numerical
vectors that cannot express notions such as adjacency or topology.
Graphs allow one to model images by means of, e.g., region adjacency relation-
ships or interest point triangulation. In either case, graph similarity measures
have been investigated [CFSV04]. These measures often rely on (sub)graph iso-
morphism —which checks for equivalence or inclusion— or graph edit distances
and alignments —which evaluate the cost of transforming a graph into another
The authors acknowledge an ANR grant (Blanc 07-1_184534): this work was done
in the context of project Sattic. This work was partially supported by the IST
Programme of the European Community, under the PASCAL 2 Network of Excellence,
IST-2006-216886.
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 102–112, 2009.
c Springer-Verlag Berlin Heidelberg 2009
Fig. 1. (a) and (b) are not isomorphic plane graphs; bold edges define a compact plane
subgraph in (c), but not in (d)
graph. While there exist rather efficient heuristics for solving the graph isomorphism
problem¹ [McK81, SS08], this is not the case for the other measures, which are
often computationally intractable (NP-hard) and therefore practically unsolvable
for large-scale graphs. In particular, the best performing approaches for
subgraph isomorphism are limited to graphs with up to a few thousand nodes
[CFSV01, ZDS+07].
However, when measuring graph similarity, it is overwhelmingly forgotten
that graphs actually model images and, therefore, have special features that
could be exploited to obtain both more relevant measures and more efficient
algorithms. Indeed, these graphs are planar, i.e., they may be drawn in the
plane, but even more specifically just one of the possible planar embeddings is
relevant as it actually models the image topology, that is, the order in which
faces are encountered when turning around a node.
In the case where just one embedding is considered, graphs are called plane.
Isomorphism of plane graphs needs to be defined in order to integrate topolog-
ical relationships. Let us consider for example the two plane graphs drawn in
Fig. 1(a) and 1(b). The underlying graphs are isomorphic, i.e., there exists a bi-
jection between their nodes which preserves edges. However, these plane graphs
are not isomorphic since there does not exist a bijection between their nodes
which both preserves edges and topological relationships.
Now by considering this, the isomorphism problem becomes simple [Cor75],
but the subgraph isomorphism problem is still too hard to be tackled in a
systematic way. Yet we may argue that when looking for some pattern in a
picture (for example a chimney in a house, or a wheel in a car) we may simplify
the problem to that of searching for compact plane subgraphs (i.e., subgraphs
obtained from a graph by iteratively removing nodes and edges that are incident
to the external face). Let us consider for example the plane graphs of Fig. 1.
The bold edges in Fig. 1(c) constitute a compact plane subgraph. However,
the bold edges in Fig. 1(d) do not constitute a compact plane subgraph because
edge (4, 3) separates a face of the subgraph into two faces in the original
graph.
¹ The theoretical complexity of graph isomorphism is an open question: While it clearly
belongs to NP, it has not been proven to be NP-complete.
104 G. Damiand et al.
Contribution and outline of the paper. In this paper, we address the problem
of searching for compact subgraphs in a plane graph. To do that, we propose
to model plane graphs with 2-dimensional combinatorial maps, which provide
nice data structures for modelling the topology of a subdivision of a plane into
nodes, edges and faces. We define submap isomorphism, we give a polynomial
algorithm for this problem, and we show how this problem may be used to search
for a compact graph in a plane graph. Therefore we show that the problem can
be solved in this case in polynomial time.
We introduce 2D combinatorial maps in Section 2. A polynomial algorithm
for map isomorphism is given in Section 3 and submap isomorphism is studied
in Section 4. We relate these results with the case of plane graphs in Section 5,
and we give some experimental results that show the validity of this approach
on image recognition tasks in Section 6.
2 Combinatorial Maps
A plane graph is a planar graph with a mapping from every node to a point in 2D
space. However, in our context the exact coordinates of nodes matter less than
their topological organisation, i.e., the order in which nodes and edges are
encountered when turning around faces. This topological organisation is nicely modelled by
combinatorial maps [Edm60, Tut63, Cor75].
To model a plane graph with a combinatorial map, each edge of the graph
is cut into two halves called darts, and two one-to-one mappings are defined on
these darts: the first links darts belonging to two consecutive edges around the
same face, and the second links the two darts belonging to the same edge.
Definition 1. (2D combinatorial map [Lie91]) A 2D combinatorial map (or
2-map) is a triplet M = (D, β1, β2) where D is a finite set of darts; β1 is a
permutation on D, i.e., a one-to-one mapping from D to D; and β2 is an involution
on D, i.e., a one-to-one mapping from D to D such that β2 = β2−1.
We write β0 for β1−1. Two darts i and j such that i = βk(j) are said to be k-sewn.
Fig. 2 gives an example of a combinatorial map.
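As an illustration, the triplet of Definition 1 maps directly onto a small data structure whose constructor checks the permutation and involution properties (a sketch, not the authors' code; the β1 and β2 tables of Fig. 2 can serve as input):

```python
class CMap:
    """A 2-map (D, beta1, beta2): beta1 must be a permutation of the
    darts and beta2 an involution."""
    def __init__(self, beta1, beta2):
        self.darts = set(beta1)
        assert set(beta1.values()) == self.darts              # beta1 permutes D
        assert set(beta2) == self.darts
        assert all(beta2[beta2[d]] == d for d in self.darts)  # beta2 involution
        self.beta1, self.beta2 = beta1, beta2
        self.beta0 = {v: k for k, v in beta1.items()}         # beta0 = beta1^-1
```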
In some cases, it may be useful to allow βi to be partially defined, thus leading
to open combinatorial maps. The intuitive idea is to add a new element to the
d    1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
β1   2  3  4  5  6  7  1  9 10 11  8 13 14 15 12 17 18 16
β2  15 14 18 17 10  9  8  7  6  5 12 11 16  2  1 13  4  3
Fig. 2. Combinatorial map example. Darts are represented by numbered black
segments. Two 1-sewn darts are drawn consecutively, and two 2-sewn darts are
drawn alongside each other in reverse orientation, with a little grey segment
between the two darts.
A Polynomial Algorithm for Submap Isomorphism 105
d    a  b  c  d  e  f  g
β1   b  c  d  a  f  g  e
β2   ε  ε  e  ε  c  ε  ε
Fig. 3. Open combinatorial map example. Darts a, b, d, f and g are not 2-sewn.
set of darts, written ε, and to allow darts to be linked with ε for β1 and/or β2.
By definition, ∀0 ≤ i ≤ 2, βi(ε) = ε. Fig. 3 gives an example of an open map
(see [PABL07] for precise definitions).
Finally, Def. 2 states that a map is connected if there is a path of sewn darts
between every pair of darts.
Definition 2. (connected map) A combinatorial map M = (D, β1, β2) is connected
if ∀d ∈ D, ∀d′ ∈ D, there exists a path (d1, . . . , dk) such that d1 = d,
dk = d′ and ∀1 ≤ i < k, ∃ji ∈ {0, 1, 2}, di+1 = βji(di).
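Definition 2 suggests a straightforward reachability check (a sketch; β1 and β2 are dicts on the darts, and β2 may be partial for open maps):

```python
def is_connected(beta1, beta2):
    """Definition 2 check: every dart must be reachable from an arbitrary
    start dart by following beta0, beta1 and beta2."""
    darts = set(beta1)
    if not darts:
        return True
    beta0 = {v: k for k, v in beta1.items()}
    seen, stack = set(), [next(iter(darts))]
    while stack:
        d = stack.pop()
        if d in seen:
            continue
        seen.add(d)
        for link in (beta0, beta1, beta2):
            nxt = link.get(d)
            if nxt is not None:
                stack.append(nxt)
    return seen == darts
```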
3 Map Isomorphism
Lienhardt has defined isomorphism between two combinatorial maps as follows.
Definition 3. (map isomorphism [Lie94]) Two maps M = (D, β1, β2) and M′ =
(D′, β1′, β2′) are isomorphic if there exists a one-to-one mapping f : D → D′, called
an isomorphism function, such that ∀d ∈ D, ∀i ∈ {1, 2}, f(βi(d)) = βi′(f(d)).
We extend this definition to open maps by adding that f(ε) = ε, thus enforcing
that, when a dart is linked with ε for βi, then the dart matched to it by f is also
linked with ε for βi′.
An algorithm may be derived from this definition in a rather straightforward
way, as sketched in [Cor75]. Algorithm 1 describes the basic idea, which will be
extended in Section 4 to submap isomorphism: We first fix a dart d0 ∈ D; then,
for every dart d0′ ∈ D′, we call Algorithm 2 to build a candidate matching function
f and check whether f is an isomorphism function. Algorithm 2 basically
performs a traversal of M, starting from d0 and using the βi to discover new darts
from discovered darts. Initially, f[d0] is set to d0′ whereas f[d] is set to nil for
all other darts. Each time a dart di ∈ D is discovered from another dart d ∈ D
through βi, so that di = βi(d), then f[di] is set to the dart di′ ∈ D′ which is
linked with f[d] by βi′.
Algorithm 1. checkIsomorphism(M, M′)
Input: two open connected maps M = (D, β1, β2) and M′ = (D′, β1′, β2′)
Output: returns true iff M and M′ are isomorphic
1 choose d0 ∈ D
2 for d0′ ∈ D′ do
3   f ← traverseAndBuildMatching(M, M′, d0, d0′)
4   if f is a bijection from D ∪ {ε} to D′ ∪ {ε} and
    ∀d ∈ D, ∀i ∈ {1, 2}, f[βi(d)] = βi′(f[d]) then
5     return true
6 return false
10 f[ε] ← ε
11 return f
such that dn = d and ∀k ∈ [1; n], ∃jk ∈ {0, 1, 2}, dk = βjk (dk−1 ). Therefore,
each time a dart di of this path is popped from S (line 5), di+1 is pushed in
S (line 9) if it has not been pushed before (through another path).
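Algorithms 1 and 2 can be sketched together in Python; here maps are pairs (β1, β2) of dicts, ε is represented by a missing entry, and the code is illustrative rather than the authors' implementation:

```python
def build_matching(M, Mp, d0, d0p):
    """Algorithm 2 sketch: traverse M from d0, assigning to each newly
    discovered dart the correspondingly beta-linked dart of M'. Returns
    None if M' has no corresponding dart."""
    links = [({v: k for k, v in M[0].items()},
              {v: k for k, v in Mp[0].items()}),   # beta0 = beta1^-1
             (M[0], Mp[0]), (M[1], Mp[1])]         # beta1, beta2
    f, stack = {d0: d0p}, [d0]
    while stack:
        d = stack.pop()
        for bi, bip in links:
            di = bi.get(d)
            if di is not None and di not in f:
                dip = bip.get(f[d])
                if dip is None:
                    return None
                f[di] = dip
                stack.append(di)
    return f

def check_isomorphism(M, Mp):
    """Algorithm 1 sketch: fix d0 in M, try every start dart of M',
    then verify that the built matching preserves all beta-links."""
    if len(M[0]) != len(Mp[0]):
        return False
    d0 = next(iter(M[0]))
    for d0p in Mp[0]:
        f = build_matching(M, Mp, d0, d0p)
        if f is None or len(f) != len(M[0]) or len(set(f.values())) != len(f):
            continue
        if all(f.get(M[k].get(d)) == Mp[k].get(f[d]) for d in f for k in (0, 1)):
            return True
    return False
```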
4 Submap Isomorphism
Intuitively, a map M is a submap of M′ if M can be obtained from M′ by
removing some darts. When a dart d is removed, we set βi(d′) to ε for every dart
d′ such that βi(d′) = d.
Definition 4. (submap) An open combinatorial map M = (D, β1, β2) is isomorphic
to a submap of an open map M′ = (D′, β1′, β2′) if there exists an injection
f : D ∪ {ε} → D′ ∪ {ε}, called a subisomorphism function, such that f(ε) = ε and
∀d ∈ D, ∀i ∈ {1, 2}, if βi(d) ≠ ε then βi′(f(d)) = f(βi(d)), else either βi′(f(d)) = ε
or f−1(βi′(f(d))) is empty.
This definition derives from the definition of isomorphism. The only modification
concerns the case where d is i-sewn with ε. In this case, the definition ensures
that f(d) is i-sewn either with ε, or with a dart d′ which is not matched with a
dart of M, i.e., such that f−1(d′) is empty (see the example in Fig. 4).
Note that if M is isomorphic to a submap of M′, then M is isomorphic to the
map M″ obtained from M′ by restricting the set of darts D′ to the set of darts
D″ = {d′ ∈ D′ | ∃a ∈ D, f(a) = d′}.
Algorithm 3 determines if there is a submap isomorphism between two open
connected maps. It is based on the same principle as Algorithm 1; the only
difference is the test of line 4, which succeeds if f is a subisomorphism function
instead of an isomorphism function. The time complexity of this algorithm is
in O(|D| · |D′|), as traverseAndBuildMatching is called at most |D′| times and
its complexity is in O(|D|). Note that the subisomorphism test may be done in
linear time.
Concerning correctness, note that the proofs and arguments given for isomorphism
are still valid: We solve the submap isomorphism problem with the same method
as before, except that the function f is now an injection instead of a bijection.
Fig. 4. (figure showing the maps M, M′ and M″)
Algorithm 3. checkSubIsomorphism(M, M′)
Input: two open connected maps M = (D, β1, β2) and M′ = (D′, β1′, β2′)
Output: returns true iff M is isomorphic to a submap of M′
1 choose d0 ∈ D
2 for d0′ ∈ D′ do
3   f ← traverseAndBuildMatching(M, M′, d0, d0′)
4   if f is an injection from D ∪ {ε} to D′ ∪ {ε} and
    ∀d ∈ D, ∀i ∈ {1, 2}, βi(d) ≠ ε ⇒ f(βi(d)) = βi′(f(d)) and
    ∀d ∈ D, ∀i ∈ {1, 2}, βi(d) = ε ⇒ ∄e ∈ D, f(e) = βi′(f(d)) then
5     return true
6 return false
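Given a candidate matching f, the test of line 4 can be sketched as follows (maps encoded as pairs (β1, β2) of dicts, with ε represented by a missing entry; illustrative only):

```python
def is_subisomorphism(f, M, Mp):
    """Line-4 test of Algorithm 3: f must be injective, preserve existing
    beta-links, and map darts that are i-sewn with epsilon to darts of M'
    that are either open or sewn to an unmatched dart."""
    if len(set(f.values())) != len(f):
        return False  # not injective
    matched = set(f.values())
    for d in f:
        for k in (0, 1):  # beta1, beta2
            di, dip = M[k].get(d), Mp[k].get(f[d])
            if di is not None:
                if f.get(di) != dip:
                    return False
            elif dip is not None and dip in matched:
                return False
    return True
```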
Note that the pattern may be a partial subgraph of the target. Let us consider for
example Fig. 1c. Edge (1, 5) need not belong to the searched pattern, even though
nodes 1 and 5 are matched to nodes of the searched pattern. However,
edge (4, 3) must belong to the searched pattern; otherwise it is not compact.
To use submap isomorphism to solve compact plane subgraph isomorphism,
we have to transform plane graphs into 2-maps. This is done by associating a
face in the map with every face of the graph except the external face. Indeed, a
2-map models a drawing of a graph on a sphere instead of a plane. Hence, none of
the faces of a map has a particular status whereas a plane graph has an external
(or unbounded) face. Let us consider for example the two graphs in Fig. 1a and
Fig. 1b: When embedded in a sphere, they are topologically isomorphic because
one can translate edge (d, c) by turning around the sphere, while this is not
possible when these graphs are embedded in a plane. In order to prevent turning
around the sphere through the external face, graphs are modelled by
open 2-maps such that external faces are removed: Only β2 is opened, and only
external faces are missing. Such open 2-maps correspond to topological disks.
A Polynomial Algorithm for Submap Isomorphism 109
Finally, a strong precondition for using our algorithms is that maps must
be connected. This implies that the original graphs must also be connected.
However, this is not a sufficient condition. One can show that an open 2-map
M modelling a plane graph G without its external face is connected if G is
connected and if the external face of G is delimited by an elementary cycle.
Hence, submap isomorphism may be used to decide in polynomial time if G1
is a compact plane subgraph of G2 provided that (i) G1 and G2 are modelled by
open 2-maps such that external faces are removed, and (ii) external faces of G1
and G2 are delimited by elementary cycles.
This result may be related to [JB98, JB99] which describe polynomial-time
algorithms for solving (sub)graph isomorphism of ordered graphs, i.e., graphs in
which the edges incident to a vertex are uniquely ordered.
6 Experiments
This section gives some preliminary experimental results that show the validity
of our approach. We first show that it allows us to find patterns in images, and
then we study the scale-up properties of our algorithm on plane graphs of growing
sizes. Experiments were run on an Intel Core 2 Duo processor at 2.20 GHz.
Fig. 5. Finding a car in an image: The original image is on the left; the plane graph
obtained after segmentation is in the middle; the graph obtained by Delaunay
triangulation and the corresponding combinatorial map are on the right. The car
has been extracted and rotated, and it has been found in the original image.
It is worth mentioning here that the two approaches actually solve different
problems: Our approach searches for compact plane subgraphs whereas Vflib2
searches for induced subgraphs and does not exploit the fact that the graphs
are planar. Hence, the number of solutions found may differ: Vflib2 may
find subgraphs that are topologically different from the searched pattern; also,
our approach may find compact plane subgraphs that are partial (see Fig. 1c)
whereas Vflib2 only searches for induced subgraphs. For each instance considered
in Table 1, both methods find only one matching, except for sg5000,10%, which is
found twice in g5000 by Vflib2 and once by our approach.
7 Discussion
References
[CFSV01] Cordella, L.P., Foggia, P., Sansone, C., Vento, M.: An improved algorithm
for matching large graphs. In: 3rd IAPR-TC15 Workshop on Graph-based
Representations in Pattern Recognition, Ischia, Italy, pp. 149–159 (2001)
[CFSV04] Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty years of graph match-
ing in pattern recognition. International Journal of Pattern Recognition and
Artificial Intelligence 18(3), 265–298 (2004)
[Cor75] Cori, R.: Un code pour les graphes planaires et ses applications. In:
Astérisque, vol. 27. Soc. Math. de France (1975)
[DBF04] Damiand, G., Bertrand, Y., Fiorio, C.: Topological model for two-
dimensional image representation: definition and optimal extraction algo-
rithm. Computer Vision and Image Understanding 93(2), 111–154 (2004)
[Edm60] Edmonds, J.: A combinatorial representation for polyhedral surfaces. In:
Notices of the American Mathematical Society, vol. 7 (1960)
[JB98] Jiang, X., Bunke, H.: Marked subgraph isomorphism of ordered graphs.
In: Amin, A., Pudil, P., Dori, D. (eds.) SPR 1998 and SSPR 1998. LNCS,
vol. 1451, pp. 122–131. Springer, Heidelberg (1998)
[JB99] Jiang, X., Bunke, H.: Optimal quadratic-time isomorphism of ordered
graphs. Pattern Recognition 32(7), 1273–1283 (1999)
[Lie91] Lienhardt, P.: Topological models for boundary representation: a compar-
ison with n-dimensional generalized maps. Computer-Aided Design 23(1),
59–82 (1991)
[Lie94] Lienhardt, P.: N-dimensional generalized combinatorial maps and cellu-
lar quasi-manifolds. International Journal of Computational Geometry and
Applications 4(3), 275–324 (1994)
[LWH03] Luo, B., Wilson, R.C., Hancock, E.R.: Spectral embedding of graphs. Pat-
tern Recognition 36(10), 2213–2230 (2003)
[McK81] McKay, B.D.: Practical graph isomorphism. Congressus Numerantium 30,
45–87 (1981)
[PABL07] Poudret, M., Arnould, A., Bertrand, Y., Lienhardt, P.: Cartes combina-
toires ouvertes. Research Notes 2007-1, Laboratoire SIC E.A. 4103, F-86962
Futuroscope Cedex - France (October 2007)
[SS08] Sorlin, S., Solnon, C.: A parametric filtering algorithm for the graph iso-
morphism problem. Constraints 13(4), 518–537 (2008)
[Tut63] Tutte, W.T.: A census of planar maps. Canad. J. Math. 15, 249–271 (1963)
[ZDS+ 07] Zampelli, S., Deville, Y., Solnon, C., Sorlin, S., Dupont, P.: Filtering for
subgraph isomorphism. In: Bessière, C. (ed.) CP 2007. LNCS, vol. 4741,
pp. 728–742. Springer, Heidelberg (2007)
A Recursive Embedding Approach to Median
Graph Computation
1 Introduction
Graphs are a powerful tool to represent structured objects compared to other
alternatives such as feature vectors. For instance, a recent work comparing the
representational power of such approaches under the context of web content
mining has been presented in [1]. Experimental results show better accuracies of
the graph-based approaches over the vector-based methods. Nevertheless, some
basic operations, such as computing the sum or the mean of a set of graphs,
become very difficult or even impossible in the graph domain.
The mean of a set of graphs has been defined using the concept of the median
graph. Given a set of graphs, the median graph [2] is defined as the graph
that has the minimum sum of distances (SOD) to all graphs in the set. It can
be seen as a representative of the set. Thus, it has a large number of potential
applications, as it enables many classical algorithms for learning, clustering
time increases exponentially both in terms of the number of input graphs and
their size [3]. A number of algorithms for the median graph computation have
been reported in the past [2,3,4,5], but, in general, they either suffer from a large
complexity or they are restricted to specific applications.
114 M. Ferrer et al.
2 Basic Definitions
2.1 Graph
Given L, a finite alphabet of labels for nodes and edges, a graph g is defined
by the four-tuple g = (V, E, μ, ν), where V is a finite set of nodes, E ⊆ V × V is
the set of edges, μ is the node labeling function (μ : V −→ L), and ν is the edge
labeling function (ν : V × V −→ L). The alphabet of labels is not constrained
in any way. For example, L can be defined as a vector space (i.e. L = Rn ) or
simply as a set of discrete labels (i.e. L = {Δ, Σ, Ψ, · · · }). Edges are defined as
ordered pairs of nodes, that is, an edge is defined by (u, v) where u, v ∈ V . The
edges are directed in the sense that if the edge is defined as (u, v) then u ∈ V is
the source node and v ∈ V is the target node.
A set of edit operations consisting of the insertion, deletion and substitution of both nodes and edges is defined.
defined. Given these edit operations, for every pair of graphs, g1 and g2 , there
exists a sequence of edit operations, or edit path p(g1 , g2 ) = (e1 , . . . , ek ) (where
each ei denotes an edit operation) that transforms g1 into g2 (see Figure 1). In
general, several edit paths exist between two given graphs. This set of edit paths
is denoted by ℘(g1 , g2 ). To evaluate which edit path is the best, edit costs are
introduced through a cost function. The basic idea is to assign a penalty (or cost)
c to each edit operation according to the amount of distortion it introduces in
the transformation. The edit distance between two graphs g1 and g2 , d(g1 , g2 ),
is the minimum cost edit path that transforms one graph into the other. Since
the graph edit distance is a NP-complete problem, in this paper we will use
suboptimal methods for its computation [11,12].
3 Median Graph
Let U be the set of graphs that can be constructed using labels from L. Given
S = {g1 , g2 , ..., gn } ⊆ U , the generalized median graph ḡ of S is defined
as:
    ḡ = arg min_{g ∈ U} Σ_{gi ∈ S} d(g, gi)    (1)
of graphs in S. The set median graph is usually not the best representative of a
set of graphs, but it is often a good starting point when searching for the generalized
median graph.
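The set median itself is directly computable from pairwise distances (a sketch; `dist` stands for any graph distance implementation, such as a suboptimal graph edit distance):

```python
def set_median(S, dist):
    """Return the member of S with minimum sum of distances (SOD) to all
    graphs in S; the generalized median minimises the same SOD over the
    whole space U of graphs instead of just S."""
    return min(S, key=lambda g: sum(dist(g, gi) for gi in S))
```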
As explained before, the difficulty in using graph embedding to calculate the
median graph is the mapping from the vector space back to the graph space.
Here we propose a recursive solution to the problem based on the algorithm for
the weighted mean of a pair of graphs [8].
The weighted mean of two graphs g and g′ is a graph g″ such that

    d(g, g″) = a                     (2)
    d(g, g′) = a + d(g″, g′)         (3)
    (Pn − P1) · Nn−1 = 0
    (Pn − P2) · Nn−1 = 0
      ⋮                              (4)
    (Pn − Pn−1) · Nn−1 = 0
    ‖Nn−1‖ = 1
The Euclidean median Mn of these n points will always fall on the hyperplane
Hn−1 . Moreover, it will fall within the volume of the (n−1)-dimensional simplex
with vertices Pi . For n = 4 this is visualised in Figure 2(a). This figure shows
the hyperplane H3 defined by the 4 points Pi = {P1 , P2 , P3 , P4 }. The Euclidean
median M4 falls in the 3D space defined by the 4 points and specifically within
the pyramid (3D simplex) with vertices Pi .
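The Euclidean (geometric) median itself is typically computed iteratively, e.g. with Weiszfeld's scheme [14]. A minimal numpy sketch with toy 2D points follows; a production version would add a convergence test and a guard for iterates landing exactly on a data point:

```python
import numpy as np

# Weiszfeld-style iteration for the Euclidean (geometric) median [14].
# Minimal sketch: fixed iteration count, toy points, no degeneracy guard.

def euclidean_median(P, iters=200):
    x = P.mean(axis=0)                      # start at the centroid
    for _ in range(iters):
        dist = np.linalg.norm(P - x, axis=1)
        w = 1.0 / np.maximum(dist, 1e-12)   # inverse-distance weights
        x = (P * w[:, None]).sum(axis=0) / w.sum()
    return x

P = np.array([[0.0, 0.0], [2.0, 0.0], [1.0, 2.0]])
m = euclidean_median(P)   # lies inside the simplex (triangle) spanned by P
```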
Without loss of generality we can choose any one of the points, say Pn , and
create the vector (Mn − Pn ). This vector will lie fully on the hyperplane Hn−1 .
As mentioned before, in order to use the weighted mean of a pair of
graphs to calculate the graph corresponding to Mn we need to first find a point
(whose corresponding graph is known) that lies on the line defined by the vector
(Mn − Pn ), and specifically on the ray extending Mn (so that Mn lies between
Pn and the new point).
Let’s call Hn−2 the hyperplane of dimensionality n-2 defined by the set of
points {P1 , P2 , . . . , Pn−1 }, that is all the original points except Pn . Then the
intersection of the line defined by the vector (Mn − Pn ) and the new hyperplane
Hn−2 will be a single point. For the running example of n=4 this point (M3 )
would be the point of intersection of the line P4 − M4 and the 2D plane H2
defined by the remaining points {P1 , P2 , P3 } (see Figure 2(a)).
A Recursive Embedding Approach to Median Graph Computation 119
For the normal vector Nn−2 of the hyperplane Hn−2 we can create the fol-
lowing set of n-1 equations in a similar fashion as before:
(Pn−1 − P1 ) · Nn−2 = 0
(Pn−1 − P2 ) · Nn−2 = 0
...
(Pn−1 − Pn−2 ) · Nn−2 = 0    (5)
‖Nn−2 ‖ = 1
Furthermore, we ask that Nn−2 is perpendicular to Nn−1 (i.e. it falls within the
hyperplane Hn−1 ):
Nn−1 · Nn−2 = 0 (6)
Equations 5 and 6 provide us with a set of n equations to calculate Nn−2 .
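Numerically, a unit vector satisfying such a homogeneous system can be recovered as a null-space direction of the stacked constraint rows, e.g. via the SVD. The following numpy sketch illustrates the idea of Eqs. (5)-(6) on toy points in R³ (the function name is ours):

```python
import numpy as np

# Sketch of Eqs. (5)-(6): the unit normal is orthogonal to all in-plane
# difference vectors and, optionally, to previously found normals.  It is
# recovered as the null-space direction of the stacked constraints.

def hyperplane_normal(diffs, extra_normals=()):
    A = np.vstack(list(diffs) + list(extra_normals))
    _, _, vt = np.linalg.svd(A)
    n = vt[-1]      # right singular vector of the smallest singular value
    return n / np.linalg.norm(n)

P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
# N2: normal of the plane through P1, P2, P3 (cf. Eq. (4)).
N2 = hyperplane_normal([P[2] - P[0], P[2] - P[1]])
# N1: normal of the line through P1, P2, additionally orthogonal to N2
# (cf. Eqs. (5)-(6)).
N1 = hyperplane_normal([P[1] - P[0]], extra_normals=[N2])
```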
Suppose Mn−1 is the point of intersection of the line defined by (Mn − Pn )
and the hyperplane Hn−2 ; then this point satisfies:
Mn−1 = Pn + α (Mn − Pn ) (7)
Based on eq. 7, 8 and 9, in the generic case the point Mk can be computed
recursively from:
Mk = Pk+1 + α (Mk+1 − Pk+1 ) (11)
where:
α = [Nk−1 · (Pk − Pk+1 )] / [Nk−1 · (Mk+1 − Pk+1 )]    (12)
This process is applied recursively until M2 is reached. The case of M2 is solvable
using the weighted mean of a pair of graphs, as M2 will lie on the line segment
defined by P1 and P2 which correspond to known graphs (see Figure 2(b)).
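One step of this recursion, Eqs. (11)-(12), amounts to intersecting a line with a hyperplane. A small numpy sketch with toy 2D coordinates (all values ours):

```python
import numpy as np

# One step of the recursion of Eqs. (11)-(12): intersect the line through
# P_{k+1} and M_{k+1} with the next hyperplane (normal N_{k-1}), writing
# M_k = P_{k+1} + alpha * (M_{k+1} - P_{k+1}).

def step_down(M_next, P_next, P_k, N_prev):
    alpha = (N_prev @ (P_k - P_next)) / (N_prev @ (M_next - P_next))
    return P_next + alpha * (M_next - P_next)

# Toy case: line from P3=(0,2) through M3=(0.3,1), intersected with the
# x-axis (normal (0,1)), which contains P2=(1,0); alpha = (-2)/(-1) = 2.
M2 = step_down(np.array([0.3, 1.0]), np.array([0.0, 2.0]),
               np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```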
Having calculated M2 , the inverse process can be followed all the way up to
Mn . In the next step M3 can be calculated as the weighted mean of the graphs
corresponding to M2 and P3 . Generally, the graph corresponding to the point
Mk is given as the weighted mean of the graphs corresponding to Mk−1 and
Pk . The weighted mean algorithm is applied repeatedly until the graph
corresponding to Mn is calculated, which is the median graph of the set.
It is important to note that the order in which the points are considered affects
the final solution. As a result it is possible that one of the intermediate
solutions along the recursive path produces a lower SOD to the graphs of the
set than the final solution. Thus, the results reported here are based on the best
intermediate solutions.
5 Experiments
In this section we provide the results of an experimental evaluation of the pro-
posed algorithm. To this end we have used three graph databases representing
Letter shapes, Webpages and Molecules. Table 1 shows some characteristics of
each dataset. For more information on these databases see [15].
To evaluate the quality of the proposed method, we propose to compare the
SOD of the median calculated using the present method (RE) taking the best
intermediate solution to the SOD of the median obtained using other existing
methods, namely the set median (SM) and the method of [9] (TE). For every
database we generated sets of different sizes as shown in Table 1. The graphs in
each set were chosen randomly from the whole database. In order to generalize
the results, we generated 10 different sets for each size.
Results of the mean value of the SOD over all the classes and repetitions for
each dataset are shown in Figure 3. Clearly, the lower the SOD, the better the
result.
Table 1. Summary of dataset characteristics, viz. the size, the number of classes (#
classes), the average size of the graphs (∅ nodes) and the sizes of the sets
Fig. 3. SOD evolution for the three databases. (a) Letter, (b) Molecule and (c) Webpage.
Since the set median graph is the graph belonging to the training set with
minimum SOD, it is a good reference to evaluate the median graph quality.
As we can see, the results show that in all cases we obtain medians with lower
SOD than those obtained with the TE method. In addition, in two cases (Web
and Molecule) we also obtain better results than the SM method. In the case of
the Letter database, we obtain slightly worse results than the SM method but
quite close to it. Nevertheless, our results do not diverge from the results of the
SM method as in the case of the TE method, which means that our proposed
method is more robust against the size of the set. With these results we can
conclude that our method finds good approximations of the median graph.
6 Conclusions
In the present paper we have proposed a novel technique to obtain approximate
solutions for the median graph. This new approach is based on graph embedding
into vector spaces. First, the graphs are mapped to points in an n-dimensional
vector space using the graph edit distance paradigm. Then, the crucial point
of obtaining the median of the set is carried out in the vector space, not in the
graph domain, which dramatically simplifies this operation. Finally, we proposed
a recursive application of the weighted mean of a pair of graphs to obtain the
graph corresponding to the median vector. This embedding approach allows us
to exploit the main advantages of both the vector and graph representations,
computing the more complex parts in real vector spaces but keeping the
representational power of graphs.
122 M. Ferrer et al.
Results on three databases, containing a high number of graphs with large sizes,
show that the medians obtained with our method are, in general, better than
those obtained with other methods, in terms of the SOD. For datasets such as
the ones used in this paper, the generalized median
could not be computed before, due to the high computational cost of the exist-
ing methods. These results show that with this new procedure the median graph
can be potentially applied to any application where a representative of a set is
needed. Nevertheless, there are still a number of issues to be investigated. For
instance, the order in which the points are considered is an important topic for
further study in order to improve the results of the method.
Acknowledgements
This work has been supported by the Spanish research programmes Consolider
Ingenio 2010 CSD2007-00018, TIN2006-15694-C02-02 and TIN2008-04998, the
fellowship 2006 BP-B1 00046 and the Swiss National Science Foundation Project
200021-113198/1.
References
1. Schenker, A., Bunke, H., Last, M., Kandel, A.: Graph-Theoretic Techniques for
Web Content Mining. World Scientific Publishing, USA (2005)
2. Jiang, X., Münger, A., Bunke, H.: On median graphs: Properties, algorithms, and
applications. IEEE Trans. Pattern Anal. Mach. Intell. 23(10), 1144–1151 (2001)
3. Münger, A.: Synthesis of prototype graphs from sample graphs. Diploma Thesis,
University of Bern (1998) (in German)
4. Hlaoui, A., Wang, S.: Median graph computation for graph clustering. Soft Com-
put. 10(1), 47–53 (2006)
5. Ferrer, M., Serratosa, F., Sanfeliu, A.: Synthesis of median spectral graph. In: Mar-
ques, J.S., Pérez de la Blanca, N., Pina, P. (eds.) IbPRIA 2005. LNCS, vol. 3523,
pp. 139–146. Springer, Heidelberg (2005)
6. Riesen, K., Neuhaus, M., Bunke, H.: Graph embedding in vector spaces by means
of prototype selection. In: Escolano, F., Vento, M. (eds.) GbRPR 2007. LNCS,
vol. 4538, pp. 383–393. Springer, Heidelberg (2007)
7. Bunke, H., Allerman, G.: Inexact graph matching for structural pattern recogni-
tion. Pattern Recognition Letters 1(4), 245–253 (1983)
8. Bunke, H., Günter, S.: Weighted mean of a pair of graphs. Computing 67(3), 209–
224 (2001)
9. Ferrer, M., Valveny, E., Serratosa, F., Riesen, K., Bunke, H.: An approximate
algorithm for median graph computation using graph embedding. In: Proceedings
of 19th ICPR, pp. 287–297 (2008)
10. Sanfeliu, A., Fu, K.: A distance measure between attributed relational graphs for
pattern recognition. IEEE Transactions on Systems, Man and Cybernetics 13(3),
353–362 (1983)
11. Neuhaus, M., Riesen, K., Bunke, H.: Fast suboptimal algorithms for the compu-
tation of graph edit distance. In: Yeung, D.-Y., Kwok, J.T., Fred, A., Roli, F.,
de Ridder, D. (eds.) SSPR 2006 and SPR 2006. LNCS, vol. 4109, pp. 163–172.
Springer, Heidelberg (2006)
12. Riesen, K., Neuhaus, M., Bunke, H.: Bipartite graph matching for computing the
edit distance of graphs. In: Escolano, F., Vento, M. (eds.) GbRPR 2007. LNCS,
vol. 4538, pp. 1–12. Springer, Heidelberg (2007)
13. White, D., Wilson, R.C.: Mixing spectral representations of graphs. In: 18th In-
ternational Conference on Pattern Recognition (ICPR 2006), Hong Kong, China,
August 20-24, pp. 140–144. IEEE Computer Society, Los Alamitos (2006)
14. Weiszfeld, E.: Sur le point pour lequel la somme des distances de n points donnés
est minimum. Tohoku Math. Journal (43), 355–386 (1937)
15. Riesen, K., Bunke, H.: IAM graph database repository for graph based pattern
recognition and machine learning. In: SSPR/SPR, pp. 287–297 (2008)
Efficient Suboptimal Graph Isomorphism
1 Introduction
Graphs, employed in structural pattern recognition, offer a versatile alternative
to feature vectors for pattern representation. Particularly in problem domains
where the objects consist of complex and interrelated substructures of different
size, graph representations are advantageous. However, after the initial enthusi-
asm induced by the “smartness” and flexibility of graphs in the late seventies,
graphs have been left almost unused for a long period of time [1]. One of the
reasons for this phenomenon is that their comparison, termed graph matching,
is computationally very demanding.
The present paper addresses the most elementary graph matching problem,
which is graph isomorphism. Several algorithms for the computation of graph
isomorphism have been proposed in the literature [2,3,4,5,6,7,8]. Note, however,
that no polynomial runtime algorithm is known for this particular decision
problem [9]. For all available algorithms, the computational complexity of graph
isomorphism is exponential in the number of nodes in the case of general graphs.
However, since the graphs encountered in practice often have special properties
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 124–133, 2009.
© Springer-Verlag Berlin Heidelberg 2009
and furthermore, the labels of both nodes and edges very often help to substan-
tially reduce the search time, the actual computation time is sometimes manage-
able. In fact, polynomial algorithms for graph isomorphism have been developed
for special kinds of graphs, such as trees [10], planar graphs [11], bounded-valence
graphs [12], ordered graphs [13], and graphs with unique node labels [14]. Appli-
cations of the graph isomorphism problem can be found, for example, in compu-
tational chemistry [12] and in electronic automation [15]. Nonetheless, the high
computational complexity of graph isomorphism in case of general graphs con-
stitutes a serious drawback that prevents the more widespread use of graphs in
pattern recognition and related fields.
The present paper introduces a novel framework for the problem of graph
isomorphism. It is not restricted to any special class of graphs. The basic idea
is inspired by two papers, viz. [16,17]. In [16] it was shown that the problem
of graph isomorphism can be seen as a special case of optimal error-tolerant
graph matching under particular cost functions. In [17] a framework for fast
but suboptimal graph edit distance based on bipartite graph matching has been
proposed. The method is based on an (optimal) fast bipartite optimization pro-
cedure mapping nodes and their local structure of one graph to nodes and their
local structure of another graph. This procedure is somewhat similar in spirit
to the method proposed in [18]. However, rather than using dynamic program-
ming for finding an optimal match between the sets of local structure, Munkres’
algorithm [19] is used.
The work presented here combines these two ideas to obtain a suboptimal algo-
rithmic framework for graph isomorphism with polynomial runtime. Concretely,
the problem of graph isomorphism is reduced to an instance of the assignment
problem. In fact, polynomial runtime algorithms exist that solve the assignment
problem optimally. Yet, due to the fact that the assignment procedure
regards the nodes and their local structure only, it cannot be guaranteed that an
existing graph isomorphism between two graphs is detected in any case. Some-
times the proposed algorithm may not be able to decide, for a given pair of
graphs, whether they are isomorphic or not. In such a case, the given pair of
graphs is rejected. Consequently, the algorithm is suboptimal in the sense that
it does not guarantee to return a decision for every given input. However, if a
pair of graphs is not rejected, the decision returned by the algorithm (yes or no)
is always correct.
With experimental results achieved on two data sets from the TC-15 graph data
base [20], we empirically verify the feasibility of our novel approach to the graph
isomorphism problem.
2 Graph Isomorphism
The aim in exact graph matching is to determine whether two graphs, or parts
of them, are identical in terms of structure and labels. A common approach
to describe the structure of a graph g = (V, E, μ, ν) is to define the graph’s
adjacency matrix A = (aij )n×n (|V | = n). In the adjacency matrix, entry aij is
equal to 1 if there is an edge (vi , vj ) ∈ E connecting the i-th node with the j-th
node in g, and 0 otherwise¹. Generally, for the nodes (and also the edges) of a
graph there is no unique canonical order. Thus, for a single graph with n nodes,
n! different adjacency matrices exist. Consequently, for checking two graphs for
structural identity, we cannot merely compare their adjacency matrices. The
identity of two graphs g1 and g2 is commonly established by defining a function,
termed graph isomorphism, mapping g1 to g2 .
Two graphs are called isomorphic if there exists an isomorphism between them.
Obviously, isomorphic graphs are identical in both structure and labels. The
relation of graph isomorphism satisfies the conditions of reflexivity, symmetry,
and transitivity and can therefore be regarded as an equivalence relation on
graphs.
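This definition can be tested directly by brute force for tiny graphs, which also makes the n! orderings concrete. The sketch below is an illustration of the definition, not a practical algorithm:

```python
from itertools import permutations

# Brute-force isomorphism test for tiny directed graphs: try every node
# permutation and compare adjacency matrices.  Exponential in |V|.

def adjacency(n, edges):
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1
    return A

def isomorphic(n, edges1, edges2):
    A1, A2 = adjacency(n, edges1), adjacency(n, edges2)
    for p in permutations(range(n)):
        # p is a candidate node relabeling; compare entry by entry.
        if all(A1[i][j] == A2[p[i]][p[j]] for i in range(n) for j in range(n)):
            return True
    return False

# A directed 3-cycle is isomorphic to a relabeled 3-cycle, but not to a path.
print(isomorphic(3, [(0, 1), (1, 2), (2, 0)], [(1, 0), (0, 2), (2, 1)]))  # True
print(isomorphic(3, [(0, 1), (1, 2), (2, 0)], [(0, 1), (1, 2)]))          # False
```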
Standard procedures for testing graphs for isomorphism are based on tree
search techniques with backtracking. A well known algorithm implementing the
idea of a tree search with backtracking for graph isomorphism is described in [2].
A more recent algorithm for graph isomorphism, also based on the idea of tree
search, is the VF algorithm and its successor VF2 [21]. Here the basic tree search
algorithm is endowed with an efficiently computable heuristic which substan-
tially speeds up the search time. In [4] the tree search method for isomorphism
is sped up by means of another heuristic based on constraint satisfaction. An-
other algorithm for exact graph matching is Nauty [5]. It is based on a set of
transformations that reduce the graphs to be matched to a canonical form on
which the testing of the isomorphism is significantly faster. In [8] an approxi-
mate solution to the graph isomorphism problem, using the eigendecompositions
of the adjacency or Hermitian matrices, is discussed. In [6] a novel approach to
the graph isomorphism problem based on quantum walks is proposed. The basic
idea is to simulate coined quantum walks on an auxiliary graph representing
possible node matchings of the underlying graphs. The reader is referred to [1]
for an exhaustive list of graph isomorphism algorithms developed since 1973.
¹ Two nodes vi , vj ∈ V connected by an edge (vi , vj ) ∈ E or (vj , vi ) ∈ E are commonly
referred to as adjacent.
where k > 0 is an arbitrary constant. Hence, the entry cij in
C is zero if the corresponding node labels μ1 (ui ) and μ2 (vj ) are identical, and
non-zero otherwise.
We denote by P the set of all n! permutations of the integers 1, 2, . . . , n.
Given the cost matrix C = (cij )n×n , the assignment problem can be stated as
finding a permutation (p1 , . . . , pn ) ∈ P that minimizes Σ_{i=1}^{n} c_{i p_i} .
Obviously, this is equivalent to the minimum cost assignment of the nodes of g1 ,
represented by the rows, to the nodes of g2 , represented by the columns of matrix
C. Hence, Munkres’ algorithm can be seen as a function m : V1 → V2 minimizing
the objective function Σ_{i=1}^{n} c_{i p_i} . Note that in general the function m
is not unique, as there may be several node mappings minimizing the actual
objective function.
128 K. Riesen et al.
Clearly, if the minimum node assignment cost d(g1 , g2 ) is greater than zero,
one can be sure that there exists no graph isomorphism between g1 and g2 . On the
other hand, if d(g1 , g2 ) is equal to zero, there exists the possibility that g1 and g2
are isomorphic to each other. Obviously, the condition d(g1 , g2 ) = 0 is necessary,
but not sufficient for the existence of a graph isomorphism as the structure of
the graph is not considered by d(g1 , g2 ). In other words, the proposed algorithm
looks at the nodes and their respective labels only and takes no information
about the edges into account. According to Def. 2 only Condition (1) is satisfied
by function m.
In order to get more stringent criteria for the decision whether or not a graph
isomorphism exists, the edge structure can be involved in the node assignment
process (Conditions (2) and (3) of Def. 2). To this end, structural information
is included in the node labels. In particular, we extend the node label μ(u) of
every node u ∈ V by the indegree and the outdegree of u. The indegree and
the outdegree of node u ∈ V denote the number of incoming and outgoing
edges of u, respectively. Furthermore, the Morgan index M is used to add fur-
ther information about the local edge structure in the node labels [22]. This
index is iteratively computed for each node u ∈ V , starting with Morgan index
values M (u, 1) equal to 1 for all nodes u ∈ V . Next, at iteration step i + 1,
M (u, i + 1) is defined as the sum of the Morgan indices of u’s adjacent nodes
of the last iteration i. Note that the Morgan index M (u, i) associated to a node
u after the i-th iteration counts the number of paths of length i starting at
u and ending somewhere in the graph [23]. Hence, the Morgan index provides us
with a numerical description of the structural neighborhood of the individual
nodes.
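The iterative Morgan index computation described above can be sketched as follows (toy undirected path, node names ours):

```python
# Sketch of the iterative Morgan index: M(u, 1) = 1 for every node, and
# M(u, i+1) is the sum of the indices M(., i) of u's adjacent nodes.

def morgan(adj, iterations):
    """adj: dict mapping each node to the list of its adjacent nodes."""
    M = {u: 1 for u in adj}
    for _ in range(iterations - 1):
        M = {u: sum(M[v] for v in adj[u]) for u in adj}
    return M

# Toy path a - b - c: the better-connected middle node leads at step 2.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
print(morgan(adj, 2))  # {'a': 1, 'b': 2, 'c': 1}
print(morgan(adj, 3))  # {'a': 2, 'b': 2, 'c': 2}
```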
Given this additional information about the local structure of the nodes in a
graph, viz. the indegree, the outdegree, and the Morgan index, the cost cij of
a node mapping ui → vj is now defined with respect to the nodes’ labels and
their local structure information. That is, the entry cij is zero iff the original
labels, the indegrees and outdegrees, and the Morgan indices are identical for
both nodes ui ∈ V1 and vj ∈ V2 . Otherwise, we set cij = k, where k > 0 is an
arbitrary constant.
Considerations in the present paper are restricted to graphs with unlabeled
edges. However, if there are labels on the edges, the minimum sum of assign-
ment costs, implied by node substitution ui → vj , could be added to cij . This
minimum sum will be zero, iff all of the incoming and outgoing edges of node ui
can be mapped to identically labeled and equally directed edges incident to vj .
Otherwise, for all non-identical edge matchings implied by ui → vj , a constant
k > 0 is added to cij ². In summary, cij will be zero iff ui and vj and their
respective local neighborhoods are identical in terms of structure and labeling.
Note that Munkres’ algorithm used in its original form is optimal for solving
the assignment problem, but it provides us with a suboptimal solution for the
graph isomorphism problem only. This is due to the fact that each node assign-
ment operation is considered individually (considering the local edge structure
only), such that no implied operations on the edges can be inferred dynamically.
The result returned by Munkres’ algorithm corresponds to the minimum cost
mapping m of the nodes V1 to the nodes V2 according to matrix C. The overall
cost d(g1 , g2 ) defined in Eq. (1) builds the foundation of a two-stage decision pro-
cedure. In Fig. 1 the decision framework is illustrated. If d(g1 , g2 ) > 0, a graph
isomorphism can be definitely excluded as the nodes and their local structure of
g1 cannot be identically mapped to local structures in g2 . If d(g1 , g2 ) = 0, it is
possible that g1 and g2 are isomorphic to each other. Yet, the global edge struc-
ture might be violated given the node mapping m : V1 → V2 . Hence, the mapping
of the edges implied by the node mapping is tested (Check Structure). This test
can be easily accomplished, given the node mapping returned by Munkres’ al-
gorithm. If the edge structure is not violated by mapping m (identical), a graph
isomorphism has been found. Otherwise (non-identical), based on the current
information no definite answer can be given, as there may exist other optimal
node mappings m that would not violate the global edge structure. In such a
case, the decision for isomorphism is rejected.
The decision framework of Fig. 1 is suboptimal in the sense that a decision
(yes or no) is not guaranteed to be returned for all inputs. It is possible that
the algorithm rejects a given pair of graphs. However, if an answer yes or no is
given, it is always correct. In the remainder of the present paper, we refer to this
algorithm as Bipartite-Graph-Isomorphism, or BP-GI for short.
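The two-stage decision scheme can be sketched as follows. A brute-force search over permutations stands in for Munkres' algorithm (adequate only for tiny graphs), and only (indegree, outdegree) pairs serve as the local node labels, omitting the Morgan index for brevity:

```python
from itertools import permutations

# Minimal sketch of the two-stage BP-GI decision scheme for unlabeled
# directed graphs.  Brute-force assignment replaces Munkres' algorithm.

def degrees(n, edges):
    ind, outd = [0] * n, [0] * n
    for u, v in edges:
        outd[u] += 1
        ind[v] += 1
    return list(zip(ind, outd))

def bp_gi(n, edges1, edges2):
    """Return 'yes', 'no', or 'reject'."""
    lab1, lab2 = degrees(n, edges1), degrees(n, edges2)
    # Stage 1: minimum-cost node assignment (cost 0 iff local labels match).
    cost, m = min((sum(lab1[i] != lab2[p[i]] for i in range(n)), p)
                  for p in permutations(range(n)))
    if cost > 0:
        return "no"          # no isomorphism can exist
    # Stage 2: check the edge structure implied by the node mapping m.
    if {(m[u], m[v]) for u, v in edges1} == set(edges2):
        return "yes"
    return "reject"          # another optimal mapping might still work

print(bp_gi(3, [(0, 1), (1, 2), (2, 0)], [(1, 2), (2, 0), (0, 1)]))  # yes
print(bp_gi(3, [(0, 1), (1, 2), (2, 0)], [(0, 1), (1, 2)]))          # no
```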
Fig. 1. Graph isomorphism decision scheme. Square boxes refer to algorithms, circles
to decisions. Black circles stand for definite decisions, while the gray circle stands for a
possible “yes” which is verified by checking the edge structures for identity. If the edge
structure is violated by mapping m, no definite answer can be given.
The matching process by means of Munkres’ algorithm is carried out in O(n3 )
time. Finally, the edge structure is checked (O(n2 )). Hence, the total complexity
amounts to O(n3 ).
An alternative to the proposed algorithm is to check other optimal matchings
m whenever the edge structure check fails. In the worst case, however, there
exist O(n!) optimal matchings and trying all of them leads to a computational
complexity of O(n!). In order to avoid this high complexity, one can define an
upper limit L on the number of optimal assignments to be tried by the algorithm.
4 Experimental Evaluation
The purpose of the experiments is twofold. First, we want to compare the
runtime of the novel approach for graph isomorphism with the runtime of
standard algorithms for the same problem³. To this end, Ullmann’s method [2],
the VF2 algorithm [3], and Nauty [5] are employed as reference systems⁴. Second,
we are interested in how often the novel algorithm rejects a given pair of graphs.
We use two graph sets from the TC-15 graph data base [20], viz. the randomly
connected graphs (RCG), and the irregular mesh graphs (IMG). The former data
set consists of graphs where the edges connect the nodes without any structural
regularity. That is, the probability of an edge between two nodes is independent
of the actual nodes. The parameter η defines the probability of an edge between
two given nodes. Hence, given a graph g with |V | = n nodes, the expected
number of edges in g is given by η · n · (n − 1) (in our experiments we set η = 0.1).
Note that if g is not connected, additional edges are suitably inserted into graph
g until it becomes connected. The latter data set is based on structurally regular
mesh graphs in which each node (except those belonging to the border of the
mesh) is connected with its four neighborhood nodes. Irregular mesh graphs
are then obtained by the addition of uniformly distributed random edges. The
number of added edges is ρ · n, where ρ is a constant greater than 0, and n = |V |
(in our experiments we set ρ = 0.2). Note that the graphs from both data sets
are unlabeled a priori. Hence, when Munkres’ algorithm is applied the Morgan
index M (u, i), as well as the in- and outdegree of a particular node u ∈ V , are
the only labels on the nodes.
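The RCG edge model can be sketched as follows; the subsequent connectivity repair described in the text is omitted, and the seed and function name are ours:

```python
import random

# Sketch of the randomly connected graph (RCG) edge model: each directed
# edge (u, v), u != v, is included independently with probability eta, so
# the expected number of edges is eta * n * (n - 1).

def random_connected_graph_edges(n, eta, seed=0):
    rng = random.Random(seed)
    return [(u, v) for u in range(n) for v in range(n)
            if u != v and rng.random() < eta]

edges = random_connected_graph_edges(100, 0.1)
# Expected edge count: 0.1 * 100 * 99 = 990.
```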
Graphs of various size are tested. The size of the randomly connected graphs
varies between 20 and 1000 nodes per graph (|V | = 20, 40, . . . , 100, 200, 400, 800,
1000). On the irregular mesh graphs, the size varies from 16 nodes up to 576
nodes per graph (|V | = 16, 36, 64, 81, 100, 196, 400, 576). For each graph size 100
graphs are available. Hence, 90, 000 isomorphism tests (thereof 900 between iso-
morphic graphs) and 80, 000 isomorphism tests (thereof 800 between isomorphic
graphs) are carried out in total on RCG and IMG, respectively.
³ Computations are carried out on an Intel Pentium 4 CPU, 3.00 GHz, with 2.0 GB of
RAM.
⁴ We use the original implementations available under http://amalfi.dis.unina.it/graph/db/vflib-2.0/
for Ullmann’s method and VF2, and http://cs.anu.edu.au/~bdm/nauty/ for Nauty.
On the first data set (RCG) the algorithm returns 89,998 correct decisions.
Only in two cases is the input rejected. On the second data set (IMG) we obtain
79,996 correct decisions and four rejects.
In Fig. 2 (a) and (b) the mean computation time of one graph isomorphism
test is plotted as a function of the graph size |V |. On both data sets Ullmann’s
method turns out to be the slowest graph isomorphism algorithm. VF2 and
Nauty feature faster matching times for both graph sets than the traditional
approach of Ullmann. Similar results are reported in [24] on the same data sets.
However, it clearly turns out that our novel system based on bipartite graph
matching is faster than all reference systems for all available graph sizes.
Fig. 2. Mean computation time for graph isomorphism as a function of graph size |V |
on randomly connected graphs (RCG) and irregular mesh graphs (IMG)
Acknowledgments
This work has been supported by the Swiss National Science Foundation (Project
200021-113198/1). We would like to thank the Laboratory of Intelligent Systems
and Artificial Vision of the University of Naples for making the TC-15 data base,
Ullmann’s algorithm, and the VF2 algorithm available to us. Furthermore, we
are very grateful to Brendan McKay for making Nauty available to us.
References
1. Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty years of graph matching
in pattern recognition. Int. Journal of Pattern Recognition and Artificial Intelli-
gence 18(3), 265–298 (2004)
2. Ullmann, J.: An algorithm for subgraph isomorphism. Journal of the Association
for Computing Machinery 23(1), 31–42 (1976)
3. Cordella, L.P., Foggia, P., Sansone, C., Vento, M.: An improved algorithm for
matching large graphs. In: Proc. 3rd Int. Workshop on Graph Based Representa-
tions in Pattern Recognition (2001)
4. Larrosa, J., Valiente, G.: Constraint satisfaction algorithms for graph pattern
matching. Mathematical Structures in Computer Science 12(4), 403–422 (2002)
5. McKay, B.: Practical graph isomorphism. Congressus Numerantium 30, 45–87
(1981)
6. Emms, D., Hancock, E., Wilson, R.: A correspondence measure for graph matching
using the discrete quantum walk. In: Escolano, F., Vento, M. (eds.) GbRPR 2007.
LNCS, vol. 4538, pp. 81–91. Springer, Heidelberg (2007)
7. Messmer, B., Bunke, H.: A decision tree approach to graph and subgraph isomorphism
detection. Pattern Recognition 32, 1979–1998 (1999)
8. Umeyama, S.: An eigendecomposition approach to weighted graph matching prob-
lems. IEEE Transactions on Pattern Analysis and Machine Intelligence 10(5), 695–
703 (1988)
9. Garey, M., Johnson, D.: Computers and Intractability: A Guide to the Theory of
NP-Completeness. Freeman and Co., New York (1979)
10. Aho, A., Hopcroft, J., Ullman, J.: The Design and Analysis of Computer Algo-
rithms. Addison Wesley, Reading (1974)
11. Hopcroft, J., Wong, J.: Linear time algorithm for isomorphism of planar graphs. In:
Proc. 6th Annual ACM Symposium on Theory of Computing, pp. 172–184 (1974)
12. Luks, E.: Isomorphism of graphs of bounded valence can be tested in polynomial
time. Journal of Computer and Systems Sciences 25, 42–65 (1982)
13. Jiang, X., Bunke, H.: Optimal quadratic-time isomorphism of ordered graphs. Pat-
tern Recognition 32(17), 1273–1283 (1999)
14. Dickinson, P., Bunke, H., Dadej, A., Kraetzl, M.: Matching graphs with unique
node labels. Pattern Analysis and Applications 7(3), 243–254 (2004)
15. Ebeling, C.: Gemini ii: A second generation layout validation tool. In: IEEE Inter-
national Conference on Computer Aided Design, pp. 322–325 (1988)
16. Bunke, H.: Error correcting graph matching: On the influence of the underlying cost
function. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(9),
917–922 (1999)
17. Riesen, K., Bunke, H.: Approximate graph edit distance computation by means of
bipartite graph matching. In: Image and Vision Computing (2008) (accepted for
publication)
18. Eshera, M., Fu, K.: A graph distance measure for image analysis. IEEE Transac-
tions on Systems, Man, and Cybernetics (Part B) 14(3), 398–408 (1984)
19. Munkres, J.: Algorithms for the assignment and transportation problems. Journal
of the Society for Industrial and Applied Mathematics 5, 32–38 (1957)
20. Foggia, P., Sansone, C., Vento, M.: A database of graphs for isomorphism and
subgraph isomorphism benchmarking. In: Proc. 3rd Int. Workshop on Graph Based
Representations in Pattern Recognition, pp. 176–187 (2001)
21. Cordella, L., Foggia, P., Sansone, C., Vento, M.: A (sub)graph isomorphism algo-
rithm for matching large graphs. IEEE Trans. on Pattern Analysis and Machine
Intelligence 26(10), 1367–1372 (2004)
22. Morgan, H.: The generation of a unique machine description for chemical
structures-a technique developed at chemical abstracts service. Journal of Chemical
Documentation 5(2), 107–113 (1965)
23. Mahé, P., Ueda, N., Akutsu, T.: Graph kernels for molecular structure-activity
relationship analysis with support vector machines. Journal of Chemical Information
and Modeling 45(4), 939–951 (2005)
24. Foggia, P., Sansone, C., Vento, M.: A performance comparison of five algorithms for
graph isomorphism. In: Jolion, J., Kropatsch, W., Vento, M. (eds.) Proc. 3rd Int.
Workshop on Graph Based Representations in Pattern Recognition, pp. 188–199
(2001)
Homeomorphic Alignment
of Edge-Weighted Trees
Université Paris-Est
Laboratoire d’Informatique Gaspard Monge, Equipe A3SI
UMR 8049 UPEMLV/ESIEE/CNRS
1 Introduction
Motion capture without markers is a highly active research area, and is used
in several applications with differing needs: 3D model animation, for movie
special effects or video games for example, requires a highly accurate model
but does not need real-time computation (offline video processing is acceptable).
Real-time interaction, for virtual reality applications, requires fast computation
at the price of a lower accuracy. This paper is placed in the context of
real-time interaction.
The first step (called the initialization step) consists of finding the initial pose
of the subject, represented here by a 3D shape (visual hull) constructed using a
multi-view system with a Shape From Silhouette algorithm [1].
An important part of the algorithms of 3D pose estimation use a manually
initialized model, or ask the subject to move succesively the differents parts of
his/her body, but several automatic approaches have been developped, using
an a priori model. This a priori model can approximate different characteristics
of the subject: kinematic structure, shape or appearance. This kind of a priori
model have several constraints. It is complex because characteristics of different
nature are involved, and needs to be adapted to each subject (specialy in the
case of appearance).
1.1 Motivation
Our goal is to automate the initial pose estimation step. To this end, we use a
very simple a priori model.
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 134–143, 2009.
c Springer-Verlag Berlin Heidelberg 2009
[Figure 1 panels: 3D shape, skeleton, data tree, pattern tree]
Fig. 1. Example of data tree acquisition and expected alignment with model tree
The model is an unrooted weighted tree (called the pattern tree), in which vertices
represent the different parts of the shape and each edge represents the link between
these parts, weighted by the distance between the two parts.
Concerning the data, we extract the curve skeleton of the visual hull and
compute the associated unrooted weighted tree (called the data tree) by taking
each multiple point and ending point as a vertex, and linking two vertices when
they are directly connected, the weight of the edge being the geodesic distance
between them (see Fig. 1).
After this step, the main difficulty is to match the pattern tree to the data
tree while preserving both topology and distances.
Several similar approaches using the skeleton of a shape have been developed,
in the motion capture research area [3,4,5] and in the 3D shape matching research
area [6,8]. In the first case, the best reported time for finding the initial pose is
about one second [4], which is too slow even for interactive applications. In
the second case, the algorithms either give an approximate solution [8] or need
accurate knowledge of the radius distance of the skeleton, in order to compute
the associated shock graph [9].
As shown in Fig. 1, several kinds of noise and deformity can appear in
the data tree: spurious branches (edges {g, h}, {l, m}, {i, j}, {j, k}), useless 2-
degree vertices obtained after spurious branch deletion (in our example, ver-
tices j, k, m), and split vertices (vertex T of the pattern tree matches vertices
b and e in the data tree).
Approaches found in the literature do not achieve a matching that is robust
to these perturbations, mainly because they are designed to reach an isomorphism
between the trees instead of a homeomorphism. In the following, after adapting
basic notions, we introduce both a new alignment, called the homeomorphic
alignment, and a robust tree-matching algorithm which may be used for
real-time pose estimation.
2 Basic Notions
An undirected graph is a pair (V, E), where V is a finite set of vertices and E
a subset of {{x, y}, x ∈ V, y ∈ V, x ≠ y} (whose elements are called edges). The
degree of v ∈ V is the number of edges containing v.
136 B. Raynal, M. Couprie, and V. Biri
3 Measure of Similarity
For a graph G = (V, A, ω), the commonly used operations are:
resize: change the weight of an arc a = (u, v) ∈ A.
delete: delete an arc a = (u, v) ∈ A and merge u and v into one vertex.
insert: split a vertex in two and link the two new vertices by a new arc.
The cost of these edit operations is given by a cost function γ(w, w′), where
w (respectively w′) is the total weight of the arcs involved in the operation
before (respectively after) its application. We assume that γ is a metric. Typically,
γ(w, w′) = |w − w′| or γ(w, w′) = (w − w′)².
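The two typical cost functions just mentioned can be made concrete in a few lines; the sketch below uses illustrative function names that are not from the paper.

```python
# Two typical choices of the edit-operation cost gamma(w, w'), applied to the
# total arc weight before (w) and after (w2) an operation. Names illustrative.

def gamma_abs(w, w2):
    """Cost |w - w'|."""
    return abs(w - w2)

def gamma_sq(w, w2):
    """Cost (w - w')^2."""
    return (w - w2) ** 2

# A resize from weight 12 to 10 costs gamma(12, 10);
# deleting an arc of weight 7 costs gamma(7, 0).
print(gamma_abs(12, 10))  # 2
print(gamma_sq(7, 0))     # 49
```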
Various edit-based distances have been defined, using different constraints on
sequence order and different definitions of the operations. These edit-based distances
can be classified, as proposed by Wang et al. [10], into edit distance [11], alignment
distance [12,13], isolated-subtrees distance [14], and top-down distance [15]. The
proposed edit distances, isolated-subtrees distances and top-down distances cannot
always match the whole model tree, but only subparts, most often unconnected.
However, we will see in the next subsection that this is not the case for the
alignment distance.
The minimal cost over all alignments from G1 to G2, called the alignment
distance, is denoted by α(G1, G2). The alignment distance is interesting in our
case for three reasons: it preserves topological relations between trees, it can be
computed in polynomial time, and it allows edges to be removed regardless of
the rest of the graph, solving the problem of split vertices.
To solve the useless vertex problem, we propose a new alignment which removes
2-degree vertices and searches for a minimal sequence of operations reaching a
homeomorphism, instead of an isomorphism, between the trees.
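The 2-degree vertex removal this alignment relies on is the classical homeomorphic reduction of a tree: each 2-degree vertex is removed and its two incident edges are merged into one whose weight is the sum of the originals. A minimal sketch, assuming an adjacency-dict representation (this is not the authors' full algorithm, and all names are illustrative):

```python
# Homeomorphic reduction: repeatedly remove any 2-degree vertex, merging its
# two incident edges into one edge whose weight is the sum of their weights.
# The tree is a dict {vertex: {neighbor: weight}}; representation illustrative.

def reduce_degree2(adj):
    adj = {v: dict(nbrs) for v, nbrs in adj.items()}  # work on a copy
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if len(adj[v]) == 2:
                (a, wa), (b, wb) = adj[v].items()
                del adj[a][v]; del adj[b][v]; del adj[v]
                adj[a][b] = adj[b][a] = wa + wb  # merged edge, summed weight
                changed = True
                break
    return adj

# A path a-2-b-3-c, with b of degree 2, reduces to a single edge a-5-c.
t = {'a': {'b': 2}, 'b': {'a': 2, 'c': 3}, 'c': {'b': 3}}
print(reduce_degree2(t))  # {'a': {'c': 5}, 'c': {'a': 5}}
```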
4 Algorithms
4.1 Algorithm for Rooted Trees
Let T = (V, A, ω) be a weighted tree rooted in rT . For each vertex v ∈ V \ {rT },
we denote by ↑ v the arc (w, v) ∈ A, w being the parent of v. We denote by
Tcut(v, va) = Cut(T(va), {↑p′, p′ ∈ C(p) \ Π(va, v), p ∈ Π(va, par(v))}) . (3)
We denote by T(v, va) the tree obtained from Tcut(v, va) by merging each
vertex n ∈ Π(va, v) \ {va, v}. We denote by F(T, v) the rooted forest whose
connected components are the trees T(p, v), for all p ∈ C(v). By abuse of
notation, we also denote by F(T, v) the set of all connected components of this
forest (that is, a set of trees).
Proofs of the following propositions can be found in [17].
ηcut(∅, ∅) = 0
ηcut(P(i, ia), ∅) = ηcut(F(P, i), ∅) + γ(ω(ia, i), 0)
ηcut(F(P, ia), ∅) = Σi′∈C(ia) ηcut(P(i′, ia), ∅)    (5)
ηcut(∅, D(j, ja)) = 0
ηcut(∅, F(D, ja)) = 0 .
ηcut(P(i, ia), D(j, ja)) = min of:
  ηcut(F(P, i), ∅) + γ(ω(ia, i), 0) ,
  γ(ω(ia, i), ω(ja, j)) + ηcut(F(P, i), F(D, j)) ,
  minjc∈C(j) { ηcut(P(i, ia), D(jc, ja)) } ,    (6)
  minic∈C(i) { ηcut(P(ic, ia), D(j, ja)) + Σi′c∈C(i)\{ic} ηcut(P(i′c, i), ∅) } .
ηcut(A, B) = min of:
  minD(j′,j)∈B { ηcut(A, B \ {D(j′, j)}) } ,
  minP(i′,i)∈A { ηcut(A \ {P(i′, i)}, B) + ηcut(P(i′, i), ∅) } ,
  minP(i′,i)∈A, D(j′,j)∈B { ηcut(A \ {P(i′, i)}, B \ {D(j′, j)}) + ηcut(P(i′, i), D(j′, j)) } ,    (7)
  minP(i′,i)∈A, B′⊆B { ηcut(A \ {P(i′, i)}, B \ B′) + ηcut(F(P, i′), B′) + γ(Ω(i′), 0) } ,
  minA′⊆A, D(j′,j)∈B { ηcut(A \ A′, B \ {D(j′, j)}) + ηcut(A′, F(D, j′)) + γ(0, Ω(j′)) } .
The total computation time complexity is in O(|VP| · |VD| · (2^dP · 2^dD · (dD ·
2^dP + dP · 2^dD) + hP · hD · (dP + dD))), where dG and hG denote, respec-
tively, the maximal degree of a vertex in G and the height of G. If the maximal
degree is bounded, the total computation time complexity is in O(|VP| · |VD| ·
hP · hD).
5 Experimentation
5.2 Results
Our model tree contains seven vertices, representing the head, the torso, the
crotch, the two hands and the two feet. Experimentally, the data tree obtained
from the skeleton of the visual hull has a degree bounded by 4, and its number
of vertices lies between seven and twenty, following a Gaussian distribution
centred on ten. All the results have been obtained on a computer with a 3 GHz
Xeon processor and 1 GB of RAM.
To determine the average computation time of our algorithm, we randomly
generated 32 pattern trees and, for each pattern tree, 32 data trees, which
yields 1024 pairs of trees. Each pattern tree has seven vertices, one of which
has degree 4. Each data tree has at least one 4-degree vertex. The results for
the four kinds of alignment are shown in Fig. 3.
In the average case (|VD| ≤ 12), the homeomorphic alignment between a
rooted pattern tree and an unrooted data tree (we assume that the torso is always
aligned) can easily be computed in real time (frequency above 24 Hz), and
in the worst case (|VD| = 20) we keep interactive time (frequency above
12 Hz). For tracking, if we can use the homeomorphic alignment between two
rooted trees, we are well above 50 Hz.
To determine the average precision of our algorithm, we generated data trees
from pattern trees by adding new vertices in three ways: splitting an existing
vertex in two, adding a new 1-degree vertex, and adding a new 2-degree vertex.
Then, we modified the weight of each edge within a proportional range. The
results are shown in Fig. 3.
[Figure 3 plots: frequency (Hz) versus |Vd| for HA(rooted P, rooted D), HA(rooted P, D), HA(P, rooted D) and HA(P, D); percentage of good matchings versus percentage of noise vertices, for 0%, 10% and 50% of weight variation]
Fig. 3. Top: frequencies of the different homeomorphic alignments for variable sizes
of the data tree, and precision for several kinds of noise. Bottom: examples of results
obtained on 3D shapes; black circles represent the points matched with the pattern tree.
6 Conclusion
References
1. Laurentini, A.: The Visual Hull Concept for Silhouette-based Image Understand-
ing. IEEE Transactions on Pattern Analysis and Machine Intelligence 16(2), 150–
162 (1994)
Inexact Matching of Large and Sparse Graphs
Using Laplacian Eigenvectors
D. Knossow et al.
1 Introduction
Many problems in computer vision, shape recognition, and document and text
analysis can be formulated as graph matching problems. The nodes of a graph
correspond to local features or, more generally, to objects, and the edges of the
graph correspond to relationships between these objects. A solution to graph
matching consists of finding an isomorphism (exact matching) or an optimal
sub-graph isomorphism (inexact matching) between the two graphs. Spectral
graph matching methods are attractive because they provide a framework that
allows graphs to be embedded into isometric spaces and hence replaces the
initial NP-hard isomorphism problem with a more tractable point registration
problem.
An undirected weighted graph with N nodes can be represented by an N × N real
symmetric matrix, the adjacency matrix of the graph. Provided that this matrix has N
distinct eigenvalues, the graph can be embedded in the orthonormal basis formed by the
corresponding eigenvectors. Hence, an N-node graph becomes a set of N points in an
N-dimensional isometric space. In [1] it is proved that the eigendecomposition of the
adjacency matrices provides an optimal solution for exact graph matching, i.e., matching
graphs with the same number of nodes. The affinity matrix of a shape described by a
set of points can be used as the adjacency matrix of a fully connected weighted graph
[2,3,4,5]. Although these methods can only match shapes with the same number of
points, they introduce the heat kernel to describe the weights between points (nodes),
which has a good theoretical justification [6].
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 144–153, 2009.
c Springer-Verlag Berlin Heidelberg 2009
Inexact Matching of Large and Sparse Graphs Using Laplacian Eigenvectors 145
Unfortunately, exact graph matching is not very practical, in particular when the
two graphs have different numbers of nodes, e.g., when they are constructed from real
data such as visual sensors. One therefore needs to combine spectral analysis with
dimensionality reduction, so that the two graphs being matched are embedded in a
common sub-eigenspace of lower dimension than the original graphs. This immediately
calls for methods that allow many-to-many point correspondences. A clever idea is to
combine matching with agglomerative hierarchical clustering, as done in [7]. We also
note that spectral matching has strong links with spectral clustering [8], which uses the
Laplacian matrix of a graph [9].
The analysis of the spectral methods cited above relies on the eigenvalues of the adja-
cency or Laplacian matrices. The strict ordering of the eigenvalues allows the alignment
of the two eigenbases, while the existence of an eigengap allows the selection of a low-
dimensional eigenspace. In the case of inexact matching of large and sparse graphs,
a number of issues remain open, for the following reasons. The eigenvalues cannot be
reliably ordered, and one needs to use heuristics such as those proposed in [3,10].
The graph matrices may have eigenvalues with geometric multiplicities, and hence the
corresponding eigenspaces are not uniquely defined. Dimensionality reduction relies
on the existence of an eigengap; in the case of large graphs, e.g., ten thousand nodes,
the eigengap analysis yields an eigenspace whose dimension is not well suited for
the task at hand. The sign ambiguity of eigenvectors is generally handled using sim-
ple heuristics [1,7]. The link between spectral matching and spectral clustering has not
yet been thoroughly investigated. Existing spectral matching algorithms put the eigen-
vectors on an equal footing; the particular role played in clustering by the Fiedler
vector [9] has not been studied in the context of matching. We remark that the selec-
tion of strongly localized eigenvectors, which we define as the vector (eigenfunction)
Fig. 1. Two graphs (meshes) of a hip-hop dancer (top-left) with respectively 31,600 and 34,916
nodes (vertices). The matching (top-right), subsampled for visualization purposes, was obtained
by computing the embeddings of the two graphs into a 5-dimensional space (bottom-left) and by
registering the two embeddings (bottom-right).
146 D. Knossow et al.
that spans over a small subset of the graph while being zero elsewhere, hence corre-
sponding to subgraph clusters, has not been studied in depth. The only existing strategy
for selecting such eigenvectors is based on eigenvalue ordering and the detection of an
eigengap.
In this paper we propose an inexact spectral matching algorithm that embeds large
graphs into an isometric space spanned by a subset of eigenvectors corresponding to the
smallest eigenvalues of the Laplacian matrix. We claim that the tasks of (i) selecting a
subset of eigenvectors, (ii) ordering them, (iii) finding a solution to the sign-ambiguity
problem, as well as (iv) aligning two embeddings, can be carried out by computing
and comparing the histograms of the projections of the graphs’ nodes onto the eigen-
vectors. We postulate that the statistics of these histograms convey important geometric
properties of the Laplacian eigenvectors [11]. In practice, we apply the proposed algo-
rithm to match graphs corresponding to discrete surface representations of articulated
shapes, i.e., mesh registration. Figure 1 shows a graph matching result obtained with
our algorithm.
2 Laplacian Embedding
where dij is the geodesic distance between two nodes and σ is a smoothing parameter.
In the case of meshes, a vertex is connected to its neighbors on the surface. In practice
we take the Euclidean distance between two connected vertices and Dii ≈ 6, which
yields a very sparse graph. The Laplacian of a graph is defined as L = D − W. We
consider the normalized graph Laplacian: L = D^{−1/2}(D − W)D^{−1/2}. This is a
positive semi-definite symmetric matrix with eigenvalues 0 = λ0 ≤ λ1 ≤ … ≤ λN−1.
Its null space is spanned by the constant eigenvector U0 = (1 … 1)⊤.
The eigenvectors of L form an orthonormal basis, and U0⊤Ui = 0 for all
i ∈ {1, …, N − 1}. Therefore we obtain the following property: Σk Uik = 0. It is
worthwhile to notice that L = I − W̃, where W̃ = D^{−1/2} W D^{−1/2} is the
normalized adjacency matrix; the matrices L and W̃ share the same eigenvectors.
Finally, let Lt = UΛU⊤ be the truncated eigendecomposition, in which the null
eigenvalue and the constant eigenvector were omitted.
The projections of the graph onto the eigenvectors corresponding to the smallest
non-null eigenvalues of the Laplacian are displayed on the left side of Figure 2. The
right side of this figure shows the densities of these projections, considered as a
continuous equivalent of histograms. These densities were estimated using the
MCLUST software1 [12].
1
http://www.stat.washington.edu/mclust/
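The embedding described above can be sketched in a few lines of numpy; the graph, the embedding dimension and the function name are illustrative choices, and `eigh` is used because the normalized Laplacian is symmetric, so its eigenvalues come back in ascending order.

```python
# Sketch of the normalized Laplacian embedding L = D^{-1/2}(D - W)D^{-1/2}:
# each node is mapped to the components of the k eigenvectors associated with
# the smallest non-null eigenvalues. W is a toy adjacency matrix.
import numpy as np

def laplacian_embedding(W, k):
    d = W.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(d))
    L = Dinv @ (np.diag(d) - W) @ Dinv
    vals, vecs = np.linalg.eigh(L)   # symmetric => eigenvalues ascending
    return vecs[:, 1:k + 1]          # skip the null eigenvalue's eigenvector

# A 4-cycle graph embedded in 2 dimensions: one 2-D point per node.
W = np.array([[0, 1, 0, 1],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [1, 0, 1, 0]], dtype=float)
X = laplacian_embedding(W, 2)
print(X.shape)  # (4, 2)
```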
Fig. 2. A mesh with 7,063 vertices and approximately six edges per vertex, shown pro-
jected onto the four Laplacian eigenvectors corresponding to the four smallest non-null
eigenvalues. The curves on the right correspond to histograms of these graph projections.
The first of these vectors, the Fiedler vector, is supposed to split the graph along a cut,
but in this case such a cut is difficult to interpret. Other vectors, such as the second and
fourth ones, are very well localized, which makes them good candidates for clustering
and for matching. These histograms also reveal that not all of these eigenvectors are well
localized, which suggests that some of the eigenvectors shown here are not well suited
for spectral clustering.
Q∗ = Ux S Uy⊤ , (4)
where S = Diag[s1, …, sN−1], si ∈ {+1, −1}, accounts for the sign ambiguity in the
eigendecomposition, and where the domain of the objective function (3) has been ex-
tended to the group of orthogonal matrices. The entries of Q∗ are Q∗ij = xi⊤(s • yj),
where a • b is the Hadamard product of two vectors. Since both Ux and Uy are
orthonormal matrices, all entries Q∗ij of Q∗ vary between −1 and 1. Therefore, Q∗ can
be interpreted as a cosine node-similarity matrix. Several heuristics have been proposed
in the past to resolve the sign ambiguity and hence to recover node-to-node assignments
uniquely. In [1] the entries of Ux and Uy are replaced by their absolute values. The
recovery of P∗ from Q∗, i.e., exact matching, becomes a bipartite maximum weighted
matching problem that is solved with the Hungarian algorithm.
In this paper we propose to perform the matching in a reduced space; let K <
min(N, M) be the dimension of this space. We start with a solution provided by
eigenvalue ordering followed by dimensionality reduction. This provides two sets of K
ordered eigenvectors. However, ordering based on eigenvalues is not reliable, simply
because there may be geometric multiplicities that give rise, in practice, to numerical
instabilities. To overcome this problem we seek a new eigenvector permutation, which
we denote by a K × K matrix P. Thus, equation (4) can be rewritten as:
Q = Ux S P Uy⊤ , (5)
This is an instance of the rigid point registration problem that can be solved by treat-
ing the assignments αij as missing variables in the framework of the expectation-
maximization algorithm. A detailed solution is described in [13]. If matrices S and
P are not correctly estimated, matrix R belongs to the orthogonal group, i.e., rotations
and reflections. However, if S and P are correctly estimated, R belongs to the special
orthogonal group, i.e., it is a rotation, which means that the two sets of points can be
matched via a Euclidean transformation. The estimation of the latter is much more
tractable than the more general case of orthogonal transformations.
Hence, the inexact graph matching problem at hand can be solved with the fol-
lowing three steps: (i) estimate matrices S and P using properties associated with the
Laplacian eigenvectors, (ii) establish a match between the two sets of embedded nodes
(K-D points) based on a nearest-neighbor strategy, and (iii) achieve point registration
probabilistically by jointly estimating point-to-point assignments and a rotation matrix
between the two sets of points.
eigenfunctions just defined, namely h([Uxi]) and h([Uyj]), where the notation h([U])
denotes the histogram of the set of values returned by the eigenfunction U. The
first observation is that these histograms are invariant to node permutation, i.e., invari-
ant to the order in which the components of the eigenvectors are considered. Therefore,
the histogram of an eigenfunction can be viewed as an invariant signature of an eigen-
vector. The second important observation is that h([−U]) = h(B − [U]), where B is the
total number of bins used to build the histograms; hence, the histograms can be used
to detect sign flips. The third important observation is that the shape of the histogram
is not too sensitive to the number of nodes in the graph, so it is possible to
compare histograms arising from graphs with different cardinalities.
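The sign-flip property can be checked numerically: with a symmetric binning range, the histogram of −U is the mirror image of the histogram of U. The sketch below uses an illustrative bin count and random data, not values from the paper.

```python
# Histogram signature of an eigenvector and its mirror property under sign
# flip: h([-U]) equals h([U]) read backwards (bin B-k instead of bin k).
import numpy as np

def signature(u, bins=16):
    # Symmetric range so that negating u exactly mirrors the bin layout.
    h, _ = np.histogram(u, bins=bins, range=(-1.0, 1.0))
    return h

u = np.random.default_rng(0).uniform(-1, 1, 500)
h_pos = signature(u)
h_neg = signature(-u)
print(np.array_equal(h_neg, h_pos[::-1]))  # True
```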
The problem of estimating the matrices S and P can therefore be replaced by the
problem of finding a set of assignments {Uxi ⇔ ±Uyj, 1 ≤ i, j ≤ K} based on the
comparison of their histograms. This is an instance of the bipartite maximum match-
ing problem already mentioned, with complexity O(K³). Since the eigenvectors are
defined up to a sign (modeled by S), we must associate two histograms with each
eigenfunction. Let C(hi, hj) be a measure of similarity between two histograms. By
computing the similarities between all pairs of histograms we can build a K × K ma-
trix A whose entries are defined by:
as well as another matrix whose entries contain the signs of the Uyj that are eventually
retained. The Hungarian algorithm finds an optimal permutation matrix P as well as a
sign matrix S.
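The alignment step can be sketched as follows: for each eigenvector pair, keep the better of the two sign hypotheses (plain vs. mirrored histogram), then run the Hungarian algorithm on the resulting K × K similarity matrix to recover both the permutation and the signs. The similarity measure (negative L1 distance between histograms) and all names are illustrative choices, not necessarily those of the paper.

```python
# Align two sets of eigenvector histograms: solve for the permutation and the
# sign of each matched eigenvector with one bipartite assignment.
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_eigenvectors(hist_x, hist_y):
    """hist_x, hist_y: (K, B) arrays of eigenvector histograms."""
    K = hist_x.shape[0]
    sim = np.empty((K, K))
    sign = np.empty((K, K))
    for i in range(K):
        for j in range(K):
            s_pos = -np.abs(hist_x[i] - hist_y[j]).sum()
            s_neg = -np.abs(hist_x[i] - hist_y[j][::-1]).sum()  # flipped sign
            sim[i, j] = max(s_pos, s_neg)
            sign[i, j] = 1.0 if s_pos >= s_neg else -1.0
    rows, cols = linear_sum_assignment(-sim)  # maximize total similarity
    return cols, sign[rows, cols]

# Toy check: hy is hx permuted, with the second vector's sign flipped.
hx = np.array([[5, 3, 1, 0], [0, 1, 2, 7]], dtype=float)
hy = np.array([[7, 2, 1, 0], [5, 3, 1, 0]], dtype=float)
perm, signs = align_eigenvectors(hx, hy)
print(perm, signs)  # [1 0] [ 1. -1.]
```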
5 Results
As a first example, consider a motion sequence of an articulated shape and its reg-
istration, shown in Figure 3. The articulated shape is described by a mesh/graph with
7,063 vertices, and the degree of each vertex is approximately equal to six. The graphs
were matched using the method described above, i.e., alignment of eigenvectors based
on their histograms and naive point registration based on a nearest-neighbor classifier.
On average, the algorithm found 4,000 one-to-one matches and 3,000 many-to-many
matches. Notice that in this case the two graphs are isomorphic.
Figure 4 shows two sets of eigenvector histograms (top and bottom) corresponding to
the first pair of registered shapes of Figure 3. The histograms shown on each row corre-
spond to the five eigenvectors associated with the smallest non-null eigenvalues, shown
in increasing order from left to right. There is a striking similarity between these histograms
in spite of large discrepancies between the two shapes' poses. This clearly suggests that
these histograms are good candidates for both exact and inexact graph matching.
Figure 5 shows three more examples of inexact graph matching corresponding to
different shape pairs: dog-horse, dancer-gorilla, and horse-seahorse. Each mesh in this
figure is described by a sparse graph and there are notable differences in the number of
nodes. For example, the horse graph has 3,400 nodes, the gorilla graph has 2,046 nodes,
the dancer graph has 34,000 nodes, and the seahorse graph has 2,190 nodes. The top
Fig. 3. The result of applying the graph matching algorithm to a dance sequence
Fig. 4. Two sets of histograms associated with two sets of Laplacian eigenvectors. One may notice
the striking similarity between these histograms that correspond to two isomorphic graphs.
row of figure 5 shows the result of many-to-many inexact matching obtained with the
method described in this paper. The bottom row shows the result of one-to-one rigid
point registration obtained with a variant of the EM algorithm [13] initialized from the
matches shown on the top row.
6 Conclusion
In this paper, we proposed a framework for inexact matching of large and sparse graphs.
The method is completely unsupervised: it does not need any prior set of matches be-
tween the two graphs. The main difficulty of the problem is twofold: (1) to extend
the known spectral graph matching methods so that they can deal with graphs of
very large size, i.e., on the order of 10,000 nodes, and (2) to carry out the match-
ing in a robust manner, i.e., in the presence of large discrepancies between the two
graphs.
We showed that it is possible to relax the graph isomorphism problem so that in-
exact graph matching can be carried out when the dimension of the embedding space
is much smaller than the number of vertices in the two graphs. We also showed that the
alignment of the eigenbases associated with the two embedded shapes can be robustly
estimated using eigenvector densities instead of eigenvalue ordering. The method starts
with an initial solution based on ordering the eigenvalues and then finds the optimal
subset of eigenvectors to align based on comparing their density distributions.
This selects both a one-to-one eigenvector alignment and the dimension of the embed-
ding. We also pointed out localization as an important property of eigenvectors and
presented initial results to support our observations.
In the future, we plan to investigate more thoroughly the link between graph matching
and graph clustering. We believe the localization property to be a promising direction to
move forward.
References
1. Umeyama, S.: An eigen decomposition approach to weighted graph matching problems.
IEEE PAMI 10, 695–703 (1988)
2. Scott, G., Longuet-Higgins, C.: An algorithm for associating the features of two images.
Proceedings Biological Sciences 244, 21–26 (1991)
3. Shapiro, L., Brady, J.: Feature-based correspondence: an eigenvector approach. Image and
Vision Computing 10, 283–288 (1992)
4. Carcassoni, M., Hancock, E.R.: Correspondence matching with modal clusters. IEEE
PAMI 25, 1609–1615 (2003)
5. Carcassoni, M., Hancock, E.R.: Spectral correspondence for point pattern matching. Pattern
Recognition 36, 193–204 (2003)
6. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data represen-
tation. Neural Computation 15, 1373–1396 (2003)
7. Caelli, T., Kosinov, S.: An eigenspace projection clustering method for inexact graph match-
ing. IEEE PAMI 26, 515–519 (2004)
8. Ng, A., Jordan, M., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: NIPS
(2002)
9. Chung, F.: Spectral Graph Theory. American Mathematical Society, Providence (1997)
10. Zhang, H., van Kaick, O., Dyer, R.: Spectral methods for mesh processing and analysis. In:
Eurographics Symposium on Geometry Processing (2007)
11. Biyikoglu, T., Leydold, J., Stadler, P.F.: Laplacian Eigenvectors of Graphs. Springer, Heidel-
berg (2007)
12. Fraley, C., Raftery, A.: MCLUST version 3 for R: Normal mixture modeling and
model-based clustering. Technical Report 504, Department of Statistics, University of
Washington (2006)
13. Mateus, D., Horaud, R., Knossow, D., Cuzzolin, F., Boyer, E.: Articulated shape
matching using Laplacian eigenfunctions and unsupervised point registration. In:
CVPR (2008)
Graph Matching Based on Node Signatures
S. Jouili and S. Tabbone
1 Introduction
In image processing applications, it is often necessary to match different images
of the same or similar objects based on structural descriptions constructed
from these images. If the structural descriptions of objects are represented by
graphs, different images can be matched by performing some kind of graph
matching. Graph matching is the process of finding a correspondence between
the nodes and edges of two graphs that satisfies some constraints, ensuring that
similar substructures in one graph are mapped to similar substructures in the
other. Many approaches have been proposed to solve the graph matching prob-
lem [1,5,15]. Matching by minimizing the edit distance [4,11,13,14] is attractive,
since it gauges the distance between graphs by the least cost of the edit
operations needed to make two graphs isomorphic. Moreover, the graph edit dis-
tance is tolerant to noise and distortion. Its main drawback is its computational
complexity, which is exponential in the number of nodes of the involved graphs.
To reduce the complexity, Apostolos [14] gives a fast edit distance for matching
specific graphs by using sorted graph histograms. Similarly, Lopresti [12] gives
an equivalence test procedure that allows the similarity between graphs to be
quantified. Other methods, based on spectral approaches [2,3,16], give an elegant
matrix representation for graphs that
This work is partially supported by the French National Research Agency project
NAVIDOMASS referenced under ANR-06-MCDA-012 and Lorraine region.
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 154–163, 2009.
c Springer-Verlag Berlin Heidelberg 2009
In this section we describe our algorithm, first for the graph matching problem
(exact and inexact), and then for computing a metric distance between graphs.
We define a node signature in the context of weighted and unweighted graphs.
For weighted graphs, the signature is defined as the degree of the node together
with the weights of all its incident edges. Given a graph G = (X, E), the node
signature is formulated as follows:
Vs(x) = {d(x), w0, w1, w2, ...}
where x ∈ X, d(x) is the degree of x, and the wi are the weights of the edges
incident to x. For unweighted graphs, the weight of every incident edge is fixed to
1. The set of node signatures (vectors) describing the nodes of a graph is a collection
of local descriptions, so local changes in the graph will modify only a subset of
vectors while leaving the rest unchanged. Moreover, the computational cost of
constructing these signatures is low, since they are computed straightforwardly
from the adjacency matrix. Based on these node signatures, a cost matrix C is
defined by:
Cgi,gj(i, j) = L1(γ(i), γ(j)) (1)
where i and j are nodes of gi and gj respectively, and L1(·,·) is the Manhattan
distance. γ(i) is the vector Vs(i) with the weights sorted in decreasing order.
Finally, since the graphs may have different sizes, the γ vectors are padded with
zeros so that all vectors have the same size.
The cost matrix defines a vertex-to-vertex assignment for a pair of graphs.
This task can be seen as an instance of the assignment problem and can be
solved by the Hungarian method, running in O(n³) time [19], where n is the
size of the bigger graph. The permutation matrix P, obtained by applying the
Hungarian method to the cost matrix, defines the optimum matching between
the two given graphs. Based on the permutation matrix P, we define a matching
function M as follows:
M(xi) = yj if Pi,j = 1, and M(xi) = 0 otherwise. (2)
where xi and yj are the nodes, respectively, in the first and the second graph.
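The whole pipeline just described fits in a few lines; the sketch below is an illustrative implementation, assuming graphs stored as adjacency dicts and using scipy's Hungarian solver in place of a hand-rolled one.

```python
# Node-signature matching: signature = [degree, weights sorted decreasingly],
# zero-padded to a common length; cost matrix = pairwise L1 distances;
# assignment solved by the Hungarian method. Representation illustrative.
import numpy as np
from scipy.optimize import linear_sum_assignment

def signatures(g):
    return {v: [len(nbrs)] + sorted(nbrs.values(), reverse=True)
            for v, nbrs in g.items()}

def match(g1, g2):
    s1, s2 = signatures(g1), signatures(g2)
    n1, n2 = list(s1), list(s2)
    size = max(max(len(s) for s in s1.values()),
               max(len(s) for s in s2.values()))
    pad = lambda s: s + [0] * (size - len(s))  # zero-pad to a common length
    C = np.array([[sum(abs(a - b) for a, b in zip(pad(s1[x]), pad(s2[y])))
                   for y in n2] for x in n1], dtype=float)
    rows, cols = linear_sum_assignment(C)      # minimum-cost assignment
    return {n1[r]: n2[c] for r, c in zip(rows, cols)}

# Two isomorphic weighted paths: the matching recovers the isomorphism.
g1 = {'a': {'b': 2.0}, 'b': {'a': 2.0, 'c': 5.0}, 'c': {'b': 5.0}}
g2 = {'x': {'y': 2.0}, 'y': {'x': 2.0, 'z': 5.0}, 'z': {'y': 5.0}}
print(match(g1, g2))  # {'a': 'x', 'b': 'y', 'c': 'z'}
```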
3 Experiments
To show the utility of our method in pattern recognition applications and its
robustness to structural changes, we conducted several experiments.
[Figure 1: two 30 × 30 graph distance matrices displayed as images with a common color scale]
Fig. 1. Graph distance matrices. (a) Results from the Umeyama approach; (b) results
from our approach.
158 S. Jouili and S. Tabbone
[Figure 2: two 2-D MDS scatter plots of the 30 graphs, one per distance matrix]
Fig. 2. MDS for each distance matrix. (a) MDS of the Umeyama approach; (b) MDS of
our graph distance.
mixed together. In Fig. 2(b), two classes of images can be clustered clearly and
are distributed more compactly.
The MST method is a well-known clustering method from graph theory. In
this approach, a minimum spanning tree of the complete graph is generated,
whose nodes are images and whose edge weights are the distance measures
between images (graphs in our experiments). By cutting all edges with weights
greater than a specified threshold, subtrees are created, and each subtree
represents a cluster. We use the distance matrices obtained previously to
implement the MST clustering, and for each method a threshold that optimizes
its results is selected (see Table 1). The MST clustering is evaluated by the
Rand index [27] and the Dunn index [28]. The Rand index measures how closely
the clusters created by the clustering algorithm match the ground truth. The
Dunn index is a measure of the compactness and separation of the clusters;
unlike the Rand index, the Dunn index is not normalized. When the distance
measure is the Umeyama distance, many images of the second class are clustered
into the first class, and three classes are detected by the MST clustering. When
our method is used, two classes are detected and all images are clustered
correctly. These results coincide with the MDS results. In addition, the Dunn
and Rand indices show that the clustering using our method obtains a better
separation of the graphs into compact clusters. The time consumed by our
method is 39.14% less than that of the Umeyama approach (see Table 1).
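The MST-with-threshold clustering described above can be sketched as follows. Since Kruskal's algorithm adds edges in increasing weight order, building the full MST and then cutting every edge heavier than the threshold yields the same components as simply never joining across such edges (names are assumptions):

```python
def mst_clusters(n, dist, threshold):
    """Kruskal over the complete graph on n items, where dist(i, j)
    gives the distance between items i and j; stopping at `threshold`
    reproduces the cut-MST subtrees, each of which is one cluster."""
    parent = list(range(n))

    def find(x):  # union-find with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    edges = sorted((dist(i, j), i, j)
                   for i in range(n) for j in range(i + 1, n))
    for w, i, j in edges:
        if w > threshold:
            break  # every remaining edge is at least this heavy
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())

# toy example: four 1-D "images" at positions 0, 1, 10, 11
vals = [0.0, 1.0, 10.0, 11.0]
clusters = mst_clusters(4, lambda i, j: abs(vals[i] - vals[j]), threshold=2.0)
```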
Secondly, we have compared our method with the GED from spectral seriation
[2], the graph histograms [14] and the graph probing [12]. The experiments
Table 1. MST clustering with our graph distance and Umeyama’s approach
Fig. 3. Graph distance matrices. (a) results from our method; (b) results from GED
from spectral seriation; (c) results from the graph histogram method; (d) results from
the graph probing method.
consist of applying the previous tests (MDS and MST) to a database derived from
COIL-100 [20], which contains different views of 3D objects. We used three
classes chosen randomly, with ten images per class. Two consecutive images in
the same class represent the same object rotated by 5°. The images are converted
into graphs by feature point extraction using Harris interest points [23] and
Delaunay triangulation [24]. Finally, in order to get weighted graphs, each edge
is weighted by the Euclidean distance between the two points it connects. The
size of the graphs ranges from 5 to 128 nodes.
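A sketch of the final weighting step, assuming the interest points and the triangulation edge list have already been computed elsewhere (names are assumptions):

```python
import math

def weight_edges(points, edges):
    """Attach Euclidean weights to triangulation edges.

    `points` maps node ids to (x, y) positions of the interest points;
    `edges` is the edge list produced by the Delaunay triangulation."""
    weighted = {}
    for i, j in edges:
        (xi, yi), (xj, yj) = points[i], points[j]
        weighted[(i, j)] = math.hypot(xi - xj, yi - yj)
    return weighted
```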
The distance matrix in Fig. 3(a) clearly shows three blocks along the diagonal;
thus the within-class and between-class distances are not close to each other.
In the other matrices (Fig. 3b-d), by contrast, the intensity of the first two
blocks along the diagonal is close to that of the neighboring blocks. In
addition, the MDS (see Fig. 4) and the MST clustering results (see Table 2)
show that with our method three classes are clearly separated and the Rand
index reaches a value of 1. However, the evaluation of the separability and
compactness of the created clusters shows that the graph histogram method [14]
has the best Dunn index, but with only two detected classes (instead of three),
while graph probing has the best execution time.
From Table 2, we can note that, contrary to our method, the first two classes
are merged by the three other methods (spectral seriation, graph histograms and
graph probing). Each of these approaches uses a global description to represent
graphs: the probing [12] and graph histogram [14] methods represent each graph
with only one vector, and the spectral seriation method [2] uses a string
representation for graphs. Therefore, these global descriptions cannot capture
differences when the graphs share similar global characteristics but differ
locally.
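The Rand index used in these comparisons counts, over all pairs of items, how often two clusterings agree on whether the pair is grouped together; a minimal sketch:

```python
def rand_index(labels_a, labels_b):
    """Rand index between two flat clusterings of the same items:
    the fraction of item pairs on which the clusterings agree
    (both grouped together, or both separated)."""
    n = len(labels_a)
    agree = total = 0
    for i in range(n):
        for j in range(i + 1, n):
            same_a = labels_a[i] == labels_a[j]
            same_b = labels_b[i] == labels_b[j]
            agree += same_a == same_b
            total += 1
    return agree / total
```

Note the index is invariant to cluster relabelling: `[0, 0, 1, 1]` and `[1, 1, 0, 0]` score 1.0 against each other.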
Fig. 4. MDS. (a) results from our method; (b) results from GED from spectral seriation;
(c) results from the graph histogram method; (d) results from the graph probing method.
Table 2. MST clustering into three classes from COIL-100: images 1-10 belong to the
first class, images 11-20 to the second class and images 21-30 to the third class

Histograms method: cluster 1 = {14, 18, 13, 17, 20, 11, 15, 16, 19, 1, 4, 7, 8, 10,
  9, 5, 2, 3, 6, 12}; cluster 2 = {21, 27, 22, 23, 25, 24, 28, 26, 30, 29}.
  Time: 25.60; Rand index: 0.77; Dunn index: 4.54.
Graph probing: cluster 1 = {14, 18, 13, 20, 19, 16, 17, 11, 15, 12, 2, 4, 7, 3,
  6, 10, 9, 8, 1, 5}; cluster 2 = {21, 29, 22, 25, 23, 24, 27, 26, 28, 30}.
  Time: 19.46; Rand index: 0.77; Dunn index: 1.78.
Our method: cluster 1 = {3, 6, 2, 1, 9, 4, 7, 8, 10, 5}; cluster 2 = {11, 19, 14,
  17, 18, 20, 16, 12, 13, 15}; cluster 3 = {21, 22, 23, 25, 24, 28, 26, 30, 27, 29}.
  Time: 329.02; Rand index: 1; Dunn index: 1.54.
Graph retrieval application. Firstly, the retrieval performance on the face
expression database of Carnegie Mellon University [29] is evaluated. Secondly,
the effectiveness of the proposed node signature is evaluated by performing a
graph retrieval application with the GREC database [21,22]. In both
experiments, given a query image, the system retrieves the ten most similar
images from the database. Retrieval performance is measured with a
precision-recall curve (precision rate plotted against recall rate).
Figure 5 gives the retrieval results of our method, compared with the three
methods used previously, on the face database, which contains 13 subjects,
each with 75 images showing different expressions. The graphs are constructed
in the same manner as in the previous experiment (graph clustering). The size
of the graphs ranges from 4 to 17 nodes. Even though our method provides better
results, all the results in Figure 5 show low performance. We can conclude that
the way the graphs are constructed is not appropriate for this kind of data.
Fig. 5. (Precision-recall curves on the face database for Our Method, the Histogram
Method, Probing and Spectral.)
Table 3 shows the accuracy rate of retrieval on the GREC database using our
graph distance, as a function of the node signature. The aim of this experiment
is to show the behavior of our metric when the signature of each node is
defined by only one of the two features: either the degree of the node or the
weights of the incident edges. From this experiment, we can remark that
combining the degree and the weights improves the accuracy rate. Moreover, the
incident-edge-weight feature seems to affect the behavior of our metric more
strongly, because this feature characterizes the nodes better than the node
degree alone.
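The three signature variants compared in Table 3 can be sketched with a hypothetical helper (the exact signature layout is an assumption; only the choice of features follows the text):

```python
def node_signature(degree, incident_weights, use_degree=True, use_weights=True):
    """Node signature variants: degree only, incident-edge weights only,
    or both (the combination that performs best in Table 3). The weights
    are treated as an unordered set, stored decreasingly sorted."""
    sig = []
    if use_degree:
        sig.append(degree)
    if use_weights:
        sig.extend(sorted(incident_weights, reverse=True))
    return tuple(sig)
```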
Sensitivity analysis. The aim of this section is to investigate the sensitivity
of our matching method to structural differences between graphs. Here, we have
taken three classes from the COIL-100 database, each containing 10 images.
Structural errors are simulated by randomly deleting nodes and edges in the
graph. The query graphs are distorted versions of the original graph
representing the 5th image of each class.
Figure 6 shows the retrieval accuracy as a function of the percentage of edge
deletion (Fig. 6-a) and node deletion (Fig. 6-b). The retrieval accuracy
degrades when edge deletion reaches around 22% (Fig. 6-a) and node deletion
around 20% (Fig. 6-b). The main point to note from these plots is that our
graph matching method is more robust to edge deletion, because deleting an edge
does not imply an important structural change in the graph: it changes only
some elements in the node signatures of the two nodes incident to the deleted
edge. In fact, the node signature describes a node from different localizations
in the graph, i.e. all information about the edges connected to the node is
captured. Therefore, the performance of the retrieval task is more sensitive to
node deletion than to edge deletion.
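The structural-noise simulation can be sketched with a hypothetical helper (not the authors' code): a fraction of nodes is deleted together with their incident edges, then a fraction of the surviving edges is deleted.

```python
import random

def distort(nodes, edges, frac_nodes=0.0, frac_edges=0.0, seed=0):
    """Simulate structural errors by randomly deleting `frac_nodes` of
    the nodes (with their incident edges) and then `frac_edges` of the
    remaining edges; `seed` makes the distortion reproducible."""
    rng = random.Random(seed)
    keep_nodes = set(nodes)
    for n in rng.sample(sorted(nodes), int(frac_nodes * len(nodes))):
        keep_nodes.discard(n)
    surviving = [e for e in edges if e[0] in keep_nodes and e[1] in keep_nodes]
    n_drop = int(frac_edges * len(surviving))
    dropped = set(map(tuple, rng.sample(surviving, n_drop)))
    return keep_nodes, [e for e in surviving if tuple(e) not in dropped]

nodes = {0, 1, 2, 3, 4}
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
kept_nodes, kept_edges = distort(nodes, edges, frac_nodes=0.2, seed=1)
```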
(Plots of the number of retrieved images and of retrieval accuracy against the
percentage of edges deleted (a) and of nodes deleted (b), with one curve per
query graph: the 5th image of each of the three classes.)
Fig. 6. Effect of noise on similarity queries. (a) Edge deletion. (b) Node deletion.
4 Conclusion
In this work, we propose a new graph matching technique based on node
signatures describing local information in graphs. The cost matrix between two
graphs is based on these signatures, and the optimum matching is computed using
the Hungarian algorithm. Based on this matching, we have also proposed a metric
graph distance. The experimental results show that nodes are well
differentiated by their valence and the weights of their incident edges
(considered as an unordered set); therefore, our method provides good results
for clustering and retrieving images represented by graphs.
References
1. Myers, R., Wilson, R.C., Hancock, E.R.: Bayesian Graph Edit Distance. IEEE
Trans. Pattern Anal. Mach. Intell. 22(6), 628–635 (2000)
2. Robles-Kelly, A., Hancock, E.R.: Graph edit distance from spectral seriation. IEEE
Trans. on Pattern Analysis and Machine Intelligence 27(3), 365–378 (2005)
3. Umeyama, S.: An eigendecomposition approach to weighted graph matching prob-
lems. IEEE Trans. on Pattern Analysis and Machine Intelligence 10(5), 695–703
(1988)
4. Bunke, H., Shearer, K.: A graph distance metric based on the maximal common
subgraph. Pattern Recognition Letters 19, 255–259 (1998)
5. Bunke, H., Munger, A., Jiang, X.: Combinatorial Search vs. Genetic Algorithms: A
Case Study Based on the Generalized Median Graph Problem. Pattern Recognition
Letters 20(11-13), 1271–1279 (1999)
6. Riesen, K., Bunke, H.: Approximate graph edit distance computation
by means of bipartite graph matching. Image Vis. Comput. (2008),
doi:10.1016/j.imavis.2008.04.004
7. Gold, S., Rangarajan, A.: A graduated assignment algorithm for graph matching.
IEEE Trans. on Pattern Analysis and Machine Intelligence 18(4), 377–388 (1996)
8. Shokoufandeh, A., Dickinson, S.: Applications of Bipartite Matching to Problems
in Object Recognition. In: Proceedings, ICCV Workshop on Graph Algorithms and
Computer Vision, September 21 (1999)
9. Shokoufandeh, A., Dickinson, S.: A unified framework for indexing and matching
hierarchical shape structures. In: Arcelli, C., Cordella, L.P., Sanniti di Baja, G.
(eds.) IWVF 2001. LNCS, vol. 2059, pp. 67–84. Springer, Heidelberg (2001)
10. Eshera, M.A., Fu, K.S.: A graph distance measure for image analysis. IEEE Trans.
Syst. Man Cybern. 14, 398–408 (1984)
11. Sorlin, S., Solnon, C., Jolion, J.M.: A Generic Multivalent Graph Distance Measure
Based on Multivalent Matchings. Applied Graph Theory in Computer Vision and
Pattern Recognition 52, 151–181 (2007)
12. Lopresti, D., Wilfong, G.: A fast technique for comparing graph representations
with applications to performance evaluation. International Journal on Document
Analysis and Recognition 6(4), 219–229 (2004)
13. Sanfeliu, A., Fu, K.S.: A Distance Measure between Attributed Relational Graphs
for Pattern Recognition. IEEE Trans. Systems, Man, and Cybernetics 13, 353–362
(1983)
Graph Matching Based on Node Signatures 163
14. Apostolos, N.P., Yannis, M.: Structure-Based Similarity Search with Graph His-
tograms. In: Proc. of the 10th International Workshop on Database & Expert
Systems Applications (1999)
15. Gori, M., Maggini, M., Sarti, L.: Exact and Approximate graph matching using
random walks. IEEE Trans. on Pattern Analysis and Machine Intelligence 27(7),
1100–1111 (2005)
16. Chung, F.R.K.: Spectral Graph Theory. AMS Publications (1997)
17. Xu, L., King, I.: A PCA approach for fast retrieval of structural patterns in
attributed graphs. IEEE Trans. Systems, Man, and Cybernetics 31(5), 812–817
(2001)
18. Luo, B., Hancock, E.R.: Structural Graph Matching Using the EM Algorithm and
Singular Value Decomposition. IEEE Trans. on Pattern Analysis and Machine
Intelligence 23(10), 1120–1136 (2001)
19. Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research
Logistic Quarterly 2, 83–97 (1955)
20. Nene, S.A., Nayar, S.K., Murase, H.: Columbia Object Image Library (COIL-100),
technical report, Columbia Univ. (1996)
21. Riesen, K., Bunke, H.: IAM Graph Database Repository for Graph Based Pattern
Recognition and Machine Learning. In: IAPR Workshop SSPR & SPR, pp. 287–297
(2008)
22. Dosch, P., Valveny, E.: Report on the Second Symbol Recognition Contest. In:
Proc. 6th IAPR Workshop on Graphics Recognition, pp. 381–397 (2005)
23. Harris, C., Stephens, M.: A combined corner and edge detector. In: Proc. 4th
Alvey Vision Conf., pp. 189–192 (1988)
24. Fortune, S.: Voronoi diagrams and Delaunay triangulations. In: Computing in Eu-
clidean Geometry, pp. 193–233 (1992)
25. Zahn, C.T.: Graph-theoretical methods for detecting and describing Gestalt clus-
ters. IEEE Trans. on Computers C-20, 68–86 (1971)
26. Hofmann, T., Buhmann, J.M.: Multidimensional Scaling and Data Clustering. In:
Advances in Neural Information Processing Systems (NIPS 7), pp. 459–466. Mor-
gan Kaufmann Publishers, San Francisco (1995)
27. Rand, W.M.: Objective criteria for the evaluation of clustering methods. Journal
of the American Statistical Association 66, 846–850 (1971)
28. Dunn, J.: Well separated clusters and optimal fuzzy partitions. Journal of Cyber-
netics 4(1), 95–104 (1974)
29. Carnegie Mellon University face expression database,
http://amp.ece.cmu.edu/downloads.htm
A Structural and Semantic Probabilistic Model for
Matching and Representing a Set of Graphs
Abstract. This article presents a structural and probabilistic framework for
representing a class of attributed graphs with only one structure. The aim of
this article is to define a new model, called Structurally-Defined Random
Graphs. This structure keeps statistical and structural information together to
increase the capacity of the model to discern between attributed graphs within
or outside the class. Moreover, we define the match probability of an
attributed graph with respect to our model, which can be used as a
dissimilarity measure. Our model has the advantage that it does not incorporate
application-dependent parameters such as edit costs. The experimental
validation on a TC-15 database shows that our model obtains higher recognition
results than several structural matching algorithms when there is moderate
variability among the class elements. Moreover, fewer comparisons are needed in
our model.
1 Introduction
Since the 1980s, graphs have grown in importance in pattern recognition, one of
their most powerful characteristics being the abstraction they achieve. Hence,
the same structure is able to represent a wide variety of problems, from image
understanding to interaction networks. Consequently, algorithms based on graph
models can be used in a very large problem space. An interesting review of
graph representation models, graph matching algorithms and their applications
is given in [7].
One of the main problems confronting practical applications of structural
pattern recognition is that sometimes more than one model graph represents a
class, which means that conventional error-tolerant graph matching algorithms
must be applied to each model-input pair sequentially. As a consequence, the
total computational cost is linearly dependent on the size of the database of
model graphs and exponential in the number of nodes of the graphs to be
compared. For applications dealing with large databases, this may be
prohibitive. To alleviate this problem, some attempts have been made to reduce
the computational time of matching the unknown input patterns to the whole set
of models from
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 164–173, 2009.
© Springer-Verlag Berlin Heidelberg 2009
A Structural and Semantic Probabilistic Model 165
the database. Assuming that the graphs that represent a cluster or a class are
not completely dissimilar, a single structural and probabilistic model can be
defined from these graphs to represent the cluster, and thus only one
comparison is needed for each cluster [3,2,4].
One of the earliest approaches was the model called First-Order Random Graphs
(FORGs) [3], where a random variable is assigned to each node and edge to
represent its possible values. In the Function-Described Graph approach (FDGs)
[2], some logical functions between nodes and arcs were introduced to alleviate
some problems, thus increasing the capacity to represent the set with a low
increase in storage space. Finally, Second-Order Random Graphs (SORGs) [4] were
presented. Basically, they converted the logical functions of the FDGs into
two-dimensional random variables. The representative capacity was increased,
but so was the storage space.
This paper presents a new model called Structurally-Defined Random Graph
(SDRG) with low storage space but a higher capacity to discern between elements
inside and outside the class. This is achieved by reducing the complexity of
the probability density function used to describe each random variable, and by
defining the probability of a match such that the probability is 1 when a
perfect match is performed (in the other models [3,2,4] this does not hold).
Section 2 introduces the main definitions of graphs and presents the new model.
Section 3 describes a probabilistic measure of dissimilarity between a graph and an
SDRG. Section 4 evaluates the model. Section 5 gives some conclusions and further
lines to explore.
Example 1
We give a case example of the representation of a set of graphs using our SDRG
model. Suppose we have a set of 5 AGs in which the attribute value of the nodes
is their two-dimensional position (x, y) and the attribute value of the arcs is
a logical value indicating their existence (Fig. 1). The attribute value of a
node is shown on the right-hand side of the node number. The existence of an
arc is represented by a straight line.
Suppose also that we are given a common labelling (Table 1a) between the nodes
of these AGs and a hypothetical structure composed of 4 nodes (L1, L2, L3, L4).
With this set and the common labelling, we define the SDRG shown in Fig. 2. R
is composed of a structure of 4 random nodes and 4 random arcs, and S is
composed of 4 AGs. On the right-hand side of each random node, we show the mean
of the random variable. Note that nodes v1 and v2 of all S elements share the
same R attribute. The existence probability of each node and arc is shown in
Table 1c and, finally, the labellings between the AGs in S and R are shown in
Table 1b.
Table 1a. Common labelling of the Fig. 1 examples (labels L1–L4 mapped to the
nodes of graphs G1–G5). Table 1b. R to A^i labelling (random nodes ω1–ω4 mapped
to the nodes of A1–A4). Table 1c. Existence probabilities of the Fig. 2 nodes
and arcs:

    P_ω1 = 5/5, P_ε1 = 5/5;   P_ω2 = 5/5, P_ε2 = 1/2;
    P_ω3 = 2/5, P_ε3 = 1/2;   P_ω4 = 1/5, P_ε4 = 1/1.
This expression is crucial in our model since, independently of the number of
graphs and their variability, the probability of a graph with respect to an
SDRG is obtained as the maximum value calculated over the graphs that compose S.
For the rest of this section, we use A to represent one of the structures in S,
i.e. a concrete Ai. Moreover, we consider that we have a set of structurally
correct labellings Γ that map nodes and arcs from G to nodes and arcs from A.
That is, f = (f_v, f_e) ∈ Γ, where f_v : Σ_G^v → Σ_A^v and f_e : Σ_G^e → Σ_A^e.
Given a node n and an arc e of G, we define a random node ω and a random arc ε
of R such that ω = l_v^A(f_v(n)) and ε = l_e^A(f_e(e)).
Given a specific graph A in S, the probability of G with respect to A and R is
the maximum value among all consistent labellings f. That is,

    P_{R,A}(G) = max_{f ∈ Γ} P_{R,A}(G | f).    (2)
    P_{R,A}(G | f) = k1 · Σ_{n ∈ Σ^v} P^{sem}_{R,A}(n | G, f_v) · P^{str}_{R,A}(n | G, f_v)
                   + k2 · Σ_{e ∈ Σ^e} P^{sem}_{R,A}(e | G, f_e) · P^{str}_{R,A}(e | G, f_e).    (3)
where the weighting terms k1 and k2 adjust the importance of nodes and arcs in
the final result (with k1 + k2 = 1). The probabilities P^{sem}_{R,A} and
P^{str}_{R,A} are the semantic and structural probabilities of the nodes or
arcs of G with respect to a random node or arc of R. The semantic probability,
which represents the attribute-value knowledge, is weighted by the structural
probability, which represents the frequency of appearance. Both probabilities
are defined in the following sections.
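Equation (3) can be sketched as follows, taking the per-node and per-arc pairs of semantic and structural probabilities as inputs (the names and the input layout are assumptions):

```python
def match_probability(node_terms, edge_terms, k1=0.5, k2=0.5):
    """Eq. (3): a weighted sum of semantic * structural products over the
    nodes and the arcs of G. node_terms and edge_terms are lists of
    (P_sem, P_str) pairs; k1 + k2 must equal 1. Because the structural
    probabilities of Eq. (4) sum to 1 within nodes and within arcs, a
    perfect match (all P_sem = 1) yields exactly 1."""
    assert abs(k1 + k2 - 1.0) < 1e-9
    p_nodes = sum(ps * pt for ps, pt in node_terms)
    p_edges = sum(ps * pt for ps, pt in edge_terms)
    return k1 * p_nodes + k2 * p_edges
```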
    P^{str}_{R,A}(n | G, f_v) = p_ω / Σ_{ω' ∈ Σ_ω} p_{ω'}    and
    P^{str}_{R,A}(e | G, f_e) = p_ε / Σ_{ε' ∈ Σ_ε} p_{ε'}.    (4)
where p_ω and p_ε are the existence probabilities of the random node ω and the
random arc ε. Moreover, the sets of random vertices ω' and arcs ε' are those
for which ω' = l_v^A(f_v(n')) and ε' = l_e^A(f_e(e')) for all the nodes n' and
arcs e' of the ex-
Example 2
Consider that we would like to compute the probability of a new data graph G
with respect to the SDRG obtained in Example 1. We only show how to obtain the
probability with respect to the graph A^4. Fig. 3 shows the graph G, and
Table 2 shows the labelling f^4.

Table 2. G to A^4 labelling

    G nodes    Labelling f^4
    n1         v1
    n2         v2
    n3         Φ
    n4         v3
Fig. 3. G to A4 labelling
To compute the structural probabilities, we have to obtain the values p_ωi from
Table 1c and consider that p_ωΦ = 1 (Definition 4). If we consider the mapping
f^4 and the mapping from A^4 to R shown in Table 1b, we get the following
structural probabilities:

    P^{str}_{R,A4}(n1 | G, f) = p_ω1 / (p_ω1 + p_ω2 + p_ωΦ + p_ω4) = 5/16,
    P^{str}_{R,A4}(n2 | G, f) = p_ω2 / (p_ω1 + p_ω2 + p_ωΦ + p_ω4) = 5/16,
    P^{str}_{R,A4}(n3 | G, f) = p_ωΦ / (p_ω1 + p_ω2 + p_ωΦ + p_ω4) = 5/16,
    P^{str}_{R,A4}(n4 | G, f) = p_ω4 / (p_ω1 + p_ω2 + p_ωΦ + p_ω4) = 1/16.
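The structural probabilities of Example 2 can be reproduced with exact arithmetic; the dictionary keys are illustrative names for ω1, ω2, ωΦ and ω4, with the existence probabilities taken from Table 1c and p = 1 for the null node Φ:

```python
from fractions import Fraction

def structural_prob(p_existence, omega):
    """Eq. (4) applied in Example 2: the structural probability of a
    node of G is the existence probability of its random node divided
    by the sum over all random nodes mapped to by G's nodes."""
    return p_existence[omega] / sum(p_existence.values())

p = {"w1": Fraction(1), "w2": Fraction(1),
     "wPhi": Fraction(1), "w4": Fraction(1, 5)}
```

The denominator is 1 + 1 + 1 + 1/5 = 16/5, which yields the 5/16 and 1/16 values above.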
The random variable is not restricted to any distribution function. A possible
solution is to define a discrete distribution and store the function as a
histogram [3,2,4]. This solution keeps all the knowledge of the training
examples but
170 A. Solé-Ribalta and F. Serratosa
needs a huge storage space. On the other hand, if we assume a Normal
distribution (denoted N in Equation 6), the model only needs to store μ and σ
for each node and arc. In this case, assuming that μ_ω, μ_ε and σ_ω, σ_ε are
the means and variances of the previously defined random nodes and arcs, the
semantic probability can be defined as follows,
    P^{sem}_{R,A}(n | G, f_v) = N(a, μ_ω, σ_ω) / N(μ_ω, μ_ω, σ_ω)    and
    P^{sem}_{R,A}(e | G, f_e) = N(b, μ_ε, σ_ε) / N(μ_ε, μ_ε, σ_ε).    (6)
Note that, in the case that G and A have exactly the same structure and the
attributes of G have the same values as the means of R, P_{R,A}(G|f) = 1.
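Under the Normal assumption, the ratio in Equation (6) simplifies because the normalising constants cancel, leaving exp(-(a - μ)² / (2σ²)); a minimal sketch:

```python
import math

def semantic_prob(a, mu, sigma):
    """Normal-density ratio N(a; mu, sigma) / N(mu; mu, sigma) of
    Eq. (6). The 1/(sigma * sqrt(2*pi)) factors cancel, so the result
    is exp(-(a - mu)^2 / (2 sigma^2)), equal to 1 exactly when a == mu,
    as the model requires for a perfect match."""
    return math.exp(-((a - mu) ** 2) / (2.0 * sigma ** 2))
```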
Fig. 4. Graph-based representations of the original prototypes.
Fig. 5. Some examples of letters X and H with low and high distortion, respectively.
With each class of the training set, an SDRG has been synthesised. To do so, we
have used a variation of the incremental-synthesis algorithm used to construct
SORGs [4]. The coordinates (x, y) of the positions are considered to be
independent, on the basis that they have no mathematical relationship.
Therefore, the random variable X_ω is defined according to
P(ω_k = (x, y) | ω_k ≠ Φ) = P(ω_k^(x) = x | ω_k ≠ Φ) · P(ω_k^(y) = y | ω_k ≠ Φ)
for all (x, y) ∈ R². The random variable on the arcs, i.e. X_ε, is defined
according to P(ε_ij = ∃ | ε_ij ≠ Φ) = 1 and P(ε_ij = Φ | ε_ij ≠ Φ) = 0. In our
tests we set k1 = k2 = 1/2 (see Equation 3).
Fig. 6. Graphical representations of all the nodes' random variables X_ω of the
SDRGs. The left image represents letter I and the right one letter X. Both were
synthesised using the low-distortion training set.
Fig. 6 shows the nodes' random variables X_ω for two SDRGs that represent
letter I (left) and letter X (right), synthesised using the low-distortion
training set. On the right-hand side of each node, we show p_ω. In the case of
letter I, we see two nodes with low variance (high peaks) whose means are
situated in the expected positions. Nevertheless, we also see another two nodes
with high variance (low peaks) that seem to model the distortion of the
training set. In the case of letter X, we see 4 clear nodes (n1, n2, n3, n4),
again in the expected positions, with low variance, and 2 high-variance nodes
generated by the distortion (n5, n6). Finally, Fig. 7 shows the set elements of
S for letter I.
In the incremental-synthesis algorithm [8], each new graph G is compared to the
current SDRG and a labelling between them is obtained. Using this labelling,
the SDRG is updated to incorporate G. Fig. 8 shows the evolution of the match
probability during the construction of two SDRGs, for letter A (left) and
letter F (right). We can see that as the learning process moves forward, the
probability of the next element tends to increase. This tendency can be
explained by the fact that when new elements of the training set are
incorporated into the SDRG, the model contains more information about the
class.
We define the compression rate as the ratio between the number of graphs in
the training set and the number of graphs that the SDRG contains, i.e. the
number of Ai. In our method, the computational time of the recognition process
is proportional to the number of elements in S; in a classical
nearest-neighbours method, it is proportional to the number of elements that
represent the set. For this reason, it is important to evaluate the achieved
compression rate. Table 3 shows the compression rate for the low- and
high-distortion databases. The compression on the low-distortion database is
clearly considerable. Nevertheless, on the high-distortion database, two
letters achieve zero compression. This is due to the fact that the training set
elements are structurally very different. Finally, Table 4 shows the
classification ratio of our method compared to 5 other methods reported in [5].
Table 4. Classification rate of 5 methods reported in the literature and our method
perform few comparisons for each class; when the number of graphs in the
training set is high, this results in an important run-time reduction.
Our future work will compare the model with FDGs and SORGs. Moreover, we want
to study statistical techniques for node reduction and analyze their impact on
the recognition ratio and run time. From a practical point of view, we want to
test our method on other databases and analyze the degree of dependence on the
training set's distortion.
Acknowledgements
This research was partially supported by Consolider Ingenio 2010, project
CSD2007-00018, by the CICYT project DPI 2007-61452 and by the Universitat
Rovira i Virgili (URV) through a predoctoral research grant.
References
1. Riesen, K., Bunke, H.: Graph Database Repository for Graph Based Pattern Recognition
and Machine Learning. In: SSPR 2008 (2008)
2. Serratosa, F., Alquézar, R., Sanfeliu, A.: Function-described graphs for modelling
objects represented by sets of attributed graphs. Pattern Recognition 36, 781–798 (2003)
3. Wong, A.K.C., You, M.: Entropy and distance of random graphs with application to struc-
tural pattern recognition. IEEE Trans. PAMI 7, 599–609 (1985)
4. Serratosa, F., Alquézar, R., Sanfeliu, A.: Estimating the Joint Probability Distribution of
Random Vertices and Arcs by means of Second-order Random Graphs. In: Caelli, T.M.,
Amin, A., Duin, R.P.W., Kamel, M.S., de Ridder, D. (eds.) SPR 2002 and SSPR 2002.
LNCS, vol. 2396, pp. 252–262. Springer, Heidelberg (2002)
5. Bunke, H., Riesen, K.: A Family of Novel Graph Kernels for Structural Pattern Recogni-
tion. In: Rueda, L., Mery, D., Kittler, J. (eds.) CIARP 2007. LNCS, vol. 4756, pp. 20–31.
Springer, Heidelberg (2007)
6. Sanfeliu, A., Serratosa, F., Alquézar, R.: Second-Order Random Graphs For Modeling Sets
Of Attributed Graphs And Their Application To Object Learning And Recognition.
IJPRAI 18(3), 375–396 (2004)
7. Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty Years Of Graph Matching In Pattern
Recognition. IJPRAI 18(3), 265–298 (2004)
8. Serratosa, F., Alquézar, R., Sanfeliu, A.: Synthesis of Function-Described Graphs and Clus-
tering of Attributed Graphs. IJPRAI 16(6), 621–656 (2002)
Arc-Consistency Checking with Bilevel
Constraints: An Optimization
1 Introduction
2 Basic Notions
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 174–183, 2009.
c Springer-Verlag Berlin Heidelberg 2009
Arc-Consistency Checking with Bilevel Constraints: An Optimization 175
Cij (v, w) is the Boolean value obtained when variables i and j are replaced by
values v and w respectively. ¬Cij denotes the negation of the Boolean value Cij .
Let R be the set of these constraining relations. We use D to denote the union
of all domains and d the size of D.
A finite-domain constraint satisfaction problem consists of finding all tuples
of values (a1, ..., an) ∈ D1 × ... × Dn for the variables (1, ..., n)
satisfying all the relations belonging to R.
In this classical definition of FDCSP, one variable is associated with one
value. This assumption cannot hold for some classes of problems where we need
to associate a variable with a set of linked values, as described in [8]. We
call this problem the Finite-Domain Constraint Satisfaction Problem with
Bilevel Constraints (FDCSP_BC). In this problem we define two kinds of
constraints: the binary inter-node constraints Cij between two nodes, and the
binary intra-node constraints Cmpi between two values that could be associated
with the node i. The problem is then defined as follows:
The classical arc-consistency algorithm cannot classify a set of data in a node
of the graph, as we would like to do in an over-segmented image interpretation
task. We thus define a class of problems called arc-consistency problems with
bilevel constraints (ACBC). It is associated with the FDCSP_BC (see
Definition 1) and is defined as follows:
Definition 2. Let (i, j) ∈ arc(G). Let P(Di) be the set of subparts of the
domain Di. Arc (i, j) is arc-consistent with respect to P(Di) and P(Dj) iff
∀Si ∈ P(Di), ∃Sj ∈ P(Dj) such that ∀v ∈ Si, ∃t ∈ Si, ∃w ∈ Sj with Cmpi(v, t)
and Cij(t, w) (v and t may be identical).
Definition 3. Let P(Di) be the set of subparts of the domain Di. Let
P = P(D1) × ... × P(Dn). A graph G is arc-consistent with respect to P iff
∀(i, j) ∈ arc(G): (i, j) is arc-consistent with respect to P(Di) and P(Dj).
The AC4BC algorithm was derived from the AC4 algorithm proposed by Mohr and
Henderson in 1986 [3,1] to solve the ACBC problem (see [8] and [9] for the
details of the algorithm).
For AC4BC, a new definition of a node i belonging to node(G) is given. A node
is made up of a kernel Di and a set of interfaces Dij associated with each arc
coming from another linked node (see Figure 1.a). In addition, an intra-node
compatibility relation Cmpi (see Section 2.1) is associated with each node of
the graph. It describes the semantic link between different subparts of an
object which could be associated with the node. The intra-node constraint Cmpi
can be a spatial or a morphological constraint, as shown in Figure 1.b.
Fig. 1. a. Structure of a node with bilevel constraints. The constraint Cmpi
links regions classified inside the node i. If a region does not belong to an
interface Dij but satisfies the constraint Cmpi with another region belonging
to Dij, then this region is kept inside Di. b. The values α, β and γ (segmented
regions) can be associated with the node i representing a conceptual object. In
this example α ∈ Dik, β ∈ Dik, γ ∈ Dik, while α ∈ Dij, β ∉ Dij, γ ∉ Dij. In a
classical arc-consistency checking algorithm, the values β and γ would be
removed from the node i because they are not supported by other regions. Thanks
to the intra-node constraint Cmpi, β and γ can be kept in the node i because a
path can be found between the value α and the values β and γ.
Arc-Consistency Checking with Bilevel Constraints: An Optimization 177
begin AC4BC
Step 1: Construction of the data structures.
 1  InitQueue(Q);
 2  for each i ∈ node(G) do
 3    for each b ∈ Di do
 4    begin
 5      S[i,b] := empty set;
 6    end;
 7  for each (i, j) ∈ arc(G) do
 8    for each b ∈ Dij do
 9    begin
10      Total := 0;
11      for each c ∈ Dj do
12        if Cij (b, c) then
13        begin
14          Total := Total + 1;
15          S[j,c] := S[j,c] ∪ {(i,b)};
16        end
17      Counter[(i,j),b] := Total;
18      if Total = 0 then
19        Dij := Dij − {b};
20    end;
21  for each i ∈ node(G) do
22    for each Dij ∈ Ii do
23    begin
24      CleanKernel(Di , Dij , Ii , Q);
25    end
Fig. 2. The AC4BC algorithm: Step 1. Figure 12 describes the procedure CleanKernel.
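Step 1 above can be transcribed almost line for line into Python. This is a hedged sketch, not the authors' implementation: `D`, `D_if` and `C` are illustrative stand-ins for the kernel domains, interfaces and constraint predicates, and the final CleanKernel pass (lines 21-25) is omitted.

```python
from collections import defaultdict

def ac4bc_step1(arcs, D, D_if, C):
    """Step 1 of AC4BC: build support sets and counters and prune
    unsupported interface values. D[i] is the kernel domain of node i,
    D_if[(i, j)] the interface of i toward j, and C[(i, j)] the
    constraint predicate Cij (all names are illustrative)."""
    S = defaultdict(set)     # S[(j, c)]: interface values supported by c
    counter = {}             # counter[((i, j), b)]: number of supports of b
    queue = []               # removed (node, value) pairs, fed to step 2
    for (i, j) in arcs:
        for b in list(D_if[(i, j)]):
            total = 0
            for c in D[j]:
                if C[(i, j)](b, c):
                    total += 1
                    S[(j, c)].add((i, b))
            counter[((i, j), b)] = total
            if total == 0:
                D_if[(i, j)].discard(b)   # line 19 of Figure 2
                queue.append((i, b))
    return S, counter, queue
```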
As in algorithm AC4 , the domains Di are initialized with values satisfying unary
node constraints. The algorithm is decomposed into two main steps: an initial-
ization step (see the pseudo code in Figure 3) and a pruning step which updates
the nodes as a function of the removals made by the previous step so as to
maintain arc-consistency (see the pseudo code in Figure 4). However, whereas in
AC4 a value was removed from a node i if it had no direct support, in AC4BC a
value is removed if it has no direct support and no indirect support obtained by
using the compatibility relation Cmpi. This additional check is called the cleaning
step (see the pseudo code in Figure 5).
Theorem 1. The time complexity of the cleaning step is O(ed) in the worst
case, where e is the number of edges and d is the size of D.
The key point of the time complexity is the call to the procedure CleanKernel:
reducing the number of calls reduces the computation time. AC4BC is derived
from AC4. In the pruning step, each time an element is removed from the Queue,
the algorithm tries to refill the Queue before emptying it. This strategy is costly
because it implies many unnecessary calls to the procedure CleanKernel, each of
which has little effect: a single removal in an interface is unlikely to produce a
change in the domain Di of the kernel of the node. We stated previously that
the complexity of the procedure CleanKernel is O(ed). In fact this complexity
can be stated more accurately as e·dᵢᵗ , where dᵢᵗ is the size of the domain Di at
time t of the algorithm. The more slowly the size of Di decreases, the more slowly
the algorithm runs. The pruning step of the optimized algorithm therefore proceeds
as follows:
1. First, the Queue is filled with the labels removed in the initialization step,
as in the previous version of AC4BC ;
2. Second, the Queue is emptied;
180 A. Deruyver and Y. Hodé
3. Next, the procedure CleanKernel is called for each node having at least one
label removed. This step refills the Queue. Steps 2 and 3 are then repeated
until no more removals are possible (see Figure 6).
To do so, a boolean array Tabnode, with a size equal to the number of nodes, is
updated each time at least one removal has been made in a node: Tabnode[i] is true
if at least one label has been removed from the node i. This array is initialized to
false before the beginning of the pruning step. It allows us to know which nodes
have to be updated by the procedure CleanKernel, which is called only when
necessary, after the Queue has been emptied and all the interfaces of all the nodes
have been examined. The pseudo code of the pruning step of the optimized version
of AC4BC , called OAC4BC , is given in Figure 7.
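The Tabnode-based batching can be sketched as follows. All names are illustrative: the propagation of a removal to neighbouring counters is elided as a comment, and `clean_kernel(i)` is assumed to return the list of (node, label) removals it performs.

```python
def oac4bc_pruning(queue, nodes, clean_kernel):
    """Sketch of the OAC4BC pruning loop: drain the queue completely,
    mark every touched node in `tabnode`, then call `clean_kernel` once
    per marked node; repeat until a whole round removes nothing."""
    while queue:
        tabnode = {i: False for i in nodes}
        while queue:                      # step 2: empty the Queue
            i, label = queue.pop()
            # ...propagate the removal of `label` to the interfaces of
            # the neighbours of i (supports, counters)...
            tabnode[i] = True
        for i in nodes:                   # step 3: one CleanKernel per
            if tabnode[i]:                # node with removals
                queue.extend(clean_kernel(i))   # refills the Queue
```

The point of the batching is visible in the call pattern: CleanKernel runs once per marked node per round, rather than once per dequeued label.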
Fig. 7. The experiments show that the average time complexity of OAC4BC is
better than that of AC4BC
4 Experiments
Reducing the number of calls to CleanKernel reduces the computation time of
the arc-consistency checking. However, in some cases the Queue may be filled
with only a few elements, and the gain may then be lost by a change in the
scanning order of the labels: the algorithm may work first with labels whose
removal has little effect on the other labels. The worst-case time complexity of
AC4BC and of its optimized version OAC4BC are the same. It is nevertheless
interesting to study the gain of the optimized algorithm on experimental data.
If the optimization changed the time cost by a constant scaling factor, x/y would
be constant for any x. Figure 8 shows that this is not the case: the correlation
between x and x/y is very strong (Spearman coefficient r = 0.93, p < 0.0001),
meaning that the higher x is, the higher the gain x/y. This result suggests
that the average time complexity of OAC4BC is better than that of AC4BC ,
at least on our set of test images.
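The reported correlation can be reproduced on any (x, x/y) measurement series with a rank correlation. A minimal pure-Python Spearman coefficient (no tie handling) is sketched below on hypothetical data; the numbers are illustrative, not the paper's measurements.

```python
def spearman(xs, ys):
    """Spearman rank correlation for tie-free data (illustrative only)."""
    def ranks(v):
        order = sorted(range(len(v)), key=lambda k: v[k])
        r = [0.0] * len(v)
        for rank, k in enumerate(order):
            r[k] = float(rank)
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    mx, my = sum(rx) / n, sum(ry) / n
    num = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    den = (sum((a - mx) ** 2 for a in rx)
           * sum((b - my) ** 2 for b in ry)) ** 0.5
    return num / den

# Hypothetical measurements: x grows and the gain x/y grows with it,
# which is the pattern reported in the text
x = [1, 2, 4, 8, 16]
gain = [1.1, 1.9, 3.0, 4.2, 6.5]
```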
The optimized version of the AC4BC algorithm, called OAC4BC , has two advan-
tages:
– It makes it possible to apply our approach to images with more than 800
segmented regions and with a conceptual graph containing 142 edges. These
experiments would not be possible without this optimization; the approach
can thus be applied to real, complex problems.
– It makes it possible to envisage the parallelization of our algorithm. In that
case each node can be considered as an individual process. Each node is
updated separately (see lines 45-55 of Figure 7), and the nodes can be updated
in one step in parallel. The consequences of this updating can be sent to the
other nodes in a second step (see lines 32-43 of Figure 7). Such a parallel
implementation could be made in the context of GPU programming.
Acknowledgment
We thank the company "Véolia" for supplying us with the set of water meter
images.
References
1. Bessière, C.: Arc-consistency and arc-consistency again. Artificial Intelligence 65,
179–190 (1994)
2. Kokèny, T.: A new arc consistency algorithm for csps with hierarchical domains. In:
Proceedings 6th IEEE International Conference on Tools for Artificial Intelligence,
pp. 439–445 (1994)
3. Mohr, R., Henderson, T.: Arc and path consistency revisited. Artificial Intelli-
gence 28, 225–233 (1986)
4. Mohr, R., Masini, G.: Good old discrete relaxation. In: Proceedings ECAI 1988, pp.
651–656 (1988)
5. Hentenryck, P.V., Deville, Y., Teng, C.: A generic arc-consistency algorithm and its
specializations. Artificial Intelligence 57(2), 291–321 (1992)
6. Mackworth, A., Freuder, E.: The complexity of some polynomial network consis-
tency algorithms for constraint satisfaction problems. Artificial Intelligence 25, 65–
74 (1985)
7. Freuder, E., Wallace, R.: Partial constraint satisfaction. Artificial Intelligence 58,
21–70 (1992)
8. Deruyver, A., Hodé, Y.: Constraint satisfaction problem with bilevel constraint:
application to interpretation of over segmented images. Artificial Intelligence 93,
321–335 (1997)
9. Deruyver, A., Hodé, Y.: Qualitative spatial relationships for image interpretation
by using a conceptual graph. In: Image and Vision Computing (2008) (to appear)
Pairwise Similarity Propagation Based Graph
Clustering
for Scalable Object Indexing and Retrieval
1 Introduction
In this paper we aim to develop a framework for indexing and retrieving objects of
interest where large variations of viewpoint, background structure and occlusions are
present. State-of-the-art methods for object retrieval from large image corpora rely on
variants of the "Bag-of-Feature (BoF)" technique [2][7][13]. According to this method-
ology, each image in the corpus is first processed to extract high-dimensional feature
descriptors. These descriptors are quantized or clustered so each feature is mapped to a
"visual word" in a relatively small discrete vocabulary. The corpus is then summarized
using an index where each image is represented by the visual words contained within it.
At query time, the system is presented with a query in the form of an image region. This
region is itself processed to extract feature descriptors that are mapped onto the visual
word vocabulary, and these words are used to index the query. The response set of the
query is a set of images from the corpus that contain a large number of visual words
in common with the query region. These response images may be ranked subsequently
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 184–194, 2009.
c Springer-Verlag Berlin Heidelberg 2009
using spatial information to ensure that the response and the query not only contain sim-
ilar features, but that the features occur in compatible spatial configurations [6][9][14].
However, recent work [2][7] has shown that these methods can suffer from poor
recall when the object of interest appears with large variations of viewpoint,
variations in background structure, and under occlusion.
The work reported in [2][7][8] explores how to derive a better latent object model
using a generalization of the concept of query expansion, a well-known technique from
the field of text-based information retrieval [1][11]. In text-based query expansion a
number of the highly ranked documents from the original response set are used to gen-
erate a new query, or several new queries, that can be used to obtain new response sets.
The outline of the approach [2][7][8] is as follows:
Stage 1. Given a query region, search the corpus and retrieve a set of image regions that match
the query object;
Stage 2. Combine the retrieved regions, along with the original query, to form a richer latent
model of the object of interest;
Stage 3. Re-query the corpus using this expanded model to retrieve an expanded set of match-
ing regions;
Stage 4. Repeat the process as necessary, alternating between model refinement and
re-querying.
In Stage 1, a BoF based method is used to retrieve a set of initial images. In Stage
2, the initially returned result list is re-ranked by estimating affine homographies be-
tween the query image and each of the top-ranking results from the initial query. The
score used in re-ranking is computed from the number of verified inliers for each result.
According to the top ranked images, a richer latent model is formed and is re-issued
as a new query image in Stage 3. To generate the re-queries, five alternative query ex-
pansion methods are proposed [2][7][8]. These include a) query expansion baseline, b)
transitive closure, c) average query expansion, d) recursive average query expansion,
and e) resolution expansion. Each method commences by evaluating the original query
Q0 composed of all the visual words which fall inside the query region. A latent model
is then constructed from the verified images returned from Q0 , and a new query Q1 ,
or several new queries, issued. These methods have achieved substantially improved
retrieval performance. However, they suffer from four major problems. In the follow-
ing paragraphs, we analyze these problems in detail and use the analysis to design an
improved search engine based on a graph-based representation.
Image indexing. In BoF methods, the images are indexed by the quantized descriptors.
However, if we analyze the "bag-of-words (BoW)" used in text information retrieval
(TIR) and the BoF used in object indexing/retrieval (OIR), we observe that the BoF
does not operate at the same semantic level as the BoW. A word in BoW, specified as
a keyword, is a single word, a term or a phrase. Every keyword (e.g. cup or car)
normally has a high-level semantic meaning. However, a visual feature usually does
not possess semantic meaning. Furthermore, we observe that most of the visual words are
not object or class specific. In a preliminary experimental investigation, we have trained
a large clustering tree using over 2M selected SIFT [15] descriptors extracted from over
50K images and spanning more than 500 objects. The number of the leaf nodes in the
186 S. Xia and E.R. Hancock
clustering tree is 25334 and the mean vector of each leaf node is used as a quantized
visual word. With an increasing number of objects, a single visual word may appear in
hundreds of different objects. By contrast, a group of local features for an object con-
tained in an image together with their collective spatial arrangement are usually of a
high level semantic meaning. Moreover, such a representation is also significantly more
object or scene specific. Accordingly, the above visual word might best be regarded as
a morpheme in English, or a stroke or word-root in Chinese. Motivated by these obser-
vations, we propose an OIR model based on an arrangement of features for an object
and which is placed at the word-level in TIR. Since each bag of features is structured
data, a more versatile and expressive representational tool is provided by an attributed
graph [3]. Hence we represent a bag of features using an attributed graph G, and this
graph will be used for the purposes of indexing. Further details appear in Section 3.
Measuring image similarity. Provided that the graph representation is constructed
using all of the available local invariant features, then the number of local invariant
features of an image that are detected and that need to be processed might be very
large. Moreover, such an approach renders the representation of shape information
redundant and poses computational difficulty in manipulating all possible features
for modeling and training. For example, one high resolution image (e.g.
3264×2448) can be resized to many lower resolutions (e.g. 816×612, 204×153). As
a result the number of spatially consistent inliers varies significantly, and it is difficult
to define a ranking function. If the images are not matched under comparable scales,
an object that is a sub-part of another object may have a high matching score. This will
result in significant false matching using query expansion. Hence, each image is
represented by a pyramid structure, with each grid scaled to an identical size, and we
then select a subset of salient visual features that can be robustly detected and
matched, using the method for ranking SIFT features proposed in [15]. In this way,
one high resolution image might be represented by several graphs. For such canonical
graphs, it is much easier to define a suitable similarity measure.
Retrieval speed. In the above method, spatial verification must also be performed for
the subsequent re-queries. As a result it may become prohibitively expensive to retrieve
from a very large corpus of images. We therefore require efficient ways to include spa-
tial information in the index, and move some of the burden of spatial matching from
the ranking stage to the training stage. We represent each image or each region of inter-
est using graphs and then compute all possible pairwise graph similarity measures. For
each graph we rank in descending order all remaining graphs in the dataset according
to the similarity measures. For each graph we then select the K best ranked graphs,
referred to as K-nearest neighbor graphs (KNNG). For retrieval, we directly use the
training result for each re-query to repeat the above query expansion process. This will
significantly decrease the time consumed in the query stage.
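The training-stage precomputation just described amounts to a similarity-ranked neighbour table. A minimal sketch, with `similarity` an arbitrary symmetric scoring function and all names illustrative:

```python
def build_knng(graphs, similarity, K):
    """For each graph, precompute the K best-ranked neighbours under a
    symmetric pairwise `similarity` (illustrative training-stage step)."""
    knng = {}
    for l in range(len(graphs)):
        ranked = sorted(
            (q for q in range(len(graphs)) if q != l),
            key=lambda q: similarity(graphs[l], graphs[q]),
            reverse=True,
        )
        knng[l] = ranked[:K]
    return knng
```

At query time the precomputed table is reused, which is why the per-query cost stays nearly constant.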
Ranking. In the above method, the images in the final result are in the same order in
which they entered the queue for the subsequent re-query. We argue that these images
should be re-ranked. Unfortunately, re-computing the pairwise similarity measures be-
tween the query image and each retrieved graph will be time consuming. We thus pro-
pose a similarity propagation method to approximate the similarity measure.
The outline of the remainder of this paper is as follows. In Section 2, we present some
preliminaries for our work. In Section 3, we describe how to train a search engine for
incremental indexing and efficient retrieval. We present experimental results in Section
4 and conclude the paper in Section 5.
2 Preliminaries
For an image, those SIFT [5] features that are robustly matched with the SIFT features
in similar images can be regarded as salient representative features. Motivated by this,
a method for ranking SIFT features has been proposed in [15]. Using this method, the
SIFT features of an image I are ranked in descending order according to a matching fre-
quency. We select the T best ranked SIFT features, denoted as V = {Vt , t = 1, 2, ..., T },
where Vt = ((X⃗t )ᵀ, (D⃗t )ᵀ, (U⃗t )ᵀ)ᵀ. Here, X⃗t is the location, D⃗t is the direction vector
and U⃗t is the set of descriptors of a SIFT feature. In our experiments, T is set to 40. If
there are less than this number of feature points present then all available SIFT features
in an image are selected. We then represent the selected SIFT features in each image
using an attributed graph.
Formally, an attributed graph G [3] is a 2-tuple G = (V, E), where V is the set of
vertices, E⊆V×V is the set of edges. For each image, we construct a Delaunay graph G
using the coordinates of the selected SIFT features. In this way, we can obtain a set of
graphs G ={Gl , l = 1, 2, ..., N} from a set of images.
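Assuming SciPy is available, the Delaunay graph over the selected keypoint coordinates can be built as follows. This is a sketch: the attributed-graph labels (descriptors, directions) are omitted and only the edge set is returned.

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_graph(points):
    """Edge set of the Delaunay graph over 2-D keypoint coordinates."""
    tri = Delaunay(np.asarray(points, dtype=float))
    edges = set()
    for simplex in tri.simplices:          # each simplex is a triangle
        for a in range(3):
            for b in range(a + 1, 3):
                edges.add(tuple(sorted((int(simplex[a]), int(simplex[b])))))
    return edges
```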
We perform pairwise graph matching (PGM) with the aim of finding a maximum
common subgraph (MCS) between two graphs Gl and Gq , and the result is denoted as
MCS (Gl ,Gq ). In general, this problem has been proven to be NP-hard. Here we use
a Procrustes alignment procedure [12] to align the feature points and remove those
features that do not satisfy the spatial arrangement constraints.
Suppose that Xl and Xq are respectively the position coordinates of the selected fea-
tures in graphs Gl and Gq . We construct the matrix
Z = arg min_Ω ‖Xl · Ω − Xq ‖F , subject to Ωᵀ · Ω = I. (1)
where ‖ · ‖F denotes the Frobenius norm. The norm is minimized by the nearest orthog-
onal matrix
Z∗ = Ψ · Υ∗ . (2)
where Ψ · Σ · Υ∗ is the singular value decomposition of the matrix Xlᵀ · Xq . The goodness-
of-fit criterion is the root-mean-squared error, denoted as e(Xl , Xq ). The best case is
e(Xl , Xq ) = 0. The error e can be used as a measure of geometric similarity between the
two groups of points. If we discard one pair of points from Xl and Xq , denoted as Xl→i
and Xq→i , the errors e(Xl→i , Xq→i ), i = 1, 2, ..., CS (Gl , Gq ) can be obtained, where
CS (Gl , Gq ) is the number of SIFT features of the two graphs initially matched using
the matching method proposed in [18]. The maximum decrease of e(Xl→i , Xq→i ) is defined as
Δe(CS (Gl , Gq )) = e(Xl , Xq ) − min{e(Xl→i , Xq→i )} (3)
If Δe(CS (Gl , Gq ))/e(Xl , Xq ) > ε, e.g. ε = 0.1, the corresponding pair Xli and Xqi is
discarded as a mismatched feature pair. This leave-one-out procedure can proceed iter-
atively, and is referred to as the iterative Procrustes matching of Gl and Gq .
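The alignment and leave-one-out loop can be sketched with NumPy's SVD (Schönemann's orthogonal Procrustes solution). This is an illustrative reading of Equations (1)-(3), not the authors' code; the `1e-12` guard and the minimum point count are added for numerical robustness.

```python
import numpy as np

def procrustes_error(Xl, Xq):
    """RMS error after the optimal orthogonal alignment (Eqs. 1-2)."""
    U, _, Vt = np.linalg.svd(Xl.T @ Xq)
    Omega = U @ Vt                        # nearest orthogonal matrix Z*
    return float(np.sqrt(np.mean((Xl @ Omega - Xq) ** 2)))

def iterative_procrustes(Xl, Xq, eps=0.1):
    """Leave-one-out pruning (Eq. 3): drop the pair whose removal most
    decreases the error while the relative decrease exceeds eps."""
    while len(Xl) > 3:
        e = procrustes_error(Xl, Xq)
        if e < 1e-12:                     # already a perfect fit
            break
        errs = [procrustes_error(np.delete(Xl, i, 0), np.delete(Xq, i, 0))
                for i in range(len(Xl))]
        i = int(np.argmin(errs))
        if (e - errs[i]) / e <= eps:      # decrease too small: stop
            break
        Xl, Xq = np.delete(Xl, i, 0), np.delete(Xq, i, 0)
    return Xl, Xq, procrustes_error(Xl, Xq)
```

On a point set that is an exact rotation of the other, the error is zero; corrupting one pair makes the loop discard exactly that pair.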
Given MCS (Gl , Gq ) obtained by the above PGM procedure, we construct a similarity
measure between the graphs Gl and Gq as follows:
R(Gl , Gq ) = |MCS (Gl , Gq )| × ( exp(−e(Xl , Xq )) )^κ . (4)
where |MCS (Gl , Gq )| is the cardinality of the MCS of Gl and Gq , and κ is the number
of mismatched feature pairs discarded by iterative Procrustes matching, which is used
to amplify the influence of the geometric dissimilarity between Xl and Xq .
Finally, for the graph set G = {Gq , q = 1, 2, ..., N}, for each graph Gl ∈ G and each of
the remaining graphs Gq ∈ G, we obtain the pairwise graph similarity measure
R(Gl , Gq ) defined in Equation (4). Using the similarity measures we rank in descending
order all graphs Gq . The K top ranked graphs are defined as the K-nearest neighbor
graphs (KNNG) of graph Gl , denoted as K{Gl }.
∀Gl ∈ G, we can obtain the siblings S {Gl }. For each graph Gq ∈ S {Gl }, the correspond-
ing siblings can also be obtained. In this way, we can iteratively obtain a series of graphs
which satisfy consistent sibling relationships.
The graph set, obtained in this way, is referred to as a family tree of graph Gl
(FTOG). Given a graph set G, an FTOG of Gl with k generations, denoted as L{Gl , k}, is
defined as:
L{Gl , k} = L{Gl , k − 1} ∪ ⋃_{Gq ∈ L{Gl ,k−1}} S_Rτ {Gq }. (8)
where, if k = 1, L{Gl , 1} = L{Gl , 0} ∪ S {Gl } and L{Gl , 0} = {Gl }; and the process stops
when L{Gl , k} = L{Gl , k + 1}. An FTOG, whose graphs satisfy the restriction defined in
Equation (7), can be regarded as a cluster of graphs. We thus refer to this process defined
in Equation (8) as pairwise similarity propagation based graph clustering (SPGC).
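Under the assumption that `siblings(q)` returns the (threshold-restricted) sibling set of graph q, the fixed-point expansion of Equation (8) is a simple breadth-first propagation:

```python
def ftog(l, siblings):
    """Family tree of graph G_l (Eq. 8): start from {G_l} and add the
    siblings of every member until a fixed point is reached.
    `siblings(q)` is an illustrative stand-in for S_Rtau{G_q}."""
    cluster, frontier = {l}, {l}
    while frontier:
        new = set()
        for q in frontier:
            new |= set(siblings(q))
        frontier = new - cluster     # only newly reached graphs expand
        cluster |= frontier
    return cluster
```

Graphs reachable through a chain of sibling relations end up in one cluster; unconnected graphs form separate FTOGs.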
KNNG information of each graph. The computational complexity of Step 4 is also low.
Hence the time consumed is nearly a constant for a query from even very large image
datasets.
Incremental Object Indexing. Given a graph set G and its accompanying RSOM tree,
an additional graph Gl is processed as follows:
1) If maxGq ∈L{Gl ,g} R(Gl , Gq ) is greater than a threshold Rτ0 , we regard Gl as a dupli-
cate of the maximizing Gq ; such a graph Gq in the graph set is referred to as an exemplar graph.
2) If maxGq ∈L{Gl ,g} R(Gl , Gq ) ≤ Rτ0 , Gl is incrementally added to G. For each Gq ∈
K{Gl }, K{Gq } is updated according to the descending order of the pairwise similarity
measures if needed. In addition, the descriptors of graph Gl are incrementally added
to the RSOM tree.
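A toy version of this incremental indexing step might look as follows. The index layout, the `R_tau0` handling and the KNN refresh are illustrative simplifications, and graphs are stood in for by hashable keys:

```python
def add_graph(Gl, index, similarity, R_tau0, K):
    """Incremental indexing sketch: if the best match of Gl in the index
    exceeds R_tau0, record Gl as a duplicate of that exemplar; otherwise
    add Gl as a new exemplar and refresh the K-nearest lists."""
    if index["exemplars"]:
        best = max(index["exemplars"], key=lambda Gq: similarity(Gl, Gq))
        if similarity(Gl, best) > R_tau0:
            index["duplicates"].setdefault(best, []).append(Gl)
            return "duplicate"
    index["exemplars"].append(Gl)
    # refresh the KNN lists of the exemplars (kept deliberately simple)
    index["knn"] = {
        G: sorted((H for H in index["exemplars"] if H != G),
                  key=lambda H: similarity(G, H), reverse=True)[:K]
        for G in index["exemplars"]
    }
    return "exemplar"
```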
Although the threshold Rτ0 is set as a constant in this paper, it can also be learned
from the training data for each object in order to select a group of representative irre-
ducible graphs. These graphs act as indexing items and are analogous to the keywords
in TIR. When a graph Gq is queried, its duplicate graphs, if any, are ranked
in the same order as Gq .
4 Experimental Results
We have collected 53536 images, referred to as Dataset I, some examples of which are
shown in Figure 1, as training data. The data spans more than 500 objects including
Fig. 1. Image data sets. a: 3600 images of 50 objects in COIL 100, labeled as A1∼A50; b: 29875
unlabeled images from many other standard datasets, e.g. Caltech101 [4] and Google images,
covering over 450 objects and used as negative samples; c: 161 images of 8 objects used in [10],
labeled as C1 to C8; d: 20000 images of 10 objects collected by us, labeled as D1 to D10. For
each of the objects in D1 to D9, we collect 1500 images which traverse a large variation of
imaging conditions, and similarly 6500 images for D10. For simplicity, the four data sets
are denoted A to D. The objects in Figure 1a, Figure 1c and Figure 1d are numbered from
left to right and then from top to bottom as shown in the corresponding figures, e.g. A1 to
A50 in Figure 1a. As a whole, the 68 objects are also identified as Object 1 to Object 68.
The above images as a whole are referred to as Dataset I.
Fig. 2. A sample of the results returned by our method for 72 images of a car, appearing with
viewpoint variations of 0–360°, in COIL 100, achieving total recall and a precision of 1. This
query was performed on a dataset of over 50,000 images. The center image is the query image.
Using SPGC, we can obtain an FTOG containing all 72 images of the car as shown in this figure.
human faces and scenes. We take 68 objects as examples, which are identified as Object
1 to Object 68. For each of these images, we extract ranked SIFT features, using the
method presented in [15], of which at most 40 highly ranked features are selected to
construct a graph. We have collected over 2,140,000 SIFT features and 53536 graphs
for the training set. We have trained a RSOM clustering tree with 25334 leaf nodes for
the SIFT descriptors of Dataset I using the incremental RSOM training method. In this
training stage, we have obtained K{Gl } for each of the graphs of Dataset I. We set Rτ0
to 18, and 33584 graphs are selected as exemplar graphs. As a result, 9952 graphs are
indexed as duplicates of their nearest neighbors.
A sample of the results returned by our method is shown in Figure 2. Each of the
instances is recalled with a precision of 1, although the car appears with large viewpoint
changes.
We randomly select 30% of the sample graphs from the above 68 objects in Dataset I.
We use each of these graphs to obtain a query response set for each similarity threshold.
For each retrieval we compute the maximal F-measure, defined as 2/(1/recall + 1/precision),
over the different threshold values. The average of these maximal F-measures for each object
class are given in Table 1.
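The maximal F-measure over thresholds is straightforward to compute from per-threshold (recall, precision) pairs; a minimal sketch:

```python
def max_f_measure(results):
    """Maximal F-measure, F = 2 / (1/recall + 1/precision), over a list
    of (recall, precision) pairs, one per similarity threshold."""
    best = 0.0
    for recall, precision in results:
        if recall > 0 and precision > 0:
            best = max(best, 2.0 / (1.0 / recall + 1.0 / precision))
    return best
```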
Table 1. The average maximal F-measure f for each object ID
ID  f     ID  f     ID  f     ID  f     ID  f     ID  f     ID  f     ID  f     ID  f     ID  f
 1  1.0    2  .651   3  1.0    4  1.0    5  1.0    6  1.0    7  1.0    8  1.0    9  1.0   10  1.0
11  1.0   12  1.0   13  1.0   14  1.0   15  1.0   16  1.0   17  1.0   18  1.0   19  1.0   20  1.0
21  1.0   22  1.0   23  1.0   24  1.0   25  1.0   26  1.0   27  1.0   28  1.0   29  1.0   30  1.0
31  1.0   32  1.0   33  1.0   34  1.0   35  1.0   36  1.0   37  1.0   38  1.0   39  .619  40  1.0
41  1.0   42  1.0   43  1.0   44  1.0   45  1.0   46  1.0   47  1.0   48  1.0   49  1.0   50  1.0
51  .325  52  .350  53  .333  54  .354  55  .314  56  .364  57  .353  58  .886  59  .812  60  .868
61  .752  62  .777  63  .753  64  .734  65  .791  66  .747  67  .714  68  .975
Fig. 3. Retrieval performance for Object 3. (a) Retrieval performance using our family tree of
graphs (FTOG) method; (b) retrieval performance using simple K-nearest neighbor graphs
(KNNG); (c) ROC plots of the two methods, from which we can obtain an optimal operating
point where recall, precision and F-measure all achieve a value of 1; the average precision
of our method also achieves a value of 1.
From Table 1, it is clear that for most of the objects sampled under controlled imaging
conditions, the ideal retrieval performance (an F-measure of 1 or an average precision
of 1) has been achieved. This is illustrated by Figure 3. The plots of recall/precision
against the similarity threshold using our FTOG based method are shown in Figure 3a;
the corresponding plots using simple K-nearest neighbor graphs (KNNG) are shown
in Figure 3b. The ROC plots for the two methods are shown in Figure 3c. For the
FTOG method, the optimal operating point is where both recall and precision achieve 1,
while the F-measure and the average precision also achieve 1. This means that all
graphs of the object of interest can be clustered into a unique cluster. Compared to
the simple K-nearest neighbors based method, the retrieval performance is significantly
improved by introducing pairwise clustering, as shown in Figure 3c.
However, in most practical situations, the images of an object might be obtained with
large variations of imaging conditions and are more easily clustered into several FTOGs.
In this situation the average precision is usually less than 1. An example is provided
by the retrieval performance for Objects 51 to 68 shown in Table 1. In particular, for
Objects 51 to 58, the F-measure is very low because of the large variations of viewpoint:
the corresponding images are not sampled densely enough to form a unique cluster using
our similarity propagation based graph clustering method. The results for Objects 59 to 68
are much better since we have collected thousands of images for each of them with
continuous variations of "imaging parameters".
5 Conclusion
In this paper, we propose a scalable object indexing and retrieval framework based
on the RSOM tree clustering of feature descriptors and pairwise similarity propaga-
tion based graph clustering (SPGC). It is distinct from current state-of-the-art bag-of-
feature based methods [2][7] since we do not use a quantization of descriptors as visual
words. Instead, we represent each bag of features of an image together with their spatial
configuration using a graph. In object indexing and retrieval such graphs act in a man-
ner that is analogous to keywords in text indexing and retrieval. We extend the widely
used query expansion strategy, and propose a graph clustering technique based on pair-
wise similarity propagation. Using the RSOM tree and SPGC, we implement an incre-
mentally trained search engine. Since most of the computation has been transferred to
the training stage, high-precision and high-recall retrieval requires nearly constant
time for each query.
We perform experiments with over 50K images spanning more than 500 objects
and these show that the instances similar to the query item can be retrieved with ease,
speed and accuracy. For some of the objects, the ideal retrieval performance (an average
precision of 1 or an F-measure of 1) has been achieved.
In our framework, if the SIFT feature extractor is implemented using C++ or a
DSP, the RSOM tree is implemented on cluster computers [17], and multiple
pairwise graph matchings run in parallel, our system can scale to huge datasets
with real-time retrieval. We leave such research for future work.
Acknowledgments
We acknowledge financial support from the FET programme within the EU FP7, un-
der the SIMBAD project (contract 213250), and by the ATR Lab Foundation project
91408001020603.
References
1. Buckley, C., Salton, G., Allan, J., Singhal, A.: Automatic query expansion using SMART.
In: TREC-3 Proc. (1995)
2. Chum, O., Philbin, J., Sivic, J., Isard, M., Zisserman, A.: Total recall: Automatic query ex-
pansion with a generative feature model for object retrieval. In: Proc. ICCV (2007)
3. Chung, F.: Spectral graph theory. American Mathematical Society, Providence (1997)
4. Li, F.F., Perona, P.: A Bayesian hierarchical model for learning natural scene categories.
CVPR 2, 524–531 (2005)
5. Lowe, D.: Local feature view clustering for 3d object recognition. CVPR 2(1), 1682–1688
(2001)
6. Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. In: Comp. Vision and
Pattern Recognition, pp. II: 2161–2168 (2006)
7. Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabu-
laries and fast spatial matching. In: Proc. CVPR (2007)
8. Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Lost in Quantization: Improving
Particular Object Retrieval in Large Scale Image Databases. In: Proc. CVPR (2008)
9. Quack, T., Ferrari, V., Van Gool, L.: Video mining with frequent itemset configurations. In:
Proc. CIVR (2006)
10. Rothganger, F., Lazebnik, S., Schmid, C., Ponce, J.: 3d object modeling and recognition
using local affine-invariant image descriptors and multi-view spatial constraints. IJCV 66(3),
231–259 (2006)
11. Salton, G., Buckley, C.: Improving retrieval performance by relevance feedback. Journal of
the American Society for Information Science 41(4), 288–297 (1990)
12. Schonemann, P.: A generalized solution of the orthogonal procrustes problem. Psychome-
trika 31(3), 1–10 (1966)
13. Sivic, J., Zisserman, A.: Video Google: A text retrieval approach to object matching in
videos. In: Proc. ICCV (October 2003)
14. Tell, D., Carlsson, S.: Combining appearance and topology for wide baseline matching. In:
Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp.
68–81. Springer, Heidelberg (2002)
15. Xia, S.P., Ren, P., Hancock, E.R.: Ranking the local invariant features for the robust visual
saliencies. In: ICPR 2008 (2008)
16. Xia, S.P., Zhang, L.F., Yu, H., Zhang, J., Yu, W.X.: Theory and algorithm of machine learning
based on rsom tree model. ACTA Electronica sinica 33(5), 937–944 (2005)
17. Xia, S.P., Liu, J.J., Yuan, Z.T., Yu, H., Zhang, L.F., Yu, W.X.: Cluster-computer based incre-
mental and distributed rsom data-clustering. ACTA Electronica sinica 35(3), 385–391 (2007)
18. Xia, S.P., Hancock, E.R.: 3D Object Recognition Using Hyper-Graphs and Ranked Local
Invariant Features. In: da Vitoria Lobo, N., et al. (eds.) SSPR+SPR 2008. LNCS, vol. 5342,
pp. 1117–1126. Springer, Heidelberg (2008)
A Learning Algorithm for the Optimum-Path
Forest Classifier
Institute of Computing
University of Campinas
Campinas SP, Brazil
1 Introduction
Pattern recognition techniques can be divided according to the amount of available information about the training set: (i) supervised approaches, in which we have full information about the samples; (ii) semi-supervised ones, in which both labeled and unlabeled samples are used for training classifiers; and (iii) unsupervised techniques, in which no information about the training set is available [1].
Semi-supervised [2,3,4,5] and unsupervised [6,7,8,9] techniques are commonly represented by graphs, in which the dataset samples are the nodes and some kind of adjacency relation needs to be established. Zahn [7] proposed to compute a Minimum Spanning Tree (MST) of the whole graph and then remove some edges in order to partition the graph into clusters. Since the MST is a connected acyclic graph, any removed edge turns the graph into a forest (a collection of clusters, i.e., trees). The removed edges are called inconsistent edges and can be defined according to some heuristic; for instance, an edge may be considered inconsistent if and only if its weight is greater than the average weight of the edges in its neighborhood. This approach, however, does not work very well in real and complex situations. Essentially, graph-based approaches add or remove edges, trying to join or to separate the dataset into clusters [8].
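Zahn's heuristic above can be sketched as follows. This is a minimal illustration, not the original implementation; the toy 2-D dataset, the Euclidean distance, and the factor-of-two inconsistency threshold are our own illustrative assumptions.

```python
import math

def mst_edges(points, dist):
    """Prim's algorithm: return the edge list of a minimum spanning tree."""
    n = len(points)
    in_tree = {0}
    edges = []
    while len(in_tree) < n:
        u, v = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: dist(points[e[0]], points[e[1]]))
        edges.append((u, v))
        in_tree.add(v)
    return edges

def zahn_clusters(points, dist, factor=2.0):
    """Partition by deleting 'inconsistent' MST edges: edges whose weight
    exceeds `factor` times the average weight of the adjacent MST edges."""
    edges = mst_edges(points, dist)
    weight = {e: dist(points[e[0]], points[e[1]]) for e in edges}
    def adjacent_weights(e0):
        return [weight[e] for e in edges
                if e != e0 and (e0[0] in e or e0[1] in e)]
    kept = []
    for e in edges:
        nb = adjacent_weights(e)
        if not nb or weight[e] <= factor * sum(nb) / len(nb):
            kept.append(e)
    # connected components of the remaining forest are the clusters
    parent = list(range(len(points)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x
    for u, v in kept:
        parent[find(u)] = find(v)
    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())

pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
print(zahn_clusters(pts, math.dist))  # -> [[0, 1, 2], [3, 4]]
```

The long MST edge bridging the two groups is far heavier than its neighboring edges and is therefore removed, splitting the data into two clusters.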
Supervised techniques use a priori information about the dataset to create optimal decision boundaries, trying to separate the samples that share some characteristic from the remaining ones. Most of these techniques do not use graphs to model their problems, such as the widely used Artificial Neural Networks
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 195–204, 2009.
© Springer-Verlag Berlin Heidelberg 2009
196 J.P. Papa and A.X. Falcão
situations, but no learning algorithm had been developed for this latest variant. Thus, the main idea of this paper is to present a learning algorithm for this new variant of the OPF classifier; comparisons against the traditional OPF and Support Vector Machines are also discussed. The remainder of this paper is organized as follows: Sections 2 and 3 present, respectively, the new variant of the OPF classifier and its learning algorithm. Section 4 shows the experimental results and, finally, Section 5 discusses the conclusions.
where σ = df /3 and df is the maximum arc weight in (Z1 , Ak ). This parameter choice considers all nodes for density computation, since a Gaussian function covers most samples within d(s, t) ∈ [0, 3σ]. Although the density value ρ(s) is calculated with a Gaussian kernel, the use of the k-nn graph allows the proposed OPF to be robust to possible variations in the shape of the classes.
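The density computation can be sketched as follows. This is an unnormalized variant of the Gaussian kernel estimate, kept deliberately simple; the toy samples and the Euclidean distance are illustrative assumptions.

```python
import math

def knn_arcs(samples, k, dist):
    """Arc set A_k of the k-nn graph: each sample points to its k nearest neighbors."""
    arcs = {}
    for i in range(len(samples)):
        arcs[i] = sorted((j for j in range(len(samples)) if j != i),
                         key=lambda j: dist(samples[i], samples[j]))[:k]
    return arcs

def density(samples, arcs, dist):
    """Gaussian-kernel density rho(s) over the k-nn graph, with
    sigma = d_f / 3 where d_f is the maximum arc weight."""
    d_f = max(dist(samples[i], samples[j]) for i in arcs for j in arcs[i])
    sigma = d_f / 3.0
    return {i: sum(math.exp(-dist(samples[i], samples[j]) ** 2 / (2 * sigma ** 2))
                   for j in arcs[i]) / len(arcs[i])
            for i in arcs}

samples = [(0, 0), (0, 1), (1, 0), (5, 5)]
rho = density(samples, knn_arcs(samples, 2, math.dist), math.dist)
# the isolated point (5, 5) receives a much lower density than the tight group
```

Points inside a dense region obtain high ρ values and become candidate roots (maxima of the pdf), while outliers obtain low values.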
A sequence of adjacent samples defines a path πt , starting at a root R(t) ∈ Z1 and ending at a sample t. A path πt = ⟨t⟩ is said to be trivial when it consists of a single node. The concatenation of a path πs and an arc (s, t) defines an extended path πs · ⟨s, t⟩. We define f (πt ) such that its maximization for all nodes t ∈ Z1 results in an optimum-path forest with roots at the maxima
of the pdf, forming a root set R. We expect that each class be represented by
one or more roots (maxima) of the pdf. Each optimum-path tree in this forest
represents the influence zone of one root r ∈ R, which is composed of samples
more strongly connected to r than to any other root. We expect that the training samples of the same class will be assigned (classified) to an optimum-path tree rooted
at a maximum of that class. The path-value function is defined as follows.
f1 (⟨t⟩) = ρ(t) if t ∈ R, and ρ(t) − δ otherwise;
f1 (πs · ⟨s, t⟩) = min{f1 (πs ), ρ(t)}.
where δ = min(s,t)∈Ak |ρ(t)≠ρ(s) |ρ(t) − ρ(s)|. The root set R is obtained on-the-fly. The method uses the image foresting transform (IFT) algorithm [17] to maximize f1 (πt ) and obtain an optimum-path forest P — a predecessor map with no cycles that assigns to each sample t ∉ R its predecessor P (t) in the optimum path P ∗ (t) from R, or a marker nil when t ∈ R. The IFT algorithm
for (Z1 , Ak ) is presented below.
Algorithm 1 – IFT Algorithm
Input: A k-nn graph (Z1 , Ak ), λ(s) for all s ∈ Z1 , and path-value function f1 .
Output: Label map L, path-value map V , optimum-path forest P .
Auxiliary: Priority queue Q and variable tmp.
1. For each s ∈ Z1 , do
2. P (s) ← nil, L(s) ← λ(s), V (s) ← ρ(s) − δ
3. and insert s in Q.
4. While Q is not empty, do
5. Remove from Q a sample s such that V (s) is
6. maximum.
7. If P (s) = nil, then V (s) ← ρ(s).
8. For each t ∈ Ak (s) and V (t) < V (s), do
9. tmp ← min{V (s), ρ(t)}.
10. If tmp > V (t) then
11. L(t) ← L(s), P (t) ← s, V (t) ← tmp.
12. Update position of t in Q.
Initially, all paths are trivial, with values f1 (⟨t⟩) = ρ(t) − δ (Line 2). The global maxima of the pdf are the first to be removed from Q. They are identified as roots of the forest by the test P (s) = nil in Line 7, where we set their correct path value f1 (⟨s⟩) = V (s) = ρ(s). Each node s removed from Q offers a path πs · ⟨s, t⟩ to each adjacent node t in the loop from Line 8 to Line 12. If the path value f1 (πs · ⟨s, t⟩) = min{V (s), ρ(t)} (Line 9) is better than the current path value f1 (πt ) = V (t) (Line 10), then πt is replaced by πs · ⟨s, t⟩ (i.e., P (t) ← s), and the path value and label of t are updated accordingly (Line 11). Local maxima of the
pdf are also discovered as roots during the algorithm. The algorithm also outputs
an optimum-path value map V and a label map L, wherein the true labels of the corresponding roots are propagated to every sample t. A classification error in the training set occurs when the final L(t) ≠ λ(t). We define the best value k ∗ ∈ [1, kmax ] as the one which maximizes the accuracy Acc of classification in the training set. The accuracy is defined as follows.
Let N Z1 (i), i = 1, 2, . . . , c, be the number of samples in Z1 from each class i. We define

ei,1 = F P (i) / (|Z1 | − |N Z1 (i)|)  and  ei,2 = F N (i) / |N Z1 (i)|,  (3)
where F P (i) and F N (i) are the false positives and false negatives, respectively.
That is, F P (i) is the number of samples from other classes that were classified
as being from the class i in Z1 , and F N (i) is the number of samples from the
class i that were incorrectly classified as being from other classes in Z1 . The
errors ei,1 and ei,2 are used to define

E(i) = ei,1 + ei,2 , i = 1, 2, . . . , c, (4)

where E(i) is the partial error sum of class i. Finally, the accuracy Acc of the
classification is written as
Acc = ( 2c − Σi=1..c E(i) ) / 2c = 1 − ( Σi=1..c E(i) ) / 2c. (5)
The accuracy Acc is measured by taking into account that the classes may have different sizes in Z1 (a similar definition applies to Z2 ). If there are, for example, two classes with very different sizes and the classifier always assigns the label of the largest class, its accuracy will fall drastically due to the high error rate on the smallest class.
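The accuracy of Eqs. (3)–(5) can be computed as follows; this is a minimal sketch with illustrative toy labels. Note how a classifier that always predicts the majority class scores only 50% balanced accuracy despite being right 90% of the time:

```python
def opf_accuracy(true_labels, pred_labels, classes):
    """Balanced accuracy of Eqs. (3)-(5): averages per-class error rates,
    so a classifier that ignores a small class is penalized heavily."""
    n = len(true_labels)
    E = []
    for c in classes:
        n_c = sum(1 for t in true_labels if t == c)
        fp = sum(1 for t, p in zip(true_labels, pred_labels) if t != c and p == c)
        fn = sum(1 for t, p in zip(true_labels, pred_labels) if t == c and p != c)
        E.append(fp / (n - n_c) + fn / n_c)   # E(i) = e_{i,1} + e_{i,2}
    return 1.0 - sum(E) / (2 * len(classes))  # Eq. (5)

# 9 samples of class 'a', 1 of class 'b'; always predicting 'a' is
# 90% "plain" accuracy but only 50% balanced accuracy
truth = ['a'] * 9 + ['b']
pred = ['a'] * 10
print(opf_accuracy(truth, pred, ['a', 'b']))  # -> 0.5
```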
It is expected that each class be represented by at least one maximum of the
pdf and L(t) = λ(t) for all t ∈ Z1 (zero classification errors in the training set).
However, these properties cannot be guaranteed with path-value function f1 and the best value k ∗ . In order to ensure them, we first find the best value k ∗
using function f1 and then execute Algorithm 1 one more time using path-value
function f2 instead of f1 .
f2 (⟨t⟩) = ρ(t) if t ∈ R, and ρ(t) − δ otherwise;
f2 (πs · ⟨s, t⟩) = −∞ if λ(t) ≠ λ(s), and min{f2 (πs ), ρ(t)} otherwise. (6)

Equation 6 weights all arcs (s, t) ∈ Ak such that λ(t) ≠ λ(s) with −∞, constraining optimum paths within the correct class of their nodes.
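Algorithm 1 can be transcribed compactly in Python. This is a sketch assuming the k-nn graph, densities ρ, true labels λ, and δ are given; a lazy-deletion max-heap stands in for the priority queue Q.

```python
import heapq

def ift(adj, rho, lam, delta):
    """Algorithm 1 (IFT) on a k-nn graph.
    adj[s]: neighbors of s; rho[s]: pdf value; lam[s]: true label lambda(s);
    delta: smallest nonzero density difference.
    Returns label map L, path-value map V, predecessor map P."""
    P = {s: None for s in adj}          # nil predecessors
    L = dict(lam)
    V = {s: rho[s] - delta for s in adj}
    Q = [(-V[s], s) for s in adj]       # max-heap via negated values
    heapq.heapify(Q)
    done = set()
    while Q:
        neg, s = heapq.heappop(Q)
        if s in done or -neg != V[s]:   # lazy deletion of stale entries
            continue
        done.add(s)
        if P[s] is None:                # s is a root: a maximum of the pdf
            V[s] = rho[s]
        for t in adj[s]:
            if t in done:
                continue
            tmp = min(V[s], rho[t])     # value of the extended path
            if tmp > V[t]:
                L[t], P[t], V[t] = L[s], s, tmp
                heapq.heappush(Q, (-tmp, t))
    return L, V, P

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
rho = {0: 1.0, 1: 0.8, 2: 0.9, 3: 0.7}
lam = {0: 'A', 1: 'A', 2: 'B', 3: 'B'}
L, V, P = ift(adj, rho, lam, 0.1)
# two roots (0 and 2) emerge; each propagates its label to its influence zone
```

In this toy run, nodes 0 and 2 are the two pdf maxima; each becomes a root and conquers its neighbor, reproducing the expected class partition.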
The training process in our method can be summarized by Algorithm 2.
Algorithm 2 – Training
Input: Training set Z1 , λ(s) for all s ∈ Z1 , kmax and path-value functions f1
and f2 .
Output: Label map L, path-value map V , optimum-path forest P .
Auxiliary: Variables i, k, k ∗ , MaxAcc ← −∞, Acc, and arrays F P and F N of size c.
1. For k = 1 to kmax do
2. Create graph (Z1 , Ak ) weighted on nodes by Eq. 1.
3. Compute (L, V, P ) using Algorithm 1 with f1 .
4. For each class i = 1, 2, . . . , c, do
5. F P (i) ← 0 and F N (i) ← 0.
6. For each sample t ∈ Z1 , do
7. If L(t) ≠ λ(t), then
8. F P (L(t)) ← F P (L(t)) + 1.
9. F N (λ(t)) ← F N (λ(t)) + 1.
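The loop of Algorithm 2 can be summarized by the following high-level sketch; `run_ift` and `accuracy` are hypothetical stand-ins for Algorithm 1 and Eq. (5), respectively, not part of the original listing.

```python
def train_opf(kmax, run_ift, accuracy):
    """Sketch of Algorithm 2: sweep k = 1..kmax, score each k-nn graph
    (with path-value function f1) by training-set accuracy, keep the best
    k, then run the IFT once more with f2 on that graph.
    run_ift(k, f) -> label map; accuracy(L) -> scalar (both assumed)."""
    best_k, best_acc = None, float('-inf')
    for k in range(1, kmax + 1):
        acc = accuracy(run_ift(k, 'f1'))
        if acc > best_acc:
            best_k, best_acc = k, acc
    return best_k, run_ift(best_k, 'f2')

# toy stand-ins: pretend k = 3 yields the best training accuracy
run_ift = lambda k, f: {'k': k, 'f': f}
accuracy = lambda L: -abs(L['k'] - 3)
best_k, forest = train_opf(5, run_ift, accuracy)
print(best_k, forest)  # -> 3 {'k': 3, 'f': 'f2'}
```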
Let the node s∗ ∈ Z1 be the one that satisfies the above equation. Given that
L(s∗ ) = λ(R(t)), the classification simply assigns L(s∗ ) to t.
4 Experimental Results
We performed two rounds of experiments: in the first one we used the OPFcpl ,
OPFknn and SVM 10 times to compute their accuracies, using different ran-
domly selected training (Z1 ) and test (Z2 ) sets. In the second round, we executed
Fig. 2. Samples from the MPEG-7 shape dataset: (a)-(c) Fish and (d)-(f) Camel
the above algorithms again, but this time submitting them to the learning algorithm. In this case, the datasets were divided into three parts: a training set
Z1 with 30% of the samples, an evaluation set Z3 with 20% of the samples,
and a test set Z2 with 50% of the samples. Section 4.1 presents the accuracy
results of training on Z1 and testing on Z2 . The accuracy results of training
on Z1 , with learning from the errors in Z3 , and testing on Z2 are presented in
Section 4.2.
The experiments used some combinations of public datasets — CONE TORUS
(2D points)(Figure 1a), SATURN (2D points) (Figure 1b), MPEG-7 (shapes)
(Figure 2) and BRODATZ (textures) — and descriptors — Fourier Coefficients
(FC), Texture Coefficients (TC), and Moment Invariants (MI). A detailed expla-
nation of them can be found in [20,15]. The results in Tables 1 and 2 are displayed
in the following format: x(y), where x and y are, respectively, mean accuracy and
its standard deviation. The percentages of samples in Z1 and Z2 were 50% and 50%
for all datasets.
We can observe that the conclusions drawn from Table 2 remain the same with
respect to the overall performance of the classifiers. In most cases, the general
learning algorithm improved the performance of the classifiers with respect to their results in Table 1, i.e., it is possible for a given classifier to learn from its own errors.
5 Conclusion
The OPF classifiers are a novel collection of graph-based classifiers that offer some advantages over commonly used classifiers: they make no assumptions about the shape or separability of the classes and have a faster training phase. There are currently two variants of OPF-based classifiers, OPFcpl and OPFknn; the difference between them lies in the adjacency relation, the prototype estimation, and the path-cost function.
We have shown here how an OPF-based classifier can learn from its own errors by introducing a learning algorithm for OPFknn, whose classification results were good and similar to those reported by the traditional OPF (OPFcpl) and SVM approaches. Moreover, the OPF classifiers are about 50 times faster than SVM for training. It is also important to note that the good accuracy of SVM was due to parameter optimization. One can see that the OPFknn learning algorithm improved its results, in some cases by up to 3%, without increasing its training set size.
References
1. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley-
Interscience, Hoboken (2000)
2. Kulis, B., Basu, S., Dhillon, I., Mooney, R.: Semi-supervised graph clustering: a kernel approach. In: ICML 2005: Proc. of the 22nd ICML, pp. 457–464. ACM, New York (2005)
3. Schölkopf, B., Zhou, D., Hofmann, T.: Semi-supervised learning on directed graphs.
In: Adv. in Neural Information Processing Systems, pp. 1633–1640 (2005)
4. Callut, J., Françoisse, K., Saerens, M.: Semi-supervised classification in graphs using
bounded random walks. In: Proceedings of the 17th Annual Machine Learning
Conference of Belgium and the Netherlands (Benelearn), pp. 67–68 (2008)
5. Kumar, N., Kummamuru, K.: Semisupervised clustering with metric learning us-
ing relative comparisons. IEEE Transactions on Knowledge and Data Engineer-
ing 20(4), 496–503 (2008)
6. Hubert, L.J.: Some applications of graph theory to clustering. Psychometrika 39(3),
283–309 (1974)
7. Zahn, C.T.: Graph-theoretical methods for detecting and describing gestalt clus-
ters. IEEE Transactions on Computers C-20(1), 68–86 (1971)
8. Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall, Inc., Upper
Saddle River (1988)
9. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Transactions on
Pattern Analysis and Machine Intelligence 22(8), 888–905 (2000)
10. Haykin, S.: Neural networks: a comprehensive foundation. Prentice Hall, Engle-
wood Cliffs (1994)
11. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal mar-
gin classifiers. In: Proceedings of the 5th Workshop on Computational Learning
Theory, pp. 144–152. ACM Press, New York (1992)
12. Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley-
Interscience, Hoboken (2004)
13. Reyzin, L., Schapire, R.E.: How boosting the margin can also boost classifier com-
plexity. In: Proceedings of the 23rd International Conference on Machine Learning,
pp. 753–760. ACM Press, New York (2006)
14. Duan, K., Keerthi, S.S.: Which is the best multiclass SVM method? An empirical
study. In: Oza, N.C., Polikar, R., Kittler, J., Roli, F. (eds.) MCS 2005. LNCS,
vol. 3541, pp. 278–285. Springer, Heidelberg (2005)
15. Papa, J.P., Falcão, A.X., Suzuki, C.T.N., Mascarenhas, N.D.A.: A discrete approach
for supervised pattern recognition. In: Brimkov, V.E., Barneva, R.P., Hauptman, H.A.
(eds.) IWCIA 2008. LNCS, vol. 4958, pp. 136–147. Springer, Heidelberg (2008)
16. Papa, J.P., Falcão, A.X.: A new variant of the optimum-path forest classifier. In:
4th International Symposium on Visual Computing, pp. I: 935–944 (2008)
17. Falcão, A.X., Stolfi, J., Lotufo, R.A.: The image foresting transform: Theory, al-
gorithms, and applications. IEEE Transactions on Pattern Analysis and Machine
Intelligence 26(1), 19–29 (2004)
18. Papa, J.P., Suzuki, C.T.N., Falcão, A.X.: LibOPF: A library for the
design of optimum-path forest classifiers, Software version 1.0 (2008),
http://www.ic.unicamp.br/~afalcao/LibOPF
19. Chang, C.C., Lin, C.J.: LIBSVM: A Library for Support Vector Machines (2001),
http://www.csie.ntu.edu.tw/~cjlin/libsvm
20. Montoya-Zegarra, J.A., Papa, J.P., Leite, N.J., Torres, R.S., Falcão, A.X.: Learning
how to extract rotation-invariant and scale-invariant features from texture images.
EURASIP Journal on Advances in Signal Processing, 1–16 (2008)
Improving Graph Classification by Isomap
K. Riesen, V. Frinken, and H. Bunke
1 Introduction
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 205–214, 2009.
© Springer-Verlag Berlin Heidelberg 2009
206 K. Riesen, V. Frinken, and H. Bunke
underlying patterns. However, given a distance measure for graphs, this obstacle
can be overcome with Isomap by discovering and exploiting possible manifolds
in the underlying graph domain. The geodesic distance approximated by Isomap
may be more appropriate for certain pattern recognition tasks than the original
graph dissimilarities. The requirement for making Isomap applicable in graph
domains is the existence of a distance, or dissimilarity, measure for graphs. In
the context of the work described in this paper, the edit distance is used [4].
Compared to other approaches, graph edit distance is known to be very flexible.
Furthermore, graph edit distance is an error-tolerant dissimilarity measure and
is therefore able to cope well with distorted data.
To analyze the applicability of Isomap, a new family of graph kernels is used
in the present paper. The key idea of graph kernels is to map graphs implicitly
to a vector space where the pattern recognition task is eventually carried out.
Kernel functions can be interpreted as similarity measures. Hence, given the
graph edit distance, one can apply monotonically decreasing functions mapping
low edit distance values to high kernel values and vice versa. Rather than deriv-
ing a kernel from the original edit distances, the graph kernel proposed in this
paper is based on graph distances obtained from the original graph edit distance
through Isomap. That is, before the transformation from graph edit distance into
a kernel value is carried out, Isomap is applied to the graphs and their respec-
tive distances. In an experimental evaluation involving several graph data sets
of diverse nature, we investigate the question whether it is beneficial to employ
Isomap rather than the original edit distances for such a kernel construction.
In [5] a strategy similar to Isomap is applied to trees. In this work the geodesic
distances between the tree’s nodes are computed resulting in a distance matrix
for each individual tree. Via multidimensional scaling the nodes are eventually
embedded in a Euclidean space where spectral methods are applied in order
to analyze and cluster the underlying tree models. Note the difference to our
method where Isomap is applied to a whole set of graphs rather than to a
set of nodes. Furthermore, in our approach the resulting Isomap distances are
directly used in a distance based graph kernel rather than computing spectral
characterizations of the MDS-embedded nodes. Finally, as graph edit distance is
employed as pairwise dissimilarity measure, our approach can handle arbitrary
graphs with any type of node and edge labels.
The remainder of this paper is structured as follows. In the next section the
Isomap transformation is described in detail. In Sect. 3 similarity kernels based
on graph distances are introduced. An experimental evaluation of the proposed
Isomap framework is presented in Sect. 4. Finally, in Sect. 5 we draw some
conclusions from this work.
2 Isomap Transformation
One way of exploiting the manifold's structure is by letting paths between two points only traverse on the manifold, which is defined by areas of high data density. Given these paths, a new distance d̂(xi , xj ) between data points xi and xj , termed Isomap distance, can be defined.
The data density is determined by the closeness of elements in the feature
space. Two close points have an Isomap distance equal to the original distance,
since they are from the same area of the manifold. A valid Isomap path along
the manifold can therefore be constructed as a concatenation of subpaths within
areas of a high data density. Of course, one needs to be careful not to create
disconnected areas. Closeness needs therefore to be defined in such a way that
local structures can be exploited, but at the same time outliers as well as distant
areas must still be connected. In this paper, closeness is induced via an auxiliary
graph termed k-nearest neighbor graph (k-NN graph).
Definition 1 (k-NN Graph). Given a set of input patterns X = {x1 , . . . , xn }
and a corresponding distance measure d : X × X → R, the k-NN graph G =
(V, E, d) with respect to X is defined as an auxiliary graph where the nodes
represent input patterns, i.e. V = X . Two nodes xi and xj are connected by an
edge (xi , xj ) ∈ E if xj is among the k nearest patterns of xi according to d. The
edge (xi , xj ) ∈ E is labeled with the corresponding distance d(xi , xj ).
Note that, according to this definition, the k-NN graph G is directed. In order to
obtain an undirected graph, for each edge (xi , xj ) an identically labeled reverse
edge (xj , xi ) is inserted in G. The Isomap distance between two patterns xi and
xj is then defined as the minimum length of all paths between them on the k-NN
graph.
Definition 2 (Isomap Distance). Given is a set of input patterns X =
{x1 , . . . , xn } with a distance function d : X × X → R and the k-NN graph
G = (V, E, d) defined with respect to X . A valid path p between two patterns xi , xj ∈ X is a sequence (pi )i=1,...,lp of length lp ∈ N of patterns pi ∈ X , with p1 = xi and plp = xj , such that (pi−1 , pi ) ∈ E for all i = 2, . . . , lp . The Isomap distance d̂(·, ·) between two patterns xi and xj is then given by

d̂(xi , xj ) = minp Σi=2..lp d(pi−1 , pi )
On the k-NN graph G, the Isomap distances d̂ can be efficiently computed with Dijkstra's algorithm [6] as the shortest paths in G. The complete algorithm is described in Alg. 1.
Any new data point x ∉ X can be added in a simple way in O(1), provided that only Isomap distances starting at the new point x are required, as would be the case when classifying a new graph. Since the k-nearest neighbors define the
direct neighborhood, all valid Isomap paths starting in x must pass through one
of its k-nearest neighbors. Therefore it is sufficient to connect the new element
Algorithm 1. Isomap(X , k)
Input: X = {x1 , . . . , xn }, k
Output: Pairwise Isomap distances d̂ij
with these nearest neighbors to compute the correct Isomap distances from x to
all other points in the graph.
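The procedure of Alg. 1 can be sketched as follows: a symmetrized k-NN graph followed by Dijkstra's algorithm from every source. The one-dimensional toy data and the distance function are illustrative assumptions.

```python
import heapq

def knn_graph(X, d, k):
    """Undirected k-NN graph: connect every pattern to its k nearest
    neighbors and insert the identically weighted reverse edge."""
    adj = {i: {} for i in range(len(X))}
    for i in range(len(X)):
        near = sorted((j for j in range(len(X)) if j != i),
                      key=lambda j: d(X[i], X[j]))[:k]
        for j in near:
            adj[i][j] = adj[j][i] = d(X[i], X[j])
    return adj

def isomap_distances(X, d, k):
    """Pairwise Isomap distances d_hat: shortest-path lengths on the
    k-NN graph, computed with Dijkstra's algorithm from every source."""
    adj = knn_graph(X, d, k)
    d_hat = {}
    for src in adj:
        D, pq = {src: 0.0}, [(0.0, src)]
        while pq:
            du, u = heapq.heappop(pq)
            if du > D[u]:
                continue                      # stale queue entry
            for v, w in adj[u].items():
                if du + w < D.get(v, float('inf')):
                    D[v] = du + w
                    heapq.heappush(pq, (du + w, v))
        d_hat[src] = D
    return d_hat

# four points on a line: neighbors keep their original distance, while
# far-apart points are measured along the manifold (here, the line itself)
X = [0.0, 1.0, 2.0, 3.0]
d_hat = isomap_distances(X, lambda a, b: abs(a - b), k=1)
print(d_hat[0][3])  # -> 3.0
```

Close points keep their original distance, as stated above, while distant points accumulate the lengths of the hops along the k-NN graph.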
Obviously, the Isomap distance d̂ crucially depends on the meta parameter k. That is, k has to be chosen sufficiently high such that G is connected, i.e. each pair of patterns (xi , xj ) is connected by at least one path. We denote this minimum value by kmin . Conversely, if k = n, i.e. if G is complete, the Isomap distance d̂ will be equal to the original distance d and no additional information is gained by Isomap. Hence, the optimal value for k lies somewhere in the interval [kmin , n] and needs to be determined on an independent validation set.
In this section the concept of graph edit distance and its transformation into a
kernel function is described in detail.
Definition 3 (Graph). Let LV and LE be sets of labels for nodes and edges,
respectively. A graph g is defined by the four-tuple g = (V, E, μ, ν), where V is
the finite set of nodes, E ⊆ V × V is the set of edges, μ : V → LV is the node
labeling function, and ν : E → LE is the edge labeling function.
The edit distance of two graphs g1 and g2 is defined as

d(g1 , g2 ) = min(e1 ,...,ek ) ∈ Υ (g1 ,g2 ) Σi=1..k c(ei ),

where Υ (g1 , g2 ) denotes the set of edit paths transforming g1 into g2 , and c denotes the edit cost function measuring the strength c(ei ) of edit operation ei .
Optimal algorithms for computing the edit distance of graphs are typically based
on combinatorial search procedures that explore the space of all possible map-
pings of the nodes and edges of the first graph to the nodes and edges of the
second graph [4]. A major drawback of those procedures is their computational
complexity, which is exponential in the number of nodes of the involved graphs.
However, efficient suboptimal methods for graph edit distance computation have
been proposed [7].
Clearly, the Isomap procedure described in Sect. 2 in conjunction with the
graph edit distance d can be applied to any graph set. The Isomap graph edit distance d̂ between two graphs gi and gj is the minimum amount of distortion
applied to gi such that the edit path to gj passes only through areas of the
input space where elements of the training set can be found. Hence, all of the
intermediate graphs created in the process of editing gi into gj are similar or
equal to those graphs in the training set.
a shortcut (commonly referred to as kernel trick ) that eliminates the need for
computing ϕ(·) explicitly. What makes kernel theory interesting is the fact that
many pattern recognition algorithms can be kernelized, i.e. formulated in such
a way that no individual patterns, but only dot products of vectors are needed.
Such algorithms together with an appropriate kernel function are referred to as
kernel machines. In the context of kernel machines, the kernel trick allows us
to address any given recognition problem originally defined in a graph space G
in an implicitly existing vector space Rn instead, without explicitly performing
the mapping from G to Rn . As we are mainly concerned with the problem of
graph classification in this paper, we will focus on kernel machines for pattern
classification, in particular on support vector machines (SVM).
A number of kernel functions have been proposed for graphs [8,9,10,11]. Yet,
these kernels are to a large extent applicable to unlabeled graphs only or unable
to deal sufficiently well with strongly distorted data. In this section, a kernel
function is described that is derived from graph edit distance. The basic rationale
for the definition of such a kernel is to bring together the flexibility of edit
distance based graph matching and the power of SVM based classification [8].
Graph kernel functions can be seen as graph similarity measures satisfying
certain conditions, viz. symmetry and positive definiteness [12]. Such kernels are
commonly referred to as valid graph kernels. Given the dissimilarity information
of graph edit distance, a possible way to construct a kernel is to apply a mono-
tonically decreasing function mapping high dissimilarity values to low similarity
values and vice versa.
Formally, given such a dissimilarity value v(g1 , g2 ) between graphs g1 , g2 ∈ G, we define a kernel function κv : G × G → R as

κv (g1 , g2 ) = exp(−γ · v(g1 , g2 )),

where γ > 0.
Although this approach will not generally result in valid kernel functions,
i.e. functions satisfying the conditions of symmetry and positive definiteness,
there exists theoretical evidence suggesting that training an SVM with such a
kernel function can be interpreted as the maximal separation of convex hulls in
pseudo-Euclidean spaces [13].
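The monotonically decreasing mapping above can be realized, for instance, with the exponential form κ(g1 , g2 ) = exp(−γ · v(g1 , g2 )). The following sketch assumes this form together with a toy dissimilarity matrix; both are illustrative assumptions.

```python
import math

def kernel_matrix(dist, gamma):
    """Map a pairwise dissimilarity matrix to similarities with the
    monotonically decreasing function v -> exp(-gamma * v)."""
    return [[math.exp(-gamma * v) for v in row] for row in dist]

# toy pairwise dissimilarities (e.g., graph edit distances)
D = [[0.0, 1.0, 4.0],
     [1.0, 0.0, 3.0],
     [4.0, 3.0, 0.0]]
K = kernel_matrix(D, gamma=0.5)
# identical graphs map to the maximal similarity 1.0; the matrix K can be
# handed to an SVM in precomputed-kernel mode
```

In this scheme a distance of zero yields the maximal kernel value 1, and the ordering of similarities is the reverse of the ordering of dissimilarities, as the construction requires.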
4 Experimental Results
For our experimental evaluation, five graph data sets from the IAM graph database repository are used1 . For lack of space we give only a short description of the data; for a more detailed description we refer to [14].
The first data set used in the experiments consists of graphs representing dis-
torted letter drawings out of 15 classes (Letter ). Next we apply the proposed
method to the problem of fingerprint classification using graphs that represent
1 Note that all data sets are publicly available under http://www.iam.unibe.ch/fki/databases/iam-graph-database
Fig. 1. Five classes of the Letter data set before and after Isomap (plotted via MDS)
fingerprint images out of the four classes arch, left loop, right loop, and whorl
(Fingerprint ). Elements from the third graph set belong to two classes (active,
inactive) and represent molecules with activity against HIV or not (Molecule).
The fourth data set also consists of graphs representing molecular compounds.
However, these molecules belong to one of the two classes mutagen or non-
mutagen (Mutagenicity). The last data set consists of graphs representing web-
pages that belong to 20 different categories (Business, Health, Politics, . . .)
(Web). All data sets are divided into three disjoint subsets, i.e. a training, a
validation, and a test set.
The aim of the experiments is to investigate the impact of Isomap graph edit
distances on the classification performance. The original edit distance d and the Isomap distance d̂, used as dissimilarity values, give rise to two different kernels κd and κd̂ , which are compared against each other.
Multidimensional scaling (MDS), which maps a set of pairwise distances into
an n-dimensional vector space, allows one to get a visual impression of the trans-
formation induced by Isomap. A subset of different classes is plotted before and
after the Isomap transformation in Fig. 1 for the Letter data set. The advantage
of better separability after the transformation can be seen clearly.
For the reference system two meta parameters have to be validated, viz. C
and γ. The former parameter is a weighting parameter for the SVM, which
controls whether the maximization of the margin or the minimization of the
error is more important. The second parameter γ is the weighting parameter in
the kernel function. Both parameters are optimized on the validation set and
eventually applied to the independent test set. For our novel approach with Isomap graph edit distances d̂, an additional meta parameter has to be tuned, namely k, which regulates how many neighbors are taken into account when the
k-NN graph is constructed for the Isomap procedure. The optimization of the
parameter pair (C, γ) is performed on various Isomap edit distances, varying k
in a certain interval. Thus, the optimized classification accuracy with respect to
(C, γ) (illustrated in Fig. 2 (a)) can be regarded as a function of k (illustrated
in Fig. 2 (b)).
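The described model selection can be sketched as a grid search over k, C, and γ; the scoring function below is purely hypothetical and stands in for validation-set accuracy.

```python
import itertools

def select_parameters(validation_acc, ks, Cs, gammas):
    """Sketch of the described model selection: for every k (size of the
    k-NN graph), optimize (C, gamma) on the validation set and keep the
    overall best configuration. validation_acc is an assumed callable."""
    return max((validation_acc(k, C, g), k, C, g)
               for k, C, g in itertools.product(ks, Cs, gammas))

# hypothetical scoring function, purely for illustration
score = lambda k, C, g: 0.80 + 0.01 * (k == 5) + 0.005 * (C == 10)
best = select_parameters(score, ks=[3, 5, 7], Cs=[1, 10], gammas=[0.1, 1.0])
print(best)  # best[1:3] == (5, 10): k = 5 with C = 10 wins here
```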
Table 1. Classification results of an SVM on the validation set (va) and the test set
(te). The reference system uses a kernel κd based on the original graph edit distances
d, while the novel kernel κd̂ is based on Isomap distances dˆ computed on a k-NN
graph (the optimal value for k is indicated for each data set). On all but the Web data
set an improvement of the classification accuracy can be observed – two out of four
improvements are statistically significant.
Data set va te va te k
◦ Statistically significant improvement over the reference system (Z-test with α = 0.05).
• Statistically significant deterioration over the reference system (Z-test with α = 0.05).
In Table 1 the classification accuracy on all data sets achieved by the reference
system and our novel approach are provided for both the validation set and the
test set. Additionally, the number of considered neighbors in the k-NN graph is
indicated. On the validation sets we observe that in three out of five cases our
novel approach achieves equal or better classification results than the reference
method. In the test case, on four out of five data sets the kernel based on Isomap
graph edit distances outperforms the original kernel. Two of these improvements
are statistically significant. Overall, only one deterioration is observed with our novel approach. Hence, we conclude that it is clearly beneficial to apply Isomap
to the edit distances before the transformation to a kernel is carried out.
5 Conclusions
In the present paper a graph kernel based on graph edit distances is extended
such that pairwise edit distances are non-linearly transformed before they are
turned into kernel values. For the non-linear mapping the Isomap procedure is used. This procedure builds an auxiliary graph, the so-called k-NN graph, where the nodes represent the underlying objects (graphs) and the edges connect a particular object with its k nearest neighbors according to graph edit
distance. Based on this neighborhood graph, the shortest path between two en-
tities, computed by Dijkstra’s algorithm, is used as new pairwise distance. In
the experimental section of the present paper, a classification task is carried out
on five different graph data sets. As classifier, an SVM is employed. The reference
system’s kernel is derived from the original graph edit distances while the novel
kernel is derived from Isomap graph edit distances. The SVM based on the latter
kernel outperforms the former kernel on four out of five data sets (twice with
statistical significance).
Acknowledgments
We would like to thank Bernard Haasdonk and Michel Neuhaus for valuable discussions
and hints regarding our similarity kernel. This work has been supported by the
Swiss National Science Foundation (Project 200021-113198/1) and by the Swiss
National Center of Competence in Research (NCCR) on Interactive Multimodal
Information Management (IM2).
References
1. Tenenbaum, J., de Silva, V., Langford, J.: A global geometric framework for non-
linear dimensionality reduction. Science 290, 2319–2323 (2000)
2. Saul, L., Weinberger, K., Sha, F., Ham, J., Lee, D.: Spectral Methods for Dimen-
sionality Reduction. In: Semi-Supervised Learning. MIT Press, Cambridge (2006)
3. Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty years of graph matching
in pattern recognition. Int. Journal of Pattern Recognition and Artificial Intelli-
gence 18(3), 265–298 (2004)
4. Bunke, H., Allermann, G.: Inexact graph matching for structural pattern recogni-
tion. Pattern Recognition Letters 1, 245–253 (1983)
5. Xiao, B., Torsello, A., Hancock, E.R.: Isotree: Tree clustering via metric embedding.
Neurocomputing 71, 2029–2036 (2008)
6. Dijkstra, E.: A note on two problems in connexion with graphs. Numerische Math-
ematik 1, 269–271 (1959)
7. Riesen, K., Neuhaus, M., Bunke, H.: Bipartite graph matching for computing the
edit distance of graphs. In: Escolano, F., Vento, M. (eds.) GbRPR 2007. LNCS,
vol. 4538, pp. 1–12. Springer, Heidelberg (2007)
8. Neuhaus, M., Bunke, H.: Bridging the Gap Between Graph Edit Distance and
Kernel Machines. World Scientific, Singapore (2007)
9. Gärtner, T.: Kernels for Structured Data. World Scientific, Singapore (2008)
10. Kashima, H., Tsuda, K., Inokuchi, A.: Marginalized kernels between labeled
graphs. In: Proc. 20th Int. Conf. on Machine Learning, pp. 321–328 (2003)
11. Jain, B., Geibel, P., Wysotzki, F.: SVM learning with the Schur-Hadamard inner
product for graphs. Neurocomputing 64, 93–105 (2005)
214 K. Riesen, V. Frinken, and H. Bunke
12. Berg, C., Christensen, J., Ressel, P.: Harmonic Analysis on Semigroups. Springer,
Heidelberg (1984)
13. Haasdonk, B.: Feature space interpretation of SVMs with indefinite kernels. IEEE
Transactions on Pattern Analysis and Machine Intelligence 27(4), 482–492 (2005)
14. Riesen, K., Bunke, H.: IAM graph database repository for graph based pattern
recognition and machine learning. In: da Vitoria Lobo, N., et al. (eds.) Struc-
tural, Syntactic, and Statistical Pattern Recognition. LNCS, vol. 5342, pp. 287–297.
Springer, Heidelberg (2008)
On Computing Canonical Subsets of Graph-Based
Behavioral Representations
Drexel University
Department of Computer Science
Philadelphia PA 19104, USA
{walt,pjb38,ashokouf,salvucci}@drexel.edu
1 Introduction
In many domains involving the analysis of human behavior, data are often collected
in the form of time-series known as behavioral protocols — sequences of actions per-
formed during the execution of a task. Behavioral protocols offer a rich source of in-
formation about human behavior and have been used, for example, to examine how
computer users perform basic tasks (e.g., [1]), how math students solve algebra prob-
lems (e.g., [2]), and how drivers steer a vehicle down the road (e.g., [3]). However, the
many benefits of behavioral protocols come with one significant limitation: The typi-
cally sizable amount of data often makes it difficult, if not impossible, to analyze the
data manually. At times, researchers have tried to overcome this limitation by using
some form of aggregation in order to make sense of the data (e.g., [4,5]). While this
aggregation has its merits in seeing overall behavior, it masks potentially interesting
patterns in individuals and subsets of individuals. Alternatively, researchers have some-
times laboriously studied individual protocols by hand to identify interesting behaviors
(e.g. [6,7]). Although some work has been done on automated protocol analysis, such
techniques focus on matching observed behaviors to the predictions of a step-by-step
process model (e.g. [8,9]), and often such models are not available and/or their devel-
opment is infeasible given the complexity of the behaviors.
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 215–222, 2009.
© Springer-Verlag Berlin Heidelberg 2009
216 W.C. Mankowski et al.
In our previous work we have introduced the notion of canonical behaviors as a novel
way of providing automated analysis of behavioral protocols [10]. Canonical behaviors
are a small subset of behavioral protocols that is most representative of the full data set,
providing a reasonable “big picture” view of the data with as few protocols as possible.
In contrast with previous techniques, our method identifies the canonical behavior pat-
terns without any a priori step-by-step process model; all that is needed is a similarity
measure between pairs of behaviors. To illustrate our approach in a real-world domain,
we applied the method to the domain of web browsing. We found that the canonical
browsing paths found by our algorithm compared well with those identified by two ex-
pert human coders with significant experience in cognitive task analysis and modeling.
However, our technique was limited by the fact that our similarity measure treated each
browsing path as a string, ignoring the underlying graph structure of the web site. In
this paper we explore a graph-based similarity measure which takes into account the
effects of graph topology when computing the similarity between two patterns.
The remainder of this paper is structured as follows. In Sect. 2 we describe our
canonical set algorithm and our new similarity metric. In Sect. 3 we review our web
browsing experiment from [10]. In Sect. 4 we compare the results of our new metric
with those from our previous experiment. Finally in Sect. 5 we summarize our findings
and discuss possible future directions of research.
At a high level, our goal in finding canonical behavior patterns is to reduce a large set
of protocols to a smaller subset that is most representative of the full data set. We define
a canonical set of behaviors as a subset such that the behaviors within the subset are
minimally similar to each other and are maximally similar to those behaviors not in the
subset.
Our technique for finding canonical behavior patterns derives from work on the
canonical set problem. Given a set of patterns $P = \{p_1, \ldots, p_n\}$ and a similarity function $S : P \times P \to \mathbb{R}_{\geq 0}$, the canonical set problem is to find a subset $P' \subseteq P$ that best characterizes the elements of $P$ with respect to $S$. The key aspects of our method are an
approximation algorithm for the canonical set problem, and the specification of an ap-
propriate similarity metric for the particular problem being modeled. We now describe
each in turn.
Exact solutions to the canonical set problem require integer programming, which is
known to be NP-Hard [11]. Denton et al. [12] have developed an approximation algo-
rithm using semidefinite programming which has been shown to work very well on a
wide variety of applications. First, a complete graph G is constructed such that each
pattern (in this case, a behavior protocol) is a vertex, and each edge is given a weight
such that w(u, v) is the similarity of the patterns corresponding to the vertices u and
v. Finally, we find the canonical set by computing a cut that bisects the graph into two
subsets, as shown in Fig. 1.
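The objective — minimizing intra-edge weights while maximizing cut-edge weights — can be illustrated with an exhaustive search over all subsets, feasible only for toy instances (Denton et al. instead solve a semidefinite relaxation). This sketch and its function name are our own; the `lam` weighting corresponds to the free parameter λ described in the text.

```python
from itertools import combinations

def canonical_set_bruteforce(S, lam=0.5):
    """Exhaustively search for the subset C maximizing
    lam * (cut-edge weight) - (1 - lam) * (intra-edge weight),
    where S[i][j] is the pairwise similarity of patterns i and j."""
    n = len(S)
    best, best_score = None, float("-inf")
    for r in range(1, n):  # the full set has an empty cut, so it is skipped
        for C in combinations(range(n), r):
            inC = set(C)
            intra = sum(S[i][j] for i, j in combinations(C, 2))
            cut = sum(S[i][j] for i in inC for j in range(n) if j not in inC)
            score = lam * cut - (1.0 - lam) * intra
            if score > best_score:
                best, best_score = set(C), score
    return best
```

With two tight clusters of similar patterns, the maximizer picks one representative per cluster: the representatives are dissimilar to each other (small intra weight) but similar to everything outside the subset (large cut weight).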
Fig. 1. Canonical-set graph with behaviors at vertices and edge weights corresponding to be-
havioral similarities (from [10]). Finding the canonical set can be expressed as an optimization
problem, where the goal is to minimize the weights of the intra edges while simultaneously max-
imizing the weights of the cut edges.
The task of determining the proper graph cut to find the canonical set can be ex-
pressed as an optimization problem, where the objective is to minimize the sum of the
weights of the intra edges — those edges between vertices within the canonical set,
as shown in Fig. 1 — while simultaneously maximizing the sum of the weights of the
cut edges — those edges between vertices in the canonical set and those outside the
set. This optimization is known to be intractable [11] and thus Denton et al. employ
an approximation algorithm (please see Algorithm 1): they formulate the canonical
set problem as an integer programming problem, relax it to a semidefinite program,
and then use an off-the-shelf solver [13] to find the approximate solution. Please refer
to [12] for a full derivation and description of the algorithm.
The algorithm includes one free parameter, λ ∈ [0, 1], which scales the weighting given to cut edges versus intra edges. Higher values of λ favor maximizing the cut edge weights, resulting in fewer but larger subsets of patterns; lower values favor minimizing the intra edge weights, resulting in smaller, more numerous subsets.
There are two main advantages of the canonical set algorithm compared to many
similar methods of extracting key items from sets. First, it is an unsupervised algo-
rithm; no training dataset is necessary. Second, no a priori knowledge of the number of
representative elements (in this case, behaviors) is needed. Both the sets themselves and
the most representative elements of the sets arise naturally from the algorithm. As a re-
sult, the canonical set algorithm has applications in a wide variety of machine learning
areas, for example image matching [14] and software engineering [15].
3 Data Collection
To test if the canonical set algorithm could be applied to find canonical behavior proto-
cols, we collected data from users performing typical web-browsing tasks on a univer-
sity web site [10]. The users were given a set of 32 questions covering a range of topics
Fig. 2. Sample analysis graphs (from [10]). The canonical behaviors found by our algorithm are
shown in bold in (b), and the other behaviors are labeled according to their nearest neighbor.
We left this notion undefined and allowed them to use their own judgments to decide on the best partition for each question.
Figure 2 shows an example of the automated and expert results (from [10]) for an
individual question (“What is the phone number of Professor . . . ?”) to illustrate our
analysis in detail. Each vertex represents a single web page (labeled with a unique inte-
ger) and each directed edge represents a clicked link from one page to another taken by
one of the users. The expert (graph a) found 6 sets of behaviors: sets A and B represent
different ways of clicking through the department web page to get to the professor’s
home page; sets C and D represent different ways of clicking through to the site’s direc-
tory search page (vertex 14); and sets E and F represent slight variations on sets C and
D. The canonical set algorithm (graph b, with λ = 0.36) identified 4 canonical behaviors
for this same question; these are shown in bold in the figure, and the other behaviors
are labeled according to their nearest neighbor. The behaviors found by the algorithm
correspond directly to the expert’s sets A–D, but instead of splitting out sets E and F,
the algorithm (in part due to the value of λ) grouped these variations with the nearest
canonical set D.
Fig. 3. Rand index comparison of edit distance and association graph similarity measures for the
two experts across a range of λ values. The association graph measure outperforms edit distance
across nearly the entire range for both experts.
their sense of “similar” and “different” behaviors depending on the particular behaviors
they observed for each question. While the selection of the correct λ is beyond the scope
of this paper, it is something we plan to study further in our future research.
5 Discussion
We have presented an automated method of finding canonical subsets of behavior pro-
tocols which uses a graph-based representation of the data. The collection of these types
of time series is common in psychology and human factors research. While these data
can often be naturally represented as graphs, there has been relatively little work in
applying graph theory to their study. As users move through the space of possible be-
haviors in a system, their paths naturally induce a graph topology. We have shown that
by taking into account this topology, improved results may be obtained over methods
which ignore the underlying graph structure. We believe that this work is an important
first step in the application of graph-based representations and algorithms to the analysis
of human behavior protocols.
References
1. Card, S.K., Newell, A., Moran, T.P.: The Psychology of Human-Computer Interaction.
Lawrence Erlbaum Associates, Hillsdale (1983)
2. Milson, R., Lewis, M.W., Anderson, J.R.: The Teacher’s Apprentice Project: Building an
Algebra Tutor. In: Artificial Intelligence and the Future of Testing, pp. 53–71. Lawrence
Erlbaum Associates, Hillsdale (1990)
3. Salvucci, D.D.: Modeling driver behavior in a cognitive architecture. Human Factors 48(2),
362–380 (2006)
4. Chi, E.H., Rosien, A., Supattanasiri, G., Williams, A., Royer, C., Chow, C., Robles, E., Dalal,
B., Chen, J., Cousins, S.: The Bloodhound project: automating discovery of web usability
issues using the InfoScentTM simulator. In: CHI 2003: Proceedings of the SIGCHI conference
on Human factors in computing systems, pp. 505–512. ACM, New York (2003)
5. Cutrell, E., Guan, Z.: What are you looking for? An eye-tracking study of information usage
in web search. In: CHI 2007: Proceedings of the SIGCHI conference on Human factors in
computing systems, pp. 407–416. ACM, New York (2007)
6. Ericsson, K.A., Simon, H.A.: Protocol analysis: verbal reports as data, Revised edn. MIT
Press, Cambridge (1993)
7. Salvucci, D.D., Anderson, J.R.: Automated eye-movement protocol analysis. Human-
Computer Interaction 16(1), 39–86 (2001)
8. Ritter, F.E., Larkin, J.H.: Developing process models as summaries of HCI action sequences.
Human-Computer Interaction 9(3), 345–383 (1994)
9. Smith, J.B., Smith, D.K., Kupstas, E.: Automated protocol analysis. Human-Computer Inter-
action 8(2), 101–145 (1993)
10. Mankowski, W.C., Bogunovich, P., Shokoufandeh, A., Salvucci, D.D.: Finding canonical
behaviors in user protocols. In: CHI 2009: Proceedings of the SIGCHI conference on Human
factors in computing systems, pp. 1323–1326. ACM, New York (2009)
11. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-
Completeness. W.H. Freeman and Co., San Francisco (1979)
12. Denton, T., Shokoufandeh, A., Novatnack, J., Nishino, K.: Canonical subsets of image fea-
tures. Computer Vision and Image Understanding 112(1), 55–66 (2008)
13. Toh, K., Todd, M., Tütüncü, R.: SDPT3 — a MATLAB software package for semidefinite
programming. Optimization Methods and Software 11, 545–581 (1999)
14. Novatnack, J., Denton, T., Shokoufandeh, A., Bretzner, L.: Stable bounded canonical sets
and image matching. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005.
LNCS, vol. 3757, pp. 316–331. Springer, Heidelberg (2005)
15. Kothari, J., Denton, T., Mancoridis, S., Shokoufandeh, A.: On computing the canonical fea-
tures of software systems. In: Proceedings of the 13th Working Conference on Reverse En-
gineering (WCRE), pp. 93–102 (2006)
16. Rubner, Y., Tomasi, C., Guibas, L.J.: The earth mover’s distance as a metric for image re-
trieval. International Journal of Computer Vision 40(2), 99–121 (2000)
17. Levenshtein, V.: Binary codes capable of correcting deletions, insertions and reversals. Soviet
Physics Doklady 10, 707–710 (1966)
18. Pelillo, M., Siddiqi, K., Zucker, S.W.: Matching hierarchical structures using association
graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(11), 1105–1120
(1999)
19. Motzkin, T., Strauss, E.: Maxima for graphs and a new proof of a theorem of Turan. Canadian
Journal of Mathematics 17(4), 533–540 (1964)
20. Floyd, R.W.: Algorithm 97: Shortest path. Communications of the ACM 5(6), 345 (1962)
21. Rand, W.M.: Objective criteria for the evaluation of clustering methods. Journal of the Amer-
ican Statistical Association 66(336), 846–850 (1971)
Object Detection by Keygraph Classification
1 Introduction
Object detection is one of the most classic problems in computer vision and
can be informally defined as follows: given an image representing an object and
another, possibly a video frame, representing a scene, decide if the object belongs
to the scene and determine its pose if it does. Such a pose consists not only of the object's location, but also of its scale and rotation. The object might not necessarily be rigid, in which case more complex deformations are possible. We will
refer to the object image as our model and, for the sake of simplicity, refer to
the scene image simply as our frame.
Recent successful approaches to this problem are based on keypoints [1,2,3,4].
In such approaches, instead of the model itself, the algorithm tries to locate
a subset of points from the object. The chosen points are those that satisfy
desirable properties, such as ease of detection and robustness to variations of
scale, rotation and brightness. This approach reduces the problem to supervised
classification where each model keypoint represents a class and feature vectors
of the frame keypoints represent input data to the classifier.
A well-known example is the SIFT method proposed by Lowe [1]. The most
important aspect of this method relies on the very rich feature vectors calculated
for each keypoint: they are robust and distinctive enough to allow remarkably
good results in practice even with few vectors per class and a simple nearest-
neighbor approach. More recent feature extraction strategies, such as the SURF
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 223–232, 2009.
© Springer-Verlag Berlin Heidelberg 2009
224 M. Hashimoto and R.M. Cesar Jr.
method proposed by Bay, Tuytelaars and van Gool [2], are reported to perform
even better.
The main drawback of using rich feature vectors is that they are usually
complex or computationally expensive to calculate, which can be a shortcoming
for real-time detection in videos, for example. Lepetit and Fua [3] worked around
this limitation by shifting much of the computational burden to the training
phase. Their method uses simple and cheap feature vectors, but extracts them
from several different images artificially generated by applying changes of scale,
rotation and brightness to the model. Therefore, robustness is achieved not by
the richness of each vector, but by the richness of the training set as a whole.
Regardless of the choice among such methods, keypoint-based approaches traditionally follow the same general framework, described below.
Training
3. Use the feature vectors to train a classifier whose classes are the keypoints.
The accuracy must be reasonably high, but not necessarily near-perfect.
Classification
3. Apply the classifier to the feature vectors in order to decide if each frame
keypoint is sufficiently similar to a model keypoint. As near-perfect accu-
racy is not required, several misclassifications might be done in this step.
where small groups of keypoints are considered. However, since those works fol-
low the framework of associating classes to individual points, there is still an
inherent underuse of structural information.
In this paper, we propose an alternative framework that, instead of classifying
single keypoints, classifies sets of keypoints using both appearance and structural
information. Since graphs are mathematical objects that naturally model rela-
tions, they are adopted to represent such sets. Therefore, the proposed approach
is based on supervised classification of graphs of keypoints, henceforth referred to
as keygraphs. A general description of our framework is given below.
Training
4. Use the feature vectors to train a classifier whose classes are the keygraphs.
The accuracy must be reasonably high, but not necessarily near-perfect.
Classification
4. Apply the classifier to the feature vectors in order to decide if each frame
keygraph is sufficiently similar to a model keygraph. As near-perfect accu-
racy is not required, several misclassifications might be done in this step.
The idea of using graphs built from keypoints to detect objects is also not new:
Tang and Tao [7], for example, had success with dynamic graphs defined over
SIFT points. Their work, however, shifts away from the classification approach
and tries to solve the problem with graph matching. Our approach, in contrast,
still reduces the problem to supervised classification, which is more efficient.
In fact, it can be seen as a generalization of the traditional methods, since a
keypoint is a single-vertex graph.
This paper is organized as follows. Section 2 introduces the proposed frame-
work, focusing on the advantages of using graphs instead of points. Section 3
One of the most evident differences between detecting a keypoint and detecting
a keygraph is the size of the universe set: the number of subgraphs of G(K) is exponential in the size of K. This implies that a keygraph detector must be much
more restrictive than a keypoint detector if we are interested in real-time per-
formance. Such necessary restrictiveness, however, is not hard to obtain because
graphs have structural properties to be explored that individual keypoints do
not. Those properties can be classified into three types: combinatorial, topological
and geometric. Figure 1 shows how those three types of structural properties can
be used to gradually restrict the number of considered graphs.
Fig. 1. Gradual restriction by structural properties. Column (a) shows two graphs
with different combinatorial structure. Column (b) shows two graphs combinatorially
equivalent but topologically different. Finally, column (c) shows two graphs with the
same combinatorial and topological structure, but different geometric structure.
Fig. 2. Model keygraph (a) and a frame keygraph (b) we want to classify. From the
topological structure alone we can verify that the latter cannot be matched with the
former: the right graph does not have a vertex inside the convex hull of the others.
Furthermore, translating this simple boolean property into a scalar value does not make
much sense.
There are two motivations for such an approach: the first one is the fact that a
structural property, alone, may present a strong distinctive power. The second
one is the fact that certain structural properties may assume boolean values for
which a translation to a scalar does not make much sense. Figure 2 gives a simple
example that illustrates the two motivations.
By training several classifiers, one for each subset given by the partition,
instead of just one, we not only satisfy the two motivations above, but we also
improve the classification from both an accuracy and an efficiency point of view.
For extracting a feature vector from a keygraph, there exists a natural approach
by merging multiple keypoint feature vectors extracted from its vertices. How-
ever, a more refined approach may be derived. In traditional methods, a keypoint
feature vector is extracted from color values of the points that belong to a certain
Fig. 3. Comparison of patch extraction (a) and relative extraction (b) with keygraphs
that consist of two keypoints and the edge between them. Suppose there is no variation
of brightness between the two images and consider for each keygraph the mean gray
level relative to all image pixels crossed by its edge. Regardless of scale and rotation,
there should be no large variations between the two means. Therefore, they represent a
naturally robust feature. In contrast, variations in scale and rotation give completely
different patches and a non-trivial patch extraction scheme is necessary for robustness.
patch around it. This approach is inherently flawed because such patches are not
naturally robust to scale and rotation. Traditional methods work around this flaw
by improving the extraction itself. Lowe [1] uses a gradient histogram approach,
while Lepetit and Fua [3] rely on training with multiple synthetic views.
With keygraphs, in contrast, the flaw does not exist in the first place, because
they are built on sets of keypoints. Therefore, they allow the extraction of relative
features that are naturally robust to scale and rotation without the need for
sophisticated extraction strategies. Figure 3 shows a very simple example.
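The relative-extraction idea from Fig. 3 can be sketched directly: sample the mean gray level along the segment between two keypoints. This is our own illustration (the function name is hypothetical), assuming a grayscale image stored as a 2-D array indexed as `img[row, col]`.

```python
import numpy as np

def edge_mean_gray(img, p, q, n_samples=50):
    """Mean gray level of the pixels crossed by the edge between
    keypoints p and q, sampled at n_samples points along the segment.
    Under fixed brightness, this value is largely invariant to the
    scale and rotation of the keygraph."""
    t = np.linspace(0.0, 1.0, n_samples)
    rows = np.rint(p[0] + t * (q[0] - p[0])).astype(int)
    cols = np.rint(p[1] + t * (q[1] - p[1])).astype(int)
    return float(img[rows, cols].mean())
```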
Fig. 4. Example of pose estimation ambiguity. The image on the left indicates the pose
of a certain 2-vertex graph in a frame. If a classifier evaluates this graph as being the
model keygraph indicated in Figure 3, there would be two possible coherent rotations.
For keypoint detection we used the well-known good features to track detector
proposed by Shi and Tomasi [9], which applies a threshold over a certain quality measure. By adjusting this threshold, we are able to control how strict the detection is. A good balance between accuracy and efficiency was found with a
threshold that gave us 79 keypoints in the model.
Fig. 5. Scalene triangle with θ1 < θ2 < θ3 . In this case, if we pass through the three
vertices in increasing order of internal angle, we have a counter-clockwise movement.
The partitioning of the feature vector set is made according to three structural
properties. Two of them are the values of the two largest angles. Notice that, since
the internal angles of a triangle always sum up to 180 degrees, considering all
angles would be redundant. The third property refers to a clockwise or counter-
clockwise direction defined by the three vertices in increasing order of internal
angle. Figure 5 has a simple example.
In our experiment we established a partition into 2 · 36 · 36 = 2592 subsets: the angles are quantized by dividing the interval (0, 180) into 36 bins. The largest subset in the partition has 504 keygraphs, a drastic reduction from the 51,002 possible ones.
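A hypothetical sketch of this partitioning scheme (the function name and index encoding are ours; the paper does not specify the exact layout): the partition index combines the orientation bit with the two largest internal angles, each quantized into 36 bins.

```python
import math

def triangle_partition_index(a, b, c, n_bins=36):
    """Partition-subset index for a keygraph triangle with vertices
    a, b, c (2-D points): an orientation bit (clockwise vs. counter-
    clockwise when visiting vertices in increasing order of internal
    angle) plus the two largest internal angles, quantized over (0, 180)."""
    pts = [a, b, c]

    def angle_at(i):
        p, q, r = pts[i], pts[(i + 1) % 3], pts[(i + 2) % 3]
        v1 = (q[0] - p[0], q[1] - p[1])
        v2 = (r[0] - p[0], r[1] - p[1])
        dot = v1[0] * v2[0] + v1[1] * v2[1]
        return math.degrees(math.acos(dot / (math.hypot(*v1) * math.hypot(*v2))))

    angles = [angle_at(i) for i in range(3)]
    order = sorted(range(3), key=lambda i: angles[i])   # increasing angle
    p0, p1, p2 = (pts[i] for i in order)
    # sign of the cross product gives the direction of traversal
    cross = (p1[0] - p0[0]) * (p2[1] - p0[1]) - (p1[1] - p0[1]) * (p2[0] - p0[0])
    ccw = 1 if cross > 0 else 0
    largest = min(int(angles[order[2]] / 180.0 * n_bins), n_bins - 1)
    second = min(int(angles[order[1]] / 180.0 * n_bins), n_bins - 1)
    return (ccw * n_bins + largest) * n_bins + second
```

The smallest angle is omitted because the three angles sum to 180 degrees, so it is redundant given the other two.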
Fig. 6. Corner chrominance extraction. The gray segments define a limit for the
size of the projected lines. The white points defining the extremities of those lines
are positioned according to a fraction of the edge they belong to. In the above example
the fraction is 1/3.
vertices. Finally, the size of those projected lines is limited by a segment whose
extremities are points in the keygraph edges.
This scheme is naturally invariant to rotation. Invariance to brightness is
ensured by the fact that we are considering only the chrominance and ignoring
the luminance. Finally, the invariance to scale is ensured by the fact that the
extremities mentioned above are positioned in the edges according to a fraction
of the size of the edge that they belong to, and not by any absolute value.
4 Conclusion
keygraphs are thick scalene triangles, we have shown successful results for real-
time detection after training with a single image.
The framework is very flexible and is not bound to a specific keypoint
detector or keygraph detector. Therefore, room for improvement lies on both
the framework itself and the implementation of each one of its steps. We are
currently interested in using more sophisticated keygraphs and in adding the
usage of temporal information to adapt the framework to object tracking. Finally,
we expect to cope with 3D poses (i.e. out-of-plane rotations) by incorporating
additional poses to the training set. These advances will be reported in due time.
References
1. Lowe, D.: Distinctive image features from scale-invariant keypoints. International
Journal of Computer Vision 20, 91–110 (2004)
2. Bay, H., Tuytelaars, T., van Gool, L.: SURF: Speeded Up Robust Features. In:
Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 404–
417. Springer, Heidelberg (2006)
3. Lepetit, V., Fua, P.: Keypoint recognition using randomized trees. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence 28, 1465–1479 (2006)
4. Özuysal, M., Fua, P., Lepetit, V.: Fast keypoint recognition in ten lines of code. In:
Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision
and Pattern Recognition, pp. 1–8. IEEE Computer Society, Los Alamitos (2007)
5. Schmid, C., Mohr, R.: Local grayvalue invariants for image retrieval. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence 19, 530–535 (1997)
6. Tell, D., Carlsson, S.: Combining appearance and topology for wide baseline match-
ing. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS,
vol. 2350, pp. 68–81. Springer, Heidelberg (2002)
7. Tang, F., Tao, H.: Object tracking with dynamic feature graph. In: Proceedings
of the 2nd Joint IEEE International Workshop on Visual Surveillance and Per-
formance Evaluation of Tracking and Surveillance, pp. 25–32. IEEE Computer
Society, Los Alamitos (2005)
8. OpenCV: http://opencv.willowgarage.com/
9. Shi, J., Tomasi, C.: Good features to track. In: Proceedings of the 1994 IEEE
Computer Society Conference on Computer Vision and Pattern Recognition, pp.
593–600. IEEE Computer Society, Los Alamitos (1994)
10. Fortune, S.: A sweepline algorithm for Voronoi diagrams. In: Proceedings of the
Second Annual Symposium on Computational Geometry, pp. 313–322. ACM, New
York (1986)
Graph Regularisation Using Gaussian Curvature
1 Introduction
In computer vision, image processing and graphics the data under consideration
frequently exists in the form of a graph or a mesh. The fundamental problems
that arise in the processing of such data are how to smooth, denoise, restore and
simplify data samples over a graph. The Principal difficulty of this task is how
to preserve the geometrical structures existing in the initial data. Many methods
have been proposed to solve this problem. Among existing methods, variational
techniques based on regularization provide a general framework for designing
efficient filtering processes. Solutions to the variational models can be obtained
by minimizing an appropriate energy function. The minimization is usually per-
formed by designing a continuous partial differential equation, whose solutions
are discretized in order to fit with the data domain. A complete overview of these
methods in image processing can be found in [1,2,3,4]. One of the problems associated with variational methods is that of discretisation, which for some types
The authors acknowledge the financial support from the FET programme within the
EU FP7, under the SIMBAD project (contract 213250).
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 233–242, 2009.
© Springer-Verlag Berlin Heidelberg 2009
234 H. ElGhawalby and E.R. Hancock
In this section, we recall some basic prerequisites concerning graphs, and define
nonlocal operators which can be considered as discrete versions of continuous
differential operators.
2.1 Preliminaries
An undirected unweighted graph denoted by G = (V, E) consists of a finite set
of nodes V and a finite set of edges E ⊆ V × V . The elements of the adjacency
matrix A of the graph G are defined by:

$$A(u,v) = \begin{cases} 1 & \text{if } (u,v) \in E \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
To construct the Laplacian matrix we first establish a diagonal degree matrix $D$ with elements $D(u,u) = \sum_{v \in V} A(u,v) = d_u$. From the degree and the adjacency matrices we can construct the Laplacian matrix $L = D - A$, that is, the degree matrix minus the adjacency matrix:

$$L(u,v) = \begin{cases} d_u & \text{if } u = v \\ -1 & \text{if } (u,v) \in E \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$
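These definitions translate directly into code; a minimal sketch with NumPy (the helper name is ours):

```python
import numpy as np

def graph_laplacian(n, edges):
    """Build the adjacency matrix A, the diagonal degree matrix D, and
    the combinatorial Laplacian L = D - A of an undirected, unweighted
    graph with n nodes and the given list of edges."""
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1
    D = np.diag(A.sum(axis=1))   # D(u, u) = degree d_u
    return A, D, D - A
```

Every row of the Laplacian sums to zero, a standard sanity check for the construction.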
with the induced $L_2$-norm $\|f\|_2 = \langle f, f \rangle_{\mathcal{H}(V)}^{1/2}$.
This operator arises naturally from the variational problem associated to the
energy function [13]. The p-Laplace operator is nonlinear, with the exception of
p=2, where it corresponds to the combinatorial graph Laplacian, which is one
of the classical second order operators defined in the context of spectral graph
theory [7]
$$Lf(u) = \sum_{v \sim u} \big(f(u) - f(v)\big) \qquad (9)$$
Another particular case of the p-Laplace operator is obtained with p=1 . In this
case, it is the curvature of the function f on the graph
κ corresponds to the curvature operator proposed in [17] and [4] in the context
of image restoration. More generally, κ is the discrete analogue of the mean
curvature of the level curve of a function defined on a continuous domain of $\mathbb{R}^N$.
over the vertices. Gaussian curvature is one of the fundamental second order
geometric properties of a surface, and it is an intrinsic property of a surface
independent of the coordinate system used to describe it. As stated by Gauss’s
theorema egregium [11], it depends only on how distance is measured on the
surface, not on the way it is embedded in the space.
$$A_g = \frac{1}{2R}\, d_e^2 \qquad (11)$$
where dM is the Riemannian volume element. Since all the points, except for the
vertices, of a piecewise linear surface have a neighborhood isometric to a planar
Euclidean domain with zero curvature, the Gauss curvature is concentrated at
the isolated vertices. Hence, to estimate the Gaussian curvature of a smooth
surface from its triangulation, we need to normalize by the surface area, which
here is the area of the triangle. Consequently, we will assign one third of the
triangle area to each vertex. Hence, the Gaussian curvature associated with each
vertex will be

$$\kappa_g = \frac{\int_g K \, dM}{\tfrac{1}{3} A} \qquad (13)$$
238 H. ElGhawalby and E.R. Hancock
$$\kappa_g = \frac{3}{R^2} \qquad (15)$$
Recalling that the Gaussian curvature is the product of the two principal
curvatures, and that the curvature of a point on a sphere is the reciprocal of the
radius of the sphere gives an explanation for the result in (15). As we assumed
earlier that the geodesic is a great arc of a circle of radius R, in [10] we deduced
that
$$\frac{1}{R^2} = \frac{24(d_g - d_e)}{d_g^3} \qquad (16)$$
and since for an edge of the graph dg = 1, we have
$$\frac{1}{R^2} = 24(1 - d_e) \qquad (17)$$
From (15) and (17) the Gaussian curvature associated with the embedded node
can be found from the following formula:

$$\kappa_g = 72(1 - d_e) \qquad (18)$$
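Equation (18) is simple to evaluate; the sketch below is our own illustration (the function name is ours), assuming $d_e$ is the Euclidean distance associated with an edge whose geodesic distance is $d_g = 1$:

```python
import numpy as np

def edge_gaussian_curvature(d_e):
    """Gaussian curvature from equation (18): kappa_g = 72 (1 - d_e),
    where d_e is the Euclidean distance for an edge whose geodesic
    distance is d_g = 1 (function name is our own)."""
    return 72.0 * (1.0 - np.asarray(d_e, dtype=float))
```

An edge whose Euclidean distance equals the geodesic distance has zero curvature; shorter Euclidean distances (a more curved embedding) give positive curvature.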
4 Hausdorff Distance
and $\|\cdot\|$ is some underlying norm on the points of A and B (e.g., the $L_2$ or
Euclidean norm). Using these ingredients we can describe how the modified
Hausdorff distances can be extended to graph-based representations. To commence,
let us consider two graphs $G_1 = (V_1, E_1, T_1, \kappa_1)$ and $G_2 = (V_2, E_2, T_2, \kappa_2)$, where
$V_1, V_2$ are the sets of nodes, $E_1, E_2$ the sets of edges, $T_1, T_2$ the sets of
triangles, and $\kappa_1, \kappa_2$ the sets of Gaussian curvatures associated with the nodes,
as defined in §3.2. We can now write the distance between two graphs as follows:
$$h_{MHD}(G_1, G_2) = \frac{1}{|V_1|} \sum_{i \in V_1} \min_{j \in V_2} \left\| \kappa_2(j) - \kappa_1(i) \right\| \qquad (21)$$
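A sketch of equation (21) in NumPy (our own illustration; `kappa1` and `kappa2` hold the per-node curvature values of the two graphs, and the absolute value serves as the underlying norm on scalars):

```python
import numpy as np

def h_mhd(kappa1, kappa2):
    """Directed modified Hausdorff distance of equation (21): the mean,
    over nodes i of G1, of min_j |kappa2(j) - kappa1(i)|."""
    kappa1 = np.asarray(kappa1, dtype=float)
    kappa2 = np.asarray(kappa2, dtype=float)
    diffs = np.abs(kappa2[None, :] - kappa1[:, None])  # |V1| x |V2| differences
    return diffs.min(axis=1).mean()
```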
5 Multidimensional Scaling
For the purpose of visualization, the classical Multidimensional Scaling (MDS)
[8] is a commonly used technique to embed the data specified in the matrix in
Euclidean space. Given that H is the distance matrix with row r and column c entry
$H_{rc}$, the first step of MDS is to calculate a matrix T whose element with row r
and column c is given by $T_{rc} = -\frac{1}{2}\left[H_{rc}^2 - \hat{H}_{r.}^2 - \hat{H}_{.c}^2 + \hat{H}_{..}^2\right]$, where
$\hat{H}_{r.} = \frac{1}{N}\sum_{c=1}^{N} H_{rc}$ is the average value over the rth row in the distance matrix, $\hat{H}_{.c}$ is the
similarly defined average value over the cth column, and $\hat{H}_{..} = \frac{1}{N^2}\sum_{r=1}^{N}\sum_{c=1}^{N} H_{rc}$
is the average value over all rows and columns of the distance matrix. Then, we
subject the matrix T to an eigenvector analysis to obtain a matrix of embedding
coordinates X. If the rank of T is k, $k \le N$, then we will have k non-zero
eigenvalues. We arrange these k non-zero eigenvalues in descending order, i.e.,
$l_1 \ge l_2 \ge \dots \ge l_k \ge 0$. The corresponding ordered eigenvectors are denoted by $u_i$,
where $l_i$ is the ith eigenvalue. The embedding coordinate system for the graphs
is $X = [\sqrt{l_1}\,u_1, \sqrt{l_2}\,u_2, \dots, \sqrt{l_k}\,u_k]$. For the graph indexed i, the embedded vector
of coordinates is $x_i = (X_{i,1}, X_{i,2}, \dots, X_{i,k})^T$.
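The MDS procedure above can be sketched as follows (our own NumPy implementation; the centering-matrix form $-\frac{1}{2} J H^2 J$ used below is algebraically equivalent to the element-wise definition of T):

```python
import numpy as np

def classical_mds(H, k=2):
    """Classical MDS on an N x N distance matrix H: double-center the
    squared distances, eigendecompose, and scale eigenvectors by sqrt(l)."""
    H = np.asarray(H, dtype=float)
    N = H.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N      # centering matrix
    T = -0.5 * J @ (H ** 2) @ J              # double-centered matrix
    evals, evecs = np.linalg.eigh(T)         # ascending eigenvalues
    order = np.argsort(evals)[::-1]          # reorder to descending
    evals = np.clip(evals[order][:k], 0.0, None)
    evecs = evecs[:, order][:, :k]
    return evecs * np.sqrt(evals)            # X = [sqrt(l1) u1, ..., sqrt(lk) uk]
```

For points on a line the embedding recovers the original inter-point distances exactly (up to sign).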
6 Experiments
For the purposes of experimentation we use the standard CMU, MOVI and
chalet house image sequences as our data set [15]. These data sets contain dif-
ferent views of model houses from equally spaced viewing directions. From the
house images, corner features are extracted, and Delaunay graphs represent-
ing the arrangement of feature points are constructed. Our data consists of ten
graphs for each of the three houses. To commence, we compute the Euclidean
distances between the nodes in each graph based on the Laplacian and then on
the heat kernel with the values of t = 10.0, 1.0, 0.1 and 0.01. Then we compute
the Gaussian curvature associated with each node using the formula given in §3.2.
Commencing with each node attributed with the Gaussian curvature
(as the value of a real function f acting on the nodes of the graph), we can
regularise each graph by applying the p-Laplacian operator to the Gaussian
curvatures. For each graph we construct a set of regularised Gaussian curvatures
using both the Laplace operator and the curvature operator, as special cases
Fig. 1. MDS embedding obtained using the Laplace operator to regularize the houses data
resulting from the heat kernel embedding
Fig. 2. MDS embedding obtained using the Curvature operator to regularize the houses data
resulting from the heat kernel embedding
Fig. 3. MDS embedding obtained using the Laplace operator (left) and the Curvature
operator (right) to regularize the houses data resulting from the Laplacian embedding
of the p-Laplacian operator. The next step is to compute the distances between
the sets for the thirty different graphs using the modified Hausdorff distance.
Finally, we subject the distance matrices to the Multidimensional Scaling (MDS)
procedure to embed them into a 2D space. Here each graph is represented by
a single point. Figure 1 shows the results obtained using the Laplace operator.
The subfigures are ordered from left to right, using the heat kernel embedding
with the values t = 10.0, 1.0, 0.1 and 0.01. Figure 2 shows the corresponding
results obtained when the Curvature operator is used. Figure 3 shows the results
obtained when using the Laplacian embedding, from the Laplace operator (left)
and the Curvature operator (right).
To investigate the results in more detail, Table 1 shows the Rand index for the
distance as a function of t. This index is computed as follows: 1) compute the
mean for each cluster; 2) compute the distance from each point to each mean;
3) if the distance to the correct mean is smaller than those to the remaining means,
then the classification is correct, otherwise it is incorrect; 4) compute
the Rand index (correct/(incorrect + correct)).
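These steps can be sketched as follows (our own illustrative code; `nearest_mean_index` is a hypothetical helper name, and it reports the fraction of correctly assigned points, of which the error rate is the complement):

```python
import numpy as np

def nearest_mean_index(points, labels):
    """Per-cluster means, nearest-mean classification, then the fraction
    of points whose nearest mean is that of their own cluster."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    means = np.array([points[labels == c].mean(axis=0) for c in classes])
    # distance of every point to every cluster mean
    d = np.linalg.norm(points[:, None, :] - means[None, :, :], axis=2)
    predicted = classes[d.argmin(axis=1)]
    return (predicted == labels).sum() / len(labels)
```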
7 Conclusion

In this paper, a process for regularizing the curvature attributes associated with
the geometric embedding of graphs was presented. Experiments show that it
is an efficient procedure for the purpose of gauging the similarity of pairs of
graphs. The regularisation procedure improves the results obtained with graph
clustering. Our future plans are twofold. First, we aim to explore if geodesic
flows along the edges of the graphs can be used to implement a more effective
regularisation process. Second, we aim to apply our methods to problems of
image and mesh smoothing.
References
1. Bertalmio, M., Cheng, L.T., Osher, S., Sapiro, G.: Variational problems and partial
differential equations on implicit surfaces. Journal of Computational Physics 174,
759–780 (2001)
2. Bougleux, S., Elmoataz, A.: Image smoothing and segmentation by graph regular-
ization. LNCS, vol. 3656, pp. 745–752. Springer, Heidelberg (2005)
3. Boykov, Y., Huttenlocher, D.: A new Bayesian framework for object recognition. In:
Proceedings of the IEEE Computer Society Conference on CVPR, vol. 2, pp. 517–523
(1999)
4. Chan, T., Osher, S., Shen, J.: The digital TV filter and nonlinear denoising. IEEE
Trans. Image Process 10(2), 231–241 (2001)
5. Chan, T., Shen, J.: Variational restoration of non-flat image features: Models and
algorithms. SIAM J. Appl. Math. 61, 1338–1361 (2000)
6. Cheng, L., Burchard, P., Merriman, B., Osher, S.: Motion of curves constrained
on surfaces using a level set approach. Technical report, UCLA CAM Technical
Report (00-32) (September 2000)
7. Chung, F.R.: Spectral graph theory. In: Proc. CBMS Regional Conf. Ser. Math.,
vol. 92, pp. 1–212 (1997)
8. Cox, T., Cox, M.: Multidimensional Scaling. Chapman-Hall, Boca Raton (1994)
9. Dubuisson, M., Jain, A.: A modified Hausdorff distance for object matching, pp.
566–568 (1994)
10. ElGhawalby, H., Hancock, E.R.: Measuring graph similarity using spectral geom-
etry. In: Campilho, A., Kamel, M.S. (eds.) ICIAR 2008. LNCS, vol. 5112, pp.
517–526. Springer, Heidelberg (2008)
1 Introduction
Pattern analysis using graph structures has proved to be a challenging and sometimes
elusive problem. The main reason for this is that graphs are not vectorial in nature, and
hence they are not amenable to classical statistical methods from pattern recognition or
machine learning [7]. One way to overcome this problem is to extract feature vectors
from graphs which succinctly capture their structure in a manner that is permutation in-
variant. There are a number of ways in which this may be accomplished. One approach
is to use simple features such as the numbers of edges and nodes, edge density or di-
ameters. A more sophisticated approach is to count the numbers of cycles of different
order. Alternatively, graph spectra can be used [7,10].
However, one elegant way in which to capture graph structure is to compute the char-
acteristic polynomial. To do so requires a matrix characterization M of the graph, and
the characteristic polynomial is the determinant det(λI − M) where I is the identity
matrix and λ the variable of the polynomial. The simplest way to exploit the character-
istic polynomial is to use its coefficients. With an appropriate choice of matrix M, these
coefficients are determined by the cycle frequencies in the graph. They are also easily
computed from the spectrum of M. Moreover, since it is determined by the numbers of
cycles in a graph, the characteristic polynomial may also be used to define an analogue
of the Riemann zeta function from number theory for a graph. Here the zeta function is
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 243–252, 2009.
© Springer-Verlag Berlin Heidelberg 2009
244 P. Ren, R.C. Wilson, and E.R. Hancock
determined by the reciprocal of the characteristic polynomial, and prime cycles deter-
mine the poles of the zeta function in a manner analogous to the prime numbers. The
recent work of Bai et al. [2] and Ren et al. [8,9] has shown that practical characterizations
can be extracted from different forms of the zeta function and used for the purposes of
graph-based object recognition. Finally, it is interesting to note that if the matrix M is
chosen to be the adjacency matrix T of the oriented line graph derived from a graph,
which is also called the Perron-Frobenius operator, then the characteristic polynomial
is linked to the Ihara zeta function of the original graph.
As noted above, the characteristic polynomial is determined by the choice of the
matrix M. Here there are a number of alternatives including the adjacency matrix A,
the Laplacian matrix L = D − A where D is the node degree matrix, and the Perron-
Frobenius operator T where the graph is transformed prior to the computation of the
characteristic polynomial. To compute the Ihara zeta function the oriented line graph is
first constructed and then the characteristic polynomial is computed from its adjacency
matrix. This is similar to the approach taken by Emms et al [3] in their study of discrete
time quantum walks. However, rather than characterizing the oriented line graph using
the adjacency matrix, they construct a unitary matrix U which captures the transitions
of a quantum walk controlled by a Grover coin. The resulting unitary matrix proves to
be a powerful tool for analyzing graphs since the spectrum of the positive support of its
third power (denoted by $sp(S^+(U^3))$) can be used to resolve structural ambiguities due
to the cospectrality of strongly regular graphs.
The aim in this paper is to explore the roles of matrix graph representation in the
construction of characteristic polynomials. In particular we are interested in which
combination is most informative in terms of graph-structure and which gives the best
empirical performance when graph clustering is attempted using the characteristic poly-
nomial coefficients. We study both the original graph and its oriented line graph. The
matrix characterizations used are the adjacency matrix A, the Laplacian matrix L, the
transition matrix T and the unitary characterization U.
To commence, suppose that the graph under study is denoted by G = (V, E) where
V is the set of nodes and E ⊆ V × V is the set of edges. Since we wish to adopt a
graph spectral approach we introduce the adjacency matrix A for the graph where the
elements are

$$A(u,v) = \begin{cases} 1 & \text{if } (u,v) \in E \\ 0 & \text{otherwise} \end{cases} \qquad (1)$$
We also construct the diagonal degree matrix D, whose elements are given by
$D(u,u) = d_u = \sum_{v \in V} A(u,v)$. From the degree matrix and the adjacency matrix we construct
the Laplacian matrix L = D − A, i.e. the degree matrix minus the adjacency matrix:

$$L(u,v) = \begin{cases} d_u & \text{if } u = v \\ -1 & \text{if } (u,v) \in E \text{ and } u \ne v \\ 0 & \text{otherwise} \end{cases} \qquad (2)$$
Characteristic Polynomial Analysis on Matrix Representations of Graphs 245
Here, p denotes a prime cycle and L(p) denotes the length of p. As shown in (3), the
Ihara zeta function is generally an infinite product. However, one of its elegant features
is that it can be collapsed down into a rational function, which renders it of practical
utility.
Here, χ(G) is the Euler number of the graph G(V, E), which is defined as the difference
between the number of vertices and the number of edges of the graph, i.e. χ(G) = N − M; A
is the adjacency matrix of the graph, $I_k$ denotes the $k \times k$ identity matrix, and Q is the
matrix difference of the degree matrix D and the identity matrix $I_N$, i.e. $Q = D - I_N$.
From (4) it has been shown that the Ihara zeta function is permutation invariant to vertex
label permutations [9]. This is because permutation matrices, which represent vertex
label permutations in matrix calculation, have no effect on the determinant in (4).
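This permutation invariance is easy to check numerically; the sketch below (our own example graph and code) compares the characteristic polynomial coefficients of A and $Q^T A Q$:

```python
import numpy as np

# A small illustrative adjacency matrix and a random vertex relabelling
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]], dtype=float)
rng = np.random.default_rng(0)
Q = np.eye(4)[rng.permutation(4)]     # permutation matrix

# Characteristic polynomial coefficients before and after relabelling
coeffs_A = np.poly(A)                 # np.poly of a square matrix gives
coeffs_QAQ = np.poly(Q.T @ A @ Q)     # the characteristic polynomial coefficients
```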
where T is the Perron-Frobenius operator on the oriented line graph of the original
graph, and is a $2M \times 2M$ square matrix.
To obtain the Perron-Frobenius operator T, we must construct the oriented line graph
of the original graph from the associated symmetric digraph. The symmetric digraph
The Perron-Frobenius operator T of the original graph is the adjacency matrix of the
associated oriented line graph. For the (i, j)th entry of T, T(i, j) is 1 if there is one
edge directed from the vertex with label i to the vertex with label j in the oriented line
graph, and is 0 otherwise.
Unlike the adjacency matrix of an undirected graph, the Perron-Frobenius operator is
not a symmetric matrix. This is because of a constraint that arises in the construction of
oriented edges. Specifically, a pair of arcs that are the reverse of one another
in the symmetric digraph is not allowed to establish an oriented edge in the oriented
line graph. This constraint arises from the second requirement in the edge definition
appearing in (6).
The Perron-Frobenius operator T is a matrix representation which conveys the
information contained in the Ihara zeta function for a graph. It is the adjacency matrix of
the oriented line graph associated with the original graph. As T is not symmetric, the
Laplacian form of the Perron-Frobenius operator cannot be uniformly defined, because
the relevant vertex degree can be calculated from either the incoming or the outgoing edges
in the oriented line graph, which is a directed graph. In this study, we consider three
types of Laplacian matrices of the Perron-Frobenius operator. They are defined as the
incoming degree matrix minus T, the outgoing degree matrix minus T, and the sum
of the incoming and outgoing degree matrices minus T, respectively.
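A sketch of constructing T under the no-immediate-reversal constraint described above (our own code; `perron_frobenius_operator` is a hypothetical name):

```python
import numpy as np

def perron_frobenius_operator(A):
    """Adjacency matrix T of the oriented line graph (function name ours).
    Nodes are the arcs (u, v) of the symmetric digraph; there is an edge
    from arc (u, v) to arc (v, w) unless w == u (no immediate reversal)."""
    n = A.shape[0]
    arcs = [(u, v) for u in range(n) for v in range(n) if A[u, v]]
    m = len(arcs)                      # 2M arcs for an undirected graph
    T = np.zeros((m, m))
    for i, (u, v) in enumerate(arcs):
        for j, (w, x) in enumerate(arcs):
            if v == w and x != u:      # consecutive arcs, reversal forbidden
                T[i, j] = 1.0
    return T
```

For a triangle each arc has exactly one permitted successor, so every row of T sums to 1.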
as the intermediate step of constructing the digraph to compute the determinant of the
Ihara zeta function. The state space for the discrete-time quantum walk is the set of arcs
$E_d$. If the walk is at vertex v, having previously been at vertex u with probability 1, then
the state is written as $|\psi\rangle = |uv\rangle$. Transitions are possible from one arc $e_d(w, x)$ to
another arc $e_d(u, v)$, i.e. from a state $|wx\rangle$ to a state $|uv\rangle$, only if x = u and x is adjacent
to v. Note that this corresponds to only permitting transitions between adjacent vertices.
The state vector for the walk is a quantum superposition of states on single arcs of the
graph, and can be written as
where the quantum amplitudes are complex, i.e. $\alpha \in \mathbb{C}$. Using (7), the probability
that the walk is in the state $|uv\rangle$ is given by $\Pr(|uv\rangle) = \alpha_{uv}\alpha_{uv}^*$. As with the
classical walk, the evolution of the state vector is determined by a matrix, in this case
denoted U, according to $|\psi_{t+1}\rangle = U|\psi_t\rangle$. Since the evolution of the walk is linear and
conserves probability, the matrix U must be unitary. That is, the inverse is equal to
the complex conjugate of the transposed matrix, i.e. $U^{-1} = U^\dagger$. The entries of U
determine the probabilities for transitions between states. Thus, there are constraints on
these entries and there are therefore constraints on the permissible amplitudes for the
transitions. The sum of the squares of the amplitudes for all the transitions from a
particular state must be unity. Consider a state $|\psi\rangle = |u_1 v\rangle$, where the neighborhood of v
is $N(v) = \{u_1, u_2, \dots, u_r\}$. A single step of the walk should only assign non-zero quantum
amplitudes to transitions between adjacent states, i.e. the states $|vu_i\rangle$ where $u_i \in N(v)$.
However, since U must be unitary these amplitudes cannot all be the same. Recall that
the walk does not rely on any labeling of the edges or vertices. Thus, the most general
form of transition will be one that assigns the same amplitude to all transitions
$|u_1 v\rangle \to |vu_i\rangle$, $u_i \in N(v) \setminus u_1$, and a different amplitude to the transition $|u_1 v\rangle \to |vu_1\rangle$.
The second of these two transitions corresponds to the walk returning along the same
edge on which it came. Thus, the transition will be of the form

$$|u_1 v\rangle \to a\,|vu_1\rangle + b \sum_{i=2}^{r} |vu_i\rangle, \qquad a, b \in \mathbb{C} \qquad (8)$$
It is usual to use the Grover diffusion matrices, which assign quantum amplitudes of
$a = 2/d_v - 1$ when the walk returns along the same edge and $b = 2/d_v$ for all other
transitions. Such matrices are used as they are the matrices furthest from the identity
which are unitary and do not depend on any labeling of the vertices.
Using the Grover diffusion matrices, the matrix U that governs the evolution of the
walk has entries

$$U_{(u,v),(w,x)} = \begin{cases} \dfrac{2}{d_x} - \delta_{vw} & \text{if } u = x \\ 0 & \text{otherwise} \end{cases} \qquad (9)$$
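Equation (9) can be sketched directly (our own code; `grover_walk_matrix` is a hypothetical name), after which unitarity can be checked numerically:

```python
import numpy as np

def grover_walk_matrix(A):
    """State transition matrix U of equation (9) (function name ours):
    U[(u,v),(w,x)] = 2/d_x - delta(v, w) if u == x, and 0 otherwise."""
    n = A.shape[0]
    deg = A.sum(axis=1)
    arcs = [(u, v) for u in range(n) for v in range(n) if A[u, v]]
    m = len(arcs)
    U = np.zeros((m, m))
    for i, (u, v) in enumerate(arcs):
        for j, (w, x) in enumerate(arcs):
            if u == x:
                U[i, j] = 2.0 / deg[x] - (1.0 if v == w else 0.0)
    return U
```

Because the Grover coin is real, U is a real orthogonal matrix, so `U @ U.T` should equal the identity.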
Note that the state transition matrix U in the discrete-time quantum walk and the
Perron-Frobenius operator T in the Ihara zeta function have a similar form. They
are of the same dimensionality for a given graph. Specifically, all the non-zero entries of T
are 1, while the same entries in U are weighted by twice the reciprocal of the connecting
vertex degree in the original graph. Additionally, the entries representing reverse arcs
in U generally have a non-zero value $2/d_x - 1$, while the same entries in T are always
set to zero.
In [3], the spectrum of the positive support of the third power of U (denoted by
$sp(S^+(U^3))$) is shown to distinguish cospectral graphs. Thus, $sp(S^+(U^3))$ proves an
effective graph representation matrix.
5 Characteristic Polynomials
Once the graph representation matrices are to hand, our task is how to characterize
graphs using different matrix representations and thus distinguish graphs from different
classes. One simple but effective way to embed graphs into a pattern space is furnished
by spectral methods [7]. The eigenvalues of the representation matrices are used as the
elements of graph feature vectors. However, graphs with different sizes have different
numbers of eigenvalues. There are generally two ways to overcome this problem. The
first is to establish the pattern space with a dimensionality the same as the cardinality of
the vertex set of the graph with the greatest size. Feature vectors of the smaller graphs
are adjusted to the same length by padding zeros before the non-zero eigenvalues up
to the dimension of the pattern space. One drawback of this method is that the upper
bound on the dimension, i.e. the size of the largest graph, should be known before-
hand. Furthermore, for a pattern space with a high dimensionality, there would be too
many unnecessary zeros in the feature vectors of small graphs. The second method for
dealing with the size difference of graphs is spectral truncation. In this case, a fixed-size
subset of eigenvalues for the different graphs is used to establish feature vectors. For
example, a fixed number of the leading non-zero eigenvalues are chosen as the ele-
ments of a feature vector. This method does not require prior knowledge of the size
of the largest graph. Nevertheless, it only takes advantage of a fraction of the spectral
information available and thus induces varying degrees of information loss. To overcome
these drawbacks associated with traditional spectral methods, we take advantage of the
characteristic polynomial of the representation matrices. The characteristic polynomial
p(λ) of a matrix M with size N is defined as follows
Since these coefficients are closely related to the spectrum, they can be regarded as a
possible way to characterize graphs. Here we propose to use the characteristic polyno-
mial coefficients as the elements of the feature vector for a graph, instead of the eigen-
values. In this way we embed the graphs into a pattern space using the feature vectors
based on the characteristic polynomial coefficients. The merit of using the character-
istic polynomial coefficients over spectral embedding methods is that the coefficients
naturally take advantage of the complete spectrum and thus do not induce spectral trun-
cation. Hence the dimensionality of the pattern space can be determined without taking
into account the graph size differences.
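Extracting the coefficient feature vector is straightforward; the sketch below is our own (the function name is ours), using `np.poly`, which for a square matrix returns the monic characteristic polynomial coefficients $[1, c_1, \dots, c_N]$:

```python
import numpy as np

def charpoly_feature_vector(M):
    """Coefficients of the characteristic polynomial det(lambda*I - M),
    used as the graph feature vector; the leading 1 is dropped."""
    return np.poly(np.asarray(M, dtype=float))[1:]
```

For a graph of size N this gives all N coefficients; the experiments in §6 then select the subset $\{c_3, c_4, c_{N-3}, \dots, c_N\}$ and apply logarithmic scaling.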
6 Experimental Results
We experiment with the proposed feature vectors consisting of characteristic poly-
nomial coefficients on graphs extracted from the COIL database (samples shown in
Figure 1). We first extract corner points using the Harris detector. Then we establish
Delaunay graphs based on these corner points as nodes. The graphs extracted from
sample objects are superimposed upon the sample images in Figure 1.
We choose to work with the coefficient subset $\{c_3, c_4, c_{N-3}, c_{N-2}, c_{N-1}, c_N\}$
because these coefficients tend to be the most salient ones in the relevant matrix
representations [8]. We establish the feature vector at two scales: scaling the last
four coefficients by the natural logarithm, $v_1 = \{c_3, c_4, \ln(c_{N-3}), \ln(c_{N-2}), \ln(c_{N-1}), \ln(c_N)\}^T$,
and scaling all the coefficients by the natural logarithm, $v_2 = \{\ln(c_3), \ln(c_4), \ln(c_{N-3}), \ln(c_{N-2}), \ln(c_{N-1}), \ln(c_N)\}^T$.
We conduct tests on the feature vectors consisting of the characteristic polynomial
coefficients of the following alternative matrices:
(a) Adjacency matrix of the original graph;
(b) Laplacian matrix of the original graph;
(c) Adjacency matrix of the oriented line graph (i.e. the Perron-Frobenius operator T
in the Ihara zeta function);
(d) Laplacian matrix associated with the incoming vertex degree of the oriented line
graph;
(e) Laplacian matrix associated with the outgoing vertex degree of the oriented line graph;
(f) Laplacian matrix associated with the sum of incoming and outgoing vertex degrees
of the oriented line graph;
(g) The positive support of the third power of the state transition matrix U of the discrete-time
quantum walk (i.e. $sp(S^+(U^3))$).
We perform PCA on the pattern vectors to embed them into a 3-dimensional space.
We then locate the clusters using K-means method and calculate the Rand index for
the resulting clusters. The Rand index is defined as RI = Z/(Z + Y ), where Z is the
number of agreements and Y is the number of disagreements in cluster assignment. It
takes a value in the interval [0,1], where 1 corresponds to a perfect clustering.
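The pair-counting computation of RI = Z/(Z + Y) can be sketched as follows (our own code; `rand_index` is a hypothetical name):

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """RI = Z / (Z + Y): Z counts pairs of points on which the two
    clusterings agree (both in the same cluster, or both in different
    clusters), Y counts pairs on which they disagree."""
    Z = Y = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            Z += 1
        else:
            Y += 1
    return Z / (Z + Y)
```

Note that the index is invariant to relabelling the clusters, since only pairwise co-membership is counted.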
There are 72 view images of each object in the COIL database. The original image size
is 128 × 128. For the extracted Delaunay graphs, with more than 120 vertices and an
average vertex degree of 5.6, the intermediate and higher coefficients of the characteristic
polynomial of (d), (e) and (f) tend to be larger than $1.79 \times 10^{308}$ and thus
overflow Matlab's double-precision arithmetic. Therefore, for the first set of experiments we resize the images in
the COIL database to a resolution of 70 × 70 to reduce the number of detected corners and
hence to ensure that the computations do not overflow.
Table 5. Rand Indices for traditional spectral methods on 128 × 128 images
The experimental results for the two types of scaled feature vectors on the resized
images are shown in Table 1 and Table 2. From Table 1 and Table 2 we can see that al-
though the within-class variation of c3 and c4 is reasonably small, the scheme in which
coefficients are scaled by the natural logarithm behaves slightly better. For feature vec-
tor v1 , the adjacency matrix of the original graph and the Perron-Frobenius operator T
perform better than the alternatives. For feature vector v2 each of the matrix represen-
tations has a similar performance in distinguishing graph classes.
Furthermore, we test our methods on the images with original size. In this case,
the characteristic polynomial coefficients of the Laplacian matrices for oriented line
graphs and those of the quantum walk $sp(S^+(U^3))$ do not work well due to their
computational inefficiency. Table 3 gives the results using $v_1$ on the adjacency matrix together
with Laplacian matrix for original graphs and the adjacency matrix for the oriented line
graph. Here we can see that the matrix representation in the transformed domain (i.e.
oriented line graph) performs much better than that in original domain. As far as v2 is
concerned, the Perron-Frobenius operator is also generally better than the traditional
matrix representations. To compare the proposed polynomial methods with the tradi-
tional methods, we list the results for traditional spectral methods in Table 5. Among
these three methods, the first two only take advantage of the eigenvalues, while the heat
contents involve the information contained both in the eigenvalues and the eigenvectors.
We can see that the results obtained using characteristic polynomial coefficients of the
oriented line graph are better than those obtained using the traditional methods.
7 Conclusion
represented $sp(S^+(U^3))$ are less efficient to compute due to their extremely large dy-
namic range. On the other hand, the coefficients for the Perron-Frobenius operator do
not have this problem. For reasonably large graphs, the coefficients of the
Perron-Frobenius operator perform better than the alternative methods described in this
paper. Above all, the Perron-Frobenius operator is superior in computational efficiency
and is effective in characterizing graphs among the four matrix representations, from
the characteristic polynomial point of view.
Acknowledgments
We acknowledge the financial support from the FET programme within the EU FP7,
under the SIMBAD project (contract 213250).
References
1. Aharonov, D., Ambainis, A., Kempe, J., Vazirani, U.: Quantum walks on graphs. In: Pro-
ceedings of ACM Theory of Computing (2001)
2. Bai, X., Hancock, E.R., Wilson, R.C.: Graph characteristics from the heat kernel trace. In:
Pattern Recognition (2009) (to appear)
3. Emms, D., Severini, S., Wilson, R.C., Hancock, E.R.: Coined quantum walks lift the cospec-
trality of graphs and trees. In: Proceedings of SSPR (2008)
4. Ihara, Y.: Discrete subgroups of pl(2, kϕ ). In: Proceeding Symposium of Pure Mathematics,
pp. 272–278 (1965)
5. Ihara, Y.: On discrete subgroups of the two by two projective linear group over p-adic fields.
Journal of Mathematics Society Japan 18, 219–235 (1966)
6. Kotani, M., Sunada, T.: Zeta functions of finite graphs. Journal of Mathematics University of
Tokyo 7(1), 7–25 (2000)
7. Luo, B., Wilson, R.C., Hancock, E.R.: Spectral embedding of graphs. Pattern Recogni-
tion 36(10), 2213–2223 (2003)
8. Ren, P., Wilson, R.C., Hancock, E.R.: Graph characteristics from the Ihara zeta function. In:
Proceedings of SSPR (2008)
9. Ren, P., Wilson, R.C., Hancock, E.R.: Pattern vectors from the Ihara zeta function. In: Pro-
ceedings of the 19th International Conference on Pattern Recognition (2008)
10. Wilson, R.C., Luo, B., Hancock, E.R.: Pattern vectors from algebraic graph theory. IEEE
Transactions on Pattern Analysis and Machine Intelligence 27(7), 1112–1124 (2005)
Flow Complexity: Fast Polytopal Graph
Complexity and 3D Object Clustering
1 Introduction
The quantification of the intrinsic complexity of undirected graphs has attracted
significant attention due to its fundamental practical importance, not only in
pattern recognition but also other areas such as control theory and network
analysis. Such a quantification not only allows the complexity of different graph
structures to be compared, but also allows the complexity to be traded against
goodness of fit to data when a structure is being learned. Previous complexity
characterizations include: a) the number of spanning trees and its connections with
the Laplacian spectrum, b) methods based on the path-length chromatic decom-
position, c) the number of Boolean operators necessary to build the graph from
generator graphs, and more recently, d) the notion of linear complexity of any
of the associated adjacency matrices.
In pattern recognition and machine learning the problem of quantifying graph
complexity is not only deeply related to embedding methods [1][2] for structural
classification or indexing [3], but is also key to the process of constructing graph
prototypes [4][5]. Recently, the connection between convex polytopes (and those
of the Birkhoff type in particular), heat kernels in graphs, and graph structure,
has been explored in [6,7]. A new measure of structural complexity, dubbed
polytopal complexity, has been described in the latter works. This measure is connected
with the notion of graph entropy introduced by Körner in [8], and also
with novel spectral-based analysis and categorizations of complex networks [9].
In terms of graph embedding, polytopal complexity compares well with classical
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 253–262, 2009.
© Springer-Verlag Berlin Heidelberg 2009
254 F. Escolano et al.
Thus, the graph complexity, i.e. trace Cβ (G), is a signature of the interaction
between the heat diffusion process and the structure/topology of the graph as
β (and thus the range of interactions between vertices) changes. It can also
be interpreted as a trajectory in Bn (the n−th Birkhoff polytope) between the
extreme point given by the identity permutation PI = In and the barycenter
B∗ = K(Cn ). In addition, the typical signature is heavy tailed, monotonically
increasing from 0 to β ∗ ≡ arg max{Cβ (G)} and monotonically decreasing from
β ∗ to ∞. Thus, β ∗ represents the most significant topological phase transition
regarding the impact of the diffusion process in the topology of the input graph.
In addition, it has been experimentally found [6] that the polytopal functional
is quasi-invariant to graph permutations, that is $C_\beta(G) \approx C_\beta(Q^T A Q)$, where
A is the adjacency matrix of G and Q any permutation matrix of order n. Quasi-
invariance is fulfilled despite the fact that the BvN decomposition does not yield
Flow Complexity: Fast Polytopal Graph Complexity 255
such invariance in the coefficients. The polytopal descriptor has also proved to
be effective for graph embedding and subsequent graph clustering. Experiments
related to protein-protein interaction networks are presented in [7]. However,
the main drawback of the polytopal descriptor is its computational cost. As the
number of iterations of the BvN decomposition is O(n2 ) and a Kuhn-Munkres
algorithm ($O(n^3)$) is executed at each iteration, we have an $O(n^5)$ complexity
per β value. This complexity precludes the use of the descriptor in real-time
pattern-recognition tasks unless the original graph is simplified [10] beforehand.
In addition, the analysis of phase changes is very cumbersome in the polytopal
framework. Thus, a new descriptor is needed: one qualitatively similar to, but more
efficient than, the current one, which also provides a simpler analytical framework.
and K_β^{ij} ∈ [0, 1] is the (i, j) entry of a doubly stochastic matrix. The fact that
the heat kernel in the interval [β∗, βmin] is populated by an increasing number of
elements for which K_β^{ij} ≈ 1, or equivalently an increasing number of off-diagonal
elements for which K_β^{ij} ≈ 0, i ≠ j, motivated the analysis of phase change not
only through permanents, but also through the dynamic quantification of the
total heat flowing through the network represented by the graph. In this context,
heat flowing means heat passing through a given edge of the graph. Therefore,
the total heat flowing through the graph at a given β is:
F_β(G) = Σ_{i=1}^{n} Σ_{j=i}^{n} Σ_{k=1}^{n} δ_ij φ_k(i) φ_k(j) e^{−λ_{n−k+1} β} ,    (4)
where δ_ij = 1 if A_ij = 1 and δ_ij = 0 otherwise; that is, δ_ij = 1 iff (i, j) ∈ E.
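As an illustrative sketch (our own helper, not the authors' implementation), Eq. 4 can be evaluated by building the heat kernel from the eigendecomposition of the combinatorial Laplacian L = D − A and summing its entries over the edges; we pair each eigenvector with its own eigenvalue, which we assume is the intent of the reindexed exponent:

```python
import numpy as np

def total_flow(A, beta):
    """Total heat flowing through the edges of a graph (cf. Eq. 4).

    A is a symmetric 0/1 adjacency matrix; the heat kernel is built
    from the combinatorial Laplacian L = D - A.
    """
    L = np.diag(A.sum(axis=1)) - A
    lam, phi = np.linalg.eigh(L)                 # eigenvalues in ascending order
    K = (phi * np.exp(-beta * lam)) @ phi.T      # heat kernel K_beta
    i, j = np.nonzero(np.triu(A, k=1))           # each edge (i < j) counted once
    return K[i, j].sum()
```

At β = 0 the kernel is the identity, so no heat flows through any edge; as β → ∞ the kernel approaches the barycenter (1/n)J and the flow tends to m/n for a graph with m edges.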
If we take the derivative of F_β with respect to β, plug in the second-order Taylor
expansion of e^{−λ_{n−k+1} β}, and set the derivative to zero, then we have:
F′_β(G) = Σ_{i=1}^{n} Σ_{j=i}^{n} Σ_{k=1}^{n} δ_ij φ_k(i) φ_k(j) (−λ_{n−k+1}) (1 − λ_{n−k+1} β + (λ_{n−k+1} β)²/2!) = 0    (5)
Let r ≤ n denote the number of components in the upper triangular part of A
with δ_ij = 1 (the same as in the lower triangular part), and
(i_1, . . . , i_r), (j_1, . . . , j_r) new indexes for these components. Then we have:
F′_β(G) = Σ_{i=i_1}^{i_r} Σ_{j=j_1}^{j_r} Σ_{k=1}^{n} φ_k(i) φ_k(j) (−λ_{n−k+1}) (1 − λ_{n−k+1} β + (λ_{n−k+1} β)²/2!)

        = Σ_{k=1}^{n} (−λ_{n−k+1}) (1 − λ_{n−k+1} β + (λ_{n−k+1} β)²/2!) Σ_{i=i_1}^{i_r} Σ_{j=j_1}^{j_r} φ_k(i) φ_k(j)    [the inner double sum defines Γ(k)]

        = − Σ_{k=1}^{n} (λ³_{n−k+1}/2) Γ(k) β² + Σ_{k=1}^{n} λ²_{n−k+1} Γ(k) β − Σ_{k=1}^{n} λ_{n−k+1} Γ(k) = 0 ,    (6)
which is a quadratic equation in β. Thus, let β+ be one of the solutions to the
equation F′_β(G) = 0. Instead of solving that equation here, we must consider
that the second derivative at the phase-change point must be negative (local
concavity). So, a valid β+ must satisfy:
F″_β(G) = − Σ_{k=1}^{n} λ³_{n−k+1} Γ(k) β + Σ_{k=1}^{n} λ²_{n−k+1} Γ(k) < 0 ,    (7)
which is only true for β > 0. Actually, we define β+ as the minimum β > 0
satisfying the latter inequality. This rationale is still valid when, to be coherent
with the definition of polytopal complexity in Eq. 1, we take the log₂ of the
number of components in the Birkhoff decomposition. Following the rule of
defining complexity by multiplying entropy and disorder [11], which in our case
corresponds to a normalizing factor, we define the graph flow complexity as:
Cf_β(G) = log₂(1 + F_β(G)) / log₂ n ,    (8)
whose first derivative with respect to β is:
Cf′_β(G) = (1 / ((log₂ n)(ln 2)(1 + F_β(G)))) F′_β(G)

         = (1/Λ_β) [ − Σ_{k=1}^{n} (λ³_{n−k+1}/2) Γ(k) β² + Σ_{k=1}^{n} λ²_{n−k+1} Γ(k) β − Σ_{k=1}^{n} λ_{n−k+1} Γ(k) ] ,    (9)

where Λ_β = (log₂ n)(ln 2)(1 + F_β(G)).
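The coefficients Γ(k) of Eqs. 6–7 can be assembled directly from the Laplacian spectrum. The helper below is a hedged sketch (our own code, not the authors'): it assumes the combinatorial Laplacian L = D − A, absorbs the eigenvalue reindexing by summing over all k, and returns the β value at which the Taylor-approximated second derivative of Eq. 7 changes sign:

```python
import numpy as np

def beta_plus(A):
    """Sign-change point of the Taylor-approximated second derivative
    F''_beta (Eq. 7): beta = sum(lam^2 * Gamma) / sum(lam^3 * Gamma).

    Gamma(k) sums phi_k(i) * phi_k(j) over the edges (upper triangle
    of A), as in Eq. 6.  Illustrative sketch only.
    """
    L = np.diag(A.sum(axis=1)) - A
    lam, phi = np.linalg.eigh(L)                  # ascending eigenvalues
    i, j = np.nonzero(np.triu(A, k=1))
    gamma = (phi[i, :] * phi[j, :]).sum(axis=0)   # Gamma(k), one per eigenpair
    return np.sum(lam**2 * gamma) / np.sum(lam**3 * gamma)
```

Because each Γ(k) is tied to an eigenpair, the result is invariant to eigenvector sign flips and to the basis chosen inside degenerate eigenspaces.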
Setting Cf′_β(G) = 0 thus yields the roots β+ defined in Eq. 6. However, what
about the concavity of Cf_β(G)? The sign of its second derivative must be checked,
where a = (log₂ n)(ln 2). In general, we define β∗ as the minimum β satisfying the
latter inequality. This definition is coherent with the polytopal profile: at β∗ the
function either starts a decay that passes through an inflexion towards the equilibrium,
or reaches it immediately. Equilibrium is defined as the limiting value lim_{β→∞} Cf_β(G).
Although this equilibrium point is different from the polytopal one, the qualitative
behavior of the two profiles is similar and independent of the scale (number of
nodes) of the graphs. The difference between this descriptor and the polytopal
one is that flow complexity is very fast to compute. Therefore, graph simplification
is no longer needed to obtain a functional descriptor in real time. Moreover,
the flow-based profile is also permutation invariant.
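The whole flow-complexity profile of Eq. 8 can be sampled cheaply, which is the point of the descriptor. A minimal sketch (our own code, assuming the combinatorial Laplacian L = D − A), including the empirical location of the phase-transition point β∗ as the argmax of the profile:

```python
import numpy as np

def flow_profile(A, betas):
    """Flow-complexity profile Cf_beta (Eq. 8) sampled on a beta grid."""
    n = len(A)
    L = np.diag(A.sum(axis=1)) - A
    lam, phi = np.linalg.eigh(L)
    i, j = np.nonzero(np.triu(A, k=1))
    pair = phi[i, :] * phi[j, :]            # phi_k(i) * phi_k(j), one row per edge
    F = np.array([(pair * np.exp(-b * lam)).sum() for b in betas])
    return np.log2(1.0 + F) / np.log2(n)

A = np.array([[0, 1, 1, 0], [1, 0, 1, 1], [1, 1, 0, 1], [0, 1, 1, 0]], dtype=float)
betas = np.linspace(0.0, 10.0, 201)
profile = flow_profile(A, betas)
beta_star = betas[np.argmax(profile)]       # most significant phase transition
```

A single eigendecomposition (O(n³)) serves the whole β grid, versus O(n⁵) per β value for the polytopal descriptor. For this 4-vertex graph with 5 edges, the profile starts at 0 and tends to log₂(1 + 5/4)/log₂ 4 at large β.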
In their original formulation, Reeb graphs date back to 1946, when they were
defined by George Reeb as topological constructs [12]. The basic idea is to ob-
tain information concerning the topology of a manifold M from the information
related to the critical points of a real function f defined on M . This is done by
analyzing the behaviour of the level sets L_a of f, namely the sets of points sharing
the same value of f: L_a = f^{−1}(a) = {P ∈ M : f(P) = a}. As the isovalue a spans
the range of its possible values in the co-domain of f , connected components of
level sets may appear, disappear, join, split or change genus. The Reeb graph
keeps track of these changes, and stores them in a graph structure, whose nodes
are associated with the critical points of f .
Reeb graphs were introduced into Computer Graphics by Shinagawa et al.
in 1991 [13], and since then they have become popular for shape analysis and
description. The extension of Reeb graphs to triangle meshes
has attracted considerable interest, and has proved to be one of the most pop-
ular representations for shapes in Computer Graphics. In this paper, we follow
the computational approach in [14,15], where a discrete counterpart of Reeb
graphs, referred to as the Extended Reeb Graph (ERG), is defined for triangle
meshes representing surfaces in R³. The basic idea behind the ERG is to provide a
region-oriented characterization of surfaces, rather than a point-oriented charac-
terization. This is done by replacing the notion and role of critical points with
that of critical areas, and the analysis of the behaviour of the level sets with
the analysis of the behaviour of surface stripes, defined by partitioning the co-
domain of f into a finite set of intervals. We consider in more detail a finite
number of level sets of f , which divide the surface into a set of regions. Each
region is classified as a regular or a critical area according to the number and
the value of f along its boundary components. Critical areas are classified as
Fig. 1. Pipeline of the ERG extraction. Left: surface partition and recognition of critical
areas; blue areas correspond to minima, red areas correspond to maxima, green areas
to saddle areas. Middle: insertion of edges between minima and saddles and between
maxima and saddles, by expanding all maxima and minima to their nearest critical
area. Right: insertion of the remaining edges, to form the final graph.
maximum, minimum and saddle areas. A node in the graph is associated with
each critical area. Then arcs between nodes are detected through an expansion
process of the critical areas, by tracking the evolution of the level sets. The
pipeline of the ERG extraction is illustrated in Fig. 1.
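The stripe-based construction can be illustrated with a small, hypothetical sketch (a simplified analogue of the ERG pipeline, not the implementation of [14,15]): bin the co-domain of f into intervals, split each bin into connected regions, and classify each region by the bins of its neighbouring regions:

```python
from collections import defaultdict

def erg_regions(adj, f, n_bins):
    """Simplified ERG-style analysis on a graph standing in for a mesh.
    adj: dict vertex -> list of neighbours; f: dict vertex -> value."""
    lo, hi = min(f.values()), max(f.values())
    width = (hi - lo) / n_bins or 1.0
    bin_of = {v: min(int((f[v] - lo) / width), n_bins - 1) for v in adj}

    region = {}                                  # vertex -> representative
    for v in adj:
        if v in region:
            continue
        region[v] = v
        stack = [v]
        while stack:                             # flood fill inside one bin
            u = stack.pop()
            for w in adj[u]:
                if w not in region and bin_of[w] == bin_of[v]:
                    region[w] = v
                    stack.append(w)

    nbr_bins = defaultdict(set)                  # region -> bins of neighbours
    for u in adj:
        for w in adj[u]:
            if region[u] != region[w]:
                nbr_bins[region[u]].add(bin_of[w])

    kind = {}
    for r in set(region.values()):
        bins = nbr_bins[r]
        if bins and all(b > bin_of[r] for b in bins):
            kind[r] = 'minimum'                  # all neighbours above
        elif bins and all(b < bin_of[r] for b in bins):
            kind[r] = 'maximum'                  # all neighbours below
        else:
            kind[r] = 'regular/saddle'
    return region, kind
```

On a path with a single bump in f, the two endpoints come out as minima and the apex as a maximum, mirroring the critical-area classification of the text.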
A fundamental property of ERGs is their parametric nature with respect to
the mapping function f : different choices of f produce different graphs. In this
paper, we deal with the comparison of free-form 3D shapes, hence we require a
graph representation that is invariant with respect to rotations and translations
(to distinguish the real content of shapes from the particular choices made by
the 3D designer) and possibly with respect to pose changes. A solution comes
from the adoption of the integral geodesic distance, as discretized in [16]. For
each vertex v in the mesh, the function is defined as
Fig. 2. 3D models described by the integral geodesic distance (red corresponds to high
values of the function, blue corresponds to low values), along with the corresponding
Extended Reeb Graphs
Flow Complexity: Fast Polytopal Graph Complexity 259
of the neighborhood of bi . We recall that the geodesic distance between two given
surface points p and q is the minimal length of all surface curves joining p and
q. Since the geodesic distance relies neither on a local coordinate system nor on
surface embedding, the graph configuration derived is invariant to translation
and rotation of the model, as well as to isometric pose changes, thus being well
suited to deal with articulated 3D shapes. Fig. 2 shows some example models
described by the integral geodesic distance (red corresponds to high values of the
function, blue corresponds to low values), along with their corresponding ERGs.
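A toy version of the integral geodesic distance can be sketched on a graph, using hop counts in place of true geodesic lengths and uniform weights in place of the area terms of the discretization in [16]:

```python
from collections import deque

def integral_geodesic(adj):
    """Toy integral geodesic distance: for each vertex v, sum the
    hop-count geodesic distances to all other vertices.  (The actual
    discretization of [16] samples base points b_i and weights each
    term by the area of the neighbourhood of b_i.)"""
    def bfs(src):
        dist = {src: 0}
        q = deque([src])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        return dist
    return {v: sum(bfs(v).values()) for v in adj}
```

Because only pairwise shortest-path lengths enter the sum, the value at each vertex is unchanged by relabelling or by any transformation preserving the metric, which is the invariance property exploited in the text.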
Fig. 3. The dataset of 300 models used for the experiments in this paper. See the
SHREC 2008 Stability Track Report [17] for further details.
In this paper, we are using the dataset of 300 3D models used in the Stability
Track of the SHREC Contest 2008 [17]. This dataset is made of 15 classes, with
20 models per class. The objects included range from humans and animals to
cups and mechanical parts, as shown in Fig. 3.
3.2 Experimentation
We have tested the new complexity measure as a descriptor (discretization of
the function) of 300 graphs (15 classes with 20 members each, see Fig. 3). These
graphs correspond to the geodesics of the 3D shapes described in the previous
section. We have verified that the profile similarity is approximately invariant
to non-rigid deformations, and thus not effective for clustering the patterns de-
rived from the discretization. Instead we have compared them using the Eu-
clidean distances dij and have constructed a pairwise similarity matrix M where
Mij = e−Kdij (in this case K = 25). We show the resulting similarity matrix in
Fig. 5, with 90,000 entries. The analysis of this matrix reveals two very compact
classes (glasses, spring), and some other classes with a smaller degree of com-
pactness (armadillo, cup, bird). There are also two super-clusters. The first one,
of 6 classes, includes (see one representative profile of each class in Fig. 4-left):
airplane, chair, octopus, table, hand, and fish, with airplane and hand being the
most compact ones. The table class is too sparse. The second super-cluster (see
Fig. 4-right) includes bust, mechanic and fourleg. The human class is highly
similar to elements in both super-clusters, and in particular to the fourleg class.
In both super-clusters, the profiles follow the same characteristic path in
qualitative terms, having similar values of β∗ and different, but sometimes
coincident, values of F_{β∗}(G). This indicates that the descriptor should also be
applied to graphs originating from sources other than geodesics, followed by an
integrated clustering, in order to take these additional measures of similarity
into account.
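The similarity-matrix construction described above is straightforward to sketch (the constant K = 25 follows the text; the helper name is ours):

```python
import numpy as np

def similarity_matrix(profiles, K=25.0):
    """Pairwise similarity M_ij = exp(-K * d_ij), with d_ij the Euclidean
    distance between descriptor profiles (rows of `profiles`)."""
    X = np.asarray(profiles, dtype=float)
    sq = (X ** 2).sum(axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-K * np.sqrt(d2))
```

For 300 profiles this yields the 300 × 300 (90,000-entry) matrix analysed in the text; identical profiles give similarity 1 and dissimilar ones decay exponentially.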
4 Conclusions
In this paper, we have proposed and successfully tested a novel measure of graph
complexity which is qualitatively similar to, but more efficient than the polytopal
one. The novel measure provides a simpler analysis framework for characterizing
the phase-transition locations in terms of graph spectra. The new measure has
been tested on clustering a database of 300 graphs originating from 3D objects.
We have found several clusters and a pair of super-clusters in the data. Most
of the classes in the super-clusters are consistent with the corresponding 3D
shapes. However, there is some scope for improving the discriminating power of
the method. This latter is a topic for future research, and it will be addressed
through computing the descriptors of several graphs originating from the same
shape with alternatives to the geodesics (eigenfunctions, for instance) and then
performing consensus clustering. This approach is feasible since the descriptor is
efficient. We will also compare this method with alternatives described in the
literature, since it is now possible to compare such descriptors for very large
networks.
Acknowledgements
Work partially supported by the Project SHALOM funded by the Italian Min-
istry of Research (contract number RBIN04HWR8) and the EU FP7 Project
FOCUS K3D (contract number ICT-2007.4.2).
References
1. Robles-Kelly, A., Hancock, E.R.: A Riemannian approach to graph embedding.
Pattern Recognition 40, 1042–1056 (2007)
2. Luo, B., Wilson, R.C., Hancock, E.: Spectral embedding of graphs. Pattern Recog-
nition (36), 2213–2223 (2003)
3. Shokoufandeh, A., Dickinson, S., Siddiqi, K., Zucker, S.: Indexing using a spectral
encoding of topological structure. In: IEEE ICPR, pp. 491–497
4. Torsello, A., Hancock, E.: Learning shape-classes using a mixture of tree-unions.
IEEE Trans. on PAMI 28(6), 954–967 (2006)
5. Lozano, M., Escolano, F.: Protein classification by matching and clustering surface
graphs. Pattern Recognition 39(4), 539–551 (2006)
6. Escolano, F., Hancock, E., Lozano, M.: Birkhoff polytopes, heat kernels, and graph
embedding. In: ICPR (2008)
7. Escolano, F., Hancock, E., Lozano, M.: Graph complexity, matrix permanents, and
embedding. In: Proc. SSPR/SPR (2008)
8. Körner, J.: Coding of an information source having ambiguous alphabet and the
entropy of graphs. In: Trans. of the 6th Prague Conference on Information Theory,
pp. 411–425 (1973)
9. Estrada, E.: Graph spectra and structure in complex networks. Technical report,
Institute of Complex Systems at Strathclyde, Department of Physics and Depart-
ment of Mathematics, University of Strathclyde, Glasgow, UK (2008)
10. Qiu, H., Hancock, E.: Graph simplification and matching using commute times.
Pattern Recognition 40, 2874–2889 (2007)
11. López-Ruiz, R., Mancini, H., Calbet, X.: A statistical measure of complexity.
Physics Letters A 209, 321–326 (1995)
12. Reeb, G.: Sur les points singuliers d’une forme de Pfaff complètement intégrable
ou d’une fonction numérique. Comptes Rendus Hebdomadaires des Séances de
l’Académie des Sciences 222, 847–849 (1946)
13. Shinagawa, Y., Kunii, T.L.: Constructing a Reeb Graph automatically from cross
sections. IEEE Computer Graphics and Applications 11(6), 44–51 (1991)
14. Biasotti, S.: Computational Topology Methods for Shape Modelling Applications.
Ph.D. thesis, Università degli Studi di Genova (May 2004)
15. Biasotti, S.: Topological coding of surfaces with boundary using Reeb graphs. Com-
puter Graphics and Geometry 7(1), 31–45 (2005)
16. Hilaga, M., Shinagawa, Y., Kohmura, T., Kunii, T.L.: Topology matching for fully
automatic similarity estimation of 3D shapes. In: SIGGRAPH 2001: Proceedings
of the 28th Annual Conference on Computer Graphics and Interactive Techniques,
Los Angeles, CA, pp. 203–212. ACM Press, New York (2001)
17. Attene, M., Biasotti, S.: Shape retrieval contest 2008: Stability of watertight mod-
els. In: Spagnuolo, M., Cohen-Or, D., Gu, X. (eds.) SMI 2008: Proceedings IEEE
International Conference on Shape Modeling and Applications, pp. 219–220 (2008)
Irregular Graph Pyramids and Representative
Cocycles of Cohomology Generators
1 Introduction
Image analysis deals with digital images as input to pattern recognition systems.
Topological features have the ability to ignore changes in geometry caused by dif-
ferent transformations. Simple features are for example the number of connected
components, the number of holes, etc., while more refined ones, like homology
and cohomology, characterize holes and their relations.
In order to characterize the holes in a region adjacency graph (RAG) associated
with a 2D binary digital image, one way is to consider the cycles with
exactly 4 edges as degenerate cycles and establish an equivalence between all the
cycles of the graph as follows: two cycles are equivalent if one can be obtained
from the other by joining to it one or more degenerate cycles. There is only one
equivalence class for the foreground (gray pixels) of the digital image in Fig. 1,
which represents the unique hole. This is similar to considering the digital image
Partially supported by the Austrian Science Fund under grants S9103-N13 and
P18716-N13; Junta de Andalucı́a (grants FQM-296 and TIC-02268) and Spanish
Ministry for Science and Education (grant MTM-2006-03722).
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 263–272, 2009.
264 R. Gonzalez-Diaz et al.
Fig. 1. a) A 2D digital image I; b) its RAG; c) a cell complex associated to I (in blue,
a representative cocycle); and d) the cell complex without the hole
as a cell complex1 [1] (see Fig. 1.c). Here one can ask for the edges we have to
delete in order to ‘destroy’ the hole.
In the example in Fig. 1 it is not enough to delete only one edge. The set of blue
edges in Fig. 1.c blocks any cycle that surrounds the hole; the deletion of these
edges together with the faces that they bound produces the ‘disappearing’ of
the hole. A 1-cocycle of a planar object can be seen as a set of edges ‘blocking’
the creation of cycles of one homology class. The elements of cohomology are
equivalence classes of cocycles.
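Over Z/2 coefficients, the cocycle condition can be checked mechanically: a 1-cochain c is a cocycle iff it evaluates to zero on the boundary of every 2-cell. A minimal sketch (hypothetical helper, not from the paper):

```python
import numpy as np

def is_cocycle(c, d2):
    """A 1-cochain c (0/1 vector indexed by edges) is a 1-cocycle iff it
    vanishes on the boundary of every 2-cell: c @ d2 == 0 (mod 2).
    Column j of d2 is the 0/1 incidence vector of the boundary of face j."""
    return not np.any((np.asarray(c) @ np.asarray(d2)) % 2)
```

For a single square face bounded by four edges, a cochain meeting that boundary in an even number of edges passes the test, while one meeting it an odd number of times fails.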
Topology simplification is an active field in geometric modeling and medical
imaging (see for example [2]). In fact, the ring structure present in cohomology
makes it a more refined invariant than homology. The main drawbacks of using
cohomology in Pattern Recognition have been its lack of geometrical meaning and
the complexity of computing it. Nevertheless, concepts related to cohomology can
have associated interpretations in graph theory. Having these interpretations opens
the door to applying classical graph-theory algorithms to compute and manipulate
these features. Initial ideas regarding this research are presented in Section 5.
The paper is organized as follows: Sections 2 and 3 recall graph pyramids
and cohomology, and make initial connections. Section 4 presents the proposed
method. Section 5 gives considerations regarding the usage of cohomology in
image processing. Section 6 concludes the paper.
Fig. 2. A digital image I, and boundary graphs G6 , G10 and G16 of the pyramid of I
correspondence between the edges of G and the edges of Ḡ, which also induces
a one-to-one correspondence between the vertices of G and the 2D cells (which will be
denoted by faces 2 ) of Ḡ. The dual of Ḡ is again G. The following operations are
equivalent: edge contraction in G with edge removal in Ḡ, and edge removal in
G with edge contraction in Ḡ.
A (dual) irregular graph pyramid [3,4] is a stack of successively reduced planar
graphs P = {(G0 , Ḡ0 ), . . . , (Gn , Ḡn )}. Each level (Gk , Ḡk ), 0 < k ≤ n, is obtained
by first contracting edges in Gk−1 (removal in Ḡk−1 ), if their end vertices have
the same label (regions should be merged), and then removing edges in Gk−1
(contraction in Ḡk−1 ) to simplify the structure. The contracted and removed
edges are said to be contracted or removed (sometimes called removal edges)
in (Gk−1 , Ḡk−1 ). In each Gk−1 and Ḡk−1 , contracted edges form trees called
contraction kernels. One vertex of each contraction kernel is called a surviving
vertex and is considered to have ‘survived’ to (Gk , Ḡk ). The vertices of a
contraction kernel in level k − 1 form the reduction window of the respective
surviving vertex v in level k. The receptive field of v is the (connected) set of
vertices from level 0 that have been ‘merged’ to v over levels 0 . . . k.
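The label-driven contraction step can be sketched with a union-find structure (an illustrative simplification of dual graph contraction [3], ignoring the dual-graph bookkeeping):

```python
class ContractionKernel:
    """Minimal sketch of label-driven contraction: edges whose end
    vertices carry the same label are contracted; the contracted edges
    form a forest whose trees are the contraction kernels."""

    def __init__(self, n):
        self.parent = list(range(n))

    def find(self, v):
        while self.parent[v] != v:
            self.parent[v] = self.parent[self.parent[v]]  # path halving
            v = self.parent[v]
        return v

    def contract(self, edges, label):
        kernels = []
        for u, v in edges:
            if label[u] == label[v] and self.find(u) != self.find(v):
                self.parent[self.find(u)] = self.find(v)
                kernels.append((u, v))
        return kernels          # contracted edges (the contraction kernels)
```

After contraction, `find` returns one surviving representative per merged region; the set of base-level vertices sharing a representative corresponds to the receptive field of that surviving vertex.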
For each boundary graph Gi , the cell complex [5] associated to the foreground
object, called boundary cell complex, is obtained by taking all faces of Gi cor-
responding to vertices of Gi , whose receptive fields contain (only) foreground
pixels, and adding all edges and vertices needed to represent the faces.
Lemma 1. All the boundary cell complexes of a given irregular dual graph pyra-
mid are cell subdivisions of the same object. Therefore, all these cell complexes
are homeomorphic.
As a result of Lemma 1, topological invariants computed on different levels of
the pyramid are equivalent.
0-cells {v1 , v2 , v3 , v4 }
1-cells {e1 , e2 , e3 , e4 , e5 , e6 }
2-cells {f1 }
1-boundary ∂f1 = e1 + e2 + e5
1-chain e1 + e3
1-cycle a = e3 + e4 + e5
1-cycle b = e1 + e2 + e3 + e4
homologous cycles a and b; since a = b + ∂f1
kernel is the group of cocycles and its image is the group of coboundaries. Two
p-cocycles c and c′ are cohomologous if there exists a p-coboundary d such that
c′ = c + d. The p-th cohomology group is defined as the quotient of the p-cocycle
group modulo the p-coboundary group, H^p = Z^p/B^p, for all p. Each element of H^p is a
class obtained by adding each p-coboundary to a given p-cocycle c. Then c is a
representative cocycle of the cohomology class c + B^p. If the object is embedded
in R3 , then homology and cohomology are isomorphic. However, cohomology has
a ring structure which is a more refined invariant than homology. See [5] for a
more detailed explanation.
Starting from a cell decomposition of an object (e.g. from any level of the
pyramid) and the chain complex associated to it, · · · →^{∂2} C1 →^{∂1} C0 →^{∂0} 0, take a
q-cell σ and a (q + 1)-chain α. An integral operator [6] is defined as the set of
homomorphisms {φp : Cp → Cp+1 }p≥0 such that φq (σ) = α, φq (μ) = 0 if μ is a
q-cell different from σ, and for all p ≠ q and any p-cell γ we have φp (γ) = 0. It is
extended to all p-chains by linearity.
Integral operators can be seen as a kind of inverse boundary operator. They
satisfy the condition φp+1 φp = 0 for all p. An integral operator {φp : Cp →
Cp+1 }p≥0 satisfies the chain-homotopy property iff φp ∂p+1 φp = φp for each p.
For φp satisfying the chain-homotopy property, define πp = idp +φp−1 ∂p +∂p+1 φp
where {idp : Cp → Cp }p≥0 is the identity. Then, · · · →^{∂2} imπ1 →^{∂1} imπ0 →^{∂0} 0 is
a chain complex and {πp : Cp → imπp } is a chain equivalence [5]. Its chain-
homotopy inverse is the inclusion map {ιp : imπp → Cp }.
Consider, for example, the cell complex K in Fig. 4 on the left. The integral
operator associated to the removal of the edge e is given by φ1 (e) = B. Then,
π1 (e) = a + f + d, π2 (B) = 0, π2 (A) = A + B (A + B is renamed as A in K′), and
πp is the identity over the other p-cells of K, p = 0, 1, 2. The removal of edge e
decreases the degree of vertices 1 and 3, allowing for further simplification.
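The integral-operator calculus can be verified mechanically on a toy Z/2 example. The boundary matrices below are assumptions chosen to reproduce the stated values φ1(e) = B, π1(e) = a + f + d and π2(A) = A + B; they are not a literal transcription of Fig. 4:

```python
import numpy as np

# Basis: edges (e, a, f, d), faces (A, B); chains are 0/1 vectors mod 2.
d2 = np.ones((4, 2), dtype=int)        # toy boundary: dA = dB = e + a + f + d
phi1 = np.zeros((2, 4), dtype=int)
phi1[1, 0] = 1                         # integral operator: phi1(e) = B

# pi_p = id_p + phi_{p-1} d_p + d_{p+1} phi_p, with phi0 = 0 and phi2 = 0:
pi1 = (np.eye(4, dtype=int) + d2 @ phi1) % 2
pi2 = (np.eye(2, dtype=int) + phi1 @ d2) % 2
```

One can check that φ1 satisfies the chain-homotopy property φ1 ∂2 φ1 = φ1, that π1 sends e to a + f + d, and that π2 sends A to A + B and B to 0, matching the example values.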
The following lemma guarantees the correctness of the down projection pro-
cedure given in Section 5.
Lemma 2. Let {φp : Cp → Cp+1 }p≥0 be an integral operator satisfying the
chain-homotopy property. The chain complexes · · · →^{∂2} C1 →^{∂1} C0 →^{∂0} 0 and
· · · →^{∂2} imπ1 →^{∂1} imπ0 →^{∂0} 0 have isomorphic homology and cohomology groups. If c :
imπp → Z/2 is a representative p-cocycle of a cohomology generator, then cπ :
Cp → Z/2 is a representative p-cocycle of the same generator.
For example, consider the cell complex K of Fig. 4. The 1-cochain α, defined
by the set {c, d} of edges of K′, is a 1-cocycle which ‘represents’ the white hole
    cell              φ    π
    e                 B    a + f + d
    B                 0    0
    A                 0    A
    other p-cell σ    0    σ
H (in the sense that all the cycles representing the hole must contain at least
one of the edges of the set). Then β = απ is defined by the set {c, d, e} of
edges of K. α and β are both 1-cocycles representing the same white hole H.
Lemma 5. The boundary cell complex of any level of the pyramid and the one of
the homology-generator level have isomorphic homology and cohomology groups.
Note that these graphs were defined from the integral operators associated to
the removed and contracted edges of the boundary graph of level k − 1 to obtain
level k. An example of the down projection is shown in Fig. 5.b.
Let n be the height of the pyramid (number of levels), en the number of edges
in the top level, and v0 the number of vertices in the base level, with n ≈ log v0
(logarithmic height). An upper bound for the computation complexity is: O(v0 n),
to build the pyramid; for each foreground component, O(h) in the number of
holes h, to choose the representative cocycles in the top level; O(en n) to down
project the cocycles (each edge is contracted or removed only once). Normally not
all edges are part of cocycles, so the real complexity of down projecting a cocycle
is below O(en n). The overall computation complexity is: O(v0 n+c(hen n)), where
c is the number of cocycles that are computed and down projected.
and β in G0 : the cycles ι(α) = a and ι(β) = b, respectively. Take any edge ea ∈ a
and eb ∈ b. Let fa , fb be faces of K0 having ea and eb , respectively, in their boundary.
Let v0 , v1 , . . . , vn be a simple path of vertices in G0 s.t. all vertices are labeled as
foreground. v0 is the vertex associated to fa , and vn to fb .
Proposition 1. Consider the set of edges c = {e0 , . . . , en+1 } of G0 , where e0 =
ea , en+1 = eb , and ei , i = 1 . . . n, is the common edge of the regions in G0
associated with the vertices vi−1 and vi . c defines a cocycle cohomologous to the
down projection of the cocycle α∗ .
Proof. c is a cocycle iff c∂ is the null homomorphism. First, c∂(fi ) = c(ei +
ei+1 ) = 1 + 1 = 0. Second, if f is a 2-cell of K0 , f = fi , i = 0, . . . , n, then,
c∂(f ) = 0. Proving that the cocycles c and α∗ π (the down projection of α∗ to
the base level of the pyramid) are cohomologous is equivalent to showing that
cι = α∗ . We have that cι(α) = c(eb ) = 1 and cι(β) = c(ea ) = 1. Finally, cι over
the remaining self-loops of the boundary graph of the homology-generator level
is null. Therefore, cι = α∗ .
Observe that the cocycle c in G0 may correspond to the path connecting two
boundaries and having the minimal number of edges: ‘a minimal representative
cocycle’. As a descriptor for the whole object, take a set of minimal cocycles
having some common property3 .
Lemma 6. Let γ ∗ be a representative 1-cocycle in G0 , whose projection in the
homology-generator level is the cocycle α∗ defined by the two self-loops {α, β}.
γ ∗ has to satisfy that it contains an odd number of edges of any cycle g in G0
that is homologous to ι(α), the down projection of α in G0 .
Proof. γ ∗ contains an odd number of edges of g iff γ ∗ (g) = 1. First, there exists
a 2-chain b in K0 such that g = ι(α) + ∂(b). Second, γ ∗ (g) = γ ∗ (ι(α) + ∂(b)) = 1,
since γ ∗ ι(α) = α∗ (α) = 1, and γ ∗ ∂(b) = 0 because γ ∗ is a cocycle. So g must
contain an odd number of edges of the set that defines γ ∗ .
Consider the triangulation in Fig. 7, corresponding to a torus4 . Any cycle homol-
ogous to β contains an odd number of edges of β ∗ (e.g. dotted edges in Fig. 7.c).
3
E.g. they all connect the boundaries of holes with the ‘outer’ boundary of the object,
and each of them corresponds to an edge in the inclusion tree of the object.
4
Rectangle where bottom and top, respectively left and right edges are glued together.
Fig. 7. A torus: a) triangulation; b) representative cycles of homology generators; c) a
representative cocycle; d) and e) non-valid representative cocycles
The dotted edges in d) and e) do not form valid representative cocycles: in d),
a cycle homologous to β does not contain any edge of β ∗ ; in e), another cycle
homologous to β contains an even number of edges of β ∗ .
6 Conclusion
This paper considers cohomology in the context of graph pyramids. Representa-
tive cocycles are computed at the reduced top level and down projected to the
base level corresponding to the original image. Connections between cohomol-
ogy and graph theory are proposed, considering the application of cohomology
in the context of classification and recognition. Extension to higher dimensions,
where cohomology has a richer algebraic structure than homology, and complete
cohomology - graph theory associations are proposed for future work.
References
1. Hatcher, A.: Algebraic topology. Cambridge University Press, Cambridge (2002)
2. Wood, Z.J., Hoppe, H., Desbrun, M., Shroder, P.: Removing excess topology from
isosurfaces. ACM Trans. Graph. 23(2), 190–208 (2004)
3. Kropatsch, W.G.: Building irregular pyramids by dual graph contraction. IEE-Proc.
Vision, Image and Signal Processing 142(6), 366–374 (1995)
4. Kropatsch, W.G., Haxhimusa, Y., Pizlo, Z., Langs, G.: Vision pyramids that do not
grow too high. Pattern Recognition Letters 26(3), 319–337 (2005)
5. Munkres, J.R.: Elements of Algebraic Topology. Addison-Wesley, Reading (1993)
6. González-Dı́az, R., Jiménez, M.J., Medrano, B., Molina-Abril, H., Real, P.: Integral
operators for computing homology generators at any dimension. In: Ruiz-Shulcloper,
J., Kropatsch, W.G. (eds.) CIARP 2008. LNCS, vol. 5197, pp. 356–363. Springer,
Heidelberg (2008)
7. Peltier, S., Ion, A., Kropatsch, W.G., Damiand, G., Haxhimusa, Y.: Directly com-
puting the generators of image homology using graph pyramids. Image and Vision
Computing (2008) (in press), doi:10.1016/j.imavis.2008.06.009
8. Iglesias-Ham, M., Ion, A., Kropatsch, W.G., Garcı́a, E.B.: Delineating homology
generators in graph pyramids. In: Ruiz-Shulcloper, J., Kropatsch, W.G. (eds.)
CIARP 2008. LNCS, vol. 5197, pp. 576–584. Springer, Heidelberg (2008)
Annotated Contraction Kernels
for Interactive Image Segmentation
Hans Meine
1 Introduction
One of the most valuable and most often employed tools for image segmenta-
tion is the watershed transform, which is based on a solid theory and extracts
object contours even with low contrast. On the other hand, it is often criticized
for delivering a strong oversegmentation, which is simply a consequence of the
fact that the watershed transform has no built-in relevance filtering. Instead, it
is often used as the basis for a hierarchical segmentation setting in which an
initial oversegmentation is successively reduced, i.e. by merging adjacent regions
that are rated similar by some appropriate cost measure (e.g. the difference of
their average intensity) [1,2,3,4]. This bottom-up approach fits very well with
the concept of irregular pyramids [5,6], and the main direction of this work is to
show how the Active Paintbrush – an interactive segmentation tool developed
for medical imaging [2] – and an automatic region merging [2,3,7] can be for-
mulated based on the concepts of irregular pyramids and contraction kernels.
This serves three goals: a) delivering a useful, practical application of contrac-
tion kernels, b) basing the description of segmentation methods on well-known
concepts instead of their own, specialized representation, and c) demonstrating
how a common representation facilitates the development of a more efficient
integration of the above automatic and interactive methods.
The following sections are organized as follows: In section 2, we summarize
previous work on the Active Paintbrush and automatic region merging (2.1) and
on irregular pyramids and contraction kernels (2.2). Section 3 combines these
concepts and introduces the ideas of continuous pyramids and annotated con-
traction kernels (3.1), before proposing methods that exploit this new foundation
for a better integration of automatic and interactive tools (section 3.2).
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 273–282, 2009.
2 Previous Work
2.1 The Active Paintbrush Tool
The Active Paintbrush was introduced by Maes [2] as an efficient interactive seg-
mentation tool for medical imaging. It is based on an initial oversegmentation
produced using the watershed transform, and a subsequent merging of regions.
The latter is performed in two steps:
Since this is a pure bottom-up approach (i.e. the number of regions monotonically
decreases, and no new boundaries are introduced), this approach relies on all
important boundaries being already present in the initial oversegmentation. The
user steers the amount of merging performed in the first step in order to remove
as many boundaries as possible (to reduce the time spent in the second step)
without losing relevant parts.
Merge Tree Representation. For this work, it is important to highlight the in-
ternal representation built within the first step, in which the automatic region
merging iteratively merges the two regions rated most similar (an equivalent
approach is used in [2,3,7,8]). This process is continued until the whole image is
represented by one big region, and at the same time a hierarchical description
of the image is built up: a tree of merged regions, the leaves of which are the
primitive regions of the initial oversegmentation (illustrated in Fig. 1a). This
tree can also be interpreted as encoding a stack of partitionings, each of which
contains one region less than the one below.
Fig. 1. (a) Full merge tree (10 regions); (b) pruned tree (7 regions)
By labeling each merged node with the step in which the merge happened, it
becomes very easy to prune the tree as the user adjusts the amount of merging
interactively: for instance, the partitioning at level l = 4 within the
above-mentioned stack can be retrieved by pruning all branches below nodes
with a label ≤ l (cf. Fig. 1b).
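The label-based pruning rule can be sketched in a few lines (a hypothetical data structure for illustration, not Maes' implementation):

```python
class MergeNode:
    """Node of the merge tree; leaves are primitive regions, inner
    nodes carry the step (label) at which the merge happened."""
    def __init__(self, label=0, children=()):
        self.label = label          # merge step; 0 for leaves
        self.children = list(children)

def regions_at_level(node, l):
    """Return the regions of the partitioning at merge level l by
    pruning all branches below nodes whose label <= l."""
    if node.label <= l:             # this subtree is one region at level l
        return [node]
    regions = []
    for child in node.children:
        regions.extend(regions_at_level(child, l))
    return regions

# Ten primitive regions merged pairwise in nine steps:
leaves = [MergeNode() for _ in range(10)]
tree = leaves[0]
for step, leaf in enumerate(leaves[1:], start=1):
    tree = MergeNode(label=step, children=(tree, leaf))

print(len(regions_at_level(tree, 9)))  # apex: 1 region
print(len(regions_at_level(tree, 0)))  # bottom: 10 regions
```

With this encoding, moving the merge slider only changes the pruning threshold l; the tree itself is never rebuilt.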
Limitations. While this approach already allows for a relatively efficient
interactive segmentation, there is one limitation, which we remove in this
article, that greatly reduces efficiency: the two above-mentioned steps are
strictly separated. This is unfortunate, since the automatic method used in
the first step generally produces partitionings that suffer from
oversegmentation in some parts but have already removed crucial edges
elsewhere, e.g. at locations with very low contrast. Thus, the merge parameter
has to be set low enough not to lose the part with the lowest contrast, and
the interactive paintbrush needs to be used to remove all unwanted edges in
all other areas, too. It would be helpful if it were possible to just make the
needed manual changes and then go back to the automatic method to quickly
finish the segmentation.
of darts together, i.e. each α-orbit represents an edge of the boundary graph.
The permutation σ then encodes the counter-clockwise order of darts around
a vertex, i.e. each σ-orbit corresponds to a vertex of the boundary graph. By
convention, D ⊂ Z \ {0}, such that α can be efficiently encoded as α(d) := −d.
The dual permutation of σ is defined as ϕ = σ ◦ α and thus encodes the order
of darts encountered during a contour traversal of the face to the right, i.e. each
ϕ-orbit represents a face of the tessellation.
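These definitions can be illustrated directly in code (a minimal sketch, not the representation used in the paper; the example map, two vertices joined by two parallel edges, is chosen for demonstration):

```python
def orbits(perm, darts):
    """Decompose a permutation (given as a dict) into its orbits."""
    seen, result = set(), []
    for d in darts:
        if d in seen:
            continue
        orbit, cur = [], d
        while cur not in seen:
            seen.add(cur)
            orbit.append(cur)
            cur = perm[cur]
        result.append(orbit)
    return result

# Darts are non-zero integers; alpha(d) = -d pairs the two darts of an edge.
# Example: two vertices joined by two parallel edges (a 2-gon).
sigma = {1: 2, 2: 1, -1: -2, -2: -1}   # CCW dart order around each vertex
darts = sorted(sigma)
phi = {d: sigma[-d] for d in darts}     # phi = sigma ∘ alpha encodes faces

vertices = orbits(sigma, darts)  # sigma-orbits -> vertices
faces = orbits(phi, darts)       # phi-orbits   -> faces
edges = len(darts) // 2          # alpha-orbits pair darts -> edges
print(len(vertices), edges, len(faces))  # 2 2 2, Euler: 2 - 2 + 2 = 2
```

Note how the faces are obtained from σ and α alone, which is exactly why the dual graph need not be stored explicitly.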
In contrast to earlier representations using simple [5,6] or dual graphs [12],
combinatorial maps explicitly encode the cyclic order of darts around a face,
which makes the computation of the dual graph so efficient that it does not need
to be represented explicitly anymore.
Nevertheless, combinatorial maps also suffer from some limitations, most
notably that they rely on “pseudo edges” or “fictive edges” [12,13] to connect
otherwise separate boundary components. In topological terms, such edges are
commonly called “bridges”, since every path between their end nodes must pass
through them. These artificial connections have several drawbacks:
– In some situations, one may want to have bridges represent existing image
features, for instance incomplete boundary information or skeleton parts. This
would require algorithms to differentiate between fictive and real bridges.
– If we relate topological edges with their geometrical counterparts, we are
faced with the problem that fictive edges do not correspond to any ge-
ometrical entity. Even topologically, fictive edges “appear arbitrarily
placed” [13].
– They lead to inefficient algorithms; e.g. contour traversals are needed to
determine the number of holes or to find an enclosing parent region.
Because of the above limitations, combinatorial maps are often used in conjunc-
tion with an inclusion relation that replaces the fictive edges [14,15].
Using these topological formalisms, segmentation algorithms can rely on a
sound topology that allows them to work with regions and boundaries as duals
of each other. However, segmentation first and foremost relies on an encoding of
the tessellation’s geometry, which is not represented by the above maps. Thus,
they are typically used side-by-side with a label image or similar.
Therefore, we have introduced the GeoMap [8,16,17], which represents both
topological and geometrical aspects of a segmentation, thus allowing algorithms
to no longer deal with pixels directly, and ensuring consistency between geometry
and topology. In particular, this makes algorithms independent of the embedding
model and allows the use of either inter-pixel boundaries [18], 8-connected pixel
boundaries [16], or sub-pixel precise polygonal boundaries [8,17].
for irregular pyramids (for brevity, we give the graph-based definition here, which
is less involved than the analogous definition on combinatorial maps [10]):
A contraction kernel is applied to a graph whose vertices represent regions (cf. the
dual map (D, ϕ)) by contracting all edges in N, such that for each tree of
the forest, all vertices connected by the tree are identified and represented
by its root s ∈ S (details on contractions within combinatorial maps may be
found in [11]). In simple terms, a contraction kernel specifies groups of
adjacent regions within a segmentation that should be merged together.
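On the region graph, applying a contraction kernel amounts to relabeling every vertex of each kernel tree with the tree's root. A minimal union-find sketch (the function name and inputs are illustrative, not the paper's API):

```python
def apply_contraction_kernel(vertices, forest_edges, roots):
    """Contract each tree of the kernel: every vertex is identified
    with (relabeled to) the root of its tree.  `forest_edges` must
    form a spanning forest whose trees each contain one root."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for u, v in forest_edges:               # contract all edges in N
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
    # re-point every class at its designated root (the surviving vertex)
    rep = {find(r): r for r in roots}
    return {v: rep[find(v)] for v in vertices}

# Regions 1..5; kernel merges {1,2,3} into root 1 and {4,5} into root 4
labels = apply_contraction_kernel(
    vertices=[1, 2, 3, 4, 5],
    forest_edges=[(2, 1), (3, 1), (5, 4)],
    roots=[1, 4],
)
print(labels)  # {1: 1, 2: 1, 3: 1, 4: 4, 5: 4}
```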
(a) annotated contraction kernel; (b) contraction kernel for the fourth level.
Level 8646 with 374 regions (cfh = 0.5), and level 9000 with 20 regions left (cfh ≈ 5.07).
As described in section 2.1, the use of the Active Paintbrush [2] consists of two
steps: after the oversegmentation and the hierarchical representation have been
computed, the user first adjusts the level of automatic merging by choosing an
appropriate level from the imaginary stack of tessellations. Afterwards, the operator
uses the Active Paintbrush to “paint over” any remaining undesirable boundaries
within the object of interest, which effectively creates new pyramid levels.
We can now implement the automatic and the interactive reduction methods
based on the same internal, map-based representation and contraction kernels.
This opens up new possibilities with respect to the combination of the tools, i.e.
we can now use one after the other for reducing the oversegmentation and creating
further pyramid levels up to the desired result. This is illustrated in Fig. 5a: the lev-
els of our continuous pyramid are ordered from level 0 (initial oversegmentation)
on the left to level 2834 (the apex, at which the whole image is represented as one
single region) on the right. The current pyramid is the result of applying first the
automatic region merging (ARM), then performing some manual actions with the
Active Paintbrush (APB), then using the ARM again.
(c) the cost threshold is interactively adjusted so that no boundaries are damaged (114 regions remaining)
(d) with a few strokes, single critical regions are finalized and "fixed" by protecting the faces (white, hatched)
(e) now, automatic region merging can be applied again, without putting the protected regions at risk (30 regions left)
(f) with two quick final strokes, three remaining unwanted regions are removed to get this final result (27 regions)
Fig. 6. Example session demonstrating our new face protection concept; the captions
explain the user actions for going from (a) to (f)
However, this architecture poses difficulties when the user is given the freedom
to e.g. change the cost measure employed by the ARM or to navigate to lower
pyramid levels than those generated manually: it is very unintuitive if the results
of one’s manual actions disappear from the working level, or if the pyramid is even
recomputed such that they are lost completely. Again, the solution lies in the con-
cept of equivalent contraction kernels, which make it possible to reorder merges:
we represent the results of applying the Active Paintbrush in separate contraction
kernels such that they always get applied first, see Fig. 5b. (This is equivalent to
labeling the edges within our annotated contraction kernel with zero.) In effect,
this makes it possible to locally finish the segmentation of an object at the de-
sired pyramid level, but to go back to lower pyramid levels when one notices that
important edges are missing in other parts of the image.
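The zero-labeling trick can be sketched as follows (hypothetical code: each kernel edge is annotated with the pyramid level of its merge, and manual paintbrush merges are labeled 0 so that they are always applied first):

```python
def partition_at_level(vertices, annotated_edges, level):
    """Contract all kernel edges whose annotation label is <= level.
    Manual (paintbrush) merges carry label 0, so they are applied
    before any automatically generated level."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for u, v, label in sorted(annotated_edges, key=lambda e: e[2]):
        if label <= level and find(u) != find(v):
            parent[find(v)] = find(u)
    return {v: find(v) for v in vertices}

# One manual merge (label 0) plus automatic merges at levels 1 and 2:
edges = [(1, 2, 1), (3, 4, 2), (2, 3, 0)]
low = partition_at_level([1, 2, 3, 4], edges, level=0)
full = partition_at_level([1, 2, 3, 4], edges, level=2)
print(len(set(low.values())), len(set(full.values())))  # 3 1
```

Even when the user navigates down to level 0, the manual merge of regions 2 and 3 survives, which is exactly the behavior described above.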
We also add the concept of face protection to improve the workflow in the
opposite direction: often, the Active Paintbrush is used to remove all unwanted
edges within the contours of an object of interest. Then, it should be possible
to navigate to higher pyramid levels without losing it again, so we provide a
means to protect a face, effectively finalizing all of its contours. An example
segmentation session using these tools is illustrated in Fig. 6.
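Face protection can be modeled as a constraint on the automatic merging step (an illustrative sketch, not the author's implementation): the cheapest boundary edge is chosen only among edges that do not bound a protected face.

```python
def cheapest_mergeable_edge(edges, protected_faces):
    """edges: list of (cost, face_a, face_b).  An edge may be removed
    (its two faces merged) only if neither incident face is protected."""
    candidates = [e for e in edges
                  if e[1] not in protected_faces and e[2] not in protected_faces]
    return min(candidates, default=None)

edges = [(0.1, 'A', 'B'), (0.4, 'B', 'C'), (0.7, 'C', 'D')]
# Protecting face 'A' finalizes its contours: the cheap A-B edge is skipped.
print(cheapest_mergeable_edge(edges, {'A'}))  # (0.4, 'B', 'C')
```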
4 Conclusions
In this paper, we have shown how the theory of contraction kernels within
irregular pyramids can be used as a solid foundation for the formulation of inter-
active segmentation methods. We have introduced annotated contraction kernels
in order to be able to quickly retrieve a contraction kernel suitable for efficiently
computing any desired level directly from the pyramid’s bottom or from any of
the levels in between. Furthermore, we have argued that logarithmic tapering
with a fixed reduction factor is irrelevant for irregular pyramids in contexts like
ours, and we have introduced the term continuous pyramids for the degenerate
case in which each level has only one region less than the one below.
On the other hand, we proposed two extensions around the Active Paintbrush
tool which make it even more effective. First, we have expressed both the auto-
matic region merging and the interactive method as reduction operations within
a common irregular pyramid representation. This allowed us to apply the the-
ory of equivalent contraction kernels in order to separate the representation of
manual actions from automatically generated pyramid levels and thus to enable
the user to go back and forth between segmentation tools. Along these lines,
we have also introduced the concept of face protection which complements the
Active Paintbrush very well in a pyramidal context.
References
1. Najman, L., Schmitt, M.: Geodesic saliency of watershed contours and hierarchical
segmentation. IEEE T-PAMI 18, 1163–1173 (1996)
2. Maes, F.: Segmentation and Registration of Multimodal Images: From Theory,
Implementation and Validation to a Useful Tool in Clinical Practice. Ph.D thesis,
Katholieke Universiteit Leuven, Leuven, Belgium (1998)
3. Haris, K., Efstratiadis, S.N., Maglaveras, N., Katsaggelos, A.K.: Hybrid image
segmentation using watersheds and fast region merging. IEEE Trans. on Image
Processing 7, 1684–1699 (1998)
4. Meine, H.: XPMap-based irregular pyramids for image segmentation. Diploma the-
sis, Dept. of Informatics, Univ. of Hamburg (2003)
5. Meer, P.: Stochastic image pyramids. Comput. Vision Graph. Image Process. 45,
269–294 (1989)
6. Jolion, J.M., Montanvert, A.: The adaptive pyramid: A framework for 2D image
analysis. CVGIP: Image Understanding 55, 339–348 (1992)
7. Beaulieu, J.M., Goldberg, M.: Hierarchy in picture segmentation: A stepwise opti-
mization approach. IEEE T-PAMI 11, 150–163 (1989)
8. Meine, H.: The GeoMap Representation: On Topologically Correct Sub-pixel Image
Analysis. Ph.D thesis, Dept. of Informatics, Univ. of Hamburg (2009) (in press)
9. Kropatsch, W.G.: From equivalent weighting functions to equivalent contraction
kernels. In: Digital Image Processing and Computer Graphics: Applications in Hu-
manities and Natural Sciences, vol. 3346, pp. 310–320. SPIE, San Jose (1998)
10. Brun, L., Kropatsch, W.G.: Contraction kernels and combinatorial maps. Pattern
Recognition Letters 24, 1051–1057 (2003)
11. Brun, L., Kropatsch, W.G.: Introduction to combinatorial pyramids. In: Bertrand,
G., Imiya, A., Klette, R. (eds.) Digital and Image Geometry. LNCS, vol. 2243, pp.
108–127. Springer, Heidelberg (2002)
12. Kropatsch, W.G.: Building irregulars pyramids by dual graph contraction. IEEE-
Proc. Vision, Image and Signal Processing 142, 366–374 (1995)
13. Kropatsch, W.G., Haxhimusa, Y., Lienhardt, P.: Hierarchies relating topology and
geometry. In: Christensen, H.I., Nagel, H.-H. (eds.) Cognitive Vision Systems.
LNCS, vol. 3948, pp. 199–220. Springer, Heidelberg (2006)
14. Brun, L., Domenger, J.P.: A new split and merge algorithm with topological maps
and inter-pixel boundaries. In: The 5th Intl. Conference in Central Europe on
Computer Graphics and Visualization, WSCG 1997 (1997)
15. Köthe, U.: XPMaps and topological segmentation - a unified approach to finite
topologies in the plane. In: Braquelaire, A.J.P., Lachaud, J.O., Vialard, A. (eds.)
DGCI 2002. LNCS, vol. 2301, pp. 22–33. Springer, Heidelberg (2002)
16. Meine, H., Köthe, U.: The GeoMap: A unified representation for topology and
geometry. In: Brun, L., Vento, M. (eds.) GbRPR 2005. LNCS, vol. 3434, pp. 132–
141. Springer, Heidelberg (2005)
17. Meine, H., Köthe, U.: A new sub-pixel map for image analysis. In: Reulke, R.,
Eckardt, U., Flach, B., Knauer, U., Polthier, K. (eds.) IWCIA 2006. LNCS,
vol. 4040, pp. 116–130. Springer, Heidelberg (2006)
18. Braquelaire, J.P., Brun, L.: Image segmentation with topological maps and inter-
pixel representation. J. Vis. Comm. and Image Representation 9, 62–79 (1998)
19. Haxhimusa, Y., Glantz, R., Saib, M., Langs, G., Kropatsch, W.G.: Logarithmic
tapering graph pyramid. In: Van Gool, L. (ed.) DAGM 2002. LNCS, vol. 2449, pp.
117–124. Springer, Heidelberg (2002)
20. Meine, H., Köthe, U.: Image segmentation with the exact watershed transform. In:
Proc. Intl. Conf. Visualization, Imaging, and Image Processing, pp. 400–405 (2005)
3D Topological Map Extraction from Oriented
Boundary Graph
1 Introduction
The Segmentation process consists in defining a partition of the image into ho-
mogeneous regions. Split and merge methods [HP74] are widely used in seg-
mentation. It consists in alternatively splitting a region, and merging adjacent
ones according to some criteria, in order to define a partition of the image. To
be efficient, it needs a topological structuring of the partition in order to retrieve
some information such as: the region containing a given voxel, the list of regions
adjacent to a given one, the list of surfaces defining the boundary of a region,
etc [BBDJ08].
Several models have been proposed to represent the partition of an image. A
popular model is the Region Adjacency Graph [Ros74], which is not sufficient
for most 3D segmentation algorithms due to the lack of information encoded.
A more sophisticated model is the topological map model [Dam08], which uses
combinatorial maps to encode the topology of the partition and an
intervoxel matrix [BL70] for the geometry. It encodes all information required
to design split and merge segmentation algorithms, including high-level
topological features that allow retrieving the Euler characteristic and Betti
numbers of a region [DD02]. Since high-level topological features are not
necessary for basic split and merge segmentation algorithms, another model
has been proposed [BBDJ08].
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 283–292, 2009.
© Springer-Verlag Berlin Heidelberg 2009
284 F. Baldacci, A. Braquelaire, and G. Damiand
This model uses a multigraph called the Oriented Boundary Graph (OBG) to encode
the topology, associated with the same geometrical level as in the topological
map model.
This second model is both more efficient (Table 1) and less space consuming.
The space consumption difference cannot be exactly computed because the
memory-optimized implementation of the OBG is still under development, but
unoptimized versions show that it will be at least two to four times less space
consuming, depending on the number of regions and surfaces of the partition.
Space consumption can be a critical constraint with large images or with
algorithms needing a highly oversegmented partition during the segmentation
process.
The OBG model is more efficient than the topological map one; it can be
efficiently parallelized [BD08] and is sufficient for split and merge
segmentation that does not use topological characteristics of regions as
criteria. But this missing information may in some cases be necessary, which
is why we have studied the extraction of the topological map of some regions
of interest from the OBG. It could also be useful for the topological map
model to use a more efficient model for a presegmentation step and to extract
the topological map from the simplified partition using an algorithm that
avoids traversing all the voxels. Thus this work is suitable both for the OBG
model, in order to provide on-demand extraction of high-level topological
features, and for the topological map model, in order for it to be efficiently
extracted from a presegmented image for which the equivalent presegmentation
using topological maps is too time or space consuming.
This paper is organized as follows. In section 2, we describe and briefly compare
the two topological models. Then, in section 3, we describe the topological map
extraction algorithm from the OBG. We conclude in section 4.
Let us present some usual notions about images and intervoxel elements. A voxel
is a point of the discrete space ℤ³ associated with a value, which can be a color
or a gray level. A three-dimensional image is a finite set of voxels. In this work,
combinatorial maps are used to represent voxel sets having the same label
value and that are 6-connected. We define a region as a maximal set of
isovalued 6-connected voxels.
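This region definition corresponds to a connected-component labeling with 6-connectivity, which can be sketched as follows (toy nested-list volume for illustration; this is not the paper's extraction algorithm):

```python
from collections import deque

def label_regions(volume):
    """Label maximal isovalued 6-connected voxel sets (regions) in a
    3D nested-list volume; returns a dict voxel -> region id."""
    Z, Y, X = len(volume), len(volume[0]), len(volume[0][0])
    neighbors = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                 (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    labels, next_id = {}, 0
    for z in range(Z):
        for y in range(Y):
            for x in range(X):
                if (z, y, x) in labels:
                    continue
                next_id += 1
                value = volume[z][y][x]
                labels[(z, y, x)] = next_id
                queue = deque([(z, y, x)])
                while queue:               # BFS over isovalued 6-neighbors
                    cz, cy, cx = queue.popleft()
                    for dz, dy, dx in neighbors:
                        nz, ny, nx = cz + dz, cy + dy, cx + dx
                        if (0 <= nz < Z and 0 <= ny < Y and 0 <= nx < X
                                and (nz, ny, nx) not in labels
                                and volume[nz][ny][nx] == value):
                            labels[(nz, ny, nx)] = next_id
                            queue.append((nz, ny, nx))
    return labels

# A 1x2x2 volume: three 0-voxels form one region, the 1-voxel another
vol = [[[0, 0],
        [0, 1]]]
labels = label_regions(vol)
print(max(labels.values()))  # 2 regions
```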
To avoid special processing of the image border voxels, we consider an infinite
region R0 that surrounds the image. If a region Rj is completely surrounded by
a region Ri, we say that Rj is included in Ri.
Fig. 1. The different parts of the topological map used to represent an image. (A) 3D
image. (B) Minimal combinatorial map. (C) Intervoxel matrix (embedding). (D) Inclu-
sion tree of regions.
of an image into regions. Each face of the topological map separates two
adjacent regions, and two adjacent faces do not separate the same two regions.
With these rules, we ensure the minimality (in number of cells) of the topological
map (see [Dam08] for more details on topological maps).
The intervoxel matrix is the embedding of the combinatorial map. Each cell
of the map is associated with intervoxel elements representing geometrical infor-
mation of the cell.
The inclusion tree of regions represents the inclusion relations. Each region of
the topological map is associated with a node in the inclusion tree. The nodes
are linked together by the inclusion relation defined above.
Fig. 2. Example of image partition with the corresponding representation using the
Oriented Boundary Graph model
Let us recall the advantages and drawbacks of each model. The topological map
model encodes the whole topology of the partition, from region and surface
adjacencies to the Euler characteristic and Betti numbers of regions. Computing
this map consumes a significant amount of memory and requires a time-consuming
extraction algorithm. The OBG is an enhanced multiple adjacency graph with
an associated intervoxel matrix. It is intended to be simpler than the topological
map, but also less expressive. It has an efficient extraction algorithm and uses
little memory, but some high-level topological features, such as the
characteristics of regions, are not encoded. Given a description of the image
partition as a matrix of labels, the OBG extraction algorithm has O(v + s + l)
complexity, with v the number of voxels, s the number of surfels, and l the
number of linels, because each element is processed once and each operation
takes O(1). The topological
map extraction algorithm has the same theoretical complexity as the OBG
one, O(v + s + l), for the same reason: each cell of the intervoxel subdivision is
processed exactly once. However, the number of operations performed for each
cell is larger, which explains the difference in extraction times.
The advantages and drawbacks of each model can be summarized as follows:
converging to an optimal model that has the efficiency of the OBG and the
expressiveness of the topological map is not possible. That is the reason why
we need a conversion algorithm that allows taking advantage of both models,
by not using the same model during the whole segmentation process.
Algorithm 1 is the main part, which reconstructs the part of the topological map
representing a given set of regions.
To reconstruct a given region R, we run through the edges of the OBG. Indeed,
each edge corresponds to a surface of R. Now, two cases have to be considered,
depending on whether the surface is closed or not.
Adding fictive edges to the existing edges in the OBG allows retrieving the two
properties of the topological map that are missing in the OBG. Indeed, fictive
edges (i) link the different boundaries of a surface and (ii) preserve a valid
Euler characteristic for each surface.
Before applying Algorithm 1, we need to compute the Euler characteristic of
each surface, since this information is not present in the OBG but is needed
during the map reconstruction. For that, we compute for each edge e of G the
numbers #v, #e, and #f (respectively the number of vertices, edges, and faces
of the surface associated with e).
The Euler characteristic of the face associated with e is denoted by χ(e) = #v −
#e + #f. The genus associated with this surface is denoted by g(e) and computed
with the Euler formula: g(e) = (2 − χ(e))/2. The Euler characteristic of the
surface of a region r is the sum of the Euler characteristics of all its faces (the fact
that vertices and edges incident to two faces are counted twice is not a problem for
the Euler characteristic, since it uses the difference between these two numbers).
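These formulas are easy to check on closed orientable surfaces (illustrative counts for a cube-shaped and a torus-shaped boundary mesh, not data from the paper):

```python
def euler_characteristic(nv, ne, nf):
    """chi = #v - #e + #f for one boundary surface."""
    return nv - ne + nf

def genus(chi):
    """Closed orientable surface: g = (2 - chi) / 2."""
    return (2 - chi) // 2

# Cube-shaped boundary (a topological sphere): 8 vertices, 12 edges, 6 faces
chi_sphere = euler_characteristic(8, 12, 6)
print(chi_sphere, genus(chi_sphere))  # 2 0

# Torus-shaped boundary as a 4x4 quad mesh: 16 vertices, 32 edges, 16 faces
chi_torus = euler_characteristic(16, 32, 16)
print(chi_torus, genus(chi_torus))  # 0 1
```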
During this computation, newly created darts are associated with their triplets
(which have to be oriented) in order to retrieve, when a new dart is created,
the darts incident to the same triplet, and thus update the β2 and β3 links.
This algorithm is local: when processing a dart d associated with triplet (p, l, s),
we search for darts already existing in the neighborhood of d and sew the darts
found with d. In Fig. 3, we explain how triplets (p, lprev, s2) and (p, lprev, s3) are
computed from (p, l, s).
Fig. 3. How to compute triplets (p, lprev, s2) and (p, lprev, s3). (p, l, s) is the triplet
associated with the current dart d, and pprev is the pointel incident to dart dprev. We
want to sew dprev by β2 and β3 if the corresponding darts are already created. s3 is the
first surfel found from s, by turning around linel l in the direction of n (the normal of s,
oriented towards the current region r). lprev is the linel incident to p and s3 (i.e. the
previous linel of the current border). s2 is the first surfel found from s3, by turning
around linel lprev in the opposite direction of n (the opposite of the normal of s3;
indeed, the normal of s3 is oriented towards the adjacent region of r, thus the opposite
is oriented towards r).
of the reconstructed region. This gives the first part of the complexity, O(nl).
The second part is due to the addition of 2 × g edges, which is done in time
linear in g. The last part corresponds to the computation of g, which requires
each surfel to be considered once, leading to a complexity in O(ns).
4 Conclusion
Split and merge segmentation in the 3D case can be a highly time-consuming
method without the use of a topological structuring. But an optimal structuring,
both in terms of time and space consumption and in terms of topological feature
representation cannot be achieved. That is the reason why two models have
been developed: the topological map model, which represents the whole topology
of an image partition, and the OBG model, which is more efficient in terms of
time and space consumption.
In this article we have developed an algorithm that allows extracting the
Topological Map from the OBG. This operation makes it possible to extract,
on demand, the Topological Map of some regions of the OBG, and thus to
locally retrieve all the topological features of some regions of interest in the
image partition.
The other use of this algorithm is to extract the Topological Map of
the whole image partition, but only at the step of the segmentation process where
it is needed. The presegmentation can then be done using the OBG, in order to
reduce time consumption or to avoid running out of memory.
In future work, we want to study the possibility of modifying the reconstructed
topological map, for example with an algorithm that takes a topological
criterion into account, and then locally updating the OBG model to reflect the
image modifications.
References
[BBDJ08] Baldacci, F., Braquelaire, A., Desbarats, P., Domenger, J.P.: 3d image topo-
logical structuring with an oriented boundary graph for split and merge seg-
mentation. In: Coeurjolly, D., Sivignon, I., Tougne, L., Dupont, F. (eds.)
DGCI 2008. LNCS, vol. 4992, pp. 541–552. Springer, Heidelberg (2008)
[BD08] Baldacci, F., Desbarats, P.: Parallel 3d split and merge segmentation with
oriented boundary graph. In: Proceedings of The 16th International Con-
ference in Central Europe on Computer Graphics, Visualization and Com-
puter Vision 2008, pp. 167–173 (2008)
[BL70] Brice, C.R., Fennema, C.L.: Scene analysis using regions. Artif. Intell. 1(3),
205–226 (1970)
[Dam08] Damiand, G.: Topological model for 3d image representation: Definition
and incremental extraction algorithm. Computer Vision and Image Under-
standing 109(3), 260–289 (2008)
[DD02] Desbarats, P., Domenger, J.-P.: Retrieving and using topological character-
istics from 3D discrete images. In: Proceedings of the 7th Computer Vision
Winter Workshop, pp. 130–139, PRIP-TR-72 (2002)
[HP74] Horowitz, S.L., Pavlidis, T.: Picture segmentation by a directed split and
merge procedure. In: ICPR 1974, pp. 424–433 (1974)
[KKM90] Khalimsky, E., Kopperman, R., Meyer, P.R.: Boundaries in digital planes.
Journal of Applied Mathematics and Stochastic Analysis 3(1), 27–55 (1990)
[Lie91] Lienhardt, P.: Topological models for boundary representation: a compar-
ison with n-dimensional generalized maps. Computer-Aided Design 23(1)
(1991)
[Ros74] Rosenfeld, A.: Adjacency in digital pictures. Information and Control, vol. 26 (1974)
An Irregular Pyramid for Multi-scale Analysis
of Objects and Their Parts
Martin Drauschke
1 Introduction
The interpretation of images showing objects with a complex structure is a
difficult task, especially if the object’s components may repeat or vary a lot in
their appearance. As far as human perception is understood today, objects are
often recognized by analyzing their compositional structure, cf. [9]. Besides
spatial relations between object parts, the hierarchical structure of the components
is often helpful for recognizing an object or its parts. E.g., in aerial images of
buildings with a resolution of 10 cm per pixel, it is easier to classify dark image
parts as windows in the roof if the building as a whole has been recognized before.
Buildings are objects with parts at various scales. Depending on the view
point, terrestrial or aerial, the largest visible building parts are its facade or its
roof. Mid-scale entities are balconies, dormers, or the building’s entrance; and
small-scale parts are, e.g., windows and window panes as window parts. We
restrict our focus to such parts; a further division down to the level of bricks or
tiles is not of interest here.
Recently, many compositional models have been proposed for the recognition
of natural and technical objects. E. g. in [6] a part-based recognition framework
This work has been done within the project Ontological scales for automated detec-
tion, efficient processing and fast visualization of landscape models which is funded
by the German Research Council (DFG).
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 293–303, 2009.
© Springer-Verlag Berlin Heidelberg 2009
294 M. Drauschke
is proposed, where the image fragments have been put in a hierarchical order
to infer the category of the whole object after having classified its parts. So
far, this approach has only been used for finding the category of an object;
it does not analyze the parts individually. This approach has been evaluated
on blurred, downsampled building images, cf. [13]. Without resizing the image,
the algorithm seems to work inefficiently or might even fail on homogeneous
facades or on repetitive patterns like bricks, because the fragments cannot
easily be grouped together. Thus, the approach is not easily applicable to the
domain of buildings.
Working on hyperspectral images, a hierarchical segmentation scheme for
geospatial objects such as buildings has recently been proposed using
morphological operations, cf. [1]. Due to the low resolution of the images, the
hierarchy can only be used for detecting the object at the largest scale, but not
its parts separately.
We work on segmented image regions at different scales, where we derive a
region hierarchy from the analysis of the regions. So far, it is purely data-driven,
so that the general approach can be used in many domains. A short literature
review on multi-scale image analysis is given in sec. 2. Then, we present our own
multi-scale approach in sec. 3. For complexity reasons, we need to select regions
from the pyramid for further processing. We document this procedure in sec. 4.
The validation of our graphical representation is demonstrated in an experiment
on building images in sec. 5. Concluding, we summarize our contribution in
sec. 6.
In this section, we present our multi-scale segmentation framework and the
construction of our region hierarchy graph (RHG). To obtain more precise region
boundaries, we applied an adaptation of the approach of [8].
Many different segmentation algorithms have been proposed since the beginning
of digital imagery. We decided to derive our segmentation from the watershed
boundaries on the image’s gradient magnitude. Considering the segmentation of
man-made objects, we mostly find strong color edges between different surfaces,
so the borders of the watershed regions are often (nearly) identical with the
borders of the objects.
Our approach uses the Gaussian scale-space for obtaining regions in multiple
scales. We arranged the discrete scale-space layers logarithmically between σ = 1
and σ = 16 with 10 layers in each octave, obtaining 41 layers. For each scale
σ, we convolve each image channel with a Gaussian filter and obtain a three-
dimensional image space for each channel. Then we compute the combined gra-
dient magnitude of the color images. Since the watershed algorithm tends
to produce oversegmentation, we suppress many gradient minima by resetting
the gradient value at positions where the gradient is below the median of the
gradient magnitude; thus, the minima that are mostly caused by noise are removed.
The mathematical notation of this procedure is described in more detail
in [5]. As a result of the watershed algorithm, we obtain a complete partitioning
of the image, where every image pixel belongs to exactly one region.
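One layer of this procedure can be sketched as follows. This is a minimal NumPy/SciPy sketch, not the author's implementation: the connected-component labeling of zero-gradient basins stands in for a true watershed flooding, and `watershed_layer` is a name of our own choosing. The scale arrangement matches the text: 10 layers per octave from σ = 1 to σ = 16 gives 41 layers.

```python
import numpy as np
from scipy import ndimage as ndi

# 41 logarithmically arranged scales: sigma = 1 .. 16, 10 layers per octave.
SIGMAS = 2.0 ** (np.arange(41) / 10.0)

def watershed_layer(image, sigma):
    """One scale-space layer: smooth, compute the gradient magnitude,
    reset values below the median (suppressing noise-induced minima),
    then partition the low-gradient basins into regions."""
    smoothed = ndi.gaussian_filter(image.astype(float), sigma)
    gy, gx = np.gradient(smoothed)
    grad = np.hypot(gx, gy)
    # Reset the gradient where it is below its median.
    grad = np.where(grad < np.median(grad), 0.0, grad)
    # Stand-in for watershed flooding: label the connected zero-gradient
    # basins (pixels on remaining gradient ridges keep label 0).
    labels, n_regions = ndi.label(grad == 0.0)
    return labels, n_regions
```

For a color image, the per-channel gradient magnitudes would be combined before the median suppression; a production version would use a genuine watershed transform (e.g., from scikit-image) so that every pixel belongs to exactly one region.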
296 M. Drauschke
Fig. 1. Segmentation in scale-space and its RHG. Regions from the same scale are
ordered horizontally, and the increasing scales are ordered vertically from bottom to
top. The edges between the nodes describe the development of the regions over scale.
The gray-filled region has been created in the second layer.
An Irregular Pyramid for Multi-scale Analysis of Objects and Their Parts 297
regions, if a scale-space layer has been skipped when constructing the RHG. We
show a scale-space with three layers and the corresponding RHG in fig. 1.
Fig. 2. Image segmentation of an aerial image. Left: RGB image of a suburban scene in
Graz, Austria (provided by Vexcel Imaging GmbH). Middle: Original watershed regions
in scale σ = 35. Right: Region focusing with merged regions of scales σ = 12 (thin)
and σ = 35 (thick). Clearly, the two segmentations at scale σ = 35 are not topologically
equivalent, because the newly created or split regions (and their borders) cannot be
tracked down to the initial partition by our region focusing.
layers above the initial partition. Hence, the respective nodes and edges must be
removed from the RHG. Furthermore, all regions must be removed which only
develop from these newly created regions. The updated RHG of the example in
fig. 1 will contain all white nodes and their connecting edges.
Fig. 3. Tree of stable regions: the layers of the pyramid are arranged vertically
(going upwards); each rectangle represents a node in the TSR, where the white ones
correspond to stable regions and the black ones to the merging events from the RHG. The
horizontal extent of a rectangle shows its spatial extent, and the vertical exten-
sion corresponds to the range of stability. The idea of this figure is taken from the
interval tree and its representation as a rectangular tessellation in [18].
pyramid. Due to limited space, we cannot go into more detail here; we present
a sketch of our method in fig. 3. Its result is a tree of stable regions (TSR), where
we inserted an additional root node describing the complete scene.
5 Experiments
Our approach is very general, because we used only two assumptions for gener-
ating the TSR: the color homogeneity of the objects and the color heterogeneity
between them, and that the objects of interest are stable in scale-space or are
merged stable regions. We now present some results of our experiments.
To this end, we analyzed the TSR of 123 facade images from six German cities:
Berlin, Bonn, Hamburg, Heidelberg, Karlsruhe and Munich, see fig. 4. These
buildings show a sufficiently large variety with respect to their size, architec-
tural style and imaging conditions.
The ground truth of our experiments on facade images consists of hand-labeled an-
notations1 . On the one hand, the annotation contains the polygonal borders of all
1
The images and their annotations were provided by the project eTraining for inter-
preting images of man-made scenes, which is funded by the European Union. The
labeling of the data has been carried out by more than ten people in two different re-
search groups. To avoid inconsistencies within the labeled data, an ontology
for facade images was defined, with a list of objects that must be annotated and
their part-of relationships. A publication of the data is in preparation. Please visit
www.ipb.uni-bonn.de/etrims for further information.
Fig. 4. Left: Facade images from Berlin, Bonn, Hamburg, Heidelberg, Karlsruhe and
Munich (f. l. t. r.), showing the variety of our data set. Right: Two levels from the
irregular pyramid of the Hamburg image.
Fig. 5. Left: Facade image from Hamburg with manually annotated objects. Right:
Major classes and their part-of relationships from the defined building-scene ontology.
interesting objects that are visible in the scene. On the other hand, part-of rela-
tionships have also been inserted into the annotations. An extract of the facade
ontology is shown in fig. 5.
5.2 Results
the union of both regions is bigger than 0.5. Thus, we compute this quotient for
each region in the TSR with respect to all annotated objects. Then, the maxi-
mum quotient is taken to determine the class label of the segmented region.
If the ratio is above the threshold, we call the object detectable. Otherwise,
we also check for partial detectability, i.e., whether the segmented region is completely
included in an annotation. This partial detectability is relevant, e.g., if the ob-
ject is occluded by a car or a tree. Furthermore, we do not expect to detect
complete facades, but our segmentation scheme could be used for the analysis of
image extracts, e.g., the roof part or the area around balconies.
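The detectability test just described can be sketched as follows. Regions are represented here as sets of pixel coordinates; the helper names (`overlap_quotient`, `detect`) are ours, not the paper's.

```python
def overlap_quotient(segment, annotation):
    """Quotient |S ∩ A| / |S ∪ A| between a segmented region S and an
    annotated object A, both given as sets of pixel coordinates."""
    union = segment | annotation
    return len(segment & annotation) / len(union) if union else 0.0

def detect(segment, annotations, threshold=0.5):
    """Label the segment with the best-overlapping annotation and report
    whether the object is detectable, only partially detectable
    (segment completely inside an annotation), or neither."""
    label, q = max(((name, overlap_quotient(segment, region))
                    for name, region in annotations.items()),
                   key=lambda item: item[1])
    if q > threshold:
        return label, "detectable"
    if any(segment <= region for region in annotations.values()):
        return label, "partial"
    return label, None
```

A segment occluded by a tree, say, typically fails the 0.5 threshold but still lies completely inside its annotation, which is exactly the partial-detectability case above.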
Regarding the second experiment, our interest is whether the TSR reflects the class
hierarchy. This would be the case if, e.g., a window region includes window pane
regions, i.e., both are connected by a path upwards in the TSR. Thus, we focus
only on those annotated objects which were (a) detectable or partially de-
tectable and (b) annotated as an aggregate. In this case, the annotation includes
a list of parts of this object. We then determine whether we find other regions
in the TSR which are (a) also at least partially detectable and (b) connected
to the first region by a path upwards in the TSR. Then the upper region can
be described as an aggregate containing at least the lower one. Additionally, we
also check whether not only at least one but all parts of the aggregated object have
been found, i.e., whether the list of detectable parts is complete. Our results are shown
in tab. 1.
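The check for the second experiment reduces to an ancestor test in the TSR. A minimal sketch, assuming a parent-pointer representation of the tree (our own encoding, not the paper's data structure):

```python
def connected_upwards(parent, lower, upper):
    """True if `upper` lies on the path from `lower` to the TSR root."""
    node = parent.get(lower)
    while node is not None:
        if node == upper:
            return True
        node = parent.get(node)
    return False

def parts_complete(parent, aggregate, detected_parts, annotated_parts):
    """The list of detectable parts is complete if every annotated part
    was detected and its region is connected upwards to the aggregate."""
    return all(part in detected_parts and
               connected_upwards(parent, detected_parts[part], aggregate)
               for part in annotated_parts)
```

`detected_parts` maps part names to their TSR nodes; `parts_complete` mirrors the final "all parts found" check of the text.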
Table 1. Results on detectability of building parts: 84% of the annotated objects have a
corresponding region in the TSR or are partially detectable. The columns are explained
in the surrounding text.
Note that the automatically segmented regions were only compared with the la-
beled data; no classification step has been performed so far. We have presented first
classification results on the regions from the Gaussian scale-space in [4], where
we classified segmented regions as, e.g., windows with a recognition rate of 80%
using an Adaboost approach. With geometrically more precise image regions, we
expect to obtain even better results. Furthermore, the detected regions can be
inserted as hypotheses into a high-level image interpretation system, as
demonstrated in [11]. It uses initial detectors and scene interpretations of mid-
level systems to infer an image interpretation by means of artificial intelligence,
where new hypotheses must be verified by new image evidence.
References
1. Akçay, H.G., Aksoy, S.: Automatic detection of geospatial objects using multiple
hierarchical segmentations. Geoscience & Remote Sensing 46(7), 2097–2111 (2008)
2. Bergholm, F.: Edge focusing. PAMI 9(6), 726–741 (1987)
3. Brun, L., Mokhtari, M., Meyer, F.: Hierarchical watersheds within the combinato-
rial pyramid framework. In: Andrès, É., Damiand, G., Lienhardt, P. (eds.) DGCI
2005. LNCS, vol. 3429, pp. 34–44. Springer, Heidelberg (2005)
4. Drauschke, M., Förstner, W.: Selecting appropriate features for detecting buildings
and building parts. In: Proc. 21st ISPRS Congress, IAPRS 37 (B3b-2), pp. 447–452
(2008)
5. Drauschke, M., Schuster, H.-F., Förstner, W.: Detectability of buildings in aerial
images over scale space. PCV 2006, IAPRS 36(3), 7–12 (2006)
6. Epshtein, B., Ullman, S.: Feature hierarchies for object classification. In: Proc. 10th
ICCV, pp. 220–227 (2005)
7. Everingham, M., Winn, J.: The pascal visual object classes challenge 2008
(voc2008) development kit (2008) (online publication)
8. Gauch, J.M.: Image segmentation and analysis via multiscale gradient watershed
hierarchies. Image Processing 8(1), 69–79 (1999)
9. Goldstein, E.B.: Sensation and Perception (in German translation by Ritter, M),
6th edn. Wadsworth, Belmont (2002)
10. Guigues, L., Le Men, H., Cocquerez, J.-P.: The hierarchy of the cocoons of a graph
and its application to image segmentation. Pattern Recognition Letters 24(8),
1059–1066 (2003)
11. Hartz, J., Neumann, B.: Learning a knowledge base of ontological concepts for
high-level scene interpretation. In: Proc. ICMLA, pp. 436–443 (2007)
12. Harvey, R., Bangham, J.A., Bosson, A.: Scale-space filters and their robustness.
In: ter Haar Romeny, B.M., Florack, L.M.J., Viergever, M.A. (eds.) Scale-Space
1997. LNCS, vol. 1252, pp. 341–344. Springer, Heidelberg (1997)
13. Lifschitz, I.: Image interpretation using bottom-up top-down cycle on fragment
trees. Master’s thesis, Weizmann Institute of Science (2005)
14. Lindeberg, T.: Scale space theory in computer vision. Kluwer Academic, Dordrecht
(1994)
15. Meer, P.: Stochastic image pyramids. CVGIP 45, 269–294 (1989)
16. Olsen, O.F., Nielsen, M.: Multiscale gradient magnitude watershed segmentation.
In: Del Bimbo, A. (ed.) ICIAP 1997. LNCS, vol. 1310, pp. 9–13. Springer, Heidel-
berg (1997)
17. Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion.
PAMI 12(7), 629–639 (1990)
18. Witkin, A.: Scale-space filtering. In: Proc. 8th IJCAI, pp. 1019–1022 (1983)
A First Step toward Combinatorial Pyramids in
n-D Spaces
GREYC, CNRS UMR 6072, ENSICAEN, 6 bd maréchal Juin F-14050 Caen, France
Sebastien.Fourey@greyc.ensicaen.fr, Luc.Brun@greyc.ensicaen.fr
1 Introduction
Pyramids of combinatorial maps were first defined in 2D [1], and later
extended to pyramids of n-dimensional generalized maps by Grasset et al. [6].
Generalized maps model subdivisions of orientable as well as non-orientable quasi-
manifolds [7], at the expense of twice the data size required by com-
binatorial maps. For practical use (for example in image segmentation), this
may have an impact on the efficiency of the associated algorithms or may even
prevent their use. Furthermore, properties and constraints linked to the notion
of orientation may be expressed in a more natural way with the formalism of
combinatorial maps. For these reasons, we are interested here in the definition of
pyramids of n-dimensional combinatorial maps. This paper is a first step toward
the definition of such pyramids, and the link between our definitions and the
ones that consider G-maps is maintained throughout the paper. In fact, the link
between n-G-maps and n-maps was first established by Lienhardt [7], so that it
was claimed in [2], though not explicitly stated, that pyramids of n-maps could be
defined.
The key notion for the definition of pyramids of maps is the operation of
simultaneous removal or contraction of cells. Thus, we define the operation of
simultaneous removal and the one of simultaneous contraction of cells in an
n-map, the latter being introduced here as a removal operation in the dual map.
This work was supported under a research grant of the ANR Foundation (ANR-06-
MDCA-008-02/FOGRIMMI).
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 304–313, 2009.
© Springer-Verlag Berlin Heidelberg 2009
We first raise in Section 3 a minor problem with the definition of "cells with
local degree 2 in a G-map" used in [5,2], and more precisely with the criterion
for determining whether a cell is a valid candidate for removal. We provide a formal
definition of the local degree, consistent with the results established in
previous papers [2,6], using the notion of a regular cell that we introduce.
An essential result of this paper, presented in Section 4, is that the removal
operation we introduce here is well defined, since it indeed transforms a map
into another map. Instead of checking that the resulting map satisfies the
defining properties of a map, we use an indirect proof based on the
removal operation in G-maps defined by Damiand in [2,3]. This
again illustrates the link between the two structures.
Eventually, in Section 5 we state a definition of simultaneous contraction
of cells in a G-map in terms of removals in the dual map, a definition which we
prove to be equivalent to the one given by Damiand and Lienhardt in [2]. We
finally define in the same way the simultaneous contraction operation in maps.
Note that the proofs of the results stated in this paper may be found in [4].
Fig. 1. (a) A 2-G-map with darts 1, . . . , 4 and −1, . . . , −4, vertices v1 and v2, and edges e1 and e2. (b) A vertex with local degree 2 (referred to in the text).
308 S. Fourey and L. Brun
The degree of an i-cell C in an n-G-map G = (D, α0 , . . . , αn ), i.e., the number of
(i + 1)-cells that are incident to it, is the number of sets in the set
Δ = {< α̂i+1 >(d) | d ∈ C}. As part of a criterion for cells that may be removed from a
G-map, we need a notion of degree that better reflects the local configuration of
a cell: the local degree. A more precise justification for the following definition
may be found in [4].
Definition 7 (Local degree in G-maps). Let C be an i-cell in an n-G-map.
– For i ∈ {0, . . . , n − 1}, the local degree of C is the number |{< α̂i , α̂i+1 >(b) | b ∈ C}|.
– For i ∈ {1, . . . , n}, the dual local degree of C is the number |{< α̂i−1 , α̂i >(b) | b ∈ C}|.
The local degree (resp. the dual local degree) of an n-cell (resp. a 0-cell) is 0.
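Both cardinalities in Definition 7 are orbit counts: the number of orbits of the darts of C under the listed involutions. A small sketch of such counting, with darts as integers and each involution as a dict (our own encoding, not the paper's formalism):

```python
def orbit_count(darts, involutions):
    """Number of orbits of `darts` under the given involutions.
    Since every map is an involution, forward closure from a dart
    already yields its full orbit."""
    seen, count = set(), 0
    for start in darts:
        if start in seen:
            continue
        count += 1                 # each new traversal is one orbit
        stack = [start]
        while stack:
            d = stack.pop()
            if d in seen:
                continue
            seen.add(d)
            stack.extend(inv[d] for inv in involutions if d in inv)
    return count
```

The local degree of an i-cell would then be `orbit_count` applied to the darts of the cell with the two involutions named in the definition.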
Intuitively, the local degree of an i-cell C is the number of (i+1)-cells that locally
appear to be incident to C. It is called local because it may be different from
the degree since an (i + 1)-cell may be incident more than once to an i-cell, as
illustrated in Fig. 1 where the 1-cell e2 is multi-incident to the 0-cell v2 , hence
the cell v2 has a degree 2 and a local degree 3.
On the other hand, the dual local degree of an i-cell C is the number of (i − 1)-
cells that locally appear to be incident to C. Consider the example given in Fig. 1, where the
edge e2 locally appears to be bounded by two vertices2 , whereas the darts defining
this edge all belong to a unique vertex (v2 ). Hence, e2 has a dual degree 1 and a
dual local degree 2.
In [5,6], Grasset defines an i-cell with local degree 2 (0 ≤ i ≤ n − 2) as a cell C
such that for all b ∈ C, bαi+1 αi+2 = bαi+2 αi+1 , and an i-cell with dual local degree
2 (2 ≤ i ≤ n) as a cell C such that for all b ∈ C, bαi−1 αi−2 = bαi−2 αi−1 . In fact,
Grasset’s definition does not actually distinguish cells with local degree 1 from cells
with local degree 2, so that the vertex v1 in the 2-G-map of Fig. 1 is considered
removable, yielding the loop (−1, −2) after removal. On the other hand, it is also
more restrictive than our definition of a cell with local degree 2 (Definition 7). As
an example, the vertex depicted in Fig. 1(b) has local degree 2 but does not satisfy
the above-mentioned criterion.
However, Grasset’s definition was merely intended to characterize cells that
could be removed from a G-map, producing a valid new G-map, following the
works of Damiand and Lienhardt [2] where the term “degree equal to 2” is
actually used with quotes. To that extent, it is a good criterion [3, Theorem 2]
but again not a proper definition of cells with local degree 2.
Grasset’s criterion is in fact a necessary but not sufficient condition to prevent
the production of a degenerate G-map after a removal operation, as in the
case of the removal of a vertex with local degree 1 (v1 in Fig. 1). We introduce
here our own criterion based on the proper notion of local degree and a notion
of regularity introduced below. This criterion is proved to be equivalent to a
corrected version of Grasset’s condition (Theorem 1). We first introduce the
notion of a regular cell.
2
This is always the case for an (n − 1)-cell.
We proved ([4, Proposition 14]) that the removal operation introduced here for
n-maps produces a valid n-map when applied to the map of the hypervolumes
of a G-map. Formally, if G is an n-G-map and Kr is a removal kernel in G:
HV (G) \ HV (Kr ) = HV (G \ Kr ) (1)
so that the left term is a valid map.
It remains to be proved that the removal operation, when applied to any n-
map, produces a valid n-map. This is proved to be true (Theorem 2) as soon as
the cells to be removed constitute a removal kernel according to Definition 14.
Definition 14 (Removal kernel). Let M be an n-map. A removal kernel
Kr = {Ri }0≤i≤n in M is a removal set such that all cells of Kr are disjoint
and all of them are regular cells with local degree 2 ([4, Definition 16] and Defi-
nition 11).
If M is an n-map and G = AG(M ) with the notations of Definition 6, for any
i-cell C of M the set3 C ∪ Cσ (if i < n) or C ∪ Cγ0 σ (if i = n) is an n-cell of
AG(M ) [4, Proposition 7] called the associated cell of C in AG(M ), denoted
by C̃. This definition of associated cell allows us to define directly in AG(M ) the
associated removal set of a removal kernel in M , which is proved to be a removal
kernel [4, Definition 24, Proposition 15].
We may now state the main result of this section.
Theorem 2. If M is an n-map and Kr is a removal kernel in M , the (n + 1)-
tuple M \ Kr (Definition 12) is a valid n-map.
Sketch of proof: With G̃ = AG(M ), we have the following commutative diagram,
where the vertical maps are HV (from G̃ to its map of hypervolumes) and the
restriction |D (from HV (G̃) to M ):

            removal of Kr
  M ─────────────────────────────→ M \ Kr
  ↑ |D                             ↑ |D
            removal of HV (K̃r )
  HV (G̃) ───────────────────────→ HV (G̃) \ HV (K̃r )
  ↑ HV                             ↑ HV
            removal of K̃r
  G̃ ─────────────────────────────→ G̃ \ K̃r
– ∀i ∈ {1, . . . , n − 1}, ∀d ∈ D′ , dγ′i = dγn−1^k (γi γn−1^−1)^k′ γi , where k is the small-
est integer such that dγn−1^k ∈ D′ and k′ is the smallest integer such that
dγn−1^k (γi γn−1^−1)^k′ γi ∈ D′ .
6 Conclusion
Based on the previous work by Damiand and Lienhardt on generalized maps, we
have defined cell removal and contraction in n-dimensional combinatorial maps,
and proved the validity of these operations. A logical sequel to this paper will be
the definition of n-dimensional combinatorial pyramids and the related notions,
in the way Brun and Kropatsch did in the two-dimensional case and following the
works of Grasset on pyramids of generalized maps.
References
1. Brun, L., Kropatsch, W.: Combinatorial pyramids. In: Suvisoft (ed.) IEEE Interna-
tional conference on Image Processing (ICIP), Barcelona, September 2003, vol. II,
pp. 33–37. IEEE, Los Alamitos (2003)
2. Damiand, G., Lienhardt, P.: Removal and contraction for n-dimensional generalized
maps. In: Nyström, I., Sanniti di Baja, G., Svensson, S. (eds.) DGCI 2003. LNCS,
vol. 2886, pp. 408–419. Springer, Heidelberg (2003)
3. Damiand, G., Lienhardt, P.: Removal and contraction for n-dimensional generalized
maps. Technical report (2003)
4. Fourey, S., Brun, L.: A first step toward combinatorial pyramids in nD spaces.
Technical report TR-2009-01, GREYC (2009),
http://hal.archives-ouvertes.fr/?langue=en
5. Grasset-Simon, C.: Définition et étude des pyramides généralisées nD : application
pour la segmentation multi-echelle d’images 3D. Ph.D. thesis, Université de Poitiers
(2006)
6. Grasset-Simon, C., Damiand, G., Lienhardt, P.: nD generalized map pyramids: Def-
inition, representations and basic operations. Pattern Recognition 39(4), 527–538
(2006)
7. Lienhardt, P.: Topological models for boundary representation: a comparison with
n-dimensional generalized maps. Computer-Aided Design 23(1), 59–82 (1991)
8. Lienhardt, P.: N-dimensional generalized combinatorial maps and cellular quasi-
manifolds. International Journal of Computational Geometry & Applications 4(3),
275–324 (1994)
Cell AT-Models for Digital Volumes
1 Introduction
In [4], a polyhedral cell complex P (V ) homologically equivalent to a binary 26-
adjacency voxel-based digital volume V is constructed. The former is a useful
tool for visualizing, analyzing and topologically processing the latter. The contin-
uous analogue P (V ) consists of contractile polyhedral blocks installed in
overlapping 2×2×2 unit cubes. Concerning visualization, the boundary cell com-
plex ∂P (V ) (in fact, a triangulation) of P (V ) is an alternative to marching-cube
based algorithms [7]. The complex P (V ) is obtained in [4] by suitably extending to
volumes the discrete boundary triangulation method given in [8]. Nevertheless,
the main interest in constructing P (V ) essentially lies in the fact that we can
extract homological information from it in a straightforward manner. More pre-
cisely, by homological information we mean here not only Betti numbers (number
of connected components, "tunnels" or "holes", and cavities), Euler characteristic
and representative cycles of homology classes, but also the homological classification
of cycles and higher cohomology invariants. Roughly speaking, to obtain this
homological acuity, we use an approach in which the homology problem is posed
in terms of finding a concrete algebraic “deformation process” φ (a so-called chain
homotopy in the language of Homological Algebra [6], or a homology gradient vector field
as in [4]) which we can apply to P (V ), obtaining a minimal cell complex with
exactly one cell of dimension n for each homology generator of dimension n.
This work has been partially supported by the "Computational Topology and Applied
Mathematics" PAICYT research project FQM-296, the Andalusian research project
PO6-TIC-02268, the Spanish MEC project MTM2006-03722, and the Austrian
Science Fund under grant P20134-N13.
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 314–323, 2009.
© Springer-Verlag Berlin Heidelberg 2009
condition). Then, H ∂ (K) and H φ (K) are isomorphic. The maps h : H ∂ (K) →
H φ (K) defined by h([c]∂ ) = [c + ∂φ(c)]φ and k : H φ (K) → H ∂ (K) defined by
k([c]φ ) = [c + φ∂(c)]∂ specify this isomorphism.
Fig. 4. A unit cube with labeled vertices (a) and arrows describing the contractibility
of the cube (b)
Fig. 5. The maximal cell R (a) and its corresponding homology gvf (b)
(shown in yellow), φQ (< 5, 6 >) =< 5, 6, 7, 8 > + < 1, 2, 7, 8 > (shown in green)
and φQ (< 6 >) =< 1, 2 > + < 2, 7 > + < 6, 7 > (shown in red). Obviously, the
boundary map ∂Q : C(Q) → C(Q) is defined in a canonical way (no problems
here with the orientation of the cells, due to the fact we work over F2 ). For
instance, ∂Q (< 1, 2, 3, 4 >) =< 1, 2 > + < 2, 3 > + < 3, 4 > + < 4, 1 > and
∂Q (< 1, 8 >) =< 1 > + < 8 >.
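Because we work over F2, a chain is just a set of cells and chain addition is symmetric difference, so the canonical boundary map can be sketched directly (our own minimal encoding, with cells as tuples/frozensets of vertex labels):

```python
def boundary_face(vertices):
    """∂ of a polygonal 2-cell over F2: the set of its boundary edges,
    e.g. ∂<1,2,3,4> = <1,2> + <2,3> + <3,4> + <4,1>."""
    n = len(vertices)
    return {frozenset((vertices[i], vertices[(i + 1) % n])) for i in range(n)}

def boundary_edge(edge):
    """∂ of a 1-cell over F2: its two endpoints, e.g. ∂<1,8> = <1> + <8>."""
    return {frozenset((v,)) for v in edge}
```

Adding (mod 2) the boundaries of the four edges of a face cancels every vertex twice, which confirms ∂∂ = 0 over F2 on this small example.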
Now, an alternative technique to the modified Kenmochi et al. method [8] for
constructing P (V ) is sketched here. In order to determine a concrete polyhedral
configuration R as well as a concrete homology gvf for it (determining its bound-
ary map is straightforward in F2 ), we use a homological algebra strategy which
amounts to taking advantage of the contractibility of Q to create a homology gvf
for R, by means of integral operators acting on Q. To avoid overburdening the
presentation with notation, we develop the method only in one concrete case.
First, let us take the convex hull of the eight black points shown in Figure 4.
Applying the integral operator given by ψ(< 8 >) =< 1, 8 >, the final result
R and its homology gvf appear in Figure 5. The face < 1, 5, 6, 7 > needs to be
subdivided into two triangular faces, < 1, 5, 7 > and < 1, 6, 7 >, to obtain the
configuration R′. To connect R and R′, we applied to R′ the integral operator
given by the formula ψ(< 5, 7 >) =< 1, 5, 7 >.
In consequence, a homology gvf for R′ appears in Figure 5.
In fact, all these homology gvfs are obtained by transferring the homology gvf
of Q via chain homotopy equivalences.
All these techniques are valid for any finite field or for integer coefficients, and
the additional difficulties concerning the orientation of the cells can be easily overcome.
We are now able to design an incremental algorithm for computing the
homology of V via the cell complex P (V ). Based on the reiterated use of homo-
logy gvfs for polyhedral cells inscribed in the unit cube Q, we face the problem
of computing the homology of a union of a polyhedral cell complex P (V ) and
a polyhedral cell R.
Definition 2. Let (K, ∂) be a finite cell complex and φ1 , φ2 , . . . , φr a sequence of
integral operators φi : C∗ (K) → C∗+1 (K), each involving two cells {c^i_1 , c^i_2 } of different
dimension and such that {c^i_1 , c^i_2 } ∩ {c^k_1 , c^k_2 } = ∅ for all 1 ≤ i ≠ k ≤ r. Then an
algebraic gvf Σ_{i=1}^{r} φi for C(K) onto a chain subcomplex having n − 2r cells can
be constructed. The sum Σ_{i=1}^{r} φi applied to a cell u is c^k_2 if u = c^k_1 (k = 1, . . . , r)
and zero elsewhere.
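Over F2, the sum Σφi of Definition 2 can be sketched as a lookup table extended linearly to chains; chains are frozensets of cells and the representation is our own assumption:

```python
def algebraic_gvf(pairs):
    """Build Σφi from the pairs (c_1^i, c_2^i): the map sends each c_1^i
    to c_2^i and every other cell to 0, extended linearly over F2."""
    table = dict(pairs)  # disjointness of the pairs makes this well defined

    def phi(chain):
        image = set()
        for cell in chain:
            if cell in table:
                image ^= {table[cell]}  # addition mod 2 = symmetric difference
        return frozenset(image)

    return phi
```

The disjointness condition of the definition is what guarantees that no two pairs collide in the table and that n − 2r cells survive in the target subcomplex.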
322 P. Real and H. Molina-Abril
Fig. 6. An example showing the representative generator of the 1-cycle (in blue) and
the resulting Φ and ϕ. Notice that Φ(< 3, 6 >) = 0 and < 3, 6 > ∉ Im(Φ) (< 3, 6 > is
a critical simplex in terms of Discrete Morse Theory).
Fig. 7. An example showing the filling of the “hole” and an attachment of a 2-cell
In general, Φ = Σ_{i=1}^{r} φi does not satisfy the condition ΦdΦ = Φ. Applying
Algorithm 1 to (K, ∂) (previously filtered) with a partial filtering affecting only
the cells c^i_j (1 ≤ i ≤ r and j = 1, 2) in its sub-cells, and specifying at each
cell-step concerning the cell c^i_2 that φ̃(fi (c^i_1 )) := c^i_2 , the final result will be a
(not necessarily homological) algebraic integral operator ϕ : C(K) → C(K).
Applying Proposition 1 to the algebraic integral operator ϕ and assuming that
K has n cells, we obtain a chain contraction (f, g, ϕ) from C(K) to a chain
subcomplex C(M (K)) having n − 2r cells, where M (K) is also called the Morse
complex of K associated to the sequence {φi }_{i=1}^{r} . Algorithm 1 applied to M (K) gives us a
homology gvf φ for M (K). Finally, the map ϕ + φ(1 − dϕ − ϕd) gives us a
homology gvf for the cell complex K.
Using these arguments, it is straightforward to design an algorithmic pro-
cess of homology computation (over F2 ) for a binary 26-adjacency voxel-based
digital volume V based on the contractibility of the maximal cells (in terms
References
1. Delfinado, C.J.A., Edelsbrunner, H.: An Incremental Algorithm for Betti Numbers
of Simplicial Complexes on the 3–Sphere. Comput. Aided Geom. Design 12, 771–
784 (1995)
2. Eilenberg, S., MacLane, S.: Relations between homology and homotopy groups of
spaces. Ann. of Math. 46, 480–509 (1945)
3. Forman, R.: A Discrete Morse Theory for Cell Complexes. In: Yau, S.T. (ed.)
Geometry, Topology & Physics for Raoul Bott. International Press (1995)
4. Molina-Abril, H., Real, P.: Advanced homology computation of digital volumes via
cell complexes. In: Proceedings of the Structural and Syntactic Pattern Recognition
Workshop, Orlando, Florida, USA (December 2008)
5. Gonzalez-Diaz, R., Real, P.: On the Cohomology of 3D Digital Images. Discrete
Applied Math. 147, 245–263 (2005)
6. Hatcher, A.: Algebraic Topology. Cambridge University Press, Cambridge (2001)
7. Kenmochi, Y., Kotani, K., Imiya, A.: Marching cube method with connectivity. In:
Proceedings of International Conference on Image Processing. ICIP 1999, vol. 4(4),
pp. 361–365 (1999)
8. Kenmochi, Y., Imiya, A., Ichikawa, A.: Boundary extraction of discrete objects.
Computer Vision and Image Understanding 71, 281–293 (1998)
9. Munkres, J.R.: Elements of Algebraic Topology. Addison–Wesley Co., London
(1984)
10. Real, P., Gonzalez-Diaz, R., Jimenez, M.J., Medrano, B., Molina-Abril, H.: Inte-
gral Operators for Computing Homology Generators at Any Dimension. In: Ruiz-
Shulcloper, J., Kropatsch, W.G. (eds.) CIARP 2008. LNCS, vol. 5197, pp. 356–363.
Springer, Heidelberg (2008)
11. Zomorodian, A., Carlsson, G.: Localized Homology. Computational Geometry:
Theory and Applications 41(3), 126–148 (2008)
From Random to Hierarchical Data through an
Irregular Pyramidal Structure
1 Introduction
Large sets of data often need to be searched with respect to a specific
query point. Many data items in these sets could be excluded from the
search, as they are far from the query point. However, if the data items are not
clustered, there is no way but to check each item; a process that can be time
consuming. If the data items are clustered or categorized into a hierarchy, the search
time can be reduced considerably. However, if we structure data in a hierarchy,
visualizing such a hierarchy may be a challenge. Different techniques have been
developed over the last years to help humans grasp the structure of a hierarchy in
a visual form (e.g., treemaps [19], information slices [1] and sunburst [20]). These
techniques can be categorized into different sets depending on the nature of the
data visualized and the way the data are visualized.
This paper presents a technique based on irregular pyramidal rules to cluster
data points, with the aim of reducing the time consumed in the search process.
The paper is organized as follows. Sec. 2 presents the concepts of pyrami-
dal architecture and multiresolution structures. Sec. 3 surveys different visual-
ization techniques that have been developed under different categories. Sec. 4
presents our algorithm that depends on a hierarchical structure to cluster the
data. Finally, Sec. 5 presents some experimental results while Sec. 6 derives some
conclusions.
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 324–333, 2009.
© Springer-Verlag Berlin Heidelberg 2009
2 Pyramidal Architecture
Hierarchical or multiresolution processing through pyramidal structures is a well-
known topic in image analysis. The main aim of such a concept is to reduce the
amount of information to be manipulated in order to speed up the whole process.
Over recent decades, many hierarchical or pyramidal structures have
been developed to solve various image-processing problems (e.g.,
segmenting an image according to its different gray levels). Such pyramidal struc-
tures can be categorized into two main subsets: regular and irregular
pyramids. The classification depends on whether
a parent in the hierarchy has a constant number of children, building a regular
structure, or a variable number of children, building an irregular structure.
Regular pyramids include, among others, bin pyramid [9] in which a parent has
exactly two children; quad pyramid [9] where a parent has four children (Fig. 1);
hexagonal pyramid [7] that uses a triangular tessellation and in which a parent
has four children; dual pyramid [15] with levels rotated 45◦ alternately.
In the category of irregular pyramids, the number of children per parent varies
according to the information processed and the operation under consideration.
Hence, the number of surviving nodes, cells or pixels may change from one situa-
tion to another according to the data processed. In order to accommodate this:
• A level should be represented as a graph data structure; and
• Some rules must be utilized in order to govern the process. In the adaptive
pyramid [8] and the disparity pyramid [6], the decimation process; i.e., the
process by which the surviving cells are chosen, can be controlled by two
rules:
1. Two neighbors cannot survive together to the higher level; and
2. For each non-surviving cell, there is at least one surviving cell in its
neighborhood.
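Together, the two rules say that the surviving cells form a maximal independent set of the neighborhood graph. A greedy sketch (a deterministic sweep of our own, in place of the stochastic selection used in the adaptive and disparity pyramids):

```python
def decimate(neighbors):
    """Pick surviving cells from a neighborhood graph so that
    (1) no two neighbors both survive, and
    (2) every non-survivor has at least one surviving neighbor."""
    survivors = set()
    for cell in sorted(neighbors):              # any fixed order works
        if not survivors.intersection(neighbors[cell]):
            survivors.add(cell)                 # no surviving neighbor yet
    return survivors
```

Because the sweep only skips a cell when one of its neighbors already survived, rule (2) holds automatically once the sweep is finished.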
It is worth mentioning that all the above pyramids work on images. However, we
may apply pyramidal rules to space points in order to cluster them according to
their mutual proximity. Hence, flat and random data with no apparent
hierarchical nature can be categorized into a hierarchy.
Sec. 4 specifies the steps of the algorithm we propose in order to
cluster the data points and visualize them as a hierarchy using a query-dependent
pixel-oriented technique.
3 Visualization Techniques
In addition to the irregular pyramid concept mentioned above, we need to in-
vestigate some visualization concepts. These are query-dependent versus query-
independent techniques in addition to different techniques to visualize hierarchies.
3.1 Query-Dependency
Visualization techniques can be categorized into query-dependent and query-
independent subsets. The query-dependent techniques refer to visualizing the
arranged data according to some attribute. The user may input a query point to
compare with the other data items. The differences can be calculated, arranged
in order and visualized as colored pixels. Spiral and axes techniques and their
variations [11,10,12] are examples that can be used in this case. The query-
independent techniques do not require the user to input a query point to visualize
data with respect to that point; instead, the data are visualized with no apparent
order if data items are not sorted originally.
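As a concrete illustration of the query-dependent case, ranking data items by their distance to a user-supplied query point before mapping them to colored pixels can be sketched as follows (a minimal sketch; the function names are our own, not from [11,10,12]):

```python
def manhattan(a, b):
    # Manhattan (L1) distance between two equal-length vectors.
    return sum(abs(x - y) for x, y in zip(a, b))

def query_dependent_order(items, query, dist):
    """Order data items by their distance to the query point; the resulting
    ranks can then be mapped to pixel colors by a spiral or axes layout."""
    return sorted(items, key=lambda item: dist(item, query))

items = [(3, 4), (0, 1), (5, 5)]
ranked = query_dependent_order(items, (0, 0), manhattan)
```

A query-independent technique would skip this ranking step and visualize the items in their original order.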
4 Algorithm
Our algorithm can be split into two main phases to:
1. Build the hierarchy through data clustering using irregular pyramidal tech-
nique.
2. Visualize the established hierarchical data with respect to a query point.
As in other irregular pyramids, each level of the structure is represented as a
graph. At the lowest level (i.e., the base), the graph consists of a number of
cluster cells (or nodes) where each node is linked to every other node and where
every node contains only one space point. At the upper levels, a cluster node
may contain more points, while the number of clusters at that level is reduced
compared to its predecessor.
As mentioned in Sec. 2, some rules must exist in order to control the decima-
tion process of choosing the surviving cells and how cells at different levels are
linked together. The rules used in this structure are:
1. Two neighbors may both survive at the next level if and only if some binary
variable is set to zero during the decimation process. Such a rule is different
from the case of the adaptive and disparity pyramids [8,6]; and
2. For each nonsurviving node, there exists at least one surviving node in its
neighborhood. Such a rule is true in case of the adaptive and disparity
pyramids.
Suppose that the set of clusters at a given level i is L(i) = {C(i,1), C(i,2), ..., C(i,n)},
where n is the number of clusters at this level and C(i,j) is a cluster consisting of
a number of space points (with j ∈ {1, ..., n}). We can also define a cluster as
C(i,j) = {p(i,j,1), p(i,j,2), ..., p(i,j,m)}, where j ∈ {1, ..., n} is the cluster number,
m is the number of points in the cluster, and p is a vector whose length depends on
the dimension of the space.
A binary variable q is reset to 0 for every two clusters, C(i,j) and C(i,k) , at level
Li . The following Euclidean distance is calculated among the points contained
in these clusters:
d(p(i,j,a), p(i,k,b)) = ||p(i,j,a) − p(i,k,b)|| = sqrt( Σ_{d=1}^{D} (p(i,j,a,d) − p(i,k,b,d))² )   (1)
where i is the level number; j and k are the cluster numbers; a and b are the
point numbers; D is the dimension of space and ||.|| represents the norm of the
difference between the two vectors. The Manhattan metric may be used instead,
for faster evaluation:

d(p(i,j,a), p(i,k,b)) = Σ_{d=1}^{D} |p(i,j,a,d) − p(i,k,b,d)|   (2)
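Both metrics are straightforward to implement; a short sketch of Eqs. (1) and (2) for D-dimensional points represented as tuples:

```python
import math

def euclidean(p, q):
    """Eq. (1): L2 norm of the difference between two D-dimensional points."""
    return math.sqrt(sum((pd - qd) ** 2 for pd, qd in zip(p, q)))

def manhattan(p, q):
    """Eq. (2): L1 distance, cheaper to evaluate than Eq. (1)."""
    return sum(abs(pd - qd) for pd, qd in zip(p, q))
```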
The value of the distance d(p(i,j,a), p(i,k,b)) is compared against a threshold
t supplied as a parameter to the algorithm. If the test d(p(i,j,a), p(i,k,b)) < t
328 R. Elias, M. Al Ashraf, and O. Aly
results in a true condition, the search is broken immediately for the current
clusters and the variable q is set to 1; otherwise, q remains 0. Thus, different
situations arise with respect to the value of q and whether or not the parents C(i+1,j)
and C(i+1,k) of clusters C(i,j) and C(i,k) exist. These are summarized in
Table 1.
The procedure explained above is repeated until all clusters are separated by dis-
tances greater than the threshold t (similar to [8,6]). Note
that statistics such as the mean and the size of the clusters are updated at each level.
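Under our reading of the rules above, one decimation step merges a cluster into a surviving parent whenever some cross-cluster pair of points lies within the threshold t (the q-test), and steps are repeated until no further merges occur. A minimal sketch with illustrative names; the greedy merge order is our simplification:

```python
def manhattan(p, q):
    # L1 metric, as in Eq. (2).
    return sum(abs(a - b) for a, b in zip(p, q))

def close(cluster_a, cluster_b, t, dist):
    # The q-test: q becomes 1 as soon as any cross-cluster pair of
    # points lies closer than the threshold t.
    return any(dist(p, q) < t for p in cluster_a for q in cluster_b)

def build_next_level(level, t, dist):
    """One decimation step: each cluster either survives or joins a survivor."""
    parents = []
    for cluster in level:
        for parent in parents:
            if close(cluster, parent, t, dist):
                parent.extend(cluster)  # non-survivor merges into a surviving parent
                break
        else:
            parents.append(list(cluster))  # cluster survives to the next level
    return parents

def build_pyramid(points, t, dist):
    level = [[p] for p in points]  # base level: one point per cluster
    pyramid = [level]
    while True:
        nxt = build_next_level(level, t, dist)
        if len(nxt) == len(level):
            break  # every pair of clusters is now at least t apart
        pyramid.append(nxt)
        level = nxt
    return pyramid

pyramid = build_pyramid([(0, 0), (1, 0), (10, 10)], 2, manhattan)
```

Per-cluster statistics (mean, size) would be updated alongside each merge in a full implementation.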
After storing the flat random data along a hierarchy, viewing parts of the
data relevant to a query point becomes easier. Spiral and axes techniques [11]
are applied to the hierarchical data. Clusters constituting each level are repre-
sented as pixels where each pixel has a color indicating the mean of all points
contained in the cluster. Interactivity is added as clicking on a pixel displays
the children underneath. A way of magnifying the results is also included in our
implementation.
5 Experimental Results
We considered different factors while building the pyramid. Among these factors
are the number of data points to be clustered and the threshold used and their
impact on the number of levels and the number of clusters at the top level and
consequently on the reduction factor of clusters.
Ten files with sets ranging from 100 to 1000 5D points are used with a fixed
threshold t of 800 applied to the Manhattan metric. As expected, the number of
levels increases as the number of points increases for the same threshold. This
is shown in Fig. 2(a).
In our hierarchical structure, a cluster contains one data point at the lowest
level, which makes the number of clusters at this level equal to the number of
points. As we go up the hierarchy, the number of clusters gets smaller while
the number of points per cluster gets larger. For the ten files used before with
the same threshold t of 800, the greatest impact concerning the reduction of
the number of clusters with respect to the number of points happens at the
second level as shown in Fig. 2(b).
From Random to Hierarchical Data 329
Fig. 2. (a) The number of levels of the hierarchical structure increases as the number
of points increases. (b) The number of clusters is reduced significantly at the second
level of the hierarchy.
Fig. 3. (a) Number of clusters at the top level of the hierarchy for different point sets
and reduction factor values associated with these sets. For all cases, a threshold value
of 800 is used. (b) The percentage of the number of clusters at the top levels to the
total number of points decreases as the number of points increases.
Fig. 5. (a) Number of clusters at the top level of the hierarchy for different thresh-
old levels and reduction factor values associated with these threshold levels. (b) The
reduction factor increases as the threshold value increases.
For each data set where t = 800, the percentages of the number of clusters at
the top levels to the total number of points were measured. As expected from
Fig. 2(b), the percentage decreases as the number of points increases. Conse-
quently, the reduction factor increases as the number of points increases. This is
shown in Fig. 3.
In order to test the impact of the threshold value, one file containing 1000
points is used with threshold values ranging from 300 to 1300. In these cases, the
number of levels ranges from 2 to 4 according to the threshold value as shown
in Fig. 4.
It is logical that by increasing the threshold value, more points can be clustered
together and fewer clusters are formed at the top level of the hierarchical
structure. As a consequence, the reduction factor should increase as the threshold
value increases. These results are shown in Fig. 5.
Fig. 6. Axes technique results for the same file after clustering. (a) Level 4, L(4), is
displayed with 243 points. (b) Level 3, L(3), showing the contents of one of the points
in the lower right quadrant in (a).
Fig. 7. (a) Time consumed to perform both versions of the axes technique for different
sets of points. (b) Time consumed to perform both versions of the axes technique for
different sets of points.
In order to visualize the points along the hierarchy built as four levels for a
set of 1000 5D points with t = 800, we use both the spiral and axes visualization
techniques. As shown in Fig. 6(a), we start by plotting the top level (L(4) ) of the
clustered hierarchy that contains only 243 points (as opposed to 1000 points in
the original list). A cluster at the top level is represented as a point with a color
indicating the mean of the points (or sub-clusters) contained in that cluster. The
user has the ability to select a particular cluster and view its inner cluster points
where each point can represent a cluster that can be viewed hierarchically and
so on. Fig. 6(b) shows the contents of L(3) after selecting a point in the lower
right quadrant in L(4) .
In order to show the effect of our approach, we measured the time consumed
when using the axes technique in both cases of random and hierarchical data
for different sets of points. This is shown in Fig. 7. Notice that the difference
between the two versions grows as the number of points grows. This makes sense,
since the reduction factor also grows with the number of points, as mentioned
previously (refer to Fig. 3(a)).
We measured the time consumed to display random and hierarchical data for
the same test set using the spiral technique: 76 ms and 16 ms, respectively,
on a computer running at 2.0 GHz.
6 Conclusions
An irregular pyramidal scheme is suggested to transform random data into a hierar-
chy, in an attempt to reduce the time consumed searching the whole data set for
a particular query. Tests show reductions in the amount of data processed and
consequently in the time consumed.
References
1. Andrews, K., Heidegger, H.: Information slices: Visualising and exploring large
hierarchies using cascading, semi-circular discs. In: IEEE InfoVis 1998, pp. 9–12
(1998)
2. Balzer, M., Deussen, O., Lewerentz, C.: Voronoi treemaps for the visualization of
software metrics. In: Proc. ACM SoftVis 2005, New York, USA, pp. 165–172 (2005)
3. Beaudoin, L., Parent, M.-A., Vroomen, L.C.: Cheops: a compact explorer for com-
plex hierarchies. In: Proc. 7th conf. on Visualization (VIS 1996), Los Alamitos,
CA, USA, p. 87 (1996)
4. Bladh, T., Carr, D., Scholl, J.: Extending tree-maps to three dimensions: a compar-
ative study. In: Masoodian, M., Jones, S., Rogers, B. (eds.) APCHI 2004. LNCS,
vol. 3101, pp. 50–59. Springer, Heidelberg (2004)
5. Chignell, M.H., Poblete, F., Zuberec, S.: An exploration in the design space of
three dimensional hierarchies. In: Human Factors and Ergonomics Society Annual
Meeting Proc., pp. 333–337 (1993)
6. Elias, R., Laganiere, R.: The disparity pyramid: An irregular pyramid approach
for stereoscopic image analysis. In: VI 1999, Trois-Rivières, Canada, May 1999, pp.
352–359 (1999)
7. Hartman, N.P., Tanimoto, S.: A hexagonal pyramid data structure for image pro-
cessing. IEEE Trans. on Systems, Man and Cybernetics 14, 247–256 (1984)
8. Jolion, J.M., Montavert, A.: The adaptive pyramid: A framework for 2d image
analysis. CVGIP: Image Understanding 55(3), 339–348 (1991)
9. Jolion, J.M., Rosenfeld, A.: A Pyramid Frame-work for Early Vision. Kluwer Aca-
demic Publishers, Dordrecht (1994)
10. Keim, D.A., Ankerst, M., Kriegel, H.-P.: Recursive pattern: A technique for visu-
alizing very large amounts of data. In: Proc. 6th VIS 1995, Washington, DC, USA,
pp. 279–286, 463 (1995)
11. Keim, D.A., Kriegel, H.: VisDB: Database exploration using multidimensional vi-
sualization. In: Computer Graphics and Applications (1994)
12. Keim, D.A., Kriegel, H.-P.: Visualization techniques for mining large databases: A
comparison. IEEE Trans. on Knowl. and Data Eng. 8(6), 923–938 (1996)
13. Kerren, A.: Explorative analysis of graph pyramids using interactive visualization
techniques. In: Proc. 5th IASTED VIIP 2005, Benidorm, Spain, pp. 685–690 (2005)
14. Kerren, A., Breier, F., Kügler, P.: DGCVis: An exploratory 3D visualization of graph
pyramids. In: Proc. 2nd CMV 2004, London, UK, pp. 73–83 (2004)
15. Kropatsch, W.G.: A pyramid that grows by powers of 2. Pattern Recognition Let-
ters 3, 315–322 (1985)
16. Plaisant, C., Grosjean, J., Bederson, B.B.: Spacetree: Supporting exploration in
large node link tree, design evolution and empirical evaluation. In: Proc. IEEE
InfoVis 2002, Washington, DC, USA, p. 57 (2002)
17. Rekimoto, J., Green, M.: The information cube: Using transparency in 3d infor-
mation visualization. In: Proc. 3rd WITS 1993, pp. 125–132 (1993)
18. Robertson, G.G., Mackinlay, J.D., Card, S.K.: Cone trees: animated 3d visualiza-
tions of hierarchical information. In: Proc. CHI 1991, New York, USA, pp. 189–194
(1991)
19. Shneiderman, B.: Tree visualization with tree-maps: 2-d space-filling approach.
ACM Trans. Graph. 11(1), 92–99 (1992)
20. Stasko, J.T., Zhang, E.: Focus+context display and navigation techniques for en-
hancing radial, space-filling hierarchy visualizations. In: INFOVIS, p. 57 (2000)
21. Wattenberg, M.: Visualizing the stock market. In: CHI 1999 extended abstracts on
Human factors in computing systems, New York, USA, pp. 188–189 (1999)
22. Yang, J., Ward, M.O., Rundensteiner, E.A.: Interring: An interactive tool for vi-
sually navigating and manipulating hierarchical structures. In: Proc. IEEE InfoVis
2002, Washington, DC, USA, p. 77 (2002)
Electric Field Theory Motivated Graph
Construction for Optimal Medical Image
Segmentation
Electrical and Computer Engineering, The University of Iowa, Iowa City, IA, USA
Milan-Sonka@uiowa.edu
1 Introduction
Wu and Chen introduced a graph-search approach to image segmentation, the optimal
net surface problem, in 2002 [1]. Use of this method in the medical image segmenta-
tion area closely followed [2,3,4,5,6,7,8,9,10]. Out of these publications, [3] is
considered a pioneering paper in which Li et al. explained and verified how to
optimally segment single and multiple coupled flat surfaces represented by a vol-
umetric graph structure. This work was further extended to optimally segment
multiple coupled closed surfaces of a single object [2]. Later, Garvin introduced
the in-region cost concept [5] and applied it to 8-surface segmentation of retinal layers
from OCT images [7]. Olszewski and Zhao utilized this concept for 4D dual-
surface inner/outer wall segmentation in coronary intravascular ultrasound and
in intrathoracic airway CT images [4]. Yin has further extended this framework
by solving a general “multiple surfaces of multiple objects” problem with ap-
plications to knee cartilage segmentation and quantification [8]. Independently,
Li added elasticity constraint and segmented 3D liver tumors [9]. The optimal
surface detection algorithms were also employed for 3D soft tissue segmentation
in [6] as well as for segmentation of a coupled femoral head and ilium and a
coupled distal femur and proximal tibia in 3D CT data [10].
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 334–342, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Electric Field Theory Motivated Graph Construction 335
2 Methods
2.1 Graph Structures for Optimal Surface Detection
The basic graph construction idea comes from a study of an optimal V-weight
net surface problem on proper ordered multi-column graphs [1]. Let us start from
a simple example shown on Fig. 1(a). Each node is assigned a cost value C. Each
edge has infinite capacity. We reassign the costs as C′a = Ca, C′b = Cb − Ca,
C′c = Cc − Cb, C′d = Cd − Cc, and so on. This cost assignment is called cost translation.
After translation, we connect a source s to all nodes with negative C′ and connect all
nodes with positive C′ to a sink t. The connection-edge capacity is assigned to |C′|. A max-flow/min-
cut computation will partition the nodes of this graph into two sets – S and
T , such that s ∈ S and t ∈ T . Note that S − s is a closed set, meaning that
the graph cut position on column i must be −ε higher and θ lower than the
graph cut position on column j, so that the minimum and maximum distances
between the two cut positions are −ε and θ, respectively. Furthermore, the total
translated costs in the closed set are guaranteed to be minimal because their
sum and the cost of the corresponding cut only differ by a constant (the sum
of absolute values of all negative C′). Thus, summing up these costs guarantees
that the nodes immediately under the graph cut have the sum of untranslated
costs equivalent to the sum of translated costs in the closed set. For that reason,
the surface formed by such a cut is globally optimal.
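The cost translation and the source/sink connections can be sketched directly; the max-flow/min-cut computation itself is left to any standard solver, and the names here are illustrative:

```python
def cost_translation(costs):
    """Translate column costs: C'_a = C_a, C'_b = C_b - C_a, C'_c = C_c - C_b, ..."""
    translated = [costs[0]]
    for prev, cur in zip(costs, costs[1:]):
        translated.append(cur - prev)
    return translated

def terminal_edges(translated, source="s", sink="t"):
    """Connect the source to every node with negative C', and every node
    with positive C' to the sink; each terminal edge gets capacity |C'|."""
    edges = []
    for node, c in enumerate(translated):
        if c < 0:
            edges.append((source, node, -c))
        elif c > 0:
            edges.append((node, sink, c))
    return edges
```

Feeding these terminal edges, together with the infinite-capacity intra-column edges, to a max-flow solver yields the minimum closed set described above.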
In image segmentation tasks, the nodes on the graph correspond to candidate
searching points. We want to find one and only one point along each searching
direction which corresponds to each column in the graph. The graph cut on
columns provides a globally optimal solution which gives a minimum sum of
point costs under a specific graph structure. Based on this simple two-column
relationship, n-D graphs can be constructed. Fig. 1(b) shows a 3D example,
in which the graph cut forms a 3D surface. The 4D case can be seen in [4].
If i and j are from the same surface, the min and max distances define the
336 Y. Yin, Q. Song, and M. Sonka
Fig. 1. A simple example of proper ordered multi-column graph. (a) Two columns. (b)
Columns combined in 3D.
surface smoothness. If i and j are from different surfaces of one object, such a
configuration corresponds to a multiple coupled surface relationship. If i and j
are from different objects, multi-object relationships are represented [8]. If i and
j are the grid neighbors on an image, a flat surface will result [3]. If i and j
are the vertex neighbors of a closed surface, a closed surface will be found as a
solution of the graph search optimization process [2].
While the theory – as presented – is quite straightforward, the implementation
of these basic principles is not simple. In our multi-object multi-surface image
segmentation task, two problems frequently arise. One is to prevent occurrence of
surface warping when applying graph search iteratively. Another issue is finding
a reliable cross-surface mapping method. In our previous work we have employed
a 3D distance transform and medial sheet definition approaches to define cross-
surface mapping. This approach suffers from local inconsistencies in areas of
complex surface shapes. Motivated by electric field theory, we devised a new
method for cross-surface mapping defining the searching directions (columns)
of our graph column construction, which has overcome the limitations of our
previous approach. This approach has proven to be very promising to handle
the two identified problems when applied to medical image segmentation tasks,
as described below.
Fig. 2. A simulation of ELF. (a) Multiple unit charge points used for field definition
– the electric field is depicted in red and ELF is shown in white. (b) Simulated ELF
(red lines) for closed surface model of a 3D bifurcation.
the electric field has the same direction as the electric line of force (ELF).
When multiple source points are forming an electric field, the electric lines of
force exhibit a non-intersection property, which is of major interest in the context
of our graph construction task. This property can be shown in 2D in Fig. 2(a).
Note that if we change r² to r^m (m > 0), the non-intersection property still
holds. The difference is that vertices at longer distances will be penalized more
in the ELF computation. Considering that the surface is composed of a limited number
of vertices greatly reduces the effect of charges at short distances. To
compensate, we selected m = 4. Discarding the constant term, we defined our
electric field as Ei = 1/r⁴. Inspired by ELF, we assigned unit charges to each
Fig. 3. Correspondent pair generation in 2D and 3D. (a) 2D case where the red lines
are ELF and their connecting counterparts are depicted in green. The constraint points
are at the intersection position between the green line and the corresponding coupled
surface. (b) Use of barycentric coordinates to interpolate back-trace lines in 3D, then
connect to each vertices for the intersected triangle.
If there is one closed surface charge in an n-D space, there is only one n-D point
inside this closed surface having a zero electric field. At an extreme case, the
closed surface will converge to a point when searching along the ELF. Except
for that point, any position having non-zero electric field in n-D space will be
crossed by one ELF. In that case, we can trace back along the ELF to a specific
position on the surface (whether it is a vertex or not). This property can be used
to relate multiple coupled surfaces, thus defining cross-surface mapping.
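A sketch of evaluating the field Ei = 1/r⁴ at a point, assuming unit charges placed at the surface vertices; each charge contributes a vector of magnitude 1/r⁴ directed away from it (function and variable names are our own):

```python
import math

def electric_field(point, charges, m=4):
    """Field at `point` from unit charges, each contributing with magnitude
    1/r^m (m = 4 as chosen above) in the direction away from the charge."""
    fx = fy = fz = 0.0
    for cx, cy, cz in charges:
        dx, dy, dz = point[0] - cx, point[1] - cy, point[2] - cz
        r = math.sqrt(dx * dx + dy * dy + dz * dz)
        # Multiplying the displacement (length r) by 1/r^(m+1) yields a
        # vector of magnitude 1/r^m along the outward direction.
        scale = 1.0 / r ** (m + 1)
        fx += dx * scale
        fy += dy * scale
        fz += dz * scale
    return (fx, fy, fz)

# One unit charge at the origin: the field at (1, 0, 0) points outward.
field = electric_field((1.0, 0.0, 0.0), [(0.0, 0.0, 0.0)])
```

Tracing an ELF then amounts to repeatedly stepping along the normalized field vector.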
In the application of cross-surface mapping, we compute ELF for each closed
surface within a searching range independently. Considering a task of segmenting
multiple mutually interacting surfaces for multiple mutually interacting objects,
the regions in which the objects are in proximity to each other are called contact
areas. We can compute medial sheets between coupled surfaces to identify the
separation of objects in the contact areas. Clearly, any vertex for which the ELF
intersects the medial sheet can be regarded as belonging to the contact area. To
form correspondent vertex pairs, the medial-sheet-intersected ELF will connect
the coupled surface points while intersecting the medial sheet at one and only one
point, forming an intersection point on the coupled surface, used as constraint
point. The vertex having the intersected ELF and its corresponding constraint
point will form a correspondent vertex pair. Consequently, the ELF connecting
this pair forms the searching graph column. Fig. 3(a) shows a 2D case in which
the red lines depict the ELF and their connecting counterparts are depicted by
green lines. The constraint points are at the intersection position between the
green lines and the corresponding surfaces. In the 2D case, the back-trace can
be done by linear interpolation of the nearest ELF. Subsequently, the constraint
points are connected to the points on the coupled surface. In the 3D case, the
lines can be traced according to the barycentric coordinates of the intersected
triangles. As shown in Fig. 3, the constraint points are further connected to the
vertices of the triangle.
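The barycentric back-trace for the 3D case relies on a standard barycentric-coordinate computation; the following is a sketch in the plane of the intersected triangle (a hypothetical helper, not the authors' code):

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of point p with respect to triangle (a, b, c),
    expressed in the triangle's own 2D plane coordinates."""
    det = (b[1] - c[1]) * (a[0] - c[0]) + (c[0] - b[0]) * (a[1] - c[1])
    wa = ((b[1] - c[1]) * (p[0] - c[0]) + (c[0] - b[0]) * (p[1] - c[1])) / det
    wb = ((c[1] - a[1]) * (p[0] - c[0]) + (a[0] - c[0]) * (p[1] - c[1])) / det
    return wa, wb, 1.0 - wa - wb

def interpolate(vertex_values, weights):
    # Blend per-vertex quantities (e.g., traced ELF lines) with the weights.
    return sum(v * w for v, w in zip(vertex_values, weights))

# The centroid of a triangle has weights (1/3, 1/3, 1/3).
weights = barycentric((1 / 3, 1 / 3), (0.0, 0.0), (1.0, 0.0), (0.0, 1.0))
```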
Each vertex in the contact area can therefore be used to create a constraint
point affecting the coupled surface. Importantly, the correspondent pairs of ver-
tices from two interacting objects in the contact area identified using the ELF
are guaranteed to be in a one-to-one relationship and every-to-every mapping,
irrespective of surface vertex density. As a result, the desirable property of main-
taining the previous surface geometry is preserved.
3 Applications
3.1 Single-Surface Detection along a 3D Bifurcation
An example is shown in Fig. 4(a), in which a perfectly pre-segmented inner boundary
of a 3D bifurcation is provided and the outer boundary needs to be identified.
The graph search along the surface-normal direction will corrupt the surface due
to the sharp corner as shown in Fig. 4(b). However, when employing the direc-
tionality constraints specified by ELF, the directionality of the “normal” lines
along the surface is orderly and the search can avoid the otherwise inevitable
corruption of the surface solution (Fig. 4(c)).
Fig. 6. Graph-based femur-tibia cartilage delineation in 3D. (a) Graph searching result
without using correspondent vertex pairs. (b) Graph searching result using constraint-
point correspondent vertex pairs.
4 Conclusion
Acknowledgments
References
1. Wu, X., Chen, D.Z.: Optimal net surface problem with applications. In: Widmayer,
P., Triguero, F., Morales, R., Hennessy, M., Eidenbenz, S., Conejo, R. (eds.) ICALP
2002. LNCS, vol. 2380, pp. 1029–1042. Springer, Heidelberg (2002)
2. Li, K., Millington, S., Wu, X., Chen, D.Z., Sonka, M.: Simultaneous segmentation
of multiple closed surfaces using optimal graph searching. In: Christensen, G.E.,
Sonka, M. (eds.) IPMI 2005. LNCS, vol. 3565, pp. 406–417. Springer, Heidelberg
(2005)
3. Li, K., Wu, X., Chen, D.Z., Sonka, M.: Optimal surface segmentation in volumetric
images – a graph-theoretic approach. IEEE Trans. Pattern Anal. and Machine
Intelligence 28(1), 119–134 (2006)
4. Zhao, F., Zhang, H., Walker, N.E., Yang, F., Olszewski, M.E., Wahle, A., Scholz,
T., Sonka, M.: Quantitative analysis of two-phase 3D+time aortic MR images.
SPIE Medical Imaging, vol. 6144, pp. 699–708 (2006)
5. Haeker, M., Wu, X., Abramoff, M., Kardon, R., Sonka, M.: Incorporation of re-
gional information in optimal 3-D graph search with application for intraretinal
layer segmentation of optical coherence tomography images. In: Karssemeijer, N.,
Lelieveldt, B. (eds.) IPMI 2007. LNCS, vol. 4584, pp. 607–618. Springer, Heidelberg
(2007)
6. Heimann, T., Munzing, S., Meinzer, H., Wolf, I.: A shape-guided deformable
model with evolutionary algorithm initialization for 3D soft tissue segmentation.
In: Karssemeijer, N., Lelieveldt, B. (eds.) IPMI 2007. LNCS, vol. 4584, pp. 1–12.
Springer, Heidelberg (2007)
7. Garvin, M.K., Abramoff, M.D., Kardon, R., Russell, S.R., Wu, X., Sonka, M.:
Intraretinal layer segmentation of macular optical coherence tomography images
using optimal 3-D graph search. IEEE Trans. Med. Imaging 27(10), 1495–1505
(2008)
8. Yin, Y., Zhang, X., Sonka, M.: Optimal multi-object multi-surface graph search
segmentation: Full-joint cartilage delineation in 3D. In: Medical Image Understand-
ing and Analysis 2008, pp. 104–108 (2008)
9. Li, K., Jolly, M.P.: Simultaneous detection of multiple elastic surfaces with appli-
cation to tumor segmentation in ct images. In: Proc. SPIE, vol. 6914, pp. 69143S–
69143S–11 (2008)
10. Kainmueller, D., Lamecker, H., Zachow, S., Heller, M., Hege, H.C.: Multi-object
segmentation with coupled deformable models. In: Proc. of Medical Image Under-
standing and Analysis, pp. 34–38 (2008)
11. Dice, L.R.: Measures of the amount of ecologic association between species. Ecol-
ogy 26, 297–302 (1945)
Texture Segmentation by Contractive Decomposition
and Planar Grouping
Abstract. Image segmentation has long been an important problem in the com-
puter vision community. In our recent work we have addressed the problem of
texture segmentation, where we combined top-down and bottom-up views of the
image into a unified procedure. In this paper we extend our work by proposing
a modified procedure which makes use of graphs of image regions. In the top-
down procedure a quadtree of image region descriptors is obtained in which a
novel affine contractive transformation based on neighboring regions is used to
update descriptors and determine stable segments. In the bottom-up procedure
we form a planar graph on the resulting stable segments, where edges are present
between vertices representing neighboring image regions. We then use a vertex
merging technique to obtain the final segmentation. We verify the effectiveness
of this procedure by demonstrating results which compare well to other recent
techniques.
1 Introduction
The problem of image segmentation, with the general goal of partitioning an image
into non-overlapping regions such that points within a class are similar while points
between classes are dissimilar [1], has long been studied in computer vision. It plays a
major role in high level tasks like object recognition [2,3], where it is used to find image
parts corresponding to scene objects, and image retrieval [4], where the objective is to
relate images from similar segments. Textured objects, in particular, pose a great chal-
lenge for segmentation since patterns and boundaries can be difficult to identify in the
presence of changing scale and lighting conditions [5]. Often textures are characterized
by repetitive patterns [6], and these are only characteristic from a certain scale. Below
this scale these patterns will only be partly visible [7] which makes precise boundary
detection in this case an additional challenge. The intensity variation of textures is of-
ten overlapping with the background, which may add further difficulty. Examples of
proposed approaches to texture segmentation include active contours [8], templates [2],
or region descriptors [9]. We recently introduced a new approach to texture segmen-
tation [10], where the procedure is unsupervised in the sense that we assume no prior
knowledge of the target classes, i.e. number of regions or known textures.
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 343–352, 2009.
© Springer-Verlag Berlin Heidelberg 2009
344 A.B. Dahl, P. Bogunovich, and A. Shokoufandeh
Fig. 1. Texture segmentation from contractive maps. In (a) a heterogeneous image is shown,
created by composing a Brodatz texture [11] with itself rotated 90° in a masked-out area obtained
from the bird in (b). The resulting segmentation is shown in (c).
Fig. 2. The segmentation procedure. The top-down decomposition of the image is shown in (a).
In (b) the feature kernel set is shown. The first image in (c) is the over-segmented image obtained
from the decomposition. The segments are merged in the bottom-up procedure to obtain the final
segment shown in the last two images.
2 Method
In this section we present an overview of the general procedure for unsupervised texture
segmentation. First we give a brief review of the process for obtaining base characteri-
zations of small image regions which serve as a starting point for the segmentation. We
then indicate our modifications to the decomposition transformation and the approach
to merging leaves and generating the final segmentation.
In [10] we introduced the concept of kernel partition iterated function systems (kPIFS)
which proved to be a viable technique for obtaining a basic characterization of local
image structure to serve as a starting point for segmentation. Since we are primarily
focused on the top-down and bottom-up procedures in this paper we only provide a
brief review of kPIFS descriptors and we refer the reader to our previous paper [10] for
more details.
The kPIFS technique which we developed is inspired by and closely related to the
partition iterated function systems (PIFS) introduced by Jacquin [12] for the purpose
of lossy image compression [13]. We saw potential in PIFS to characterize local image
structure based on evidence indicating that it can be used in tasks such as edge detection
[14] and image retrieval [15].
The traditional PIFS image compression technique computes a set of self-mappings
on the image. The process begins by partitioning an image into a set of domain blocks
DI , and again into smaller range blocks RI , as illustrated by Figure 3(b). The image is
encoded by matching an element d ∈ DI to each rk ∈ RI . In the course of matching,
a transformation θk, which is generally affine, is calculated for the domain block d
that matches range block rk, and θk(d) is used to represent rk. Once all of the maps
are computed they can be applied to an arbitrary image and will result in an accurate
reconstruction of the encoded image.
For our goal of characterizing local structure we designed kPIFS to avoid self-
mappings between domain blocks and range blocks. Instead we chose to find mappings
from an over-complete basis of texture kernels, DK , to the range blocks of the image as
illustrated by Figure 3(c). The kernels employed here are meant to represent local struc-
tural image patterns such as corners, edges of varying width and angle, blobs, and flat
regions. In our procedure, each image range block will be characterized by distances of
each of the domain kernels to the range block after a calibration transform is applied.
Specifically, for a domain kernel d ∈ DK and a range block rk ∈ RI the distance in
kPIFS is given by
Fig. 3. Comparison of PIFS and kPIFS. Part (a) shows the original image with the highlighted
area is focused on in (b) and (c). Part (b) is an example of PIFS where the best matching domain
block is mapped to a range block. Part (c) shows the kPIFS where the domain blocks are replaced
by domain kernels.
δkPIFS(rk, d) = || (d − μd)/σd − (rk − μrk)/σrk || ,   (1)
where μ_x and σ_x are the mean and standard deviation, respectively, of block x. The calibrated blocks will be highly influenced by noise if σ_{r_k} is small, and if it is zero we cannot estimate δ_kPIFS. Therefore, we use a measure of flatness of the range blocks, b_f = σ_{r_k}/√(μ_{r_k}). If b_f < t_f, where t_f is a threshold, we categorize the block as flat.
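The calibrated distance and the flatness test can be sketched as follows. This is an illustrative NumPy implementation, not the authors' code; it assumes an L2 norm in Eq. (1), and the threshold value t_f is an arbitrary placeholder:

```python
import numpy as np

def kpifs_distance(r, d):
    """delta_kPIFS of Eq. (1): distance between standardized blocks (L2 norm assumed)."""
    r, d = r.astype(float).ravel(), d.astype(float).ravel()
    return np.linalg.norm((d - d.mean()) / d.std() - (r - r.mean()) / r.std())

def is_flat(r, t_f=0.05):
    """Flatness test b_f = sigma / sqrt(mu) < t_f; catches the sigma = 0 case in
    which delta_kPIFS is undefined. The value of t_f is illustrative."""
    r = r.astype(float).ravel()
    return r.std() / np.sqrt(r.mean()) < t_f

flat_block = np.full((4, 4), 10.0)                   # sigma = 0: must be caught
edge_kernel = np.tile([0.0, 0.0, 1.0, 1.0], (4, 1))  # a step-edge texture kernel
```

A flat block is filtered out before the calibrated distance (which would divide by σ = 0) is ever evaluated; a kernel compared with itself has distance zero.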
We then let each range block be described by its best mapped (least distant) domain
kernels. The similarity for a kernel is weighted by the relative similarity of all of the
kernels to the range block. Let Δrk denote the mean distance from each kernel in DK
to the current range block obtained from (1) and let γkernel be a scalar constant con-
trolling how many domain kernels are included in the descriptions. The kernel to range
block similarity is given by w[rk ,d ] = max {γkernelΔrk − δkPIFS (rk , d ), 0} for each
d ∈ DK to form a vector of similarities which is normalized yielding a range block
descriptor in the form of a distribution of domain kernels. Intuitively, each w_[r_k,d] reflects how well kernel d fits block r_k: it is large when the fitting error is small.
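The descriptor construction just described can be sketched as follows (an illustrative NumPy snippet, not the authors' implementation; the value of γ_kernel is an assumption):

```python
import numpy as np

def kernel_descriptor(dists, gamma_kernel=0.8):
    """Turn per-kernel distances into a normalized similarity descriptor.

    dists[j] = delta_kPIFS(r_k, d_j) for every kernel d_j in D_K.
    gamma_kernel scales the mean distance Delta_{r_k}; kernels farther than
    gamma_kernel * Delta_{r_k} receive zero weight.
    """
    dists = np.asarray(dists, dtype=float)
    thresh = gamma_kernel * dists.mean()       # gamma_kernel * Delta_{r_k}
    w = np.maximum(thresh - dists, 0.0)        # w_[r_k, d] before normalization
    s = w.sum()
    return w / s if s > 0 else w

# four hypothetical kernel distances; only the two closest kernels survive
desc = kernel_descriptor([0.1, 0.5, 2.0, 3.0])
```

The resulting vector is a distribution over the domain kernels, with mass concentrated on the best-fitting ones.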
The novel idea that we now introduce to this procedure addresses the iterative trans-
formations that are applied to the nodes until convergence. The convergence of both the original transformation and the new one presented here relies on properties of contractive transformations in a metric space [16]. Here we briefly review the necessary concepts.
Definition 1 (Contractive Transformation). Given a metric space (X, δ), a transfor-
mation T : X → X is called contractive or a contraction with contractivity factor s if
there exists a positive constant s < 1 so that δ(T (x), T (y)) ≤ sδ(x, y) ∀x, y ∈ X.
Let us then denote T ◦n (x) = T ◦ T ◦ · · · ◦ T (x); that is, T composed with itself n times
and applied to x. The property of contractive transformations that we are interested in
is given in the following theorem which is proved in [16].
Theorem 1 (Contractive Mapping Fixed Point Theorem). Let (X, δ) be a complete
metric space and let T : X → X be a contractive transformation, then there exists a
unique point xf ∈ X such that for all x ∈ X we have xf = T (xf ) = limn→∞ T ◦n (x).
The point xf is called the fixed point of T .
The importance of this theorem is that if we can show a transformation to be contractive
in a defined metric space, then we are sure that some fixed point will be reached by
applying the transformation iteratively. In both the original procedure and the updated
version the metric space was defined as the set of image region descriptor histograms
which can be thought of as lying in the space IR^d. It follows that any metric on IR^d can be chosen; in practice, however, we have just used the L1 distance metric, denoted by δ_L1 and defined as δ_L1(x, y) = Σ_{i=1}^{d} |x_i − y_i|.
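The practical content of Theorem 1 can be illustrated with a toy contraction on (IR^d, δ_L1). The map T(x) = 0.5x + b below is hypothetical: it has contractivity factor s = 0.5, and its unique fixed point solves x = 0.5x + b, i.e. x = 2b, reached from any starting point:

```python
import numpy as np

def l1(x, y):
    """The L1 metric delta_L1 on IR^d."""
    return np.abs(x - y).sum()

def iterate_to_fixed_point(T, x0, eps=1e-10, max_iter=1000):
    """Iterate x <- T(x) until successive iterates are eps-close in L1."""
    x = x0
    for _ in range(max_iter):
        x_next = T(x)
        if l1(x, x_next) < eps:
            return x_next
        x = x_next
    return x

b = np.array([1.0, -2.0])
xf = iterate_to_fixed_point(lambda x: 0.5 * x + b, np.array([100.0, 100.0]))
```

Since successive iterates get closer by the factor s at each step, the stopping rule also bounds the remaining distance to the true fixed point.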
In the original paper on the procedure [10] we proposed a transformation to perform
an iterative weighted averaging of similar region descriptors within a local spatial neigh-
borhood. Specifically, given some descriptor wi at the current level of the quadtree, let
Ni denote the set of m × m spatially local neighbor descriptors around wi but not
including wi , and let μNi be the average L1 distance from wi to all of the other de-
scriptors in N_i. We then define a threshold t_Ni = ψμ_Ni, where ψ is some weighting constant, and denote the set of close descriptors N_i^c = {w_j ∈ N_i : δ_L1(w_i, w_j) ≤ t_Ni}. Then we define a transformation F_i for this descriptor to be the average of the descriptors in N_i^c and w_i. More explicitly:
F_i(w) = (1/(1 + |N_i^c|)) ( w + Σ_{w_j ∈ N_i^c} w_j ).  (2)
A transformation F_i was found for each w_i at the current level and it was applied iteratively to obtain updated descriptors, i.e. w_i^n = F_i^{◦n}(w_i), until δ_L1(w_i^n, w_i^{n+1}) < ε for some given error threshold ε. We claimed that each F_i was contractive and would
thus yield a fixed point descriptor based on a result from Van der Vaart and Van Zanten
[17]. While this appears sufficient, the proof is complicated and indirect and Fi takes a
somewhat inconvenient form. Here we propose a simpler affine transformation where
contractivity can easily be observed.
Our new transformation is also defined for each region descriptor at each level of the quadtree. Let w_i, N_i and t_Ni be defined as above, and let N_i′ = {w_i} ∪ N_i. We
now define a set of scalar weights for every descriptor in N_i′ such that s_(i,j) represents a measure of similarity between w_i and w_j for w_j ∈ N_i′. The weights are defined as s_(i,j) = max{(t_Ni − δ_L1(w_i, w_j))/c_i, 0}, where c_i is a normalization constant so that Σ_{j=1}^{|N_i′|} s_(i,j) = 1. In this way all s_(i,j) ≤ 1, and each descriptor w_j ∈ N_i′ has an associated similarity weight s_(i,j), with the special scalar s_(i,i) being the weight for w_i. Now define a new descriptor v_i as a linear combination of the neighboring descriptors, v_i = Σ_{w_j ∈ N_i} s_(i,j) w_j; our affine transformation G_i for descriptor w_i is then given by

G_i(w) = s_(i,i) w + v_i .  (3)
Again we iteratively apply G_i to w_i, obtaining w_i^n = G_i^{◦n}(w_i), until convergence, but here, due to the simple affine form of G_i, it is particularly easy to demonstrate the contractivity of the transformation. For arbitrary descriptors x, y ∈ IR^d we have δ_L1(G_i(x), G_i(y)) = Σ_{j=1}^{d} |(s_(i,i) x_j + v_ij) − (s_(i,i) y_j + v_ij)|. Notice that the v_ij's all cancel out and the s_(i,i) can be factored out, simplifying to δ_L1(G_i(x), G_i(y)) = s_(i,i) Σ_{j=1}^{d} |x_j − y_j| = s_(i,i) δ_L1(x, y). Since s_(i,i) ≤ 1, G_i is either contractive or it does not move w_i at all; either way, Theorem 1 guarantees that we reach a fixed point descriptor, which we can denote by w_i. In practice the convergence is quite fast, and we generally need fewer than 10 iterations for ε = 0.01.
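A minimal NumPy sketch of G_i and its fixed point iteration, assuming the weight definitions above (the descriptors and ψ = 1 are made-up example values, not from the paper):

```python
import numpy as np

def l1(x, y):
    return np.abs(x - y).sum()

def make_G(w_i, neighbors, psi=1.0):
    """Build G_i(w) = s_(i,i) * w + v_i for descriptor w_i.

    neighbors holds the descriptors in N_i (w_i itself excluded); the weights
    s_(i,j) are thresholded L1 similarities, normalized to sum to one.
    """
    d = np.array([l1(w_i, w_j) for w_j in neighbors])
    t_Ni = psi * d.mean()
    s = np.maximum(t_Ni - np.concatenate(([0.0], d)), 0.0)  # s[0] is s_(i,i)
    s /= s.sum()
    v_i = sum(s_j * w_j for s_j, w_j in zip(s[1:], neighbors))
    return (lambda w: s[0] * w + v_i), s[0]

def fixed_point(G, w0, eps=0.01, max_iter=100):
    """Iterate w <- G(w) until successive iterates are eps-close in L1."""
    w = w0
    for _ in range(max_iter):
        w_next = G(w)
        if l1(w, w_next) < eps:
            return w_next
        w = w_next
    return w

w_i = np.array([0.6, 0.4])
neighbors = [np.array([0.5, 0.5]), np.array([0.9, 0.1])]
G_i, s_ii = make_G(w_i, neighbors)
w_fix = fixed_point(G_i, w_i)   # analytically, w = s_ii * w + v_i gives [0.5, 0.5]
```

Because s_(i,i) ≤ 1 by construction, the map is (weakly) contractive, matching the argument in the text.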
When the fixed point descriptors wi are reached for all regions at the current level,
we identify the stability of each region based on the discrepancy of its fixed point to
the fixed point of its neighbors. Since both Fi and Gi average each wi with its similar
neighbors, there is a strong possibility that sub-images in the regions with high local
discrepancy after the iterative procedure will cover different textures. To avoid misclas-
sifications we split and repeat the contractive mappings on these regions at the next level
of the quadtree, as illustrated in Figure 2(a). The discrepancy of a node is measured by comparing w_i to the fixed points of its four spatially nearest neighbors, which we denote by the set N̄_i. Let μ_N̄i denote the average L1 distance from w_i to the descriptors in N̄_i and let m_N̄i denote the maximum such distance; then the discrepancy measure of the region is defined as D_i = μ_N̄i + m_N̄i.
Though we are only concerned with splitting and reprocessing unstable regions, in
practice all regions are split. From Di we are able to calculate a border measure for
each node as Bi = Di / max{Dj : j ∈ {1, . . . , Nk }} where Nk is the total number of
nodes at the current decomposition level. Bi determines how wi ’s children descriptors
are calculated. Let {w(i,j) : j ∈ {1, . . . , 4}} denote the 4 initial descriptors of wi ’s
children used in the next level of the quadtree. If Bi = 0 then the region is stable and
there is no chance of wi covering a boundary region and so we assign w(i,j) = wi for
all children. When Bi > 0 we let {v(i,j) : j ∈ {1, . . . , 4}} denote the descriptors of
the child regions calculated as the normalized sum of kPIFS histograms in the same
manner as at the starting level lstart . Then we obtain the new descriptors as w(i,j) =
(1 − Bi )wi + Bi v(i,j) .
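The discrepancy, border measure and child initialization can be sketched as follows; the numeric values are illustrative placeholders, not from the paper:

```python
import numpy as np

def l1(x, y):
    return np.abs(x - y).sum()

def discrepancy(w_i, four_neighbors):
    """D_i: mean plus maximum L1 distance from w_i's fixed point to those of
    its four spatially nearest neighbors."""
    d = np.array([l1(w_i, w_j) for w_j in four_neighbors])
    return d.mean() + d.max()

def child_descriptors(w_i, B_i, v_children):
    """Initialize the four children: w_(i,j) = (1 - B_i) w_i + B_i v_(i,j)."""
    if B_i == 0.0:   # stable region: children simply inherit the parent descriptor
        return [w_i.copy() for _ in range(4)]
    return [(1.0 - B_i) * w_i + B_i * v for v in v_children]

D = np.array([0.0, 0.3, 0.6])   # hypothetical discrepancies of the nodes on one level
B = D / D.max()                 # border measure B_i = D_i / max_j D_j
kids = child_descriptors(np.array([1.0, 0.0]), B[1], [np.array([0.0, 1.0])] * 4)
```

A node with B_i = 0 passes its descriptor unchanged to all four children, while a boundary node blends the parent with the children's own kPIFS histograms.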
Fig. 4. Bottom-up merging of image regions. Part (a) shows the obtained segments and (b) shows the corresponding graph. Edge weights give the similarity between the segments. On the right-hand sides of (a) and (b), segments 1 and 2 from the left-hand sides have been merged.
The goal of the bottom-up procedure is to merge these leaves into homogeneous clusters which form the final segmentation.
In our original approach we fitted a mixture of Gaussians to the distribution of leaf nodes w_f using the approach of Figueiredo [18], and the final segmentation was obtained by assigning each leaf to the Gaussian giving the highest probability.
Our new approach begins by forming a planar graph G so that the vertices of G are
the leaf nodes and an edge (i, j) is formed between vertices representing adjacent image
regions with edge weight equal to δL1 (wi , wj ), the distance between the associated
fixed point descriptors. The bottom-up procedure then merges adjacent vertices of G
based on edge weight. Let αi denote the percentage of the total image covered by vertex
i. The area α_i enters the merging in two ways: the smallest regions are forced to merge with their most similar neighboring region, and when merging any two vertices i, j the ratio α_i/α_j is considered so that the merged vertex has a descriptor which is mostly influenced by the relatively larger region.
The merging of vertices is done in two steps. Initially we merge all vertex pairs i, j whose edge weight is close to 0, i.e. less than some small positive ε. These regions had nearly identical fixed points, and the disparity is most likely due only to the fact that the fixed point is approximated. In the second step we let ΔG denote the average
weight in the current graph G which is updated after each merging is performed. We
proceed in merging the vertices i, j with the smallest current edge weight until the
relative weight δL1 (wi , wj )/ΔG is larger than some threshold γmerge ∈ [0, 1). Figure 4
gives an illustration of the process.
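The two-step merging can be sketched as below. This is a simplified illustration, not the authors' implementation: it omits the forced merging of the very smallest regions, and the values of ε and γ_merge are assumptions:

```python
import numpy as np

def merge_regions(desc, area, edges, eps=1e-6, gamma_merge=0.5):
    """Bottom-up merging of the leaf-region graph G.

    desc:  {vertex: fixed point descriptor}
    area:  {vertex: fraction of the image covered}
    edges: set of frozensets {i, j} linking adjacent regions
    Returns a map from each original vertex to its merged representative.
    """
    label = {v: v for v in desc}

    def find(v):                       # follow merge links to the representative
        while label[v] != v:
            v = label[v]
        return v

    def weight(i, j):                  # edge weight = L1 distance of descriptors
        return np.abs(desc[i] - desc[j]).sum()

    def merge(i, j):                   # larger region dominates the merged descriptor
        a = area[i] + area[j]
        desc[i] = (area[i] * desc[i] + area[j] * desc[j]) / a
        area[i], label[j] = a, i

    def live_edges():
        es = {frozenset((find(i), find(j))) for i, j in map(tuple, edges)}
        return [tuple(e) for e in es if len(e) == 2]

    # step 1: merge pairs whose fixed points only differ numerically
    for i, j in live_edges():
        i, j = find(i), find(j)
        if i != j and weight(i, j) < eps:
            merge(i, j)

    # step 2: cheapest-edge merging with a relative stopping rule
    while True:
        es = live_edges()
        if not es:
            break
        mean_w = np.mean([weight(i, j) for i, j in es])
        i, j = min(es, key=lambda e: weight(*e))
        if weight(i, j) > gamma_merge * mean_w:
            break
        merge(i, j)
    return {v: find(v) for v in desc}

desc = {1: np.array([0.0, 1.0]), 2: np.array([0.0, 1.0]), 3: np.array([1.0, 0.0])}
area = {1: 0.4, 2: 0.4, 3: 0.2}
edges = {frozenset((1, 2)), frozenset((2, 3))}
roots = merge_regions(desc, area, edges)   # 1 and 2 coincide; 3 stays separate
```

The stopping rule is relative: merging halts as soon as the cheapest remaining edge exceeds γ_merge times the current mean edge weight.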
3 Experiments
In this section we show the experimental results of our procedure. The images used for testing are from the Berkeley image database [19] and the Brodatz textures [11]. Our procedure has proved very powerful for texture segmentation, which is demonstrated by comparing our results to the state-of-the-art methods of Fauzi and Lewis [3], Houhou et al. [8], and Hong et al. [7].
Fauzi and Lewis [3] perform unsupervised segmentation on a set of composed Brodatz textures [11]. We have compared the performance of our method to theirs by making a set of randomly composed images from the same set of Brodatz textures. These composed images are very well suited to our method because the descriptors
Fig. 5. Segmentation of the Brodatz textures [11]. The composition of the textures is inspired
by the segmentation procedure of Fauzi and Lewis [3]. Segmentation borders are marked with white lines, except in the last image, where a part in the lower right is marked in black to make it visible.
Fig. 6. Comparative results. This figure shows our results compared to those of Hong et al. [7]. Our results are on the top in (a) and (b) and on the right in (c).
precisely cover one texture, so to challenge our procedure we changed the composition.
Some examples of the results are shown in Figure 5. We obtain very good segmentation
for all images, with only small errors along the texture boundaries. In 19 of 20 images we found the correct 5 textures; only the texture in the lower right-hand corner of the last image was split into two. It should be noted that this texture contains two homogeneous areas. In [3] only 7 of 9 composed images were accurately segmented.
These results show that the texture characterization is quite good. But the challenge of
textures in natural images is larger, as we will show next.
We have tested our procedure on the same set of images from the Berkeley segmentation database [19] as was used in Hong et al. [7] and Houhou et al. [8]. The results are compared in Figures 6 and 7. Our method performs well compared to that of Hong et al., especially in Figures 6(a) and (c). It should be noted that the focus of that paper was also on texture scale applied to segmentation. The results compared to the method of Houhou et al. are more alike, and both methods find the interesting segments in all images. In Figures 7(e) and (f) our method finds some extra textures which are clearly distinct. In Figures 7(k) and (l) both methods find segments that are not part of the starfish, but are clearly distinct textures. There are slight differences between the two methods, e.g. in Figures 7(a) and (b), where the object is merged with a part of the background in our method, whereas it is found very nicely by the method of Houhou et al. [8]. An example in favor of our procedure is Figures 7(m) and (n), where part of the head and the tail are not found very well by their method, whereas they are found very well by our procedure.
Texture Segmentation by Contractive Decomposition and Planar Grouping 351
Fig. 7. Comparative results. This figure shows our results in columns one and three compared to the results from Houhou et al. [8] in columns two and four.
4 Conclusion
Texture poses a great challenge to segmentation methods, because textural patterns can
be hard to distinguish at a fine scale making precise boundary detection difficult. We
have presented a novel, computationally efficient approach to segmentation of texture
images. To characterize the local structure of the image, we begin with a top-down decomposition in the form of a hierarchical quadtree. At each level of this tree a contractive
transformation is computed for each node and is iteratively applied to generate a novel
encoding of the sub-images. The hierarchical decomposition is controlled by the stabil-
ity of the encoding associated with nodes (sub-images). The leaves of this quadtree and
their incidence structure with respect to the original image form a planar graph in a natural way. The final segmentation is obtained from a bottom-up merging process
applied to adjacent nodes in the planar graph. We evaluate the technique on artificially
composed textures and natural images, and we observe that the approach compares fa-
vorably to several leading texture segmentation algorithms on these images.
References
1. Pal, N.R., Pal, S.K.: A review on image segmentation techniques. Pattern Recognition 26(9),
1277–1294 (1993)
2. Borenstein, E., Ullman, S.: Class-specific, top-down segmentation. In: Heyden, A., Sparr,
G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2351, pp. 109–122. Springer,
Heidelberg (2002)
352 A.B. Dahl, P. Bogunovich, and A. Shokoufandeh
3. Fauzi, M.F.A., Lewis, P.H.: Automatic texture segmentation for content-based image retrieval
application. Pattern Anal. & App. 9(4), 307–323 (2006)
4. Liu, Y., Zhou, X.: Automatic texture segmentation for texture-based image retrieval. In:
MMM (2004)
5. Malik, J., Belongie, S., Shi, J., Leung, T.: Textons, contours and regions: Cue integration in
image segmentation. In: IEEE ICCV, pp. 918–925 (1999)
6. Zeng, G., Van Gool, L.: Multi-label image segmentation via point-wise repetition. In: IEEE
Conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008, pp. 1–8 (June
2008)
7. Hong, B.H., Soatto, S., Ni, K., Chan, T.: The scale of a texture and its application to seg-
mentation. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8
(2008)
8. Houhou, N., Thiran, J., Bresson, X.: Fast texture segmentation model based on the shape
operator and active contour. In: CVPR, pp. 1–8 (2008)
9. Bagon, S., Boiman, O., Irani, M.: What is a good image segment? a unified approach to seg-
ment extraction. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part IV. LNCS,
vol. 5305, pp. 30–44. Springer, Heidelberg (2008)
10. Dahl, A., Bogunovich, P., Shokoufandeh, A., Aanæs, H.: Texture segmentation from context
and contractive maps. Technical report (2009)
11. Brodatz, P.: Textures; a photographic album for artists and designers (1966)
12. Jacquin, A.E.: Image coding based on a fractal theory of iterated contractive image transfor-
mations. IP 1(1), 18–30 (1992)
13. Fisher, Y.: Fractal Image Compression - Theory and Application. Springer, New York (1994)
14. Alexander, S.: Multiscale Methods in Image Modelling and Image Processing. PhD thesis (2005)
15. Xu, Y., Wang, J.: Fractal coding based image retrieval with histogram of collage error. In:
Proceedings of 2005 IEEE International Workshop on VLSI Design and Video Technology,
2005, pp. 143–146 (2005)
16. Rudin, W.: Principles of Mathematical Analysis, 3rd edn. McGraw-Hill, New York (1976)
17. van der Vaart, A.W., van Zanten, J.H.: Rates of contraction of posterior distributions based on Gaussian process priors. The Annals of Statistics 36(3), 1435–1436 (2008)
18. Figueiredo, M.A.T., Jain, A.K.: Unsupervised selection and estimation of finite mixture mod-
els. In: Proc. Int. Conf. Pattern Recognition, pp. 87–90 (2000)
19. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images
and its application to evaluating segmentation algorithms and measuring ecological statistics.
In: ICCV, vol. 2, pp. 416–423 (July 2001)
Image Segmentation Using Graph
Representations and Local Appearance and
Shape Models
1 Introduction
The goal of image segmentation is to partition an image into meaningful dis-
joint regions, whereby these regions delineate different objects of interest in the
observed scene. Segmentation of anatomical structures in medical images is es-
sential in some clinical applications such as diagnosis, therapy planning, visual-
ization and quantification. As manual segmentation of anatomical structures in
two or three-dimensional medical images is a very subjective and time-consuming
process, there is a strong need for automated or semi-automated image segmen-
tation algorithms.
A large number of segmentation algorithms have been proposed. While earlier approaches were often based on a set of ad hoc processing steps, optimization
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 353–365, 2009.
© Springer-Verlag Berlin Heidelberg 2009
354 J. Keustermans et al.
avoided [13]. To build the local shape model the polygon mesh is considered as an
undirected graph in which the nodes represent vertices and neighboring nodes
are connected by edges. By imposing the Markov property on the graph, each vertex is directly dependent only upon its neighboring vertices. In this way
a local shape model can be applied. This method does not suffer from the noise-
sensitivity of the edge-detection methods if a good local image descriptor is used.
Nor does this method assume homogeneity of the object appearance.
We extend the framework proposed by Seghers [12] by incorporating kernel
based methods for statistical model building and experimenting with other local
appearance models. We applied this segmentation algorithm to the segmentation
of teeth from Cone Beam Computed Tomography (CBCT) images of a patient.
The recent introduction of CBCT enables the routine computer-aided planning of orthognathic surgery or orthodontic treatment due to its low radiation dose, unique accessibility and low cost. These applications, however, require the segmentation of
certain anatomical structures from the 3D images, like teeth. The CBCT image
quality can be hampered by the presence of several artifacts, like metallic streak
artifacts due to orthodontic braces or dental fillings. The method should be able
to cope with these artifacts. Another application, also in the maxillofacial region,
is automatic 3D cephalometric analysis [14]. A 3D cephalometric analysis consists of finding anatomical landmarks in 3D medical images of the head of the patient. Based on the locations of these anatomical landmarks, an orthodontic or orthognathic treatment plan, for example, can be made. Due to its notion of landmarks, this method is particularly suited to automate this task.
2 Method
2.1 Model Building
The segmentation algorithm presented in this paper belongs to the class of supervised segmentation algorithms. From a training data set of ground truth segmentations a statistical model is built. This training data set consists of images together with their corresponding object segmentations. These object segmentations are surfaces represented as polygon meshes. The nodes of such a polygon mesh are seen as landmarks in the image. Each landmark must correspond to the same location across the training images, i.e. landmark correspondences between the training data must exist. The next paragraphs describe the model building procedure. These models are built by estimating the probability density function from the training data. First the global statistical framework and the assumptions made are presented. The next paragraphs discuss the local appearance model and the local shape model. The final section explains the probability density function estimation.
P(G|I) = P(I|G) P(G) / P(I).  (1)
The first term, P(I|G), is the image prior; the second term, P(G), is the shape prior. The term in the denominator is a constant and is therefore of no interest. Maximizing the posterior probability is equivalent to minimizing its negative logarithm,

E(G) = E_I(I, G) + E_S(G),  (2)

where E_I(I, G) is the negative logarithm of the image prior and E_S(G) the negative logarithm of the shape prior. In this way a cost function that needs to be minimized is formulated. The next sections describe the image prior and the shape prior, respectively.
Image Prior. In order to build a model for the image prior two assumptions are
made. The first assumption states that the influence of the segmentation on the
image intensities is only local. This local influence is described by a Local Image
Descriptor (LID). This LID extracts the local intensity patterns around each
landmark in the image. The second assumption states the mutual independence
of these landmark-individual LIDs. Using these assumptions the image prior
term can be rewritten as follows:
P(I|G) = ∏_{i=1}^{n} P(I|l_i) = ∏_{i=1}^{n} P_i(ω_i).  (3)
Taking the negative logarithm yields E_I(I, G) = Σ_{i=1}^{n} d_i(ω_i), where d_i(ω_i) is the negative logarithm of P_i(ω_i) and represents the intensity cost of landmark i. As already explained, this LID tries to describe the image
intensity in the local neighborhood of each landmark. In the computer vision literature several LIDs have been proposed [15]. In this article two LIDs are used: the Gabor LID and locally orderless images.
Gabor LID. The Gabor LID is the response, in a given landmark, of a Gabor filter bank applied to the image. A Gabor filter captures the frequency and phase content of a signal with optimal space-frequency localization. The filter consists of a Gaussian kernel modulated by a complex sinusoid with a specific frequency and orientation. Gabor filters have been found to be distortion tolerant for pattern recognition tasks [16]. There is also a biological
Image Segmentation Using Graph Representations 357
where f is the central frequency of the filter, R (θ, φ) is the rotation matrix
determining the filter orientation and γ, η and ζ control the filter sharpness.
The term |f|³/(π^{3/2} γηζ) is a normalization constant for the filter response.
valued part (cosine) of the Gabor filter captures the symmetric properties and
the imaginary-valued part (sine) the asymmetrical properties of the signal. The
Gabor filter response can also be decomposed into a magnitude and a phase. The
phase behaves in an oscillatory manner, while the magnitude is smoother. Therefore, when comparing two LIDs, including phase information can lead to better results; using only the response magnitude, however, improves robustness [17].
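A 2D sketch of a Gabor LID (the paper's images are 3D, so the third sharpness parameter ζ is absent here). The kernel follows a common normalized parameterization with sharpness parameters γ, η, and the magnitude-only variant of the descriptor is used; all names and parameter values are illustrative:

```python
import numpy as np

def gabor_kernel(f, theta, gamma=1.0, eta=1.0, size=15):
    """Complex 2D Gabor kernel: Gaussian envelope times a complex sinusoid of
    frequency f oriented at angle theta; gamma and eta control the sharpness.
    The f**2 / (pi * gamma * eta) factor is the usual 2D normalization."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    envelope = np.exp(-f**2 * ((xr / gamma) ** 2 + (yr / eta) ** 2))
    return (f**2 / (np.pi * gamma * eta)) * envelope * np.exp(2j * np.pi * f * xr)

def gabor_lid(image, point, freqs, thetas):
    """Magnitudes of the filter bank responses at one landmark (the robust,
    magnitude-only variant of the descriptor)."""
    r, c = point
    out = []
    for f in freqs:
        for th in thetas:
            k = gabor_kernel(f, th)
            half = k.shape[0] // 2
            patch = image[r - half:r + half + 1, c - half:c + half + 1]
            out.append(np.abs(np.sum(patch * k)))
    return np.array(out)

# vertical stripes: the filter oriented across the stripes responds strongest
img = np.tile(np.sin(2 * np.pi * 0.25 * np.arange(32)), (32, 1))
lid = gabor_lid(img, (16, 16), freqs=[0.25], thetas=[0.0, np.pi / 2])
```

The two-filter bank responds strongly at the orientation and frequency matching the stripe pattern and weakly at the orthogonal orientation, which is what makes the response vector a useful local descriptor.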
Locally Orderless Images. The purpose of the LID is to describe the local intensi-
ties around each landmark. A Taylor expansion approximates the local intensities
by a polynomial of some order N . The coefficients of this Taylor expansion are
proportional to the derivatives up to order N . For images these derivatives can
be computed by convolving the image with the derivative of a Gaussian at a
particular scale σ. Instead of directly using the derivatives of the image, locally
orderless images [18] are used. The term locally orderless is used because the
image intensities are replaced by a local intensity histogram, and thus locally
the order of the image is removed. The first few moments of these histograms
are used to construct the feature images. In this way, the LID is defined as follows: first, a set of feature images is constructed by computing the derivatives of the image, applying the locally orderless image technique and computing the first few moments of the local intensity histograms. Subsequently the LIDs are constructed by taking samples along a spherical or linear profile centered at the location of the landmark. The linear profile can be defined along the image gradient in the landmark.
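The construction can be sketched as follows, using derivatives up to order one, the first two histogram moments, and a linear profile; the window sizes and sample counts are illustrative assumptions, not values from the paper:

```python
import numpy as np

def local_moments(feature, radius=2):
    """First two moments (mean, std) of the intensity histogram over a local
    window -- the 'locally orderless' step, which discards the local pixel order."""
    h, w = feature.shape
    mean, std = np.empty_like(feature), np.empty_like(feature)
    for r in range(h):
        for c in range(w):
            win = feature[max(r - radius, 0):r + radius + 1,
                          max(c - radius, 0):c + radius + 1]
            mean[r, c], std[r, c] = win.mean(), win.std()
    return mean, std

def orderless_lid(image, point, n_samples=5, direction=(0, 1)):
    """Sample the moment images of the image and its first derivatives along a
    linear profile through the landmark."""
    dy, dx = np.gradient(image.astype(float))
    lid = []
    for feature in (image.astype(float), dy, dx):   # Taylor terms up to order 1
        for moment in local_moments(feature):
            r, c = point
            for t in range(-(n_samples // 2), n_samples // 2 + 1):
                lid.append(moment[r + t * direction[0], c + t * direction[1]])
    return np.array(lid)

img = np.add.outer(np.arange(16), np.arange(16)).astype(float)  # a smooth ramp
lid = orderless_lid(img, (8, 8))   # 3 features x 2 moments x 5 samples
```

In the paper the derivatives are computed by Gaussian convolution at a scale σ; plain finite differences are used above only to keep the sketch short.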
Shape Prior. The shape prior introduces the shape model in the Bayesian
framework. In the shape model two assumptions are made. The first assumes
invariance of the model to translations. The second assumes that a landmark
only interacts with its neighbors, thereby implying its local nature.
To define this local shape model, some definitions need to be formulated.
The polygon mesh, representing the object segmentation, can be considered
as an undirected graph G = {V, E} with a set of vertices V = {l1 , . . . , ln },
the landmarks, and a set of edges E, representing the connections between the
p(x₁, x₂, x₃) = p(x₁) p(x₂|x₁) p(x₃|x₁, x₂) ≈ p(x₁) p(x₂|x₁) p(x₃|x₁).  (7)
In this last equation E(x) corresponds to ES (G), the negative logarithm of the
shape prior.
the optimization of the cost function (2). In the continuous domain this cost function has many local minima. Optimization techniques in this domain, such as gradient descent, therefore often do not result in a global minimum. This can be avoided by discretization of the cost function and the use of combinatorial optimization techniques.
Discretization. The discretization of the cost function is performed by impos-
ing a discrete sample space X on the graph G. This discrete sample space consists
of a finite set of possible landmark locations. In this case the optimization prob-
lem comes down to the selection of the optimal possible landmark location for
each landmark. These possible landmark locations for each landmark are ob-
tained by evaluating the intensity cost di (ωi ) in a search grid located around
the landmark of interest and selecting the m locations with lowest cost. This
results in a set of candidates x_i = {x_ik}_{k=1}^{m} for every landmark. The optimization problem now becomes a labeling problem r = {r₁, . . . , r_n}, where the following conditions must hold: Σ_{k=1}^{m} r_ik = 1, with r_ik = 1 if candidate k is selected for landmark i. The resulting discrete cost function to be minimized becomes
r* = arg min_r [ Σ_{i=1}^{n} Σ_{k=1}^{m} r_ik d_i(ω_ik) + γ Σ_{i=1}^{n} Σ_{k=1}^{m} Σ_{j=1}^{n} Σ_{o=1}^{m} r_ik r_jo d_ij(x_ik, x_jo) ] ,  (14)
where γ is a constant that determines the relative weight of the image and shape priors. It is important to note that we assume all images in the training data set are rigidly registered to a reference image, using for example mutual
registered to this reference image. In this way an initial guess concerning the
location of every landmark can be made and a grid of possible candidates can
be generated.
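Candidate generation can be sketched as follows; the cost function here is a hypothetical stand-in for the intensity cost d_i(ω_i), and the grid size and m are illustrative:

```python
def candidates(intensity_cost, center, grid_radius=3, m=5):
    """Return the m search-grid locations with the lowest intensity cost d_i.

    The grid is centered on the landmark's initial guess, which in the paper
    comes from the rigid registration to the reference image."""
    r0, c0 = center
    grid = [(r0 + dr, c0 + dc)
            for dr in range(-grid_radius, grid_radius + 1)
            for dc in range(-grid_radius, grid_radius + 1)]
    grid.sort(key=lambda p: intensity_cost(*p))
    return grid[:m]

# hypothetical cost surface: cheapest at (10, 12), growing with distance from it
cost = lambda r, c: (r - 10) ** 2 + (c - 12) ** 2
cands = candidates(cost, center=(10, 10))
```

The selected m locations become the label set of one landmark in the labeling problem of Eq. (14).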
Optimization. Currently two classes of methods are the most prominent ones
in discrete Markov random field optimization: the methods based on graph-
cuts [7] and those based on message-passing [23]. Examples of the message-
passing methods are belief propagation [24] and the so-called tree-reweighted
message passing algorithms [25,26]. The methods based on graph-cuts, however, cannot be used to minimize our cost function (14) because it is not graph-representable [27].
Another method to minimize our cost function is mean field annealing [19]. By
considering Ri to be a random variable taking values ri in some discrete sample
space R containing the labels of the labeling problem, R = {R1 , . . . , Rn } is
said to be a Markov random field under certain conditions (section 2.1). The
probability density function of this Markov random field can be written as
P(r) = (1/Z_r) exp( −E(r)/T ) ,  (15)
where E(r) is equal to the cost function in equation (14) and an artificial param-
eter T , called temperature, is added. The solution of equation (14) corresponds
Image Segmentation Using Graph Representations 361
to the configuration of the Markov random field with the highest probability. The temperature T can be altered without altering the most probable configuration r*, while the mean configuration r̄_T varies with T as follows:

lim_{T→0⁺} r̄_T = lim_{T→0⁺} Σ_r r P(r) = r* .  (16)
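A sketch of mean field annealing for this labeling energy on a toy problem (illustrative cooling schedule and values; not the authors' implementation). The indicators r_ik are relaxed to probabilities that are repeatedly re-estimated from the mean field while T is lowered toward zero:

```python
import numpy as np

def mean_field_annealing(unary, pair, gamma=1.0, T0=10.0, T_min=0.01,
                         cooling=0.9, sweeps=5):
    """Mean field annealing for the labeling energy of Eq. (14).

    unary[i, k]      plays the role of d_i(omega_ik),
    pair[i, k, j, o] plays the role of d_ij(x_ik, x_jo).
    The indicators r_ik are relaxed to probabilities q[i, k]; at each
    temperature the mean pairwise cost felt by every candidate is recomputed
    from q, and lowering T concentrates q on the most probable configuration.
    """
    n, m = unary.shape
    q = np.full((n, m), 1.0 / m)
    T = T0
    while T > T_min:
        for _ in range(sweeps):
            field = gamma * np.einsum('ikjo,jo->ik', pair, q)  # mean field
            logits = -(unary + field) / T
            logits -= logits.max(axis=1, keepdims=True)        # for stability
            q = np.exp(logits)
            q /= q.sum(axis=1, keepdims=True)
        T *= cooling                                           # anneal
    return q.argmax(axis=1)

# two landmarks, two candidates each; the pairwise term penalizes disagreement,
# so the strong unary preference of landmark 0 pulls landmark 1 to the same label
unary = np.array([[0.0, 1.0],
                  [0.5, 0.0]])
pair = np.zeros((2, 2, 2, 2))
pair[0, 0, 1, 1] = pair[0, 1, 1, 0] = 5.0
pair[1, 1, 0, 0] = pair[1, 0, 0, 1] = 5.0
labels = mean_field_annealing(unary, pair)
```

At high T the relaxed labels track the mean configuration r̄_T; as T → 0⁺ the distribution freezes onto the most probable configuration, in line with Eq. (16).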
Fig. 1. An example of a segmented tooth is shown in figure (a). Figures (b) and (c)
show the distance of this segmentation to the manual segmentation. The distances
are indicated by a color coding, in which blue indicates a distance of 0 mm and red
indicates a distance of 0.8 mm, being twice the voxel size.
Fig. 2. Box plots of the error distances between the true and found location for each
anatomical landmark
located further away from these artifacts, a performance similar to that of the locally orderless images is obtained. The locally orderless images are less sensitive to the metallic streak artifacts because they are defined more locally.
the head. The anatomical landmarks are nasion, sella, porion (left and right),
orbitale (left and right), upper incisor (left and right), lower incisor (left and
right), gonion (left and right), menton, anterior nasal spine, A-point, B-point
and posterior nasal spine [14]. The training data set consisted of 37 patients
of which most of them had orthodontic braces. A leave-one-out procedure is
performed and the errors for each anatomical landmark are reported. The LID
used is the Gabor filter bank, again containing 72 Gabor filters with 9 different orientations and 8 different frequencies. To compare the Gabor filter responses, both magnitude and phase are used, since this improved the results. KPCA is used to estimate the probability density functions. Figure 2 shows the results of this procedure as box plots of the error values.
References
1. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active Contour Models. International Journal of Computer Vision 1(4), 321–331 (1987)
2. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic Active Contours. International Jour-
nal of Computer Vision 22(1), 66–79 (1997)
364 J. Keustermans et al.
3. Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and associated variational problems. Communications on Pure and Applied Mathematics 42, 577–685 (1989)
4. Chan, T.F., Vese, L.A.: Active Contours Without Edges. IEEE Transactions on
Image Processing 10(2), 266–277 (2001)
5. Cremers, D., Rousson, M., Deriche, R.: A Review of Statistical Approaches to Level
Set Segmentation: Integrating Color, Texture, Motion and Shape. International
Journal of Computer Vision 72(2), 195–215 (2007)
6. Cremers, D.: Statistical Shape Knowledge in Variational Image Segmentation. Uni-
versität Mannheim (2002)
7. Boykov, Y., Veksler, O., Zabih, R.: Fast Approximate Energy Minimization
via Graph Cuts. IEEE Transactions on Pattern Analysis and Machine Intelli-
gence 23(11), 1222–1239 (2001)
8. Boykov, Y., Funka-Lea, G.: Graph Cuts and Efficient N-D Image Segmentation. International Journal of Computer Vision 70(2), 109–131 (2006)
9. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active Shape Models - Their
Training and Application. Computer Vision and Image Understanding 61(1), 38–59
(1995)
10. Cootes, T.F., Edwards, G.E., Taylor, C.J.: Active Appearance Models. IEEE
Transactions on Pattern Analysis and Machine Intelligence 23(6), 681–685 (2001)
11. Cremers, D., Osher, S.J., Soatto, S.: Kernel Density Estimation and Intrinsic Align-
ment for Shape Priors in Level Set Segmentation. International Journal of Com-
puter Vision 69(3), 335–351 (2006)
12. Seghers, D.: Local graph-based probabilistic representation of object shape and
appearance for model-based medical image segmentation. Katholieke Universiteit
Leuven (2008)
13. Seghers, D., Hermans, J., Loeckx, D., Maes, F., Vandermeulen, D., Suetens, P.:
Model-Based Segmentation Using Graph Representations. In: Metaxas, D., Axel,
L., Fichtinger, G., Székely, G. (eds.) MICCAI 2008, Part I. LNCS, vol. 5241, pp.
393–400. Springer, Heidelberg (2008)
14. Swennen, G.R.J., Schutyser, F., Hausamen, J.-E.: Three-Dimensional Cephalome-
try, A Color Atlas and Manual. Springer, Heidelberg (2006)
15. Ilonen, J.: Supervised Local Image Feature Detection. Lappeenranta University of
Technology (2007)
16. Lampinen, J., Oja, E.: Distortion tolerant pattern recognition based on self-
organizing feature extraction. IEEE Transactions on Neural Networks 6, 539–547
(1995)
17. Kämäräinen, J.-K.: Feature extraction using Gabor filters. Lappeenranta Univer-
sity of Technology (2003)
18. Koenderink, J.J., Van Doorn, A.J.: The Structure of Locally Orderless Images.
International Journal of Computer Vision 31, 159–168 (1999)
19. Li, S.Z.: Markov Random Field Modeling in Computer Vision. Springer, Heidelberg
(1995)
20. Courant, R., Hilbert, D.: Methods of Mathematical Physics. Interscience Publish-
ers, Inc., New York (1953)
21. Schölkopf, B., Smola, A., Müller, K.R.: Nonlinear component analysis as a kernel
eigenvalue problem. Neural Computation 10, 1299–1319 (1998)
22. Maes, F., Collignon, A., Vandermeulen, D., Marchal, G., Suetens, P.: Multimodal-
ity image registration by maximization of mutual information. IEEE transactions
on medical imaging 16(2), 187–198 (1997)
Image Segmentation Using Graph Representations 365
23. Komodakis, N., Paragios, N., Tziritas, G.: MRF Optimization via Dual Decompo-
sition: Message-Passing Revisited. In: ICCV 2007. IEEE 11th International Con-
ference on Computer Vision, pp. 1–8 (October 2007)
24. Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient Belief Propagation for Early Vi-
sion. International Journal of Computer Vision 70(1) (October 2006)
25. Wainwright, M.J., Jaakkola, T.S., Willsky, A.S.: MAP Estimation Via Agreement
on Trees: Message-Passing and Linear Programming. IEEE Transactions on Infor-
mation Theory 51(11), 3697–3717 (2006)
26. Kolmogorov, V.: Convergent Tree-reweighted Message Passing for Energy Mini-
mization. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(10),
1568–1583 (2006)
27. Kolmogorov, V., Zabih, R.: What Energy Functions Can Be Minimized via Graph
Cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence 26(2),
147–159 (2001)
28. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International
Journal of Computer Vision 60, 91–110 (2004)
29. Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
30. Wiskott, L., Fellous, J.-M., Krüger, N., von der Malsburg, C.: Face Recognition
by Elastic Bunch Graph Matching. Intelligent Biometric Tecniques in Fingerprint
and Face Recognition. Chapt. 11, 355–396 (1999)
Comparison of Perceptual Grouping Criteria
within an Integrated Hierarchical Framework
1 Introduction
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 366–375, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Since taking the Gestalt principles into account to group pixels into higher-level
structures is computationally complex, perceptual segmentation approaches typically
integrate a pre-segmentation stage with a subsequent perceptual grouping stage.
The first stage implements the low-level definition of segmentation as a process
of grouping pixels into homogeneous clusters, while the second stage performs a
domain-independent grouping of the pre-segmentation regions based mainly on
properties such as proximity, similarity, closure, or continuity. In this paper, both
stages perform a perceptual organization of the image, which is described by a
hierarchy of partitions ordered by inclusion. The base of this hierarchy is the
whole image, and each level represents the image at a certain scale of
observation [3]. This hierarchy has been structured using a Bounded Irregular
Pyramid (BIP) [4]. The data structure of the BIP is a mixture of regular and
irregular data structures, and it has previously been employed by color-based
segmentation approaches [4,5]. Experimental results have shown that, although
computationally efficient, these segmentation approaches are excessively affected
by the shift-variance problem [4,5]. In this paper, the original decimation
strategy has been modified to solve this problem by increasing the degree of
mixture between the regular and irregular parts of the BIP data structure. The
pre-segmentation stage of the proposed perceptual grouping approach uses this
decimation scheme to accomplish a color-based segmentation of the input image.
Experimental results show that the shift-variance problem is significantly
reduced without increasing the computational cost. The second stage then groups
the set of blobs into a smaller set of regions using a pairwise comparison
function derived from Gestalt theory. To achieve this, the proposed approach
generates a set of new pyramid levels over the previously built pre-segmentation
pyramid. At this stage, we have tested three pairwise comparison functions to
determine whether two nodes must be grouped. The rest of this paper is organized
as follows: Section 2 describes the proposed approach and the three implemented
comparison functions. Experimental results revealing the efficacy of these
functions are described in Section 3. Section 4 concludes the paper with
discussions and future work.
value given by the averaged CIELab color of the image pixels linked to x.
Besides, each regular node has an associated boolean value hx: the
homogeneity [5]. Only regular nodes with hx equal to 1 are considered part of
the regular structure; regular nodes with a homogeneity value equal to 0 are not
considered for further processing. At the base level of the hierarchy, G0, all
nodes are regular and have hx equal to 1. In order to divide the image into a
set of homogeneously colored blobs, the graph Gl is transformed into Gl+1 using
a pairwise comparison of neighboring nodes [6]. At the pre-segmentation stage,
the pairwise comparison function g(vx1, vx2) is true if the Euclidean distance
between the CIELab values vx1 and vx2 is below a user-defined threshold Uv.
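The comparison function g can be sketched directly from this definition. The following is a minimal illustration, not the authors' implementation; the function and parameter names (g, Uv) follow the text, and the example color values are invented:

```python
import math

def g(v1, v2, Uv):
    """Pairwise comparison at the pre-segmentation stage: True if the
    Euclidean distance between two CIELab color vectors is below the
    user-defined threshold Uv."""
    return math.dist(v1, v2) < Uv

# Two perceptually close CIELab colors under threshold Uv = 5.0:
print(g((52.0, 10.0, -3.0), (50.0, 11.0, -2.0), 5.0))  # True
```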
As mentioned above, the decimation algorithm proposed to build the BIP
in [4,5] has been modified to increase the degree of mixture between the regular
and irregular parts of the BIP data structure. The new decimation algorithm
runs two consecutive steps to obtain the set of nodes Nl+1 from Nl. The first
step generates the set of regular nodes of Gl+1 from the regular nodes at Gl,
and the second determines the set of irregular nodes at level l+1. Contrary to
previously proposed algorithms [4,5], this second step employs a union-find
process conducted simultaneously over the regular and irregular nodes of Gl
which do not have a parent at the upper level l+1. The decimation process
consists of the following steps:
1. Regular decimation process. The hx value of a regular node x at level l+1
is set to 1 if the four regular nodes immediately underneath, {yi}, are
pairwise similar and their h{yi} values are equal to 1. That is, hx is set to 1 if
{ ⋂_{∀yj,yk ∈ {yi}} g(vyj, vyk) } ∩ { ⋂_{yj ∈ {yi}} hyj }    (1)
Besides, at this step, inter-level arcs between regular nodes at levels l and l+1
are established: if x is a homogeneous regular node at level l+1 (hx = 1),
then the set of four nodes immediately underneath, {yi}, are linked to x.
2. Irregular decimation process. Each irregular or regular node x ∈ Nl without
a parent at level l+1 chooses its closest neighbor y according to the vx value.
Besides, this node y must be similar to x; that is, y must satisfy
{||vx − vy || = min(||vx − vz || : z ∈ ξx )} ∩ {g(vx , vy )} (2)
If this condition is not satisfied by any node, then a new node is generated
at level l+1. This node will be the parent node of x; it will constitute a
root node, and the set of nodes linked to it at the base level will be a
homogeneous set of pixels according to the defined criteria. On the other
hand, if y exists and has a parent z at level l+1, then x is also linked to
z. If y exists but does not have a parent at level l+1, a new irregular node
z is generated at level l+1, and the nodes x and y are linked to z.
This process is performed sequentially and, when it finishes, each node of
Gl is linked to its parent node in Gl+1; that is, a partition of Nl is defined.
It must be noted that this process constitutes an implementation of the
union-find strategy [5].
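The two-step decimation above can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' code: nodes are plain dictionary keys, `similar` plays the role of g with threshold Uv, and the container and function names are hypothetical.

```python
import math

def similar(v1, v2, Uv=5.0):
    # g(vx, vy): true if the CIELab Euclidean distance is below Uv
    return math.dist(v1, v2) < Uv

def regular_decimation(child_values, child_h, Uv=5.0):
    """Step 1 (Eq. 1): a regular node at level l+1 gets h = 1 iff the four
    regular nodes immediately underneath are pairwise similar and all
    have h = 1."""
    if not all(child_h):
        return 0
    pairs_ok = all(similar(child_values[j], child_values[k], Uv)
                   for j in range(len(child_values))
                   for k in range(j + 1, len(child_values)))
    return int(pairs_ok)

def irregular_decimation(values, neighbours, Uv=5.0):
    """Step 2 (Eq. 2), in union-find style: every parentless node joins the
    parent of its closest similar neighbour, or founds a new parent node."""
    parent = {}
    for x in values:
        if x in parent:              # x was already absorbed by a neighbour
            continue
        cands = [z for z in neighbours.get(x, ())
                 if similar(values[x], values[z], Uv)]
        if not cands:
            parent[x] = x            # no similar neighbour: x becomes a root
            continue
        y = min(cands, key=lambda z: math.dist(values[x], values[z]))
        parent[x] = parent.get(y, y) # share y's parent, creating it if needed
        parent.setdefault(y, parent[x])
    return parent

# Tiny example: 'a' and 'b' are similar and share a parent; 'c' stays alone.
vals = {'a': (50.0, 0.0, 0.0), 'b': (51.0, 0.0, 0.0), 'c': (80.0, 0.0, 0.0)}
nbrs = {'a': ['b', 'c'], 'b': ['a', 'c'], 'c': ['a', 'b']}
print(irregular_decimation(vals, nbrs))
```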
Table 1. Shift-variance values for different decimation processes. Average values
have been obtained from 30 color images from the Waterloo and Coil 100 databases
(all images have been resized to 128×128 pixels).

        MIS [7]  D3P [8]  MIES [9]  BIP [5]  Modified BIP
SVmin    39.9     31.8     23.7      25.6       19.5
SVave    59.8     49.1     44.1      73.8       43.7
SVmax   101.1     75.3     77.2     145.0       73.2
the last level of the pre–segmentation pyramid (lm ) linked to a node is named its
pre–segmentation receptive field, then Ext(yi , yj ) is defined as the smallest color
difference between two neighbor nodes xi ∈ Nlm and xj ∈ Nlm which belong
to the pre–segmentation receptive fields of yi and yj , respectively. P Int(·, ·) is
defined as
PInt(yi, yj) = min(Int(yi) + τ(yi), Int(yj) + τ(yj))    (6)
Int(n) being the internal contrast of the node n. This contrast measure is
defined as the largest color difference between n and the nodes belonging to
the pre-segmentation receptive field of n. The threshold function τ controls
the degree to which the external variation can be larger than the internal
variations and still have the nodes be considered similar. In this work, we
have used the function proposed in [10], τ = α/|n|, where |n| is the number of
pixels of the input image linked to n.
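The resulting merge test (group yi and yj when the external difference does not exceed PInt) can be written down directly. This is a sketch of the predicate only; the function names and the example numbers are invented, and Ext/Int are assumed to be precomputed as defined above:

```python
def tau(region_size, alpha=15000.0):
    # Threshold function of [10]: larger regions tolerate less extra variation.
    return alpha / region_size

def pint(int_i, size_i, int_j, size_j, alpha=15000.0):
    """PInt(yi, yj) from Eq. (6): the smaller of the two internal contrasts,
    each relaxed by the size-dependent threshold tau."""
    return min(int_i + tau(size_i, alpha), int_j + tau(size_j, alpha))

def should_group(ext_ij, int_i, size_i, int_j, size_j, alpha=15000.0):
    # Group the two nodes when the external variation does not exceed PInt.
    return ext_ij <= pint(int_i, size_i, int_j, size_j, alpha)

# Two large, low-contrast regions separated by a small color step:
print(should_group(8.0, 5.0, 4000, 6.0, 5000))  # True
```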
Energy Functions (EF). In Luo and Guo's proposal [11], a set of energy
functions was used to characterize desired single-region properties and pairwise
region properties. The single-region properties include region area, region
convexity, region compactness, and the color variance within a region. The
pairwise properties include the color mean difference between two regions, the
edge strength along the shared boundary, the color variance of the
cross-boundary area, and the contour continuity between two regions.
372 R. Marfil and A. Bandera
With the aim of finding the lowest-energy groupings, Huart and Bertolino [12]
proposed to employ these energies to measure the cost of any region or group of
regions. In a similar way, we have defined a pairwise comparison function to
evaluate whether two nodes can be grouped. Two energies are defined:
3 Experimental Results
In order to evaluate the performance of the perceptual segmentation framework
and the three described comparison functions, the BSDB has been
employed1 [14]. In this dataset, the methodology for evaluating the performance
of segmentation techniques is based on comparing machine-detected boundaries
with human-marked boundaries using the Precision-Recall framework [13]. This
technique considers two quality measures, precision and recall. The F-measure
is defined as the harmonic mean of these measures, combining them into a
single measure.
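The harmonic-mean combination is a one-liner; the following is a generic sketch of the F-measure, not the BSDB benchmark code:

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

print(f_measure(0.75, 0.65))  # ≈ 0.696
```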
Fig. 2 shows the partitions at the highest level of the hierarchy for five
different images when the three variants of the proposed framework are used,
with the optimal training parameters. It can be noted that the proposed
criteria are able to group perceptually important regions in spite of the
large intensity variability present in several areas of the input images.
Fig. 2 shows that the F-measure associated with the individual results ranges
from poor to significantly good values. In any case, the ERAI comparison
function allows the user to set thresholds that partition the input image into
a smaller number of perceptually coherent regions than the other two functions.
If the thresholds employed by the IDEC and EF functions are set to provide a
similar number of regions to the ERAI function, undesirable groupings are
obtained. However, it must also be noted that the EF comparison function is
more global than the others, and it could be extended to evaluate whether more
than two pyramid nodes must be linked, thus taking the pyramid level as a
whole. The main problem of the proposed approaches is their inability to deal
with textured regions which are defined at high natural scales; thus, the
zebras or tigers in Fig. 2 are divided into a set of different regions. The
maximal F-measure obtained over the whole test set is 0.66 for the EF
comparison function, 0.65 for the IDEC function, and 0.70 for the ERAI
function (see Fig. 3).
1 http://www.cs.berkeley.edu/projects/vision/grouping/segbench/
Fig. 2. a) Original images; and b) obtained regions after the perceptual grouping for
the three implemented comparison functions (ERAI: Uv = 5.0, Up = 50.0; IDEC:
Uv = 5.0, α = 15000; EF: Uv = 5.0, Uf = 1.0)
Specifically, the F-measure value obtained when the ERAI comparison function
is employed is equal to that obtained by gPb [16] and greater than the values
provided by other methods such as UCM [3] (0.67), BCTG [13] (0.65), BEL [15]
(0.66), or min-cover [17] (0.65). Besides, the main advantage of the proposed
segmentation framework is that it provides these results at a relatively low
computational cost. Using an Intel Core Duo T7100 PC with 1 GB of DDR2 memory,
the processing times associated with the pre-segmentation stage are typically
less than 250 ms, while the perceptual grouping stage takes less than 150 ms
for any image in the test set. Therefore, the total processing time of the
perceptual segmentation framework is less than 400 ms for any image in the
test set. These processing times are similar to those obtained when the IDEC
comparison function is employed. However, if the EF comparison function is
used, the approach is almost 50 times slower.
Fig. 3. Performance of the proposed framework using the BSDB protocol (see text)
Acknowledgments
This work has been partially supported by the Spanish Junta de Andalucía,
under projects P07-TIC-03106 and P06-TIC-02123, and by the Spanish Ministerio
de Ciencia y Tecnología (MCYT) and FEDER funds under project no.
TIN2005-01359.
References
1. Robles-Kelly, A., Hancock, E.R.: An Expectation–maximisation Framework for
Segmentation and Grouping. Image and Vision Computing 20, 725–738 (2002)
2. Wertheimer, M.: Über Gestaltheorie. Philosophische Zeitschrift für Forschung und
Aussprache 1, 30–60 (1925)