
Lecture Notes in Computer Science 5534

Commenced Publication in 1973


Founding and Former Series Editors:
Gerhard Goos, Juris Hartmanis, and Jan van Leeuwen

Editorial Board
David Hutchison
Lancaster University, UK
Takeo Kanade
Carnegie Mellon University, Pittsburgh, PA, USA
Josef Kittler
University of Surrey, Guildford, UK
Jon M. Kleinberg
Cornell University, Ithaca, NY, USA
Alfred Kobsa
University of California, Irvine, CA, USA
Friedemann Mattern
ETH Zurich, Switzerland
John C. Mitchell
Stanford University, CA, USA
Moni Naor
Weizmann Institute of Science, Rehovot, Israel
Oscar Nierstrasz
University of Bern, Switzerland
C. Pandu Rangan
Indian Institute of Technology, Madras, India
Bernhard Steffen
University of Dortmund, Germany
Madhu Sudan
Massachusetts Institute of Technology, MA, USA
Demetri Terzopoulos
University of California, Los Angeles, CA, USA
Doug Tygar
University of California, Berkeley, CA, USA
Gerhard Weikum
Max-Planck Institute of Computer Science, Saarbruecken, Germany
Andrea Torsello Francisco Escolano
Luc Brun (Eds.)

Graph-Based
Representations in
Pattern Recognition
7th IAPR-TC-15 International Workshop, GbRPR 2009
Venice, Italy, May 26-28, 2009
Proceedings

Volume Editors

Andrea Torsello
Department of Computer Science
“Ca’ Foscari” University of Venice
Venice, Italy
E-mail: torsello@dsi.unive.it
Francisco Escolano
Department of Computer Science
and Artificial Intelligence
Alicante University
Alicante, Spain
E-mail: sco@dccia.ua.es
Luc Brun
GreyC
University of Caen
Caen Cedex, France
E-mail: luc.brun@greyc.ensicaen.fr

Library of Congress Control Number: Applied for

CR Subject Classification (1998): I.5, I.3, I.4, I.2.10, G.2.2

LNCS Sublibrary: SL 6 – Image Processing, Computer Vision, Pattern Recognition, and Graphics

ISSN 0302-9743
ISBN-10 3-642-02123-9 Springer Berlin Heidelberg New York
ISBN-13 978-3-642-02123-7 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
springer.com
© Springer-Verlag Berlin Heidelberg 2009
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper SPIN: 12682966 06/3180 543210
Preface

This volume contains the papers presented at the 7th IAPR-TC-15 Workshop
on Graph-Based Representations in Pattern Recognition – GbR 2009. The work-
shop was held in Venice, Italy, on May 26–28, 2009. The previous work-
shops in the series were held in Lyon, France (1997), Haindorf, Austria (1999),
Ischia, Italy (2001), York, UK (2003), Poitiers, France (2005), and Alicante,
Spain (2007).
The Technical Committee (TC15, http://www.greyc.ensicaen.fr/iapr-tc15/)
of the IAPR (International Association for Pattern Recognition) was founded in
order to federate and to encourage research work at the intersection of pattern
recognition and graph theory. Among its activities, the TC15 encourages the
organization of special graph sessions in many computer vision conferences and
organizes the biennial GbR Workshop.
The scientific focus of these workshops covers research in pattern recognition
and image analysis within the graph theory framework. This workshop series
traditionally provides a forum for presenting and discussing research results and
applications in the intersection of pattern recognition, image analysis and graph
theory.
The papers in the workshop cover the use of graphs at all levels of represen-
tation, from low-level image segmentation to high-level human behavior. There
are papers on formalizing the use of graphs for representing and recognizing data
ranging from visual shape to music, papers focusing on the development of new
and efficient approaches to matching graphs, on the use of graphs for super-
vised and unsupervised classification, on learning the structure of sets of graphs,
and on the use of graph pyramids and combinatorial maps to provide suitable
coarse-to-fine representations. Encouragingly, the workshop saw the convergence
of ideas from several fields, from spectral graph theory, to machine learning, to
graphics.
The papers presented in the proceedings have been reviewed by at least two
members of the Program Committee, and each paper received an average of three
reviews, with more critical papers receiving as many as five reviews. We sincerely
thank all the members of the Program Committee and all the additional referees
for their effort and invaluable help. We received 47 papers from 18 countries and
5 continents. The Program Committee selected 18 of them for oral presentation
and 19 as posters. The resulting 37 papers revised by the authors are published
in this volume.

March 2009 Andrea Torsello


Francisco Escolano
Luc Brun
Organization

General Chairs
Andrea Torsello Università Ca’ Foscari di Venezia, Italy
Francisco Escolano Universidad de Alicante, Spain
Luc Brun GREYC ENSICAEN, France

Program Committee
I. Bloch TELECOM ParisTech, France
H. Bunke University of Bern, Switzerland
S. Dickinson University of Toronto, Ontario, Canada
M. Figueiredo Instituto Superior Técnico, Portugal
E. R. Hancock University of York, UK
C. de la Higuera University of Saint-Etienne, France
J.-M. Jolion Universite de Lyon, France
W. G. Kropatsch Vienna University of Technology, Austria
M. Pelillo Università Ca’ Foscari di Venezia, Italy
A. Robles-Kelly National ICT Australia (NICTA), Australia
A. Shokoufandeh Drexel University, PA, USA
S. Todorovic Oregon State University, OR, USA
M. Vento Università di Salerno, Italy
R. Zabih Cornell University, NY, USA

Organizing Committee
S. Rota Bulò Università Ca’ Foscari di Venezia, Italy
A. Albarelli Università Ca’ Foscari di Venezia, Italy
E. Rodolà Università Ca’ Foscari di Venezia, Italy

Additional Referees
Andrea Albarelli Daniela Giorgi Emanuele Rodolà
Xiang Bai Michael Jamieson Samuel Rota Bulò
Sebastien Bougleux Jean-Christophe Janodet Émilie Samuel
Gustavo Carneiro Rolf Lakemper Cristian Sminchisescu
Fatih Demirci Miguel Angel Lozano
Aykut Erdem James Maclean
Sébastien Fourey Anand Rangarajan

Sponsoring Institutions
Dipartimento di Informatica
Università Ca’ Foscari di Venezia, Italy
Table of Contents

Graph-Based Representation and Recognition


Matching Hierarchies of Deformable Shapes . . . . . . . . . . . . . . . . . . . . . . . . . 1
Nadia Payet and Sinisa Todorovic
Edition within a Graph Kernel Framework for Shape Recognition . . . . . . 11
François-Xavier Dupé and Luc Brun
Coarse-to-Fine Matching of Shapes Using Disconnected Skeletons by
Learning Class-Specific Boundary Deformations . . . . . . . . . . . . . . . . . . . . . . 21
Aykut Erdem and Sibel Tari
An Optimisation-Based Approach to Mesh Smoothing: Reformulation
and Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
Yskandar Hamam and Michel Couprie
Graph-Based Representation of Symbolic Musical Data . . . . . . . . . . . . . . . 42
Bassam Mokbel, Alexander Hasenfuss, and Barbara Hammer
Graph-Based Analysis of Nasopharyngeal Carcinoma with Bayesian
Network Learning Methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
Alex Aussem, Sergio Rodrigues de Morais, Marilys Corbex, and
Joël Favrel
Computing and Visualizing a Graph-Based Decomposition for
Non-manifold Shapes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
Leila De Floriani, Daniele Panozzo, and Annie Hui
A Graph Based Data Model for Graphics Interpretation . . . . . . . . . . . . . . 72
Endre Katona
Tracking Objects beyond Rigid Motion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
Nicole Artner, Adrian Ion, and Walter G. Kropatsch
Graph-Based Registration of Partial Images of City Maps Using
Geometric Hashing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
Steffen Wachenfeld, Klaus Broelemann, Xiaoyi Jiang, and
Antonio Krüger

Graph Matching
A Polynomial Algorithm for Submap Isomorphism: Application to
Searching Patterns in Images . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
Guillaume Damiand, Colin de la Higuera, Jean-Christophe Janodet,
Émilie Samuel, and Christine Solnon

A Recursive Embedding Approach to Median Graph Computation . . . . . 113


M. Ferrer, D. Karatzas, E. Valveny, and H. Bunke

Efficient Suboptimal Graph Isomorphism . . . . . . . . . . . . . . . . . . . . . . . . . . . 124


Kaspar Riesen, Stefan Fankhauser, Horst Bunke, and
Peter Dickinson

Homeomorphic Alignment of Edge-Weighted Trees . . . . . . . . . . . . . . . . . . . 134


Benjamin Raynal, Michel Couprie, and Venceslas Biri

Inexact Matching of Large and Sparse Graphs Using Laplacian


Eigenvectors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
David Knossow, Avinash Sharma, Diana Mateus, and Radu Horaud

Graph Matching Based on Node Signatures . . . . . . . . . . . . . . . . . . . . . . . . . 154


Salim Jouili and Salvatore Tabbone

A Structural and Semantic Probabilistic Model for Matching and


Representing a Set of Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
Albert Solé-Ribalta and Francesc Serratosa

Arc-Consistency Checking with Bilevel Constraints: An Optimization . . . 174


Aline Deruyver and Yann Hodé

Graph Clustering and Classification

Pairwise Similarity Propagation Based Graph Clustering for Scalable


Object Indexing and Retrieval . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
Shengping Xia and Edwin R. Hancock

A Learning Algorithm for the Optimum-Path Forest Classifier . . . . . . . . . 195


João Paulo Papa and Alexandre Xavier Falcão

Improving Graph Classification by Isomap . . . . . . . . . . . . . . . . . . . . . . . . . . 205


Kaspar Riesen, Volkmar Frinken, and Horst Bunke

On Computing Canonical Subsets of Graph-Based Behavioral


Representations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 215
Walter C. Mankowski, Peter Bogunovich, Ali Shokoufandeh, and
Dario D. Salvucci

Object Detection by Keygraph Classification . . . . . . . . . . . . . . . . . . . . . . . . 223


Marcelo Hashimoto and Roberto M. Cesar Jr.

Graph Regularisation Using Gaussian Curvature . . . . . . . . . . . . . . . . . . . . . 233


Hewayda ElGhawalby and Edwin R. Hancock

Characteristic Polynomial Analysis on Matrix Representations of


Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243
Peng Ren, Richard C. Wilson, and Edwin R. Hancock

Flow Complexity: Fast Polytopal Graph Complexity and 3D Object


Clustering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 253
Francisco Escolano, Daniela Giorgi, Edwin R. Hancock,
Miguel A. Lozano, and Bianca Falcidieno

Pyramids, Combinatorial Maps, and Homologies

Irregular Graph Pyramids and Representative Cocycles of Cohomology


Generators . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 263
Rocio Gonzalez-Diaz, Adrian Ion, Mabel Iglesias-Ham, and
Walter G. Kropatsch

Annotated Contraction Kernels for Interactive Image Segmentation . . . . 273


Hans Meine

3D Topological Map Extraction from Oriented Boundary Graph . . . . . . . 283


Fabien Baldacci, Achille Braquelaire, and Guillaume Damiand

An Irregular Pyramid for Multi-scale Analysis of Objects and Their


Parts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 293
Martin Drauschke

A First Step toward Combinatorial Pyramids in n-D Spaces . . . . . . . . . . . 304


Sébastien Fourey and Luc Brun

Cell AT-Models for Digital Volumes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 314


Pedro Real and Helena Molina-Abril

From Random to Hierarchical Data through an Irregular Pyramidal


Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 324
Rimon Elias, Mohab Al Ashraf, and Omar Aly

Graph-Based Segmentation

Electric Field Theory Motivated Graph Construction for Optimal


Medical Image Segmentation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 334
Yin Yin, Qi Song, and Milan Sonka

Texture Segmentation by Contractive Decomposition and Planar


Grouping . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 343
Anders Bjorholm Dahl, Peter Bogunovich, and Ali Shokoufandeh

Image Segmentation Using Graph Representations and Local


Appearance and Shape Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 353
Johannes Keustermans, Dieter Seghers, Wouter Mollemans,
Dirk Vandermeulen, and Paul Suetens

Comparison of Perceptual Grouping Criteria within an Integrated


Hierarchical Framework . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 366
R. Marfil and A. Bandera

Author Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 377


Matching Hierarchies of Deformable Shapes

Nadia Payet and Sinisa Todorovic

Oregon State University, Corvallis, OR 97331, USA


payetn@onid.orst.edu, sinisa@eecs.oregonstate.edu

Abstract. This paper presents an approach to matching parts of deformable


shapes. Multiscale salient parts of the two shapes are first identified. Then, these
parts are matched if their immediate properties are similar, the same holds recur-
sively for their subparts, and the same holds for their neighbor parts. The shapes
are represented by hierarchical attributed graphs whose node attributes encode the
photometric and geometric properties of corresponding parts, and edge attributes
capture the strength of neighbor and part-of interactions between the parts. Their
matching is formulated as finding the subgraph isomorphism that minimizes a
quadratic cost. The dimensionality of the matching space is dramatically reduced
by convexifying the cost. Experimental evaluation on the benchmark MPEG-7
and Brown datasets demonstrates that the proposed approach is robust.

1 Introduction
This paper is about shape matching by using: (1) a new hierarchical shape representa-
tion, and (2) a new quadratic-assignment objective function that is efficiently optimized
via convexification. Many psychophysical studies suggest that shape perception is the
major route for acquiring knowledge about the visual world [1]. However, while hu-
mans are very efficient in recognizing shapes, this proves a challenging task for com-
puter vision. This is mainly due to certain limitations in existing shape representations
and matching criteria used, which typically cannot adequately address matching of de-
formable shapes. Two perceptually similar deformable shapes may have certain parts
that are very different or even missing, while other parts are very similar. Therefore, ac-
counting for shape parts in matching is important. However, it is not always clear how to
define a shape part. The motivation behind the work described in this paper is to improve
robustness of shape matching by using a rich hierarchical shape representation that will
provide access to all shape parts existing at all scales, and by formulating a matching
criterion that will account for these shape parts and their hierarchical properties.
We address the following problem: Given two shapes find correspondences between
all their parts that are similar in terms of photometric, geometric, and structural prop-
erties, the same holds recursively for their subparts, and the same holds for their neigh-
bor parts. To this end, a shape is represented by a hierarchical attributed graph whose
node attributes encode the intrinsic properties of corresponding multiscale shape parts
(e.g., intensity gradient, length, orientation), and edge attributes capture the strength of
neighbor and part-of interactions between the parts. We formulate shape matching as
finding the subgraph isomorphism that preserves the original graph connectivity and
minimizes a quadratic cost whose linear and quadratic terms account for differences

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 1–10, 2009.

© Springer-Verlag Berlin Heidelberg 2009

between node and edge attributes, respectively. The cost is defined so as to be invariant
to scale changes and in-plane rotation of the shapes. The search in the matching space
of all shape-part pairs is accelerated by convexifying the quadratic cost, which also re-
duces the chances to get trapped in a local minimum. As our experiments demonstrate,
the proposed approach is robust against large variations of individual shape parts and
partial occlusion.
In the rest of this paper, Sec. 2 points out main contributions of our approach with re-
spect to prior work, Sec. 3 describes our hierarchical representation of a shape, Sec. 4.1
specifies node and edge compatibilities and formulates our matching algorithm, Sec. 4.2
explains how to convexify and solve the quadratic program, and Sec. 5 presents experi-
mental evaluation of our approach.

2 Our Contributions and Relationships to Prior Work

This section reviews prior work and points out our main contributions. Hierarchical
shape representations are aimed at efficiently capturing both global and local properties
of shapes, and thus facilitating their matching. Shortcomings of existing representations
typically reduce the efficiency of matching algorithms. For example, the arc-tree [2,3]
trades off its accuracy and stability for lower complexity, since it is a binary tree, gener-
ated by recursively splitting the curve in two halves. Arc-trees are different for similar
shapes with some part variations, which will be hard to match. Another example is the
curvature scale-space [4,5] that loses its descriptive power by pre-specifying the degree
of image decimation (i.e., blurring and subsampling), while capturing salient curvature
points of a contour at different degrees of smoothing. Also, building the articulation-
invariant, part-based signatures of deformable shapes, presented in [6], is sensitive to
the correct identification of the shape’s landmark points and to the multidimensional
scaling and estimating of the shortest path between these points. Other hierarchical
shape descriptions include the Markov-tree graphical models [7], and the hierarchy of
polygons [8] that are based on the restrictive assumptions about the number, size, and
hierarchy depth of parts that a curve consists of. The aforementioned methods encode
only geometric properties of shape parts, and their part-of relationships, yielding a strict
tree. In contrast, we use a more general, hierarchical graph that encodes the strength of
all ascendant-descendant and neighbor relationships between shape parts, as well as
their geometric and photometric properties. The sensitivity of the graph structure to
small shape variations is reduced, since we estimate the shape’s salient points at multi-
ple scales. Also, unlike in prior work, the number of nodes, depth, and branching factor
in different parts of the hierarchical graph are data dependent.
Graph-based shape matching has been the focus of sustained research activity for more
than three decades. Graph matching may be performed by: (i) exploiting spectral prop-
erties of the graphs’ adjacency matrices [9,10]; (ii) minimizing the graph edit-distance
[11,12]; (iii) finding a maximum clique of the association graph [13]; (iv) using the
expectation-maximization of a statistical, generative model [14]. Regardless of a particu-
lar formulation, graph matching in general can be cast as a quadratic assignment problem,
where a linear term in the objective function encodes node compatibility functions, and
a quadratic term encodes edge compatibility functions. Therefore, approaches to graph

matching mainly focus on: (i) finding suitable definitions of the compatibility functions;
and (ii) developing efficient algorithms for approximately solving the quadratic assign-
ment problem (since it is NP-hard), including a suitable reformulation of the quadratic
into linear assignment problem. However, most popular approximation algorithms (e.g.,
relaxation labeling, and loopy belief propagation) critically depend on a good initial-
ization and may be easily trapped in a local minimum, while some (e.g., deterministic
annealing schemes) can be used only for graphs with a small number of nodes. Graduated
nonconvexity schemes [15], and successive convexification methods [16] have been used
to convexify the objective function of graph matching, and thus alleviate these problems.
Since it is difficult to convexify matching cost surfaces that are not explicit functions,
these methods resort to restrictive assumptions about the functional form of a matching
cost, or reformulate the quadratic objective function into a linear program. In this pa-
per, we develop a convexification scheme that shrinks the pool of matching candidates
for each individual node in the shape hierarchy, and thus renders the objective function
amenable to solution by a convex quadratic solver.

3 Hierarchical Shape Representation

In this paper, a shape (also called contour or curve) is represented by a hierarchical
graph. We first detect the contour’s salient points at multiple scales, which in turn define
the corresponding shape parts. Then, we derive a hierarchy of these shape parts, as
illustrated in Fig. 1.
Multiscale part detection. A data-driven number of salient (or dominant) points along
the contour are detected using the scale-invariant algorithm of [17]. This algorithm does
not require any input parameters, and remains reliable even when the shape is rich in
both fine and coarse details, unlike most existing approaches. The algorithm first de-
termines, for each point along the curve, its curvature and the region of support, which
jointly serve as a measure of the point’s relative significance. Then, the dominant points
are detected by the standard nonmaximum suppression. Each pair of subsequent domi-
nant points along the shape define the corresponding shape part. The end points of each
shape part define a straight line that is taken to approximate the part. We recursively
apply the algorithm of [17] to each shape part whose associated line segment has a

Fig. 1. An example contour: (left) Lines approximating the detected contour parts are marked
with different colors. (right) The shape parts are organized in a hierarchical graph that encodes
their part-of and neighbor relationships. Only a few ascendant-descendant and neighbor edges
are depicted for clarity.

larger approximation error than a pre-set threshold. This threshold controls the resolu-
tion level (i.e., scale) at which we seek to represent the contour’s fine details. How to
compute this approximation error is explained later in this section. After the desired
resolution level is reached, the shape parts obtained at different scales can be organized
in a tree structure, where nodes and parent-child (directed) edges represent the shape
parts and their part-of relationships. The number of nodes, depth, and branching factors
of each node of this tree are all automatically determined by the shape at hand.
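The recursive construction above can be sketched in Python. The dominant-point detector of [17] and the line-approximation error measure (defined later in this section) are abstracted as callables, since the paper specifies only their roles; all names are our own:

```python
def build_part_tree(curve, err_threshold, detect_dominant, approx_error):
    """Recursively split `curve` (a sequence of points) into shape parts.

    `detect_dominant` returns indices of dominant points along a part;
    `approx_error` returns the error of approximating a part by the
    straight line through its end points.
    """
    node = {"part": curve, "children": []}
    if approx_error(curve) <= err_threshold:
        return node  # fine enough at the desired resolution: leaf part
    idx = detect_dominant(curve)
    # Each pair of consecutive dominant points bounds one subpart.
    for i, j in zip(idx[:-1], idx[1:]):
        node["children"].append(
            build_part_tree(curve[i:j + 1], err_threshold,
                            detect_dominant, approx_error))
    return node
```

Note that the depth and branching factor of the resulting tree are entirely data dependent, as in the paper: recursion stops wherever the line approximation is already good enough.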
Transitive closure. Small, perceptually negligible shape variations (e.g., due to varying
illumination in images) may lead to undesired, large structural changes in the shape
tree (e.g., causing a tree node to split into multiple descendants at multiple levels).
As in [18], we address these potential structural changes of the shape tree by adding
new directed edges that connect every node with all of its descendants, resulting in a
transitive closure of the tree. Later, in matching, the transitive closures will allow that
a search for a maximally matching node pair is conducted over all descendants under a
visited ancestor node pair, rather than stopping the search if the ancestors’ children do
not match. This, in turn, will make matching more robust.
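A minimal sketch of this closure step, assuming the shape tree is stored as a parent-to-children adjacency map (representation and names are ours):

```python
def transitive_closure(children):
    """Return, for every node of a tree, the set of all its descendants.

    `children` maps a node to its list of children. The returned sets give
    the end points of the new directed ancestor-descendant edges that turn
    the shape tree into its transitive closure.
    """
    closure = {}

    def descendants(v):
        if v not in closure:
            ds = set()
            for c in children.get(v, []):
                ds.add(c)
                ds |= descendants(c)  # a tree has no cycles, so this terminates
            closure[v] = ds
        return closure[v]

    for v in children:
        descendants(v)
    return closure
```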
Neighbors. Like other strictly hierarchical representations, the transitive closure of the
shape tree is capable of encoding only a limited description of spatial-layout properties
of the shape parts. For example, it cannot distinguish different layouts of the same set
of parts along the shape. In the literature, this problem has usually been addressed by
associating a context descriptor with each part. In this paper, we instead augment the
transitive closure with new, undirected edges, capturing the neighbor relationships be-
tween parts. This transforms the transitive closure of the shape tree into a more general
graph that we call the shape hierarchy.
Node Attributes. Both nodes and edges of the shape hierarchy are attributed. Node
attributes are vectors whose elements describe photometric and geometric properties of
the corresponding shape part. The following estimates help us define the shape proper-
ties. We estimate the contour’s mean intensity gradient, and use this vector to identify
the contour’s direction – namely, the sequence of points along the shape – by the right-
hand rule. The principal axis of the entire contour is estimated as the principal axis of
an ellipse fitted to all points of the shape. The attribute vector of a node (i.e., shape
part) includes the following properties: (1) length as a percentage of the parent length;
(2) angle between the principal axes of this shape part and its parent; (3) approxima-
tion error estimated as the total area between the shape part and its associated straight
line, expressed as a percentage of the area of the fitted ellipse; (4) signed approximation
error is similar to the approximation error except that the total area between the shape
part and its approximating straight line is computed by accounting for the sign of the
intensity gradient along the shape; and (5) curvature at the two end points of the shape
part. All the properties are normalized to be in [0, 1].
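Two of these properties can be made concrete with numpy. Note this is an assumption-laden sketch: the principal axis is approximated here by PCA of the part's points rather than the paper's ellipse fit, and the function names are ours:

```python
import numpy as np

def principal_axis(points):
    """Unit vector along the principal axis of a 2-D point set (via PCA,
    a stand-in for the paper's ellipse fit)."""
    pts = np.asarray(points, dtype=float)
    pts -= pts.mean(axis=0)
    _, vecs = np.linalg.eigh(np.cov(pts.T))
    return vecs[:, -1]  # eigenvector of the largest eigenvalue

def node_attributes(part, parent):
    """Two of the five attributes, each normalized to [0, 1]:
    relative length, and angle between the principal axes."""
    def arclen(p):
        p = np.asarray(p, dtype=float)
        return float(np.sum(np.linalg.norm(np.diff(p, axis=0), axis=1)))

    rel_len = arclen(part) / arclen(parent)
    cosang = abs(np.dot(principal_axis(part), principal_axis(parent)))
    angle = np.arccos(np.clip(cosang, 0.0, 1.0)) / (np.pi / 2)  # 0..1
    return np.array([rel_len, angle])
```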
Edge Attributes. The attribute of an edge in the shape hierarchy encodes the strength
of the corresponding part-of or neighbor relationship. Given a directed edge between a
shape part and its descendant part, the attribute of this edge is defined as the percentage
that the length of the descendant makes in the length of the shape part. Thus, the shorter
the descendant or the longer the ancestor, the smaller the strength of their interaction. The attribute

of an undirected edge between two shape parts can be either 1 or 0, where 1 means that
the parts have one common end point, and 0 means that the parts are not neighbors.
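As a sketch (function names ours), the two edge attributes reduce to a length ratio and an end-point test:

```python
def part_of_strength(descendant_len, ancestor_len):
    """Directed-edge attribute: fraction of the ancestor's length
    contributed by the descendant, in (0, 1]."""
    return descendant_len / ancestor_len

def neighbor_attribute(part_a, part_b):
    """Undirected-edge attribute: 1 if the two parts share an end point,
    0 otherwise. Parts are sequences of (x, y) points."""
    ends_a = {tuple(part_a[0]), tuple(part_a[-1])}
    ends_b = {tuple(part_b[0]), tuple(part_b[-1])}
    return 1 if ends_a & ends_b else 0
```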

4 Shape Matching
Given two shapes, our goal is to identify best matching shape parts and discard dis-
similar parts, so that the total cost is minimized. This cost is defined as a function of
geometric, photometric, and structural properties of the matched parts, their subparts,
and their neighbor parts, as explained below.

4.1 Definition of the Objective Function of Matching


Let H = (V, E, ψ, φ) denote the shape hierarchy, where V = {v} and E = {(v, u)} ⊆
V × V are the sets of nodes and edges, and ψ and φ are functions that assign attributes
to nodes, ψ : V → [0, 1]^d, and to edges, φ : E → [0, 1]. Given two shapes, H and H′, the
goal of the matching algorithm is to find the subgraph isomorphism, f : U → U′, where
U ⊆ V and U′ ⊆ V′, which minimizes the cost, C, defined as

C = β Σ_{v∈V} c1(v, f(v)) + (1 − β) Σ_{(v,u)∈E} c2(v, f(v), u, f(u)),   (1)

where c1 is a non-negative cost function of matching nodes v and v′ = f(v), c2
is a non-negative cost function of matching edges (v, u) and (v′, u′), and β ∈ [0, 1]
weights their relative significance to matching. To minimize C, we introduce a vector,
X, indexed by all node pairs (v, v′) ∈ V × V′, whose every element x_{vv′} ∈ [0, 1] encodes
the confidence that pair (v, v′) should be matched. Matching can then be reformulated
as estimating X so that C is minimized. That is, we use the standard linearization and
relaxation of (1) to obtain the following quadratic program (QP):

min_X β A^T X + (1 − β) X^T B X,   (2)
s.t. ∀(v, v′) ∈ V × V′: x_{vv′} ≥ 0;  ∀v′ ∈ V′: Σ_{v∈V} x_{vv′} = 1;  ∀v ∈ V: Σ_{v′∈V′} x_{vv′} = 1,

where A is a vector of the costs a_{vv′} of matching nodes v and v′, and B is a matrix of the costs
b_{vv′uu′} of matching edges (v, u) and (v′, u′). We define a_{vv′} = (1/d) ‖ψ(v) − ψ(v′)‖²,
where d is the dimensionality of the node attribute vector. Also, we define b_{vv′uu′} so
that matching edges of different types is prohibited, and matches between edges of the
same type with similar properties are favored in (2): b_{vv′uu′} = ∞ if edges (v, u) and
(v′, u′) are not of the same type; and b_{vv′uu′} = |φ(v, u) − φ(v′, u′)| ∈ [0, 1] if they are.
The constraints in (2) are typically too restrictive, because of potentially large struc-
tural changes of V or E in H that may be caused by relatively small variations of
certain shape parts. For example, suppose H and H′ represent similar shapes. It may
happen that node v in H corresponds to a subgraph consisting of nodes {v′_1, . . . , v′_m}
in H′, and vice versa. Therefore, a more general many-to-many matching formulation
would be more appropriate for our purposes. The literature reports a number of heuris-
tic approaches to many-to-many matching [19,20,21], which however are developed
only for weighted graphs, and thus cannot be used for our shape hierarchies that have

Fig. 2. Convexification of the costs {a_{vv′}}_{v′∈V′} for each node v ∈ V. Matching candidates
of v that belong to the region of support of the lower convex hull, v′ ∈ V̆′(v), are marked red.

attributes on both nodes and edges. To relax the constraints in (2), we first match H to
H′, which yields solution X1. Then, we match H′ to H, which yields solution X2. The
final solution, X̃, is estimated as the intersection of the non-zero elements of X1 and X2.
Formally, the constraints are relaxed as follows: (i) ∀(v, v′) ∈ V × V′: x_{vv′} ≥ 0; and
(ii) ∀v ∈ V: Σ_{v′∈V′} x_{vv′} = 1 when matching H to H′, and ∀v′ ∈ V′: Σ_{v∈V} x_{vv′} = 1
when matching H′ to H.
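The intersection of the two directional solutions can be sketched as follows. Keeping, for each surviving pair, the smaller of the two confidences is our own choice; the paper specifies only the intersection of the non-zero supports:

```python
import numpy as np

def intersect_solutions(x1, x2, tol=1e-9):
    """Combine X1 (matching H -> H') and X2 (matching H' -> H):
    a candidate pair survives only if it is active in both directions."""
    active = (np.abs(x1) > tol) & (np.abs(x2) > tol)
    return np.where(active, np.minimum(x1, x2), 0.0)
```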

4.2 Convexification of the Objective Function of Matching

The QP in (2) is in general non-convex, and defines a matching space of typically 10^4
possible node pairs in our experiments. In order to efficiently find a solution, we con-
vexify the QP. This significantly reduces the number of matching candidates.
Given H and H  to be matched, for each node v ∈ V of H, we identify those
matching candidates v  ∈ V  of H  that form the region of support of the lower convex
hull of costs {avv }v ∈V  , as illustrated in Fig. 2. Let V̆  (v) ⊂ V  denote this region
of support of the convex hull, and let Ṽ  (v) ⊂ V  denote the set of true matches of
node v that minimize the QP in (2) (i.e., the solution). Then, by definition, we have
that Ṽ  (v) ⊆ V̆  (v), i.e., the true matches must be located in the region of support of
the convex hull. It follows, that for each node v ∈ V , we can discard those matching
candidates from V  that do not belong to V̆  (v). In our experiments, we typically ob-
tain |V̆  (v)| |V  |, which leads to a dramatic reduction of the dimensionality of the
original matching space, |V ×V  |.
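The candidate-pruning step above can be made concrete. The sketch below (Python rather than the authors' MATLAB implementation; the function name and the use of the candidate index as abscissa are our assumptions) computes the region of support of the lower convex hull of the costs with Andrew's monotone-chain construction:

```python
def lower_hull_support(costs):
    """Return indices of candidates lying on the lower convex hull of the
    points (i, costs[i]) -- a sketch of the pruning step of Sec. 4.2."""
    hull = []
    for i, c in enumerate(costs):
        # Pop while the last hull point lies on or above the chord from
        # hull[-2] to the new point (non-left turn => not on the lower hull).
        while len(hull) >= 2:
            (x1, y1), (x2, y2) = hull[-2], hull[-1]
            if (x2 - x1) * (c - y1) - (i - x1) * (y2 - y1) <= 0:
                hull.pop()
            else:
                break
        hull.append((i, c))
    return [i for i, _ in hull]
```

The two extreme candidates always survive the pruning; an interior candidate is kept only if no chord between other kept candidates passes below it.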
In summary, we compute Ă, X̆, B̆ from the original A, X, B, respectively, by deleting
all their elements a_vv', x_vv', b_vv'uu' for which v' ∉ V̆'(v). Then, we use the standard
interior-reflective Newton method to solve the following program:

    min_X̆  β Ă^T X̆ + (1−β) X̆^T B̆ X̆,                                            (3)
    s.t.  ∀(v,v') ∈ V×V̆'(v): x_vv' ≥ 0;   ∀v ∈ V: Σ_{v'∈V̆'(v)} x_vv' = 1.
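For illustration, the reduced program (3) can be solved with any constrained optimizer; the sketch below uses SciPy's SLSQP as a stand-in for the interior-reflective Newton method, with one simplex constraint per node of H. The function name and the array layout (one row of surviving candidates per node, with B restricted accordingly) are our assumptions:

```python
import numpy as np
from scipy.optimize import minimize

def match_qp(A, B, beta):
    """Minimize beta * A^T x + (1-beta) * x^T B x over nonnegative x with
    one unit-sum (simplex) constraint per model node.
    A: (n, m) reduced unary costs for n nodes and m candidates each;
    B: (n*m, n*m) reduced pairwise costs."""
    n, m = A.shape
    a = A.ravel()
    def obj(x):
        return beta * (a @ x) + (1 - beta) * (x @ B @ x)
    # One equality constraint per node: its candidate weights sum to 1.
    cons = [{'type': 'eq',
             'fun': (lambda x, i=i: x[i*m:(i+1)*m].sum() - 1.0)}
            for i in range(n)]
    res = minimize(obj, np.full(n * m, 1.0 / m),
                   bounds=[(0.0, None)] * (n * m),
                   constraints=cons, method='SLSQP')
    return res.x.reshape(n, m)
```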


5 Results

This section presents the experimental evaluation of our approach on the standard
MPEG-7 and Brown shape datasets [12]. MPEG-7 has 1400 silhouette images show-
ing 70 different object classes, with 20 images per object class, as illustrated in Fig. 3.
MPEG-7 presents many challenges due to a large intra-class variability within each
class, and small differences between certain classes. The Brown shape dataset has 11
examples from 9 different object categories, totaling 99 images. This dataset introduces
Matching Hierarchies of Deformable Shapes 7

additional challenges, since many of the shapes have missing parts (e.g., due to occlu-
sion), and the images may contain clutter in addition to the silhouettes, as illustrated
in Figs. 1, 4, 5. We use the standard evaluation on both datasets. For every silhouette
in MPEG-7, we retrieve the 40 best matches, and count the number of those that are
in the same class as the query image. The retrieval rate is defined as the ratio of the
total number of correct hits obtained and the best possible number of correct hits. The
latter number is 1400 · 20. Also, for each shape in the Brown dataset, we first retrieve
the 10 best matches, then, check if they are in the same class as the query shape, and,
finally, compute the retrieval rate, as explained above. Input to our algorithm consists
of two parameters: the fine-resolution level (the approximation error defined in Sec. 3) at
which the contour is represented, and β. For silhouettes in both datasets, with the approximation
error set to 1%, we obtain shape hierarchies with typically 50-100
nodes, maximum hierarchy depths of 5-7, and maximum branching factors of 4-6. For
every query shape, the distances to other shapes are computed as the normalized total
matching cost, D, between the query and these other shapes. If X is the solution of our
quadratic program, then D = [βA^T X + (1−β)X^T BX]/[|V| + |V'|], where |V| is
the total number of nodes in one shape hierarchy. Matching two shape hierarchies takes
about 5-10 s in MATLAB on a 3.1 GHz, 1 GB RAM PC.
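A minimal sketch of the normalized total matching cost (Python rather than the authors' MATLAB; the flat layout of A and X and the function name are our assumptions):

```python
import numpy as np

def matching_distance(A, B, X, n_V, n_Vp, beta):
    """Normalized total matching cost between two shape hierarchies:
    D = (beta * A^T X + (1-beta) * X^T B X) / (|V| + |V'|).
    A, X: flat vectors over node pairs; B: pairwise cost matrix."""
    x = X.ravel()
    return (beta * (A.ravel() @ x) + (1 - beta) * (x @ B @ x)) / (n_V + n_Vp)
```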
Qualitative evaluation. Fig. 3 shows a few examples of our shape retrieval results on
MPEG-7. As the figure shows, our approach makes errors mainly due to the non-optimal
pre-setting of the fine-resolution level at which contours are represented by the shape
hierarchy. Also, some object classes in MPEG-7 are characterized by multiple

Fig. 3. MPEG-7 retrieval results on three query examples and comparison with [6]. For each
query, we show the 11 retrieved shapes, from smallest to highest cost. (top) Results of [6]. (bottom)
Our results. Note that for deer we make the first mistake at the 6th retrieval, and then get confused
with shapes whose parts are very similar to those of deer. Mistakes for other queries usually occur
because our implementation fails to capture fine details of the curves in the shape hierarchy.

Fig. 4. The Brown dataset – each of the four columns shows one example pair of silhouettes, and
each of the two rows shows shape parts at a specific scale that got matched; top row shows finer
scale and bottom row shows coarser scale. As can be seen, silhouettes that belong to the same
class may have large differences; despite the differences, corresponding parts got successfully
matched (each match is marked with unique color).

Fig. 5. The Brown dataset – two example pairs of silhouettes, and their shape parts that got
matched. The shapes belong to different classes, but the algorithm identifies their similar parts, as
expected (each match is marked with a unique color). The normalized total matching cost between
the bunny and gen (left), or the fish and tool (right), is larger than the costs computed for the
examples shown in Fig. 4, since there are fewer similar than dissimilar parts (β = 0.4).

disjoint contours, whereas our approach is aimed at matching only a single contour
at a time. Next, Fig. 4 shows four example pairs of silhouettes from the same class,
and their matched shape parts. Similar shape parts at multiple scales got successfully
matched in all cases, as expected. Fig. 5 presents two example pairs of silhouettes that
belong to different classes. As in the previous case, similar shape parts got successfully
matched; however, since there are fewer similar than dissimilar parts, the normalized
total matching cost in this case is larger. This helps discriminate between shapes
from different classes during retrieval.
Quantitative evaluation. To evaluate the sensitivity of our approach to the input
parameter β, we compute the average retrieval rate on the Brown dataset for
β ∈ {0.1, 0.2, . . . , 0.9}. The maximum retrieval rate of 99% is obtained for β=0.4,
for β ∈ {0.3, 0.5, 0.6} we obtain a rate of 98%, and for other values of β
the retrieval rate gracefully decreases. This suggests that both intrinsic properties of
shape parts and their spatial relations are important for shape matching, and that our

Table 1. Retrieval results on the Brown dataset for β = 0.4

Approaches 1st 2nd 3rd 4th 5th 6th 7th 8th 9th 10th
[12] 99 99 99 98 98 97 96 95 93 82
[6] 99 99 99 98 98 97 97 98 94 79
[3] 99 99 99 99 99 99 99 97 93 86
Our method 99 99 98 98 98 97 96 94 93 82

algorithm is relatively insensitive to small changes of β around 0.4. However, like any
hierarchical approach, ours also seems to be sensitive to the right choice of the finest
resolution at which the shape is represented. As mentioned above, different values of
this input parameter may result in large variations of the number of nodes in the shape
hierarchy, which, in turn, cause changes in computing the normalized total matching
cost. If the right choice is selected separately for each class of MPEG-7, using valida-
tion data, then we obtain the retrieval rate of 88.3%. If this parameter is set to 1%, as
stated above, for all classes, then our performance drops to 84.3%. This is comparable
to the state of the art that achieves the rates of 85.40% in [6] and 87.70% in [3]. Table 1
summarizes our retrieval rates on the Brown dataset after first top 1 to 10 retrievals, for
β = 0.4 and shape-resolution level fixed over all classes. Again, this retrieval improves
if we select a suitable value for the resolution parameter for each class separately, using
validation data.
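The retrieval protocol used above can be sketched as follows, assuming a precomputed pairwise distance matrix; the query counts itself among its retrievals, so the best possible number of correct hits per query equals its class size (1400 · 20 in total for MPEG-7). The helper name is hypothetical:

```python
import numpy as np

def retrieval_rate(dist, labels, k):
    """Ratio of correct hits among each query's k best matches to the
    best possible number of correct hits (min of k and the class size)."""
    dist, labels = np.asarray(dist, dtype=float), np.asarray(labels)
    hits = best = 0
    for q in range(len(labels)):
        nearest = np.argsort(dist[q])[:k]   # the query retrieves itself too
        hits += np.sum(labels[nearest] == labels[q])
        best += min(k, np.sum(labels == labels[q]))
    return hits / best
```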

6 Conclusion

Matching deformable shapes is difficult since shapes may be perceptually similar yet
have certain parts that are very different or even missing. We have presented an approach aimed
at robust matching of deformable shapes by identifying multiscale salient shape parts,
and accounting for their intrinsic properties, and part-of and neighbor relationships.
Experimental evaluation of the proposed hierarchical shape representation and shape
matching via minimizing a quadratic cost has demonstrated that the approach robustly
deals with large variations or missing parts of perceptually similar shapes.

References
1. Biederman, I.: Recent psychophysical and neural research in shape recognition. In: Osaka,
N., Rentschler, I., Biederman, I. (eds.) Object Recognition, Attention, and Action, pp. 71–88.
Springer, Heidelberg (2007)
2. Günther, O., Wong, E.: The arc tree: an approximation scheme to represent arbitrary curved
shapes. Comput. Vision Graph. Image Process. 51(3), 313–337 (1990)
3. Felzenszwalb, P.F., Schwartz, J.D.: Hierarchical matching of deformable shapes. In: CVPR
(2007)
4. Mokhtarian, F., Mackworth, A.K.: A theory of multiscale, curvature-based shape representa-
tion for planar curves. IEEE TPAMI 14(8), 789–805 (1992)
5. Ueda, N., Suzuki, S.: Learning visual models from shape contours using multiscale con-
vex/concave structure matching. IEEE TPAMI 15(4), 337–352 (1993)

6. Ling, H., Jacobs, D.: Shape classification using the inner-distance. IEEE TPAMI 29(2), 286–
299 (2007)
7. Fan, X., Qi, C., Liang, D., Huang, H.: Probabilistic contour extraction using hierarchical
shape representation. In: ICCV, pp. 302–308 (2005)
8. Mcneill, G., Vijayakumar, S.: Hierarchical procrustes matching for shape retrieval. In: CVPR
(2006)
9. Siddiqi, K., Shokoufandeh, A., Dickinson, S.J., Zucker, S.W.: Shock graphs and shape match-
ing. Int. J. Comput. Vision 35(1), 13–32 (1999)
10. Shokoufandeh, A., Macrini, D., Dickinson, S., Siddiqi, K., Zucker, S.W.: Indexing hierarchi-
cal structures using graph spectra. IEEE TPAMI 27(7), 1125–1140 (2005)
11. Bunke, H., Allermann, G.: Inexact graph matching for structural pattern recognition. Pattern
Rec. Letters 1(4), 245–253 (1983)
12. Sebastian, T.B., Klein, P.N., Kimia, B.B.: Recognition of shapes by editing their shock
graphs. IEEE Trans. Pattern Anal. Machine Intell. 26(5), 550–571 (2004)
13. Pelillo, M., Siddiqi, K., Zucker, S.W.: Matching hierarchical structures using association
graphs. IEEE TPAMI 21(11), 1105–1120 (1999)
14. Tu, Z., Yuille, A.: Shape matching and recognition - using generative models and informative
features. In: Pajdla, T., Matas, J(G.) (eds.) ECCV 2004. LNCS, vol. 3023, pp. 195–209.
Springer, Heidelberg (2004)
15. Gold, S., Rangarajan, A.: A graduated assignment algorithm for graph matching. IEEE
TPAMI 18(4), 377–388 (1996)
16. Jiang, H., Drew, M.S., Li, Z.N.: Matching by linear programming and successive convexifi-
cation. IEEE TPAMI 29(6), 959–975 (2007)
17. Teh, C.H., Chin, R.T.: On the detection of dominant points on digital curves. IEEE Trans.
Pattern Anal. Mach. Intell. 11(8), 859–872 (1989)
18. Torsello, A., Hancock, E.R.: Computing approximate tree edit distance using relaxation la-
beling. Pattern Recogn. Lett. 24(8), 1089–1097 (2003)
19. Pelillo, M., Siddiqi, K., Zucker, S.W.: Many-to-many matching of attributed trees using as-
sociation graphs and game dynamics. In: Arcelli, C., Cordella, L.P., Sanniti di Baja, G. (eds.)
IWVF 2001. LNCS, vol. 2059, pp. 583–593. Springer, Heidelberg (2001)
20. Demirci, M.F., Shokoufandeh, A., Keselman, Y., Bretzner, L., Dickinson, S.J.: Object recog-
nition as many-to-many feature matching. Int. J. Computer Vision 69(2), 203–222 (2006)
21. Todorovic, S., Ahuja, N.: Region-based hierarchical image matching. Int. J. of Computer
Vision 78(1), 47–66 (2008)
Edition within a Graph Kernel Framework for
Shape Recognition

François-Xavier Dupé and Luc Brun

GREYC UMR CNRS 6072,


ENSICAEN-Université de Caen Basse-Normandie,
14050 Caen France
{francois-xavier.dupe,luc.brun}@greyc.ensicaen.fr

Abstract. A large family of shape comparison methods is based on a
medial axis transform combined with an encoding of the skeleton by a
graph. Despite its many qualities, this encoding of shapes suffers from the
non-continuity of the medial axis transform. In this paper, we propose
to integrate robustness against structural noise inside a graph kernel.
This robustness is based on a selection of the paths according to their
relevance and on path editions. This kernel is positive semi-definite and
several experiments prove the efficiency of our approach compared to
alternative kernels.

Keywords: Shape, Skeleton, Support Vector Machine, Graph Kernel.

1 Introduction

The skeleton is a key feature within the shape recognition framework [1,2,3].
Indeed, this representation has many properties: it is a thin set, homotopic to
the shape, and invariant under Euclidean transformations. Moreover, any shape
can be reconstructed from the maximal circles of its skeleton points.
The set of points composing a skeleton does not highlight the structure of a
shape. Consequently, the recognition step is usually based on a graph compari-
son where graphs encode the main properties of the skeletons. Several encoding
systems have been proposed: Di Ruberto [4] proposes a direct translation of the
skeleton to the graph using many attributes. Siddiqi [5] proposes a graph which
characterises both structural properties of a skeleton and the positive, negative
or null slopes of the radius of the maximal circles along a branch. Finally this
last encoding has been improved and extended to 3D by Leymarie and Kimia [6].
The recognition of shapes using graph comparisons may be tackled using
various methods. A first family of methods is based on the graph edit distance
which is defined as the minimal number of operations required to transform
the graph encoding the first shape into the graph encoding the second one [2,3].
Another method, introduced by Pelillo [1], transforms graphs into trees and then

This work is performed in close collaboration with the laboratory Cycéron and is
supported by the CNRS and the région Basse-Normandie.

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 11–20, 2009.

© Springer-Verlag Berlin Heidelberg 2009

models the tree matching problem as a maximal clique problem within a specific
association graph. A last method, proposed by Bai and Latecki [7], compares
paths between end-nodes (nodes with only one neighbor) after a matching task on
the end-nodes. Contrary to previously mentioned approaches this last method
can deal with loops and may thus characterize holed shapes.
All the above methods operate in the graph space, which contains almost no
mathematical structure. This forbids many common mathematical tools: for instance, the
mean graph of a set has to be replaced by its median. A solution consists in projecting
graphs into a richer space. Graph kernels provide such an embedding: by
using appropriate kernels, graphs can be mapped either explicitly or implicitly
into a vector space whose dot product corresponds to the kernel function.
The most famous graph kernels are the random walk kernel, the marginalized
graph kernel and the geometric kernel [8]. A last family of kernels is based on
the notion of bag of paths [9]. These methods describe each graph by a subset of
its paths, the similarity between two graphs being deduced from the similarities
between their paths. Path similarity is based on a comparison between the edges
and nodes attributes of both paths.
However, skeletonization is not a continuous process, and small perturbations
of a shape may produce ligatures and spurious branches. Graph kernels may
in this case lead to inaccurate comparisons. Neuhaus and Bunke have proposed
several kernels (e.g. [10]) based on the graph edit distance in order to reduce
the influence of graph perturbations. However, the graph edit distance does not
usually fulfill all the properties of a metric, and the design of a positive-definite
kernel from such a distance is not straightforward. Our approach is slightly
different. Indeed, instead of considering a direct edit distance between graphs,
our kernel is based on a rewriting process applied on the bags of paths of two
graphs. The path rewriting follows the same basic idea as the string edit
distance but provides a positive-definite kernel between paths.
This paper follows a first contribution [11] where we introduced the notion
of path rewriting within the graph kernel framework. It is structured as follows:
first, we recall how to construct a bag of path kernel [9,11] (Section 2). Then, we
propose a graph structure (Section 3) which encodes both the structure of the
skeleton and its major characteristics. This graph contains a sufficient amount
of information for shape reconstruction. We then extend the edition operations
(Section 4) by taking into account all the attributes and by controlling the effect
of the edition on them. Finally, we present experiments (Section 5) in order to
highlight the benefit of the edition process.

2 Bag of Path Kernel

Let us consider a graph G = (V, E) where V denotes the set of vertices and
E ⊂ V × V the set of edges. A bag of paths P associated to G is defined as a
set of paths of G whose cardinality is denoted by |P |. Let us denote by Kpath
a generic path kernel. Given two graphs G1 and G2 and two paths h1 ∈ P1
and h2 ∈ P2 of respectively G1 and G2 , Kpath (h1 , h2 ) may be interpreted as

a measure of similarity between h1 and h2. The aim of a bag of path kernel
is to aggregate all these local measures between pairs of paths into a global
similarity measure between the two graphs. Such a kernel differs from random
walk kernels, where all the paths of the two graphs are compared.

2.1 Change Detection Kernel

Desobry [12] proposed a general approach for the comparison of two sets which
has straightforward applications in the design of a bag of path kernel (bags
are sets). The two bags are modelled as the observations of two sets of random
variables in a feature space.
Desobry proposes to estimate a distance between the two distributions without
explicitly building the pdf of the two sets. The considered feature space is based
on a normalised kernel: K(h, h') = K_path(h, h') / √(K_path(h, h) K_path(h', h')). Using
such a kernel we have ‖h‖²_K = K(h, h) = 1 for any path. The image in the feature
space of our set of paths thus lies on a hypersphere of radius 1 centered at the
origin (Fig. 1(a)). Using the one-class ν-SVM, we associate a set of paths to a
region on this sphere. This region corresponds to the density support estimate
of the set of paths' unknown pdf.
Once the two density supports are estimated, the one-class SVM yields w1
(resp. w2 ), the mean vector, and ρ1 (resp. ρ2 ), the ordinate at the origin, for the
first bag (resp. the second bag). In order to compare the two mean vectors w1
and w2, we define the following distance function:

    d_mean(w1, w2) = arccos( w1^t K_{1,2} w2 / (‖w1‖ ‖w2‖) ),                    (1)

where K_{1,2}(i, j) = K(h_i, h_j), h_i ∈ P1, h_j ∈ P2, and w1^t K_{1,2} w2 is the scalar
product between w1 and w2. This distance corresponds to the angle α between
the two mean vectors w1 and w2 of each region (Fig. 1(a)). Then we define the
kernel between two bags of paths P1 and P2 as the product of 1) a Gaussian RBF
kernel associated to d_mean(w1, w2) and 2) a Gaussian RBF kernel associated to

Fig. 1. (a) Separating two sets on the unit sphere using the one-class SVM. The symbols
(w1, ρ1) and (w2, ρ2) denote the parameters of the two hyperplanes, which are represented
by dashed lines. (b)-(d) Influence of small perturbations on the skeleton (in black):
(b) original shape, (c) spurious branch, (d) ligature.

the difference between the two ordinates at the origin (ρ1 and ρ2):

    K_change(P1, P2) = exp( −d²_mean(w1, w2) / 2σ²_mean ) · exp( −(ρ1 − ρ2)² / 2σ²_origin ).   (2)

Finally, we define the kernel between two graphs G1 , G2 as the kernel between
their two bags of path: Kchange (G1 , G2 ) = Kchange (P1 , P2 ).
The distance between the mean vectors is a metric based on a normalized
scalar product combined with arccos, which is bijective on [0, 1]. However, since the
relationship between the couple (w, ρ) and the bag of paths is not bijective,
the final kernel between bags is only positive semi-definite [13]. Nevertheless, in all
the experiments run so far, the Gram matrices associated to the bags of paths
were positive-definite.
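A sketch of this construction using scikit-learn's one-class ν-SVM on a precomputed, normalised path Gram matrix. Representing each mean vector w by the dual coefficients of its support vectors, and the sign convention ρ = −intercept, are our assumptions:

```python
import numpy as np
from sklearn.svm import OneClassSVM

def fit_bag(K):
    """One-class nu-SVM on a precomputed (normalised) path Gram matrix.
    Returns (support indices, dual coefficients, rho)."""
    clf = OneClassSVM(kernel='precomputed', nu=0.9).fit(K)
    return clf.support_, clf.dual_coef_.ravel(), -float(clf.intercept_[0])

def k_change(K1, K2, K12, s_mean=1.0, s_origin=20.0):
    """Bag-of-paths kernel (2): product of an RBF kernel on the angle
    between the two mean vectors and an RBF kernel on |rho1 - rho2|."""
    sv1, a1, rho1 = fit_bag(K1)
    sv2, a2, rho2 = fit_bag(K2)
    dot = a1 @ K12[np.ix_(sv1, sv2)] @ a2
    n1 = np.sqrt(a1 @ K1[np.ix_(sv1, sv1)] @ a1)
    n2 = np.sqrt(a2 @ K2[np.ix_(sv2, sv2)] @ a2)
    d = np.arccos(np.clip(dot / (n1 * n2), -1.0, 1.0))
    return (np.exp(-d**2 / (2 * s_mean**2))
            * np.exp(-(rho1 - rho2)**2 / (2 * s_origin**2)))
```

Comparing a bag with itself gives an angle of zero and identical offsets, hence a kernel value of 1.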

2.2 Path Kernel


The above bag of path kernel is based on a generic path kernel K_path. A kernel
between two paths h = (v1, . . . , vn) and h' = (v'1, . . . , v'p) is classically [14] built
by considering each path as a sequence of nodes and a sequence of edges. This
kernel, denoted K_classic, is defined as 0 if the two paths do not have the same size,
and as follows otherwise:

    K_classic(h, h') = K_v(ϕ(v1), ϕ(v'1)) · Π_{i=2}^{|h|} K_e(ψ(e_{v_{i−1}v_i}), ψ(e_{v'_{i−1}v'_i})) K_v(ϕ(v_i), ϕ(v'_i)),   (3)

where ϕ(v) and ψ(e) denote respectively the vectors of features associated to the
node v and the edge e. The terms K_v and K_e denote two kernels for respectively
node and edge features. For the sake of flexibility and simplicity, we use
Gaussian RBF kernels based on the distance between the attributes defined in
Section 3.2.
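For scalar node and edge attributes, kernel (3) can be sketched directly; the representation of a path as a pair (node attributes, edge attributes) and the function names are our assumptions:

```python
import numpy as np

def rbf(a, b, sigma):
    """Gaussian RBF kernel on the absolute difference of two attributes."""
    return np.exp(-abs(a - b)**2 / (2 * sigma**2))

def k_classic(h1, h2, sigma_v=0.2, sigma_e=0.1):
    """Kernel (3) between two paths given as (node_attrs, edge_attrs):
    zero if the lengths differ, otherwise a product of node and edge
    kernels along the paths."""
    v1, e1 = h1
    v2, e2 = h2
    if len(v1) != len(v2):
        return 0.0
    k = rbf(v1[0], v2[0], sigma_v)
    for i in range(1, len(v1)):
        k *= rbf(e1[i-1], e2[i-1], sigma_e) * rbf(v1[i], v2[i], sigma_v)
    return k
```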

3 Skeleton-Based Graph
3.1 Graph Representations
Medial-axis based skeletons are built upon a distance function whose evolution
along the skeleton is generally modeled as a continuous function. This function
presents important changes of slope mainly located at the transitions between
two parts of the shape. Based on this remark, Siddiqi and Kimia distinguish three
kinds of branches within the shock graph construction scheme [2]: branches with
positive, null or negative slopes. Nodes corresponding to these slope transitions
are inserted within the graph. Such nodes may thus have a degree 2. Finally,
edges are directed using the slope sign information.
Compared to the shock graph representation, we do not use oriented edges
since small positive or negative values of the slope may change the orientation
of an edge and thus alter the graph representation. On the other hand our set
of nodes corresponds to junction points and to any point encoding an important

Fig. 2. (a) Slope detection: a change of slope. (b) Edition effect on the shape (path in gray).

change of slope of the radius function. Such a significant change may encode a
change from a positive to a negative slope but also an important change of slope
with a same sign (Fig. 2(a)). Encoding these changes improves the detection of
the different parts of the shape. The main difficulty remains the detection of the
slope changes due to the discrete nature of the data. The slopes are obtained
using regression methods based on first order splines [15]. These methods are
robust to discrete noise and first order splines lead to a continuous representation
of the data. Moreover, such methods intrinsically select the most significant
slopes using a stochastic criterion. Nodes encoding slope transitions are thus
located at the junctions (or knot) between first order splines.

3.2 Attributes

The graph associated to a shape only provides information about its structural
properties. Additional geometrical properties of the shape may be encoded using
node and edge attributes. From a structural point of view, a node represents a
particular point inside the shape skeleton, and an edge a branch. However, a
branch also represents the set of points of the shape which are closer to this
branch than to any other branch. This set of points is defined as the influence zone
of the branch and can be computed using SKIZ transforms [16].
Descriptors computed from the influence zone are called local, whilst the ones
computed from the whole shape are called global. In [3] Goh introduces this
notion and points out that an equilibrium between local and global descriptors is
crucial for the efficiency of a shape matching algorithm. Indeed local descriptors
provide a robustness against occlusions, while global ones provide a robustness
against noise.
We have thus selected a set of attributes which provides an equilibrium be-
tween local and global features. Torsello in [17] proposes as edge attribute an
approximation of the perimeter of the boundary which contributes to the forma-
tion of the edge, normalized by the approximated perimeter of the whole shape.
Suard proposes [9] as node attribute the distance between the node position and
the gravity center of the shape divided by the square of the shape area. These
two attributes correspond to our global descriptors.

Goh proposes several local descriptors [3] for edges, based on the evolution
of the radius of the maximal circle along a branch. For each point (x(t), y(t))
of a branch, t ∈ [0, 1], we consider the radius R(t) of its maximal circle. In order
to normalize the data, the radius is divided by the square root of the area
of the influence zone of the branch. We also introduce α(t), the angle formed
by the tangent vector at (x(t), y(t)) and the x-axis. Then we consider (a_k)_{k∈N}
and (b_k)_{k∈N}, the coefficients of two regression polynomials that fit respectively
R(t) and α(t) in the least-squares sense. If both polynomials are of sufficient
R(t) and α(t) in the least square sense. If both polynomials are of sufficient
orders, the skeleton can be reconstructed from the graph and so the  shape
(Section
 1). Following Goh [3], our two local descriptors are defined by: k ak /k
and k bk /k.
The distance associated to each attribute is defined as the absolute value of
the difference between the values a and b of the attribute: d(a, b) = |a − b|. As the
attributes are normalized, the distances are invariant to changes of scale and rotation.
Such distances are used to define the Gaussian RBF kernels, exp(−d²(·,·)/2σ²),
used to design K_path (Section 2.2).

4 Hierarchical Kernels

The biggest issue with skeleton-based graph representations is the non-negligible
effect of small perturbations on the shape [2]: Fig. 1 shows two deformations of
the skeleton of a circle (Fig. 1(b)), one induced by a small bump (Fig. 1(c)) and
one by an elongation (Fig. 1(d)). On complex shapes, severe modifications of the
graphs may occur and lead to inaccurate comparisons.
From a structural point of view, perturbations like bumps (Fig. 1(c)) create
new nodes and edges. In contrast, the principal effect of an elongation (Fig. 1(d))
is either the addition of an edge inside the graph or the extension of an exist-
ing edge. So shape noise mainly induces two effects on paths: addition of nodes
(Fig. 1(c)) and addition of edges (Fig. 1(d)). This leads to two edition
operations: node suppression and edge contraction. Note that, as the compared
structures are paths, the relevance of these operations should be evaluated
according to the path under study.

4.1 Elementary Operations on Path

The node suppression operation removes a node from the path, together with all the graph
structures that are connected to the path by this node. Within the path, the
two edges incident to the node are then merged. This operation corresponds
to the removal of a part of the shape: for example, if we remove node 2 in
Fig. 2(b1), a new shape similar to Fig. 2(b2) is obtained.
The edge contraction operation contracts an edge and merges its two extremity
nodes. This results in a contraction of the shape: for example, if we contract
the edge e1,2 of the shape in Fig. 2(b1) then the new shape will be similar to
Fig. 2(b3).

Since each operation is interpreted as a shape transformation, the global
descriptors must be updated. From this point of view, our method may be considered
as a combination of the methods of Sebastian [2] and Goh [3], who respectively
use local descriptors with edit operations, and both local and global
descriptors without edit operations.

4.2 Edition Cost

In order to select the appropriate operation, an edition cost is associated to each
operation. Let us consider an attribute weight associated to each edge of the
graph, which encodes the relevance of its associated branch. We suppose that
this attribute is additive: the weight of two consecutive edges along a path is the
sum of both weights.
Note that we consider the maximal spanning tree T of the graph G. As
skeletonization is a homotopic transform, a shape with no hole yields T = G.
Let us consider a path h = (v1 , . . . , vn ) within T . Now, an edition cost is assigned
to both operations within h:

– Let us consider a node vi, i ∈ {2, . . . , n − 1}, of the path h (extremity nodes
are not considered). The cost of the node suppression operation on vi must
reflect two of its properties: 1) the importance of the sub-trees of T connected
to the path by vi, and 2) the importance of the slope changes (Section 3.1)
between the two branches respectively encoded by the edges e_{v_{i−1}v_i} and
e_{v_i v_{i+1}}.
The relevance of a sub-tree is represented by its total weight: for each
neighbor v of vi, v ∉ h, we compute the weight W(v), defined as the sum
of the weight of the tree rooted at v in T \ {e_{vi v}} and the weight of e_{vi v}. This
tree is unique since T is a tree. The weight of the node vi is then defined as
the sum of the weights W(v) for all neighbors v of vi (v ∉ h), and is denoted by
ω(vi).
We encode the relevance of a slope change by the angle β(vi) formed by
the slope vectors associated to e_{v_{i−1}v_i} and e_{v_i v_{i+1}}. A high value of β(vi)
encodes a severe change of slopes, and conversely. Since slopes are approximated
using first-order polynomials (Section 3.1), the angle β(vi) is given by

    β(vi) = arccos( (1 + a1·a'1) / (√(1 + a1²) √(1 + a'1²)) ),

where a1 and a'1 are the first-order coefficients of the regression polynomials.
Finally, the edition cost of the suppression of a node is defined as
(1 − γ)ω(vi) + γβ(vi)/π, where γ is a tuning variable.
– The cost of the edge contraction operation encodes the importance of the
edge inside T; this is the purpose of the weight. The edition cost of
contracting an edge is thus defined as its weight.

Concerning the weight, any additive measure encoding the relevance of a skeleton
branch may be used. We choose the normalized perimeter as computed
by Torsello [17], because of its resistance to noise on the shape boundary.
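The two edition costs can be sketched as follows; the function names and passing the precomputed sub-tree weight ω(vi) as a scalar are our assumptions:

```python
import math

def slope_angle(a1, a1p):
    """Angle beta(vi) between the slope vectors (1, a1) and (1, a1p) of the
    two incident branches (first-order regression coefficients)."""
    return math.acos((1 + a1 * a1p) /
                     (math.sqrt(1 + a1**2) * math.sqrt(1 + a1p**2)))

def suppression_cost(subtree_weight, a1, a1p, gamma=0.5):
    """Node suppression cost (1 - gamma) * omega(vi) + gamma * beta(vi) / pi,
    blending the weight of the attached sub-trees with the slope change."""
    return (1 - gamma) * subtree_weight + gamma * slope_angle(a1, a1p) / math.pi

def contraction_cost(edge_weight):
    """Edge contraction cost: the (additive) weight of the edge."""
    return edge_weight
```

Identical slopes give β = 0, so suppressing such a node is charged only for the sub-trees it carries.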

4.3 Edition Path Kernel


Let us denote by κ the function which applies the cheapest operation on a path,
and by D the maximal number of reductions. The successive applications of the
function κ associate to each path h a sequence of reduced paths (h, κ(h), . . . , κ^D(h)).
Each κ^k(h) is associated to a cost, cost_k(h), defined as the sum of the
costs of the k operations yielding κ^k(h) from h. Using K_classic for the path
comparison, we define the kernel K_edit as a sum of kernels between reduced
paths. Given two paths h and h', the kernel K_edit(h, h') is defined as:

    K_edit(h, h') = 1/(D+1) · Σ_{k=0}^{D} Σ_{l=0}^{D} exp( −(cost_k(h) + cost_l(h')) / 2σ²_cost ) K_classic(κ^k(h), κ^l(h')),   (4)

where σ_cost is a tuning variable. This kernel is composed of two parts: a scalar
product of the edition costs in a particular space, and a path kernel. For a small
value of σ_cost, the behavior of the kernel will be close to K_classic, as only low
edition costs will contribute to K_edit(h, h'). Conversely, for a high value, every
edition will contribute to K_edit(h, h') with approximately equal importance.
The kernel Kclassic is a tensor product kernel based on positive-definite kernels
(Section 2.2), so it is positive-definite. The kernel over edition costs is constructed
from a scalar product and is thus positive-definite. These two last kernels form
a tensor product kernel. Finally, K_edit is proportional (by a factor D + 1) to an
R-convolution kernel [18, Lemma 1]; thus this kernel is positive-definite.
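A direct transcription of (4), assuming the reduction sequences (h, κ(h), . . . , κ^D(h)) and their cumulated costs have been precomputed; the function signature is our assumption:

```python
import math

def k_edit(h1, costs1, h2, costs2, k_path, sigma_cost=0.1):
    """Kernel (4): sum of path-kernel values between the successive
    reductions of two paths, weighted by the cumulated edition costs.
    h1/h2: sequences (h, kappa(h), ..., kappa^D(h));
    costs1/costs2: cumulated costs, with cost_0 = 0;
    k_path: kernel between two (possibly reduced) paths."""
    D = len(h1) - 1
    total = 0.0
    for k in range(D + 1):
        for l in range(D + 1):
            w = math.exp(-(costs1[k] + costs2[l]) / (2 * sigma_cost**2))
            total += w * k_path(h1[k], h2[l])
    return total / (D + 1)
```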

5 Experiments
For the following experiments, we define the importance of a path as the sum of
the weights of its edges. For each graph, we first consider all its paths composed
of at most 7 nodes and sort them by importance in descending order. The bag
of paths is then constructed using the first 5 percent of the sorted paths. For all
the experiments, the tuning variable of the deformation cost, γ (Section 4.2), is
set to 0.5.
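The bag-of-paths construction described above can be sketched as a depth-first enumeration of simple paths; the adjacency-list and edge-weight representations are our assumptions:

```python
def bag_of_paths(adj, weight, max_len=7, keep=0.05):
    """Enumerate simple paths of at most `max_len` nodes, sort them by
    total edge weight (descending) and keep the top fraction.
    adj: node -> list of neighbours; weight: (u, v) -> edge weight."""
    paths = []
    def extend(path, w):
        if len(path) > 1:
            paths.append((w, tuple(path)))
        if len(path) == max_len:
            return
        for v in adj[path[-1]]:
            if v not in path:                      # simple paths only
                extend(path + [v], w + weight[(path[-1], v)])
    for u in adj:
        extend([u], 0.0)
    paths.sort(key=lambda p: -p[0])
    n_keep = max(1, int(keep * len(paths)))
    return [p for _, p in paths[:n_keep]]
```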
The first experiment consists in an indexation of the shapes using the distances
induced by the kernels, i.e. d(G, G')² = k(G, G) + k(G', G') − 2k(G, G'), where k is
a graph kernel. The different σ of the attribute RBF kernels involved in K_classic
(Section 3.2) are fixed as follows: σperimeter = σradius = σorientation = 0.1 and
σgravity center = 0.2. Note that Kclassic constitutes the basis of all the kernels
defined below. The parameters of Kchange are set to: σmean = 1.0, σorigin = 20
and ν = 0.9. The maximal number of editions is fixed to 6. Let us consider the
class tool from the LEMS database [19] of 99 shapes with 11 elements per class.
Two kinds of robustness are considered: robustness against ligatures and per-
turbations, and robustness against erroneous slope nodes. Ligatured skeletons of
the shapes are created by varying the threshold parameter ζ of the skeletoniza-
tion algorithm [17]: high values lead to ligatured skeletons, while low values tend
to remove relevant branches. Skeletons with erroneous slope nodes are created
by varying the parameter of our slope detection algorithm. This detection is
based on the BIC criterion, which uses the standard error of the noise σBIC.
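The kernel-induced distance used for the indexing experiment can be sketched directly. The function name and the clamping of rounding noise are ours; note that the expression in the text is the squared distance between the images of G and G' in the kernel's feature space, so taking its square root yields the usual induced metric without changing the ranking of shapes.

```python
import math

def kernel_distance(k, g1, g2):
    """Distance induced by a positive-definite graph kernel k:
    d(G, G')^2 = k(G, G) + k(G', G') - 2 k(G, G')."""
    sq = k(g1, g1) + k(g2, g2) - 2.0 * k(g1, g2)
    return math.sqrt(max(0.0, sq))  # clamp tiny negatives from rounding
```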
Edition within a Graph Kernel Framework for Shape Recognition 19

(a) Slope changes (b) Ligatures (c) ROC curves

Fig. 3. Resistance to spurious slope changes (a) and spurious branches (b). For (a) and
(b) the kernels are, from top to bottom: Kchange,edit2, Kchange,edit1, the random
walk kernel, and Kchange,classic. (c) ROC curves for the classification of dogs
and cats using Kchange,edit, the random walk kernel and Kchange,classic.

A small value of σBIC makes the criterion sensitive to small changes of slope
and gives many slope nodes, while a high value makes the criterion insensitive
to slope changes. Four kernels are compared: the random walk kernel [8], Kchange
with Kclassic (denoted Kchange,classic), and two kernels using Kchange with Kedit
(with σcost = 0.1 for Kchange,edit1 and σcost = 0.2 for Kchange,edit2). Using the
distances induced by the kernels, shapes are sorted in ascending order according
to their distance to the perturbed tool. Fig. 3(a) shows the mean number of tools
inside the first 11 sorted shapes for an increasing value of σBIC . Fig. 3(b) shows
the same number but for a decreasing threshold value ζ. The two edition kernels
show a good resistance to perturbations and ligatures as they get almost all the
tools for each query. Their performances slightly decrease when shapes become
strongly distorted. The kernel Kchange,classic gives the worst results as the re-
duction of the bag of paths leads to paths of different lengths which cannot be
compared with Kclassic (Section 2.2). The random walk kernel is robust against
slight perturbations of the shapes but cannot deal with severe distortion.
In the second experiment, we stress the kernels by separating 49 dogs from 49 cats
using a ν-SVM. The three kernels considered are Kchange,classic, Kchange,edit
(with σcost = 0.5) and the random walk kernel. The σ values of the attribute RBF
kernels (Section 3.2) are fixed as follows: σperimeter = σradius = σorientation = 0.1
and σgravity center = 0.5. The parameters of Kchange are set to σmean = 5.0,
σorigin = 20 and ν = 0.9. We compute the ROC curves produced by the kernels using
10-fold cross-validation. Fig. 3(c) presents the three ROC curves. The random
walk kernel gives correct results, whilst the Kchange,classic kernel confirms its
poor performance. The Kchange,edit kernel shows the best performance, with a
behaviour similar to that of the random walk kernel. Furthermore, on our computer,
a Core 2 Duo at 2 GHz, computing the 98×98 Gram matrix takes approximately
23 minutes for Kchange,edit and 2.5 hours for the random walk kernel.
20 F.-X. Dupé and L. Brun

6 Conclusion
We have defined in this paper a positive-definite kernel for shape classification
which is robust to perturbations. Our bag of paths contains the most important
paths of a shape below a given length, in order to capture only the main infor-
mation about the shape. Only the Kedit kernel provides enough flexibility for
path comparison and gives better results than the classical random walk kernel.
In the near future, we would like to improve the selection of paths. An extension
of the edition process to graphs is also planned.

References
1. Pelillo, M., Siddiqi, K., Zucker, S.: Matching hierarchical structures using associa-
tion graphs. IEEE Trans. on PAMI 21(11), 1105–1120 (1999)
2. Sebastian, T., Klein, P., Kimia, B.: Recognition of shapes by editing their shock
graphs. IEEE Trans. on PAMI 26(5), 550–571 (2004)
3. Goh, W.B.: Strategies for shape matching using skeletons. Computer Vision and
Image Understanding 110, 326–345 (2008)
4. Ruberto, C.D.: Recognition of shapes by attributed skeletal graphs. Pattern Recog-
nition 37(1), 21–31 (2004)
5. Siddiqi, K., Shokoufandeh, A., Dickinson, S.J., Zucker, S.W.: Shock graphs and
shape matching. Int. J. Comput. Vision 35(1), 13–32 (1999)
6. Leymarie, F.F., Kimia, B.B.: The shock scaffold for representing 3d shape. In:
Arcelli, C., Cordella, L.P., Sanniti di Baja, G. (eds.) IWVF 2001. LNCS, vol. 2059,
pp. 216–229. Springer, Heidelberg (2001)
7. Bai, X., Latecki, L.J.: Path Similarity Skeleton Graph Matching. IEEE PAMI 30(7)
(2008)
8. Vishwanathan, S., Borgwardt, K.M., Kondor, I.R., Schraudolph, N.N.: Graph ker-
nels. Journal of Machine Learning Research 9, 1–37 (2008)
9. Suard, F., Rakotomamonjy, A., Bensrhair, A.: Mining shock graphs with kernels.
Technical report, LITIS (2006),
http://hal.archives-ouvertes.fr/hal-00121988/en/
10. Neuhaus, M., Bunke, H.: Edit-distance based kernel for structural pattern classifi-
cation. Pattern Recognition 39, 1852–1863 (2006)
11. Dupé, F.X., Brun, L.: Hierarchical bag of paths for kernel based shape classification.
In: SSPR 2008, pp. 227–236 (2008)
12. Desobry, F., Davy, M., Doncarli, C.: An online kernel change detection algorithm.
IEEE Transaction on Signal Processing 53(8), 2961–2974 (2005)
13. Berg, C., Christensen, J.P.R., Ressel, P.: Harmonic Analysis on Semigroups.
Springer, Heidelberg (1984)
14. Kashima, H., Tsuda, K., Inokuchi, A.: Marginalized kernel between labeled graphs.
In: Proc. of the Twentieth International conference on machine Learning (2003)
15. DiMatteo, I., Genovese, C., Kass, R.: Bayesian curve fitting with free-knot splines.
Biometrika 88, 1055–1071 (2001)
16. Meyer, F.: Topographic distance and watershed lines. Signal Proc. 38(1) (1994)
17. Torsello, A., Hancock, E.R.: A skeletal measure of 2d shape similarity. CVIU 95,
1–29 (2004)
18. Haussler, D.: Convolution kernels on discrete structures. Technical report, Depart-
ment of Computer Science, University of California at Santa Cruz (1999)
19. LEMS: shapes databases, http://www.lems.brown.edu/vision/software/
Coarse-to-Fine Matching of Shapes Using
Disconnected Skeletons by Learning
Class-Specific Boundary Deformations

Aykut Erdem¹ and Sibel Tari²

¹ Dipartimento di Informatica, Università Ca' Foscari di Venezia
Via Torino 155, Mestre, Venezia, 30172, Italy
erdem@dsi.unive.it
² Department of Computer Engineering, Middle East Technical University
Inonu Bulvari, 06531, Ankara, Turkey
stari@metu.edu.tr

Abstract. The disconnected skeleton [1] is a very coarse yet very stable
skeleton-based representation scheme for generic shape recognition, in
which recognition is performed mainly based on the structure of the discon-
nection points of the extracted branches, without explicitly using information
about boundary details [2,3]. However, sensitivity to boundary details may
sometimes be required in order to achieve the goal of recognition. In
this study, we first present a simple way to enrich disconnected skeletons
with radius functions. Next, we attempt to resolve the conflicting goals
of stability and sensitivity by proposing a coarse-to-fine shape match-
ing algorithm. As a first step, two shapes are matched based on the
structure of their disconnected skeletons; following that, the com-
puted matching cost is re-evaluated by taking into account the similarity
of boundary details, in the light of class-specific boundary deformations
learned from a given set of examples.

1 Introduction
There is a long history of research in computer vision on representing generic
shape, since shape information is a very strong visual cue for recognizing and
classifying objects. A generic shape representation should be insensitive not
only to geometric similarity transformations (i.e. translation, rotation, and scaling)
but also to visual transformations such as occlusion, deformation and articulation
of parts. Since their introduction by Blum [4], representations based on local
symmetry axes (commonly referred to as shape skeletons) have attracted, and
still attract, many researchers in the field, and have become a superior alternative to
boundary-based shape representations. These representation schemes naturally
capture part structure by modeling any given shape via a set of axial curves, each
of which explicitly represents some part of the shape. Once the relations among the
extracted shape primitives, i.e. the skeleton branches, are expressed in terms of
a graph or a tree data structure (e.g. [5,6,7]), the resulting shape descriptions are
insensitive to articulations and occlusions.

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 21–30, 2009.

© Springer-Verlag Berlin Heidelberg 2009

A challenging issue regarding skeleton-based representations is the so-called
instability of skeletons [8]. These representations are very sensitive to noise
and/or small details on the shape boundary, and hence two visually very sim-
ilar shapes might have structurally different skeleton descriptions. Thus, the
success of any skeletonization method depends on how robust the final skeleton
descriptions are in the presence of noise and shape features such as protrusions,
indentations, necks, and concavities. As one might expect, this instability is-
sue can also be passed on to the recognition framework, but in this case the
recognition algorithm should be devised in such a way that it includes a mech-
anism to handle possible structural changes (e.g. [5,9,10,11,12,13,14]). A line of
studies that focuses on solving the instability issue early, at the representation
level, investigates the abstraction of skeleton graphs. This includes the methods
which seek a simplified graphical representation where the level of hierarchy
is reduced to a certain extent (e.g. [5,7,15,16]), the studies which try to come up
with an abstract representation from a set of example skeletons (e.g. [5,17,18]),
and more general graph spectral approaches (e.g. [19,20]).
The method proposed in [1] is conceptually different from the other approaches
in the sense that the aim is to obtain the coarsest yet most stable skeleton
representations of shapes from scratch. The method depends on computing a
special, excessively smooth distance surface, where each skeleton extracted from
this surface is in the form of a set of unconventionally disconnected and sim-
ple branches, i.e. the skeleton branches all terminate before reaching the unique
shape center and no extra branching occurs on them. Hence, one can express dis-
connected skeletons in terms of rooted attributed depth-1 trees, whose nodes store
some measurable properties, such as the location of the disconnection points, and
the length and the type (positive or negative, respectively identifying protrusions or
indentations) of the branches [3] (Fig. 1).
Disconnected skeletons have previously been used for recognition in [2,3], where
quite successful results are reported. Although the representation does
not suffer from the instability of skeletons, as a direct result of the disconnected
nature of the extracted branches, and although this structure alone is an effective shape
representation, as noted in [21], one might criticize the very coarseness of

Fig. 1. Disconnected skeletons of some shapes and the corresponding tree representa-
tions. Note that each disconnection point (except the pruned major branches) gives
rise to two different nodes in the tree, representing the positive and negative skeleton
branches meeting at that disconnection point. However, for illustration purposes, only
one node is drawn.

the descriptions: they do not explicitly carry any information about boundary
details. This issue is in fact a philosophical choice of compromise between
sensitivity and stability. Clearly, in distinguishing shapes, it might happen that
the similarity of boundary details is more distinctive than the similarity of the
structure of disconnection points (Figs. 6, 7).
In this study, we present a coarse-to-fine strategy to deal with such situations.
The organization of the paper is as follows. In Section 2, we describe a way
to obtain radius functions [4] (associated with the positive skeleton branches)
in order to enrich the disconnected skeleton representation with information
about shape boundary details. In Section 3, we utilize this extra information to
enhance the class-specific knowledge used in the category-influenced matching
method proposed in [3], so that boundary deformations in a shape category are
additionally learned from examples. Following that, in Section 4, we introduce
a fine-tuning step to the category-influenced matching method, which then takes
into account the similarity of boundary details. In Section 5, we present some
matching results. Finally, in Section 6, we give a brief summary and provide
some concluding remarks.

2 Obtaining Radius Functions

The disconnected skeleton of a shape is obtained from a special distance surface φ,
the level curves of which are excessively smoothed versions of the initial
shape boundary (Fig. 2(b)). The surface has a single extremum point, captur-
ing the center of a blob-like representation of the shape, from which one can
extract skeleton branches using the method in [22] in a straightforward way,
without any need for a shock capturing scheme. As analyzed in detail in [2],
this special surface is essentially the limit case of the edge strength function v [22]
when the degree of regularization specified by the parameter ρ tends to infin-
ity (Fig. 2(c)-(e)). The excessive regularization employed in the formulation of
φ makes it possible to obtain a very stable skeleton representation, but this
stability comes at the expense of losing information about boundary details.
In contrast to Blum's skeletons, it is impossible to recover the distance from
a skeleton point to the closest point on the shape boundary from the surface
values.

(a) (b) (c) (d) (e)

Fig. 2. (a) A camel shape. The level curves of the surfaces (b) φ, (c) v, computed with
ρ = 16, (d) v, computed with ρ = 64, (e) v, computed with ρ = 256.

In this study, we exploit the link between the surfaces φ and v: in order to obtain
the radius functions associated with the positive branches of disconnected
skeletons (which are analogous to the Blum skeleton), we propose to benefit
from a corresponding v surface. Consider a ribbon-like section of a shape,
illustrated in Fig. 3, in which the dotted line shows the skeleton points
representing that shape section. Assuming the 1D form of the edge strength
function v, the diffusion process along a 1D slice (shown in red) is given by:

    v_{xx}(x) - \frac{v(x)}{\rho^2} = 0 ;   0 \le x \le 2d

with the boundary conditions v(0) = 1, v(2d) = 1.

Fig. 3. An illustration of a ribbon-like section and its skeleton (the dotted line)

The explicit solution of this equation can be easily derived as:

    v(x) = \frac{1 - e^{2d/\rho}}{e^{-2d/\rho} - e^{2d/\rho}} e^{-x/\rho} - \frac{1 - e^{-2d/\rho}}{e^{-2d/\rho} - e^{2d/\rho}} e^{x/\rho}   (1)

The value of v at the skeleton point (the midpoint x = d) is equal to 1/cosh(d/ρ),
the reciprocal of the hyperbolic cosine; equivalently, the distance from the skeleton
point to the closest point on the boundary is given by ρ cosh⁻¹(1/v(d)). This ex-
plicit solution is certainly not valid for the 2D case, as the interactions in the
diffusion process are more complicated, but it can be used as an approximation.
Let s be a skeleton point located at (s_x, s_y) along a positive skeleton branch.
Given a corresponding edge strength function v computed with a sufficiently
large value of ρ, the minimum distance from s to the shape boundary, denoted
by r(s), can be approximated by:

    r(s) = \rho \cosh^{-1}\left( \frac{1}{v(s_x, s_y)} \right)   (2)
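The 1D derivation above can be checked numerically. The sketch below evaluates the closed-form solution (1) and the radius recovery (2); the function names are ours, and only the 1D case stated in the text is verified, not the 2D approximation.

```python
import math

def v_1d(x, d, rho):
    """Closed-form 1D solution of v'' = v/rho^2 on [0, 2d] with
    v(0) = v(2d) = 1, i.e. Eq. (1)."""
    denom = math.exp(-2 * d / rho) - math.exp(2 * d / rho)
    a = (1 - math.exp(2 * d / rho)) / denom
    b = -(1 - math.exp(-2 * d / rho)) / denom
    return a * math.exp(-x / rho) + b * math.exp(x / rho)

def radius_from_v(v_value, rho):
    """Eq. (2): recover the distance to the boundary from a surface value."""
    return rho * math.acosh(1.0 / v_value)
```

At the midpoint x = d the solution equals 1/cosh(d/ρ), so radius_from_v recovers d exactly, as claimed in the text.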
Fig. 4(a) shows the disconnected skeleton of a horse shape, where the radius
functions of the positive skeleton branches are approximately obtained from the
edge strength function computed with ρ = 256 (the same value of ρ is used in
the experiments). The reconstructions of the shape sections associated with the
positive skeleton branches are given separately in Fig. 4(b). Notice that small de-
tails on the shape boundary, e.g. the horse's ears, cannot be recovered completely,
since perturbations on the shape boundary are ignored in the disconnected skele-
ton representation. Moreover, the reconstructions might deviate from their true
form at some locations, e.g. at skeleton points close to the leg joints, where a
positive branch loses its ribbon-like structure of slowly varying width.
However, these approximate radius functions, when normalized with respect to
the radius of the maximal circle associated with the shape center, can be used as
descriptions of the most prominent boundary details (Fig. 4(c)).

(a) (b)

[Plots: approximate radius functions along axes A–F; x-axis: medial points, y-axis: normalized radii values]
(c)

Fig. 4. (a) Disconnected skeleton of a horse shape and the radius functions obtained
from the edge strength function computed with ρ = 256 (the maximal inscribed circles
are drawn at every 3 consecutive skeleton points). (b) Shape sections associated with
the positive skeleton branches. (c) Normalized radius functions associated with the
branches A-F (from top left to bottom right).

3 Learning Boundary Deformations in a Shape Category

In the previous section, we developed a way to supply information about bound-
ary details to disconnected skeletons. In this section, we extend our analysis
and use the enriched skeleton descriptions to learn boundary deformations in a
shape category from a given set of examples. It is noteworthy that the one-level
hierarchy in the skeleton descriptions makes the learning process very practical,
since each positive skeleton branch simply corresponds to a major protrusion of
the shape, and hence the correspondences between two disconnected skeletons can
be found by a one-to-one matching.
Once the correspondence information is available, we follow the approach
in [5] and model the boundary deformations of a shape section in a category by
forming a low-dimensional linear space from the corresponding radius functions.
To be specific, we first uniformly sample an equal number of points along matched

[Plot: approximate radius functions along medial axes; x-axis: sampled medial points, y-axis: normalized radii values]

(a) (b)

Fig. 5. An analysis of boundary deformations using approximated radius functions.


(a) Equivalent shape sections of 15 squirrel shapes, each associated with a positive
skeleton branch. (b) The corresponding set of uniformly sampled radius functions.

positive branches (Fig. 5). The deformation space is then modeled by applying
principal component analysis (PCA), where the first few principal components
describe the representation space for possible deformations. In the experiments,
our sampling rate is 32 points per positive skeleton branch, and we use the first
five principal components. Hence, each sampled radius function is represented
by a 5-dimensional vector.
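The PCA step above can be sketched as follows for one matched branch. This is an illustration under our own assumptions (NumPy, SVD-based PCA, function names); the paper does not specify the implementation.

```python
import numpy as np

def learn_deformation_space(samples, n_components=5):
    """Sketch of the deformation-space construction for one matched branch:
    `samples` is an (m, 32) array of uniformly sampled radius functions from
    the category examples. PCA via SVD of the centred data; the leading
    principal components span the space of observed deformations."""
    samples = np.asarray(samples, dtype=float)
    mean = samples.mean(axis=0)
    _, _, vt = np.linalg.svd(samples - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(radius_fn, mean, components):
    """5-dimensional descriptor of one sampled radius function."""
    return components @ (np.asarray(radius_fn, dtype=float) - mean)
```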

4 A Coarse-to-Fine Strategy to Incorporate Similarity of Boundary Details into Category-Influenced Matching
In [3], we presented a novel tree edit distance based shape matching method,
called category-influenced matching, in which rooted attributed depth-1 trees
are used to represent the disconnected skeletons of shapes. The novelty of that
work lies in the fact that the semantic roles of the shapes in comparison are dis-
tinguished as query shape or database shape (i.e. a member of a familiar shape
category), and the knowledge about the category of the database shape is utilized
as a context in the matching process in order to improve the performance. Such
a context is defined by a category tree, a special tree union structure whose
nodes basically store the correspondence relations among the members of the
same shape category, together with some statistical information about the observed
skeleton attributes.
Here, we propose a fine-tuning step to our category-influenced matching
method, in which the computed distance between the shapes in comparison
is re-evaluated based on the similarity of their boundary details. Note that the
process presented in Section 3 for learning class-specific boundary deformations
can be easily integrated into the formation procedure of category trees. In that

case, we additionally store the mean of the matched radius functions together
with the reduced set of principle components in the nodes of the category tree.
More formally, the overall algorithm can be summarized with the following two
successive steps:

1. Let T1 be the shape tree of the query shape, which is being compared with
the shape tree of a database shape, denoted by T2, whose nodes are linked
with specific leaf nodes of the corresponding category tree. Compute an
initial distance and the correspondences between T1 and T2 using the category-
influenced matching method:

    d(T_1, T_2) = \min_{S} \left[ \sum_{u \in \Lambda} rem(u) + \sum_{v \in \Delta} ins(v) + \sum_{(u,v) \in \Omega} ch(u, v, B) \right]   (3)

where Λ and Δ respectively denote the set of nodes removed from T1 and
the set of nodes inserted into T1 from T2, and Ω denotes the set of matched
nodes (see [3] for the details of the cost functions associated with the edit
operations rem(ove), ins(ert) and ch(ange)).

2. Let S* = (Λ*, Δ*, Ω*) be the sequence of edit operations transforming T1
into T2 with the minimum cost. Re-calculate the distance between T1 and
T2 according to Equation 4, in which Φ(u, v), appearing inside the extra
factor in front of the label change cost function, is the similarity between
the radius functions associated with the matched skeleton branches. Note that
Φ(u, v) is calculated after projecting the corresponding uniformly sampled
radius functions onto the related low-dimensional deformation space, as in
Equation 5.

    d(T_1, T_2) = \sum_{u \in \Lambda^*} rem(u) + \sum_{v \in \Delta^*} ins(v) + \sum_{(u,v) \in \Omega^*} (1 - \Phi(u, v)) \times ch(u, v, B)   (4)

    \Phi(u, v) = \begin{cases} \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{\sum_{i=1}^{5} (\alpha_i - \beta_i)^2}{2\sigma^2} \right) & \text{if } u, v \text{ express positive branches} \\ 0 & \text{otherwise} \end{cases}   (5)

where α and β are the vectors formed by projecting the radius functions
associated with u and v onto the related deformation space (σ is taken as σ = 0.4
in the experiments).
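The similarity term Φ of Equation 5 is easy to evaluate on the projected 5-dimensional descriptors. The sketch below uses our own function name; the inputs are assumed to be the projections α and β. Note, as our own observation, that with σ = 0.4 the normalisation factor 1/√(2πσ²) ≈ 0.997, so Φ is close to 1 for identical branches and the factor (1 − Φ) then nearly cancels the change cost.

```python
import math

def phi(alpha, beta, sigma=0.4):
    """Eq. (5): Gaussian similarity between two 5-dim projected radius
    functions. Both branches are assumed positive here; per the paper,
    Phi is 0 when they are not."""
    sq = sum((a - b) ** 2 for a, b in zip(alpha, beta))
    return math.exp(-sq / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)
```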

5 Experimental Results

To demonstrate the effectiveness of the proposed approach, we test our method
on the matching examples shown in Figs. 6 and 7, in which the coarse structure
of the disconnected skeletons alone is not enough to distinguish the shapes. In
these examples, although part correspondences are correctly determined, the
costs obtained with the category-influenced matching method in [3] do not
reflect the perceptual dissimilarities well¹. On the other hand, when one examines the
differences in the boundary details, it is clear that a better decision can
be made. For example, refer to Fig. 6. The pairs of radius functions associated
with the matched branches are much more similar in the matching of the two horse
shapes than in the matching of the query horse shape with the cat
shape. The only exception is the similarity of the horses' tails (Fig. 6(b), in the
middle row and on the right), but note that these radius functions are compared
in the corresponding deformation spaces, which are learned from the given set of
examples. In this regard, the proposed coarse-to-fine strategy can be used to
refine the matching results.

[Plots: uniformly sampled radius functions of matched branches for the pairings (a) A–B and (b) A–C; x-axis: sampled medial points, y-axis: normalized radii values]
(a) (b)

Fig. 6. Some matching results and the uniformly sampled radius functions of matched
branches. The final matching costs are (a) 0.5800 (reduced from 0.7240), (b) 0.5368 (re-
duced from 0.7823). Note that the similarity of the radius functions is actually computed
in the related low-dimensional deformation spaces.

¹ In each experiment, the knowledge about the category of the database shape (the
ones on the right) is defined by 15 examples of that category, randomly selected from
the shape database given in [3].

(a) (b)

(c) (d)

(e) (f)

Fig. 7. Some other matching results. The final matching costs are (a) 1.1989 (reduced
from 1.2904), (b) 0.9458 (reduced from 1.4936), (c) 1.9576 (reduced from 2.1879), (d)
1.8744 (reduced from 3.0387), (e) 0.8052 (reduced from 0.8105), (f) 0.6738 (reduced
from 1.0875).

6 Summary and Conclusion


Despite its coarse structure, the disconnected skeleton representation is a very stable
and effective skeleton-based representation. However, as a result of the exces-
sive regularization employed in the extraction process, no information about
boundary details is available in the skeleton descriptions. As articulated in [2],
this is in fact a compromise between the opposing goals of stability and sen-
sitivity. To enrich disconnected skeletons, we present a simple way to obtain
the radius functions associated with the positive skeleton branches. This allows
us to learn class-specific boundary deformations in a category once the corre-
spondence relations among the members of the category are specified. This extra
information is then incorporated into the category-influenced matching method
in [3] as a refinement step, in which the initial matching cost is re-evaluated by
taking into account the similarity of the radius functions of the matched positive
branches. Our experiments show that this approach can be used to obtain per-
ceptually more meaningful matching costs when the structure of the disconnection
points alone is not distinctive enough to distinguish shapes.

References
1. Aslan, C., Tari, S.: An axis-based representation for recognition. In: ICCV 2005,
vol. 2, pp. 1339–1346 (2005)
2. Aslan, C., Erdem, A., Erdem, E., Tari, S.: Disconnected skeleton: Shape at its
absolute scale. IEEE Trans. Pattern Anal. Mach. Intell. 30(12), 2188–2203 (2008)
3. Baseski, E., Erdem, A., Tari, S.: Dissimilarity between two skeletal trees in a con-
text. Pattern Recognition 42(3), 370–385 (2009)
4. Blum, H.: Biological shape and visual science. Journal of Theoretical Biology 38,
205–287 (1973)
5. Zhu, S.C., Yuille, A.L.: Forms: A flexible object recognition and modeling system.
Int. J. Comput. Vision 20(3), 187–212 (1996)
6. Siddiqi, K., Kimia, B.B.: A shock grammar for recognition. In: CVPR, pp. 507–513
(1996)
7. Liu, T.L., Geiger, D., Kohn, R.V.: Representation and self-similarity of shapes. In:
ICCV, pp. 1129–1135 (1998)
8. August, J., Siddiqi, K., Zucker, S.W.: Ligature instabilities in the perceptual orga-
nization of shape. Comput. Vis. Image Underst. 76(3), 231–243 (1999)
9. Sebastian, T.B., Klein, P.N., Kimia, B.B.: Recognition of shapes by editing shock
graphs. In: ICCV, vol. 1, pp. 755–762 (2001)
10. Siddiqi, K., Shokoufandeh, A., Dickinson, S.J., Zucker, S.W.: Shock graphs and
shape matching. Int. J. Comput. Vision 35(1), 13–32 (1999)
11. Pelillo, M., Siddiqi, K., Zucker, S.W.: Many-to-many matching of attributed trees
using association graphs and game dynamics. In: Arcelli, C., Cordella, L.P., Sanniti
di Baja, G. (eds.) IWVF 2001. LNCS, vol. 2059, pp. 583–593. Springer, Heidelberg
(2001)
12. Pelillo, M., Siddiqi, K., Zucker, S.W.: Matching hierarchical structures using asso-
ciation graphs. IEEE Trans. Pattern Anal. Mach. Intell. 21(11), 1105–1120 (1999)
13. Torsello, A., Hancock, E.R.: A skeletal measure of 2d shape similarity. CVIU 95(1),
1–29 (2004)
14. Liu, T., Geiger, D.: Approximate tree matching and shape similarity. In: ICCV,
vol. 1, pp. 456–462 (1999)
15. Macrini, D., Siddiqi, K., Dickinson, S.: From skeletons to bone graphs: Medial
abstraction for object recognition. In: CVPR (2008)
16. Bai, X., Latecki, L.J.: Path similarity skeleton graph matching. IEEE Trans. Pat-
tern Anal. Mach. Intell. 30(7), 1282–1292 (2008)
17. Torsello, A., Hancock, E.R.: Matching and embedding through edit-union of trees.
In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS,
vol. 2352, pp. 822–836. Springer, Heidelberg (2002)
18. Demirci, M.F., Shokoufandeh, A., Dickinson, S.J.: Skeletal shape abstraction from
examples. IEEE Trans. Pattern Anal. Mach. Intell. (to appear, 2009)
19. Shokoufandeh, A., Dickinson, S.J., Siddiqi, K., Zucker, S.W.: Indexing using a
spectral encoding of topological structure. In: CVPR, pp. 2491–2497 (1999)
20. Demirci, M.F., van Leuken, R., Veltkamp, R.: Indexing through laplacian spectra.
CVIU 110(3), 312–325 (2008)
21. Bai, X., Latecki, L.J., Liu, W.Y.: Skeleton pruning by contour partitioning with
discrete curve evolution. IEEE Trans. Pattern Anal. Mach. Intell. 29(3), 449–462
(2007)
22. Tari, S., Shah, J., Pien, H.: Extraction of shape skeletons from grayscale images.
CVIU 66(2), 133–146 (1997)
An Optimisation-Based Approach to Mesh
Smoothing: Reformulation and Extensions

Yskandar Hamam¹ and Michel Couprie²

¹ F'SATIE at Tshwane University of Technology, Pretoria, RSA, and ESIEE, France
² Université Paris-Est, Laboratoire d'Informatique Gaspard Monge,
Equipe A3SI, ESIEE

Abstract. The Laplacian approach, when applied to mesh smoothing


leads in many cases to convergence problems. It also leads to shrinking of
the mesh. In this work, the authors reformulate the mesh smoothing prob-
lem as an optimisation one. This approach gives the means of controlling
the steps to assure monotonic convergence. Furthermore, a new optimisa-
tion function is proposed that reduces the shrinking effect of the method.
Examples are given to illustrate the properties of the proposed approches.

Smoothing mesh data is a common issue in computer graphics, finite element mod-
elling and data visualisation. A simple and natural method to perform this task
is known as Laplacian smoothing, or Gaussian smoothing: it basically consists of
moving, in parallel, every mesh vertex towards the center of mass of its neigh-
bours, and repeating this operation until the desired smoothing effect is obtained.
In practice, it gives reasonable results with low computational effort when cor-
rectly tuned, and it is extremely simple to implement, hence its popularity.
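As an illustration (this sketch is ours, not the authors' code), the naive scheme can be written in a few lines, assuming dense NumPy arrays and an explicit neighbour list:

```python
import numpy as np

def laplacian_smooth(points, neighbours, n_iter):
    """Naive Laplacian smoothing: move every vertex, in parallel,
    to the centre of mass of its neighbours."""
    x = np.asarray(points, dtype=float)
    for _ in range(n_iter):
        x = np.array([x[nbrs].mean(axis=0) for nbrs in neighbours])
    return x
```

Repeated until stability, this collapses any closed polyline towards its centroid, which is precisely the shrinkage drawback.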
This somewhat naive method has the drawback of shrinking objects: when
repeated until stability, it reduces any finite object to a single point. Thus, the
choice of when to stop smoothing iterations is a crucial issue.
However, Laplacian smoothing has inspired a number of variants and alterna-
tive methods. Taubin’s method [1] avoids shrinkage by alternating contraction
and expansion phases. The method of Vollmer et al. [2] introduces a term which
corresponds to a (loose) attachment of the points to their initial positions.
Another criticism made against Laplacian smoothing is that it lacks motiva-
tion, because it is not directly connected to any specific mesh quality criterion
[3,4,5]. A common approach to mesh smoothing consists of defining a cost func-
tion related to the mesh elements (relative positions of vertices, edge lengths,
triangle areas, angles, etc) and to design an algorithm that minimises this cost
function [4,5,6,7,8,9,10,11,12].
Mesh smoothing has also been tackled from the signal filtering point of view.
In this framework, [13] analyses the shrinkage effect of the Laplacian smooth-
ing method, and explains the nice behaviour of the operator described in [1].
Other classical filters, such as the mean and median filters, were also adapted to
meshes [14]. Other approaches are based on physical analogies [15,16], anisotropic
and non-linear diffusion [17,18,19,20,21], and curvature flow [22,23].

This work has been partially supported by the “ANR BLAN07-2 184378” project.

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 31–41, 2009.
© Springer-Verlag Berlin Heidelberg 2009

In this paper, we focus on the optimisation approach to mesh smoothing.


We show that a general formulation of the mesh smoothing problem as the
optimisation of a global cost function leads to unifying a number of previous
works, in particular the classical Laplacian smoothing and [1,2,10,11,12]. We
also discuss the convergence of the related algorithms and the quality of results.

1 Analysis and Extension of the Mesh Smoothing Problem
In this section, the mesh smoothing problem is first reformulated to understand
its convergence properties, and then some extensions are given.
Reformulation. The Laplacian approach to mesh smoothing may be reformu-
lated as an optimisation problem. This optimisation gives a method, when using
gradient descent, which is close to the Laplacian smoothing.
Our input data is a graph G = (V, E), embedded in the 3D Euclidean space.
Each edge e in E is an ordered pair (s, r) of vertices, where s (resp. r) is the
sending (resp. receiving) end vertex of e. To each vertex v is associated a triplet
of real coordinates xv , yv , zv .
Note that the smoothing using this method is applied to the three coordinates
simultaneously. Since in most of the applications the coordinates are modified
independently, in this presentation only one dimension will be considered. The
other coordinates are treated in the same manner.
Consider the function

    J = (1/2) ∑_{(s,r)∈E} (x_s − x_r)²    (1)

Optimising this function leads to grouping all the points in one, thus shrinking
the mesh to one point. We will show in what follows that under certain condi-
tions this yields, when optimised by gradient descent, to the basic Laplacian
smoothing technique. This function may be represented in matrix form. Let C
be the node-edge incidence matrix of the graph G, defined as:

    C_ve =  1, if v is the sending end of edge e;
           −1, if v is the receiving end of edge e;
            0, otherwise.
Then J may be written as:

    J = (1/2)(Cx)^t (Cx) = (1/2) x^t C^t C x = (1/2) x^t A x    (2)

where A = C^t C. Since C^t is not of full rank (the sum of its rows is zero), the determinant of A is zero¹. Furthermore, let z = Cy; then y^t C^t C y = z^t z ≥ 0, and hence A is positive semi-definite.
The matrix A is usually sparse for large problems, with diagonal elements a_ii = (number of edges incident to vertex i), and off-diagonal elements

    a_ij = −1, if an edge exists between vertices i and j;
            0, otherwise.

¹ For the notions of linear algebra, see e.g. [24].
In the literature this matrix is referred to as the Laplacian matrix (also called
the topological or graph Laplacian); it plays a central role in various mesh processing
applications [25]. Later on we will give some of the properties of matrix A.
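For illustration (this construction is ours, not from the paper), A can be assembled directly from an edge list; a dense NumPy sketch:

```python
import numpy as np

def laplacian_matrix(n_vertices, edges):
    """Graph Laplacian A = C^t C, built directly from an edge list of
    (sending, receiving) vertex pairs."""
    A = np.zeros((n_vertices, n_vertices))
    for s, r in edges:
        A[s, s] += 1            # a_ii = number of edges incident to i
        A[r, r] += 1
        A[s, r] -= 1            # a_ij = -1 if an edge joins i and j
        A[r, s] -= 1
    return A

# Example: a 4-cycle, where every vertex has degree 2
A = laplacian_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
```

The row sums are zero (hence det A = 0) and A is symmetric positive semi-definite, matching the properties derived above.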
Optimisation-based smoothing. Consider the optimisation problem of the function J. The gradient is ∇_x J = C^t C x = Ax. The gradient descent algorithm may be written as

    x^{n+1} = x^n − αAx^n    (3)

where α is a predetermined constant corresponding to the step in the opposite


direction to the gradient, and n is the iteration number.
In order to compare this to the Laplacian smoothing, let A = D + G where D
is a diagonal matrix composed of the diagonal elements of A and G is the matrix
with diagonal elements zero and off-diagonal elements equal to those of A.
Equation (3) may now be rewritten as x^{n+1} = (I − αD)x^n − αGx^n, where I is an identity matrix of appropriate size. For each coordinate x_i of vertex i, the above expression becomes

    x_i^{n+1} = (1 − α|V(i)|) x_i^n + α ∑_{j∈V(i)} x_j^n,

where V(i) is the set of neighbours of vertex i and |V(i)| is the number of elements of V(i), i.e. the number of edges incident to vertex i. Rewriting the second term of the right-hand side of the previous equation gives

    x_i^{n+1} = (1 − γ_i) x_i^n + γ_i B(i),

where B(i) = (1/|V(i)|) ∑_{j∈V(i)} x_j^n is the geometric centre of the neighbours of vertex i, and γ_i = α|V(i)|.
This gives an algorithm which is a generalisation of the Laplacian smoothing.
For the purpose of comparing the various methods, this algorithm will be referred
to as the First Order (FO) algorithm. The original Laplacian smoothing uses
γ_i = 1, ∀i (i.e., each vertex is moved to the geometric centre of its neighbours).
To improve the stability of the algorithm, this was later modified [2] to move
each vertex along the segment between its original position and the geometric centre
of its neighbours. In the FO algorithm, the values of γi depend on the number
of edges incident to vertex i. In the case where this number is the same for all
the vertices, the two algorithms are exactly the same. We will now consider the
convergence properties of both algorithms.
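The FO iteration of equation (3) is straightforward to implement; a dense NumPy sketch (illustrative only, not the authors' code):

```python
import numpy as np

def fo_smooth(x, A, alpha, n_iter):
    """First Order (FO) smoothing: gradient descent on J = 1/2 x^t A x,
    i.e. the iteration x^{n+1} = x^n - alpha * A x^n."""
    for _ in range(n_iter):
        x = x - alpha * (A @ x)
    return x
```

With a suitably small step alpha, the sum of the coordinates is preserved at every iteration while J decreases.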
Convergence of the Laplacian Smoothing. An iteration n of the Laplacian smoothing has the form:

    x_i^{n+1} = (1 − β) x_i^n + (β/|V(i)|) ∑_{j∈V(i)} x_j^n,    with 0 ≤ β ≤ 1.

In other terms, x^{n+1} = M x^n, where M is the Markovian matrix with diagonal elements m_ii = 1 − β, off-diagonal elements m_ij = β/|V(i)| whenever j is a neighbour of i, and m_ij = 0 otherwise.
Notice that the sum of the elements in each row is equal to one.
It is well established that any Markovian matrix has two eigenvalues at zero
and one, and the other eigenvalues lie between −1 and 1. This property assures
the convergence of the above algorithm. However, since the eigenvalues may
be negative or positive, this convergence is not monotonic and variables may
oscillate around their final value.

Convergence of the First Order (FO) Algorithm. Consider equation (3); it may be rewritten as x^{n+1} = (I − αA)x^n. It is well known that the algorithm converges monotonically to a final value if the eigenvalues of (I − αA) are between zero and one. We will now find a condition on α that assures this convergence. A full analysis of the Laplacian can be found in [26]. In this work, we are interested in the following property: the Laplacian matrix A satisfies the three conditions a) all the eigenvalues of A are real and positive (λ_i ≥ 0, ∀i); b) λ_min(A) = 0; and c) λ_max(A) ≤ 2a, where a = max(a_ii).
From the above we can give the conditions for the convergence of the algorithm. If α is chosen such that 0 < α < 1/λ_max(A), then the eigenvalues of (I − αA) are between zero and one. Hence if α < 1/(2a), then α < 1/(2a) ≤ 1/λ_max(A), and the condition 0 < α < 1/λ_max(A) is satisfied, leading to monotonic convergence.
Let s be a vector of ones of appropriate dimension. Then, from (3), we have s^t x^{n+1} = s^t x^n − α s^t A x^n = s^t x^n − α s^t C^t C x^n, and since s^t C^t = 0, we have s^t x^{n+1} = s^t x^n.
Thus the sum of x^n is invariant. Since the optimal solution is a point (the sum of squares of the edge lengths is zero at the optimum), the solution converges to the geometric centre of the vertices.
The reformulation of the algorithm in terms of an optimisation criterion gives a better understanding of the algorithm and improves the convergence of the Laplacian filter. Moreover, the optimisation approach permits the extension to other objective functions. We next give a new function that, when optimised, conserves the size of the object under certain conditions.

Optimisation with attach to the initial coordinates. In order to preserve the dimension of the object, we first propose to modify equation (2) by adding a term related to the distance between the smoothed points and their original positions. Consider the function

    J = (1/2) [(x − x̄)^t (x − x̄) + θ x^t A x]    (4)

where x̄ is the initial value of the coordinate vector x and θ is a positive constant that allows changing the respective weights of the two parts of the function. If θ = 0 then there is no need for optimisation and the minimum of J is obtained for x = x̄. For θ >> 1 the function is equivalent to (2). Thus this function is a compromise between keeping the vertices at their initial positions and reducing the distance between points. Now consider the gradient of J with respect to x: ∇_x J = (x − x̄) + θAx. At the optimum, we have (x − x̄) + θAx = 0, that is, (I + θA)x = x̄.
Consider the matrix (I + θA). Since A is symmetric positive semi-definite, its eigenvalues are greater than or equal to zero. Adding an identity matrix to θA with θ ≥ 0 gives a positive definite matrix. Hence the inverse of (I + θA) exists and, for small-size problems, the above equation may be solved directly to give x = (I + θA)^{−1} x̄.
Also note that, due to this property, the solution is unique.

Application of the Gradient Descent Method. In the following we develop the gradient descent method applied to the above function. One iteration of the gradient descent method is as follows:

    x^{n+1} = x^n − α^n ∇_x J = x^n − α^n [(x^n − x̄) + θAx^n],

where n is the iteration number and α^n is a positive scalar corresponding to the step in the opposite direction of the gradient. Consider first the case where α^n is constant (α^n = α).
As for the previous gradient descent solution, it may be shown that the algorithm converges monotonically when α < 1/(1 + θλ_max(A)). Since λ_max(A) ≤ 2a, the condition for monotonic convergence is α < 1/(1 + 2θa).
This algorithm will be referred to hereafter as the First Order With Attach (FOWA) algorithm. Unlike the previous case, for this optimisation problem we wish to reach the optimal solution, so it is worthwhile to obtain an optimal step at each iteration. In what follows, an optimal step for the descent method is developed.
Let α∇_x J be the step taken in the direction opposite to the gradient. The objective function may be expressed in the vicinity of x^n as

    J^n = (1/2) [(x^n − α∇_x J − x̄)^t (x^n − α∇_x J − x̄) + θ (x^n − α∇_x J)^t A (x^n − α∇_x J)].

Differentiating this function with respect to α and setting the derivative to zero gives

    α^n = (∇_x J^t ∇_x J) / (∇_x J^t (I + θA) ∇_x J)    (5)
which is the optimal step for the gradient descent. However, this does not assure monotonic convergence. Consider the 2-norm property of a real symmetric matrix M:

    ||M||_2 = sup_{z≠0} ||Mz||_2 / ||z||_2 = sup_{z≠0} (z^t M z) / (z^t z) = λ_max(M).

By identifying, in equation (5), M with (I + θA) and z with ∇_x J, we get the inequality α^n ≥ 1/(1 + θλ_max(A)).
The optimal value given by equation (5) thus does not satisfy the limit on α for monotonic convergence, but gives faster convergence; however, it doubles the computation time at each iteration. In our experiments, we noticed that with the optimal step, oscillations are indeed obtained. Furthermore, note that the optimisation problem is quadratic, so it may also be solved by the conjugate gradient method. If this is done, the exact solution is obtained in N iterations, where N is the size of the matrix A, i.e. the number of vertices in the graph.
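Putting the pieces together, the FOWA iteration with the optimal step of equation (5) may be sketched as follows (our illustration; dense matrices are assumed):

```python
import numpy as np

def fowa_smooth(x0, A, theta, n_iter):
    """FOWA: minimise 1/2 [(x - x0)^t (x - x0) + theta x^t A x]
    by gradient descent with the optimal step of eq. (5)."""
    x = x0.astype(float).copy()
    for _ in range(n_iter):
        g = (x - x0) + theta * (A @ x)          # gradient of J
        denom = g @ g + theta * (g @ (A @ g))   # g^t (I + theta A) g
        if denom == 0.0:                        # gradient vanished: optimum reached
            break
        x = x - ((g @ g) / denom) * g           # optimal step, eq. (5)
    return x
```

For small problems the result can be checked against the closed-form solution x = (I + θA)^{−1} x̄ derived above.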

2 Proposed Functions and Optimisation Schemes

In the above section, two special cases of functions were given. Many proposals
have been made to smooth while conserving the size of objects. In this section,
a second order function is proposed. Special cases are then considered and com-
pared. It is then shown that many published methods are special cases of the
optimisation of this function. Consider the following second order function with
attach to the initial coordinates:
    J = (1/2) [(x − x̄)^t Q (x − x̄) + θ_0 x^t x + θ_1 x^t A x + θ_2 x^t A² x]    (6)

where
– Q is a symmetric positive definite weighing matrix,
– θ_0, θ_1 and θ_2 are weighing scalars for the zero, first and second order terms,
– A = C^t Ω C, and Ω is a diagonal matrix of weights associated to each edge (see [27]).
Let us now consider two special cases of the proposed function. The first with-
out a term that attaches the vertices to their original position, and the second
with such a term. The first order objective function minimises the sum of the
squares of distances between adjacent vertices. In the proposed objective func-
tion, it is proposed to minimise the sum of the squares of the distances between
vertices and the geometric centre of their neighbours. The method obtained
based on the optimisation of this function will be referred to as the Second Order (SO) algorithm.
Case 1. Consider the function

    J = (1/2) x^t (AA) x = (1/2) x^t A² x    (7)
In this function Ax gives a measure of the deviation of each xi from the geometric
centre of its neighbours. So (Ax)t Ax = xt AAx is the sum of the squares of
the distances of each vertex from the geometric centre of its neighbours. In
comparison with the Laplacian case, where the sum of the squares of distances
between neighbouring vertices is minimised, this function is proposed to reduce
shrinkage.
Application of the Gradient Descent Method. In a similar manner to the above development, one iteration of the gradient descent method is as follows: x^{n+1} = x^n − α^n ∇_x J = x^n − α^n A² x^n.
With α^n constant (α^n = α), the condition assuring monotonic convergence is α < 1/λ²_max(A), which is obtained if the following condition is satisfied: α < 1/(4a²).
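A sketch of the SO iteration (ours, for illustration):

```python
import numpy as np

def so_smooth(x, A, alpha, n_iter):
    """Second Order (SO) smoothing: gradient descent on J = 1/2 x^t A^2 x,
    i.e. the iteration x^{n+1} = x^n - alpha * A^2 x^n."""
    A2 = A @ A
    for _ in range(n_iter):
        x = x - alpha * (A2 @ x)
    return x
```

Since s^t A² = 0 as well, the sum of the coordinates is again invariant under this iteration.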
As for the previous case, an extra term is added to the objective function that
attaches the vertices to their original positions. This gives the method that we
will refer to as the Second Order With Attach (SOWA) algorithm.
Case 2. Consider the function

    J = (1/2) [(x − x̄)^t (x − x̄) + θ x^t (AA) x]    (8)

where x̄ is the initial value of the coordinate vector x. Now consider the gradient of J with respect to x: ∇_x J = (x − x̄) + θA²x. At the optimum, we have (x − x̄) + θA²x = 0, that is, (I + θA²)x = x̄.

In a similar manner to the previous case, the inverse of (I + θA²) exists and, for small-size problems, the above equation may be solved to give x = (I + θA²)^{−1} x̄, and the solution is unique.
Application of the Gradient Descent Method. With similar considerations as above, one iteration of the gradient descent method is as follows:

    x^{n+1} = x^n − α^n ∇_x J = x^n − α^n [(x^n − x̄) + θA²x^n],

and for monotonic convergence α^n is considered to be constant (α^n = α).
The algorithm converges monotonically when α < 1/(1 + θλ²_max(A)), which is assured if α < 1/(1 + 4θa²).

To accelerate convergence, the optimal step is developed. Let α∇_x J be the step taken in the direction opposite to the gradient. The objective function may be expressed in the vicinity of x^n as

    J^n = (1/2) [(x^n − α∇_x J − x̄)^t (x^n − α∇_x J − x̄) + θ (x^n − α∇_x J)^t A² (x^n − α∇_x J)].

Differentiating this function with respect to α and setting the derivative to zero gives

    α^n = (∇_x J^t ∇_x J) / (∇_x J^t (I + θA²) ∇_x J)    (9)
which gives the optimal step for the gradient descent.
In the above, we have opted for solving the optimisation problems using gradient descent. With attach to the original points, the optimality conditions may be used directly; however, even when A is sparse, A² is less so, and the direct solution may be lengthy. The gradient descent may also be accelerated using the conjugate gradient method.
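The corresponding SOWA sketch, with the optimal step of equation (9) (illustrative only):

```python
import numpy as np

def sowa_smooth(x0, A, theta, n_iter):
    """SOWA: minimise 1/2 [(x - x0)^t (x - x0) + theta x^t A^2 x]
    by gradient descent with the optimal step of eq. (9)."""
    A2 = A @ A
    x = x0.astype(float).copy()
    for _ in range(n_iter):
        g = (x - x0) + theta * (A2 @ x)          # gradient of J
        denom = g @ g + theta * (g @ (A2 @ g))   # g^t (I + theta A^2) g
        if denom == 0.0:
            break
        x = x - ((g @ g) / denom) * g
    return x
```

As for FOWA, the iterates can be checked against the closed form x = (I + θA²)^{−1} x̄ on small problems.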

3 Fitting Other Published Algorithms within Our Framework

In this section, published smoothing methods are considered and fit within the proposed framework.
Taubin’s method. In [1,13] the author gives a method that avoids shrinkage. It will be shown in this section that this method corresponds to a special case of the above function. One iteration of Taubin’s method corresponds to the following: x^{n+1} = (I − μK)(I − λK) x^n, where λ > 0, μ < 0 and μ + λ < 0. The matrix K given by Taubin may be expressed in our notation by setting A = DK. Consider the special case where D = dI; then K = (1/d) A, and

    x^{n+1} = [I + θ_1 A − θ_2 A²] x^n,

where θ_1 = −(μ + λ)/d > 0 and θ_2 = −μλ/d² > 0. By identifying terms, the following optimisation function is obtained:

    J = (1/2) [−θ_1 x^t A x + θ_2 x^t A² x].

If θ_1 << θ_2 the SO algorithm is obtained. Furthermore, note that the first term increases the size of the object whereas the second reduces it. A combination of the terms may lead to smoothing with less shrinkage; tuning the terms, however, is quite delicate.
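An illustrative sketch of the λ|μ iteration (ours; here K is taken as A/d for a d-regular graph, a special case discussed above):

```python
import numpy as np

def taubin_smooth(x, K, lam, mu, n_iter):
    """Taubin's lambda|mu scheme: alternate a shrinking step (lam > 0)
    and an expanding step (mu < 0), with mu + lam < 0."""
    n = K.shape[0]
    M = (np.eye(n) - mu * K) @ (np.eye(n) - lam * K)  # one combined pass
    for _ in range(n_iter):
        x = M @ x
    return x
```

On a regular graph this damps high-frequency components while leaving the mean, and approximately the low frequencies, unchanged.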
Other algorithms. Many other works exist in the literature that may be inte-
grated in this framework. In the work of Sorkine, for example, the main term of
the function to be optimised (eq. 3 of [25]) is a special case of eq. (6), where only the first term is used; in Sorkine’s notation, the matrix Q is equal to A^t (D^{−1})² A in our framework. Sorkine adds a second term for the purpose of fixing some vertices to their original positions.
In a more recent work by Bougleux et al. [12], eq. 1 gives the optimisation
function. In the same way as in Sorkine’s work, a term is added for the purpose
of fixing some vertices to their original positions. In the case where no attach
points are given and p = 2 (the most common one), this function is equal to the
first order term of ours.
Other methods may also be represented using the proposed framework. We
may cite here, as other examples, the work of Vollmer et al. [2], the work of
Nealen et al. [11] and the one of Ji et al. [10].

4 Numerical Results
In this section the following five smoothing algorithms are compared:
1. FO algorithm: This is the first order (FO) optimisation based modified
smoothing with the objective function as the sum of the squares of distances
between adjacent points as given in eq. (2).
2. FOWA algorithm: The previous method with an attach term (FOWA) added
to the objective function related to the original positions of the points as
given in eq. (4).
3. SO algorithm: The optimisation based modified smoothing with the objective
function as the sum of the squares of the distances between the points and
the geometric center of their neighbours. This is referred to as the second
order (SO) algorithm, and corresponds to the function given in eq. (7).
4. SOWA algorithm: The previous method with an attach term added to the
objective function, related to the original positions of the points. It is referred
to as the second order with attach (SOWA) algorithm and corresponds to the
function given in eq. (8).
5. HC algorithm: This is the algorithm described by [2]. It is used in the compar-
ison since it is considered to be quite efficient for smoothing while reducing
shrinkage.
In order to test the shrinkage and smoothing properties of the functions we will
first compare the FO and SO algorithms. This is done to compare the effect
of using the two distance measures. To run this test, a sphere with a radius of
one is used. Random noise was added to the sphere, which gave a sphere with
1.001 mean distance to center and 0.665 standard deviation. The mean angle
between two facets is 0.527. This sphere was then smoothed using FO and SO
algorithms. This was run to give smoothing of the same order of magnitude.
The mesh smoothed by the FO algorithm using 13 iterations gave the following
properties: Mean angle = 0.069, Standard deviation = 0.075, Mean distance to
center = 0.977. The mesh smoothed by the SO algorithm using 30 iterations gave
the following properties: Mean angle = 0.070, Standard deviation = 0.075, Mean distance to center = 1.00015. The shrinkage in the first case is 2.42%, whereas in the second case it is 0.125%.
These results show that it is more interesting to use as an optimisation func-
tion the sum of the squares of the distances between the vertices and the geomet-
ric center of the adjacent vertices (second order term). For equivalent smoothing
this method gives significantly less shrinkage.
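This behaviour is easy to reproduce on a 2D analogue. The sketch below (our illustration, using a noisy circle in place of the sphere, with iteration counts mirroring those of the text) compares the radius retained by the FO and SO iterations:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
t = 2 * np.pi * np.arange(n) / n
pts = np.stack([np.cos(t), np.sin(t)], axis=1) + 0.1 * rng.standard_normal((n, 2))

A = np.zeros((n, n))                     # Laplacian of the cycle graph
for i in range(n):
    j = (i + 1) % n
    A[i, i] += 1; A[j, j] += 1
    A[i, j] -= 1; A[j, i] -= 1

def radius(p):                           # mean distance to the centroid
    return np.linalg.norm(p - p.mean(axis=0), axis=1).mean()

fo, so = pts.copy(), pts.copy()
for _ in range(13):                      # FO, with alpha < 1/(2a), a = 2
    fo = fo - 0.24 * (A @ fo)
A2 = A @ A
for _ in range(30):                      # SO, with alpha < 1/(4a^2) = 1/16
    so = so - 0.06 * (A2 @ so)
```

On this example the SO result retains nearly the full radius while FO shrinks noticeably, in line with the percentages reported in the text; both iterations leave the centroid unchanged.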
Consider next the two algorithms (FOWA and SOWA) based on functions
with attach to the original points. These are compared using the sphere with
three noise levels (0.05, 0.5 and 1.0) as shown in Table 1.

Table 1. Comparison of the four methods for the sphere with various noise levels and various values of θ. The values of α used are those calculated to ensure monotonic convergence.

Noise = 0.05
Method   θ     α       N. iter.  Shrink  MSE
FOWA     0.25  0.250     22      0.990   0.0016
         0.50  0.142     43      0.979   0.0012
         1.00  0.076     88      0.959   0.0021
         5.00  0.016    440      0.822   0.0318
SOWA     0.25  0.027    156      1.001   6.55e-4
         0.50  0.014    313      0.999   4.79e-4
         1.00  0.007    639      0.998   3.89e-4
         5.00  0.001   4067      0.991   4.72e-4

Noise = 0.5
Method   θ     α       N. iter.  Shrink  MSE
FOWA     0.25  0.250     33      0.993   0.266
         0.50  0.142     63      0.988   0.375
         1.00  0.0769   126      0.978   0.840
         5.00  0.0163   644      0.909   8.648
SOWA     0.25  0.027    286      0.998   0.172
         0.50  0.013    585      0.998   0.221
         1.00  0.007   1189      0.998   0.319
         5.00  0.001   6323      0.996   0.921

Noise = 1.0
Method   θ     α       N. iter.  Shrink  MSE
FOWA     0.25  0.2500    33      0.992   0.751
         0.50  0.1428    63      0.987   0.644
         1.00  0.0769   126      0.977   1.002
         5.00  0.0164   644      0.908   8.825
SOWA     0.25  0.0270   287      0.997   0.362
         0.50  0.0137   586      0.997   0.354
         1.00  0.0069  1190      0.997   0.414
         5.00  0.0014  6323      0.995   0.970

Results obtained using the HC algorithm
Noise  α    β    N. iter.  Shrink   MSE
0.05   0.1  0.6    27      1.0009   0.00088
0.50   0.1  0.6    56      0.9985   0.15828
1.00   0.1  0.6    57      0.99718  0.41335

In the above table, the MSE value is the sum of the squares of the error
between the smoothed points and the original sphere. The shrink value is the
average distance of the points to the center of the sphere divided by that of
the noisy sphere. Notice that in all cases when the sum
of the squares of the distances between the vertices and the geometric center
of the adjacent vertices is used the shrinkage is less than 1%. The method that
gives best results is SOWA.
In this table, some results obtained for the same sphere by the HC algorithm [2] are also given. This algorithm requires the tuning of two parameters (α and β); the results shown use the tuning recommended in [2]. The algorithm gives results equivalent to those of our method.
40 Y. Hamam and M. Couprie

Fig. 1. Some results of Laplacian smoothing. a: Original cube, b: cube with added noise, c: after 5 iterations, d: after 10 iterations, e: after 15 iterations.

Fig. 2. a,b,c: Some results using SOWA. a: θ = 0.025, b: θ = 0.05, c: θ = 0.1. d: Algorithm of Vollmer et al. (HC algorithm).

To illustrate the proposed schemes, a cube with a small number of points is used and noise is added. The results of smoothing the noisy cube (see figure 1b) using the algorithms described above are shown in figures 1 and 2.

Computation time. To summarize our experiments on the computation time of the SOWA algorithm, we found that the number of iterations does not vary
significantly with the number of points. Furthermore, the computation time per
point and per iteration is almost constant, around 0.5 μs per vertex with a
standard laptop computer. The algorithm is quasi linear.

References
1. Taubin, G.: A signal processing approach to fair surface design. In: Computer
Graphics Proceedings, Annual Conference Series, pp. 351–358 (1995)
2. Vollmer, J., Mencl, R., Muller, H.: Improved Laplacian smoothing of noisy surface
meshes. Computer Graphics Forum 18(3), 131–138 (1999)
3. Parthasarathy, V., Kodiyalam, S.: A constrained optimization approach to finite
element mesh smoothing. Finite Elements in Analysis and Design 9, 309–320 (1991)
4. Freitag, L.: On combining Laplacian and optimization-based mesh smoothing tech-
niques. In: Joint ASME, ASCE, SES symposium on engineering mechanics in man-
ufacturing processes and materials processing, pp. 37–43 (1997)
5. Amenta, N., Bern, M., Eppstein, D.: Optimal point placement for mesh smoothing.
Journal of Algorithms 30, 302–322 (1999)
6. Bank, R., Smith, R.: Mesh smoothing using a posteriori error estimates. SIAM
Journal on Numerical Analysis 34(3), 979–997 (1997)
7. Freitag, L., Jones, M., Plassmann, P.: A parallel algorithm for mesh smoothing.
SIAM Journal on Scientific Computing 20(6), 2023–2040 (1999)
8. Freitag, L., Knupp, P., Munson, T., Shontz, S.: A comparison of optimization soft-
ware for mesh shape-quality improvement problems. In: Int. Meshing Roundtable,
pp. 29–40 (2002)

9. Chen, Z., Tristano, J., Kwok, W.: Combined Laplacian and optimization-based
smoothing for quadratic mixed surface meshes. In: 12th International Meshing
Roundtable (2003)
10. Ji, Z., Liu, L., Wang, G.: A global Laplacian smoothing approach with feature
preservation. In: Int. Conf. on Computer Aided Design and Computer Graphics,
pp. 269–274 (2005)
11. Nealen, A., Igarashi, T., Sorkine, O., Alexa, M.: Laplacian mesh optimization. In:
ACM GRAPHITE, pp. 381–389 (2006)
12. Bougleux, S., Elmoataz, A., Melkemi, M.: Discrete regularization on weighted
graphs for image and mesh filtering. In: Sgallari, F., Murli, A., Paragios, N. (eds.)
SSVM 2007. LNCS, vol. 4485, pp. 128–139. Springer, Heidelberg (2007)
13. Taubin, G.: Curve and surface smoothing without shrinkage. In: Fifth International
Conference on Computer Vision, pp. 852–857 (1995)
14. Yagou, H., Ohtake, Y., Belyaev, A.: Mesh smoothing via mean and median filtering
applied to face normals. In: Procs. Geometric Modeling and Processing, pp. 124–
131 (2002)
15. Djidjev, H.N.: Force-directed methods for smoothing unstructured triangular and
tetrahedral meshes. In: Ninth International Meshing Roundtable, pp. 395–406
(2000)
16. Mezentsev, A.: A generalized graph-theoretic mesh optimization model. In: 13th
International Meshing Roundtable, pp. 255–264 (2004)
17. Ohtake, Y., Belyaev, A.G., Bogaevski, I.A.: Polyhedral surface smoothing with
simultaneous mesh regularization. In: Procs. Geometric Modeling and Processing,
pp. 229–237 (2000)
18. Ohtake, Y., Belyaev, A.G., Bogaevski, I.A.: Mesh regularization and adaptive
smoothing. Computer-Aided Design 33(11), 789–800 (2001)
19. Ohtake, Y., Belyaev, A., Seidel, H.: Mesh smoothing by adaptive and anisotropic
Gaussian filter applied to mesh normals. In: Vision, Modeling, and Visualization,
pp. 203–210 (2002)
20. Tasdizen, T., Whitaker, R., Burchard, P., Osher, S.: Geometric surface smoothing
via anisotropic diffusion of normals. In: IEEE Visualization 2002, pp. 125–132
(2002)
21. Fleishman, S., Drori, I., Cohen-Or, D.: Bilateral mesh denoising. ACM Transactions
on Graphics 22(3), 950–953 (2003)
22. Desbrun, M., Meyer, M., Schröder, P., Barr, A.H.: Implicit fairing of irregular
meshes using diffusion and curvature flow. In: 26th annual conference on Computer
graphics and interactive techniques, pp. 317–324 (1999)
23. Zhao, H., Xu, G.: Triangular surface mesh fairing via gaussian curvature flow.
Journal of Computational and Applied Mathematics 195(1-2), 300–311 (2006)
24. Strang, G.: Introduction to Linear Algebra. Wellesley-Cambridge Press (2003)
25. Sorkine, O.: Differential representations for mesh processing. Computer Graphics
Forum 25(4), 789–807 (2006); alt. title: Laplacian mesh processing (Eurographics
2005 presentation)
26. Chung, F.R.: Spectral Graph Theory. Amer. Mathematical Society, Providence
(1997)
27. Field, D.A.: Laplacian smoothing and Delaunay triangulations. Communications in
Applied Numerical Methods 4(6), 709–712 (1988)
Graph-Based Representation of Symbolic Musical Data

Bassam Mokbel, Alexander Hasenfuss, and Barbara Hammer

Clausthal University of Technology, Department of Computer Science, Clausthal-Zellerfeld, Germany

Abstract. In this work, we present an approach that utilizes a graph-based representation of symbolic musical data in the context of automatic topographic mapping. A novel approach is introduced that represents melodic progressions as graph structures, providing a dissimilarity measure which complies with the invariances in the human perception of melodies. That way, music collections can be processed by non-Euclidean variants of Neural Gas or Self-Organizing Maps for clustering, classification, or topographic mapping for visualization. We demonstrate the performance of the technique on several datasets of classical music.

1 Introduction
The ever increasing amount of music collections available in online stores or
public databases has created a need for user-friendly and powerful interactive
tools which allow an intuitive browsing and searching of musical pieces.
Ongoing research in the field of music information retrieval thus includes
the adaptation of many standard data mining and retrieval tools to the music
domain. In this regard the topographic mapping and visualization of large music
compilations combines several important features: data and class structures are
arranged in such a way that an inspection of the full dataset as well as an
intuitive motion through partial views of the database become possible.
Generally, there are two basic ways to automatically construct the topographic
arrangement for a mapping: one way – the classical one – is to use a set of features to position each subject in a Euclidean space. Then, since the number
of features is usually much larger than three, the high-dimensional spaces have
to be projected to two or three dimensions for visualization. Here, usually tech-
niques like the linear Principal Components Analysis (PCA) or the non-linear
Self-Organizing Map (SOM) are applied. Unfortunately, it is often not possible
to represent complex data such as symbolic musical data, i.e. sequences of notes,
by a set of Euclidean vectors. Therefore, there is demand for representations
that are capable of capturing complex structures.
A more sophisticated way uses pairwise dissimilarities between all subjects,
which are able to capture more complex structures in the data. But in general, these dis-
similarities are no longer Euclidean distances and there is no embedding into
any Euclidean space without distortion — they may not be metric at all. Hence,
the classical methods cannot be applied in this case. Recently, variants of Neural
Gas and Self-Organizing Maps for dissimilarity datasets have been introduced

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 42–51, 2009.

© Springer-Verlag Berlin Heidelberg 2009
Graph-Based Representation of Symbolic Musical Data 43

that are able to directly handle arbitrary similarity data instead of only simple
Euclidean ones [9].
To make use of those sophisticated non-Euclidean methods, it is essential to
have a dissimilarity measure at hand that provides a reliable pairwise measure-
ment of the complex data. For music, there are many different features that can
be extracted by algorithmic methods, either from acoustic data or a symbolic
description of a musical piece. Global features like the overall tempo, musical
key, pitch transition statistics, dynamics statistics or spectral features can be
measured for a song and used directly for mapping. However, their importance
and level of expression differ in the various styles of music, and thus it
is difficult to find musical feature sets that are generally valid and equally sig-
nificant for every genre. A variety of approaches handling musical data based
on pairwise comparisons have been proposed, including metrics popular in data
mining, such as the cosine distance based on tf×idf weightings of the
basic constituents of the given data, complex mathematical constructions such as
the Hausdorff distance [6,17], or the spectra of an associated graph [16].
To compute a dissimilarity between pieces based instead on their tonal and
rhythmic progression, it is possible to use the temporal progression of features
like rhythmic patterns, note or chord sequences to measure dissimilarity (an
overview of encoding methods can be found in [5]). This can be achieved with
a suitable method to measure string dissimilarity, like the edit distance or the
powerful and universal compression distance. The latter in particular has been
used in this way on symbolic representations of musical data with promising
results in recent years, e.g. in [2,4,13].
Due to the nature of acoustic signals, it usually requires much more effort to
extract high-level features from acoustic audio data than from symbolic music
representations like MIDI^1 or MusicXML^2. But recent progress in developing
efficient and reliable automated extraction methods that are able to obtain musical
notation directly from complex acoustic material, as presented in [11,15,20],
opens the way towards mapping techniques which directly rely on a symbolic
description of musical data.
When defining a similarity measure on sequences of musical notes, certain
invariances should be respected to comply with the average human perception
of melodies: These include invariances to transposition of the notes to a different
key and the scaling of the tempo (further described in [7]).
In the following section, we introduce a new way to convert symbolic rep-
resentations of music into strings via priorly constructed precedence trees. We
show how the string encoding derived from the graph structure as well as the
subsequently applied Normalized Compression Distance (NCD) is beneficial for
the named invariances.
We implemented our method in Matlab^3 and used MIDI files as symbolic input
data. We processed selected subsets of classical pieces from the comprehensive

^1 http://www.midi.org
^2 http://www.musicxml.org
^3 http://www.mathworks.com
44 B. Mokbel, A. Hasenfuss, and B. Hammer

Kunst der Fuge^4 MIDI collection, containing pieces from almost a thousand
composers, spanning various musical forms and epochs. The generated dissimilar-
ities were mapped with a non-metric Multi-Dimensional Scaling with Kruskal’s
normalized stress-1 criterion. The experiments show most of the data arranged
in meaningful clusters with a reasonable separation of composers and eras.

2 Graph-Based Representation of Symbolic Musical Data

To measure the dissimilarity between the tonal and rhythmic progression of two
pieces of music, our method compares string representations derived from their
note succession. We therefore developed an algorithm that converts the symbolic
note sequences in a MIDI file into a string, following a priorly constructed prece-
dence structure. Our algorithmic approach is based on the assumption that a hu-
man's subjective perception of musical identity is usually very context-driven.
We therefore suppose that most listeners will consider a melody similar, to a
certain extent, to a copy of it that has been changed in the following ways:

– It is shifted in its overall pitch, i.e. it has been transposed to another funda-
mental note.
– It is scaled in its overall tempo, i.e. all note lengths and pauses have been
contracted or elongated by a constant factor.

Thus we assume that the human perception of melodies is, to a certain extent,
invariant to overall pitch translation (pitch translation invariance^5) and to an
overall scaling of the tempo (time scaling invariance). To gain a measure that
is close to the human music perception we therefore encode the note sequences
to new symbolic sequences with an encoding method that is invariant to the
aforementioned changes of the note sequences. That means it produces the same
output, whether the input is the original or the altered note sequence. The in-
formation that is not encoded in the new strings is the magnitude by which
the pitch was shifted or the tempo scaled. As the described human assessment
of similarity would probably decrease along with a rise in the magnitude of such
changes, it might be more truthfully described by distinguishing degrees of sim-
ilarity. Although this is not part of our encoding scheme at the moment, it is
easily imaginable to incorporate such information into our measure in the future.
Some related methods that partially provide the emphasized invariances can be
found in the literature, e.g. [4,13,18,19]. In [4], pitch translation invariance
was achieved with a global pitch normalization throughout the entire piece,
making the encoding very sensitive regarding the automatic choice of the global
point of reference. In [13] every note’s pitch was encoded as the difference to the
pitch of its directly preceding note. In addition to independence of the overall
pitch, this method yields local separation: parts in the strings are equal for
^4 http://www.kunstderfuge.com
^5 Also referred to as pitch invariance or transposition invariance. The terms differ in the literature; we use the ones described in [7].

parts of two songs that, aside from transposition, have equal note sequences,
even if the rest of these songs is completely different. Using common string
dissimilarity measures on those two representations would therefore reflect this
partial equality in their output values. In addition, one could store note lengths
and pauses analogously to gain time scaling invariance. Still, in these strings
one will find only very little equality in the case of two songs playing the same
melody (like a riff or a theme), only with dissimilar accompanying notes. Then
the output of a string dissimilarity measure would in our opinion not represent
most listeners’ assessments.
Our goal for the generated string representation was therefore a decomposi-
tion of the tonal and rhythmic progression of a song that has the benefit of local
points of reference, but, on top of that, represents melodic lines and themes more
independently of the surrounding melodic context. Our strategy is to automati-
cally define precedence relationships between notes throughout the entire piece:
The functions pitch(n), start(n) and length(n) return the absolute pitch and the
normalized start time and length of a note n respectively. For every note n in
the sequence of notes N played in the song, the algorithm picks one designated
predecessor. For the current note cn ∈ N , the function pred(cn) returns one of
its time-wise preceding notes on the same MIDI channel (the sequence P (cn))
as the predecessor note.
We define

    pred(cn) = argmin_{p ∈ P(cn)} [ k · |pitch(p) − pitch(cn)| + r · (start(cn) − start(p)) / length(cn) ]

with P(n) = { x ∈ N : start(x) < start(n) } and p ∈ P(cn), cn ∈ N.


The predecessor is thereby chosen to be the closest prior note in terms of the
absolute difference in start time and in pitch. Since the time-wise distance is
calculated relative to the current note's length, the field of search is stretched
or shrunk in the time dimension depending on its length. As an alternative
strategy one could also consider stop times of prior notes instead of their start
times. The global parameters k and r determine the overall search strategy.
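As an illustrative sketch (in Python rather than the authors' Matlab; the dictionary-based note representation and the default weights k = r = 1 are assumptions of the example), the predecessor search reads:

```python
def pred(cn, notes, k=1.0, r=1.0):
    """Pick the predecessor of note cn among the notes that start earlier.

    Each note is a dict with 'pitch', 'start' and 'length' (normalized
    units). Channel filtering is omitted here; the paper restricts the
    candidates P(cn) to the same MIDI channel. Returns None if no note
    starts before cn.
    """
    candidates = [p for p in notes if p["start"] < cn["start"]]
    if not candidates:
        return None

    def cost(p):
        # weighted distance in pitch and in length-relative start time
        return (k * abs(p["pitch"] - cn["pitch"])
                + r * (cn["start"] - p["start"]) / cn["length"])

    return min(candidates, key=cost)
```

Linking every note to its pred(note) produces exactly the forest of precedence trees discussed next.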
From a graph-theoretical point of view the resulting precedence structures
form a forest of weighted trees whose vertices represent the played notes. Ex-
amples are shown in Figures 1 and 4. The trees are then utilized to store the
change in pitch and length as well as the relative difference in note start times
at every edge along every path of every tree. For a note n the named changes
are calculated and stored in relation to its predecessor pr = pred(n) as follows:
    relpitch(n) = pitch(n) − pitch(pr),
    reltiming(n) = (start(n) − start(pr)) / length(pr),
    rellength(n) = length(pr) / length(n).
In addition to rellength(n), the value of reltiming(n) adds some more information
about the rhythmic expression to the string representation, as it also encodes
the existence and length of pauses or overlaps between the notes. Since the
start times and note durations are being normalized beforehand, the results
from rellength(n) and reltiming(n) stay the same for varying musical tempo

[Figure: pitch vs. time plot of the precedence forest; edge labels give pitch changes]

Fig. 1. The precedence tree structure for the first 60 notes of Beethoven's "Für Elise". The edges are marked with all nonzero pitch changes of notes relative to their predecessors.

notations that represent the same musical output with different combinations of
overall tempo and note durations.
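Assuming the same kind of dictionary-based note representation (an illustrative sketch, not the authors' code), the three relative features follow directly from the definitions, and a transposed, uniformly time-scaled copy of a piece produces identical values:

```python
def relative_features(n, pr):
    """Relative pitch, timing and length of note n w.r.t. predecessor pr,
    per the definitions relpitch, reltiming and rellength in the text."""
    relpitch = n["pitch"] - pr["pitch"]
    reltiming = (n["start"] - pr["start"]) / pr["length"]
    rellength = pr["length"] / n["length"]
    return relpitch, reltiming, rellength
```

Because only differences of pitches and ratios of times appear, adding a constant to every pitch or multiplying every start time and length by a constant factor leaves the output unchanged.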
Considering the precedence structure, a small, local change in the musical
progression will subsequently cause it to alter locally, resulting in only a
local symbolic change of the string representation. To explain the benefit of
this behavior with a practical example, imagine a bass line which is, due to its
lower pitch, isolated in the tonal progression. Its melody would be represented
independently in the string, unaffected by changes of higher lead melodies played
at the same time on the same MIDI channel. In this way any two melodies
will have independent representations within the string as long as their note
sequences are sufficiently separated in pitch locally.
To sum up, our encoding method is fully pitch translation invariant and
time scaling invariant on an entire song but also shows highly invariant behavior
upon changes to certain subsets of notes and hence to variations of melodies. It
offers a structural decomposition for the representation of polyphonic music or
polyphonic instrument tracks.
To calculate the dissimilarity of the string representations, we used the pop-
ular Normalized Compression Distance (NCD) (see [3]), a measure based on
approximations of the Kolmogorov Complexity from algorithmic information
theory described in [12]. The NCD is defined as

    NCD(x, y) = ( C(xy) − min{C(x), C(y)} ) / max{C(x), C(y)}

where x and y are strings, C(x) denotes the compressed size of x and C(xy) the
compressed size of the concatenation of x and y using a real compressor. For our
experiments the bzip2 compression method was used. Since bzip2, like most
common compression methods, operates byte-wise, the size of a reasonable set
of symbols is restricted to 2^8. Therefore, we utilize the integer values in [1..255]
to code every possible relative state change of a note compared to its predeces-
sor. For every musical piece, we automatically build two strings, one that holds
the pitch changes relpitch(n) for every note n and another one for the rhythmic
progression. The latter is compiled from rellength(n) and reltiming(n), result-
ing in a string which is twice as long as the one representing the pitches. The
dissimilarity of two songs is then calculated as the mean value of the normalized
NCD of the pitch strings and the normalized NCD of the rhythmic strings. If one
disregards the possibility to change the overall tuning of a MIDI file or the use
of pitch-bending and vibrato while notes are played, MIDI distinguishes at most
128 different note pitches [0..127]. So our byte representation has to cover a total
of 255 possible values of relative pitch change, 127 upwards and 127 downwards
plus one for ’no change’ which in total is presentable with one byte. Our code
thus indicates decreasing pitches with the values [1..127], ’no change’ with 128
and increases with [129..255]. To encode the fractions of note durations given by
rellength, our algorithm maps all occurring numeric values to 127 real-valued in-
tervals that are centered around rhythmically important ratio values. These are
all the possible fractions between the lengths of a whole note, a half, a quarter,
an eighth, a 1/16th, a 1/32nd and a 1/64th note, as well as the corresponding triplets
and five-lets in between. The encoding thereby treats a 1/64th note followed by a
whole note as the farthest upward change in note length, and vice versa for the
farthest downward change.
This way, very steep transitions of note lengths which exceed this magnitude
are all being treated as equal maximal steps as they are assigned to the same
interval. The resulting unique ratios consist of 63 values that are less than 1
(meaning the duration of the actual note is longer than the predecessor’s), an-
other 63 larger than 1 (the actual note is shorter than the predecessor), and the
ratio equal to 1 (durations are equal). These 127 ratios are representable with
half of the available byte values. The other byte values are used analogously as symbols to encode the
values of reltiming.
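A hedged sketch of the byte coding and the resulting dissimilarity, with Python's bz2 module standing in for the bzip2 compressor (the function names are invented for this illustration, not taken from the authors' implementation):

```python
import bz2

def encode_relpitch(deltas):
    """Map relative pitch changes in [-127, 127] to the byte values
    [1, 255], with 128 denoting 'no change', as described above.
    Out-of-range changes are clipped to the extremes."""
    return bytes(128 + max(-127, min(127, d)) for d in deltas)

def ncd(x, y):
    """Normalized Compression Distance between two byte strings."""
    cx, cy = len(bz2.compress(x)), len(bz2.compress(y))
    cxy = len(bz2.compress(x + y))
    return (cxy - min(cx, cy)) / max(cx, cy)

def song_dissimilarity(pitch_a, rhythm_a, pitch_b, rhythm_b):
    """Mean of the pitch-string NCD and the rhythm-string NCD."""
    return 0.5 * (ncd(pitch_a, pitch_b) + ncd(rhythm_a, rhythm_b))
```

A song compared with itself yields an NCD close to 0, while unrelated strings score noticeably higher.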
Apart from information about the instrumentation, our coding scheme disre-
gards musical phrasings like pitch-bending, vibrato, etc. We are currently work-
ing on an algorithm to correctly convert seamless pitch transitions (i.e. bendings)
to discrete notes. After the conversion, our algorithm would treat the emulated
pitch-bending like normal notes. This opens the way to further testing our dissimi-
larity measure with datasets of mainstream and popular music. In popular music,
phrasings are usually very important in the melodic progression, especially in
the notations of the vocal/singing voices.

3 Experiments
We implemented our algorithms in Matlab 7.5 and used the Matlab MIDI
Toolbox [8] to read the MIDI files. To show the performance of the introduced

dissimilarity measure, we used Multi-Dimensional Scaling to embed the dissimilarities into the Euclidean plane for visualization.
First, we measured a dataset taken from [4] and projected the calculated dis-
similarities into the Euclidean plane by non-metric Multi-Dimensional Scaling

[Figure: 2D embedding with points labeled by composer (Bach, Beethoven, Buxtehude, Chopin, Debussy, Haydn, Mozart)]

Fig. 2. Performance of the Dissimilarity Measure on a Classical Piano Dataset

[Figure: 2D embedding with points labeled Beethoven, Haydn, and Mozart]

Fig. 3. Dissimilarity-based Mapping of Symphonies of Beethoven, Haydn, and Mozart – the symphonies have been separated into movements

[Figure: two pitch vs. time precedence-tree plots sharing the same lead melody; edge labels give pitch changes]

Fig. 4. Graph-based Representations of the Song Happy Birthday – Similar Lead Melodies with Different Accompaniments

with Kruskal’s normalized stress1 criterion. The dataset consists of 63 classical


pieces of piano music originating from seven composers (Bach, Beethoven, Bux-
tehude, Chopin, Debussy, Haydn, Mozart). As can be seen in Figure 2 most of
the data is arranged in meaningful clusters. Obviously, the dissimilarity measure
distinguishes between the different composers, though it is solely based on the
dissimilarities in tonal and rhythmic progressions.
The second experiment shows a dissimilarity-based mapping of symphonies
from the classical period. Symphonies by Beethoven, Haydn, and Mozart were
measured by the method introduced above. The symphonies have been separated
into movements beforehand such that each of the 190 data points corresponds
to a movement of a symphony. The mapping of the data is shown in Figure 3.
The three clusters are clearly recognizable, two clusters are well separated and
the third one (Haydn) is in between. This very promising result demonstrates
the abilities of the new graph-based measure, since it is consistent with the
music-historical setting.

4 Summary and Outlook


In this work, we have introduced a novel approach for measuring the similarity of
symbolic music. The presented approach features a graph-based representation
and derives a dissimilarity measure thereof. These dissimilarities can directly be
processed by topographic mapping techniques such as the Relational Self-Organizing
Map or Relational Neural Gas (cf. [10]). The underlying graph precedence structure is
built on pitch transitions and relative timings. We demonstrated the performance
of the measure on several datasets from classical music.
In the near future we want to further test our method on very large datasets
of different musical styles, especially with popular and mainstream music. To
compare its performance and speed with existing approaches, we plan to use
test datasets for music classification.
Furthermore, we want to implement alternative methods, other than the com-
pression metric, to measure the similarity of the strings derived from the graphs.
Moreover, we are currently working on the application of graph similarity mea-
sures with graph kernels as presented in [14].
By combining our system with a prior automatic conversion of audio data to
note sequences, as presented in [11,20,15], many further applications are possible.
One obvious extension would be a ’query by humming’ database search system,
as e.g. in [1]. By preprocessing the entire dataset with a topographic mapping
method for non-Euclidean data like the Relational Self-Organizing Map, an effec-
tive similarity-based search technique is possible that narrows down the search
space.
To speed up the calculation of the dissimilarity measure, it might be sufficient
to calculate the dissimilarities only between the most-significant paths — the
paths with the highest entropy in pitch variations. Those are most likely to be
the melodic progressions that have a lead role within the arrangement.
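One plausible reading of "entropy in pitch variations" is the Shannon entropy of the distribution of relative pitch changes along a path; a small hedged sketch (our illustration, not a method from the paper):

```python
import math
from collections import Counter

def pitch_entropy(relpitches):
    """Shannon entropy (in bits) of the relative pitch changes along a
    path; monotone or repetitive paths score 0, varied lines score higher."""
    counts = Counter(relpitches)
    total = len(relpitches)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Ranking the root-to-leaf paths of the precedence trees by such a score would favor candidate lead melodies over repetitive accompaniment.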

References
1. Birmingham, W., Dannenberg, R., Pardo, B.: Query by humming with the vo-
calsearch system. Commun. ACM 49(8), 49–52 (2006)
2. Cataltepe, Z., Yaslan, Y., Sonmez, A.: Music genre classification using midi and
audio features. EURASIP Journal on Advances in Signal Processing, Article ID
36409 (2007)
3. Cilibrasi, R., Vitányi, P.: Clustering by compression. IEEE Transactions on Infor-
mation Theory 51(4), 1523–1545 (2005)
4. Cilibrasi, R., Vitányi, P., de Wolf, R.: Algorithmic clustering of music based on
string compression. Computer Music Journal 28(4), 49–67 (2004)
5. Cruz-Alcázar, P.P., Vidal, E.: Two grammatical inference applications in music
processing. Applied Artificial Intelligence 22(1&2), 53–76 (2008)
6. Di Lorenzo, P., Di Maio, G.: The hausdorff metric in the melody space: A new
approach to melodic similarity. In: ICMPC (2006)
7. Dorrell, P.: What Is Music?: Solving a Scientific Mystery. Lulu (print on demand)
(2005)
8. Eerola, T., Toiviainen, P.: MIDI Toolbox: MATLAB Tools for Music Research.
University of Jyvaskyla (2004)
9. Hammer, B., Hasenfuss, A.: Relational neural gas. In: Hertzberg, J., Beetz, M.,
Englert, R. (eds.) KI 2007. LNCS (LNAI), vol. 4667, pp. 190–204. Springer, Hei-
delberg (2007)
10. Hammer, B., Hasenfuss, A.: Relational neural gas. In: Hertzberg, J., Beetz, M.,
Englert, R. (eds.) KI 2007. LNCS (LNAI), vol. 4667, pp. 190–204. Springer, Hei-
delberg (2007)
11. Klapuri, A., Davy, M. (eds.): Signal Processing Methods for Music Transcription.
Springer, New York (2006)
12. Li, M., Vitányi, P.: An Introduction to Kolmogorov Complexity and Its Applica-
tions. Springer, Heidelberg (1997)
13. Londei, A., Loreto, V., Belardinelli, M.O.: Musical style and authorship catego-
rization by informative compressors. In: Proc. ESCOM Conference (2003)
14. Neuhaus, M., Bunke, H.: Bridging the Gap Between Graph Edit Distance and
Kernel Machines. World Scientific, Singapore (2007)
15. Pardo, B., Birmingham, W.P.: Algorithms for chordal analysis. Computer Music
Journal 26(2), 27–49 (2002)
16. Pinto, A., van Leuken, R.H., Demirci, M.F., Wiering, F., Veltkamp, R.C.: Index-
ing music collection through graph spectra. In: Proceedings of the International
Conference of Music Information Retrieval (2007)
17. Romming, C.A., Selfridge-Field, E.: Algorithms for polyphonic music retrieval:
The hausdorff metric and geometric hashing. In: Proceedings of the International
Conference of Music Information Retrieval (2007)
18. Ruppin, A., Yeshurun, H.: Midi music genre classification by invariant features. In:
Proceedings of the International Conference of Music Information Retrieval (2006)
19. Ukkonen, E., Lemström, K., Maekinen, V.: Geometric algorithms for transposi-
tion invariant content-based music retrieval. In: Proceedings of the International
Conference of Music Information Retrieval (2003)
20. Woodruff, J., Pardo, B.: Using pitch, amplitude modulation, and spatial cues for
separation of harmonic instruments from stereo music recordings. EURASIP Jour-
nal on Advances in Signal Processing, Article ID 86369 (2007)
Graph-Based Analysis of Nasopharyngeal
Carcinoma with Bayesian Network
Learning Methods

Alex Aussem^1, Sergio Rodrigues de Morais^1, Marilys Corbex^2, and Joël Favrel^1

^1 University of Lyon, LIESP, Université de Lyon 1, F-69622 Villeurbanne, France
aaussem@univ-lyon1.fr
^2 International Agency for Research on Cancer (IARC),
150 cours Albert Thomas, F-69280 Lyon Cedex 08, France
CORBEXM@emro.who.int

Abstract. In this paper, we propose a new graphical framework for
extracting the relevant dietary, social and environmental risk factors
that are associated with an increased risk of Nasopharyngeal Carcinoma
(NPC) based on a case-control epidemiologic study. This framework
builds on the use of Bayesian networks (BNs) for representing statistical de-
pendencies between the random variables. A BN is a directed acyclic graph
that models the joint multivariate probability distribution underlying
the data. These graphical models are highly attractive for their ability to
describe complex probabilistic interactions between variables. The graph
provides a statistical profile of the recruited population and at the same
time helps identify a subset of features that are most relevant for probabilistic
classification of NPC. We report experimental results with the NPC
case-study data using a novel constraint-based BN structure learning al-
gorithm. We show how the DAG provides an improved comprehension
of NPC etiology. Our findings are compared with the risk factors that
were suggested in the recent literature in cancerology.

1 Introduction
The identification of relevant subsets of risk factors that are not captured by
traditional statistical testing is a topic of considerable interest within the epi-
demiologic community. It is also a very challenging topic of pattern recognition
research that has attracted much attention in recent years [1,2]. In this study,
we apply a new graphical framework for extracting the relevant risk factors that
are statistically associated with Nasopharyngeal Carcinoma (NPC) based on
a case-control epidemiologic study. The database consists of 1289 subjects (664
cases of NPC and 625 controls) and 150 binary random variables.
Nasopharyngeal Carcinoma (NPC for short) is a malignancy with unusually
variable incidence rates across the world. In most parts of the world it is a
rare disease but in some regions it occurs in an endemic form. Endemic regions
include the southern parts of China, other parts of south-east Asia and North

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 52–61, 2009.

© Springer-Verlag Berlin Heidelberg 2009

Africa. In these countries it is a major public health problem. Etiology of NPC
is still poorly understood; many factors seem to be involved (diet, lifestyle,
genetics), which complicates the work of epidemiologists. Detecting NPC patients
using a machine learning approach would be advantageous in many situations.
To our knowledge, no statistical methods, except logistic regression, have been
developed yet to support the epidemiologists in their analysis of NPC through
case-control studies.
In this paper, we discuss a recursive procedure that builds a local graph
that includes all the relevant features statistically associated with the target vari-
able, without having to find the whole BN first. It is based on an incremental
constraint-based method called Hybrid Parent and Children (HPC) for learning
the parents and children of a target variable in the graph. Like all constraint-
based (CB) methods, the procedure systematically checks the data for condi-
tional independence relationships and uses those relationships to infer the parents
and children of the target variable. HPC is run recursively on the adjacent nodes
of NPC, our target variable, in order to establish a local graph in the neighbor-
hood of the target. A similar procedure has been presented in [3]. Once the graph
is constructed, it is straightforward to extract the relevant features for predic-
tion purposes. The key advantage of this constraint-based learning procedure is
to find the feature subset that is jointly the most associated with the disease
directly, without having to find the whole BN first. CB methods like HPC were
shown to be among the top-ranking entrants in the recent "WCCI2008 Causa-
tion and Prediction Challenge", as noted in [4]. In addition, HPC has been shown
in [5,6,7] to outperform the latest learning proposal discussed in detail in [2], in
terms of accuracy on data sets scaling up to tens of thousand variables with
small sample sizes.
In this study, special emphasis is placed on integrating domain knowledge and
statistical data analysis. Once the graph skeleton is constructed from data, it is
afterwards directed by the domain expert according to his causal interpretation
and additional latent variables are added to the graph for the sake of clarity, coher-
ence and conciseness. The graphical representation provides a statistical profile
of the recruited population, and at the same time helps identify the important
risk factors involved in NPC. It is therefore greatly appreciated by epidemiologists. We
compare our findings with the results obtained with the same data using more
traditional logistic regression models and published recently in the cancerology
literature.

2 Preliminaries
For the paper to be accessible to those outside the domain, we first recall the
principles of Bayesian networks. In this paper, we only deal with discrete random
variables. Formally, a BN is a tuple < G, P >, where G = < V, E > is a directed
acyclic graph (DAG) with nodes representing the random variables V and P a
joint probability distribution on V. A BN structure G entails a set of conditional
independence assumptions. They can all be identified by the d-separation crite-
rion [8]. We use X ⊥G Y |Z to denote the assertion that X is d-separated from

Y given Z in G. Formally, X ⊥G Y |Z is true when for every undirected path in
G between X and Y , there exists a node W in the path such that either (1) W
does not have two parents in the path and W ∈ Z, or (2) W has two parents
in the path and neither W nor its descendants are in Z. If < G, P >
is a BN, then X ⊥P Y |Z if X ⊥G Y |Z. The converse does not necessarily hold. We
say that < G, P > satisfies the faithfulness condition if the d-separations in G
identify all and only the conditional independencies in P , i.e., X ⊥P Y |Z iff
X ⊥G Y |Z.
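The criterion can be checked mechanically on small graphs. The following Python sketch (an illustration of the definition only, not part of any algorithm in this paper) enumerates the undirected simple paths between x and y and applies the two blocking conditions:

```python
from itertools import chain as ichain

def d_separated(edges, x, y, z):
    """True iff x and y are d-separated given set z in the DAG defined by
    (parent, child) pairs: every undirected path must be blocked at some
    node W, i.e. W is a non-collider in z, or a collider such that
    neither W nor any descendant of W lies in z."""
    nodes = set(ichain.from_iterable(edges))
    parents = {n: {p for p, c in edges if c == n} for n in nodes}
    children = {n: {c for p, c in edges if p == n} for n in nodes}
    neighbours = {n: parents[n] | children[n] for n in nodes}

    memo = {}
    def descendants(n):
        # all nodes reachable from n along directed edges
        if n not in memo:
            memo[n] = set()
            for c in children[n]:
                memo[n] |= {c} | descendants(c)
        return memo[n]

    def paths(cur, visited):
        # all simple undirected paths from cur to y
        if cur == y:
            yield [cur]
            return
        for nxt in neighbours[cur] - visited:
            for rest in paths(nxt, visited | {nxt}):
                yield [cur] + rest

    for path in paths(x, {x}):
        blocked = False
        for i in range(1, len(path) - 1):
            w = path[i]
            collider = path[i - 1] in parents[w] and path[i + 1] in parents[w]
            if collider and w not in z and not (descendants(w) & z):
                blocked = True
                break
            if not collider and w in z:
                blocked = True
                break
        if not blocked:
            return False  # an active path exists
    return True
```

For the classic collider A → C ← B, the function reports A and B as d-separated by the empty set, while conditioning on C, or on a descendant of C, activates the path.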
A Markov blanket M_T of T is any set of variables such that T is condition-
ally independent of all the remaining variables given M_T. A Markov boundary,
MB_T, of T is any Markov blanket such that none of its proper subsets is a
Markov blanket of T.
Theorem 1. Suppose < G, P > satisfies the faithfulness condition. Then X and
Y are not adjacent in G iff ∃Z ⊆ U \ {X, Y } such that X ⊥ Y |Z. Moreover,
for all X, the set of parents, children of X, and parents of children of X is the
unique Markov boundary of X.
A proof can be found for instance in [9]. We denote by PC_T^G the set of parents
and children of T in G, and by SP_T^G the set of spouses of T in G. The spouses
of T are the parents of the children of T . These sets are unique for all G such
that < G, P > is faithful, and so we will drop the superscript G. We denote by
dSep(X) the set that d-separates X from the (implicit) target T .
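Under the theorem, the Markov boundary can be read directly off a faithful DAG as parents ∪ children ∪ spouses; a minimal sketch (the edge-list representation is an assumption of this example):

```python
def markov_boundary(edges, t):
    """Markov boundary of node t in a DAG given as (parent, child)
    pairs: the parents, children, and parents of children (spouses)."""
    parents = {p for p, c in edges if c == t}
    children = {c for p, c in edges if p == t}
    spouses = {p for p, c in edges if c in children and p != t}
    return parents | children | spouses
```

This graph-side quantity, PC_T plus the spouses SP_T, is what constraint-based learners such as HPC approximate from data.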
A structure learning algorithm from data is said to be correct (or sound)
if it returns the correct DAG pattern (or a DAG in the correct equivalence
class) under the assumptions that the independence tests are reliable and that
the learning database is a sample from a distribution P faithful to a DAG G.
The (ideal) assumption that the independence tests are reliable means that they
decide (in)dependence iff the (in)dependence holds in P . The problem of learning
the most probable a posteriori Bayesian network (BN) from data is worst-case
NP-hard [10]. This challenging topic of pattern recognition has attracted much
attention over the last few years.

3 The Hybrid Parent and Children Algorithm


In this section, we present a new hybrid algorithm called hybrid parents and
children algorithm (HPC), for learning a graph skeleton from a database D. It is
hybrid in that HPC combines the benefits of incremental and divide-and-conquer
methods, while their respective drawbacks are reduced. HPC was designed in
order to endow the search procedure with the ability to: 1) handle efficiently
data sets with thousands of variables but comparably few instances, 2) be correct
under the faithfulness condition, and most importantly, 3) remain efficient when the
number of adjacent nodes is large.

Algorithm 1. Inter-IAPC
Require: T : target; D: data set; U: set of variables
Ensure: PC: parents and children of T
1: MB ← ∅
2: repeat
3: Add true positives to MB
4: Y ← argmax_{X ∈ (U \ MB \ T)} dep(T, X | MB)
5: if T ⊥̸ Y | MB then
6: MB ← MB ∪ Y
7: end if
Remove false positives from MB
8: for all X ∈ MB do
9: if T ⊥ X | (MB \ X) then
10: MB ← MB \ X
11: end if
12: end for
13: until MB has not changed
Remove parents of children from MB
14: PC ← MB
15: for all X ∈ MB do
16: if ∃Z ⊂ (MB \ X) s.t. T ⊥ X | Z then
17: PC ← PC \ X
18: end if
19: end for

Algorithm 2. HPC
Require: T : target; D: data set; U: set of variables
Ensure: PC: parents and children of T
Phase I: Remove X if T ⊥ X
1: PCS ← U \ T
2: for all X ∈ PCS do
3: if (T ⊥ X) then
4: PCS ← PCS \ X
5: dSep(X) ← ∅
6: end if
7: end for
Phase II: Remove X if T ⊥ X | Y
8: for all X ∈ PCS do
9: for all Y ∈ PCS \ X do
10: if (T ⊥ X | Y ) then
11: PCS ← PCS \ X
12: dSep(X) ← Y ; go to 15
13: end if
14: end for
15: end for
Phase III: Find a superset SPS for the spouses SP
16: SPS ← ∅
17: for all X ∈ PCS do
18: SPS_X ← ∅
19: for all Y ∈ U \ {T ∪ PCS} do
20: if (T ⊥̸ Y | dSep(Y ) ∪ X) then
21: SPS_X ← SPS_X ∪ Y
22: end if
23: end for
24: for all Y ∈ SPS_X do
25: for all Z ∈ SPS_X \ Y do
26: if (T ⊥ Y | X ∪ Z) then
27: SPS_X ← SPS_X \ Y ; go to 30
28: end if
29: end for
30: end for
31: SPS ← SPS ∪ SPS_X
32: end for
Phase IV: Find the PC of T
33: PC ← Inter-IAPC(T, D(PCS ∪ SPS))
34: for all X ∈ PCS \ PC do
35: if T ∈ Inter-IAPC(X, D) then
36: PC ← PC ∪ X
37: end if
38: end for
56 A. Aussem et al.

3.1 Inter-IAPC Algorithm

HPC(T) is based on a subroutine called Interleaved Incremental Association
Parents and Children, Inter-IAPC(T). Inter-IAPC(T) is a fast incremental
method that receives a target node T as input and promptly returns a rough
estimate of PC_T. It is based on the Inter-IAMB algorithm [11]. The algorithm
starts with a two-phase approach to infer a candidate set for MB_T. A growing
phase attempts to add the most dependent variables to T, followed by a
shrinking phase that attempts to remove as many irrelevant variables as
possible. The shrinking phase is interleaved with the growing phase. The
function dep(T, X|MB) at line 4 returns a statistical estimate of the
association between T and X given the current set MB. Interleaving the two
phases allows the algorithm to eliminate some of the false positives in the
current blanket as it progresses through the growing phase. PC_T is obtained
by removing the spouses of the target from MB_T (lines 14-19).
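The grow/shrink loop just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `dep` and `indep` are hypothetical callables standing in for the statistical association measure and the (in)dependence test.

```python
from itertools import chain, combinations

def inter_iapc(T, variables, dep, indep):
    """Sketch of Inter-IAPC. `dep(T, X, MB)` returns an association
    score between T and X given MB; `indep(T, X, Z)` returns True when
    T and X are judged independent given the conditioning set Z."""
    MB = set()
    changed = True
    while changed:
        changed = False
        # Growing phase: add the variable most associated with T given MB.
        candidates = [X for X in variables if X != T and X not in MB]
        if candidates:
            Y = max(candidates, key=lambda X: dep(T, X, MB))
            if not indep(T, Y, MB):
                MB.add(Y)
                changed = True
        # Interleaved shrinking phase: drop false positives from MB.
        for X in list(MB):
            if indep(T, X, MB - {X}):
                MB.remove(X)
                changed = True
    # Remove spouses: X is kept in PC only if no subset of MB \ {X}
    # d-separates X from T (lines 14-19 of Algorithm 1).
    PC = set(MB)
    for X in MB:
        rest = list(MB - {X})
        subsets = chain.from_iterable(
            combinations(rest, r) for r in range(len(rest) + 1))
        if any(indep(T, X, set(Z)) for Z in subsets):
            PC.discard(X)
    return PC
```

With an oracle in which A is dependent on T in every context and B is always independent, the sketch returns {A}, as expected.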
While Inter-IAPC(T) is very fast and sound (it overcomes the problem illus-
trated in Fig. 1), it is considered data-inefficient in [2] because the number of
instances required to identify PC_T is at least exponential in the size of MB_T
(owing to the conditioning sets at lines 9 and 16). Note that independence is
usually assumed when the data are insufficient to perform the test reliably. In
our implementation, for instance, the test is deemed unreliable when the number
of instances is less than ten times the number of degrees of freedom of the
test. Therefore, if the number of nodes adjacent to T is too large compared to
the number of instances, the growing phase will end before all the variables
enter the candidate set. We will see next how HPC(T) combines several runs of
Inter-IAPC(T) to alleviate this data inefficiency.

3.2 HPC Algorithm

HPC(T) receives a target node T as input and returns an estimate of PC_T.
It implements a divide-and-conquer strategy in order to improve the data
efficiency of the search, while still being scalable and correct under the
faithfulness condition. HPC(T) works in four phases and uses Inter-IAPC(T) as
a subroutine. In Phases I and II, HPC(T) constructs a superset of the parents
and children, to reduce as much as possible the number of variables before
proceeding further; the size of the conditioning set Z in the tests is severely
restricted: card(Z) ≤ 1 (at lines 3 and 10). In Phase III, a superset of the
parents of the children of T is built with card(Z) ≤ 2 (at lines 20 and 26).
Phase IV finds the parents and children within the superset of the PC, using
the OR operator. The rule is as follows: X ∈ PC_T iff [X ∈ LearnPC(T)] OR
[T ∈ LearnPC(X)]. Therefore, all variables that have T in their vicinity are
included in PC_T. As discussed in more detail in [6,7], the OR operator is one
of the key advantages of the algorithm compared to GetPC [2] and MMPC [11],
which use the AND operator instead. By loosening the criterion by which two
nodes are said to be adjacent, the effective restrictions on the size of the
neighborhood become far less severe. As we will see, this simple "trick" has a
significant impact on the accuracy of HPC. It enables
the algorithm to handle large neighborhoods while still being correct under faith-
fulness condition. The theorem below (see [6] for the proof) establishes HPC’s
correctness under faithfulness condition:
Theorem 2. Under the assumptions that the independence tests are reliable
and that the database is an independent and identically distributed sample from
a probability distribution P faithful to a DAG G, HPC(T) returns PC_T.
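The OR rule of Phase IV can be illustrated in isolation. A minimal sketch, assuming a hypothetical local learner `learn_pc(X)` (Inter-IAPC in the paper) that returns the estimated parent-children set of X:

```python
def or_correction(T, candidates, learn_pc):
    """Symmetric OR rule of HPC's Phase IV: X is declared adjacent to T
    if X is in T's local PC estimate OR T is in X's estimate."""
    pc = set(learn_pc(T))
    for X in candidates:
        if X not in pc and T in learn_pc(X):
            pc.add(X)  # rescued by the OR operator
    return pc
```

Here B, missed by T's own local run, is recovered because T appears in B's local estimate; an AND rule would discard it.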

4 Experiments

Before we proceed to the experiments with HPC on the NPC database, we
report some results on synthetic data consisting of independent and identically
distributed samples from the well-known BN benchmarks ALARM, CHILD,
INSURANCE, GENE and PIGS. The aim is to evaluate empirically the inevitable
errors that will arise on our epidemiologic data. We therefore use the same
sample size as the NPC data, so as to obtain an empirical estimate of the
accuracy of HPC on the NPC data. To implement the conditional independence
test, we calculate the G² statistic as in [12], under the null hypothesis of
conditional independence. The significance level of the test is fixed at 0.05
for all algorithms. The test is deemed unreliable when the number of instances
is less than ten times the number of degrees of freedom.
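As an illustration of the statistic and the reliability heuristic just described, here is a minimal sketch for the unconditional two-variable case (the paper uses the conditional version of the G² test from [12]):

```python
import math

def g2_test(table):
    """G^2 (log-likelihood ratio) statistic for independence of two
    discrete variables, from an r x c contingency table of counts.
    Also applies the heuristic that deems the test unreliable when
    the sample size is below ten times the degrees of freedom."""
    rows = [sum(r) for r in table]
    cols = [sum(c) for c in zip(*table)]
    n = sum(rows)
    g2 = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            expected = rows[i] * cols[j] / n
            if obs > 0:
                g2 += 2.0 * obs * math.log(obs / expected)
    df = (len(table) - 1) * (len(table[0]) - 1)
    reliable = n >= 10 * df
    return g2, df, reliable
```

A uniform table yields G² = 0 (perfect independence), while a diagonal table yields a large positive statistic; in both 2 × 2 cases df = 1, so at least 10 instances are required for the test to be considered reliable.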

4.1 Experimental Validation on Synthetic Data


In this section, we report the results of our experiments on five common
benchmarks (ALARM, CHILD, GENE, INSURANCE and PIGS; see [11] and
references therein). While HPC was initially designed to output the PC set, it
can easily be extended to add the parents of the children to the PC set,
thereby yielding the Markov boundary of the target. The modified version of
HPC, called MBOR, is not described here for conciseness; it is discussed in
detail in [6,7]. The task is to compare the features output by MBOR against
the true features in the Markov boundary, in terms of missing and extra
features. For each benchmark, we sampled 200 data sets containing the same
number of samples as our NPC database, namely 1289 samples.
To evaluate the accuracy, we combine precision (i.e., the number of true posi-
tives in the output divided by the number of nodes in the output) and recall
(i.e., the number of true positives divided by the true size of the Markov
boundary) as √((1 − precision)² + (1 − recall)²), which measures the Euclidean
distance from perfect precision and recall, as proposed in [2]. Figure 1
summarizes the distribution of the Euclidean distance over the 200 data sets
in the form of triplets of boxplots, one for each algorithm (from left to
right: PCMB, Inter-IAMB and MBOR), versus the number of instances. The figure
shows the distance distribution for the nodes with the largest MBs in the BN.
Several observations can be made. The advantages of MBOR over the other two
algorithms are noticeable. As may be seen on PIGS and GENE, for instance,
MBOR consistently outperforms the other algorithms. This is not a surprise:
the largest MBs in PIGS and GENE
[Figure: triplets of boxplots of the Euclidean distance from perfect precision
and recall (y-axis, 0.1 to 0.9), for each benchmark: ALARM, CHILD, GENE,
INSURANCE, PIGS.]

Fig. 1. Performance comparison between Inter-IAMB, PCMB and MBOR on synthetic
data in terms of FSS accuracy. The figure shows the accuracy in the form of
triplets of boxplots for PCMB (left), Inter-IAMB (middle) and MBOR (right).

have 68 and 15 variables, respectively, while ∀X, card(MB_X) ≤ 10 in the other
networks. Further experimental results are provided in [6,7].
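The accuracy measure used above is straightforward to compute; a minimal sketch:

```python
import math

def precision_recall(output, true_mb):
    """Precision and recall of an estimated Markov boundary,
    both given as sets of variable names."""
    tp = len(output & true_mb)
    return tp / len(output), tp / len(true_mb)

def mb_distance(precision, recall):
    """Euclidean distance from perfect precision and recall;
    0 means a perfect match, sqrt(2) is the worst case."""
    return math.sqrt((1.0 - precision) ** 2 + (1.0 - recall) ** 2)
```

For example, an estimate sharing one variable out of two with the true boundary has precision = recall = 0.5 and distance √0.5 ≈ 0.71.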

4.2 Experiments on Nasopharyngeal Carcinoma Data


We now report the results of our experiments on the NPC database. The goal
is to investigate the role of dietary, social and environmental risk factors
in the aetiology of NPC, and to shed some light on the statistical profile of
the recruited population. The problem of finding strongly relevant features is
usually addressed by determining the Markov boundary of the class variable
that we want to predict. However, it is useful for the epidemiologist to
induce a broader set of features that are not strictly relevant for
classification purposes but that are still associated with NPC. Therefore, in
this study, HPC is run recursively on the adjacent nodes of NPC, the target
variable, in order to establish a local graph in its neighborhood, as
discussed in [3]. The local graph only includes those variables that depend on
NPC such that no more than 3 (and sometimes fewer) other variables mediate the
dependency. The nodes to be further developed were chosen iteratively by our
domain expert to provide a broader picture of the features that carry some
information about the target variable.
This yields the graph in Figure 2. Line width is proportional to the G² sta-
tistical association measure. Edge orientation judgments are more reliable
when they are anchored in fundamental blocks of the domain expert's knowledge.
We therefore asked our expert to partially direct some links so as to form a
partially directed acyclic graph (PDAG). Of course, the interpretation of
PDAGs as carriers of independence assumptions does not necessarily imply
causation. Dashed nodes and arrows denote latent variables that were added by
the expert for the sake of clarity, coherence and conciseness. These latent
(or hidden) variables (i.e., variables that are neither observed nor recorded
in our data set) were added because the expert feels they are common causes
that explain away some of the correlations found between their common
effects. For example,
Fig. 2. Local PDAG around the variable NPC obtained by HPC. A selection of 37
variables out of 150 is shown for the sake of clarity. Line width is
proportional to the G² statistical association measure. The links were
partially directed by the domain expert. Dashed nodes and arrows denote latent
variables added by the expert for the sake of clarity, coherence and
conciseness.

the variable "bad habits" is a common "cause" of alcohol, cannabis and tobacco;
the principle of a "healthy diet" is clearly to eat "fruits" and "vegetables";
industrial workers (associated with the variable "working in industry") are
exposed to noxious chemicals and poisonous substances that are often used in
the course of manufacturing, etc. Adding a parent node (the cause) explains
away the correlation between its child variables (the effects).
We now turn to the epidemiological interpretation of the PDAG. As may
be seen, the extracted variables provide a coherent picture of the population
under study. The NPC variable is directly linked to 15 variables: chemical
products, pesticides, fume intake, dust exposure, number of NPC cases in the
family, diabetes, otitis, other disease, kitchen ventilation, burning incense
and perfume, sheep fat, house-made proteins, industrial harissa, traditional
treatments during childhood, and cooked vegetables. More specifically, the
graph reveals that people exposed to dust, pesticides and chemical products
are much more likely to have NPC. Indeed, industrial workers are often exposed
to noxious
chemicals and poisonous substances that are used in the course of
manufacturing. The PDAG also suggests that pesticides may be a contributing
factor for NPC, along with other factors such as chemical manure exposure and
having a family history of NPC. Consumption of a number of preserved food
items (variables "house-made proteins", "sheep fat" and "harissa" in the PDAG)
was already found to be a major risk factor for NPC [13,14,15]. Consumption of
"cooked vegetables" was also shown to be associated with a reduced risk of NPC
in [14]. There is also strong evidence that intense exposure to smoke
particles from incomplete combustion of coal and wood (as occurs in
occupational settings; variables "burning incense" and "ventilation" in the
graph) is associated with a duration-dependent, increased risk of NPC [16]. In
[17], the authors show that domestic fume intake from wood fires and cooking
with a kanoun (i.e., a compact-sized oven) is significantly associated with
NPC risk. Apart from smoke particles, long-term use of incense is also known
to increase the risk of developing cancers of the respiratory tract. The PDAG
therefore supports previous findings that some occupational inhalants are risk
factors for NPC. The rest of the graph is also informative, and the edges lend
themselves to interpretation. For instance, gender, cigarette smoking and
alcohol drinking are highly correlated with lifestyle habits in Maghrebian
societies, but not with NPC. It was shown that NPC is less sensitive to the
carcinogenic effects of tobacco constituents [13], and that alcohol has only a
marginal effect on NPC [17]. Poor housing conditions are characterized by
overcrowding and lack of ventilation. Education, lodging conditions and
professional category are correlated. Consumption of traditional food (spicy
food, house-made proteins and harissa) is related to consumption of
traditional rancid fat (sheep fat, smen) cooked with traditional techniques
(kanoun, tabouna), etc.

5 Conclusion
In this paper we discussed the situation where NPC survey data are passed to
a graphical discovery process in order to infer the risk factors associated
with NPC. The extracted features match previous biological findings and open
new hypotheses for future studies.

Acknowledgment
This work is supported by the "Ligue contre le Cancer, Comité du Rhône,
France". The NPC data were kindly supplied by the International Agency for
Research on Cancer, Lyon, France.

References
1. Nilsson, R., Peña, J.M., Björkegren, J., Tegnér, J.: Consistent feature selection for
pattern recognition in polynomial time. Journal of Machine Learning Research 8,
589–612 (2007)
2. Peña, J.M., Nilsson, R., Björkegren, J., Tegnér, J.: Towards scalable and data effi-
cient learning of Markov boundaries. International Journal of Approximate Rea-
soning 45(2), 211–232 (2007)
3. Peña, J.M., Björkegren, J., Tegnér, J.: Growing Bayesian network models of gene
networks from seed genes. Bioinformatics 40, 224–229 (2005)
4. Guyon, I., Aliferis, C., Cooper, G., Elisseeff, A., Pellet, J.P., Spirtes, P., Statnikov, A.:
Design and analysis of the causation and prediction challenge. In: JMLR: Workshop
and Conference Proceedings, vol. 1, pp. 1–16 (2008)
5. Aussem, A., Rodrigues de Morais, S., Perraud, F., Rome, S.: Robust gene selec-
tion from microarray data with a novel Markov boundary learning method: Appli-
cation to diabetes analysis. In: European Conference on Symbolic and Quantitative
Approaches to Reasoning with Uncertainty ECSQARU 2009 (2009) (to appear)
6. Rodrigues de Morais, S., Aussem, A.: A novel scalable and data efficient feature
subset selection algorithm. In: European Conference on Machine Learning and
Principles and Practice of Knowledge Discovery in Databases ECML-PKDD 2008,
Antwerp, Belgium, pp. 298–312 (2008)
7. Rodrigues de Morais, S., Aussem, A.: A novel scalable and correct Markov bound-
ary learning algorithm under faithfulness condition. In: 4th European Workshop
on Probabilistic Graphical Models PGM 2008, Hirtshals, Denmark, pp. 81–88
(2008)
8. Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible
Inference. Morgan Kaufmann, San Francisco (1988)
9. Neapolitan, R.E.: Learning Bayesian Networks. Prentice Hall, Englewood Cliffs
(2004)
10. Chickering, D.M., Heckerman, D., Meek, C.: Large-sample learning of Bayesian
networks is NP-hard. Journal of Machine Learning Research 5, 1287–1330 (2004)
11. Tsamardinos, I., Brown, L.E., Aliferis, C.F.: The max-min hill-climbing Bayesian
network structure learning algorithm. Machine Learning 65(1), 31–78 (2006)
12. Spirtes, P., Glymour, C., Scheines, R.: Causation, Prediction, and Search, 2nd edn.
The MIT Press, Cambridge (2000)
13. Yu, M.C., Yuan, J.-M.: Epidemiology of nasopharyngeal carcinoma. Seminars in
Cancer Biology 12, 421–429 (2002)
14. Feng, B.J., et al.: Dietary risk factors for nasopharyngeal carcinoma in Maghrebian
countries. International Journal of Cancer 121(7), 1550–1555 (2007)
15. Jeannel, D., et al.: Diet, living conditions and nasopharyngeal carcinoma in Tunisia:
a case-control study. Int. J. Cancer 46, 421–425 (1990)
16. Armstrong, R.W., Imrey, P.B., Lye, M.S., Armstrong, M.J., Yu, M.C.: Nasopharyn-
geal carcinoma in Malaysian Chinese: occupational exposures to particles, formalde-
hyde and heat. Int. J. Epidemiol. 29, 991–998 (2000)
17. Feng, B.J., et al.: Cannabis smoking and domestic fume intake are associated with
nasopharyngeal carcinoma in North Africa (2009) (submitted)
Computing and Visualizing a Graph-Based
Decomposition for Non-manifold Shapes

Leila De Floriani¹, Daniele Panozzo¹, and Annie Hui²

¹ Department of Computer Science, University of Genova, Italy
{deflo,panozzo}@disi.unige.it
² Department of Computer Science, University of Maryland, College Park, USA
huiannie@cs.umd.edu

Abstract. Modeling and understanding complex non-manifold shapes
is a key issue in shape analysis and retrieval. The topological structure of
a non-manifold shape can be analyzed through its decomposition into a
collection of components with a simpler topology. Here, we consider a de-
composition of a non-manifold shape into components which are almost
manifolds, and we present a novel graph representation which highlights
the non-manifold singularities shared by the components as well as their
connectivity relations. We describe an algorithm for computing the de-
composition and its associated graph representation. We present a new
tool for visualizing the shape decomposition and its graph as an effective
support to modeling, analyzing and understanding non-manifold shapes.

1 Introduction
Non-manifold models have been introduced in geometric modeling long time ago.
They are relevant in describing the shape of mechanical models, which are usu-
ally represented as volumes, surfaces and lines connected together. Informally,
a manifold (with boundary) M is a compact and connected subset of the Eu-
clidean space for which the neighborhood of each point of M is homeomorphic
to an open ball (or to an open half-ball). Shapes, that do not fulfill this property
at one or more points, are called non-manifold.
Non-manifold shapes are usually discretized as cell or simplicial complexes and
arise in several applications, including finite element analysis,
computer-aided manufacturing, rapid prototyping, reverse engineering, and
animation. In Computer-Aided Design (CAD), non-manifold shapes are usually
obtained through an idealization process which consists of operations such as
the removal of details, hole removal, or the reduction in dimensionality of
some parts. For instance, parts presenting a beam behavior in an object can be
replaced with one-dimensional entities, and parts presenting a plate behavior
can be replaced by two-dimensional surfaces. This process reduces the
complexity of the object, thus resulting in a representation which captures
only its essential features.
A natural way to deal with the intrinsic complexity of modeling non-manifold
shapes consists of considering a topological decomposition of the shape into
manifold or "almost" manifold parts. We consider here a decomposition of a

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 62–71, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Computing and Visualizing a Graph-Based Decomposition 63

non-manifold shape into what we call manifold-connected components, which
form a topological super-class of pseudo-manifolds [1]. Further investigation
of the properties of this decomposition, which we call an MC-decomposition,
showed that it is unique and is the discrete counterpart of the Whitney
stratification used in the differentiable case.
We represent the structure of a non-manifold shape as a hypergraph, which we
call the MC-decomposition graph, in which the nodes correspond to the
MC-components and the arcs describe the connectivity among the components
defined by the non-manifold singularities. We have developed an algorithm for
computing the MC-decomposition, and its associated graph, based on a new data
structure for encoding the discretized input shape, which we have implemented
in a library, the IS library, for encoding and manipulating simplicial
complexes [2].
In our work, we have designed and developed a visualization tool for rendering
a segmentation of a shape into parts and its associated graph representation.
The tool is completely general and is not tailored to non-manifold shapes, or
to the specific MC-decomposition. A beta version of the decomposition software
and of the visualization tool can be downloaded from
http://www.disi.unige.it/person/PanozzoD/mc/.
The MC-decomposition and its associated graph form a very flexible tool for
shape analysis, shape matching and retrieval, and shape understanding and
annotation. We have applied this representation to the computation of
topological invariants of a shape, such as the Betti numbers, and to the
development of a taxonomy for non-manifold shapes [3]. The basis for shape
understanding and semantic annotation is extracting and recognizing the
so-called form features of a shape, such as protrusions or depressions,
through-holes or handles. Since form features had been classified in the
literature only for manifold shapes, in our previous work we extended this
classification to non-manifold shapes [4]. The combinatorial structure of the
MC-decomposition graph and the topological structure of the MC-components
themselves are related to the topology of the original non-manifold shape.
Thus, its form features can be extracted through graph-theoretic algorithms
applied to the MC-decomposition graph.
The remainder of this paper is organized as follows. In Section 2, we review
some related work. In Section 3, we briefly discuss background notions on sim-
plicial complexes. In Section 4, we present the decomposition for non-manifold
shapes discretized through simplicial 3-complexes, i.e., the MC-decomposition,
and a graph-based representation for the MC-decomposition. In Section 5 we
describe an algorithm for computing the MC-decomposition and its associated
graph. In Section 6, we present the tool we have developed to view the MC-
decomposition and its decomposition graph, and we show some results. Finally,
in Section 7, we draw some concluding remarks and discuss current and future
development of this work.

2 Related Work
Shape analysis is an active research area in geometric and solid modeling, com-
puter vision, and computer graphics. The major approaches to shape analysis
64 L. De Floriani, D. Panozzo, and A. Hui

are based on computing the decomposition of a shape into simpler parts. Such ap-
proaches are either interior-based, or boundary-based [5]. Interior-based
approaches implicitly partition the volume of a shape by describing it as a geo-
metric, or a topological skeleton [6]. Boundary-based methods provide a decom-
position of the boundary of an object into parts, by considering local properties
of the boundary of the shape, such as critical features or curvature. These lat-
ter methods aim at decomposing an object into meaningful components, i.e.,
components which can be perceptually distinguished from the remaining part
of the object. Boundary-based methods have been developed in CAD/CAM for
extracting form features and produce a boundary-based decomposition of a 3D
object guided by geometric, topological and semantic criteria [7].
All shape segmentation and feature extraction algorithms, however, work on
manifold shapes. Only a few techniques have been proposed in the literature
for decomposing the boundary of regular non-manifold 3D shapes [8, 9].
The partition of an analytic variety into analytic manifolds, called a strati-
fication, has been studied in mathematics to investigate the properties of such
varieties [10]. A stratification expresses the variety as the disjoint union of a
locally finite set of analytic manifolds, called strata. Pesco et al. [11] introduced
the concept of combinatorial stratification as the basis for a data structure for
representing non-manifold 3D shapes described by their boundary. The combi-
natorial stratification for a cell complex is a collection of manifold sub-complexes
of different dimensions, the union of which forms the original complex. A com-
binatorial stratification as discussed in [11], however, is not unique.

3 Background Notions

In this Section, we introduce some background notions on simplicial complexes,
which will be used throughout the paper (see [12] for more details).
A Euclidean simplex σ of dimension k is the convex hull of k + 1 linearly
independent points in the n-dimensional Euclidean space Eⁿ, 0 ≤ k ≤ n. V_σ is
the set formed by such points. We simply call a Euclidean simplex of dimension
k a k-simplex; k is called the dimension of σ. Any Euclidean p-simplex σ′, with
0 ≤ p < k, generated by a set V_σ′ ⊆ V_σ of cardinality p + 1, is called a
p-face of σ. Whenever no ambiguity arises, the dimensionality of σ′ can be
omitted, and σ′ is simply called a face of σ. Any face σ′ of σ such that
σ′ ≠ σ is called a proper face of σ. A finite collection Σ of Euclidean
simplexes forms a Euclidean simplicial complex if and only if (i), for each
simplex σ ∈ Σ, all faces of σ belong to Σ, and (ii), for each pair of simplexes
σ and σ′, either σ ∩ σ′ = ∅ or σ ∩ σ′ is a face of both σ and σ′. If d is the
maximum of the dimensions of the simplexes in Σ, we call Σ a d-dimensional
simplicial complex, or a simplicial d-complex. In the following, we will
restrict our consideration to simplicial 1-, 2- and 3-complexes in the
three-dimensional Euclidean space E³.
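The two defining conditions can be checked mechanically when simplexes are encoded combinatorially by their vertex sets (an abstract view of the Euclidean definition above); a minimal sketch:

```python
from itertools import combinations

def faces(simplex):
    """All nonempty faces of a simplex encoded as a frozenset of its
    k+1 vertices (the simplex itself included)."""
    verts = sorted(simplex)
    return {frozenset(c)
            for r in range(1, len(verts) + 1)
            for c in combinations(verts, r)}

def is_simplicial_complex(sigma):
    """Condition (i): closure under taking faces.  Condition (ii): any
    two simplexes meet in a common face or not at all (automatic for
    vertex-set encodings once (i) holds, but checked for illustration)."""
    closed = all(faces(s) <= sigma for s in sigma)
    pairwise = all(not (s & t) or (s & t) in sigma
                   for s in sigma for t in sigma)
    return closed and pairwise
```

For example, the set of all faces of a triangle is a simplicial 2-complex, while removing one of its edges violates condition (i).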
The boundary of a simplex σ is the set of all proper faces of σ in Σ, while
the star of σ is the set of simplexes in Σ that have σ as a face. The link of
σ is the set of all the faces of the simplexes in the star of σ which are not
incident in σ. Any simplex σ such that star(σ) contains only σ is called a top
simplex. A simplicial d-complex in which all top simplexes are of dimension d
is called regular, or of uniform dimension. An h-path in a simplicial d-complex
Σ joining two (h+1)-simplexes in Σ, where h = 0, 1, ..., d − 1, is a path
formed by an alternating sequence of h-simplexes and (h+1)-simplexes. A
complex Σ is said to be h-connected if and only if there exists an h-path
joining every pair of (h+1)-simplexes in Σ. A subset Σ′ of Σ is a sub-complex
if Σ′ is a simplicial complex. Any maximal h-connected sub-complex of a
d-complex Σ is called an h-connected component of Σ.
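With the same vertex-set encoding, the star, link and top simplexes have direct implementations (the `faces` helper is repeated here to keep the sketch self-contained):

```python
from itertools import combinations

def faces(simplex):
    """All nonempty faces of a simplex encoded as a vertex frozenset."""
    verts = sorted(simplex)
    return {frozenset(c)
            for r in range(1, len(verts) + 1)
            for c in combinations(verts, r)}

def star(sigma, complex_):
    """Simplexes of the complex having sigma as a face."""
    return {t for t in complex_ if sigma <= t}

def link(sigma, complex_):
    """Faces of the star simplexes that are disjoint from sigma."""
    return {f for t in star(sigma, complex_) for f in faces(t)
            if not (f & sigma)}

def top_simplexes(complex_):
    """Simplexes whose star contains only themselves."""
    return {s for s in complex_ if star(s, complex_) == {s}}
```

In the complex of all faces of a single triangle, the link of a vertex is the opposite edge together with its two endpoints, and the only top simplex is the triangle itself.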

4 The MC-Decomposition into Manifold-Connected Components

In this Section, we describe a decomposition for non-manifold shapes
discretized through simplicial 2- and 3-complexes, first introduced in [1],
called the MC-decomposition, and a graph representation for this
decomposition.
The non-manifold singularities in the combinatorial representation of a
non-manifold shape are characterized by defining non-manifold vertices and
edges. A vertex (0-simplex) v in a d-dimensional regular complex Σ, with
d ≥ 1, is a manifold vertex if and only if the link of v in Σ is a
triangulation of the (d−1)-sphere S^(d−1), or of the (d−1)-disk B^(d−1). A
vertex (0-simplex) v in a 1-dimensional regular complex Σ is a manifold vertex
if and only if the link of v consists of one or two vertices. A vertex is
called non-manifold otherwise.
An edge (1-simplex) e in a regular 3-complex Σ is a manifold edge if and only
if the link of e in Σ is a triangulation of the 1-sphere S¹, or of the 1-disk
B¹. An edge (1-simplex) e in a regular 2-complex Σ is a manifold edge if and
only if the link of e in Σ consists of one or two vertices. An edge is called
non-manifold otherwise.
The building blocks of the decomposition are manifold-connected (MC)
complexes. We consider a regular simplicial d-complex Σ embedded in the
three-dimensional Euclidean space, where d = 1, 2, 3. In such a complex, we say
that a (d−1)-simplex σ is a manifold simplex if and only if there exist at most
two d-simplexes in Σ incident in σ. A (d−1)-path such that every (d−1)-simplex
in the path is a manifold simplex is called a manifold (d−1)-path. Thus, we say
that two d-simplexes in Σ are manifold-connected if and only if there exists a
manifold (d−1)-path connecting them. Then, we call a regular simplicial
d-complex Σ a manifold-connected complex if and only if any pair of
d-simplexes in Σ are manifold-connected. Figures 1(a) and 1(b) show examples
of manifold-connected 2- and 3-complexes, respectively. Note that
manifold-connected 2- and 3-complexes may contain both non-manifold vertices
and edges. It can easily be seen that a 1-dimensional manifold-connected
complex cannot contain non-manifold vertices or edges.
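For a regular 2-complex (a triangle mesh), the manifold-edge test and the resulting manifold-connected components can be sketched with a union-find pass; this is an illustrative reimplementation under these definitions, not the IS-library code used by the authors:

```python
from collections import defaultdict

def mc_components(triangles):
    """MC-decomposition of a regular 2-complex given as vertex triples.
    An edge is manifold iff at most two triangles are incident in it;
    triangles sharing a manifold edge are merged (union-find), and the
    surviving groups are the MC-components."""
    edge_to_tris = defaultdict(list)
    for i, tri in enumerate(triangles):
        a, b, c = sorted(tri)
        for e in ((a, b), (a, c), (b, c)):
            edge_to_tris[e].append(i)

    parent = list(range(len(triangles)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for tris in edge_to_tris.values():
        if len(tris) <= 2:  # manifold edge: merge its incident triangles
            for t in tris[1:]:
                parent[find(t)] = find(tris[0])

    groups = defaultdict(list)
    for i in range(len(triangles)):
        groups[find(i)].append(i)
    return list(groups.values())
```

Three triangles glued along one common edge form three MC-components (the shared edge is non-manifold), whereas two triangles sharing an edge form a single component.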
A simplicial 3-complex Σ embedded in the three-dimensional Euclidean space
can be decomposed into manifold-connected one-, two- and three-dimensional
complexes, called Manifold-Connected (MC) components. Recall that a subset
Fig. 1. (a) Example of a manifold-connected 2-complex; (b) example of a
manifold-connected 3-complex; (c) MC-decomposition graph for the complex in
Figure 1(a): non-manifold edges e1, e2 and non-manifold vertices v1, v2, v3
define the non-manifold singularity in the pinched torus of Figure 1(a)

Σ′ of a complex Σ is a sub-complex if Σ′ is a simplicial complex. Intuitively,
a decomposition Δ of Σ is a collection of sub-complexes of Σ, such that the
union of the components in Δ is Σ, and any two components Σ1 and Σ2 in Δ, if
they intersect, intersect at a collection of non-manifold vertices and edges.
An MC-decomposition is constructively defined by applying the following
property: two k-dimensional top simplexes σ1 and σ2 belong to the same
MC-component if and only if there exists a manifold (k−1)-path that connects
σ1 and σ2 in Σ. It can be proved that the MC-decomposition is unique and that
it is the closest combinatorial counterpart of a Whitney stratification.
The MC-decomposition Δ can be described as a hypergraph H = ⟨N, A⟩,
called the MC-decomposition graph, in which the nodes correspond to the
MC-components in Δ, while the hyperarcs correspond to the non-manifold
singularities common to two or more components, or within a single component.
The hyperarcs that connect distinct components are defined as follows: any k
components C1, C2, ..., Ck in the MC-decomposition, with k > 1, such that the
intersection J of all such components is not empty, and J is common only to
the k components, define one or more hyperarcs with extreme nodes in
C1, C2, ..., Ck. The intersection of components C1, C2, ..., Ck consists of
isolated non-manifold vertices, or maximal connected 1-complexes formed by
non-manifold edges. A hyperarc is a connected component of such an
intersection. Thus, we classify hyperarcs as 0-hyperarcs, which consist of a
single non-manifold vertex, and as 1-hyperarcs, which are maximal 0-connected
1-complexes formed by non-manifold edges.
Figure 2(b) shows the MC-decomposition graph of the simplicial 2-complex
depicted in Figure 2(a). The complex is formed by three triangles incident at
a common edge e1 and by a dangling edge C4 incident at one extreme of e1. The
MC-decomposition graph consists of four nodes that represent the four
components, each of which is made of a single top cell, and of two hyperarcs.
A 1-hyperarc is associated with vertex v1 and edge e1, and a 0-hyperarc is
associated with vertex v2. Since a component C may contain non-manifold
singularities, we represent C in the decomposition graph with a node and with
self-loops corresponding to its non-manifold vertices and edges. In this case,
a 0-hyperarc corresponds to a non-manifold vertex belonging to C, while a
1-hyperarc corresponds to a maximal connected 1-complex formed by
non-manifold edges and vertices all
Computing and Visualizing a Graph-Based Decomposition 67

belonging to C. Figure 1(c) shows the MC-decomposition graph for the pinched
torus depicted in Figure 1(a): the graph contains one self-loop corresponding to
the non-manifold edges and vertices forming the non-manifold singularity in the
shape.

5 Computing the MC-Decomposition Graph

Our algorithm for computing the MC-decomposition of a simplicial 3-complex
Σ extracts first the maximal connected k-dimensional regular sub-complexes of
Σ of dimensions 0, 1 and 2, and then computes the MC-decomposition of each
k-dimensional regular sub-complex. To compute the MC-decomposition of a k-
dimensional regular complex, we use the property stated above that any pair
of manifold simplexes belonging to the same k-dimensional manifold-connected
component (for k = 1, 2, 3) must be connected through a manifold (k −1)-path.
This means that every MC-component C can be traversed by following the man-
ifold (k − 1)-paths connecting the k-simplexes in C. We consider then a graph
G in which the nodes are the top k-simplexes, k = 1, 2, 3, and the arcs connect
any pair of top k-simplexes which share a manifold (k − 1)-simplex. The con-
nected components of such a graph are the manifold-connected components in
the MC-decomposition.
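For the 2-dimensional case, the traversal described above can be sketched as follows. The triangle encoding and the union-find bookkeeping are our own illustrative choices, not the IS-library implementation used in the paper: two triangles are joined exactly when they share a manifold edge, i.e. an edge incident to exactly two top simplexes.

```python
from collections import defaultdict

def mc_components(triangles):
    """Partition top 2-simplexes (triangles, given as vertex triples) into
    manifold-connected components: two triangles are joined iff they share
    a manifold edge (an edge incident to exactly two triangles)."""
    # Map each edge (as a sorted vertex pair) to its incident triangles.
    edge_tris = defaultdict(list)
    for t, (a, b, c) in enumerate(triangles):
        for e in ((a, b), (b, c), (a, c)):
            edge_tris[tuple(sorted(e))].append(t)

    # Union-find over triangle indices.
    parent = list(range(len(triangles)))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for tris in edge_tris.values():
        if len(tris) == 2:                 # manifold edge: join the two triangles;
            parent[find(tris[0])] = find(tris[1])  # 3+ incident => non-manifold, skip

    comps = defaultdict(list)
    for t in range(len(triangles)):
        comps[find(t)].append(t)
    return list(comps.values())

# Three triangles glued along one common edge (0, 1): a non-manifold fan
# whose MC-decomposition has three single-triangle components.
fan = [(0, 1, 2), (0, 1, 3), (0, 1, 4)]
print(mc_components(fan))   # three singleton components
```

A manifold strip such as `[(0, 1, 2), (1, 2, 3)]` yields a single component, since the shared edge (1, 2) is incident to exactly two triangles.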
We compute first an exploded version of the MC-decomposition graph, that we
call the expanded MC-decomposition graph. In the expanded MC-decomposition
graph, that we denote as HE = (NE , AE ), the nodes are in one-to-one corre-
spondence with the MC-components, while the hyperarcs are in one-to-one cor-
respondence with the non-manifold vertices and edges. A hyperarc corresponding
to a non-manifold vertex v (or to a non-manifold edge e) connects all the MC-
components that contain vertex v (or edge e). Figure 2(c) shows the expanded
MC-decomposition graph of the simplicial 2-complex depicted in Figure 2(a). A
hyperarc is associated with each non-manifold singularity of the complex.
The MC-decomposition graph H is then computed from its expanded ver-
sion HE by merging into a single hyperarc connecting components C1 , C2 , ..., Cq
all the hyperarcs of HE which connect exactly these components and correspond
to non-manifold vertices and edges which form a connected 1-complex. In other
words, if we consider the connected components of the 1-complex formed by the
non-manifold vertices and edges shared by C1 , C2 , ..., Cq , then the hyperarcs in
H joining C1 , C2 , ..., Cq are in one-to-one correspondence with such connected
components.

Fig. 2. A simplicial 2-complex (a), its corresponding MC-decomposition graph (b) and
the exploded version of the MC-decomposition graph (c)

68 L. De Floriani, D. Panozzo, and A. Hui
Our implementation of the MC-decomposition algorithm is based on the IS
library, which implements the Incidence Simplicial (IS) data structure together
with traversal and update operators. The IS data structure is a dimension-
independent data structure for d-dimensional simplicial complexes that encodes
all simplexes in the complex explicitly and uniquely, together with a subset of
the topological relations among them [2]. We use this information to detect non-
manifold singularities in the input complex and to perform an efficient traversal
of the complex. With the IS data structure, the MC-decomposition graph can be
computed in time linear in the number of simplexes in Σ.

6 Visualizing the MC-Decomposition and the MC-Decomposition Graph

We have developed a tool for visualizing a decomposition of a simplicial com-
plex and its decomposition graph. This tool is called Graph-Complex Viewer
(GCViewer) and can visualize d-dimensional simplicial complexes, with d =
1, 2, 3, embedded in E 3 . GCViewer can be used as a stand-alone viewer for a
simplicial complex, or as a C++ library. GCViewer is general and is not tailored
to a specific decomposition. Thus, it is intended as a support to the development
and analysis of any graph-based representation for discretized shapes. Right now,
it is restricted to 3D shapes discretized as simplicial complexes, but it can be eas-
ily extended to deal with cellular shape decompositions. The MC-decomposition
algorithm, described before, has been developed as a plug-in for GCViewer.
GCViewer allows the user to specify a set of graphs, embedded in E 3 , and
provides a rich set of visualization capabilities to personalize the rendering of
both the complex and the graph. The user interface of GCViewer allows gener-
ating one or more views of the complex. For each view, it is possible to show,
hide, or personalize the rendering options of each component and graph that
has been defined. In GCViewer, we have developed a new technique for an ef-
fective visualization of the graph representing the decomposition of a shape,
that we have applied in rendering both the MC-decomposition graph and its
expanded version. The issue here is that the graphs are not planar. Since the
tool should be a support for an effective shape analysis and semantic annotation,
the layout of the graph nodes should visually reflect the position of the compo-
nents in the shape decomposition (in our case, in the MC-decomposition). We
have used the Cartesian coordinates of the vertices in each MC-component of
the original complex to compute an embedding of the nodes of the graph in 3D
space. We place each node at the barycenter of its associated component. This
greatly improves the readability of both the MC-decomposition graph and of its
exploded version by also showing visually the correspondence with the shape
decomposition.
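A minimal sketch of this barycenter placement follows; representing each component as a dictionary entry mapping an id to its vertex coordinate list is our assumption, not the tool's internal layout.

```python
def node_embedding(components):
    """Place each decomposition-graph node at the barycenter (mean of the
    Cartesian vertex coordinates) of its MC-component, as GCViewer does.
    `components` maps a component id to a list of (x, y, z) vertices."""
    embedding = {}
    for cid, verts in components.items():
        n = len(verts)
        embedding[cid] = tuple(sum(v[i] for v in verts) / n for i in range(3))
    return embedding

# A square "seat" component: its graph node lands at the square's center.
comps = {"seat": [(0, 0, 0), (2, 0, 0), (2, 2, 0), (0, 2, 0)]}
print(node_embedding(comps))   # {'seat': (1.0, 1.0, 0.0)}
```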

Fig. 3. A screenshot from GCViewer, showing a complex representing an armchair,
highlighting its twelve MC-components (a), its MC-decomposition graph (b) and the
exploded version (c)

Figure 3 depicts a screenshot from GCViewer showing the original shape, its
MC-decomposition (into twelve MC-components), the MC-decomposition graph
and its exploded version. The MC-decomposition is shown in the original shape
by assigning different colors to the components. Note that the MC-components
are the back, the seat, the two armrests, the four legs and four pieces which
connect the legs to the seat.
Figure 4(a) shows a shape formed by two bottles connected by two laminas
(2-dimensional MC-components), plus the caps, each of which consists of two
MC-components. The two bottles with the two laminas form a 1-cycle in the
shape. This is reflected in the cycle in the MC-decomposition graph, shown in
Figure 4(c). As shown by this example, there is a relation between the cycles in
the graph and the 1-cycles in the original shape which is not, however, a one-to-
one correspondence. Not all the cycles in the graph correspond to 1-cycles in the

shape, as shown in the example of Figure 3. 1-cycles in the shape that appear
as cycles in the MC-decomposition graph are those containing non-manifold
singularities. We are currently investigating the relation of the 1-cycles in the
shape with the properties of the MC-decomposition graph.

Fig. 4. A complex representing a pair of bottles connected by two laminas (a); an
expanded version of the complex that shows its internal structure (b); the corresponding
MC-decomposition graph (c) and the corresponding exploded graph (d)
Beta binary versions of the visualization tool and of the MC-decomposition
algorithm are available at http://www.disi.unige.it/person/PanozzoD/mc/.

7 Concluding Remarks

We have presented a decomposition for non-manifold shapes into manifold-
connected components. We have discussed the MC-decomposition graph as a de-
scription of the connectivity structure of the decomposition and we have shown
through examples how the combinatorial properties of the MC-decomposition
graph are related to the topology of the decomposed shape. We have also de-
scribed an innovative tool for visualizing the decomposition and its
associated graph.
The MC-decomposition and its graph representation are the basis for ap-
plications to the analysis, understanding, retrieval and semantic annotation of
non-manifold shapes. In our current work, we are using the MC-decomposition
as the basis for computing topological invariants of a non-manifold shape, such as
the Betti numbers. The latter are computed by reconstructing from the
MC-decomposition what we call a shell-based decomposition. The shell-based
decomposition is obtained by combining together into closed components the
MC-components that form a 2-cycle in the shape. Betti numbers are an impor-
tant topological shape signature to be used for shape classification and retrieval.
Another important application is detecting form features in a non-manifold
shape, based on the structure of the single components and on the combinatorial
structure of the decomposition. This is a very relevant issue in CAD, where non-
manifold shapes are helpful in describing mechanical models, often obtained as
the idealization of manifold ones.
In our future work, we plan to use the MC-decomposition as the basis for shape
matching and retrieval. This unique topological decomposition can be combined
with unique descriptions of the manifold parts, like the Reeb graph, thus forming
the basis for a two-level shape recognition process. Moreover, an important issue
is to study how the MC-decomposition is affected by updating the underlying
shape and its simplicial discretization. In this context, we plan to analyze and
classify operators for modifying a non-manifold shape and to develop algorithms
for efficiently updating the decomposition based on such operators.

Acknowledgements
This work has been partially supported by the MIUR-FIRB project SHALOM
under contract number RBIN04HWR8.

References
1. Hui, A., De Floriani, L.: A two-level topological decomposition for non-manifold
simplicial shapes. In: Proceedings of the 2007 ACM Symposium on Solid and Phys-
ical Modeling, Beijing, China, pp. 355–360 (June 2007)
2. De Floriani, L., Panozzo, D., Hui, A.: A dimension-independent data structure for
simplicial complexes (in preparation)
3. Léon, J.C., De Floriani, L.: Contribution to a taxonomy of non-manifold models
based on topological properties. In: Proceedings CIE 2008. ASME 2008 Computers
and Information in Engineering Conference, New York City, USA, August 3-6
(2008)
4. Crovetto, C., De Floriani, L., Giannini, F.: Form features in non-manifold shapes:
A first classification and analysis. In: Eurographics Italian Chapter Conference,
Trento, Italy, Eurographics, February 14–16, pp. 1–8 (2007)
5. Shamir, A.: Segmentation and shape extraction of 3d boundary meshes. In: State-
of-the-Art Report, Eurographics, Vienna, Austria, September 7 (2006)
6. Cornea, N., Silver, D., Min, P.: Curve-skeleton properties, applications and algo-
rithms. IEEE Transactions on Visualization and Computer Graphics 13(3), 530–
548 (2007)
7. Shah, J., Mantyla, M.: Parametric and feature-based CAD/CAM: concepts, tech-
niques and applications. John Wiley, Interscience (1995)
8. Falcidieno, B., Ratto, O.: Two-manifold cell-decomposition of R-sets. In: Kilgour,
A., Kjelldahl, L. (eds.) Proceedings Computer Graphics Forum, vol. 11, pp. 391–
404 (September 1992)
9. Rossignac, J., Cardoze, D.: Matchmaker: manifold BReps for non-manifold R-sets.
In: Bronsvoort, W.F., Anderson, D.C. (eds.) Proceedings Fifth Symposium on Solid
Modeling and Applications, pp. 31–41. ACM Press, New York (1999)
10. Whitney, H.: Local properties of analytic varieties. In: Cairns, S.S. (ed.) Differential
and combinatorial topology, A Symposium in Honor of Marston Morse, pp. 205–
244. Princeton University Press, Princeton (1965)
11. Pesco, S., Tavares, G., Lopes, H.: A stratification approach for modeling two-
dimensional cell complexes. Computers and Graphics 28, 235–247 (2004)
12. Agoston, M.: Computer Graphics and Geometric Modeling. Springer, Heidelberg
(2005)
A Graph Based Data Model for Graphics Interpretation

Endre Katona

University of Szeged, H-6720 Szeged, Árpád tér 2, Hungary


katona@inf.u-szeged.hu

Abstract. A universal data model, named DG, is introduced to handle vector-
ized data uniformly during the whole recognition process. The model supports
low level graph algorithms as well as higher level processing. To improve algo-
rithmic efficiency, spatial indexing can be applied. Implementation aspects are
discussed as well. An earlier version of the DG model has been applied for in-
terpretation of Hungarian cadastral maps. Although this paper gives examples
of map interpretation, our concept can be extended to other fields of graphics
recognition.

Keywords: map interpretation, graphics recognition, graph based data model,
vectorization, spatial indexing.

1 Introduction
Although many papers have been published presenting graphics interpretation sys-
tems [5, 6], most of them concentrate on algorithmic questions, and less attention is
paid to data storage and handling. In this section we give an overview of the data
models that we investigated when creating our own model.
It is natural to use a graph representation for a vectorized drawing.
Lladós et al. [11] define an attributed graph, and after extracting minimum closed
loops, a region adjacency graph is generated. This model concentrates on region
matching, which restricts its applicability. A similar approach in some sense is
given in [15], defining an attributed relational graph where nodes represent shape
primitives and edges correspond to relations between primitives. A special approach
is applied in [1]: after an initial run-graph vectorization, a mixed graph representation is
used to provide an interface between raster and vector data.
Some interpretation systems use relational database tables to store geometric in-
formation of vectorized maps [2, 18]. The advantage of this approach is that commer-
cial database management systems can be used to handle data. Although there are
existing techniques to store spatial data in relational and object-relational tables [14],
it is clear that the relational model is not the best choice for graphics interpretation.
Object-oriented models are more flexible than relational ones and can be used in
graphics recognition [3] as well as in GIS (Geographical Information System) ap-
proaches. Object-oriented GIS typically uses a hierarchy of spatial object types, such
as defined in the Geometry Object Model of the Open Geospatial Consortium [14]
supporting interoperability of different systems. The object-oriented concept is excel-
lent for high level description, but it does not support low level algorithmic efficiency
during recognition.

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 72–81, 2009.
© Springer-Verlag Berlin Heidelberg 2009

Topological models have been introduced in early GIS systems. A topological
model can be regarded as a set of cross-referenced drawing elements: each element has
a unique identifier (id) to offer the possibility for other objects to reference it. A char-
acteristic example is the node-line-polygon structure of the Arc/Info data model [8].
This model ensures efficient algorithms to compute polygon areas, point-in-polygon
decisions, overlay of two polygon sets, etc., but has some drawbacks when updating a
graphical database [8].
It is a challenging approach in graphics recognition to use some knowledge repre-
sentation scheme, such as semantic networks. Niemann et al. [13] give detailed de-
scription of a general semantic network model, applied both for vector-based [4] and
raster-based [17] map interpretation. Such a semantic network model defines concepts
to describe a priori knowledge, while modified concepts and instances are built up
during interpretation. Hartog et al. [7] use a similar approach to process gray-level
map images. Although semantic networks give a rather general approach, applied also
in speech understanding and robotics [13], they do not support the low-level effi-
ciency of algorithms like topological models do.

2 The DG (Drawing Graph) Model


Our model combines graph-based and topological approaches, but object-oriented
aspects and semantic networks are also taken into consideration in some sense. A
preliminary version of DG has been applied for the interpretation of Hungarian cadas-
tral maps [9].
Basic element of the model is the DG-object. A set Z of objects, describing the cur-
rent state of interpretation, is called DG-document. Z consists of two disjoint subsets:

– Zn denotes the set of normal objects; they are used to describe the current drawing.
– Zs denotes the set of sample objects, giving an a priori knowledge description.
For instance, sample objects can describe a symbol library of the map legend
or vector fonts of a given language.
Each object has the structure (id, layer, references) where id is the object identifier
number, and layer is a CAD-like attribute to classify objects. Normal objects have 0,
1, 2, etc. layer numbers, while sample objects are kept in a distinguished layer S.
Layer 0 is reserved for unrecognized objects. Each object may have references to
other objects using their id’s. The set of references R ⊂ Z × Z forms an acyclic directed
graph, termed the reference graph. Two types of references can be distinguished:

– A “contains” reference means that the current object involves the referred object
as a component. Rc ⊂ Z × Z denotes the set of “contains” references.
– A “defined by” reference means that the current object is a transformed version
of a referred sample object. Such references are mainly used to describe
recognized instances of sample objects. Rd ⊂ Z × Zs denotes the set of “defined
by” references, and R = Rc ∪ Rd holds.

Denote by domain(u) the set of all objects v that have a reference path from u to v, and
by scope(u) the set of all objects v with a reference path from v to u. The notations
domainc(u) and scopec(u) denote restrictions to ”contains” reference paths. For any
sample object s, domain(s) ⊆ Zs is required.
The DG model contains three basic object types (instances of each may be normal
or sample objects as well):
– A NODE object represents a point with coordinates; a NODE instance is denoted
as node(x, y). Normally, a NODE has no references to other objects.
– An EDGE object is a straight line section given by “contains” references to the
endpoints. An EDGE instance is denoted as edge(node1, node2). A “line width”
attribute may be attached, if necessary.
– A PAT (pattern) object represents a set of arbitrary DG-objects given by “contains”
references to its components. A PAT instance is denoted as pat(obj1,..., objn).
Coordinates (x, y) of a center point may be attached to the PAT, usually giving the
“center of gravity” or other characteristic point of the pattern.
At a first look, NODE and EDGE objects form a usual graph structure describing
the drawing after initial vectorization. A PAT object typically contains a set of edges
identifying a recognized pattern on the drawing, but PATs can be utilized also in very
different ways.
The object type NODE has an important subtype, termed TEXT. Basically it repre-
sents a recognized inscription on the drawing, defined as a special transformation of a
vector font. (Vector fonts are given as sample objects.) Generalizing this idea, a
TEXT object can be used to describe a transformed instance of any other sample
object, as will be shown in Section 3. A TEXT instance is denoted as text(sample, x,
y, T, string) where sample is a “defined by” reference to a sample object, x and y are
coordinates of the insertion point, T is a transformation usually given by an enlarge-
ment factor and rotation angle, and string is an ASCII sequence of characters.
If string is given, then sample refers to a vector font pat(letter1,..., letterm) where,
for any i, letteri is a pat(edge1, edge2,...) object defining a character shape. In this case
text(sample, x, y, T, string) describes a recognized inscription on the drawing.
If string is omitted, then sample refers to the description of a certain symbol. For in-
stance, sample may refer to a pat(edge1,..., edgen) object giving vector description of a
map symbol with (0, 0) coordinates as center point. In this case text(sample, x, y, T)
describes a recognized instance of the symbol at point (x, y) transformed according to T.
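The four object types can be sketched as plain records. The names and constructor arguments below follow the paper's notation (node(x, y), edge(node1, node2), pat(...), text(sample, x, y, T, string)); the concrete attribute layout, integer ids, and splitting T into a scale plus a rotation angle are our illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class Node:                    # node(x, y)
    id: int
    layer: int                 # 0 = unrecognized; sample objects live in layer S
    x: float
    y: float

@dataclass
class Edge:                    # edge(node1, node2): "contains" references by id
    id: int
    layer: int
    node1: int
    node2: int

@dataclass
class Pat:                     # pat(obj1, ..., objn)
    id: int
    layer: int
    contains: tuple            # ids of the component objects

@dataclass
class Text(Node):              # TEXT is a subtype of NODE
    sample: int = -1           # "defined by" reference to a sample object
    scale: float = 1.0         # transformation T: enlargement factor ...
    angle: float = 0.0         # ... and rotation angle
    string: str = ""           # empty => symbol instance, non-empty => inscription

# A recognized symbol instance: sample object 7, inserted at (10, 20), rotated 90°.
sym = Text(id=42, layer=3, x=10.0, y=20.0, sample=7, angle=90.0)
print(sym.string == "")        # True: no string, so this is a symbol, not text
```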

Fig. 1. Small circles denoting geodetically identified points on maps



Since TEXT is a subtype of NODE, it can be applied as the endpoint of an edge. In
this way we can represent special symbols applied, for instance, at parcel corners on
cadastral maps (Fig. 1).
It is easy to see that the above concept of TEXT not only ensures high flexibility, but
also supports efficient display of the current recognition state throughout the process.

3 Interpretation Strategy
Initially the DG-document contains only sample objects coding prototypes of symbols
and characters to be recognized. Interpretation starts with some raw vectorization
process (see [20] for an overview of vectorization methods). As a result of the
vectorization, a NODE-EDGE graph description of the drawing is inserted in the
DG-document. At this moment all normal objects are in layer 0. The processing is
performed as a sequence of recognition steps, each step may consist of three phases:
1. Hypothesis generation. PAT objects are created in the DG-document. For in-
stance, if a set of edges e1,..., en is recognized as a map object, then a pat(e1,..., en)
is created with the layer number associated with the current map object type.
Such an operation does not change the underlying data, thus the hypothesis gen-
eration is a reversible step ensuring the possibilities of backtracking and ignoring.
PAT objects can describe a hierarchy of higher level structures like blocks and
entities in [19].
2. Verification of hypotheses can be made by the user or by a higher level algorithm:
PATs of false hypotheses are marked as “rejected” while correct ones as “accepted”.
3. Finalization. Rejected hypotheses are dropped and accepted ones are processed,
possibly making irreversible changes in the underlying data. In some cases final-
ization can be omitted or postponed, in this way preserving the possibility of
backtracking.
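The three-phase step above can be sketched generically. The dictionary-based document store and the function hooks below are our illustrative framing, not the paper's implementation; the key property preserved is that phase 1 only adds PATs, so rejecting a hypothesis is a reversible deletion.

```python
def recognition_step(doc, generate, verify, finalize):
    """One recognition step on a DG-document `doc` (id -> object dict).
    generate(doc) -> list of newly created hypothesis PAT ids (phase 1:
    only adds objects); verify(doc, pid) -> bool (phase 2); finalize(doc,
    pid) applies an accepted PAT, possibly irreversibly (phase 3)."""
    hypotheses = generate(doc)                          # phase 1: reversible
    accepted = [p for p in hypotheses if verify(doc, p)]  # phase 2
    for p in hypotheses:
        if p not in accepted:
            del doc[p]                                  # rejected: just drop the PAT
    for p in accepted:
        finalize(doc, p)                                # phase 3: may change base data
    return accepted

# Toy usage: hypothesize one PAT grouping two edges, accept and finalize it.
doc = {1: {"type": "EDGE"}, 2: {"type": "EDGE"}}
def generate(d):
    d[100] = {"type": "PAT", "contains": [1, 2], "layer": 5}
    return [100]
def verify(d, pid):
    return len(d[pid]["contains"]) == 2
def finalize(d, pid):
    d[pid]["accepted"] = True
print(recognition_step(doc, generate, verify, finalize))   # [100]
```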
The above procedure will be demonstrated on two examples.

Example 1. Text recognition (Fig. 2).


1. In the hypothesis generation step small connected subgraphs are detected as can-
didates of single characters (see pat1, pat2 and pat3 in Fig. 2). Next, if an aligned
group of character candidates is found, a pat(pat1,..., patn, text1, text2) object is
created where the contents of text1 and text2 are not defined yet. Rotation angle α
of the string candidate is detected, and characters are recognized with rotation α
and 180º + α, respectively. Recognition results are stored in text1 for α and in text2
for 180º + α. The recognition itself can be performed with some graph matching
technique like in [11] and [12], or using a neural network [9].
2. String hypotheses can be verified by the user or applying some a priori informa-
tion. For instance, when processing cadastral maps, the set of legal parcel num-
bers is usually given in an external database.
3. If a hypothesis is rejected, then all PATs and TEXTs, created in the hypothesis
generation phase, should be deleted (Fig. 2). If accepted, then only the selected
TEXT should be kept and all other objects – including the base vectors of the
characters – should be deleted.

Fig. 2. Example of text recognition using the DG model. Arrows denote references between
objects.

Example 2. Recognition of connection signs on cadastral maps (Fig. 3). A connection
sign is applied on the boundary line of two objects, expressing that the two objects are
logically connected (e.g. a building belongs to a given parcel, see Fig. 5).
1. In the hypothesis generation phase a pat(pat1, pat2, text) is created to each connec-
tion sign candidate, where pat1 involves the connection sign edges, pat2 contains
the segments of the base line (Fig. 3), and text has a reference to a sample connec-
tion sign symbol.
2. Verification of hypotheses can be made by simple accept/reject answers.
3. If a hypothesis has been rejected, then pat, pat1, pat2 and text should be deleted. If
accepted, then edges in pat1 are deleted, edges in pat2 are unified into a single
edge e0, and pat(pat1, pat2, text) is replaced with pat(e0, text) expressing the con-
nection between the symbol text and the edge e0. As a result, recognized connec-
tion signs are displayed correctly (Fig. 5/b).

Fig. 3. Recognition of a connection sign. Arrows denote references between objects.



4 Implementation
The whole DG-document can be stored in RAM, since the DG description of a whole
map sheet normally does not exceed 10 Mbytes. The data structure consists of two arrays,
Obj and In. The Obj array contains the object descriptions, and In[k] gives the starting
address of the object with identifier k. (Note that the description of an object does not
contain its id.) This mode of storage ensures constant access time along object references.
To ensure computational efficiency, “contained in” references – as inverses of
“contains” references – should be applied in some cases. For instance, all NODE
objects should have “contained in” references to the connected EDGE objects, so that
efficient graph algorithms can be programmed. Note that in our implementation,
when necessary, all “contained in” references can be generated in linear time.
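The Obj/In indirection can be sketched as follows; the record layout (tuples of fields) and class name are our assumptions — the point is that dereferencing an id is a constant-time array lookup via In[k], and that records themselves carry no id.

```python
class DGStore:
    """Sketch of the Obj/In storage: object records are packed into one
    flat array (Obj), and In[k] holds the start offset of object k."""

    def __init__(self):
        self.obj = []      # flat array of object fields (records store no id)
        self.indx = []     # indx[k] = offset of object k in self.obj

    def add(self, record):
        """Append a record (a tuple of fields); the new id is its index in In."""
        self.indx.append(len(self.obj))
        self.obj.extend(record)
        return len(self.indx) - 1

    def get(self, k, nfields):
        """Constant-time access: follow In[k] into Obj and read the fields."""
        start = self.indx[k]
        return tuple(self.obj[start:start + nfields])

store = DGStore()
a = store.add(("NODE", 1.0, 2.0))
b = store.add(("NODE", 3.0, 4.0))
print(store.get(b, 3))   # ('NODE', 3.0, 4.0)
```

Passing the field count to `get` is a simplification; a real layout would encode the record length or type tag in the record header.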
Automatic interpretation always needs human control and corrections; therefore it
is important to ensure fast display of the current recognition state on the monitor
screen. When displaying a DG-document, only straight line sections need to be drawn,
because all objects can be traced back to EDGE objects along “contains” and “defined
by” references. A color and line style is associated with each layer number (except for
the S layer, because sample objects are not displayed). To indicate recognition
hypotheses on the screen, edges in layer 0 are displayed according to the maximum
layer number in their scope.

5 Spatial Indexing
The above DG implementation ensures fast data access along references, but spatial
searches, for instance to find the nearest node to a given node, may be very slow. The
problem can be solved by spatial indexing (for an overview see [16]). There are two
main types of spatial indexes: tree structured indexes are based on hierarchical tiling
of the space (usually quadtrees are applied), while grid structured indexes use homo-
geneous grid tiling.
Although quadtrees have nice properties in general, in the case of drawing interpreta-
tion a grid index may be a better choice, for the following reasons. On the one hand,
drawing density is limited by readability constraints; as a consequence, the number of
objects in a grid cell is a priori bounded. On the other hand, map interpretation algorithms
normally use a fixed search window (for instance, when recognizing dashed lines). A grid
index with a cell size close to the search window size can work efficiently.
To discuss indexing techniques, we define the minimum bounding box (MBB) of
an object as the minimum enclosing rectangle whose edges are parallel to coordinate
axes. Considering the DG model, the MBB of an object z can be determined by com-
puting minimum and maximum coordinates of nodes in domainc(z). (MBB can be
defined also by domain(z), in this case the MBB of a TEXT object involves not only
the insertion point, but also the transformed vectors of the sample object.)
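Computing an MBB from the node coordinates reachable through "contains" references reduces to a min/max scan over those coordinates; a minimal 2D sketch (the point-list input stands in for the nodes of domainc(z)):

```python
def mbb(points):
    """Minimum bounding box of a set of 2D node coordinates (e.g. the nodes
    in domainc(z) of a DG object z): the smallest axis-parallel rectangle
    enclosing them, returned as (xmin, ymin, xmax, ymax)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

print(mbb([(1, 4), (3, 2), (2, 5)]))   # (1, 2, 3, 5)
```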
Fig. 4 shows a grid index example with 3 × 3 tiles, where a list of object id’s is cre-
ated for each grid cell. An id appears in the i-th list if the MBB of the object overlaps
Ci. In this way the same object id may appear in several lists.

C1: 1            C2: –        C3: 2
C4: 1, 3, 4, 6   C5: 6        C6: –
C7: 4, 5         C8: 5, 6     C9: 5

Fig. 4. Example of a grid index

Our grid index implementation [10] ensures the insertion of a new id in constant
time. As a consequence, a grid index for N objects can be generated in O(N) time,
which is better than the usual building time O(N⋅log N) for quadtrees.
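A sketch of this O(N) construction follows; the row-major cell numbering, the MBB encoding, and the uniform square cells are our illustrative choices (the paper's implementation details are in [10]). Each id is appended in constant time to the list of every cell its MBB overlaps.

```python
def build_grid_index(objects, cell, nx, ny):
    """Build a grid index in O(N) total time: each object id is appended
    (O(1) per cell) to the list of every grid cell its MBB overlaps.
    `objects` maps id -> MBB as (xmin, ymin, xmax, ymax); the grid has
    nx * ny square cells of side `cell`, numbered row-major from 0."""
    grid = [[] for _ in range(nx * ny)]
    for oid, (xmin, ymin, xmax, ymax) in objects.items():
        for i in range(int(xmin // cell), int(xmax // cell) + 1):
            for j in range(int(ymin // cell), int(ymax // cell) + 1):
                if 0 <= i < nx and 0 <= j < ny:
                    grid[j * nx + i].append(oid)   # same id may land in several cells
    return grid

# An MBB spanning two horizontally adjacent cells appears in both lists.
objs = {1: (0.5, 0.5, 1.5, 0.5)}
g = build_grid_index(objs, cell=1.0, nx=3, ny=3)
print(g[0], g[1])   # [1] [1]
```

A fixed-size search window then only needs to inspect the few cells it overlaps, which is the efficiency argument made above.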

6 Application: Interpretation of Cadastral Maps


The DG model has been applied to interpret Hungarian cadastral maps. The main
processing phases are sketched below (here we concentrate only on data modeling
aspects; for algorithmic details see [9, 10]).

1. Vectorization. A thinning-based vectorization algorithm converts the whole
scanned image into a set of vectors. In the following, N denotes the number of
generated vectors.
2. Creating topology. An initial DG-document (a NODE–EDGE graph) is gener-
ated from the set of vectors. This process takes O(N⋅log N) time (it is based on
sorting the nodes according to their coordinates, and unifying nodes of identical
coordinates).
3. Dashed line recognition. For each dashed line candidate a pat(edge1,..., edgen) is
created. If grid indexing is applied, then recognition can be performed in O(N)
time (see Section 5). If a dashed line candidate is accepted, then component edges
are replaced with a single EDGE object having a layer number assigned to dashed
lines.
4. Text recognition. String candidate PATs are created as shown in Example 1 of
Section 3. A 17-element feature vector is generated for each character PAT, and
recognition is performed by a feedforward neural network (for details see [9]).
Although vectorized symbols may have significant distortions as compared to the
raster image, the neural network can learn these distortions and can produce suc-
cessful recognition (Fig. 5).
5. Recognizing connection signs, as explained in Example 2 of Section 3.

Fig. 5. Automatic interpretation of Hungarian cadastral maps: a) raw vectorized image, b)
recognition result (without manual correction)

6. Recognizing small circle symbols (Fig. 1). A hypothesis pat(edge1,..., edgen) is
created for each small closed convex polygon. If accepted, edge1,..., edgen are de-
leted and the nodes of these edges are unified into a single TEXT object at the
center point of the polygon.
7. Drawing correction. The initial raw vectorization has typical anomalies at corners
and T-junctions (Fig. 5/a). Each of these anomalies is recognized as a pat(pat1,
pat2) where pat1 contains edges to be corrected and pat2 contains the new (cor-
rected) edges. When accepted, edges in pat1 are deleted; otherwise edges in pat2
should be deleted. These corrections are performed only on edges with empty
scope, that is, on edges that have not been recognized till now.
8. Recognizing buildings and parcels. A complex algorithm is applied, utilizing rec-
ognized connection signs, house numbers and parcel numbers [9]. A PAT object
is created for each building polygon and parcel polygon. If accepted, the PAT is
kept; otherwise it is deleted.

One recognition step takes only a few seconds for an entire map
sheet. This supports interactivity and makes it possible to run rather complex
algorithms in realistic time.

7 Conclusions
A universal graph-based data model has been introduced for graphics interpretation.
The same data structure is used
- to describe the original (raw vectorized) drawing,
- to describe and display the recognized drawing,
- to support the recognition process as well as manual corrections.
Our specification is independent of the recognition algorithms; it suggests an
interpretation methodology on the one hand and provides a technical background on the
other.
When applying an interpretation system in practice, a common difficulty is that the
user is not familiar with the underlying algorithms and data structures and therefore cannot
control the system optimally. We think that the basic ideas of the DG model
(with only four object types) are simple enough for the user to understand, and this
supports the efficiency of interactive work.

References
1. Boatto, L., Consorti, V., Buono, M., Zenzo, S., Eramo, V., Esposito, A., Melcarne, F., Meucci, M., Morelli, A., Mosciatti, M., Scarci, S., Tucci, M.: An Interpretation System for Land Register Maps. Computer 25(7), 25–33 (1992)
2. Chen, L.-H., Liao, H.-Y., Wang, J.-Y., Fan, K.-C., Hsieh, C.-C.: An Interpretation System for Cadastral Maps. In: Proc. of 13th Internat. Conf. on Pattern Recognition, pp. 711–715. IEEE Press, Los Alamitos (1996)
3. Delalandre, M., Trupin, E., Labiche, J., Ogier, J.M.: Graphical Knowledge Management in Graphics Recognition Systems. In: Brun, L., Vento, M. (eds.) GbRPR 2005. LNCS, vol. 3434, pp. 35–44. Springer, Heidelberg (2005)
4. Ebi, N.B.: Image Interpretation of Topographic Maps on a Medium Scale via Frame-Based Modelling. In: International Conference on Image Processing, vol. I, pp. 250–253. IEEE Press, California (1995)
5. Graph-based Representations in Pattern Recognition. Series of conference proceedings. LNCS, vol. 2726 (2003), vol. 3434 (2005), vol. 4538 (2007). Springer, Heidelberg (last three volumes)
6. Graphics Recognition (series). Selected papers of GREC workshops. LNCS, vol. 3088 (2004), vol. 3926 (2006), vol. 5046 (2008). Springer, Heidelberg (last three volumes)
7. Hartog, J., Kate, T., Gerbrands, J.: Knowledge-Based Segmentation for Automatic Map Interpretation. In: Kasturi, R., Tombre, K. (eds.) Graphics Recognition 1995. LNCS, vol. 1072, pp. 159–178. Springer, Heidelberg (1996)
8. Hoel, E., Menon, S., Morehouse, S.: Building a Robust Relational Implementation of Topology. In: Hadzilacos, T., Manolopoulos, Y., Roddick, J.F., Theodoridis, Y. (eds.) SSTD 2003. LNCS, vol. 2750, pp. 508–524. Springer, Heidelberg (2003)
9. Katona, E., Hudra, G.: An Interpretation System for Cadastral Maps. In: Proceedings of 10th International Conference on Image Analysis and Processing (ICIAP 1999), pp. 792–797. IEEE Press, Los Alamitos (1999)
10. Katona, E.: Automatic Map Interpretation. Ph.D. Thesis (in Hungarian), University of Szeged (2001)
11. Lladós, J., Sanchez, G., Marti, E.: A String-Based Method to Recognize Symbols and Structural Textures in Architectural Plans. In: Chhabra, A.K., Tombre, K. (eds.) GREC 1997. LNCS, vol. 1389, pp. 91–103. Springer, Heidelberg (1998)
12. Messmer, B.T., Bunke, H.: Automatic Learning and Recognition of Graphical Symbols in Engineering Drawings. In: Kasturi, R., Tombre, K. (eds.) Graphics Recognition 1995. LNCS, vol. 1072, pp. 123–134. Springer, Heidelberg (1996)
13. Niemann, H., Sagerer, G.F., Schröder, S., Kummert, F.: ERNEST: A Semantic Network System for Pattern Understanding. IEEE Trans. on Pattern Analysis and Machine Intelligence 12(9), 883–905 (1990)
14. Open Geospatial Consortium: Simple Features Specification for SQL – Version 1.1., http://www.opengeospatial.org
15. Qureshi, R.J., Ramel, J.Y., Cardot, H.: Graph Based Shapes Representation and Recognition. In: Escolano, F., Vento, M. (eds.) GbRPR 2007. LNCS, vol. 4538, pp. 49–60. Springer, Heidelberg (2007)
16. Samet, H.: Design and Analysis of Spatial Data Structures. Addison Wesley, Reading (1989)
17. Schavemaker, J.G.M., Reinders, M.J.T.: Information Fusion for Conflict Resolution in Map Interpretation. In: Chhabra, A.K., Tombre, K. (eds.) GREC 1997. LNCS, vol. 1389, pp. 231–242. Springer, Heidelberg (1998)
18. Suzuki, S., Yamada, T.: MARIS: Map Recognition Input System. Pattern Recognition 23(8), 919–933 (1990)
19. Vaxiviere, P., Tombre, K.: Celesstin: CAD Conversion of Mechanical Drawings. Computer 25(7), 46–54 (1992)
20. Wenyin, L., Dori, D.: From Raster to Vectors: Extracting Visual Information from Line Drawings. Pattern Analysis and Applications, 10–21. Springer, Heidelberg (1999)
Tracking Objects beyond Rigid Motion

Nicole Artner1, Adrian Ion2, and Walter G. Kropatsch2

1 Austrian Research Centers GmbH - ARC, Smart Systems Division, Vienna, Austria
nicole.artner@arcs.ac.at
2 PRIP, Vienna University of Technology, Austria
{ion,krw}@prip.tuwien.ac.at

Abstract. Tracking multiple features of a rigid or an articulated object,
without considering the underlying structure, becomes ambiguous
if the target model (for example color histograms) is similar to other
nearby regions or to the background. Instead of tracking multiple features
independently, we propose an approach that integrates the underlying
structure into the tracking process using an attributed graph. The
independent tracking processes are driven to a solution that satisfies the
visual as well as the structural constraints. An approach for rigid objects
is presented and extended to handle articulated objects consisting
of rigid parts. Experiments on real and synthetic videos show
promising results in scenes with a considerable amount of occlusion.

1 Introduction
Tracking multiple features belonging to rigid as well as articulated objects is
a challenging task in computer vision. Features of rigid parts can change their
relative positions due to variable detection precision, or can become occluded. To
solve this problem, one can consider using part-based models that are tolerant to
small irregular shifts in relative position - non-rigid motion, while still imposing
the global structure, and that can be extended to handle articulation.
One possibility to solve this task is to describe the relationships of the parts
of an object in a deformable configuration - a spring system. This has already
been proposed in 1973 by Fischler et al. [1]. Felzenszwalb et al. employed this
idea in [2] to do part-based object recognition for faces and articulated objects
(humans). Their approach is a statistical framework minimizing the energy of
the spring system learned from training examples using maximum likelihood
estimation. The energy of the spring system depends on how well the parts
match the image data and how well the relative locations fit into the deformable
model. Ramanan et al. apply in [3] the ideas from [2] in tracking people. They
model the human body with colored and textured rectangles, and look in each
frame for likely configurations of the body parts. Mauthner et al. present in [4]
an approach using a two-level hierarchy of particle filters for tracking objects
described by spatially related parts in a mass spring system.

Partially supported by the Austrian Science Fund under grants P18716-N13 and
S9103-N13.

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 82–91, 2009.
© Springer-Verlag Berlin Heidelberg 2009
In this paper we employ spring systems, but in comparison to the related work
we try to stress solutions that emerge from the underlying structure, instead of
using structure to verify statistical hypotheses. The approach presented here
refines the concepts in [5] and extends them to handle articulation. Initial thoughts
related to this work have been presented in the informal workshop [6]. The aim
is to successfully track objects, consisting of one or more rigid parts, undergo-
ing non-rigid motion. Every part is represented by a spring system encoding
the spatial relationships of the features describing it. For articulated objects,
the articulation points are found through observation of the behavior/motion of
the object parts over time. The articulation points are integrated into the spring
systems as additional distance constraints of the parts connected to them.
Looking at related work in a broader field, the work done in tracking and
motion analysis is also related to our approach. There is a vast amount of work
in this field, as can be seen in the surveys [7,8,9,10]; mentioning all of it would
go beyond the scope of this paper. Interestingly, early works date back to the
seventies, when Badler and Smoliar [11] discussed different approaches to represent
the information concerning and related to the movement of the human body (as an
articulated object).
The paper is organized as follows: Sec. 2 introduces tracking rigid parts with
a spring system. In Sec. 3 this concept is extended to tracking articulated ob-
jects. Experiments on real and synthetic videos and a discussion are in Sec. 4.
Conclusion and future plans can be found in Sec. 5.

2 Tracking Rigid Parts

To identify suitable features of a rigid object, the Maximally Stable Extremal
Regions (MSER) detector [12] is used to extract regions in a manually delineated
region of interest. An attributed graph (AG) represents the structural
dependencies. It is created by associating a vertex to each region. The corresponding
3D color histograms of the underlying regions are the attributes of the vertices.
In this approach, a Delaunay triangulation is employed to insert edges between
the vertices (color regions) and to define the spatial relationships between the
regions. A triangulation can model a rigid structure just by imposing distance
constraints between connected vertices. On each vertex of the AG, a feature tracker,
in our case the Mean Shift tracker [13], is initialized and the color histograms of
the vertices in the initial state become the target models q̂. During object tracking
the color histograms of the AG and “spring-like” edge energies of the structure are
used to carry out gradient descent energy minimization on the joint distribution
surface (color similarity and structure).
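The construction above can be sketched in a few lines. The following is a minimal illustration, not the authors' code: it assumes the MSER regions have already been reduced to centroid coordinates and normalized color histograms (the detection itself is not shown), and uses SciPy's Delaunay triangulation to insert the edges; the initial edge lengths are stored as the rest lengths of the "springs".

```python
import numpy as np
from scipy.spatial import Delaunay

def build_attributed_graph(centroids, histograms):
    """Build the attributed graph (AG) of a rigid part.

    centroids  -- (n, 2) array of region center points
    histograms -- list of n color histograms (the vertex attributes)
    Returns (vertices, rest), where rest maps each edge to its initial length.
    """
    tri = Delaunay(centroids)
    edges = set()
    for simplex in tri.simplices:            # each triangle contributes 3 edges
        for i in range(3):
            a, b = sorted((int(simplex[i]), int(simplex[(i + 1) % 3])))
            edges.add((a, b))
    # the initial edge lengths act as the rest lengths of the spring system
    rest = {(a, b): float(np.linalg.norm(centroids[a] - centroids[b]))
            for a, b in edges}
    vertices = [{"pos": centroids[i], "hist": histograms[i]}
                for i in range(len(centroids))]
    return vertices, rest
```

Deviations of the current edge lengths from these rest lengths are what drive the relaxation described in Sec. 2.1.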

2.1 Realizing the Spring System by Graph Relaxation

The objective is to link the processes of structural energy minimization of the
graph and color histogram similarity maximization by Mean Shift tracking.
Graph relaxation is one possibility to realize the spring system. It introduces a
mechanism, which imposes structural constraints on the mode seeking process


of Mean Shift. As the tracked objects are rigid, the objective of the relaxation is
to maintain the tracked structure as similar as possible to the initial structure.
Thus the aim of graph relaxation is to minimize the total energy of the structure.
The variations of the edge lengths in the AG and their directions are used to
determine a structural offset for each vertex. This offset vector is the direction
where a given vertex should move such that its edges restore their initial length
and the energy of the structure is minimized. This structural offset vector O is
calculated for each vertex v as follows:

                    O(v) = Σ_{e ∈ E(v)} k · (|e′| − |e|)² · (−d(e, v)),     (1)

where E(v) is the set of all edges e incident to vertex v, k is the elasticity constant of
the edges in the structure, |e| is the edge length in the initial state and |e′| the length at a
different point in time, and d(e, v) is the unit vector in the direction of edge e
that points toward v. Fig. 1 shows two simple examples for graph relaxation.
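As a sketch, Eq. (1) can be evaluated per vertex as follows (positions and rest lengths as plain dictionaries; the function name is ours, and the squared term is reproduced verbatim from Eq. (1) as printed):

```python
import numpy as np

def structural_offset(v, positions, rest, k=0.2):
    """Structural offset vector O(v) of Eq. (1).

    v         -- vertex index
    positions -- dict: vertex index -> current 2D position
    rest      -- dict: edge (a, b) -> initial edge length |e|
    k         -- elasticity constant of the edges
    """
    offset = np.zeros(2)
    for (a, b), length0 in rest.items():
        if v not in (a, b):
            continue                         # only edges incident to v
        other = b if v == a else a
        d = positions[v] - positions[other]
        length = np.linalg.norm(d)           # current edge length |e'|
        d_unit = d / length                  # unit vector pointing toward v
        # squared length deviation pushed along -d(e, v), as in Eq. (1)
        offset += k * (length - length0) ** 2 * (-d_unit)
    return offset
```

For a single edge stretched from rest length 1 to length 2, the offset on each endpoint has magnitude k and points toward the other endpoint, shortening the edge back toward its rest length.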

Fig. 1. Graph relaxation examples. B is the initial state of the vertex and B′ the
deformed one. The arrows visualize the structural offset vectors O(B′).

2.2 Combining Iterative Tracking and Graph Relaxation

For every frame, Mean Shift and structural iterations are performed until a maximum
number of iterations ε_i is reached, or the graph structure attains equilibrium,
i.e. its total energy is beneath a certain threshold ε_e (see Algorithm 1).
The ordering of the regions during the iterations of the optimization process
depends on the correspondence between the candidate models p̂ of the regions in
the current frame and the target models q̂ from the initialization. Both models
are normalized 3D color histograms in the RGB color space. The similarity
between the models can be determined by the Bhattacharyya coefficient
                    B = Σ_{u=1}^{m} √(p̂_u · q̂_u).     (2)

For more details on the Bhattacharyya coefficient see [13]. The regions are
ordered descending by the Bhattacharyya coefficient, so that the iterations
start with the most confident regions.
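A direct transcription of Eq. (2) and of this ordering rule (function names are ours; histograms are assumed normalized and given as flat arrays, equivalent to flattened 3D RGB histograms):

```python
import numpy as np

def bhattacharyya(p_hat, q_hat):
    """Bhattacharyya coefficient between two normalized histograms, Eq. (2)."""
    return float(np.sum(np.sqrt(np.asarray(p_hat) * np.asarray(q_hat))))

def region_order(candidates, targets):
    """Return region indices ordered by descending confidence B."""
    scores = [bhattacharyya(p, q) for p, q in zip(candidates, targets)]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
```

Identical candidate and target histograms yield B = 1; the more a region's candidate model drifts from its target model, the later it is processed.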
To compute the position of each region (vertex in the AG), the Mean Shift offset and
the structure-induced offset are combined using a mixing coefficient

                    g = 0.5 − (B − 0.5).     (3)


g weights the structural offset vector and 1 − g the offset of Mean Shift. This
gain ensures that the offset vector of Mean Shift has a greater influence on the
resulting offset vector if the Bhattacharyya coefficient B is high, meaning that
candidate and target model are similar. If the Bhattacharyya coefficient is low,
the gain leads to an increased influence of the structural offset.
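The per-vertex position update can then be sketched as follows (assuming the Mean Shift offset and the structural offset of Eq. (1) are already available; the function name is ours):

```python
import numpy as np

def mix_offsets(mean_shift_offset, structural_offset, B):
    """Combine the Mean Shift and structural offsets with the gain of Eq. (3)."""
    g = 0.5 - (B - 0.5)          # equals 1 - B: low similarity -> trust structure
    ms = np.asarray(mean_shift_offset, dtype=float)
    st = np.asarray(structural_offset, dtype=float)
    # g weights the structural offset, (1 - g) the Mean Shift offset
    return (1.0 - g) * ms + g * st
```

In the extreme cases, B = 1 reproduces the pure Mean Shift offset and B = 0 the pure structural offset.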

3 Imposing Articulation
Articulated motion is a piecewise rigid motion, where the rigid parts conform
to the rigid motion constraints, but the overall motion is not rigid [10]. An
articulation point connects several rigid parts. The parts can move independently
of each other, but their distance to the articulation point remains the same. This
paper considers articulation in the image plane (1 degree of freedom).
As described in Sec. 2, the rigid parts of an articulated object are tracked
combining the forces of the deterministic tracker and the graph structure. To
integrate articulation, two vertices of each rigid part are connected with the
common articulation point1 . These two reference vertices constrain the distance
of all other vertices of the same part to the articulation point. The reference
vertices are directly influenced by the articulation point and propagate the “in-
formation” from the other connected parts during tracking.
Each rigid part is iteratively optimized as explained in Sec. 2 and for artic-
ulated objects the articulation points are integrated into this process through
their connection to the reference vertices.
Important features of the structure of an object do not necessarily correspond
to easily trackable visual features, e.g. articulation points can be occluded, or
can be hard to track and localize. Articulation points are thus not associated to
a tracked region (as opposed to tracked features of the rigid parts). The position
of the articulation points is determined in an initial frame (see Sec. 3.1) and
used in the rest of the video (see Sec. 3.2).

3.1 Determining the Articulation Point

For discrete time steps the motion of rigid parts connected by an articulation
point can be modeled by considering rotation and translation separately:

                    p′ = translate(rotate(p, c, θ), o),

where p = (x, y) is the vertex at time t and p′ = (x′, y′) is the same vertex at
time t + δ. p′ is obtained by first rotating p around c = (x_c, y_c) with angle θ
and then translating it with offset o = (x_o, y_o). More formally,

                    p′ = (R ∗ (p − c) + c) + o,     (4)

where R is the 2D rotation matrix with angle θ.
1 One could consider connecting all points of a part, but this would unnecessarily
increase the complexity of the optimization process.
To compute the position of c at time t it is enough to know the positions of
two rigid parts A and B. Each of them is represented by two reference vertices,
at times t and t + δ: p_i, p_i′, 0 < i ≤ 4, where p_i is the position of a vertex at
time t and p_i′ is the position at time t + δ. The vertices of part A are identified
by i ∈ {1, 2} and those of B by i ∈ {3, 4}. The previous relations produce a system
of eight equations in eight unknowns: x_c, y_c, x_o, y_o, sin(θ_A), cos(θ_A), sin(θ_B),
cos(θ_B), where θ_A and θ_B are the rotation angles of the two parts.
The position of the articulation point c is computed in the first frames and
used further on as mentioned below.
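Under the model of Eq. (4), the eight-equation system can be solved numerically with a generic root finder. The sketch below is our own formulation, not the authors' solver: a small nonzero initial rotation guess avoids a singular Jacobian at the identity, and the solution is unique only if the two parts rotate by different angles.

```python
import numpy as np
from scipy.optimize import fsolve

def _rot(s, c):
    """2D rotation matrix from sin/cos values."""
    return np.array([[c, -s], [s, c]])

def find_articulation_point(pA, pA2, pB, pB2):
    """Solve for the articulation point c shared by two rigid parts.

    pA, pB   -- (2, 2) arrays: the two reference vertices of parts A and B at time t
    pA2, pB2 -- the same vertices at time t + delta
    Model, Eq. (4): p' = R * (p - c) + c + o, with shared c and o but
    part-specific rotation angles theta_A and theta_B.
    """
    def equations(z):
        xc, yc, xo, yo, sA, cA, sB, cB = z
        c = np.array([xc, yc])
        o = np.array([xo, yo])
        res = []
        for p, p2, s, co in ((pA[0], pA2[0], sA, cA), (pA[1], pA2[1], sA, cA),
                             (pB[0], pB2[0], sB, cB), (pB[1], pB2[1], sB, cB)):
            res.extend(_rot(s, co) @ (p - c) + c + o - p2)   # 2 equations per vertex
        return res
    # small nonzero initial rotations avoid a singular Jacobian at the identity
    z0 = np.array([0.0, 0.0, 0.0, 0.0, 0.1, 1.0, -0.1, 1.0])
    sol = fsolve(equations, z0)
    return sol[:2]                                           # the articulation point c
```

On exact synthetic data (two parts rotating by different angles about a common point, plus a shared translation), the root finder recovers c.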

3.2 Integration into Spring System

To derive the position of the articulation point in each frame of the video, the
following procedure is applied. In the frame in which the position of the articulation
point was computed (see Sec. 3.1), a local coordinate system is created for
each adjacent rigid part and aligned to the corresponding reference vertices. Fig. 2
illustrates this concept: p_1, p_2, c, X, Y are the tracked vertices, articulation
point (rotation center) and coordinate system at time t; p_1′, p_2′, c′, X′, Y′
at time t + δ; o is the offset (translation), and θ is the rotation angle. The positions
of the articulation point in the local coordinate systems of each connected
part are determined and associated to the respective rigid part.
In every frame, having the tracked reference vertices enables determining the
local coordinate system and the position of the articulation point. For each
frame Algorithm 1 is executed. When determining the current position of the
articulation point (line 13 in Algorithm 1), the hypotheses of the adjacent parts
for the position of the articulation point are combined using the gain a:

                    a_i = Z_i / Σ_{k=1}^{m} Z_k,   Z_i = Σ_{j=1}^{v_i} B_ij     (5)

where Z_i is the sum of all Bhattacharyya coefficients (see Eq. 2) of part i with v_i
regions/vertices, m is the number of adjacent parts, and a_i is the gain for part

Fig. 2. Encoding and deriving of an articulation point in the local coordinate system,
during two time steps: t and t + δ
Algorithm 1. Algorithm for tracking articulated objects

1: processFrame
     ε_e: threshold on the total energy of the structure
     ε_i: threshold on the maximum number of iterations
2:   i ← 1                                    ▹ iteration counter
3:   while (i < ε_i and E_t > ε_e) do
4:     for every rigid part do
5:       define region order depending on B
6:       for every region do
7:         do Mean Shift iteration
8:         do structural iteration
9:         calculate mixing gain g (Eq. 3)
10:        mix offsets depending on g and set new position
11:      end for
12:    end for
13:    calculate current position of articulation point (Eq. 5)
14:    for every rigid part do
15:      define region order depending on B
16:      for every region do
17:        do Mean Shift iteration
18:        do structural iteration including articulation point
19:        calculate mixing gain g
20:        mix offsets depending on g and set new position
21:      end for
22:    end for
23:    i ← i + 1
24:    E_t ← determine total energy of spring system
25:  end while
26: end

i, weighting its influence on the position of the articulation point. a_i depends on
the correspondence of the color regions of a rigid part with the target models q̂ of
these regions from the initial frame. This results in high weights for the hypotheses
of parts which are confident (e.g. not occluded).
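Eq. (5) amounts to normalizing the per-part confidences; a direct transcription (function name ours):

```python
import numpy as np

def part_gains(B_per_part):
    """Gains a_i of Eq. (5) from per-region Bhattacharyya coefficients.

    B_per_part -- list of lists; B_per_part[i][j] is the Bhattacharyya
                  coefficient B_ij of region j of rigid part i.
    """
    Z = np.array([sum(bs) for bs in B_per_part])  # confidence Z_i of each part
    return Z / Z.sum()                            # normalized gains a_i
```

A fully visible part with high per-region similarities thus dominates the combined hypothesis for the articulation point, while an occluded part contributes little.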

4 Experiments

The following experiments show sequences with one articulation point. More
articulation points can be handled by pairwise processing of all adjacent rigid
parts (a more efficient strategy is planned). In all experiments we employ a priori
knowledge about the structure of the target object (number of rigid parts
and articulation points). A method like the one in [14] could be used to automatically
delineate rigid parts and articulation points of an object. The elasticity
constant k (see Eq. 1) is set to 0.2 for all experiments (this value was selected
empirically).
Experiment 1: Fig. 3 shows an experiment with a real video sequence, where
the challenge is to track the partly non-rigid motion of the pattern on the t-shirt.
The pattern is not only translated and rotated, but also squeezed and
expanded (crinkles of the t-shirt). The idea behind this experiment is to show
how the proposed approach handles independent movement of the features of a
single rigid part. As can be seen in the second row of Fig. 3 and Tab. 1, Mean
Shift combined with structure is superior to Mean Shift alone. The graphs in
Fig. 3 (and all other experiments) provide visual support to easily see the
spatial arrangement of the tracked regions. In the results without the spring
system there is no inter-relation between the trackers, and the graphs show the
deformation of the structure of the object.

Fig. 3. Experiment 1. Tracking non-rigid motion without (top row) and with structure
(bottom row); frames 25, 70, 120 and 180 are shown. Frame 25 in the bottom row shows
how the graph should look.

Table 1. Sum of spatial deviations in pixels from ground truth for experiment 1

Frame                                    25      70      120     180
spatial deviation without structure    122.18  152.66  269.86  196.96
spatial deviation with structure        58.99   66.49  140.64  124.96

Experiment 2: In experiment 2 the task is to track scissors through partial
occlusions. Since the employed Mean Shift tracker tracks color regions, it was
necessary to put color stickers on the scissors to create features to track. Fig. 4
shows that the additional information provided by the structure helps to successfully
overcome the occlusion. Without the support of the spring system the Mean Shift
trackers mix up the regions.

Experiment 3: In the following experiment (see Fig. 5) a synthetic sequence is
used to accurately analyze the behavior of the approach. The synthetic pattern
contains 7 color regions (region size: height 10 to 20 pixels, width 10 to 20
pixels) and is 50 × 100 pixels; the occlusion is 100 × 100 pixels. The patterns are
Fig. 4. Experiment 2 (frames 266, 461 and 628; frames 266, 274 and 286). Top row:
with structure and articulation point. Bottom row: without structure. The red star-like
symbol represents the estimated articulation point.

Fig. 5. Experiment 3 (frames 7, 9, 11 and 12). Top row: without articulation point;
bottom row: with articulation point.

translated by an x-offset of 6 pixels per frame and rotated by 4 degrees. Due to
the big movement between the frames and the full occlusion of the left pattern in
frame 8, tracking the patterns separately fails. Using the estimated articulation
point, it is possible to successfully track the regions through this sequence. The
distance constraint imposed by the articulation point is the reason why, even
under big to full occlusions, the positions of the occluded regions
can be reconstructed without visible features. Fig. 6 shows the deviation from
ground truth for experiment 3. We ran several of these synthetic experiments and
found that tracking including the articulation point is in all cases superior
to tracking the parts separately.

4.1 Discussion
The Mean Shift tracker fits very well into our approach as the spring system
optimization is also iterative, and we are able to re-initiate Mean Shift at any
given state of a vertex in the spring system. Another tracker with the same
Fig. 6. Spatial deviation for each region: (a) without and (b) with articulation point.
The big deviations are a result of the full occlusion in frame 8 in Fig. 5.

properties could also be used. As tracking with Mean Shift is used to solve the
association task (avoiding complex graph matching), the success of this approach
is highly dependent on the results of the trackers. It is necessary that at least
part of the vertices of the spring system can be matched.
The current approach extends the rigid structure to handle articulation. This
only imposes a distance constraint and does not consider any information related
to the motion of the parts. During an occlusion the articulation point improves
the reconstruction of the positions of the occluded regions. Nevertheless, the
distance constraint brought in by the articulation point is not always enough
to successfully estimate the positions (it is sufficient for translations, but not
for rotations of parts). For example if one of two rigid parts of an object is
completely occluded and there is a big rotation of the occluded part between
adjacent frames this approach may fail.
At the moment the two reference vertices are selected without any special criterion.
Possible criteria are the connectivity of the vertices or their visual support.

5 Conclusion

This paper presents a structural approach for tracking objects undergoing non-
rigid motion. The focus lies on the integration of articulation into the spring
systems describing the spatial relationships between features of the rigid parts
of an object. The position of the articulation points is derived by observing
the movements of the parts of an articulated object. Integrating the articulation
point into the optimization process of the spring system leads to improved track-
ing results in videos with big transformations and occlusions. A weakness of this
approach is that it cannot deal with big rotation during occlusions. Therefore,
we plan to consider higher level knowledge like spatio-temporal continuity to
observe the occluded part reappearing around the borders of the visible occlud-
ing object. Another open issue is dealing with scaling and perspective changes.
Future work is also to cope with pose variations and the resulting changes in the
features representing the object.

References
1. Fischler, M.A., Elschlager, R.A.: The representation and matching of pictorial structures. IEEE Transactions on Computers 22, 67–92 (1973)
2. Felzenszwalb, P.F., Huttenlocher, D.P.: Pictorial structures for object recognition. IJCV 61, 55–79 (2005)
3. Ramanan, D., Forsyth, D.: Finding and tracking people from the bottom up. In: Proceedings of the 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 2, pp. II-467–II-474 (June 2003)
4. Mauthner, T., Donoser, M., Bischof, H.: Robust tracking of spatial related components. In: ICPR, pp. 1–4. IEEE, Los Alamitos (2008)
5. Artner, N., Mármol, S.B.L., Beleznai, C., Kropatsch, W.G.: Kernel-based tracking using spatial structure. In: 32nd Workshop of the AAPR, OCG, pp. 103–114 (May 2008)
6. Artner, N.M., Ion, A., Kropatsch, W.G.: Tracking articulated objects using structure (accepted). In: Computer Vision Winter Workshop 2009, PRIP, Vienna University of Technology, Austria (February 2009)
7. Gavrila, D.M.: The visual analysis of human movement: A survey. Computer Vision and Image Understanding 73(1), 82–98 (1999)
8. Moeslund, T.B., Hilton, A., Krüger, V.: A survey of advances in vision-based human motion capture and analysis. Computer Vision and Image Understanding 104(2–3), 90–126 (2006)
9. Aggarwal, J.K., Cai, Q.: Human motion analysis: A review. Computer Vision and Image Understanding 73(3), 428–440 (1999)
10. Aggarwal, J.K., Cai, Q., Liao, W., Sabata, B.: Articulated and elastic non-rigid motion: A review. In: IEEE Workshop on Motion of Non-Rigid and Articulated Objects, pp. 2–14 (1994)
11. Badler, N.I., Smoliar, S.W.: Digital representations of human movement. ACM Computing Surveys 11(1), 19–38 (1979)
12. Matas, J., Chum, O., Urban, M., Pajdla, T.: Robust wide baseline stereo from maximally stable extremal regions. Image and Vision Computing 22(10), 761–767 (2004)
13. Comaniciu, D., Ramesh, V., Meer, P.: Kernel-based object tracking. PAMI 25(5), 564–575 (2003)
14. Mármol, S.B.L., Artner, N.M., Ion, A., Kropatsch, W.G., Beleznai, C.: Video object segmentation using graphs. In: Ruiz-Shulcloper, J., Kropatsch, W.G. (eds.) CIARP 2008. LNCS, vol. 5197, pp. 733–740. Springer, Heidelberg (2008)
Graph-Based Registration of Partial Images of City Maps Using Geometric Hashing

Steffen Wachenfeld, Klaus Broelemann, Xiaoyi Jiang, and Antonio Krüger

Department of Mathematics and Computer Science, University of Münster, Germany

Abstract. In this paper, we present a novel graph-based approach for
the registration of city maps. The goal is to find the best registration
between a given image, which shows a small part of a city map, and
stored map data. Such registration is important in fields like mobile
computing for augmentation purposes. Until now, RFID tags, markers,
or regular dot grids on specially prepared maps have typically been required.
In this paper we propose a graph-based method that avoids the need for
special maps. It creates a graph representation of a given input image
and robustly finds an optimal registration using a geometric hashing
technique. Our approach is translation, scale and rotation invariant, map
type independent, and robust against noise and missing data.

1 Introduction and Related Work


In this paper, we present a novel graph-based approach for the registration of
city maps. The goal is to find the best registration between a given image, which
shows a small part of a city map, and stored map data. Such registration is
very important in fields like mobile computing, where mobile camera devices
are used to take images or videos of paper-based city maps, which are then
augmented with additional information. Using georeferenced mobile devices for
the augmentation of paper-based city maps is an excellent proof of concept for
the so called toolglass and magic lens principle that was introduced by Bier
in 1993 [2]. Many different applications [3,6,7,9,10,11] have realized the magic
map lens concept using different technologies. The motivation is to combine
the advantages of a high-resolution paper map with the dynamics and
up-to-dateness of a movable display.
Reilly et al. [6] use maps fitted with an array of RFID tags that have the
disadvantage of low spatial resolution and high production costs. Schöning et
al. [11] use a classical augmented reality approach by attaching optical markers
on the map, which occlude large parts of the map. The approach of Rohs et al. [8]
needs a regular dot grid as optical marker (see Figure 1). Despite their differences
all these applications require additional infrastructure on the map, for example
RFID tags or visual markers. The optimal solution would be an image based
registration without any modification of the map. Wagner et al. [13] present an
adaptation of SIFT [4] and FERN [5] to perform a registration of the image. Still,
this method uses image-based descriptors and requires exact a priori knowledge
about the map image.

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 92–101, 2009.

c Springer-Verlag Berlin Heidelberg 2009
Fig. 1. Existing approaches to track a mobile-camera device over paper maps: Visual
markers (left), grid of black dots (middle), application scenario (right)

Our interest in this work is to explore the potential of graph-based approaches
to the task of map registration. We propose a graph-based method which creates
a graph representation of a given input image and robustly finds an optimal
registration using a geometric hashing technique. This enables us to match maps
of arbitrary types against ground truth data that is given in an abstract form, e.g.
in a vector representation like the one used for navigation systems. Our approach
is translation, scale and rotation invariant, map type independent and robust
against noise and missing data.

2 Registration Algorithm

In this section, we present our algorithm for the registration of city maps. Our
algorithm stores city map data by generating a graph representation and saving
it as a geometric hash. The goal of our algorithm is to compute a registration
between stored city maps and images of arbitrary parts of city maps, which may
be translated, rotated and scaled. The matching is performed using geometric
hashing and is divided into an offline phase and an online phase. Figure 2 gives
an overview of the two phases and the corresponding steps.
The offline phase is used to efficiently store given city maps in the form of
a geometric hash, which can be used later with high efficiency, e.g. on mobile
devices. Our algorithm can extract and store information from different kinds
of city maps. To extract information from maps, which are given in the form
of images, we use a map-dependent preprocessing that creates the graph repre-
sentation. To extract information from maps which consist of vector data, e.g.
data for navigation systems, the algorithm transforms the vector data into the
appropriate graph representation. From this graph representation, a geometric
hash is created.
The online phase is a query phase, where a part of a map is presented to
the algorithm with the goal to find the best registration between this map part
and the stored map data. From the map part to be registered, a graph repre-
sentation is generated and used as query. Such map parts may be images from
low-resolution cameras, e.g. from camera phones. We use special preprocessing
to create the graph representations from such low quality images of map parts.
Due to the use of geometric hashing, this registration is completely translation,
scale and rotation invariant and robust to noise, small perspective distortions,
and occlusions or missing data.
94 S. Wachenfeld et al.

Fig. 2. Overview of the two phases of our algorithm and their corresponding steps

The result of the online phase is a registration of the smaller part onto one
of the stored larger maps. This registration implicitly leads to a transformation
function between the coordinate systems of the two maps. This allows for one of
the main applications, which is an overlay function for mobile devices. A camera
phone can be used to take an image or a video of a part of a map. The part
of the map which is visible in this image or video, is registered and additional
information is overlayed at the corresponding positions, e.g. locations of WLAN
spots, cash machines or other points of interest.
The following three subsections will present the algorithm in detail. First, we
will explain our preprocessing steps, which transform an image of a city map
into a graph representation. Then, we will explain how the geometric hash is
created from the graph representations of city maps. In the last subsection we
show how the graph of the query image is used to find the best registration and
to compute the transformation function.

2.1 Preprocessing

As already mentioned, the goal of the preprocessing is to generate a graph
representation from map images or from vector data. The latter case is a
format-dependent mathematical transformation, which will not be covered here.
Instead, we will focus on the generation of graph representations from (partial)
map images. City maps may differ a lot concerning the colors and line types
used for streets, buildings and so on. Our registration algorithm relies on the
structure of the streets. In contrast to other approaches which use statistical de-
scriptors of the map image, the use of structure information makes our algorithm
independent of the map types to a great extent.
The preprocessing is performed in three steps (see Figure 4 for an illustration),
which will be explained in the following. The first step creates a binary image,
where streets are foreground and non-streets are background. The second step
computes the skeleton of the binary street image. From the skeleton, the graph
representation is computed in the third and last preprocessing step.

Street Detection. The intention of the street detection step is to localize and
to extract streets from background. This step is not necessary for vector data,
as already mentioned. But the standard input are map images, which can be
of different types. Maps from different sources generally use different colors.

Fig. 3. Three maps of Orlando: Google Map (a), Map24 (b), OpenStreetMap (c)


Fig. 4. From a map image to a graph representation: Map image (Google style) (a),
binary street map with skeleton (b), resulting graph representation (c)

Figure 3 shows the color differences between three example maps for the same
part of the town Orlando.
To localize the streets of a certain map type, we use a specific color profile.
If, for example, our algorithm shall localize streets of a Google map, it uses a
specific Google Map color profile. This profile defines specific shades of yellow,
orange and white as streets, while specific shades of green, blue, as well as light
and dark gray are defined as background. Pixels of a given image are classified to
be foreground or background according to their distance to the specified colors.
The classification result is an intermediate binary image. Due to noise this image
may contain smaller holes or touching streets, which shall not be connected.
Also, larger holes may occur, where text was in the original map. This step is
not completely map-type independent, though it works with different map types
simply by replacing the profile data.
Onto this intermediate binary image, we apply morphological opening and
closing operations to close small holes and to remove noise. Larger holes, e.g.
from text in the map image, are not closed by morphological operations, as using
larger closing masks leads to unwanted connections of streets. We close larger
holes or remove isolated areas of foreground pixels by investigating the size of
connected components. This way, we get a satisfying binary image, where streets
are foreground and non-streets are background.
The input image may result from a low quality camera of a mobile device,
such as a camera phone or a PDA. This leads to extra noise and inhomogeneous
illumination. While the noise is, in our experience, not critical, inhomogeneous
illumination leads to misclassifications. We divide the image into 9 larger regions
(3 × 3) and 64 smaller regions (8 × 8) and use local histograms to determine
illumination changes. We then adapt the intensity of the expected colors for the
classification step. This way, we get a satisfying binary image even for noisy and
inhomogeneously illuminated camera images (see Figure 6 for an example).

Skeletonization. The second step is to build a skeleton from the binary street
image. This means to slim the streets to the width of one pixel. Figure 4b
shows an example of a binary street map and the resulting skeleton. This step
is relatively straightforward and thus not described in detail due to the space
limitation.

Graph Computation. The graph can easily be created from the skeleton by fol-
lowing the skeleton from node to node. At crossings of larger roads, multiple
nodes may result which have to be merged. Edges between two nodes have to
be significantly longer than the widths of the corresponding streets, otherwise
they will be merged. Figure 4c shows a resulting graph. Remember that edges
represent the existence of street connections but not their shape.
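The merging of nearby crossing nodes can be sketched with a union-find pass over short edges. The data layout and threshold here are illustrative assumptions, not the authors' implementation:

```python
import math

def merge_close_nodes(nodes, edges, min_dist):
    """nodes: list of (x, y); edges: list of (i, j) index pairs.
    Nodes joined by an edge shorter than min_dist (e.g. multiple
    skeleton nodes at one large-road crossing) are merged via
    union-find; surviving edges are re-expressed over representatives."""
    parent = list(range(len(nodes)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in edges:
        if math.dist(nodes[i], nodes[j]) < min_dist:
            parent[find(i)] = find(j)

    merged_edges = set()
    for i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:
            merged_edges.add((min(ri, rj), max(ri, rj)))
    return merged_edges

# A crossing split into two nearby nodes 0 and 1, each with a street:
nodes = [(100, 100), (103, 100), (100, 0), (200, 100)]
edges = [(0, 1), (0, 2), (1, 3)]
print(sorted(merge_close_nodes(nodes, edges, min_dist=10)))  # [(1, 2), (1, 3)]
```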

2.2 Offline Hash Generation

To be independent of the map colorization and thereby enable the use of different
map types for offline and online phase, we compare structural information using
geometric hashing. Geometric hashing is a well-known method in computer vision
for matching objects that have undergone transformations or when only partial
information is available (see Wolfson and Rigoutsos [14] for an overview). The idea is to use a
computationally expensive offline phase to store geometric map data in the form
of a hash. Later, during the online phase, the hash can be accessed in a fast
manner to find a best matching result for given query data.
In the preprocessing step, we have extracted the geometric features of the
city map. The result of the preprocessing is a graph representation, which is
completely map type independent. Two graphs from two different maps of the
same location will most probably look similar (crossings are nodes and streets
are edges).
The hash which is going to be created is a 2D plane which will hold information
about node positions of transformed (translated, scaled, and rotated) graphs. It
can be visualized as a plane full of dots, which represent node positions. The
hash for a city map is created by transforming the graph representation of the
map many times. This is similar to the generalized Hough transform (see [1])
with the difference that our set of transformations also contains translations and
is generated from the graph. The 2D positions of the graph’s nodes build one
hash plane for each transformation. The final hash represents the information of
all hash planes in an efficient way.
Hash planes are created by selecting an edge e and by translating, rotating and
scaling the whole map, so that one of the two nodes which belong to edge e is pro-
jected onto position x1 = (−1, 0) and the other one onto position x2 = (1, 0). This
transforms edge e into an edge between x1 and x2 of two unit lengths. All other
node positions undergo the same transformation and build the hash plane he .
To yield rotation and scale invariance, the original map is scaled and rotated
many times, one time for each edge. Thus, the hash consists of multiple hash
planes, one for each edge. If the two nodes of an edge are very close or very far
from each other, extremely large or small scales result respectively. Extremely
small scales lead to agglomerations of projected nodes in the hash and extremely
large scales lead to an error amplification. As a consequence, our algorithm
prefers edges which approximately have a preferred length d∗ . Also, if the nodes
of a selected edge are located near the borders of the image, the resulting hash
plane will have large empty areas (i.e. the upper or lower half). To avoid such
heavily unused hash space, we prefer to select edges from the image’s center c.
We use the best-rated edges e = (n_1, n_2), according to the rating function

\[
r(e) = \underbrace{d(n_1, n_2) \cdot \exp\left(\frac{-d(n_1, n_2)}{d^*}\right)}_{\text{prefers } d(n_1, n_2) = d^*} \cdot \underbrace{\frac{1}{d(c, n_1) + d(c, n_2) + 1}}_{\text{prefers nodes near the center } c}
\]
where d(a, b) is the distance between a and b, d∗ is the preferred length, and c
is the image center.
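The rating function translates directly into code; the distance function and example coordinates below are only for illustration:

```python
import math

def rating(e, d, c, d_star):
    """Rating r(e) for an edge e = (n1, n2): prefers edges of length
    close to d_star whose endpoints lie near the image center c.
    d(a, b) is the distance between points a and b."""
    n1, n2 = e
    length = d(n1, n2)
    return length * math.exp(-length / d_star) / (d(c, n1) + d(c, n2) + 1)

def euclid(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

# An edge of exactly the preferred length d_star, centered on c, rates
# higher than a much longer centered edge.
c = (0.0, 0.0)
e_good = ((-50.0, 0.0), (50.0, 0.0))     # length 100 = d_star
e_long = ((-200.0, 0.0), (200.0, 0.0))   # length 400
r_good = rating(e_good, euclid, c, 100.0)
r_long = rating(e_long, euclid, c, 100.0)
print(r_good > r_long)  # True
```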
Later, the online phase will require a nearest neighbor search on the hash data.
To facilitate a fast search, the positions of the projected nodes are saved in buckets.
Buckets result from dividing the 2D plane of the hash using an equidistant grid. A
bucket is one field of this grid and stores information about the projected nodes
within this field of all overlayed hash planes. The optimal number of buckets de-
pends on the number of nodes in the hash. Also non-equidistant grids are possible
to yield a uniform distribution of nodes over the buckets.
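A hash plane and its bucket grid can be sketched as follows. The similarity transform sending the selected edge's endpoints to (−1, 0) and (1, 0) is computed here via complex division; the grid cell size is an illustrative choice, not a value from the paper:

```python
import math
from collections import defaultdict

def hash_plane(nodes, p1, p2):
    """Transform all node positions so that edge endpoint p1 maps to
    (-1, 0) and p2 maps to (1, 0), i.e. translate by the edge midpoint,
    then rotate and scale (one complex division per node)."""
    mx, my = (p1[0] + p2[0]) / 2.0, (p1[1] + p2[1]) / 2.0
    dx, dy = p2[0] - mx, p2[1] - my
    norm2 = dx * dx + dy * dy
    out = []
    for (x, y) in nodes:
        tx, ty = x - mx, y - my
        # (tx + i*ty) / (dx + i*dy): sends p2 - m to 1 + 0i
        out.append(((tx * dx + ty * dy) / norm2,
                    (ty * dx - tx * dy) / norm2))
    return out

def to_buckets(plane_id, points, cell=0.25):
    """Store projected node positions in an equidistant grid of buckets."""
    buckets = defaultdict(list)
    for p in points:
        key = (math.floor(p[0] / cell), math.floor(p[1] / cell))
        buckets[key].append((plane_id, p))
    return buckets

nodes = [(0.0, 0.0), (2.0, 0.0), (1.0, 1.0)]
plane = hash_plane(nodes, nodes[0], nodes[1])
print(plane)  # [(-1.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
```

Repeating `hash_plane` once per selected edge and overlaying all resulting planes in the same bucket grid yields the final hash.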

2.3 Registration
The registration is done in two steps, first the matching between the graph
representation of the query map image and the geometric hash, and second, the
computation of the transformation function.
Matching. To match the query graph with the hash, edges e from the query
graph which have a good rating r(e) are selected. Similar to the hash plane
creation, the edge is projected onto x1 and x2, and the query graph is transformed
accordingly. The transformed query graph is then projected onto the
hash. At the positions of the projected query graph’s nodes, the hash’s buckets
are searched for coinciding nodes of hash planes. If the query image shows a part
of a stored map, the query graph will essentially be a subgraph of the stored map's
graph. All well-rated edges of the stored map have been used to create the hash
planes. Thus, the projection of the query graph, which is a subgraph, will lead
to matches. Noise and perspective distortion will certainly impact the exactness
of the matches, but projected nodes may still be expected to be found in the
right buckets. For each selected edge e, we compute a matching quality q(e, h)
for each hash plane h. This quality indicates how well the edge e corresponds
to the edge that was used to create the hash plane h. It is measured by
investigating the distance of all transformed query nodes to the nearest nodes of
hash plane h. For each edge the five best matching hash planes will be considered
for further investigation, under the assumption that the corresponding matching is
amongst these five hash planes. See Figure 5 for examples of best matching hash
planes for selected edges of the same query image.
Fig. 5. Examples of best matching hash planes for four different query node pairs: best
matching hash plane (blue), query graph (green), selected node pair for alignment (red)

Resulting Transformation. To complete the registration process, we use a function
T(x, y), which transforms any 2D point of the query image to the corresponding
2D point on the stored map. T, which allows for rotation, scaling, and
translation, can be described by

\[
T : (x, y) \mapsto (x, y) \cdot \begin{pmatrix} c & s \\ -s & c \end{pmatrix} + (t_x, t_y)
\]

where t_x and t_y are translations, and c and s represent scale and rotation.
To determine the parameters, we generate an association matrix A. This ma-
trix stores information about matchings between query nodes and nodes of the
five best matching hash planes per selected query edge. For m nodes in the query
graph and n nodes of a stored graph, A is an m × n matrix. If for a query edge
e, the query node mi is matched to the stored node nj of hash plane hk , then
the value of A(i, j) is increased by the quality of the matching q(e, hk ). If the
five best matching hash planes are considered, the node mi will be associated to
five nodes. Normally, one association is correct, and the other four are wrong.
This is repeated for each selected query edge. Correct associations will occur re-
peatedly for many query edges, while wrong associations will vary. Thus, correct
associations will be indicated by high accumulated quality values in the rows of
matrix A. The highest entry of each row indicates a correct association.
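The accumulation of matching qualities into A and the row-wise selection can be sketched like this. It is a toy illustration: the real algorithm fills the matrix from the five best hash planes per selected query edge, whereas here the (query node, stored node, quality) triples are given directly:

```python
from collections import defaultdict

def accumulate(matches):
    """Accumulate association scores A[(i, j)] += q(e, h_k) for every
    (query node i, stored node j, quality q) triple, then keep, per
    query node, the stored node with the highest accumulated score."""
    A = defaultdict(float)
    for i, j, q in matches:
        A[(i, j)] += q

    best = {}
    for (i, j), score in A.items():
        if i not in best or score > A[(i, best[i])]:
            best[i] = j
    return best

# Query node 0 repeatedly matches stored node 7, plus scattered noise:
matches = [(0, 7, 0.9), (0, 3, 0.2), (0, 7, 0.8), (0, 5, 0.1)]
print(accumulate(matches))  # {0: 7}
```

As in the text, correct associations accumulate over many query edges while wrong ones scatter, so the row maximum singles them out.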
Assuming that these associations are correct, we select pairs of these associ-
ations to solve the linear equation system for the variables c, s, tx and ty of T .
For each pair we get a transformation. If we apply this transformation, we can
measure an error based on the distance of all query nodes to their associated
nodes. We select the transformation with the least median error (LME) as result.
Because several associations will be wrong, we use the LME which is robust to
up to 50% of outliers [12].
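A minimal sketch of this final step: each pair of associations determines one similarity transform, and the transform with the least median error over all associations is kept. The exhaustive pair enumeration and the toy data are illustrative simplifications:

```python
import statistics

def fit_transform(p1, q1, p2, q2):
    """Solve T(x, y) = (cx - sy + tx, sx + cy + ty) from two point
    pairs, recovering (c, s) via complex division."""
    dp = complex(p2[0] - p1[0], p2[1] - p1[1])
    dq = complex(q2[0] - q1[0], q2[1] - q1[1])
    z = dq / dp                      # z = c + i*s
    c, s = z.real, z.imag
    tx = q1[0] - (c * p1[0] - s * p1[1])
    ty = q1[1] - (s * p1[0] + c * p1[1])
    return c, s, tx, ty

def apply_T(T, p):
    c, s, tx, ty = T
    return (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

def least_median_transform(assoc):
    """assoc: (query_node, stored_node) pairs, possibly with outliers.
    Keep the pair-induced transform with the least median error."""
    best, best_med = None, float("inf")
    for i in range(len(assoc)):
        for j in range(i + 1, len(assoc)):
            if assoc[i][0] == assoc[j][0]:
                continue  # degenerate pair: same query point
            T = fit_transform(assoc[i][0], assoc[i][1],
                              assoc[j][0], assoc[j][1])
            errs = [abs(complex(*apply_T(T, p)) - complex(*q))
                    for p, q in assoc]
            med = statistics.median(errs)
            if med < best_med:
                best, best_med = T, med
    return best

# Three inliers (pure translation by (5, 5)) and one gross outlier:
assoc = [((0, 0), (5, 5)), ((1, 0), (6, 5)), ((0, 1), (5, 6)),
         ((2, 2), (100, 100))]
T = least_median_transform(assoc)
print(apply_T(T, (3, 3)))  # (8.0, 8.0)
```

The median (rather than the mean) of the residuals is what makes the selection tolerate up to half of the associations being wrong.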

3 Experimental Results
We have performed two kinds of experiments: laboratory experiments by simu-
lation and live experiments using images from a Nokia N95 camera phone.

Laboratory experiments. The purpose of the laboratory experiments is to
systematically investigate the performance of our algorithm in a controlled manner.
We took screenshots of very large displayed city maps (∼5000 × 3000 pixels)
from Google for two German cities (Münster and Hannover). These images were
used to produce a ground truth hash for each city. For testing, small parts from
known positions of these maps were generated in various test series and then
presented to the system.
To measure the quality of a resulting transformation, we compute the RMSE
for the area of the input image. If TR is the resulting transformation and TG the
ground truth transformation, the RMSE for an input image I of size w × h is
calculated by
\[
RMSE(T_R(I), T_G(I)) = \sqrt{\frac{1}{w \cdot h} \int_0^h \int_0^w \left\lVert T_R(x, y) - T_G(x, y) \right\rVert^2 \, dx \, dy}
\]

This RMSE value measures the spatial distance between the ground truth trans-
formation and the computed transformation and can be interpreted as an error
in pixels. To distinguish between successful and failed registrations, we have set
a threshold of 5 pixels with regard to the RMSE measure. For our purpose of
map augmentation this is a sufficient accuracy.
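In practice the RMSE integral can be approximated by sampling the image area on a pixel grid; the discrete version below is an assumption about how one would evaluate it, not the authors' code:

```python
import math

def rmse(T_R, T_G, w, h, step=1):
    """Discrete approximation of the RMSE integral: average the squared
    distance between the two transforms over the image area, then root."""
    total, count = 0.0, 0
    for y in range(0, h, step):
        for x in range(0, w, step):
            rx, ry = T_R(x, y)
            gx, gy = T_G(x, y)
            total += (rx - gx) ** 2 + (ry - gy) ** 2
            count += 1
    return math.sqrt(total / count)

# Two transforms differing by a constant 3-pixel offset in x:
T_G = lambda x, y: (x, y)
T_R = lambda x, y: (x + 3, y)
print(rmse(T_R, T_G, 640, 480))  # 3.0
```

With the 5-pixel threshold from the text, this example registration would still count as successful.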
We generated 1080 query (sub)graphs of a ground truth image. This is done
by choosing 6 rotation angles and 36 translation vectors and applying these
transformations to a fixed subarea of the ground truth image, resulting in 216
different subimages and accordingly 216 subgraphs. We repeated the experiments
5 times (1080 query graphs in total) and determined the average accuracy mea-
sure. Due to space limitations, Table 1 shows only results of a small fraction of
all experiments which were conducted using the following test series for which
only the query graph was modified and the hash was left unaffected:
– k% of the nodes were randomly deleted to simulate missing data.
– k% of the nodes were shifted to new positions to simulate map differences.
Each coordinate of the shift vector was subject to a Gaussian distribution
with standard deviation σ.
– Insertion (mode 1) of k% new nodes, which are randomly positioned and
connected to 1 to 4 nearest nodes to simulate map updates.
– In a second insertion (mode 2), k% new nodes were generated, each lying on
an original edge to simulate variations in the graph creation procedure.
– Finally, a series of query graphs was generated by combining the node dele-
tion, shift, and insertion (mode 1) operations.
The various kinds of artificially generated distortions in the test series were in-
tended to simulate different errors we encounter in dealing with real images. The
experiments have shown that our algorithm is robust against such distortions
up to a certain extent. Particularly interesting is the case of insertion (mode 2).
Adding new nodes lying on an edge of the query graph means an oversegmenta-
tion of street contours and directly leads to substantial changes in graph topol-
ogy. Fortunately, the simulation results indicate that this distortion source does
not introduce more registration inaccuracy than other errors. This remarkable
property is due to the robust behavior of geometric hashing.
Table 1. Registration algorithm performance for various test series. Percentage of
correctly registered images (RMSE < 5 pixels) based on a test of 1,080 query images.

Test series                 Münster   Hannover
Deletion (5%)               99.91%    94.17%
Shift (5%) (σ=15)           99.81%    93.70%
Insertion (mode 1, 5%)      99.91%    93.06%
Insertion (mode 2, 10%)     99.91%    94.54%
Combination (5%, 5%, 5%)    99.17%    91.02%

Fig. 6. Live experiment: Image taken by Nokia N95 camera phone (left), visualization
of detected streets, skeleton and nodes (middle), registration result (right)

Live experiments. For the live experiments, we printed the city maps on paper
and took images of parts of these maps using a Nokia N95 camera phone under
uncontrolled illumination. Good results for such low quality images (640 × 480
pixels) could be observed. In this experiment series the ground truth is not
exactly known and we thus determined successful and failed registrations by
visual inspection. One such successful registration can be seen in Figure 6, where
the mobile camera image suffers from inhomogeneous illumination and small
perspective distortions.

4 Conclusion and Future Work


We have shown that it is possible to register partial maps by structural infor-
mation extracted from images of city maps. Our approach allows for a variety of
applications including augmented maps using mobile devices, without the need
to use markers or regular point grids. Our intention in this work was to explore
the potential of such structural approaches, which are particularly important if
only vectorial representation of ground truth maps is available. Even in other
situations results from structure-based registration can still provide a valuable
additional information source in solving the complex registration task.
Structural approaches like ours may encounter some difficulties in the case of
self-similarity, such as strongly periodic maps (e.g., Manhattan). However, as soon as the
query map does not have a perfect structural identity with other parts, there
will be some distinguishing features helpful for our map registration. Additional
experimental work will be conducted to investigate this issue.
The online phase of our hashing algorithm is very fast and the memory needed
to store the hash of a city map is low (∼200KB–2MB per city map), which allows
the algorithm to be implemented directly on mobile devices. Our long-term goal is thus
to realize a realtime implementation using Symbian C++. This would allow for
realtime registration and augmentation of city maps using our Nokia N95 camera
phones. Further, we would like to implement an automatic map type recognition
based on color distribution. This would allow for an adaption of the color profile
to completely unknown map styles.

References
1. Ballard, D.H.: Generalizing the Hough transform to detect arbitrary shapes. In:
Readings in computer vision: issues, problems, principles and paradigms, pp. 714–
725. Morgan Kaufmann Publishers Inc., San Francisco (1987)
2. Bier, E.A., Stone, M.C., Pier, K., Buxton, W., DeRose, T.D.: Toolglass and Magic
Lenses: the See-Through Interface. In: Proc. of the 20th Annual Conf. on Computer
Graphics and Interactive Techniques, pp. 73–80. ACM Press, New York (1993)
3. Hecht, B., Rohs, M., Schöning, J., Krüger, A.: WikEye–Using Magic Lenses to
Explore Georeferenced Wikipedia Content. In: Proc. of the 3rd Int. Workshop on
Pervasive Mobile Interaction Devices, PERMID (2007)
4. Lowe, D.G.: Distinctive Image Features from Scale-Invariant Keypoints. Int. Jour-
nal of Computer Vision 60(2), 91–110 (2004)
5. Ozuysal, M., Fua, P., Lepetit, V.: Fast Keypoint Recognition in Ten Lines of Code.
In: Proc. of Int. Conf. on Computer Vision and Pattern Recognition, pp. 1–8 (2007)
6. Reilly, D., Rodgers, M., Argue, R., Nunes, M., Inkpen, K.: Marked-up Maps: Com-
bining Paper Maps and Electronic Information Resources. Personal and Ubiquitous
Computing 10(4), 215–226 (2006)
7. Reitmayr, G., Eade, E., Drummond, T.: Localisation and Interaction for Aug-
mented Maps. In: Proc. ISMAR, pp. 120–129 (2005)
8. Rohs, M., Schöning, J., Krüger, A., Hecht, B.: Towards Real-Time Markerless
Tracking of Magic Lenses on Paper Maps. In: Adjunct Proc. of the 5th Int. Conf.
on Pervasive Computing, Late Breaking Results, pp. 69–72 (2007)
9. Rohs, M., Schöning, J., Raubal, M., Essl, G., Krüger, A.: Map Navigation with
Mobile Devices: Virtual Versus Physical Movement with and without Visual Con-
text. In: Proc. of the 9th Int. Conf. on Multimodal Interfaces, pp. 146–153. ACM,
New York (2007)
10. Schöning, J., Hecht, B., Starosielski, N.: Evaluating Automatically Generated
Location-based Stories for Tourists. In: Extended Abstracts on Human Factors
in Computing Systems, pp. 2937–2942. ACM, New York (2008)
11. Schöning, J., Krüger, A., Müller, H.J.: Interaction of Mobile Devices with Maps.
In: Adjunct Proc. of the 4th Int. Conf. on Pervasive Computing, vol. 27. Oesterre-
ichische Computer Gesellschaft (2006)
12. Stewart, C.V.: Robust Parameter Estimation in Computer Vision. SIAM
Rev. 41(3), 513–537 (1999)
13. Wagner, D., Reitmayr, G., Mulloni, A., Drummond, T., Schmalstieg, D.: Pose
Tracking from Natural Features on Mobile Phones. In: Proc. ISMAR, pp. 125–134
(2008)
14. Wolfson, H.J., Rigoutsos, I.: Geometric Hashing: An Overview. IEEE Comput. Sci.
Eng. 4(4), 10–21 (1997)
A Polynomial Algorithm for Submap
Isomorphism
Application to Searching Patterns in Images

Guillaume Damiand¹,², Colin de la Higuera¹,³, Jean-Christophe Janodet¹,³,
Émilie Samuel¹,³, and Christine Solnon¹,²

¹ Université de Lyon
² Université Lyon 1, LIRIS, UMR5205 CNRS, F-69622, France
  {guillaume.damiand,christine.solnon}@liris.cnrs.fr
³ CNRS UMR5516, F-42023 Laboratoire Hubert Curien,
  Université de Saint-Etienne - Jean Monnet
  {cdlh,janodet,emilie.samuel}@univ-st-etienne.fr

Abstract. In this paper, we address the problem of searching for a
pattern in a plane graph, i.e., a planar drawing of a planar graph. To do
that, we propose to model plane graphs with 2-dimensional combinatorial
maps, which provide nice data structures for modelling the topology of
a subdivision of a plane into nodes, edges and faces. We define submap
isomorphism, we give a polynomial algorithm for this problem, and we
show how this problem may be used to search for a pattern in a plane
graph. First experimental results show the validity of this approach to
efficiently search for patterns in images.

1 Introduction
In order to manage the huge image sets that are now available, and more partic-
ularly to classify them or search through them, one needs similarity measures.
A key point that motivates our work lies in the choice of data structures for
modelling images: These structures must be rich enough to describe images in a
relevant way, while allowing an efficient exploitation. When images are modelled
by vectors of numerical values, similarity is both mathematically well defined and
easy to compute. However, images may be poorly modelled with such numerical
vectors that cannot express notions such as adjacency or topology.
Graphs allow one to model images by means of, e.g., region adjacency relation-
ships or interest point triangulation. In either case, graph similarity measures
have been investigated [CFSV04]. These measures often rely on (sub)graph iso-
morphism —which checks for equivalence or inclusion— or graph edit distances
and alignments —which evaluate the cost of transforming a graph into another

The authors acknowledge an ANR grant Blanc 07-1_184534: this work was done
in the context of project Sattic. This work was partially supported by the IST
Programme of the European Community, under the PASCAL 2 Network of Excellence,
IST-2006-216886.

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 102–112, 2009.
© Springer-Verlag Berlin Heidelberg 2009

Fig. 1. (a) and (b) are not isomorphic plane graphs; bold edges define a compact plane
subgraph in (c), but not in (d)

graph. While there exist rather efficient heuristics for solving the graph
isomorphism problem¹ [McK81, SS08], this is not the case for the other measures,
which are often computationally intractable (NP-hard), and therefore practically
unsolvable for large-scale graphs. In particular, the best performing approaches for
subgraph isomorphism are limited to graphs of up to a few thousand nodes
[CFSV01, ZDS+07].
However, when measuring graph similarity, it is overwhelmingly forgotten
that graphs actually model images and, therefore, have special features that
could be exploited to obtain both more relevant measures and more efficient
algorithms. Indeed, these graphs are planar, i.e., they may be drawn in the
plane, but even more specifically just one of the possible planar embeddings is
relevant as it actually models the image topology, that is, the order in which
faces are encountered when turning around a node.
In the case where just one embedding is considered, graphs are called plane.
Isomorphism of plane graphs needs to be defined in order to integrate topolog-
ical relationships. Let us consider for example the two plane graphs drawn in
Fig. 1(a) and 1(b). The underlying graphs are isomorphic, i.e., there exists a bi-
jection between their nodes which preserves edges. However, these plane graphs
are not isomorphic since there does not exist a bijection between their nodes
which both preserves edges and topological relationships.
Now by considering this, the isomorphism problem becomes simple [Cor75],
but the subgraph isomorphism problem is still too hard to be tackled in a
systematic way. Yet we may argue that when looking for some pattern in a
picture (for example a chimney in a house, or a wheel in a car) we may simplify
the problem to that of searching for compact plane subgraphs (i.e., subgraphs
obtained from a graph by iteratively removing nodes and edges that are incident
to the external face). Let us consider for example the plane graphs of Fig. 1.
The bold edges in Fig. 1(c) constitute a compact plane subgraph. However,
the bold edges in Fig. 1(d) do not constitute a compact plane subgraph because
edge (4, 3) separates a face of the subgraph into two faces in the original
graph.
¹ The theoretical complexity of graph isomorphism is an open question: while it
clearly belongs to NP, it has not been proven to be NP-complete.

Contribution and outline of the paper. In this paper, we address the problem
of searching for compact subgraphs in a plane graph. To do that, we propose
to model plane graphs with 2-dimensional combinatorial maps, which provide
nice data structures for modelling the topology of a subdivision of a plane into
nodes, edges and faces. We define submap isomorphism, we give a polynomial
algorithm for this problem, and we show how this problem may be used to search
for a compact graph in a plane graph. Therefore we show that the problem can
be solved in this case in polynomial time.
We introduce 2D combinatorial maps in Section 2. A polynomial algorithm
for map isomorphism is given in Section 3 and submap isomorphism is studied
in Section 4. We relate these results with the case of plane graphs in Section 5,
and we give some experimental results that show the validity of this approach
on image recognition tasks in Section 6.

2 Combinatorial Maps
A plane graph is a planar graph with a mapping from every node to a point in 2D
space. However, in our context the exact coordinates of nodes matter less than
their topological organisation, i.e., the order in which nodes and edges are encountered
when turning around faces. This topological organisation is nicely modelled by
combinatorial maps [Edm60, Tut63, Cor75].
To model a plane graph with a combinatorial map, each edge of the graph
is cut in two halves called darts, and two one-to-one mappings are defined onto
these darts: the first to link darts belonging to two consecutive edges around a
same face, the second to link darts belonging to a same edge.
Definition 1 (2D combinatorial map [Lie91]). A 2D combinatorial map (or
2-map) is a triplet M = (D, β1, β2) where D is a finite set of darts; β1 is a
permutation on D, i.e., a one-to-one mapping from D to D; and β2 is an involution
on D, i.e., a one-to-one mapping from D to D such that β2 = β2^{-1}.

We note β0 for β1^{-1}. Two darts i and j such that i = βk(j) are said to be k-sewn.
Fig. 2 gives an example of a combinatorial map.
In some cases, it may be useful to allow βi to be partially defined, thus leading
to open combinatorial maps. The intuitive idea is to add a new element ε to the

dart  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18
β1    2  3  4  5  6  7  1  9 10 11  8 13 14 15 12 17 18 16
β2   15 14 18 17 10  9  8  7  6  5 12 11 16  2  1 13  4  3

Fig. 2. Combinatorial map example. Darts are represented by numbered black
segments. Two 1-sewn darts are drawn consecutively, and two 2-sewn darts are drawn
concurrently and in reverse orientation, with a little grey segment between the two darts.
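The table of Fig. 2 can be transcribed directly into dart-to-dart dictionaries, and the defining properties of Definition 1 checked mechanically (a sketch; the orbit computation for faces is added for illustration):

```python
# Direct transcription of the Fig. 2 table; darts are 1..18.
b1 = {1: 2, 2: 3, 3: 4, 4: 5, 5: 6, 6: 7, 7: 1, 8: 9, 9: 10, 10: 11,
      11: 8, 12: 13, 13: 14, 14: 15, 15: 12, 16: 17, 17: 18, 18: 16}
b2 = {1: 15, 2: 14, 3: 18, 4: 17, 5: 10, 6: 9, 7: 8, 8: 7, 9: 6,
      10: 5, 11: 12, 12: 11, 13: 16, 14: 2, 15: 1, 16: 13, 17: 4, 18: 3}

darts = set(b1)
assert set(b1.values()) == darts               # beta_1 is a permutation on D
assert all(b2[b2[d]] == d for d in darts)      # beta_2 = beta_2^{-1}

def orbits(perm):
    """Orbits of a permutation; for beta_1, these are the faces."""
    seen, result = set(), []
    for d in perm:
        if d not in seen:
            orbit, x = [], d
            while x not in seen:
                seen.add(x)
                orbit.append(x)
                x = perm[x]
            result.append(orbit)
    return result

print(len(orbits(b1)))  # 4
```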

dart  a  b  c  d  e  f  g
β1    b  c  d  a  f  g  e
β2    ε  ε  e  ε  c  ε  ε

Fig. 3. Open combinatorial map example. Darts a, b, d, f and g are not 2-sewn.

set of darts, and to allow darts to be linked with ε for β1 and/or β2. By definition,
∀0 ≤ i ≤ 2, βi(ε) = ε. Fig. 3 gives an example of an open map (see [PABL07] for
precise definitions).
Finally, Def. 2 states that a map is connected if there is a path of sewn darts
between every pair of darts.

Definition 2 (connected map). A combinatorial map M = (D, β1, β2) is con-
nected if ∀d ∈ D, ∀d′ ∈ D, there exists a path (d1, . . . , dk) such that d1 = d,
dk = d′ and ∀1 ≤ i < k, ∃ji ∈ {0, 1, 2}, di+1 = βji(di).

3 Map Isomorphism
Lienhardt has defined isomorphism between two combinatorial maps as follows.

Definition 3 (map isomorphism [Lie94]). Two maps M = (D, β1, β2) and M′ =
(D′, β′1, β′2) are isomorphic if there exists a one-to-one mapping f : D → D′, called
an isomorphism function, such that ∀d ∈ D, ∀i ∈ {1, 2}, f(βi(d)) = β′i(f(d)).

We extend this definition to open maps by adding that f(ε) = ε, thus enforcing
that, when a dart is linked with ε for βi, the dart matched to it by f is also
linked with ε for β′i.
An algorithm may be derived from this definition in a rather straightforward
way, as sketched in [Cor75]. Algorithm 1 describes the basic idea, which will be
extended in Section 4 to submap isomorphism: we first fix a dart d0 ∈ D; then,
for every dart d′0 ∈ D′, we call Algorithm 2 to build a candidate matching func-
tion f and check whether f is an isomorphism function. Algorithm 2 basically
performs a traversal of M, starting from d0 and using βi to discover new darts
from already discovered darts. Initially, f[d0] is set to d′0 whereas f[d] is set to
nil for all other darts. Each time a dart di ∈ D is discovered from another dart
d ∈ D through βi, so that di = βi(d), f[di] is set to the dart d′i ∈ D′ which is
linked with f[d] by β′i.

Complexity issues. Algorithm 2 is in O(|D|). Indeed, the while loop is iterated
|D| times, as (i) exactly one dart d is removed from the stack S at each iteration;
and (ii) each dart d ∈ D enters S at most once (d enters S only if f[d] = nil,
and before entering S, f[d] is set to a dart of D′). In Algorithm 1, the test of
line 4 may also be performed in O(|D|). Hence, the overall time complexity of
106 G. Damiand et al.

Algorithm 1. checkIsomorphism(M, M′)
  Input: two open connected maps M = (D, β1, β2) and M′ = (D′, β′1, β′2)
  Output: returns true iff M and M′ are isomorphic
1  choose d0 ∈ D
2  for d′0 ∈ D′ do
3      f ← traverseAndBuildMatching(M, M′, d0, d′0)
4      if f is a bijection from D ∪ {ε} to D′ ∪ {ε} and
           ∀d ∈ D, ∀i ∈ {1, 2}, f[βi(d)] = β′i(f[d]) then
5          return true
6  return false

Algorithm 2. traverseAndBuildMatching(M, M′, d0, d′0)
  Input: two open connected maps M = (D, β1, β2) and M′ = (D′, β′1, β′2) and an
         initial couple of darts (d0, d′0) ∈ D × D′
  Output: returns an array f : D ∪ {ε} → D′ ∪ {ε}
1   for every dart d ∈ D do: f[d] ← nil
2   f[d0] ← d′0
3   let S be an empty stack; push d0 in S
4   while S is not empty do
5       pop a dart d from S
6       for i ∈ {0, 1, 2} do
7           if βi(d) ≠ ε and f[βi(d)] = nil then
8               f[βi(d)] ← β′i(f[d])
9               push βi(d) in S
10  f[ε] ← ε
11  return f

Algorithm 1 is in O(|D|²). Note that it may be optimised (without changing its
complexity), e.g., by detecting failure while building matchings.
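For concreteness, Algorithms 1 and 2 admit an almost line-by-line transcription (an illustrative Python sketch; the representation of a map as a pair of a dart set and a list of βi dictionaries, with None standing for ε, is our own choice):

```python
def make_map(beta1, beta2):
    """Build (darts, [beta0, beta1, beta2]) from beta1/beta2 dicts."""
    beta0 = {v: k for k, v in beta1.items()}
    return set(beta1), [beta0, beta1, beta2]

def traverse_and_build_matching(M, Mp, d0, dp0):
    """Algorithm 2: traverse M from d0 and build a candidate matching f."""
    (D, betas), (_, betasp) = M, Mp
    f = {d: None for d in D}
    f[d0] = dp0
    stack, seen = [d0], {d0}
    while stack:
        d = stack.pop()
        for i in (0, 1, 2):
            di = betas[i].get(d)                 # None plays the role of epsilon
            if di is not None and di not in seen:
                seen.add(di)
                f[di] = betasp[i].get(f[d]) if f[d] is not None else None
                stack.append(di)
    return f

def check_isomorphism(M, Mp):
    """Algorithm 1: fix d0 in M, try every start dart d'0 of M'."""
    (D, betas), (Dp, betasp) = M, Mp
    if len(D) != len(Dp):                        # a bijection cannot exist
        return False
    d0 = next(iter(D))
    for dp0 in Dp:
        f = traverse_and_build_matching(M, Mp, d0, dp0)
        values = [f[d] for d in D]
        if (None not in values and len(set(values)) == len(D)
                and all(f.get(betas[i].get(d)) == betasp[i].get(f[d])
                        for d in D for i in (1, 2))):
            return True
    return False
```

On two open triangular faces with relabelled darts the test succeeds; against a square face it fails immediately on the size check.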

Correctness of Algorithm 1. Let us first note that if checkIsomorphism(M, M′)
returns true, then M and M′ are isomorphic, as true is returned only if the
isomorphism test of line 4 succeeds.
Let us now show that if M and M′ are isomorphic then checkIsomorphism(M,
M′) returns true. If M and M′ are isomorphic then there exists an isomorphism
function ϕ : D → D′. Let d0 ∈ D be the dart chosen at line 1 of Algorithm 1.
As the loop of lines 2–5 iterates on every dart d′0 ∈ D′, there will be an iter-
ation of this loop for which d′0 = ϕ(d0). Let us show that for this iteration
traverseAndBuildMatching(M, M′, d0, d′0) returns f such that ∀d ∈ D, f[d] =
ϕ(d), so that true will be returned at line 5. Claim 1: when pushing a dart d in S,
f[d] = ϕ(d). This is true for the push of line 3 as f[d0] is set to d′0 = ϕ(d0) at
line 2. This is true for the push of line 9 as f[βi(d)] is set to β′i(f[d]) at line 8,
f[d] = ϕ(d) (induction hypothesis), and ϕ(d) = d′ ⇒ ϕ(βi(d)) = β′i(d′) (by def-
inition of an isomorphism function). Claim 2: every dart d ∈ D is pushed once
in S. Indeed, M is connected. Hence, there exists at least one path (d0, . . . , dn)
such that dn = d and ∀k ∈ [1; n], ∃jk ∈ {0, 1, 2}, dk = βjk (dk−1 ). Therefore,
each time a dart di of this path is popped from S (line 5), di+1 is pushed in
S (line 9) if it has not been pushed before (through another path).

4 Submap Isomorphism
Intuitively, a map M is a submap of M′ if M can be obtained from M′ by
removing some darts. When a dart d is removed, we set βi(d′) to ε for every dart
d′ such that βi(d′) = d.
Definition 4 (submap). An open combinatorial map M = (D, β1, β2) is isomor-
phic to a submap of an open map M′ = (D′, β′1, β′2) if there exists an injection
f : D ∪ {ε} → D′ ∪ {ε}, called a subisomorphism function, such that f(ε) = ε and
∀d ∈ D, ∀i ∈ {1, 2}, if βi(d) ≠ ε then β′i(f(d)) = f(βi(d)), else either β′i(f(d)) = ε
or f⁻¹(β′i(f(d))) is empty.
This definition derives from the definition of isomorphism. The only modification
concerns the case where d is i-sewn with ε. In this case, the definition ensures
that f(d) is i-sewn either with ε, or with a dart d′ which is not matched with a
dart of M, i.e., such that f⁻¹(d′) is empty (see example in Fig. 4).
Note that if M is isomorphic to a submap of M′, then M is isomorphic to the
map M″ obtained from M′ by restricting the set of darts D′ to the set of darts
D″ = {d′ ∈ D′ | ∃a ∈ D, f(a) = d′}.
Algorithm 3 determines if there is a submap isomorphism between two open
connected maps. It is based on the same principle as Algorithm 1; the only
difference is the test of line 4, which succeeds if f is a subisomorphism function
instead of an isomorphism function. The time complexity of this algorithm is
in O(|D| · |D′|) as traverseAndBuildMatching is called at most |D′| times and
its complexity is in O(|D|). Note that the subisomorphism test may be done in
linear time.
Concerning correctness, note that the proofs and arguments given for isomorphism
are still valid: we solve the submap isomorphism problem with the same method
as before, except that the function f is now an injection instead of a bijection.
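The modified line-4 test of Algorithm 3 can be isolated as a small predicate (an illustrative sketch; a map is assumed to be a pair (darts, [β0, β1, β2]) of dictionaries with None standing for ε, and f is a candidate matching):

```python
def is_subisomorphism(M, Mp, f):
    """Check Definition 4 for a candidate matching f between maps M and M'."""
    (D, betas), (_, betasp) = M, Mp
    values = [f.get(d) for d in D]
    if None in values or len(set(values)) != len(D):
        return False                      # f must be an injection on D
    image = set(values)
    for d in D:
        for i in (1, 2):
            di, dpi = betas[i].get(d), betasp[i].get(f[d])
            if di is not None:
                if f[di] != dpi:          # i-sewn darts must stay i-sewn
                    return False
            elif dpi is not None and dpi in image:
                return False              # f(d) may only be i-sewn outside f(D)
    return True
```

As a toy check in the spirit of Fig. 4: an isolated open edge is a submap of an identical open edge, but not of an edge whose two darts are 2-sewn to each other inside the image of f.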

[Figure: three maps, M′ (darts a to r), M (a copy of darts a to j), and M″ (darts 1 to 10), drawn side by side.]

Fig. 4. Submap example. M is a submap of M′ as it is obtained from M′ by deleting
darts k to r. M″ is not a submap of M′ as the injection f : D″ → D′ which matches
darts 1 to 10 with darts a to j does not verify Def. 4: β″2(2) = ε and f(2) = b, but
β′2(b) ≠ ε and f⁻¹(β′2(b)) is not empty because f⁻¹(d) = 4.

Algorithm 3. checkSubIsomorphism(M, M′)
  Input: two open connected maps M = (D, β1, β2) and M′ = (D′, β′1, β′2)
  Output: returns true iff M is isomorphic to a submap of M′
1  choose d0 ∈ D
2  for d′0 ∈ D′ do
3      f ← traverseAndBuildMatching(M, M′, d0, d′0)
4      if f is an injection from D ∪ {ε} to D′ ∪ {ε} and
           ∀d ∈ D, ∀i ∈ {1, 2}, βi(d) ≠ ε ⇒ f(βi(d)) = β′i(f(d)) and
           ∀d ∈ D, ∀i ∈ {1, 2}, βi(d) = ε ⇒ ∄ e ∈ D, f(e) = β′i(f(d)) then
5          return true
6  return false

5 From Plane Graphs to Maps

In this section, we show how to transform the problem of finding a compact
plane subgraph inside a plane graph into the problem of finding a submap in a
map, thus allowing the use of our polynomial algorithm.
Let us first make precise what we exactly mean by (compact) plane (sub)graph
isomorphism. Let us consider two graphs G1 = (N1, E1) and G2 = (N2, E2) that
are embedded in planes, and let us note o(i, j) the edge which follows edge (i, j)
when turning around node i in the clockwise order. We shall say that

– G1 and G2 are plane isomorphic if there exists a bijection f : N1 → N2 which
  preserves (i) edges, i.e., ∀(i, j) ∈ N1 × N1, (i, j) ∈ E1 ⇔ (f(i), f(j)) ∈ E2, and
  (ii) topology, i.e., ∀(i, j) ∈ E1, o(i, j) = (k, l) ⇔ o(f(i), f(j)) = (f(k), f(l));
– G1 is a compact plane subgraph of G2 if G1 is plane isomorphic to a compact
  subgraph of G2, which is obtained from G2 by iteratively removing nodes
  and edges that are incident to the external face.

Note that the pattern may be a partial subgraph of the target. Let us consider for
example Fig. 1c. Edge (1, 5) need not belong to the searched pattern, even though
nodes 1 and 5 are matched to nodes of the searched pattern. However, edge (4, 3)
must belong to the searched pattern; otherwise it is not compact.
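The clockwise order o(i, j) is part of the input embedding; when only 2D node coordinates are available, it can be recovered by sorting each node's neighbours by polar angle (an illustrative sketch; the inputs pos and adj and the function names are our own, not from the paper):

```python
import math

def clockwise_order(i, pos, adj):
    """Neighbours of node i sorted clockwise (decreasing polar angle around i)."""
    xi, yi = pos[i]
    angle = lambda j: math.atan2(pos[j][1] - yi, pos[j][0] - xi)
    return sorted(adj[i], key=angle, reverse=True)

def o(i, j, pos, adj):
    """The edge following (i, j) when turning clockwise around node i."""
    ring = clockwise_order(i, pos, adj)
    k = ring[(ring.index(j) + 1) % len(ring)]
    return (i, k)
```

This rotation system around each node is exactly the information a 2-map's β1 encodes face by face.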
To use submap isomorphism to solve compact plane subgraph isomorphism,
we have to transform plane graphs into 2-maps. This is done by associating a
face in the map with every face of the graph except the external face. Indeed, a
2-map models a drawing of a graph on a sphere instead of a plane. Hence, none of
the faces of a map has a particular status whereas a plane graph has an external
(or unbounded) face. Let us consider for example the two graphs in Fig. 1a and
Fig. 1b: When embedded in a sphere, they are topologically isomorphic because
one can translate edge (d, c) by turning around the sphere, while this is not
possible when these graphs are embedded in a plane. In order to forbid turning
around the sphere through the external face, graphs are modelled by open 2-maps
from which the external faces are removed: only β2 is opened, and only external
faces are missing. Such open 2-maps correspond to topological disks.

Finally, a strong precondition for using our algorithms is that maps must
be connected. This implies that the original graphs must also be connected.
However, this is not a sufficient condition. One can show that an open 2-map
M modelling a plane graph G without its external face is connected if G is
connected and if the external face of G is delimited by an elementary cycle.
Hence, submap isomorphism may be used to decide in polynomial time if G1
is a compact plane subgraph of G2 provided that (i) G1 and G2 are modelled by
open 2-maps such that external faces are removed, and (ii) external faces of G1
and G2 are delimited by elementary cycles.
This result may be related to [JB98, JB99] which describe polynomial-time
algorithms for solving (sub)graph isomorphism of ordered graphs, i.e., graphs in
which the edges incident to a vertex are uniquely ordered.

6 Experiments

This section gives some preliminary experimental results that show the validity
of our approach. We first show that it allows us to find patterns in images, and
then we study the scale-up properties of our algorithm on plane graphs of growing
sizes. Experiments were run on an Intel Core2 Duo 2.20 GHz processor.

6.1 Finding Patterns in Images

We have considered the MOVI dataset provided by Hancock [LWH03]. This
dataset consists of images representing a house surrounded by several objects.
We consider two different kinds of plane graphs modelling these images. First,
we have segmented the images into regions and computed the 2D combinatorial
map of each segmented image using the approach described in [DBF04]. Second,
we have used the plane graphs provided by Hancock. They correspond to a set of
corner points extracted from the images and connected by Delaunay triangulation.
These graphs were then converted into 2D combinatorial maps. In both cases,
we have extracted patterns from the original images, and used our approach to
find these patterns.
The left part of Fig. 5 shows an image example, together with its plane
graph obtained after segmentation. This graph consists of 2435 nodes, 4057
edges and 1700 faces. The pattern extracted from this image corresponds to
the car, composed of 181 nodes, 311 edges and 132 faces. This pattern has been
found by our algorithm in the original image, even when subjected to rotation,
in 60 ms.
The Delaunay graph corresponding to the corner points is shown on the right
part of Fig. 5. It has 140 nodes, 404 edges and 266 faces. The graph corresponding
to the car has 16 nodes, 38 edges and 23 faces. This pattern has been found by
our algorithm in the original image in 10ms.
Experiments show that our approach always allows one to find these patterns
in the image they have been extracted from.

Fig. 5. Finding a car in an image: The original image is on the left. The plane graph
obtained after segmentation is in the middle; the car has been extracted and rotated on
the right, and it has been found in the original image. The graph obtained by Delaunay
triangulation and the corresponding combinatorial map are on the right; the car has
been extracted and it has been found in the original image.

6.2 Scale-Up Properties

To compare the scale-up properties of our approach with those of subgraph
isomorphism algorithms, we have performed a second series of experiments: we
have randomly generated 3 plane graphs, g500, g1000 and g5000, which have 500,
1000 and 5000 nodes respectively. These plane graphs are generated by randomly
picking n 2D points in the plane, then computing the Delaunay graph of these
points. For each plane graph gi, we have generated 5 subgraphs, called sgi,k%,
which have k% of the number of nodes of the original graph gi, where k belongs
to {5, 10, 20, 33, 50}.
Table 1 compares CPU times of our approach with those of Vflib2 [CFSV01],
a state-of-the-art approach for solving subgraph isomorphism (we present only
results of Vflib2 which is, in our experiments, always faster than Vflib and
Ullmann). It shows the interest of using submap isomorphism to solve compact
plane subgraph isomorphism. Indeed, while both approaches spend comparable
time on the smaller instances, larger instances are solved much more quickly by
our approach. In particular, instances (g5000, sg5000,k%) are solved in less than
one second by our approach, whereas they are not solved after one hour of
computation by Vflib2 when k ≥ 20.

Table 1. Comparison of scale-up properties of submap and subgraph isomorphism
algorithms. Each cell gives the CPU time (in seconds) spent by Vflib2 and our submap
algorithm to find all solutions. >3600 means that Vflib2 had not finished after one
hour of computation.

          sgi,5%        sgi,10%       sgi,20%        sgi,33%        sgi,50%
  gi     vf2    map    vf2    map    vf2     map    vf2     map    vf2     map
  g500   0.08   0.07   0.04   0.10   0.47    0.03   0.7     0.02   10.4    0.10
  g1000  4.7    0.21   2.54   0.07   0.55    0.05   7.31    0.06   12.7    0.06
  g5000  12.3   0.28   156.5  0.31   >3600   0.31   >3600   0.31   >3600   0.31

It is worth mentioning here that the two approaches actually solve different
problems: our approach searches for compact plane subgraphs whereas Vflib2
searches for induced subgraphs and does not exploit the fact that the graphs
are planar. Hence, the number of solutions found may differ: Vflib2 may find
subgraphs that are topologically different from the searched pattern; also, our
approach may find compact plane subgraphs that are partial (see Fig. 1c)
whereas Vflib2 only searches for induced subgraphs. For each instance considered
in Table 1, both methods find only one matching, except for sg5000,10%, which is
found twice in g5000 by Vflib2 and once by our approach.

7 Discussion

We have defined submap isomorphism, and we have proposed an associated
polynomial algorithm. This algorithm may be used to find compact subgraphs
in plane graphs. First experiments on images have shown that it can be used to
efficiently find patterns in images.
These first results open up very promising further work. In particular, our ap-
proach could be used to solve the subgraph isomorphism problem in polynomial
time for classes of planar graphs which admit polynomial numbers of planar em-
beddings. Also, the generalisation to 3- and higher-dimensional combinatorial
maps is immediate. Hence, our approach could be used to find subsets of compact
volumes in 3D images.
Submap isomorphism leads to exact measures, which may be used to check
if a pattern belongs to an image. We plan to extend this work to error-tolerant
measures such as the largest common submap, which could be used to find the
largest subset of edge-connected faces, and map edit distances, which could be
used to measure the similarity of maps by means of edit costs.
Finally, more relevant results in the image field could be obtained by inte-
grating geometric information: combinatorial maps may be labelled by features
extracted from the modelled image such as, e.g., the shape or the area of a face,
the angle between two segments, or the length of a segment. These labels may be
used to measure map similarity by quantifying the similarity of labels associated
with matched cells.

References

[CFSV01] Cordella, L.P., Foggia, P., Sansone, C., Vento, M.: An improved algorithm
for matching large graphs. In: 3rd IAPR-TC15 Workshop on Graph-based
Representations in Pattern Recognition, Ischia, Italy, pp. 149–159 (2001)
[CFSV04] Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty years of graph match-
ing in pattern recognition. International Journal of Pattern Recognition and
Artificial Intelligence 18(3), 265–298 (2004)
[Cor75] Cori, R.: Un code pour les graphes planaires et ses applications.
Astérisque, vol. 27. Soc. Math. de France (1975)

[DBF04] Damiand, G., Bertrand, Y., Fiorio, C.: Topological model for two-
dimensional image representation: definition and optimal extraction algo-
rithm. Computer Vision and Image Understanding 93(2), 111–154 (2004)
[Edm60] Edmonds, J.: A combinatorial representation for polyhedral surfaces. In:
Notices of the American Mathematical Society, vol. 7 (1960)
[JB98] Jiang, X., Bunke, H.: Marked subgraph isomorphism of ordered graphs.
In: Amin, A., Pudil, P., Dori, D. (eds.) SPR 1998 and SSPR 1998. LNCS,
vol. 1451, pp. 122–131. Springer, Heidelberg (1998)
[JB99] Jiang, X., Bunke, H.: Optimal quadratic-time isomorphism of ordered
graphs. Pattern Recognition 32(7), 1273–1283 (1999)
[Lie91] Lienhardt, P.: Topological models for boundary representation: a compar-
ison with n-dimensional generalized maps. Computer-Aided Design 23(1),
59–82 (1991)
[Lie94] Lienhardt, P.: N-dimensional generalized combinatorial maps and cellu-
lar quasi-manifolds. International Journal of Computational Geometry and
Applications 4(3), 275–324 (1994)
[LWH03] Luo, B., Wilson, R.C., Hancock, E.R.: Spectral embedding of graphs. Pat-
tern Recognition 36(10), 2213–2230 (2003)
[McK81] McKay, B.D.: Practical graph isomorphism. Congressus Numerantium 30,
45–87 (1981)
[PABL07] Poudret, M., Arnould, A., Bertrand, Y., Lienhardt, P.: Cartes combina-
toires ouvertes. Research Notes 2007-1, Laboratoire SIC E.A. 4103, F-86962
Futuroscope Cedex - France (October 2007)
[SS08] Sorlin, S., Solnon, C.: A parametric filtering algorithm for the graph iso-
morphism problem. Constraints 13(4), 518–537 (2008)
[Tut63] Tutte, W.T.: A census of planar maps. Canad. J. Math. 15, 249–271 (1963)
[ZDS+ 07] Zampelli, S., Deville, Y., Solnon, C., Sorlin, S., Dupont, P.: Filtering for
subgraph isomorphism. In: Bessière, C. (ed.) CP 2007. LNCS, vol. 4741,
pp. 728–742. Springer, Heidelberg (2007)
A Recursive Embedding Approach to Median
Graph Computation

M. Ferrer¹, D. Karatzas², E. Valveny², and H. Bunke³

¹ Institut de Robòtica i Informàtica Industrial, UPC-CSIC
  C. Llorens Artigas 4-6, 08028 Barcelona, Spain
  mferrer@iri.upc.edu
² Centre de Visió per Computador, Universitat Autònoma de Barcelona
  Edifici O Campus UAB, 08193 Bellaterra, Spain
  {dimos,ernest}@cvc.uab.cat
³ Institute of Computer Science and Applied Mathematics, University of Bern
  Neubrückstrasse 10, CH-3012 Bern, Switzerland
  bunke@iam.unibe.ch

Abstract. The median graph has been shown to be a good choice to
infer a representative of a set of graphs. It has been successfully applied
to graph-based classification and clustering. Nevertheless, its computa-
tion is extremely complex. Several approaches, based on different strate-
gies, have been presented up to now. In this paper we present a new ap-
proximate recursive algorithm for median graph computation based on
graph embedding into vector spaces. Preliminary experiments on three
databases show that this new approach is able to obtain better medians
than the previously existing approaches.

1 Introduction
Graphs are a powerful tool to represent structured objects compared to other
alternatives such as feature vectors. For instance, a recent work comparing the
representational power of such approaches under the context of web content
mining has been presented in [1]. Experimental results show better accuracies of
the graph-based approaches over the vector-based methods. Nevertheless, some
basic operations such as computing the sum or the mean of a set of graphs,
become very difficult or even impossible in the graph domain.
The mean of a set of graphs has been defined using the concept of the median
graph. Given a set of graphs, the median graph [2] is defined as the graph
that has the minimum sum of distances (SOD) to all graphs in the set. It can
be seen as a representative of the set. Thus it has a large number of potential
applications primarily enabling many classical algorithms for learning, clustering
and classification typically used in the vector domain. However, its computation
time increases exponentially both in terms of the number of input graphs and
their size [3]. A number of algorithms for the median graph computation have
been reported in the past [2,3,4,5], but, in general, they either suffer from a large
complexity or they are restricted to specific applications.

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 113–123, 2009.
© Springer-Verlag Berlin Heidelberg 2009
114 M. Ferrer et al.

In this paper we propose a new approximate method based on graph embed-
ding in vector spaces. Graph embedding has recently been used as a way to map
graphs into vector spaces [6] using the graph edit distance [7]. In this way we
can combine advantages from both domains: we keep the representational power
of graphs while being able to operate in a vector space. The median of the set of
vectors obtained with this mapping can be easily computed in the vector space.
Then, applying recursively the weighted mean of a pair of graphs [8] we go from
the vector domain back to the graph domain and we obtain an approximation of
the median graph from the obtained median vector. This is the main difference
over other embedding-based methods for the median graph computation like [9],
where they obtain a graph not corresponding to the median vector of the whole
set but the median of just three graphs of the set. We have made experiments
on three different graph databases. The underlying graphs have no constraints
regarding the number of nodes and edges. The results show that our method
obtains better medians, in terms of the SOD, that two other previous meth-
ods. With these results at hand, we can think of applying this new approach
to the world of real graph-based applications in pattern recognition and ma-
chine learning. In addition, our procedure potentially allows us to transfer any
machine learning algorithm that uses a median, from the vector to the graph
domain.
The rest of this paper is organized as follows. First, the basic concepts are
introduced in the next section. Then, we introduce in detail the concept of the
median graph and the previous work on its computation in Section 3. In Section 4
the proposed method for median computation is described. Section 5 reports
a number of experiments and presents the results achieved with our method.
Finally, in Section 6 we draw some conclusions.

2 Basic Definitions
2.1 Graph
Given L, a finite alphabet of labels for nodes and edges, a graph g is defined
by the four-tuple g = (V, E, μ, ν) where V is a finite set of nodes, E ⊆ V × V is
the set of edges, μ is the node labeling function (μ : V −→ L) and ν is the edge
labeling function (ν : V × V −→ L). The alphabet of labels is not constrained
in any way. For example, L can be defined as a vector space (i.e. L = Rn ) or
simply as a set of discrete labels (i.e. L = {Δ, Σ, Ψ, · · · }). Edges are defined as
ordered pairs of nodes, that is, an edge is defined by (u, v) where u, v ∈ V . The
edges are directed in the sense that if the edge is defined as (u, v) then u ∈ V is
the source node and v ∈ V is the target node.

2.2 Graph Edit Distance


The basic idea behind the graph edit distance [7,10] is to define the dissimilarity
of two graphs as the minimum amount of distortion required to transform one
graph into the other. To this end, a number of distortion or edit operations e,

consisting of the insertion, deletion and substitution of both nodes and edges are
defined. Given these edit operations, for every pair of graphs, g1 and g2 , there
exists a sequence of edit operations, or edit path p(g1 , g2 ) = (e1 , . . . , ek ) (where
each ei denotes an edit operation) that transforms g1 into g2 (see Figure 1). In
general, several edit paths exist between two given graphs. This set of edit paths
is denoted by ℘(g1 , g2 ). To evaluate which edit path is the best, edit costs are
introduced through a cost function. The basic idea is to assign a penalty (or cost)
c to each edit operation according to the amount of distortion it introduces in
the transformation. The edit distance between two graphs g1 and g2 , d(g1 , g2 ),
is the minimum cost edit path that transforms one graph into the other. Since
computing the graph edit distance is an NP-complete problem, in this paper we
will use suboptimal methods for its computation [11,12].

Fig. 1. Example of a possible edit path between two graphs, g1 and g2

3 Median Graph
Let U be the set of graphs that can be constructed using labels from L. Given
S = {g1, g2, ..., gn} ⊆ U, the generalized median graph ḡ of S is defined as:

    ḡ = arg min_{g ∈ U} Σ_{gi ∈ S} d(g, gi)        (1)

That is, the generalized median graph ḡ of S is a graph g ∈ U that minimizes
the sum of distances (SOD) to all the graphs in S. Notice that ḡ is usually not a
member of S, and in general more than one generalized median graph may exist
for a given set S.
The computation of the generalized median graph can only be done in ex-
ponential time, both in the number of graphs in S and their size [2]. As a
consequence, in real world applications we are forced to use suboptimal meth-
ods in order to obtain solutions for the generalized median graph in reasonable
time. Such approximate methods [2,4,5,13] apply some heuristics in order to re-
duce the graph edit distance computation complexity and the size of the search
space.
Another alternative is to use the set median graph instead of the generalized
median graph. The difference is that, while the search space for the generalized
median graph is U , that is, the whole universe of graphs, the search space for the
set median graph is simply S, that is, the set of graphs in the given set. It makes
the computation of set median graph exponential in the size of the graphs, due to
the complexity of graph edit distance, but polynomial with respect to the number

of graphs in S. The set median graph is usually not the best representative of a
set of graphs, but it is often a good starting point when searching the generalized
median graph.
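In code, the set median is a one-liner for any plug-in distance function (a generic sketch; here d stands in for the graph edit distance, and the toy test uses numbers with the absolute difference as distance):

```python
def sod(g, S, d):
    """Sum of distances (SOD) from g to every member of S."""
    return sum(d(g, h) for h in S)

def set_median(S, d):
    """The member of S minimizing the SOD, i.e. Eq. (1) with U restricted to S."""
    return min(S, key=lambda g: sod(g, S, d))
```

Because the search space is just S itself, the cost is |S|² distance computations rather than a search over the whole universe U.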

3.1 Median Graph via Embedding

Graph embedding aims to convert graphs into another structure, such as real
vectors, and then operate in the associated space to facilitate certain graph-based
tasks, such as matching and clustering.
In this paper we will use a new class of graph embedding procedures based on
the selection of some prototypes and graph edit distance computation [6]. For
the sake of completeness, we briefly describe this approach in the following.
Assume we have a set of training graphs T = {g1, g2, . . . , gn} and a graph
dissimilarity measure d(gi, gj) (i, j = 1 . . . n; gi, gj ∈ T). Then, a set P =
{p1, . . . , pm} ⊆ T of m prototypes is selected from T (with m ≤ n). After that,
the dissimilarity between a given graph g ∈ T and every prototype p ∈ P is
computed. This leads to m dissimilarity values, d1, . . . , dm, where dk = d(g, pk).
These dissimilarities can be arranged in a vector (d1, . . . , dm). In this way, we
can transform any graph of the training set T into an m-dimensional vector using
the prototype set P.
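The embedding step itself is easily sketched (illustrative; d again stands in for the graph edit distance, and the test uses numbers as toy "graphs"):

```python
def embed(T, prototypes, d):
    """Dissimilarity embedding: map each g in T to (d(g, p1), ..., d(g, pm))."""
    return [[d(g, p) for p in prototypes] for g in T]
```

When the prototype set is the whole training set, as in Section 4 below, each of the n graphs becomes an n-dimensional vector.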
Such an embedding has already been used for approximate median graph
computation [9]. The idea behind this approach is a three-step process. Assuming
that a set of n graphs S = {g1, g2, . . . , gn} is given, in a first step every graph
in S is embedded into an n-dimensional space, i.e., each graph becomes a point
in Rn, because in our case the set of prototypes P is the whole set S, and
therefore there is no prototype selection. The second step
consists of computing the median vector M of all the points obtained in the
previous step. Finally, the resulting median vector has to be mapped back to an
equivalent graph. This last step of mapping back from the vector space to the
graph space presents a difficult problem for a number of reasons. To mention just
two: depending on the embedding technique, not every point in the (continuous)
vector space corresponds to a graph; secondly, a particular vector may present
a one-to-many relationship to graphs. For instance, to obtain the median graph,
in [9] the three closest points to the computed median vector M are used to
compute their own median M′ (which always falls on the plane defined by them).
Using these three points (corresponding to known graphs) and the new median
M′, the weighted mean approach [8] is used to recover a graph ḡ′ (corresponding
to M′), which is taken as an approximation of the median graph ḡ of S.
In the next section we present a new recursive approach for computing the
median graph of a given set of graphs, based on the embedding procedure explained
above. The aim of the presented approach is to obtain a graph corresponding
to the actual median vector M of the whole set S. We show that, as expected,
obtaining a graph corresponding to the real median vector M produces better
medians (with a lower SOD to the graphs of the set) than using the graph
corresponding to M′, as in the approach of [9].

4 A Recursive Embedding Approach

As explained before, the difficulty in using graph embedding to calculate the
median graph is the mapping from the vector space back to the graph space. Here
we propose a recursive solution to this problem based on the algorithm of the
weighted mean of a pair of graphs [8].
The weighted mean of two graphs g and g′ is a graph g″ such that

    d(g, g″) = a                              (2)

    d(g, g′) = a + d(g″, g′)                  (3)

where a, with 0 ≤ a ≤ d(g, g′), is a constant. That is, the graph g″ is a graph
between the graphs g and g′ along the edit path between them. Furthermore, if
the distance between g and g″ is a and the distance between g″ and g′ is b, then
the distance between g and g′ is a + b.
Assume that we can define a line segment in the vector space that connects
two points P1 and P2, corresponding to the known graphs g1 and g2, such that
the calculated median M lies on this line segment. We can then calculate the
graph gM corresponding to the median M as the weighted mean of g1 and g2. The
problem is thus reduced to creating such a line segment in the vector space. We
show here how this can be achieved by recursively applying the weighted mean
of a pair of graphs.
Given a set of graphs S = {g1 , g2 , . . . , gn }, we use the graph embedding
method described in Section 3.1 to obtain the corresponding n-dimensional
points {P1 , P2 , . . . , Pn } in Rn . As long as there are no identical graphs in the set
S, the vectors vi = (Pi − O), where O is the origin of the n-dimensional space
defined, will be linearly independent. This arises from the way the coordinates
of the points were defined during graph embedding.
Once all the graphs have been embedded in the vector space, the median
of the corresponding points is computed. To this end we use the concept of
Euclidean Median using the Weiszfeld algorithm [14] as in the case of [9]. The
Euclidean median has been chosen as the representative in the vector domain for
two reasons. The first reason is that the median of a set of objects is one of the
most promising ways to obtain the representative of such a set. The second is
that, since the median graph is defined in a very close way to the median vector,
we expect the median vector to represent accurately the vectorial representation
of the median graph, and then, from the median vector to obtain good median
graphs.
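The Weiszfeld iteration mentioned above can be sketched in a few lines. This is an illustrative implementation, not the authors' code; the function name `euclidean_median`, the centroid start, and the stopping parameters are our own choices, and the guard for an iterate coinciding with a sample point is a simplification of the full algorithm.

```python
import math

def euclidean_median(points, max_iter=1000, tol=1e-9):
    """Weiszfeld iteration: an iteratively re-weighted average converging to
    the point minimizing the sum of Euclidean distances to `points`."""
    dim = len(points[0])
    # Start from the centroid of the set.
    x = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    for _ in range(max_iter):
        num, den = [0.0] * dim, 0.0
        for p in points:
            d = math.dist(x, p)
            if d < tol:          # x (nearly) coincides with a sample point;
                return list(p)   # simplified guard, full algorithm needs more care
            w = 1.0 / d
            for i in range(dim):
                num[i] += w * p[i]
            den += w
        x_new = [c / den for c in num]
        if math.dist(x, x_new) < tol:
            return x_new
        x = x_new
    return x
```

For instance, for the four points (0, 0), (2, 0), (1, 1), (1, −1) the iteration returns (1, 0) by symmetry.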
Given a set of n linearly independent points in Rn we can define a hyper-
plane Hn−1 of dimensionality n-1 (e.g. in the case of n=2, two points define a
unique 1D line, in the case of n=3, three points define a unique 2D plane, etc).
The normal vector Nn−1 of the hyperplane Hn−1 can be calculated from the
following set of equations:
118 M. Ferrer et al.

(Pn − P1) · Nn−1 = 0
(Pn − P2) · Nn−1 = 0
...
(Pn − Pn−1) · Nn−1 = 0    (4)
‖Nn−1‖ = 1

The Euclidean median Mn of these n points will always fall on the hyperplane
Hn−1 . Moreover it will fall within the volume of the n-1 dimensional simplex
with vertices Pi . For n=4 this is visualised in Figure 2(a). This figure shows
the hyperplane H3 defined by the 4 points Pi = {P1 , P2 , P3 , P4 }. The Euclidean
median M4 falls in the 3D space defined by the 4 points and specifically within
the pyramid (3D simplex) with vertices Pi .

Fig. 2. (a) The 3D hyperplane defined by four 4D points {P1, P2, P3, P4}. (b) The 2D hyperplane defined by the remaining points {P1, P2, P3}.

Without loss of generality we can choose any one of the points, say Pn , and
create the vector (Mn − Pn ). This vector will lie fully on the hyperplane Hn−1 .
As mentioned before, in order to use the weighted mean of a pair of graphs
to calculate the graph corresponding to Mn we need to first find a point
(whose corresponding graph is known) that lies on the line defined by the vector
(Mn − Pn ), and specifically on the ray extending Mn (so that Mn lies between
Pn and the new point).
Let’s call Hn−2 the hyperplane of dimensionality n-2 defined by the set of
points {P1 , P2 , . . . , Pn−1 }, that is all the original points except Pn . Then the
intersection of the line defined by the vector (Mn − Pn ) and the new hyperplane
Hn−2 will be a single point. For the running example of n=4 this point (M3 )
would be the point of intersection of the line P4 − M4 and the 2D plane H2
defined by the remaining points {P1 , P2 , P3 } (see Figure 2(a)).

For the normal vector Nn−2 of the hyperplane Hn−2 we can create the fol-
lowing set of n-1 equations in a similar fashion as before:
(Pn−1 − P1) · Nn−2 = 0
(Pn−1 − P2) · Nn−2 = 0
...
(Pn−1 − Pn−2) · Nn−2 = 0    (5)
‖Nn−2‖ = 1
Furthermore, we ask that Nn−2 is perpendicular to Nn−1 (i.e. it falls within the
hyperplane Hn−1 ):
Nn−1 · Nn−2 = 0 (6)
Equations 5 and 6 provide us with a set of n equations to calculate Nn−2.
Suppose Mn−1 is the point of intersection of the line defined by (Mn − Pn)
and the hyperplane Hn−2; then for this point it must hold that:

Mn−1 = Pn + α (Mn − Pn)    (7)

(Pn−1 − Mn−1) · Nn−2 = 0    (8)

Solving the above equations for α, we have:

α = [Nn−2 · (Pn−1 − Pn)] / [Nn−2 · (Mn − Pn)]    (9)
Substituting back into Eq. 7, we obtain the point Mn−1.
We can now follow exactly the same process as before, and assume a new
line defined by the vector (Mn−1 − Pn−1 ). Again we can define as Mn−2 the
point of intersection of the above line with the n-3 dimensional hyperplane Hn−3
which is defined by the n−2 points {P1, P2, . . . , Pn−2}. As an example, see Figure
2(b) for n=4. In this figure the point M2 is defined as the intersection of the
line defined by (M3 − P3) and the 1D hyperplane (line) H1 defined by the
remaining points {P1, P2}.
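The intersection step of Eqs. (7)–(9) can be verified numerically. The sketch below works the n = 3 case in R³, where the two normals can be obtained by cross products; the centroid stands in for the Euclidean median M3, the unit-norm condition is skipped because α is invariant to the scaling of the normal, and all names are our own illustrative choices.

```python
# Worked instance of Eqs. (7)-(9) for n = 3 points in R^3.

def sub(a, b):   return tuple(x - y for x, y in zip(a, b))
def dot(a, b):   return sum(x * y for x, y in zip(a, b))
def cross(a, b): return (a[1]*b[2] - a[2]*b[1],
                         a[2]*b[0] - a[0]*b[2],
                         a[0]*b[1] - a[1]*b[0])

P1, P2, P3 = (0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)
M3 = tuple(sum(c) / 3 for c in zip(P1, P2, P3))  # stand-in for the median

N2 = cross(sub(P3, P1), sub(P3, P2))   # normal of the plane H2 (cf. Eq. 4)
N1 = cross(sub(P2, P1), N2)            # in-plane normal of the line H1 (Eqs. 5-6)

alpha = dot(N1, sub(P2, P3)) / dot(N1, sub(M3, P3))         # Eq. 9
M2 = tuple(p + alpha * d for p, d in zip(P3, sub(M3, P3)))  # Eq. 7

# M2 lies on the line through P1 and P2, as claimed:
assert all(abs(c) < 1e-9 for c in cross(sub(M2, P1), sub(P2, P1)))
# here alpha is approximately 1.5 and M2 approximately (0.5, 0, 0)
```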
In the generic case the set of n equations needed to calculate the normal
vector Nk of the k-dimensional hyperplane Hk is:

(Pk+1 − P1) · Nk = 0
(Pk+1 − P2) · Nk = 0
...
(Pk+1 − Pk) · Nk = 0
Nn−1 · Nk = 0    (10)
Nn−2 · Nk = 0
...
Nk+1 · Nk = 0
‖Nk‖ = 1

Based on Eqs. 7, 8 and 9, in the generic case the point Mk can be computed
recursively from:

Mk = Pk+1 + α (Mk+1 − Pk+1)    (11)

where:

α = [Nk−1 · (Pk − Pk+1)] / [Nk−1 · (Mk+1 − Pk+1)]    (12)
This process is applied recursively until M2 is reached. The case of M2 is solvable
using the weighted mean of a pair of graphs, as M2 will lie on the line segment
defined by P1 and P2 which correspond to known graphs (see Figure 2(b)).
Having calculated M2 the inverse process can be followed all the way up to
Mn . In the next step M3 can be calculated as the weighted mean of the graphs
corresponding to M2 and P3 . Generally the graph corresponding to the point
Mk will be given as the weighted mean of the graphs corresponding to Mk−1 and
Pk . The weighted mean algorithm can be applied continuously until the graph
corresponding to Mn is calculated, which is the median graph of the set.
It is important to note that the order of consideration of the points will affect
the final solution arrived at. As a result it is possible that one of the intermediate
solutions along the recursive path produces a lower SOD to the graphs of the
set than the final solution. Thus, the results reported here are based on the best
intermediate solutions.
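The selection rule above can be written generically: among all intermediate medians, keep the one with the smallest sum of distances (SOD) to the set. The helper `best_by_sod` and the toy example with numbers standing in for graphs are our own illustration.

```python
def best_by_sod(candidates, graphs, dist):
    """Return the candidate with the smallest sum of distances (SOD) to the set."""
    return min(candidates, key=lambda c: sum(dist(c, g) for g in graphs))

# Toy usage: numbers stand in for graphs, |a - b| for the edit distance.
assert best_by_sod([0, 5, 10], [4, 5, 6], lambda a, b: abs(a - b)) == 5
```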

5 Experiments
In this section we provide the results of an experimental evaluation of the proposed algorithm. To this end we have used three graph databases representing
Letter shapes, Webpages and Molecules. Table 1 shows some characteristics of
each dataset. For more information on these databases see [15].
To evaluate the quality of the proposed method, we propose to compare the
SOD of the median calculated using the present method (RE) taking the best
intermediate solution to the SOD of the median obtained using other existing
methods, namely the set median (SM) and the method of [9] (TE). For every
database we generated sets of different sizes as shown in Table 1. The graphs in
each set were chosen randomly from the whole database. In order to generalize
the results, we generated 10 different sets for each size.
Results of the mean value of the SOD over all the classes and repetitions for
each dataset are shown in Figure 3. Clearly, the lower the SOD, the better the

Table 1. Summary of dataset characteristics, viz. the size, the number of classes (#
classes), the average size of the graphs (∅ nodes) and the sizes of the sets

Database Size # classes ∅ nodes Number of Graphs in S


Letter 2,250 15 4.7 15, 25, 35, ..., 75
Webpages 2,340 6 186.1 15, 25, 35, ..., 75
Molecules 2,000 2 15.7 10, 20, 30, ..., 100

Fig. 3. SOD evolution for the three databases: (a) Letter, (b) Molecule and (c) Webpage.

result. Since the set median graph is the graph belonging to the training set with
minimum SOD, it is a good reference to evaluate the median graph quality.
As we can see, the results show that in all cases we obtain medians with lower
SOD than those obtained with the TE method. In addition, in two cases (Web
and Molecule) we also obtain better results than the SM method. In the case of
the Letter database, we obtain slightly worse results than the SM method, but
quite close to it. Nevertheless, our results do not diverge from those of the
SM method in the way the TE results do, which means that our proposed
method is more robust with respect to the size of the set. With these results we can
conclude that our method finds good approximations of the median graph.

6 Conclusions
In the present paper we have proposed a novel technique to obtain approximate
solutions for the median graph. This new approach is based on graph embed-
ding into vector spaces. First, the graphs are mapped to points in n-dimensional
vector space using the graph edit distance paradigm. Then, the crucial point
of obtaining the median of the set is carried out in the vector space, not in the
graph domain, which dramatically simplifies this operation. Finally, we proposed
a recursive application of the weighted mean of a pair of graphs to obtain the
graph corresponding to the median vector. This embedding approach allows us
to exploit the main advantages of both the vector and graph representations,
computing the more complex parts in real vector spaces but keeping the rep-
resentational power of graphs. Results on three databases, containing a high

number of graphs with large sizes, show that the medians obtained with our
method are, in general, better than those obtained with other methods, in terms
of the SOD. For datasets such as the ones used in this paper, the generalized median
could not be computed before, due to the high computational cost of the existing methods. These results show that with this new procedure the median graph
can potentially be applied to any application where a representative of a set is
needed. Nevertheless, there are still a number of issues to be investigated. For
instance, the order in which the points are taken is an important topic for
further study in order to improve the results of the method.

Acknowledgements
This work has been supported by the Spanish research programmes Consolider
Ingenio 2010 CSD2007-00018, TIN2006-15694-C02-02 and TIN2008-04998, the
fellowship 2006 BP-B1 00046 and the Swiss National Science Foundation Project
200021-113198/1.

References
1. Schenker, A., Bunke, H., Last, M., Kandel, A.: Graph-Theoretic Techniques for
Web Content Mining. World Scientific Publishing, USA (2005)
2. Jiang, X., Münger, A., Bunke, H.: On median graphs: Properties, algorithms, and
applications. IEEE Trans. Pattern Anal. Mach. Intell. 23(10), 1144–1151 (2001)
3. Münger, A.: Synthesis of prototype graphs from sample graphs. Diploma Thesis,
University of Bern (1998) (in German)
4. Hlaoui, A., Wang, S.: Median graph computation for graph clustering. Soft Com-
put. 10(1), 47–53 (2006)
5. Ferrer, M., Serratosa, F., Sanfeliu, A.: Synthesis of median spectral graph. In: Mar-
ques, J.S., Pérez de la Blanca, N., Pina, P. (eds.) IbPRIA 2005. LNCS, vol. 3523,
pp. 139–146. Springer, Heidelberg (2005)
6. Riesen, K., Neuhaus, M., Bunke, H.: Graph embedding in vector spaces by means
of prototype selection. In: Escolano, F., Vento, M. (eds.) GbRPR 2007. LNCS,
vol. 4538, pp. 383–393. Springer, Heidelberg (2007)
7. Bunke, H., Allerman, G.: Inexact graph matching for structural pattern recogni-
tion. Pattern Recognition Letters 1(4), 245–253 (1983)
8. Bunke, H., Günter, S.: Weighted mean of a pair of graphs. Computing 67(3), 209–
224 (2001)
9. Ferrer, M., Valveny, E., Serratosa, F., Riesen, K., Bunke, H.: An approximate
algorithm for median graph computation using graph embedding. In: Proceedings
of 19th ICPR, pp. 287–297 (2008)
10. Sanfeliu, A., Fu, K.: A distance measure between attributed relational graphs for
pattern recognition. IEEE Transactions on Systems, Man and Cybernetics 13(3),
353–362 (1983)
11. Neuhaus, M., Riesen, K., Bunke, H.: Fast suboptimal algorithms for the compu-
tation of graph edit distance. In: Yeung, D.-Y., Kwok, J.T., Fred, A., Roli, F.,
de Ridder, D. (eds.) SSPR 2006 and SPR 2006. LNCS, vol. 4109, pp. 163–172.
Springer, Heidelberg (2006)

12. Riesen, K., Neuhaus, M., Bunke, H.: Bipartite graph matching for computing the
edit distance of graphs. In: Escolano, F., Vento, M. (eds.) GbRPR 2007. LNCS,
vol. 4538, pp. 1–12. Springer, Heidelberg (2007)
13. White, D., Wilson, R.C.: Mixing spectral representations of graphs. In: 18th In-
ternational Conference on Pattern Recognition (ICPR 2006), Hong Kong, China,
August 20-24, pp. 140–144. IEEE Computer Society, Los Alamitos (2006)
14. Weiszfeld, E.: Sur le point pour lequel la somme des distances de n points donnés
est minimum. Tohoku Math. Journal (43), 355–386 (1937)
15. Riesen, K., Bunke, H.: IAM graph database repository for graph based pattern
recognition and machine learning. In: SSPR/SPR, pp. 287–297 (2008)
Efficient Suboptimal Graph Isomorphism

Kaspar Riesen¹, Stefan Fankhauser¹, Horst Bunke¹, and Peter Dickinson²

¹ Institute of Computer Science and Applied Mathematics, University of Bern,
Neubrückstrasse 10, CH-3012 Bern, Switzerland
{bunke,fankhauser,riesen}@iam.unibe.ch
² C3I Division, DSTO, PO Box 1500, Edinburgh SA 5111, Australia
peter.dickinson@dsto.defence.gov.au

Abstract. In the field of structural pattern recognition, graphs provide
us with a common and powerful way to represent objects. Yet, one of the
main drawbacks of graph representation is that the computation of stan-
dard graph similarity measures is exponential in the number of involved
nodes. Hence, such computations are feasible for small graphs only. The
present paper considers the problem of graph isomorphism, i.e. checking
two graphs for identity. A novel approach for the efficient computation
of graph isomorphism is presented. The proposed algorithm is based on
bipartite graph matching by means of Munkres’ algorithm. The algorith-
mic framework is suboptimal in the sense of possibly rejecting pairs of
graphs without making a decision. As an advantage, however, it offers
polynomial runtime. In experiments on two TC-15 graph sets we demon-
strate substantial speedups of our proposed method over several stan-
dard procedures for graph isomorphism, such as Ullmann’s method, the
VF2 algorithm, and Nauty. Furthermore, although the computational
framework for isomorphism is suboptimal, we show that the proposed
algorithm rejects only very few pairs of graphs.

1 Introduction
Graphs, employed in structural pattern recognition, offer a versatile alternative
to feature vectors for pattern representation. Particularly in problem domains
where the objects consist of complex and interrelated substructures of different
size, graph representations are advantageous. However, after the initial enthusi-
asm induced by the “smartness” and flexibility of graphs in the late seventies,
graphs have been left almost unused for a long period of time [1]. One of the
reasons for this phenomenon is that their comparison, termed graph matching,
is computationally very demanding.
The present paper addresses the most elementary graph matching problem,
which is graph isomorphism. Several algorithms for the computation of graph
isomorphism have been proposed in the literature [2,3,4,5,6,7,8]. Note, however,
that no polynomial runtime algorithm is known for this particular decision prob-
lem [9]. For all available algorithms, the computational complexity of graph
isomorphism is exponential in the number of nodes in the case of general graphs.
However, since the graphs encountered in practice often have special properties

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 124–133, 2009.

© Springer-Verlag Berlin Heidelberg 2009

and furthermore, the labels of both nodes and edges very often help to substan-
tially reduce the search time, the actual computation time is sometimes manage-
able. In fact, polynomial algorithms for graph isomorphism have been developed
for special kinds of graphs, such as trees [10], planar graphs [11], bounded-valence
graphs [12], ordered graphs [13], and graphs with unique node labels [14]. Appli-
cations of the graph isomorphism problem can be found, for example, in compu-
tational chemistry [12] and in electronic automation [15]. Nonetheless, the high
computational complexity of graph isomorphism in case of general graphs con-
stitutes a serious drawback that prevents the more widespread use of graphs in
pattern recognition and related fields.
The present paper introduces a novel framework for the problem of graph
isomorphism. It is not restricted to any special class of graphs. The basic idea
is inspired by two papers, viz. [16,17]. In [16] it was shown that the problem
of graph isomorphism can be seen as a special case of optimal error-tolerant
graph matching under particular cost functions. In [17] a framework for fast
but suboptimal graph edit distance based on bipartite graph matching has been
proposed. The method is based on an (optimal) fast bipartite optimization pro-
cedure mapping nodes and their local structure of one graph to nodes and their
local structure of another graph. This procedure is somewhat similar in spirit
to the method proposed in [18]. However, rather than using dynamic program-
ming for finding an optimal match between the sets of local structure, Munkres’
algorithm [19] is used.
The work presented here combines these two ideas to obtain a suboptimal algo-
rithmic framework for graph isomorphism with polynomial runtime. Concretely,
the problem of graph isomorphism is reduced to an instance of the assignment
problem. In fact, polynomial runtime algorithms exist that solve the assignment
problem in an optimal way. Yet, due to the fact that the assignment procedure
regards the nodes and their local structure only, it cannot be guaranteed that an
existing graph isomorphism between two graphs is detected in any case. Some-
times the proposed algorithm may not be able to decide, for a given pair of
graphs, whether they are isomorphic or not. In such a case, the given pair of
graphs is rejected. Consequently, the algorithm is suboptimal in the sense that
it does not guarantee to process any given input. However, if a pair of graphs is
not rejected the decision returned by the algorithm (yes or no) is always correct.
With experimental results achieved on two data sets from the TC-15 graph data
base [20], we empirically verify the feasibility of our novel approach to the graph
isomorphism problem.

2 Graph Isomorphism

Definition 1 (Graph). Let LV and LE be finite or infinite label alphabets for
nodes and edges, respectively. A graph g is a four-tuple g = (V, E, μ, ν), where V
is the finite set of nodes, E ⊆ V × V is the set of edges, μ : V → LV is the node
labeling function, and ν : E → LE is the edge labeling function. The number of
nodes and edges of a graph g is denoted by |V| and |E|, respectively.

The aim in exact graph matching is to determine whether two graphs, or parts
of them, are identical in terms of structure and labels. A common approach
to describe the structure of a graph g = (V, E, μ, ν) is to define the graph’s
adjacency matrix A = (aij )n×n (|V | = n). In the adjacency matrix, entry aij is
equal to 1 if there is an edge (vi , vj ) ∈ E connecting the i-th node with the j-th
node in g, and 0 otherwise.¹ Generally, for the nodes (and also the edges) of a
graph there is no unique canonical order. Thus, for a single graph with n nodes,
n! different adjacency matrices exist. Consequently, for checking two graphs for
structural identity, we cannot merely compare their adjacency matrices. The
identity of two graphs g1 and g2 is commonly established by defining a function,
termed graph isomorphism, mapping g1 to g2 .

Definition 2 (Graph Isomorphism). Assume that two graphs g1 = (V1, E1, μ1, ν1) and g2 = (V2, E2, μ2, ν2) are given. A graph isomorphism is a bijective
function f : V1 → V2 satisfying

1. μ1(u) = μ2(f(u)) for all nodes u ∈ V1
2. for each edge e1 = (u, v) ∈ E1, there exists an edge e2 = (f(u), f(v)) ∈ E2 such that ν1(e1) = ν2(e2)
3. for each edge e2 = (u, v) ∈ E2, there exists an edge e1 = (f⁻¹(u), f⁻¹(v)) ∈ E1 such that ν1(e1) = ν2(e2)

Two graphs are called isomorphic if there exists an isomorphism between them.
Obviously, isomorphic graphs are identical in both structure and labels. The
relation of graph isomorphism satisfies the conditions of reflexivity, symmetry,
and transitivity and can therefore be regarded as an equivalence relation on
graphs.
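Definition 2 can be checked directly, if exponentially, by trying every bijection f. Note that because f is a bijection and we additionally require |E1| = |E2|, verifying Condition 2 also settles Condition 3. The sketch below and its graph encoding (dicts of node labels and of edge labels) are our own, intended only for tiny graphs.

```python
from itertools import permutations

def is_isomorphic(g1, g2):
    """g = (node_labels, edge_labels): dicts keyed by node and by (u, v) pair.
    Exponential brute force over all bijections f: V1 -> V2."""
    mu1, nu1 = g1
    mu2, nu2 = g2
    V1, V2 = list(mu1), list(mu2)
    if len(V1) != len(V2) or len(nu1) != len(nu2):
        return False
    for perm in permutations(V2):
        f = dict(zip(V1, perm))
        if all(mu1[u] == mu2[f[u]] for u in V1) and \
           all((f[u], f[v]) in nu2 and nu1[(u, v)] == nu2[(f[u], f[v])]
               for (u, v) in nu1):
            return True   # Conditions 1-3 of Def. 2 hold under f
    return False

# Two directed, labeled triangles that differ only in node naming:
g1 = ({1: 'a', 2: 'b', 3: 'c'}, {(1, 2): 'x', (2, 3): 'x', (3, 1): 'x'})
g2 = ({7: 'b', 8: 'c', 9: 'a'}, {(7, 8): 'x', (8, 9): 'x', (9, 7): 'x'})
assert is_isomorphic(g1, g2)
```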
Standard procedures for testing graphs for isomorphism are based on tree
search techniques with backtracking. A well known algorithm implementing the
idea of a tree search with backtracking for graph isomorphism is described in [2].
A more recent algorithm for graph isomorphism, also based on the idea of tree
search, is the VF algorithm and its successor VF2 [21]. Here the basic tree search
algorithm is endowed with an efficiently computable heuristic which substan-
tially speeds up the search time. In [4] the tree search method for isomorphism
is sped up by means of another heuristic based on constraint satisfaction. An-
other algorithm for exact graph matching is Nauty [5]. It is based on a set of
transformations that reduce the graphs to be matched to a canonical form on
which the testing of the isomorphism is significantly faster. In [8] an approxi-
mate solution to the graph isomorphism problem, using the eigendecompositions
of the adjacency or Hermitian matrices, is discussed. In [6] a novel approach to
the graph isomorphism problem based on quantum walks is proposed. The basic
idea is to simulate coined quantum walks on an auxiliary graph representing
possible node matchings of the underlying graphs. The reader is referred to [1]
for an exhaustive list of graph isomorphism algorithms developed since 1973.
¹ Two nodes vi, vj ∈ V connected by an edge (vi, vj) ∈ E or (vj, vi) ∈ E are commonly referred to as adjacent.

3 Bipartite Matching for Graph Isomorphism


The proposed approach for graph isomorphism is based on the assignment prob-
lem. The assignment problem considers the task of finding an optimal assignment
of the elements of a set A to the elements of a set B, where A and B have the
same cardinality. Assuming that numerical costs are given for each assignment
pair, an optimal assignment is one which minimizes the sum of the assignment
costs. Formally, the assignment problem can be defined as follows.
Definition 3 (The Assignment Problem). Let us assume there are two sets
A and B, together with an n × n cost matrix C = (cij)n×n of real numbers,
given, where |A| = |B| = n. The matrix elements cij ≥ 0 correspond to the cost
of assigning the i-th element of A to the j-th element of B. The assignment
problem can be stated as finding a permutation p = (p1, . . . , pn) of the integers
1, 2, . . . , n that minimizes \sum_{i=1}^{n} c_{i p_i}.
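As a baseline, this definition can be implemented literally by enumerating all n! permutations; Munkres' algorithm reaches the same optimum in O(n³). The function name and the example matrix below are our own.

```python
from itertools import permutations

def brute_force_assignment(C):
    """Return (p, cost): a permutation minimizing sum_i C[i][p[i]], by O(n!) search."""
    n = len(C)
    best_p, best_cost = None, float('inf')
    for p in permutations(range(n)):
        cost = sum(C[i][p[i]] for i in range(n))
        if cost < best_cost:
            best_p, best_cost = p, cost
    return best_p, best_cost

C = [[4, 1, 3],
     [2, 0, 5],
     [3, 2, 2]]
assert brute_force_assignment(C) == ((1, 0, 2), 5)
```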
The assignment problem can be reformulated as finding an optimal matching in
a complete bipartite graph and is therefore also referred to as bipartite graph
matching problem. Solving the assignment problem in a brute force manner by
enumerating all possible assignments and selecting the one that minimizes the
objective function leads to an exponential complexity which is unreasonable, of
course. However, there exists an algorithm which is known as Munkres’ algo-
rithm [19] that solves the bipartite matching problem in O(n3 ) time. The same
algorithm can be used to derive a suboptimal solution to the graph isomorphism
problem as described below. That is, the graph isomorphism problem is reformu-
lated as an instance of an assignment problem which is in turn solved by means
of Munkres’ algorithm in polynomial time.
Let us assume that a source graph g1 = (V1, E1, μ1, ν1) and a target graph g2 =
(V2, E2, μ2, ν2) with |V1| = |V2| = n are given. We solve the assignment problem
with A = V1 and B = V2. In our solution we define a cost matrix C = (cij)n×n
such that entry cij corresponds to the cost of assigning the i-th node of V1 to
the j-th node of V2. Formally,

cij = 0 if μ1(ui) = μ2(vj), and cij = k otherwise,

where k > 0 is an arbitrary constant. Hence, the entry cij in
C is zero if the corresponding node labels μ1(ui) and μ2(vj) are identical, and
non-zero otherwise.
We denote by P the set of all n! permutations of the integers 1, 2, . . . , n.
Given the cost matrix C = (cij)n×n, the assignment problem can be stated as
finding a permutation (p1, . . . , pn) ∈ P that minimizes \sum_{i=1}^{n} c_{i p_i}. Obviously, this
is equivalent to the minimum-cost assignment of the nodes of g1, represented by
the rows, to the nodes of g2, represented by the columns of matrix C. Hence,
Munkres' algorithm can be seen as a function m : V1 → V2 minimizing the
objective function \sum_{i=1}^{n} c_{i p_i}. Note that in general the function m is not unique,
as there may be several node mappings minimizing the objective function.

The minimum value of the objective function of Munkres' algorithm provides us
with a dissimilarity measure d(g1, g2) for input graphs g1 and g2, defined as

d(g1, g2) = min_{(p1,...,pn) ∈ P} \sum_{i=1}^{n} c_{i p_i}    (1)

Clearly, if the minimum node assignment cost d(g1 , g2 ) is greater than zero,
one can be sure that there exists no graph isomorphism between g1 and g2 . On the
other hand, if d(g1 , g2 ) is equal to zero, there exists the possibility that g1 and g2
are isomorphic to each other. Obviously, the condition d(g1 , g2 ) = 0 is necessary,
but not sufficient for the existence of a graph isomorphism as the structure of
the graph is not considered by d(g1 , g2 ). In other words, the proposed algorithm
looks at the nodes and their respective labels only and takes no information
about the edges into account. According to Def. 2 only Condition (1) is satisfied
by function m.
In order to get more stringent criteria for the decision whether or not a graph
isomorphism exists, the edge structure can be involved in the node assignment
process (Conditions (2) and (3) of Def. 2). To this end, structural information
is included in the node labels. In particular, we extend the node label μ(u) of
every node u ∈ V by the indegree and the outdegree of u. The indegree and
the outdegree of node u ∈ V denote the number of incoming and outgoing
edges of u, respectively. Furthermore, the Morgan index M is used to add fur-
ther information about the local edge structure in the node labels [22]. This
index is iteratively computed for each node u ∈ V , starting with Morgan index
values M (u, 1) equal to 1 for all nodes u ∈ V . Next, at iteration step i + 1,
M (u, i + 1) is defined as the sum of the Morgan indices of u’s adjacent nodes
of the last iteration i. Note that the Morgan index M (u, i) associated to a node
u after the i-th iteration counts the number of paths of length i starting at
u and ending somewhere in the graph [23]. Hence, the Morgan index provides us
with a numerical description of the structural neighborhood of the individual
nodes.
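The iteration just described fits in a few lines over an adjacency list; the function name and the path-graph example are our own illustration.

```python
def morgan_index(adj, iterations):
    """Iterated Morgan index: M(u, 1) = 1, and M(u, i+1) is the sum of the
    adjacent nodes' indices from iteration i. `adj` maps node -> neighbours."""
    M = {u: 1 for u in adj}
    for _ in range(iterations - 1):
        M = {u: sum(M[v] for v in adj[u]) for u in adj}
    return M

# Path graph a - b - c: the centre node accumulates a larger index first.
adj = {'a': ['b'], 'b': ['a', 'c'], 'c': ['b']}
assert morgan_index(adj, 2) == {'a': 1, 'b': 2, 'c': 1}
assert morgan_index(adj, 3) == {'a': 2, 'b': 2, 'c': 2}
```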
Given this additional information about the local structure of the nodes in a
graph, viz. the indegree, the outdegree, and the Morgan index, the cost cij of
a node mapping ui → vj is now defined with respect to the nodes’ labels and
their local structure information. That is, the entry cij is zero iff the original
labels, the indegrees and outdegrees, and the Morgan indices are identical for
both nodes ui ∈ V1 and vj ∈ V2 . Otherwise, we set cij = k, where k > 0 is an
arbitrary constant.
Considerations in the present paper are restricted to graphs with unlabeled
edges. However, if there are labels on the edges, the minimum sum of assign-
ment costs, implied by node substitution ui → vj , could be added to cij . This
minimum sum will be zero, iff all of the incoming and outgoing edges of node ui
can be mapped to identically labeled and equally directed edges incident to vj .
Otherwise, for all non-identical edge matchings implied by ui → vj , a constant

k > 0 is added to cij.² In summary, cij will be zero iff ui and vj and their
respective local neighborhoods are identical in terms of structure and labeling.
Note that Munkres’ algorithm used in its original form is optimal for solving
the assignment problem, but it provides us with a suboptimal solution for the
graph isomorphism problem only. This is due to the fact that each node assign-
ment operation is considered individually (considering the local edge structure
only), such that no implied operations on the edges can be inferred dynamically.
The result returned by Munkres’ algorithm corresponds to the minimum cost
mapping m of the nodes V1 to the nodes V2 according to matrix C. The overall
cost d(g1 , g2 ) defined in Eq. (1) builds the foundation of a two-stage decision pro-
cedure. In Fig. 1 the decision framework is illustrated. If d(g1 , g2 ) > 0, a graph
isomorphism can be definitely excluded as the nodes and their local structure of
g1 cannot be identically mapped to local structures in g2 . If d(g1 , g2 ) = 0, it is
possible that g1 and g2 are isomorphic to each other. Yet, the global edge struc-
ture might be violated given the node mapping m : V1 → V2 . Hence, the mapping
of the edges implied by the node mapping is tested (Check Structure). This test
can be easily accomplished, given the node mapping returned by Munkres’ al-
gorithm. If the edge structure is not violated by mapping m (identical), a graph
isomorphism has been found. Otherwise (non-identical), based on the current
information no definite answer can be given, as there may exist other optimal
node mappings m that would not violate the global edge structure. In such a
case, the decision for isomorphism is rejected.
The decision framework of Fig. 1 is suboptimal in the sense that a decision
(yes or no) is not guaranteed to be returned for all inputs. It is possible that
the algorithm rejects a given pair of graphs. However, if an answer yes or no is
given, it is always correct. In the remainder of the present paper, we refer to this
algorithm as Bipartite-Graph-Isomorphism, or BP-GI for short.
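The two-stage procedure can be sketched end to end. In this toy version, which is our own and only usable for very small graphs, a brute-force search over zero-cost assignments stands in for Munkres' algorithm, a fixed three-iteration Morgan index enriches the node labels, and graphs are encoded as (label dict, set of directed edges).

```python
from itertools import permutations

def enriched_label(g, u, morgan_iters=3):
    """(label, indegree, outdegree, Morgan index) for node u of g = (labels, edges)."""
    labels, edges = g
    indeg = sum(1 for (a, b) in edges if b == u)
    outdeg = sum(1 for (a, b) in edges if a == u)
    adj = {v: [] for v in labels}          # undirected neighbourhood
    for (a, b) in edges:
        adj[a].append(b)
        adj[b].append(a)
    M = {v: 1 for v in labels}
    for _ in range(morgan_iters - 1):
        M = {v: sum(M[w] for w in adj[v]) for v in labels}
    return (labels[u], indeg, outdeg, M[u])

def bp_gi(g1, g2):
    """Return 'yes', 'no' or 'reject' following the two-stage decision scheme."""
    labels1, edges1 = g1
    labels2, edges2 = g2
    V1, V2 = list(labels1), list(labels2)
    if len(V1) != len(V2) or len(edges1) != len(edges2):
        return "no"
    e1 = {u: enriched_label(g1, u) for u in V1}
    e2 = {v: enriched_label(g2, v) for v in V2}
    if sorted(e1.values()) != sorted(e2.values()):
        return "no"                       # d(g1, g2) > 0: isomorphism excluded
    for perm in permutations(V2):         # take one zero-cost mapping m
        m = dict(zip(V1, perm))
        if all(e1[u] == e2[m[u]] for u in V1):
            if all((m[a], m[b]) in edges2 for (a, b) in edges1):
                return "yes"              # edge structure identical under m
            return "reject"               # this optimal m fails; undecided
    return "no"

# Two directed 3-cycles that differ only in node naming:
c1 = ({1: 'a', 2: 'a', 3: 'a'}, {(1, 2), (2, 3), (3, 1)})
c2 = ({4: 'a', 5: 'a', 6: 'a'}, {(4, 5), (5, 6), (6, 4)})
assert bp_gi(c1, c2) == "yes"
```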

Fig. 1. Graph isomorphism decision scheme. Square boxes refer to algorithms, circles
to decisions. Black circles stand for definite decisions, while the gray circle stands for a
possible "yes" which is verified by checking the edge structures for identity. If the edge
structure is violated by mapping m, no definite answer can be given.

In order to analyze the computational complexity of the proposed algorithm,
we note that for matching two graphs with |V| = n nodes and |E| = m edges the
following four steps are necessary. First, the Morgan index is computed for each
node (O(m)). Second, the cost matrix C = (cij)n×n is built (O(n²)). Third, the
² Finding the minimum overall cost of the edge assignments can be accomplished by Munkres' algorithm as well, as this problem is also an assignment problem.

matching process by means of Munkres' algorithm is carried out (O(n³)). Finally,
the edge structure is checked (O(m)). Hence, the total complexity amounts to
O(n³).
An alternative to the proposed algorithm is to check other optimal matchings
m whenever the edge structure check fails. In the worst case, however, there
exist O(n!) optimal matchings and trying all of them leads to a computational
complexity of O(n!). In order to avoid this high complexity, one can define an
upper limit L on the number of optimal assignments to be tried by the algorithm.

4 Experimental Evaluation
The purpose of the experiments is twofold. First, we want to compare the runtime of the novel approach for graph isomorphism with the runtime of standard
algorithms for the same problem.³ To this end, Ullmann's method [2], the VF2
algorithm [3], and Nauty [5] are employed as reference systems.⁴ Second, we are
interested in how often the novel algorithm rejects a given pair of graphs.
We use two graph sets from the TC-15 graph data base [20], viz. the randomly
connected graphs (RCG), and the irregular mesh graphs (IMG). The former data
set consists of graphs where the edges connect the nodes without any structural
regularity. That is, the probability of an edge between two nodes is independent
of the actual nodes. The parameter η defines the probability of an edge between
two given nodes. Hence, given a graph g with |V | = n nodes, the expected
number of edges in g is given by η · n · (n − 1) (in our experiments we set η = 0.1).
Note that if g is not connected, additional edges are suitably inserted into graph
g until it becomes connected. The latter data set is based on structurally regular
mesh graphs in which each node (except those belonging to the border of the
mesh) is connected with its four neighborhood nodes. Irregular mesh graphs
are then obtained by the addition of uniformly distributed random edges. The
number of added edges is ρ · n, where ρ is a constant greater than 0, and n = |V |
(in our experiments we set ρ = 0.2). Note that the graphs from both data sets
are unlabeled a priori. Hence, when Munkres’ algorithm is applied the Morgan
index M (u, i), as well as the in- and outdegree of a particular node u ∈ V , are
the only labels on the nodes.
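The Morgan index used as a node label can be computed by iterated neighbour summation starting from the node degrees. A minimal sketch (our own adjacency-list representation; the undirected degree stands in here for the in- and outdegrees, and the number of iterations i is a free parameter):

```python
def morgan_index(adj, iterations=2):
    """Morgan index M(u, i) for every node u of a graph.

    adj is an adjacency list {node: set_of_neighbours}.  Iteration 0
    assigns each node its degree; iteration i sums the indices of the
    neighbours computed at iteration i - 1.
    """
    m = {u: len(nbrs) for u, nbrs in adj.items()}       # M(u, 0) = deg(u)
    for _ in range(iterations):
        m = {u: sum(m[v] for v in adj[u]) for u in adj}
    return m
```

On a path graph, for example, two iterations already separate the middle vertex from the endpoints, which is exactly the kind of local structural label the cost matrix needs.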
Graphs of various size are tested. The size of the randomly connected graphs
varies between 20 and 1000 nodes per graph (|V | = 20, 40, . . . , 100, 200, 400, 800,
1000). On the irregular mesh graphs, the size varies from 16 nodes up to 576
nodes per graph (|V | = 16, 36, 64, 81, 100, 196, 400, 576). For each graph size 100
graphs are available. Hence, 90,000 isomorphism tests (900 of them between iso-
morphic graphs) and 80,000 isomorphism tests (800 of them between isomorphic
graphs) are carried out in total on RCG and IMG, respectively.
³ Computations are carried out on an Intel Pentium 4 CPU at 3.00 GHz with 2.0 GB of RAM.
⁴ We use the original implementations available at http://amalfi.dis.unina.it/graph/db/vflib-2.0/ for Ullmann's method and VF2, and at http://cs.anu.edu.au/~bdm/nauty/ for Nauty.
Efficient Suboptimal Graph Isomorphism 131

On the first data set (RCG) the algorithm returns 89,998 correct decisions;
only in two cases is the input rejected. On the second data set (IMG) we obtain
79,996 correct decisions and four rejects.
In Fig. 2 (a) and (b) the mean computation time of one graph isomorphism
test is plotted as a function of the graph size |V |. On both data sets Ullmann’s
method turns out to be the slowest graph isomorphism algorithm. VF2 and
Nauty feature faster matching times for both graph sets than the traditional
approach of Ullmann. Similar results are reported in [24] on the same data sets.
However, it clearly turns out that our novel system based on bipartite graph
matching is faster than all reference systems for all available graph sizes.

Fig. 2. Mean computation time for graph isomorphism as a function of graph size |V |
on (a) randomly connected graphs (RCG) and (b) irregular mesh graphs (IMG)

5 Conclusions and Future Work


The present paper proposes a novel framework for a suboptimal computation of
graph isomorphism. The basic idea is that nodes, augmented by some informa-
tion about their local edge structure, are matched with each other. Hence, the
graph isomorphism problem is reduced to an assignment problem which can be
solved in polynomial time by Munkres’ algorithm. Due to the suboptimal match-
ing found by Munkres’ algorithm (the global edge structure might be violated
by the mapping found), we accept that the algorithm might refuse a decision
and reject the input. In this case, we can resort to any of the conventional
algorithms. However, on two data sets only six out of 170,000 graph pairs are
rejected, and the remaining decisions are all correct. Moreover, it clearly turned
out that our novel system is the fastest procedure for graph isomorphism among
all tested algorithms.
In future work several open issues will be investigated. For instance, there are
more TC-15 data sets available for testing our algorithm. Moreover, by implementing
the idea of an anytime algorithm as discussed in Section 3, the number of
rejections might be further reduced or completely eliminated. The algorithmic
framework presented in this paper has been implemented in Java, while the ref-
erence systems are implemented in C++ (Ullmann's algorithm and VF2) and
C (Nauty). There seems to be room for further speeding up our algorithm by
using another implementation language. Finally, extending the ideas presented
in this paper to the task of subgraph isomorphism detection is an interesting
future research problem.

Acknowledgments
This work has been supported by the Swiss National Science Foundation (Project
200021-113198/1). We would like to thank the Laboratory of Intelligent Systems
and Artificial Vision of the University of Naples for making the TC-15 data base,
Ullmann’s algorithm, and the VF2 algorithm available to us. Furthermore, we
are very grateful to Brendan McKay for making Nauty available to us.

References
1. Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty years of graph matching
in pattern recognition. Int. Journal of Pattern Recognition and Artificial Intelli-
gence 18(3), 265–298 (2004)
2. Ullmann, J.: An algorithm for subgraph isomorphism. Journal of the Association
for Computing Machinery 23(1), 31–42 (1976)
3. Cordella, L.P., Foggia, P., Sansone, C., Vento, M.: An improved algorithm for
matching large graphs. In: Proc. 3rd Int. Workshop on Graph Based Representa-
tions in Pattern Recognition (2001)
4. Larrosa, J., Valiente, G.: Constraint satisfaction algorithms for graph pattern
matching. Mathematical Structures in Computer Science 12(4), 403–422 (2002)
5. McKay, B.: Practical graph isomorphism. Congressus Numerantium 30, 45–87
(1981)
6. Emms, D., Hancock, E., Wilson, R.: A correspondence measure for graph matching
using the discrete quantum walk. In: Escolano, F., Vento, M. (eds.) GbRPR 2007.
LNCS, vol. 4538, pp. 81–91. Springer, Heidelberg (2007)
7. Messmer, B., Bunke, H.: A decision tree approach to graph and subgraph isomor-
phism detection. Pattern Recognition 32, 1979–1998 (1999)
8. Umeyama, S.: An eigendecomposition approach to weighted graph matching prob-
lems. IEEE Transactions on Pattern Analysis and Machine Intelligence 10(5), 695–
703 (1988)
9. Garey, M., Johnson, D.: Computers and Intractability: A Guide to the Theory of
NP-Completeness. Freeman and Co., New York (1979)
10. Aho, A., Hopcroft, J., Ullman, J.: The Design and Analysis of Computer Algo-
rithms. Addison Wesley, Reading (1974)
11. Hopcroft, J., Wong, J.: Linear time algorithm for isomorphism of planar graphs. In:
Proc. 6th Annual ACM Symposium on Theory of Computing, pp. 172–184 (1974)
12. Luks, E.: Isomorphism of graphs of bounded valence can be tested in polynomial
time. Journal of Computer and System Sciences 25, 42–65 (1982)
13. Jiang, X., Bunke, H.: Optimal quadratic-time isomorphism of ordered graphs. Pat-
tern Recognition 32(7), 1273–1283 (1999)
14. Dickinson, P., Bunke, H., Dadej, A., Kraetzl, M.: Matching graphs with unique
node labels. Pattern Analysis and Applications 7(3), 243–254 (2004)
15. Ebeling, C.: Gemini II: A second generation layout validation tool. In: IEEE Inter-
national Conference on Computer Aided Design, pp. 322–325 (1988)
16. Bunke, H.: Error correcting graph matching: On the influence of the underlying cost
function. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(9),
917–922 (1999)
17. Riesen, K., Bunke, H.: Approximate graph edit distance computation by means of
bipartite graph matching. In: Image and Vision Computing (2008) (accepted for
publication)
18. Eshera, M., Fu, K.: A graph distance measure for image analysis. IEEE Transac-
tions on Systems, Man, and Cybernetics 14(3), 398–408 (1984)
19. Munkres, J.: Algorithms for the assignment and transportation problems. Journal
of the Society for Industrial and Applied Mathematics 5, 32–38 (1957)
20. Foggia, P., Sansone, C., Vento, M.: A database of graphs for isomorphism and
subgraph isomorphism benchmarking. In: Proc. 3rd Int. Workshop on Graph Based
Representations in Pattern Recognition, pp. 176–187 (2001)
21. Cordella, L., Foggia, P., Sansone, C., Vento, M.: A (sub)graph isomorphism algo-
rithm for matching large graphs. IEEE Trans. on Pattern Analysis and Machine
Intelligence 26(10), 1367–1372 (2004)
22. Morgan, H.: The generation of a unique machine description for chemical
structures – a technique developed at Chemical Abstracts Service. Journal of Chemical
Documentation 5(2), 107–113 (1965)
23. Mahé, P., Ueda, N., Akutsu, T.: Graph kernels for molecular structure–activity
relationship analysis with support vector machines. Journal of Chemical Informa-
tion and Modeling 45(4), 939–951 (2005)
24. Foggia, P., Sansone, C., Vento, M.: A performance comparison of five algorithms for
graph isomorphism. In: Jolion, J., Kropatsch, W., Vento, M. (eds.) Proc. 3rd Int.
Workshop on Graph Based Representations in Pattern Recognition, pp. 188–199
(2001)
Homeomorphic Alignment
of Edge-Weighted Trees

Benjamin Raynal, Michel Couprie, and Venceslas Biri

Université Paris-Est
Laboratoire d’Informatique Gaspard Monge, Equipe A3SI
UMR 8049 UPEMLV/ESIEE/CNRS

Abstract. Motion capture, a currently active research area, needs esti-
mation of the pose of the subject. For this purpose, we match the tree
representation of the skeleton of the 3D shape to a pre-specified tree
model. Unfortunately, the tree representation can contain vertices that
split limbs into multiple parts, which does not allow a good match by
usual methods. To solve this problem, we propose a new alignment,
taking into account the homeomorphism between trees, rather than the
isomorphism, as in prior works. Then, we develop several computationally
efficient algorithms for reaching real-time motion capture.

Keywords: Graphs, homeomorphism, alignment, matching algorithm.

1 Introduction
Motion capture without markers is a highly active research area, and is used
in several applications which do not have the same needs: 3D model animation,
for movie effects or video games for example, requires a highly accurate model,
but does not need real-time computation (offline video processing is acceptable).
Real-time interaction, for virtual reality applications, requires fast computa-
tion, at the price of a lower accuracy. This paper is placed in the context of
real-time interaction.
The first step (called the initialization step) consists of finding the initial pose
of the subject, represented here by a 3D shape (visual hull) constructed using a
multi-view system with a Shape From Silhouette algorithm [1].
An important part of the algorithms for 3D pose estimation use a manually
initialized model, or ask the subject to move successively the different parts of
his/her body, but several automatic approaches have been developed, using
an a priori model. This a priori model can approximate different characteristics
of the subject: kinematic structure, shape or appearance. This kind of a priori
model has several constraints: it is complex, because characteristics of different
natures are involved, and it needs to be adapted to each subject (especially in
the case of appearance).

1.1 Motivation
Our goal is to automate the initial pose estimation step. To achieve this aim,
we use a very simple a priori model.

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 134–143, 2009.

© Springer-Verlag Berlin Heidelberg 2009
[Figure: four panels (3D SHAPE, SKELETON, DATA TREE, PATTERN TREE) showing the edge-weighted data tree extracted from the skeleton, and the pattern tree with vertices H, T, C, A1, A2, F1, F2]
Fig. 1. Example of data tree acquisition and expected alignment with model tree

The model is an unrooted weighted tree (called the pattern tree), where vertices
represent the different parts of the shape, and each edge represents the link between
these parts, associated with a weight representing the distance between the two parts.
Concerning the data, we extract the curve skeleton of the visual hull, and
compute the associated weighted unrooted tree (called the data tree), by con-
sidering each multiple point and ending point, and linking them when they are
directly connected, the weight of an edge being the geodesic distance between
its endpoints (see figure 1).
After this step, the main difficulty is to match the pattern tree to the data
tree, with a good preservation of both topology and distances.
Many similar approaches using the skeleton of a shape have been developed,
in the motion capture research area [3,4,5] and in the 3D shape matching research
area [6,8]. In the first case, the best time obtained for finding the initial pose is
about one second [4], which is too slow, even for interactive-time interaction. In
the second case, the algorithms used give an approximated solution [8], or need
an accurate knowledge of the radius distance of the skeleton, in order to compute
the associated shock graph [9].
As shown on Fig. 1, several kinds of noise and deformities can appear in
the data tree: spurious branches (edges {g, h}, {l, m}, {i, j}, {j, k}), useless 2-
degree vertices obtained after spurious branch deletion (in our example, ver-
tices j, k, m), and split vertices (vertex T of the pattern tree matches with vertices
b and e in the data tree).
Approaches found in the literature do not permit achieving a robust match-
ing with respect to these perturbations, mainly because they are defined to reach
an isomorphism between the trees, instead of a homeomorphism. In the follow-
ing, after adapting basic notions, we introduce both a new alignment, called
homeomorphic alignment, and a robust tree-matching algorithm which may be
used for real-time pose estimation.

2 Basic Notions
An undirected graph is a pair (V, E), where V is a finite set of vertices, and E
a subset of {{x, y}, x ∈ V, y ∈ V, x ≠ y} (whose elements are called edges). The
degree of v ∈ V is denoted by deg(v). A tree is a connected graph with no cycles.
A simple path from x to y in a tree is unique and is denoted by π(x, y). A forest
is a graph with no cycles, each of its connected components being a tree.
A directed graph is a pair (V, A), where V is a finite set, and A a subset of V × V
(called the arc set). The undirected graph associated to a directed graph G = (V, A)
is the undirected graph G′ = (V, E) such that {x, y} ∈ E if and only if (x, y) ∈ A
or (y, x) ∈ A. A vertex r ∈ V is a root of G if for all x ∈ V \ {r}, a path from r
to x in G exists. G is antisymmetric if for all (x, y) ∈ A, (y, x) ∉ A. The graph G
is a rooted tree (with root r) if r is a root of G, G is antisymmetric, and the
undirected graph associated to G is a tree. A graph where each of its connected
components is a rooted tree is called a rooted forest. The parent of x ∈ V is
denoted by par(x), the set of the ancestors of x by anc(x), and the set of all
children of x by C(x).
Unless otherwise indicated, all the other definitions and notations in this ar-
ticle are similar for the two kinds of graphs. We will give them for directed
graphs; the versions for undirected graphs can be obtained by replacing arcs by
edges. Two graphs G = (VG, AG) and G′ = (VG′, AG′) are said to be isomorphic
if there exists a bijection f : VG → VG′ such that for any pair (x, y) ∈ VG × VG,
(x, y) ∈ AG if and only if (f(x), f(y)) ∈ AG′. A weighted graph is a triplet
(V, A, ω), where V is a finite set, A a subset of V × V, and ω a mapping from A
to R. In a weighted tree, the weight of the unique path from x to y, denoted by
ω(x, y), is the sum of the weights of all arcs traversed along the path.
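The path weight ω(x, y) can be evaluated by climbing from both vertices to their lowest common ancestor. A sketch for rooted weighted trees (the parent/weight dictionaries are our own representation, not notation from the paper):

```python
def path_weight(parent, weight, x, y):
    """Weight ω(x, y) of the unique path between x and y in a rooted
    weighted tree.  parent maps each non-root vertex to its parent;
    weight maps each arc (par(v), v) to its weight.
    """
    def to_root(v):                       # vertex -> chain of ancestors up to root
        chain = [v]
        while v in parent:
            v = parent[v]
            chain.append(v)
        return chain
    px, py = to_root(x), to_root(y)
    common = next(v for v in px if v in set(py))   # lowest common ancestor
    cost = 0
    for chain in (px, py):
        for v in chain:
            if v == common:
                break
            cost += weight[(parent[v], v)]         # arc entering v
    return cost
```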

3 Measure of Similarity
For a weighted graph G = (V, A, ω), commonly used edit operations are:
resize: change the weight of an arc a = (u, v) ∈ A.
delete: delete an arc a = (u, v) ∈ A and merge u and v into one vertex.
insert: split a vertex into two vertices, and link them by a new arc.
The cost of these edit operations is given by a cost function γ(w, w′), where
w (respectively w′) is the total weight of the arcs involved in the operation
before (respectively after) its application. We assume that γ is a metric. Typically,
γ(w, w′) = |w − w′| or (w − w′)².
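As a small illustration of the delete operation under the absolute-difference metric, the following sketch contracts an arc and reports its cost γ(ω(a), 0) (the dictionary representation of the arc set is our own choice):

```python
def gamma(w, w2):
    """Cost of editing total weight w into w2 (here: the metric |w - w2|)."""
    return abs(w - w2)

def delete_arc(arcs, a):
    """Delete arc a = (u, v): merge v into u and redirect v's arcs.

    arcs is a dict {(u, v): weight}.  Returns the new arc dict and the
    cost of the operation, gamma(weight_of_a, 0).
    """
    u, v = a
    w = arcs[a]
    out = {}
    for (s, t), wt in arcs.items():
        if (s, t) == a:
            continue
        s = u if s == v else s            # arcs leaving v now leave u
        t = u if t == v else t            # arcs entering v now enter u
        out[(s, t)] = wt
    return out, gamma(w, 0)
```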
Various edit-based distances have been defined, using different constraints on
sequence order and different definitions of operations. These edit-based distances
can be classified, as proposed by Wang et al. [10]: edit distance [11], alignment
distance [12,13], isolated-subtrees distance [14], and top-down distance [15]. Pro-
posed edit distances, isolated-subtrees distances and top-down distances cannot
always match the whole model tree, but only subparts, most often unconnected.
However, we will see in the next subsection that this is not the case for the
alignment distance.

3.1 Alignment Distance


In [12], Jiang et al. propose a similarity measure between vertex-labeled trees,
which we transpose here to edge-weighted graphs.
Let G1 = (V1, A1, ω1) and G2 = (V2, A2, ω2) be two weighted graphs. Let
G′1 = (V′1, A′1, ω′1) and G′2 = (V′2, A′2, ω′2) be weighted graphs obtained by
inserting arcs weighted by 0 in G1 and G2, such that there exists an isomorphism
I between G′1 and G′2. The set of all couples of arcs A = {(a1, a2); a1 ∈ A′1,
a2 ∈ A′2, a2 = I(a1)} is called an alignment of G1 and G2. The cost CA of A is
given by

    CA = Σ_{(a1, a2) ∈ A} γ(ω′1(a1), ω′2(a2)) .    (1)

The minimal cost of all alignments of G1 and G2, called the alignment dis-
tance, is denoted by α(G1, G2). The alignment distance is interesting in our
case for three reasons: it preserves topological relations between trees, it can be
computed in polynomial time, and it enables "removing edges" regardless of
the rest of the graph, solving the problem of split vertices.

3.2 Cut Operation

For the purpose of removing spurious branches without any cost, we propose to
integrate the cut operation into our alignment.
In [16], Wang et al. propose a new operation allowing to consider only a part of
a tree. Let G = (V, A, ω) be a weighted tree. Cutting G at an arc a ∈ A means
removing a, thus dividing G into two subtrees G1 and G2. The cut operation
consists of cutting G at an arc a ∈ A, then considering only one of the two
subtrees. Let K be a subset of A. We use Cut(G, K, v) to denote the subtree of
G containing v and resulting from cutting G at all arcs in K. In the case of a
rooted tree, we consider that the root rG of G cannot be removed by the cut
operation.
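The Cut operation can be sketched as a traversal that simply ignores the removed arcs (our own representation; the sketch returns the vertex set of the subtree containing v, and omits edge weights):

```python
def cut(adj, K, v):
    """Cut(G, K, v): remove the edges in K from the tree G, and return
    the set of vertices of the component that contains v.

    adj: {vertex: set_of_neighbours};  K: iterable of (x, y) edges.
    """
    removed = {frozenset(e) for e in K}
    component, stack = {v}, [v]
    while stack:                          # depth-first search from v
        x = stack.pop()
        for y in adj[x]:
            if frozenset((x, y)) in removed or y in component:
                continue                  # cut edge or already visited
            component.add(y)
            stack.append(y)
    return component
```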
At this step, we can combine the methods described above [12,16] as follows:
given P = (VP, AP, ωP) (the pattern tree) and D = (VD, AD, ωD) (the data
tree), we define αcut(P, D) = min_{K ⊆ AD, v ∈ VD} {α(P, Cut(D, K, v))}, which is the
minimal alignment distance between P and a subgraph of D. The introduction
of αcut(P, D) solves the problems of split vertices and spurious branches, but
not the problem of useless 2-degree vertices. In the example of Fig. 1, the vertex
F1 of the pattern tree will match with the vertex h of the data tree, instead of
the vertex o, because after cutting {g, h} and {l, m}, the edges {f, h}, {h, m}
and {m, o} cannot be merged into only one edge, and thus cannot be matched
with {C, F1}.

3.3 Homeomorphic Alignment Distance

For the purpose of solving the useless-vertex problem, we propose a new align-
ment, which removes 2-degree vertices and searches for a minimal sequence of
operations to reach a homeomorphism, instead of an isomorphism, between the
trees.

Homeomorphism. The merging is an operation that can be applied only on
arcs sharing a 2-degree vertex. The merging of two arcs (u, v) and (v, w) in a
weighted graph G = (V, A, ω) consists of removing v from V and replacing (u, v)
and (v, w) by (u, w) in A, weighted by ω((u, w)) = ω((u, v)) + ω((v, w)).
Two weighted graphs G = (VG, AG, ωG) and G′ = (VG′, AG′, ωG′) are home-
omorphic if there exists an isomorphism between a graph obtained by mergings
on G and a graph obtained by mergings on G′.

Merging Kernel. Considering that a merging on a vertex v of the graph
G = (V, A, ω) does not affect the degree of any vertex in V \ {v} (by defini-
tion of the merging operation), and therefore the possibility of merging these
vertices, the number of possible mergings decreases by one after each merging.
In consequence, the maximal size of a sequence of merging operations transform-
ing G into another graph G′ = (V′, A′, ω′) is equal to the initial number of possible
mergings in G. It can be remarked that any sequence of merging operations of
maximal size yields the same result. The graph resulting from such a sequence on
G is called the merging kernel of G, and is denoted by MK(G). The following
proposition is straightforward:
Proposition 1. Two graphs G1 = (V1, A1, ω1) and G2 = (V2, A2, ω2) are home-
omorphic iff MK(G1) and MK(G2) are isomorphic.
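Proposition 1 suggests a direct way to decide homeomorphism: compute both merging kernels and test isomorphism. A sketch of MK(G) for undirected weighted trees (the adjacency-map representation is ours; because the input is a tree, a merging can never create a parallel edge):

```python
def merging_kernel(adj):
    """Merging kernel MK(G): repeatedly merge the two edges sharing a
    2-degree vertex, summing their weights, until no such vertex is left.

    adj: {v: {neighbour: weight}} for an undirected weighted tree.
    """
    adj = {v: dict(nb) for v, nb in adj.items()}      # work on a copy
    changed = True
    while changed:
        changed = False
        for v in list(adj):
            if v in adj and len(adj[v]) == 2:
                (a, wa), (b, wb) = adj[v].items()
                del adj[v]
                del adj[a][v]
                del adj[b][v]
                adj[a][b] = adj[b][a] = wa + wb       # merged edge weight
                changed = True
    return adj
```

Two trees are then homeomorphic exactly when the kernels returned here are isomorphic as weighted graphs.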

Homeomorphic Alignment Distance. Let G1 = (V1, A1, ω1) and G2 =
(V2, A2, ω2) be two weighted graphs. Let G′1 = (V′1, A′1, ω′1) and G′2 =
(V′2, A′2, ω′2) be weighted graphs obtained by deleting arcs in G1 and G2, such
that there exists a homeomorphism between G′1 and G′2 (not necessarily unique).
Let G″1 = (V″1, A″1, ω″1) and G″2 = (V″2, A″2, ω″2) be the merging kernels of
G′1 and G′2, respectively. From Proposition 1, there exists an isomorphism I
between G″1 and G″2. The set of all couples of arcs H = {(a, a′); a ∈ A″1,
a′ ∈ A″2, a′ = I(a)} is called a homeomorphic alignment of G1 with G2.
The cost CH of H is defined as

    CH = Σ_{(a, a′) ∈ H} γ(ω″1(a), ω″2(a′)) + Σ_{ad ∈ A1 \ A′1} γ(ω1(ad), 0) + Σ_{ad ∈ A2 \ A′2} γ(0, ω2(ad)) .    (2)

The minimal cost of all homeomorphic alignments between G1 and G2, called
the homeomorphic alignment distance, is denoted by η(G1, G2).
Our main problem can be stated as follows: given a weighted tree P =
(VP, AP, ωP) (the pattern tree) and a weighted tree D = (VD, AD, ωD) (the
data tree), find ηcut(P, D) = min_{K ⊆ AD, v ∈ VD} {η(P, Cut(D, K, v))} (in the case of
rooted trees, ηcut(P, D) = min_{K ⊆ AD} {η(P, Cut(D, K, rD))}), and the associated
homeomorphic alignment.

4 Algorithms
4.1 Algorithm for Rooted Trees
Let T = (V, A, ω) be a weighted tree rooted in rT . For each vertex v ∈ V \ {rT },
we denote by ↑ v the arc (w, v) ∈ A, w being the parent of v. We denote by
[Figure: a rooted tree T and the derived trees T(b, a), T(e, a), and the forest F(T, b)]
Fig. 2. Examples for a rooted tree T

T(v), v ∈ V, the subtree of T rooted in v. We denote by Π(a, b) the set of all
vertices of the path π(a, b). Let va be an ancestor of v; we denote by Tcut(v, va)
the subgraph of T defined as follows:

    Tcut(v, va) = Cut(T(va), {↑p′ ; p′ ∈ C(p) \ Π(va, v), p ∈ Π(va, par(v))}) .    (3)

We denote by T(v, va) the tree obtained from Tcut(v, va) by merging on each
vertex n ∈ Π(va, v) \ {va, v}. We denote by F(T, v) the rooted forest whose
connected components are the trees T(p, v), for all p ∈ C(v). By abuse of
notation, we also denote by F(T, v) the set of all connected components of this
forest (that is, a set of trees).
Proofs of the following propositions can be found in [17].

Proposition 2. Let P = (VP, EP, ωP) and D = (VD, ED, ωD) be two weighted
trees, rooted respectively in rP and rD. Then

    ηcut(P, D) = ηcut(F(P, rP), F(D, rD)) .    (4)

Proposition 3. Let i ∈ VP \ {p}, j ∈ VD \ {d}, ia ∈ anc(i), ja ∈ anc(j). Then

    ηcut(∅, ∅) = 0
    ηcut(P(i, ia), ∅) = ηcut(F(P, i), ∅) + γ(ω(ia, i), 0)
    ηcut(F(P, ia), ∅) = Σ_{i′ ∈ C(ia)} ηcut(P(i′, ia), ∅)    (5)
    ηcut(∅, D(j, ja)) = 0
    ηcut(∅, F(D, ja)) = 0 .

Proposition 4. Let i ∈ VP \ {p}, j ∈ VD \ {d}, ia ∈ anc(i), ja ∈ anc(j). Then

    ηcut(P(i, ia), D(j, ja)) = min {
        ηcut(F(P, i), ∅) + γ(ω(ia, i), 0) ,
        γ(ω(ia, i), ω(ja, j)) + ηcut(F(P, i), F(D, j)) ,
        min_{jc ∈ C(j)} {ηcut(P(i, ia), D(jc, ja))} ,
        min_{ic ∈ C(i)} {ηcut(P(ic, ia), D(j, ja)) + Σ_{i′c ∈ C(i) \ {ic}} ηcut(P(i′c, i), ∅)}
    } .    (6)
Proposition 5. ∀A ⊆ F(P, i), B ⊆ F(D, j),

    ηcut(A, B) = min {
        min_{D(j′, j) ∈ B} {ηcut(A, B \ {D(j′, j)})} ,
        min_{P(i′, i) ∈ A} {ηcut(A \ {P(i′, i)}, B) + ηcut(P(i′, i), ∅)} ,
        min_{P(i′, i) ∈ A, D(j′, j) ∈ B} {ηcut(A \ {P(i′, i)}, B \ {D(j′, j)}) + ηcut(P(i′, i), D(j′, j))} ,
        min_{P(i′, i) ∈ A, B′ ⊆ B} {ηcut(A \ {P(i′, i)}, B \ B′) + ηcut(F(P, i′), B′) + γ(Ω(i′), 0)} ,
        min_{A′ ⊆ A, D(j′, j) ∈ B} {ηcut(A \ A′, B \ {D(j′, j)}) + ηcut(A′, F(D, j′)) + γ(0, Ω(j′))}
    } .    (7)

Algorithm 1. Homeomorphic Alignment Distance for Rooted Trees

Data: pattern rooted tree P, data rooted tree D
Result: ηcut (P, D) = ηcut (F (P, rP ), F (D, rD )); // Prop.2
begin
foreach p ∈ VP , in suffix order do
foreach A ⊆ F(P, p) do Compute ηcut (A, ∅); // Prop.3
foreach pa ∈ anc(p) \ {p} do Compute ηcut (P (p, pa ), ∅)
foreach d ∈ VD , in suffix order do
foreach B ⊆ F(D, d) do Compute ηcut (∅, B); // Prop.3
foreach da ∈ anc(d) \ {d} do Compute ηcut (∅, D(d, da ))
foreach p ∈ VP , d ∈ VD , both in suffix order do
foreach A ⊆ F(P, p), B ⊆ F(D, d) do
Compute ηcut (A, B); // Prop.5
foreach pa ∈ anc(p) \ {p}, da ∈ anc(d) \ {d} do
Compute ηcut (P (p, pa ), D(d, da )); // Prop.4

end

The total computation time complexity is in O(|VP| · |VD| · (2^dP · 2^dD · (dD ·
2^dP + dP · 2^dD) + hP · hD · (dP + dD))), where dG and hG denote, respec-
tively, the maximal degree of a vertex in G and the height of G. If the maximal
degree is bounded, the total computation time complexity is in O(|VP| · |VD| ·
hP · hD).

4.2 Algorithm for Unrooted Trees

Let G = (V, E, ω) be a weighted tree and let r ∈ V. We denote by G^r the directed
weighted tree rooted in r, such that G is the undirected graph associated to G^r.

Proposition 6. Let P = (VP, EP, ωP) and D = (VD, ED, ωD) be two weighted
trees. Then

    ηcut(P, D) = min_{i ∈ VP, j ∈ VD} {ηcut(P^i, D^j)} .    (8)
By choosing an adapted order of navigation in the trees, avoiding redundant
computation of subtree alignments, we can use the same algorithm as for
rooted trees. The total computation time of this algorithm is in O(|VP| · |VD| ·
(dP · 2^(dP + 2dD) + dD · 2^(dD + 2dP) + |VP| · |VD| · (dP + dD))). If the
maximal degree is bounded, the total computation is in O(|VP|² · |VD|²) time
complexity.

5 Experimentation

5.1 Usage of Homeomorphic Alignment

In the case of motion capture, we can use the homeomorphic alignment in three
different ways:

– between the two unrooted trees, if we have no a priori knowledge;
– between two rooted trees, obtained from the unrooted trees by rooting them
at vertices we want to match together;
– between a rooted tree and an unrooted tree, if we want to be sure that the
root is conserved by the homeomorphic alignment.

5.2 Results

Our model tree contains seven vertices, representing the head, torso, crotch, the
two hands and the two feet. Experimentally, the data tree obtained from the
skeleton of the visual hull has a degree bounded by 4, and its number of vertices
is between seven and twenty, with a Gaussian probability distribution centred on
ten. All the results have been obtained on a computer with a Xeon 3 GHz
processor and 1 GB of RAM.
For finding the average computation time of our algorithm, we have ran-
domly generated 32 pattern trees, and for each pattern tree, we have generated
32 data trees, which yields 1024 pairs of trees. Each pattern tree has seven
vertices, one of which has a degree equal to 4. Each data tree has at least
one 4-degree vertex. The results of the four kinds of alignments are shown in
Fig. 3.
In the average case (|VD| ≤ 12), the homeomorphic alignment between a
rooted pattern tree and an unrooted data tree (we assume that the torso is always
aligned) can easily be computed in real time (frequency above 24 Hz), and
in the worst case (|VD| ≈ 20), we keep an interactive time (frequency above
12 Hz). For tracking, if we can use the homeomorphic alignment between two
rooted trees, we are widely above 50 Hz.
For finding the average precision of our algorithm, we have generated data trees
from pattern trees by adding new vertices in three ways: splitting an existing
vertex in two, adding a new 1-degree vertex, and adding a new 2-degree vertex.
Then, we modify the weight of each edge in a proportional range. The results
are shown in Fig. 3.
[Figure: top left: frequency (Hz) of the homeomorphic alignment for |Vp| = 7 as a function of |Vd|, for HA(rooted P, rooted D), HA(rooted P, D), HA(P, rooted D) and HA(P, D); top right: percentage of good matching for |Vp| = 20 as a function of the percentage of noise vertices, for 0%, 10% and 50% of weight variation]
Fig. 3. Top: frequencies of the different homeomorphic alignments for variable sizes
of the data tree, and precision for several kinds of noise. Bottom: examples of results
obtained on 3D shapes; black circles represent the points matched with the pattern tree.

6 Conclusion

In this paper, we have introduced a new type of alignment between weighted
trees, the homeomorphic alignment, taking into account the topology and avoid-
ing the noise induced by spurious branches, split vertices and useless 2-degree
vertices. This alignment has the particularity of using graph transformations to
reach a homeomorphism between trees, instead of an isomorphism, as usually
proposed in the literature.
We have also developed several robust algorithms to compute it with a good
complexity, which enables its application in real time for motion capture purposes.
In future work, we will take into account more useful information on the
model, such as spatial coordinates of data vertices, and include it in our
algorithm, for better robustness. Finally, using this alignment, we will propose
a new fast method of pose initialization for motion capture applications.

References
1. Laurentini, A.: The Visual Hull Concept for Silhouette-based Image Understand-
ing. IEEE Transactions on Pattern Analysis and Machine Intelligence 16(2), 150–
162 (1994)
2. Moeslund, T.B., Hilton, A., Krüger, V.: A Survey of Advances in Vision-based
Human Motion Capture and Analysis. Computer Vision and Image Understand-
ing 104(2-3), 90–126 (2006)
3. Chu, C., Jenkins, O., Mataric, M.: Markerless Kinematic Model and Motion Cap-
ture from Volume Sequences. In: IEEE Computer Society Conference on Computer
Vision and Pattern Recognition, vol. 2. IEEE Computer Society, Los Alamitos
(2003)
4. Menier, C., Boyer, E., Raffin, B.: 3d Skeleton-based Body Pose Recovery. In: Pro-
ceedings of the 3rd International Symposium on 3D Data Processing, Visualization
and Transmission, Chapel Hill, USA (2006)
5. Brostow, G., Essa, I., Steedly, D., Kwatra, V.: Novel Skeletal Representation for
Articulated Creatures. LNCS, pp. 66–78. Springer, Heidelberg (2006)
6. Sundar, H., Silver, D., Gagvani, N., Dickinson, S.: Skeleton Based Shape Matching
and Retrieval. In: SMI, pp. 130–139 (2003)
7. Baran, I., Popović, J.: Automatic rigging and animation of 3D characters. In:
International Conference on Computer Graphics and Interactive Techniques. ACM
Press, New York (2007)
8. Cornea, N., Demirci, M., Silver, D., Shokoufandeh, A., Dickinson, S., Kantor, P.:
3D Object Retrieval using Many-to-many Matching of Curve Skeletons. In: Shape
Modeling and Applications (2005)
9. Siddiqi, K., Shokoufandeh, A., Dickinson, S., Zucker, S.: Shock Graphs and Shape
Matching. International Journal of Computer Vision 35(1), 13–32 (1999)
10. Wang, J., Zhang, K.: Finding similar consensus between trees: an algorithm and a
distance hierarchy. Pattern Recognition 34(1), 127–137 (2001)
11. Tai, K.: The Tree-to-Tree Correction Problem. Journal of the ACM 26(3), 422–433
(1979)
12. Jiang, T., Wang, L., Zhang, K.: Alignment of Trees - an Alternative to Tree Edit. In:
CPM 1994: Proceedings of the 5th Annual Symposium on Combinatorial Pattern
Matching, London, UK, pp. 75–86. Springer, Heidelberg (1994)
13. Jansson, J., Lingas, A.: A Fast Algorithm for Optimal Alignment between Similar
Ordered Trees. In: Amir, A., Landau, G.M. (eds.) CPM 2001. LNCS, vol. 2089,
pp. 232–240. Springer, Heidelberg (2001)
14. Tanaka, E., Tanaka, K.: The Tree-to-tree Editing Problem. International Journal
of Pattern Recognition and Artificial Intelligence 2(2), 221–240 (1988)
15. Selkow, S.: The Tree-to-Tree Editing Problem. Information Processing Letters 6(6),
184–186 (1977)
16. Wang, J., Zhang, K., Chang, G., Shasha, D.: Finding Approximate Patterns in
Undirected Acyclic Graphs. Pattern Recognition 35(2), 473–483 (2002)
17. Raynal, B., Biri, V., Couprie, M.: Homeomorphic Alignment of Weighted Trees.
Internal report IGM 2009-01. LIGM, Université Paris-Est (2009)
Inexact Matching of Large and Sparse Graphs Using
Laplacian Eigenvectors

David Knossow, Avinash Sharma, Diana Mateus, and Radu Horaud

Perception team, INRIA Grenoble Rhône-Alpes
655 avenue de l’Europe, Montbonnot, 38334 Saint Ismier Cedex, France
firstname.lastname@inrialpes.fr

Abstract. In this paper we propose an inexact spectral matching algorithm that
embeds large graphs on a low-dimensional isometric space spanned by a set of
eigenvectors of the graph Laplacian. Given two sets of eigenvectors that cor-
respond to the smallest non-null eigenvalues of the Laplacian matrices of two
graphs, we project each graph onto its eigenvectors. We estimate the histograms
of these one-dimensional graph projections (eigenvector histograms) and we show
that these histograms are well suited for selecting a subset of significant eigenvec-
tors, for ordering them, for solving the sign-ambiguity of eigenvector computa-
tion, and for aligning two embeddings. This results in an inexact graph matching
solution that can be improved using a rigid point registration algorithm. We apply
the proposed methodology to match surfaces represented by meshes.

1 Introduction

Many problems in computer vision, shape recognition, document and text analysis can
be formulated as a graph matching problem. The nodes of a graph correspond to local
features or, more generally, to objects and the edges of the graph correspond to rela-
tionships between these objects. A solution to graph matching consists in finding an iso-
morphism (exact matching) or an optimal sub-graph isomorphism (inexact matching)
between the two graphs. Spectral graph matching methods are attractive because they
provide a framework that makes it possible to embed graphs into isometric spaces and
hence to replace the initial NP-hard isomorphism problem with a more tractable point registration
problem.
An undirected weighted graph with N nodes can be represented by an N × N real
symmetric matrix, namely the adjacency matrix of the graph. Provided that this matrix
has N distinct eigenvalues, the graph can be embedded in the orthonormal basis formed
by the corresponding eigenvectors. Hence, an N-node graph becomes a set of N points
in an N-dimensional isometric space. In [1] it is proved that the eigendecomposition of
the adjacency matrices provides an optimal solution for exact graph matching, i.e., matching
graphs with the same number of nodes. The affinity matrix of a shape described by a
set of points can be used as the adjacency matrix of a fully connected weighted graph
[2,3,4,5]. Although these methods can only match shapes with the same number of
points, they introduce the heat kernel to describe the weights between points (nodes),
which has a good theoretical justification [6].

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 144–153, 2009.
© Springer-Verlag Berlin Heidelberg 2009

Unfortunately, exact graph matching is not very practical, in particular when the
two graphs have a different number of nodes, e.g., graphs constructed from real data
such as visual sensor output. Therefore one needs to combine spectral analysis with
dimensionality reduction such that the two graphs being matched are embedded in a
common sub-eigenspace of lower dimension than the original graphs. This immediately calls for
methods that allow many-to-many point correspondences. A clever idea is to combine
matching with agglomerative hierarchical clustering, as done in [7]. We also note that
spectral matching has strong links with spectral clustering [8] which uses the Laplacian
matrix of a graph [9].
The analysis of the spectral methods cited above relies on the eigenvalues of the adja-
cency or Laplacian matrices. The strict ordering of the eigenvalues allows the alignment
of the two eigenbases, while the existence of an eigengap allows the selection of a low-
dimensional eigenspace. In the case of inexact matching of large and sparse graphs,
a number of issues remain open for the following reasons. The eigenvalues cannot be
reliably ordered and one needs to use heuristics such as the ones proposed in [3,10].
The graph matrices may have eigenvalues with geometric multiplicities and hence the
corresponding eigenspaces are not uniquely defined. Dimensionality reduction relies
on the existence of an eigengap. In the case of large graphs, e.g., ten thousands nodes,
the eigengap analysis yields an eigen space whose dimension is not well suited for
the task at hand. The sign ambiguity of eigenvectors is generally handled using sim-
ple heuristics [1,7]. The link between spectral matching and spectral clustering has not
yet been thoroughly investigated. Existing spectral matching algorithms put the eigen-
vectors on an equal footing; the particular role played in clustering by the Fiedler
vector [9] has not been studied in the context of matching. We remark that the selec-
tion of strongly localized eigenvectors, which we define as the vector (eigenfunction)

Fig. 1. Two graphs (meshes) of a hip-hop dancer (top-left) with respectively 31,600 and 34,916
nodes (vertices). The matching (top-right), subsampled for visualization purposes, was obtained
by computing the embeddings of the two graphs into a 5-dimensional space (bottom-left) and by
registering the two embeddings (bottom-right).

that spans over a small subset of the graph while being zero elsewhere, hence corre-
sponding to subgraph clusters, has not been studied in depth. The only existing strategy
for selecting such eigenvectors is based on eigenvalue ordering and the detection of an
eigengap.
In this paper we propose an inexact spectral matching algorithm that embeds large
graphs on an isometric space spanned by a subset of eigenvectors corresponding to the
smallest eigenvalues of the Laplacian matrix. We claim that the tasks of (i) selecting a
subset of eigenvectors, (ii) ordering them, (iii) finding a solution to the sign-ambiguity
problem, as well as (iv) aligning two embeddings, can be carried out by computing
and comparing the histograms of the projections of the graphs’ nodes onto the eigen-
vectors. We postulate that the statistics of these histograms convey important geometric
properties of the Laplacian eigenvectors [11]. In practice, we apply the proposed algo-
rithm to match graphs corresponding to discrete surface representations of articulated
shapes, i.e., mesh registration. Figure 1 shows a graph matching result obtained with
our algorithm.

2 Laplacian Embedding

We consider undirected weighted graphs. A graph G = {V, E} has a node set V =
{V1, . . . , VN} and an edge set E = {Eij}. We use a Gaussian kernel to define the
N × N weight and degree matrices:

    Wij = exp(−dij²/σ²)                                  (1)

    Dii = Σj=1..N Wij                                    (2)

where dij is the geodesic distance between two nodes and σ is a smoothing parameter.
In the case of meshes, a vertex is connected to its neighbors on the surface. In practice
we take the Euclidean distance between two connected vertices, and each vertex has
approximately six neighbors, which yields a very sparse graph. The Laplacian of a
graph is defined as L = D − W. We consider the normalized graph Laplacian: L =
D^(−1/2)(D − W)D^(−1/2). This is a positive semi-definite symmetric matrix with eigen-
values 0 = λ0 ≤ λ1 ≤ . . . ≤ λN−1. The null space of this matrix is spanned by the
constant eigenvector U0 = (1 . . . 1)ᵀ. The eigenvectors of L form an orthonormal basis,
U0ᵀ Ui = 0, ∀i ∈ {1, . . . , N − 1}. Therefore we obtain the following property:
Σk Uik = 0. It is worthwhile to notice that L = I − W̃, where W̃ = D^(−1/2) W D^(−1/2)
is the normalized adjacency matrix; matrices L and W̃ share the same eigenvectors.
Finally, let Lt = UΛUᵀ be the truncated eigendecomposition, where the null eigenvalue
and constant eigenvector were omitted.
Graph projections onto the eigenvectors corresponding to the smallest non-null eigen-
values of the Laplacian are displayed on the left side of figure 2. The right side of this
figure shows the densities of these projections, regarded as continuous equivalents
of histograms. These densities were estimated using the MCLUST software¹ [12].

¹ http://www.stat.washington.edu/mclust/
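The embedding just described can be sketched in a few lines. The following is an illustrative implementation (ours, not the authors'), assuming SciPy; the function name, the input format (an edge list with precomputed distances), and σ are our own choices:

```python
# Illustrative sketch of the Laplacian embedding: build W and D from
# Eqs. (1)-(2), form the normalized Laplacian, and keep the eigenvectors
# of the k smallest non-null eigenvalues.
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import eigsh

def laplacian_embedding(edges, dists, n_nodes, k, sigma=1.0):
    i, j = np.array(edges).T
    w = np.exp(-np.asarray(dists, dtype=float) ** 2 / sigma ** 2)  # Eq. (1)
    W = sp.coo_matrix((w, (i, j)), shape=(n_nodes, n_nodes))
    W = W + W.T                                    # undirected graph
    d = np.asarray(W.sum(axis=1)).ravel()          # Eq. (2)
    D_isqrt = sp.diags(1.0 / np.sqrt(d))
    L = sp.identity(n_nodes) - D_isqrt @ W @ D_isqrt   # normalized Laplacian
    # k+1 smallest eigenpairs; drop the null eigenvalue / constant eigenvector
    vals, vecs = eigsh(L, k=k + 1, which="SM")
    order = np.argsort(vals)
    return vals[order][1:], vecs[:, order][:, 1:]
```

Because the graph is very sparse, a sparse eigensolver such as ARPACK (wrapped by `eigsh`) only needs matrix-vector products, which is what makes meshes with tens of thousands of vertices tractable.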

Fig. 2. A mesh with 7,063 vertices and with approximately six edges per vertex is shown here pro-
jected onto four Laplacian eigenvectors corresponding to the four smallest non-null eigenvalues.
The curves on the right correspond to histograms of these graph projections. The first of
these vectors, the Fiedler vector, is supposed to split the graph along a cut, but in this case
such a cut is difficult to interpret. Other vectors, such as the second and fourth ones, are very
well localized, which makes them good candidates for clustering and for matching. These histograms also re-
veal that not all these eigenvectors are well localized. This suggests that some of the eigenvectors
shown here are not well suited for spectral clustering.

3 Matching Using Laplacian Eigenvectors


We consider two graphs Gx and Gy , with respectively N and M nodes. Exact graph
matching, i.e., the special case N = M , can be written as the problem of minimizing
the Frobenius norm:

    P* = arg minP ‖Wx − P Wy Pᵀ‖²F    (3)
over the set of N × N permutation matrices P. To solve this problem one can use the
Laplacian embeddings previously introduced. Notice first that Wx − P Wy Pᵀ = Lx −
P Ly Pᵀ. Let Lx = Ux Λx Uxᵀ and Ly = Uy Λy Uyᵀ be the truncated eigendecomposi-
tions of the two Laplacian matrices. The columns of U = [U1, . . . , UN−1] correspond
to the eigenvectors, which form an orthonormal basis, and Λ = Diag[λ1, . . . , λN−1].
The spectral graph matching theorem [1] states that if the eigenvalues of Lx and Ly
are distinct and if they can be ordered, then the minimum of (3) is reached by:

    Q* = Ux S Uyᵀ,    (4)

where S = Diag[s1, . . . , sN−1], si ∈ {+1, −1}, accounts for the sign ambiguity in the
eigendecomposition and where the domain of the objective function (3) has been ex-
tended to the group of orthogonal matrices. The entries of Q* are Q*ij = xiᵀ(s • yj),
where a • b is the Hadamard product between two vectors. Since both Ux and Uy are
orthonormal matrices, all entries Q∗ij of Q∗ vary between −1 and 1. Therefore, Q∗ can
be interpreted as a cosine node-similarity matrix. Several heuristics were proposed in
the past to solve for the sign ambiguity and hence to recover node-to-node assignments
uniquely. In [1] the entries of Ux and Uy are replaced by their absolute values. The
recovery of P∗ from Q∗ , i.e., exact matching, becomes a bipartite maximum weighted
matching problem that is solved with the Hungarian algorithm.
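This absolute-value heuristic can be sketched as follows (our illustration, assuming SciPy's `linear_sum_assignment` as the Hungarian solver; the function name is ours):

```python
# Sketch of exact spectral matching in the style of Umeyama [1]: take the
# entries of Ux and Uy in absolute value to sidestep the sign ambiguity,
# then recover node-to-node assignments with the Hungarian algorithm.
import numpy as np
from scipy.optimize import linear_sum_assignment

def umeyama_match(Ux, Uy):
    """Ux, Uy: eigenvector matrices (rows = nodes) of two same-size graphs."""
    Q = np.abs(Ux) @ np.abs(Uy).T           # cosine-like node similarity
    rows, cols = linear_sum_assignment(-Q)  # maximize the total similarity
    return dict(zip(rows.tolist(), cols.tolist()))
```

For two isomorphic graphs whose eigenvector matrices differ only by a row permutation, the maximal-similarity assignment recovers that permutation.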
In this paper we propose to perform the matching in a reduced space; let K <
min(N, M) be the dimension of this space. We start with a solution provided by eigenvalue
ordering followed by dimensionality reduction. This provides two sets of K ordered
eigenvectors. However, ordering based on eigenvalues is not reliable simply because
there may be geometric multiplicities that give rise, in practice, to numerical instabili-
ties. To overcome this problem we seek a new eigenvector permutation which we denote
by a K × K matrix P. Thus, equation (4) can be rewritten as:

    Q = Ux S P Uyᵀ,    (5)

where Ux and Uy are now (N − 1) × K and (M − 1) × K block-matrices, respectively.
These matrices were obtained from the full eigenvector matrices by retaining the first
K columns and by re-normalizing each row such that the row vectors lie on a hyper-
sphere of dimension K [8,7]. As above, S is a K × K sign-ambiguity diagonal matrix.
Notice that, unlike Q* in (4), matrix Q is no longer orthonormal (since Ux and Uy are
no longer orthonormal) and it has rank K. Consequently, it can only define an inexact
(many-to-many) matching.
Suppose now that matrices S and P are correctly computed. By extension of the
spectral graph matching theorem mentioned above we obtain the following interpreta-
tion. The entries of Q in equation (5) can be written as Qij = xiᵀ yj, where xi and
yj are K-dimensional vectors corresponding respectively to the rows of Ux and to the
columns of S P Uyᵀ. Because of the normalization performed on the rows of Ux
and of Uy, and because the two eigenvector bases are correctly aligned, the points xi
and y j lie on the same hypersphere. This suggests that Q can be interpreted as a loose
cosine node-similarity matrix, i.e., −1 ≤ Qij ≤ 1, and that good matches xi ↔ y j
may be chosen using a simple strategy such as Qij > t > 0. As a consequence, many-
to-many matches may be established by seeking for each point in one set its nearest
neighbors in the other set, and vice-versa.
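This thresholded, mutual nearest-neighbour strategy can be sketched as follows (our illustration; X and Y stand for the row-normalized, aligned embeddings, and the threshold value is a hypothetical choice):

```python
# Keep a pair (i, j) when i and j are mutual nearest neighbours on the
# hypersphere and their cosine similarity Q_ij exceeds the threshold t.
import numpy as np

def mutual_matches(X, Y, t=0.9):
    Q = X @ Y.T                       # loose cosine similarity, -1 <= Qij <= 1
    nn_x = Q.argmax(axis=1)           # best candidate j for each i
    nn_y = Q.argmax(axis=0)           # best candidate i for each j
    return [(i, int(nn_x[i])) for i in range(len(X))
            if nn_y[nn_x[i]] == i and Q[i, nn_x[i]] > t]
```

Keeping all neighbours above the threshold, rather than only the mutual best one, yields the many-to-many variant described in the text.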
This result is very important because it allows us to bootstrap the matching of very
large graphs through the alignment of two eigenbases of size K with K ≪ N. We
now return to the more general case where the two graphs have different cardinalities.
The best one can hope for in this case is to find the largest isomorphic subgraphs of the
two graphs. In terms of spectral embeddings, this means that one has to find the largest
subsets of points of the two sets of K-dimensional points. This problem can be stated
as the following minimization problem:

    minR Σi,j αij ‖xi − R yj‖²    (6)

This is an instance of the rigid point registration problem that can be solved by treat-
ing the assignments αij as missing variables in the framework of the expectation-
maximization algorithm. A detailed solution is described in [13]. If matrices S and
P are not correctly estimated, matrix R belongs to the orthogonal group, i.e., rotations
and reflections. However, if S and P are correctly estimated, R belongs to the special
orthogonal group, i.e., it is a rotation, which means that the two sets of points can be
matched via a Euclidean transformation. The estimation of the latter is much more
tractable than the more general case of orthogonal transformations.
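A much-simplified sketch of minimizing (6) in an EM style is given below: soft assignments αij play the role of the E-step, and a closed-form orthogonal Procrustes solution gives R in the M-step. This is our illustrative stand-in for the full method of [13]; the kernel width and iteration count are arbitrary choices.

```python
import numpy as np

def register(X, Y, n_iter=30, sigma=0.2):
    """X: N x K, Y: M x K embedded points; returns an orthogonal K x K R."""
    R = np.eye(X.shape[1])
    for _ in range(n_iter):
        # E-step: soft assignments alpha_ij from Gaussian residuals
        d2 = ((X[:, None, :] - (Y @ R.T)[None, :, :]) ** 2).sum(-1)
        alpha = np.exp(-d2 / (2 * sigma ** 2))
        alpha /= alpha.sum(axis=1, keepdims=True)
        # M-step: orthogonal R maximizing tr(R^T X^T alpha Y)  (Procrustes)
        U, _, Vt = np.linalg.svd(X.T @ alpha @ Y)
        R = U @ Vt
    return R
```

Note that the SVD step returns a general orthogonal matrix; as discussed above, R is a proper rotation only when S and P were correctly estimated.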
Hence, the inexact graph matching problem at hand can be solved with the fol-
lowing three steps: (i) estimate matrices S and P using properties associated with the
Laplacian eigenvectors, (ii) establish a match between the two sets of embedded nodes
(K-D points) based on a nearest-neighbor strategy, and (iii) achieve point registration
probabilistically by jointly estimating point-to-point assignments and a rotation matrix
between the two sets of points.

4 Eigenvector Alignment Based on Histograms

Each column Uxi of Ux (as well as Uyj of Uy), 1 ≤ i, j ≤ K, corresponds to an
eigenvector of dimension N − 1 (respectively M − 1). Hence, vector Uxi (and
Uyj) can be interpreted as a function projecting the nodes of Gx (and of Gy) onto the
real line, defined by Uxi : R^(N−1) → R (and by Uyj : R^(M−1) → R).
In the case of two isomorphic graphs, there should be a one-to-one match between
the eigenfunctions of the first graph and the eigenfunctions of the second graph, pro-
vided that both S and P are known. Indeed, in the isomorphic case, the node-to-node
assignment between the two graphs is constrained by (4). Unfortunately, as already ex-
plained in the previous section, the two eigenbases cannot be reliably selected such that
they span the same Euclidean space. Alternatively, we consider the histograms of the
eigenfunctions just defined, namely h([Uxi]) and h([Uyj]), where the notation h([U])
corresponds to the histogram of the set of values returned by the eigenfunction U. The
first observation is that these histograms are invariant to node permutation, i.e., invari-
ant to the order in which the components of the eigenvectors are considered. Therefore,
the histogram of an eigenfunction can be viewed as an invariant signature of an eigen-
vector. The second important observation is that h([−U]) is the mirror image of h([U]),
i.e., h([−U])(b) = h([U])(B − b), where B is the total number of bins; hence, the
histograms can be used to detect sign flips. The third important observation is that the
shape of the histogram is not too sensitive to the number of nodes in the graph, and it is
therefore possible to compare histograms arising from graphs with different cardinalities.
The problem of estimating matrices S and P can therefore be replaced by the
problem of finding a set of assignments {U xi ⇔ ±U yj , 1 ≤ i, j ≤ K} based on the
comparison of their histograms. This is an instance of the weighted bipartite match-
ing problem already mentioned, with complexity O(K³). Since the eigenvectors are
defined up to a sign (modeled by S), we must associate two histograms with each
eigenfunction. Let C(hi, hj) be a measure of dissimilarity between two histograms. By
computing the dissimilarities between all pairs of histograms we can build a K × K ma-
trix A whose entries are defined by:

    Aij = min{ C(h([Uxi]), h([Uyj])), C(h([Uxi]), h([−Uyj])) }

as well as another matrix whose entries contain the signs of Uyj which are eventually
retained. The Hungarian algorithm finds an optimal permutation matrix P as well as a
sign matrix S.
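The whole alignment step can be sketched as follows (our illustration, assuming SciPy's `linear_sum_assignment` as the Hungarian solver and a plain L1 histogram dissimilarity for C; the bin count and fixed bin range are arbitrary choices):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def align_eigenvectors(Ux, Uy, bins=32):
    """Return a column permutation and sign vector such that
    Uy[:, perm] * signs is histogram-aligned with Ux."""
    K = Ux.shape[1]
    def hist(u):  # density-normalized so graphs of different size compare
        h, _ = np.histogram(u, bins=bins, range=(-1.0, 1.0), density=True)
        return h
    A = np.zeros((K, K))          # dissimilarity matrix (entries Aij)
    S = np.zeros((K, K))          # candidate signs
    for i in range(K):
        hx = hist(Ux[:, i])
        for j in range(K):
            c_pos = np.abs(hx - hist(Uy[:, j])).sum()
            c_neg = np.abs(hx - hist(-Uy[:, j])).sum()  # mirrored histogram
            A[i, j] = min(c_pos, c_neg)
            S[i, j] = 1.0 if c_pos <= c_neg else -1.0
    rows, cols = linear_sum_assignment(A)   # Hungarian, O(K^3)
    return cols, S[rows, cols]
```

Because histograms are invariant to node permutation, the same routine applies unchanged whether or not the node orderings of the two graphs agree.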

5 Results

As a first example, consider a motion sequence of an articulated shape and its reg-
istration, shown in figure 3. The articulated shape is described by a mesh/graph with
7,063 vertices and the degree of each vertex is approximately equal to six. The graphs
were matched using the method described above, i.e., alignment of eigenvectors based
on their histograms and naive point registration based on a nearest neighbor classifier.
On average, the algorithm found 4,000 one-to-one matches and 3,000 many-to-many
matches. Notice that in this case the two graphs are isomorphic.
Figure 4 shows two sets of eigenvector histograms (top and bottom) corresponding to
the first pair of registered shapes of figure 3. The histograms shown on each row corre-
spond to the five eigenvectors associated with the smallest non-null eigenvalues, shown in
increasing order from left to right. There is a striking similarity between these histograms
in spite of large discrepancies between the two shapes’ poses. This clearly suggests that
these histograms are good candidates for both exact and inexact graph matching.
Figure 5 shows three more examples of inexact graph matching corresponding to
different shape pairs: dog-horse, dancer-gorilla, and horse-seahorse. Each mesh in this
figure is described by a sparse graph and there are notable differences in the number of
nodes. For example, the horse graph has 3,400 nodes, the gorilla graph has 2,046 nodes,
the dancer graph has 34,000 nodes, and the seahorse graph has 2,190 nodes. The top

Fig. 3. The result of applying the graph matching algorithm to a dance sequence

Fig. 4. Two sets of histograms associated with two sets of Laplacian eigenvectors. One may notice
the striking similarity between these histograms that correspond to two isomorphic graphs.

Fig. 5. Examples of matching different shapes (dog-horse, dancer-gorilla, and horse-seahorse)


corresponding to graphs of different size. The top row shows the results obtained with the al-
gorithm described in this paper while the second row shows the results after performing point
registration with a variant of the EM algorithm. While in the first two examples (dog-horse and
dancer-gorilla) the graphs have the same global structure, the third example shows the result
obtained with two very different graphs.

row of figure 5 shows the result of many-to-many inexact matching obtained with the
method described in this paper. The bottom row shows the result of one-to-one rigid
point registration obtained with a variant of the EM algorithm [13] initialized from the
matches shown on the top row.

6 Conclusion

In this paper, we proposed a framework for inexact matching of large and sparse graphs.
The method is completely unsupervised: it does not need any prior set of matches be-
tween the two graphs. The main difficulty of the problem is twofold: (1) to extend
the known spectral graph matching methods such that they can deal with graphs of
very large size, i.e., of the order of 10,000 nodes and (2) to carry out the match-
ing in a robust manner, i.e., in the presence of large discrepancies between the two
graphs.
We showed that it is possible to relax the graph isomorphism problem such that in-
exact graph matching can be carried out when the dimension of the embedding space
is much smaller than the number of vertices in the two graphs. We also showed that the
alignment of the eigenbases associated with the two embedded shapes can be robustly
estimated using the eigenvectors’ densities instead of eigenvalue ordering. The method
starts with an initial solution based on ordering the eigenvalues and then finds the optimal
subset of eigenvectors that are aligned based on comparing their density distribution.
This selects both a one-to-one eigenvector alignment and the dimension of the embed-
ding. We also pointed out localization as an important property of eigenvectors and
presented initial results to support our observations.
In the future, we plan to investigate more thoroughly the link between graph matching
and graph clustering. We believe the localization property to be a promising direction
in which to move forward.

References
1. Umeyama, S.: An eigen decomposition approach to weighted graph matching problems.
IEEE PAMI 10, 695–703 (1988)
2. Scott, G., Longuet-Higgins, C.: An algorithm for associating the features of two images.
Proceedings Biological Sciences 244, 21–26 (1991)
3. Shapiro, L., Brady, J.: Feature-based correspondence: an eigenvector approach. Image and
Vision Computing 10, 283–288 (1992)
4. Carcassoni, M., Hancock, E.R.: Correspondence matching with modal clusters. IEEE
PAMI 25, 1609–1615 (2003)
5. Carcassoni, M., Hancock, E.R.: Spectral correspondence for point pattern matching. Pattern
Recognition 36, 193–204 (2003)
6. Belkin, M., Niyogi, P.: Laplacian eigenmaps for dimensionality reduction and data represen-
tation. Neural Computation 15, 1373–1396 (2003)
7. Caelli, T., Kosinov, S.: An eigenspace projection clustering method for inexact graph match-
ing. IEEE PAMI 26, 515–519 (2004)
8. Ng, A., Jordan, M., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: NIPS
(2002)
9. Chung, F.: Spectral Graph Theory. American Mathematical Society, Providence (1997)
10. Zhang, H., van Kaick, O., Dyer, R.: Spectral methods for mesh processing and analysis. In:
Eurographics Symposium on Geometry Processing (2007)
11. Biyikoglu, T., Leydold, J., Stadler, P.F.: Laplacian Eigenvectors of Graphs. Springer, Heidel-
berg (2007)
12. Fraley, C., Raftery, A.: MCLUST version 3 for R: Normal mixture modeling and
model-based clustering. Technical Report 504, Department of Statistics, University of
Washington (2006)
13. Mateus, D., Horaud, R., Knossow, D., Cuzzolin, F., Boyer, E.: Articulated shape
matching using Laplacian eigenfunctions and unsupervised point registration. In:
CVPR (2008)
Graph Matching Based on Node Signatures

Salim Jouili and Salvatore Tabbone

University of Nancy 2 - LORIA UMR 7503
BP 239, 54506 Vandoeuvre-lès-Nancy Cedex, France
{salim.jouili,tabbone}@loria.fr

Abstract. We present an algorithm for graph matching in a pattern
recognition context. This algorithm deals with weighted graphs, based on
new structural and topological node signatures. Using these signatures,
we compute an optimum solution for node-to-node assignment with the
Hungarian method and propose a distance formula to compute the dis-
tance between weighted graphs. The experiments demonstrate that the
newly presented algorithm is well suited to pattern recognition appli-
cations. Compared with four well-known methods, our algorithm gives
good results for clustering and retrieving images. A sensitivity analysis
reveals that the proposed method is also insensitive to weak structural
changes.

Keywords: graph representation, graph matching, graph clustering.

1 Introduction
In image processing applications, it is often required to match different images
of the same object or similar objects based on structural descriptions constructed
from these images. If the structural descriptions of objects are represented by
graphs, different images can be matched by performing some kind of graph
matching. Graph matching is the process of finding a correspondence between
nodes and edges of two graphs that satisfies some constraints ensuring that
similar substructures in one graph are mapped to similar substructures in the
other. Many approaches have been proposed to solve the graph matching prob-
lem [1,5,15]. Matching by minimizing the edit distance [4,11,13,14] is attractive
since it gauges the distance between graphs as the least cost of edit
operations needed to make two graphs isomorphic. Moreover the graph edit dis-
tance has tolerance to noise and distortion. The main drawback of graph edit
distance is its computational complexity, which is exponential in the number of
nodes of the involved graphs. To reduce the complexity, Papadopoulos [14] gives a
fast edit distance based on matching specific graphs by using the sorted graph
histogram. Similarly, Lopresti [12] gives an equivalence test procedure that
makes it possible to quantify the similarity between graphs. Other methods based on spec-
tral approaches [2,3,16], give an elegant matrix representation for graphs that

This work is partially supported by the French National Research Agency project
NAVIDOMASS, referenced under ANR-06-MCDA-012, and by the Lorraine region.

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 154–163, 2009.
© Springer-Verlag Berlin Heidelberg 2009

ensure approximate solutions for graph matching in polynomial time. Among
the pioneering works related to graph matching using spectral techniques we
quote the paper of Umeyama [3], in which the Weighted Graph Isomorphism
Problem is addressed by an eigendecomposition. However, this method can only
be applied for graphs with the same number of nodes. More recent works [17,18]
extend this approach for graphs with different sizes but with a higher complexity.
In this paper, we propose a new efficient algorithm for matching and com-
puting the distance between weighted graphs. We introduce a new vector-based
node signature to reduce the problem of graph matching to a bipartite graph
matching problem. Each node is associated with a vector whose components are
the node degree and the incident edge weights. Using these node
signatures a cost matrix is constructed. The cost matrix describes the matching
costs between nodes in two graphs; it is an (n, m) matrix, where n and m are the
sizes of the two graphs. An element (i, j) in this matrix gives the Manhattan
distance between the i-th node signature in the first graph and the j-th node
signature in the second graph. To find the optimum matching we consider this
problem as an instance of the assignment problem [6,7,8], which can be solved by
the Hungarian method [19]. We also introduce a new metric to compute the dis-
tance between graphs. The concept of node signature has been studied previously
in [10,8,15], where the node signatures are computed using spectral, decompo-
sition, and random-walk approaches. In contrast, our node signature is a
vector and is computed straightforwardly from the adjacency matrix.
The remainder of this paper is organized as follows: in the next section (2),
the proposed matching algorithm is described, as well as the distance between two
graphs. This distance is used to cluster and retrieve graph data sets. The pro-
posed algorithm is validated within images clustering and content-based image
retrieval applications. We have compared our results with the Umeyama method
[3], the graph edit distance from spectral seriation [2], the graph histograms dis-
tance [14], and the graph probing technique [12] (section 3). Finally, in section
4, some conclusions are drawn.

2 Graph Matching Algorithm

In this section we describe our algorithm, firstly for the graph matching problem
(exact and inexact), and then for computing a metric distance between graphs.

Graph matching method. In order to obtain a set of local descriptions de-
scribing a weighted graph, each node is associated with a signature (vector). As it
will be seen later, these node signatures are used to determine if two nodes in
different graphs can be matched. Therefore, the construction of the node signa-
ture is a crucial stage in the graph matching process. For this aim, two kinds
of information are available to describe the nodes. The first one is the degree of
the node and the second one is the weights of the incident edges of the node.
By combining these two pieces of information, the valued neighborhood relations can be
captured, as well as the topological features of a node in the graph. We introduce
a node signature in the context of weighted and unweighted graphs. For weighted
graphs, the signature is defined as the degree of the node and the weights of all
the incident edges. Given a graph G = (X, E), the node signature is formulated
as follows:
    Vs(x) = {d(x), w0, w1, w2, . . .}
where x ∈ X, d(x) gives the degree of x, and wi are the weights of the incident
edges to x. For unweighted graphs, the weights of any incident edges are fixed to
1. The set of node signatures (vectors) describing nodes in a graph is a collection
of local descriptions. So, local changes in the graph will modify only a subset of
vectors while leaving the rest unchanged. Moreover, the computational cost of
the construction of these signatures is low since it is computed straightforwardly
from the adjacency matrix. Based on these node signatures, a cost matrix C is
defined by:
    Cgi,gj(i, j) = L1(γ(i), γ(j))    (1)
where i and j are, respectively, nodes of gi and gj, and L1(., .) is the Manhattan
distance. γ(i) is the vector Vs(i) with only the weights sorted, in decreasing order.
Finally, since the graphs may have different sizes, the γ vectors are padded with
zeros so that all vectors have the same length.
The cost matrix defines a vertex-to-vertex assignment for a pair of graphs.
This task can be seen as an instance of the assignment problem and can be
solved by the Hungarian method, running in O(n³) time [19], where n is the
size of the larger graph. The permutation matrix P, obtained by applying the
Hungarian method to the cost matrix, defines the optimum matching between
two given graphs. Based on the permutation matrix P, we define a matching
function M as follows:

    M(xi) = yj,  if Pi,j = 1    (2a)
    M(xi) = 0,   otherwise      (2b)

where xi and yj are nodes in the first and the second graph, respectively.

Distance formula. Before introducing the distance formula we denote:

– |M|: the size of the matching function M, which is the number of matching
  operations. When two graphs are matched, the number of matching operations
  is the size of the smaller one.
– M̂ = Σx L1(γ(x), γ(M(x))): the matching cost, which is the sum of the
  matching operation costs for two graphs matched by M.
We define the distance between two graphs gi and gj as follows:

D(gi , gj ) = + ||gi | − |gj || (3)
|M |
This distance is the matching cost normalized by the matching size, increased
by the difference in size between the two graphs. It can be shown that this
distance is a metric, satisfying the non-negativity, identity of indiscernibles,
symmetry and triangle inequality conditions.
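To make the pipeline concrete, the steps above (node signatures, the L1 cost matrix of Eq. (1), the optimal assignment of Eq. (2) and the distance of Eq. (3)) can be sketched in Python. This is a minimal sketch under assumed conventions: a graph is a dictionary mapping each node to a dictionary of neighbour weights, and, for brevity, a brute-force search over permutations stands in for the O(n³) Hungarian solver, so it is only practical for very small graphs.

```python
from itertools import permutations

def signature(adj, node):
    # Node signature: the node degree followed by the incident edge
    # weights sorted in decreasing order.
    weights = sorted(adj[node].values(), reverse=True)
    return [len(weights)] + weights

def l1(u, v):
    # Manhattan distance between zero-padded signature vectors.
    n = max(len(u), len(v))
    u = u + [0.0] * (n - len(u))
    v = v + [0.0] * (n - len(v))
    return sum(abs(a - b) for a, b in zip(u, v))

def graph_distance(g1, g2):
    # Make g1 the smaller graph: the matching size |M| is its order.
    if len(g1) > len(g2):
        g1, g2 = g2, g1
    n1, n2 = sorted(g1), sorted(g2)
    sig1 = {u: signature(g1, u) for u in n1}
    sig2 = {v: signature(g2, v) for v in n2}
    # Optimal assignment by exhaustive search; a Hungarian solver
    # would find the same minimum in O(n^3) time.
    m_hat = min(
        sum(l1(sig1[u], sig2[v]) for u, v in zip(n1, perm))
        for perm in permutations(n2, len(n1))
    )
    # Normalised matching cost plus the difference of graph sizes.
    return m_hat / len(n1) + abs(len(g1) - len(g2))
```

Two identical graphs yield distance 0; perturbing edge weights or adding nodes increases the distance by the normalised signature cost plus the size difference.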
Graph Matching Based on Node Signatures 157

3 Experiments
To show the utility of our method in pattern recognition applications and its
robustness to structural changes, we have carried out several experiments.

Graph clustering application. First, we compare our method with
Umeyama's algorithm for inexact graph matching [3]. We select this method
because it, like ours, relies on solving an assignment problem over a cost matrix
to find the optimum matching. Since Umeyama's method requires weighted
graphs with the same number of nodes, we use only two classes from the GREC
database, each containing 15 graphs with 8 nodes per graph [22,21]. The GREC
data set consists of graphs representing symbols from architectural and
electronic drawings, classified into 22 classes. The graphs in each class are
obtained by distorting the original GREC images and the extracted graphs [21].
We compute the distance matrices (Fig. 1) for the two methods. The size
of each matrix is 30×30, and each class of images corresponds to a block in these
matrices: images labeled 1 to 15 correspond to the first class, and images 16 to
30 to the second class. The rows and columns index the distances between
graphs; an element (i, j) corresponds to the distance between the i-th and the
j-th image. The two blocks along the diagonal represent the within-class
distances, and the other blocks the between-class distances. In Fig. 1(a), there
are three blocks instead of two along the diagonal, and within each block there
are higher intensities; thus the within-class distance has a high value. In
contrast, Fig. 1(b) shows two clearly marked blocks, hence a larger gap between
the within-class and between-class distances.
Furthermore, we have performed multidimensional scaling (MDS) [26] and
minimum spanning tree (MST) clustering [25]. Generally speaking, MDS
pictures the structure of a set of objects from data that define the distances
between pairs of objects. Each object is represented by a point in a multidi-
mensional space, and the points are arranged so that the distances between
pairs of points relate as closely as possible to the similarities between the
corresponding pairs of objects. We show the MDS results for the Umeyama
method (Fig. 2(a)) and for our method (Fig. 2(b)). In Fig. 2(a), the
two classes cannot be clearly separated, since some points of different classes are

Fig. 1. Graph distance matrices. (a) results from Umeyama approach; (b) results from
our approach.
158 S. Jouili and S. Tabbone


Fig. 2. MDS for each distance matrix. (a) MDS of the Umeyama approach. (b) MDS
of our graph distance.

mixed together. In Fig. 2(b), the two classes of images are clearly separated and
distributed more compactly.
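The MDS step can be reproduced with classical (Torgerson) scaling from a precomputed distance matrix; the experiments do not specify which MDS variant was used, so this sketch is one standard choice under that assumption:

```python
import numpy as np

def classical_mds(D, k=2):
    # Classical (Torgerson) MDS: double-centre the squared distance
    # matrix and embed with the top-k eigenpairs.
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    B = -0.5 * J @ (D ** 2) @ J
    vals, vecs = np.linalg.eigh(B)        # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]      # keep the k largest
    scale = np.sqrt(np.clip(vals[idx], 0.0, None))
    return vecs[:, idx] * scale           # n x k embedding coordinates
```

Feeding it the 30×30 graph distance matrix and plotting the two resulting coordinates per image produces pictures like those in Fig. 2.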
The MST method is a well-known clustering method from graph theory. In
this approach, a minimum spanning tree of the complete graph is generated,
whose nodes are images and whose edge weights are the distance measures
between images (graphs in our experiments). By cutting all edges with weights
greater than a specified threshold, subtrees are created, and each subtree
represents a cluster. We use the distance matrices obtained previously for the
MST clustering and, for each method, a threshold that optimizes its results
is selected (see Table 1). The MST clustering is evaluated with the Rand index
[27] and the Dunn index [28]. The Rand index measures how closely the clusters
created by the clustering algorithm match the ground truth. The Dunn index
is a measure of the compactness and separation of the clusters; unlike the
Rand index, it is not normalized. When the Umeyama distance is used, many
images of the second class are clustered into the first class and three classes are
detected by the MST clustering. When our method is used, two classes are
detected and all images are clustered correctly. These results are consistent with
the MDS results. In addition, the Dunn and Rand indices show that the
clustering based on our method obtains a better separation of the graphs into
compact clusters. The time consumed by our method is 39.14% less than that
of the Umeyama method (see Table 1).
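The MST clustering and the Rand index can be sketched as follows. Merging, in increasing weight order, all edges whose weight does not exceed the threshold (Kruskal-style, with union-find) yields exactly the components obtained by cutting the MST edges heavier than the threshold; the helper names are illustrative, not from the paper.

```python
def mst_clusters(D, threshold):
    # Union nodes joined by edges no heavier than the threshold, in
    # increasing weight order; this equals cutting MST edges > threshold.
    n = len(D)
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    edges = sorted((D[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    for w, i, j in edges:
        if w <= threshold and find(i) != find(j):
            parent[find(i)] = find(j)
    return [find(i) for i in range(n)]      # a cluster label per image

def rand_index(a, b):
    # Fraction of point pairs on which two clusterings agree.
    pairs = [(i, j) for i in range(len(a)) for j in range(i + 1, len(a))]
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)
```

Comparing the labels returned by `mst_clusters` against the ground-truth classes with `rand_index` gives 1.0 exactly when the co-membership of every pair of images is recovered.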
Secondly, we have compared our method with the GED from spectral seriation
[2], the graph histograms [14] and the graph probing [12]. The experiments

Table 1. MST clustering with our graph distance and Umeyama's approach

Method    | Cluster 1                                          | Cluster 2                                                  | Cluster 3          | Execution time (s) | Rand Index | Dunn Index
Umeyama's | 3, 20, 5, 2, 15, 30, 8, 6, 7, 10, 13, 1, 9, 12     | 11, 14, 16, 25, 26, 27, 21, 24, 4, 28                      | 17, 18, 22, 19, 23 | 5.751              | 0.69       | 0.002
Ours      | 1, 5, 3, 7, 14, 15, 2, 10, 4, 12, 6, 9, 8, 11, 13  | 16, 20, 23, 27, 22, 26, 19, 29, 30, 24, 17, 21, 25, 18, 28 | -                  | 2.251              | 1          | 2.32


Fig. 3. Graph distance matrices. (a) results from our method; (b) results from GED
from spectral seriation; (c) results from the graph histograms method; (d) results from
the graph probing method.

consist of applying the previous tests (MDS and MST) to a database derived from
COIL-100 [20], which contains different views of 3D objects. We have used three
classes chosen randomly, with ten images per class. Two consecutive images in
the same class represent the same object rotated by 5°. The images are converted
into graphs by extracting feature points with the Harris interest point detector
[23] and applying a Delaunay triangulation [24]. Finally, in order to get weighted
graphs, each edge is weighted by the Euclidean distance between the two points
it connects. The size of the graphs ranges from 5 to 128 nodes.
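The graph construction used here (feature points, Delaunay triangulation, Euclidean edge weights) might be sketched as follows, assuming SciPy's `Delaunay` for the triangulation and taking the detected interest points as given (the Harris detector itself is omitted):

```python
import numpy as np
from scipy.spatial import Delaunay

def points_to_graph(points):
    # Build a weighted graph: nodes are feature points, edges come from
    # the Delaunay triangulation, and each edge weight is the Euclidean
    # distance between its two endpoints.
    pts = np.asarray(points, dtype=float)
    tri = Delaunay(pts)
    adj = {i: {} for i in range(len(pts))}
    for simplex in tri.simplices:
        for a in simplex:
            for b in simplex:
                if a != b:
                    adj[int(a)][int(b)] = float(np.linalg.norm(pts[a] - pts[b]))
    return adj
```

The resulting adjacency structure is symmetric, so the graphs are undirected as in the experiments above.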
The distance matrix in Fig. 3(a) clearly shows three blocks along the diagonal;
thus the within-class and between-class distances are well separated. In
contrast, in the other matrices (Fig. 3(b)-(d)) the intensity of the first two blocks
along the diagonal is close to that of the neighboring blocks. In addition, the
MDS (see Fig. 4) and the MST clustering results (see Table 2) show that with
our method the three classes are clearly separated and the Rand index reaches
a value of 1. However, evaluating the separability and compactness of the
created clusters shows that the graph histograms method [14] has the best Dunn
index, but with only two detected classes (instead of three), and the graph
probing method has the best execution time.
From Table 2, we note that, contrary to our method, the first two classes are
merged by the three other methods (spectral seriation, graph histograms and
graph probing). Each of these approaches uses a global description to represent
graphs: the probing [12] and graph histograms [14] methods represent each
graph with a single vector, and the spectral seriation method [2] uses a string
representation. Therefore, these global descriptions cannot capture differences
when graphs share similar global characteristics but differ locally.


Fig. 4. MDS. (a) results from our method; (b) results from GED from spectral seriation;
(c) results from the graph histograms method; (d) results from the graph probing method.

Table 2. MST clustering in three classes from COIL-100: images 1-10 belong to the first
class, images 11-20 to the second class and images 21-30 to the third class

Method             | Cluster 1                                                              | Cluster 2                              | Cluster 3                              | Execution time (s) | Rand Index | Dunn Index
Spectral Seriation | 18, 20, 13, 14, 17, 19, 16, 15, 11, 12, 1, 4, 9, 2, 6, 3, 10, 7, 8, 5  | 21, 22, 27, 23, 25, 24, 28, 26, 29, 30 | -                                      | 1195.4             | 0.77       | 1.23
Histograms method  | 14, 18, 13, 17, 20, 11, 15, 16, 19, 1, 4, 7, 8, 10, 9, 5, 2, 3, 6, 12  | 21, 27, 22, 23, 25, 24, 28, 26, 30, 29 | -                                      | 25.60              | 0.77       | 4.54
Graph Probing      | 14, 18, 13, 20, 19, 16, 17, 11, 15, 12, 2, 4, 7, 3, 6, 10, 9, 8, 1, 5  | 21, 29, 22, 25, 23, 24, 27, 26, 28, 30 | -                                      | 19.46              | 0.77       | 1.78
Our method         | 3, 6, 2, 1, 9, 4, 7, 8, 10, 5                                          | 11, 19, 14, 17, 18, 20, 16, 12, 13, 15 | 21, 22, 23, 25, 24, 28, 26, 30, 27, 29 | 329.02             | 1          | 1.54

Graph retrieval application. First, the retrieval performance on the face
expression database of Carnegie Mellon University [29] is evaluated. Second, the
effectiveness of the proposed node signature is evaluated by performing a graph
retrieval task on the GREC database [21,22]. In both experiments, given a query
image, the system retrieves the ten most similar images from the database.
Retrieval performance is measured with precision-recall curves, formed by
plotting the precision rate against the recall rate.
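Precision and recall for a top-k answer set can be computed per query as in this small sketch (the helper name and argument layout are illustrative, not from the paper):

```python
def precision_recall_at_k(query_label, ranked_labels, class_size, k=10):
    # ranked_labels: class labels of the database images, most similar first.
    # Precision: relevant fraction of the k retrieved images.
    # Recall: retrieved fraction of the query's class.
    retrieved = ranked_labels[:k]
    relevant = sum(1 for label in retrieved if label == query_label)
    return relevant / k, relevant / class_size
```

Averaging these pairs over all queries at varying k traces out curves like those in Fig. 5.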
Figure 5 gives the retrieval results of our method compared with the three
methods used previously on the face database, which contains 13 subjects, each
with 75 images showing different expressions. The graphs are constructed in
the same manner as in the previous experiment (graph clustering), and their
size ranges from 4 to 17 nodes. Even though our method provides better
results, all the results in Figure 5 show low performance. We conclude that this
way of constructing the graphs is not appropriate for this kind of data.


Fig. 5. Precision-Recall curves



Table 3. Accuracy rate (A.R) in the GREC database

     | Full node signature | Node signature without node degree | Node signature without edge weights
A.R  | 60.19%              | 56.25%                             | 50.30%

Table 3 shows the accuracy rate of retrieval on the GREC database using our
graph distance as a function of the node signature. The aim of this experiment
is to show the behavior of our metric when the node signature is restricted to
one of its two features, either the degree of the node or the weights of the
incident edges. From this experiment, we observe that combining the degree
and the weights improves the accuracy rate. Moreover, the incident edge weights
seem to affect the behavior of our metric more strongly, because this feature
characterizes the nodes better than the node degree alone.
Sensitivity analysis. The aim of this section is to investigate the sensitivity
of our matching method to structural differences in the graphs. Here, we have
taken three classes from the COIL-100 database, each containing 10 images.
Structural errors are simulated by randomly deleting nodes and edges in the
graphs. The query graphs are distorted versions of the original graph
representing the 5th image in each class.
Figure 6 shows the retrieval accuracy as a function of the percentage of
deleted edges (Fig. 6(a)) and deleted nodes (Fig. 6(b)). The retrieval accuracy
degrades at around 22% of edge deletion (Fig. 6(a)) and 20% of node deletion
(Fig. 6(b)). The main observation from these plots is that our graph matching
method is more robust to edge deletion, because deleting an edge does not imply
important structural changes in the graph: it changes only some elements in the
node signatures of the two nodes incident to the deleted edge. Indeed, the node
signature describes a node from its local context in the graph, i.e., all
information about the edges connected to the node is captured. Therefore, the
performance of the retrieval task is more sensitive to node deletion than to
edge deletion.
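The random distortions used in this analysis can be simulated as in the following sketch, again assuming graphs stored as weighted adjacency dictionaries and a seeded random generator for reproducibility:

```python
import random

def delete_edges(adj, fraction, rng):
    # Remove the given fraction of edges, chosen uniformly at random.
    g = {u: dict(nbrs) for u, nbrs in adj.items()}
    edges = [(u, v) for u in g for v in g[u] if u < v]
    for u, v in rng.sample(edges, int(round(fraction * len(edges)))):
        del g[u][v]
        del g[v][u]
    return g

def delete_nodes(adj, fraction, rng):
    # Remove a fraction of nodes together with all their incident edges.
    g = {u: dict(nbrs) for u, nbrs in adj.items()}
    doomed = set(rng.sample(sorted(g), int(round(fraction * len(g)))))
    return {u: {v: w for v, w in nbrs.items() if v not in doomed}
            for u, nbrs in g.items() if u not in doomed}
```

Applying these operators at increasing fractions to a query graph, and measuring retrieval accuracy at each level, reproduces the kind of degradation curves shown in Fig. 6.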


Fig. 6. Effect of noise on similarity queries. (a) Edge deletion. (b) Node deletion.

4 Conclusion

In this work, we have proposed a new graph matching technique based on node
signatures describing local information in the graphs. The cost matrix between
two graphs is based on these signatures, and the optimum matching is computed
using the Hungarian algorithm. Based on this matching, we have also proposed
a metric graph distance. The experimental results show that nodes are well
differentiated by their degree and the weights of their incident edges (considered
as an unordered set); therefore, our method provides good results for clustering
and retrieving images represented by graphs.

References

1. Myers, R., Wilson, R.C., Hancock, E.R.: Bayesian Graph Edit Distance. IEEE
Trans. Pattern Anal. Mach. Intell. 22(6), 628–635 (2000)
2. Robles-Kelly, A., Hancock, E.R.: Graph edit distance from spectral seriation. IEEE
Trans. on Pattern Analysis and Machine Intelligence 27(3), 365–378 (2005)
3. Umeyama, S.: An eigendecomposition approach to weighted graph matching prob-
lems. IEEE Trans. on Pattern Analysis and Machine Intelligence 10(5), 695–703
(1988)
4. Bunke, H., Shearer, K.: A graph distance metric based on the maximal common
subgraph. Pattern Recognition Letters 19, 255–259 (1998)
5. Bunke, H., Munger, A., Jiang, X.: Combinatorial Search vs. Genetic Algorithms: A
Case Study Based on the Generalized Median Graph Problem. Pattern Recognition
Letters 20(11-13), 1271–1279 (1999)
6. Riesen, K., Bunke, H.: Approximate graph edit distance computation
by means of bipartite graph matching. Image Vis. Comput. (2008),
doi:10.1016/j.imavis.2008.04.004
7. Gold, S., Rangarajan, A.: A graduated assignment algorithm for graph matching.
IEEE Trans. on Pattern Analysis and Machine Intelligence 18(4), 377–388 (1996)
8. Shokoufandeh, A., Dickinson, S.: Applications of Bipartite Matching to Problems
in Object Recognition. In: Proceedings, ICCV Workshop on Graph Algorithms and
Computer Vision, September 21 (1999)
9. Shokoufandeh, A., Dickinson, S.: A unified framework for indexing and matching
hierarchical shape structures. In: Arcelli, C., Cordella, L.P., Sanniti di Baja, G.
(eds.) IWVF 2001. LNCS, vol. 2059, pp. 67–84. Springer, Heidelberg (2001)
10. Eshera, M.A., Fu, K.S.: A graph distance measure for image analysis. IEEE Trans.
Syst. Man Cybern. 14, 398–408 (1984)
11. Sorlin, S., Solnon, C., Jolion, J.M.: A Generic Multivalent Graph Distance Measure
Based on Multivalent Matchings. Applied Graph Theory in Computer Vision and
Pattern Recognition 52, 151–181 (2007)
12. Lopresti, D., Wilfong, G.: A fast technique for comparing graph representations
with applications to performance evaluation. International Journal on Document
Analysis and Recognition 6(4), 219–229 (2004)
13. Sanfeliu, A., Fu, K.S.: A Distance Measure between Attributed Relational Graphs
for Pattern Recognition. IEEE Trans. Systems, Man, and Cybernetics 13(3), 353–362
(1983)

14. Papadopoulos, A.N., Manolopoulos, Y.: Structure-Based Similarity Search with
Graph Histograms. In: Proc. of the 10th International Workshop on Database &
Expert Systems Applications (1999)
15. Gori, M., Maggini, M., Sarti, L.: Exact and Approximate graph matching using
random walks. IEEE Trans. on Pattern Analysis and Machine Intelligence 27(7),
1100–1111 (2005)
16. Chung, F.R.K.: Spectral Graph Theory. AMS Publications (1997)
17. Xu, L., King, I.: A PCA approach for fast retrieval of structural patterns in
attributed graphs. IEEE Trans. Systems, Man, and Cybernetics 31(5), 812–817
(2001)
18. Luo, B., Hancock, E.R.: Structural Graph Matching Using the EM Algorithm and
Singular Value Decomposition. IEEE Trans. on Pattern Analysis and Machine
Intelligence 23(10), 1120–1136 (2001)
19. Kuhn, H.W.: The Hungarian method for the assignment problem. Naval Research
Logistics Quarterly 2, 83–97 (1955)
20. Nene, S.A., Nayar, S.K., Murase, H.: Columbia Object Image Library (COIL-100),
technical report, Columbia Univ. (1996)
21. Riesen, K., Bunke, H.: IAM Graph Database Repository for Graph Based Pattern
Recognition and Machine Learning. In: IAPR Workshop SSPR & SPR, pp. 287–297
(2008)
22. Dosch, P., Valveny, E.: Report on the Second Symbol Recognition Contest. In:
Proc. 6th IAPR Workshop on Graphics Recognition, pp. 381–397 (2005)
23. Harris, C., Stephens, M.: A combined corner and edge detector. In: Proc. 4th
Alvey Vision Conf., pp. 189–192 (1988)
24. Fortune, S.: Voronoi diagrams and Delaunay triangulations. In: Computing in Eu-
clidean Geometry, pp. 193–233 (1992)
25. Zahn, C.T.: Graph-theoretical methods for detecting and describing Gestalt clus-
ters. IEEE Trans. on Computers C-20, 68–86 (1971)
26. Hofmann, T., Buhmann, J.M.: Multidimensional Scaling and Data Clustering. In:
Advances in Neural Information Processing Systems (NIPS 7), pp. 459–466. Mor-
gan Kaufmann Publishers, San Francisco (1995)
27. Rand, W.M.: Objective criteria for the evaluation of clustering methods. Journal
of the American Statistical Association 66, 846–850 (1971)
28. Dunn, J.: Well separated clusters and optimal fuzzy partitions. Journal of Cyber-
netics 4(1), 95–104 (1974)
29. Carnegie Mellon University face expression database,
http://amp.ece.cmu.edu/downloads.htm
A Structural and Semantic Probabilistic Model for
Matching and Representing a Set of Graphs

Albert Solé-Ribalta and Francesc Serratosa

Departament d’Enginyeria Informàtica i Matemàtiques,
Universitat Rovira i Virgili, Spain
albert.sole@urv.cat
francesc.serratosa@urv.cat

Abstract. This article presents a structural and probabilistic framework for
representing a class of attributed graphs with only one structure. The aim of this
article is to define a new model, called Structurally-Defined Random Graphs.
This structure keeps statistical and structural information together to increase
the capacity of the model to discern between attributed graphs inside and
outside the class. Moreover, we define the match probability of an attributed
graph with respect to our model, which can be used as a dissimilarity measure.
Our model has the advantage that it does not incorporate application-dependent
parameters such as edit costs. The experimental validation on a TC-15 database
shows that our model obtains higher recognition rates than several structural
matching algorithms when there is moderate variability among the class
elements. In addition, our model requires fewer comparisons.

Keywords: graph matching, probabilistic model, semantic relations, structural
relations, graph synthesis, graph clustering.

1 Introduction
Since the 1980s, graphs have gained importance in pattern recognition, one of
their most powerful characteristics being the abstraction they achieve. The same
structure is able to represent a wide variety of problems, from image understanding
to interaction networks. Consequently, algorithms based on graph models can be
used in a very large space of problems. An interesting review of graph representation
models, graph matching algorithms and their applications is given in [7].
One of the main problems that practical applications using structural pattern
recognition are confronted with is the fact that sometimes more than one model
graph represents a class, which means that conventional error-tolerant graph
matching algorithms must be applied to each model-input pair sequentially. As a
consequence, the total computational cost is linearly dependent on the size of the
database of model graphs and exponential in the number of nodes of the graphs to
be compared. For applications dealing with large databases, this may be prohibitive.
To alleviate this problem, some attempts have been made to reduce the
computational time of matching the unknown input patterns to the whole set of
models from
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 164–173, 2009.
© Springer-Verlag Berlin Heidelberg 2009
A Structural and Semantic Probabilistic Model 165

the database. Assuming that the graphs that represent a cluster or class are not
completely dissimilar, a single structural and probabilistic model can be defined
from these graphs to represent the cluster, and thus only one comparison is needed
per cluster [3,2,4].
One of the earliest approaches was the model called First-Order Random Graphs
(FORGs) [3], where a random variable is assigned to each node and edge to
represent its possible values. In the Function-Described Graph approach (FDGs)
[2], some logical functions between nodes and arcs were introduced to alleviate
some of the problems of FORGs, increasing the capacity to represent the set with
only a small increase in storage space. Finally, Second-Order Random Graphs
(SORGs) [4] were presented; basically, they converted the logical functions of the
FDGs into two-dimensional random variables. The representational capacity was
increased, but so was the storage space.
This paper presents a new model called Structurally-Defined Random Graph
(SDRG) with low storage space but higher capacity to discern between elements
inside and outside the class. This is achieved by reducing the complexity of the
probability density function used to describe each random variable and by defining
the probability of a match such that the probability is 1 when a perfect match is
performed (in the other models [3,2,4] this does not hold).
Section 2 introduces the main definitions of graphs and presents the new model.
Section 3 describes a probabilistic measure of dissimilarity between a graph and an
SDRG. Section 4 evaluates the model. Section 5 gives some conclusions and further
lines to explore.

2 Formal Definitions and Notation


Definition 1. Let Δv and Δe denote the domains of possible values (for instance,
R^n) for attributed vertices and arcs, respectively. These domains are assumed to
include a special value Φ that represents a null value of a vertex or arc. An
attributed graph (AG) G over (Δv, Δe) is defined to be a set G = (Σv, Σe, γv, γe),
where Σv = {vk | k = 1,...,n} is a set of vertices (or nodes),
Σe = {eij | i, j ∈ {1,...,n}, i ≠ j} is a set of arcs (or edges), and the mappings
γv : Σv → Δv and γe : Σe → Δe assign attribute values to vertices and arcs,
respectively.

Definition 2. A complete AG is an AG with a complete graph structure, obtained
by including null elements. An AG G with N vertices can be extended to form a
complete AG G' with K vertices, K ≥ N, by adding vertices and arcs with Φ
attribute values. G' is called the K-extension of G.

Definition 3. A Structurally-Defined Random Graph (SDRG) is defined to be
F = (R, S) with R = (Σω, Σε, γω, γε), where:
1) Σω = {ωk | k = 1,...,n} is a set of vertices.
2) Σε = {εij | i, j ∈ {1,...,n}, i ≠ j} is a set of arcs.
3) The mapping γω : Σω → {Xω, pω} associates each vertex ωk ∈ Σω with a pair
of elements. The first is a multidimensional random variable Xω in the domain
Δω = Δv − {Φ}, defined according to P(ωk = x | ωk ≠ Φ); this probability stores
the semantic information of the vertex. The second element, pω, represents the
existence probability of the vertex.
4) The mapping γε : Σε → {Xε, pε} associates each arc εij ∈ Σε with Xε in the
domain Δε = Δe − {Φ}, according to P(εij = x | εij ≠ Φ, ωi ≠ Φ, ωj ≠ Φ), and
with the existence probability pε.
5) S = {A^1,...,A^M} is a set of AGs A^i = (Σv^i, Σe^i, lv^i, le^i) defined over the
domain Σω for the vertices and Σε for the arcs. The set S represents the different
structures (without attributes) that R is trying to compact. For this reason, the
mappings lv^i : Σv^i → Σω and le^i : Σe^i → Σε associate vertices and arcs from
each A^i with vertices and arcs from R.

Definition 4. A null random vertex ωΦ = {XΦ, pΦ} or a null random arc
εΦ = {XΦ, pΦ} is a vertex or arc that always exists in the SDRG but has a null
value. They are defined as follows: XΦ satisfies P(XΦ = x | ωΦ ≠ Φ) = 0 for all
x ∈ Δv − {Φ}, because the value is not in the domain, and pΦ = 1, since we
suppose this element always exists in the SDRG.

Definition 5. A complete SDRG F' = (R', S'), where S' = {A'^1,...,A'^M} and
R' = (Σ'ω, Σ'ε, γ'ω, γ'ε), is an SDRG with a complete graph structure in R' and
in each A'^i. This extension is done by adding null vertices and null arcs.

Example 1

We give a case example of representing a set of graphs with our SDRG model.
Suppose we have a set of 5 AGs in which the attribute value of the nodes is their
two-dimensional position (x, y) and the attribute value of the arcs is a logical value
indicating their existence (Fig. 1). The attribute value of a node is shown to the
right of the node number; the existence of an arc is represented by a straight line.
Suppose also that we are given a common labelling (Table 1a) between the nodes
of these AGs and a hypothetical structure composed of 4 nodes (L1, L2, L3, L4).
With this set and the common labelling, we define the SDRG shown in Fig. 2. R is
composed of a structure of 4 random nodes and 4 random arcs, and S is composed
of 4 AGs. To the right of each random node, we show the mean of its random
variable. Note that nodes v1 and v2 of all elements of S share the same attribute
in R. The existence probability of each node and arc is shown in Table 1c, and
finally, the labellings between the AGs in S and R are shown in Table 1b.
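Definitions 3-5 suggest a direct data layout. The following sketch is one hypothetical way to hold an SDRG in code; the field names are illustrative, and the Normal attribute parameters (mean, variance) anticipate the semantic probability of Section 3.2.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class RandomElement:
    # One random vertex or arc of R: a Normal attribute model plus
    # the existence probability p (Definition 3).
    mean: Tuple[float, ...] = ()
    var: Tuple[float, ...] = ()
    p_exist: float = 0.0

@dataclass
class SDRG:
    # R: random vertices and arcs; S: attribute-free structures A^i,
    # each stored as a labelling l_v, l_e into the element ids of R.
    vertices: Dict[str, RandomElement] = field(default_factory=dict)
    arcs: Dict[Tuple[str, str], RandomElement] = field(default_factory=dict)
    structures: List[dict] = field(default_factory=list)
```

A null vertex or arc (Definition 4) would be represented by a `RandomElement` with `p_exist = 1.0` and a degenerate attribute model.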

Fig. 1. Training set composed of 5 AGs

Table 1a. Common labelling of the Fig. 1 examples (graphs G1-G5):
L1: V1, V1, V1, V1, V1
L2: V2, V2, V2, V2, V2
L3: V3, V3 (present in 2 of the 5 graphs)
L4: V3 (present in 1 graph)

Table 1b. R-to-A^i labelling (structures A1-A4):
ω1: v1, V1, V1, V1
ω2: V2, V2, V2, V2
ω3: V3, V3 (present in 2 of the structures)
ω4: V3 (present in 1 structure)

Table 1c. Existence probabilities of the Fig. 2 nodes and arcs:
pω1 = 5/5, pε1 = 5/5
pω2 = 5/5, pε2 = 1/2
pω3 = 2/5, pε3 = 1/2
pω4 = 1/5, pε4 = 1/1

Fig. 2. SDRG constructed from the Fig. 1 examples

3 Match Probability of an AG with Respect to an SDRG

The aim of this section is to describe the probability of a labelling between a
graph and an SDRG, which is used as a dissimilarity measure. The theoretical
definition of this probability requires that the graph to be compared and the
structures of the SDRG have the same number of nodes. For this reason, we
consider from now on that the graph and the SDRG are extended and contain
the same number of nodes and arcs. Note that the algorithm to search for the
optimal labelling is not explained in this paper due to lack of space.
Nevertheless, in practical implementations of this matching algorithm, this
extension is not always needed, since it would increase the computational cost.
Given an SDRG F = (R, S) and an AG G, the probability of G with respect
to F is defined as the maximum probability among all the structures in S. That is,

    PF(G) = MAX_{∀A^i∈S} {P_{R,A^i}(G)}.        (1)

This expression is crucial in our model: independently of the number of graphs
and of their variability, the probability of a graph with respect to an SDRG is
obtained as the maximum value over the graphs that compose S.
For the rest of this section, we use A to represent one of the structures in
S, i.e., a concrete A^i. Moreover, we consider that we have a set Γ of structurally
correct labellings that map nodes and arcs from G to nodes and arcs from A.
That is, f = (fv, fe) ∈ Γ, with fv : Σv^G → Σv^A and fe : Σe^G → Σe^A. Given a
node n and an arc e of G, we define the random node ω and random arc ε of R
such that ω = lv^A(fv(n)) and ε = le^A(fe(e)).
Given a specific graph A in S, the probability of G with respect to A and R
is the maximum value among all consistent labellings f. That is,

    P_{R,A}(G) = MAX_{∀f∈Γ} {P_{R,A}(G | f)}.        (2)

The probability of G with respect to R and A given a labelling f is composed
of the probability contributions of nodes and arcs as follows:

    P_{R,A}(G | f) = k1 Σ_{∀n∈Σv} P^sem_{R,A}(n | G, fv) · P^str_{R,A}(n | G, fv)
                   + k2 Σ_{∀e∈Σe} P^sem_{R,A}(e | G, fe) · P^str_{R,A}(e | G, fe).        (3)

The weighting terms k1 and k2 adjust the importance of nodes and arcs in the
final result (with k1 + k2 = 1). The probabilities P^sem_{R,A} and P^str_{R,A} are
the semantic and structural probabilities of a node or arc of G with respect to a
random node or arc of R. The semantic probability, which represents the
attribute-value knowledge, is weighted by the structural probability, which
represents the appearance frequency. Both probabilities are defined in the
following sections.

3.1 Structural Probability

The structural probability represents the confidence in a random element (node
or arc). Thus, this probability increases when a node or arc appears more
frequently in the set of graphs used to synthesise the SDRG. For nodes and arcs
it is defined as follows:

    P^str_{R,A}(n | G, fv) = pω / Σ_{∀ω'∈Σω} pω'   and
    P^str_{R,A}(e | G, fe) = pε / Σ_{∀ε'∈Σε} pε'.        (4)

Here pω and pε are the existence probabilities of the random node ω and arc ε.
Moreover, the random vertices ω' and arcs ε' are those with ω' = lv^A(fv(n')) and
ε' = le^A(fe(e')) for all nodes n' and arcs e' of the extended G. Note that
Σ_{∀n'∈Σv^A} P^str_{R,A}(n' | G, fv) = 1 and Σ_{∀e'∈Σe^A} P^str_{R,A}(e' | G, fe) = 1.

Example 2

Consider that we would like to compute the probability of a new data graph G
with respect to the SDRG obtained in Example 1. We only show how to obtain
the probability with respect to the structure A^4. Fig. 3 shows the graph G, and
Table 2 shows the labelling f^4.

Table 2. G-to-A^4 labelling:
n1 → v1
n2 → v2
n3 → Φ
n4 → v3

Fig. 3. G to A^4 labelling

To compute the structural probabilities, we take the values pωi from Table 1c
and use pωΦ = 1 (Definition 4). Considering the mapping f^4 and the mapping
from A^4 to R shown in Table 1b, we get the following structural probabilities:

    P^str_{R,A^4}(n1 | G, f) = pω1 / pω1,ω2,ωΦ,ω4 = 5/16,
    P^str_{R,A^4}(n2 | G, f) = pω2 / pω1,ω2,ωΦ,ω4 = 5/16,
    P^str_{R,A^4}(n3 | G, f) = pωΦ / pω1,ω2,ωΦ,ω4 = 5/16 and
    P^str_{R,A^4}(n4 | G, f) = pω4 / pω1,ω2,ωΦ,ω4 = 1/16,

with pω1,ω2,ωΦ,ω4 = pω1 + pω2 + pωΦ + pω4 = 1 + 1 + 1 + 1/5 = 16/5.
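The computation above is simply a normalisation of the existence probabilities of the random vertices hit by the labelling, with the null vertex contributing p = 1; a small sketch (the helper name is illustrative) reproduces the 5/16 and 1/16 values:

```python
def structural_probabilities(p_exist):
    # Normalise the existence probabilities of the random vertices hit
    # by the labelling f; a null vertex contributes p = 1 (Definition 4).
    total = sum(p_exist)
    return [p / total for p in p_exist]
```

Called with the values of Example 2 (pω1 = 1, pω2 = 1, pωΦ = 1, pω4 = 1/5), it returns the probabilities 5/16, 5/16, 5/16 and 1/16, which sum to 1 as required by Eq. (4).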

3.2 Semantic Probability

The semantic probability is obtained as an instantiation of the random variable
Xω or Xε of R, given the attribute value a = γv(n) of a node of G or the attribute
value b = γe(e) of an arc of G. It is defined as follows:

    P^sem_{R,A}(n | G, fv) = P(ω = a | ω ≠ Φ),
    P^sem_{R,A}(e | G, fe) = P(εij = b | εij ≠ Φ, ωi ≠ Φ, ωj ≠ Φ).        (5)

170 A. Solé-Ribalta and F. Serratosa

The random variable is not restricted to any distribution function. A possible solution is to define a discrete distribution and store the function as a histogram [FORGs, FDGs, SORGs]. This solution keeps all the knowledge of the training examples but needs a huge storage space. On the other hand, if we assume a Normal distribution (defined in Equation 6 as N), the model only needs to store μ and σ for each node and arc. In this case, if we assume that μω, με and σω, σε are the means and variances of the previously defined random nodes and arcs, the semantic probability can be defined as follows:

P^sem_{R,A}(n | G, fv) = N(a, μω, σω) / N(μω, μω, σω)  and  P^sem_{R,A}(e | G, fe) = N(b, με, σε) / N(με, με, σε). (6)

Note that, in the case that G and A have exactly the same structure and the attributes of G have the same values as the means of R, P_{R,A}(G | f) = 1.
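Because Eq. (6) divides the Gaussian density by its value at the mean, the constant factor 1/(σ√(2π)) cancels and a closed form remains. A minimal sketch (our function name, not the paper's):

```python
import math

def sem_prob(a, mu, sigma):
    """Ratio of Eq. (6): N(a, mu, sigma) / N(mu, mu, sigma).
    The 1/(sigma*sqrt(2*pi)) factors cancel, leaving a value in (0, 1]
    that equals 1 exactly when the attribute a coincides with the mean."""
    return math.exp(-((a - mu) ** 2) / (2.0 * sigma ** 2))

print(sem_prob(0.0, 0.0, 1.0))  # 1.0
```

This makes the remark above explicit: when every attribute sits at its mean, every factor is 1 and P_{R,A}(G | f) = 1.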

4 Evaluation of the Model


We have evaluated the model using a dataset created at the University of Bern [1]. It is composed of 15 capital letters (classes) of the Roman alphabet (only those which are composed of straight lines), i.e., A, E, F, H, I, K, L, M, N, T, V, W, X, Y and Z. The letters are represented by graphs as follows. Straight lines represent edges and terminal points of the lines represent nodes. Nodes are defined over a two-dimensional domain that represents the position (x, y) of the terminal point in the plane, Δv = R². Edges have a one-dimensional binary attribute that represents the existence or non-existence of a line between two terminal points, Δe = {∃}. Graph-based representations of the prototypes are shown in Fig. 4. This database contains three sub-databases with different distortion levels: low, medium and high. Fig. 5 shows three examples of letter X with low distortion and three examples of letter H with high distortion. Moreover, for each sub-database, there is a training set, a validation set and a test set.

Fig. 4. Graph-based representations of the original prototypes

Fig. 5. Some examples of letters X and H with low and high distortion, respectively

With each class of the training set, an SDRG has been synthesised. To do so, we have used a variation of the incremental-synthesis algorithm used to construct SORGs [4]. The coordinates (x, y) of the positions are considered to be independent on the basis that they do not have any mathematical relationship. Therefore, the random variable Xω is defined according to P(ωk = (x, y) | ωk ≠ Φ) = P(ωk^(x) = x | ωk ≠ Φ) · P(ωk^(y) = y | ωk ≠ Φ), ∀(x, y) ∈ R². The random variable on the arcs, i.e. Xε, is defined according to P(εij = ∃ | εij ≠ Φ) = 1 and P(εij = Φ | εij ≠ Φ) = 0. In our tests we set k1 = k2 = 1/2; see Equation 3.
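Under the independence assumption, the two-dimensional node probability factorizes into the product of the two per-coordinate terms; combined with the Eq. (6) normalization, it can be sketched as follows (our function names; the μ and σ values in the usage line are placeholders):

```python
import math

def coord_prob(v, mu, sigma):
    # One-dimensional Eq. (6) ratio: N(v, mu, sigma) / N(mu, mu, sigma)
    return math.exp(-((v - mu) ** 2) / (2.0 * sigma ** 2))

def node_sem_prob(x, y, mu_x, sigma_x, mu_y, sigma_y):
    """P(w_k = (x, y) | w_k != Phi) under the independence assumption:
    the 2-D probability is the product of the per-coordinate terms."""
    return coord_prob(x, mu_x, sigma_x) * coord_prob(y, mu_y, sigma_y)

print(node_sem_prob(1.0, 2.0, 1.0, 0.5, 2.0, 0.5))  # 1.0 at the mean position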
A Structural and Semantic Probabilistic Model 171

Fig. 6. Graphical representations of all the nodes' random variables Xω of the SDRGs. The left image represents letter I and the right one letter X. Both were synthesised using the low-distortion training set.

Fig. 6 shows the nodes' random variables Xω for two SDRGs that represent letter I (left) and letter X (right), synthesised using the low-distortion training set. On the right-hand side of each node, we show pω. In the case of letter I, we appreciate two nodes with low variance (high peaks) whose means are situated in the expected positions. Nevertheless, we appreciate another two nodes with high variance (low peaks) that seem to model the distortion of the training set. In the case of letter X, we appreciate four clear nodes (n1, n2, n3, n4), again in the expected positions, with low variance, and two high-variance nodes generated by the distortion (n5, n6). Finally, Fig. 7 shows the set elements of S for letter I.
In the incremental-synthesis algorithm [8], each new graph G is compared to the
current SDRG and a labelling between both is obtained. Using this labelling, the
SDRG is updated to incorporate G. Fig. 8 shows the evolution of the match

Fig. 7. S elements of the SDRGs that represent letter I

[Figure: two line plots, "Learning process for A (high)" (left) and "Learning process for F (high)" (right), plotting match probability against training example number.]

Fig. 8. Evolution of the learning process



probability for the construction of two SDRGs¹, letter A (left) and letter F (right). We can see that, as the learning process moves forward, the probability of the next element tends to increase. This tendency could be explained because, when new elements of the training set are incorporated into the SDRG, the model contains more information about the class.

Table 3. Compression rate using the low-distortion (left) and high-distortion (right) databases

Letter         Low database rate    Letter      High database rate
H,E,Z,F        (70-75]%             E,H         approx. 0%
K,T,N,Y        (75-80]%             A,M,F,K,W   (10-20]%
X,W,M,A,V,L    (80-85]%             X,T,Y,N     27%, 39%, 55%, 59%
I              92%                  Z,V,L,I     61%, 71%, 80%, 92%

We define the compression rate as the number of graphs in the set S with respect to the number of graphs that the SDRG contains, i.e. the number of Ai's. In our method, the computational time in the recognition process is proportional to the number of elements in S. In a classical nearest-neighbour method, it is proportional to the number of elements that represent the set. For this reason, it is important to evaluate the achieved compression rate. Table 3 shows the compression rate for the low- and high-distortion databases. The compression in the low database is clearly considerable. Nevertheless, in the high database, two letters achieve nearly zero compression. This is due to the fact that the training set elements are structurally very different. Finally, Table 4 shows the classification ratio of our method compared to 5 other methods reported in [5].

Table 4. Classification rate of 5 methods reported in the literature and our method

        k-NN(Graph)  Prot.-SVM  PCA-SVM  LDA-SVM  P. Voting  SDRG
LOW     98.3         98.5       98.5     99.1     98.3       98.9
HIGH    90.1         92.9       93.7     94.0     94.3       64.3

5 Conclusions and Further Work


We have presented a structural and probabilistic model for representing a set of attributed graphs. The new model has the advantage of bringing together statistical and structural techniques to keep, to the greatest extent, the knowledge of the training set. The match probability of an attributed graph with respect to the model is directly used as a dissimilarity function, without the need to apply edit costs, which are application dependent. The results of the experimental validation show that our model obtains high recognition-ratio results when the elements of the set have low distortion. Nevertheless, with high distortion levels, the element-to-element recognition algorithms seem to obtain better results. Besides the recognition-ratio results, our method only needs to perform a few comparisons for each class; when the number of graphs in the training set is high, this results in an important run-time reduction.

¹ For these examples, we used the high-distortion databases.
Our future work will compare the model with FDGs and SORGs. Moreover, we want to study statistical techniques for node reduction and analyze their impact on the recognition ratio and the run time. From a practical point of view, we want to test our method using other databases and analyze the degree of dependence on the training set's distortion.

Acknowledgements

This research was partially supported by the Consolider Ingenio 2010 project CSD2007-00018, by the CICYT project DPI 2007-61452 and by the Universitat Rovira i Virgili (URV) through a predoctoral research grant.

References
1. Riesen, K., Bunke, H.: Graph Database Repository for Graph Based Pattern Recognition
and Machine Learning. In: SSPR 2008 (2008)
2. Serratosa, F., René, A., Sanfeliu, A.: Function-described graphs for modelling objects represented by sets of attributed graphs. Pattern Recognition 36, 781–798 (2003)
3. Wong, A.K.C., You, M.: Entropy and distance of random graphs with application to struc-
tural pattern recognition. IEEE Trans. PAMI 7, 599–609 (1985)
4. Serratosa, F., Alquézar, R., Sanfeliu, A.: Estimating the Joint Probability Distribution of
Random Vertices and Arcs by means of Second-order Random Graphs. In: Caelli, T.M.,
Amin, A., Duin, R.P.W., Kamel, M.S., de Ridder, D. (eds.) SPR 2002 and SSPR 2002.
LNCS, vol. 2396, pp. 252–262. Springer, Heidelberg (2002)
5. Bunke, H., Riesen, K.: A Family of Novel Graph Kernels for Structural Pattern Recogni-
tion. In: Rueda, L., Mery, D., Kittler, J. (eds.) CIARP 2007. LNCS, vol. 4756, pp. 20–31.
Springer, Heidelberg (2007)
6. Sanfeliu, A., Serratosa, F., Alquézar, R.: Second-Order Random Graphs For Modeling Sets
Of Attributed Graphs And Their Application To Object Learning And Recognition.
IJPRAI 18(3), 375–396 (2004)
7. Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty Years Of Graph Matching In Pattern
Recognition. IJPRAI 18(3), 265–298 (2004)
8. Serratosa, F., Alquézar, R., Sanfeliu, A.: Synthesis of Function-Described Graphs and Clus-
tering of Attributed Graphs. IJPRAI 16(6), 621–656 (2002)
Arc-Consistency Checking with Bilevel
Constraints: An Optimization

Aline Deruyver¹ and Yann Hodé²

¹ LSIIT, UMR 7005 CNRS-ULP, 67 000 Strasbourg
aline.deruyver@libertysurf.fr
² Centre Hospitalier, G08, 68 250 Rouffach

Abstract. Arc-consistency checking has been adapted to be able to interpret over-segmented images. This adaptation led to the arc-consistency algorithm with bilevel constraints. In this paper we propose an optimization of this algorithm. This new way to solve arc-consistency checking with bilevel constraints opens the possibility of parallelizing the algorithm. Some experiments show the efficiency of this approach.

1 Introduction

In the framework of image interpretation, we adapted the arc-consistency checking algorithm AC4 to the problem of non-bijective matching. This algorithm has been called AC4BC: arc-consistency checking algorithm with bilevel constraints. However, the process proposed by AC4BC can be time consuming when the number of labels (segmented regions) and the number of arcs of the conceptual graph are large. Several arc-consistency algorithms show interesting theoretical and practical optimality properties [1,2,3,4,5,6,7]. However, they do not meet our needs because all these approaches propose a solution for bijective matching. In this paper we propose an optimization of the AC4BC algorithm. This improvement makes the algorithm much faster and leads to the possibility of parallelizing it. This paper is organized as follows: Section 2 describes the notation used in this paper and recalls the basic definitions. Section 3 describes the improvement of the algorithm. Section 4 shows the efficiency of this improvement with a large number of experiments, and Section 5 states the conclusion of this work.

2 Basic Notions

We use the following conventions:


Variables are represented by the natural numbers 1, ..., n. Each variable i has an associated domain Di.
Within our framework, a variable corresponds to a high-level label that we wish to attach to regions, which correspond to the values within the arc-consistency framework. All constraints are binary and relate two distinct variables. A constraint relating two variables i and j is denoted by Cij.

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 174–183, 2009.

© Springer-Verlag Berlin Heidelberg 2009

Cij(v, w) is the Boolean value obtained when variables i and j are replaced by values v and w respectively. ¬Cij denotes the negation of the Boolean value Cij. Let R be the set of these constraining relations. We use D to denote the union of all domains and d the size of D.
A finite-domain constraint satisfaction problem consists of finding all tuples of values (a1, ..., an) ∈ D1 × ... × Dn for the variables (1, ..., n) satisfying all relations belonging to R.
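The classical definition above can be illustrated by brute-force enumeration. This is a hedged sketch for intuition only (the representation of domains and constraints as lists and predicates is our convention), not an efficient solver:

```python
from itertools import product

def fdcsp_solutions(domains, constraints):
    """Brute-force sketch of a classical binary FDCSP:
    domains     -- list [D1, ..., Dn]
    constraints -- dict {(i, j): C_ij} where C_ij(v, w) is Boolean
    Returns all value tuples in D1 x ... x Dn satisfying every C_ij."""
    return [
        assign
        for assign in product(*domains)
        if all(c(assign[i], assign[j]) for (i, j), c in constraints.items())
    ]

# Toy instance: two variables related by a "strictly less than" constraint
sols = fdcsp_solutions([[1, 2], [1, 2, 3]], {(0, 1): lambda v, w: v < w})
print(sols)  # [(1, 2), (1, 3), (2, 3)]
```

Arc-consistency algorithms such as AC4 avoid exactly this exponential enumeration by pruning unsupported values first.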
In this classical definition of FDCSP, one variable is associated with one value. This assumption cannot hold for some classes of problems where we need to associate a variable with a set of linked values, as described in [8]. We call this problem the Finite-Domain Constraint Satisfaction Problem with Bilevel Constraints (FDCSP_BC). In this problem we define two kinds of constraints: the binary inter-node constraints Cij between two nodes and the binary intra-node constraints Cmpi between two values that could be associated with the node i. Then, the problem is defined as follows:

Definition 1. Let Cmpi be a compatibility relation, such that (a, b) ∈ Cmpi iff a and b are compatible.
Let Cij be a constraint between i and j. Let us consider a pair Si, Sj such that Si ⊂ Di and Sj ⊂ Dj. Si, Sj |= Cij means that (Si, Sj) satisfies the oriented constraint Cij. Within the image analysis framework, the sets Si and Sj contain sets of segmented regions.

Si, Sj |= Cij ⇔ ∀ai ∈ Si, ∃(a′i, aj) ∈ Si × Sj such that (ai, a′i) ∈ Cmpi and (a′i, aj) ∈ Cij.

The sets {S1 ... Sn} satisfy FDCSP_BC iff ∀Cij: Si, Sj |= Cij.
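Definition 1 is directly executable. The following is a hedged sketch (our function and variable names; Cmpi and Cij are modelled as Boolean predicates) of the satisfaction test for a single pair of sets:

```python
def satisfies(Si, Sj, Cmp_i, C_ij):
    """Sketch of Definition 1: Si, Sj |= C_ij iff every ai in Si has some
    compatible a'i in Si (via Cmp_i) that is itself supported by an aj in Sj
    (via C_ij); ai and a'i may coincide."""
    return all(
        any(Cmp_i(ai, ap) and C_ij(ap, aj) for ap in Si for aj in Sj)
        for ai in Si
    )

# Toy check: "a" is directly supported; "b" is only supported through "a"
Cmp = lambda u, v: (u, v) in {("a", "a"), ("b", "a"), ("b", "b")}
Cij = lambda u, v: (u, v) == ("a", "x")
print(satisfies({"a", "b"}, {"x"}, Cmp, Cij))  # True
```

With a purely reflexive compatibility relation the same call would fail, which is exactly the bilevel effect: "b" survives only because Cmpi links it to the supported value "a".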

We associate a graph G with a constraint satisfaction problem in the following way:
(1) G has a node i for each variable i. (2) A directed arc (i, j) is associated with each constraint Cij. (3) Arc(G) is the set of arcs of G and e is the number of arcs in G. (4) Node(G) is the set of nodes of G and n is the number of nodes in G.

2.1 Arc-Consistency Problem with Bilevel Constraints

The classical arc-consistency algorithm cannot classify a set of data in a node of the graph as we would like to do in an over-segmented image interpretation task. We thus define a class of problems called arc-consistency problems with bilevel constraints (ACBC). It is associated with the FDCSP_BC (see Definition 1) and is defined as follows:

Definition 2. Let (i, j) ∈ arc(G). Let P(Di ) be the set of sub parts of the
domain Di . Arc (i,j) is arc consistent with respect to P(Di ) and P(Dj ) iff ∀Si ∈
P(Di ) ∃Sj ∈ P(Dj ) such that ∀v ∈ Si ∃t ∈ Si , ∃w ∈ Sj Cmpi (v, t) and Cij (t, w).
(v and t could be identical)

The definition of an arc consistent graph becomes:



Definition 3. Let P(Di ) be the set of sub parts of the domain Di . Let P=P(D1 )×
.... ×P(Dn ). A graph G is arc-consistent with respect to P iff ∀(i, j) ∈ arc(G): (i,j)
is arc-consistent with respect to P(Di ) and P(Dj ).

The purpose of an arc-consistency algorithm with bilevel constraints is, given a graph G and a set P, to compute P′, the largest arc-consistent domain with bilevel constraints for G in P.
In our framework the set P contains the sets of segmented regions which satisfy the constraints imposed by the conceptual graph.

2.2 Arc Consistency Algorithm with Bilevel Constraints(AC4BC )

AC4BC was derived from the AC4 algorithm proposed by Mohr and Henderson in 1986 [3,1] to solve the ACBC problem (see [8] and [9] for the details of the algorithm).
For AC4BC, a new definition of a node i belonging to node(G) is given. A node is made up of a kernel Di and a set of interfaces Dij associated with each arc which comes from another linked node (see Fig. 1.a). In addition, an intra-node compatibility relation Cmpi (see Section 2.1) is associated with each node of the graph. It describes the semantic link between different subparts of an object which could be associated with the node. The intra-node constraint Cmpi can be a spatial or a morphological constraint, as shown in Fig. 1.b.

Definition 4. Let i ∈ node(G), then Di is the domain corresponding to the


kernel of i and the set Ii ={ Dij | (i, j) ∈ arc(G) } is the set of interfaces of i.

Fig. 1. a. Structure of a node with bilevel constraints. The constraint Cmpi links regions classified inside the node i. If a region does not belong to an interface Dij but satisfies the constraint Cmpi with another region belonging to Dij, then this region is kept inside Di. b. The values α, β and γ (segmented regions) can be associated with the node i representing a conceptual object. In this example α ∈ Dik, β ∈ Dik, γ ∈ Dik and α ∈ Dij, β ∉ Dij, γ ∉ Dij. In a classical arc-consistency checking algorithm, the values β and γ would be removed from the node i because they are not supported by other regions. Thanks to the intra-node constraint Cmpi, β and γ can be kept in the node i because a path can be found between the value α and the values β and γ.

begin AC4BC
Step 1: Construction of the data structures.
1  InitQueue(Q);
2  for each i ∈ node(G) do
3    for each b ∈ Di do
4    begin
5      S[i,b] := empty set;
6    end;
7  for each (i, j) ∈ arc(G) do
8    for each b ∈ Dij do
9    begin
10     Total := 0;
11     for each c ∈ Dj do
12       if Cij(b, c) then
13       begin
14         Total := Total + 1;
15         S[j,c] := S[j,c] ∪ (i,b);
16       end
17     Counter[(i,j),b] := Total;
18     if Total = 0 then
19       Dij := Dij − {b};
20   end;
21 for each i ∈ node(G) do
22   for each Dij ∈ Ii do
23   begin
24     CleanKernel(Di, Dij, Ii, Q);
25   end

Fig. 2. The AC4BC algorithm: step 1. Figure 4 describes the procedure CleanKernel.

Step 2: Pruning the inconsistent labels

26 While not Emptyqueue(Q) do
27 begin
28   Dequeue(i,b,Q);
29   for each (j, c) ∈ S[i, b] do
30   begin
31     Counter[(j,i),c] := Counter[(j,i),c] − 1;
32     if Counter[(j,i),c] = 0 then
33     begin
34       Dji := Dji − {c};
35       CleanKernel(Dj, Dji, Ij, Q);
36     end;
37   end;
38 end AC4BC;

Fig. 3. The AC4BC algorithm: step 2



begin CleanKernel(in Di, Dij, Ii; out Q)
1  begin
2    R := Dij;
3    while (SearchSucc(Di, R, Cmpi, S)) do
4    begin
5      R := R ∪ S;
6    end
7    for each b ∈ Di − R do
8    begin
9      EnQueue(i,b,Q);
10     for each Dij ∈ Ii do
11       Dij := Dij − {b};
12   end
13   Di := R;
14 end;

Fig. 4. The procedure CleanKernel

As in algorithm AC4, the domains Di are initialized with values satisfying unary node constraints. The algorithm is decomposed into two main steps: an initialization step (see the pseudo code in Fig. 2) and a pruning step which updates the nodes as a function of the removals made by the previous step to keep the arc-consistency (see the pseudo code in Fig. 3). However, whereas in AC4 a value was removed from a node i if it had no direct support, in AC4BC a value is removed if it has no direct support and no indirect support obtained by using the compatibility relation Cmpi. This uses an additional step which is called the cleaning step (see the pseudo code in Fig. 4).

Theorem 1. The time complexity of the cleaning step is in O(ed) in the worst case, where e is the number of edges and d is the size of D.

Fig. 5. The systolic process



Proof: We introduce the function SearchSucc(in Di, R, Cmpi; out S), which looks for successors of elements of Di in the set R by using the relation Cmpi. Each new successor is marked so that successors already encountered will not be considered again. This function is repeated until no new successor can be found. Since the size of R is bounded by d, the time complexity of lines 3-6 is at most d. The number of interfaces Dij to check is at most equal to e. Then, the time complexity of lines 7-12 is in O(ed). Finally, the time complexity of CleanKernel is in O(ed).
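The fixpoint computed by CleanKernel (Fig. 4, lines 2-6 and 13) can be sketched in a few lines. This is a hedged illustration with our own names: Cmpi is modelled as a Boolean predicate, and the marking done by SearchSucc is replaced by a simple worklist loop:

```python
def clean_kernel(Di, Dij, Cmp_i):
    """Sketch of the CleanKernel fixpoint: start from the interface Dij and
    repeatedly add elements of Di that are Cmp_i-compatible with something
    already reached. The new kernel is R; every element of Di - R is
    removed (and, in the real algorithm, enqueued on Q)."""
    R = set(Dij)
    changed = True
    while changed:                       # plays the role of the SearchSucc loop
        changed = False
        for b in set(Di) - R:
            if any(Cmp_i(r, b) for r in R):
                R.add(b)
                changed = True
    return R, set(Di) - R

# Toy run: alpha sits in the interface, beta is reachable from alpha through
# Cmp_i, gamma is not reachable and therefore leaves the kernel.
Cmp = lambda u, v: {u, v} <= {"alpha", "beta"}
kernel, removed = clean_kernel({"alpha", "beta", "gamma"}, {"alpha"}, Cmp)
print(sorted(kernel), sorted(removed))  # ['alpha', 'beta'] ['gamma']
```

Each element of Di enters R at most once, which is the marking argument behind the O(ed) bound above.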
Theorem 2. The time complexity of AC4BC is in O(e²d²) in the worst case, where e is the number of edges and d is the size of D.
Proof: The time complexity of lines 1-20 is in O(ed²). In line 21 the procedure CleanKernel is called e times, so the time complexity of lines 21-25 is in O(e²d). Thus the time complexity of the initialization step is in O(ed² + e²d). Line 31 is executed at most ed² times. The test of line 32 is true at most ed times, so CleanKernel is executed at most ed times. The time complexity of lines 26-37 is therefore in O(e²d²). Then the overall time complexity is in O(ed² + e²d + e²d²), which is bounded by O(e²d²) in the worst case.

2.3 Weakness of the Algorithm

The key point of the time complexity is the call to the procedure CleanKernel. Reducing the number of calls will reduce the computation time. AC4BC is derived from AC4. In the pruning step, each time an element is removed from the Queue, the algorithm tries to refill the Queue before emptying it. This strategy is costly because it implies many unnecessary calls to the procedure CleanKernel which produce only few effects. One removal in an interface has only a small chance of producing a change in the domain Di of the kernel of the node. We stated previously that the complexity of the procedure CleanKernel is in O(ed). In fact this complexity can be stated more accurately as e·d_i^t, where d_i^t is the size of the domain Di at the time t of the algorithm. The less quickly the size of Di decreases, the more slowly the algorithm runs.

3 Optimization of the AC4BC Algorithm: A Systolic Solution

To avoid unnecessary calls to the procedure CleanKernel, we propose to manage the structure Queue in a different way. The CleanKernel procedure is not called as long as it is possible to remove some labels from the interfaces. The implemented process is systolic: the Queue is completely emptied before being refilled. The algorithm of the new pruning step can be described as follows:

1. First, the Queue is filled with the labels removed in the initialization step, as in the previous version of AC4BC;
2. Second, the Queue is emptied;
3. Next, the procedure CleanKernel is called for each node having at least one label removed. This step refills the Queue. Then steps 2 and 3 are repeated until no removal is possible (see Fig. 5).
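The scheduling discipline of these steps can be sketched as follows. The two callbacks `propagate` and `clean_kernel` are placeholders of our own standing in for the counter and interface bookkeeping of Figs. 3 and 4; only the systolic schedule itself (drain the whole Queue, then one CleanKernel pass per touched node) is the point here:

```python
from collections import deque

def systolic_prune(queue, nodes, propagate, clean_kernel):
    """Sketch of the systolic schedule (steps 1-3 above, our helper names):
    propagate(item)  -- nodes whose interfaces lose a label when item is dequeued
    clean_kernel(i)  -- labels newly removed from node i's kernel
    Returns the number of systolic cycles performed."""
    tabnode = {i: False for i in nodes}
    cycles = 0
    while queue:
        cycles += 1
        while queue:                      # step 2: drain the Queue completely
            for j in propagate(queue.popleft()):
                tabnode[j] = True
        for i in nodes:                   # step 3: one CleanKernel per touched node
            if tabnode[i]:
                tabnode[i] = False
                queue.extend(clean_kernel(i))
    return cycles

# Toy chain: removing a label at node k makes node k+1 drop one label too,
# so the removals ripple through in one systolic cycle per node.
removed = set()
def propagate(item):
    return [item + 1] if item + 1 < 4 else []
def clean_kernel(i):
    if i in removed:
        return []
    removed.add(i)
    return [i]

cycles = systolic_prune(deque([0]), range(4), propagate, clean_kernel)
print(cycles)  # 4
```

Note that CleanKernel runs once per touched node per cycle, rather than once per individual removal as in the original pruning step.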
In order to do that, a Boolean array Tabnode, with a size equal to the number of nodes, is updated each time at least one removal has been made in a node. Then, Tabnode[i] is equal to true if at least one label has been removed from the node i. This array is initialized to false before the beginning of the pruning step. This array makes it possible to know which nodes have to be updated by the procedure CleanKernel. This procedure is called only if necessary, after the Queue has been emptied and all the interfaces of all the nodes have been studied. The pseudo code of the pruning step of the optimized version of AC4BC, called OAC4BC, is given in Fig. 6.

Step 2: Pruning the inconsistent labels

26 for each i ∈ Node(G) do
27   Tabnode[i] := false;
28 remove := true;
29 While remove = true do
30 begin
31   remove := false;
32   While not Emptyqueue(Q) do
33   begin
34     Dequeue(i,b,Q);
35     for each (j, c) ∈ S[i, b] do
36     begin
37       Counter[(j,i),c] := Counter[(j,i),c] − 1;
38       if Counter[(j,i),c] = 0 then
39       begin
40         Dji := Dji − {c};
41         Tabnode[j] := true;
42       end;
43     end;
44   for each i ∈ Node(G) do
45   begin
46     if Tabnode[i] = true then
47     begin
48       Tabnode[i] := false;
49       for each Dij ∈ Ii do
50       begin
51         remove := CleanKernel(Di, Dij, Ii, Q);
52         if remove = true then Tabnode[i] := true
53       end
54     end
55   end
56 end
57 end OAC4BC;

Fig. 6. The OAC4BC algorithm: step 2



Fig. 7. The experiments show that the time complexity of OAC4BC is on average better than the time complexity of AC4BC

4 Experiments
Reducing the number of calls to CleanKernel will reduce the computation time of the arc-consistency checking. However, we can imagine that in some cases the structure Queue can only be filled with few elements. Then, the gain may be lost by a change in the scanning order of the labels: it may lead to working first with labels whose removal has little effect on the other labels. The worst-case time complexities of AC4BC and its optimized version OAC4BC are the same. However, it is interesting to study the gain of the optimized algorithm on experimental data.

4.1 Application to a Set of Water Meter Images


In this application the aim is to localize the water meter in the image in order to detect whether or not it is broken, to recognize the type of water meter (analog or numeric) and to read the numerical value displayed on it if there is one. These images are very noisy and, after applying a watershed algorithm, contain a large number of segmented regions. The conceptual graph describing the water meter is very complex (it contains 142 edges and 24 nodes) because the grey level values are not significant and it was necessary to describe in detail the spatial relations between each subpart of the object and the morphological characteristics of each subpart. Our approach has been applied with success on a set of 26 images to localize the frame and the center of the water meter. Fig. 8 presents 7 labeled images. In that framework, during the arc-consistency checking of each image, the number of labels removed from interfaces when the queue is emptied and the number of calls to the procedure CleanKernel needed to obtain the largest arc-consistent domain are recorded at each systolic cycle. We compare:
– the number of removals from interfaces, x, which gives an idea of the number of calls to CleanKernel in the non-optimized version of AC4BC;
– the number of calls to CleanKernel, y, in the optimized version OAC4BC.

Fig. 8. Interpretation of water meter images. a: original images; b: images segmented with a watershed algorithm; c: detection of the frame and the center of the water meter. The object of these images does not have the morphological characteristics described by the conceptual graph. (The original images are supplied by the company "Véolia".)

If the optimization changed the time cost by a constant scaling factor, x/y should be constant for any x. Fig. 7 shows that this is not true. The correlation between x and x/y (Spearman coefficient r = 0.93, p < 0.0001) is very strong. It means that the higher x is, the higher the gain x/y is. This result suggests that the average time complexity of OAC4BC is better than the time complexity of AC4BC, at least with our set of test images.
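The rank-correlation argument can be reproduced in a few lines. The data below are a synthetic stand-in of ours (the 26 measured (x, x/y) pairs are not listed in the paper); the Spearman coefficient is computed with the standard no-ties formula:

```python
def spearman(a, b):
    """Spearman rank correlation via the no-ties formula
    rho = 1 - 6 * sum(d_k^2) / (n * (n^2 - 1))."""
    rank = lambda s: {v: i for i, v in enumerate(sorted(s))}
    ra, rb = rank(a), rank(b)
    d2 = sum((ra[u] - rb[v]) ** 2 for u, v in zip(a, b))
    n = len(a)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))

# Synthetic stand-in: x = removals from interfaces, y = CleanKernel calls
# in OAC4BC; the gain x/y grows with x, as observed in Fig. 7.
x = [120, 450, 900, 1800, 3200, 5000]
y = [60, 150, 220, 300, 380, 420]
gain = [xi / yi for xi, yi in zip(x, y)]

print(spearman(x, gain))  # 1.0 for a perfectly monotone relationship
```

A coefficient near 1 on the real measurements is what supports the claim that the relative gain of OAC4BC grows with problem size.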

5 Conclusion and Discussion

The optimized version of the AC4BC algorithm, called OAC4BC, has two advantages:

– It gives the possibility to apply our approach to images with more than 800 segmented regions and with a conceptual graph containing 142 edges. These experiments would not be possible without this optimization. Thus it makes it possible to apply our approach to real, complex problems.
– It gives the possibility to envisage the parallelization of our algorithm. In that case each node can be considered as an individual process. Each node is updated separately (see lines 45-55 of Fig. 6). The nodes can be updated in one step in parallel. The consequences of this updating can be sent to the other nodes in a second step (see lines 32-43 of Fig. 6). Such a parallel implementation could be made in the context of GPU programming.

Acknowledgment

We thank the company "Véolia" for having supplied us with the set of water meter images.

References
1. Bessière, C.: Arc-consistency and arc-consistency again. Artificial Intelligence 65, 179–190 (1994)
2. Kokèny, T.: A new arc consistency algorithm for csps with hierarchical domains. In:
Proceedings 6th IEEE International Conference on Tools for Artificial Intelligence,
pp. 439–445 (1994)
3. Mohr, R., Henderson, T.: Arc and path consistency revisited. Artificial Intelli-
gence 28, 225–233 (1986)
4. Mohr, R., Masini, G.: Good old discrete relaxation. In: Proceedings ECAI 1988, pp.
651–656 (1988)
5. Hentenryck, P.V., Deville, Y., Teng, C.: A generic arc-consistency algorithm and its
specializations. Artificial Intelligence 57(2), 291–321 (1992)
6. Mackworth, A., Freuder, E.: The complexity of some polynomial network consis-
tency algorithms for constraint satisfaction problems. Artificial Intelligence 25, 65–
74 (1985)
7. Freuder, E., Wallace, R.: Partial constraint satisfaction. Artificial Intelligence 58,
21–70 (1992)
8. Deruyver, A., Hodé, Y.: Constraint satisfaction problem with bilevel constraint:
application to interpretation of over segmented images. Artificial Intelligence 93,
321–335 (1997)
9. Deruyver, A., Hodé, Y.: Qualitative spatial relationships for image interpretation by using a conceptual graph. Image and Vision Computing (2008) (to appear)
Pairwise Similarity Propagation Based Graph Clustering for Scalable Object Indexing and Retrieval

Shengping Xia¹,² and Edwin R. Hancock²

¹ ATR Lab, School of Electronic Science and Engineering, National University of Defense Technology, Changsha, Hunan, P.R. China 410073
² Department of Computer Science, University of York, York YO10 5DD, UK

Abstract. Given a query image of an object of interest, our objective is to retrieve all instances of that object with high precision from a database of scalable size. As distinct from bag-of-feature based methods, we do not regard descriptor quantizations as "visual words". Instead, a group of selected SIFT features of an object, together with their spatial arrangement, is represented by an attributed graph. Each graph is then regarded as a "visual word". We measure the similarity between graphs using the similarity of SIFT features and the compatibility of their arrangement. Using this similarity measure we efficiently identify the set of K nearest neighbor graphs (KNNG) using a SOM based clustering tree. We then extend the concept of "query expansion", widely used in text retrieval, to develop a graph clustering method based on pairwise similarity propagation (SPGC), in which the trained KNNG information is utilized for speeding up. Using the SOM based clustering tree and SPGC, we develop a framework for scalable object indexing and retrieval. We illustrate these ideas on a database of over 50K images spanning more than 500 objects. We show that the precision is substantially boosted, achieving total recall in many cases.

1 Introduction
In this paper we aim to develop a framework for indexing and retrieving objects of
interest where large variations of viewpoint, background structure and occlusions are
present. State-of-the-art methods for object retrieval from large image corpora rely on
variants of the "Bag-of-Feature (BoF)" technique [2][7][13]. According to this method-
ology, each image in the corpus is first processed to extract high-dimensional feature
descriptors. These descriptors are quantized or clustered so each feature is mapped to a
"visual word" in a relatively small discrete vocabulary. The corpus is then summarized
using an index where each image is represented by the visual words contained within it.
At query time, the system is presented with a query in the form of an image region. This
region is itself processed to extract feature descriptors that are mapped onto the visual
word vocabulary, and these words are used to index the query. The response set of the
query is a set of images from the corpus that contain a large number of visual words
in common with the query region. These response images may be ranked subsequently

Corresponding author.

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 184–194, 2009.

© Springer-Verlag Berlin Heidelberg 2009

using spatial information to ensure that the response and the query not only contain similar features, but that the features occur in compatible spatial configurations [6][9][14]. However, recent work [2][7] has shown that these methods can suffer from poor recall when the object of interest appears with large variations of viewpoint, variation in background structure and under occlusion.
The work reported in [2][7][8] explores how to derive a better latent object model using a generalization of the concept of query expansion, a well-known technique from the field of text based information retrieval [1][11]. In text-based query expansion, a number of the highly ranked documents from the original response set are used to generate a new query, or several new queries, that can be used to obtain new response sets. The outline of the approach [2][7][8] is as follows:
Stage 1. Given a query region, search the corpus and retrieve a set of image regions that match
the query object;
Stage 2. Combine the retrieved regions, along with the original query, to form a richer latent
model of the object of interest;
Stage 3. Re-query the corpus using this expanded model to retrieve an expanded set of match-
ing regions;
Stage 4. Repeat the process as necessary, alternating between model refinement and
re-querying.
In Stage 1, a BoF based method is used to retrieve a set of initial images. In Stage
2, the initially returned result list is re-ranked by estimating affine homographies be-
tween the query image and each of the top-ranking results from the initial query. The
score used in re-ranking is computed from the number of verified inliers for each result.
According to the top ranked images, a richer latent model is formed and is re-issued
as a new query image in Stage 3. To generate the re-queries, five alternative query ex-
pansion methods are proposed [2][7][8]. These include a) query expansion baseline, b)
transitive closure, c) average query expansion, d) recursive average query expansion,
and e) resolution expansion. Each method commences by evaluating the original query
Q0 composed of all the visual words which fall inside the query region. A latent model
is then constructed from the verified images returned from Q0 , and a new query Q1 ,
or several new queries, issued. These methods have achieved substantially improved
retrieval performance. However, they suffer from four major problems. In the follow-
ing paragraphs, we analyze these problems in detail and use the analysis to design an
improved search engine based on a graph-based representation.
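To make the expansion step concrete, average query expansion (method (c) above) can be sketched as follows. This is an illustrative sketch rather than the implementation of [2][7][8]: BoF vectors are assumed to be term-frequency counters over visual-word ids, and the function names are ours.

```python
from collections import Counter

def bof_vector(visual_words):
    """Term-frequency bag-of-features vector over visual-word ids."""
    return Counter(visual_words)

def average_expansion(query_words, verified_results, top_m=5):
    """Average query expansion: average the BoF vectors of the original
    query Q0 and the top-m spatially verified results, and use the
    averaged vector as the new query Q1."""
    vectors = [bof_vector(query_words)]
    vectors += [bof_vector(words) for words in verified_results[:top_m]]
    n = len(vectors)
    averaged = Counter()
    for vec in vectors:
        for word, tf in vec.items():
            averaged[word] += tf / n
    return averaged
```

Recursive average query expansion (method (d)) would simply feed the verified response of the averaged query back into `average_expansion` until the response set stabilizes.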
Image indexing. In BoF methods, the images are indexed by the quantized descriptors.
However, if we analyze the "bag-of-words (BoW)" used in text information retrieval
(TIR) and the BoF used in object indexing/retrieval (OIR), we observe that the BoF
does not operate at the same semantic level as the BoW. A word in BoW, specified as
a keyword, is a single word, a term or a phrase. Every keyword (e.g. cup or car) nor-
mally has a high level semantic meaning. However, a visual feature usually does not
possess semantic meaning. Furthermore, we observe that most of the visual words are
not object or class specific. In a preliminary experimental investigation, we have trained
a large clustering tree using over 2M selected SIFT [15] descriptors extracted from over
50K images and spanning more than 500 objects. The number of the leaf nodes in the
186 S. Xia and E.R. Hancock

clustering tree is 25334 and the mean vector of each leaf node is used as a quantized
visual word. With an increasing number of objects, a single visual word may appear in
hundreds of different objects. By contrast, a group of local features for an object
contained in an image, together with their collective spatial arrangement, usually carries
high-level semantic meaning. Moreover, such a representation is also significantly more
object or scene specific. Accordingly, the above visual word might best be regarded as
object or scene specific. Accordingly, the above visual word might best be regarded as
a morpheme in English, or a stroke or word-root in Chinese. Motivated by these obser-
vations, we propose an OIR model based on an arrangement of features for an object
and which is placed at the word-level in TIR. Since each bag of features is structured
data, a more versatile and expressive representational tool is provided by an attributed
graph [3]. Hence we represent a bag of features using an attributed graph G, and this
graph will be used for the purposes of indexing. Further details appear in Section 3.
Measuring image similarity. Provided that the graph representation is constructed
using all of the available local invariant features, then the number of local invariant
features of an image that are detected and need to be processed might be very
large. Moreover, such an approach makes the representation of shape information
redundant and poses computational difficulty in manipulating all possible features
for modeling and training. For example, one high-resolution image (e.g.
3264×2448) can be resized to many lower resolutions (e.g. 816×612, 204×153). As
a result, the number of spatially consistent inliers varies significantly, and it is difficult
to define a ranking function. If the images are not matched at comparable scales,
an object that is a sub-part of another object may have a high matching score, which
will result in significant false matches under query expansion. Hence, we represent each
image by a pyramid structure, with each grid scaled to an identical size, and then select
a subset of salient visual features that can be robustly detected and matched, using a
method for ranking SIFT features proposed in [15]. In this way, one high resolution im-
age might be represented by several graphs. For such canonical graphs, it will be much
easier to define a suitable similarity measure.
Retrieval speed. In the above method, spatial verification must also be performed for
the subsequent re-queries. As a result it may become prohibitively expensive to retrieve
from a very large corpus of images. We therefore require efficient ways to include spa-
tial information in the index, and move some of the burden of spatial matching from
the ranking stage to the training stage. We represent each image or each region of inter-
est using graphs and then compute all possible pairwise graph similarity measures. For
each graph we rank in descending order all remaining graphs in the dataset according
to the similarity measures. For each graph we then select the K best ranked graphs,
referred to as K-nearest neighbor graphs (KNNG). For retrieval, we directly use the
training result for each re-query to repeat the above query expansion process. This will
significantly decrease the time consumed in the query stage.
Ranking. In the above method, the images in the final result are in the same order in
which they entered the queue for the subsequent re-query. We argue that these images
should be re-ranked. Unfortunately, re-computing the pairwise similarity measures be-
tween the query image and each retrieved graph will be time consuming. We thus pro-
pose a similarity propagation method to approximate the similarity measure.

The outline of the remainder of this paper is as follows. In Section 2, we present some
preliminaries for our work. In Section 3, we describe how to train a search engine for
incremental indexing and efficient retrieval. We present experimental results in Section
4 and conclude the paper in Section 5.

2 Preliminaries
For an image, those SIFT [5] features that are robustly matched with the SIFT features
in similar images can be regarded as salient representative features. Motivated by this,
a method for ranking SIFT features has been proposed in [15]. Using this method, the
SIFT features of an image I are ranked in descending order according to a matching fre-
quency. We select the T best ranked SIFT features, denoted as V = {V_t, t = 1, 2, ..., T},
where V_t = ((\vec{X}_t)^T, (\vec{D}_t)^T, (U_t)^T)^T. Here, \vec{X}_t is the location, \vec{D}_t is the direction vector,
and U_t is the set of descriptors of a SIFT feature. In our experiments, T is set to 40. If
there are less than this number of feature points present then all available SIFT features
in an image are selected. We then represent the selected SIFT features in each image
using an attributed graph.
Formally, an attributed graph G [3] is a 2-tuple G = (V, E), where V is the set of
vertices and E ⊆ V × V is the set of edges. For each image, we construct a Delaunay graph G
using the coordinates of the selected SIFT features. In this way, we can obtain a set of
graphs G = {Gl, l = 1, 2, ..., N} from a set of images.
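A minimal sketch of this graph construction, assuming SciPy is available and that the T best ranked feature locations have already been selected (the ranking method of [15] is not reproduced here):

```python
import numpy as np
from scipy.spatial import Delaunay

def delaunay_edges(points):
    """Edge set E of the Delaunay graph over 2-D feature locations: each
    triangle of the triangulation contributes its three undirected edges."""
    tri = Delaunay(np.asarray(points, dtype=float))
    edges = set()
    for a, b, c in tri.simplices:
        for u, v in ((a, b), (b, c), (a, c)):
            edges.add((min(u, v), max(u, v)))  # store undirected edges once
    return edges
```

Attaching the SIFT descriptor of each selected feature to its vertex then yields the attributed graph G = (V, E) used for indexing.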
We perform pairwise graph matching (PGM) with the aim of finding a maximum
common subgraph (MCS) between two graphs Gl and Gq , and the result is denoted as
MCS (Gl ,Gq ). In general, this problem has been proven to be NP-hard. Here we use
a Procrustes alignment procedure [12] to align the feature points and remove those
features that do not satisfy the spatial arrangement constraints.
Suppose that Xl and Xq are respectively the position coordinates of the selected fea-
tures in graphs Gl and Gq . We can construct a matrix
Z = arg min_Ω ‖Xl · Ω − Xq‖_F , subject to Ω^T · Ω = I. (1)
where ‖·‖_F denotes the Frobenius norm. The norm is minimized by the nearest orthog-
onal matrix
Z* = Ψ · Υ*, subject to Xl^T · Xq = Ψ · Σ · Υ*. (2)
where Ψ · Σ · Υ* is the singular value decomposition of the matrix Xl^T · Xq. The goodness-
of-fit criterion is the root-mean-squared error, denoted as e(Xl, Xq). The best case is
e(Xl, Xq) = 0. The error e can be used as a measure of geometric similarity between the
two groups of points. If we discard the i-th pair of points from Xl and Xq, yielding Xl→i
and Xq→i, we can compute e(Xl→i, Xq→i) for i = 1, 2, ..., CS(Gl, Gq), where CS(Gl, Gq)
is the number of SIFT feature pairs between the two graphs initially matched using the
matching method proposed in [18]. The maximum decrease of e(Xl→i, Xq→i) is defined as
Δe(CS(Gl, Gq)) = e(Xl, Xq) − min_i {e(Xl→i, Xq→i)}. (3)
If Δe(CS(Gl, Gq))/e(Xl, Xq) > ε, e.g. ε = 0.1, the corresponding pair Xli and Xqi is
discarded as a mismatched feature pair. This leave-one-out procedure can proceed iter-
atively, and is referred to as the iterative Procrustes matching of Gl and Gq.
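Under our reading of Equations (1)–(3), this alignment step can be sketched with NumPy as below. It is a simplified illustration, not the authors' code: the point sets are assumed already paired row-by-row, the formulation is rotation-only (no centering or translation, as in the text), and the ε = 0.1 threshold follows the example above.

```python
import numpy as np

def procrustes_error(Xl, Xq):
    """RMS error after optimally rotating Xl onto Xq (Equations (1)-(2)).
    The minimizer of ||Xl @ Omega - Xq||_F over orthogonal Omega is
    Z* = Psi @ Upsilon*, where Xl.T @ Xq = Psi @ Sigma @ Upsilon* (SVD)."""
    Psi, _, Ups = np.linalg.svd(Xl.T @ Xq)
    Z = Psi @ Ups
    diff = Xl @ Z - Xq
    return np.sqrt((diff ** 2).mean())

def iterative_procrustes(Xl, Xq, eps=0.1):
    """Leave-one-out rejection: repeatedly discard the matched pair whose
    removal most reduces the alignment error, while the relative decrease
    exceeds eps (Equation (3))."""
    kappa = 0  # number of discarded (mismatched) pairs
    while len(Xl) > 2:
        e = procrustes_error(Xl, Xq)
        errs = [procrustes_error(np.delete(Xl, i, 0), np.delete(Xq, i, 0))
                for i in range(len(Xl))]
        i_best = int(np.argmin(errs))
        # the 1e-12 guard stops the loop once the fit is numerically exact
        if e > 1e-12 and (e - errs[i_best]) / e > eps:
            Xl, Xq = np.delete(Xl, i_best, 0), np.delete(Xq, i_best, 0)
            kappa += 1
        else:
            break
    return Xl, Xq, procrustes_error(Xl, Xq), kappa
```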

Given MCS (Gl , Gq ) obtained by the above PGM procedure, we construct a similarity
measure between the graphs Gl and Gq as follows:
R(Gl, Gq) = |MCS(Gl, Gq)| × ( exp(−e(Xl, Xq)) )^κ. (4)
where |MCS(Gl, Gq)| is the cardinality of the MCS of Gl and Gq, and κ is the number of
mismatched feature pairs discarded by iterative Procrustes matching, which is used to
amplify the influence of the geometric dissimilarity between Xl and Xq.
Finally, for the graph set G = {Gq, q = 1, 2, ..., N}, we obtain, for each graph Gl ∈ G
and all remaining graphs Gq ∈ G, the pairwise graph similarity measures
R(Gl, Gq) defined in Equation (4). Using the similarity measures, we rank in descending
order all graphs Gq . The K top ranked graphs are defined as the K-nearest neighbor
graphs (KNNG) of graph Gl , denoted as K{Gl }.
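Equation (4) and the KNNG selection can then be sketched as follows (the function names and the shape of the similarity map are ours, not the authors'):

```python
import math

def graph_similarity(mcs_size, e, kappa):
    """Equation (4): R = |MCS| * exp(-e)^kappa, where kappa is the number
    of pairs discarded by the iterative Procrustes matching."""
    return mcs_size * math.exp(-e) ** kappa

def knn_graphs(l, similarities, K):
    """K-nearest-neighbour graphs of graph l: the K graphs with the
    highest pairwise similarity. `similarities` maps q -> R(l, q)."""
    ranked = sorted((q for q in similarities if q != l),
                    key=lambda q: similarities[q], reverse=True)
    return ranked[:K]
```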

3 Object Indexing and Retrieval


This section explains how we train our search engine so that it can be used for object
retrieval with ease, speed and accuracy.

3.1 Obtaining KNNG Using RSOM Tree


With the increase in the size of the graph dataset, it becomes time consuming to obtain
all K{Gl } if a sequential search strategy is adopted. However, in a large graph set, most
of the pairwise graph similarity measure (PGSM) values are very low. For a single
graph Gl, if we can efficiently find a subset of the complete graph set G with
significant similarity values as a filtering
stage, then we only need to perform pairwise graph matching for this subset. To this
end, we propose a clustering tree based method.
We firstly incrementally train a clustering tree on the feature descriptors. We use the
SOM based method proposed in [16] for recursively generating a cluster tree, referred
to as RSOM tree. To obtain K{Gl } for each training graph using a trained RSOM tree
we proceed as follows. Given a graph Gl , we find the winner of the leaf nodes for each
descriptor of this graph and define the union of all graphs in those winners as follows:
UG{Gl} = { Gq | Uq^j ∈ Gq, Uq^j ∈ WL{Ul^t}, Ul^t ∈ Gl }. (5)
where WL{Ul^t} is the winner of the leaf nodes for descriptor Ul^t. The frequency of graph
Gq , denoted as Hq , represents the number of roughly matched descriptors between two
graphs. Since we aim to obtain K{Gl }, we need not process all graphs in the subsequent
stages. We rank the graphs in UG{Gl } according to decreasing frequency Hq of graph
Gq. From the ranked list, we select the first K graphs, denoted by K′{Gl}, as follows:
K′{Gl} = { Gq | Gq ∈ UG{Gl}, Hq > Hq+1, q = 1, 2, ..., K }. (6)
For each graph Gq in K′{Gl}, we obtain the similarity measure according to Equa-
tion (4) and then K{Gl } can be obtained. It is important to stress that though the code-
book vectors of those leaf nodes in an RSOM tree are a quantization of the descriptors,
we do not regard such a quantization as a bag-of-features [2][7][8]. We simply use the
RSOM tree to efficiently retrieve candidate matching graphs as shown in Equation (6).
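A sketch of this filtering stage (Equations (5)–(6)): each query descriptor votes for every graph stored in its winning leaf, and the K most-voted graphs become the candidates. Here `winner_leaf` is a hypothetical stand-in for the RSOM leaf lookup of [16].

```python
from collections import Counter

def candidate_graphs(query_descriptors, winner_leaf, K):
    """Equations (5)-(6): vote through winning RSOM leaves, keep top K.
    `winner_leaf(u)` returns the ids of the graphs stored in the leaf
    node won by descriptor u."""
    votes = Counter()  # H_q: rough-match frequency of graph q
    for u in query_descriptors:
        for q in winner_leaf(u):
            votes[q] += 1
    return [q for q, _ in votes.most_common(K)]
```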

3.2 Pairwise Similarity Propagation Based Graph Clustering


For a given similarity threshold, the siblings of Gl are defined as follows:

S{Gl} = {Gq ∈ K{Gl} | R(Gl, Gq) ≥ Rτ} ≜ S_Rτ{Gl}. (7)

∀Gl ∈ G, we can obtain the siblings S {Gl }. For each graph Gq ∈ S {Gl }, the correspond-
ing siblings can also be obtained. In this way, we can iteratively obtain a series of graphs
which satisfy consistent sibling relationships.
The graph set, obtained in this way, is referred to as a family tree of graph Gl
(FTOG). Given a graph set G, an FTOG of Gl with k generations, denoted as L{Gl, k}, is
defined as:
L{Gl, k} = L{Gl, k − 1} ∪ ⋃_{Gq ∈ L{Gl, k−1}} S_Rτ{Gq}. (8)

where, if k = 1, L{Gl, 1} = L{Gl, 0} ∪ S{Gl} and L{Gl, 0} = {Gl}; and the process stops
when L{Gl , k} = L{Gl , k + 1}. An FTOG, whose graphs satisfy the restriction defined in
Equation (7), can be regarded as a cluster of graphs. We thus refer to this process defined
in Equation (8) as pairwise similarity propagation based graph clustering (SPGC).
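The propagation of Equations (7)–(8) amounts to a breadth-first closure of the sibling relation; a sketch assuming the KNNG lists and pairwise similarities were precomputed in the training stage:

```python
def siblings(l, knng, R, R_tau):
    """Equation (7): KNNG neighbours of l whose similarity reaches R_tau."""
    return {q for q in knng[l] if R[(l, q)] >= R_tau}

def ftog(l, knng, R, R_tau):
    """Equation (8): family tree of graphs of l, grown generation by
    generation (breadth-first) until it stabilises."""
    cluster, frontier = {l}, {l}
    while frontier:
        nxt = set()
        for g in frontier:
            nxt |= siblings(g, knng, R, R_tau) - cluster
        cluster |= nxt
        frontier = nxt
    return cluster
```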

3.3 Scalable Object Retrieval and Indexing


Object Retrieval using RSOM and SPGC. Given a query graph Gl , we obtain L{Gl , k}
using Equation (8). We only need to obtain K{Gl } for the query graph in the query stage
using the method described in Section 3.1. For each re-query graph Gl , we directly take
K{Gl} from the training results. The graphs in L{Gl, k} are re-ranked as follows. Suppose
Gp ∈ K{Gl} and Gq ∈ K{Gp}; if R(Gl, Gq) has not been obtained in the training stage,
we estimate it using another similarity propagation rule defined as follows:
R(Gl, Gq) = R(Gl, Gp) × R(Gp, Gq) / |Gp|. (9)
If the generation difference between the query graph Gl and a queried graph Gq is
greater than 2, this similarity propagation rule can be iteratively used. In this way, for
each graph Gq ∈ L{Gl , k}, its corresponding R(Gl , Gq ) can be obtained. The graphs in
FTOG L{Gl , k} are then re-ranked in a descending order according to their similarity
measures to give the retrieval result.
In outline, our retrieval method has the following four steps:
Step 1. Obtain siblings of query graph Gl using the RSOM tree;
Step 2. Obtain the FTOG of Gl using Equation (8);
Step 3. Obtain similarity measures for all graphs in the FTOG using Equation (9);
Step 4. Re-rank all graphs according to their similarity measures in a descending order.
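Steps 3 and 4 above can be sketched as follows; we read |Gp| in Equation (9) as the vertex count of the intermediate graph Gp, which is an assumption on our part:

```python
def propagated_similarity(R_lp, R_pq, size_p):
    """Equation (9): estimate R(l, q) through an intermediate graph p
    without a fresh graph matching; size_p = |G_p| (vertex count)."""
    return R_lp * R_pq / size_p

def rerank(cluster, scores):
    """Step 4: descending order of (estimated or exact) similarity."""
    return sorted(cluster, key=lambda g: scores[g], reverse=True)
```

When the generation gap between query and retrieved graph exceeds two, `propagated_similarity` is simply applied again along the chain of intermediate graphs.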
In Step 1, we need to look up the winning leaf nodes for a constant number of
descriptors in a large RSOM tree; the time consumed in this process is proportional
to the logarithm of the number of training descriptors. As a result of the above pro-
cedure, we obtain a set of graphs of constant size. We need only perform pairwise
graph matching for the query graph against this graph set. The computational complex-
ity of Step 2 and Step 3 decreases significantly because we utilize the training results, i.e., the
KNNG information of each graph. The computational complexity of Step 4 is also low.
Hence the time consumed is nearly a constant for a query from even very large image
datasets.
Incremental Object Indexing. Given a graph set G and its accompanying RSOM tree,
an additional graph Gl is processed as follows:
1) If max_{Gq ∈ L{Gl, g}} R(Gl, Gq) is greater than a threshold Rτ0, we regard Gl as a dupli-
cate of Gq; the graph Gq already in the graph set is referred to as an exemplar graph.
2) If max_{Gq ∈ L{Gl, g}} R(Gl, Gq) ≤ Rτ0, Gl is incrementally added to G. For each Gq ∈
K{Gl}, K{Gq} is updated according to the descending order of the pairwise similarity
measures if needed. In addition, the descriptors of graph Gl are incrementally added
to the RSOM tree.
Although the threshold Rτ0 is set as a constant in this paper, it can also be learned
from the training data for each object in order to select a group of representative irre-
ducible graphs. These graphs act as indexing items and are analogous to the keywords
in TIR. When querying, if a graph Gq is retrieved, its duplicate graphs, if any, are ranked
in the same position as Gq.
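A sketch of this duplicate-or-exemplar decision; `best_similarity` is a hypothetical helper returning the highest similarity R over the current exemplars together with the corresponding exemplar, and the threshold value mirrors the Rτ0 = 18 used in the experiments below:

```python
def add_graph(G_new, exemplars, best_similarity, R_tau0=18.0):
    """Incremental indexing: a new graph too similar to an existing
    exemplar is recorded as its duplicate; otherwise it becomes a new
    exemplar (its descriptors would also enter the RSOM tree)."""
    R, nearest = best_similarity(G_new, exemplars)
    if R > R_tau0:
        return ("duplicate", nearest)  # ranked alongside its exemplar
    exemplars.append(G_new)
    return ("exemplar", G_new)
```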

4 Experimental Results
We have collected 53536 images, referred to as Dataset I, some examples of which are
shown in Figure 1, as training data. The data spans more than 500 objects including

(a) 50 objects in Coil 100 (b) Unlabeled sample images

(c) 8 objects in [10] (d) 10 objects collected by the authors

Fig. 1. Image data sets. a: 3600 images of 50 objects in COIL 100, labeled as A1∼A50; b: 29875
unlabeled images from many other standard datasets, e.g. Caltech101 [4] and Google images,
covering over 450 objects and used as negative samples; c: 161 images of 8 objects used in [10],
labeled as C1 to C8; d: 20000 images of 10 objects collected by us, labeled as D1 to D10. For
each of the objects in D1 to D9, we collect 1500 images which traverse a large variation of
imaging conditions, and similarly 6500 images for D10. For brevity, the 4 data sets
are denoted as A to D. The objects in Figure 1a, Figure 1c and Figure 1d are numbered from left to
right and then from top to bottom as shown in the corresponding figures, e.g. A1 to A50 in Figure
1a. As a whole, the 68 objects are also identified as Object 1 to Object 68. The above images as
a whole are referred to as Dataset I.

Fig. 2. A sample of the results returned by our method for 72 images of a car, appearing with
viewpoint variations of 0–360°, in COIL 100, achieving total recall and a precision of 1. This query
was performed on a dataset of over 50,000 images. The center image is the query image. Using
SPGC, we can obtain an FTOG containing all 72 images of the car as shown in this figure.

human faces and scenes. We take 68 objects as examples, which are identified as Object
1 to Object 68. For each of these images, we extract ranked SIFT features, using the
method presented in [15], of which at most 40 highly ranked features are selected to
construct a graph. We have collected over 2,140,000 SIFT features and 53536 graphs
for the training set. We have trained a RSOM clustering tree with 25334 leaf nodes for
the SIFT descriptors of Dataset I using the incremental RSOM training method. In this
training stage, we have obtained K{Gl } for each of the graphs of Dataset I. We set Rτ0
as 18, and 33584 graphs are selected as exemplar graphs. As a result, 9952 graphs are
indexed as duplicates of their nearest neighbors.
A sample of the results returned by our method is shown in Figure 2. Each of the
instances is recalled with a precision of 1, although the car appears with large viewpoint
changes.
We randomly select 30% of the sample graphs from the above 68 objects in Dataset I.
We use each of these graphs to obtain a query response set for each similarity threshold.
For each retrieval we compute the maximal F-measure, defined as F = 2/(1/recall + 1/precision), over
the different threshold values. The average of these maximal F-measures for each object
class are given in Table 1.
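The evaluation measure can be sketched as follows (harmonic mean of recall and precision, maximized over the threshold sweep):

```python
def f_measure(recall, precision):
    """Harmonic mean: F = 2 / (1/recall + 1/precision)."""
    if recall == 0 or precision == 0:
        return 0.0
    return 2.0 / (1.0 / recall + 1.0 / precision)

def maximal_f_measure(curve):
    """Maximum F over the (recall, precision) pairs obtained by sweeping
    the similarity threshold R_tau."""
    return max(f_measure(r, p) for r, p in curve)
```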

Table 1. F-measure f for given test set of Object 1 to 68

ID f ID f ID f ID f ID f ID f ID f ID f ID f ID f
1 1.0 2 .651 3 1.0 4 1.0 5 1.0 6 1.0 7 1.0 8 1.0 9 1.0 10 1.0
11 1.0 12 1.0 13 1.0 14 1.0 15 1.0 16 1.0 17 1.0 18 1.0 19 1.0 20 1.0
21 1.0 22 1.0 23 1.0 24 1.0 25 1.0 26 1.0 27 1.0 28 1.0 29 1.0 30 1.0
31 1.0 32 1.0 33 1.0 34 1.0 35 1.0 36 1.0 37 1.0 38 1.0 39 .619 40 1.0
41 1.0 42 1.0 43 1.0 44 1.0 45 1.0 46 1.0 47 1.0 48 1.0 49 1.0 50 1.0
51 .325 52 .350 53 .333 54 .354 55 .314 56 .364 57 .353 58 .886 59 .812 60 .868
61 .752 62 .777 63 .753 64 .734 65 .791 66 .747 67 .714 68 .975

(a) Performance using FTOG (b) Performance using KNNG (c) Performance comparison
between FTOG and KNNG

Fig. 3. Retrieval performance for Object 3. (a) Retrieval performance using our family tree of
graphs method, referred to as FTOG; (b) retrieval performance using simple K-nearest neighbor
graphs (KNNG); (c) ROC plots of the two methods, from which we can obtain an optimal operating
point where recall, precision and F-measure all achieve a value of 1; the average precision
of our method also achieves a value of 1.

From Table 1, it is clear that for most of the objects sampled under controlled imaging
conditions, the ideal retrieval performance (an F-measure of 1 or an average precision
of 1) has been achieved. This is illustrated by Figure 3. The plots of recall/precision and
similarity threshold using our FTOG based method are shown in Figure 3a. The plots
of recall/precision and similarity threshold using simple K-nearest neighbor graphs
(KNNG) are shown in Figure 3b. The ROC plots for the two methods are shown in
Figure 3c. For the FTOG method, the optimal operating point is where both recall and
precision achieve 1, while the F-measure and the average precision also achieve 1.
This means that all graphs of the object of interest can be clustered into a unique clus-
ter. Compared to the simple K-nearest neighbors based method, the retrieval perfor-
mance has been significantly improved by introducing pairwise clustering, as shown
in Figure 3c.
However, in most practical situations, the images of an object might be obtained with
large variations of imaging conditions and are more easily clustered into several FTOGs.
In this situation the average precision is usually less than 1. An example is provided
by the retrieval performance for Objects 51 to 68 shown in Table 1. In particular, for
Objects 51 to 58, the F-measure is very low because of the large variations of viewpoint.
The corresponding images are not densely sampled to form a unique cluster using our
similarity propagation based graph clustering method. The results for Objects 59 to 68
are much better since we have collected thousands of images for each of them with
continuous variations of "imaging parameters".

5 Conclusion

In this paper, we propose a scalable object indexing and retrieval framework based
on the RSOM tree clustering of feature descriptors and pairwise similarity propaga-
tion based graph clustering (SPGC). It is distinct from current state-of-the-art bag-of-
feature based methods [2][7] since we do not use a quantization of descriptors as visual
words. Instead, we represent each bag of features of an image together with their spatial
configuration using a graph. In object indexing and retrieval such graphs act in a man-
ner that is analogous to keywords in text indexing and retrieval. We extend the widely
used query expansion strategy, and propose a graph clustering technique based on pair-
wise similarity propagation. Using RSOM tree and SPGC, we implement an incremen-
tal training search engine. Since most of the computation has been transferred to the
training stage, high-precision and high-recall retrieval requires nearly constant time
per query.
We perform experiments with over 50K images spanning more than 500 objects
and these show that the instances similar to the query item can be retrieved with ease,
speed and accuracy. For some of the objects, the ideal retrieval performance (an average
precision of 1 or an F-measure of 1) has been achieved.
In our framework, if the SIFT feature extractor is implemented in C++ or on a DSP,
the RSOM tree is implemented on cluster computers [17], and multiple
pairwise graph matchings run in parallel, our system can scale to huge datasets
with real-time retrieval. We leave this research for future work.

Acknowledgments
We acknowledge financial support from the FET programme within the EU FP7, un-
der the SIMBAD project (contract 213250), and by the ATR Lab Foundation project
91408001020603.

References
1. Buckley, C., Salton, G., Allan, J., Singhal, A.: Automatic query expansion using smart. In:
TREC-3 Proc. (1995)
2. Chum, O., Philbin, J., Sivic, J., Isard, M., Zisserman, A.: Total recall: Automatic query ex-
pansion with a generative feature model for object retrieval. In: Proc. ICCV (2007)
3. Chung, F.: Spectral graph theory. American Mathematical Society, Providence (1997)
4. Li, F.F., Perona, P.: A Bayesian hierarchical model for learning natural scene categories.
CVPR 2, 524–531 (2005)
5. Lowe, D.: Local feature view clustering for 3d object recognition. CVPR 2(1), 1682–1688
(2001)
6. Nister, D., Stewenius, H.: Scalable recognition with a vocabulary tree. In: Comp. Vision and
Pattern Recognition, pp. II: 2161–2168 (2006)
7. Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Object retrieval with large vocabu-
laries and fast spatial matching. In: Proc. CVPR (2007)
8. Philbin, J., Chum, O., Isard, M., Sivic, J., Zisserman, A.: Lost in Quantization: Improving
Particular Object Retrieval in Large Scale Image Databases. In: Proc. CVPR (2008)
9. Quack, T., Ferrari, V., Van Gool, L.: Video mining with frequent itemset configurations. In:
Proc. CIVR (2006)
10. Rothganger, F., Lazebnik, S., Schmid, C., Ponce, J.: 3d object modeling and recognition
using local affine-invariant image descriptors and multi-view spatial constraints. IJCV 66(3),
231–259 (2006)
11. Salton, G., Buckley, C.: Improving retrieval performance by relevance feedback. Journal of
the American Society for Information Science 41(4), 288–297 (1990)

12. Schonemann, P.: A generalized solution of the orthogonal procrustes problem. Psychome-
trika 31(3), 1–10 (1966)
13. Sivic, J., Zisserman, A.: Video Google: A text retrieval approach to object matching in
videos. In: Proc. ICCV (October 2003)
14. Tell, D., Carlsson, S.: Combining appearance and topology for wide baseline matching. In:
Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2350, pp.
68–81. Springer, Heidelberg (2002)
15. Xia, S.P., Ren, P., Hancock, E.R.: Ranking the local invariant features for the robust visual
saliencies. In: ICPR 2008 (2008)
16. Xia, S.P., Zhang, L.F., Yu, H., Zhang, J., Yu, W.X.: Theory and algorithm of machine learning
based on rsom tree model. ACTA Electronica sinica 33(5), 937–944 (2005)
17. Xia, S.P., Liu, J.J., Yuan, Z.T., Yu, H., Zhang, L.F., Yu, W.X.: Cluster-computer based incre-
mental and distributed rsom data-clustering. ACTA Electronica sinica 35(3), 385–391 (2007)
18. Xia, S.P., Hancock, E.R.: 3D Object Recognition Using Hyper-Graphs and Ranked Local
Invariant Features. In: da Vitoria Lobo, N., et al. (eds.) SSPR+SPR 2008. LNCS, vol. 5342,
pp. 117–1126. Springer, Heidelberg (2008)
A Learning Algorithm for the Optimum-Path
Forest Classifier

João Paulo Papa and Alexandre Xavier Falcão

Institute of Computing
University of Campinas
Campinas SP, Brazil

Abstract. Graph-based approaches to pattern recognition are commonly
designed for unsupervised and semi-supervised settings. Recently, a novel
collection of supervised pattern recognition techniques based on an
optimum-path forest (OPF) computation in a feature space induced by
graphs was presented: the OPF-based classifiers. They have some
advantages with respect to the widely used supervised classifiers: they
make no assumption about the shape/separability of the classes and have
a faster training phase. Currently, there exist two versions of OPF-based
classifiers: OPFcpl (the first one) and OPFknn. Here, we introduce a
learning algorithm for the latter, and we show that a classifier can
learn from its own errors without increasing its training set.

1 Introduction
Pattern recognition techniques can be divided according to the amount of avail-
able information about the training set: (i) supervised approaches, in which we
have full information about the samples, (ii) semi-supervised ones, in which both
labeled and unlabeled samples are used for training classifiers, and (iii) unsuper-
vised techniques, where no information about the training set is available [1].
Semi-supervised [2,3,4,5] and unsupervised [6,7,8,9] techniques are commonly
represented by graphs, in which the dataset samples are the nodes and some kind
of adjacency relation needs to be established. Zahn [7] proposed to compute a
Minimum Spanning Tree (MST) over the whole graph; one can then remove
some edges in order to partition the graph into clusters. As we have a connected
acyclic graph (MST), any removed edge will make the graph a forest (a collection
of clusters, i.e., trees). These special edges are called inconsistent edges, which
can be defined according to some heuristic, e.g., an edge is inconsistent
if its weight is greater than the average weight of its neighborhood.
Certainly, this approach does not work very well in real and complex situations.
Basically, graph-based approaches aim to add or to remove edges, trying to join
or to separate the dataset into clusters [8].
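A stdlib-only sketch of Zahn's procedure: build the MST with Kruskal's algorithm, then cut the inconsistent edges. The "neighborhood" here is simply the other MST edges incident to an edge's endpoints, and the `factor` knob generalizes the "greater than the average" rule quoted above; both choices are illustrative, not Zahn's exact heuristic.

```python
def mst_edges(n, edges):
    """Kruskal's algorithm with union-find. `edges` is a list of
    (weight, u, v); returns the MST edges of a connected graph on
    vertices 0..n-1."""
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    tree = []
    for w, u, v in sorted(edges):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            tree.append((w, u, v))
    return tree

def inconsistent_edges(tree, factor=1.0):
    """An MST edge is inconsistent if its weight exceeds `factor` times
    the average weight of the other tree edges touching its endpoints;
    removing these edges splits the tree into clusters."""
    cut = []
    for w, u, v in tree:
        nbr = [w2 for (w2, a, b) in tree
               if (w2, a, b) != (w, u, v) and {a, b} & {u, v}]
        if nbr and w > factor * (sum(nbr) / len(nbr)):
            cut.append((w, u, v))
    return cut
```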
Supervised techniques use a priori information of the dataset to create optimal
decision boundaries, trying to separate the samples that share some characteris-
tic from the other ones. Most of these techniques do not make use of graphs
to model their problems, e.g., the widely used Artificial Neural Networks

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 195–204, 2009.
© Springer-Verlag Berlin Heidelberg 2009
196 J.P. Papa and A.X. Falcão

using Multilayer Perceptrons (ANN-MLP) [10] and Support Vector Machines


(SVM) [11]. An ANN-MLP, for example, can address linearly and piecewise lin-
early separable feature spaces by estimating the hyperplanes that best separate
the data set, but it cannot efficiently handle non-separable problems. As an un-
stable classifier, collections of ANN-MLPs [12] can improve its performance up to
some unknown limit on the number of classifiers [13]. SVMs have been proposed
to overcome this problem by assuming linearly separable classes in a higher-
dimensional feature space [11]. Their computational cost rapidly increases with
the training set size and the number of support vectors. As a binary classifier,
multiple SVMs are required to solve a multi-class problem [14]. These points
make the SVM quadratic optimization problem a demanding task for large
datasets. The problem is aggravated by the choice of the nonlinear mapping
functions, generally Radial Basis Functions (RBF), whose optimal parameters
must be chosen by cross-validation techniques, making the training phase pro-
hibitively time consuming.
Trying to address these problems, a novel supervised graph-based approach
was recently presented [15,16]. Papa et al. [15] firstly presented the Optimum-
Path Forest (OPF) classifier, which is fast, simple, multi-class, parameter inde-
pendent, does not make any assumption about the shapes of the classes, and can
handle some degree of overlapping between classes. The training set is thought
of as a complete graph, whose nodes are the samples and arcs link all pairs of
nodes. The arcs are weighted by the distances between the feature vectors of
their corresponding nodes. Any sequence of distinct samples forms a path con-
necting the terminal nodes and a connectivity function assigns a cost to that
path (e.g., the maximum arc-weight along it). The idea is to identify prototypes
in each class such that every sample is assigned to the class of its most strongly
connected prototype. That is, the one which offers to it a minimum-cost path,
considering all possible paths from the prototypes. The OPF classifier creates a
discrete optimal partition of the feature space and each sample of the test set
is classified according to the label of its strongly connected partition (optimum-
path tree root). A learning algorithm for the OPF classifier was also presented
in [15], in which a third evaluation set was used to identify the most representa-
tive samples (classification errors), and then these samples are replaced by other
ones of the training set. This process is repeated until some convergence criterion is reached.
A learning algorithm is important for several reasons: it can be used to identify
the most representative samples and remove the others, decreasing the training
set size, which is very interesting in situations in which large datasets are used.
Another question is whether a classifier can learn from its own errors. The answer
is: yes. We show here that a classifier can increase its performance by using an
appropriate learning algorithm.
Further, Papa et al. [16] presented a novel variant of the OPF classifier, in
which the graph is now a k-nn graph, with arcs weighted by the distance
between their corresponding feature vectors. Notice that now the nodes are also
weighted by a probability density function (pdf) that takes into account the
arc weights. This new variant has outperformed the traditional OPF in some
A Learning Algorithm for the Optimum-Path Forest Classifier 197

situations, but no learning algorithm had been developed for this latter version.
The main idea of this paper is thus to present a learning algorithm for
this new variant of the OPF classifier; comparisons against the traditional
OPF and Support Vector Machines are also discussed. The remainder of
this paper is organized as follows: Sections 2 and 3 present, respectively,
the new variant of the OPF classifier and its learning algorithm. Section 4 shows
the experimental results and, finally, Section 5 discusses the conclusions.

2 The New Variant of the Optimum-Path Forest Classifier
Let Z = Z1 ∪ Z2 , where Z1 and Z2 are, respectively, the training and test sets.
Every sample s ∈ Z has a feature vector v(s) and d(s, t) is the distance between
s and t in the feature space (e.g., d(s, t) = ‖v(t) − v(s)‖). A function λ(s) assigns
the correct label i, i = 1, 2, . . . , c, of class i to any sample s ∈ Z. We aim to
project a classifier from Z1 , which can predict the correct label of the samples in
Z2 . This classifier creates a discrete optimal partition of the feature space such
that any unknown sample can be classified according to this partition.
Let k ≥ 1 be a fixed number for the time being. A k-nn relation Ak is defined
as follows. A sample t ∈ Z1 is said to be adjacent to a sample s ∈ Z1 if t is a
k-nearest neighbor of s according to d(s, t). The pair (Z1 , Ak ) then defines a k-nn graph
for training. The arcs (s, t) are weighted by d(s, t) and the nodes s ∈ Z1 are
weighted by a density value ρ(s), given by

    ρ(s) = (1 / (√(2πσ²) k)) Σ_{∀t ∈ Ak(s)} exp(−d²(s, t) / (2σ²)),      (1)

where σ = df /3 and df is the maximum arc weight in (Z1 , Ak ). This parameter
choice considers all nodes for density computation, since a Gaussian function
covers most samples within d(s, t) ∈ [0, 3σ]. Although the density value ρ(s) is
calculated with a Gaussian kernel, the use of the k-nn graph allows the proposed
OPF to be robust to possible variations in the shape of the classes.
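As a quick illustration, Eq. 1 can be computed directly from a precomputed pairwise distance matrix. This is a sketch under assumptions: the matrix D and the helper name are ours, and in practice d(s, t) would come from the chosen feature vectors and distance function.

```python
import math

def knn_density(D, k):
    """Density rho(s) of Eq. 1 on the k-nn graph (sketch).
    D is an n x n pairwise distance matrix; returns rho and the
    k-nn adjacency A_k (lists of neighbor indices per node)."""
    n = len(D)
    # k nearest neighbors of each node, self excluded
    Ak = [sorted((t for t in range(n) if t != s), key=lambda t: D[s][t])[:k]
          for s in range(n)]
    # df: maximum arc weight in (Z1, Ak); sigma = df / 3 covers all arcs
    df = max(D[s][t] for s in range(n) for t in Ak[s])
    sigma = df / 3.0
    norm = math.sqrt(2.0 * math.pi * sigma ** 2) * k
    rho = [sum(math.exp(-D[s][t] ** 2 / (2.0 * sigma ** 2)) for t in Ak[s]) / norm
           for s in range(n)]
    return rho, Ak
```

Note that ties in the k-nearest-neighbor selection are broken here by node order, a detail the paper leaves open.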
A sequence of adjacent samples defines a path πt , starting at a root R(t) ∈ Z1
and ending at a sample t. A path πt = ⟨t⟩ is said to be trivial when it consists
of a single node. The concatenation of a path πs and an arc (s, t) defines an
extended path πs · ⟨s, t⟩. We define f1 (πt ) such that its maximization for all
nodes t ∈ Z1 results into an optimum-path forest with roots at the maxima
of the pdf, forming a root set R. We expect that each class be represented by
one or more roots (maxima) of the pdf. Each optimum-path tree in this forest
represents the influence zone of one root r ∈ R, which is composed by samples
more strongly connected to r than to any other root. We expect that the training
samples of a same class be assigned (classified) to an optimum-path tree rooted
at a maximum of that class.

198 J.P. Papa and A.X. Falcão

The path-value function is defined as follows:

    f1 (⟨t⟩) = ρ(t) if t ∈ R, and f1 (⟨t⟩) = ρ(t) − δ otherwise;

    f1 (πs · ⟨s, t⟩) = min{f1 (πs ), ρ(t)},      (2)

where δ = min_{∀(s,t)∈Ak | ρ(t)≠ρ(s)} |ρ(t) − ρ(s)|. The root set R is obtained on-
the-fly. The method uses the image foresting transform (IFT) algorithm [17] to
maximize f1 (πt ) and obtain an optimum-path forest P — a predecessor map
with no cycles that assigns to each sample t ∈ / R its predecessor P (t) in the
optimum path P ∗ (t) from R or a marker nil when t ∈ R. The IFT algorithm
for (Z1 , Ak ) is presented below.
Algorithm 1 – IFT Algorithm
Input: A k-nn graph (Z1 , Ak ), λ(s) for all s ∈ Z1 , and path-value function f1 .
Output: Label map L, path-value map V , optimum-path forest P .
Auxiliary: Priority queue Q and variable tmp.
1. For each s ∈ Z1 , do
2. P (s) ← nil, L(s) ← λ(s), V (s) ← ρ(s) − δ
3. and insert s in Q.
4. While Q is not empty, do
5. Remove from Q a sample s such that V (s) is
6. maximum.
7. If P (s) = nil, then V (s) ← ρ(s).
8. For each t ∈ Ak (s) and V (t) < V (s), do
9. tmp ← min{V (s), ρ(t)}.
10. If tmp > V (t) then
11. L(t) ← L(s), P (t) ← s, V (t) ← tmp.
12. Update position of t in Q.
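Algorithm 1 translates almost line for line into Python. The sketch below replaces the priority queue Q with a linear scan for the maximum V(s), which is simpler but less efficient than a heap, and adds an explicit "still in Q" guard consistent with the queue semantics; the function and argument names are ours.

```python
def ift(rho, adj, labels, delta):
    """Sketch of Algorithm 1 (IFT) on a k-nn graph.
    rho[s]: density of node s; adj[s]: list of adjacent nodes A_k(s);
    labels[s]: true label lambda(s); delta: as defined for f1."""
    n = len(rho)
    P = [None] * n                            # predecessor map (None == nil)
    L = list(labels)                          # label map
    V = [rho[s] - delta for s in range(n)]    # trivial path values (Line 2)
    in_q = [True] * n
    for _ in range(n):
        # remove from Q the sample s with maximum V(s) (Lines 5-6)
        s = max((i for i in range(n) if in_q[i]), key=lambda i: V[i])
        in_q[s] = False
        if P[s] is None:                      # s is a root of the forest (Line 7)
            V[s] = rho[s]
        for t in adj[s]:                      # offer path pi_s . <s, t> (Lines 8-12)
            if in_q[t] and V[t] < V[s]:
                tmp = min(V[s], rho[t])
                if tmp > V[t]:                # the offered path is better
                    L[t], P[t], V[t] = L[s], s, tmp
    return L, V, P
```

On a small chain graph, the two density maxima become roots and their labels propagate outward along optimum paths.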

Initially, all paths are trivial with values f1 (⟨t⟩) = ρ(t) − δ (Line 2). The global
maxima of the pdf are the first to be removed from Q. They are identified as
roots of the forest by the test P (s) = nil in Line 7, where we set their correct path
value f1 (s) = V (s) = ρ(s). Each node s removed from Q offers a path πs · ⟨s, t⟩
to each adjacent node t in the loop from Line 8 to Line 12. If the path value
f1 (πs · ⟨s, t⟩) = min{V (s), ρ(t)} (Line 9) is better than the current path value
f1 (πt ) = V (t) (Line 10), then πt is replaced by πs · ⟨s, t⟩ (i.e., P (t) ← s), and the
path value and label of t are updated accordingly (Line 11). Local maxima of the
pdf are also discovered as roots during the algorithm. The algorithm also outputs
an optimum-path value map V and a label map L, wherein the true labels of
the corresponding roots are propagated to every sample t. A classification error
in the training set occurs when the final L(t) ≠ λ(t). We define the best value
k ∗ ∈ [1, kmax ] as the one which maximizes the accuracy Acc of classification
in the training set. The accuracy is defined as follows.
Let N Z1 (i), i = 1, 2, . . . , c, be the number of samples in Z1 from each class i.
We define

    ei,1 = F P (i) / (|Z1 | − |N Z1 (i)|)   and   ei,2 = F N (i) / |N Z1 (i)|,      (3)
where F P (i) and F N (i) are the false positives and false negatives, respectively.
That is, F P (i) is the number of samples from other classes that were classified

as being from the class i in Z1 , and F N (i) is the number of samples from the
class i that were incorrectly classified as being from other classes in Z1 . The
errors ei,1 and ei,2 are used to define

E(i) = ei,1 + ei,2 , (4)

where E(i) is the partial sum error of class i. Finally, the accuracy Acc of the
classification is written as

    Acc = (2c − Σ_{i=1}^{c} E(i)) / (2c) = 1 − (Σ_{i=1}^{c} E(i)) / (2c).      (5)
The accuracy Acc is measured by taking into account that the classes may have
different sizes in Z1 (similar definition is applied for Z2 ). If there are two classes,
for example, with very different sizes and the classifier always assigns the label
of the largest class, its accuracy will fall drastically due to the high error rate
on the smallest class.
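The class-balanced accuracy of Eqs. 3–5 can be sketched as follows; the helper name and the dict-based interface for the per-class counts are ours.

```python
def balanced_accuracy(FP, FN, NZ, total):
    """Acc of Eq. 5 from per-class false positives and negatives (sketch).
    FP, FN, NZ are dicts indexed by class label; total = |Z1|."""
    c = len(NZ)
    E = 0.0
    for i in NZ:
        e1 = FP[i] / (total - NZ[i])    # e_{i,1} of Eq. 3
        e2 = FN[i] / NZ[i]              # e_{i,2} of Eq. 3
        E += e1 + e2                    # E(i) of Eq. 4
    return 1.0 - E / (2 * c)            # Eq. 5
```

This makes the imbalance argument concrete: on a two-class set with 8 and 2 samples, a classifier that always predicts the majority class gets FP = 2 for the large class and FN = 2 for the small one, so Acc = 0.5 despite 80% raw hit rate.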
It is expected that each class be represented by at least one maximum of the
pdf and L(t) = λ(t) for all t ∈ Z1 (zero classification errors in the training set).
However, these properties cannot be guaranteed with path-value function f1
and the best value k ∗ . In order to assure them, we first find the best value k ∗
using function f1 and then execute Algorithm 1 one more time using path-value
function f2 instead of f1 :

    f2 (⟨t⟩) = ρ(t) if t ∈ R, and f2 (⟨t⟩) = ρ(t) − δ otherwise;

    f2 (πs · ⟨s, t⟩) = −∞ if λ(t) ≠ λ(s), and min{f2 (πs ), ρ(t)} otherwise.      (6)

Equation 6 weights all arcs (s, t) ∈ Ak such that λ(t) ≠ λ(s) with −∞,
constraining optimum paths to lie within the correct class of their nodes.
The training process in our method can be summarized by Algorithm 2.
Algorithm 2 – Training
Input: Training set Z1 , λ(s) for all s ∈ Z1 , kmax and path-value functions f1
and f2 .
Output: Label map L, path-value map V , optimum-path forest P .
Auxiliary: Variables i, k, k∗ , M axAcc ← −∞, Acc, and arrays F P and F N of
size c.

1. For k = 1 to kmax do
2. Create graph (Z1 , Ak ) weighted on nodes by Eq. 1.
3. Compute (L, V, P ) using Algorithm 1 with f1 .
4. For each class i = 1, 2, . . . , c, do
5. F P (i) ← 0 and F N (i) ← 0.
6. For each sample t ∈ Z1 , do
7. If L(t) ≠ λ(t), then
8. F P (L(t)) ← F P (L(t)) + 1.
9. F N (λ(t)) ← F N (λ(t)) + 1.

10. Compute Acc by Equation 5.


11. If Acc > M axAcc, then
12. k∗ ← k and M axAcc ← Acc.
13. Destroy graph (Z1 , Ak ).
14. Create graph (Z1 , Ak∗ ) weighted on nodes by Eq. 1.
15. Compute (L, V, P ) using Algorithm 1 with f2 .

For any sample t ∈ Z2 , we consider the k-nearest neighbors connecting t with
samples s ∈ Z1 , as though t were part of the graph. Considering all possible
paths from R to t, we find the optimum path P ∗ (t) with root R(t) and label t
with the class λ(R(t)). This path can be identified incrementally, by evaluating
the optimum cost V (t) as

V (t) = max{min{V (s), ρ(t)}}, ∀s ∈ Z1 . (7)

Let the node s∗ ∈ Z1 be the one that satisfies the above equation. Given that
L(s∗ ) = λ(R(t)), the classification simply assigns L(s∗ ) to t.
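Equation 7 turns into a short classification routine. This is a sketch: how the density ρ(t) of a test sample is obtained and how ties are broken are our reading, since the text leaves these details implicit; the function and argument names are ours.

```python
def classify_sample(d_t, rho_t, V, L, k):
    """Classify a test sample t via Eq. 7 (sketch).
    d_t[s]: distance from t to each training sample s in Z1;
    rho_t: density of t estimated from its k nearest training samples;
    V, L: optimum-path value and label maps from training."""
    n = len(d_t)
    nn = sorted(range(n), key=lambda s: d_t[s])[:k]     # k-nearest neighbors
    # Eq. 7: V(t) = max over s of min{V(s), rho(t)}; ties broken by order
    s_star = max(nn, key=lambda s: min(V[s], rho_t))
    return L[s_star]
```

Since ρ(t) is constant over s, the maximization effectively selects, among the neighbors, the one whose optimum-path value V(s) is not the bottleneck.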

3 Proposed Learning Algorithm


There are many situations that limit the size of Z1 : large datasets, limited compu-
tational resources, and high computational time as required by some approaches.
Mainly in applications with large datasets, it would be interesting to select for
Z1 the most informative samples, such that the accuracy of the classifier is lit-
tle affected by this size limitation. It is also important to show that a classifier
can improve its performance along time of use, when we are able to teach it from
its errors. This section presents a learning algorithm which uses a third evalu-
ation set Z3 to improve the composition of samples in Z1 without increasing
its size.
From an initial choice of Z1 and Z3 , the algorithm projects an instance I of the
OPF classifier from Z1 and evaluates it on Z3 . The misclassified samples of Z3
are randomly selected and swapped with samples of Z1 (under certain constraints).
This procedure assumes that the most informative samples can be obtained from
the errors. The new sets Z1 and Z3 are then used to repeat the process during a
few iterations T . The instance of classifier with highest accuracy is selected along
the iterations. The accuracy values L(I) (Equation 5) obtained for each instance
I form a learning curve, whose non-decreasing monotonic behavior indicates a
positive learning rate for the classifier. Afterwards, by comparing the accuracies
of the classifier on Z2 , before and after the learning process, we can evaluate its
learning capacity from the errors.
Algorithm 3 presents the proposed learning procedure for the new variant of
the OPF (OPFknn ), which uses the k-nn graph as the adjacency relation. The
learning procedure applied for the traditional OPF (OPFcpl ), which makes use
of the complete graph, can be found in [15]. They are quite similar, and the main
difference between them is the training phase in Line 4.

Algorithm 3 – General Learning Algorithm


Input: Training and evaluation sets, Z1 and Z3 , labeled by λ, number T of
       iterations, and the pair (v, d) for feature vector and distance compu-
       tations.
Output: Learning curve L and the OPFknn classifier with highest accuracy.
Auxiliary: Arrays F P and F N of sizes c for false positives and false negatives
       and list LM of misclassified samples.
1. Set M axAcc ← −1.
2. For each iteration I = 1, 2, . . . , T , do
3.     LM ← ∅
4.     Train OPFknn with Z1 .
5.     For each class i = 1, 2, . . . , c, do
6.         F P (i) ← 0 and F N (i) ← 0.
7.     For each sample t ∈ Z3 , do
8. Use the classifier obtained in Line 4 to classify t
9. with a label L(t).
10. If L(t) ≠ λ(t), then
11. F P (L(t)) ← F P (L(t)) + 1.
12. F N (λ(t)) ← F N (λ(t)) + 1.
13. LM ← LM ∪ t.
14. Compute accuracy L(I) by Equation 5.
15. If L(I) > M axAcc then save the current instance
16. of the classifier and set M axAcc ← L(I).
17. While LM ≠ ∅, do
18. LM ← LM \t
19. Replace t by a randomly selected sample of the
20. same class in Z1 , under some constraints.

In OPFknn , Line 4 is implemented by computing S ∗ ⊂ Z1 as described in
Section 2 and the predecessor map P , label map L and cost map C by Al-
gorithm 1. The classification is done by setting L(t) ← L(s∗ ), where s∗ ∈ Z1 is
the sample that satisfies Equation 7. The constraints in Lines 19 − 20 refer to
keeping the prototypes out of the sample interchanging process between Z1 and Z3 .
These same constraints are also applied for OPFcpl , for whose implementa-
tion we used the LibOPF library [18].
Notice that we also applied the above algorithm to the SVM classifier, in which
case the support vectors are kept out of the interchanging process; however,
samples may be selected for interchanging in future iterations if they are no longer
prototypes or support vectors. For SVM, we use the latest version of the LibSVM
package [19] with a Radial Basis Function (RBF) kernel, parameter optimization
and the one-versus-one (OVO) strategy for the multi-class problem to implement Line 4.
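The outer loop of Algorithm 3 can be sketched generically. This is only a scaffold under assumptions: the callables train, classify and accuracy, and the is_prototype test, are hypothetical interface names standing in for the OPFknn (or SVM) specifics.

```python
import copy
import random

def learn(Z1, Z3, T, train, classify, accuracy):
    """Generic sketch of Algorithm 3: swap evaluation-set errors into
    the training set for T iterations, keeping the best classifier.
    Samples are objects with a .label attribute (our convention)."""
    best, curve = None, []
    for _ in range(T):
        clf = train(Z1)
        errors = [t for t in Z3 if classify(clf, t) != t.label]
        acc = accuracy(clf, Z3)                  # L(I), Eq. 5
        curve.append(acc)
        if best is None or acc > best[0]:        # keep best instance
            best = (acc, copy.deepcopy(clf))
        for t in errors:                         # Lines 17-20: swap errors
            swappable = [s for s in Z1
                         if s.label == t.label and not clf.is_prototype(s)]
            if swappable:
                s = random.choice(swappable)
                Z1.remove(s); Z3.remove(t)
                Z1.append(t); Z3.append(s)
    return curve, best[1]
```

The returned curve is the learning curve L(I); a non-decreasing curve indicates a positive learning rate, as the text describes.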

4 Experimental Results
We performed two rounds of experiments: in the first one we used the OPFcpl ,
OPFknn and SVM 10 times to compute their accuracies, using different ran-
domly selected training (Z1 ) and test (Z2 ) sets. In the second round, we executed

Fig. 1. 2D points datasets: (a) CONE TORUS and (b) SATURN

Fig. 2. Samples from the MPEG-7 shape dataset: (a)-(c) Fish and (d)-(f) Camel

the above algorithms again, but they were submitted to the learning algo-
rithm. In this case, the datasets were divided into three parts: a training set
Z1 with 30% of the samples, an evaluation set Z3 with 20% of the samples,
and a test set Z2 with 50% of the samples. Section 4.1 presents the accuracy
results of training on Z1 and testing on Z2 . The accuracy results of training
on Z1 , with learning from the errors in Z3 , and testing on Z2 are presented in
Section 4.2.
The experiments used some combinations of public datasets — CONE TORUS
(2D points)(Figure 1a), SATURN (2D points) (Figure 1b), MPEG-7 (shapes)
(Figure 2) and BRODATZ (textures) — and descriptors — Fourier Coefficients
(FC), Texture Coefficients (TC), and Moment Invariants (MI). A detailed expla-
nation of them can be found in [20,15]. The results in Tables 1 and 2 are displayed
in the following format: x(y), where x and y are, respectively, mean accuracy and
its standard deviation. The percentages of samples in Z1 and Z2 were 50% and 50%
for all datasets.

4.1 Accuracy Results on Z2 without Using Z3


We present here the results without using the third evaluation set, i.e., the
simplest holdout method: one set for training (Z1 ) and another for testing (Z2 ).
The results show (Table 1) that OPFknn can provide better accuracies than
OPFcpl and SVM, being about 50 times faster than SVM for training.

Table 1. Mean accuracy and standard deviation without learning in Z3

Dataset-Descriptor OPFcpl OPFknn SVM


MPEG7-FC 71.92(0.66) 72.37(0.48) 71.40(0.49)
MPEG7-MI 76.76(0.60) 82.07(0.37) 85.17(0.62)
BRODATZ-TC 87.81(0.70) 88.22(0.96) 87.91(1.06)
CONE TORUS-XY 88.24(1.13) 86.75(1.29) 87.28(3.37)
SATURN-XY 90.40(1.95) 91.00(1.61) 89.40(2.65)

4.2 Accuracy Results on Z2 with Learning on Z3


In order to evaluate the ability of each classifier in learning from the errors in Z3
without increasing the size of Z1 , we executed Algorithm 3 for T = 3 iterations.
The results are presented in Table 2.

Table 2. Mean accuracy and standard deviation with learning in Z3

Dataset-Descriptor OPFcpl OPFknn SVM


MPEG7-FC 73.82(0.66) 75.94(0.48) 74.42(0.49)
MPEG7-MI 81.20(0.60) 81.03(0.37) 82.03(0.62)
BRODATZ-TC 88.54(0.70) 90.41(0.96) 84.37(1.06)
CONE TORUS-XY 88.38(1.13) 86.28(1.29) 87.95(3.37)
SATURN-XY 91.04(1.85) 92.00(1.71) 89.90(2.85)

We can observe that the conclusions drawn from Table 2 remain the same with
respect to the overall performance of the classifiers. In most cases, the general
learning algorithm improved the performance of the classifiers with respect to
their results in Table 1, i.e., it is possible for a given classifier to learn from its
own errors.

5 Conclusion
The OPF classifiers are a novel collection of graph-based classifiers with some
advantages over commonly used classifiers: they make no assumption about the
shape or separability of the classes and have a faster training phase. There are
currently two variants of OPF-based classifiers, OPFcpl and OPFknn ; the
differences between them lie in the adjacency relation, prototype estimation and
path-cost function.

We showed here how an OPF-based classifier can learn from its own errors,
introducing a learning algorithm for OPFknn , whose classification results
were good and similar to those reported by the traditional OPF (OPFcpl ) and
SVM approaches. However, the OPF classifiers are about 50 times faster than
SVM for training. It is also important to note that the good accuracy of SVM was
due to parameter optimization. One can see that the OPFknn learning algorithm
improved its results, in some cases by up to 3%, without increasing its training
set size.

References
1. Duda, R.O., Hart, P.E., Stork, D.G.: Pattern Classification, 2nd edn. Wiley-
Interscience, Hoboken (2000)
2. Kulis, B., Basu, S., Dhillon, I., Mooney, R.: Semi-supervised graph clustering: a kernel
approach. In: ICML 2005: Proc. of the 22nd ICML, pp. 457–464. ACM, New York (2005)
3. Schölkopf, B., Zhou, D., Hofmann, T.: Semi-supervised learning on directed graphs.
In: Adv. in Neural Information Processing Systems, pp. 1633–1640 (2005)
4. Callut, J., Françoisse, K., Saerens, M.: Semi-supervised classification in graphs using
bounded random walks. In: Proceedings of the 17th Annual Machine Learning
Conference of Belgium and the Netherlands (Benelearn), pp. 67–68 (2008)
5. Kumar, N., Kummamuru, K.: Semisupervised clustering with metric learning us-
ing relative comparisons. IEEE Transactions on Knowledge and Data Engineer-
ing 20(4), 496–503 (2008)
6. Hubert, L.J.: Some applications of graph theory to clustering. Psychometrika 39(3),
283–309 (1974)
7. Zahn, C.T.: Graph-theoretical methods for detecting and describing gestalt clus-
ters. IEEE Transactions on Computers C-20(1), 68–86 (1971)
8. Jain, A.K., Dubes, R.C.: Algorithms for Clustering Data. Prentice-Hall, Inc., Upper
Saddle River (1988)
9. Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Transactions on
Pattern Analysis and Machine Intelligence 22(8), 888–905 (2000)
10. Haykin, S.: Neural networks: a comprehensive foundation. Prentice Hall, Engle-
wood Cliffs (1994)
11. Boser, B.E., Guyon, I.M., Vapnik, V.N.: A training algorithm for optimal mar-
gin classifiers. In: Proceedings of the 5th Workshop on Computational Learning
Theory, pp. 144–152. ACM Press, New York (1992)
12. Kuncheva, L.I.: Combining Pattern Classifiers: Methods and Algorithms. Wiley-
Interscience, Hoboken (2004)
13. Reyzin, L., Schapire, R.E.: How boosting the margin can also boost classifier com-
plexity. In: Proceedings of the 23rd International Conference on Machine Learning,
pp. 753–760. ACM Press, New York (2006)
14. Duan, K., Keerthi, S.S.: Which is the best multiclass svm method? an empirical
study. In: Oza, N.C., Polikar, R., Kittler, J., Roli, F. (eds.) MCS 2005. LNCS,
vol. 3541, pp. 278–285. Springer, Heidelberg (2005)
15. Papa, J.P., Falcão, A.X., Suzuki, C.T.N., Mascarenhas, N.D.A.: A discrete approach
for supervised pattern recognition. In: Brimkov, V.E., Barneva, R.P., Hauptman, H.A.
(eds.) IWCIA 2008. LNCS, vol. 4958, pp. 136–147. Springer, Heidelberg (2008)
16. Papa, J.P., Falcão, A.X.: A new variant of the optimum-path forest classifier. In:
4th International Symposium on Visual Computing, pp. I: 935–944 (2008)
17. Falcão, A.X., Stolfi, J., Lotufo, R.A.: The image foresting transform: Theory, al-
gorithms, and applications. IEEE Transactions on Pattern Analysis and Machine
Intelligence 26(1), 19–29 (2004)
18. Papa, J.P., Suzuki, C.T.N., Falcão, A.X.: LibOPF: A library for the
design of optimum-path forest classifiers, Software version 1.0 (2008),
http://www.ic.unicamp.br/~afalcao/LibOPF
19. Chang, C.C., Lin, C.J.: LIBSVM: A Library for Support Vector Machines (2001),
http://www.csie.ntu.edu.tw/~cjlin/libsvm
20. Montoya-Zegarra, J.A., Papa, J.P., Leite, N.J., Torres, R.S., Falcão, A.X.: Learning
how to extract rotation-invariant and scale-invariant features from texture images.
EURASIP Journal on Advances in Signal Processing, 1–16 (2008)
Improving Graph Classification by Isomap

Kaspar Riesen, Volkmar Frinken, and Horst Bunke

Institute of Computer Science and Applied Mathematics, University of Bern,


Neubrückstrasse 10, CH-3012 Bern, Switzerland
{riesen,frinken,bunke}@iam.unibe.ch

Abstract. Isomap emerged as a powerful tool for analyzing input pat-


terns on manifolds of the underlying space. It builds a neighborhood
graph derived from the observable distance information and recomputes
pairwise distances as the shortest path on the neighborhood graph. In
the present paper, Isomap is applied to graph based pattern representa-
tions. For measuring pairwise graph dissimilarities, graph edit distance is
used. The present paper focuses on classification and employs a support
vector machine in conjunction with kernel values derived from original
and Isomap graph edit distances. In an experimental evaluation on five
different data sets from the IAM graph database repository, we show that
in four out of five cases the graph kernel based on Isomap edit distance
performs superior compared to the kernel relying on the original graph
edit distances.

1 Introduction

Isomap [1] is a non-linear transformation of input patterns that can be applied


to arbitrary domains where a dissimilarity measure is available. It is assumed
that the data lie on a manifold in the input space. Therefore, distances between
input patterns are measured along this manifold [2]. These geodesic distances
along the manifold are estimated in a graph-based approach. Considering that
adjacent patterns have the same distance on the manifold as in the input space,
a neighborhood graph is created in which the nodes represent the input patterns
and edges represent neighborhood relations based on pairwise distances. The
neighborhood graph can be viewed as a discretized approximation of the man-
ifold on which the input patterns lie. Thus, the shortest paths along the edges
of the graph, i.e. the Isomap distance, are assumed to give a better approxima-
tion of the true distances between patterns than the distances measured in the
original feature space.
The present paper investigates the use of Isomap for graph classification.
Due to their power and flexibility, graph based pattern representations found
widespread applications in science and engineering; see [3] for an exhaustive re-
view. However, most of the basic mathematical operations actually required by
many pattern analysis algorithms are not available or not defined in a standard-
ized way for graphs. Consequently, we observe a lack of algorithmic tools in the
graph domain, and it is often difficult to adequately utilize the structure of the

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 205–214, 2009.

c Springer-Verlag Berlin Heidelberg 2009
206 K. Riesen, V. Frinken, and H. Bunke

underlying patterns. However, given a distance measure for graphs, this obstacle
can be overcome with Isomap by discovering and exploiting possible manifolds
in the underlying graph domain. The geodesic distance approximated by Isomap
may be more appropriate for certain pattern recognition tasks than the original
graph dissimilarities. The requirement for making Isomap applicable in graph
domains is the existence of a distance, or dissimilarity, measure for graphs. In
the context of the work described in this paper, the edit distance is used [4].
Compared to other approaches, graph edit distance is known to be very flexible.
Furthermore, graph edit distance is an error-tolerant dissimilarity measure and
is therefore able to cope well with distorted data.
To analyze the applicability of Isomap, a new family of graph kernels is used
in the present paper. The key idea of graph kernels is to map graphs implicitly
to a vector space where the pattern recognition task is eventually carried out.
Kernel functions can be interpreted as similarity measures. Hence, given the
graph edit distance, one can apply monotonically decreasing functions mapping
low edit distance values to high kernel values and vice versa. Rather than deriv-
ing a kernel from the original edit distances, the graph kernel proposed in this
paper is based on graph distances obtained from the original graph edit distance
through Isomap. That is, before the transformation from graph edit distance into
a kernel value is carried out, Isomap is applied to the graphs and their respec-
tive distances. In an experimental evaluation involving several graph data sets
of diverse nature, we investigate the question whether it is beneficial to employ
Isomap rather than the original edit distances for such a kernel construction.
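The kernel construction described above can be sketched in a few lines. The exponential mapping exp(−γ · d̂) below is one common monotonically decreasing choice, not necessarily the exact function used by the authors, and γ is a hypothetical tuning parameter.

```python
import math

def kernel_matrix(D, gamma=1.0):
    """Map a matrix of pairwise (possibly Isomap-transformed) graph edit
    distances to similarity values: low distance -> high kernel value."""
    return [[math.exp(-gamma * d) for d in row] for row in D]
```

The resulting matrix can then be handed to an SVM implementation that accepts precomputed kernels, so the classifier itself never needs an explicit vectorial representation of the graphs.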
In [5] a strategy similar to Isomap is applied to trees. In this work the geodesic
distances between the tree’s nodes are computed resulting in a distance matrix
for each individual tree. Via multidimensional scaling the nodes are eventually
embedded in a Euclidean space where spectral methods are applied in order
to analyze and cluster the underlying tree models. Note the difference to our
method where Isomap is applied to a whole set of graphs rather than to a
set of nodes. Furthermore, in our approach the resulting Isomap distances are
directly used in a distance based graph kernel rather than computing spectral
characterizations of the MDS-embedded nodes. Finally, as graph edit distance is
employed as pairwise dissimilarity measure, our approach can handle arbitrary
graphs with any type of node and edge labels.
The remainder of this paper is structured as follows. In the next section the
Isomap transformation is described in detail. In Sect. 3 similarity kernels based
on graph distances are introduced. An experimental evaluation of the proposed
Isomap framework is presented in Sect. 4. Finally, in Sect. 5 we draw some
conclusions from this work.

2 Isomap Transformation

Isomap, as first introduced in [1], is a nonlinear distance transformation of ele-


ments in a feature space. Every set of data points X = {x1 , . . . , xn } can be seen
as laying on a manifold, whose structure may contain important information.

One way of exploiting the manifold’s structure is by letting paths between two
points only traverse on the manifold, which is defined by areas of high data
density. Given these paths, a new distance d̂(xi , xj ) between data points xi and xj ,
termed the Isomap distance, can be defined.
The data density is determined by the closeness of elements in the feature
space. Two close points have an Isomap distance equal to the original distance,
since they are from the same area of the manifold. A valid Isomap path along
the manifold can therefore be constructed as a concatenation of subpaths within
areas of a high data density. Of course, one needs to be careful not to create
disconnected areas. Closeness needs therefore to be defined in such a way that
local structures can be exploited, but at the same time outliers as well as distant
areas must still be connected. In this paper, closeness is induced via an auxiliary
graph termed k-nearest neighbor graph (k-NN graph).
Definition 1 (k-NN Graph). Given a set of input patterns X = {x1 , . . . , xn }
and a corresponding distance measure d : X × X → R, the k-NN graph G =
(V, E, d) with respect to X is defined as an auxiliary graph where the nodes
represent input patterns, i.e. V = X . Two nodes xi and xj are connected by an
edge (xi , xj ) ∈ E if xj is among the k nearest patterns of xi according to d. The
edge (xi , xj ) ∈ E is labeled with the corresponding distance d(xi , xj ).
Note that, according to this definition, the k-NN graph G is directed. In order to
obtain an undirected graph, for each edge (xi , xj ) an identically labeled reverse
edge (xj , xi ) is inserted in G. The Isomap distance between two patterns xi and
xj is then defined as the minimum length of all paths between them on the k-NN
graph.
Definition 2 (Isomap Distance). Given a set of input patterns X =
{x1 , . . . , xn } with a distance function d : X × X → R and the k-NN graph
G = (V, E, d) defined with respect to X , a valid path between two patterns
xi , xj ∈ X is a sequence (pi )i=1,...,lp of length lp ∈ N of patterns pi ∈ X such
that (pi−1 , pi ) ∈ E for all i = 2, . . . , lp . The Isomap distance d̂(·, ·) between two
patterns xi and xj is then given by

    d̂(xi , xj ) = min_p Σ_{i=2}^{lp} d(pi−1 , pi ),

where p1 = xi and plp = xj .

On the k-NN graph G, the Isomap distances dˆ can be efficiently computed with
Dijkstra’s algorithm [6] as the shortest paths in G. The complete algorithm is
described in Alg. 1.
Algorithm 1. Isomap(X , k)
Input: X = {x1 , . . . , xn }, k
Output: Pairwise Isomap distances d̂ij

1. Initialize G to the empty graph
2. for each input pattern xi ∈ X do
3.     add a node to G with the label xi
4. end for
5. for all xi ∈ X do
6.     for the k pairs (xi , xj ) with the smallest value d(xi , xj ) do
7.         insert an edge between node xi and node xj and vice versa with the label d(xi , xj )
8.     end for
9. end for
10. for each pair (xi , xj ) ∈ X × X do
11.     compute the Isomap distance d̂(xi , xj ) as the shortest path between xi and xj in G
12. end for

Any new data point x ∉ X can be added in a simple way in O(1), provided
that only Isomap distances starting at the new point x are required, as would be
the case when classifying a new graph. Since the k-nearest neighbors define the
direct neighborhood, all valid Isomap paths starting in x must pass through one
of its k-nearest neighbors. Therefore it is sufficient to connect the new element
with these nearest neighbors to compute the correct Isomap distances from x to
all other points in the graph.
Obviously, the Isomap distance dˆ crucially depends on the meta parameter
k. That is, k has to be defined sufficiently high such that G is connected, i.e.
each pair of patterns (xi , xj ) is connected by at least one path. We denote this
minimum value by kmin . Conversely, if k = n, i.e. if G is complete, the Isomap
distance dˆ will be equal to the original distance d and no additional information
is gained by Isomap. Hence, the optimal value for k lies somewhere in the interval
[kmin , n] and needs to be determined on an independent validation set.

3 Similarity Kernel Based on Graph Edit Distance

In this section the concept of graph edit distance and its transformation into a
kernel function is described in detail.

3.1 Graph Edit Distance

Definition 3 (Graph). Let LV and LE be sets of labels for nodes and edges,
respectively. A graph g is defined by the four-tuple g = (V, E, μ, ν), where V is
the finite set of nodes, E ⊆ V × V is the set of edges, μ : V → LV is the node
labeling function, and ν : E → LE is the edge labeling function.

This definition allows us to handle arbitrary graphs with unconstrained labeling


functions. For example, the label alphabet can be given by the set of integers, the
vector space Rn , or a finite set of symbolic labels. Moreover, unlabeled graphs
are obtained by assigning the same label ε to all nodes and edges.
Graph edit distance is a widely studied error-tolerant dissimilarity measure
for graphs [4]. The basic idea of edit distance is to define the dissimilarity of two
graphs by the minimum amount of distortion that is needed to transform one
graph into the other one. A standard set of distortions is given by insertions,
deletions, and substitutions of both nodes and edges. A sequence of distortions

termed edit operations, e1 , . . . , ek , that transforms g1 into g2 is called an edit
path between g1 and g2 . Obviously, for every pair of graphs (g1 , g2 ), there exists a
number of different edit paths transforming g1 into g2 . Let Υ (g1 , g2 ) denote the
set of all such edit paths. To find the most suitable edit path out of Υ (g1 , g2 ),
one introduces a cost for each edit operation, measuring the strength of the
corresponding operation. The cost of an edit path is given by the sum of the
costs of its individual edit operations. Eventually, the edit distance of two graphs
is defined by the minimum cost edit path between two graphs.
Definition 4 (Graph Edit Distance). Assume that a finite or infinite set
G of graphs is given. Let g1 = (V1 , E1 , μ1 , ν1 ) ∈ G be the source graph and
g2 = (V2 , E2 , μ2 , ν2 ) ∈ G be the target graph. The graph edit distance between g1
and g2 is defined by
\[
d(g_1, g_2) = \min_{(e_1, \dots, e_k) \in \Upsilon(g_1, g_2)} \sum_{i=1}^{k} c(e_i) \,,
\]

where Υ (g1 , g2 ) denotes the set of edit paths transforming g1 into g2 , and c
denotes the edit cost function measuring the strength c(ei ) of edit operation ei .
Optimal algorithms for computing the edit distance of graphs are typically based
on combinatorial search procedures that explore the space of all possible map-
pings of the nodes and edges of the first graph to the nodes and edges of the
second graph [4]. A major drawback of those procedures is their computational
complexity, which is exponential in the number of nodes of the involved graphs.
However, efficient suboptimal methods for graph edit distance computation have
been proposed [7].
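The exponential combinatorial search mentioned above can be illustrated, for very small graphs, by exhaustively enumerating node mappings (a didactic sketch of our own, not the suboptimal method of [7]; the graph encoding and the uniform cost of 1 per operation are our simplifications):

```python
from itertools import permutations

def edit_distance(g1, g2):
    """Exact graph edit distance for *small* graphs by exhaustive search over
    node mappings (exponential in the number of nodes, as noted in the text).
    A graph is a pair (labels, edges): labels is a list of node labels, edges
    a set of frozenset({i, j}) pairs; every edit operation costs 1."""
    l1, e1 = g1
    l2, e2 = g2
    n1, n2 = len(l1), len(l2)
    # Each node of g1 maps either to a node of g2 or to None (= deletion).
    candidates = list(range(n2)) + [None] * n1
    best = float("inf")
    for m in set(permutations(candidates, n1)):
        cost = 0
        mapped = {t for t in m if t is not None}
        # Node operations: deletion, substitution, insertion.
        for i, t in enumerate(m):
            if t is None:
                cost += 1
            elif l1[i] != l2[t]:
                cost += 1
        cost += n2 - len(mapped)
        # Edge operations implied by the node mapping.
        for i in range(n1):
            for j in range(i + 1, n1):
                has1 = frozenset({i, j}) in e1
                if m[i] is None or m[j] is None:
                    cost += 1 if has1 else 0      # edge lost with deleted node
                else:
                    has2 = frozenset({m[i], m[j]}) in e2
                    cost += 1 if has1 != has2 else 0
        # Edges of g2 incident to an inserted node must themselves be inserted.
        for e in e2:
            if not e <= mapped:
                cost += 1
        best = min(best, cost)
    return best
```

Even this toy version makes the exponential blow-up visible, which motivates the suboptimal bipartite-assignment methods cited above.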
Clearly, the Isomap procedure described in Sect. 2 in conjunction with the
graph edit distance d can be applied to any graph set. The Isomap graph edit
distance d̂ between two graphs gi and gj is the minimum amount of distortion
applied to gi such that the edit path to gj passes only through areas of the
input space where elements of the training set can be found. Hence, all of the
intermediate graphs created in the process of editing gi into gj are similar or
equal to those graphs in the training set.

3.2 Deriving Kernels from Graph Edit Distance


The following definitions generalize kernel functions from vector spaces to the
domain of graphs [8].
Definition 5 (Graph Kernel). Let G be a finite or infinite set of graphs, gi , gj ∈
G, and ϕ : G → Rn a function with n ∈ N. A graph kernel function is a mapping
κ : G × G → R such that κ(gi , gj ) = ⟨ϕ(gi ), ϕ(gj )⟩.
According to this definition a graph kernel function takes two graphs gi and gj as
arguments and returns a real number that is equal to the result achieved by first
mapping the two graphs by a function ϕ to a vector space and then computing
the dot product ·, · in the vector space. The kernel function κ provides us with

a shortcut (commonly referred to as kernel trick ) that eliminates the need for
computing ϕ(·) explicitly. What makes kernel theory interesting is the fact that
many pattern recognition algorithms can be kernelized, i.e. formulated in such
a way that no individual patterns, but only dot products of vectors are needed.
Such algorithms together with an appropriate kernel function are referred to as
kernel machines. In the context of kernel machines, the kernel trick allows us
to address any given recognition problem originally defined in a graph space G
in an implicitly existing vector space Rn instead, without explicitly performing
the mapping from G to Rn . As we are mainly concerned with the problem of
graph classification in this paper, we will focus on kernel machines for pattern
classification, in particular on support vector machines (SVM).
A number of kernel functions have been proposed for graphs [8,9,10,11]. Yet,
these kernels are to a large extent applicable to unlabeled graphs only or unable
to deal sufficiently well with strongly distorted data. In this section, a kernel
function is described that is derived from graph edit distance. The basic rationale
for the definition of such a kernel is to bring together the flexibility of edit
distance based graph matching and the power of SVM based classification [8].
Graph kernel functions can be seen as graph similarity measures satisfying
certain conditions, viz. symmetry and positive definiteness [12]. Such kernels are
commonly referred to as valid graph kernels. Given the dissimilarity information
of graph edit distance, a possible way to construct a kernel is to apply a mono-
tonically decreasing function mapping high dissimilarity values to low similarity
values and vice versa.
Formally, given such a dissimilarity value v(g1 , g2 ) between graphs g1 , g2 ∈ G
we define a kernel function κv : G × G → R as

κv (g1 , g2 ) = exp(−v(g1 , g2 )/γ) ,

where γ > 0.
Although this approach will not generally result in valid kernel functions,
i.e. functions satisfying the conditions of symmetry and positive definiteness,
there exists theoretical evidence suggesting that training an SVM with such a
kernel function can be interpreted as the maximal separation of convex hulls in
pseudo-Euclidean spaces [13].
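The distance-to-similarity transformation above is straightforward to implement. The following sketch (with our own naming) maps a full dissimilarity matrix to the corresponding kernel matrix; note that, as discussed above, the result is symmetric but not guaranteed to be positive definite:

```python
import math

def similarity_kernel(dist, gamma):
    """Turn a pairwise dissimilarity matrix into a similarity kernel matrix
    via the monotonically decreasing map kappa(g1, g2) = exp(-v(g1, g2)/gamma),
    with gamma > 0."""
    n = len(dist)
    return [[math.exp(-dist[i][j] / gamma) for j in range(n)]
            for i in range(n)]
```

The resulting matrix can be handed to an SVM that accepts precomputed kernels; γ then acts as the scaling meta parameter validated in Sect. 4.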

4 Experimental Results

For our experimental evaluation, five graph data sets from the IAM graph
database repository are used1 . For lack of space we give only a short description
of the data here; for a more detailed description we refer to [14].
The first data set used in the experiments consists of graphs representing dis-
torted letter drawings out of 15 classes (Letter ). Next we apply the proposed
method to the problem of fingerprint classification using graphs that represent
1 Note that all data sets are publicly available under http://www.iam.unibe.ch/fki/databases/iam-graph-database

Fig. 1. Five classes of the Letter data set before and after Isomap (plotted via MDS): (a) original distances, (b) Isomap distances

fingerprint images out of the four classes arch, left loop, right loop, and whorl
(Fingerprint ). Elements from the third graph set belong to two classes (active,
inactive) and represent molecules with activity against HIV or not (Molecule).
The fourth data set also consists of graphs representing molecular compounds.
However, these molecules belong to one of the two classes mutagen or non-
mutagen (Mutagenicity). The last data set consists of graphs representing web-
pages that belong to 20 different categories (Business, Health, Politics, . . .)
(Web). All data sets are divided into three disjoint subsets, i.e. a training, a
validation, and a test set.
The aim of the experiments is to investigate the impact of Isomap graph edit
distances on the classification performance. The original edit distance d and the
Isomap distance d̂, each used as a dissimilarity value, give rise to two different kernels κd
and κd̂ , which are compared against each other.
Multidimensional scaling (MDS), which maps a set of pairwise distances into
an n-dimensional vector space, allows one to get a visual impression of the trans-
formation induced by Isomap. A subset of different classes is plotted before and
after the Isomap transformation in Fig. 1 for the Letter data set. The advantage
of better separability after the transformation can be seen clearly.
For the reference system two meta parameters have to be validated, viz. C
and γ. The former parameter is a weighting parameter for the SVM, which
controls whether the maximization of the margin or the minimization of the
error is more important. The second parameter γ is the weighting parameter in
the kernel function. Both parameters are optimized on the validation set and
eventually applied to the independent test set. For our novel approach with
Isomap graph edit distances d̂, an additional meta parameter has to be tuned,
namely k, which regulates how many neighbors are taken into account when the
k-NN graph is constructed for the Isomap procedure. The optimization of the
parameter pair (C, γ) is performed on various Isomap edit distances, varying k
in a certain interval. Thus, the optimized classification accuracy with respect to
(C, γ) (illustrated in Fig. 2 (a)) can be regarded as a function of k (illustrated
in Fig. 2 (b)).
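The nested validation described above amounts to a plain grid search over (k, C, γ). In the sketch below, `val_accuracy` is a hypothetical callback (our own abstraction, not part of the paper) assumed to build the k-NN graph, train the SVM with kernel exp(−d̂/γ) and weighting C, and return the validation-set accuracy:

```python
def select_meta_parameters(ks, Cs, gammas, val_accuracy):
    """Grid search over the meta parameters: for every neighbourhood size k
    of the k-NN graph, optimise the SVM parameters (C, gamma) on the
    validation set and keep the best-performing triple."""
    best_acc, best_params = float("-inf"), None
    for k in ks:
        for C in Cs:
            for gamma in gammas:
                acc = val_accuracy(k, C, gamma)
                if acc > best_acc:
                    best_acc, best_params = acc, (k, C, gamma)
    return best_params
```

The selected triple is then applied once to the independent test set, keeping the model-selection and evaluation data strictly separate.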

Fig. 2. Meta parameter optimization on the Letter data set: (a) optimizing C and γ for a specific k (here k = 100), (b) validation of k

Table 1. Classification results of an SVM on the validation set (va) and the test set
(te). The reference system uses a kernel κd based on the original graph edit distances
d, while the novel kernel κd̂ is based on Isomap distances d̂ computed on a k-NN
graph (the optimal value for k is indicated for each data set). On all but the Web data
set an improvement of the classification accuracy can be observed – two out of four
improvements are statistically significant.

Data set         κd = exp(−d/γ)        κd̂ = exp(−d̂/γ)
                 va       te           va       te           k

Letter           96.40    94.93        96.27    95.47       100
Fingerprint      82.33    81.95        82.33    82.70 ◦      40
Molecule         98.00    97.00        98.40    97.60 ◦     165
Mutagenicity     72.40    68.60        73.60    69.50        90
Web              81.79    82.95        77.95    77.56 •      20

◦ Statistically significant improvement over the reference system (Z-test with α = 0.05).
• Statistically significant deterioration over the reference system (Z-test with α = 0.05).

In Table 1 the classification accuracy on all data sets achieved by the reference
system and our novel approach are provided for both the validation set and the
test set. Additionally, the number of considered neighbors in the k-NN graph is
indicated. On the validation sets we observe that in three out of five cases our
novel approach achieves equal or better classification results than the reference
method. In the test case, on four out of five data sets the kernel based on Isomap
graph edit distances outperforms the original kernel. Two of these improvements
are statistically significant. Overall only one deterioration is observed by our
novel approach. Hence, we conclude that it is clearly beneficial to apply Isomap
to the edit distances before the transformation to a kernel is carried out.

5 Conclusions
In the present paper a graph kernel based on graph edit distances is extended
such that pairwise edit distances are non-linearly transformed before they are

turned into kernel values. For the non-linear mapping the Isomap procedure
is used. This procedure builds an auxiliary graph, the so-called k-NN graph,
where the nodes represent the underlying objects (graphs) and the edges con-
nect a particular object with its k nearest neighbors according to graph edit
distance. Based on this neighborhood graph, the shortest path between two en-
tities, computed by Dijkstra's algorithm, is used as the new pairwise distance. In
the experimental section of the present paper, a classification task is carried out
on five different graph data sets. As classifier, an SVM is employed. The reference
system's kernel is derived from the original graph edit distances, while the novel
kernel is derived from Isomap graph edit distances. The SVM based on the latter
kernel outperforms the former on four out of five data sets (twice with
statistical significance).

Acknowledgments

We would like to thank B. Haasdonk and Michel Neuhaus for valuable discussions
and hints regarding our similarity kernel. This work has been supported by the
Swiss National Science Foundation (Project 200021-113198/1) and by the Swiss
National Center of Competence in Research (NCCR) on Interactive Multimodal
Information Management (IM2).

References
1. Tenenbaum, J., de Silva, V., Langford, J.: A global geometric framework for non-
linear dimensionality reduction. Science 290, 2319–2323 (2000)
2. Saul, L., Weinberger, K., Sha, F., Ham, J., Lee, D.: Spectral Methods for Dimen-
sionality Reduction. In: Semi-Supervised Learning. MIT Press, Cambridge (2006)
3. Conte, D., Foggia, P., Sansone, C., Vento, M.: Thirty years of graph matching
in pattern recognition. Int. Journal of Pattern Recognition and Artificial Intelli-
gence 18(3), 265–298 (2004)
4. Bunke, H., Allermann, G.: Inexact graph matching for structural pattern recogni-
tion. Pattern Recognition Letters 1, 245–253 (1983)
5. Xiao, B., Torsello, A., Hancock, E.R.: Isotree: Tree clustering via metric embedding.
Neurocomputing 71, 2029–2036 (2008)
6. Dijkstra, E.: A note on two problems in connexion with graphs. Numerische Math-
ematik 1, 269–271 (1959)
7. Riesen, K., Neuhaus, M., Bunke, H.: Bipartite graph matching for computing the
edit distance of graphs. In: Escolano, F., Vento, M. (eds.) GbRPR 2007. LNCS,
vol. 4538, pp. 1–12. Springer, Heidelberg (2007)
8. Neuhaus, M., Bunke, H.: Bridging the Gap Between Graph Edit Distance and
Kernel Machines. World Scientific, Singapore (2007)
9. Gärtner, T.: Kernels for Structured Data. World Scientific, Singapore (2008)
10. Kashima, H., Tsuda, K., Inokuchi, A.: Marginalized kernels between labeled
graphs. In: Proc. 20th Int. Conf. on Machine Learning, pp. 321–328 (2003)
11. Jain, B., Geibel, P., Wysotzki, F.: SVM learning with the Schur-Hadamard inner
product for graphs. Neurocomputing 64, 93–105 (2005)

12. Berg, C., Christensen, J., Ressel, P.: Harmonic Analysis on Semigroups. Springer,
Heidelberg (1984)
13. Haasdonk, B.: Feature space interpretation of SVMs with indefinite kernels. IEEE
Transactions on Pattern Analysis and Machine Intelligence 27(4), 482–492 (2005)
14. Riesen, K., Bunke, H.: IAM graph database repository for graph based pattern
recognition and machine learning. In: da Vitoria Lobo, N., et al. (eds.) Struc-
tural, Syntactic, and Statistical Pattern Recognition. LNCS, vol. 5342, pp. 287–297.
Springer, Heidelberg (2008)
On Computing Canonical Subsets of Graph-Based
Behavioral Representations

Walter C. Mankowski, Peter Bogunovich, Ali Shokoufandeh, and Dario D. Salvucci

Drexel University
Department of Computer Science
Philadelphia PA 19104, USA
{walt,pjb38,ashokouf,salvucci}@drexel.edu

Abstract. The collection of behavior protocols is a common practice in human


factors research, but the analysis of these large data sets has always been a tedious
and time-consuming process. We are interested in automatically finding canoni-
cal behaviors: a small subset of behavioral protocols that is most representative
of the full data set, providing a view of the data with as few protocols as possi-
ble. Behavior protocols often have a natural graph-based representation, yet there
has been little work applying graph theory to their study. In this paper we extend
our recent algorithm by taking into account the graph topology induced by the
paths taken through the space of possible behaviors. We applied this technique to
find canonical web-browsing behaviors for computer users. By comparing iden-
tified canonical sets to a ground truth determined by expert human coders, we
found that this graph-based metric outperforms our previous metric based on edit
distance.

1 Introduction

In many domains involving the analysis of human behavior, data are often collected
in the form of time-series known as behavioral protocols — sequences of actions per-
formed during the execution of a task. Behavioral protocols offer a rich source of in-
formation about human behavior and have been used, for example, to examine how
computer users perform basic tasks (e.g., [1]), how math students solve algebra prob-
lems (e.g., [2]), and how drivers steer a vehicle down the road (e.g., [3]). However, the
many benefits of behavioral protocols come with one significant limitation: The typi-
cally sizable amount of data often makes it difficult, if not impossible, to analyze the
data manually. At times, researchers have tried to overcome this limitation by using
some form of aggregation in order to make sense of the data (e.g., [4,5]). While this
aggregation has its merits in seeing overall behavior, it masks potentially interesting
patterns in individuals and subsets of individuals. Alternatively, researchers have some-
times laboriously studied individual protocols by hand to identify interesting behaviors
(e.g. [6,7]). Although some work has been done on automated protocol analysis, such
techniques focus on matching observed behaviors to the predictions of a step-by-step
process model (e.g. [8,9]), and often such models are not available and/or their devel-
opment is infeasible given the complexity of the behaviors.

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 215–222, 2009.

c Springer-Verlag Berlin Heidelberg 2009
216 W.C. Mankowski et al.

In our previous work we have introduced the notion of canonical behaviors as a novel
way of providing automated analysis of behavioral protocols [10]. Canonical behaviors
are a small subset of behavioral protocols that is most representative of the full data set,
providing a reasonable “big picture” view of the data with as few protocols as possible.
In contrast with previous techniques, our method identifies the canonical behavior pat-
terns without any a priori step-by-step process model; all that is needed is a similarity
measure between pairs of behaviors. To illustrate our approach in a real-world domain,
we applied the method to the domain of web browsing. We found that the canonical
browsing paths found by our algorithm compared well with those identified by two ex-
pert human coders with significant experience in cognitive task analysis and modeling.
However, our technique was limited by the fact that our similarity measure treated each
browsing path as a string, ignoring the underlying graph structure of the web site. In
this paper we explore a graph-based similarity measure which takes into account the
effects of graph topology when computing the similarity between two patterns.
The remainder of this paper is structured as follows. In Sect. 2 we describe our
canonical set algorithm and our new similarity metric. In Sect. 3 we review our web
browsing experiment from [10]. In Sect. 4 we compare the results of our new metric
with those from our previous experiment. Finally in Sect. 5 we summarize our findings
and discuss possible future directions of research.

2 Finding Canonical Behaviors

At a high level, our goal in finding canonical behavior patterns is to reduce a large set
of protocols to a smaller subset that is most representative of the full data set. We define
a canonical set of behaviors as a subset such that the behaviors within the subset are
minimally similar to each other and are maximally similar to those behaviors not in the
subset.
Our technique for finding canonical behavior patterns derives from work on the
canonical set problem. Given a set of patterns P = {p1 , . . . , pn } and a similarity func-
tion S : P × P → IR≥0 , the canonical set problem is to find a subset P  ⊆ P that best
characterizes the elements of P with respect to S. The key aspects of our method are an
approximation algorithm for the canonical set problem, and the specification of an ap-
propriate similarity metric for the particular problem being modeled. We now describe
each in turn.

2.1 Modeling Canonical Sets as Graphs

Exact solutions to the canonical set problem require integer programming, which is
known to be NP-Hard [11]. Denton et al. [12] have developed an approximation algo-
rithm using semidefinite programming which has been shown to work very well on a
wide variety of applications. First, a complete graph G is constructed such that each
pattern (in this case, a behavior protocol) is a vertex, and each edge is given a weight
such that w(u, v) is the similarity of the patterns corresponding to the vertices u and
v. Finally, we find the canonical set by computing a cut that bisects the graph into two
subsets, as shown in Fig. 1.

Fig. 1. Canonical-set graph with behaviors at vertices and edge weights corresponding to be-
havioral similarities (from [10]). Finding the canonical set can be expressed as an optimization
problem, where the goal is to minimize the weights of the intra edges while simultaneously max-
imizing the weights of the cut edges.

Algorithm 1. Approximation of Canonical Sets [12]


1. Construct an edge-weighted graph G(P) from the set of patterns P and the similarity func-
tion S : P × P → IR≥0
2. Form a semidefinite program with the combined objective of minimizing the weights of the
intra edges and maximizing the weights of the cut edges (see Fig. 1).
3. Solve the semidefinite program from step 2 using the algorithm in [13], obtaining positive
semidefinite matrix X ∗ .
4. Compute the Cholesky decomposition X ∗ = V t V.
5. Construct indicator variables y1 , . . . , yn , yn+1 from the matrix V.
6. Form the canonical set V ∗ .

The task of determining the proper graph cut to find the canonical set can be ex-
pressed as an optimization problem, where the objective is to minimize the sum of the
weights of the intra edges — those edges between vertices within the canonical set,
as shown in Fig. 1 — while simultaneously maximizing the sum of the weights of the
cut edges — those edges between vertices in the canonical set and those outside the
set. This optimization is known to be intractable [11] and thus Denton et al. employ
an approximation algorithm (please see Algorithm 1): they formulate the canonical
set problem as an integer programming problem, relax it to a semidefinite program,
and then use an off-the-shelf solver [13] to find the approximate solution. Please refer
to [12] for a full derivation and description of the algorithm.
The algorithm includes one free parameter, λ ∈ [0, 1], which scales the weighting
given to cut edges versus intra edges. Higher values of λ favor maximizing the cut edge
weights, resulting in fewer but larger subsets of patterns; lower values favor minimizing
the intra edge weights, resulting in smaller, more numerous subsets.
There are two main advantages of the canonical set algorithm compared to many
similar methods of extracting key items from sets. First, it is an unsupervised algo-
rithm; no training dataset is necessary. Second, no a priori knowledge of the number of
representative elements (in this case, behaviors) is needed. Both the sets themselves and

the most representative elements of the sets arise naturally from the algorithm. As a re-
sult, the canonical set algorithm has applications in a wide variety of machine learning
areas, for example image matching [14] and software engineering [15].

2.2 Graph-Based Similarity Measures


A critical aspect of finding canonical sets is the definition of some measure that quan-
tifies the similarity (or, inversely, the distance) between two given patterns. Let S(x, y)
be the similarity between two patterns x and y. Clearly S is highly dependent upon the
nature of the domain being studied. For example, in image matching, the earth-mover’s
distance [16] might be an appropriate similarity measure, while in an eye-movement
study the similarity measure might take into account the sequence of items fixated upon
(e.g., [7]).
In our original work on web browsing [10], we used a simple edit-distance metric
[17] to compute the similarity between browsing protocols. Intuitively, the edit-distance
ED(x, y) between two protocols measures the minimum cost of inserting, deleting, or
substituting actions to transform one sequence of web pages to the other. We assigned
a uniform cost of 1 to all insertions, deletions, and substitutions. The edit-distance cost
was converted to similarity as S(x, y) = 1/(1 + ED(x, y)).
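This uniform-cost edit distance is the classical Levenshtein distance computed by dynamic programming; a minimal sketch over page sequences (function names are our own) is:

```python
def edit_distance_seq(x, y):
    """Levenshtein distance between two page sequences, with uniform cost 1
    for every insertion, deletion and substitution."""
    m, n = len(x), len(y)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i            # delete all of x[:i]
    for j in range(n + 1):
        d[0][j] = j            # insert all of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,                       # deletion
                          d[i][j - 1] + 1,                       # insertion
                          d[i - 1][j - 1] + (x[i - 1] != y[j - 1]))  # substitution
    return d[m][n]

def similarity(x, y):
    """Convert the edit-distance cost to a similarity, as in the text."""
    return 1.0 / (1.0 + edit_distance_seq(x, y))
```

For example, two browsing paths that differ in a single visited page have edit distance 1 and hence similarity 0.5.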
While the edit-distance similarity measure worked well overall, it had one drawback.
Web sites are by their nature graph-based (with pages as nodes and links as edges), but
the edit-distance measure ignores this and treats each path taken by the subjects as a
simple sequence of pages. We hypothesized that our performance would be improved
by using a measure which takes into account the underlying topology of the graphs.
As our new similarity metric, we chose to use a modified version of Pelillo’s sub-
graph isomorphism algorithm [18] due to its flexibility in encoding node similarity
constraints, especially if the graphs are induced from a fixed topology. In brief, the al-
gorithm works as follows (please see Algorithm 2). Given two graphs U and V , an
association graph G is built from the product graph of U and V . A vertex {u, v} exists
in G for every pair of vertices u in U and v in V . An edge is added between vertices
{u1 , v1 } and {u2 , v2 } in G only if the shortest path distance between vertices u1 and
u2 in U is equal to the shortest path distance between vertices v1 and v2 in V . Cliques
found in the association graph using the Motzkin-Strauss formulation [19] correspond
to subgraph isomorphisms between the original graphs U and V .
Since in our experiment we were searching for isomorphisms between induced sub-
graphs of the same graph (namely, the web site), we modified the construction of the
association graph slightly to enforce level consistency — a web site can be thought of
as a tree, and we checked that the two paths ended at the same level or depth of the tree.
To accomplish this, we only add an edge to the association graph if the distances are
the same and the two pairs of vertices are identical.

3 Data Collection
To test if the canonical set algorithm could be applied to find canonical behavior proto-
cols, we collected data from users performing typical web-browsing tasks on a univer-
sity web site [10]. The users were given a set of 32 questions covering a range of topics

Algorithm 2. Modified version of Pelillo's subgraph isomorphism algorithm

Given two graphs U and V , representing two paths taken by users through a web site:

1. Compute all-pairs shortest path distances for U and V [20].
2. Build an association graph G from the product graph of U and V :
   (a) Add vertex {u, v} to G for every pair of vertices u in U and v in V .
   (b) Add an edge between vertices {u1 , v1 } and {u2 , v2 } if:
       – the shortest path distance between vertices u1 and u2 in U is equal to the shortest
         path distance between vertices v1 and v2 in V , and
       – u1 and u2 refer to the same URL, and
       – v1 and v2 refer to the same URL.
3. Find a clique in G using the Motzkin-Strauss formulation [19].
4. The clique corresponds to a subgraph isomorphism between U and V .
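Step 2 of the algorithm above can be sketched as follows (an illustrative encoding of our own; the all-pairs shortest-path distances from step 1 are assumed to be precomputed as symmetric dictionaries keyed by vertex pairs):

```python
from itertools import combinations

def association_graph(U, V, dist_u, dist_v):
    """Association graph for the modified subgraph-isomorphism test.
    U and V are lists of node ids (URLs along two browsing paths); dist_u
    and dist_v give all-pairs shortest-path distances within each path graph.
    Per the level-consistency modification, an edge additionally requires the
    paired vertices to refer to the same URL."""
    # One association-graph vertex per pair (u, v) from the product of U and V.
    vertices = [(u, v) for u in U for v in V]
    edges = set()
    for (u1, v1), (u2, v2) in combinations(vertices, 2):
        if (dist_u[(u1, u2)] == dist_v[(v1, v2)]   # equal shortest-path distances
                and u1 == v1 and u2 == v2):        # same-URL constraint
            edges.add(frozenset({(u1, v1), (u2, v2)}))
    return vertices, edges
```

Cliques in this graph (step 3) then correspond to level-consistent common subgraphs of the two paths.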

Fig. 2. Sample analysis graphs (from [10]). The canonical behaviors found by our algorithm are
shown in bold in (b), and the other behaviors are labeled according to their nearest neighbor.

related to finding information about athletic programs, academic departments, and so
on. We also required a "ground truth" against which to compare the canonical sets
found by our method. For this purpose, we recruited two experts (a professor and an ad-
vanced graduate student) with significant experience in cognitive task analysis and mod-
eling. We asked the experts, given the sequence of URLs visited by the users for each
question, to identify subsets that they felt represented distinct behaviors. Clearly the
experts could each have their own notion of what would constitute “similar” and “dif-
ferent” behaviors, analogous to the λ parameter in the canonical set algorithm. We left

this undefined and allowed them to use their own judgments to decide on the best par-
tition for each question.
Figure 2 shows an example of the automated and expert results (from [10]) for an
individual question (“What is the phone number of Professor . . . ?”) to illustrate our
analysis in detail. Each vertex represents a single web page (labeled with a unique inte-
ger) and each directed edge represents a clicked link from one page to another taken by
one of the users. The expert (graph a) found 6 sets of behaviors: sets A and B represent
different ways of clicking through the department web page to get to the professor’s
home page; sets C and D represent different ways of clicking through to the site’s direc-
tory search page (vertex 14); and sets E and F represent slight variations on sets C and
D. The canonical set algorithm (graph b, with λ = 0.36) identified 4 canonical behaviors
for this same question; these are shown in bold in the figure, and the other behaviors
are labeled according to their nearest neighbor. The behaviors found by the algorithm
correspond directly to the expert’s sets A–D, but instead of splitting out sets E and F,
the algorithm (in part due to the value of λ) grouped these variations with the nearest
canonical set D.

4 Analysis and Results


To compare the performance of our graph-based similarity measure with our previous
metric based on the edit distance, we used the well-known Rand index [21] to compare
the clusterings found by the canonical set algorithm using each measure. Given a set of
n elements S = {O1 , . . . , On }, let X = {x1 , . . . , xr } and Y = {y1 , . . . , ys } represent
two ways of partitioning S into r and s subsets, respectively. Then let a be the number
of pairs of elements that are in the same partition in X and also in the same partition in
Y ; let b be the number of pairs in the same partition in X but in different partitions in
Y ; let c be the number of pairs in different partitions in X but in the same partition in
Y ; and let d be the number of pairs in different partitions in both X and Y . Then the
Rand index is simply
\[
R = \frac{a+d}{a+b+c+d} = \frac{a+d}{\binom{n}{2}} \,. \tag{1}
\]
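Equation (1) can be computed directly from two cluster-label assignments (a small sketch with our own naming; the numerator a + d counts the pairs on which the two partitions agree, either both together or both apart):

```python
from itertools import combinations

def rand_index(x, y):
    """Rand index between two partitions of the same n elements, each given
    as a list assigning every element a subset id (x[i] is element i's
    subset in partition X, y[i] in partition Y)."""
    agree, pairs = 0, 0
    for i, j in combinations(range(len(x)), 2):
        pairs += 1
        same_x = x[i] == x[j]
        same_y = y[i] == y[j]
        if same_x == same_y:   # contributes to a (together/together) or d (apart/apart)
            agree += 1
    return agree / pairs
```

Note that the index depends only on which elements are grouped together, not on the subset labels themselves, so relabelled but identical partitions score 1.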
We compared the performance of the edit distance and association graph measures across
a wide range of possible values of λ. For each λ we computed the canonical sets for each
question using both measures. We compared the resulting partitions with those found by
our experts using the Rand index, and then computed the average value of R across all
questions. The results are shown in Fig. 3. As the graphs illustrate, the association graph
measure produced partitions that more closely matched both experts than our previous
edit distance measure across nearly the entire range of λ values we tested.
The shapes of the graphs in Fig. 3 are somewhat surprising, as the curves might be
expected to be concave with a peak at the actual λ used by each expert. There are several
possible explanations for this. First, our canonical set algorithm is not symmetric with
respect to values of λ. When λ is very high (above roughly 0.9 in this experiment) only
one canonical pattern is found. However, the inverse is not the case: when λ is very low,
the algorithm does not consider every element in the set to be canonical. Second, it is
possible that our experts did not use a single λ in their evaluations, but rather varied


Fig. 3. Rand index comparison of edit distance and association graph similarity measures for the
two experts across a range of λ values. The association graph measure outperforms edit distance
across nearly the entire range for both experts.

their sense of “similar” and “different” behaviors depending on the particular behaviors
they observed for each question. While the selection of the correct λ is beyond the scope
of this paper, it is something we plan to study further in our future research.

5 Discussion
We have presented an automated method of finding canonical subsets of behavior pro-
tocols, which uses a graph-based representation of the data. The collection of these types
of time series is common in psychology and human factors research. While these data
can often be naturally represented as graphs, there has been relatively little work in
applying graph theory to their study. As users move through the space of possible be-
haviors in a system, their paths naturally induce a graph topology. We have shown that
by taking into account this topology, improved results may be obtained over methods
which ignore the underlying graph structure. We believe that this work is an important
first step in the application of graph-based representations and algorithms to the analysis
of human behavior protocols.

Acknowledgments. This work was supported by ONR grants #N00014-03-1-0036 and
#N00014-08-1-0925 and NSF grant #IIS-0426674.

References
1. Card, S.K., Newell, A., Moran, T.P.: The Psychology of Human-Computer Interaction.
Lawrence Erlbaum Associates, Hillsdale (1983)
2. Milson, R., Lewis, M.W., Anderson, J.R.: The Teacher’s Apprentice Project: Building an
Algebra Tutor. In: Artificial Intelligence and the Future of Testing, pp. 53–71. Lawrence
Erlbaum Associates, Hillsdale (1990)
222 W.C. Mankowski et al.

3. Salvucci, D.D.: Modeling driver behavior in a cognitive architecture. Human Factors 48(2),
362–380 (2006)
4. Chi, E.H., Rosien, A., Supattanasiri, G., Williams, A., Royer, C., Chow, C., Robles, E., Dalal,
B., Chen, J., Cousins, S.: The Bloodhound project: automating discovery of web usability
issues using the InfoScent™ simulator. In: CHI 2003: Proceedings of the SIGCHI conference
on Human factors in computing systems, pp. 505–512. ACM, New York (2003)
5. Cutrell, E., Guan, Z.: What are you looking for? An eye-tracking study of information usage
in web search. In: CHI 2007: Proceedings of the SIGCHI conference on Human factors in
computing systems, pp. 407–416. ACM, New York (2007)
6. Ericsson, K.A., Simon, H.A.: Protocol analysis: verbal reports as data, Revised edn. MIT
Press, Cambridge (1993)
7. Salvucci, D.D., Anderson, J.R.: Automated eye-movement protocol analysis. Human-
Computer Interaction 16(1), 39–86 (2001)
8. Ritter, F.E., Larkin, J.H.: Developing process models as summaries of HCI action sequences.
Human-Computer Interaction 9(3), 345–383 (1994)
9. Smith, J.B., Smith, D.K., Kupstas, E.: Automated protocol analysis. Human-Computer Inter-
action 8(2), 101–145 (1993)
10. Mankowski, W.C., Bogunovich, P., Shokoufandeh, A., Salvucci, D.D.: Finding canonical
behaviors in user protocols. In: CHI 2009: Proceedings of the SIGCHI conference on Human
factors in computing systems, pp. 1323–1326. ACM, New York (2009)
11. Garey, M.R., Johnson, D.S.: Computers and Intractability: A Guide to the Theory of NP-
Completeness. W.H. Freeman and Co., San Francisco (1979)
12. Denton, T., Shokoufandeh, A., Novatnack, J., Nishino, K.: Canonical subsets of image fea-
tures. Computer Vision and Image Understanding 112(1), 55–66 (2008)
13. Toh, K., Todd, M., Tütüncü, R.: SDPT3 — a MATLAB software package for semidefinite
programming. Optimization Methods and Software 11, 545–581 (1999)
14. Novatnack, J., Denton, T., Shokoufandeh, A., Bretzner, L.: Stable bounded canonical sets
and image matching. In: Rangarajan, A., Vemuri, B.C., Yuille, A.L. (eds.) EMMCVPR 2005.
LNCS, vol. 3757, pp. 316–331. Springer, Heidelberg (2005)
15. Kothari, J., Denton, T., Mancoridis, S., Shokoufandeh, A.: On computing the canonical fea-
tures of software systems. In: Proceedings of the 13th Working Conference on Reverse En-
gineering (WCRE), pp. 93–102 (2006)
16. Rubner, Y., Tomasi, C., Guibas, L.J.: The earth mover’s distance as a metric for image re-
trieval. International Journal of Computer Vision 40(2), 99–121 (2000)
17. Levenshtein, V.: Binary codes capable of correcting deletions, insertions and reversals. Soviet
Physics Doklady 10, 707–710 (1966)
18. Pelillo, M., Siddiqi, K., Zucker, S.W.: Matching hierarchical structures using association
graphs. IEEE Transactions on Pattern Analysis and Machine Intelligence 21(11), 1105–1120
(1999)
19. Motzkin, T., Straus, E.: Maxima for graphs and a new proof of a theorem of Turán. Canadian
Journal of Mathematics 17(4), 533–540 (1964)
20. Floyd, R.W.: Algorithm 97: Shortest path. Communications of the ACM 5(6), 345 (1962)
21. Rand, W.M.: Objective criteria for the evaluation of clustering methods. Journal of the Amer-
ican Statistical Association 66(336), 846–850 (1971)
Object Detection by Keygraph Classification

Marcelo Hashimoto and Roberto M. Cesar Jr.

Instituto de Matemática e Estatística - IME,


Universidade de São Paulo - USP, São Paulo, Brazil
{mh,roberto.cesar}@vision.ime.usp.br

Abstract. In this paper, we propose a new approach for keypoint-based
object detection. Traditional keypoint-based methods consist of classifying
individual points and using pose estimation to discard misclassifications.
Since a single point carries no relational features, such methods inherently
restrict the usage of structural information. Therefore, the classifier con-
siders mostly appearance-based feature vectors, thus requiring computa-
tionally expensive feature extraction or complex probabilistic modelling
to achieve satisfactory robustness. In contrast, our approach consists of
classifying graphs of keypoints, which incorporates structural information
during the classification phase and allows the extraction of simpler feature
vectors that are naturally robust. In the present work, 3-vertex graphs
have been considered, though the methodology is general and larger-order
graphs may be adopted. Successful experimental results obtained for real-
time object detection in video sequences are reported.

1 Introduction
Object detection is one of the most classic problems in computer vision and
can be informally defined as follows: given an image representing an object and
another, possibly a video frame, representing a scene, decide if the object belongs
to the scene and determine its pose if it does. Such pose consists not only of the
object location, but also of its scale and rotation. The object might not even be
necessarily rigid, in which case more complex deformations are possible. We will
refer to the object image as our model and, for the sake of simplicity, refer to
the scene image simply as our frame.
Recent successful approaches to this problem are based on keypoints [1,2,3,4].
In such approaches, instead of the model itself, the algorithm tries to locate
a subset of points from the object. The chosen points are those that satisfy
desirable properties, such as ease of detection and robustness to variations of
scale, rotation and brightness. This approach reduces the problem to supervised
classification where each model keypoint represents a class and feature vectors
of the frame keypoints represent input data to the classifier.
A well-known example is the SIFT method proposed by Lowe [1]. The most
important aspect of this method relies on the very rich feature vectors calculated
for each keypoint: they are robust and distinctive enough to allow remarkably
good results in practice even with few vectors per class and a simple nearest-
neighbor approach. More recent feature extraction strategies, such as the SURF

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 223–232, 2009.

© Springer-Verlag Berlin Heidelberg 2009
224 M. Hashimoto and R.M. Cesar Jr.

method proposed by Bay, Tuytelaars and van Gool [2], are reported to perform
even better.
The main drawback of using rich feature vectors is that they are usually
complex or computationally expensive to calculate, which can be a shortcoming
for real-time detection in videos, for example. Lepetit and Fua [3] worked around
this limitation by shifting much of the computational burden to the training
phase. Their method uses simple and cheap feature vectors, but extracts them
from several different images artificially generated by applying changes of scale,
rotation and brightness to the model. Therefore, robustness is achieved not by
the richness of each vector, but by the richness of the training set as a whole.
Regardless of the choice among such methods, keypoint-based approaches
traditionally follow the same general framework, described below.

Training

1. Detect keypoints in the model.

2. Extract feature vectors from each keypoint.

3. Use the feature vectors to train a classifier whose classes are the keypoints.
The accuracy must be reasonably high, but not necessarily near-perfect.

Classification

1. Detect keypoints in the frame.

2. Extract feature vectors from each keypoint.

3. Apply the classifier to the feature vectors in order to decide if each frame
keypoint is sufficiently similar to a model keypoint. As near-perfect accu-
racy is not required, several misclassifications might be made in this step.

4. Use an estimation algorithm to determine a pose spatially coherent with
a large enough number of classifications made during the previous step.
Classifications disagreeing with such a pose are discarded as outliers.

A shortcoming of this framework is that structural information, such as geo-
metric and topological relations between the points, has its usage inherently
limited by the fact that classes are represented by single points. Therefore, most
of the burden of describing a keypoint lies on individual appearance information,
such as the color of pixels close to it. The idea of using structure to overcome
this limitation is certainly not new: the seminal work of Schmid and Mohr [5]
used geometric restrictions to refine keypoint classification, Tell and Carlsson [6]
obtained substantial improvements with topological constraints and, more re-
cently, Özuysal, Fua and Lepetit [4] proposed a probabilistic modelling scheme

where small groups of keypoints are considered. However, since those works fol-
low the framework of associating classes to individual points, there is still an
inherent underuse of structural information.
In this paper, we propose an alternative framework that, instead of classifying
single keypoints, classifies sets of keypoints using both appearance and structural
information. Since graphs are mathematical objects that naturally model rela-
tions, they are adopted to represent such sets. Therefore, the proposed approach
is based on supervised classification of graphs of keypoints, henceforth referred
as keygraphs. A general description of our framework is given below.
Training

1. Detect keypoints in the model.

2. Build a set of keygraphs whose vertices are the detected keypoints.

3. Extract feature vectors from each keygraph.

4. Use the feature vectors to train a classifier whose classes are the keygraphs.
The accuracy must be reasonably high, but not necessarily near-perfect.

Classification

1. Detect keypoints in the frame.

2. Build a set of keygraphs whose vertices are the detected keypoints.

3. Extract feature vectors from each keygraph.

4. Apply the classifier to the feature vectors in order to decide if each frame
keygraph is sufficiently similar to a model keygraph. As near-perfect accu-
racy is not required, several misclassifications might be made in this step.

5. Use an estimation algorithm to determine a pose spatially coherent with
a large enough number of classifications made during the previous step.
Classifications disagreeing with such a pose are discarded as outliers.

The idea of using graphs built from keypoints to detect objects is also not new:
Tang and Tao [7], for example, had success with dynamic graphs defined over
SIFT points. Their work, however, shifts away from the classification approach
and tries to solve the problem with graph matching. Our approach, in contrast,
still reduces the problem to supervised classification, which is more efficient.
In fact, it can be seen as a generalization of the traditional methods, since a
keypoint is a single-vertex graph.
This paper is organized as follows. Section 2 introduces the proposed frame-
work, focusing on the advantages of using graphs instead of points. Section 3

describes a concrete implementation of the framework, where 3-vertex keygraphs
are used, together with successful experimental results that this implementation
achieved for real-time object detection. Finally, in Section 4 we present our
conclusions.

2 Keygraph Classification Framework


   
A graph is a pair (V, E), where V is an arbitrary set, E ⊆ \binom{V}{2}, and \binom{V}{2} denotes
the family of all subsets of V with cardinality 2. We say that V is the set of vertices
and E is the set of edges. We also say that the graph is complete if E = \binom{V}{2} and
that (V', E') is a subgraph of (V, E) if V' ⊆ V and E' ⊆ E ∩ \binom{V'}{2}. Given a set S,
we denote by G(S) the complete graph whose set of vertices is S.
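The definitions above translate directly into code; the following sketch (our own illustration, not part of the paper) builds G(S) and enumerates its 3-vertex complete subgraphs, the candidate keygraphs used later:

```python
from itertools import combinations

def complete_graph(S):
    """G(S): the complete graph on vertex set S, edges as 2-subsets."""
    V = set(S)
    E = {frozenset(e) for e in combinations(V, 2)}
    return V, E

def triangle_subgraphs(S):
    """All 3-vertex complete subgraphs of G(S), as (vertices, edges) pairs."""
    V, _E = complete_graph(S)
    for T in combinations(sorted(V), 3):
        yield T, {frozenset(e) for e in combinations(T, 2)}
```

For |S| = n there are \binom{n}{3} such triangles, which is why the detectors in Section 2.1 must prune aggressively.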
Those definitions allow us to easily summarize the difference between the
traditional and the proposed frameworks. Both have the same outline: define
certain universe sets from the model and the frame, detect key elements from
those sets, extract feature vectors from those elements, train a classifier with the
model vectors, apply the classifier to the frame vectors and analyze the result
with a pose estimation algorithm. The main difference lies on the first step:
defining the universe set of an image. In the traditional framework, since the
set of keypoints K represents the key elements, this universe is the set of all
image points. In the proposed framework, while the detection of K remains, the
universe is the set of all subgraphs of G(K). In the following subsections, we
describe the fundamental advantages of such difference in three steps: the key
element detection, the feature vector extraction and the pose estimation.

2.1 Keygraph Detection

One of the most evident differences between detecting a keypoint and detecting
a keygraph is the size of the universe set: the number of subgraphs of G(K) is ex-
ponential in the size of K. This implies that a keygraph detector must be much
more restrictive than a keypoint detector if we are interested in real-time per-
formance. Such necessary restrictiveness, however, is not hard to obtain because
graphs have structural properties to be explored that individual keypoints do
not. Those properties can be classified into three types: combinatorial, topological
and geometric. Figure 1 shows how those three types of structural properties can
be used to gradually restrict the number of considered graphs.

2.2 Partitioning the Feature Vectors

A natural approach for extracting feature vectors from keygraphs is to translate
all the keygraph properties, regardless of whether they are structural or appearance-
based, into scalar values. However, we have adopted a more refined approach that
takes better advantage of the power of structural information.
This approach consists of keeping the feature vectors themselves appearance-
based, but partitioning the set of vectors according to structural properties.


Fig. 1. Gradual restriction by structural properties. Column (a) shows two graphs
with different combinatorial structure. Column (b) shows two graphs combinatorially
equivalent but topologically different. Finally, column (c) shows two graphs with the
same combinatorial and topological structure, but different geometric structure.


Fig. 2. Model keygraph (a) and a frame keygraph (b) we want to classify. From the
topological structure alone we can verify that the latter cannot be matched with the
former: the right graph does not have a vertex inside the convex hull of the others.
Furthermore, translating this simple boolean property into a scalar value does not make
much sense.

There are two motivations for such an approach: the first one is the fact that a
structural property, alone, may present a strong distinctive power. The second
one is the fact that certain structural properties may assume boolean values for
which a translation to a scalar does not make much sense. Figure 2 gives a simple
example that illustrates the two motivations.
By training several classifiers, one for each subset given by the partition,
instead of just one, we not only satisfy the two motivations above, but we also
improve the classification from both an accuracy and an efficiency point of view.

2.3 Naturally Robust Features

For extracting a feature vector from a keygraph, a natural approach is to merge
multiple keypoint feature vectors extracted from its vertices. However, a more
refined approach may be derived. In traditional methods, a keypoint
feature vector is extracted from color values of the points that belong to a certain


Fig. 3. Comparison of patch extraction (a) and relative extraction (b) with keygraphs
that consist of two keypoints and the edge between them. Suppose there is no variation
of brightness between the two images and consider for each keygraph the mean gray
level relative to all image pixels crossed by its edge. Regardless of scale and rotation,
there should be no large variations between the two means. Therefore, they represent a
naturally robust feature. In contrast, variations in scale and rotation give completely
different patches, and a non-trivial patch extraction scheme is necessary for robustness.

patch around it. This approach is inherently flawed because such patches are not
naturally robust to scale and rotation. Traditional methods work around this flaw
by improving the extraction itself. Lowe [1] uses a gradient histogram approach,
while Lepetit and Fua [3] rely on training with multiple synthetic views.
With keygraphs, in contrast, the flaw does not exist in the first place, because
they are built on sets of keypoints. Therefore, they allow the extraction of relative
features that are naturally robust to scale and rotation without the need of
sophisticated extraction strategies. Figure 3 shows a very simple example.
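The two-keypoint example of Fig. 3 can be sketched as follows: sample the segment between the keypoints at fixed fractions of its length and average the grey levels crossed. Under the stated assumption of unchanged brightness, the mean is insensitive to scale and rotation. This is illustrative code of our own, not the authors' implementation:

```python
def mean_gray_along_edge(img, p, q, samples=64):
    """Mean grey level of pixels sampled along the segment p-q.

    img is a 2D array/list indexed as img[y][x]; p and q are (x, y) keypoints.
    Sampling is parameterised by the fraction t of the edge length, not by
    absolute pixel counts, so the feature is unchanged when the image is
    uniformly scaled or rotated.
    """
    total = 0.0
    for k in range(samples):
        t = k / (samples - 1)
        x = p[0] + t * (q[0] - p[0])
        y = p[1] + t * (q[1] - p[1])
        total += img[int(round(y))][int(round(x))]
    return total / samples
```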

2.4 Pose Estimation by Voting


A particular advantage of the SIFT feature extraction scheme relies on its ca-
pability of assigning, to each feature vector, a scale and rotation relative to the
scale and rotation of the model itself. This greatly reduces the complexity of pose
estimation because each keypoint classification naturally induces a pose that the
object must have in the scene if such classification is correct. Therefore, one can
obtain a robust pose estimation and discard classifier errors by simply follow-
ing a Hough transform procedure: a quantization of all possible poses is made
and each evaluation from the classifier registers a vote for the corresponding
quantized pose. The most voted pose wins.
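The voting procedure described above can be sketched as a Hough-style accumulator over quantized (scale, rotation) cells; the bin widths and the octave-style scale quantization below are our own illustrative choices, not parameters from the paper:

```python
import math
from collections import Counter

def vote_pose(matches, scale_bins=8, angle_bins=36):
    """Hough-style pose accumulation over quantized (scale, rotation) cells.

    matches: iterable of (scale, angle_deg) pose hypotheses, one per
    classification. Returns the winning cell and its vote count;
    classifications outside the winning cell are treated as outliers.
    """
    votes = Counter()
    for scale, angle in matches:
        # Octave-style scale quantization centred on scale 1.0 (our choice).
        s_bin = min(max(int(math.log2(scale)) + scale_bins // 2, 0),
                    scale_bins - 1)
        a_bin = int((angle % 360.0) / (360.0 / angle_bins))
        votes[(s_bin, a_bin)] += 1
    return votes.most_common(1)[0]
```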
The same procedure can be used with keygraphs, because relative properties of
a set of keypoints can be used to infer scale and rotation. It should be emphasized,
however, that the viability of such a strategy depends on how rich the structure
of the considered keygraphs is. Figure 4 has a simple example of how a poorly
chosen structure can cause ambiguity during the pose estimation.

Fig. 4. Example of pose estimation ambiguity. The image on the left indicates the pose
of a certain 2-vertex graph in a frame. If a classifier evaluates this graph as being the
model keygraph indicated in Figure 3, there would be two possible coherent rotations.

3 Implementation and Results

In this section we present the details of an implementation of the proposed
framework, written in C++ with the OpenCV [8] library. To illustrate our
current results with this implementation, we describe an experiment on which
we attempted to detect a book in real-time with a webcam, while varying its
position, scale and rotation. We ran the tests on an Intel® Core™ 2 Duo T7250
at 2.00 GHz with 2 GB of RAM. A 2-megapixel laptop webcam was used for
the detection itself and to take the single book picture used during the training.

3.1 Good Features to Track

For keypoint detection we used the well-known good features to track detector
proposed by Shi and Tomasi [9], which applies a threshold over a certain quality
measure. By adjusting this threshold, we are able to control how strict the
detection is. A good balance between accuracy and efficiency was found with a
threshold that gave us 79 keypoints in the model.
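The quality measure thresholded by this detector is the smaller eigenvalue of the local structure tensor; below is a simplified NumPy sketch of that measure (our own illustration — OpenCV's goodFeaturesToTrack implements the full detector, including the non-maximum suppression we omit):

```python
import numpy as np

def shi_tomasi_quality(img, win=3):
    """Min-eigenvalue corner quality map (Shi-Tomasi), simplified sketch."""
    img = img.astype(float)
    Ix = np.gradient(img, axis=1)
    Iy = np.gradient(img, axis=0)

    def box(a):
        # Sum over a (2*win+1)^2 window via shifted, edge-padded copies.
        pad = np.pad(a, win, mode="edge")
        out = np.zeros_like(a)
        for dy in range(2 * win + 1):
            for dx in range(2 * win + 1):
                out += pad[dy:dy + a.shape[0], dx:dx + a.shape[1]]
        return out

    Sxx, Syy, Sxy = box(Ix * Ix), box(Iy * Iy), box(Ix * Iy)
    # Smaller eigenvalue of the structure tensor [[Sxx, Sxy], [Sxy, Syy]].
    tr, det = Sxx + Syy, Sxx * Syy - Sxy ** 2
    return tr / 2 - np.sqrt(np.maximum((tr / 2) ** 2 - det, 0.0))
```

Flat regions score near zero while corners of a bright square score well above it, which is what the threshold exploits.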

3.2 Thick Scalene Triangles

For keygraph detection we selected 3-vertex complete graphs whose induced
triangle is sufficiently thick and scalene. More formally, this means that each of
the internal angles in the triangle should be larger than a certain threshold and
the difference between any two internal angles is larger than another threshold.
The rationale behind this choice is to increase structural richness: the vertices of
an excessively thin triangle are too close to being collinear, and high similarity
between internal angles could lead to the pose estimation ambiguity problem
mentioned in the previous section.
In our experiment, we established that no internal angle should have less than
5 degrees and no pair of angles should have less than 5 degrees of difference. To
avoid numerical problems, we also added that no pair of vertices should have
less than 10 pixels of distance. Those three thresholds drastically limited the
number of keygraphs: out of 79 · 78 · 77 = 474,474 possible 3-vertex subgraphs,
the detector considered 51,002 keygraphs.
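The three thresholds just described can be sketched as a simple predicate over a vertex triple (our own illustration of the stated rules, not the authors' code):

```python
import math

def is_thick_scalene(p1, p2, p3, min_angle=5.0, min_gap=5.0, min_dist=10.0):
    """Keygraph test: every internal angle above min_angle degrees, every
    pair of angles differing by more than min_gap degrees, and every pair
    of vertices at least min_dist pixels apart."""
    pts = (p1, p2, p3)
    if any(math.dist(pts[i], pts[j]) < min_dist
           for i in range(3) for j in range(i + 1, 3)):
        return False
    angles = []
    for i in range(3):
        a, b, c = pts[i], pts[(i + 1) % 3], pts[(i + 2) % 3]
        v1 = (b[0] - a[0], b[1] - a[1])
        v2 = (c[0] - a[0], c[1] - a[1])
        cosang = (v1[0] * v2[0] + v1[1] * v2[1]) / (math.hypot(*v1) * math.hypot(*v2))
        angles.append(math.degrees(math.acos(max(-1.0, min(1.0, cosang)))))
    if min(angles) <= min_angle:
        return False
    return all(abs(angles[i] - angles[j]) > min_gap
               for i in range(3) for j in range(i + 1, 3))
```

A clearly scalene triangle passes, while a near-equilateral one is rejected by the angle-gap test.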


Fig. 5. Scalene triangle with θ1 < θ2 < θ3 . In this case, if we pass through the three
vertices in increasing order of internal angle, we have a counter-clockwise movement.

The partitioning of the feature vector set is made according to three structural
properties. Two of them are the values of the two largest angles. Notice that, since
the internal angles of a triangle always sum up to 180 degrees, considering all
angles would be redundant. The third property refers to a clockwise or counter-
clockwise direction defined by the three vertices in increasing order of internal
angle. Figure 5 has a simple example.
In our experiment we established a partition in 2 · 36 · 36 = 2592 subsets:
the angles are quantized by dividing the interval (0, 180) in 36 bins. The largest
subset in the partition has 504 keygraphs, a drastic reduction from the 51,002
possible ones.
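The partition key for a keygraph can then be computed from the two largest quantized angles together with the orientation bit; the helper below is our own sketch of the quantization just described (36 bins of 5 degrees over (0, 180)):

```python
def partition_key(angles_with_vertices):
    """Map a triangle to one of the 2 * 36 * 36 feature-vector subsets.

    angles_with_vertices: list of (internal_angle_deg, (x, y)), one entry
    per vertex. Only the two largest angles are kept, since the three
    internal angles always sum to 180 degrees.
    """
    ordered = sorted(angles_with_vertices)      # increasing internal angle
    a2, a3 = ordered[1][0], ordered[2][0]       # two largest angles
    bins = (int(a2 // 5), int(a3 // 5))         # 36 bins over (0, 180)
    # Orientation bit: sign of the cross product of the vertex sequence
    # taken in increasing order of internal angle (cf. Fig. 5).
    (x1, y1), (x2, y2), (x3, y3) = (v for _, v in ordered)
    ccw = (x2 - x1) * (y3 - y1) - (y2 - y1) * (x3 - x1) > 0
    return (ccw,) + bins
```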

3.3 Corner Chrominance Extraction


Figure 6 illustrates the scheme for extracting a feature vector from a keygraph.
Basically, the extraction consists of taking several internal segments and, for
each one of them, calculating the mean chrominance of all pixels intersected by
the segment.
The chrominance values are obtained by converting the model to the HSV
color space and considering only the hue and saturation components. The seg-
ments are obtained by evenly partitioning bundles of lines projected from the

Fig. 6. Corner chrominance extraction. The gray segments define a limit for the
size of the projected lines. The white points defining the extremities of those lines
are positioned according to a fraction of the edge they belong to. In the above example
the fraction is 1/3.

vertices. Finally, the size of those projected lines is limited by a segment whose
extremities are points in the keygraph edges.
This scheme is naturally invariant to rotation. Invariance to brightness is
ensured by the fact that we are considering only the chrominance and ignoring
the luminance. Finally, the invariance to scale is ensured by the fact that the
extremities mentioned above are positioned in the edges according to a fraction
of the size of the edge that they belong to, and not by any absolute value.
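The invariances described above can be sketched as follows: sample a segment at fixed fractions of its length and average only hue and saturation, discarding value. This is an illustrative simplification of our own, not the authors' OpenCV implementation:

```python
import colorsys

def mean_chrominance(img_rgb, p, q, fraction=1/3, samples=32):
    """Mean (hue, saturation) along the part of segment p-q between the
    points at `fraction` and 1 - `fraction` of its length.

    img_rgb[y][x] is an (r, g, b) triple in [0, 1]. Ignoring the value
    channel gives brightness invariance; positioning the extremities by
    fractions of the edge length (not absolute pixels) gives scale
    invariance, and sampling along the edge itself gives rotation
    invariance.
    """
    t0, t1 = fraction, 1.0 - fraction
    hsum = ssum = 0.0
    for k in range(samples):
        t = t0 + (t1 - t0) * k / (samples - 1)
        x = int(round(p[0] + t * (q[0] - p[0])))
        y = int(round(p[1] + t * (q[1] - p[1])))
        h, s, _v = colorsys.rgb_to_hsv(*img_rgb[y][x])
        hsum += h
        ssum += s
    return hsum / samples, ssum / samples
```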

3.4 Results with Delaunay Triangulation


We could not use, during the classification phase, the same keygraph detector
we used during the training phase: it does not reduce the keygraph set size
enough for real-time performance. Instead, we use an alternative detector that
yields a smaller subset of the set the training detector would give.
This alternative detector consists of selecting thick scalene triangles from a
Delaunay triangulation of the keypoints. A triangulation is a good source of
triangles because it covers the entire convex hull of the keypoints. And the De-
launay triangulation, in particular, can be calculated very efficiently, for example
with the O(n log n) sweepline algorithm of Fortune [10].
Figure 7 shows some resulting screenshots. A full video can be seen at
http://www.vision.ime.usp.br/~mh/gbr2009/book.avi.

Fig. 7. Results showing object detection robust to scale and rotation

4 Conclusion

We presented a new framework for keypoint-based object detection that consists
of classifying keygraphs. With an implementation of this framework, where the

keygraphs are thick scalene triangles, we have shown successful results for real-
time detection after training with a single image.
The framework is very flexible and is not bound to a specific keypoint
detector or keygraph detector. Therefore, room for improvement lies in both
the framework itself and the implementation of each of its steps. We are
currently interested in using more sophisticated keygraphs and in adding the
use of temporal information to adapt the framework to object tracking. Finally,
we expect to cope with 3D poses (i.e., out-of-plane rotations) by incorporating
additional poses into the training set. These advances will be reported in due time.

Acknowledgments. We would like to thank FAPESP, CNPq, CAPES and
FINEP for their support.

References
1. Lowe, D.: Distinctive image features from scale-invariant keypoints. International
Journal of Computer Vision 20, 91–110 (2004)
2. Bay, H., Tuytelaars, T., van Gool, L.: SURF: Speeded Up Robust Features. In:
Leonardis, A., Bischof, H., Pinz, A. (eds.) ECCV 2006. LNCS, vol. 3951, pp. 404–
417. Springer, Heidelberg (2006)
3. Lepetit, V., Fua, P.: Keypoint recognition using randomized trees. IEEE Transac-
tions on Pattern Analysis and Machine Intelligence 28, 1465–1479 (2006)
4. Özuysal, M., Fua, P., Lepetit, V.: Fast keypoint recognition in ten lines of code. In:
Proceedings of the 2007 IEEE Computer Society Conference on Computer Vision
and Pattern Recognition, pp. 1–8. IEEE Computer Society, Los Alamitos (2007)
5. Schmid, C., Mohr, R.: Local grayvalue invariants for image retrieval. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence 19, 530–535 (1997)
6. Tell, D., Carlsson, S.: Combining appearance and topology for wide baseline match-
ing. In: Heyden, A., Sparr, G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS,
vol. 2350, pp. 68–81. Springer, Heidelberg (2002)
7. Tang, F., Tao, H.: Object tracking with dynamic feature graph. In: Proceedings
of the 2nd Joint IEEE International Workshop on Visual Surveillance and Per-
formance Evaluation of Tracking and Surveillance, pp. 25–32. IEEE Computer
Society, Los Alamitos (2005)
8. OpenCV: http://opencv.willowgarage.com/
9. Shi, J., Tomasi, C.: Good features to track. In: Proceedings of the 1994 IEEE
Computer Society Conference on Computer Vision and Pattern Recognition, pp.
593–600. IEEE Computer Society, Los Alamitos (1994)
10. Fortune, S.: A sweepline algorithm for Voronoi diagrams. In: Proceedings of the
Second Annual Symposium on Computational Geometry, pp. 313–322. ACM, New
York (1986)
Graph Regularisation Using Gaussian Curvature

Hewayda ElGhawalby¹,² and Edwin R. Hancock¹

¹ Department of Computer Science, University of York, YO10 5DD, UK
² Faculty of Engineering, Suez Canal University, Egypt
{howaida,erh}@cs.york.ac.uk

Abstract. This paper describes a new approach for regularising trian-
gulated graphs. We commence by embedding the graph onto a manifold
using the heat-kernel embedding. Under the embedding, each first-order
cycle of the graph becomes a triangle. Our aim is to use curvature infor-
mation associated with the edges of the graph to effect regularisation.
Using the difference in Euclidean and geodesic distances between nodes
under the embedding, we compute sectional curvatures associated with
the edges of the graph. Using the Gauss Bonnet Theorem we compute
the Gaussian curvature associated with each node from the sectional cur-
vatures and through the angular excess of the geodesic triangles. Using
the curvature information we perform regularisation with the advantage
of not requiring the solution of a partial differential equation. We exper-
iment with the resulting regularization process, and explore its effect on
both graph matching and graph clustering.

Keywords: Manifold regularization, Heat kernel, Hausdorff distance,
Gaussian curvature, Graph matching.

1 Introduction

In computer vision, image processing and graphics the data under consideration
frequently exists in the form of a graph or a mesh. The fundamental problems
that arise in the processing of such data are how to smooth, denoise, restore and
simplify data samples over a graph. The Principal difficulty of this task is how
to preserve the geometrical structures existing in the initial data. Many methods
have been proposed to solve this problem. Among existing methods, variational
techniques based on regularization, provide a general framework for designing
efficient filtering processes. Solutions to the variational models can be obtained
by minimizing an appropriate energy function. The minimization is usually per-
formed by designing a continuous partial differential equation, whose solutions
are discretized in order to fit with the data domain. A complete overview of these
methods in image processing can be found in ([1,2,3,4]). One of the problems
associated with variational methods is that of discretisation, which for some types

The authors acknowledge the financial support from the FET programme within the
EU FP7, under the SIMBAD project (contract 213250).

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 233–242, 2009.

© Springer-Verlag Berlin Heidelberg 2009
234 H. ElGhawalby and E.R. Hancock

of data can prove to be intractable. An alternative to the variational approach
is to make direct use of differential geometry and the calculus of variations to
regularize data on manifolds. There are two principal ways in which this may
be effected. The first approach is to use an intrinsic-parametric description of
the manifold and an explicit form of the metric, referred to as the Polyakov
action [12,22,23,24,25]. The second approach is to use an implicit representa-
tion of the manifold, referred to as the harmonic map [1,5,6,16,18]. In [19,20,21],
the relation between these two approaches was explained, and a new approach to
regularization on manifolds, referred to as the Beltrami flow, was introduced. An
implementation for the case of a manifold represented by a level set surface was
introduced in [19]. A method to compute the Beltrami flow for scalar functions
defined on triangulated manifolds using a local approximation of the operator
was proposed in [14].
The Laplace-Beltrami operator on a Riemannian manifold has been exten-
sively studied in the mathematics literature. Recently, there has been intense
interest in the spectral theory of the operator, and this has lead to the field of
study referred to as spectral geometry. This work has established relations be-
tween the first eigenvalue of the Laplace-Beltrami operator and the geometrical
properties of the manifold, including curvatures, diameter, injectivity radius and volume.
Recently, an alternative operator referred to as the p-Laplacian has attracted
considerable interest, and has proved a powerful means of solving geometric non-
linear partial differential equations arising in non-Newtonian fluids and nonlinear
elasticity.
In prior work [10], we have explored the problem of how to characterise graphs
in a geometric manner. The idea has been to embed graphs in a vector-space.
Under this embedding nodes become points on a manifold, and edges become
geodesics on the manifold. We use the differences between the geodesic and
Euclidean distances between points (i.e. nodes) connected by an edge to associate
sectional curvatures with edges. Using the Gauss-Bonnet theorem, we can extend
this characterisation to include the Gauss curvatures associated with nodes (i.e.
points on a manifold). Unfortunately, the approximations required to compute
these curvature characterisations can lead to unstable values. For this reason
in this paper, we turn to regularisation as a means of smoothing the Gaussian
curvature estimates. To do this we investigate two cases of the p-Laplacian,
the Laplace and Curvature operators, for the purpose of regularisation, and use
the Gaussian curvature associated with the heat-kernel embedding of nodes as the
regularisation function on the manifold. The idea of using functionals on graphs,
in a regularization process, has also been proposed in other contexts, such as
semi-supervised data learning [28,29] and image segmentation [2].

2 Functions and Operators on Graphs

In this section, we recall some basic prerequisites concerning graphs, and define
nonlocal operators which can be considered as discrete versions of continuous
differential operators.
Graph Regularisation Using Gaussian Curvature 235

2.1 Preliminaries
An undirected unweighted graph G = (V, E) consists of a finite set
of nodes V and a finite set of edges E ⊆ V × V. The elements of the adjacency
matrix A of the graph G are defined by:

    A(u, v) = \begin{cases} 1 & \text{if } (u, v) \in E \\ 0 & \text{otherwise} \end{cases}    (1)
To construct the Laplacian matrix we first establish a diagonal degree matrix D
with elements D(u, u) = \sum_{v \in V} A(u, v) = d_u. From the degree and the adjacency
matrices we can construct the Laplacian matrix L = D − A, that is, the degree
matrix minus the adjacency matrix:

    L(u, v) = \begin{cases} d_u & \text{if } u = v \\ -1 & \text{if } (u, v) \in E \\ 0 & \text{otherwise} \end{cases}    (2)

The normalized Laplacian is given by \hat{L} = D^{-1/2} L D^{-1/2}. The spectral
decomposition of the normalized Laplacian matrix is

    \hat{L} = \Phi \Lambda \Phi^T = \sum_{i=1}^{|V|} \lambda_i \phi_i \phi_i^T

where |V| is the number of nodes, \Lambda = \mathrm{diag}(\lambda_1, \lambda_2, \ldots, \lambda_{|V|}) (0 < \lambda_1 < \lambda_2 <
\ldots < \lambda_{|V|}) is the diagonal matrix with the ordered eigenvalues as elements, and
\Phi = (\phi_1 | \phi_2 | \ldots | \phi_{|V|}) is the matrix with the eigenvectors as columns.
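As a concrete illustration, the matrices above can be assembled in a few lines (a minimal numpy sketch, not taken from the paper; the helper name `graph_laplacians` is ours):

```python
import numpy as np

def graph_laplacians(A):
    """Return L = D - A and the normalized Laplacian D^{-1/2} L D^{-1/2},
    together with the ordered eigenvalues/eigenvectors of the latter."""
    d = A.sum(axis=1)                       # degrees d_u
    L = np.diag(d) - A                      # combinatorial Laplacian, Eq. (2)
    Dinv = np.diag(1.0 / np.sqrt(d))
    L_hat = Dinv @ L @ Dinv                 # normalized Laplacian
    lam, Phi = np.linalg.eigh(L_hat)        # eigh returns ascending eigenvalues
    return L, L_hat, lam, Phi

# Triangle graph K3: the normalized-Laplacian spectrum is {0, 3/2, 3/2}
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
L, L_hat, lam, Phi = graph_laplacians(A)
```

Note that the rows of L sum to zero, and the spectral decomposition \Phi \Lambda \Phi^T reproduces \hat{L} exactly.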

2.2 Embedding Graphs onto Manifolds


We follow Bai and Hancock [26] and make use of the heat kernel embedding.
The heat kernel plays an important role in spectral graph theory. It encapsulates
the way in which information flows through the edges of the graph over time under
the heat equation, and is the solution of the partial differential equation

    \frac{\partial h_t}{\partial t} = -\hat{L} h_t    (3)

where h_t is the heat kernel and t is the time. The solution is found by exponentiating the Laplacian eigenspectrum as follows:

    h_t = \exp[-\hat{L} t] = \Phi \exp[-t \Lambda] \Phi^T    (4)
For the heat kernel, the matrix of embedding coordinates Y (i.e. the matrix
whose columns are the vectors of node coordinates) is found by performing the
Young-Householder [27] decomposition h_t = Y^T Y. As a result, the matrix of node
embedding coordinates is

    Y = (y_1 | y_2 | \ldots | y_{|V|}) = \exp[-\tfrac{1}{2} t \Lambda] \Phi^T    (5)

where y_u is the coordinate vector for the node u. In the vector space, the Euclidean distance between the nodes u and v of the graph is

    d_e^2(u, v) = (y_u - y_v)^T (y_u - y_v) = \sum_{i=1}^{|V|} \exp[-\lambda_i t] (\phi_i(u) - \phi_i(v))^2    (6)
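Equations (4)-(6) translate directly into code (again an illustrative numpy sketch under our own naming, not the authors' implementation):

```python
import numpy as np

def heat_kernel_embedding(A, t):
    """Coordinates Y with h_t = Y^T Y, per Eqs. (4)-(5); columns are nodes."""
    d = A.sum(axis=1)
    Dinv = np.diag(1.0 / np.sqrt(d))
    L_hat = Dinv @ (np.diag(d) - A) @ Dinv
    lam, Phi = np.linalg.eigh(L_hat)
    return np.diag(np.exp(-0.5 * t * lam)) @ Phi.T

def embedding_distances(Y):
    """Pairwise Euclidean distances d_e(u, v) between columns of Y, Eq. (6)."""
    diff = Y[:, :, None] - Y[:, None, :]
    return np.sqrt((diff ** 2).sum(axis=0))
```

As a consistency check, d_e^2(u, v) = h_t(u, u) + h_t(v, v) − 2 h_t(u, v) for the kernel h_t = Y^T Y.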

2.3 Functions on Graphs


For the purpose of representing the data we use a discrete real-valued function
f : V → ℝ, which assigns a real value f(u) to each vertex u ∈ V. Functions
of this type form a discrete N-dimensional space. They can be represented by
vectors of ℝ^N, f = (f(u))_{u∈V}, and interpreted as the intensity of a discrete signal
defined on the vertices of the graph. By analogy with continuous functional
spaces, the discrete integral of a function f : V → ℝ on the graph G is defined
by \int_G f = \sum_{u \in V} f(u). Let H(V) denote the Hilbert space of the real-valued
functions on the vertices of G, endowed with the usual inner product:

    \langle f, h \rangle_{H(V)} = \sum_{u \in V} f(u) h(u), \qquad f, h : V → ℝ    (7)

with the induced L_2-norm: \|f\|_2 = \langle f, f \rangle_{H(V)}^{1/2}.

2.4 The p-Laplacian Operator


For a smooth Riemannian manifold M and a real number p ∈ (1, +∞), the
p-Laplacian operator of a function f ∈ H(V), denoted L_p : H(V) → H(V), is
defined by

    L_p f(u) = \frac{1}{2} \sum_{v \sim u} \left( \frac{f(u) - f(v)}{\left( \sum_{w \sim u} (f(u) - f(w))^2 \right)^{(2-p)/2}} + \frac{f(u) - f(v)}{\left( \sum_{w \sim v} (f(v) - f(w))^2 \right)^{(2-p)/2}} \right)    (8)

This operator arises naturally from the variational problem associated with the
energy function [13]. The p-Laplace operator is nonlinear, with the exception of
p = 2, where it corresponds to the combinatorial graph Laplacian, which is one
of the classical second order operators defined in the context of spectral graph
theory [7]:

    L f(u) = \sum_{v \sim u} (f(u) - f(v))    (9)

Another particular case of the p-Laplace operator is obtained with p = 1. In this
case, it is the curvature of the function f on the graph

    \kappa f(u) = \frac{1}{2} \sum_{v \sim u} \left( \frac{f(u) - f(v)}{\sqrt{\sum_{w \sim u} (f(u) - f(w))^2}} + \frac{f(u) - f(v)}{\sqrt{\sum_{w \sim v} (f(v) - f(w))^2}} \right)    (10)

κ corresponds to the curvature operator proposed in [17] and [4] in the context
of image restoration. More generally, κ is the discrete analogue of the mean
curvature of the level curve of a function defined on a continuous domain of ℝ^N.
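The two special cases in Eqs. (9) and (10) can be sketched as follows (numpy assumed; the small `eps` guard against division by zero on constant regions is our addition, not part of the definition above):

```python
import numpy as np

def laplace_operator(A, f):
    """p = 2 case, Eq. (9): L f(u) = sum_{v~u} (f(u) - f(v))."""
    return A.sum(axis=1) * f - A @ f

def curvature_operator(A, f, eps=1e-12):
    """p = 1 case, Eq. (10), with grad[u] = sqrt(sum_{w~u} (f(u) - f(w))^2)."""
    diff = (f[:, None] - f[None, :]) * A          # f(u) - f(v) on edges only
    grad = np.sqrt((diff ** 2).sum(axis=1)) + eps # gradient norm at each node
    return 0.5 * (diff * (1.0 / grad[:, None] + 1.0 / grad[None, :])).sum(axis=1)
```

On the path graph 0-1-2 with f = (0, 1, 2), the Laplace operator gives (−1, 0, 1), and the curvature operator is antisymmetric at the endpoints and zero at the middle node.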

3 The Gaussian Curvature


Curvature is a local measure of geometry and can be used to represent local shape
information. We choose the function f to be the Gaussian curvature defined

over the vertices. Gaussian curvature is one of the fundamental second order
geometric properties of a surface, and it is an intrinsic property of a surface
independent of the coordinate system used to describe it. As stated by Gauss's
theorema egregium [11], it depends only on how distance is measured on the
surface, not on the way it is embedded in space.

3.1 Geometric Preliminaries

Let T be the embedding of a triangulated graph onto a smooth surface M in
ℝ^3, let A_g be the area of a geodesic triangle on M with angles \{\alpha_i\}_{i=1}^3 and geodesic
edge lengths \{d_{g_i}\}_{i=1}^3, and let A_e be the area of the corresponding Euclidean triangle
with edge lengths \{d_{e_i}\}_{i=1}^3 and angles \{\varphi_i\}_{i=1}^3. Assume that each geodesic is a
great arc on a sphere with radius R_i, i = 1, 2, 3, corresponding to a central angle
2θ, and that the geodesic triangle is a triangle on the surface of a sphere with
radius R = \frac{1}{3} \sum_{i=1}^3 R_i, with the Euclidean distance between the pair of nodes
taken to be d_e = \frac{1}{3} \sum_{i=1}^3 d_{e_i}. Considering a small area element on the sphere given in
spherical coordinates by dA = R^2 \sin θ \, dθ \, dφ, the integration of dA bounded by
2θ gives the following formula for computing the area of the geodesic triangle

    A_g = \frac{1}{2R} d_e^2    (11)

where d_e^2 is computed from the embedding using (6).

3.2 Gaussian Curvature from the Gauss-Bonnet Theorem

For a smooth compact oriented Riemannian 2-manifold M, let \triangle_g be a triangle
on M whose sides are geodesics, i.e. paths of shortest length on the manifold.
Further, let α_1, α_2 and α_3 denote the interior angles of the triangle. According to
Gauss's theorem, if the Gaussian curvature K (i.e. the product of the maximum
and minimum curvatures at a point on the manifold) is integrated over \triangle_g, then

    \int_{\triangle_g} K \, dM = \sum_{i=1}^3 \alpha_i - \pi    (12)

where dM is the Riemannian volume element. Since all the points of a piecewise
linear surface, except for the vertices, have a neighborhood isometric to a planar
Euclidean domain with zero curvature, the Gauss curvature is concentrated at
the isolated vertices. Hence, to estimate the Gaussian curvature of a smooth
surface from its triangulation, we need to normalize by the surface area, which
here is the area of the triangle. Consequently, we will assign one third of the
triangle area to each vertex. Hence, the Gaussian curvature associated with each
vertex will be

    \kappa_g = \frac{\int_{\triangle_g} K \, dM}{\frac{1}{3} A_g}    (13)

from (12) we get

    \kappa_g = \frac{\sum_{i=1}^3 \alpha_i - \pi}{\frac{1}{3} A_g}    (14)

Since, on a sphere of radius R, the angular excess satisfies \sum_{i=1}^3 \alpha_i - \pi = A_g / R^2,
we eventually get

    \kappa_g = \frac{3}{R^2}    (15)

Recalling that the Gaussian curvature is the product of the two principal curvatures,
and that the curvature of a point on a sphere is the reciprocal of the
radius of the sphere, gives an explanation for the result in (15). As we assumed
earlier that the geodesic is a great arc of a circle of radius R, in [10] we deduced
that

    \frac{1}{R^2} = \frac{24 (d_g - d_e)}{d_g^3}    (16)
and since for an edge of the graph d_g = 1, we have

    \frac{1}{R^2} = 24 (1 - d_e)    (17)

From (15) and (17), the Gaussian curvature associated with the embedded node
can be found from the following formula:

    \kappa_g = 72 (1 - d_e)    (18)
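Equation (18) turns the embedding distances into curvature attributes. The sketch below makes one simplifying assumption of our own: the edge-wise value 72(1 − d_e) is averaged over the edges incident on each node, whereas the paper derives κ_g per geodesic triangle:

```python
import numpy as np

def node_gaussian_curvature(A, de):
    """Node attribute from Eq. (18): average of 72(1 - d_e(u, v)) over edges at u.

    `de` is the matrix of embedding distances; each geodesic edge has d_g = 1."""
    edge_curv = 72.0 * (1.0 - de) * A       # per-edge curvature, zero off-edges
    return edge_curv.sum(axis=1) / A.sum(axis=1)
```

For example, if every edge of a triangle graph embeds at distance d_e = 0.5, each node receives curvature 72 × 0.5 = 36.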

4 Hausdorff Distance

We experiment with the Gaussian curvatures as node-based attributes for the


purposes of graph-matching. We represent the graphs under study using sets of
curvatures, and compute the similarity of sets resulting from different graphs
using the robust modified Hausdorff distance. The Hausdorff distance provides
a means of computing the distance between sets of unordered observations when
the correspondences between the individual items are unknown. In its most gen-
eral setting, the Hausdorff distance is defined between compact sets in a metric
space. Given two such sets, we consider, for each point in one set, the closest
point in the second set. The modified Hausdorff distance is the average over all
these values. More formally, the modified Hausdorff distance (HD) [9] between
two finite point sets A and B is given by

H(A, B) = max(h(A, B), h(B, A)) (19)

where the directed modified Hausdorff distance from A to B is defined to be

    h(A, B) = \frac{1}{N_A} \sum_{a \in A} \min_{b \in B} \|a - b\|    (20)

and \|\cdot\| is some underlying norm on the points of A and B (e.g., the L_2 or Euclidean
norm). Using these ingredients we can describe how the modified Hausdorff
distance can be extended to graph-based representations. To commence,
let us consider two graphs G_1 = (V_1, E_1, T_1, \kappa_1) and G_2 = (V_2, E_2, T_2, \kappa_2), where
V_1, V_2 are the sets of nodes, E_1, E_2 the sets of edges, T_1, T_2 the sets of triangles,
and \kappa_1, \kappa_2 the sets of Gaussian curvatures associated with the nodes
as defined in §3.2. We can now write the distance between two graphs as follows:

    h_{MHD}(G_1, G_2) = \frac{1}{|V_1|} \sum_{i \in V_1} \min_{j \in V_2} \|\kappa_2(j) - \kappa_1(i)\|    (21)
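With scalar curvature attributes, Eqs. (19)-(21) reduce to a few lines (an illustrative numpy sketch; function names are ours):

```python
import numpy as np

def directed_mhd(ka, kb):
    """Directed modified Hausdorff distance, Eqs. (20)-(21), scalar attribute sets."""
    return np.abs(ka[:, None] - kb[None, :]).min(axis=1).mean()

def modified_hausdorff(ka, kb):
    """Symmetric modified Hausdorff distance, Eq. (19)."""
    return max(directed_mhd(ka, kb), directed_mhd(kb, ka))
```

Note the asymmetry of the directed distance: an outlier in one set inflates only the direction that starts from that set.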

5 Multidimensional Scaling
For the purpose of visualization, classical Multidimensional Scaling (MDS)
[8] is a commonly used technique to embed the data specified in a distance matrix in
Euclidean space. Let H be the distance matrix with row r and column c entry
H_{rc}. The first step of MDS is to calculate a matrix T whose element with row r
and column c is given by T_{rc} = -\frac{1}{2}[H_{rc}^2 - \hat{H}_{r.}^2 - \hat{H}_{.c}^2 + \hat{H}_{..}^2], where \hat{H}_{r.} = \frac{1}{N} \sum_{c=1}^N H_{rc}
is the average value over the rth row of the distance matrix, \hat{H}_{.c} is the similarly
defined average value over the cth column, and \hat{H}_{..} = \frac{1}{N^2} \sum_{r=1}^N \sum_{c=1}^N H_{rc}
is the average value over all rows and columns of the distance matrix. Then, we
subject the matrix T to an eigenvector analysis to obtain a matrix of embedding
coordinates X. If the rank of T is k, k ≤ N, then we will have k non-zero
eigenvalues. We arrange these k non-zero eigenvalues in descending order, i.e.,
l_1 ≥ l_2 ≥ \ldots ≥ l_k ≥ 0. The corresponding ordered eigenvectors are denoted by u_i,
where l_i is the ith eigenvalue. The embedding coordinate system for the graphs
is X = [\sqrt{l_1} u_1, \sqrt{l_2} u_2, \ldots, \sqrt{l_k} u_k]; for the graph indexed i, the embedded vector
of coordinates is x_i = (X_{i,1}, X_{i,2}, \ldots, X_{i,k})^T.
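The double-centring step above is algebraically equivalent to T = −½ J (H ∘ H) J with the centring matrix J = I − (1/N) 11^T, which gives a compact implementation (a sketch, numpy assumed):

```python
import numpy as np

def classical_mds(H, k=2):
    """Embed an N x N distance matrix H into k dimensions by classical MDS."""
    N = H.shape[0]
    J = np.eye(N) - np.ones((N, N)) / N     # centring matrix
    T = -0.5 * J @ (H ** 2) @ J             # double-centred squared distances
    l, U = np.linalg.eigh(T)
    idx = np.argsort(l)[::-1][:k]           # k largest eigenvalues first
    return U[:, idx] * np.sqrt(np.clip(l[idx], 0.0, None))
```

For an exactly Euclidean distance matrix, the embedding reproduces the pairwise distances up to rotation and translation.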

6 Experiments
For the purposes of experimentation we use the standard CMU, MOVI and
chalet house image sequences as our data set [15]. These data sets contain dif-
ferent views of model houses from equally spaced viewing directions. From the
house images, corner features are extracted, and Delaunay graphs represent-
ing the arrangement of feature points are constructed. Our data consists of ten
graphs for each of the three houses. To commence, we compute the Euclidean
distances between the nodes in each graph based on the Laplacian and then on
the heat kernel with the values of t = 10.0, 1.0, 0.1 and 0.01. Then we compute
the Gaussian curvature associated with each node using the formula given in §3.2.
Commencing with each node attributed with the Gaussian curvatures
(as the value of a real function f acting on the nodes of the graph), we can
regularise each graph by applying the p-Laplacian operator to the Gaussian
curvatures. For each graph we construct a set of regularised Gaussian curvatures
using both the Laplace operator and the curvature operator, as special cases


Fig. 1. MDS embedding obtained using the Laplace operator to regularize the houses
data resulting from the heat kernel embedding


Fig. 2. MDS embedding obtained using the Curvature operator to regularize the houses
data resulting from the heat kernel embedding


Fig. 3. MDS embedding obtained using the Laplace operator (left) and the Curvature
operator (right) to regularize the houses data resulting from the Laplacian embedding

of the p-Laplacian operator. The next step is to compute the distances between
the sets for the thirty different graphs using the modified Hausdorff distance.
Finally, we subject the distance matrices to the Multidimensional Scaling (MDS)
procedure to embed them into a 2D space. Here each graph is represented by
a single point. Figure 1 shows the results obtained using the Laplace operator.
The subfigures are ordered from left to right, using the heat kernel embedding
with the values t = 10.0, 1.0, 0.1 and 0.01. Figure 2 shows the corresponding
results obtained when the Curvature operator is used. Figure 3 shows the results
obtained when using the Laplacian embedding, from the Laplace operator (left)
and the Curvature operator (right).
To investigate the results in more detail table 1 shows the rand index for the
distance as a function of t. This index is computed as follows: 1) compute the
mean for each cluster; 2) compute the distance from each point to each mean;
3) if the distance from correct mean is smaller than those to remaining means,
then classification is correct, if not then classification is incorrect; 4) compute
the Rand-index (incorrect/(incorrect+correct)).
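Steps 1)-4) amount to nearest-class-mean classification followed by an error count; a minimal sketch (the function name and toy data are ours):

```python
import numpy as np

def rand_index(points, labels):
    """Fraction of points closer to a wrong cluster mean than to their own."""
    classes = np.unique(labels)
    means = np.array([points[labels == c].mean(axis=0) for c in classes])
    dists = np.linalg.norm(points[:, None, :] - means[None, :, :], axis=2)
    predicted = classes[np.argmin(dists, axis=1)]
    incorrect = int(np.sum(predicted != labels))
    return incorrect / len(labels)          # incorrect / (incorrect + correct)
```

A value of 0 indicates that every point is nearest to its own cluster mean, i.e. perfect clustering by this criterion.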

Table 1. Rand index vs. t

                        lap     t=10    t=1.0   t=0.1   t=0.01
Laplace operator      0.0000  0.0000  0.0000  0.0000  0.0000
Curvature operator    0.1667  0.0000  0.0000  0.0000  0.0000

From this experimental study, we conclude that the proposed regularization
procedure, using two special cases of the p-Laplacian operator (the Laplace and
Curvature operators), improves the processes of graph matching and clustering.

7 Conclusion and Future Work

In this paper, a process for regularizing the curvature attributes associated with
the geometric embedding of graphs was presented. Experiments show that it
is an efficient procedure for the purpose of gauging the similarity of pairs of
graphs. The regularisation procedure improves the results obtained with graph
clustering. Our future plans are twofold. First, we aim to explore if geodesic
flows along the edges of the graphs can be used to implement a more effective
regularisation process. Second, we aim to apply our methods to problems of
image and mesh smoothing.

References
1. Bertalmio, M., Cheng, L.T., Osher, S., Sapiro, G.: Variational problems and partial
differential equations on implicit surfaces. Journal of Computational Physics 174,
759–780 (2001)
2. Bougleux, S., Elmoataz, A.: Image smoothing and segmentation by graph regular-
ization. LNCS, vol. 3656, pp. 745–752. Springer, Heidelberg (2005)
3. Boykov, Y., Huttenlocher, D.: A new bayesian framework for object recognition. In:
Proceeding of IEEE Computer Society Conference on CVPR, vol. 2, pp. 517–523
(1999)
4. Chan, T., Osher, S., Shen, J.: The digital tv filter and nonlinear denoising. IEEE
Trans. Image Process 10(2), 231–241 (2001)
5. Chan, T., Shen, J.: Variational restoration of non-flat image features: Models and
algorithms. SIAM J. Appl. Math. 61, 1338–1361 (2000)
6. Cheng, L., Burchard, P., Merriman, B., Osher, S.: Motion of curves constrained
on surfaces using a level set approach. Technical report, UCLA CAM Technical
Report (00-32) (September 2000)
7. Chung, F.R.: Spectral graph theory. In: Proc. CBMS Regional Conf. Ser. Math.,
vol. 92, pp. 1–212 (1997)
8. Cox, T., Cox, M.: Multidimensional Scaling. Chapman-Hall, Boca Raton (1994)
9. Dubuisson, M., Jain, A.: A modified Hausdorff distance for object matching. In:
Proc. ICPR, pp. 566–568 (1994)
10. ElGhawalby, H., Hancock, E.R.: Measuring graph similarity using spectral geom-
etry. In: Campilho, A., Kamel, M.S. (eds.) ICIAR 2008. LNCS, vol. 5112, pp.
517–526. Springer, Heidelberg (2008)

11. Gauss, C.F.: Allgemeine Flächentheorie (Translated from Latin). W. Engelmann


(1900)
12. Kimmel, R., Malladi, R., Sochen, N.: Images as embedding maps and minimal sur-
faces: Movies, color, texture, and volumetric medical images. International Journal
of Computer Vision 39(2), 111–129 (2000)
13. Lim, B.P., Montenegro, J.F., Santos, N.L.: Eigenvalues estimates for the p-laplace
operator on manifolds. arXiv:0808.2028v1 [math.DG], August 14 (2008)
14. Lopez-Perez, L., Deriche, R., Sochen, N.: The beltrami flow over triangulated man-
ifolds. In: Sonka, M., Kakadiaris, I.A., Kybic, J. (eds.) CVAMIA/MMBIA 2004.
LNCS, vol. 3117, pp. 135–144. Springer, Heidelberg (2004)
15. Luo, B., Wilson, R.C., Hancock, E.R.: Spectral embedding of graphs. Pattern
Recognition 36, 2213–2230 (2003)
16. Memoli, F., Sapiro, G., Osher, S.: Solving variational problems and partial differ-
ential equations, mapping into general target manifolds. Technical report, UCLA
CAM Technical Report (02-04) (January 2002)
17. Osher, S., Shen, J.: Digitized pde method for data restoration. In: Anastassiou,
E.G.A. (ed.) Analytical-Computational methods in Applied Mathematics, pp. 751–
771. Chapman & Hall/CRC, New York (2000)
18. Sapiro, G.: Geometric Partial Differential Equations and Image Analysis. Cam-
bridge University Press, Cambridge (2001)
19. Sochen, N., Deriche, R., Lopez-Perez, L.: The beltrami flow over implicit manifolds.
In: ICCV (2003)
20. Sochen, N., Deriche, R., Lopez-Perez, L.: Variational beltrami flows over manifolds.
In: IEEE ICIP 2003, Barcelone (2003)
21. Sochen, N., Deriche, R., Lopez-Perez, L.: Variational beltrami flows over manifolds.
Technical report, INRIA Research Report 4897 (June 2003)
22. Sochen, N., Kimmel, R.: Stereographic orientation diffusion. In: Proceedings of the
4th Int. Conf. on Scale-Space, Vancouver, Canada (October 2001)
23. Sochen, N., Kimmel, R., Malladi, R.: From high energy physics to low level vision.
Report, LBNL, UC Berkeley, LBNL 39243, August, Presented in ONR workshop,
UCLA, September 5 (1996)
24. Sochen, N., Kimmel, R., Malladi, R.: A general framework for low level vision.
IEEE Trans. on Image Processing 7, 310–318 (1998)
25. Sochen, N., Zeevi, Y.: Representation of colored images by manifolds embedded in
higher dimensional non-euclidean space. In: Proc. IEEE ICIP 1998, Chicago (1998)
26. Xiao, B., Hancock, E.R.: Heat kernel, riemannian manifolds and graph embedding.
In: Fred, A., Caelli, T.M., Duin, R.P.W., Campilho, A.C., de Ridder, D. (eds.)
SSPR&SPR 2004. LNCS, vol. 3138, pp. 198–206. Springer, Heidelberg (2004)
27. Young, G., Householder, A.S.: Discussion of a set of points in terms of their mutual
distances. Psychometrika 3, 19–22 (1938)
28. Zhou, D., Schölkopf, B.: Regularization on discrete spaces. In: Kropatsch, W.G.,
Sablatnig, R., Hanbury, A. (eds.) DAGM 2005. LNCS, vol. 3663, pp. 361–368.
Springer, Heidelberg (2005)
29. Zhou, D., Schölkopf, B.: In: Chapelle, O., Schölkopf, B., Zien, A. (eds.) Semi-
Supervised Learning, pp. 221–232 (2006)
Characteristic Polynomial Analysis on Matrix
Representations of Graphs

Peng Ren, Richard C. Wilson, and Edwin R. Hancock

Department of Computer Science, The University of York, York, YO10 5DD, UK


{pengren, wilson, erh}@cs.york.ac.uk

Abstract. Matrix representations for graphs play an important role in combinatorics.
In this paper, we investigate four matrix representations for graphs and
carry out a characteristic polynomial analysis upon them. The first two graph
matrices are the adjacency matrix and the Laplacian matrix. These two matrices can
be obtained straightforwardly from graphs. The second two matrix representations,
which have been newly introduced [9][3], are closely related to the Ihara zeta
function and the discrete time quantum walk. They have a similar form and are
established from a transformed graph, i.e. the oriented line graph of the original
graph. We make use of the characteristic polynomial coefficients of the four
matrices to characterize graphs and construct pattern spaces with a fixed dimensionality.
Experimental results indicate that the two matrices in the transformed
domain perform better than the two matrices in the original graph domain, and
that the matrix associated with the Ihara zeta function is more efficient and effective
than the matrix associated with the discrete time quantum walk and the remaining
methods.

1 Introduction
Pattern analysis using graph structures has proved to be a challenging and sometimes
elusive problem. The main reason for this is that graphs are not vectorial in nature, and
hence they are not amenable to classical statistical methods from pattern recognition or
machine learning [7]. One way to overcome this problem is to extract feature vectors
from graphs which succinctly capture their structure in a manner that is permutation in-
variant. There are a number of ways in which this may be accomplished. One approach
is to use simple features such as the numbers of edges and nodes, edge density or di-
ameters. A more sophisticated approach is to count the numbers of cycles of different
order. Alternatively graph-spectra can be used [7][10].
However, one elegant way in which to capture graph structure is to compute the char-
acteristic polynomial. To do so requires a matrix characterization M of the graph, and
the characteristic polynomial is the determinant det(λI − M) where I is the identity
matrix and λ the variable of the polynomial. The simplest way to exploit the character-
istic polynomial is to use its coefficients. With an appropriate choice of matrix M, these
coefficients are determined by the cycle frequencies in the graph. They are also easily
computed from the spectrum of M. Moreover, since it is determined by the numbers of
cycles in a graph, the characteristic polynomial may also be used to define an analogue
of the Riemann zeta function from number theory for a graph. Here the zeta function is

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 243–252, 2009.
© Springer-Verlag Berlin Heidelberg 2009
244 P. Ren, R.C. Wilson, and E.R. Hancock

determined by the reciprocal of the characteristic polynomial, and prime cycles deter-
mine the poles of the zeta function in a manner analogous to the prime numbers. The
recent work of Bai et al. [2] and Ren et al. [8,9] has shown that practical characterizations
can be extracted from different forms of the zeta function and used for the purposes of
graph-based object recognition. Finally, it is interesting to note that if the matrix M is
chosen to be the adjacency matrix T of the oriented line graph derived from a graph,
which is also called the Perron-Frobenius operator, then the characteristic polynomial
is linked to the Ihara zeta function of the original graph.
As noted above, the characteristic polynomial is determined by the choice of the
matrix M. Here there are a number of alternatives including the adjacency matrix A,
the Laplacian matrix L = D − A where D is the node degree matrix, and the Perron-
Frobenius operator T where the graph is transformed prior to the computation of the
characteristic polynomial. To compute the Ihara zeta function the oriented line graph is
first constructed and then the characteristic polynomial is computed from its adjacency
matrix. This is similar to the approach taken by Emms et al [3] in their study of discrete
time quantum walks. However, rather than characterizing the oriented line graph using
the adjacency matrix, they construct a unitary matrix U which captures the transitions
of a quantum walk controlled by a Grover coin. The resulting unitary matrix proves to
be a powerful tool for analyzing graphs since the spectrum of the positive support of its
third power (denoted by sp(S^+(U^3))) can be used to resolve structural ambiguities due
to the cospectrality of strongly regular graphs.
The aim in this paper is to explore the roles of matrix graph representations in the
construction of characteristic polynomials. In particular we are interested in which
combination is most informative in terms of graph-structure and which gives the best
empirical performance when graph clustering is attempted using the characteristic poly-
nomial coefficients. We study both the original graph and its oriented line graph. The
matrix characterizations used are the adjacency matrix A, the Laplacian matrix L, the
transition matrix T and the unitary characterization U.
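The coefficients used as pattern-space features can be computed from the spectrum of any of the four matrix representations (a numpy sketch; `np.poly` rebuilds the polynomial from its roots):

```python
import numpy as np

def charpoly_coefficients(M):
    """Coefficients of det(lambda*I - M), leading coefficient first."""
    return np.poly(np.linalg.eigvals(M)).real

# Adjacency matrix of the triangle K3: det(lambda*I - A) = lambda^3 - 3*lambda - 2
A = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
coeffs = charpoly_coefficients(A)
```

Since the coefficients depend only on the spectrum, this feature vector is invariant to vertex relabelling, and its length is fixed by the matrix dimension.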

2 Classical Graph Matrix Representations

To commence, suppose that the graph under study is denoted by G = (V, E), where
V is the set of nodes and E ⊆ V × V is the set of edges. Since we wish to adopt a
graph spectral approach we introduce the adjacency matrix A for the graph, whose
elements are

    A(u, v) = \begin{cases} 1 & \text{if } (u, v) \in E \\ 0 & \text{otherwise} \end{cases}    (1)
We also construct the diagonal degree matrix D, whose elements are given by
D(u, u) = d_u = \sum_{v \in V} A(u, v). From the degree matrix and the adjacency matrix we
construct the Laplacian matrix L = D − A, i.e. the degree matrix minus the adjacency
matrix:

    L(u, v) = \begin{cases} d_u & \text{if } u = v \\ -1 & \text{if } (u, v) \in E \\ 0 & \text{otherwise} \end{cases}    (2)

3 The Ihara Zeta Function


The Ihara zeta function for a graph is a generalization of the Riemann zeta function
from number theory. In the definition of the Ihara zeta function, the ’prime number’ in
the Euler product expansion of the Riemann zeta function is replaced by a ’prime cycle’,
i.e. cycles with no backtracking in a graph. The definition of the Ihara zeta function of
a graph G(V, E) is a product form which runs over all the possible prime cycles
    Z_G(u) = \prod_{[p]} \left( 1 - u^{L(p)} \right)^{-1}    (3)

Here, p denotes a prime cycle and L(p) denotes the length of p. As shown in (3), the
Ihara zeta function is generally an infinite product. However, one of its elegant features
is that it can be collapsed down into a rational function, which renders it of practical
utility.

3.1 Rational Expression


For a graph G(V, E) with the vertex set V of cardinality |V| = N and the edge set E
of cardinality |E| = M, the rational expression of the Ihara zeta function is [4][5]:

    Z_G(u) = \left( 1 - u^2 \right)^{\chi(G)} \det\left( I_N - uA + u^2 Q \right)^{-1}    (4)

Here, χ(G) is the Euler number of the graph G(V, E), which is defined as the difference
between the vertex number and the edge number of the graph, i.e. χ(G) = N − M, A
is the adjacency matrix of the graph, I_k denotes the k × k identity matrix, and Q is the
matrix difference of the degree matrix D and the identity matrix I_N, i.e. Q = D − I_N.
From (4) it has been shown that the Ihara zeta function is invariant to vertex
label permutations [9]. This is because permutation matrices, which represent vertex
label permutations in matrix calculations, have no effect on the determinant in (4).
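As a sketch of Eq. (4) (numpy assumed, not the authors' code), the rational form can be evaluated directly; for the triangle C_3 the result agrees with the known closed form Z(u)^{-1} = (1 − u^3)^2:

```python
import numpy as np

def ihara_zeta(A, u):
    """Evaluate Z_G(u) via Eq. (4): (1 - u^2)^chi(G) / det(I - u*A + u^2 * Q)."""
    N = A.shape[0]
    M = int(A.sum()) // 2                      # number of undirected edges
    chi = N - M                                # Euler number chi(G) = N - M
    Q = np.diag(A.sum(axis=1)) - np.eye(N)     # Q = D - I_N
    return (1.0 - u ** 2) ** chi / np.linalg.det(np.eye(N) - u * A + u ** 2 * Q)
```

At u = 0 the determinant is det(I) = 1, so Z_G(0) = 1 for any graph.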

3.2 Determinant Expression


For md2 graphs, i.e. the graphs with vertex degree at least 2, it is straightforward to
show that (4) can be rewritten in the form of the reciprocal of a polynomial. However,
it is difficult to compute the coefficients of the reciprocal of the Ihara zeta function
from (4) in a uniform way, except by resorting to software for symbolic calculation. To
efficiently compute these coefficients, it is more convenient to transform the rational
form of the Ihara zeta function in (4) into a concise expression. The Ihara zeta function
can also be written in the form of a determinant [6]:

    Z_G(u)^{-1} = \det\left( I_{2M} - u\mathbf{T} \right)    (5)

where T is the Perron-Frobenius operator on the oriented line graph of the original
graph, and is a 2M × 2M square matrix.
To obtain the Perron-Frobenius operator T, we must construct the oriented line graph
of the original graph from the associated symmetric digraph. The symmetric digraph

DG(V, Ed ) of a graph G(V, E) is composed of a finite nonempty vertex set V identical


to that of G(V, E) and a finite multiset E_d of oriented edges called arcs, which consist
of ordered pairs of vertices. For an arc e_d(u, v) ∈ E_d, where u and v are elements
of V, the origin of e_d(u, v) is defined to be o(e_d) = u and the terminus is t(e_d) = v.
Its inverse arc, which is formed by switching the origin and terminus of e_d(u, v), is
denoted by e_d(v, u). For the graph G(V, E), we can obtain the associated symmetric
digraph SDG(V, E_d) by replacing each edge of G(V, E) with the arc pair in which the
two arcs are inverse to each other.
The oriented line graph associated with the original graph can be defined using the
symmetric digraph. It is a dual graph of the symmetric digraph since its oriented edge
set and vertex set are constructed from the vertex set and the oriented edge (arc) set of
its corresponding symmetric digraph. The construction of the oriented edge set and the
vertex set of the oriented line graph can be formulated as follows:

⎨ VL = Ed (SDG)
EdL = {(ed (u, v), ed (v, w)) (6)

∈ Ed (SDG) × Ed (SDG); u = w}

The Perron-Frobenius operator T of the original graph is the adjacency matrix of the
associated oriented line graph. For the (i, j)th entry of T, T(i, j) is 1 if there is one
edge directed from the vertex with label i to the vertex with label j in the oriented line
graph, and is 0 otherwise.
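The construction of T can be sketched directly from the arc set (a numpy illustration; for the triangle the oriented line graph splits into two directed 3-cycles, so det(I − uT) = (1 − u^3)^2, consistent with the determinant expression (5)):

```python
import numpy as np

def perron_frobenius(A):
    """Adjacency matrix T of the oriented line graph: arc (u,v) -> arc (v,w), u != w."""
    n = A.shape[0]
    arcs = [(u, v) for u in range(n) for v in range(n) if A[u, v]]
    T = np.zeros((len(arcs), len(arcs)))       # 2M x 2M
    for i, (u, v) in enumerate(arcs):
        for j, (x, w) in enumerate(arcs):
            if v == x and u != w:              # head-to-tail, no backtracking
                T[i, j] = 1.0
    return T
```

In the triangle each arc has exactly one non-backtracking successor, so every row of T sums to one.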
Unlike the adjacency matrix of an undirected graph, the Perron-Frobenius operator is
not a symmetric matrix. This is because of a constraint that arises in the construction of
oriented edges. Specifically, the arc pair with two arcs that are the reverse of one another
in the symmetric digraph are not allowed to establish an oriented edge in the oriented
line graph. This constraint arises from the second requirement in the edge definition
appearing in (6).
The Perron-Frobenius operator T is a matrix representation which can convey the
information contained in the Ihara zeta function for a graph. It is the adjacency matrix of
the oriented line graph associated with the original graph. As T is not symmetric, the
Laplacian form of the Perron-Frobenius operator cannot be uniformly defined because
the relevant vertex degree can be calculated from either incoming or outgoing edges
in the oriented line graph which is a directed graph. In this study, we consider three
types of Laplacian matrices of the Perron-Frobenius operator. They are defined as the
incoming degree matrix minus T, the outgoing degree matrix minus T, and the sum of
the incoming and outgoing degree matrices minus T, respectively.

4 The Discrete-Time Quantum Walk


The discrete-time quantum walk is the quantum counterpart of the discrete-time classical
random walk and has been used in the design of new quantum algorithms on graphs [1].
Quantum processes are reversible, and in order to make the discrete-time quantum walk
reversible a particular state must specify both the current and the previous location of the
walk. To this end, each edge e(u, v) ∈ E of the graph G(V, E) is replaced by a pair of
arcs e_d(u, v) and e_d(v, u), and the set of these arcs is denoted by E_d. This is the same
Characteristic Polynomial Analysis on Matrix Representations of Graphs 247

as the intermediate step of constructing the digraph to compute the determinant of the
Ihara zeta function. The state space for the discrete-time quantum walk is the set of arcs
E_d. If the walk is at vertex v, having previously been at vertex u with probability 1, then
the state is written as |ψ⟩ = |uv⟩. Transitions are possible from one arc e_d(w, x) to
another arc e_d(u, v), i.e. from a state |wx⟩ to a state |uv⟩, if and only if x = u and x is adjacent
to v. Note that this corresponds to only permitting transitions between adjacent vertices.
The state vector for the walk is a quantum superposition of states on single arcs of the
graph, and can be written as

|ψ⟩ = Σ_{e_d(u,v) ∈ E_d} α_uv |uv⟩    (7)

where the quantum amplitudes are complex, i.e. α_uv ∈ C. Using (7), the probability
that the walk is in the state |uv⟩ is given by Pr(|uv⟩) = α_uv α*_uv. As with the clas-
sical walk, the evolution of the state vector is determined by a matrix, in this case de-
noted U, according to |ψ_{t+1}⟩ = U|ψ_t⟩. Since the evolution of the walk is linear and
conserves probability, the matrix U must be unitary. That is, the inverse is equal to
the conjugate transpose, i.e. U⁻¹ = U†. The entries of U determine the probabilities
for transitions between states. Thus, there are constraints on these entries, and there
are therefore constraints on the permissible amplitudes for the transitions. The sum of
the squares of the amplitudes of all the transitions from a particular state must be unity.
Consider a state |ψ⟩ = |u₁v⟩, where the neighborhood of v is N(v) = {u₁, u₂, · · · , u_r}.
A single step of the walk should only assign non-zero quantum amplitudes to transitions
between adjacent states, i.e. the states |vu_i⟩ where u_i ∈ N(v).
However, since U must be unitary, these amplitudes cannot all be the same. Recall that
the walk does not rely on any labeling of the edges or vertices. Thus, the most gen-
eral form of transition is one that assigns the same amplitude to all transitions
|u₁v⟩ → |vu_i⟩, u_i ∈ N(v) \ u₁, and a different amplitude to the transition |u₁v⟩ → |vu₁⟩.
The second of these two transitions corresponds to the walk returning along the edge
by which it came. Thus, the transition will be of the form
|u₁v⟩ → a|vu₁⟩ + b Σ_{i=2}^{r} |vu_i⟩ ,   a, b ∈ C    (8)

It is usual to use the Grover diffusion matrices, which assign quantum amplitudes of
a = 2/d_v − 1 when the walk returns along the same edge and b = 2/d_v for all other
transitions. These matrices are used because they are the unitary matrices furthest from
the identity that do not depend on any labeling of the vertices.
Using the Grover diffusion matrices, the matrix U that governs the evolution of the
walk has entries

U_{(u,v),(w,x)} = 2/d_x − δ_vw   if u = x,  and 0 otherwise.    (9)
Note that the state transition matrix U of the discrete-time quantum walk and the
Perron-Frobenius operator T in the Ihara zeta function have a similar form. They
have the same dimensionality for a given graph. Specifically, all the non-zero entries of T
are 1, while the corresponding entries in U are weighted by twice the reciprocal of the connecting
248 P. Ren, R.C. Wilson, and E.R. Hancock

vertex degree in the original graph. Additionally, the entries representing reverse arcs
in U generally have the non-zero value 2/d_x − 1, while the same entries in T are always
set to zero.
In [3], the spectrum of the positive support of the third power of U, denoted
sp(S⁺(U³)), is shown to distinguish cospectral graphs. Thus sp(S⁺(U³)) provides an ef-
fective graph representation matrix.
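For concreteness, the Grover-coin evolution matrix of Eq. (9) and the positive support sp(S⁺(U³)) can be assembled as follows. This is a hedged sketch; the arc ordering and helper names are our own, not the authors':

```python
import numpy as np

def grover_walk_matrix(A):
    """U_{(u,v),(w,x)} = 2/d_x - delta_{vw} if u == x, else 0 (Eq. (9))."""
    n = A.shape[0]
    d = A.sum(axis=1)                                    # vertex degrees
    arcs = [(u, v) for u in range(n) for v in range(n) if A[u, v]]
    U = np.zeros((len(arcs), len(arcs)))
    for i, (u, v) in enumerate(arcs):                    # new state |uv>
        for j, (w, x) in enumerate(arcs):                # old state |wx>
            if u == x:
                U[i, j] = 2.0 / d[x] - (1.0 if v == w else 0.0)
    return U

def positive_support_spectrum(U):
    """Eigenvalues of S+(U^3): the 0/1 indicator of the strictly
    positive entries of the third power of U."""
    S = (np.linalg.matrix_power(U, 3) > 1e-12).astype(float)
    return np.linalg.eigvals(S)
```

Since U is real, unitarity reduces to orthogonality, U Uᵀ = I, which gives a quick sanity check on the construction.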

5 Characteristic Polynomials
Once the graph representation matrices are to hand, our task is to characterize
graphs using the different matrix representations and thus to distinguish graphs from different
classes. One simple but effective way to embed graphs into a pattern space is furnished
by spectral methods [7]. The eigenvalues of the representation matrices are used as the
elements of graph feature vectors. However, graphs of different sizes have different
numbers of eigenvalues. There are generally two ways to overcome this problem. The
first is to establish a pattern space whose dimensionality equals the cardinality of
the vertex set of the largest graph. Feature vectors of the smaller graphs
are padded with zeros before the non-zero eigenvalues, up
to the dimension of the pattern space. One drawback of this method is that the upper
bound on the dimension, i.e. the size of the largest graph, must be known before-
hand. Furthermore, for a pattern space of high dimensionality, there would be
many unnecessary zeros in the feature vectors of small graphs. The second method for
dealing with the size differences of graphs is spectral truncation. In this case, a fixed-size
subset of eigenvalues of the different graphs is used to establish the feature vectors. For
example, a fixed number of the leading non-zero eigenvalues are chosen as the ele-
ments of a feature vector. This method does not require prior knowledge of the size
of the largest graph. Nevertheless, it only exploits a fraction of the available spectral
information and thus induces varying degrees of information loss. To overcome
these drawbacks associated with traditional spectral methods, we take advantage of the
characteristic polynomial of the representation matrices. The characteristic polynomial
p(λ) of a matrix M with size N is defined as follows

p(λ) = det(λI − M) = c0 λN + c1 λN −1 + · · · + cN −1 λ + cN (10)

From (10), the characteristic polynomial of a matrix M is a function of the variable λ.
The roots {λ₁, λ₂, · · · , λ_N} of the equation p(λ) = 0 are the set of eigenvalues of the
matrix M, i.e. the spectrum of M. The key point here is that there is a close relationship
between the roots and the polynomial coefficients, as follows

c_r = (−1)^r Σ_{k₁<k₂<···<k_r} λ_{k₁} λ_{k₂} · · · λ_{k_r}    (11)

Since these coefficients are closely related to the spectrum, they can be regarded as a
possible way to characterize graphs. Here we propose to use the characteristic polyno-
mial coefficients as the elements of the feature vector for a graph, instead of the eigen-
values. In this way we embed the graphs into a pattern space using the feature vectors

based on the characteristic polynomial coefficients. The merit of using the character-
istic polynomial coefficients over spectral embedding methods is that the coefficients
naturally take advantage of the complete spectrum and thus do not induce spectral trun-
cation. Hence the dimensionality of the pattern space can be determined without taking
into account the graph size differences.
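Equation (11) is Vieta's relation between roots and coefficients: c_r is (−1)^r times the r-th elementary symmetric polynomial of the eigenvalues. A small sketch (the helper is our own, not the authors') that recovers the coefficients by expanding ∏_k(λ − λ_k):

```python
import numpy as np

def coeffs_from_spectrum(eigvals):
    """c_r = (-1)^r * sum over k1<...<kr of lambda_k1 * ... * lambda_kr,
    obtained by multiplying out prod_k (x - lambda_k)."""
    c = np.array([1.0])
    for lam in eigvals:
        c = np.convolve(c, [1.0, -lam])   # multiply by (x - lambda)
    return c
```

For a matrix M, the same coefficient vector is returned by `np.poly(M)`, which computes the coefficients from the spectrum of M.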

6 Experimental Results
We experiment with the proposed feature vectors consisting of characteristic poly-
nomial coefficients on graphs extracted from the COIL database (samples shown in
Figure 1). We first extract corner points using the Harris detector. Then we establish
Delaunay graphs based on these corner points as nodes. The graphs extracted from
sample objects are superimposed upon the sample images in Figure 1.
We choose to work with the coefficient subset {c₃, c₄, c_{N−3}, c_{N−2}, c_{N−1}, c_N}
because these coefficients tend to be the most salient ones in the relevant matrix repre-
sentations [8]. We establish the feature vector at two scaling levels: scaling only the last
four coefficients by the natural logarithm, v₁ = {c₃, c₄, ln(c_{N−3}), ln(c_{N−2}), ln(c_{N−1}),
ln(c_N)}ᵀ, and scaling all the coefficients, v₂ = {ln(c₃), ln(c₄), ln(c_{N−3}), ln(c_{N−2}),
ln(c_{N−1}), ln(c_N)}ᵀ. We conduct tests on the feature vectors consisting of the
characteristic polynomial coefficients of the following alternative matrices:
(a) Adjacency matrix of the original graph;
(b) Laplacian matrix of the original graph;
(c) Adjacency matrix of the oriented line graph (i.e. the Perron-Frobenius operator T
in the Ihara zeta function);
(d) Laplacian matrix associated with the incoming vertex degree of the oriented line
graph;
(e) Laplacian matrix associated with the outgoing vertex degree of the oriented line graph;
(f) Laplacian matrix associated with the sum of the incoming and outgoing vertex degrees
of the oriented line graph;
(g) The positive support of the third power of the state transition matrix U of the
discrete-time quantum walk (i.e. sp(S⁺(U³))).
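As a concrete sketch of how v₁ and v₂ might be assembled (our own code; taking absolute values before the logarithms, to keep them real, is our assumption and is not spelled out in the text):

```python
import numpy as np

def pattern_vectors(M):
    """Build the v1 and v2 feature vectors from the characteristic
    polynomial coefficients {c3, c4, c_{N-3}, ..., c_N} of matrix M."""
    c = np.abs(np.poly(M))            # |c_0|, |c_1|, ..., |c_N|
    head, tail = c[[3, 4]], c[-4:]    # {c3, c4} and {c_{N-3}, ..., c_N}
    v1 = np.concatenate((head, np.log(tail)))   # log-scale the last four only
    v2 = np.log(np.concatenate((head, tail)))   # log-scale everything
    return v1, v2
```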
We perform PCA on the pattern vectors to embed them into a 3-dimensional space.
We then locate the clusters using the K-means method and calculate the Rand index for
the resulting clusters. The Rand index is defined as RI = Z/(Z + Y ), where Z is the
number of agreements and Y is the number of disagreements in cluster assignment. It
takes a value in the interval [0,1], where 1 corresponds to a perfect clustering.
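The Rand index can be computed directly from its definition over all pairs of objects; a straightforward sketch (not the authors' code):

```python
from itertools import combinations

def rand_index(labels_true, labels_pred):
    """RI = Z / (Z + Y): the fraction of object pairs on which the two
    clusterings agree (same cluster in both, or different in both)."""
    Z = Y = 0
    for i, j in combinations(range(len(labels_true)), 2):
        same_true = labels_true[i] == labels_true[j]
        same_pred = labels_pred[i] == labels_pred[j]
        if same_true == same_pred:
            Z += 1      # agreement
        else:
            Y += 1      # disagreement
    return Z / (Z + Y)
```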
There are 72 view images of each object in the COIL database. The original image size
is 128 × 128. For the extracted Delaunay graphs, which have more than 120 vertices and an
average vertex degree of 5.6, the intermediate and higher coefficients of the characteristic
polynomials of (d), (e) and (f) tend to exceed 1.79 × 10^308 (the largest double-precision
value) and so cannot be computed in Matlab. Therefore, for the first set of experiments we
resize the images in the COIL database to a resolution of 70 × 70 to reduce the number of
detected corners and hence ensure that the computations do not overflow.

Fig. 1. Datasets for Experiments

Table 1. Rand Indices for v1 on 70 × 70 images

Pattern Vector        Number of Object Classes
                 4        5        6        7        8
(a) 0.8595 0.8522 0.8269 0.8233 0.8348
(b) 0.7185 0.7343 0.7302 0.7436 0.7792
(c) 0.8942 0.8319 0.8233 0.8291 0.8450
(d) 0.7076 0.6969 0.7292 0.7747 0.7717
(e) 0.7076 0.6969 0.7292 0.7747 0.7717
(f) 0.7076 0.6969 0.7292 0.7747 0.7717
(g) 0.7048 0.6637 0.6897 0.6979 0.7567

Table 2. Rand Indices for v2 on 70 × 70 images

Pattern Vector        Number of Object Classes
                 4        5        6        7        8
(a) 0.8764 0.8321 0.8090 0.8067 0.8165
(b) 0.9301 0.8501 0.8177 0.8171 0.8459
(c) 0.9300 0.8408 0.8246 0.8182 0.8429
(d) 0.9018 0.8509 0.8111 0.8073 0.8405
(e) 0.9101 0.8327 0.8173 0.8214 0.8481
(f) 0.9234 0.8318 0.8208 0.8250 0.8491
(g) 0.9296 0.8434 0.8237 0.8259 0.8514

Table 3. Rand Indices for v1 on 128 × 128 images

Pattern Vector        Number of Object Classes
                 4        5        6        7        8
(a) 0.9864 0.8522 0.8269 0.8233 0.8348
(b) 0.8382 0.8351 0.7923 0.7953 0.7797
(c) 0.9897 0.9319 0.8877 0.8757 0.8865

Table 4. Rand Indices for v2 on 128 × 128 images

Pattern Vector        Number of Object Classes
                 4        5        6        7        8
(a) 0.9794 0.9403 0.8854 0.8663 0.8744
(b) 0.9897 0.9277 0.8885 0.8801 0.8733
(c) 0.9897 0.9334 0.9000 0.8845 0.8921

Table 5. Rand Indices for traditional spectral methods on 128 × 128 images

Pattern Vector        Number of Object Classes
                 4        5        6        7        8
Laplacian Spectra 0.9245 0.8658 0.8534 0.8496 0.8601
Quantum Spectra 0.9897 0.9263 0.8920 0.8779 0.8789
Heat Contents 0.9897 0.9251 0.8995 0.8776 0.8891

The experimental results for the two types of scaled feature vectors on the resized
images are shown in Tables 1 and 2. From these tables we can see that, al-
though the within-class variation of c₃ and c₄ is reasonably small, the scheme in which
all coefficients are scaled by the natural logarithm behaves slightly better. For the feature vec-
tor v₁, the adjacency matrix of the original graph and the Perron-Frobenius operator T
perform better than the alternatives. For the feature vector v₂, each of the matrix represen-
tations has a similar performance in distinguishing graph classes.
Furthermore, we test our methods on the images at their original size. In this case,
the characteristic polynomial coefficients of the Laplacian matrices of the oriented line
graphs and those of the quantum walk matrix sp(S⁺(U³)) do not work well, due to their
computational inefficiency. Table 3 gives the results using v₁ on the adjacency matrix together
with the Laplacian matrix of the original graphs and the adjacency matrix of the oriented line
graph. Here we can see that the matrix representation in the transformed domain (i.e.
the oriented line graph) performs much better than those in the original domain. As far as v₂ is
concerned, the Perron-Frobenius operator is also generally better than the traditional
matrix representations. To compare the proposed polynomial methods with the tradi-
tional methods, we list the results for traditional spectral methods in Table 5. Among
these three methods, the first two only exploit the eigenvalues, while the heat
contents involve information contained in both the eigenvalues and the eigenvectors.
We can see that the results obtained using the characteristic polynomial coefficients of the
oriented line graph are better than those obtained using the traditional methods.

7 Conclusion

We have performed a characteristic polynomial analysis on four matrix representations
for graphs. We argue that the polynomial coefficients perform better than graph spectra
in distinguishing graph classes. For graphs of large size, the characteristic polynomial
coefficients of the Laplacians of the oriented line graphs and those of the quantum walk
matrix sp(S⁺(U³)) are less efficient to compute, due to their extremely large dy-
namic range. On the other hand, the coefficients of the Perron-Frobenius operator do
not suffer from this problem. For reasonably large graphs, the coefficients of the
Perron-Frobenius operator perform better than the alternative methods described in this
paper. Overall, from the characteristic polynomial point of view, the Perron-Frobenius
operator is the most computationally efficient of the matrix representations considered,
and is effective in characterizing graphs.

Acknowledgments
We acknowledge the financial support from the FET programme within the EU FP7,
under the SIMBAD project (contract 213250).

References
1. Aharonov, D., Ambainis, A., Kempe, J., Vazirani, U.: Quantum walks on graphs. In: Pro-
ceedings of ACM Theory of Computing (2001)
2. Bai, X., Hancock, E.R., Wilson, R.C.: Graph characteristics from the heat kernel trace.
Pattern Recognition (2009) (to appear)
3. Emms, D., Severini, S., Wilson, R.C., Hancock, E.R.: Coined quantum walks lift the cospec-
trality of graphs and trees. In: Proceedings of SSPR (2008)
4. Ihara, Y.: Discrete subgroups of PL(2, k℘). In: Proceedings of the Symposium on Pure Mathematics,
pp. 272–278 (1965)
5. Ihara, Y.: On discrete subgroups of the two by two projective linear group over p-adic fields.
Journal of the Mathematical Society of Japan 18, 219–235 (1966)
6. Kotani, M., Sunada, T.: Zeta functions of finite graphs. Journal of Mathematical Sciences, the
University of Tokyo 7(1), 7–25 (2000)
7. Luo, B., Wilson, R.C., Hancock, E.R.: Spectral embedding of graphs. Pattern Recogni-
tion 36(10), 2213–2223 (2003)
8. Ren, P., Wilson, R.C., Hancock, E.R.: Graph characteristics from the Ihara zeta function. In:
Proceedings of SSPR (2008)
9. Ren, P., Wilson, R.C., Hancock, E.R.: Pattern vectors from the Ihara zeta function. In: Pro-
ceedings of the 19th International Conference on Pattern Recognition (2008)
10. Wilson, R.C., Luo, B., Hancock, E.R.: Pattern vectors from algebraic graph theory. IEEE
Transactions on Pattern Analysis and Machine Intelligence 27(7), 1112–1124 (2005)
Flow Complexity: Fast Polytopal Graph
Complexity and 3D Object Clustering

Francisco Escolano¹, Daniela Giorgi², Edwin R. Hancock³, Miguel A. Lozano¹,
and Bianca Falcidieno²

¹ University of Alicante, Departamento de Ciencia de la Computación e Inteligencia
Artificial, Spain
² Istituto di Matematica Applicata e Tecnologie Informatiche, Consiglio Nazionale
delle Ricerche, Italy
³ University of York, Department of Computer Science, UK

Abstract. In this paper, we introduce a novel descriptor of graph com-
plexity which can be computed in real time and has the same qualitative
behavior as polytopal (Birkhoff) complexity, which has been successfully
tested in the context of bioinformatics. We also show how the phase-
change point may be characterized in terms of the Laplacian spectrum,
by analyzing the derivatives of the complexity function. In addition, the
new complexity notion (flow complexity) is applied to cluster a database
of Reeb graphs arising from the analysis of 3D objects.

1 Introduction
The quantification of the intrinsic complexity of undirected graphs has attracted
significant attention due to its fundamental practical importance, not only in
pattern recognition but also in other areas such as control theory and network
analysis. Such a quantification not only allows the complexity of different graph
structures to be compared, but also allows complexity to be traded against
goodness of fit to data when a structure is being learned. Previous complexity
characterizations include: a) the number of spanning trees and its connections with
the Laplacian spectrum, b) methods based on path-length chromatic decom-
position, c) the number of Boolean operators necessary to build the graph from
generator graphs, and, more recently, d) the notion of linear complexity of any
of the associated adjacency matrices.
In pattern recognition and machine learning the problem of quantifying graph
complexity is not only deeply related to embedding methods [1][2] for structural
classification or indexing [3], but is also key to the process of constructing graph
prototypes [4][5]. Recently, the connection between convex polytopes (and those
of the Birkhoff type in particular), heat kernels in graphs, and graph structure,
has been explored in [6][7]. A new measure of structural complexity, dubbed
polytopal complexity, has been described in the latter works. Such a measure is con-
nected with the notion of graph entropy introduced by Körner in [8], and also
with novel spectral-based analysis and categorizations of complex networks [9].
In terms of graph embedding, polytopal complexity compares well with classical

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 253–262, 2009.

© Springer-Verlag Berlin Heidelberg 2009
254 F. Escolano et al.

eigenvalue-eigenvector methods. However, the main drawback of polytopal com-
plexity is its high computational cost. As the number of iterations of the BvN
decomposition is O(n²) and a Kuhn-Munkres algorithm (O(n³)) is executed at
each iteration, we have O(n⁵) complexity per β value. Our contribution here
is to propose an alternative and faster measure with a similar qualitative behav-
ior: the so-called flow complexity. Another drawback of polytopal complexity is
that the theoretical framework of polytopes makes it difficult to characterize the phase
change in terms of graph spectra. We also test the new measure on a real structural
categorization problem (graphs derived from a database of 3D objects). All these
points are tackled in the work described herein.

2 Polytopal Graph Complexity


Given an undirected and unweighted graph G = (V, E) with diffusion kernel
K_β(G), and Birkhoff-von Neumann (BvN) decomposition K_β(G) = Σ_{α=1}^{γ} p_α P_α,
the polytopal graph complexity of G is defined by the β-dependent function:

C_β(G) = H(P) / log₂ n = (log₂ γ − D(P||U_γ)) / log₂ n ,    (1)

where P = {p₁, . . . , p_γ} is the probability density function (pdf) induced by
the BvN decomposition (see details in [6][7]), H(·) the entropy and D(·||·) the
Kullback-Leibler divergence, D(P||Q) = Σ_α p_α log(p_α/q_α).
The latter definition takes into account: a) the size of the graph, b) the number of
components of the decomposition, and c) the information content of the induced
pdf. Moreover, as the pdf induced by the complete graph C_n is U_n, our definition
of graph complexity is actually the entropy ratio H(P)/H(U_n). Independently of
n, we have C_β(C_n) = 1, and as a result a complete graph has unit β-graph
complexity. Also, C_β(I_n) = 0, where I_n is the graph with all its n vertices isolated
(without neighbors). In addition, the complexity profile fulfills

lim_{β→∞} C_β(G) = 1   and   lim_{β→0} C_β(G) = 0 ,   ∀G .    (2)
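Under the definitions above, C_β(G) can be sketched with a greedy Birkhoff-von Neumann decomposition driven by a Kuhn-Munkres step (scipy's `linear_sum_assignment`). This is an illustrative reconstruction, not the authors' implementation, and the greedy decomposition is only one of several valid BvN schemes:

```python
import numpy as np
from scipy.linalg import expm
from scipy.optimize import linear_sum_assignment

def polytopal_complexity(A, beta, tol=1e-6):
    """C_beta(G) = H(P) / log2(n) for the pdf P induced by a greedy
    BvN decomposition of the diffusion kernel K_beta = expm(-beta L)."""
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A
    K = expm(-beta * L)                       # doubly stochastic kernel
    R, p = K.copy(), []
    while R.max() > tol:
        rows, cols = linear_sum_assignment(-R)  # max-weight permutation
        alpha = R[rows, cols].min()
        if alpha <= tol:
            break
        R[rows, cols] -= alpha                # peel off alpha * permutation
        p.append(alpha)
    p = np.array(p) / np.sum(p)
    H = -np.sum(p * np.log2(p))               # entropy of the induced pdf
    return H / np.log2(n)
```

For a complete graph and large β the kernel approaches the barycenter of the Birkhoff polytope and the complexity approaches 1, matching Eq. (2).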

Thus, the graph complexity trace C_β(G) is a signature of the interaction
between the heat diffusion process and the structure/topology of the graph as
β (and thus the range of interactions between vertices) changes. It can also
be interpreted as a trajectory in B_n (the n-th Birkhoff polytope) between the
extreme point given by the identity permutation P_I = I_n and the barycenter
B* = K(C_n). In addition, the typical signature is heavy tailed, monotonically
increasing from 0 to β* ≡ arg max{C_β(G)} and monotonically decreasing from
β* to ∞. Thus, β* represents the most significant topological phase transition
regarding the impact of the diffusion process on the topology of the input graph.
In addition, it has been experimentally found [6] that the polytopal functional
is quasi-invariant to graph permutations, that is, C_β(G) ≈ C_β(QᵀAQ), where
A is the adjacency matrix of G and Q any permutation matrix of order n. Quasi-
invariance is fulfilled despite the fact that the BvN decomposition does not yield
Flow Complexity: Fast Polytopal Graph Complexity 255

such invariance in the coefficients. The polytopal descriptor has also proved to
be effective for graph embedding and subsequent graph clustering. Experiments
related to protein-protein interaction networks are presented in [7]. However,
the main drawback of the polytopal descriptor is its computational cost. As the
number of iterations of the BvN decomposition is O(n2 ) and a Kuhn-Munkres
algorithm (O(n3 )) is executed at each iteration, we have a O(n5 ) complexity
per β value. This complexity precludes the use of the descriptor in real-time
pattern-recognition tasks unless the original graph is simplified [10] beforehand.
In addition the analysis of phase change is very cumbersome in the polytopal
framework. Thus, a new descriptor, qualitatively similar but more efficient than
the current one, and also providing a simpler analytical framework, is needed.

2.1 Phase Change through Diffusion Flow


The connection of the phase change at β* with the loss of weighted perfect match-
ings, and consequently with the permanents of doubly stochastic matrices [7],
is intriguing. Although it does not yield a practical method for simplifying the
computation of polytopal complexity, it has inspired the dynamic analysis of
the diffusion flow over the structure as β changes. Let G = (V, E) be an undi-
rected graph with |V| = n and adjacency matrix A. The diffusion kernel is
K_β(G) = exp(−βL) ≡ ΦΛΦᵀ, where Λ = diag(e^{−βλ_n}, e^{−βλ_{n−1}}, . . . , e^{−βλ_1} = 1),
and λ₁ = 0 ≤ λ₂ ≤ . . . ≤ λ_n are the eigenvalues of L. Thus

K_β^{ij} = Σ_{k=1}^{n} φ_k(i) φ_k(j) e^{−λ_{n−k+1} β} ,    (3)

and K_β^{ij} ∈ [0, 1] is the (i, j) entry of a doubly stochastic matrix. The fact that
the heat kernel in the interval [β*, β_min] is populated by an increasing number of
elements for which K_β^{ij} ≈ 1, or equivalently an increasing number of off-diagonal
elements for which K^{ij} ≈ 0, i ≠ j, motivated the analysis of the phase change not
only through permanents, but also through the dynamic quantification of the
total heat flowing through the network represented by the graph. In this context,
heat flowing means heat passing through a given edge of the graph. Therefore,
the total heat flowing through the graph at a given β is:

F_β(G) = Σ_{i=1}^{n} Σ_{j≠i} Σ_{k=1}^{n} δ_ij φ_k(i) φ_k(j) e^{−λ_{n−k+1} β} ,    (4)

where δ_ij = 1 if A_ij = 1 and δ_ij = 0 otherwise, that is, δ_ij = 1 if (i, j) ∈ E.
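F_β(G) is cheap to evaluate: one eigendecomposition of the Laplacian gives the heat kernel, and the flow is the sum of its entries over the edges. A minimal sketch (our own code; the logarithmic normalization anticipates the definition of flow complexity in Eq. (8) below):

```python
import numpy as np

def flow_complexity(A, beta):
    """Cf_beta(G) = log2(1 + F_beta(G)) / log2(n), where F_beta(G) is the
    total heat flowing through the edges of G (Eq. (4))."""
    n = A.shape[0]
    L = np.diag(A.sum(axis=1)) - A                    # combinatorial Laplacian
    lam, Phi = np.linalg.eigh(L)
    K = Phi @ np.diag(np.exp(-beta * lam)) @ Phi.T    # heat kernel exp(-beta L)
    F = np.sum(K * A)                                 # sum of K_ij over both arc directions
    return np.log2(1.0 + F) / np.log2(n)
```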
If we take the derivative of F_β with respect to β, plug in the second-order Taylor
expansion of e^{−λ_{n−k+1}β}, and set the derivative to zero, then we have:

F'_β(G) = Σ_{i=1}^{n} Σ_{j≠i} Σ_{k=1}^{n} δ_ij φ_k(i) φ_k(j) (−λ_{n−k+1}) [ 1 − λ_{n−k+1}β + (λ_{n−k+1}β)²/2! ] = 0    (5)

Let r ≤ n denote the number of components in the upper triangular part of A
with δ_ij = 1 (the same as in the lower triangular part), and let
(i₁, . . . , i_r), (j₁, . . . , j_r) be new indexes for these components. Then we have:

F'_β(G) = Σ_{i=i₁}^{i_r} Σ_{j=j₁}^{j_r} Σ_{k=1}^{n} φ_k(i) φ_k(j) (−λ_{n−k+1}) [ 1 − λ_{n−k+1}β + (λ_{n−k+1}β)²/2! ]
        = Σ_{k=1}^{n} (−λ_{n−k+1}) [ 1 − λ_{n−k+1}β + (λ_{n−k+1}β)²/2! ] Γ(k)
        = − Σ_{k=1}^{n} (λ³_{n−k+1}/2) Γ(k) β² + Σ_{k=1}^{n} λ²_{n−k+1} Γ(k) β − Σ_{k=1}^{n} λ_{n−k+1} Γ(k) = 0 ,    (6)

where Γ(k) = Σ_{i=i₁}^{i_r} Σ_{j=j₁}^{j_r} φ_k(i) φ_k(j),
which is a quadratic equation in β. Thus, let β⁺ be one of the solutions to the
equation F'_β(G) = 0. Instead of solving that equation here, we must consider
that the second derivative at the phase-change point must be negative (local
concavity). So, a valid β⁺ must satisfy:
F''_β(G) = − Σ_{k=1}^{n} λ³_{n−k+1} Γ(k) β + Σ_{k=1}^{n} λ²_{n−k+1} Γ(k) < 0 ,    (7)
which is only true for β > 0. Actually, we define β⁺ as the minimum β > 0
satisfying the latter inequality. This rationale is still valid when, to be coherent
with the definition of polytopal complexity in Eq. 1, we take the log₂ of the
number of components in the Birkhoff decomposition. Following the rule of
defining complexity by multiplying entropy and disorder [11], which in our case
corresponds to a normalizing factor, we define the graph flow complexity as:
corresponds to a normalizing factor, we define the graph flow complexity as:
Cf_β(G) = log₂(1 + F_β(G)) / log₂ n ,    (8)

whose first derivative with respect to β is:
Cf'_β(G) = F'_β(G) / ((log₂ n)(ln 2)(1 + F_β(G)))
         = (1/Λ_β) [ − Σ_{k=1}^{n} (λ³_{n−k+1}/2) Γ(k) β² + Σ_{k=1}^{n} λ²_{n−k+1} Γ(k) β − Σ_{k=1}^{n} λ_{n−k+1} Γ(k) ] ,    (9)

where Λ_β = (log₂ n)(ln 2)(1 + F_β(G)).
Setting Cf'_β(G) = 0 thus results in the same β⁺ as defined in Eq. 6. However, what
about the concavity of Cf_β(G)?
Analyzing the second derivative of the new complexity measure, we have:

Cf''_β(G) = − (Λ'_β/Λ²_β) F'_β(G) + (1/Λ_β) F''_β(G) = (Λ_β F''_β(G) − Λ'_β F'_β(G)) / Λ²_β
          = (a(1 + F_β(G)) F''_β(G) − a(F'_β(G))²) / Λ²_β .

Cf''_β(G) < 0 ⇒ (1 + F_β(G)) F''_β(G) − (F'_β(G))² < 0 ,    (10)

where a = (log₂ n)(ln 2). In general, we define β* as the minimum β satisfying the
latter inequality. This definition is coherent with the polytopal profile: at β* the
function either starts a decay which passes through an inflexion towards the
equilibrium, or reaches the equilibrium immediately. Equilibrium is defined as the
limiting value lim_{β→∞} Cf_β(G). Although this equilibrium point is different from
the polytopal one, the qualitative behavior of the two profiles is similar and
independent of the scales (numbers of nodes) of the graphs. The difference between
this descriptor and the polytopal one is that flow complexity is very fast to compute.
Therefore, graph simplification is no longer needed to obtain a functional descriptor
in real time. Moreover, the flow-based profile is also permutation invariant.
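The concavity condition of Eq. (10) suggests a simple numerical search for β*: scan increasing β and return the first value at which (1 + F)F'' − (F')² turns negative, estimating the derivatives of F_β by finite differences. This is an illustrative procedure of our own, not the authors':

```python
import numpy as np

def phase_change_point(A, betas, h=1e-4):
    """Smallest beta in `betas` where (1 + F) F'' - (F')^2 < 0, cf. Eq. (10)."""
    L = np.diag(A.sum(axis=1)) - A
    lam, Phi = np.linalg.eigh(L)

    def F(b):   # total heat flow F_beta(G), cf. Eq. (4)
        K = Phi @ np.diag(np.exp(-b * lam)) @ Phi.T
        return np.sum(K * A)

    for b in betas:
        Fb = F(b)
        Fp = (F(b + h) - F(b - h)) / (2 * h)           # first derivative
        Fpp = (F(b + h) - 2 * Fb + F(b - h)) / h**2    # second derivative
        if (1.0 + Fb) * Fpp - Fp**2 < 0:
            return b
    return None
```

In practice, a closed-form β⁺ is also available from the quadratic of Eq. (6); the scan above simply avoids computing the Γ(k) terms explicitly.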

3 Reeb Graphs of 3D Objects and Experimentation

3.1 Reeb Graphs from Geodesics

In their original formulation, Reeb graphs date back to 1946, when they were
defined by George Reeb as topological constructs [12]. The basic idea is to ob-
tain information concerning the topology of a manifold M from the information
related to the critical points of a real function f defined on M. This is done by
analyzing the behaviour of the level sets L_a of f, namely the sets of points sharing
the same value of f: L_a = f⁻¹(a) = {P ∈ M : f(P) = a}. As the isovalue a spans
the range of its possible values in the co-domain of f, connected components of
level sets may appear, disappear, join, split or change genus. The Reeb graph
keeps track of these changes and stores them in a graph structure, whose nodes
are associated with the critical points of f.
Reeb graphs were introduced into Computer Graphics by Shinagawa et al.
in 1991 [13], and since then they have become popular for shape analysis and
description. The extension of Reeb graphs to triangle meshes
has attracted considerable interest, and has proved to be one of the most pop-
ular representations for shapes in Computer Graphics. In this paper, we follow
the computational approach in [14,15], where a discrete counterpart of Reeb
graphs, referred to as the Extended Reeb Graph (ERG), is defined for triangle
meshes representing surfaces in R³. The basic idea behind the ERG is to provide a
region-oriented characterization of surfaces, rather than a point-oriented charac-
terization. This is done by replacing the notion and role of critical points with
that of critical areas, and the analysis of the behaviour of the level sets with
the analysis of the behaviour of surface stripes, defined by partitioning the co-
domain of f into a finite set of intervals. More precisely, we consider a finite
number of level sets of f, which divide the surface into a set of regions. Each
region is classified as a regular or a critical area according to the number of its
boundary components and the value of f along them. Critical areas are classified as

Fig. 1. Pipeline of the ERG extraction. Left: surface partition and recognition of critical
areas; blue areas correspond to minima, red areas correspond to maxima, green areas
to saddle areas. Middle: insertion of edges between minima and saddles and between
maxima and saddles, by expanding all maxima and minima to their nearest critical
area. Right: insertion of the remaining edges, to form the final graph.

maximum, minimum and saddle areas. A node in the graph is associated with
each critical area. Then arcs between nodes are detected through an expansion
process of the critical areas, by tracking the evolution of the level sets. The
pipeline of the ERG extraction is illustrated in Fig. 1.
A fundamental property of ERGs is their parametric nature with respect to
the mapping function f: different choices of f produce different graphs. In this
paper, we deal with the comparison of free-form 3D shapes, hence we require a
graph representation that is invariant with respect to rotations and translations
(to distinguish the real content of shapes from the particular choices made by
the 3D designer) and possibly with respect to pose changes. A solution comes
from the adoption of the integral geodesic distance, as discretized in [16]. For
each vertex v in the mesh, the function is defined as

f(v) = Σ_i g(v, b_i) · area(b_i) ,

where {b_i} = {b₀, . . . , b_k} is an almost uniform sampling of the mesh vertices,
g(v, b_i) is the geodesic distance of vertex v from point b_i, and area(b_i) is the area

Fig. 2. 3D models described by the integral geodesic distance (red corresponds to high
values of the function, blue corresponds to low values), along with the corresponding
Extended Reeb Graphs

of the neighborhood of b_i. We recall that the geodesic distance between two given
surface points p and q is the minimal length over all surface curves joining p and
q. Since the geodesic distance relies neither on a local coordinate system nor on
the surface embedding, the derived graph configuration is invariant to translation
and rotation of the model, as well as to isometric pose changes, and is thus well
suited to dealing with articulated 3D shapes. Fig. 2 shows some example models
described by the integral geodesic distance (red corresponds to high values of the
function, blue corresponds to low values), along with their corresponding ERGs.
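On the mesh's edge graph, the integral geodesic function can be approximated with shortest-path distances. The sketch below is our own (the function and argument names are assumptions), using scipy's Dijkstra; on a real mesh g(v, b_i) would be a true surface geodesic rather than a graph distance:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import dijkstra

def integral_geodesic(edges, weights, n_vertices, base_idx, base_areas):
    """f(v) = sum_i g(v, b_i) * area(b_i), with geodesic distances
    approximated by shortest paths along the mesh edge graph."""
    W = csr_matrix((weights, (edges[:, 0], edges[:, 1])),
                   shape=(n_vertices, n_vertices))
    W = W.maximum(W.T)                                  # symmetrize edge graph
    g = dijkstra(W, directed=False, indices=base_idx)   # |bases| x n distances
    return base_areas @ g                               # area-weighted sum
```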

Fig. 3. The dataset of 300 models used for the experiments in this paper. See the
SHREC 2008 Stability Track Report [17] for further details.

Fig. 4. Representative profiles for the two super-clusters



Fig. 5. Pairwise similarity matrix



In this paper, we are using the dataset of 300 3D models used in the Stability
Track of the SHREC Contest 2008 [17]. This dataset is made of 15 classes, with
20 models per class. The objects included range from humans and animals to
cups and mechanical parts, as shown in Fig. 3.

3.2 Experimentation
We have tested the new complexity measure as a descriptor (discretization of
the function) of 300 graphs (15 classes with 20 members each, see Fig. 3). These
graphs correspond to the geodesics of the 3D shapes described in the previous
section. We have verified that the profile similarity is approximately invariant
to non-rigid deformations, and thus not effective for clustering the patterns de-
rived from the discretization. Instead we have compared them using the Eu-
clidean distances dij and have constructed a pairwise similarity matrix M where
Mij = e−Kdij (in this case K = 25). We show the resulting similarity matrix in
Fig. 5, with 90,000 entries. The analysis of this matrix reveals two very compact
classes (glasses, spring), and some other classes with a smaller degree of com-
pactness (armadillo, cup, bird). There are also two super-clusters. The first one,
of 6 classes, includes (see one representative profile of each class in Fig. 4-left):
airplane, chair, octopus, table, hand, and fish, with airplane and hand being the
most compact ones. The table class is too sparse. The second super-cluster (see
Fig. 4-right) includes bust, mechanic and fourleg. The human class is highly similar
to elements in both super-clusters, and in particular to the fourleg class. In both
super-clusters, the profiles follow the same characteristic path in qualitative terms,
having similar values of β∗ and different, but sometimes coincident, values of
Fβ∗(G). This indicates that the descriptor should be applied to graphs originating
from sources alternative to geodesics, and that an integrated clustering should then
be performed, in order to take into account these additional measures of similarity.
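The similarity matrix construction above is easy to reproduce. The following sketch assumes plain Python lists of descriptor vectors; the kernel Mij = e^(−K·dij) with K = 25 follows the text, while the function name and toy data are ours:

```python
import math

def pairwise_similarity(descriptors, K=25.0):
    """Build the pairwise similarity matrix M[i][j] = exp(-K * d_ij),
    where d_ij is the Euclidean distance between descriptor vectors."""
    n = len(descriptors)
    M = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            d = math.sqrt(sum((a - b) ** 2
                              for a, b in zip(descriptors[i], descriptors[j])))
            M[i][j] = math.exp(-K * d)
    return M

# Toy example: three 2-D descriptors; identical descriptors get similarity 1.
M = pairwise_similarity([(0.0, 0.0), (0.0, 0.0), (1.0, 0.0)], K=25.0)
```

For the 300 shape descriptors of the experiment this yields the 300 × 300 (90,000-entry) matrix of Fig. 5.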

4 Conclusions
In this paper, we have proposed and successfully tested a novel measure of graph
complexity which is qualitatively similar to, but more efficient than, the polytopal
one. The novel measure provides a simpler analysis framework for characterizing
the phase-transition locations in terms of graph spectra. The new measure has
been tested on clustering a database of 300 graphs originating from 3D objects.
We have found several clusters and a pair of super-clusters in the data. Most
of the classes in the super-clusters are consistent with the corresponding 3D
shapes. However, there is some scope for improving the discriminating power of
the method. The latter is a topic for future research, and it will be addressed
through computing the descriptors of several graphs originating from the same
shape with alternatives to the geodesics (eigenfunctions, for instance) and then
performing consensus clustering. This approach is feasible since the descriptor is
efficient. We will also compare this method with alternatives described in the
literature, since it is now possible to compare such descriptors for very large
networks.

Acknowledgements
Work partially supported by the Project SHALOM funded by the Italian Min-
istry of Research (contract number RBIN04HWR8) and the EU FP7 Project
FOCUS K3D (contract number ICT-2007.4.2).

References
1. Robles-Kelly, A., Hancock, E.R.: A Riemannian approach to graph embedding.
Pattern Recognition 40, 1042–1056 (2007)
2. Luo, B., Wilson, R.C., Hancock, E.: Spectral embedding of graphs. Pattern
Recognition 36, 2213–2223 (2003)
3. Shokoufandeh, A., Dickinson, S., Siddiqi, K., Zucker, S.: Indexing using a spectral
encoding of topological structure. In: IEEE ICPR, pp. 491–497
4. Torsello, A., Hancock, E.: Learning shape-classes using a mixture of tree-unions.
IEEE Trans. on PAMI 28(6), 954–967 (2006)
5. Lozano, M., Escolano, F.: Protein classification by matching and clustering surface
graphs. Pattern Recognition 39(4), 539–551 (2006)
6. Escolano, F., Hancock, E., Lozano, M.: Birkhoff polytopes, heat kernels, and graph
embedding. In: ICPR (2008)
7. Escolano, F., Hancock, E., Lozano, M.: Graph complexity, matrix permanents, and
embedding. In: Proc. SSPR/SPR (2008)
8. Körner, J.: Coding of an information source having ambiguous alphabet and the
entropy of graphs. In: Trans. of the 6th Prague Conference on Information Theory,
pp. 411–425 (1973)
9. Estrada, E.: Graph spectra and structure in complex networks. Technical report,
Institute of Complex Systems at Strathclyde, Department of Physics and Depart-
ment of Mathematics, University of Strathclyde, Glasgow, UK (2008)
10. Qiu, H., Hancock, E.: Graph simplification and matching using commute times.
Pattern Recognition 40, 2874–2889 (2007)
11. López-Ruiz, R., Mancini, H., Calbet, X.: A statistical measure of complexity.
Physics Letters A 209, 321–326 (1995)
12. Reeb, G.: Sur les points singuliers d’une forme de Pfaff complètement intégrable
ou d’une fonction numérique. Comptes Rendus Hebdomadaires des Séances de
l’Académie des Sciences 222, 847–849 (1946)
13. Shinagawa, Y., Kunii, T.L.: Constructing a Reeb Graph automatically from cross
sections. IEEE Computer Graphics and Applications 11(6), 44–51 (1991)
14. Biasotti, S.: Computational Topology Methods for Shape Modelling Applications.
Ph.D thesis, Universitá degli Studi di Genova (May 2004)
15. Biasotti, S.: Topological coding of surfaces with boundary using Reeb graphs. Com-
puter Graphics and Geometry 7(1), 31–45 (2005)
16. Hilaga, M., Shinagawa, Y., Kohmura, T., Kunii, T.L.: Topology matching for fully
automatic similarity estimation of 3D shapes. In: SIGGRAPH 2001: Proceedings
of the 28th Annual Conference on Computer Graphics and Interactive Techniques,
Los Angeles, CA, pp. 203–212. ACM Press, New York (2001)
17. Attene, M., Biasotti, S.: Shape retrieval contest 2008: Stability of watertight mod-
els. In: Spagnuolo, M., Cohen-Or, D., Gu, X. (eds.) SMI 2008: Proceedings IEEE
International Conference on Shape Modeling and Applications, pp. 219–220 (2008)
Irregular Graph Pyramids and Representative
Cocycles of Cohomology Generators

Rocio Gonzalez-Diaz1,3, Adrian Ion3, Mabel Iglesias-Ham2,3, and Walter G. Kropatsch3

1 Applied Math Department, University of Seville, Spain
2 Pattern Recognition Department, CENATAV, Havana, Cuba
3 Pattern Recognition and Image Processing Group, Vienna University of Technology

Abstract. Structural pattern recognition describes and classifies data
based on the relationships of features and parts. Topological invariants,
like the Euler number, characterize the structure of objects of any di-
mension. Cohomology can provide more refined algebraic invariants to
a topological space than does homology. It assigns ‘quantities’ to the
chains used in homology to characterize holes of any dimension. Graph
pyramids can be used to describe subdivisions of the same object at mul-
tiple levels of detail. This paper presents cohomology in the context of
structural pattern recognition and introduces an algorithm to efficiently
compute representative cocycles (the basic elements of cohomology) in
2D using a graph pyramid. Extension to nD and application in the con-
text of pattern recognition are discussed.

Keywords: Graph pyramids, representative cocycles of cohomology generators.

1 Introduction

Image analysis deals with digital images as input to pattern recognition systems.
Topological features have the ability to ignore changes in geometry caused by dif-
ferent transformations. Simple features are for example the number of connected
components, the number of holes, etc., while more refined ones, like homology
and cohomology, characterize holes and their relations.
In order to characterize the holes in a region adjacency graph (RAG) associ-
ated to a 2D binary digital image, one way would be to consider the cycles with
exactly 4 edges as degenerate cycles and establish an equivalence between all the
cycles of the graph as follows: two cycles are equivalent if one can be obtained
from the other by joining to it one or more degenerate cycles. There is only one
equivalence class for the foreground (gray pixels) of the digital image in Fig. 1,
which represents the unique hole. This is similar to considering the digital image

Partially supported by the Austrian Science Fund under grants S9103-N13 and
P18716-N13; Junta de Andalucı́a (grants FQM-296 and TIC-02268) and Spanish
Ministry for Science and Education (grant MTM-2006-03722).

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 263–272, 2009.
© Springer-Verlag Berlin Heidelberg 2009
264 R. Gonzalez-Diaz et al.

Fig. 1. a) A 2D digital image I; b) its RAG; c) a cell complex associated to I (in blue,
a representative cocycle); and d) the cell complex without the hole

as a cell complex1 [1] (see Fig. 1.c). Here one can ask for the edges we have to
delete in order to ‘destroy’ the hole.
In the example in Fig. 1 it is not enough to delete only one edge. The set of blue
edges in Fig. 1.c blocks any cycle that surrounds the hole; the deletion of these
edges together with the faces that they bound produces the ‘disappearance’ of
the hole. A 1-cocycle of a planar object can be seen as a set of edges ‘blocking’
the creation of cycles of one homology class. The elements of cohomology are
equivalence classes of cocycles.
Topology simplification is an active field in geometric modeling and medical
imaging (see for example [2]). In fact, the ring structure of cohomology is a more
refined invariant than homology. The main drawbacks to using cohomology
in Pattern Recognition have been its lack of geometrical meaning and the
complexity of computing it. Nevertheless, concepts related to cohomology can have
associated interpretations in graph theory. Having these interpretations opens
the door for applying classical graph theory algorithms to compute and manip-
ulate these features. Initial plans regarding this research are presented in Section 5.
The paper is organized as follows: Sections 2 and 3 recall graph pyramids
and cohomology, and make initial connections. Section 4 presents the proposed
method. Section 5 gives considerations regarding the usage of cohomology in
image processing. Section 6 concludes the paper.

2 Irregular Graph Pyramids

A RAG encodes the adjacency of regions in a partition. A vertex is associated
to each region; vertices of neighboring regions are connected by an edge.
Classical RAGs do not contain any self-loops or parallel edges. An extended region
adjacency graph (eRAG) is a RAG that contains the so-called pseudo edges,
which are self-loops and parallel edges used to encode neighborhood relations
to a cell completely enclosed by one or more other cells [3]. The dual graph of
an eRAG G is called a boundary graph (BG) and is denoted by Ḡ (G is said
to be the primal graph of Ḡ). The edges of Ḡ represent the boundaries (borders)
of the regions encoded by G, and the vertices of Ḡ represent points where
boundary segments meet. G and Ḡ are planar graphs. There is a one-to-one
1 Intuitively a cell complex is defined by a set of 0-cells (vertices) that bound a set of
1-cells (edges), that bound a set of 2-cells (faces), etc.
Irregular Graph Pyramids and Representative Cocycles 265

Fig. 2. A digital image I, and boundary graphs G6 , G10 and G16 of the pyramid of I

correspondence between the edges of G and the edges of Ḡ, which also induces
a one-to-one correspondence between the vertices of G and the 2D cells (denoted
by faces2) of Ḡ. The dual of Ḡ is again G. The following operations are
equivalent: edge contraction in G with edge removal in Ḡ, and edge removal in
G with edge contraction in Ḡ.
A (dual) irregular graph pyramid [3,4] is a stack of successively reduced planar
graphs P = {(G0, Ḡ0), . . . , (Gn, Ḡn)}. Each level (Gk, Ḡk), 0 < k ≤ n, is obtained
by first contracting edges in Gk−1 (removal in Ḡk−1), if their end vertices have
the same label (regions should be merged), and then removing edges in Gk−1
(contraction in Ḡk−1) to simplify the structure. The contracted and removed
edges are said to be contracted or removed (sometimes called removal edges)
in (Gk−1, Ḡk−1). In each Gk−1 and Ḡk−1, contracted edges form trees called
contraction kernels. One vertex of each contraction kernel is called a surviving
vertex and is said to have ‘survived’ to (Gk, Ḡk). The vertices of a
contraction kernel in level k − 1 form the reduction window of the respective
surviving vertex v in level k. The receptive field of v is the (connected) set of
vertices from level 0 that have been ‘merged’ to v over levels 0 . . . k.
For each boundary graph Ḡi, the cell complex [5] associated to the foreground
object, called boundary cell complex, is obtained by taking all faces of Ḡi
corresponding to vertices of Gi whose receptive fields contain (only) foreground
pixels, and adding all edges and vertices needed to represent the faces.
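As a minimal illustration of the plain RAG defined at the start of this section (not the authors' pyramid implementation), one can build it from a 2D label image; the function name, data layout, and 4-neighborhood choice are ours:

```python
def region_adjacency_graph(labels):
    """Build a plain RAG from a 2D array of region labels: one vertex
    per region, one edge between 4-neighboring distinct regions.
    (The self-loops and parallel edges of the eRAG are omitted here.)"""
    h, w = len(labels), len(labels[0])
    edges = set()
    for y in range(h):
        for x in range(w):
            # Look only right and down so each neighbor pair is seen once.
            for ny, nx in ((y, x + 1), (y + 1, x)):
                if ny < h and nx < w and labels[y][x] != labels[ny][nx]:
                    edges.add(frozenset((labels[y][x], labels[ny][nx])))
    vertices = {label for row in labels for label in row}
    return vertices, edges

# A 3x3 partition with three regions, all pairwise adjacent.
V, E = region_adjacency_graph([[0, 0, 1],
                               [0, 2, 1],
                               [2, 2, 1]])
```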
Lemma 1. All the boundary cell complexes of a given irregular dual graph pyra-
mid are cell subdivisions of the same object. Therefore, all these cell complexes
are homeomorphic.
As a result of Lemma 1, topological invariants computed on different levels of
the pyramid are equivalent.

3 Cohomology and Integral Operators


Intuitively, homology characterizes the holes of any dimension (i.e. cavities, tun-
nels, etc.) of an n-dimensional object. It defines the concept of generators which,
for example for 2D objects are similar to closed paths of edges surrounding holes.
More generally, k-dimensional manifolds surrounding (k + 1)-dimensional holes are
generators [5], and define equivalence classes of holes. Cohomology arises from
2 Not to be confused with the vertices of the dual of a RAG (sometimes also denoted
by the term faces).

0-cells {v1 , v2 , v3 , v4 }
1-cells {e1 , e2 , e3 , e4 , e5 , e6 }
2-cells {f1 }
1-boundary ∂f1 = e1 + e2 + e5
1-chain e1 + e3
1-cycle a = e3 + e4 + e5
1-cycle b = e1 + e2 + e3 + e4
homologous cycles a and b; since a = b + ∂f1

Fig. 3. Example cell complex

the algebraic dualization of the construction of homology. It manipulates groups
of homomorphisms to define equivalence classes. Intuitively, cocycles (the
invariants computed by cohomology) represent the sets of elements (e.g. edges)
to be removed to destroy certain holes. See Fig. 1.c for an example cocycle.
Starting from a cell decomposition of an object, its homology studies incidence
relations of its subdivision. Fig. 3 illustrates the following abstract concepts. A
cell of dimension p is called a p-cell. The notion of p-chain is defined as a formal
sum of p-cells. The chains are considered over Z/2 coefficients i.e. a p-cell is either
present in a p-chain (coefficient 1) or absent (coefficient 0) - any cell that appears
twice vanishes. The set of p-chains form an abelian group called the p-chain group
Cp . This group is generated by all the p-cells. The boundary operator is a set of
homomorphisms {∂p : Cp → Cp−1}p≥0 connecting two immediate dimensions:
· · · −∂p+1→ Cp −∂p→ Cp−1 → · · · −∂1→ C0 −∂0→ 0. By linearity, the boundary of any p-chain
is defined as the formal sum of the boundaries of each p-cell that appears in the
chain. The boundary of 0-cells (i.e. points) is always 0. For each p, ∂p−1 ∂p = 0.
A p-chain σ is called a p-cycle if ∂p(σ) = 0. If σ = ∂p+1(μ) for some (p + 1)-chain
μ then σ is called a p-boundary. Two p-cycles a and a′ are homologous if there
exists a p-boundary b such that a = a′ + b.
Denote the groups of p-cycles and p-boundaries by Zp and Bp respectively.
All p-boundaries are p-cycles (Bp ⊆ Zp ). Define the pth homology group to be
the quotient group Hp = Zp /Bp , for all p. Each element of Hp is a class obtained
by adding each p-boundary to a given p-cycle a. Then a is a representative cycle
of the homology class a + Bp .
Cohomology groups are constructed by turning chain groups into groups of
homomorphisms and boundary operators into their dual homomorphisms. Define
a p-cochain as a homomorphism c : Cp → Z/2. We can see a p-cochain as a binary
mask of the set of p-cells: imagine you order all p-cells in the complex (let’s say
we have n p-cells, and call this ordered set Sp ). Then a p-cochain c is a binary
mask of n values in {0, 1}^n.
The p-cochains form the set C p which is a group. A p-cochain c is totally
defined by the set of p-cells that are evaluated to 1 by c. The boundary operator
defines a dual homomorphism, the coboundary operator δ p : C p → C p+1 , such
that δ p (c) = c∂p+1 for any p-cochain c. Since the coboundary operator runs
in a direction opposite to the boundary operator, it raises the dimension. Its

kernel is the group of cocycles and its image is the group of coboundaries. Two
p-cocycles c and c′ are cohomologous if there exists a p-coboundary d such that
c = c′ + d. The pth cohomology group is defined as the quotient of p-cocycle
modulo p-coboundary groups, H p = Z p /B p , for all p. Each element of H p is a
class obtained by adding each p-coboundary to a given p-cocycle c. Then c is a
representative cocycle of the cohomology class c + B p . If the object is embedded
in R3 , then homology and cohomology are isomorphic. However, cohomology has
a ring structure which is a more refined invariant than homology. See [5] for a
more detailed explanation.
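The Z/2 formalism above becomes concrete if chains and cochains are both encoded as sets of cells (the cells with coefficient 1). The sketch below is illustrative only; the encoding and function names are ours, and the example data come from the complex of Fig. 3:

```python
def eval_cochain(cochain, chain):
    """Evaluate a p-cochain on a p-chain over Z/2: both are encoded as
    sets of p-cells, and the value is the parity of their overlap."""
    return len(cochain & chain) % 2

def coboundary(cochain, boundary):
    """delta(c) = c∘∂: the set of (p+1)-cells whose boundary evaluates
    to 1 under c. `boundary` maps each (p+1)-cell to its p-cell set."""
    return {cell for cell, b in boundary.items() if eval_cochain(cochain, b)}

# The complex of Fig. 3: the single 2-cell f1 has boundary e1 + e2 + e5.
boundary2 = {"f1": {"e1", "e2", "e5"}}
d1 = coboundary({"e1"}, boundary2)        # {e1} alone is not a 1-cocycle
d2 = coboundary({"e1", "e2"}, boundary2)  # {e1, e2} is a 1-cocycle here
```

A cochain is a 1-cocycle exactly when its coboundary is empty, as the two calls above illustrate.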
Starting from a cell decomposition of an object (e.g. from any level of the
pyramid) and the chain complex associated to it, · · · −∂2→ C1 −∂1→ C0 −∂0→ 0, take a
q-cell σ and a (q + 1)-chain α. An integral operator [6] is defined as the set of
homomorphisms {φp : Cp → Cp+1}p≥0 such that φq(σ) = α, φq(μ) = 0 if μ is a
q-cell different from σ, and for all p ≠ q and any p-cell γ we have φp(γ) = 0. It is
extended to all p-chains by linearity.
Integral operators can be seen as a kind of inverse boundary operator. They
satisfy the condition φp+1 φp = 0 for all p. An integral operator {φp : Cp →
Cp+1 }p≥0 satisfies the chain-homotopy property iff φp ∂p+1 φp = φp for each p.
For φp satisfying the chain-homotopy property, define πp = idp + φp−1∂p + ∂p+1φp,
where {idp : Cp → Cp}p≥0 is the identity. Then, · · · −∂2→ imπ1 −∂1→ imπ0 −∂0→ 0 is
a chain complex and {πp : Cp → imπp } is a chain equivalence [5]. Its chain-
homotopy inverse is the inclusion map {ιp : imπp → Cp }.
Consider, for example, the cell complex K in Fig. 4 on the left. The integral
operator associated to the removal of the edge e is given by φ1 (e) = B. Then,
π1(e) = a + f + d, π2(B) = 0, π2(A) = A + B (A + B is renamed as A in K′) and
πp is the identity over the other p-cells of K, p = 0, 1, 2. The removal of edge e
decreased the degree of vertices 1 and 3 allowing for further simplification.
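The example can be checked mechanically. The sketch below assumes a boundary ∂B = a + d + e + f for the face B of K (consistent with π1(e) = a + f + d, though not stated explicitly in the text); everything else follows the definition π = id + φ∂ + ∂φ, with φ0 = 0:

```python
def chain_sum(*chains):
    """Chain addition over Z/2 = symmetric difference of cell sets."""
    out = set()
    for ch in chains:
        out ^= set(ch)
    return out

# Hypothetical boundary of the 2-cell B in the complex K of Fig. 4,
# chosen so that the result matches the text: ∂B = a + d + e + f.
boundary_B = {"a", "d", "e", "f"}

def phi1(chain):
    """Integral operator with phi1(e) = B, zero on every other 1-cell."""
    return {"B"} if "e" in chain else set()

def pi1(chain):
    """pi1 = id + phi0∘∂1 + ∂2∘phi1; phi0 is zero here, so only the
    last term contributes."""
    d2_phi = boundary_B if phi1(chain) else set()
    return chain_sum(chain, d2_phi)

result = pi1({"e"})  # the edge e is replaced by the rest of ∂B
```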
The following lemma guarantees the correctness of the down projection pro-
cedure given in Section 5.
Lemma 2. Let {φp : Cp → Cp+1}p≥0 be an integral operator satisfying the
chain-homotopy property. The chain complexes · · · −∂2→ C1 −∂1→ C0 −∂0→ 0 and
· · · −∂2→ imπ1 −∂1→ imπ0 −∂0→ 0 have isomorphic homology and cohomology
groups. If c : imπp → Z/2 is a representative p-cocycle of a cohomology
generator, then cπ : Cp → Z/2 is a representative p-cocycle of the same generator.
For example, consider the cell complex K′ of Fig. 4. The 1-cochain α, defined
by the set {c, d} of edges of K′, is a 1-cocycle which ‘represents’ the white hole

                 φ    π
e                B    a + f + d
B                0    0
A                0    A
other p-cell σ   0    σ

Fig. 4. The cell complexes K and K′ and the homomorphisms φ, π, ι



H (in the sense that all the cycles representing the hole must contain at least
one of the edges of the set). Then β = απ is defined by the set {c, d, e} of
edges of K. α and β are both 1-cocycles representing the same white hole H.

Lemma 3. The two operations used to construct an irregular graph pyramid,
edge removal and edge contraction, are integral operators satisfying the
chain-homotopy property.

In terms of embedded graphs an integral operator maps a vertex/point to exactly
one of its incident edges and an edge to exactly one of its incident faces. In every
level of a graph pyramid, the contraction kernels make up a spanning forest. A
forest composed of k connected components, spanning a graph with n vertices,
has k root vertices, n − k other vertices, and also n − k edges. These edges
can be oriented toward the respective root such that each edge has a unique
starting vertex. Then, integral operators mapping the starting vertices to the
corresponding edge of the spanning forest can be defined as follows: φ0 (vi ) = ej ,
where ej is the edge incident to vi , oriented away from it.

Lemma 4. All integral operators that create homeomorphisms can be represented
in a dual graph pyramid. This is equivalent to: given an input image
(G0, Ḡ0) and its associated cell complex Z = {C0, C1, C2}, a cell complex Z′ =
{C0′, C1′, C2′} with Z, Z′ homeomorphic, and Z a refinement of Z′ i.e. C0′ ⊆ C0,
C1′ ⊆ C1, and C2′ ⊆ C2, then there exists a pyramid P s.t. Z′ is the cell complex
associated to some level (Gk, Ḡk), k ≥ 0, of P.

4 Representative Cocycles in Irregular Graph Pyramids

A method for efficiently computing representative cycles of homology generators
using an irregular graph pyramid is given in [7]. In [8] a novel algorithm for
correctly visualizing graph pyramids, including multiple edges and self-loops is
given. This algorithm preserves the geometry and the topology of the original
image.
In this paper, representative cocycles are computed and drawn in the bound-
ary graph of any level of a given irregular graph pyramid. They are computed
in the top (last) level and down projected using the described process.
For this purpose, a new level, called homology-generator level, is added over
the boundary graph of the last level of the pyramid. The boundary graph in
this new level is a set of regions surrounded by a set of self-loops incident in a
single vertex. To obtain this level, on the top of the computed pyramid [7] we
compute a spanning tree and contract all the edges that belong to it (see Fig. 5).
Note that this last level is no longer homeomorphic to the base level, but
homotopic.

Lemma 5. The boundary cell complex of any level of the pyramid and the one of
the homology-generator level have isomorphic homology and cohomology groups.

For example, in the boundary graph of the homology-generator level (Fig. 5.a, top)
each self-loop α that surrounds a region of
the background (hole of a region R of the
foreground) is a representative 1-cycle of
a homology generator. In the same graph,
the representative 1-cocycle of each coho-
mology generator is defined by exactly two
self-loops. One of them is the self-loop α
representing one homology generator. Let β
be the self-loop surrounding the region R.
Then, {α, β} is a representative 1-cocycle
of a cohomology generator.
Let Ak, k > 0, denote the set of edges that define a cocycle in Ḡk (the boundary
graph in level k). The down projection of Ak to the level Ḡk−1 is the set of
edges Ak−1 ⊆ Ḡk−1 that corresponds to Ak, i.e. represents the same cocycle.
Ak−1 is computed as Ak−1 = A^s_{k−1} ∪ A^r_{k−1}, where A^s_{k−1} denotes
the set of surviving edges in Ḡk−1 that correspond to Ak, and A^r_{k−1} is a
subset of removed edges in Ḡk−1. The following steps show how to obtain A^r_{k−1}:

1. Consider the contraction kernels of Gk−1 (RAG) whose vertices are labeled
with the region for which cocycles are computed. The edges of each contraction
kernel are oriented toward the respective root - each edge has a unique
starting vertex.
2. For each contraction kernel T, from the leaves of T to the root, let e be an
edge of T, v its starting point, and Ev the edges in the boundary of the face
associated to v: label e with the sum of the number of edges that are in both
A^s_{k−1} and in Ev, and the sum of the labels of the edges of T which are
incident to v.
3. A removal edge of Ḡk−1 is in A^r_{k−1} if the corresponding edge of Gk−1
is labeled with an odd number.

The proof of correctness uses the homomorphisms {πp}.

Fig. 5. a) Levels of a pyramid. Edges: removed (thin), contracted (middle) and
surviving (bold). b) Down projected representative 1-cocycle (bold).
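The three labeling steps can be sketched as a single pass over each contraction kernel, processed from the leaves toward the root. The data layout below (triples, dictionaries) is our own illustrative choice, not the paper's data structure:

```python
def down_project_removed_edges(kernels, surviving_cocycle_edges, face_boundary):
    """Sketch of steps 1-3: each contraction kernel is a list of
    (start_vertex, kernel_edge, parent_vertex) triples ordered from the
    leaves toward the root; face_boundary[v] is the edge set Ev of the
    face associated to v. Kernel edges with an odd label select the
    removal edges of A^r_{k-1}. All names are illustrative."""
    removed = set()
    for kernel in kernels:
        incoming = {}                  # vertex -> labels of its child edges
        labels = {}
        for v, e, parent in kernel:    # leaves first, root last
            l = (len(surviving_cocycle_edges & face_boundary[v])
                 + sum(incoming.get(v, [])))
            labels[e] = l
            incoming.setdefault(parent, []).append(l)
        removed |= {e for e, l in labels.items() if l % 2 == 1}
    return removed

# Tiny kernel v1 -e1-> v2 -e2-> root; only v1's face meets the cocycle.
R = down_project_removed_edges(
    kernels=[[("v1", "e1", "v2"), ("v2", "e2", "root")]],
    surviving_cocycle_edges={"x"},
    face_boundary={"v1": {"x"}, "v2": {"y"}})
```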

Note that these graphs were defined from the integral operators associated to
the removed and contracted edges of the boundary graph of level k − 1 to obtain
level k. An example of the down projection is shown in Fig. 5.b.
Let n be the height of the pyramid (number of levels), en the number of edges
in the top level, and v0 the number of vertices in the base level, with n ≈ log v0
(logarithmic height). An upper bound for the computation complexity is: O(v0 n),
to build the pyramid; for each foreground component, O(h) in the number of
holes h, to choose the representative cocycles in the top level; O(en n) to down
project the cocycles (each edge is contracted or removed only once). Normally not
all edges are part of cocycles, so the real complexity of down projecting a cocycle
is below O(en n). The overall computation complexity is: O(v0 n+c(hen n)), where
c is the number of cocycles that are computed and down projected.

5 Cohomology, Image Representation and Processing

Besides simplifying topology, cohomology can be considered in the context of
classification and recognition based on structure. There is no concrete definition
of what ‘good’ features are, but usually they should be stable under certain trans-
formations, robust with respect to noise, easy to compute, and easy to match.
The last two aspects motivate the following considerations: finding associations
between concepts in cohomology and graph theory will open the door for apply-
ing existing efficient algorithms (e.g. shortest path); if cocycles are to be used
as features for structure, the question of a stable class representative has to be
considered i.e. not taking any representative cocycle, but imposing additional
properties s.t. the obtained one is in most of the cases the same. The rest of the
section considers one example: 1-cocycles of 2D objects.
A 1-cocycle of a planar object can be seen as a set of edges that ‘block’
the creation of cycles of one homology class. Assume that the reverse is also
valid i.e. all sets that ‘block’ the creation of cycles of one homology class are
representative 1-cocycles. Then, any set of foreground edges in the boundary
graph Gi , associated to a path in the RAG Gi , connecting a hole of the object
with the (outside) background face, is a representative 1-cocycle. It blocks any
generator that would surround the hole and it can be computed efficiently (proof
follows). If additional constraints are added, like minimal length, the 1-cocycle is
a good candidate for pattern recognition tasks as it is invariant to the scanning
of the cells, the processing order, rotation, etc.
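Under this assumption, a candidate minimal representative 1-cocycle reduces to a shortest path in the RAG between the hole and the outer background face, e.g. via breadth-first search. The sketch below is illustrative; the graph encoding and all names are ours:

```python
from collections import deque

def minimal_cocycle_path(adj, hole_face, outer_face):
    """Breadth-first search for a shortest region path from the hole to
    the outer background face; the boundary-graph edges shared by
    consecutive regions on this path form the candidate 1-cocycle."""
    prev = {hole_face: None}
    queue = deque([hole_face])
    while queue:
        u = queue.popleft()
        if u == outer_face:
            path = []
            while u is not None:      # walk back along BFS predecessors
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in prev:
                prev[v] = u
                queue.append(v)
    return None

# Toy RAG: the hole region touches r1, which touches the outer face.
P = minimal_cocycle_path({"hole": ["r1"],
                          "r1": ["hole", "outer"],
                          "outer": ["r1"]},
                         "hole", "outer")
```

The edge between each pair of consecutive regions on the returned path is then taken as one edge of the cocycle, matching the construction of Proposition 1.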
Let KH be the boundary cell complex associated to the foreground of the
homology-generator level. Suppose that α is a representative cycle i.e. a self-loop
surrounding a face of the background, and β is a self-loop surrounding a face f of
the foreground such that α is in the boundary of f in KH (Fig. 6). Let α∗ denote
the cocycle defined by the set {α, β}. Let K0 denote the boundary cell complex
associated to the foreground in G0 . Let φ be the composition of all integral oper-
ators associated with all the removals and contractions of edges of the foreground
of the boundary graphs of a given irregular graph pyramid. Let π = id + φ∂ + ∂φ
and let ι : KH → K0 be the inclusion map. Consider the down projection [7] of α

Fig. 6. Example cocycle down projection: a) cocycle {α, β} in the top level;
b) down projection a, b of α, β; c) edges ea ∈ a and eb ∈ b; d) cocycle in G0
cohomologous to a)

and β in G0 : the cycles ι(α) = a and ι(β) = b, respectively. Take any edge ea ∈ a
and eb ∈ b. Let fa , fb be faces of K0 having ea respectively eb in their boundary.
Let v0 , v1 , . . . , vn be a simple path of vertices in G0 s.t. all vertices are labeled as
foreground. v0 is the vertex associated to fa , and vn to fb .
Proposition 1. Consider the set of edges c = {e0 , . . . , en+1 } of G0 , where e0 =
ea , en+1 = eb , and ei , i = 1 . . . n, is the common edge of the regions in G0
associated with the vertices vi−1 and vi . c defines a cocycle cohomologous to the
down projection of the cocycle α∗ .
Proof. c is a cocycle iff c∂ is the null homomorphism. First, c∂(fi ) = c(ei +
ei+1) = 1 + 1 = 0. Second, if f is a 2-cell of K0, f ≠ fi, i = 0, . . . , n, then
c∂(f) = 0. To prove that the cocycles c and α∗π (the down projection of α∗ to
the base level of the pyramid) are cohomologous, is equivalent to showing that
cι = α∗. We have that cι(α) = c(ea) = 1 and cι(β) = c(eb) = 1. Finally, cι over
the remaining self-loops of the boundary graph of the homology-generator level
is null. Therefore, cι = α∗ .
Observe that the cocycle c in G0 may correspond to the path connecting two
boundaries and having the minimal number of edges: ‘a minimal representative
cocycle’. As a descriptor for the whole object, take a set of minimal cocycles
having some common property3 .
Lemma 6. Let γ ∗ be a representative 1-cocycle in G0 , whose projection in the
homology-generator level is the cocycle α∗ defined by the two self-loops {α, β}.
γ ∗ has to satisfy that it contains an odd number of edges of any cycle g in G0
that is homologous to ι(α), the down projection of α in G0 .
Proof. g contains an odd number of edges of the set that defines γ∗ iff γ∗(g) = 1.
First, there exists a 2-chain b in K0 such that g = ι(α) + ∂(b). Second, γ∗(g) =
γ∗(ι(α) + ∂(b)) = 1, since γ∗ι(α) = α∗(α) = 1, and γ∗∂(b) = 0 because γ∗ is a
cocycle. So g must contain an odd number of edges of the set that defines γ∗.
Consider the triangulation in Fig. 7, corresponding to a torus4 . Any cycle homol-
ogous to β contains an odd number of edges of β ∗ (e.g. dotted edges in Fig. 7.c).
3 E.g. they all connect the boundaries of holes with the ‘outer’ boundary of the object,
and each of them corresponds to an edge in the inclusion tree of the object.
4 Rectangle where bottom and top, respectively left and right edges are glued together.

Fig. 7. A torus: a) triangulation; b) representative cycles of homology generators; c) a
representative cocycle; d) and e) non-valid representative cocycles

The dotted edges in d) and e) do not form valid representative cocycles: in d),
a cycle homologous to β does not contain any edge of β ∗ ; in e), another cycle
homologous to β contains an even number of edges of β ∗ .

6 Conclusion
This paper considers cohomology in the context of graph pyramids. Representa-
tive cocycles are computed at the reduced top level and down projected to the
base level corresponding to the original image. Connections between cohomol-
ogy and graph theory are proposed, considering the application of cohomology
in the context of classification and recognition. Extension to higher dimensions,
where cohomology has a richer algebraic structure than homology, and complete
cohomology - graph theory associations are proposed for future work.

References
1. Hatcher, A.: Algebraic topology. Cambridge University Press, Cambridge (2002)
2. Wood, Z.J., Hoppe, H., Desbrun, M., Schröder, P.: Removing excess topology from
isosurfaces. ACM Trans. Graph. 23(2), 190–208 (2004)
3. Kropatsch, W.G.: Building irregular pyramids by dual graph contraction. IEE Proc.
Vision, Image and Signal Processing 142(6), 366–374 (1995)
4. Kropatsch, W.G., Haxhimusa, Y., Pizlo, Z., Langs, G.: Vision pyramids that do not
grow too high. Pattern Recognition Letters 26(3), 319–337 (2005)
5. Munkres, J.R.: Elements of Algebraic Topology. Addison-Wesley, Reading (1993)
6. González-Dı́az, R., Jiménez, M.J., Medrano, B., Molina-Abril, H., Real, P.: Integral
operators for computing homology generators at any dimension. In: Ruiz-Shulcloper,
J., Kropatsch, W.G. (eds.) CIARP 2008. LNCS, vol. 5197, pp. 356–363. Springer,
Heidelberg (2008)
7. Peltier, S., Ion, A., Kropatsch, W.G., Damiand, G., Haxhimusa, Y.: Directly com-
puting the generators of image homology using graph pyramids. Image and Vision
Computing (2008) (in press), doi:10.1016/j.imavis.2008.06.009
8. Iglesias-Ham, M., Ion, A., Kropatsch, W.G., Garcı́a, E.B.: Delineating homology
generators in graph pyramids. In: Ruiz-Shulcloper, J., Kropatsch, W.G. (eds.)
CIARP 2008. LNCS, vol. 5197, pp. 576–584. Springer, Heidelberg (2008)
Annotated Contraction Kernels
for Interactive Image Segmentation

Hans Meine

Cognitive Systems Laboratory, University of Hamburg,


Vogt-Kölln-Str. 30, 22527 Hamburg, Germany
meine@informatik.uni-hamburg.de

Abstract. This article shows how the interactive segmentation tool


termed “Active Paintbrush” and a fully automatic region merging can
both be based on the theoretical framework of contraction kernels within
irregular pyramids instead of their own, specialized data structures. We
introduce “continuous pyramids” in which we purposely drop the common
requirement of a fixed reduction factor between successive levels, and
we show how contraction kernels can be annotated for a fast naviga-
tion of such pyramids. Finally, we use these concepts for improving the
integration of the automatic region merging and the interactive tool.

1 Introduction
One of the most valuable and most often employed tools for image segmenta-
tion is the watershed transform, which is based on a solid theory and extracts
object contours even with low contrast. On the other hand, it is often criticized
for delivering a strong oversegmentation, which is simply a consequence of the
fact that the watershed transform has no built-in relevance filtering. Instead, it
is often used as the basis for a hierarchical segmentation setting in which an
initial oversegmentation is successively reduced, i.e. by merging adjacent regions
that are rated similar by some appropriate cost measure (e.g. the difference of
their average intensity) [1,2,3,4]. This bottom-up approach fits very well with
the concept of irregular pyramids [5,6], and the main direction of this work is to
show how the Active Paintbrush – an interactive segmentation tool developed
for medical imaging [2] – and an automatic region merging [2,3,7] can be for-
mulated based on the concepts of irregular pyramids and contraction kernels.
This serves three goals: a) delivering a useful, practical application of contrac-
tion kernels, b) basing the description of segmentation methods on well-known
concepts instead of their own, specialized representation, and c) demonstrating
how a common representation facilitates the development of a more efficient
integration of the above automatic and interactive methods.
The following sections are organized as follows: In section 2, we summarize
previous work on the Active Paintbrush and automatic region merging (2.1) and
on irregular pyramids and contraction kernels (2.2). Section 3 combines these
concepts and introduces the ideas of continuous pyramids and annotated con-
traction kernels (3.1), before proposing methods that exploit this new foundation
for a better integration of automatic and interactive tools (section 3.2).

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 273–282, 2009.
© Springer-Verlag Berlin Heidelberg 2009
274 H. Meine

2 Previous Work
2.1 The Active Paintbrush Tool
The Active Paintbrush was introduced by Maes [2] as an efficient interactive seg-
mentation tool for medical imaging. It is based on an initial oversegmentation
produced using the watershed transform, and a subsequent merging of regions.
The latter is performed in two steps:

1. First, an automatic region merging reduces the oversegmentation by merg-


ing adjacent regions based on some homogeneity measure (in [2], an MDL
criterion is used, but there is a large choice of suitable measures [3,8]).
2. Subsequently, the Active Paintbrush allows the user to “paint” over region
boundaries to quickly determine the set of regions belonging to the object
to be delineated.

Since this is a pure bottom-up approach (i.e. the number of regions monotonically
decreases, and no new boundaries are introduced), this approach relies on all
important boundaries being already present in the initial oversegmentation. The
user steers the amount of merging performed in the first step in order to remove
as many boundaries as possible (to reduce the time spent in the second step)
without losing relevant parts.

Merge Tree Representation. For this work, it is important to highlight the in-
ternal representation built within the first step, in which the automatic region
merging iteratively merges the two regions rated most similar (an equivalent
approach is used in [2,3,7,8]). This process is continued until the whole image is
represented by one big region, and at the same time a hierarchical description
of the image is built up: a tree of merged regions, the leaves of which are the
primitive regions of the initial oversegmentation (illustrated in Fig. 1a). This
tree can also be interpreted as encoding a stack of partitionings, each of which
contains one region less than the one below.

(a) full merge tree (10 regions) (b) pruned tree (7 regions)

Fig. 1. Hierarchical description of image as tree of merged primitive regions [2]

By labeling each merged node with the step in which the merge happened, it
becomes very easy to prune the tree as the user adjusts the amount of merging
interactively: for instance, the partitioning at level l = 4 within the above-
mentioned stack can be retrieved by pruning all branches below nodes with a
label ≤ l (cf. Fig. 1b).
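The labeled-tree lookup can be sketched as follows; note that the tree, labels, and node identifiers below are made up for illustration (they are not the tree of Fig. 1):

```python
# Toy merge tree: leaves 1..4, merged nodes 8..10 labeled with their merge step.
label = {8: 1, 9: 2, 10: 4}                       # merged node -> merge step
parent = {1: 8, 2: 8, 3: 9, 4: 9, 8: 10, 9: 10}   # child -> parent in the tree

def region_at_level(node, l):
    """Region containing `node` at level l: the highest ancestor whose
    creating merge step is <= l (branches below labels > l are pruned)."""
    while node in parent and label[parent[node]] <= l:
        node = parent[node]
    return node

print(region_at_level(1, 2), region_at_level(1, 4))   # -> 8 10
```

At level 2, leaf 1 is represented by node 8 (node 10 was created at step 4 > 2); at level 4, the whole tree has collapsed into node 10.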
Annotated Contraction Kernels for Interactive Image Segmentation 275

Limitations. While this approach already allows for a relatively efficient interac-
tive segmentation, there is one limitation whose removal, described in this article,
increases efficiency considerably: the two above-mentioned steps are strictly sepa-
rated. This is unfortunate, since the automatic method used in the first step in
general produces partitionings that suffer from oversegmentation in some parts,
but has already removed crucial edges elsewhere, e.g. at locations with very low
contrast. Thus, the merge parameter has to be set low enough not to lose the
part with the lowest contrast, and the interactive paintbrush needs to be used
to remove all unwanted edges in all other areas, too. It would be helpful if it
were possible to just make the needed manual changes and then go back to the
automatic method to quickly finish the segmentation.

2.2 Contraction Kernels


The concept of contraction kernels has been introduced in the context of irregu-
lar pyramids [9,10]. Like regular (Burt-style) pyramids, irregular pyramids define
tapering stacks of images represented at increasingly coarser scales. However, ir-
regular pyramids are based on graph-like structures [5,6] to overcome the drawbacks of
regular pyramids imposed by their rigid, regular structure. More recently, com-
binatorial maps have been widely adopted as the basis for representing irregu-
lar tessellations, hence irregular pyramids have been defined as stacks of such
maps [4,8,11].
Contraction kernels are used to encode a reduction of one such graph-like
structure into a simpler one, i.e. the difference between two levels in an irregular
pyramid. In order to give a formal definition, we first need to recall the definitions
of some underlying concepts, starting with combinatorial maps (see Fig. 2):
Definition 1 (combinatorial map). A combinatorial map is a triple (D, σ, α)
where D is a set of darts (half-edges), and σ, α are permutations defined on D
such that α is an involution (all orbits have length 2) and the map is connected,
i.e. there exists a σ-α-path between any two darts.
In order to represent a segmented image, each edge of the boundary graph is
split into two opposite darts, and the permutation α is used to tie these pairs

[Figure: dart set D = {1, −1, 2, −2, . . . , 8, −8}, with tables of the permutations α, σ, and ϕ := σ −1 ◦ α (used for contour traversal) on these darts]

Fig. 2. Example combinatorial map representing the contours of a house



of darts together, i.e. each α-orbit represents an edge of the boundary graph.
The permutation σ then encodes the counter-clockwise order of darts around
a vertex, i.e. each σ-orbit corresponds to a vertex of the boundary graph. By
convention, D ⊂ Z \ {0} such that α can be efficiently encoded as α(d) := −d.
The dual permutation of σ is defined as ϕ = σ ◦ α and thus encodes the order
of darts encountered during a contour traversal of the face to the right, i.e. each
ϕ-orbit represents a face of the tessellation.
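A minimal sketch of these orbit computations (illustrative only, not the paper's GeoMap code): darts are nonzero integers, α(d) = −d ties each edge's dart pair, σ is given as a table, and ϕ = σ ∘ α follows the text's convention. The triangle below is a made-up example.

```python
def orbits(perm, darts):
    """Partition the dart set into the orbits (cycles) of a permutation."""
    seen, result = set(), []
    for d in darts:
        if d in seen:
            continue
        orbit, e = [], d
        while e not in seen:
            seen.add(e)
            orbit.append(e)
            e = perm(e)
        result.append(orbit)
    return result

# A triangle: edges AB, BC, CA split into dart pairs (1,-1), (2,-2), (3,-3).
darts = [1, -1, 2, -2, 3, -3]
sigma_table = {1: -3, -3: 1, 2: -1, -1: 2, 3: -2, -2: 3}

def alpha(d): return -d                 # involution: the two darts of an edge
def sigma(d): return sigma_table[d]     # ccw order of darts around a vertex
def phi(d): return sigma(alpha(d))      # contour of the face to the right

V = len(orbits(sigma, darts))           # sigma-orbits correspond to vertices
E = len(orbits(alpha, darts))           # alpha-orbits correspond to edges
F = len(orbits(phi, darts))             # phi-orbits correspond to faces
print(V, E, F, V - E + F)               # connected planar map: V - E + F = 2
```

The two ϕ-orbits are the triangle's interior face and the outer face, so Euler's formula V − E + F = 2 holds as expected.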
In contrast to earlier representations using simple [5,6] or dual graphs [12],
combinatorial maps explicitly encode the cyclic order of darts around a face,
which makes the computation of the dual graph so efficient that it does not need
to be represented explicitly anymore.
Nevertheless, combinatorial maps also suffer from some limitations, most no-
tably that they rely on “pseudo edges” or “fictive edges” [12,13] to connect oth-
erwise separate boundary components. In topological terms, these are commonly
called “bridges”, since every path between their end nodes must pass through this
edge. These artificial connections have several drawbacks:

– In some situations, one may want to have bridges represent existing image
features, for instance incomplete boundary information or skeleton parts. This
would require algorithms to differentiate between fictive and real bridges.
– If we relate topological edges with their geometrical counterparts, we are
faced with the problem that fictive edges do not correspond to any ge-
ometrical entity. Even topologically, fictive edges “appear arbitrarily
placed” [13].
– They lead to inefficient algorithms; e.g. contour traversals are needed to
determine the number of holes or to find an enclosing parent region.

Because of the above limitations, combinatorial maps are often used in conjunc-
tion with an inclusion relation that replaces the fictive edges [14,15].
Using these topological formalisms, segmentation algorithms can rely on a
sound topology that allows them to work with regions and boundaries as duals
of each other. However, segmentation first and foremost relies on an encoding of
the tessellation’s geometry, which is not represented by the above maps. Thus,
they are typically used side-by-side with a label image or similar.
Therefore, we have introduced the GeoMap [8,16,17] which represents both
topological and geometrical aspects of a segmentation, thus allowing algorithms
to avoid dealing with pixels directly, and ensuring consistency between geometry
and topology. In particular, this makes algorithms independent of the embedding
model and makes it possible to use either inter-pixel boundaries [18], 8-connected pixel
boundaries [16], or sub-pixel precise polygonal boundaries [8,17].

Reduction Operations. In order to build irregular pyramids using any of the


above maps, one needs some kind of reduction operation for building higher
levels from the ones below, analogous to the operations used for regular pyra-
mids. While in Gaussian pyramids, the reduction operation is parametrized by a
Gaussian (smoothing) kernel, Kropatsch [9] has introduced contraction kernels

for irregular pyramids (for brevity, we give the graph-based definition here, which
is less involved than the analogous definition on combinatorial maps [10]):

Definition 2 (contraction kernel). Given a graph G (V, E), a contraction


kernel is a pair (S, N ) of a set of surviving vertices S ⊂ V and a set of non-
surviving edges N ⊂ E such that (V, N ) is a spanning forest of G and S are the
roots of the forest (V, N ).

A contraction kernel is applied to a graph whose vertices represent regions (cf. the
dual map (D, ϕ)) by contracting all edges in N, such that for each tree in
the forest, all vertices connected by the tree are identified and represented
by its root s ∈ S (details on contractions within combinatorial maps may be
found in [11]). In simple words, a contraction kernel is used to specify groups of
adjacent regions within a segmentation that should be merged together.
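A union-find sketch of applying such a kernel (illustrative, not the paper's implementation; multi-edges are collapsed here for brevity, although a real map keeps them, since they carry topological information):

```python
def apply_contraction_kernel(V, E, S, N):
    """Contract every edge of N, where (V, N) is a spanning forest whose
    roots are the surviving vertices S (cf. Definition 2)."""
    parent = {v: v for v in V}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for u, v in N:                          # contract non-surviving edges
        ru, rv = find(u), find(v)
        if ru in S:                         # keep the designated survivor
            parent[rv] = ru                 # as class representative
        else:
            parent[ru] = rv

    survivors = {find(v) for v in V}
    edges = {frozenset((find(u), find(v)))
             for u, v in E - N if find(u) != find(v)}
    return survivors, edges

# Four regions in a cycle; the kernel merges 1 into 2 and 3 into 4:
surv, rem = apply_contraction_kernel(
    V={1, 2, 3, 4}, E={(1, 2), (2, 3), (3, 4), (4, 1)},
    S={2, 4}, N={(1, 2), (3, 4)})
print(surv, rem)   # survivors {2, 4}, one remaining adjacency between them
```

Because every tree of (V, N) contains exactly one root in S, the representative of each merged class is always its surviving vertex.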

3 Contraction Kernels for Efficient Interactive


Segmentation

3.1 Interactive Navigation of Continuous Pyramids

Contraction kernels as described in section 2.2 form a very general description


of a graph decimation, i.e. much more general than previous approaches [5,6],
which had strict requirements on the chosen survivors and contracted edges. For
example, although it may be desirable for some approaches to have a logarithmic
tapering graph pyramid for computational reasons [19], the above definition does
not enforce this at all.

Continuous Pyramids. In fact, we can build “continuous pyramids” in which only


one region is merged in every step, as done by the stepwise optimization used for
the Active Paintbrush preprocessing [2,7]. In our context, the reduction factor
between successive levels can be declared irrelevant:

– In practice, there is no need to represent all levels at the same time; instead,
we will show in the following how to efficiently encode only the bottom layer
and an annotated contraction kernel that allows any level of the whole hierarchy
to be recreated directly from it. Thus, memory is not an issue.
– The whole purpose of introducing irregular pyramids is to preserve fine de-
tails at higher levels, which should let further analysis steps work on single
levels instead of the whole hierarchy at once.
– Given the right merge order, traditional irregular pyramids simply consist
of a subset of the levels of our continuous pyramid, and even for good
cost measures, it is unlikely that the implicit selection of the levels is op-
timal. Therefore, we propose to separate the computation of the pyramid
and the subsequent level selection, and leave the latter up to the analysis
algorithm.
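Building such a continuous pyramid can be sketched as follows. All values are made up, and the cost (difference of mean region intensities) is kept static for brevity, whereas the stepwise optimization used in [2,7] would recompute costs after each merge:

```python
import heapq

def build_annotated_kernel(mean, edges):
    """Merge one region pair per step, cheapest first, labeling each
    contracted edge with a (step, cost) pair."""
    parent = {r: r for r in mean}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]
            r = parent[r]
        return r

    heap = [(abs(mean[u] - mean[v]), u, v) for u, v in edges]
    heapq.heapify(heap)
    annotated, step = {}, 0
    while heap:
        cost, u, v = heapq.heappop(heap)
        ru, rv = find(u), find(v)
        if ru == rv:
            continue                # regions already merged via another edge
        step += 1
        annotated[(u, v)] = (step, cost)
        parent[ru] = rv             # one merge per step -> one level per step
    return annotated

kernel = build_annotated_kernel({1: 10, 2: 12, 3: 30}, [(1, 2), (2, 3), (1, 3)])
print(kernel)   # {(1, 2): (1, 2), (2, 3): (2, 18)}
```

Two merges reduce the three regions to one; the edge (1, 3) is skipped because its endpoints are already identified when it is popped.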


(a) annotated contraction kernel (b) contraction kernel for the fourth level

Fig. 3. Annotated contraction kernels for a continuous pyramid (cf. Fig. 1)

Annotated Contraction Kernels. We have already hinted at what our represen-
tation of this continuous pyramid looks like: we simply represent the pyramid’s
bottom by means of one GeoMap and the series of merges by an annotated
contraction kernel that resembles the merge tree from section 2.1. Then, when
retrieving a given pyramid level l, we take advantage of the concept of equivalent
contraction kernels [9,11], which means that it is possible to combine the effect
of a sequence of contraction kernels (here, merging only two regions each) into
a single, equivalent kernel.
The contraction kernel illustrated in Fig. 3a reduces the bottom layer to a
single surviving region (represented by the leftmost vertex), i.e. it contains a
single spanning tree. The key to its use is the annotation: while the automatic
algorithm used in the preparation step of the Active Paintbrush merged all
regions in order of increasing cost (i.e. increasing dissimilarity), we composed
the corresponding contraction kernels, effectively building the depicted tree, and
labeled each edge with the step in which the corresponding merge happened
(analogous to the node labels used in [2]).
Now when a given level l shall be retrieved (e.g. the user interactively changes
the desired granularity of the segmentation), we do not have to explicitly perform
the sequence of region merges that led from the initial oversegmentation to l, but
we can apply the combined, equivalent contraction kernel at once, which can be
implemented much more efficiently (e.g. partially parallelized). The annotation
allows us to derive this contraction kernel simply by removing all edges with
labels ≥ l. This is illustrated by the dashed cut in Fig. 3b, which shows the
contraction kernel leading to the same segmentation as in the example from
Fig. 1b. The same approach can be used to jump from any level l1 to a level
l2 ≥ l1 , where edges with labels < l1 can be ignored (the reader may imagine a
second cut from below).
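The level retrieval just described can be sketched as follows; edge identifiers and step labels below are hypothetical:

```python
# Annotated kernel: spanning-tree edge -> step in which the merge happened.
annotated = {(1, 2): 1, (3, 4): 2, (2, 3): 3, (5, 6): 4, (4, 5): 5}

def equivalent_kernel(annotated, l2, l1=0):
    """Edges to contract when jumping from level l1 to level l2 >= l1:
    keep labels >= l1 (already applied below l1) and < l2 (pruned above)."""
    return {e for e, step in annotated.items() if l1 <= step < l2}

print(sorted(equivalent_kernel(annotated, 4)))        # steps 1..3 from bottom
print(sorted(equivalent_kernel(annotated, 6, l1=3)))  # steps 3..5 from level 3
```

The combined kernel is applied in a single pass, instead of replaying the merges one by one.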
Often, we are also interested in the values of the merge cost (i.e. dissimilarity)
measure associated with each step; therefore, we do not only label each edge
in our annotated contraction kernel with the step, but with a (step, cost) pair.
This makes an efficient user interface possible that allows an operator to quickly
choose any desired level of segmentation granularity. Some example levels gener-
ated from a CT image of human lungs using the region-intensity- and -size-based
“face homogeneity” cost measure cfh from [3] are depicted in Fig. 4; from left
to right: level 0 with 9020 regions, level 7020 with 2000 regions (cfh ≈ 0.12),

Fig. 4. Example pyramid levels generated by the automatic region merging

level 8646 with 374 regions (cfh = 0.5), and level 9000 with 20 regions left
(cfh ≈ 5.07).
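Since a continuous pyramid merges exactly one region per level, level l of the 9020-region oversegmentation has 9020 − l regions, which reproduces the counts quoted for Fig. 4:

```python
# Region count at level l of a continuous pyramid with 9020 initial regions.
initial_regions = 9020
for level in (0, 7020, 8646, 9000):
    print(level, initial_regions - level)   # 9020, 2000, 374, 20 regions
```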

3.2 Efficient Integration of Manual and Automatic Segmentation

As described in section 2.1, the use of the Active Paintbrush [2] consists of two
steps: after the oversegmentation and the hierarchical representation have been
computed, the user first adjusts the level of automatic merging by choosing an ap-
propriate level from the imaginary stack of tessellations. Afterwards, the operator
uses the Active Paintbrush to “paint over” any remaining undesirable boundaries
within the object of interest, which effectively creates new pyramid levels.
We can now implement the automatic and the interactive reduction methods
based on the same internal, map-based representation and contraction kernels.
This opens up new possibilities with respect to the combination of the tools, i.e.
we can now use one after the other for reducing the oversegmentation and creating
further pyramid levels up to the desired result. This is illustrated in Fig. 5a: the lev-
els of our continuous pyramid are ordered from level 0 (initial oversegmentation)
on the left to level 2834 (the apex, at which the whole image is represented as one
single region) on the right. The current pyramid is the result of applying first the
automatic region merging (ARM), then performing some manual actions with the
Active Paintbrush (APB), then using the ARM again.

[Diagram: pyramid levels from 0 to the apex at 2834; ARM applied over levels 0–1207, APB over 1207–1211, ARM again up to the display/work level 2410; navigational range indicated]
(a) Naive representation of generated pyramid

[Diagram: after reordering, the APB merges occupy levels 0–4, followed by ARM up to the display/work level 2410; apex at 2834; navigational range indicated]
(b) Pyramid after reordering to protect manual changes from disappearing

Fig. 5. Alternating application of automatic and interactive methods



(a) initial oversegmentation (pre-filtered sub-pixel watersheds [20,8])

(b) with high thresholds, low-contrast edges are removed by the automatic method (38 regions left)

(c) the cost threshold is interactively adjusted so that no boundaries are damaged (114 regions remaining)

(d) with a few strokes, single critical regions are finalized and "fixed" by protecting the faces (white, hatched)

(e) now, automatic region merging can be applied again, without putting the protected regions at risk (30 regions left)

(f) with two quick final strokes, three remaining unwanted regions are removed to get this final result (27 regions)

Fig. 6. Example session demonstrating our new face protection concept; the captions
explain the user actions for going from (a) to (f)

However, this architecture poses difficulties when the user is given the freedom
to e.g. change the cost measure employed by the ARM or to navigate to lower
pyramid levels than those generated manually: it is very unintuitive if the results
of one’s manual actions disappear from the working level, or if the pyramid is even
recomputed such that they are lost completely. Again, the solution lies in the con-
cept of equivalent contraction kernels, which make it possible to reorder merges:
we represent the results of applying the Active Paintbrush in separate contraction
kernels such that they always get applied first, see Fig. 5b. (This is equivalent to
labeling the edges within our annotated contraction kernel with zero.) In effect,
this makes it possible to locally finish the segmentation of an object at the de-
sired pyramid level, but to go back to lower pyramid levels when one notices that
important edges are missing in other parts of the image.
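The reordering trick amounts to relabeling manual merges to step 0 in the annotated kernel, so that every retrieved level l > 0 applies them first; all edge identifiers and step values below are hypothetical:

```python
automatic = {(1, 2): 1207, (2, 3): 2410}   # edges merged by the ARM
manual = {(4, 5)}                          # edges merged with the paintbrush

annotated = dict(automatic)
annotated.update({e: 0 for e in manual})   # manual merges always come first

def kernel_for_level(annotated, l):
    return {e for e, step in annotated.items() if step < l}

print(kernel_for_level(annotated, 1))      # only the manual merge is applied
```

Even at the lowest navigable level, the manually merged edge stays contracted, so the user's work cannot disappear from the working level.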
We also add the concept of face protection to improve the workflow in the
opposite direction: often, the Active Paintbrush is used to remove all unwanted
edges within the contours of an object of interest. Then, it should be possible
to navigate to higher pyramid levels without losing it again, so we provide a
means to protect a face, effectively finalizing all of its contours. An example
segmentation session using these tools is illustrated in Fig. 6.
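Face protection can be realized as a simple filter in the automatic merge loop; the predicate and region identifiers below are a hypothetical sketch, not the paper's API:

```python
protected = {7}   # faces finalized by the user

def mergeable(incident_faces):
    """A candidate edge may be contracted only if neither of its two
    incident faces is protected."""
    u, v = incident_faces
    return u not in protected and v not in protected

candidates = [(3, 5), (7, 3), (2, 7), (1, 2)]
print([e for e in candidates if mergeable(e)])   # edges touching face 7 drop out
```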

4 Conclusions
In this paper, we have shown how the theory of contraction kernels within
irregular pyramids can be used as a solid foundation for the formulation of inter-
active segmentation methods. We have introduced annotated contraction kernels
in order to be able to quickly retrieve a contraction kernel suitable for efficiently
computing any desired level directly from the pyramid’s bottom or from any of
the levels in between. Furthermore, we have argued that logarithmic tapering
with a fixed reduction factor is irrelevant for irregular pyramids in contexts like
ours, and we have introduced the term continuous pyramids for the degenerate
case in which each level has only one region less than the one below.
On the other hand, we proposed two extensions around the Active Paintbrush
tool which make it even more effective. First, we have expressed both the auto-
matic region merging and the interactive method as reduction operations within
a common irregular pyramid representation. This allowed us to apply the the-
ory of equivalent contraction kernels in order to separate the representation of
manual actions from automatically generated pyramid levels and thus to enable
the user to go back and forth between segmentation tools. Along these lines,
we have also introduced the concept of face protection which complements the
Active Paintbrush very well in a pyramidal context.

References
1. Najman, L., Schmitt, M.: Geodesic saliency of watershed contours and hierarchical
segmentation. IEEE T-PAMI 18, 1163–1173 (1996)
2. Maes, F.: Segmentation and Registration of Multimodal Images: From Theory,
Implementation and Validation to a Useful Tool in Clinical Practice. Ph.D thesis,
Katholieke Universiteit Leuven, Leuven, Belgium (1998)

3. Haris, K., Efstratiadis, S.N., Maglaveras, N., Katsaggelos, A.K.: Hybrid image
segmentation using watersheds and fast region merging. IEEE Trans. on Image
Processing 7, 1684–1699 (1998)
4. Meine, H.: XPMap-based irregular pyramids for image segmentation. Diploma the-
sis, Dept. of Informatics, Univ. of Hamburg (2003)
5. Meer, P.: Stochastic image pyramids. Comput. Vision Graph. Image Process. 45,
269–294 (1989)
6. Jolion, J.M., Montanvert, A.: The adaptive pyramid: A framework for 2D image
analysis. CVGIP: Image Understanding 55, 339–348 (1992)
7. Beaulieu, J.M., Goldberg, M.: Hierarchy in picture segmentation: A stepwise opti-
mization approach. IEEE T-PAMI 11, 150–163 (1989)
8. Meine, H.: The GeoMap Representation: On Topologically Correct Sub-pixel Image
Analysis. Ph.D thesis, Dept. of Informatics, Univ. of Hamburg (2009) (in press)
9. Kropatsch, W.G.: From equivalent weighting functions to equivalent contraction
kernels. In: Digital Image Processing and Computer Graphics: Applications in Hu-
manities and Natural Sciences, vol. 3346, pp. 310–320. SPIE, San Jose (1998)
10. Brun, L., Kropatsch, W.G.: Contraction kernels and combinatorial maps. Pattern
Recognition Letters 24, 1051–1057 (2003)
11. Brun, L., Kropatsch, W.G.: Introduction to combinatorial pyramids. In: Bertrand,
G., Imiya, A., Klette, R. (eds.) Digital and Image Geometry. LNCS, vol. 2243, pp.
108–127. Springer, Heidelberg (2002)
12. Kropatsch, W.G.: Building irregular pyramids by dual graph contraction. IEE
Proc. Vision, Image and Signal Processing 142, 366–374 (1995)
13. Kropatsch, W.G., Haxhimusa, Y., Lienhardt, P.: Hierarchies relating topology and
geometry. In: Christensen, H.I., Nagel, H.-H. (eds.) Cognitive Vision Systems.
LNCS, vol. 3948, pp. 199–220. Springer, Heidelberg (2006)
14. Brun, L., Domenger, J.P.: A new split and merge algorithm with topological maps
and inter-pixel boundaries. In: The 5th Intl. Conference in Central Europe on
Computer Graphics and Visualization, WSCG 1997 (1997)
15. Köthe, U.: XPMaps and topological segmentation - a unified approach to finite
topologies in the plane. In: Braquelaire, A.J.P., Lachaud, J.O., Vialard, A. (eds.)
DGCI 2002. LNCS, vol. 2301, pp. 22–33. Springer, Heidelberg (2002)
16. Meine, H., Köthe, U.: The GeoMap: A unified representation for topology and
geometry. In: Brun, L., Vento, M. (eds.) GbRPR 2005. LNCS, vol. 3434, pp. 132–
141. Springer, Heidelberg (2005)
17. Meine, H., Köthe, U.: A new sub-pixel map for image analysis. In: Reulke, R.,
Eckardt, U., Flach, B., Knauer, U., Polthier, K. (eds.) IWCIA 2006. LNCS,
vol. 4040, pp. 116–130. Springer, Heidelberg (2006)
18. Braquelaire, J.P., Brun, L.: Image segmentation with topological maps and inter-
pixel representation. J. Vis. Comm. and Image Representation 9, 62–79 (1998)
19. Haxhimusa, Y., Glantz, R., Saib, M., Langs, G., Kropatsch, W.G.: Logarithmic
tapering graph pyramid. In: Van Gool, L. (ed.) DAGM 2002. LNCS, vol. 2449, pp.
117–124. Springer, Heidelberg (2002)
20. Meine, H., Köthe, U.: Image segmentation with the exact watershed transform. In:
Proc. Intl. Conf. Visualization, Imaging, and Image Processing, pp. 400–405 (2005)
3D Topological Map Extraction from Oriented
Boundary Graph

Fabien Baldacci1 , Achille Braquelaire1, and Guillaume Damiand2


1 Université Bordeaux 1, CNRS, LaBRI, UMR 5800, F-33405 Talence Cedex, France
{baldacci,braquelaire}@labri.fr
2 Université de Lyon, CNRS, Université Lyon 1, LIRIS, UMR 5205, F-69622 Villeurbanne Cedex, France
guillaume.damiand@liris.cnrs.fr

Abstract. The extraction of a 3D topological map from an Oriented


Boundary Graph can be needed to refine a 3D Split and Merge seg-
mentation using topological information such as the Euler characteristic
of regions. A presegmentation could thus be efficiently obtained using
a light structuring model, before proceeding to the extraction of more
information on some regions of interest. In this paper, we present the
topological map extraction algorithm, which allows a set of regions to be
reconstructed locally from the Oriented Boundary Graph. A comparison of the
two models is also presented.

Keywords: 3D split and merge, image segmentation, topological


structuring.

1 Introduction
The segmentation process consists in defining a partition of the image into ho-
mogeneous regions. Split and merge methods [HP74] are widely used in seg-
mentation. They alternately split regions and merge adjacent
ones according to some criteria, in order to define a partition of the image. To
be efficient, such a method needs a topological structuring of the partition in order to retrieve
some information such as: the region containing a given voxel, the list of regions
adjacent to a given one, the list of surfaces defining the boundary of a region,
etc [BBDJ08].
Several models have been proposed to represent the partition of an image. A
popular model is the Region Adjacency Graph [Ros74], which is not sufficient
for most of 3D segmentation algorithms due to the lack of information encoded.
A more sophisticated model is the topological map model [Dam08] that uses
combinatorial maps in order to encode the topology of the partition and an
intervoxel matrix [BL70] for the geometry. It encodes all information required
to design split and merge segmentation algorithm including high topological
features allowing to retrieve Euler characteristic and Betti numbers of a region
[DD02]. Since high level topological features are not necessary for basic split and
merge segmentation algorithms, another model has been proposed [BBDJ08].

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 283–292, 2009.
© Springer-Verlag Berlin Heidelberg 2009
284 F. Baldacci, A. Braquelaire, and G. Damiand

Table 1. Construction time comparison between the two models

Image size       Image complexity   Topological map model   OBG model
(nb of voxels)   (nb of regions)    construction time (s)   construction time (s)
256x256x256            34390               35.8                  3.2
324x320x253           103358               69.15                 5.26
512x512x475           279618              277.88                23.15
512x512x475           518253              301.27                24.37
512x512x475          1121911              317.01                27.11

This model uses a multigraph called Oriented Boundary Graph (OBG) to encode
the topology associated with the same geometrical level than in the topological
map model.
This second model is both more efficient (table 1) and less space consuming.
The space consumption difference cannot be exactly computed because the mem-
ory-optimized implementation of the OBG is still under development, but unopti-
mized versions show that it will be at least two to four times less space consuming
depending on the number of regions and surfaces of the partition. The space con-
sumption can be a critical constraint with large images or with algorithms needing
a highly oversegmented partition during the segmentation process.
The OBG model is more efficient than the topological map one; it can be
efficiently parallelized [BD08] and is sufficient for split and merge segmentation
that does not use topological characteristics of regions as criteria. But this miss-
ing information could in some cases be necessary and that is the reason why we
have studied the extraction of the topological map of some regions of interest
from the OBG. It could also be useful for the topological map model to use
a more efficient model for a presegmentation step and extract the topological
map from the simplified partition using an algorithm avoiding to traverse all
the voxels. Thus this work is suitable both for the OBG model in order to have
an on-demand high topological features extraction, and for the topological map
model in order to be efficiently extracted from a presegmented image, for which
the equivalent presegmentation using a topological map is too time or space
consuming.
This paper is organized as follows. In section 2, we describe and briefly compare
the two topological models. Then in section 3 we describe the topological map
extraction algorithm from the OBG. We conclude in section 4.

2 Presentation of the Two Models

Let us present some usual notions about images and intervoxel elements. A voxel
is a point of the discrete space Z³ associated with a value which could be a color
or a gray level. A three-dimensional image is a finite set of voxels. In this work,
combinatorial maps are used to represent voxel sets having the same labeled
value and that are 6-connected. We define a region as a maximal isovalued set of
6-connected voxels.
3D Topological Map Extraction from Oriented Boundary Graph 285

To avoid a special process for the image border voxels, we consider an infinite
region R0 that surrounds the image. If a region Rj is completely surrounded by
a region Ri, we say that Rj is included in Ri.

2.1 Recalls on 3D Topological Maps


A 3D topological map is an extension of a combinatorial map used to represent a
3D image partition. Let us recall the notions of combinatorial maps, 3D images,
intervoxel elements and topological maps that are used in this work.
A combinatorial map is a mathematical model describing the subdivision of a
space, based on planar maps. A combinatorial map encodes all the cells of the
subdivision and all the incidence and adjacency relations between the different
cells, and thus describes the topology of this space.
The basic elements used in the definition of combinatorial maps are
called darts, and adjacency relations are defined on darts. We call βi the relation
between two darts that describes an adjacency between two i-dimensional
cells (see Fig. 1B for an example of a combinatorial map and [Lie91] for more
details on maps and a comparison with other combinatorial models). Intuitively,
with this model, the notion of cell is represented by a set of darts linked by specific
βi relations. For example, a face incident to a dart d is represented by the
set of darts accessible using any combination of the β1 and β3 relations. Moreover,
given a dart d which belongs to an i-cell c, we can find the i-cell adjacent to c
along the (i − 1)-cell which contains d by using βi(d). For example, given a dart
d that belongs to a face f and a volume v, the volume adjacent to v along f is
the 3-cell containing β3(d). Lastly, we call i-sew the operation which puts two
darts in relation by βi.
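As an illustration, the dart and βi machinery can be sketched as follows (this is our own minimal Python sketch, not the authors' implementation; all class and method names are ours):

```python
# Minimal sketch of a dart-based combinatorial map with beta_i relations
# and the i-sew operation described in the text.
class CombinatorialMap:
    def __init__(self):
        # beta[i][dart] gives the dart in relation beta_i with `dart`
        self.beta = {1: {}, 2: {}, 3: {}}
        self.darts = set()

    def create_dart(self):
        d = len(self.darts)
        self.darts.add(d)
        return d

    def sew(self, i, d1, d2):
        """i-sew: put d1 and d2 in relation by beta_i."""
        self.beta[i][d1] = d2
        if i >= 2:            # beta_2 and beta_3 are involutions
            self.beta[i][d2] = d1

    def orbit(self, d, indices):
        """Darts reachable from d using any combination of the given betas;
        e.g. indices=(1, 3) collects the darts of the face containing d."""
        seen, stack = {d}, [d]
        while stack:
            cur = stack.pop()
            for i in indices:
                nxt = self.beta[i].get(cur)
                if nxt is not None and nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen
```

With three darts 1-sewn into a cycle, `orbit(d, (1,))` returns the whole cycle, mirroring how a cell is represented as a set of darts linked by βi relations.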
In the intervoxel framework [KKM90], an image is considered as a subdivision
of a 3-dimensional space into a set of unit elements: voxels are the unit cubes,
surfels the unit squares between two voxels, linels the unit segments between
surfels, and pointels the points between linels (see the example in Fig. 1C).
The topological map is a data structure used to represent the subdivision of
an image into regions. It is composed of three parts:
– a minimal combinatorial map representing the topology of the image;
– an intervoxel matrix used to retrieve the geometrical information associated
with the combinatorial map. The intervoxel matrix is called the embedding of the
combinatorial map;
– an inclusion tree of regions.
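A hypothetical sketch of this three-part structure (all names are ours, introduced only for illustration; the paper does not prescribe a concrete layout):

```python
# Sketch of the three components of a topological map: the minimal
# combinatorial map, its intervoxel embedding, and the inclusion tree.
from dataclasses import dataclass, field

@dataclass
class RegionNode:                      # node of the inclusion tree
    label: int
    children: list = field(default_factory=list)   # regions included in it

@dataclass
class TopologicalMap:
    cmap: object                       # minimal combinatorial map (topology)
    intervoxel_matrix: object          # embedding: surfels, linels, pointels
    inclusion_root: RegionNode         # tree rooted at the infinite region R0
```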
Fig. 1 presents an example of a topological map. The 3D image, composed of three
regions plus the infinite region R0 (Fig. 1A), is represented by the topological
map, which is divided into three parts labeled B, C and D. The minimal combinatorial
map extracted from this image is shown in Fig. 1B. The embedding of the
map is represented in Fig. 1C, and the inclusion tree of regions in Fig. 1D.
The combinatorial map allows the representation of all the incidence and
adjacency relations between the cells of an object. In the topological map framework,
we use the combinatorial map as a topological representation of the partition
286 F. Baldacci, A. Braquelaire, and G. Damiand

[Fig. 1 graphics: panels A–D showing the 3D image with regions R0, R1, R2, R3, the minimal combinatorial map, the intervoxel embedding and the inclusion tree]
Fig. 1. The different parts of the topological map used to represent an image. (A) 3D
image. (B) Minimal combinatorial map. (C) Intervoxel matrix (embedding). (D) Inclu-
sion tree of regions.

of an image into regions. Each face of the topological map separates two
adjacent regions, and two adjacent faces do not separate the same two regions.
These rules ensure the minimality (in number of cells) of the topological
map (see [Dam08] for more details on topological maps).
The intervoxel matrix is the embedding of the combinatorial map. Each cell
of the map is associated with the intervoxel elements representing the geometrical
information of that cell.
The inclusion tree of regions represents the inclusion relations. Each region of
the topological map is associated with a node in the inclusion tree. The nodes are
linked together by the inclusion relation previously defined.

2.2 Oriented Boundary Graph Model

The second model is composed of a multigraph called the Oriented Boundary Graph
[BBDJ08]. Each node of the graph corresponds to a region of the partition. Each
surface of the segmented image corresponds to an oriented edge of the graph.
Surfaces and edges are linked by associating an oriented surfel with each edge. Each
edge is also linked to a representative linel for each border of its corresponding
surface; this is necessary to retrieve the surface adjacency relation (which is used
to compute the inclusion relation). The geometrical position of a region relative to
its boundary surfaces is retrieved according to the orientation of the embedding
surfel and the position of the node (beginning or end of the oriented edge). This
graph is sufficient to encode the multiple region adjacency relation and the surface
adjacency relation, and thus the inclusion relation can be computed efficiently.
The geometrical level, encoded by an intervoxel matrix, is the same as the one
used with the topological map, and links are defined in order to go from the
geometrical level to the topological one and reciprocally. An example is shown
in Fig. 2.
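The OBG structure described above might be sketched as follows (our own naming and field choices, not the authors' data structure; tuples stand in for intervoxel coordinates):

```python
# Sketch of an Oriented Boundary Graph: nodes are regions, oriented
# multigraph edges are boundary surfaces; each edge carries one oriented
# surfel and one representative linel per border of its surface.
from dataclasses import dataclass, field

@dataclass
class OBGEdge:
    tail: int                 # region at the origin of the oriented edge
    head: int                 # region at the end of the oriented edge
    surfel: tuple             # one oriented surfel embedding the surface
    border_linels: list = field(default_factory=list)  # one linel per border

@dataclass
class OBG:
    regions: set = field(default_factory=set)          # node labels
    edges: list = field(default_factory=list)          # multigraph edges
```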
This model, contrary to the previous one, can be directly extracted from
the description of the image partition without any treatment of the resulting
surfaces. It only needs a strong labelling of the voxels and linels, which can be locally
computed (by looking at the neighbors of the considered element). For each
new label, the corresponding topological element has to be created, and the

Fig. 2. Example of image partition with the corresponding representation using the
Oriented Boundary Graph model

topological and geometrical elements have to be linked together. This model
requires less processing than the topological map to be maintained, and is thus
more efficient, since it avoids having to keep each surface homeomorphic to a topological
disc. Furthermore, the split algorithm requires only local treatment and can be
efficiently computed in parallel [BD08]. The information encoded by the OBG model
is sufficient to design basic split and merge segmentation algorithms. But it
could be necessary for some segmentation algorithms to use high-level topological
features, such as the Euler characteristic of some regions, either as a segmentation
criterion or as a constraint on regions in a merge step. To design such
segmentation algorithms efficiently, it is necessary to build the topological map of a
set of selected regions.

2.3 Comparison of Both Models

Let us recall the advantages and drawbacks of each model. The Topological Map
model encodes the whole topology of the partition, from the region and surface
adjacencies to the Euler characteristic and Betti numbers of regions. Computing
this map consumes a significant amount of memory and requires a time-consuming
extraction algorithm. The OBG is an enhanced multiple adjacency graph with
an associated intervoxel matrix. It is intended to be simpler than the topological
map, but also less expressive. It has an efficient extraction algorithm and uses little
memory. But some high-level topological features, such as the characteristics of
regions, are not encoded. Given a description of the image partition as a matrix
of labels, the OBG extraction algorithm has a O(v + s + l) complexity, with v
the number of voxels, s the number of surfels and l the number of linels, because
each element is processed once and each operation takes O(1). The topological

map extraction algorithm has the same theoretical complexity as the OBG
one, O(v + s + l), for the same reason: each cell of the intervoxel subdivision is
processed exactly once. However, the number of operations performed for each cell
is larger, which explains the difference in extraction times.
Advantages and drawbacks of each model can be summarized as follows:

– OBG: enhanced multiple region adjacency graph with an intervoxel matrix
embedding
• advantages: simple, efficient extraction algorithm, low memory space
consumption
• drawbacks: does not represent topological characteristics of regions
– Topological map: combinatorial map describing the subdivision of an image
into sets of vertices, edges, faces and volumes
• advantages: represents topological characteristics of regions, represents all
the cells (vertices, edges, faces and volumes)
• drawbacks: high memory space and time consumption required by the
extraction algorithm

Converging to an optimal model that has the efficiency of the OBG and the
expressiveness of the Topological Map is not possible. That is the reason why we need
a conversion algorithm that allows us to take advantage of both models, by not
using the same model during the whole segmentation process.

3 The Conversion Algorithm

The principle of the algorithm is to start with an OBG G embedded in an
intervoxel matrix I and a set of connected regions S, and to compute the local
3D topological map M representing S, while taking into account the neighboring
regions not included in S.
The extraction of the topological map is achieved by building the map of
each surface in S and linking these maps together using the corresponding βi in order to
represent S. Surfaces and real edges already exist in the OBG; thus only the fictive
edges have to be computed, and the βi relations need to be fixed, in order to obtain
the 3D topological map corresponding to S.
The algorithm is divided into two subtasks: the main task reconstructs the
map corresponding to a set of regions. This task uses a second task which
reconstructs one face of a region.

3.1 Region Reconstruction

Algorithm 1 is the main part which reconstructs the part of topological map
representing a given set of regions.
To reconstruct a given region R, we run through edges of the OBG. Indeed,
each edge corresponds to a surface of R. Now, two cases have to be considered
depending if the surface is closed or not.
3D Topological Map Extraction from Oriented Boundary Graph 289

Algorithm 1. Region set reconstruction


Input: G an OBG
       S a set of connected regions
Output: The topological map representing S
foreach edge e adjacent to a region of S do
    if ∃ at least one linel associated to e then
        l ← first linel of e;
        b ← build the face associated to l;
        foreach linel l′ associated to e (except l) do
            b′ ← build the face associated to l′;
            insert a fictive edge between b and b′;
    else b ← NULL;
    if g(e) ≠ 0 then
        create 2 × g(e) edges, loop-sewed on themselves;
        insert them around b;

1. if the surface is closed, there is no linel associated to the surface in the
OBG (because linels represent the borders of surfaces). In such a case, we need
to construct a map composed of 2 × g(e) edges, 1 vertex and 1 face (with
g(e) the genus of the current surface).
2. if the surface is open, let us denote by k the number of boundaries, each one
being a list of consecutive linels forming a loop. Each boundary is reconstructed
using Algorithm 2. Moreover, each new face is linked with the previous map
by adding a fictive edge. This ensures that we obtain a connected map. Then, we
may need to add some edges in order to obtain a surface with the "correct
topology". To do so, as in the previous case, we construct a map
composed of 2 × g(e) edges, 1 vertex and 1 face, but here we link this
new map with the map already built corresponding to the boundaries, in
order both to obtain a connected map and to have the correct topology.
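The two cases above can be sketched as a loose Python transcription of Algorithm 1 (the helpers `build_face`, `insert_fictive_edge` and `add_genus_edges` are assumed placeholders of our own, not given by the paper):

```python
# Sketch of Algorithm 1: rebuild the map for the region set S from the OBG.
# genus(e) is the precomputed genus g(e) of the surface of edge e.
def reconstruct_region_set(obg, S, genus, build_face,
                           insert_fictive_edge, add_genus_edges):
    for e in obg.edges_adjacent_to(S):
        if e.border_linels:                 # open surface: k >= 1 borders
            first, *rest = e.border_linels
            b = build_face(first)
            for l in rest:                  # one face per further border,
                b2 = build_face(l)          # linked by a fictive edge so
                insert_fictive_edge(b, b2)  # the map stays connected
        else:                               # closed surface: no border linel
            b = None
        if genus(e) != 0:                   # 2*g(e) loop edges restore the
            add_genus_edges(b, 2 * genus(e))  # correct topology
```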

Adding fictive edges to the existing edges of the OBG allows us to retrieve the two
properties of the topological map that are missing in the OBG. Indeed, fictive
edges (i) link the different boundaries of a surface and (ii) preserve a valid Euler
characteristic for each surface.
Before applying Algorithm 1, we need to compute the Euler characteristic of
each surface, since this information is not present in the OBG but is needed
during the map reconstruction. To do so, we compute for each edge e of G the
values #v, #e and #f (respectively the number of vertices, edges and faces of the surface
associated to e).
The Euler characteristic of the face associated to e is denoted by χ(e) = #v −
#e + #f. The genus associated to this surface is denoted by g(e) and computed
with the Euler formula: g(e) = (2 − χ(e))/2. The Euler characteristic of the
surface of a region r is the sum of the Euler characteristics of all its faces (the fact
that vertices and edges incident to two faces are counted twice is not a problem for
the Euler characteristic, since it uses the difference between these two numbers).
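The computation of χ(e) and g(e) can be illustrated directly (a trivial sketch; e.g. the minimal torus subdivision with 1 vertex, 2 edges and 1 face gives χ = 0 and genus 1):

```python
# Euler characteristic and genus of a closed orientable surface from
# its cell counts, as precomputed before applying Algorithm 1.
def euler_characteristic(n_vertices, n_edges, n_faces):
    return n_vertices - n_edges + n_faces

def genus(n_vertices, n_edges, n_faces):
    # Euler formula for a closed orientable surface: chi = 2 - 2g
    return (2 - euler_characteristic(n_vertices, n_edges, n_faces)) // 2
```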

3.2 Face Reconstruction

The principle of the face reconstruction given in Algorithm 2 is to traverse the
geometry (the linels) and to reconstruct the darts and β1 relations. Each created
dart is linked with the associated triplet in the OBG. For each linel, if some
neighboring triplets have already been treated, β2 and β3 are updated.

Algorithm 2. Face border reconstruction


Input: G an OBG
       l1 a linel belonging to the border of a face
Output: The part of the map corresponding to this border
dprev ← nil; dfirst ← nil;
foreach linel l of the face border incident to l1 do
    if l is incident to a pointel p then
        Compute the surfel s from p, l and the current region;
        d ← new dart associated to the triplet (p, l, s);
        if dfirst = nil then dfirst ← d;
        else 1-sew(dprev, d);
        d′ ← dart associated to the triplet (p, lprev, s2);
        if d′ ≠ nil then 2-sew(dprev, d′);
        d′ ← dart associated to the triplet (p, lprev, s3);
        if d′ ≠ nil then 3-sew(dprev, d′);
        dprev ← d;
1-sew(dprev, dfirst);
d′ ← dart associated to the triplet (p, lprev, s2);
if d′ ≠ nil then 2-sew(dprev, d′);
d′ ← dart associated to the triplet (p, lprev, s3);
if d′ ≠ nil then 3-sew(dprev, d′);
return dfirst;

During this computation, newly created darts are associated with their triplets
(which have to be oriented) in order to retrieve, when a new dart is created, the
darts incident to the same triplet, and thus to update the β2 and β3 links.
This algorithm is local: when processing a dart d associated to a triplet (p, l, s),
we search for darts already existing in the neighborhood of d and sew the found
darts with d. In Fig. 3, we explain how the triplets (p, lprev, s2) and (p, lprev, s3) are
computed from (p, l, s).
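The sewing pattern of Algorithm 2 can be sketched as follows (a simplified illustration with assumed callback names; unlike the paper's pseudocode, the β2/β3 candidate triplets are passed in precomputed):

```python
# Sketch of face-border reconstruction: walk the ordered (pointel, linel,
# surfel) triplets of one border, create a dart per triplet, 1-sew
# consecutive darts, and 2-/3-sew against already-created neighbour darts.
def reconstruct_border(border_triplets, dart_of_triplet, new_dart, sew):
    """border_triplets: ordered (triplet, neighbour_list) pairs, where
    neighbour_list holds (i, triplet) candidates for beta_i sewing."""
    d_prev = d_first = None
    for triplet, neighbours in border_triplets:
        d = new_dart(triplet)
        if d_first is None:
            d_first = d
        else:
            sew(1, d_prev, d)          # link consecutive darts of the border
        for i, trip in neighbours:     # beta_2 / beta_3 against existing darts
            d2 = dart_of_triplet(trip)
            if d2 is not None:
                sew(i, d, d2)
        d_prev = d
    sew(1, d_prev, d_first)            # close the border cycle
    return d_first
```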

3.3 Complexity and Proof of Correctness of the Algorithm

Complexity. Algorithm 1 runs in time O(nl + g + ns), with nl the number of linels of
the reconstructed regions, g their genus, and ns the number of surfels of the reconstructed
regions. Indeed, Algorithm 2 passes through all the linels of the processed border.
Each operation is atomic, and finding the triplets (p, lprev, s2) and (p, lprev, s3) can
be achieved in at most 4 operations (since there are at most 4 surfels around
a linel). In Algorithm 1, we process successively and exactly once each border

[Fig. 3 graphics: pointels p, pprev; linels l, lprev, l′; surfels s, s2, s3; normals n, n′]

Fig. 3. How to compute the triplets (p, lprev, s2) and (p, lprev, s3). (p, l, s) is the triplet
associated to the current dart d, and pprev is the pointel incident to dart dprev. We want
to sew dprev by β2 and β3 if the corresponding darts are already created. s3 is the first
surfel found from s by turning around linel l in the direction of n⃗ (the normal of s,
oriented towards the current region r). lprev is the linel incident to p and s3 (i.e. the
previous linel of the current border). s2 is the first surfel found from s3 by turning
around linel lprev in the opposite direction of the normal of s3 (indeed, the normal of
s3 is oriented towards the region adjacent to r, thus its opposite is oriented towards r).

of the reconstructed region. This gives the first part of the complexity, O(nl).
The second part is due to the addition of 2 × g edges, which is done in time
linear in g. The last part corresponds to the computation of g, which
requires each surfel to be considered once, leading to a complexity in O(ns).

Proof of correctness. Firstly, Algorithm 1 builds a combinatorial map where each
dart is sewn by β1 and β2. For β1, this is directly due to Algorithm 2, which follows
one cycle of closed linels. At the end of this step, we have created a closed list of
darts which are β1-sewn. Moreover, since in Algorithm 1 we process each face of the
reconstructed region, and since the border of a region is closed, we are sure that,
given a face f, we process all the faces adjacent to f, and thus each dart is β2-sewn.
Secondly, we need to prove that the Euler number of the reconstructed region
is correct. We denote by g the genus of the region. Algorithm 2 computes only
faces which are homeomorphic to topological disks (each face has one closed
boundary). Thus, if we do not add fictive edges, we obtain a sphere, with
χ = #v − #e + #f = 2. To this surface, we add 2 × g edges (more precisely,
we add the sum of 2 × g(e) over each edge e of the model, and this sum is equal
to 2 × g as explained above), without modifying the number of vertices nor the
number of faces. Thus, the new Euler characteristic is χ′ = #v − (#e + 2g) + #f,
and so χ′ = χ − 2g = 2 − 2g: we obtain the correct Euler characteristic of a
surface of genus g.
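This argument can be checked numerically (a trivial sketch: starting from the cell counts of a sphere, e.g. those of a cube subdivision, adding 2g edges yields χ′ = 2 − 2g):

```python
# Euler characteristic after adding 2g edges to a sphere subdivision
# with n_v vertices, n_e edges and n_f faces (so n_v - n_e + n_f = 2).
def chi_after_genus_edges(n_v, n_e, n_f, g):
    return n_v - (n_e + 2 * g) + n_f
```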

4 Conclusion

Split and merge segmentation in the 3D case can be a highly time-consuming
method without the use of a topological structuring. But a structuring that is optimal
both in terms of time and space consumption and in terms of topological feature
representation cannot be achieved. That is the reason why two models have
been developed: the Topological Map, which represents the whole topology of
an image partition, and the OBG model, which is more efficient in time
and space consumption.
In this article we have presented an algorithm that extracts the
Topological Map from the OBG. This operation allows an on-demand
extraction of the Topological Map for some regions of the OBG, which makes it possible
to retrieve locally all the topological features of some regions of interest in the
image partition.
The other use of this algorithm is to extract the Topological Map of
the whole image partition, but only at the step of the segmentation process where
it is needed. The presegmentation can then be done using the OBG, in order to
reduce time consumption or to avoid running out of memory.
In future work, we want to study the possibility of modifying the reconstructed
topological map, for example by an algorithm which takes a topological
criterion into account, and then of updating the OBG model locally to reflect the image
modifications.

References
[BBDJ08] Baldacci, F., Braquelaire, A., Desbarats, P., Domenger, J.P.: 3d image topo-
logical structuring with an oriented boundary graph for split and merge seg-
mentation. In: Coeurjolly, D., Sivignon, I., Tougne, L., Dupont, F. (eds.)
DGCI 2008. LNCS, vol. 4992, pp. 541–552. Springer, Heidelberg (2008)
[BD08] Baldacci, F., Desbarats, P.: Parallel 3d split and merge segmentation with
oriented boundary graph. In: Proceedings of The 16th International Con-
ference in Central Europe on Computer Graphics, Visualization and Com-
puter Vision 2008, pp. 167–173 (2008)
[BL70] Brice, C.R., Fennema, C.L.: Scene analysis using regions. Artif. Intell. 1(3),
205–226 (1970)
[Dam08] Damiand, G.: Topological model for 3d image representation: Definition
and incremental extraction algorithm. Computer Vision and Image Under-
standing 109(3), 260–289 (2008)
[DD02] Desbarats, P., Domenger, J.-P.: Retrieving and using topological character-
istics from 3D discrete images. In: Proceedings of the 7th Computer Vision
Winter Workshop, pp. 130–139, PRIP-TR-72 (2002)
[HP74] Horowitz, S.L., Pavlidis, T.: Picture segmentation by a directed split and
merge procedure. In: ICPR 1974, pp. 424–433 (1974)
[KKM90] Khalimsky, E., Kopperman, R., Meyer, P.R.: Boundaries in digital planes.
Journal of Applied Mathematics and Stochastic Analysis 3(1), 27–55 (1990)
[Lie91] Lienhardt, P.: Topological models for boundary representation: a compar-
ison with n-dimensional generalized maps. Computer-Aided Design 23(1)
(1991)
[Ros74] Rosenfeld, A.: Adjacency in digital pictures. Information and Control, vol. 26 (1974)
An Irregular Pyramid for Multi-scale Analysis
of Objects and Their Parts

Martin Drauschke

Department of Photogrammetry, Institute of Geodesy and Geoinformation


University of Bonn, Nussallee 15, 53115 Bonn, Germany
martin.drauschke@uni-bonn.de

Abstract. We present an irregular image pyramid which is derived from


multi-scale analysis of segmented watershed regions. Our framework is
based on the development of regions in the Gaussian scale-space, which
is represented by a region hierarchy graph. Using this structure, we are
able to determine geometrically precise borders of our segmented regions
using a region focusing. In order to handle the complexity, we select
only stable regions and regions resulting from a merging event, which
enables us to keep the hierarchical structure of the regions. Using this
framework, we are able to detect objects of various scales in an image.
Finally, the hierarchical structure is used for describing these detected
regions as aggregations of their parts. We investigate the usefulness of
the regions for interpreting images showing building facades with parts
like windows, balconies or entrances.

1 Introduction
The interpretation of images showing objects with a complex structure is a
difficult task, especially if the object's components may repeat or vary a lot in
their appearance. As far as human perception is understood today, objects are
often recognized by analyzing their compositional structure, cf. [9]. Besides spatial
relations between object parts, the hierarchical structure of the components
is often helpful for recognizing an object or its parts. E. g. in aerial images of
buildings with a resolution of 10 cm per pixel, it is easier to classify dark image
parts as windows in the roof if the building as a whole has been recognized before.
Buildings are objects with parts of various scales. Depending on the view
point, terrestrial or aerial, the largest visible building parts are its facade or its
roof. Mid-scale entities are balconies, dormers or the building's entrance; and
small-scale parts are e. g. windows, and window panes as window parts. We
restrict our focus to such parts; a further division down to the level of bricks or
tiles is not of interest here.
Recently, many compositional models have been proposed for the recognition
of natural and technical objects. E. g. in [6] a part-based recognition framework

This work has been done within the project Ontological scales for automated detec-
tion, efficient processing and fast visualization of landscape models which is funded
by the German Research Council (DFG).

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 293–303, 2009.

© Springer-Verlag Berlin Heidelberg 2009
294 M. Drauschke

is proposed, where the image fragments are put in a hierarchical order
to infer the category of the whole object after its parts have been classified. So
far, this approach has only been used for finding the category of an object, but
it does not analyze the parts individually. This approach has been evaluated
on blurred, downsampled building images, cf. [13]. Without resizing the image,
the algorithm seems to work inefficiently, or might even fail on homogeneous
facades or on repetitive patterns like bricks, because the fragments cannot
be grouped together easily. Thus, the approach is not easily applicable to the
domain of buildings.
Working on hyperspectral images, a hierarchical segmentation scheme for
geospatial objects such as buildings has recently been proposed, using morphological
operations, cf. [1]. Due to the low resolution of the images, the hierarchy
can only be used for detecting the object at the largest scale, but not its parts
separately.
We work on segmented image regions at different scales, where we derive a
region hierarchy from the analysis of the regions. So far, it is purely data-driven,
so that the general approach can be used in many domains. A short literature
review on multi-scale image analysis is given in sec. 2. Then, we present our own
multi-scale approach in sec. 3. For complexity reasons, we need to select regions
from the pyramid for further processes. We document this procedure in sec. 4.
The validation of our graphical representation is demonstrated in an experiment
on building images in sec. 5. Concluding, we summarize our contribution in
sec. 6.

2 Multi-scale Image Analysis

Although the segmentation of images can be discussed in a very general way, we
have in mind the segmentation of images showing man-made scenes. These
images usually show objects of various scales. With respect to the building domain,
windows, balconies or facades can be such objects. For detecting them,
the image must be analyzed at several scales. The two most convenient frameworks
for multi-scale region detection are (a) segmentation in scale-space and
(b) irregular pyramids. Regarding scale-space techniques, the behaviour of segmentation
schemes has been studied, and the watershed segmentation is often
favored, even in different domains, cf. e. g. [16], [8], [10] and [3].
We also evaluated the usability of watersheds for segmenting images of buildings.
Thereby, our focus was on the possibility of segmenting objects of different scales.
In Gaussian scale-space, the smoothing with the circular filter leads to rounded
edges and region borders at higher scales. We obtain similar results when using
the morphological scale-space as proposed in [12]. Again, the shape of the
structural element emerges disturbingly at the higher scales. In the anisotropic
diffusion scheme, cf. [17], the region borders of highest contrast are preserved
longest, and therefore it cannot be used for modeling aggregates of building
parts, where the strongest gradients appear at the border between e. g. bright
window frames and dark window panes.
An Irregular Pyramid for Multi-scale Analysis of Objects and Their Parts 295

Pyramids are a commonly used representation for scale-space structures,
cf. [14]. When working on the regular grid of image blocks, e. g. on pixel-level,
the use of a regular pyramid is supported by many advantages, e. g. memory
access, adjacencies of blocks, etc. In contrast to the regular grid, the number
of entities rapidly decreases when working on segmented image regions, which
also decreases the complexity of many further algorithms. Furthermore, the
representation of objects by (aggregated) regions is more precise regarding the shape
of the object's boundary than using rectangular blocks.
In the last years, different pyramid frameworks have been proposed. With respect
to image segmentation, we would like to point out the stochastic pyramids,
cf. [15], and irregular pyramids as used in [10]. In both approaches, a hierarchy of
image regions is obtained by grouping them according to certain conditions, e. g.
a homogeneity measure. With respect to buildings, we often have the problem of
finding such conditions, because we want to merge regions of similar appearance
on one hand and regions rich in contrast on the other hand. Thus we decided to
work on watershed regions in scale-space, and to use this scale-space structure
to derive a region hierarchy that forms an irregular pyramid.

3 Construction of the Irregular Pyramid

In this section, we present our multi-scale segmentation framework and the
construction of our region hierarchy graph (RHG). To obtain more precise region
boundaries, we apply an adaptation of the approach of [8].

3.1 Multi-scale Image Segmentation

Many different segmentation algorithms have been proposed since the age of digital
imagery started. We decided to derive our segmentation from the watershed
boundaries on the image's gradient magnitude. Considering the segmentation of
man-made objects, we mostly find strong color edges between different surfaces,
and so the borders of the watershed regions are often (nearly) identical with the
borders of the objects.
Our approach uses the Gaussian scale-space for obtaining regions at multiple
scales. We arranged the discrete scale-space layers logarithmically between σ = 1
and σ = 16, with 10 layers in each octave, obtaining 41 layers. For each scale
σ, we convolve each image channel with a Gaussian filter and obtain a three-dimensional
image space for each channel. Then we compute the combined gradient
magnitude of the color image. Since the watershed algorithm is inclined
to produce oversegmentation, we suppress many gradient minima by resetting
the gradient value at positions where the gradient is below the median of the
gradient magnitude. In this way, the minima that are mostly caused by noise
are removed. The mathematical notation of this procedure is described in more detail
in [5]. As a result of the watershed algorithm, we obtain a complete partitioning
of the image, where every image pixel belongs to exactly one region.
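The pipeline described above can be sketched with SciPy and scikit-image as stand-ins for the authors' implementation (the library choice and function names are our assumption; the paper does not specify an implementation):

```python
# Sketch of the multi-scale watershed segmentation: per-channel Gaussian
# smoothing, combined gradient magnitude, suppression of minima below
# the median, then a watershed transform.
import numpy as np
from scipy import ndimage
from skimage.segmentation import watershed

# 41 scales, logarithmically spaced with 10 layers per octave: 1 .. 16
sigmas = 2.0 ** (np.arange(41) / 10.0)

def segment_at_scale(image, sigma):
    # smooth each channel, then combine the channel gradient magnitudes
    smoothed = np.stack([ndimage.gaussian_filter(image[..., c], sigma)
                         for c in range(image.shape[-1])], axis=-1)
    grads = np.stack([np.hypot(*np.gradient(smoothed[..., c]))
                      for c in range(image.shape[-1])], axis=-1)
    grad = np.sqrt((grads ** 2).sum(axis=-1))
    # suppress minima caused by noise: reset values below the median
    med = np.median(grad)
    grad[grad < med] = med
    # complete partition: every pixel belongs to exactly one region
    return watershed(grad)
```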

3.2 Region Hierarchy Graph


The result of the scale-space watershed procedure is a set of regions R_σ^ν, where
ν is the index of the identifying label and σ specifies the scale. The area of a
region |R| is the number of its pixels. Since the scale-space layers are ordered in
a sequence, we denote neighboring scales by their indices, i. e. σ_i and σ_{i+1}. Our
RHG is based on pairwise neighborhoods of scales, and we define two regions
R_{σ_i}^{ν_m} and R_{σ_{i+1}}^{ν_n} of neighboring scales as adjacent in scale if their overlap is
maximized. Therefore, we determine the number of pixel positions which belong
to both regions, |R_{σ_i}^{ν_m} ∩ R_{σ_{i+1}}^{ν_n}|. Concluding, adjacency in scale of two regions of
neighboring scales is defined by the mapping

R_{σ_i}^{ν_m} → R_{σ_{i+1}}^{ν_n} ⇔ |R_{σ_i}^{ν_m} ∩ R_{σ_{i+1}}^{ν_n}| > |R_{σ_i}^{ν_m} ∩ R_{σ_{i+1}}^{ν_k}| ∀ k ≠ n, (1)

which defines an ordered binary relation between regions, and the mapping symbol
→ reflects the development of a region with increasing scale. Observe that no
threshold is necessary.
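The mapping of Equ. 1 can be sketched with NumPy (our illustration; `labels_i` and `labels_i1` are assumed to be the label images of two neighboring scale layers):

```python
# Adjacency in scale: map each region of layer i to the region of layer
# i+1 with which it shares the maximal number of pixel positions.
import numpy as np

def adjacency_in_scale(labels_i, labels_i1):
    mapping = {}
    for nu_m in np.unique(labels_i):
        overlaps = labels_i1[labels_i == nu_m]       # pixels of R^{nu_m}
        values, counts = np.unique(overlaps, return_counts=True)
        mapping[nu_m] = values[np.argmax(counts)]    # maximal overlap
    return mapping
```

No threshold appears here either: every region of layer i is always mapped to some region of layer i+1.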
According to [14], four events can occur with region features in scale-space:
the merging of two or more regions into one, and the creation, the annihilation
or the split of a region. Our RHG reflects only two of these events, the
creation and the merging. A creation event is represented by a region of a
higher layer that is not the target of the mapping relation, and a merge event is
represented if two or more regions are mapped to the same region in the next
layer. Equ. 1 prevents a region from disappearing, because we always find a region
in the next layer. Furthermore, our mapping relation avoids the occurrence
of the split event, because we always look for the (unique) maximum overlap.
Our definition of the region hierarchy leads to a simple RHG, which only consists
of trees, where each node (except in the highest scale) has exactly one
leaving edge.
Note that the relation defined in Equ. 1 is asymmetric. When expressing region
adjacency with decreasing scale, we take the inverted edges of the RHG. Moreover,
the relation is not transitive. Thus, the RHG may contain paths to different

Fig. 1. Segmentation in scale-space and its RHG. Regions from the same scale are
ordered horizontally, and the increasing scales are ordered vertically from bottom to
top. The edges between the nodes describe the development of the regions over scale.
The gray-filled region has been created in the second layer.
An Irregular Pyramid for Multi-scale Analysis of Objects and Their Parts 297

regions, if a scale-space layer has been skipped when constructing the RHG. We
show a scale-space with three layers and the corresponding RHG in fig. 1.

3.3 Region Focusing


The Gaussian smoothing leads to blurred edges at larger scales, and corners
become increasingly rounded. Therefore, we perform an additional region focus-
ing, which is inspired by [18] and [2]. In [2], the existence of an edge is
recognized at a large scale, but its specific geometric appearance is derived by
tracking it down to the lowest available scale.
We improve the geometric precision of our segmented regions by combining
information from the RHG with the initial image partition, i.e., the segmentation
at the lowest scale σ = 1. Taking the forest as a directed graph with arcs from
higher to lower scales, we obtain the focused region at a level below a given
region as the union of all regions reachable from the source region. Reaching the
initial image partition, we obtain regions $R^{\nu_n}_{\sigma_i}$ by merging all respective regions:

$$R^{\nu_n}_{\sigma_i} = \bigcup_k R^{\nu_k}_{\sigma=1} \quad \text{with} \;\; \exists \text{ a path from } R^{\nu_k}_{\sigma=1} \text{ to } R^{\nu_n}_{\sigma_i}. \qquad (2)$$

In fact, our approach is an adaptation of the segmentation approach in [8], where
a similar merging strategy for watershed regions has been proposed: there, the
regions were merged on the basis of their tracked seed points, thus bottom-up,
whereas our approach is top-down. The procedure in [8] is not suitable for our
segmentation scheme, because we have suppressed all minima in the gradient
image which are below its median, so we might have to analyze the development
of a huge number of seed points for a single region. Furthermore, our approach
of looking for the maximum overlap is also applicable if a segmentation other
than watershed regions is used. We visualize a result of our region focusing in
comparison with the original image partition in Fig. 2.
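Region focusing (Eq. 2) is a plain reachability computation on the RHG taken as a directed graph. A minimal sketch; the dictionary encoding of the downward arcs is an assumption, not the paper's data structure:

```python
def focused_region(down_arcs, src):
    """Eq. (2): union of all initial-partition regions reachable from `src`
    by following RHG arcs from higher to lower scales.  `down_arcs` maps a
    region id to the region ids one level below it; regions of the initial
    partition (and newly created regions, which cannot be tracked down)
    have no entry."""
    stack, base = [src], set()
    while stack:
        r = stack.pop()
        kids = down_arcs.get(r, [])
        if kids:
            stack.extend(kids)
        else:
            base.add(r)  # reached a region with no lower-scale preimage
    return base
```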
Since we use the RHG for performing the region focusing, the RHG remains
nearly unchanged. We only delete all newly created regions from all scale-space
layers above the initial partition. Hence, the respective nodes and edges must be
removed from the RHG. Furthermore, all regions must be removed which only
develop from these newly created regions. The updated RHG of the example in
Fig. 1 will contain all white nodes and their connecting edges.

Fig. 2. Image segmentation of an aerial image. Left: RGB image of a suburban scene in
Graz, Austria (provided by Vexcel Imaging GmbH). Middle: Original watershed regions
at scale σ = 35. Right: Region focusing with merged regions of scales σ = 12 (thin)
and σ = 35 (thick). Clearly, both segmentations of scale σ = 35 are not topologically
equivalent, because the newly created or split regions (and their borders) cannot be
tracked down to the initial partition by our region focusing.

298 M. Drauschke

4 Selection of Regions from Irregular Pyramid

Up to this point, we have only described the construction of our irregular image
pyramid, but we have not mentioned its complexity. On relatively small images
with a size of about 400 × 600 pixels, the ground layer of our irregular pyramid
often contains 1500 or more regions, and their number decreases down to 10 to
30 in the highest layer. Assuming that the number of regions per layer decreases
at a constant rate, the complete pyramid contains over 30,000 regions.
Since most of these regions do not represent objects of interest, a selection of
regions seems helpful to reduce the complexity of further processes.
The integration of knowledge about the scene could later be done in this step;
e.g., one could choose regions whose major axis points in the direction of the
most dominant vanishing points, or one could choose regions which represent
a repetitive pattern in the image, so that they might correspond to windows in
the image. Nevertheless, the search for such reasonable regions in the whole
pyramid is still a task of very high complexity.
We have tested our algorithms by segmenting images showing man-made
scenes, preferably buildings. These objects mostly have clearly visible borders,
so that the corresponding edges can be detected in several layers of the pyramid.
Therefore, we focus on stable regions in our irregular pyramid and define a sta-
bility measure $\varsigma_{m,i}$ for a region $R^{\nu_m}_{\sigma_i}$ with respect to the adjacent region in the next
scale-space level $i + 1$ by

$$\varsigma_{m,i} = \frac{|R^{\nu_m}_{\sigma_i} \cap R^{\nu_n}_{\sigma_{i+1}}|}{|R^{\nu_m}_{\sigma_i} \cup R^{\nu_n}_{\sigma_{i+1}}|}, \qquad (3)$$

where region $R^{\nu_n}_{\sigma_{i+1}}$ is adjacent in scale to $R^{\nu_m}_{\sigma_i}$ and, therefore, both regions are
connected by an edge in the RHG. Then we define the stability measure $\varsigma$ over a
scale range with $d$ scale-space levels by

$$\varsigma_{m,i} = \max_{k=0..d} \;\; \min_{j=i-d+k..i+k} \varsigma_{m',j}, \qquad (4)$$

where $m'$ corresponds to the region of layer $j$ that is connected to $R^{\nu_m}_{\sigma_i}$ by a path.
We call all regions with $\varsigma_{m,i} > t$ stable, where $t$ is a threshold, e.g., $t = 0.75$.
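The per-level stability of Eq. (3) is an intersection-over-union score between a region and its maximum-overlap successor. A small sketch on flat per-pixel label lists (the encoding is an assumption, not the paper's code):

```python
from collections import Counter

def stability(lab_i, lab_ip1, m):
    """Eq. (3): |R_m intersect R_n| / |R_m union R_n|, where R_n is the
    region of the next level with maximal overlap with R_m (its successor
    via Eq. (1)).  lab_i / lab_ip1 are flat per-pixel label lists."""
    pairs = list(zip(lab_i, lab_ip1))
    n = Counter(b for a, b in pairs if a == m).most_common(1)[0][0]
    inter = sum(1 for a, b in pairs if a == m and b == n)
    union = sum(1 for a, b in pairs if a == m or b == n)
    return inter / union
```

Eq. (4) then takes, over a window of d levels along the RHG path, the best worst-case value of these per-level scores.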
If we find a stable region in our pyramid, then we will find at least d − 1
additional regions with a similar shape. All these regions can be represented by
the same region; this is the first step in reducing our pyramid. The stable
regions are not necessarily adjacent in scale to other stable regions; in fact, this
seldom happens. We are able to keep the information of the RHG if we arrange
the stable regions in a hierarchical order and include the merging events, where
paths from two or more previously stable regions reach the same region of the
pyramid. Due to the limited space, we cannot go into more detail here; we present
a sketch of our method in Fig. 3. Its result is a tree of stable regions (TSR), where
we inserted an additional root node for describing the complete scene.

Fig. 3. Tree of stable regions: the layers of the pyramid are arranged in a vertical
order (going upwards); each rectangle represents a node in the TSR, the white ones
correspond to stable regions, the black ones to the merging events from the RHG. The
horizontal extension of each rectangle shows its spatial extent, and the vertical exten-
sion corresponds to the range of stability. The idea of this figure is taken from the
interval tree and its representation as a rectangular tessellation in [18].

5 Experiments

Our approach is very general, because we used only two assumptions for gener-
ating the TSR: the color homogeneity of the objects and the color heterogeneity
between them, and that the objects of interest are stable in scale-space or are
merged stable regions. Now we want to present some results of our experiments.
To this end, we analyzed the TSR of 123 facade images from six German cities:
Berlin, Bonn, Hamburg, Heidelberg, Karlsruhe and Munich, see Fig. 4. These
buildings have a sufficiently large variety with respect to their size, architectural
style and imaging conditions.

5.1 Manual Annotations

The ground truth for our experiments on facade images consists of hand-labeled
annotations¹. On the one hand, the annotations contain the polygonal borders of
all interesting objects that are visible in the scene. On the other hand, part-of
relationships have also been inserted into the annotations. An extract of the
facade ontology is shown in Fig. 5.

¹ The images and their annotations were provided by the project eTraining for inter-
preting images of man-made scenes, which is funded by the European Union. The
labeling of the data has been realized by more than ten people in two different re-
search groups. To avoid inconsistencies within the labeled data, an ontology for
facade images was defined, with a list of objects that must be annotated and
their part-of relationships. A publication of the data is in preparation. Please visit
www.ipb.uni-bonn.de/etrims for further information.

Fig. 4. Left: Facade images from Berlin, Bonn, Hamburg, Heidelberg, Karlsruhe and
Munich (f.l.t.r.), showing the variety of our data set. Right: Two levels from the
irregular pyramid of the Hamburg image.

Fig. 5. Left: Facade image from Hamburg with manually annotated objects. Right:
Major classes and their part-of relationships from the defined building-scene ontology.

5.2 Results

We investigate the coherence between the automatically segmented image re-
gions taken from the TSR and the manual annotations. Our experiment consists
of two tasks. First, we document the detection rate of the annotated objects, and
second, we test whether the hierarchical structure of the TSR reflects the aggre-
gation structure of the annotated objects.
In the first test, we perform an evaluation similar to that of the PASCAL
challenge, cf. [7]. There, it is sufficient to map an automatically seg-
mented region to a ground-truth region if the quotient of the intersection and
the union of both regions is bigger than 0.5. So, we compute this quotient for
each region in the TSR with respect to all annotated objects. Then, the maxi-
mum quotient is taken for determining the class label of the segmented region.
If the ratio is above the threshold, then we call the object detectable. Otherwise,
we also look for partial detectability, i.e., whether the segmented region is com-
pletely included in an annotation. This partial detectability is relevant, e.g., if
the object is occluded by a car or by a tree. Furthermore, we do not expect to
detect complete facades, but our segmentation scheme could be used for the
analysis of image extracts, e.g., the roof part or the area around balconies.
Regarding the second experiment, our interest is whether the TSR reflects the
class hierarchy. This would be the case if, e.g., a window region includes window-
pane regions, i.e., both are connected by a path upwards in the TSR. So, we only
focus on those annotated objects which were (a) detectable or partially de-
tectable and (b) annotated as an aggregate. In this case, the annotation includes
a list of parts of this object. Then, we determine whether we find other regions
in the TSR which (a) are also at least partially detectable and (b) are connected
to the first region by a path upwards in the TSR. Then the upper region can
be described as an aggregate containing at least the lower one. Additionally, we
also check whether not just one but all parts of the aggregated object have
been found, i.e., whether the list of detectable parts is complete. Our results are
shown in Table 1.

Table 1. Results on detectability of building parts: 84% of the annotated objects have a
corresponding region in the TSR or are partially detectable. The columns are explained
in the surrounding text.

class         objects   det.   part.   summed   aggregates   aggreg.   compl. aggreg.
all             9201    58%    26%      84%        2303        48%          13%
balcony          285    31%    62%      93%         243        53%          13%
entrance          72    47%    38%      85%          57        26%          11%
facade           191    49%    46%      95%         172        74%          13%
roof             178    46%    46%      92%          89        51%          13%
window          2491    56%    33%      89%        1369        46%          12%
window pane     2765    68%     8%      76%           0         -            -

Note that the automatically segmented regions were only compared with the la-
beled data; no classification step has been done so far. We have presented first
classification results on the regions from the Gaussian scale-space in [4], where
we classified segmented regions as, e.g., windows with a recognition rate of 80%
using an AdaBoost approach. With geometrically more precise image regions, we
expect to obtain even better results. Furthermore, the detected regions can be
inserted as hypotheses into a high-level image interpretation system, as demon-
strated in [11]. That system uses initial detectors and scene interpretations of mid-
level systems to infer an image interpretation by means of artificial intelligence,
where new hypotheses must be verified by new image evidence.

A similar experiment on aerial images showing buildings in the suburbs of
Graz, Austria, is in preparation. There, we expect even better results, because
the roofs only contain relatively small parts, which often merge with the
roof in our observed scale range.

6 Conclusion and Outlook

We presented a purely data-driven image segmentation framework for multi-
scale image analysis, where regions of different size are observable at different
scales. The defined region hierarchy graph enables us to obtain geometrically sig-
nificantly more precise region boundaries than we obtain by only working in the
Gaussian scale-space. Furthermore, the graph can be used for detecting struc-
tures of aggregates. So far, we have only compared the segmented regions to the
annotated ground truth and did not present a classifier for the regions.
In the next steps, we will insert more knowledge about our domain; e.g., the re-
gions can be reshaped using detected edges. Then, the merging of regions does not
only depend on the observations in scale-space, but also on the non-occurrence
of an edge. Therefore, we need a projection of the detected edges onto the bor-
ders of the detected image regions in the lowest layer. Another way would be
a multiple-view image analysis, where 3D information is derived from a
stereo pair of images.
Our region hierarchy graph can further be used as the structure of a Bayesian
network, where each node is a stochastic variable over the set of classes. The part-
of relations between the regions are taken analogously to model the dependencies
between these stochastic variables. This will enable a simultaneous classification
of all regions taking the partonomy into account.

References
1. Akçay, H.G., Aksoy, S.: Automatic detection of geospatial objects using multiple
hierarchical segmentations. Geoscience & Remote Sensing 46(7), 2097–2111 (2008)
2. Bergholm, F.: Edge focusing. PAMI 9(6), 726–741 (1987)
3. Brun, L., Mokhtari, M., Meyer, F.: Hierarchical watersheds within the combinato-
rial pyramid framework. In: Andrès, É., Damiand, G., Lienhardt, P. (eds.) DGCI
2005. LNCS, vol. 3429, pp. 34–44. Springer, Heidelberg (2005)
4. Drauschke, M., Förstner, W.: Selecting appropriate features for detecting buildings
and building parts. In: Proc. 21st ISPRS Congress, IAPRS 37 (B3b-2), pp. 447–452
(2008)
5. Drauschke, M., Schuster, H.-F., Förstner, W.: Detectability of buildings in aerial
images over scale space. PCV 2006, IAPRS 36(3), 7–12 (2006)
6. Epshtein, B., Ullman, S.: Feature hierarchies for object classification. In: Proc. 10th
ICCV, pp. 220–227 (2005)
7. Everingham, M., Winn, J.: The PASCAL Visual Object Classes Challenge 2008
(VOC2008) development kit (2008) (online publication)
8. Gauch, J.M.: Image segmentation and analysis via multiscale gradient watershed
hierarchies. Image Processing 8(1), 69–79 (1999)

9. Goldstein, E.B.: Sensation and Perception (in German translation by Ritter, M),
6th edn. Wadsworth, Belmont (2002)
10. Guigues, L., Le Men, H., Cocquerez, J.-P.: The hierarchy of the cocoons of a graph
and its application to image segmentation. Pattern Recognition Letters 24(8),
1059–1066 (2003)
11. Hartz, J., Neumann, B.: Learning a knowledge base of ontological concepts for
high-level scene interpretation. In: Proc. ICMLA, pp. 436–443 (2007)
12. Harvey, R., Bangham, J.A., Bosson, A.: Scale-space filters and their robustness.
In: ter Haar Romeny, B.M., Florack, L.M.J., Viergever, M.A. (eds.) Scale-Space
1997. LNCS, vol. 1252, pp. 341–344. Springer, Heidelberg (1997)
13. Lifschitz, I.: Image interpretation using bottom-up top-down cycle on fragment
trees. Master’s thesis, Weizmann Institute of Science (2005)
14. Lindeberg, T.: Scale space theory in computer vision. Kluwer Academic, Dordrecht
(1994)
15. Meer, P.: Stochastic image pyramids. CVGIP 45, 269–294 (1989)
16. Olsen, O.F., Nielsen, M.: Multiscale gradient magnitude watershed segmentation.
In: Del Bimbo, A. (ed.) ICIAP 1997. LNCS, vol. 1310, pp. 9–13. Springer, Heidel-
berg (1997)
17. Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion.
PAMI 12(7), 629–639 (1990)
18. Witkin, A.: Scale-space filtering. In: Proc. 8th IJCAI, pp. 1019–1022 (1983)
A First Step toward Combinatorial Pyramids in
n-D Spaces

Sébastien Fourey and Luc Brun

GREYC, CNRS UMR 6072, ENSICAEN, 6 bd maréchal Juin F-14050 Caen, France
Sebastien.Fourey@greyc.ensicaen.fr, Luc.Brun@greyc.ensicaen.fr

Abstract. Combinatorial maps define a general framework which al-
lows us to encode any subdivision of an n-D orientable quasi-manifold with
or without boundaries. Combinatorial pyramids are defined as stacks of
successively reduced combinatorial maps. Such pyramids provide a rich
framework which allows us to encode fine properties of the objects (either
shapes or partitions). Combinatorial pyramids were first defined in
2D. This first work was later extended to pyramids of n-D gener-
alized combinatorial maps. Such pyramids allow the encoding of stacks of
non-orientable partitions, but at the price of a twice bigger pyramid. These
pyramids are also not designed to capture efficiently the properties con-
nected with orientation. The present work presents our first results on
the design of a pyramid of n-D combinatorial maps.

1 Introduction
Pyramids of combinatorial maps were first defined in 2D [1], and later
extended to pyramids of n-dimensional generalized maps by Grasset et al. [6].
Generalized maps model subdivisions of orientable but also non-orientable quasi-
manifolds [7], at the expense of twice the data size required for com-
binatorial maps. For practical use (for example in image segmentation), this
may have an impact on the efficiency of the associated algorithms or may even
prevent their use. Furthermore, properties and constraints linked to the notion
of orientation may be expressed in a more natural way with the formalism of
combinatorial maps. For these reasons, we are interested here in the definition of
pyramids of n-dimensional combinatorial maps. This paper is a first step toward
the definition of such pyramids, and the link between our definitions and the
ones that consider G-maps is maintained throughout the paper. In fact, the link
between n-G-maps and n-maps was first established by Lienhardt [7], so that it
was claimed in [2], though not explicitly stated, that pyramids of n-maps could be
defined.
The key notion for the definition of pyramids of maps is the operation of
simultaneous removal or contraction of cells. Thus, we define the operation of
simultaneous removal and the one of simultaneous contraction of cells in an
n-map, the latter being introduced here as a removal operation in the dual map.

This work was supported under a research grant of the ANR Foundation (ANR-06-
MDCA-008-02/FOGRIMMI).

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 304–313, 2009.
© Springer-Verlag Berlin Heidelberg 2009

We first raise in Section 3 a minor problem with the definition of "cells with
local degree 2 in a G-map" used in [5,2], and more precisely with the criterion
for determining whether a cell is a valid candidate for removal. We provide a formal
definition of the local degree, which is consistent with the results established in
previous papers [2,6], using the notion of a regular cell that we introduce.
An essential result of this paper, presented in Section 4, is that the removal
operation we introduce here is well defined, since it indeed transforms a map
into another map. Instead of checking that the resulting map satisfies from its
very definition the properties of a map, we use an indirect proof based on the
removal operation in G-maps defined by Damiand in [2,3]. This way
again illustrates the link between the two structures.
Eventually, in Section 5 we state a definition of simultaneous contraction
of cells in a G-map in terms of removals in the dual map, a definition which we
prove to be equivalent to the one given by Damiand and Lienhardt in [2]. We
finally define in the same way the simultaneous contraction operation in maps.
Note that the proofs of the results stated in this paper may be found in [4].

2 Maps and Generalized Maps in Dimension n


An n-G-map is defined by a set of basic abstract elements called darts connected
by (n + 1) involutions. More formally:
Definition 1 (n-G-map [7]). Let n ≥ 0; an n-G-map is defined as an (n + 2)-
tuple G = (D, α0 , . . . , αn ) where:
– D is a finite non-empty set of darts;
– α0 , . . . , αn are involutions on D (i.e., ∀i ∈ {0, . . . , n}, α_i²(b) = b) such that:
• ∀i ∈ {0, . . . , n − 1}, αi is an involution without fixed point (i.e., ∀b ∈ D,
αi(b) ≠ b);
• ∀i ∈ {0, . . . , n − 2}, ∀j ∈ {i + 2, . . . , n}, αiαj is an involution¹.
The dual of G, denoted by G̅, is the n-G-map G̅ = (D, αn , . . . , α0 ). If αn is an
involution without fixed point, G is said to be without boundaries, or closed. In
the following we only consider closed n-G-maps with n ≥ 2.
Figure 1(a) shows a 2-G-map G = (D, α0 , α1 , α2 ) whose set of darts D is {1, 2, 3,
4, −1, −2, −3, −4}, with the involutions α0 = (1, −1)(2, −2)(3, −3)(4, −4), α1 =
(1, 2)(−1, 3)(−2, −3)(4, −4), and α2 = (1, 2)(−1, −2)(3, 4)(−3, −4).
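The axioms of Definition 1 are directly checkable when the involutions are stored explicitly. A small sketch, assuming darts and involutions are stored as Python dictionaries, verifying them on the 2-G-map of Fig. 1(a):

```python
def perm(*cycles):
    """Build a permutation dictionary from its cycles."""
    p = {}
    for c in cycles:
        for x, y in zip(c, c[1:] + c[:1]):
            p[x] = y
    return p

def is_gmap(alphas):
    """Check the n-G-map axioms of Definition 1."""
    n = len(alphas) - 1
    D = set(alphas[0])
    for i, a in enumerate(alphas):
        assert all(a[a[d]] == d for d in D)   # each alpha_i is an involution
        if i < n:                             # alpha_0..alpha_{n-1}: no fixed point
            assert all(a[d] != d for d in D)
    for i in range(n - 1):                    # alpha_i alpha_j involutive, j >= i+2
        for j in range(i + 2, n + 1):
            comp = {d: alphas[j][alphas[i][d]] for d in D}
            assert all(comp[comp[d]] == d for d in D)
    return True

# Fig. 1(a): darts {1, 2, 3, 4, -1, -2, -3, -4}
a0 = perm((1, -1), (2, -2), (3, -3), (4, -4))
a1 = perm((1, 2), (-1, 3), (-2, -3), (4, -4))
a2 = perm((1, 2), (-1, -2), (3, 4), (-3, -4))
```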
Let Φ = {φ1 , . . . , φk } be a set of permutations on a set D. We denote by <Φ>
the permutation group generated by Φ, i.e., the set of permutations obtained
by any composition and inversion of permutations contained in Φ. The orbit of
d ∈ D relative to Φ is defined by <Φ>(d) = {φ(d) | φ ∈ <Φ>}. Furthermore,
we extend this notation to the empty set by defining <∅> as the identity map.
If Ψ = {ψ1 , . . . , ψh } ⊂ Φ, we denote <ψ1 , . . . , ψ̂j , . . . , ψh>(d) = <Ψ \ {ψj}>(d).
Moreover, when there is no ambiguity about the reference set Φ, we
denote by <ψ̂1 , ψ̂2 , . . . , ψ̂h>(d) the orbit <Φ \ Ψ>(d).
¹ Given two involutions αi , αj and one dart d, the expression dαi αj denotes αj ◦ αi (d).

Fig. 1. (a) A 2-G-map. (b) A solid representation of a part of a 3-G-map where a
vertex has local degree 2 but is not regular. (The vertex is made of all the depicted
darts.)

Definition 2 (Cells in n-G-maps [7]). Let G = (D, α0 , . . . , αn ) be an n-G-
map, n ≥ 1, and let d ∈ D. The i-cell (or cell of dimension i) that contains
d is denoted by Ci(d) and defined by the orbit Ci(d) = <α0 , . . . , α̂i , . . . , αn>(d).
Thus, the 2-G-map of Fig. 1(a) has 2 vertices (v1 = <α1 , α2>(1) = {1, 2} and
v2 = {−1, 3, 4, −4, −3, −2}), 2 edges (e1 = <α0 , α2>(1) = {1, −1, 2, −2} and
e2 = {3, 4, −3, −4}), and 2 faces (the one bounded by e2 and the outer one).
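Since all generators of these orbits are involutions, an orbit can be computed by a plain closure. A sketch (dictionary encoding assumed) reproducing the cells of Fig. 1(a) listed above:

```python
def orbit(gens, d):
    """<Phi>(d): smallest set containing d and closed under the generators.
    For involutions, closure under the maps themselves suffices; general
    permutations would also require closure under their inverses."""
    seen, stack = {d}, [d]
    while stack:
        x = stack.pop()
        for g in gens:
            y = g[x]
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen

def perm(*cycles):
    """Build a permutation dictionary from its cycles."""
    p = {}
    for c in cycles:
        for x, y in zip(c, c[1:] + c[:1]):
            p[x] = y
    return p

a0 = perm((1, -1), (2, -2), (3, -3), (4, -4))
a1 = perm((1, 2), (-1, 3), (-2, -3), (4, -4))
a2 = perm((1, 2), (-1, -2), (3, 4), (-3, -4))
```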
Definition 3 (n-map [7]). An n-map (n ≥ 1) is defined as an (n + 1)-tuple
M = (D, γ0 , . . . , γn−1 ) such that:
– D is a finite non-empty set of darts;
– γ0 , . . . , γn−2 are involutions on D and γn−1 is a permutation on D such that
∀i ∈ {0, . . . , n − 2}, ∀j ∈ {i + 2, . . . , n − 1}, γiγj is an involution.
The dual of M, denoted by M̅, is the n-map M̅ = (D, γ0 , γ0γn−1 , . . . , γ0γ1 ). The
inverse of M, denoted by M⁻¹, is defined by M⁻¹ = (D, γ0 , . . . , γn−2 , γ_{n−1}^{−1}). Note
that Damiand and Lienhardt introduced a definition of an n-map as an (n+1)-tuple
(D, βn , . . . , β1 ) defined as the inverse of the dual of our map M. If we forget the
inverse relationship (which only reverses the orientation), we have γ0 = βn and
βi = γ0γi for i ∈ {1, . . . , n − 1}. The application β1 is the permutation of the
map, while (βi)i∈{2,...,n} defines its involutions.
Definition 4 (Cells in n-maps [7]). Let M = (D, γ0 , . . . , γn−1 ) be an n-map,
n ≥ 1. The i-cell (or cell of dimension i) of M that owns a given dart d ∈ D is
denoted by Ci(d) and defined by the orbits:
∀i ∈ {0, . . . , n − 1}:  Ci(d) = <γ0 , . . . , γ̂i , . . . , γn−1>(d);
for i = n:  Cn(d) = <γ0γ1 , . . . , γ0γn−1>(d).
In both an n-map and an n-G-map, two cells C and C′ with different dimensions
are called incident if C ∩ C′ ≠ ∅. Moreover, the degree of an i-cell C is the
number of (i + 1)-cells incident to C, whereas the dual degree of C is the number
of (i − 1)-cells incident to C. An n-cell (resp. a 0-cell) has a degree (resp. dual
degree) equal to 0.

2.1 From n-G-Maps to Maps and Vice Versa

An n-map may be associated to an n-G-map, as stated by the next definition.


In this paper, we use this direct link between the two structures to show that
the removal operation we introduce for maps is properly defined (Section 4). For
that purpose, we notably use the fact that a removal operation (as defined by
Damiand and Lienhardt [2]) in a G-map has a counterpart (according to our
definition) in its associated map and vice versa.
Definition 5 (Map of the hypervolumes). Let G = (D, α0 , . . . , αn ) be an
n-G-map, n ≥ 1. The n-map HV = (D, δ0 = αn α0 , . . . , δn−1 = αn αn−1 ) is
called the map of the hypervolumes of G.
A connected component of a map (D, γ0 , . . . , γn−1 ) is a set < γ0 , . . . , γn−1 >(d)
for some d ∈ D. Lienhardt [8] proved that if an n-G-map G is orientable, HV (G)
has two connected components. In the following we only consider orientable n-
G-maps.
Conversely, given an n-map, we may construct an orientable n-G-map that
represents the same partition of a quasi-manifold. Thus, we define below the
notion of an n-G-map associated to a given n-map (Definition 6). Lienhardt [7,
Theorem 4] only stated the existence of such a G-map; we provide here an explicit
construction scheme that will be used in Section 4.
Definition 6. Let M = (D, γ0 , . . . , γn−1 ) be an n-map. We denote by AG(M)
the (n + 2)-tuple (D̃ = D ∪ D′, α0 , α1 , . . . , αn ), where D′ is a finite set with the
same cardinality as D such that D ∩ D′ = ∅, and the involutions αi , 0 ≤ i ≤ n, are
defined by:

              d ∈ D              d′ ∈ D′
i < n − 1     dαi = dγiσ         d′αi = d′σ⁻¹γi
i = n − 1     dαn−1 = dγn−1σ     d′αn−1 = d′σ⁻¹γ_{n−1}^{−1}
i = n         dαn = dσ           d′αn = d′σ⁻¹

where σ is a one-to-one correspondence between D and D′.
As stated by [4, Proposition 7], the tuple AG(M) is actually an n-G-map.
Furthermore, given an n-map M = (D, γ0 , . . . , γn−1 ), if D′ is a connected compo-
nent of M, the (n + 1)-tuple (D′, γ0|D′ , . . . , γn−1|D′ ) is an n-map [4, Remark 3],
which is called the sub-map of M induced by D′ and denoted by M|D′ . Finally, the
following proposition establishes the link between the HV and AG operations.
Proposition 1. If M is an n-map, we have M = HV(AG(M))|D , where D is
the set of darts of M.

3 Cells Removal in n-G-Maps

Being the number of (i + 1)-cells that are incident to it, the degree of an i-cell C
in an n-G-map G = (D, α0 , . . . , αn ) is the number of sets in the set Δ =
{<α̂i+1>(d) | d ∈ C}. As part of a criterion for cells that may be removed from a
G-map, we need a notion of degree that better reflects the local configuration of
a cell: the local degree. A more precise justification for the following definition
may be found in [4].
Definition 7 (Local degree in G-maps). Let C be an i-cell in an n-G-map.
– For i ∈ {0, . . . , n − 1}, the local degree of C is the number
|{<α̂i , α̂i+1>(b) | b ∈ C}|.
– For i ∈ {1, . . . , n}, the dual local degree of C is the number
|{<α̂i−1 , α̂i>(b) | b ∈ C}|.
The local degree (resp. the dual local degree) of an n-cell (resp. a 0-cell) is 0.
Intuitively, the local degree of an i-cell C is the number of (i+1)-cells that locally
appear to be incident to C. It is called local because it may differ from
the degree, since an (i + 1)-cell may be incident more than once to an i-cell, as
illustrated in Fig. 1, where the 1-cell e2 is multi-incident to the 0-cell v2; hence
the cell v2 has a degree 2 and a local degree 3.
On the other hand, the dual local degree of an i-cell C is the number of (i − 1)-
cells that locally appear to be incident to C, as in the example given in Fig. 1, where the
edge e2 locally appears to be bounded by two vertices², whereas the darts defining
this edge all belong to a unique vertex (v2). Hence, e2 has a dual degree 1 and a
dual local degree 2.
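In a 2-G-map, both the local degree of a 0-cell and the dual local degree of a 1-cell reduce to counting the <α2>-orbits among the cell's darts, since <α̂0, α̂1> = <α2>. A small check of the figures quoted above for Fig. 1(a), with the involutions stored as dictionaries (an assumed encoding):

```python
def a2_orbit_count(a2, cell):
    """Number of <alpha_2>-orbits among the darts of a cell; in a 2-G-map
    this equals the local degree of a 0-cell and the dual local degree of
    a 1-cell (Definition 7), since <hat-alpha_0, hat-alpha_1> = <alpha_2>."""
    return len({frozenset({b, a2[b]}) for b in cell})

# alpha_2 and the cells of Fig. 1(a)
a2 = {1: 2, 2: 1, -1: -2, -2: -1, 3: 4, 4: 3, -3: -4, -4: -3}
v1, v2 = {1, 2}, {-1, -2, 3, 4, -3, -4}
e2 = {3, 4, -3, -4}
```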
In [5,6], Grasset defines an i-cell with local degree 2 (0 ≤ i ≤ n − 2) as a cell C
such that for all b ∈ C, bαi+1αi+2 = bαi+2αi+1, and an i-cell with dual local degree
2 (2 ≤ i ≤ n) as a cell C such that for all b ∈ C, bαi−1αi−2 = bαi−2αi−1. In fact,
Grasset's definition does not actually distinguish cells with local degree 1 from cells
with local degree 2, so that the vertex v1 in the 2-G-map of Fig. 1 is considered as
removable, yielding the loop (−1, −2) after removal. On the other hand, it is also
more restrictive than our definition of a cell with local degree 2 (Definition 7). As
an example, the vertex depicted in Fig. 1(b) has local degree 2 but does not satisfy
the above-mentioned criterion.
However, Grasset's definition was merely intended to characterize cells that
could be removed from a G-map, producing a valid new G-map, following the
works of Damiand and Lienhardt [2], where the term "degree equal to 2" is
actually used with quotes. To that extent, it is a good criterion [3, Theorem 2],
but again not a proper definition of cells with local degree 2.
Grasset's criterion is in fact a necessary but not sufficient condition to prevent
the production of a degenerate G-map after a removal operation, as in the
case of the removal of a vertex with local degree 1 (v1 in Fig. 1). We introduce
here our own criterion based on the proper notion of local degree and a notion
of regularity introduced below. This criterion is proved to be equivalent to a
corrected version of Grasset's condition (Theorem 1). We first introduce the
notion of a regular cell.
² This is always the case for an (n − 1)-cell.

Definition 8 (Regular cell). An i-cell C (i ≤ n − 2) in an n-G-map is said to
be regular if it satisfies the two following conditions:
a) ∀d ∈ C, dαi+1αi+2 = dαi+2αi+1 or dαi+1αi+2 ∈ <α̂i , α̂i+1>(dαi+2αi+1),
and
b) ∀b ∈ C, bαi+1 ∉ <α̂i , α̂i+1>(b).
Cells of dimension n − 1 are defined as regular cells too.
Thus, the vertex depicted in Fig. 1(b) is a 0-cell (with local degree 2) in a 3-G-
map which is not regular. Grasset et al.'s criterion prevents this configuration
from being considered as a removable vertex, although it is indeed a vertex
with local degree 2 according to our definition. Eventually, the link between the
criterion used in [2,5] and our definitions is summarized by the following theorem,
where condition i) excludes cells with local degree 1.
Theorem 1. For any i ∈ {0, . . . , n − 2}, an i-cell C is a regular cell with local
degree 2 if and only if
i) ∃b ∈ C, bαi+1 ∉ <α̂i , α̂i+1>(b), and
ii) ∀b ∈ C, bαi+1αi+2 = bαi+2αi+1.
Note that, under a local degree 2 assumption, both conditions (a) and (b) of
Definition 8 are used to show condition ii). We thus do not have i) ⇔ b) and
ii) ⇔ a).
In order to define the simultaneous removal of cells in a G-map G (resp. in a map
M), we will consider families of sets of the form Sr = {Ri}0≤i≤n, where Ri is
a set of i-cells and Rn = ∅. The family Sr is called a removal set in G (resp.
in M). We denote by R = ∪ⁿi=0 Ri the set of all cells of Sr, and by R* = ∪C∈R C
the set of all darts in Sr. If D′ is a connected component of G (resp. M), we
denote by Sr|D′ the removal set that contains all the cells of Sr included in D′.
The following definition characterizes particular removal sets that actually may
be removed from an n-G-map, resulting in a valid map.
Definition 9 (Removal kernel). Let G be an n-G-map. A removal kernel
Kr = {Ri }0≤i≤n in G is a removal set such that all cells of R are disjoint and
all of them are regular cells with local degree 2 (Definitions 8 and 7).
We provide the following definition which is slightly simpler and proved to be
equivalent [4, Proposition 12] to the one used in [2,3,6].
Definition 10 (Cells removal in n-G-maps). Let G = (D, α0, . . . , αn) be
an n-G-map and Kr = {Ri}0≤i≤n−1 be a removal kernel in G. The n-G-map
resulting from the removal of the cells of R is G′ = (D′, α′0, . . . , α′n) where:
1. D′ = D \ R∗;
2. ∀d ∈ D′, dα′n = dαn;
3. ∀i, 0 ≤ i < n, ∀d ∈ D′, dα′i = d′ = d(αiαi+1)^k αi where k is the smallest
integer such that d′ ∈ D′.
We denote G′ = G \ Kr or G′ = G \ R∗.
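Rule 3 above can be read operationally: starting from a surviving dart d, follow (αi αi+1) steps through removed darts until an αi-image survives. A minimal sketch of this traversal, assuming darts are integers and each involution αi is stored as a dict (the function name and the toy 1-G-map are ours, not from the paper):

```python
def removed_alpha_i(d, i, alpha, surviving):
    """d.alpha'_i per Definition 10, rule 3: d(alpha_i alpha_{i+1})^k alpha_i
    for the smallest k whose image lies in the surviving darts D'."""
    cur = d
    while True:
        img = alpha[i][cur]          # candidate image for the current k
        if img in surviving:
            return img
        cur = alpha[i + 1][img]      # one more (alpha_i alpha_{i+1}) step

# Toy 1-G-map: six darts on a hexagon; alpha_0 pairs the two darts of each
# edge, alpha_1 pairs the two darts around each vertex.
alpha = {
    0: {0: 1, 1: 0, 2: 3, 3: 2, 4: 5, 5: 4},
    1: {1: 2, 2: 1, 3: 4, 4: 3, 5: 0, 0: 5},
}
surviving = {0, 3, 4, 5}             # remove the local-degree-2 vertex {1, 2}
print(removed_alpha_i(0, 0, alpha, surviving))   # edges (0,1) and (2,3) merge: 3
print(removed_alpha_i(3, 0, alpha, surviving))   # involution check: 0
```

Removing the vertex {1, 2} merges its two incident edges into one, and the computed α′0 is again an involution on the surviving darts, as Definition 10 requires.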
310 S. Fourey and L. Brun

4 Cells Removal in n-Maps


In this section we define an operation of simultaneous removal of cells in an
n-map derived from the one given for n-G-maps in the previous section. The
link between the two operations is established by first showing that a removal
operation in an n-G-map G has its counterpart in the map of the hypervolumes
of G (Eq. (1)). Furthermore, we also prove indirectly that the map resulting
from a removal operation is a valid map (Theorem 2).
As for G-maps, we need a notion of local degree in a map.
Definition 11 (Local degree in maps). Let C be an i-cell in an n-map.
– The local degree of C is the number
  |{<γ̂i, γ̂i+1>(b) | b ∈ C}| if i ∈ {0, . . . , n − 2},
  |{<γ0γ1, . . . , γ0γn−2>(b) | b ∈ C}| if i = n − 1.
– The dual local degree of C is the number
  |{<γ̂i, γ̂i−1>(b) | b ∈ C}| for i ∈ {1, . . . , n − 1},
  |{<γ0γ1, . . . , γ0γn−2>(b) | b ∈ C}| for i = n.
The local degree (resp. the dual local degree) of an n-cell (resp. a 0-cell) is 0.
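The orbits <· · ·>(b) counted in this definition can be computed by a plain graph search over the darts. A hedged sketch (the helper names are ours; permutations are dicts dart → dart, and for non-involutive generators one should also pass their inverses, as below):

```python
def orbit(dart, perms):
    """All darts reachable from `dart` under the given permutations."""
    seen, stack = {dart}, [dart]
    while stack:
        d = stack.pop()
        for p in perms:
            nxt = p[d]
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return frozenset(seen)

def local_degree(cell_darts, gens):
    """Number of distinct orbits <gens>(b) over the darts b of the cell."""
    return len({orbit(b, gens) for b in cell_darts})

# Tiny check: under a single 4-cycle, every dart has the same orbit.
cycle = {0: 1, 1: 2, 2: 3, 3: 0}
inv = {v: k for k, v in cycle.items()}   # inverse, needed for non-involutions
print(local_degree({0, 2}, [cycle, inv]))  # one orbit covers both darts -> 1
```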
We also define ([4, Definition 16]) a notion of regular cell in an n-map from the
same notion in G-maps (Definition 8). Now, we may introduce a key definition
of this paper: the simultaneous removal of a set of cells in an n-map.
Definition 12 (Cells removal in n-maps). Let M = (D, γ0, . . . , γn−1) be an
n-map and Sr = {Ri}0≤i≤n−1 a removal set in M. We define the (n + 1)-tuple
M \ Sr = (D′, γ′0, . . . , γ′n−1) obtained after removal of the cells of Sr by:
– D′ = D \ R∗;
– ∀i ∈ {0, . . . , n − 2}, ∀d ∈ D′, dγ′i = d(γi γi+1^{−1})^k γi, where k is the smallest
integer such that d(γi γi+1^{−1})^k γi ∈ D′;
– for i = n − 1, ∀d ∈ D′, dγ′n−1 = dγn−1^{k+1}, where k is the smallest integer such
that dγn−1^{k+1} ∈ D′.
Note that an equivalent definition in terms of (βi)i∈{1,...,n} (Section 2) is provided
in [4, Proposition 13].
We will prove in the sequel (Theorem 2) that the (n + 1)-tuple M \ Sr so defined
is an n-map whenever Sr is a removal kernel (Definition 14), by establishing
the link between removal in n-maps and removal in n-G-maps.
Definition 13. Let G be an n-G-map, Sr = {Ri}0≤i≤n be a removal set in G,
and M = HV(G). We define the set HV(Sr) = {R′i}0≤i≤n as follows:
– ∀i ∈ {0, . . . , n − 1}, R′i = {<αnα0, . . . , α̂nαi, . . . , αnαn−1>(d) | d ∈ Ri∗},
– R′n = {<α0α1, . . . , α0αn−1>(d) | d ∈ Rn∗}.
The set HV(Sr) is a removal set in M ([4, Lemma 17]).
We proved ([4, Proposition 14]) that the removal operation introduced here for
n-maps produces a valid n-map when applied to the map of the hypervolumes
of a G-map. Formally, if G is an n-G-map and Kr is a removal kernel in G:
HV (G) \ HV (Kr ) = HV (G \ Kr ) (1)
so that the left term is a valid map.
It remains to be proved that the removal operation, when applied to any n-
map, produces a valid n-map. This is proved to be true (Theorem 2) as soon as
the cells to be removed constitute a removal kernel according to Definition 14.
Definition 14 (Removal kernel). Let M be an n-map. A removal kernel
Kr = {Ri }0≤i≤n in M is a removal set such that all cells of R are disjoint
and all of them are regular cells with local degree 2 ([4, Definition 16] and Defi-
nition 11).
If M is an n-map and G = AG(M) with the notations of Definition 6, for any
i-cell C of M the set³ C ∪ Cσ (if i < n) or C ∪ Cγ0σ (if i = n) is an n-cell of
AG(M) [4, Proposition 7], called the associated cell of C in AG(M) and denoted
by C̃. This definition of associated cell allows us to directly define in AG(M) the
associated removal set of a removal kernel in M, which is proved to be a removal
kernel [4, Definition 24, Proposition 15].
We may now state the main result of this section.
Theorem 2. If M is an n-map and Kr is a removal kernel in M , the (n + 1)-
tuple M \ Kr (Definition 12) is a valid n-map.
Sketch of proof: With G̃ = AG(M), we have the following diagram:

                   removal of Kr
       M  ─────────────────────────────→  M \ Kr
       ↑ |D                               ↑ |D
                   removal of HV(K̃r)
    HV(G̃) ───────────────────────────→  HV(G̃) \ HV(K̃r)
       ↑ HV                               ↑ HV
                   removal of K̃r
       G̃  ─────────────────────────────→  G̃ \ K̃r

(the left column, read upwards from G̃, is HV followed by the restriction |D;
the map AG sends M to G̃). Indeed, we have HV(G̃)|D = M by Proposition 1;
hence the left part of the diagram. If Kr is a removal kernel in M, then a
removal kernel K̃r in G̃ may be associated to Kr [4, Definition 24, Proposition 15].
Thus the bottom-right part of the diagram holds by (1). Finally, we have
Kr = HV(K̃r)|D [4, Lemma 19], and
(HV(G̃) \ HV(K̃r))|D = HV(G̃)|D \ HV(K̃r)|D = M \ Kr [4, Proposition 16];
hence the upper-right part of the diagram. Therefore, if we follow the sequence
of mappings

    M −(AG)→ G̃ −(\K̃r)→ G̃ \ K̃r −(HV)→ HV(G̃ \ K̃r) −(|D)→ M \ Kr
³ If σ : E −→ F and S ⊂ E, Sσ is the image of S by σ, namely Sσ = {σ(d) | d ∈ S}.
we deduce that M \ Kr is a valid n-map since G̃ = AG(M) is an n-G-map [4,
Proposition 7], therefore G̃ \ K̃r is an n-G-map [2,3], hence HV(G̃ \ K̃r) is an
n-map [8], and finally HV(G̃ \ K̃r)|D, i.e. M \ Kr, is an n-map [4, Remark 3]. □

5 Cells Contraction in n-G-Maps and n-Maps


Definition 15 (Contraction kernel). Let G = (D, α0, . . . , αn) be an n-G-map
and Kc = {Ci}0≤i≤n be sets of i-cells with C0 = ∅, such that all cells of
C = C0 ∪ · · · ∪ Cn are disjoint and regular cells with dual local degree 2. The
family Kc is called a contraction kernel in G. A contraction kernel is defined in
a similar way for an n-map M. (Recall that Ci∗ = ∪c∈Ci c and C∗ = ∪i∈{0,...,n} Ci∗.)
In this paper, we choose to define the contraction operation in G-maps as a
removal operation in the dual map (Definition 16), whereas Damiand and
Lienhardt [2] provided a definition close to the one they gave for the removal
operation (see Section 3).
Definition 16 (Cells contraction). Let G = (D, α0, . . . , αn) be an n-G-map
(resp. M = (D, γ0, . . . , γn−1) be an n-map) and Kc = {Ci}1≤i≤n be a contraction
kernel. The n-G-map (resp. n-map) resulting from the contraction of the cells of
Kc, which we denote G/Kc (resp. M/Kc), is the dual of the n-G-map obtained
by removing the cells dual to those of Kc in the dual map of G (resp. the dual
of the n-map obtained in the same way from M).
We proved [4, Proposition 22] that this definition is equivalent to the one given
by Damiand and Lienhardt for simultaneous removals and contractions [2].
Not surprisingly, this definition also leads to a constructive description of the
G-map obtained after contraction of cells [4, Proposition 21], which is similar to
the definition given for the removal operation in an n-G-map (Definition 10).
Proposition 2. Let G = (D, α0, . . . , αn) be an n-G-map and Kc = {Ci}1≤i≤n
be a contraction kernel. The n-G-map resulting from the contraction of the cells of
C according to Definition 16 is G′ = (D′, α′0, . . . , α′n) defined by:
1. D′ = D \ C∗;
2. ∀d ∈ D′, dα′0 = dα0;
3. ∀i, 0 < i ≤ n, ∀d ∈ D′, dα′i = d′ = d(αiαi−1)^k αi where k is the smallest
integer such that d′ ∈ D′.
Moreover, if M is a map, the tuple M/Kc is indeed a map, being the dual of
the map obtained by removal in the dual of M. Using the same approach as for
Proposition 2, we obtain an explicit construction scheme for the contracted
map [4, Proposition 24] (see [4, Proposition 25] for the same result in terms of
(βi)i∈{1,...,n}).
Proposition 3. Let M = (D, γ0, . . . , γn−1) be an n-map. Let Kc = {Ci}1≤i≤n
be a contraction kernel. The n-map obtained after contraction of the cells of Kc
is the map M′ = (D′ = D \ C∗, γ′0, . . . , γ′n−1) where:
– ∀d ∈ D′, dγ′0 = dγn−1^k γ0 where k is the smallest integer such that
dγn−1^k γ0 ∈ D′;
– ∀i ∈ {1, . . . , n − 1}, ∀d ∈ D′, dγ′i = dγn−1^k (γi γi−1^{−1})^{k′} γi, where k is the
smallest integer such that dγn−1^k ∈ D′ and k′ is the smallest integer such that
dγn−1^k (γi γi−1^{−1})^{k′} γi ∈ D′.

6 Conclusion
Based on the previous work by Damiand and Lienhardt for generalized maps, we
have defined cell removal and contraction in n-dimensional combinatorial maps,
and proved the validity of these operations. A logical sequel of this paper will be
the definition of n-dimensional combinatorial pyramids and the related notions,
in the way Brun and Kropatsch did in the two-dimensional case and following
the work of Grasset on pyramids of generalized maps.

References
1. Brun, L., Kropatsch, W.: Combinatorial pyramids. In: Suvisoft (ed.) IEEE International Conference on Image Processing (ICIP), Barcelona, September 2003, vol. II, pp. 33–37. IEEE, Los Alamitos (2003)
2. Damiand, G., Lienhardt, P.: Removal and contraction for n-dimensional generalized maps. In: Nyström, I., Sanniti di Baja, G., Svensson, S. (eds.) DGCI 2003. LNCS, vol. 2886, pp. 408–419. Springer, Heidelberg (2003)
3. Damiand, G., Lienhardt, P.: Removal and contraction for n-dimensional generalized maps. Technical report (2003)
4. Fourey, S., Brun, L.: A first step toward combinatorial pyramids in nD spaces. Technical report TR-2009-01, GREYC (2009), http://hal.archives-ouvertes.fr/?langue=en
5. Grasset-Simon, C.: Définition et étude des pyramides généralisées nD : application pour la segmentation multi-echelle d'images 3D. Ph.D. thesis, Université de Poitiers (2006)
6. Grasset-Simon, C., Damiand, G., Lienhardt, P.: nD generalized map pyramids: Definition, representations and basic operations. Pattern Recognition 39(4), 527–538 (2006)
7. Lienhardt, P.: Topological models for boundary representation: a comparison with n-dimensional generalized maps. Computer-Aided Design 23(1), 59–82 (1991)
8. Lienhardt, P.: N-dimensional generalized combinatorial maps and cellular quasi-manifolds. International Journal of Computational Geometry & Applications 4(3), 275–324 (1994)
Cell AT-Models for Digital Volumes

Pedro Real and Helena Molina-Abril

Dpto. Matematica Aplicada I, E.T.S.I. Informatica,


Universidad de Sevilla,
Avda. Reina Mercedes, s/n 41012 Sevilla, Spain
{real,habril}@us.es

Abstract. In [4], given a binary 26-adjacency voxel-based digital volume V,
the homological information (that related to n-dimensional holes: connected
components, "tunnels" and cavities) is extracted from a linear map (called
homology gradient vector field) acting on a polyhedral cell complex P(V)
homologically equivalent to V. We develop here an alternative way of
constructing P(V), based on homological algebra arguments, as well as a new,
more efficient algorithm for computing a homology gradient vector field based
on the contractibility of the maximal cells of P(V).

1 Introduction
In [4], a polyhedral cell complex P(V) homologically equivalent to a binary 26-adjacency
voxel-based digital volume V is constructed. The former is a useful
tool to visualize, analyze and topologically process the latter. The continuous
analogue P(V) is constituted of contractile polyhedral blocks installed in
overlapping 2 × 2 × 2 unit cubes. Concerning visualization, the boundary cell
complex ∂P(V) (in fact, a triangulation) of P(V) is an alternative to marching-cubes
based algorithms [7]. The complex P(V) is obtained in [4] by suitably extending to
volumes the discrete boundary triangulation method given in [8]. Nevertheless,
the main interest in constructing P(V) essentially lies in the fact that we can
extract homological information from it in a straightforward manner. More precisely,
by homological information we mean here not only Betti numbers (number
of connected components, "tunnels" or "holes", and cavities), Euler characteristic
and representative cycles of homology classes, but also the homological classification
of cycles and higher cohomology invariants. Roughly speaking, to obtain this
homological acuity, we use an approach in which the homology problem is posed
in terms of finding a concrete algebraic "deformation process" φ (a so-called chain
homotopy in Homological Algebra language [6], or homology gradient vector field
as in [4]) which we can apply to P(V), obtaining a minimal cell complex with
exactly one cell of dimension n for each homology generator of dimension n.

This work has been partially supported by the "Computational Topology and Applied
Mathematics" PAICYT research project FQM-296, the Andalusian research project
PO6-TIC-02268, the Spanish MEC project MTM2006-03722, and the Austrian
Science Fund under grant P20134-N13.

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 314–323, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Fig. 1. Zoom of the polyhedral cell complex associated to a digital volume

Collaterally, homology groups can be deduced in a straightforward manner from
φ. This idea of describing homology in terms of chain homotopies is not new: it
goes back to the Eilenberg-MacLane work [2] on Algebraic Topology, and it has
been developed later in algebraic-topological methods like Effective Homology
and Homological Perturbation Theory, and in discrete settings such as Discrete
Morse theory [3] and AT-model theory [5]. In this paper, working in the field of
general cell complexes embedded in R3 and using discrete Morse theory notions,
we construct a homology gradient vector field starting from any initial gradient
vector field on a cell complex and, in the setting of the polyhedral cell complexes
associated to digital volumes, we design an efficient homology computation
algorithm based on the addition of contractile maximal cells. We work with
coefficients in the finite field F2 = {0, 1}, but all the results here can be extended
to other finite fields or to integer homology.

2 Homological Information on Cell Complexes


We deal with here the homology problem for finite cell complexes. Throughout
the paper, we consider that the ground ring is the finite field F2 = {0, 1}. Let
K be a three-dimensional cell complex. A q–chain a is a formal sum of simplices
of K (q) (q = 0, 1, 2, 3). We denote σ ∈ a if σ ∈ K (q) is a summand of a. The
q–chains form a group with respect to the component–wise addition; this group
is the qth chain complex of K, denoted by Cq (K). There is a chain group for
every integer q ≥ 0, but for a complex in R3 , only the ones for 0 ≤ q ≤ 3 may
be non–trivial. The boundary map ∂q : Cq (K) → Cq−1 (K) applied to a q–cell
σ gives us the collection of all its (q − 1)–faces which is a (q − 1)–chain. By
linearity, the boundary operator ∂q can be extended to q–chains. In the concrete
case of a simplicial complex, the boundary of aq-simplex defined in terms of
vertices σ = v0 , . . . , vq  is defined by: ∂q (σ) = v0 , . . . , v̂i , . . . , vq , where the
hat means that vertex vi is omitted. In our case, taking into account that the
3-cells of our cell complexes can automatically be subdivided into tetrahedra, its
boundary map can directly be derived from that of the component tetrahedra.
It is clear that ∂q−1 ∂q = 0. From now on, a cell complex will be denoted by
(K, ∂), being ∂ : C(K) → C(K) its boundary map. A chain a ∈ Cq (K) is called
a q–cycle if ∂q(a) = 0. If a = ∂q+1(a′) for some a′ ∈ Cq+1(K), then a is
called a q–boundary. Define the qth homology group to be the quotient group of
q–cycles modulo q–boundaries, denoted by Hq(K). The homology class of a chain
a ∈ Cq(K) is denoted by [a]. It is clear that the Homology Problem for cell
complexes (K, ∂) can be reduced to solving, up to boundary, the equation ∂ = 0.
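Over F2 a chain is just a set of cells and addition is symmetric difference, so the identity ∂q−1∂q = 0 can be checked mechanically. A small illustrative sketch for simplices (our own encoding, not code from the paper):

```python
from itertools import combinations

def boundary(chain):
    """Boundary over F2 of a chain given as a set of simplices
    (each simplex a sorted tuple of vertices); faces met twice cancel."""
    out = set()
    for simplex in chain:
        if len(simplex) == 1:
            continue                 # vertices have empty boundary
        for face in combinations(simplex, len(simplex) - 1):
            out ^= {face}            # mod-2 addition
    return out

tet = {(0, 1, 2, 3)}                 # a single 3-simplex
faces = boundary(tet)                # its four triangular faces
print(sorted(faces))                 # [(0,1,2), (0,1,3), (0,2,3), (1,2,3)]
print(boundary(faces))               # boundary of a boundary: set()
```

Each edge of the tetrahedron lies in exactly two of the four triangles, so the second boundary cancels to the empty chain, which is ∂∂ = 0 in this encoding.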
Two main approaches can be used:
The differential approach. Classically, in Algebraic Topology, this question
has mainly been understood in terms of obtaining the different equivalence
classes (H0(K), H1(K), H2(K)). In an informal way, the homology groups
describe algebraically the maximal sets of disjoint cycles such that
two cycles belonging to the same set can be deformed (using a boundary) into
each other. For a 3D object, the ranks of the free part of the groups H0(K),
H1(K) and H2(K), called Betti numbers, measure the corresponding number
of connected components, "holes" or "tunnels", and cavities of this object. The
homology groups are "computable" (up to isomorphism) global properties for
most object representation models; they are strongly linked to the object
structure (they do not depend on the particular subdivision you use); they
are free groups up to dimension three; and the main topological characteristics
used up to now in Digital Imagery (Euler characteristic and Betti
numbers) can directly be obtained from them. There are two main strategies for
computing homology groups of cell complexes: (a) the classical matrix "reduction
algorithm" [9], mainly based on the Smith normal form diagonalization of
the incidence matrices corresponding to the boundary map in each dimension;
(b) the incremental technique of Delfinado-Edelsbrunner [1], in which homology
is updated at each one-cell processing step, until the object is completely covered.
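For strategy (a), working over F2 turns Smith normal form into plain Gaussian elimination, and βq = dim Cq − rank ∂q − rank ∂q+1. A hedged sketch with the rows of the incidence matrix packed as Python integers (the hollow-triangle example and all names are ours):

```python
def rank_f2(rows):
    """Rank over F2; each row of the incidence matrix is an int bit mask."""
    pivots = {}                      # leading-bit position -> reduced row
    for row in rows:
        cur = row
        while cur:
            lead = cur.bit_length() - 1
            if lead not in pivots:
                pivots[lead] = cur
                break
            cur ^= pivots[lead]      # eliminate the leading bit
    return len(pivots)

# Hollow triangle: 3 vertices (bits 0..2), 3 edges, no 2-cells.
d1 = [0b011, 0b110, 0b101]           # rows = edges, set bits = endpoints
betti0 = 3 - rank_f2(d1)             # dim C_0 - rank d_1  (d_0 = 0)
betti1 = 3 - rank_f2(d1) - 0         # dim C_1 - rank d_1 - rank d_2
print(betti0, betti1)                # 1 1
```

The third edge row is the XOR of the first two, so rank ∂1 = 2, giving one connected component and one "tunnel", as expected for a hollow triangle.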
The integral approach. The solution to the Homology Problem can also be
described in the following terms: to find a concrete map φ : C∗(K) → C∗+1(K),
increasing the dimension by one and satisfying φφ = 0, φ∂φ = φ and
∂φ∂ = ∂. In [4], a map φ of this kind has been called a homology gradient
vector field. This datum φ is, in fact, a chain homotopy operator on K (a purely
homological algebra notion), and it is immediate to establish a strong algebraic
link between the cell complex K and its homology groups (H0(K),
H1(K), H2(K)), such that it is possible to "reconstruct" the object starting
from its homology. For example, we need to specify a homological integral
operator in order to homologically classify any cycle or to compute cohomology
ring numbers. Algorithms using this integral approach can be classified into
one of two main groups: (a) starting from a zero integral operator, the
idea is to save additional algebraic information for constructing a homology
gradient vector field φ (a cost negligible in time but not in space) during the
execution of the previous homology computation algorithms (matrix and
incremental); (b) processes generating first a non-zero initial gradient vector
field φ0 (using, for example, Discrete Morse Theory techniques via Morse
functions), constructing a reduced cell complex K′ resulting from the application
of the deformation φ0, and finally applying algorithms of kind (a) to K′.
Let us emphasize that this description of "homology" as a pure algebraic
deformation process is classical and comes from the Eilenberg-MacLane work on
Algebraic Topology in the sixties of the last century. Nevertheless, its use in the
context of Digital Imagery is relatively recent [4,5,10].
Summing up, the differential approach can be seen as a sort of minimal (and
classical) solution, in the sense that only the final result is considered, while the
integral approach is a "maximal" solution in which the whole homological
deformation process is codified in an efficient way.
We are now ready to define homological information for an object K:
any feature or characteristic extracted in a straightforward manner from a (not
necessarily homological) gradient vector field for K. In that way, homological
information includes not only Euler characteristic, Betti numbers, topological
skeletons, Reeb graphs, representative cycles of homology generators and relative
homology groups, but also the homological classification of cycles, homology and
cohomology operations, the cohomology ring, and induced homomorphisms in
homology.
Our choice, within the context of Digital Imagery, between the differential and
the integral approach to the Homology Problem will mainly depend on the
concrete application we are involved in, and can be "modulated" (from
minimal-differential to maximal-integral) mainly in terms of the input, the
output and the homological elementary process for gradually constructing a
homology gradient vector field on a cell complex.
In order to make this precise, the following definitions are needed.
Definition 1. [3] Let (K, d) be a finite cell complex. A linear map of chains
φ : C∗(K) → C∗+1(K) is a combinatorial gradient vector field (or, shortly,
combinatorial gvf) on K if the following conditions hold: (1) for any cell a ∈ Kq,
φ(a) is a (q + 1)-cell b; (2) φ2 = 0.
If we remove the first condition, then φ is called an algebraic gradient vector
field. If φ is a combinatorial gvf which is non-null only for a unique cell a ∈ Kq
and satisfies the extra condition φdφ = φ, then it is called a (combinatorial)
integral operator [10]. An algebraic gvf satisfying the condition φdφ = φ is called
an algebraic integral operator. An algebraic gvf satisfying the conditions φdφ = φ
and dφd = d will be called a homology gvf [4]. A gvf is called strongly nilpotent if
it satisfies the following property: given any u ∈ Kq with φ(u) = Σ_{i=1}^{r} vi,
then φ(vi) = 0, ∀i. We say that a linear map f : C∗(K) → C∗(K) is strongly
null over an algebraic gradient vector field φ if, given any u ∈ Kq with
φ(u) = Σ_{i=1}^{r} vi, then f(vi) = 0, ∀i.
Using homological algebra arguments, it is possible to deduce that a homology
gvf φ determines a strong algebraic relationship connecting C(K) and
its homology vector space H(K). Let us define a chain contraction (f, g, φ) :
(C, ∂) ⇒ (C′, ∂′) between two chain complexes as a triple of linear maps
f : C∗ → C′∗, g : C′∗ → C∗ and φ : C∗ → C∗+1 satisfying the following
conditions: (a) idC − gf = ∂φ + φ∂; (b) fg = idC′; (c) fφ = 0; (d) φg = 0;
(e) φφ = 0.
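Conditions (a)–(e) can be verified mechanically on a toy example. Below, a sketch over F2 for the contraction of an interval (vertices v0, v1 and edge e) onto the point v0; the maps f, g, φ are the obvious ones, and the encoding (chains as sets, linear maps as dicts) is ours:

```python
def apply(table, chain):
    """Extend a generator->chain table linearly over F2 (chains are sets)."""
    out = set()
    for c in chain:
        out ^= table.get(c, set())
    return out

# Interval complex: C = {v0, v1, e}; contraction onto C' = {v0}.
d   = {'e': {'v0', 'v1'}}                      # boundary map
f   = {'v0': {'v0'}, 'v1': {'v0'}}             # projection (f(e) = 0)
g   = {'v0': {'v0'}}                           # inclusion
phi = {'v1': {'e'}}                            # chain homotopy

cells = ['v0', 'v1', 'e']
# (a) id - gf = d.phi + phi.d   (signs are irrelevant over F2)
lhs = {c: {c} ^ apply(g, apply(f, {c})) for c in cells}
rhs = {c: apply(d, apply(phi, {c})) ^ apply(phi, apply(d, {c})) for c in cells}
print(lhs == rhs)                              # True
print(apply(f, apply(g, {'v0'})) == {'v0'})    # (b) fg = id on C'
print(apply(f, apply(phi, {'v1'})) == set())   # (c) f.phi = 0
print(apply(phi, apply(g, {'v0'})) == set())   # (d) phi.g = 0
print(apply(phi, apply(phi, {'v1'})) == set()) # (e) phi.phi = 0
```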
Proposition 1. Let (K, ∂) be a finite cell complex. A homology gvf φ : C∗(K) →
C∗+1(K) gives rise to a chain contraction (π, incl, φ) from C(K) onto a chain
subcomplex of it isomorphic to the homology of K. Reciprocally, given a chain
contraction (f, g, φ) from C(K) to its homology H(K), then φ is a homology gvf.
Let π = idC(K) − ∂φ − φ∂ and let incl : Im π → C(K) be the inclusion map.
This chain map π describes for each cell a representative cycle of the homology
class associated to this cell and satisfies π2 = π. With Im π = {x ∈ C(K) such
that x = π(y) for some y} and Ker π = {x ∈ C(K) such that π(x) = 0}, we have
C(K) = Im π ⊕ Ker π. Let f : C(K) → Im(π) be the corestriction of π to Im(π)
(that is, π : C(K) → Im(π)) and g : Im(π) → C(K) be the inclusion. Let d̃ be the
boundary operator of Im(π). We now prove that d̃ = 0. Taking into account that
idC(K) + gf = φ∂ + ∂φ (over F2), ∂∂ = 0 and ∂φ∂ = ∂, we then obtain
∂ − ∂gf = ∂. Therefore, ∂gf = g d̃ f = 0. Since f is onto and g is one-to-one,
we deduce that d̃ = 0. That means that the Morse complex Mφ = Im π is a
graded vector space with null boundary operator, isomorphic to the homology H(K).
The homology computation process we apply in this paper is the one given in [4],
in which the incremental homology algorithm of [1] is adapted to obtain a
homology gradient vector field.
Given a cell complex (K, ∂), an ordered set of cells K = <c1, . . . , cm> is a filter
if, whenever ci is a face of cj, then i < j. It is possible to "filter" K by first
considering all the 0-cells in a certain order, then an order on all the 1-cells,
and so on.
Algorithm 1. Let (K, ∂) be a filtered finite cell complex with filter Km = <c0, . . . ,
cm>. We represent the cell complex K up to filter level i by Ki = <c0, . . . , ci>, with
boundary map ∂i. Let Hi be the homology chain complex (with zero boundary map)
associated to Ki.
H0 := {c0}, φ0(c0) := 0, π0(c0) := c0.
For i = 1 to m do
   πi(ci) := c′i = ci + φi−1∂i(ci),
   Hi := Hi−1 ∪ {ci}, φi(ci) := 0.
   If (∂i + ∂i−1φi−1∂i)(ci) = 0, then
      For j = 0 to i − 1 do
         φi(cj) := φi−1(cj).
   If (∂i + ∂i−1φi−1∂i)(ci) is a sum of the kind Σ_{j=1}^{r} πi−1(esj) = Σ_{j=1}^{r} uj ≠ 0
   (uj ∈ Hi−1), then:
      let us choose a summand uk and define φ̃(uk) := ci
      and φ̃ := 0 for the rest of the elements of Hi−1;
      then For j = 0 to i − 1 do
         φi(cj) := (φi−1 + φ̃(1Ki + φi−1∂i−1 + ∂i−1φi−1))(cj),
         πi(cj) := [1Ki + φi∂i + ∂iφi](cj),
      Hi := Hi \ {uk, ci}.
Output: a homology gradient vector field φm for K.
Sketch of the proof
It can be proved by induction on i that φm is a homology gvf and, in consequence,
that it naturally produces a chain contraction (πm, incl, φm) from C(K) to its
homology H(K). The number of elementary operations involved in this process
is O(m3).
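The reduction idea behind Algorithm 1 can be condensed into a short sketch over F2 for a simplicial filter. To keep it brief we track only the surviving generators Hi and the projection π of each cell onto them, omitting the φ bookkeeping of the full algorithm; all names and the deterministic "kill the youngest generator" choice are ours:

```python
from itertools import combinations

def bd(cell):
    """Boundary of one simplex (a sorted vertex tuple) over F2."""
    return set(combinations(cell, len(cell) - 1)) if len(cell) > 1 else set()

def incremental_homology(filtration):
    """Surviving homology generators over F2, one cell per homology class."""
    H, pi = set(), {}                # generators; cell -> chain of generators
    for c in filtration:
        z = set()
        for face in bd(c):
            z ^= pi[face]            # project the boundary of c onto H
        if not z:
            H.add(c)                 # c creates a new homology class
            pi[c] = {c}
        else:
            u = max(z, key=filtration.index)   # c kills one generator of z
            H.discard(u)
            pi[c] = set()
            for x in pi:             # u disappears from every projection
                if u in pi[x]:
                    pi[x] = pi[x] ^ z
    return H

# Filter of a filled triangle, faces before cofaces.
filt = [(0,), (1,), (2,), (0, 1), (0, 2), (1, 2), (0, 1, 2)]
print(sorted(incremental_homology(filt)))        # [(0,)]
print(sorted(incremental_homology(filt[:-1])))   # hollow: [(0,), (1, 2)]
```

Dropping the last 2-cell leaves the 1-cycle generator alive, matching the Betti numbers (1, 1) of the hollow triangle versus (1, 0) of the filled one.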
Fig. 2. A 3D digital object V, a simplicial continuous analogue K(V), and a
homology gradient vector field φ on K(V) using the filter {1, 2, 3, 4, . . .}. For
example, φ(5) = (11) + (10) + (9) + (8), φ(14) = 0, φ(15) = (16).

Fig. 3. Combinatorial gvf (a) and algebraic gvf (b)

Moreover, it is not difficult to prove that the resulting homology gvf φm of
Algorithm 1 is a strongly nilpotent algebraic gvf and that πm is a strongly null
map over φm.
Using the Discrete Morse Theory pictorial language, combinatorial gvfs can be
described in terms of directed graphs on the cell complex. For example, let us
take an integral operator φ such that φ(a) = c, with a ∈ K0 and a and b
the vertices of the 1-cell c. It is clear that φ can be represented by a directed
tree consisting of the edge c together with its vertices, such that the arrow on
c goes out from vertex a. Of course, the previous properties of a homology gvf
φi : Ci(K) → Ci+1(K) (i = 0, 1, 2) help us to suitably express all the φi in terms
of graphs.
Proposition 2. Let φ : C(K) → C(K) be a homology gvf for a cell complex
(K, ∂), and denote by H∂(K) and Hφ(K) the homology groups of K taking
respectively ∂ and φ as boundary maps on K (both satisfy the 2-nilpotency
condition). Then H∂(K) and Hφ(K) are isomorphic. The maps h : H∂(K) →
Hφ(K) defined by h([c]∂) = [c + ∂φ(c)]φ and k : Hφ(K) → H∂(K) defined by
k([c]φ) = [c + φ∂(c)]∂ specify this isomorphism.

3 Polyhedral AT-Model for a Digital Volume

Let V be a binary 26-adjacency voxel-based digital volume. A cell AT-model for
V is a pair ((P(V), ∂), φ), such that (P(V), ∂) is a polyhedral cell complex (for
example, that specified in [4]) homologically equivalent to V, and φ : C(P(V)) →
C(P(V)) is a homology gvf for P(V). To obtain the cell complex P(V) we proceed
as follows. Each black voxel can be seen as a point (0-cell) of our complex. The
algorithm consists of dividing the volume into overlapping unit cubes formed by
eight voxels mutually 26-adjacent (the intersection of two consecutive cubes being
a "square" of four voxels mutually 26-adjacent), and of associating each unit cube
configuration with its corresponding cell. We scan the complete volume, always
taking a unit cube as the elementary step.
The cell associated to a unit cube configuration is a 0-cell if there is a single
point. If there are two points, the cell is a 1-cell, namely the edge connecting
both of them. With three or four coplanar points in the set, the associated
2-cell is a polygon. If there are four non-coplanar points or more, the 3-cell
is a polyhedron. In other words, the cell associated to a unit cube configuration
is just the convex hull of the black points together with all its lower-dimensional
faces. Note that for 3-cells, their 2-dimensional faces are either triangles or squares.
Once we have covered the whole volume and joined all the cells, we obtain the
complete cell complex without incoherences.
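The dimension of the cell attached to a configuration is thus the affine rank of its black points. A small illustration under our own encoding (voxels of the unit cube as 0/1 coordinate triples; the function name is ours):

```python
from fractions import Fraction

def cell_dimension(points):
    """Dimension of the convex hull of the black points of a configuration:
    0 for one point, 1 for two, 2 if coplanar, otherwise 3."""
    pts = [tuple(p) for p in points]
    if not pts:
        return -1                    # empty configuration: no cell
    p0 = pts[0]
    vecs = [tuple(a - b for a, b in zip(p, p0)) for p in pts[1:]]
    # Gaussian elimination over the rationals (exact for 0/1 coordinates)
    rows = [[Fraction(x) for x in v] for v in vecs]
    rank = 0
    for col in range(3):
        piv = next((r for r in range(rank, len(rows)) if rows[r][col]), None)
        if piv is None:
            continue
        rows[rank], rows[piv] = rows[piv], rows[rank]
        for r in range(len(rows)):
            if r != rank and rows[r][col]:
                fac = rows[r][col] / rows[rank][col]
                rows[r] = [a - fac * b for a, b in zip(rows[r], rows[rank])]
        rank += 1
    return rank

face = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 0)]    # 4 coplanar corners
corner = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (0, 0, 1)]  # 4 non-coplanar corners
print(cell_dimension(face), cell_dimension(corner))    # 2 3
```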
The idea here is to design an incremental algorithm for computing the homology
of P(V), taking into account the contractibility of the cells (that is, the
fact that they are homologically equivalent to a point). First of all, we develop
a method for determining a homology gvf for any cell or polyhedral block R of
P(V) installed in a 2 × 2 × 2 unit cube Q, which also provides an alternative
method for constructing P(V).
Let us start by describing the contractibility of a unit cube Q by a particular
homology gvf. Figure 4 visualizes this vector field φQ : C(Q) → C(Q) by
colored arrows. For example, φQ(<3, 4, 5, 6>) = <1, 2, 3, 4, 5, 6, 7, 8>

Fig. 4. A unit cube with labeled vertices (a) and arrows describing the contractibility
of the cube (b)
Fig. 5. The maximal cell R (a) and its corresponding homology gvf (b)

(shown in yellow), φQ(<5, 6>) = <5, 6, 7, 8> + <1, 2, 7, 8> (shown in green)
and φQ(<6>) = <1, 2> + <2, 7> + <6, 7> (shown in red). Obviously, the
boundary map ∂Q : C(Q) → C(Q) is defined in a canonical way (no problems
here with the orientation of the cells, since we work over F2). For
instance, ∂Q(<1, 2, 3, 4>) = <1, 2> + <2, 3> + <3, 4> + <4, 1> and
∂Q(<1, 8>) = <1> + <8>.
Now, an alternative technique to the modified Kenmochi et al. method [8] for
constructing P(V) is sketched here. In order to determine a concrete polyhedral
configuration R as well as a concrete homology gvf for it (determining its
boundary map is straightforward over F2), we use a homological algebra strategy
which amounts to taking advantage of the contractibility of Q for creating a
homology gvf for R, by means of integral operators acting on Q. To avoid
overburdening the notation, we only develop the method in one concrete case.
First, let us take the convex hull of the eight black points shown in figure 4.
Applying the integral operator given by ψ(<8>) = <1, 8>, the final result
R and its homology gvf appear in figure 5. The face <1, 5, 6, 7> needs to be
subdivided into two triangular faces, <1, 5, 7> and <1, 6, 7>, to get the
configuration R′. For connecting R and R′, we apply to R the integral operator
given by the formula ψ(<5, 7>) = <1, 5, 7>.
In fact, all this homology gvfs are obtained by transferring the homology gvf
of Q via chain homotopy equivalences.
All these techniques are valid for any finite field or integer coefficients, and
additional difficulties about orientation of the cells can be easily overcome.
We are now able for designing an incremental algorithm for computing the
homology of V via the cell complex P (V ), based on the reiterated use of homo-
logy gvfs for polyhedral cells inscribed in the unit cube Q, we face to the problem
of computing the homology of an union of a polyhedral cell complex P (V  ) and
a polyhedral cell R.
Definition 2. Let (K, ∂) be a finite cell complex with n cells and let
φ1, φ2, . . . , φr be a sequence of integral operators φi : C∗(K) → C∗+1(K), each
involving two cells {ci1, ci2} of different dimensions and such that
{ci1, ci2} ∩ {ck1, ck2} = ∅, ∀1 ≤ i ≠ k ≤ r. Then, an algebraic gvf Σ_{i=1}^{r} φi for
C(K) onto a chain subcomplex having n − 2r cells can be constructed. The sum
Σ_{i=1}^{r} φi applied to a cell u is ck2 if u = ck1 (k = 1, . . . , r) and zero elsewhere.
Fig. 6. An example showing the representative generator of the 1-cycle (in blue) and
the resulting Φ and ϕ. Notice that Φ(<3, 6>) = 0 and <3, 6> ∉ Im(Φ) (<3, 6> is
a critical simplex in terms of Discrete Morse Theory).

Fig. 7. An example showing the filling of the “hole” and an attachment of a 2-cell

In general, Φ = Σ_{i=1}^{r} φi does not satisfy the condition ΦdΦ = Φ. Applying
Algorithm 1 to (K, ∂) (previously filtered), with a partial filtering affecting only
the cells cij (1 ≤ i ≤ r and j = 1, 2) and their sub-cells, and specifying at each
cell-step concerning the cell ci2 that φ̃(fi(ci1)) := ci2, the final result will be a
(not necessarily homological) algebraic integral operator ϕ : C(K) → C(K).
Applying Proposition 1 to the algebraic integral operator ϕ and assuming that
K has n cells, we obtain a chain contraction (f, g, ϕ) from C(K) to a chain
subcomplex C(M(K)), where M(K) (also called the Morse complex of K
associated to the sequence {φi}1≤i≤r) has n − 2r cells. Algorithm 1 applied to
M(K) gives us a homology gvf φ for M(K). Finally, the map ϕ + φ(1 − dϕ − ϕd)
gives us a homology gvf for the cell complex K.
Cell AT-Models for Digital Volumes 323

Using these arguments, it is straightforward to design an algorithmic process
of homology computation (over F2) for a binary 26-adjacency voxel-based
digital volume V, based on the contractibility of the maximal cells (in terms
of a homology gvf) constituting the continuous analogue P(V). All is reduced
to finding a sequence of elementary integral operators acting as internal topological
thinning operators on P(V). Our candidates are the arrows describing the
contractibility of all the maximal polyhedral cell configurations forming the
objects. In order to suitably choose these integral operators, we use a maximal cell
incremental technique.

From Random to Hierarchical Data through an
Irregular Pyramidal Structure

Rimon Elias, Mohab Al Ashraf, and Omar Aly

Faculty of Digital Media Engineering and Technology


German University in Cairo
New Cairo City, Egypt
rimon.elias@guc.edu.eg

Abstract. This paper proposes to transform data scanned randomly
in a well-defined space (e.g., Euclidean) along a hierarchical irregular
pyramidal structure in an attempt to reduce the search time consumed
querying these random data. Such a structure is built as a series of graphs
with different resolutions. Levels are constructed and surviving cells are
chosen following irregular pyramidal rules and according to a proximity
criterion among the space points under consideration. Experimental
results show that using such a structure to query data can save
considerable search time.

Keywords: Irregular pyramids, hierarchical structure, point clustering,
hierarchical visualization, multiresolution visualization.

1 Introduction
Sometimes large sets of data need to be searched with respect to a specific
query point. Many data items in these sets could be excluded from the
search because they are far from the query point. However, if the data items are
not clustered, there will be no way but to check each item, a process that can be
time consuming. If the data items are clustered or categorized into a hierarchy,
search time can be enhanced considerably. However, if we structure the data in a hierarchy,
visualizing such a hierarchy may be a challenge. Different techniques have been
developed over the last years to help humans grasp the structure of a hierarchy in
a visual form (e.g., treemaps [19], information slices [1] and sunburst [20]). Those
techniques can be categorized under different sets depending on the nature of
data visualized and the way the data are visualized.
This paper presents a technique based on irregular pyramidal rules to cluster
data points in an aim to reduce time consumed in the search process.
The paper is organized as follows. Sec. 2 presents the concepts of pyrami-
dal architecture and multiresolution structures. Sec. 3 surveys different visual-
ization techniques that have been developed under different categories. Sec. 4
presents our algorithm that depends on a hierarchical structure to cluster the
data. Finally, Sec. 5 presents some experimental results while Sec. 6 derives some
conclusions.

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 324–333, 2009.

© Springer-Verlag Berlin Heidelberg 2009

2 Pyramidal Architecture
Hierarchical or multiresolution processing through pyramidal structures is a well-
known topic in image analysis. The main aim of such a concept is to reduce the
amount of information to be manipulated in order to speed up the whole process.
Over recent decades, many hierarchical or pyramidal structures have
been developed to solve various problems that process images in general (e.g.,
segmenting an image according to its different gray levels). Such pyramidal
structures can be categorized into two main subsets: regular and irregular
pyramids. The classification into regular and irregular depends on whether
a parent in the hierarchy has a constant number of children, building a regular
structure, or a varying number of children, building an irregular structure.
Regular pyramids include, among others, bin pyramid [9] in which a parent has
exactly two children; quad pyramid [9] where a parent has four children (Fig. 1);
hexagonal pyramid [7] that uses a triangular tessellation and in which a parent
has four children; dual pyramid [15] with levels rotated 45◦ alternatively.
In the category of irregular pyramids, the number of children per parent varies
according to the information processed and the operation under consideration.
Hence, the number of surviving nodes, cells or pixels may change from one situa-
tion to another according to the data processed. In order to accommodate this:
• A level should be represented as a graph data structure; and
• Some rules must be utilized in order to govern the process. In the adaptive
pyramid [8] and the disparity pyramid [6], the decimation process; i.e., the
process by which the surviving cells are chosen, can be controlled by two
rules:
1. Two neighbors cannot survive together to the higher level; and
2. For each non-surviving cell, there is at least one surviving cell in its
neighborhood.
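The two decimation rules above can be sketched as a validity check on a neighborhood graph (our own code, not from the cited papers):

```python
# Illustrative sketch (our own code, not from the cited papers): check the two
# decimation rules, given a neighborhood graph and a candidate survivor set.

def valid_decimation(neighbors, survivors):
    """neighbors: dict cell -> set of adjacent cells; survivors: set of cells."""
    for cell, adjacent in neighbors.items():
        if cell in survivors:
            # Rule 1: two neighbors cannot survive together.
            if adjacent & survivors:
                return False
        elif not (adjacent & survivors):
            # Rule 2: every non-survivor needs a surviving neighbor.
            return False
    return True

# A 4-cycle a-b-c-d: opposite cells may survive together, adjacent ones may not.
g = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(valid_decimation(g, {"a", "c"}))  # True
print(valid_decimation(g, {"a", "b"}))  # False (rule 1 violated)
```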
It is worth mentioning that all the above pyramids work on images. However, we
may apply pyramidal rules to space points in order to cluster them according to

the proximity among each other. Hence, flat and random data with no apparent
hierarchical nature can be categorized into a hierarchy.
The next section specifies the steps of the algorithm we propose in order to
cluster the data points and visualize them as a hierarchy using a query-dependent
pixel-oriented technique.

Fig. 1. An example of a quad pyramid

3 Visualization Techniques
In addition to the irregular pyramid concept mentioned above, we need to in-
vestigate some visualization concepts. These are query-dependent versus query-
independent techniques in addition to different techniques to visualize hierarchies.

3.1 Query-Dependency
Visualization techniques can be categorized into query-dependent and query-
independent subsets. The query-dependent techniques refer to visualizing the
arranged data according to some attribute. The user may input a query point to
compare with the other data items. The differences can be calculated, arranged
in order and visualized as colored pixels. Spiral and axes techniques and their
variations [11,10,12] are examples that can be used in this case. The query-
independent techniques do not require the user to input a query point to visualize
data with respect to that point; instead, the data are visualized with no apparent
order if data items are not sorted originally.

3.2 Visualizing Hierarchies


If data are arranged in a hierarchical order, the visualization problem can be
re-formulated so as to visualize the hierarchical structure (i.e., a tree in general).
It becomes more difficult if such trees grow in width or depth. A further challenge is
imposed if interaction is to be added for the user to browse or focus on a subtree.
Many algorithms have been developed in this area, such as SpaceTree [16], Cheops [3],
cone trees [18], InfoTV [5] and InfoCube [17]. Treemaps [19] can also be used
to visualize hierarchies. The idea of a treemap is to split the space into regions
according to the number of branches as well as the size of the hierarchy. Versions
of treemaps are clustered treemaps [21], Voronoi treemaps [2] and 3D Treemaps
[4]. Circular visualization techniques can also be used to view hierarchies as in
information slices [1], Sunburst [20] and InterRing [22].
Note that visualizing a hierarchy as a set of levels where each level is repre-
sented as a graph consisting of a number of nodes and edges is another challenge.
Examples of techniques tackling this problem can be found in [14,13] where the
graph nodes are visualized as colored spheres while the edges are shown as thin
cylinders; each connecting two spheres. Although the hierarchical structure that
we are suggesting in this paper is built as an irregular pyramid with levels rep-
resented as graphs comprising nodes and edges, visualizing this structure is not
our target. Instead, we aim to convert the flat data into hierarchical data in
order to speed up the process of querying the data.

4 Algorithm
Our algorithm can be split into two main phases to:
1. Build the hierarchy through data clustering using irregular pyramidal tech-
nique.
2. Visualize the established hierarchical data with respect to a query point.
As in other irregular pyramids, each level of the structure is represented as a
graph. At the lowest level (i.e., the base), the graph consists of a number of
cluster cells (or nodes) where each node is linked to every other node and where
every node contains only one space point. At the upper levels, a cluster node
may contain more points while the number of clusters at that level is reduced
compared to its predecessor.
As mentioned in Sec. 2, some rules must exist in order to control the decima-
tion process of choosing the surviving cells and how cells at different levels are
linked together. The rules used in this structure are:
1. Two neighbors may both survive at the next level if and only if some binary
variable is set to zero during the decimation process. Such a rule is different
from the case of the adaptive and disparity pyramids [8,6]; and
2. For each nonsurviving node, there exists at least one surviving node in its
neighborhood. Such a rule is true in case of the adaptive and disparity
pyramids.
Suppose that the set of clusters at a given level i is L(i) = {C(i,1) , C(i,2) , ..., C(i,n) }
where n is the number of clusters at this level; and C(i,j) is a cluster consisting of
a number of space points (where j ∈ {1, ..., n}). Also, we can define a cluster as
C(i,j) = {p(i,j,1), p(i,j,2), ..., p(i,j,m)} where j ∈ {1, ..., n} is the cluster number;
m is the number of points in the cluster; and p is a vector whose length depends on
the dimension of the space.
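The notation above can be mirrored in a minimal data structure (a sketch with our own naming; the paper does not prescribe an implementation):

```python
# A minimal sketch of the notation above (our own naming): a level L(i) is a
# list of clusters, and a cluster C(i,j) holds a list of D-dimensional points
# plus references to its children at the level below.
from dataclasses import dataclass, field

@dataclass
class Cluster:
    points: list                                   # D-dimensional tuples
    children: list = field(default_factory=list)   # clusters at level i - 1

    def mean(self):
        """Per-coordinate mean of the contained points."""
        n, dim = len(self.points), len(self.points[0])
        return tuple(sum(p[d] for p in self.points) / n for d in range(dim))

# Base level: one point per cluster, so #clusters == #points.
base = [Cluster([p]) for p in [(0, 0), (1, 1), (10, 10)]]
print(len(base), base[0].mean())  # 3 (0.0, 0.0)
```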
A binary variable q is reset to 0 for every two clusters, C(i,j) and C(i,k), at level
L(i). The following Euclidean distance is calculated among the points contained
in these clusters:

  d(p(i,j,a), p(i,k,b)) = ||p(i,j,a) − p(i,k,b)|| = ( Σ_{d=1..D} (p(i,j,a,d) − p(i,k,b,d))² )^(1/2)   (1)

where i is the level number; j and k are the cluster numbers; a and b are the
point numbers; D is the dimension of the space; and ||·|| represents the norm of
the difference between the two vectors. The Manhattan metric may be used
instead for faster results:

  d(p(i,j,a), p(i,k,b)) = Σ_{d=1..D} |p(i,j,a,d) − p(i,k,b,d)|   (2)
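Both metrics are direct to implement; a sketch:

```python
# A direct sketch of the two metrics in Eqs. (1) and (2).
import math

def euclidean(p, q):
    """Eq. (1): L2 distance between two D-dimensional points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def manhattan(p, q):
    """Eq. (2): L1 distance; no squares or square root, hence faster."""
    return sum(abs(a - b) for a, b in zip(p, q))

p, q = (1, 2, 3, 4, 5), (2, 2, 3, 4, 9)   # 5D points, as in the experiments
print(manhattan(p, q))   # 5  (= 1 + 4)
print(euclidean(p, q))   # sqrt(17), about 4.123
```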
 
The value of the distance d(p(i,j,a), p(i,k,b)) is compared against a threshold
t supplied as a parameter to the algorithm. If the test d(p(i,j,a), p(i,k,b)) < t
results in a true condition, the search is broken immediately for the current
clusters and the variable q is set to 1; otherwise, q remains 0. Thus, different
situations arise with respect to the value of q and whether or not the parents
C(i+1,j) and C(i+1,k) of clusters C(i,j) and C(i,k) exist. Those can be
summarized as listed in Table 1.

Table 1. Different creation and linking possibilities

q   C(i+1,j) exists?   C(i+1,k) exists?   Action
1   Yes                No                 Link C(i,k) to C(i+1,j)
1   No                 No                 Create a new C(i+1,j) and link C(i,k) & C(i,j) to C(i+1,j)
1   Yes                Yes                Delete C(i+1,j) and link C(i,j) to C(i+1,k)
0   Yes                No                 Create a new C(i+1,k) and link C(i,k) to C(i+1,k)
0   No                 No                 Create a new C(i+1,k) and link C(i,k) to C(i+1,k);
                                          create a new C(i+1,j) and link C(i,j) to C(i+1,j)
0   Yes                Yes                Take no action
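The actions of Table 1 can be sketched as follows (our own encoding; `link` and `new_parent` are hypothetical names, and removing an orphaned parent from level i + 1 is only noted in a comment):

```python
# A sketch of the Table 1 actions (our own encoding, hypothetical names).
# A cluster is a dict whose "parent" entry points to its cluster at
# level i + 1, or None if no parent has been created yet.

def link(ci_j, ci_k, q, new_parent):
    """Apply the Table 1 action for clusters C(i,j), C(i,k) given flag q."""
    pj, pk = ci_j["parent"], ci_k["parent"]
    if q == 1:
        if pj is not None and pk is None:
            ci_k["parent"] = pj                  # link C(i,k) to C(i+1,j)
        elif pj is None and pk is None:
            p = new_parent()                     # one shared new parent
            ci_j["parent"] = ci_k["parent"] = p
        elif pj is not None and pk is not None and pj is not pk:
            ci_j["parent"] = pk                  # C(i+1,j) would also be
                                                 # deleted from level i + 1
    else:
        if pj is None:
            ci_j["parent"] = new_parent()        # separate new parents;
        if pk is None:
            ci_k["parent"] = new_parent()        # if both exist: no action

a, b = {"parent": None}, {"parent": None}
link(a, b, q=1, new_parent=dict)
print(a["parent"] is b["parent"])  # True: merged under one new parent
```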
The procedure explained above is repeated until all clusters are at distances
greater than the above threshold t from each other (similar to [8,6]). Note
that statistics such as the mean and the size of the clusters are updated at each level.
After storing the flat random data along a hierarchy, viewing parts of the
data relevant to a query point becomes easier. Spiral and axes techniques [11]
are applied to the hierarchical data. Clusters constituting each level are repre-
sented as pixels where each pixel has a color indicating the mean of all points
contained in the cluster. Interactivity is added as clicking on a pixel displays
the children underneath. A way of magnifying the results is also included in our
implementation.
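To see why the hierarchy speeds up querying, the following sketch (entirely our own code, not the authors' implementation) descends the pyramid and expands only the clusters whose mean is closest to the query point, pruning the rest:

```python
# Pruned hierarchical query (our own sketch): at every level, expand only the
# `budget` children whose mean is closest to the query point.

def query(cluster, point, dist, budget):
    """Best-first pruned search; returns the leaf points actually visited."""
    if not cluster["children"]:               # leaf: a single data point
        return [cluster["mean"]]
    ranked = sorted(cluster["children"], key=lambda c: dist(c["mean"], point))
    out = []
    for child in ranked[:budget]:
        out += query(child, point, dist, budget)
    return out

leaf = lambda p: {"mean": p, "children": []}
root = {"mean": (2, 2), "children": [
    {"mean": (1, 1), "children": [leaf((0, 0)), leaf((1, 1))]},
    {"mean": (9, 9), "children": [leaf((9, 9)), leaf((10, 10))]},
]}
manhattan = lambda p, q: sum(abs(a - b) for a, b in zip(p, q))
print(query(root, (0, 0), manhattan, budget=1))  # [(0, 0)]
```

With budget = 1, only one branch per level is inspected instead of every leaf; the saving grows with the reduction factor reported in Sec. 5.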

5 Experimental Results

We considered different factors while building the pyramid. Among these factors
are the number of data points to be clustered and the threshold used and their
impact on the number of levels and the number of clusters at the top level and
consequently on the reduction factor of clusters.
Ten files with sets ranging from 100 to 1000 5D points are used with a fixed
threshold t of 800 applied to the Manhattan metric. As expected, the number of
levels increases as the number of points increases for the same threshold. This
is shown in Fig. 2(a).
In our hierarchical structure, a cluster contains one data point at the lowest
level, which makes the number of clusters at this level equal to the number of
points. As we go up the hierarchy, the number of clusters gets smaller while
the number of points per cluster gets larger. For the ten files used before with
the same threshold t of 800, the greatest impact concerning the reduction of
the number of clusters with respect to the number of points happens at the
second level as shown in Fig. 2(b).

Fig. 2. (a) The number of levels of the hierarchical structure increases as the number
of points increases. (b) The number of clusters is reduced significantly at the second
level of the hierarchy.


Fig. 3. (a) Number of clusters at the top level of the hierarchy for different point sets
and reduction factor values associated with these sets. For all cases, a threshold value
of 800 is used. (b) The percentage of the number of clusters at the top levels to the
total number of points decreases as the number of points increases.

Fig. 4. The number of levels peaks at 4 before it decreases again to 3


Fig. 5. (a) Number of clusters at the top level of the hierarchy for different thresh-
old levels and reduction factor values associated with these threshold levels. (b) The
reduction factor increases as the threshold value increases.

For each data set where t = 800, the percentages of the number of clusters at
the top levels to the total number of points were measured. As expected from
Fig. 2(b), the percentage decreases as the number of points increases. Conse-
quently, the reduction factor increases as the number of points increases. This is
shown in Fig. 3.
In order to test the impact of the threshold value, one file containing 1000
points is used with threshold values ranging from 300 to 1300. In these cases, the
number of levels ranges from 2 to 4 according to the threshold value as shown
in Fig. 4.
It is logical that by increasing the threshold value, more points can be clustered
together and fewer clusters can be formed at the top level of the hierarchical
structure. As a consequence, the reduction factor should increase as the threshold
value increases. These results are shown in Fig. 5.


Fig. 6. Axes technique results for the same file after clustering. (a) Level 4, L(4), is
displayed with 243 points. (b) Level 3, L(3), showing the contents of one of the points
in the lower right quadrant in (a).


Fig. 7. (a), (b) Time consumed to perform both versions of the axes technique for
different sets of points.

In order to visualize the points along the hierarchy built as four levels for a
set of 1000 5D points with t= 800, we use both spiral and axes visualization
techniques. As shown in Fig. 6(a), we start by plotting the top level (L(4) ) of the
clustered hierarchy that contains only 243 points (as opposed to 1000 points in
the original list). A cluster at the top level is represented as a point with a color
indicating the mean of the points (or sub-clusters) contained in that cluster. The
user has the ability to select a particular cluster and view its inner cluster points
where each point can represent a cluster that can be viewed hierarchically and
so on. Fig. 6(b) shows the contents of L(3) after selecting a point in the lower
right quadrant in L(4) .
In order to show the effect of our approach, we measured the time consumed
when using the axes technique in both cases of random and hierarchical data
for different sets of points. This is shown in Fig. 7. Notice that the difference
between both versions gets larger with a larger number of points. This makes sense,
as the reduction factor gets larger with a larger number of points, as mentioned
previously (refer to Fig. 3(a)).
We measured the time consumed to display random and hierarchical data for
the same test set using the spiral technique and these were 76 msec and 16 msec
respectively with a computer running at 2.0 GHz.

6 Conclusions
An irregular pyramidal scheme is suggested to transform random data into a
hierarchy in an attempt to reduce the time consumed searching the whole data for
a particular query. Tests show reductions in the amount of data processed and,
consequently, in the time consumed.

References
1. Andrews, K., Heidegger, H.: Information slices: Visualising and exploring large
hierarchies using cascading, semi-circular discs. In: IEEE InfoVis 1998, pp. 9–12
(1998)
2. Balzer, M., Deussen, O., Lewerentz, C.: Voronoi treemaps for the visualization of
software metrics. In: Proc. ACM SoftVis 2005, New York, USA, pp. 165–172 (2005)
3. Beaudoin, L., Parent, M.-A., Vroomen, L.C.: Cheops: a compact explorer for com-
plex hierarchies. In: Proc. 7th conf. on Visualization (VIS 1996), Los Alamitos,
CA, USA, p. 87 (1996)
4. Bladh, T., Carr, D., Scholl, J.: Extending tree-maps to three dimensions: a compar-
ative study. In: Masoodian, M., Jones, S., Rogers, B. (eds.) APCHI 2004. LNCS,
vol. 3101, pp. 50–59. Springer, Heidelberg (2004)
5. Chignell, M.H., Poblete, F., Zuberec, S.: An exploration in the design space of
three dimensional hierarchies. In: Human Factors and Ergonomics Society Annual
Meeting Proc., pp. 333–337 (1993)
6. Elias, R., Laganiere, R.: The disparity pyramid: An irregular pyramid approach
for stereoscopic image analysis. In: VI 1999, Trois-Rivières, Canada, May 1999, pp.
352–359 (1999)
7. Hartman, N.P., Tanimoto, S.: A hexagonal pyramid data structure for image pro-
cessing. IEEE Trans. on Systems, Man and Cybernetics 14, 247–256 (1984)
8. Jolion, J.M., Montanvert, A.: The adaptive pyramid: A framework for 2D image
analysis. CVGIP: Image Understanding 55(3), 339–348 (1992)

9. Jolion, J.M., Rosenfeld, A.: A Pyramid Frame-work for Early Vision. Kluwer Aca-
demic Publishers, Dordrecht (1994)
10. Keim, D.A., Ankerst, M., Kriegel, H.-P.: Recursive pattern: A technique for visu-
alizing very large amounts of data. In: Proc. 6th VIS 1995, Washington, DC, USA,
pp. 279–286, 463 (1995)
11. Keim, D.A., Kriegel, H.-P.: VisDB: Database exploration using multidimensional
visualization. IEEE Computer Graphics and Applications (1994)
12. Keim, D.A., Kriegel, H.-P.: Visualization techniques for mining large databases: A
comparison. IEEE Trans. on Knowl. and Data Eng. 8(6), 923–938 (1996)
13. Kerren, A.: Explorative analysis of graph pyramids using interactive visualization
techniques. In: Proc. 5th IASTED VIIP 2005, Benidorm, Spain, pp. 685–690 (2005)
14. Kerren, A., Breier, F., Kügler, P.: DGCVis: An exploratory 3D visualization of graph
pyramids. In: Proc. 2nd CMV 2004, London, UK, pp. 73–83 (2004)
15. Kropatsch, W.G.: A pyramid that grows by powers of 2. Pattern Recognition Let-
ters 3, 315–322 (1985)
16. Plaisant, C., Grosjean, J., Bederson, B.B.: Spacetree: Supporting exploration in
large node link tree, design evolution and empirical evaluation. In: Proc. IEEE
InfoVis 2002, Washington, DC, USA, p. 57 (2002)
17. Rekimoto, J., Green, M.: The information cube: Using transparency in 3d infor-
mation visualization. In: Proc. 3rd WITS 1993, pp. 125–132 (1993)
18. Robertson, G.G., Mackinlay, J.D., Card, S.K.: Cone trees: animated 3d visualiza-
tions of hierarchical information. In: Proc. CHI 1991, New York, USA, pp. 189–194
(1991)
19. Shneiderman, B.: Tree visualization with tree-maps: 2-d space-filling approach.
ACM Trans. Graph. 11(1), 92–99 (1992)
20. Stasko, J.T., Zhang, E.: Focus+context display and navigation techniques for en-
hancing radial, space-filling hierarchy visualizations. In: INFOVIS, p. 57 (2000)
21. Wattenberg, M.: Visualizing the stock market. In: CHI 1999 extended abstracts on
Human factors in computing systems, New York, USA, pp. 188–189 (1999)
22. Yang, J., Ward, M.O., Rundensteiner, E.A.: Interring: An interactive tool for vi-
sually navigating and manipulating hierarchical structures. In: Proc. IEEE InfoVis
2002, Washington, DC, USA, p. 77 (2002)
Electric Field Theory Motivated Graph
Construction for Optimal Medical Image
Segmentation

Yin Yin, Qi Song, and Milan Sonka

Electrical and Computer Engineering, The University of Iowa, Iowa City, IA, USA
Milan-Sonka@uiowa.edu

Abstract. In this paper, we present a novel graph construction method


and demonstrate its usage in a broad range of applications starting
from a relatively simple single-surface segmentation and ranging to very
complex multi-surface multi-object graph based image segmentation. In-
spired by the properties of electric field direction lines, the proposed
method for graph construction is inherently applicable to n-D problems.
In general, the electric field direction lines are used for graph “column”
construction. As such, our method is robust with respect to the initial
surface shape and the graph structure is easy to compute. When ap-
plied to cross-surface mapping, our approach can generate one-to-one
and every-to-every vertex correspondent pairs between the regions of
mutual interaction, which is a substantially better solution compared
with other surface mapping techniques currently used for multi-object
graph-based image segmentation.

1 Introduction

Wu and Chen introduced graph search image segmentation, called the optimal net
surface problem, in 2002 [1]. Use of this method in the medical image segmentation
area closely followed [2,3,4,5,6,7,8,9,10]. Out of these publications, [3] is
considered a pioneering paper in which Li et al. explained and verified how to
optimally segment single and multiple coupled flat surfaces represented by a vol-
umetric graph structure. This work was further extended to optimally segment
multiple coupled closed surfaces of a single object [2]. Later, Garvin introduced the
in-region cost concept [5] and applied it to 8-surface segmentation of retinal layers
from OCT images [7]. Olszewski and Zhao utilized this concept for 4D dual-
surface inner/outer wall segmentation in coronary intravascular ultrasound and
in intrathoracic airway CT images [4]. Yin has further extended this framework
by solving a general “multiple surfaces of multiple objects” problem with ap-
plications to knee cartilage segmentation and quantification [8]. Independently,
Li added elasticity constraint and segmented 3D liver tumors [9]. The optimal
surface detection algorithms were also employed for 3D soft tissue segmentation
in [6] as well as for segmentation of a coupled femoral head and ilium and a
coupled distal femur and proximal tibia in 3D CT data [10].

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 334–342, 2009.

© Springer-Verlag Berlin Heidelberg 2009

While already an extremely powerful paradigm, the optimal surface detection
approach also faces some problems commonly seen in other segmentation
algorithms. In this paper, we will, for the first time, present a novel graph
construction method and demonstrate its usage in a broad range of applications
starting from a relatively simple single-surface segmentation and ranging to very
complex multi-surface multi-object graph based image segmentation. Inspired by
the properties of electric field direction lines, the proposed method for graph con-
struction is inherently applicable to n-D problems. In general, the electric field
direction lines are used for graph “column” construction. As such, our method is
robust with respect to the initial surface shape and the graph structure is easy
to compute. When applied to cross-surface mapping, our approach can generate
one-to-one and every-to-every vertex correspondent pairs between the regions of
mutual interaction, which is a substantially better solution compared with other
surface mapping techniques currently used for multi-object graph-based image
segmentation.

2 Methods
2.1 Graph Structures for Optimal Surface Detection
The basic graph construction idea comes from a study of an optimal V-weight
net surface problem on proper ordered multi-column graphs [1]. Let us start from
a simple example shown in Fig. 1(a). Each node is assigned a cost value C. Each
edge has infinite capacity. We reassign the costs as C′a = Ca, C′b = Cb − Ca,
C′c = Cc − Cb, C′d = Cd − Cc, . . . This cost assignment is called cost translation.
After translation, we connect a source s to all nodes with negative C′ and connect
all nodes with positive C′ to a sink t. The connection-edge capacity is assigned
|C′|. A max-flow/min-cut computation will partition the nodes of this graph into
two sets – S and
T , such that s ∈ S and t ∈ T . Note that S − s is a closed set, meaning that
the graph cut position on column i must be −ε higher and θ lower than the
graph cut position on column j, so that the minimum and maximum distances
between the two cut positions are −ε and θ, respectively. Furthermore, the total
translated costs in the closed set are guaranteed to be minimal because their
sum and the cost of the corresponding cut only differ by a constant (the sum
of absolute values of all negative C  ). Thus, summing up these costs guarantees
that the nodes immediately under the graph cut have the sum of untranslated
costs equivalent to the sum of translated costs in the closed set. For that reason,
the surface formed by such a cut is globally optimal.
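The cost translation along one column can be sketched as follows (our own code; the column costs are illustrative). The key property is that the translated costs telescope, so the translated costs of a closed set of nodes 0..k sum back to the untranslated cost of node k, the node immediately under the cut:

```python
# A sketch of the cost translation along one column (our own code; the costs
# are illustrative): C'_first = C_first and C'_k = C_k - C_{k-1} afterwards.

def translate(costs):
    return [costs[0]] + [costs[k] - costs[k - 1] for k in range(1, len(costs))]

column = [5, 2, 7, 1, 9]          # untranslated node costs along one column
t = translate(column)
print(t)                          # [5, -3, 5, -6, 8]

# Telescoping: a cut just above node k puts nodes 0..k into the closed set S,
# and their translated costs sum to the untranslated cost of node k --
# which is why the min cut yields the globally optimal surface.
for k in range(len(column)):
    assert sum(t[: k + 1]) == column[k]
```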
In image segmentation tasks, the nodes on the graph correspond to candidate
searching points. We want to find one and only one point along each searching
direction which corresponds to each column in the graph. The graph cut on
columns provides a globally optimal solution which gives a minimum sum of
point costs under a specific graph structure. Based on this simple two-column
relationship, n-D graphs can be constructed. Fig. 1(b) shows a 3D example,
in which the graph cut forms a 3D surface. The 4D case can be seen in [4].
If i and j are from the same surface, the min and max distances define the
surface smoothness. If i and j are from different surfaces of one object, such a
configuration corresponds to a multiple coupled surface relationship. If i and j
are from different objects, multi-object relationships are represented [8]. If i and
j are the grid neighbors on an image, a flat surface will result [3]. If i and j
are the vertex neighbors of a closed surface, a closed surface will be found as a
solution of the graph search optimization process [2].

Fig. 1. A simple example of a proper ordered multi-column graph. (a) Two
columns. (b) Columns combined in 3D.
While the theory – as presented – is quite straightforward, the implementation
of these basic principles is not simple. In our multi-object multi-surface image
segmentation task, two problems frequently arise. One is to prevent occurrence of
surface warping when applying graph search iteratively. Another issue is finding
a reliable cross-surface mapping method. In our previous work we have employed
a 3D distance transform and medial sheet definition approaches to define cross-
surface mapping. This approach suffers from local inconsistencies in areas of
complex surface shapes. Motivated by electric field theory, we devised a new
method for cross-surface mapping that defines the searching directions (columns)
of our graph column construction and overcomes the limitations of our
previous approach. This approach has proven to be very promising in handling
the two identified problems when applied to medical image segmentation tasks,
as described below.

2.2 A New Search Direction Based on Electric Field Direction Line


The optimal closed surface segmentation works in a way that is – to some ex-
tent – similar to the functionality of deformable models. As most deformable
models do, our graph-search approach searches for the solution along normal
directions to an approximate pre-segmentation. While the solution finding
processes of deformable models and graph search segmentation differ significantly,
both methods may suffer from the sensitivity of the employed normal directions
to the local surface shapes, especially rapid shape changes. As a result of the lack
of directional robustness of the normals, these normal directions may intersect.
In the worst case, the initial contour may warp and result in segmentation failure.

Fig. 2. A simulation of ELF. (a) Multiple unit charge points used for field definition
– the electric field is depicted in red and the ELF is shown in white. (b) Simulated
ELF (red lines) for a closed surface model of a 3D bifurcation.
Recall the Coulomb’s law in basic physics:
1 Q
Ei = r̂ , (1)
4πε0 r2
where Ei is the electric field at point i. Q is the charge of point i; r is the distance
from the point i to the evaluation point; r̂ is the unit vector pointing from the
point i to the evaluation point. ε0 is the vacuum permittivity.
Since the total electric field E is the sum of the Ei's,
\[ \mathbf{E} = \sum_i \mathbf{E}_i, \qquad (2) \]
the electric field has the same direction as the electric line of force (ELF).
When multiple source points are forming an electric field, the electric lines of
force exhibit a non-intersection property, which is of major interest in the context
of our graph construction task. This property can be shown in 2D in Fig. 2(a).
Note that if we change \( r^2 \) to \( r^m \) (m > 0), the non-intersection property still holds; the difference is that charges at longer distances are penalized more strongly in the ELF computation. Because the surface is composed of only a limited number of vertices, the effect of charges at short distances is greatly reduced. In order to compensate, we selected m = 4. Discarding the constant term, we defined our electric field as \( \mathbf{E}_i = \frac{1}{r^4}\,\hat{r} \). Inspired by ELF, we assigned unit charges to each
338 Y. Yin, Q. Song, and M. Sonka


Fig. 3. Correspondent pair generation in 2D and 3D. (a) 2D case where the red lines
are ELF and their connecting counterparts are depicted in green. The constraint points
are at the intersection position between the green line and the corresponding coupled
surface. (b) Use of barycentric coordinates to interpolate back-trace lines in 3D; the constraint point is then connected to each vertex of the intersected triangle.

vertex on a 3D closed bifurcation surface model and simulated the ELF as shown in Fig. 2(b). Our graph columns will be constructed along these ELF directions,
thus we are searching along ELF directions for an optimal segmentation solution.
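To make the column construction concrete, the following minimal Python sketch (our own illustration; the function names, step size, and charge layout are assumptions, not the authors' code) evaluates the modified 1/r⁴ field of a set of unit charges and traces an ELF by forward Euler integration:

```python
import numpy as np

def electric_field(p, charges, m=4):
    """Field at p from unit point charges, using the modified 1/r^m law
    (m = 4 as in the text); constant factors are discarded."""
    d = p - charges                                   # vectors from each charge to p
    r = np.maximum(np.linalg.norm(d, axis=1), 1e-12)  # guard against r = 0
    return (d / r[:, None] ** (m + 1)).sum(axis=0)    # (1/r^m) * r_hat, summed

def trace_elf(p0, charges, step=0.05, n_steps=200):
    """Trace an electric line of force from p0 by stepping along the
    normalized field direction (forward Euler)."""
    path = [np.asarray(p0, dtype=float)]
    for _ in range(n_steps):
        e = electric_field(path[-1], charges)
        n = np.linalg.norm(e)
        if n < 1e-12:          # reached the (unique) zero-field point
            break
        path.append(path[-1] + step * e / n)
    return np.array(path)
```

Tracing such a line from each surface vertex gives one non-intersecting search column per vertex, which is the property exploited in the graph construction.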

2.3 Cross-Surface Mapping by ELF Search Direction

If there is one closed charged surface in an n-D space, there is only one n-D point inside this closed surface having a zero electric field. In the extreme case, the
closed surface will converge to a point when searching along the ELF. Except
for that point, any position having non-zero electric field in n-D space will be
crossed by one ELF. In that case, we can trace back along the ELF to a specific
position on the surface (whether it is a vertex or not). This property can be used
to relate multiple coupled surfaces, thus defining cross-surface mapping.
In the application of cross-surface mapping, we compute ELF for each closed
surface within a searching range independently. Considering a task of segmenting
multiple mutually interacting surfaces for multiple mutually interacting objects,
the regions in which the objects are in proximity to each other are called contact
areas. We can compute medial sheets between coupled surfaces to identify the
separation of objects in the contact areas. Clearly, any vertex for which the ELF
intersects the medial sheet can be regarded as belonging to the contact area. To
form correspondent vertex pairs, the medial-sheet-intersecting ELF connects the coupled surface points while intersecting the medial sheet at one and only one point, forming an intersection point on the coupled surface that is used as a constraint point. The vertex having the intersected ELF and its corresponding constraint
point will form a correspondent vertex pair. Consequently, the ELF connecting
this pair forms the searching graph column. Fig. 3(a) shows a 2D case in which
the red lines depict the ELF and their connecting counterparts are depicted by

green lines. The constraint points are at the intersection position between the
green lines and the corresponding surfaces. In the 2D case, the back-trace can
be done by linear interpolation of the nearest ELF. Subsequently, the constraint
points are connected to the points on the coupled surface. In the 3D case, the
lines can be traced according to the barycentric coordinates of the intersected
triangles. As shown in Fig. 3(b), the constraint points are further connected to the vertices of the intersected triangle.
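The barycentric step above can be sketched in Python as follows (a minimal illustration with assumed helper names; the real method operates on the mesh's intersected triangles): compute the barycentric coordinates of the point where a back-trace line pierces a triangle, then blend the per-vertex ELF directions with those weights.

```python
import numpy as np

def barycentric(p, a, b, c):
    """Barycentric coordinates of point p with respect to triangle (a, b, c)."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return np.array([1.0 - v - w, v, w])

def interpolate_direction(p, tri_pts, tri_dirs):
    """Interpolate a back-trace (ELF) direction at p inside a triangle
    from the directions stored at its three vertices."""
    lam = barycentric(p, *tri_pts)
    d = (lam[:, None] * np.asarray(tri_dirs, dtype=float)).sum(axis=0)
    return d / np.linalg.norm(d)
```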
Each vertex in the contact area can therefore be used to create a constraint
point affecting the coupled surface. Importantly, the correspondent pairs of ver-
tices from two interacting objects in the contact area identified using the ELF
are guaranteed to be in a one-to-one relationship and every-to-every mapping,
irrespective of surface vertex density. As a result, the desirable property of main-
taining the previous surface geometry is preserved.

3 Applications
3.1 Single-Surface Detection along a 3D Bifurcation
An example is shown in Fig. 4(a) in which a perfect pre-segmented inner bound-
ary of a 3D bifurcation is provided and the outer boundary needs to be identified.
The graph search along the surface-normal direction will corrupt the surface due
to the sharp corner as shown in Fig. 4(b). However, when employing the direc-
tionality constraints specified by ELF, the directionality of the “normal” lines
along the surface is orderly and the search can avoid the otherwise inevitable
corruption of the surface solution (Fig. 4(c)).


Fig. 4. 3D bifurcation model demonstrating segmentation of the outer surface. Note that the bifurcating object consists of a tubular structure with inner and outer surfaces. (a) Perfect pre-segmentation (red line) of the inner boundary surface, which is used
to guide segmenting the outer border. (b) Graph searching result (red line) performed
along normal directions of the pre-segmentation surface using our previous approach
– notice the severe corruption of the surface along inner area of the bifurcation. (c)
Graph searching result (red line) using graph constructed along ELF directions, no
surface corruption present.

3.2 Tibial Bone-Cartilage Segmentation in 3D


Another example demonstrates iterative graph searching of the tibial bone-cartilage
interface. This segmentation step is frequently used for approximate segmenta-
tion of individual bones prior to final complete-joint segmentation that is based
on multi-object multi-surface optimal graph searching. As such, robustness of
this initial pre-segmentation step is necessary. In Fig. 5(a), an initial tibia mean
shape is positioned on a 3D image of human tibial bone. The initial mean shape
may not be well positioned and after several iterations, the solution wraps around
itself near the tibial cartilage. If we want to segment cartilage based on this pre-
segmentation result, a segmentation failure will likely result. Incorporating ELF
paradigm in the graph construction overcomes this problem and substantially
increases the robustness of the pre-segmentation step (Fig. 5(c)).


Fig. 5. Tibia bone-cartilage interface segmentation in 3D performed using iterative graph searching. (a) An initial 3D contour (red line) placed on the 3D MR image of human tibia. (b) Iterative volumetric graph searching result (red line) performed along normal directions; notice the surface wrapping near the tibial cartilage, which is detrimental to subsequent cartilage segmentation. (c) Iterative volumetric graph searching result (red line) performed along ELF directions, with no surface wrapping present – using the same initialization and the same number of iterations.

3.3 Graph Based Femur-Tibia Cartilage Delineation in 3D


The last example is a much more complicated 3D multi-object multi-surface
graph search segmentation of mutually interacting femoral and tibial cartilage.
If no cross-object relationship were considered, the segmented femoral and tibial cartilage could overlap as shown in Fig. 6(a). By creating a multi-object link [8] according to the constraint-point mapping technique, the tibia and femur bones and cartilages can be delineated in a reasonable manner, even in images for which
the cartilage boundaries are not visually obvious (Fig. 6).
When comparing the performance of the previous multi-object multi-surface image segmentation applied to knee-joint bone and cartilage segmentation with the new approach that uses ELF-based graph construction, the method's performance improved substantially. The Dice similarity coefficient (DSC) [11], measured on 8 3D MR datasets between the computer segmentation result and a manually defined independent standard, improved from 0.709±0.007 to 0.738±0.012. For comparison, if no vertex correspondence is used and all other method parameters are otherwise identical, these image datasets are segmented with an average DSC of 0.689±0.009.
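For reference, the DSC used in this comparison is straightforward to compute; the following is a generic Python sketch (our own, not the authors' evaluation code):

```python
import numpy as np

def dice(a, b):
    """Dice similarity coefficient between two binary masks:
    DSC = 2|A ∩ B| / (|A| + |B|)."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    return 2.0 * np.logical_and(a, b).sum() / (a.sum() + b.sum())
```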


Fig. 6. Graph-based femur-tibia cartilage delineation in 3D. (a) Graph searching result
without using correspondent vertex pairs. (b) Graph searching result using constraint-
point correspondent vertex pairs.

4 Conclusion

A new method for image segmentation graph construction was presented in which the graph searching directions are defined according to the electric-lines-of-force (ELF) paradigm applied to n-D image data. This approach
is suitable for creating graph searching columns for which the non-intersection
property of the ELF guarantees non-wrapping surface segmentation outcome
when dealing with complex local surface shapes or rapid shape changes. Using
the non-intersection property, a constraint point cross-surface mapping technique
was designed, which does not require surface merging and preserves the surface
geometry. Furthermore, one-to-one and every-to-every mapping is obtained at
the coupled-surface contact area. We are convinced that this property makes our
mapping technique superior to that presented in [10] or utilized in our previous
nearest-point graph construction method [8]. By building the multi-object links
from the constraint point correspondent vertex pairs, the graph can optimally
delineate the femoral and tibial bone and cartilage surfaces.
The presented method is not free of limitations. The most significant
is that the ELF definition is computationally demanding for surfaces with high
vertex density. We are currently exploring ways to accelerate this process by
only using subsampled local vertices instead of all available vertices. Another
research direction is to compute ELF at image grid positions and interpolate to
form dense searching columns. However, even with the computational demands
resulting in about X-times slower processing compared to the previous nearest-
medial-sheet method [8], the improvements in image segmentation quality clearly
justify the additional computational requirements.

Acknowledgments

This work was supported, in part, by NIH grants R01–EB004640, R44–AR052983,


and P50 AR055533. The contributions of C. Van Hofwegen, N. Laird, N. Muhlen-
bruch, and R. Williams, who provided the knee-joint manual tracings, are grate-
fully acknowledged.

References
1. Wu, X., Chen, D.Z.: Optimal net surface problem with applications. In: Widmayer,
P., Triguero, F., Morales, R., Hennessy, M., Eidenbenz, S., Conejo, R. (eds.) ICALP
2002. LNCS, vol. 2380, pp. 1029–1042. Springer, Heidelberg (2002)
2. Li, K., Millington, S., Wu, X., Chen, D.Z., Sonka, M.: Simultaneous segmentation
of multiple closed surfaces using optimal graph searching. In: Christensen, G.E.,
Sonka, M. (eds.) IPMI 2005. LNCS, vol. 3565, pp. 406–417. Springer, Heidelberg
(2005)
3. Li, K., Wu, X., Chen, D.Z., Sonka, M.: Optimal surface segmentation in volumetric
images – a graph-theoretic approach. IEEE Trans. Pattern Anal. and Machine
Intelligence 28(1), 119–134 (2006)
4. Zhao, F., Zhang, H., Walker, N.E., Yang, F., Olszewski, M.E., Wahle, A., Scholz,
T., Sonka, M.: Quantitative analysis of two-phase 3D+time aortic MR images.
SPIE Medical Imaging, vol. 6144, pp. 699–708 (2006)
5. Haeker, M., Wu, X., Abramoff, M., Kardon, R., Sonka, M.: Incorporation of re-
gional information in optimal 3-D graph search with application for intraretinal
layer segmentation of optical coherence tomography images. In: Karssemeijer, N.,
Lelieveldt, B. (eds.) IPMI 2007. LNCS, vol. 4584, pp. 607–618. Springer, Heidelberg
(2007)
6. Heimann, T., Munzing, S., Meinzer, H., Wolf, I.: A shape-guided deformable
model with evolutionary algorithm initialization for 3D soft tissue segmentation.
In: Karssemeijer, N., Lelieveldt, B. (eds.) IPMI 2007. LNCS, vol. 4584, pp. 1–12.
Springer, Heidelberg (2007)
7. Garvin, M.K., Abramoff, M.D., Kardon, R., Russell, S.R., Wu, X., Sonka, M.:
Intraretinal layer segmentation of macular optical coherence tomography images
using optimal 3-D graph search. IEEE Trans. Med. Imaging 27(10), 1495–1505
(2008)
8. Yin, Y., Zhang, X., Sonka, M.: Optimal multi-object multi-surface graph search
segmentation: Full-joint cartilage delineation in 3D. In: Medical Image Understand-
ing and Analysis 2008, pp. 104–108 (2008)
9. Li, K., Jolly, M.P.: Simultaneous detection of multiple elastic surfaces with application to tumor segmentation in CT images. In: Proc. SPIE, vol. 6914, pp. 69143S–69143S–11 (2008)
10. Kainmueller, D., Lamecker, H., Zachow, S., Heller, M., Hege, H.C.: Multi-object
segmentation with coupled deformable models. In: Proc. of Medical Image Under-
standing and Analysis, pp. 34–38 (2008)
11. Dice, L.R.: Measures of the amount of ecologic association between species. Ecol-
ogy 26, 297–302 (1945)
Texture Segmentation by Contractive Decomposition
and Planar Grouping

Anders Bjorholm Dahl¹, Peter Bogunovich², and Ali Shokoufandeh²

¹ Technical University of Denmark, Department of Informatics, Lyngby, Denmark
abd@imm.dtu.dk
² Drexel University, Department of Computer Science, Philadelphia, PA, USA
{pjb38,ashokouf}@drexel.edu

Abstract. Image segmentation has long been an important problem in the com-
puter vision community. In our recent work we have addressed the problem of
texture segmentation, where we combined top-down and bottom-up views of the
image into a unified procedure. In this paper we extend our work by proposing
a modified procedure which makes use of graphs of image regions. In the top-
down procedure a quadtree of image region descriptors is obtained in which a
novel affine contractive transformation based on neighboring regions is used to
update descriptors and determine stable segments. In the bottom-up procedure
we form a planar graph on the resulting stable segments, where edges are present
between vertices representing neighboring image regions. We then use a vertex
merging technique to obtain the final segmentation. We verify the effectiveness
of this procedure by demonstrating results which compare well to other recent
techniques.

1 Introduction
The problem of image segmentation, with the general goal of partitioning an image
into non-overlapping regions such that points within a class are similar while points
between classes are dissimilar [1], has long been studied in computer vision. It plays a
major role in high level tasks like object recognition [2,3], where it is used to find image
parts corresponding to scene objects, and image retrieval [4], where the objective is to
relate images from similar segments. Textured objects, in particular, pose a great chal-
lenge for segmentation since patterns and boundaries can be difficult to identify in the
presence of changing scale and lighting conditions [5]. Often textures are characterized
by repetitive patterns [6], and these are only characteristic from a certain scale. Below
this scale these patterns will only be partly visible [7] which makes precise boundary
detection in this case an additional challenge. The intensity variation of textures is of-
ten overlapping with the background, which may add further difficulty. Examples of
proposed approaches to texture segmentation include active contours [8], templates [2],
or region descriptors [9]. We recently introduced a new approach to texture segmen-
tation [10], where the procedure is unsupervised in the sense that we assume no prior
knowledge of the target classes, i.e. number of regions or known textures.

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 343–352, 2009.

© Springer-Verlag Berlin Heidelberg 2009
344 A.B. Dahl, P. Bogunovich, and A. Shokoufandeh


Fig. 1. Texture segmentation from contractive maps. In (a) a heterogeneous image is shown, created by composing a Brodatz texture [11] with itself rotated 90° in a masked-out area obtained from the bird in (b). The resulting segmentation is shown in (c).


Fig. 2. The segmentation procedure. The top-down decomposition of the image is shown in (a).
In (b) the feature kernel set is shown. The first image in (c) is the over-segmented image obtained
from the decomposition. The segments are merged in the bottom-up procedure to obtain the final segmentation, shown in the last two images.

Our segmentation technique begins with a top-down quadtree decomposition procedure where nodes describe image regions such that the root describes the entire image; its four children each describe a quarter of the image, and so on. Each quadtree node contains a
descriptor characterizing the texture of the associated region. This characterization is
obtained as a distribution of a set of kernels that we introduced in [10]. At each level of
the tree a novel contractive transformation is computed for each node and is applied to
update the node. The decomposition is controlled by the stability of the resulting node
descriptors relative to their neighbors, and a leaf is obtained either when a node is deemed
stable or it covers a subpixel image region. Following this procedure we apply our graph-
based merging technique. A planar graph is formed on the resulting leaves with edges
connecting neighboring image regions whose weights are based on descriptor similarity.
The final segmentation is obtained by iteratively merging nodes with highest similarity.
Figure 1 shows a result of our procedure. Figure 1(a) shows a heterogeneous image created by composing a Brodatz texture with itself rotated 90° in a masked-out area obtained from the bird in Figure 1(b). The resulting segmentation is shown in Figure 1(c). An overview of our procedure is shown

in Figure 2. The remainder of the paper is organized as follows: In section 2 we explain the entire procedure by first reviewing the kPIFS used to obtain a base description
of the image, followed by a description of the top-down process where we introduce our
novel contraction maps, and finally we describe the bottom-up process which includes
the details of the planar graph merging technique. In section 3 we present some results
and compare them to other methods. We provide a conclusion in section 4.

2 Method

In this section we present an overview of the general procedure for unsupervised texture
segmentation. First we give a brief review of the process for obtaining base characteri-
zations of small image regions which serve as a starting point for the segmentation. We
then indicate our modifications to the decomposition transformation and the approach
to merging leaves and generating the final segmentation.

2.1 kPIFS and the Base Descriptors

In [10] we introduced the concept of kernel partition iterated function systems (kPIFS)
which proved to be a viable technique for obtaining a basic characterization of local
image structure to serve as a starting point for segmentation. Since we are primarily
focused on the top-down and bottom-up procedures in this paper we only provide a
brief review of kPIFS descriptors and we refer the reader to our previous paper [10] for
more details.
The kPIFS technique which we developed is inspired by and closely related to the
partition iterated function systems (PIFS) introduced by Jacquin [12] for the purpose
of lossy image compression [13]. We saw potential in PIFS to characterize local image
structure based on evidence indicating that it can be used in tasks such as edge detection
[14] and image retrieval [15].
The traditional PIFS image compression technique computes a set of self-mappings
on the image. The process begins by partitioning an image into a set of domain blocks
DI , and again into smaller range blocks RI , as illustrated by Figure 3(b). The image is
encoded by matching an element d ∈ DI to each rk ∈ RI. In the course of matching, a generally affine transformation θk is calculated for the domain block d that matches range block rk, and θk(d) is used to represent rk. Once all of the maps
are computed they can be applied to an arbitrary image and will result in an accurate
reconstruction of the encoded image.
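To make the encoding step concrete, here is a simplified Python sketch of PIFS block matching (our own illustration: it assumes domain and range blocks of equal size and fits only the scale/offset intensity part of θk, omitting the spatial downsampling and the eight block isometries of a full coder):

```python
import numpy as np

def affine_fit(d, r):
    """Least-squares scale s and offset o so that s*d + o ≈ r (the intensity
    part of a PIFS map theta_k); returns (s, o, mean absolute error)."""
    d, r = np.asarray(d, float).ravel(), np.asarray(r, float).ravel()
    var = d.var()
    s = ((d - d.mean()) * (r - r.mean())).mean() / var if var > 1e-12 else 0.0
    o = r.mean() - s * d.mean()
    return s, o, np.abs(s * d + o - r).mean()

def encode_pifs(domain_blocks, range_blocks):
    """For each range block keep the index of the best-matching domain block
    together with its fitted (s, o)."""
    code = []
    for r in range_blocks:
        fits = [affine_fit(d, r) for d in domain_blocks]
        k = int(np.argmin([f[2] for f in fits]))
        code.append((k, fits[k][0], fits[k][1]))
    return code
```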
For our goal of characterizing local structure we designed kPIFS to avoid self-
mappings between domain blocks and range blocks. Instead we chose to find mappings
from an over-complete basis of texture kernels, DK , to the range blocks of the image as
illustrated by Figure 3(c). The kernels employed here are meant to represent local struc-
tural image patterns such as corners, edges of varying width and angle, blobs, and flat
regions. In our procedure, each image range block will be characterized by distances of
each of the domain kernels to the range block after a calibration transform is applied.
Specifically, for a domain kernel d ∈ DK and a range block rk ∈ RI the distance in
kPIFS is given by


Fig. 3. Comparison of PIFS and kPIFS. Part (a) shows the original image; the highlighted area is focused on in (b) and (c). Part (b) is an example of PIFS, where the best matching domain block is mapped to a range block. Part (c) shows kPIFS, where the domain blocks are replaced by domain kernels.

\[ \delta_{\mathrm{kPIFS}}(r_k, d) = \left\| \frac{d - \mu_d}{\sigma_d} - \frac{r_k - \mu_{r_k}}{\sigma_{r_k}} \right\|, \qquad (1) \]
where μx and σx are the mean and standard deviation, respectively, of block x. The calibrated blocks will be highly influenced by noise if σrk is small, and if it is zero we cannot estimate δkPIFS. Therefore, we use a measure of flatness of the range blocks, bf = σrk /μrk. If bf < tf, where tf is a threshold, we categorize the block as flat.
We then let each range block be described by its best mapped (least distant) domain
kernels. The similarity for a kernel is weighted by the relative similarity of all of the
kernels to the range block. Let Δrk denote the mean distance from each kernel in DK
to the current range block, obtained from (1), and let γkernel be a scalar constant controlling how many domain kernels are included in the descriptions. The kernel-to-range-block similarity is given by \( w_{[r_k,d]} = \max\{\gamma_{\mathrm{kernel}}\,\Delta_{r_k} - \delta_{\mathrm{kPIFS}}(r_k, d),\, 0\} \) for each d ∈ DK, forming a vector of similarities which is normalized, yielding a range block descriptor in the form of a distribution of domain kernels. Intuitively, each w[rk,d] describes how well kernel d fits block rk.
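The descriptor construction can be sketched as follows (a minimal Python illustration, not the paper's implementation: the values of γkernel and tf here are our own placeholders, and we use an L1 norm between calibrated blocks):

```python
import numpy as np

def calibrate(b, eps=1e-12):
    """Zero-mean, unit-variance version of a block (the calibration transform)."""
    b = np.asarray(b, dtype=float).ravel()
    return (b - b.mean()) / max(b.std(), eps)

def kpifs_descriptor(r, kernels, gamma=1.2, t_flat=0.05):
    """Range-block descriptor as a distribution over domain kernels.
    gamma plays the role of gamma_kernel and t_flat of t_f; both values
    here are illustrative, not the paper's settings."""
    r = np.asarray(r, dtype=float)
    if r.std() / max(r.mean(), 1e-12) < t_flat:   # flatness b_f = sigma/mu
        return None                               # flat block, handled separately
    rc = calibrate(r)
    dists = np.array([np.abs(calibrate(d) - rc).sum() for d in kernels])
    w = np.maximum(gamma * dists.mean() - dists, 0.0)   # w_[rk,d]
    return w / w.sum()                                  # normalize to a distribution
```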

2.2 Top-Down Decomposition


In the first step of the top-down procedure we begin the construction of the quadtree
by decomposing the image to some start level lstart , where level 1 is the root cover-
ing the entire image, by splitting the region nodes at each level into 4 child subregion
nodes. Once we are at level lstart we calculate a descriptor histogram for each of the
\( 2^{2(l_{\mathrm{start}}-1)} \) region nodes by summing the kPIFS descriptors making up each region and
normalizing. From this point onward iterative transformations for each node at the cur-
rent level are constructed based on the local spatial neighborhoods and are applied to
each of the nodes until an approximate convergence is reached. At this point stable re-
gions are identified and the next level of the quadtree is constructed from the children
of the nodes based on some stability (or discrepancy) measure.
In practice the choice of lstart in both the original and modified version is important
in determining the resulting segments. If lstart is a small number then there is a risk that
the region nodes identified as stable will still contain much heterogeneity, while a larger lstart can result in an over-segmentation. We have experimentally found that lstart = 6 is a good choice as a start level, i.e. a 32 × 32 grid of sub-image nodes.

The novel idea that we now introduce to this procedure addresses the iterative trans-
formations that are applied to the nodes until convergence. The convergence of both the
original transformation and the new one presented here rely on properties of contractive
transformations in a metric space [16]. Here we briefly review the necessary concepts.
Definition 1 (Contractive Transformation). Given a metric space (X, δ), a transfor-
mation T : X → X is called contractive or a contraction with contractivity factor s if
there exists a positive constant s < 1 so that δ(T (x), T (y)) ≤ sδ(x, y) ∀x, y ∈ X.
Let us then denote T ◦n (x) = T ◦ T ◦ · · · ◦ T (x); that is, T composed with itself n times
and applied to x. The property of contractive transformations that we are interested in
is given in the following theorem which is proved in [16].
Theorem 1 (Contractive Mapping Fixed Point Theorem). Let (X, δ) be a complete
metric space and let T : X → X be a contractive transformation, then there exists a
unique point xf ∈ X such that for all x ∈ X we have xf = T (xf ) = limn→∞ T ◦n (x).
The point xf is called the fixed point of T .
The importance of this theorem is that if we can show a transformation to be contractive
in a defined metric space, then we are sure that some fixed point will be reached by
applying the transformation iteratively. In both the original procedure and the updated
version the metric space was defined as the set of image region descriptor histograms
which can be thought of as lying in the space IR^d. It follows that any metric on IR^d can be chosen; in practice, however, we have used the L1 distance metric, denoted by δL1 and defined as \( \delta_{L1}(x, y) = \sum_{i=1}^{d} |x_i - y_i| \).
In the original paper on the procedure [10] we proposed a transformation to perform
an iterative weighted averaging of similar region descriptors within a local spatial neigh-
borhood. Specifically, given some descriptor wi at the current level of the quadtree, let
Ni denote the set of m × m spatially local neighbor descriptors around wi but not
including wi , and let μNi be the average L1 distance from wi to all of the other de-
scriptors in Ni . We then denote a weighted average distance tNi = ψμNi , where ψ is
some weighting constant, and denote the set of close descriptors Nic = {wj ∈ Ni :
dL1 (wi , wj ) ≤ tNi }. Then we define a transformation Fi for this descriptor to be the
average of the descriptors Nic and wi . More explicitly:
\[ F_i(w) = \frac{1}{1 + |N_i^c|}\Bigl(w + \sum_{w_j \in N_i^c} w_j\Bigr). \qquad (2) \]

A transformation Fi was found for each wi at the current level and it was applied iteratively to obtain updated descriptors, i.e. \( w_i^n = F_i^{\circ n}(w_i) \), until \( \delta_{L1}(w_i^n, w_i^{n+1}) < \epsilon \) for some given error threshold ε. We claimed that each Fi was contractive and would
thus yield a fixed point descriptor based on a result from Van der Vaart and Van Zanten
[17]. While this appears sufficient, the proof is complicated and indirect and Fi takes a
somewhat inconvenient form. Here we propose a simpler affine transformation where
contractivity can easily be observed.
Our new transformation is also defined for each region descriptor at each level of the quadtree. Let wi, Ni and tNi be defined as above and let \( N_i' = \{w_i\} \cup N_i \). We
348 A.B. Dahl, P. Bogunovich, and A. Shokoufandeh

now define a set of scalar weights for every descriptor in \( N_i' \) such that s(i,j) represents a measure of similarity between wi and wj for \( w_j \in N_i' \). The weights are defined as \( s_{(i,j)} = \max\{(t_{N_i} - \delta_{L1}(w_i, w_j))/c_i,\, 0\} \), where ci is a normalization constant so that \( \sum_{j=1}^{|N_i'|} s_{(i,j)} = 1 \). In this way all s(i,j) ≤ 1, and each descriptor \( w_j \in N_i' \) has an associated similarity weight s(i,j), with the special scalar s(i,i) being the weight for wi. Now define a new descriptor vi as a linear combination of the descriptors in Ni, \( v_i = \sum_{w_j \in N_i} s_{(i,j)} w_j \), and our affine transformation Gi for descriptor wi is given by
\[ G_i(w) = s_{(i,i)}\, w + v_i. \qquad (3) \]
Again we iteratively apply Gi to wi, obtaining \( w_i^n = G_i^{\circ n}(w_i) \) until convergence, but here, due to the simple affine form of Gi, it is particularly easy to demonstrate the contractivity of the transformation. For arbitrary descriptors x, y ∈ IR^d we have
\[ \delta_{L1}(G_i(x), G_i(y)) = \sum_{j=1}^{d} \bigl| (s_{(i,i)} x_j + v_{ij}) - (s_{(i,i)} y_j + v_{ij}) \bigr|. \]
Notice that the \( v_{ij} \)'s all cancel out and the \( s_{(i,i)} \) can be factored out, simplifying to
\[ \delta_{L1}(G_i(x), G_i(y)) = s_{(i,i)} \sum_{j=1}^{d} |x_j - y_j| = s_{(i,i)}\, \delta_{L1}(x, y), \]
and since \( s_{(i,i)} \le 1 \), Gi is either contractive or it does not move wi at all; either way we are guaranteed by Theorem 1 to reach a fixed point descriptor, which we denote by \( w_i^* \). In practice the convergence is quite fast, and we generally need fewer than 10 iterations for ε = 0.01.
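The iteration is easy to sketch in Python (our own minimal illustration, with the neighborhood and the weighting constant ψ supplied by the caller): freeze the neighbor descriptors, build the weights s(i,j), and apply Gi until the L1 change drops below ε.

```python
import numpy as np

def contraction_weights(i, descriptors, neighbors, psi=1.0):
    """Similarity weights s_(i,j) over N'_i = {w_i} ∪ N_i, normalized to sum
    to 1; psi is the weighting constant in t_Ni = psi * mu_Ni."""
    wi = descriptors[i]
    idx = [i] + list(neighbors[i])
    d = np.array([np.abs(wi - descriptors[j]).sum() for j in idx])  # L1 distances
    t = psi * d[1:].mean()                  # threshold from neighbors only
    s = np.maximum(t - d, 0.0)
    if s.sum() == 0:                        # all neighbors coincide with w_i
        s[0] = 1.0
    return idx, s / s.sum()

def fixed_point(i, descriptors, neighbors, eps=0.01, max_iter=100):
    """Iterate w <- G_i(w) = s_(i,i) w + v_i (with v_i frozen from the
    current neighbor descriptors) until the L1 change is below eps."""
    idx, s = contraction_weights(i, descriptors, neighbors)
    v = sum(sj * descriptors[j] for sj, j in zip(s[1:], idx[1:]))
    w = descriptors[i]
    for _ in range(max_iter):
        w_next = s[0] * w + v
        if np.abs(w_next - w).sum() < eps:
            return w_next
        w = w_next
    return w
```

Because the frozen term v_i is constant, the iteration converges geometrically with ratio s(i,i), exactly as the contractivity argument above predicts.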
When the fixed point descriptors \( w_i^* \) are reached for all regions at the current level, we identify the stability of each region based on the discrepancy of its fixed point to the fixed points of its neighbors. Since both Fi and Gi average each wi with its similar neighbors, there is a strong possibility that sub-images in regions with high local discrepancy after the iterative procedure will cover different textures. To avoid misclassifications we split and repeat the contractive mappings on these regions at the next level of the quadtree, as illustrated in Figure 2(a). The discrepancy of a node is measured by comparing \( w_i^* \) to the fixed points of its four spatially nearest neighbors, which we denote by the set \( \bar{N}_i \). Let \( \mu_{\bar{N}_i} \) denote the average L1 distance from \( w_i^* \) to the descriptors in \( \bar{N}_i \) and let \( m_{\bar{N}_i} \) denote the maximum such distance; then the discrepancy measure of the region is defined as \( D_i = \mu_{\bar{N}_i} + m_{\bar{N}_i} \).
Though we are only concerned with splitting and reprocessing unstable regions, in practice all regions are split. From Di we are able to calculate a border measure for each node as \( B_i = D_i / \max\{D_j : j \in \{1, \ldots, N_k\}\} \), where Nk is the total number of nodes at the current decomposition level. Bi determines how wi's children descriptors are calculated. Let {w(i,j) : j ∈ {1, . . . , 4}} denote the 4 initial descriptors of wi's children used in the next level of the quadtree. If Bi = 0 then the region is stable, there is no chance of wi covering a boundary region, and so we assign \( w_{(i,j)} = w_i^* \) for all children. When Bi > 0 we let {v(i,j) : j ∈ {1, . . . , 4}} denote the descriptors of the child regions, calculated as the normalized sum of kPIFS histograms in the same manner as at the starting level lstart. We then obtain the new descriptors as \( w_{(i,j)} = (1 - B_i)\, w_i^* + B_i\, v_{(i,j)} \).
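Sketched in Python (again with our own naming, not the authors' code), the discrepancy measure, border measure, and child initialization read:

```python
import numpy as np

def l1(x, y):
    """L1 distance between two descriptor histograms."""
    return np.abs(x - y).sum()

def discrepancy(w_star, neighbor_stars):
    """D_i = mean + max L1 distance from w*_i to the fixed points
    of its four spatially nearest neighbors."""
    d = np.array([l1(w_star, n) for n in neighbor_stars])
    return d.mean() + d.max()

def child_descriptors(w_star, D_i, D_max, child_kpifs):
    """Border measure B_i = D_i / max_j D_j; children are initialized as
    w_(i,j) = (1 - B_i) w*_i + B_i v_(i,j)."""
    B = D_i / D_max
    return [(1.0 - B) * w_star + B * v for v in child_kpifs]
```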

2.3 Bottom-Up Merging of Regions


Upon the completion of the top-down procedure we obtain a quadtree decomposition
of the image with leaves representing non-overlapping stable image regions. The goal


Fig. 4. Bottom-up merging of image regions. Part (a) shows the obtained segments and (b) shows the corresponding graph. Edge weights are given by the similarity between the segments. In the right-hand sides of (a) and (b), segments 1 and 2 of the left sides are merged.

of the bottom-up procedure is to merge these leaves into homogeneous clusters which
form the final segmentation.
In our original approach we fit a mixture of Gaussians to the distribution of leaf node descriptors using the approach of Figueiredo [18], and the final segmentation was determined by the Gaussian giving the highest probability.
Our new approach begins by forming a planar graph G so that the vertices of G are
the leaf nodes and an edge (i, j) is formed between vertices representing adjacent image
regions with edge weight equal to δL1 (wi , wj ), the distance between the associated
fixed point descriptors. The bottom-up procedure then merges adjacent vertices of G
based on edge weight. Let αi denote the percentage of the total image covered by vertex
i. Then αi is considered in the merging, so the smallest regions will be forced to merge
with the most similar neighboring region and when merging any two vertices i, j the
ratio αi /αj is considered so that the merged vertex has a descriptor which is mostly
influenced by the relatively larger region.
The merging of vertices is done in two steps. Initially we merge all vertex pairs i, j
where the edge weight is close to 0, i.e., less than some small positive threshold ε. These regions
had nearly identical fixed points and the disparity is most likely only due to the fact
that the fixed point is approximated. In the second step we let ΔG denote the average
weight in the current graph G which is updated after each merging is performed. We
proceed in merging the vertices i, j with the smallest current edge weight until the
relative weight δL1 (wi , wj )/ΔG is larger than some threshold γmerge ∈ [0, 1). Figure 4
gives an illustration of the process.
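The two-step merging described above can be sketched as follows. This is a simplified, illustrative implementation: the policy for reattached parallel edges (keep the minimum weight) and the area-weighted descriptor update are our own assumptions where the text leaves details open.

```python
import numpy as np

def merge_regions(desc, area, edges, eps=1e-6, gamma=0.3):
    """Greedy bottom-up merging of adjacent regions (simplified sketch).

    desc  : dict vertex -> descriptor (np.ndarray)
    area  : dict vertex -> fraction of image covered
    edges : dict frozenset({i, j}) -> L1 distance between descriptors
    Step 1 merges near-identical regions (weight < eps); step 2 keeps merging
    the cheapest edge while its weight relative to the current average edge
    weight stays below gamma.
    """
    def merge(i, j):
        a_i, a_j = area[i], area[j]
        # merged descriptor is dominated by the relatively larger region
        desc[i] = (a_i * desc[i] + a_j * desc[j]) / (a_i + a_j)
        area[i] = a_i + a_j
        for e in [e for e in edges if j in e]:
            w = edges.pop(e)
            k = next(v for v in e if v != j)
            if k != i:  # reattach edge to the merged vertex, keep the min weight
                edges[frozenset((i, k))] = min(w, edges.get(frozenset((i, k)), np.inf))
        del desc[j], area[j]

    changed = True
    while changed and edges:
        changed = False
        e_min = min(edges, key=edges.get)
        w_min = edges[e_min]
        avg = sum(edges.values()) / len(edges)
        if w_min < eps or w_min / avg <= gamma:
            merge(*sorted(e_min))
            changed = True
    return desc

# Toy example: vertices 1 and 2 have identical descriptors and are merged.
regions = merge_regions(
    {1: np.array([0.0, 1.0]), 2: np.array([0.0, 1.0]), 3: np.array([5.0, 5.0])},
    {1: 0.3, 2: 0.3, 3: 0.4},
    {frozenset((1, 2)): 0.0, frozenset((2, 3)): 9.0, frozenset((1, 3)): 9.0})
```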

3 Experiments

In this section we present the experimental results of our procedure. The images used for
testing are from the Berkeley segmentation database [19] and the Brodatz textures
[11]. Our procedure has proven very effective for texture segmentation, which is
demonstrated by comparing our results to the state-of-the-art methods of Fauzi and Lewis
[3], Houhou et al. [8], and Hong et al. [7].
Fauzi and Lewis [3] perform unsupervised segmentation on a set of composed
Brodatz textures [11]. We have compared the performance of our method to theirs by
making a set of randomly composed images from the same set of Brodatz textures.
These composed images are very well suited to our method because the descriptors
350 A.B. Dahl, P. Bogunovich, and A. Shokoufandeh

Fig. 5. Segmentation of the Brodatz textures [11]. The composition of the textures is inspired
by the segmentation procedure of Fauzi and Lewis [3]. Segmentation borders are marked with
white lines, except in the last image, where a part in the lower right is marked in black to make
it visible.


Fig. 6. Comparative results. This figure shows our results compared to those of Hong et al. [7].
Our results are on the top in (a) and (b) and on the right in (c).

precisely cover one texture, so to challenge our procedure we changed the composition.
Some examples of the results are shown in Figure 5. We obtain very good segmentation
for all images with only small errors along the texture boundaries. In 19 of 20 images
we found the correct 5 textures and only the texture in the lower right hand corner
of the last image was split into two. It should be noted that this texture contains two
homogeneous areas. In [3] only 7 of 9 composed images were accurately segmented.
These results show that the texture characterization is quite good, but the challenge
posed by textures in natural images is greater, as we show next.
We have tested our procedure on the same set of images from the Berkeley segmen-
tation database [19] as was used in Hong et al. [7] and Houhou et al. [8]. The results are
compared in Figures 6 and 7. Our method performs well compared to that of Hong et
al., especially in Figures 6(a) and (c). It should be noted that the focus of that paper was
also on texture scale applied to segmentation. The results compared to the method of
Houhou et al. are more alike, and both methods find the interesting segments in all im-
ages. In Figures 7(e) and (f) our method finds some extra textures which are clearly dis-
tinct. In Figures 7(k) and (l) both methods find segments that are not part of the starfish,
but are clearly distinct textures. There are slight differences in the two methods, e.g. in
Figures 7(a) and (b) where the object is merged with a part of the background in our
method, whereas it is found very nicely in the method of Houhou et al. [8]. An example
in favor of our procedure is Figures 7(m) and (n), where parts of the head and the tail are
not captured well by their method, whereas they are found very well by our procedure.


Fig. 7. Comparative results. This figure shows our results in columns one and three compared to
the results from Houhou et al. [8] in columns two and four.

4 Conclusion
Texture poses a great challenge to segmentation methods, because textural patterns can
be hard to distinguish at a fine scale making precise boundary detection difficult. We
have presented a novel, computationally efficient approach to segmentation of texture
images. To characterize the local structure of the image, we begin by a top-down decom-
position in the form of a hierarchical quadtree. At each level of this tree a contractive
transformation is computed for each node and is iteratively applied to generate a novel
encoding of the sub-images. The hierarchical decomposition is controlled by the stabil-
ity of the encoding associated with nodes (sub-images). The leaves of this quadtree and
their incidence structure with respect to the original image will form a planar graph in a
natural way. The final segmentation will be obtained from a bottom-up merging process
applied to adjacent nodes in the planar graph. We evaluate the technique on artificially
composed textures and natural images, and we observe that the approach compares fa-
vorably to several leading texture segmentation algorithms on these images.

References
1. Pal, N.R., Pal, S.K.: A review on image segmentation techniques. Pattern Recognition 26(9),
1277–1294 (1993)
2. Borenstein, E., Ullman, S.: Class-specific, top-down segmentation. In: Heyden, A., Sparr,
G., Nielsen, M., Johansen, P. (eds.) ECCV 2002. LNCS, vol. 2351, pp. 109–122. Springer,
Heidelberg (2002)

3. Fauzi, M.F.A., Lewis, P.H.: Automatic texture segmentation for content-based image retrieval
application. Pattern Anal. & App. 9(4), 307–323 (2006)
4. Liu, Y., Zhou, X.: Automatic texture segmentation for texture-based image retrieval. In:
MMM (2004)
5. Malik, J., Belongie, S., Shi, J., Leung, T.: Textons, contours and regions: Cue integration in
image segmentation. In: IEEE ICCV, pp. 918–925 (1999)
6. Zeng, G., Van Gool, L.: Multi-label image segmentation via point-wise repetition. In: IEEE
Conference on Computer Vision and Pattern Recognition, 2008. CVPR 2008, pp. 1–8 (June
2008)
7. Hong, B.H., Soatto, S., Ni, K., Chan, T.: The scale of a texture and its application to seg-
mentation. In: 2008 IEEE Conference on Computer Vision and Pattern Recognition, pp. 1–8
(2008)
8. Houhou, N., Thiran, J., Bresson, X.: Fast texture segmentation model based on the shape
operator and active contour. In: CVPR, pp. 1–8 (2008)
9. Bagon, S., Boiman, O., Irani, M.: What is a good image segment? a unified approach to seg-
ment extraction. In: Forsyth, D., Torr, P., Zisserman, A. (eds.) ECCV 2008, Part IV. LNCS,
vol. 5305, pp. 30–44. Springer, Heidelberg (2008)
10. Dahl, A., Bogunovich, P., Shokoufandeh, A., Aanæs, H.: Texture segmentation from context
and contractive maps. Technical report (2009)
11. Brodatz, P.: Textures: A Photographic Album for Artists and Designers (1966)
12. Jacquin, A.E.: Image coding based on a fractal theory of iterated contractive image transfor-
mations. IEEE Transactions on Image Processing 1(1), 18–30 (1992)
13. Fisher, Y.: Fractal Image Compression - Theory and Application. Springer, New York (1994)
14. Alexander, S.: Multiscale Methods in Image Modelling and Image Processing. PhD thesis
(2005)
15. Xu, Y., Wang, J.: Fractal coding based image retrieval with histogram of collage error. In:
Proceedings of 2005 IEEE International Workshop on VLSI Design and Video Technology,
2005, pp. 143–146 (2005)
16. Rudin, W.: Principles of Mathematical Analysis, 3rd edn. McGraw-Hill, New York (1976)
17. van der Vaart, A.W., van Zanten, J.H.: Rates of contraction of posterior distributions based
on Gaussian process priors. The Annals of Statistics 36(3), 1435–1436 (2008)
18. Figueiredo, M.A.T., Jain, A.K.: Unsupervised selection and estimation of finite mixture mod-
els. In: Proc. Int. Conf. Pattern Recognition, pp. 87–90 (2000)
19. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A database of human segmented natural images
and its application to evaluating segmentation algorithms and measuring ecological statistics.
In: ICCV, vol. 2, pp. 416–423 (July 2001)
Image Segmentation Using Graph
Representations and Local Appearance and
Shape Models

Johannes Keustermans¹, Dieter Seghers¹, Wouter Mollemans²,
Dirk Vandermeulen¹, and Paul Suetens¹

¹ Katholieke Universiteit Leuven, Faculties of Medicine and Engineering,
Medical Imaging Research Center (Radiology - ESAT/PSI), University Hospital
Gasthuisberg, Herestraat 49, B-3000 Leuven, Belgium
johannes.keustermans@uz.kuleuven.ac.be
² Medicim nv, Kardinaal Mercierplein 1, 2800 Mechelen, Belgium

Abstract. A generic model-based segmentation algorithm is presented.


Based on a set of training data, consisting of images with correspond-
ing object segmentations, a local appearance and a local shape model are
built. The object is described by a set of landmarks. For each
landmark a local appearance model is built. This model describes the local
intensity values in the image around each landmark. The local shape
model is constructed by considering the landmarks to be vertices in an
undirected graph. The edges represent the relations between neighboring
landmarks. By imposing the Markov property on the graph, every
landmark depends directly only on its neighboring landmarks,
leading to a local shape model. The objective function to be minimized
is obtained from a maximum a-posteriori approach. To minimize this
objective function, the problem is discretized by considering a finite set
of possible candidates for each landmark. In this way the segmentation
problem is turned into a labeling problem. Mean field annealing is used
to optimize this labeling problem. The algorithm is validated for the seg-
mentation of teeth from cone beam computed tomography images and
for automated cephalometric analysis.

1 Introduction
The goal of image segmentation is to partition an image into meaningful dis-
joint regions, whereby these regions delineate different objects of interest in the
observed scene. Segmentation of anatomical structures in medical images is es-
sential in some clinical applications such as diagnosis, therapy planning, visual-
ization and quantification. As manual segmentation of anatomical structures in
two- or three-dimensional medical images is a very subjective and time-consuming
process, there is a strong need for automated or semi-automated image segmen-
tation algorithms.
A large number of segmentation algorithms have been proposed. While earlier
approaches were often based on a set of ad hoc processing steps, optimization

A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 353–365, 2009.

© Springer-Verlag Berlin Heidelberg 2009
354 J. Keustermans et al.

methods have become established as being more powerful and mathematically


founded. These segmentation algorithms formulate an appropriate cost function
that needs to be minimized, thereby obtaining the optimal image segmentation.
In order to formulate this cost function prior knowledge on the object to be
segmented, such as edges [1,2], homogeneity requirements on the statistics of
the regions [3,4] or a combination of both, is needed. The underlying assump-
tion using the edge prior is that the edges present in the image correspond to
boundaries between different objects or between objects and background in the
true scene. Segmentation of an object of interest in the image corresponds in this
case to edge detection. However, edge detection is an ill-posed problem and hence
very sensitive to noise present in the images. Unlike the edge prior, homogeneity
requirements on statistics of the regions are much less sensitive to noise. This
type of prior knowledge assumes a homogeneous appearance of the object in the
image. Other kinds of prior knowledge can deal with for instance smoothness or
length of the object contour, favoring smoother or shorter contours. These types
are more arbitrary.
Segmentation now corresponds to optimizing the cost function. Among the op-
timization methods a distinction can be made between spatially discrete and spa-
tially continuous representations of the object [5]. In spatially continuous
approaches variational methods [4,6], leading to ordinary or partial differential
equations, can be used. Spatially discrete approaches on the other hand need com-
binatorial optimization methods [7,8]. Besides the distinction between spatially
continuous and discrete object representations, a distinction can be made between
explicit and implicit object representations. In the implicit case the object bound-
ary is represented as the zero-level set of some embedding function, whereas in the
explicit case a parametrization by using for instance splines is needed.
The performance of the segmentation algorithms can be improved by using
more prior knowledge on the object to be segmented, like object appearance or
shape. Examples of segmentation algorithms that incorporate knowledge such as
shape or appearance are numerous [6,9,10,11]. To make the segmentation algo-
rithm as generic as possible, a statistical learning approach can be used, in which
prior knowledge on the object to be segmented can be learned from training ex-
amples. In this way, from a set of training data, a model of the object shape or
appearance is built. Next this model is used to segment an image not contained
in the training data set. This kind of prior knowledge can be seen as high-level.
Recently Seghers [12] proposed a new segmentation algorithm that incor-
porates local shape and local appearance knowledge on the object to be seg-
mented. An explicit object representation consisting of a polygon surface mesh
is used. The nodes of this mesh are considered as landmarks. Object segmen-
tation corresponds to finding the optimal location for each landmark. Based on
a set of training data a local appearance model for each landmark is built.
Local appearance means that only the image intensity values in a local re-
gion around the landmark of interest are considered. To describe this local
appearance local image descriptors are used. To incorporate shape information,
a local shape model is used. In this way the stiffness of global shape models is
Image Segmentation Using Graph Representations 355

avoided [13]. To build the local shape model the polygon mesh is considered as an
undirected graph in which the mesh nodes are the vertices and neighboring nodes
are connected by edges. By imposing the Markov property on the graph,
each vertex depends directly only on its neighboring vertices. In this way
a local shape model can be applied. This method does not suffer from the noise-
sensitivity of the edge-detection methods if a good local image descriptor is used.
Nor does this method assume homogeneity of the object appearance.
We extend the framework proposed by Seghers [12] by incorporating kernel
based methods for statistical model building and experimenting with other local
appearance models. We applied this segmentation algorithm to the segmentation
of teeth from Cone Beam Computed Tomography (CBCT) images of a patient.
The recent introduction of CBCT enables the routine computer-aided planning of
orthognathic surgery or orthodontic treatment due to its low radiation dose, unique
accessibility and low cost. These applications however require the segmentation of
certain anatomical structures from the 3D images, like teeth. The CBCT image
quality can be hampered by the presence of several artifacts, like metallic streak
artifacts due to orthodontic braces or dental fillings. The method should be able
to cope with these artifacts. Another application, also in the maxillofacial region,
is automatic 3D cephalometric analysis [14]. 3D Cephalometric analysis consists
of finding anatomical landmarks in 3D medical images of the head of the patient.
Based on the location of these anatomical landmarks for example an orthodontic
or orthognathic treatment planning can be made. Due to its notion of landmarks,
this method is particularly suited to automate this task.

2 Method
2.1 Model Building
The segmentation algorithm presented in this paper belongs to the class of su-
pervised segmentation algorithms. From a training data set of ground truth seg-
mentations a statistical model is built. This training data set consists of images
together with their corresponding object segmentations. These object segmen-
tations are surfaces represented as a polygon mesh. The nodes of this polygon
mesh are seen as landmarks in the image. Each landmark must correspond to the
same location across the training images, i.e. landmark correspondences
between the training data must exist. The next paragraphs describe the model
building procedure. These models are built by estimating the probability density
function from the training data. First the global statistical framework and the
assumptions made are presented. The next paragraphs discuss the local appear-
ance model and the local shape model. The final section explains the probability
density function estimation.

Bayesian inference. The goal of the segmentation algorithm is the optimal


segmentation of the object of interest from the background. This optimal segmen-
tation can be expressed as the segmentation with the highest probability given
the image. Using Bayesian inference this posterior probability can be expressed

in terms of the conditional probability of the image, I, given the segmentation,


G, and the prior probability of the segmentation (equation 1).

P(G|I) = P(I|G) P(G) / P(I).    (1)

The first term P (I|G) is the image prior, the second term, P (G), is the shape prior.
The term in the denominator is a constant and therefore of no interest. Maximizing
the posterior probability is equivalent to minimizing its negative logarithm,

G* = arg min_G (E_I(I, G) + E_S(G)),    (2)

where EI (I, G) is the negative logarithm of the image prior and ES (G) the
negative logarithm of shape prior. In this way a cost function that needs to be
minimized is formulated. The next sections describe respectively the image prior
and the shape prior.

Image Prior. In order to build a model for the image prior two assumptions are
made. The first assumption states that the influence of the segmentation on the
image intensities is only local. This local influence is described by a Local Image
Descriptor (LID). This LID extracts the local intensity patterns around each
landmark in the image. The second assumption states the mutual independence
of these landmark-individual LIDs. Using these assumptions the image prior
term can be rewritten as follows:

P(I|G) = ∏_{i=1}^{n} P(I|l_i) = ∏_{i=1}^{n} P_i(ω_i).    (3)

In this equation li represents the ith landmark, n is the number of landmarks


and ωi represents the LID of landmark li . Furthermore the term EI (I, G) from
equation (2) can now be written as:

E_I(I, G) = Σ_{i=1}^{n} d_i(ω_i),    (4)

where di (ωi ) is the negative logarithm of Pi (ωi ) and represents the intensity
cost of landmark i. As already explained, this LID tries to describe the image
intensity in the local neighborhood of each landmark. In the computer vision
literature several LIDs have been proposed [15]. In this article two LIDs are used: the
Gabor LID and locally orderless images.

Gabor LID. The Gabor LID is the response in a given landmark of a Gabor filter
bank applied to the image. A Gabor filter captures the most optimal localized,
in terms of space-frequency localization, frequency and phase content of a signal.
The filter consists of a Gaussian kernel modulated by a complex sinusoid with
a specific frequency and orientation. These Gabor filters have been found to be
distortion tolerant for pattern recognition tasks [16]. There is also a biological

motivation for using them: the well-known connection to mammalian vision, as


these filters resemble the receptive fields of simple cells in the primary visual
cortex of the brain.
The basic form of the 3D Gabor filter is [17]:
ψ(x, y, z; f) = (|f|³ / (π^{3/2} γηζ)) exp(−(f²/γ² x′² + f²/η² y′² + f²/ζ² z′²)) exp(2jπf x′),    (5)

[x′, y′, z′]ᵀ = R(θ, φ) [x, y, z]ᵀ,

where f is the central frequency of the filter, R (θ, φ) is the rotation matrix
determining the filter orientation and γ, η and ζ control the filter sharpness.
The term |f|³/(π^{3/2} γηζ) is a normalization constant for the filter response. The real-
valued part (cosine) of the Gabor filter captures the symmetric properties and
the imaginary-valued part (sine) the asymmetrical properties of the signal. The
Gabor filter response can also be decomposed into a magnitude and a phase. The
phase behaves oscillatory, while the magnitude is more smooth. Therefore, when
comparing two LIDs, including phase information can lead to better results;
however, using only the response magnitude improves robustness [17].
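A discrete sampling of the 3D Gabor filter of equation (5) might look as follows. This is a sketch only: the grid size, the rotation convention chosen for R(θ, φ) (θ about z, then φ about y), and the default sharpness parameters are our assumptions.

```python
import numpy as np

def gabor_3d(size, f, theta=0.0, phi=0.0, gamma=1.0, eta=1.0, zeta=1.0):
    """Sample the 3D Gabor filter of equation (5) on a cubic grid.

    R(theta, phi) is implemented as a rotation by theta about the z axis
    followed by phi about the y axis (one common convention among several).
    """
    r = np.arange(size) - size // 2
    x, y, z = np.meshgrid(r, r, r, indexing="ij")
    ct, st, cp, sp = np.cos(theta), np.sin(theta), np.cos(phi), np.sin(phi)
    # rotated coordinates [x', y', z']^T = R(theta, phi) [x, y, z]^T
    xr = cp * (ct * x + st * y) - sp * z
    yr = -st * x + ct * y
    zr = sp * (ct * x + st * y) + cp * z
    norm = abs(f) ** 3 / (np.pi ** 1.5 * gamma * eta * zeta)
    envelope = np.exp(-(f ** 2 / gamma ** 2 * xr ** 2
                        + f ** 2 / eta ** 2 * yr ** 2
                        + f ** 2 / zeta ** 2 * zr ** 2))
    # complex carrier along the rotated x axis
    return norm * envelope * np.exp(2j * np.pi * f * xr)

# A tiny filter bank over three orientations (toy parameters).
bank = [gabor_3d(9, 0.2, theta=t) for t in (0.0, np.pi / 4, np.pi / 2)]
```

The filter response at a landmark is then the inner product of a filter with the image patch centered there; magnitude and phase can be split off the complex result as the text describes.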

Locally Orderless Images. The purpose of the LID is to describe the local intensi-
ties around each landmark. A Taylor expansion approximates the local intensities
by a polynomial of some order N . The coefficients of this Taylor expansion are
proportional to the derivatives up to order N . For images these derivatives can
be computed by convolving the image with the derivative of a Gaussian at a
particular scale σ. Instead of directly using the derivatives of the image, locally
orderless images [18] are used. The term locally orderless is used because the
image intensities are replaced by a local intensity histogram, and thus locally
the order of the image is removed. The first few moments of these histograms
are used to construct the feature images. In this way, the LID is defined as
follows: firstly, by computing the derivatives of the image, applying the locally
orderless image technique and computing the first few moments of the local
intensity histograms, a set of feature images is constructed. Subsequently the LIDs
are constructed by taking samples along a spherical or linear profile centered at
the location of the landmark. The linear profile can be defined along the image
gradient in the landmark.
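As an illustration, the moment-based feature images and the linear-profile sampling can be sketched as below. This is a simplified 2D version: a boxcar window stands in for the Gaussian aperture of the locally orderless formulation, and all names are hypothetical.

```python
import numpy as np

def local_moments(img, radius=2):
    """First two moments of the local intensity histogram at every pixel.

    The local histogram of a window around each pixel is summarized by its
    mean and variance, giving two feature images (a boxcar window replaces
    the Gaussian aperture for simplicity).
    """
    pad = np.pad(img, radius, mode="reflect")
    mean = np.zeros(img.shape, dtype=float)
    var = np.zeros(img.shape, dtype=float)
    h, w = img.shape
    for i in range(h):
        for j in range(w):
            patch = pad[i:i + 2 * radius + 1, j:j + 2 * radius + 1]
            mean[i, j] = patch.mean()
            var[i, j] = patch.var()
    return mean, var

def profile_lid(feature, center, direction, n_samples=5, step=1.0):
    """Sample a feature image along a linear profile through a landmark,
    e.g. along the image gradient direction at that landmark."""
    d = np.asarray(direction, dtype=float)
    d /= np.linalg.norm(d)
    offsets = (np.arange(n_samples) - n_samples // 2) * step
    pts = np.round(np.asarray(center) + offsets[:, None] * d).astype(int)
    pts = np.clip(pts, 0, np.array(feature.shape) - 1)
    return feature[pts[:, 0], pts[:, 1]]
```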

Shape Prior. The shape prior introduces the shape model in the Bayesian
framework. In the shape model two assumptions are made. The first assumes
invariance of the model to translations. The second assumes that a landmark
only interacts with its neighbors, thereby implying its local nature.
To define this local shape model, some definitions need to be formulated.
The polygon mesh, representing the object segmentation, can be considered
as an undirected graph G = {V, E} with a set of vertices V = {l1 , . . . , ln },
the landmarks, and a set of edges E, representing the connections between the

landmarks in the mesh. Let N be a neighborhood system defined on V, where


Ni = {lj ∈ V|(li , lj ) ∈ E} denotes the set of neighbors of li . A clique c of the
graph G is a fully connected subset of V. The set of cliques is represented by C.
For each li ∈ V, let Xi be a random variable taking values xi in some discrete or
continuous sample space X . X = {X1 , . . . , Xn } is said to be a Markov random
field if and only if its distribution is strictly positive and the Markov property holds.
By the Hammersley-Clifford theorem [19], any Markov random field can be de-
fined in terms of a decomposition of the distribution over cliques of the graph. In
this way the probability density function can be expressed in terms of potential
functions defined on the cliques,
p(x) = (1/Z) exp(−Σ_{c∈C} V_c(x)),    (6)

where Z represents the partition function.


To estimate the potential functions on the cliques we follow the approach of
Seghers [12]. First the shape energy is computed for a trivial graph consisting
of three nodes V = {l1 , l2 , l3 } and three edges E = {(l1 , l2 ), (l1 , l3 ), (l2 , l3 )}. In
this case the joint probability density function becomes:

p(x_1, x_2, x_3) = p(x_1) p(x_2|x_1) p(x_3|x_1, x_2) ≈ p(x_1) p(x_2|x_1) p(x_3|x_1).    (7)

The approximation in equation (7) is made by considering only the influence of


the edges (l1 , l2 ) and (l1 , l3 ). To obtain an approximation that takes all edges
into account equation (7) is considered for all three possible combinations and
averaged. Equation (8) gives the shape energy of this graph, thereby making use
of the assumption on the translation invariance of the model. Here dij (xi , xj ) =
−log(p(x_j − x_i)).
E(x_1, x_2, x_3) = (2/3) (d_12(x_1, x_2) + d_13(x_1, x_3) + d_23(x_2, x_3)).    (8)
For a general graph, a similar expression as equation (8) can be derived, assuming
that every node has the same number of neighbors and that every edge is equally
important:
E(x) = ((n − 1)/t) Σ_{(l_i, l_j)∈E} d_ij(x_i, x_j).    (9)

In this last equation E(x) corresponds to ES (G), the negative logarithm of the
shape prior.
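Equation (9) can be sketched numerically as follows. Two details are our own assumptions where the text leaves them open: the pairwise cost d_ij is taken as the negative log of a zero-constant Gaussian on the relative landmark position x_j − x_i, and the normalizing count t is taken as the number of edges.

```python
import numpy as np

def pair_cost_factory(mean, inv_cov):
    """d_ij(x_i, x_j) = -log p(x_j - x_i), up to a constant, for a Gaussian
    relative-position model; mean/inv_cov are hypothetical edge statistics
    that would come from the training data."""
    def d(i, j, xi, xj):
        r = np.asarray(xj, float) - np.asarray(xi, float) - mean[(i, j)]
        return 0.5 * r @ inv_cov[(i, j)] @ r
    return d

def shape_energy(x, edges, pair_cost, n):
    """Shape energy of equation (9); t is assumed to be the edge count."""
    t = len(edges)
    return (n - 1) / t * sum(pair_cost(i, j, x[i], x[j]) for i, j in edges)

# Toy triangle graph with zero-mean, unit-covariance edge models.
edges = [(0, 1), (0, 2), (1, 2)]
d = pair_cost_factory({e: np.zeros(2) for e in edges},
                      {e: np.eye(2) for e in edges})
E = shape_energy({0: (0, 0), 1: (1, 0), 2: (0, 1)}, edges, d, n=3)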

Probability Density Function Estimation. Because of the Bayesian infer-


ence used to formulate the cost function, probability density functions need to
be estimated based on the training data. Let χ = {xi ∈ IRn } be a set of observed
samples. The goal of probability density function estimation is to estimate the
underlying distribution function of the observed samples. Both parametric and
non-parametric methods can be used. The parametric methods assume the ob-
served samples originated from a certain distribution function, for instance a

Gaussian, and try to find its parameters. In contrast, the non-parametric


methods, for instance kernel density estimation methods, directly try to approx-
imate the underlying distribution.
In this article two methods are used. The first method assumes the observed
samples x_i ∈ χ are distributed according to a Gaussian distribution:

P(x_i) ∝ exp(−½ (x_i − x̃)ᵀ Σ⊥⁻¹ (x_i − x̃)),    (10)
where x̃ represents the sample mean and Σ⊥ is the regularized sample covariance
matrix [6]. If the observed samples span a lower-dimensional subspace of Rn ,
the sample covariance matrix Σ is singular and regularization of this sample
covariance matrix is needed:

Σ⊥ = Σ + λ⊥ (I − V Vᵀ),    (11)
where V is the matrix of the eigenvectors of Σ. λ⊥ is a constant replacing all zero
eigenvalues, λ⊥ ∈ [0, λr ], with λr being the smallest non-zero eigenvalue of Σ. A
reasonable choice for the regularization parameter λ⊥ is λr/2 [6]. This approach
actually comes down to performing Principal Component Analysis (PCA) and
further on we will refer to this approach as PCA.
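The regularized Gaussian model of equations (10) and (11) can be sketched as follows. Note that adding λ⊥(I − V Vᵀ) is equivalent to replacing the zero eigenvalues of Σ by λ⊥; the numerical tolerance below is an assumption.

```python
import numpy as np

def regularized_gaussian_cost(samples):
    """Negative log of equation (10) using the regularized covariance (11).

    Zero eigenvalues of the sample covariance are replaced by
    lambda_perp = lambda_r / 2, half the smallest non-zero eigenvalue.
    """
    X = np.asarray(samples, dtype=float)
    mean = X.mean(axis=0)
    cov = np.cov(X, rowvar=False, bias=True)
    evals, evecs = np.linalg.eigh(cov)
    nonzero = evals > 1e-10                      # tolerance is an assumption
    lam_r = evals[nonzero].min()
    evals_reg = np.where(nonzero, evals, lam_r / 2.0)
    cov_inv = (evecs / evals_reg) @ evecs.T       # V diag(1/lambda) V^T
    def cost(x):
        d = np.asarray(x, float) - mean
        return 0.5 * d @ cov_inv @ d
    return cost

# Samples spanning a 1D subspace of R^2: the covariance is singular,
# so the regularization is exercised.
cost = regularized_gaussian_cost([[0, 0], [1, 1], [2, 2], [3, 3]])
```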
The second method is adopted from Cremers [6]. This approach comes down
to a nonlinear mapping φ : Rn → F of the observed data to a higher, possibly in-
finite, dimensional feature space F . In this feature space a Gaussian distribution
is presumed:

P(x_i) ∝ exp(−½ φ̃(x_i)ᵀ Σφ⁻¹ φ̃(x_i)).    (12)
2
In this equation φ̃(xi ) represents the centered nonlinear mapping of a sample xi
to a higher dimensional feature space and Σφ represents the sample covariance
matrix in feature space. As discussed for the previous method, regularization
of the sample covariance matrix is necessary and can be performed analogously
in this case. It turns out that there is no need to ever compute the nonlinear
mapping φ. Only scalar products in the feature space need to be computed and,
by use of the Mercer theorem [20], these can be evaluated using a positive definite
kernel function: k(x_i, x_j) = ⟨φ(x_i), φ(x_j)⟩. The Gaussian kernel is used:

k(x_i, x_j) = exp(−‖x_i − x_j‖² / (2σ²)),    (13)

where the kernel parameter σ is set from the mean squared nearest-neighbor distance
of the samples. Important to note is that this method is closely related to Kernel
Principal Component Analysis (KPCA) [21], therefore, further on we will refer
to this method as KPCA.
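The kernel construction of equation (13) can be sketched as follows. To keep the units consistent we read the rule as σ² equal to the mean squared nearest-neighbor distance of the samples; that reading is an assumption.

```python
import numpy as np

def gaussian_kernel_matrix(X):
    """Gram matrix of equation (13), with sigma^2 set to the mean squared
    nearest-neighbour distance of the samples (our reading of the text)."""
    X = np.asarray(X, dtype=float)
    sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # squared distances
    nn = np.where(np.eye(len(X), dtype=bool), np.inf, sq).min(axis=1)
    sigma2 = nn.mean()
    return np.exp(-sq / (2.0 * sigma2))

K = gaussian_kernel_matrix([[0.0], [1.0], [3.0]])
```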

2.2 Model Fitting


In this section the model, trained based on a set of training images, is fit to
an image not contained in the training set. The model fitting comes down to

the optimization of the cost function (2). In the continuous domain this cost
function has many local minima. Optimization techniques in this domain, like
gradient descent, therefore often fail to reach the global minimum. This can be
avoided by the discretization of the cost function and the use of combinatorial
optimization techniques.
Discretization. The discretization of the cost function is performed by impos-
ing a discrete sample space X on the graph G. This discrete sample space consists
of a finite set of possible landmark locations. In this case the optimization prob-
lem comes down to the selection of the optimal possible landmark location for
each landmark. These possible landmark locations for each landmark are ob-
tained by evaluating the intensity cost di (ωi ) in a search grid located around
the landmark of interest and selecting the m locations with lowest cost. This
results in a set of candidates x_i = {x_ik}_{k=1}^{m} for every landmark. The
optimization problem now becomes a labeling problem: r = {r_1, . . . , r_n}. The
following conditions must hold: Σ_{k=1}^{m} r_ik = 1, with r_ik = 1 if candidate k
is selected. The resulting discrete cost function to be minimized becomes

r* = arg min_r ( Σ_{i=1}^{n} Σ_{k=1}^{m} r_ik d_i(ω_ik) + γ Σ_{i=1}^{n} Σ_{k=1}^{m} Σ_{j=1}^{n} Σ_{o=1}^{m} r_ik r_jo d_ij(x_ik, x_jo) ),    (14)
where γ is a constant that determines the relative weight of the image and shape
prior. Important to note here is that we assume that all images in the training
data set are rigidly registered to a reference image, using for example mutual
information [22]. Any image not contained in the training data set, can also be
registered to this reference image. In this way an initial guess concerning the
location of every landmark can be made and a grid of possible candidates can
be generated.
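The candidate-generation step can be sketched as follows (a 2D toy version; the square search grid and the intensity-cost callable are placeholders for the paper's actual grid and d_i):

```python
import numpy as np

def candidate_grid(intensity_cost, init, radius, m):
    """Pick the m grid positions around an initial landmark estimate with
    the lowest intensity cost d_i (2D sketch of the discretization step)."""
    pts = [(init[0] + dx, init[1] + dy)
           for dx in range(-radius, radius + 1)
           for dy in range(-radius, radius + 1)]
    costs = [intensity_cost(p) for p in pts]
    order = np.argsort(costs)[:m]      # m cheapest candidates
    return [pts[k] for k in order]
```

For example, with a toy cost that is the squared distance to (5, 5) and an initial guess at (4, 4), the single best candidate returned is (5, 5).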
Optimization. Currently two classes of methods are the most prominent ones
in discrete Markov random field optimization: the methods based on graph-
cuts [7] and those based on message-passing [23]. Examples of the message-
passing methods are belief propagation [24] and the so-called tree-reweighted
message passing algorithms [25,26]. The methods based on graph-cuts, however,
cannot be used to minimize our cost function (14) because it is not graph-
representable [27].
Another method to minimize our cost function is mean field annealing [19]. By
considering Ri to be a random variable taking values ri in some discrete sample
space R containing the labels of the labeling problem, R = {R1 , . . . , Rn } is
said to be a Markov random field under certain conditions (section 2.1). The
probability density function of this Markov random field can be written as

P(r) = (1/Z_r) exp(−E(r)/T),    (15)
where E(r) is equal to the cost function in equation (14) and an artificial param-
eter T , called temperature, is added. The solution of equation (14) corresponds
Image Segmentation Using Graph Representations 361

to the configuration of the Markov random field with highest probability. The
temperature T can be altered without altering the most probable configuration
r*, while the mean configuration r̄ varies with T as follows:

lim_{T→0+} r̄_T = lim_{T→0+} Σ_r r P(r) = r*.    (16)

Therefore, instead of solving equation (14), it is also possible to estimate the


mean field r̄_T at a sufficiently high temperature and then track it as the
temperature is decreased.
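A minimal mean field annealing loop for the labeling problem of equation (14) might look as follows. The cooling schedule, sweep count, and data layout are our assumptions; the paper does not specify them at this point.

```python
import numpy as np

def mean_field_annealing(unary, pairwise, gamma=1.0,
                         T0=10.0, T_min=0.01, cooling=0.9, sweeps=10):
    """Mean field annealing for the labeling problem of equation (14).

    unary[i][k] is d_i(omega_ik); pairwise[(i, j)] is an m x m matrix with
    entries d_ij(x_ik, x_jo). Returns the selected candidate per landmark.
    """
    n, m = unary.shape
    q = np.full((n, m), 1.0 / m)          # soft label assignments (mean field)
    T = T0
    while T > T_min:
        for _ in range(sweeps):
            for i in range(n):
                # mean field seen by landmark i: unary cost plus the expected
                # pairwise cost against the current soft labels of neighbors
                field = unary[i].copy()
                for (a, b), d in pairwise.items():
                    if a == i:
                        field += gamma * d @ q[b]
                    elif b == i:
                        field += gamma * d.T @ q[a]
                e = np.exp(-(field - field.min()) / T)   # stabilized softmax
                q[i] = e / e.sum()
        T *= cooling                       # anneal the temperature
    return q.argmax(axis=1)
```

As T decreases, each q[i] concentrates on one candidate, recovering a hard labeling.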

3 Experiments and Results


3.1 Segmentation of Teeth
The method discussed in this paper is applied to two medical applications. The
first application is the segmentation of teeth from 3D Cone Beam Computed
Tomography (CBCT) images. This is a rather difficult application because the
teeth are fixated in the jaw bone. Therefore there is a lack of contrast between
the bone and the teeth, certainly at the level of the apex, and in the case of
non-mature teeth. Besides this lack of contrast there are several artifacts in
the images, mainly caused by the orthodontic braces and dental fillings of the
patients. These cause metallic streak artifacts in the images. Our method should
be able to cope with such problems.
A training data set of 22 patients is used, in which the upper left canine of each
patient is manually segmented. Patients both with and without orthodontic braces are
included in the training data set. The segmentation procedure is validated using a
leave-one-out procedure on this data set. Locally orderless images are used as the
local image descriptor. These are sampled along a linear profile in
the direction of the image gradient, with a sample distance equal to the voxel
size and a certain length. For the probability density estimation only KPCA is
used since this gives the best results. Figure 1 shows an example of a segmented
tooth. This segmentation is compared to the manual segmentation by means of
a global overlap coefficient and the distance between the surfaces. The distance
between the surfaces is shown in Figure 1. As can be seen, there is a large error
at the apex of the tooth. This is caused by the lack of contrast at this level. The
global overlap coefficient is computed as follows:
Ω = TP / (TP + FP + FN)   (17)
where TP stands for true positive (area correctly classified), FP for false positive
(area incorrectly classified as object), and FN for false negative (area incorrectly
classified as background). The global overlap coefficient in this case is equal to
85.77 %.
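Equation (17) is the Jaccard overlap of the automatic and manual segmentations. A minimal sketch on binary masks (function and variable names are ours):

```python
import numpy as np

def global_overlap(seg, gt):
    """Global overlap coefficient of equation (17): TP / (TP + FP + FN)."""
    seg, gt = seg.astype(bool), gt.astype(bool)
    tp = (seg & gt).sum()        # object voxels correctly classified
    fp = (seg & ~gt).sum()       # background classified as object
    fn = (~seg & gt).sum()       # object classified as background
    return tp / (tp + fp + fn)
```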
The Gabor local image descriptors perform worse in this case. This is caused
by the presence of the metallic streak artifacts in the images. For landmarks
362 J. Keustermans et al.
Fig. 1. An example of a segmented tooth is shown in figure (a). Figures (b) and (c)
show the distance of this segmentation to the manual segmentation. The distances
are indicated by a color coding, in which blue indicates a distance of 0 mm and red
indicates a distance of 0.8 mm, being twice the voxel size.
Fig. 2. Box plots of the error distances between the true and found location for each
anatomical landmark
located further away from these artifacts, a performance similar to that of the
locally orderless images is obtained. The locally orderless images are less sensitive
to the metallic streak artifacts because they are defined more locally.
3.2 Automatic 3D Cephalometric Analysis
The second application is automatic 3D cephalometric analysis. In this application,
the method has to locate 17 anatomical landmarks in 3D CBCT images of
the head. The anatomical landmarks are nasion, sella, porion (left and right),
orbitale (left and right), upper incisor (left and right), lower incisor (left and
right), gonion (left and right), menton, anterior nasal spine, A-point, B-point
and posterior nasal spine [14]. The training data set consisted of 37 patients,
most of whom had orthodontic braces. A leave-one-out procedure is
performed and the errors for each anatomical landmark are reported. The local
image descriptor (LID) used is a Gabor filter bank, again containing 72 Gabor filters
with 9 different orientations and 8 different frequencies. To compare the Gabor filter
responses, both magnitude and phase are used, since this improved the results. KPCA is
used to estimate the probability density functions. Figure 2 shows the results of
this procedure. In this figure box plots of the error values are shown.
4 Discussion and Conclusion
A supervised model-based segmentation algorithm is presented that incorporates
both local appearance and local shape characteristics. The object is described
by a set of landmarks. The local appearance model describes the local intensity
values around each landmark. The local shape model is built by considering the
position of each landmark in relation to its neighbors. This is performed
by imposing the Markov property on a graph. The discretization of the
objective function, obtained from a maximum a-posteriori approach, converts
the segmentation problem into a labeling problem, which can be efficiently solved
using robust combinatorial optimization methods. The performance of the
algorithm is validated on two medical applications: the segmentation of teeth
from CBCT images and automated 3D cephalometric analysis.
Improvements can still be made by using more advanced combinatorial op-
timization methods, like belief propagation and tree-reweighted message pass-
ing [23,24,25,26]. Also the use of other local image descriptors, like scale-invariant
feature transforms [28], is to be investigated. At the level of the probability den-
sity function estimation more robust and sparse methods might be used, for
instance based on statistical learning theory [29]. Finally, we also note that this
method is closely related to the so-called Elastic Bunch Graph Matching approach
for face recognition [30].
Acknowledgment. This research has been supported by Medicim nv. The authors
wish to thank Medicim nv for the data provided and for their comments.
References
1. Kass, M., Witkin, A., Terzopoulos, D.: Snakes: Active Contour Models. Interna-
tional Journal of Computer Vision 1(4), 321–331 (1988)
2. Caselles, V., Kimmel, R., Sapiro, G.: Geodesic Active Contours. International Jour-
nal of Computer Vision 22(1), 66–79 (1997)
3. Mumford, D., Shah, J.: Optimal approximation by piecewise smooth functions and
associated variational problems. Communications on Pure and Applied Mathemat-
ics 42, 577–685 (1989)
4. Chan, T.F., Vese, L.A.: Active Contours Without Edges. IEEE Transactions on
Image Processing 10(2), 266–277 (2001)
5. Cremers, D., Rousson, M., Deriche, R.: A Review of Statistical Approaches to Level
Set Segmentation: Integrating Color, Texture, Motion and Shape. International
Journal of Computer Vision 72(2), 195–215 (2007)
6. Cremers, D.: Statistical Shape Knowledge in Variational Image Segmentation. Uni-
versität Mannheim (2002)
7. Boykov, Y., Veksler, O., Zabih, R.: Fast Approximate Energy Minimization
via Graph Cuts. IEEE Transactions on Pattern Analysis and Machine Intelli-
gence 23(11), 1222–1239 (2001)
8. Boykov, Y., Funka-Lea, G.: Graph Cuts and Efficient N-D Image Segmentation.
International Journal of Computer Vision 70(2), 109–131 (2006)
9. Cootes, T.F., Taylor, C.J., Cooper, D.H., Graham, J.: Active Shape Models - Their
Training and Application. Computer Vision and Image Understanding 61(1), 38–59
(1995)
10. Cootes, T.F., Edwards, G.J., Taylor, C.J.: Active Appearance Models. IEEE
Transactions on Pattern Analysis and Machine Intelligence 23(6), 681–685 (2001)
11. Cremers, D., Osher, S.J., Soatto, S.: Kernel Density Estimation and Intrinsic Align-
ment for Shape Priors in Level Set Segmentation. International Journal of Com-
puter Vision 69(3), 335–351 (2006)
12. Seghers, D.: Local graph-based probabilistic representation of object shape and
appearance for model-based medical image segmentation. Katholieke Universiteit
Leuven (2008)
13. Seghers, D., Hermans, J., Loeckx, D., Maes, F., Vandermeulen, D., Suetens, P.:
Model-Based Segmentation Using Graph Representations. In: Metaxas, D., Axel,
L., Fichtinger, G., Székely, G. (eds.) MICCAI 2008, Part I. LNCS, vol. 5241, pp.
393–400. Springer, Heidelberg (2008)
14. Swennen, G.R.J., Schutyser, F., Hausamen, J.-E.: Three-Dimensional Cephalome-
try, A Color Atlas and Manual. Springer, Heidelberg (2006)
15. Ilonen, J.: Supervised Local Image Feature Detection. Lappeenranta University of
Technology (2007)
16. Lampinen, J., Oja, E.: Distortion tolerant pattern recognition based on self-
organizing feature extraction. IEEE Transactions on Neural Networks 6, 539–547
(1995)
17. Kämäräinen, J.-K.: Feature extraction using Gabor filters. Lappeenranta Univer-
sity of Technology (2003)
18. Koenderink, J.J., Van Doorn, A.J.: The Structure of Locally Orderless Images.
International Journal of Computer Vision 31, 159–168 (1999)
19. Li, S.Z.: Markov Random Field Modeling in Computer Vision. Springer, Heidelberg
(1995)
20. Courant, R., Hilbert, D.: Methods of Mathematical Physics. Interscience Publish-
ers, Inc., New York (1953)
21. Schölkopf, B., Smola, A., Müller, K.R.: Nonlinear component analysis as a kernel
eigenvalue problem. Neural Computation 10, 1299–1319 (1998)
22. Maes, F., Collignon, A., Vandermeulen, D., Marchal, G., Suetens, P.: Multimodal-
ity image registration by maximization of mutual information. IEEE transactions
on medical imaging 16(2), 187–198 (1997)
23. Komodakis, N., Paragios, N., Tziritas, G.: MRF Optimization via Dual Decompo-
sition: Message-Passing Revisited. In: ICCV 2007. IEEE 11th International Con-
ference on Computer Vision, pp. 1–8 (October 2007)
24. Felzenszwalb, P.F., Huttenlocher, D.P.: Efficient Belief Propagation for Early Vi-
sion. International Journal of Computer Vision 70(1) (October 2006)
25. Wainwright, M.J., Jaakkola, T.S., Willsky, A.S.: MAP Estimation Via Agreement
on Trees: Message-Passing and Linear Programming. IEEE Transactions on Infor-
mation Theory 51(11), 3697–3717 (2005)
26. Kolmogorov, V.: Convergent Tree-reweighted Message Passing for Energy Mini-
mization. IEEE Transactions on Pattern Analysis and Machine Intelligence 28(10),
1568–1583 (2006)
27. Kolmogorov, V., Zabih, R.: What Energy Functions Can Be Minimized via Graph
Cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence 26(2),
147–159 (2004)
28. Lowe, D.G.: Distinctive image features from scale-invariant keypoints. International
Journal of Computer Vision 60, 91–110 (2004)
29. Vapnik, V.N.: Statistical Learning Theory. Wiley, New York (1998)
30. Wiskott, L., Fellous, J.-M., Krüger, N., von der Malsburg, C.: Face Recognition
by Elastic Bunch Graph Matching. In: Intelligent Biometric Techniques in Fingerprint
and Face Recognition, ch. 11, pp. 355–396 (1999)
Comparison of Perceptual Grouping Criteria
within an Integrated Hierarchical Framework
R. Marfil and A. Bandera
Grupo ISIS, Dpto. Tecnología Electrónica, University of Málaga,
Campus de Teatinos, 29071 Málaga, Spain
{rebeca,ajbandera}@uma.es
Abstract. The efficiency of a pyramid segmentation approach mainly
depends on the graph selected to encode the information within each
pyramid level, on the reduction or decimation scheme used to build one
graph from the graph below, and on the criteria employed to decide
whether two adjacent regions are similar. This paper evaluates three
pairwise comparison functions for perceptual grouping within a generic
framework for perceptual image segmentation. This framework integrates
the low–level definition of segmentation with a domain–independent
perceptual grouping. The performance of the framework using the different
comparison functions has been quantitatively evaluated with respect to
ground-truth segmentation data using the Berkeley Segmentation Dataset
and Benchmark, providing satisfactory scores.
1 Introduction
In a general framework for image processing, perceptual grouping can be defined
as the process that organizes low–level image features into higher–level
relational structures [1]. Handling such high–level features instead of image
pixels offers several advantages, such as reducing the computational complexity
of further processes like scene understanding. It also provides an intermediate
level of description (shape, spatial relationships) for data, which is more suit-
able for object recognition tasks. The simplest approach for perceptual grouping
consists in grouping pixels into higher level structures based on low–level de-
scriptors such as color or texture. However, these approaches cannot deal with
natural, real objects, as these are usually non–homogeneous patterns composed
of different low–level descriptors. This implies that neither low–level image
features, such as brightness or color, nor abrupt changes in some of these image
features can produce a complete, good final segmentation. Nearly 85 years
ago, Wertheimer [2] formulated the importance of the whole rather than of its
individual components, pointing out the importance of perceptual grouping and
organization in visual perception. The Gestalt principles can be applied to image
segmentation [11]. For instance, it may be relevant to group two regions with
close and continuous borders, as those may be two parts of the same object.
Other criteria are useful too, such as compactness, similarity or symmetry.
A. Torsello, F. Escolano, and L. Brun (Eds.): GbRPR 2009, LNCS 5534, pp. 366–375, 2009.
© Springer-Verlag Berlin Heidelberg 2009
Comparison of Perceptual Grouping Criteria 367
As taking into account the Gestalt principles to group pixels into higher level
structures is computationally complex, perceptual segmentation approaches typ-
ically integrate a pre–segmentation stage with a subsequent perceptual grouping
stage. Basically, the first stage conducts the low–level definition of segmenta-
tion as a process of grouping pixels into homogeneous clusters, meanwhile the
second stage performs a domain–independent grouping of the pre–segmentation
regions which is mainly based on properties like the proximity, similarity, closure
or continuity. In this paper, both stages perform a perceptual organization of
the image which is described by a hierarchy of partitions ordered by inclusion.
The base of this hierarchy is the whole image, and each level represents the
image at a certain scale of observation [3]. This hierarchy has been structured
using a Bounded Irregular Pyramid (BIP) [4]. The data structure of the BIP is a
mixture of regular and irregular data structures, and it has been previously em-
ployed by color–based segmentation approaches [4,5]. Experimental results have
shown that, although computationally efficient, these segmentation approaches
are excessively affected by the shift–variance problem [4,5]. In this paper, the
original decimation strategy has been modified to solve this problem increasing
the degree of mixture between the regular and irregular parts of the BIP data
structure. The pre–segmentation stage of the proposed perceptual grouping ap-
proach uses this decimation scheme to accomplish a color-based segmentation of
the input image. Experimental results have shown that the shift-variance prob-
lem is significantly reduced without an increase of the computational cost. On
the other hand, the second stage groups the set of blobs into a smaller set of
regions taking into account a pairwise comparison function derived from the
Gestalt theory. To achieve this second stage, the proposed approach generates a
set of new pyramid levels over the previously built pre–segmentation pyramid.
At this stage, we have tested three pairwise comparison functions to determine
if two nodes must be grouped. The rest of this paper is organized as follows:
Section 2 describes the proposed approach and the three implemented compari-
son functions. Experimental results revealing the efficacy of these functions are
described in Section 3. The paper concludes along with discussions and future
work in Section 4.

2 Natural Image Segmentation Approaches
2.1 Pre–segmentation Stage
The pre–segmentation stage groups the image pixels into a set of photometric
homogeneous regions (blobs) whose spatial distribution is physically represen-
tative of the image content. This grouping is hierarchically conducted and the
output is organized as a hierarchy of graphs which uses the BIP as data struc-
ture. Let Gl = (Nl , El ) be a hierarchy level where Nl stands for the set of regular
and irregular nodes and El for the set of intra-level arcs. Let ξx be the neigh-
borhood of the node x defined as {y ∈ Nl : (x, y) ∈ El }. It can be noted that
a given node x is not a member of its neighborhood, which can be composed
by regular and irregular nodes. At this stage, each node x has associated a vx
368 R. Marfil and A. Bandera
value given by the averaged CIELab color of the image pixels linked to x.
Besides, each regular node has an associated boolean value hx , the homogeneity [5].
Only regular nodes whose hx equals 1 are considered to be part of the
regular structure. Regular nodes with a homogeneity value equal to 0 are not
considered for further processing. At the base level of the hierarchy G0 , all nodes
are regular, and they have hx equal to 1. In order to divide the image into a
set of homogeneous colored blobs, the graph Gl is transformed in Gl+1 using
a pairwise comparison of neighboring nodes [6]. At the pre–segmentation stage,
the pairwise comparison function, g(vx1 , vx2 ), is true if the Euclidean distance
between the CIELab values vx1 and vx2 is under a user–defined threshold Uv .
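This predicate transcribes directly; a sketch, with the threshold value Uv = 5 borrowed from the experiments reported later:

```python
import numpy as np

def g(v1, v2, Uv=5.0):
    """Pre-segmentation pairwise comparison: true when the Euclidean distance
    between two CIELab colour values is under the user-defined threshold Uv."""
    return float(np.linalg.norm(np.asarray(v1, float) - np.asarray(v2, float))) < Uv
```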
As it was aforementioned, the decimation algorithm proposed to build the BIP
by [4,5] has been modified to increase the degree of mixture between the regular
and irregular parts of the BIP data structure. The new decimation algorithm
runs two consecutive steps to obtain the set of nodes Nl+1 from Nl . The first
step generates the set of regular nodes of Gl+1 from the regular nodes at Gl and
the second one determines the set of irregular nodes at level l+1. Contrary to
previously proposed algorithms [4,5], this second process employs a union-find
process which is simultaneously conducted over the set of regular and irregular
nodes of Gl which do not present a parent in the upper level l+1. The decimation
process consists of the following steps:
1. Regular decimation process. The hx value of a regular node x at level l+1
is set to 1 if the four regular nodes immediately underneath {yi } are similar
and their h{yi } values are equal to 1. That is, hx is set to 1 if
⋀_{yj ,yk ∈{yi }} g(vyj , vyk ) ∩ ⋀_{yj ∈{yi }} hyj   (1)

Besides, at this step, inter-level arcs among regular nodes at levels l and l+1
are established. If x is an homogeneous regular node at level l+1 (hx ==1),
then the set of four nodes immediately underneath {yi } are linked to x.
2. Irregular decimation process. Each irregular or regular node x ∈ Nl without
parent at level l+1 chooses the closest neighbor y according to the vx value.
Besides, this node y must be similar to x. That is, the node y must satisfy
{||vx − vy || = min(||vx − vz || : z ∈ ξx )} ∩ {g(vx , vy )} (2)
If this condition is not satisfied by any node, then a new node x′ is generated
at level l+1. This node will be the parent node of x. Besides, it will
constitute a root node, and the set of nodes linked to it at the base level will be
an homogeneous set of pixels according to the defined criteria. On the other
hand, if y exists and it has a parent z at level l+1, then x is also linked to
z. If y exists but it does not have a parent at level l+1, a new irregular node
z is generated at level l+1. In this case, the nodes x and y are linked to z.
This process is sequentially performed and, when it finishes, each node of
Gl is linked to its parent node in Gl+1 . That is, a partition of Nl is defined.
It must be noted that this process constitutes an implementation of the
union-find strategy [5].
Table 1. Shift variance values for different decimation processes. Average values have
been obtained from 30 color images from the Waterloo and Coil 100 databases (all
images resized to 128×128 pixels).
MIS [7] D3P [8] MIES [9] BIP [5] Modified BIP
SVmin 39.9 31.8 23.7 25.6 19.5
SVave 59.8 49.1 44.1 73.8 43.7
SVmax 101.1 75.3 77.2 145.0 73.2
3. Definition of intra-level arcs. The set of edges El+1 is obtained by defining
the neighborhood relationships between the nodes Nl+1 . Two nodes at level
l+1 are neighbors if their reduction windows, i.e. the sets of nodes linked to
them at level l, are connected at level l.
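A much-simplified, scalar-valued sketch of the irregular decimation of step 2 above; the parent-dictionary bookkeeping and data layout are ours and stand in for the full union-find machinery of [5]:

```python
def irregular_decimation(values, neighbors, g):
    """Each node without a parent links to the parent of its closest similar
    neighbour, or becomes the root of a new parent node (step 2 above).

    values[i]    scalar feature of node i (stands in for the CIELab value)
    neighbors[i] adjacency list of node i
    g(a, b)      pairwise similarity predicate
    """
    parent = {}
    next_id = 0
    for x in range(len(values)):
        if x in parent:                       # already adopted by a parent
            continue
        similar = [y for y in neighbors[x] if g(values[x], values[y])]
        if not similar:                       # no similar neighbour: new root
            parent[x] = next_id
            next_id += 1
            continue
        y = min(similar, key=lambda y: abs(values[x] - values[y]))
        if y not in parent:                   # create a shared parent for x and y
            parent[y] = next_id
            next_id += 1
        parent[x] = parent[y]                 # union: x joins y's parent
    return parent
```

On a four-node chain with two clearly separated value clusters, the sketch produces two parent nodes, one per cluster.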
When the decimation scheme proposed in [4,5] is used, regular and irregular
nodes of level l cannot be linked to the same parent node at level l+1. This
causes the image partition to vary when the base of the pyramid is shifted
slightly or rotated (shift variance, SV). In this case, the decimation process
has been simplified, allowing regular and irregular nodes to be grouped together.
Table 1 shows the obtained results from the evaluation of several decimation
schemes using the SV test. This test compares the segmentation of an image
by a given algorithm with the segmentation produced by the same algorithm
on slightly shifted versions of the same image. To do that, we have taken a
128x128 pixel window in the center of the original image. We have compared the
segmentation of this subimage with each segmented image obtained by shifting
the window a maximum shift of 11 pixels to the right and 11 pixels down. Thus,
there is a total of 120 images to compare with the original one. In order to do
each comparison between a segmented shifted image and the segmented original
one, the root mean square difference is calculated [4]. It must be noted that the
smaller the value of this parameter, the better the segmentation result should be.
Experimental results show that the modified BIP decimation scheme is robust
against slight shifts of the input image.
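The SV test can be sketched as follows; the `segment` callable stands in for any segmentation algorithm, and a grid of 11×11 shift positions (origin included) yields the 120 comparisons mentioned above. All names are ours:

```python
import numpy as np

def shift_variance(segment, image, window=128, n_shifts=11):
    """Compare the segmentation of a centred window against segmentations of
    shifted copies of the window, via the root-mean-square difference."""
    cy = (image.shape[0] - window) // 2
    cx = (image.shape[1] - window) // 2
    ref = segment(image[cy:cy + window, cx:cx + window])
    diffs = []
    for dy in range(n_shifts):
        for dx in range(n_shifts):
            if dy == 0 and dx == 0:
                continue                      # skip the unshifted reference
            sub = image[cy + dy:cy + dy + window, cx + dx:cx + dx + window]
            diffs.append(float(np.sqrt(np.mean((ref - segment(sub)) ** 2))))
    return min(diffs), sum(diffs) / len(diffs), max(diffs)
```

The three returned values correspond to the SVmin, SVave and SVmax rows of Table 1 (smaller is better).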
Let I ⊂ ℤ2 be the domain of definition of the input image and l ∈ ℕ a level
of the hierarchy; the pre–segmentation assigns a partition Pl to the couple (I, l)
which is defined by the sets of image pixels linked to the nodes at level l. In this
process, the effective set of pyramid levels is restricted to the interval [0, lm ]. At
l = 0, each pixel of the input image is an individual blob of the partition P0 . If
l ≥ lm , then the partition Pl is equal to Plm . That is, the structure hierarchy
stops growing at a certain level lm when it is no longer possible to link together
any more nodes because they are not similar according to the given pairwise
comparison function.
2.2 Perceptual Grouping Stage
After the local similarity pre–segmentation stage, grouping regions aims at sim-
plifying the content of the obtained partition. For managing this grouping, the
irregular structure is used: the roots of the pre–segmented blobs at level lm
constitute the first level of the perceptual grouping multiresolution output. Suc-
cessive levels can be built using the decimation scheme described in Section 2.1.
Let Plm be the image partition provided by the pre–segmentation stage and
l > lm , l ∈ ℕ, a level of the hierarchy; this second stage assigns a partition Ql to
the couple (Plm , l), satisfying that Qlm is equal to Plm and that
∃ ln ∈ ℕ+ : Ql = Qln , ∀l ≥ ln   (3)
That is, the perceptual grouping process is iterated until the number of nodes re-
mains constant between two successive levels. In order to achieve the perceptual
grouping process, a perceptual pairwise comparison function must be defined.
Three functions are evaluated in this paper:
Edge and Region Attributes Integration (ERAI). In this case, the pair-
wise comparison function g(vyi , vyj ) is implemented as a thresholding process,
i.e. it is true if a distance measure between both nodes is under a given threshold
Up , and false otherwise. The defined distance integrates edge and region descrip-
tors. Thus, it has two main components: the color contrast between image blobs
and the edges of the original image computed using the Canny detector. In or-
der to speed up the process, a global contrast measure is used instead of a local
one. This allows working with the nodes of the current working level, increasing
the computational speed. This contrast measure is complemented with internal
regions properties and with attributes of the boundary shared by both regions.
The distance between two nodes yi ∈ Nl and yj ∈ Nl , ϕα (yi , yj ), is defined as
ϕα (yi , yj ) = [ d(yi , yj ) · min(byi , byj ) ] / [ α · cyi yj + (byi yj − cyi yj ) ]   (4)
where d(yi , yj ) is the color distance between yi and yj . byi is the perimeter of
yi , byi yj is the number of pixels in the common boundary between yi and yj
and cyi yj is the number of pixels in the common boundary that correspond to
edges detected by the Canny detector. α is a constant value used to
control the influence of the Canny edges in the grouping process. Fig. 1 shows the
perceptual segmentation results obtained for two threshold values and different
pyramid levels.
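Equation (4) transcribes directly; a sketch in which all region statistics passed to the function are made-up illustrative values:

```python
def erai_distance(d, b_i, b_j, b_ij, c_ij, alpha=1.0):
    """ERAI distance of equation (4).

    d      colour distance d(yi, yj) between the two regions
    b_i/j  perimeters of yi and yj
    b_ij   number of pixels in the common boundary
    c_ij   common-boundary pixels lying on a Canny edge
    alpha  weight of the Canny edges in the grouping process
    """
    return d * min(b_i, b_j) / (alpha * c_ij + (b_ij - c_ij))
```

When no Canny edge supports the common boundary (c_ij = 0), the distance reduces to d · min(b_i, b_j) / b_ij, independently of α.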
Minimum Internal Contrast Difference and External Contrast (IDEC).
In the hierarchy of partitions, Haxhimusa and Kropatsch [10] define a pairwise
merge criterion which uses the minimum internal contrast difference and the
external contrast. In this work, we have tested a slightly modified version of this
criterion. In order to merge two nodes yi ∈ Nl and yj ∈ Nl , the pairwise merge
criterion is defined as
Comp(yi , yj ) = 1 if Ext(yi , yj ) ≤ P Int(yi , yj ), and 0 otherwise   (5)
where P Int(·, ·) and Ext(·, ·) estimate the minimum internal contrast difference
and the external contrast between two nodes, respectively. If the set of nodes in
Fig. 1. Segmentation results: a) original images; b) multi-scale segmentation images
for levels 5, 10 and ln (Uv =5, Up =50); and c) multi-scale segmentation images for levels
5, 10 and ln (Uv =5, Up =100)

the last level of the pre–segmentation pyramid (lm ) linked to a node is named its
pre–segmentation receptive field, then Ext(yi , yj ) is defined as the smallest color
difference between two neighbor nodes xi ∈ Nlm and xj ∈ Nlm which belong
to the pre–segmentation receptive fields of yi and yj , respectively. P Int(·, ·) is
defined as
P Int(yi , yj ) = min(Int(yi ) + τ (yi ), Int(yj ) + τ (yj )) (6)
Int(n) being the internal contrast of the node n. This contrast measure is defined
as the largest color difference between n and the nodes belonging to the pre–
segmentation receptive field of n. The threshold function τ controls the degree
to which the external variation can actually be larger than the internal variations
and still have the nodes be considered similar. In this work, we have used the
function proposed by [10], τ = α/|n|, where |n| defines the set of pixels of the
input image linked to n.
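The merge test of equations (5) and (6) then reduces to a few lines; the argument packaging is ours:

```python
def idec_merge(ext, int_i, int_j, size_i, size_j, alpha=15000.0):
    """IDEC criterion of equations (5)-(6): merge two nodes when the external
    contrast does not exceed the minimum internal contrast, relaxed by the
    size-dependent threshold tau(n) = alpha / |n|."""
    p_int = min(int_i + alpha / size_i, int_j + alpha / size_j)
    return ext <= p_int
```

Small regions receive a large τ and so merge easily, while the criterion becomes stricter as regions grow.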
Energy Functions (EF). In Luo and Guo's proposal [11], a set of energy
functions was used to characterize desired single–region properties and pairwise
region properties. The single-region properties include region area, region con-
vexity, region compactness, and color variances in one region. The pairwise prop-
erties include the color mean differences between two regions, the edge strength
along the shared boundary, the color variance of the cross–boundary area, and
the contour continuity between two regions.
With the aim of finding the lowest energy groupings, Huart and Bertolino [12]
propose to employ these energies to measure the cost of any region or group of
regions. In a similar way, we have defined a pairwise comparison function to
evaluate if two nodes can be grouped. Two energies are defined:
Efusion = Earea + Ecompactness + EvarianceL + EvarianceA + EvarianceB   (7)

Eregion = EcolorMeanDiffL + EcolorMeanDiffA + EcolorMeanDiffB +
EBoundaryVarianceL + EBoundaryVarianceA + EBoundaryVarianceB + Eedge   (8)
where the energy functions have been taken from [11]. Efusion and Eregion are
used to measure the cost of the fusion operation and the energy of a region
resulting from the fusion, respectively. If Efusion + Eregion is less than a given
threshold Uc , the comparison function is true and the grouping is accepted.
Otherwise, the function is false.

3 Experimental Results
In order to evaluate the performance of the perceptual segmentation frame-
work and the three described comparison functions, the BSDB has been em-
ployed1 [14]. In this dataset, the methodology for evaluating the performance of
segmentation techniques is based on the comparison of machine-detected boundaries
with respect to human-marked boundaries using the Precision-Recall framework [13].
This technique considers two quality measures: precision and recall.
The F –measure is defined as the harmonic mean of these measures, combining
them into a single measure.
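The combination into the F-measure is simply the harmonic mean:

```python
def f_measure(precision, recall):
    """F-measure: harmonic mean of precision and recall."""
    return 2.0 * precision * recall / (precision + recall)
```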
Fig. 2 shows the partitions on the higher level of the hierarchy for five different
images when the three variants of the proposed framework are used. The optimal
training parameters have been chosen. It can be noted that the proposed criteria
are able to group perceptually important regions in spite of the large intensity
variability present in several areas of the input images. Fig. 2 shows that the
F–measure associated with the individual results ranged from bad to significantly
good values. In any case, the ERAI comparison function allows the user to set
thresholds to partition the input image into fewer perceptually coherent regions
than the other two functions. If the thresholds employed by the IDEC and EF
functions are set to provide a similar number of regions to the ERAI function,
undesirable groupings are obtained. However, it must also be noted that the
EF comparison function is more global than the others, and it could be extended
to evaluate if more than two pyramid nodes must be linked. Thus, it will take
the pyramid level as a whole. The main problem of the proposed approaches
is their inability to deal with textured regions which are defined at high
natural scales. Thus, the zebras or tigers in Fig. 2 are divided into a set of dif-
ferent regions. The maximal F –measure obtained from the whole test set is 0.66
for the EF comparison function, 0.65 for the IDEC function and 0.70 for the
ERAI function (see Fig. 3).
1 http://www.cs.berkeley.edu/projects/vision/grouping/segbench/
Fig. 2. a) Original images; and b) obtained regions after the perceptual grouping for
the three implemented comparison functions (ERAI: Uv =5.0, Up =50.0; IDEC: Uv =5.0;
α = 15000; EF: Uv =5, Uf =1.0)
Specifically, the F–measure value obtained when the ERAI comparison function
is employed is equal to the one obtained by the gPb [16] and greater
than values provided by other methods like the UCM [3] (0.67), BCTG [13]
(0.65), BEL [15] (0.66) or the min-cover [17] (0.65). Besides, the main advan-
tage of the proposed segmentation framework is that it provides these results
at a relatively low computational cost. Thus, using an Intel Core Duo T7100
PC with 1 Gb of DDR2 memory, the processing times associated with the pre–
segmentation stage are typically less than 250 ms, while the perceptual
grouping stage takes less than 150 ms to process any image in the test set.
Therefore, the processing time of the perceptual segmentation framework is less
than 400 ms for any image on the test set. These processing time values are
similar to the ones provided when the IDEC comparison function is employed.
However, if the EF comparison function is used, the approach is almost 50 times
slower.
Fig. 3. Performance of the proposed framework using the BSDB protocol (see text)
4 Conclusions and Future Work
This paper has presented a generic, integrated approach which combines an accu-
rate segmentation process that takes into account the color information of the im-
age, with a grouping process that merges blobs of uniform color to produce regions
that are perceptually relevant. Both processes are accomplished over an irregular
pyramid. This pyramid uses the data structure of the BIP. However, the decimation
algorithm has been modified with respect to previous proposals [4,5]. This modifi-
cation increases the mixture of the irregular and regular parts of the BIP.
Future work will be focused on employing a texture descriptor in the pre–
segmentation stage. Besides, it will also be interesting for the perceptual grouping
stage to incorporate texture, layout and context information efficiently. The
EF pairwise comparison function could be modified to take into account these
perception-based grouping parameters.

Acknowledgments
This work has been partially funded by the Spanish Junta de Andalucía under
projects P07-TIC-03106 and P06-TIC-02123, and by the Spanish Ministerio de
Ciencia y Tecnología (MCYT) and FEDER funds under project no. TIN2005-
01359.

References
1. Robles-Kelly, A., Hancock, E.R.: An Expectation–maximisation Framework for
Segmentation and Grouping. Image and Vision Computing 20, 725–738 (2002)
2. Wertheimer, M.: Über Gestalttheorie. Philosophische Zeitschrift für Forschung und
Aussprache 1, 30–60 (1925)
Comparison of Perceptual Grouping Criteria 375

3. Arbeláez, P., Cohen, L.: A Metric Approach to Vector–valued Image Segmentation.


Int. Journal of Computer Vision 69, 119–126 (2006)
4. Marfil, R., Molina-Tanco, L., Bandera, A., Rodríguez, J.A., Sandoval, F.: Pyramid
Segmentation Algorithms Revisited. Pattern Recognition 39(8), 1430–1451 (2006)
5. Marfil, R., Molina-Tanco, L., Bandera, A., Sandoval, F.: The Construction of
Bounded Irregular Pyramids with a Union–find Decimation Process. In: Escolano,
F., Vento, M. (eds.) GbRPR 2007. LNCS, vol. 4538, pp. 307–318. Springer, Hei-
delberg (2007)
6. Haxhimusa, Y., Glantz, R., Kropatsch, W.G.: Constructing Stochastic Pyramids
by MIDES - Maximal Independent Directed Edge Set. In: Hancock, E.R., Vento,
M. (eds.) GbRPR 2003. LNCS, vol. 2726, pp. 35–46. Springer, Heidelberg (2003)
7. Meer, P.: Stochastic Image Pyramids. Computer Vision, Graphics and Image Pro-
cessing 45, 269–294 (1989)
8. Jolion, J.M.: Stochastic Pyramid Revisited. Pattern Recognition Letters 24(8),
1035–1042 (2003)
9. Haxhimusa, Y., Glantz, R., Saib, M., Langs, G., Kropatsch, W.G.: Logarithmic
Tapering Graph Pyramid. In: van Gool, L. (ed.) DAGM 2002. LNCS, vol. 2449,
pp. 117–124. Springer, Heidelberg (2002)
10. Haxhimusa, Y., Kropatsch, W.: Segmentation Graph Hierarchies. In: Proc. of IAPR
Int. Workshop on Syntactical and Structural Pattern Recognition and Statistical
Pattern Recognition, pp. 343–351 (2004)
11. Luo, J., Guo, C.: Perceptual Grouping of Segmented Regions in Color Images.
Pattern Recognition, 2781–2792 (2003)
12. Huart, J., Bertolino, P.: Similarity-Based and Perception-Based Image Segmen-
tation. In: Proc. IEEE Int. Conf. on Image Processing, pp. 1148–1151 (2005)
13. Martin, D., Fowlkes, C., Malik, J.: Learning to Detect Natural Image Boundaries
Using Brightness, Color, and Texture Cues. IEEE Trans. on Pattern Analysis Ma-
chine Intell. 26(1), 1–20 (2004)
14. Martin, D., Fowlkes, C., Tal, D., Malik, J.: A Database of Human Segmented
Natural Images and its Application to Evaluating Segmentation Algorithms and
Measuring Ecological Statistics. In: Int. Conf. Computer Vision (2001)
15. Dollár, P., Tu, Z., Belongie, S.: Supervised Learning of Edges and Object Bound-
aries. In: Int. Conf. Computer Vision Pattern Recognition (2006)
16. Maire, M., Arbeláez, P., Fowlkes, C., Malik, J.: Using Contours to Detect and
Localize Junctions in Natural Images. In: Int. Conf. Computer Vision Pattern
Recognition (2008)
17. Felzenszwalb, P., McAllester, D.: A Min–cover Approach for Finding Salient
Curves. In: POCV 2006 (2006)
Author Index

Al Ashraf, Mohab 324
Aly, Omar 324
Artner, Nicole 82
Aussem, Alex 52
Baldacci, Fabien 283
Bandera, A. 366
Biri, Venceslas 134
Bogunovich, Peter 215, 343
Braquelaire, Achille 283
Broelemann, Klaus 92
Brun, Luc 11, 304
Bunke, Horst 113, 124, 205
Cesar Jr., Roberto M. 223
Corbex, Marilys 52
Couprie, Michel 31, 134
Dahl, Anders Bjorholm 343
Damiand, Guillaume 102, 283
De Floriani, Leila 62
de la Higuera, Colin 102
Deruyver, Aline 174
Dickinson, Peter 124
Drauschke, Martin 293
Dupé, François-Xavier 11
ElGhawalby, Hewayda 233
Elias, Rimon 324
Erdem, Aykut 21
Escolano, Francisco 253
Falcão, Alexandre Xavier 195
Falcidieno, Bianca 253
Fankhauser, Stefan 124
Favrel, Joël 52
Ferrer, M. 113
Fourey, Sébastien 304
Frinken, Volkmar 205
Giorgi, Daniela 253
Gonzalez-Diaz, Rocio 263
Hamam, Yskandar 31
Hammer, Barbara 42
Hancock, Edwin R. 184, 233, 243, 253
Hasenfuss, Alexander 42
Hashimoto, Marcelo 223
Hodé, Yann 174
Horaud, Radu 144
Hui, Annie 62
Iglesias-Ham, Mabel 263
Ion, Adrian 82, 263
Janodet, Jean-Christophe 102
Jiang, Xiaoyi 92
Jouili, Salim 154
Karatzas, D. 113
Katona, Endre 72
Keustermans, Johannes 353
Knossow, David 144
Kropatsch, Walter G. 82, 263
Krüger, Antonio 92
Lozano, Miguel A. 253
Mankowski, Walter C. 215
Marfil, R. 366
Mateus, Diana 144
Meine, Hans 273
Mokbel, Bassam 42
Molina-Abril, Helena 314
Mollemans, Wouter 353
Panozzo, Daniele 62
Papa, João Paulo 195
Payet, Nadia 1
Raynal, Benjamin 134
Real, Pedro 314
Ren, Peng 243
Riesen, Kaspar 124, 205
Rodrigues de Morais, Sergio 52
Salvucci, Dario D. 215
Samuel, Émilie 102
Seghers, Dieter 353
Serratosa, Francesc 164
Sharma, Avinash 144
Shokoufandeh, Ali 215, 343
Solé-Ribalta, Albert 164
Solnon, Christine 102
Song, Qi 334
Sonka, Milan 334
Suetens, Paul 353
Tabbone, Salvatore 154
Tari, Sibel 21
Todorovic, Sinisa 1
Valveny, E. 113
Vandermeulen, Dirk 353
Wachenfeld, Steffen 92
Wilson, Richard C. 243
Xia, Shengping 184
Yin, Yin 334
