
Lecture Notes in Artificial Intelligence 5248

Edited by R. Goebel, J. Siekmann, and W. Wahlster

Subseries of Lecture Notes in Computer Science


Christian Freksa
Nora S. Newcombe
Peter Gärdenfors
Stefan Wölfl (Eds.)

Spatial Cognition VI
Learning, Reasoning,
and Talking about Space

International Conference Spatial Cognition 2008


Freiburg, Germany, September 15-19, 2008
Proceedings

Series Editors
Randy Goebel, University of Alberta, Edmonton, Canada
Jörg Siekmann, University of Saarland, Saarbrücken, Germany
Wolfgang Wahlster, DFKI and University of Saarland, Saarbrücken, Germany

Volume Editors
Christian Freksa
SFB/TR 8 Spatial Cognition
Universität Bremen, Bremen, Germany
E-mail: freksa@sfbtr8.uni-bremen.de
Nora S. Newcombe
James H. Glackin Distinguished Faculty Fellow
Temple University, Philadelphia, PA, USA
E-mail: newcombe@temple.edu
Peter Gärdenfors
Lund University Cognitive Science
Lund, Sweden
E-mail: peter.gardenfors@lucs.lu.se
Stefan Wölfl
Department of Computer Science
University of Freiburg, Freiburg, Germany
E-mail: woelfl@informatik.uni-freiburg.de

Library of Congress Control Number: 2008934601

CR Subject Classification (1998): H.2.8, I.2.10, H.3.1, K.4.2, B.5.1


LNCS Sublibrary: SL 7 – Artificial Intelligence
ISSN 0302-9743
ISBN-10 3-540-87600-6 Springer Berlin Heidelberg New York
ISBN-13 978-3-540-87600-7 Springer Berlin Heidelberg New York

This work is subject to copyright. All rights are reserved, whether the whole or part of the material is
concerned, specifically the rights of translation, reprinting, re-use of illustrations, recitation, broadcasting,
reproduction on microfilms or in any other way, and storage in data banks. Duplication of this publication
or parts thereof is permitted only under the provisions of the German Copyright Law of September 9, 1965,
in its current version, and permission for use must always be obtained from Springer. Violations are liable
to prosecution under the German Copyright Law.
Springer is a part of Springer Science+Business Media
springer.com
© Springer-Verlag Berlin Heidelberg 2008
Printed in Germany
Typesetting: Camera-ready by author, data conversion by Scientific Publishing Services, Chennai, India
Printed on acid-free paper SPIN: 12519798 06/3180 543210
Preface

This is the sixth volume in a series of books dedicated to basic research in spatial
cognition. Spatial cognition research investigates relations between the physical
spatial world, on the one hand, and the mental world of humans, animals, and
artificial agents, on the other hand. Cognitive agents – natural or artificial – make
use of spatial and temporal information about their environment and about their
relation to the environment to move around, to behave intelligently, and to make
adaptive decisions in the pursuit of their goals. More specifically, cognitive agents
process various kinds of spatial knowledge for learning, reasoning, and talking
about space.
From a cognitive point of view, a central question is how our brains represent
and process spatial information. When designing spatial representation systems,
usability will be increased if the external and internal forms of representation
are aligned as much as possible. A particularly interesting feature is that many
of the internal representations of the meanings of words seem to have a spatial
structure. This also holds when we are not talking about space as such. The
spatiality of natural semantics will impose further requirements on the design
of information systems. An elementary example is that “more” of something is
often imagined as “higher” on a vertical dimension: consequently, a graphical
information system that associates “more” with “down” will easily be misunder-
stood. Another example concerns similarity relations: features that are judged
to be similar in meaning are best represented as spatially close in a graphical
information system.
In addition to the question of how this information is represented and used
– which was the focus of the previous Spatial Cognition volumes – an impor-
tant question is whether spatial abilities are innate (“hard wired”) or whether
these abilities can be learned and trained. The hypothesis that spatial cogni-
tion is malleable, and hence that spatial learning can be fostered by effective
technology and education, is based on recent evidence from multiple sources.
Developmental research now indicates that cognitive growth is not simply the
unfolding of a maturational program but instead involves considerable learning;
new neuroscience research indicates substantial developmental plasticity; and
cognitive and educational research has shown us significant effects of experience
on spatial skill.
Because an informed citizen in the 21st century must be fluent at process-
ing spatial abstractions including graphs, diagrams, and other visualizations,
research that reveals how to increase the level of spatial functioning in the pop-
ulation is vital. In addition, such research could lead to the reduction of gender
and socioeconomic status differences in spatial functioning and thus have an im-
portant impact on social equity. We need to understand spatial learning and to
use this knowledge to develop programs and technologies that will support the

capability of all children and adolescents to develop the skills required to com-
pete in an increasingly complex world. To answer these questions, we need to
understand structures and mechanisms of abstraction and we must develop and
test models that instantiate our insights into the cognitive mechanisms studied.
Today, spatial cognition is an established research area that investigates a
multitude of phenomena in a variety of domains on many different levels of ab-
straction involving a palette of disciplines with their specific methodologies. One
of today’s challenges is to connect and relate these different research areas. In
pursuit of this goal, the Transregional Collaborative Research Center SFB/TR 8
Spatial Cognition (Bremen and Freiburg) and the Spatial Intelligence and Learn-
ing Center (Philadelphia and Chicago) co-organized Spatial Cognition 2008 in
the series of biennial international Spatial Cognition conferences. This con-
ference brought together researchers from both centers and from other spatial
cognition research labs, from all over the world. This proceedings volume con-
tains 27 papers that were selected for oral presentation at the conference in
a thorough peer-review process to which 54 papers had been submitted; each
paper was reviewed and commented on by at least three Program Committee
members. Many high-quality contributions could not be accepted due to space
limitations in the single-track conference program.
The Program Chairs invited three prominent scientists to deliver keynote
lectures at the Spatial Cognition 2008 conference: Heinrich H. Bülthoff spoke on
“Virtual Reality as a Valuable Research Tool for Investigating Different Aspects
of Spatial Cognition”, Laura Carlson’s talk was about “On the ‘Whats’ and
‘Hows’ of ‘Where’: The Role of Salience in Spatial Descriptions”, and Dedre
Gentner addressed the topic “Learning about space”. Abstracts of the keynote
presentations are also printed in this volume.
Spatial Cognition 2008 took place at Schloss Reinach near Freiburg (Ger-
many) in September 2008. Besides the papers for oral presentation, more than
30 poster contributions were selected for presenting work in progress. The con-
ference program also featured various tutorials, workshops, and a doctoral col-
loquium to promote an exchange of research experience of young scientists and
knowledge transfer at an early stage of project development. Immediately before
the conference, a workshop sponsored by the American National Science Foun-
dation (NSF) was organized by the SILC consortium in cooperation with the
SFB/TR 8 at the University of Freiburg. This workshop included lab visits at
the Freiburg site of the SFB/TR 8.
Many people contributed to the success of the Spatial Cognition 2008 con-
ference. First of all, we thank the authors for preparing excellent contributions.
This volume presents contributions by 61 authors on a large spectrum of interdis-
ciplinary work on descriptions of space, on spatial mental models and maps, on
spatio-temporal representation and reasoning, on route directions, on wayfinding
and spatial behavior in natural and virtual environments, and on robot mapping
and piloting. Our special thanks go to the members of the Program Committee
for carefully reviewing and commenting on these contributions. Thorough reviews
by peers are one of the most important sources of feedback for authors: they connect
them to still-unknown territory and help them to improve their work and to secure
a high-quality scientific publication.
We thank Thomas F. Shipley for organizing the tutorials, and Kenneth D. Forbus,
Alexander Klippel, Marco Ragni, and Niels Krabisch for offering them. For orga-
nizing workshops we owe thanks to Kenny Coventry and Jan M. Wiener as
well as Alexander Klippel, Stephen Hirtle, Marco Ragni, Holger Schultheis,
Thomas Barkowsky, Ronan O’Ceallaigh, and Wolfgang Stürzl. Further thanks
go to Christoph Hölscher for organizing the poster session, and Sven Bertel and
Marco Ragni, who were responsible for organizing the doctoral colloquium and
for allocating travel grants to PhD students.
We thank the members of our support staff, namely, Ingrid Schulz, Dagmar
Sonntag, Roswitha Hilden, Susanne Bourjaillat, and Ulrich Jakob for profes-
sionally arranging many details. Special thanks go to Thomas Barkowsky, Eva
Räthe, Lutz Frommberger, and Matthias Westphal for the close cooperation on
both sites of the SFB/TR 8.
We thank Wolfgang Bay and the SICK AG for the generous sponsorship
for this conference and the continuous support of scientific activities in and
around Freiburg. We thank Daniel Schober and the ESRI Geoinformatik GmbH
for sponsoring the travel grants to PhD students participating in the doctoral
colloquium.
We thank the Deutsche Forschungsgemeinschaft and the National Science
Foundation and their program directors Bettina Zirpel, Gerit Sonntag, and Soo-
Siang Lim for their continued support of our research and for encouraging and
enhancing our international research cooperation.
For the review process and for the preparation of the conference proceed-
ings we used the EasyChair conference management system, which we found
convenient to use.
Finally, we thank Alfred Hofmann and his staff at Springer for their contin-
uing support of our book series as well as for sponsoring the Spatial Cognition
2008 Best Paper Award.

September 2008

Christian Freksa
Nora Newcombe
Peter Gärdenfors
Stefan Wölfl
Conference Organization

Program Chairs
Christian Freksa
Nora S. Newcombe
Peter Gärdenfors

Local Organization
Stefan Wölfl

Tutorial Chair
Thomas F. Shipley

Poster Session Chair
Christoph Hölscher

Workshop Chairs
Kenny Coventry
Jan M. Wiener

Doctoral Colloquium Chairs
Sven Bertel
Marco Ragni

Program Committee

Pragya Agarwal
Marios Avraamides
Christian Balkenius
Thomas Barkowsky
John Bateman
Brandon Bennett
Michela Bertolotto
Stefano Borgo
Melissa Bowerman
Angela Brunstein
Wolfram Burgard
Lily Chao
Christophe Claramunt
Eliseo Clementini
Anthony Cohn
Leila De Floriani
Maureen Donnelly
Matt Duckham
Russell Epstein
Ron Ferguson
Ken Forbus
Antony Galton
Susan Goldin-Meadow
Gabriela Goldschmidt
Klaus Gramann
Christopher Habel
Mary Hegarty
Stephen Hirtle
Christoph Hölscher
Petra Jansen
Gabriele Janzen
Alexander Klippel
Markus Knauff
Stefan Kopp
Maria Kozhevnikov
Bernd Krieg-Brückner
Antonio Krüger
Benjamin Kuipers
Yohei Kurata
Gerhard Lakemeyer
Longin Jan Latecki
Hanspeter Mallot
Mark May
Timothy P. McNamara
Tobias Meilinger
Daniel R. Montello
Stefan Münzer
Lynn Nadel
Bernhard Nebel
Marta Olivetti Belardinelli
Dimitris Papadias
Eric Pederson
Ian Pratt-Hartmann
Martin Raubal
Terry Regier
Kai-Florian Richter
M. Andrea Rodríguez
Ute Schmid
Amy Shelton
Thomas F. Shipley
Jeanne Sholl
Barry Smith
Kathleen Stewart Hornsby
Holly Taylor
Barbara Tversky
Florian Twaroch
David Uttal
Constanze Vorwerg
Stefan Wölfl
Thomas Wolbers
Diedrich Wolter
Nico Van de Weghe
Wai Yeap

Additional Reviewers
Daniel Beck
Kirsten Bergmann
Roberta Ferrario
Alexander Ferrein
Stefan Schiffer
Related Book Publications

1. Winter, S., Duckham, M., Kulik, L., Kuipers, B. (eds.): COSIT 2007. LNCS,
vol. 4736. Springer, Heidelberg (2007)
2. Fonseca, F., Rodríguez, M.A., Levashkin, S. (eds.): GeoS 2007. LNCS, vol. 4853.
Springer, Heidelberg (2007)
3. Barkowsky, T., Knauff, M., Ligozat, G., Montello, D.R. (eds.): Spatial Cognition
2007. LNCS (LNAI), vol. 4387. Springer, Heidelberg (2007)
4. Barker-Plummer, D., Cox, R., Swoboda, N. (eds.): Diagrams 2006. LNCS (LNAI),
vol. 4045. Springer, Heidelberg (2006)
5. Raubal, M., Miller, H.J., Frank, A.U., Goodchild, M.F. (eds.): GIScience 2006.
LNCS, vol. 4197. Springer, Heidelberg (2006)
6. Cohn, A.G., Mark, D.M. (eds.): COSIT 2005. LNCS, vol. 3693. Springer, Heidel-
berg (2005)
7. Rodríguez, M.A., Cruz, I., Levashkin, S., Egenhofer, M.J. (eds.): GeoS 2005. LNCS,
vol. 3799. Springer, Heidelberg (2005)
8. Meng, L., Zipf, A., Reichenbacher, T. (eds.): Map-based mobile services — Theo-
ries, methods and implementations. Springer, Berlin (2005)
9. Freksa, C., Knauff, M., Krieg-Brückner, B., Nebel, B., Barkowsky, T. (eds.): Spatial
Cognition IV. LNCS (LNAI), vol. 3343. Springer, Heidelberg (2005)
10. Blackwell, A.F., Marriott, K., Shimojima, A. (eds.): Diagrams 2004. LNCS (LNAI),
vol. 2980. Springer, Heidelberg (2004)
11. Egenhofer, M.J., Freksa, C., Miller, H.J. (eds.): GIScience 2004. LNCS, vol. 3234.
Springer, Heidelberg (2004)
12. Gero, J.S., Tversky, B., Knight, T. (eds.): Visual and spatial reasoning in design
III, Key Centre of Design Computing and Cognition. University of Sydney (2004)
13. Freksa, C., Brauer, W., Habel, C., Wender, K.F.: Spatial Cognition III. LNCS
(LNAI), vol. 2685. Springer, Heidelberg (2003)
14. Kuhn, W., Worboys, M.F., Timpf, S. (eds.): COSIT 2003. LNCS, vol. 2825.
Springer, Heidelberg (2003)
15. Hegarty, M., Meyer, B., Narayanan, N.H. (eds.): Diagrams 2002. LNCS (LNAI),
vol. 2317. Springer, Heidelberg (2002)
16. Egenhofer, M.J., Mark, D.M. (eds.): GIScience 2002. LNCS, vol. 2478. Springer,
Heidelberg (2002)
17. Barkowsky, T.: Mental Representation and Processing of Geographic Knowledge.
LNCS (LNAI), vol. 2541. Springer, Heidelberg (2002)
18. Renz, J.: Qualitative Spatial Reasoning with Topological Information. LNCS
(LNAI), vol. 2293. Springer, Heidelberg (2002)
19. Coventry, K., Olivier, P. (eds.): Spatial language: Cognitive and computational
perspectives. Kluwer, Dordrecht (2002)
20. Montello, D.R. (ed.): COSIT 2001. LNCS, vol. 2205. Springer, Heidelberg (2001)
21. Gero, J.S., Tversky, B., Purcell, T. (eds.): Visual and spatial reasoning in design
II. Key Centre of Design Computing and Cognition. University of Sydney (2001)
22. Habel, C., Brauer, W., Freksa, C., Wender, K.F. (eds.): Spatial Cognition 2000.
LNCS (LNAI), vol. 1849. Springer, Heidelberg (2000)
23. Habel, C., von Stutterheim, C. (eds.): Räumliche Konzepte und sprachliche Struk-
turen. Niemeyer, Tübingen (2000)

24. Freksa, C., Mark, D.M. (eds.): COSIT 1999. LNCS, vol. 1661. Springer, Heidelberg
(1999)
25. Gero, J.S., Tversky, B. (eds.): Visual and spatial reasoning in design. Key Centre
of Design Computing and Cognition. University of Sydney (1999)
26. Habel, C., Werner, S. (eds.): Special issue on spatial reference systems, Spatial
Cognition and Computation, vol. 1(4) (1999)
27. Freksa, C., Habel, C., Wender, K.F. (eds.): Spatial Cognition 1998. LNCS (LNAI),
vol. 1404. Springer, Heidelberg (1998)
28. Hirtle, S.C., Frank, A.U. (eds.): COSIT 1997. LNCS, vol. 1329. Springer, Heidel-
berg (1997)
29. Kuhn, W., Frank, A.U. (eds.): COSIT 1995. LNCS, vol. 988. Springer, Heidelberg
(1995)
Table of Contents

Invited Talks

Virtual Reality as a Valuable Research Tool for Investigating Different Aspects of
Spatial Cognition (Abstract) ..... 1
    Heinrich H. Bülthoff, Jennifer L. Campos, and Tobias Meilinger

On the “Whats” and “Hows” of “Where”: The Role of Salience in Spatial
Descriptions (Abstract) ..... 4
    Laura A. Carlson

Learning about Space (Abstract) ..... 7
    Dedre Gentner

Spatial Orientation

Does Body Orientation Matter When Reasoning about Depicted or Described
Scenes? ..... 8
    Marios N. Avraamides and Stephanie Pantelidou

Spatial Memory and Spatial Orientation ..... 22
    Jonathan W. Kelly and Timothy P. McNamara

Spatial Navigation

Map-Based Spatial Navigation: A Cortical Column Model for Action Planning ..... 39
    Louis-Emmanuel Martinet, Jean-Baptiste Passot, Benjamin Fouque,
    Jean-Arcady Meyer, and Angelo Arleo

Efficient Wayfinding in Hierarchically Regionalized Spatial Environments ..... 56
    Thomas Reineking, Christian Kohlhagen, and Christoph Zetzsche

Analyzing Interactions between Navigation Strategies Using a Computational
Model of Action Selection ..... 71
    Laurent Dollé, Mehdi Khamassi, Benoît Girard, Agnès Guillot, and
    Ricardo Chavarriaga

A Minimalistic Model of Visually Guided Obstacle Avoidance and Path Selection
Behavior ..... 87
    Lorenz Gerstmayr, Hanspeter A. Mallot, and Jan M. Wiener

Spatial Learning

Route Learning Strategies in a Virtual Cluttered Environment ..... 104
    Rebecca Hurlebaus, Kai Basten, Hanspeter A. Mallot, and Jan M. Wiener

Learning with Virtual Verbal Displays: Effects of Interface Fidelity on Cognitive
Map Development ..... 121
    Nicholas A. Giudice and Jerome D. Tietz

Cognitive Surveying: A Framework for Mobile Data Collection, Analysis, and
Visualization of Spatial Knowledge and Navigation Practices ..... 138
    Drew Dara-Abrams

Maps and Modalities

What Do Focus Maps Focus On? ..... 154
    Kai-Florian Richter, Denise Peters, Gregory Kuhnmünch, and Falko Schmid

Locating Oneself on a Map in Relation to Person Qualities and Map
Characteristics ..... 171
    Lynn S. Liben, Lauren J. Myers, and Kim A. Kastens

Conflicting Cues from Vision and Touch Can Impair Spatial Task Performance:
Speculations on the Role of Spatial Ability in Reconciling Frames of Reference ..... 188
    Madeleine Keehner

Spatial Communication

Epistemic Actions in Science Education ..... 202
    Kim A. Kastens, Lynn S. Liben, and Shruti Agrawal

An Influence Model for Reference Object Selection in Spatially Locative
Phrases ..... 216
    Michael Barclay and Antony Galton

Spatial Language

Tiered Models of Spatial Language Interpretation ..... 233
    Robert J. Ross

Perspective Use and Perspective Shift in Spatial Dialogue ..... 250
    Juliana Goschler, Elena Andonova, and Robert J. Ross

Natural Language Meets Spatial Calculi ..... 266
    Joana Hois and Oliver Kutz

Automatic Classification of Containment and Support Spatial Relations in English
and Dutch ..... 283
    Kate Lockwood, Andrew Lovett, and Ken Forbus

Similarity and Abstraction

Integral vs. Separable Attributes in Spatial Similarity Assessments ..... 295
    Konstantinos A. Nedas and Max J. Egenhofer

Spatial Abstraction: Aspectualization, Coarsening, and Conceptual
Classification ..... 311
    Lutz Frommberger and Diedrich Wolter

Concepts and Reference Frames

Representing Concepts in Time ..... 328
    Martin Raubal

The Network of Reference Frames Theory: A Synthesis of Graphs and Cognitive
Maps ..... 344
    Tobias Meilinger

Spatially Constrained Grammars for Mobile Intention Recognition ..... 361
    Peter Kiefer

Modeling Cross-Cultural Performance on the Visual Oddity Task ..... 378
    Andrew Lovett, Kate Lockwood, and Kenneth Forbus

Spatial Modeling and Spatial Reasoning

Modelling Scenes Using the Activity within Them ..... 394
    Hannah M. Dee, Roberto Fraile, David C. Hogg, and Anthony G. Cohn

Pareto-Optimality of Cognitively Preferred Polygonal Hulls for Dot Patterns ..... 409
    Antony Galton

Qualitative Reasoning about Convex Relations ..... 426
    Dominik Lücke, Till Mossakowski, and Diedrich Wolter

Author Index ..... 441


Virtual Reality as a Valuable Research
Tool for Investigating Different
Aspects of Spatial Cognition
(Abstract)

Heinrich H. Bülthoff, Jennifer L. Campos, and Tobias Meilinger

Max-Planck-Institute for Biological Cybernetics


Spemannstr. 38, 72076 Tübingen, Germany
{heinrich.buelthoff,jenny.campos,tobias.meilinger}@tuebingen.mpg.de

The interdisciplinary research field of spatial cognition has benefited greatly
from the use of advanced Virtual Reality (VR) technologies. Such tools have
provided the ability to explicitly control specific experimental conditions, ma-
nipulate variables not possible in the real world, and provide a convincing, multi-
modal experience. Here we will first describe several of the VR facilities at the
Max Planck Institute (MPI) for Biological Cybernetics that have been developed
to optimize scientific investigations related to multi-modal self-motion percep-
tion and spatial cognition. Subsequently, we will present some recent empirical
work contributing to these research areas.
While in the past, low-quality visual simulations of space were the most promi-
nent types of VR (i.e., simple desktop displays), more advanced visualization
systems are becoming increasingly desirable. At the MPI we have utilized
a variety of visualization tools ranging from immersive head-mounted displays
(HMD), to large field-of-view, curved projection systems, to a high resolution
tiled display. There is also an increasing need for high-quality, adaptable, large-
scale, simulated environments. At the MPI we have created a virtual replica of
downtown Tübingen throughout which observers can navigate. In collaboration
with ETH Zurich, who have developed “CityEngine”, a virtual city builder, we
are now able to rapidly create virtual renditions of existing cities or customized
environmental layouts. In order to naturally interact within such virtual environ-
ments (VEs), it is also increasingly important to be able to physically move
within these spaces. Under most natural conditions involving self-motion, body-
based information is inherently present. Therefore, the recent developments of
several sophisticated self-motion interfaces have allowed us to present and eval-
uate natural, multi-sensory navigational experiences in unprecedented ways. For
instance, within a large (12 m×12 m), free-walking space, a high-precision opti-
cal tracking system (paired with an HMD) updates one’s position within a VE
as they naturally navigate through walking or when passively transported (i.e.,
via a robotic wheelchair). Further, the MPI Motion Simulator is a 6-degree of
freedom anthropomorphic robotic arm that can translate and rotate an observer


in any number of ways (both open and closed-loop). Finally, a new, state-of-the-
art omni-directional treadmill now offers observers the opportunity to experience
unrestricted, limitless walking throughout large-scale VEs.
When moving through space, both dynamic visual information (i.e., optic flow)
and body-based information (i.e., proprioceptive/efference copy and vestibular)
jointly specify the magnitude of a distance travelled. Relatively little is currently
known about how these cues are integrated when simultaneously present. In a series
of experiments, we investigated participants’ ability to estimate travelled distances
under a variety of sensory/motor conditions. Visual information presented via an
HMD was combined with body-based cues that were provided either by walking
in a fully-tracked, free-walking space, by walking on a large linear treadmill, or by
being passively transported in a robotic wheelchair. Visually-specified distances
were either congruent or incongruent with distances specified by body-based cues.
Responses reflect a combined effect of both visual and body-based information,
with an overall higher weighting of body-based cues during walking and a relatively
equal weighting of inertial and visual cues during passive movement. The charac-
teristics of self-motion perception have also been investigated using a novel contin-
uous pointing method. This task simply requires participants to view a target and
point continuously towards the target as they move past it along a straight, for-
ward trajectory. By using arm angle, we are able to measure perceived location and,
hence, perceived self-velocity during the entire trajectory. We have compared the
natural characteristics of continuous pointing during sighted walking with those
during reduced sensory/motor cue conditions, including: blind-walking, passive
transport, and imagined walking. The specific characteristics of self-motion per-
ception during passive transport have also been further evaluated through the use
of a robotic wheelchair and the MPI Motion Simulator.
Additional research programs have focused on understanding particular as-
pects of spatial memory when navigating through visually rich, complex envi-
ronments. In one study that investigated route memory, participants navigated
through virtual Tübingen while it was projected onto a 220◦ field-of-view, curved
screen display. Participants learned two routes while they were simultaneously
required to perform a visual, spatial, or verbal secondary task. In the subsequent
wayfinding phase the participants were asked to locate and “virtually travel”
along the two routes again (via joystick manipulation). During this wayfinding
phase a number of dependent measures were recorded. The results indicate that
encoding wayfinding knowledge interfered with the verbal and spatial secondary
tasks. These interferences were even stronger than the interference of wayfinding
knowledge with the visual secondary task. These findings are consistent with a
dual-coding approach of wayfinding knowledge. This dual coding approach was
further examined in our fully-tracked, free-walking space. In this case, partic-
ipants walked a route through a virtual environment and again were required
to remember the route. For 50% of the intersections they encountered, they
were asked to associate it with an arbitrary name they heard via headphones (e.g.,
“Goethe place”). For the other 50% of the intersections they were asked to re-
member the intersection by the local environmental features and not associate

it with a name. In a successive route memory test participants were “beamed”
to an intersection and had to indicate in which direction they originally traveled
the route. Participants performed better at intersections without a name than
they did for intersections associated with an arbitrary name. When repeating the
experiment with meaningful names that accurately represented the environmen-
tal features (e.g., “Hay place”), the results reversed (i.e., naming a place
no longer led to worse performance). These results indicate that the benefits of
language do not come for free.

References
1. Berger, D.R., Terzibas, C., Beykirch, K., Bülthoff, H.H.: The role of visual cues
and whole-body rotations in helicopter hovering control. In: Proceedings of the
AIAA Modeling and Simulation Technologies Conference and Exhibit (AIAA 2007),
Reston, VA, USA. American Institute of Aeronautics and Astronautics (2007)
2. Bülthoff, H.H., van Veen, H.A.H.C.: Vision and action in virtual environments:
Modern psychophysics in spatial cognition research. In: Jenkin, M., Harris, M.L.
(eds.) Vision and Attention, pp. 233–252. Springer, Heidelberg (2000)
3. Campos, J.L., Butler, J.S., Mohler, B.J., Bülthoff, H.H.: The contributions of visual
flow and locomotor cues to walked distance estimation in a virtual environment. In:
Proceedings of the 4th Symposium on Applied Perception in Graphics and Visual-
ization, p. 146. ACM Press, New York (2007)
4. Meilinger, T., Knauff, M., Bülthoff, H.H.: Working memory in wayfinding - a dual
task experiment in a virtual city. Cognitive Science 32, 755–770 (2008)
5. Mohler, B.J., Campos, J.L., Weyel, M., Bülthoff, H.H.: Gait parameters while walk-
ing in a head-mounted display virtual environment and the real world. In: Proceed-
ings of Eurographics 2007, Eurographics Association, pp. 85–88 (2007)
6. Teufel, H.J., Nusseck, H.-G., Beykirch, K.A., Butler, J.S., Kerger, M., Bülthoff,
H.H.: MPI motion simulator: Development and analysis of a novel motion simulator.
In: Proceedings of the AIAA Modeling and Simulation Technologies Conference and
Exhibit (AIAA 2007), Reston, VA, USA. American Institute of Aeronautics and
Astronautics (2007)
On the “Whats” and “Hows” of “Where”:
The Role of Salience in Spatial Descriptions
(Abstract)

Laura A. Carlson

Department of Psychology, University of Notre Dame, USA

According to Clark [1] language is a joint activity between speaker and listener,
undertaken to accomplish a shared goal. In the case of spatial descriptions, one
such goal is for a speaker to assist a listener in finding a sought-for object. For
example, imagine misplacing your keys on a cluttered desktop, and asking your
friend if s/he knows where they are. In response, there are a variety of spatial
descriptions that your friend can select that vary in complexity, ranging from
a simple deictic expression such as “there” (and typically accompanied by a
pointing gesture), to a much more complicated description such as “it’s on the
desk, under the shelf, to the left of the book and in front of the phone.” Between
these two extremes are descriptions of the form “The keys are by the book”,
consisting of three parts: the located object that is being sought (i.e., the keys);
the reference object from which the location of the located object is specified
(i.e., the book) and the spatial term that conveys the spatial relation between
these two objects (i.e., by). For inquiries of this type (“where are my keys?”), the
located object is pre-specified, but the speaker needs to select an appropriate
spatial term and an appropriate reference object. My research focuses on the
representations and processes by which a speaker selects these spatial terms
and reference objects, and the representations and processes by which a listener
comprehends these ensuing descriptions.

The “Whats”
With respect to selection, one important issue is understanding why particular
terms and particular reference objects are chosen. For a given real-world scene,
there are many possible objects that stand in many possible relations with respect
to a given located object. On what basis might a speaker make his/her selection?
Several researchers argue that reference objects are selected on the basis of prop-
erties that make them salient relative to other objects [2,3,4]. Given the purpose of
the description as specifying the location of the sought-for object, it would make
sense that the reference object be easy to find among the other objects in the dis-
play. However, there are many different properties that could define salience, in-
cluding spatial features, perceptual properties, and conceptual properties.
With respect to spatial features, certain spatial relations are preferred over oth-
ers. For example, objects that stand in front/back relations to a given located ob-
ject are preferred to objects that stand in left/right relations [5]. This is consistent


with well-known differences in the ease of processing different terms [6,7]. In ad-
dition, distance may play an important role, with objects that are closer to the
located object preferred to those that are more distant [8]. Thus, all else being
equal, a reference object may be selected because it is closest to the located object
and/or stands in a preferred relation with respect to the located object.
With respect to perceptual features, Talmy [4] identified size and movability
as key dimensions, with larger and immovable objects preferred as reference
objects. In addition, there may be a preference to select more geometrically
complex objects as reference objects. Blocher and Stopp [9] argued for color,
shape and size as critical salient dimensions. Finally, de Vega et al. [2] observed
preferences for reference objects that are inanimate, more solid, and whole rather
than parts of objects.
Finally, with respect to conceptual features, reference objects are considered
“given” objects, less recently mentioned in the discourse [4]. In addition, there
may be a bias to select reference objects that are functionally related to the
located object [10,11].
In this talk I will present research from my lab in which we systematically
manipulate spatial, conceptual and perceptual features, and ask which dimen-
sions are influential in reference object selection, and how priorities are assigned
across the spatial, perceptual and conceptual dimensions. Both production and
comprehension measures will be discussed. This work will provide a better sense
of how salience is being defined with respect to selecting a reference object for
a spatial description.

The “Hows”
Implicit in the argument that the salience of an object is computed across these di-
mensions is the idea that such computation requires that multiple objects are eval-
uated and compared with one another along these dimensions. That is, to say an
object stands out relative to other objects (for example, a red object among black
objects) requires that the color of all objects (black and red) be computed and com-
pared, and that on the basis of this comparison, the unique object (in this case, red)
stands out (among black). Put another way, an object can only stand out relative
to a contrast set [12]. Research in my lab has examined how properties of various
objects are evaluated and compared during production and comprehension, and
in particular, the point in processing at which properties of multiple objects ex-
ert their influence. For example, we have shown that the presence, placement and
properties of surrounding objects have a significant impact during comprehension
and production [13,11]. I will discuss these findings in detail, and will present elec-
trophysiological data that illustrate within the time course of processing the point
at which these features have an impact.

The Main Points


The main points of the talk will be an identification of the features and dimen-
sions that are relevant for selecting a reference object, and an examination of how

and when these features and dimensions have an impact on processing spatial
descriptions. Implications for other tasks and other types of spatial descriptions
will be discussed.

References
1. Clark, H.H.: Using language. Cambridge University Press, Cambridge (1996)
2. de Vega, M., Rodrigo, M.J., Ato, M., Dehn, D.M., Barquero, B.: How nouns and
prepositions fit together: An exploration of the semantics of locative sentences.
Discourse Processes 34, 117–143 (2002)
3. Miller, G.A., Johnson-Laird, P.N.: Language and perception. Harvard University
Press, Cambridge (1976)
4. Talmy, L.: How language structures space. In: Pick, H.L., Acredolo, L.P. (eds.)
Spatial orientation: Theory, research, and application, pp. 225–282. Plenum, New
York (1983)
5. Craton, L.G., Elicker, J., Plumert, J.M., Pick Jr., H.L.: Children’s use of frames of
reference in communication of spatial location. Child Development 61, 1528–1543
(1990)
6. Clark, H.H.: Space, time, semantics, and the child. In: Moore, T.E. (ed.) Cognitive
development and the acquisition of language. Academic Press, New York (1973)
7. Fillmore, C.J.: Santa Cruz lectures on deixis. Indiana University Linguistics Club,
Bloomington (1971)
8. Hund, A.M., Plumert, J.M.: What counts as by? Young children’s use of relative
distance to judge nearbyness. Developmental Psychology 43, 121–133 (2007)
9. Blocher, A., Stopp, E.: Time-dependent generation of minimal sets of spatial de-
scriptions. In: Olivier, P., Gapp, K.P. (eds.) Representation and processing of spa-
tial relations, pp. 57–72. Erlbaum, Mahwah (1998)
10. Carlson-Radvansky, L.A., Tang, Z.: Functional influences on orienting a reference
frame. Memory & Cognition 28, 812–820 (2000)
11. Carlson, L.A., Hill, P.L.: Processing the presence, placement and properties of a
distractor in spatial language tasks. Memory & Cognition 36, 240–255 (2008)
12. Olson, D.: Language and thought: Aspects of a cognitive theory of semantics.
Psychological Review 77, 143–184 (1970)
13. Carlson, L.A., Logan, G.D.: Using spatial terms to select an object. Memory &
Cognition 29, 883–892 (2001)
Learning about Space
(Abstract)

Dedre Gentner

Department of Psychology, Northwestern University, USA

Spatial cognition is important in human learning, both in itself and as a major substrate
of learning in other domains. Although some aspects of spatial cognition may be in-
nate, it is clear that many important spatial concepts must be learned from experience.
For example, Dutch and German use three spatial prepositions—op, aan, and om in
Dutch—to describe containment and support relations, whereas English requires just
one preposition—on—to span this range. How do children learn these different ways
of partitioning the world of spatial relations? More generally, how do people come to
understand powerful spatial abstractions like parallel, convergent, proportionate, and
continuous?
I suggest that two powerful contributors to spatial learning are analogical mapping—
structural alignment and abstraction—and language, especially relational language,
which both invites and consolidates the insights that arise from analogical processes.
I will present evidence that (1) analogical processes are instrumental in learning new
spatial relational concepts; and, further, that (2) spatial relational language fosters ana-
logical processing. I suggest that mutual bootstrapping between structure-mapping pro-
cesses and relational language is a major contributor to spatial learning in humans.

Does Body Orientation Matter When Reasoning about
Depicted or Described Scenes?*

Marios N. Avraamides and Stephanie Pantelidou

Department of Psychology, University of Cyprus


P.O. Box 20537, 1678 Nicosia, Cyprus
mariosav@ucy.ac.cy, ps04sp1@ucy.ac.cy

Abstract. Two experiments were conducted to assess whether the orientation of
the body at the time of test affects the efficiency with which people reason
about spatial relations that are encoded in memory through symbolic media.
Experiment 1 used depicted spatial layouts while Experiment 2 used described
environments. In contrast to previous studies with directly-experienced spatial
layouts, the present experiments revealed no sensorimotor influences on per-
formance. Differences in reasoning about immediate and non-immediate envi-
ronments are thus discussed. Furthermore, the same patterns of findings (i.e.,
normal alignment effects) were observed in the two experiments supporting the
idea of functional equivalence of spatial representations derived from different
modalities.

Keywords: body orientation, sensorimotor interference, perspective-taking,
spatial reasoning.

1 Introduction
While moving around in the environment people are able to keep track of how ego-
centric spatial relations (i.e., self-to-object directions and distances) change as a result
of their movement [1-4]. To try out an example, choose one object from your imme-
diate surroundings (e.g., a chair), and point to it. Then, close your eyes and take a few
steps forward and/or rotate yourself by some angle. As soon as you finish moving, but
before opening your eyes, point to the object again. It is very likely that you pointed
very accurately and without taking any time to contemplate where the object might be
as a result of your movement. This task, which humans can carry out with such re-
markable efficiency and speed, entails rather complex mathematical computations. It
requires that the egocentric location of an object is initially encoded and then continu-
ously updated while moving in the environment. The mechanism that allows people to
update egocentric relations and stay oriented within their immediate surroundings is
commonly known as spatial updating.
Several studies have suggested that spatial updating takes place automatically with
physical movement because such movement provides the input that is necessary for

*
The presented experiments were conducted as part of an undergraduate thesis by
Stephanie Pantelidou.


updating [2, 4]. In the case of non-visual locomotion this input consists of kinesthetic
cues, vestibular feedback, and copies of efferent commands. The importance of
physical movement is corroborated by empirical findings showing that participants
point to a location equally fast and accurately from an initial standpoint and a novel
standpoint they adopt by means of physical movement (as in the example above). In
contrast, when the novel standpoint is adopted by merely imagining the movement,
participants are faster and more accurate to respond from their initial than their novel
(imagined) standpoint [5]. This is particularly the case when an imagined rotation is
needed to adopt the novel standpoint.
The traditional account for spatial updating [4, 6] posits that spatial relations are
encoded and updated on the basis of an egocentric reference frame (i.e., a reference
frame that is centered on one's body). Because egocentric relations are continuously
updated when moving, reasoning from one’s physical perspective is privileged as it
can be carried out on the basis of relations that are directly represented in memory.
Instead, reasoning from imagined perspectives is deliberate and effortful as it entails
performing “off-line” mental transformations to compute the correct response. Re-
cently, May proposed the sensorimotor interference account which places the exact
locus of difficulty for responding from imagined perspectives at the presence of con-
flicts between automatically-activated sensorimotor codes that specify locations rela-
tive to the physical perspective and cognitive codes that define locations relative to
the imagined perspective [7, 8]. Based on this account, while responding from an
actual physical perspective is facilitated by compatible sensorimotor codes, in order to
respond from an imagined perspective, the incompatible sensorimotor codes must be
inhibited while an alternative response is computed. The presence of conflicts reduces
accuracy and increases reaction time when reasoning from imagined perspectives. In
a series of elegant experiments, May provided support for the facilitatory and interfer-
ing effects of sensorimotor codes [7].
Recently, Kelly, Avraamides, and Loomis [9] dissociated the influence of sensori-
motor interference in spatial reasoning from effects caused by the organizational
structure of spatial memory (see also [10]). In one condition of the study participants
initially examined a spatial layout of 9 objects from a fixed standpoint and perspec-
tive. Then, they were asked to rotate 90° to their left or right to adopt a novel perspec-
tive. From this perspective participants carried out a series of localization trials that
involved pointing to object locations from various imagined perspectives. This para-
digm allowed dissociating the orientation of the testing perspective from that of the
perspective adopted during learning. This dissociation is deemed necessary in light of
evidence from several studies showing that spatial memories are stored with a pre-
ferred direction that is very often determined by the learning perspective [11]. Results
revealed that responding from imagined perspectives that coincided with either the
learning or the testing perspective was more efficient compared to responding from
other perspectives. A similar result was obtained in the earlier study of Mou, McNa-
mara, Valiquette, and Rump [10] which suggested that independent effects attributed
to the orientation of the body of the observer at test and the preferred storage orienta-
tion of spatial memory can be obtained in spatial cognition experiments. Kelly et al.
have termed the former effect as the sensorimotor alignment effect and the latter as
the memory-encoding alignment effect.

In order to investigate the boundary conditions of sensorimotor facilita-
tion/interference, Kelly et al. included an experimental condition in which participants
performed testing trials after having moved to an adjacent room. Results from this
condition revealed that when participants reasoned about relations that were not im-
mediately present, no sensorimotor interference/facilitation was exerted on perform-
ance. Only a memory-encoding alignment effect was obtained in this condition.
The study by Kelly et al. provided evidence that the orientation of one's body
when reasoning about space influences performance only when immediate spatial
relations are retrieved. Presumably this occurs because egocentric relations are main-
tained in a transient sensorimotor representation that functions to encode and auto-
matically update egocentric directions and distances to objects in one's immediate
surroundings [12, 13]. When reasoning about remote environments such a representa-
tion is of little, if any, use. In this case, a more enduring, perhaps allocentric, repre-
sentation would more suitably provide the information needed to compute spatial
relations on demand (see [14] for a comprehensive review of theories of memory that
provide for multiple encoding systems). If this is true, then the same pattern of find-
ings (i.e., presence of memory-encoding alignment effect but no sensorimotor align-
ment effect) should be expected when people reason about spatial relations included
in any remote environment regardless of how it is encoded. Although in our daily
lives we very frequently reason about environments that we have previously experi-
enced directly, in many cases we process spatial relations that have been committed to
memory through symbolic media such as pictures, movies, and language (e.g., plan-
ning a route after having studied a map).
While numerous studies have been carried out to examine how people reason about
depicted or described environments, most studies have either focused on examining
effects caused by the misalignment between medium and actual space [15] or have
confounded the orientations of the learning and testing perspectives [16]. As a result,
it is not yet known whether the orientation of the observer’s body mediates spatial
reasoning for environments encoded through symbolic media. The goal of the present
study is to assess whether the orientation of the body influences performance when
reasoning about spatial relations contained in a depicted (Experiment 1) or a described
(Experiment 2) remote layout. We expect that the use of remote environments will
give rise to a pattern of findings similar to those obtained in conditions in which par-
ticipants are tested after being removed from the learning environment. If such a
result is obtained, it would further highlight the fundamental difference between “on-
line” reasoning about immediate environments and “off-line” reasoning about remote
environments. A secondary goal of the study is to compare spatial reasoning for de-
picted and linguistic spatial scenes in order to assess the functional equivalence of
spatial layouts that are derived from different modalities. This is a question that has
accumulated increased theoretical interest in recent years, presumably because it bears
important implications for modern tools and applications that rely on sensory substitu-
tion, as in the case of navigational systems for the blind. Most previous studies tested
functional equivalence using environments that were immediate to participants [17-
19]. Although some indirect evidence suggests that learning an environment from a
map or text engages the same parieto-frontal network in the brain [20, 21], it is impor-
tant to test whether the same behavioral effects are found when reasoning for spatial
relations derived from different modalities. By comparing the findings of Experiments

1 and 2 in the present study, we will be able to assess the degree of functional equiva-
lence between scenes that are learned though pictures and language. Based on evi-
dence from previous studies that examined the organization of spatial memories
derived from maps and linguistic descriptions [22, 23], we expect that similar patterns
of findings will be found in the two experiments.
For the present experiments we adopted the paradigm used by Waller, Montello,
Richardson, and Hegarty [24] and previously by Presson and Hazelrigg [15]. In these
studies participants first learned various 4-point paths and then made judgments of
relative direction by adopting imagined perspectives within the paths. Trials could be
classified as aligned (i.e., the orientation of the imagined perspective matched the
physical perspective of the participant) or as contra-aligned (i.e., the imagined per-
spective deviated 180° from the physical perspective of the participant). The typical
result when participants carry out the task without moving from the learning stand-
point/perspective (Stay condition in [24]) is that performance is more efficient in
aligned than contra-aligned trials. This finding is commonly referred to as an align-
ment effect. Additional interesting conditions were included in the study by Waller.
In experiment 2, a Rotate condition was included. In this condition, participants per-
formed the task after having physically rotated 180°. The rationale was that if the
alignment effect is caused primarily by the learning orientation then a similar align-
ment effect to that of the Stay condition would be obtained. However, if the alignment
effect is caused by the influence of the orientation of the body at the time of test, a
reverse-alignment effect should be expected. Results, however, revealed no alignment
effect (see also [25]). Two additional conditions, namely the Rotate-Update and the
Rotate-Ignore, provided important results. In the Rotate-Update condition participants
were instructed to physically rotate 180° in place and imagine that the spatial layout
was behind them (i.e., they updated their position relative to the learned layout). In
the Rotate-Ignore condition participants also rotated by 180° but were asked to imag-
ine that the learned layout had rotated along with them. Results revealed a normal
alignment effect in the rotate-ignore condition but a reverse-alignment effect in the
rotate-update condition. Overall, these findings suggest that the orientation of the
body is important when reasoning about immediate environments.
In the present experiments we adopted the rationale of Waller et al. to examine the
presence of normal vs. reverse alignment effects in Stay and Rotate conditions. How-
ever, in contrast to Waller et al., the paths that we have used were not directly experi-
enced by participants. Instead, they were presented on a computer monitor as either
pictures (Experiment 1) or text route descriptions (Experiment 2). If the orientation
of the body of the participant at the time of test influences performance, a normal
alignment effect should be found in Stay conditions and a reverse alignment effect
should be obtained in Rotate conditions. However, if the learning perspective domi-
nates performance then a normal alignment effect should be expected in both Stay and
Rotate conditions. Finally, a third possibility is that both the learning and physical
perspectives influence performance, as shown by Kelly et al. for immediate environ-
ments. In that case, if the two effects are of equal magnitude then no alignment effect
should be expected in Rotate conditions as the two effects would cancel each other
out. However, without making any assumptions about the magnitude of the two ef-
fects, we should at least expect a reduced alignment effect in Rotate conditions, if
indeed both learning and physical perspectives influence reasoning.

2 Experiment 1
In Experiment 1 participants encoded paths that were depicted on a computer screen
and then carried out judgments of relative direction (JRDs). A Stay condition and a
Rotate condition (in which neither update nor ignore instructions were given) were
included. Based on previous findings documenting that the orientation of one’s body
does not typically influence spatial reasoning about non-immediate environments, we
predict that a normal alignment effect would be present in both the Stay and Rotate
conditions. We also expect that overall performance will be equal in the Stay and
Rotate conditions.

2.1 Method

Participants
Twenty-two students from an introductory psychology course at the University of
Cyprus participated in the experiment in exchange for course credit. Twelve were
assigned to the Stay condition and 10 to the Rotate condition.

Design
A 2 (observer position: Stay vs. Rotate) × 3 (imagined perspective: aligned 0°, mis-
aligned 90°, contra-aligned 180°) mixed factorial design was used. Observer position
was manipulated between subjects while imagined perspective varied within-subjects.

Materials and Apparatus


Two 19” LCD monitors attached to a computer running the Vizard software (from
WorldViz, Santa Barbara, CA) were used to display stimuli. The monitors were
placed facing each other and participants sat on a swivel chair placed in-between the
two monitors. Four paths were created as models with Source SDK (from Valve Cor-
poration). Oblique screenshots of these models constituted the spatial layouts that
participants learned. Each path consisted of 4 segments of equal length that con-
nected 5 numbered location points (Figure 1).
Pointing responses were made using a joystick with the angle of deflection and la-
tency of pointing being recorded by the computer on each trial.

2.2 Procedure

Prior to the beginning of the experiment participants were shown example paths on
paper and were instructed on how to perform JRDs. JRDs involve responding to
statements of the form “imagine being at x, facing y. Point to z” where x, y, and z are
objects/landmarks from the studied layout. Before the experiment proper, participants
performed several practice JRD trials using campus landmarks as targets, responding
both with their arms and with the joystick. Then,
participants were seated in front of one of the monitors and were asked to study the
first path. They were instructed to visualize themselves moving on the path. The

Fig. 1. Example of a path used in Experiment 1

initial direction of imagined movement was to the left for two paths and to the right for
the other two (e.g., Figure 1). This was done to avoid confounding the initial movement
direction with either the orientation of the body or the orientation opposite to it. Participants were given unlimited time to memorize the path and then proceeded to perform
the experimental trials. Each trial instructed them to imagine adopting a perspective
within the memorized path (e.g., “Imagine standing at 1 facing 2”) and to point from it
with the joystick toward a different position in the path (e.g., “Point to 3”). Participants
in the Stay condition performed the trials on the same monitor on which they had
previously viewed the path. Those in the Rotate condition were asked to rotate 180°
and perform the pointing trials on the other monitor. Participants were instructed to
respond as fast as possible but without sacrificing accuracy. Sixteen trials for each
path were included, yielding a total of 64 trials per participant. Four imagined perspectives (i.e., aligned 0°, misaligned 90° left, misaligned 90° right, and contra-aligned
180°) were equally represented in the 64 trials. Furthermore, correct pointing responses, which could be 45°, 90°, or 135° to the left or right of the forward joystick
position, were equally distributed across the four imagined perspectives. The
order of trials within each path was randomized. Also, the order in which the four
paths were presented to participants varied randomly.
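For illustration only, the correct response on a JRD trial can be derived from the layout geometry. The following sketch is our own and not part of the original materials or analysis code; the coordinates are hypothetical and merely mimic the shape of the paths used here (five numbered locations, equal-length segments, first leg to the left).

```python
import math

def jrd_angle(stand, face, target):
    """Signed egocentric pointing angle for a JRD trial of the form
    'Imagine standing at `stand`, facing `face`. Point to `target`.'
    Positive values mean the target lies to the right of the imagined
    facing direction, negative values to the left (in degrees)."""
    heading = math.atan2(face[1] - stand[1], face[0] - stand[0])
    bearing = math.atan2(target[1] - stand[1], target[0] - stand[0])
    angle = math.degrees(heading - bearing)      # clockwise-positive = rightward
    return (angle + 180.0) % 360.0 - 180.0       # wrap into [-180, 180)

# Hypothetical coordinates (in metres) for positions 1-5 of a path whose
# first leg goes to the left.
path = {1: (0, 0), 2: (-10, 0), 3: (-10, -10), 4: (-20, -10), 5: (-20, 0)}

# "Imagine standing at 1 facing 2. Point to 3."
print(jrd_angle(path[1], path[2], path[3]))      # -45.0, i.e. 45 degrees to the left
```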

2.3 Results

Separate analyses were carried out for pointing accuracy and for the latency of correct
responses. In order to classify responses as correct or incorrect, joystick deflection angles
were quantized as follows. Responses between 22.5° and 67.5° from the forward position of the joystick were classified as 45° responses to the left or right, depending on
the side of deflection. Similarly, responses that fell between 67.5° and 112.5° were
considered 90° responses to the left or right. Finally, responses between 112.5° and
157.5° were marked as 135° responses. Initial analyses of accuracy and latency involving all four imagined perspectives revealed no differences between the 90° left
and the 90° right perspectives in either the Stay or the Rotate condition. Therefore, data for
these two perspectives were averaged to form a misaligned 90° condition. A 2 (observer position) x 3 (imagined perspective) mixed-model analysis of variance
(ANOVA) was conducted for both the accuracy and the latency data.
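A minimal sketch of the scoring scheme just described (our reconstruction, not the original analysis code; the treatment of responses exactly on a bin boundary or outside all bins is an assumption):

```python
def quantize_deflection(deflection_deg):
    """Map a signed joystick deflection (positive = right, negative = left)
    onto the 45/90/135-degree response categories used for scoring.
    Returns None for deflections outside the scored ranges (an assumption)."""
    side = 'right' if deflection_deg >= 0 else 'left'
    magnitude = abs(deflection_deg)
    if 22.5 <= magnitude < 67.5:
        return (45, side)
    if 67.5 <= magnitude < 112.5:
        return (90, side)
    if 112.5 <= magnitude < 157.5:
        return (135, side)
    return None

def is_correct(deflection_deg, correct_angle, correct_side):
    """A trial is correct if the binned response matches the correct angle and side."""
    return quantize_deflection(deflection_deg) == (correct_angle, correct_side)

print(quantize_deflection(-50))        # (45, 'left')
print(is_correct(100, 90, 'right'))    # True
```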

Accuracy
The analysis revealed that overall accuracy was somewhat higher in the Stay (79.9%)
than in the Rotate (73.9%) condition. However, this difference did not reach statistical
significance, F(1,20)=.92, p=.35, η²=.04. A significant main effect for imagined
perspective was obtained, F(2,40)=8.44, p<.001, η²=.30. As seen in Table 1, accuracy
was highest for the aligned 0° perspective (84.4%), intermediate for the misaligned 90°
perspective (76.2%), and lowest for the contra-aligned 180° perspective (70.2%).
Within-subject contrasts verified that all pair-wise differences were significant,
ps<.05. Importantly, this pattern was obtained in both the Stay and Rotate conditions,
as evidenced by the absence of a significant interaction, F(2,40)=.40, p=.68, η²=.02.

Table 1. Accuracy (%) in Experiment 1 as a function of observer position and imagined per-
spective. Values in parentheses indicate standard deviations.

          Aligned 0°       Misaligned 90°     Contra-Aligned 180°

Stay      86.27 (18.40)    78.57 (17.55)      75.00 (23.23)

Rotate    82.45 (13.11)    73.75 (13.28)      65.42 (15.63)

Latency
The analysis of latencies yielded similar findings with the accuracy data. No differ-
ences were obtained between the Stay (11,63s) and the Rotate (11,45s) conditions,
F(1,20)=.03, p=.87, η2=.001. However, a significant main effect was obtained for
imagined perspective, F(2,40)=19,96, p<.001, η2 =.50.
As seen in Figure 2, pointing was faster in the aligned 0° condition (9,80s), inter-
mediate in the misaligned 90° condition (11,47s), and the slowest in the contra-
aligned 180° condition (13,35s). All pair-wise comparisons were significant, p´s<.01.

Fig. 2. Latency for pointing responses as a function of observer position and imagined perspec-
tive in Experiment 1. Error bars represent standard errors.

Finally, the interaction between observer position and imagined perspective was not
significant, F(2,40)=.72, p=.50, η²=.04.

2.4 Discussion

Results from Experiment 1 clearly documented the presence of a normal alignment
effect in both the Stay and Rotate conditions. This effect was present in both accuracy
and latency. These findings contrast with those of Waller et al. [24], who found no alignment effect in their Rotate condition and a reverse alignment effect in their Rotate-Update
condition. The critical difference between the two studies is, in our opinion, the
fact that our depicted scenes referred to non-immediate environments, whereas the layouts in Waller et al.’s study were immediate to participants. We will return to this
issue in the General Discussion.

3 Experiment 2
Experiment 2 was identical to Experiment 1, except that the paths were presented as
text route descriptions rather than as pictures. Previous studies
with route descriptions have documented a strong influence of the
orientation of the first travel segment of the path on spatial performance [26]; this
suggests that the way the path is represented in memory determines the ease of spatial
reasoning. Based on these findings, we expected that no influence of body orientation
would be evident in our experiment. As in Experiment 1, we predicted the presence of
a normal alignment effect in both the Stay and Rotate conditions.

3.1 Method

Participants
Twenty-two students, none of whom had participated in Experiment 1, took part in
the experiment in exchange for course credit. Half were randomly assigned to the
Stay condition and the other half to the Rotate condition.

Design
As in Experiment 1, the design was a 2 (observer position: Stay vs. Rotate) x 3
(imagined perspective: aligned 0°, misaligned 90°, contra-aligned 180°) mixed factorial, with observer position as the between-subjects factor and imagined perspective as
the within-subjects factor.

Materials and Apparatus

In contrast to Experiment 1, the paths were learned through text descriptions presented on the screen. These descriptions were presented in Greek, the native language
of all participants. Prior to the experiment, participants were shown a picture like the
one in Figure 1 but containing no path. They were told that this was the
environment in which they should imagine themselves standing. The text descriptions described the same paths as in Experiment 1. An English translation of an example
description reads as follows:

Imagine standing at the beginning of a path. The position that you are
standing at is position 1. Without moving from this position, you turn
yourself to the left. Then, you walk straight for 10 meters and you reach
position 2. As soon as you get there you turn towards the left again and
you walk another 10 meters to reach position 3. At this position, you
turn to your right and walk another 10 meters to position 4. Finally, you
turn again to your right and walk another 10 meters towards position 5
which is the endpoint of the path.
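The geometry implied by such a description can be made explicit by simulating the walk. The sketch below is our own illustration (not part of the original materials); the initial facing direction (here the positive y-axis) is an assumption, since the description only specifies turns relative to it.

```python
import math

TURN = {'left': 90.0, 'right': -90.0}   # counterclockwise degrees

def walk(turns, start=(0.0, 0.0), heading=90.0, leg=10.0):
    """Return the coordinates of positions 1..N visited along the described route:
    turn in place, then walk one leg of `leg` metres, for each entry in `turns`."""
    x, y = start
    positions = [(x, y)]                        # position 1
    for turn in turns:
        heading += TURN[turn]
        x += leg * math.cos(math.radians(heading))
        y += leg * math.sin(math.radians(heading))
        positions.append((x, y))
    return positions

# "turn left ... position 2; turn left ... position 3; turn right ... 4; turn right ... 5"
print(walk(['left', 'left', 'right', 'right']))
# ~[(0, 0), (-10, 0), (-10, -10), (-20, -10), (-20, 0)] up to floating-point error
```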

3.2 Procedure

The procedure was identical to that of Experiment 1. Prior to reading the descriptions
participants were instructed to visualize themselves moving along the described path
and imagine turning 90° whenever a turn was described. As in Experiment 1, the
initial movement direction was to the left for two paths and to the right for the other
two. Participants in the Rotate condition carried out a physical 180° turn prior to be-
ginning the test trials.

3.3 Results

As in Experiment 1, no differences were obtained between the 90° left and the 90°
right imagined perspective in either accuracy or latency. Therefore, data were aver-
aged across these two perspectives to form a 90° misaligned perspective condition.
Separate 2 x 3 mixed-model ANOVAs were then conducted for accuracy and
latency.

Accuracy
The ANOVA on the accuracy data revealed that overall performance was equivalent
in the Stay (68.7%) and the Rotate (70.3%) conditions, F(1,20)=.40, p=.84, η²=.002.
A significant main effect for imagined perspective was obtained,
F(2,40)=17.60, p<.001, η²=.47. As seen in Table 2, accuracy was highest for the
aligned 0° perspective (77.1%), intermediate for the misaligned 90° perspective
(69.8%), and lowest for the contra-aligned 180° perspective (61.7%). Within-subject
contrasts verified that all pair-wise differences were significant, ps<.05.
These differences among perspectives were present in both the Stay and Rotate conditions, as suggested by the lack of a significant interaction, F(2,40)=.22, p=.81, η²=.01.

Table 2. Accuracy (%) in Experiment 2 as a function of observer position and imagined per-
spective. Values in parentheses indicate standard deviations.

          Aligned 0°       Misaligned 90°     Contra-Aligned 180°

Stay      76.96 (20.59)    69.24 (20.84)      59.94 (23.67)

Rotate    77.15 (16.89)    70.38 (18.85)      63.45 (17.37)

Fig. 3. Latency for pointing responses as a function of observer position and imagined perspective in Experiment 2. Error bars represent standard errors.

Latency
The analysis revealed no difference in performance between the Stay (12.39 s) and the Rotate
(11.79 s) conditions, F(1,20)=.12, p=.74, η²=.006. A significant main effect was present for imagined perspective, F(2,40)=24.22, p<.001, η²=.55.
As seen in Figure 3, participants pointed fastest in the aligned 0° condition (10.51 s),
at intermediate speed in the misaligned 90° condition (11.82 s), and slowest in the contra-aligned 180° condition (13.94 s). All pair-wise comparisons were significant, ps<.05.
Finally, the interaction between observer position and imagined perspective was not
significant, F(2,40)=.41, p=.67, η²=.02.

3.4 Discussion and Cross-Experiment Analyses

Results from Experiment 2 closely replicated those of Experiment 1. Specifically, a
normal alignment effect was evidenced in both the Stay and Rotate conditions. This effect
was present in both the accuracy and the latency data. Furthermore, performance did not
seem to be influenced by rotation, as indicated by the equal overall performance in
the Stay and Rotate conditions. The presence of a similar pattern of findings
with depicted and described scenes is compatible with recent accounts of functional
equivalence of representations derived from various modalities. To further assess functional equivalence, we conducted a cross-experiment analysis using the data from
Experiments 1 and 2. Separate 3 x 2 ANOVAs, with imagined perspective as a
within-subjects factor and experiment (visual vs. verbal) as a between-subjects factor,
were carried out for the accuracy and latency data.
Accuracy was higher in the visual task of Experiment 1 (77.2%) than in the verbal
task of Experiment 2 (69.4%). However, this difference fell short of significance,
F(1,42)=2.37, p=.13, η²=.05. The interaction between experiment and imagined perspective was also non-significant, F(2,84)=.18, p=.84, η²=.004. The only significant
effect was the main effect of imagined perspective, F(2,84)=24.12, p<.001, η²=.37.
Similarly, the only significant effect in the latency analysis was the main effect of
perspective, F(2,84)=45.21, p<.001, η²=.52. In support of the functional equivalence
hypothesis, neither the main effect of experiment nor the interaction between experiment and imagined perspective was significant, F(1,42)=.32, p=.58, η²=.01 and
F(2,84)=.14, p=.87, η²=.003, respectively.

4 General Discussion
The experiments presented here provide evidence for a lack of sensorimotor influence on reasoning about spatial relations contained in depicted or described environments. The current findings deviate from those obtained in experiments with real
visual scenes, in which the influence of body orientation was substantial [9, 10].
While our findings suggest that reasoning through symbolic media might not always be equivalent to reasoning about actual environments, in our opinion the critical variable is not whether the environments are experienced directly through our
senses or indirectly through symbolic media, but rather whether the spatial relations
they contain are immediate or not (see [9]). We believe that reasoning about remote
locations is free of sensorimotor facilitation/interference. Because symbolic media are
typically used to encode non-immediate spatial relations while immediate relations
are encoded through direct experience, the difference in findings occurs. Compatible
with this explanation are the findings of Kelly et al., which showed that no sensorimotor influence occurs when participants are removed from the spatial layout they had
previously encoded by means of visual perception [9].
The current findings are compatible with theories of spatial memory and action that
posit separate systems for encoding egocentric and allocentric relations [8, 27, 28]. In
these theories, egocentric relations are maintained in a transient sensorimotor memory
system and are updated as one moves within the environment. On the other hand,
allocentric relations (i.e., inter-object directions and distances) are maintained in an
enduring memory system. As Mou et al. [10] suggested, memories in the enduring
system are stored with a preferred orientation, which can be chosen based on a variety
of factors that include the viewing perspective, instructions, the internal structure of the
layout, etc.
In their critical evaluation of spatial memory theories, Avraamides and Kelly [14]
argued that when reasoning about immediate spatial relations, both the transient sen-
sorimotor and the enduring systems are relevant to the task. When a participant is
asked to point to a location from her actual perspective, performance is facilitated by
the fact that the self-to-object vector signifying the correct response is directly repre-
sented in the sensorimotor system and is automatically activated as suggested by May
[7]. However, in order to point from an imagined perspective, the participant must
suppress this self-to-object vector and compute a response using the inter-object rela-
tions from the enduring system. As Waller and Hodgson [28] have recently suggested,
computations from the enduring system are cognitively effortful. Reasoning from
imagined perspectives is thus expected to take longer and be prone to sensorimotor
interference. Avraamides and Kelly also argued that when reasoning about non-
immediate spatial relations only the enduring system is relevant to the task. This is the
case because the transient egocentric system functions to encode the current surround-
ings and not the layout one reasons about. As a result, performance is neither facili-
tated nor interfered with by the physical orientation of the participant.
The tasks we used in these experiments seem to fall under the second type of reasoning described by Avraamides and Kelly. We have used pictures and descriptions that
referred to spatial layouts that were understood as remote to participants. We have
also instructed participants to visualize themselves within the environment that was
shown or described. If indeed the environments were understood to be remote, no
egocentric relations should have been formed between the actual self and the loca-
tions contained in the layouts. Indeed, we believe that the task was executed solely on
the basis of an enduring allocentric system and we therefore attribute the alignment
effect that was found in all our conditions to the way the paths were represented in
memory. In the case of Experiment 1, we believe that paths were organized in mem-
ory on the basis of viewing experience (i.e., as a snapshot taken from a vantage point
that coincided with the physical observation point of the participant). In the case of
Experiment 2, paths were maintained in memory from the initial imagined facing
direction. Although no instructions were given to participants in
terms of imagining an initial facing direction, adopting one that is aligned with their
actual facing direction seems less taxing on cognitive resources. Indeed, a number of
previous studies have suggested that people have difficulty in maintaining misaligned
imagined perspectives [26].
It should be noted that, while we claim that no egocentric relations between the self
and the elements of the path were formed, we acknowledge that participants’ transient
egocentric systems would have been used to encode and update egocentric relations
to objects in the laboratory, including the two computer monitors used to present
stimuli. Moreover, spatial relations between each path location and an imagined
representation of the self within the path could have been formed. However, such
relations could be more easily classified as allocentric rather than egocentric if the
self in the imagined path is regarded as just another location in the layout.
A secondary goal of our study was to assess the degree of functional equivalence
between spatial representations created from depicted and described scenes. An im-
portant result is that the same pattern of findings (i.e., a normal alignment effect) was
observed in the two experiments. While performance was somewhat more accurate
for depicted than described scenes, our cross-experiment analysis revealed that the
difference was not significant. The difference in mean accuracy is not surprising
given findings from previous studies showing that it takes longer to reach the same
level of learning when encoding spatial layouts through language than vision [17-19].
In the current study we have used no learning criterion. Instead, participants were
provided with unlimited time to study the layouts in the two experiments. The accu-
racy and fidelity of their spatial representations were, however, not assessed prior to
testing. It is possible, then, that the overall performance difference between described
and depicted scenes was caused by differences in encoding. Previous studies suggest that
functional equivalence for representations acquired from different modalities is
achieved after equating conditions in terms of encoding differences [3, 17]. A future
direction for research would thus be to examine functional equivalence for representations of remote environments after taking into account the differences that may exist
across modalities in terms of encoding.

Acknowledgments. We are grateful to all the students who participated in the study.

References
1. Amorim, M.A., et al.: Updating an object’s orientation and location during nonvisual navi-
gation: a comparison between two processing modes. Percept. Psychophys. 59(3), 404–418
(1997)
2. Farrell, M.J., Thomson, J.A.: On-Line Updating of Spatial Information During Locomo-
tion Without Vision. J. Mot. Behav. 31(1), 39–53 (1999)
3. Loomis, J.M., et al.: Spatial updating of locations specified by 3-d sound and spatial lan-
guage. J. Exp. Psychol. Learn. Mem. Cogn. 28(2), 335–345 (2002)
4. Rieser, J.J.: Access to knowledge of spatial structure at novel points of observation. J. Exp.
Psychol. Learn. Mem. Cogn. 15(6), 1157–1165 (1989)
5. Presson, C.C., Montello, D.R.: Updating after rotational and translational body move-
ments: coordinate structure of perspective space. Perception 23(12), 1447–1455 (1994)
6. Wang, R.F., Spelke, E.S.: Updating egocentric representations in human navigation. Cog-
nition 77(3), 215–250 (2000)

7. May, M.: Imaginal perspective switches in remembered environments: transformation versus interference accounts. Cognit. Psychol. 48(2), 163–206 (2004)
8. Mou, W., et al.: Roles of egocentric and allocentric spatial representations in locomotion
and reorientation. J. Exp. Psychol. Learn. Mem. Cogn. 32(6), 1274–1290 (2006)
9. Kelly, J.W., Avraamides, M.N., Loomis, J.M.: Sensorimotor alignment effects in the learn-
ing environment and in novel environments. J. Exp. Psychol. Learn. Mem. Cogn. 33(6),
1092–1107 (2007)
10. Mou, W., et al.: Allocentric and egocentric updating of spatial memories. J. Exp. Psychol.
Learn. Mem. Cogn. 30(1), 142–157 (2004)
11. Mou, W., McNamara, T.P.: Intrinsic frames of reference in spatial memory. J. Exp. Psy-
chol. Learn. Mem. Cogn. 28(1), 162–170 (2002)
12. Wang, R.F.: Between reality and imagination: when is spatial updating automatic? Percept
Psychophys 66(1), 68–76 (2004)
13. Wang, R.F., Brockmole, J.R.: Human navigation in nested environments. J. Exp. Psychol.
Learn. Mem. Cogn. 29(3), 398–404 (2003)
14. Avraamides, M.N., Kelly, J.W.: Multiple systems of spatial memory and action. Cogn.
Process (2007)
15. Presson, C.C., Hazelrigg, M.D.: Building spatial representations through primary and sec-
ondary learning. J. Exp. Psychol. Learn. Mem. Cogn. 10(4), 716–722 (1984)
16. Avraamides, M.N.: Spatial updating of environments described in texts. Cognit. Psy-
chol. 47(4), 402–431 (2003)
17. Avraamides, M.N., et al.: Functional equivalence of spatial representations derived from
vision and language: evidence from allocentric judgments. J. Exp. Psychol. Learn. Mem.
Cogn. 30(4), 804–814 (2004)
18. Klatzky, R.L., et al.: Encoding, learning, and spatial updating of multiple object locations
specified by 3-D sound, spatial language, and vision. Exp. Brain. Res. 149(1), 48–61
(2003)
19. Klatzky, R.L., et al.: Learning directions of objects specified by vision, spatial audition, or
auditory spatial language. Learn. Mem. 9(6), 364–367 (2002)
20. Mellet, E., et al.: Neural basis of mental scanning of a topographic representation built
from a text. Cereb Cortex 12(12), 1322–1330 (2002)
21. Mellet, E., et al.: Neural correlates of topographic mental exploration: the impact of route
versus survey perspective learning. Neuroimage 12(5), 588–600 (2000)
22. Taylor, H.A., Tversky, B.: Descriptions and depictions of environments. Mem. Cog-
nit. 20(5), 483–496 (1992)
23. Denis, M., Zimmer, H.D.: Analog properties of cognitive maps constructed from verbal
descriptions. Psychological Research 54(4), 286–298 (1992)
24. Waller, D., et al.: Orientation specificity and spatial updating of memories for layouts. J.
Exp. Psychol. Learn. Mem. Cogn. 28(6), 1051–1063 (2002)
25. Harrison, A.M.: Reversal of the alignment effect: influence of visualization and spatial set
size. In: Proceedings of the Annual Cognitive Science Meeting (2007)
26. Wildbur, D.J., Wilson, P.N.: Influences on the first-perspective alignment effect from text
route descriptions. Q. J. Exp. Psychol. 61(5), 763–783 (2007)
27. Easton, R.D., Sholl, M.J.: Object-array structure, frames of reference, and retrieval of spa-
tial knowledge. J. Exp. Psychol. Learn. Mem. Cogn. 21(2), 483–500 (1995)
28. Waller, D., Hodgson, E.: Transient and enduring spatial representations under disorienta-
tion and self-rotation. J. Exp. Psychol. Learn. Mem. Cogn. 32(4), 867–882 (2006)
Spatial Memory and Spatial Orientation

Jonathan W. Kelly and Timothy P. McNamara

Department of Psychology, Vanderbilt University


111 21st Ave. South, Nashville, TN 37203
jonathan.kelly@vanderbilt.edu

Abstract. Navigating through a remembered space depends critically on the
ability to stay oriented with respect to the remembered environment and to reorient after becoming lost. This chapter describes the roles of long-term spatial
memory, sensorimotor spatial memory, and path integration in determining spatial orientation. Experiments presented here highlight the reference direction
structure of long-term spatial memory and suggest that self-position and orientation during locomotion are updated with respect to those reference directions.
These results indicate that a complete account of spatial orientation requires a
more thorough understanding of the interaction between long-term spatial
memory, sensorimotor spatial memory, and path integration.

Keywords: Navigation; Path integration; Reorientation; Spatial cognition; Spatial memory; Spatial updating.

1 Introduction

Navigation through a familiar environment can be considered a two-part task, where
the successful navigator must first orient him or herself with respect to the known
environment and then determine the correct travel direction in order to arrive at the
goal location. Several accounts of spatial memory and spatial orientation have been
reported in recent years to explain human navigation abilities (Avraamides & Kelly,
2008; Kelly, Avraamides & Loomis, 2007; Mou, McNamara, Valiquette & Rump,
2004; Rump & McNamara, 2007; Sholl, 2001; Waller & Hodgson, 2006; Wang &
Spelke, 2000). Inspired in part by perceptual theories positing separate representations
for perception and action (Bridgeman, Lewis, Heit & Nagle, 1979; Milner & Goodale,
1995; Schneider, 1969), many of these theories of spatial memory agree that a com-
plete account of human navigation and spatial orientation requires multiple spatial
representations. The first such spatial representation is a long-term representation, in
which locations are represented in an enduring manner. This long-term representation
allows the navigator to plan future travels, recognize previously experienced envi-
ronments, and identify remembered locations, even when those locations are obscured
from view. The preponderance of evidence from spatial memory experiments indi-
cates that these long-term representations are orientation dependent, with privileged
access to particular orientations (see McNamara, 2003 for a review). Section 2 (be-
low) reviews the evidence for orientation dependence, and also details recent experi-
ments aimed at understanding the relevant cues that determine which orientations
receive privileged access, particularly in naturalistic environments that contain many
potential cues.
The second spatial representation consistently implicated in models of spatial
memory is a working memory representation, referred to here as a sensorimotor rep-
resentation, in which locations are represented only transiently. The sensorimotor
representation is thought to be used when performing body-defined actions, such as
negotiating obstacles and moving toward intermediate goal locations like landmarks,
which can function as beacons. Because these behaviors typically rely on egocentri-
cally organized actions, it makes sense that this sensorimotor representation should
also be egocentrically organized, in order to maintain an isomorphic mapping be-
tween representation and response. The evidence reviewed in Section 3 supports this
conjecture, indicating that the sensorimotor representation is organized in an egocen-
tric framework. Although most models of spatial memory agree that the sensorimotor
representation is transient, the exact nature of its transience is not well understood.
While some experiments indicate that the sensorimotor representation fades with time
(e.g., Mou et al., 2004), other evidence shows that the sensorimotor representation
depends primarily on environmental cues that are only transiently available during
locomotion (Kelly et al., 2007). Evidence supporting these two claims is presented in
Section 3.
In order to stay oriented with respect to a known environment, the navigator must
be able to identify salient features of his or her surrounding environment and match
those features with the same features in long-term spatial memory. This point be-
comes particularly evident when attempting to reorient after becoming lost. For ex-
ample, a disoriented student might have an accurate long-term representation of the
campus, along with a vivid sensorimotor representation of his or her surrounding
environment. But unless the student can identify common features shared by both
representations, and bring those representations into alignment based on their com-
mon features, he or she will remain disoriented. Neither the sensorimotor nor the
long-term representation alone contains sufficient information to re-establish location
and orientation within the remembered space. Instead, the disoriented navigator must
be able to align the long-term representation, which contains information about how
to continue toward one’s navigational goal, with the sensorimotor representation of
the immediately surrounding space, similar to how visitors to unfamiliar environ-
ments will often align a physical map with the visible surrounds during navigation. In
Section 4, we review previous work on cues to reorientation, and frame these results
in the context of this matching process between long-term and sensorimotor represen-
tations. We also present new data from two experiments exploring the differences and
similarities in spatial cue use during reorientation and maintenance of orientation, two
tasks integral to successful navigation. The results suggest that spatial orientation is
established with respect to the same reference directions that are used to organize
long-term spatial memories.

2 Long-Term Spatial Memory


An every-day task like remembering the location of one’s car in a stadium parking lot
draws on the long-term spatial memory of the remembered environment. Because
locations are inherently relative, objects contained in this long-term spatial memory
must be specified in the context of a spatial reference system. For example, a football
fan might remember the location of his or her car in the stadium parking lot
with respect to the rows and columns of cars, or possibly with respect to the car’s
location relative to the main stadium entrance. In either case, the car’s location must
be represented relative to some reference frame, which is likely to be centered on the
environment.
Much of the experimental work on the organization of long-term spatial memories
has focused on the cues that influence the selection of one spatial reference system
over the infinite number of candidate reference systems. In these experiments, par-
ticipants learn the locations of objects on a table, within a room, or throughout a city,
and are later asked to retrieve inter-object spatial relationships from the remembered
layout. A variety of spatial memory retrieval tasks have been employed, including
map drawing, picture recognition, and perspective taking. These retrieval tasks are
commonly performed after participants have been removed from the learning envi-
ronment, to ensure that spatial memories are being retrieved from the long-term repre-
sentation and not from the sensorimotor representation. Here we focus primarily on
results from perspective taking tasks, where participants point to locations from imag-
ined perspectives within the remembered environment. A consistent finding from
these experiments is that long-term spatial memories are typically represented with
respect to a small number of reference directions, centered on the environment and
selected during learning (see McNamara, 2003, for a review). During spatial memory
retrieval, inter-object spatial relationships aligned with those reference directions are
readily accessible because they are directly represented in the spatial memory. In
contrast, misaligned spatial relationships must be inferred from other represented
relationships, and this inference process is cognitively effortful (e.g., Klatzky, 1998).
The pattern of response latencies and pointing errors across a sample of imagined
perspectives is interpreted as an indicator of the reference directions used to organize
the spatial memory, and a large body of work has focused on understanding the cues
that influence the selection of one reference direction over another during acquisition
of the memory.
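As a rough illustration of that interpretive logic (a simplified heuristic of our own construction, not the statistical analysis used in the studies reviewed here), the facilitated perspectives can be read off a pattern of pointing errors as those closest to the minimum error:

```python
# Hypothetical mean absolute pointing errors (degrees) for eight imagined
# perspectives; the numbers are invented purely for illustration.
errors = {0: 18, 45: 37, 90: 34, 135: 40, 180: 25, 225: 41, 270: 36, 315: 38}

def candidate_reference_directions(errors, tolerance=5.0):
    """Perspectives whose error lies within `tolerance` degrees of the minimum:
    a crude stand-in for the contrasts used to identify facilitated
    (reference-aligned) perspectives."""
    best = min(errors.values())
    return sorted(p for p, e in errors.items() if e <= best + tolerance)

print(candidate_reference_directions(errors))   # [0] in this invented example
```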
Because we use our bodies to sense environmental information and also to act on
the environment, the body’s position during learning seems likely to have a large
influence on selecting a reference direction. Consistent with this thinking, early evi-
dence indicated that perspectives aligned with experienced views are facilitated rela-
tive to non-experienced views. This facilitation fell off as a function of angular dis-
tance from the experienced views (Diwadkar & McNamara, 1997; Roskos-Ewoldsen,
McNamara, Shelton & Carr, 1999; Shelton & McNamara, 1997), and this pattern of
facilitation holds true for up to three learned perspectives. These findings resonate
with similar findings from object recognition (Bülthoff & Edelman, 1992), but are
complicated by two other sets of findings. First, Kelly et al. (2007; see also
Avraamides & Kelly, 2005) had participants learn a layout of eight objects within an
octagonal room using immersive virtual reality. Participants freely turned and ex-
plored the virtual environment during learning, but the initially experienced perspec-
tive was held constant. After learning this layout, imagined perspectives aligned with
the initially experienced perspective were facilitated, and this pattern persisted even
after extensive experience with other perspectives misaligned with that initial view.
The authors concluded that participants established a reference direction upon first
experiencing the environment, and that this organization was not updated even after
learning from many other views. Second, Shelton and McNamara (2001; also see
Hintzman, O’Dell & Arndt, 1981) found that the environmental shape has a profound
impact on selecting reference directions. In one of their experiments, participants
learned a layout of objects on the floor of a rectangular room. Learning occurred
from two perspectives, one parallel with the long axis of the room and one misaligned
with the room axis. Perspective taking performance was best when imagining the
aligned learning perspective and performance on the misaligned learning perspective
was no better than on non-experienced perspectives. The authors concluded that
reference directions are selected based on a combination of egocentric experience and
environmental structure, and that the rectangular room served as a cue to selecting a
reference direction consistent with that structure. This finding is supported by other
work showing facilitated retrieval of inter-object relationships aligned with salient
environmental features like city streets, large buildings, and lakes (McNamara, Rump
& Werner, 2003; Montello, 1991; Werner & Schmidt, 1999).
Other work has shown that selection of reference directions is influenced not only
by features external to the learned layout, but also by the structure of the learned lay-
out itself. For example, the reference directions used to remember the locations of
cars in a stadium parking lot might be influenced by the row and column structure of
the very cars that are being learned. Mou and McNamara (2002) demonstrated the
influence of this intrinsic structure by having participants study a rectilinear object
array. The experimenter pointed out the spatial regularity of the layout, which con-
tained rows and columns oblique to the viewing perspective during learning. Subse-
quent perspective taking performance was best for perspectives aligned with the
intrinsic axes defined by the rows and columns of objects, even though those perspec-
tives were never directly experienced during learning. Furthermore, this influence of
the intrinsic object structure is not dependent on experimenter instructions like those
provided in Mou and McNamara’s experiments. Instead, an axis of bilateral symme-
try within the object array can induce the same organization with respect to an intrin-
sic frame of reference, defined by the symmetry axis (Mou, Zhao & McNamara,
2007).
To summarize the findings reviewed so far, the reference directions used to organ-
ize long-term spatial memories are known to be influenced by egocentric experience,
extrinsic environmental structures like room walls (extrinsic to the learned layout),
and intrinsic structures like rows and columns of objects or symmetry axes (intrinsic
to the learned layout). While these cues have each proven influential in cases where
only one or two cues are available, real world environments typically contain a whole
host of cues, including numerous extrinsic and intrinsic cues like sidewalks, tree lines,
waterfronts, and mountain ranges. A recent set of experiments reported by Kelly &
McNamara (2008) sought to determine whether one particular cue type is dominant in
a more representative scene, where egocentric experience, extrinsic structure, and
intrinsic structure all provided potential cues to selecting a reference direction. In the
first of two experiments using immersive virtual reality, participants learned a layout
of seven virtual objects from two perspectives. The objects were arranged in rows and
columns which were oblique to the walls of a surrounding square room (termed the
incongruent environment, since intrinsic and extrinsic environmental structures
[Figure 1: two panels plotting absolute pointing error (deg) against imagined perspective (deg, 0–315), with separate curves for the two viewing orders (0°–135° and 135°–0°); top panel: incongruent environment, bottom panel: congruent environment; plan-view insets mark the 0° and 135° learning views.]

Fig. 1. Stimuli and results from Kelly and McNamara (2008). Plan views of the incongruent
(top) and congruent (bottom) environments appear as insets within each panel. In the plan
views, open circles represent object locations, solid lines represent room walls, and arrows
represent viewing locations during learning. Pointing error is plotted as function of imagined
perspective, separately for the two viewing orders (0° then 135° or 135° then 0°). After learn-
ing the incongruent environment (top), where intrinsic and extrinsic structures were incongru-
ent with one another, performance was best on the initially experienced view. After learning
the congruent environment (bottom), where intrinsic and extrinsic structures were congruent
with one another, performance was best for perspectives aligned with the redundant environ-
mental structures, regardless of viewing order.

were incongruent with one another; see Figure 1, top panel). One of the learned per-
spectives (0°) was aligned with the intrinsic object structure, and the other (135°) was
aligned with the extrinsic room structure. Learning occurred from both views, and
viewing order was manipulated. If the intrinsic structure was more salient than extrin-
sic structure, then participants should have selected a reference direction from the 0°
view (aligned with the rows and columns of the layout). However, if extrinsic struc-
ture was more salient than intrinsic structure, then participants should have selected a
reference direction from the 135° view (aligned with the walls of the room). Finally, if
the competing intrinsic and extrinsic structures negated one another’s influence, then
participants should have selected a reference direction from the initially experienced
view, regardless of its alignment with a particular environmental structure. In fact,
spatial memories of the incongruent environment (top panel of Figure 1) were based on
the initially experienced view, and the pattern of facilitation is well predicted by the
viewing order. Neither the intrinsic structure of the objects nor the extrinsic structure of
the room was more salient when the two were placed in competition.
In the second experiment reported by Kelly and McNamara (2008), the intrinsic and
extrinsic structures were placed in alignment with one another (termed the congruent
environment; see inset in Figure 1, bottom panel), and learning occurred from two
perspectives, one aligned and one misaligned with the congruent environmental struc-
tures. Spatial memories of the congruent environment (bottom panel of Figure 1) were
organized around the redundantly defined environmental axes. Performance was best
for perspectives aligned with the congruent intrinsic and extrinsic structures, and was
no better on the misaligned experienced view than on other misaligned views that were
never experienced. The results of these two experiments fit well with those reported by
Shelton and McNamara (2001), where multiple misaligned extrinsic structures (a
rectangular room and a square mat on the floor) resulted in egocentric selection of
reference directions, but aligned extrinsic structures resulted in environment-based
selection. Taken together, these findings indicate that intrinsic and extrinsic structures
are equally salient, and can serve to reinforce or negate the influences of one another as
cues to the selection of reference directions. Every-day environments typically contain
multiple intrinsic and extrinsic structures like roads, waterfronts, and tree lines, and
these structures often define incongruent sets of environmental axes. As such, it
is possible that reference directions are most commonly selected on the basis of
egocentric experience.
Experiments on long-term spatial memory have regularly provided evidence that
long-term representations are orientation-dependent, allowing for privileged access to
spatial relations aligned with a reference direction centered on the environment. How-
ever, the evidence reviewed thus far is based primarily on imagined perspective tak-
ing performance, and experiments using scene recognition indicate that there may be
more than one long-term representation. Valiquette and McNamara (2007; also see
Shelton & McNamara, 2004) had participants learn a layout of objects from two per-
spectives, one aligned and one misaligned with the extrinsic structure of the environ-
ment (redundantly defined by the room walls and a square mat on the floor). As in
other experiments (e.g., Kelly & McNamara, 2008; Shelton & McNamara, 2001),
perspective taking performance was better when imagining the aligned learning per-
spective than when imagining the misaligned learning perspective, which was no
better than when imagining other misaligned perspectives that were never experi-
enced. In contrast, scene recognition performance was good on both the aligned and
misaligned learning perspectives, and fell off as a function of angular distance from
the learned perspectives. So while imagined perspective taking performance indicated
that the misaligned learning view was not represented in long-term memory, scene
recognition performance indicated that the misaligned view was represented. The
authors interpreted this as evidence for two long-term representations, one used for
locating self-position (active during the scene recognition test) and the other for locat-
ing goal locations after establishing self-position (active during the perspective taking
task). Importantly, both representations were found to be orientation-dependent, but
the reference directions used to organize the two types of representations were differ-
ent. The influence of these reference directions on navigation is still unclear. One
possibility is that spatial relationships are more accessible when the navigator is
aligned with a reference direction in long-term memory. As a result, a navigator’s
ability to locate and move toward a goal location might be affected by his or her ori-
entation within the remembered space. Additionally, experiments presented in
Section 4 suggest that spatial updating occurs with respect to the same reference di-
rections used to organize spatial memories.

3 Sensorimotor Spatial Memory


Whereas long-term representations are suitable for reasoning about inter-object relation-
ships from learned environments, they are, by themselves, insufficient for coordinating
actions within the remembered environment. In order to act on our environments, we
require a body-centered representation of space, rather than the environment-centered
representations characteristic of long-term spatial memories. Indeed, current theories of
spatial memory (e.g., Avraamides & Kelly, 2008; Kelly et al., 2007; Mou et al., 2004;
Rump & McNamara, 2007; Sholl, 2001) typically include something analogous to a
sensorimotor spatial memory system, which represents egocentric locations of objects in
the environment and can be used to negotiate obstacles, intercept moving objects, and
steer a straight course toward a goal. This sensorimotor representation provides privi-
leged access to objects in front of the body, evidenced by the finding that retrieval of
unseen object locations is facilitated for locations in front of the body, compared to
behind (Sholl, 1987). This same pattern also occurs when imagining perspectives within
a remote environment stored in long-term memory (Hintzman et al., 1981; Shelton &
McNamara, 1997; Werner & Schmidt, 1999), where pointing from an imagined per-
spective is facilitated for objects in front of the imagined position, relative to objects
behind, and suggests that the sensorimotor representation might also be used to access
spatial relationships from non-occupied environments (Sholl, 2001). This privileged
access to objects in front is consistent with other front-facing aspects of human sensory
and locomotor abilities, and highlights the importance of locations and events in front of
the body.
Unlike the environment-centered reference frames characteristic of long-term spa-
tial memories, egocentric locations within the sensorimotor representation must be
updated during movement through the environment. Because this updating process is
cognitively demanding, there is a limit to the number of objects that can be updated
successfully (Wang et al., 2006; but see Hodgson & Waller, 2006). Furthermore, self-
motion cues are critical to successful updating, and a large body of work has studied
the effectiveness of various self-motion cues. While updating the location of one or
two objects can be done fairly effectively during physical movements, imagined
movements, which lack the corresponding self-motion cues, are comparatively quite
difficult (Rieser, 1989). In a seminal study on imagined movements, Rieser asked
blindfolded participants to point to remembered object locations after physical rotations or after imagined rotations. Pointing was equally good before and after physical
rotations, indicating the efficiency of updating in the presence of self-motion cues.
However, performance degraded as a function of rotation angle after imagined rota-
tion. According to Presson and Montello (1994; also see Presson, 1987), pointing
judgments from imagined perspectives misaligned with the body are difficult because
of a reference frame conflict between two competing representations of the surround-
ing environment. The remembered environment in relationship to one’s physical loca-
tion and orientation is held in a primary representation (i.e., the sensorimotor
representation), and the same environment relative to one’s imagined location and
orientation is held in a secondary representation. Imagined movement away from
one’s physical location and orientation creates conflict between these two representa-
tions, referred to here as sensorimotor interference (May, 2004). This conflict occurs
when the primary and secondary representations both represent the same environ-
ment, and therefore sensorimotor interference only affects perspective-taking per-
formance when imagining perspectives within the occupied environment, but not
when imagining perspectives from a remote environment.
Much of the research on the sensorimotor representation has been conducted inde-
pendently from research on long-term spatial memory (reviewed above in Section 2).
However, recent experiments indicate that a complete understanding of the sensori-
motor representation must also take into account the organization of long-term spatial
memory. Experiments by Mou et al. (2004) indicate that the interference associated
with imagining a perspective misaligned with the body depends on whether that imag-
ined perspective is aligned with a reference direction in long-term memory. They
found that the sensorimotor interference associated with imagining a perspective
misaligned with the body was larger when the imagined perspective was also mis-
aligned with a reference direction in long-term memory, compared to perspectives
aligned with a reference direction. However, a thorough exploration of this interaction
between sensorimotor interference and reference frames in long-term spatial memory
is still lacking.
As proposed by Mou et al. (2004), the sensorimotor representation is transient, and
decays at retention intervals of less than 10 seconds in the absence of perceptual sup-
port. However, experiments by Kelly et al. (2007) challenge this notion based on the
finding that sensorimotor interference can occur after long delays involving extensive
observer movements. In one experiment using immersive virtual reality, participants
learned a circular layout of objects within a room. Although participants were allowed
unrestricted viewing of the virtual environment, the initially experienced view was
held constant across participants. The objects were removed after learning, and subse-
quent spatial memory retrieval occurred over two blocks of testing. In each block,
participants imagined perspectives within the learned set of objects, and those perspec-
tives could be 1) aligned with the initially experienced view (termed the “original”
perspective), 2) aligned with the participant’s actual body orientation during retrieval
(termed the “sensorimotor aligned” perspective), or 3) misaligned with both the ini-
tially experienced view and the participant’s body orientation (termed the “misaligned”
perspective). Prior to starting the first block of trials, participants walked three meters
into a neighboring virtual room. Perspective taking performance when standing in this
neighboring room was best when imagining the initially experienced perspective
(compare performance on the original perspective with performance on the misaligned
perspective in Figure 2), but there was no advantage for the perspective aligned, com-
pared to misaligned with the body (compare performance on the sensorimotor aligned
perspective with performance on the misaligned perspective). Based on results from
this first test block, the authors concluded that participants’ sensorimotor representa-
tions of the learned objects were purged upon leaving the learning room and replaced
with new sensorimotor representations of the currently occupied environment (i.e., the
room adjacent to the learning room). As such, there was no sensorimotor interference
when imagining the learned layout while standing in the neighboring room. Using
Presson and Montello’s (1994) framework, participants’ primary and secondary spa-
tial representations contained spatial information from separate environments, and
therefore no sensorimotor interference occurred.

[Figure 2: bar graph of response latency (sec) in test Blocks 1 and 2 for the original, sensorimotor aligned, and misaligned imagined perspectives.]
Fig. 2. Results of Kelly, Avraamides and Loomis (2007). Response latency is plotted as a
function of test block and imagined perspective. After learning a layout of objects, participants
walked into a neighboring room and performed Block 1 of the perspective-taking task. Results
indicate that performance was best for the originally experienced perspective during learning,
but was unaffected by the disparity between the orientation of the imagined perspective and
the orientation of the participants’ bodies during testing. After completing Block 1, partici-
pants returned to the empty learning room and performed Block 2. Results indicate that per-
formance was facilitated on the originally experienced perspective, and also on the perspective
aligned with the body during testing.

For the second block of trials, participants returned to the empty learning room (the
learned objects had been removed after learning), and performed the exact same per-
spective-taking task as before. Performance was again facilitated when participants
imagined the initially experienced perspective, but also when they imagined their
actual perspective, compared to performance on the misaligned perspective (see
Figure 2). Despite the fact that participants did not view the learned objects upon
returning to the learning room, their sensorimotor representations of the objects were
reactivated, causing sensorimotor interference when imagining perspectives mis-
aligned with the body. This indicates that walking back into the empty learning room
was sufficient to reinstantiate the sensorimotor representation of the learned objects,
even though several minutes had passed since they were last seen. Renewal of the
sensorimotor representation must have drawn on the long-term representation, be-
cause the objects themselves were not experienced upon returning to the empty
learning room. In sum, Kelly et al.’s experiment suggests that the sensorimotor repre-
sentation is less sensitive to elapsed time than previously thought, and instead is de-
pendent on perceived self-location. The sensorimotor representation appears to be
context dependent, and moving from one room to another changes the context and
therefore also changes the contents of the sensorimotor representation.

4 Spatial Orientation
Staying oriented during movement through a remembered space and reorienting after
becoming lost are critical spatial abilities. With maps and GPS systems, getting lost
on one’s drive home might not present a life or death situation, but the same was not
true for our ancestors, whose navigation abilities were necessary for survival. Accord-
ing to Gallistel (1980; 1990), spatial orientation is achieved, in part, by relating prop-
erties of the perceived environment (i.e., the sensorimotor representation) with those
same properties in the remembered environment (i.e., the long-term representation),
and is also informed by perceived self-position as estimated by integrating self-
motion cues during locomotion, a process known as path integration. The importance
of information from path integration becomes particularly clear when navigating
within an ambiguous environment, such as an empty rectangular room in which two
orientations provide the exact same perspective of the room (e.g., Hermer & Spelke,
1994). In this case, one’s true orientation can only be known by using path integra-
tion to distinguish between the two potentially correct matches between sensorimotor
and long-term representations.
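Path integration can be illustrated as simple dead reckoning over self-motion cues. The sketch below is an illustration of the general idea (not from the chapter), under the idealizing assumption of noise-free estimates of each rotation and translation:

```python
import math

def path_integrate(movements, heading=0.0, position=(0.0, 0.0)):
    """Dead-reckon position and heading from a sequence of
    (rotation_deg, forward_distance) self-motion estimates."""
    x, y = position
    for rotation, distance in movements:
        heading = (heading + rotation) % 360.0           # update facing direction
        x += distance * math.cos(math.radians(heading))  # then translate forward
        y += distance * math.sin(math.radians(heading))
    return (x, y), heading

# Four legs of a square path: the integrator returns (approximately) to the start.
pos, hdg = path_integrate([(0, 5), (90, 5), (90, 5), (90, 5)])
print(pos, hdg)   # ~(0.0, 0.0) and 270.0, up to floating-point error
```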
From time to time, the matching between perceived and remembered environments
can produce grossly incorrect estimates of self-position. Jonsson (2002; also see Gal-
listel, 1980) describes several such experiences. In one case, he describes arriving in
Cologne by train. Because his origin of travel was west of Cologne, he assumed that
the train was facing eastward upon its arrival at Cologne Central Station. The train
had, in fact, traveled past Cologne and turned around to enter the station from the
east, and was therefore facing westward upon its arrival. Jonsson’s initial explora-
tions of the city quickly revealed his orienting error, and he describes the disorienting
experience of rotating his mental representation of the city 180° into alignment with
the visible scene. Experiences such as these are typically preceded by some activity
that disrupts the path integration system (like riding in a subway, or falling asleep on
a train), which would have normally prevented such an enormous error.

4.1 Environmental Cues to Spatial Orientation

Much of the experimental work on the topic of human spatial orientation has focused
on the cues used to reorient after explicit disorientation. In particular, those studies
distinguish between two types of environmental cues to spatial orientation: 1) geometric cues, such as the shape of the room as defined by its extended surfaces, and 2)
featural cues, such as colors, textures, and other salient features that cannot be de-
scribed in purely geometric terms (see Cheng & Newcombe, 2005, for an overview of
the findings in this area). The majority of these experiments employ a task originally
developed by Cheng (1986) to study spatial orientation in rats. Hermer and Spelke
(1994, 1996) adapted Cheng’s task to study reorientation in humans. In the basic
experimental paradigm, participants learn to locate one corner within a rectangular
room, consisting of two long walls and two short walls. Participants are later blind-
folded and disoriented, and are then asked to identify which corner is the learned
corner. When all four room walls are uniformly colored (Hermer & Spelke, 1996),
participants split their responses evenly between the correct corner and the diagonally
opposite corner, both of which share the same ratio of left and right wall lengths and
the same corner angle. Rarely do participants choose one of the geometrically incor-
rect corners, a testament to their sensitivity to environmental geometry and their abil-
ity to reorient using geometric cues. When a featural cue is added by painting one of
the four walls a unique color (Hermer & Spelke, 1996), participants are able to con-
sistently identify the correct corner and no longer choose the diagonally opposite
corner, indicating the influence of featural cues on reorientation.
Recent experiments in our lab have focused on room rotational symmetry as the geometric cue underlying reorientation performance. Rotational sym-
metry is defined as the number of possible orientations of the environment that result
in the exact same perspective. For example, any perspective within a rectangular

[Figure 3 (bar graph): percentage of correct responses (0-100%) by room shape: circular, square, rectangular, trapezoidal.]

Fig. 3. Reorientation performance in four rooms, varying in their rotational symmetry. Partici-
pants learned to identify one of twelve possible object locations, and then attempted to locate
the learned location after disorientation.
room (without featural cues) can be exactly reproduced by rotating the room 180°.
Because there are two orientations that produce the same perspective, the rectangular
room is two-fold rotationally symmetric. A square room is four-fold rotationally
symmetric, and so on. In our experiment, we tested reorientation performance within
environments of 1-fold (trapezoidal), 2-fold (rectangular), 4-fold (square) and ∞-fold
(circular) rotational symmetry. Participants memorized one of twelve possible target
locations within the room, and then attempted to re-locate the target position after
explicit disorientation. Reorientation performance (see Figure 3) was inversely pro-
portional to room rotational symmetry across the range of rotational symmetries
tested. This can be considered an effect of geometric ambiguity, with the greater
ambiguity of the square room compared to the trapezoidal room leading to compara-
tively poorer reorientation performance in the square room. The same analysis can be
applied to featural cues, which have traditionally been operationalized as unambigu-
ous indicators of self-location (e.g., Hermer & Spelke, 1996), but need not be
unambiguous.

4.2 Path Integration

Even in the absence of environmental cues, humans can maintain a sense of spatial
orientation through path integration. Path integration is the process of updating per-
ceived self-location and orientation using internal motion cues such as vestibular and
proprioceptive cues, and external motion cues such as optic flow, and integrating
those motion signals over time to estimate self-location and orientation (for a review,
see Loomis, Klatzky, Golledge & Philbeck, 1999).
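As a concrete illustration of this integration process, the following minimal Python sketch accumulates noisy turn and translation estimates along an outbound path. The function name and noise parameters are illustrative assumptions, not values taken from the studies cited above.

```python
import numpy as np

def path_integrate(segments, turn_noise_sd=0.05, dist_noise_frac=0.05, rng=None):
    """Dead-reckoning estimate of self-position from noisy self-motion cues (sketch).

    segments: list of (turn_angle_rad, distance) pairs describing the outbound
    path. Each rotation and translation is perceived with independent noise,
    so the estimated position and heading drift further from the truth as the
    number of path segments grows.
    """
    rng = np.random.default_rng() if rng is None else rng
    x, y, heading = 0.0, 0.0, 0.0
    for turn, dist in segments:
        heading += turn + rng.normal(0.0, turn_noise_sd)        # noisy rotation estimate
        step = dist * (1.0 + rng.normal(0.0, dist_noise_frac))  # noisy translation estimate
        x += step * np.cos(heading)
        y += step * np.sin(heading)
    return x, y, heading

# Example: a three-segment path; repeated calls show the spread of the estimates.
estimate = path_integrate([(0.0, 2.0), (np.pi / 2, 1.5), (-np.pi / 4, 2.5)])
```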
The path integration process is noisy, and errors accrue with increased walking and
turning. In an experiment by Klatzky et al. (1990), blindfolded participants were led
along an outbound path consisting of one to three path segments, and each segment
was separated by a turn. After reaching the end of the path, participants were first
asked to turn and face the path origin and then to walk to the location of the path
origin. Turning errors and walked-distance errors increased with the number of path
segments, demonstrating that path integration is subject to noise. Errors that accumu-
late during path integration cannot be corrected for without perceptual access to envi-
ronmental features, such as landmarks or geometry.

4.3 Spatial Orientation Using Path Integration and Environmental Cues

Only occasionally are we faced with a pure reorientation task or a pure path integration
task. More commonly, environmental cues and path integration are both available as we
travel through a remembered space. In a recent experiment, we investigated the role of
environmental geometry in spatial orientation when path integration was also available.
Participants performed a spatial updating task, where they learned a location within a
room and attempted to keep track of that location while walking along an outbound path.
At the end of the path they were asked to point to the remembered location. The path was
defined by the experimenter and varied in length from two to six path segments, and
participants actively guided themselves along this path. The task was performed in envi-
ronments of 1-fold (trapezoidal), 2-fold (rectangular), 4-fold (square) and ∞-fold (circu-
lar) rotational symmetry. If rotational symmetry affects spatial updating performance like
it affected reorientation performance (see Section 4.1, above), then performance should
degrade as room shape becomes progressively more ambiguous. The effect of room
rotational symmetry was expected to be particularly noticeable at long path lengths, when
self-position estimates through path integration become especially error-prone (Klatzky
et al., 1990; Rieser & Rider, 1991), and people are likely to become lost and require
reorientation. Contrary to these predictions, spatial updating performance was quite
good, and was unaffected by increasing path length in all three angled environments
(square, rectangular and trapezoidal; see Figure 4). This is in stark contrast to perform-
ance in the circular room, where errors increased with increasing path length. Participants
were certainly using path integration to stay oriented when performing the task. Other-
wise, performance would have been completely predicted by room rotational symmetry
(like the reorientation experiment discussed above in Section 4.1). Participants were also
certainly using room shape cues, when available. Otherwise, pointing errors in all envi-
ronments would have increased with increasing path length, as they did in the circular
room.
To explain these results, we draw on previous work showing that long-term spatial
memories are represented with respect to a small number of reference directions (see
Section 2). Of particular relevance, Mou et al. (2007) showed that reference directions
often correspond to an axis of environmental symmetry. Based on this finding, we
believe that participants in the spatial updating task represented each environment
(including the room itself and the to-be-remembered locations within the room) with
respect to a reference direction, coincident with an environmental symmetry axis.
Perceived self-position was updated with respect to this reference direction (see
Cheng & Gallistel, 2005, for a similar interpretation of experiments on reorientation
by rats). In the circular room, any error in estimating self-position relative to the ref-
erence direction directly resulted in pointing error, because the environment itself
offered no information to help participants constrain their estimates of the orientation
of the reference direction. However, geometric cues in the three angled environments
at least partially defined the reference direction, which we believe corresponded to an
environmental symmetry axis. For example, the square environment defined the
symmetry axis within +/- 45°. If errors in perceived heading ever exceeded this +/-
45° threshold, then participants would have mistaken a neighboring symmetry axis for
the selected reference direction. The rectangular and trapezoidal environments were
even more forgiving, as the environmental geometries defined those symmetry axes
within +/- 90° and +/- 180°, respectively. Furthermore, participants in the angled
environments could use the environmental geometry to reduce heading errors during
locomotion, thereby preventing those errors from exceeding the threshold allowed by
a given rotational symmetry.
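The threshold argument can be made concrete with a small sketch: an n-fold symmetric room reveals heading only modulo 360/n degrees, so a drifting heading estimate is recoverable only if its error stays within +/- 180/n degrees. The function below is a hypothetical illustration of this reasoning, not part of the reported experiments.

```python
import numpy as np

def angular_diff(a, b):
    """Signed smallest difference a - b in degrees, wrapped to (-180, 180]."""
    return (a - b + 180.0) % 360.0 - 180.0

def reorient_with_geometry(true_heading, estimated_heading, fold):
    """Resolve a drifting heading estimate against room geometry.

    An n-fold symmetric room only constrains heading modulo 360/n degrees, so
    the headings consistent with the visible geometry are true_heading + k*360/n.
    The navigator adopts whichever candidate lies closest to its internal
    (path-integration) estimate; recovery succeeds only when the accumulated
    heading error stayed within +/- 180/n degrees.
    """
    period = 360.0 / fold
    candidates = (true_heading + period * np.arange(fold)) % 360.0
    errors = np.abs(angular_diff(candidates, estimated_heading))
    corrected = float(candidates[int(np.argmin(errors))])
    recovered = bool(np.isclose(angular_diff(corrected, true_heading), 0.0))
    return corrected, recovered

# Square room (4-fold): a 30 deg drift is corrected, a 50 deg drift snaps to the wrong axis.
print(reorient_with_geometry(0.0, 30.0, fold=4))  # corrected towards 0 deg, recovered
print(reorient_with_geometry(0.0, 50.0, fold=4))  # snaps to the 90 deg axis, not recovered
```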
The experiments described in this section demonstrate how ambiguous environ-
mental cues and noisy self-motion cues can be combined to allow for successful
spatial orientation. During normal navigation, we typically have information from
multiple sources, all of which may be imperfect indicators of self-position. By com-
bining those information sources, we can stay oriented with respect to the remem-
bered environment, a crucial step toward successful navigation.
[Figure 4 (line graph): absolute pointing error (deg) as a function of path segments (2, 4, 6) for circular, square, rectangular, and trapezoidal rooms.]
Fig. 4. Pointing error in a spatial updating task as a function of walked path length, plotted
separately for the four different surrounding environments. Pointing errors increased with
increased walking distance in the round room. In comparison, performance was unaffected by
path length in the square, rectangular, and trapezoidal rooms.

5 Summary and Conclusions


Although sensorimotor and long-term spatial memories have traditionally been re-
searched separately, the current overview indicates that a complete description of
navigation will depend on a better understanding of how these spatial representations
are coordinated to achieve an accurate sense of spatial orientation. This chapter has
reviewed the evidence that long-term spatial memories are orientation-dependent, and
that the selection of reference directions depends on egocentric experiences within the
environment as well as environmentally defined structures, such as intrinsic and ex-
trinsic axes. Environmental symmetry axes are particularly salient cues shown to
influence reference frame selection (Mou et al., 2007). Furthermore, the sensorimotor
representation can access this long-term representation under certain circumstances.
In the experiment by Kelly et al. (2007), the sensorimotor representation of objects
from a previously experienced environment could be reified even though participants
never actually viewed the represented objects again. The environmental context al-
lowed participants to retrieve object locations from long-term memory and rebuild
their sensorimotor representations of those retrieved objects. Building up the sensori-
motor representation through retrieval of information stored in long-term memory is
necessary when navigating toward unseen goal locations. Furthermore, the sensorimo-
tor representation is likely to be partially responsible for generating and adding to the
long-term representation. By keeping track of one’s movements through a new envi-
ronment, new objects contained in the sensorimotor representation (i.e., novel objects
in the visual field) can be added to the long-term, environment-centered spatial mem-
ory. However, the nature of these interactions between long-term and sensorimotor
spatial memories remains poorly understood, and warrants further research.
The experiments on spatial orientation presented in Section 4 represent a step to-
ward understanding this interaction between sensorimotor and long-term representa-
tions. Participants in those experiments are believed to have monitored self-position
and orientation relative to the reference direction used to structure the long-term
memory of the environment, and the selected reference direction most likely corre-
sponded to an axis of environmental symmetry. Path integration helped participants
keep track of the selected reference direction and avoid confusion with neighboring
symmetry axes. This conclusion underscores the importance of the reference direc-
tions used in long-term memory, not just for retrieving inter-object relationships, but
also for staying oriented within remembered spaces and updating those spaces during
self-motion. A more complete understanding of spatial orientation should be in-
formed by further studies of the interaction between long-term spatial memory, sen-
sorimotor spatial memory, and path integration.

References
1. Avraamides, M.N., Kelly, J.W.: Imagined perspective-changing within and across novel
environments. In: Freksa, C., Nebel, B., Knauff, M., Krieg-Brückner, B. (eds.) Spatial
Cognition IV. LNCS (LNAI), pp. 245–258. Springer, Berlin (2005)
2. Avraamides, M.N., Kelly, J.W.: Multiple systems of spatial memory and action. Cognitive
Processing 9, 93–106 (2008)
3. Bridgeman, B., Lewis, S., Heit, G., Nagle, M.: Relation between cognitive and motor-
oriented systems of visual position perception. Journal of Experimental Psychology: Hu-
man Perception and Performance 5, 692–700 (1979)
4. Bülthoff, H.H., Edelman, S.: Psychophysical support for a two-dimensional view interpo-
lation theory of object recognition. Proceedings of the National Academy of Sci-
ences 89(1), 60–64 (1992)
5. Cheng, K.: A purely geometric module in the rat’s spatial representation. Cognition 23,
149–178 (1986)
6. Cheng, K., Gallistel, C.R.: Shape parameters explain data from spatial transformations:
Comment on Pearce et al. (2004) and Tommasi and Polli (2004). Journal of Experimental
Psychology: Animal Behavior Processes 31(2), 254–259 (2005)
7. Cheng, K., Newcombe, N.S.: Is there a geometric module for spatial orientation? Squaring
theory and evidence. Psychonomic Bulletin & Review 12(1), 1–23 (2005)
8. Diwadkar, V.A., McNamara, T.P.: Viewpoint dependence in scene recognition. Psycho-
logical Science 8(4), 302–307 (1997)
9. Gallistel, C.R.: The Organization of Action: A New Synthesis. Erlbaum, Hillsdale (1980)
10. Gallistel, C.R.: The Organization of Learning. MIT Press, Cambridge (1990)
11. Hermer, L., Spelke, E.S.: A geometric process for spatial reorientation in young children.
Nature 370, 57–59 (1994)
12. Hermer, L., Spelke, E.S.: Modularity and development: The case of spatial reorientation.
Cognition 61(3), 195–232 (1996)
13. Hintzman, D.L., O’Dell, C.S., Arndt, D.R.: Orientation in cognitive maps. Cognitive Psy-
chology 13, 149–206 (1981)
14. Hodgson, E., Waller, D.: Lack of set size effects in spatial updating: Evidence for offline
updating. Journal of Experimental Psychology: Learning, Memory, & Cognition 32, 854–
866 (2006)
15. Jonsson, E.: Inner Navigation: Why we Get Lost in the World and How we Find Our Way.
Scribner, New York (2002)
16. Kelly, J.W., Avraamides, M.N., Loomis, J.M.: Sensorimotor alignment effects in the learn-
ing environment and in novel environments. Journal of Experimental Psychology: Learn-
ing, Memory & Cognition 33(6), 1092–1107 (2007)
17. Kelly, J.W., McNamara, T.P.: Spatial memories of virtual environments: How egocentric
experience, intrinsic structure, and extrinsic structure interact. Psychonomic Bulletin &
Review 15(2), 322–327 (2008)
18. Klatzky, R.L.: Allocentric and egocentric spatial representations: Definitions, distinctions,
and interconnections. In: Freksa, C., Habel, C., Wender, K.F. (eds.) Spatial Cognition, pp.
1–17. Springer, Berlin (1998)
19. Klatzky, R.L., Loomis, J.M., Golledge, R.G., Cicinelli, J.G., Doherty, S., Pellegrino, J.W.:
Acquisition of route and survey knowledge in the absence of vision. Journal of Motor Be-
havior 22(1), 19–43 (1990)
20. Loomis, J.M., Klatzky, R.L., Golledge, R.G., Philbeck, J.W.: Human navigation by path
integration. In: Golledge, R.G. (ed.) Wayfinding: Cognitive mapping and other spatial
processes, pp. 125–151. Johns Hopkins, Baltimore (1999)
21. May, M.: Imaginal perspective switches in remembered environments: Transformation
versus interference accounts. Cognitive Psychology 48, 163–206 (2004)
22. McNamara, T.P.: How are the locations of objects in the environment represented in mem-
ory? In: Freksa, C., Brauer, W., Habel, C., Wender, K.F. (eds.) Spatial cognition III. LNCS
(LNAI), pp. 174–191. Springer, Berlin (2003)
23. McNamara, T.P., Rump, B., Werner, S.: Egocentric and geocentric frames of reference in
memory of large-scale space. Psychonomic Bulletin & Review 10(3), 589–595 (2003)
24. Milner, A.D., Goodale, M.A.: The visual brain in action. Oxford University Press, Oxford
(1995)
25. Montello, D.R.: Spatial orientation and the angularity of urban routes: A field study. Envi-
ronment and Behavior 23(1), 47–69 (1991)
26. Mou, W., McNamara, T.P.: Intrinsic frames of reference in spatial memory. Journal of Ex-
perimental Psychology: Learning, Memory, and Cognition 28(1), 162–170 (2002)
27. Mou, W., McNamara, T.P., Valiquette, C.M., Rump, B.: Allocentric and egocentric updat-
ing of spatial memories. Journal of Experimental Psychology: Learning, Memory, and
Cognition 30(1), 142–157 (2004)
28. Mou, W., Zhao, M., McNamara, T.P.: Layout geometry in the selection of intrinsic frames
of reference from multiple viewpoints. Journal of Experimental Psychology: Learning,
Memory, and Cognition 33, 145–154 (2007)
29. Presson, C.C.: The development of spatial cognition: Secondary uses of spatial informa-
tion. In: Eisenberg, N. (ed.) Contemporary Topics in Developmental Psychology, pp. 87–
112. Wiley, New York (1987)
30. Presson, C.C., Montello, D.R.: Updating after rotational and translational body move-
ments: Coordinate structure of perspective space. Perception 23, 1447–1455 (1994)
31. Rieser, J.J.: Access to knowledge of spatial structure at novel points of observation. Jour-
nal of Experimental Psychology: Learning, Memory, and Cognition 15(6), 1157–1165
(1989)
32. Rieser, J.J., Rider, E.A.: Young children’s spatial orientation with respect to multiple tar-
gets when walking without vision. Developmental Psychology 27(1), 97–107 (1991)
33. Roskos-Ewoldsen, B., McNamara, T.P., Shelton, A.L., Carr, W.: Mental representations of
large and small spatial layouts are orientation dependent. Journal of Experimental Psy-
chology: Learning, Memory, and Cognition 24(1), 215–226 (1999)
34. Rump, B., McNamara, T.P.: Updating in models of spatial memory. In: Barkowsky, T.,
Knauff, M., Montello, D.R. (eds.) Spatial cognition V. LNCS (LNAI), pp. 249–269.
Springer, Berlin (2007)
35. Schneider, G.E.: Two visual systems. Science 163, 895–902 (1969)
36. Shelton, A.L., McNamara, T.P.: Multiple views of spatial memory. Psychonomic Bulletin
& Review 4(1), 102–106 (1997)
37. Shelton, A.L., McNamara, T.P.: Systems of spatial reference in human memory. Cognitive
Psychology 43(4), 274–310 (2001)
38. Shelton, A.L., McNamara, T.P.: Orientation and perspective dependence in route and sur-
vey learning. Journal of Experimental Psychology: Learning, Memory, and Cognition 30,
158–170 (2004)
39. Sholl, M.J.: Cognitive maps as orienting schemata. Journal of Experimental Psychology:
Learning, Memory, and Cognition 13(4), 615–628 (1987)
40. Sholl, M.J.: The role of a self-reference system in spatial navigation. In: Montello, D. (ed.)
Spatial information theory: Foundations of geographic information science, pp. 217–232.
Springer, Berlin (2001)
41. Valiquette, C., McNamara, T.P.: Different mental representations for place recognition and
goal localization. Psychonomic Bulletin & Review 14(4), 676–680 (2007)
42. Waller, D., Hodgson, E.: Transient and enduring spatial representations under disorienta-
tion and self-rotation. Journal of Experimental Psychology: Learning, Memory, & Cogni-
tion 32, 867–882 (2006)
43. Wang, R.F., Crowell, J.A., Simons, D.J., Irwin, D.E., Kramer, A.F., Ambinder, M.S.,
Thomas, L.E., Gosney, J.L., Levinthal, B.R., Hsieh, B.B.: Spatial updating relies on an
egocentric representation of space: Effects of the number of objects. Psychonomic Bulletin
& Review 13, 281–286 (2006)
44. Wang, R.F., Spelke, E.S.: Updating egocentric representations in human navigation. Cog-
nition 77, 215–250 (2000)
45. Werner, S., Schmidt, K.: Environmental reference systems for large-scale spaces. Spatial
Cognition and Computation 1(4), 447–473 (1999)
Map-Based Spatial Navigation:
A Cortical Column Model for Action Planning

Louis-Emmanuel Martinet^{1,2,3}, Jean-Baptiste Passot^{2,3}, Benjamin Fouque^{1,2,3}, Jean-Arcady Meyer^{1}, and Angelo Arleo^{2,3}

1 UPMC Univ Paris 6, FRE2507, ISIR, F-75016, Paris, France
2 UPMC Univ Paris 6, UMR 7102, F-75005, Paris, France
3 CNRS, UMR 7102, F-75005, Paris, France
louis-emmanuel.martinet@upmc.fr

Abstract. We modelled the cortical columnar organisation to design a neuromimetic architecture for topological spatial learning and action
planning. Here, we first introduce the biological constraints and the hy-
potheses upon which our model was based. Then, we describe the learn-
ing architecture, and we provide a series of numerical simulation results.
The system was validated on a classical spatial learning task, the Tolman
& Honzik’s detour protocol, which enabled us to assess the ability of the
model to build topological representations suitable for spatial planning,
and to use them to perform flexible goal-directed behaviour (e.g., to pre-
dict the outcome of alternative trajectories avoiding dynamically blocked
pathways). We show that the model reproduced the navigation perfor-
mance of rodents in terms of goal-directed path selection. In addition,
we present a series of statistical and information theoretic analyses to
study the neural coding properties of the learnt space representations.

Keywords: spatial navigation, topological map, trajectory planning, cortical column, hippocampal formation.

1 Introduction
Spatial cognition calls upon the ability to learn neural representations of the
spatio-temporal properties of the environment, and to employ them to achieve
goal-oriented navigation. Similar to other high-level functions, spatial cognition
involves parallel information processing mediated by a network of brain struc-
tures that interact to promote effective spatial behaviour [1,2]. An extensive
body of experimental work has investigated the neural bases of spatial cogni-
tion, and a significant amount of evidence points towards a prominent role of
the hippocampal formation (see [1] for recent reviews). This limbic region has
been thought to mediate spatial learning functions ever since location-selective
neurones — namely hippocampal place cells [3], and entorhinal grid cells [4] —
and orientation-selective neurones — namely head-direction cells [5] — were
found by means of electrophysiological recordings from freely moving rats.
Hippocampal place cells, grid cells, and head-direction cells are likely to sub-
serve spatial representations in allocentric (i.e., world centred) coordinates, thus

providing cognitive maps [3] to support spatial behaviour. Yet, to perform flexible
navigation (i.e., to plan detours and/or shortcuts) two other components are nec-
essary: goal representation, and target-dependent action sequence planning [6].
The role of the hippocampal formation in these two mechanisms remains unclear.
On the one hand, the hippocampus has been proposed to encode topological-like
representations suitable for action sequence learning [6]. This hypothesis mainly
relies on the recurrent dynamics generated by the CA3 collaterals of the hip-
pocampus [7]. On the other hand, the hippocampal space code is likely to be
highly redundant and distributed [8], which does not seem adequate for learning
compact topological representations of high-dimensional spatial contexts. Also,
the experimental evidence for high-level spatial representations mediated by a
network of neocortical areas (e.g., the posterior parietal cortex [9], and the pre-
frontal cortex [10]) suggests the existence of an extra-hippocampal action plan-
ning system shared among multiple brain regions [11]. This hypothesis postulates
a distributed spatial cognition system in which (i) the hippocampus would take
part in the action planning process by conveying redundant (and robust) spa-
tial representations to higher associative areas, (ii) a cortical network would
elaborate more abstract and compact representations of the spatial context (ac-
counting for motivation-dependent memories, action cost/risk constraints, and
temporal sequences of goal-directed behavioural responses). Among the corti-
cal areas involved in map building and action planning, the prefrontal cortex
(PFC) may play a central role, as suggested by anatomical PFC lesion studies
showing impaired navigation planning in rats [12]. Also, the anatomo-functional
properties of the PFC seem appropriate to encode abstract contextual memo-
ries not merely based on spatial correlates. The PFC receives direct projections
from sub-cortical structures (e.g., the hippocampus [13], the amygdala [14], and
the ventral tegmental area [15]), and indirect connections from the basal ganglia
through the basal ganglia - thalamocortical loops [16]. These projections provide
the PFC with a multidimensional context, including emotional and motivational
inputs [17], reward-dependent modulation [18], and action-related signals [16].
The PFC thus seems well suited to (i) process manifold spatial information [19], (ii) encode the motivational values associated with spatio-temporal events [6], and (iii) perform supra-modal decisions [20]. Also, the PFC may be involved
in integrating events in the temporal domain at multiple time scales [21]. The
PFC recurrent dynamics regulated by the modulatory action of dopaminergic
afferents [22] may make it possible to maintain patterns of activity over long time scales.
Finally, the PFC is likely to be critical to detecting cross-temporal contingencies,
which is relevant to the temporal organisation of behavioural responses, and to
the encoding of retrospective and prospective memories [21].

1.1 Cortical Columnar Organisation: A Computational Principle?


The existence of cortical columns was first reported by Mountcastle [23], who
observed chains of cortical neurones reacting to the same external stimuli simul-
taneously. Cortical columns can be divided into six main layers, including: layer I,
mostly containing axons and dendrites; layer IV, receiving sensory inputs from
sub-cortical structures (mainly the thalamus); and layer VI, sending outputs to
sub-cortical brain areas (e.g., to the striatum and the thalamus). Layers II-III
and V-VI constitute the so-called supragranular and infragranular layers, respec-
tively. The anatomo-functional properties of cortical columns have been widely
investigated [24]. Neuroanatomical findings have indicated that columns can be
divided into several minicolumns, each of which is composed of a population
of interconnected neurones [25]. Thus, a column can be seen as an ensemble of
interrelated minicolumns receiving inputs from cortical areas and other struc-
tures. It processes these afferent signals and projects the responses both within
and outside the cortical network. This twofold columnar organisation has been
suggested to subserve efficient computation and information processing [24].

1.2 Related Work


This paper presents a neuromimetic model of action planning inspired by the
columnar organisation of the mammalian neocortex. Planning is defined here as
the ability, given a state space S and an action space A, to “mentally” explore
the S × A space to infer an appropriate sequence of actions leading to a goal
state sg ∈ S. This definition calls upon the capability of (i) predicting the
consequences of actions, i.e. the most likely state s' ∈ S to be reached when an action a ∈ A is executed from a state s ∈ S, and (ii) evaluating the effectiveness
of the selected plan on-line. The model generates a topological representation
of the environment, and it employs an activation-diffusion mechanism [26] to
plan goal-directed trajectories. The activation-diffusion process is based on the
propagation of a reward-dependent activity signal from the goal state sg through
the entire topological network. This propagation process enables the system to
generate action sequences (i.e., trajectories) from the current state s towards sg .
Topological map learning and path planning have been extensively studied in
biomimetic robotics (see [27] for a review). Here we focus on model architectures
that take inspiration from the anatomical organisation of the cortex, and imple-
ment an activation-diffusion planning principle. Burnod [28] proposed one of the
first models of the cortical column architecture, called “cortical automaton”. He
also described a “call tree” process that can be seen as a neuromimetic imple-
mentation of the activation-diffusion principle. Several action selection models
were inspired by Burnod’s hypothesis. Some of these works employed the cor-
tical automaton concept explicitly [29,30,31]. Others used either connectionist
architectures [32,33,34] or Markov decision processes [35]. Yet, none of these
works took into account the multilevel coding property offered by the possibility of refining the cortical organisation by adding a sublevel to the column, i.e. the minicolumn. The topological representation presented here exploits this idea by
associating the columnar level to a compact representation of the environment,
and by employing the minicolumn level to characterise the agent behaviour. In
order to validate the model, we have implemented it on a simulated robotic
platform, and tested it on the classical Tolman & Honzik’s navigation task [36].
This protocol allowed us to assess the ability of the system to learn topological
representations, and to exploit them to perform flexible goal-directed behaviour (e.g., planning detours).

2 Methods
2.1 Single Neurone Model
The elementary computational units of the model are artificial firing-rate neu-
rones i, whose mean discharge ri ∈ [0, 1] is given by
 
r_i(t) = f( V_i(t) · (1 ± η) ) .    (1)

where Vi (t) is the membrane potential at time t, f is the transfer function, and
η is a random noise uniformly drawn from [0, 0.01]. Vi varies according to

τ_i · dV_i(t)/dt = −V_i(t) + I_i(t) .    (2)
where τi = 10 ms is the membrane time constant, and Ii (t) is the synaptic drive
generated by all the inputs. Eq. 2 is integrated by using a time step Δt = 1 ms.
Both the synaptic drive Ii (t) and the transfer function f are characteristic of
the different types of model units, and they will be defined thereafter.
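For concreteness, Eqs. 1-2 can be integrated with a simple Euler scheme. The sketch below is illustrative only; in particular, the way the ± noise term is applied (a random sign on each step) is our assumption.

```python
import numpy as np

def step_rate_neuron(V, I, f, tau=10.0, dt=1.0, rng=None):
    """One Euler step of the firing-rate unit of Eqs. 1-2 (illustrative sketch).

    V: membrane potential, I: total synaptic drive I_i(t), f: transfer function
    (identity for SL units, sigmoidal for IL units); tau and dt are in ms.
    """
    rng = np.random.default_rng() if rng is None else rng
    V = V + (dt / tau) * (-V + I)              # tau * dV/dt = -V + I   (Eq. 2)
    eta = rng.uniform(0.0, 0.01)               # noise amplitude, eta in [0, 0.01]
    sign = rng.choice([-1.0, 1.0])             # assumed reading of the "+/-" in Eq. 1
    r = f(V * (1.0 + sign * eta))              # r = f(V * (1 +/- eta))  (Eq. 1)
    return V, float(np.clip(r, 0.0, 1.0))

# With an identity transfer function and constant drive, V relaxes towards I.
V, r = 0.0, 0.0
for _ in range(50):
    V, r = step_rate_neuron(V, I=0.8, f=lambda x: x)
```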

2.2 Encoding Space and Actions: Minicolumn and Column Model


The main inputs to the cortical model are the location- and orientation-selective
activities of hippocampal place and head-direction cells, respectively [3,5]. The
hippocampal place field representation is built incrementally as the simulated
animal (i.e., the animat) explores the environment, and it provides the system
with a continuous distributed and redundant state representation S [37,38]. A
major objective of the cortical model was to build a compact state-action rep-
resentation S × A suitable for topological map learning and action planning.
In the model, the basic component of the columnar organisation is the minicol-
umn (vertical grey regions in Fig. 1). An unsupervised learning scheme (Sec. 2.3)
makes the activity of each minicolumn selective to a specific state-action pair
(s, a) ∈ S × A. Notice that a given action a ∈ A represents the allocentric motion
direction of the animat when it performs the transition between two locations
s, s ∈ S. According to the learning algorithm, all the minicolumns selective for
the same spatial location s ∈ S are grouped to form a higher-level computational
unit, i.e. the column (see c and c in Fig. 1A). This architecture is inspired by
biological data showing that minicolumns inside a column have similar selec-
tivity properties [39]. Thus, columns consist of a set of minicolumns that are
incrementally recruited to encode all the state-action pairs (s, a1···N ) ∈ S × A
experienced by the animat at a location s. During planning (Sec. 2.4), all the
minicolumns of a column compete with each other to locally infer the most
appropriate goal-directed action.
[Figure 1 (schematic): (A) columns c and c' composed of minicolumns with supragranular (SL) and infragranular (IL) units, their connections w^m, w^u, w^c, w^l, w^h, and the distributed state-space representation (place cells and head-direction cells); (B) back-propagation of the goal signal (top) and propagation of the path signal (bottom) between the goal column and the current-position column.]

Fig. 1. The cortical model and the implementation of the activation-diffusion process.
(A) Columns (c and c') consist of sets of minicolumns (vertical grey regions), each of
which contains a supragranular (SL) and an infragranular (IL) layer unit. (B) Top:
back-propagation of the motivational signal through the network of SL neurones. Bot-
tom: forward-propagation of the goal-directed action signal through the IL neurones.

Every minicolumn of the model consists of two computational units, representing supragranular layer (SL) and infragranular layer (IL) neurones (Fig. 1A).
The discharge of SL and IL units simulates the mean firing activity of a popula-
tion of cortical neurones in layers II-III, and V-VI, respectively. Each minicolumn
receives three different sets of afferent projections (Fig. 1A): (i) Hippocampal in-
puts conveying allocentric space coding signals converge onto IL neurones; these
connections are plastic, and their synaptic efficacy is determined by the weight distribution w^h (all the synaptic weights of the model lie within [0, 1]). (ii) Collateral afferents from adjacent cortical columns converge onto SL and IL neurones via the projections w^u and w^l, respectively. These lateral connections are learnt incrementally (Sec. 2.3), and play a prominent role in both encoding the environment topology and implementing the activation-diffusion planning mechanism. (iii) SL neurones receive projections w^m convey-
ing motivation-dependent signals. As shown in Sec. 2.4, this input is employed
to relate the activity of a minicolumn to goal locations.
SL neurones discharge as a function of the motivational signals mediated by the w^u and w^m projections. The synaptic drive I_i(t) depolarising an SL neurone i that belongs to a column c is given by:

I_i(t) = max_{i'∈c'≠c} ( w^u_{ii'} · r_{i'}(t) ) + w^m_i · r^m .    (3)

where i' indexes other SL neurones of the cortical network; w^m_i and r^m are the weight and the intensity of the motivational signal, respectively. In the current
version of the model, the motivational input is generated algorithmically, i.e. w^m_i = 1 if column c is associated with the goal location and w^m_i = 0 otherwise, and the motivational signal r^m = 1. The membrane potential of unit i is then computed according to Eq. 2, and its firing rate r_i(t) is obtained by means of an identity transfer function f.
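A minimal sketch of the SL synaptic drive of Eq. 3, assuming the lateral inputs are given as dictionaries keyed by presynaptic unit (the function name and data layout are illustrative):

```python
def sl_drive(w_u, r_sl, w_m, r_m=1.0):
    """Synaptic drive of a supragranular (SL) unit, after Eq. 3 (sketch).

    w_u:  dict {SL unit of another column: lateral weight w^u_{ii'}}
    r_sl: dict {same units: firing rate r_{i'}(t)}
    w_m:  motivational weight (1 for the goal column, 0 otherwise); r_m: motivational signal.
    """
    lateral = max((w_u[i] * r_sl[i] for i in w_u), default=0.0)
    return lateral + w_m * r_m
```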
Within each minicolumn, SL neurones project onto IL units by means of non-plastic projections w^c (Fig. 1A). Thus, IL neurones are driven by hippocampal place (HP) cells h (via the projections w^h), IL neurones belonging to adjacent columns (via the collaterals w^l), and SL units i (via w^c). The synaptic drive of an IL neurone j ∈ c is:

I_j(t) = max( max_{h∈HP} ( w^h_{jh} · r_h(t) ) , max_{j'∈c'≠c} ( w^l_{jj'} · r_{j'}(t) ) ) + w^c_{ji} · r_i(t) .    (4)

where j' indicates other IL neurones of the network; w^c_{ji} = 1 if the SL neurone i and the IL neurone j belong to the same minicolumn, and w^c_{ji} = 0 otherwise.
The membrane potential Vj (t) is computed by Eq. 2, and a sigmoidal transfer
function f is employed to calculate rj (t). The parameters of the transfer function
change online to adapt the electroresponsiveness properties of IL neurones j to
the strength of their inputs [40].
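Analogously, Eq. 4 can be sketched as follows; the grouping of the hippocampal and lateral terms under a single max reflects our reading of the equation and of the coincidence-detection description in Sec. 2.4.

```python
def il_drive(w_h, r_place, w_l, r_il, w_c, r_sl):
    """Synaptic drive of an infragranular (IL) unit, after Eq. 4 (sketch).

    w_h / r_place: hippocampal place-cell weights and rates (w^h_{jh}, r_h),
    w_l / r_il:    lateral weights and rates of IL units in adjacent columns,
    w_c:           within-minicolumn weight (1); r_sl: rate of the SL unit above.
    """
    hippocampal = max((w_h[h] * r_place[h] for h in w_h), default=0.0)
    lateral = max((w_l[j] * r_il[j] for j in w_l), default=0.0)
    # Either input class can drive the unit; the SL term gates firing during planning.
    return max(hippocampal, lateral) + w_c * r_sl
```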

2.3 Unsupervised Growing Network Scheme for Topological Map Learning
The topological representation is built incrementally as the animat explores the
environment. At each location visited by the agent at time t the cortical network
is updated if-and-only-if the infragranular layers of all existing minicolumns remain silent, i.e. Σ_j H(r_j(t) − ρ) = 0, where j indexes all the IL neurones, H is the Heaviside function (i.e., H(x) = 1 if x ≥ 0, H(x) = 0 otherwise), and ρ = 0.1 (see [38] for a similar algorithmic implementation of novelty detection in the hippocampal activity space). If at time t the novelty condition holds, a new group of minicolumns (i.e., a new column c) is recruited to become selective to the new place. Then, all the simultaneously active place cells h ∈ HP are connected to the new IL units j ∈ c. Weights w^h_{jh} are initialised according to

w^h_{jh} = H(r_h − ρ) · r_h .    (5)

For t' > t, the synaptic strength of these connections is changed by unsupervised Hebbian learning combined with a winner-take-all scheme. Let c be the column selective for the position visited by the animat at time t', i.e. let all the j ∈ c be the most active IL units of the network at time t'. Then:

Δw^h_{jh} = α · r_h · (r_j − w^h_{jh}) .    (6)

with α = 0.005. Whenever a state transition occurs, the collateral projections w^l and w^u are updated to relate the minicolumn activity to the state-action space S × A. For instance, let columns c and c' denote the animat position
before and after a state transition, respectively (Fig. 1A). A minicolumn θ ∈ c becomes selective for the locomotion orientation taken by the animat to perform the transition. A new set of projections w^l_{j'j} is then established from the IL unit j ∈ θ of column c to all the IL units j' of the column c'. In addition, at the supragranular level, a new set of connections w^u_{ii'} is learnt to connect all the SL units of column c', i.e. i' ∈ c', to the SL unit i of the minicolumn θ ∈ c. The strengths of the lateral projections are initialised as:

w^l_{j'j} = w^u_{ii'} = β_LTP   ∀ i', j' ∈ c' .    (7)

with β_LTP = 0.9. Finally, in order to adapt the topological representation online, a synaptic potentiation-depression mechanism can modify the lateral projections w^l and w^u. For example, if a new obstacle prevents the animat from achieving a previously learnt transition from column c to c' (i.e., if the activation of the IL unit j ∈ θ ∈ c is not followed in time by the activation of all IL units j' ∈ c'), then a depression of the w^l_{j'j} synaptic efficacy occurs:

Δw^l_{j'j} = −β_LTD · w^l_{j'j}   ∀ j' ∈ c' .    (8)

where β_LTD = 0.5. The projections w^u_{ii'} are updated in a similar manner. A compensatory potentiation mechanism reinforces both w^l and w^u connections whenever a previously experienced transition is performed successfully:

Δw^l_{j'j} = β_LTP − w^l_{j'j}   ∀ j' ∈ c' .    (9)

The w^u_{ii'} are updated similarly. Notice that w^l, w^u ∈ [0, β_LTP].

2.4 Action Planning


The model presented here aims at developing a high-level controller determin-
ing the spatial behaviour based on action planning. Yet, a low-level reactive
module subserves the obstacle-avoidance behaviour. Whenever the proximity
sensors detect an obstacle, the reactive module takes control and prevents col-
lisions. Also, the simulated animal behaves in order to either follow planned
pathways (i.e., exploitation) or improve the topological map (i.e., exploration).
This exploitation-exploration tradeoff is governed by an ε-greedy selection mechanism, with ε ∈ [0, 1] decreasing exponentially over time [38].
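A sketch of this arbitration rule, with illustrative (not reported) values for the initial ε and its decay rate:

```python
import numpy as np

def select_action(planned_action, all_actions, trial, eps0=1.0, decay=0.01, rng=None):
    """Epsilon-greedy arbitration between exploitation and exploration (sketch).

    With probability eps (decaying exponentially with the trial index) a random
    action is taken to improve the topological map; otherwise the planned,
    goal-directed action is executed. eps0 and decay are illustrative values.
    """
    rng = np.random.default_rng() if rng is None else rng
    eps = eps0 * np.exp(-decay * trial)
    if rng.random() < eps:
        return all_actions[rng.integers(len(all_actions))]   # explore
    return planned_action                                     # exploit
```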
Fig. 1B shows an example of the activation-diffusion process mediated by the columnar network. During trajectory planning, the SL neurones of the column corresponding to the goal location s_g are activated via a motivational signal r^m (Eq. 3). Then, the SL activity is back-propagated through the network by means of the lateral projections w^u (Fig. 1B, top). During planning, the responsiveness of IL neurones (Eq. 4) is decreased to detect coincident inputs. In particular, the occurrence of the SL input r_i is a necessary condition for an IL neurone j to fire. In the presence of the SL input r_i, either the hippocampal signal r_h or the inter-column signal r_{j'} is sufficient to activate the IL unit j. When the back-propagated
[Figure 2 (schematic): (A) the maze with starting place, Paths 1-3 (P1, P2, P3), gate, blocks A and B, and the food box at the goal; (B) the simulated maze and robot.]
(A) (B)

Fig. 2. (A) Tolman & Honzik’s maze (adapted from [36]). The gate near the second
intersection prevented rats from going from right to left. (B) The simulated maze
and robot. The dimensions of the simulated maze were taken so as to maintain the
proportions of the Tolman & Honzik’s setup. Bottom-left inset: the real e-puck mobile
robot has a diameter of 70 mm and is 55 mm tall.

goal signal reaches the minicolumns selective for the current position s, this coincidence event occurs, which triggers the forward propagation of a goal-directed path signal through the projections w^l (Fig. 1B, bottom). Goal-directed trajectories are generated by reading out the successive activations of IL neurones. Action selection calls upon a competition between the minicolumns encoding the (s, a_{1···N}) ∈ S × A pairs, where s is the current location, and a_{1···N} are the transitions from s to adjacent positions s'. For the sake of robustness, competition occurs over a 10-timestep cycle. Notice that each SL synaptic relay attenuates the goal signal by a factor w^u_{ii'} (Eq. 3). Thus, the smaller the number of synaptic relays, the stronger the goal signal received by the SL neurone corresponding to the current location s. As a consequence, because the model column receptive fields are distributed rather uniformly over the environment, the intensity of the goal signal at a given location s decreases with the distance between s and the target position s_g.
target position sg .
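The activation-diffusion process can be approximated, at the level of the learnt topological map, by a best-first propagation in which the goal signal is attenuated by the lateral weight at every relay, followed by a local competition for the neighbour carrying the strongest signal. The following sketch is a functional abstraction of that mechanism under those assumptions, not the neural implementation described above.

```python
import heapq

def diffuse_goal_signal(adjacency, goal, signal_at_goal=1.0):
    """Back-propagate an attenuating goal signal over the topological map (sketch).

    adjacency[c] is a dict {neighbour: w_u} of lateral weights (all < 1), so the
    signal reaching a column is the product of the weights along the best path:
    fewer synaptic relays -> stronger signal. Implemented as a Dijkstra-like
    best-first propagation.
    """
    signal = {goal: signal_at_goal}
    frontier = [(-signal_at_goal, goal)]
    while frontier:
        neg_s, c = heapq.heappop(frontier)
        if -neg_s < signal.get(c, 0.0):
            continue                                  # stale entry
        for nb, w_u in adjacency.get(c, {}).items():
            s = -neg_s * w_u
            if s > signal.get(nb, 0.0):
                signal[nb] = s
                heapq.heappush(frontier, (-s, nb))
    return signal

def next_action(adjacency, signal, current):
    """Minicolumn competition: move to the neighbour with the strongest goal signal."""
    return max(adjacency[current], key=lambda nb: signal.get(nb, 0.0))

# Toy map: start-a-b-goal plus a longer detour start-d-e-f-goal.
topo = {"start": {"a": 0.9, "d": 0.9}, "a": {"b": 0.9}, "b": {"goal": 0.9},
        "d": {"e": 0.9}, "e": {"f": 0.9}, "f": {"goal": 0.9}, "goal": {}}
for c, nbs in list(topo.items()):                     # make links symmetric
    for nb, w in nbs.items():
        topo.setdefault(nb, {}).setdefault(c, w)
sig = diffuse_goal_signal(topo, "goal")
print(next_action(topo, sig, "start"))                # "a": the shorter path wins
```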

2.5 Behavioural Task and Simulated Agent


In order to validate our navigation planning system, we chose the classical ex-
perimental task proposed by Tolman & Honzik [36]. The main objective of this
behavioural protocol was to demonstrate that rodents undergoing a navigation
test were able to show some “insights”, e.g. to predict the outcome of alterna-
tive trajectories leading to a goal location in the presence of blocked pathways.
The original Tolman & Honzik’s maze is shown in Fig. 2A. It consisted of three
narrow alleys of different lengths (Paths 1, 2, and 3) guiding the animals from a
starting position (bottom) to a feeder location (top).

Fig. 2B shows a simulated version of the Tolman & Honzik’s apparatus, and
the simulated robot1 . We emulated the experimental protocol designed by Tol-
man & Honzik to assess the animats’ navigation performance. The overall pro-
tocol consisted of a training period followed by a probe test. Both training and
probe trials were stopped when the animat had found the goal.
Training period: it lasted 14 days with 12 trials per day. The animats could
explore the maze and learn their navigation policy.
– During Day 1, a series of 3 forced runs was carried out, in which additional
doors were used to force the animats to go successively through P1, P2, and
P3. Then, during the remaining 9 runs, all additional doors were removed,
and the subjects could explore the maze freely. At the end of the first training
day, a preference for P1 was expected to be already developed [36].
– From Day 2 to 14, a block was introduced at place A (Fig. 2B) to require
a choice between P2 and P3. In fact, additional doors were used to close
the entrances to P2 and P3 to force the animats to go first to the Block A.
Then, doors were removed, and the subjects were forced to decide between
P2 and P3 on their way back to the first intersection. Each day, there were
10 “Block at A” runs that were mixed with 2 non-successive free runs to
maintain the preference for P1.
Probe test period: it lasted 1 day (Day 15), and it involved 7 runs with a block at
position B to interrupt the common section (Fig. 2B). The animats were forced
to decide between P2 and P3 when returning to the first intersection point.
For these experiments, Tolman & Honzik used 10 rats with no previous train-
ing. In our simulations, we used a population of 100 animats, and we assessed
the statistical significance of the results by means of an ANOVA analysis (the significance threshold was set at 10^−2, i.e. p < 0.01 was considered significant).

2.6 Theoretical Analysis


A series of analyses was done to characterise the neural activities subserving
the behavioural responses of the system. We recall that one of the aims of the
cortical column model was to build a spatial code less redundant than the hip-
pocampal place (HP) field representation. Yet, it is relevant to show that the
spatial properties (e.g., spatial information content) of the neural responses were
preserved in the cortical network.
The set of stimuli S consisted of the places visited by the animat. For the
analyses, the continuous two-dimensional input space was discretized, with each
location s ∈ S defined as a 5 x 5 cm square region of the environment. The size
of the receptive field of a neurone j was taken as 2·σS (j), with σS (j) denoting
the standard deviation around the mean of the response tuning curve.
A spatial density measure was used to assess the level of redundancy of a
neural spatial code, i.e. the average number of units necessary to encode a place:

D = ⟨ Σ_{j∈J} H( r_j(s) − σ_J(s) ) ⟩_{s∈S} .    (10)
¹ The model was implemented by means of the Webots© robotics simulation software.
with r_j(s) being the response of the neurone j when the animat is visiting the location s ∈ S, and σ_J(s) representing the standard deviation of the population activity distribution for a given stimulus s.
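A direct transcription of Eq. 10, assuming the responses are available as a units-by-locations array:

```python
import numpy as np

def spatial_density(rates):
    """Average number of units coding for a place (Eq. 10, sketch).

    rates: array of shape (n_units, n_locations), rates[j, s] = r_j(s).
    For each location s, count the units whose response exceeds the population
    standard deviation sigma_J(s), then average over locations.
    """
    sigma_per_location = rates.std(axis=0)            # sigma_J(s)
    active = rates >= sigma_per_location              # H(r_j(s) - sigma_J(s))
    return float(active.sum(axis=0).mean())
```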
Another measure was used to analyse the neural responses, the kurtosis func-
tion. This measure is defined as the normalised fourth central moment of a
probability distribution, and estimates its degree of peakedness. If applied to a
neural response distribution, the kurtosis can be used to measure its degree of
sparseness across both population and time [41]. We employed an average pop-
ulation kurtosis measure k̄_1 = ⟨k_1(s)⟩_{s∈S} to estimate how many neurones j of a population J were, on average, responding to a given stimulus s simultaneously. The kurtosis k_1(s) was taken as:

k_1(s) = ⟨ [ (r_j(s) − r̄_J(s)) / σ_J(s) ]^4 ⟩_{j∈J} .    (11)

with r̄_J(s) = ⟨r_j(s)⟩_{j∈J}. Similarly, an average lifetime kurtosis k̄_2 = ⟨k_2(j)⟩_{j∈J} was employed to assess how rarely a neurone j responded across time. The k_2(j) function was given by:

k_2(j) = ⟨ [ (r_j(s) − r̄_j) / σ_j ]^4 ⟩_{s∈S} .    (12)

with r̄_j = ⟨r_j(s)⟩_{s∈S}, and σ_j being the standard deviation of the cell activity r_j.
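Eqs. 11-12 amount to averaging fourth powers of z-scored responses across units or across locations; a compact sketch (assuming non-degenerate responses, i.e. non-zero standard deviations):

```python
import numpy as np

def population_kurtosis(rates):
    """Mean population kurtosis k1 (Eq. 11): z-score across units, average over locations."""
    z = (rates - rates.mean(axis=0)) / rates.std(axis=0)
    return float((z ** 4).mean(axis=0).mean())

def lifetime_kurtosis(rates):
    """Mean lifetime kurtosis k2 (Eq. 12): z-score across locations, average over units."""
    z = (rates - rates.mean(axis=1, keepdims=True)) / rates.std(axis=1, keepdims=True)
    return float((z ** 4).mean(axis=1).mean())
```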
Finally, we used an information theoretic analysis [42] to characterise the neu-
ral codes of our cortical and hippocampal populations. The mutual information
M I(S; R) between neural responses R and spatial locations S was computed:


MI(S;R) = Σ_{s∈S} Σ_{r∈R} P(r,s) · log_2 [ P(r,s) / (P(r)·P(s)) ] .    (13)

where r ∈ R indicated firing rates, P (r, s) the joint probability of having the
animat visiting a region s ∈ S while recording a response r, P (s) the a priori
probability computed as the ratio between time spent at place s and the total time, and P(r) = Σ_{s∈S} P(r,s) the probability of observing a neural response r.
The continuous output space of a neurone, i.e. R = [0, 1], was discretized via a
binning procedure (bin-width equal to 0.1). The M I(S; R) measure allowed us
to quantify the spatial information content of a neural code, i.e. how much could
be learnt about the animat’s position s by observing the neural responses r.
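A simplified sketch of Eq. 13 for a single neurone, assuming one (mean) response value per location; the full analysis presumably uses the response distribution over visits, so this is only illustrative:

```python
import numpy as np

def spatial_mutual_information(mean_rates, occupancy, n_bins=10):
    """Mutual information MI(S;R) for one neurone (Eq. 13, simplified sketch).

    mean_rates: array (n_locations,) with the neurone's mean rate per place,
    occupancy:  P(s), fraction of time spent at each place (sums to 1).
    Rates in [0, 1] are discretized into bins of width 1/n_bins.
    """
    mean_rates = np.asarray(mean_rates)
    occupancy = np.asarray(occupancy, dtype=float)
    bins = np.clip((mean_rates * n_bins).astype(int), 0, n_bins - 1)
    p_rs = np.zeros((n_bins, mean_rates.size))        # joint P(r, s)
    for s, (b, p_s) in enumerate(zip(bins, occupancy)):
        p_rs[b, s] += p_s
    p_r = p_rs.sum(axis=1, keepdims=True)             # marginal P(r)
    denom = p_r * occupancy[None, :]                  # P(r) * P(s)
    nz = p_rs > 0
    return float((p_rs[nz] * np.log2(p_rs[nz] / denom[nz])).sum())
```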

3 Results

3.1 Spatial Behaviour

Day 1. During the first 12 training trials, the animats learnt the topology of
the maze and planned their navigation trajectory in the absence of both block
A and B. Similar to Tolman & Honzik’s findings, our results show that the
model learnt to select the shortest goal-directed pathway P1 significantly more
frequently than the alternative trajectories P2, P3 (ANOVA, F2,297 = 168.249,
[Figure 3 (bar graphs and occupancy maps): number of occurrences of transits through P1, P2, and P3 for panels (A), (B), and (C).]

Fig. 3. Behavioural results. Top row: mean number of transits through P1, P2, and
P3 (averaged over 100 animats). Bottom row: occupancy grid maps. (A) During the
first 12 training trials (day 1) the simulated animals developed a significant preference
for P1 (no significant difference was observed between P2 and P3). (B) During the
following 156 training trials (days 2-14, in the presence of block A, Fig. 2B) P2 was
selected significantly more frequently than P3. (C) During the last 7 trials (day 15, test
phase), the block A was removed whereas the block B was introduced. The animats
exhibited a significant preference for P3 compared to P2.

p < 0.0001). The quantitative and qualitative analyses reported in Fig. 3A describe the path selection performance averaged over 100 animats.
Days 2-14. During this training phase (consisting of 156 trials), a block was
introduced at location A, which forced the animats to update their topological
maps dynamically, and to plan a detour to the goal. The results reported by Tol-
man & Honzik provided strong evidence for a preference for the shortest detour
path P2. Consistently, in our simulations (Fig. 3B) we observed a significantly
larger number of transits through P2 compared to P3 (ANOVA, F1,198 = 383.068
p < 0.0001), P1 being ignored in this analysis (similar to Tolman & Honzik’s
analysis) because blocked.
Day 15. Seven probe trials were performed during the 15th day of the simu-
lated protocol, by removing the block A and adding a new block at location B.
This manipulation aimed at testing the “insight” working hypothesis: after a first
run through the shortest path P1 and after having encountered the unexpected
block B, will animats try P2 (wrong behaviour) or will they go directly through
P3 (correct behaviour)? According to Tolman & Honzik’s results, rats behaved
as predicted by the insight hypothesis, i.e. they tended to select the longer but
[Figure 4: (A) distribution of the number of errors per individual; (B) number of errors for learning vs. randomly behaving individuals.]

Fig. 4. Comparison between a learning and a randomly behaving agent. (A) Error
distribution of learning (black histogram) versus random (grey line) animats. (B) Mean
number of errors made by the model and by a randomly behaving agent.

effective P3. The authors concluded that rats were able to inhibit the previously
learnt policy (i.e., the “habit behaviour” consisting of selecting P2 after a fail-
ure of P1 during the 156 previous training trials). Our probe test simulation
results are shown in Fig. 3C. Similar to rats, the animats exhibited a signifi-
cant preference for P3 compared to P2 (ANOVA, F1,198 = 130.15, p < 0.0001).
Finally, in order to further assess the mean performance of the system during
the probe trials, we compared the action selection policy of learning animats
with that of randomly behaving (theoretical) animats. Fig. 4A provides the re-
sults of this comparison by showing the error distribution over the population of
learning agents (black histogram) and randomly behaving agents (grey curve).
The number of errors per individual is displayed in the boxplot of Fig. 4B.
These findings indicate a significantly better performance of learning animats
compared to random agents (ANOVA, F1,196 = 7.4432, p < 0.01).

3.2 Analysis of Neural Activities

Fig. 5A contrasts the mean spatial density (Eq. 10) of the HP receptive fields
with that of cortical column receptive fields. It is shown that, compared to
the upstream hippocampal space code, the cortical column model reduced the
redundancy of the learnt spatial code significantly (ANOVA, F1,316 = 739.2,
p < 0.0001). Fig. 5B shows the probability distribution representing the number
of active column units (solid curve) and active HP cells (dashed line) per spatial
location s ∈ S. As shown by the inset boxplots, the distribution kurtosis was
significantly higher for column units than for HP cells (ANOVA, F1,198 = 6057,
p < 0.0001). To further investigate this property, we assessed the average pop-
ulation kurtosis k̄1 (Eq. 11) of both columnar and HP cell activities (Fig. 5C).
Again, the columnar population activity exhibited a significantly higher kurtosis
[Figure 5 (plots): (A) density of receptive fields for place vs. column units; (B) probability of the number of active units per location, with inset distribution kurtosis; (C) kurtosis across population for place vs. column units.]

Fig. 5. (A) Spatial density of the receptive fields of HP cells and cortical column units.
(B) Probability distribution of the number of active column units (solid line) and ac-
tive HP cells (dashed line) per spatial location s ∈ S. Inset boxplots: kurtosis measures
for the two distributions. (C) Population kurtosis of columnar and hippocampal as-
semblies.

than the HP cell activity (ANOVA, F1,3128 = 14901, p < 0.0001). These results
suggest that, in the model, the cortical column network was able to provide a
sparser state-space population coding than HP units.
In a second series of analyses, we focused on the activity of single cells, and
we compared the average lifetime kurtosis k̄2 (Eq. 12) of cortical and HP units.
As reported on Fig. 6A, we found that the kurtosis across time did not differ
significantly between cortical and HP units (ANOVA, F1,2356 = 2.2699, p <
0.13). This result suggests that, on average, single cortical and HP units tended
to respond to a comparable number of stimuli (i.e., spatial locations) over their
lifetimes. Along the same line, we recorded the receptive fields of the two types
of units. Figs. 6B,C display some samples of place fields of cortical and HP
cells, respectively. As expected, we found a statistical anticorrelation between
the lifetime kurtosis and the size of the receptive fields. The example of Fig. 6D
shows that, for a randomly chosen animat performing the whole experimental
protocol (15 days), the size of hippocampal place fields was highly anticorrelated
to the HP cells’ lifetime kurtosis (correlation coefficient = −0.94). These results
add to those depicted in Fig. 5 in that the increase of sparseness at the level
of the cortical population (compared to HP cells) was not merely due to an
enlargement of the receptive fields (or, equivalently, to a decrease of the lifetime
stimulus-dependent activity).
Despite their less redundant code, were cortical columns able to provide a
representation comparable to that of HP cells in terms of spatial information
content? The results of our information theoretic analysis (Eq. 13) suggest that
this was indeed the case. Fig. 6E shows that, for a randomly chosen animat,
[Figure 6 (plots): (A) kurtosis across time for place vs. column units; (B, C) receptive fields of column units and place cells; (D) width of receptive fields vs. kurtosis across time; (E) mutual information I(S;R) (bits) for place vs. column units.]

Fig. 6. (A) Lifetime kurtosis for column and HP units. (B, C) Samples of receptive
fields of three column units and four HP cells. (D) Correlation between the lifetime
kurtosis and the size of receptive fields. (E) Mutual information M I(S; R) between
the set of spatial locations S and the activity R for both cortical and HP units.

the average amount of spatial information conveyed by cortical units was not
significantly lower than that of HP cells (ANOVA, F1,140 = 0.8034, p < 0.3716).
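The information-theoretic comparison can be illustrated in the same spirit. The sketch below computes a plug-in estimate of the mutual information between discrete locations and a unit's binned responses; it is a generic estimator written for illustration only, and Eq. 13 in the model may use a different discretisation or bias correction.

import numpy as np

def mutual_information(locations, responses, n_bins=8):
    # I(S;R) in bits between integer location indices and one unit's activity
    edges = np.histogram_bin_edges(responses, bins=n_bins)
    r = np.digitize(responses, edges)                 # response bin per visit
    joint = np.zeros((int(np.max(locations)) + 1, n_bins + 2))
    for s, b in zip(locations, r):
        joint[int(s), b] += 1.0
    joint /= joint.sum()
    ps = joint.sum(axis=1, keepdims=True)             # P(S)
    pr = joint.sum(axis=0, keepdims=True)             # P(R)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (ps @ pr)[nz])))

# example: a unit that fires mostly in location 3 carries spatial information
locs = np.repeat(np.arange(4), 50)
resp = np.where(locs == 3, 1.0, 0.0) + 0.05 * np.random.default_rng(1).random(200)
print(mutual_information(locs, resp))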

4 Discussion

We presented a navigation model that builds a topological map of the environment
incrementally, and uses it to plan a course of actions leading to a goal
location. The model was employed to solve the classical Tolman & Honzik's task
[36]. As mentioned above, other models have been proposed to solve goal-directed
navigation tasks. They are mainly based on the properties of hippocampal (e.g.,
[43]), and prefrontal cortex (e.g., [31]) neural assemblies. However, most of these
models do not perform action planning as defined in this paper (Sec. 1). Sam-
sonovich and Ascoli [43] implement a local path finding mechanism to select the
most suitable orientation leading to the goal. Similarly, Hasselmo’s model [31]
does not plan a sequence of actions from the current location to the goal but
it rather infers the first local action to be taken, based upon a back-propagated
goal signal. Yet, these two models rely on discretized state spaces (with prede-
fined grid units coding for places), whereas our model uses a place field popu-
lation providing a continuous representation of the environment [38]. Also, our
model learns topological maps coding for the state-action space simultaneously.
In the model by Samsonovich and Ascoli [43] no topological information is rep-
resented, but only a distance measure between each visited place and a set of
potential goals. Likewise, in Hasselmo’s model states and actions are not jointly
represented, which generates a route-based rather than a map-based navigation
system [1].
We adopted a three-fold working hypothesis according to which (i) the hip-
pocampus would play a prominent role in encoding spatial information; (ii)
higher-level cortical areas, particularly the PFC, would mediate multidimen-
sional contextual representations (e.g., coding for motivation-dependent memo-
ries and action cost/risk constraints) grounded on the hippocampal spatial code;
(iii) neocortical representations would facilitate the temporal linking of multi-
ple contexts, and the sequential organisation (e.g., planning) of behavioural re-
sponses. The preliminary version of the model presented here enabled us to focus
on some basic computational properties, such as the ability of the columnar or-
ganisation to learn a compact topological representation, and the efficiency of the
activation-diffusion planning mechanism. Further efforts will be devoted to integrating
multiple sources of information. For example, the animat should be able to learn
maps that encode reward (subjective) values, and action-cost constraints. Also,
these maps should be suitable to represent multiple spatio-temporal scales to
overcome the intrinsic limitation of the activation-diffusion mechanism in large
scale environments. Additionally, these multiscale maps should allow the model
to infer high-level shortcuts to bypass low-level environmental constraints.
The neurocomputational approach presented here aims at generating cross-
disciplinary insights that may help to systematically explore potential connections
between findings on the neuronal level (e.g., single-cell discharge patterns), and
observations on the behavioural level (e.g., spatial navigation). Mathematical representations
make it possible to describe both the spatial and temporal components characterising
the couplings between neurobiological processes. Models can help to scale up
from single cell properties to the dynamics of neural populations, and generate
novel hypotheses about their interactions to produce complex behaviour.
Acknowledgments. This work was supported by the EC Project ICEA (Integrating Cognition,
Emotion and Autonomy), IST-027819-IP.

References
1. Arleo, A., Rondi-Reig, L.: Multimodal sensory integration and concurrent navi-
gation strategies for spatial cognition in real and artificial organisms. J. Integr.
Neurosci. 6(3), 327–366 (2007)
2. Dollé, L., Khamassi, M., Girard, B., Guillot, A., Chavarriaga, R.: Analyzing in-
teractions between navigation strategies using a computational model of action
selection. In: Freksa, C., et al. (eds.) SC 2008. LNCS (LNAI), vol. 5248, pp. 71–86.
Springer, Heidelberg (2008)
3. O’Keefe, J., Nadel, L.: The Hippocampus as a Cognitive Map. Oxford University
Press, Oxford (1978)
4. Hafting, T., Fyhn, M., Molden, S., Moser, M.B., Moser, E.I.: Microstructure of a
spatial map in the entorhinal cortex. Nature 436(7052), 801–806 (2005)
5. Wiener, S.I., Taube, J.S.: Head Direction Cells and the Neural Mechanisms of
Spatial Orientation. MIT Press, Cambridge (2005)
6. Poucet, B., Lenck-Santini, P.P., Hok, V., Save, E., Banquet, J.P., Gaussier, P.,
Muller, R.U.: Spatial navigation and hippocampal place cell firing: the problem of
goal encoding. Rev. Neurosci. 15(2), 89–107 (2004)
7. Amaral, D.G., Witter, M.P.: The three-dimensional organization of the hippocam-
pal formation: a review of anatomical data. Neurosci. 31(3), 571–591 (1989)
8. Wilson, M.A., McNaughton, B.L.: Dynamics of the hippocampal ensemble code
for space. Science 261, 1055–1058 (1993)
9. Nitz, D.A.: Tracking route progression in the posterior parietal cortex. Neu-
ron. 49(5), 747–756 (2006)
10. Hok, V., Save, E., Lenck-Santini, P.P., Poucet, B.: Coding for spatial goals in
the prelimbic/infralimbic area of the rat frontal cortex. Proc. Natl. Acad. Sci.
USA. 102(12), 4602–4607 (2005)
11. Knierim, J.J.: Neural representations of location outside the hippocampus. Learn.
Mem. 13(4), 405–415 (2006)
12. Granon, S., Poucet, B.: Medial prefrontal lesions in the rat and spatial navigation:
evidence for impaired planning. Behav. Neurosci. 109(3), 474–484 (1995)
13. Jay, T.M., Witter, M.P.: Distribution of hippocampal ca1 and subicular efferents
in the prefrontal cortex of the rat studied by means of anterograde transport of
phaseolus vulgaris-leucoagglutinin. J. Comp. Neurol. 313(4), 574–586 (1991)
14. Kita, H., Kitai, S.T.: Amygdaloid projections to the frontal cortex and the striatum
in the rat. J. Comp. Neurol. 298(1), 40–49 (1990)
15. Thierry, A.M., Blanc, G., Sobel, A., Stinus, L., Golwinski, J.: Dopaminergic ter-
minals in the rat cortex. Science 182(4111), 499–501 (1973)
16. Uylings, H.B.M., Groenewegen, H.J., Kolb, B.: Do rats have a prefrontal cortex?
Behav. Brain. Res. 146(1-2), 3–17 (2003)
17. Aggleton, J.: The amygdala: neurobiological aspects of emotion, memory, and men-
tal dysfunction. Wiley-Liss, New York (1992)
18. Schultz, W.: Predictive reward signal of dopamine neurons. J. Neurophysiol. 80(1),
1–27 (1998)
19. Jung, M.W., Qin, Y., McNaughton, B.L., Barnes, C.A.: Firing characteristics of
deep layer neurons in prefrontal cortex in rats performing spatial working memory
tasks. Cereb. Cortex 8(5), 437–450 (1998)
20. Otani, S.: Prefrontal cortex function, quasi-physiological stimuli, and synaptic plas-
ticity. J. Physiol. Paris 97(4-6), 423–430 (2003)
21. Fuster, J.M.: The prefrontal cortex–an update: time is of the essence. Neu-
ron. 30(2), 319–333 (2001)
22. Cohen, J.D., Braver, T.S., Brown, J.W.: Computational perspectives on dopamine
function in prefrontal cortex. Curr. Opin. Neurobiol. 12(2), 223–229 (2002)
23. Mountcastle, V.B.: Modality and topographic properties of single neurons of cat’s
somatic sensory cortex. J. Neurophysiol. 20(4), 408–434 (1957)
24. Mountcastle, V.B.: The columnar organization of the neocortex. Brain 120, 701–
722 (1997)
25. Buxhoeveden, D.P., Casanova, M.F.: The minicolumn hypothesis in neuroscience.
Brain 125(5), 935–951 (2002)
26. Hampson, S.: Connectionist problem solving. In: The Handbook of Brain Theory
and Neural Networks, pp. 756–760. The MIT Press, Cambridge (1998)
27. Meyer, J.A., Filliat, D.: Map-based navigation in mobile robots - ii. a review of
map-learning and path-planning strategies. J. Cogn. Syst. Res. 4(4), 283–317 (2003)
28. Burnod, Y.: An adaptive neural network: the cerebral cortex. Masson (1989)
29. Bieszczad, A.: Neurosolver: a step toward a neuromorphic general problem solver.
Proc. World. Congr. Comput. Intell. WCCI94 3, 1313–1318 (1994)
30. Frezza-Buet, H., Alexandre, F.: Modeling prefrontal functions for robot navigation.
IEEE Int. Jt. Conf. Neural. Netw. 1, 252–257 (1999)
31. Hasselmo, M.E.: A model of prefrontal cortical mechanisms for goal-directed be-
havior. J. Cogn. Neurosci. 17(7), 1115–1129 (2005)
32. Schmajuk, N.A., Thieme, A.D.: Purposive behavior and cognitive mapping: a neu-
ral network model. Biol. Cybern. 67(2), 165–174 (1992)
33. Dehaene, S., Changeux, J.P.: A hierarchical neuronal network for planning behav-
ior. Proc. Natl. Acad. Sci. USA. 94(24), 13293–13298 (1997)
34. Banquet, J.P., Gaussier, P., Quoy, M., Revel, A., Burnod, Y.: A hierarchy of asso-
ciations in hippocampo-cortical systems: cognitive maps and navigation strategies.
Neural Comput. 17, 1339–1384 (2005)
35. Fleuret, F., Brunet, E.: Dea: an architecture for goal planning and classification.
Neural Comput 12(9), 1987–2008 (2000)
36. Tolman, E.C., Honzik, C.H.: “Insight” in rats. Univ. Calif. Publ. Psychol. 4(14),
215–232 (1930)
37. Arleo, A., Gerstner, W.: Spatial orientation in navigating agents: modeling head-
direction cells. Neurocomput. 38(40), 1059–1065 (2001)
38. Arleo, A., Smeraldi, F., Gerstner, W.: Cognitive navigation based on nonuniform
gabor space sampling, unsupervised growing networks, and reinforcement learning.
IEEE Trans. Neural. Netw. 15(3), 639–651 (2004)
39. Rao, S.G., Williams, G.V., Goldman-Rakic, P.S.: Isodirectional tuning of adjacent
interneurons and pyramidal cells during working memory: evidence for microcolum-
nar organization in pfc. J. Neurophysiol. 81(4), 1903–1916 (1999)
40. Triesch, J.: Synergies between intrinsic and synaptic plasticity mechanisms. Neural
Comput. 19(4), 885–909 (2007)
41. Willmore, B., Tolhurst, D.J.: Characterizing the sparseness of neural codes. Netw.
Comput. Neural Syst. 12(3), 255–270 (2001)
42. Bialek, W., Rieke, F., de Ruyter van Steveninck, R., Warland, D.: Reading a neural
code. Science 252(5014), 1854–1857 (1991)
43. Samsonovich, A., Ascoli, G.: A simple neural network model of the hippocampus
suggesting its pathfinding role in episodic memory retrieval. Learn. Mem. 12, 193–
208 (2005)
Efficient Wayfinding in Hierarchically
Regionalized Spatial Environments

Thomas Reineking, Christian Kohlhagen, and Christoph Zetzsche

Cognitive Neuroinformatics
University of Bremen
28359 Bremen, Germany
{trking,ckohlhag,zetzsche}@informatik.uni-bremen.de

Abstract. Humans utilize region-based hierarchical representations in
the context of navigation. We propose a computational model for repre-
senting region hierarchies and define criteria for automatically generating
them. We devise a cognitively plausible online wayfinding algorithm ex-
ploiting the hierarchical decomposition given by regions. The algorithm
allows an agent to derive plans with decreasing detail level along paths,
enabling the agent to obtain the next action in logarithmic time and com-
plete solutions in almost linear time. The resulting paths are reasonable
approximations of optimal shortest paths.

Keywords: Navigation, hierarchical spatial representation, regions, region hierarchy, wayfinding.

1 Introduction
Agents situated in spatial environments must be capable of autonomous navigation
using previously learned representations. There exists a wide variety of approaches for
representing environments ranging from metrical maps [1] to topological graphs [2].
In the context of large-scale wayfinding topological models seem cognitively more
plausible because they are robust with regard to global consistency, and because
they permit abstracting from unnecessary details, enabling higher level planning
[3]. Topological graph-based representations of space can be divided into those uti-
lizing single “flat” graphs and those that employ hierarchies of graphs for different
layers of granularity. Single-graph schemes are limited in case of large domains since
action selection by an agent may take unacceptably long times due to huge search
spaces. Hierarchical approaches on the other hand decompose these spaces and are
therefore significantly more efficient but solutions are not always guaranteed to
be optimal.
One possibility for a hierarchical representation is to assume graph-subgraph
structures in which higher levels form subsets of lower, more detailed levels.
This approach has been particularly popular in geographical information sys-
tems (GIS), where this technique can be used to eliminate unwanted details.
Based on this idea a domain-specific path planning algorithm for street maps was
proposed in [4]. The authors approximated street maps by connection grids and


Fig. 1. Small indoor environment with superimposed region hierarchy. Local connec-
tivity graphs are used for efficient near-optimal path planning (not shown).

constructed each hierarchical layer as a subset of the next lower layer. Another
way of defining topological hierarchies is having nodes at higher levels represent
sets of nodes at lower levels. A hierarchical extension of the A∗ algorithm (HPA∗ )
can be found in [5]. In this approach a topological graph is abstracted from an
occupancy gridmap which facilitates restricting the search space. In [6] the D∗
algorithm for robot path planning is modified in order to support hierarchies
consisting of different node classes. Unlike the other approaches the hierarchical
D∗ algorithm is guaranteed to generate optimal paths, however, it requires the
offline computation and storage of partial paths. Furthermore, it depends on the
availability of exact metrical information. In addition to wayfinding, hierarchical
representations have been successfully applied to related spatial reasoning prob-
lems, such as the traveling salesman problem [7,8] or the automated generation
of route directions [9].
In this paper we introduce a cognitively motivated hierarchical representa-
tion which is based on regions. We provide a formal data structure for this
representation, and we develop an efficient wayfinding algorithm that exploits
its specific properties. Many natural and man-made environments exhibit an
intrinsic regionalization that can be directly exploited for building hierarchical
representations and there is strong experimental evidence that humans actually
make use of region-based hierarchical representations in the context of navi-
gation [10,11,12,13,14]. Research on cognitive maps supports this idea and has
identified the hierarchical nature of spatial representations as one of their crucial
properties [15,16].
Our work originated from a project on a cognitive agent which explores and
navigates through an indoor environment by means of a hierarchical region-
based representation [17]. The architecture was based on a biologically inspired
approach for recognizing regions by sensorimotor features [18]. Figure 1 shows
a small example of a typical environment with a superimposed hierarchy. In
this representation, regions were formed by means of visual scene analysis [19]
and intrinsic connectivity structure. However, this solution depends on specific
properties of the environment. In the current paper we hence use a quadtree
approach for building hierarchies, to enable comparability with other approaches
and to ease complexity analysis.
In our model smaller regions are grouped to form new regions at the next hi-
erarchy level which yields a tree-like structure. This hierarchical representation
is augmented by local connectivity graphs for each subtree. Explicit metrical
information is completely disregarded; instead, the regions themselves constitute
a qualitative metric at different levels of granularity. By combining the hierar-
chically organized connectivity graphs with this qualitative metric we were able
to develop an efficient online wayfinding algorithm that produces near-optimal
paths while drastically reducing the search space. Abstract paths are determined
at the top of subtrees and recursively broken down to smaller regions. By only
relaxing the first element of an abstract path the next action can be obtained in
logarithmic time depending on the number of regions while generating complete
paths can be done with almost linear complexity.
The idea of region-based wayfinding has been introduced as a heuristic in [13],
but without an explicit computational model. The aforementioned hierarchical
wayfinding algorithms do not consider the topological notion of regions as an
integral part of the underlying representation, nor can they be easily adapted to
regionalized environments in general.
The paper is structured into two parts. The first explains the hierarchical
region representation, its properties, and how it can be constructed. The second
introduces the wayfinding algorithm utilizing this data structure. We analyze its
algorithmic complexity and compare it to existing path planning approaches.
We conclude with a short discussion of the advantages of the proposed model and
give hints towards future extensions.

2 Region Hierarchy

In this section we describe the region hierarchy as a formal data structure and
demonstrate how valid hierarchies can be constructed. We argue that most real-
world environments are inherently regionalized in that they are comprised of
areas that form natural units. These units form larger regions at higher levels of
granularity, resulting in a hierarchy of regions.
Most approaches in the context of path planning rely on metric information for
estimating distances. As mentioned in the introduction the proposed representa-
tion enables an agent to obtain qualitative distances based on regions, thus mak-
ing quantitative information an optional addition. A qualitative metric is given if
regions of one hierarchy level are of similar size and approximately convex. The
length of a path can then be assessed by the number of crossed regions at a given
level. This is especially useful since it allows an agent to estimate distances with
near-arbitrary precision depending on the considered hierarchy level.
In order to derive a computational model of the representation it is necessary
to make assumptions about how regions are modeled. First we assume that
each region is fully contained by exactly one region at the next coarser level of
granularity, that there is a single region containing all other ones, and that
the set of all regions therefore constitutes a tree. Humans may use a more fuzzy
representation with fluent region boundaries and less strict containment relations
[20] but a tree of regions seems a reasonable approximation. Second, we demand
the descendants of a region to be reachable from each other without leaving the
parent region. This asserts the existence of a path within the parent region which
can be used as a clustering criterion for constructing valid hierarchies.
Unlike a flat connectivity graph a hierarchy allows the representation of con-
nectivity at multiple layers of detail. We propose imposing region connectivity
on the tree structure by storing connections in the smallest subtree only, thus
decomposing a global connectivity graph into local subgraphs. This limits the
problem space of a wayfinding task to the corresponding subtree, thus exclud-
ing large parts of the environment at each recursion step which is in fact the
underlying idea for the wayfinding algorithm described in the next section.

2.1 Formal Definition


Here we state the proposed data structure formally using first-order logic. In the
following formulas all variables are understood to be regions. We introduce basic
properties of the hierarchy, explain how region connectivity is represented and
at the end we give a key requirement for regions in the navigation context.
First we provide a relation which expresses the containment of a region c
by another region p and which can be thought of as defining a subset of the
encompassing region. This relation is transitive with respect to a third region g:
∀c, p, g : in(c, p) ∧ in(p, g) ⇒ in(c, g). (1)
The in predicate is reflexive for arbitrary regions r but to simplify notation we also
define an irreflexive version in∗ for which the following expression is not true:
∀r : in(r, r). (2)
The tree’s root W contains all other regions:
∀r : in(r, W). (3)
A direct descendant c of a region p is given by a region r that is located in (or
equal to) c:
∀c, p, r : child(c, p, r) ⇒ in(r, c) ∧ in∗ (c, p)
∧ (¬∃d : in∗ (d, p)
∧ in∗ (c, d)). (4)
In order to determine the smallest subtree in which two nodes r1 , r2 are located, it
is necessary to determine their first common ancestor (FCA) f , i.e., the subtree’s
root node:
∀f, r1 , r2 : fca(f, r1 , r2 ) ⇔ in(r1 , f ) ∧ in(r2 , f )
∧ (¬∃p : in∗ (p, f )
∧ in(r1 , p)
∧ in(r2 , p)). (5)
The hierarchy is composed of two kinds of nodes: atomic regions and complex
regions. Atomic regions correspond to places in the environment that are not
further divided while complex regions comprise atomic regions or other complex
regions, and therefore represent the area covered by the non-empty set of all
their descendants. Regions do not intersect other than by containment and the
set of atomic regions exhaustively covers the environment. (In case of the indoor
domain an atomic region could be a room, whereas a complex region might
represent a hallway along with all its neighboring rooms.) An atomic region a
therefore contains no other region r:
∀a, r : atomic(a) ⇔ ¬in∗(r, a). (6)
The connectivity of atomic regions is given by the environment. In our model
an atomic connection is simply a tuple of atomic regions a1 , a2 :
∀a1 , a2 : con(a1 , a2 ) ⇒ atomic(a1 ) ∧ atomic(a2 ). (7)
Further information like the specific action necessary for reaching the next region
could be represented as well but for the sake of simplicity we stick to region
tuples as connections. Note that the connection predicate is non-transitive and
irreflexive, thus disallowing a region to be connected with itself.
The global connectivity graph is hierarchically decomposed by storing atomic
connections in the root of the smallest subtree given by two regions a1 , a2 . There-
fore each region f carries a set of all atomic connections between its descendants
provided that this node is their FCA:
∀f, a1 , a2 : consa (f, a1 , a2 ) ⇔ fca(f, a1 , a2 )
∧ con(a1 , a2 ). (8)
This connection set is later used for obtaining crossings between regions at the
atomic level.
Alongside atomic connections the hierarchy also needs to represent the con-
nectivity of complex regions. For this purpose each region has a second set con-
taining complex connections. A complex connection is a tuple of two regions
c1 , c2 sharing the same parent f . A complex connection exists if a region a1
contained by (or equal to) c1 is atomically connected to a region a2 contained
by (or equal to) c2 :
∀f, c1 , c2 : consc (f, c1 , c2 ) ⇔ ∃a1 , a2 : consa (f, a1 , a2 )
∧ child(c1 , f, a1 )
∧ child(c2 , f, a2 ). (9)
A complex connection therefore exists if and only if the set of atomic connec-
tions contains a corresponding entry. The set of complex connections defines
a connected graph by interpreting region tuples as edges. This graph enables
searching for paths between complex regions, whereas the atomic connection set
yields the actual (atomic) crossings between the complex regions.
The existence of a path between two arbitrary regions s, d is conditioned on
a third region r that completely encompasses the path. Hence for all nodes x
along the path the in predicate must be fulfilled with respect to r:

∀s, d, r : path(s, d, r) ⇔ (∃x : con(s, x) ∧ in(x, r) ∧ path(x, d, r)) ∨ con(s, d) ∨ s = d. (10)

Finally we state the connectivity criterion that a valid hierarchy must satisfy.
We require that all regions r1 , r2 located in a common parent region p must be
reachable from each other without leaving p:

∀r1 , r2 , p : in(r1 , p) ∧ in(r2 , p) ⇔ path(r1 , r2 , p). (11)

This enables the hierarchical decomposition of wayfinding tasks because a hierarchy
adhering to this principle reduces the search space to the subtree given by
a source and a destination region.
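To make the data structure concrete, the following Python sketch gives one possible encoding of these relations. It is an illustrative implementation, not part of the formal model: the class and function names are our own, and, following Eqs. (8) and (9), atomic connections are stored at the first common ancestor of the two regions, from which the corresponding complex connection is derived.

class Region:
    # a node of the region tree; atomic regions simply have no children
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.cons_a = set()   # atomic connections stored at this node, Eq. (8)
        self.cons_c = set()   # complex connections between direct children, Eq. (9)
        if parent is not None:
            parent.children.append(self)

    def ancestors(self):
        node, path = self, []
        while node is not None:
            path.append(node)
            node = node.parent
        return path                       # self, ..., root W

def fca(r1, r2):
    # first common ancestor, Eq. (5)
    common = set(r1.ancestors())
    return next(a for a in r2.ancestors() if a in common)

def child_of(region, ancestor):
    # direct descendant of `ancestor` containing (or equal to) `region`, Eq. (4)
    node = region
    while node.parent is not ancestor:
        node = node.parent
    return node

def add_atomic_connection(a1, a2):
    # store the atomic connection at the FCA and derive the complex connection
    f = fca(a1, a2)
    f.cons_a.add((a1, a2))
    f.cons_c.add((child_of(a1, f), child_of(a2, f)))

# small example: W contains C1 = {R1, R2} and R3; R2 and R3 are connected
W = Region('W'); C1 = Region('C1', W); R3 = Region('R3', W)
R1 = Region('R1', C1); R2 = Region('R2', C1)
add_atomic_connection(R2, R3)
print(fca(R2, R3).name)        # 'W'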

2.2 Clustering
The problem of imposing a hierarchy onto an environment is essentially a matter
of clustering regions hierarchically. Humans seem to be able to do this effort-
lessly and there is evidence that the acquisition of region knowledge happens
very early during the exploration of an environment [14]. Some suggestions on
principles for the automated hierarchical clustering of spatial environments can
be found in [21]. However, automatically generating hierarchies similar to the
ones constructed by humans for arbitrary spatial configurations is an unsolved
problem. We briefly describe a domain-specific clustering approach for indoor environments
below; however, for the purpose of auditability and comparability of our wayfinding
algorithm's performance, we first present the more generic, albeit artificial, quadtree
as a possibility for generating hierarchies.
While humans seem to use various criteria for grouping regions, we focus on
the connectivity aspect, since it is essential for navigation. We require a proper
hierarchy to fulfill four properties. The first two are similarity of region size at
each hierarchy level and convexity as mentioned above. The third is given by
(11) and asserts the existence of a path within a bounding region. The fourth
property concerns the hierarchy’s shape. The tree should be nearly balanced and
its depth must be logarithmically dependent on the number of atomic regions.
This excludes “flat” hierarchies as well as “deformed” ones with arbitrary depths.
Note that the third requirement is necessary for correctness of the wayfinding
algorithm described in the next section while the hierarchy’s shape merely affects
the algorithm's computation time. Size and convexity of regions determine the
accuracy of qualitative distance estimates.
Generating proper clusters becomes significantly easier if one makes assump-
tions about the connectivity structure of an environment. In the spatial domain
it is popular to approximate place connectivity, i.e., connections between atomic
regions, by assuming grid-like connections where each region is connected to its
four neighboring regions. In this case a simple quadtree can be applied in which
a set of four adjacent regions at level k corresponds to one region at level k + 1.
The resulting hierarchy would indeed satisfy the connectivity property defined
by (11) and with a constant branching factor of b = 4 its depth would be loga-
rithmically bounded. Similar region size and convexity are asserted due to the
uniformity of grid cells. However, the applicability of the quadtree approach is
limited in case of environments with less regular and more restricted connectiv-
ity since this could easily violate the connectedness of regions. This connectivity
restriction is especially predominant in indoor environments.
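For grid-like connectivity as assumed above, the quadtree construction is straightforward to sketch. The fragment below is a self-contained illustration using plain dictionaries (independent of the class sketch given earlier); the encoding of regions as (level, row, col) triples is our own choice and n is assumed to be a power of two.

def build_quadtree(n):
    # quadtree region hierarchy over an n x n grid: returns a dict mapping each
    # region id to its parent id; a region at level k+1 groups the four level-k
    # regions in its 2 x 2 block, as described in the text
    parent = {}
    level, size = 0, n
    while size > 1:
        for row in range(size):
            for col in range(size):
                parent[(level, row, col)] = (level + 1, row // 2, col // 2)
        level, size = level + 1, size // 2
    parent[(level, 0, 0)] = None          # root region W
    return parent

def atomic_connections(n):
    # 4-neighbour connectivity between atomic grid cells
    cons = []
    for row in range(n):
        for col in range(n):
            if row + 1 < n:
                cons.append(((0, row, col), (0, row + 1, col)))
            if col + 1 < n:
                cons.append(((0, row, col), (0, row, col + 1)))
    return cons

# example: 8 x 8 grid -> 64 atomic regions, branching factor b = 4
parents = build_quadtree(8)
print(len(parents), len(atomic_connections(8)))   # 85 regions, 112 connections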
A first approach towards modeling human grouping mechanisms for indoor
environments has been proposed in [17]. It applied a classical metrical cluster
algorithm to local quantitative representations and combined this with domain-
specific connectivity-based heuristics for the topological level. Indoor environ-
ments are especially suited for the hierarchical clustering of regions because they
offer natural region boundaries in the form of walls and because they are char-
acterized by local connectivity agglomerations. The latter can be exploited by
defining regions based on places of high connectivity, e.g., a hallway could form
a region together with its connected rooms. This not only asserts connectedness
of regions, it also leads to intuitively shaped hierarchies with convex regions like
the one shown in figure 1. Similar region sizes at each hierarchy level are a direct
result of the high degree of symmetry found in many indoor environments.

3 Wayfinding

Given a hierarchy that satisfies the properties described above it is possible to
devise algorithms that utilize this data structure. In this section we propose a
wayfinding algorithm that is capable of planning paths between atomic source
regions and arbitrary (atomic or complex) destination regions. By decomposing
tasks hierarchically and by estimating path lengths qualitatively based on regions,
the algorithm aims to be cognitively plausible while efficiently producing
near-optimal solutions.
The first part explains the algorithm in-depth using pseudo code. The second
part analyzes its time and space complexity. Finally, we compare its properties to
different path planning algorithms, both hierarchical and non-hierarchical ones.

3.1 Algorithm

The basic idea of the algorithm is to limit the search space to the region given
by the minimum subtree containing the source and destination region at each
recursion step. This is possible because (11) guarantees the connectedness of such
a region. Within this region an “abstract” path is constructed using a shortest
path algorithm. The search space is the local connectivity graph composed of the
direct descendants of the subtree’s root and their connections. Start and goal of
the abstract path are given by the regions in which the source and destination
region are located at the corresponding level. For each connection of the abstract
path a corresponding one from the set of atomic connections is selected and used
as a new destination for the next recursion. This process is repeated until the
atomic level is reached and a complete path has been constructed. Alternatively
only the first element from each abstract path is relaxed, which yields only the
first atomic region crossing while keeping the other crossings abstract.
Figure 2 illustrates how the region hierarchy is used to find a path from one
atomic region to another by means of complex and atomic connections.
1  find_way(s, d, h)
2    if in(s, d, h) then
3      return []                          // empty path
4
5    fca := fca(s, d, h)
6    cs_a := con_a(fca, h)                // atomic connections
7    cs_c := con_c(fca, h)                // complex connections
8    p_c := Dijkstra(cs_c, child(s, fca), child(d, fca))
9
10   cur := s                             // current (atomic) region
11   p_a := []                            // (atomic) path
12   for each c_c from p_c
13     c_a := select(cur, c_c, cs_a)
14     p_a := p_a + find_way(cur, source(c_a), h)
15     p_a := p_a + c_a
16     cur := destination(c_a)
17
18   return p_a + find_way(cur, d, h)

The algorithm has three input parameters: s denotes the atomic source region,
d the atomic or complex destination region and h is a properly clustered region
hierarchy. First the trivial case of s being located in (or equal to) d is covered
which results in an empty path (lines 2-3).
If this is not the case, the FCA (defined by (5)) of s and d is determined
(line 5). Given this region the set of complex connections cs_c between the
direct descendants of fca and the set of corresponding atomic connections cs_a
(given by (9) and (8)) is obtained (lines 6-7). The former is used to construct
an abstract path p c from s to d composed of the FCA’s direct descendants by
applying Dijkstra’s shortest path algorithm.
Next the current region cur is initialized with the source region s (line 10)
and the algorithm iterates through all complex connections in the abstract path
(line 12). From the set of atomic connections cs_a an element corresponding to
the current complex connection is randomly selected (line 13). Corresponding
means that the source region of the atomic connection must be located in (or
Fig. 2. For finding a path from region R1 to region R5 the FCA of both nodes is first
determined. This node contains a connectivity graph composed of complex connections
between direct descendants (C1-R3, R3-C2), which is used for finding the abstract
path from C1 to C2 (C1→R3→C2). The actual region crossings obtained from the set
of corresponding atomic connections (R2-R3, R3-R4) form new destinations for the
subsequent recursion steps (first R2, then R4).

equal to) the complex connection’s source region and the same must be true
for the destination region. This atomic connection c_a can be thought of as a
waypoint that the agent has to pass in order to reach the next region, even if
the complete path has not been determined yet. This intermediate goal then
becomes the new destination for the recursion step; the new source is given by
the current region cur (line 14). The result is a partial atomic path which is
concatenated to the path variable p_a along with the atomic connection c_a
(lines 14-15). Afterwards the current region is updated accordingly (line 16).
Finally, the path from the current region to the actual destination d is obtained
recursively and the combined path is returned (line 18).
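The procedure can also be rendered as a small, self-contained Python program for the toy hierarchy of Fig. 2. This transcription is only illustrative and not the evaluated implementation: a breadth-first search stands in for Dijkstra's algorithm because all connections are weighted equally here, and the parent dictionary, the helper names and the tuple encoding of connections are our own choices.

from collections import deque

# toy hierarchy of Fig. 2: atomic regions R1..R5, complex regions C1, C2, root W
parent = {'R1': 'C1', 'R2': 'C1', 'R3': 'W', 'R4': 'C2', 'R5': 'C2',
          'C1': 'W', 'C2': 'W', 'W': None}
atomic_cons = [('R1', 'R2'), ('R2', 'R3'), ('R3', 'R4'), ('R4', 'R5')]

def ancestors(r):
    path = []
    while r is not None:
        path.append(r)
        r = parent[r]
    return path                                   # r, ..., W

def fca(r1, r2):
    common = set(ancestors(r2))
    return next(a for a in ancestors(r1) if a in common)

def child_of(r, f):
    while parent[r] != f:                         # direct descendant of f containing r
        r = parent[r]
    return r

def cons_at(f):
    return [c for c in atomic_cons if fca(*c) == f]          # Eq. (8)

def bfs(edges, start, goal):
    # shortest path (as a node list) in the local complex-connection graph
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    prev, queue = {start: None}, deque([start])
    while queue:
        u = queue.popleft()
        if u == goal:
            break
        for v in adj.get(u, []):
            if v not in prev:
                prev[v] = u
                queue.append(v)
    path = []
    while goal is not None:
        path.append(goal)
        goal = prev[goal]
    return path[::-1]

def select(c1, c2, f, cs_a):
    # pick an atomic connection realising the complex connection c1 -> c2
    for a1, a2 in cs_a:
        if child_of(a1, f) == c1 and child_of(a2, f) == c2:
            return a1, a2
        if child_of(a2, f) == c1 and child_of(a1, f) == c2:
            return a2, a1

def find_way(s, d):
    if d in ancestors(s):                         # s located in (or equal to) d
        return []
    f = fca(s, d)
    cs_a = cons_at(f)
    edges = [(child_of(a1, f), child_of(a2, f)) for a1, a2 in cs_a]    # Eq. (9)
    nodes = bfs(edges, child_of(s, f), child_of(d, f))
    cur, p_a = s, []
    for c1, c2 in zip(nodes, nodes[1:]):
        a1, a2 = select(c1, c2, f, cs_a)
        p_a += find_way(cur, a1) + [(a1, a2)]
        cur = a2
    return p_a + find_way(cur, d)

print(find_way('R1', 'R5'))   # [('R1', 'R2'), ('R2', 'R3'), ('R3', 'R4'), ('R4', 'R5')]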
As mentioned the algorithm can be operated in two modes. In the form in
which it is stated above a complete path containing all atomic steps is con-
structed. If the agent is only interested in generating the next action towards
the goal, the iteration over all complex connections can be omitted. Instead
only the first connection at each hierarchy level is recursively broken down to
the atomic level while the remaining paths are kept abstract. This guarantees
a decrease of hierarchy level by at least one in each recursion because the next
intermediate destination given by c_a is always located in the same complex
region (one level below the current FCA) as the current region.

3.2 Complexity
Here we investigate the time and space complexity of our algorithm. Space com-
plexity is equal to that of a flat connectivity graph apart from a slight overhead.
The number of atomic regions n and atomic connections is exactly the same.
There is an additional number of n/(b − 1) complex regions, assuming the tree
is balanced and that it has a constant branching factor b. The number of com-
plex connections depends on the specific connectivity of an environment but it
is obviously lower than the number of atomic connections, since complex con-
nections are only formed for atomic connections defining a region crossing at the
complex level. The overall space complexity is thus of the same class as that of
a flat connectivity graph.
If the algorithm is operated such that it only determines the next action, it
exhibits a time complexity of O(log n). The construction of an abstract path at a
given level lies in O(1), if the branching factor has an upper bound independent of
n (e.g., b = 4 for quadtrees). The selection of an atomic connection is of constant
time complexity as well because, given a complex connection, it is possible to
immediately obtain a corresponding atomic connection, for instance by using a lookup table.
By expanding only the first element of each abstract path, the current FCA
basically moves from the initial FCA down to the source region which leads to
logb n visited nodes in the worst case. This is because the tree’s depth is assumed
to be logarithmically dependent on n and because each recursion step leads to a
decrease of the current hierarchy level by at least one. Even though the current
subtree has to be redetermined for each new destination the complexity stays
the same. The overall complexity of determining the FCAs for all recursion
steps is O(log n) as well since there are only logarithmically many candidates.
Implementation-wise the FCA search could be accomplished by using a simple
loop apart from the recursion. By combining the number of visited nodes with
the complexity for determining the current subtree additively, the complexity
for obtaining the next action is O(log n).
For planning complete paths the worst-case complexity is O(n log(log n)). In
the unlikely case of having to visit every single atomic region each complex region
becomes the FCA once. In case of a properly shaped hierarchy the number of com-
plex regions is given by n/(b − 1) and therefore lies in O(n). Determining the FCA
of two arbitrary regions lies in O(log(log n)) because it involves comparing two hi-
erarchy paths of logarithmic length. The comparison itself can be performed by
a binary search in O(log(log n)). Again, the construction of an abstract path and the
retrieval of an atomic connection take a constant amount of time, which yields an
overall complexity of O(n log(log n)) for generating complete (atomic) paths.

3.3 Comparison
We compare the properties of our algorithm to the HPA∗ [5] path planning algo-
rithm and the hierarchical D∗ algorithm [6] as well as Dijkstra’s classical shortest
path algorithm [22]. We show that our approach exceeds all of these in terms of
time complexity while at the same time yielding reasonable approximations of
optimal solutions.
When implemented with an efficient distance queue the time complexity of
Dijkstra’s algorithms is O(|E| + n log n) with |E| denoting the number of edges
in a given connectivity graph [23]. In the case of sparse graphs, i.e., graphs for which
n log n is not dominated by the number of edges, the complexity reduces to
O(n log n). This sparseness is generally satisfied in the spatial domain, since
connectivity is limited to the immediate neighborhood, which is certainly true
for regionalized environments. For practical purposes HPA∗ and hierarchical D∗
are both significantly faster than Dijkstra’s algorithm, however their worst-case
complexity poses no improvement over the one of Dijkstra’s algorithm.
In contrast, our algorithm exhibits a complexity of only O(n log(log n)) for
planning complete paths. Furthermore, the number of expanded regions is n/b in-
stead of n in case of Dijkstra’s algorithm because atomic regions behave trivially
as FCAs. The complexity of O(log n) for determining the next action can not be
compared, since Dijkstra’s algorithm has to generate all shortest paths before
being able to state the first step.
Despite being efficient an algorithm obviously also needs to produce useful
paths. Like Dijkstra’s algorithm the hierarchical D∗ algorithm generates optimal
paths, however, it does so at the expense of having to calculate partial solutions
offline, leading to increased storage requirements. HPA∗ on the other hand yields
near-optimal results, if an additional path smoothing is applied, while also using
precomputed partial solutions. Since both approaches make use of the A∗ search
algorithm, they both require the availability of metrical information in order
to obtain an admissible distance heuristic [24], which makes them unsuited for
cases where such knowledge can not be provided.
For the purpose of analyzing our algorithm in terms of path lengths we set up
a simulation in which we tested the proposed algorithm against optimal solutions
obtained via Dijkstra’s shortest path algorithm. Since we did not consider metric
knowledge the length of a path was measured by the number of visited atomic
regions. The environment was a square grid with 64 atomic regions in which
each cell was connected to its four neighbors. We chose a grid-like connectivity
since it works well as a general approximation for many environments and since
it allowed us to avoid using domain-specific clustering criteria. A simple quadtree
was therefore applied to obtain a region hierarchy.
In 1000 iterations we randomly selected two cells as source and destination
regions for the wayfinding task and we compared the path lengths obtained via
our hierarchical algorithm to the ones of Dijkstra’s algorithm. On average the
produced paths contained 20.5% more atomic regions. Although these results
are not sufficient for an in-depth analysis, the example demonstrates that the
resulting paths are not arbitrarily longer than shortest paths. For domains with
more restricted connectivity such as indoor environments we observed better
performance, typically equal to optimal solutions.
The main source of error resulted from the selection of atomic connections
between two regions, because the regions themselves do not offer any information
that would permit the derivation of a useful selection heuristic. The discussion
points to some work that could improve this behavior. The error is considerably
reduced if the hierarchy’s branching factor is increased. In fact there is a direct
Fig. 3. A grid-like environment with source S and destination D and two paths, one
optimal (dashed), the other obtained by the hierarchical algorithm (continuous). The
different gray levels of hierarchy nodes indicate the different search levels: white nodes
are completely disregarded, gray nodes are part of a local search and black nodes are
part of a search and recursively expanded. The gray cells at the bottom visualize the
atomic search space.

trade-off between efficiency and average path length, because higher branching
factors lead to larger local search spaces for which optimal solutions are obtained.
Besides time complexity and path optimality, it is worth taking a look
at the search space of the proposed algorithm. Unlike Dijkstra's algorithm which
blindly expands nodes in all directions, our algorithm limits the set of possible
paths for each recursion by excluding solutions at the parent level. Figure 3 shows
the environment used during the simulation along with two exemplary paths be-
tween two region S and D, one optimal, the other one constructed hierarchically.
On the atomic level only 7/16 of the regions are considered by our approach while
Dijkstra's algorithm visits each region. This ratio decreases further with more
atomic regions and it tends to zero as the number of regions tends to infinity.

4 Discussion

Humans make use of hierarchical representations of regionalized environments, as
has been shown in several experiments [10,11,12,7]. In this paper we proposed a
computational model of such a hierarchical representation and we demonstrated
how it can be used by an agent to solve wayfinding tasks. Region connectivity
is represented at different levels of granularity in the hierarchy and this can
be exploited for a simplification of the search by a decomposition of the global
search space into locally bounded subspaces. This results in an efficient path
planning and in the ability to generate the next action in logarithmic time.
We described the proposed hierarchy formally and we provided several criteria
based on which valid hierarchies can be formed. An essential criterion for the
clustering of regions seems to be connectivity, in particular if one is interested
in basic navigation issues, but connectivity is probably only one criterion among
a larger set of criteria used by humans. We considered a connectivity-based
clustering approach with domain-specific rules. However, this algorithm is still ad
hoc and restricted to office-like domains. Currently there exist no general spatial
cluster algorithms that produce satisfying hierarchies for arbitrary environments
and more work concerning this problem, and on how it is solved by humans, is
obviously needed. As a first approximation and as a basis for comparison with
other approaches we hence resorted to using a quadtree decomposition, which
can be regarded as a prototypical hierarchical representation that satisfies the
required formal criteria.
The proposed wayfinding algorithm that operates on the region hierarchy is
cognitively plausible in that it derives paths by searching at different levels of
detail, which humans seem to do as well. By only obtaining the first action
and leaving the remaining path abstract, an agent can save on computation
and memory resources. Furthermore, planning further ahead than necessary also
bears the risk of rendering solutions obsolete, since actions lying far ahead are
less likely to actually occur. This is especially important for environments that
exhibit dynamic effects, e.g., closing doors, in which case offline algorithms are
forced to perform complete replanning.
Wiener and Mallot introduced the terms ’coarse-to-fine’ and ’fine-to-coarse’
to refer to the commitment of a wayfinding algorithm regarding the level of plan
detail [13]. The mode of generating complete paths can be called ’coarse-to-fine’
because it decomposes complex tasks into smaller subtasks down to the atomic
level. When only obtaining the next action the agent restricts itself to planning
the waypoints that are necessary for moving to the next atomic region which
can be thought of as a least-commitment strategy. This complies with the ’fine-
to-coarse’ scheme, since a detailed plan is produced only for the intermediate
surroundings. The plan resolution decreases monotonically along the path which
we believe is an essential property of human navigation.
Even though the resulting solutions are not guaranteed to be optimal, they
are not significantly worse. When planning complete paths the algorithm ex-
hibits a time complexity of only O(n log(log n)) and is thus more efficient than
Dijkstra’s shortest path algorithm as well as hierarchical approaches like HPA∗
and hierarchical D∗. Unlike most algorithms that do not entirely rely on best-shot
heuristics, our approach can be used to obtain intermediate actions without
planning a complete path or risking movement towards dead ends. Such a retrieval of
the next action is done in O(log n). Aside from time complexity the hierarchical
organization leads to drastically reduced search spaces for most domains.
There are two main sources of error that can cause a suboptimality of com-
puted paths. One is the estimated path length based on regions at higher levels.
This estimate becomes less accurate the more region size varies and the less
convex regions are shaped. However, this problem affects human judgment of
distance as well, as has been shown in [10]. The other type of error results from
the way in which atomic connections between regions are selected. In the current im-
plementation this is done by a first-match mechanism which is problematic for
environments with many region crossings, as in case of the mentioned grid maps.
This problem could be reduced if suitable selection heuristics were available. One
possibility is the use of relative directions for which hierarchical region-based ap-
proaches already exist [25].
In addition to efficient wayfinding the suggested hierarchical region-based rep-
resentation could provide an agent with other useful skills. For example, it en-
ables an abstraction from unnecessary details, and it can also overcome the
problem of asserting a common reference frame. Obviously all this requires that
real environments are actually regionalized in an inherently hierarchical fashion.
We believe that this is indeed the case for most real-world scenarios. Finally, we
expect hierarchical representations to facilitate the communication about spatial
environments between artificial agents and humans [9,26] and spatial problem
solving in general.

Acknowledgements
This work was supported by the DFG (SFB/TR 8 Spatial Cognition, project
A5-[ActionSpace]).

References
1. Thrun, S.: Robotic mapping: A survey (2002)
2. Werner, S., Krieg-Brückner, B., Herrmann, T.: Modelling navigational knowledge
by route graphs. In: Habel, C., Brauer, W., Freksa, C., Wender, K.F. (eds.) Spatial
Cognition II 2000. LNCS (LNAI), vol. 1849, pp. 295–317. Springer, Heidelberg
(2000)
3. Kuipers, B.: The spatial semantic hierarchy. Technical Report AI99-281 (1999)
4. Car, A., Frank, A.: General principles of hierarchical reasoning - the case of
wayfinding. In: SDH 1994, Sixth Int. Symposium on Spatial Data Handling, Edin-
burgh, Scotland (September 1994)
5. Botea, A., Müller, M., Schaeffer, J.: Near optimal hierarchical path-finding. Journal
of Game Development 1(1), 7–28 (2004)
6. Cagigas, D.: Hierarchical D* algorithm with materialization of costs for robot path
planning. Robotics and Autonomous Systems 52(2-3), 190–208 (2005)
7. Graham, S., Joshi, A., Pizlo, Z.: The traveling salesman problem: a hierarchical
model. Memory & Cognition 28(7), 1191–1204 (2000)
8. Pizlo, Z., Stefanov, E., Saalweachter, J., Li, Z., Haxhimusa, Y., Kropatsch, W.:
Traveling salesman problem: a foveating pyramid model. Journal of Problem Solv-
ing 1, 83–101 (2006)
9. Tomko, M., Winter, S.: Recursive construction of granular route directions. Journal
of Spatial Science 51(1), 101–115 (2006)
10. Stevens, A., Coupe, P.: Distortions in judged spatial relations. Cognitive Psychol-
ogy 10, 526–550 (1978)
11. Hirtle, S.C., Jonides, J.: Evidence of hierarchies in cognitive maps. Memory and
Cognition 13(3), 208–217 (1985)
12. McNamara, T.P.: Mental representations of spatial relations. Cognitive Psychol-
ogy 18, 87–121 (1986)
13. Wiener, J., Mallot, H.: ’Fine-to-coarse’ route planning and navigation in regional-
ized environments. Spatial Cognition and Computation 3(4), 331–358 (2003)
14. Wiener, J., Schnee, A., Mallot, H.: Navigation strategies in regionalized environ-
ments. Technical Report 121 (January 2004)
15. Voicu, H.: Hierarchical cognitive maps. Neural Networks 16(5-6), 569–576 (2003)
16. Thomas, R., Donikian, S.: A model of hierarchical cognitive map and human mem-
ory designed for reactive and planned navigation. In: 4th International Space Syn-
tax Symposium, Londres (June 2003)
17. Gadzicki, K., Gerkensmeyer, T., Hünecke, H., Jäger, J., Reineking, T., Schult, N.,
Zhong, Y., et al.: Project MazeXplorer. Technical report, University of Bremen
(2007)
18. Schill, K., Zetzsche, C., Wolter, J.: Hybrid architecture for the sensorimotor rep-
resentation of spatial configurations. Cognitive Processing 7, 90–92 (2006)
19. Schill, K., Umkehrer, E., Beinlich, S., Krieger, G., Zetzsche, C.: Scene analysis with
saccadic eye movements: Top-down and bottom-up modeling. Journal of Electronic
Imaging 10(1), 152–160 (2001)
20. Montello, D.R., Goodchild, M.F., Gottsegen, J., Fohl, P.: Where’s downtown?: Be-
havioral methods for determining referents of vague spatial queries. Spatial Cog-
nition & Computation 3(2-3), 185–204 (2003)
21. Han, J., Kamber, M., Tung, A.K.H.: Spatial clustering methods in data mining. In:
Geographic Data Mining and Knowledge Discovery, pp. 188–217. Taylor & Francis,
Inc., Bristol (2001)
22. Dijkstra, E.W.: A note on two problems in connexion with graphs. Numerische
Mathematik 1(1), 269–271 (1959)
23. Barbehenn, M.: A note on the complexity of dijkstra’s algorithm for graphs with
weighted vertices. IEEE Trans. Comput. 47(2), 263 (1998)
24. Junghanns, A.: Pushing the limits: new developments in single-agent search. PhD
thesis, University of Alberta (1999)
25. Papadias, D., Egenhofer, M.J., Sharma, J.: Hierarchical reasoning about direction
relations. In: GIS 1996: Proceedings of the 4th ACM international workshop on
Advances in geographic information systems, pp. 105–112. ACM, New York (1996)
26. Maaß, W.: From vision to multimodal communication: Incremental route descrip-
tions. Artificial Intelligence Review 8(2), 159–174 (1994)
Analyzing Interactions between Navigation
Strategies Using a Computational Model of
Action Selection

Laurent Dollé1,⋆, Mehdi Khamassi1,2, Benoît Girard2,
Agnès Guillot1, and Ricardo Chavarriaga3
1 ISIR, FRE2507, Université Pierre et Marie Curie - Paris 6, Paris, F-75016, France
2 LPPA, UMR7152 CNRS, Collège de France, Paris, F-75005, France
3 IDIAP Research Institute, Martigny, CH-1920, Switzerland
laurent.dolle@isir.fr

Abstract. For animals as well as for humans, the hypothesis of multiple
memory systems involved in different navigation strategies is supported
by several biological experiments. However, due to technical limitations,
it remains difficult for experimentalists to elucidate how these neural
systems interact. We present how a computational model of selection
between navigation strategies can be used to analyse phenomena that
cannot be directly observed in biological experiments. We reproduce an
experiment where the rat’s behaviour is assumed to be ruled by two
different navigation strategies (a cue-guided and a map-based one). Using
a modelling approach, we can explain the experimental results in terms
of interactions between these systems, either competing or cooperating
at specific moments of the experiment. Modelling such systems can help
biological investigations to explain and predict the animal behaviour.

Keywords: Navigation strategies, Action selection, Modelling, Robotics.

1 Introduction

In natural environments, animals encounter situations where they have to simultaneously
learn various means for reaching interesting locations and to se-
lect dynamically the best to use. Many neurobiological studies in both rodents
and humans have investigated how this selection is performed using experimental
paradigms in which several navigation strategies may be learned in parallel. Some
strategies may be based on a spatial representation (i.e., inferring a goal-directed
action as a function of its location, called map-based strategies), whereas other
strategies can be based on direct sensory-motor associations without requiring a
spatial representation (i.e., map-free) [1,2,3,4]. A number of experimental results
lead to the hypothesis that these strategies are learned by separate memory sys-
tems, with the dorsolateral striatum involved in the acquisition of the map-free
strategies and the hippocampus mediating the map-based strategy [5,6].

⋆ Corresponding author.


However, it is not yet clear whether these learning systems are independent
or whether they interact for action control in a competitive or in a coopera-
tive manner. The competition implies that inactivation of one system enhances
the learning of the remaining functional system, while the cooperation states
that learning in one system would compensate for the limitations of the other one
[7,8,9,10,11]. The present work aims at investigating such interactions using a
computational model of spatial navigation based on the selection between the
map-based and map-free strategies [12]. Besides a qualitative reproduction of
the experimental results obtained in animals, the modelling approach allows
us to further characterize the competitive or cooperative nature of interactions
between the two strategies.
Following our previous modelling efforts [12], we study the interaction between
the navigation strategies in the experimental paradigm proposed by Pearce et
al. (1998) [13]. In this paradigm, which is a modification of the Morris Hidden
Water Maze task [14], two groups of rats (“Control” group of intact animals and
“Hippocampal” group of animals with damaged hippocampus) had to reach a
hidden platform indicated by a landmark located at a fixed distance and ori-
entation from the platform. After four trials, the platform and its associated
landmark were moved to another location and a new session started. The au-
thors observed that both groups of animals were able to learn the location of the
hidden platform, but at the start of each new session the hippocampal animals
were significantly faster in finding the platform than controls. Moreover, only
the control rats were able to decrease their escape latencies within a session.
From these results, authors conclude that rats could simultaneously learn two
navigation strategies. On the one hand, a map-based strategy encodes a spatial
representation of the environment based on visual extra-maze landmarks and
self-movement information. On the other hand, a map-free strategy (called by
the authors “heading vector strategy”) encodes the goal location based on its
proximity and direction with respect to the intra-maze cue [15]. Based on these
conclusions, the decrease in the escape latency within sessions could be explained
by the learning of a spatial representation by intact animals. Furthermore, such
learning also suggests that when the platform is displaced at the start of a new
session, intact rats would swim to the previous (wrong) location of the platform
based on the learned map, whereas hippocampal animals would swim directly
to the correct location.
For the modelling purposes, the results of this experiment can be summarized
as follows: (i) both groups of rats could decrease their escape latencies across
sessions, but only the control rats improved their performance within sessions;
(ii) the improvement in the performance within each session, observed in the
control group, could be attributed to the use of a map-based strategy by these
rats; and (iii) higher performance of hippocampal rats relative to the controls
at the start of each session could be due to the use of the map-free strategy
(the only strategy that could be used by the lesioned animals). In other words,
the process of choosing the best strategy (i.e. the competition) performed by

the control, but not the hippocampal, animals decreased the performance of
controls relative to that of lesioned animals.
We have shown previously that the computational model used in the present
study is able to reproduce the behaviour of rats in the experiment of Pearce et al.
[12]. In the present paper, we extend these results by performing a further anal-
ysis of the interactions between both learning systems at different stages of
the experiment, taking into account the three points formulated above. In the
following section, we describe the model, the simulated environment and the
experimental protocol. Then we present the results and the analyses. Finally, we
discuss the results in terms of interactions between systems.

2 Methods and Simulation


2.1 Navigation Model
The neural network computational model is based on the hypothesis of different,
parallel learning systems exclusively involved in each strategy, and interacting
for behaviour control (Fig. 1). It is composed of two experts, learning separately
a map-based strategy and a map-free one (the experts are denoted MBe and
MFe, respectively), both following reinforcement learning rules to acquire their
policy, i.e., the way the expert chooses an action given the current state in or-
der to maximize the reward. The model provides a mechanism that selects, at
each timestep, which strategy should drive the behaviour of the simulated robot, given each strategy's reliability in finding the goal. This section briefly describes both naviga-
tional experts, their learning process as well as the selection mechanism and the
learning mechanism underlying this selection (for a more detailed description
see [12]).
The map-free strategy is encoded by the MFe, which receives visual signals from sensory cells (SI) consisting of a vector of 36 gray-value inputs (one input for every ten degrees), transducing a 360-degree horizontal 1-D gray-scale image. To simulate the heading-vector strategy proposed by Pearce et al., the landmark is viewed in an allocentric reference frame: for example, when the landmark is located to the North with regard to the robot, it will appear in the same area of the camera image, whatever the orientation of the robot might be.
The map-based strategy is encoded by the MBe that receives information
from a spatial representation encoded in a regular grid of 1600 place cells (PC)
with Gaussian receptive fields of width σ_PC [16] (values of all model parameters
are given in Table 1).

Strategy Learning. Both experts learn the association between their inputs
and the actions leading the robot to the platform, using a direct mapping be-
tween inputs (either SI or PC) and directions of movement (i.e., actions). Move-
ments are encoded by a population of 36 action cells (AC). The policy is learned
by both experts by means of a neural implementation of Q-learning algorithm [17].

[Fig. 1 schematic: sensory inputs and place cells project to the MFe and MBe experts (action cells AC) and to the gating network; the gating values g_MFe, g_MBe and the experts' outputs (Φ_MFe, A_MFe) and (Φ_MBe, A_MBe) feed the selection stage.]

Fig. 1. The computational model of strategy selection [12]. The gating network receives the inputs of both experts, and their reward prediction errors, in order to compute their reliability according to their performance (i.e., gating values g_k). Gating values are then used together with the action values A_k in order to compute the probability of each expert being selected. The direction Φ proposed by the winning expert is then performed. See text for further explanations.


Fig. 2. (a) A simplified view of ad hoc place cells. Each circle represents a place cell and is located at the cell's preferred position (i.e., the place where the cell is most active). Cell activity is color coded from white (inactive cells) to black (highly active cells). (b) The environment used in our simulation (open circles: platform locations, stars: landmarks).

In this algorithm, the value of every state-action pair is learned by updating the
synaptic weight w_ij linking presynaptic input cell j to action cell i:

    Δw_ij = η h_k δ e_ij ,                                            (1)

where η is the learning rate and δ the reward prediction error. The scaling factor h_k ensures that the learning module updates its weights according to its reliability (for all the following equations, k is either the MBe or the MFe). Its

computation is detailed further below. The eligibility trace e allows the expert
to reinforce the state-action couples previously chosen during the trajectory:

    e_ij(t + 1) = r_j^pre r_i + λ e_ij(t) ,                           (2)

where r_j^pre is the activity of the pre-synaptic cell j, λ a decay factor, and r_i the
activity of the action cell i in the generalization phase. Generalization in the
action space is achieved by reinforcing every action weighted by a Gaussian of
standard deviation σ_AC centered on the chosen action. Each expert suggests a
direction of movement Φ_k:

    Φ_k = arctan( Σ_i a_i^k sin(φ_i) / Σ_i a_i^k cos(φ_i) ) ,         (3)

where a_i is the action value of the discrete direction of movement φ_i. The corresponding action value A_k is computed by linear interpolation of the two nearest discrete actions [17].
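As an illustration of this learning scheme, the following sketch implements one expert in Python. It is a minimal reading of Eqs. (1)–(3) under our own assumptions: the class name Expert, the array shapes, and the way the chosen direction and the reliability factor h_k are passed in are ours and are not taken from [12] or [17].

```python
import numpy as np

class Expert:
    """Minimal sketch of one navigation expert (MBe or MFe)."""

    def __init__(self, n_inputs, n_actions=36, eta=0.015, lam=0.76, sigma_ac=22.5):
        self.w = np.zeros((n_actions, n_inputs))    # input -> action-cell weights w_ij
        self.e = np.zeros_like(self.w)               # eligibility traces e_ij
        self.phi = np.arange(n_actions) * (360.0 / n_actions)  # preferred directions (deg)
        self.eta, self.lam, self.sigma_ac = eta, lam, sigma_ac

    def action_values(self, r_in):
        """Action-cell activities a_i for the current input vector r_in."""
        return self.w @ r_in

    def proposed_direction(self, r_in):
        """Population-vector readout of Eq. (3): proposed direction Phi_k in degrees."""
        a = self.action_values(r_in)
        rad = np.deg2rad(self.phi)
        return np.rad2deg(np.arctan2(np.sum(a * np.sin(rad)),
                                     np.sum(a * np.cos(rad)))) % 360.0

    def update(self, r_in, chosen_deg, delta, h_k):
        """Eligibility trace (Eq. 2) and reliability-scaled weight update (Eq. 1)."""
        diff = (self.phi - chosen_deg + 180.0) % 360.0 - 180.0
        r_ac = np.exp(-diff ** 2 / (2.0 * self.sigma_ac ** 2))  # Gaussian generalization over actions
        self.e = self.lam * self.e + np.outer(r_ac, r_in)       # e_ij = r_i r_j^pre + lambda e_ij
        self.w += self.eta * h_k * delta * self.e                # dw_ij = eta h_k delta e_ij
```

The 36 action cells at 10° spacing and the default parameter values mirror Table 1.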

Action Selection. In order to select the direction Φ of the next robot move-
ment, the model uses a gating scheme such that the probability of being selected
depends not only on the Q-values of the actions (A_k), but also on a gating value g_k. Gating values are updated in order to quantify each expert's reliability according to the current inputs. The gating mechanism takes the form of a network linking the inputs
(place cells and sensory inputs) to the gating values g_k, computed as a weighted
sum:

    g_k = z_k^PC r^PC + z_k^SI r^SI ,                                 (4)

where z_k^PC is the synaptic weight linking the PC, with activation r^PC, to the
gate k, idem for z_k^SI. Weights are updated in order to approach
h_k = g_k c_k / Σ_i (g_i c_i), where c_k = exp(−ρ δ_k²) (ρ > 0), according to the
following rule:

    Δz_kj^(PC,SI) = ξ (h_k − g_k) r_j^(PC,SI) .                       (5)

The next action will then be chosen according to a probability of selection P:

    P(Φ = Φ_k) = g_k A_k / Σ_i g_i A_i .                              (6)

If both experts have the same gating value (i.e., reliability), then the expert
with the highest action value will be chosen. In contrast, if both experts have
the same action value, the most reliable expert, i.e., the one with highest gating
value, will be chosen.
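A corresponding sketch of the gating and selection stage is given below, again under our own naming and shape conventions rather than the implementation of [12]; delta is assumed to hold one reward prediction error per expert.

```python
import numpy as np

rng = np.random.default_rng(0)

def gating_values(z_pc, z_si, r_pc, r_si):
    """Eq. (4): one gating value per expert as a weighted sum of the inputs;
    z_pc and z_si have shape (n_experts, n_PC) and (n_experts, n_SI)."""
    return z_pc @ r_pc + z_si @ r_si

def select_expert(g, A, Phi):
    """Eq. (6): pick an expert with probability proportional to g_k * A_k
    and return its index and proposed direction."""
    p = np.maximum(g * A, 1e-12)
    p = p / p.sum()
    k = rng.choice(len(p), p=p)
    return k, Phi[k]

def update_gating(z_pc, z_si, r_pc, r_si, g, delta, rho=1.0, xi=0.01):
    """Eq. (5): move each gate toward its reliability target h_k."""
    c = np.exp(-rho * delta ** 2)                 # reliability signal per expert
    h = g * c / (np.sum(g * c) + 1e-12)
    z_pc += xi * np.outer(h - g, r_pc)            # in-place weight updates
    z_si += xi * np.outer(h - g, r_si)
    return h
```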

2.2 Simulated Environment and Protocol


In our simulation, the environment is a square of size equivalent to 200×200
cm, while the simulated robot’s diameter is 15 cm (Fig. 2b). The landmark,
represented by a star of diameter 10 cm, is always situated at a constant distance

Table 1. Parameters of the model

Parameter   Value     Description
N_PC        1600      Number of place cells
σ_PC        10 cm     Standard deviation of the PC activity profile
N_AC        36        Number of action cells
σ_AC        22.5°     Standard deviation of the enforced activity profile
η           0.015     Learning rate of both experts
λ           0.76      Decay factor of both experts
ξ           0.01      Learning rate of the gating network
ρ           1.0       Decreasing rate in c_k
of 30 cm to the North of the platform, whose diameter is 20 cm. These dimensions have been chosen in order to keep a similar ratio of distances to that in Pearce et al.'s experimental setting (the platform's size has been scaled up, as the original size (10 cm) was too small and did not allow the experts to learn the task). The number of possible locations of the platform has been reduced from eight to four, in order to compensate for the new size of the platform. As in [13], at the
beginning of each trial, the simulated robot is placed at a random position at
least 120 cm from the platform. The robot moving speed is 10 cm per timestep,
meaning that it requires at least 12 timesteps to reach the platform. If it is not
able to reach the platform in 150 timesteps, it is automatically guided to it, as
were the actual rats. A positive reward (R = 1) is provided when the platform
is reached.
We performed three sets of 50 experiments. In the first set, both experts
(MBe and MFe) are functional (Control group); in the second set, only the MFe is activated (Hippocampal group). For the third set of experiments, only the
MBe is activated. This “Striatal group” emulates a striatal lesion not included
in the original experiment.

2.3 Data Analysis


Performances of the different groups were statistically assessed by comparing their mean escape latencies (Wilcoxon signed-rank test for matched-pair samples). Moreover, following Pearce's analysis, we assess learning differences within a session by comparing the performance on the first and fourth trials, using the same test as before. Concerning the differences between the two groups (i.e., between the first trials of the Control and Hippocampal groups, and between their fourth trials), we use a Mann-Whitney test for non-matched samples.
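For illustration only, both tests are available in SciPy; the latency values below are invented placeholders, not data from the experiment.

```python
import numpy as np
from scipy.stats import wilcoxon, mannwhitneyu

# hypothetical escape latencies (timesteps) for six simulated runs
hippo_trial1 = np.array([60., 55., 70., 48., 66., 58.])
hippo_trial4 = np.array([42., 50., 45., 38., 47., 40.])
control_trial1 = np.array([120., 110., 131., 105., 118., 126.])

# within-group, matched-pair comparison (first vs. fourth trial)
_, p_within = wilcoxon(hippo_trial1, hippo_trial4)

# between-group comparison of first trials (non-matched samples)
_, p_between = mannwhitneyu(hippo_trial1, control_trial1)
print(p_within, p_between)
```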
To assess strategy changes during the whole experiment, we compare the experts' selection rates on every first and fourth trial of both early (first three) and late (last three) sessions. The selection rate of each expert is recorded on two squares of 0.4 m², centered on the current and on the previous platform positions, and is computed as the number of times the robot chooses that strategy divided by the total number of times it goes inside each of these regions.

In order to estimate strategy changes within a trial, the selected strategy at each timestep is recorded. Since trajectories have different lengths, they are first normalized into 10 bins, and we then compute the selection rate in each of these bins. The navigational maps of both experts, i.e., the preferred orientation at each location of the environment, are also provided in order to illustrate changes in the experts' learning across trials or sessions.
Finally, we evaluate the influence that the robot's behaviour, when controlled by one expert, has on the learning of the other expert. The averaged heading error across
the sessions is computed for the three groups. This heading error corresponds to
the difference between the actual direction of movement proposed by the expert
and the “ideal” direction, pointing to the current platform location (the heading
error will be zero when the robot points towards the platform; an error of one
means that the robot moves in the opposite direction). This error is computed
in the neighbourhood of the current platform –on a square of 0.4 m2 – in order to
take values from places that are sufficiently explored by the robot. The influence
between experts can be assessed by measuring whether the heading error for one
of the strategies decreases as a result of the execution of the other strategy.
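A small sketch of this heading-error measure is given below, assuming positions in cm and directions in degrees; the 0.4 m² analysis square is approximated by a square of side about 63 cm, and all names are ours.

```python
import numpy as np

def heading_error(robot_xy, proposed_deg, platform_xy):
    """Normalized heading error of an expert's proposed movement direction:
    0 when it points at the platform, 1 when it points the opposite way."""
    ideal = np.degrees(np.arctan2(platform_xy[1] - robot_xy[1],
                                  platform_xy[0] - robot_xy[0]))
    diff = (proposed_deg - ideal + 180.0) % 360.0 - 180.0   # wrap to [-180, 180]
    return abs(diff) / 180.0

def near_platform(robot_xy, platform_xy, half_side=31.6):
    """True when the robot lies inside the ~0.4 m^2 square (side ~63.2 cm)
    centred on the platform; coordinates in cm."""
    return (abs(robot_xy[0] - platform_xy[0]) <= half_side and
            abs(robot_xy[1] - platform_xy[1]) <= half_side)
```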

3 Results
3.1 Learning across and within Sessions
Our model qualitatively reproduces the results obtained in animals (Fig. 3a).
As shown in Fig. 3b, both Control and Hippocampal groups are able to learn
the task, i.e., their escape latencies decrease with training. Moreover, the perfor-
mance of the Control group improves within each session, as there is a significant
decrement of the escape latency between the first and fourth trials (p<0.001).
Finally, as was the case with rats, escape latencies of the Hippocampal group in the first trial are smaller than those of the Control group (p<0.001). Concerning the Striatal group, Fig. 3c shows a significant improvement within sessions for this group, but no learning is achieved across sessions, suggesting a key role of the
MFe in the performance improvement across sessions of the Control group.

3.2 Role of Interactions between the MFe and the MBe in the
Control Group
First trials: Increase of MFe Selection Across Sessions and Competition Between the MFe and the MBe Within Trials
In the first trial of every session, the platform is relocated, so that the strategy learned by the MBe in the previous session is no longer relevant. Accordingly, the selection of the MFe expert near the current platform location increases from the early to the late sessions (p<0.05), strongly suggesting a role of the MFe
in the latency decrease across sessions that occurs in the Control group (Fig. 4a).
Fig. 4a also shows that the MBe is often selected near the previous platform lo-
cation, suggesting the existence of a competition between both experts. MBe

[Fig. 3 plots: escape latency (seconds for the rat data, timesteps for the model) as a function of session, for the first and fourth trials of each group.]

Fig. 3. Mean escape latencies measured during the first and the fourth trial of each
session. (a) Results of the original experiment with rats, reproduced from [13]. (b)
Hippocampal group (MFe only) versus Control group (provided with both a MFe and
a MBe). (c) Striatal group (MBe only) versus Control group. See text for explanations.


Fig. 4. First trials: (a) Selection rates of MBe (empty boxes) and MFe (full boxes) near
the current and the previous platform in early (top) and late sessions (bottom) (b)
Selection rates of MBe and current goal occupation within trial in early (top) and late
(bottom) sessions

preference does not change within a trajectory, and the MBe is on average selected less often than the MFe (Fig. 4b).
The trajectories (Fig. 5a and 5b) confirm the existence of a competition: the
MBe tends to lead the robot to the previous location of the platform – as shown
in the navigational maps of this expert (Fig. 5c and 5d) – whereas the MFe has


Fig. 5. First trials: (a) Trajectory of the robot for the 3rd session (b) Trajectory of
the robot for the 9th session. (c) Navigational map of the MBe for the 3rd session (d)
Navigational map of the MBe for the 9th session (e) Navigational map of the MFe for
the 3rd session (f) Navigational map of the MFe for the 9th session.

learned to orient the robot towards the appropriate direction, i.e., to the South of the landmark (Fig. 5e and 5f). This result is consistent with the explanation provided by Pearce and colleagues and shows that the competition between the MBe and the MFe is mainly responsible for the poor performance of the Control
group in the first trials.

Fourth trials: Cooperation Between the MFe and the MBe Within
Trials. At the end of a session, the platform location remained stable during
four trials, allowing the MBe to learn its location. According to Pearce's hypothesis, the rats' behaviour depends mainly on the map-based strategy (involving
the hippocampus) that has learned the platform location for this session. How-
ever, simulation results show that the Striatal group –controlled by the MBe
only– is outperformed by both the Hippocampal and the Control groups, de-
spite a high improvement within sessions (cf. Fig. 3c). This suggests that the
performance of the Control group on the fourth trials cannot be explained exclu-
sively by the MBe expert. Indeed, although this expert leads the agent towards
the current goal position, it also leads to the previous goal location as illustrated
by its selection rate on both sites (Fig. 6a). In addition, selection rates within a
trajectory show a strategy change from the MFe –which is preferred at the be-
ginning of a trial– towards a preference for the MBe at the end of the trajectory
(Fig. 6b).


Fig. 6. Fourth trials: (a) Selection rates of MBe (empty boxes) and MFe (full boxes)
near the current and the previous platform in early (top) and late (bottom) sessions.
(b) Selection rates of MBe and current goal occupation within trial in early (top) and
late (bottom) sessions.

This sequence is visible in typical trajectories (Fig. 7a and 7b). The naviga-
tional maps of each expert reveal that the MFe orients the robot towards the
South of the landmark (Fig. 7e and 7f), whereas the MBe leads it to the precise location of the platform, but only when the robot is in its vicinity (Fig. 7c and 7d).
This suggests that the experts are cooperating, both adequately participating in the resolution of the task depending on their reliability at a specific
point of the journey. Our findings –pointing out a cooperative interaction at the
end of each session– extend Pearce’s hypothesis of MBe dominance in behaviour
control.

3.3 Interactions between MFe and MBe


In simulations of both the Hippocampal and Striatal groups, the inactivation of one expert only prevented it from controlling the robot's behaviour, but not from learning.
We can thus analyze how the interactions influence the learning of each strategy.
First, looking at the accuracy of both experts in the neighbourhood of the
current platform (Fig. 8), we observe that when the robot behavior is driven by
the MBe (i.e. Striatal group), the performance of the MFe decreases (Fig. 8c).
Second, we observe that MBe performs better in the Control group (Fig. 8a) than
in Striatal and Hippocampal groups (Fig. 8b and c), presumably because of the
influence of the efficient learning of the MFe (i.e., cooperative interactions).
The navigational maps of MFe are similar –i.e., pointing to the South of the
landmark– for the Control, Striatal and Hippocampal groups, despite the differ-
ence of performance observed above (Fig. 9c, d and 7f). In contrast, those of the
MBe are different: in the Striatal group (Fig. 9a), the MBe is less attracted by the
previous platform location than in the Control group (Fig. 7d), whereas it is at-
tracted by the four possible locations in the Hippocampal group (Fig. 9b). The


Fig. 7. Fourth trials: (a) Trajectory of the robot for the 3rd session (b) Trajectory of
the robot for the 11th session. (c) Navigational map of the MBe for the 3rd session (d)
Navigational map of the MBe for the 11th session (e) Navigational map of the MFe for
the 3rd session (f) Navigational map of the MFe for the 11th session.


Fig. 8. Average heading error near the current platform for the three groups. Zero
means the expert is pointing to the platform, one means a difference of π. (a) Results
in the Control group (MBe and MFe activated) (b) Hippocampal group (MFe only) (c)
Striatal group (MBe only).

MBe is able to reach every possible platform location, but only when it is in
its vicinity. This suggests that a cooperation between the MFe –leading the robot
to the neighbourhood of the current platform– and the MBe –finding the precise
location once the robot is there– would perform well and enhance the performance
of the robot. Therefore, this particular configuration of the MBe is impaired in the


Fig. 9. (a) Navigational map of the MBe in the Striatal group in the last session (fourth trial): there are no centers of attraction at platform locations other than the current and the previous ones.
(b) Navigational map of the MBe in the Hippocampal group in the last session (fourth trial): the MBe has learned the ways to go to the four possible locations of the platform.
(c) Navigational map of the MFe in the Striatal group in the last session (fourth trial): it has learned the same kind of map as in the Hippocampal and the Control groups.
(d) Navigational map of the MFe in the Hippocampal group in the last session (fourth trial): the learned policy is very close to the one in the Striatal group.

case where the MBe should perform the trajectory alone, but enhanced in the case
of a cooperation with the MFe.
We observe that the behavior of the robot when controlled by the MFe strongly influences the MBe. In contrast, the MBe-based behavior has less influence on the improvement of the MFe strategy. Remarkably, activation of both experts (i.e., Control group) does not impair the simultaneous learning of both strategies and allows the MBe to achieve better performance than when this expert is the only one available.

4 Discussion

4.1 Competition and Cooperation

We have been able to reproduce the behaviour of rats in an experiment designed to study interactions between different spatial learning systems. Our simulation results are consistent with the original hypothesis of a competitive interaction between map-based (MB) and map-free (MF) strategies at the start of a session, when the location of the hidden cue-marked platform suddenly changes [13]. In addition, our model suggests a cooperative interaction during the learning of the current location within a session. In these trials, the MF strategy is preferred at the beginning of the journey, when the local cue gives information about the general direction to follow; as the robot gets closer to the goal, the MB strategy provides more accurate information about the real platform location and is chosen more often.

Other experimental studies have reported strategy changes during a journey depending on the available information and the animal's previous experience [18,19]. Hamilton et al. [19] reported a change from a map-based to a taxon strategy when rats were looking for a visible, stable platform. In contrast to Pearce et al.'s setting, there are no intra-maze cues, and the authors report that rats first used distal landmarks to find a general direction and then approached the platform using a taxon strategy. Both our results and those of Hamilton follow the same rationale, i.e., rats first choose a general direction of movement and then choose the strategy that allows them to accurately locate the platform. In that study, the rats' head scanning was analyzed in order to estimate the strategy changes. The same approach could be applied to the animal trajectories in Pearce's paradigm in order to identify whether the strategy change predicted by our model is confirmed by the rats' behaviour.

4.2 Synergistic Interactions and Dependence of an Expert on Another

Changes in the heading error, assessed by the evolution of the error in the different experimental groups, suggest synergistic interactions between the two experts. The MFe orients the robot towards the landmark, and the MBe helps the robot to find the platform in the vicinity of the landmark. If we define an expert as dependent on another when it cannot achieve the task alone, we conclude that the MBe is dependent on the MFe, as the MBe alone does not learn the task across sessions. It should be noticed that an opposite relationship –i.e., the MFe depending on the MBe– has been reported in different experimental conditions (see [11] for a review).

4.3 Further Work


Despite qualitatively reproducing most of the results reported by Pearce et al. [13], our model differs from the animal results in that a performance improvement was observed within sessions in the Hippocampal group. This difference seems to be mainly due to the learning process of the MFe in cases where, in the previous session, the robot could reach the platform only by following the landmark (for example, if the platform is at the North, as illustrated in Fig. 10). This impairment can also explain the absence of convergence of the two groups in the last session.
In contrast to Pearce's results, no significant difference is found between the fourth trials of the Control and Hippocampal groups. We impute this to the stochastic selection process –i.e., the probabilities associated with a strategy (see section 2.1)– which is sometimes sub-optimal. More generally, our results might be improved by the use of a dynamically updated hippocampal map, as well as the use of explicit extra-maze cues on which –according to the authors– both strategies were anchored. In our simulation, these cues were only represented by an absolute reference frame for the MFe and an ad hoc cognitive map for the MBe. Finally, models of map-based strategy other than place-response associations can be


Fig. 10. (a) Trajectory at the fourth trial of the 7th session: as the simulated robot mainly reached this platform from the South, directions to the North were reinforced, even to the North of the platform.
(b) Trajectory at the first trial of the 8th session: starting from the North, the robot then needs a longer trial to readjust its direction towards the current platform.
(c) Navigational map of the MFe at the fourth trial of the 7th session: directions to the North were reinforced, even to the North of the platform.
(d) Navigational map of the MFe at the first trial of the 8th session.

taken into account. The place-response strategy currently used in the model
associates locations to actions that lead to a single goal location. Therefore,
when the platform is relocated, the strategy has to be relearned. An alternative map-based strategy could be proposed, such that the relations between different locations are learned irrespective of the goal location (e.g., a topographical map of the environment). Planning strategies can then be used to find the new goal location without relearning [3]. The use of computational models of planning (e.g., [20,21]) as a map-based strategy in our model can yield further insights into the use of spatial information in these types of tasks.

5 Conclusion
What stands out from our results is that our model allowed us to analyze the selection changes between the two learning systems, while providing information that is not directly accessible in experiments with animals (e.g., strategy selection rate, expert reliability). This information can be used to elaborate predictions and to propose new experiments towards the two-fold goal of further improving our models and expanding our knowledge of animal behaviour. It also showed that opposite interactions can happen within a single experiment, and depend mainly on contextual contingencies and practice, as has been suggested by recent works (e.g., [22,23]).
Coexistence of several spatial learning systems allows animals to dynamically select which navigation strategy is the most appropriate to achieve their behavioural goals. Furthermore, interaction among these systems may improve performance, either by speeding up learning through the collaboration of different strategies, or through competitive processes that prevent sub-optimal strategies from being applied. Besides, a better understanding of these interactions in animals by use of the modelling approach described in this paper also contributes to the improvement of autonomous robot navigation systems. Indeed, several bio-inspired studies have begun exploring the robotic use of multiple navigation strategies [12,24,25,26]; the topic, however, is far from being fully explored yet.

Acknowledgment

This research was funded by the EC Integrated Project ICEA (Integrating Cognition, Emotion and Autonomy). The authors wish to thank Angelo Arleo, Karim Benchenane, Jean-Arcady Meyer and Denis Sheynikhovich for useful discussions.

References

1. Trullier, O., Wiener, S.I., Berthoz, A., Meyer, J.A.: Biologically-based artificial
navigation systems: review and prospects. Progress in Neurobiology 83(3), 271–
285 (1997)
2. Filliat, D., Meyer, J.A.: Map-based navigation in mobile robots - i. a review of
localisation strategies. Journal of Cognitive Systems Research 4(4), 243–282 (2003)
3. Meyer, J.A., Filliat, D.: Map-based navigation in mobile robots - ii. a review of map-
learning and path-planning strategies. Journal of Cognitive Systems Research 4(4),
283–317 (2003)
4. Arleo, A., Rondi-Reig, L.: Multimodal sensory integration and concurrent navi-
gation strategies for spatial cognition in real and artificial organisms. Journal of
Integrative Neuroscience 6, 327–366 (2007)
5. Packard, M., McGaugh, J.: Double dissociation of fornix and caudate nucleus le-
sions on acquisition of two water maze tasks: Further evidence for multiple memory
systems. Behavioral Neuroscience 106(3), 439–446 (1992)
6. White, N., McDonald, R.: Multiple parallel memory systems in the brain of the
rat. Neurobiology of Learning and Memory 77, 125–184 (2002)
7. Kim, J., Baxter, M.: Multiple brain-memory systems: The whole does not equal
the sum of its parts. Trends in Neurosciences 24(6), 324–330 (2001)
8. Poldrack, R., Packard, M.: Competition among multiple memory systems: Con-
verging evidence from animal and human brain studies. Neuropsychologia 41(3),
245–251 (2003)

9. McIntyre, C., Marriott, L., Gold, P.: Patterns of brain acetylcholine release pre-
dict individual differences in preferred learning strategies in rats. Neurobiology of
Learning and Memory 79(2), 177–183 (2003)
10. McDonald, R., Devan, B., Hong, N.: Multiple memory systems: The power of
interactions. Neurobiology of Learning and Memory 82(3), 333–346 (2004)
11. Hartley, T., Burgess, N.: Complementary memory systems: Competition, cooper-
ation and compensation. Trends in Neurosciences 28(4), 169–170 (2005)
12. Chavarriaga, R., Strosslin, T., Sheynikhovich, D., Gerstner, W.: A computational
model of parallel navigation systems in rodents. Neuroinformatics 3(3), 223–242
(2005)
13. Pearce, J., Roberts, A., Good, M.: Hippocampal lesions disrupt navigation based
on cognitive maps but not heading vectors. Nature 396(6706), 75–77 (1998)
14. Morris, R.: Spatial localisation does not require the presence of local cues. Learning
and Motivation 12, 239–260 (1981)
15. Doeller, C.F., King, J.A., Burgess, N.: Parallel striatal and hippocampal systems
for landmarks and boundaries in spatial memory. Proceedings of the National
Academy of Sciences of the United States of America 105(15), 5915–5920 (2008)
16. Arleo, A., Gerstner, W.: Spatial cognition and neuro-mimetic navigation: A model
of hippocampal place cell activity. Biological Cybernetics 83(3), 287–299 (2000)
17. Strösslin, T., Sheynikhovich, D., Chavarriaga, R., Gerstner, W.: Robust self-
localisation and navigation based on hippocampal place cells. Neural Net-
works 18(9), 1125–1140 (2005)
18. Devan, B., White, N.: Parallel information processing in the dorsal striatum: Re-
lation to hippocampal function. Neural Computation 19(7), 2789–2798 (1999)
19. Hamilton, D., Rosenfelt, C., Whishaw, I.: Sequential control of navigation by locale
and taxon cues in the morris water task. Behavioural Brain Research 154(2), 385–
397 (2004)
20. Martinet, L.E., Passot, J.B., Fouque, B., Meyer, J.A., Arleo, A.: Map-based spatial
navigation: A cortical column model for action planning. In: Spatial Cognition (in
press, 2008)
21. Filliat, D., Meyer, J.: Global localization and topological map-learning for robot navigation. In: Proceedings of the Seventh International Conference on Simulation of Adaptive Behavior (From Animals to Animats 7), pp. 131–140 (2002)
22. Pych, J., Chang, Q., Colon-Rivera, C., Haag, R., Gold, P.: Acetylcholine release
in the hippocampus and striatum during place and response training. Learning &
Memory 12(6), 564–572 (2005)
23. Martel, G., Blanchard, J., Mons, N., Gastambide, F., Micheau, J., Guillou, J.: Dy-
namic interplays between memory systems depend on practice: The hippocampus
is not always the first to provide solution. Neuroscience 150(4), 743–753 (2007)
24. Meyer, J., Guillot, A., Girard, B., Khamassi, M., Pirim, P., Berthoz, A.: The
Psikharpax project: Towards building an artificial rat. Robotics and Autonomous
Systems 50(4), 211–223 (2005)
25. Guazzelli, A., Corbacho, F.J., Bota, M., Arbib, M.A.: Affordances, motivations, and the world graph theory, pp. 435–471. MIT Press, Cambridge (1998)
26. Girard, B., Filliat, D., Meyer, J.A., Berthoz, A., Guillot, A.: Integration of nav-
igation and action selection in a computational model of cortico-basal ganglia-
thalamo-cortical loops. Adaptive Behavior 13(2), 115–130 (2005)
A Minimalistic Model of Visually Guided
Obstacle Avoidance and Path Selection Behavior

Lorenz Gerstmayr (1,2), Hanspeter A. Mallot (1), and Jan M. Wiener (1,3)

1 Cognitive Neuroscience, University of Tübingen, Auf der Morgenstelle 28,
  D-72076 Tübingen, Germany
2 Computer Engineering Group, University of Bielefeld, Universitätsstr. 25,
  D-33615 Bielefeld, Germany
3 Centre for Cognitive Science, University of Freiburg, Friedrichstr. 50,
  D-79098 Freiburg, Germany

Abstract. In this study we present an empirical experiment investigating obstacle avoidance and path selection behavior in rats and a number of visually guided models that could account for the empirical data. In
the experiment, the animals were repeatedly released into an open arena
containing several obstacles and a single feeder that was marked by a
large visual landmark. We recorded and analyzed the animals’ trajecto-
ries as they approached the feeder. We found that the animals adapted
their paths according to the specific obstacle configurations not only to
avoid the obstacles that were blocking the direct path, but also to select
optimal or near-optimal trajectories. On the basis of these results, we then
develop and present a series of minimalistic models of obstacle avoidance
and path selection behavior that are based purely on visual input. In con-
trast to standard approaches to obstacle avoidance and path planning,
our models do not require a map-like representation of space.

Keywords: Spatial cognition, obstacle avoidance, path selection, biologically inspired model.

1 Introduction
Selecting a path to approach a goal while avoiding obstacles is a fundamental spa-
tial behavior. Surprisingly, few studies have investigated the underlying mechanisms and strategies in animals or humans (but see [1,2]). In the robotics community,
in contrast, obstacle avoidance and path selection is a vivid field of research and
several models have been developed (for an overview see [3,4]). These models
usually require rich spatial information: for example, the distances and direc-
tions to the goal and the obstacles have to be known and often a 2d map of
the environment has to be generated to select a trajectory to the goal. We believe that in many situations successful navigation behavior can also be achieved
using very sparse spatial information directly obtained from vision without map-
like representations of space. In this article, we present a series of minimalistic
visually guided models that closely predict empirical results on path selection
and obstacle avoidance behavior in rats.


Obstacle avoidance methods which are related to the models proposed in the
following can be divided into two main categories: the first group models the goal as an attractor whereas each obstacle is modelled as a repellor. Thus, the position of
each obstacle has to be known and the model’s complexity depends on the number
of obstacles. This group of methods is influenced by potential field methods [3,4]
which treat the robot as a particle moving in a vector field. The combination of
attractive and repulsive forces can be used to guide the agent towards the goal
while avoiding obstacles. Potential fields suffer from several limitations: the agent can
get trapped in local minima, lateral obstacles can have a large influence on the
agent’s path towards the goal, and the approach predicts oscillating trajectories
in narrow passages [5]. Several improvements of the original method have been
proposed to overcome these drawbacks [6]. Potential fields have also been used to
model prey-approaching behavior in toads [7].
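As a rough illustration of this family of methods (a generic textbook-style sketch, not the cited implementations), one step of an attractor/repellor scheme could look as follows; all parameter names and values are arbitrary.

```python
import numpy as np

def potential_field_step(agent, goal, obstacles, k_att=1.0, k_rep=0.5, d0=30.0):
    """One step of a basic attractor/repellor scheme: the goal attracts,
    each obstacle inside the influence radius d0 repels."""
    force = k_att * (goal - agent)                       # attraction towards the goal
    for obs in obstacles:
        diff = agent - obs
        d = np.linalg.norm(diff)
        if 0 < d < d0:
            force += k_rep * (1.0 / d - 1.0 / d0) * diff / d ** 2  # repulsion, stronger when close
        # obstacles farther away than d0 exert no force
    return force / (np.linalg.norm(force) + 1e-9)        # unit movement direction

agent = np.array([0.0, 0.0])
goal = np.array([100.0, 100.0])
obstacles = [np.array([50.0, 55.0]), np.array([60.0, 40.0])]
print(potential_field_step(agent, goal, obstacles))
```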
The task of goal approaching and obstacle avoidance can also be formulated as
dynamical system [8]. The movement decision is generated by solving a system
of differential equations. Again, the goal is represented as an attractor whereas each obstacle is modelled as a repellor. The model is used to explain data obtained for
human path selection experiments [2]. In this model, route selection emerges
from on-line steering rather than from explicit path planning. In comparison
to the potential field method, the dynamical approach predicts smoother paths
and does not get trapped in local minima. A further extension of the model was
tested in real robot experiments [6].
The second class of obstacle avoidance methods relies only on distance information at the agent's current position and does not assume that the exact position of each obstacle is known. The family of vector-field histogram (VFH) methods
[9,10,11] uses an occupancy grid as representation of the agent’s environment.
In a first processing step, obstacle information is condensed to a 1d polar his-
togram. In this representation, candidate corridors are identified and the corridor
which is closest to the goal direction is selected. The VFH+ method additionally
considers the robot’s dynamics: corridors which cannot be reached due to the
robot’s movement constraints are rejected [10]. The VFH* method incorporates
a look-ahead verification based on a map of the agent’s environment to prevent
trap situations due to dead ends [11]. When determining the movement decision, it takes the consequences of possible movement decisions into account.
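The corridor-selection idea behind these methods can be illustrated with a strongly reduced sketch (the real VFH variants additionally smooth the histogram, merge free sectors into corridors, and account for kinematics); the sector count and threshold below are arbitrary.

```python
import numpy as np

def vfh_like_direction(polar_density, goal_sector, threshold=0.3):
    """Reduced sketch of corridor selection: sectors whose obstacle density is
    below the threshold are candidate corridors, and the free sector closest
    to the goal direction wins. Returns None if no free sector exists."""
    n = len(polar_density)
    free = np.flatnonzero(polar_density < threshold)
    if free.size == 0:
        return None
    # circular angular distance (in sectors) between free sectors and the goal
    dist = np.minimum((free - goal_sector) % n, (goal_sector - free) % n)
    return int(free[np.argmin(dist)])

density = np.array([0.9, 0.8, 0.1, 0.0, 0.2, 0.7, 0.9, 0.1])  # 8 sectors of 45 deg
print(vfh_like_direction(density, goal_sector=6))
```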
In the following, we present an exploratory empirical study investigating ob-
stacle avoidance and path selection behavior in rats (Sec. 2). We then (Sec. 3)
present a series of minimalistic visually guided models of obstacle avoidance and
path selection that could account for the empirical data. In contrast to the mod-
els introduced above, the proposed models (1) act purely on visual input, (2) do
not require map-like representations of space, and (3) the most basic 1d model
does not require any distance information to generate obstacle avoidance and
path selection behavior. Based on our findings, we finally conclude (Sec. 4) that
reliable obstacle avoidance is possible with such minimalistic models and that
our models implicitly solve the path-planning problem.

2 Part 1: Behavioral Study

In this part we present an exploratory study, examining path selection and ob-
stacle avoidance behavior in rats. For this, animals were trained to receive food
reward at a landmark visible in the entire experimental arena. Rats were re-
peatedly released into this arena and their paths approaching this landmark were recorded. Placing obstacles between the start position and the landmark
allowed us to systematically investigate how rats reacted to the obstacles during
target approach. The behavioral results from this experiment are the basis for
the development of a series of visually guided models of obstacle avoidance and
path selection behavior in the second part of this manuscript.

2.1 Material and Methods


Animals. Six Long Evans rats (rattus norvegicus), approximately 7 weeks old at
the beginning of the study, weighing between 150 and 200 g, participated in the
study. They were housed individually under constant temperature and humidity.

Fig. 1. Experimental setup

Apparatus. The apparatus consisted of an open area (140 × 140 cm), separated
from the remaining laboratory by white barriers (height 40 cm) and surrounded
by a black curtain (see Fig. 1). Within this area up to 6 obstacles (brown 0.5 l
bottles) were distributed. Food was available from a small feeder that was placed
directly under a black-white striped cylinder (25 cm diameter, 80 cm in height).
The cylinder was suspended from the ceiling about 40 cm above the ground and
was visible in the entire arena. A transparent start box was placed in one of
the corners of the arena. At the beginning of each trial, rats were released by
opening the door of the start box. Their trajectories were recorded by a tracking
system registering the position of a small reflector foil that was attached to a
soft leather harness the animals were wearing (sampling rate: 50 Hz).

Procedure. Prior to the experiments, the animals were familiarized with the
experimental setup by repeatedly placing them in the arena, allowing them to
explore it for 180 sec. The obstacles, feeder and landmark, and the start box were
randomly repositioned within the arena on a daily basis. After familiarization,
the animals were trained to receive cocoa flavored cereal (Kellogg’s) that was
located in the feeder under the landmark. For each trial, the feeder (incl. land-
mark) as well as the obstacles were randomly distributed in the arena. Before
each trial, the rats were placed in the start box and released after 15 seconds.
A trial lasted until food reward was found or until 600 seconds passed. Animals
were given 4 training-trials each day for a period of 10 days. In addition to the
food reward during the experiments, the rats received 15g rodent food pellets
per day.
The procedure during the test phase was identical to the training phase, but
animals received 8 trials per day for a given test-configuration of obstacles and
feeder. For each trial, rats’ trajectories were recorded until food reward was ob-
tained. Each day, the release box was randomly repositioned in one of the corners
of the arena. The positions of feeder and obstacles were adjusted accordingly.
Each rat was tested in each of the 20 test-configurations (see Fig(s). 2, 3, and 7).
The rats were subdivided into 2 groups with 3 animals each that were exposed
to the test-configuration in a different order.

Analysis. For each configuration, we evaluated the percentage of trials in which a single animal passed the obstacles on the left or on the right side. For configurations in which the obstacles created a gap (see Fig. 3) we also calculated the percentage of trials in which animals passed through that gap.
Altogether, 67 trials (6.98 %) were removed from the final data set (893 trials
remaining). This was due to the following reasons: (1) In 8 trials the tracking
system failed; (2) In the remaining 59 trials the rats either left the open area
by running towards and touching the surrounding walls, or they turned more
than 180°, thus running in a loop. In all these cases, the animals did not behave in a goal-directed manner (i.e., approach the feeder position).

2.2 Results

Fig(s). 2, 3, and 7 display the entire set of trajectories for all configurations as
well as the relative frequencies for passing the obstacles on the left side or the
right side or passing through the gap. It is apparent from these figures that the
rats adapted their path selection behavior according to the specific configuration.
In the following, we present a detailed analysis of how the configurations influ-
enced path selection behavior. Specific interest concerns the questions whether
animals minimized path length, how animals reacted to gaps of different sizes,
and whether animals built up individual preferences to pass obstacle configura-
tions to the left or right side. Finally, we extract motion parameters from the
recorded trajectories that will be used in the second part in which we present a
visually guided model of path selection and obstacle avoidance behavior.

[Configurations 1–8]

Fig. 2. Asymmetric configurations: rats’ chosen trajectories are displayed in the upper
row, the predictions of the 1d model (see Sec. 3) are displayed below. The black and gray horizontal bars depict the animals' (upper) or the model's (lower) behavior with respect to passing the obstacles on the left (black) or the right (light gray) side.

Distance minimization. Fig. 2 displays the asymmetric configurations. Passing the obstacles on the right and on the left side resulted in paths of unequal length.
For these configurations we evaluated whether animals showed a preference for
the shorter alternative. Animals preferred the shorter over the longer alternative
in 76.53 % of the runs (t-test against chance level (50 %), t(5)=8.09, p<0.001).

Gap size. In configurations 13 to 16 the obstacles were arranged such that they
created a gap (see Fig. 3). The width of the gap was either 32 cm (configurations
13 and 14) or 14 cm (configurations 15 and 16). Rats’ behavior in choosing the
path through the gap depended on the width of the gap. In configurations 13
and 14 (wide gap) they ran through the gap in 83.76 % of the runs as compared
to 36.20 % of the runs for the configurations 15 and 16 (narrow gap; t-test:
t(5)=3.00, p=0.03).

Symmetric configurations. In the symmetric configurations (see Fig. 3) passing the obstacle on the right side and on the left side resulted in paths of equal

[Configurations 9–16]

Fig. 3. Symmetric configurations: rats’ chosen trajectories are displayed in the upper
row, the predictions of the 1d model (see Sec. 3) are displayed below. The gray shaded horizontal bars depict the animals' (upper) or the model's (lower) behavior with respect to passing the obstacles on the left (black) or the right (light gray) side, or passing through the gap (middle gray).

length. The symmetric configurations allowed us to investigate whether rats developed individual preferences to pass the obstacles on the left side or the right
side. Three rats displayed an overall tendency to pass obstacles on the left side,
the remaining three rats displayed a tendency to pass obstacles on the right side.
While the individual preferences are moderate at the beginning of the experi-
ment, they strongly increase over time (i.e. over experimental sessions, r=0.89,
p<0.01).

Locomotor behavior. To adjust the movement parameters of our visually guided models of obstacle avoidance and path selection behavior and to compare their
behavior to the empirical data presented above, it was necessary to estimate the
rats’ navigation velocity and turning behavior from the recorded trajectories.
To do so, we hand-selected 6 runs for each rat (36 runs in total) for which the
tracking data were precise and error-free. We then calculated the distance covered
by the rat between 2 successive samples (tracker frequency: 50 Hz). Furthermore,


Fig. 4. Left: histogram of rats’ distance covered between 2 subsequent samples; right:
histogram of change in orientation between 2 subsequent samples. The vertical gray
lines mark the mean distance and the mean orientation change, respectively.

we calculated the rats' change in orientation (turning rate) between successive samples. Fig. 4 displays the results of this analysis. The average distance covered was 2.1 cm per timestep, which corresponds to a velocity of 105 cm/sec; the average turning rate was ±9.25° per timestep.
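Both motion parameters can be extracted from a tracked trajectory with a few lines; a sketch assuming an (N, 2) NumPy array of positions in cm sampled at 50 Hz (all names are ours).

```python
import numpy as np

def step_stats(xy, rate_hz=50.0):
    """Per-sample distance (cm) and absolute change of orientation (deg),
    plus the mean velocity in cm/s, from tracked positions xy of shape (N, 2)."""
    steps = np.diff(xy, axis=0)
    dist = np.linalg.norm(steps, axis=1)                # cm per sample
    heading = np.degrees(np.arctan2(steps[:, 1], steps[:, 0]))
    turn = (np.diff(heading) + 180.0) % 360.0 - 180.0   # wrapped to [-180, 180]
    return dist, np.abs(turn), dist.mean() * rate_hz
```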

2.3 Discussion
In this part of the work we presented an exploratory study examining rats' path selection and obstacle avoidance behavior. Animals were released from a start box into an open arena with a number of obstacles and a feeder marked by a large landmark. Rats avoided the obstacles and approached the feeder quickly and
efficiently. In fact, in over 75 % of the trials in which path alternatives differed
in length, the animals showed a preference for the shorter alternative. These
empirical results demonstrate that the animals reacted to the specific target
configurations. The fact that animals minimized path length is remarkable to some extent, as the additional energy expenditure of taking detours or sub-optimal paths is estimated to be rather small in this scenario. Nevertheless, the
animals did not adopt a general strategy that could account for the entire set
of configurations, such as moving towards and along the walls, but they decided
on the exact path on a trial by trial basis. It has to be noted, however, that for
the symmetric configurations (see Fig. 3) rats built up rather strong individual
preferences to pass obstacles on the right or the left side. Such preferences can
be explained by motor programs. Rats are well-known to develop stereotyped
locomotory behavior when repeatedly exposed to the same situation (for an
overview see [12]), such as being released from the start box. In other words, the animals made movement decisions already in the start box that were independent
of the specific obstacle configuration. However, at some point on their trajectory,
animals reacted to the configuration. Otherwise no variance in behavior would
have been observed.

3 Part 2: A Visually Guided Model of Obstacle Avoidance and Path Selection

In this part of our paper, we present a series of visually guided models for obsta-
cle avoidance and path selection behavior which are inspired by the experiments
presented above. The proposed algorithms were designed to be both minimal-
istic and biologically plausible models for the rats’ behavior. The models are
purely reactive, do not build or iteratively update a map-like representation of
the environment, and make only use of visual information. For our models, the
position of each obstacle with respect to the agent's position need not be known. Their complexity solely depends on the size of the visual input. By such
a bottom-up approach we hope to find out which kind of information is relevant
for the rat.

Visual input. As input, our models use a panoramic image with a horizontal field
of view of 360◦. The vertical field of view covers 90◦ below the horizon. The rats’
visual field above the horizon is neglected as it does — at least for our setup —
not contain information necessary for obstacle avoidance. The angular resolution
of the images is 1◦ per pixel. Images are generated by a simple raycaster assuming
that the rats’ eye level is 5 cm above the ground plane.
The process of image generation is sketched in Fig. 5. For each direction of
sight, the raycaster computes the distance from the current robot position to the
corresponding obstacle (modeled as cylinders, black pixels) or the walls of the
arena (white pixels). Since an object's distance to the agent is directly linked to the elevation under which it is imaged, a 2d view of the environment can be computed (Fig. 5, middle). Close-by objects are imaged both larger and under
a larger elevation than distant objects. Based on the 2d images, 1d images can
be obtained by taking slices of constant elevation below the horizon (gray horizontal lines in Fig. 5) out of the 2d image. Depending on the elevation of the slice,
the resulting 1d view only contains obstacle information up to a certain distance
(Fig. 5, right). In case the slice is taken along the horizon (top right), also objects
at a very large distance are imaged. For this case, no depth cue is available
because we do not analyze the angular extend of the obstacles and we do not
compute optical flow between two consecutive input images.
Since the input images only contain visual information about the obstacles, the
goal direction (Fig. 5, vertical dark gray line) w.r.t. the agent’s current heading
direction (vertical light gray line) is provided as another input parameter.
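A possible sketch of this image-generation step, restricted to the 1d case, is shown below; it is our own simplified ray/cylinder intersection, not the authors' raycaster, it ignores the arena walls, and the obstacle radius is a free parameter.

```python
import numpy as np

def one_d_view(agent_xy, obstacles, radius=5.0, eye_height=5.0,
               elevation_deg=0.0, max_dist=1e6, n_dirs=360):
    """Binary 1d panorama (1 = obstacle) with one pixel per degree.
    An obstacle standing on the ground at distance d is visible in a slice
    taken elevation_deg below the horizon only if d <= eye_height / tan(elev)."""
    agent = np.asarray(agent_xy, dtype=float)
    view = np.zeros(n_dirs, dtype=int)
    d_cut = max_dist if elevation_deg <= 0 else eye_height / np.tan(np.radians(elevation_deg))
    for i in range(n_dirs):
        ang = np.radians(i)
        u = np.array([np.cos(ang), np.sin(ang)])     # ray direction
        for obs in obstacles:
            rel = np.asarray(obs, dtype=float) - agent
            t = rel @ u                               # closest approach along the ray
            if t <= 0:
                continue                              # obstacle is behind the agent
            if np.linalg.norm(rel - t * u) <= radius and t <= d_cut:
                view[i] = 1                           # obstacle visible in this direction
                break
    return view
```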

Model assumptions and restrictions. As outlined in the previous section, the raycaster computes a binary image which only contains the obstacle information
(Fig. 5). Assuming such obstacle-segmented images facilitates the further pro-
cessing of the visual information by our models. Nevertheless, we are aware that
such a strong assumption makes it difficult to test the proposed models with
a robot operating in a natural environment. Such a test would require further
image preprocessing steps such as ground plane segmentation: based on color
or texture cues, optical flow, or geometrical considerations such algorithms can

[Fig. 5 panels: bird's eye view, 2d view (elevation vs. azimuth in degrees), and 1d views.]

Fig. 5. Image generation. For detailed explanations see the text; for visualization pur-
poses, the 1d views are stretched in the y-direction.

classify whether image regions lie in the ground plane or not (for reviews see
[13,14]). The ground plane could then be assumed to be free space, whereas
other image regions would be interpreted as obstacles.
Assuming an image preprocessing step is also reasonable from the viewpoint of
visual information processing in the brain: lower processing stages usually trans-
form the visual information into a representation which facilitates further process-
ing by higher-level stages [15]. As the goal direction could be derived in earlier stages, we think it is reasonable to pass it as an input parameter to our models.
The behavior is modeled in discrete time steps, each time step corresponding
to one sampling cycle of the used tracker. The models also neglect dynamic
aspects of the moving rats. The simulated agents move with a constant velocity of 2.1 cm per time step and a maximum turning rate of ±9.25° per time step (compare Sec. 2.2). By limiting the maximum turning rate, aspects of the agent's kinematics are at least partially considered [10], though these simplifications could complicate a real robot implementation.

3.1 1D Model without Depth Information


In this section, we will propose a model for obstacle avoidance and goal ap-
proaching behavior. As it uses a 1d view of the environment taken along the
horizon, no depth cues are available. We therefore refer to the model as the 1d model.
For our 1d model, we only use the most fundamental building blocks needed to
achieve obstacle avoidance and goal approaching behavior. These building blocks
are (1) the ability to detect obstacles and (2) the ability to steer [2,6,8]. These
two abilities are sufficient to guide the agent towards its goal position. Thus, the
path planning problem is implicitly solved. In detail, our algorithm includes the
following steps (see Fig. 6):
(0) The agent is initialized with a position and an orientation.
(1) Obstacles are enlarged by a constant angle δ, which is independent of the agent's distance to the obstacle and of the extent of the obstacle within the input image. The angle δ is the only model parameter of our 1d model. Growing the robot's representation of obstacles is a standard method in mobile robotics: it ensures that the robot passes the real obstacles at a safe distance [4].

[Figure: left, sketch of the 1d model with goal direction γ, heading direction, enlarged obstacle borders βl and βr, enlargement δ, turning limit ±ρ, and resulting movement direction α; right, optimization residual E(δ) plotted against the enlargement δ (0◦–20◦).]

Fig. 6. Left: sketch of the 1d model. For explanations see the description above. Right:
optimization residuals E depending on the enlargement parameter δ. For details see
the section below.

(2) Check whether the goal direction γ is blocked by an obstacle or not. In
case it is not blocked, choose α = γ as the desired movement direction for the next
simulation step and proceed with step (4).
(3) Determine the angles βl, βr between the agent's current heading direction
and the borders of the enlarged obstacles. If |βl| < |βr|, choose
α = βl; otherwise use α = βr. In each simulation step, this method of selecting
the next movement direction tries to keep the agent on a straight trajectory by
minimizing the change of orientation. This step is similar to the corridor selection
of the VFH methods [9,10,11].
(4) Limit the desired movement direction α to the range ±ρ (the light gray shaded
area in Fig. 6). The result α is used as the change of orientation for the next simulation step.
After rotation, the agent moves straight (for d = 2.1 cm).
Steps (1) to (4) of the algorithm are repeated until the agent's distance to
the goal falls below a threshold (6 cm in our experiments). If the
agent does not reach the goal within a maximum number of steps, if it hits
an obstacle, or if the simulated trajectory extends beyond the limits of the
arena, the trial is counted as unsuccessful.
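The following Python sketch summarizes steps (1)–(4) under simplifying assumptions: the 1d view is represented as a list of angular obstacle intervals relative to the current heading, overlapping intervals are not merged, and all helper names are illustrative rather than taken from the original implementation.

```python
def select_turn(obstacles, goal_dir, delta, rho):
    """One movement decision of the 1d model (steps 1-4).

    obstacles: list of (left, right) angular intervals (radians, relative to
               the current heading) covered by obstacles in the 1d view.
    goal_dir:  goal direction gamma relative to the current heading.
    delta:     constant angular enlargement of every obstacle (step 1).
    rho:       maximum change of orientation per simulation step (step 4).
    """
    # Step 1: enlarge every obstacle by the constant angle delta.
    enlarged = [(l - delta, r + delta) for (l, r) in obstacles]

    # Step 2: if the goal direction is not blocked, head straight for the goal.
    blocking = [(l, r) for (l, r) in enlarged if l <= goal_dir <= r]
    if not blocking:
        alpha = goal_dir
    else:
        # Step 3: steer to whichever border of the blocking (enlarged)
        # obstacle requires the smaller change of orientation.  Merging of
        # overlapping intervals is omitted in this sketch.
        beta_l, beta_r = blocking[0]
        alpha = beta_l if abs(beta_l) < abs(beta_r) else beta_r

    # Step 4: limit the desired movement direction to the range +/- rho.
    return max(-rho, min(rho, alpha))
```

The returned angle is applied as the change of orientation, after which the agent moves straight by d = 2.1 cm (compare the kinematics sketch above).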

3.2 Model Evaluation


Parameter optimization. As the goal of our model is to optimally reproduce the
data obtained from the behavioral experiments, the enlargement parameter δ
was systematically varied in the range δ ∈ {0◦, 1◦, 2◦, . . . , 20◦}. For each of the
20 configurations (c ∈ {1, 2, . . . , 20}), 70 trajectories were simulated (starting at 7
release positions with 10 different initial orientations; positions and orientations
were equally and symmetrically distributed). As for the rats' trajectories, we
analyzed for each simulated trajectory whether the agent passes the obstacles on

the left side, the right side, or through the middle. Depending on the configuration
c and the enlargement δ, a vector

    hsim(c, δ) = ( hL(c, δ), hM(c, δ), hR(c, δ) )                          (1)

of relative frequencies for passing on the left side, through the middle, or on the
right side was computed. In order to determine the optimal value of δ, the
dissimilarity measure

    E(δ) = Σ_{c=1}^{20} SSD( hsim(c, δ), hrat(c) )                         (2)

was minimized. The measure computes the sum of squared differences (SSD)
between the vectors of relative frequencies hsim and hrat for the simulation and
the rats' data, respectively. The best fit (Fig. 6, right) was obtained for δ = 6◦
with an optimization residual of E = 0.989. The resulting trajectories are shown
in Figs. 2, 3, and 7; the configurations depicted in Fig. 7 were solely used for
adjusting the model parameters.
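A sketch of this grid search is given below; the function simulate_frequencies that produces hsim(c, δ) is a placeholder, and only the SSD criterion of Eq. (2) and the tested δ range are taken from the text.

```python
import numpy as np

def ssd(a, b):
    """Sum of squared differences between two vectors of relative frequencies."""
    return float(np.sum((np.asarray(a) - np.asarray(b)) ** 2))

def optimize_delta(simulate_frequencies, h_rat, deltas=range(0, 21)):
    """Grid search over the enlargement delta, minimizing E(delta) of Eq. (2).

    simulate_frequencies(c, delta) -> (hL, hM, hR) for configuration c (placeholder),
    h_rat[c] -> observed (hL, hM, hR) of the rats for configuration c.
    """
    residuals = {d: sum(ssd(simulate_frequencies(c, d), h_rat[c])
                        for c in range(1, 21))
                 for d in deltas}
    best_delta = min(residuals, key=residuals.get)
    return best_delta, residuals
```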

[Figure: configurations 17–20.]

Fig. 7. Further configurations used only for the parameter optimization. Rats' chosen
trajectories are displayed in the upper row, the predictions of the 1d model (see Part
2) are displayed below. The gray shaded horizontal bars depict the animals' (upper) or
the model's (lower) behavior with respect to passing the obstacles on the left (black)
or the right (light gray) side, or passing through the gap (middle gray).

Correlation between simulation and behavioral data. To assess how well the
model fits the behavioral data, we correlated the relative frequencies hrat(c) and
hsim(c) (for 1 ≤ c ≤ 20). Of the 9 possible combinations, the correlations
rL,L = 0.919, rM,M = 0.947, and rR,R = 0.935 are most relevant for our
purposes; the mixed correlations are all negative, around −0.45.
However, this analysis does not distinguish between configurations with and
without a gap: the correlation rM,M is influenced because hM = 0 is assumed
for all configurations without a gap. To overcome this drawback, we separately

correlated the relative frequencies for the configurations with and without a gap.
For the first class (configurations 13 to 20) we obtained correlations rL,L = 0.816,
rM,M = 0.943, and rR,R = 0.969; for the second class (configurations 1 to 12)
we obtained rL,L = rR,R = 0.934 (as for these configurations hR equals 1 − hL ,
the correlations rL,L and rR,R are identical).
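For illustration, this correlation analysis could be carried out along the following lines, assuming the relative frequencies are stored as 20 × 3 arrays with columns (left, middle, right); the data layout and function name are assumptions.

```python
import numpy as np

def passing_correlations(h_sim, h_rat, rows=slice(None)):
    """Pearson correlations between simulated and observed passing frequencies.

    h_sim, h_rat: arrays of shape (20, 3), columns = (left, middle, right).
    rows: optional subset of configurations, e.g. slice(12, 20) for the
          configurations with a gap (13 to 20) or slice(0, 12) for those without.
    """
    labels = ("L", "M", "R")
    corr = {}
    for i, a in enumerate(labels):
        for j, b in enumerate(labels):
            corr[(a, b)] = float(np.corrcoef(h_sim[rows, i], h_rat[rows, j])[0, 1])
    return corr
```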

Trap situations. In order to test our model with other obstacle configurations
than those tested in the behavioral experiments, we performed tests (Fig. 8)
with cluttered environments (configurations 1, 2), a U-shaped obstacle configu-
ration (3), and a configuration for which the agent is completely surrounded by
obstacles (4). For each test run, the agents were initialized with identical start
position; the initial orientation was varied in steps of 15◦ . After initialization,
the simulated agent was moved forward for one step. Afterwards, the model was
applied to predict the agent’s path.
The results for configurations 1 and 2 show that almost every simulated tra-
jectory reaches the goal position. Some of the paths are not as short as possible
because the model also tries to avoid distant obstacles which do not directly block
the agent’s way towards the goal. In case the agent hits the obstacle, it cannot turn
fast enough to avoid the obstacle due to the limited turning rate. Our 1d model
is also able to reach the goal in test configuration 3. This is a test situation for
which many obstacle avoidance methods relying on depth information (e.g. poten-
tial field approaches) fail due to local minima [5]. Our model fails for condition 4:
in this case, no movement decision can be derived because the agent is completely
surrounded by obstacles (resulting in a completely black 1d view).

[Figure: test configurations 1–4.]

Fig. 8. Trap situations for the 1d model

3.3 Discussion
Our model is capable of producing smooth trajectories that reach the goal position
without crashing into the obstacles. Since our model does not contain any noise,
the simulated trajectories look much smoother than the rats' trajectories. Com-
paring the analysis of whether the agent passed on the left side, on the right side,
or through the gap with the corresponding behavioral data reveals that the model
covers several aspects we outlined in Sec. 2.2. These aspects will be discussed in
the following paragraphs in more detail; afterwards, we outline the limitations
of the 1d model.

Distance minimization. For the asymmetric configurations (Fig. 2) 78.93% of


the simulated trajectories pass the obstacles such that the length of the result-
ing path is minimal (rats' trajectories: 76.53%). Non-optimal paths are due to
the model's tendency to predict straight trajectories: since in every time step
the change of orientation is kept as small as possible, the agent sometimes passes the ob-
stacles on the side which results in the longer path. This effect is also visible for
configurations 19 and 20: there, the shortest path would be to pass through the
gap. However, this path would require the agent to turn more than passing on
the left or the right of the obstacles.

Gap size. For the behavioral data we observed that the rats passed more frequently
through larger gaps. Comparing configurations 13 to 16 (Figure 3) reveals that
all simulated trajectories pass through the gap if the gap is large (compared
to 83.76% of the rats' trajectories). If the gap is small, only 15.71% of the
simulated trajectories pass through the gap (rats: 36.20%).

Symmetric configurations. Our model did not reproduce the left-right-preferences


we observed for the symmetric configurations (Fig. 3). As we initialized the simu-
lated agent with symmetrically and equally distributed release positions and ori-
entations, the model could not be expected to reproduce the rats' preferences. It is left for
future work to initialize the model with positions and directions which better re-
produce the rats' trajectories.

Model limitations. Although the model is capable of reproducing the results ob-
tained from the behavioral experiments, the comparison between the simulated
and the rats' trajectories reveals several aspects which are due to the lack of
depth information in our model: (1) the model seems to react earlier to obsta-
cles than the rats, (2) the simulated trajectories pass closer to obstacles than the
rats' trajectories, and (3) our model cannot solve the trap configuration 4, which
could certainly be solved by rats. The latter aspect is due to neglecting the
agent's dynamics in our simulation.

(1) Reaction to obstacles. Many simulated trajectories (e.g. for configurations


10, 11, and 12) start with a curve, then run straight until they pass the obstacle, turn
again, and finally the agent travels along a straight line towards the goal position.
In contrast, many rats run straight towards the goal and only later start to
avoid the obstacles. They avoid the obstacles on a curved path and also often
approach the goal on a curved path. This behavior suggests that the rats try
to approach the goal and only start reacting to obstacles when they are within
a certain distance of the obstacles. Since our model does not incorporate
any information about the distance to the obstacles, it tries to avoid obstacles
independently of the agent's current distance to them.

(2) Distance while passing obstacles. Comparing the model's and the rats' tra-
jectories also reveals that the simulated agent passes closer to obstacles than

the rats. This can also be explained by the lack of depth information: indepen-
dently of the distance to the obstacle, the obstacle is enlarged by δ. If the agent
is far away from the obstacle, the enlargement δ is large compared to the size of
the obstacles. If the agent is close to the obstacle, this enlargement is small
compared to the size of the obstacles imaged on the agent's retina. For this rea-
son, the agent passes very close to obstacles. For larger δ, gaps between obstacles
are closed by the obstacle growing; in these cases, the agent can no longer
pass through gaps. These model properties could be avoided by introducing an
enlargement mechanism which depends on the distance to the obstacle.
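The text leaves such a mechanism open; one possible (purely hypothetical) choice would be to enlarge each obstacle by the angle subtended by a fixed metric safety margin at the obstacle's estimated distance, as in the following sketch.

```python
import math

def distance_dependent_enlargement(distance, safety_margin=5.0,
                                   delta_max=math.radians(20)):
    """Hypothetical enlargement angle for a fixed metric safety margin.

    Returns the angle subtended by `safety_margin` at the estimated obstacle
    `distance`, clipped to `delta_max` for very close obstacles, so that far
    obstacles are enlarged by a small angle and near obstacles by a large one.
    """
    if distance <= safety_margin:
        return delta_max
    return min(delta_max, math.asin(safety_margin / distance))
```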

3.4 Outlook: Models Incorporating Depth Information


In order to overcome the drawbacks of the 1d model outlined in the previous
section, we are currently working on extensions of the 1d model which also
incorporate depth information. Two of these extensions will be briefly described
in this section. Both models have in common that the elevation under which
obstacles are imaged is used as a depth cue and that the 2d input image is reduced
to a 1-dimensional representation of the environment. Again, our models are
purely reactive, do not need a map-like representation, and their complexity
depends only on the resolution of the input images.

1.5d model. Except for the input image, our 1.5d model is identical to the 1d
model described above. In contrast to our 1d model, the 1.5d model uses a view
taken out of the 2d image at a constant elevation ε > 0 (Figure 5, right). Hence,
it only takes obstacles up to a certain distance from the agent's current position
into account. Monocular vision as a cue for depth information has received attention
in the context of robot soccer. A method called "visual sonar" [16,14] searches
along radial scan lines in the camera image. If an obstacle is encountered
along a scan line, its distance can be computed. This information can then be
used for further navigation capabilities such as obstacle avoidance, path planning,
or mapping. Like the proposed 1.5d model, the "visual sonar" relies on elevation
as a cue for depth information [17]. This depth cue can also be used by frogs
and humans [18,19].
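Assuming a flat ground plane and a known eye height, the elevation-based depth cue can be turned into a metric distance as in the following sketch; the variable names and the clipping behaviour are our own choices.

```python
import math

def ground_distance(elevation_below_horizon, eye_height):
    """Distance to an obstacle's ground contact from its retinal elevation.

    elevation_below_horizon: angle (radians, > 0) below the horizon under which
    the point where the obstacle meets the ground is seen; eye_height: height
    of the camera/eye above the flat ground plane.
    """
    if elevation_below_horizon <= 0:
        return float("inf")  # at or above the horizon: no ground intersection
    return eye_height / math.tan(elevation_below_horizon)
```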
Fig. 9 visualizes the results obtained for testing the 1.5d model with the trap
situations described in Sec. 3.2. For the experiments, horizontal views taken
at ε = 30◦ were used. For the cluttered environments (configurations 1 and
2), the model predicts paths which are shorter than the paths predicted by
the 1d model. Since the 1.5d model does not consider distant obstacles, there
are situations in which the 1.5d model approaches the goal, whereas the 1d
model avoids distant obstacles. Hence, the 1.5d model is able to predict shorter
paths. For test configuration 3, our model suffers from the same problems as many
reactive obstacle avoidance methods incorporating depth information: the
simulated agents head towards the goal and, when the obstacle
in front of the agent comes into sight, start to avoid it. However,
they are then trapped in a local minimum and cannot reach the goal position any more. Related work tries to solve
this problem with map-building and look-ahead path-planning algorithms [11].

[Figure: test configurations 1–4.]

Fig. 9. Trap situations for the 1.5d model

[Figure: left, sketch of the 2d model showing the bird's eye view and the 2d view (elevation vs. azimuth) with heading and goal direction, the repelling profile, the attracting profile, and their combination with the selected movement direction α; right, trap configurations 3 and 4.]

Fig. 10. Left: sketch of the 2d model. Right: trap situations for the 2d model.

Since the model incorporates depth information, it can solve test condition
4, at least if the initial orientation points towards the open side of the U-shaped
obstacle. Due to the restricted movement parameters, the model cannot turn
fast enough for other initial orientations and hits the obstacles.

2d model. The 2d model (Fig. 10) we are currently working on uses a 2d view
of the environment as shown in Fig. 5. For a set of n horizontal directions of
sight ϕi (1 ≤ i ≤ n), the distance di towards the visible obstacle is computed
based on the elevation ε under which the obstacle is imaged. By this step, the
2d image information is reduced to a 1d depth profile. At each direction of sight
ϕi, a periodic and unimodal curve (comparable to the von Mises distribution) is
placed. The curve's height is weighted by the inverse of di. By summing over all
the von Mises curves, a repelling profile is computed. Goal attraction is modeled
by an attracting profile with a minimum at the goal direction. Both profiles are
summed up and a minimization process searches for the profile's minimum in the
range ±ρ around the agent's heading direction. The direction of the minimum, α,
is used as the movement direction. The polar obstacle representation is recomputed
in each iteration and not updated from step to step.
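The following sketch illustrates this profile computation; the concrete shape of the attracting profile, the von Mises width parameter κ, and the sampling of candidate directions are our assumptions and not part of the published model.

```python
import numpy as np

def choose_direction(phi, d, goal_dir, rho=np.radians(9.25), kappa=4.0):
    """Movement decision of the 2d model (sketch).

    phi:      horizontal viewing directions (radians, relative to the heading)
    d:        estimated obstacle distance for every viewing direction
    goal_dir: goal direction relative to the current heading
    rho:      maximum turn per step; kappa: width parameter of the bumps
    """
    angles = np.linspace(-np.pi, np.pi, 361)        # candidate directions
    # Repelling profile: one von-Mises-shaped bump per viewing direction,
    # weighted by the inverse of the estimated obstacle distance.
    repelling = np.zeros_like(angles)
    for p, dist in zip(phi, d):
        repelling += (1.0 / dist) * np.exp(kappa * np.cos(angles - p))
    # Attracting profile: unimodal, with its minimum at the goal direction.
    attracting = -np.cos(angles - goal_dir)
    combined = repelling + attracting
    # Search for the minimum only within +/- rho around the current heading.
    window = np.abs(angles) <= rho
    return angles[window][np.argmin(combined[window])]
```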

Fig. 10 also visualizes the model's trajectories obtained for trap configurations
3 and 4. Although the agent gets trapped in configuration 3, many more test
trials than for the 1.5d model successfully reach the goal. Since the trajectories
were simulated with a relatively large σ, objects are passed at a comparatively
large distance. We are currently working on improving the distance weighting
as well as the interplay between the repelling and attracting profiles. By these means,
we expect our 2d model to perform better than the other models.

4 Conclusion
In this work we presented an exploratory study examining obstacle avoidance
and path selection behavior in rats and a minimalistic visually guided model
that could account for the empirical data. The particular appeal of the model is
its simplicity: it neither requires map-like representations of the goal and obsta-
cles nor does it incorporate depth information. These results demonstrate that
reliable obstacle avoidance can be achieved with only two basic building blocks:
(1) the ability to approach the goal and (2) the ability to detect whether the course
towards the goal is blocked by an obstacle and to avoid the obstacle. While the
proposed basic 1d model is capable of reproducing the results of the behavioral ex-
periment described in Sec. 2, a detailed comparison of the simulated trajectories
with the empirical data suggests that the rats probably used depth information.
This can be concluded from the fact that rats seem to react to obstacles only
when they are within a certain distance of them and that rats passed by obstacles
at a comparatively large distance. Neither of these aspects can be reproduced
by our 1d model. In order to explain these findings, we have presented first ideas
(the 1.5d and 2d models) of how depth information can be integrated into our model
in a sparse and biologically inspired fashion.

References
1. Fajen, B., Warren, W.: Behavioral dynamics of steering, obstacle avoidance, and
route selection. Journal of Experimental Psychology: Human Perception and Perfor-
mance 29(2), 343–362 (2003)
2. Fajen, B., Warren, W., Temizer, S., Kaelbling, L.P.: A dynamical model of visually-
guided steering, obstacle avoidance, and route selection. International Journal of
Computer Vision 54(1–3), 13–34 (2003)
3. Choset, H., Lynch, K., Hutchinson, S., Kantor, G., Burgard, W., Kavraki, L.,
Thrun, S.: Principles of Robot Motion. MIT Press, Cambridge (2005)
4. Siegwart, R., Nourbakhsh, I.: Introduction to Autonomous Mobile Robots. MIT
Press, Cambridge (2004)
5. Koren, Y., Borenstein, J.: Potential field methods and their inherent limitations
for mobile robot navigation. In: Proceedings of the IEEE Conference on Robotics
and Automation, pp. 1398–1404 (1991)
6. Huang, W., Fajen, B., Finka, J., Warren, W.: Visual navigation and obstacle avoid-
ance using a steering potential function. Robotics and Autonomous Systems 54(4),
288–299 (2006)

7. Arbib, M., House, D.: Depth and Detours: An Essay on Visually Guided Behav-
ior. In: Vision, Brain, and Cooperative Computations, pp. 129–163. MIT Press,
Cambridge (1987)
8. Schöner, G., Dose, M., Engels, C.: Dynamics of behavior: Theory and applications
for autonomous robot architectures. Robotics and Autonomous Systems 16(2–4),
213–245 (1995)
9. Borenstein, J., Koren, Y.: The vector field histogram – fast obstacle avoidance for
mobile robots. IEEE Journal of Robotics and Automation 7(3), 278–288 (1991)
10. Ulrich, I., Borenstein, J.: VFH+: Reliable obstacle avoidance for fast mobile robots.
In: Proceedings of the IEEE Conference on Robotics and Automation (1998)
11. Ulrich, I., Borenstein, J.: VFH*: Local obstacle avoidance with look–ahead verifica-
tion. In: Proceedings of the International Conference on Robotics and Automation
(2000)
12. Gallistel, C.R.: The Organisation of Learning. MIT Press, Bradford Books, Cam-
bridge (1990)
13. Chen, Z., Pears, N., Liang, B.: Monocular obstacle detection using reciprocal-polar
rectification. Image and Vision Computing 24(12), 1301–1312 (2006)
14. Lenser, S., Veloso, M.: Visual sonar: Fast obstacle avoidance using monocular vi-
sion. In: Proceedings of the IEEE Conference on Intelligent Robots and Systems,
pp. 886–891 (2003)
15. Simoncelli, E.P., Olshausen, B.A.: Natural image statistics and neural representa-
tion. Annual Review of Neuroscience 24, 1193–1216 (2001)
16. Horswill, I.D.: Visual collision avoidance by segmentation. In: Proceedings of the
IEEE Conference on Robotics and Autonomous Systems, pp. 901–909 (1994)
17. Hoffmann, J., Jüngel, M., Lötzsch, M.: A vision based system for goal-directed
obstacle avoidance. In: Nardi, D., Riedmiller, M., Sammut, C., Santos-Victor, J.
(eds.) RoboCup 2004. LNCS (LNAI), vol. 3276, pp. 418–425. Springer, Heidelberg
(2005)
18. Collett, T.S., Udin, S.B.: Frogs use retinal elevation as a cue to distance. Journal
of Comparative Physiology A 163(5), 677–683 (1988)
19. Ooi, T.L., Wu, B., He, Z.J.: Distance determined by the angular declination below
the horizon. Nature 414(6860), 197–200 (2001)
Route Learning Strategies in a Virtual Cluttered
Environment

Rebecca Hurlebaus1, Kai Basten1, Hanspeter A. Mallot1, and Jan M. Wiener2
1 Cognitive Neuroscience, University of Tübingen, Auf der Morgenstelle 28, D-72076 Tübingen, Germany
2 Center for Cognitive Science, University of Freiburg, Friedrichstr. 50, D-79089 Freiburg, Germany

Abstract. Here we present an experiment investigating human route


learning behavior. Specific interest concerned the learning strategies as
well as the underlying spatial knowledge. In the experiment, naive par-
ticipants were asked to learn a path between two locations in a complex,
cluttered virtual environment that featured local and global landmark
information. Participants were trained for several days until they solved
the wayfinding task quickly and efficiently. The analysis of individual
navigation behavior demonstrates strong interindividual differences sug-
gesting different route learning strategies: while some participants were
very conservative in their route choices, always selecting the same route,
other participants showed a high variability in their route choices. In
the subsequent test phase we systematically varied the availability of lo-
cal and global landmark information to gain first insights into the spatial
knowledge underlying these different behaviors. Participants showing high
variability in route choices strongly depended on global landmark in-
formation. Moreover, participants who were conservative in their route
choices were able to reproduce the basic form of the learned routes even
without any local landmark information, suggesting that their route
memory contained metric information. The results of this study sug-
gest two alternative strategies for solving route learning and wayfinding
tasks that are reflected in the spatial knowledge acquired during learning.

Keywords: spatial cognition, route learning, navigation.

1 Introduction
Finding the way between two locations is an essential and frequent wayfinding
task for both animals and humans. Typical examples include the way from the
nest to a feeding site or the route between your home and the office. While several
navigation studies, both in real and virtual environments, investigated the form
and content of route knowledge (e.g., [1,2,3]), empirical studies investigating the
route learning process itself are rather limited (but see [4,5]).
A very influential theoretical framework of spatial knowledge acquisition pro-
poses three stages when learning a novel environment [6]. First, landmark knowl-
edge, i.e., knowledge about objects or views that allow places to be identified, is acquired.


In the second stage, landmarks are combined to form route knowledge. With in-
creasing experience in the environment, survey knowledge (i.e. knowledge about
distances and direction between landmarks) emerges. According to this model, the
mental representation of a route can be conceived as a chain of landmarks or places
with associated movement directives (e.g. turn right at red house, turn left at the
street lights). This landmark to route to survey knowledge theory of spatial learn-
ing has not remained unchallenged: Recent findings, for example, demonstrate
that repeated exposures to a route not necessarily resulted in improving metric
knowledge between landmarks encountered on the route [5]. Most participants
either had accurate knowledge from the first exposure or they never acquired
it. Furthermore, results from route learning experiments in virtual reality sug-
gest two spatial learning processes that act in parallel rather than sequentially
[4]: (1) a visually dominated strategy for the recognition of routes (i.e., chains
of places with associated movement directives) and (2) a spatially dominated
strategy integrating places into a survey map. The latter strategy requires no
prior connection of places to routes. Support for parallel rather than sequential
learning processes also comes from experiments with rats: depending on the exact
training and reinforcement procedure, rats can be trained to approach positions
that are defined by the configuration of extramaze cues (c.f. spatially dominated
strategy), to follow local visual beacons (c.f. visually dominated strategy), or to
execute motor responses (e.g., turn right at intersection; [7,8]). Evidence for a
functional distinction of spatial memories also comes from experiments demon-
strating that participants who learned a route by navigation performed better
on route perspective tasks, while participants who learned a route from a map
performed better on tasks analysing survey knowledge [9].
In any case, route knowledge is usually described as a chain of stimulus-
response pairs [10,11], in which the recognition of a place stimulates or triggers
a response (i.e., a direction of motion). Places along a route can be recognized
by objects but also by views or scenes [12]. Evidence for this concept of route
memory mostly comes from experiments in mazes, buildings, or urban environ-
ments, in which decision points were well defined (e.g. [1,13,3]). Furthermore,
distinct objects (i.e., unique landmarks) are usually presented at decision points.
Route learning in open environments, in contrast, has received little attention
in humans, but has been convincingly demonstrated in ants [2]. The desert ant
Melophorus bagoti is a singly foraging ant and its environment is characterized
by a clutter of small, scattered grass tussocks. The ants establish idiosyncratic
routes while shuttling back and forth between a feeder and their nest. Each
individual ant follows a constant route for inbound runs (feeder to nest) and
outbound runs (nest to feeder). Usually both routes differ from each other and
show a high directionality [14]. In contrast, wood ants can learn bi-directional
routes when briefly reversing direction and tracing their path for a short distance
[15]. For both ant species, view-dependent learning is essential for route learning
in open cluttered environments [16]. View-dependent representations [17] and
view-dependent recognition of places have also been demonstrated in humans and
have been shown to be relevant for navigation [12].

Most studies investigating route knowledge in humans were conducted in ur-


ban environments in which the number of route alternatives between locations as
well as possible movement decisions at street junctions are rather limited. How
do humans behave when faced with a route learning task in open environments,
lacking road networks, predefined places, and unique objects or landmarks? Are
they able to learn their way between two locations in such environments? And
if so, what are the underlying route learning processes?

1.1 Synopsis and Predictions


In the following we present a route learning experiment in an open cluttered
environment characterized by prismatic objects differing in contour but neither
in height nor in texture. The environment did not contain any predefined places,
road networks, or unique landmarks. Distal cues were present in the form of four
large colored columns and background texture. Participants' task was to explore
the environment and to shuttle between two target locations repeatedly. We
monitored participants' navigation and route learning behavior during an exten-
sive training phase. Subsequently, we tested the influence of proximal
and distal spatial information on participants' navigational ability.
We expected that, over an extended period of training, participants would be able to
solve the general experimental task (i.e., to navigate quickly and efficiently between the
home and the feeder position). It was, however, an open question whether partici-
pants established fixed routes (as ants do when faced with such a task in a similar
environment [2]) or whether they learned global directions and distances between
the relevant locations. The latter alternative would allow solving the task with-
out explicit route knowledge but requires spatial knowledge that is best described
as survey knowledge. In contrast to route knowledge, survey knowledge allows for
more flexible navigation behavior when shuttling between distant locations. Con-
sequently, one might expect a higher variability of (similarly efficient) route choices
between navigations. Moreover, it is possible that different participants adopted
or weighted these alternative spatial learning strategies differently.
In the test phase, we systematically varied the availability of local and global
cues to study which spatial information was relevant for solving the wayfind-
ing task. If participants established fixed routes, they were expected to strongly
depend on local (i.e., proximal) spatial information to guide their movements.
Hence, if that information was removed in a no-local-objects test, their naviga-
tion performance was expected to decrease dramatically. If, on the other hand,
participants relied on global directions and distal information to solve the task,
we expected their navigation performance to drop when such information was
removed by adding fog to the environment.

2 Material and Methods


2.1 Participants
Twenty-one students of the University of Tübingen participated in this study
(10 females). The average age was 24 years (range 19-28). No participant had

(a) Participants' view (b) Map

(c) Condition: no-local-objects (d) Condition: fog

Fig. 1. (a) The virtual environment from the perspective of the participant at the
home position. The sphere was visible only in close proximity. The text specified the
current task (here: "Search for the feeder!"). A distal landmark (large column) is
visible in the background; (b) A map of the environment: the positions of home and
feeder are marked by asterisks; the crossed circles indicate the positions of the colored
columns (for illustration, columns were plotted closer to the center of the environment);
(c) The no-local-objects condition; (d) The fog condition.

prior knowledge of the virtual environment or the experimental hypotheses at


the time of testing. They were paid 8 € per hour. One participant (female) had
to be excluded because of motion sickness during the first experimental trials.

2.2 Virtual Environment


The virtual environment was generated using Virtual Environments Library
(VeLib)1 . It consisted of a ground plane cluttered with objects of equal height and
texture, which differed only in the shape of their groundplate. The background
texture consisted of a cloudy sky and separate flat hills. To provide distinct
global landmark information, four large columns of different colour (red, blue,
1 http://velib.kyb.mpg.de/ (March 2008)

green, yellow) were positioned on four sides of the environment, at a distance of


80 units from the center. The directions of these global landmarks are shown
in the environment map in Figure 1b, but are plotted closer to the obstacles
for the sake of clarity. Two additional objects, a red sphere and a blue sphere,
that marked the relevant locations (referred to as home and feeder in the follow-
ing) were placed in the environment with a distance of ~5.5 units between them.
These were so-called pop-up objects that were visible only in close proximity
(< 0.4 units). An experimental session always started at the blue sphere, which
was referred to as the home location. In analogy to experiments with ants (see
Introduction), the red sphere was referred to as the feeder. Figure 1a displays
the participants' view within the virtual environment.

2.3 Experimental Setup


The virtual environment was presented on a standard 19” computer monitor.
Participants sat in front of the monitor on an office chair at a distance
of approximately 80 cm. Using a standard joypad (Logitech RumblePad 2) they
steered through the virtual environment. Translation and rotation velocity could
be adjusted separately by the two analog controls. Maximum translation velocity
was 0.4 units per second; maximum rotation velocity was 26◦ per second. All
participants were instructed how to use the joypad and had the chance to familiarize
themselves with the setup.

2.4 Procedure
General Experimental Task and Procedure. The general experimental task
was to repeatedly navigate between two target locations, the home (blue sphere)
and the feeder (red sphere). During navigation, the target (home or feeder) for
the current run was indicated by a text message (e.g., ”Go home!”). As soon as
the participant moved over the current target (e.g., blue sphere indicating home
location), the respective text message changed (e.g., ”Search for the feeder!”).
Runs from home to feeder are referred to as outbound runs, runs from the feeder
to home are referred to as inbound runs. Experimental sessions always started at
the home position. As participants were naive with respect to the environment,
the experiment had an extensive training-phase prior to the test-phase.

Training-Phase. The training-phase consisted of several sessions, during which


participants were instructed to repeatedly navigate between home and feeder. At
the beginning of each session participants were positioned at the home location.
Pilot experiments demonstrated that the experimental task was very difficult in
the first run; participants were therefore provided with coarse directional infor-
mation about the direction from home to the feeder at the beginning of the
first session ("seen from home, the feeder is situated between the red and the green
distal column"). Participants underwent two training sessions per day with a 5 min
break between them. A single training session ended with the first visit to
the home after 20 min. In case participants showed a performance of > 2 runs per

minute (for inbound and outbound runs) after 5 training sessions, they advanced
to the test phase. The maximal number of training sessions was 9.

Test-Phase. In the test phase participants were confronted with 2 wayfinding


tasks that are reported here and that were always conducted in the same order.

1: No-Local-Objects Condition. Directly after the last training session, partici-


pants conducted the no-local-objects condition. They were positioned at the home
location and were asked to navigate to the feeder. Upon reaching it, all local ob-
jects (i.e., all local landmark information) disappeared, while the global landmarks
(i.e., the distal colored columns and background texture) remained (see Figure 1c).
Participants' task was to navigate back to the home location. Once they were con-
vinced that they had reached the position of their home, they pressed a button on the
joypad. This procedure was repeated three times. After the no-local-objects test,
participants were given a 5 min break. If they had already gone through 2 training
sessions that day, they conducted the fog test (see below) on the next day.

2: Fog Condition. The fog condition was identical to a training session but the
visibility conditions were altered by introducing fog in the virtual environment:
the visibility of the environment decreased with increasing distance. Beyond a
distance of 2.0 units only fog but no other information was perceptible. By these
means, global landmarks as well as structures such as view axes or corridors
arising from obstacle constellations were eliminated from the visual scene. The
fog also covered the ground plane such that it provided no optic flow during
navigation. In this modified environment participants had to rely only on local
objects in their close proximity to find their way between home and feeder. All
participants had 20 minutes to complete as many runs as possible. After that
time the fog test stopped with the first visit back at home. Unfortunately, data
of three participants had to be excluded from the final data set due to technical
problems with the software.

2.5 Data Analysis


During the experiment, participants’ positions were recorded with a sampling
frequency of 5 Hz, allowing detailed trajectories to be reconstructed. In the following
we describe the different dependent variables extracted from these trajectories:

1. Performance. Navigation performance is measured in runs per minute. A


single run is defined as a navigation from home to the feeder or vice versa.
For each experimental session we calculated the average number of runs per
minute.
2. Route-similarity. The route similarity measure describes how conservative
or variable participants were with respect to their route choices. High values
(≈1) demonstrate that participants repeatedly chose the same route; low
values correspond to a high variability in route choices. To calculate the
route similarity, we used a two-step method: (1) the raw trail data was

reduced to sequences of numbers; (2) the similarity of these sequences was
compared. For the first step, the environment was tessellated into triangles:
we reduced each local obstacle to its center point and applied a Delaunay
triangulation to this set of points. A unique number was assigned to each
resulting triangle. Now every run was expressed as a sequence of numbers
corresponding to the triangles crossed. To compare these sequences, we used
an algorithm originally developed to compare genetic sequences [18]. In each
case, two single sequences are compared. The basic principle is to find the
number of matches and relate it to the total length of the sequences (for
details see [19]). A complete match results in a value of 1.0. For each participant
these comparisons were done separately for all outbound and inbound routes.
In the following we present the mean similarity of all comparisons of runs
performed in one session (a sketch of this computation is given after this list).
3. Change in performance. In order to describe navigation performance
during the fog test, participants' performance (runs per minute) in the fog
condition was divided by their performance in the last training session. Equal
performance then results in a value of 1.0, increased navigation performance
results in values > 1.0 and decreased performance results in values < 1.0.
4. No-local-object test. For the homing task in the No-Local-Objects test,
the following variables were evaluated:
(a) Homing error : Distance between participant’s endpoint and actual home
position.
(b) Distance error : Air-line distance from start (feeder location) to partic-
ipant’s chosen endpoint compared with the air-line distance from start
(feeder location) to the actual home position.
(c) Angular error : Angle between the beeline of participants’ homing re-
sponse and the correct direction towards the home location.
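As announced in the description of the route similarity measure, the following sketch illustrates its two-step computation; we use scipy's Delaunay triangulation, and difflib's match ratio (2 · matches divided by the total sequence length) merely stands in for the genetic-sequence comparison algorithm of [18,19], whose exact scoring is not reproduced here.

```python
import numpy as np
from scipy.spatial import Delaunay
from difflib import SequenceMatcher

def run_to_triangle_sequence(trajectory, obstacle_centers):
    """Step 1: express one run as the sequence of Delaunay triangles crossed."""
    tri = Delaunay(np.asarray(obstacle_centers))
    ids = tri.find_simplex(np.asarray(trajectory))   # triangle id per sample
    sequence = []
    for t in ids:
        # Collapse repeats; drop samples outside the triangulation (id == -1).
        if t >= 0 and (not sequence or sequence[-1] != int(t)):
            sequence.append(int(t))
    return sequence

def sequence_similarity(seq_a, seq_b):
    """Step 2: match ratio of two triangle sequences (1.0 = identical runs)."""
    return SequenceMatcher(None, seq_a, seq_b).ratio()

def mean_route_similarity(runs, obstacle_centers):
    """Mean pairwise similarity of all runs (e.g. all outbound runs) of a session."""
    seqs = [run_to_triangle_sequence(r, obstacle_centers) for r in runs]
    pairs = [sequence_similarity(seqs[i], seqs[j])
             for i in range(len(seqs)) for j in range(i + 1, len(seqs))]
    return float(np.mean(pairs)) if pairs else 1.0
```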

3 Results
3.1 Training Phase
Route Learning Performance. 19 out of the total of 20 participants were
able to solve the task: they learned to efficiently navigate between the home
and the feeder. One participant was removed from the final data set, as he did
not reach the learning criterion. This participant also reported to be clueless
about the positions of the home and the feeder. For the remaining participants,
the time to reach the learning criterion differed: four participants reached it
after 5 training sessions, 6 participants after 6 sessions, 2 after 7 sessions, 2
after 8 sessions, and 5 participants needed 9 training session (6.9 sessions on
average). The increase in navigation performance was highly significant for both
inbound and outbound runs (see Figure 2, paired t-test first vs. last training
session, inbound: t(18) = 14.26; p < .001; outbound: t(18) = 10.76; p < .001).
Figure 3 shows, for two participants, how route knowledge evolved with
increasing training sessions. At the end of the training phase,
all remaining 19 participants solved the task of navigating between home and

[Figure: runs per minute (y-axis, 0–2.5) as a function of training session (x-axis, 1–9).]

Fig. 2. During the training phase, participants' performance (number of runs per
minute) increased with increasing number of sessions for both outbound runs (♦) and
inbound runs (∗). Mean values of all participants ± standard error.

the feeder reliably and efficiently. For this result and all other results we did not
find any significant gender differences. Since the groups are small (10 female
and 11 male), small differences, if present, are not ascertainable.
Outbound Runs and Inbound Runs. Participants showed better naviga-
tion performance (runs/min) on inbound runs as compared to outbound runs
(Wilcoxon signed rank test: p < .01, see Figure 2). In other words, participants
found the way from feeder to home faster than the way from home to feeder. It
appears that this difference increases with increasing number of sessions. Note,
however, that some participants reached the learning criterion already after 5
sessions and proceeded to the test phase. In later sessions the number of par-
ticipants therefore decreases, which explains the increasing variation in later
sessions and could account for the saturation effect.
Constant and Variable Routes. Analysing the chosen trajectories of the last
training session in detail reveals remarkable inter-individual differences. While
some participants were very conservative in their route choices (see right column
in Figure 3), other participants showed a large variability in their choices (see
left column in Figure 3). The calculated mean route similarity ranged from .19
for very variable routes to 1.0 for constant routes (mean=.67, std=.24). Figure 4
displays the route similarity values for all participants, revealing a continuum
rather than distinct groups. Navigation performance (runs/min) in the last train-
ing session was significantly correlated with route similarity. Specifically, with
higher route similarity the navigation performance increased (r = .47, p < .05).
Neither navigation speed during the last training session nor the number of
sessions needed to reach the performance criterion to enter the test phase sig-
nificantly correlated with the route similarity values of the last training session

[Figure: trajectories of participant AS25 (left column; sessions 1, 5, and 8) and participant FW23 (right column; sessions 1, 4, and 7); scale bars 0–2 units.]

Fig. 3. Evolving route knowledge of two participants. Left column: variable routes with
a similarity of 0.55 (mean of outbound and inbound runs of the last session, compare to
Fig. 4 and see text); right column: constant route with a similarity of 1.0 (mean of outbound
and inbound runs of the last session, compare to Fig. 4 and see text). Lower left corner of
each panel: measuring unit, participant, and session number.

[Figure: bar plot of route similarity (0–1) for each participant.]

Fig. 4. Route similarity values of all participants for their final session

(correlation between navigation speed and route similarity: r = −.01, p = .97; correlation
between the number of sessions and route similarity: r = −.34, p = .16).

3.2 No-Local-Objects Condition

In the no-local-objects condition, all obstacles disappeared after the participant
reached the feeder. By moving to the estimated position of the home and press-
ing a button, participants marked the location where they assumed the position

[Figure: scatter plot of homing error versus route similarity; r = −0.13, n.s.]

Fig. 5. Navigation without local landmarks: Participants’ homing error as a function


of their route similarity of the last training session

of their home. On average, participants produced a homing error of 1.57 units
(std=1.00), an angular error of 16.38 degrees (std=12.17), and a mean distance
error of 0.97 units (std=0.76). Together, these results suggest that, in principle,
participants could solve the homing task. None of the measures of hom-
ing performance correlated significantly with the route similarity measure
(homing error and route similarity: r = −.13, p = .61, see Fig. 5; angular er-
ror and route similarity: r = −.22, p = .36; distance error and route similarity:
r = −.42, p = .07). Apparently, participants' performance in solving the homing
task was independent of whether or not they had established fixed routes during
the training phase.
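For illustration, the three measures defined in Sec. 2.5 could be computed from 2d coordinates as in the following sketch; the function and variable names are ours.

```python
import numpy as np

def homing_measures(feeder, home, endpoint):
    """Homing, distance, and angular error of one no-local-objects trial.

    feeder:   start position of the homing run (2d coordinates)
    home:     true home position
    endpoint: position where the participant pressed the button
    """
    feeder, home, endpoint = map(np.asarray, (feeder, home, endpoint))
    homing_error = float(np.linalg.norm(endpoint - home))
    distance_error = abs(float(np.linalg.norm(endpoint - feeder))
                         - float(np.linalg.norm(home - feeder)))
    # Angle between the chosen beeline and the correct homing direction.
    v_resp, v_true = endpoint - feeder, home - feeder
    cos_a = np.dot(v_resp, v_true) / (np.linalg.norm(v_resp) * np.linalg.norm(v_true))
    angular_error = float(np.degrees(np.arccos(np.clip(cos_a, -1.0, 1.0))))
    return homing_error, distance_error, angular_error
```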
Nevertheless, a closer look at the homing trajectories themselves suggests that
participants differed in the strategies they applied to solve the homing task.
Figure 6 provides a few examples of homing trajectories. Participants with
low route similarity values (i.e., participants showing a high variability in their
choices) show more or less straight inbound routes when homing. Participants
with high route similarity values (i.e., participants that established fixed routes
during training) generate trajectories that are typically curved, not linear. More-
over, their trajectories were often similar to their habitual routes: the shape of
the routes they established during training was roughly reproduced, even if the
translational or rotational metric did not fit exactly (see Figure 6). In some cases,
established routes were close to the beeline between feeder and home. In such
cases it cannot be distinguished whether the established route is reproduced or whether another
strategy was used. The same is obviously true for participants showing a high
variability in their route choices. While it is not clear how such data could be
quantitatively analyzed, Figure 6 demonstrates that some participants with high
route similarity values reproduced the form of their habitual routes.

3.3 Fog Condition

In this part of the experiment participants were only able to see obstacles in close proximity.
Spatial information at larger distances was masked by fog (Fig. 1d). Individ-
ual performance (runs per minute) during the fog test was compared with the
performance of the last training session (expressed as change in performance).
As expected, most participants showed a performance decrease in the fog test (in-
bound: 14 of 16, outbound: 13 of 16). More interestingly, we found significant
correlations between the change in performance and the route similarity in the
last training session: participants with low route similarity values show stronger
(negative) changes in performance as compared to participants with higher route
similarity values (see Figure 7). These correlations were significant for both in-
bound runs (r = .50, p < .02) and outbound runs (r = .75, p < .001).

4 Discussion

In this work we presented a navigation experiment investigating human route


learning behavior in a complex cluttered virtual environment. In contrast to

[Figure: four rows of paired panels; scale bars 0–2 units.]

Fig. 6. Four examples of behavior in the last training session (left column) and homing
behavior in the no-local-objects test (right column). The two top rows show results from
a participant with low route similarity values, the lower two rows show examples from
a participant with high route similarity values.

most earlier studies on route learning and route knowledge (e.g.,[3,20,21]), the
current environment did not feature a road network with predefined places, junc-
tions (decision points), and unique local landmarks. The environment was made
up of many similarly shaped objects with identical texture and height that were

Fig. 7. Participants had to navigate in a foggy VR environment, so only objects in
close proximity were visible. Shown is the change in performance (runs per minute)
of all participants in fog, plotted against the similarity of routes established in the last
training session.

non-uniformly distributed about a large open space. In addition to these local


objects, four distal unique landmarks provided global references. Specific inter-
est concerned the question if navigators were able to learn their way between
two locations in such an environment. Furthermore, we were interested if all
participants used similar or identical route learning strategies (for example: do
navigators establish fixed routes or do they rather learn the global layout of
the environment and chose between different similarly efficient routes). In the
experiment, participants were trained for several session until they were able to
efficiently navigate between two locations, the home and the target. After reach-
ing a learning criterion they entered the test phase, during which the availability
of spatial information was systematically varied to investigate which spatial in-
formation (local or global) participants used to solve the task.
All but one participant reached the learning criterion after a maximum of 9
training sessions. Navigation performance (measured as runs per minute) clearly
increased with the number of training sessions (see Figure 2). This demonstrates
that participants were able to efficiently and reliably navigate in complex clut-
tered environments lacking predefined places, road networks, and local landmark
information that is usually provided by unique objects (e.g., a large red house) at
decision points or road crossings. Comparisons of navigation performance over

the entire training phase revealed differences for outbound runs (home to target)
and inbound runs (target to home): specifically, participants found their way
faster on inbound runs. This could be explained by the specific significance of
the home location, which may result from the fact that each training session
started at the home/nest. In central place foragers, like the desert ants, the
importance of the nest and its close surrounding is well documented [22]. An
alternative explanation for this effect is that the local surrounding of nest and
feeder were different (i.e. the spatial distribution of the surrounding obstacles):
the nest, for example, was positioned at a larger open space, surrounded by
fewer objects, as compared to the feeder. By these means, the nest position
might have been recognized from larger distances, hence resulting in an increased
performance. Further experiments will have to show whether semantic or spatial
(configurational) effects were responsible for the described effect.
The most important result of the training phase is that participants greatly
differed with respect to their route choices: using a novel method to compare
trajectories (see Section 2.5) we obtained descriptions of the similarity of the
traveled paths during the last training session. While some participants were
very conservative, selecting the same outbound path and the same inbound path
on most runs, others showed a high variability, navigating along many different
paths (for examples, see Figure 3). Participants’ route similarity values of the
last training session were correlated with their navigation performance during
that session: participants that established fixed routes during training showed
better navigation performance than participants that showed higher variabili-
ties in their route choices. How can these inter-individual differences in route
similarity and navigation performance be interpreted? Did different participants
employ different navigation or learning strategies, relying on different spatial
information?
Results from the test phase in which the availability of different types of spa-
tial information was systematically manipulated allowed for first answers: In
the fog condition (see Figure 1d) only obstacles in close proximity were visible.
By these means, global spatial information was erased (i.e., distal global land-
marks and spatial information emerging from lined-up obstacles such as visual
gateways or corridors). We observed correlations between participants' route
similarity values and their performance in the fog condition. Specifically, indi-
viduals showing a high variability in route choices showed a clear reduction of
navigation performance during the fog condition as compared to the last training
session. Individuals with a low variability in route choices, on the other hand,
were largely unaffected by the fog. These results suggest that participants with
variable route choice behavior strongly relied on distal or global spatial infor-
mation, while participants exhibiting conservative route choice behavior rather
relied on proximal spatial information, as provided by the close-by obstacles or
obstacle configurations. A straightforward assumption is that the latter group
learned local views (obstacle configurations) and corresponding movement de-
cisions (c.f. [23]) during the training phase that were also available during

the fog condition. In other words, route knowledge for these participants would
be best described as a sequence of recognition triggered responses [1,3].
If, in fact, participants exhibiting conservative route choice behavior relied on
recognition triggered responses, and participants showing variable route choice
behavior primarily relied on distal, global spatial information or knowledge, the
following behavior had to be predicted for the no local obstacle condition: if all
local obstacles dissappear after reaching the feeder and only the distal global
landmarks remained, returning to the home should be impossible for partici-
pants relying on recognition triggered responses only. Participants relying on
global information, on the other hand, should be able to solve the task. In con-
trast to these predictions, results demonstrate that all participants were able to
solve the task with a certain accuracy (see Figures 5 and 6). Furthermore, virtu-
ally no correlation (r=-.13) was found between participants’ route similarities in
the last training session and their homing performance in the no local obstacle
condition. This disproves the explanations given above: apparently participants
showing conservative route choice behavior did not solely rely on stored views
and remembered movement decisions (i.e., recognition triggered responses), but
had additional spatial knowledge allowing them to solve the homing task. A
detailed inspection of their homing trajectories revealed that some participants
reproduced the overall form of their habitual routes from the last training session
(see Figure 6). There are two ways of achieving such behavior: (1) participants
learned a motor program during training that was replayed during the no local
obstacle condition, or (2) they possessed a metric representation of the estab-
lished routes. While this experiment does not allow distinguishing between these
alternatives, informal interviews with participants after the experiment support the
latter explanation.
Taken together, we have shown that participants could learn to efficiently nav-
igate between two locations in a complex cluttered virtual environment, lacking
predefined places, decision points, and road networks. In such unstructured en-
vironments a route is best described as a sequence of places defined by views or
object configurations [3], rather than as a sequence of places defined by unique
single objects. Analyzing participants' navigation behavior, we could show strong
interindividual differences that could be related to different navigation or orien-
tation strategies taking different kinds of spatial information into account. Specif-
ically, participants showing a high variability in their route choices depended on
distal spatial information, suggesting that they learned global directions and
distances between relevant locations. Participants who established fixed routes
instead relied on proximal obstacles to guide their movements. However, even
if such local spatial information was not available, some were able to reproduce
the overall form of their preferred paths. Apparently they learned more than
reflex-like recognition triggered responses during training, presumably generat-
ing a metric representation of their preferred paths. These results are not in
line with the dominant landmark-to-route-to-survey-knowledge framework of
spatial knowledge acquisition [6], which states that survey knowledge does not emerge
until route knowledge is established. Apparently some participants were able to

learn about distances and directions in the environment without first establish-
ing route knowledge (cf. [5]). The fact that participants’ route similarities in
their last training session did not fall into two distinct clusters but constituted a
continuum, furthermore, suggests that the two learning strategies sketched above
are not exclusive but complementary, existing in parallel (cf. [4]), and that dif-
ferent participants weighted them differently. It is highly likely that these weights
are adapted during the course of learning.
Further research is needed to answer questions arising from this exploratory
study. For example, what triggers the usage of which strategy? How are the
strategies related to each other? And, how is metric information entangled with
the strategies applied?

Acknowledgement. The work described in the paper was supported by the
European Commission (FP6-2003-NEST-PATH Project “Wayfinding”).

References
1. Janzen, G., van Turennout, M.: Selective neural representation of objects relevant
for navigation. Nature Neuroscience 7(6), 572–574 (2004)
2. Kohler, M., Wehner, R.: Idiosyncratic route-based memories in desert ants,
Melophorus bagoti: How do they interact with path-integration vectors? Neuro-
biol. Learn. Mem. 83, 1–12 (2005)
3. Mallot, H., Gillner, S.: Route navigation without place recognition: What is recog-
nized in recognition triggered responses? Perception 29, 43–55 (2000)
4. Aginsky, V., Harris, C., Rensink, R., Beusmans, J.: Two strategies for learning
a route in a driving simulator. Journal of Environmental Psychology 17, 317–331
(1997)
5. Ishikawa, T., Montello, D.: Spatial knowledge acquisition from direct experience
in the environment: Individual differences in the development of metric knowledge
and the integration of separately learned places. Cognitive Psychology 52, 93–129
(2006)
6. Siegel, A., White, S.: The development of spatial representations of large-scale
environments. Advances in child development and behavior 10, 9–55 (1975)
7. Restle, F.: Discrimination cues in mazes: A resolution of the ’place-vs-response’
question. Psychological Review 64(4), 217–228 (1957)
8. Leonard, B., McNaughton, B.: Spatial representation in the rat: Conceptual, be-
havioural and neurophysiological perspectives. In: Kesner, R., Olton, D.S. (eds.)
Comparative Cognition and Neuroscience: Neurobiology of Comparative Cogni-
tion. Hillsdale, New Jersey (1990)
9. Taylor, H., Naylor, S., Chechile, N.: Goal-specific influences on the representation
of spatial perspective. Memory and Cognition 27, 309–319 (1999)
10. Trullier, O., Wiener, S., Berthoz, A., Meyer, J.A.: Biologically based artificial nav-
igation systems: review and prospects. Progress in Neurobiology 51(5), 483–544
(1997)
11. Kuipers, B.: The spatial semantic hierarchy. Artificial Intelligence 119, 191–233
(2000)
12. Gillner, S., Mallot, H.: Navigation and acquisition of spatial knowledge in a virtual
maze. Journal of Cognitive Neuroscience 10, 445–463 (1998)
13. Hölscher, C., Meilinger, T., Vrachliotis, G., Brösamle, M., Knauff, M.: Up the down
staircase: Wayfinding strategies and multi-level buildings. Journal of Environmen-
tal Psychology 26(4), 284–299 (2006)
14. Wehner, R., Boyer, M., Loertscher, F., Sommer, S., Menzi, U.: Ant navigation:
One-way routes rather than maps. Current Biology 16, 75–79 (2006)
15. Graham, P., Collett, T.: Bi-directional route learning in wood ants. Journal of
Experimental Biology 209, 3677–3684 (2006)
16. Judd, S., Collett, T.S.: Multiple stored views and landmark guidance in ants. Na-
ture 392, 710–714 (1998)
17. Diwadkar, V., McNamara, T.: Viewpoint dependence in scene recognition. Psycho-
logical Science 8, 302–307 (1997)
18. Needleman, S., Wunsch, C.: A general method applicable to the search for similar-
ities in the amino acid sequence of two proteins. J. Mol. Biol. 48, 443–453 (1970)
19. Basten, K., Mallot, H.: Building blocks for trail analysis (in preparation, 2008)
20. Gaunet, F., Vidal, M., Kemeny, A., Berthoz, A.: Active, passive and snapshot
exploration in a virtual environment: influence on scene memory, reorientation and
path memory. Cognitive Brain Research 11(3), 409–420 (2001)
21. Munzer, S., Zimmer, H., Schwalm, M., Baus, J., Aslan, I.: Computer-assisted navi-
gation and the acquisition of route and survey knowledge. Journal of Environmental
Psychology 26(4), 300–308 (2006)
22. Bisch-Knaden, S., Wehner, R.: Landmark memories are more robust when acquired
at the nest site than en route: experiments in desert ants. Naturwissenschaften 90,
127–130 (2003)
23. Christou, C.G., Bülthoff, H.H.: View dependence in scene recognition after active
learning. Memory and Cognition 27(6), 996–1007 (1999)
Learning with Virtual Verbal Displays: Effects of
Interface Fidelity on Cognitive Map Development

Nicholas A. Giudice¹ and Jerome D. Tietz²*

¹ Department of Spatial Information Science and Engineering,
348 Boardman Hall, University of Maine, Orono, ME 04469
giudice@psych.ucsb.edu
² Department of Psychology, University of California, Santa Barbara,
Santa Barbara, CA 93106-9660

* The authors thank Jack Loomis for insightful comments on the manuscript, Maryann Betty for
experimental preparation, Brandon Friedman for assistance in running participants, and Ma-
saki Miyanohara for helping with running participants and data analysis. This work was sup-
ported by an NRSA grant to the first author, #1F32EY015963-01.

Abstract. We investigate verbal learning and cognitive map development of
simulated layouts using a non-visual interface called a virtual verbal display
(VVD). Previous studies have questioned the efficacy of VVDs in supporting
cognitive mapping (Giudice, Bakdash, Legge, & Roy, in revision). Two factors
of interface fidelity are investigated which could account for this deficit, spatial
language vs. spatialized audio and physical vs. imagined rotation. During train-
ing, participants used the VVD (Experiments 1 and 2) or a visual display (Ex-
periment 3) to explore unfamiliar computer-based layouts and seek out target
locations. At test, participants performed a wayfinding task between targets in
the corresponding real environment. Results demonstrated that only spatialized
audio in the VVD improved wayfinding behavior, yielding performance almost
identical to that found in the visual condition. These findings suggest that
learning with both modalities led to comparable cognitive maps and demon-
strate the importance of incorporating spatial cues in verbal displays.

Keywords: wayfinding, verbal learning, spatialized audio, interface fidelity.

1 Introduction
Most research investigating verbal spatial learning has focused on comprehension of
route directions or the mental representations developed from reading spatial texts
[1-4]. Owing to this research emphasis, much less is known about the efficacy
of verbal information to support real-time spatial learning and navigation. What dis-
tinguishes a real-time auditory display from other forms of spatial verbal information
is the notion of dynamic updating. In a dynamically-updated auditory display, the
presentation of information about a person’s position and orientation in the environ-
ment changes in register with physical movement. For example, rather than receiving
a sequential list of all the distances and turns at the beginning of a route, as is done
with traditional verbal directions, a real-time display provides the user with context-
sensitive information with respect to their current location/heading state as they
progress along the route. Vehicle-based navigation systems utilizing GPS and speech-
based route directions represent a good example of these dynamic displays. Dynamic
auditory interfaces also have relevance in navigation systems for the blind, and in this
context, they have proven extremely effective in supporting real-time route guidance
[see 5 for a review].
Rather than addressing route navigation, the current research uses free exploration
of computer-simulated training layouts to investigate environmental learning. The
training environments are explored using a non-visual interface called a virtual verbal
display (VVD). The VVD is based on dynamically-updated geometric descriptions,
verbal messages which provide real-time orientation and position information as well
as a description of the local layout geometry [see 6 for details]. A sample output
string is: “You are facing West, at a 3-way intersection, there are hallways ahead, left,
and behind.” If a user executed a 90° left rotation at this t-junction, the VVD would
return an updated message to reflect that he/she was now facing South, with hallways
extending ahead, left, and right.
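
As an illustration of this behavior, the following Python sketch generates such geometric descriptions from the user's absolute heading and the hallway directions at the current node, and updates them after a quantized 90° rotation. It is hypothetical code: the function and variable names are invented and are not taken from the study's software.

HEADINGS = ["North", "East", "South", "West"]   # clockwise order

def rotate(heading_idx, quarter_turns):
    # Return the heading index after the given number of 90-degree
    # clockwise quarter turns (negative values turn left).
    return (heading_idx + quarter_turns) % 4

def describe(heading_idx, hallway_dirs):
    # Build a VVD-style message from the absolute heading and the set of
    # absolute hallway directions (indices into HEADINGS) at this node.
    labels = {0: "ahead", 1: "right", 2: "behind", 3: "left"}
    relative = sorted((d - heading_idx) % 4 for d in hallway_dirs)
    parts = ", ".join(labels[r] for r in relative)
    return ("You are facing %s, at a %d-way intersection, there are hallways %s."
            % (HEADINGS[heading_idx], len(hallway_dirs), parts))

west, south, east = 3, 2, 1
print(describe(west, {west, south, east}))               # facing West at the t-junction
print(describe(rotate(west, -1), {west, south, east}))   # after a 90-degree left turn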
We know that geometric-based displays are extremely effective for supporting free
exploration (open search) in both real and computer-based layouts [7-9]. However, their
efficacy for supporting cognitive map development is unclear. That is, participants who
trained using a virtual verbal display to search computer-based environments performed
significantly worse on subsequent wayfinding tests in the corresponding real environ-
ment [7, 8] than subjects who trained and tested exclusively in real environments [9].
These findings suggest that training with a virtual verbal display results in impoverished
environmental learning and cognitive map development compared to use of the same
verbal information for searching real building layouts. This deficit cannot be attributed
to environmental transfer more generally, as previous studies have demonstrated that
learning in virtual environments (VEs) transfers to accurate real-world navigation, even
with perceptually sparse visual displays similar to our geometric verbal display [10-12].
The current studies investigate several factors of interface fidelity which may ac-
count for problems in spatial knowledge acquisition with the VVD. As described by
Waller and colleagues [13], interface fidelity refers to how the input and output of
information from the virtual display is used, i.e. how one’s physical actions affect
movement in the VE and how well feedback from the system supports normal percep-
tual-motor couplings. These interactions can be distinguished from factors relating to
environment fidelity, which refers to how well the information rendered in the VE
resembles the real environment, e.g. sensory richness, spatial detail, surface features,
and field of view [13]. Our previous work with VVDs dealt with environment fidelity,
investigating whether describing more of the layout from a given vantage point, called
“verbal view depth,” would facilitate learning of global structure and aid subsequent
wayfinding behavior. However, the lackluster environmental transfer performance
with three levels of verbal view depth, ranging from local to global descriptions,
demonstrated that deficits in cognitive map development were not due to availability
of environmental information but to the interface itself [7, 8].
The current experiments hold environmental variables constant and manipulate
several factors relating to interface fidelity. Experiment 1 compares traditional verbal
descriptions, where the message is delivered as a monaural signal to both ears, with
spatialized audio descriptions, where the message is heard as coming from a specific
direction, e.g. a hallway to the left would be heard as a description emanating from
the navigator’s left side. Experiment 2 addresses the influence of body-based informa-
tion, e.g. physical rotation vs. imagined rotation. Experiment 3 follows the same
design as the first two verbal studies but uses a visual display as a control. All ex-
periments incorporate training in computer-based layouts and environmental transfer
requiring wayfinding in the corresponding real environment. Our focus is on the
transfer tests, as they provide the best index of environmental learning and cognitive
map development.

2 Experiment 1
In this study, blindfolded participants are given a training period where they use ver-
bal descriptions to freely explore unfamiliar computer-based floors of university
buildings and seek out four target locations. At test, they must find routes between
target pairs in the corresponding real environment. This design is well-suited for ad-
dressing environmental learning, as theories of cognitive map development have long
emphasized the importance of free exploration and repeated environmental exposure
[14, 15]. The wayfinding test represents a good measure of cognitive map accuracy,
as the task cannot be accomplished using a route matching strategy. Since no
routes are specified during training, accurate wayfinding behavior requires subjects to
form a globally coherent representation of the environment, i.e. the trademark of a
cognitive map [16].
Our previous work with virtual verbal displays was based exclusively on spatial lan-
guage (SL), i.e. consistent, unambiguous terminology for describing spatial relations
[17]. The problem with any purely linguistic display is that the information provided is
symbolic. A description of a door at 3 o’clock in 10 feet has no intrinsic spatial content
and requires cognitive mediation to interpret the message. By contrast, a spatialized
audio (SA) display is perceptual, directly conveying spatial information about the envi-
ronment by coupling user movement with the distance and direction of object locations
in 3-D space. For instance, rather than describing the location of the door, the person
simply hears its name as coming from that location in the environment.
Several lines of research support the benefit of spatialized auditory displays. Ex-
periments comparing different non-visual displays with a GPS-based navigation sys-
tem for the blind have shown that performance on traversing novel routes, finding
landmarks, and reaching a goal state is superior when guided with spatialized audio
versus spatial language [18-20]. Research has also shown that spatialized auditory
displays are beneficial as a navigation aid during real-time flight [21] and for provid-
ing non-visual information to pilots in the cockpit of flight simulators [22]. It is pre-
dicted that spatialized audio displays will have similar benefits on cognitive map
development, especially when training occurs in computer-based environments as are
used here. Spatial updating and environmental learning are known to be more cogni-
tively effortful in VEs than in real spaces [23, 24]. However, recent work suggests
that SA is less affected by cognitive load than SL during guidance of virtual routes,
yielding faster and more accurate performance in the presence of a concurrent distrac-
tor task [25]. These findings indicate that the use of SA in the VVD may reduce the
working memory demands associated with virtual navigation, thus increasing re-
sources available for cognitive map development.
To address this issue, Experiment 1 compared environmental learning using virtual
verbal displays based on spatial language descriptions about layout geometry [8] with
identical descriptions that added spatial information to the signal, e.g. a hallway on
the left would be heard in the left ear.

2.1 Method

Participants. Fourteen blindfolded-sighted participants, ages 18-32 (mean = 20.6),
balanced equally by gender, ran in the two-hour study. Subjects in all experiments
were unfamiliar with the test environments, reported normal (or corrected to normal)
visual and auditory acuity, gave informed consent, and received course credit for their
participation.

Environments and Apparatus. Two simulated floors of the UC Santa Barbara
Psychology building, and their physical analogs, were used. The simulated
layouts were rendered to be perceptually sparse, with the verbal messages providing
information about the user’s facing direction and layout geometry only. The simulated
layouts were broken into corridor segments separated by nodes (each segment
approximated ten feet in the real space). The floors averaged 100.5 m of hallway
extent and 8.5 intersections. Each floor contained four targets that had to be found
during the search. The name of each target was spoken whenever subjects reached its
location in the layout (subjects were told the names of the four targets before starting the
trial). Figure 1 shows an illustration of an experimental environment and sample
verbal descriptions.

Fig. 1. Experimental layout with target locations denoted. What is heard upon entering an
intersection (listed above and below the layout) is depicted in gray. Each arrow represents the
orientation of a user at this location.

Participants navigated the virtual environments using the arrow keys on a USB
numberpad. Pushing the up arrow (8) translated the user forward and the left (4) and
right (6) arrows rotated them in place left or right respectively. Forward movements
were made in discrete "steps," with each forward key press virtually translating the
navigator ahead one corridor segment (approximately ten feet in the real environ-
ment). Left-right rotations were quantized to 90 degrees. Pressing the numpad 5 key
repeated the last verbal message spoken and the 0 key served as a "shut-up" function
by truncating the active verbal message.
Verbal descriptions, based on a female voice, were generated automatically upon
reaching an intersection or target location and rotation at any point returned an up-
dated heading, e.g. “facing north”. A footstep sound was played for every forward
move when navigating between hallway junctions. Movement transitions took ap-
proximately 750 ms. The Vizard 3-D rendering application (www.worldviz.com) was
used to coordinate the verbal messages, present a visual map of what was being heard
for experimenter monitoring, and to log participant search trajectories for subsequent
analyses.
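
One way to picture this input handling and trajectory logging is the following Python sketch. It is purely illustrative: the class and attribute names are invented and the sketch is not based on the Vizard API actually used in the study.

class VVDNavigator:
    def __init__(self, layout, start_node, start_heading):
        self.layout = layout          # node -> {heading index: neighboring node}
        self.node = start_node
        self.heading = start_heading  # 0 = North, 1 = East, 2 = South, 3 = West
        self.log = []                 # one (key, node, heading) entry per key press

    def handle_key(self, key):
        # Keys follow the numberpad mapping described above; keys 5 (repeat
        # message) and 0 ("shut up") would control speech playback only.
        if key == "8":                # translate forward one corridor segment
            next_node = self.layout[self.node].get(self.heading)
            if next_node is not None:
                self.node = next_node
        elif key == "4":              # rotate 90 degrees left in place
            self.heading = (self.heading - 1) % 4
        elif key == "6":              # rotate 90 degrees right in place
            self.heading = (self.heading + 1) % 4
        self.log.append((key, self.node, self.heading))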

2.2 Design and Procedure

A within-subjects design was used, with each participant running in one spatial language
condition and one spatialized audio condition, counterbalanced across the two experimental envi-
ronments. The experiment comprised three phases. During practice, the movement
behavior was demonstrated and participants were familiarized with the speech output
from the VVD on a visual map depicting what would be spoken for each type of in-
tersection.

Training Phase. To start the trial, blindfolded participants stood in the center of a
one-meter-radius circle with four three-inch RadioShack speakers mounted on tripods (at a
height of 152 cm) placed on the circumference at azimuths of 0° (ahead), 90° (right),
180° (behind) and 270° (left). In the SL conditions, the verbal message was
simultaneously presented from the left and right speaker only. With the SA
conditions, the participant heard the verbal message as coming from any of the four
speakers based on the direction of the hallway being described. The spatialized audio
messages were generated by sending the signal from the speaker outputs on the
computer’s sound card (Creative Labs Audigy2 Platinum) to a four-channel
multiplexer which routed the audio to the relevant speaker. The input device was
affixed via Velcro to an 88 cm stand positioned directly in front of them.
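
The routing decision for the spatialized audio condition can be summarized by a short sketch (hypothetical Python; the multiplexer hardware is abstracted away and all names are invented): the chosen speaker is simply the described hallway's direction expressed relative to the user's current heading.

SPEAKER = {0: "ahead (0 deg)", 1: "right (90 deg)",
           2: "behind (180 deg)", 3: "left (270 deg)"}

def speaker_for_hallway(heading_idx, hallway_dir):
    # Pick the speaker whose azimuth matches the described hallway's
    # direction relative to the user's facing direction (0=N, 1=E, 2=S, 3=W).
    return (hallway_dir - heading_idx) % 4

# Facing West (3), a hallway to the South (2) lies on the user's left,
# so its description is routed to the 270-degree speaker:
print(SPEAKER[speaker_for_hallway(3, 2)])    # -> left (270 deg)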
Subjects were started from an origin position in the layout, designated as "start"
and instructed to freely explore the environment using the verbal descriptions to ap-
prehend the space and the input device to effect movement. Their task for the training
period was to cover the entire layout during their search and to seek out four hidden
target locations. Although no explicit instructions were given about search strategy or
specific routes, they were encouraged to try to learn the global configuration of the
layout and to be able to navigate a route from any target to any other target. The train-
ing period continued until the number of forward moves in their search trajectory
equaled three times the number of segments comprising the environment. Participants
were alerted when 50 % and 75 % of their moves were exhausted.

Testing Phase. Upon completion of the training period, participants performed the
transfer tests. Blindfolded, they were led via a circuitous route to the corresponding
physical floor and started at one of the target locations. After removing the blindfold,
participants were told they were now facing north, standing at target X and requested
to walk the shortest route to target Y. They performed this wayfinding task using
vision; no verbal descriptions about the environment or target locations were given.
Participants indicated that they had reached the destination by speaking the target’s
name (e.g., “I have reached target dog”). To reduce accumulation of error between
trials, they were brought to the actual target location for incorrectly localized targets
before proceeding. Participants found routes between four target pairs, the order of
which was counterbalanced.

Analysis. Although our focus was on transfer performance, three measures of search
behavior were also analyzed from the training phase in all experiments:
1. Floor coverage percent: the number of unique segments traversed during train-
ing divided by the total number of segments in the environment.
2. Unique targets percent: ratio of unique targets encountered during training to the
total number of target locations (4).
3. Shortest routes traversed: sum of all direct routes taken between target locations
during the search period. A shortest route equals the route between target loca-
tions with the minimum number of intervening segments.
Two wayfinding test measures were analyzed for all studies during the transfer phase
in the real building (a computational sketch of all five measures follows the list):
1. Target localization accuracy percent: ratio of target locations correctly found at
test to the total number of target localization trials (four).
2. Route efficiency: length of the shortest route between target locations divided by
length of the route traveled (only calculated for correct target localization trials).
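
The Python sketch below summarizes how the five measures just defined could be computed. It is a minimal illustration under the assumption that the corridor layout is given as an adjacency mapping between segments; it is not the analysis code used in the study.

from collections import deque

def floor_coverage(visited_segments, all_segments):
    # Measure 1: unique segments traversed / total segments, in percent.
    return 100.0 * len(set(visited_segments)) / len(all_segments)

def unique_targets(found_targets, all_targets):
    # Measure 2: unique targets encountered / total targets, in percent.
    return 100.0 * len(set(found_targets)) / len(all_targets)

def shortest_route_length(adjacency, start, goal):
    # Breadth-first search giving the minimum number of intervening segments
    # between two locations (used for measure 3 and for route efficiency).
    distance, frontier = {start: 0}, deque([start])
    while frontier:
        node = frontier.popleft()
        if node == goal:
            return distance[node]
        for neighbor in adjacency[node]:
            if neighbor not in distance:
                distance[neighbor] = distance[node] + 1
                frontier.append(neighbor)
    return None

def target_localization_accuracy(correct_trials, total_trials=4):
    # Transfer measure 1: correctly found targets / localization trials, in percent.
    return 100.0 * correct_trials / total_trials

def route_efficiency(adjacency, start, goal, traveled_length):
    # Transfer measure 2: shortest route length / traveled route length,
    # computed only for correctly localized targets.
    return 100.0 * shortest_route_length(adjacency, start, goal) / traveled_length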

2.3 Results and Discussion

As predicted, training performance using both VVD display modes revealed accurate
open search behavior. Collapsing across SL and SA conditions, participants covered
97.3% of the segments comprising each floor, found 97.3% of the target locations and
traveled an average of 9.9 shortest routes between targets. By comparison, the theo-
retical maximum number of shortest routes traveled during the training period, given
100% floor coverage with the same number of moves, is 14.5 (averaged across
floors). Results from the inferential tests provide statistical support for the near identi-
cal performance observed between inputs; none of the one-way repeated measures
ANOVAs conducted for each training measure revealed reliable differences between
SL and SA conditions, all ps > .1. Indeed, performance on the training measures was
almost identical for all conditions across experiments (see Table 1 for a comparison of
all means and standard errors). These findings indicate that irrespective of training
condition, subjects adopted a broadly distributed, near optimal route-finding search
strategy.

Table 1. Training Measures of Experiments 1-3 by Condition. Each cell represents the mean (±
SEM) on three measures of search performance for participants in experiments 1-3. No signifi-
cant differences were observed between any of the dependent measures.

Experiment   Condition                      Floor          Unique           Total Shortest
                                            Coverage (%)   Targets Hit (%)  Routes Traversed
1 (N=14)     Spatialized Audio              98.46(1.35)    98.21(1.79)      10.71(0.98)
             Spatial Language               96.14(2.98)    96.43(3.57)       9.07(1.22)
2 (N=16)     Rotation + Spatialized Audio   98.99(0.85)    100.00(0)        12.0625(0.99)
             Rotation + Spatial Language    99.14(0.50)    98.44(1.56)      10.94(1.11)
3 (N=14)     Visual Control                 97.34(2.46)    96.43(2.43)      11.07(1.29)

Environmental learning/cognitive map development was assessed using a wayfinding
test in the physical building. To address the effect of spatialization, one-way re-
peated measures ANOVAs comparing spatial language and spatialized audio were
conducted for the two transfer test measures of target localization accuracy and route
efficiency. Participants who trained using spatialized audio in the VVD correctly
localized significantly more targets, 76.8% (SE = 6.12) than those who learned with
spatial language, 51.8% (SE = 10.63), F(1,13) = 7.583, p = .016, η2 = 0.39. Target
localization accuracy in both conditions, averaging 64.3%, was significantly above
chance performance of ~3%, defined as one divided by 33 possible target locations,
e.g. a target can be located at any of the 33 segments comprising the average envi-
ronment, t(27) = 8.94, p < 0.001.
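
For concreteness, here is a hedged sketch of that within-subjects analysis in Python using SciPy. The accuracy arrays below are random placeholders, not the study's data, and with only two conditions the one-way repeated measures ANOVA reduces to a paired t-test (F = t^2).

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sa = rng.uniform(50, 100, size=14)   # placeholder per-participant accuracies (SA)
sl = rng.uniform(25, 100, size=14)   # placeholder per-participant accuracies (SL)

t, p = stats.ttest_rel(sa, sl)       # paired comparison of the two conditions
print("F(1,%d) = %.3f, p = %.3f" % (len(sa) - 1, t ** 2, p))

# One-sample test of pooled accuracy against the ~3% chance level
# (one target out of the 33 segments of the average environment):
pooled = np.concatenate([sa, sl])
t_c, p_c = stats.ttest_1samp(pooled, 100.0 / 33)
print("t(%d) = %.2f, p = %.3g" % (len(pooled) - 1, t_c, p_c))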
Route efficiency for correctly localized targets did not reliably differ between
conditions, SA, 96.9% (SE = 1.6) and SL, 95.9% (SE = 2.6), ps >.1. Note that route
efficiency is calculated for correctly executed routes only. Thus, the near ceiling per-
formance simply means that the routes that were known were followed optimally. It is
likely that this measure would be more sensitive to detecting differences between
conditions on floors having greater topological complexity.
Experiment 1 investigated the effect of spatialization on environmental learning.
Results from the spatial language condition almost perfectly replicated a previous
experiment using the same SL condition and near identical design [8]. Both studies
showed that training with the VVD led to efficient search behavior but poor wayfind-
ing performance in the real environment, 51.8% target localization accuracy in the
current study vs. 51.3% accuracy found previously. These findings support our hy-
pothesis that limitations arising from use of the VVD are not due to problems
performing effective searches but to deficits in building up accurate spatial represen-
tations in memory.
The SA condition, serving as a perceptual interface providing direct access to spa-
tial relations, was tested here because we believed that it would aid cognitive map
development. The confirmatory results were dramatic. Participants who trained in
computer-based layouts using spatialized audio demonstrated 50% better wayfinding
performance in the corresponding real building than when they trained with spatial
language. The 76.8% target localization accuracy in the SA condition was also on par
with target localization performance of 80% observed in a previous study after verbal
learning in real buildings [9]. This similarity is important as it shows that the same
level of spatial knowledge acquisition is possible between learning in real and virtual
environments. Our results are consistent with the advantage of spatialized auditory
displays vs. spatial language found for route guidance [18-20, 25] and extend the
efficacy of spatialized audio displays for supporting cognitive mapping and wayfind-
ing behavior.

3 Experiment 2
Experiment 2 was designed to assess the contribution of physical body movement
during virtual verbal learning on cognitive map development. Navigation with our
virtual verbal display, as with most desktop virtual environment technologies, lacks
the idiothetic information which is available during physical navigation, i.e. body-
based movement cues such as proprioceptive, vestibular, and biomechanical feed-
back. VEs incorporating these cues have greater interface fidelity as the sensorimotor
contingencies are more analogous to real-world movement [26]. Various spatial be-
haviors requiring accessing an accurate cognitive map show improved performance
when idiothetic information is included. For instance, physical rotation during VE
learning vs. imagined rotation benefits tasks requiring pointing to previously learned
targets [27, 28], estimation of unseen target distances [29] and updating self orienta-
tion between multiple target locations [30]. Path integration is also better in VEs pro-
viding proprioceptive and visual information specifying rotation compared to visual
information in isolation [31]. The inclusion of idiothetic information has also led to
improved performance on cognitive mapping tasks similar to the current experiment,
where VE learning is tested during transfer to real-world navigation [32, 33].
Where the previous work has addressed the role of body-based cues with visual
displays, Experiment 2 investigates whether similar benefits for verbal learning mani-
fest when physical body rotation is included in the VVD. As with experiment 1, par-
ticipants use the VVD to explore computer-based training environments and then
perform wayfinding tests in the corresponding real environment. However, rather than
using arrow keys to effect imagined rotations and translations during training, partici-
pants physically turn in place whenever they wish to execute a change of heading.
Translations are still done via the keypad as the benefit of physical translation on VE
learning is generally considered nominal. This is consistent with studies in real envi-
ronments showing that pointing to target locations is faster and more accurate after
actual than imagined rotations, whereas errors and latencies tend not to differ between
real and imagined translations [34].
We predict that inclusion of idiothetic information in the VVD will yield marked
improvements in spatial knowledge acquisition and cognitive map development. In
addition to the previous evidence supporting body-based cues, we believe the conver-
sion of linguistic operators into a spatial form in memory is a cognitively effortful
process, facilitated by physical movement. Evidence from several studies support this
movement hypothesis. Avraamides and colleagues (Experiment 3, 2004) showed that
mental updating of allocentric target locations learned via spatial language was
impaired until the observer was allowed to physically move before making their judg-
ments, presumably inducing the spatial representation. Updating object locations
learned from a text description is also improved when the reader is allowed to physi-
cally rotate to the perspective described by the text [35], with egocentric direction
judgments made faster and more accurately after physical, rather than imagined
rotation [36].
To test our prediction, this experiment adds real rotation to the spatialized audio
and spatial language conditions of Experiment 1. If the inclusion of rotational infor-
mation is critical for supporting environmental learning from verbal descriptions,
wayfinding performance during real-world transfer should be better after training with
both physical rotation conditions of the current experiment than was observed in the
analogous conditions with imagined rotation of Experiment 1. Furthermore, assuming
some level of complementarity between rotation and spatialization, the rota-
tion+spatialized audio (R+SA) condition is predicted to show superior performance to
the rotation+spatial language (R+SL) condition.

3.1 Method

Sixteen blindfolded-sighted participants, nine female and seven male, ages 18-24
(mean = 19.6), ran in the two-hour study.
Experiment 2 employs the same spatial language and spatialized audio conditions as
Experiment 1 and adopts the same within-subjects design using two counterbalanced condi-
tions, each including a practice, training, and transfer phase. The only difference from
Experiment 1 is that during the training phase, participants used real body rotation in
the VVD instead of imagined rotation via the arrow keys. Since all intersections were
right-angled, left and right rotations always required turning 90° in place. An automati-
cally-updated heading description was generated when their facing direction was ori-
ented with the orthogonal corridor. They could then either continue translating by
means of the keypad or request an updated description of intersection geometry.
Heading changes were tracked using a three degree-of-freedom (DOF) inertial orienta-
tion tracker called 3D-Bird (Ascension Corporation: http://www.ascension-tech.com/
products/3dbird.php).
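
A minimal sketch of how the tracked yaw might be quantized into the four corridor headings is shown below. It is hypothetical Python: the 3D-Bird driver is abstracted into a yaw value in degrees, and the 15° alignment tolerance is an assumption, not a value reported in the paper.

HEADINGS = ["north", "east", "south", "west"]
TOLERANCE = 15.0   # assumed alignment tolerance in degrees

def quantized_heading(yaw_deg):
    # Return the corridor direction the user is aligned with, or None if
    # they are still mid-turn (more than TOLERANCE degrees off-axis).
    nearest = int(round(yaw_deg / 90.0)) % 4
    error = abs(((yaw_deg - nearest * 90.0) + 180.0) % 360.0 - 180.0)
    return HEADINGS[nearest] if error <= TOLERANCE else None

def on_tracker_sample(yaw_deg, state):
    # Speak an updated heading message only when a new 90-degree alignment
    # is reached, as described for the rotation conditions.
    heading = quantized_heading(yaw_deg)
    if heading is not None and heading != state.get("last"):
        state["last"] = heading
        print("facing " + heading)   # stand-in for the speech output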

3.2 Results and Discussion

To address the effect of rotation on environmental learning and wayfinding perform-
ance, one-way repeated measures ANOVAs were conducted for the transfer tests of
target localization accuracy and route efficiency. Results indicated a significant dif-
ference for target localization only, with the 78.1% (SE = 5.03) accuracy of the rota-
tion+spatialized audio condition found to be reliably better than the 57.8% (SE =
9.33) accuracy of the rotation+spatial language condition, F(1,15) = 4.601, p<.05, η2
= 0.24. Performance on route efficiency did not differ between conditions, R+SA =
98.5% and R+SL = 100%. As discussed in Experiment 1, the results of this measure
say far less about cognitive map development than does target localization
performance. Since we were interested in evaluating whether the
physical rotation conditions were better than the same conditions using imagined
rotation of Experiment 1, we performed a two-way between subjects ANOVA
comparing target accuracy performance between experiments by spatialized and
non-spatialized conditions. This between-subjects comparison is appropriate as the subject
groups in both experiments were similar in age, sex, educational background and
spatial ability (as assessed by the Santa Barbara Sense of Direction Scale, SBSOD).
As can be seen in Figure 2, results showed a main effect of target accuracy by spati-
alization, F(1,28) = 11.753, p < .05, η² = 0.3, but the more meaningful experiment by
spatialization interaction was not significant, F(1,28) = .126, p > .1, η² = 0.004.
Likewise, a one-way ANOVA comparing target localization accuracy collapsed
across condition between experiments, thereby directly addressing the influence of
rotation factoring out spatialization, was not significant, p > .1.
The results of Experiment 2 paint a clear, yet surprising picture. The addition of
physical rotation in the VVD was predicted to significantly benefit spatial knowledge
acquisition and cognitive map development, as “real” movement was thought to be
particularly important in converting the symbolic verbal messages into a spatial form
in memory. While there was a difference in transfer performance between conditions
in this experiment, comparison of the data to analogous conditions of experiment 1
confirm that this difference was driven by the presence of spatialized audio descrip-
tions, not physical rotation. Subjects in the SL condition of experiment 1 found routes
between targets in the real building with 51.8% accuracy. The 57.8% accuracy of the
R+SL condition of Experiment 2, which is identical to that condition except for the
addition of real vs. imagined body turning during training, represents a small, nonsig-
nificant performance improvement, p > .1. Likewise, the absence of reliable differ-
ences between the SA condition of Experiment 1 and the same condition with rotation
in Experiment 2 (76.8% vs. 78.1% correct target accuracy respectively), demonstrates
that the addition of physical rotation did not benefit environmental learning.

Fig. 2. Comparison of mean target localization accuracy (± SEM) between Experiments 1 and
2. Note: Both experiments compared SL and SA conditions but Experiment 1 used imagined
rotation and Experiment 2 (gray bars) used body rotation.

The finding that idiothetic information did not benefit transfer performance was
unexpected given previous literature showing that physical body movement during
and after verbal learning significantly improves latency and error performance at test
[35-37]. Differences in task demands likely contribute to these findings. In the previ-
ous studies, subjects learned a series of target locations from text or speech descrip-
tions and then were tested using a pointing-based spatial updating task. The increased
weighting of physical movement demonstrated in those studies may be less important
with the free exploration paradigm and transfer tests used here, as these tasks do not
force updating of Euclidean relations between targets. Thus, the addition of a pointing
task between target locations may have shown greater benefit of physical rotation
than was evident from our wayfinding task. This needs to be addressed in future ex-
periments as it cannot be resolved from the current data.

4 Experiment 3
Experiment 3 followed the same design as the previous two studies, but subjects
learned the computer-based training environments from a visual display rather than a
verbal display. The main goal of Experiment 3 was to provide a good comparison
benchmark with the previous two verbal experiments. Specifically, we wanted to
investigate whether learning with verbal and visual displays leads to comparable envi-
ronmental transfer performance, a finding which would provide proof of the efficacy of
the VVD. Our previous experiments, using an almost identical design to the current
studies, found that wayfinding performance during environmental transfer was sig-
nificantly worse after learning from a virtual verbal display than from a visual display
[8, Experiment 3, 10]. However, those studies only compared visual learning with a
spatial language condition, analogous to that used in experiment 1. By contrast, the
significantly improved transfer performance of the spatialized audio conditions is on
par with our previous findings with the visual display. Likewise, the SA conditions in
the first two experiments provide perceptual information about the direction of hall-
ways which is better matched with what is apprehended from a visual display. Since
the visual display and movement behavior in the previous studies differed slightly
from the information and movement of the VVD used here, Experiment 3 was run to
serve as a more valid comparison.

4.1 Method

Fourteen normally sighted participants, six females and eight males, ages 18-21
(mean = 19.2), ran in the one-hour study.
The experimental procedure was identical to the previous studies except that sub-
jects only learned one environment and trained with a visual display instead of the
VVD. During training, participants saw the same geometric “views” of the layout on
the computer monitor (Gateway VX700, 43.18 cm diagonal) as were previously de-
scribed with each message from the VVD. The environment was viewed from the
center of the monitor and movement was performed via the keypad’s arrow keys, as
described earlier. Figure 3 shows an example of what would be seen from a 3-way
intersection. With each translation, the participant heard the footstep sound and the
next corridor segment(s) was displayed with an animated arrow indicating forward
movement. With rotations, they saw the viewable segments rotate in place and an
updated indication of heading was displayed. In addition, they heard the target names,
starting location noise, footstep sound, and percent of training time elapsed via mon-
aural output through the same speakers. This design ensured the visual display was
equivalent in information content to what was available in the auditory conditions of
experiments 1 and 2.

Fig. 3. Sample 3-way intersection as seen on a visual display. Information seen from each view
is matched to what would be heard in the corresponding message from the VVD.

4.2 Results and Discussion

Performance on the transfer tests after visual learning was quite good, resulting in
target localization accuracy of 78.6% (SE = 8.6) and route efficiency of 95.6% (SE =
2.4). Given our interest in comparing learning performance between the visual display
and the VVD, independent samples t-tests were used to evaluate how wayfinding
performance after visual learning compared to the same tests after verbal learning in
Experiments 1 and 2. As the presence or absence of spatialized information was the
only factor that reliably affected verbal learning performance, the visual learning data
was only compared to the combined performance from the spatial language and spati-
alized audio conditions of the previous experiments, collapsing across imagined and
real rotation. Note that these between-subjects comparisons were based on partici-
pants drawn from a similar background and who fell within the same range of spatial
abilities as measured by the SBSOD scale. As can be seen in Figure 4, the 78.6% (SE
= 8.6) target localization performance observed after visual learning was significantly
better than the 54.6% (SE = 5.4) performance of the spatial language conditions, t(28)
= 2.345, p=.027. By contrast, target localization accuracy in the spatialized audio
conditions, 77.5% (SE = 4.1), was almost identical to performance in the visual condi-
tion, t(26) = .116, p = .908. In agreement with the previous studies, route efficiency
did not differ reliably between any of the conditions, ps > .1.
Experiment 3 was run to benchmark performance with the VVD against visual
learning. Replicating earlier work, transfer performance after learning from a visual
display was significantly better than learning with spatial language with a VVD [8,
Experiment 3, 10]. However, target localization accuracy between the spatialized
audio conditions and the visual condition were nearly identical. This finding suggests
that learning with a spatialized audio display and with an information-matched visual
display builds up spatial representations in memory that can be acted on in a func-
tionally equivalent manner.

Fig. 4. Comparison of mean target localization accuracy (± SEM) across all experiments. “Spa-
tial language” represents combined data from the two language conditions of Experiments 1
and 2, collapsing across imagined and real rotation. “Spatialized audio” represents the same
combined data from the two spatialized conditions of Experiments 1 and 2.

5 General Discussion
The primary motivation of these experiments was to investigate verbal learning and
cognitive map development using a new type of non-visual interface, called a virtual
verbal display. Previous research has demonstrated that VVDs support efficient
search behavior of unfamiliar computer-based environments but lead to inferior cog-
nitive map development compared to verbal learning in real environments or learning
in visually rendered VEs. The aim of this research was to understand what could ac-
count for these differences. Deficits in spatial knowledge acquisition with the VVD
were postulated as stemming from inadequacies of the interface. To address this pre-
diction, two factors influencing interface fidelity, spatialized audio and physical
rotation, were compared on a wayfinding task requiring accessing of an accurate
cognitive map.
Results showing almost identical performance on the training measures for all con-
ditions across experiments (see Table 1) but widely varying wayfinding accuracy
during transfer tests in the real building are informative. Indeed, these findings sup-
port the hypothesis that deficits in cognitive map development are related to factors of
interface fidelity, rather than use of ineffective search strategies with the VVD. The
most important findings from these studies are the results showing that information
about layout geometry conveyed as a spatialized verbal description, versus as spa-
tial language, led to a dramatic improvement in cognitive map development. These
findings are congruent with previous studies showing an advantage of 3-D spatial
displays vs. spatial language during route guidance [18-20, 25].
The current results extend the efficacy of spatialized audio from providing perceptual
access to specific landmarks in the surrounding environment for use in route naviga-
tion to specifying environmental structure during free exploration in support of cognitive
mapping. Of note given the motivations of the current work, wayfinding performance
during transfer after learning in the SA conditions in the VVD was on par with per-
formance after learning with an information-matched visual display (Experiment 3),
and with verbal learning in real buildings [9]. The similarity of these results suggests
that virtual verbal displays incorporating spatialized information can support equiva-
lent spatial knowledge acquisition and cognitive map development. Although
comparisons between verbal and visual learning were made between subjects in the
current paper, these results are consistent with previous findings demonstrating func-
tionally equivalent spatial representations built up after learning target arrays between
the same conditions [38]. Interestingly, the benefit of SA seems to be magnified for
open search exploration of large-scale environments vs. directed guidance along
routes, as the 50% improvement for spatialized information observed in the current
study is much greater than the marginal advantage generally found in the previous
real-world route guidance studies. This finding is likely due to the increased cognitive
effort known for learning and updating in VEs [23, 24] being offset by the decreased
working memory demands of processing spatialized audio vs. spatial language [25].
The effects of including physical rotation vs. imagined rotation in the VVD were
investigated in Experiment 2. We expected this factor to have the greatest influence
on virtual verbal learning given the importance attributed to idiothetic cues from the
inclusion of physical rotation in visually rendered VEs [27, 29, 31, 33], and the im-
portance of physical movement on updating verbally learned target locations [35, 36].
Surprisingly, the inclusion of physical rotation during training with the VVD did not
lead to a significant advantage on subsequent wayfinding performance. Indeed, com-
parison of transfer performance between Experiments 1 and 2 shows that conditions
employing spatialized descriptions led to the best verbal learning performance and did
not reliably differ whether they employed real or imagined rotation. As discussed in
Experiment 2, this finding may relate to our experimental design and more research is
needed to make any definitive conclusions.
For researchers interested in verbal spatial learning, especially in the context of
navigation, dynamically-updated virtual verbal displays represent an excellent re-
search tool. They also have important applications for blind individuals, who can use
them for remote environmental learning before traveling to a new place, or as part of a
multi-modal virtual interface for training sighted people in low-light environments. Until now, their effi-
cacy as a research tool or navigation aid was questionable, as VVD training seemed to
lead to deficient cognitive map development. However, the results of this paper
clearly demonstrate that the VVD can be used to support these tasks and can be as
effective as verbal learning in real buildings or from a visual display when spatialized
verbal descriptions are used. These findings have clear implications for the impor-
tance of incorporating spatialized audio in dynamically-updated verbal interfaces.

References
1. Taylor, H.A., Tversky, B.: Spatial mental models derived from survey and route descrip-
tions. Journal of Memory and Language 31, 261–292 (1992)
2. Denis, M., et al.: Spatial Discourse and Navigation: An analysis of route directions in the
city of Venice. Applied Cognitive Psychology 13, 145–174 (1999)
3. Lovelace, K., Hegarty, M., Montello, D.: Elements of good route directions in familiar and
unfamiliar environments. In: Freksa, C., Mark, D.M. (eds.) Spatial information theory:
Cognitive and computational foundations of geographic information science, pp. 65–82.
Springer, Berlin (1999)
4. Tversky, B.: Spatial perspective in descriptions. In: Bloom, P., et al. (eds.) Language and
Space, pp. 463–492. MIT Press, Cambridge (1996)
5. Loomis, J.M., et al.: Assisting wayfinding in visually impaired travelers. In: Allen, G.L.
(ed.) Applied spatial cognition: From research to cognitive technology, pp. 179–202. Erl-
baum, Mahwah (2007)
6. Giudice, N.A.: Navigating novel environments: A comparison of verbal and visual learn-
ing, Unpublished dissertation, University of Minnesota, Twin Cities (2004)
7. Giudice, N.A.: Wayfinding without vision: Learning real and virtual environments using
dynamically-updated verbal descriptions. In: Conference and Workshop on Assistive
Technologies for Vision and Hearing Impairment, Kufstein, Austria (2006)
8. Giudice, N.A., et al.: Spatial learning and navigation using a virtual verbal display. ACM
Transactions on Applied Perception (in revision)
9. Giudice, N.A., Bakdash, J.Z., Legge, G.E.: Wayfinding with words: Spatial learning and
navigation using dynamically-updated verbal descriptions. Psychological Research 71(3),
347–358 (2007)
10. Giudice, N.A., Legge, G.E.: Comparing verbal and visual information displays for learning
building layouts. Journal of Vision 4(8), 889 (2004)
11. Ruddle, R.A., Payne, S.J., Jones, D.M.: Navigating buildings in “desk-top” virtual envi-
ronments: Experimental investigations using extended navigational experience. Journal of
Experimental Psychology: Applied 3(2), 143–159 (1997)
12. Bliss, J.P., Tidwell, P., Guest, M.: The effectiveness of virtual reality for administering
spatial navigation training to firefighters. Presence 6(1), 73–86 (1997)
13. Waller, D., Hunt, E., Knapp, D.: The transfer of spatial knowledge in virtual environment
training. Presence 7, 129–143 (1998)
14. Piaget, J., Inhelder, B., Szeminska, A.: The child’s conception of geometry. Basic Books,
New York (1960)
15. Siegel, A., White, S.: The development of spatial representation of large scale environ-
ments. In: Reese, H. (ed.) Advances in Child Development and Behavior. Academic Press,
New York (1975)
16. O’Keefe, J., Nadel, L.: The hippocampus as a cognitive map. Oxford University Press,
London (1978)
17. Ehrlich, K., Johnson-Laird, P.N.: Spatial descriptions and referential continuity. Journal of
Verbal Learning & Verbal Behavior 21, 296–306 (1982)
18. Loomis, J.M., et al.: Personal guidance system for people with visual impairment: A com-
parison of Spatial Displays for route guidance. Journal of Visual Impairment & Blind-
ness 99, 219–232 (2005)
19. Loomis, J.M., Golledge, R.G., Klatzky, R.L.: Navigation system for the blind: Auditory
display modes and guidance. Presence 7, 193–203 (1998)
20. Marston, J.R., et al.: Evaluation of spatial displays for navigation without sight. ACM
Transactions on Applied Perception 3(2), 110–124 (2006)
21. Simpson, B.D., et al.: Spatial audio as a navigation aid and attitude indicator. In: Human
Factors and Ergonomics Society 49th Annual Meeting, Orlando, Florida (2005)
22. Oving, A.B., Veltmann, J.A., Bronkhorst, A.W.: Effectiveness of 3-D audio for warnings
in the cockpit. Int. Journal of Aviation Psychology 14, 257–276 (2004)
23. Richardson, A.E., Montello, D.R., Hegarty, M.: Spatial knowledge acquisition from maps
and from navigation in real and virtual environments. Memory & Cognition 27(4), 741–
750 (1999)
24. Wilson, P.N., Foreman, N., Tlauka, M.: Transfer of spatial information from a virtual to a
real environment. Human Factors 39(4), 526–531 (1997)
25. Klatzky, R.L., et al.: Cognitive load of navigating without vision when guided by virtual
sound versus spatial language. Journal of Experimental Psychology: Applied 12(4), 223–
232 (2006)
26. Lathrop, W.B., Kaiser, M.K.: Acquiring spatial knowledge while traveling simple and
complex paths with immersive and nonimmersive interfaces. Presence 14(3), 249–263
(2005)
27. Lathrop, W.B., Kaiser, M.K.: Perceived orientation in physical and virtual environments:
Changes in perceived orientation as a function of idiothetic information available. Presence
(Camb) 11(1), 19–32 (2002)
28. Bakker, N.H., Werkhoven, P.J., Passenier, P.O.: The effects of proprioceptive and visual
feedback on geographical orientation in virtual environments. Presence 8(1), 36–53 (1999)
29. Ruddle, R.A., Payne, S.J., Jones, D.M.: Navigating large-scale virtual environments: What
differences occur between helmet-mounted and desk-top displays. Presence 8(2), 157–168
(1999)
30. Wraga, M., Creem-Regehr, S.H., Proffitt, D.R.: Spatial updating of virtual displays during
self- and display rotation. Memory & Cognition 32(3), 399–415 (2004)
31. Klatzky, R.L., et al.: Spatial updating of self-position and orientation during real, imag-
ined, and virtual locomotion. Psychological Science 9(4), 293–299 (1998)
32. Grant, S.C., Magee, L.E.: Contributions of proprioception to navigation in virtual envi-
ronments. Human Factors 40(3), 489–497 (1998)
33. Farrell, M.J., et al.: Transfer of route learning from virtual to real environments. Journal of
Experimental Psychology: Applied 9(4), 219–227 (2003)
34. Presson, C.C., Montello, D.R.: Updating after rotational and translational body move-
ments: Coordinate structure of perspective space. Perception 23(12), 1447–1455 (1994)
35. de Vega, M., Rodrigo, M.J.: Updating spatial layouts mediated by pointing and labelling
under physical and imaginary rotation. European Journal of Cognitive Psychology 13,
369–393 (2001)
36. Avraamides, M.N.: Spatial updating of environments described in texts. Cognitive Psy-
chology 47(4), 402–431 (2003)
37. Chance, S.S., et al.: Locomotion mode affects the updating of objects encountered during
travel: The Contribution of vestibular and proprioceptive inputs to path integration. Pres-
ence 7(2), 168–178 (1998)
38. Klatzky, R.L., et al.: Encoding, learning, and spatial updating of multiple object locations
specified by 3-D sound, spatial language, and vision. Experimental Brain Research 149(1),
48–61 (2003)
Cognitive Surveying: A Framework for Mobile
Data Collection, Analysis, and Visualization of
Spatial Knowledge and Navigation Practices

Drew Dara-Abrams

University of California, Santa Barbara
Depts. of Geography and Psychology
drew@geog.ucsb.edu

This work has been generously supported by the National Science Foundation
through the Interactive Digital Multimedia IGERT (grant number DGE-0221713)
and a Graduate Research Fellowship. Many thanks to Martin Raubal, Daniel Mon-
tello, Helen Couclelis, and, of course, Alec and Benay Dara-Abrams for suggestions.

Abstract. Spatial cognition researchers must, at present, choose be-
tween the relevance of the real world and the precision of the lab. Here
I introduce cognitive surveying as a framework of computational tech-
niques to enable the automated and precise study of spatial knowledge
and navigation practices in everyday environments. I draw on surveying
engineering to develop the framework’s components: a hardware plat-
form, data structures and algorithms for mobile data collection, and
routines for data analysis and visualization. The cognitive surveying sys-
tem tracks users with GPS, allows them to label meaningful points as
landmarks, and asks them to point toward or estimate the distance to
out-of-sight landmarks. The data from these and other questions is then
used to produce specific analyses and comprehensive overviews of the
user’s spatial knowledge and navigation practices, which will be of great
interest to spatial cognition researchers, the developers of location-based
services, urban designers, and city planners alike.

Keywords: spatial cognition, spatial knowledge, navigation, geographic
data collection, GPS tracking.

1 Introduction

Much has been established about how people learn and navigate the physical
world thanks to controlled experiments performed in laboratory settings. Such
studies have articulated the fundamental properties of “cognitive maps,” the
fidelity of the sensory and perceptual systems we depend on, and the set of
decisions we make in order to reach a novel destination, for instance. Clear and
precise findings certainly, but in the process what has often been controlled
away is the worldly context of spatial cognition. To divorce internal mental
processes from the external influences of the world is to tell an incomplete story,

This work has been generously supported by the National Science Foundation
through the Interactive Digital Multimedia IGERT (grant number DGE-0221713)
and a Graduate Research Fellowship. Many thanks to Martin Raubal, Daniel Mon-
tello, Helen Couclelis, and, of course, Alec and Benay Dara-Abrams for suggestions.

C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 138–153, 2008.

© Springer-Verlag Berlin Heidelberg 2008

for what sort of thinking is more intimately and concretely tied to our physical
surroundings?
And, forgetting the real world also means forgetting that spatial cognition
research can be valuable to professionals and ordinary people alike. Building
construction and city planning projects are oftentimes so complex that concerns
about engineering or budget take all attention away from the impact that the
environments will ultimately have on the people that will live and work there.
Ecological concerns are now addressed with environmental-impact reports. Spa-
tial cognition research has already identified techniques that could be used to
produce similarly useful reports on how a building floor plan can allow visitors to
successfully navigate or how a neighborhood can be designed to naturally draw
residents together in communal spaces. Non-professionals may not be charged
with designing their surroundings, but still, many would appreciate having an
opportunity to contribute. Spatial cognition methodology can be used to collect,
aggregate, and analyze their input. Also, the same approach can be applied to as-
sist individuals: In-car navigation systems and other location-based services are
notoriously inflexible and would certainly be improved if they took into account
end-users’ spatial knowledge and other subjective tendencies.
What is needed are techniques for precisely measuring spatial cognition in
real-world settings and analyzing the behavioral data in an automated fashion
so that results are consistent and immediately available to act on. Surveying en-
gineers have perfected this sort of measurement and analysis; the only difference
is that whereas a surveyor maps the physical world more or less as it exists,
spatial cognition researchers attempt to characterize the world as it is used and
remembered by people.
Such a sentence is almost always followed by a reference to The Image of the
City, that slim and intriguing volume by Kevin Lynch (1960). He and his fel-
low urban planners come to understand three very different American cities by
studying and interviewing residents, and by aggregating the results for each city,
they produce “images” that represent how an average person might remember
Boston, Jersey City, or downtown Los Angeles. These “images” are composed
of five elements (paths, edges, districts, nodes, and landmarks), which Lynch
details in terms at once understandable to an urban designer and meaningful
to a spatial cognition researcher. Unfortunately, it’s less clear how to go about
collecting “images of the city” on your own with any precision or consistency,
since a fair amount of expert interpretation appears to be involved.
Toward that end, let me propose cognitive surveying as an umbrella under
which we can pursue the goal of characterizing people’s spatial knowledge and
navigation practices in more carefully defined computational and behavioral
terms, while still producing results that are understandable to researchers and
laypeople alike. In this paper, I will specify the architecture of such a system for
behavioral data collection, analysis, and visualization. Much relevant research
already exists in spatial cognition, surveying engineering, geographic informa-
tion science, and urban planning; this framework of cognitive surveying ought
to serve well to integrate the pieces.

2 Cognitive Surveying

Cognitive surveying is the measurement of a person’s spatial knowledge and
navigation practices conducted in a piecemeal fashion in the real world. In com-
putational terms, cognitive surveying can be described as a combination of data
structures, algorithms, analysis routines, and visualizations. When implemented
as a useable system, some electronics are also involved (although this framework
is not tied to the particulars of any one computer system). The novel contribution
of cognitive surveying is this integration, and so let’s begin by stepping through
the framework in its entirety, even if not every component will be required for
each application.

2.1 Hardware

The tools of a surveyor have been combined, in recent years, into the single
package of a total station, which contains a theodolite to optically measure an-
gles, a microwave or infrared system to electronically measure distances, and a
computer interface to record measurements (for surveying background, see An-
derson & Mikhail, 1998). Some total stations also integrate GPS units to take
measurements in new territory or when targets are out of sight. The equipment
remains bulky, yet its functions can also now be approximated with portable,
consumer-grade hardware: a GPS unit, an electronic compass, and a mobile com-
puter (as illustrated in Figure 1). If the mobile computer has its own wireless
Internet connection, it can automatically upload measurements to a centralized
server for analysis. Otherwise, the user can download the measurements to a
PC that has an Internet connection. Although it is a somewhat more involved
process, asking the user to connect the mobile device to a PC at home, work,
or in a lab provides a further opportunity to also assess their spatial knowledge
away from the environment, by presenting tasks, like a map arrangement, on
the big screen of a PC. (More on these tasks and other measurements momen-
tarily.) Ultimately, cellular phones may be the platform of choice for real-world
measurement, since they are connected, ready at hand, and kept charged.

2.2 Mobile Data Collection

With their equipment, surveyors make only a few types of measurements, but by
repeating elementary measurements they are able to perform complex operations
like mapping property boundaries. The measurement techniques of cognitive
surveying are similarly elementary, already widely used in the spatial cognition
literature (if rarely used together), and become interesting when repeated ac-
cording to a plan (detailed in Figure 2). Since all this data is collected in the
field, the most fundamental measurement is the user’s position, which can be
captured to within a few meters by GPS, cleaned to provide a more accurate
fix, and recorded to a travel log (see Shoval & Isaacson, 2006). The GPS in-
formation can be supplemented with status information provided by the user

[Diagram: a GPS unit (worn on the shoulder) and an electronic compass (affixed to the
mobile computer) connect via Bluetooth, USB, or serial to a mobile computer worn on a
sash or belt clip, with a small screen, keyboard, storage, and battery life for a full
day; data are uploaded over WiFi to a server for analysis and visualization or
downloaded to the user’s PC for map arrangement.]

Fig. 1. Hardware components to use for mobile data collection

[Diagram: the GPS unit and the electronic compass (corrected for magnetic declination)
feed a travel log that records the user’s movement every few seconds along with status
changes (e.g., “lost” or “not lost”; “in car” or “on foot”). Algorithms decide when to
ask point-measure questions (e.g., “what’s the name of the neighborhood you’re
currently in?”), distance-estimate questions (“how far to the courthouse?”), and
direction-estimate questions (“which way to the courthouse?”) so as to cover the
environment with sufficient repetition, drawing on a base map; the user labels
landmarks, which can later be arranged into a map on a PC.]

Fig. 2. Data structures and algorithms for mobile data collection

(e.g., “lost” or “in car”). The travel log alone allows us to begin to evaluate
navigation practices (to be discussed in Section 3).
More complex measurements are required to assess spatial knowledge. These
are landmarks, other point measures, direction estimates, and distance estimates
(to be discussed in Section 4). Landmarks are points of personal importance

labeled by the user. She may label any point of her choosing; an algorithm can
be used to suggest potential landmarks to her as well. Her knowledge of the
landmarks’ relative locations is measured by direction and distance questions.
From one landmark, she is asked to point to other landmarks, using an electronic
compass. (Compass readings will need to be corrected against models of magnetic
declination, which are available on-line.) Also, she is asked to judge the distance
from her current location to other landmarks by keying in numerical estimates.
In addition to labeling landmarks, the user can be asked to provide other point
measures. For instance, she can be asked “What’s the name of the neighborhood
you’re currently in?” The algorithms that decide when to ask these questions
can call on a base map of the environment, not to mention the user’s travel log.
Finally, when the user is sitting at a larger computer screen—back home at her
PC, say—she can be asked to again consider her landmarks by arranging them
to make a map. For all of these tasks, both the user’s answer and her reaction
time can be recorded.
From these measurements will come a comprehensive data set on an individ-
ual’s spatial knowledge and navigation practices to analyze and visualize.
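To make these record types concrete, here is a minimal sketch in Python of the kinds of entries a cognitive surveying client might log. All class and field names are illustrative assumptions rather than part of any implemented system; the declination correction simply mirrors the note above about adjusting compass readings.

```python
# Illustrative record types for a cognitive surveying client (assumed names).
from dataclasses import dataclass
from datetime import datetime

@dataclass
class TravelLogEntry:              # recorded every few seconds from GPS
    time: datetime
    lat: float
    lon: float
    altitude_m: float
    status: str                    # e.g. "on foot", "in car", "lost"

@dataclass
class Landmark:                    # a point labeled by the user
    name: str
    lat: float
    lon: float

@dataclass
class DirectionEstimate:           # compass pointing toward an out-of-sight landmark
    time: datetime
    lat: float
    lon: float
    target: str                    # landmark name
    magnetic_heading_deg: float
    reaction_time_s: float         # time taken to complete the estimate

def true_heading(magnetic_heading_deg: float, declination_deg: float) -> float:
    """Correct a raw compass reading against the local magnetic declination."""
    return (magnetic_heading_deg + declination_deg) % 360.0
```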

2.3 Data Analysis and Visualization

Lynch only produced “images” for groups, but from this point, angle, and dis-
tance data can come both individual and aggregate analyses (see Figure 3). Of
particular interest will be the routes that people take, the accuracy of their spa-
tial knowledge, and the contents of their spatial knowledge (all of which will
be discussed below). While quantitative output will be necessary to run behav-
ioral studies, also important will be visualizations, which are oftentimes much
more effective at quickly conveying, for instance, the distorted nature of spatial
knowledge or the cyclical nature of a person’s movement day after day (see also
Dykes & Mountain, 2003; Kitchin, 1996a).

3 Moving, Traveling, Navigating

People move, doing so for any number of reasons, through any number of spaces,
at any number of scales. As such, a number of research traditions consider human
movement. At the scale of cities and other large environments, time geography
studies how the constraints of distance limit an individual, and transportation ge-
ography studies the travel of masses between origins and destinations (Golledge
& Stimson, 1996). Cognition is certainly involved in both, but memory and cog-
nitive processes are most evident, and most studied, in navigation. More specif-
ically, spatial cognition research often decomposes navigation into locomotion
and wayfinding (Montello, 2005), the former being perceptually guided move-
ment through one’s immediate surrounds (walking through a crowded square
and making sure to avoid obstacles, say) and the latter being route selection
between distant locations (figuring out how to travel from Notre Dame to the
Luxembourg Gardens, say). When attempting to understand how an individual

[Diagram: data collected from participants (travel logs, landmarks, direction and
distance estimates, point measures, landmark map arrangements) and environmental data
sources (a base map, models of environmental form) feed individual-level analyses of
routes (loop-backs, pauses, shortcuts; most popular routes; relation to environmental
structure, like Conroy Dalton, 2003) and of spatial knowledge accuracy (averaging
repeated estimates, inferring missing measurements and propagating error, fitting
estimates together with multidimensional scaling, like Waller & Haun, 2003, and
correlating accuracy with environmental form measures, as in Dara-Abrams, 2008);
aggregate analyses of spatial knowledge contents (regularity of landmark use,
distributions of point measures and best-fit region polygons, similar to Montello,
Goodchild, Gottsegen, & Fohl, 2003); and visualizations of exploration patterns (like
Dykes and Mountain’s (2003) Location Trends Extractor) and the “image of the city.”]

Fig. 3. Data analysis and visualization

uses and remembers a city, wayfinding behavior is of particular interest. Which
locations does a person choose to visit? What routes does she take between those
places? Do those routes appear to be derived from certain wayfinding strategies?
When is she confident in her wayfinding abilities and when might she instead
be feeling lost? Cognitive surveying can allow us to consider these questions
in real world settings thanks to the automated tracking of GPS, the option to
collect user input along the way, and the ability to analyze and visualize this
data together with users’ custom sets of landmarks, their direction and distance
estimates, and any number of other measurements taken in the field.

4 Spatial Knowledge
Spatial knowledge is the stored memories we call on when orienting ourselves in
a familiar environment, navigating toward a known destination, writing route
directions for a visitor from out of town, and so on. Like other aspects of cog-
nition, spatial knowledge can be modeled in a computational fashion (Kuipers,
1978). That, however, is not the goal of cognitive surveying, which is focused
on measuring spatial knowledge. In fact, the “read-out” provided by a cogni-
tive surveying system should be of much interest to cognitive scientists who are
developing and calibrating computational models of spatial knowledge.
Therefore, what is needed for the purposes of cognitive surveying is not a the-
ory of spatial knowledge itself but simply a working abstraction that can be used
to measure spatial knowledge. Lynch’s five elements are one such abstraction,

but some are too subjective. By borrowing from surveying engineering, we can
fashion a more computationally precise set of elements to use to measure spatial
knowledge: landmarks, direction estimates, distance estimates, and regions.

4.1 Landmarks and Other Locations


Land surveyors rely on control points and monuments to fix their measurements.
For humans, landmarks are the equivalent identifying features. Like the Eiffel
Tower, landmarks are often visually salient, have special meaning, and stand in
prominent locations (Sorrows & Hirtle, 1999). Yet just as the plainest pipe may
be used as a monument by a surveyor, people often depend on unconventional
landmarks when providing route directions or describing their neighborhoods.
Allowing each person to identify his own custom set of landmarks is an important
way in which cognitive surveying can more closely study the spatial knowledge
of individuals rather than aggregates (see Figure 4). Algorithms can be used
to suggest potential landmarks to a user. Heuristics that rely on GPS input
measure signal loss—since it’s often when entering buildings that sight of the
GPS satellites is lost (Marmasse & Schmandt, 2000)—or duration—as people
are more likely to remain for a while in or near landmarks (Ashbrook & Starner,
2003; Nurmi & Koolwaaij, 2006). Another approach is to rely on information
about the environment itself to identify in advance which locations may make
for salient landmarks (Raubal & Winter, 2002) and to then suggest those when
the user is nearby.
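As a rough illustration of the duration-based heuristic just mentioned (in the spirit of Ashbrook & Starner, 2003), the following sketch clusters consecutive GPS fixes that stay within a small radius and proposes the centers of long pauses as candidate landmarks for the user to label. The thresholds, the track format, and the haversine helper are assumptions for illustration only.

```python
# A rough dwell-time heuristic for suggesting candidate landmarks (toy sketch).
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance between two lat/lon points, in meters."""
    r = 6_371_000.0
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = math.radians(lat2 - lat1), math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def suggest_landmarks(track, radius_m=50.0, min_dwell_s=300.0):
    """track: list of (unix_time, lat, lon) fixes in time order.
    Returns centers of pauses longer than min_dwell_s, which the mobile
    client could offer to the user as landmarks worth labeling."""
    suggestions, cluster = [], []
    for fix in track:
        if not cluster or haversine_m(cluster[0][1], cluster[0][2],
                                      fix[1], fix[2]) <= radius_m:
            cluster.append(fix)           # still lingering near the same spot
        else:
            if cluster[-1][0] - cluster[0][0] >= min_dwell_s:
                suggestions.append((sum(p[1] for p in cluster) / len(cluster),
                                    sum(p[2] for p in cluster) / len(cluster)))
            cluster = [fix]               # start a new potential pause
    if cluster and cluster[-1][0] - cluster[0][0] >= min_dwell_s:
        suggestions.append((sum(p[1] for p in cluster) / len(cluster),
                            sum(p[2] for p in cluster) / len(cluster)))
    return suggestions
```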

[Diagram: four landmark identification approaches side by side: direct user input
(“please press this button whenever you’re near a landmark that you’d like to label”);
GPS signal loss, assuming the user has entered a building when the signal degrades and
disappears (see Marmasse & Schmandt, 2000); GPS point clustering, filtering on speed to
separate moving points from pause points and asking about clusters of pauses (after
Ashbrook & Starner, 2003); and environmental analysis, identifying potential landmarks
ahead of time from base maps by visual attraction (facade area, shape, color,
visibility) and semantic attraction (cultural and historical importance, signage) and
asking when the user nears one (after Raubal & Winter, 2002).]

Fig. 4. Landmark identification approaches

4.2 Directions and Distances

Once a surveyor has identified measurement points, the next task is to deter-
mine the directions and distances among the set. Using triangulation and other
trigonometric techniques may mean that some measurements can be inferred
from others. People certainly rely on shortcuts and heuristics, too, but whereas

a surveyor strives for accuracy, human spatial knowledge is systematically dis-
torted, presumably in the interest of efficiency (Tversky, 1992). People asked
to estimate the direction from San Diego to Reno are likely to draw an ar-
row pointing northeast, toward the center of Nevada but away from the actual
northwestern direction (Stevens & Coupe, 1978). People stopped on the street
are significantly more accurate at pointing to distant locations when on an or-
thogonal street grid than at an odd angle (Montello, 1991a). And people asked
to estimate the distance between university buildings judge ordinary buildings
to be closer to landmark buildings than vice versa—their distance
estimates are asymmetric (Sadalla, Burroughs, & Staplin, 1980).
Paper, pencil, and manual pointing dials have served these experimenters
well; the studies are controlled and the results are straightforward to interpret,
although see Montello, Richardson, Hegarty, and Provenza (1999) and Waller,
Beall, and Loomis (2004) on the relative merits of various direction estimate
methods, some of which must be analyzed using circular statistics (Batschelet,
1981); also see Montello (1991b) on distance estimate methods. Cognitive survey-
ing can use similarly simple direction and distance estimates, but by automat-
ing the data collecting with GPS tracking, an electronic compass, and a mobile
computer, the questions can be asked in situ for landmarks drawn from each
participant’s custom set. The time taken by the user to complete an estimate
can be precisely recorded as well.
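Because pointing responses are angles, averaging them calls for the circular statistics referenced above (Batschelet, 1981). A minimal sketch, assuming signed errors given in degrees:

```python
# Circular mean of angular data (toy sketch); inputs and outputs in degrees.
import math

def circular_mean_deg(angles_deg):
    """Return the circular mean and the mean resultant length
    (0 = widely scattered, 1 = tightly concentrated)."""
    xs = [math.cos(math.radians(a)) for a in angles_deg]
    ys = [math.sin(math.radians(a)) for a in angles_deg]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    mean = math.degrees(math.atan2(my, mx)) % 360.0
    resultant_length = math.hypot(mx, my)
    return mean, resultant_length

# e.g. pointing errors of +10, -20, +30, and +5 degrees
print(circular_mean_deg([10, -20, 30, 5]))
```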

Surveying Operations for Direction and Distance Estimates. If direc-
tion and distance estimates can be made at any time, when and where should
the user be queried? To tackle this question, surveying engineering offers us a
number of surveying operations (illustrated in Figure 5; see Anderson & Mikhail,
1998; Barnes, 1988). In a traverse, one of the simpler and more accurate sur-
veying operations, control points are placed in a line and at each, measurements
are taken backward. Repeatedly asking a person to estimate the distance and
direction toward the place they just left may also yield relatively accurate mea-
surements. In fact, this is similar to the “look-back” strategy that children and
adults can use to better learn a route (Cornell, Heth, & Rowat, 1992). Asking
questions according to a traverse arrangement may thus be more appropriate for
testing route knowledge rather than survey knowledge (more on the distinction
in a moment).
Triangulation, in which measurements are combined following the law of sines,
is likely a better method for capturing a person’s knowledge of the configuration
of a number of landmarks spread over a large environment (see Kirasic, Allen, &
Siegel, 1984, where the authors call their procedure projective convergence; tri-
angulation can be used for similar purposes in indoor settings as well: Hardwick,
McIntyre, & Pick, 1976). Land surveyors using triangulation cover their terrain
with triangles, measure the inner angles of those triangles, and finally measure
the distance of one leg of a triangle. From that baseline distance measurement,
the entire network of measurements can be scaled. Human spatial knowledge is
hardly as precise as a microwave or infrared distance measuring system, and so
to evaluate distance knowledge based on only one distance estimate would not be

[Diagram: four surveying operations adapted for cognitive surveying: traversing
(estimate the distance and angle back to the previously visited landmark; easy
computations and not too many estimates required, but perhaps only useful for
measuring routes, see Cornell, Heth, & Rowat, 1992); triangulation with direction
estimates only (each landmark sits at the vertex of a triangle; can measure survey
knowledge for large areas but depends on a large number of estimates, and raises the
question of how triangle size affects the data); trilateration with distance estimates
only (estimates combined by multidimensional scaling into a best-fit arrangement; note
that A–B can be perceived to be a different distance than B–A, and that distance
knowledge is often poorer than direction knowledge, Waller & Haun, 2003); and
triangulation with both direction and distance estimates (combined by direction and
distance scaling; a more comprehensive measure of survey knowledge, but
computationally intensive, and it remains open whether missing estimates can be
approximated from those actually taken).]

Fig. 5. Surveying operations for collecting direction and distance estimates



proper. Thus, triangulation is probably best used for cognitive surveying when
performed without any distance estimates or with a number of distance estimates
distributed around the triangle network. If using only distance measurements,
trilateration—as is performed in more complex forms to derive positions from
GPS signals—can be used instead.
In any case, the cognitive surveyor has an advantage over the land surveyor:
Precise base maps already exist for most urban and natural environments, and so
we can use information about streets, paths, and other environmental features
to guide the sampling design that determines when to ask users to estimate
directions and distances. (More on sampling design in the next section.)
As these direction and distance estimates are collected, they can be integrated
in order to approximate the user’s overall spatial knowledge and to propagate
error among repeated measurements. Multidimensional scaling, or MDS, is one
such technique often used to turn pairwise distance and direction estimates into
a best-fit two-dimensional configuration (Waller & Haun, 2003). When apply-
ing MDS to distance estimates alone, the optimization procedure is effectively
trilateration repeated many times.
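As a sketch of how repeated distance estimates could be fitted into a configuration, the following uses off-the-shelf metric MDS (scikit-learn) on a symmetrized matrix of judged distances. The landmark names and the simple averaging of asymmetric estimates are assumptions for illustration, not a prescribed analysis pipeline.

```python
# Toy example: fit pairwise distance estimates into a 2-D configuration with MDS
# (cf. Waller & Haun, 2003). Landmark names and values are made up.
import numpy as np
from sklearn.manifold import MDS

landmarks = ["courthouse", "library", "station", "market"]

# estimates[i, j] = user's judged distance from landmark i to landmark j
estimates = np.array([
    [0., 10., 25., 14.],
    [12., 0., 18., 9.],
    [22., 20., 0., 16.],
    [15., 8., 17., 0.],
])

# MDS expects a symmetric dissimilarity matrix; average the two directions
dissim = (estimates + estimates.T) / 2.0

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)   # one (x, y) per landmark, up to rotation/scale

for name, (x, y) in zip(landmarks, coords):
    print(f"{name:12s} {x:6.2f} {y:6.2f}")
```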

Map Arrangements. In addition to taking multiple direction and distance
estimates so that MDS and other fitting routines have more data to work with,
it’s ideal to test people on a range of tasks, with the goal of using different meth-
ods to converge on their spatial knowledge (see Kitchin, 1996b). Direction and
distance questions evaluate knowledge of the relative positions of two locations.
To consider people’s overall knowledge of an environment, using a different ap-
proach, we can ask them to also make a map arrangement. The user is given
dots for each of her landmarks and asked to arrange those dots so that they
best approximate the real-life locations of the landmarks. While a map arrange-
ment may be simplistic, the technique has important advantages over sketch
mapping (first used by Lynch, 1960); asking people to draw a map is a difficult
process to automate, the sketches are tricky to score consistently, and the process
conflates spatial knowledge with drawing ability. Like a sketch map, a map ar-
rangement does often serve as a compelling visualization of a person’s distorted
spatial knowledge. Moreover, x and y coordinates for each landmark can easily
be extracted from a map arrangement, and these measures can be compared
with those produced by MDS using bidimensional regression (Friedman &
Kohler, 2003). Asking direction and distance questions in the field and doing
map arrangements when users have returned home or to the lab will provide us
with a more complete understanding of their spatial knowledge.
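For comparing a map arrangement against an MDS configuration, a compact sketch of Euclidean bidimensional regression (in the sense of Friedman & Kohler, 2003) is given below. The complex-number formulation and the returned summary values are one common way to implement it, offered here as an assumption-laden illustration rather than the exact procedure the author has in mind.

```python
# Euclidean bidimensional regression between two point configurations (toy sketch).
import numpy as np

def bidimensional_regression(ref_xy, test_xy):
    """Fit translation, rotation, and scale mapping ref -> test (both (n, 2) arrays)
    and return (scale, rotation_deg, r_squared) as a configural-accuracy index."""
    a = ref_xy[:, 0] + 1j * ref_xy[:, 1]
    b = test_xy[:, 0] + 1j * test_xy[:, 1]
    a_c, b_c = a - a.mean(), b - b.mean()
    beta1 = (a_c.conj() @ b_c) / (a_c.conj() @ a_c)   # complex slope: scale + rotation
    resid = b_c - beta1 * a_c
    r2 = 1.0 - (np.abs(resid) ** 2).sum() / (np.abs(b_c) ** 2).sum()
    return abs(beta1), float(np.degrees(np.angle(beta1))), float(r2)
```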

4.3 Learning
People do not come by their spatial knowledge of an environment instantaneously—
we learn over time from repeated exposure and novel experience. The most widely
accepted theory of spatial microgenesis (as one’s acquisition of spatial knowledge
for an environment is called) proposes that people first learn the locations of point-
like landmarks, then learn the linear routes that connect pairs of landmarks, and

finally learn how the landmarks and routes fit into an overall configuration, known
as survey knowledge (Siegel & White, 1975). If people follow discrete stages in this
manner, they will not begin to acquire metric knowledge, like the direction between
a pair of landmarks, until the final stage. Yet longitudinal studies suggest that spa-
tial microgenesis may progress in a continuous manner, without qualitatively differ-
ent stages (Ishikawa & Montello, 2006). The consistent, automated data collection
that a cognitive surveying system offers will be invaluable for studying how people
learn an environment over time.

4.4 Regions
One way by which we learn environments is to subdivide them into meaningful
regions (Hirtle, 2003). In the case of cities, these regions are usually neighbor-
hoods, districts, wards, barrios, and so on. Some are official while others are
informally defined—the City of London versus the Jets’ turf. Even if a region
name is in common parlance, its boundary is still likely vague (Montello, Good-
child, Gottsegen, & Fohl, 2003).
Regions may be areas, but like any other polygon, their extents can be ap-
proximated by point measures. In other words, users can be asked occasionally
“What’s the name of this neighborhood?” and around that sampling of points,
polygons of a certain confidence interval can be drawn. As with direction and
distance estimates, there is the question of when to ask point measurement
questions. A number of sampling designs can be used (as in Figure 6): wait for
user input; ask questions at preset temporal intervals; ask questions at uniform,
preset spatial locations; and, ask questions at preset spatial locations whose se-
lection has been informed by base maps of the environment in question. The
best approach is likely a combination of all four.
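As a toy illustration of turning point measures into approximate region extents, the sketch below groups answers by region label and wraps each group in a convex hull. Real analyses would likely prefer graded confidence regions in the spirit of Montello et al. (2003); the function and its input format are assumptions.

```python
# Toy approximation of region extents from labeled point measures.
from collections import defaultdict
import numpy as np
from scipy.spatial import ConvexHull

def region_polygons(point_measures):
    """point_measures: iterable of (lat, lon, label) answers to questions like
    'what's the name of this neighborhood?'. Returns label -> hull vertices."""
    groups = defaultdict(list)
    for lat, lon, label in point_measures:
        groups[label].append((lon, lat))          # x = lon, y = lat
    polygons = {}
    for label, pts in groups.items():
        pts = np.asarray(pts)
        if len(pts) >= 3:                         # need at least a triangle
            hull = ConvexHull(pts)
            polygons[label] = pts[hull.vertices]  # polygon vertices in order
    return polygons
```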

[Diagram: sampling approaches for point-measure questions: wait for user input
(“please press this button whenever you’re ready to answer a question”); ask every t
minutes; uniform spatial stratification (ask when the user comes within radius r of a
preset sample point; see Longley, et al., 2005, p. 91); informed spatial stratification
(cluster potential sample points where the user is more likely to travel, based on a
base map); and a combined approach that uses informed spatial stratification when a
base map is available, reverts to uniform stratification when not, limits the number
of questions per time period so the user is not overwhelmed, and still allows user
input.]

Fig. 6. Sampling approaches

Other point measures may be collected and analyzed in a similar manner. For
example, in one of the more clever demonstrations of GPS tracking, Christian
Nold has logged people’s position and their galvanic skin response with the goal
of mapping the physical arousal associated with different parts of a city (see
biomapping.net).

This sort of subjective spatial data is highly personal, yet when aggregated it
can be of use. Again, take the example of regions. Mapping firms now compete
to provide data sets of city neighborhoods to Web search engines, real estate
firms, and others who want to organize their spatial data in a more intuitive
manner. (For example, see Zillow.com and EveryBlock.com.) A cognitive sur-
veying system is one means by which an individual’s spatial knowledge can be
measured and aggregated for these sorts of purposes.

5 Spatial Ability and Other Individual Differences


On the other hand, aggregating spatial knowledge and navigation practices and
other behavioral results can mask the fact of individual differences. Some cer-
tainly do better than others at navigating an unfamiliar city, and psychometric
tests certainly find that people vary in terms of their spatial abilities (Hegarty
& Waller, 2005). One’s sex is sometimes all too quickly targeted as the cause of
these differences, leaving aside the role of one’s motivation, penchant to explore,
confidence in staying oriented, or money to fund travels. These are only
a selection of factors that could be considered when analyzing data collected
with a cognitive surveying system.

6 User Assistance and Location-Based Services


If individual differences can be understood, those who might benefit from ad-
ditional assistance can also be helped. Today’s location-based services, or LBS,
like in-car navigation systems, are poorly designed and difficult to operate. If
LBS can adapt themselves to better fit each particular user, they will be more
useable. Accordingly, the first step is to understand your user, and for this, data
collection is key. The methods of cognitive surveying can provide LBS with:

– a custom set of landmarks to use when generating route directions (Raubal
& Winter, 2002)
– subjective distances that represent how long the user thinks a travel segment
will take to traverse (relevant to travel constraint planning: Raubal, Miller,
& Bridwell, 2004)
– a rating of the user’s relative performance on route knowledge (collected us-
ing a traverse operation) and survey knowledge (collected using a triangula-
tion or trilateration operation), which may indicate what type of instructions
to include in his route directions
– a map of what territory is known to the user and what he has yet to explore
(which can be applied to formulate route directions of varying detail: Srinivas
& Hirtle, 2006)

Using this sort of subjective spatial data may very well help LBS become
easier and more useful for the end-users.

7 The Environment

So far we have focused on the knowledge that people carry in their heads and the
cognitive processes that they use to navigate. Both are, by definition, intimately
tied to the physical world, the information that it offers and the constraints
that it imposes. Spatial cognition researchers wish to understand the interplay,
while the designers and planners who are charged with creating and enhancing
built environments want to understand how those places are used. As cognitive
surveying can be performed accurately in real-world settings, such a system can
be effective in both cases.
The behavioral data being collected and analyzed here is perfectly suited for
comparison with computational models of environmental form. In short, an en-
vironmental form model captures the patterns of accessibility or visibility for a
physical setting like a building interior or a university campus using grid cells
(Turner, Doxa, O’Sullivan, & Penn, 2001), lines (Hillier & Hanson, 1984), or
other geometric building blocks. Quantitative measures can be computed and
extracted for a certain location, a path, or an entire region. Certain environmen-
tal form measures have been found to predict the number of pedestrians walking
on city streets (Hillier, Penn, Hanson, & Xu, 1993) and the accuracy of students’
spatial knowledge for their university campus (Dara-Abrams, 2008). A cognitive
surveying system will help further research on the relationship between human
behavior/cognition and models of environmental form.
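To hint at what such an environmental form model can look like computationally, here is a toy visibility-graph sketch loosely following Turner et al. (2001): open cells of a small occupancy grid are linked whenever an unobstructed straight line connects them, and a cell's mean graph distance to all other cells serves as a crude integration score. The grid, the sampling-based line-of-sight test, and the scoring are all illustrative assumptions, not a faithful reimplementation.

```python
# Toy visibility-graph analysis of a small occupancy grid (illustrative only).
import networkx as nx

grid = [  # 0 = open space, 1 = wall
    [0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 0, 0, 0],
    [0, 0, 0, 1, 0],
]

def visible(a, b):
    """Approximate straight-line visibility between two open cells by sampling."""
    (r0, c0), (r1, c1) = a, b
    steps = max(abs(r1 - r0), abs(c1 - c0)) * 2 or 1
    for i in range(steps + 1):
        r = round(r0 + (r1 - r0) * i / steps)
        c = round(c0 + (c1 - c0) * i / steps)
        if grid[r][c] == 1:
            return False
    return True

cells = [(r, c) for r, row in enumerate(grid) for c, v in enumerate(row) if v == 0]
g = nx.Graph()
g.add_nodes_from(cells)
g.add_edges_from((a, b) for i, a in enumerate(cells) for b in cells[i + 1:]
                 if visible(a, b))

# lower mean graph distance = more visually integrated location
lengths = dict(nx.all_pairs_shortest_path_length(g))
integration = {cell: sum(d.values()) / (len(cells) - 1) for cell, d in lengths.items()}
print(min(integration, key=integration.get), "is the most integrated cell")
```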
Even without the specificity of an environmental form model, the data collec-
tion and analysis of cognitive surveying can inform the work of architects, urban
designers, and city planners. Lynch demonstrated that collecting “images of the
city” identifies design flaws to remediate, captures reasons underlying residents’
attitudes toward development, and reveals which places are attractive to resi-
dents and which are not. These, among other practical outcomes of measuring
spatial knowledge and navigation practices, are details that can guide not just
the mechanics of design but also the way in which projects are presented and
framed to the public. Collecting “images” depends on trained experts, but a cog-
nitive surveying system could be deployed and used by architects and planners,
as well as expert cognitive scientists.

8 Further Research and Conclusion


What I am presenting as cognitive surveying is an amalgamation of mobile com-
puter hardware, software for data collection and analysis, ideas for behavioral
studies, and practical applications. Many of these components already exist.
The novelty is in the framework that unites the behavioral methodology of spa-
tial cognition, the techniques of surveying engineering, the data analysis meth-
ods of geographic information science, and the concerns of design professionals.
The basic measurements being collected are simple, even simplistic—travel logs,
landmarks and other point-based measures, estimated directions and distances

between those landmarks, and so on—but from these, complex descriptions can
be constructed and theoretically interesting questions addressed, including:
– When people are allowed to freely travel through an environment, does their
spatial knowledge contain the same sort of systematic errors that have been
found in lab-based studies?
– When people repeatedly explore an environment, how does their spatial
knowledge develop over time? Does their learning follow a fixed set of qual-
itative stages or instead progressively increase from the beginning?
– How do spatial abilities relate to other factors that may also cause individual
differences in spatial knowledge and navigation practices (e.g., regular travel
extent, confidence in spatial abilities, sex, demographics)?
– What are the optimal surveying operations and sampling designs for mea-
suring spatial knowledge? Are particular parameters more appropriate for
certain circumstances and studies than others? For instance, is knowledge for
a long route best tested using a different set of parameters than knowledge
for a neighborhood?
– Can models of environmental form predict where people are likely to travel,
which features they are likely to remember, and how accurate that spatial
knowledge will likely be? If so, can these models be used to better under-
stand which particular properties of real-world environments influence peo-
ple’s spatial knowledge and navigation practices?
– How can the automated collection of this subjective data improve location-
based services and assist the users of other electronic services?
– Will summaries and visualizations of people’s spatial knowledge and navi-
gation practices make for the beginnings of a “psychological-impact report”
for environmental design projects?
Cognitive surveying will better enable us to pursue all of these research
questions.
This paper’s contribution is the framework of cognitive surveying. In the fu-
ture, I intend to present implemented systems along with results that begin to
address the preceding questions. Even as a conceptual framework, cognitive sur-
veying can already help us take spatial cognition research into the real world.
We now know what sort of questions to ask of a person and what sort of mea-
surements to record, when to ask each question and when to alternate methods,
how to synthesize all these measurements and how to present them for analysis.
In addressing such issues, cognitive surveying will allow us to characterize the
world as it is remembered and used by people—if not with absolute accuracy, at
least with consistency and ease.

References
Anderson, J.M., Mikhail, E.M.: Surveying: Theory and practice. WCB/McGraw-Hill,
Boston (1998)
Ashbrook, D., Starner, T.: Using GPS to learn significant locations and predict move-
ment across multiple users. Personal Ubiquitous Computing 7, 275–286 (2003)

Barnes, W.M.: BASIC surveying. Butterworths, London (1988)


Batschelet, E.: Circular statistics in biology. Academic Press, New York (1981)
Conroy Dalton, R.: The secret is to follow your nose. Environment and Behavior 35,
107–131 (2003)
Cornell, E.H., Heth, C.D., Rowat, W.L.: Way finding by children and adults: Re-
sponse to instructions to use look-back and retrace strategies. Developmental Psy-
chology 28, 328–336 (1992)
Dara-Abrams, D.: Modeling environmental form to predict students’ spatial knowledge
of a university campus. Unpublished master’s thesis, University of California, Santa
Barbara (2008)
Dykes, J.A., Mountain, D.M.: Seeking structure in records of spatio-temporal be-
haviour: Visualization issues, efforts and applications. Computational Statistics and
Data Analysis 43, 581–603 (2003)
Friedman, A., Kohler, B.: Bidimensional regression: Assessing the configural similarity
and accuracy of cognitive maps and other two-dimensional data sets. Psychological
Methods 8, 468–491 (2003)
Golledge, R.G., Stimson, R.J.: Spatial behavior: A geographic perspective. Guilford
Press, New York (1996)
Hardwick, D.A., McIntyre, C.W., Pick, H.L.: The content and manipulation of cognitive
maps in children and adults. In: Monographs of the Society for Research in Child,
Development, vol. 41(3), pp. 1–55. University of Chicago Press, Chicago (1976)
Hegarty, M., Waller, D.A.: Individual differences in spatial abilities. In: Shah, P.,
Miyake, A. (eds.) The Cambridge handbook of visuospatial thinking. Cambridge
University Press, UK (2005)
Hillier, B., Hanson, J.: The social logic of space. Cambridge University Press, Cam-
bridge (1984)
Hillier, B., Penn, A., Hanson, J., Xu, J.: Natural movement: or, configuration and
attraction in urban pedestrian movement. Environment and Planning B 20, 29–66
(1993)
Hirtle, S.C.: Neighborhoods and landmarks. In: Duckham, M., Goodchild, M.F., Wor-
boys, M.F. (eds.) Foundations of geographic information science. Taylor & Francis,
London (2003)
Ishikawa, T., Montello, D.R.: Spatial knowledge acquisition from direct experience in
the environment: Individual differences in the development of metric knowledge and
the integration of separately learned places. Cognitive Psychology 52, 93–129 (2006)
Kirasic, K.C., Allen, G.L., Siegel, A.W.: Expression of configurational knowledge of
large-scale environments: Student’s performance of cognitive tasks. Environment
and Behavior 16, 687–712 (1984)
Kitchin, R.: Exploring approaches to computer cartography and spatial analysis in cog-
nitive mapping research: CMAP and MiniGASP prototype packages. Cartographic
Journal 33, 51–55 (1996a)
Kitchin, R.: Methodological convergence in cognitive mapping research: investigat-
ing configurational knowledge. Journal of Environmental Psychology 16, 163–185
(1996b)
Kuipers, B.J.: Modeling spatial knowledge. Cognitive Science 2, 129–153 (1978)
Longley, P.A., Goodchild, M.F., Maguire, D.J., Rhind, D.W.: Geographic information
systems and science. Wiley, Chichester (2005)
Lynch, K.: The image of the city. MIT Press, Cambridge (1960)
Marmasse, N., Schmandt, C.: Location-aware information delivery with ComMo-
tion. In: Thomas, P., Gellersen, H.-W. (eds.) Handheld and ubiquitous computing.
Springer, Heidelberg (2000)

Montello, D.R.: Spatial orientation and the angularity of urban routes: A field study.
Environment and Behavior 23, 47–69 (1991a)
Montello, D.R.: The measurement of cognitive distance: Methods and construct valid-
ity. Journal of Environmental Psychology 11, 101–122 (1991b)
Montello, D.R.: Navigation. In: Shah, P., Miyake, A. (eds.) The Cambridge handbook
of visuospatial thinking, pp. 257–294. Cambridge University Press, UK (2005)
Montello, D.R., Goodchild, M.F., Gottsegen, J., Fohl, P.: Where’s downtown?: Behav-
ioral methods for determining referents of vague spatial queries. Spatial Cognition
and Computation 3, 185–204 (2003)
Montello, D.R., Richardson, A.E., Hegarty, M., Provenza, M.: A comparison of methods
for estimating directions in egocentric space. Perception 28, 981–1000 (1999)
Nothegger, C., Winter, S., Raubal, M.: Selection of salient features for route directions.
Spatial Cognition and Computation 4, 113–136 (2004)
Nurmi, P., Koolwaaij, J.: Identifying meaningful locations. In: The 3rd Annual In-
ternational Conference on Mobile and Ubiquitous Systems: Networks and Services
(MobiQuitous), San Jose, CA (2006)
Raubal, M., Miller, H.J., Bridwell, S.A.: User-centered time geography for location-
based services. Geografiska Annaler-B 86, 245–265 (2004)
Raubal, M., Winter, S.: Enriching wayfinding instructions with local landmarks. In:
Egenhofer, M., Mark, D. (eds.) Geographic Information Science, pp. 243–259.
Springer, Heidelberg (2002)
Sadalla, E.K., Burroughs, W.J., Staplin, L.J.: Reference points in spatial cognition.
Journal of Experimental Psychology: Human Memory and Learning 5, 516–528
(1980)
Shoval, N., Isaacson, M.: Application of tracking technologies to the study of pedestrian
spatial behavior. The Professional Geographer 58, 172–183 (2006)
Siegel, A.W., White, S.H.: The development of spatial representations of large-scale
environments. In: Advances in child development and behavior, vol. 10, pp. 9–55.
Academic, New York (1975)
Sorrows, M.E., Hirtle, S.C.: The nature of landmarks for real and electronic spaces.
In: Freksa, C., Mark, D. (eds.) Spatial Information Theory, pp. 37–50. Springer,
Heidelberg (1999)
Srinivas, S., Hirtle, S.C.: Knowledge-based schematization of route directions. In:
Barkowsky, T., Knauff, M., Ligozat, G., Montello, D.R. (eds.) Spatial cognition
V: Reasoning, action, interaction, pp. 346–364. Springer, Berlin (2006)
Stevens, A., Coupe, P.: Distortions in judged spatial relations. Cognitive Psychology 10,
422–437 (1978)
Turner, A., Doxa, M., O’Sullivan, D., Penn, A.: From isovists to visibility graphs: A
methodology for the analysis of architectural space. Environment and Planning B:
Planning and Design 28, 103–121 (2001)
Tversky, B.G.: Distortions in cognitive maps. Geoforum 23, 131–138 (1992)
Waller, D.A., Beall, A., Loomis, J.M.: Using virtual environments to assess directional
knowledge. Journal of Environmental Psychology 24, 105–116 (2004)
Waller, D.A., Haun, D.B.M.: Scaling techniques for modeling directional knowledge.
Behavior Research Methods, Instruments, and Computers 35, 285–293 (2003)
What Do Focus Maps Focus On?

Kai-Florian Richter 1,2, Denise Peters 1,2,
Gregory Kuhnmünch 1,3, and Falko Schmid 1,2

1 SFB/TR 8 Spatial Cognition
2 Universität Bremen, Germany
3 Universität Freiburg, Germany
{richter,peters,schmid}@sfbtr8.uni-bremen.de,
gregory@cognition.uni-freiburg.de

Abstract. Maps are an important, everyday medium to communicate
spatial information. We are faced with a great variety of different maps
used for different purposes. While many of these maps are task-specific
and concentrate on specific pieces of information, often they do not sup-
port map reading to extract the information relevant for the task at
hand. In this paper, we explore the concept of focus maps. This concept
has been previously presented with a restricted scope; however, it covers
a range of different kinds of maps that all focus a map user’s attention
on the relevant information, be it specific features or areas. We discuss
their general properties and the importance of context for designing such
maps, and introduce a toolbox for constructing schematic maps that pro-
vides a generic way of generating the different kinds of maps discussed.
Furthermore, we provide empirical evidence supporting our approach and
outline how navigation in 3D virtual environments may benefit from a
transfer of the proposed concept of focus maps from 2D to 3D.

Keywords: Schematic maps, map design, wayfinding assistance.

1 Introduction
Maps are a dominant medium to communicate spatial information. They are
omnipresent in our daily life. In news and ads they point out where specific
places are, often in relation to other places; they link events, dates, and other
data to locations to illustrate, for example, commercial, historical, or sports de-
velopments. For planning holidays or trips to unknown places inside or outside
our hometown we often grab a map—or, nowadays, we recur to Internet plan-
ners, like Google Maps, or (car) navigation systems. And if we ask someone for
directions, we may well end up with a sketch map illustrating the way to take.
All these maps display different information for different purposes. Often, they
are intended for a specific task. However, the design of the maps does not always
reflect this task-specificity. The depicted information may be hard to extract, either be-
cause of visual clutter, i.e., a lot of excess information, or because the map user
is not properly guided to the relevant information. In this paper, we discuss the
concept of focus maps, which is an approach to designing maps that guide a map

C. Freksa et al. (Eds.): Spatial Cognition VI, LNAI 5248, pp. 154–170, 2008.

© Springer-Verlag Berlin Heidelberg 2008

user in reading information off a map. Using simple graphical and geometric oper-
ations, the constructed maps focus a user’s attention on the relevant information
for a given task. This way, we are able to design maps that not only are tailored
for the intended task, but also assist a map user in reading them.
In the next section, we present approaches to map-based assistance in spatial
tasks and illustrate the fundamental concepts underlying our approach, namely
schematization and a computational approach to constructing schematic maps.
Section 3 explains the concept of focus maps previously presented by [1] in
a restricted scope and discusses its generalized aim and properties. Section 4
introduces a toolbox for map construction and the relevant components needed
for designing focus maps. This section also shows examples of different kinds of
focus maps. In Section 5 we provide empirical evidence supporting our approach;
in Section 6 we outline how the concept of focus maps may be transferred to
the construction of 3D virtual worlds. The paper ends with conclusions and an
outlook on future work in Section 7.

2 Maps and Map-Based Assistance

Maps and map-like representations have been used by humans since ancient
times [2]. There is evidence that they are used universally, i.e., across cultures
[3]. That is, maps are (or have become) a form of representing space used by
almost any human being, just as natural language. Over time, maps have become
an everyday product. However, often there is a mismatch between what the map
designer has intended and how the map reader actually uses the map [4]. This
problem persists even though maps are rarely purely graphical representations,
but usually also contain (explanatory) verbal elements [5]. And this problem
increases with the increasing use of map-like representations in electronic form.
While there is a rich set of rules and guidelines for the generation of paper-based
cartographic maps (e.g., [6,7]), these rules are mostly missing for electronic maps
presented on websites or on mobile devices.
This can be observed in approaches for automatic assistance in spatial tasks.
Maps play a major role here; in addition to verbal messages almost all Internet
route planners and car navigation systems also provide information on the way to
take in graphical form. In research, for example in the areas of human-computer
interaction and context awareness, several approaches exist that deal with map-
based assistance (e.g., [8,9,10]). Most of these approaches employ mobile devices
to present maps; the maps are used as interaction means in location based ser-
vices [8,11,9]. Accordingly, this research aims at an automatic adaptation of the
maps to the given medium and situation [12]. Questions of context awareness and
adaptation to context play an important role [13] (see also the next section).
Our work is based on ideas presented by Berendt et al. [14]. They develop a
computational approach to constructing maps they term schematic. Schematic
maps are representations that are intentionally simplified beyond technical needs
to achieve cognitive adequacy [15]. They represent the specific knowledge needed
for a given task; accordingly, the resulting maps are task-specific maps [16]. Three

different levels of knowledge are distinguished in this approach: 1) knowledge that
needs to be represented unaltered, 2) knowledge that can be distorted but needs
to be represented, 3) knowledge that can be omitted [17]. This distinction guides
the map construction process in that the required knowledge, called aspects, is
selected from existing knowledge prior to map construction and ranked in a
depictional precedence [17]. This order guides the construction, for example, in
deciding which knowledge may be distorted to solve local conflicts that are due
to space limitations in the depictional medium. When reading a schematic map,
the reader’s assumptions about this depictional precedence need to match the
actually used precedence. Otherwise, map reading may lead to mis- or over-
interpretation [18].

3 The Concept of Focus in Map Design

As we have detailed in the last section, maps are important in our everyday life. They
are a prime means to communicate spatial information; reading maps is a recurring
task. Consequently, assistance systems that use maps as communication means
should not only assist in the given spatial task, but also provide assistance in
reading the maps. This holds especially since the advent of mobile devices with
their small displays as platform for these assistance systems.
In line with the aspect maps approach (see last section), maps as assistance
means should concentrate on the relevant information. This serves to reduce
cognitive load of the users; they should not need to process spatial information
that is not needed for the task at hand. At the same time, however, these maps
should also guide their reading. This serves to speed up information processing;
by the design of the map, map users should be drawn to the relevant information.
We term this design principle of reader guidance focus map. The focus ef-
fect is a specific form of schematization. While it does not reduce information
represented in a map homogeneously by, for example, removing objects or sim-
plifying geometry over all objects, it reduces the information to be processed by
funneling a reader’s attention to the relevant information.
Since schematic maps are task-specific [16], what information focus maps fo-
cus on is dependent on the task at hand. When the task is to guide a user from
location A to location B, maps need to be designed differently from maps that
present points of interest in the depicted environment. That is, map design is
context dependent; the appearance of the generated map depends on the envi-
ronment depicted, on the selected information, and on the intended task. Other
than the approaches listed in Section 2 and other “traditional” approaches to
context (e.g., [19,20]) that define context by (non-exhaustive) lists of factors
whose parametrization is supposed to result in context-adaptive behavior, we
take a process-oriented approach to context [21]. Figure 1 provides a diagram-
matic view on this approach. It distinguishes between the environment at hand,
the environment’s representation (in the context of this paper this is the focus
map), and an agent using the representation to interact with the environment—
here, this is the map user. Between these three constituents, processes determine

the interactions going on to solve a given task. For example, map reading and
interpretation processes determine what information the agent extracts from the
map, while processes of selection and schematization determine what information
gets depicted in the map by the map designer, i.e., determine the representation.
These processes, finally, are determined by the task at hand. The designer selects
and schematizes information with a specific task in mind, the map user reads
information off the map to solve a specific task. This way of handling context is
also flexible with respect to task changes—be it the kind of task or the concrete
task at hand. Thus, it may well be the basis for flexibly producing different kinds
of maps using the same data basis, for example, in mobile applications.

Fig. 1. A process-oriented view on context (from [21], modified). It is determined by
the interaction between environment (E), representation (R), and agent (A). The task
(T) determines the processes that drive this interaction.

What to Focus on
The term focus map stands for representations that guide a map user’s reading
processes to the relevant information. However, as just explained, depending on
the context there is a great variety of what this relevant information might be.
Accordingly, different kinds of maps can be summarized under the term focus
map. It is important to note that what is generally depicted on a map, i.e., the
(types of) objects shown, are selected in a previous step (see Section 4). The
selected features depend on the kind of task as illustrated above; focusing then
highlights specific instances of these features, namely those specifically relevant
for the actual task. For example, for a wayfinding map the street network as
well as landmark features may be selected for depiction; the route connecting
origin and destination and those landmarks relevant for the route then may be
highlighted using focus effects.
Broadly, we can distinguish between maps that focus on specific objects (or ob-
ject types) and maps that focus on specific areas of the depicted environment (cf.
also the distinction between object- and space-schematization in [22]). Focusing

on objects can be achieved by using symbols to represent the relevant objects, for
example, landmarks [23,24]. It may also be achieved by object-based schematiza-
tion, i.e., by altering the appearance of specific objects to either increase or de-
crease their visibility (see Section 4.2).
When focusing on specific areas, all objects in these areas are in focus, in-
dependent of their type. Objects in the focused area are highlighted, all other
objects are diminished. Such maps may, for example, focus on the route between
some origin and destination, funneling a wayfinder’s attention to the route to
take [1]. Several different areas can be in focus at the same time, which may be
disconnected. This holds also for focusing on multiple routes at the same time
to, for example, indicate alternative detours next to the proposed main route.
For all the different kinds of focus maps, graduated levels of focus are possible,
i.e., it is possible to define several levels of varying focus. In a way, this corre-
sponds to the depictional precedence explained in Section 2; different types of
information may be highlighted to different degrees. This may be used to either
depict “next-best” information along with the most important information, or to
increase the funneling effect by having several layers of increasing focus around
an area. With these graduated levels of focus, we can distinguish between strong and weak focus. With a strong focus, there is an obvious, hard difference between the presentation of features in focus and those that are not: features in focus are intensely highlighted, while all others are strongly diminished. A weak focus provides a
smoother transition between those features in focus and those that are not.
The kinds of focus maps presented so far all focus on either objects or areas,
i.e., on parts of the depicted environment. They emphasize structural informa-
tion [25]. However, maps may also be designed such that they emphasize the
actions to be performed. Such maps focus on functional information. Wayfind-
ing choreme maps [26] are an example of this kind of maps. In designing such
maps, the visual prototypes identified by Klippel [25] that represent turning ac-
tions at intersections emphasize the incoming and outgoing route-segments at
intersections, i.e., the kind of turn due at an intersection. This way, they ease
understanding which action to perform, reducing ambiguity and fostering con-
ceptualization of the upcoming wayfinding situations. Combining structural and
functional focus, for example, as in chorematic focus maps [27], then results in
maps that focus on the relevant information in the relevant areas.
Combining structural and functional focus is also employed in generating per-
sonalized wayfinding maps. Here, different levels of focus are used in that maps
depict information in different degrees of detail (focus) depending on how well
known an area is to the wayfinder [28]. Such maps, which show transitions between known and unknown parts of an environment, are a good example of using multiple levels of focus. The maps consist of three classes of elements of different
semantics and reference frames:

– One or more familiar paths; those paths are obtained by an analysis of previous trajectories of the wayfinder and map matching. Familiar paths belong to an individual frame of reference, as they describe a previously traveled route between two individually meaningful places. These are the
most restricted elements of the map: only the previously traveled path and
prominent places or landmarks along the path are selected and depicted on
the resulting map.
– Transition points; they describe the transition from familiar to unfamiliar
areas and also define the transition between the individual reference frame
and a geographic frame of reference. For reasons of orientation and localiza-
tion, elements of the known part at the transition points are selected and
added to the map.
– One or more unfamiliar areas; all elements of these areas belong to a ge-
ographic frame of reference. This means focus effects can only sensibly be
applied to unfamiliar environments, as is further explained below.
We apply focus effects differently for each of the three classes of elements. The
familiar paths are highly schematized, chorematized (all angles are replaced by
conceptual prototypes; see [26]), and scaled down. No focusing is applied to these
parts of the map, as there is no additional environmental information depicted
that could distract the attention of the wayfinder. These paths only serve as
connections between familiar and unfamiliar environments.
The purpose of maps based on previous knowledge is to highlight the unknown
parts of a route. Accordingly, the transition areas are subject to focus. To enable
localization, a transition point has to be clearly oriented and identifiable. This
requires resolving ambiguities that may arise. To this end, elements in the direct vicinity of the transition points that belong to the known parts of a route are
selected and displayed. We apply a strong focus function to these points. This
enables a smooth reading of the transition between the different parts. In unfa-
miliar parts, we display much more environmental information to provide more
spatial context. To focus a wayfinder’s attention on the route to travel, we apply
focus effects on the route as explained above (see also Section 4.3).

4 Implementation
Focus maps, as a specific kind of schematic map, are part of the toolbox for
schematic map design developed in project I2-[MapSpace] of the Transregional
Collaborative Research Center SFB/TR 8 Spatial Cognition.1 In this section, we
will briefly introduce the basics of this toolbox and the underlying operations for
generating focus maps. Section 4.3 then introduces a generic way of generating
focus maps and shows examples of the different kinds of focus maps discussed
so far.

4.1 Toolbox for Schematic Maps


The toolbox for schematic maps collects functionality for the design of schematic
maps in a coherent framework. It comprises fundamental operations, such as

1 http://www.sfbtr8.spatial-cognition.de/project/i2/
vector-based geometry and the construction of the required data structures (e.g., extracting a graph from a given street network). The toolbox can deal with data given in different formats, for instance, EDBS or GML files.2 There is
also functionality provided to export data again, which is also used as one way to
communicate between different parts of the toolbox. The main part of the tool-
box, though, is the provision of operations for the graphical and geometric ma-
nipulation of spatial objects (features represented as points, lines or polygons).
These operations form the basis for the different implemented schematization
principles; those operations required for focus maps are explained in more detail
in the next subsection.
The toolbox is implemented in Lisp. Maps can be produced as Scalable Vector
Graphics (SVG)3 or in Flash format.4 SVG is an XML-based graphics format that is highly portable across different platforms and applications. Flash allows for a simple integration of interactive elements into the map itself and can be displayed by most modern browsers.
For the most part, the operations of the data processing part can be used independently of each other; there is no predefined order of execution. The context model
presented in Section 3 (see also Fig. 1) may be used to implement a control mod-
ule that determines the execution order given a task, agent, and environment.

4.2 Basic Operations for Focus Maps

As for any schematic map to be constructed, the spatial information (e.g., objects
or spatial relations) to be depicted needs to be selected. Specific to focus maps,
the selection operation also involves determining which parts of this information
are to be highlighted. The concrete operation achieving this focus depends on
the kind of focus effect aimed for. Focusing on specific objects, for example, is
realized simply by type comparison of the objects in the database with the target
type. In focusing on specific areas, on the other hand, for every object a focus
factor is calculated that depends on the object’s distance to the focus area.
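The two selection mechanisms can be outlined as follows. This is an illustrative sketch only, under simplifying assumptions: the toolbox itself is implemented in Lisp (see Section 4.1), and the object representation, function names, and vertex-based distance computation used here are ours, not the toolbox's.

import math

def in_object_focus(obj, target_type):
    # Object focus: an object is in focus if and only if its type matches the target type.
    return obj["type"] == target_type

def area_focus_factor(obj, focus_area):
    # Area focus: a factor that grows with the object's distance to the focus area
    # (0 means the object lies on the focus area and is fully in focus). Distances
    # are taken between vertices only, as a crude stand-in for proper vector geometry.
    return min(math.dist(p, q) for p in obj["coords"] for q in focus_area)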
The most important operation for designing focus maps is adapted coloring
of depicted graphical objects. This operation determines the visual appearance
of the map; it works on a perceptual level. This operation is used for any kind of
focus map described in Section 3. The coloring operation manipulates the color—
the RGB values—of objects before they get depicted. Those objects that are in
focus are depicted in full color to make them salient. In contrast, the color of
objects not in focus is shifted towards white. This color shift renders these ob-
jects less visible, as they are depicted in a lighter, more grayish color. As a result, the non-shifted objects stand out, putting them in focus. Additionally, the ge-
ometry of objects not in focus may be simplified. This further diminishes their
visual appearance, as has been demonstrated by [1]. To this end, the toolbox
implements line simplification based on discrete curve evolution [29].
2 EDBS: http://www.atkis.de; GML: http://www.opengis.net/gml/
3 http://www.w3.org/Graphics/SVG/
4 http://www.adobe.com/support/documentation/en/flash/documentation.html
4.3 A Generic Way of Generating Focus Maps

As argued in Section 3, focus maps conceptually cover a range of different effects that put parts of the depicted information in focus. This is reflected in the imple-
mentation. A single function allows for the generation of different kinds of focus
maps. It provides a uniform, generic interface to designing the different kinds
of maps by capturing the fundamental logic of map construction. By setting two parameters, a map designer can determine which focus effects are employed. The
first parameter determines which features shall be in focus—by either specifying
their boundary (area focus) or by listing them (object focus). The second param-
eter then determines which kind of focus effect shall be generated—graduated
focus or focus on a single (type of) feature.
The focus map function performs the requested operations on the map objects
to be depicted. Visual presentation—drawing the map—is realized in the toolbox
by another function taking parameters that determine, for example, map size,
whether a grid is drawn, and which output format (SVG or Flash) is used. This
function takes all objects to be depicted as a list and draws them in the list’s
order. Thus, additional visual effects can be achieved by carefully choosing the
objects’ order in the list. Objects at the end of the list are drawn later than those at the beginning, i.e., by ordering the objects appropriately, effects of (partial) occlusion can be achieved.
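As a minimal illustration of this convention (the drawing function itself is part of the Lisp toolbox; the snippet below only mimics the ordering behavior, and all names are hypothetical):

def draw_map(objects):
    # Objects are painted in list order, so later entries occlude earlier ones
    # where they overlap.
    for order, obj in enumerate(objects):
        print("paint layer", order, ":", obj)

# Putting the route last keeps it visible on top of all other features.
draw_map(["water bodies", "street network", "tramways", "route"])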
In the following, we show some example focus maps generated by the tool-
box. For each map, we provide the parameter settings used. We use part of the
inner city of Bremen, Germany, as an example environment; the maps depict
streets, water bodies, and tramways. The first sample focus map shown in Fig-
ure 2 highlights tramways, while water bodies are strongly diminished. This is
achieved by setting the first parameter to ranked-objects; the value ranked-objects
corresponds to maps that allow for a rank order in focusing on objects. Accord-
ingly, the second parameter states this rank order, given as a list of weights
(0 ≤ w ≤ 1). Here, the weight for tramways is 1, for water bodies 0.2, and for
streets 0.5.
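A call to the generic focus map function with the settings used for Figure 2 might then look roughly as follows. The interface shown is a hypothetical Python rendering of the two-parameter scheme described above, not the toolbox's actual Lisp API.

def make_focus_map(objects, focus_kind, focus_spec):
    # Sketch of the generic interface: the first parameter names the kind of focus,
    # the second one specifies it (here: a rank order of weights per object type).
    if focus_kind == "ranked-objects":
        # Fade each object towards white according to 1 - weight of its type.
        return [{**obj, "fade": 1.0 - focus_spec.get(obj["type"], 1.0)} for obj in objects]
    raise NotImplementedError(focus_kind)

# Weights used for the map in Figure 2: tramways fully in focus,
# water bodies strongly diminished, streets in between.
focus_map = make_focus_map(
    [{"type": "tramway"}, {"type": "water"}, {"type": "street"}],
    focus_kind="ranked-objects",
    focus_spec={"tramway": 1.0, "water": 0.2, "street": 0.5},
)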
The sample maps of Figure 3 illustrate effects that focus on specific areas. In
Figure 3a, a single route is in focus, while Figure 3b additionally keeps alternative detours in focus to a lesser degree. To achieve this, the first parameter is set to area-focus. The second parameter states the area(s) to focus on. In the case of the example maps shown here, these areas are the routes in focus (given as sequences of coordinates). If there are multiple routes to focus on, the first route is taken as the main route and the following routes as alternatives with a lesser degree of
focus. The chosen example of Figure 3b is an artificial one, emphasizing the
effect of focusing on multiple areas. Usually, the additional routes would be possible detours in case the main route is blocked; having disconnected routes in a single map, as shown here, simply helps to make the effects more visible.
Fig. 2. A focus map emphasizing a specific object type. In this example, tramways (the big, black lines) are highlighted, while water bodies (the light gray areas) are strongly diminished.

More formally, fading out of colors is achieved by calculating for each coordinate its distance to the area in focus (here, the route). Since in the RGB color space (0, 0, 0) is black and (255, 255, 255) is white, a shift towards white corresponds to adding a factor to each color component. The distance d is defined as the minimal distance between a coordinate c and the focus area f. The three new color components r′, g′, and b′ are then calculated as the minimum of 230 (see footnote 5) and the sum of the old color component (r, g, or b, respectively) and the distance d multiplied by a factor k, which determines how quickly colors fade out (i.e., corresponds to strong or weak focus). This sum is normalized by the size of the environment s:

d = |c − f|
r′ = min(230, r + kd/s)
g′ = min(230, g + kd/s)
b′ = min(230, b + kd/s)

5 An RGB color of (230, 230, 230) corresponds to a light grey that is still visible on a white background.
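The following sketch expresses this computation in code. It is illustrative only: the focus area is assumed to be given as a list of coordinates and distances are taken to these coordinates directly, whereas the toolbox (written in Lisp) uses proper vector geometry; all names are ours.

import math

def fade_color(rgb, c, focus_area, s, k):
    # Shift an RGB color towards white in proportion to the distance of coordinate c
    # from the focus area; 230 caps the shift so that faded objects remain visible
    # on a white background (see footnote 5).
    d = min(math.dist(c, f) for f in focus_area)    # d = |c - f|
    shift = k * d / s                               # normalized by environment size s
    return tuple(min(230, v + shift) for v in rgb)

# A coordinate far from the route ends up in a light grey; a coordinate on the
# route keeps its full color.
print(fade_color((0, 0, 0), (500.0, 40.0), [(10.0, 10.0), (20.0, 10.0)], s=1000.0, k=400.0))
print(fade_color((0, 0, 0), (10.0, 10.0), [(10.0, 10.0), (20.0, 10.0)], s=1000.0, k=400.0))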
Fig. 3. Focus maps emphasizing route information. a) A single route is in focus; b) Multiple routes in focus; one is the main route (the same as in a), two others (the bigger lines) are focused on to a lesser degree.

When multiple areas a0, ..., an are present, the secondary areas are integrated as focus objects such that they decrease the added sum again. This is achieved by calculating an additional distance value n that gives the minimal distance (nearness) of a coordinate c to the nearest additional area. However, to restrict the influence of additional areas, we only take those coordinates into account that are nearer to the areas than the average distance between all objects and the main focus object (this average distance is denoted by p). The value n is additionally modified by another focus factor j that determines the strength of the additional areas’ influence:

n = max(0, p − min_a |c − a|)
r′ = min(230, r + kd/s − jn)
g′ = min(230, g + kd/s − jn)
b′ = min(230, b + kd/s − jn)
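Under the same simplifying assumptions as the earlier sketch, the multi-area variant differs only in the additional term jn; again, this is an illustrative rendering, not the toolbox's implementation.

import math

def fade_color_multi(rgb, c, main_area, extra_areas, s, k, j, p):
    # Secondary focus areas pull the color back towards full saturation, but only
    # for coordinates closer to them than the average distance p (see text).
    d = min(math.dist(c, f) for f in main_area)
    nearest_extra = min(math.dist(c, a) for area in extra_areas for a in area)
    n = max(0.0, p - nearest_extra)                 # n = max(0, p - min over a of |c - a|)
    shift = k * d / s - j * n
    return tuple(min(230, v + shift) for v in rgb)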

5 Empirical Results
In the literature and in our own work, several arguments can be found for why focus maps, as discussed in the previous sections, are beneficial for map reading and, consequently, for task performance. Li and Ho [30], for example, discuss maps for navigation systems that highlight the area a wayfinder is currently in. A user study demonstrates that people consider this highlighting beneficial, especially if a strong focus function is used, i.e., if only the area in the immediate vicinity is highlighted. In a similar vein, the resource-adaptive naviga-
tion system developed at Universität Saarbrücken [8,11] adapts presentation of
information to the available display space and the time a user has to extract the
required information. The less time there is, the more focused the presentation is.
If there is only a short time to extract information, the presented information is
highly schematized and restricted to the route. If time permits, i.e., if a user
remains at one spot for some time, the system displays the surroundings with
increasing detail. This way, users can quickly and with little effort read from the device what to do when in a hurry, but can also re-plan and orient themselves when sufficient time is available [8].
In one of our own studies—which will be the first in a line of studies con-
cerned with the performance of different map types (see Section 7)—Kuhnmünch
and Strube [31] tested wayfinding performance with three different schematic
maps. Participants had to follow a specific route indicated on printed maps. The
route consisted of pathways situated on a campus area (Universität Freiburg,
Germany). The route tested had a length of 775 meters and comprised sixteen
decision points. Sixteen participants unfamiliar with the campus used a chore-
matic focus map of the type depicted in Figure 4; the route is indicated by a line
connecting the origin and the destination. In the same experiment, we tested two
other types of maps (two additional groups with sixteen participants each). These
results are not reported, as we specifically discuss focus maps here. Concerning
wayfinding performance, seven participants went astray but finally reached the
destination; nine accomplished the task without errors. Furthermore, a post-test
asked participants to evaluate the given map and their wayfinding experience
with it. Taken together, the focus map yielded good results. On five-point rat-
ing scales (0: “I strongly disagree”; 4: “I strongly agree”) participants indicated
it was easy to use (Mean = 3; Standard Deviation = 0.65), they succeeded in localizing themselves on the map (Mean = 3; Standard Deviation = 1.07), and
they mostly knew which action to take at decision points (Mean = 3; Standard
Deviation = 0.89).
Fig. 4. Sample of a chorematic focus map as used in the study

Surprisingly, though, none of the participants stated that they had used the contours of buildings for self-localization. Instead, all participants indicated they had used the structure of the pathways and the landmarks given in the map for solving the task. In fact, they experienced such structural information successively while wayfinding and could match it with the structures shown on the map. Pre-
sumably, comparing contours of buildings was either deemed unnecessary or was too intricate for this task. Of course, these results should not be interpreted as a recommendation to omit buildings from maps. Instead, how useful or essential information on buildings is depends on the task. Another experiment [32] exempli-
fies this for the task of self-localization with minimal given context. Participants
unfamiliar with the same campus area were blindfolded and placed in front of a
building. After the blindfold had been removed, they were asked to indicate their
position on either a map that only displayed the contours of all the buildings of
the campus, or on a map that only depicted all pathways of the campus.6 They
were allowed to explore a small area around their current position in order to
understand the local configuration of pathways or buildings. If participants in-
dicated the wrong position on the map, they were asked to rethink their answer.
Experimenters measured the time until the correct answer was given. As ex-
pected, participants with the building map were significantly faster. They could
rely on contours and the configuration of buildings as relatively unambiguous
landmarks. In contrast, the other participants had to match the small experienced section of the path network with the complete network shown on the map, which is more ambiguous and, therefore, causes more errors and longer reaction times.

6 Focus in 3D
Schematization methods, including focus effects, can also be transferred to the
generation of 3D virtual environments (VEs). Nowadays, these environments are increasingly used, for example, to visualize geospatial data [33]. Some of these geospatial virtual environments remodel real environments such as cities, as, for example, in Google Earth. One of the reasons for this recent trend is the huge amount of 3D data available for producing high-quality virtual environments. These virtual cities can be used not only for entertainment, but they
can also provide a new medium for tourism and can be used for training people,
for example, in rescue scenarios.
A virtual environment “[...] offers the user a more naturalistic medium in
which to acquire spatial information, and potentially allows to devote less cog-
nitive effort to learning spatial information than by maps” ([34], p. 275). While
this is an important aspect of using VEs for getting acquainted with an environment, several navigational problems have also been identified [34]. Compared
to navigational experiences in a real environment, when navigating in a vir-
tual environment people get less feedback and information on their movement.
This is due to the fact that virtual environments are often only presented on a
desktop and movement is controlled by a joystick or a mouse. Vestibular and
proprioceptive stimuli are missing in this case [35]. Therefore, people have se-
vere problems in orienting and acquiring survey knowledge. Accordingly, there
has been a lot of research trying to improve navigational performance in VEs
6 None of these maps were focus maps.
(e.g., [36,37]). Nevertheless, there are contradictory results regarding how well survey knowledge can be learned from a virtual environment [34] and which ways of presenting spatial information in VEs are most efficient.
We believe that a transfer of schematization principles from 2D to 3D repre-
sentations is a promising way to ease the extraction of the relevant information in VEs and, hence, to improve navigation performance. One
example of this transfer is the use of focus effects in 3D by, for example, fading
colors away from the relevant areas and using simplified geometry in areas that
are not in focus—similar to the maps depicted in Figure 3. This way, we can form
regions of interest, such as a specific route (see Fig. 5 for a sketch of this effect).
This focus effect may be used to form several regions of interest by highlighting
different features and using different levels of detail. Forming such regions may help users gain a better sense of orientation [38].

Fig. 5. Sketched example of how to transfer focus effects to 3D virtual environments

A transfer of focus maps from 2D to 3D environments is also proposed by Neis and Zipf [24] (see also [39]). They present an approach to integrating land-
marks in 2D focus maps and outline how to transfer this to 3D virtual envi-
ronments. However, there are more options for achieving a focus effect than the use of landmarks alone. As explained in Section 3, we distinguish between object-
and space-schematization [22]. In object-schematization—which also covers the
highlighting of specific objects as landmarks—a further option is, for example,
to highlight specific features of an object, depending on its role in the given task.
Space-schematization can be used to highlight specific areas as explained above,
but also to emphasize distance or direction information, such as the transfer of
the choreme prototypes for turning actions (see Section 3).

7 Conclusions
We have discussed and generalized the concept of focus maps previously pre-
sented by [1]. Focus maps are specific kinds of schematic maps. The concept of
focus maps covers a range of different kinds of maps that all have in common
that they guide map reading to the relevant information for a given task. We
can distinguish between maps that focus on specific (types of) objects, and those
that focus on specific areas. We have illustrated their properties, design princi-
ples, and how they relate to our context model. We have introduced a toolbox
for the design of schematic maps and shown example maps constructed with this
toolbox. We have also outlined how navigation in 3D virtual environments may
benefit from a transfer of the concept of focus maps to these representations.
In addition to the transfer of focus effects from 2D to 3D representations
explained in Section 6, we plan to employ the concept of focus maps in maps that, while primarily presenting the route to take, also provide information on
how to recover from accidental deviation from that route. Here, those decision
points (intersections) that are considered to be especially prone to errors may
be highlighted and further environmental information, i.e., the surrounding area,
may be displayed in more detail than is used for the rest of the map.
We reported some empirical studies that support the claims of our approach.
Further analyses and empirical studies are required, though, to better understand
the properties of focus maps and wayfinding performance with diverse types of
maps. For example, we plan to perform eye-tracking studies that will determine
whether a map user’s map reading is guided as predicted by the employed design
principles. We will also further analyze the performance of map users in different
wayfinding tasks, such as route following or self-localization, where they are
assisted by different types of maps, for example, tourist maps or focus maps.
These studies will help to improve design principles for schematic maps and will
lead to a detailed model of map usage. Finally, we will evaluate the consequences
of transferring focus maps to 3D environments on navigation performance in
these environments.

Acknowledgments

This work has been supported by the Transregional Collaborative Research Cen-
ter SFB/TR 8 Spatial Cognition, which is funded by Deutsche Forschungsge-
meinschaft (DFG). Fruitful discussions with Jana Holsanova, University of Lund,
helped to sharpen the ideas presented in this paper. We would also like to thank the participants of a project seminar held by C. Hölscher and G. Strube at Universität Freiburg for providing their empirical results (see [32]).

References
1. Zipf, A., Richter, K.-F.: Using focus maps to ease map reading — developing
smart applications for mobile devices. KI Special Issue Spatial Cognition 02(4),
35–37 (2002)
2. Tversky, B.: Some ways that maps and diagrams communicate. In: Freksa, C.,
Brauer, W., Habel, C., Wender, K.F. (eds.) Spatial Cognition II - Integrating
Abstract Theories, Empirical Studies, Formal Methods, and Practical Applications,
pp. 72–79. Springer, Berlin (2000)
3. Stea, D., Blaut, J.M., Stephens, J.: Mapping as a cultural universal. In: Portu-
gali, J. (ed.) The Construction of Cognitive Maps, pp. 345–358. Kluwer Academic
Publishers, Dordrecht (1996)
4. Mijksenaar, P.: Maps as public graphics: About science and craft, curiosity and
passion. In: Zwaga, H.J., Boersema, T., Hoonhout, H.C. (eds.) Visual Informa-
tion for Everyday Use: Design and Research Perspectives, pp. 211–223. Taylor &
Francis, London (1999)
5. Tversky, B., Lee, P.U.: Pictorial and verbal tools for conveying routes. In: Freksa,
C., Mark, D.M. (eds.) Spatial Information Theory - Cognitive and Computational
Foundations of Geographic Information Science, Berlin, International Conference
COSIT, pp. 51–64. Springer, Heidelberg (1999)
6. MacEachren, A.: How Maps Work: Representation, Visualization and Design. Guil-
ford Press, New York (1995)
7. Hirtle, S.C.: The use of maps, images and ”gestures” for navigation. In: Freksa,
C., Brauer, W., Habel, C., Wender, K.F. (eds.) Spatial Cognition II - Integrating
Abstract Theories, Empirical Studies, Formal Methods, and Practical Applications,
pp. 31–40. Springer, Berlin (2000)
8. Wahlster, W., Baus, J., Kray, C., Krüger, A.: REAL: Ein ressourcenadaptierendes
mobiles Navigationssystem. Informatik Forschung und Entwicklung 16, 233–241
(2001)
9. Schmidt-Belz, B.P.S., Nick, A., Zipf, A.: Personalized and location-based mobile
tourism services. In: Workshop on Mobile Tourism Support Systems, Pisa, Italy
(2002)
10. Kray, C., Laakso, K., Elting, C., Coors, V.: Presenting route instructions on mobile
devices. In: International Conference on Intelligent User Interfaces (IUI 2003), pp.
117–124. ACM Press, New York (2003)
11. Baus, J., Krüger, A., Wahlster, W.: A resource-adaptive mobile navigation system.
In: IUI 2002: Proceedings of the 7th international conference on Intelligent user
interfaces, pp. 15–22. ACM Press, New York (2002)
12. Reichenbacher, T.: The world in your pocket — towards a mobile cartography.
In: Proceedings of the 20th International Cartographic Conference, Beijing, China
(2001)
13. Zipf, A.: User-adaptive maps for location-based services (LBS) for tourism. In:
Woeber, K., Frew, A., Hitz, M. (eds.) Proceedings of the 9th International Con-
ference for Information and Communication Technologies in Tourism, Innsbruck,
Austria, ENTER 2002. Springer, Heidelberg (2002)
14. Berendt, B., Barkowsky, T., Freksa, C., Kelter, S.: Spatial representation with
aspect maps. In: Freksa, C., Habel, C., Wender, K.F. (eds.) Spatial Cognition
1998. LNCS (LNAI), vol. 1404, pp. 157–175. Springer, Heidelberg (1998)
15. Klippel, A., Richter, K.-F., Barkowsky, T., Freksa, C.: The cognitive reality of
schematic maps. In: Meng, L., Zipf, A., Reichenbacher, T. (eds.) Map-based Mobile
Services - Theories, Methods and Implementations, pp. 57–74. Springer, Berlin
(2005)
16. Freksa, C.: Spatial aspects of task-specific wayfinding maps - a representation-
theoretic perspective. In: Gero, J.S., Tversky, B. (eds.) Visual and Spatial Reason-
ing in Design, pp. 15–32. University of Sydney, Key Centre of Design Computing
and Cognition (1999)
17. Barkowsky, T., Freksa, C.: Cognitive requirements on making and interpreting
maps. In: Hirtle, S.C., Frank, A.U. (eds.) COSIT 1997. LNCS, vol. 1329, pp. 347–
361. Springer, Heidelberg (1997)
18. Berendt, B., Rauh, R., Barkowsky, T.: Spatial thinking with geographic maps: An
empirical study. In: Czap, H., Ohly, P., Pribbenow, S. (eds.) Herausforderungen an
die Wissensorganisation: Visualisierung, multimediale Dokumente, Internetstruk-
turen, pp. 63–73. Ergon-Verlag, Würzburg (1998)
19. Dey, A.K.: Understanding and using context. Personal and Ubiquitous Comput-
ing 5(1), 4–7 (2001)
20. Sarjakoski, L.T., Nivala, A.M.: Adaptation to context - a way to improve the
usability of topographic mobile maps. In: Meng, L., Zipf, A., Reichenbacher, T.
(eds.) Map-based Mobile Services - Theories, Methods and Implementations, pp.
107–123. Springer, Berlin (2005)
21. Freksa, C., Klippel, A., Winter, S.: A cognitive perspective on spatial context.
In: Cohn, A.G., Freksa, C., Nebel, B. (eds.) Spatial Cognition: Specialization and
Integration. Number 05491 in Dagstuhl Seminar Proceedings, Dagstuhl, Germany,
Internationales Begegnungs- und Forschungszentrum für Informatik (IBFI), Schloss
Dagstuhl, Germany (2007)
22. Peters, D., Richter, K.F.: Taking off to the third dimension — schematization of vir-
tual environments. International Journal of Spatial Data Infrastructures Research
(accepted); Special Issue GI-DAYS 2007. Young Researchers Forum, Münster
23. Elias, B., Paelke, V., Kuhnt, S.: Concepts for the cartographic visualization of
landmarks. In: Gartner, G. (ed.) Location Based Services & Telecartography - Pro-
ceedings of the Symposium 2005. Geowissenschaftliche Mitteilungen, TU Vienna,
pp. 1149–1155 (2005)
24. Neis, P., Zipf, A.: Realizing focus maps with landmarks using OpenLS services.
In: Mok, E., Gartner, G. (eds.) Proceedings of the 4th International Symposium
on Location Based Services & TeleCartography, Department of Land Surveying &
Geo-Informatics. HongKong Polytechnic University (2007)
25. Klippel, A.: Wayfinding choremes. In: Kuhn, W., Worboys, M.F., Timpf, S. (eds.)
COSIT 2003. LNCS, vol. 2825, pp. 320–334. Springer, Heidelberg (2003)
26. Klippel, A., Richter, K.-F., Hansen, S.: Wayfinding choreme maps. In: Bres, S.,
Laurini, R. (eds.) VISUAL 2005. LNCS, vol. 3736, pp. 94–108. Springer, Heidelberg
(2006)
27. Klippel, A., Richter, K.F.: Chorematic focus maps. In: Gartner, G. (ed.) Location
Based Services & Telecartography. Geowissenschaftliche Mitteilungen. Technische
Universität Wien, Wien, pp. 39–44 (2004)
28. Schmid, F.: Personalized maps for mobile wayfinding assistance. In: 4th Interna-
tional Symposium on Location Based Services and Telecartography, Hong Kong
(2007)
29. Barkowsky, T., Latecki, L.J., Richter, K.-F.: Schematizing maps: Simplification of
geographic shape by discrete curve evolution. In: Freksa, C., Brauer, W., Habel, C.,
Wender, K.F. (eds.) Spatial Cognition II - Integrating Abstract Theories, Empirical
Studies, Formal Methods, and Practical Applications, pp. 41–53. Springer, Berlin
(2000)
30. Li, Z., Ho, A.: Design of multi-scale and dynamic maps for land vehicle navigation.
The Cartographic Journal 41(3), 265–270 (2004)
31. Kuhnmünch, G., Strube, G.: Wayfinding with schematic maps. Data taken from
an article in preparation (2008)
32. Ahles, J., Scherrer, S., Steiner, C.: Selbstlokalisation mit Karten und Orientierung
im Gelände. Unpublished report from a seminar held in 2007/08 by C. Hölscher
and G. Strube. University of Freiburg (2007)
33. Slocum, T., Blok, C., Jiangs, B., Koussoulakou, A., Montello, D., Fuhrmann, S.,
Hedley, N.: Cognitive and usability issues in geovisualization. Cartography and
Geographic Information Science 28(1), 61–75 (2006)
34. Montello, D.R., Hegarty, M., Richardson, A.E.: Spatial memory of real environ-
ments, virtual environments, and maps. In: Allen, G. (ed.) Human spatial memory:
Remembering where, pp. 251–285. Lawrence Erlbaum Associates, Mahwah (2004)
35. Nash, E.B., Edwards, G.W., Thompson, J.A., Barfield, W.: A review of pres-
ence and performance in virtual environments. International Journal of Human-
computer Interaction 12(1), 1–41 (2000)
36. Darken, R.P., Sibert, J.L.: A toolset for navigation in virtual environments. In:
UIST, pp. 158–165 (1993)
37. Darken, R.P., Sibert, J.L.: Wayfinding strategies and behaviours in large virtual
worlds. In: CHI, pp. 142–149 (1996)
38. Wiener, J.M., Mallot, H.A.: ’fine-to-coarse’ route planning and navigation in re-
gionalized environments. Spatial Cognition and Computation 3, 331–358 (2003)
39. Coors, V.: Resource-adaptive interactive 3d maps. In: SMARTGRAPH 2002: Pro-
ceedings of the 2nd international symposium on Smart graphics, pp. 140–144.
ACM, New York (2002)
Locating Oneself on a Map in Relation to Person
Qualities and Map Characteristics

Lynn S. Liben1, Lauren J. Myers1, and Kim A. Kastens2


1 Department of Psychology, The Pennsylvania State University, University Park, PA 16802, USA
2 Lamont-Doherty Earth Observatory, Department of Earth & Environmental Sciences, Columbia University, Palisades, NY 10964, USA
liben@psu.edu, ljmyers@brynmawr.edu, kastens@ldeo.columbia.edu

Abstract. Adults were taken to various positions on a college campus and
asked to mark their locations on a round or square map drawn from either di-
rectly overhead or from an oblique angle. In session 1, participants were also
given paper and pencil spatial tests to assess their skills in mental rotation (2D
figure rotation), spatial visualization (paper folding), and spatial perception
(water level). In session 2, participants completed computer-based navigation
and mapping tasks. Performance varied widely among participants. Regression
analyses showed that spatial skills predicted performance on both campus and
computer mapping tasks, but the specific spatial skills that predicted success
differed. Across map types, some differences in strategies and speed were ob-
served. Findings show the value of research with both real and simulated envi-
ronments, and with maps having varying cartographic properties.

Keywords: Spatial cognition, maps, navigation, spatial skills.

1 Introduction
Spatial cognition refers to the myriad of cognitive processes involved in acquiring,
storing, representing, and manipulating knowledge about space. The spaces in ques-
tion may range from small spaces, visible from a single viewpoint and amenable to
direct manipulation (e.g., a desk surface littered with objects), to environmental
spaces that may be experienced by navigating to multiple vantage points (e.g., a cam-
pus or city environment), to geographic or celestial spaces that are rendered visible by
amplifiers of human capacities (e.g., maps representing the entire surface of Earth at
once, photographs of the far side of the moon) [1]. Cognitive processes concerning
space may be supported by a variety of representations ranging from the interior and
mental (e.g., mental images of individual objects or landmarks, a survey-like cogni-
tive map) to the external and concrete (e.g., Global Positioning System technology, a
room blueprint, a road map). The focus of the research discussed here is on human
adults’ ability to use external spatial representations (maps) to represent navigable
environments. Specifically, we examine adults’ success in connecting locations in
outdoor (campus or park) environments to locations on a map.

The motivation for our focus on maps is both practical and theoretical. At the prac-
tical level, maps are pervasive tools across eras and cultures, and maps are used to
teach new generations about how to conceptualize and use the environments in which
they live and work [2,3,4,5]. They play a central role in a wide range of disciplines as
diverse as epidemiology, geology, geography, and ecology; they are used for common
life tasks such as navigating to new locations, interpreting daily news reports, and
making decisions about where to buy a house or locate a business [6,7]. Map use and
map education may provide important pathways for enhancing users’ spatial skills
more generally [5,8,9,10]. Research on map use may thus help to identify what map
qualities impede or enhance clarity or use, and may help to identify what qualities of
people must be taken into account when designing maps or educational interventions.
At the theoretical level, research on map understanding is valuable because maps
challenge users’ representational, logical, and – of particular relevance here – spatial
concepts. Studying how adults successfully use maps (or become confused by them)
may help to identify component spatial processes and strategies, in turn enhancing
understanding of basic spatial cognition.
In the current research, people were asked to find correspondences between loca-
tions in environmental space and locations on a map of that space. Figuring out where
one is “on a map” is an essential step for using a map to navigate from one’s current
location to another location. It is also an essential step for using a map to record in-
formation about spatial distributions of phenomena observed in the field, as when
geologists record locations of rock outcrops, ecologists record the nesting areas of a
particular species, or city planners record areas of urban blight.
There is a relatively large body of research that explores the way that people de-
velop and use mental representations of large environments [11,12,13]. There is also a
relatively large body of research that explores the way that people use maps to repre-
sent vista spaces, that is, spaces that extend beyond the tabletop, but that can still be
seen from a single vantage point or with only minor amounts of locomotion [14,15].
But there has been relatively little work that combines experience in large-scale, navi-
gable spaces with finding one’s location on ecologically valid maps of those spaces.
Our work falls at this intersection, and, as enumerated below, was designed to address
four major topics: adults’ success and strategies in identifying their current locations
on a map, whether these would differ with different map characteristics, whether suc-
cess would vary with participants’ spatial skills and gender, and, finally, whether pat-
terns of findings would be similar for field and computer mapping tasks.

1.1 Finding Oneself on a Map

First, we were interested in examining how well adults carry out the important step in
map use of locating themselves on a map when they are in a relatively unfamiliar en-
vironmental space and are given a map of that space without verbal information. This
is the condition one faces in real life when one is in a new environment with a map
labeled in a completely foreign language (as, for example, when an English-literate
monolingual is using a map labeled in Japanese or Arabic).
To collect relevant data, we asked college students (relatively new to campus) to
show their locations on a map similar to the one routinely provided to campus visi-
tors. Prior research [16] has shown that many adults head off in the wrong direction
after consulting posted “You Are Here” maps when the map is unaligned with the
referent space (i.e., when up on the map does not indicate straight ahead in the space).
Would adults likewise have difficulty identifying their own location on a map even if
they had the opportunity to manipulate it as they liked? Would they rotate the map as
they tried to get their bearings?

1.2 Map Qualities

Second, we were interested in examining the effect of map variables on the user’s
success in identifying correct locations. Within psychology, research on map use has
tended to pay relatively little attention to the particular kind of map used. That is, psy-
chological research has generally examined map performance in relation to person
variables (e.g., age, sex, spatial skills) rather than in relation to cartographic variables
(e.g., scale, viewing angle, color schemes). Within cartography, research has tended
to examine the pragmatic effects of manipulating map variables (i.e., asking which of
several maps works best), paying relatively little attention to how perceptual and cog-
nitive theories inform or are informed by the observed effects.
One potentially fruitful way to tie these two traditions together is through the
concept of embodiment, the notion that our bodies and bodily activities ground some
aspects of meaning [17]. There has been considerable work on the importance of em-
bodied action for encoding spatial information from the environment. For example,
Hegarty and colleagues [18] reported that kinesthetic experiences associated with
moving through the environment contribute to learning spatial layouts. An embodi-
ment perspective also implies that place representations will be relatively more or less
difficult to interpret to the degree that they are more or less similar to embodied ex-
perience [19]. Consistent with this argument, prior research has shown that preschool
children are better able to identify locations on an oblique perspective map than on an
overhead map (plan view) of their classroom, and are better able to identify referents
on oblique than vertical aerial photographs [19,20,21]. In comparison to plan repre-
sentations, oblique representations are more consonant with perceptual experiences as
humans move through their ecological niche using the sensory and locomotor capaci-
ties of their species.
To test whether map characteristics have an effect on adult performance, we exam-
ined adults’ success in marking their locations on one of four different kinds of cam-
pus maps created by crossing two dimensions – viewing angle (varying whether the
map was plan vs. oblique) and map shape (varying whether the map was round vs.
square). We expected that the difference in viewing angle might show an advantage
for the oblique map (following the embodiment argument above). We expected that
the difference in shape might advantage the round map because unlike a rectilinear
map, it does not implicitly privilege any particular orientation (thus perhaps increas-
ing participants’ propensity to turn the map into alignment with the environment).
However, because the two map variables might be expected to interact (because an
oblique – but not a plan view map – specifies a particular viewing direction), we did
not design this work as a test of a priori predictions, but instead as a means of exam-
ining adults’ success and strategies in relation to map type.
1.3 Spatial Skills and the Campus Mapping Task

A third goal of our research was to examine whether spatial skills would predict
performance on the campus mapping task, and if so, which spatial tasks would have
predictive value. Earlier investigators have addressed the relation between spatial
abilities and success in learning large-scale spatial layouts [18,22]. Here we extended
this approach to tasks that did not require integrating or remembering information
gathered across time and space, but instead required participants to link information
from the visible, directly perceived environment to a graphic representation of that
environment. To select the candidate spatial skills, we drew from the task- and meta-
analysis of Linn and Petersen [23] which identified three major kinds of spatial abili-
ties: mental rotation (skill in imagining figures or objects moving through two- or
three-dimensional space), spatial perception (skill in representing one’s own or an
object’s orientation despite conflicting visual cues or frames of reference), and spatial
visualization (skill in solving multi-step spatial tasks by a combination of verbal and
visual strategies). In addition, we designed our work to examine whether participant
sex would have any predictive value for performance on the mapping task, above and
beyond any that might be attributed to differences in measured spatial skills. This
question was of interest because of the continuing evidence of gender differences in
spatial cognition [24].

1.4 Simulating Environmental Mapping

A final goal of our research was motivated by the practical challenges of studying
map-related spatial cognition in the field as in the campus mapping task just de-
scribed. There are surprisingly frequent changes in field sites even in environments
that might be expected to be highly stable. In our work, for example, even over short
time spans we have encountered the construction of new buildings, new roads, and
new signage, all of which influence the test environment, require a change in routes
between locations, and necessitate the preparation of new maps. Outdoor testing is
open to the exigencies of weather and daylight; the use of large field sites requires
energetic experimenters and participants. The layout of field sites cannot be manipu-
lated to test theoretically interesting questions. It is difficult to identify local partici-
pants who do not yet have too much familiarity with the site, and it is equally difficult to identify and transport non-local participants to the site. These and similar
concerns led us to join others who have attempted to develop simulated testing envi-
ronments [19,25] to study environmental cognition.
The specific approach taken here was to derive research measures from the soft-
ware included in the Where Are We? [WAW?] map-skills curriculum developed by
Kastens [26]. This software links dynamic images of eye-level views of a park (video-
taped as someone walked through a real park) to a plan map of that park. The soft-
ware allows the user to control the walk through the park (and hence the sequence of
scenes shown on the video image) by clicking on arrows beneath the videotaped inset.
Arrows (straight, pointing left, pointing right) control whether the video inset shows
what would be seen if walking straight ahead, turning left, or turning right. As de-
scribed in more detail below, using WAW? exercises, we created mapping tasks in
which eye-level views of the terrain had to be linked to locations and orientations on
the map. Our goal was first, to explore whether the same kinds of spatial skills (if
any) would predict performance on the campus mapping and computer tasks, and
second, to examine whether performance on the campus and computer tasks was
highly related.

1.5 Summary

In summary, this research was designed to provide descriptive data on adults’ success
and their strategies in marking maps to indicate their locations in a relatively new
campus environment, to determine whether mapping performance or strategies would
vary across maps that differed with respect to viewing angle (plan vs. oblique) and
shape (square vs. round), to examine whether paper and pencil spatial tasks and par-
ticipant sex would predict success on the campus mapping task, to explore whether
similar person qualities would predict success on a computer mapping task, and to
determine whether performance on the field and computer mapping tasks would be
highly correlated.

2 Method
Students who were new to a large state university campus in the U.S. and were mem-
bers of the psychology department’s subject pool were recruited to participate in this
study. Sixty-nine students (50 women, 19 men; M [SD] age = 18.6 [1.4] years) par-
ticipated in session 1 for which they received course credit. Most participants (48)
took part in this first session within 6 weeks of their arrival on campus, and the
remainder did so within 10 weeks of arrival. Self-reported scores on the Scholastic
Aptitude Test (SAT) were provided by 44 participants: Ms (SDs) for verbal and quan-
titative scores, respectively, were 599 (75) and 623 (78). Participants’ race/ethnicity
reflected the subject pool which was almost entirely White.
Following completion of all session-1 testing, participants were invited to return
for session 2 for which they received either additional course credit or $10, as pre-
ferred. Of the initial group, 43 students (31 women, 12 men) returned.
Session 1 included the outdoor campus mapping activity and paper and pencil spa-
tial tasks; session 2 included the computer mapping tasks. All testing for session 1
was completed first to take advantage of better weather for outdoor testing, and to
minimize students’ familiarity with campus for the campus mapping task.

2.1 Campus Mapping Task

Participants were greeted in a small testing room in the psychology department where
they completed consent forms. They were then given a map of the room and asked to
place an arrow sticker on the map so that the point of the arrow would show exactly
where they were sitting in the room, and the direction of the arrow would show which
direction they were facing. They were told that the experimenter would be using a
stopwatch to keep track of how long the activities were taking, but to place the sticker
at a comfortable pace rather than attempt to rush. Participants implemented these di-
rections indoors without difficulty. Following this introduction to the procedure, they
were told that they would be doing something similar outside as they toured campus.
Participants were then led along a fixed route to five locations on campus. At each,
a laminated campus map was casually handed to participants (maps were intentionally
unaligned with the space), and participants were asked to place an arrow sticker on
the map to show their location and direction. (Because there was some experimenter
error in orienting participants at some locations, the directional data were compro-
mised and thus only those data depending on participant location are described here.)
Each participant was randomly assigned to use one of four different campus maps
described earlier. Both the oblique perspective map (the official campus map) and the
plan map were created by the university cartographers except that all labels were re-
moved. All maps were identical in size and scale: square sides and circle diameters
were 205 mm, representing approximately 965 m, thus at a scale of approximately
1:4,700. An illustrative map is shown in Fig. 1.
At each location, the experimenter recorded whether the participant turned the map
from its initial orientation, the time taken to place the sticker on the map (beginning
from when the map was handed to the participant), and the map orientation (in rela-
tion to the participant’s body) at the moment the sticker was placed. Participants did
not have a map as they were led from location to location, and experimenters chatted
with participants as they walked to reduce the likelihood that participants would focus
on their routes. After all test locations had been visited, the participants returned to
the lab where they were given the paper and pencil spatial tasks (described later). Par-
ticipants were asked to provide their scores on the SAT if they could remember them
and were willing to report them.

Fig. 1. Round oblique map. See text for information on map size and scale.
After the session was completed, each map with its sticker was scanned. Of the po-
tential 345 sticker placements (5 stickers for each of 69 participants), 3 stickers from
two participants’ maps became dislodged before the maps were scanned and thus full
data for the campus map task were available for 67 of the 69 participants. Sticker
placements were scored as correct if the tip of the arrow fell within a circle centered
on the correct location, with a radius of 6 mm (equivalent to approximately 28 m on
the ground).
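For concreteness, the scoring rule can be expressed as a small computation. The snippet below is purely illustrative (it is not part of the study's materials) and only restates the reported map scale of roughly 1:4,700 and the 6 mm criterion.

import math

MAP_SCALE = 4700      # 1 mm on the map represents roughly 4,700 mm on the ground
CRITERION_MM = 6      # radius on the map within which a placement counts as correct

def placement_correct(arrow_tip_mm, target_mm):
    # A sticker placement is scored correct if the arrow tip falls within 6 mm
    # of the target location on the map (about 28 m on the ground).
    return math.dist(arrow_tip_mm, target_mm) <= CRITERION_MM

# 6 mm * 4,700 = 28,200 mm, i.e., approximately 28 m on the ground.
print(CRITERION_MM * MAP_SCALE / 1000, "m")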

2.2 Computer Mapping Tasks

In session 2 we administered computer mapping tasks drawn from the WAW? curricu-
lum described earlier. One task was drawn from the activity called Are We There Yet?
In this activity, the participant is shown a starting position and facing direction on the
map, sees on a video inset what would be visible from that position, and is asked to
use the arrow keys to navigate to a target location. To ease the participant’s introduc-
tion to the software, the navigation task used here was the easiest one available in
WAW? The second activity was drawn from the WAW? activity called Lost! In this
activity, participants are dropped into the park in some unknown location (i.e., it is
not marked on the map), and are asked to discover where they are by traveling around
the park via arrow clicks that control which video images are seen. We gave partici-
pants two Lost! problems, the first at the easiest level of task difficulty and the second
at the most difficult. For all three tasks, we recorded whether or not the problem was
solved (i.e., whether the target location was found or whether the location was cor-
rectly identified), how many seconds and how many arrow clicks the participant used
within the maximum time allotted (8 minutes for each of the tasks).

2.3 Spatial Tasks

During session 1, participants were given paper and pencil tests to measure the three
spatial skills identified by Linn and Petersen [23]. A paper folding test (PFT) was
used to assess spatial visualization [27]. This task shows 20 sequences of between two
and four drawings in which a sheet of paper is folded one or more times and then a
hole is punched through the layers. Respondents are asked to select which of five
drawings shows the pattern of holes that would appear if the paper were then com-
pletely unfolded. Scores are the number marked correctly minus one-fourth the num-
ber marked incorrectly within the allowed time (here 2 minutes). The test of spatial
perception was the water level task (WLT) in which students are given drawings of
six tipped, straight-sided bottles and asked to draw a line in each to show where the
water would be if the bottle were about half full [28]. Lines drawn within 5° of hori-
zontal were scored as correct. Finally, mental rotation (MR) was assessed by a modi-
fied version of the Spatial Relations subtest of the Primary Mental Abilities (PMA)
battery [29]. Respondents are shown 21 simple line figures as models. Each model is
followed by five similar figures, and respondents are asked to circle any that show the
model rotated but not flipped over (i.e., not a mirror image). Scores are the number
correctly circled (2 per row) minus those incorrectly circled (up to 3 per row) within
the allotted time (here 2 minutes).
3 Results
The data are presented below in five sections. First, we offer descriptive data on the
performance on the campus mapping task. Second, we address the question of
whether performance or strategies on the campus mapping task differed as a function
of map type. Third, we address whether performance on the campus mapping task is
predicted by participant variables. Fourth, we address the same question for the com-
puter mapping task. Finally, we address the relation between performance on the
campus and computer mapping tasks.

3.1 Performance on the Campus Mapping Task

College students’ performance on the campus mapping task covered the full range,
with some placing none, and others placing all five stickers correctly, M (SD) = 2.2
(1.4). An even more telling index of performance variability is evident in Fig. 2 which
shows the locations of erroneous responses for one target location. It is striking not
only that many responses are distant from the correct location, but also that many
responses fail to show the correct kind of location.

Fig. 2. Erroneous sticker placements (40 black circles) for one target location (star). Omitted
are 12 stickers placed correctly and 17 stickers falling within the area defined by adjacent
buildings (striped region). Note that some errors were particularly egregious, as in stickers
placed in open fields or parking lots.

3.2 Campus Mapping Task and Map Variables

Accuracy of Sticker Placements. As explained initially, this research was also de-
signed to examine whether task performance would vary with map qualities of shape
and viewing angle. To examine this question, the total number correct served as the
dependent variable in a two-way analysis of variance (ANOVA) in which between-
subjects factors were map shape and map angle. Neither main effect was significant,
nor was their interaction. Means (SDs) for round versus square, respectively, were 2.2
(1.3) versus 2.3 (1.5); for plan versus oblique, 2.1 (1.4) versus 2.4 (1.4).

Speed of Sticker Placements. As a second means of examining the possible impact
of map variables on performance on the campus mapping task, we analyzed the time
participants took to place the arrow stickers on the map. A two-way ANOVA showed a sig-
nificant interaction between map shape and viewing angle, F(1,65) = 6.98, p = .010,
which qualified a main effect of viewing angle, F(1,65) = 7.52, p = .008. Specifically,
when the map was square, average response times were significantly longer on the
plan than the oblique map, Ms (SDs) in seconds, respectively: 38.7 (21.7) versus 19.1
(9.3), whereas when map shape was round, response times did not differ significantly
for the plan and oblique maps, 27.7 (11.5) versus 27.74 (14.6). (If all four map types
are entered as four levels of a map-type factor, the average response time was signifi-
cantly longer for the square plan map than for any other map type among which there
were no significant differences.) This pattern holds within individual items and irre-
spective of accuracy. That is, the reaction times for the square plan map are consis-
tently longer both among individuals who responded correctly and among those who
responded incorrectly on a particular item.

Map Turning. A third dependent measure examined in relation to map type was use
of a map-turning strategy. For this analysis, the dependent measure was the number of
locations (0-5) at which participants turned the map rather than leaving it in the orien-
tation in which they received it from the experimenter. A few participants never
turned the map or turned it only once (n=4); on average, the map was turned on 3.9
(1.3) items. An ANOVA on the number of turns revealed neither main effects nor
interactions with respect to map shape or viewing angle. Means (SDs) for round ver-
sus square, respectively were 3.9 (1.2) versus 4.0 (1.4); for plan versus oblique, 4.1
(1.2) versus 3.8 (1.4).

Map Orientation. The final behavior examined with respect to map type was how the
participant held the map (with respect to the participant’s own body) while placing the
sticker. Based on the sides of the square map, we defined as canonical the position shown
in Fig. 2 or its 90°, 180°, or 270° rotation. A 2 (map shape) x 2 (map angle) ANOVA on
the number of canonical orientations (0-5) revealed a significant main effect of map shape,
F(1,65)=5.35, p=.024. More canonical orientations were used by participants with square
than with circular maps, Ms (SDs), respectively, 4.0 (1.0) versus 3.3 (1.4).

3.3 Campus Mapping Task and Participant Variables

To provide descriptive data on the association between performance on the campus
mapping task and participant qualities, we first computed the correlation between the
number of stickers placed correctly on the campus mapping task and scores on each
of the three paper and pencil spatial tests. Correlations of sticker accuracy with mental
rotation (MR), spatial visualization (PFT), and spatial perception (WLT), respec-
tively, were r(67) = .048, p = .357; r(67) = .321, p = .004; and r(67) = .219, p = .038
(here and below, one-tailed tests were used given directional hypotheses). These cor-
relations reflect data from all participants in session 1, irrespective of whether they
were available for session 2. (An identical pattern of results holds if analyses are lim-
ited to the 43 participants who took part in both sessions.) As anticipated, perform-
ance on the three spatial measures was also correlated: MR with PFT, r(69) = .425, p
< .001; MR with WLT, r(68) = .410, p < .001, and PFT with WLT, r(68) = .253, p =
.019. (Again, identical patterns hold with the smaller sample as well.)
The number of correct sticker placements was then used as the criterion variable
for a regression analysis of the campus mapping task. A stepwise regression was per-
formed with the three spatial tests entered on the first step. We entered participant sex
on the second step to determine if there were any effects of sex above and beyond
those that could be attributed to possible spatial skill differences. Finally, on step
three we entered the strategy variable of the number of locations at which the partici-
pant turned the map.
At the first level of the model, all three predictors together accounted for 15% of the
variance, R2 = .15, F(3, 66) = 3.61, p = .018. Within this multiple regression, however,
only PFT predicted success (standardized β = .34, p = .010). At the second level of the
model, participant sex did not significantly increase the prediction, p-change = .56,
although PFT remained a significant predictor (standardized β = .34, p = .010) and the
overall model remained significant, R2 = .15, F(4, 66) = 2.76, p = .035. Finally, at the
third level of the model, the map-turning strategy significantly improved the prediction,
R2-change = .108, p-change = .004 (standardized β = .35, p = .004), and PFT remained a
significant predictor (standardized β = .27, p = .033). The final overall model was
R2 = .25, F(5, 66) = 6.59, p = .002.
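To make the blockwise structure of this analysis concrete, a minimal sketch of a comparable hierarchical regression in Python (statsmodels) follows. The data frame, column names, and synthetic data are hypothetical illustrations rather than the authors' data or analysis software; the R²-change F test is computed directly from the nested models.

import numpy as np
import pandas as pd
import statsmodels.api as sm

# Hypothetical per-participant data: spatial test scores, sex, map-turning count, and accuracy
rng = np.random.default_rng(0)
n = 70
df = pd.DataFrame({
    "MR": rng.normal(size=n), "PFT": rng.normal(size=n), "WLT": rng.normal(size=n),
    "sex": rng.integers(0, 2, size=n), "turns": rng.integers(0, 6, size=n).astype(float),
})
df["correct"] = 0.3 * df["PFT"] + 0.3 * df["turns"] + rng.normal(size=n)

def fit_block(predictors):
    # Ordinary least squares with an intercept for one block of predictors
    X = sm.add_constant(df[predictors])
    return sm.OLS(df["correct"], X).fit()

def f_change(smaller, larger):
    # F test for the increment in R^2 when moving from the smaller to the larger nested model
    df_num = larger.df_model - smaller.df_model
    df_den = larger.df_resid
    f = ((larger.rsquared - smaller.rsquared) / df_num) / ((1 - larger.rsquared) / df_den)
    return f, df_num, df_den

step1 = fit_block(["MR", "PFT", "WLT"])                   # step 1: spatial tests
step2 = fit_block(["MR", "PFT", "WLT", "sex"])            # step 2: add participant sex
step3 = fit_block(["MR", "PFT", "WLT", "sex", "turns"])   # step 3: add map-turning strategy
print(step1.rsquared, f_change(step1, step2), f_change(step2, step3))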

3.4 Computer Mapping Task and Participant Qualities

A composite measure of participants’ performance on the computer mapping tasks
was created by summing the number of WAW? tasks that were completed correctly
within the allotted amount of time. (Similar patterns of results were obtained with
time or the number of arrow clicks measures instead.) As in the campus mapping task,
we first computed the correlation between performance on the computer mapping task
with each of the three paper and pencil spatial tests. Correlations with mental rotation
(MR), spatial visualization (PFT), and spatial perception (WLT), respectively, were
r(43) = .495, p < .001; r(43) = .317, p = .019; and r(43) = -.009, p = .478. These cor-
relations necessarily reflect data from only those who participated in both session 1
and 2 (when WAW? data were collected).
The composite WAW? measure served as the outcome variable for a regression
parallel to the one described above, that is, with the spatial tests entered on step 1 and
participant sex on step 2 (although the map-turning strategy was not entered on step 3
because there was no corresponding opportunity for map rotation on the computer
mapping task). As was true in the regression analysis of the campus mapping task,
there was a significant effect of spatial measures at step 1, R2 = .30, F(3, 42) = 5.44,
p = .003, but again, participant sex at step 2 did not add significantly to the model
after spatial scores had been entered (p-change = .603). However, unlike the prior
regression, in this analysis it was MR (standardized β = .52, p = .003) rather than PFT
(standardized β = .12, p = .475) that predicted mapping performance on the computer
task.

3.5 Relating Performance on Campus and Computer Mapping Tasks

An additional goal of this research was to explore the possibility that the computer
mapping tasks drawn from WAW? might be a viable substitute for measuring success
on mapping tasks in the real, life-size environment. To evaluate this possibility, we
computed correlations between scores on the two tasks. Irrespective of which depend-
ent measure is used for the WAW? tasks (number completed, time in seconds, or
number of arrow clicks), there was no significant relation between scores on the cam-
pus and computer tasks. The highest correlation was between the number of correctly
placed stickers on the campus mapping task and the number of correctly completed
WAW? tasks, and even with a one-tailed test it did not reach marginal significance,
r(43) = .121, p = .22. Furthermore, what little trend toward an association there was
disappeared entirely when scores on the spatial tasks were statistically controlled:
partial r(39) = .005, p = .487.
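As a point of reference for the partial correlation just reported, the following is a minimal sketch, under the assumption that it is computed by residualizing both mapping scores on the three spatial tests; the function and variable names are hypothetical illustrations.

import numpy as np
import statsmodels.api as sm

def partial_corr(x, y, covariates):
    # Correlate the residuals of x and y after regressing each on the covariates
    Z = sm.add_constant(np.column_stack(covariates))
    resid_x = sm.OLS(x, Z).fit().resid
    resid_y = sm.OLS(y, Z).fit().resid
    return np.corrcoef(resid_x, resid_y)[0, 1]

# e.g., partial_corr(campus_correct, waw_correct, [mr, pft, wlt])  # hypothetical arrays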
As an additional means of examining the distinctions or comparability of the two
mapping tasks, we compared the patterns of association between success on each
mapping task and success on the paper and pencil spatial tasks. As is evident from
the findings described for each of the two mapping tasks taken individually, the re-
gression analyses showed different patterns for the campus and computer mapping
tasks. Particularly striking was the finding that MR score predicted performance on
the computer mapping task, but not performance on the campus mapping task. To
provide data bearing on the question of whether the associations differ in the two
tasks, we compared the sizes of the correlations between MR score and performance
on campus versus computer tasks. These correlations differed significantly,
t(40) = 1.73, p < .05. Neither of the other correlations (PFT or WLT) differed signifi-
cantly between the two mapping tasks.

4 Discussion
We begin our discussion by commenting on what the empirical data suggest about
how well adults can mark a map to show their location in a real, relatively newly en-
countered campus environment, addressing the question of whether performance dif-
fers in relation to the two manipulated map characteristics (viewing angle and map
shape). In the course of doing so, we comment on the appearance and distribution of
the map-related behaviors observed during the campus mapping task. We then discuss
findings from the regression analyses concerning which individual difference vari-
ables predict performance on the campus mapping task and performance on the com-
puter mapping task. Finally, we discuss implications of data concerning the relation
between performance on the two mapping tasks.

4.1 Performance and Strategies on the Campus Mapping Task and Their
Relation to Map Characteristics

The data from the campus mapping task offer a compelling demonstration that many
adults are challenged by the request to show their location on a map. The fact that
some participants were right at every one of the locations establishes that the task was
a solvable one. The fact that some participants were wrong at every one of the loca-
tions establishes that the task was not a trivial one. Furthermore, egregious errors (see
Fig. 2) suggest that some adults’ map-interpretation skills are particularly poor. Al-
though it is perhaps not surprising to see errors like these among preschool and ele-
mentary school children [20,30], it is surprising to see them among adults. Based on
participants’ comments and affective demeanor during testing, we have every reason
to believe that all were engaged by the task, and all were trying their best.
In addition to providing information on absolute levels of performance, the campus
mapping task was of interest as an avenue for testing the possible impact of the map
characteristics of map shape and viewing angle. One reason that we thought that map
characteristics might lead to different behaviors and different levels of accuracy was
because the different map characteristics might be differentially conducive to partici-
pants’ aligning the map with the space, and research with both adults and children had
shown better performance with aligned than unaligned maps [16,31,32]. The current
data, however, provided no evidence that map shape affected accuracy on the location
tasks or the number of items on which participants turned the map.
This was true even if we limited the comparison to the plan maps which – unlike the
oblique maps – did not imply a particular vantage point.
We had also hypothesized that oblique maps – in comparison to plan maps – might
elicit better performance insofar as they were more consonant with an embodied view,
that is, one more similar to that encountered by humans as they navigate through the
environment [19] and given that past research had shown advantages to oblique-
perspective representations for children [20,21]. Again, however, there were no sig-
nificant differences in accuracy or strategies in relation to map angle, either as a main
effect or in interaction with map shape.
Although there were no differences in accuracy in relation to map type, partici-
pants were significantly slower on the square plan map than on any other map type. In
addition, square maps were held in canonical positions in relation to participants’ bod-
ies significantly more often, implying that these maps were less often aligned with the
environmental space. Perhaps the extra time taken for the square plan maps reflects
additional time needed for mental rotation with unaligned maps. That the oblique ver-
sion did not require additional time suggests that participants may (like children) find
it easier to work with the oblique map, despite the fact that in most orientations, its
vantage point differs from the one experienced in the actual environment. The data do
not yet permit definitive conclusions about process, but they do permit the conclusion
that additional research on the effects of map characteristics is worthwhile.

4.2 Predictors of Success on Campus and Computer Mapping Tasks

As expected, the regression analyses showed that spatial skills significantly predicted
performance on both the campus mapping task and the computer mapping task. Sex
added no additional prediction in either task. Interestingly, the specific spatial skills
that predicted performance differed on the two tasks. For the campus mapping task, it
was the score on the paper folding task that was the significant predictor. Mental rota-
tion scores added nothing further to the prediction. The reverse held in the computer
mapping task. For this task, it was the score on the mental rotation task that predicted
task success, and other spatial scores did not add significantly to the prediction.
In the taxonomy offered by Linn and Petersen [23], the paper folding task falls
within the skill category labeled spatial visualization, which they describe as covering
tasks that involve multiple steps and that can be approached with visual strategies,
verbal strategies, or both. It is possi-
ble to think of the campus mapping task as one for which varied approaches would
indeed be viable. For example, someone might focus on landmark buildings, someone
else might focus on the geometric qualities of the streets, someone else might try to
figure out the direction walked from some earlier identified spot, some might try to
align the map and the space, and so on. In other words, this outdoor task – much like
normal map-based navigation – gives the map-user considerable freedom in structur-
ing the task.
That mental rotation mattered for performance on the computer mapping task is
also easily understood because in this task – unlike the campus mapping task – par-
ticipants had less control over the visual array and the map. That is, although partici-
pants controlled which video clip they saw (by selecting which of three arrows they
clicked at every choice point), they had no control of what was seen within the result-
ing video clip that was played. That is, once a video clip had been selected by an ar-
row click, participants saw whatever part of the park was recorded by the camera – at
the camera’s height, at the camera’s angle, at the camera’s azimuth, and at the cam-
era’s speed of rotation or translation. Furthermore, participants had no control over
the orientation of the map: the map of the videotaped park was always in a fixed posi-
tion, and thus, usually out of alignment with the depicted vista. It is thus not surpris-
ing that under these conditions, an ability to handle mental rotation was significantly
associated with performance.
An additional finding from the regression analysis on the campus mapping task
lends further support to the hypothesized importance of participants’ own actions for
success on the task. Specifically, as reported earlier, participants’ use of the map-
turning strategy added significant prediction to the score on the campus mapping task
even after spatial skills had been entered into the regression model. Aligning a map
with the referent space is an epistemic action, defined by Kirsh and Maglio as an
action in which an agent manipulates objects in the environment with the goal of ac-
quiring information [33]. As explicated by Kirsh and Maglio for the case of expert
Tetris players, epistemic actions serve the user by revealing otherwise inaccessible
information or by decreasing the cognitive load required to gain information. For ex-
ample, it is more time-efficient for Tetris players to rotate a polygon on the screen and
visually compare its shape with a candidate nesting place than to do the rotation and
comparison mentally. In our work, we have observed epistemic actions in a task in
which adults visited eight outcrops in a field site, and were asked to select which of
14 scale models best depicts the underlying geological structure [34]. As they strug-
gled to select the correct model, some participants rotated candidate models into
alignment with a map of the area, rotated candidate models into alignment with the
full-scale geological structure, placed two candidate models side by side to facilitate
comparison, and pushed rejected models out of the field of view. Like rotating a Tet-
ris shape or rotating a scale model of a geological structure, rotating a map into
alignment with the referent space decreases the cognitive load required to solve the
task at hand by substituting direct perception for mental rotation and mental compari-
son. Use of epistemic actions requires that the agent foresees, before the action is
taken, that the action will have epistemic value; such tactical foresight is separate
from the spatial skills measured by the paper and pencil tasks, in which the actions are
prescribed by the experimenter.

4.3 Computer Screens Are Not Real Environments

The regression findings just discussed provide one line of evidence that the computer
mapping task cannot be used as a substitute for the campus mapping task for studying
spatial cognition. That is, the finding that different spatial skills predict performance
on each of the two mapping tasks implies that the two tasks differ in important ways.
This conclusion is bolstered by two other findings, first, that there is a significant dif-
ference in the size of the correlation between MR and performance on the campus
mapping task versus the computer mapping task, and second, that the correlation be-
tween scores on the two mapping tasks is not significant. Taken together, these data
imply that it is important to continue to conduct mapping research – as well as map
skill education – in real, life-size environments.

5 Conclusions
The data from the present research bear upon adults’ success in using one of the most
common kinds of spatial representations of large environments – maps – as they ob-
serve the environment directly in the field or via another representational medium.
Our data show dramatic variability with respect to how well cognitively intact adults
(all of whom met the intellectual criteria needed for university admission) succeed in
indicating their locations on a map. Although some participants showed outstanding
performance, others made serious errors reminiscent of those made by young children
[20,32,35].
Our data also bear on questions about research in different kinds of spatial environ-
ments. The finding that different spatial skills predicted success on the campus versus
computer mapping tasks coupled with the finding that participants’ scores on the two
mapping tasks were not significantly correlated, lead to the conclusion that it is unwise
to substitute one task for the other. From the pragmatic perspective of conducting be-
havioral research in environmental cognition, this conclusion is perhaps disheartening. It
would ease research significantly if the answer were otherwise. From the perspective of
theoretical work on spatial cognition, however, the finding is more intriguing than dis-
heartening. The current findings contribute evidence to the growing conclusion that the
skills entailed in solving spatial problems in object or vista spaces do not entirely over-
lap with skills entailed in solving spatial problems in environmental spaces. Past re-
searchers have shown the importance of testing in real environments even for indoor,
built spaces (corridors and rooms) that are highly defined, homogeneous, and rectilinear
[18]. Our findings add to the evidence for the importance of testing in larger, more
varied, less clearly defined outdoor environments as well [36]. Outdoor environments
provide potential clues (e.g., a nearby building, a distant skyscraper, a river, the position
of the sun). But they also present potential challenges including barriers (that may ob-
struct otherwise useful landmarks), an absence of clear boundaries to define the borders
of the space (in contrast to the walls of a room), and vistas that may appear homoge-
neous to the untrained eye (e.g., desert vistas, dense forests, or acres of wheat fields as
far as the eye can see). A full understanding of human spatial cognition will thus re-
quire studying how people identify and use information that is available within a di-
verse range of environments.
Likewise, the findings from the research described here bear on the role of map
characteristics. Although our data do not yet permit firm conclusions about the way
that map qualities interact with environmental and person qualities, they do provide
strong support for the importance of systematically varying map qualities as we con-
tinue to explore the fascinating territory of spatial cognition.

Acknowledgments. Portions of this work were supported by National Science Foun-
dation (NSF) grants to Liben (RED95-54504; ESI 01-01758) and to Kastens (ESI-96-
17852; ESI 01-011086), although no endorsement by NSF is implied. We acknowl-
edge with thanks the contributions of past and current members of the Penn State
Cognitive & Social Development Lab, particularly Lisa Stevenson and Kelly Garner
who contributed in so many ways to this project.

References
1. Liben, L.S.: Environmental cognition through direct and representational experiences: A
life-span perspective. In: Gärling, T., Evans, G.W. (eds.) Environment, cognition, and ac-
tion, pp. 245–276. Oxford, New York (1991)
2. Downs, R.M., Liben, L.S.: Mediating the environment: Communicating, appropriating,
and developing graphic representations of place. In: Wozniak, R.H., Fischer, K. (eds.) De-
velopment in context: Acting and thinking in specific environments, pp. 155–181. Erl-
baum, Hillsdale (1993)
3. Harley, J.B., Woodward, D. (eds.): The history of cartography: Cartography in prehistoric,
ancient and Medieval Europe and the Mediterranean, vol. 1. University of Chicago Press,
Chicago (1987)
4. Stea, D., Blaut, J.M., Stephens, J.: Mapping as a cultural universal. In: Portugali, J. (ed.)
The construction of cognitive maps, pp. 345–360. Kluwer Academic Publishers, The
Netherlands (1996)
5. Uttal, D.H.: Seeing the big picture: Map use and the development of spatial cognition.
Dev. Sci. 3, 247–264 (2000)
6. MacEachren, A.M.: How maps work. Guilford, New York (1995)
7. Muehrcke, P., Muehrcke, J.O.: Map use: Reading, analysis, and interpretation, 4th edn. JP
Publications, Madison (1998)
8. Davies, C., Uttal, D.H.: Map use and the development of spatial cognition. In: Plumert,
J.M., Spencer, J.P. (eds.) The emerging spatial mind, pp. 219–247. Oxford, New York
(2007)
9. Liben, L.S.: Education for spatial thinking. In: Damon, W., Lerner, R.(series eds.) Ren-
ninger, K.A., Sigel, I.E. (vol. eds.) Handbook of child psychology: Child psychology in
practice, 6th edn., vol. 4, pp. 197–247. Wiley, Hoboken (2006)

10. National Research Council: Learning to think spatially: GIS as a support system in the K-
12 curriculum. National Academy Press, Washington (2006)
11. Evans, G.W.: Environmental cognition. Psy. Bull. 88, 259–287 (1980)
12. Gärling, T., Golledge, R.G.: Environmental perception and cognition. In: Zube, E.H.,
Moore, G.T. (eds.) Advances in environment, behavior and design, pp. 203–236. Plenum
Press, New York (1987)
13. Kitchin, R., Blades, M.: The cognition of geographic space. I.B. Tauris, London (2002)
14. Montello, D.R.: Scale and multiple psychologies of space. In: Campari, I., Frank, A.U.
(eds.) COSIT 1993. LNCS, vol. 716, pp. 312–321. Springer, Heidelberg (1993)
15. Montello, D.R., Golledge, R.G.: Scale and detail in the cognition of geographic informa-
tion. Report of the specialist meeting of Project Varenius, Santa Barbara, CA, May 14-16,
1998. University of California Press, Santa Barbara (1999)
16. Levine, M., Marchon, I., Hanley, G.: The placement and misplacement of You-Are-Here
maps. Env. and Beh. 16, 139–158 (1984)
17. Johnson, M.L.: The meaning of the body. In: Overton, W.F., Mueller, U., Newman, J.L.
(eds.) Body in mind, mind in body: Developmental perspectives on embodiment and con-
sciousness, pp. 191–224. Erlbaum, New York (2008)
18. Hegarty, M., Montello, D.R., Richardson, A.E., Ishikawa, T., Lovelace, K.: Spatial abili-
ties at different scales: Individual differences in aptitude-test performance and spatial-
layout learning. Intelligence 34, 151–176 (2006)
19. Liben, L.S.: The role of action in understanding and using environmental place representa-
tions. In: Rieser, J., Lockman, J., Nelson, C. (eds.) The Minnesota symposium on child de-
velopment, pp. 323–361. Erlbaum, Mahwah (2005)
20. Liben, L.S., Yekel, C.A.: Preschoolers’ understanding of plan and oblique maps: The role
of geometric and representational correspondence. Child Dev. 67, 2780–2796 (1996)
21. Plester, B., Richards, J., Blades, M., Spencer, C.: J. Env. Psy. 22, 29–47 (2002)
22. Allen, G.L., Kirasic, K.C., Dobson, S.H., Long, R.G., Beck, S.: Predicting environmental
learning from spatial abilities: An indirect route. Intelligence 22, 327–355 (1996)
23. Linn, M.C., Petersen, A.C.: Emergence and characterization of sex differences in spatial
ability: A meta-analysis. Child Dev. 56, 1479–1498 (1985)
24. Halpern, D.F.: Sex differences in cognitive abilities, 3rd edn. Erlbaum, Mahwah (2000)
25. Lawton, C.A., Morrin, K.A.: Gender differences in pointing accuracy in computer-
simulated 3D mazes. Sex Roles 40, 73–92 (1999)
26. Kastens, K.A.: Where Are We? Tom Snyder Productions, Watertown, MA (2000)
27. Ekstrom, R.B., French, J.W., Harman, H.H.: Manual for kit of factor-referenced cognitive
tests. Educational Testing Service, Princeton (1976)
28. Liben, L.S., Golbeck, S.L.: Sex differences in performance on Piagetian spatial tasks: Dif-
ferences in competence or performance? Child Dev. 51, 594–597 (1980)
29. Thurstone, T.G.: Primary mental abilities for grades 9-12. Science Research Associates,
Chicago (1962)
30. Kastens, K.A., Liben, L.S.: Eliciting self-explanations improves children’s performance on
a field-based map skills task. Cog. and Instr. 25, 45–74 (2007)
31. Bluestein, N., Acredolo, L.: Developmental changes in map-reading skills. Child Dev. 50,
691–697 (1979)
32. Liben, L.S., Downs, R.M.: Understanding person-space-map relations: Cartographic and
developmental perspectives. Dev. Psy. 29, 739–752 (1993)
33. Kirsh, D., Maglio, P.: On distinguishing epistemic from pragmatic action. Cog. Sci. 18,
513–549 (1994)

34. Kastens, K.A., Liben, L.S., Agrawal, S.: Epistemic actions in science education. In:
Freksa, C., Newcombe, N.S., Gärdenfors, P. (eds.) Spatial cognition VI. Springer, Heidelberg
(in press)
35. Liben, L.S., Kastens, K.A., Stevenson, L.M.: Real-world knowledge through real-world
maps: A developmental guide for navigating the educational terrain. Dev. Rev. 22, 267–
322 (2002)
36. Pick, H.L., Heinrichs, M.R., Montello, D.R., Smith, K., Sullivan, C.N., Thompson, W.B.:
Topographic map reading. In: Hancock, P.A., Flach, J., Caird, J.K., Vicente, K. (eds.) Lo-
cal applications of the ecological approach to human-machine systems, vol. 2, pp. 255–
285. Erlbaum, Hillsdale (1995)
Conflicting Cues from Vision and Touch Can Impair
Spatial Task Performance: Speculations on the Role of
Spatial Ability in Reconciling Frames of Reference

Madeleine Keehner

School of Psychology, University of Dundee, UK


M.M.Keehner@Dundee.ac.uk

Abstract. In “hand assisted” minimally invasive surgery, the surgeon inserts one
hand into the operative site. Despite surgeons' anecdotal claims that seeing their own
hand via the laparoscopic camera enhances spatial understanding, a previous study us-
ing a maze-drawing task in indirect viewing conditions found that seeing one’s
own hand sometimes helped and sometimes hurt performance (Keehner et al.,
2004). Here I present a new analysis exploring the mismatch between kinesthetic
cues (knowing where the hand is) and visual cues (seeing the hand in an orienta-
tion that is incongruent with this). Seeing one’s left hand as if from the right side
of egocentric space (palm view) impaired performance, and this depended on
spatial ability (r=-.54). Conversely, there was no relationship with spatial ability
when viewing the left hand from the left side of egocentric space (back view).
The view-specific nature of the confusion raises a possible role for spatial abili-
ties in reconciling spatial frames of reference.

Keywords: bimodal, cross-modal, visuotactile, frame of reference, mental rotation,
spatial ability, kinesthetic, proprioceptive, sensory cues, individual
differences.

1 Introduction

This paper presents a new analysis of data originally presented at the Human Factors
and Ergonomics Society annual conference [1]. The original motivation for the study
was to assess a specific anecdotal claim made by surgeons working under minimally
invasive conditions. In laparoscopic or “keyhole” surgery, a special technique is
sometimes employed in which one of the small incisions in the patient’s body is
slightly enlarged, and the surgeon’s non-preferred hand is inserted through this into
the operative site. Under these conditions, the surgeon's hand becomes visible on the
video monitor via the laparoscopic camera, and it can be guided and used like a surgi-
cal instrument. Surgeons anecdotally report that seeing their own hand on the video
monitor enhances their understanding of the spatial relations within the operative
space, in this otherwise spatially demanding domain.
This claim is intuitively plausible, and is consistent with prior literature on cross-
modal sensory integration in peripersonal space. However, previous studies in this
field have typically allowed participants to view their own hands directly, not via
video feedback, and in these studies the angle from which the hand is seen is usually
consistent with its actual orientation in space. By contrast, in minimally invasive
surgery the surgical camera, or laparoscope, is often placed in an orientation that is at
odds with the surgeon's own perspective, producing a view of the hand that is spa-
tially misaligned and incompatible with proprioceptive information.
In the 2004 paper, we showed that congruent and conflicting kinesthetic and visual
cues sometimes help and sometimes impair performance on a spatial task (maze
drawing in indirect viewing conditions). The present new analysis provides novel
insights into these effects. In this paper I show that the confusion caused by seeing
one’s own hand from an incongruent angle is viewpoint-specific. Moreover, the de-
gree of confusion and even whether any confusion occurs (relative to performance on
the same task without seeing the hand) depends strongly on individual differences in
spatial abilities. In this paper I discuss possible reasons for the viewpoint specific
nature of the confusion and speculate on how spatial abilities might function in recon-
ciling conflicting sensory cues relating to position and orientation in space.

1.1 The Relationship between Vision and Touch

Previous studies with humans and primates have shown that the senses of vision and
touch have a special relationship. Graziano and Gross have identified bimodal neu-
rons that respond only when the information received through vision and touch corre-
sponds [2]. These specialized visuo-tactile neurons fire when the hand reaches
towards an object or location in reachable space that can simultaneously be seen.
When the hand or its target location is unseen, these cells do not respond. This find-
ing suggests that highly dexterous higher primates, including humans, have developed
specialized bimodal connections between vision and touch, evolved for exploring the
world with seen hands.
The fundamental nature of the relationship between vision and touch is demonstrated
neatly by the crossmodal congruency effect. Driver and colleagues have shown that
visual cues, such as LEDs attached to the surface of the hands, can enhance speed of
responses to tactile stimuli [3]. This inter-modal facilitatory effect demonstrates the
rapid and automatic crosstalk between the two senses, such that spatial cues presented in
one modality can speed reactions to spatial cues presented in the other modality. Impor-
tantly, this effect follows the hand when it moves in space, such as when the hands are
crossed in front of the body, demonstrating that these cross-modal sensory cues are
coded in limb centered or body centered spatial coordinates.

1.2 Representation of the Hands in the Body Schema

It is well established that the somatosensory cortex of the brain represents the mo-
ment-by-moment positions and orientations of body parts as we move our limbs,
trunk, and head in space and in relation to each other [4]. The somatosensory cortex
receives proprioceptive feedback from muscles, joints, and tendons, and combines
these in a representation of the body's configuration and the relationships among dif-
ferent body parts or effectors. This “felt” position and orientation of body parts makes
up our internal representation or “body schema”.

Although this internal representation is being constantly updated by proprioceptive
information as we move our bodies in space, studies involving mental rotation of
hands have shown that there is a “default” representation of hand orientation within
the body schema. Sekiyama has shown that the fastest responses in imagined hand
rotation tasks are generally to stimuli showing a back view of the hand [5]. This find-
ing suggests the default or baseline representation of hand orientation in the body
schema is the equivalent of having the hand in front of the body with the back of the
hand in view, and it indicates that motor imagery processes start from this orientation.
Interestingly, there is evidence that this default positional representation develops
over time. Funk and colleagues had participants perform a mental rotation task using
hand images while they held their hands in either palm up or palm down orientations
[6]. In children, the physical orientation of the hands in space affected speed of re-
sponse: when the actual orientation of their hands matched the target orientation they
were fastest, whereas when the two were incongruent they took longer. This suggests
that the children started the mental rotation process from an internal representation
that matched the physical orientation of their hands in space. By contrast, adults were
consistently faster on trials where a back view was presented, regardless of the physi-
cal orientation of their hands in space. This suggests that by adulthood the default
internal representation of hand orientation (the back view) defines the starting orienta-
tion of our imagined hands in the body schema, and is more influential in this kind of
motor imagery task than the actual orientation of our hands in space.

1.3 Body Schema Can Be Influenced by Visual Experience

Although these studies demonstrate that by adulthood the body schema is well-
developed in the sensorimotor cortex, a number of ingenious experiments using fake
hands and visual prisms have shown that what we see can influence our internal rep-
resentation of limb position. Sekiyama had participants wear a visual prism that re-
versed left and right. This produced a conflict between vision and touch, such that the
participant’s right hand looked like their left hand when viewed through the prism.
After adaptation the internal representation of the hand had changed in a way that
brought it into line with the visual information [7]. This finding demonstrates that
visual experience can dramatically affect the body schema representation. Indeed, it
has been argued that visual experience may be a key mechanism by which we acquire
our default representation of hand orientation in the body schema by the time we
reach adulthood, since the back of the hand is the most frequently seen orientation of
our own hands as we grasp and manipulate everyday objects [6].
The crossmodal congruency effect described above [3], in which cues from one
modality (vision) can speed attention in another modality (touch), has been shown to
occur even when the seen “hand” is not the participant’s own. The effect has been
demonstrated with fake hands, and occurs even though the participant’s own hand is
displaced somewhat from the location of the fake hand, such as being underneath the
table on which the fake hand is placed. Studies have shown that one of the most im-
portant factors for producing these illusions with false hands is temporal alignment. If
the participant feels a touch at precisely the same moment as they see a fake hand or
rubber glove being touched they can experience an illusion whereby they are con-
vinced that it is their own hand that they are seeing [10]. Thus, a perfect match in
terms of timing between what is seen and what is felt seems to be critical in aligning
information received through vision and touch. However, this effect is disrupted
when the discrepancy between the orientations of the fake hand and the participant’s
own hand become too great, such as when the fake hand is rotated ninety degrees
relative to the participant’s hand [8, 9].

1.4 Spatial Coding and the Parietal Cortex

Our apparently unitary representation of body position in space is generated when the
information from all of our sensory modalities is integrated in the brain. Human and
monkey studies have shown that this occurs in the posterior parietal cortex, specifi-
cally within the intraparietal sulcus (IPS). Areas within monkey IPS are critical for
integrating information acquired through vision and touch, and are active in control-
ling reaching and pointing movements in space. Homologous regions exist in human
IPS, and in both species this area appears to play a critical role in creating a represen-
tation of the space of our bodies and the space around our bodies, with particular
importance in tasks that involve movement of the hands guided by vision [11].
In an ingenious study, monkeys were trained to retrieve food rewards from beneath
a glass plate that could be turned clear or opaque at the flick of a switch. After train-
ing, neurons in the ventral intraparietal sulcus, which had previously been responding
only to proprioceptive information, showed visual responses, indicating that they had
become bimodal through the process of associating visual and proprioceptive infor-
mation. The visual receptive fields persisted even when the view of the arm was
obscured, leading the authors to argue that these intraparietal bimodal neurons allow
the updating of body images even when the limb is unseen [12].
Sekiyama argues that of all the brain regions containing bimodal neurons, the IPS
is perhaps the most important for our internal representation of the body in space [13].
Graziano and colleagues have shown that neurons in parietal area 5 respond to the
sight of a fake arm, and furthermore that these neurons can distinguish between plau-
sible and implausible arm orientations and even between a left hand and a right hand
by sight alone (e.g., the neurons did not respond when a fake right arm was placed in
the same position and orientation as the monkey’s own left arm) [14]. Sekiyama
argues that bimodal neurons in this region (unlike other sensorimotor regions) inte-
grate visual and proprioceptive cues when the visual information matches the internal
body schema representation, and therefore the parietal cortex and specifically the IPS
contains the highest level of representation of the body in space [13].
Thus, the parietal lobe plays a critical role in integrating the many different forms
of sensory information that we receive into a high-level, overarching representation of
our own body in space. From many different sensory inputs (e.g., head-based, trunk-
based, arm-based, and retinocentric frames of reference), the parietal lobe generates a
global egocentric frame of reference and a unified internal sense of our position and
orientation in space [15, 16].

1.5 The Parietal Lobe and Adaptability of Spatial Frames of Reference

Despite the obvious stability of this representation over time, Sekiyama argues that
the body schema is somewhat adaptable [13]. Studies from a range of domains indi-
cate that these adaptations occur in the high-level representation of the IPS. It appears
that the bimodal IPS neurons can take account of changes to information from multi-
ple senses and as a result can alter the internal representation of the body.
Such modifications to the body schema are seen in prism adaptation studies, as dis-
cussed earlier, in which bimodal neurons of the IPS alter the way that they code the
relationship between vision and touch to recalibrate discrepant sensory information
caused by wearing prisms [7]. Similar modifications to the body schema are evident
in studies involving tool use in monkeys. Research has shown that changes to spatial
coding of the limbs result from extensive experience of using long tools or instru-
ments, such that the tips of the tools become coded in the same way as the tips of the
limbs [17]. As with prism adaptation, this recalibration of the spatial extent of the
limb is reflected in changes at the neural level [18]. Essentially these kinds of flexible
processes allow the system to alter the way that different spatial frames of reference
operate together, in order to maintain a coherent sense of space and position.
Perhaps it is this capacity to adapt to new information, both real and imagined, that
allows us to perform everyday spatial tasks. The computational processes involved in
encoding, maintaining, and manipulating spatial information include the kinds of
spatial transformation processes that psychologists study and that define what we call
spatial ability [19]. Standardized tests of spatial ability, as well as everyday operations
in the real world, include tasks such as imagining how an object would change if we
picked it up and turned it (mental rotation) or imagining how the world would look,
and the consequences for our intended actions, if we moved to a different location or
orientation in space (perspective shifting). What all of these processes have in com-
mon is the requirement to represent, manipulate, update, and reconcile different spa-
tial frames of reference. These flexible processes are among the key determinants of
spatial ability, and therefore individuals with better spatial abilities should be better
able to reconcile sensory cues that represent conflicting frames of reference. This is
the central hypothesis in the present analysis.

1.6 The Set-Up in Our Study and in Typical Hand Assisted Surgery

In typical minimally invasive surgery conditions, the surgeon has no direct view of
the operative site, but must instead depend on a 2-D image from the laparoscopic
camera presented on a monitor. This image lacks binocular depth cues, and further-
more it is quite common for the laparoscope to be inserted into the patient at an angle
that differs from the orientation of the surgeon. This means that the viewpoint from
which the camera captures the operative site is inconsistent with the surgeon's per-
spective, and presumably some kind of spatial transformation, such as mental rotation
or perspective shifting, must be performed in order to align the two. In extreme cases,
the laparoscope may be inserted through a port in the patient's body that produces a
view of the operative site that is up to 180° discrepant from the surgeon's perspective.
Ideally, the surgeon seeks to minimize the discrepancy between their view and that of
the camera, but this is by no means always possible and in any case the angle of the
laparoscope is often altered multiple times during a procedure in order to provide
unobstructed views of particular structures or to allow a particular instrument to be
inserted through a specific port.
In traditional minimally invasive surgery, the surgeon has no direct contact with
the operative site using his or her hands. However, in hand assisted methods, one
hand is inserted into the operative site through a slightly enlarged port in the patient's
body. This allows the surgeon to use one hand like a very dexterous instrument, and
it also makes the hand visible on the monitor via the laparoscopic camera. These are
the conditions that we replicated in our original study. The hand was either inserted
into the task space, and thus it appeared on the monitor, or it was not present in the
task space and was therefore not visible via the camera. It was not allowed to inter-
fere with the task at all, so that any effect of seeing the hand in view of the camera
was due to its presence alone, and not to any benefits that might result from using it to
help with the spatial task.
In the original paper, we found that both camera angle and spatial ability had main
effects on performance. We also found that having the hand in view was helpful,
relative to performing the task without the hand in view, for all participants when the
camera was inserted from the left side of the workspace. Contrary to this, we found
unexpectedly that when the camera was inserted from the right side of the workspace,
having the hand in view impaired performance for lower spatial participants only [1].
In what follows, I explore these effects further, and attempt to establish whether
there is some qualitative difference in the effects of seeing the hand in view of the
camera that depends on how the hand looks. I also examine whether and under which
circumstances these effects depend on the spatial abilities of the individual partici-
pant. Given the preceding discussion of the importance of spatial ability for reconcil-
ing different frames of reference, and the fundamental connection between what we
see and what we feel, I predict that spatial ability may be especially important for
reconciling incongruent visual and kinesthetic cues and for adapting to inconsisten-
cies between these two sources of information.

2 Method

2.1 Participants

Forty right-handed paid volunteers (18 males) were recruited from the UC Berkeley
undergraduate population, mean age 20.1 years (SD 2.3 years).

2.2 Apparatus

The apparatus was constructed to mimic laparoscopic conditions (see Figure 1). The
participant’s view of the workspace was provided by a laparoscope, with the image
presented on a monitor at head height. A shield prevented direct view of the work-
space. A permanent marker pen was attached to the end of a laparoscopic instrument,
whose movements were constrained by a fulcrum (analogous to the point of entry
through the abdominal wall). The instrument was offset -45º (+315º) in azimuth.
Holding the instrument handle with their right hand, participants used the monitor
image to guide the pen tip around a star-shaped maze mounted on a platform.

2.3 Design

The independent variables were camera angle and spatial ability. The dependent
variable was error difference (number of errors with hand in view minus number of
errors without hand in view). Reported here are two conditions (camera angles 90º
and 270º) that were common across three separate experiments. Amalgamating the
experiments resulted in a mixed design, in which some participants participated in
both camera angle conditions, while others participated in either 90º or 270º but not
both. The methodologies of the experiments were identical in all essential design
features (instructions and practice trials, apparatus, procedure, counterbalancing of
conditions and trials, total number of conditions and trials).

Fig. 1. Experimental setup. The participant’s view of the maze and the instrument tip was
obscured by the shield and they completed the maze drawing task using only the image from
the laparoscope, which was positioned either at 90º (left side) or at 270º (right side). On half of
the trials, the participant’s left hand was visible in the monitor image.

2.4 Procedure

The laparoscopic camera was secured at one of two positions (offset in azimuth by
90º or 270º; see Figure 1). On half of the trials, participants were instructed to hold
the maze platform so that their left hand appeared on the monitor (the hand did not
interfere or help with the task). Participants completed one practice trial at each angle
(using a different maze), followed by four experimental trials, two with the hand in
view and two without (order ABBA/BAAB). Instructions were given to complete the
star mazes as quickly as possible but with as few errors as possible.
The order of conditions was counterbalanced using a Latin square design. Spatial
visualization ability was assessed using three paper-and-pencil tests: the Mental Rota-
tions Test [20], the Paper Folding Test [21] and the Card Rotations Test [21]. These
tests correlated positively (r = .58 to .63), so standardized scores (z-scores) were cal-
culated for the three tests and they were averaged to produce an aggregate measure of

spatial ability. A median split was performed on the aggregate measure to create two
groups, defined as high and low spatial ability.
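The aggregation and grouping step is simple enough to show in a few lines; the following Python sketch uses hypothetical raw scores and column names for the three tests and is not drawn from the study's actual data.

import pandas as pd

# Hypothetical raw scores on the three spatial tests (one row per participant)
scores = pd.DataFrame({
    "MRT": [22, 15, 30, 9],    # Mental Rotations Test
    "PFT": [12, 8, 16, 5],     # Paper Folding Test
    "CRT": [90, 70, 110, 55],  # Card Rotations Test
})
z = (scores - scores.mean()) / scores.std()       # standardize each test (z-scores)
aggregate = z.mean(axis=1)                        # average z-scores into one aggregate measure
high_spatial = aggregate > aggregate.median()     # median split: True = high spatial group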
Errors were scored blind after the task was completed by a manual frequency
count. Using the ink trace, one error was allocated for every time the pen crossed the
outer border of the maze.

3 Results
Previously, we reported main effects on performance of camera angle, hand position,
spatial abilities, and the interactions among these variables. In the present analysis a
new variable was created to establish whether performance was affected positively or
negatively by having the hand in view. In this analysis, performance without the hand
in view was used as the baseline and the positive or negative effects of seeing the
hand in the monitor were assessed against this. This variable was generated by sub-
tracting the number of errors made without the hand in view from the number of er-
rors made with the hand in view, in each of the two conditions. Thus, a negative error
difference indicates that seeing one’s own hand helped performance, whereas a posi-
tive error difference indicates that seeing one’s own hand impaired performance. This
new variable makes it possible to isolate the effect of seeing the hand in the camera
view, and to determine whether the effect is negative or positive, relative to not seeing
the hand. In all analyses the variables met assumptions of normality.

Fig. 2. Difference in errors with hand in view versus without hand in view, under the two view-
ing orientations, split by high and low spatial participants (median split of aggregate ability
measure). Error bars represent +/- 1 SEM.
Figure 2 represents this difference in errors with the hand in view versus without
the hand in view under the two viewing orientations, split by high and low spatial
participants (median split of aggregate spatial ability measure). This plot indicates
that qualitatively different patterns of errors occurred when the camera was positioned
to show the back view of the hand versus when it showed the palm view of the hand.
When the back view was visible (90º), seeing the hand improved performance for all
participants. An independent samples t-test showed that this effect did not differ for
higher and lower spatial participants, t(25) = .67, p = .51, n/s. By contrast, when the
palm view was visible (270º), seeing the hand impaired performance for lower spatial
participants (more errors) but it somewhat helped performance for higher spatial par-
ticipants (fewer errors), and this difference between higher and lower spatial partici-
pants was significant, t(21) = -2.93, p = .008.
Four separate one-sample t-tests with alpha adjusted for multiple comparisons
tested these effects against zero. This analysis showed that all of the error differences
except one were significantly different from zero (t = -3.77 to 3.98, p = .003 to .002,
in all significant cases). Thus, in the 90º condition, seeing the hand was significantly
beneficial to both low and high spatial participants. By contrast, in the 270º condition,
seeing the hand significantly impaired low spatial participants, whereas it did not
significantly affect high spatial participants, either positively or negatively.
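A minimal sketch of one of these one-sample tests follows; scipy is assumed, the error-difference values are hypothetical, and the Bonferroni adjustment shown is one common way of adjusting alpha (the paper does not specify which correction was used).

import numpy as np
from scipy import stats

# Hypothetical error-difference scores (with hand minus without hand) for one group and condition
error_diff = np.array([-4, -7, -2, -5, -6, -3])

t, p = stats.ttest_1samp(error_diff, popmean=0.0)   # test the mean error difference against zero
alpha = 0.05 / 4                                    # e.g., Bonferroni adjustment for four comparisons
print(t, p, p < alpha)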

Fig. 3. Relationship between spatial ability and error difference (hand versus no hand) in the
two viewing conditions (left panel: 90º; right panel: 270º). Points below the dotted line indicate
performance that was better with the hand in view than without, and points above the dotted
line indicate performance that was poorer with the hand in view than without. The solid line
represents the best-fit regression line.

These patterns were explored further using correlational analyses. Figure 3 shows
the relationships between spatial ability and error difference (hand minus no hand) in
the two viewing conditions. The dotted line indicates the level of errors at
which there was no effect, either positive or negative, of seeing the hand relative to
not seeing the hand. Points below this line indicate that seeing the hand helped per-
formance, whereas points above this line indicate that seeing the hand hurt perform-
ance, relative to not seeing the hand. The solid line is the best-fit regression line.

Figure 3 indicates that there was no systematic relationship between spatial ability
and error difference in the 90º view condition (back view), r = -.007, p = .97, n/s. By
contrast, in the 270º condition, Figure 3 shows a clear linear relationship between
spatial ability and error difference, indicating that as spatial ability decreases, the
detrimental effect of seeing the hand increases. This correlation was significant, r = -
.54, p = .008.

4 Discussion
This new analysis of the data from these experiments reveals that qualitatively differ-
ent effects occurred depending on the view of the hand that was available to the
participant in each trial. In the 90° view condition, when the back of the hand was
visible, all participants benefited from seeing their own hand in the monitor. Further-
more, in this condition there was no significant correlation with individual differences
in spatial ability. By contrast, the effects in the 270º condition, in which the palm
view of the hand was visible, were quite different. In this condition, low spatial par-
ticipants were significantly impaired when they saw their own hand in the monitor,
while high spatial participants did not experience any significant benefit or detriment.
Moreover, there was a strong correlation between spatial ability and the effect of
seeing the hand. In this condition, individual differences in spatial ability strongly
predicted whether an individual became confused by the sight of their own hand.
If we compare overall performance, it is clear from Figure 3 that more benefit is
gained overall from seeing the back view of the hand (90° condition): more of
the data points fall below the dotted line, indicating that people are better off with the
sight of the hand than without it. By contrast, in the 270° condition around half of all
participants do worse when they see their own hand than when they do not (data
points above the dotted line), and these are primarily lower spatial individuals.
What is responsible for the enhancement of performance in the 90° condition and
the apparent confusion in the 270° condition, caused by seeing one's own hand? It
would appear that the view of the hand in the 90° condition is sufficiently aligned with
how the hand feels (its internal representation in the body schema) that it does not
cause confusion. In terms of previous research with monkeys, this might be analo-
gous to the responses that occur in bimodal neurons only when the visual and
kinesthetic information are sufficiently compatible [2, 8]. Perhaps this harmonious
“visuotactile” representation of the hand in space is what helps the participant better
understand the spatial relations of the task when the hand appears in the camera image
compared to when it is not present.
By contrast, the view of the hand in the 270° condition does not allow the partici-
pant to compensate for the camera misalignment. In fact, for individuals with poor
spatial abilities it caused more confusion than when the hand was not visible at all.
This suggests that how the hand looks in this condition is fundamentally at odds with
how it feels. In other words, it is not possible to reconcile this view of the hand with
the internal representation of the hand’s position in the body schema. In this sense,
this condition seems analogous to previous studies where false hands were placed in
orientations that were too incongruent with the “felt” hand position to allow the illu-
sion of unity to prevail [8].

Figure 4 shows how the hand looks from these two camera orientations. There are
at least two possible reasons why the palm view of the hand should be so difficult to
reconcile with the internal representation, compared to the back view. One possibility
is that because the default representation of the hand in the body schema has been
shown to be the back view [5], the 90° view can be more readily integrated
with it, whereas the 270° palm view causes too much of a conflict. Another possi-
bility is that the posture of the hand, and especially the angle of the wrist, in the
270° view is biomechanically awkward for the left hand to adopt, and
therefore it is difficult to perceive the seen hand as one’s own left hand. In fact, it is
almost impossible to move the left hand in such a way as to produce this view of it
under normal circumstances, whereas it is relatively easy to orient the left hand in
such a way as to produce a view similar to that in the 90° viewing condition. This
account is consistent with previous research on mental rotation of hands, which has
shown that motor imagery (imagined movements of body parts) is subject to the same
biomechanical constraints as real movements of the body in space [22]. These two
accounts are not mutually exclusive. Indeed, given that extended visual experience of
seeing the hands in particular orientations can influence the internal body schema
[6, 7] it seems plausible that they might, if anything, be mutually reinforcing.

Fig. 4. View of the hand from the 90º camera orientation (left) and the 270º camera orientation
(right)

Why is spatial ability so important in the 270° view condition? If we assume that
the confusion in this condition arises from difficulties with reconciling conflict be-
tween two incompatible frames of reference (visual and kinesthetic), this gives us an
interesting insight into what kinds of abilities psychometric spatial tests such as the
ones we used may be tapping. Perhaps one of the key components of cognitive spa-
tial abilities is the ability to represent, manipulate, update, and reconcile different
spatial frames of reference. It has been claimed that all spatial manipulation tasks
essentially involve manipulating relations among three spatial frames of reference:
egocentric, object-centered, and environmental [23]. For example, mental rotation
tests require the test taker to transform the orientation of an object around its intrinsic
axes and then update the transformed object-centered frame of reference in relation to
stable reference frames of the environment and the self (egocentric). Paper folding
tests require the test taker to manipulate internal parts of an object with respect to the

object’s overall frame of reference. Tests of spatial orientation, which involve egocen-
tric perspective shifts, require the test taker to transform and update their own egocen-
tric reference frame with respect to stable environmental and object-centered frames
of reference. Thus, it is possible that individuals who performed poorly on the psy-
chometric spatial ability tests that we administered were generally poor at such proc-
esses, and therefore also had particular difficulty reconciling the conflict between the
visual and kinesthetic frames of reference in the 270° condition.
Although somewhat speculative, this interpretation is consistent with what we
know about brain regions involved in integrating multiple frames of reference. Spatial
information from many different sensory sources is integrated in posterior parietal
cortex into a coherent whole [15, 16]. It has also been shown that spatial transforma-
tion tasks such as mental rotation involve these same parietal regions [24-28], and
moreover, individual differences in parietal activation have been shown to correlate
with individual differences in spatial abilities [29]. Thus, it may be that an essential
function of this region is to encode, represent, manipulate, and reconcile different
spatial frames of reference.
While more research is needed to demonstrate that these effects are replicable
within a single experiment and with a larger sample, the present analysis does raise
some interesting potential avenues to pursue. Future studies could establish the pa-
rameters of congruent versus conflicting visual and kinesthetic cues. For example, is
there a degree of rotation of the image of the hand at which the information changes
from being primarily helpful to primarily hurtful in these kinds of tasks (at least for
lower spatial individuals)? Another interesting future question is whether extended
visual experience of the hand in apparently incongruous orientations can overcome
confusion such as that observed in the 270º palm-view condition. Could this view of
the hand eventually become integrated with the body schema representation, such as
occurs in prism adaptation studies [7], and consequently help in spatial reasoning
tasks such as these, even for individuals with poorer spatial abilities?
If replicable, the implications of these findings for hand-assisted minimally inva-
sive surgery are clear. In previous studies we have found that laparoscopic surgeons
span the same wide range of spatial abilities as the general population [30], be-
cause the domain of medicine does not pre-select for these abilities. Therefore it is
likely that surgeons using these methods will be subject to the same effects that were
evident here (at least in the beginning of their laparoscopic experience - we do not
know about possible effects of extended experience with these methods). Thus, while
in some conditions seeing the hand in the operative view may be helpful, as surgeons
claim, in other circumstances it may actually impair their understanding of the spatial
relations of the operative space. Knowing how to avoid these conditions with judi-
cious laparoscope placement might be an important applied outcome of this line of
research.
Finally, these findings shed light on the interface between vision and touch and the
multimodal nature of our apparently unitary internal representation of the space
around us. They also highlight the importance of individual differences. The data
suggest that spatial ability is a key variable, and should be included in theoretical
accounts of how, and how well, people generate, maintain, and manipulate their men-
tal representations of space.

References
1. Keehner, M., Wong, D., Tendick, F.: Effects of viewing angle, spatial ability, and sight of
own hand on accuracy of movements performed under simulated laparoscopic conditions.
In: Proceedings of the Human Factors and Ergonomics Society’s 48th Annual Meeting, pp.
1695–1699 (2004)
2. Graziano, M.S.A., Gross, C.G.: A bimodal map of space - somatosensory receptive-fields
in the macaque putamen with corresponding visual receptive-fields. Experimental Brain
Research 97(1), 96–109 (1993)
3. Driver, J., Spence, C.: Attention and the crossmodal construction of space. Trends in Cog-
nitive Sciences 2, 254–262 (1998)
4. Penfield, W., Rasmussen, T.L.: The cerebral cortex of man. MacMillan, New York (1955)
5. Sekiyama, K.: Kinesthetic aspects of mental representations in the identification of left and
right hands. Perception and Psychophysics 32, 89–95 (1982)
6. Funk, M., Brugger, P., Wilkening, F.: Motor processes in children’s imagery: the case of
mental rotation of hands. Developmental Science 8(5), 402–408 (2005)
7. Sekiyama, K., et al.: Body image as a visuomotor transformation device revealed in adap-
tation to reversed vision. Nature 407, 374–377 (2000)
8. Graziano, M.S.A.: Where is my arm? Proceedings of the National Academy of Sci-
ences 96, 10418–10421 (1999)
9. Maravita, A., Spence, C., Driver, J.: Multisensory integration and the body schema: Close
to hand and within reach. Current Biology 13, R531–R539 (2003)
10. Pavani, F., Spence, C., Driver, J.: Visual capture of touch: Out-of-the-body experiences
with rubber gloves. Psychological Science 11(5), 353–359 (2000)
11. Grefkes, C., Fink, G.R.: The functional organization of the intraparietal sulcus in humans
and monkeys. Journal of Anatomy 207, 3–17 (2005)
12. Obayashi, S., Tanaka, M., Iriki, A.: Subjective image of invisible hand coded by monkey
intraparietal neurons. Neuroreport. 11(16), 3499–3505 (2000)
13. Sekiyama, K.: Dynamic spatial cognition: Components, functions, and modifiability of
body schema. Japanese Psychological Research 48(3), 141–157 (2006)
14. Graziano, M.S.A., Cooke, D.F., Taylor, C.S.R.: Coding the location of the arm by sight.
Science 290, 1782–1786 (2000)
15. Cohen, Y.E., Andersen, R.A.: A common reference frame for movement plans in the pos-
terior parietal cortex. Nature Reviews Neuroscience 3, 553–562 (2002)
16. Colby, C.L.: Action-oriented spatial reference frames in cortex. Neuron. 20, 15–24 (1998)
17. Maravita, A., et al.: Tool-use changes multimodal spatial interactions between vision and
touch in normal humans. Cognition 83, B25–B34 (2002)
18. Iriki, A., Tanaka, M., Iwamura, Y.: Coding of modified body schema during tool use by
macaque postcentral neurones. NeuroReport 7(14), 2325–2330 (1996)
19. Hegarty, M., Waller, D.: Individual differences in spatial abilities. In: Miyake, A., Shah, P.
(eds.) The Cambridge handbook of visuospatial thinking. Cambridge University Press,
Cambridge (2005)
20. Vandenberg, S.G., Kuse, A.R.: Mental rotations, a group test of three-dimensional spatial
visualization. Perceptual & Motor Skills 47, 599–604 (1978)
21. Ekstrom, R.B., et al.: Manual for kit of factor-referenced cognitive tests. Educational Test-
ing Service, Princeton (1976)
22. Parsons, L.M.: Imagined spatial transformations of one’s hands and feet. Cognitive Psy-
chology 19, 178–241 (1987)

23. Zacks, J.M., Michelon, P.: Transformations of visuospatial images. Behavioral and Cogni-
tive Neuroscience Reviews 4(2), 96–118 (2005)
24. Zacks, J.M., Vettel, J.M., Michelon, P.: Imagined viewer and object rotations dissociated
with event-related fMRI. Journal of Cognitive Neuroscience 15(7), 1002–1018 (2003)
25. Carpenter, P.A., et al.: Graded functional activation in the visuospatial system with amount
of task demand. Journal of Cognitive Neuroscience 11(1), 9–24 (1999)
26. Harris, I.M., et al.: Selective right parietal lobe activation during mental rotation.
Brain 123, 65–73 (2000)
27. Podzebenko, K., Egan, G.F., Watson, J.D.G.: Widespread dorsal stream activation during a
parametric mental rotation task, revealed with functional magnetic resonance imaging.
Neuroimage 15, 547–558 (2002)
28. Keehner, M., et al.: Modulation of neural activity by angle of rotation during imagined
spatial transformations. Neuroimage 33, 391–398 (2006)
29. Lamm, C., et al.: Differences in the ability to process a visuo-spatial task are reflected in
event-related slow cortical potentials of human subjects. Neuroscience Letters 269, 137–
140 (1999)
30. Keehner, M., et al.: Spatial ability, experience, and skill in laparoscopic surgery. American
Journal of Surgery 188(1), 71–75 (2004)
Epistemic Actions in Science Education

Kim A. Kastens1,2, Lynn S. Liben3, and Shruti Agrawal1


1 Lamont-Doherty Earth Observatory of Columbia University
2 Department of Earth & Environmental Sciences, Columbia University
3 Department of Psychology, The Pennsylvania State University
kastens@ldeo.columbia.edu, liben@psu.edu, shruti@ldeo.columbia.edu

Abstract. Epistemic actions are actions in the physical environment taken with
the intent of gathering information or facilitating cognition. As students and
geologists explain how they integrated observations from artificial rock
outcrops to select the best model of a three-dimensional geological structure,
they occasionally take the following actions, which we interpret as epistemic:
remove rejected models from the field of view, juxtapose two candidate models,
juxtapose and align a candidate model with their sketch map, rotate a candidate
model into alignment with the full scale geological structure, and reorder their
field notes from a sentential order into a spatial configuration. Our study differs
from prior work on epistemic actions in that our participants manipulate spatial
representations (models, sketches, maps), rather than non-representational
objects. When epistemic actions are applied to representations, the actions can
exploit the dual nature of representations by manipulating the physical aspect to
enhance the representational aspect.

Keywords: spatial cognition, epistemic actions, science education.

1 Introduction

Kirsh and Maglio [1] introduced the term "epistemic action" to designate actions
which humans (or other agents) take to alter their physical environment with the
intent of gathering information and facilitating cognition.1 Epistemic actions may
uncover information that is hidden, or reduce the memory required in mental compu-
tation, or reduce the number of steps involved in mental computation, or reduce the
probability of error in mental computation. Epistemic actions change the informa-
tional state of the actor, as well as the physical state of the environment. Kirsh and

1 Magnani [24] used a similar term, "epistemic acting," more broadly, to encompass all actions
that provide the actor with additional knowledge and information, including actions that do
not alter anything in the environment (e.g., "looking [from different viewpoints]," "checking,"
"evaluating," "feeling [a piece of cloth]".) Roth [25] (p. 142) used "epistemic action" to refer
to sensing of objects and "ergotic action" to refer to manipulating objects in a school labora-
tory setting. In this paper, we use the term "epistemic action" in the original sense of Kirsh
and Maglio.


Maglio contrasted epistemic actions with "pragmatic actions," those taken to imple-
ment a plan, or implement a reaction, or in some other way move oneself closer to a
goal.
Kirsh and Maglio [1] explicated their ideas in terms of the computer game Tetris.
They showed that expert players make frequent moves that do not advance the goal of
nestling polygons together into space-conserving configurations, but do gain infor-
mation. For example, a player might slide a falling polygon over to contact the side of
the screen and then count columns outwards from the side to determine where to drop
the polygon down to fit into a target slot. For a skilled player this backtracking ma-
neuver is more time-efficient than waiting for the polygon to fall low enough for the
judgment to be made by direct visual inspection. At a different point in the game, a
player might rotate a polygon through all four of the available configurations before
selecting a configuration. Kirsh and Maglio showed that such physical rotation, fol-
lowed by direct perceptual comparison of the polygon and the available target
slots, is more time-efficient than the corresponding mental rotation. As an individual
player's skill increases from novice to expert, the frequency of such "extraneous"
moves increases [2].
In this paper, we apply the concept of epistemic actions to science and science edu-
cation. Scientists and science students manipulate objects in the physical world in the
course of trying to solve cognitively demanding puzzles. We argue that epistemic
actions, in the sense of Kirsh and Maglio [1], are an underappreciated tool that scien-
tists use, and that science students could be taught to use, to enhance the efficiency of
their cognitive effort. We begin by showing examples of participant actions that we
believe to be epistemic which emerged in our own study of spatial thinking in geo-
sciences. We then describe epistemic actions in other domains of science education,
and conclude by offering some generalizations and hypotheses about how epistemic
actions may work.

2 Epistemic Actions in Our Geoscience Field Study


Our study [3] investigates how students and professional geologists gather and record
spatial information from rock outcrops scattered across a large field area, and then
integrate that information to form a mental model of a geological structure, keeping in
mind that the structure is partly eroded and mostly buried. Participants observe and
take notes on eight artificial outcrops constructed on a campus, then select from an
array of fourteen 3-D scale models to indicate which they think could represent the
shape of a structure formed by the layered rocks in the eight outcrops. The scale mod-
els vary systematically on key attributes, including convex/concave, circular/elongate,
symmetric/asymmetric, open/closed, and shallow/deep. Participants are videotaped as
they make their selection and explain why they chose the selected model and rejected
the other models. Based on their comments and body language, students find this task
difficult but engaging, and all appear to be trying determinedly to solve the puzzle
posed to the best of their ability.
As detailed elsewhere [4], students use abundant deictic (pointing) gestures to indi-
cate features on their notes, a model or group of models, a real-world direction, or the
outcrops in that real-world direction. For example, a student points over his shoulder

to indicate the location of the most steeply-dipping outcrops. They also make frequent
use of iconic gestures, while discussing or describing attributes of an observed out-
crop, a specific model, a group of models, or a hypothesized structure. For example, a
student uses a cupped hand to convey her interpretation that the structure is concave
upwards.
In addition to abundant deictic and iconic gestures, the videotapes also document
instances in which participants spontaneously move their hands in ways that do not
have apparent communicative value, manipulating the objects available to them in a
manner that we interpret as "epistemic actions."

2.1 Situation #1: Participant Moves Rejected Models Out of View

Participants frequently begin their reasoning process by eliminating individual models
or categories of models, for example, all the convex models. In many cases, they
merely point out the rejected models with deictic gesture, or describe the rejected
category in words (i.e., "it can't be convex"). But in some cases, they go to consider-
able effort to remove the rejected models from their field of view, for example by
setting them off to the side (Fig. 1), or handing them to the experimenter. We infer
that they are seeking to decrease their perceptual and cognitive load by decreasing the
complexity of the visual array and by reducing the number of possibilities that are
actively competing for their attention. These actions serve to address one of the basic
problems of visual attention, namely that there is a limited capacity for processing
information. Although there is a considerable research literature showing that humans
are able to focus attention on some rather than other stimuli within a particular visual
array [5], at least some processing is necessary when there are competing stimuli, and
thus any actions that reduce that competition may be expected to simplify the task [6].

2.2 Situation #2: Participant Moves Two Candidate Models Side by Side

As participants progress through their reasoning process, they may take two candidate
models out of the array and place them side by side (Fig. 2). We infer that this action
is intended to facilitate comparing and contrasting attributes of the two models. The
side-by-side comparison technique is employed when the two models differ subtly;
for example, in Fig. 2 the two models are both concave, both elongate, both steep-
sided, both closed, and differ only in that one is symmetrical along the long axis while
the other is asymmetrical. Based on eye movements of people who were asked to
recreate spatial patterns of colored blocks working from a visually-available model,
Ballard, Hayhoe, Pook and Rao [7] concluded that their participants adopted a "mini-
mum memory strategy" when the model and under-construction area were close to-
gether. They kept in mind only one small element of the model (for example, the
color of the next block), and relied on repeated revisits back and forth between the
model and the under-construction block array. The revisits allowed them to acquire
information incrementally and avoid even modest demands on visual memory. Bal-
lard, et al.'s participants overwhelmingly favored this minimal memory strategy even
though it was more time-consuming than remembering multiple aspects of the model,
and even though they were instructed to complete the task as quickly as possible.
When Ballard, et al. increased the distance between model and copy, use of the mini-
mal memory strategy decreased.

Fig. 1. Participant places rejected models out of field of view. We infer that the purpose of this
action is to decrease the number of visually-available comparisons.

Fig. 2. After rejecting most models, this participant took the remaining two candidate models
out of the array and placed them side-by-side, to facilitate comparison of details

We hypothesize that by moving two subtly-different models side-by-side, our par-
ticipants enabled a minimal memory strategy to efficiently compare and contrast
attributes of the models incrementally, without relying on visual memory to carry the
entire model shape as attention is transferred from model to model.

2.3 Situation #3: Participant Moves Candidate Model Adjacent to Inscriptions

In some cases, participants place a candidate 3-D model side by side with their in-
scriptions (field notes) (Fig. 3). We infer that this juxtaposition facilitates the process of
comparing observation (in the notes) with interpretation (embodied in the candidate 3-D
model), presumably through enabling the minimal memory strategy as described above.
Participants' inscriptions took many forms [3], including a map of the field area with
outcrop locations marked. Among the participants who had a map, we noted an addi-
tional epistemic action: participants rotated the map and candidate model such that the
long axis of the model was oriented parallel to the long axis of the cluster of outcrop
positions marked on the map (Fig. 3). This alignment allowed a direct perceptual

Fig. 3. This participant has placed her inscriptions (notes) side by side with a candidate model
to facilitate comparison between her recorded observations and her candidate interpretation

Fig. 4. This participant, an expert, rotates several candidate models so that the long axis of the
model aligns with the long axis of the full-scale structure

comparison of inscriptions and model, without requiring the additional cognitive load of
mental rotation, as in the case of Kirsh and Maglio's [1] Tetris players.

2.4 Situation #4: Participant Rotates Model to Align with the Referent Space
In a few cases, a participant spontaneously rotated a model or models to align with the
full-scale structure formed by the outcrops in the perceptual space2 (Fig. 4). As in
Situation #3, we hypothesize that the alignment achieved by physical rotation enabled
a direct comparison, eliminating the cognitive load of mental rotation. An interesting
aspect of Situation #4 is that the full-scale structure was not perceptually available to
compare with the model structure. Only 2 of the 8 outcrops were visible to the par-
ticipants as they made and defended their model selection. We hypothesize that

Fig. 5. While observing the eight outcrops, this participant recorded observations onto blank
sheets of paper “sententially,” that is, sequenced from top to bottom, left to right on the paper,
like text in a book. When confronted with the integrative task, she tore up her inscriptions into
small rectangles with one outcrop per rectangle, and reorganized them into a map-like spatial
arrangement. (Note: in order to show the reader both the spatial arrangement of the paper scraps
and the details of the sketch, this figure was constructed by scanning the student’s inscriptions
and superimposing the scanned sketches onto a video screen shot).

2 After completing their explanation of their model selection, all participants were asked by the
experimenter to rotate their selected model into alignment with the full-scale structure. In this
paper, we are referring to individuals who spontaneously elected to align their model with the
structure before being asked to do so by the experimenter.

as they moved through the field area from outcrop to outcrop and then back to the
starting place, some participants acquired or constructed an embodied knowledge of
the outcrop locations and configuration, and that embodied knowledge is somehow
anchored to, or superimposed upon the landscape through which they moved.

2.5 Situation #5: Participant Rips Up Inscriptions, and Reorders Them in Space

In the no-map condition of our experiment [3], participants recorded their observa-
tions onto blank paper. Some participants situated their observations spatially to form
a sketch map of the field area, and others recorded their observations "sententially"
[8], in chronological order on the page from top to bottom, left to right, like text in a
book. One participant, a novice to field geology, recorded her observations senten-
tially, sketching each outcrop as she visited it. Then, when she was confronted with
the selection task, she spontaneously tore up her papers so that each outcrop sketch
was on a separate scrap of paper, and arranged the scraps spatially into a rough plan
view of the outcrop locations (Fig. 5).

3 Other Occurrences of Epistemic Actions in Science Education


In the laboratory or "hands-on" component of a well-taught science education pro-
gram, students are engaged in manipulating physical objects while thinking hard—
conditions that may tend to foster use of epistemic actions. And indeed, we can envi-
sion epistemic actions across a range of science fields. For example:
• Elementary school children grow bean plants in paper cups. They place their bean
plants in a row along the window sill such that each plant gets the same amount of
sunlight. Each child waters his or her bean plant by a different amount each day.
Initially, they arrange the plants in alphabetical order by child's name. Then, as the
plants sprout and begin to grow, they rearrange the bean plants in order by amount
of daily watering, to make it easier to see the relationship between amount of water
and growth rate.
• High school chemistry students arrange their test tubes in a test tube rack in order
so that the tube that received the most reagent is farthest to the right.
• College paleontology students begin their study of a new taxonomic group by
arranging fossils across the lab table in stratigraphic order from oldest to youngest,
to make it easier to detect evolutionary trends in fossil morphology.
• Earth Science students begin their study of igneous rocks by sorting a pile of hand
samples into a coarse-grained cluster and a fine-grained cluster, to rein-
force the conceptual distinction between intrusive rocks (which cooled slowly
within the Earth's crust and thus have large crystals) and extrusive rocks (which
cooled quickly at the Earth's surface and thus have small crystals).
• Elementary school geography students, or high school Earth Science students,
rotate the map of their surroundings until map and referent are aligned. This makes
it easier to see the representational and configurational correspondences between
map and referent space, without the effort of mental rotation, which is known to be
a cognitively demanding task [9].

4 Discussion

4.1 Are Epistemic Actions Consciously Purposeful?

The participants in our study produced the actions described above spontaneously, as
they struggled to puzzle their way through a spatially-demanding task that most found
difficult. Some participants first asked whether it was OK to move or turn the models,
which suggests that they knew in advance that such actions would be beneficial. They
valued these actions sufficiently that they were willing to risk rejection of a poten-
tially forbidden move, and they anticipated that the experimenter might see these
actions as being of sufficient value to outlaw.

4.2 Are Epistemic Actions Always Spatial?

All of the examples of epistemic actions we have provided thus far, and the original
Tetris examples of Kirsh and Maglio [1], have involved spatial thinking, that is, think-
ing that finds meaning in the shape, size, orientation, location, direction, or trajectory of
objects, processes, or phenomena, or the relative positions in space of multiple objects,
processes, or phenomena. Spatial examples of epistemic actions seem most obvious and
most powerful. But is this association between epistemic actions and spatial thinking
inevitable? Are all epistemic actions in service of spatial thinking?
No. It is possible to think of counter-examples of epistemic actions that seek non-
spatial information. An everyday example would be placing two paint chips side by
side to make it easier to determine which is darker or more reddish, seeking informa-
tion about color. The science equivalent would be placing a spatula full of dirt or
sediment next to the color chips in the Munsell color chart [11].

4.3 Taxonomies of Epistemic Actions

Kirsh [12] developed a classification scheme for how humans (or other intelligent
agents) can manage their spatial environment: (a) spatial arrangements that simplify
choice; (b) spatial arrangements that simplify perception, and (c) spatial dynamics
that simplify internal computation. Our Situation #1, in which participants remove
rejected 3-D models from view, is a spatial arrangement that simplifies choice. Situa-
tions #2 and #3, in which participants juxtapose two items to simplify comparison, are
spatial arrangements that simplify perception. Situations #3 and #4 from the outcrop
experiment, plus the case of rotating a map to align with the terrain, simplify internal
computation by eliminating the need for mental rotation.
Kirsh's scheme classified epistemic actions according to the change in cognitive or
informational state of the actor. Epistemic actions could also be classified by the na-
ture of the change to the environment: (a) relocate/remove/hide objects, (b) cluster
objects, (c) juxtapose objects, (d) order or array objects, (e) rotate/reorient objects.
Considering both classification schemes together yields a two-dimensional matrix for
categorizing epistemic actions (Table 1). Each cell in the matrix of Table 1 describes
benefits obtained by the specified change to the environment (row) and change to the
cognitive state of the actor (column).

Table 1. Two-dimensional taxonomy of epistemic actions. Rows give the change to the environment; columns give the change to the cognitive state of the actor (after Kirsh).

| Change to environment | Simplify choice | Simplify perception | Simplify cognition |
| Remove or hide object(s) | Fewer apparent choices | Less visual input, fewer visual distractions | Fewer pairwise comparisons required |
| Cluster objects | Choice is among few clusters (e.g., concave vs. convex) rather than among many individuals | Easier to see within-group similarities; easier to see between-group differences | Fewer attributes that need to be considered |
| Juxtapose objects | (no entry) | Easier to see differences and similarities | Less demand on visual memory |
| Order or array objects | Easier to select end members (e.g., largest, smallest) or central "typical" example | Easier to see trends (e.g., bean plant growth by watering rate) and correlations | No need for mental re-ordering |
| Rotate/reorient objects | (no entry) | Easier to see correspondences | No need for mental rotation |

"Juxtapose objects" appears at first glance to be a special case of "cluster objects,"


but we have separated them because the information gained and the nature of the
change of cognitive state may be different. The value-added of juxtaposing two simi-
lar objects is that it is easier to perceive similarities and differences, without the cog-
nitive load of carrying a detailed image of object 1 in visual memory while the gaze is
shifted laterally to object 2 [7]. The value-added of clustering objects into groups is
that one can then reason about a small number of groups rather than a larger number
of individual objects. An example of the latter would be separating the trilobites from
the brachiopods in a pile of fossils; an example of the former would be juxtaposing
two individual trilobite samples to compare their spines.
The taxonomy of Table 1 has been structured to accommodate a variety of tasks
and to allow extension as new observations accrue from other studies.
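To illustrate how the taxonomy might be made operational, for instance when coding observed actions in future studies, one could encode Table 1 as a simple lookup structure; the sketch below is ours, purely illustrative, and the cell strings are paraphrases of the table entries:

```python
# Illustrative encoding of Table 1 as a nested mapping:
# change to environment -> change to cognitive state -> benefit.
TAXONOMY = {
    "remove or hide objects": {
        "simplify choice": "fewer apparent choices",
        "simplify perception": "less visual input, fewer visual distractions",
        "simplify cognition": "fewer pairwise comparisons required",
    },
    "cluster objects": {
        "simplify choice": "choice among few clusters rather than many individuals",
        "simplify perception": "within- and between-group differences easier to see",
        "simplify cognition": "fewer attributes need to be considered",
    },
    "juxtapose objects": {
        "simplify perception": "differences and similarities easier to see",
        "simplify cognition": "less demand on visual memory",
    },
    "order or array objects": {
        "simplify choice": "end members or a 'typical' example easier to select",
        "simplify perception": "trends and correlations easier to see",
        "simplify cognition": "no need for mental re-ordering",
    },
    "rotate or reorient objects": {
        "simplify perception": "correspondences easier to see",
        "simplify cognition": "no need for mental rotation",
    },
}

def benefit(environment_change: str, cognitive_change: str) -> str:
    """Return the benefit for one cell of the taxonomy ('-' for empty cells)."""
    return TAXONOMY.get(environment_change, {}).get(cognitive_change, "-")

# Example: the map-rotation action discussed in the text.
print(benefit("rotate or reorient objects", "simplify cognition"))
```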

4.4 Epistemic Actions and the Duality Principle of Representations

Kirsh's [12] taxonomy of actions to manage space was based on observation of people
playing games and engaging in everyday activities such as cooking, assembling fur-
niture, and bagging groceries. In the case of science or science education, we suggest
that epistemic actions can enhance cognition in a manner not explored by Kirsh: epis-
temic actions can exploit or enhance the dual nature of representations.
A spatial representation, such as a map, graph, or 3-D scale model, has a dual na-
ture: it is, simultaneously, a concrete, physical object, and a symbol that represents

something other than itself [13-18]. We suggest three ways in which epistemic actions
can exploit or enhance the dual nature of representations:
1. The action can rearrange or reorder the physical aspect of the representation so that
the referential aspect of the representation is more salient and/or has more
dimensions.
2. The action can rearrange or reorder the physical aspect of the materials so that a
more useful representation replaces a less useful representation.
3. The action can create a dual-natured representation from what had previously been
mere non-representational objects.
Mechanism (1): Manipulate the Physical Representation to Enhance or Fore-
ground its Referential Meaning. In Situation #4 of the artificial outcrop experiment,
an expert rotates candidate 3-D scale models to align with the full-scale structure.
Before rotation, the correct model accurately represented the full-scale structure with
respect to the attributes of concave/convex, elongate/circular, steep-sided/gentle-
sided, symmetric/asymmetric, and closed/open. After rotation, the model accurately
represented the full-scale structure with respect to all of those attributes, and also with
respect to alignment of the long axis. In other words, manipulating the physical object
transformed the representation into a more complete or more perfect analogy to the
referent structure. The same is true of rotating a map to align with the represented
terrain [19].
In addition to creating a new correspondence (alignment) where none had existed
previously, rotating the correct model to align with the referent space makes the other
correspondences more salient, and easier to check or verify. On the other hand, if the
model chosen is an incorrect model (for example, open-ended rather than closed-
contoured), the discrepancy between model and full-scale structure becomes harder to
overlook when the long axes of the model and referent are brought into alignment.
Mechanism (2): Manipulate the Physical Representation to Create a More Useful
Representation. In Situation #5 of the artificial outcrop experiment, the participant
had initially arranged annotated sketches of each outcrop onto her paper such that the
down-paper dimension represented the temporal sequence in which the eight outcrops
had been visited and the observations had been made. Upon receiving the task direc-
tions and seeing the choice array, she apparently realized that this was not a useful
organizational strategy. She physically destroyed that organization schema. Then she
physically reorganized the fragments into a more task-relevant spatial arrangement, in
which positions of outcrop sketches represented positions of full-scale outcrops. This
participant apparently had the ability to think of her inscriptions as both (a) a concrete
object that could be torn into pieces and reordered, and (b) a set of symbolic marks
standing for individual outcrops.
Mechanism (3): Manipulate the Physical World to Carry Representational
Meaning. In several of the examples described above, the objects have no represen-
tational significance before the epistemic action. The epistemic action creates repre-
sentational significance where none had previously existed.
For example, in the case of the children's growing bean plants, as a consequence
of the epistemic action, the spatial dimension parallel to the window sill becomes a

representation of water per unit time. The vertical dimension, the height of each plant,
becomes a representation of growth rate as a function of watering rate. The entire
array of plants becomes a living bar graph.
In the case of the fossils arranged on the table, the spatial dimension along the line
of fossils acquires two representational aspects, which run in parallel: geologic time
and evolutionary distance.
In the case of the igneous rocks, the two piles of rocks, fine-grained and coarse-
grained, represent the fundamental division of igneous rocks into extrusive and intru-
sive products of cooling magma. Within each pile, the rocks could further be ordered
according to the percentage of light-colored minerals, an indicator of silica content.
Kirlik [20] presents a compelling non-science example, in which a skilled short-
order cook continuously manipulates the positions of steaks on a grill, such that the
near-far axis of the grill (from the cook's perspective) represents doneness requested
by the customer, and the distance from the left-hand edge of the grill represents time
remaining until desired doneness. This skilled cook need only monitor the perceptu-
ally-available attribute of distance from the left edge of grill, and need not try to per-
ceive the hidden attribute of interior pinkness, nor try to remember the variable attrib-
ute of elapsed-duration-on-grill. A less skilled cook in the same diner created only
one axis of representation (the near-far requested-doneness axis), and the least skilled
cook had no representations at all, only steaks.

5 Conclusions and Directions for Further Research


Cowley and MacDorman [21] make the case that capability and tendency to use epis-
temic actions is an attribute that separates humans from other primates and from an-
droids. If so, then we might expect that the most cognitively demanding of human
enterprises, including science, would make use of this capability.
In reflecting on the significance of their work, Maglio and Kirsh [2] note (p. 396)
that "it is no surprise…that people offload symbolic computation (e.g., preferring
paper and pencil to mental arithmetic…), but it is a surprise to discover that people
offload perceptual computation as well." This description applies well to science
education. Science and math educators have long recognized the power of "offloading
symbolic computation," and explicitly teach the techniques of creating and manipu-
lating equations, graphs, tables, concept maps, and other symbolic representations.
However, science educators have generally not recognized or emphasized that hu-
mans can also "set up their external environment to facilitate perceptual processing"
(p. 396).
All science reform efforts emphasize that students should have ample opportunities
for "hands-on" inquiry [22]. But we are just beginning to understand what students
should do with those hands in order to make connections between the physical objects
available in the laboratory or field-learning environment and the representations and
concepts that lie at the heart of science. We hypothesize that epistemic actions may be
a valuable laboratory inquiry strategy that could be fostered through instruction and
investigated through research.
Questions for future research include the following: Can instructors foster epis-
temic actions in their students? If so, do student learning outcomes on laboratory

activities improve? Is there individual variation in the epistemic actions found useful
by different science students or scientists, as Schwan and Riempp [23] have found
during instruction on how to tie nautical knots? Do those scientists who have reputa-
tions for "good hands in the lab" make more epistemic actions than those who do not,
by analogy with the strategic management of one's surrounding space that Kirsh [12]
found to be an attribute of expertise in practical domains?

Acknowledgements. The authors thank the study participants for their thoughts and
actions, G. Michael Purdy for permission to use the grounds of Lamont-Doherty Earth
Observatory, T. Ishikawa, M. Turrin and L. Pistolesi for assistance with data acquisi-
tion, L. Pistolesi for preparing the illustrations, and the National Science Foundation
for support through grants REC04-11823 and REC04-11686. The opinions are those
of the authors and no endorsement by NSF is implied. This is Lamont-Doherty Earth
Observatory contribution number 7171.

References
1. Kirsh, D., Maglio, P.: On distinguishing epistemic from pragmatic action. Cog. Sci. 18,
513–549 (1994)
2. Maglio, P., Kirsh, D.: Epistemic action increases with skill. In: Proceedings of the 18th an-
nual meeting of the Cognitive Science Society (1996)
3. Kastens, K.A., Ishikawa, T., Liben, L.S.: Visualizing a 3-D geological structure from out-
crop observations: Strategies used by geoscience experts, students and novices [abstract].
Geological Society of America Abstracts with Program, 171–173 (2006)
4. Kastens, K.A., Agrawal, S., Liben, L.S.: Research in Science Education: The Role of Ges-
tures in Geoscience Teaching and Learning. J. Geosci. Educ. (2008)
5. Broadbent, D.E.: Perception and Communication. Oxford University Press, Oxford (1958)
6. Desimone, R., Duncan, J.: Neural mechanisms of selective visual attention. Ann. Rev. of
Neurosci. 18, 193–222 (1995)
7. Ballard, D.H., Hayhoe, M.M., Pook, P.K., Rao, R.P.N.: Deictic codes for the embodiment
of cognition. Beh. & Brain Sci. 20, 723–767 (1997)
8. Larkin, J.H., Simon, H.A.: Why a diagram is (sometimes) worth ten thousand words. Cog.
Sci. 11, 65–99 (1987)
9. Shepard, R.N., Metzler, J.: Mental Rotation of Three-Dimensional Objects. Sci. 171, 701–
703 (1971)
10. Liben, L.S., Downs, R.M.: Understanding Person-Space-Map Relations: Cartographic and
Developmental Perspectives. Dev. Psych. 29, 739–752 (1993)
11. Goodwin, C.: Practices of Color Classification. Mind, Cult., Act. 7, 19–36 (2000)
12. Kirsh, D.: The intelligent use of space. Artif. Intel. 73, 31–68 (1995)
13. Goodman, N.: Languages of art: An approach to a theory of symbols. Hackett, Indianapo-
lis (1976)
14. Potter, M.C.: Mundane Symbolism: The relations among objects, names, and ideas. In:
Smith, N.R., Franklin, M.B. (eds.) Symbolic functioning in childhood, pp. 41–65. Law-
rence Erlbaum Associates, Hillsdale (1979)
15. DeLoache, J.S.: Dual representation and young children’s use of scale models. Child
Dev. 71, 329–338 (2000)

16. Liben, L.S.: Developing an Understanding of External Spatial Representations. In: Sigel,
I.E. (ed.) Development of mental representation: theories and applications, pp. 297–321.
Lawrence Erlbaum Associates, Hillsdale (1999)
17. Liben, L.S.: Education for Spatial Thinking. In: Renninger, K.A., Sigel, I.E. (eds.) Hand-
book of child psychology, 6th edn., vol. 4, pp. 197–247. Wiley, Hoboken (2006)
18. Uttal, D.H., Liu, L.L., DeLoache, J.S.: Concreteness and symbolic development. In: Balter,
L., Tamis-LeMonde, C.S. (eds.) Child Psychology: A Handbook of Contemporary Issues,
pp. 167–184. Psychology Press, New York (2006)
19. Liben, L.S., Myers, L.J., Kastens, K.A.: Locating oneself on a map in relation to person
qualities and map characteristics. In: Freksa, C., Newcombe, N.S., Gärdenfors, P. (eds.)
Spatial Cognition VI. LNCS, vol. 5248. Springer, Heidelberg (2008)
20. Kirlik, A.: The ecological expert: Acting to create information to guide action. In: The
Conference on Human Interaction with Complex Systems, Piscataway, NJ (1998)
21. Cowley, S.J., MacDorman, K.F.: What baboons, babies and Tetris players tell us about in-
teraction: A biosocial view of norm-based social learning. Connect. Sci. 18, 363–378 (2006)
22. National Research Council.: National Science Education Standards. National Academy
Press, Washington (1996)
23. Schwan, S., Riempp, R.: The cognitive benefits of interactive videos: learning to tie nauti-
cal knots. Learn. and Instr. 14, 293–305 (2004)
24. Magnani, L.: Model-based and manipulative abduction in science. Found. of Sci. 9, 219–
247 (2004)
25. Roth, W.M.: From epistemic (ergotic) actions to scientific discourse: The bridging func-
tion of gestures. Prag. and Cogn. 11, 141–170 (2003)
An Influence Model for Reference Object
Selection in Spatially Locative Phrases

Michael Barclay and Antony Galton

Exeter University, Exeter, UK


mjb231@ex.ac.uk

Abstract. A comprehensive influence model for choosing a reference ob-
ject in a spatially locative phrase is developed. The model is appropriate
for a Bayesian network implementation and intended as a step toward
machine learning of spatial language. It takes its structure from the nec-
essary steps a listener must take in utilising spatial communication and
contains, as variables, parameters derived from the literature concerning
characteristics of landmarks for wayfinding as well as reference objects
in general. Practical limitations on the implementation and training of
the model are discussed.

Keywords: Bayesian, reference, spatial, locative.

1 Introduction

1.1 Reference Objects in Spatially Locative Phrases


A spatially locative phrase is one in which a reference object, a spatial preposition
and an object to be located (herein the “target”) are combined for the purpose
of identifying the position of the target to the listener. Hence “The cup is on the
table” and “The bird flew over the hill” are both examples of simple spatially
locative phrases, with the cup and the bird respectively as targets, and the table
and the hill as reference objects, further information and any temporal reference
being provided by the verb. In the landmark domain a phrase such as “The
meeting place is in front of the cathedral” is clearly in this category.
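The three-part structure just described can be captured in a small data type; the class and field names in the sketch below are ours and purely illustrative:

```python
# Minimal illustrative representation of a simple spatially locative phrase;
# the class and field names are hypothetical, not part of the model itself.
from dataclasses import dataclass

@dataclass
class LocativePhrase:
    target: str       # the object to be located, e.g. "the cup"
    preposition: str  # the spatial preposition, e.g. "on"
    reference: str    # the reference object, e.g. "the table"

    def render(self) -> str:
        return f"{self.target} is {self.preposition} {self.reference}"

print(LocativePhrase("the cup", "on", "the table").render())
# -> the cup is on the table
```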
The problem addressed by this paper, illustrated in Fig. 1, is to identify a
suitable reference object from the many present in a scene, and so take the first
step in forming a spatially locative phrase. In Fig. 1 in answer to the question
“where is the man?” the answers “by the skip” or “in front of the white house”
are acceptable answers, but what about “on the sidewalk” or “to the right of
the road”? If Talmy’s categorisation (see [1] and Sect. 2.1) of reference objects
is considered, the road and the sidewalk should be suitable candidates, but it
is apparent that they do not serve to locate the man as well as the skip or the
white house.
It should be noted that this is not the same issue as that involved in generating
referring expressions (see for example [2], [3], [4]). The target object in this paper
is assumed to be unique: it needs to be located not disambiguated. A candidate


Fig. 1. Influences on search-space optimisation

reference object may or may not be ambiguous and this leads to a variety of issues
which are discussed in Sect. 3.4. Even an ambiguous candidate reference object
must be treated differently from a referent in a referring expression because it
has a purpose in helping to locate the target.

1.2 Spatial Language Generation


While much work on the characteristics of landmarks and some on the charac-
teristics of reference objects in general has been undertaken (see Sect. 2), there
has been no attempt to combine these into a model suitable for machine learn-
ing of the task of reference selection and hence of an important part of spatial
language generation and scene description. This paper proposes such a model.
Machine learning of spatial preposition use has been attempted by Regier [5],
Lockwood et al. [6] and Coventry et al. [7] among others, but these systems have
‘pre-selected’ references.
Machine learning of reference selection takes place to an extent in the “De-
scriber” system (Roy [8]). This uses a 2-dimensional scene set with limited influ-
ences on reference object choice, although it is impressive in its scope, tackling
reference choice, target (referent) disambiguation and preposition assignment
simultaneously.
The “VITRA” system [9] is an ambitious scene description system including
all elements of spatial language, but it does not include machine learning.
Implementations of the model described in this paper are expected to be
trained with, and tested on, 3-dimensional scenes which are schematised (see
Herskovits [10]) representations of reality.

1.3 Scope of the Investigation


Discourse and Context. Returning to Fig. 1, it can be seen that answering
“on the sidewalk” to the question “where is the man?” is appropriate if the man
had previously been in the road with a bus approaching. A discourse process or
context may have raised the salience of the sidewalk to the point where it was a
better reference than the skip or the pink house.
For the purposes of the current paper the scenes for which reference choice
is to be made are presumed to be memoryless i.e. time leading up to the point
at which a scene is described is unknown or non-existent. The multiple added
complexities of describing a scene during, or as part of, a discourse will be the
subject of future work.

Functionally Locative Phrases. As noted by Miller and Johnson-Laird [11]
phrases of the type “the bird is out of the cage” conform to the template of
a spatially locative phrase but provide no effective communication about the
location of the target object. Instead the purpose of the phrase is to convey
the information that the cage is not fulfilling its normal containment function.
The same can be said to be true of the phrase “The flowers are in the vase”,
which if the vase is as mobile as the flowers, conveys only the information that
the vase is performing its containment function. If someone is trying to find the
flowers the phrase “the flowers are in the bedroom” is more likely to be helpful
than “the flowers are in the vase”.
If “the bird is in the cage” and the cage is fixed and its location known
then the cage is also a good spatial reference for the bird. In this paper the
assumption is made that the aim is to select a good spatial reference, irrespective
of any functional relationship, and that the existence of a functional relationship
between a target and a reference does not of itself make the reference more
suitable (see [12]).

1.4 Structure of Paper


The remainder of this paper is structured as follows: Section 2 gives an overview
of the literature from linguistics and landmarks concerning the characteristics
of reference objects. Section 3 develops an influence model for reference object
selection starting from the function of a spatially locative phrase. Section 4
discusses possible computational methods for implementing the model and issues
surrounding these.

2 Reference Object Characteristics


2.1 Linguistic Investigations
Miller and Johnson-Laird [11] note that the scale of the reference and target
objects is important in selection of a reference: “It would be unusual to say

that the ashtray is by the town-hall”. Talmy [1] lists attributes of located and
reference objects, and states that relative to the located object the reference is:

1. More permanently located
2. Larger
3. Geometrically more complex in its treatment
4. Earlier on the scene / in memory
5. Of lesser concern / relevance
6. More immediately perceivable
7. More backgrounded when located object is perceived
8. More independent

Thus the reference is likely to be somewhat bigger, if not vastly so, than the
target object. This scale issue is discussed in Sect. 3.3, permanence and perceiv-
ability in Sect. 3.2. These are not intended as absolute categorisations and the
model developed in this paper embodies the concept that the influences can be
traded against each other. For instance the phrase “the bicycle is leaning on the
bollard” uses as a reference an object smaller than the target (less appropriate)
but more permanently located (more appropriate).
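One crude way to picture such trading of influences, well short of the Bayesian network treatment proposed in this paper, is a weighted sum over candidate reference objects; every weight and feature value in the sketch below is invented for illustration only:

```python
# Toy sketch of trading influences against each other when ranking candidate
# references. Not the paper's Bayesian model; all numbers are invented.
CANDIDATES = {
    # feature values in [0, 1]; higher means "more appropriate" on that influence
    "bollard":     {"relative_size": 0.2, "permanence": 0.9, "perceivability": 0.6},
    "white house": {"relative_size": 0.9, "permanence": 0.9, "perceivability": 0.8},
    "sidewalk":    {"relative_size": 0.8, "permanence": 0.9, "perceivability": 0.3},
}

WEIGHTS = {"relative_size": 0.3, "permanence": 0.4, "perceivability": 0.3}

def score(features):
    """Weighted sum: a weak influence can be offset by a strong one."""
    return sum(WEIGHTS[name] * value for name, value in features.items())

for candidate in sorted(CANDIDATES, key=lambda c: -score(CANDIDATES[c])):
    print(f"{candidate}: {score(CANDIDATES[candidate]):.2f}")
```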
Bennett and Agarwal [13] investigate the semantics of ‘place’ and derive a
logical categorisation of reference attributes. De Vega et al. [14] analyse Spanish
and German text corpora and (with a restricted range of prepositions) find that
reference objects are more likely to be solid and countable (i.e. not a substance
like ‘snow’). It should be noted that the corpora were taken from novels rather
than first hand descriptions of real scenes.
Recent experimental work by Carlson and Hill [12] indicates that the geo-
metric placement of a reference is a more important influence than a conceptual
link between target and reference, and that proximity and joint location on a
cardinal axis (e.g., target directly above or directly to the left of reference) are
preferred (see Sect. 3.3). The experiments were carried out using 2-dimensional
object representations on a 2-dimensional grid. Earlier work by Plumert et al.
[15] focusses on hierarchies of reference objects in compound locative phrases
but also finds that in particular the smallest reference in the hierarchy might
be omitted if the relationship between it and the target did not allow sufficient
extra information to be provided (see Sect. 3.4).

2.2 Landmark Characteristics

A considerable body of work on landmarks exists, including the role of landmarks
in cognitive mapping and structuring of space, which cannot be comprehensively
reviewed here. Of more relevance to the present paper is the practical matter
of selecting landmarks when giving way-finding instructions and much of this
work can be related directly to general reference object selection, augmenting
the work from linguistics.
Various researchers have looked at the nature of objects chosen as landmarks
and derived characteristics of good landmarks. Burnett et al. [16] deal with the
case of urban navigation. Based on interviewing subjects who have chosen par-
ticular landmarks in an experimental setting they derive the following charac-
teristics of good landmarks:

1. Permanence
2. Visibility
3. Usefulness of Location
4. Uniqueness
5. Brevity of description

They also note that most landmarks do not exhibit all of the desired charac-
teristics; indeed, the most frequently used landmarks, traffic lights, are ubiquitous
rather than unique. This is discussed in Sect. 3.4.
The factors which contribute to “visual and cognitive salience” in urban way-
finding are investigated by Raubal and Winter [17] and Nothegger et al. [18],
who test automatically selected landmarks against those selected by humans.
The measure of saliency for visual features is complex. Nothegger et al. [18]
point out that using deviation from a local mean or median value (for example
in a feature such as building colour) to represent salience does not hold for
asymmetric quantities such as size, where bigger is usually better than smaller.
Cognitive salience, including cultural or historic significance, is in practice related
to the issue of prior knowledge of the landmark by the listener and is discussed
in Sect. 3.2.
Winter [19] adds advance visibility to the list of desirable characteristics for
landmarks, citing both way-finder comfort and reduced likelihood of reference
frame confusion as reasons.
Sorrows and Hirtle [20], along with singularity (sharp contrast from the en-
vironment), prominence (visibility) and cultural or historic significance, which
are picked up in the lists already mentioned, also list accessibility and prototyp-
icality as characteristics of landmarks. Accessibility (as in the junction of many
roads) may make a landmark more frequently used and may lead to the accretion
of other characteristics useful for way-finding, but it probably mostly denotes
usefulness of location, which is further discussed in Sect. 3.3. Prototypicality is
an important factor as without specific knowledge of a landmark or reference,
categorical knowledge is required. A church which looked like a supermarket
would be a problematic reference.
Tezuka and Tanaka [21] note that landmark use is relative to the task at hand,
mode of transport and time of day. A good landmark for pedestrian navigation is
not necessarily good for car drivers. This seems always to be expressible in terms
of visibility etc. but highlights the need for speed, conditions and viewpoint to
be taken into account in assessing visibility. Also cultural factors, preferences
and, according to Klabunde and Porzel [22], social status may affect landmark
choice.
In [21] a reinforcement mechanism is proposed whereby landmark usage ef-
fectively improves the goodness of the landmark. The initial choice of a landmark
which subsequently becomes much used would presumably have been made be-
cause it displayed characteristics of a good landmark. However, an object’s prior
use as a landmark may cause continuation of use even if an otherwise more suit-
able landmark appears. A related case is noted in [20], “turn left where the red
barn used to be”, where the use of the landmark outlives the landmark itself.

3 Processing a Locative Phrase

3.1 Three Primary Influences on Reference Object Suitability

The three primary influences on reference object suitability can be derived from
the necessary steps a listener must take on hearing a locative phrase, with the
addition of a cost function. Presented with a locative phrase and the task of
finding the target object, the listener must do two things:

1. Locate the reference object.
2. Search for the target object in the region constrained by combining the
reference object location with the spatial preposition.

Making the assumption that the speaker intends his communication to be
effective, or at least is trying to cooperate with the listener, it will follow that
the speaker will have chosen the reference object to be easily locatable, and
also that, in conjunction with the preposition, the reference will optimise the
region in which the listener must search for the located object. There is some
evidence for this cooperation with (or consideration for) the listener in spatial
communication (see Mainwaring et al. [23]) in the adoption of reference frames
that reduce the mental effort required by the listener.
The functional basis of this analysis also leads to the addition of a third
criterion for reference objects, the communication cost of using them (see Grice
[24] on brevity and giving the optimum amount of information). Communication
cost will be an important consideration if a potential reference is ambiguous or
if the time taken for the communication is comparable to the time the listener
will take to locate the object.
The following sections expand on the aspects of reference objects which con-
tribute to the three primary influences shown in Fig. 2.

Fig. 2. Three primary influences on reference suitability: reference locatability, search-space optimisation, and communication cost


3.2 Influences on Reference Locatability


Specific and Categorical Knowledge. For a reference object to be locatable
the listener must either have specific knowledge of the object in question, or have
categorical knowledge of the type of object in question so that it is apparent
when visually encountered. Specific knowledge may substitute for or enhance
categoric knowledge: for instance, in the case of “The National Gallery” specific
knowledge of the building in question would be required by the listener if we
accept that there is no visual category for art galleries. “St Paul’s Cathedral”
may be specifically known but it is also clearly a member of its category and
hence could be referred to as “the Cathedral” in some circumstances. Since
the influence model is for reference choice it is more appropriate to term these
two primary influences on reference locatability “reference apparency” for the
case where categoric knowledge only is assumed and “degree of belief in listener
specific knowledge” for the case where specific knowledge is relied on. These are
shown in Fig. 3.

Fig. 3. Influences on reference locatability (influence diagram; nodes include listener approach, reference obscurance, reference visual contrast, reference size, temporal relevance (listener presence), target mobility, reference mobility, reference persistence, reference prototypicality, reference visibility, reference general significance, and knowledge of the listener's past locales, feeding into reference apparency and degree of belief in listener specific knowledge, which together determine reference locatability)



Degree of Belief in Listener’s Specific Knowledge. From the studies of
landmarks discussed in Sect. 2.2, the value of cultural or historic significance in
landmark choice is clear and can be simply represented as an influence on the
speaker’s degree of belief in listener’s specific knowledge as such. (A historically
significant reference is more likely to be known; this is termed “reference general
significance” in Fig. 3.)
The second influence on the speaker’s degree of belief in listener specific knowl-
edge comes from the speaker’s knowledge of the listener. For instance in Shaftes-
bury, “Gold Hill”, a well known landmark, would be useful in giving directions
to visitors or locals; “Shooters hill”, less well known, would only be of use if
the speaker knew that the listener was local to Shaftesbury. Sorrows and Hirtle
[20] give a related example relating reference choice to frequency of visits to a
building. This influence is included in Fig. 3 as “speaker’s knowledge of listener’s
past locales”.
Indeed this factor may influence more than just reference object choice for a
simple locative phrase and may influence whether a simple or complex locative
phrase can be used (see Sect. 4.5). For example “In front of St Martin in the
Fields” is likely to be replaced by “In front of the church at the north east corner
of Trafalgar Square” for a listener unacquainted with the church in question.

Reference Apparency. For a reference to be apparent to a listener who has
categoric knowledge of the object in question it must be a good representative
of the category (be prototypical [20]), must be visible, and must persist until it
is no longer required as a reference (be persistent, or permanent [16]). Note that
although ambiguity may be thought to have a direct influence on apparency, the
way it is proposed to deal with ambiguous references results in them influencing
either communication cost or search space optimisation. This is discussed in
Sect. 3.4.
It is an open question as to whether persistence should be a direct influence
on apparency or considered an influence on visibility as “visible at the time
required”. As it should make no difference to the model output (although it may
reduce comprehensibility) it is left as a direct influence at present.

Prototypicality. This is a complex area and initial computer implementations of
the influence model will not include this parameter. Size, geometry and presence
or absence of features will all influence prototypicality. Further study of relevant
literature and consideration of methods of representation will be required before
this can be brought within the scope of the model. At present reference objects
are assumed to be recognisable members of their category.

Visibility. This is affected by many factors identified in the landmark studies
cited in Sect. 2.2. Size, obscurance, brightness, colour contrast and shape factor are
all relevant. The speed of travel of the listener [21] and the direction of approach
of the listener [19] are also important. These are included in Fig. 3 as the influ-
ences “reference size”, “reference obscurance”, “reference visual contrast” and
“listener approach”. Note that even something as seemingly simple as size may
have multiple influences. For size, bounding box volume, convex hull volume,
actual volume, maximum dimension and sum of dimensions are all possible can-
didate influences. The apparent size, the area projected toward the speaker, may
in some cases be more important than the actual size. Raubal and Winter [17]
note this in the case of building façades, for instance. These are omitted from
Fig. 3 for simplicity, although they will be included in model implementations.
Persistence. Following Talmy [1] and the work by de Vega et al. [14] it is clear
that both the target object and candidate reference object mobility influence
reference choice. Intuitively the reference object is expected to be more stable
(see [25]) than the target. Also important, as pointed out by Burnett et al. [16], is
when the listener will need to use the reference to find the target. If in Fig. 1 the
target object is the post box and the listener will not be at the scene for some
time, then the pink house, rather than the skip (which may be removed) will be
a better reference even though the skip is nearer and plainly visible. This factor
is summarised as “Temporal relevance (listener presence)” in Fig. 3.

3.3 Searching for the Target Object


This is a conceptually simpler task than finding the reference object. Less knowl-
edge about the listener is required and the target object itself, its characteristics
and function, do not come into focus until it is found (when, as Talmy [1] notes,
“the reference object is backgrounded”).

Scene Scale. As already noted, Miller and Johnson-Laird [11] point out that the
scale of the reference and located objects is important in determining whether
a reference is appropriate. It is proposed here, following Plumert et al. [15], that
this is due to the influence on the search space. Choosing a large reference may
make the reference more apparent but may leave the listener a difficult task find-
ing the target object as, along with any preposition, it defines too large a region
of interest (e.g., “the table is near Oxford”). Reference size must be treated care-
fully as, dependent on geometry, the search space may vary considerably. To say
a target object is “next to the train” defines a large search area but to say that
it is “in front of the train” defines a much smaller area. Computational models il-
lustrating this can be seen in Gapp [26]. Geometry here is effectively a shorthand
term for what might be termed “projected area in the direction of the target”. A
further important influence on search space is the location of the listener relative
to a target object of a given size. As Plumert et al. [15] point out, if the target
object is a safety pin and the listener is more than a few yards away, there may be
no single suitable reference. This factor is included with reference size and geom-
etry and target object size as influences on “scene scale” (see Fig. 4) which in turn
influences search space. The real effect of some critical combination of a small tar-
get object and a distant listener will be to suppress the suitability of all reference
objects and force the decision to use a compound locative phrase containing more
than one reference. This is discussed in Sect. 4.5.
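To make the effect of reference geometry on the search space concrete, the contrast above can be approximated numerically. The following minimal Python sketch compares the area of a fixed-width band around a long, narrow reference ("next to the train") with a band along its short front face only ("in front of the train"); the band width and the rectangular region definitions are illustrative assumptions and are not taken from Gapp's [26] model.

def next_to_area(length: float, width: float, band: float = 2.0) -> float:
    # Area of a band of fixed width around the whole footprint of the reference.
    return (length + 2 * band) * (width + 2 * band) - length * width

def in_front_area(front_width: float, band: float = 2.0) -> float:
    # Area of a band of the same width along the front face only.
    return front_width * band

if __name__ == "__main__":
    # A train-sized reference: roughly 100 m long and 3 m wide.
    print(next_to_area(length=100.0, width=3.0))   # 428.0 square metres
    print(in_front_area(front_width=3.0))          # 6.0 square metres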
Fig. 4. Influences on search-space optimisation (influence diagram; nodes include reference ambiguity, disambiguation by grouping, target size, target obscurance, reference size, reference geometry, target visibility, listener location, reference proximity, cardinal axis placement, reference/target topology, reference location, and scene scale, all ultimately feeding into search-space optimisation)

Reference Location. This is likely to affect search-space optimisation in two
ways. Firstly the simple proximity of the reference to the target reduces the
search space and secondly the presence of the target on a cardinal axis (where
the reference is the origin) appears to make the search easier. This is apparent
in Carlson and Hill [12] and also in studies on preposition appropriateness (see
for instance [27]). Intuitively, given a preposition “above” and a reference the
listener will locate the reference and move his eyes up from there until the target
is encountered. Given a reference and the direction “above and to the left” the
process is much more involved. Proximity and cardinal axis placement are shown
in Fig. 4 as influencing reference location which in turn influences search space.

Reference/Target Topology. From the study by Plumert et al. [15] it is clear
that as well as the geometry of the reference and target, the topological rela-
tionship between them is also important. If a target object was “on the book
on the table” the book was more likely to be included as a reference than if
the target was “near the book on the table” (in which case the target was
simply “on the table”). The search space would appear to be comparable but
in the case where the target was “on the book” the extra communication cost of
using the two references was considered worthwhile by the speaker. It is possible
that there is a perceived chance of confusion in that an object “on A which
is on B” is not necessarily seen as “on B” (i.e., “on” is not always accepted as
transitive, although this is not necessarily the same as Miller and Johnson-Laird’s
limited transitivity [11]). The reference/target topology influence is included in
the model at present pending further testing of its relevance.
The inclusion of reference ambiguity along with disambiguation by grouping
in Fig. 4 is discussed in Sect. 3.4.

3.4 Communication Cost

Reference Innate Cost. The costs of simple references such as “hill”, “house”
or “desk” are typically fairly comparable. However references can be parts of
objects (see [14]) such as “the town hall steps” or regions such as “the back
of the desk”. The distinction between this form of reference and a compound
reference is that there is still only a single preposition (in contrast to “in front
of the town hall by the steps”). It is clear that references of this nature incur
cost both for the speaker and the listener and in a computational model a cost
function will be required to prevent over-specifying of a reference (e.g., “The
back right hand corner of the desk”) when a less specific reference (“on the
desk”) would be sufficient. Sufficiency here is clearly related to the difficulty of
the search task. How these costs are quantified in the model, beyond a simple
count of syllables (which will be used in initial implementations), needs further
investigation.
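As a concrete illustration of the syllable-count cost mentioned above, the following minimal Python sketch estimates an innate communication cost for a reference phrase; the vowel-group heuristic and the example phrases are assumptions made for illustration only.

import re

def estimate_syllables(word: str) -> int:
    # Rough heuristic: one syllable per group of consecutive vowels, minimum one.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def innate_cost(reference_phrase: str) -> int:
    # Communication cost of a reference approximated as its total syllable count.
    return sum(estimate_syllables(w) for w in reference_phrase.split())

if __name__ == "__main__":
    print(innate_cost("the desk"))                                # cheap reference
    print(innate_cost("the back right hand corner of the desk"))  # over-specified reference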

Search Task Difficulty. It was earlier noted that communication cost would
become important if the time taken for the communication approached that re-
quired for the listener to locate the target. As noted, this is a factor in the results
reported by Plumert et al. [15]. The study concluded that a secondary reference
might be omitted because the target was “in plain view” although the topolog-
ical relationships involved were also a factor (see Sect. 3.3). Much of the search
task difficulty is already expressed in the model as search-space optimisation
and does not require re-inclusion as a factor influencing communication cost;
however, some factor is required in the model to represent the speed of visual
search the listener is capable of. This should be more or less constant for human
listeners and if not would require the speaker to know if the listener was much
slower or quicker than normal, which is outside the scope of the model at this
point. As a constant it should be incorporated automatically into the weights of
the model as it is learned and so is not explicitly included in Fig. 5.

Reference Ambiguity. Two possibilities exist for a speaker confronted with an
ambiguous reference in the case of a spatially locative phrase, as opposed to the
case in which the object is the intended referent in a referring expression, when
disambiguation is mandatory. Consider a scene such as that in Fig. 6. The speaker
can choose to aggregate the ambiguous references into a single unambiguous
reference, as in “The bus-shelter is in front of the grey houses”, or disambiguate,
as in “The bus-shelter is in front of the second grey house”. The first of these
alternatives creates a reference with different size and geometry and hence has
the effect included in Fig. 4. The second increases the communication cost of using
the reference object. Methods for disambiguation and algorithms for arriving at
suitable phrases are addressed in the literature on referring expressions (see for
instance [2]); for an empirical study of disambiguation using spatial location,
see [28].

Fig. 5. Influences on communication cost (influence diagram; nodes include reference ambiguity, disambiguation by specification, and reference innate cost, feeding into communication cost)
As an example of the balancing influences between communication cost and
search-space optimisation, consider the use of traffic lights as ‘landmarks’ in
way-finding instructions. Burnett et al. [16] note the frequency of their use in spite
of their ubiquity and the impossibility of disambiguating them except by the
use of a count specifier (e.g., “turn right at the third set of traffic lights”). The
communication cost of the count specifier is quite high both for the speaker and
the listener, but the precision with which the traffic lights optimise the search-
space for the target (effectively the “right turn” in this example) makes them a
suitable reference. This influence on communication cost is shown in Fig. 5 as
“disambiguation by specification”.

4 Discussion: Implementation Possibilities for the Influence Model
4.1 Computational Structure
The problem as stated in Sect. 1.1, “to identify a suitable reference object from the
many present in a scene”, is essentially a classification task. As such, a wide
variety of well documented techniques is available for solving the problem. The
least flexible and most error prone would be to reduce the model by hand to an
Fig. 6. “The bus shelter is in front of ?”

algorithm with constant values decided from (necessarily) piecemeal experimentation.
Some form of supervised machine learning approach would seem more
promising. From the machine learning literature, decision trees, neural networks,
probabilistic (Bayesian) networks and kernel machines would all appear to be
potential candidates for a computational structure. With a view to retaining,
as well as validating, to some extent, the semantics of the derived variables in
the model, a decision tree or probabilistic network approach would be favoured,
although a constrained neural network approach such as that of [5] could be
used. A Bayesian network approach has been initially adopted without making
any further claim for its superiority.
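To indicate what such a structure might look like in practice, the following Python sketch encodes a fragment of the influence structure of Figs. 2–5 as a map from each node to its parents; the node names follow the figures, but the particular encoding (and the omission of most evidence variables) is an illustrative assumption rather than the implemented network.

# Partial skeleton of the influence model; conditional probability tables would
# be learned from the scene corpus described in Sect. 4.4.
INFLUENCES = {
    "reference_suitability": ["reference_locatability",
                              "search_space_optimisation",
                              "communication_cost"],
    "reference_locatability": ["reference_apparency",
                               "belief_in_listener_specific_knowledge"],
    "reference_apparency": ["reference_prototypicality",
                            "reference_visibility",
                            "reference_persistence"],
    "search_space_optimisation": ["reference_location",
                                  "scene_scale",
                                  "reference_target_topology"],
    "communication_cost": ["reference_innate_cost",
                           "disambiguation_by_specification"],
}

def root_nodes(influences: dict) -> list:
    # Nodes that never appear as a key have no parents in this skeleton; in the
    # full model these would be observable evidence variables or further hidden nodes.
    parents = {p for ps in influences.values() for p in ps}
    return sorted(parents - set(influences))

if __name__ == "__main__":
    print(root_nodes(INFLUENCES))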

4.2 Model Output


The influence model, as illustrated in Figs. 2, 3, 4 and 5, when implemented as
a Bayesian network and trained, will return a suitability value for each reference
object in a scene, rather than a chosen reference object. The number of values
assigned to the various variables in the network will make it more or less likely
that more than one candidate reference object will have the same suitability value.
Since it is clear that in a typical scene there would be no consensus reference chosen
by a group of human subjects, having several judged suitable by the model is not
unreasonable. What is also clear is that in some scenes there will be no suitable
single reference and this must be reflected in the model output. The absence of a
suitable reference would be the trigger for formation of a compound locative phrase
or description and this is briefly discussed in Sect. 4.5.
The model must evaluate the candidate reference objects in some order and
how a suitable order is determined is not clear. The evaluation order may prove
to be important if the first suitable reference is to be returned or if pruning of
the evaluation is required (i.e., ignoring references that are clearly unsuitable).
Evidence from research on visual search (see for example Horowitz and Wolfe
[29]), although not directly applicable to the reference choice task, may help
guide experiments in this area.
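A possible way of consuming the model output described above is sketched below: each candidate is scored, all candidates tied at the best suitability are returned, and an empty result signals that no single reference is suitable and a compound locative phrase is required (Sect. 4.5). The threshold value and the idea of returning all tied candidates are assumptions made for illustration.

from typing import Callable, Sequence

def choose_references(candidates: Sequence[str],
                      suitability: Callable[[str], float],
                      threshold: float = 0.5) -> list:
    # Score every candidate reference object in the scene.
    scores = {c: suitability(c) for c in candidates}
    best = max(scores.values(), default=0.0)
    if best < threshold:
        return []  # no suitable single reference: trigger a compound locative phrase
    return [c for c, s in scores.items() if s == best]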

4.3 Complexity of the Model


The model can be easily represented as a poly-tree (i.e., no more than one path
connects any two variables) and although many of the observable parameters are
by nature continuous, initial versions of the model can use discretized values. The
number of variables and the number of values required for a given variable will
be the subject of future experimentation.
Considering the model as presented in Figs. 2, 3, 4 and 5 there are 20 evidence
variables and 10 hidden variables as well as the output variable. As an indication
of the model size, if all variables are 5-valued the model can be represented with
20·5^1 + 6·5^3 + 3·5^4 + 2·5^5, i.e., 8975 values (as opposed to 4.7×10^14 for the conditional
probability table with no hidden variables). Representation and evaluation of a
model of this scale and form is straightforward.
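The arithmetic behind these figures can be checked directly; the grouping of the 31 variables by number of parents (twenty with none, six with two, three with three, and two with four) is our reading of the stated formula, and each node's table is assumed to hold 5^(parents+1) entries.

CARDINALITY = 5
# (number of nodes, number of parents), as implied by the formula above
GROUPS = [(20, 0), (6, 2), (3, 3), (2, 4)]

polytree_size = sum(n * CARDINALITY ** (parents + 1) for n, parents in GROUPS)
flat_cpt_size = CARDINALITY ** 21  # output conditioned directly on all 20 evidence variables

print(polytree_size)   # 8975
print(flat_cpt_size)   # 476837158203125, roughly 4.77 * 10^14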

4.4 Training Data Set


A scene corpus for training and testing the model is currently under development.
The scenes depicted in Figs. 1 and 6 have been taken from the corpus. The design
and construction of the corpus is described in [30]. The practical limit for the
initial size of the corpus is likely to be around 2000 test cases in some 500 scenes.
The training data available for the model limits the influences that can be
included in it to some extent. Considering Fig. 3, the following simplifications
are made:
1. There is no way at present of including learnable measures of “reference
general significance” or “speaker knowledge of listener’s past locales”. A
simple mechanism for tagging some of the objects in the scene as specifically
known to the listener will be used initially.
2. As noted, there is no attempt to measure or learn “prototypicality”. Proto-
typicality by itself will, in all probability, require a model of similar complex-
ity to the one developed here.
3. “Listener approach” is limited to a “scene entry point” at present.
4. However, learnable information about target and reference mobility is
available as the training scenes are sequences of pictures of locations with
some objects moving and some not.
5. Also, although not strictly limited by the training data, only a simple mea-
sure of communication cost related to utterance length will be used. This
clearly does not express many aspects of the mental effort involved (in dis-
ambiguation in particular) which will need to be the subject of future work.
The corpus contains scenes derived from real situations and scenes designed
to test specific elements of the model. These designed scenes include for instance
Fig. 1 in which the sidewalk is inappropriate (due to its linear extension) as a
reference for the location of the man, although it is the closest object to him. All
objects and parts of objects in a scene are potential references; other limitations
on the representation of scenes are described in [30].

4.5 Simple and Compound Locative Phrases

As noted in Sect. 4.2, if no suitable single reference is found by the model a
compound locative phrase is required. Various possible algorithms for this can
be investigated using the model as described in an iterative fashion. For instance
this could be achieved by conceptually moving the listener within the scene to
a point closer to the target object and selecting an appropriate reference object
and then making this the new target object with the listener moved further
towards his initial position. Whether the model would be effective in this task
without some learned concept of scale-spaces (see for example [31]), and how
this learning would be achieved if required, are open questions.
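One way of realising the iterative strategy just outlined is sketched below: the most suitable reference for the current target is selected, and if it is not itself sufficiently locatable it becomes the new target for the next round. The scoring function, threshold and chain-length limit are placeholders, and the listener's conceptual relocation within the scene is assumed to be folded into the suitability score.

from typing import Callable

def compound_references(target: str,
                        candidates: list,
                        suitability: Callable[[str, str], float],
                        threshold: float = 0.5,
                        max_links: int = 3) -> list:
    # Build a chain target <- r1 <- r2 ... of references for a compound locative phrase.
    chain, current = [], target
    for _ in range(max_links):
        scored = [(suitability(c, current), c) for c in candidates if c != current]
        if not scored:
            break
        best_score, best = max(scored)
        chain.append(best)
        if best_score >= threshold:
            break            # this reference is locatable on its own; stop here
        current = best       # otherwise, locate this reference in turn
    return chain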
Also problematic is the case where a reference object requires disambiguation
by use of a second reference as in “the keys are on the desk under the window” in
a room with more than one desk. As noted, the work of Roy [8] addresses this in a
simple context disambiguating both the target and reference objects as necessary.
A reference must be provided that is suitable for the desired primary reference
and unsuitable for any distractors. In practice using the model to achieve this
may be easier than detecting the problem in the first place and recognising that
though ambiguous the desk is still a good reference because of its conventional
use in defining a space (indeed in typifying a scale) where objects are collected.
Cases of reference combinations that are not hierarchical such as “the library
is at the intersection of 5th Street and 7th Avenue” will also need to be the
subject of future work.

5 Conclusions, Further Work

Reviewing recent literature on landmarks and references enables a list of relevant
characteristics to be drawn up. Reasoning from this list and the function of
spatially locative phrases allows an organisation to be imposed on the character-
istics which is lacking in the literature. The resulting model will enable effective
probabilistic modeling and machine learning of reference (and hence landmark)
suitability.
The speaker’s assumed knowledge of the listener in initial implementations of
the model is (for practical reasons) limited to that of his location, his temporal
requirements for the spatial information and whether he has prior knowledge of
a candidate reference object. A more sophisticated model of the listener would
repay investigation. The model could also be expanded to include the different
cases where multiple or compound locative phrases are planned by the speaker.
As noted in the discussion, development of an automated system for reference
object choice based on the analysis in this paper is currently under way. Initial
results from a limited model, containing some 8 variables relating to target and
reference geometry, trained with a 320 case scene corpus, suggest that results
from the full model will be very worthwhile.

References
[1] Talmy, L.: Toward a Cognitive Semantics. MIT Press, Cambridge (2000)
[2] Dale, R., Reiter, E.: Computational interpretations of the gricean maxims in the
generation of referring expressions. Cognitive Science 19, 233–263 (1995)
[3] Duwe, I., Kessler, K., Strohner, H.: Resolving ambiguous descriptions through
visual information. In: Coventry, K.R., Olivier, P. (eds.) Spatial Language. Cog-
nitive and Computational Perspectives, pp. 43–67. Kluwer Academic Publishers,
Dordrecht (2002)
[4] van Deemter, K., van der Sluis, I., Gatt, A.: Building a semantically transparent
corpus for the generation of referring expressions (2006)
[5] Regier, T.: The human semantic potential: Spatial language and constrained con-
nectionism. MIT Press, Cambridge (1996)
[6] Lockwood, K., Forbus, K., Usher, J.: Spacecase: A model of spatial preposition
use. In: Proceedings of the 27th Annual Conference of the Cognitive Science So-
ciety (2005)
[7] Coventry, K.R., Cangelosi, A., Rajapakse, R., Bacon, A., Newstead, S., Joyce,
D., Richards, L.V.: Spatial prepositions and vague quantifiers: Implementing the
functional geometric framework. In: Proceedings of Spatial Cognition Conference
(2004)
[8] Roy, D.K.: Learning visually-grounded words and syntax for a scene description
task. Computer Speech and Language 16(3) (2002)
[9] Herzog, G., Wazinski, P.: Visual translator: Linking perceptions and natural lan-
guage descriptions. Artificial Intelligence Review 8, 175–187 (1994)
[10] Herskovits, A.: Schematization. In: Olivier, P., Gapp, K.-P. (eds.) Representation
and Processing of Spatial Expressions, pp. 149–162. Lawrence Erlbaum Associates (1998)
[11] Miller, G.A., Johnson-Laird, P.N.: Language and perception. Harvard University
Press (1976)
[12] Carlson, L.A., Hill, P.L.: Processing the presence, placement, and properties of a
distractor in spatial language tasks. Memory and Cognition 36, 240–255 (2008)
[13] Bennett, B., Agarwal, P.: Semantic categories underlying the meaning of ‘place’.
In: Winter, S., Duckham, M., Kulik, L., Kuipers, B. (eds.) COSIT 2007. LNCS,
vol. 4736. Springer, Heidelberg (2007)
[14] de Vega, M., Rodrigo, M.J., Ato, M., Dehn, D.M., Barquero, B.: How nouns and
prepositions fit together: An exploration of the semantics of locative sentences.
Discourse Processes 34, 117–143 (2002)
[15] Plumert, J.M., Carswell, C., DeVet, K., Ihrig, D.: The content and organization
of communication about object locations. Journal of Memory and Language 34,
477–498 (1995)
[16] Burnett, G.E., Smith, D., May, A.J.: Supporting the navigation task: charac-
teristics of good landmarks. In: Proceedings of the Annual Conference of the
Ergonomics Society. Taylor & Francis, Abington (2001)
[17] Raubal, M., Winter, S.: Enriching wayfinding instructions with local landmarks.
In: Egenhofer, M.J., Mark, D.M. (eds.) GIScience 2002. LNCS, vol. 2478, pp.
243–259. Springer, Heidelberg (2002)
[18] Nothegger, C., Winter, S., Raubal, M.: Computation of the salience of features.
Spatial Cognition and Computation 4, 113–136 (2004)
[19] Winter, S.: Route adaptive selection of salient features. In: Kuhn, W., Worboys,
M., Timpf, S. (eds.) COSIT 2003. LNCS, vol. 2825. Springer, Heidelberg (2003)
[20] Sorrows, M., Hirtle, S.: The nature of landmarks for real and electronic spaces.
In: Freksa, C., Mark, D. (eds.) Spatial Information Theory: Cognitive and Com-
putational Foundations of GIS. Springer, Heidelberg (1999)
[21] Tezuka, T., Tanaka, K.: Landmark extraction: A web mining approach. In: Cohn,
A.G., Mark, D.M. (eds.) COSIT 2005. LNCS, vol. 3693. Springer, Heidelberg
(2005)
[22] Klabunde, R., Porzel, R.: Tailoring spatial descriptions to the addressee: a
constraint-based approach. Linguistics 36(3), 551–577 (1998)
[23] Mainwaring, S.D., Tversky, B., Ohgishy, M., Schiano, D.J.: Descriptions of simple
spatial scenes in english and japanese. Spatial Cognition and Computation 3(1),
3–43 (2003)
[24] Grice, H.P.: Logic and conversation. In: Cole, P., Morgan, J. (eds.) Syntax and
Semantics: Speech Acts, vol. 3, pp. 43–58. Academic Press, New York (1975)
[25] Vandeloise, C.: Spatial Prepositions. University of Chicago Press (1991)
[26] Gapp, K.P.: An empirically validated model for computing spatial relations.
Künstliche Intelligenz, pp. 245–256 (1995)
[27] Regier, T., Carlson, L.: Grounding spatial language in perception: An empiri-
cal and computational investigation. Journal of Experimental Psychology: Gen-
eral 130(2), 273–298 (2001)
[28] Tenbrink, T.: Identifying objects on the basis of spatial contrast: An empiri-
cal study. In: Freksa, C., Knauff, M., Krieg-Brückner, B., Nebel, B., Barkowsky, T.
(eds.) Spatial Cognition IV: Reasoning, Action, Interaction. In-
ternational Conference Spatial Cognition 2004, pp. 124–146. Springer, Heidelberg
(2005)
[29] Horowitz, T.S., Wolfe, J.M.: Search for multiple targets: Remember the targets,
forget the search. Perception and Psychophysics 63, 272–285 (2001)
[30] Barclay, M.J., Galton, A.P.: A scene corpus for training and testing spatial com-
munication systems (in press, 2008)
[31] Montello, D.R.: Scale and multiple psychologies of space. In: Frank, A.U., Cam-
pari, I. (eds.) COSIT 1993. LNCS, vol. 716, pp. 312–321. Springer, Heidelberg
(1993)
Tiered Models of Spatial Language
Interpretation

Robert J. Ross

SFB/TR 8 Spatial Cognition


Universität Bremen, Germany
robertr@informatik.uni-bremen.de

Abstract. In this paper we report on the implementation and evaluation of an interactive model of verbal route instruction interpretation.
Unlike in previous works, this approach takes a generalised plan con-
struction view of the interpretation process which sees the elements of
verbal route instructions as context enhanced specifications of physical
action which move an agent through a mental or physical search space.
We view such an approach as essential for effective dialogic wayfinding
assistance. The model has been developed within a modular framework
of spatial language production and analysis which we have developed
to explore different reasoning and representation facilities in the spatial
language cognition process. We describe the developed cognitive spatial
model, the interpretation of individual actions based on explicit linguistic
and extra-linguistic spatial context, and the interactive plan construction
process which builds these individual processes into a route interpreta-
tion mechanism. Finally, we report on a recent evaluation study that was
conducted with an initial implementation of the described models.

Keywords: Spatial Language, Language Interpretation, Embodied Action.

1 Introduction
While particular semantics and schema based models of spatial language use have
been proposed in the literature [1,2], as well as layered spatial representation
and reasoning models [3,4], and a wealth of qualitative and quantitative models
of spatial reasoning (see [5] for a review), the processing of spatial language
remains challenging both due to the complexities of spatial reasoning and
because of the inherent difficulties of language processing arising from the remarkable
efficiency of spoken communication (see [6] for a discussion). For the development
of sophisticated linguistically aware spatial applications, it is not only necessary
to develop spatial reasoning systems, but it is also necessary to identify the
properties of spatial language - particularly with respect to what elements of
language are left under-specified for efficient communication, and which hence
must be retrieved through other mechanisms.
In this paper we will consider this problem of moving from the surface lan-
guage form to embodied processing for verbal route instructions. Route interpre-
tations, like scene descriptions, involve the semantic conjunction, or complexing,
of a number of smaller grained spatial language types such as spatial locating
expressions and action descriptions; thus their interpretation requires a clear
understanding of these individual language types. Before outlining our general
approach to this problem, we first review some of the known properties of ver-
bal route instructions and computational approaches to their interpretation and
modelling.

1.1 The Structure of Verbal Routes


The properties of route instructions as monologic explanations have been stud-
ied extensively in the discourse analysis [7,8], discourse semantics [9], and spatial
cognition communities [10]. Denis [7] for example characterized routes as schema-
tized descriptions of motion, starting with an initial localization, followed by one
or more instances of route segments which move a hearer along a path, finalised
by an orientation of the hearer towards the destination. Denis however also char-
acterised route instructions as making extensive use of landmark reference and
action descriptions. Prévot [9] took a more detailed semantic approach to the
analysis of route interpretation, concluding that verbalised routes are highly un-
derspecified structurally when introduced through dialogue and that hearers can
only produce the intended interpretation through the application of many layers
of context - including both discourse and situational information.
More recently, the semantic and surface form features of route instruction in
a dialogic rather than monologic setting have been discussed at length by Shi
& Tenbrink [11]. We will not repeat such analysis here, but will instead note
some of the defining features of individual motion process and spatial locating
utterances which compose a route instruction. In general, motion process sen-
tences typically mark spatial elements such as a general direction of motion, or
include more specific trajectory constraints which mark particular landmarks for
the path of motion. Moreover, qualitative or quantitative extents for motions are
frequently found in surface language. In the surface form, trajectory land-
marks are discourse referents which can have arbitrarily complex descriptions
to aid landmark resolution by the hearer. While such descriptions can include
non-spatial features, geo-positional information such as projective or ordinal re-
lationships can be present, e.g.,

(1) go into the second room, it’s the one after John’s office, on the left

Trajectory constraints, as well as the spatial characterization of landmarks
which play a role in such constraints (e.g., “into” or “after” above), are subject to
some choice of perspective and reference frame. To process spatial expressions
in a principled way, each piece of this information must be identified. While it is
possible to hard-code some choices, in general we must select these parameters
from either the surface form as provided by the user, or retrieve them from either
discourse or situational context. One issue which is problematic for previous
approaches is that the resolution of such parameters does not rely on a static
domain model, but is instead quite dynamic.

1.2 Computational Models of Route Interpretation

A number of approaches to the computational interpretation of routes from a
language understanding perspective have been considered in recent years. One
approach to this problem from the formal spatial modelling community is tightly
bound to Denis’s [7] and Tversky’s [12] characterization of certain route types as
schematized structures. Such models of routes, typified by Werner et al. [10], and
later Krieg-Brückner’s [3] Route Graph, focus on a spatial structural rather than
action oriented view of routes. Such a structurally oriented view of space led to
the exploration of direct mapping of formally specified logical representations
of verbalised routes against spatial representations coded as route graphs aided
in part through the use of qualitative spatial calculi (see for example [13] and
[14]). Such an approach essentially conflates verbalised route instructions with
the spatial model which they are to guide a user through, relying on a pre-
formalisation of the verbalised route which is not trivially achievable in practical
language systems. Such approaches, in themselves, also offer no mechanisms to
fall back to clarification dialogues in the case of ambiguities which cannot be
resolved within context alone.
Within the applied robotics domain on the other hand, a more pragmatic
approach to the route interpretation problem has been investigated. Works such
as Lauria’s [15] interpretation of routes by a simple robot in a toy environment,
or Mandel et al.’s [16] interpretation of route instructions against a detailed spatial
representation have pursued an action oriented view of route instructions more
in keeping with the instruction based nature of route descriptions proposed by
Denis, but, as with the formal modelling accounts, often assume perfect user
input, fail to integrate the spatial reasoning algorithm with dialogic capabilities,
and in the case of Mandel et al., require unnaturally long route instructions to
aid the pruning of a search space, which while being well configured for robot
navigation and localization, was highly unsuited to the requirements of natural
language processing. Moreover, this approach presupposed a specification of the
user’s route instruction as a ground structural formalism which already assumed
a schematized view of routes which was already considerably abstracted from
the realities of surface spatial language. More recently, Shi et al. [17] attempted
to evaluate the relative merits of the formal spatial and robot oriented views of
route interpretation through a common action description framework. However,
no attempt was made to reconcile these two methodologies against the realities
of under-specified natural language, nor was there any attempt to incorporate
notions of quantitative information or spatial extent in route description.

1.3 Towards an Action Oriented View of Route Interpretation

In this paper we will adhere to an action oriented view of route interpretation in
keeping with Denis’s main construal of routes. Moreover, while we recognize the
schematization of long routes into a series of ‘segments’ as a general characteristic
of route descriptions in unknown environments, and one which has particular
significance to the production of clear unambiguous routes [18], we argue that
the robust resolution of verbalised routes by artificial systems should not be
overly dependent on such well-formed segmentation. The resolution of ambiguous
circumstances relies on a search-space technique based on Mandel et
al.’s methodology, but one applied to spatial representations more suited
to the processing of spatial language, so as to reduce the overall search space and
hence be practical even for short language inputs. Moreover, our goal is to tease
out the various stages of spatial language interpretation so as to provide a more
methodological blueprint for scalable spatial language interpretation.
We proceed in Section 2 with a description of the logical surface of language
which forms the first layer in a clear model of spatial language interpretation. Then
in Section 3 we move from language itself to issues of embodiment in terms of rep-
resentation and spatial action. With the elements of spatial language and embod-
iment established, in Section 4 we explore the relation between the two through
a description of the language interpretation process for simple spatial actions be-
fore extending this model to the larger grained structures of routes. Section 5 con-
siders the application and evaluation of the models described through a modular
framework for spatial language production and analysis which we report on in the
context of a recent evaluation study. We conclude with a general discussion and
report on on-going and future work.

2 The Logical Surface of Route Instructions

In many approaches to the computational interpretation of spatial language there
exists an implicit assumption of a direct mapping or isomorphism between the
surface meaning of language and the types used for model specific spatial reason-
ing. While such a view is appealing for its simplicity, it unfortunately belies the
complexity of spatial language, spatial reasoning, and the relationships between
the two. An alternative view is to subscribe strictly to a “two level semantics”
view of knowledge representation within a spatial language system, within which
the first level or “Linguistic Semantics” captures the direct surface meaning of
spatial language, and a second, world oriented or “Conceptual Semantics” cap-
tures contextualized spatial knowledge and reasoning process. A mapping then
exists between the two, with the complexity of the mapping being a function of
the particular conceptual organization.
For the modelling presented here we subscribe to the second view, thus as-
suming the surface form of spatial language to be given in terms of a logical
formalism which abstracts over the surface language in terms of types and roles
which describe the spatial import of language. For this interface we adopt the
Generalised Upper Model (GUM), a “Linguistic Ontology” or formal theory of
the world whose categories are motivated based on the lexicogrammatical evi-
dence of natural language. In particular we assume the categories provided by
the latest version of GUM, which has been specifically extended to provide a
comprehensive account of the natural language semantics of spatial language.
Those extensions, described in detail by Bateman et al. [2], are rooted in the
traditions of formal spatial language semantics (e.g., [19]) and more descriptive
accounts of spatial phenomena in language (e.g., [1]), resulting in category types


which are wholly motivated by the distinctions made by language in its construal
of space.
The range of categories defined for spatial language description within GUM
has been defined in depth elsewhere, and we will make no attempt to repli-
cate such detail here; instead, we simply highlight some of the more salient
modelling points so as to explain the role of the surface logical form in spa-
tial language interpretation. Within GUM, the central organizational unit is
the Configuration - which can broadly be considered a semantic categoriza-
tion of sentence types based on the constituents of those sentences. Of the
configurations, those of most relevance to the linguistic description of routes
are the subclasses of NonAffectingSpatialDoing, i.e., OrientingChange and
DirectedMotion which provide surface semantics for expressions of directed mo-
tion. Such configurations define typical dependent semantic participants, e.g., the
performer of the action, the direction within which the action is to be made, and
so forth. Of particular relevance to the specification of spatial routes in language
is the category of GeneralizedRoute, which through playing the role of route
within a configuration, specifies the trajectory elements captured by a single ut-
terance. Based on grammatical evidence a generalized route is defined as consist-
ing of minimally one of a source, destination, or path placement role. Roles such
as source and destination are in turn filled by so called GeneralizedLocation
entities which comprise both the relatum of an expression and the dynamic
spatial preposition, or semantically, the spatial modality given by the term. To
illustrate this surface spatial semantics provided by GUM, consider the following
directed motion expression along with its surface spatial semantics provided in
a frame like formalism:

(2) Go along the corridor to the kitchen

(SL1 / NonAffectingDirectedMotion
  processInConfiguration (G1 / Going)
  route (GR1 / GeneralizedRoute
    pathPlacement (GL1 / GeneralizedLocation
      relatum (C1 / Corridor)
      hasSpatialModality (SM1 / PathIndicatingExternal))
    destination (GL2 / GeneralizedLocation
      relatum (K1 / Kitchen)
      hasSpatialModality (SM2 / GeneralDirectional))))

It should be noted that such a surface semantics is not necessarily a model of
the real world in the sense of referential semantics, but is better thought of as a
logical specification of surface language. Furthermore, while the surface spatial
semantics attempts to cover all surface spatial meaning given by an utterance,
some elements of a complete spatial meaning such as perspective are not marked
overtly by natural language and hence, as we will shortly see, must be retrieved
from extra-linguistic context during the spatial language interpretation process.
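For concreteness, the surface semantics given above for Sentence (2) could be held by an implementation as plain nested data; the type and role names below mirror the frame as quoted, while the Python representation itself is only an illustrative assumption and is not part of GUM.

# "Go along the corridor to the kitchen", as nested dictionaries
surface_semantics = {
    "id": "SL1", "type": "NonAffectingDirectedMotion",
    "processInConfiguration": {"id": "G1", "type": "Going"},
    "route": {
        "id": "GR1", "type": "GeneralizedRoute",
        "pathPlacement": {
            "id": "GL1", "type": "GeneralizedLocation",
            "relatum": {"id": "C1", "type": "Corridor"},
            "hasSpatialModality": {"id": "SM1", "type": "PathIndicatingExternal"},
        },
        "destination": {
            "id": "GL2", "type": "GeneralizedLocation",
            "relatum": {"id": "K1", "type": "Kitchen"},
            "hasSpatialModality": {"id": "SM2", "type": "GeneralDirectional"},
        },
    },
}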
Before considering the contextualization process, we first turn to the embodied
non-linguistic representation and reasoning of the spatial context assumed by our
model.

3 The Embodiment Model


3.1 Spatial Representation
The heterogeneous nature of reasoning types employed by human and artificial
agents dictates that a single homogeneous model of space and intention for both
non-communicative perception, action, localisation, and navigation, as well as
communicative and cognitive reasoning processes is not practical. A multi-tiered
representation which separates out distinct knowledge types as argued for, for
example, by [4,3,20] is a useful means of achieving the required diversity of
reasoning types. In the following we outline the layered representation used in
our processing of spatial language.
Our implemented spatial model follows from the graph based organization of
space proposed by Krieg-Brückner [3] which was also placed within an ontological
characterization of spatial knowledge by Bateman [20]. For present purposes we
organize the agent’s spatial information as a three layer structure:

SM = {RS, CS, GS} (1)

where: (a) RS or Region Space is a metric model of the agent’s environment
composed of a Cartesian coordinate based grid system within which each grid
element can be either unoccupied or occupied by one or more Environment
Things some of which can be abstract; (b) CS or Concept Space is a knowledge
base like representation of Environment Things and their non-spatial character-
istics and relationships; and (c) GS or Graph Space is a structural abstraction
of navigable space which sets up possible navigable accessibilities and visibility
relations between environment things as marked by decision points. The explicit
distinction between CS and RS is motivated principally by pragmatic concerns
in that by separating out the physical geo-spatial properties of the model from
the non-spatial we can on one hand make use of the most appropriate reason-
ing system for the information type, e.g., ontological reasoners for non-spatial
category like information, and explicit spatial reasoning techniques for the spa-
tial properties of space - such a distinction also follows from a more principled
organization of spatial content for robotic systems [20]. For current purposes,
we adopt a metric RS layer, but we can in principles replace this with a more
abstracted or qualitative representation of space without breaking the general
organization of our agent’s knowledge system. With respect to the GS layer, one
minor difference between our applied model and Krieg-Brückner’s Route Graph
formalism is that, following Denis [7] and Kuipers [4] and general principles of
ontological organization of space, our notion of Route Graph place or Decision
Point is strictly zero-dimensional and does not import notions such as reference
system from associated regions.
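A minimal sketch of the three-layer organisation in Equation (1) is given below as Python container types; the specific fields (a cell-to-occupants grid, a fact table, and a decision-point graph) are assumptions chosen to reflect the description above rather than the implemented NavSpace structures.

from dataclasses import dataclass, field

@dataclass
class RegionSpace:
    # Metric grid; each cell records the Environment Things occupying it.
    width: int
    height: int
    cells: dict = field(default_factory=dict)   # (x, y) -> set of thing names

@dataclass
class ConceptSpace:
    # Non-spatial characteristics and relationships of Environment Things.
    facts: dict = field(default_factory=dict)   # thing name -> attribute map

@dataclass
class GraphSpace:
    # Zero-dimensional decision points and accessibility/visibility relations.
    decision_points: set = field(default_factory=set)
    edges: set = field(default_factory=set)     # (from, to, relation) triples

@dataclass
class SpatialModel:
    # SM = {RS, CS, GS}
    rs: RegionSpace
    cs: ConceptSpace
    gs: GraphSpace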
Fig. 1. Illustration of the various levels of the NavSpace model: (a) raw image, (b) Region Space, (c) Graph Space, (d) Concept Space

For illustration, Figure 1 depicts a simplified office environment in abstract
terms, along with the three representational layers of the spatial model. The office
environment, described in more detail in Section 5, is illustrated in Figure 1(a);
note here that the darker area in the centre of the environment is free space. The
Region Space layer for the same environment is illustrated in Figure 1(b); note
here the overlapping of both abstract entities with each other and with concrete
entities. Figure 1(c) similarly shows the Graph Space which includes the Route
Graph like structuring. Finally Figure 1(d) illustrates a highly simplified view of
the concept space model (the actual concept space model, even for the office environment used here, is considerably more detailed, but is simplified here for visual clarity). While spatial representation layers similar to the Re-
gion Space and Graph Space are commonly used for practical robot navigation
and localization, i.e., occupancy grids and Voronoi graphs, it should be made clear
that the presence of GS and RS here, like the CS layer, in the current model is
motivated entirely by the necessities of spatial language interpretation.

3.2 Action Schemas


If we assume that the core unit of route descriptions are motion expressions
which correspond to actions which should be performed by an agent to reach a
goal, then we must define those actions in a meaningful way. Such definitions
require, amongst other factors, a suitable choice of action granularity, relevant
parametrization, as well as the traditional notions of applicability constraints and
effects. For the current model, we have chosen a granularity and enumeration of
action close to the conception of spatial action in human language as identified
in the Generalized Upper Model introduced earlier. We will refer to these action
types as action schemas, but it should be noted that the types of action schemas
and GUM configurations are not one to one; action schemas necessarily introduce
full spatial information including perspective, and are also, as will be seen below,
marginally finer grained than GUM configurations.
Excluding non-spatial features such as start time, performer of the action and
so forth, we can define the generalized form of a directed motion action schema
as follows:
Motion(direction, extent, pathConstraint)    (2)
where:

– direction ∈ Direction is a direction expressed with respect to the mover’s
perspective and intrinsic reference frame.
– extent ∈ GeneralizedExtent is a quantitatively or qualitatively expressed
extent for the movement.
– pathConstraint ∈ {Place, Modality} is an expression of a path constraint
broadly equivalent to the GeneralizedLocation which plays the roles of
source, destination, and path placement in the Generalized Upper Model.

For each action there is also an implicit source which is the starting point of
any motion. The source of a motion, typically omitted from surface language, is
necessarily required to define an action. Trivially, the source of motion_i is equal
to the final location of motion_(i-1). Furthermore, certain pragmatic constraints
hold on which parameters of a motion action schema are set. For example, the
specification of an extent without either direction or a pathConstraint is not
permitted. Furthermore, an explicit definition of extent and a path constraint must
not be contradictory with respect to the world model.
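A minimal illustrative sketch of such a directed motion action schema, together with the pragmatic constraint just mentioned, might look as follows; the field names and value types are assumptions made for illustration, not the actual formalism.

from dataclasses import dataclass
from typing import Optional

# Hypothetical sketch of a directed motion action schema.
@dataclass
class MotionSchema:
    direction: Optional[str] = None        # e.g. "left", "ahead" (mover-relative)
    extent: Optional[str] = None           # e.g. "two metres", "a little"
    path_constraint: Optional[str] = None  # a place or modality constraint

    def is_well_formed(self) -> bool:
        # Pragmatic constraint from the text: an extent on its own, without a
        # direction or a path constraint, is not a permissible specification.
        if self.extent is not None and self.direction is None and self.path_constraint is None:
            return False
        return True

# Example: "go forward two metres" is acceptable, a bare "two metres" is not.
assert MotionSchema(direction="ahead", extent="two metres").is_well_formed()
assert not MotionSchema(extent="two metres").is_well_formed()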
While action schemas are similar in centralization and composition to config-
urations within the Generalised Upper Model, action schemas are more finely
centralized, typically decomposing a single GUM motion configuration into mul-
tiple action schemas; e.g., the configuration given for Sentence 2 earlier is realized
by two distinct action schema instances within the embodiment model, one cap-
turing the path placement constraint, while the other captures the destination
constraint. Multiple action schemas are then given a logical structuring with
ordering and conditional operators.
We must also define the effects of such schemas. The defining characteristic of a
movement is a profile of probable location of the mover following the initialization
of the action. While there are some logical symbolic ways to define such results,
our approach follows Mandel et al. [16] in that we give a probable location of
the mover as a function of the starting pose and the action schemas considered
as follows:
p(x_j, y_k, o_l) = f_schema(x_0, y_0, o_0)        (3)
where x_0, y_0, o_0 denotes the starting pose of the agent (location on the Cartesian
plane and orientation, respectively), p(x_j, y_k, o_l) denotes the probability of even-
tual occupation of an arbitrary pose, and the motion profile of each schema is
determined empirically as a function of the supplied parameters.
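As an illustration of such a schema-specific profile, a forward-motion schema could be approximated roughly as follows; the Gaussian form and spread values are invented stand-ins for the empirically determined profiles, not the actual ones.

import math

# Hypothetical sketch of a motion profile for a "move forward by extent" schema.
def forward_motion_profile(x0, y0, o0, extent, sigma_pos=0.5, sigma_ori=0.3):
    """Return p(x, y, o): relative likelihood of ending up at pose (x, y, o)."""
    # Expected end pose: move `extent` along the current orientation o0 (radians).
    ex = x0 + extent * math.cos(o0)
    ey = y0 + extent * math.sin(o0)
    eo = o0

    def p(x, y, o):
        d2 = (x - ex) ** 2 + (y - ey) ** 2
        do = (o - eo + math.pi) % (2 * math.pi) - math.pi   # wrapped angle difference
        return math.exp(-d2 / (2 * sigma_pos ** 2)) * math.exp(-do ** 2 / (2 * sigma_ori ** 2))

    return p

# Example: likelihoods after "go forward 2 units" from the origin, facing east.
profile = forward_motion_profile(0.0, 0.0, 0.0, extent=2.0)
print(profile(2.0, 0.0, 0.0))   # near the expected end pose: close to 1
print(profile(0.0, 2.0, 1.5))   # off to the side: much smaller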

4 Route Interpretation through Interactive Plan Construction

While action schemas and the logical form of language share common features,
the mapping function between the two is non-trivial and is highly dependent
on forms of spatial and discourse context. To illustrate, Figure 2 schematically
depicts an office environment with a robot (oval with a straight line to indi-
cate orientation). In such environments, the identification of discourse referents

Fig. 2. Illustration of a spatial situation wherein parametrization of action schemas based solely on surface language fails because of multiple salient and relevant objects

Fig. 3. Illustration of a spatial situation wherein parametrization of action schemas based solely on surface language fails because of ambiguous perspective

typically used as landmarks within route instructions can depend on a range of
spatial factors such as visual saliency, proximity, or accessibility relations. This
can be seen in that, if the robot were told to "enter the office", then it is highly
likely that the office in question is directly ahead rather than behind the agent
or to its right. Moreover, the mapping process can also involve the application
of non-physical context to enrich the surface information provided. To illustrate,
consider Figure 3 which depicts an agent situated at a junction while being given
instructions by an operator who has a survey perspective equivalent to the fig-
urative view presented in Figure 3. In such a case and where the instructee is
aware of the instructor’s non-route perspective, an instruction such as “turn to
the left” can have alternative meanings. While explicit clarification through di-
alogue is possible, the more efficient solution is to apply contextual information
in the transformation process.
In general, the relationship between surface spatial language and interpreta-
tion is not a simple mapping, but rather a more complex function. In the case
of our action schemas then:

action schema = f(context, surface)        (4)

where f is a complex grounding and enrichment process involving, on the one hand,
the resolution of objects from the speaker's model which match the (sometimes
partial and ambiguous) descriptions given in the spatial language surface form,
while on the other hand requiring the application of context to supplement the
information not provided in the surface form. This grounding and enrichment
process can itself require linguistic interaction when context alone cannot provide
a unique action schema. Here we will sketch and illustrate the spatial elements
of such a function.
An appropriate action schema must first be selected from a schema inventory
provided by the embodiment layer. In the case of fully specified surface forms,
this decision can be based directly on mappings between configuration type and
action schema types. However, if a surface form is missing key information such
as a verb or other indications of configuration type, the choice of action schema
must instead be based on the schema type whose parameters are maximally filled
by the information given in the surface form. Similarly, rule-based decomposition
of configurations into series of action schemas is necessary, for example, in the
case of surface forms which supply complex route information.
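A rough sketch of this fallback selection strategy, under the assumption of a small hypothetical schema inventory and a simple count of fillable parameters (none of these names come from the actual implementation):

# Hypothetical sketch: choose the schema type whose parameters are maximally
# filled by the information extracted from the surface form.
SCHEMA_INVENTORY = {
    "Motion": {"direction", "extent", "path_constraint"},
    "Reorientation": {"direction", "extent"},
    "Halt": set(),
}

def select_schema(surface_slots: dict) -> str:
    """surface_slots maps parameter names to values recovered from the utterance."""
    filled = {k for k, v in surface_slots.items() if v is not None}

    def score(schema: str) -> int:
        # Number of this schema's parameters that the surface form can fill.
        return len(SCHEMA_INVENTORY[schema] & filled)

    return max(SCHEMA_INVENTORY, key=score)

# A verb-less "links" (left) still selects a schema via its filled direction slot;
# ties between schema types would need further contextual disambiguation.
print(select_schema({"direction": "left", "extent": None, "path_constraint": None}))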
Since the surface spatial form typically provides concept descriptions rather
than ground concepts from the agent’s spatial model, grounding functions must
be applied to action schema parameters. The set of grounding functions are
themselves dependent on a wide variety of spatial and non-spatial factors, the
automatic and complete definition or categorization of which is arguably an AI-
Complete problem in that it requires human level intelligence. However, within
practical models we can of course make various simplifications and assumptions
which provide an adequate approximation of actual grounding models. Typi-
cally, within the route interpretation domain, the parameters to be grounded include
landmarks. As illustrated by Figure 3, since action schemas depend on directions
and motion constraints defined largely in terms of the agent’s ego-centric per-
spective, the grounding process must also include the application of perspective
and reference frame transformation to directions provided in surface form spatial
descriptions.
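For illustration, one simple form such a transformation could take is the rotation of a survey (map) direction against the agent's heading, followed by quantization into ego-centric categories; the function below is a hypothetical sketch, not the system's perspective model (which, as noted in Section 6, is still under development).

import math

# Hypothetical sketch: convert a direction given in the survey (map) frame into
# the agent's ego-centric frame by compensating for the agent's heading.
SURVEY_ANGLES = {"right": 0.0, "up": math.pi / 2, "left": math.pi, "down": -math.pi / 2}

def survey_to_ego(direction: str, agent_heading: float) -> str:
    """agent_heading is the agent's orientation in the map frame (radians)."""
    angle = SURVEY_ANGLES[direction] - agent_heading
    angle = (angle + math.pi) % (2 * math.pi) - math.pi     # wrap to (-pi, pi]
    # Quantize to the nearest ego-centric category.
    if abs(angle) <= math.pi / 4:
        return "ahead"
    if abs(angle) >= 3 * math.pi / 4:
        return "behind"
    return "left" if angle > 0 else "right"

# An agent facing "up" on the map that is told to go "left" in the survey sense
# should move towards its own ego-centric left.
print(survey_to_ego("left", math.pi / 2))   # -> "left"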
For single action instructions, if, during the grounding process, a unique
parametrization of the action schema can be made, then the action may be
committed to by the agent immediately. Whereas if no suitable parametrization
is found, or if multiple solutions exist, then clarification through spoken dialogue
is necessary to resolve the inherent ambiguity. For multiple action schemas as
typified by complete route instructions we must adopt an incremental integration
approach which composes a process structure from supplied information:
1. Construct multiple ungrounded action schemas through decomposition and
augmentation of surface spatial language configurations.
2. For action schema 1, apply the grounding process as per single-schema grounding,
and store final (position, probability) tuples for the action.
3. For action schema i + 1, take the most probable location tuples from action
schema i and supply them as input parameters to the grounding of action
schema i + 1.
4. If for action schema n one solution exists where the probability of location
(p) is greater than a threshold (t), the sequence of grounded action schemas
can be committed to by the agent.
This method, similar to the search algorithm applied by [16], essentially moves
a set of most probable locations through a physical search space seeking the
most probable final destination given the set of action specifications supplied.
However, since the search space in our case has been simplified to a conceptual
graph structure which includes information on explicit junctions etc., rather than
a more fine-grained Voronoi graph which treats all nodes equally, the search
process is considerably simplified, resulting in even short route interpretations
providing accurate results.
Moreover, the current model offers a simple backtracking solution for the
case where for action schema n the number of solutions is greater than one, or
where no solution exists. In this case, rather than rejecting the user's request,
the interpretation algorithm may backtrack to the last action segment where
no unique solution exists and compose a clarification question relevant to that
point.
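Putting the four steps together, the incremental integration could be sketched roughly as follows; ground() is a hypothetical stand-in for the single-schema grounding process, and the probability threshold and pruning width are illustrative assumptions rather than values from the implementation.

# Hypothetical sketch of incremental route interpretation.
# ground(schema, hypotheses) is assumed to return a list of (location, probability)
# tuples given the hypothesized starting locations of that schema.
def interpret_route(schemas, start_pose, ground, threshold=0.7, keep=3):
    hypotheses = [(start_pose, 1.0)]
    for i, schema in enumerate(schemas):
        results = ground(schema, hypotheses)
        if not results:
            return {"status": "clarify", "segment": i,
                    "reason": "no grounding found for this instruction"}
        # Keep only the most probable location tuples as input to the next step.
        hypotheses = sorted(results, key=lambda lp: lp[1], reverse=True)[:keep]

    best_location, best_p = hypotheses[0]
    if best_p >= threshold and (len(hypotheses) == 1 or hypotheses[1][1] < threshold):
        return {"status": "commit", "destination": best_location}
    # Ambiguous or weak final destination: ask a clarification question about the
    # segment where a unique solution could not be established.
    return {"status": "clarify", "segment": len(schemas) - 1,
            "reason": "multiple destinations remain above threshold"}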

5 Model Application and Evaluation

The models of spatial action and route interpretation described earlier have
been partially implemented within a modular framework of spatial language
production and analysis, and evaluated as a whole in a user study with the
developed system.
Fig. 4. The Corella Dialogue Framework & Navspace Application

5.1 Implementation Description


Figure 4 presents an overview of the system, which comprises on the one hand an
application-independent dialogue processing framework, and on the other a
domain-specific robot simulation application based on the models described in
earlier sections. The dialogue framework, named Corella, is broadly based on
the principles of Information State processing [21] and fine-grained grammatical
mechanisms [22]. Various choices within the dialogue architecture - including the
choices of input and output components, linguistic semantics formalisms, and
grammars for analysis and generation - have been outlined elsewhere [23]; thus,
for reasons of space, we will not consider them further here.
In earlier work [23], direct integration with the robotic wheelchair platform
described by [16] was undertaken. However, for the currently presented work,
we have developed an application model based around a simplified simulation
environment so as to best study representation of space for linguistic interaction,
as well as issues of domain interface design, without having to consider low-level
interfaces and models which have been designed for robot specific navigation
and localization issues.
Following our background in developing language interfaces for intelligent
wheelchairs, the current Navspace design models a simulated robot in a schema-
tized office environment. Typical interaction with the system involves using an
interface similar to that shown in Figure 5 to direct the wheelchair around the
environment with typed free-form natural language input. The system can also communicate
with the user through a text window, with system communicative acts displayed
to the user within the same interaction window.
Fig. 5. The Interaction Window

5.2 Study 1: Human-Human Gold Standard


As a data gathering exercise we first ran a human-human study in our research
group to gather linguistic and behavioural data for a gold-standard of human-
human route interpretation performance. The study, described and analysed in
detail by Goschler et al. [24], required pairs of native German speakers to play
the roles of “route giver” (RG) and “route follower” (RF); thus emulating a route
interpretation task for a user and intelligent wheelchair. Participants, placed in
separate physical rooms, interacted through a chat interface and two views on
a shared spatial environment. Within a given trial, each participant viewed the
shared simulated environment from a plan perspective which included corridors,
named and unnamed rooms, and a simulated wheelchair’s position in the envi-
ronment at any given time. The RG, but not the RF, was also made aware of
a given target destination for each trial through the highlighting of a portion
of the screen. The RF on the other hand was given physical control over the
wheelchair via a joystick, and could thus move the wheelchair towards the tar-
get based on typed instructions from the RG. Figure 5 depicts a screen similar
to that presented to the RG and RF during each trial, the key difference being a
re-naming of the chat window label "Rollstuhl" (Wheelchair) to "Partner" for the
human-human trials.
Given that (a) both RG and RF were situated in the same spatial model and had no
reason to think otherwise; (b) RG and RF were given continuous real-time infor-
mation on the wheelchair pose; and (c) the RF was manually controlling the wheelchair,
and was thus required to move his/her hands from the joystick to the keyboard to type,
we expected limited, if any, explicit dialogic interaction. However, as detailed in
[24], a surprising amount of dialogue was observed, including explicit acknowledge-
ments, clarifications, and challenges. Thus, in addition to allowing the collection
of data, the human-human study illustrates the complexities of spatial language
processing even for competent human speakers.

5.3 Study 2: Human-Computer Interaction


To evaluate the spatial language interpretation models, we ran a second study
with human participants interacting with the dialogue system implementation.
In this study participants played the role of Route Giver (RG) while the system
itself played the role of Route Follower (RF) in a scenario, set-up, and spatial
configuration practically identical to that for the human-human study just de-
scribed. The application included many of the model features discussed earlier
including linguistic semantic to action schema transformations, and the spatial
context sensitive resolution of referenced landmarks. Reference frame identifica-
tion and transformation was not, however, included. The interface presented to
the participant is shown in Figure 5.
In total, thirteen participants took part in this study, each of whom was a
native German speaker and an undergraduate student of Applied Linguistics at
the University of Bremen. As with the human-human study, each participant was given
written instructions informing them of their objectives. Participants took part in
11 trials, the first trial being a ‘test run’ after which questions could be directed
to the experimenter. For each trial the wheelchair avatar was positioned within a
room in the environment and a target location was highlighted for the participant
to see. The same 11 start and end points were used as in the human-human
study. Unlike in the human-human trial, a time-out mechanism was included in
the experiment scenario to (a) encourage participants to complete the task in
a timely fashion, and (b) prevent participants from becoming stuck in a trial due
to failure of the dialogue system. The time-out was set for four minutes (apart
from the test trial), after which the trial was aborted and the next trial loaded.
While detailed linguistic analysis of the results has not yet been completed,
we can report here on the success rates and observations of limitations in the
implemented model. The study was broadly split into two groups: group one for
testing and group two for evaluation. Group one involved 7 of 13 participants,
who were used for isolating technical issues and extending the input grammars
towards common language strategies used by the participants. For the second
group of 6, the system design was left constant so as to allow comparison be-
tween participants. Of group one, 5 of 7 participants aborted the study before
completing all 11 trials due to system errors. However, for group two, only 1 of 6
participants had to abort early. Of group two we had a total of 58 trials (5 com-
plete sets of 11 plus one set of 3). Of these 58 trials, 50 (86%) were successfully
completed by the participants within the timeout window.
As predicted both by the results of our own gold standard study [24], and
earlier studies with humans and a Wizard of Oz scenario [11], a wide variety
of language strategies were used by participants to direct the RF to the desti-
nation. Following an initial review of the results corpus, it is clear that while
many causes of communicative failure were due to insufficient grammatical cov-
erage, the results show that many other errors are due to modest verbal feedback
from the system in case of uncertain interpretations. We believe that generating
feedback based on the explicit grounded interpretations maintained in action
schemas could significantly improve performance rates; we are therefore cur-
rently extending our model to achieve just this.

6 Discussion and Outlook


The aim of the work presented in this paper has been to investigate complete
models of spatial language interpretation which systematically consider issues
of language modelling and representation as well as issues of embodiment. We
argue that the elements of spatial and linguistic reasoning must be separated
out through appropriate modularization of the cognitive backbone, but that clear
modelling of the relationships between different reasoning layers is necessary. The
mapping between different reasoning layers is non-trivial, being highly dependent
on spatial, discourse, and other forms of context.
We have proposed an action-oriented view of the interpretation of spatial mo-
tion instructions and route instructions. While an action-oriented perspective
is in some ways a simplification of previous, more formalized views, we believe
that the action-oriented perspective is ideally suited to the modelling of a range
of complex spatial constraints which can be captured within route instructions.
Such an action-oriented view does not, however, preclude the use of sophisticated
qualitative modelling and reasoning techniques within the spatial processing
system of such agents; indeed, such models are essential for the more complex forms
of spatial reasoning necessary to ground the various parameters of the action
schemas assumed by our models. It is simply that we assume the in-
terface to spatial motion to be defined in terms of actions rather than what we
consider a premature transition to a structural formalism.
As mentioned, the system implementation used in the user studies reported here
assumes the system's ego-perspective in all interpretation tasks; thus a detailed model
of perspective choice based on spatial and discourse context is currently under develop-
ment. Furthermore, while various approximations of the action schema profiles
have been developed for our prototype, a more systematic coverage of dynamic
spatial relations would be highly beneficial. Despite both these limitations, and
the necessity of providing a broad overview of the problems addressed rather
than a precise fine grained model, we believe the models presented here to be
a useful intermediate step towards more detailed models of spatial language
interpretation for artificial agents.
Acknowledgments. I gratefully acknowledge the support of the Deutsche
Forschungsgemeinschaft (DFG) through the Collaborative Research Center
SFB/TR 8 Spatial Cognition - Project I5-[DiaSpace]. I also thank Prof. John
Bateman for his useful comments on early drafts of this paper.
References

1. Levinson, S.C.: Space in language and cognition: explorations in cognitive diversity.
Cambridge University Press, Cambridge (2003)
2. Bateman, J., Hois, J., Ross, R., Tenbrink, T., Farrar, S.: The Generalized Up-
per Model 3.0: Documentation. SFB/TR8 internal report, Collaborative Research
Center for Spatial Cognition, University of Bremen, Germany (2008)
3. Krieg-Brückner, B., Frese, U., Lüttich, K., Mandel, C., Mossakowski, T., Ross, R.J.:
Specification of route graphs via an ontology. In: Proceedings of Spatial Cognition
2004, Chiemsee, Germany (2004)
4. Kuipers, B.: The spatial semantic hierarchy. Artificial Intelligence 119, 191–233
(2000)
5. Cohn, A., Hazarika, S.: Qualitative spatial representation and reasoning: an
overview. Fundamenta Informaticae 43, 2–32 (2001)
6. Pickering, M.J., Garrod, S.: Towards a mechanistic psychology of dialogue. Be-
havioural and Brain Sciences 27(2), 169–190 (2004)
7. Denis, M.: The Description of Routes: A Cognitive Approach to the Production of
Spatial Discourse. Cahiers de Psychologie Cognitive 16, 409–458 (1997)
8. Denis, M., et al.: Spatial discourse and navigation: An analysis of route directions
in the city of Venice. Applied Cognitive Psychology 13, 145–174 (1999)
9. Prévot, L.: Topic structure in route explanation dialogues. In: Proceedings of the
workshop "Information structure, Discourse structure and discourse semantics" of
the 13th European Summer School in Logic, Language and Information (2001)
10. Werner, S., Krieg-Brückner, B.: Modelling navigational knowledge by route graphs.
In: Freksa, C., Habel, C., Wender, K. (eds.) Spatial Cognition 2000. LNCS (LNAI),
vol. 1849, pp. 295–317. Springer, Heidelberg (2000)
11. Shi, H., Tenbrink, T.: Telling Rolland where to go: HRI dialogues on route navi-
gation. In: WoSLaD Workshop on Spatial Language and Dialogue, October 23–25, 2005
12. Tversky, B., Lee, P.U.: How space structures language. In: Freksa, C., Habel, C.,
Wender, K.F. (eds.) Spatial Cognition 1998. LNCS (LNAI), vol. 1404, pp. 157–176.
Springer, Heidelberg (1998)
13. Bateman, J., Borgo, S., Lüttich, K., Masolo, C., Mossakowski, T.: Ontological
modularity and spatial diversity. Spatial Cognition & Computation 7(1) (2007)
14. Krieg-Brückner, B., Shi, H.: Orientation calculi and route graphs: Towards se-
mantic representations for route descriptions. In: Proc. International Conference
GIScience 2006, Münster, Germany (2006)
15. Lauria, S., Kyriacou, T., Bugmann, G., Bos, J., Klein, E.: Converting natural
language route instructions into robot executable procedures. In: Proceedings of
the 2002 IEEE Int. Workshop on Robot and Human Interactive Communication,
Berlin, Germany, pp. 223–228 (2002)
16. Mandel, C., Frese, U., Röfer, T.: Robot navigation based on the mapping of coarse
qualitative route descriptions to route graphs. In: Proceedings of the IEEE/RSJ
International Conference on Intelligent Robots and Systems (IROS 2006) (2006)
17. Shi, H., Mandel, C., Ross, R.J.: Interpreting route instructions as qualitative spatial
actions. In: Spatial Cognition V. LNCS (LNAI), vol. 4387. Springer, Heidelberg
(2007)
18. Richter, K.F., Klippel, A.: A model for context-specific route directions. In: Pro-
ceedings of Spatial Cognition 2004 (2004)
19. Eschenbach, C.: Geometric structures of frames of reference and natural language
semantics. Spatial Cognition and Computation 1(4), 329–348 (1999)
20. Bateman, J., Farrar, S.: Modelling models of robot navigation using formal spatial
ontology. In: Proceedings of Spatial Cognition 2004 (2004)
21. Larsson, S.: Issue-Based Dialogue Management. Ph.d. dissertation, Department of
Linguistics, Göteborg University, Göteborg (2002)
22. Steedman, M.J.: The syntactic process. MIT Press, Cambridge (2000)
23. Ross, R.J., Shi, H., Vierhuf, T., Krieg-Brückner, B., Bateman, J.: Towards Dialogue
Based Shared Control of Navigating Robots. In: Proceedings of Spatial Cognition
2004, Germany. Springer, Heidelberg (2004)
24. Goschler, J., Andonova, E., Ross, R.J.: Perspective use and perspective shift in
spatial dialogue. In: Proceedings of Spatial Cognition 2008 (2008)
Perspective Use and Perspective Shift
in Spatial Dialogue

Juliana Goschler, Elena Andonova, and Robert J. Ross

Universität Bremen, Germany


{goschler,robertr}@informatik.uni-bremen.de,
andonova@uni-bremen.de

Abstract. Previous research has shown variability in spatial perspec-
tive and the occurrence of perspective shifts to be common in monologic
descriptions of spatial relationships, and in route directions, in partic-
ular. Little is known, however, about preferences and the dynamics of
use of route vs. survey perspectives as well as perspective shifts in dia-
logue. These were the issues we addressed in a study of dialogic inter-
action where one participant instructed the other on how to navigate
a wheelchair avatar in a shared environment towards a goal. Although
there was no clear preference for one of the two perspectives overall,
dialogues tended to evolve from an early incremental, local, ego-based
strategy towards a later more holistic, global, and environment-oriented
strategy in utterance production. Perspective mixing was also observed
for a number of reasons, including the relative difficulty of spatial sit-
uations and changes across them, navigation errors by the interlocutor,
and verbal reactions by the interlocutor.

Keywords: Spatial Language, Perspective.

1 Introduction

Communication about the world, even in its simplest form, can easily turn into a
problem-solving task because form and function do not match unequivocally in
language systems. Multiple forms may correspond to one and the same function
or meaning, and multiple functions may be associated with one and the same
verbal expression. In addition, the same referential object or scene can trigger a
number of different perceptual and conceptual representations [1], or a certain
arrangement of objects can be perceived and conceptualized in multiple ways. For
example, in a study of goal-directed dialogue [2], different description schemes
were used by participants in reference to a maze and movement in it (path,
coordinate, line, and figural schemes). Similarly, in a study of how people describe
complex scenes with multiple objects, participants’ choices varied significantly
[3] depending on the nature of the spatial array. Thus, we wanted to investigate
how people deal with these issues in a dialogic spatial task. Specifically, we
were interested in their perspective-taking and how they would solve occurring
ambiguities and misunderstandings.

Multiple perspectives, or ways of speaking about the world and the entities
that populate it, are reflected at different levels of language, e.g., in lexical and
syntactic alternatives, but also in variation at a conceptual level. In spatial refer-
ence, different conceptualizations can be seen in the choices of spatial perspective
and frames of reference. Perspective taking involves abstracting from the visual
scene or schematization [4] and it has been interpreted as occurring at the level of
microplanning of utterances [5,6] rather than macroplanning (deciding on what
information to express, e.g., which landmarks and their relations are to be men-
tioned). Therefore, while being related to lexical and grammatical encoding, it
carries conceptual choices beyond them.
In spatial perspective, there have been two views as defined by Tversky [7] - on
the narrow view, perspective is realized through the choice of reference system,
variously classified into deictic, intrinsic, extrinsic, or, egocentric and allocentric
in wayfinding, and relative, intrinsic, and absolute in Levinson’s framework [8].
On the other hand, the broadly viewed perspective choices refer to the use of
reference systems in extended spatial descriptions (e.g., of a room, apartment,
campus, town). Spatial perspective of this kind has also been categorized in al-
ternative ways. In a binary classification schema, embedded perspective refers
to a viewpoint within the environment and goes well together with verbs of
locomotion and terms with respect to landmarks’ spatial relations to an agent
while external perspective takes a viewpoint external to the environment and
is commonly associated with static verbs and cardinal directions [9]. In a tri-
partite framework of spatial perspective, the route perspective/tour is typical
of exploring an environment with a changing viewpoint, the gaze perspective is
associated with scanning a scene from a fixed viewpoint outside an environment
(e.g., describing a room from its entrance), and in the survey perspective a scene
or a map is scanned from a fixed viewpoint above the environment [7,6].
Variability in perspective is an important feature of spatial language. Previous
studies have considered several individual, environmental, and learning factors
as a source of this kind of variation in verbal descriptions. Mode of knowledge
acquisition has been shown to affect perspective choices in spatial memory, for
example, participants who studied maps gave more accurate responses later to
survey perspective tasks whereas participants who were navigating gave more
accurate responses to route perspective tasks [10]. In addition, in these exper-
iments, spatial goals (route vs. survey) were also shown to affect perspective
choices.
Taylor & Tversky [6] tested the influence of four environmental features on
spatial perspective choices and found that although overall most participants’ de-
scriptions followed a survey or a mixed perspective, preference for the use of route
perspective rather than mixed was enhanced in environments that contained a
single path (vs. multiple paths) and environments that contained landmarks of
a single size scale (vs. landmarks of varying size). The other two environmental
features that were manipulated (overall size and enclosure) did not produce any
clear pattern of preferences in their participants’ descriptions.
Variability in perspective choices is frequently accompanied by perspective-
switching behavior: participants tend to mix perspectives quite regularly, for ex-
ample, 27 out of 67 participants in Taylor & Tversky’s first experiment and 74
out of 192 participants in their second experiment mixed perspectives in their
descriptions [6]. There are multiple reasons why a speaker may switch from one
perspective to another. Perspectives are problem-ridden and fit certain situations
and tasks better than others. For example, in a deictic system which derives from
a speaker’s viewpoint, interlocutors must keep track of their partners’ viewpoints
throughout the spatial discourse, or, in an intrinsic system, the success of coor-
dination between interlocutors depends on having a shared image of the object
in order to identify intrinsic sides and relations unambiguously [11].
However, consistency in spatial description, including spatial perspective, has
also been identified as an important factor in choices of reference frame, in lexi-
cal and syntactic means of expression. Vorwerg, for example, found considerable
internal, within-participant consistency in reference frames as well as language
features [12]. Consistency in the use of perspective may be useful in at least two
ways - by offering more cognitive ease for the speaker who can proceed with a
given perspective or reference frame that has already been activated (success-
fully) and more cognitive ease for the addressee by providing coherence to the
spatial discourse. Tversky found that in comprehension, reading times and imme-
diate verification times are both decreased by consistency of perspective in texts
[7]. They conclude that the cognitive cost to switching perspective is relatively
small and transient, while there is a cognitive cost involved also in retaining
the same perspective, for example, changes in the viewpoint within a route per-
spective, and that on balance, switching perspective may be more effective in
communication than not switching perspective. The paradox of having advan-
tages both for switching perspective and for staying in the same perspective can
only be resolved by analysis of specific situations and interactions [11].
In sum, previous studies have been able to account for some important as-
pects of perspective choice and perspective switching. However, little is known
about two related issues that could enhance the ecological validity of research on
the topic. First, previously published research has focused on verbal descriptions
by individuals in a monologic format (even if participants were told that their
descriptions would later be used by future addressees in a different task). There
are a few exceptions: Schober’s study [13] showed that speakers set spatial per-
spectives differently with actual addressees than with imaginary ones. Striegnitz
et al., in their study on perspective and gesture in direction-giving dialogues
[14], point out that the use of survey perspective increases in answers to clar-
ification questions and in re-descriptions of an already given route description.
Second, online interaction involving simultaneous verbal exchanges and physical
movement along a route in a given environment has not been studied yet sys-
tematically. It is not immediately clear to what extent speakers’ choices in such
a format of interaction (both dialogic and online) would replicate existing find-
ings. For example, switching perspective in dialogue can take forms not found
in single-participant descriptions, e.g., spatial reference may occur as part of the
joint cooperative activities of the two interlocutors, or spatial perspective nego-
tiation may emerge as a natural feature of their interaction. In this paper we will
consider perspective taking in the spatial domain in such a dialogic situation,
and more specifically, we will examine the differences between the use of route
and survey perspectives.

2 Perspective Use in Spatial Dialogue

When two interlocutors refer to one and the same spatial array, they select
a frame of reference or a perspective for the description. Thus, in dialogue,
perspective use and perspective switching are part of overall coordination. The
need to align perspectives may arise because interlocutors have different viewing
positions (vantage points) with respect to a scene, or because the terms referring
to objects’ spatial relations may be ambiguous or underspecified. In our study,
we kept the vantage point invariable, but there were two possible perspectives
on the scene, namely survey and route perspective: participants could look at
the map and refer to the main directions as left, right, up, and down in a survey
perspective; or they could take the perspective of the wheelchair avatar and refer
to the main directions with left, right, forward, backward in a route perspective.
The availability of these two different perspectives on spatial scenes leads
in many situations to ambiguous utterances in route or location descriptions,
e.g., the meaning of left and right may differ in route and survey perspective.
Whenever people have to deal with two-dimensional representations of three-
dimensional space, this problem is likely to occur. Thus, the data we collected
with participants who navigated a wheelchair avatar on a map on the computer
screen point to more general problems when people have to use maps of any
kind. For example, if one is told to go left in the position indicated by the
wheelchair avatar in Fig. 1, this could be interpreted as an instruction to turn
to one’s intrinsic left and then continue movement (in the route perspective)
or to move in the direction they are already facing, in which case left would be
employed as a term in the survey perspective on this map. This term left could
receive the same interpretation only when the instruction-follower’s orientation
is aligned with the bodily axes of the speaker (the two perspectives would then
be conflated).
How do interlocutors manage to align perspectives and communicate success-
fully then? Alignment or marking of perspective can be achieved explicitly by
giving a verbal signal of the choice or of the switch of perspective. Previous
research has indicated that this is rare. However, previous studies have mostly
focused on individuals giving a spatial description to an imaginary rather than a
real interlocutor. Dialogue, on the other hand, offers the addressee the possibility
to explicitly confirm or question a perspective choice or even initiate a switch
that may not have otherwise occurred.
In the corpus of data we collected, participants did so by saying “from my
point of view”, “(to the left) on the map”, “if you look at the picture” to express
that they were using the survey perspective. The route perspective was signalled
Table 1. Examples of linguistic markers of route and survey perspective in the corpus

Route Perspective                   | Survey Perspective
vom Rollstuhl aus                   | von dir/mir aus gesehen
vom Fahrer aus                      | auf der Karte
so wie der Fahrer es sieht          | wenn du auf das Bild guckst
wieder (links, rechts)              | oben, unten
(links, rechts) dann links, rechts  | obere, untere
hinter                              | über, unter
vor                                 | oberhalb, unterhalb
Rückseite, Vorderseite              | hoch, runter
nach vorne                          | ganz (links, rechts, oben, unten)
vorwärts, rückwärts                 | der rechte/linke Flur

by phrases such as “seen from the wheelchair”, “seen from the driver”, etc.
(Table 1). There are in fact some further linguistic markers for perspective that
can give the interaction partners clues about which perspective their dialogue
partner is taking. For example, while the terms left and right (Ger., “links”,
“rechts”) are perspectively ambiguous, up/above and down (Ger., “hoch”/“nach
oben”; “runter”/“nach unten”) are not.
Alignment of perspective, however, may also be achieved implicitly, without
any verbal reaction. In the case of real-world tasks such as navigation, for exam-
ple, tacit agreement (and alignment of perspective) may also occur at the level
of subsequent task-relevant non-verbal action (e.g., physical movement) by the
instruction-follower which indicates that the previous speaker’s utterance was
treated as felicitous enough and ensuing action could be initiated. Most of the
participants in our study did not refer explicitly to their perspective choices at
all, and still managed to take the same perspective and accomplish the task.

3 Method
In order to examine how people deal with the problem of perspective when more
than one is possible and appropriate to use, we elicited a small corpus of typed
interaction by giving participants a shared spatial task. To accomplish this task,
which consisted of the navigation of an iconic wheelchair on the schematized map
of an office building, participants had to interact with a partner. Participants’
utterances were then analyzed with respect to the use of the route and survey
spatial perspectives.

3.1 Participants
Participants were 22 sixteen- to seventeen-year-old students at a local high school.
All of them were native speakers of German. Dialogue partners communicated
in same-sex dyads (5 male, 6 female).
3.2 Apparatus

A networked software application was used to allow participant pairs to com-
municate via a chat interface whilst being provided with a view on a schema-
tised spatial environment which at any time included the location of an iconic
wheelchair.
Within a given environment the wheelchair avatar location was controllable
via a joystick. Movement behaviour simulated two-wheel differential drive robot
movement, with both angular and translational velocities proportional to joystick
position. Movement of the avatar was constrained by the presence of walls in the
environment, but no doors were explicitly modelled, thus allowing the avatar to
be freely moved in and out of spaces.
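For illustration, this kind of joystick-to-avatar mapping can be sketched as a simple differential-drive style pose update; the scaling constants and time step below are assumptions made for the example, and wall collisions are omitted.

import math

# Hypothetical sketch of differential-drive style avatar movement driven by a
# joystick. Joystick axes are assumed to lie in [-1, 1]; V_MAX and W_MAX are
# illustrative scaling constants, not values from the actual software.
V_MAX = 0.5   # maximum translational speed (map units per second)
W_MAX = 1.5   # maximum angular speed (radians per second)

def step_pose(x, y, theta, joy_forward, joy_turn, dt=0.05):
    """Advance the avatar pose by one simulation step of length dt seconds."""
    v = V_MAX * joy_forward          # translational velocity from the forward axis
    w = W_MAX * joy_turn             # angular velocity from the turn axis
    theta = theta + w * dt
    x = x + v * math.cos(theta) * dt
    y = y + v * math.sin(theta) * dt
    return x, y, theta

# Example: push the stick fully forward with a slight right turn for one step.
print(step_pose(0.0, 0.0, 0.0, joy_forward=1.0, joy_turn=-0.2))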
Text-based rather than spoken communication was used to best simulate the
communication channel conditions typical of our computational dialogue imple-
mentations (see [15]). While this communication mode results in more favourable
transcription and analysis times, we are aware of the differences introduced by
typed rather than spoken language. To alleviate these effects, a history-less chat
interface was used where partner utterances were removed after a time proportional
to the length of that utterance. Participants typed into one text box labelled Ich
(“I” in German), while their partner’s text was presented in an appropriately
labelled text box. In addition to recording all typed utterances, the application
logged the position and orientation of the wheelchair at regular intervals.
The software was set up on two terminals in separate rooms which were con-
nected over a network. Terminals were identical apart from the presence of the
joystick controller at one terminal only.

3.3 Stimuli
Each dyad participant was given a view of a schematised indoor environment
on the computer screen. The same spatial environment was available to both
speakers to minimize explicit negotiation of the map. The same map was used
throughout all trials and with all dyads. The map, depicted in Figure 1, included
unnamed locations, 6 named locations, and the position of the wheelchair avatar
at any given time. One participant’s view also indicated a target location for the
shared task through the highlighting of one room on the screen.

3.4 Procedure
Two participants at a time were placed at the terminals and provided with sepa-
rate written instructions. Instructions required the participant with the joystick
to act as an instruction-follower in the interaction by imagining that they were
situated in the environment with their partner, who is in the wheelchair and is
instructing them towards a goal. The instruction-giver on the other hand was
asked to imagine being situated in the wheelchair and giving instructions towards
a goal.
Fig. 1. Interface window. The goal area is identified only on the instruction-giver’s
map.

The complete task consisted of 11 trials where within each trial the instruction-
giver directed the instruction-follower towards the goal. Each trial began with the
wheelchair avatar located within a room but facing an exit onto a corridor. Par-
ticipants were then free to communicate via the chat interface. No time-out was
used for the trial, but instructions did request that participants attempt the task
as quickly as possible. Once participants had successfully navigated the wheelchair
avatar to the target room, the screen went blank. After two seconds, the map reap-
peared with a new starting position of the wheelchair in one of the rooms and a
new target room. The same 11 start and end point configurations were used across
all dyads in a different pseudo-randomized order for each dyad.
While the task structure is similar to the Map Task [16], and in particular its
text-mediated realization by Newlands et al. [17], in that both tasks involve the
description of routes between two interlocutors, there are important differences
between the tasks with respect to our research goals. The Map Task purposefully
introduces disparities between the maps used by interlocutors to solicit explicit
discussion of the spatial arrangements presented. While this results in interesting
dialogue structure, it also complicates the rationale for explicit perspective shift,
which, as we will see, exists even with the isomorphic spatial representations
present in our task. Moreover, it has been our aim to analyse communication in
an interaction situation which is more directly related to our targeted application
domain of route following assistance systems [15].

4 Results and Discussion

The main research questions in this study were related to the choice of perspec-
tive made by the instruction-giver and instruction-follower in these dialogues,
how their choices changed over time, especially in terms of the general efficiency
of interaction (measured here in number of utterances spoken before the goal
was reached), how much coordination there was between interlocutors, and the
patterns underlying shifts from route to survey perspective and vice versa.
There are several features of the design that are very likely to have influ-
enced the choice of perspective. One was the setup with the map on the screen
which was positioned vertically in front of them. This should trigger the use of
the survey perspective since it is the one aligned with their own bodily axes. It
is unambiguous and cognitively “cheaper” because there is no need for mental
rotation to account for the orientation of the wheelchair. That is why the survey
perspective could have been expected to dominate. On the other hand, partic-
ipants may have been biased towards the use of the route perspective by the
task in which movement in a wheelchair with its clear intrinsic front and back
was involved. In addition, participants were explicitly encouraged to take the
perspective of the wheelchair in the task instructions.
The interaction in the eleven dyads on 11 trials each yielded a corpus of 121
dialogues and a total of 1301 utterances, the majority of which (1121) were
task-related. As the focus of this study was on perspective use, only the 552
utterances indicating a spatial perspective were included in the analyses (49.24%
of all task-related utterances). Other task-related utterances included incremental
route instructions by the instruction-giver such as go on or stop, go out of the
room or similar, and clarification questions by the instruction-follower such as
what?, where to?, or here?.

4.1 Preferred Perspective Use

In order to examine the preference for one of the two perspectives (route vs.
survey), we first classified all task-related utterances indicating a spatial per-
spective into the following categories: (a) utterances with route perspective, (b)
utterances with survey perspective, (c) utterances with mixed perspective, and
(d) utterances with conflated perspective where the description is valid in both
route and survey perspectives. Only a small percentage was either mixed (1.59%)
or conflated (7.58%), and these were excluded from subsequent analyses. Thus,
the data could be analysed in terms of a binary choice between route and sur-
vey perspective utterances yielding a mean percent use of route perspective as
a measure and 462 utterances to be included in the analysis. As a result, the
overall mean percent use of route perspective in this corpus was established as
Table 2. Number of utterances with spatial perspective and mean percent use of
utterances in route perspective produced by instruction-givers and instruction-followers

                                 | Instruction Giver | Instruction Follower | Total
N utt. in spatial perspective    | 414               | 48                   | 462
N utt. in Route perspective      | 294               | 20                   | 314
N utt. in Survey perspective     | 120               | 28                   | 148
Mean % use of Route perspective  | 71.01%            | 41.67%               | 67.97%

67.97% (SD=46.71%). Participants produced a total of 314 utterances in the
route perspective and 148 utterances in the survey perspective. A breakdown by
dialogic role showed that instruction-givers dominated these dialogues (Table 2).
This is not surprising given that it was their task to provide directions to their
conversational partners. In addition, the task requirements for the instruction-
follower were such that producing speech was in addition to joystick navigation
which took up a relatively large amount of their time and effort, hence their
limited participation in these exchanges. Furthermore, instruction-givers’ utter-
ances were mostly in the route perspective while the opposite was observed for
the instruction-followers whose utterances in the survey perspective outnum-
bered those in the route perspective (Table 2).
Although this descriptive analysis reveals a general preference for route-
perspective utterances, based on instruction-givers’ utterances, it is not infor-
mative of the dynamics of use across interactions, and for this purpose, further
analyses were conducted on the data averaged for each trial and dyad, yielding
121 means (11 dyads x 11 trials).
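For concreteness, the per-dyad, per-trial aggregation described here amounts to something like the following sketch over coded utterances; the field names are invented, and mixed and conflated utterances are assumed to have been excluded already.

from collections import defaultdict

# Hypothetical sketch: compute mean percent use of route perspective per dyad
# and trial from coded utterances, each a dict with "dyad", "trial", and
# "perspective" ("route" or "survey").
def route_percentages(utterances):
    counts = defaultdict(lambda: [0, 0])          # (dyad, trial) -> [route, total]
    for u in utterances:
        key = (u["dyad"], u["trial"])
        counts[key][1] += 1
        if u["perspective"] == "route":
            counts[key][0] += 1
    return {key: 100.0 * route / total for key, (route, total) in counts.items()}

# Example with two coded utterances from one dyad on one trial.
data = [{"dyad": 1, "trial": 1, "perspective": "route"},
        {"dyad": 1, "trial": 1, "perspective": "survey"}]
print(route_percentages(data))    # {(1, 1): 50.0}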
One issue that could be addressed in this analysis was the relative dominance
of use of the two perspectives across dyads and speakers. A one-way ANOVA re-
vealed a great deal of variation across dyads with respect to the mean percent use
of route perspective (F(10,100) = 12.29, p <.001). On average, dyads produced
53.61% of their utterances in a trial in the route perspective (SD = 43.43%).
However, while the mean percent use of route perspective in some dyads was as
low as 0%, i.e., all their relevant utterances were framed in the survey perspec-
tive, in others it was as high as 94.91%. As these figures indicate, although in
the corpus as a whole, route perspective utterances were far more numerous, if
perspective choices are examined within dyads, we find a considerable amount of
variation, and no clear dominance of one of the two perspectives (M=53.61%).

4.2 Speaker Coordination and Efficiency


Clearly, there were significant differences between instruction-givers and
instruction-followers on the total number of utterances and on the mean percent
use of route perspective (Table 2). However, conversational partners also tend
to converge in their choices and develop mutually acceptable strategies of refer-
ence and description schemes [18,2,19]. This is why we also explored the degree to
which there was coordination between instruction-givers and instruction-followers
Perspective Use and Perspective Shift in Spatial Dialogue 259

Table 3. Correlations among trial number, number of utterances by the interlocutors,
mean percent use of route perspective, and mean percent perspective shift. *p < .05,
†p < .01, ‡p < .001. Note. Trial - trial number; IG Utt - number of instruction-givers’
utterances; IF Utt - number of instruction-followers’ utterances; ALL Utt - number of
all utterances in a dyad; %Route - mean percent use of route perspective in utterances;
%Shift - mean percent perspective shift; n.s. - not significant.

         | Trial | IG Utt | IF Utt | ALL Utt | %Route
IG Utt   | -.19* |        |        |         |
IF Utt   | -.20* | .86‡   |        |         |
ALL Utt  | -.20* | .98‡   | .94‡   |         |
%Route   | -.28† | .28†   | .25†   | .28†    |
%Shift   | n.s.  | .22*   | .27†   | .24†    | n.s.

in individual dyads in terms of ‘talkativeness’, i.e., number of utterances, and in
terms of preferences for a certain perspective. A correlation analysis revealed a
strong positive correlation between the average number of utterances with a spa-
tial perspective produced by instruction-givers and the number of such utterances
by instruction-followers (r = .86, p <.001; see Table 3). This confirms the existence
of a high degree of coordination between interlocutors in the individual dyads in
terms of the number of their dialogic contributions – the more the instruction-giver
spoke, the more the instruction-follower said as well, and vice versa. However, no
statistically significant correlation emerged between the mean percent use of route
perspective by instruction-givers and by instruction-followers. At first glance, this
may appear to suggest that instruction-givers and instruction-followers had differ-
ent perspective preferences and maintained these preferences irrespective of their
interlocutors’ utterances. However, a more detailed analysis reveals that the two
speaker roles were associated with different types of dialogic contributions –
whereas instruction-givers’ utterances with spatial perspective were mostly ex-
amples of direction-giving (assertions & action-directives in the terminology of
DAMSL [20]), instruction-followers’ spatial-perspective contributions were
frequently queries, information requests, or reformulations of the directions given
by instruction-givers. Thus, they did not function as a contribution to joint refer-
ential activities by repetition or imitation (convergence) but by offering alternative
formulations or questioning incomplete and ambiguous directions by instruction-
givers. At the same time, efficiency and economy of effort were major con-
siderations for instruction-followers, who were always doing joystick navigation
simultaneously. Therefore, instruction-followers’ utterances in the same perspec-
tive would have been redundant and relatively inefficient.
In fact, efficiency of communication should increase across trials according to
the collaborative model, or the framework of joint common ground [18]. Previous
findings have demonstrated such effects as far as reference to the same pictorial
stimuli is concerned; for example, in a study of Tangram shape descriptions,
Krauss [21] found that later references (noun phrases) to the same figure tended
to be shorter. Similarly, in Clark’s [18] referential communication task, the
director and the matcher became more efficient not only from one trial to the next
but also from the beginning to the end of a trial (measured in number of words).
In our study, we addressed the question of changes in the efficiency of interaction
over time in a correlation analysis which revealed a negative correlation between
trial number, on the one hand, and the average number of utterances produced
on a trial by dyads, the number of instruction-givers’ utterances, and the num-
ber of instruction-followers’ utterances, on the other hand. These correlations
reached significance but were rather weak (Table 3) and need to be interpreted
in view of the differences across experimental designs and measures. The studies
using the referential communication task mentioned above examine efficiency in
reference to the same shape, object, or stimulus more generally, selected from
a limited and pre-specified set of options which were visually available to both
participants, whereas in our study, although the task remained the same (giving
route directions in a certain map), the routes themselves, i.e., the positioning
of their start and end points on the map, varied on each trial. Still, the overall
result is clear - efficiency in terms of shorter dialogues increases across the span
of the experimental session.

4.3 Perspective Use across the Interaction Span


Apart from considerable inter-dyad diversity on the use of perspective and per-
spective shifts, there was also much variation across experimental trials, ranging
from 77.36% mean use of route perspective on the first trial to 30% on the last
trial. In order to examine this decline in the use of route perspective systemat-
ically, we turn next to the analysis of perspective choices and perspective shift
depending on how long the interlocutors in a dyad have already interacted in
this task. The measure for this is the number of the experimental trial (1-11
where 1 is the first trial and 11 is the last trial for a dyad).
The correlation analysis confirmed the existence of a weak to moderate nega-
tive correlation between trial number and mean percent use of route perspective
(r = -.28, p <.01). On early trials, utterances in the route perspective tended
to be produced more than on later trials overall. This correlation illustrates the
tendency of speakers to opt for the survey perspective more and more as time
elapsed. In addition, as Table 3 shows, there were significant positive correla-
tions between the different efficiency measures (number of utterances overall,
by instruction-givers, and by instruction-followers) and the mean percent use of
route perspective indicating that dyads who spoke on average more on a trial,
also tended to have a preference for route-perspective utterances. Having in
mind that later utterances also tended to be shorter, as discussed earlier, these
correlations taken as a whole point towards survey-perspective utterances being
more economical in this task and suggest, as before, that over time, participants
tended to be more efficient in their interactions by communicating about routes
less and less in the initially popular yet eventually relatively inefficient route
perspective.
4.4 Perspective Shift/Switch

Perspective shift was coded systematically in the following way: Every use of a
certain perspective by an interlocutor was checked against the speaker’s latest
utterance in a spatial perspective on a given trial. If the perspective in the
utterance differed from the perspective used in the previous utterance, it was
coded as a perspective shift, e.g., from route to survey or vice versa. Shifts across
speakers were not included in the analysis.
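For illustration, this within-speaker coding scheme corresponds roughly to the following sketch; the utterance fields are invented, and utterances without a codable spatial perspective are assumed to have been filtered out beforehand.

# Hypothetical sketch of the perspective-shift coding: within one trial, each
# speaker's perspective-bearing utterance is compared with that same speaker's
# previous perspective-bearing utterance; a change counts as one shift.
def count_shifts(utterances):
    """utterances: chronologically ordered dicts with 'speaker' and 'perspective'."""
    last_by_speaker = {}
    shifts = 0
    for u in utterances:
        prev = last_by_speaker.get(u["speaker"])
        if prev is not None and prev != u["perspective"]:
            shifts += 1                      # e.g. a switch from route to survey
        last_by_speaker[u["speaker"]] = u["perspective"]
    return shifts

# Example: the instruction-giver switches once; the follower's utterance is never
# compared against the giver's (shifts across speakers are not counted).
trial = [{"speaker": "IG", "perspective": "route"},
         {"speaker": "IF", "perspective": "survey"},
         {"speaker": "IG", "perspective": "survey"}]
print(count_shifts(trial))   # 1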
In order to examine the distribution of perspective shifts across trials and
dyads, we calculated the mean percent perspective shift on each trial for each
dyad. Although the overall percentage of perspective shift was relatively low
(M=8.78%), there was considerable variation across dyads (SD=15.94%) ranging
from 0% to 67% switches. We found that although perspective shift did not
correlate with trial number, it did correlate positively with the three measures
of efficiency (number of utterances overall, utterances by instruction-givers, and
utterances by instruction-followers). Trials on which participants were ‘high-
volume’ speakers also tended to elicit more perspective shifts in their utterances.
That perspective switches occur at all is not surprising and has been described
by Tversky [7]. They argue that after a while, both perspectives are conceptually
represented and available in speakers’ minds. The question is, when and why do
switches occur?
The factors influencing perspective choices can be of spatial or communicative
nature. Thus, certain changes in the spatial situation could be responsible for
the occurrence of perspective shifts. In addition, both the interlocutor’s verbal
and non-verbal behaviour can lead to a perspective switch. If the instruction-
giver is faced with behaviour by the instruction-follower that does not follow the
plan as outlined and intended by the instruction-giver, for example, a turn in
the wrong direction, this could be a reason for the speaker to shift perspective in
order to achieve better mutual understanding. We found indeed occurrences of
misunderstandings and mistakes that might have triggered a perspective shift,
as is likely to have happened in this piece of conversation:

(1) a. Instructor: fahr den flur nach rechts ganz durch. Dann nach links in den
2. flur und in den letzten raum rechts.
drive through the corridor to the right. Then to the left into the 2nd
corridor and into the last room on the right-hand side
b. Instructor: falscher raumM
wrong room
c. Instructee: wo denn?
where?
d. Instructor: nach oben
up
e. Instructor: jez revhts
now right

f. Instructor: und in den raum rein der jez neben dir is


and into the room which is now beside you
g. Instructor: genau
exactly

In this example, the instruction-giver first describes the way in the route per-
spective but then the instruction-follower makes a navigation error which the
instruction-giver points out by uttering falscher raum (E., “wrong room”). When
the instruction-follower asks wo denn? (E., “where?”), the instruction-giver im-
mediately switches to the survey perspective by saying “nach oben” (E., “up”).
The following example shows how the use of a certain perspective by one of
the interlocutors could influence their partner to use the same perspective, i.e.,
to align with them:

(2) a. Instructor: so, jetzt wieder links, dann den gang nach rechts direkt
so, now left again, then the corridor to the right directly
b. Instructor: links
left
c. Instructee: ich bin nu oben links in der Ecke
I am in the upper left corner
d. Instructor: aso, dann auf der karte nach rechts
ok, then to the right on the map
e. Instructor: genau, jetzt nach unten
exactly, now down

Here again, the instruction-giver starts by using the route perspective, but
the instruction-follower interrupts with a description of her location in the sur-
vey perspective. The instruction-giver switches to survey, marking the use of this
perspective explicitly by saying auf der Karte (E., “on the map”). The interlocu-
tor, in this case the instruction-follower, can even explicitly ask for directions in
another perspective than the one used by the instruction-giver:

(3) a. Instructor: jtyt nah rechts


now to the right
b. Instructee: oben oder unten
top or bottom
c. Instructor: in den nachsten gang nach rechts
into the next corridor to the right
d. Instructor: unten
bottom
e. Instructee: ok
ok
f. Instructor: gradeaus
straight

In example 3, the instruction-follower asks a clarification question in the survey perspective, oben oder unten (E., “top or bottom”) after the instruction-giver

had given directions in the route perspective. After answering this particular
question in the survey perspective, the instruction-giver switches back to using
the route perspective.
Thus, the analysis of the corpus data shows that the verbal behavior of the
interlocutor exerts an influence on perspective choices and can lead to perspective
shifts. It remains to be studied in future research to what extent perspective
shifts can be caused by different kinds of verbal behavior, non-verbal behavior
(spatial action), and how exactly these diverse factors interact.

5 Conclusions
Previous findings have shown variability in spatial perspective and perspective
shift to be ubiquitous in monologic descriptions of spatial relationships and in
spatial instructions such as those found in route directions. This study explored
these issues in a dialogic online navigation task which enhances the ecological
validity of such interactions. The analyses reveal that within a route instruction
task of this kind, survey and route spatial perspectives are more or less equally
likely to occur, although there is a clear tendency for instruction-givers to have an initial preference for route-perspective descriptions which, however, gradually evolves towards the more economical and efficient use of survey-perspective instruc-
tions. This is, in effect, a reflection of a trend away from a rather incremental, lo-
cal, ego-based strategy towards a more holistic, global, and environment-oriented
strategy in producing directions.
Our results also point towards a great deal of coordination among speakers,
even though the instruction-followers’ verbal contributions were limited because
of the nature of the task and the requirement to navigate via a joystick. In
addition, the findings support communicative models that account for increased
efficiency as a result of joint effort across the lifespan of an interaction.
Our data confirm the occurrence of mixing of perspective in spatial language
on a regular basis (on approximately 9% of all trials) and thus show that this
phenomenon is not restricted to monological spatial descriptions. The correlation
between perspective shifts and number of utterances needed before the spatial
goal is reached reflects the tendency for speakers in more efficient dialogues to
stay within one perspective and minimize the number of switches in describing
and negotiating a route. As a whole, we identified several driving forces behind
perspective shifts in dialogues, including the relative difficulty of specific spatial
situations and changes across situations, navigation errors by the interlocutor,
and explicit and implicit verbal reactions by the interlocutor. Controlled experi-
mental paradigms in future research need to disentangle these diverse influences.

Acknowledgements
We gratefully acknowledge the support of the Deutsche Forschungsgemein-
schaft (DFG) through the Collaborative Research Center SFB/ TR 8 Spatial

Cognition - Project I5-[DiaSpace]. We would also like to thank the students and teachers of the Ganderkesee Gymnasium, Bremen, for their participation in our study.

References

1. Clark, E.: Conceptual perspective and lexical choice in acquisition. Cognition 64, 1–37 (1997)
2. Garrod, S.C., Anderson, A.: Saying what you mean in dialogue: a study in concep-
tual and semantic co-ordination. Cognition. 27, 181–218 (1987)
3. Andonova, E., Tenbrink, T., Coventry, K.: Spatial description, function, and con-
text (submitted)
4. Tversky, B., Lee, P.U.: How space structures language. In: Freksa, C., Habel, C.,
Wender, K.F. (eds.) Spatial Cognition 1998. LNCS (LNAI), vol. 1404, pp. 157–176.
Springer, Heidelberg (1998)
5. Levelt, W.J.M.: Speaking: From intention to articulation. MIT Press, Cambridge
(1989)
6. Taylor, H., Tversky, B.: Perspective in spatial descriptions. Journal of Memory and
Language 35, 371–391 (1996)
7. Tversky, B., Lee, P., Mainwaring, S.: Why do speakers mix perspectives? Spatial
cognition and computation 1, 399–412 (1999)
8. Levinson, S.C.: Space in language and cognition: explorations in cognitive diversity.
Cambridge University Press, Cambridge (2003)
9. Kriz, S., Hegarty, M.: Spatial perspective in spoken descriptions of real world envi-
ronments at different scales. In: Proceedings of the XXVII Annual Meeting of the
Cognitive Science Society, Stresa, Italy (2005)
10. Taylor, H., Naylor, S., Chechile, N.: Goal-specific influences on the representation
of spatial perspective. Memory & Cognition 27, 309–319 (1999)
11. Levelt, W.: Perspective Taking and Ellipsis in Spatial Descriptions. In: Bloom, P.,
Peterson, M., Nadel, L., Garrett, M. (eds.) Language and Space, pp. 77–109. MIT
Press, Cambridge (1996)
12. Vorwerg, C.: Consistency in successive spatial utterances. In: Coventry, K., Ten-
brink, T., Bateman, J. (eds.) Spatial language and dialogue. Oxford University
Press, Oxford (in Press)
13. Schober, M.F.: Spatial perspective taking in conversation. Cognition 47(1), 1–24
(1993)
14. Striegnitz, K., Tepper, P., Lovett, A., Cassell, J.: Knowledge representation for
generating locating gestures in route directions. In: Spatial Language in Dialogue.
Oxford University Press, Oxford (2008)
15. Ross, R.J.: Tiered models of spatial language interpretation. In: Proceedings of
Spatial Cognition 2008, Freiburg, Germany (2008)
16. Anderson, A.H., Bader, M., Bard, E.G., Boyle, E.H., Doherty, G.M., Garrod, S.C.,
Isard, S.D., Kowtko, J.C., McAllister, J.M., Miller, J., Sotillo, C.F., Thompson,
H.S., Weinert, R.: The HCRC Map Task corpus. Language and Speech 34(4), 351–
366 (1992)
17. Newlands, A., Anderson, A.H., Mullin, J.: Adapting communicative strategies to
computer-mediated communication: an analysis of task performance and dialogue
structure. Applied Cognitive Psychology 17(3), 325–348 (2003)

18. Clark, H.H., Wilkes-Gibbs, D.: Referring as a Collaborative Process. Cognition 22(1), 1–39 (1986)
19. Pickering, M.J., Garrod, S.: Towards a mechanistic psychology of dialogue. Be-
havioural and Brain Sciences 27(2), 169–190 (2004)
20. Core, M.G., Allen, J.F.: Coding dialogues with the DAMSL annotation scheme.
In: Traum, D. (ed.) Working Notes: AAAI Fall Symposium on Communicative
Action in Humans and Machines. American Association for Artificial Intelligence,
pp. 28–35. AAAI, Menlo Park, California (1997)
21. Krauss, R., Weinheimer, S.: Changes in reference phrases as a function of frequency
of usage in social interaction. Psychonomic Science (1964)
Natural Language Meets Spatial Calculi

Joana Hois and Oliver Kutz

SFB/TR 8 Spatial Cognition


University of Bremen, Germany
{joana,okutz}@informatik.uni-bremen.de

Abstract. We address the problem of relating natural language descriptions of spatial situations with spatial logical calculi, focusing on projective terms (orientations). We provide a formalism based on the theory of E-connections that connects natural language and spatial calculi. The semantics of linguistic expressions are specified in a linguistically motivated ontology, the Generalized Upper Model. Spatial information is specified as qualitative spatial relationships, namely orientations from the double-cross calculus.
This linguistic-spatial connection cannot be adequately formulated without certain contextual, domain-specific aspects. We therefore extend the framework of E-connections in two ways: (1) external descriptions narrow down the class of intended models, and (2) context-dependencies inherent in natural language descriptions are fed back into the representation as finite descriptions of the necessary context information.

Keywords: Spatial language, Spatial calculi, Ontologies, E -connections.

1 Introduction
We are aiming at a formal specification of connections between linguistic repre-
sentations and logical theories of space. Language covers various kinds of spatial
relationships between entities. It can express, for instance, orientations between
them (“the cat sat behind the sofa”), regions they occupy (“the plant is in the
corner”), shapes they commit to (“the terrace is surrounded by a wall”), or
distances between them (“ships sailed close to the coast”). Formal theories of
space also cover various types of relations, such as orientations [1], regions [2,3],
shapes [4], or even more complex structures, such as map hierarchies [5]. Com-
pared to natural language, spatial theories focus on one particular spatial aspect
and specify its underlying spatial logic in detail. Natural language, on the other
hand, comprises all of these aspects, and has thus to be linked to a number of
different spatial theories. This linking has to be specified for each aspect and each
spatial logic, identifying relevant information necessary for a linking or mapping
function. This process involves contextual as well as domain-specific knowledge.
Our overall aim is to provide a general framework for identifying links be-
tween language and space as a generic approach to spatial communication and
independent of concrete kinds of applications in which it is used. It should be ap-
plicable to any spatial context in connection with human-computer interaction,


be it geographic applications for way-finding and locating, city guides using maps, home/office automation applications, paths and spatial guidance, or ar-
chitectural design planners. In particular, rather than attempting to integrate
the most general spatial theories, we propose to use, in a modular way, vari-
ous specialised (qualitative) spatial logics supporting dedicated and optimised
reasoning algorithms.
In this paper, we analyse the linking between natural language and one spe-
cific aspect of space, namely orientation information for static spatial situations.
We concentrate on static descriptions throughout this article, because dynamic
descriptions (as they are defined in the linguistic ontology) do not differ from
static descriptions with respect to their orientation-based locations: in “I am go-
ing to the left” and “The stove is to the left” the “to the left” refers to the same
leftness in terms of the orientation. Moreover, most information about locatives is given by static descriptions of locations rather than dynamic movement [6].
We define links between language and a spatial theory concerning orientations,
showing examples of linguistic projective terms, such as “A is to the right of B”,
“A is sitting to B’s left”, or “A is straight ahead”. These types of terms are
specified in a linguistic ontology, the Generalized Upper Model [7], and linked
with necessary non-linguistic information of the orientation calculus [8]. In order
to apply this representation to spatial maps, we introduce spatial orientations
according to four basic projective, two-dimensional directions (left, right, front,
back), which are distinguished and formalised. In particular, spatial entities are
reducible to points and refer to material objects with finite dimensions.
We will introduce the linguistic ontology and its representation of spatial
relationships in the next section. In Section 3, the connection between linguistic
semantics, the double-cross calculus and relevant link-related aspects will be
analysed using natural language examples. Finally, in Section 4, we will introduce
an extension of the framework of E-connections to formalise all these aspects in
a modular way, which can be represented as a structured logical theory in the
system Hets for heterogeneous specification.

2 Linguistic Spatial Semantics

Natural language groups spatial relations into different categories according to


certain aspects, which can be related to specific spatial theories that deal with
these aspects. A linguistic categorisation of spatial relationships on the basis
of linguistic evidence, empirical research, and grammatical indications has been
developed in detail in the Generalized Upper Model GUM [7,9], a linguistically
motivated ontology. Linguistic ontologies structure language into groups of cat-
egories and relations by their semantics, i.e. categories are not based on lexemes
but meanings. As a formal theory, GUM is axiomatised in first-order logic, parts
of which can also be expressed in description logics (DLs) such as SROIQ [10],
underlying the Web Ontology Language OWL 2.0. GUM’s signature, i.e. its
set of non-logical symbols, contains categories (unary predicates) and relations
(binary predicates).

Fig. 1. Relations in GUM of an utterance example of a static spatial situation: the utterance “The chair is to the right of the table” is described via the relations locatum, process, and placement, the latter comprising spatialModality and relatum.

GUM captures linguistic semantics of spatial expressions while nevertheless


rendering this organisation independently of specific spatial logics. Also, its cat-
egorisation is not simply based on groups of spatial prepositions, but based on
linguistic characteristics of spatial relations, grammatically or inherently, linguis-
tic evidence and empirical data. Therefore, the development of GUM has been
carried out with respect to empirical results in human computer interaction
[11,7,12] and general linguistic research [13,14,15,16,17]. Utterances of spatial
situations are specified as instances in GUM. We refer the reader to an overview
of GUM in [9] and specific spatial components in [7].

2.1 Linguistic Specifications in the Generalized Upper Model


An utterance expressing a static spatial description is instantiated in GUM as
a SpatialLocating. This category is a subclass of Configuration, a category that
represents activities or states of affairs usually expressed at the level of the clause.
They are defined according to their possible relations within the ontology, i.e.
defined by entities that participate in the activity or state of affairs. In principle,
a single static description is specified by an instance of Configuration. Specific
parts of the description (what, where, who, how, etc.) are specified by instances
of Element, and their roles within the Configuration are specified by instances of
relations (actor, manner, process, attribute, etc.).
Subcategories of Configuration that represent spatial activities or conditions
are divided into static spatial situations and dynamic spatial situations. In the
following, we will concentrate on the former, the SpatialLocating. This GUM
category defines at least the following three relations:
1. The relation locatum in GUM relates the SpatialLocating to its located object
within the spatial linguistic description. In the example “the chair is to the
right of the table” (see Fig. 1), “the chair” is at a specific spatial position
and represents the locatum [7] (also called the “referent” in [13]), i.e. the
entity that is located somewhere.
2. The relation processInConfiguration relates the SpatialLocating to its process,
the action or condition entity, which is usually expressed by a verbal group,
indicating tense, polar and modal aspects [17]. In the example in Fig. 1, the
process corresponds to “is”.

3. The relation placement relates the SpatialLocating to the location of the loca-
tum. This location is represented by the GUM category GeneralizedLocation.
It refers to “to the right of the table” in the example. A GeneralizedLocation
specifies the spatial position of a locatum and consists of a spatial term, e.g.
a spatial preposition, and an entity that corresponds to the reference object.
Hence, the GeneralizedLocation defines two relations: spatialModality (spatial
relation) and relatum (reference object). In the example, the spatialModality
is expressed by “to the right of” and the relatum is expressed by “the table”.
The relatum, however, may remain implicit in natural language discourse
[12], such as in the example “the chair is to the right”, i.e. to the right of
an undefined relatum, be it the speaker, listener or another entity. In case
multiple relata are described together with the same spatial modality, they
fill the relation relatum as a collection.

Binding the relatum and the spatialModality in the placement relation is rather
a design issue than a logical constraint. This encapsulation allows convenient
combinations of multiple locations expressed within one configuration: in the
example “The plant is in the corner, by the window, next to the chair.”, one
SpatialLocating defines three placements. This is even more important as soon as
placements are modified by expressing spatial perspectives, spatial accessibility,
extensions or enhancements of the spatial relation. The utterance “The plant is
to the front left of the chair, right here in the corner.” combines two relations
(front and left) with respect to one relatum (the chair), while a second relatum
(in the corner) is combined with possible access information (right here). More-
over, modifications that are encapsulated together with the placement are easier
to compare in case of re-use of spatial placements, e.g. throughout a dialogue
discourse. Moreover, the GeneralizedLocation retains its structure independently
of the configuration. It is equally specified in “he goes to the right of the chair ”
(dynamic spatial configuration) and “he stands to the right of the chair ” (static
spatial configuration), related by different relations (destination and placement).
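As a schematic illustration of this design (not GUM's actual OWL/first-order axiomatisation; the class and attribute names below simply mirror the categories and relations introduced above), a SpatialLocating with one or several placements could be modelled as follows:

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class GeneralizedLocation:
    spatial_modality: str            # e.g. 'RightProjectionExternal'
    relatum: Optional[str] = None    # reference object; may remain implicit

@dataclass
class SpatialLocating:               # a static spatial Configuration
    locatum: str                     # the located entity
    process: str                     # the verbal group, e.g. 'is'
    placements: List[GeneralizedLocation] = field(default_factory=list)

# "The chair is to the right of the table" (Fig. 1): one placement.
chair = SpatialLocating(
    locatum='the chair', process='is',
    placements=[GeneralizedLocation('RightProjectionExternal', 'the table')])

# "The plant is in the corner, by the window, next to the chair":
# one SpatialLocating with three placements (the concrete GUM modality
# categories for "in", "by", and "next to" are left open here).
plant = SpatialLocating(
    locatum='the plant', process='is',
    placements=[GeneralizedLocation('<modality for "in">', 'the corner'),
                GeneralizedLocation('<modality for "by">', 'the window'),
                GeneralizedLocation('<modality for "next to">', 'the chair')])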
Types of spatial relationships between locatum and reference objects are de-
scribed by the category SpatialModality. Linguistically, this category corresponds
to a preposition, an adverb, an adjective, or parts of the verb. It is subdivided
into several categories that are primarily grouped into (1) relations expressing
distance between entities, (2) functional dependencies between entities, and (3)
positions between entities relative to each other depending on particular prop-
erties of the entities (such as intrinsic front side, size, shape). There are, how-
ever, intersections between these three general groups. Subcategories that refer
particularly to spatial relationships based on orientations are subsumed under
ProjectionRelation, describing positions between entities relative to each other
depending on particular orientation-based properties of the entities.

2.2 Orientation-Related Linguistic Spatial Relationships


Projective Relations are distinguished along their three dimensions and can be
divided into horizontal and vertical directions [18]. In order to reason (and talk)

Fig. 2. Projective horizontal relations in GUM: HorizontalProjection comprises FrontalProjection (FrontProjection with the subcategories FrontProjectionInternal and FrontProjectionExternal, and BackProjection with BackProjectionInternal and BackProjectionExternal) and LateralProjection (LeftProjection with LeftProjectionInternal and LeftProjectionExternal, and RightProjection with RightProjectionInternal and RightProjectionExternal). The internal subcategories also inherit from Parthood, the external ones from Disjointness and SpatialDistance, and FrontProjectionExternal additionally from Access.

about map-like representations, it suffices to concentrate on horizontal relations,


which can be distinguished along lateral and frontal directions. Lateral projec-
tions comprise the directions left and right, frontal projections comprise front
and back.
All four ontological categories of horizontal ground (atomic) projective rela-
tions, namely LeftProjection, RightProjection, FrontProjection, and BackProjection,
can be expressed as an internal or external relationship [19]. Internal projective
relations inherit from the category Parthood (topological) and refer to internal
projections between locatum and relatum, such as “A is in the left (part) of
C” or “B is in the front of C”. External projective relations inherit from the
categories Disjointness (topological) and SpatialDistance and refer to external pro-
jections between locatum and relatum, such as “A is to the left of C” or “B is in
front of C” (compare Fig. 3). Furthermore, the category FrontProjectionExternal
also inherits from the category Access, as external front
projections imply functional access between locatum and
relatum. An overview of the projective categories and B
their hierarchical dependencies in GUM are shown in C’s orientation
Fig. 2. These categories are pairwise disjoint, for in-
B
stance, FrontalProjection is disjoint with LateralProjection.
They can, however, be extended (in GUM terminology), A’ A C
i.e. an instance of FrontProjectionInternal (“front”) in
“A is in the front left” is extended by an instance of
LeftProjectionInternal (“left”). Spatial modalities can also Fig. 3. Internal and
be enhanced (in GUM terminology) by additional entities, external projective
e.g. distance information in “A is 10 meters to the left”. relations
Hence, GUM represents linguistic characterisations of
orientations, which have to be associated with concrete spatial situations in order
to yield a fully contextualised interpretation. In the next section, we will intro-
duce an orientation-based spatial calculus and link this representation to GUM’s
projective categories. We will also identify missing aspects needed to minimise
ambiguity in such a connection, namely context-dependent and domain-specific
information.

3 Orientation Calculi and External Aspects


Spatial calculi address specific aspects of space, such as regions, orientations,
shapes, etc., in order to provide formal representations as well as automatic
reasoning techniques. General overviews for such representations are given in
[20] and [21]. Calculi most relevant for mapping GUM’s projective categories are
those involving orientations since the linguistic projective relations described
above refer to orientations within a spatial situation.1 Many well known spatial
calculi for orientations have been studied in the literature, among them are the
double-cross calculus [8], the star calculus [22], the line segment-based calculus
[23], or a model for positional relations2 [24].
Such calculi are intended to be used for either static or dynamic relationships.
They refer either to point-based or region-based spatial entities. They are based
either on geometric or cognitive factors. The approach described in this paper
maps orientations expressed in natural language to orientations represented in
the double-cross calculus.

3.1 The Double-Cross Calculus

[8] introduces a ternary calculus of spatial orientations, the so-called double-cross calculus (DCC) [21]. In DCC, 15 relations are distinguished between an observer at position A, who is oriented (or moves) towards an entity at position B (compare Fig. 4). The 15 orientation relations are defined along three axes motivated by human cognitive characteristics: A and B are called the perspective point and the reference point respectively in [21]. They determine the front-back axis. Orthogonal to this axis are two further axes specified by A and B. Another entity located at some position C can then be described according to one of the 15 orientations.

Fig. 4. DCC’s 15 qualitative orientation relations according to [8]

Some of the correspondences between GUM and DCC are readily inferred: given an utterance, the perspective from where the relationship holds refers to an entity at A,
the relatum refers to an entity at B, and the locatum refers to an entity located
with respect to one of the 15 orientation relations determined by the spatial
modality. The perspective, however, is often underspecified in utterances and
might refer to the speaker, the listener, some other entity, or B. Which frame
of reference [13] underlies the utterance is often not explicitly given. Also, in
case the relatum is missing, B has to be inferred by other implicit or contextual
1 Although cardinal directions, i.e. north, east, south, west, are also related to orientations in some calculi, they are different from linguistic projective terms as introduced above and should thus be investigated separately.
2 [24] use projective relations in their model, which do not correspond to linguistic projective relations as they are used in GUM (e.g. “surround”, “inside”, “outside” are not linguistic projective terms in GUM).

information. The perspective (A) and the relatum (B) can even be identical: in this case, the perspective point and the reference point coincide (i.e. A = B). The reference
frame will automatically be intrinsic, and the orientation has to be determined
by the intrinsic front.
Even if GUM’s spatial relationships, then, are linked almost directly with
DCC’s orientations, especially by means of the inherent distinction between
front/back and right/left projections, a missing perspective and relatum of an
utterance have to be inferred and mapped to a DCC representation. What ex-
actly these missing links are, and how an adequate mapping can be constructed
by taking other information into account, is described in the following.

3.2 External Spatial Aspects in Linguistic Semantics


As GUM’s linguistic specification is strongly based on concepts indicated by nat-
ural language, it does not entail enough information in order to map linguistic
elements directly to entities of the spatial calculus. Hence, a mapping function
from language to (models of) space needs additional information: [6] identifies
eight parameters necessary to interpret a linguistic utterance. Among them are
speaker and addressee, their locations and a view- or vantage point. Although
[6] argues that orientations of speakers and addressees can be derived from their
locations, this derivation is not specified in more detail, and as orientations are
highly important in interpreting projective terms, our mapping has to specify
them directly. Still missing are also intrinsic fronts of the reference object: pro-
jective linguistic terms can be interpreted along intrinsic orientations of objects
independent of location and orientation of speaker or listener.
Before we introduce links between corresponding linguistic and spatial enti-
ties, we start with examples of natural language utterances from a scene descrip-
tion. They motivate missing aspects not given in the utterance. The examples are

windows
computer writing
table
table desk
chair
plant
TV
2 chairs
on table
fridge

stove coffee
table sofa
kitchen
table armchair
speaker

dining
3 chairs table with
4 chairs
door door

Fig. 5. Room layout of a scene description task, introduced in [7]. Arrows indicate
intrinsic orientations of objects.

Table 1. Example of utterances of native English speakers from the spoken experiment and their representation in GUM. Utterances are cited without padding and pauses.

utterance | locatum | spatialModality | relatum
1. the armchair is almost directly to my right | armchair | RightProjectionExternal | me
2. with the table just in front of it | table | FrontProjectionExternal | it
3. and diagonally behind me to the right is the table | table | BackProjectionExternal / RightProjection | me / –
4. the stove is directly to our left | stove | LeftProjectionExternal | us
5. and to the right of that is the fridge | fridge | RightProjectionExternal | that
6. there is a table to the right | table | RightProjectionExternal | –
7. further to the right a little bit in front is a living room | living room | RightProjection + FrontProjection | –
8. directly in front, there are two tables | tables | FrontProjectionExternal | –
9. from here the television is diagonally to the right | television | RightProjectionExternal | – (perspective: here)

taken from a series of experiments involving human-robot interaction in which


participants were asked to explain a spatial situation to a robot. A detailed
description of the experimental design is given in [7].
Fig. 5 shows the room layout of the experiment. Here, the position of speaker
and listener coincides, i.e. they share the same perspective. In Table 1, an excerpt
from the corpus data is given, in which participants refer to positions of objects
in the room along their projective relationships. Although utterances from the
corpus lack information about relatum and perspectives in general, such infor-
mation is commonly omitted in natural language and has to be determined by
other contextual or domain-specific factors. Even though positions of locatum,
relatum, and perspective point have to be determined with respect to these ex-
ternal factors, links between projective spatial modalities and DCC relations can
be defined in general: a concrete mapping, for instance, from a LeftProjection to
the DCC orientations 2–6 is not affected by the position of A.
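A minimal sketch of this part of the linking, restricted to lateral projections and using only the orientation indices mentioned in the text (LeftProjection corresponds to DCC orientations 2–6 and, as in the armchair example of Section 3.3, RightProjection to 8–12), could look as follows; resolving the perspective point A and the reference point B requires the contextual parameters discussed in Section 3.3 and is omitted here:

# Candidate DCC orientations per GUM ground category, following the
# indices given in the text; a concrete perspective and relatum may
# narrow these sets further.
CANDIDATE_DCC_ORIENTATIONS = {
    'LeftProjection':  {2, 3, 4, 5, 6},
    'RightProjection': {8, 9, 10, 11, 12},
}

def candidate_orientations(spatial_modality: str) -> set:
    """Return the DCC orientations compatible with a GUM lateral projection.

    Sub-categories such as RightProjectionExternal are mapped via their
    ground category; front/back projections are not covered by this sketch.
    """
    for ground, orientations in CANDIDATE_DCC_ORIENTATIONS.items():
        if spatial_modality.startswith(ground):
            return orientations
    raise ValueError(f'no DCC mapping defined for {spatial_modality}')

# Example: "the armchair is to my right" (RightProjectionExternal)
print(candidate_orientations('RightProjectionExternal'))   # {8, 9, 10, 11, 12}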

3.3 Non-linguistic Spatial Aspects of Projective Relations in GUM


The utterance “the armchair is (almost directly)3 to my right” shows an example,
where the locatum (armchair) is located to the right (RightProjectionExternal)
of the speaker (relatum: me) (see Table 1). This utterance refers to an intrinsic
frame of reference, where the perspective coincides with the relatum, i.e. the
speaker, related to the position A in DCC. The locatum is then located at a
point with one of the orientations 8–12 in the DCC, A and B are identical. In
this example, information about the speaker’s identity with “my (right)” and
the frame of reference has to be added to the mapping function.
3 Although GUM specifies modifications such as “almost directly” as enhancements of the spatial relation, we disregard them for a general mapping function, as they have minor impact on orientations (i.e. left does not become right).

The next sentence “with the table just in front of it (the armchair)” also
refers to an intrinsic frame of reference, but with the armchair as origin, i.e. the
armchair refers to A in DCC (see also Fig. 6), which also coincides with B. In
this case, the locatum (table) is located at a position with one of the orientations
1–4 and 10–14. Hence, information about the armchair’s intrinsic front and the
frame of reference have to be taken into account.
In case of a relative frame of reference as in “to the right of that (the stove)
is the fridge”, the perspective point A is indicated by the speaker, the refer-
ence point B is indicated by the relatum (stove), and the locatum (fridge) is
indicated by a point that refers to one of the orientations 10–12 in DCC. Here,
the frame of reference, the possibility of the stove having an intrinsic front and
the perspective, i.e. the position of the speaker, are relevant for the mapping.
If the relatum has no intrinsic front, it follows that a relative frame of reference
applies. Otherwise, the choice of the underlying frame of reference is based on
user preferences (extracted from the dialogue history) and the likelihood of an intrinsic vs. relative frame of reference (according to the contextual descriptions).
In cases where the relatum is missing—e.g. the relatum of “further to the
right” is omitted in Example 7—it is usually possible to determine its position
by considering the preceding utterances. Hence, the sequence of utterances may
give implicit information about missing entities in GUM’s representation, and
thus has to be considered throughout the construction of the mapping between
GUM and DCC. Similarly, in Example 9, the given perspective “here” can either
be interpreted as reference to the speaker or to the position that has just been
described in a previous sentence, though a relative frame of reference can be
assumed for explicit perspectives.
Given the corpus data, we conclude that the following parameters are involved
in mapping the linguistic semantics of an utterance to a spatial situation:

1. Position and orientation of speaker and listener
2. Reference system (relative or intrinsic) and origin (perspective)
3. Domain-specific knowledge of entities (e.g. possibility of intrinsic fronts, their orientations and granularity)
4. Dialogue history (sequence of utterances)

Fig. 6. DCC orientations of different entities: different perspectives cause different projective relationships. The DCC orientations in the left figure are based on the perspective of the speaker (participant), while the orientations in the right figure are based on intrinsic orientations of objects with intrinsic fronts and a changed orientation of the speaker. Objects are implicitly reduced to points defined by their centre.
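For illustration, these four parameters might be bundled into a single context structure that the mapping function consumes (a hypothetical sketch; all field names and values are assumptions, not part of GUM or DCC):

from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]   # 2D positions on the map

@dataclass
class InterpretationContext:
    """Hypothetical bundle of the parameters (1)-(4) listed above."""
    speaker_position: Point
    speaker_orientation: float                 # heading in degrees
    listener_position: Point
    listener_orientation: float
    reference_system: str                      # 'relative' or 'intrinsic'
    origin: str                                # perspective holder, e.g. 'speaker'
    intrinsic_fronts: Dict[str, Optional[float]] = field(default_factory=dict)
    dialogue_history: List[str] = field(default_factory=list)

# Example: speaker and listener share the same position and perspective,
# as in the scene description task of Fig. 5.
context = InterpretationContext(
    speaker_position=(0.0, 0.0), speaker_orientation=90.0,
    listener_position=(0.0, 0.0), listener_orientation=90.0,
    reference_system='relative', origin='speaker',
    intrinsic_fronts={'armchair': 180.0, 'stove': None},
    dialogue_history=['the armchair is almost directly to my right'],
)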
A linguistic representation in GUM together with the parameters can then
be mapped to the location of the perspective point A, the reference point B
and possible orientations towards the position of the located entity in DCC. The
formalisation of this mapping is described in the following.

4 Multi-dimensional Formalisms and Perspectivism


The formation of multi-dimensional formalisms, i.e. formalisms that combine
the syntax and semantics of different logical systems in order to create a new
hybrid formalism, is a difficult and complex task in general, compare [25] for
an overview. ‘Classical’ formalisms that have been used for formalising natural
language statements involving modalities are counterpart theory [26] and modal
predicate logics. However, both these formalisms, apart from being computa-
tionally rather difficult to deal with, are not particularly suited to deal with
(qualitative) spatial reasoning as they do not, in their standard formulations,
provide a dedicated spatial component, neither syntactically nor semantically.
Similarly, the semantically tight integration that product logics of space and
modality provide does not support the sometimes loose or unsystematic rela-
tionships that natural language modelling requires.
From the discussion so far, it follows that there are three desiderata for the
envisaged formalisation:
1. To be able to represent various aspects of space and spatial reasoning, it
needs to be multi-dimensional. However, in order to keep the semantics of
the spatial calculi intact, the interaction between the formalisms needs to be
loose initially, but also fine-tunable and controllable.
2. It needs to account for common-sense knowledge that would typically be formalised in a domain ontology, and allow the interaction between components to be restricted further.
3. It needs to account for context information not present in the representation
using linguistic semantics.
The general idea of counterpart relations being based on a notion of similarity,
however, gives rise to a framework of knowledge representation languages that
seems quite well-suited to meet these requirements, namely the theory of E-con-
nections [27,28], which we sketch in the next section.

4.1 From Counterparts to E-Connections


In E-connections, a finite number of formalisms talking about distinct domains
are ‘connected’ by relations relating entities in different domains, intended to
capture different aspects or representations of the ‘same object’. For instance,

an ‘abstract’ object o of a description logic L1 (e.g. an instance in GUM defining


a linguistic item) can be related via a relation R to its life-span in a temporal
logic L2 (a set of time points) as well as to its spatial extension in a spatial logic
L3 (a set of points in a topological space, for instance). Essentially, the language
of an E-connection is the (disjoint) union of the original languages enriched with
operators capable of talking about the link relations.
The possibility of having multiple relations between domains is essential for
the versatility of this framework, the expressiveness of which can be varied by
allowing different language constructs to be applied to the connecting relations.
E-connections approximate the expressivity of products of logics ‘from below’
and could be considered a more ‘cognitively adequate’ counterpart theory.
E-connections have also been adopted as a framework for the integration of on-
tologies in the Semantic Web [29], and, just as DLs themselves, offer an appealing
compromise between expressive power and computational complexity: although
powerful enough to express many interesting concepts, the coupling between the
combined logics is sufficiently loose for proving general results about the transfer
of decidability: if the connected logics are decidable, then their (basic) connec-
tion will also be decidable. More importantly in our present context, they allow
the heterogeneous combination of logical formalisms without the need to adapt
the semantics of the respective components.
Note that the requirement of disjoint signatures of the formal languages of
the component logics is essential for the expressivity of E-connections. What this
boils down to is the following simple fact: while more expressive E-connection
languages allow to express various degrees of qualitative identity, for instance
by using number restrictions on links to establish partial bijections, they lack
means to express ‘proper’ numerical trans-module identity.
For lack of space we can only sketch the formal definitions, and present only
the two-dimensional case, but compare [28]: we assume that the languages L1
and L2 of two logics S1 and S2 are disjoint. To form a connection C E (S1 , S2 ), fix
a non-empty set of links E = {Ej | j ∈ J}, which are binary relation symbols in-
terpreted as relations connecting the domains of models of S1 and S2 . The basic
E-connection language is then defined by enriching the respective languages
with operators for talking about the link relations. A structure

$\mathfrak{M} = \langle \mathfrak{W}_1, \mathfrak{W}_2, \mathcal{E}^{\mathfrak{M}} = (E_j^{\mathfrak{M}})_{j \in J} \rangle$,

where $\mathfrak{W}_i = (W_i, \cdot^{\mathfrak{W}_i})$ is an interpretation of $S_i$ for $i \in \{1, 2\}$ and $E_j^{\mathfrak{M}} \subseteq W_1 \times W_2$ for each $j \in J$, is called an interpretation for $C^{\mathcal{E}}(S_1, S_2)$. Given a concept $C$ of logic $S_2$, denoting a subset of $W_2$, the semantics of the basic E-connection operator is

$(\langle E_j \rangle^{1} C)^{\mathfrak{M}} = \{\, x \in W_1 \mid \exists y \in C^{\mathfrak{M}}\colon (x, y) \in E_j^{\mathfrak{M}} \,\}$
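As a purely illustrative sketch of this semantics over finite domains (the domain elements and link pairs below are made up), the operator can be evaluated as follows:

# Evaluating the basic E-connection operator over finite domains: given a
# link relation E_j between W1 and W2 and the extension of a concept C in
# W2, compute the set of W1-elements linked to some C-element.
def link_operator(E_j, C_extension):
    """Return {x in W1 | there is y in C_extension with (x, y) in E_j}."""
    return {x for (x, y) in E_j if y in C_extension}

# Toy example: W1 holds GUM instances, W2 holds DCC points (assumed names).
E = {('chair#1', 'p3'), ('table#1', 'p7'), ('chair#1', 'p9')}
left_region = {'p3', 'p4'}            # extension of some spatial concept in W2
print(link_operator(E, left_region))  # {'chair#1'}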

Fig. 7 displays the connection of an ontology with a spatial logic for regions
such as S4u , by means of a single link relation E which we might read as ‘is the
spatial extension of’.

Fig. 7. A two-dimensional connection

As follows from the complexity results of [28], E-connections add substantial


expressivity and interaction to the component formalisms. However, it is also clear that many properties related to (2) and (3) above cannot directly be formalised
in this framework. The next section sketches an extension to E-connections that
adds these expressive means, called perspectival E-connections.

4.2 Perspectival E-Connections


We distinguish three levels of interaction between the two representation lan-
guages S1 and S2 :
1. internal descriptions: axioms formulated in the link language
2. external descriptions: axioms formulated in an external description language:
reasoning over the same signature, but in a richer logic. They add interaction
constraints not expressible in (1), motivated by general domain knowledge.
3. context descriptions: a class of admissible models needs to be finitely specified: here, in general no unique model needs to be singled out, but rather a class of models compatible with a situation (a context) needs to be described.
There are several motivations for such a modular representation: it (i) respects
differences in epistemic status of the modules; (ii) reflects different representa-
tional layers; (iii) displays different computational properties of the modules; (iv)
facilitates independent modification and development of modules; (v) allows structuring techniques developed in algebraic specification theory to be applied; etc.
The general architecture of perspectival E-connections is shown in Fig. 8. For
an E-connection of GUM with DCC, the internal descriptions cover the axioms
of GUM and the constraint systems of DCC. Moreover, basic interactions can
be axiomatised, e.g. mappings from GUM elements to DCC points need to be
functional.

4.3 Layered Expressivity: External Descriptions and Context


The main distinction between external and contextual descriptions is not techni-
cal but epistemic. External descriptions are meant to enforce necessary interac-
tions between ontological and spatial dimensions, while contextual descriptions
add missing context information. The formal languages used to enforce these

Fig. 8. Architecture of Perspectival E-connections: the basic E-connection C^E(S1, S2) with its internal descriptions is extended by external descriptions (O) and contextual descriptions (D) to C^E(S1, S2)(D, O).

constraints will typically be different. Similar to conceptual spaces [30], they are
intended to reflect different representational dimensions or layers of a situation.

External Descriptions. An example, taken from [27], is the following constraint: “The spatial extension of the capital of every country is included in
the spatial extension of that country”. This is a rather natural condition in an
E-connection combining a DL describing geography conceptually and a qualita-
tive calculus for regions. Unfortunately, a basic E-connection C E (ALCO, S4u ) is
not expressive enough to enforce such a condition. However, it can be added as
an external description if we assume the external language allows quantification:

$\forall x\, \forall y\; \big( x \text{ capital of } y \rightarrow E(x) \subseteq E(y) \big)$

In this case, the external description does not affect the decidability of the for-
malism, as shown in [28]. Of course, this is not always the case: the computational
benefits of using E-connections as the basic building block in a layered represen-
tation can get lost in case the external descriptions are too expressive. While a
general characterisation of decidability preserving constraints is difficult to give,
this can be dealt with on a case-by-case basis. In particular, the benefits of a
modular design remain regardless of this issue.
Similarly to the above example, when combining GUM with DCC, assuming
Φ axiomatises a LeftProjection (“left of”) within a SpatialLocating configuration,
we need to enforce that elements participating in that configuration are mapped
to elements of DCC models restricted to the five ‘leftness’ relations of DCC (see
Section 3.2).

$\forall x, y, z\; \Big( \Phi(x, y, z) \rightarrow \bigvee_{i=2}^{6} L_i(E(x), E(y), E(z)) \Big)$

This would be a typical external description for C E (GUM, DCC). Note that
any internal description can be turned into an external one in case the exter-
nal language is properly more expressive. However, the converse may be the case
as well. For a (set of) formula(s) χ, denote by Mod(χ) the class of its models. An

external description Ψ may now be called internally describable just in case there is a finite set X of internal descriptions such that Mod(Ψ) = Mod(X).
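To illustrate how the GUM–DCC external description above could be checked on a finite interpretation (a brute-force sketch; the argument names phi, E, and L are assumptions, not part of the formalism), consider:

def satisfies_left_constraint(phi, E, L):
    """Brute-force check of the external description above on a finite model.

    phi: set of triples (x, y, z) of GUM elements standing in the
         LeftProjection configuration Phi.
    E:   dict mapping GUM elements to DCC points.
    L:   dict mapping an index i (2..6) to the set of DCC point triples in L_i.
    """
    for (x, y, z) in phi:
        triple = (E[x], E[y], E[z])
        # The mapped triple must satisfy one of the five 'leftness' relations.
        if not any(triple in L[i] for i in range(2, 7)):
            return False
    return True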

Contextual Descriptions. Assume an E-connection C = C^E_L(S1, S2) with link


language L is given, and where Sig(C) denotes its signature, i.e. its set of
non-logical symbols, including link relations. Moreover, assume S is a finite
set of situations for C. Now, from an abstract point of view, a context oracle
(or simply an oracle) is any function f mapping situations for an E-connection
to a subclass of its models to pick out the class of models compatible with a
situation:
f: S −→ P(Mod(Sig(C))),
where P denotes the powerset operation. This restricts the class of models for
C^E_L(S1, S2) independently of the link language L and the external description
language. For practical applications, however, we need to assume that these
functions are computable, and that the classes {f(s) | s ∈ S} of models they sin-
gle out can be finitely described by a context description language for S.
For combining GUM and DCC, the context description language simply needs
to add the missing items discussed at the end of Section 3.3, i.e. fix the posi-
tion of the speaker, the reference system, etc., relative to a situation s. Clearly,
there are many options how to internalise the contextual information into an
E-connection. We have mentioned a language for specifying descriptions of finite
models, but there are many other possibilities. For instance, [31] discuss several
formal logics that have been designed specifically for dealing with contextual
information, and compare their expressive power. Moreover, it might turn out
that different contextual aspects require different logics or languages of context
to be adequately formalised. Such problems, however, are left for future work.

4.4 Perspectival E-Connections in Hets


The Heterogeneous Tool Set Hets [32] provides analysis and reasoning tools for
the specification language HetCasl, a heterogeneous extension of Casl sup-
porting a wide variety of logics [33]. In particular, OWL-DL, relational schemes,

Fig. 9. Perspectival E-connections as a structured theory in Hets: linguistic semantics (S1) and (qualitative) spatial reasoning (S2) are combined into C^E(S1, S2), which is extended by (possible) contexts to C^E(S1, S2)(D), by (naïve) physics / world knowledge to C^E(S1, S2)(O), and into the colimit C^E(S1, S2)(D, O).



sorted first-order logic FOLms , and quantified modal logic QS5, are covered.
The DCC composition tables and GUM have already been formalised in Casl,
and it has also been used successfully to formally verify the composition tables
of qualitative spatial calculi [34].
As should be clear from the discussion so far, E-connections can essentially
be considered as many-sorted heterogeneous theories: component theories can
be formulated in different logical languages (which should be kept disjoint or
sorted), and link relations are interpreted as relations connecting the sorts of
the component logics.4
Fig. 9 shows perspectival E-connections as structured logical theories in the
system Hets. Here, dotted arrows denote the extra-logical or external sources of
input for the formal representation, i.e. for the description of relevant context and
world-knowledge; black arrows denote theory extensions, and dashed arrows a
pushout operation into a (typically heterogeneous) colimit theory of the diagram
(see [35,36,37] for technical details).

5 Conclusions and Future Work


We have investigated the problem of linking spatial language as analysed in a
linguistically motivated ontology with spatial (qualitative) calculi, by mapping
GUM’s projective spatial relationships to DCC’s orientations. We concluded that
various aspects important for this connection but omitted or not given explicitly
in the linguistic semantics need to be added to the formal representation.
Moreover, we argued that these additional aspects can be divided into domain-
specific (world-dependent) and contextual (situation-dependent) aspects. We defined an approach, called perspectival E-connections, for connecting all these heterogeneous modules into a structured heterogeneous theory.
Perspectival E-connections now provide us with a formal framework for defin-
ing relationships between spatial language and calculi. This is not limited to the
aspect of orientation discussed in detail in this paper. Rather, it can be car-
ried out in the same way to deal with aspects covered by alternative orientation
calculi, as well as calculi for distances, topology, shapes, etc. Here, the inter-
play between various such spatial calculi and GUM’s respective treatment of the
relevant non-projective spatial language has to be analysed.

Acknowledgements
Our work was carried out in the DFG Transregional Collaborative Research
Center SFB/TR 8 Spatial Cognition, project I1-[OntoSpace]. Financial support
by the Deutsche Forschungsgemeinschaft is gratefully acknowledged. The authors
would like to thank John Bateman and Till Mossakowski for fruitful discussions.
4 The main difference between various E-connections now lies in the expressivity of the ‘link language’ L connecting the different logics. This can range from a sub-Boolean logic, to various DLs, or indeed to full first-order logic.

References

1. Moratz, R., Dylla, F., Frommberger, L.: A Relative Orientation Algebra with Ad-
justable Granularity. In: Proceedings of the Workshop on Agents in Real-Time and
Dynamic Environments (IJCAI 2005) (2005)
2. Casati, R., Varzi, A.C.: Parts and Places - The Structures of Spatial Representa-
tion. MIT Press, Cambridge (1999)
3. Cohn, A.G., Bennett, B., Gooday, J., Gotts, N.M.: Representing and Reasoning
with Qualitative Spatial Relations. In: Stock, O. (ed.) Spatial and Temporal Rea-
soning, pp. 97–132. Kluwer Academic Publishers, Dordrecht (1997)
4. Schlieder, C.: Qualitative Shape Representation. In: Geographic Objects with In-
determinate Boundaries, pp. 123–140. Taylor & Francis, London (1996)
5. Kuipers, B.: The Spatial Semantic Hierarchy. Artificial Intelligence 119, 191–233 (2000)
6. Kracht, M.: Language and Space, Book manuscript (2008)
7. Bateman, J., Tenbrink, T., Farrar, S.: The Role of Conceptual and Linguistic On-
tologies in Discourse. Discourse Processes 44(3), 175–213 (2007)
8. Freksa, C.: Using Orientation Information for Qualitative Spatial Reasoning. In:
Frank, A.U., Campari, I., Formentini, U. (eds.) Theories and methods of spatio-
temporal reasoning in geographic space, pp. 162–178. Springer, Berlin (1992)
9. Bateman, J.A., Henschel, R., Rinaldi, F.: Generalized Upper Model 2.0: Documen-
tation. Technical report, GMD/Institut für Integrierte Publikations- und Informa-
tionssysteme, Darmstadt, Germany (1995)
10. Horrocks, I., Kutz, O., Sattler, U.: The Even More Irresistible SROIQ. In: Knowl-
edge Representation and Reasoning (KR 2006), pp. 57–67 (2006)
11. Shi, H., Tenbrink, T.: Telling Rolland Where to Go: HRI Dialogues on Route Nav-
igation. In: WoSLaD Workshop on Spatial Language and Dialogue, Delmenhorst,
Germany, October 23-25 (2005)
12. Tenbrink, T.: Space, Time, and the Use of Language: An Investigation of Relation-
ships. Mouton de Gruyter, Berlin (2007)
13. Levinson, S.C.: Space in Language and Cognition: Explorations in Cognitive Di-
versity. Cambridge University Press, Cambridge (2003)
14. Herskovits, A.: Language and Spatial Cognition: An Interdisciplinary Study of
the Prepositions in English. Studies in Natural Language Processing. Cambridge
University Press, London (1986)
15. Coventry, K.R., Garrod, S.C.: Saying, Seeing and Acting. The Psychological Se-
mantics of Spatial Prepositions. Essays in Cognitive Psychology. Psychology Press,
Hove (2004)
16. Talmy, L.: How Language Structures Space. In: Pick, H., Acredolo, L. (eds.) Spatial
Orientation: Theory, Research, and Application, pp. 225–282. Plenum Press, New
York (1983)
17. Halliday, M.A.K., Matthiessen, C.M.I.M.: Construing Experience Through Mean-
ing: A Language-Based Approach to Cognition. Cassell, London (1999)
18. Vorwerg, C.: Raumrelationen in Wahrnehmung und Sprache: Kate-
gorisierungsprozesse bei der Benennung visueller Richtungsrelationen. Deutscher
Universitätsverlag, Wiesbaden (2001)
19. Winterboer, A., Tenbrink, T., Moratz, R.: Spatial Directionals for Robot Naviga-
tion. In: van der Zee, E., Vulchanova, M. (eds.) Motion Encoding in Language and
Space. Oxford University Press, Oxford (2008)

20. Cohn, A.G., Hazarika, S.M.: Qualitative Spatial Representation and Reasoning:
An Overview. Fundamenta Informaticae 43, 2–32 (2001)
21. Renz, J., Nebel, B.: Qualitative Spatial Reasoning Using Constraint Calculi. In:
Aiello, M., Pratt-Hartmann, I., van Benthem, J. (eds.) Handbook of Spatial Logics,
pp. 161–215. Springer, Dordrecht (2007)
22. Renz, J., Mitra, D.: Qualitative direction calculi with arbitrary granularity. In:
Zhang, C., Guesgen, H.W., Yeap, W.K. (eds.) PRICAI 2004. LNCS (LNAI),
vol. 3157, pp. 65–74. Springer, Heidelberg (2004)
23. Dylla, F., Moratz, R.: Exploiting Qualitative Spatial Neighborhoods in the Situa-
tion Calculus. In: Freksa, C., Knauff, M., Krieg-Brückner, B., Nebel, B., Barkowsky,
T. (eds.) Spatial Cognition IV: Reasoning, Action, Interaction. International Con-
ference Spatial Cognition 2004. Springer, Heidelberg (2005)
24. Billen, R., Clementini, E.: Projective Relations in a 3D Environment. In: Sester,
M., Galton, A., Duckham, M., Kulik, L. (eds.) Geographic Information Science,
pp. 18–32. Springer, Heidelberg (2006)
25. Gabbay, D., Kurucz, A., Wolter, F., Zakharyaschev, M.: Many-Dimensional Modal
Logics: Theory and Applications. Studies in Logic and the Foundations of Mathe-
matics, vol. 148. Elsevier, Amsterdam (2003)
26. Lewis, D.K.: Counterpart Theory and Quantified Modal Logic. Journal of Philosophy 65, 113–126 (1968); repr. in Loux, M.J. (ed.) The Possible and the Actual, Ithaca (1979); also in Lewis, D.K.: Philosophical Papers 1, Oxford (1983)
27. Kutz, O.: E -Connections and Logics of Distance. PhD thesis, The University of
Liverpool (2004)
28. Kutz, O., Lutz, C., Wolter, F., Zakharyaschev, M.: E -Connections of Abstract
Description Systems. Artificial Intelligence 156(1), 1–73 (2004)
29. Cuenca Grau, B., Parsia, B., Sirin, E.: Combining OWL Ontologies Using E -Con-
nections. Journal of Web Semantics 4(1), 40–59 (2006)
30. Gärdenfors, P.: Conceptual Spaces - The Geometry of Thought. Bradford Books.
MIT Press, Cambridge (2000)
31. Serafini, L., Bouquet, P.: Comparing Formal Theories of Context in AI. Artificial
Intelligence 155, 41–67 (2004)
32. Mossakowski, T., Maeder, C., Lüttich, K.: The Heterogeneous Tool Set. In: Grum-
berg, O., Huth, M. (eds.) TACAS 2007. LNCS, vol. 4424, pp. 519–522. Springer,
Heidelberg (2007)
33. CoFI: The Common Framework Initiative: Casl Reference Manual. Springer, Freely
(2004), http://www.cofi.info
34. Wölfl, S., Mossakowski, T., Schröder, L.: Qualitative Constraint Calculi: Hetero-
geneous Verification of Composition Tables. In: Proceedings of the Twentieth In-
ternational Florida Artificial Intelligence Research Society Conference (FLAIRS
2007), pp. 665–670. AAAI Press, Menlo Park (2007)
35. Kutz, O., Mossakowski, T., Codescu, M.: Shapes of Alignments: Construction,
Composition, and Computation. In: International Workshop on Ontologies: Rea-
soning and Modularity (at ESWC) (2008)
36. Kutz, O., Mossakowski, T.: Conservativity in Structured Ontologies. In: 18th Eu-
ropean Conf. on Artificial Intelligence (ECAI 2008). IOS Press, Amsterdam (2008)
37. Codescu, M., Mossakowski, T.: Heterogeneous Colimits. In: Boulanger, F., Gaston,
C., Schobbens, P.Y. (eds.) MoVaH 2008 (2008)
Automatic Classification of Containment and Support
Spatial Relations in English and Dutch

Kate Lockwood, Andrew Lovett, and Ken Forbus

Qualitative Reasoning Group, Northwestern University


2133 Sheridan Rd
Evanston, IL 60208
{kate, andrew-lovett, forbus}@northwestern.edu

Abstract. The need to communicate and reason about space is pervasive in hu-
man cognition. Consequently, most languages develop specialized terms for de-
scribing relationships between objects in space – spatial prepositions. However,
the specific set of prepositions and the delineations between them vary widely.
For example, in English containment relationships are categorized as in and
support relationships are classified as on. In Dutch, on the other hand, three dif-
ferent prepositions are used to distinguish between different types of support re-
lations: op, aan, and om. In this paper we show how progressive alignment can
be used to model the formation of spatial language categories along the con-
tainment-support continuum in both English and Dutch.

Keywords: Cognitive modeling, spatial prepositions.

1 Introduction

Being able to reason and communicate about space is important in many human tasks
from hunting and gathering to engineering design. Virtually all languages have de-
veloped specialized terms to describe spatial relationships between objects in their
environments. In particular, we are interested in spatial prepositions. Spatial preposi-
tions are typically a closed-class of words and usually make up a relatively small part
of a language. For example, in English there are only around 100 spatial prepositions.
Understanding how people assign spatial prepositions to arrangements of objects in
the environment is an interesting problem for cognitive science.
Several different aspects of a scene have been shown to contribute to spatial prepo-
sition assignment: geometric arrangement of objects, typical functional roles of ob-
jects (e.g. [9]), whether those functional relationships are being fulfilled (e.g. [4]) and
even the qualitative physics of the situation (e.g. [5]). The particular elements that
contribute to prepositions and how they are used to divide the space of prepositions
has been found to vary widely between languages (e.g. [1, 2]).
This paper shows how progressive alignment can be used to model how spatial
prepositions are learned. Progressive alignment uses the structural alignment process
of structure-mapping theory to construct generalizations from an incremental stream
of examples. The specific phenomenon we model here is how people make distinctions
along the containment-support continuum in both English and Dutch, based on a psy-
chological experiment by Gentner and Bowerman [11]. To reduce tailorability in
encoding the stimuli, we use hand-drawn sketches which are processed by a sketch
understanding system. We show that our model can learn to distinguish these prepo-
sitions, using (as people do) semantic knowledge as well as geometric information,
and requiring orders of magnitude fewer examples than other models of learning spa-
tial prepositions.
The next section describes the Gentner and Bowerman study that provided the in-
spiration for our experiments. Section 3 reviews structure-mapping theory, progres-
sive alignment, and the analogical processing simulations we use in our model. It
also summarizes the relevant aspects of CogSketch, the sketch understanding system
we used to encode the stimuli, and the ResearchCyc knowledge base we use for
common sense knowledge. Section 4 describes the simulation study. We conclude
by discussing related work, broader issues, and future work.

2 Gentner and Bowerman’s Study of English/Dutch Prepositions


Gentner and Bowerman [11] were testing the Typological Prevalence hypothesis that
the frequency with which distinctions and categories are found across the world’s
languages provides a clue to conceptual “naturalness” and how easy that particular
distinction is to learn. To explore this, they focused on a subset of spatial preposi-
tions in English and Dutch. The English and Dutch languages divide the support-
containment continuum quite differently. In English there are two prepositions: in is
used for containment relationships and on is used for support relationships. However,
Dutch distinguishes three different forms of support. The prepositions for Dutch and
English are outlined in Table 1 below.

Table 1. The containment and support prepositions in English and Dutch. The example drawings
are taken from the original Gentner and Bowerman paper.

English | Dutch | Relationship
on      | op    | support from below
on      | aan   | hanging attachment
on      | om    | encirclement with contact
in      | in    | containment

Bowerman and Pederson found in a previous study [1] that some ways of dividing
up the containment-support continuum are very common crosslinguistically while
others are relatively rare. English follows a more linguistically common approach by
grouping all support relations together into the on category while the Dutch op-om-
aan distinction is extremely rare. Both use the very common containment category.
Following the Typological Prevalence Hypothesis, both English and Dutch children
should learn the common and shared category of in around the same time. It should
take Dutch children longer to learn the rare aan/op/om distinctions for support than it
takes the English children to learn the common on category.

2.1 Experiment

Gentner and Bowerman tested children in five age groups (2, 3, 4, 5, and 6 years old) as well as adults
who were native speakers of English and Dutch. Each subject was shown a particular
arrangement of objects and asked to describe the relationship in their native language.
In the original experiment, 3-dimensional objects were used. So, for example, a sub-
ject would be shown a mirror on the wall of a doll house and asked “Where is the mir-
ror?”. The set of all stimuli is shown in Table 2 below.

Table 2. Stimuli from the Gentner and Bowerman study

op/on               | aan/on            | om/on              | in/in
cookie on plate     | mirror on wall    | necklace on neck   | cookie in bowl
toy dog on book     | purse on hook     | rubber band on can | candle in bottle
bandaid on leg      | clothes on line   | bandana on head    | marble in water
raindrops on window | lamp on ceiling   | hoop around doll   | stick in straw
sticker on cupboard | handle on pan     | ring on pencil     | apple in ring
lid on jar          | string on balloon | tube on stick      | flower in book
top on tube         | knob on door      | wrapper on gum     | cup in tube
freckles on face    | button on jacket  | ribbon on candle   | hole in towel

The results of the study were consistent with the Typological Prevalence hypothe-
sis. Specifically, Dutch children are slower to acquire the op, aan, om system of sup-
port relations than English children are to learn the single on category. Both groups
of children learned the in category early and did not differ in their proficiency using
the term. Across all prepositions, English-speaking 3 to 4 year old children used the
correct preposition 77% of the time, while the Dutch children used the correct prepo-
sition 43% of the time. Within the Dutch children, the more typical op category was
learned sooner than the rarer aan and om categories. For a more detailed description
of the results, please see the original paper.

2.2 Motivation for Our Simulation Study

Modeling these results in detail is a daunting challenge for cognitive simulation. To
accurately capture the developmental trajectories of learning over multiple years, for
example, requires constructing a body of stimuli whose statistical properties are based
on hypotheses about the commonalities of experiences in the world. No cognitive
simulation has ever operated at that scale. There are practical challenges as well as
methodological challenges: automatic encoding of stimuli becomes essential, for in-
stance, whereas most cognitive simulations operate over hand-coded representations.
Consequently, in this paper we focus on a simpler, but still difficult, question: Can
progressive alignment be used to learn the spatial language containment/support cate-
gories in both English and Dutch? We use the Gentner & Bowerman stimuli as a
starting point, a known set of good examples for each of these categories.

3 Simulation Background
Several existing systems were used in our simulation. Each is described briefly here.

3.1 Simulating Similarity Via Analogical Matching

We use Gentner’s structure-mapping theory of analogy and similarity [12]. In structure-
mapping, analogy and similarity are defined in terms of a structural alignment process
operating over structured, relational representations. Our simulation of comparison for
finding similarity is the Structure-Mapping Engine [8], which is based on structure-
mapping theory. SME takes as input two cases, a base and a target. It produces as
output between one and three mappings describing the comparison between base and
target. Each mapping consists of: (1) correspondences between elements in the base
and elements in the target; (2) a structural evaluation score, a numerical characteriza-
tion of how similar the base and target are; and (3) candidate inferences, conjectures
about the target made by projecting partially-mapped base structures. There is consid-
erable psychological evidence supporting structure-mapping theory, including modeling
visual similarity and differences [13, 17] and SME has been used to successfully model
a variety of psychological phenomena.
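
To make the inputs and outputs just listed concrete, the following minimal Python sketch captures the shape of a structure-mapping comparison. The class and field names are illustrative assumptions; SME itself is not exposed through this interface, and the alignment algorithm is not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple

# A case is a set of relational facts, e.g. ("rcc8-TPP", "figure", "ground").
Fact = Tuple[str, ...]

@dataclass
class Mapping:
    """One of the (up to three) mappings a comparison produces for a base/target pair."""
    correspondences: List[Tuple[Fact, Fact]]   # aligned base and target expressions
    score: float                               # structural evaluation score
    candidate_inferences: List[Fact]           # base structure projected onto the target

# A comparison function in the spirit of SME would take two cases (base, target)
# and return a list of Mapping objects; the structural alignment itself is omitted.
```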

3.2 Progressive Alignment and SEQL


Progressive alignment constructs generalizations by incremental comparisons, assimilat-
ing examples that are sufficiently similar into generalizations. These generalizations are
still rather concrete, and do not contain variables. Attributes and relationships that are
not common “wear away”, leaving the important commonalities in the concepts. Prob-
abilities are associated with each statement in the generalization, which provides a way
of identifying what aspects of the description are more common (and hence more cen-
tral) to the generalization.
We model progressive alignment via SEQL [14, 15, 20], which uses SME as a
component. SEQL creates generalizations from an incoming stream of examples. A
generalization context consists of a set of generalizations and examples for a concept.
For example, in learning spatial prepositions, there would be one generalization con-
text per preposition. All scenes described with the word op, for example, would be
processed in the op context. There can be more than one generalization per context,
since real-world concepts are often messy and hence disjunctive.
When a new example arrives, it is compared against every generalization in turn,
using SME. If it is sufficiently close to one of them (as determined by the assimilation
threshold), it is assimilated into that generalization. The probabilities associated with
statements that match the example are updated, and the statements of the example that
do not match the generalization are incorporated, but with a probability of 1/n, where
n is the number of examples in that generalization. If the example is not sufficiently
close to any generalization, it is then compared against the list of unassimilated ex-
amples in that context. If the similarity is over the assimilation threshold, the two ex-
amples are used to construct a new generalization, by the same process. An example
that is determined not to be sufficiently similar to either an existing generalization or
unassimilated example is maintained as a separate example.
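
The assimilation policy just described can be summarized in a short sketch. This is illustrative only: cases are treated as sets of facts, similarity stands in for an SME structural evaluation score (assumed normalized to [0, 1]), and the threshold value is our assumption, not taken from the paper.

```python
from dataclasses import dataclass, field

ASSIMILATION_THRESHOLD = 0.8   # illustrative value; not specified in the paper

class Generalization:
    """Stores facts with the fraction of assimilated examples containing them."""
    def __init__(self, case_a, case_b):
        self.n = 2
        self.facts = {f: 1.0 for f in case_a & case_b}        # shared facts
        self.facts.update({f: 0.5 for f in case_a ^ case_b})  # unshared: 1/n with n = 2

    def assimilate(self, case):
        self.n += 1
        for f in list(self.facts):
            count = self.facts[f] * (self.n - 1) + (1 if f in case else 0)
            self.facts[f] = count / self.n                     # update match fraction
        for f in case:
            self.facts.setdefault(f, 1.0 / self.n)             # new facts enter at 1/n

@dataclass
class GeneralizationContext:
    generalizations: list = field(default_factory=list)
    exemplars: list = field(default_factory=list)              # unassimilated cases

def add_example(context, case, similarity):
    """Add one case (a frozenset of facts) to a generalization context."""
    for g in context.generalizations:
        if similarity(case, g) >= ASSIMILATION_THRESHOLD:
            g.assimilate(case)
            return
    for ex in context.exemplars:
        if similarity(case, ex) >= ASSIMILATION_THRESHOLD:
            context.exemplars.remove(ex)
            context.generalizations.append(Generalization(case, ex))
            return
    context.exemplars.append(case)                             # kept as a separate example
```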

3.3 CogSketch

CogSketch¹ is an open-domain sketch understanding system. Each object in a Cog-
Sketch sketch is a glyph. Glyphs have ink and content. The ink consists of polylines,
i.e., lists of points representing what the user drew. The content is a symbolic token
used to represent what the glyph denotes. In CogSketch, users indicate the type of the
content of the glyph in terms of concepts in an underlying knowledge base. This is
one form of conceptual labeling. The knowledge base used for this work is a subset
of the ResearchCyc KB, which contains over 30,000 concepts. In addition to concep-
tual labels, the contents of glyphs can also be given names. A name is a natural lan-
guage string that the user can use to refer to the content of the glyph.
CogSketch automatically computes a number of qualitative spatial relations and at-
tributes for glyphs in a sketch. The relations computed include the RCC-8 qualitative
relations [3] that describe all possible topological relations between two-dimensional
shapes (e.g. disconnected, edge-connected, partially-overlapping). RCC-8 relations
are also used to guide the computation of additional spatial relationships such as posi-
tional relations like right/left. CogSketch also computes two types of glyph groups:
connected glyph groups and contained glyph groups. Connected glyph groups consist
of a set of glyphs whose ink strokes intersect. A contained glyph group consists of a
single container glyph and all of the glyphs fully contained within it.
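
As an illustration of how a contained glyph group could follow from the pairwise RCC-8 relations, here is a small sketch; mapping "fully contained" to the TPP and NTPP relations, and the function names used, are our assumptions rather than CogSketch's actual API.

```python
# Sketch: deriving a contained glyph group from pairwise RCC-8 relations.
# Assumption: "fully contained within" corresponds to the RCC-8 relations
# TPP (tangential proper part) and NTPP (non-tangential proper part).

def contained_glyph_group(container, glyphs, rcc8):
    """rcc8(a, b) returns the RCC-8 relation name holding between glyphs a and b."""
    contained = [g for g in glyphs
                 if g is not container and rcc8(g, container) in ("TPP", "NTPP")]
    return {"container": container, "contained": contained}
```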

3.4 ResearchCyc

Consider the sketch below showing the stimulus “freckles on face”. If you just look at the
topological relationship between the freckle glyphs and the face glyph, they clearly form
a contained glyph group with the face as the container and the freckles as the insiders. As
work by Coventry and others has shown [6], geometric properties are not sufficient to
account for the way people label situations with spatial prepositions. A purely geomet-
ric account would declare freckles to be in the face, but we actually say freckles are
on/op faces. To model such findings, we must use real-world knowledge as part of our
simulation. For example, we know that freckles are physically part of a face. We use
knowledge from the ResearchCyc² KB as an approximation for such knowledge. Freckles,
for example, are a subclass of PhysiologicalFeatureOfSurface, providing
the semantic knowledge that, combined with geometric information, enables us to

¹ Available online at http://spatiallearning.org/projects/cogsketch_index.html. The publicly
available version of CogSketch comes bundled with the OpenCyc KB as opposed to the Re-
searchCyc KB which was used for this work.
² http://research.cyc.com/

Fig. 1. Sketch of the spatial arrangement “freckles on face”. If you examine just the geometric
information, the freckles are in the area delineated by the face.

model spatial preposition judgments. As the world’s largest and most complete gen-
eral knowledge base, ResearchCyc contains much of the functional information
needed about the figure and ground objects in our stimuli.

4 Experiment

4.1 Materials

All 32 original stimuli from the Gentner and Bowerman study were sketched using
CogSketch. Each sketch was stored as a case containing: (1) the automatically com-
puted qualitative spatial relationships and (2) information about the types of objects in
the sketch. In the original experiment subjects were cued as to which object should be
the figure (e.g. “where is the mirror”) and which should be the ground. To approxi-
mate this, each sketch contained two glyphs, one named figure and one named
ground, and these names were used by the model. Recall that names in CogSketch are
just strings that are used to refer to the objects. Each object was also conceptually la-
beled using concepts from the ResearchCyc KB. For instance, in the mirror on the
wall stimulus, the mirror was declared to be an instance of the concept Mirror and
the wall was labeled as an instance of WallInAConstruction.
When people learn to identify spatial language categories in their native languages,
they learn to focus on the relationships between objects, and to retain only the impor-
tant features of the objects themselves rather than focusing on the surface features of
the objects. As noted above, having conceptual labels and a knowledge base allows us
to simulate this type of knowledge. For each conceptual label, additional concepts
from its genls hierarchy were extracted from ResearchCyc. The genls hierarchy speci-
fies subclass/superclass relationships between all the concepts of the KB. So, for ex-
ample, Animal and Dog would both be genls of Dachshund. Here we were particularly
interested in facts relating to whether objects were surfaces or containers – and this
was particularly important for ground glyphs. The original facts were removed (in our
example, “Dachshund” would be deleted) to simulate abstraction away from specific
object types to more important semantic categories.
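
The resulting case for a single stimulus might then look roughly like the following. The relation and concept names echo those in the generalization listings of Figures 3 and 4, but the exact set of facts per sketch, and the specific superclasses retained from the genls hierarchy, are illustrative assumptions.

```python
# Illustrative case for the stimulus "cookie in bowl": the qualitative spatial
# relations computed by CogSketch plus semantic facts about the figure and
# ground objects after abstracting away their most specific concepts.
cookie_in_bowl = frozenset({
    ("rcc8-TPP", "figure", "ground"),   # figure is a (tangential) proper part of ground
    ("Bowl-Generic", "ground"),         # superclass retained from the genls hierarchy
    ("Container", "ground"),            # assumed superclass marking a containment role
    ("Food", "figure"),                 # assumed superclass of Cookie
})
```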
In the original study, the physical objects used as stimuli were manipulated to
make the important relationships more salient to subjects. We approximated this by
drawing our sketches so as to highlight the important relationships for the individual
spatial language categories. For example, the sketches for aan that required showing a
connection by fixed points were drawn from an angle that made the connectivity be-
tween the parts observable. Figure 2 below shows two aan sketches: knob aan door
and clothes aan line. They are drawn from perspectives that allow the system easy ac-
cess to the point-contact relationship.

Fig. 2. Two sketched stimuli showing objects drawn from different angles to make the point
connections salient

4.2 Method

The basic spatial category learning algorithm is this: For each word to be learned, a
generalization context is created. Each stimulus representing an example of that word
in use is added to the appropriate generalization contexts using SEQL. (Since we are
looking at both Dutch and English, each example will be added to two generalization
contexts, one for the appropriate word in each language.) Recall that SEQL can con-
struct more than one generalization, and can include unassimilated examples in its
representation of a category.
We model the act of assigning a spatial preposition to a new example E as follows.
We let the score of a generalization context be the maximum score obtained by using
SME to compare E to all of the generalizations and unassimilated examples in that
context. The word associated with the highest-scoring generalization context repre-
sents the model’s decision.
To test this model, we did a series of trials. Each trial consisted of selecting one
stimulus as the test probe, and using the rest to learn the words. The test probe was
then labeled as per the procedure above. The trial was correct if the model generated
the intended label for that stimulus. There were a total of 32 trials in English (8 for in
and 24 for on) and 32 trials in Dutch (8 each for in, op, aan, and om), one for each
stimulus sketch.
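
A compact sketch of this procedure follows, reusing the GeneralizationContext structure sketched in Section 3.2. Here similarity again stands in for an SME comparison score, and learn_contexts is a hypothetical helper that runs SEQL over the training stimuli; neither name comes from the original system.

```python
def context_score(context, probe, similarity):
    """Maximum comparison score of the probe against all generalizations and
    unassimilated exemplars in a generalization context."""
    candidates = context.generalizations + context.exemplars
    return max((similarity(probe, c) for c in candidates), default=float("-inf"))

def label(contexts, probe, similarity):
    """contexts: dict mapping a word (e.g. 'op') to its GeneralizationContext.
    Returns the word whose context scores highest against the probe."""
    return max(contexts, key=lambda w: context_score(contexts[w], probe, similarity))

def leave_one_out(stimuli, similarity, learn_contexts):
    """stimuli: list of (case, intended_word) pairs. Returns the fraction of
    trials in which the model produced the intended label."""
    correct = 0
    for i, (probe, intended) in enumerate(stimuli):
        training = stimuli[:i] + stimuli[i + 1:]
        contexts = learn_contexts(training)       # one generalization context per word
        if label(contexts, probe, similarity) == intended:
            correct += 1
    return correct / len(stimuli)
```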

4.3 Results

The results of our experiment are shown below. The generalizations and numbers
given are for running SEQL on all the sketches for a category. The table below sum-
marizes the number of sketches that were classified correctly; for each preposition, the

Table 3. Summary of correct labels for each preposition category tested

English: in 6 (75%), on 21 (87%)
Dutch:   in 6 (75%), op 7 (87%), aan 6 (75%), om 8 (100%)

Table 4. Number of exemplars and generalizations for each generalization context

                  English        Dutch
                  in    on       in    op    aan    om
Generalizations    2     6        2     2     3      3
Exemplars          2     0        2     2     0      2

number is out of 8 total sketches, except for English on, which has 24 total sketches.
All results are statistically significant (P < 10⁻⁴), except for the English in (P < 0.2),
which is close. For an in-depth discussion of the error patterns, see Section 4.4.
Recall that within each generalization context, SEQL was free to make as many gen-
eralizations as it liked. SEQL was also able to keep some cases as exemplars if they
did not match any of the other cases in the context. The table below summarizes the
number of generalizations and exemplars for each context.

Best Generalization IN
Size: 3
(candle in bottle, cookie in bowl, marble in water)
--DEFINITE FACTS:
(rcc8-TPP figure ground)
--POSSIBLE FACTS:
33%: (Basin ground)
33%: (Bowl-Generic ground)

Fig. 3. One of the generalizations for English in along with the sketches for the component
exemplars

At first the amount of variation within the contexts might seem surprising. How-
ever, since the stimuli were chosen to cover the full range of situations for each con-
text, the variation makes more sense. Consider the Dutch category op. The 8 sketches for this one
generalization included very different situations: clingy attachment (e.g. sticker op
cupboard), traditional full support (e.g. cookie op plate) and covering relationships
(e.g. top op jar).
Two of the English generalizations are shown in Figures 3 and 4. For each gener-
alization, the cases that were combined are listed, followed by the facts and associated
probabilities.

Best Generalization ON
Size: 2
(top on tube, lid on jar)
--DEFINITE FACTS:
(Covering-Object figure)
(above figure ground)
--POSSIBLE FACTS:
50%: (definiteOverlapCase figure ground)
50%: (rcc8-PO figure ground)
50%: (rcc8-EC figure ground)

Fig. 4. Sample generalizations for English on along with the component sketches

4.4 Error Analysis

Closer examination of the specific errors made by SEQL is also illuminating. For ex-
ample, both the Dutch and English experiments failed on two in stimuli. It was the
same two stimuli for both languages: flower in book, and hole in towel. The first case,
flower in book, is hard to represent in a sketch. In the original study, actual objects
were used making it easier to place the flower in the book. It is not surprising that this
case failed given that it was an exemplar in both in contexts and did not share much
structure with other stimuli in that context. Hole in towel fails for a different reason.
The ResearchCyc knowledge base does not have any concept of a hole. Moreover,
how holes should be considered in spatial relationships seems different than for
physical objects.
Many of our errors stem from the small size of our stimulus set. For contexts that
contained multiple variations, there were often only one or two samples of each. An
interesting future study will be to see how many stimuli are needed to minimize error
rates. (Even human adults are not 100% correct on these tasks.) Interestingly, om is
one of the prepositions that is harder for Dutch children to learn (it covers situations
of encirclement with support). However, it was the only Dutch preposition for which
our system scored 100%. This again is probably explainable by sample size. Since the
entire context contained only cases of encirclement with support, there was more in
common between all of the examples.

4.5 Discussion

Our results suggest that progressive alignment is a promising technique for modeling
the learning of spatial language categories. Using a very small set of training stimuli
(only 7 sketches in some cases) SEQL was able to correctly label the majority of the
test cases. An examination of the results and errors indicates that our model, consis-
tent with human data, uses both geometric and semantic knowledge in learning these
prepositions. SEQL is able to learn these terms reasonably well, even with far less
data than human children, but on the other hand, it is given very refined inputs to be-
gin with (i.e., sketches). As noted below, we plan to explore scaling up to larger
stimulus sets in future work.

5 Related Work
There has been considerable cognitive science research into spatial prepositions, in-
cluding a number of computational models. Most computational models (cf. [16, 18,
10]) are based only on geometric information, which means that they cannot model
findings of Coventry et al. [6] and Feist & Gentner [9], who showed that semantic
knowledge of functional properties is also crucial. Prior computational models have
also focused only on inputs consisting of simple geometric shapes (squares, circles,
triangles, etc.). We believe our use of conceptually labeled sketches is an interesting
and practical intermediate point between simple geometric stimuli and full 3D vision.
We also differ from many other models of spatial language use in the number of
training trials required. Many current models use orders of magnitude more trials than
we do. We are not arguing that people learn spatial preposition categories after expo-
sure to only 7 examples. After all, children have a much harder task than the one we
have modeled here: they have many more distractions and a much richer environment
from which to extract spatial information. On the other hand, we suspect that requiring
10³–10⁴ exposures, as current connectionist models need, is psychologically implausible.
For example, one model requires an epoch of 2100 stimuli just to learn the distinction
above/below/over/under for one arrangement of objects (a container pouring a liquid
into a bowl/plate/dish) [7]. The actual number of trials that is both sufficient and cogni-
tively plausible remains an open question and an interesting problem for future work.

6 Conclusions and Future Work


Our model was able to successfully learn the support-containment prepositions in
both Dutch and English with a small number of training trials. We see three lines of
investigation suggested by these results. First, we would like to expand our experi-
ments to include more relationships (e.g. under, over, etc). Second, we would like to
expand to other languages. For example, Korean uniquely divides the containment re-
lationship into tight fit and loose fit relations. Third, we are in the process of building
a sketch library of more instances of spatial relations. With more sketches, we will
have additional evidence concerning the coverage of our model.
There is also clearly a tradeoff between using a cognitively plausible number of
training examples and having enough training examples to achieve good generality:
for example, enough examples to automatically extract the important object types and
features (e.g., containers) and ignore the spurious ones (e.g., that something is edible).
We are planning future experiments to examine this issue by varying the number of training
trials used. It will also be interesting to see if we can use the same set of experiments
to model the development of spatial language categories in children by varying the
availability of different types of information.

Acknowledgments. This work was sponsored by a grant from the Intelligent Systems
Program of the Office of Naval Research and by The National Science Foundation
under grant no: SBE-0541957, The Spatial Intelligence and Learning Center. The
authors would like to thank Dedre Gentner and Melissa Bowerman for access to their
in-press paper and stimuli.

References
1. Bowerman, M., Pederson, E.: Crosslinguistic perspectives on topological spatial relation-
ships. In: The 87th Annual Meeting of the American Anthropological Association, San
Francisco, CA (paper presented, 1992)
2. Bowerman, M.: Learning How to Structure Space for Language: A Crosslinguistic Per-
spective. In: Bloom, P., Peterson, M.A., Nadel, L., Garrett, M.F. (eds.) Language and
Space, pp. 493–530. MIT Press, Cambridge (1996)
3. Cohn, A.: Calculi for Qualitative Spatial Reasoning. In: Pfalzgraf, J., Calmet, J., Campbell,
J.A. (eds.) AISMC 1996. LNCS, vol. 1138, pp. 124–143. Springer, Heidelberg (1996)
4. Coventry, K.R., Prat-Sala, M., Richards, L.V.: The Interplay Between Geometry and Func-
tion in the Comprehension of ‘over’, ‘under’, ‘above’, and ‘below’. Journal of Memory
and Language 44, 376–398 (2001)
5. Coventry, K.R., Mather, G.: The real story of ‘over’? In: Coventry, K.R., Oliver, P. (eds.)
Spatial Language: Cognitive and Computational Aspects, Kluwer Academic Publishers,
Dordrecht (2002)
6. Coventry, K.R., Garrod, S.C.: Saying, Seeing and Acting: The Psychological Semantics of
Spatial Prepositions. Essays in Cognitive Science Series. Lawrence Erlbaum Associates,
Mahwah (2004)
7. Coventry, K.R., Cangelosi, A., Rajapakse, R., Bacon, A., Newstead, S., Joyce, D., Rich-
ards, L.V.: Spatial prepositions and vague quantifiers: Implementing the functional geo-
metric framework. In: Proceedings of Spatial Cognition Conference. Springer, Germany
(2005)
8. Falkenhainer, B., Forbus, K., Gentner, D.: The Structure-Mapping Engine. In: Proceedings
of the Fifth National Conference on Artificial Intelligence, pp. 272–277. Morgan Kauf-
mann, San Francisco (1986)

9. Feist, M.I., Gentner, D.: On Plates, Bowls, and Dishes: Factors in the Use of English ‘in’
and ‘on’. In: Proceedings of the 20th Annual Conference of the Cognitive Science Society
(1998)
10. Gapp, K.P.: Angle, distance, shape and their relationship to project relations. In: Moore,
J.D., Lehman, J.F. (eds.) Proceedings of the Seventeenth Annual Conference of the Cogni-
tive Science Society, pp. 112–117. Lawrence Erlbaum Associates Inc., Mahwah (1995)
11. Gentner, D., Bowerman, M.: Why Some Spatial Semantic Categories are Harder to Learn
than Others: The Typological Prevalence Hypothesis (in press)
12. Gentner, D.: Structure-Mapping: A theoretical framework for analogy. Cognitive Sci-
ence 7, 155–170 (1983)
13. Gentner, D., Markman, A.B.: Structure mapping in analogy and similarity. American Psy-
chologist 52, 42–56 (1997)
14. Halstead, D., Forbus, K.: Transforming between Propositions and Features: Bridging the
Gap. In: Proceedings of AAAI, Pittsburgh, PA (2005)
15. Kuehne, S., Forbus, K., Gentner, D., Quinn, B.: SEQL: Category learning as progressive
abstraction using structure mapping. In: Proceedings of the 22nd Annual Meeting of the
Cognitive Science Society (2000)
16. Lockwood, K., Forbus, K., Halstead, D., Usher, J.: Automatic Categorization of Spatial
Prepositions. In: Proceedings of the 28th Annual Conference of the Cognitive Science So-
ciety (2006)
17. Markman, A.B., Gentner, D.: Commonalities and differences in similarity comparisons.
Memory & Cognition 24(2), 235–249 (1996)
18. Regier, T.: The human semantic potential: Spatial language and constrained connection-
ism. MIT Press, Cambridge (1996)
19. Regier, T., Carlson, L.A.: Grounding spatial language in perception: An empirical and
computational investigation. Journal of Experimental Psychology: General 130(2), 273–
298 (2001)
20. Skorstad, J., Gentner, D., Medin, D.: Abstraction Process During Concept Learning: A
Structural View. In: Proceedings of the 10th Annual Conference of the Cognitive Science
Society (1988)
Integral vs. Separable Attributes
in Spatial Similarity Assessments

Konstantinos A. Nedas and Max J. Egenhofer

National Center for Geographic Information and Analysis


and
Department of Spatial Information Science and Engineering
University of Maine
Boardman Hall, Orono, ME 04469-5711, USA
konstantinos@nedas.gr, max@spatial.maine.edu

Abstract. Computational similarity assessments over spatial objects are typi-
cally decomposed into similarity comparisons of geometric and non-geometric
attribute values. Psychological findings have suggested that different types of
aggregation functions—for the conversions from the attributes’ similarity val-
ues to the objects’ similarity values—should be used depending on whether the
attributes are separable (which reflects perceptual independence) or whether
they are integral (which reflects such dependencies among the attributes as
typically captured in geometric similarity measures). Current computational
spatial similarity methods have ignored the potential impact of such differences,
however, treating all attributes and their values homogeneously. Through a
comprehensive simulation of spatial similarity queries the impact of psycho-
logically compliant (which recognize groups of integral attributes) vs. deviant
(which fail to detect such groups) methods have been studied, comparing the
top-10 items of the compliant and deviant ranked lists. We found that only
for objects with very small numbers of attributes (no more than two or three)
is the effect of explicitly recognizing integral attributes negligible; the differences
between compliant and deviant methods become progressively larger as the
percentage of integral attributes increases and the number of groups in which
these integral attributes are distributed decreases.

1 Introduction
Similarity assessment implies a judgment about the semantic proximity of two or
more entities. In a rudimentary form, this process consists of a decomposition of the
entities under comparison into elements in which the entities are the same, and
elements in which they differ (James 1890). People perform such tasks based on their
intuitions and knowledge; however, their judgments are often subjective and display
no strict mathematical models (Tversky 1977). Formalized similarity assessments are
critical ingredients of Naive Geography (Egenhofer and Mark 1995), which serves as
the basis for the design of intelligent GISs that will act and respond much like a
person would. The challenge for machines to perform similarly is the translation of a
qualitative similarity assessment into the quantitative realm of similarity scores,

typically within the range of 0 (worst match) to 1 (best match). This paper addresses
similarity within the context of spatial database systems.
Spatial similarity assessment is commonly based on the comparisons of spatial ob-
jects, which are typically characterized by geometric (Bruns and Egenhofer 1996) and
thematic (Rodríguez and Egenhofer 2004) attributes. Geometric attributes are associ-
ated with the objects’ shapes and sizes, while thematic attributes capture non-spatial
information. For example, the class of Rhodes is island, its name and population are
thematic attributes, while a shape description such as the ratio of the major and minor
axes of its minimum bounding rectangle provides values for its geometric attributes.
The same dichotomy of spatial and thematic characteristics applies to relations. For
example, Rhodes, which is disjoint from the Greek mainland and located 650km
southeast of Thessaloniki, has a smaller population than Athens. Spatial similarity
assessments consider the objects’ characteristics and relations.
The similarity of two spatial objects is typically computed with a distance (i.e.,
dissimilarity) measure that is defined upon the objects’ representations. To yield cog-
nitively plausible results, this estimate must match with people’s notions of object
similarity (Gärdenfors 2000). A critical aspect in this process is the role of an aggre-
gation function, which combines atomic judgments (i.e., comparisons of pairs of
attribute values) into an overall composite measure for pairs of objects. Separable
attributes are perceptually independent as they refer to properties that are obvious,
compelling, and clearly perceived as two different qualities of an entity (Torger-
son 1965). Conversely, integral attributes create a group when their values are con-
ceptually correlated, but lack an obvious separability (Ashby and Townsend 1986).
Conceptual correlation implies that the values of such attributes are perceived as a
single property, independent of their attributes’ internal representations (e.g., as a set
of concomitant attributes). While general-purpose information systems employ pri-
marily separable attributes, such as age, job title, salary, and gender in a personnel
database, a significant amount of integral attributes may be hidden in the representa-
tional formalisms that GISs employ to model the complex topological relations of
spatial objects (Egenhofer and Franzosa 1995; Clementini and di Felice 1998). The
set of possible integral attributes grows with metric refinements of topological rela-
tions (Egenhofer and Shariff 1998; Nedas et al. 2007).
Psychological research has converged to a consensus that aggregation functions
should differ depending on whether the atomic judgments are made on separable or
integral attributes (Attneave 1950; Nosofsky 1986; Shepard 1987; Nosofsky 1992;
Takane and Shibayama 1992; Hahn and Chater 1997; Gärdenfors 2000). Since the
recognition of the integral attributes and the form of the aggregation function affect
the rankings at the object level, spatial information systems should employ a psycho-
logically compliant model (i.e., a model that accounts for integral attributes) for simi-
larity assessments using psychologically correct aggregation functions to determine
the similarity of a result to a query. Most of the current studies and prototypes, how-
ever, do not account for integral attributes as they use psychologically deviant meth-
ods, making no distinctions between separable and integral attributes.
Would the incorporation of psychologically compliant provisions into a formalized
spatial similarity assessment yield different similarity results? To answer this ques-
tion, this paper sets up a similarity simulation that generates a broad spectrum of ex-
perimental results for spatial similarity queries. This simulation provides a rationale
for deciding whether spatial similarity assessments should employ psychologically
compliant measures that recognize integral vs. separable attributes, or whether this
distinction has no impact on computational similarity comparisons.
The remainder of this paper reviews similarity measures (Section 2), describes the
experimental and analytical setup for the simulation (Section 3), and presents an in-
depth interpretation of the results (Section 4). Conclusions and future work are dis-
cussed in Section 5.

2 Similarity Measures
Similarity-based information retrieval goes beyond the determination of an exact
match between queries and stored data. It provides the users with a range of possible
answers, which are the most similar to the initial requests and, therefore, the most
likely to satisfy their queries. The results of such spatial queries are ranked (Hjaltason
and Samet 1995) according to similarity scores, enabling exploratory access to data
by browsing, since users usually know only approximately what they are looking for.
Such similarity-based retrieval also relieves users from the burden of reformulating a
query repeatedly until they find useful information.

2.1 Similarity at the Attribute Level

The core of a similarity mechanism’s inferential abilities is at the attribute level. By
exploiting the differences among attribute values of objects and relations, a similarity
algorithm can reason about the degree of difference or resemblance of a result to a
query. When the query consists of a constraint on an atomic value of a single
attribute, the process of similarity assessment takes place at the attribute level, while
for a query that consists of multiple such constraints, a similarity assessment takes
place at the object level. In both cases, the results are objects; the difference, however,
is that in the latter case the individual similarity scores that were produced separately
for each attribute must somehow be combined to a meaningful composite.
Dey et al. (2002) developed simple similarity measures for attribute values to iden-
tify duplicates for the same entity in databases. Rodríguez and Egenhofer (2003) com-
bined distinguishing features of entities with their semantic relations in a hierarchical
network and created a model that evaluates similarity among spatial concepts (i.e.,
entity classes). Based on theories for reasoning with topological, metric, and direc-
tional relations several computational models for spatial relation similarity have been
developed (Egenhofer 1997; Egenhofer and Shariff 1998; Goyal and Egenhofer
2001), including their integrations into qualitative (Bruns and Egenhofer 1996; Li and
Fonseca 2006) and quantitative (Gudivada and Raghavan 1995; Nabil et al. 1996)
similarity measures. Most established similarity measures are derived in an ad-hoc
manner, guided by experience and observation. In this sense, they are concerned with
similarity from a pragmatic, not a cognitive, point of view.
Different roles of attributes have a profound impact on similarity assessments.
Some attributes are perceived as separable, while others are perceptually indistin-
guishable. Such perceptual indistinguishability must not be confused with correlation.
For instance, people’s heights and weights may have a positive correlation as attrib-
utes, but they are probably separable quantities in perception. Integral attributes are
produced by artificial decompositions of quantities or qualities that make no intuitive
sense, but often serve to describe representations in information systems, with color
(e.g., RGB and CMYK) being the classic example.
Integral spatial attributes typically occur for high-level abstractions, such as shape
or spatial relation. Shape is often captured through a series of metric parameters, for
instance for elongation and perforation (Wentz 2000), or combinations of deforma-
tions, such as stretching and bending (Basri et al. 1998). When judging shape, these
detailed spatial properties are typically perceived in combination, rendering shape an
integral spatial attribute. In the same vein, spatial relations (Bruns and Egenhofer
1996), or more specifically topological relations (Egenhofer and Franzosa 1995), are
spatial representations that contain several integral dimensions. While a user per-
ceives one topological relation (Figure 1), there are a dozen spatial attributes—from
coarse topological properties (Egenhofer and Herring 1991) over detailed topological
properties (Egenhofer and Franzosa 1995) to detailed metric refinements (Egenhofer
and Shariff 1998, Nedas et al. 2007) in order to differentiate spatial relations.

Fig. 1. Representing a topological relation at progressively finer levels of detail

2.2 Similarity at the Object Level

Findings from psychology about the way that people perceive the nature of similarity,
its properties, and its relationship to peripheral notions, such as difference and
dissimilarity, are largely ignored in computational similarity assessments. The focus
on the computational feasibility and efficiency, while dismissing cognitive elements,
renders the plausibility of such approaches to human perception questionable. The
similarity of one object to another is an inverse function of the distance between the
objects in a conceptual space, that is the collection of one or more domains
(Gärdenfors 2000). Attribute weights that indicate each dimension’s salience within
the space offer a refined similarity assessment. The distance in a conceptual space
indicates dissimilarity, which should be compatible with people’s judgments of
overall dissimilarity; therefore, its correct calculation becomes important. Following
widely accepted psychological research (Attneave 1950; Torgerson 1965; Shepard
1987; Ashby and Lee 1991; Nosofsky 1992; Gärdenfors 2000), the perceived
interpoint distances between the objects’ point representations in that space should be
computed either by a Euclidean metric or a city-block metric (also known as the
Manhattan distance).
Which one to employ depends on whether one deals with integral or separable di-
mensions. Integral dimensions are strongly unanalyzable and typically perceived as a
single stimulus. For instance, the proximity of two linear objects may be described
with a number of measures that associate the boundaries and interiors of the objects
(Nedas et al. 2007), but the closeness relation may be perceived as one stimulus from
the users that inspect the lines. Hence, a set of integral dimensions constitutes in es-
sence one multi-dimensional attribute (Torgerson 1965). Separable dimensions, on the
other hand, are different and distinct properties (e.g., length and height) that are per-
ceptually independent (Ashby and Lee 1991). It has been suggested and experimen-
tally confirmed (Attneave 1950; Torgerson 1965; Shepard 1987) that, with respect to
human judgments for similarity, a Euclidean metric performs better with integral
dimensions, whereas a city-block metric matches more closely separable dimensions.
Perceptually separable dimensions are expected to have a higher frequency of oc-
currence in databases; therefore, in the general case the composite dissimilarity indi-
cator between two objects will be calculated by the weighted average of individual
dissimilarities along each of the dimensions. For a group of n integral attributes, how-
ever, a Euclidean metric should be adopted to derive the dissimilarity of the objects
with respect to this integral group. Therefore, the combination of the n concomitant
attributes of an integral group should yield one dissimilarity component rather than n
individual components in the composite measure (Figure 2).

Fig. 2. Combining the dissimilarity values d4 and d5 of two integral attributes (Attribute 4 and
Attribute 5) into a single dissimilarity component, before summing it up with the dissimilarity
values d1 … d3 (of the separable attributes 1…3) to determine the overall dissimilarity D be-
tween a DB Object and a Query Object
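
A sketch of the aggregation depicted in Figure 2 follows, assuming atomic dissimilarities in [0, 1] and salience weights for each separable attribute and each integral group; the variable names and the normalization of the Euclidean group component are our assumptions, not part of the described model.

```python
from math import sqrt

def composite_dissimilarity(d, weights, integral_groups):
    """d: attribute -> atomic dissimilarity in [0, 1].
    weights: attribute or group name -> salience weight.
    integral_groups: group name -> list of attributes forming that integral group.
    Separable attributes are combined city-block style; each integral group is first
    collapsed into one Euclidean component, as in Figure 2."""
    in_a_group = {a for attrs in integral_groups.values() for a in attrs}
    total = 0.0
    weight_sum = 0.0
    for a, value in d.items():                       # separable attributes
        if a not in in_a_group:
            total += weights[a] * value
            weight_sum += weights[a]
    for g, attrs in integral_groups.items():         # one component per integral group
        component = sqrt(sum(d[a] ** 2 for a in attrs)) / sqrt(len(attrs))
        total += weights[g] * component              # normalized to stay in [0, 1]
        weight_sum += weights[g]
    return total / weight_sum                        # weighted average (Section 2.2)
```

With two integral attributes d4 and d5 forming one group, for instance, the group contributes a single component sqrt(d4² + d5²)/√2, which is then averaged with d1, d2, and d3, mirroring Figure 2.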

3 Object Similarity Simulation


The impact of integral vs. separable attributes on object-based similarity comparisons
is evaluated for a comprehensive series of query scenarios. Factors that may influence
the outcome include the number of objects compared, the number of attributes of each
object, and the distribution of separable vs. integral attributes. The intended exhaustive
character of these experiments was a prohibitive factor in locating real-world datasets
that accommodate all tested scenarios. Hence, the assessment relies on simulations with
synthetic datasets and queries that are randomly generated with the Sensitivity Analyzer
for Similarity Assessments SASA (Nedas 2006). This software prototype served as a
testbed to examine different processing strategies for an exhaustive set of similarity
queries. The experiment’s results comprise a ranked list for a compliant method and
another one for a deviant method. We introduce tailored measures for comparing the
relevant portions of such ranked lists (Section 3.1).
In SASA these synthetic constructs were originally populated with random values
that followed different statistical distributions each time (e.g., uniform, normal). The
experimental set-up included five parameters (Section 3.2), which offered a wide
range of variations for the distribution of separable and integral attributes. The under-
lying distribution of the actual data had a negligible effect on the final results. The
distribution of random values is, therefore, kept constant and assumed to be uniform
throughout this study. Likewise, a consideration of different attribute types in the
simulated databases is immaterial for the purposes of the experiment, because the
algorithm performs atomic value assessments, yielding a dissimilarity measure be-
tween 0 and 1 regardless of the attribute type. The focus of the experiment, however,
is to examine how such atomic dissimilarities should be combined to create scores of
aggregate dissimilarity. The experiment was conducted several thousand times and
the results were averaged in order to make the rank-list measures converge to their
mean values. The number of repetitions was determined empirically, such that
successive experiment executions yielded results with less than 1% deviation.
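
A minimal example of such a type-agnostic atomic assessment is sketched below, assuming numeric values are scaled by the attribute's value range and other values are compared by equality; the functions actually used inside SASA are not specified in the text.

```python
def atomic_dissimilarity(a, b, value_range=None):
    """Return a dissimilarity in [0, 1] for a pair of attribute values.
    Numeric values are scaled by the attribute's range (an assumption);
    other values are simply 0 if equal and 1 otherwise."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)) and value_range:
        return min(abs(a - b) / value_range, 1.0)
    return 0.0 if a == b else 1.0
```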
In order to summarize the test results effectively, a 4-dimensional rendering was
developed (Section 3.3), which fixes two of the five parameters (the database size and
the distribution policy) and visualizes the other three parameters through a 3-D-plus-color diagram.

3.1 Comparisons of Ranked Lists

Most approaches to compute the deviations between two ranking lists (Mosteller and
Rourke 1973; Gibbons 1996) rely on statistical tests, which consider the entire range
of the lists. An evaluation of ranking lists produced from database queries or web
search queries is different, however, as they focus only on the first few ranks, because
the relevance of retrieved items decreases rapidly for lower ranks. For the
experiments in this study, the relevant portion of the ranking list was defined as the
ten best hits. This decision was partially based on the experimental outcomes that
people retain no more than five to nine items in short term memory (Miller 1956).
The 7±2 rule refers to unidimensional stimuli; therefore, people are expected to be
able to retain this number of results in short term memory only for very simple
queries. This choice was also based on the typical strategy of current web-search
engines, which present ten items per page, starting from the most relevant. Therefore,
the set of the ten best results is not only easy to browse and inspect, but also
convenient in the sense that users can memorize it to a large degree and perform swift
comparative judgments about the relevance of each match to their query.
As the database size grows, the ranks of the ten best results are determined based on
finer differences of their similarity values. If one also considers that psychologically
compliant methods approximate better, but do not necessarily model human perception
exactly, then a measure of incompatibility that relies only on rank differences would be
strict. A more practical and objective indicator of the incompatibility between two
methods considers instead the overlap of common objects within the relevant portion of
the ranking lists. This measure, denoted by O, expresses the percentage of the common
items within the ten best results that the compared methods produce. The selection of
this measure is also further justified by the fact that each of the items in the relevant
portion is equally accessible to the users (i.e., ten results per page).
The actual rank differences are examined as a secondary, less crucial index of in-
compatibility. They are used as an additional criterion when the overlap measure
provides borderline evidence for that purpose. The rank differences are assessed using
a modified Spearman Rank Correlation (SRC) test. This test is an appropriate statistic
for ordinal data, provided that its resulting coefficient is used only to test a hypothesis
about order (Stevens 1951). The SRC coefficient R, with xi and yi as the rank orders of
item i in two compared samples that contain n items each (Equation 1), takes a value
between –1 and +1, where +1 indicates perfect agreement between two samples (i.e.,
the elements are ranked identically), while –1 signals complete disagreement (i.e., the
elements are ranked in inverse order). A value of 0 means that there is no association
between the two samples, whereas other values than 0, 1, and –1 would indicate in-
termediate levels of correlation.
R = 1 - \frac{6 \sum_{i=1}^{n} (x_i - y_i)^2}{n\,(n^2 - 1)}    (1)

The SRC coefficient and similar statistics are designed for evaluations of ranking lists
that contain exactly the same elements. Hence, it cannot be readily applied to tests
that require a correlation value between a particular subsection of the ranking lists.
This observation is essential, because the items in the relevant portion of the lists will
only incidentally be the same for two different methods. To enable the comparison of
lists with different numbers of entries, a modified SRC coefficient is computed as

Fig. 3. Overlap percentage O and modified Spearman Rank Correlation coefficient R' for the
relevant portion of two ranking lists

follows: first, the different elements in the two lists are eliminated and R (Equation 1)
is computed for the common elements that remain. Then, the modified coefficient R'
is calculated by multiplying R with the overlap percentage O (Figure 3). This second
step is necessary in order to avoid misleading results. For example, when among the
top ten items only one common element exists, R = 1, but R' = 0.1.
Methods that produce very similar results are characterized by positive values of
the measures O and R', close to 1, whereas methods that produce very dissimilar re-
sults are characterized by an overlap value close to 0 and by a modified SRC coeffi-
cient value close to 0 or negative.
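
Both incompatibility measures can be computed directly from two top-10 lists; the sketch below follows the procedure above (the non-common elements are eliminated, R is computed on the common elements, and R' = R · O), with the handling of fewer than two common items chosen to reproduce the R = 1, R' = 0.1 example.

```python
def overlap_and_modified_src(list_a, list_b):
    """list_a, list_b: the ten best object ids of two methods, best first.
    Returns (O, R'): overlap percentage and modified Spearman rank correlation."""
    common = [x for x in list_a if x in list_b]          # keep list_a's order
    n = len(common)
    overlap = n / len(list_a)
    if n < 2:
        r = 1.0 if n == 1 else 0.0                       # single shared item: R = 1
        return overlap, overlap * r
    rank_a = {x: i for i, x in enumerate(common)}
    rank_b = {x: i for i, x in enumerate(x for x in list_b if x in common)}
    d_squared = sum((rank_a[x] - rank_b[x]) ** 2 for x in common)
    r = 1 - 6 * d_squared / (n * (n ** 2 - 1))           # Spearman R on common items
    return overlap, overlap * r                          # R' = R * O
```

Two identical top-10 lists thus give O = 1 and R' = 1, while lists sharing only a single item give O = 0.1 and R' = 0.1, matching the example in the text.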

3.2 Test Parameters

The dissimilarities of the ranks for an object query with different methods are
captured through the incompatibility measures O and R', which are each functions of
five variables n, m, p, g, and d.
• Variable n is the number of objects in the database, determining the database size.
The experiments were conducted for the set N = {1,000, 5,000, 25,000, 100,000},
so that each database size increases approximately one order of magnitude over its
predecessor. A dataset of 1,000 objects was adopted as a characteristic case of a
small database, a dataset of 100,000 objects as a characteristic case of a large data-
base, with datasets of 5,000 and 25,000 objects as representatives of medium-small
and medium-large databases, respectively.
• Variable m is the number of attributes that participate in the similarity assessment
of a database object to a query object. The set examined is
M = {2, 5, 10, 20, 30, 40, …, 100} and accounts for the most simple and complex
modeled objects. The case of queries on a single attribute is omitted, because it is
irrelevant for this investigation. A group containing just one integral attribute is also
undefined, because it essentially degenerates to one separable attribute.
• Variable p is the percentage of integral attributes out of the total number of attrib-
utes m. The actual number of integral attributes is, therefore, p⋅m. In this manner, p
also indirectly determines the number of separable attributes. The percentages
taken are p = {0%, 10%, 20%, …, 100%}. The two extreme values represent the
cases where all attributes are separable (0%) and integral (100%).
• Variable g is the number of integral groups in which the integral attributes are
distributed. Its values are constrained by the specific instantiations of m and p. For
example, for objects with ten attributes (m = 10), four of which are integral
(p = 40%), there could be one group of four attributes or two groups of two attrib-
utes. For the experiment, g has a range from 1 to 50. The smallest value occurs in
various settings, starting with m = 2 and p = 100%. The largest value occurs only if
m = 100 and p = 100%.
• Variable d is the group distribution policy, specifying how a number of integral
attributes p⋅m is distributed into g integral groups. For some configurations there
could be numerous such possibilities. For instance, eight integral attributes that are
distributed into two groups can yield several different allocations, such as 6-2, 5-3,
and 4-4. Preliminary experimentation indicated that the similarity results could be
affected by the distribution policy, especially for larger percentages of integral
attributes. This parameter is treated as a binary variable taking the values “optimal”
and “worst.” An optimal distribution policy tries to distribute the integral attributes
evenly, such that each integral group contains approximately the same number of
attributes (Figure 4a), whereas a worst-case distribution policy creates dispropor-
tionately-sized groups by assigning as many attributes as possible to one large in-
tegral group, while populating the remaining groups with the smallest number of
attributes (Figure 4b). The binary treatment of the group distribution policy allows
inferences about the behavior of this variable between its two extreme settings, while
keeping the number of produced diagrams within realistic limits (a small sketch of
both policies follows this list).
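The two distribution policies might be sketched as follows (an illustrative Python fragment, not the experimental code; the minimum group size of two is inferred from the 6-2/5-3/4-4 example above):

def distribute_integral_attributes(num_integral, g, policy="optimal", min_size=2):
    # split num_integral integral attributes into g integral groups
    if policy == "optimal":
        # as even as possible, e.g., 8 attributes into 2 groups -> [4, 4]
        base, rest = divmod(num_integral, g)
        return [base + (1 if i < rest else 0) for i in range(g)]
    # "worst": one group as large as possible, the others as small as allowed,
    # e.g., 8 attributes into 2 groups -> [6, 2]
    return [num_integral - min_size * (g - 1)] + [min_size] * (g - 1)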

Fig. 4. Splitting integral attributes into groups using (a) an optimal and (b) a worst distribution
policy

3.3 Visualization of Similarity Scores

A specific instantiation of the variables n, m, p, g, and d represents a possible database
configuration and is referred to as a db scenario. The simultaneous interaction of all
variables involved for such db scenarios and their effect on the ranks cannot be
accommodated by the representational capabilities of typical 2-dimensional or 3-
dimensional visualization techniques due to the large number of diagrams that would
have to be produced. In order to visualize the results effectively, while keeping the
number of produced diagrams within acceptable bounds, a 4-dimensional
visualization technique was employed. For each 4-dimensional diagram, the database
size n and the distribution policy d are kept fixed, while the remaining variables are
allowed to vary within a 3-dimensional cubic space. The axes X, Y, and Z of this space
correspond to the number of integral groups g, the number of attributes m, and the
percentage of integral attributes p, respectively. Each point in the cubic space
signifies, therefore, a db scenario determined by the instantiation of the triple (m, p, g)
that defines the point, and the fixed values of n and d. The color assigned to a db
scenario (i.e., point) embeds a fourth dimension in the visualization, which represents
the measurement of O (i.e., the overlap) or R' (i.e., the modified SRC coefficient)
between the two compared methods for that db scenario. Warm colors (in the range of
red) display shades of high similarity, while cool colors (in the range of blue) indicate
shades of high dissimilarity. Since there are two incompatibility measures, four
database sizes, and two distribution policies, a total of sixteen diagrams was produced
for each experiment.
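A minimal plotting sketch of one such 4-dimensional diagram (illustrative only; the interpolated color surfaces and the slicing along the Z-axis used in the paper are omitted, and the function name is hypothetical) could look like this:

import matplotlib.pyplot as plt

def plot_db_scenarios(g, m, p, values, label):
    # g, m, p: coordinates of the realizable db scenarios; values: O or R'
    fig = plt.figure()
    ax = fig.add_subplot(projection="3d")
    # warm colors (red) for high similarity, cool colors (blue) for dissimilarity
    sc = ax.scatter(g, m, p, c=values, cmap="coolwarm")
    ax.set_xlabel("number of integral groups g")
    ax.set_ylabel("number of attributes m")
    ax.set_zlabel("percentage of integral attributes p")
    fig.colorbar(sc, label=label)
    plt.show()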

The 4-dimensional diagrams (Figure 5) correspond to the scenario of a database of
1,000 objects, each with 40 attributes—20 separable and 20 integral. The latter are
distributed in 10 groups through an optimal distribution policy (i.e., each group con-
tains 2 attributes). For the db scenario of point A in Figure 5 the overlap measure is
approximately 40%, whereas A’s value of R' is approximately 0.2.
A triangular half of the volumes of the produced cubes is not populated with meas-
urements, because it corresponds to non-applicable db scenarios. For example, point
B in Figure 5 is such a db scenario, because it is impossible to allocate 60 integral
attributes within 40 groups. Realizable db scenarios are located within the remaining
half of the cube. Since the values of the variables m, p, and g are discrete, the realiz-
able db scenarios form a dense grid, rather than a continuous surface. The diagrams,
however, use continuous color-rendered surfaces instead—produced by interpolating
the grid values—in order to facilitate the interpretation of the results. Furthermore, the
cube is sliced at regular intervals along the Z-axis to reveal the patterns in its interior.

Fig. 5. A 4-dimensional diagram depicting the measures (a) O and (b) R' (color figure available
at http://www.spatial.maine.edu/~max/similarity/4D-0.pdf)

4 Experiment Results and Interpretation


The test results (Figure 6) indicate a definitive pattern of gradual variation. The
deviant method in this experiment is a manifestation of the Manhattan distance
function with no integral groups recognized. Hence, the number of aggregated terms
is always equal to the total number of attributes m. Furthermore, each term contributes
equally to the similarity score assigned to each object of the database. As the variables
change, the form of the compliant method becomes more or less similar to the pattern
of the deviant method. The interactions behind these deviations explain the outcome
illustrated in the diagrams.
The main conclusion is that the measures O and R' become progressively worse as
the percentage of integral attributes increases and the number of groups in which
these integral attributes are distributed decreases. When either or both trends occur,
the aggregated terms with the compliant method reduce to a number much less than
m. For example, for one separable attribute, nine integral attributes, and three groups,
the deviant method aggregates ten terms and the compliant four terms. Moreover, the
effect of the one remaining separable attribute with the compliant method is dispro-
portionate on the final score compared to that of the other attributes. As the number of
groups increases, the measures have a greater concordance, because the impact of
such isolated attributes on the final score diminishes.
This observation also explains the dissonance to the deterioration pattern observed
at the highest layer of the optimal distribution policy diagrams, where such separable
attributes disappear. The even distribution of integral attributes into groups makes the
compliant method behave similarly to the deviant at this layer. For example, consider
a query with ten attributes, all of which are integral and must be distributed in five
groups. The deviant approach will aggregate all ten attributes as separable. The com-
pliant will first separate the ten attributes in groups of two, aggregate each group, and
combine the resulting five terms to derive the object’s similarity. For a single group,
the compliant method becomes identical to the Euclidean distance function. The trend
of deterioration, however, is not interrupted at the highest layer of the diagrams for
the worst distribution policy because the group sizes with this policy differ drastically.
In this case, the smaller integral groups continue to have a disproportionate influence
on the final similarity score.
The more uniform the distribution into groups is, the less significant the effects on
the measures O and R' become. The wavy patterns at the higher layers of the optimal-
distribution diagrams also support this conclusion. Such effects are due to the alternat-
ing exact and approximate division of integral attributes into groups. For example, for
nine integral attributes and three groups the division is exact with three attributes in
each group, while for ten or eleven integral attributes, the groups differ in size. In the
diagrams of the worst distribution policy where group sizes remain consistently im-
balanced, the small stripes of temporary improvements disappear. Excluding the wavy
patterns and the case of all attributes being integral, the measures appear to be invari-
ant to the group distribution policy elsewhere.
The results worsen slightly with an increase in the number of attributes; however,
the influence of this variable is much more subtle compared to the others. When the
attribute number is very small, the methods are often identical, because the attributes
are insufficient to form integral groups (e.g., for two attributes and a percentage of
integral attributes of up to 50%). This observation explains the very high
values of O and R' detected at the rightmost edge of the diagrams.
The compared methods also yield progressively different outcomes as the database
size increases. This result was anticipated, because two functions are expected to
demonstrate approximately the same degree of correlation regardless of the sample
size with which they are tested. Hence, if the entire ranking lists were considered (i.e.,
if the lists contained all database objects), and assuming all other variables equal, the
two compared methods would exhibit on average the same correlation, regardless of

Fig. 6. Overview of the results acquired from the experiment (color figure available at http://
www.spatial.maine.edu/~max/similarity/4D-1.pdf)

the database size. Increasing the number of objects in the database, while keeping the
size of the relevant portion constant leaves more potential for variations within the
ten best results and explains why the overlaps and correlations decline for larger
databases.
Both O and R' take a value of 1 at the lowest layer where all attributes are separa-
ble and the compared methods coincide. For all other db scenarios, the modified
Spearman Rank Correlation coefficient R' has a lower value than the overlap O. This
result is not surprising considering that R' is a stricter measure than O. The diagrams
suggest that the correct recognition of integral attributes and groups is immaterial for
smaller datasets as long as the percentage of integral attributes remains below 40%.
For the largest database considered this limit drops to around 20%. At these percent-
ages, O and R' have values of 0.5 and 0.2, respectively. Such values constitute border-
line measurements, because they imply that only half of the retrieved objects in the
relevant portion are the same and that these common objects are ranked very differ-
ently. The need for different treatments of separable vs. integral attributes is also cor-
roborated by the actual sizes of real-world geographic databases, which are often
much larger than the largest dataset in this experiment. Only for objects with very
small numbers of attributes—no more than two or three—is the recognition of
integral attributes negligible.

5 Conclusions
Computational similarity assessments among spatial objects typically compare the
values of corresponding attributes and relations employing distance functions to
capture dissimilarities. Psychological findings have suggested that different types of
aggregation functions—for the conversions from the attributes’ similarity values to
the objects’ similarity values—should be used depending on whether the attributes are
separable (which reflects perceptual independence) or whether they are integral
(which reflects a dependency among the attributes). Current computational similarity
methods have ignored the potential impact of such differences, however, treating all
attributes and their values homogeneously.
An experimental comparison between a psychologically compliant approach
(which recognizes groups of integral attributes) and a psychologically deviant ap-
proach (which fails to detect such groups) showed that the rankings produced with
each method are incompatible. The results do not depend per se on the correlation of
the attribute dimensions. Rather, it is the choice of the aggregation function, which
yields the object similarities depending on whether the attributes are perceptually
distinguishable or not, that matters; the perceptual plausibility of the obtained results
will therefore be affected if one ignores the perceptual "correlation" of the attribute dimensions. The
simulations showed that even for a modest amount of integral attributes within the
total set of attributes considered, the dissimilarities are pronounced, particularly in
the presence of a single integral group or a small number of them. This trend worsens
for large-scale databases. Both scenarios correspond closely to spatial representations
and geographic databases. The structure of the current formalisms used to represent
detailed topological, directional, and metric relations is often based on criteria other
than a one-to-one correspondence between the representational primitives employed
and human perception. Such formalisms are likely to contain one or few integral
groups within their representation. Furthermore, geographic databases are typically
large, in the order of 10^5 or 10^6 objects. This result is, therefore, significant, because
it suggests that existing similarity models may need to be revised such that new simi-
larity algorithms must consider the possible presence of perceptually correlated
attributes.
Future work should consider the impact of these findings beyond highly-structured
spatial databases to embrace the less rigid geospatial semantic web (Egenhofer 2002),
which is driven by ontologies. Similarity relations fit well into an ontological frame-
work, because it is expected that people who commit to the same ontology perceive
identically not only the concepts that are important in their domain of interest, but
also the similarity relations that hold among these concepts. This alignment of indi-
vidual similarity views towards a common similarity view is emphasized by the fact
that ontologies already have inherent a notion of qualitative similarity relations among
the concepts that they model. This notion is reflected in their structure (i.e., in the way
they specify classes and subclasses) and in the properties and roles that are attributed
to each concept. Formalizing similarity within ontologies would be a step forward in
the employment of ontologies not only as means for semantic integration, but also as
tools for semantic management, and would help their transition from symbolic to
conceptual constructs.

Acknowledgments
This work was partially supported by the National Geospatial-Intelligence Agency
under grant numbers NMA401-02-1-2009 and NMA201-01-1-2003.

References
Ashby, F., Lee, W.: Predicting Similarity and Categorization from Identification. Journal of
Experimental Psychology: General 120(2), 150–172 (1991)
Ashby, F., Townsend, J.: Varieties of Perceptual Independence. Psychological Review 93(2),
154–179 (1986)
Attneave, F.: Dimensions of Similarity. American Journal of Psychology 63(4), 516–556
(1950)
Basri, R., Costa, L., Geiger, D., Jacobs, D.: Determining the Similarity of Deformable Shapes.
Vision Research 38, 2365–2385 (1998)
Bruns, T., Egenhofer, M.: Similarity of Spatial Scenes. In: Kraak, M.-J., Molenaar, M. (eds.)
Seventh International Symposium on Spatial Data Handling (SDH 1996), Delft, The Nether-
lands, pp. 173–184. Taylor & Francis, London (1996)
Clementini, E., di Felice, P.: Topological Invariants for Lines. IEEE Transactions on Knowl-
edge and Data Engineering 10(1), 38–54 (1998)
Dey, D., Sarkar, S., De, P.: A Distance-Based Approach to Entity Reconciliation in Heteroge-
neous Databases. IEEE Transactions on Knowledge and Data Engineering 14(3), 567–582
(2002)
Egenhofer, M.: Query Processing in Spatial-Query-by-Sketch. Journal of Visual Languages and
Computing 8(4), 403–424 (1997)

Egenhofer, M.: Towards the Semantic Geospatial Web. In: Voisard, A., Chen, S.-C. (eds.)
10th ACM International Symposium on Advances in Geographic Information Systems,
McLean, VA, pp. 1–4 (2002)
Egenhofer, M., Franzosa, R.: On the Equivalence of Topological Relations. International Jour-
nal of Geographical Information Systems 9(2), 133–152 (1995)
Egenhofer, M., Mark, D.: Naive Geography. In: Kuhn, W., Frank, A.U. (eds.) COSIT 1995.
LNCS, vol. 988, pp. 1–15. Springer, Heidelberg (1995)
Egenhofer, M., Shariff, R.: Metric Details for Natural-Language Spatial Relations. ACM
Transactions on Information Systems 16(4), 295–321 (1998)
Gärdenfors, P.: Conceptual Spaces: The Geometry of Thought. MIT Press, Cambridge (2000)
Gibbons, J.: Nonparametric Methods for Quantitative Analysis. American Sciences Press,
Syracuse (1996)
Goyal, R., Egenhofer, M.: Similarity of Cardinal Directions. In: Jensen, C., Schneider, M.,
Seeger, B., Tsotras, V. (eds.) Proceedings of the Seventh International Symposium on Spa-
tial and Temporal Databases, Los Angeles, CA. LNCS, vol. 2121, pp. 36–55. Springer, Hei-
delberg (2001)
Gudivada, V., Raghavan, V.: Design and Evaluation of Algorithms for Image Retrieval by
Spatial Similarity. ACM Transactions on Information Systems 13(1), 115–144 (1995)
Hahn, U., Chater, N.: Concepts and Similarity. In: Lamberts, K., Shanks, D. (eds.) Knowledge,
Concepts, and Categories, pp. 43–92. MIT Press, Cambridge (1997)
Hjaltason, G., Samet, H.: Ranking in Spatial Databases. In: Egenhofer, M.J., Herring, J.R.
(eds.) SSD 1995. LNCS, vol. 951, pp. 83–95. Springer, Heidelberg (1995)
James, W.: The Principles of Psychology. Holt, New York (1890)
Li, B., Fonseca, F.: TDD: A Comprehensive Model for Qualitative Similarity Assessment.
Spatial Cognition and Computation 6(1), 31–62 (2006)
Miller, G.: The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for
Processing Information. The Psychological Review 63(1), 81–97 (1956)
Mosteller, F., Rourke, R.: Sturdy Statistics: Nonparametric & Order Statistics. Addison-
Wesley, Menlo Park (1973)
Nabil, M., Ngu, A., Shepherd, J.: Picture Similarity Retrieval using the 2D Projection Interval
Representation. IEEE Transactions on Knowledge and Data Engineering 8(4), 533–539
(1996)
Nedas, K.: Semantic Similarity of Spatial Scenes. Ph.D. Dissertation, Department of Spatial
Information Science and Engineering, University of Maine (2006)
Nedas, K., Egenhofer, M., Wilmsen, D.: Metric Details for Topological Line-Line Relations.
International Journal of Geographical Information Science 21(1), 21–24 (2007)
Nosofsky, R.: Attention, Similarity, and the Identification-Categorization Relationship. Journal
of Experimental Psychology: General 115(1), 39–57 (1986)
Nosofsky, R.: Similarity Scaling and Cognitive Process Models. Annual Review of Psychol-
ogy 43(1), 25–53 (1992)
Rodríguez, A., Egenhofer, M.: Determining Semantic Similarity among Entity Classes from
Different Ontologies. IEEE Transactions on Knowledge and Data Engineering 15(2), 442–
456 (2003)
Rodríguez, A., Egenhofer, M.: Comparing Geospatial Entity Classes: An Asymmetric and
Context-Dependent Similarity Measure. International Journal of Geographical Information
Science 18(3), 229–256 (2004)
Shepard, R.: Toward a Universal Law of Generalization for Psychological Science. Science 237(4820), 1317–1323 (1987)

Stevens, S.: Mathematics, Measurement, and Psychophysics. In: Stevens, S. (ed.) Handbook of
Experimental Psychology, pp. 1–49. John Wiley & Sons, Inc., New York (1951)
Takane, Y., Shibayama, T.: Structures in Stimulus Identification Data. In: Ashby, F. (ed.) Prob-
abilistic Multidimensional Models of Perception and Cognition, pp. 335–362. Erlbaum,
Hillsdale (1992)
Torgerson, W.: Multidimensional Scaling of Similarity. Psychometrika 30(4), 379–393 (1965)
Tversky, A.: Features of Similarity. Psychological Review 84(4), 327–352 (1977)
Wentz, E.: Developing and Testing of a Trivariate Shape Measure for Geographic Analysis.
Geographical Analysis 32(2), 95–112 (2000)
Spatial Abstraction: Aspectualization,
Coarsening, and Conceptual Classification

Lutz Frommberger and Diedrich Wolter

SFB/TR 8 Spatial Cognition


Universität Bremen
Enrique-Schmidt-Str. 5, 28359 Bremen, Germany
{lutz,dwolter}@sfbtr8.uni-bremen.de

Abstract. Spatial abstraction empowers complex agent control proc-
esses. We propose a formal definition of spatial abstraction and classify
it by its three facets, namely aspectualization, coarsening, and conceptual
classification. Their characteristics are essentially shaped by the repre-
sentation on which abstraction is performed. We argue for the use of
so-called aspectualizable representations which enable knowledge trans-
fer in agent control tasks. In a case study we demonstrate that aspectu-
alizable spatial knowledge learned in a simplified simulation empowers
strategy transfer to a real robotics platform.

Keywords: abstraction, knowledge representation, knowledge transfer.

1 Introduction
Abstraction is one of the key capabilities of human cognition. It enables us to
conceptualize the surrounding world, build categories, and derive reactions from
them to cope with a certain situation. Complex and overly detailed circumstances
can be reduced to much simpler concepts, and only then does it become feasible
to deliberate about which conclusions to draw and which actions to take.
Certainly, we want to see such abstraction capabilities in intelligent artificial
agents too. This requires us to implement abstraction principles in the knowledge
representation used by the artificial agent. First of all, abstraction is a process
transforming a knowledge representation. But how can this process be charac-
terized? We can distinguish three different facets of abstraction. For example it
is possible to regard a subset of the available information only, or the level of
detail of every bit of information can be reduced, or the available information
can be used to construct new, more abstract entities. Intuitively, these types of
abstraction are different and lead to different results as well. Various terms have
been coined for abstraction principles, distributed over several scientific fields
like cognitive science, artificial intelligence, architecture, linguistics, geography,
and many more. Among others we find the terms granularity [1,2], generaliza-
tion [3], schematization [4,5], idealization [5], selection [5,6], amalgamation [6],
or aspectualization [7]. Unfortunately, some of these terms define overlapping
concepts, different ones sometimes have the same meaning, or a single term is


used for different concepts. Also, these terms are often not distinguished in an
exact manner or only defined by giving examples.
In this work we take a formal view from a computer scientist’s perspective.
We study abstraction as part of knowledge representation. Our primary concern
is representation of spatial knowledge, yet we aim at maintaining a perspective
as general as possible, allowing adaption to other domains. Spatial information
is rich and can be conceptualized in a multitude of ways, making its analysis
challenging as well as relevant to applications. Handling of spatial knowledge is
essential to all agents acting in the real world.
One contribution of this article is a formal definition of abstraction processes:
aspectualization, coarsening, and conceptual classification. We characterize their
properties and investigate the consequences that arise when using abstrac-
tion in agent control processes. Applying the formal framework to a real appli-
cation in robot navigation exemplifies its utility. Appropriate use of abstraction
allows knowledge learned in a simplified computer simulation to be transferred
to a control task with a real autonomous robot. Aspectualizable knowledge rep-
resentations which we introduce and promote in this paper play a key role. The
exemplified robot application shows how abstraction principles empower intel-
ligent agents to transfer decision processes, thereby being able to cope with
unfamiliar situations. Put differently, aspectualizable knowledge representations
enable knowledge transfer.
This paper is organized as follows: In Section 2 we give our definition of the
spatial abstraction paradigms and discuss the role of abstraction in knowledge
representation and its utility in agent control tasks. Section 3 covers the case
study of learning navigational behavior in simulation and transferring it to a
real robot. The paper ends with a discussion of formal approaches to spatial
abstraction and their utility (Section 4) and a conclusion.

2 A Formal View on Facets of Abstraction

The term abstraction is etymologically derived from the Latin words “abs” and
“trahere”, so the literal meaning is “drawing away”. However, if we talk about
abstraction in the context of information processing and cognitive science, ab-
straction covers more than just taking away something, because it is not intended
merely to reduce the amount of data. Rather, abstraction is employed to put the
focus on the relevant information. Additionally, the result is supposed to gen-
eralize and to be useful for a specific task at hand. We define abstraction as
follows:

Definition 1. Abstraction is the process or the result of reducing the informa-
tion of a given observation in order to achieve a classification that omits all
information that is irrelevant for a particular purpose.

We first concentrate on information reduction. Let us say that all potential values
of a knowledge representation are elements of a set S which can be regarded as
a Cartesian product of features from different domains: S = D1 × D2 × . . . × Dn .
We call s = (s1 , . . . , sn ) ∈ S a feature vector, and every si is a feature. S is also
called state space and its elements are states.
Abstraction is a non-injective function κ : S → T mapping the source space S
to a target set T . Non-injectiveness is important as otherwise no reduction (n:1-
mapping) is possible. In the case of S being finite it holds that |S| > |Image(κ)|.
Without loss of generality we will assume in the following, simply to ease read-
ability, that all domains D are of the same kind: S = Dn .
In the following we will formally classify abstraction into three different cat-
egories: aspectualization, coarsening, and conceptual classification.

2.1 Aspectualization
Aspects are semantic concepts. They are pieces of information that represent
certain properties. For example, if we record the trajectory of a moving robot,
we have a spatio-temporal data set denoting at what time the robot visited which
place. Time and place are two different aspects of this data set. Aspectualization
singles out such aspects.

Definition 2. Aspectualization is the process or result of explicating certain as-
pects of an observation purely by eliminating the others. Formally, it is defined
as a function κ : Dn → Dm (n, m ∈ N, n > m):

κ(s1 , s2 , . . . , sn ) = (si1 , si2 , . . . , sim ), ik ∈ [1, n], ik < ik+1 ∀k .

Thus, aspectualization projects Dn to Dm .

Example 1. An oriented line segment s in the plane is represented as a point
(x, y) ∈ R2 , a direction θ ∈ [0, 2π], and a length l ∈ R: s = (x, y, θ, l). The
reduction of this line segment to an oriented point is an aspectualization with
κa (x, y, θ, l) = (x, y, θ).
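As a minimal illustration (a Python sketch; the names are ours, not the authors'), aspectualization is just a projection onto a subset of the feature-vector components:

def aspectualize(s, keep):
    # keep: strictly increasing component indices to retain (Def. 2)
    return tuple(s[i] for i in keep)

segment = (2.0, 3.0, 1.57, 5.0)                    # s = (x, y, theta, l) as in Example 1
oriented_point = aspectualize(segment, (0, 1, 2))  # -> (x, y, theta)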

Aspects may span over several features si . However, to be able to single out an
aspect from a feature vector by aspectualization, it must be guaranteed that no
feature refers to more than one aspect. We call this property aspectualizability:

Definition 3. If an aspect is exclusively represented by one or more components
of a feature vector s ∈ S (that is: no si refers to more than one aspect), then we
call S aspectualizable regarding this aspect.

Example 2. The oriented line segment representation in Example 1 can be bijec-
tively mapped from point, angle, and length to two points (x1 , y1 , x2 , y2 ). Then
aspectualization as defined in Def. 2 cannot single out the length of the line
segment, because length is not represented explicitly. S is not aspectualizable
regarding length.

Fig. 1. Iconographic illustration of the three abstraction principles aspectualization,
coarsening, and conceptual classification applied to the same representation

2.2 Coarsening
When the set of values a feature can take is reduced, we speak of a coarsening:
Definition 4. Coarsening is the process or result of reducing the details of in-
formation of an observation by lowering the granularity of the input space. For-
mally, it is defined as a function κ : Dn → Dn (n ∈ N),

κ(s) = (κ1 (s1 ), κ2 (s2 ), . . . , κn (sn ))

with κi : D → D and at least one κi being not injective.


The existence of a non-injective κi ensures that we have an abstraction.
Example 3. An important representation in the area of robot navigation is the
occupancy grid [8], a partition of 2-D or 3-D space into a set of discrete grid
cells. A function κ : R2 → R2 , κ(x, y) = (⌊x⌋, ⌊y⌋) is a coarsening that maps any
coordinate to a grid cell of an occupancy grid.
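A small sketch of this coarsening (Python; the unit grid-cell size is an assumption made for illustration):

import math

def coarsen_to_grid(x, y, cell_size=1.0):
    # map a metric coordinate to the occupancy-grid cell containing it
    return (math.floor(x / cell_size), math.floor(y / cell_size))

# coarsen_to_grid(3.7, -1.2) -> (3, -2)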

2.3 Conceptual Classification


Conceptual classification is the most general of the three proposed abstraction
facets. It can utilize all components of the input to build new entities:

Definition 5. Conceptual classification abstracts information by grouping se-
mantically related features to form new abstract entities. Formally, it is defined
as a non-injective function κ : Dn → Dm (m, n ∈ N),

κ(s1 , s2 , . . . , sn ) = (κ1 (s1,1 , s1,2 , . . . , s1,h1 ), κ2 (s2,1 , s2,2 , . . . , s2,h2 ), . . . ,
                           κm (sm,1 , sm,2 , . . . , sm,hm ))

with κi : D^hi → D and hi ∈ {1, . . . , n}, whereby i ∈ {1, . . . , m}.


Conceptual classification subsumes the other two abstraction concepts: If all κi
have the form κi : D → D and m = n, a conceptual classification is a coarsening;
and if all κi have the form κi (sj ) = sj , i ≤ j, m < n, and κi = κj ⇒ i = j, then
a conceptual classification becomes an aspectualization.
Example 4. Data gathered from a laser range finder comes as a vector of distance
values and angles to obstacles in the local surrounding, which can be represented
as 2-D points in a relative coordinate frame around the sensor. Abstraction of
these points to line segments by the means of a line detection algorithm (as, for
example, described in [9]) is a conceptual classification.
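The split step of such a line detection can be sketched as follows (a simplified Python version of the split-and-merge idea; the merge step and the exact parameters of [9] are omitted):

import math

def split(points, threshold=0.05):
    # recursively split a sequence of 2-D points at the point farthest from the
    # chord until every deviation is below the threshold; returns line segments
    # as (start_point, end_point) pairs
    def dist_to_chord(p, a, b):
        (px, py), (ax, ay), (bx, by) = p, a, b
        den = math.hypot(bx - ax, by - ay)
        if den == 0:
            return math.hypot(px - ax, py - ay)
        return abs((by - ay) * px - (bx - ax) * py + bx * ay - by * ax) / den

    if len(points) < 3:
        return [(points[0], points[-1])]
    a, b = points[0], points[-1]
    dists = [dist_to_chord(p, a, b) for p in points[1:-1]]
    i = max(range(len(dists)), key=dists.__getitem__) + 1   # index into points
    if dists[i - 1] <= threshold:
        return [(a, b)]                                     # one segment suffices
    return split(points[:i + 1], threshold) + split(points[i:], threshold)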
To sum up, Fig. 1 illustrates aspectualization, coarsening, and conceptual clas-
sification in an iconographic way.

2.4 Abstraction and Representation


Intuitively, aspectualization and coarsening describe two very different processes:
The first one reduces the number of features of the input, the latter one the variety
of instances for every single feature. While aspectualization necessarily reduces the
dimensionality of a representation, coarsening preserves dimensionality.
Depending on the representation of the feature vector, coarsening can pro-
duce a result that is equivalent to an aspectualization though: Let one or more
mappings κi in a coarsening be defined as mappings to a single constant value:
κi = ci , ci ∈ D. Assume all other mappings κi to be the identity function. Now,
consider an aspectualization that retains exactly the components not mapped
to single constant values ci by the coarsening. Obviously, this aspectualization
has a canonical embedding in the result of the coarsening. We illustrate this by
an example:

Example 5. As in Example 1, an oriented line segment in the plane is represented
as a point (x, y) ∈ R2 , a direction θ ∈ [0, 2π], and a length l ∈ R: s = (x, y, θ, l).
κa is defined as in Example 1. The function κc (x, y, θ, l) = (x, y, θ, 1) is a coars-
ening, and it trivially holds:

{κc (x, y, θ, l)} = {(x, y, θ, 1)} ≅ {(x, y, θ)} = {κa (x, y, θ, l)}

It is also possible to transform a knowledge representation such that a coarsening
can be expressed by an aspectualization. For example, this is the case when
abstraction operates on a group:

Theorem 1. If κc is a coarsening on a group, for example (S, +), then there
exists an isomorphism ϕ and an aspectualization κa such that the following di-
agram commutes (i.e., κa ∘ ϕ = κc ):

              ϕ
        S ────────→ S′
          \         |
       κc  \        | κa
            ↘       ↓
                T

Proof. Choose ϕ(s) = (s + κc (s), κc (s)), ϕ⁻¹(t1 , t2 ) = t1 + (−t2 ) and κa (t1 , t2 ) =
t2 , and define (S′, ⊕) with S′ = Image(ϕ) and t ⊕ u = ϕ(ϕ⁻¹(t) + ϕ⁻¹(u))
for each t, u ∈ S′. Checking that (S′, ⊕) is a group and ϕ a homomorphism is
straightforward. □

We illustrate this theorem by the following example:

Example 6. Coordinates (x, y) ∈ R2 can be bijectively mapped to a represen-
tation (⌊x⌋, x − ⌊x⌋, ⌊y⌋, y − ⌊y⌋) which features the decimal places separately.
The function κ(⌊x⌋, x − ⌊x⌋, ⌊y⌋, y − ⌊y⌋) = (⌊x⌋, ⌊y⌋) is an aspectualization
with the same result as the coarsening in Example 3.

Note that Theorem 1 does not introduce additional redundancy into the rep-
resentation. If we allowed the introduction of redundancy, we could bijectively
create new representations by concatenating s and an arbitrary abstraction κ(s)
with the effect that any abstraction, including conceptual classification, can
always be achieved by an aspectualization from this representation. Therefore,
we do not regard this kind of redundancy here.
Not every representation allows a given abstraction to be expressed as a coarsening, as the following example shows:

Example 7. Commercial rounding is defined by a function f : R+0 → R+0 , f (x) =
⌊x + 0.5⌋. f is a coarsening. If, similar to Example 6, x ∈ R+0 is represented as
(⌊x⌋, x − ⌊x⌋), then commercial rounding can neither be expressed by aspectual-
ization (because the representation is not aspectualizable regarding this rounding)
nor by coarsening (because the abstraction function operates on both components
⌊x⌋ and x − ⌊x⌋ of the feature vector, which contradicts Def. 4). So even if com-
mercial rounding reduces the number of instances in half of the components, the
example above cannot be expressed as a coarsening under this representation fol-
lowing Def. 4. It then must be seen as a conceptual classification, which is the most
general of the three facets of abstraction presented here.

Different abstraction paradigms, even if describing distinct processes, can thus
lead to the same result: Applicability of a specific abstraction principle relies
heavily on the given representation, and usually different types of abstraction
can be utilized to achieve the same result. Thus, the choice of an abstraction
paradigm is tightly coupled with the choice of the state space representation. In
the following we will argue for an action-centered view for choosing appropriate
representations.

2.5 Abstraction in Agent Control Processes


Abstraction, as we define it, is not a blind reduction of information, but comes
with a particular purpose. It is applied to ease solving a specific problem, and
the concrete choice of abstraction is implied by the approach to master the task.
If we want to utilize spatial abstraction in the context of agent control tasks,
we try to reach three goals:
1. Significantly reducing the size of the state space the agent is operating in
2. Eliminating unnecessary details in the state representation
3. Subsuming similar states to unique concepts
The first goal, reducing state space size, is a mandatory consequence of the
latter two, which must be seen in the context of action selection: The question
whether a detail is “unnecessary” or whether two states are “similar” depends
on the task of the agent:
– A detail is considered unnecessary if its existence does not affect the action
selection of the agent.
– Two states are considered to be similar if the agent should select the same
action in any of the states.
This action centered view expands classical definitions of similarity, as it is for
example given by Fred Roberts [10]: Two states s and s′ are indistinguishable
(written s ∼ s′) if there is a mapping f : S → R and an ε ∈ R+ with s ∼ s′ ⇔
|f (s) − f (s′)| < ε. Roberts’ concept is data driven whereas ours is action driven
in order to account for the task at hand. States may be very near concerning
a certain measure, but nevertheless require different actions to take in certain
contexts. Grid based approaches, achieved by coarsening, easily bear the danger
of not being able to provide an appropriate state separation due to missing
resolution in critical areas of the state space. Furthermore, a “nearness” concept
as presented by Roberts is again a matter of representation and may only be
appropriate in homogeneous environments.
Abstraction shall ease the agent’s action selection process. If those details are
eliminated that are irrelevant for the choice of an action, difficulty and process-
ing time of action selection is reduced, and action selection strategies may be
applicable to a broader range of scenarios.
When choosing an abstraction paradigm for a given data set, the result must
be regarded in the context of accessibility of information. The goal of abstraction
must be to enable easy access to the relevant information. Which piece of infor-
mation is relevant, of course, depends on the task at hand: A computer-driven
navigation control may require different concepts than a system interacting with
a human being. Abstraction retains information that is relevant for a certain
purpose. Therefore, it can never be regarded as purely data driven, but requires
a solid a-priori concept of the problem to solve and, consequently, the actions to
take.
As we have seen in Section 2.4, we can use different abstraction paradigms to
achieve the same effect, given an appropriate state space representation. From
the view point of accessibility we now argue for preferring the use of aspec-
tualizable representations, as relevant aspects are clearly separated and easy to
access and aspectualization itself is a computationally simple process. Accessibil-
ity eases knowledge extraction: Section 3.3 will show an example of an algorithm
that makes use of the aspectualizability of a representation. Once again, aspectu-
alizability can be achieved by abstraction. In particular, conceptual classification
is a powerful means. So abstraction helps to create representations that allow
for distinguishing different aspects by using aspectualization.

3 Knowledge Transfer of Simulation Strategies to a Real Robot
In this section we show how abstraction supports knowledge transfer. We regard
the problem of transferring navigation skills learned with reinforcement learning
(RL) [11] in a simple simulator to a real robot and demonstrate that on the one
hand abstraction allows us to cope with real world concepts in the same way as
with simulated ones and on the other hand that the transfer of knowledge benefits
from aspectualizability of the state space representation. The key benefit of this
approach is that learning is much more efficient in simple simulations than in real
environments or complex simulations thereof. In particular, we show that the use
of an aspectualizable representation empowers the derivation of aspectualizable
behavior that is key to successful knowledge transfer.

3.1 The Task


The task considered here is the following: A simulated robot shall learn to
find a specified location s∗ ∈ S within an unknown environment (see Fig. 2 left
for a view of the simulation testbed).
This scenario is formalized as a Markov Decision Process (MDP) ⟨S, A, T, R⟩
with a continuous state space S = {(x, y, θ) | x, y ∈ R, θ ∈ [0, 2π)} where
each system state is given by the robot’s position (x, y) and an orientation θ,
an action space A of navigational actions the agent can perform to move to
another state, a transition function T : S × A × S → [0, 1] denoting a probability
distribution that performing an action a at state s will result in state s′, and
a reward function R : S → R, where a positive reward will be given when a
goal state s∗ ∈ S is reached and a negative one if the agent collides with an
obstacle. A solution to this MDP is a policy π(s) that assigns an action to take
to any state s. RL is a frequently used method to compute such a strategy. After
successful learning, following π in a greedy way will now bring the robot from
every position in the world to the goal location s∗ .
In general, this strategy π is bound to the goal state the agent learned to
reach. For mastering another task only differing in s∗ , the whole strategy would
need to be re-learned from scratch, including low-level skills as turning actions
and collision avoidance—the knowledge gained in the first learning task would
not be applicable. The challenge of avoiding this and re-using parts of the gained

Fig. 2. Left: a screenshot of the robot navigation scenario in the simulator, where the
strategy is learned. Right: a Pioneer 2 in an office building, where the strategy shall
be applied. The real office environment offers structural elements not present in the
simulator: open space, uneven walls, tables, and other obstacles.

knowledge of a learning task and transferring it to another one has recently been
labeled transfer learning, and several approaches have been proposed to tackle
this problem [12,13,14, e.g.]. We will describe how such transfer capabilities can
be achieved by spatial state space abstraction and we will point out how abstrac-
tion mechanisms allow for knowledge transfer in a more general sense: Learned
navigation knowledge is not only transferable to a similar task with another
goal location, but abstraction allows us to operate on the same abstract entities
in quite different tasks. We will show that the spatial state space abstraction
approach even allows for bridging the gap between results gained in a simple
simulator and real robotics just by the use of spatial abstraction.

3.2 Learning a Policy in Simulation

In our simulation scenario, the robot is able to perceive walls around it as line
segments within a certain maximum range. This perception is disturbed by noise
such that every line segment is detected as several smaller ones. The agent can
also identify the walls. In our simulator, this is modeled in a way that every wall
has a unique color and the agent perceives the color of the wall. The robot is
capable of performing three actions: moving forward and turning a few degrees
either to the left or to the right. Turning includes a small forward movement;
and some noise is added to all actions. There is no built-in collision avoidance
or any other navigational intelligence provided.
For learning we use the reinforcement learning paradigm of Q-learning [15].
The result is a Q-function that assigns an expected overall reward to any state-
action pair (s, a) and a policy π(s) = argmaxa Q(s, a) that delivers the action
with the highest expected reward for every state s.
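For reference, one episode of generic tabular Q-learning has the following shape (a Python sketch, not the learner of [15] or the exact setup used here; the environment interface reset()/step() is an assumption):

import random
from collections import defaultdict

def q_learning_episode(env, Q, actions, alpha=0.1, gamma=0.95, epsilon=0.1):
    # Q: defaultdict(float) mapping (state, action) -> expected return
    s = env.reset()
    done = False
    while not done:
        if random.random() < epsilon:          # epsilon-greedy exploration
            a = random.choice(actions)
        else:
            a = max(actions, key=lambda b: Q[(s, b)])
        s_next, reward, done = env.step(a)
        best_next = max(Q[(s_next, b)] for b in actions)
        # standard Q-learning update
        Q[(s, a)] += alpha * (reward + gamma * best_next - Q[(s, a)])
        s = s_next
    return Q

# after learning: policy(s) = max(actions, key=lambda a: Q[(s, a)])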


Fig. 3. Neighboring regions around the robot in relation to its moving direction. Note
that the regions R1 , . . . , R5 in the immediate surroundings (b) overlap R10 , . . . , R16
(a). The size of the grid defining the immediate surroundings is given a-priori. It is
a property of the agent and depends on its size and system dynamics (for example,
the robot’s maximal speed). In this work, only the thick drawn boundaries in (a) are
regarded for building the representation.

This learning task is a complex one, because the underlying state space is
large and continuous, and reinforcement learning processes are known to suffer
from performance problems under these conditions. Thrun and Schwartz stated
that for being able to adapt RL to complex tasks it is necessary to discover the
structure of the world and to abstract from its details [16]. In any case, a sensible
reduction of the state space will be beneficial for any RL application.
To achieve that structural abstraction, we make use of the observation that
navigation in space can be divided into two different aspects: Goal-directed be-
havior towards a task-specific target location, and generally sensible behavior
that is task-independent and the same in any environment [17]. According to
[12], we refer to the first as problem space and to the latter as agent space. It is
especially agent space that encodes structural information about the world that
persists in any learning task and therefore this knowledge is worth transferring
to different scenarios.
The structure of office environments as depicted in Fig. 2 is usually character-
ized by walls, which can be abstracted as line segments in the plane. Even more
it is relative position information of line segments with respect to the robot’s
moving direction that defines structural paths in the world and leads to sensible
action sequences for a moving agent. Thus, for encoding agent space, we use the
qualitative representation RLPR (Relative Line Position Representation) [17].
Inspired by the “direction relation matrix” [18], the space around the agent is
partitioned into bounded and unbounded regions Ri (see Fig. 3). Two functions
τ : N → {0, 1} and τ′ : N → {0, 1} are defined: τ (i) denotes whether there is a
line segment detected within a region Ri and τ′(i) denotes whether a line spans
from a neighboring region Ri+1 to Ri . τ is used for bounded sectors in the im-
mediate vicinity of the agent (R1 to R5 in Fig. 3(b)). Objects that appear there
have to be avoided in any case. The position of detected line segments in R10 to
R16 (Fig. 3(a)) is helpful information to be used for general orientation and mid-
term planning, so τ′ is used for R10 to R16 . This abstraction from line segments
in the simulator to a vector of RLPR values is a conceptual classification.
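Assuming the geometric region tests have already been performed, this conceptual classification can be sketched as follows (Python; the mapping from region index to intersecting segment ids and the neighbor indexing of the outer regions are simplifying assumptions of ours):

def rlpr_vector(segments_in_region, inner=range(1, 6), outer=range(10, 17)):
    # segments_in_region: region index i -> set of segment ids intersecting R_i
    def tau(i):                       # is anything detected in R_i?
        return 1 if segments_in_region.get(i) else 0

    def tau_prime(i):                 # does a segment span R_{i+1} and R_i?
        return 1 if (segments_in_region.get(i, set())
                     & segments_in_region.get(i + 1, set())) else 0

    return tuple(tau(i) for i in inner) + tuple(tau_prime(i) for i in outer)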

Fig. 4. Abstraction principles used to build an aspectualizable state space representation

For representing problem space it is sufficient to encode the qualitative posi-
tion of the agent within the world. We do this by representing a circular order
of detected landmarks, as for example proposed in [19]. Therefore we regard a
of detected landmarks, as for example proposed in [19]. Therefore we regard a
sequence of detected wall colors ci at seven discrete angles around the robot:
ψl (s) = (c1 , . . . , c7 ).
As suggested in Section 2.5, we now use these two conceptual classifications
to create an aspectualizable state space representation by concatenating ψl and
ψr . The result ψ(s) is the landmark-enriched RLPR representation:

ψ(s) = (ψl (s), ψr (s)) = (c1 , . . . , c7 , τ (R1 ), . . . , τ (R5 ), τ′(R10 ), . . . , τ′(R16 ))

We call the new emerging state space O = Image(ψ) the observation space. It
is a comparably small and discrete state space, fulfilling the three goals of ab-
straction we defined in Section 2.5. The RLPR based approach has been shown
to outperform metrical representations that rely on distances or absolute coor-
dinates with regard to learning speed and robustness [17]. For an example of
deriving RLPR values refer to Fig. 5.
So conceptual classification is employed twice for both problem and agent
space to create a compact state space representation. ψ(s) is aspectualizable
regarding the two aspects of navigation (see Fig.4). Let us now investigate how
to take advantage of that to transfer general navigation knowledge to a new task.
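In code, the concatenation and the aspectualization that later singles out the agent-space part are trivial (an illustrative Python sketch, not the project's implementation):

def psi(landmark_colors, rlpr):
    # problem-space part psi_l = (c1, ..., c7), agent-space part psi_r = RLPR vector
    return tuple(landmark_colors) + tuple(rlpr)

def agent_space(observation, num_landmarks=7):
    # aspectualization: drop the leading landmark components, keep psi_r
    return observation[num_landmarks:]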

3.3 Extracting General Navigation Behavior

The learning process results in a policy π : O → A that maps any o ∈ O to


an action to take when the agent observes o. The corresponding Q-values are
stored in a lookup table. We now want to apply this policy to a totally different
domain, the real world, where we cannot recognize the landmarks encountered
during learning. So the policy must provide sensible actions to take in the absence
of known landmarks. This is the behavior that refers to the aspect of general
navigation behavior or agent space. It has to be singled out from π.
By design, ψ(s) is aspectualizable with regard to agent space, and the desired
information is easily accessible. An aspectualization κ(o) = κ(ψl (s), ψr (s)) =
ψr (s) provides structural world information for any observation. That is, struc-
turally identical situations share an identical RLPR representation.
A new Q-function Qπ′ for a general, aspectualized policy π′ for arbitrary states
with the same aspect ψr (s) can be constructed by Q-value averaging over states
with identical ψr (s), which are easily accessible because of the aspectualizability
of O. Given a learned policy π with a value function Qπ (o, a) (o ∈ O, a ∈ A), we
construct a new policy π′ with Qπ′ (o′, a) (o′ ∈ O′, a ∈ A) in a new observation
space O′ = Image(ψr ), with the following function [20]:

               Σ c∈{ψl (s)} ( maxb∈A |Qπ ((c, o′), b)| )⁻¹ · Qπ ((c, o′), a)
Qπ′ (o′, a) = ──────────────────────────────────────────────────────────────
                      |{ ((c, o′), a) | Qπ ((c, o′), a) ≠ 0 }|

This is a weighted sum over all possible landmark observations (in reality, of
course, only the visited states have to be considered, because Q(o, a) = 0 for
the others, so the computational effort is very low). It is averaged over all state-
action pairs where the information is available, that is, the Q-value is not zero.
A weighting factor scales all values according to the maximum reward over all
actions.
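A direct transcription of this averaging might look as follows (a Python sketch; the Q-table layout ((c, o'), a) -> value is an assumption of ours):

from collections import defaultdict

def aspectualized_q(Q, actions):
    # collapse the landmark component c and average the remaining Q-values,
    # weighting each by the inverse of its maximum absolute Q-value
    sums, counts = defaultdict(float), defaultdict(int)
    for (c, o_r), a in Q:
        q = Q[((c, o_r), a)]
        if q == 0:
            continue                   # only state-action pairs actually visited
        weight = 1.0 / max(abs(Q.get(((c, o_r), b), 0.0)) for b in actions)
        sums[(o_r, a)] += weight * q
        counts[(o_r, a)] += 1
    return {key: sums[key] / counts[key] for key in sums}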
This procedure has been applied to a policy learned in the simulated environ-
ment depicted in Fig. 2 for 40,000 learning episodes. For the exact experimental
conditions of learning the policy in simulation refer to [20]. The resulting policy
Qπ can then be used to control a real robot as shown in the following section.

3.4 Using the Aspectualized Strategy on a Mobile Robot


Controlling a mobile robot with a strategy learned in simulation requires sensory
input to be mapped to the same domain as used in the simulation. This can be
accomplished in a straightforward manner given that an abstract intermediate
representation is constructed on the robot from raw sensor data. We now detail
this approach using a Pioneer-2 type robot equipped with a laser range finder.
Laser range finders detect obstacles around the robot (the field of view of the
sensor used on our robot is 180°). By measuring laser beams reflected by obstacles
one obtains a sequence of (in our case) 361 points in local coordinates. We use
the well-known iterative split-and-merge algorithm that is commonly used in
robotics to fit lines to scan data (see [9])—another conceptual classification, as
we have seen in Example 4. With respect to parameters of the procedure we
point out that a precise line fitting is not required [20]. Rather, we want to make
sure that all obstacles detected by the laser range scanners get represented by
lines, even if this is a crude approximation. The detected line configuration is
then mapped every 0.25 seconds to the RLPR representation and fed into the
learned strategy to obtain the action primitive to perform. See Fig. 5 for a view

Fig. 5. Screenshot: Abstraction to RLPR in the robot controller. Depicted are the
qualitative regions (see Fig. 3) and the interpreted sensor data which has been ac-
quired from the robot position shown in Fig. 2 right. The overall representation for
this configuration is ψr (s) = {1, 0, 0, 1, 0, 1, 0, 0, 0, 1, 1}.

on the line detection of the laser range finder data and the corresponding RLPR
representation. Fig. 6 gives an overview on the development of representations
in both simulator and robot application.
In the simulation three action primitives (straight on, turn left, turn right) have
been used that always move the robot some fixed distance. Rather than implement-
ing this step-wise motion on the real robot, we mapped the actions to commands
controlling the wheel speeds in order to obtain continuous motion. Additionally,
movement is smoothed by averaging the most recent wheel speed commands to
avoid strong acceleration/deceleration which the robot drive cannot handle well.
We applied the averaging to the last 8 actions which (given the 0.25 second inter-
val of wheel commands) yields a time of 2 seconds before reaching the wheel speed
associated with the action primitive. In accordance with the robot’s size and motion
dynamics the inner regions of the RLPR grid (Fig.3(b)) have been set to 60 cm in
front and both 30 cm to the left and the right of the robot.
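The smoothing can be sketched with a small ring buffer (Python; the wheel-speed values per action primitive are illustrative, not the ones used on the robot):

from collections import deque

class SmoothedDrive:
    # average the last 8 wheel-speed commands (one every 0.25 s), so the robot
    # ramps up to an action primitive's speed over roughly 2 seconds
    SPEEDS = {"forward": (0.3, 0.3), "left": (0.1, 0.3), "right": (0.3, 0.1)}

    def __init__(self, window=8):
        self.history = deque(maxlen=window)

    def command(self, action):
        self.history.append(self.SPEEDS[action])
        n = len(self.history)
        left = sum(v[0] for v in self.history) / n
        right = sum(v[1] for v in self.history) / n
        return left, right        # wheel speeds sent to the robot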
We analyzed the behavior of the Pioneer 2 robot with the learned policy in our
office environment. In contrast to the simple simulation environment the office
environment presents uneven walls, open spaces of several meters, plants, and
furniture like a sofa or bistro tables. The robot showed a reasonable navigation be-
havior, following corridors in a straight line and turning smoothly around curves. It also
showed the ability to cope with structural elements not present in the simulated envi-
ronment, such as open space or tiny obstacles. In other words, general navigation
skills learned in simulation have been transferred to the real-world environment.
The robot only got stuck when reaching areas with a huge amount of clutter
Fig. 6. Evolution of spatial representations in both simulation and real robot applica-
tion. Abstraction techniques enable both scenarios to operate on the RLPR represen-
tation to achieve a reasonable action selection.

Fig. 7. Pioneer 2 entering an open space, using the aspectualized policy learned in
the simulator. It shows a reasonable navigation behavior in a real office environment,
driving smoothly forward and safely around obstacles.

(such as hanging leaves of plants) and in dead ends where the available motion
primitives do not allow for collision-free movement anymore. Because the original
task was goal-oriented (searching for a specific place), the robot also showed a strong
tendency to move forward and thus actively explore the environment instead
of just avoiding obstacles. This generally sensible navigation behavior could now,
for example, be used as a basis for learning new goal-oriented tasks on the
robotics platform. Fig. 7 gives an impression of the robot experiment.

4 Discussion
Performing abstraction is a fundamental ability of intelligent agents and different
facets of abstraction have thus been treated in previous work, addressing various
scientific fields and considering a rich diversity of tasks. First, we comment on
a critical remark by Klippel et al.: In their thorough study on schematization,
they state that “there is no consistent approach to model schematization” [4].
We believe that by our formal definitions of abstraction principles the manifold
terms used to describe abstraction can very well be classified and related.
The insight that abstraction can be divided into different categories has
been mentioned before. Stell and Worboys present a distinction of what they
call “selection” and “amalgamation” and formalize these concepts for graph
structures [6]. Our definition of aspectualization and coarsening corresponds to
selection and amalgamation, which Stell and Worboys describe as being “con-
ceptually distinct” types of generalization. Regarding this, we pointed out that
this conceptual distinctness does only apply to the process of abstraction and
not the result, as we could show that the effect of different abstraction paradigms
critically depends on the choice of the initial state space representation.
Bertel et al. also differentiate between different facets of abstraction (“as-
pectualization versus specificity”, “aspectualization versus concreteness”, and
“aspectualization versus integration”), but without giving an exact definition [7].
“Aspectualization versus specificity” corresponds to our definition of aspectu-
alization, and “aspectualization versus concreteness” to coarsening. However,
our definition of aspectualization is tighter than the one given by Bertel et al.:
According to them, aspectualization is “the reduction of problem complexity
through the reduction of the number of feature dimensions”. In our definition,
it is also required that all the other components remain unchanged.
The notion of schematization, which Leonard Talmy describes as “a process
that involves the systematic selection of certain aspects of a referent scene to
represent the whole disregarding the remaining aspects” [21] is tightly connected
to our definition of aspectualization. If we assume the referent scene to be as-
pectualizable according to Def. 3, then the process mentioned by Talmy is as-
pectualization as defined here.
Annette Herskovits defines the term schematization in the context of linguis-
tics as consisting of three different processes, namely abstraction, idealization,
and selection [5]. According to our definition, abstraction and selection would
both be aspectualizations, while idealization corresponds to coarsening.
The action-centered view on abstraction we introduced in Section 2.5 is also
shared by the definition of categorizability given by Porta and Celaya [22]. The
authors call an environment categorizable if “a reduced fraction of the available
inputs and actuators have to be considered at a time”. In other words: in a
categorizable environment, an abstraction can be achieved that subsumes states
with identical action selection under identical representations.

5 Conclusion
In this article we classify abstraction by three distinct principles: aspectualiza-
tion, coarsening, and conceptual classification. We give a formal definition of
these principles for classifying and clarifying the manifold concept names for
abstraction found in the literature. This enables us to show that knowledge rep-
resentation is of critical importance and thus must be addressed in any discus-
sion of abstraction. Identical information may be represented differently, and, by
choosing a specific representation, different types of abstraction processes may
be applicable and lead to an identical result. Also, as abstraction is triggered by
the need to perform a certain task, it can never be regarded as purely data-driven;
it requires a solid a priori concept of the problem to be solved and, consequently,
of the actions to take.
We introduce the notion of aspectualizability in knowledge representations. As-
pectualizable knowledge representations are key to enabling knowledge transfer.
By designing an aspectualizable representation, it is possible to transfer naviga-
tion knowledge learned in a simplified simulation to a real-world robot setting.
Acknowledgments. This work was supported by the DFG Transregional Collab-
orative Research Center SFB/TR 8 “Spatial Cognition” (project R3-[Q-Shape]).
Funding by the German Research Foundation (DFG) is gratefully acknowledged.
The authors would like to thank Jan Oliver Wallgrün, Frank Dylla, and Jae Hae
Lee for inspiring discussions. We also thank the anonymous reviewers for pointing
us to further literature from different research communities.

References
1. Hobbs, J.R.: Granularity. In: Proceedings of the Ninth International Joint Confer-
ence on Artificial Intelligence (IJCAI), pp. 432–435 (1985)
2. Bittner, T., Smith, B.: A taxonomy of granular partitions. In: Montello, D. (ed.)
Spatial Information Theory: Cognitive and Computational Foundations of Geo-
graphic Information Science (COSIT), pp. 28–43. Springer, Berlin (2001)
3. Mackaness, W.A., Chaudhry, O.: Generalization and symbolization. In: Shekhar,
S., Xiong, H. (eds.) Encyclopedia of GIS (2008)
4. Klippel, A., Richter, K.F., Barkowsky, T., Freksa, C.: The cognitive reality of
schematic maps. In: Meng, L., Zipf, A., Reichenbacher, T. (eds.) Map-based Mobile
Services – Theories, Methods and Implementations, pp. 57–74. Springer, Berlin
(2005)
5. Herskovits, A.: Schematization. In: Olivier, P., Gapp, K.P. (eds.) Representation
and Processing of Spatial Expressions, pp. 149–162. Lawrence Erlbaum Associates,
Mahwah (1998)
6. Stell, J.G., Worboys, M.F.: Generalizing graphs using amalgamation and selection.
In: Güting, R.H., Papadias, D., Lochovsky, F. (eds.) SSD 1999. LNCS, vol. 1651,
pp. 19–32. Springer, Heidelberg (1999)

7. Bertel, S., Vrachliotis, G., Freksa, C.: Aspect-oriented building design: Toward
computer-aided approaches to solving spatial constraint problems in architecture.
In: Allen, G.L. (ed.) Applied Spatial Cognition: From Research to Cognitive Tech-
nology, pp. 75–102. Lawrence Erlbaum Associates, Mahwah (2007)
8. Moravec, H.P., Elfes, A.E.: High resolution maps from wide angle sonar. In:
Proceedings of the IEEE International Conference on Robotics and Automation
(ICRA), St. Louis, MO (1985)
9. Gutmann, J.S., Weigel, T., Nebel, B.: A fast, accurate and robust method for self-
localization in polygonal environments using laser range finders. Advanced Robot-
ics 14(8), 651–667 (2001)
10. Roberts, F.S.: Tolerance geometry. Notre Dame Journal of Formal Logic 14(1),
68–76 (1973)
11. Sutton, R.S., Barto, A.G.: Reinforcement learning: an introduction. In: Adaptive
Computation and Machine Learning. MIT Press, Cambridge (1998)
12. Konidaris, G.D., Barto, A.G.: Building portable options: Skill transfer in reinforce-
ment learning. In: Proceedings of the Twentieth International Joint Conference on
Artificial Intelligence (IJCAI) (2007)
13. Taylor, M.E., Stone, P.: Cross-domain transfer for reinforcement learning. In: Pro-
ceedings of the Twenty Fourth International Conference on Machine Learning
(ICML 2007), Corvallis, Oregon (2007)
14. Torrey, L., Shavlik, J., Walker, T., Maclin, R.: Skill acquisition via transfer learning
and advice taking. In: Fürnkranz, J., Scheffer, T., Spiliopoulou, M. (eds.) ECML
2006. LNCS (LNAI), vol. 4212, pp. 425–436. Springer, Heidelberg (2006)
15. Watkins, C., Dayan, P.: Q-learning. Machine Learning 8, 279–292 (1992)
16. Thrun, S., Schwartz, A.: Finding structure in reinforcement learning. In: Tesauro,
G., Touretzky, D., Leen, T. (eds.) Advances in Neural Information Processing Sys-
tems: Proceedings of the 1994 Conference, vol. 7. MIT Press, Cambridge (1995)
17. Frommberger, L.: A generalizing spatial representation for robot navigation with
reinforcement learning. In: Proceedings of the Twentieth International Florida Ar-
tificial Intelligence Research Society Conference (FLAIRS 2007), Key West, FL,
USA, pp. 586–591. AAAI Press, Menlo Park (2007)
18. Goyal, R.K., Egenhofer, M.J.: Consistent queries over cardinal directions across
different levels of detail. In: Tjoa, A.M., Wagner, R., Al-Zobaidie, A. (eds.) Pro-
ceedings of the 11th International Workshop on Database and Expert System Ap-
plications, Greenwich, UK, pp. 867–880 (2000)
19. Schlieder, C.: Representing visible locations for qualitative navigation. In: Car-
rete, N.P., Singh, M.G. (eds.) Qualitative Reasoning and Decision Technologies,
Barcelona, Spain, pp. 523–532 (1993)
20. Frommberger, L.: Generalization and transfer learning in noise-affected robot nav-
igation tasks. In: Neves, J., Santos, M.F., Machado, J.M. (eds.) EPIA 2007. LNCS
(LNAI), vol. 4874, pp. 508–519. Springer, Heidelberg (2007)
21. Talmy, L.: How language structures space. In: Pick Jr., H.L., Acredolo, L.P. (eds.)
Spatial Orientation: Theory, Research, and Application. Plenum, New York (1983)
22. Porta, J.M., Celaya, E.: Reinforcement learning for agents with many sensors and
actuators acting in categorizable environments. Journal of Artificial Intelligence
Research 23, 79–122 (2005)
Representing Concepts in Time*

Martin Raubal

Department of Geography, University of California, Santa Barbara


5713 Ellison Hall, Santa Barbara, CA 93106, U.S.A.
raubal@geog.ucsb.edu

Abstract. People make use of concepts in all aspects of their lives. Concepts
are mental entities, which structure our experiences and support reasoning in
the world. They are usually regarded as static, although there is ample evidence
that they change over time with respect to structure, content, and relation to
real-world objects and processes. Recent research considers concepts as dy-
namical systems, emphasizing this potential for change. In order to analyze the
alteration of concepts in time, a formal representation of this process is neces-
sary. This paper proposes an algebraic model for representing dynamic concep-
tual structures, which integrates two theories from geography and cognitive
science, i.e., time geography and conceptual spaces. Such representation allows
for investigating the development of a conceptual structure along space-time
paths and serves as a foundation for querying the structure of concepts at a spe-
cific point in time or for a time interval. The geospatial concept of ‘landmark’ is
used to demonstrate the formal specifications.

Keywords: Conceptual spaces, time geography, concepts, representation, algebraic specifications.

1 Introduction
Humans employ concepts to structure their world, and to perform reasoning and cate-
gorization tasks. Many concepts are not static but change over time with respect to
their structure, substance, and relations to the real world. In addition, different people
use the same or similar concepts to refer to different objects and processes in the real
world, which can lead to communication problems. In this paper, we propose a novel
model to represent conceptual change over time. The model is based on a spatio-
temporal metaphor, representing conceptual change as movement along space-time
paths in a semantic space. It thereby integrates conceptual spaces [1] as one form of
conceptual representation within a time-geographic framework [2].
Formal representations of dynamic concepts are relevant from both a theoretical
and practical perspective. On the one hand, they allow us to theorize about how peo-
ple’s internal processes operate on conceptual structures and result in their alterations
over time. On the other hand, they are the basis for solving some of the current press-
ing research questions, such as in Geographic Information Science (GIScience) and
* This paper is dedicated to Andrew Frank, for his 60th birthday. He has been a great teacher and mentor to me.


the disciplines concerned with ontologies. In GIScience, questions addressing which
geospatial concepts exist, how to trace their developmental patterns, model their in-
teractions (such as merging), and how to represent and process them computationally
are of major importance [3]. Research on ontologies has focused on dynamic ontolo-
gies1 for services to be integrated within the semantic web [4]. If we consider ontolo-
gies as explicit specifications of conceptualizations [5], then formal representations of
dynamic concepts can be utilized for translation into ontologies.
Section 2 presents related work regarding concepts, and introduces conceptual
spaces and time geography as the foundations for the proposed model. In Section 3,
we define our use of representation and describe the metaphorical mapping from
time-geographic elements to entities and operations in semantic space. We further
elaborate on the difference between within- and between-conceptual-space changes.
Section 4 presents a computational model of conceptual change in terms of executable
algebraic specifications. Within this model, the mappings of entities and operations
are specified at the level of conceptual spaces, which consist of quality dimensions.
Section 5 applies the formal specifications to represent the change of a person’s geo-
spatial concept of ‘landmark’ over time. The final section presents conclusions and
directions for future research.

2 Related Work
This section starts with an explanation of the notion of concepts and their importance
for categorization. We then introduce conceptual spaces and time geography as the
underlying frameworks for representing concepts in time.

2.1 Concepts

There are several conflicting views on concepts, categories, and their relation to each
other across and even within different communities. From a classical perspective,
concepts have been defined as structured mental representations (of classes or indi-
viduals), which encode a set of necessary and sufficient conditions for their applica-
tion [6]. They deal with what is being represented and how such information is used
during categorization [7]. Barsalou et al. [8] view concepts as mental representations
of categories and point out that concepts are context dependent and situated. For ex-
ample, the concept of a chair is applied locally and does not cover all chairs
universally. From a memory perspective, “concepts are the underlying knowledge in
long-term memory from which temporary conceptualizations in working memory are
constructed.” [8, footnote 7] It is important to note the difference between concepts
and categories: a concept is a mental entity, whereas a category refers to a set of enti-
ties that are grouped together [9].
Concepts are viewed as dynamical systems that evolve and change over time [8].
New sensory input leads to the adaptation of previous concepts, such as during the
interactive process of spatial knowledge acquisition [10]. Neisser’s [11] perceptual
cycle is also based on the argument that perception and cognition involve dynamic

1 See, for example, http://dynamo.cs.manchester.ac.uk/

cognitive structures (schemata in his case rather than explicit concepts). These are
subject to change as more information becomes available.
Here, we use concepts within the paradigm of cognitive semantics, which asserts
that meanings are mental entities—mappings from expressions to conceptual struc-
tures, which refer to the real world [12-14]. The main argument is therefore that a
symbolic representation cannot refer to objects directly, but only indirectly
through concepts in the mind. This difference between objects, concepts, and symbols
is often expressed through the semiotic triangle [15].

2.2 Conceptual Spaces


The notion of conceptual space was introduced as a framework for representing in-
formation at the conceptual level [1]. Such representation rests on the aforementioned
foundation of cognitive semantics. Conceptual spaces can be utilized for
knowledge representation and sharing, and support the paradigm that concepts are
dynamical systems [16]. Sowa [17] argued that conceptual spaces are a promising
geometrical model for representing abstract concepts as well as physical images.
Furthermore, conceptual spaces may serve as an explanatory framework for results
from neuroscientific research regarding the representational structure of the brain [1].
A conceptual space is a set of quality dimensions with a geometrical or topological
structure for one or more domains. Domains are represented through sets of integral
dimensions, which are distinguishable from all other dimensions. For example, the
color domain is formed through the dimensions hue, saturation, and brightness. Con-
cepts cover multiple domains and are modeled as n-dimensional regions. Every object
or member of the corresponding category is represented as a point in the conceptual
space. This allows for expressing the similarity between two objects as the spatial
distance between their points. Recent work has focused on representing actions and
functional properties in conceptual spaces [18].
In [19], a methodology to formalize conceptual spaces as vector spaces was pre-
sented. Formally, a conceptual vector space is defined as Cn = {(c1, c2, …, cn) | ci ∈
C} where the ci are the quality dimensions. A quality dimension can also represent a
whole domain and in this case cj = Dn = {(d1, d2, …, dn) | dk ∈ D}. Vector spaces
have a metric and therefore allow for the calculation of distances between points in
the space. This can also be utilized for measuring distances between concepts, either
based on their approximation by ‘prototypical points’ or ‘prototypical regions’ [20].
In order to calculate these semantic distances between instances of concepts all qual-
ity dimensions of the space must be represented in the same relative unit of measure-
ment. Assuming a normal distribution, this is ensured by calculating the z scores for
these values, also called z-transformation [21]. For specifying different contexts one
can assign weights to the quality dimensions of a conceptual vector space. This is
essential for the representation of concepts as dynamical systems, because the sali-
ence of dimensions may change over time. Cn is then defined as {(w1c1, w2c2, …,
wncn) | ci ∈ C, wj ∈ W} where W is the set of real numbers.
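As a rough illustration (a sketch only; the dimension statistics, context weights, and
instance vectors are hypothetical placeholders), a weighted semantic distance over
z-transformed quality dimensions could be computed as follows:

-- Sketch: weighted Euclidean distance between two instances, after
-- z-transforming every quality dimension with its (assumed) mean and
-- standard deviation; ws are the context weights of the dimensions.
zScore :: (Double, Double) -> Double -> Double
zScore (mu, sigma) x = (x - mu) / sigma

weightedDistance :: [Double] -> [(Double, Double)] -> [Double] -> [Double] -> Double
weightedDistance ws stats xs ys =
  sqrt (sum [ w * (zScore s x - zScore s y) ^ 2
            | (w, s, x, y) <- zip4 ws stats xs ys ])
  where
    zip4 (a:as) (b:bs) (c:cs) (d:ds) = (a, b, c, d) : zip4 as bs cs ds
    zip4 _ _ _ _ = []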

2.3 Time Geography


People and resources are available only at a limited number of locations and for a
limited amount of time. Time geography focuses on this necessary condition at the
core of human existence: “How does my location in space at a given time affect my
ability to be present at other locations at other times?” It defines the space-time
mechanics by considering different constraints for such presence—the capability,
coupling, and authority constraints [2]. The possibility of being present at a specific
location and time is determined by people’s ability to trade time for space, supported
by transportation and communication services.
Space-time paths depict the movement of individuals in space over time. Such paths
are available at various spatial (e.g., house, city, country) and temporal granularities
(e.g., decade, year, day) and can be represented through different dimensions. Figure 1
shows a person’s space-time path during a day, representing her movements and activity
participation at three different locations. The tubes depict space-time stations—locations
that provide resources for engaging in particular activities, such as sleeping, eating, and
working. The slope of the path represents the travel velocity. If the path is vertical then
the person is engaged in a stationary activity.


Fig. 1. Space-time path of a person’s daily activities

Three classes of constraints limit a person’s activities in space and time. Capability
constraints limit an individual’s activities based on her abilities and the available re-
sources. For example, a fundamental requirement for many people is to sleep between
six and eight hours at home. Coupling constraints require a person to occupy a certain
location for a fixed duration to conduct an activity. If two people want to meet at a Café,
then they have to be there at the same time. In time-geographic terms, their paths cluster
into a space-time bundle. Certain domains in life are controlled through authority con-
straints, which are fiat restrictions on activities in space and time. A person can only
shop at a mall when the mall is open, such as between 10am and 9pm.
All space-time paths must lie within space-time prisms (STP). These are geometri-
cal constructs of two intersecting cones [22]. Their boundaries limit the possible
locations a path can take based on people’s abilities to trade time for space. Figure 2
depicts a space-time prism for a scenario where origin and destination have the same
location. The time budget Δt = t2−t1 defines the interval in which a person can move away
from the origin, limited only by the maximum travel velocity. The interior of the


Fig. 2. Space-time prism as intersecting cones

prism defines a potential path space (PPS), which represents all locations in space
and time that can be reached by the individual during Δt. The projection of the PPS
onto geographical space results in the potential path area (PPA) [23].
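For the symmetric case of Figure 2 (origin and destination at the same location), the
prism can be made concrete with a small sketch, assuming straight-line travel at a
constant maximum velocity (all names are hypothetical): a location lies within the PPA
if the round trip from the origin fits into the time budget Δt.

type Point = (Double, Double)

-- A point p lies in the potential path area iff travelling there and
-- back from the origin fits into the time budget dt = t2 - t1.
inPPA :: Double -> Double -> Point -> Point -> Bool
inPPA vMax dt origin p = 2 * dist origin p <= vMax * dt
  where
    dist (x1, y1) (x2, y2) = sqrt ((x2 - x1) ^ 2 + (y2 - y1) ^ 2)

-- With vMax = 5 km/h and dt = 2 h the PPA is a disc of radius 5 km:
-- inPPA 5 2 (0, 0) (3, 4) == True   (distance 5 km, round trip 10 km)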

3 A Spatio-temporal Metaphor for Representing Concepts in Time
In this section, we first give a definition of representation, which applies to the model
presented here. The metaphorical mapping from time-geographic to semantic-space
elements is then explained. A formal model for the resulting semantic space will be
developed in the next section.

3.1 Representational Aspects

Different definitions of what a representation is have been given in the literature. In this
paper, we commit to the following: “A world, X, is a representation of another world, Y,
if at least some of the relations for objects of X are preserved by relations for corre-
sponding objects of Y.” [24, p.267] In order to avoid confusion about what is being
represented how and where regarding conceptual change over time, we distinguish
between two representations—the mental world and the mental model—according to
[24]. The mental world is a representation of the real world and concerned with the
inner workings and processes within the brain and nervous system (i.e., inside the head).
Here, we formally specify a possible mental model as a representation of the mental
world2. The goal is to be able to use this model to explain the processes that lead to the
change of concepts in time. In this sense, we are aiming for informational equivalence
[24], see also [25] and [26] for examples from the geospatial domain.

2 A mental model is therefore a representation of a representation of the real world—see Palmer [24] for a formal demonstration of this idea.

3.2 Metaphorical Mapping

The proposed mental model for representing conceptual change in time is based on a
spatio-temporal metaphor. The power of spatial metaphors for modeling and compre-
hending various non-spatial domains has been widely demonstrated [27-30]. From a
cognitive perspective, the reason for such potential is that space plays a fundamental
role in people’s everyday lives, including reasoning, language, and action [31].
Our representation of conceptual change in a mental model is based on the meta-
phorical projection of entities, their relations, and processes from a spatio-temporal
source domain to a semantic target domain. As with all metaphors, this is a partial
mapping, because source and target are not identical [30]. Concepts are represented as
n-dimensional regions in conceptual spaces, which can move through a semantic
space in time. The goal of this metaphor is to impose structure on the target domain
and therefore support the explanation of its processes.

Table 1. Metaphorical projection from time-geographic to semantic-space elements

Time-geographic elements        Semantic-space elements
geographic space                semantic space
geographic distance (dgeog)     semantic distance (dsem)
space-time path (ST-path)       semantic space-time path (SST-path)
space-time station (STS)        semantic space-time station (SSTS)
space-time prism (STP)          semantic space-time envelope (SSTE)
coupling constraint             semantic coupling constraint
authority constraint            contextual constraint
potential path space (PPS)      semantic potential path space (SPPS)

More specifically, individual time-geographic elements are being mapped to elements
in the semantic space (Table 1, Figure 3). Geographic space is being mapped to
semantic space, which can be thought of as a two- or three-dimensional attribute
surface as used in information visualization [32, 33]. Both conceptual spaces and
semantic spaces have a metric, which allows for measuring semantic distances dsem
between concepts and conceptual spaces [19]. Conceptual spaces (CS1 and CS2 in
Figure 3) move along semantic space-time paths (SST-path), vertical paths thereby
signifying stationary semantics, i.e., no conceptual change involving a change in
dimensions occurs, though changes in dimension values remain possible (see Section 3.3). Such
stationarity corresponds to a semantic space-time station (SSTS). The semantic space-
time envelope (SSTE) and semantic potential path space (SPPS) define through their
boundaries, how far a conceptual space (including its concept regions) can deviate
from a vertical path and still represent the same or similar semantics. Crossing the
boundaries corresponds to conceptual change. It is important to note that these
boundaries are often fuzzy and indeterminate [34]. The extent of the SSTE is a func-
tion of time depending on the changes in the semantic space as defined above.
The partial mapping from source to target domain includes two constraints. Cou-
pling constraints are being mapped to semantic coupling constraints, which specify
the interaction of conceptual spaces (and concepts) based on the coincidence of their


Fig. 3. Representation of moving conceptual spaces in a semantic space over time. For clarity
reasons, the concept regions are only visualized once (during semantic coupling).

semantic space-time paths (i.e., semantic space-time bundling). Such coincidence
signifies high (significant overlap of concept regions, see Figure 3) or even total con-
ceptual similarity, e.g., when two different concepts merge into one over time, such as
the abstract political concepts of Eastern and Western Germany. Authority constraints
are being mapped to contextual constraints. Similar to fiat restrictions on activities in
space and time, there exist legal definitions, such as traffic codes or data transfer
standards, which create fiat conceptual boundaries. For example, the definition and
meaning of terms, such as parcel or forest, depend on the legal system of the respon-
sible administration—see also the discussion of institutional reality in [35]. The same
symbol can therefore relate to different concepts represented by different dimensions
or different regions in a conceptual space.

3.3 Within- and between-Conceptual-Space Changes

Our proposed mental model allows for representing conceptual change over time from
two perspectives, namely (a) change of the geometrical structure of concepts as n-
dimensional regions within one conceptual space and (b) changes between different
conceptual spaces. Case (a) presumes that no change of quality dimensions has oc-
curred in the conceptual space, therefore allowing only for movement of the concept
region within this particular space—caused by a change in dimension values. One can

then measure the semantic distance between a concept c at time ti and the same con-
cept at time ti+1. Three strategies for calculating semantic similarity between concep-
tual regions, including overlapping concepts, have been demonstrated in [20] and can
be applied here. These methods differ, in that for each vector of c(ti) one or several
corresponding vectors of c(ti+1) are identified.
Case (b) applies to mappings between conceptual spaces, leading to a change in
quality dimensions. These mappings can either be projections, which reduce the com-
plexity of the space by reducing its number of dimensions, or transformations, which
involve a major change of quality dimensions, such as the addition of new dimen-
sions. As shown in [36], projections (Equation 1) and transformations (Equation 2)
can be expressed as partial mappings with C, D denoting conceptual spaces and m, n
the number of quality dimensions. For projections, the semantics of the mapped qual-
ity dimensions must not change or can be mapped by rules.
(Rproj: Cm → Dn) where n < m and Cm ∩ Dn = Dn                        (1)
(Rtrafo: Cm → Dn) where (n ≤ m and Cm ∩ Dn ≠ Dn) or (n > m)          (2)
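Read over dimension names only (and ignoring the mapping rules for individual
dimensions), Equations (1) and (2) can be paraphrased by the following sketch: a
mapping is a projection if the target dimensions form a proper subset of the source
dimensions, and a transformation otherwise.

data Mapping = Projection | Transformation deriving Show

-- Classify a change from source dimensions cm to target dimensions dn.
classifyMapping :: [String] -> [String] -> Mapping
classifyMapping cm dn
  | length dn < length cm && all (`elem` cm) dn = Projection       -- Eq. (1)
  | otherwise                                   = Transformation   -- Eq. (2)

-- classifyMapping ["area","shape","color"] ["area","shape"]
--   == Projection
-- classifyMapping ["area","shape","visibility"] ["area","shape","color"]
--   == Transformation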

4 Formal Model of Conceptual Change in Time


This section develops a computational mental model for representing conceptual
change in time according to the presented spatio-temporal metaphor. We take an al-
gebraic approach to formally specify the mappings of entities and operations at the
level of conceptual spaces (which represent the conceptual regions). These specifica-
tions will be used in Section 5 to demonstrate the applicability of the formal model.

4.1 Algebraic Specifications

Our method of formalization uses algebraic specifications, which present a natural
way of representing entities and processes. Algebraic specifications have proven use-
ful for specifying data abstractions in spatial and temporal domains [25, 37-39]. Data
abstractions are based on abstract data types, which are representation-independent
formal definitions of all operations of a data type [40]. Entities are described in terms
of their operations, depicting how they behave. Algebraic specifications written in an
executable programming language can be tested as a prototype [41]. The tool chosen
here is Hugs, a dialect of the purely functional language Haskell [42], which includes
types, type classes, and algebraic axioms. Haskell provides higher-order capabilities
and one of its major strengths is strong typing: every object has a particular type and
the compiler checks that operations can only be applied to certain types.

4.2 Formal Model

A conceptual space is formally specified3 as a data type, together with its attributes.
Every conceptual space has an identifier Id, a Position in the semantic space at a

3 The complete Hugs code including the test data for this paper is available at
http://www.geog.ucsb.edu/~raubal/Downloads/CS.hs. Hugs interpreters can be downloaded
freely from http://www.haskell.org.

given Time, and consists of a number of quality dimensions (list [Dimension]).
Every Dimension has a Name and a range of values (ValueRange) with a given
Unit, e.g., dimension weight with values between 0 and 250 kg. Here, we define
Position as a coordinate pair in a 2-dimensional semantic space and Time through
discrete steps.
data ConceptualSpace = NewConceptualSpace Id Position Time [Dimension]
data Dimension       = Dimension Name ValueRange Unit
We can now define a type class with common functions for conceptual spaces.
These functions can be simple operations to observe properties, such as the current
position of a conceptual space (getConceptualSpacePosition), but also more
complex operations that specify the elements, processes, and constraints described in
Section 3. The abstract type signatures are implementation-independent and can there-
fore be implemented for different types of conceptual spaces. Here, the data type
ConceptualSpace specified above is made an instance of this class.
class ConceptualSpaces cs where
  getConceptualSpacePosition :: cs -> Position

instance ConceptualSpaces ConceptualSpace where
  getConceptualSpacePosition (NewConceptualSpace id pos t ds) = pos
Conceptual change happens through movement of conceptual spaces along space-
time paths in the semantic space (and through movement of conceptual regions within
conceptual spaces). Conceptual spaces move to new positions only if there is a change
in dimensions (dsNew), otherwise they are stationary. The semanticDistance
function calculates either how far one conceptual space has moved in the semantic
space during a particular time interval, or the distance between two different concep-
tual spaces (such as dsem in Figure 3). It is currently implemented for 2-D Euclidean
distance (dist) but different instances of the Minkowski metric can be used instead,
depending on the types of dimensions and spaces [1]. A SemanticSpaceTime-
Path is constructed by finding (filtering) all conceptual space instances for a particu-
lar Id and ordering them in a temporal sequence.
class ConceptualSpaces cs where
  moveConceptualSpace :: cs -> [Dimension] -> ConceptualSpace
  semanticDistance :: cs -> cs -> Distance
  constructSemanticSpaceTimePath :: Id -> [cs] -> SemanticSpaceTimePath

instance ConceptualSpaces ConceptualSpace where
  moveConceptualSpace (NewConceptualSpace id pos t ds) dsNew
    = if ds == dsNew
        then (NewConceptualSpace id pos newT ds)
        else (NewConceptualSpace id newPos newT dsNew)
  semanticDistance (NewConceptualSpace id pos t ds) (NewConceptualSpace id2 pos2 t2 ds2)
    = dist pos pos2
  constructSemanticSpaceTimePath i cs
    = NewSemanticSpaceTimePath id css
      where
        id  = i
        css = filter ((i ==) . getConceptualSpaceId) cs
Semantic space-time stations are specified as special types of SemanticSpace-
TimePaths—similar to the representation of space-time stations in [43]—i.e.,
consisting of conceptual space instances with equal positions (but potential temporal
gaps). The derivation of a SemanticSpaceTimeStation is based on the sorting
function sortConceptualSpaces, which orders conceptual spaces according to
their positions.
class SemanticSpaceTimePaths sstPath where
  constructSemanticSpaceTimeStation :: sstPath -> [ConceptualSpace]

instance SemanticSpaceTimePaths SemanticSpaceTimePath where
  constructSemanticSpaceTimeStation (NewSemanticSpaceTimePath id cs)
    = sortConceptualSpaces cs
The data type SemanticSpaceTimeEnvelope is defined by a Center (of
type Position) and a Boundary for each time step. The projection of SSTE to
semantic space results in a region (equivalent to the PPA from time geography),
whose boundary delimits a semantic similarity area. Note that contrary to semantic
space-time stations, semantic potential path spaces—which result from integration
over a sequence of SSTE slices—cannot have gaps. One can now determine algo-
rithmically, whether a conceptual space falls inside the boundary or not (which identi-
fies conceptual change).
data SemanticSpaceTimeEnvelope = NewSemanticSpaceTimeEnvelope Center Time Boundary
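A possible boundary test is sketched below; it assumes, hypothetically, that Center is
a position in the two-dimensional semantic space and Boundary a radius around it, so
that a conceptual space whose position leaves this circle at its time step has undergone
conceptual change.

-- Sketch with assumed types: the center as a 2-D position, the
-- boundary as a radius in the semantic space.
insideEnvelope :: (Double, Double) -> Double -> (Double, Double) -> Bool
insideEnvelope (cx, cy) radius (px, py) =
  sqrt ((px - cx) ^ 2 + (py - cy) ^ 2) <= radius

-- insideEnvelope (3, 1) 1.5 (4, 2)  == True    (distance ~1.41)
-- insideEnvelope (3, 1) 1.5 (6, 3)  == False   (distance ~3.61)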
Semantic coupling constraints are represented through the semanticMeet func-
tion. It determines whether two instances of conceptual spaces interact at a given time
step. This definition leaves room for integrating semantic uncertainty by specifying a
threshold for the semantic distance (epsilon), within which the conceptual spaces
are still considered to be interacting, see also [44]. Contextual constraints are fiat
boundaries in the semantic space and can therefore be represented by the Boundary
type.
class ConceptualSpaces cs where
  semanticMeet :: cs -> cs -> Bool

instance ConceptualSpaces ConceptualSpace where
  semanticMeet cs1 cs2
    = (getConceptualSpaceTime cs1 == getConceptualSpaceTime cs2)
      && (semanticDistance cs1 cs2 <= epsilon)
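A usage sketch with hypothetical test data (the threshold and the instances below are
assumptions for illustration, not part of the specification) might read:

-- Hypothetical test data: two conceptual space instances observed at
-- the same time step, about 0.22 apart in the semantic space.
csA, csB :: ConceptualSpace
csA = NewConceptualSpace 7 (4.0, 2.0) 3 []
csB = NewConceptualSpace 8 (4.2, 2.1) 3 []

-- Assuming a threshold of epsilon = 0.5:
-- semanticMeet csA csB  == True   (same time step, distance below epsilon)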

5 Application: Geospatial Concept Change in Time


The formal model in the previous section provides executable specifications of the
represented elements and processes for conceptual change based on the geometrical
framework of conceptual spaces. In order to demonstrate the model with respect to
analyzing the change of conceptual structures in time, we apply it to the use case of
representing the concept of ‘landmark’ within the particular scenario of wayfinding in
a city [45], where façades of buildings are often used as landmarks. Geospatial con-
cepts, such as lake, mountain, geologic region, street, or landmark, differ in many
qualitative ways from other concepts, due to their spatio-temporal nature [46, 47].
Their structure in terms of represented meaning changes for individual persons over
time and may also differ between cultures, e.g., classifications of landscapes [48].
In the following, the change of a person’s conceptual structure of ‘landmark’ (in
terms of façade as described above) over time is represented with respect to the
change of quality dimensions in a semantic space. Based on previous work, we spec-
ify the dimensions façade area fa (square meters), shape deviation sd (deviation
from minimum bounding rectangle in percent), color co (three RGB values), cultural
importance ci (ordinal scale of 1 to 5), and visibility vi (square meters) [19, 45].
fa = (Dimension "area" (100,1200) "sqm")
sd = (Dimension "shape" (0,100) "%")
co = (Dimension "color" (0,255) "RGB")
ci = (Dimension "cultural" (1,5) "importance")
vi = (Dimension "visibility" (0,10000) "sqm")


Fig. 4. Change of a person’s conceptual structure of ‘landmark’ over time



Four time steps are considered, which results in four instances of the conceptual
space4. In this scenario, the person’s ‘landmark’ concept comprises three quality di-
mensions at time t1 (cs1). Through experience and over the years, the person has
acquired a sense of the cultural importance of buildings (cs2)—a building may be
famous for its architectural style and therefore serve as a landmark—adding this new
dimension together with color. Next, owing to a shift in the person's interests,
cultural importance vanishes again (cs3). Over time, due to physio-
logical changes resulting in color blindness, the person’s concept structure changes
back to the original one, eliminating color and again including visibility. Figure 4
visualizes these conceptual changes over time.
cs1 = NewConceptualSpace 1 (3,1) 1 [fa,sd,vi]
cs2 = NewConceptualSpace 1 (6,3) 2 [fa,sd,ci,co]
cs3 = NewConceptualSpace 1 (4,2) 3 [fa,sd,co]
cs4 = NewConceptualSpace 1 (3,1) 4 [fa,sd,vi]
The formal specifications can now be used to query the temporal conceptual repre-
sentation in order to find conceptual changes and when they happened, and what
semantics is represented by a particular conceptual structure at a specific time. We
can infer that the semantic change from cs1 at time 1 to cs2 at time 2 (transforma-
tion with two new dimensions) is larger than the change from cs1 at time 1 to cs3 at
time 3 (transformation with one new dimension) by calculating the respective seman-
tic distances (dcs1-cs2 and dcs1-cs3 in Figure 4). The change resulting from the move
between time 2 and 3 (dcs2-cs3) is due to a projection, involving a reduction to three
dimensions. Similarity is thereby a decaying function of semantic distance, which
depends on the semantic space. The interpretation of semantic distance is domain-
dependent and may be determined through human participants tests [49].
semanticDistance cs1 cs2
3.605551
semanticDistance cs1 cs3
1.414214
semanticDistance cs2 cs3
2.236068
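These distances could be turned into similarity values by a decaying function; the
following sketch is an illustration only and not part of the specification (the decay
constant c is a free, domain-dependent parameter):

-- Similarity as an exponentially decaying function of semantic distance.
similarity :: Double -> Double -> Double
similarity c d = exp (negate c * d)

-- similarity 1.0 1.414214  ~ 0.24   (cs1 vs. cs3)
-- similarity 1.0 3.605551  ~ 0.03   (cs1 vs. cs2)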
We can further construct the semantic space-time path for the conceptual space un-
der investigation from the set of all available conceptual space instances (allCs).
The result (only the very beginning is shown below, for space reasons) is a list of
the four conceptual space instances with Id=1 in a temporal sequence. This SST-
path is visualized in Figure 4.
constructSemanticSpaceTimePath 1 allCs
[NewSemanticSpaceTimePath 1 [NewConceptualSpace 1 …]
Applying the constructSemanticSpaceTimeStation function to the
SST-path derives all conceptual space instances with equal positions but potential
temporal gaps, such as cs1 and cs4.

4 The quantitative values for the positions of conceptual spaces in the semantic space are for demonstration purposes. Their determination, such as through similarity ratings from human participants tests, is left for future work.

constructSemanticSpaceTimeStation
(constructSemanticSpaceTimePath 1 allCs)
[NewConceptualSpace 1 (3.0,1.0) 1 [Dimension "area" (100.0,1200.0) "sqm",
Dimension "shape" (0.0,100.0) "%", Dimension "visibility" (0.0,10000.0) "sqm"],
NewConceptualSpace 1 (3.0,1.0) 4 [Dimension "area" (100.0,1200.0) "sqm",
Dimension "shape" (0.0,100.0) "%", Dimension "visibility" (0.0,10000.0) "sqm"]]

6 Conclusions and Future Work


This paper presented a novel computational model to represent conceptual change
over time. The model is based on a spatio-temporal metaphor, utilizing elements from
time geography and conceptual spaces. Conceptual change is represented through
movement of conceptual spaces along space-time paths in a semantic space. We de-
veloped executable algebraic specifications for the mapped entities, relations, and
operations, which allowed demonstrating the model through an application to a geo-
spatial conceptual structure. This application showed the potential of the formal rep-
resentation for analyzing the dynamic nature of concepts and their changes in time.
The presented work suggests several directions for future research:
• The formal model needs to be extended to represent conceptual regions within the
conceptual spaces. This will allow the application of semantic similarity measures,
such as the ones proposed in [20], to determine semantic distances between indi-
vidual concepts anchored within their corresponding conceptual spaces.
• The quantification of conceptual change depends on the representation of the se-
mantic space, which we have modeled as a two-dimensional attribute surface.
More research in cognitive science and information science is required to establish
cognitively plausible, semantic surface representations (similar to those developed
in the area of information visualization) for different domains that can be used
within our proposed model. This will also determine the distance and direction
when moving a conceptual space due to a change in its quality dimensions.
• Conceptual regions often do not have crisp boundaries therefore their representa-
tion must take aspects of uncertainty into account. Uncertainty also propagates
when applying operations such as intersection to concept regions. Future work
must address these issues based on the time-geographic uncertainty problems iden-
tified in [43].
• The semantic space is a similarity space, i.e., distance represents similarity be-
tween concepts. This leads to the question of whether disparate concepts, such as
roundness and speed, can be compared at all. A possible solution is to make con-
cepts comparable only when they are within a certain threshold distance: if this is
exceeded, then the similarity is zero. Another way is to specifically include infinite
distance. It is essential to account for the given context in which concepts are com-
pared. The context can be represented through different dimension weights.
• The formal specifications serve as the basis for implementing a concept query
language, which can be tested in different application domains. This will help in
understanding various concept dynamics, more specifically, the characterization and
prediction of conceptual change through time.

• In this work we utilized Gärdenfors’ [1] notion of conceptual spaces as a geometric
way of representing information at the conceptual level. Different views on the na-
ture of conceptual representations in the human cognitive system exist, such as the
ideas of mental images [50] or schematic perceptual images extracted from modes
of experience [8]. Could such images be represented in or combined with concep-
tual spaces? Would such combination be similar to a cognitive collage [51]? Hu-
man participants tests may help assess the validity of geometrical representations
of concepts and point to potential limitations of conceptual spaces as a representa-
tional model.

Acknowledgments
The comments from Carsten Keßler and three anonymous reviewers provided useful
suggestions to improve the content of the paper.

Bibliography
1. Gärdenfors, P.: Conceptual Spaces - The Geometry of Thought. MIT Press, Cambridge
(2000)
2. Hägerstrand, T.: What about people in regional science? Papers of the Regional Science
Association 24, 7–21 (1970)
3. Brodaric, B., Gahegan, M.: Distinguishing Instances and Evidence of Geographical Con-
cepts for Geospatial Database Design. In: Egenhofer, M., Mark, D. (eds.) Geographic In-
formation Science - Second International Conference, GIScience 2002, Boulder, CO,
USA, September 2002, pp. 22–37. Springer, Berlin (2002)
4. Berners-Lee, T., Hendler, J., Lassila, O.: The Semantic Web. In: Scientific American, pp.
34–43 (2001)
5. Gruber, T.: A Translation Approach to Portable Ontology Specifications. Knowledge Ac-
quisition 5(2), 199–220 (1993)
6. Laurence, S., Margolis, E.: Concepts and Cognitive Science. In: Margolis, E., Laurence, S.
(eds.) Concepts - Core Readings, pp. 3–81. MIT Press, Cambridge (1999)
7. Smith, E.: Concepts and induction. In: Posner, M. (ed.) Foundations of cognitive science,
pp. 501–526. MIT Press, Cambridge (1989)
8. Barsalou, L., Yeh, W., Luka, B., Olseth, K., Mix, K., Wu, L.: Concepts and meaning. In:
Beals, K., et al. (eds.) Parasession on conceptual representations, pp. 23–61. University of
Chicago, Chicago Linguistics Society (1993)
9. Goldstone, R., Kersten, A.: Concepts and Categorization. In: Healy, A., Proctor, R. (eds.)
Comprehensive handbook of psychology, pp. 599–621 (2003)
10. Piaget, J., Inhelder, B.: The Child’s Conception of Space. Norton, New York (1967)
11. Neisser, U.: Cognition and Reality - Principles and Implications of Cognitive Psychology.
Freeman, New York (1976)
12. Lakoff, G.: Cognitive Semantics. In: Eco, U., Santambrogio, M., Violi, P. (eds.) Meaning
and Mental Representations, pp. 119–154. Indiana University Press, Bloomington (1988)
13. Green, R.: Internally-Structured Conceptual Models in Cognitive Semantics. In: Green, R.,
Bean, C., Myaeng, S. (eds.) The Semantics of Relationships - An Interdisciplinary Per-
spective, pp. 73–89. Kluwer, Dordrecht (2002)

14. Kuhn, W., Raubal, M., Gärdenfors, P.: Cognitive Semantics and Spatio-Temporal Ontolo-
gies. Spatial Cognition and Computation 7(1), 3–12 (2007)
15. Ogden, C., Richards, I.: The Meaning of Meaning: A Study of the Influence of Language
Upon Thought and of the Science of Symbolism. Routledge & Kegan Paul, London (1923)
16. Barsalou, L.: Situated simulation in the human conceptual system. Language and Cogni-
tive Processes 5(6), 513–562 (2003)
17. Sowa, J.: Categorization in Cognitive Computer Science. In: Cohen, H., Lefebvre, C.
(eds.) Handbook of Categorization in Cognitive Science, pp. 141–163. Elsevier, Amster-
dam (2006)
18. Gärdenfors, P.: Representing actions and functional properties in conceptual spaces. In:
Ziemke, T., Zlatev, J., Frank, R. (eds.) Body, Language and Mind, pp. 167–195. Mouton
de Gruyter, Berlin (2007)
19. Raubal, M.: Formalizing Conceptual Spaces. In: Varzi, A., Vieu, L. (eds.) Formal Ontology
in Information Systems - Proceedings of the Third International Conference (FOIS 2004),
pp. 153–164. IOS Press, Amsterdam (2004)
20. Schwering, A., Raubal, M.: Measuring Semantic Similarity between Geospatial Concep-
tual Regions. In: Rodriguez, A., et al. (eds.) GeoSpatial Semantics - First International
Conference, GeoS 2005, Mexico City, Mexico, November 2005, pp. 90–106. Springer,
Berlin (2005)
21. Devore, J., Peck, R.: Statistics - The Exploration and Analysis of Data, 4th edn. Duxbury,
Pacific Grove (2001)
22. Lenntorp, B.: Paths in Space-Time Environments: A Time-Geographic Study of the
Movement Possibilities of Individuals. Lund Studies in Geography, Series B (44) (1976)
23. Miller, H.: Modeling accessibility using space-time prism concepts within geographical in-
formation systems. International Journal of Geographical Information Systems 5(3), 287–
301 (1991)
24. Palmer, S.: Fundamental aspects of cognitive representation. In: Rosch, E., Lloyd, B. (eds.)
Cognition and categorization, pp. 259–303. Lawrence Erlbaum, Hillsdale (1978)
25. Frank, A.: Spatial Communication with Maps: Defining the Correctness of Maps Using a
Multi-Agent Simulation. In: Freksa, C., et al. (eds.) Spatial Cognition II - Integrating Ab-
stract Theories, Empirical Studies, Formal Methods, and Practical Applications, pp. 80–99.
Springer, Berlin (2000)
26. Frank, A.: Pragmatic Information Content: How to Measure the Information in a Route
Description. In: Duckham, M., Goodchild, M., Worboys, M. (eds.) Foundations of Geo-
graphic Information Science, pp. 47–68. Taylor & Francis, London (2003)
27. Lakoff, G., Johnson, M.: Metaphors We Live By. University of Chicago Press, Chicago
(1980)
28. Kuipers, B.: The ’Map in the Head’ Metaphor. Environment and Behaviour 14(2), 202–
220 (1982)
29. Kuhn, W.: Metaphors Create Theories for Users. In: Frank, A.U., Campari, I. (eds.) Spatial
Information Theory: Theoretical Basis for GIS, pp. 366–376. Springer, Berlin (1993)
30. Kuhn, W., Blumenthal, B.: Spatialization: Spatial Metaphors for User Interfaces. Geoinfo-
Series, vol. 8. Department of Geoinformation, Technical University Vienna, Vienna (1996)
31. Lakoff, G.: Women, Fire, and Dangerous Things: What Categories Reveal About the
Mind. The University of Chicago Press, Chicago (1987)
32. Skupin, A.: Where do you want to go today [in attribute space]? In: Miller, H. (ed.) Socie-
ties and Cities in the Age of Instant Access, pp. 133–149. Springer, Dordrecht (2007)

33. Skupin, A., Fabrikant, S.: Spatialization Methods: A Cartographic Research Agenda for
Non-Geographic Information Visualization. Cartography and Geographic Information Sci-
ence 30(2), 95–119 (2003)
34. Burrough, P., Frank, A., Masser, I., Salgé, F.: Geographic Objects with Indeterminate
Boundaries. GISDATA Series. Taylor & Francis, London (1996)
35. Frank, A.: Ontology for spatio-temporal Databases. In: Koubarakis, M., et al. (eds.) Spatio-
temporal Databases: The Chorochronos Approach, pp. 9–77. Springer, Berlin (2003)
36. Raubal, M.: Mappings For Cognitive Semantic Interoperability. In: Toppen, F., Painho, M.
(eds.) AGILE 2005 - 8th Conference on Geographic Information Science, pp. 291–296. In-
stituto Geografico Portugues (IGP), Lisboa (2005)
37. Winter, S., Nittel, S.: Formal information modelling for standardisation in the spatial domain.
International Journal of Geographical Information Science 17(8), 721–742 (2003)
38. Raubal, M., Kuhn, W.: Ontology-Based Task Simulation. Spatial Cognition and Computa-
tion 4(1), 15–37 (2004)
39. Krieg-Brückner, B., Shi, H.: Orientation Calculi and Route Graphs: Towards Semantic
Representations for Route Descriptions. In: Raubal, M., et al. (eds.) Geographic Informa-
tion Science, 4th International Conference GIScience 2006, Muenster, Germany, pp. 234–
250. Springer, Berlin (2006)
40. Guttag, J., Horowitz, E., Musser, D.: The Design of Data Type Specifications. In: Yeh, R.
(ed.) Current Trends in Programming Methodology, pp. 60–79. Prentice-Hall, Englewood
Cliffs (1978)
41. Frank, A., Kuhn, W.: Specifying Open GIS with Functional Languages. In: Egenhofer, M.,
Herring, J. (eds.) Advances in Spatial Databases (SSD 1995), pp. 184–195. Springer, Port-
land (1995)
42. Hudak, P.: The Haskell School of Expression: Learning Functional Programming through
Multimedia. Cambridge University Press, New York (2000)
43. Miller, H.: A Measurement Theory for Time Geography. Geographical Analysis 37(1),
17–45 (2005)
44. Ahlqvist, O.: A Parameterized Representation of Uncertain Conceptual Spaces. Transac-
tions in GIS 8(4), 493–514 (2004)
45. Nothegger, C., Winter, S., Raubal, M.: Selection of Salient Features for Route Directions.
Spatial Cognition and Computation 4(2), 113–136 (2004)
46. Smith, B., Mark, D.: Geographical categories: an ontological investigation. International
Journal of Geographical Information Science 15(7), 591–612 (2001)
47. Brodaric, B., Gahegan, M.: Experiments to Examine the Situated Nature of Geoscientific
Concepts. Spatial Cognition and Computation 7(1), 61–95 (2007)
48. Mark, D., Turk, A., Stea, D.: Progress on Yindjibarndi Ethnophysiography. In: Winter, S.,
et al. (eds.) Spatial Information Theory, 8th International Conference COSIT 2007, Mel-
bourne, Australia, pp. 1–19. Springer, Berlin (2007)
49. Hahn, U., Chater, N.: Understanding Similarity: A Joint Project for Psychology, Case-
Based Reasoning, and Law. Artificial Intelligence Review 12, 393–427 (1998)
50. Kosslyn, S.: Image and brain - The resolution of the imagery debate. MIT Press, Cam-
bridge (1994)
51. Tversky, B.: Cognitive Maps, Cognitive Collages, and Spatial Mental Models. In: Frank,
A., Campari, I. (eds.) Spatial Information Theory: Theoretical Basis for GIS, pp. 14–24.
Springer, Berlin (1993)
The Network of Reference Frames Theory: A Synthesis
of Graphs and Cognitive Maps

Tobias Meilinger

Max-Planck-Institute for Biological Cybernetics


Spemannstr. 44, 72076 Tübingen, Germany
tobias.meilinger@tuebingen.mpg.de

Abstract. The network of reference frames theory explains the orientation behav-
ior of human and non-human animals in directly experienced environmental
spaces, such as buildings or towns. This includes self-localization, route and sur-
vey navigation. It is a synthesis of graph representations and cognitive maps, and
solves the problems associated with explaining orientation behavior based either
on graphs, maps or both of them in parallel. Additionally, the theory points out the
unique role of vista spaces and asymmetries in spatial memory. New predictions
are derived from the theory, one of which has been tested recently.

Keywords: graph; cognitive map; spatial memory; reference frame; route knowledge; survey knowledge; self-localization; environmental space.

1 Introduction
Orientation in space is fundamental for all humans and the majority of other animals.
Accomplishing goals frequently requires moving through environmental spaces such
as forests, houses, or cities [26]. How do navigators accomplish this? How do they
represent the environment they traveled? Which processes operate on these represen-
tations in order to reach distant destinations or to self-localize when lost? Various
theories have been proposed to answer these questions. Regarding the underlying
representation these theories can be roughly classified into two groups which are
called here graph representations and cognitive maps. In the following paper, I will
explain graph representations and cognitive maps. I will also highlight how graph
representations and cognitive maps fail to properly explain orientation behaviour. As
a solution I will introduce the network of reference frames theory and discuss it with
respect to other theories and further empirical results.

1.1 Graph Representations and Cognitive Maps

In graphs the environment is represented as multiple interconnected units (e.g., [4],


[19], [20], [45], [48]; see Fig. 1). A node within such a graph, for example, represents
a location in space or a specific sensory input encountered, such as a view. An edge
within a graph typically represents the action necessary to reach the adjacent node.
Graphs are particularly suitable for explaining navigating and communicating routes
(i.e., a sequence of actions at locations or views which allows navigators at a location

Fig. 1. Visualizations of a graph representation where an environment is represented as multi-


ple interconnected units (left) and a cognitive map where an environment is represented within
one reference frame (right)

A to reach B without necessarily knowing where exactly B is relative to A). This
could be, for example: turn right at the church, then turn left at the next intersection,
etc. The knowledge expressed in such sequences is called route knowledge.
A cognitive map, on the other hand, assumes that the environment is represented
within one single metric frame of reference (i.e., all locations within the environment
can be expressed by coordinates of one single coordinate system; see Fig. 1; [2], [7],
[30]; cf., [28], [33]).1 A cognitive map has to be constructed from several different
pieces of information encountered during navigation. The case of learning a cognitive
map from a physical map which provides the information already within one frame of
reference is not considered here. A cognitive map is especially suited to provide direct
spatial relations between two locations, without necessarily knowing how to get there,
for example, the station is 300 meters to the east of my current location. This type
of knowledge is known as survey knowledge. Survey knowledge is necessary for
tasks such as shortcutting or pointing to distant locations.
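To make the contrast concrete, a minimal sketch in the Haskell notation used elsewhere
in this volume (hypothetical types, not part of either theory): a graph representation
stores which view or place is reached from which other one by which action, whereas a
cognitive map stores the coordinates of all locations within one single reference frame.

type View   = String
type Action = String

-- Route knowledge: views connected by the actions that lead between them.
newtype RouteGraph   = RouteGraph   [(View, Action, View)]
-- Survey knowledge: every location as a coordinate in one reference frame.
newtype CognitiveMap = CognitiveMap [(View, (Double, Double))]

routeExample :: RouteGraph
routeExample = RouteGraph
  [ ("church",       "turn right", "intersection")
  , ("intersection", "turn left",  "station") ]

surveyExample :: CognitiveMap
surveyExample = CognitiveMap [("here", (0, 0)), ("station", (300, 0))]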

1.2 Problems with Graph Representations and Cognitive Maps

Graph representations and cognitive maps are especially suited to represent route and
survey knowledge, respectively. The other side of the coin is, however, that they also
have their specific limitations. These will now be described in detail.
Graph representations (1) do not represent survey knowledge, (2) often ignore met-
ric relations given in perception, and (3) often assume actions are sufficient to explain
route knowledge. The main limitation of graph representations is that there is no sur-
vey knowledge expressed at all. Using a graph representation, navigators know how
to reach a location and have the ability to choose between different routes. Graph
representations, however, do not give navigators any cue as to where their goal is

1 Often the term cognitive map is used for the sum of all spatial representations. Contrary to
that, cognitive map is understood here as a specific spatial representation, namely storing
spatial information within one reference frame. A reference frame here is not understood as
the general notion of representing something relative to one's own body (= egocentric) vs.
relative to other objects (= allocentric), but a reference frame is considered as one single
coordinate system (cf. [15]). Nevertheless, a reference frame can be egocentric or allocentric.

located in terms of direction or distance. This problem originates from the fact that
graph representations normally do not represent metric knowledge at all. This is despite
the fact that navigators (and not only human ones) are provided with at least rough distance
estimates, especially by their visual system and by proprioceptive cues during locomotion.
Some graph models ignore this already available information and instead assume
that a navigator stores raw or only barely processed sensory data ([4], [20]). As a final
point, actions themselves ([19], [20], [45]) cannot be sufficient to explain route
knowledge. Rats can swim a route learned by walking [18]. Cats can walk a route
learned while being passively carried along a route [10]. We can cycle a path learned
by walking. Even for route knowledge the edge of a graph representing how to get
from one node to the next has to be more abstract than a specific action. However, not
only graph representations are limited.
Cognitive maps (1) have problems in explaining self-localization and route knowl-
edge. (2) There is a surprising lack of evidence that proves non-human animals have
cognitive maps at all. (3) Human survey navigation is not always consistent with a
cognitive map, and (4), cognitive maps are necessarily limited in size. Self-localizing
based exclusively on a cognitive map can only take the geometric relations into ac-
count that are displayed there, (e.g., the form of a place). The visual appearance of
landmarks is almost impossible to represent within a cognitive map itself. This infor-
mation has to be represented separately and somehow linked to a location within the
cognitive map. This is probably one reason why simultaneously constructing a map
while staying located within this map (SLAM) is considered a complicated problem
in robotics [42]. Similarly, planning a route based on a cognitive map alone is also not
trivial, as possible routes have to be identified first [16]. Another issue is that cogni-
tive maps seem to be limited to human navigation. If animals had cognitive maps,
they would easily be able to take novel shortcuts, (i.e., directly approach a goal via a
novel path without using updating or landmarks visible from both locations). How-
ever, the few observations arguing for novel shortcuts in insects and mammals have
been criticized because they do not exclude alternative explanations and could not be
replicated in better controlled experiments [1]. For example, in the famous experiment
by Tolman, Ritchie and Kalish [43], the rats' shortcutting behavior can be explained by
assuming they directly approached the only available light source within the room.
Although the discussion whether non-human animals are able to make novel shortcuts
has yet to be settled, such shortcutting behavior should be fairly common if orienta-
tion was based on a cognitive map. This is clearly not the case. Similarly, a human
shortcutting experiment within an “impossible” virtual environment casts doubt upon
a cognitive map as the basis for such survey navigation [34]. In this experiment unno-
ticeable portals within the virtual environment teleported participants to another loca-
tion within the environment. They could, therefore, not construct a consistent
two-dimensional map of this environment. Still, participants were able to shortcut
quite accurately. The last shortcoming of cognitive maps is that we have to use many
of them anyway. We surely do not have one and the same cognitive map (reference
frame) to represent the house we grew up in, New York and the Eiffel Tower. At one
point, we have to use multiple cognitive maps and (probably) represent relations be-
tween them.
Graph representations and cognitive maps have specific advantages and limitations.
Graphs are good for representing route knowledge. However, they do not explain survey
knowledge. Contrary to that, cognitive maps are straightforward representations of
survey knowledge. They are, however, not well suited for self-localization and route
knowledge and fail to explain some human and non-human orientation behavior. As a
solution to these limitations, often both representations are assumed in parallel to best
account for the different behaviors observed ([2], [4], [30], [45]; see also [12], [28],
[33]). However, assuming two representations in parallel also poses difficulties. First,
the last three arguments against cognitive maps also argue against theories which
assume both graphs and cognitive maps. In addition, according to “Occam’s razor”
(law of parsimony), one simple representation is preferable to multiple representations
when explaining behavior. Multiple representations of one environment also raise the
question of how these representations are connected. A house, for example, can be
represented within a graph and a cognitive map. The house-representation in the map
should refer to the corresponding house representation within the graph and not to a
representation of another house. First, this correspondence has to be specified some-
how, for example, via an association which results in even more information to be
represented. Second, it is a non-trivial problem to keep the correspondences free of
error. A theory has to state how this is accomplished.
In conclusion, neither a graph representation, nor a cognitive map alone is suffi-
cient to convincingly explain orientation behavior in humans and non-human animals.
Both representations together also pose tremendous difficulties. As a solution to these
problems, I would like to propose the network of reference frames theory which com-
bines graphs and cognitive maps within one representation. This theory described in
Chapter 2 avoids the problems which were already mentioned. Together with proc-
esses operating on this representation, it explains self-localization, route navigation
and survey navigation. Furthermore, this theory can also explain other effects which
have not yet been pointed out. This will be described in Chapter 3 where it will also
be compared to other theories.

2 The Network of Reference Frames Theory

In this chapter, I will describe the network of reference frames theory in terms of the
representations and the processes acting on those, and how these are used for different
tasks, such as navigation, survey knowledge, etc.

2.1 Representation

The network of reference frames theory describes the memory representation acquired
by human and non-human animals when locomoting through environmental spaces
such as the countryside, buildings, or cities. It also describes how this representation is
used for self-localization, route and survey navigation. The theory is a fusion between
graph representations and cognitive maps (cf., Fig. 2). It assumes that the environment
is encoded in multiple interconnected reference frames. Each reference frame can be
described as a coordinate system with a specific orientation. These reference frames
form a network or graph. A node within this network is a reference frame referring to a
single vista space. Vista spaces surround the navigator and can be perceived from
one point of view, for example, a room, a street, or even a valley [26].2 This means
that the basic unit in the network is always the reference frame of a vista space.
Within this vista space reference frame, the location of objects and the surrounding
geometry are specified. The edges in the network define the so-called perspective shift
necessary to move from one reference frame to the next. Such a perspective shift
consists of both a translation and a rotation component, for example, moving forward
150 meters and then turning right 90°. Perspective shifts all point to another reference
frame;3 they may differ in precision and in the association strength with which they
connect the two reference frames. The more familiar a navigator is with an environ-
ment, the more precise the perspective shifts will become and the more strongly the
perspective shift will connect two reference frames.

Fig. 2. A visualization of the network of reference frames theory. Reference frames correspond
to single vista spaces. They are connected via perspective shifts which specify the translation
and rotation necessary to get from one reference frame to the next one.

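As a purely illustrative sketch (the class and attribute names are illustrative assumptions, not
part of the theory's notation), the representation just described can be written down in Python
as a set of vista-space reference frames connected by directed perspective shifts, each carrying
a translation, a rotation, a precision, and an association strength:

from dataclasses import dataclass, field

@dataclass
class VistaSpaceFrame:
    # A node: one reference frame for a single vista space.
    name: str
    landmarks: dict = field(default_factory=dict)   # object -> (x, y) within this frame
    geometry: list = field(default_factory=list)    # boundary polygon within this frame

@dataclass
class PerspectiveShift:
    # A directed edge: translation and rotation needed to reach the next frame.
    target: str
    dx: float                   # translation (metres) within the source frame
    dy: float
    rotation: float             # rotation (degrees) into the target frame's orientation
    precision: float = 0.2      # assumed to grow with familiarity
    association: float = 0.1    # assumed to grow with familiarity; used for route selection

network = {
    "corridor": VistaSpaceFrame("corridor", landmarks={"kitchen door": (3.0, 1.0)}),
    "kitchen": VistaSpaceFrame("kitchen"),
}
shifts = {
    "corridor": [PerspectiveShift("kitchen", dx=3.0, dy=1.0, rotation=90.0)],
}
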
The network of vista space reference frames connected via perspective shifts is
stored in long-term memory. Several processes shape or operate on this memory.
These processes are encoding, reorientation by recognition, route navigation, and
survey navigation. In the following they will be described in detail (for a summary see
Table 1).

2 Vista spaces extend to the back of a navigator (although nothing might be represented there).
While other senses such as audition or information from self-motion may be used to construct
a representation of a vista space, the main source to do so will be vision.
3 Humans are able to imagine how a perceived or a remembered vista space looks from a
different perspective. Such an imaginary shift in perspective within a vista space is not what is
called a perspective shift in the network of reference frames theory. Here a perspective shift,
first, is stored in memory and is not imagined online, and second, a perspective shift always
connects two vista spaces and does not occur within one vista space.

2.2 Encoding

First Time Encounter. Encoding describes the process of constructing a representation
of an environmental space through initial and continued contact. It is assumed that
encoding happens automatically. When navigating through an environmental space
for the first time, we perceive vista spaces within the environmental space. This
perceived vista space corresponds to a reference frame. The orientation of that refer-
ence frame is either determined by the view from which the vista space was experi-
enced in the first place (cf., [20], [46]) or it is determined by the salient geometry of
that vista space ([28], [33]). In daily life, these two directions usually coincide. For
example, when entering a street or a house, our first view of the street or house is
usually aligned with the geometry of the surrounding walls. Accessing such a refer-
ence frame will be easier and lead to an improved performance when one is aligned
with the orientation of this reference frame, (e.g., looking down the street), than when
not aligned, (e.g., facing a house in the street). Within this reference frame, the ge-
ometry of the enclosure is encoded (e.g., walls, hedges, houses or large objects). In
addition to the geometry, locations of objects, such as landmarks, can be located
within such a reference frame of a vista space.
After encoding an individual reference frame, a navigator moves on and encodes
other reference frames corresponding to other vista spaces. These vista spaces do not
necessarily have to be adjacent. A perspective shift will connect the two vista space
reference frames, (i.e., the translations and rotations necessary to get from the first
reference frame to the second). This perspective shift can be derived (1) from the
visual scene itself, (2) from updating during navigating between the two vista spaces,
and (3) from global landmarks visible from both vista spaces.
Deriving the perspective shift from the visual scene can be shown in an example
such as standing in the corridor of a house and watching the kitchen door. The kitchen

Table 1. Summary of the representation and processes assumed in the network of reference
frames theory

Representation
Network (graph) consisting of nodes connected by edges (see Fig. 2)
Node: a reference frame with an orientation specifying locations and orientations within a
vista space; within this reference frame, objects and the geometric layout are encoded
Edge: perspective shift, i.e., translation and rotation necessary to move to the next reference
frame; perspective shifts point to the next reference frame and differ in precision and
association strength.
Processes
Encoding: first time experience or the geometry of a vista space define the orientation of a
new reference frame; the visual scene itself, updating, or global landmarks can provide the
perspective shift to the next vista space reference frame; familiarity increases the accuracy
of the perspective shifts and the association strength of these connections.
Self-localization by recognition: recognizing a vista space by the geometry or landmarks it
contains provides location and orientation within this vista space and the current
node/reference frame within the network
Route navigation by activation spread: an activation spread mechanism provides a route from
the current location to the goal; during wayfinding, reference frames on the route are pre-
activated and, therefore, recognized more easily; recently visited reference frames are
deactivated
Survey navigation by imagination: imagining connected vista spaces not visible step-by-step
within the current reference frame; allows retrieving direction and straight line distance to
distant locations; this can be used for shortcutting or pointing.
The kitchen door provides us with the information of where (translational component) and in
which orientation (rotational component) the kitchen is located with respect to the
reference frame of the corridor. Extracting the perspective shift from the visual scene
itself, however, only works for adjacent vista spaces with a visible connection.
For non-adjacent vista spaces, updating can provide the perspective shift. In doing
so, one’s location and orientation within the current reference frame is updated while
moving away from its origin, (i.e., navigators track their location and orientation
relative to the latest encoded vista space). When encoding a new reference frame, the
updated distance and orientation within the former reference frame provides the nec-
essary perspective shift to get from the first reference frame to the next. In that sense,
updating can provide the “glue” connecting locations in an environmental space (cf.,
[17]). Updating can also work as a lifeline saving navigators from getting lost. As
long as navigators update the last reference frame visited, they are able to return to the
origin of the last encoded reference frame, (i.e., they are oriented).
A third possibility to get a perspective shift when already located in the second ref-
erence frame is by self-localizing with respect to a global landmark also visible from
the first vista space reference frame, for example, a tower or a mountain top. Self-
localizing provides a navigator with the position and orientation with respect to the
reference frame in which the global landmark was first experienced. This is the per-
spective shift necessary to get from the first reference frame to the second one.
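
A hedged sketch of how updating can supply a perspective shift (the coordinate convention,
step sizes, and function names are assumptions made only for this example): the navigator's
pose is dead-reckoned within the last encoded frame and, when a new frame is opened, the
accumulated pose is stored as the perspective shift connecting the two frames.

import math

def update_pose(pose, step, turn_deg):
    """Dead-reckon one movement step within the current reference frame.
    pose = (x, y, heading_deg); step is metres walked along the new heading."""
    x, y, heading = pose
    heading = (heading + turn_deg) % 360.0
    x += step * math.cos(math.radians(heading))
    y += step * math.sin(math.radians(heading))
    return (x, y, heading)

# Walk away from the origin of the last encoded vista-space frame:
# 50 m straight ahead, then a 90 degree turn followed by 100 m.
pose = (0.0, 0.0, 0.0)
for step, turn in [(50.0, 0.0), (100.0, 90.0)]:
    pose = update_pose(pose, step, turn)

# When a new vista space is encoded, the updated pose becomes the perspective
# shift (translation plus rotation) that links the two reference frames.
perspective_shift = {"dx": pose[0], "dy": pose[1], "rotation": pose[2]}
print(perspective_shift)   # approximately {'dx': 50.0, 'dy': 100.0, 'rotation': 90.0}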

Repeated Visits. Re-visiting an environmental space can add new perspective shifts
to the network and will increase the precision and association strength of existing
perspective shifts (for the latter see 2.4). Walking a new route to a familiar goal will
form a new chain of reference frames and perspective shifts connecting the start and
goal. That way, formerly unconnected areas, such as city districts, can be connected.
When walking a known route in reverse direction, the theory assumes that new per-
spective shifts are encoded in a backward direction. Then two reference frames A and
B are connected with two perspective shifts, one pointing from A to B and the other
one pointing from B to A. In principle, inverting one perspective shift would be suffi-
cient to get the opposite perspective shift. However, such an inversion process is as-
sumed to be error-prone and costly; therefore, it is usually not applied.
When navigating an existing perspective shift along its orientation repeatedly, no
new perspective shift is encoded, but the existing perspective shift becomes more
precise. This increase in precision corresponds to a shift from route knowledge to
more precise survey knowledge. The precision of survey knowledge is directly de-
pendent upon the precision of the perspective shift (for a similar model for updating
see [6]). For many people, perspective shifts will be imprecise after the first visit, and
therefore, highly insufficient, (e.g., for pointing to distant destinations). However,
they still accurately represent route knowledge, (i.e., indicate which reference frame is
connected with which other reference frame). When the perspective shifts become
more precise after repeated visits, survey knowledge will also become more precise
(cf., [25]; see 2.5). This corresponds with the original claim that route knowledge
usually develops earlier than survey knowledge (e.g., [36]). However, survey knowl-
edge does not have to develop at all (e.g., [24]) or can in principle also be observed
after just a few learning trials (e.g., [27]). Correspondingly, the perspective shifts may
be precise enough for pointing or other survey knowledge tasks after little experience
or they may remain imprecise even after an extended experience. Here, large differ-
ences between individuals due to the sense of direction can be expected (cf., [9],
[35]). Updating global orientation while navigating an environmental space will result
in more precise perspective shifts, and therefore, improve survey knowledge. It fol-
lows that people with a good sense of direction will also acquire precise survey
knowledge quicker. Similarly, environments which ease such updating will lead to
more precise perspective shifts and improve survey knowledge accordingly. This
facilitation can be gained, for example, by uniform slant, distant landmarks, or a grid
city, which all have been shown to enhance orientation performance (e.g., [25], [32]).
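
The familiarity effect described above can be pictured with a toy update rule (the saturating
rule and its constants are illustrative assumptions, not claims made by the theory): each
traversal of a perspective shift nudges its precision and its association strength towards
their maximum.

def traverse(shift, learning_rate=0.3):
    """One successful traversal makes a stored perspective shift more precise
    and more strongly associated; both values saturate at 1.0."""
    shift["precision"] += learning_rate * (1.0 - shift["precision"])
    shift["association"] += learning_rate * (1.0 - shift["association"])
    return shift

shift = {"precision": 0.2, "association": 0.1}
for _ in range(3):      # three repeated visits along the same direction
    traverse(shift)
print(shift)            # both values have moved towards 1.0 with familiarity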

2.3 Self-localization by Recognition

When someone gets lost within a familiar environmental space, the principal mode of
reorientation will be by recognizing a single vista space within this environment (for
self-localizing by the structure of environmental spaces see [21], [38]). A vista space
can be recognized by its geometry or by salient landmarks located within (cf. [3]).
First, recognizing a vista space provides navigators with their location and their ori-
entation within this vista space. Second, recognizing a vista space provides navigators
with their location within the network, (i.e., in which node or vista space reference
frame they are located). Their position in terms of direction and distance with respect
to currently hidden locations in the environmental space, however, has to be inferred
from memory. This will be explained in the section on survey navigation by imagina-
tion further below.
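
A minimal sketch of recognition-based self-localization, under the simplifying assumption
that each stored vista-space frame is summarized by a set of landmark labels (the matching
rule below is a placeholder, not a model of the actual recognition process):

def self_localize(observed_landmarks, network):
    """Pick the stored vista-space frame that shares most landmarks with the current
    view; the match also fixes which node of the network the navigator is in."""
    best_frame, best_score = None, 0
    for name, landmarks in network.items():
        score = len(observed_landmarks & landmarks)
        if score > best_score:
            best_frame, best_score = name, score
    return best_frame

network = {
    "market square": {"fountain", "town hall", "cafe"},
    "station front": {"clock tower", "bus stop"},
}
print(self_localize({"fountain", "cafe"}, network))   # -> 'market square'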

2.4 Route Navigation by Activation Spread

Route navigation means selecting and traveling a route from the current location to a
goal. The network of reference frames theory assumes an activation spread mecha-
nism to explain route selection which was proposed by Chown et al. [4] as well as
Trullier et al. [45]. Within the network, activation from the current reference frame
(current node) spreads along the perspective shifts (edges) connecting the various
reference frames (nodes). If the activation reaches the goal node, the route transferring
the activation will be selected, (i.e., a chain of reference frames connected with per-
spective shifts). Here, the association strength of perspective shifts is important. The
association strength is higher for the most navigated perspective shifts. Activation
will be spread faster along those edges that are higher in association strength. If sev-
eral possible routes are encoded within the network, the route that spreads the activa-
tion fastest will be selected for navigation. This route need not necessarily be the
shortest route or the route with the fewest nodes. As the activation propa-
gates more easily via highly associated edges, such familiar routes will be selected with
higher probability.
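
The selection mechanism can be sketched as a best-first spread in which an edge with
association strength a delays the activation by 1/a, so that activation reaches the goal
first along well-travelled connections (a simplification in the spirit of [4] and [45]; the
association values below are invented):

import heapq

def select_route(edges, start, goal):
    """Spread activation from the current node; the chain of reference frames that
    delivers the activation to the goal first is selected as the route."""
    frontier = [(0.0, start, [start])]
    arrival = {}
    while frontier:
        t, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        if node in arrival and arrival[node] <= t:
            continue
        arrival[node] = t
        for association, nxt in edges.get(node, []):
            heapq.heappush(frontier, (t + 1.0 / association, nxt, path + [nxt]))
    return None

# The familiar detour via 'main street' is selected although the direct but rarely
# travelled edge to the bakery would need fewer nodes.
edges = {
    "home": [(0.9, "main street"), (0.1, "bakery")],
    "main street": [(0.9, "bakery")],
}
print(select_route(edges, "home", "bakery"))   # ['home', 'main street', 'bakery']
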
During navigation, the perspective shift provides navigators with information about
where to move next, (i.e., perform the perspective shift). If the perspective shift is
rather imprecise, navigators will only have an indicated direction in which to move.
Moving in this direction, they will eventually be able to recognize another vista space
reference frame. Updating the last reference frame visited will prevent naviga-
tors from getting lost. Pre-activating reference frames to come and de-activating
already visited reference frames will facilitate recognition. When successfully navi-
gating a known route, its perspective shifts will become more accurate and their asso-
ciation strengths will increase, making it more probable that the route will be selected
again.
The described process is probably sufficient to explain most non-human route
navigation. It is also plausible that such a process is inherited in humans and applied,
for example, when navigating familiar environments without paying much attention.
However, humans can certainly override this process and select routes by other
means.

2.5 Survey Navigation by Imagination

Survey knowledge tasks such as pointing or shortcutting require that relevant locations
are represented within one frame of reference, (e.g., the current location and the goal
destination). The network of reference frames theory assumes that this integration
within one frame of reference occurs online within working memory. This is only
done when necessary and only for the respective area. For example, when pointing to
a specific destination, only the area from the current location to the destination is
represented. In this framework, the integration within one frame of reference happens
during the retrieval of information and not during encoding or elaboration, as with a
cognitive map. The common reference frame is available only temporarily in working
memory and is not constantly represented in long term memory. The integration itself
is done by imagining distant locations as if the visibility barriers of the current vista
space were transparent. The current vista space can be the one physically surrounding
the navigator or another vista space that is imagined. From the current vista space’s
reference frame, a perspective shift provides the direction and orientation of the con-
nected reference frame. With this information, the navigator imagines the next vista
space within the current frame of reference, (i.e., this location is imagined in terms of
direction and distance from the current vista space). This way, the second vista space
is included in the current reference frame. Now, a third vista space can be included
using the perspective shift connecting the second and the third vista space reference
frames. That way, every location known in the surrounding environmental space can
be imagined. Now, the navigator can point to this distant location, determine the
straight line distance, and try to find a shortcut.
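
Formally, imagining a chain of connected vista spaces amounts to composing planar
translations and rotations. The sketch below (an illustrative formulation; the distances and
angles are made up) composes two perspective shifts and reads off the straight-line distance
and pointing direction to a vista space that is not currently visible:

import math

def compose(shift_a, shift_b):
    """Express shift_b (given in the frame that shift_a points to) within the current
    frame, i.e. imagine the next vista space as if the walls were transparent."""
    ax, ay, arot = shift_a
    bx, by, brot = shift_b
    r = math.radians(arot)
    x = ax + bx * math.cos(r) - by * math.sin(r)
    y = ay + bx * math.sin(r) + by * math.cos(r)
    return (x, y, (arot + brot) % 360.0)

# current frame -> frame B (150 m ahead, then a 90 degree turn),
# frame B -> frame C (40 m ahead within frame B)
to_b = (150.0, 0.0, 90.0)
b_to_c = (40.0, 0.0, 0.0)
to_c = compose(to_b, b_to_c)

distance = math.hypot(to_c[0], to_c[1])
bearing = math.degrees(math.atan2(to_c[1], to_c[0]))
print(round(distance), round(bearing))
# roughly 155 (metres) and 15 (degrees off straight ahead; the sign depends on the convention)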

3 The Network of Reference Frames Theory in the Theoretical and the Empirical Context

3.1 The Network of Reference Frames Theory Compared to Graph Representations and Cognitive Maps

The network of reference frames theory is a fusion between graph representations and
cognitive maps. Multiple reference frames or cognitive maps are connected with each
other within a graph structure. As in graph representations, the basic structure is a
network or graph. However, in contrast to most existing graph models ([4], [19], [20],
[45], [48]), metric information is included within this graph. This is done for the
nodes, which consist of reference frames, as well as for the edges, (i.e., the perspec-
tive shifts, which represent translations and turns). Such a representation avoids the
problems associated with the mentioned graph representations (see 1.2): (1) Most
importantly, it can explain survey knowledge, as metric relations are represented
contrary to other graph models. (2) Representing metric relations also uses informa-
tion provided by perception. Depth vision and other processes allow us to perceive the
spatial structure of a scene. This information is stored and not discarded like in other
graph models. (3) Perspective shifts represent abstract relations that can be used to
guide walking, cycling, driving, etc. No problem of generalizing from one represented
action to another action occurs as in other graph representations.
The network of reference frames theory also avoids problems from the cognitive
map (see 1.2): (1) It can explain self-localization and route navigation in a straightforward
manner, which is difficult for cognitive maps. (2) An environmental space is
not encoded within one reference frame as with a cognitive map. The representation,
therefore, does not have to be consistent globally. So, contrary to cognitive maps,
shortcutting is also possible when navigating "impossible" virtual environments [34].
(3) The lack of clear evidence for survey navigation in non-human mammals and
insects can be easily explained. According to the network of reference frames theory,
these animals are not capable of imagining anything or they do not do so for survey
purposes. However, survey navigation relies on the same representation as self-
localization and route navigation. Only the additional process of imagining operates
on this representation. This process might have even evolved for completely different
purposes than navigation. Contrary to that, cognitive map theory has to assume that an
additional representation, (i.e., a cognitive map), evolved only in humans specifically
for orientation. These are much stronger assumptions. (4) Imagining distant destina-
tions within working memory involves a lot of computation. Survey tasks are, there-
fore, effortful and error-prone, as most people can probably confirm. In contrast,
this daily-life observation is hard to reconcile with a cognitive map: deriving the direction
to distant locations from a cognitive map is rather straightforward and should not be
more effortful than, for example, route navigation.4
The network of reference frames theory also has advantages compared to assuming
both a graph and a cognitive map in parallel (see 1.2):5 Here survey navigation is
again explained by the cognitive map part. This does not avoid the last three problems
mentioned in the last paragraph.6 In addition, the network of reference frames theory
makes fewer assumptions. On a rough scale, it only assumes one representation, whereas the
combination of graphs and maps assumes two representations. More specifically,
graphs and maps need to connect corresponding elements, for example, elements
which represent the same house. These connections are extra and potentially more
error-prone. A last problem with cognitive maps, already mentioned, is that we must
have multiple cognitive maps anyway, because we cannot represent the whole world
within one single cognitive map. As we do use reference frames to represent spatial
locations, the question is what spatial area such reference frames usually encode.
Here, it is proposed that this basic unit consists of a vista space.

4 As an alternative to simply reading out survey relations from a cognitive map, mental travel
has been proposed [2]. Mental travel can be considered as being more effortful and is, therefore,
much more plausible. For the network of reference frames theory, continuous mental travel
within the area of an encoded vista space can be imagined. Between non-adjacent vista spaces,
this should be rather difficult.
5 Some theories assuming both a network representation and a global cognitive map are skepti-
cal regarding the necessity and the evidence for such a cognitive map ([16], [31]).
6 In his theory, Poucet [31] assumes a network layer with pairwise metric relations between
places. This representation can be used to compute shortcuts and avoids the problems men-
tioned with cognitive maps. However, Poucet also proposes a global integration within a cog-
nitive map, leading again to the mentioned problems. In addition, it is unclear which of the two
metric representations determines survey navigation.

3.2 Vista Space Reference Frames as the Basic Unit in the Representation of
Environmental Spaces

Representing a space in multiple interconnected units works with units of different
size. Using large units, such as towns, results in large packages of information which
might be difficult to process as a whole. On the other hand, smaller units such as
individual objects, result in an exponential increase in relations between the units
which have to be represented. Many experiments show that humans are able to repre-
sent vista spaces within one frame of reference (e.g., [12], [28], [33]). So the main
question is whether navigators use vista spaces or whether they also use larger units,
(e.g., buildings or city districts), to represent locations within one reference frame.
Updating experiments indicate that a surrounding room is always updated during
blindfolded rotations. This is not necessarily the case for the whole surrounding cam-
pus, suggesting that the relevant unit is smaller than a campus [47].
The network of reference frames theory predicts that there are no common reference
frames for units larger than vista spaces. Other theories on spatial orientation in robots
[50] and rodents [44] also rely on the visible area as the basic element.7 Several argu-
ments support vista spaces as the basic unit in spatial orientation. (1) Vista spaces are
the largest unit provided directly by visual perception, and (2) they are directly rele-
vant for navigation. (3) Visibility is correlated with wayfinding performance. (4)
Hippocampal place cells are likely related to vista spaces, and (5) our own experi-
ments show that participants encode a simple environmental space not within one
reference frame, but use multiple reference frames in the orientation predicted by the
network of reference frames theory.

7 In Yeap's theory [50] all vista spaces are directly adjacent to each other and are connected via
exits. Survey relations computed from that representation are, therefore, correct when the form
of the individual vista spaces is correct. In the network of reference frames theory, the precision
of survey relations depends on the precision of the perspective shifts. In addition, Yeap assumes
a hierarchical structuring on top of the basic vista space level. Touretzky and Redish [44] do not
say anything about environmental spaces. They also assume that multiple, simultaneously active
reference frames represent one vista space.

Vista spaces can be experienced from only one point of view. In order to represent
environmental spaces, such as buildings and cities, we have to move around (the case
of learning from paper maps is not considered here). When encoding units larger than
vista spaces, several percepts have to be integrated. Such integration is not done spon-
taneously [8]. Vista spaces are also the most relevant unit for navigation. Route deci-
sions have to be taken within a vista space. When lost, self-localization is usually
accomplished by recognizing the geometry or landmarks within a specific vista space
(e.g., [3]). Shortcutting is difficult because it encompasses more than just one vista
space. In contrast, selecting the direct path to a necessarily visible location within a
vista space is trivial. Visibility is also correlated with behavior. More vista spaces,
(i.e., corridors on a route), lead to larger errors in Euclidean distance estimation [41].
Learning a virtual environmental space is easier with a full view down a corridor than
when visual access is restricted to a short distance, which results in more vista spaces
that need to be encoded [38]. Place cells in the human and rodent hippocampus seem to
represent a location in a vista space ([5], [30]). Place cells fire every time a navigator
crosses a specific area independent of head orientation. This area is relative to the
surrounding boundaries of a vista space and is adjusted when changing the overall
size or shape of the vista space [29]. One and the same place cell can be active in
different vista spaces and, therefore, cannot encode one specific location in an envi-
ronmental space [37]. In conclusion, a set of place cells is a possible neuronal repre-
sentation of locations within one frame of reference. This frame is likely to be limited
to a vista space.
In addition to arguments from the literature, we recently tested the prediction from
the network of reference frames theory concerning the importance of vista space refer-
ence frames [23]. This prediction incorporated, first, that a vista space is the largest
unit encoded within one single reference frame, and second, that the orientation of
such a vista space reference frame is important, (i.e., that navigators perform better
when they are aligned with that orientation). Participants learned a simple immersive
virtual environmental space consisting of seven corridors by walking in one direction.
In the testing phase, they were teleported to different locations in the environment and
were asked to self-localize and then point towards previously learned targets. As pre-
dicted by the network of reference frames theory, participants performed better when
oriented in the direction in which they originally learned each corridor, (i.e., when
they were aligned with an encoded vista space reference frame). If the whole envi-
ronment was encoded within one single frame of reference, this result could not be
predicted. One global reference frame should not result in any difference at all (cf.,
[12]) or participants should perform better when aligned with the orientation of this
single global reference frame as predicted by reference axis theory ([28], [33]). No
evidence for this could be observed. Participants seem to encode multiple local refer-
ence frames for each vista space in the orientation they experienced this vista space
(which coincided with its geometry).

3.3 Egocentric and Allocentric Reference Frames

The reference frames in the network of reference frames theory correspond to vista
spaces and they are connected via perspective shifts. Are these relations egocentric or
allocentric? Egocentric and allocentric reference frames have been discussed inten-
sively over the last few years (e.g., [28], [46]). In an egocentric reference frame loca-
tions and orientations within an environment are represented relative to the location
and orientation of a navigator’s body in space [15]. This is best described by a polar
coordinate system. An allocentric reference frame is specified by a space external to a
navigator. Here object-to-object relations are represented in contrast to the object-to-
body relations in the egocentric reference frame. An allocentric reference frame is
best described by a Cartesian coordinate system.
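
For illustration only (the pose, conventions, and numbers are assumptions of this example),
converting an egocentric polar observation into allocentric Cartesian coordinates requires the
navigator's own pose; once converted, the coordinates no longer refer to the body, which is
the essential difference between the two kinds of reference frame.

import math

def ego_to_allo(navigator_pose, distance, bearing_deg):
    """Egocentric observation (distance and bearing relative to the body axis)
    -> allocentric (x, y) in an external Cartesian reference frame."""
    nx, ny, heading_deg = navigator_pose
    angle = math.radians(heading_deg + bearing_deg)
    return (nx + distance * math.cos(angle), ny + distance * math.sin(angle))

# A navigator at (2, 3), facing 90 degrees, sees an object 5 m away, 30 degrees to the left.
print(ego_to_allo((2.0, 3.0, 90.0), 5.0, 30.0))
# approximately (-0.5, 7.33); the result no longer mentions the navigator's body.
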
In principle, the network of reference frames theory is compatible with egocentric
as well as allocentric reference frames. With egocentric reference frames, elements
within a vista space are encoded relative to the origin of the egocentric reference
frame by vectors (and additional rotations if the relative bearing matters). Perspective
shifts are just egocentric vectors which point to another reference frame instead of an
object within the vista space. Despite in principle being compatible with egocentric
reference frames, the network of reference frames theory is better classified as allo-
centric. This decision is based on five arguments: (1) The origin which is quite
prominent in polar coordinate systems does not play a role in the network of reference
frames theory. No performance differences are predicted whether a navigator is lo-
cated at the origin or at another location within a vista space reference frame. A polar
coordinate system would suggest that this makes a difference. (2) Contrary to the
origin, the orientation of a reference frame does make a difference according to the
network of reference frames theory. When aligned with this orientation, participants should
perform better and do so (see 3.2). Such an orientation, however, is more prominent
in Cartesian coordinate systems than it is in polar coordinate systems. (3) The orien-
tation of a reference frame originates either from the initial experience with a vista
space or from the vista space’s main geometric orientation, (e.g., the orientation of the
longer walls of a room). In principle, the main geometric orientation might never have
been experienced directly, (i.e., a navigator was never aligned with the surrounding
walls). Still, the geometry might determine the orientation of the reference frame (cf.,
[33]). Although this is a highly artificial situation, such a reference frame has to be
allocentric. (4) Within a vista space reference frame, the geometry of the boundaries
of this vista space is encoded. It has been shown that the room geometry is encoded as
a whole (i.e., allocentrically not by egocentric vectors; e.g., [46]). So at least some of
the relations within a vista space are allocentric anyway. (5) Although perspective
shifts can be understood as egocentric vectors (plus rotations), they are intuitively
better described as relations between locations in an environmental space, (i.e., allo-
centric relations), rather than relations between egocentric experiences. In summary,
the arguments suggest that the network of reference frames theory is better under-
stood as allocentric than as egocentric.

3.4 The Relation between Vista Space Reference Frames: Network vs.
Hierarchy

Hierarchic theories of spatial memory have been very prominent (e.g., [4], [11], [40],
[50]). In such views, smaller scale spaces are stored at progressively lower levels of
the hierarchy. Contrary to these approaches, the network of reference frames theory
does not assume environmental spaces are organized hierarchically, but assumes envi-
ronmental spaces are organized in a network. There is no higher hierarchical layer
assumed above a vista space. All vista spaces are equally important in that sense. This
does not exclude vista spaces themselves from being organized hierarchically.
Hierarchical graph models or hierarchical cognitive maps still face most of the
problems discussed in 3.1. However, one argument for hierarchical structuring is
based on clustering effects. In clustering effects, judgments within a spatial region are
different from judgments between or outside spatial regions. For instance, within
a region, distances are estimated faster and judged to be shorter, or locations are
remembered as lying closer to the center of such a region than they were seen before.
Many of these clustering effects have been examined for regions within a vista space
or a whole country usually learned via maps (e.g., [40]). They are, therefore, not rele-
vant here. However, clustering effects are also found in directly experienced envi-
ronmental spaces. Experiments show that distance judgments [11] and route decisions
between equal length alternatives [49] are influenced by regions within the environ-
mental space. These effects cannot be explained by the network of reference frames theory
alone. A second categorical memory has to be assumed which represents a specific
region (cf., [13]). Judgments must be based at least partially on these categories and
not on the network of reference frames only. These categories might consist of verbal
labels such as “downtown” [22]. As a prediction, no clustering effects for directly
learned environmental spaces should be observed when such a category system is
inhibited, (e.g., by verbal shadowing).

3.5 Asymmetry in Spatial Memory

The perspective shifts assumed by the network of reference frames theory are not
symmetric. They always point from one vista space to another and are not inverted
easily. Tasks accessing a perspective shift in its encoded direction should be easier
and more precise than tasks that require accessing the perspective shift in the opposite
direction - at least as long as there is no additional perspective shift encoded in the
opposite direction. This asymmetry can explain the route direction effect in spatial
priming and different route choices for wayfinding there and back.
After learning a route presented on a computer screen in only one direction, recog-
nizing pictures of landmarks is faster when primed with a picture of an object encoun-
tered before the landmark than when primed with an object encountered after the
landmark (e.g., [14]). According to the network of reference frames theory the direc-
tionality of perspective shifts speeds up activation spread in the direction the route
was learned. Therefore, priming is faster in the direction a route was learned.
Asymmetries are also found in path choices. In a familiar environment, navigators
often choose different routes on the way out and back (e.g., [39]). According to the
network of reference frames theory, different perspective shifts usually connect vista
spaces on a route out and back. Due to different connections, different routes can be
selected when planning a route out compared to planning the route back.
The network of reference frames theory explains asymmetries on the level of route
knowledge. However, it also predicts an asymmetry in survey knowledge. Learning a
route mainly in one direction should result in an improved survey performance, (i.e.,
faster and more precise pointing), in this direction compared to the opposite direction.
This has yet to be examined.

4 Conclusions
The network of reference frames theory is a synthesis of graph representations and
cognitive maps. It resolves problems that exist in explaining the orientation behavior
of human and non-human animals based on either graphs, maps or both of them in
parallel. In addition, the theory explains the unique role of vista spaces as well as
asymmetries in spatial memory. New predictions from the theory concern, first, the
role of orientation within environmental spaces, which has been tested recently, sec-
ond, the lack of clustering effects in environmental spaces based on the assumed
memory alone, and third, an asymmetry in survey knowledge tasks. Further experi-
ments have to show whether the network of reference frames theory will prove of
value in these and other cases.

Acknowledgements. This research was supported by the EU grant "Wayfinding" (6th
FP - NEST). I would like to thank Heinrich Bülthoff for supporting this work, Bern-
hard Riecke, Christoph Hölscher, Hanspeter Mallot, Jörg Schulte-Pelkum and Jack
Loomis for discussing the ideas proposed here, Jörg Schulte-Pelkum for help with
writing and Brian Oliver for proof-reading.

References
1. Bennett, A.T.D.: Do animals have cognitive maps? Journal of Experimental Biology 199,
219–224 (1996)
2. Byrne, P., Becker, S., Burgess, N.: Remembering the past and imagining the future: a neu-
ral model of spatial memory and imagery. Psychological Review 114, 340–375 (2007)
3. Cheng, K., Newcombe, N.S.: Is there a geometric module for spatial orientation? Squaring
theory and evidence. Psychonomic Bulletin & Review 12, 1–23 (2005)
4. Chown, E., Kaplan, S., Kortenkamp, D.: Prototypes, location, and associative networks
(PLAN): Towards a unified theory of cognitive mapping. Cognitive Science 19, 1–51
(1995)
5. Ekstrom, A., Kahana, M., Caplan, J., Fields, T., Isham, E., Newman, E., Fried, I.: Cellular
networks underlying human spatial navigation. Nature 425, 184–187 (2003)
6. Fujita, N., Klatzky, R.L., Loomis, J.M., Golledge, R.G.: The encoding-error model of
pathway completion without vision. Geographical Analysis 25, 295–314 (1993)
7. Gallistel, C.R.: The organization of learning. MIT Press, Cambridge (1990)
8. Hamilton, D.A., Driscoll, I., Sutherland, R.J.: Human place learning in a virtual Morris
water task: some important constraints on the flexibility of place navigation. Behavioural
Brain Research 129, 159–170 (2002)
9. Hegarty, M., Waller, D.: Individual differences in spatial abilities. In: Shah, P., Miyake, A.
(eds.) The Cambridge Handbook of Visuospatial Thinking, pp. 121–169. Cambridge Uni-
versity Press, Cambridge (2005)
10. Hein, A., Held, R.: A neural model for labile sensorimotor coordination. In: Bernard, E.E.,
Kare, M.R. (eds.) Biological prototypes and synthetic systems, vol. 1, pp. 71–74. Plenum,
New York (1962)
11. Hirtle, S.C., Jonides, J.: Evidence of hierarchies in cognitive maps. Memory & Cogni-
tion 13, 208–217 (1985)
12. Holmes, M.C., Sholl, M.J.: Allocentric coding of object-to-object relations in overlearned
and novel environments. Journal of Experimental Psychology: Learning, Memory and
Cognition 31, 1069–1078 (2005)
13. Huttenlocher, J., Hedges, L.V., Duncan, S.: Categories and particulars: prototype effects in
estimating spatial location. Psychological Review 98, 352–376 (1991)
14. Janzen, G.: Memory for object location and route direction in virtual large-scale space. The
Quarterly Journal of Experimental Psychology 59, 493–508 (2006)
15. Klatzky, R.L.: Allocentric and egocentric spatial representations: Definitions, distinctions,
and interconnections. In: Freksa, C., Habel, C., Wender, K.F. (eds.) Spatial cognition - An
interdisciplinary approach to representation and processing of spatial knowledge, pp. 1–17.
Springer, Berlin (1998)
16. Kuipers, B.: The spatial semantic hierarchy. Artificial Intelligence 119, 191–233 (2000)
17. Loomis, J.M., Klatzky, R.L., Golledge, R.G., Philbeck, J.W.: Human navigation by path
integration. In: Golledge, R.G. (ed.) Wayfinding behavior, pp. 125–151. John Hopkins
Press, Baltimore (1999)
18. MacFarlane, D.A.: The role of kinesthesis in maze learning. University of California Pub-
lications in Psychology 4, 277–305 (1930) (cited from Spada, H. (ed.) Lehrbuch allge-
meine Psychologie. Huber, Bern (1992))
19. McNaughton, B.L., Leonard, B., Chen, L.: Cortical-hippocampal interactions and cogni-
tive mapping: A hypothesis based on reintegration of parietal and inferotemporal pathways
for visual processing. Psychobiology 17, 230–235 (1989)
20. Mallot, H.: Spatial cognition: Behavioral competences, neural mechanisms, and evolution-
ary scaling. Kognitionswissenschaft 8, 40–48 (1999)
21. Meilinger, T., Hölscher, C., Büchner, S.J., Brösamle, M.: How Much Information Do You
Need? Schematic Maps in Wayfinding and Self Localisation. In: Barkowsky, T., Knauff,
M., Ligozat, G., Montello, D.R. (eds.) Spatial Cognition V, pp. 381–400. Springer, Berlin
(2007)
22. Meilinger, T., Knauff, M., Bülthoff, H.H.: Working memory in wayfinding - a dual task
experiment in a virtual city. Cognitive Science 32, 755–770 (2008)
23. Meilinger, T., Riecke, B.E., Bülthoff, H.H.: Orientation Specificity in Long-Term-Memory
for Environmental Spaces (submitted)
24. Moeser, S.D.: Cognitive mapping in a complex building. Environment and Behavior 20,
21–49 (1988)
25. Montello, D.R.: Spatial orientation and the angularity of urban routes: A field study. Envi-
ronment and Behavior 23, 47–69 (1991)
26. Montello, D.R.: Scale and multiple psychologies of space. In: Frank, A.U., Campari, I.
(eds.) Spatial information theory: A theoretical basis for GIS, pp. 312–321. Springer, Ber-
lin (1993)
27. Montello, D.R., Pick, H.L.: Integrating knowledge of vertically aligned large-scale spaces.
Environment and Behavior 25, 457–484 (1993)
28. Mou, W., Xiao, C., McNamara, T.P.: Reference directions and reference objects in spatial
memory of a briefly viewed layout. Cognition 108, 136–154 (2008)
29. O’Keefe, J., Burgess, N.: Geometric determinants of the place fields of hippocampal neu-
rons. Nature 381, 425–428 (1996)
30. O’Keefe, J., Nadel, L.: The hippocampus as a cognitive map. Clarendon Press, Oxford
(1978)
31. Poucet, B.: Spatial cognitive maps in animals: New hypotheses on their structure and neu-
ral mechanisms. Psychological Review 100, 163–182 (1993)
32. Restat, J., Steck, S.D., Mochnatzki, H.F., Mallot, H.A.: Geographical slant facilitates navi-
gation and orientation in virtual environments. Perception 33, 667–687 (2004)
33. Rump, B., McNamara, T.P.: Updating Models of Spatial Memory. In: Barkowsky, T.,
Knauff, M., Ligozat, G., Montello, D.R. (eds.) Spatial Cognition V, pp. 249–269. Springer,
Berlin (2007)
34. Schnapp, B., Warren, W.: Wormholes in virtual reality: What spatial knowledge is learned
for navigation? In: Proceedings of the 7th Annual Meeting of the Vision Science Society
2007, Sarasota, Florida, USA (2007)
35. Sholl, J.M., Kenny, R.J., DellaPorta, K.A.: Allocentric-heading recall and its relation to
self-reported sense-of-direction. Journal of Experimental Psychology: Learning, Memory,
and Cognition 32, 516–533 (2006)
36. Siegel, A.W., White, S.H.: The development of spatial representations of large-scale envi-
ronments. In: Reese, H. (ed.) Advances in Child Development and Behavior, vol. 10, pp.
10–55. Academic Press, New York (1975)
37. Skaggs, W.E., McNaughton, B.L.: Spatial Firing Properties of Hippocampal CA1 Popula-
tions in an Environment Containing Two Visually Identical Regions. Journal of Neurosci-
ence 18, 8455–8466 (1998)
38. Stankiewicz, B.J., Legge, G.E., Mansfield, J.S., Schlicht, E.J.: Lost in Virtual Space: Stud-
ies in Human and Ideal Spatial Navigation. Journal of Experimental Psychology: Human
Perception and Performance 37, 688–704 (2006)
39. Stern, E., Leiser, D.: Levels of spatial knowledge and urban travel modeling. Geographical
Analysis 20, 140–155 (1988)
40. Stevens, A., Coupe, P.: Distortions in judged spatial relations. Cognitive Psychology 10,
422–437 (1978)
41. Thorndyke, P.W., Hayes-Roth, B.: Differences in spatial knowledge acquired from maps
and navigation. Cognitive Psychology 14, 560–589 (1982)
42. Thrun, S., Burgard, W., Fox, D.: Probabilistic Robotics. MIT Press, Cambridge (2005)
43. Tolman, E.C., Ritchie, B.F., Kalish, D.: Studies in spatial learning. I. Orientation and the
short-cut. Journal of Experimental Psychology 36, 13–24 (1946)
44. Touretzky, D.S., Redish, A.D.: Theory of rodent navigation based on interacting represen-
tations of space. Hippocampus 6, 247–270 (1996)
45. Trullier, O., Wiener, S.I., Berthoz, A., Meyer, J.-A.: Biologically based artificial naviga-
tion systems: Review and prospects. Progress in Neurobiology 51, 483–544 (1997)
46. Wang, R.F., Spelke, E.S.: Human spatial representation: insights from animals. Trends in
Cognitive Sciences 6, 376–382 (2002)
47. Wang, R.F., Brockmole, J.R.: Simultaneous spatial updating in nested environments. Psy-
chonomic Bulletin & Review 10, 981–986 (2003)
48. Werner, S., Krieg-Brückner, B., Herrmann, T.: Modelling Navigational Knowledge by
Route Graphs. In: Habel, C., Brauer, W., Freksa, C., Wender, K.F. (eds.) Spatial Cognition
2000. LNCS (LNAI), vol. 1849, pp. 295–316. Springer, Heidelberg (2000)
49. Wiener, J., Mallot, H.: Fine-to-coarse route planning and navigation in regionalized envi-
ronments. Spatial Cognition and Computation 3, 331–358 (2003)
50. Yeap, W.K.: Toward a computational theory of cognitive maps. Artificial Intelligence 34,
297–360 (1988)
Spatially Constrained Grammars
for Mobile Intention Recognition

Peter Kiefer

Laboratory for Semantic Information Technologies


Otto-Friedrich-University Bamberg
96045 Bamberg, Germany
peter.kiefer@uni-bamberg.de

Abstract. Mobile intention recognition is the problem of inferring a mo-
bile agent's intentions from her spatio-temporal behavior. The intentions
an agent can have in a specific situation depend on the spatial context,
and on the spatially contextualized behavior history. We introduce two
spatially constrained grammars that allow for modeling of complex con-
straints between space and intentions, one based on Context-Free, one
based on Tree-Adjoining Grammars. We show which of these formalisms
is suited best for frequently occurring intentional patterns. We argue
that our grammars are cognitively comprehensible, while at the same
time helping to prune the search space for intention recognition.

Keywords: Intention recognition, Mobile assistance systems.

1 Introduction
The problem of inferring an agent's intentions from her behavior is called the in-
tention recognition problem. The closely related problem of plan recognition has
been discussed in the AI literature for many years [1]. Approaches for plan recog-
nition differ in the way the domain and possible plans are represented. While
early work tended to be quite general, like Kautz’s event hierarchies [2], current
research is typically concerned with specialized use cases (e.g. [3]), and efficient
inference (e.g. [4]).
A class of intention recognition problems with specific need for efficient infer-
ence is mobile intention recognition. We observe a mobile user’s trajectory and
try to ‘guess’ what intentions she has in mind. These mobile problems are differ-
ent, not only because of the restricted computational and cognitive resources [5].
Mobile intention recognition problems also differ from 'traditional' use cases be-
cause mobile behavior happens in space. This has a number of implications. One
is that we have knowledge about the spatial context, about spatial objects, their
relations, and spatial constraints. A glance at current research on the inverse
problem, spatio-temporal planning, gives us an idea how these constraints can
look like: Seifert et al. discuss an interactive assistance system that supports in
spatio-temporal planning tasks [6]. In their example they describe the constraints
that need to be considered when planning a trip: the temporal order of activities,
the time needed for traveling from A to B, and spatial constraints about what
actions can be performed at which location. Important about Seifert’s approach
is that the chosen hierarchical spatial structure offers a cognitively appealing
way of interaction between user and planning system, while at the same time
helping to prune the search space.
In this paper, we will see that complex constraints between intentions and
space not only give us a rich toolbox to formalize typical behavioral patterns
in mobile intention recognition, but can also speed up inference. We choose for-
mal grammars to represent intentions so that the intention recognition problem
becomes a parsing problem. Grammars are, in general, cognitively easy to under-
stand and make the connection between expressiveness and complexity explicit.
The main contribution of this paper is the combination of spatial constraints
with Tree Adjoining Grammars (TAG), a formalism from natural language pro-
cessing (NLP) that falls in complexity between context-free and context-sensitive
grammars (CFG, CSG). The idea to apply grammar formalisms from NLP to
plan/intention recognition is also followed by Geib and Steedman [7], and in our own
previous work [8]. In contrast to these approaches, our spatially constrained
grammars allow the formalization of complex, non-local constraints between in-
tentions and space (and not only between intentions).
The rest of this paper is structured as follows: in section 2 we explain which
steps are necessary to state a mobile intention recognition problem as a pars-
ing problem. In this context we review Spatially Grounded Intentional Systems
(SGIS) [9]. In section 3, we explain which important use cases cannot be han-
dled with SGIS, and proceed over Spatially Constrained Context-Free Grammars
(SCCFG) to Spatially Constrained Tree-Adjoining Grammars (SCTAG). Using
real motion track data from the location-based game CityPoker we discuss which
general spatio-temporal behavior patterns are handled best with which formal-
ism. The paper closes with a discussion of related work (section 4) and an outlook
on questions that remain open (section 5).

2 From Spatio-temporal Behavior to Intentions


2.1 Mobile Intention Recognition
The fact that mobile behavior happens in space and time has mainly two im-
plications: one is that we can make use of spatial information. We do not only
know the absolute coordinate of a user’s behavior, but also the spatial context.
With an according spatial model we can say that the behavior happened, for
instance, in a specific region, on a road, or close to a point of interest. We also
have information about the spatial relations between these objects [10], like in-
tersect, overlap, or north of. Depending on the specific use case, these spatial
objects also bear a certain semantics: ‘a restaurant is a place where I can have
the intention to eat something’. This is very similar to the basic intuition behind
activity-based spatial ontologies [11]. However, inferring the agent’s intention di-
rectly from her position is too simple in many situations: a mobile user passing
by a restaurant does not necessarily have the intention to eat there. Schlieder calls this spatio-temporal design problem the room-crossing problem [9].

Fig. 1. Segmented motion track with classified behavior sequence from a CityPoker game. (The player enters from the right.)
This leads us to the second implication of spatio-temporality: the gap be-
tween sensor input (e.g. position data from a GPS device) and high-level in-
tentions (e.g. ‘find a restaurant’ ) is extremely large. It is not possible to design
an intelligent intention recognition algorithm that works directly on pairs of
(latitude/longitude). To bridge this gap, we use a multi-level architecture with
the level of behaviors as intermediate level between position and intention. We
process a stream of (lat/lon)-pairs as follows:
1. Preprocessing. The quality of the raw GPS data is improved. This includes
removing points with zero satellites, and those with an impossible speed.
2. Segmentation. The motion track is segmented at the border of regions, and
when the spatio-temporal properties (e.g. speed, direction) of the last n
points have changed significantly [12].
3. Feature Extraction. Each segment is analyzed and annotated with certain
features, like speed and curvature [13].
4. Classification. Using these features, each motion segment is classified to one
behavior. We can use any mapping function from feature vector to behaviors,
for instance realized as a decision tree.
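To make this pipeline concrete, the following Python sketch strings the four steps together. All function names, data fields (sats, speed), and thresholds are hypothetical placeholders that do not come from the paper; they only illustrate the kind of processing described above.

def preprocess(track, max_speed=15.0):
    """Step 1: drop fixes with zero satellites or an impossible speed."""
    return [f for f in track if f["sats"] > 0 and f["speed"] <= max_speed]

def segment(track, n=5, speed_jump=3.0):
    """Step 2 (simplified): cut the track when the mean speed of the last n
    fixes differs markedly from the rest (region borders omitted here)."""
    segments, current = [], []
    for fix in track:
        current.append(fix)
        if len(current) > n:
            recent = sum(f["speed"] for f in current[-n:]) / n
            earlier = sum(f["speed"] for f in current[:-n]) / (len(current) - n)
            if abs(recent - earlier) > speed_jump:
                segments.append(current)
                current = []
    if current:
        segments.append(current)
    return segments

def features(seg):
    """Step 3: annotate each segment with features such as mean speed."""
    return {"mean_speed": sum(f["speed"] for f in seg) / len(seg)}

def classify(feat):
    """Step 4: a toy decision rule mapping features to behavior labels."""
    if feat["mean_speed"] < 0.3:
        return "b0"   # standing
    elif feat["mean_speed"] < 1.5:
        return "bs"   # sauntering
    return "br"       # riding

def behavior_stream(track):
    # e.g. behavior_stream([{"sats": 5, "speed": 4.2}, ...]) -> ["br", ...]
    return [classify(features(s)) for s in segment(preprocess(track))]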

As output we get a stream of behaviors. In the example from Fig. 1 we distinguish the following behaviors: riding (br), standing (b0), sauntering (bs), curving
(bc ), and slow-curving (bcs ). This track was recorded in the location-based game
CityPoker. In this game, two players are trying to find (physical) playing cards

which are hidden in a city. The gaming area is structured by five rectangular
cache regions. In each cache region there are three potential cache coordinates
(one is drawn as a circle in Fig. 1). Cards are only hidden in one of the three
potential caches. Players can find out about the correct cache by answering a
multiple choice question. Once they have arrived at the cache, they perform
a detail search in the environment, under bushes, trees, or benches, until they
finally find the cards. They may then trade one card against one from their hand,
and continue in the game. For a complete description of the game, refer to [9].
The reason this game is especially well suited as an exemplary use case is that CityPoker is played by bike at high speed. The user's cognitive resources are taken up by the traffic, and she cannot interact with the device (a J2ME-enabled smartphone, localized by GPS) in the usual way. Similar situations occur in other use cases, like car navigation or maintenance work. Depending on the recognized intention we want to select an appropriate information service automatically. For instance, if we recognize the intention FindWay we will probably select a map service. It is up to the application designer to decide whether to present the service with information push, or just to ease the access to this service ('hotbutton'). We will not discuss the step of mapping intentions to information services any further in this paper.

2.2 Parsing Behavior Sequences


The stream of behaviors described above serves as input to a parsing algorithm.
Using behaviors as terminals and intentions as non-terminals, we can write rules of
a formal grammar that describe the intentions of an agent in our domain. Most plan
recognition approaches have followed a hierarchical structure of plans/intentions
(e.g. [14,15]). We should say something about the difference between plans and
intentions although an elaborate discussion of this issue is beyond the scope of this
paper. In line with common BDI agent literature, we see intentions as ‘states of
mind’ which are directed ‘towards some future state of affairs’ ([16, p.23]). We see
‘plans as recipes for achieving intentions.’ [16, p.28]. We can say that a rule in our
grammar describes a plan, while each non-terminal stands for one intention. Thus,
the aim of intention recognition is to find out (at least) the current intention.
In CityPoker, for instance, a player will certainly have the intention to Play. At
the beginning of each game, the members of a team discuss their strategy. Play-
ing in CityPoker means exchanging cards in several cache regions, so we model
a sequence of intentions as follows: GotoRegion HandleRegion, GotoRegion
HandleRegion, and so on. In the cache region players find themselves a com-
fortable place to stand, answer a multiple-choice question, and select one out
of three caches, depending on their answer. In the cache, they search for a playing card which is hidden in the environment (see the behavior sequence in Fig. 1).
A context-free production system for CityPoker is listed in Fig. 2.¹ Grammar rules like these are modular and intuitively understandable, also for non-computer scientists. Formal properties of grammars are well known, and parsing algorithms exist. The choice of formalism depends on the requirements of the use case. We briefly recall that with a CFG we can express patterns of the form a^n b^n. As argued in [9], most intention recognition use cases need at least this expressiveness. A typical example is leaving the same number of regions as entered before (enter^n leave^n). Note that parsing a stream of behaviors must be done incrementally, i.e. with an incomplete behavior sequence. We can find the currently active intention in the parse tree by choosing the non-terminal which is the direct parent of the current behavior.

¹ Rules with a right-hand side of the form (symbol_1 | ... | symbol_n)+ are a simplified notation for 'an arbitrary sequence of symbol_1, ..., symbol_n, but at least one of them'.
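As an illustration of how the currently active intention can be read off an (incomplete) parse tree, consider the following Python sketch. The tree class and the example derivation are invented for illustration; the paper's actual parser is not shown here.

class Node:
    """A node in a (partial) parse tree: an intention or a behavior."""
    def __init__(self, symbol, children=None):
        self.symbol = symbol
        self.children = children or []
        self.parent = None
        for c in self.children:
            c.parent = self

def rightmost_leaf(node):
    """Follow the rightmost children down to the most recently parsed behavior."""
    while node.children:
        node = node.children[-1]
    return node

def current_intention(root):
    """The currently active intention is the direct parent of the last behavior."""
    leaf = rightmost_leaf(root)
    return leaf.parent.symbol if leaf.parent else None

# Example derivation using rules (1) and (2): Play -> DiscussStrategy Continue,
# DiscussStrategy -> b0; only the behavior b0 has been observed so far.
tree = Node("Play", [Node("DiscussStrategy", [Node("b0")])])
print(current_intention(tree))   # -> "DiscussStrategy"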

2.3 Reducing Parsing Ambiguities by Adding Spatial Knowledge

When parsing formal grammars we easily find ourselves in a situation where the
same input sequence may have two or more possible parse trees, i.e. more than
one possible interpretation. This is especially true when parsing an incomplete
behavior sequence incrementally. One way to deal with ambiguity is probabilistic grammars [17], where we have to determine a probability for each rule in the grammar. A spatial way of ambiguity reduction is proposed by Schlieder in [9]:
SGIS are context-free production systems, like that in Fig. 2, with the extension
that each rule is annotated with a number of regions in which it is applicable. We
call this the spatial grounding of rules. For instance, a HandleCache intention is
grounded in all regions of type cache. We modify all rules accordingly. An SGIS
rule for the original rule (12) would look as follows:

HandleCache → SearchCards DiscussStrategy
    [grounding: cache_1,1, ..., cache_5,3]

This reduces the number of possible rules applicable at each position in the
behavior sequence, thus avoiding many ambiguities. Figure 3 shows two possible
interpretations for the behavior sequence from Fig. 1: without spatial knowledge
we could not decide which of the two interpretations is correct. For parsing in
SGIS we replace the pure behavior stream (beh1 , beh2 , beh3 , ...) by a stream of
behavior/region pairs: ((beh1 , reg1 ), (beh2 , reg2 ), (beh3 , reg3 ), ...). Each behavior
is annotated with the region in which it occurs. Also the non-terminals in the
parse tree are annotated with a region (Intention, region), with the meaning that
all child-intentions or child-behaviors of this intention must occur in that region.
SGIS are a short form of writing rules of the following form (where Symbol can
be an intention or a behavior):

(Intention, reg_x) → (Symbol_1, reg_x) ... (Symbol_n, reg_x)

That means we cannot write rules for arbitrary combinations of regions. In addition, we require that another rule can only be inserted at an intention Symbol_i if the region of the other rule is a (transitive) child in the partonomy, i.e. in the above rule we can only insert productions with a region reg_y that is part of reg_x (which includes the same region: reg_y equals reg_x). SGIS have been designed for partonomially structured space. The nesting of rules follows closely the nesting of regions

Production Rules for CityPoker


Play → DiscussStrategy Continue (1)
DiscussStrategy → b0 (2)
Continue → ε | GotoRegion HandleRegion Continue (3)
GotoRegion → (br |b0 |bc )+ (4)
HandleRegion → SelectCache GotoCache HandleCache (5)
SelectCache → FindParkingPos AnswerQuiz (6)
FindParkingPos → (br |bc |bcs )+ (7)
AnswerQuiz → b0 (8)
GotoCache → (SearchWayToC |NavigateTowardsC)+ (9)
SearchWayToC → (b0 |bcs |bs )+ (10)
NavigateTowardsC → (br |bc )+ (11)
HandleCache → SearchCards DiscussStrategy (12)
SearchCards → (CrossCache|DetailSearch)+ (13)
CrossCache → (br )+ (14)
DetailSearch → (b0 |bcs |bs |bc )+ (15)

Fig. 2. Context-free production rules for intention recognition in CityPoker

Fig. 3. Parsing ambiguity if we had no spatial knowledge (see track from Fig. 1). Through spatial disambiguation in SGIS we can decide that the bottom parse tree is correct.

and sub-regions in the spatial model. The CityPoker partonomy is structured as follows: the game area contains five rectangular cache regions, each of which in turn contains three caches.
SGIS deliberately restrict us in what we can express: we cannot write rules for arbitrary pairs of behavior and region. This makes sense from a spatial point of view (the agent cannot 'beam' herself), as well as from a cognitive point of view: as in Seifert et al. [6], the knowledge engineer is working with a representational formalism that resembles a structure of space preferred by many individuals: a hierarchical one [18].
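The partonomy-based insertion condition of SGIS boils down to a transitive part-of test over regions. The following is a minimal sketch, assuming a simple parent map over region names; all names and the data structure are hypothetical illustrations, not the paper's implementation.

# Hypothetical partonomy: each region maps to its parent region (None at the top).
PARENT = {
    "cache_1_1": "cacheRegion_1",
    "cacheRegion_1": "gameboard",
    "gameboard": None,
}

def part_of(region_y, region_x, parent=PARENT):
    """True if region_y equals region_x or is a transitive child of it."""
    while region_y is not None:
        if region_y == region_x:
            return True
        region_y = parent.get(region_y)
    return False

print(part_of("cache_1_1", "gameboard"))   # True
print(part_of("gameboard", "cache_1_1"))   # False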

3 Spatially Constrained Grammars


3.1 Spatially Constrained Context-Free Grammars
SGIS is a formalism with which we can model a variety of spatio-temporal inten-
tion recognition problems. With the spatial grounding of rules we can formalize
spatial constraints of type part of . Constraints about the temporal order of in-
tentions are formalized implicitly through the order of right-hand symbols in the
production rules.
However, the restrictions of SGIS hinder us from expressing frequently occurring use cases. Consider the motion track in Fig. 1: the agent enters the cache, shows some searching behavior, and then temporarily leaves the circular cache to the south. Knowing the whole motion track, we can decide that this is better described as an accidental leaving, i.e. no intention change, than as a ChangePlan intention². For an incremental algorithm, it is not clear at the moment of leaving whether the agent will return. It is also not necessary that the intermediate behavior is located in the parent cache region of the cache. Finally, entering just any cache is not sufficient for accidental leaving; we require that cache to be the same one that was left before. We would need the following rule
(HandleCache, cache_1,1) → (SearchCards, cache_1,1),
                           (accidental leaving behavior, [unconstrained]),
                           (SearchCards, cache_1,1)
We cannot formulate this in SGIS, but still it makes no sense to write rules for
pairs of (intention, region). We have already argued against this maximum of
complexity in section 2.3. At this point, we can add another argument: we would
have to write a plethora of similar rules for each cache in our game. What we
would need to formalize the accidental leaving pattern elegantly is a rule in which the two SearchCards symbols are linked by an identical constraint:

HandleCache → SearchCards Confused SearchCards    [identical: SearchCards, SearchCards]
² A player in CityPoker who has given a wrong answer to the quiz will be searching at the wrong cache and will probably give up after some time. He will then head for one of the other caches. The ChangePlan intention was omitted in Fig. 2 for reasons of clarity.

We can easily find other examples of the pattern 'a certain behavior/intention occurs in a region which has a spatial relation r to another region where the agent has done something else before'. For instance, we can find use cases where it makes sense to detect a ReturnToX intention if the agent has forgotten the way back to some place. We could define this as 'the agent shows a searching behavior in a region which touches a region she has been to before':

ClothesShopping → ExamineClothes HaveABreak ReturnToShop    [touches: ExamineClothes, ReturnToShop]

The definition of a new spatial context-free grammar that handles these ex-
amples is quite straightforward.
Definition 1. A Spatially Constrained Context-Free Grammar is defined as
SCCFG = (CFG, R, SR, GC, NLC), where
– CFG is a context-free grammar (I, B, P, S), defined over intentions I, and
behaviors B, with production rules P and start symbol S (the top-level inten-
tion).
– R is a set of regions
– SR is a set of spatial relations, where each relation r ⊆ R × R
– GC ⊆ P × R is a set of grounding constraints (as in SGIS [9])
– NLC is a set of spatial non-local constraints. Each constraint has a type
from the spatial relations SR and is defined for two right-hand symbols of
one production rule from P.
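One straightforward way to hold the components of Definition 1 in memory is sketched below in Python. The concrete encoding (dicts and index tuples) is an illustrative choice, not the paper's; the particular symbol pairs listed as non-local constraints are one plausible reading of the constraint arcs shown for rules (5) and (12) later in this section.

# Hypothetical encoding of an SCCFG fragment; the indices in each NLC entry
# refer to positions of right-hand-side symbols in the corresponding rule.
sccfg = {
    "productions": {
        5:  ("HandleRegion", ["SelectCache", "GotoCache", "HandleCache"]),
        12: ("HandleCache",  ["SearchCards", "DiscussStrategy"]),
    },
    "regions": {"gameboard", "cacheRegion_1", "cache_1_1"},
    "spatial_relations": {"identical", "part_of", "touches"},
    "grounding_constraints": [],              # (production, region) pairs, as in SGIS
    "non_local_constraints": [
        (5,  "identical", 0, 1),   # SelectCache and GotoCache in identical regions
        (5,  "part_of",   2, 1),   # HandleCache's region part of GotoCache's region
        (12, "identical", 0, 1),   # SearchCards and DiscussStrategy in identical regions
    ],
}

def constraints_for(rule_id, grammar=sccfg):
    """Return the non-local constraints attached to one production rule."""
    return [c for c in grammar["non_local_constraints"] if c[0] == rule_id]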
We introduce the grounding constraints to make SCCFG a real extension of
SGIS. However, we will not always need them, as in the CityPoker example.
The reason is that CityPoker-regions are typed according to their level in the
partonomy (cache part of cache region part of gameboard). With a SCCFG we
can rewrite the rules from Fig. 2 without spatial grounding in a specific region,
but with part of and identical relations, for instance for rules (5) and (12):

HandleRegion → SelectCache GotoCache HandleCache
    [identical: SelectCache, GotoCache]   [part of: HandleCache, GotoCache]
HandleCache → SearchCards DiscussStrategy
    [identical: SearchCards, DiscussStrategy]

SCCFGs obviously have a higher expressiveness than SGIS. We can express more spatial relations than part of, and create a nesting of relations by applying the production rules. In contrast to SGIS, the nesting of constraints is not necessarily accompanied by a corresponding nesting of regions in the partonomy. The example above for rule (5) shows that we could also infer new relations from those we know (HandleCache must be part of SelectCache).
In principle, we could define an SCCFG for a non-partonomial spatial structure, although this might make the model cognitively more demanding.

Fig. 4. Substitution (left) and adjoining (right) on a TAG (taken from [20, Fig. 2.2])

3.2 Cross-Dependencies: A Parallel to NLP


Quite frequently, players in CityPoker do not exchange a playing card even though they have found it. They memorize the types of cards they have found and their exact position, and continue in the game. For a number of reasons it might make sense to exchange a card in another cache region first. Sometimes they return to that cache region at some later point in the game to exchange a card (without the effort of answering the quiz, searching the cache, and so on). An intelligent assistance system should recognize the intention RevisitRegion and offer an appropriate information service. The crossed return to region pattern we would like to model in this use case looks as follows:
HandleRegion HandleRegion RevisitRegion HandleRegion RevisitRegion
    (each RevisitRegion is linked by an identical constraint to the corresponding earlier HandleRegion, and these constraints cross)

What we need for this is a possibility to create cross-dependencies. A constrained context-free grammar, like an SCCFG, can have cross-dependencies, but only static ones which are defined directly in the rules. No new cross-dependencies can evolve during parsing through the operations offered for CFGs. Modeling all possibilities for cross-dependencies statically in the rules is infeasible, even for CityPoker. Note that more than two constraints might be crossing, and not all HandleRegion intentions are followed by a corresponding RevisitRegion.
As explained in [7] and [8], similar cross-dependencies occur in NLP. In some
natural languages, cross-dependencies are possible between grammatical con-
structs. If a certain tense, case, or other grammatical form is chosen for the
front non- or pre-terminal, we have to choose a corresponding construct for the back non- or pre-terminal. To handle such cross-dependencies, the NLP com-
munity has developed formalisms with an extended domain of locality: ‘By a
domain of locality we mean the elementary structures of a formalism over which
dependencies such as agreement, subcategorization, filler-gap, etc. can be spec-
ified.’ ([19, p.5]). In the following, we introduce one of these formalisms, and
convert it to a spatially constrained one.

3.3 Tree-Adjoining Grammars


Mildly Context-Sensitive Grammars (MCSG) are a class of formal grammars with
common properties [21]. Their expressiveness falls between CFGs and CSGs, and

they support certain kinds of dependencies, including crossed and nested dependencies. They are polynomially parsable and thus especially attractive for mobile
intention recognition.
Tree-Adjoining Grammars (TAG), first introduced in [22], are a MCSG with
an especially comprehensible way of modeling dependencies. The fundamental
difference to CFGs is that TAGs operate on trees, and not on strings. A good
introduction to TAG is given by Joshi and Schabes in [20]. They define TAG as
follows.
Definition 2. A Tree-Adjoining Grammar is defined as TAG = (NT, Σ, IT,
AT, S), where

– NT are non-terminals
– Σ are terminals.
– IT is a finite set of initial trees. In an initial tree, interior nodes are labeled
by non-terminals. The nodes on the frontier (leaf nodes) are labeled by either
terminals, or non-terminals. A frontier node labeled with a non-terminal
must be marked for substitution. We mark substitution nodes with a ↓.
– AT is a finite set of auxiliary trees. In an auxiliary tree, interior nodes are
also labeled by non-terminals. Exactly one node at the frontier is the foot
node, marked with an asterisk ∗. The foot node must have the same label as
the root node. All other frontier nodes are either terminals or substitution
nodes, as in the initial trees.
– S is a distinguished non-terminal (starting symbol).

The two operations defined on TAGs are substitution and adjoining (see Fig. 4).
Adjoining is sometimes also called adjunction. Both operations work directly on
trees. Substitution is quite straightforward: we can place any initial tree (or any
tree that has been derived from an initial tree) headed with a symbol X into
a substitution node labeled with X↓. It is the adjoining operation that makes
TAGs unique: we can adjoin an auxiliary tree labeled with X into an interior
node of another tree with the same label. This operation works as follows: (1) we
remove the part of the tree which is headed by the interior node, (2) replace it
by the auxiliary tree, and (3) attach the partial tree which was removed in step 1
at the foot node. The language defined by a TAG is a set of trees. By traversing a tree we can also interpret it as a string, just like traversing a parse tree of a CFG. If, just for a moment, we interpret the two operations as operations on strings, we see that substitution simply replaces a non-terminal by a number of symbols. This is exactly like applying a production rule in a CFG. Adjoining manipulates a string in a more intricate way: a part of the old string (the terminals of the grey tree in Fig. 4) becomes surrounded by new material to the left and to the right (the symbols to the left and right of the foot node X* in the auxiliary tree).
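A toy Python rendering of the adjoining operation may help make steps (1) to (3) concrete. Trees are nested lists of the form [label, child, ...] and a trailing '*' marks the foot node; this is an illustrative simplification, not a TAG implementation from the literature.

import copy

def adjoin(tree, path, aux):
    """Adjoin auxiliary tree `aux` at the interior node of `tree` reached via
    `path` (a list of child indices): (1) detach the subtree at that node,
    (2) put the auxiliary tree in its place, (3) re-attach the detached
    subtree at the foot node."""
    tree = copy.deepcopy(tree)
    if not path:                          # adjoining at the root
        new_sub = copy.deepcopy(aux)
        _attach_at_foot(new_sub, tree)
        return new_sub
    node = tree
    for i in path[:-1]:
        node = node[i]
    target = node[path[-1]]               # step 1: subtree to detach
    assert target[0] == aux[0], "labels must match"
    new_sub = copy.deepcopy(aux)
    _attach_at_foot(new_sub, target)      # step 3 (prepared before splicing)
    node[path[-1]] = new_sub              # step 2: auxiliary tree takes its place
    return tree

def _attach_at_foot(aux, subtree):
    """Replace the unique foot node (label ending in '*') by `subtree`."""
    for i, child in enumerate(aux):
        if i == 0 or not isinstance(child, list):
            continue
        if child[0].endswith("*"):
            aux[i] = subtree
            return True
        if _attach_at_foot(child, subtree):
            return True
    return False

# alpha: Play -> DiscussStrategy Continue; beta: Continue -> GotoRegion HandleRegion Continue*
alpha = ["Play", ["DiscussStrategy"], ["Continue"]]
beta  = ["Continue", ["GotoRegion"], ["HandleRegion"], ["Continue*"]]
print(adjoin(alpha, [2], beta))
# -> ['Play', ['DiscussStrategy'], ['Continue', ['GotoRegion'], ['HandleRegion'], ['Continue']]]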
Joshi and Schabes later add to their definition of TAG the following Adjoining
Constraints: Selective Adjunction, Null Adjunction, and Obligatory Adjunction.
Every non-terminal in any tree may be constrained by one of these. Selective
Adjunction restricts the auxiliary trees that may be adjoined at that node to a

Fig. 5. Initial tree (α) and auxiliary trees (β and γ) in a SCTAG for CityPoker

set of auxiliary trees. Obligatory Adjunction does the same, but at the same time forces us to adjoin at that node. Null Adjunction disallows any adjunction
at that node. These local constraints are important to write sensible grammars,
but will not be further discussed here due to our focus on non-local constraints.
A discussion of the formal properties of TAGs, the differences to other gram-
mars, a corresponding automaton, as well as parsing algorithms can be found in
a number of publications, e.g. [20,23,24]. For our use case it should be clear that
(1) we can easily rewrite any CFG as TAG, (2) TAGs are more expressive than
CFGs, and (3) writing a TAG is not necessarily more complicated than writing
a CFG. Instead of writing a number of production rules, we just write a number
of trees.

3.4 Spatially Constrained Tree-Adjoining Grammars

Definition 3. A Spatially Constrained Tree-Adjoining Grammar is defined as SCTAG = (TAG, R, SR, GC, NLC), where

– TAG = (I, B, IT, AT, S), defined over intentions I, and behaviors B.
– R is a set of regions
– SR is a set of spatial relations, where each relation r ⊆ R × R
– GC ⊆ (IT ∪ AT ) × R is a set of grounding constraints
– NLC is a set of spatial non-local constraints. Each constraint has a type from
the spatial relations SR and is defined for two nodes in one tree from IT∪AT.

This definition applies the idea of spatial constraints to TAGs. The non-local
constraints are now defined between nodes in initial/auxiliary trees. The idea of
specifying non-local dependencies in TAG is not new. In earlier work on TAGs,
Joshi describes this concept as ‘TAGs with links’ [23, Section 6.2].
During the operations of substitution and adjoining the non-local constraints
remain in the tree, and become stretched if necessary. Adjoining may also lead to

cross-dependencies, as needed for modeling the crossed return to region pattern. Figure 5 lists part of an SCTAG that handles the re-visiting of cache regions in CityPoker. Non-local spatial constraints are displayed as dotted lines. A complete grammar for this use case would convert all context-free rules from Fig. 2 to trees and add them to the grammar. This step is trivial. Figure 6 demonstrates how cross-dependencies evolve through two adjoining operations.

Fig. 6. Adjoining in an SCTAG can lead to cross-dependencies of constraints. Non-crossing spatial constraints are omitted for reasons of clarity.

3.5 Parsing Spatially Constrained Grammars


For parsing a spatially constrained grammar, we modify existing parsing al-
gorithms. CFGs are typically handled with chart-based parsers, like the well-
known Earley algorithm [25]. An algorithm for parsing TAGs, based on the
Cocke-Younger-Kasami algorithm, was proposed in [24], with a polynomial worst
and average case complexity. Unfortunately, this complexity is O(n^6) and thus

quite high. Joshi presents a TAG parser that adopts the idea of Earley and
improves the average case complexity [20].
We build the parsers for SCCFG and SCTAG on these Earley-like parsers. Earley parsers work on a chart in which the elementary constructs of the grammar are kept: production rules for CFGs, trees for TAGs. A dot in each of these chart entries marks the position up to which the construct has been recognized. In Joshi's parser the 'Earley dot' traverses trees rather than strings. Earley parsers work in three steps: scan, predict, and complete. Predict checks for possible derivations and adds them to the chart. Scan reads the next symbol from the stream and matches it with the chart entries. Complete passes the recognition of rules up the tree until finally the starting symbol has been recognized. The TAG parser has a fourth operation, called 'adjoin', to handle the additional adjoining operation.
Our point is that adding spatial constraints to such a parser will not make it
slower but faster. The reason is that spatial constraints give us more predictive
information. ‘Any algorithm should have enough information to know which
tokens are to be expected after a given left context’ [20, p.36]. Knowing the
spatial context of left-hand terminals we can throw away those hypotheses that
are not consistent with the spatial constraints. We add this step after each scan
operation.
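A minimal Python sketch of this filtering step follows: after each scan, chart items whose spatial non-local constraints are already violated by the observed regions are discarded. The item and constraint encodings are hypothetical and greatly simplified relative to a full Earley chart.

def satisfied(constraint, bindings, relations):
    """constraint = (relation, i, j); bindings maps already-recognized symbol
    positions to regions. A constraint with an unbound position cannot be
    violated yet, so it counts as satisfied."""
    rel, i, j = constraint
    if i in bindings and j in bindings:
        return (bindings[i], bindings[j]) in relations[rel]
    return True

def spatial_filter(chart_items, relations):
    """Keep only the hypotheses consistent with all their spatial constraints."""
    return [item for item in chart_items
            if all(satisfied(c, item["bindings"], relations) for c in item["constraints"])]

# Toy example: an 'identical' relation holds only between equal regions.
relations = {"identical": {(r, r) for r in ["cache_1_1", "cache_1_2"]}}
items = [
    {"constraints": [("identical", 0, 2)], "bindings": {0: "cache_1_1", 2: "cache_1_1"}},
    {"constraints": [("identical", 0, 2)], "bindings": {0: "cache_1_1", 2: "cache_1_2"}},
]
print(len(spatial_filter(items, relations)))   # -> 1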

4 Related Work
We started this paper by saying that approaches for intention recognition dif-
fer in the way the domain and possible intentions are represented. A number of formalisms have been proposed for modeling the mental state of an agent, ranging from finite state machines [26] to complex cognitive modeling architectures, like the ACT-R architecture [27]. With our formal grammars, which lie between these two extremes, we try to strike a balance between expressiveness and computational complexity.
Using formal grammars to describe structural regularities is common, not
only in NLP, but also in areas like computer vision [28], and action recognition
[29]. Pynadath's state-dependent grammars constrain the applicability of a rule depending on a general state variable [17]. The generality of this state variable leads to an explosion of the symbol space when a parsing algorithm is applied, so an inference mechanism is chosen instead, which translates the grammar into a Dynamic Bayesian Network (DBN).
Choosing a grammatical approach means using grammars not only for syntax
description, but implicitly assigning a certain semantics (in terms of intentions
and plans). Linguistics is also concerned with semantics, both on the sentence level and on the level of discourse. Webber et al. [30], as one example from the literature on discourse semantics, argue that multiple, possibly overlapping, semantic relations are common in discourse. By using (lexicalized) TAG
they describe these relations without the need for building multiple trees.

SGIS:
  Dependencies supported: nested yes (only the part-of relation); crossed no.
  Typical spatial intention pattern: sub-intentions are located in the same region as, or in part-of sub-regions of, their parent intention.
  Example: R1R2R3R4R2R1

SCCFG:
  Dependencies supported: nested yes; crossed no (unless statically defined in productions).
  Typical spatial intention pattern: accidental leaving pattern (part-of, touches).
  Example: R1R2R3R1R4R5R1R2

SCTAG:
  Dependencies supported: nested yes; crossed yes.
  Typical spatial intention pattern: crossed return to region pattern (part-of, touches).
  Example: R1R2R3R1R4R1R2R1R5

Fig. 7. A hierarchy of spatial grammars for mobile intention recognition

Approaches based on probabilistic networks, like DBNs, have been widely applied in plan recognition research, ranging from Charniak and Goldman's plan recognition Bayesian networks [31] to the hierarchical Markov models used by Liao et al. in the BELIEVER system [32]. The semantics of 'goal' in the latter publication is 'target location', without a complex intention model. Bui proposes the Abstract Hidden Markov Memory Model for plan recognition in an intelligent office environment [33]. Geo-referenced DBNs are proposed in [34] to fuse sensory data and cope with the problem of inaccurate data.
Intention recognition approaches also differ in the way space is represented:
the simplest model consists of a number of points of interest with circular or
polygonal areas around them [35,26]. Others add a street network to these loca-
tions [32], use spatial tessellation [36], or formalize space with Spatial Conceptual
Maps [37].
The quality of our intention recognition relies on good preprocessing. Converting a motion track into a qualitative representation has been done by a number of researchers, for instance [38]. The authors also compare a number
of approaches to generalization. For the classification of segments in Fig. 1 we
used a simple decision tree. The set of behavior types we are interested in was
chosen manually. An automatic detection of motion patterns is the concern of
the spatio-temporal data mining community, see e.g. [39].
One concern of computational and formal linguistics is to find approaches
that closely resemble the human conceptualization of language. Steedman, for
instance, argues that planned action and natural language are related systems
that share the same operations: functional composition and type-raising [40].
Combinatory Categorial Grammar (CCG) is a mildly context-sensitive formal-
ism that supports these operators. Using a ’spatialized’ version of CCG for mobile
intention recognition could be worthwhile. We chose TAG in this paper because we believe that TAGs are cognitively more appealing for knowledge engineers not familiar with NLP concepts.

5 Conclusion and Outlook

We have presented a hierarchy of formal grammars for mobile intention recognition: SGIS, SCCFG, and SCTAG. With increasing expressiveness we can handle a larger number of spatio-temporal patterns which frequently occur in scenarios of mobile intention recognition, like in CityPoker. Our grammars allow the knowledge engineer to specify complex intention/space relations by using intuitive spatial relations, instead of writing arbitrarily complex rules for input of behavior/region tuples. Figure 7 gives an overview of the three formalisms.
We only sketched the principle of parsing. Currently, we are specifying the parsing algorithm for SCTAG formally. As a next step we will evaluate the algorithm on the restricted computational resources of a mobile device. In this paper we treated all spatial relations as arbitrary relations, and only mentioned that we could use them for inference. This is also an issue for our future work. Adding temporal constraints could be worthwhile, like 'the duration between these two intentions may not be longer than a certain Δt'. Another issue that remains open is recognizing that the agent spontaneously changes her intention [15].

Acknowledgements

I would like to thank Klaus Stein for the discussions on the algorithmic possibil-
ities of SCTAG parsing. Christoph Schlieder’s motivating and constant support
of my PhD research made this work possible.

References

1. Schmidt, C., Sridharan, N., Goodson, J.: The plan recognition problem: An in-
tersection of psychology and artificial intelligence. Artificial Intelligence 11(1-2),
45–83 (1978)
2. Kautz, H.A.: A Formal Theory of Plan Recognition. PhD thesis, University of
Rochester, Rochester, NY (1987)
3. Jarvis, P.A., Lunt, T.F., Myers, K.L.: Identifying terrorist activity with ai plan-
recognition technology. AI Magazine 26(3), 73–81 (2005)
4. Bui, H.H.: Efficient approximate inference for online probabilistic plan recogni-
tion. Technical Report 1/2002, School of Computing Science, Curtin University of
Technology, Perth, WA, Australia (2002)
5. Baus, J., Krueger, A., Wahlster, W.: A resource-adaptive mobile navigation system.
In: Proc. 7th International Conference on Intelligent User Interfaces, San Francisco,
USA, pp. 15–22. ACM Press, New York (2002)
6. Seifert, I., Barkowsky, T., Freksa, C.: Region-Based Representation for Assistance
with Spatio-Temporal Planning in Unfamiliar Environments. In: Location Based
Services and TeleCartography, pp. 179–192. Springer, Heidelberg (2007)
7. Geib, C.W., Steedman, M.: On natural language processing and plan recognition.
In: Proceedings of the 20th International Joint Conference on Artificial Intelligence
(IJCAI), pp. 1612–1617 (2007)

8. Kiefer, P., Schlieder, C.: Exploring context-sensitivity in spatial intention recognition. In: Workshop on Behavior Monitoring and Interpretation, 40th German Conference on Artificial Intelligence (KI-2007), CEUR, vol. 296, pp. 102–116 (2007), ISSN 1613-0073
9. Schlieder, C.: Representing the meaning of spatial behavior by spatially grounded
intentional systems. In: Rodrı́guez, M.A., Cruz, I., Levashkin, S., Egenhofer, M.J.
(eds.) GeoS 2005. LNCS, vol. 3799, pp. 30–44. Springer, Heidelberg (2005)
10. Egenhofer, M.J., Franzosa, R.D.: Point-set topological relations. International
Journal of Geographical Information Systems 5(2), 161–174 (1991)
11. Kuhn, W.: Ontologies in support of activities in geographical space. International
Journal of Geographical Information Science 15(7), 613–631 (2001)
12. Stein, K., Schlieder, C.: Recognition of intentional behavior in spatial partonomies.
In: ECAI 2004 Workshop 15: Spatial and Temporal Reasoning (16th European
Conference on Artificial Intelligence) (2005)
13. Schlieder, C., Werner, A.: Interpretation of intentional behavior in spatial
partonomies. In: Freksa, C., Brauer, W., Habel, C., Wender, K.F. (eds.) Spatial
Cognition III. LNCS (LNAI), vol. 2685, pp. 401–414. Springer, Heidelberg (2003)
14. Kautz, H., Allen, J.F.: Generalized plan recognition. In: Proc. of the AAAI con-
ference 1986 (1986)
15. Geib, C.W., Goldman, R.P.: Recognizing plan/goal abandonment. In: Proceedings
of the Eighteenth International Joint Conference on Artificial Intelligence (IJCAI),
pp. 1515–1517 (2003)
16. Wooldridge, M.: Reasoning About Rational Agents. MIT Press, Cambridge (2000)
17. Pynadath, D.V.: Probabilistic Grammars for Plan Recognition. PhD thesis, The
University of Michigan (1999)
18. Hirtle, S., Jonides, J.: Evidence of hierarchies in cognitive maps. Memory and
Cognition 13(3), 208–217 (1985)
19. Joshi, A.K., Vijay-Shanker, K., Weir, D.: The convergence of mildly context-
sensitive grammar formalisms. Technical Report MS-CIS-90-01, Department of
Computer and Information Science, University of Pennsylvania (1990)
20. Vijay-Shanker, K., Weir, D.: The equivalence of four extensions of context-free
grammars. Mathematical Systems Theory 27(6), 511–546 (1994)
21. Joshi, A., Levy, L., Takahashi, M.: Tree adjunct grammars. Journal of Computer
and System Sciences 10, 136–163 (1975)
22. Joshi, A.K., Schabes, Y.: Tree-adjoining grammars. In: Rozenberg, G., Salomaa, A.
(eds.) Handbook of Formal Languages, vol. 3, pp. 69–124. Springer, Berlin (1997)
23. Joshi, A.K.: Tree adjoining grammars: How much context-sensitivity is required to
provide reasonable structural descriptions? In: Dowty, D.R., Karttunen, L., Zwicky,
A.M. (eds.) Natural Language Parsing: Psychological, Computational, and Theo-
retical Perspectives, pp. 206–250. Cambridge University Press, Cambridge (1985)
24. Vijay-Shanker, K., Joshi, A.K.: Some computational properties of tree adjoining
grammars. In: Meeting of the Association for Computational Linguistics, Chicago,
Illinois, pp. 82–83 (1985)
25. Earley, J.: An efficient context-free parsing algorithm. Communications of the
ACM 13(2), 94–102 (1970)
26. Dee, H., Hogg, D.: Detecting inexplicable behaviour. In: Proceedings of the British
Machine Vision Conference, pp. 477–486. The British Machine Vision Association
(2004)
27. Anderson, J., Bothell, D., Byrne, M., Douglass, S., Lebiere, C., Qin, Y.: An inte-
grated theory of the mind. Psychological Review 111(4), 1036–1060 (2004)

28. Chanda, G., Dellaert, F.: Grammatical methods in computer vision: An overview.
Technical Report GIT-GVU-04-29, College of Computing, Georgia Institute of
Technology, Atlanta, GA, USA (November 2004),
ftp://ftp.cc.gatech.edu/pub/gvu/tr/2004/04-29.pdf
29. Bobick, A., Ivanov, Y.: Action recognition using probabilistic parsing. In: Proc. of
the Conference on Computer Vision and Pattern Recognition, pp. 196–202 (1998)
30. Webber, B., Knott, A., Stone, M., Joshi, A.: Discourse relations: A structural
and presuppositional account using lexicalised tag. In: Proc. of the 37th. Annual
Meeting of the American Association for Computational Linguistics (ACL1999),
pp. 41–48 (1999)
31. Charniak, E., Goldman, R.P.: A bayesian model of plan recognition. Artificial
Intelligence 64(1), 53–79 (1993)
32. Liao, L., Patterson, D.J., Fox, D., Kautz, H.: Learning and inferring transportation
routines. Artificial Intelligence 171(5-6), 311–331 (2007)
33. Bui, H.H.: A general model for online probabilistic plan recognition. In: Proceedings
of the International Joint Conference on Artificial Intelligence (IJCAI) (2003)
34. Brandherm, B., Schwartz, T.: Geo referenced dynamic Bayesian networks for user
positioning on mobile systems. In: Strang, T., Linnhoff-Popien, C. (eds.) LoCA
2005. LNCS, vol. 3479, pp. 223–234. Springer, Heidelberg (2005)
35. Ashbrook, D., Starner, T.: Using gps to learn significant locations and predict
movement across multiple users. Personal and Ubiquitous Computing 7(5), 275–
286 (2003)
36. Gottfried, B., Witte, J.: Representing spatial activities by spatially contextu-
alised motion patterns. In: RoboCup 2007, International Symposium, pp. 329–336.
Springer, Heidelberg (2007)
37. Samaan, N., Karmouch, A.: A mobility prediction architecture based on contextual
knowledge and spatial conceptual maps. IEEE Transactions on Mobile Comput-
ing 4(6), 537–551 (2005)
38. Musto, A., Stein, K., Eisenkolb, A., Röfer, T., Brauer, W., Schill, K.: From motion
observation to qualitative motion representation. In: Habel, C., Brauer, W., Freksa,
C., Wender, K.F. (eds.) Spatial Cognition 2000. LNCS (LNAI), vol. 1849, pp. 115–
126. Springer, Heidelberg (2000)
39. Laube, P., van Kreveld, M., Imfeld, S.: Finding REMO - detecting relative motion
patterns in geospatial lifelines. In: Developments in Spatial Data Handling, Pro-
ceedings of the 11th International Symposium on Spatial Data Handling, pp. 201–
215 (2004)
40. Steedman, M.: Plans, affordances, and combinatory grammar. Linguistics and Phi-
losophy 25(5-6), 725–753 (2002)
Modeling Cross-Cultural Performance on the Visual
Oddity Task

Andrew Lovett, Kate Lockwood, and Kenneth Forbus

Qualitative Reasoning Group, Northwestern University


2133 Sheridan Rd., Evanston, IL, 60201, USA
{andrew-lovett@, kate@cs., forbus@}northwestern.edu

Abstract. Cognitive simulation offers a means of more closely examining the reasons for behavior found in psychological studies. This paper describes a
computational model of the visual oddity task, in which individuals are shown
six images and asked to pick the one that doesn’t belong. We show that the
model can match performance by participants from two cultures: Americans
and the Mundurukú. We use ablation experiments on the model to provide evi-
dence as to what factors might help explain differences in performance by the
members of the two cultures.
Keywords: Qualitative representation, analogy, cognitive modeling, oddity task.

1 Introduction
A central problem in studying spatial cognition is representation. Understanding and
modeling the visual representations people construct for the world around them is a
difficult challenge for cognitive science. Dehaene and colleagues [7] made important
progress on this problem by designing a study which directly tests what features peo-
ple represent when they look at geometric figures in a visual scene. Their study util-
ized the Oddity Task methodology: participants were shown an array of six images
and asked to pick the image that did not belong (e.g., see Fig. 1). By varying the diag-
nostic spatial feature, i.e., the feature that distinguished one image from the other five,
they were able to test which features their participants were capable of representing
and comparing.
Dehaene and colleagues ran their study on multiple age groups within two popula-
tions: Americans and the Mundurukú, an indigenous group in South America. They
found that while the Americans performed better overall, the Mundurukú appeared to
be capable of encoding the same spatial features. The Mundurukú performed above
chance on nearly all of the 45 problems, and their pattern of errors correlated highly
with the American pattern of errors. Dehaene concluded from the results that many
spatial features are universal in human representation. However, several questions
remain: (1) What makes one problem harder than another? (2) Why is it that, despite
the high correlation between population groups, some problems seem especially hard
for Americans, while other problems seem especially hard for the Mundurukú? (3) To
what extent can questions 1) and 2) be answered in terms of the process of encoding
representations, versus the process of operating over those representations to solve
problems?


Fig. 1. Six example problems from the Oddity Task (labeled A-F)

This paper presents a cognitive model designed to explore these questions. Our
model is based upon two core claims about spatial cognition: (1) When people encode
a visual scene, they focus on the qualitative attributes and relations of the objects in the scene [11]. This provides them with a more abstract, more robust representation
than one filled with quantitative details about the scene. (2) People compare low-level
visual representations using the same mapping process used to perform abstract
analogies. Our model of comparison is based on Gentner’s [14] structure-mapping
theory of analogy.
Our model uses four components to simulate the oddity task from end-to-end. We
use a modified version of CogSketch¹ [13], a sketch understanding system, to auto-
matically construct qualitative representations of sketches and other two-dimensional
stimuli. We use the Structure-Mapping Engine (SME) [8], a computational model of
structure-mapping theory, to model comparison and similarity judgments. We use
two additional components based on structure-mapping theory: MAGI [9], which
models symmetry detection, and SEQL [18], which models analogical generalization.
Using this approach, we have modeled human performance on geometric analogy
problems [25] (problems of the form “A is to B as C is to …?”); a subset of the Ra-
ven’s Progressive Matrices [20], a visually-based intelligence test; and basic visual
comparison tasks [19,21]. However, the Dehaene task offers a unique opportunity in
that it was designed to isolate specific spatial features and check for their presence or
absence in one’s representation.
This paper presents our cognitive model of performance on the Oddity Task and
uses it to study factors that contribute to difficulty on the task. In comparing the
model with human results, we focus on two population groups: American children aged 8-13, and the full set of Mundurukú of all ages. We consider these groups because their overall performance on the 45 problems in the Dehaene study was comparable: 75% for the Americans and 67% for the Mundurukú. We provide evidence for what
might distinguish these groups from each other via ablation studies using the model.

¹ Publicly available at http://www.spatialintelligence.org/projects/cogsketch_index.html

We begin by briefly reviewing the components of our model. We then show how
these component models are combined in our overall model of the Oddity Task. We
analyze the results produced by running the model on the 45 problems from the origi-
nal study, and use ablation studies to explore possible explanations for performance
differences between the two groups. We close with a discussion of related and future
work.

2 Modeling Comparison Via Analogy


Our model of comparison is based on Gentner’s [14] structure-mapping theory of
analogy. According to structure mapping, people compare two objects by aligning the
common structure in their representations of the objects. Comparison is guided by a
systematicity bias; that is, people prefer mappings that place deeper structure with
more higher-order relations into correspondence. Structure-mapping has been used to
explain and predict a variety of psychological phenomena, including visual similarity
and differences [15,22]. Next we describe three computational models based on
structure-mapping, each of which is used in the present study.

2.1 Structure-Mapping Engine

The Structure-Mapping Engine (SME) [8,10] is a computational implementation of structure-mapping theory. It takes as input two cases, a base and a target. Each case is a symbolic representation consisting of entities, attributes of entities, and relations. There are both first-order relations between entities and higher-order relations between other relations. SME returns one to three mappings between the base and target. Each mapping has three components: a set of correspondences between elements in the base and elements in the target; a structural evaluation score, which estimates the degree of similarity between the cases; and a set of candidate inferences, inferences about the target supported by the mapping and unaligned structure in the base.
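For readers unfamiliar with structured cases, the toy Python example below illustrates what a case with entities, attributes, a first-order relation, and a higher-order relation might look like. It is purely illustrative and is not SME's actual input language or vocabulary.

# Purely illustrative case description; the predicate names and the nesting of
# one expression inside another mirror the kind of structure SME aligns, but
# this is not the system's real syntax.
base = {
    "entities": ["edge1", "edge2", "edge3"],
    "expressions": [
        ("straight", "edge1"),                               # attribute
        ("straight", "edge2"),                               # attribute
        ("perpendicular-corner", "edge1", "edge2"),           # first-order relation
        ("greater-length", "edge3", "edge1"),                 # first-order relation
        # higher-order relation: an expression whose arguments are other expressions
        ("and", ("perpendicular-corner", "edge1", "edge2"),
                ("greater-length", "edge3", "edge1")),
    ],
}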

2.2 MAGI

MAGI [9] is a model of symmetry detection based upon SME. Essentially, it identi-
fies symmetry in a representation by comparing the representation to itself, while
avoiding perfect self-matches. MAGI is important in modeling spatial cognition be-
cause it is often necessary to identify axes of symmetry in a visual scene, or in a spe-
cific object.

2.3 SEQL

SEQL [18] is a model of analogical generalization. SEQL is based upon the idea that
individuals learn generalizations for categories through a process of progressive
alignment [16], in which instances of a category are compared and the commonalities
are abstracted out as a direct result of the comparison. Given a set of cases, SEQL can
build one or more generalizations from them by comparing them via SME and elimi-
nating the structure that fails to align between cases, leaving only the structure that is
common across all the cases in the generalization. Because the generalization is in the

same form as individual case representations, new cases can be compared to the gen-
eralization to measure their similarity to a category.

3 Modeling Qualitative Representation Via CogSketch


One of our core claims is that people use qualitative spatial representations when
reasoning over or comparing images. While quantitative data, such as the exact sizes
of objects or the exact orientation of edges, may vary widely, even between images of
the same object, qualitative relations are much more consistent. For example, nearly
every face contains an eye to the right of another eye, with both eyes above a nose
and a mouth. The key to qualitative representation is to encode what Biederman calls
the nonaccidental properties [4]. These are the relations that are unlikely to have
occurred by accident in a sketch. For example, two lines chosen at random are
unlikely to have exactly the same orientation. Therefore, when two lines are parallel,
this is unlikely to have occurred by random chance, and so it is probably significant.
There is abundant evidence that people encode qualitative relations corresponding
to nonaccidental properties in visual scenes. For example, parallel lines are salient for
children as young as three [1]. Adults and infants can distinguish between concave
and convex shapes—a qualitative distinction [3], and humans have been shown to
have a preference for objects aligned with a vertical or horizontal axis, as opposed to
those with an arbitrary orientation [2]. Huttenlocher and colleagues [17] have shown
that when individuals memorize a point’s location in a circle, they pay special atten-
tion to which quadrant of the circle the point lies in, again a qualitative distinction.
While it is obviously the case that individuals are capable of encoding quantitative
information in addition to these qualitative relations, the qualitative relations appear
particularly well-suited to spatial problem-solving, as they can be easily encoded
symbolically and used to compare different scenes. Thus, in our present work we
explore the hypothesis that spatial tasks can be solved relying exclusively on qualita-
tive representations.
We see qualitative spatial representations as hierarchical (e.g., [24]). Each of the
shapes in an image can have its own attributes and relations. At the same time, each
of the edges that make up that shape can also have its own attributes and relations.
This gives rise to two representational foci: the shape representation and the edge
representation. A further claim we are evaluating with the current study is that these
two representational foci will never be used together. That is, a comparison or other
operation will always run on either an image’s shape representation or its edge repre-
sentation.

3.1 CogSketch

CogSketch [13] is a sketch understanding system based upon the nuSketch [12] archi-
tecture. Users sketch a series of glyphs, or objects in a sketch. CogSketch then
computes a number of qualitative spatial relations between the glyphs, building up a
structural representation of the sketch that corresponds to the shape representation.
CogSketch can also decompose a glyph into its component edges and construct a
representation of the qualitative relations between the glyph’s edges. This corresponds
to the edge representation.

Many of the spatial relations in the shape representation (e.g., relative position,
containment) are computed based on the relative position and topology of the glyphs.
However, some shape relations can only be computed by first decomposing a glyph
into its edges and constructing the glyph's edge representation. By comparing two glyphs' edge representations using SME, CogSketch can identify the corresponding edges in the two glyphs' shapes. These correspondences can be used to determine
whether the two glyphs are the same shape, and whether one glyph’s shape is a trans-
formation of the other (e.g., a rotation or a reflection). Furthermore, a glyph’s edge
representation can be compared to itself via MAGI to identify axes of symmetry.

Table 1. Qualitative vocabulary for the edge representation

Edge Attributes:
• Straight/Curved/Ellipse
• Axis-aligned (horizontal or vertical)
• Short/Med/Long (relative length)

Edge Relations:
• Concave/convex corner
• Perpendicular corner
• Edges-same-length corner
• Intersecting
• Parallel
• Perpendicular

3.2 Representing the Oddity Task Stimuli

In order to model the Oddity Task, we examined the Dehaene [7] stimuli and identi-
fied a set of qualitative attributes and relations that appeared to be important for solv-
ing the problems. All attributes and relations had to be among those that could be
computed automatically by CogSketch.

Table 2. Qualitative vocabulary for the shape representation

Shape Attributes:
• Closed shape
• Convex shape
• Circle shape
• Empty/Filled
• Axis (Symmetric, Vertical, and/or Horizontal)

Shape Relations:
• Right-of/Above (relative position)
• Containment
• Frame-of-Reference
• Shape-proximity-group
• Same-shape
• Rotation-between
• Reflection-between

Line-Line Relations:
• Intersecting
• Parallel
• Perpendicular

Line-Point Relations:
• Intersecting
• Colinear
• Centered-On

Table 1 summarizes qualitative attributes and relations for the edge representa-
tions. Many relations are based on corners between edges. The other relations can
only hold for edges that are not connected by a corner along the shape.
Table 2 summarizes attributes and relations for shapes. Empty/filled is a simplifica-
tion of shape color; it refers to whether the shape has any fill color. Frame-of-
Reference relations are used when a smaller shape is located inside a larger, symmetric
shape (i.e., a circle). The inner shape's location is described in terms of which quadrant of the larger shape it is located in; additionally, the inner shape may lie exactly along
the larger shape’s axes of symmetry. Shape-proximity-group refers to shapes grouped
together based on the Gestalt law of proximity [26]. Currently, grouping by proximity
is only implemented for circles.
Line/Line and Line/Point relations apply only to special shape types. Line/Line re-
lations are for shapes that are simple, straight lines (thus these relations are a subset of
the edge relations). Line/Point relations are for when a small circle lies near a line.
The centered-on relation applies when the circle lies at the center of the line. This
relation is essentially a special case of the frame-of-reference relation for a dot lying
at the center of a circle.
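As an illustration of the frame-of-reference idea, the sketch below (hypothetical Python, not CogSketch's code) classifies the location of a small inner shape relative to a circle's center into a quadrant, an axis, or the center itself.

def frame_of_reference(inner_xy, circle_center, tol=1e-6):
    """Return a qualitative location label for a point inside a circle,
    assuming y grows upward; names and tolerance are illustrative."""
    dx = inner_xy[0] - circle_center[0]
    dy = inner_xy[1] - circle_center[1]
    if abs(dx) < tol and abs(dy) < tol:
        return "center"
    if abs(dx) < tol:
        return "on-vertical-axis"
    if abs(dy) < tol:
        return "on-horizontal-axis"
    horiz = "right" if dx > 0 else "left"
    vert = "upper" if dy > 0 else "lower"
    return f"{vert}-{horiz}-quadrant"

print(frame_of_reference((2.0, 3.0), (0.0, 0.0)))   # -> "upper-right-quadrant"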
Axes of symmetry, same-shape, rotation-between, and reflection-between are all
computed by comparing shapes’ edge representations, as described above. Reflections
are classified as X-Axis-Reflections, Y-Axis-Reflections, and Other-Reflections.

4 Modeling the Oddity Task


Our approach to performing the Oddity Task is to identify what is common across the
images in an array by generalizing over their representations with SEQL. Individual
images can then be compared to the generalization using SME. If one image is no-
ticeably less similar to the generalization, then it must be the odd image out. Most of
the time (e.g., Problem B in Fig. 2), the odd image out lacks a qualitative feature that
is present in the other five images, in this case parallel lines. However, in some cases
(e.g., Problem C), the odd image out possesses an extra feature beyond those found in
the other images.

4.1 Theoretical Claims of Model

Our model of the oddity task is based on the following theoretical claims:
1) People encode qualitative, structural representations of visual scenes and use
these representations to perform visual tasks.
2) For a given problem, people will focus on a particular representational level
(either the shape level or the edge level) in solving that problem.
3) Qualitative spatial representations are compared via structure-mapping, as
implemented in SME.
4) People will identify the common features across a set of images via analogi-
cal generalization, as implemented in SEQL.
Note that these claims are general enough to apply to many spatial tasks. However,
they are not detailed enough to fully specify how any task would be completed. Thus,

it is necessary to make additional modeling assumptions in order to fill out a complete


computational model of the task.

4.2 Modeling the Process

Our model attempts to pick out the image that does not belong by performing a series
of Generalize/Compare trials. In each trial, the system constructs a generalization
from a subset of the images in the array (either the top three or the bottom three). This
generalization represents what is common across all of these images. For example,
consider the right-angled triangle problem (Fig. 1, Problem A). The generalization
built from the three top images will describe three connected edges, with two of the
edges being perpendicular. In the rightmost top image, the two perpendicular edges
form an edges-same-length-corner, but this relation will have been abstracted out
because it is not common to all three images.
The generalization is then compared to each of the other images in the array, using
SME. The model examines the similarity scores for the three images, looking for a
particular pattern of results: two of the images should be quite similar to the generali-
zation, while the third image, lacking a key feature, should be less similar. In this
case, the lower middle triangle will be less similar to the generalization because it
lacks a right angle.
Similarity is based on SME's structural evaluation score, but it must be normalized. There are two different ways to normalize it. First, similarity scores can be normalized based only on the size of the generalization (gen-normalized). This score measures how much
of the generalization is present in the image being compared. This measure is ideal for
noticing whether an image lacks some feature of the generalization.
Alternatively, similarity scores can be normalized based on both the size of the
generalization and the size of the image’s representation (fully-normalized). This
score measures both how much of the generalization is present in the image and how
much of the image is present in the generalization. While more complex than gen-
normalized scores, fully-normalized scores are necessary for noticing an odd image
out that possesses an extra qualitative feature that the other images lack. For exam-
ple, it allows the model to pick out the image with parallel lines from the other five
images without parallel lines (Fig. 1, Problem C).
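
The two normalization schemes can be made concrete with a short sketch. This is only an illustration, not the SME implementation: the exact normalizing denominators are an assumption, and raw_score, gen_size, and image_size are hypothetical stand-ins for SME's structural evaluation score and for the sizes (e.g., number of expressions) of the generalization and image representations.

```python
def gen_normalized(raw_score, gen_size):
    """Similarity normalized only by the size of the generalization.

    Measures how much of the generalization is present in the compared image;
    a low value suggests the image lacks a feature the other images share.
    """
    return raw_score / gen_size


def fully_normalized(raw_score, gen_size, image_size):
    """Similarity normalized by generalization size and image size.

    Also penalizes image content absent from the generalization, so an image
    with an extra qualitative feature receives a lower score. Dividing by the
    mean of the two sizes is one plausible choice, assumed here.
    """
    return raw_score / ((gen_size + image_size) / 2.0)
```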

4.3 Controlling the Processing

In each Generalize/Compare trial, the model must make three choices. The first is
which subset of the images to generalize over (either the top three images or the
bottom three). The second is whether to use gen-normalized or fully-normalized
similarity scores. The third is whether to use edge representations or shape representa-
tions—recall that we are predicting that edge representations and shape representa-
tions will never be combined in a single comparison.
These choices are made via the following simple control mechanism: (1) To en-
sure that the results are not dependent on the order of the images in the array, trial
runs are attempted in pairs, one based on generalizing from the top three images and
one based on generalizing from the bottom three images. (2) Because the gen-
normalized similarity score is simpler, it is always attempted first. (3) The model
chooses whether to use edge or shape representations based on the makeup of the first
image. If the image contains multiple shapes, or if the image contains an elliptical
shape consisting of only a single edge (e.g., a circle), then a shape representation is
used. Otherwise, an edge representation is used. Note, however, that an edge repre-
sentation will be quickly abandoned if it is impossible to find a good generalization
across images, as indicated by different images having different numbers of edges.
After the initial pair of trials is run, the model looks for a sufficient candidate. Re-
call that each Generalize/Compare run produces three similarity scores for the three
images that have been compared to the generalization. A sufficient candidate is cho-
sen when the lowest-scoring image has a similarity score noticeably lower than the
other two (< 95% of the second lowest-scoring image), meaning the image is noticea-
bly less similar to the generalization.
In cases where a sufficient candidate is not found, the model will attempt addi-
tional trials. (1) If the model was previously run using edge representations, it will try
using shape representations. (2) The model will try using a fully-normalized similar-
ity score, to see if the odd image out possesses an extra feature. At this point, if no
sufficient candidate has been identified, the model gives up (this is the equivalent of a
person guessing randomly, but we do not allow the model to make such guesses).
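
A compact sketch of this control loop is given below. It is an illustration under stated assumptions, not the authors' code: generalize(images, level) and similarity(generalization, image, norm) are hypothetical stand-ins for SEQL and for normalized SME scores, the edge-to-shape representation fallback is omitted for brevity, and the 0.95 threshold follows the description above.

```python
def sufficient_candidate(scores, ratio=0.95):
    """Index of the odd image out if the lowest similarity score is
    noticeably lower than the second lowest (< ratio of it), else None."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    lowest, second = scores[order[0]], scores[order[1]]
    return order[0] if lowest < ratio * second else None


def solve_oddity(top_row, bottom_row, level):
    """One Generalize/Compare pass: gen-normalized scores are tried first,
    and trials run in pairs (generalize from top, compare bottom, and vice
    versa) so the result does not depend on image order."""
    for norm in ("gen", "full"):
        for source, targets in ((top_row, bottom_row), (bottom_row, top_row)):
            generalization = generalize(source, level)          # hypothetical SEQL call
            scores = [similarity(generalization, image, norm)   # hypothetical SME call
                      for image in targets]
            answer = sufficient_candidate(scores)
            if answer is not None:
                return targets[answer]
    return None  # the equivalent of giving up (no random guess is made)
```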

5 Simulation
We evaluated our model by running it on the 45 problems from the Dehaene [7]
study. The original stimuli, in the form of PowerPoint slides, were copied and pasted
into CogSketch, which automatically converted each PowerPoint shape into a glyph.
Four of the 45 problems were touched up in PowerPoint to ease the transition—lines
or polygons that had been drawn as separate parts and then grouped together were
redrawn as a single shape. Five additional problems were modified after being pasted
into CogSketch. In all five cases, we removed simple edges which had been added to
the images of the problem to help illustrate an angle or reflection to which partici-
pants were meant to attend. Because the model was unable to understand the informa-
tion these lines were meant to convey, they would have served only as distracters.
Aside from the changes to these nine problems, no changes were made to the stimuli
which had been run on human participants.
In analyzing the results, we consider first the model’s overall accuracy, including
the correlation between its performance and that of both the American participants
and the Mundurukú participants. We then use the model to identify four factors that
could contribute to problem difficulty. We examine the correlation between these
factors and human performance on the subset of problems that are correctly solved by
the model.

5.1 Model Accuracy

Our model correctly solves 39/45 problems. Note that chance performance would be
7.5/45. Furthermore, there is a strong correlation between the model’s performance and
the performance of the human participants. Table 3 shows the Pearson correlation coef-
ficient between the model and each of the human populations. As the table shows, the
model correlates better with the American participants. However, there is also a high
correlation with the Mundurukú participants. The coefficient of determination, which is
computed by squaring the correlation coefficient, indicates the percentage of the vari-
ance in one variable which is accounted for by another. In this case, the coefficient of
determination between the model and the Mundurukú participants is (.493² = .243),
meaning the model accounts for about ¼ of the variance in the performance of the
Mundurukú participants.
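
The correlation statistics themselves are straightforward to reproduce; a sketch using SciPy follows, where the per-problem accuracy vectors are placeholder values, not the study's data.

```python
import numpy as np
from scipy.stats import pearsonr

# Placeholder per-problem accuracies (one entry per problem; NOT the real data).
model_acc = np.array([1.0, 1.0, 0.0, 1.0, 1.0, 0.0])
munduruku_acc = np.array([0.9, 0.7, 0.2, 0.8, 0.6, 0.3])

r, _ = pearsonr(model_acc, munduruku_acc)
print(f"r = {r:.3f}, r^2 = {r ** 2:.3f}")  # r^2: coefficient of determination
```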

Table 3. Correlations between the model and the American and Mundurukú participants

             Americans   Mundurukú
Model        .656        .493
Americans    *           .758
Mundurukú    .758        *

Fig. 2 plots the performance of the two populations and the model. As the figure
shows, the six problems on which the model fails are among the hardest for both
populations. The one clear exception is problem 21 (see Fig. 3). Although the model
fails on this problem, the Mundurukú performed quite well on it (86% accuracy).

Fig. 2. Performance of Americans, Mundurukú, and our model on the Oddity Task

Discussion. Fig. 3 shows the six problems which our model fails to solve. As the
percentages show, these problems were for the most part quite difficult for both the
Americans and the Mundurukú, with performance on some problems little or no
higher than chance (17%).
21 (55% / 88%) (4) 22 (13% / 48%) (5)

34 (37% / 18%) (6) 38 (60% / 23%) (6)

39 (24% / 20%) (1) 44 (31% / 23%) (4)

Fig. 3. The six problems the model fails to solve. Above each problem the average accuracy
for the Americans and the Mundurukú are listed, respectively, followed by the number of the
correct answer.

Overall, these six problems can be roughly broken down into three categories
based on what is required to solve them. First, problem 22 requires encoding whether
the dot lies along the axes of the asymmetric quadrilateral. Our model simply does not
encode this relation—nor, it appears, do Americans, as they actually fall below
chance on this problem. Interestingly, the Mundurukú are well above chance; at this
time, it is difficult to say why they are better at solving this problem.
Problems 38 and 44 both require identifying a rotation between shapes found in different images. Our model only looks for rotations between shapes within a single
image. As the percentages show, the participants, and particularly the Mundurukú,
had difficulty solving these problems. We believe the most likely reason is that it did
not occur to them to look for rotations between shapes in different images.
Problems 21, 34, and 39 all appear to require encoding a quantitative relation be-
tween shapes: a percentage distance along an edge for 21, a number of degrees of
rotation for 34, and a ratio between two shapes’ sizes for 39. The fact that participants
had so much trouble with these problems supports our prediction that individuals
primarily encode and compare qualitative spatial features. The one exception here
was problem 21, which was reasonably difficult for the Americans but actually quite
easy for the Mundurukú. As with problem 22, it is difficult to say why the Mundurukú
performed so well on this problem. It may be that they are better at dividing a space
(either a line or a quadrilateral) into smaller parts and qualitatively encoding which of
those smaller parts a dot lies along.

5.2 Modeling Problem Difficulty

We analyzed problem difficulty on the 39 problems that the model correctly solves.
We used the model to identify four factors that could contribute to difficulty. For this
paper, we focus on factors related to encoding the stimuli. The factors are:
(1) Shape Comparison. Some problems (e.g., Fig. 1, Problem D) require construct-
ing edge representations of two shapes and comparing them in order to identify a
relation between the shapes (e.g., a rotation or a reflection). This may be difficult
because it involves switching between the edge and shape representations, and be-
cause it requires conducting an additional comparison with SME before one begins
comparing the six images.
(2) Shape Symmetry: Some problems (e.g., Fig. 1, Problem E) require comparing a
shape’s edge representation to itself, via MAGI, in order to identify an axis of sym-
metry. This could be difficult for similar reasons.
(3) Shape Decomposition: Several problems (e.g., Fig. 1, Problem A) require de-
composing shapes into edges in order to represent each image at the edge representa-
tion level. It is possible that this will be difficult for individuals because there may be
a temptation to consider closed shapes only at the shape representation level.
(4) Shape Grouping: A couple of problems (e.g., Fig. 1, Problem F) require grouping
shapes together based on the Gestalt rule of proximity. Normally, one would assume
this was easy, but preliminary analysis indicated it might be difficult for the Mundu-
rukú participants.
We used the model to produce a measure for each difficulty factor on each problem
via ablation; for example, we ran the model with the ability to conduct shape com-
parisons turned off in order to identify the problems on which shape comparisons
were required. We then attempted to find a difficulty function, based on the four fac-
tors, which correlated highly with each of the human populations. This was done by
performing an exhaustive search over all possible linear weights for the four factors in
the range of 0 to 15.
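
A minimal sketch of this weight search is shown below, assuming the four factors have been reduced (via the ablation runs) to a binary matrix with one row per problem, and assuming difficulty is scored against per-problem human error rates (1 − accuracy); these inputs and the use of Pearson correlation as the objective are illustrative assumptions, not the authors' code.

```python
import itertools
import numpy as np
from scipy.stats import pearsonr


def best_difficulty_weights(factors, human_error, weight_range=range(16)):
    """Exhaustive search over integer factor weights in [0, 15].

    factors:     (n_problems, 4) array; 1 if a problem requires the factor.
    human_error: (n_problems,) array of per-problem error rates.
    Returns the weight tuple maximizing the correlation with human_error.
    """
    best_r, best_w = -np.inf, None
    for w in itertools.product(weight_range, repeat=factors.shape[1]):
        difficulty = factors @ np.array(w, dtype=float)
        if np.std(difficulty) == 0:      # constant scores: correlation undefined
            continue
        r, _ = pearsonr(difficulty, human_error)
        if r > best_r:
            best_r, best_w = r, w
    return best_w, best_r
```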
Results. The optimal difficulty function for the American participants is shown in
Table 4 (the weight for each factor is normalized based on the size of the largest
weight). In addition to the weight of each factor, the table shows the individual
contribution of each factor to the correlation between the function and human per-
formance. This was computed by removing a factor from the difficulty function and
considering the drop in the function’s correlation with the human population.
As Table 4 shows, the difficulty function had an overall correlation of .667 with
the American participants. This means that the function explains (.667² = 44%) of the
variance in human performance on the 39 problems. Most of the contribution to this
correlation comes from shape comparison and shape symmetry. It appears that the
American participants had a great deal of difficulty with problems that required de-
composing shapes into edges and comparing the edge representations to identify rela-
tions between shapes, or symmetry within a single shape. Shape decomposition also
contributed to the correlation, suggesting that the participants had some difficulty
with the problems requiring focusing on the edge representations of closed shapes.

Table 4. Relative contribution of factors to our difficulty function for American performance

Factor                 Weight in Function   Contribution to Correlation
Shape Comparison       .69                  .163
Shape Symmetry         1.0                  .267
Shape Decomposition    .38                  .062
Shape Grouping         .08                  .001
Overall                ---                  .667

The optimal difficulty function for the Mundurukú participants is shown in Table 5.
This difficulty function had a correlation of .637 with the human data, indicating it ac-
counts for (.637² = 41%) of the variance in the Mundurukú performance. By far, the
most important factor was shape comparison. The other contributing factor was shape
grouping, suggesting that the Mundurukú participants might have some difficulty with
problems requiring grouping elements together based on proximity. This is surprising,
as Gestalt grouping is generally thought to be a basic, low-level operation. Note that the
Mundurukú participants had no trouble with problems requiring estimating relative
distances, as indicated by their high performance on problem 21 (Fig. 3).

Table 5. Relative contribution of factors to our difficulty function for Mundurukú performance

Factor                 Weight in Function   Contribution to Correlation
Shape Comparison       1.0                  .393
Shape Symmetry         .29                  .018
Shape Decomposition    .14                  .009
Shape Grouping         .71                  .081
Overall                ---                  .637
Table 6 shows the correlation between each difficulty function and each population
group. As expected, each difficulty function correlates far better with the population
group for which it was built. The fact that there is still a relatively high correlation
between the American function and the Mundurukú performance, and between
the Mundurukú function and the American performance, most likely results from the
fact that both groups have a great deal of trouble with problems requiring shape
comparison.

Table 6. Correlations between difficulty function and population

Difficulty Function    American Participants   Mundurukú Participants
American Function      .667                    .402
Mundurukú Function     .427                    .637

Discussion. One of our original goals was to use the model to identify differences
between the two populations. Our two difficulty functions appear to have accom-
plished this. The difficulty function for American participants suggests that they tend
to encode images holistically. They tend to have trouble when a problem requires
breaking a shape down into its edge representation. This may be because the aca-
demic training in basic shapes encourages Americans to look at shapes as a whole,
rather than explicitly considering the individual edges that make up a shape. The
Mundurukú participants, in contrast, appear to encode stimuli more analytically. They
are better able to consider shapes in terms of their component edges; most noticeably,
they are better at using a shape’s edges to identify axes of symmetry. However, they
had difficulty seeing groups of shapes holistically in this task.

6 Related Work
Several AI systems have been constructed to explore visual analogy. Croft and Tha-
gard’s DIVA [5] uses a 3D scene graph representation from computer graphics as a
model of mental imagery. That is, the system “watches” animation in the computer
graphics system in order to perceive its mental imagery. Analogy is carried out via a
connectionist network over the hierarchical structure of the scene graph. DIVA’s
initial inputs, unlike ours, are generated by hand. Their background knowledge is also
hand-generated specifically for their simulation, unlike our use of the same knowl-
edge base across many simulation systems and experiments. DIVA has only been
tested on a handful of examples, and to the best of our knowledge, has not been used
to model specific psychological findings. Davies and Goel’s Galatea [6] uses a small
vocabulary of primitive visual elements (line, circle, box) plus a set of visual transformations over them (e.g., move, decompose) to describe base and target descriptions,
and uses a copy/substitution algorithm to model analogy, carrying sequences of
transformations from one description to the other. All of Galatea’s inputs are hand-
generated, as is its background knowledge, and it has only been tested on a few
examples. Mitchell and Hofstadter's Copycat [23] modeled analogy as an aspect of high-level perception, using comparisons between letter strings as the domain. Copy-
cat was domain-specific, and even the potential correspondences between items were
hand-coded (the slipnet), making it less flexible than SME, which is domain-
independent.

7 Discussion
We have described a model of the Oddity task, using CogSketch to automatically
encode stimuli in terms of qualitative spatial representations, MAGI to detect symme-
try, and SME and SEQL to carry out the task itself. We showed that this combination
of modules can achieve behavior comparable to the participants in Dehaene et al.'s
study of American and Mundurukú performance on the same stimuli. Furthermore,
we were able to provide some evidence about possible causes for performance differ-
ences between the groups, through statistical analysis of ablation experiments on the
model.
We find these results quite exciting on their own, but they are also part of a larger
pattern. That is, similar combinations of qualitative representations and analogical
processing have already been used to model a variety of visual processing tasks
[19,20,25]. This study lends further evidence for our larger hypotheses, that (1) quali-
tative attributes and relations are central to human visual encoding and (2) people
compare low-level visual representations using the same mapping process they use for
abstract analogies. The study also lends support to the proposal that (3) comparison
operations are performed using either a shape representational focus or an edge repre-
sentational focus.
We plan to pursue two lines of investigation in future work. First, this paper fo-
cused on difficulties related to encoding. Our model suggests difficulties involving
comparisons may also be implicated. For example, a problem might be harder because
the six images in the array are less similar, making alignment and generalization pro-
duction more difficult. We plan to explore how well aspects of the comparison process
can explain the variance. Of particular interest are whether their contributions are uni-
versal, or whether there will be cultural differences. Second, we plan on using these
analyses to construct more detailed models of specific groups performing this task (i.e.,
children and adults, as well as both cultures). Comparing these models to each other,
and to models of similar spatial tasks, could help identify general processing con-
straints on such tasks. This may shed light on how universal human spatial representa-
tions and reasoning are, both across cultures and across tasks.

Acknowledgements
This work was supported by NSF SLC Grant SBE-0541957, the Spatial Intelligence
and Learning Center (SILC). We thank Elizabeth Spelke for providing the original
oddity task stimuli.
References
1. Abravanel, E.: The Figure Simplicity of Parallel Lines. Child Development 48(2), 708–710
(1977)
2. Appelle, S.: Perception and Discrimination as a Function of Stimulus Orientation: The
Oblique Effect in Man and Animal. Psychological Bulletin 78, 266–278 (1972)
3. Bhatt, R., Hayden, A., Reed, A., Bertin, E., Joseph, J.: Infants’ Perception of Information
along Object Boundaries: Concavities versus Convexities. Experimental Child Psychol-
ogy 94, 91–113 (2006)
4. Biederman, I.: Recognition-by-Components: A Theory of Human Image Understanding.
Psychological Review 94, 115–147 (1987)
5. Croft, D., Thagard, P.: Dynamic Imagery: A Computational Model of Motion and Visual
Analogy. In: Magnani, L., Nersessian, N. (eds.) Model-based Reasoning: Science, Tech-
nology, Values, pp. 259–274. Kluwer/Plenum (2002)
6. Davies, J., Goel, A.K.: Visual Analogy in Problem Solving. In: Proceedings of the Interna-
tional Joint Conference on Artificial Intelligence, pp. 377–382 (2001)
7. Dehaene, S., Izard, V., Pica, P., Spelke, E.: Core Knowledge of Geometry in an Amazo-
nian Indigene Group. Science 311, 381–384 (2006)
8. Falkenhainer, B., Forbus, K., Gentner, D.: The Structure-Mapping Engine. In: Proceedings
of the Fifth National Conference on Artificial Intelligence (1986)
9. Ferguson, R.W.: MAGI: Analogy-Based Encoding Using Regularity and Symmetry. In:
Proceedings of the 16th Annual Conference of the Cognitive Science Society, pp. 283–288
(1994)
10. Forbus, K., Oblinger, D.: Making SME Greedy and Pragmatic. In: Proceedings of the
Cognitive Science Society (1990)
11. Forbus, K., Ferguson, R., Usher, J.: Towards a Computational Model of Sketching. In:
Proceedings of the 2001 Conference on Intelligent User Interfaces (IUI-2001) (2001)
12. Forbus, K., Lockwood, K., Klenk, M., Tomai, E., Usher, J.: Open-Domain Sketch Under-
standing: The nuSketch Approach. In: AAAI Fall Symposium on Making Pen-based Inter-
action Intelligent and Natural (2004)
13. Forbus, K., Usher, J., Lovett, A., Wetzel, J.: CogSketch: Open-Domain Sketch Under-
standing for Cognitive Science Research and for Education. In: Proceedings of the Euro-
graphics Workshop on Sketch-Based Interfaces and Modeling (2008)
14. Gentner, D.: Structure-Mapping: A Theoretical Framework for Analogy. Cognitive Sci-
ence 7(2), 155–170 (1983)
15. Gentner, D., Markman, A.B.: Structure Mapping in Analogy and Similarity. American
Psychologist 52, 42–56 (1997)
16. Gentner, D., Loewenstein, J.: Relational Language and Relational Thought. In: Amsel, E.,
Byrnes, J.P. (eds.) Language, Literacy, and Cognitive Development: The Development and
Consequences of Symbolic Communication. Lawrence Erlbaum Associates, Mahwah
(2002)
17. Huttenlocher, J., Hedges, L.V., Duncan, S.: Categories and Particulars: Prototype Effects
in Estimating Location. Psychological Review 98(3), 352–376 (1991)
18. Kuehne, S., Forbus, K., Gentner, D., Quinn, B.: SEQL: Category Learning as Progressive
Abstraction Using Structure Mapping. In: Proceedings of the 22nd Annual Meeting of the
Cognitive Science Society (2000)
19. Lovett, A., Gentner, D., Forbus, K.: Simulating Time-Course Phenomena in Perceptual
Similarity via Incremental Encoding. In: Proceedings of the 28th Annual Meeting of the
Cognitive Science Society (2006)
20. Lovett, A., Forbus, K., Usher, J.: Analogy with Qualitative Spatial Representations Can
Simulate Solving Raven’s Progressive Matrices. In: Proceedings of the 29th Annual Con-
ference of the Cognitive Science Society (2007)
21. Lovett, A., Sagi, E., Gentner, D.: Analogy as a Mechanism for Comparison. In: Proceed-
ings of Analogies: Integrating Multiple Cognitive Abilities (2007)
22. Markman, A.B., Gentner, D.: Commonalities and Differences in Similarity Comparisons.
Memory & Cognition 24(2), 235–249 (1996)
23. Mitchell, M.: Analogy-making as Perception: A Computer Model. MIT Press, Cambridge
(1993)
24. Palmer, S.E.: Hierarchical Structure in Perceptual Representation. Cognitive Psychol-
ogy 9(4), 441–474 (1977)
25. Tomai, E., Lovett, A., Forbus, K., Usher, J.: A Structure Mapping Model for Solving
Geometric Analogy Problems. In: Proceedings of the 27th Annual Conference of the Cog-
nitive Science Society (2005)
26. Wertheimer, M.: Gestalt Theory. In: Ellis, W.D. (ed.) A Sourcebook of Gestalt Psychol-
ogy, pp. 1–11. The Humanities Press, New York (1924/1950)
Modelling Scenes Using the Activity within
Them

Hannah M. Dee, Roberto Fraile, David C. Hogg, and Anthony G. Cohn

School of Computing,
University of Leeds,
Leeds LS2 9JT, United Kingdom
{hannah,rf,dch,agc}@comp.leeds.ac.uk

Abstract. This paper describes a method for building visual “maps”


from video data using quantized descriptions of motion. This enables
unsupervised classification of scene regions based upon the motion pat-
terns observed within them. Our aim is to recognise generic places using
a qualitative representation of the spatial layout of regions with com-
mon motion patterns. Such places are characterised by the distribution
of these motion patterns as opposed to static appearance patterns, and
could include locations such as train platforms, bus stops, and park
benches. Motion descriptions are obtained by tracking image features
over a temporal window, and are then subjected to normalisation and
thresholding to provide a quantized representation of that feature’s gross
motion. Input video is quantized spatially into N × N pixel blocks, and
a histogram of the frequency of occurrence of each vector is then built
for each of these small areas of scene. Within these we can therefore
characterise the dominant patterns of motion, and then group our spa-
tial regions based upon both proximity and local motion similarity to
define areas or regions with particular motion characteristics. Moving up
a level we then consider the relationship between the motion in adjacent
spatial areas, and can characterise the dominant patterns of motion ex-
pected in a particular part of the scene over time. The current paper
differs from previous work which has largely been based on the paths of
moving agents, and therefore restricted to scenes in which such paths are
identifiable. We demonstrate our method in three very different scenes:
an indoor room scenario with multiple chairs and unpredictable uncon-
strained motion, an underground station featuring regions where motion
is constrained (train tracks) and regions with complicated motion and
difficult occlusion relationships (platform), and an outdoor scene with
challenging camera motion and partially overlapping video streams.

Keywords: Learning, Spatial relations, Computer vision, Modeling be-


haviour, Qualitative reasoning.

1 Introduction and Motivation


The ability to reason about the things we see in video streams is influenced
by our ability to break down the spatial structure of such scenes into semanti-
cally meaningful regions. In our day-to-day talk about behaviour (“The chicken

crossed the road”, for example) we discuss regions (roads) which might be visu-
ally determined by clear kerb stones and line markings. However, these regions
could also be functionally determined: it is easy to imagine some dirt path which
has no clear visible boundaries, but which is still a road by virtue of the cars
driven along it regularly (much to the peril of chickens). In this sense, roads and
paths can be identified as much by typical patterns of motion as by physical
structures. There are certain things we can find out from motion patterns which
would be very difficult to discover through the analysis of static scene structures.
For example, whilst it is possible to imagine a hypothetical scene analysis system
that could identify roads and roundabouts from static images, determining what
side of the road people drive on or which way around the roundabout people
travel would require analysis of motion.
Within the field of Computer Vision there is a body of work concerning the
modelling of scene structure through tracking visible agents, and this work iden-
tifies such emergent, functional paths. In scenes with limited behavioural reper-
toires (Fernyhough et al. [7] call these “strongly stylised domains”) and in which
the behaviour of interest is detectable from trajectories alone, such systems work
well. In scenes where finer grained ideas of motion are of interest (such as around
chairs and benches, which we might be interested in detecting as the loci of sit-
ting and standing activities) trajectory based systems have difficulties. In areas
where behaviour is not as constrained (such as on a train platform, where paths
have little meaning) the trajectory based systems also have difficulties. Strong
occlusion is also a problem for trajectory based systems, and much work con-
siders the problem of maintaining tracks through occlusion. In this paper we
sidestep this difficult problem by using what we call “tracklets”, which are short
indicative bursts of motion, and by working at the level of image features rather
than tracked unitary objects.
The current paper makes two contributions: we apply feature based tracking
(as used in the activity modelling community, e.g. in [13]) to the problem of
modelling scene geography, and we do this within a qualitative framework to ex-
tract descriptions that can be used within Qualitative Spatial Reasoning (QSR)
systems. This allows us to label regions of unconstrained scenes, some of which
are difficult for computer vision systems to handle.

2 Related Work
Whilst there is a large literature on modelling spatial regions using a priori ideas
about space and motion, or previously crafted maps, the current paper falls in
the category of scene modelling from automated analysis of video. Work in scene
modelling has thus far concentrated on the analysis of gross patterns of motion,
such as the trajectories of tracked people (or other moving objects) or on optical
flow patterns.
Systems which work at the level of the entire trajectory are able to con-
struct models of the way in which agents move through the scene. Johnson and
Hogg in [10] create models of activity within an environment for prediction and classification of human behaviour by learning behaviour vectors describing
typical motion. Stauffer and Grimson [19] take trajectories and perform a hier-
archical clustering, which brings similar motion patterns together (activity near
a loading bay, pedestrians on a lawn). Makris and Ellis in e.g. [16] learn scene
models including entrances and exits, and paths through the scene, and use these
for tracker initialisation and anomaly detection; a similar approach is used in
[17]. The rich scene models obtained from trajectory modelling have been used
to inform either about the observed behaviour (e.g., typicality detection as in
[10]) or about the scene (e.g., using models of trajectory to determine the layout
of cameras in a multi camera system as in [11]), or to detect specific behaviour
patterns of interest (e.g., elderly people falling down [17]). These systems all rely
upon tracking moving objects through the scene and upon having entire tracks,
which means that they are susceptible to tracker errors (such as those caused by
occlusion). Because of the underlying reliance on background subtraction these
systems are also very susceptible to camera shake. In many of these systems
several hours of training data is required.
Xiang and Gong, in [21], use pixel change history combined with dynamic prob-
abilistic networks (specifically, various types of hidden Markov model (HMM)) to
learn temporal and causal relationships between observed image patterns. Their
work differs from ours in that they aim to detect and model events and the relation-
ships between them directly and statistically. We are interested in modelling the
spatial structure of a scene symbolically. A related approach centred more upon
scene modelling is that presented in [1], working directly from image data using a
forest of HMMs, and learning activity patterns (regions of scene in which there is
increased activity).
Activity discovery or recognition, however, generally works on a smaller scale,
dealing with features or patterns of motion rather than trajectories, and is con-
cerned with determining whether a particular video sequence contains an exam-
ple of a learned activity (running, jumping, or more fine grained activities such
as particular tennis shots). Efros et al., in [6], present early work on activity
modelling using flow where they determine motion type and pose for “30 pixel
man”, in which database data is labelled and then matched to input video us-
ing normalised optical flow. Laptev, in [13], describes his “Space-Time Interest
Points”, which are spatio-temporal features developed from the 2D spatial-only
Harris interest points, and with Pérez [14] more recently extends this to deal with
the classification of even finer grained actions (smoking and drinking). Dalal et
al. in [5] use similar techniques (based upon histograms of gradients) for the
detection of humans in video; their trained detector uses both appearance and
motion cues so can be seen as using activity to aid detection.
Gryn et al. in [8] use hand-crafted direction maps to detect particular patterns
of interest (using a particular door, or making an illegal left turn at an intersec-
tion). These direction maps are regularly spaced vector fields representing the di-
rection of motion at locations of interest, and are scene specific, detecting motion
in a particular image plane location. Colombo et al. in [4] take a different tack,
modelling regular scene changes on a smaller temporal scale (escalator movement, cyclic advertising hoardings) as part of the background model using Markov models – modelling scene motion in order to be able to ignore it.
Our work is related to many of these approaches: we learn histograms from
feature data, and use these to build models of activity within the scene. The
method characterises scene projections by patterns of accumulated motion over
an extended period as opposed to short-term motion patterns used in most earlier
work (e.g. motion history templates). The aim of this paper is to show that these
techniques can be used to move towards qualitative scene representations, which
will facilitate qualitative reasoning about scene activities.

3 Feature Detection and Tracking


The “KLT” tracker is suggested in [18] building upon work from [15,20] and is
based upon the insight that feature selection for tracking should be guided by
the tracker itself: the features we should be tracking are precisely those features
which are easy to track. Whilst in general the KLT feature tracker is good
at following features across the image, there is a trade-off between longevity
and reliability. This trade-off provides a practical bound on the length of our
descriptors, or tracklets: by reinitialising the tracker every M frames (in the
current implementation, M = 50) we have tracks which are reliable but long
enough to provide a descriptive representation of feature motion. These M frame
tracks are then split into two, which gives us a pair of temporally linked short
(yet reliable and descriptive) tracklets which with M = 50 last one second each.
We believe these tracklets comprise a promising representational level, but in
the current work we do not exploit this fully, and consider just the displacement

Fig. 1. A frame of video showing two sets of tracklets: most recent (just completed)
in blue; previous in green. These give a robust indication of motion in the image plane
without committing to any object segmentation or scale.
between start and end points using the tracklet as a robust means of getting
to this. In order to do this, we look at the gross motion within each tracklet
thresholding on angle from the vertical θ and distance travelled d between first
and last points. This descriptor is one of up, up-right, right, down right . . . or still.
Tracklets are classified as still if their total movement d is below a threshold α:
in the current implementation α = 2 pixels, which we find allows for considerable
camera shake whilst still detecting most significant motion. This calculation is
set out in Table 1.
This directional quantization is similar to the system described in [8], although
they work with optical flow rather than tracked features and match their motion
descriptors to hand-crafted templates.

Table 1. An illustration of the direction calculations with associated thresholds. θ


is the angle between start and end points of the tracklet; d is the total displacement
between start and end points (in pixels). α has been set to two pixels in the current
implementation.

Label        Short label   Classification Criteria
Still        S             d < α
Up           U             −π/8 < θ < π/8
Up-right     UR            π/8 < θ ≤ 3π/8
Right        R             3π/8 < θ ≤ 5π/8
Down-right   DR            5π/8 < θ ≤ 7π/8
...          ...           ...
Up-left      UL            −π/8 ≥ θ > −3π/8
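
A sketch of this quantization is given below. The angle is measured from the vertical, and the eight π/4-wide sectors and the α = 2 pixel stillness threshold follow Table 1, but the code is an illustration rather than the authors' implementation, and the image-coordinate convention (y increasing downwards) is an assumption.

```python
import math

ALPHA = 2.0  # pixels; smaller displacements are classed as "still"
# Eight direction labels in clockwise order, starting from "up".
LABELS = ["U", "UR", "R", "DR", "D", "DL", "L", "UL"]


def classify_tracklet(start, end, alpha=ALPHA):
    """Quantize a tracklet's gross motion into one of eight directions or 'S'.

    start, end: (x, y) points in image coordinates (y increases downwards).
    """
    dx = end[0] - start[0]
    dy_up = start[1] - end[1]            # flip so that "up" is positive
    if math.hypot(dx, dy_up) < alpha:
        return "S"
    theta = math.atan2(dx, dy_up)        # angle from vertical, clockwise positive
    # Shift by pi/8 so that each pi/4-wide sector maps onto one label index.
    index = int(math.floor((theta + math.pi / 8) / (math.pi / 4))) % 8
    return LABELS[index]
```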

4 Spatial Quantization: Histograms of Features


In order to characterise patterns of motion across the scene, we collect frequency
counts for each of these directional quanta in different image regions. We do this
by dividing the scene into N × N pixel bins and by accumulating a histogram
of each directional relation in each bin based upon the start of the tracklet.
In the current implementation, N is 16 - this works well in the scenes under
consideration allowing us to avoid a large proportion of empty bins, yet generate
regions detailed enough to capture the structure of the scene. A figure illustrating
the types of histogram we observe is shown in Figure 2. These histograms are
learned through observation over a period of time: the scenes in our experiments
are all over 10 minutes long, with the longest being about 30 minutes.
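
Accumulating the per-cell histograms then takes only a few lines; the sketch below uses the N = 16 pixel grid from the text and the classify_tracklet function sketched in the previous section, and bins each tracklet by its start point as described above.

```python
from collections import Counter, defaultdict

N = 16  # grid cell size in pixels


def accumulate_histograms(tracklets, cell_size=N):
    """Direction histogram for each cell of the image grid.

    tracklets: iterable of (start, end) point pairs.
    Returns a dict mapping (cell_x, cell_y) -> Counter of direction labels.
    """
    histograms = defaultdict(Counter)
    for start, end in tracklets:
        cell = (int(start[0]) // cell_size, int(start[1]) // cell_size)
        histograms[cell][classify_tracklet(start, end)] += 1
    return histograms
```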
We have applied these processes (feature tracking and then spatial histogram-
ming) to three video datasets. The datasets are: a 30 minute video taken in a
university common room, featuring chairs in which people occasionally sit and
Fig. 2. A screenshot from the chair dataset with grid overlayed, showing histograms
calculated from different scene cells. Cell A near the top of the door does not see much
movement, and the movement that is observed is R and L corresponding to the opening
and closing motion of the door. Cell B is at a medium height on the wall behind the
chairs and sees motion both to the left and the right due to people moving backwards
and forwards behind the row of chairs. C, in the door region, has a major peak in its
histogram corresponding to motion to the left, due to people opening the door and
going out through it, and a less pronounced peak at R, presumably corresponding to
the door closing again.

drink tea or coffee; a 30 minute video from the UK Home Office “i-LIDS” (Im-
agery library for intelligent detection systems [9]) dataset of an underground
station, including platform, train track region and a bench where passengers oc-
casionally sit and wait for trains; and a 14 minute video of a busy roundabout
intersection, taken from the top of a 20 metre pole using an experimental 2
camera setup (containing considerable camera shake as a result). We have not
attempted to correct any issues with these datasets by pre-processing. These will
be called the chair, i-LIDS and roundabout scenes. Figure 3 shows the gener-
ated histogram information presented as a bitmap for each scene and for each
direction.

5 Dominant Patterns of Motion


Having calculated the motion histograms for each cell of the input video, the
next stage is to use these to segment the visual scene into regions characterised
by similar patterns of motion. This section describes two methods for achieving
this: the first uses one direction alone (the significant direction) as a basis for
clustering, and the second uses unsupervised learning techniques (K-means) to
determine which prototypical direction histograms best partition the space.
Chair   i-LIDS   Roundabout
Thumbnails of input videos. From left to right: Chair, within a busy room, i-Lids underground station dataset, Roundabout featuring experimental multi camera setup and extraneous metadata.
With the frequency maps associated with “still”, you see a representation of where features are generally found.
Up: here we see the chairs in the chair scene, movement on the platform and the bench underground, and an artifact of the split screen in the roundabout scene.
Up left: here we see the chairs in the chair scene, movement around the bench in the underground scene and the exit to the roundabout at the left of the image.
Left: we see some movement behind the chairs, some movement in the train area of the underground scene, and the near-ground side of the roundabout.
Down left: here we see movement associated with the chairs, the platform edge (this is an artifact of the feature tracker), and the relevant portion of the roundabout.
Down: here we see movement associated with the chairs (again), the platform of the tube scene, and a small amount of movement in the far ground of the roundabout scene.
Down right: here we see the chairs, the far part of the platform and the bench, and the far entrance onto the roundabout.
Right: here we see the area behind the chairs, the far platform of the tube scene, and the upper part of the roundabout (farthest from the camera).
Up right: here we see the chairs in the chair scene, the edge of the platform in the tube scenario (see Up left), and the right entrance of the roundabout scene.

Fig. 3. Histogrammed direction data (one row per bin) showing evident patterns of motion within each input scene
5.1 Using Dominant Direction to Categorise Regions


The simplest means of categorising motion within a square is just to consider
each direction independently: essentially treating each directional histogram bin
as a separate “image”. A small amount of standard image pre-processing is
carried out on each channel (normalisation to scale each direction so frequency
counts fall in the range [0, 1], median filtering to smooth, and morphological
opening to create coherent regions). The results are then thresholded at the
median value for that direction. This results in a binary image for each direction
showing those parts of the scene where the amount of motion in that direction
is significant.
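
A sketch of this per-direction pipeline might look as follows, assuming the frequency counts for one direction have been rasterized into a 2-D array with one value per grid cell; the normalisation, median filtering, morphological opening, and median thresholding follow the description above, while the filter sizes are illustrative choices.

```python
import numpy as np
from scipy import ndimage


def direction_mask(freq):
    """Binary map of cells where motion in one direction is significant.

    freq: 2-D array of frequency counts for a single direction label.
    """
    norm = freq / freq.max() if freq.max() > 0 else freq.astype(float)
    smoothed = ndimage.median_filter(norm, size=3)          # smooth
    opened = ndimage.grey_opening(smoothed, size=(3, 3))    # coherent regions
    return opened > np.median(opened)                       # threshold at the median
```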
To create a single segmentation based upon principal direction of motion, we
use Markov Random Fields (MRFs) [2,3,12] in the place of simple thresholding.
MRFs provide a graph-based formalism for segmentation through energy min-
imisation: in this case we define energy as a function of both the input frequency
histogram (the data term) and the cost of labelling adjacent squares differently
(the smoothness term).

C(f) = Cdata(f) + Csmooth(f)                                    (1)

We use a smoothness term which penalises adjacent labels which are different
and does not penalise adjacent labels which are the same (thus encouraging
uniform regions). We have a smaller penalty for labels which are “one out”,
which has the effect of lowering the penalty for adjacent regions with adjacent
directions (right and up-right, for example). This can be thought of as decreasing
the penalty term for labels which are conceptual neighbours as well as physical
neighbours. Equation 2 provides details of the smoothness term for two adjacent
squares i and j; k is a constant set in these experiments to be 0.5.

Csmooth(i, j) = k      if i and j have different labels                    (2)
Csmooth(i, j) = k/2    if i and j are conceptual neighbours
Csmooth(i, j) = 0      if i and j have the same label

The advantage of the MRF framework is that it provides a means of creating


smooth segmentations which preserve sharp boundaries where they exist; the
main disadvantage when used in this context is that it does not allow multiple
labels per element. Figure 4 shows the binary images generated by thresholding
for each direction for each scene, and the MRF generated joint segmentation.
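
The pairwise term of this energy can be written down directly from Equation (2); the sketch below is an illustration in which the direction ordering used to define conceptual neighbours, and the simple histogram-based data term, are assumptions about details not fully specified in the text.

```python
K = 0.5  # smoothness constant used in the experiments

# Direction labels in conceptual (circular) order; "S" (still) has no neighbours.
DIRECTIONS = ["U", "UR", "R", "DR", "D", "DL", "L", "UL"]


def conceptual_neighbours(a, b):
    """True if a and b are adjacent directions (e.g. 'R' and 'UR')."""
    if a not in DIRECTIONS or b not in DIRECTIONS:
        return False
    d = abs(DIRECTIONS.index(a) - DIRECTIONS.index(b))
    return min(d, len(DIRECTIONS) - d) == 1


def smoothness_cost(label_i, label_j, k=K):
    """Pairwise cost for two adjacent grid cells (Equation 2)."""
    if label_i == label_j:
        return 0.0
    if conceptual_neighbours(label_i, label_j):
        return k / 2
    return k


def data_cost(histogram, label):
    """Assumed data term: one minus the label's normalized frequency in the cell."""
    total = sum(histogram.values())
    return 1.0 - (histogram.get(label, 0) / total if total else 0.0)
```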

5.2 Region Classification Using Unsupervised Learning


Whilst considering individual directions provides an informative segmentation
of the scene region it requires areas to be characterised by movement in just one
direction. This is not, in many cases, a valid assumption.
Rather than consider each direction independently, this section describes the
use of unsupervised clustering of the histograms themselves, to provide a sin-
gle label for each bin. We cluster the motion histograms using the K-means
Chair   i-LIDS   Roundabout   Comments
The Up Left and Up Right directions highlight the exit to the roundabout and the traffic passing the exit.
Left motion and right motion both highlight the “train” area of the i-LIDS dataset, and the foreground and background sections of the roundabout.
Downleft, left, and downright seem to highlight the bench in the i-LIDS scene.
Note the clear identification of the far side of the roundabout.
Markov Random Field segmentation.

Fig. 4. Considering each direction independently. Final row shows the result of using an MRF to combine these to form an overall segmentation, rather than using a set threshold on each direction alone.
Chair i-LIDS Roundabout

K-means (K=9) as a raw scene partition.

K-means followed by MRF smoothing.

Fig. 5. Learned motion patterns used for scene partitioning, with clusters learned for
each scene. Colour coding in this figure is chosen within each scene: darker regions in
one scene are not necessarily related to darker regions in another.

Input K=10 K=12 K=14

Fig. 6. Learned motion patterns used for scene partitioning, with clusters learned
across all scenes. Despite different values of K, the bench in the i-LIDS scene has
been identified as similar in motion pattern to the chairs in the chair scene. In this
Figure, the colour coding changes between values of K but is consistent across scenes.
For example, in the K=10 column the dark grey region which makes up the majority
of the column corresponds to a vector representing very little motion.

algorithm, and then we use these clusters as the basis for our segmentation. As
before, we use a Markov random field to smooth the segmentation. We use a
smoothness term which does not consider conceptual neighbours, as it is more
difficult to determine an ordering on the 8-dimensional input vectors (the dimensions being: up; up left; left; down left; down; down right; right; and up right). Thus the
Fig. 7. Illustrations of the learned cluster centres. The size of the arrow is proportional
to the frequency with which that direction was observed. These illustrations are clusters
learned across all scenes when K=10.

smoothness term has a penalty for neighbouring squares which differ in cate-
gory, and no penalty for neighbouring squares which are the same. The distance
measure used is Euclidean distance between histograms. Figure 5 shows the par-
titioning of each scene given by the use of K-means clustering, and the same
partitioning after application of an MRF.
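
The clustering step can be sketched with scikit-learn's KMeans over the 8-dimensional per-direction histogram vectors; K = 9 and the Euclidean distance follow the text, while the random seed and the way histograms are stacked into a matrix are illustrative choices. The subsequent MRF smoothing would reuse a pairwise term like the one sketched above, but with no conceptual-neighbour discount.

```python
import numpy as np
from sklearn.cluster import KMeans

DIRECTIONS = ("U", "UR", "R", "DR", "D", "DL", "L", "UL")


def cluster_cells(histograms, k=9):
    """Assign each grid cell a motion-pattern label via K-means.

    histograms: dict mapping (cell_x, cell_y) -> Counter of direction labels.
    Returns (cells, labels, model) so labels can be mapped back onto the grid.
    """
    cells = sorted(histograms.keys())
    X = np.array([[histograms[c].get(d, 0) for d in DIRECTIONS] for c in cells],
                 dtype=float)
    model = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    return cells, model.labels_, model
```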
The images in Figure 5 illustrate segmentations obtained by training on each
scene individually. The motivation for this is that we might expect the motion
patterns of vehicles at a roundabout to be different to those of people in an under-
ground station, or in a university common room. However we might also expect
there to be a certain amount of similarity in motion between the scenes. Applying
K-means to all three datasets at once provides us with motion descriptors which
are not individually tailored to each scene but which capture similarities between
motion in each, and the results of this are shown in Figure 6. Figure 6 includes di-
agrams drawn with different values of K (the number of clusters). In each of these,
similar patterns appear.
Figure 7 shows cluster centres learned across all scenes when K=10, corre-
sponding to the second column in Figure 6. This figure shows quite clearly that
the observed patterns do not correspond to single dominant directions, but often
to pairs of opposites.

6 Evaluation
Informally, various scene elements can be identified – in the i-LIDS scene, the
track region is clear, in the chair scene, the chairs are clear, and in the roundabout
there is an obvious structure in the right place.
More formal evaluation is difficult as the generation of ground truth for motion
segmentation is not a trivial matter. We are concerned not with the way in which
the scene is superficially structured, but the way in which people interact with
the scene as they move around. For example, whilst the roundabout dataset is
indeed a roundabout, the majority of traffic goes straight across and turning
traffic is fairly uncommon. In the i-LIDS dataset, the platform has a number of
associated motion patterns, which differ from region to region (in some areas,
Screen shot from video.
Rough “ground-truth”.
MRF based on dominant direction.
K-means followed by MRF smoothing, learned individually from each scene.
K-means across all three scenes followed by MRF smoothing.

Fig. 8. Ground truth with various segmentations: dominant direction, motion patterns learned per scene and motion patterns learned across all scenes

hardly anybody waits, but in others there are often people milling around).
Despite these acknowledged difficulties we believe that comparison with a hand-
marked-up ground truth is the best way to evaluate this work and have generated
a simple region based segmentation against which to compare our output. This
is shown in Figure 8, alongside various outputs.
From Figure 8 we can see that many of the identified ground truth image
regions have parallels in the segmentations. The MRF based upon dominant
direction alone is the least like the ground truth segmentations; whilst it is
possible to find similarities it would be generous to say that these segmentations
were clear.
With the segmentations learned for each scene individually the scene structure
is more evident. The chair scene in particular has clearly highlighted the chairs
as regions of heightened motion (although not the door). Within the i-LIDS
dataset there is an unexpected distinction between regions of the train platform;
the middle area where most people chose to wait is associated with a different
cluster centre to the far and nearground, and there appears to be some form of
emergent “path” heading to and from the bench. The edge of the platform and
the train region have both emerged from the observed data. In the roundabout
scene the near and far sections stand out very well, as does the left hand feeder
branch to the roundabout.
Finally considering the segmentations created by learning over all scenes si-
multaneously (the final line of images in Figure 8) we can begin to detect sim-
ilarities between the regions defined in each scene. Whilst we cannot claim to
have constructed something that can detect chairs and benches it is however fair
to say that the clusters associated with the chairs in the chair scene (marked
as pale grey in the ground truth) also seem to be associated with the bench in
the i-LIDS scene (marked as black in the ground truth). The roundabout scene
is not segmented as clearly in the combined segmentations as in the individual
segmentations, presumably as this scene contains strongly directional motion
(each section effectively being a one-way street).

7 Conclusions and Future Directions

This paper has presented a novel approach for the unsupervised learning of spa-
tial regions from motion patterns. Our aim is to create segmentations of input
video which correspond to semantically meaningful regions in an unsupervised
fashion, and then to use these semantically meaningful regions within a quali-
tative spatial reasoning (QSR) framework. We have made considerable progress
towards this aim, and have generated segmentations which correspond in part to
ground truth segmentations of three experimental scenes. Our method is robust
to camera shake and background changes in a way that the existing path based
systems are not (due to their reliance on some form of background model).
Further investigation is required to determine which varieties of input are
most useful to this type of system: the directional histograms used here could be
augmented by information about speed, for example, and we are investigating
ways to further exploit the tracklet representation. We have carried out informal
investigations in the variation of histogram bin size (resulting in the 16 by 16
bins reported here) but a more thorough study could be useful, and the opti-
mal size will almost certainly be scene dependent. The use of overlapping bins
or pyramidical representations is also something we wish to pursue. Perhaps
more interestingly, further investigation is needed into the detection of common
patterns across different scenes, perhaps within a supervised or semi-supervised
machine learning framework. The similarity between segmentation of the bench
in the i-LIDS dataset and the chairs in the chair dataset is a promising sign, and
it would be an interesting experiment to collect video of many scenes containing
chairs or benches and see if we can learn their associated motion patterns from
observation.
The scenes under consideration in this paper contain various types of mo-
tion constrained in various ways, and perhaps because of this the two broad
approaches outlined in this paper (dominant direction vs. K-means clustering)
perform differently in each scene. The dominant direction thresholding results in
clear images of the roundabout scene, which is an example of what Fernyhough
called a strongly stylised domain. As such we should expect strong directions to
emerge. There are certain aspects of a roundabout which cannot be modelled in terms of dominant direction alone; what we have is a sequence of observations
caused by motion of objects in the real world subject to certain spatial and tem-
poral constraints. Incorporating temporal information in some way might also
detect patterns in the i-LIDS scene such as those caused by trains entering and
passengers leaving the station.

Acknowledgements

This work was supported by EPSRC project LAVID, EP/D061334/1. We would


like to thank Joe Sinfield for assistance with data collection and Mark Conmy
for technical assistance.

References

1. Fernyhough, J.H., Cohn, A.G., Hogg, D.C.: Generation of semantic regions from
image sequences. In: Proc. European Conference on Computer Vision (ECCV),
Cambridge, UK, pp. 475–484 (1996)
2. Laptev, I.: On space-time interest points. Journal of Computer Vision 64(2/3),
107–123 (2005)
3. Johnson, N., Hogg, D.C.: Learning the distribution of object tractories for event
recognition. Image and Vision Computing 14(8), 609–615 (1996)
4. Stauffer, C., Grimson, E.: Learning patterns of activity using real-time tracking.
IEEE transactions on Pattern Analysis and Machine Intelligence (PAMI) 22(8),
747–757 (2000)
5. Makris, D., Ellis, T.: Learning semantic scene models from observing activity in
visual surveillance. IEEE Transactions on Systems, Man and Cybernetics 35(3),
397–408 (2005)
6. McKenna, S.J., Charif, H.N.: Summarising contextual activity and detecting un-
usual inactivity in a supportive home environment. Pattern Analysis and Applica-
tions 7(4), 386–401 (2004)
7. KaewTraKulPong, P., Bowden, R.: Probabilistic learning of salient patterns across
spatially separated, uncalibrated views. In: Intelligent Distributed Surveillance Sys-
tems, pp. 36–40 (2004)
8. Xiang, T., Gong, S.: Beyond tracking: Modelling activity and understanding be-
haviour. International Journal of Computer Vision 67(1), 21–51 (2006)
9. Bicego, M., Cristiani, M., Murino, V.: Unsupervised scene analysis: a hidden
Markov model approach. Computer Vision and Image Understanding (CVIU) 102,
22–41 (2006)
10. Efros, A.A., Berg, A.C., Mori, G., Malik, J.: Recognizing action at a distance. In:
Proc. International Conference on Computer Vision (ICCV), Nice, France (2003)
11. Laptev, I., Pérez, P.: Retrieving actions in movies. In: Proc. International Confer-
ence on Computer Vision (ICCV) (2007)
12. Dalal, N., Triggs, B., Schmid, C.: Human detection using oriented histograms of
flow and appearance. In: Proc. European Conference on Computer Vision (ECCV),
pp. 428–441 (2006)
408 H.M. Dee et al.

13. Gryn, J.M., Wildes, R.P., Tsotsos, J.: Detecting motion patterns via direction
maps with application to surveillance. In: Workshop on Applications of Computer
Vision, pp. 202–209 (2005)
14. Colombo, A., Leung, V., Orwell, J., Velastin, S.A.: Markov models of periodically
varying backgrounds for change detection. In: Visual Information Engineering, Lon-
don, UK (2007)
15. Shi, J., Tomasi, C.: Good features to track. In: Proc. Computer Vision and Pattern
Recognition (CVPR), pp. 593–600 (1994)
16. Lucas, B.D., Kanade, T.: An iterative image registration technique with an appli-
cation to stereo vision. In: International Joint Conference on Artificial Intelligence,
pp. 674–679 (1981)
17. Tomasi, C., Kanade, T.: Detection and tracking of point features. Technical Re-
port CMU-CS-91-132, Carnegie Mellon (1991)
18. Home Office Scientific Development Branch, UK: i-LIDS: Imagery library for intelli-
gent detection systems, http://scienceandresearch.homeoffice.gov.uk/hosdb/
cctv-imaging-technology/video-based-detection-systems/i-lids/
19. Boykov, Y., Veksler, O., Zabih, R.: Fast approximate energy minimization
via graph cuts. IEEE Transactions on Pattern Analysis and Machine Intelligence
(PAMI) 23(11), 1222–1239 (2001)
20. Kolmogorov, V., Zabih, R.: What energy functions can be minimized via
graph cuts? IEEE Transactions on Pattern Analysis and Machine Intelligence
(PAMI) 26(2), 147–159 (2004)
21. Boykov, Y., Kolmogorov, V.: An experimental comparison of min-cut/max-flow al-
gorithms for energy minimization in vision. IEEE Transactions on Pattern Analysis
and Machine Intelligence (PAMI) 26(9), 1124–1137 (2004)
Pareto-Optimality of Cognitively Preferred
Polygonal Hulls for Dot Patterns

Antony Galton

School of Engineering, Computing and Mathematics, University of Exeter, UK

Abstract. In several areas of research one encounters the problem of


generating an outline that is in some way representative of the spatial
distribution of a pattern of dots. Several different algorithms have been
published which can generate such outlines, but the detailed evaluation
of such algorithms has mostly concentrated on their computational and
mathematical properties, while the adequacy of the resulting outlines
themselves has been left as a matter of informal human judgment. In
this paper it is proposed to investigate the perceptual acceptability of
outlines independently of any particular algorithm for generating them,
in order to determine objective criteria for evaluating outlines from the
full range of possibilities in a way that is conformable to human intuitive
assessments. For the sake of definiteness it is assumed that the outline
to be produced is a simple closed polygon whose vertices are elements of
the given dot pattern, all remaining elements of the dot pattern being
in the interior of the polygon. It is hypothesised that to produce a cog-
nitively acceptable outline one should seek simultaneously to minimise
both the area and perimeter of the polygon, and that therefore the points
in area-perimeter space corresponding to cognitively optimal outlines will
lie on or close to the Pareto front. A small pilot study was conducted,
the results of which lend strong support to the hypothesis. The paper
concludes with some suggestions for further more detailed investigations.

Keywords: Polygonal hulls, dot patterns, perceived shape, multi-


objective optimisation.

1 Introduction
When presented with a two-dimensional pattern of dots such as the one shown
in Figure 1, and asked to draw a polygonal outline which best captures the
shape formed by the pattern, people readily respond by drawing outlines such
as those shown in Figure 2. Interestingly, on first encountering this task, people
often tend to imagine that there is a unique solution, ‘the’ outline of the dots;
but they will very quickly be persuaded that there is typically no unique best
answer. Only the convex hull has any claim to uniqueness, but in very many
cases (such as the example shown), the convex hull is a bad solution to the task,
since it does not capture the shape that we humans perceive the dots as forming.
This is illustrated in Figure 3, where two distinct point-sets, having the shape
of the letters ‘C’ and ‘S’, have the same convex hull.

Fig. 1. A simple dot pattern

Fig. 2. Example outlines for the same dot pattern

The problem of representing a dot pattern by means of an outline which


in some way captures the shape defined by the pattern, or the region of space
occupied by the dots, has been investigated over a number of years by researchers
from a number of different disciplines [1,2,3,4,5,6]. There may be several distinct
motivations underlying such investigations; for example:
– Map generalisation. What appears at one level of detail as a set of discrete
points may be better represented, at a coarser level of detail, as a region.
The region indicates the general location and configuration of the points, but
does not indicate how many points there are or their individual positions.

Fig. 3. Point-sets with the same convex hull

– Region approximation for storage and retrieval efficiency. Geographical regions
typically have complex sinuous outlines which place a high load on
storage and retrieval capacity when represented digitally. Web-based dig-
ital gazetteers require efficient ways of recording the locations associated
with region names; but traditional approximations such as bounding boxes,


centroids, or convex hulls are far too crude for most purposes, and detailed
information concerning the region’s boundary may be unnecessarily complex,
even if available. What is needed is an approximation to a region which may
be efficiently generated from available information about points known to lie
inside (or outside) the region [7,8].
– Gestalt perception. As humans, we typically perceive a cluster of points as
occupying some two-dimensional region in the visual field, and can describe,
at least roughly, the outline of the region they occupy. If we want to emulate
this capacity of the human visual system in a computer vision system, we
need to be able to compute the region from the points.
– Representation and reasoning about collective phenomena such as flocking
animals, traffic, or crowds. In such cases the ‘ground truth’ consists of a set
of individuals each with its own typically point-like location at any given
time, but for many purposes it is desirable to think of the phenomenon as
a single entity spread out across a region which may be thought of as the
‘spatial footprint’ of the collective [9].
Different motivations may result in different criteria for evaluating the quality of
the outlines produced by any proposed method, but it is striking that, for the
most part, existing literature on the problem has had little to say about these
criteria, focusing rather on the technical details and computational characteristics
of different algorithms. The main aim of this paper is to redress this imbalance by
focussing on the question of evaluation criteria rather than particular algorithms.

2 Previous Work
As mentioned above, there is already a considerable body of work, much of it
in the pattern analysis, computer vision, and geographical information science
communities, on defining the shape of dot patterns. A typical paper in this area
will propose an algorithm for generating a shape from a pattern of dots, explore
its mathematical and/or computational characteristics (e.g., computational com-
plexity), and examine its behaviour when applied to various dot patterns. The
evaluation of this behaviour is typically very informal, often amounting to little
more than observing that the shape produced by the algorithm is a ‘good ap-
proximation’ to the perceived shape of the dots. While lip-service is generally
paid to the fact that there is no objective definition of such a ‘perceived shape’,
little is said about how to verify this, or indeed, about exactly what it means.
The much-cited work of Edelsbrunner et al. [1], introduces the notion of α-
shape: whereas the convex hull of a point-set S is the intersection of all closed
half-planes containing all the points of S, their ‘α-hull’ is the intersection of all
closed discs of radius 1/α containing all points of S (for α < 0 the closed disc of
radius 1/α is interpreted as the complement of an open disk of radius −1/α, and
for α = 0 it is a half-plane). The α-shape is a piecewise linear curve derived in
a straightforward manner from the α-hull. For certain (typically small negative)
values of α, the α-shape can come close to capturing the cognitively salient
aspects of the overall distribution of points. The authors go into considerable


details concerning the mathematical properties of these shapes, but almost the
only thing stated by way of evaluating the adequacy of the shapes produced by
the algorithm is that ‘α-shapes . . . seem to capture the intuitive notion of “finer”
or “cruder shape” of a planar pointset’.
Similar reticence is shown by others who have followed. Garai and Chaudhuri [2]
propose a ‘split-and-merge’ procedure, which starts by constructing the convex hull
of the points, and then successively inserts extra edges or smooths over zigzags. The
splitting procedure results in a highly jagged outline, which is then made smoother
by the merging procedure. But again, the authors say almost nothing on the evalua-
tion of the results of the algorithm, although it is clear that one purpose of reducing
the jaggedness of the outline is to improve its cognitive acceptability.
Melkemi and Djebali [3] propose the A-shape: Given a finite set of points P
and a set A disjoint from P , the A-shape of P is obtained from the Voronoi
diagram for A ∪ P by joining any pair of points p, q ∈ P whose Voronoi cells
border each other and also the Voronoi cell of a point in A. The edges pq are
the ‘A-exposed’ edges of the Delaunay triangulation of A ∪ P . The A-shape was
introduced ‘with the aim of curing the limits of α-shape’, and the authors have
only a little more to say on its evaluation: their explicit aim is to look for a
‘polygonal representation’ that ‘must reflect the forms perceived by a human
observer of the dot patterns set’.
Chaudhuri et al [4] also make explicit reference to human visual perception.
Their r-shape is obtained by constructing the union Ur of all disks of radius r
centred on points of P , and then, for p, q ∈ P , selecting edge pq if the boundaries
of the discs centred on p and q intersect on the boundary of Ur ; the r-shape of
P is the union of the selected edges. In the same paper they discuss the s-
shape, obtained by partitioning the space into a lattice of s × s squares and then
taking the union of those squares which contain points of P . They confine their
attention to regular dot patterns, in which ‘the points are clearly visible as well
as fairly densely and more or less evenly distributed’ (unlike, for example, our
Figure 1). For such patterns they say that ‘one can perceive the border of the
point set’, and see their problem as ‘extracting the border that is compatible
with the perceived shape of the input pattern’; they also speak of ‘the intuitive
shape of the dot pattern’. This way of speaking seems to imply that there is a
unique perceived shape, but they acknowledge that ‘“perceptual structure” of
[a dot pattern] S cannot be defined uniquely’, adding that it ‘will vary from
one person to another to a small extent’. But no attempt is made to determine
the extent of such variation, and in evaluating the results little is said beyond
the statement that ‘if ε [a real-valued scaling factor used in their algorithms]
lies in the range 0.3–0.5, the extracted border is compatible with the perceptual
border of the dot pattern’ — and again, no quantitative measure of degree
of compatibility is given. The remainder of their evaluation concerns intrinsic
features of the algorithm such as its computational complexity.
Galton and Duckham [5] proposed three different algorithms for generating a
region (called a ‘footprint’) from a set of points. One, the ‘swinging arm’ method,
generalises the ‘gift-wrap’ algorithm for constructing convex hulls; a line segment
of length r is swung about an extremal point of the set until it encounters another
point in the set; the two points are joined, and the procedure repeated from the
second point, until a closed shape is produced. Additional components of the
footprint will be obtained if points in the set lie outside the first component.
Similar results can be obtained by joining all pairs of points separated by at
most r and then selecting the peripheral joins, resulting in the ‘close pairs’
method. In the third algorithm, a region is produced by successively removing
the longest exterior edges from the Delaunay triangulation of the points, subject
to the condition that the region remains connected and its boundary forms a
Jordan curve. In this work, more attention was paid to the question of evaluation
criteria, and nine questions were listed that could be used to help classify different
types of solution to the general problem of associating a region with a set of
points. But like the work previously reviewed, this paper shied away from any
detailed examination of the concept of ‘perceived shape’ other than noting that
any such examination must ‘go beyond computational geometry to engage with
more human-oriented disciplines such as cognitive science’.
Moreira and Santos [6] proposed a ‘concave hull’ algorithm which is an al-
ternative generalisation of the gift-wrap algorithm, in which at any stage only
the k nearest neighbours of the latest point added to the outline are considered
as candidates for the next addition. They state the problem as that of find-
ing ‘the polygon that best describes the region occupied by the given points’,
and acknowledge that the word ‘best’ here is ambiguous, what counts as a best
solution being application dependent; but evaluation of the algorithm is largely
confined to its computational characteristics and not the adequacy of the results,
for which they do little more than refer to the criteria listed in [5]. Outputs from
this algorithm (for Pattern 5 in Appendix A) are shown in Figure 4.
In work currently in press, Duckham et al. [10] present more detailed eval-
uation for the Delaunay-based method first presented in [5], leading to a con-
clusion that ‘normalized parameter values of between 0.05–0.2 typically produce
optimal or near-optimal shape characterization across a wide range of point
distributions’, but it is acknowledged that what ‘optimal’ means here is both
underspecified and somehow connected with ‘a shape’s “visual salience” to a
human’. The actual evaluation presented in [10] takes the approach of starting
with a well-defined shape, generating a dot pattern from it, and then testing the
algorithm’s efficacy at reconstructing the original shape.
The purpose of the present paper is to take some first steps towards estab-
lishing some principles for evaluating any proposed solution to the problem of
determining an outline for a set of points. Whereas previous work has mostly
been concerned with proposing particular algorithms for generating outlines,
here I propose that, independently of any particular algorithm, we consider a
full range of possible outlines, and try to determine what features, describable
in objective (e.g., geometrical) terms, influence cognitive judgments as to the
suitability of an outline as a depiction of ‘the’ shape defined by the set of points.
Fig. 4. Polygonal hulls generated by the Concave Hull algorithm [6], for k = 4, 5, 6, 7, 8, and 10

3 The Scope of the Inquiry


In order to bring the treatment to manageable proportions, we first make some
assumptions about the kind of solution that is being sought. Many, though by no
means all, of the published algorithms produce outlines satisfying the following
criteria:
1. The outline is a polygon whose vertices are members of the dot pattern.
2. Any member of the dot pattern which is not a vertex of the polygon lies in
the interior of the polygon.
3. The boundary of the polygon forms a Jordan curve (so in particular no point
is encountered more than once in a full traversal of the boundary).
We shall call such outlines polygonal hulls of the underlying dot pattern; for
brevity, we shall usually just refer to them as ‘hulls’. The outlines shown in
Figures 2 and 4 are of this kind. We exclude from consideration curvilinear
outlines, outlines which exclude one or more points of the dot pattern, outlines
which include all points of the dot pattern in their interior, outlines which are
topologically non-regular, self-intersecting outlines, etc. Examples of two such
excluded outlines are shown in Figure 5.
It is obvious that the vertices of the convex hull for any dot pattern will
appear as vertices of all of the polygonal hulls for that pattern; and moreover,
in any of the polygonal hulls, the convex-hull vertices will appear in the
same sequential order around the perimeter. In general we may represent a dot
pattern as K ∪ I, where K is the set of vertices of the convex hull and I is the set
of dots in the interior of the convex hull. Let the clockwise ordering of convex
Fig. 5. Two non-examples: these do not count as polygonal hulls

hull vertices be p1 , p2 , . . . , pk , and let the interior dots be q1 , q2 , . . . , qn−k . Then


the sequence of vertices of any polygonal hull for the dot pattern will consist of
p1 , p2 , . . . , pk in that order, interspersed with some selection from q1 , q2 , . . . , qn−k
in some order.
How many polygonal hulls are there for a pattern of n dots? We can easily cal-
culate an upper bound. From the above observations, for an n-point dot pattern
whose convex hull has k vertices, we can select a polygonal hull by a sequence of
four choices: (1) choosing how many interior dots qi will be vertices of the hull
(say r dots, where 0 ≤ r ≤ n − k); (2) choosing which r of the n − k available
interior dots will be vertices of the hull (${}^{n-k}C_r$ choices); (3) in the clockwise
traversal of the hull starting from p1 , choosing which r of the k + r − 1 remaining
vertices will be assigned to interior dots (${}^{k+r-1}C_r$ choices); (4) choosing in which
order the r interior dots will be assigned to the r vertex positions chosen at the
previous step (r! choices). Not every combination of such choices will lead to a
polygonal hull (the perimeter of the resulting polygon may be self-intersecting,
or some of the dots may lie outside the polygon), but each polygonal hull will
arise from exactly one combination of choices. Thus the number of polygonal
hulls for an n-point dot pattern with a k-vertex convex hull is at most
$$\sum_{r=0}^{n-k} {}^{n-k}C_r \cdot {}^{k+r-1}C_r \cdot r!\;.$$

For the case n = 12 and k = 7, this comes to 86,276; but the 12-point dot
pattern shown in Figure 6, with seven vertices in its convex hull, actually has
only 5674 polygonal hulls, approximately 6.6% of the upper bound. Even so, the
number of polygonal hulls does grow rapidly as the number of dots increases,
and for large values of n it becomes impracticable to compute all of them (with
n = 16 we are already talking days rather than hours or minutes in the worst
case). In reality, however, only a tiny fraction of the polygonal hulls are worth
considering as good candidates for the ‘perceived shape’ of the dot pattern.
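As a quick check of the counts quoted above, the upper-bound formula can be evaluated directly. The following Python snippet is only an illustrative sketch (it is not part of the original paper); it reproduces the value 86,276 for n = 12 and k = 7.

```python
from math import comb, factorial

def hull_upper_bound(n: int, k: int) -> int:
    """Upper bound on the number of polygonal hulls for an n-dot pattern whose
    convex hull has k vertices: sum over r, the number of interior dots that
    become hull vertices, of C(n-k, r) * C(k+r-1, r) * r! (requires Python 3.8+)."""
    return sum(comb(n - k, r) * comb(k + r - 1, r) * factorial(r)
               for r in range(n - k + 1))

print(hull_upper_bound(12, 7))  # 86276, the figure quoted in the text
```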
Figure 7 illustrates three of the 5674 polygonal hulls for the dot pattern in
Figure 6. The leftmost one is the convex hull. This is easily defined, has well-
known mathematical and computational properties, and might be considered as a
useful representation of the dot pattern for some purposes; but as already noted,
it does not usually capture the perceived shape of the pattern. The rightmost
one provides a very jagged outline which does not correspond to anything that
Fig. 6. Dot Pattern 1

Fig. 7. Example polygonal hulls for Dot Pattern 1

we readily perceive when observing the dots on their own. The middle hull, on
the other hand, does seem to capture pretty well a shape that we can readily
perceive in the dots. It is certainly not unique in doing so, however, and in the
pilot study reported below, only 2 out of 13 subjects drew this as their preferred
hull for this pattern of dots.
What factors make a polygonal hull acceptable as a representation of the ‘per-
ceived shape’ of a dot pattern? The problem with the convex hull is that it will
often include large areas devoid of dots; these are the perceived concavities in the
shape, and the convex hull completely fails to account for them. Of all possible
hulls, the convex hull simultaneously maximises the area while minimising the
perimeter. It is the maximality of the area which causes the problem, since this
correlates with the inclusion of the empty spaces represented by the concavities
in the perceived outline. At the other extreme, the jagged figure on the right does
very well at reducing the area, but at the cost of a greatly extended perimeter.
The middle figure seems to strike a better balance, with both area and perimeter
taking intermediate values, as shown in Table 1.
A cognitively acceptable outline should (a) not contain too much empty space,
and (b) should not be too long and sinuous. This suggests that to produce the
Table 1. Area and perimeter measurements for the hulls in Figure 7 (units of mea-
surement arbitrary)

Area Perimeter
Hull 1 42761.0 783.5
Hull 2 27163.0 962.5
Hull 3 21032.0 1599.3

optimal outline we should seek to simultaneously minimise both the area and
the perimeter. These are, of course, conflicting objectives, since the minimum
perimeter (that of the convex hull) corresponds to the maximum area. In the
language of multi-objective optimisation theory [11], we seek non-dominated so-
lutions. A polygonal hull with area A1 and perimeter P1 is said to dominate
one with area A2 and perimeter P2 (with respect to our chosen objectives of
minimising both area and perimeter) so long as

(A1 ≤ A2 ∧ P1 < P2 ) ∨ (A1 < A2 ∧ P1 ≤ P2 ).

The hulls which are not dominated by any other hulls form what is known as
the Pareto set. When plotted in area-perimeter space (‘objective space’) they
lie along the Pareto front. This shows up in the graphs as the ‘south-western’
frontier of the set of points corresponding to all the hulls for a given dot pattern.
Area-perimeter plots for all eight dot patterns used in the pilot study described
below can be found in Appendix B. In these figures, area is plotted along the
horizontal axis, perimeter along the vertical; the convex hull, with maximal area
and minimal perimeter, corresponds to the point at the extreme lower right.
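The dominance test and the extraction of the Pareto set translate directly into code. The sketch below is illustrative only; the three (area, perimeter) pairs are those of Table 1, and the function names are ours.

```python
def dominates(h1, h2):
    """h1 dominates h2 (both are (area, perimeter) pairs): h1 is at least as
    good in both objectives and strictly better in at least one of them."""
    a1, p1 = h1
    a2, p2 = h2
    return (a1 <= a2 and p1 < p2) or (a1 < a2 and p1 <= p2)

def pareto_set(hulls):
    """Return the hulls that are not dominated by any other hull."""
    return [h for h in hulls
            if not any(dominates(g, h) for g in hulls if g is not h)]

# The three hulls of Table 1: none dominates another, so all three are returned.
hulls = [(42761.0, 783.5), (27163.0, 962.5), (21032.0, 1599.3)]
print(pareto_set(hulls))
```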
In light of the above considerations, we propose the following
Hypothesis: The points in area-perimeter space corresponding to polyg-
onal hulls which best capture a perceived shape of a dot pattern lie on or
close to the Pareto front.
The next section describes a pilot study which was carried out as a first step in
the investigation of this hypothesis.

4 Pilot Study

A small pilot study was carried out to gain an initial estimation of the plausibility
of the hypothesis. Eight dot patterns were presented to 13 adult subjects, who
were asked to draw a polygonal outline which best captures the shape formed
by each pattern of dots. An example dot pattern with two possible polygons was
shown (these are our Figures 1 and 2), and more precise rules given as follows:
1. The outline must be a simple closed polygon whose vertices are members of
the dot pattern; that is, it must consist of a series of straight edges joining
up some or all of the dots, forming a closed circuit.
2. You do not have to include all the given dots as vertices of your outline; but
any dots that are not used must be in the interior of the polygon formed,
not outside it.
3. The outline must not intersect or touch itself; so outlines such as the two
below are not allowed: [here the two non-examples of Figure 5 were given].
The eight dot patterns used in the pilot study are shown in Appendix A.
The results of the pilot study are tabulated in Table 2. The rows of the table
correspond to the eight dot patterns. For each dot pattern the following data
are given:
– The number of dots in the pattern.
– The total number of polygonal hulls for the pattern.
– The number of Pareto-optimal polygonal hulls for the pattern.
– The maximum number of dominators for any individual polygonal hull.
– The number of distinct hulls generated by the subjects: the relevance of
this figure is that it shows that the subjects provided a variety of different
responses — for none of the dot patterns were there just one or two ‘obvious’
outlines to draw.
– The number of subjects who responded with a Pareto-optimal hull.
– The mean relative domination of the responses — this quantity is explained
below.

Table 2. Results of pilot study involving 13 subjects and 8 dot patterns

Pattern   No. of dots   No. of hulls   Pareto-opt. hulls   Max. no. of dominators   Distinct responses   Pareto-opt. responses   Mean rel. dom.
1 12 5674 43 5186 9 8 0.000252
2 12 14095 81 13023 12 5 0.002640
3 11 1246 38 996 8 10 0.004943
4 12 1826 23 1632 10 7 0.002168
5 13 74710 61 73205 12 6 0.000139
6 11 3303 29 3024 12 4 0.000738
7 11 3637 36 3322 11 6 0.003473
8 11 8308 72 7630 5 11 0.000323

Our hypothesis was that hulls corresponding to some ‘perceived shape’ of the
dot pattern should lie on or close to the Pareto front in the area-perimeter plot.
Totalling the figures in the penultimate column of the table, we see that 57
out of the total 104 responses were Pareto-optimal. The figures in the fourth
column give the number of Pareto-optimal hulls available for that dot pattern,
an indication of the size of the ‘target’ if our hypothesis is correct. The fifth
column in the table shows the maximum number of hulls by which any given
hull for that dot pattern is dominated: it will be seen that this always falls short
of the total number of hulls, but not usually by much.
A measure of the extent to which a hull falls short of being Pareto-optimal is


given by the ‘relative domination’, that is, the ratio of the number of hulls which
dominate it to the maximum number of hulls that dominate any one hull for
that dot pattern. The relative domination for any individual hull is thus obtained
by dividing the number of dominators of that hull by the number of dominators of
a maximally dominated hull. The relative domination ranges from 0 for a Pareto-
optimal hull to 1 for a maximally-dominated hull. For the hypothesis to be
corroborated, we should expect the relative domination of subjects’ responses to
be consistently close to 0, and this is indeed what we find. The rightmost column
of the table shows the mean relative domination across all thirteen subjects, for
each dot-pattern. The highest individual value for the relative domination was
0.008578, for a response to dot pattern 2 which was dominated by 118 out of
the 14,095 hulls for that pattern. Compare this with the jagged rightmost hull
in Figure 7, which has a relative domination of 0.2347.
If Pareto-optimality had no influence on the subjects’ selection of polygonal
hulls, we should expect the relative frequency of Pareto-optimal hulls selected for
any of the dot patterns in the pilot study to approximate the relative frequency
of Pareto-optimal hulls in the full set of hulls for that pattern. For example, for
pattern 3, only 3% of the hulls are Pareto-optimal, which means that we should
expect a Pareto-optimal hull to be chosen by 0.03 × 13 ≈ 0.4 subjects. Summing
the corresponding values for all the dot patterns, we would expect about 1.1
out of the 104 responses to lie on the Pareto front, on the hypothesis that
Pareto-optimality is not a relevant factor. This should be compared with the 57
Pareto-optimal responses actually observed. A chi-squared test gives χ2 = 2872,
considerably larger than the value of 10.827 required for statistical significance
at the 0.1% level. From our observations, the chance that Pareto-optimality has
no influence on subjects’ choices is effectively zero.
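The reported χ² value can be reproduced, up to rounding, by a simple two-cell goodness-of-fit computation; the snippet below is only a sketch of that arithmetic, using the expected count of about 1.1 quoted above.

```python
# Observed vs. expected numbers of Pareto-optimal responses out of 104 responses,
# under the null hypothesis that Pareto-optimality plays no role.
observed = [57, 104 - 57]
expected = [1.1, 104 - 1.1]
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi2, 1))  # ~2871.1, in line with the value of 2872 reported above
```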
In Appendix C are shown, for each dot pattern, the points on the Pareto front
(small dots), and the points corresponding to the hulls chosen by the subjects in
the pilot study (circles). Comparing these with the full set of hulls illustrated in
Appendix B, one obtains a good idea of how closely the hulls drawn by human
subjects to represent the perceived shape of the dot pattern adhere to the Pareto
front.
In conclusion, the results of the pilot study lend considerable support to the
hypothesis that the perceived shape of a dot pattern will tend to be Pareto-
optimal with respect to minimising both area and perimeter.

5 Next Steps
The pilot study reported here is limited in both scale and scope. There are
many possibilities for further work to examine a range of additional factors with
larger-scale experiments. Here we list a number of such possibilities.
1. Choice of dot patterns. The dot patterns used in the pilot study were chosen
on the basis of an informal idea that they were in some way ‘interesting’.
As such, they no doubt incorporate an unconscious bias towards patterns


of a particular type. To be sure that our results remain valid over the full
range of possible dot patterns, it will be necessary to adopt a more principled
approach to the selection of the patterns, e.g., using a randomised procedure
to generate the patterns. It will also be necessary to investigate larger dot
patterns, but the inherent intractability of any algorithm to generate all the
hulls for a given pattern would make this impractical for patterns much larger
than those already considered. Alternative approaches, involving sampling
from the full set of hulls, may have to be considered instead.
2. Choice of experimental procedures. Instead of asking subjects to draw hulls
for the dot patterns presented, other tasks may also yield useful information.
Examples are
(a) Subjects are presented with a selection of possible outlines for a dot
pattern and asked to choose the one which, for them, best represents the
perceived shape of the pattern.
(b) Subjects are presented with pairs of outlines for a given dot pattern and
asked to select the preferred outline.
(c) Subjects are presented with a selection of possible outlines for a dot
pattern and asked to rank them in order of acceptability.
(d) Free-form commentary: in any of the above situations, subjects are in-
vited to explain why they judge one outline to be more acceptable than
another.
3. Application context. A possible concern with any of the above procedures
is that they are assumed to be conducted in the absence of any proposed
application context. Subjects are not being asked to rate the outlines as good
for anything in particular, but merely what looks ‘right’ to them. A priori,
one might suppose that this would prove problematic for some subjects,
although in the pilot study it was found that subjects were very willing to
treat the task as an abstract exercise without reference to any application.
However, most of the subjects in the pilot study were university-educated,
many of them actually working in the university, and if a wider-ranging set
of subjects is used, this may become a more serious consideration, and it
may be appropriate to embed the tasks in some ‘real-world’ problem context
(e.g., map generalisation) in order to provide better motivation.
4. Other objective criteria. The results of the pilot study were only examined
from the point of view of the area/perimeter minimisation hypothesis. But no
doubt other factors are involved: in particular, once it is established that pre-
ferred outlines tend to lie on or close to the Pareto front of the area/perimeter
graph, the obvious question is what further factors influence exactly where-
abouts on the Pareto front the preferred solutions will be found. As the
examples in the pilot study show, the Pareto front may take various forms.
The point of maximum curvature sometimes assumes the form of a well-
marked ‘knee’, to the right of which the slope is quite gentle, representing a
series of hulls with increasing area but similar perimeter. A priori one might
expect the preferred hulls to lie towards the left of this series, near the knee,
but the experimental results do not really bear this out. Further investigation
is needed to determine what factors influence the location of the optimal
hulls along the front. Factors that might be considered include sinuosity (a
measure of which is the number of times the outline changes from convex
to concave or vice versa as it is traversed), or the number of vertices in the
hull. Both of these are to some extent correlated with perimeter, although
the correlation is far from exact. One might also wish to investigate other
factors such as symmetry, which undoubtedly affect visual salience.
5. Evaluation of algorithms. Having established an appropriate set of criteria
for evaluating polygonal hulls, one can then begin experimenting with dif-
ferent algorithms. Many of the published algorithms for producing outlines
of dot-patterns yield polygonal hulls in the sense defined in this paper, and
an obvious first step would be to investigate to what extent these algorithms
tend to produce outlines that are optimal according to the criteria that have
been established. In particular, most of the existing algorithms involve a pa-
rameter — typically a real-valued length parameter, but in the case of the
k-nearest neighbour algorithm of [6], it is a positive integer. It would there-
fore be interesting to investigate how the objective evaluation criteria vary
as the parameter is varied: one could, for example, trace the path followed by
an algorithm’s output in area-perimeter space as the parameter runs through
the full range of its possible values, and hence find which parameter settings
optimise the quality of the output. For the hulls shown in Figure 4, for ex-
ample, the number of dominators in area-perimeter space are 0, 5, 4, 0, 5,
and 0 respectively, suggesting that this algorithm, like our human subjects,
is very good at finding hulls on or near the Pareto front.
6. Algorithm design. Going beyond this, one might also ask whether it is pos-
sible to design an algorithm with those criteria in mind, that is, to tailor an
algorithm to produce hulls which are optimal with respect to the criteria.
With larger point sets, one can only expect to identify the Pareto-optimal
hulls to some degree of approximation, suggesting that a fruitful approach
here might be to use some form of evolutionary algorithm.
7. Extension to three dimensions. Many of the ideas discussed here could prob-
ably be generalised to apply to three-dimensional dot patterns. A hull must
now be a volume of space bounded by a polyhedral surface rather than an
area bounded by a polygonal outline: a ‘polyhedral hull’. Some, but not
all, of the algorithms that have been used for generating outlines of two-
dimensional dot patterns readily generalise to three dimensions; little work
has been done on this, though the Power Crust algorithm of [12,13] is not
unrelated. There would be obvious practical difficulties in asking experimen-
tal subjects to construct polyhedra in space rather than drawing outlines on
a piece of paper, but no doubt some suitable experiments could be devised.
For the time being, however, the two-dimensional case already offers ample
scope for further investigation.
Acknowledgments

The author wishes to thank Jonathan Fieldsend and Richard Everson for useful
comments on an earlier draft of this paper, including advice on multi-objective
optimisation.

References
1. Edelsbrunner, H., Kirkpatrick, D.G., Seidel, R.: On the shape of a set of points in
the plane. IEEE Transactions on Information Theory IT-29(4), 551–559 (1983)
2. Garai, G., Chaudhuri, B.B.: A split and merge procedure for polygonal border
detection of dot pattern. Image and Vision Computing 17, 75–82 (1999)
3. Melkemi, M., Djebali, M.: Computing the shape of a planar points set. Pattern
Recognition 33, 1423–1436 (2000)
4. Chaudhuri, A.R., Chaudhuri, B.B., Parui, S.K.: A novel approach to computation
of the shape of a dot pattern and extraction of its perceptual border. Computer
Vision and Image Understanding 68(3), 257–275 (1997)
5. Galton, A.P., Duckham, M.: What is the region occupied by a set of points? In:
Raubal, M., Miller, H.J., Frank, A.U., Goodchild, M.F. (eds.) Geographic Infor-
mation Science: Proceedings of the 4th International Conference, GIScience 2006,
pp. 81–98. Springer, Heidelberg (2006)
6. Moreira, A., Santos, M.: Concave hull: a k-nearest neighbours approach for the
computation of the region occupied by a set of points. In: Proceedings of the 2nd In-
ternational Conference on Computer Graphics Theory and Applications (GRAPP
2007), Barcelona, Spain, March 8-11 (2007)
7. Alani, H., Jones, C.B., Tudhope, D.: Voronoi-based region approximation for geo-
graphical information retrieval with gazetteers. International Journal of Geograph-
ical Information Science 15(4), 287–306 (2001)
8. Arampatzis, A., van Kreveld, M., Reinbacher, I., Jones, C.B., Vaid, S., Clough, P.,
Joho, H., Sanderson, M.: Web-based delineation of imprecise regions. Computers,
Environment and Urban Systems 30, 436–459 (2006)
9. Galton, A.P.: Dynamic collectives and their collective dynamics. In: Mark, D.M.,
Cohn, A.G. (eds.) Spatial Information Theory. Springer, Heidelberg (2005)
10. Duckham, M., Kulik, L., Worboys, M., Galton, A.: Efficient generation of sim-
ple polygons for characterizing the shape of a set of points in the plane. Pattern
Recognition (2008, in press)
11. Deb, K.: Multi-objective Optimization Using Evolutionary Algorithms. John Wi-
ley, Chichester (2001)
12. Amenta, N., Choi, S., Kolluri, R.: The power crust. In: Sixth ACM Symposium on
Solid Modeling and Applications, pp. 249–260 (2001)
13. Amenta, N., Choi, S., Kolluri, R.: The power crust, unions of balls, and the medial
axis transform. Computational Geometry: Theory and Applications 19(2-3), 127–
153 (2001)
A The Dot Patterns Used in the Pilot Study

[Figures: dot patterns Pattern 1 – Pattern 8 used in the pilot study.]
B Area-Perimeter Plots for Pilot Study Dot Patterns

[Figures: area–perimeter plots (perimeter P against area A) of all polygonal hulls for Patterns 1–8.]
C Pareto Fronts, with Pilot Study Responses

[Figures: for each of Patterns 1–8, the Pareto front in area–perimeter space (small dots) together with the hulls chosen by the pilot-study subjects (circles).]
Qualitative Reasoning about Convex Relations

Dominik Lücke1 , Till Mossakowski1,2, and Diedrich Wolter1


1 SFB/TR 8 Spatial Cognition, Dept. of Computer Science, University of Bremen,
P.O. Box 330440, D-28334 Bremen
2 DFKI Lab Bremen, Safe & Secure Cognitive Systems,
Enrique-Schmidt-Str. 5, D-28359 Bremen

Abstract. Various calculi have been designed for qualitative constraint-


based representation and reasoning. Especially for orientation calculi, it
happens that the well-known method of algebraic closure cannot decide
consistency of constraint networks, even when considering networks over
base relations (= scenarios) only. We show that this is the case for all
relative orientation calculi capable of distinguishing between “left of”
and “right of”. Indeed, for these calculi, it is not clear whether efficient
(i.e. polynomial) algorithms deciding scenario-consistency exist.
As a partial solution of this problem, we present a technique to decide
global consistency in qualitative calculi. It is applicable to all calculi
that employ convex base relations over the real-valued space Rn and it
can be performed in polynomial time when dealing with convex relations
only. Since global consistency implies consistency, this can be an efficient
aid for identifying consistent scenarios. This complements the method of
algebraic closure which can identify a subset of inconsistent scenarios.

Keywords: Qualitative spatio-temporal reasoning, relative orientation


calculi, consistency.

1 Introduction

Since the work of [1] on temporal intervals, constraint calculi have been used
to model a variety of aspects of space and time in a way that is both quali-
tative (and thus closer to natural language than quantitative representations)
and computationally efficient (by appropriately restricting the vocabulary of
rich mathematical theories about space and time). For example, the well-known
region connection calculus by [2] allows for reasoning about regions in space. Ap-
plications include geographic information systems, human-machine interaction,
and robot navigation.
Efficient qualitative spatial reasoning mainly relies on the algebraic closure
algorithm. It is based on an algebra of (often binary) relations: using relational
composition and converse, it refines (basic) constraint networks in polynomial
time. If algebraic closure detects an inconsistency, the original network is surely

inconsistent. If no inconsistency is detected, for some calculi, this implies con-


sistency of the original network — not for all calculi, though.
Orientation calculi focus on relative directions in Euclidean space, like “to
the left of”, “to the right of”, “in front of”, or “behind”. They face two dif-
ficulties: often, these calculi employ ternary relations, for which the theory is
much less developed than for binary ones. Moreover, in this work, we show that
algebraic closure can badly fail to approximate the decision of consistency of con-
straint networks. Hence, we look for alternative ways of tackling the consistency
problem. We both refine the algebraic closure method by using compositions
of higher arities, and present a polynomial decision procedure for global consis-
tency of constraint networks that consist of convex relations. These two methods
approximate consistency from below and above.

2 Qualitative Calculi
Qualitative calculi are employed for representing knowledge about a domain
using a finite set of labels, so-called base relations. Base relations partition the
domain into discrete parts. One example is distinguishing points on the time
line by binary relations such as “before” or “after”. A qualitative representation
only captures membership of domain objects in these parts. For example, it can
be represented that time point A occurs before B, but not how much earlier nor
at which absolute time. Thus, a qualitative representation abstracts, which is
particularly helpful when dealing with infinite domains like time and space that
possess an internal structure like for example Rn .
In order to ensure that any constellation of domain objects is captured by
exactly one qualitative relation, a special property is commonly required:
Definition 1. Let B = {B1 , . . . , Bk } be a set of n-ary relations over a domain
D. These relations are said to be jointly exhaustive and pairwise disjoint (JEPD),
if they satisfy the properties
1. ∀i, j ∈ {1, . . . , k} with i ≠ j : Bi ∩ Bj = ∅
2. Dn = ⋃i∈{1,...,k} Bi
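To illustrate Definition 1, the JEPD property can be spot-checked on a finite sample of tuples. The sketch below is not from the paper; it uses the base relations of the point calculus (before, equal, after) as a toy example.

```python
def is_jepd_on_sample(base_relations, sample):
    """Necessary condition for JEPD, checked on a finite sample of tuples:
    every sampled tuple must satisfy exactly one base relation."""
    return all(sum(1 for B in base_relations if B(t)) == 1 for t in sample)

# Toy example: the three point-calculus relations over pairs of reals.
before = lambda t: t[0] < t[1]
equal  = lambda t: t[0] == t[1]
after  = lambda t: t[0] > t[1]

print(is_jepd_on_sample([before, equal, after],
                        [(0.0, 1.0), (2.0, 2.0), (3.5, -1.0)]))  # True
```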
For representing uncertain knowledge within a qualitative calculus, e.g., to rep-
resent that objects x1 , x2 , . . . , xn are either related by relation Bi or by relation
Bj , general relations are introduced.
Definition 2. Let B = {B1 , . . . , Bk } be a set of n-ary relations over a domain
D. The set of general relations RB (or simply R) is the powerset P(B). The
semantics of a relation R ∈ RB is defined as follows:
R(x1 , . . . , xn ) :⇔ ∃Bi ∈ R, Bi (x1 , . . . , xn )
In a set of base relations that is JEPD, the empty relation ∅ ∈ RB is called
the impossible relation. Reasoning with qualitative information takes place on
the symbolical level of relations R, so we need special operators that allow us
to manipulate qualitative knowledge. These operators constitute the algebraic
structure of a qualitative calculus.
2.1 Algebraic Structure of Qualitative Calculi


The most fundamental operators in a qualitative calculus are those for relating
qualitative relations in accordance to their set-theoretic disjunctive semantics.
So, for R, S ∈ R, intersection (∩) and union (∪) are defined canonically. The
set of general relations is closed under these operators. Set-theoretic operators
are independent of the calculus at hand, further operators are defined using the
calculus semantics.
Qualitative calculi need to provide operators for interrelating relations that are
declared to hold for the same set of objects but differ in the order of arguments.
Put differently, we need operators which allow us to change perspective. For
binary calculi only one operator needs to be defined:
Definition 3. The converse (˘) of a binary relation R is defined as:

R˘ := {(x2 , x1 ) | (x1 , x2 ) ∈ R}

Ternary calculi require more operators to realize all possible permutations of


three variables. The three commonly used operators are shortcut, homing, and
inverse:

Definition 4. Permutation operators for ternary calculi:

IN V (R) := { (y, x, z) | (x, y, z) ∈ R } (inverse)


SC(R) := { (x, z, y) | (x, y, z) ∈ R } (shortcut)
HM (R) := { (y, z, x) | (x, y, z) ∈ R } (homing)

Additional permutation operations can be defined, but a small basis that can
generate any permutation suffices, given that the permutation operations are
strong (see discussion further below) [3]. A restriction to a few operations
particularly eases the definition of higher-arity calculi.
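The permutation operators of Definition 4 are straightforward to state operationally; the following Python sketch (ours, for relations given extensionally over a toy domain) mirrors the three definitions.

```python
def INV(R):
    """Inverse: swap the first two arguments of every triple."""
    return {(y, x, z) for (x, y, z) in R}

def SC(R):
    """Shortcut: swap the last two arguments of every triple."""
    return {(x, z, y) for (x, y, z) in R}

def HM(R):
    """Homing: cyclically shift the arguments of every triple."""
    return {(y, z, x) for (x, y, z) in R}

# A one-triple relation over a toy domain of integers:
R = {(1, 2, 3)}
print(INV(R), SC(R), HM(R))  # {(2, 1, 3)} {(1, 3, 2)} {(2, 3, 1)}
```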

Definition 5 ([3]). Let R1 , R2 , . . . , Rn ∈ RB be a sequence of n general rela-


tions in an n-ary qualitative calculus over the domain D. Then the operation

◦ (R1 , . . . , Rn ) := {(x1 , . . . , xn ) ∈ Dn | ∃u ∈ D, (x1 , . . . , xn−1 , u) ∈ R1 ,


(x1 , . . . , xn−2 , u, xn ) ∈ R2 , . . . , (u, x2 . . . , xn ) ∈ Rn }

is called n-ary composition.

Note that for n = 2 one obtains the classical composition operation for binary
calculi (cp. [4]), which is usually written as an infix operator. Nevertheless, different
kinds of binary compositions have been used for ternary calculi, too.

2.2 Strong and Weak Operations


Permutation and composition operators define relations. Per se it is unclear
whether the relations obtained by application of an operation are expressible
in the calculus, i.e. whether the set of general relations RB is closed under an
operation. Indeed, for some calculi the set of relations is not closed; there even
exist calculi for which no closed set of finite size can exist, e.g. under the composition
operation in Freksa’s double cross calculus [5].

Definition 6. Let an n-ary qualitative calculus with relations RB over domain


D and an m-ary operation φ : Bm → P(Dn ) be given. If the set of relations is
closed under φ, i.e. ∀B ∈ Bm ∃R ∈ RB : φ(B) = ⋃B′∈R B′, then the
operation φ is called strong.

In qualitative reasoning we must restrict ourselves to a finite set of relations.


Therefore, if some operation is not strong in the sense of Def. 6, an upper ap-
proximation of the true operation is used instead.

Definition 7. Given a qualitative calculus with n-ary relations RB over domain


D and an operation φ : Bm → P(Dn ), the operator

φ⋄ : Bm → RB ,    φ⋄ (B1 , . . . , Bm ) := {R ∈ B | R ∩ φ(B1 , . . . , Bm ) ≠ ∅}

is called a weak operation, namely the weak approximation of φ.

Note that the weak approximation of an operation is identical to the original


operation if and only if the original operation is strong. Further note that any
calculus is closed under weak operations. Applying weak operations can lead to
a loss of information which may be critical in certain reasoning processes. In the
literature the weak composition operation is usually denoted by ⋄.

Definition 8. We call an m-ary relation R over Rn convex if, for all x1 , . . . , xm−1 ∈ Rn , the set

{y ∈ Rn | R(x1 , . . . , xm−1 , y)}

is a convex subset of Rn .

3 Constraint Based Qualitative Reasoning

Qualitative reasoning is concerned with solving constraint satisfaction problems


(CSPs) in which constraints are expressed using relations of the calculus. Defi-
nitions from the field of CSP are carried over to qualitative reasoning (cp. [6]).

Definition 9. Let R be the general relations of a qualitative calculus over the


domain D. A qualitative constraint is a formula R(X1 , . . . , Xn ) (also written
X1 . . . Xn−1 R Xn ) with variables Xi taking values from the domain and R ∈ R.
A constraint network is a set of constraints. A constraint network is said to
be a scenario if it assigns a base relation to every constraint R(X1 , . . . , Xn ), and the
base relations obtained for different permutations of the variables X1 , . . . , Xn
agree with respect to the permutation operations.
One key problem is to decide whether a given CSP has a solution or not. This
can be a very hard problem. The infinity of the domain underlying qualitative CSPs
rules out searching directly for a satisfying valuation of the variables. This is why decision
procedures that operate purely on the symbolic, discrete level of relations (rather
than on the level of the underlying domain) receive particular interest.
Definition 10. A constraint network is called consistent if a valuation of all
variables exists, such that all constraints are fulfilled. A constraint network is
called n-consistent (n ∈ N) if every solution for n − 1 variables can be extended
to an n-variable solution involving any further variable. A constraint network
is called strongly n-consistent if it is m-consistent for all m ≤ n. A CSP in
n variables is globally consistent if it is strongly n-consistent.
A fundamental technique for deciding consistency in a classical CSP is to en-
force k-consistency by restricting the domain of variables in the CSP to mutually
agreeable values. Backtracking search can then identify a consistent variable as-
signment. If the domain of some variable gets restricted down to zero size
while enforcing k-consistency, the CSP is not consistent. This procedure, except
for the backtracking search (which is not applicable in infinite domains), is also ap-
plied to qualitative CSPs [4]. For a JEPD calculus with n-ary relations, any
qualitative CSP is strongly n-consistent unless it contains a constraint with the
empty relation. So the first step in checking consistency would be to test (n + 1)-
consistency. In the case of a calculus with binary relations this would mean
analyzing 3-consistency, also called path-consistency. This is the aim of the al-
gebraic closure algorithm which exploits that composition lists all 3-consistent
scenarios.
Definition 11. A CSP over binary relations is called algebraically closed if for
all variables X1 , X2 , X3 and all relations R1 , R2 , R3 the constraint relations

R1 (X1 , X2 ), R2 (X2 , X3 ), R3 (X1 , X3 )

imply
R3 ⊆ R1 ⋄ R2
To enforce algebraic closure, the operation R3 := R3 ∩ (R1 ⋄ R2 ) (as well as a
similar operation for converses) is applied for all variables until a fixpoint is
reached.
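The fixpoint computation can be sketched in a few lines. The code below is illustrative only: it assumes the network is given for every ordered pair of distinct variables, and it is instantiated with the composition table of the point calculus [11] over the base relations <, =, >; neither the helper names nor the example network come from the paper.

```python
def algebraic_closure(net, comp, conv):
    """Refine R(i,k) := R(i,k) ∩ (R(i,j) ⋄ R(j,k)) for all triples until a
    fixpoint is reached.  `net` maps every ordered pair (i, j) of distinct
    variables to a set of base relations, `comp` maps pairs of base relations
    to their (weak) composition, `conv` maps a base relation to its converse.
    Returns False as soon as some constraint becomes empty (inconsistency)."""
    nodes = sorted({v for pair in net for v in pair})
    changed = True
    while changed:
        changed = False
        for i in nodes:
            for j in nodes:
                for k in nodes:
                    if len({i, j, k}) < 3:
                        continue
                    composed = set()
                    for b1 in net[(i, j)]:
                        for b2 in net[(j, k)]:
                            composed |= comp[(b1, b2)]
                    refined = net[(i, k)] & composed
                    if not refined:
                        return False
                    if refined != net[(i, k)]:
                        net[(i, k)] = refined
                        net[(k, i)] = {conv[b] for b in refined}
                        changed = True
    return True

# Point calculus: base relations '<', '=', '>' over points on the time line.
ALL = {'<', '=', '>'}
comp = {('<', '<'): {'<'}, ('<', '='): {'<'}, ('<', '>'): ALL,
        ('=', '<'): {'<'}, ('=', '='): {'='}, ('=', '>'): {'>'},
        ('>', '<'): ALL,   ('>', '='): {'>'}, ('>', '>'): {'>'}}
conv = {'<': '>', '=': '=', '>': '<'}

# A < B, B < C, C < A: the cycle is inconsistent and algebraic closure detects it.
net = {('A', 'B'): {'<'}, ('B', 'A'): {'>'},
       ('B', 'C'): {'<'}, ('C', 'B'): {'>'},
       ('A', 'C'): {'>'}, ('C', 'A'): {'<'}}
print(algebraic_closure(net, comp, conv))  # False
```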
Enforcing algebraic closure preserves consistency, i.e., if the empty relation is
obtained during refinement, then the qualitative CSP is inconsistent. However,
algebraic closure does not mandatorily decide consistency: a CSP may be alge-
braically closed but inconsistent — even if composition is strong [7].
Algebraic closure has also been adapted to ternary calculi using binary compo-
sition [8]. Binary composition of ternary relations involves 4 variables; it may not
be able to represent all 4-consistent scenarios, though. Scenarios with 4 variables
are specified by 4 ternary relations, whereas binary composition R1 ⋄ R2 = R3
only involves 3 ternary relations. Therefore, using n-ary composition in reasoning
with n-ary relations is more natural (cp. [3]).
4 Reasoning about Relative Orientation


In this section we give an account of our findings on deciding consistency of qualita-
tive CSPs. Our study is based on the LR-calculus (cf. [9]), a coarse relative
orientation calculus. It defines nine base relations, which are depicted in Fig. 1.
The LR-calculus deals with the relative position of a point C with respect to
the oriented line from point A to point B, if A ≠ B. The point C can be to the
left of (l) or to the right of (r) the line, or it can be on a line collinear to the given
one and in front of (f ) B, between A and B with the relation (i), or behind (b)
A; further it can be on the start-point A (s) or on the end-point B (e). If A = B,
then we can distinguish between the relations Tri, expressing that A = C, and
Dou, meaning A ≠ C. Freksa’s double cross calculus DCC is a refinement of the
LR-calculus and, hence, our findings for the LR-calculus can be directly
applied to the DCC-calculus as well. We give negative results on the applicability
of existing approaches for qualitative reasoning and discuss how computations
on the algebraic level can nevertheless be helpful. We begin with a lower bound
on the complexity.

Fig. 1. The nine base relations of the LR-calculus; tri designates the case of A = B = C, whereas dou stands for A = B ≠ C

Theorem 12. Deciding consistency of CSPs in LR is N P-hard.


Proof (sketch). In a straightforward adaptation of the proof given in [10] for the
DCC calculus, the N P-hard problem NOT-ALL-EQUAL-3SAT can be reduced
to equality of points. □

Algebraic closure is usually regarded as the central tool for deciding consistency
of qualitative CSPs. For the first qualitative calculi investigated (point calcu-
lus [11], Allen’s interval algebra [1]) it turned out that algebraic closure decides
consistency for the set of base relations, i.e. algebraic closure gives us a polyno-
mial time decision procedure for consistency of qualitative CSPs when dealing
with scenarios. This leads to the exponential time algorithm for deciding consis-
tency of general CSPs using backtracking search to refine relations in the CSP
to base relations [1]. Renz pioneered research on identifying larger sets for which
algebraic closure decides consistency, thereby obtaining a practical decision pro-
cedure [12]. If however algebraic closure is too weak for deciding consistency of
scenarios, no approaches are known for dealing with qualitative CSPs on the
algebraic level. Unfortunately this is the case for the LR-calculus.

Proposition 13. All scenarios only containing the relations l and r are alge-
braically closed wrt. the LR-calculus with binary composition.

Proof. We have a look at the permutations of LR:

operation   operand   result
INV         l         r
INV         r         l
SC          l         r
SC          r         l
HM          l         l
HM          r         r

so the set {l, r} is closed under all permutations. A look at the binary composition
table of LR reveals that all compositions containing only l and r on their left-hand
side always have the set {l, r} included in their right-hand side:

operand 1 operand 2 result


l l {b, s, i, l, r}
l r {f, l, r}
r l {f, l, r}
r r {b, s, i, l, r}

But with this we can conclude that

Ri,k ⋄ Rk,j ∩ Ri,j ≠ ∅

for all i, k, j with Rn,m ∈ {l, r}. □




Of course not all LR-scenarios over the relations l and r are consistent. We will
show that

SCEN := {(A B r C), (A E r D), (D B r A),


(D C r A), (D C r B), (D E r B),
(D E l C), (E B r A), (E C r A),
(E C r B)}

is algebraically closed but inconsistent. Algebraic closure directly follows from
Prop. 13. We will show that any projection of this scenario to the natural domain
R2 of the LR-calculus yields a contradiction. To this end, we construct equations
Fig. 2. Constructing equations

for the relations of the LR-calculus. In R2 the sign of the scalar product
sign(⟨X, Y ⟩) determines the relative direction of X and Y . Given three points
α, β and γ that are connected by an LR-relation, we can construct a local co-
ordinate system with origin α. It has one base vector going from α to β; we call
this vector α. The vector orthogonal to this one and facing to the right is
called α⊥ , as shown in Fig. 2. The vector from α to γ is called σ. With this we
get that (α β r γ) is true iff ⟨α⊥ , σ⟩ > 0, and (α β l γ) is true iff ⟨α⊥ , σ⟩ < 0,
and of course the points α, β, and γ are different points in these
cases. The vectors α⊥ and σ are described by

$$\alpha^{\perp} = \begin{pmatrix} y_\beta - y_\alpha \\ x_\alpha - x_\beta \end{pmatrix}, \qquad \sigma = \begin{pmatrix} x_\gamma - x_\alpha \\ y_\gamma - y_\alpha \end{pmatrix}.$$

With this we get

(α β r γ) ⇔ (yβ − yα ) · (xγ − xα ) + (xα − xβ ) · (yγ − yα ) > 0

(α β l γ) ⇔ (yβ − yα ) · (xγ − xα ) + (xα − xβ ) · (yγ − yα ) < 0.
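These two equivalences translate directly into a small test on coordinates. The following function (our own sketch, not part of the paper; the name lr_relation and the handling of the degenerate cases are our own choices) computes the l/r relation of three points of R2:

def lr_relation(alpha, beta, gamma):
    """Return 'r' or 'l' according to the sign of
    (y_beta - y_alpha)*(x_gamma - x_alpha) + (x_alpha - x_beta)*(y_gamma - y_alpha),
    and None if gamma lies on the oriented line through alpha and beta
    (i.e. one of the relations b, s, i, e, f, dou, tri applies instead)."""
    (xa, ya), (xb, yb), (xc, yc) = alpha, beta, gamma
    d = (yb - ya) * (xc - xa) + (xa - xb) * (yc - ya)
    if d > 0:
        return "r"
    if d < 0:
        return "l"
    return None

# Example: gamma = (2, 2) lies to the right of the oriented line from (1, 0) to (0, 1).
print(lr_relation((1, 0), (0, 1), (2, 2)))  # prints: r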


Scenarios of the LR-calculus are invariant wrt. the operations of translation,
rotation and scaling, this means that we can fix two points to arbitrary values,
we chose to set D to (0, 0) and B to (0, 1). With this we obtain the inequations

xA · yE < yA · xE (1) xC < 0 (4)


xC · yA < yC · xA (2) xE < 0 (5)
yE · xC < xE · yC (3) 0 < xA (6)

In fact more inequations are derivable, but these alone are already not jointly
satisfiable, and we conclude:
Theorem 14. Classical algebraic closure does not enforce scenario consistency
for the LR-calculus.

Proof. We consider the algebraically closed LR scenario SCEN and the inequa-
tions (1) to (6) that we derived when projecting it into R2 , the intended domain
of LR. From inequations (1), (6), (4), (5) and (3) we obtain
(xE · yC)/xC < yE < (yA · xE)/xA
and again using inequations (6), (4) and (5) we get

yC · xA < xC · yA

contradicting (2). Hence our scenario is not consistent. 
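The contradiction can also be confirmed mechanically. The following snippet (our own check, assuming the Z3 solver and its Python bindings are available; it is not part of the paper) passes inequations (1)–(6) to Z3's nonlinear real arithmetic and reports that they are jointly unsatisfiable:

from z3 import Reals, Solver, unsat

xA, yA, xC, yC, xE, yE = Reals("xA yA xC yC xE yE")

s = Solver()
s.add(xA * yE < yA * xE)   # (1)
s.add(xC * yA < yC * xA)   # (2)
s.add(yE * xC < xE * yC)   # (3)
s.add(xC < 0)              # (4)
s.add(xE < 0)              # (5)
s.add(0 < xA)              # (6)

print(s.check() == unsat)  # expected output: True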




As discussed earlier, ternary composition is more natural for ternary calculi than
binary composition. Therefore we examined the ternary composition table of the
LR-calculus¹ and conclude:

Theorem 15. Algebraic closure wrt. ternary composition does not enforce sce-
nario consistency for the LR-calculus.

Proof. Let us have a closer look at the ternary composition operation wrt. the
relations contained in SCEN, namely the relations l and r. Recall that the set {l, r}
of LR-relations is closed under all permutation operations. So we only need to
consider the fragment of the composition table with triples over l and r:
(r, r, r) = {r}, (r, r, l) = {b, r, l},
(r, l, r) = {f, r, l}, (r, l, l) = {i, r, l},
(l, r, r) = {i, r, l}, (l, r, l) = {f, r, l},
(l, l, r) = {b, r, l}, (l, l, l) = {l}.
We see that any composition that contains both r and l in the triple on the
left-hand side yields a superset of {r, l} on the right-hand side. So composable triples that have both l and r on their left-hand side can never yield an
empty set when applying algebraic closure. Hence, we have to investigate how the
compositions (l, l, l) and (r, r, r) are used when enforcing algebraic closure.
Enumerating all composable triples (X1 X2 r1 X4), (X1 X4 r2 X3), (X4 X2 r3 X3)
and their respective refinement relation (X1 X2 rf X3) yields a list with 18 entries,
shown in Appendix A. All of those entries list l as refinement relation whenever
composing (l, l, l), and analogously for r. Thus, no refinement is possible, and
the given scenario is algebraically closed wrt. ternary composition. □


We believe that advancing to composition of even higher arity will not provide us
with a sound algebraic closure algorithm. It turns out, however, that moving to
a certain level of k-consistency does make a difference.

Remark 16. Of course it is theoretically possible to solve these systems of inequations by quantifier elimination, or by the more optimized Cylindrical Algebraic
Decomposition (CAD). Unfortunately, the CAD algorithm has a doubly exponential worst-case running time (even though this can be reduced to polynomial
running time with an optimal choice of the involved projections). Our experiments
with CAD tools were quite disillusioning, since those tools choked
on our problems, mainly because of the large number of involved variables (consider that each point in our scenarios introduces two variables into our systems of
inequalities).

¹ Such a table is available via the qualitative reasoner SparQ (http://www.sfbtr8.spatial-cognition.de/project/r3/sparq/).

5 Deciding Global Consistency


In this section we generalize a technique from [13] and show that this
generalization decides global consistency for arbitrary CSPs over m-ary convex
relations on a domain Rn. The resulting theorem transfers Thm. 5 of [14] from
classical constraint satisfaction to qualitative spatio-temporal reasoning.
Theorem 17 (Helly [15]). Let S be a finite set of convex regions of the n-dimensional space Rn. If every n + 1 elements of S have a nonempty intersection,
then the intersection of all elements of S is nonempty.
Theorem 18. A CSP over m-ary convex relations over a domain Rn is globally
consistent, i.e. k-consistent for all k ∈ N, if and only if it is strongly ((m − 1) ·
(n + 1) + 1)-consistent.
Proof. In the first step of this proof consider an arbitrary CSP over convex m-ary relations that is strongly ((m − 1) · (n + 1) + 1)-consistent. By induction on k,
the number of variables that can be instantiated in a strongly consistent
way, we show that the CSP is (k + 1)-consistent for arbitrary k. Assume that for each
tuple (X1, . . . , Xk) of these variables a consistent valuation (z1, . . . , zk) exists.
For this purpose we define sets

ps(zi1, . . . , zim−1, Ri1,...,is,k+1,is+1,...,im−1) = {z | Ri1,...,is,k+1,is+1,...,im−1(zi1, . . . , zis, z, zis+1, . . . , zim−1)}

with 1 ≤ ij ≤ k and 1 ≤ s ≤ m − 1. By assumption, these are convex regions of the particular space, determined by the assignment of the variables
(X1, . . . , Xk) → (z1, . . . , zk) and the particular relation Ri1,...,is,k+1,is+1,...,im−1.
Let

P = {ps(zi1, . . . , zim−1, Ri1,...,is,k+1,is+1,...,im−1) | 1 ≤ s ≤ m − 1 ∧ 1 ≤ ij ≤ k}

be the set of all such convex regions. Observe that any n + 1 elements of P
are induced by constraints containing up to (m − 1) · (n + 1) different variables.
By strong ((m − 1) · (n + 1) + 1)-consistency we therefore know that the intersection of
any n + 1 of these regions is non-empty. The application of Helly's Theorem yields

⋂p∈P p ≠ ∅.

Hence a valuation for k + 1 variables exists. The second step of this proof is
trivial, since global consistency implies k-consistency for all k ∈ N. 

In [7, Prop. 1] it was shown that whether composition is weak or strong is
independent of whether algebraic closure decides consistency. However,
in some cases, these two properties are related:
Theorem 19. In a binary calculus over the real line that
1. has only 2-consistent relations
2. and has strong binary composition
algebraic closure decides consistency of CSPs over convex base relations.
Proof (sketch). By Thm. 18 we know that strong 3-consistency decides
global consistency. Since composition is strong, algebraic closure decides 3-consistency and, since we have 2-consistency, it decides strong 3-consistency
too. Thus algebraically closed scenarios are either inconsistent (containing the
empty relation) or globally consistent. Put differently, global consistency and
consistency coincide. □
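For instance, the point calculus [11] over the real line (base relations <, =, >) is generally taken to satisfy both conditions; the algebraic closure sketch from above then detects inconsistencies of scenarios directly (again our own illustration, reusing the hypothetical algebraic_closure function defined earlier):

# Composition and converse tables of the point calculus over R (base relations <, =, >).
PC_COMPOSITION = {
    ("<", "<"): {"<"},           ("<", "="): {"<"},  ("<", ">"): {"<", "=", ">"},
    ("=", "<"): {"<"},           ("=", "="): {"="},  ("=", ">"): {">"},
    (">", "<"): {"<", "=", ">"}, (">", "="): {">"},  (">", ">"): {">"},
}
PC_CONVERSE = {"<": ">", "=": "=", ">": "<"}

# The cyclic ordering x < y, y < z, z < x has no solution on the real line.
csp = {("x", "y"): {"<"}, ("y", "z"): {"<"}, ("z", "x"): {"<"}}
print(algebraic_closure(csp, PC_COMPOSITION, PC_CONVERSE))  # expected output: False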

Corollary 20. For CSPs over convex {LR, DCC}-relations strong 7-consistency
decides global consistency.
Proof. Follows directly from Thm. 18 for both calculi: they are ternary (m = 3) over the plane R2 (n = 2), so the bound is (3 − 1) · (2 + 1) + 1 = 7. □

Corollary 21. Global consistency of scenarios in convex {LR, DCC}-relations
is polynomially decidable.
Proof. Compute the set of strongly 7-consistent scenarios in constant time (e.g.
using quantifier elimination²). The given scenario is strongly 7-consistent iff all
7-point subscenarios are contained in the set of strongly 7-consistent scenarios.
By Thm. 18 this decides global consistency. 
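To sketch how this decision procedure could be organized (our own illustration; the lookup table CONSISTENT_7, the dictionary encoding of scenarios, and the function names are hypothetical), one would precompute once the set of all consistent 7-point LR scenarios in canonical form and then test every 7-element subset of the given scenario's points against it:

from itertools import combinations

def restriction(subset, scenario):
    """Canonical form of the scenario restricted to 'subset': points are renamed
    0..6 by their position in the subset, so membership in CONSISTENT_7 does not
    depend on point names. 'scenario' maps ordered point triples to base relations."""
    index = {p: i for i, p in enumerate(subset)}
    return frozenset(((index[a], index[b], index[c]), rel)
                     for (a, b, c), rel in scenario.items()
                     if a in index and b in index and c in index)

def globally_consistent(points, scenario, CONSISTENT_7):
    """Test of strong 7-consistency following Cor. 21: every 7-point subscenario
    must occur in the precomputed set CONSISTENT_7 of consistent 7-point scenarios.
    There are only O(|points|**7) subsets, hence the test is polynomial."""
    for subset in combinations(sorted(points), 7):
        if restriction(subset, scenario) not in CONSISTENT_7:
            return False
    return True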

Unfortunately consistency and global consistency are not equivalent in the LR-
calculus.
Proposition 22. For the LR-calculus not every consistent scenario is globally
consistent.

Proof. Consider the consistent scenario

{(A B r C), (A B r D), (C D l A),
 (C D l B), (A B f E), (C D f E)},

which has a realization, shown in Fig. 3 (left), in which the lines AB and CD intersect.
Now consider the sub-CSP in the variables A, B, C, and D with the solution
shown in Fig. 3 (right): there the lines AB and CD are parallel, but the
constraints (A B f E) and (C D f E) demand that the point E lies on the line
AB as well as on the line CD. Hence the given scenario is not 5-consistent, and
so it is not globally consistent. □

² Here we just want to state that the computation is possible; we do not claim to suggest a practical method.

(Figure: on the left, a realization of the scenario in which the lines AB and CD intersect in the point E; on the right, a solution of the sub-CSP over A, B, C, and D in which the lines AB and CD are parallel.)
Fig. 3. Illustration for Prop. 22

6 Discussion and Conclusion


We have shown that for relative orientation calculi capable of distinguishing between "left of" and "right of", like the LR-calculus, the composition table alone
is not sufficient for deciding consistency of qualitative scenarios. We have argued
that binary composition in ternary calculi in general does not provide sufficient
means for generalizing algebraic closure to ternary calculi; instead, ternary composition is required. However, advancing to ternary composition, which can list
4-consistent scenarios and thus allows us to generalize algebraic closure, is still
not sufficient for deciding consistency. This is a remarkable result that has implications for several relative orientation calculi to which the given proofs can be
transferred:
– LR calculus [16]
– Dipole calculus [17]
– OPRA calculus family [18]
– Double-cross calculus (DCC) [19]
To conclude, for the time being we have no practical method for deciding
consistency in any of the listed relative orientation calculi. This may have a
dramatic impact on qualitative spatial reasoning: the highly structured spatial
domain does not yet help us to implement more effective reasoning algorithms
than those for general logical reasoning. So far the only backbone for reasoning with
relative information is given by a logic-based approach [20].
In future work the practical utility of the polynomial-time decision
procedure for global consistency given by Cor. 21 needs to be analyzed. While the
general problem of deciding consistency of constraint satisfaction problems in LR
is NP-hard, it is likely to be easier for scenarios. Therefore, our future work will
focus on singling out tractable problem classes, and we aim at developing
a method for deciding consistency of qualitative constraint satisfaction problems
that is contained in NP, possibly finding a polynomial-time method for scenarios.

Acknowledgements
This work was supported by the DFG Transregional Collaborative Research Cen-
ter SFB/TR 8 “Spatial Cognition”, projects I4-[Spin] and R3-[Q-Shape]. Funding
by the German Research Foundation (DFG) is gratefully acknowledged.

References

1. Allen, J.: Maintaining knowledge about temporal intervals. Communications of the
ACM 26(11), 832–843 (1983)
2. Randell, D.A., Cui, Z., Cohn, A.: A spatial logic based on regions and connection.
In: Nebel, B., Rich, C., Swartout, W. (eds.) KR 1992. Principles of Knowledge
Representation and Reasoning, pp. 165–176. Morgan Kaufmann, San Francisco
(1992)
3. Condotta, J.F., Saade, M., Ligozat, G.: A generic toolkit for n-ary qualitative
temporal and spatial calculi. In: TIME 2006: Proceedings of the Thirteenth Inter-
national Symposium on Temporal Representation and Reasoning, pp. 78–86. IEEE
Computer Society, Los Alamitos (2006)
4. Renz, J., Nebel, B.: Qualitative spatial reasoning using constraint calculi. In: Aiello,
M., Pratt-Hartmann, I., van Benthem, J. (eds.) Handbook of Spatial
Logics. Springer, Heidelberg (2007)
5. Scivos, A., Nebel, B.: Double-crossing: Decidability and computational complexity
of a qualitative calculus for navigation. In: Montello, D.R. (ed.) COSIT 2001.
LNCS, vol. 2205, pp. 431–446. Springer, Heidelberg (2001)
6. Dechter, R.: From local to global consistency. Artificial Intelligence 55, 87–108
(1992)
7. Renz, J., Ligozat, G.: Weak composition for qualitative spatial and temporal rea-
soning. In: van Beek, P. (ed.) CP 2005. LNCS, vol. 3709, pp. 534–548. Springer,
Heidelberg (2005)
8. Dylla, F., Moratz, R.: Empirical complexity issues of practical qualitative spatial
reasoning about relative position. In: Proceedings of the Workshop on Spatial and
Temporal Reasoning at ECAI 2004 (2004)
9. Scivos, A., Nebel, B.: The finest of its class: The natural, point-based ternary
calculus LR for qualitative spatial reasoning. In: Spatial Cognition, pp. 283–303
(2004)
10. Scivos, A.: Einführung in eine Theorie der ternären RST-Kalküle für qualitatives
räumliches Schließen. Master’s thesis, Universität Freiburg (in German) (April
2000)
11. Vilain, M.B., Kautz, H.A., van Beek, P.G.: Constraint propagation algorithms for
temporal reasoning: A revised report. In: Readings in Qualitative Reasoning about
Physical Systems. Morgan Kaufmann, San Francisco (1989)
12. Renz, J.: Qualitative Spatial Reasoning with Topological Information. LNCS,
vol. 2293. Springer, Berlin (2002)
13. Isli, A., Cohn, A.: A new approach to cyclic ordering of 2D orientations using
ternary relation algebras. Artificial Intelligence 122(1-2), 137–187 (2000)
14. Sam-Haroud, D., Faltings, B.: Consistency techniques for continuous constraints.
Constraints 1, 85–118 (1996)
15. Helly, E.: Über Mengen konvexer Körper mit gemeinschaftlichen Punkten. Jber.
Deutsch. Math. Verein 32, 175–176 (1923)
16. Ligozat, G.: Qualitative triangulation for spatial reasoning. In: Proc. International
Conference on Spatial Information Theory. A Theoretical Basis for GIS, pp. 54–68
(1993)
17. Moratz, R., Renz, J., Wolter, D.: Qualitative spatial reasoning about line segments.
In: Horn, W. (ed.) ECAI 2000. Proceedings of the 14th European Conference on
Artificial Intelligence. IOS Press, Amsterdam (2000)

18. Moratz, R., Dylla, F., Frommberger, L.: A relative orientation algebra with ad-
justable granularity. In: Proceedings of the Workshop on Agents in Real-Time,
and Dynamic Environments (IJCAI 2005) (2005)
19. Freksa, C.: Using orientation information for qualitative spatial reasoning. In: Proceedings of the International Conference GIS – From Space to Territory: Theories
and Methods of Spatio-Temporal Reasoning in Geographic Space, London, UK, pp. 162–178. Springer,
Heidelberg (1992)
20. Renegar, J.: On the computational complexity and geometry of the first order
theory of the reals. Part I–III. Journal of Symbolic Computation 13(3), 255–300, 301–328, 329–352 (1992)

A Table of Composable l/r Triples


(A C l B) (A B l D) (B C l D) (A C l D)
↓ ↓ ↓ ↓
( l, l, l) ∩ {l} = {l}

(A C l E) (A E l D) (E C l D) (A C l D)
↓ ↓ ↓ ↓
( l, l, l) ∩ {l} = {l}

(A C l B) (A B l E) (B C l E) (A C l E)
↓ ↓ ↓ ↓
( l, l, l) ∩ {l} = {l}

(E A l B) (E B l C) (B A l C) (E A l C)
↓ ↓ ↓ ↓
( l, l, l) ∩ {l} = {l}

(C D l B) (C B l A) (B D l A) (C D l A)
↓ ↓ ↓ ↓
( l, l, l) ∩ {l} = {l}

(C D l E) (C E l A) (E D l A) (C D l A)
↓ ↓ ↓ ↓
( l, l, l) ∩ {l} = {l}

(C E l B) (C B l A) (B E l A) (C E l A)
↓ ↓ ↓ ↓
( l, l, l) ∩ {l} = {l}

(E C r B) (E B r A) (B C r A) (E C r A)
↓ ↓ ↓ ↓
( r, r, r) ∩ {r} = {r}

(D A l B) (D B l C) (B A l C) (D A l C)
↓ ↓ ↓ ↓
( l, l, l) ∩ {l} = {l}

(D A l E) (D E l C) (E A l C) (D A l C)
↓ ↓ ↓ ↓
( l, l, l) ∩ {l} = {l}

(A D r B) (A B r C) (B D r C) (A D r C)
↓ ↓ ↓ ↓
( r, r, r) ∩ {r} = {r}

(A D r E) (A E r C) (E D r C) (A D r C)
↓ ↓ ↓ ↓
( r, r, r) ∩ {r} = {r}

(A E r B) (A B r C) (B E r C) (A E r C)
↓ ↓ ↓ ↓
( r, r, r) ∩ {r} = {r}

(C A r B) (C B r E) (B A r E) (C A r E)
↓ ↓ ↓ ↓
( r, r, r) ∩ {r} = {r}

(C A r E) (C E r D) (E A r D) (C A r D)
↓ ↓ ↓ ↓
( r, r, r) ∩ {r} = {r}

(C A r B) (C B r D) (B A r D) (C A r D)
↓ ↓ ↓ ↓
( r, r, r) ∩ {r} = {r}

(D C r B) (D B r A) (B C r A) (D C r A)
↓ ↓ ↓ ↓
( r, r, r) ∩ {r} = {r}

(D C r E) (D E r A) (E C r A) (D C r A)
↓ ↓ ↓ ↓
( r, r, r) ∩ {r} = {r}
Author Index

Agrawal, Shruti 202
Andonova, Elena 250
Arleo, Angelo 39
Avraamides, Marios N. 8
Barclay, Michael 216
Basten, Kai 104
Bülthoff, Heinrich H. 1
Campos, Jennifer L. 1
Carlson, Laura A. 4
Chavarriaga, Ricardo 71
Cohn, Anthony G. 394
Dara-Abrams, Drew 138
Dee, Hannah M. 394
Dollé, Laurent 71
Egenhofer, Max J. 295
Forbus, Kenneth 283, 378
Fouque, Benjamin 39
Fraile, Roberto 394
Frommberger, Lutz 311
Galton, Antony 216, 409
Gentner, Dedre 7
Gerstmayr, Lorenz 87
Girard, Benoît 71
Giudice, Nicholas A. 121
Goschler, Juliana 250
Guillot, Agnès 71
Hogg, David C. 394
Hois, Joana 266
Hurlebaus, Rebecca 104
Kastens, Kim A. 171, 202
Keehner, Madeleine 188
Kelly, Jonathan W. 22
Khamassi, Mehdi 71
Kiefer, Peter 361
Kohlhagen, Christian 56
Kuhnmünch, Gregory 154
Kutz, Oliver 266
Liben, Lynn S. 171, 202
Lockwood, Kate 283, 378
Lovett, Andrew 283, 378
Lücke, Dominik 426
Mallot, Hanspeter A. 87, 104
Martinet, Louis-Emmanuel 39
McNamara, Timothy P. 22
Meilinger, Tobias 1, 344
Meyer, Jean-Arcady 39
Mossakowski, Till 426
Myers, Lauren J. 171
Nedas, Konstantinos A. 295
Pantelidou, Stephanie 8
Passot, Jean-Baptiste 39
Peters, Denise 154
Raubal, Martin 328
Reineking, Thomas 56
Richter, Kai-Florian 154
Ross, Robert J. 233, 250
Schmid, Falko 154
Tietz, Jerome D. 121
Wiener, Jan M. 87, 104
Wolter, Diedrich 311, 426
Zetzsche, Christoph 56
